For the rest of these results, the definitions in Table 6.1.1 apply:
TABLE 6.1.1 Definition of methods

Name of the Method | Uni/Multi | Impurity Measure | Pruning | Multiple Splits
ID3 | Uni | Information Gain | Pre-pruning | No
ID3Gini | Uni | Gini Index | Pre-pruning | No
ID3Root | Uni | Weak Theory Learning Measure | Pre-pruning | No
ID3P | Uni | Information Gain | Post-pruning | No
ID3-2 | Uni | Information Gain | Pre-pruning | Yes (degree 2)
ID3-3 | Uni | Information Gain | Post-pruning | Yes (degree 3)
6.1.1. Comparison of Different Kinds of Learning Measures
In this part, the three impurity measures are compared: Information Gain, the Gini Index, and the Weak Theory Learning Measure. Pre-pruning is applied in all cases. This section compares the three measures in terms of accuracy, tree size (number of nodes), and learning time. Accuracy results for the impurity measures are shown in Table 6.1.1.1 and Figure 6.1.1.1, node results in Table 6.1.1.2 and Figure 6.1.1.2, and learning time results in Table 6.1.1.3 and Figure 6.1.1.3.
For the three impurity measures there is no significant difference in accuracy (except on one data set).
For larger data sets with more than 1000 samples, in three of five cases ID3Root produces significantly fewer nodes than ID3, and ID3 significantly fewer than ID3Gini. In the other cases no significant increase or decrease is found.
For mixed data sets, where continuous and discrete attributes occur together, the discrete attribute with the larger arity tends to be selected first as the split attribute. This is due to the fragmentation problem.
As the node size increases, learning time increases accordingly, and this effect becomes significant as the data set size grows.
In terms of learning time, ID3 is significantly faster than ID3Gini in seven of the 20 data sets.
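To make the comparison concrete, the three impurity measures can be sketched as functions of the class counts at a node. This is a minimal illustration, not the thesis's implementation; in particular, the exact form of the Weak Theory Learning Measure used by ID3Root is an assumption here (the two-class criterion 2*sqrt(q(1-q)) is used).

```python
from math import log2, sqrt

# Sketch of the three impurity measures compared in this section, each
# computed over the vector of class counts at a node. The weak-theory
# measure below is an ASSUMED two-class form; the thesis may use a
# multi-class generalization.

def entropy(counts):
    """Impurity used by Information Gain (ID3)."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

def gini(counts):
    """Gini Index impurity (ID3Gini)."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def weak_theory(counts):
    """Assumed two-class weak-theory measure (ID3Root)."""
    n = sum(counts)
    q = counts[0] / n
    return 2.0 * sqrt(q * (1.0 - q))

def split_impurity(partitions, measure):
    """Weighted impurity of a candidate split: sum_j (n_j / n) * measure(j)."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * measure(p) for p in partitions)
```

All three measures are zero at a pure node and maximal at a uniform class distribution, which is consistent with their near-identical accuracies above; they differ mainly in how they rank mid-range candidate splits.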
TABLE 6.1.1.1 Accuracy results for three different types of impurity measures

Data set name | ID3 | ID3Gini | ID3Root | Significance
Breast | 94.11±1.24 | 94.13±1.57 | 94.34±1.53 |
Bupa | 62.26±5.33 | 60.46±3.85 | 61.39±4.38 |
Car | 80.97±1.26 | 80.49±1.32 | 80.90±1.24 |
Cylinder | 68.50±2.22 | 67.39±2.91 | 70.06±3.66 |
Dermatology | 92.84±2.37 | 93.33±2.36 | 92.51±2.32 |
Ecoli | 78.10±3.57 | 78.21±2.50 | 77.92±4.18 |
Flare | 85.26±2.03 | 84.89±2.00 | 85.07±2.14 |
Glass | 60.65±5.97 | 59.35±6.14 | 63.36±6.27 |
Hepatitis | 78.44±3.71 | 74.33±10.36 | 75.47±5.53 |
Horse | 87.55±1.98 | 87.50±1.93 | 87.83±1.94 |
Iris | 93.87±2.75 | 93.87±2.75 | 93.47±2.47 |
Ironosphere | 87.63±3.15 | 84.96±2.83 | 87.00±2.37 |
Monks | 92.27±10.15 | 92.22±10.20 | 92.22±10.20 |
Mushroom | 99.70±0.06 | 99.68±0.08 | 99.62±0.15 |
Ocrdigits | 78.40±1.47 | 77.33±1.74 | 76.74±1.32 |
Pendigits | 85.73±1.01 | 86.59±0.85 | 85.37±1.16 |
Segment | 91.08±1.16 | 90.49±2.02 | 89.08±1.06 | 1>>3
Vote | 94.94±1.06 | 95.63±1.83 | 94.94±0.94 |
Wine | 88.65±3.72 | 89.55±3.97 | 90.11±3.78 |
Zoo | 92.06±4.80 | 92.26±4.75 | 92.45±4.79 |
TABLE 6.1.1.2 Node results for three different types of impurity measures

Data set name | ID3 | ID3Gini | ID3Root | Significance
Breast | 17.00±2.11 | 18.80±2.90 | 18.60±4.30 |
Bupa | 53.40±5.48 | 54.20±6.20 | 58.00±6.75 |
Car | 25.40±0.70 | 25.10±0.74 | 25.60±0.52 |
Cylinder | 54.10±5.90 | 59.40±7.75 | 56.60±6.70 | 2>>1
Dermatology | 20.40±2.67 | 19.80±2.15 | 19.20±2.39 |
Ecoli | 33.80±2.70 | 35.00±2.67 | 34.60±6.52 |
Flare | 37.90±4.51 | 39.30±4.30 | 37.50±3.87 |
Glass | 38.20±5.90 | 38.80±4.16 | 40.20±5.27 |
Hepatitis | 19.60±3.78 | 20.60±2.95 | 21.20±2.90 |
Horse | 55.80±5.92 | 56.60±6.64 | 55.80±5.92 |
Iris | 8.40±1.35 | 8.40±1.35 | 8.40±1.35 |
Ironosphere | 19.20±3.05 | 21.60±4.53 | 19.60±3.41 |
Monks | 25.40±13.53 | 25.20±13.21 | 25.20±13.21 |
Mushroom | 23.00±0.00 | 24.40±1.71 | 22.40±1.26 |
Ocrdigits | 74.40±4.01 | 97.80±7.50 | 61.40±3.63 | 2>>1>>>3
Pendigits | 81.80±5.51 | 99.20±7.27 | 67.80±3.79 | 2>>>1>>>3
Segment | 41.80±3.79 | 47.80±5.43 | 34.40±3.66 | 2>>3
Vote | 18.20±3.16 | 18.00±3.02 | 19.40±3.86 |
Wine | 10.40±1.35 | 10.20±2.15 | 9.20±1.75 |
Zoo | 15.00±1.89 | 14.60±1.58 | 14.60±1.58 |
TABLE 6.1.1.3 Learning time results for different types of impurity measures (in sec.)

Data set name | ID3 | ID3Gini | ID3Root | Significance
Breast | 2±0 | 3±1 | 2±0 | 2>>1
Bupa | 3±1 | 4±0 | 5±1 | 2>>1
Car | 5±0 | 5±0 | 5±0 |
Cylinder | 10±2 | 11±1 | 10±1 | 2>1
Dermatology | 3±0 | 3±0 | 3±0 |
Ecoli | 3±0 | 4±1 | 4±1 | 3>1
Flare | 2±0 | 2±0 | 2±1 |
Glass | 3±0 | 4±0 | 4±1 | 2>>1
Hepatitis | 1±0 | 2±0 | 1±0 |
Horse | 4±0 | 4±1 | 4±0 |
Iris | 0±0 | 0±0 | 0±0 |
Ironosphere | 39±7 | 48±7 | 39±7 |
Monks | 2±1 | 2±1 | 2±1 |
Mushroom | 113±33 | 84±33 | 92±66 | 3>2,1>>3
Ocrdigits | 207±9 | 254±40 | 170±24 | 2>>1>>>3
Pendigits | 476±22 | 516±107 | 415±90 | 3>>1>>>2
Segment | 345±10 | 493±33 | 342±56 | 2>>>1>>3
Vote | 1±0 | 1±0 | 1±1 |
Wine | 1±0 | 2±0 | 2±0 |
Zoo | 1±0 | 1±0 | 1±0 |
6.1.2. Comparison of Pruning Techniques
As mentioned, two different pruning techniques have been used: pre-pruning and post-pruning. For simplicity, Information Gain is used as the impurity measure. In this section we would like to find out which pruning technique is better. Accuracy results for the two pruning techniques are given in Table 6.1.2.1 and Figure 6.1.2.1, node results in Table 6.1.2.2 and Figure 6.1.2.2, and learning time results in Table 6.1.2.3 and Figure 6.1.2.3.
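The "Significance" entries in these tables come from pairwise statistical comparisons over cross-validation folds. The exact test and the meaning of the `>`, `>>`, `>>>` levels used in the thesis are assumptions here; the sketch below uses a two-sided paired t-test over 10 folds with the 5% critical value for 9 degrees of freedom.

```python
from math import sqrt

# ASSUMED significance procedure: two-sided paired t-test over 10-fold
# cross-validation results; t_{0.975, 9} = 2.262 is the standard 5%
# two-sided critical value for 9 degrees of freedom.

T_CRIT_DF9_05 = 2.262

def paired_t(a, b):
    """Paired t statistic for two equal-length lists of fold results."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # unbiased variance
    return mean / sqrt(var / n)

def significantly_different(a, b, crit=T_CRIT_DF9_05):
    """True when the paired difference is significant at the 5% level."""
    return abs(paired_t(a, b)) > crit
```

A blank significance cell in the tables then corresponds to `significantly_different` returning False for every method pair; stricter levels (smaller alpha) would give the `>>` and `>>>` entries.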
TABLE 6.1.2.1 Accuracy results for pre-pruning and post-pruning techniques

Data set name | ID3 | ID3P | Significance
Breast | 94.11±1.24 | 94.68±1.84 |
Bupa | 62.26±5.33 | 62.84±3.39 |
Car | 80.97±1.26 | 79.93±7.90 |
Cylinder | 68.50±2.22 | 67.62±5.11 |
Dermatology | 92.84±2.37 | 92.51±2.42 |
Ecoli | 78.10±3.57 | 78.27±4.00 |
Flare | 85.26±2.03 | 88.35±2.55 |
Glass | 60.65±5.97 | 60.19±5.35 |
Hepatitis | 78.44±3.71 | 78.95±4.48 |
Horse | 87.55±1.98 | 88.80±3.02 |
Iris | 93.87±2.75 | 92.93±3.33 |
Ironosphere | 87.63±3.15 | 86.15±3.72 |
Monks | 92.27±10.15 | 89.81±7.82 |
Mushroom | 99.70±0.06 | 99.87±0.11 |
Ocrdigits | 78.40±1.47 | 84.34±1.48 | 2>>1
Pendigits | 85.73±1.01 | 92.54±0.61 | 2>>>1
Segment | 91.08±1.16 | 91.99±0.95 |
Vote | 94.94±1.06 | 95.63±0.66 |
Wine | 88.65±3.72 | 86.63±1.94 |
Zoo | 92.06±4.80 | 82.97±7.36 |
TABLE 6.1.2.2 Node results for pre-pruning and post-pruning techniques

Data set name | ID3 | ID3P | Significance
Breast | 17.00±2.11 | 13.00±4.99 |
Bupa | 53.40±5.48 | 17.40±12.54 | 1>>>2
Car | 25.40±0.70 | 60.78±45.00 |
Cylinder | 54.10±5.90 | 20.40±8.47 | 1>>>2
Dermatology | 20.40±2.67 | 12.40±1.35 | 1>>2
Ecoli | 33.80±2.70 | 14.20±4.64 | 1>>>2
Flare | 37.90±4.51 | 6.10±6.62 | 1>>2
Glass | 38.20±5.90 | 14.40±4.01 | 1>>>2
Hepatitis | 19.60±3.78 | 2.80±2.39 | 1>>2
Horse | 55.80±5.92 | 45.60±3.92 | 1>>2
Iris | 8.40±1.35 | 5.40±0.84 |
Ironosphere | 19.20±3.05 | 7.60±2.67 | 1>>2
Monks | 25.40±13.53 | 25.40±9.28 |
Mushroom | 23.00±0.00 | 26.80±1.99 | 2>1
Ocrdigits | 74.40±4.01 | 104.40±12.44 | 2>1
Pendigits | 81.80±5.51 | 134.80±13.48 | 2>>1
Segment | 41.80±3.79 | 43.00±6.93 |
Vote | 18.20±3.16 | 4.00±2.16 | 2>>>1
Wine | 10.40±1.35 | 6.80±2.57 |
Zoo | 15.00±1.89 | 9.20±2.39 | 1>>>2
There is no significant difference in accuracy between the pre-pruning and post-pruning techniques. However, due to the horizon effect, two data sets show a significant accuracy improvement with post-pruning.
Post-pruning leads to fewer nodes than pre-pruning (in 11 out of 20 data sets).
Where the horizon effect applies, the node size also increases, so in those two data sets the node size is significantly larger than with pre-pruning.
In discrete data sets where the arity is greater than five, as in the Car and Mushroom data sets, post-pruning cannot prune the tree well, so it produces a large number of nodes.
In some data sets, where the number of instances of one class is very high compared to the other classes, post-pruning reduces the number of nodes to one: the whole tree is pruned back into a single node.
Post-pruning takes a significantly larger amount of time to learn, because it prunes the tree only after the tree has been fully constructed. In some cases, however, post-pruning takes less time; this is because the pruning set is taken from the training set, so with post-pruning fewer instances remain for growing the tree.
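The behavior described above can be illustrated with reduced-error pruning, one common realization of post-pruning (the thesis's exact pruning rule is an assumption): a subtree is replaced by a majority-class leaf whenever the replacement does not lower accuracy on a held-out pruning set.

```python
from collections import Counter

# Sketch of reduced-error post-pruning on a toy discrete-attribute tree.
# ASSUMPTION: this is one common post-pruning rule, not necessarily the
# exact rule used by ID3P.

class Node:
    def __init__(self, attr=None, children=None, label=None):
        self.attr = attr                # attribute index tested at this node
        self.children = children or {}  # attribute value -> subtree
        self.label = label              # class label if this is a leaf

def classify(node, x):
    while node.label is None:
        node = node.children[x[node.attr]]
    return node.label

def accuracy(node, data):
    return sum(classify(node, x) == y for x, y in data) / len(data)

def majority(data):
    return Counter(y for _, y in data).most_common(1)[0][0]

def prune(node, root, prune_set, node_data):
    """Prune bottom-up: children first, then try collapsing this node."""
    if node.label is not None or not node_data:
        return
    for v, child in node.children.items():
        prune(child, root, prune_set,
              [d for d in node_data if d[0][node.attr] == v])
    before = accuracy(root, prune_set)
    attr, children = node.attr, node.children
    node.label, node.children = majority(node_data), {}  # tentative collapse
    if accuracy(root, prune_set) < before:               # worse? undo
        node.label, node.attr, node.children = None, attr, children

# On a pruning set dominated by one class, every collapse looks harmless
# and the whole tree can be pruned back to a single leaf, as noted above.
train = [(('a', 'x'), '+'), (('b', 'x'), '+'), (('b', 'y'), '-'), (('a', 'y'), '+')]
prune_set = [(('b', 'x'), '+'), (('b', 'y'), '+'), (('a', 'x'), '+')]
root = Node(attr=0, children={
    'a': Node(label='+'),
    'b': Node(attr=1, children={'x': Node(label='+'), 'y': Node(label='-')}),
})
prune(root, root, prune_set, train)
```

Because the pruning pass must re-evaluate the fully grown tree on the pruning set at every internal node, the extra cost scales with tree size, which matches the larger learning times of ID3P on the big data sets below.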
TABLE 6.1.2.3 Learning time results for pre-pruning and post-pruning techniques (in sec.)

Data set name | ID3 | ID3P | Significance
Breast | 2±0 | 3±1 |
Bupa | 3±1 | 4±0 |
Car | 5±0 | 40±3 | 2>>1
Cylinder | 10±2 | 10±1 | 1>>2
Dermatology | 3±0 | 3±0 |
Ecoli | 3±0 | 3±0 |
Flare | 2±0 | 2±1 |
Glass | 3±0 | 2±0 |
Hepatitis | 1±0 | 1±0 |
Horse | 4±0 | 4±0 |
Iris | 0±0 | 0±0 |
Ironosphere | 39±7 | 21±4 | 1>>>2
Monks | 2±1 | 3±1 |
Mushroom | 113±33 | 84±29 |
Ocrdigits | 207±9 | 497±109 | 2>>>1
Pendigits | 476±22 | 931±255 | 2>>>1
Segment | 345±10 | 262±8 | 1>>>2
Vote | 1±0 | 2±0 | 2>>1
Wine | 1±0 | 1±0 |
Zoo | 1±0 | 0±0 |
6.1.3. Comparison of Binary and Multiway Splits
In this section we want to find out whether it is better to use multiway splits instead of binary splits. To check this, we ran experiments with three-way and four-way splits and compared them with two-way splits. Accuracy results are shown in Table 6.1.3.1 and Figure 6.1.3.1, node results in Table 6.1.3.2 and Figure 6.1.3.2, and learning time results in Table 6.1.3.3, Figure 6.1.3.3 and Figure 6.1.3.4.
TABLE 6.1.3.1 Accuracy results for splits with degrees two, three, and four

Data set name | ID3 | ID3-2 | ID3-3 | Significance
Breast | 94.11±1.24 | 94.08±1.38 | 93.65±0.87 | 1>3
Bupa | 62.26±5.33 | 59.41±4.61 | 59.70±2.78 | 1>>>2
Cylinder | 68.50±2.22 | 63.62±4.08 | 65.44±5.66 |
Dermatology | 92.84±2.37 | 92.46±1.80 | 91.37±2.51 |
Ecoli | 78.10±3.57 | 76.61±3.97 | 75.24±4.61 |
Flare | 85.26±2.03 | 85.26±2.03 | 85.26±2.03 |
Glass | 60.65±5.97 | 56.92±5.82 | 54.21±4.89 |
Hepatitis | 78.44±3.71 | 73.38±8.65 | 71.07±8.41 |
Horse | 87.55±1.98 | 87.12±2.22 | 86.90±2.37 |
Iris | 93.87±2.75 | 92.67±3.28 | 92.93±2.27 |
Ironosphere | 87.63±3.15 | 87.63±1.39 | N/A |
Monks | 92.27±10.15 | 91.53±7.29 | 80.28±8.26 | 1>>2,1>>3
Ocrdigits | 78.40±1.47 | 67.25±2.24 | 63.41±1.72 | 1>>>2>>>3
Pendigits | 85.73±1.01 | 82.19±1.47 | N/A | 1>>2
Segment | 91.08±1.16 | N/A | N/A |
Wine | 88.65±3.72 | 86.63±5.28 | 83.03±4.90 |
Zoo | 92.06±4.80 | 87.10±4.96 | 88.69±5.35 | 1>>2
TABLE 6.1.3.2 Node results for splits with degrees two, three, and four

Data set name | ID3 | ID3-2 | ID3-3 | Significance
Breast | 17.00±2.11 | 17.90±3.90 | 18.80±3.36 |
Bupa | 53.40±5.48 | 50.70±3.53 | 54.40±5.58 | 3>>2
Cylinder | 54.10±5.90 | 52.40±5.21 | 54.80±4.85 |
Dermatology | 20.40±2.67 | 20.30±3.56 | 27.00±3.89 | 3>>1
Ecoli | 33.80±2.70 | 34.40±3.17 | 36.90±3.87 |
Flare | 37.90±4.51 | 37.20±3.99 | 37.80±3.55 |
Glass | 38.20±5.90 | 38.70±4.60 | 37.30±5.93 |
Hepatitis | 19.60±3.78 | 20.60±3.81 | 21.40±3.95 |
Horse | 55.80±5.92 | 57.50±6.59 | 58.20±6.92 |
Iris | 8.40±1.35 | 8.00±2.26 | 8.10±1.97 |
Ironosphere | 19.20±3.05 | 20.60±3.24 | N/A |
Monks | 25.40±13.53 | 33.90±6.76 | 38.50±5.04 | 2>>1,3>>>1
Ocrdigits | 74.40±4.01 | 63.50±4.03 | 67.90±5.61 |
Pendigits | 81.80±5.51 | 73.60±5.40 | N/A |
Segment | 41.80±3.79 | N/A | N/A |
Wine | 10.40±1.35 | 12.90±1.91 | 15.00±2.36 | 3>1
Zoo | 15.00±1.89 | 16.20±1.99 | 16.50±2.17 | 2>1
For multiple splits, accuracy decreases as the degree of the split increases from two to four; this difference is significant in six of the 20 data sets. This may be due to the fragmentation problem.
The number of nodes also increases as the degree of the split increases. Only in some small data sets is there a drop in node size when going from degree two to degree three.
The learning time of higher-degree splits is significantly greater than that of lower-degree splits.
As can be expected, accuracy, number of nodes, and learning time do not change in data sets where all attributes are discrete.
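One plausible way to choose a k-way split on a continuous attribute (the thesis's actual selection procedure is an assumption here) is an exhaustive search over combinations of candidate thresholds. The number of combinations grows rapidly with the split degree, which is consistent with the steep growth of learning time in Table 6.1.3.3.

```python
from itertools import combinations
from math import log2

# ASSUMED selection procedure for a k-way split on one continuous
# attribute: try every combination of k-1 thresholds taken from the
# midpoints between consecutive sorted values and keep the combination
# with the lowest weighted entropy.

def entropy(labels):
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum(c / n * log2(c / n) for c in counts.values())

def best_k_way_split(values, labels, k):
    """Return (weighted entropy, thresholds) of the best k-way split."""
    pts = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    best = (float("inf"), None)
    for thresholds in combinations(candidates, k - 1):
        parts = [[] for _ in range(k)]
        for v, y in zip(values, labels):
            # count how many thresholds v exceeds -> interval index
            parts[sum(v > t for t in thresholds)].append(y)
        score = sum(len(p) / len(labels) * entropy(p) for p in parts if p)
        if score < best[0]:
            best = (score, thresholds)
    return best
```

For example, three well-separated value clusters are recovered exactly by one three-way split, whereas a binary tree would need two levels to express the same partition; the price is the combinatorial search over threshold tuples.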
TABLE 6.1.3.3 Learning time results for splits with degrees two, three, and four (in sec.)

Data set name | ID3 | ID3-2 | ID3-3 | Significance
Breast | 2±0 | 4±0 | 11±1 | 3>>>2>>>1
Bupa | 3±1 | 15±2 | 168±63 | 3>>>2>>>1
Cylinder | 10±2 | 70±10 | 1013±246 | 3>>2>>>1
Dermatology | 3±0 | 7±1 | 36±21 | 2>>1
Ecoli | 3±0 | 23±1 | 419±53 | 3>>>2>>>1
Flare | 2±0 | 2±0 | 2±0 |
Glass | 3±0 | 29±4 | 556±105 | 3>>>2>>>1
Hepatitis | 1±0 | 5±1 | 41±8 | 3>>>2>>>1
Horse | 4±0 | 14±2 | 158±80 | 3>2>>>1
Iris | 0±0 | 1±0 | 13±2 | 3>>>2>>1
Ironosphere | 39±7 | 604±80 | N/A | 2>>>1
Monks | 2±1 | 2±0 | 3±1 | 2>>1,3>>>1
Ocrdigits | 207±9 | 596±98 | 2384±352 | 3>>>2>>>1
Pendigits | 476±22 | 9272±2356 | N/A | 2>>>1
Segment | 345±10 | N/A | N/A |
Wine | 1±0 | 18±3 | 266±60 | 3>>>2>>>1
Zoo | 1±0 | 1±0 | 1±0 |