For the rest of these results, the definitions given in Table 6.3.1 apply.
TABLE 6.3.1 Definition of neural-network-based methods

Name | Class Separation | Impurity Measure | Pruning | Linearity
ID-LPS | Selection | Information Gain | Pre-pruning | Linear
ID-LPE | Exchange | Information Gain | Pre-pruning | Linear
ID-MLPE | Exchange | Information Gain | Pre-pruning | Nonlinear
ID-Hybrid-F | Exchange | Information Gain | Pre-pruning | Both with F-test
ID-Hybrid-t | Exchange | Information Gain | Pre-pruning | Both with t-test

6.3.1. Comparison of Class Separation Techniques
The aim of this section is to determine which class separation technique (selection or exchange) performs better. For simplicity, the other variables, such as the impurity measure and the pruning technique, are fixed. Data sets with only two classes are excluded from these results, since no class separation is needed for them. Accuracy results are shown in Table 6.3.1.1 and Figure 6.3.1.1. Node results are shown in Table 6.3.1.2 and Figure 6.3.1.2. Learning time results are shown in Table 6.3.1.3, Figure 6.3.1.4 and Figure 6.3.1.5.
In none of the data sets is the selection method significantly more accurate than the exchange method, whereas the exchange method is significantly more accurate than the selection method in three data sets. Two of these, Ocrdigits and Pendigits, have 10 classes; the third, Ecoli, has eight. We can therefore conclude that the advantage of the exchange method over the selection method grows with the number of classes, owing to the larger number of division candidates it considers.
Comparing node sizes, the exchange method is significantly better than the selection method in two of the 11 data sets, Pendigits and Glass (which has eight classes), while the selection method is never significantly better.
As we have explained, the exchange method has a higher time complexity. Accordingly, in all data sets except one, the selection method has a significantly shorter learning time than the exchange method, and the significance of this difference increases with the size of the data set and the number of classes.
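The difference between the two heuristics can be illustrated with a small sketch. The entropy-based impurity and the label-only data representation below are simplified placeholders, not the exact implementation evaluated here: selection tries each class alone against the rest (c candidates), while exchange starts from the best selection split and greedily moves single classes across the partition, which is the source of both its larger candidate space and its higher time complexity.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a label multiset (information-gain impurity)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_impurity(labels, left_classes):
    """Weighted entropy of partitioning the instances by class membership."""
    left = [y for y in labels if y in left_classes]
    right = [y for y in labels if y not in left_classes]
    n = len(labels)
    return (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)

def selection(labels):
    """Selection: try each class alone against the rest (c candidates)."""
    classes = set(labels)
    return min(({c} for c in classes), key=lambda L: split_impurity(labels, L))

def exchange(labels):
    """Exchange: start from the best selection split, then greedily toggle
    one class across the partition while impurity keeps decreasing."""
    classes = set(labels)
    left = set(selection(labels))
    best = split_impurity(labels, left)
    improved = True
    while improved:
        improved = False
        for c in classes:
            cand = left ^ {c}  # move class c to the other side
            if cand and cand != classes:
                imp = split_impurity(labels, cand)
                if imp < best:
                    left, best, improved = cand, imp, True
                    break
    return left
```

On a toy four-class problem where classes 0 and 2 (and 1 and 3) pair up naturally, exchange can find a two-vs-two grouping that no one-vs-rest selection candidate can reach.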
TABLE 6.3.1.1 Accuracy results for ID-LPS and ID-LPE (mean ± standard deviation, %)

Data set name | ID-LPS | ID-LPE | Significance
Car | 87.50±3.07 | 89.48±4.01 |
Dermatology | 69.51±22.01 | 85.74±7.06 |
Ecoli | 68.51±5.39 | 82.62±4.06 | 2>1
Flare | 88.17±2.21 | 88.36±2.37 |
Glass | 55.53±6.16 | 54.95±7.83 |
Iris | 81.73±14.40 | 77.60±15.70 |
Ocrdigits | 54.14±6.25 | 93.87±0.92 | 2>>>1
Pendigits | 67.46±5.44 | 91.94±4.16 | 2>>>1
Segment | 70.55±6.68 | 79.76±11.58 |
Wine | 85.06±14.00 | 87.75±12.62 |
Zoo | 78.01±7.67 | 79.38±8.10 |
TABLE 6.3.1.2 Node results for ID-LPS and ID-LPE (mean ± standard deviation)

Data set name | ID-LPS | ID-LPE | Significance
Car | 11.40±6.10 | 7.40±0.84 |
Dermatology | 7.40±3.10 | 8.80±1.48 |
Ecoli | 15.00±5.33 | 10.80±2.90 |
Flare | 2.80±1.99 | 3.20±2.20 |
Glass | 20.80±3.46 | 10.20±4.64 | 1>>2
Iris | 5.60±3.13 | 4.00±1.05 |
Ocrdigits | 45.20±4.76 | 34.80±4.94 |
Pendigits | 58.40±9.52 | 30.40±6.40 | 1>>>2
Segment | 28.60±6.31 | 16.60±6.65 |
Wine | 4.20±1.03 | 4.40±0.97 |
Zoo | 11.40±2.07 | 8.80±1.75 |
TABLE 6.3.1.3 Learning time results for ID-LPS and ID-LPE (mean ± standard deviation)

Data set name | ID-LPS | ID-LPE | Significance
Car | 79±17 | 152±16 | 2>>>1
Dermatology | 22±4 | 42±9 | 2>>1
Ecoli | 22±4 | 57±15 | 2>>1
Flare | 5±2 | 9±4 | 2>>1
Glass | 13±1 | 33±9 | 2>>>1
Iris | 2±0 | 3±0 | 2>1
Ocrdigits | 2764±384 | 8035±757 | 2>>>1
Pendigits | 4164±246 | 18340±3319 | 2>>>1
Segment | 407±63 | 937±103 | 2>>>1
Wine | 2±0 | 4±1 |
Zoo | 5±1 | 10±2 | 2>>1
6.3.2. Comparison of Hybrid Tests in Decision Nodes for Neural Networks
The aim of this section is to determine which statistical test (F-test or t-test) is better for comparing the candidate networks in hybrid trees. In large data sets such as Mushroom, Ocrdigits, Pendigits and Segment, training is done for 10 epochs instead of 50, because of the large amount of computation required to train the networks with the t-test. For example, training on the Ocrdigits data set with the t-test takes approximately four days, and 160 runs of that kind are needed.
TABLE 6.3.2.1 Accuracy results for hybrid network models (mean ± standard deviation, %)

Data set name | ID-Hybrid-F | ID-Hybrid-t | Significance
Breast | 96.62±0.55 | 96.62±0.63 |
Bupa | 63.42±2.57 | 63.71±3.24 |
Car | 94.51±1.15 | 92.19±1.37 | 1>2
Cylinder | 71.31±1.74 | 71.24±1.89 |
Dermatology | 94.54±4.67 | 85.74±11.97 |
Ecoli | 83.10±4.19 | 81.43±3.75 |
Flare | 88.11±2.43 | 87.98±2.28 |
Glass | 55.05±9.72 | 60.37±6.60 |
Hepatitis | 83.74±3.41 | 83.48±3.38 |
Horse | 82.66±2.58 | 82.01±3.28 |
Iris | 92.67±3.28 | 92.80±3.34 |
Ionosphere | 87.80±2.15 | 87.35±1.79 |
Monks | 66.39±1.85 | 66.30±1.77 |
Mushroom | 99.96±0.03 | 99.95±0.03 |
Ocrdigits | 92.79±2.20 | N/A |
Pendigits | 90.82±9.62 | N/A |
Segment | 81.77±12.97 | 85.13±6.33 |
Vote | 94.71±1.13 | 94.80±1.06 |
Wine | 96.07±2.07 | 95.96±2.32 |
Zoo | 86.93±5.39 | 86.74±4.15 |
TABLE 6.3.2.2 Node results for hybrid network models (mean ± standard deviation)

Data set name | ID-Hybrid-F | ID-Hybrid-t | Significance
Breast | 3.00±0.00 | 3.00±0.00 |
Bupa | 4.40±1.90 | 4.40±1.65 |
Car | 7.60±0.97 | 7.20±1.48 |
Cylinder | 8.80±1.75 | 9.00±2.11 |
Dermatology | 11.20±1.14 | 11.00±0.00 |
Ecoli | 10.60±2.27 | 10.80±2.57 |
Flare | 3.00±1.33 | 2.40±0.97 |
Glass | 11.00±5.50 | 11.80±2.70 |
Hepatitis | 3.00±0.00 | 3.00±0.00 |
Horse | 5.60±2.84 | 5.20±1.75 |
Iris | 5.00±0.00 | 5.00±0.00 |
Ionosphere | 4.00±1.05 | 3.80±1.03 |
Monks | 3.00±0.00 | 3.00±0.00 |
Mushroom | 3.00±0.00 | 3.00±0.00 |
Ocrdigits | 25.40±3.75 | N/A |
Pendigits | 23.40±5.80 | N/A |
Segment | 14.40±2.84 | 14.60±3.10 |
Vote | 4.20±1.93 | 4.40±1.90 |
Wine | 5.00±0.00 | 5.20±0.63 |
Zoo | 12.40±1.90 | 12.60±1.26 |
The results are shown in Table 6.3.2.1 and Figure 6.3.2.1. Node results are shown in Table 6.3.2.2 and Figure 6.3.2.2. Learning time results are shown in Table 6.3.2.3, Figure 6.3.2.4 and Figure 6.3.2.5.
There is no significant difference between the two statistical tests in terms of accuracy or node size; the only exception is the Car data set, where ID-Hybrid-F is significantly more accurate.
The difference lies in learning time: in all data sets, the t-test is slower than the F-test at above the 99% level. This is because the t-test runs 30-fold cross-validation on the whole training set, whereas the F-test runs only 10-fold cross-validation on half of the training set.
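The computational gap between the two tests follows directly from their statistics. The description above (10 folds, each trained on half of the training set) is consistent with 5x2-fold cross-validation, so the F-test plausibly corresponds to the combined 5x2 cv F statistic, while the t-test is the usual K-fold cross-validated paired t statistic over 30 folds; under that assumption, a sketch of both:

```python
import math

def paired_t_statistic(d):
    """K-fold cross-validated paired t statistic on per-fold error
    differences d; compared against t with K - 1 degrees of freedom
    (K = 30 in the setup described above)."""
    k = len(d)
    mean = sum(x for x in d) / k
    var = sum((x - mean) ** 2 for x in d) / (k - 1)
    return mean * math.sqrt(k) / math.sqrt(var)

def cv_5x2_f_statistic(d):
    """Combined 5x2 cv F statistic on five (d_i1, d_i2) pairs of error
    differences; compared against the F(10, 5) distribution."""
    num = sum(dij ** 2 for rep in d for dij in rep)
    den = 0.0
    for d1, d2 in d:
        m = (d1 + d2) / 2.0
        den += (d1 - m) ** 2 + (d2 - m) ** 2
    return num / (2.0 * den)
```

The t-test needs 30 trainings on nearly the full set, the F-test only 10 trainings on half-size sets, which accounts for the uniform 2>>>1 pattern in Table 6.3.2.3.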
TABLE 6.3.2.3 Learning time results for hybrid network models (mean ± standard deviation)

Data set name | ID-Hybrid-F | ID-Hybrid-t | Significance
Breast | 30±1 | 175±4 | 2>>>1
Bupa | 15±3 | 91±17 | 2>>>1
Car | 911±151 | 6422±656 | 2>>>1
Cylinder | 366±45 | 2455±346 | 2>>>1
Dermatology | 399±25 | 3428±899 | 2>>>1
Ecoli | 219±38 | 1697±338 | 2>>>1
Flare | 67±26 | 420±134 | 2>>1
Glass | 137±28 | 1275±281 | 2>>>1
Hepatitis | 10±0 | 66±1 | 2>>>1
Horse | 327±89 | 2089±392 | 2>>>1
Iris | 14±1 | 95±11 | 2>>>1
Ionosphere | 53±9 | 300±59 | 2>>>1
Monks | 16±0 | 97±1 | 2>>>1
Mushroom* | 5529±414 | 8546±934 | 2>>>1
Ocrdigits* | 14791±3204 | N/A | 2>>>1
Pendigits* | 9942±1566 | N/A | 2>>>1
Segment* | 4756±453 | 6217±529 | 2>>1
Vote | 53±11 | 376±84 | 2>>>1
Wine | 19±2 | 125±26 | 2>>>1
Zoo | 66±10 | 422±80 | 2>>>1
* Trained with 10 epochs (see text).
6.3.3. Comparison of Neural Network Models in Decision Nodes
We must also determine which type of neural network to train in a decision node, that is, which type of network performs best. To accomplish this, we compare three different types of networks: the linear perceptron, the multilayer perceptron (a nonlinear method) and a hybrid of the two (with the F-test). These three networks are compared in terms of accuracy, node size and learning time. Accuracy results are shown in Table 6.3.3.1 and Figure 6.3.3.1. Node results are shown in Table 6.3.3.2 and Figure 6.3.3.2. Learning time results are shown in Table 6.3.3.3 and Figures 6.3.3.3, 6.3.3.4 and 6.3.3.5.
The results of the linear network model can be divided into two groups: data sets with two classes and data sets with more than two classes. If a data set has two classes that are not linearly separable, the accuracy of the linear model can be very low; if the classes are linearly separable, the results can be very good, as in the Breast data set. On such data sets, the nonlinear model scores higher. More generally, in three of the 20 data sets the nonlinear model significantly outperforms the linear model, and in two data sets the hybrid model significantly outperforms the linear model. In four data sets, Ocrdigits, Dermatology, Zoo and Segment, the nonlinear model obtains good results but does not always converge, so its results on these data sets have a larger variance.
Looking at the node results, a data set with c classes requires a tree of at least 2c-1 nodes, so that each class can end up in its own leaf, and the nonlinear model converges toward this optimum in the number of nodes. There is an ordering in node size among the models: Linear > Hybrid > Nonlinear. In some data sets the hybrid model performs worse than the other two in terms of node size, which mainly depends on the variance of the results. The nonlinear model significantly outperforms the hybrid model in four data sets and the linear model in two data sets, while the hybrid and linear models each outperform the other in only one data set.
In terms of the time consumed for learning, the linear model performs best, as expected. Comparing times, we see the ordering Hybrid > Nonlinear > Linear. Occasionally, however, the linear model has a larger training time than the nonlinear model; this is due to the larger number of nodes in the tree built with the linear model and the large number of instances in that data set.
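The per-node behaviour of the hybrid model described above can be summarized as: train both candidate networks at a node, compare their validation errors with the chosen statistical test, and keep the nonlinear network only when it is significantly more accurate, otherwise fall back to the cheaper linear one. A minimal sketch, with the statistical test abstracted into a callback and the 2c-1 lower bound on tree size included (both function names are illustrative, not taken from the implementation evaluated here):

```python
def min_nodes(n_classes):
    """Lower bound from the text: a binary tree that isolates each of
    c classes in its own leaf needs c leaves, hence 2*c - 1 nodes."""
    return 2 * n_classes - 1

def choose_node_model(linear_error, mlp_error, is_significantly_lower):
    """Hybrid rule at one decision node: prefer the simpler linear
    perceptron unless the test says the MLP error is significantly
    lower. `is_significantly_lower(a, b)` stands in for the F- or
    t-test comparison of cross-validated errors."""
    return "mlp" if is_significantly_lower(mlp_error, linear_error) else "linear"
```

With this rule the hybrid tree pays the cost of training both networks at every node, which is why it is the slowest of the three models in Table 6.3.3.3.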
TABLE 6.3.3.1 Accuracy results for different network models (mean ± standard deviation, %)

Data set name | ID-LPE | ID-MLPE | ID-Hybrid-F | Significance
Breast | 96.60±0.61 | 96.77±0.91 | 96.62±0.55 |
Bupa | 63.53±2.76 | 63.24±4.31 | 63.42±2.57 |
Car | 89.48±4.01 | 96.86±2.30 | 94.51±1.15 |
Cylinder | 70.21±4.48 | 70.35±9.56 | 71.31±1.74 |
Dermatology | 85.74±7.06 | 87.81±13.59 | 94.54±4.67 |
Ecoli | 82.62±4.06 | 80.12±5.12 | 83.10±4.19 |
Flare | 88.36±2.37 | 87.67±2.56 | 88.11±2.43 |
Glass | 54.95±7.83 | 58.04±13.30 | 55.05±9.72 |
Hepatitis | 84.13±2.86 | 83.74±2.43 | 83.74±3.41 |
Horse | 82.07±3.48 | 84.67±2.64 | 82.66±2.58 |
Iris | 77.60±15.70 | 92.67±3.57 | 92.67±3.28 | 2>1,3>1
Ionosphere | 87.80±2.18 | 87.52±2.11 | 87.80±2.15 |
Monks | 66.34±1.87 | 66.99±2.17 | 66.39±1.85 |
Mushroom | 99.95±0.03 | 99.99±0.02 | 99.96±0.03 | 2>1
Ocrdigits | 93.87±0.92 | 83.90±10.22 | 92.79±2.20 |
Pendigits | 91.94±4.16 | 91.35±6.55 | 90.82±9.62 |
Segment | 79.76±11.58 | 80.35±12.36 | 81.77±12.97 |
Vote | 94.71±1.05 | 95.58±1.72 | 94.71±1.13 |
Wine | 87.75±12.62 | 95.96±2.13 | 96.07±2.07 | 2>1,3>1
Zoo | 79.38±8.10 | 85.33±11.86 | 86.93±5.39 |
TABLE 6.3.3.2 Node results for different network models (mean ± standard deviation)

Data set name | ID-LPE | ID-MLPE | ID-Hybrid-F | Significance
Breast | 3.00±0.00 | 3.00±0.00 | 3.00±0.00 |
Bupa | 4.60±1.84 | 3.80±1.03 | 4.40±1.90 |
Car | 7.40±0.84 | 6.60±1.26 | 7.60±0.97 |
Cylinder | 8.40±1.90 | 6.40±1.65 | 8.80±1.75 | 3>>2
Dermatology | 8.80±1.48 | 9.60±2.32 | 11.20±1.14 | 3>1
Ecoli | 10.80±2.90 | 8.80±2.20 | 10.60±2.27 |
Flare | 3.20±2.20 | 2.20±1.03 | 3.00±1.33 |
Glass | 10.20±4.64 | 7.20±3.71 | 11.00±5.50 |
Hepatitis | 3.00±0.00 | 3.00±0.00 | 3.00±0.00 |
Horse | 5.00±1.63 | 4.00±1.41 | 5.60±2.84 |
Iris | 4.00±1.05 | 5.00±0.00 | 5.00±0.00 | 2>>1,3>>1
Ionosphere | 3.80±1.03 | 3.80±1.03 | 4.00±1.05 |
Monks | 3.00±0.00 | 3.00±0.00 | 3.00±0.00 |
Mushroom | 3.00±0.00 | 3.00±0.00 | 3.00±0.00 |
Ocrdigits | 34.80±4.94 | 18.40±3.53 | 25.40±3.75 | 1>3>>2
Pendigits | 30.40±6.40 | 17.60±1.35 | 23.40±5.80 | 1>>>2,3>>2
Segment | 16.60±6.65 | 11.60±2.12 | 14.40±2.84 |
Vote | 4.20±1.93 | 3.00±0.00 | 4.20±1.93 |
Wine | 4.40±0.97 | 5.00±0.00 | 5.00±0.00 |
Zoo | 8.80±1.75 | 10.60±2.80 | 12.40±1.90 | 3>1
TABLE 6.3.3.3 Learning time results for different network models (mean ± standard deviation)

Data set name | ID-LPE | ID-MLPE | ID-Hybrid-F | Significance
Breast | 5±0 | 7±0 | 30±1 | 3>>>1,3>>>2
Bupa | 3±1 | 3±1 | 15±3 | 3>>>2>>>1
Car | 152±16 | 216±18 | 911±151 | 3>>>2>>>1
Cylinder | 19±2 | 102±15 | 366±45 | 3>>>2>>>1
Dermatology | 42±9 | 122±19 | 399±25 | 3>>>2>>>1
Ecoli | 57±15 | 50±13 | 219±38 | 3>>>1,3>>>2
Flare | 9±4 | 18±7 | 67±26 | 3>>2>>1
Glass | 33±9 | 27±8 | 137±28 | 3>>>1,3>>>2
Hepatitis | 1±0 | 3±0 | 10±0 | 3>>>1,3>>>2
Horse | 14±2 | 97±20 | 327±89 | 3>2>>>1
Iris | 3±0 | 3±0 | 14±1 | 3>>>1,3>>>2
Ionosphere | 4±1 | 14±2 | 53±9 | 3>>>2>>>1
Monks | 3±0 | 3±0 | 16±0 | 3>>>1,3>>>2
Mushroom | 628±204 | 1858±270 | 5529±414 | 3>>>2>>>1
Ocrdigits* | 8035±757 | 10993±1402 | 14791±3204 | 3>>>2>1
Pendigits* | 18340±3319 | 8473±1742 | 9942±1566 | 3>>>1>>2
Segment | 937±103 | 927±126 | 4756±453 | 3>>>1,3>>>2
Vote | 6±1 | 13±2 | 53±11 | 3>>2>>1
Wine | 4±1 | 4±1 | 19±2 | 3>>>1,3>>>2
Zoo | 10±2 | 18±4 | 66±10 | 3>>>2>>>1
* Trained with 10 epochs (see text).