6.4. Results for LDA
For the rest of these results, the method definitions given in Table 6.4.1 apply.
TABLE 6.4.1 Definition of the LDA-based methods

Name       | Class Separation | Pruning     | PCA         | PCA percentage
ID-LDA     | Exchange         | Pre-pruning | Always      | 90%
ID-LDA-R   | Exchange         | Pre-pruning | If required | 90%
ID-LDA-R99 | Exchange         | Pre-pruning | If required | 99%
In Chapter 5 we saw that PCA must be used to solve the singular covariance matrix problem. However, there are also data sets where PCA is not needed in some nodes, because the covariance matrix is invertible in those nodes. Hence, we ran those data sets twice: once applying PCA at every node, and once applying PCA only when it is required. In this section we compare the two results to find out whether PCA decreases performance because of the 10% loss in variance. The results are shown in Table 6.4.1.1 and Figure 6.4.1.1 for accuracy, in Table 6.4.1.2 and Figure 6.4.1.2 for tree size, and in Table 6.4.1.3 and Figure 6.4.1.3 for learning time. Data sets marked with an asterisk are those on which PCA is never required.
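To make the distinction concrete, the following is a minimal sketch of this node-level choice, assuming a Python/NumPy setting; the function and parameter names are illustrative and this is not the thesis's actual code. ID-LDA always projects, while ID-LDA-R projects only when the covariance matrix at the node is singular; the 0.90 threshold corresponds to the PCA percentage column of Table 6.4.1.

    import numpy as np
    from sklearn.decomposition import PCA

    def reduce_if_required(X, variance=0.90, always_pca=False):
        # Hypothetical node-level preprocessing before the LDA split.
        # ID-LDA: always_pca=True.  ID-LDA-R: always_pca=False.
        cov = np.cov(X, rowvar=False)
        singular = np.linalg.matrix_rank(cov) < cov.shape[0]
        if not (always_pca or singular):
            # Covariance matrix is invertible: LDA can be applied
            # directly, with no loss of variance.
            return X
        # Otherwise keep the smallest number of principal components
        # that explains the requested proportion of the total variance.
        return PCA(n_components=variance).fit_transform(X)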
If we look at the accuracy results, we see that PCA causes a decrease in performance. In three of the five data sets where PCA is never required (marked with an asterisk), accuracy drops significantly when PCA is always applied. On the remaining data sets, including those where PCA is actually required, accuracy does not change significantly.
TABLE 6.4.1.1 Accuracy results for ID-LDA and ID-LDA-R

Data set name | ID-LDA     | ID-LDA-R   | Significance
Breast*       | 96.65±0.66 | 95.85±0.72 |
Bupa*         | 57.28±3.23 | 67.42±2.97 | 2>>>1
Ecoli         | 83.10±2.50 | 83.69±3.58 |
Glass         | 57.85±3.67 | 55.51±4.43 |
Iris*         | 82.67±5.52 | 97.20±1.47 | 2>>>1
Monks*        | 66.34±1.93 | 74.31±2.26 | 2>>1
Wine*         | 94.04±3.18 | 96.07±2.66 |
Zoo           | 80.79±6.97 | 82.56±5.62 |
TABLE 6.4.1.2 Node results for ID-LDA and ID-LDA-R

Data set name | ID-LDA     | ID-LDA-R   | Significance
Breast*       | 8.00±1.05  | 7.20±0.63  |
Bupa*         | 6.40±4.90  | 8.20±1.93  |
Ecoli         | 20.20±4.34 | 17.60±4.81 |
Glass         | 20.60±5.64 | 25.60±3.78 |
Iris*         | 8.00±2.16  | 5.40±0.84  | 1>>>2
Monks*        | 3.00±0.00  | 7.20±2.39  | 2>1
Wine*         | 7.40±1.84  | 5.40±0.84  |
Zoo           | 12.60±2.07 | 12.60±2.07 |
For the node results, ID-LDA-R is significantly better than ID-LDA on one data set, whereas ID-LDA is significantly better than ID-LDA-R on one data set. ID-LDA has the smaller tree on the Monks data set only because it cannot find another split after the first one. These results also affect learning time: on the Iris data set, where the tree is significantly smaller with ID-LDA-R, the learning time is also significantly lower.
When PCA is applied, the number of retained dimensions usually decreases from the root node towards the leaves. For example, on the Ecoli data set 14 eigenvectors are needed to represent the data at the root node, whereas only 5 eigenvectors are needed at a leaf node.
TABLE 6.4.1.3 Learning time results for ID-LDA and ID-LDA-R

Data set name | ID-LDA | ID-LDA-R | Significance
Breast*       | 3±1    | 2±0      |
Bupa*         | 1±1    | 1±0      |
Ecoli         | 6±2    | 6±2      |
Glass         | 5±1    | 7±1      |
Iris*         | 1±1    | 0±0      | 1>>2
Monks*        | 1±1    | 1±0      |
Wine*         | 1±0    | 1±0      |
Zoo           | 2±0    | 2±0      |



As Section 6.4.1 shows, LDA performance decreases when PCA is applied because of the 10% loss in variance (a retained-variance proportion of 0.90). We have therefore also run experiments with another percentage level, 99% (0.99), and compared the two. The results are shown in Table 6.4.2.1 and Figure 6.4.2.1 for accuracy, in Table 6.4.2.2 and Figure 6.4.2.2 for tree size, and in Table 6.4.2.3, Figure 6.4.2.3 and Figure 6.4.2.4 for learning time.
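As a toy illustration of what changing the percentage level does (the eigenvalues below are hypothetical, not taken from any of the data sets), the 0.99 threshold keeps more eigenvectors, and therefore loses less variance, than the 0.90 threshold:

    import numpy as np

    def n_components(eigvals, threshold):
        # Smallest number of leading eigenvectors whose cumulative
        # eigenvalues reach the requested proportion of the total variance.
        eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
        explained = np.cumsum(eigvals) / eigvals.sum()
        return int(np.searchsorted(explained, threshold) + 1)

    spectrum = [6.0, 3.0, 1.5, 0.8, 0.4, 0.2, 0.07, 0.03]  # hypothetical eigenvalues

    print(n_components(spectrum, 0.90))  # 4 dimensions kept at the 90% level
    print(n_components(spectrum, 0.99))  # 6 dimensions kept at the 99% level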
TABLE 6.4.2.1 Accuracy results for ID-LDA-R and ID-LDA-R99

Data set name | ID-LDA-R   | ID-LDA-R99 | Significance
Car           | 70.02±1.75 | 92.09±1.07 | 2>>>1
Cylinder      | 67.39±2.39 | 69.80±3.01 |
Dermatology   | 94.75±1.91 | 96.17±1.59 |
Ecoli         | 83.69±3.58 | 83.75±2.53 |
Flare         | 88.05±2.39 | 88.17±2.83 |
Glass         | 55.51±4.43 | 57.29±4.16 |
Hepatitis     | 83.61±2.12 | 82.06±5.60 |
Horse         | 72.39±2.62 | 81.09±1.64 | 2>>1
Ionosphere    | 86.38±2.68 | 91.11±2.22 |
Mushroom      | 94.15±0.83 | 98.25±0.57 | 2>>>1
Ocrdigits     | 89.19±0.92 | 94.59±0.49 | 2>>>1
Pendigits     | 91.99±0.94 | 95.52±0.44 | 2>>>1
Segment       | 82.19±2.35 | 90.31±1.20 | 2>>>1
Vote          | 90.85±2.35 | 94.85±2.17 | 2>1
Zoo           | 82.56±5.62 | 81.41±7.25 |
TABLE 6.4.2.2 Node results for ID-LDA-R and ID-LDA-R99

Data set name | ID-LDA-R   | ID-LDA-R99  | Significance
Car           | 1.00±0.00  | 12.00±2.54  | 2>>>1
Cylinder      | 10.80±3.33 | 16.40±8.95  |
Dermatology   | 17.00±2.11 | 12.80±1.48  |
Ecoli         | 17.60±4.81 | 20.00±2.71  |
Flare         | 5.20±3.71  | 5.60±3.13   |
Glass         | 25.60±3.78 | 26.20±4.44  |
Hepatitis     | 4.60±3.10  | 8.60±3.10   |
Horse         | 10.20±3.29 | 16.80±4.05  |
Ionosphere    | 5.40±1.84  | 11.60±2.67  |
Mushroom      | 17.40±3.63 | 19.20±4.85  |
Ocrdigits     | 87.40±8.37 | 59.40±2.07  | 1>>2
Pendigits     | 80.80±3.71 | 89.00±6.25  |
Segment       | 40.40±5.66 | 39.80±11.08 |
Vote          | 9.20±3.05  | 9.80±2.53   |
Zoo           | 12.60±2.07 | 11.80±1.93  |
When we look at the accuracy results, we see a dramatic increase in accuracy when going from ID-LDA-R to ID-LDA-R99. On seven data sets out of 20 there is a significant increase in accuracy, which is especially noticeable on the large data sets.
On the Car data set, for which ID-LDA-R cannot find any split, ID-LDA-R99 reaches an accuracy of 92 percent; accordingly, its tree is significantly larger than that of ID-LDA-R. On Ocrdigits the effect is the opposite: going from ID-LDA-R to ID-LDA-R99, accuracy increases while the node count decreases significantly. These results also affect learning time; ID-LDA-R has a significantly lower learning time on Car because it makes no split.
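As a rough illustration of how a node can end up with no split at all, the sketch below shows a pre-pruning style check, assuming an impurity-gain criterion with a hypothetical min_gain threshold; the thesis's actual pre-pruning test may differ. A node for which no candidate split passes such a check is kept as a leaf, which is what happens to ID-LDA-R on Car.

    import numpy as np

    def entropy(y):
        # Class entropy of the labels reaching a node (y must be non-empty).
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    def accept_split(y, y_left, y_right, min_gain=1e-3):
        # Pre-pruning: accept the candidate split only if it reduces the
        # impurity by at least min_gain; otherwise the node becomes a leaf.
        n = len(y)
        child = (len(y_left) / n) * entropy(y_left) + (len(y_right) / n) * entropy(y_right)
        return entropy(y) - child >= min_gain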
TABLE 6.4.2.3 Learning time results for ID-LDA-R and ID-LDA-R99

Data set name | ID-LDA-R | ID-LDA-R99 | Significance
Car           | 6±2      | 28±6       | 2>>1
Cylinder      | 34±14    | 74±57      |
Dermatology   | 18±2     | 16±1       |
Ecoli         | 6±2      | 6±1        |
Flare         | 3±2      | 4±3        |
Glass         | 7±1      | 7±1        |
Hepatitis     | 1±1      | 2±1        |
Horse         | 38±16    | 94±40      |
Ionosphere    | 3±1      | 11±4       |
Mushroom      | 1846±776 | 2533±1533  |
Ocrdigits     | 3189±270 | 2288±108   | 1>>2
Pendigits     | 996±86   | 1211±91    | 2>1
Segment       | 154±15   | 148±40     |
Vote          | 8±4      | 9±3        |
Zoo           | 2±0      | 2±0        |




In this section we compare three linear decision tree construction methods: CART (Classification and Regression Trees), ID-LP (multivariate decision tree with a perceptron) and ID-LDA (multivariate decision tree with linear discriminant analysis; the ID-LDA-R99 variant is used here). The exchange method is used for class separation in ID-LP and ID-LDA for simplicity. The results are shown in Table 6.4.3.1, Table 6.4.3.2 and Figure 6.4.3.1 for accuracy, in Table 6.4.3.3, Table 6.4.3.4 and Figure 6.4.3.2 for tree size, and in Table 6.4.3.5, Table 6.4.3.6, Figure 6.4.3.3, Figure 6.4.3.4 and Figure 6.4.3.5 for learning time.
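To make the comparison concrete, the following is a minimal sketch of how an oblique (multivariate) split could be obtained at a node with Fisher's linear discriminant, once the exchange method has grouped the classes into two sets; the NumPy code and names are illustrative, not the thesis's exact ID-LDA procedure (which also applies PCA when the scatter matrix is singular, as in Section 6.4.1).

    import numpy as np

    def lda_split(X_left, X_right):
        # Fisher's linear discriminant between the two class groups:
        # returns (w, w0) defining the oblique split w.x + w0 > 0.
        m1, m2 = X_left.mean(axis=0), X_right.mean(axis=0)
        # Pooled within-group scatter matrix.
        Sw = np.cov(X_left, rowvar=False) * (len(X_left) - 1) \
           + np.cov(X_right, rowvar=False) * (len(X_right) - 1)
        w = np.linalg.solve(Sw, m1 - m2)   # discriminant direction
        w0 = -0.5 * w @ (m1 + m2)          # threshold midway between the projected means
        return w, w0

    # Instances with w.x + w0 > 0 are sent to one child, the rest to the other.

Since this direction is obtained by a single linear solve rather than an iterative search, it helps explain why the ID-LDA learning times in Table 6.4.3.5 are much lower than those of the other two methods.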
TABLE 6.4.3.1 Accuracy results for linear decision tree methods

Data set name | CART       | ID-LP       | ID-LDA     | Significance
Breast        | 94.85±1.44 | 96.60±0.61  | 95.85±0.72 |
Bupa          | 61.74±3.38 | 63.53±2.76  | 67.42±2.97 |
Car           | 83.84±2.03 | 89.48±4.01  | 92.09±1.07 | 3>>>1
Cylinder      | 59.52±4.05 | 70.21±4.48  | 69.80±3.01 | 2>>1, 3>>>1
Dermatology   | 80.87±4.56 | 85.74±7.06  | 96.17±1.59 | 3>>1
Ecoli         | 74.74±3.80 | 82.62±4.06  | 83.75±2.53 | 2>>1, 3>1
Flare         | 81.55±3.60 | 88.36±2.37  | 88.17±2.83 |
Glass         | 53.93±4.20 | 54.95±7.83  | 57.29±4.16 |
Hepatitis     | 78.96±4.04 | 84.13±2.86  | 82.06±5.60 | 2>>1
Horse         | 76.96±3.02 | 82.07±3.48  | 81.09±1.64 | 2>>1
Iris          | 89.33±4.44 | 77.60±15.70 | 97.20±1.47 | 3>>1>>2
Ionosphere    | 86.84±4.03 | 87.80±2.18  | 91.11±2.22 |
Monks         | 91.20±6.89 | 66.34±1.87  | 74.31±2.26 | 1>>3>>>2
Mushroom      | 93.45±1.75 | 99.95±0.03  | 98.25±0.57 | 2>>3>>1
Ocrdigits     | 81.35±2.08 | 93.87±0.92  | 94.59±0.49 | 3>>>1, 2>>>1
Pendigits     | 87.10±2.91 | 91.94±4.16  | 95.52±0.44 | 3>>1
Segment       | 88.07±1.69 | 79.76±11.58 | 90.31±1.20 |
Vote          | 90.30±3.17 | 94.71±1.05  | 94.85±2.17 | 2>1, 3>>1
Wine          | 87.30±4.40 | 87.75±12.62 | 96.07±2.66 | 3>1
Zoo           | 69.92±9.69 | 79.38±8.10  | 81.41±7.25 |
TABLE 6.4.3.2 Accuracy comparisons (number of data sets on which the row method is significantly more accurate than the column method)

Method | CART | ID-LP | ID-LDA
CART   |      | 2     | 1
ID-LP  | 7    |       | 1
ID-LDA | 10   | 2     |
TABLE 6.4.3.3 Node results for linear decision tree methods

Data set name | CART        | ID-LP      | ID-LDA      | Significance
Breast        | 11.60±2.67  | 3.00±0.00  | 7.20±0.63   | 3>>>2, 1>>2
Bupa          | 43.20±3.82  | 4.60±1.84  | 8.20±1.93   | 1>>>3>>2
Car           | 29.00±3.40  | 7.40±0.84  | 12.00±2.54  | 1>>>2, 1>>>3
Cylinder      | 45.00±4.90  | 8.40±1.90  | 16.40±8.95  | 1>>3, 1>>>2
Dermatology   | 28.00±4.74  | 8.80±1.48  | 12.80±1.48  | 1>>3>2
Ecoli         | 34.00±5.01  | 10.80±2.90 | 20.00±2.71  | 1>3>>>2
Flare         | 33.80±6.20  | 3.20±2.20  | 5.60±3.13   | 1>>>3, 1>>>2
Glass         | 42.40±4.12  | 10.20±4.64 | 26.20±4.44  | 1>>>3>>>2
Hepatitis     | 14.00±3.43  | 3.00±0.00  | 8.60±3.10   | 1>>>2
Horse         | 28.00±5.19  | 5.00±1.63  | 16.80±4.05  | 1>>>3>>>2
Iris          | 10.20±2.35  | 4.00±1.05  | 5.40±0.84   | 1>>3, 1>>2
Ionosphere    | 16.40±3.78  | 3.80±1.03  | 11.60±2.67  | 1>>2, 3>>>2
Monks         | 17.80±10.16 | 3.00±0.00  | 7.20±2.39   | 1>>>2
Mushroom      | 43.00±6.53  | 3.00±0.00  | 19.20±4.85  | 1>>3>>2
Ocrdigits     | 70.80±3.98  | 34.80±4.94 | 59.40±2.07  | 1>3>>>2
Pendigits     | 77.80±10.08 | 30.40±6.40 | 89.00±6.25  | 3>>>2, 1>>>2
Segment       | 45.20±8.97  | 16.60±6.65 | 39.80±11.08 | 1>>2
Vote          | 17.20±5.29  | 4.20±1.93  | 9.80±2.53   | 1>>>2
Wine          | 9.40±2.27   | 4.40±0.97  | 5.40±0.84   | 1>2
Zoo           | 25.20±4.94  | 8.80±1.75  | 11.80±1.93  | 1>>>3, 1>>>2
TABLE 6.4.3.4 Node comparisons (number of data sets on which the row method produces significantly smaller trees than the column method)

Method | CART | ID-LP | ID-LDA
CART   |      | 0     | 0
ID-LP  | 20   |       | 10
ID-LDA | 12   | 0     |
TABLE 6.4.3.5 Learning time results for linear decision tree methods

Data set name | CART       | ID-LP      | ID-LDA    | Significance
Breast        | 107±17     | 5±0        | 2±0       | 1>>>2>>>3
Bupa          | 252±23     | 3±1        | 1±0       | 1>>>2>>3
Car           | 1178±148   | 152±16     | 28±6      | 1>>>2>>>3
Cylinder      | 4589±343   | 19±2       | 74±57     | 1>>>2, 1>>>3
Dermatology   | 858±170    | 42±9       | 16±1      | 1>>>2>>3
Ecoli         | 221±25     | 57±15      | 6±1       | 1>>>2>>>3
Flare         | 1032±203   | 9±4        | 4±3       | 1>>>2, 1>>>3
Glass         | 320±25     | 33±9       | 7±1       | 1>>>2>>>3
Hepatitis     | 209±47     | 1±0        | 2±1       | 1>>>2, 1>>>3
Horse         | 3481±1101  | 14±2       | 94±40     | 1>>3>>2
Iris          | 31±11      | 3±0        | 0±0       | 1>>2>>>3
Ionosphere    | 544±94     | 4±1        | 11±4      | 1>>>3>2
Monks         | 126±61     | 3±0        | 1±0       | 1>>2>>>3
Mushroom      | 33613±2942 | 628±204    | 2533±1533 | 1>>>2, 1>>>3
Ocrdigits     | 9148±713   | 8035±757   | 2288±108  | 2>>>3, 1>>>3
Pendigits     | 3311±350   | 18340±3319 | 1211±91   | 2>>>1>>>3
Segment       | 1212±170   | 937±103    | 148±40    | 1>2>>>3
Vote          | 805±167    | 6±1        | 9±3       | 1>>>2, 1>>>3
Wine          | 84±26      | 4±1        | 1±0       | 1>>>2>>3
Zoo           | 453±61     | 10±2       | 2±0       | 1>>>2>>>3
TABLE 6.4.3.6 Learning time comparisons (number of data sets on which the row method has a significantly lower learning time than the column method)

Method | CART | ID-LP | ID-LDA
CART   |      | 1     | 0
ID-LP  | 18   |       | 2
ID-LDA | 20   | 13    |
If we compare the three linear methods in terms of accuracy, node size and learning time, we see that:
- Accuracy: ID-LP = ID-LDA > CART.
- Node size: CART > ID-LDA > ID-LP.
- Learning time: CART > ID-LP > ID-LDA.
In terms of accuracy, CART outperforms ID-LP on those data sets where ID-LP does not always converge. On the Monks data set, CART outperforms both ID-LP and ID-LDA quite significantly.
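The pairwise counts in Tables 6.4.3.2, 6.4.3.4 and 6.4.3.6 are the numbers of data sets with a significant difference. Purely as an illustration of how such a pairwise check can be made from per-fold results, the sketch below uses a paired t-test over cross-validation folds; the significance test actually used in the thesis, and its fold design, may differ.

    from scipy import stats

    def compare(scores_a, scores_b, alpha=0.05):
        # Paired t-test on the per-fold scores of two methods on one data set.
        # Returns +1 if method A is significantly better, -1 if B is, 0 otherwise.
        t, p = stats.ttest_rel(scores_a, scores_b)
        if p >= alpha:
            return 0
        return 1 if t > 0 else -1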