Multivariate decision trees for machine learning



6.4. Results for LDA


For the rest of these results, the definitions given in Table 6.4.1 apply.

TABLE 6.4.1 Definition of the LDA-based methods

Name          Class Separation    Pruning        PCA            PCA percentage
ID-LDA        Exchange            Pre-pruning    Always         90%
ID-LDA-R      Exchange            Pre-pruning    If Required    90%
ID-LDA-R99    Exchange            Pre-pruning    If Required    99%

6.4.1. Effects of PCA on the Results


In Chapter 5 we saw that PCA must be used to solve the singular covariance matrix problem. However, there are also data sets where PCA is not needed in some nodes, because the covariance matrix is invertible in those nodes. Therefore, we measured performance on those data sets twice: once applying PCA always, and once applying PCA only when it is required. In this section we compare these two sets of results to find out whether PCA degrades performance because of the 10% loss in variance. The results are shown in Table 6.4.1.1 and Figure 6.4.1.1 for accuracy, in Table 6.4.1.2 and Figure 6.4.1.2 for tree sizes, and in Table 6.4.1.3 and Figure 6.4.1.3 for learning time. Data sets marked with an asterisk are those in which PCA is never required.
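
To make the difference between ID-LDA and ID-LDA-R concrete, the following sketch (in Python with NumPy) applies PCA at a node either always or only when the covariance matrix of the node's instances is not invertible, keeping 90% of the variance. The function and variable names are ours, and the rank-based singularity test and the use of the total covariance matrix are simplifications of the procedure described in Chapter 5, not the exact implementation.

    import numpy as np

    def pca_projection(X, variance_kept=0.90):
        # Eigenvectors of the covariance matrix, keeping just enough of them
        # to explain the requested fraction of the total variance.
        Xc = X - X.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
        eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]    # descending order
        explained = np.cumsum(eigvals) / eigvals.sum()
        k = int(np.searchsorted(explained, variance_kept)) + 1
        return eigvecs[:, :k]                                 # d x k projection

    def node_features(X, always_pca=True, variance_kept=0.90):
        # ID-LDA projects at every node; ID-LDA-R projects only when the
        # covariance matrix at the node is singular (rank-deficient).
        cov = np.cov(X - X.mean(axis=0), rowvar=False)
        if always_pca or np.linalg.matrix_rank(cov) < cov.shape[0]:
            W = pca_projection(X, variance_kept)
            return (X - X.mean(axis=0)) @ W
        return X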

Looking at the accuracy results, we see that always applying PCA causes a decrease in performance. In three of the five data sets where PCA is never required, accuracy drops significantly when PCA is applied. In the remaining data sets where PCA is applied, accuracy does not change significantly.



TABLE 6.4.1.1 Accuracy results for ID-LDA and ID-LDA-R

Data set name    ID-LDA         ID-LDA-R       Significance
Breast*          96.65±0.66     95.85±0.72
Bupa*            57.28±3.23     67.42±2.97     2>>>1
Ecoli            83.10±2.50     83.69±3.58
Glass            57.85±3.67     55.51±4.43
Iris*            82.67±5.52     97.20±1.47     2>>>1
Monks*           66.34±1.93     74.31±2.26     2>>1
Wine*            94.04±3.18     96.07±2.66
Zoo              80.79±6.97     82.56±5.62

TABLE 6.4.1.2 Node results for ID-LDA and ID-LDA-R

Data set name    ID-LDA         ID-LDA-R       Significance
Breast*          8.00±1.05      7.20±0.63
Bupa*            6.40±4.90      8.20±1.93
Ecoli            20.20±4.34     17.60±4.81
Glass            20.60±5.64     25.60±3.78
Iris*            8.00±2.16      5.40±0.84      1>>>2
Monks*           3.00±0.00      7.20±2.39      2>1
Wine*            7.40±1.84      5.40±0.84
Zoo              12.60±2.07     12.60±2.07

For the node results, ID-LDA-R is better than ID-LDA on one data set (Iris), whereas ID-LDA is better than ID-LDA-R on one data set (Monks). On Monks, ID-LDA cannot find any further split after the first one, so its tree is smaller. These results also affect learning time: on the Iris data set, where the tree is significantly smaller with ID-LDA-R, learning time is also significantly shorter.

When PCA is applied, the number of dimensions kept usually decreases from the root node towards the leaves. For example, on the Ecoli data set we need 14 eigenvectors to represent the data at the root node, but only five eigenvectors at a leaf node.

TABLE 6.4.1.3 Learning time results for ID-LDA and ID-LDA-R

Data set name    ID-LDA         ID-LDA-R       Significance
Breast*          3±1            2±0
Bupa*            1±1            1±0
Ecoli            6±2            6±2
Glass            5±1            7±1
Iris*            1±1            0±0            1>>2
Monks*           1±1            1±0
Wine*            1±0            1±0
Zoo              2±0            2±0








6.4.2. Effects of PCA Percentage on the Results


As Section 6.4.1 shows, LDA performance decreases when PCA is applied because of the 10% loss in variance when 90% of the variance is kept. We have also run experiments with another percentage level, 99%, and compared the two. The results are shown in Table 6.4.2.1 and Figure 6.4.2.1 for accuracy, in Table 6.4.2.2 and Figure 6.4.2.2 for tree size, and in Table 6.4.2.3, Figure 6.4.2.3 and Figure 6.4.2.4 for learning time.
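
As a toy illustration of how the percentage parameter affects the number of dimensions kept, the selection rule sketched in Section 6.4.1 can be applied to a hypothetical eigenvalue spectrum (the numbers below are made up; they do not come from any of the data sets):

    import numpy as np

    def dims_needed(eigvals, variance_kept):
        # Smallest k such that the top-k eigenvalues explain the requested
        # fraction of the total variance.
        eigvals = np.sort(eigvals)[::-1]
        explained = np.cumsum(eigvals) / eigvals.sum()
        return int(np.searchsorted(explained, variance_kept)) + 1

    spectrum = np.array([5.0, 2.0, 1.0, 0.5, 0.3, 0.15, 0.05])  # hypothetical
    print(dims_needed(spectrum, 0.90))   # 4 dimensions kept  (ID-LDA-R)
    print(dims_needed(spectrum, 0.99))   # 6 dimensions kept  (ID-LDA-R99)

Keeping 99% of the variance therefore discards fewer directions at each node, which is consistent with the accuracy results in Table 6.4.2.1.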

TABLE 6.4.2.1 Accuracy results for ID-LDA-R and ID-LDA-R99

Data set name    ID-LDA-R       ID-LDA-R99     Significance
Car              70.02±1.75     92.09±1.07     2>>>1
Cylinder         67.39±2.39     69.80±3.01
Dermatology      94.75±1.91     96.17±1.59
Ecoli            83.69±3.58     83.75±2.53
Flare            88.05±2.39     88.17±2.83
Glass            55.51±4.43     57.29±4.16
Hepatitis        83.61±2.12     82.06±5.60
Horse            72.39±2.62     81.09±1.64     2>>1
Ironosphere      86.38±2.68     91.11±2.22
Mushroom         94.15±0.83     98.25±0.57     2>>>1
Ocrdigits        89.19±0.92     94.59±0.49     2>>>1
Pendigits        91.99±0.94     95.52±0.44     2>>>1
Segment          82.19±2.35     90.31±1.20     2>>>1
Vote             90.85±2.35     94.85±2.17     2>1
Zoo              82.56±5.62     81.41±7.25

TABLE 6.4.2.2 Node results for ID-LDA-R and ID-LDA-R99

Data set name    ID-LDA-R       ID-LDA-R99     Significance
Car              1.00±0.00      12.00±2.54     2>>>1
Cylinder         10.80±3.33     16.40±8.95
Dermatology      17.00±2.11     12.80±1.48
Ecoli            17.60±4.81     20.00±2.71
Flare            5.20±3.71      5.60±3.13
Glass            25.60±3.78     26.20±4.44
Hepatitis        4.60±3.10      8.60±3.10
Horse            10.20±3.29     16.80±4.05
Ironosphere      5.40±1.84      11.60±2.67
Mushroom         17.40±3.63     19.20±4.85
Ocrdigits        87.40±8.37     59.40±2.07     1>>2
Pendigits        80.80±3.71     89.00±6.25
Segment          40.40±5.66     39.80±11.08
Vote             9.20±3.05      9.80±2.53
Zoo              12.60±2.07     11.80±1.93

When we look at the accuracy results, we see a dramatic increase in accuracy when going from ID-LDA-R to ID-LDA-R99. In seven of the 15 data sets there is a significant increase in accuracy, which is especially noticeable on the large data sets.

On the Car data set, for which ID-LDA-R cannot find any split, ID-LDA-R99 reaches an accuracy of 92 percent. Consequently, its tree size with ID-LDA-R99 is significantly larger than with ID-LDA-R. On Ocrdigits the effect is the opposite: going from ID-LDA-R to ID-LDA-R99, accuracy increases and tree size decreases significantly. These results also affect learning time; ID-LDA-R has a significantly lower learning time on Car because it makes no split.



TABLE 6.4.2.3 Learning time results for ID-LDA-R and ID-LDA-R99

Data set name    ID-LDA-R       ID-LDA-R99     Significance
Car              6±2            28±6           2>>1
Cylinder         34±14          74±57
Dermatology      18±2           16±1
Ecoli            6±2            6±1
Flare            3±2            4±3
Glass            7±1            7±1
Hepatitis        1±1            2±1
Horse            38±16          94±40
Ironosphere      3±1            11±4
Mushroom         1846±776       2533±1533
Ocrdigits        3189±270       2288±108       1>>2
Pendigits        996±86         1211±91        2>1
Segment          154±15         148±40
Vote             8±4            9±3
Zoo              2±0            2±0









6.4.3. Comparison of Different Linear Multivariate Techniques


In this section we compare three types of linear decision tree construction methods: CART (classification and regression trees), ID-LP (multivariate decision trees with a perceptron) and ID-LDA (multivariate decision trees with linear discriminant analysis; the ID-LDA-R99 variant is used). The results are shown in Table 6.4.3.1, Table 6.4.3.2 and Figure 6.4.3.1 for accuracy, in Table 6.4.3.3, Table 6.4.3.4 and Figure 6.4.3.2 for tree size, and in Table 6.4.3.5, Table 6.4.3.6, Figure 6.4.3.3, Figure 6.4.3.4 and Figure 6.4.3.5 for learning time. For simplicity, the exchange method for class separation is used for both ID-LP and ID-LDA.
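
At each node, ID-LDA separates the classes into two groups (here via the exchange heuristic) and then fits a Fisher linear discriminant between the two groups. The sketch below shows only the discriminant step for two given groups, with an equal-priors threshold; the group formation, prior handling, pre-pruning, and the PCA fallback for singular scatter matrices follow the earlier chapters, so this is an illustrative simplification rather than the exact implementation.

    import numpy as np

    def lda_split(X_left, X_right):
        # Fisher direction w and threshold w0 between two class groups.
        # An instance x is sent to the left child if  w @ x + w0 > 0.
        m1, m2 = X_left.mean(axis=0), X_right.mean(axis=0)
        S1 = np.cov(X_left,  rowvar=False) * (len(X_left)  - 1)
        S2 = np.cov(X_right, rowvar=False) * (len(X_right) - 1)
        Sw = S1 + S2                       # pooled within-group scatter
        # If Sw is singular, PCA is applied first (Section 6.4.1).
        w = np.linalg.solve(Sw, m1 - m2)
        w0 = -0.5 * (w @ (m1 + m2))        # boundary halfway between the means
        return w, w0
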
TABLE 6.4.3.1 Accuracy results for linear decision tree methods

Data set name    CART           ID-LP          ID-LDA         Significance
Breast           94.85±1.44     96.60±0.61     95.85±0.72
Bupa             61.74±3.38     63.53±2.76     67.42±2.97
Car              83.84±2.03     89.48±4.01     92.09±1.07     3>>>1
Cylinder         59.52±4.05     70.21±4.48     69.80±3.01     2>>1, 3>>>1
Dermatology      80.87±4.56     85.74±7.06     96.17±1.59     3>>1
Ecoli            74.74±3.80     82.62±4.06     83.75±2.53     2>>1, 3>1
Flare            81.55±3.60     88.36±2.37     88.17±2.83
Glass            53.93±4.20     54.95±7.83     57.29±4.16
Hepatitis        78.96±4.04     84.13±2.86     82.06±5.60     2>>1
Horse            76.96±3.02     82.07±3.48     81.09±1.64     2>>1
Iris             89.33±4.44     77.60±15.70    97.20±1.47     3>>1>>2
Ironosphere      86.84±4.03     87.80±2.18     91.11±2.22
Monks            91.20±6.89     66.34±1.87     74.31±2.26     1>>3>>>2
Mushroom         93.45±1.75     99.95±0.03     98.25±0.57     2>>3>>1
Ocrdigits        81.35±2.08     93.87±0.92     94.59±0.49     3>>>1, 2>>>1
Pendigits        87.10±2.91     91.94±4.16     95.52±0.44     3>>1
Segment          88.07±1.69     79.76±11.58    90.31±1.20
Vote             90.30±3.17     94.71±1.05     94.85±2.17     2>1, 3>>1
Wine             87.30±4.40     87.75±12.62    96.07±2.66     3>1
Zoo              69.92±9.69     79.38±8.10     81.41±7.25

TABLE 6.4.3.2 Accuracy comparisons

Each entry is the number of data sets on which the row method is significantly more accurate than the column method.

Method    CART    ID-LP    ID-LDA
CART       -       2        1
ID-LP      7       -        1
ID-LDA    10       2        -

TABLE 6.4.3.3 Node results for linear decision tree methods

Data set name    CART           ID-LP          ID-LDA         Significance
Breast           11.60±2.67     3.00±0.00      7.20±0.63      3>>>2, 1>>2
Bupa             43.20±3.82     4.60±1.84      8.20±1.93      1>>>3>>2
Car              29.00±3.40     7.40±0.84      12.00±2.54     1>>>2, 1>>>3
Cylinder         45.00±4.90     8.40±1.90      16.40±8.95     1>>3, 1>>>2
Dermatology      28.00±4.74     8.80±1.48      12.80±1.48     1>>3>2
Ecoli            34.00±5.01     10.80±2.90     20.00±2.71     1>3>>>2
Flare            33.80±6.20     3.20±2.20      5.60±3.13      1>>>3, 1>>>2
Glass            42.40±4.12     10.20±4.64     26.20±4.44     1>>>3>>>2
Hepatitis        14.00±3.43     3.00±0.00      8.60±3.10      1>>>2
Horse            28.00±5.19     5.00±1.63      16.80±4.05     1>>>3>>>2
Iris             10.20±2.35     4.00±1.05      5.40±0.84      1>>3, 1>>2
Ironosphere      16.40±3.78     3.80±1.03      11.60±2.67     1>>2, 3>>>2
Monks            17.80±10.16    3.00±0.00      7.20±2.39      1>>>2
Mushroom         43.00±6.53     3.00±0.00      19.20±4.85     1>>3>>2
Ocrdigits        70.80±3.98     34.80±4.94     59.40±2.07     1>3>>>2
Pendigits        77.80±10.08    30.40±6.40     89.00±6.25     3>>>2, 1>>>2
Segment          45.20±8.97     16.60±6.65     39.80±11.08    1>>2
Vote             17.20±5.29     4.20±1.93      9.80±2.53      1>>>2
Wine             9.40±2.27      4.40±0.97      5.40±0.84      1>2
Zoo              25.20±4.94     8.80±1.75      11.80±1.93     1>>>3, 1>>>2

TABLE 6.4.3.4 Node comparisons

Each entry is the number of data sets on which the row method produces significantly fewer nodes than the column method.

Method    CART    ID-LP    ID-LDA
CART       -       0        0
ID-LP     20       -       10
ID-LDA    12       0        -

TABLE 6.4.3.5 Learning time results for linear decision tree methods

Data set name    CART           ID-LP          ID-LDA         Significance
Breast           107±17         5±0            2±0            1>>>2>>>3
Bupa             252±23         3±1            1±0            1>>>2>>3
Car              1178±148       152±16         28±6           1>>>2>>>3
Cylinder         4589±343       19±2           74±57          1>>>2, 1>>>3
Dermatology      858±170        42±9           16±1           1>>>2>>3
Ecoli            221±25         57±15          6±1            1>>>2>>>3
Flare            1032±203       9±4            4±3            1>>>2, 1>>>3
Glass            320±25         33±9           7±1            1>>>2>>>3
Hepatitis        209±47         1±0            2±1            1>>>2, 1>>>3
Horse            3481±1101      14±2           94±40          1>>3>>2
Iris             31±11          3±0            0±0            1>>2>>>3
Ironosphere      544±94         4±1            11±4           1>>>3>2
Monks            126±61         3±0            1±0            1>>2>>>3
Mushroom         33613±2942     628±204        2533±1533      1>>>2, 1>>>3
Ocrdigits        9148±713       8035±757       2288±108       2>>>3, 1>>>3
Pendigits        3311±350       18340±3319     1211±91        2>>>1>>>3
Segment          1212±170       937±103        148±40         1>2>>>3
Vote             805±167        6±1            9±3            1>>>2, 1>>>3
Wine             84±26          4±1            1±0            1>>>2>>3
Zoo              453±61         10±2           2±0            1>>>2>>>3

TABLE 6.4.3.6 Learning time comparisons

Each entry is the number of data sets on which the row method has a significantly shorter learning time than the column method.

Method    CART    ID-LP    ID-LDA
CART       -       1        0
ID-LP     18       -        2
ID-LDA    20      13        -


If we compare the three linear methods in terms of accuracy, node size and learning time, we see that:

  • Accuracy: ID-LP = ID-LDA > CART.

  • Node Size: CART > ID-LDA > ID-LP.

  • Learning Time: CART > ID-LP > ID-LDA.

In terms of accuracy, CART outperforms ID-LP on those data sets where ID-LP does not always converge. On the Monks data set, CART outperforms both ID-LP and ID-LDA quite significantly.









