Multivariate decision trees for machine learning

Date: 24.06.2016


6. RESULTS

To test the algorithms discussed in this thesis, we use 20 data sets from the UCI Repository (Merz and Murphy, 1998). Their properties are shown in Table 6.1 (see Appendix A for more details). The number of instances ranges from 101 (Zoo) to 8124 (Mushroom), the number of attributes from 4 (Iris) to 97 (Horse), and the number of classes from two to ten. The data sets have continuous, discrete, or mixed attributes, and seven of them have missing values.

TABLE 6.1 Properties of the data sets

Data set	Instances	Attributes	Classes	Missing values	Attribute type
Breast	699	9	2	Y	Continuous
Bupa	345	6	2	N	Continuous
Car	1728	21	4	N	Discrete
Cylinder	541	69	2	Y	Mixed
Dermatology	366	34	6	Y	Continuous
Ecoli	336	7	8	N	Continuous
Flare	323	23	3	N	Mixed
Glass	214	9	7	N	Continuous
Hepatitis	155	19	2	Y	Continuous
Horse	368	97	2	Y	Mixed
Iris	150	4	3	N	Continuous
Ionosphere	351	34	2	N	Continuous
Monks	432	6	2	N	Continuous
Mushroom	8124	66	2	Y	Discrete
Ocrdigits	3823	64	10	N	Continuous
Pendigits	7494	16	10	N	Continuous
Segment	2310	18	7	N	Continuous
Vote	435	32	2	Y	Discrete
Wine	178	13	3	N	Continuous
Zoo	101	16	7	N	Continuous

We performed ten runs of each method on each data set and report the mean and standard deviation of the classification rate over these ten runs. To compare the performance of the methods, we used the combined 5x2 cv F test (Alpaydın, 1999).
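The combined 5x2 cv F test can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in this thesis: it assumes the ten runs are five replications of two-fold cross-validation (the setup the test is defined on), that the inputs are per-fold differences in error rate between two methods, and that "standard deviation" means the sample standard deviation. All function names are hypothetical. The statistic is f = (sum over i, j of (p_i^(j))^2) / (2 * sum over i of s_i^2), where s_i^2 is the variance of the two differences in replication i, and it is compared against an F distribution with 10 and 5 degrees of freedom. To stay within the standard library, the F tail probability is computed with a textbook continued-fraction evaluation of the regularized incomplete beta function.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_and_std(accuracies: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard deviation of per-run classification rates.

    Assumption: the sample (n - 1) standard deviation is reported.
    """
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def _betacf(a: float, b: float, x: float, max_iter: int = 200, eps: float = 3e-14) -> float:
    """Continued fraction used by the regularized incomplete beta function."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use I_x(a, b) = 1 - I_{1-x}(b, a) on the side where the fraction converges fast.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_survival(x: float, d1: float, d2: float) -> float:
    """P(F > x) for an F distribution with (d1, d2) degrees of freedom."""
    if x <= 0.0:
        return 1.0
    return _betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * x))


def combined_5x2cv_f_test(diffs: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Combined 5x2 cv F test.

    diffs[i][j] is the difference in error rate between the two methods on
    fold j (0 or 1) of replication i (0..4). Returns (f, p-value under F(10, 5)).
    """
    if len(diffs) != 5 or any(len(row) != 2 for row in diffs):
        raise ValueError("expected 5 replications x 2 folds")
    numerator = sum(p * p for row in diffs for p in row)
    variance_sum = 0.0
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        variance_sum += (p1 - mean) ** 2 + (p2 - mean) ** 2
    if variance_sum == 0.0:
        raise ValueError("all per-replication variances are zero; f is undefined")
    f = numerator / (2.0 * variance_sum)
    return f, f_survival(f, 10, 5)
```

If SciPy is available, `scipy.stats.f.sf(f, 10, 5)` gives the same tail probability and can replace the hand-written beta-function helpers; they are included here only so the sketch runs with the standard library alone.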

In our results, > denotes a confidence level between 90% and 95%, >> a confidence level between 95% and 99%, and >>> a confidence level above 99%.
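The notation can be expressed as a small lookup, sketched below. The function name is hypothetical, the input is assumed to be a confidence level in [0, 1] (for example, one minus a test's p-value), and two details are assumptions because the text does not fix them: confidence below 90% yields no marker, and values exactly on a boundary go to the lower symbol except 95%, which is treated as ">>".

```python
def significance_marker(confidence: float) -> str:
    """Map a confidence level in [0, 1] to the >, >>, >>> notation.

    Assumptions: below 0.90 means no significant difference (empty string);
    boundary handling at 0.95 and 0.99 is illustrative only.
    """
    if confidence > 0.99:
        return ">>>"
    if confidence >= 0.95:
        return ">>"
    if confidence >= 0.90:
        return ">"
    return ""
```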
