6. RESULTS
To test the algorithms discussed in this thesis, 20 data sets from the UCI Repository (Merz and Murphy, 1998) are used. Their properties are shown in Table 6.1 (see Appendix A for more details). The number of instances ranges from 101 to 8124, the number of attributes from 4 to 97, and the number of classes from 2 to 10. Three types of attributes occur: continuous, discrete and mixed. Seven of the data sets also have missing values.
TABLE 6.1 Data set properties
Data set name | Instances | Attributes | Classes | Missing values | Type of attributes
--------------+-----------+------------+---------+----------------+-------------------
Breast        | 699       | 9          | 2       | Y              | Continuous
Bupa          | 345       | 6          | 2       | N              | Continuous
Car           | 1728      | 21         | 4       | N              | Discrete
Cylinder      | 541       | 69         | 2       | Y              | Mixed
Dermatology   | 366       | 34         | 6       | Y              | Continuous
Ecoli         | 336       | 7          | 8       | N              | Continuous
Flare         | 323       | 23         | 3       | N              | Mixed
Glass         | 214       | 9          | 7       | N              | Continuous
Hepatitis     | 155       | 19         | 2       | Y              | Continuous
Horse         | 368       | 97         | 2       | Y              | Mixed
Iris          | 150       | 4          | 3       | N              | Continuous
Ionosphere    | 351       | 34         | 2       | N              | Continuous
Monks         | 432       | 6          | 2       | N              | Continuous
Mushroom      | 8124      | 66         | 2       | Y              | Discrete
Ocrdigits     | 3823      | 64         | 10      | N              | Continuous
Pendigits     | 7494      | 16         | 10      | N              | Continuous
Segment       | 2310      | 18         | 7       | N              | Continuous
Vote          | 435       | 32         | 2       | Y              | Discrete
Wine          | 178       | 13         | 3       | N              | Continuous
Zoo           | 101       | 16         | 7       | N              | Continuous
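The summary figures quoted for the collection (number of data sets, ranges of instances, attributes and classes, and how many sets have missing values) can be cross-checked against Table 6.1. The following is a minimal Python sketch; the rows are transcribed by hand from the table, and the tuple layout and the `summarize` helper are illustrative only, not part of the thesis code.

```python
# Rows transcribed from Table 6.1:
# (name, instances, attributes, classes, missing values, attribute type)
DATA_SETS = [
    ("Breast", 699, 9, 2, "Y", "Continuous"),
    ("Bupa", 345, 6, 2, "N", "Continuous"),
    ("Car", 1728, 21, 4, "N", "Discrete"),
    ("Cylinder", 541, 69, 2, "Y", "Mixed"),
    ("Dermatology", 366, 34, 6, "Y", "Continuous"),
    ("Ecoli", 336, 7, 8, "N", "Continuous"),
    ("Flare", 323, 23, 3, "N", "Mixed"),
    ("Glass", 214, 9, 7, "N", "Continuous"),
    ("Hepatitis", 155, 19, 2, "Y", "Continuous"),
    ("Horse", 368, 97, 2, "Y", "Mixed"),
    ("Iris", 150, 4, 3, "N", "Continuous"),
    ("Ionosphere", 351, 34, 2, "N", "Continuous"),
    ("Monks", 432, 6, 2, "N", "Continuous"),
    ("Mushroom", 8124, 66, 2, "Y", "Discrete"),
    ("Ocrdigits", 3823, 64, 10, "N", "Continuous"),
    ("Pendigits", 7494, 16, 10, "N", "Continuous"),
    ("Segment", 2310, 18, 7, "N", "Continuous"),
    ("Vote", 435, 32, 2, "Y", "Discrete"),
    ("Wine", 178, 13, 3, "N", "Continuous"),
    ("Zoo", 101, 16, 7, "N", "Continuous"),
]


def summarize(rows):
    """Collect the ranges and counts that the text reports for the collection."""
    instances = [row[1] for row in rows]
    attributes = [row[2] for row in rows]
    classes = [row[3] for row in rows]
    return {
        "count": len(rows),
        "instances": (min(instances), max(instances)),
        "attributes": (min(attributes), max(attributes)),
        "classes": (min(classes), max(classes)),
        "with_missing": sum(1 for row in rows if row[4] == "Y"),
        "types": sorted({row[5] for row in rows}),
    }


if __name__ == "__main__":
    print(summarize(DATA_SETS))
```

Running the script prints the number of data sets, the minimum and maximum of each numeric column, the number of sets marked Y for missing values, and the distinct attribute types.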
For each method, we performed ten runs on each data set. The results of the ten runs are averaged, and we report the mean and standard deviation of each method's classification rate on each data set. To compare the performance of the methods, we use the combined 5×2 cv F test (Alpaydın, 1999).
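The combined 5×2 cv F test of Alpaydın (1999) is based on five replications of twofold cross-validation. Writing p_i^(j) for the difference in error rate between the two classifiers on fold j of replication i, p̄_i for the mean of the two differences in replication i, and s_i² = (p_i^(1) − p̄_i)² + (p_i^(2) − p̄_i)², the statistic f = Σ_i Σ_j (p_i^(j))² / (2 Σ_i s_i²) is approximately F-distributed with 10 and 5 degrees of freedom. The Python sketch below computes f and its p-value. To stay dependency-free it evaluates the F tail through the regularized incomplete beta function with Simpson's rule instead of calling a statistics library; the function names are hypothetical and this is not the thesis implementation.

```python
import math


def _regularized_beta(z, a, b, n=2000):
    """I_z(a, b) by composite Simpson's rule (n even); assumes a, b >= 1 so the integrand is bounded."""
    if z <= 0.0:
        return 0.0
    if z >= 1.0:
        return 1.0

    def integrand(t):
        return t ** (a - 1) * (1.0 - t) ** (b - 1)

    h = z / n
    total = integrand(0.0) + integrand(z)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * h)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return total * h / 3.0 / math.exp(log_beta)


def f_survival(x, d1, d2):
    """P(F > x) for F ~ F(d1, d2), using CDF(x) = I_{d1 x / (d1 x + d2)}(d1 / 2, d2 / 2)."""
    if x <= 0.0:
        return 1.0
    z = d1 * x / (d1 * x + d2)
    return 1.0 - _regularized_beta(z, d1 / 2.0, d2 / 2.0)


def combined_5x2cv_f(diffs):
    """diffs[i][j]: error-rate difference of the two classifiers on fold j of replication i.

    Returns (f, p_value), with f compared against F(10, 5).
    """
    if len(diffs) != 5 or any(len(row) != 2 for row in diffs):
        raise ValueError("expected 5 replications x 2 folds")
    numerator = sum(p * p for row in diffs for p in row)
    denominator = 0.0
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        denominator += (p1 - mean) ** 2 + (p2 - mean) ** 2  # s_i^2
    if denominator == 0.0:
        raise ValueError("statistic undefined: zero variance in every replication")
    f = numerator / (2.0 * denominator)
    return f, f_survival(f, 10, 5)
```

Given the ten per-fold error-rate differences between two methods on one data set, `f, p = combined_5x2cv_f(diffs)` returns the statistic and its p-value, and 1 − p is the confidence level at which the two methods differ. Because only squared differences enter the statistic, differences in classification rate can be passed instead of differences in error rate.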
In our results, > denotes a confidence level between 90% and 95%, >> a confidence level between 95% and 99%, and >>> a confidence level above 99%.
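The three markers can be read as thresholds on the combined 5×2 cv F statistic, which is compared with an F distribution with 10 and 5 degrees of freedom. A minimal sketch, assuming the marker is chosen by comparing the statistic with the tabulated upper critical values of F(10, 5) at the 10%, 5% and 1% levels (about 3.30, 4.74 and 10.05). The text does not say how a value exactly on a boundary is assigned, so the strict comparison here is an assumption, as is the function name.

```python
# Upper critical values of F(10, 5) from standard F tables, strongest level first.
F_10_5_CRITICAL = (
    (10.051, ">>>"),  # alpha = 0.01: confidence above 99%
    (4.735, ">>"),    # alpha = 0.05: confidence between 95% and 99%
    (3.297, ">"),     # alpha = 0.10: confidence between 90% and 95%
)


def significance_marker(f_stat):
    """Return the marker for a combined 5x2 cv F statistic, or '' below 90% confidence."""
    for critical, marker in F_10_5_CRITICAL:
        if f_stat > critical:
            return marker
    return ""
```

The marker only encodes the confidence level; which of the two methods is the better one still has to be read from their mean classification rates.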