Page
FIGURE 2.1. Instances of the problem Choosing Car (an imaginary data set) 3
FIGURE 2.2. Decision tree for the problem Choosing Car 4
FIGURE 2.3.1. Graphs of impurity measures 9
FIGURE 2.4.1. A decision tree with multiple splits 12
FIGURE 2.6.1. Overfitting in Learning 14
FIGURE 2.6.2.1. A pruned subtree 16
FIGURE 3.1.1. Comparison of univariate and multivariate splits on the plane 17
FIGURE 3.1.2. An example decision tree for the replication problem 18
FIGURE 3.1.3. An example decision tree for the fragmentation problem 18
FIGURE 3.1.4. Instances of the problem Choosing Car with a multivariate split 19
FIGURE 3.1.5. Multivariate decision tree for the problem Choosing Car 20
FIGURE 3.4.1. A step of the CART algorithm 23
FIGURE 4.1. Linear perceptron model for multivariate decision trees 25
FIGURE 4.1.2.1. Multilayer perceptron model with one hidden layer 28
FIGURE 4.1.2.2. A nonlinear split for the Choosing Car problem 30
LIST OF TABLES
Page
TABLE 6.1 Data set properties 36
TABLE 6.1.1 Definition of methods 37
TABLE 6.1.1.1 Accuracy results for three different types of impurity measures 39
TABLE 6.1.1.2 Node results for three different types of impurity measures 40
TABLE 6.1.1.3 Learning time results for different types of impurity measures (in sec.) 40
TABLE 6.1.2.1 Accuracy results for prepruning and postpruning techniques 47
TABLE 6.1.2.2 Node results for prepruning and postpruning techniques 48
TABLE 6.1.2.3 Learning time results for prepruning and postpruning techniques (in sec.) 49
TABLE 6.1.3.1 Accuracy results for splits with degrees two, three, and four 55
TABLE 6.1.3.2 Node results for splits with degrees two, three, and four 55
TABLE 6.1.3.3 Learning time results for splits with degrees two, three, and four 56
TABLE 6.2.1 Definition of tree-based methods 62
TABLE 6.2.2 Accuracy results for ID3 and CART 63
TABLE 6.2.3 Node results for ID3 and CART 63
TABLE 6.2.4 Learning time results for ID3 and CART 64
TABLE 6.3.1 Definition of neural-network-based methods 70
TABLE 6.3.1.1 Accuracy results for IDLPS and IDLPE 71
TABLE 6.3.1.2 Node results for IDLPS and IDLPE 72
TABLE 6.3.1.3 Learning time results for IDLPS and IDLPE 72
TABLE 6.3.2.1 Accuracy results for hybrid network models 78
TABLE 6.3.2.2 Node results for hybrid network models 79
TABLE 6.3.2.3 Learning time results for hybrid network models 79
TABLE 6.3.3.1 Accuracy results for different network models 87
TABLE 6.3.3.2 Node results for different network models 88
TABLE 6.3.3.3 Learning time results for different network models 88
TABLE 6.4.1 Definition of neural-network-based methods 95
TABLE 6.4.1.1 Accuracy results for IDLDA and IDLDAR 96
TABLE 6.4.1.2 Node results for IDLDA and IDLDAR 96
TABLE 6.4.1.3 Learning time results for IDLDA and IDLDAR 96
TABLE 6.4.2.1 Accuracy results for IDLDAR and IDLDAR99 101
TABLE 6.4.2.2 Node results for IDLDAR and IDLDAR99 102
TABLE 6.4.2.3 Learning time results for IDLDAR and IDLDAR99 103
TABLE 6.4.3.1 Accuracy results for linear decision tree methods 109
TABLE 6.4.3.2 Accuracy comparisons 109
TABLE 6.4.3.3 Node results for linear decision tree methods 110
TABLE 6.4.3.4 Node comparisons 110
TABLE 6.4.3.5 Learning time results for linear decision tree methods 110
TABLE 6.4.3.6 Learning time comparisons 111
LIST OF SYMBOLS
n the number of instances in a node
x_{i} attribute i of an instance
x^{t} instance t
d^{t} desired output for instance t
y^{t} real output for instance t
t a decision node
m number of splits in a node
x_{ij} possible value j of an unordered feature i
f the number of features for a data set
C_{i} class i in a node
C_{L} classes in the left branch
C_{R} classes in the right branch
p_{i} merit value of partition i
c the number of classes in a node
l the number of nodes in the tree
T an unpruned tree
T' a pruned tree
w_{i} weight of feature i
1. INTRODUCTION
Machine learning aims at determining a description of a given concept from a set of concept examples provided by a teacher and from the background knowledge. Concept examples can be positive or negative. Background knowledge contains the information about the language used to describe the examples and concepts. For instance, it can include possible values of variables and their hierarchies, or predicates. The learning algorithm then depends on the type of examples, on the size and relevance of the background knowledge, and on representational issues (Mitchell, 1996).
Having explained the principles of machine learning, we can proceed with the decision tree construction method, which will be discussed in this thesis. The essence of this method is very simple: the entire set of examples is split into subsets that are easier to handle (Mitchell, 1996). The type of the split determines the type of the decision tree. The split can be based on one feature (Quinlan, 1986) or on a linear combination of features (Breiman et al., 1984).
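To make the distinction concrete, the two kinds of split can be sketched as follows. This is our own minimal illustration, not the thesis's implementation; instances are assumed to be tuples of numeric feature values, and the weights of the linear split are assumed to be given.

```python
def univariate_split(instances, feature, threshold):
    """Univariate split: partition instances on a single ordered feature."""
    left = [x for x in instances if x[feature] <= threshold]
    right = [x for x in instances if x[feature] > threshold]
    return left, right

def multivariate_split(instances, weights, w0):
    """Linear multivariate split: partition on a weighted sum of all features."""
    left = [x for x in instances
            if sum(w * xi for w, xi in zip(weights, x)) + w0 <= 0]
    right = [x for x in instances
            if sum(w * xi for w, xi in zip(weights, x)) + w0 > 0]
    return left, right
```

A univariate split thus induces an axis-parallel boundary, while a linear multivariate split induces an arbitrary hyperplane in the feature space.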
The goodness of a split is based on a criterion called the partition-merit criterion. Several experiments in the literature compare these criteria (Brodley and Utgoff, 1995; Mingers, 1989). We will also discuss three of them (Dietterich et al., 1996; Breiman et al., 1984; Quinlan, 1986).
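Two of the most common impurity-based criteria are entropy (Quinlan, 1986) and the Gini index (Breiman et al., 1984). As an illustrative sketch using the standard formulas (both functions take the per-class instance counts of a node):

```python
from math import log2

def entropy(counts):
    """Information-theoretic impurity (Quinlan, 1986): -sum p_i log2 p_i."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

def gini(counts):
    """Gini index (Breiman et al., 1984): 1 - sum p_i^2."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)
```

Both measures are zero for a pure node and maximal when the classes are evenly mixed; a split is scored by how much it reduces the weighted impurity of the children relative to the parent.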
In this thesis, we will also discuss other aspects of decision tree construction, such as multiple splits, filling in missing values (Quinlan, 1989), and avoiding overfitting using different pruning techniques. Although several experiments (Breiman et al., 1984; Quinlan, 1993) show that pruning too early can decrease performance, we will compare prepruning and postpruning.
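The difference between the two strategies can be sketched as follows. Prepruning decides before splitting: if the impurity drop the candidate split would achieve is too small, the node is kept as a leaf. Postpruning instead grows the full tree first and then replaces subtrees with leaves when a separate pruning set shows no loss in accuracy. A minimal sketch of the prepruning side (the 0.05 threshold is an arbitrary assumption for illustration, not a value from the thesis):

```python
def gini(counts):
    """Gini impurity of a node given its per-class instance counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def prepruning_stop(parent_counts, child_counts, threshold=0.05):
    """Prepruning rule: keep the node as a leaf when the weighted
    impurity drop of the proposed split is below a fixed threshold."""
    n = sum(parent_counts)
    weighted = sum(sum(c) / n * gini(c) for c in child_counts)
    return gini(parent_counts) - weighted < threshold
```

The danger, as the experiments cited above suggest, is that a split that looks useless in isolation may enable very good splits further down, which prepruning never gets to see.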
After univariate methods, we will continue with multivariate methods. These methods can be classified by the technique they use to find the contribution of the attributes: they can be heuristic (Breiman et al., 1984) or based on other machine learning methods (Guo and Gelfond, 1992). We will also propose a statistical method to construct the tree (Duda and Hart, 1973; Rencher, 1995).
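As an idea of how a statistical split can be found, the sketch below computes a two-class Fisher linear discriminant, w = S_w^{-1}(m_a - m_b), over 2-D instances, with the split threshold at the midpoint of the projected class means. This is our own assumed illustration of the general approach; the thesis's actual statistical method is developed in Chapter 5.

```python
def lda_split_direction(class_a, class_b):
    """Fisher's linear discriminant for two classes of 2-D instances:
    w = S_w^{-1} (m_a - m_b), with S_w the pooled within-class scatter."""
    def mean(pts):
        n = len(pts)
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

    def scatter(pts, m):
        sxx = sum((p[0] - m[0]) ** 2 for p in pts)
        syy = sum((p[1] - m[1]) ** 2 for p in pts)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in pts)
        return sxx, sxy, syy

    ma, mb = mean(class_a), mean(class_b)
    axx, axy, ayy = scatter(class_a, ma)
    bxx, bxy, byy = scatter(class_b, mb)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy          # invert the 2x2 scatter matrix
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
    # split threshold: midpoint of the projected class means
    w0 = -((w[0] * (ma[0] + mb[0]) + w[1] * (ma[1] + mb[1])) / 2)
    return w, w0
```

Unlike heuristic or neural-network-based weight search, such a split is obtained in closed form from class statistics, which is the main attraction of the statistical approach.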
In Chapter 2, we will discuss decision tree learning and issues in decision tree construction. We will also present ID3, a univariate algorithm.
In Chapter 3, we give the drawbacks of univariate algorithms and show how multivariate methods work, introducing the CART algorithm.
In Chapter 4, we discuss neural trees, which are decision trees that use a neural network at each node. We will also propose a hybrid algorithm.
In Chapter 5, we propose a statistical method to construct multivariate decision trees, discuss the problems of this statistical method, and show how to overcome them.
Chapter 6 gives simulation results. In this chapter, we present results for several aspects of decision tree learning, comparing different types of decision trees on twenty standard data sets in terms of classification accuracy, the size of the constructed decision tree, and learning time.
We conclude and propose future work in Chapter 7.
