Page
FIGURE 2.1. Instances of the problem Choosing Car (This is an imaginary data set). 3
FIGURE 2.2. Decision tree for the problem Choosing Car 4
FIGURE 2.3.1 Graphs of Impurity Measures 9
FIGURE 2.4.1 A decision tree with multiple splits 12
14
FIGURE 2.6.1. Overfitting in Learning 14
FIGURE 2.6.2.1 A pruned subtree 16
FIGURE 3.1.1 Comparison of univariate and multivariate splits on the plane 17
FIGURE 3.1.2. An example decision tree for replication problem 18
FIGURE 3.1.3. An example decision tree for fragmentation problem 18
FIGURE 3.1.4 Instances of the problem Choosing Car with multivariate split 19
FIGURE 3.1.5 Multivariate decision tree for the problem Choosing Car 20
FIGURE 3.4.1 A step in CART algorithm 23
FIGURE 4.1 Linear perceptron model for multivariate decision trees 25
28
FIGURE 4.1.2.1 Multilayer perceptron model with one hidden layer 28
FIGURE 4.1.2.2 A nonlinear split to Choosing Car problem 30
LIST OF TABLES
Page
TABLE 6.1 Data sets properties 36
TABLE 6.1.1 Definition of methods 37
TABLE 6.1.1.1 Accuracy results for three different types of impurity measures 39
TABLE 6.1.1.2 Node results for three different types of impurity measures 40
TABLE 6.1.1.3 Learning time results for different types of impurity measures (in sec.) 40
TABLE 6.1.2.1 Accuracy results for pre-pruning and post-pruning techniques 47
TABLE 6.1.2.2 Node results for pre-pruning and post-pruning techniques 48
TABLE 6.1.2.3 Learning time results for pre-pruning and post-pruning techniques (in sec.) 49
TABLE 6.1.3.1 Accuracy results for splits with degrees two,three and four 55
TABLE 6.1.3.2 Node results for splits with degrees two,three and four 55
TABLE 6.1.3.3 Learning time results for splits with degrees two,three and four 56
TABLE 6.2.1 Definition of tree-based methods 62
TABLE 6.2.2 Accuracy results for ID3 and CART 63
TABLE 6.2.3 Node results for ID3 and CART 63
TABLE 6.2.4 Learning time results for ID3 and CART 64
TABLE 6.3.1 Definition of neural-network based methods 70
TABLE 6.3.1.1 Accuracy results for ID-LPS and ID-LPE 71
TABLE 6.3.1.2 Node results for ID-LPS and ID-LPE 72
TABLE 6.3.1.3 Learning time results for ID-LPS and ID-LPE 72
TABLE 6.3.2.1 Accuracy results for hybrid network models 78
TABLE 6.3.2.2 Node results for hybrid network models 79
TABLE 6.3.2.3 Learning time results for hybrid network models 79
TABLE 6.3.3.1 Accuracy results for different network models 87
TABLE 6.3.3.2 Node results for different network models 88
TABLE 6.3.3.3 Learning time results for different network models 88
TABLE 6.4.1 Definition of neural-network based methods 95
TABLE 6.4.1.1 Accuracy results for ID-LDA and ID-LDA-R 96
TABLE 6.4.1.2 Node results for ID-LDA and ID-LDA-R 96
TABLE 6.4.1.3 Learning time results for ID-LDA and ID-LDA-R 96
TABLE 6.4.2.1 Accuracy results for ID-LDA-R and ID-LDA-R99 101
TABLE 6.4.2.2 Node results for ID-LDA-R and ID-LDA-R99 102
TABLE 6.4.2.3 Learning time results for ID-LDA-R and ID-LDA-R99 103
TABLE 6.4.3.1 Accuracy results for linear decision tree methods 109
TABLE 6.4.3.2 Accuracy comparisons 109
TABLE 6.4.3.3 Node results for linear decision tree methods 110
TABLE 6.4.3.4 Node comparisons 110
TABLE 6.4.3.5 Learning time results for linear decision tree methods 110
TABLE 6.4.3.6 Learning time comparisons 111
LIST OF SYMBOLS
n the number of instances in a node
xi attribute i of instance
xt instance t
dt desired output for instance t
yt real output for instance t
t a decision node
m number of splits in a node
xij possible value j of an unordered feature i
f the number of features for a data set
Ci class i in a node
CL classes in the left branch
CR classes in the right branch
pi merit value of partition i
c the number of classes in a node
l the number of nodes in the tree
T an unpruned tree T
T’ a pruned tree T
wi weight of the feature i
1.INTRODUCTION
Machine learning aims at determining a description of a given concept from a set of concept examples provided by teacher and from the background knowledge. Concept examples can be positive or negative. Background knowledge contains the information about the language used to describe the examples and concepts. For instance, it can include possible values of variables and their hierarchies or predicates. The learning algorithm then builds on the type of examples, on the size and relevance of the background knowledge, on the representational issues (Mitchell, 1996).
Having explained the principles of machine learning, we can proceed with the decision tree construction method, which will be discussed in this thesis. The essence of this method is very simple. The entire set of examples is split into subsets that are easier to handle (Mitchell, 1996). The type of the split will determine type of the decision tree. The split can be based on one feature (Quinlan, 1986) or a linear combination of features (Breiman et. al., 1984).
The goodness of the split is based on a criterion called the partition-merit criterion. In the literature several experiments are done to compare these criteria (Brodley and Utgoff, 1995) (Mingers, 1989). We will also discuss three of them (Dietterich et. al., 1996) (Breiman et. al., 1984) (Quinlan, 1986).
In this thesis, we will discuss also other aspects of decision tree construction such as multiple splits, filling in missing values (Quinlan, 1989) and avoiding overfitting, using different pruning techniques. Although several experiments (Breiman et. al., 1984) (Quinlan, 1993) show that earlier pruning can decrease performance, we will compare pre-pruning and post-pruning.
After univariate methods, we will continue with multivariate methods. The methods can be classified by the method they use to find the contribution of the attributes. They can be heuristic (Breiman et. al., 1984) or can be based on other machine learning methods (Guo and Gelfond, 1992). We will also propose a statistical method to construct the tree (Duda and Hart, 1973) (Rencher, 1995).
In Chapter 2, we will discuss decision tree learning and issues in decision tree construction. We will also see a univariate algorithm ID3.
In Chapter 3, we give the drawbacks of univariate algorithms and show how the multivariate methods work introducing the CART algorithm.
In Chapter 4, we discuss neural trees, which are decision trees using neural networks at each node. We will also propose a hybrid algorithm.
In Chapter 5, we propose a statistical method to construct multivariate decision tree and discuss problems in this statistical method and how to get rid of them.
Chapter 6 gives simulation results. In this chapter, we see the results of several aspects of decision tree learning, comparing different types of decision trees on twenty standard data sets based on accuracy of classification, the size of the constructed decision tree and learning time.
We conclude and propose future work in Chapter 7.
|