Multivariate decision trees for machine learning

səhifə	2/15
tarix	24.06.2016
ölçüsü	5.57 Mb.

1 2 3 4 5 6 7 8 9 ... 15

LIST OF FIGURES

Page

FIGURE 2.1. Instances of the problem Choosing Car (This is an imaginary data set). 3

FIGURE 2.2. Decision tree for the problem Choosing Car 4

FIGURE 2.3.1 Graphs of Impurity Measures 9

FIGURE 2.4.1 A decision tree with multiple splits 12

FIGURE 2.6.1. Overfitting in Learning 14

FIGURE 2.6.2.1 A pruned subtree 16

FIGURE 3.1.1 Comparison of univariate and multivariate splits on the plane 17

FIGURE 3.1.2. An example decision tree for replication problem 18

FIGURE 3.1.3. An example decision tree for fragmentation problem 18

FIGURE 3.1.4 Instances of the problem Choosing Car with multivariate split 19

FIGURE 3.1.5 Multivariate decision tree for the problem Choosing Car 20

FIGURE 3.4.1 A step in CART algorithm 23

FIGURE 4.1 Linear perceptron model for multivariate decision trees 25

FIGURE 4.1.2.1 Multilayer perceptron model with one hidden layer 28

FIGURE 4.1.2.2 A nonlinear split to Choosing Car problem 30

LIST OF TABLES

Page

TABLE 6.1 Data sets properties 36

TABLE 6.1.1 Definition of methods 37

TABLE 6.1.1.1 Accuracy results for three different types of impurity measures 39

TABLE 6.1.1.2 Node results for three different types of impurity measures 40

TABLE 6.1.1.3 Learning time results for different types of impurity measures (in sec.) 40

TABLE 6.1.2.1 Accuracy results for pre-pruning and post-pruning techniques 47

TABLE 6.1.2.2 Node results for pre-pruning and post-pruning techniques 48

TABLE 6.1.2.3 Learning time results for pre-pruning and post-pruning techniques (in sec.) 49

TABLE 6.1.3.1 Accuracy results for splits with degrees two,three and four 55

TABLE 6.1.3.2 Node results for splits with degrees two,three and four 55

TABLE 6.1.3.3 Learning time results for splits with degrees two,three and four 56

TABLE 6.2.1 Definition of tree-based methods 62

TABLE 6.2.2 Accuracy results for ID3 and CART 63

TABLE 6.2.3 Node results for ID3 and CART 63

TABLE 6.2.4 Learning time results for ID3 and CART 64

TABLE 6.3.1 Definition of neural-network based methods 70

TABLE 6.3.1.1 Accuracy results for ID-LPS and ID-LPE 71

TABLE 6.3.1.2 Node results for ID-LPS and ID-LPE 72

TABLE 6.3.1.3 Learning time results for ID-LPS and ID-LPE 72

TABLE 6.3.2.1 Accuracy results for hybrid network models 78

TABLE 6.3.2.2 Node results for hybrid network models 79

TABLE 6.3.2.3 Learning time results for hybrid network models 79

TABLE 6.3.3.1 Accuracy results for different network models 87

TABLE 6.3.3.2 Node results for different network models 88

TABLE 6.3.3.3 Learning time results for different network models 88

TABLE 6.4.1 Definition of neural-network based methods 95

TABLE 6.4.1.1 Accuracy results for ID-LDA and ID-LDA-R 96

TABLE 6.4.1.2 Node results for ID-LDA and ID-LDA-R 96

TABLE 6.4.1.3 Learning time results for ID-LDA and ID-LDA-R 96

TABLE 6.4.2.1 Accuracy results for ID-LDA-R and ID-LDA-R99 101

TABLE 6.4.2.2 Node results for ID-LDA-R and ID-LDA-R99 102

TABLE 6.4.2.3 Learning time results for ID-LDA-R and ID-LDA-R99 103

TABLE 6.4.3.1 Accuracy results for linear decision tree methods 109

TABLE 6.4.3.2 Accuracy comparisons 109

TABLE 6.4.3.3 Node results for linear decision tree methods 110

TABLE 6.4.3.4 Node comparisons 110

TABLE 6.4.3.5 Learning time results for linear decision tree methods 110

TABLE 6.4.3.6 Learning time comparisons 111

LIST OF SYMBOLS

n the number of instances in a node

x_i attribute i of instance

x^t instance t

d^t desired output for instance t

y^t real output for instance t

t a decision node

m number of splits in a node

x_ijpossible value j of an unordered feature i

f the number of features for a data set

C_iclass i in a node

C_L classes in the left branch

C_R classes in the right branch

p_i merit value of partition i

c the number of classes in a node

l the number of nodes in the tree

T an unpruned tree T

T’ a pruned tree T

w_i weight of the feature i

1.INTRODUCTION

Machine learning aims at determining a description of a given concept from a set of concept examples provided by teacher and from the background knowledge. Concept examples can be positive or negative. Background knowledge contains the information about the language used to describe the examples and concepts. For instance, it can include possible values of variables and their hierarchies or predicates. The learning algorithm then builds on the type of examples, on the size and relevance of the background knowledge, on the representational issues (Mitchell, 1996).

Having explained the principles of machine learning, we can proceed with the decision tree construction method, which will be discussed in this thesis. The essence of this method is very simple. The entire set of examples is split into subsets that are easier to handle (Mitchell, 1996). The type of the split will determine type of the decision tree. The split can be based on one feature (Quinlan, 1986) or a linear combination of features (Breiman et. al., 1984).

The goodness of the split is based on a criterion called the partition-merit criterion. In the literature several experiments are done to compare these criteria (Brodley and Utgoff, 1995) (Mingers, 1989). We will also discuss three of them (Dietterich et. al., 1996) (Breiman et. al., 1984) (Quinlan, 1986).

In this thesis, we will discuss also other aspects of decision tree construction such as multiple splits, filling in missing values (Quinlan, 1989) and avoiding overfitting, using different pruning techniques. Although several experiments (Breiman et. al., 1984) (Quinlan, 1993) show that earlier pruning can decrease performance, we will compare pre-pruning and post-pruning.

After univariate methods, we will continue with multivariate methods. The methods can be classified by the method they use to find the contribution of the attributes. They can be heuristic (Breiman et. al., 1984) or can be based on other machine learning methods (Guo and Gelfond, 1992). We will also propose a statistical method to construct the tree (Duda and Hart, 1973) (Rencher, 1995).

In Chapter 2, we will discuss decision tree learning and issues in decision tree construction. We will also see a univariate algorithm ID3.

In Chapter 3, we give the drawbacks of univariate algorithms and show how the multivariate methods work introducing the CART algorithm.

In Chapter 4, we discuss neural trees, which are decision trees using neural networks at each node. We will also propose a hybrid algorithm.

In Chapter 5, we propose a statistical method to construct multivariate decision tree and discuss problems in this statistical method and how to get rid of them.

Chapter 6 gives simulation results. In this chapter, we see the results of several aspects of decision tree learning, comparing different types of decision trees on twenty standard data sets based on accuracy of classification, the size of the constructed decision tree and learning time.

We conclude and propose future work in Chapter 7.

1 2 3 4 5 6 7 8 9 ... 15