Multivariate decision trees for machine learning



3.MULTIVARIATE DECISION TREES

3.1.Univariate vs. Multivariate Splits


Multivariate decision trees differ from univariate decision trees mainly in the way they test the attributes. A univariate decision tree tests only one attribute at a node, whereas a multivariate decision tree tests more than one attribute (generally a linear combination of attributes) at a node. Restricting each test to a single attribute reduces the ability to express concepts, and this limitation shows itself in three ways: splits can only be orthogonal to the axes, subtrees may be replicated in the decision tree, and the data may be fragmented.

To illustrate the first limitation, consider the two-dimensional instance space shown in Figure 3.1.1. To approximate the class boundary, the corresponding univariate decision tree uses a series of orthogonal splits, whereas the multivariate tree needs only one linear split.



FIGURE 3.1.1 Comparison of univariate and multivariate splits on the plane

This example shows the well-known problem that a univariate test using feature xi can only split a space with a boundary that is orthogonal to the xi axis. This results in larger trees and poor generalization.
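The following sketch, written for this discussion rather than taken from the thesis, contrasts the best single-attribute (axis-parallel) split with one linear split on synthetic two-dimensional data whose true boundary is the oblique line x1 + x2 = 1.

```python
# Minimal sketch (assumed data and helper names): the true boundary
# x1 + x2 = 1 is oblique, so no single axis-parallel test matches it.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 2))
y = (X[:, 0] + X[:, 1] <= 1.0).astype(int)      # oblique class boundary

def best_axis_parallel_split(X, y):
    """Exhaustively search one-feature thresholds x_i <= t."""
    best_acc, best = 0.0, None
    for i in range(X.shape[1]):
        for t in np.unique(X[:, i]):
            pred = (X[:, i] <= t).astype(int)
            acc = max((pred == y).mean(), ((1 - pred) == y).mean())
            if acc > best_acc:
                best_acc, best = acc, (i, t)
    return best, best_acc

(axis, thr), uni_acc = best_axis_parallel_split(X, y)
multi_acc = (((X[:, 0] + X[:, 1]) <= 1.0).astype(int) == y).mean()  # w1 = w2 = 1, c = 1

print(f"best univariate split x{axis + 1} <= {thr:.2f}: accuracy {uni_acc:.3f}")
print(f"multivariate split x1 + x2 <= 1: accuracy {multi_acc:.3f}")
```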

FIGURE 3.1.2. An example decision tree for replication problem

The second problem is the replication of subtrees. The decision tree shown in Figure 3.1.2 gives an inefficient representation of the proposition (A ∧ B) ∨ (C ∧ D). While the term (A ∧ B) is described efficiently, the term (C ∧ D) requires two identical subtrees to be represented. In general, conjunctions can be described efficiently by decision trees, while disjunctions require a large tree to describe (Pagallo and Haussler, 1990), (Matheus and Rendell, 1989). One solution to the replication problem is to allow decision nodes consisting of more than one feature by combining them with an appropriate function.
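As a small illustration (not from the thesis), the concept (A ∧ B) ∨ (C ∧ D) can be represented without replication by a tree whose two decision nodes each combine two features; the duplicated subtree only arises when every node is restricted to a single feature.

```python
# Illustrative sketch with assumed encodings: boolean features as 0/1.
from itertools import product

def target(a, b, c, d):
    """The concept (A and B) or (C and D)."""
    return (a and b) or (c and d)

def two_node_tree(a, b, c, d):
    # Node 1: combined test a + b >= 2, i.e. (A and B).
    if a + b >= 2:
        return True
    # Node 2: combined test c + d >= 2, i.e. (C and D); no duplicated subtree.
    return c + d >= 2

# The two-node tree agrees with the target on all 16 boolean instances.
assert all(bool(target(*x)) == bool(two_node_tree(*x)) for x in product((0, 1), repeat=4))
```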

FIGURE 3.1.3. An example decision tree for fragmentation problem

Figure 3.1.3 shows another kind of replication problem that can occur when the data contains high-arity attributes, i.e., attributes with a large number of possible values. If a tree has high-arity attributes (say, arity ≥ 5), then a test on such an attribute quickly fragments the data in that node into small partitions. To avoid this problem we can construct subsets of the attribute values (Fayyad and Irani, 1992), e.g., seasons instead of months.
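As a concrete, purely illustrative example of such value grouping, a twelve-valued month attribute can be mapped to a four-valued season attribute before the tree is grown:

```python
# Small sketch of the value-grouping idea: a high-arity attribute
# (month, 12 values) becomes a coarser one (season, 4 values), so a
# single test no longer fragments the data into 12 small partitions.
SEASON_OF_MONTH = {
    "Dec": "winter", "Jan": "winter", "Feb": "winter",
    "Mar": "spring", "Apr": "spring", "May": "spring",
    "Jun": "summer", "Jul": "summer", "Aug": "summer",
    "Sep": "autumn", "Oct": "autumn", "Nov": "autumn",
}

def coarsen(instance):
    """Replace the 'month' attribute of an instance with a 'season' attribute."""
    out = dict(instance)
    out["season"] = SEASON_OF_MONTH[out.pop("month")]
    return out

print(coarsen({"month": "Jul", "price": 12000}))   # {'price': 12000, 'season': 'summer'}
```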

In a multivariate decision tree, however, each test can be based on more than one feature. In Figure 3.1.1, we can separate the examples of the two classes with one line. The test node is the multivariate test w1 x1 + w2 x2 ≤ c. Instances for which w1 x1 + w2 x2 is less than or equal to c are classified as one class; otherwise they are classified as the other class.

The multivariate decision tree-constructing algorithm selects not the best attribute but the best linear combination of the attributes, w1 x1 + w2 x2 + … + wf xf > w0, where the wi are the weights associated with each feature xi and w0 is the threshold to be determined from the data. So there are basically two main operations in multivariate algorithms: feature selection, that is, determining which features to use, and finding the weights wi of those features and the threshold w0.
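The sketch below illustrates these two operations. The split follows the linear combination form above; the perceptron-style update is only one simple way of finding the wi and w0, not necessarily the procedure used in this thesis.

```python
import numpy as np

def linear_split(x, w, w0):
    """Route an instance down the left branch if sum_i w_i * x_i > w_0."""
    return np.dot(w, x) > w0

def perceptron_weights(X, y, epochs=100, lr=0.1):
    """Illustrative weight search: drive class-1 instances to the left branch."""
    w = np.zeros(X.shape[1])
    w0 = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            pred = 1 if np.dot(w, x) > w0 else 0
            err = label - pred            # -1, 0, or +1
            w += lr * err * x
            w0 -= lr * err                # a larger w0 makes the test harder to pass
    return w, w0
```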

FIGURE 3.1.4 Instances of the problem Choosing Car with multivariate split

Figure 3.1.4 illustrates a multivariate split for the same problem as in Chapter 2. As can be seen, the two classes (Ferraris and Mustangs) can now be separated using only one multivariate split. The corresponding multivariate decision tree is shown in Figure 3.1.5. The split for the first node is

w1 Year + w2 Price > w0,

which is a linear combination of the features. The size of the tree is now reduced from seven nodes to three nodes.



FIGURE 3.1.5 Multivariate decision tree for the problem Choosing Car
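As a hypothetical illustration of this single split, the weights and threshold below are invented for the example; the actual coefficients would be learned from the Choosing Car data, and which class falls on which side of the hyperplane is likewise assumed.

```python
# Assumed coefficients for the single multivariate node w1*Year + w2*Price > w0.
W_YEAR, W_PRICE, W0 = 0.01, 0.0001, 40.0

def choose_car(year, price):
    return "Ferrari" if W_YEAR * year + W_PRICE * price > W0 else "Mustang"

print(choose_car(year=1998, price=250000))   # expensive recent car -> Ferrari
print(choose_car(year=1975, price=20000))    # cheap older car      -> Mustang
```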

Multivariate decision trees differ from univariate trees in two other respects. First, because the features in a linear combination split are multiplied by coefficients, symbolic features must be converted into numeric features. Second, because the final weighted sum is numeric, all splits are binary.

3.2.Symbolic and Numeric Features


A decision tree algorithm must handle both ordered and unordered features. In a univariate decision tree algorithm, an ordered feature can be tested as xi ≤ a, where a is a threshold in the observed range of feature xi, and an unordered feature can be tested as xi = aj, where the aj are the possible values of the unordered feature xi in the training set.

In a multivariate decision tree algorithm, each unordered feature must be converted into a numeric representation so that a linear combination of the features can be taken. So in our implementation, an unordered feature xi with possible values a1, …, am is converted into the set of binary features xi1, …, xim, where xij = 1 if xi takes the value aj and 0 otherwise. At the end of this preprocessing, the original features (some ordered, some unordered) x1, …, xf are converted into the all-ordered features x1, x21, x22, x23, …, x2m, x31, x32, x33, …, x3p, x4, …, xf, where x1, x4, … are the original ordered features and x2j, x3j, … come from the unordered features.

This encoding avoids imposing any order on the unordered values of the feature (Hampson and Volper, 1986), (Utgoff and Brodley, 1991). Note that the dimensionality and complexity increase as the number of attributes increases.
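A minimal sketch of this 1-of-m conversion, using an invented colour feature as the example, is given below.

```python
def encode_unordered(value, possible_values):
    """Return the 1-of-m encoding: exactly one of the m binary features is 1."""
    return [1 if value == v else 0 for v in possible_values]

colors = ["red", "green", "blue"]            # a_1, ..., a_m of one unordered feature
print(encode_unordered("green", colors))     # [0, 1, 0]
```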
