3.1.Univariate vs. Multivariate Splits
Multivariate decision trees mainly differ from univariate decision trees in the way they test the attributes. Univariate decision trees test only one attribute at a node whereas multivariate decision trees test more than one attribute (generally a linear combination of attributes) at a node. This limitation to one attribute reduces the ability of expressing concepts. It shows its disability in three forms. Splits could only be orthogonal to axes, subtrees may be replicated in the decision tree and there may be fragmentation.
For the first disability, consider the two-dimensional instance space shown in Figure 3.1.1. To approximate the class boundary, the corresponding univariate decision tree uses a series of orthogonal splits, whereas the multivariate test uses only one linear split.
FIGURE 3.1.1 Comparison of univariate and multivariate splits on the plane
This example shows the well-known problem that a univariate test using feature xi can only split a space with a boundary that is orthogonal to the xi axis. This results in larger trees and poor generalization.
FIGURE 3.1.2. An example decision tree for replication problem
The second problem is the replication of trees. The decision tree shown in Figure 3.1.2 gives an inefficient representation of the proposition (A B) (C D). While the term (A B) is described efficiently, the term (C D) requires two identical subtrees to be represented. In general, conjunctions can be described efficiently by decision trees while disjunctions require a large tree to describe (Pagallo and Haussler, 1990) (Mathues and Rendell, 1989). One solution to the replication problem is to allow decision nodes consisting of more than one feature by combining them with an appropriate function.
IGURE 3.1.3. An example decision tree for fragmentation problem
Figure 3.1.3 shows another kind of replication problem that can occur when the data contains attributes with high arity values i.e., attributes with large number of possible values. If a tree has high arity attributes (say arity 5) then it will quickly fragment the data in that node into small partitions. To avoid this problem we can construct subsets of the attribute values (Fayyad and Irani, 1992), e.g., seasons instead of months.
But in a multivariate decision tree, each test can be based on more than one feature. In Figure 3.1.1, we can separate the examples of two classes with one line. The test node is the multivariate test w1 x1 + w2 x2 c. Instances for which w1 x1 + w2 x2 is less than or equal to c are classified as one class; otherwise they are classified as the other class.
The multivariate decision tree-constructing algorithm selects not the best attribute but the best linear combination of the attributes: . wi are the weights associated with each feature xi and w0 is the threshold to be determined from the data. So there are basically two main operations in multivariate algorithms: Feature Selection determining which features to use and finding the weights wi of those features and the threshold w0.
FIGURE 3.1.4 Instances of the problem Choosing Car with multivariate split
Figure 3.1.4 illustrates a multivariate split for the same problem in Chapter 2. As it can be seen, the classes (Ferrari’s and Mustang’s) can now be splitted using only one multivariate split. The corresponding multivariate decision tree is shown in Figure 3.1.5. The split for the first node is
w1 Year + w2 Price w0,
which is a linear combination of features. The size of the tree is now reduced from seven nodes into three nodes.
FIGURE 3.1.5 Multivariate decision tree for the problem Choosing Car
Multivariate decision trees differ from univariate trees in two other respects: First, because the features in the linear combination split are multiplied with the coefficients, we must convert the symbolic features into numeric features. And second, because the final weighted sum is numeric, all splits are binary.
3.2.Symbolic and Numeric Features
A decision tree algorithm must handle both ordered and unordered features. In a univariate decision tree algorithm, we can use ordered features as xi a, where a is in the observed range of the feature i and unordered features as xi = aj where aj are possible values of the unordered feature xi in the train set.
In a multivariate decision tree algorithm, we must convert each unordered feature i into a numeric representation to be able to take a linear combination of the features. So in our implementation, where i is the unordered feature with possible values a1, …, am we convert it into multiple feature set as i1, …,im where ij = 1 if xi takes value aj and 0 otherwise. At the end of this preprocessing we convert previous features (some ordered, some unordered) x1 , …,xf into all ordered features as x1, x21, x22, x23 … x2m, x31, x32, x33 … x3p, x4, … , xf where x1 , x4 … are ordered, x2j ,x3j … are unordered features.
This encoding avoids imposing any order on the unordered values of the feature (Hampson and Volper, 1986), (Utgoff and Brodley, 1991). Note that the dimensionality and complexity increases when the number of attributes is increased.