Decision Tree: How do I configure the dimensions?
Having chosen the dimensions, you have a number of options that influence how they are used.
These options are explained in the following sections. For some examples of how to use the options, see Suggested Uses of Dimension Settings.
Creating and Using Dimensions
All dimensions dragged onto the dimensions tab are processed for each node, and the results are displayed in the Next Splits panel.
Dimensions can be processed to varying degrees within the Decision Tree. The "Create Split" and "Use Split" check boxes determine how far each dimension is processed beyond the creation of counts.
Check Boxes Selected | Description |
Neither | Counts are created for all dimensions dragged onto the dimensions tab. Categories are not combined to make splits. |
Create Split | Categories are combined to make splits, but the split is not selected automatically for use in creating child nodes. |
Create Split and Use Split | Categories are combined to make splits, and the split can be selected automatically for use in creating child nodes. |
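The three rows of the table can be read as a simple lookup from check-box state to processing level. The sketch below is illustrative only: `DimensionSettings`, `Processing` and `processing_for` are hypothetical names, not part of the product. It also assumes that ticking Use Split without Create Split is not a valid combination, since the table does not list it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionSettings:
    # Hypothetical stand-ins for the "Create Split" and "Use Split" check boxes.
    create_split: bool = False
    use_split: bool = False


@dataclass(frozen=True)
class Processing:
    counts: bool           # counts are always produced for a dimension on the tab
    builds_split: bool     # categories are combined into a candidate split
    auto_selectable: bool  # that split may be chosen automatically to create child nodes


def processing_for(settings: DimensionSettings) -> Processing:
    if settings.use_split and not settings.create_split:
        # Assumption: the table lists no "Use Split only" combination, so reject it.
        raise ValueError("Use Split requires Create Split")
    return Processing(
        counts=True,
        builds_split=settings.create_split,
        auto_selectable=settings.create_split and settings.use_split,
    )


for label, s in [
    ("Neither", DimensionSettings()),
    ("Create Split", DimensionSettings(create_split=True)),
    ("Create Split and Use Split", DimensionSettings(create_split=True, use_split=True)),
]:
    print(f"{label}: {processing_for(s)}")
```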
Unclassifieds
There are a number of ways of handling missing values in the data:
Unclassified Handling | Description |
Free-Floating | This is the default option. People with unclassified values are included in the Decision Tree. At each split, the rules created are allowed to combine the unclassified category with any of the other categories. |
Keep Separate | People with unclassified values are still included in the Decision Tree. However, if the variable chosen for a split contains people with unclassified values, a split is created that forces these people into a separate node. |
Omit Unclassified | All people are included in the root node, but as each variable is used in a split, people with an unclassified value for that variable are omitted from the tree. |
Low | People with unclassified values can only be grouped with the lowest category (e.g. the lowest income band), i.e. the category with the first code. Unclassified values can still form a branch of their own if this creates a better split. |
High | People with unclassified values can only be grouped with the highest category (e.g. the highest income band), i.e. the category with the last code. Unclassified values can still form a branch of their own if this creates a better split. |
Either End | People with unclassified values can be grouped with either the lowest or the highest category (e.g. either extreme income band), i.e. the category with the first or last code. Unclassified values can still form a branch of their own if this creates a better split. |
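The effect of each option on where the unclassified category may be placed in a split can be sketched as follows. This is a minimal illustration under stated assumptions, not the Decision Tree's actual implementation: the enum and function names are hypothetical, categories are passed in code order, and for Free-Floating it assumes the unclassified category may also form a branch of its own. For Omit Unclassified the function returns an empty set, because those people leave the tree rather than being placed.

```python
from enum import Enum
from typing import Optional


class UnclassifiedHandling(Enum):
    FREE_FLOATING = "Free-Floating"
    KEEP_SEPARATE = "Keep Separate"
    OMIT = "Omit Unclassified"
    LOW = "Low"
    HIGH = "High"
    EITHER_END = "Either End"


def unclassified_placements(mode: UnclassifiedHandling, codes: list[str]) -> set[Optional[str]]:
    """Return where the unclassified category may go in a split (hypothetical helper).

    Each element is the code of the category it may be grouped with,
    or None for "a branch of its own". `codes` must be in code order.
    """
    first, last = codes[0], codes[-1]
    if mode is UnclassifiedHandling.OMIT:
        return set()  # these people are dropped from the tree, so nothing is placed
    if mode is UnclassifiedHandling.KEEP_SEPARATE:
        return {None}  # always forced into a separate node
    if mode is UnclassifiedHandling.FREE_FLOATING:
        return set(codes) | {None}  # assumption: a branch of its own is also allowed
    if mode is UnclassifiedHandling.LOW:
        return {first, None}
    if mode is UnclassifiedHandling.HIGH:
        return {last, None}
    return {first, last, None}  # EITHER_END


bands = ["1", "2", "3", "4"]  # hypothetical income-band codes, lowest first
for mode in UnclassifiedHandling:
    print(mode.value, sorted(unclassified_placements(mode, bands), key=str))
```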
Selector Branches
There are several ways of creating branches from the categories within a selector. Which option to use typically depends on whether the selector is ordinal or nominal.
- A nominal variable is one where the order of the categories in the selector has no meaning (e.g. Occupation). A variable such as Region is nominal despite having a “spatial order”, because the order of its categories within the selector (i.e. by code) has no significance.
- An ordinal variable is one where the categories are presented in the selector in a meaningful order (e.g. Income bands).
The options for handling selectors are as follows:
Selector Branches | Description |
Mixed Categories | This is the default option. Branches can contain any combination of categories. |
Ordered | This ensures that each node always contains consecutive categories. For Binary (Mean PWE Split), this is done by forming the two branches from the categories on either side of the mean PWE value. This option was previously known as "Single Cut". |
Cyclic | Like Ordered, this keeps consecutive categories together, but it also allows the lowest and highest categories within the data to be joined together. For example, the highest and lowest income bands across the whole data could be grouped together. |
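To show how much each option narrows the choice of splits, the sketch below enumerates the two-branch splits that each option permits for a selector whose categories are given in code order. It is a hypothetical illustration: `binary_splits` is an invented name, only two-way splits are considered (the Decision Tree may also build splits with more branches), and the Binary (Mean PWE Split) behaviour is not modelled.

```python
from itertools import combinations


def _split(left, right) -> frozenset:
    # A split is an unordered pair of branches; each branch is a set of codes.
    return frozenset({frozenset(left), frozenset(right)})


def binary_splits(codes: list[str], option: str) -> set:
    """Enumerate the two-branch splits a selector option permits (hypothetical helper).

    `codes` must list the categories in code order, lowest first.
    """
    k = len(codes)
    splits = set()
    if option == "Mixed Categories":
        # Any division of the categories into two non-empty groups.
        for size in range(1, k):
            for left in combinations(codes, size):
                right = [c for c in codes if c not in left]
                splits.add(_split(left, right))
    elif option == "Ordered":
        # A single cut point: each branch is a run of consecutive codes.
        for cut in range(1, k):
            splits.add(_split(codes[:cut], codes[cut:]))
    elif option == "Cyclic":
        # Treat the codes as a ring and cut it in two places, so one branch
        # may wrap around from the last code back to the first.
        for i, j in combinations(range(k), 2):
            splits.add(_split(codes[i:j], codes[j:] + codes[:i]))
    else:
        raise ValueError(f"unknown option: {option}")
    return splits


bands = ["1", "2", "3", "4"]  # hypothetical income-band codes, lowest first
for option in ("Mixed Categories", "Ordered", "Cyclic"):
    print(option, len(binary_splits(bands, option)))
```

For four income bands, Mixed Categories permits 7 two-way splits, Ordered permits 3 and Cyclic permits 6. Cyclic adds the wrap-around splits such as {1, 4} versus {2, 3}, and the only split it rules out here is {1, 3} versus {2, 4}.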