Decision Tree: How do Decision Tree Statistics compare to Profiles?
Omit Unclassified
By default, Unclassified values are not omitted in a Decision Tree. Because a Decision Tree’s branching is a stepwise process, it is often useful to keep people who are unclassified on one variable so that a further split can be tried using another variable. This can be done using either the Keep Separate or Free Floating dimension settings. See How do I configure the dimensions?
People with unclassified values can be omitted at a point in the Decision Tree where the variable is used. However, totals are still calculated including these Unclassified people, so that gain figures relative to the root node are still meaningful.
Omit Zeros
Categories for which the analysis count is zero are not omitted from the Decision Tree build process. These categories form an essential part of the nodes which have a low Analysis %.
Indices & PWE
These are calculated in the same way for Profiles and Decision Trees. You will get the same figures for both, provided that your profile is set to Omit Zeros and Omit Unclassified.
Mean PWE
For Profiles, the Mean PWE is used to give an indication of the strength of the variable's relationship with the analysis selection. To achieve this:
- Firstly, the PWE scores are multiplied by 100.
- The mean of the absolute PWE figures is calculated, so that categories giving a high positive or a low negative score both contribute to increasing the Mean PWE score. The Mean PWE will therefore always be positive.
- The mean is also weighted by the analysis counts, so that categories which are more important to the analysis selection are given more influence on the mean score.
For the Decision Tree, the Mean PWE is used to give an indication of the mid-range PWE score. To achieve this:
- The mean is of the raw PWE figures (not absolute, also not multiplied by 100).
- The mean is weighted by the base counts, since splits based on the Mean PWE value will be used to allocate all records in the base selection.
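The two Mean PWE calculations described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the PWE scores are taken as given inputs (no formula for them is assumed), and the function names and example figures are hypothetical.

```python
def profile_mean_pwe(pwe_scores, analysis_counts):
    """Profile-style Mean PWE: mean of |PWE * 100|, weighted by analysis counts."""
    total = sum(analysis_counts)
    weighted = sum(abs(p * 100) * a for p, a in zip(pwe_scores, analysis_counts))
    return weighted / total


def tree_mean_pwe(pwe_scores, base_counts):
    """Decision-Tree-style Mean PWE: mean of raw PWE, weighted by base counts."""
    total = sum(base_counts)
    weighted = sum(p * b for p, b in zip(pwe_scores, base_counts))
    return weighted / total


# Hypothetical figures for a three-category variable (illustration only).
pwe_scores = [0.20, -0.10, 0.05]
analysis_counts = [300, 100, 100]
base_counts = [2000, 5000, 3000]

print(profile_mean_pwe(pwe_scores, analysis_counts))  # always >= 0
print(tree_mean_pwe(pwe_scores, base_counts))         # can be negative or positive
```

The profile figure measures strength (sign is discarded), whereas the Decision Tree figure is a mid-point on the raw PWE scale, which is why the two are not comparable.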
Mean Index
For Profiles, the Mean Index is used to give an indication of the strength of the variable's relationship with the analysis selection. To achieve this:
- Firstly, 100 is subtracted from the Index scores, so they are based around 0.
- The mean of the absolute Index figures is calculated, so that categories giving a high positive or a low negative score both contribute to increasing the Mean Index score. The Mean Index will therefore always be positive.
- The mean is also weighted by the analysis counts, so that categories which are more important to the analysis selection are given more influence on the mean score.
For the Decision Tree, the Mean Index is used to give an indication of the mid-range Index score. To achieve this:
- The mean is of the raw Index figures (not absolute, and without subtracting 100).
- The mean is not weighted by anything. Weighting by the base counts would give a mean of exactly 100.
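The two Mean Index calculations, and the reason base-count weighting is not used, can be sketched as follows. The sketch assumes the usual definition of an Index (a category's share of the analysis selection divided by its share of the base selection, times 100); that definition, the function names and the example figures are assumptions for illustration.

```python
def index_scores(analysis_counts, base_counts):
    """Index per category: 100 * (share of analysis) / (share of base)."""
    total_a, total_b = sum(analysis_counts), sum(base_counts)
    return [100 * (a / total_a) / (b / total_b)
            for a, b in zip(analysis_counts, base_counts)]


def profile_mean_index(indices, analysis_counts):
    """Profile-style: mean of |Index - 100|, weighted by analysis counts."""
    total = sum(analysis_counts)
    return sum(abs(i - 100) * a for i, a in zip(indices, analysis_counts)) / total


def tree_mean_index(indices):
    """Decision-Tree-style: plain unweighted mean of the raw Index figures."""
    return sum(indices) / len(indices)


def base_weighted_mean(indices, base_counts):
    """Shown only to demonstrate why base weighting is uninformative."""
    total = sum(base_counts)
    return sum(i * b for i, b in zip(indices, base_counts)) / total


# Hypothetical figures for a three-category variable (illustration only).
analysis_counts = [300, 100, 100]
base_counts = [2000, 5000, 3000]
indices = index_scores(analysis_counts, base_counts)

print(profile_mean_index(indices, analysis_counts))
print(tree_mean_index(indices))
print(base_weighted_mean(indices, base_counts))  # 100, up to floating-point rounding
```

Under this definition each base-weighted term reduces to the category's share of the analysis selection times 100, so the terms always sum to 100 whatever the data.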
Percent to Totals (e.g. % of Base)
As explained above, Unclassified values are always included in any Totals that are quoted. This means that measures such as “% of base” or “% of analysis” are based on a total that includes the Unclassified records. If you have chosen to Omit Unclassified for a particular dimension, then the percentages shown for the remaining categories will add up to 100% minus the percentage of Unclassified.
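A small sketch of this behaviour, using hypothetical counts and a hypothetical helper function: the total always includes the Unclassified records, even when that category is omitted from the output.

```python
def percent_of_base(base_counts, omit_unclassified=False, label="Unclassified"):
    """% of base per category; the total always includes Unclassified."""
    total = sum(base_counts.values())
    return {
        category: 100 * count / total
        for category, count in base_counts.items()
        if not (omit_unclassified and category == label)
    }


# Hypothetical base counts (illustration only): 5% of records are Unclassified.
base_counts = {"Male": 450, "Female": 500, "Unclassified": 50}

shown = percent_of_base(base_counts, omit_unclassified=True)
print(shown)                # {'Male': 45.0, 'Female': 50.0}
print(sum(shown.values()))  # 95.0, i.e. 100% minus the 5% Unclassified
```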
Analysis % vs % of Analysis
In Profiles, only one percentage is quoted relating to the analysis count for a particular variable category:
- Analysis % = Analysis Count for this category / total Analysis Count for the variable
In Decision Trees, two types of percentage are calculated relating to the analysis count for a particular variable category:
- % of [all] Analysis = Analysis Count for this category / total Analysis Count for the variable
- Analysis % = Analysis Count for this category / Base Count for this category
Unfortunately, this means that the definition of Analysis % used in Profiles corresponds to the definition of % of Analysis used in the Decision Tree, not to the Decision Tree's own Analysis %. This is to keep a consistency between Profiles and the Model Report.
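The difference between the two percentages can be seen with a short sketch. The function names and counts are hypothetical; the counts are chosen to match a male proportion of 48% in an analysis selection of 25,000 and 50% in a base selection of 1 million.

```python
def percent_of_analysis(category_analysis, total_analysis):
    """Decision Tree '% of Analysis' (called 'Analysis %' in Profiles)."""
    return 100 * category_analysis / total_analysis


def analysis_percent(category_analysis, category_base):
    """Decision Tree 'Analysis %': penetration within the category."""
    return 100 * category_analysis / category_base


# Hypothetical counts for the "Male" category (illustration only).
male_analysis, male_base = 12_000, 500_000
total_analysis = 25_000

print(percent_of_analysis(male_analysis, total_analysis))  # share of the analysis selection
print(analysis_percent(male_analysis, male_base))          # rate within males
```

Here the same category has a % of Analysis of 48% but an Analysis % of only 2.4%, so it matters which definition a report is using.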
Z score
The Z-Scores in Profiles and Decision Trees are calculated on different bases. They will often give similar figures, but these will not be identical.
A Profile Z-Score assesses whether the variable category proportions differ between the analysis and base selections.
- E.g. is the male/female ratio different in the analysis selection (e.g. Swedish holiday makers) compared to the base selection (e.g. the whole population)?
  Perhaps comparing a male proportion of 48% in the analysis selection (perhaps a sample of 25,000) with a male proportion of 50% in the base selection (perhaps a sample of 1 million).
A Decision Tree Z-Score assesses whether the analysis proportion differs between the variable categories.
- E.g. is the proportion of Swedish holiday makers different amongst males compared to females?
  Perhaps comparing an Analysis % of 2.5% (= 25,000 out of 1 million) amongst males (perhaps a sample of 500,000) with an Analysis % of 3% amongst females (perhaps a sample of 500,000).
As you can see above, the same situation can be viewed differently, leading to two different proportions being compared within different samples and hence yielding different Z-Scores.
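To illustrate how the two viewpoints produce different scores, the sketch below applies a textbook pooled two-proportion z statistic to both comparisons. This is an assumption made for illustration: the exact Z-Score formula used by Profiles and Decision Trees is not stated here and may differ. The counts are derived from the Profile example figures (48% of 25,000 and 50% of 1 million), which gives per-category rates of 2.4% for males and 2.6% for females.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Textbook pooled two-proportion z statistic (an illustrative stand-in)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error


# Profile view: proportion of males in the analysis vs the base selection.
profile_z = two_proportion_z(12_000, 25_000, 500_000, 1_000_000)

# Decision Tree view: proportion of holiday makers amongst males vs females.
tree_z = two_proportion_z(12_000, 500_000, 13_000, 500_000)

print(round(profile_z, 2))
print(round(tree_z, 2))
```

With these figures both scores are negative and of similar size (roughly -6.2 and -6.4), but they are not equal, because each compares a different pair of proportions over different sample sizes.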