Identifying the best Behavioural Features to use in a model

The Dimensions tab of the Modelling Environment is where you can explore a range of possible behavioural features (dimensions) and identify those which are most useful in creating a model. You can then take this subset of the best features forward to create an actual model using PWE or Decision Tree.

The dimensions chosen to create a model should be:

1) Predictive - dimensions which can distinguish between the analysis and base selections

2) Diverse - dimensions which are not related to each other

See below for how graphical charts and numerical measures can help you identify predictive dimensions - particularly in relation to behavioural modelling.

For more on diverse dimensions, relevant to standard and behavioural modelling, see - Using Measures of Association to find Diverse Dimensions

Added in Q2 22

A good behavioural feature is one which finds patterns of distinctive differences between the analysis and base selections, where these patterns are consistent across periods. For example, a feature such as “the number of recent web visits” could have a distinctive pattern (e.g. more visits means more likely to respond) which is likely to be consistent regardless of the period (e.g. this year or last year). There are other cases where a pattern observed in one period does not apply to other periods - either due to external differences between the periods (e.g. due to COVID) or because a feature is too niche and has picked up on a pattern which is just a nuance of behaviour at a particular time.

The charts below are designed to provide useful information and help you identify the best behavioural features (dimensions) to include in a model. Although described in more detail below, the main concepts are:

Uplift Categories: The categories of a dimension for which a positive PWE score is given, based on the training period.
Uplift PWE: This indicates how effective the dimension is at making positive predictions.
Uplift Coverage: This indicates how many people would be given a positive score by this feature and is calculated from evaluation base counts for the Uplift Categories.

These metrics all focus on the categories of a behavioural feature which provide the positive insight: those categories where there is an uplift in the likelihood of somebody being in the analysis selection, and of that person fulfilling the objective - for example, responding to a campaign, or taking out insurance.

Some behavioural features are able to provide a positive score based on the presence of a specific behaviour. However, the absence of this behaviour is often neutral and not negative. For example, reserving a skydiving activity for your holiday would make you a good prospect for an insurance campaign, but the absence of such activities does not imply that you are a bad prospect.

A good model will be based on both of the following types of features:

Niche Features

Those with high Uplift PWE but probably low Uplift Coverage and low Power
These features only apply to a few customers but, for these customers, they are able to give a high positive score, indicating a high likelihood to fulfil the objective, such as responding to a campaign

Broad Features

Those with a high power
These features apply to a broad range of customers
These scores will typically be both positive and negative but may not be as strong as those given by a niche feature
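The distinction between niche and broad features can be sketched as a simple rule. This is only an illustration: the function name `feature_type` and all three thresholds are hypothetical and are not values used by the Modelling Environment.

```python
def feature_type(power, uplift_pwe, uplift_coverage_share,
                 min_power=0.3, min_uplift_pwe=1.0, max_niche_coverage=0.1):
    """Label a dimension as 'broad', 'niche' or 'weak'.

    All thresholds are hypothetical, chosen only for illustration.
    uplift_coverage_share is Uplift Coverage as a fraction of the base.
    """
    # Broad features: high Power, scoring a wide range of customers.
    if power >= min_power:
        return "broad"
    # Niche features: low Power, but a high Uplift PWE on few records.
    if uplift_pwe >= min_uplift_pwe and uplift_coverage_share <= max_niche_coverage:
        return "niche"
    return "weak"


# Hypothetical dimensions: (Power, Uplift PWE, Uplift Coverage share)
examples = {
    "wide-reaching feature": (0.6, 1.9, 0.03),
    "rare but strong feature": (0.02, 1.6, 0.01),
    "uninformative feature": (0.05, 0.2, 0.5),
}
labels = {name: feature_type(*values) for name, values in examples.items()}
```

A model would normally want a mix of dimensions labelled "broad" and "niche" under whatever thresholds suit your own data.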

Let's examine things in more detail, starting with the two key metrics - Power and Uplift PWE - before moving on to examples of the charts.

Power

The Power metric is used to identify the best dimensions (behavioural features) to use in models. It is excellent for identifying “broad-brush” features which apply to a large number of customers, providing a general indication of whether those customers are good prospects or not. However, this measure can be affected by the volume of records and, consequently, there is the potential to miss other powerful predictors which apply to only a few records. To address this, you can use the Uplift PWE measure.

Uplift PWE

Uplift PWE highlights “niche features”: features which may only apply to a few customers, but can still give a very strong indication of a good prospect. It is a weighted average of positive PWEs, where the weighting is based on the evaluation Base selection count. For example, people who have booked skydiving activities are highly likely to need and purchase appropriate insurance cover. However, this transactional behaviour may only apply to a small subset of the customer base and, if relying solely on the Power metric, may well be missed. For the majority of people to whom this feature does not apply, a neutral rather than negative score is given.
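As a rough sketch, the weighted average described here could be computed as follows. The PWE scores are taken as given inputs (their calculation is not shown), and it is an assumption that the training-period PWE is the score being averaged. The class, function and category names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Category:
    name: str
    training_pwe: float          # PWE score from the training period (assumed input)
    evaluation_base_count: int   # base records in this category on the evaluation date


def uplift_metrics(categories):
    """Return (uplift category names, Uplift PWE, Uplift Coverage)."""
    # Uplift Categories: those given a positive PWE score.
    uplift = [c for c in categories if c.training_pwe > 0]
    # Uplift Coverage: evaluation base count across the Uplift Categories.
    coverage = sum(c.evaluation_base_count for c in uplift)
    if coverage == 0:
        return [], 0.0, 0
    # Uplift PWE: positive PWEs weighted by evaluation base count.
    weighted = sum(c.training_pwe * c.evaluation_base_count for c in uplift)
    return [c.name for c in uplift], weighted / coverage, coverage


# Hypothetical figures for illustration only.
categories = [
    Category("Skydiving", 2.0, 100),
    Category("Scuba diving", 1.0, 300),
    Category("City tour", -0.5, 5000),
]
names, uplift_pwe, uplift_coverage = uplift_metrics(categories)
# Here uplift_pwe is (2.0*100 + 1.0*300) / 400 = 1.25 and uplift_coverage is 400.
```

The large "City tour" category has no effect on the result, which is why a niche feature can show a high Uplift PWE even when most of the base falls outside its Uplift Categories.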

The table and chart below highlight some niche features which identify some of the best prospects for purchasing insurance. These include people who book holidays to potentially higher risk destinations such as Asia and Africa, and those who book particular, more dangerous activities. The table is sorted by Power, demonstrating that these features would be hard to identify without the Uplift PWE measure.

Comparing behavioural patterns for Training and Evaluation periods

The Modelling Environment allows two periods to be defined: training and evaluation. When results are generated on the Dimensions tab, a profile is being run for each period for each dimension. The PWE metric is useful for identifying patterns in each period separately but, where relevant, the For drop-down menu includes a Training and Evaluation option, allowing you to see and more easily compare results, side by side.

You can see below, for example, that when investigating Months since Last Policy Date you can choose to view the Training and Evaluation charts separately:

There are similarities between the two charts, but having both together in a single display makes it far easier to discern the similarities and differences:

Charts displaying both analysis and base series - i.e. counts and percentages - can only display training or evaluation one at a time, not both together.

The following charts are based on a comparison of the training and evaluation periods and only apply to the combined Training and Evaluation option. Let's take a look!

PWE Agreement

Purpose

This chart highlights differences in PWE between the training and evaluation periods. It allows you to visualise the number of records in each category and indicates the statistical significance of each category.

Terminology

Evaluation PWE – how predictive is each category on the evaluation date
Training PWE – how predictive is each category on the training date
Evaluation agrees – training and evaluation agree and the results are statistically significant
Evaluation inconclusive – the category is not statistically significant in the evaluation period
Evaluation disagrees – training and evaluation disagree and the results are statistically significant
Training not significant – the results in the category are not statistically significant in the training period

Ideally, the points in the chart will be arranged diagonally from bottom left to top right, indicating that the training and evaluation data agree on the categories which predict that someone is likely - or unlikely - to exhibit particular behaviours.

In this example, splitting the chart into four quarters indicates:
- Bottom-left - agreement that the category is not likely to result in someone taking out an insurance policy
- Top-right - agreement that the category is likely to predict someone taking out an insurance policy
- Top-left and bottom-right - the training and evaluation data do not agree

Hovering over a point displays a tool-tip that identifies the category the point represents. In this example:
- The large point in the bottom-left corner represents those people with no months since their last policy. Both the training and evaluation dates agree that, if there are no months since your last policy, i.e. you have just taken out a policy, then you are unlikely to take a policy out in the next 3 months.
- There are three points clustered in the top-right, all of which highlight categories that the training and evaluation dates agree are predictive. The largest category comprises people whose last policy was taken out more than 12 months ago. Unsurprisingly, the dimension suggests that, if a person's last policy was taken out a long time ago, they are likely to purchase another policy in the next 3 months.
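The four-quarter reading of the chart can be sketched as a check on the sign of the two PWE values. This is an illustration only: the function name and labels are hypothetical, and the sketch ignores the statistical significance tests that the chart also reports.

```python
def agreement(training_pwe, evaluation_pwe):
    """Classify a category by the signs of its training and evaluation PWE."""
    if training_pwe > 0 and evaluation_pwe > 0:
        return "agree_positive"   # top-right quarter
    if training_pwe < 0 and evaluation_pwe < 0:
        return "agree_negative"   # bottom-left quarter
    if training_pwe == 0 or evaluation_pwe == 0:
        return "neutral"          # sits on an axis
    return "disagree"             # top-left or bottom-right quarter


# Hypothetical PWE pairs: (training, evaluation)
results = [agreement(t, e) for t, e in [(1.2, 0.9), (-0.8, -1.1), (0.4, -0.3)]]
```

Categories that land in "disagree" are the ones to treat with caution, since the pattern seen in training did not hold in the evaluation period.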

PWE v Coverage (Categories)

Purpose

There will always be people to whom the behavioural feature does not apply. For example, a behavioural feature looking at Months since Last Policy Date would not apply to those who have never had a policy. This chart allows you to see the PWE scores for a category, together with the number of base records covered.

Terminology

Coverage – the number of base records included in that category
Uplift PWE – a weighted average of positive PWEs, where the weighting is based on the evaluation base count.
Uplift Coverage - the count of evaluation records with a positive PWE score

The chart colours are consistent and can be interpreted in the same way across all charts.
- The length of the bar from the 0 axis indicates the PWE for that category; the width is representative of the number of base records falling into that category.
- The scale is consistent across the charts of all features and is based on the total number of people in the base selection on the evaluation date. In this example, the majority of people are not scored as they have not taken out an insurance policy.
- The horizontal red line indicates the Uplift PWE, and the vertical red line represents the Uplift Coverage. Hovering over the red line displays a tool-tip which, in this example, indicates that the Uplift Coverage is 3,976 of the base count. The chart has identified 3,976 people who, according to the Months since Last Policy Date dimension, are good prospects for taking out an insurance policy. Such a small number of people could easily be missed if only the PWE of the dimension is studied - as illustrated by the charts below which relate to the ONLY:Adventure Sports combination.

Let's take a look at the Dimensions tab and consider the Mean PWE and Uplift PWE:

The key point to note is the difference between the Mean PWE (0.115) and Uplift PWE (1.612). In isolation, you may have chosen to discard this ONLY: Adventure Sports dimension based on the Mean PWE value, but the Uplift PWE suggests that there are some very predictive categories. The Mean PWE is so low because a very large proportion of the base group does not have this transactional combination and, as such, the average is brought down.
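The dilution effect can be illustrated with a small sketch. It assumes that the Mean PWE behaves like a base-count-weighted average in which records without the combination contribute a neutral score of 0; that is an assumption made for illustration, not a statement of how the product calculates it. The figures are hypothetical and do not reproduce the values quoted above.

```python
def weighted_mean(pairs):
    """Base-count-weighted mean of (pwe, base_count) pairs."""
    total = sum(count for _, count in pairs)
    if total == 0:
        return 0.0
    return sum(pwe * count for pwe, count in pairs) / total


# Hypothetical dimension: 1,000 records with the behaviour (PWE 1.6)
# and 19,000 records without it (neutral score of 0).
categories = [(1.6, 1_000), (0.0, 19_000)]

# Averaging over everyone dilutes the strong score of the few.
mean_pwe = weighted_mean(categories)

# Averaging over positive categories only keeps the score visible.
uplift_pwe = weighted_mean([(p, c) for p, c in categories if p > 0])
```

With these figures the mean over all records is roughly 0.08, while the average over positive categories stays at roughly 1.6.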

Looking at the first of the charts above, the bar is so small that it is hardly visible. In the second chart, using the zoom option allows you to zoom into the most ‘useful’ part of the chart display where you can identify a small number of records - people who are highly likely to exhibit the desired behaviour. Looking only at the Mean PWE or Power of the dimension could very well lead to the conclusion that the overall dimension is not very predictive and, potentially, you could miss these records completely.

Uplift PWE v Coverage (Dimensions)

Purpose

The PWE v Coverage (Categories) chart demonstrates how niche categories can be very predictive, but only apply to a few records. The Uplift PWE v Coverage (Dimensions) chart shows the relationship between Uplift PWE and the records covered for each dimension.

Terminology

Coverage (Dimensions) – the number of base records included in that dimension
Uplift PWE – a weighted average of positive PWEs, where the weighting is based on the evaluation base count

Unlike the previous charts, Uplift PWE v Coverage (Dimensions) is a single chart rather than a separate chart for each dimension. Instead, this chart has a point for each dimension.

Green represents the dimension selected in the Dimensions panel
Yellow corresponds to dimensions currently selected for use in the model
Grey represents any dimension that is unticked and, therefore, not being used in the model

With this chart, it is common to see that, as the Uplift PWE value increases, the Uplift Coverage value decreases. To identify a particular point, hover your cursor over it and/or click the dimension in the variable panel.

In this example, Months since Last Policy Date has been selected and is shaded green. It has an Uplift PWE of over 1.9, which indicates that it is predictive, but it only covers approximately 3% of the evaluation base selection.

The bottom-right point represents Change in Average Holiday Length and covers a far greater percentage of the base group, but is not nearly as predictive.

Power

Purpose

The Power metric will be familiar to existing FastStats Modelling users. Power works very well when assessing models with a variety of dimensions, because multiple dimensions are likely to contribute different insight. Using the Power metric on a single dimension only works well if that dimension can cover a wide range of customers. The Power chart allows us to see the effectiveness of a dimension when its categories are ordered from most predictive to least predictive.

Terminology

Power – how well a model/dimension differentiates good and bad prospects
The horizontal axis represents the percentage of those scored by the feature, not the percentage of the entire base. Hovering over a point displays a tool-tip that indicates the category to which it relates.
The more records scored, and the more predictive the categories are, the more the line curves away from being a straight, diagonal line. Ideally, the dimensions have a pronounced curve. If the curve is close to the diagonal, it suggests the dimension as a whole may not be ideal but, as demonstrated earlier, the Uplift PWE might counter this argument and suggest that some of the categories are very predictive.
The categories are ordered from most predictive to least predictive.
In the first chart above, you can see that the top 3 categories result in a steep gradient before the curve becomes more shallow. This suggests that those 3 categories are powerful in predicting the likelihood of someone being in the analysis group.
In the second chart, the line is almost diagonal. This is mainly because there are only two categories available - either a person has some sport activity, or they do not. Initially there is a rise in the Power, with about 4% of the base identifying approximately 8% of the analysis. The only other point falls at the 100% point on both axes, resulting in a straight line.

In short, this is a useful chart when dimensions have a number of categories covering a number of records, but not as useful when the dimension has fewer categories, or covers fewer records.
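The points on a chart of this kind can be sketched as a cumulative curve. This is an illustration under assumptions: categories are ordered by the ratio of analysis count to base count, which is one plausible reading of "most predictive to least predictive", and the function name and counts are hypothetical. The sketch builds the curve only and does not calculate the Power value itself.

```python
def cumulative_curve(categories):
    """Build curve points from (name, base_count, analysis_count) tuples.

    Returns (cumulative base share, cumulative analysis share) pairs,
    starting at (0, 0). Base counts are assumed to be greater than zero.
    """
    # Assumed ordering: highest analysis-to-base ratio first.
    ordered = sorted(categories, key=lambda c: c[2] / c[1], reverse=True)
    total_base = sum(c[1] for c in ordered)
    total_analysis = sum(c[2] for c in ordered)

    points = [(0.0, 0.0)]
    base_so_far = 0
    analysis_so_far = 0
    for _name, base, analysis in ordered:
        base_so_far += base
        analysis_so_far += analysis
        points.append((base_so_far / total_base,
                       analysis_so_far / total_analysis))
    return points


# Hypothetical two-category dimension, mirroring the second chart:
# about 4% of the scored base accounts for about 8% of the analysis.
points = cumulative_curve([
    ("No sport activity", 960, 92),
    ("Has sport activity", 40, 8),
])
```

With only two categories the curve has a single point between the origin and the 100% corner, so it can never bend far from the diagonal, however predictive the smaller category is.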

Explore the tool-tips on the axis labels and the vertical and horizontal reference lines - they provide a definition of the concepts involved!

Uplift PWE v Power (Dimensions)

Purpose

This chart helps to make the Power metric more relevant for dimensions of all types. The Uplift PWE of each dimension is plotted against the Power for that dimension and allows you to interpret the value of a dimension by assessing how powerful it is in relation to the amount of PWE uplift it offers.

Terminology

Power – how well a model/dimension differentiates good and bad prospects
Uplift PWE – a weighted average of positive PWEs, where the weighting is based on the evaluation base count

Each point on the chart represents a dimension and hovering over the point gives a tool-tip that identifies that dimension.

Green represents the dimension selected in the Dimensions panel
Yellow corresponds to dimensions currently selected for use in the model
Grey represents any dimension that is unticked and, therefore, not being used in the model

The green point on this chart represents bookings that only have adventure sport activities. The Power for this dimension is very low because it applies to only a small number of records. However, the high Uplift PWE value indicates that the categories it does score are strongly predictive.

The point with the highest Power (0.6) and the highest Uplift PWE (1.9) is the most useful dimension (Months since Last Policy Date), whilst the 3 points clustered in the bottom-left of the display represent the least useful dimensions in this example.