Behaviour-based Modelling - an example scenario

The following scenario uses a Base selection of all people who have received a communication in the previous year. The objective is to predict who is most likely to make a booking in the following month.

The artificial nature of the Holidays demo data dictates that all people make a booking at some point before the end of the database (Dec 31st 2021). This leads to the unusual effect that people are more likely to make a booking in the next month if they have not yet made a booking; this becomes increasingly true towards the end of the database.

This pattern is apparent in the charts displayed on the Dimensions tab. The screenshot below shows that people with a zero count of bookings have a positive PWE, indicating they are more likely to be in the Analysis selection of people who make a booking in the next month.

A warning is provided when no data is available

A warning is given when the Dimensions or results are processed and no data is available. This will generally be because the selections and date ranges specified in the Analysis/Base scenario return no records for the Training or Evaluation Dates.

For example, the following warning relates to the training process, set to examine behaviour based on a Training Date of 01 April 2020. Given the above scenario, the Base selection is looking for people with a communication in the year before this date. The message indicates that there is no communication data in this range (April 2019-April 2020).

The following message relates to the evaluation process, set to examine behaviour based on the Evaluation Date of April 2015. Given the above selections again, the message indicates that there is no booking data in the month following this date (April 2015).
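The date ranges behind these two warnings can be reproduced with a few lines of Python. This is a minimal sketch rather than FastStats code: the function names (add_months, scenario_windows, has_records) are hypothetical, and the half-open ranges (start included, end excluded) are an assumption about how the windows are bounded.

```python
import calendar
from datetime import date
from typing import Iterable, Tuple

Window = Tuple[date, date]  # assumed half-open: start <= d < end


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def scenario_windows(reference: date) -> Tuple[Window, Window]:
    """Return (base_window, analysis_window) for a Training or Evaluation Date.

    Base: a communication in the year before the reference date.
    Analysis: a booking in the month after the reference date.
    """
    base_window = (add_months(reference, -12), reference)
    analysis_window = (reference, add_months(reference, 1))
    return base_window, analysis_window


def has_records(record_dates: Iterable[date], window: Window) -> bool:
    """True if at least one record falls inside the window."""
    start, end = window
    return any(start <= d < end for d in record_dates)


# A Training Date of 01 April 2020 gives a Base window of April 2019 to April 2020.
base_window, analysis_window = scenario_windows(date(2020, 4, 1))
```

If has_records returns False for the communication dates in base_window, or for the booking dates in analysis_window, that corresponds to the "no data" situation the warnings describe.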

Evaluation results are stored

The evaluation results are stored against each Evaluation Date used and remain available if the Evaluation Date is changed and then restored to its previous value. This allows you to easily try different Evaluation Dates and determine whether the observed patterns apply to other date ranges.

In the above example, an Evaluation Date that is later in the database (e.g. June 2021) shows the same basic pattern as that observed for the Training Date (April 2021): people with no previous bookings are the most likely to book in the next month. The similar chart shows this visually, and the positive power figure corroborates it.

However, an Evaluation Date from earlier in the database (e.g. Apr 2018) shows the opposite pattern. This is captured by the power figure being negative - something which is fairly unusual and indicates that the model is targeting exactly the wrong people. At the start of the Holidays database, the pressure to make a booking before the end of the data is not yet apparent; instead, the frequent holidaymakers are the most likely to make a booking in the next month.

Results for each Evaluation Date are stored while the Modelling Environment is open. Note that only the results for the currently displayed Evaluation Date are stored when a Modelling Environment is saved.

Build button provides the ability to 'Cancel'

While the Dimensions or results are building, the Build button provides you with the ability to Cancel the process:

Build button detects when changes have been made

The Build button turns green when changes have been made for which results do not exist. For example, if the Training or Evaluation Date is changed to a new date, the Build button turns green. If only some results need updating, the others will not be recalculated.

If the date is changed to one that has already been processed, the previous results will still be available and so the button remains blue.

Changing the sampling or underlying selection scenario also invalidates the current results; you need to rebuild, unless the sampling or scenario has already been evaluated.
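The behaviour described above resembles a cache keyed on everything that affects the results. The sketch below illustrates that idea only; it is not how FastStats is implemented, and BuildKey, ResultsCache and the "green"/"blue" strings are hypothetical names chosen to mirror the text.

```python
from datetime import date
from typing import Any, Dict, NamedTuple


class BuildKey(NamedTuple):
    """Everything that, when changed, invalidates the current results."""
    scenario: str          # hypothetical label for the selection scenario
    sampling: str          # hypothetical label for the sampling settings
    training_date: date
    evaluation_date: date


class ResultsCache:
    """Keeps results per key for as long as the environment is open."""

    def __init__(self) -> None:
        self._results: Dict[BuildKey, Any] = {}

    def store(self, key: BuildKey, results: Any) -> None:
        self._results[key] = results

    def needs_build(self, key: BuildKey) -> bool:
        return key not in self._results

    def button_colour(self, key: BuildKey) -> str:
        # Green: results do not exist yet. Blue: results are already available.
        return "green" if self.needs_build(key) else "blue"


cache = ResultsCache()
current = BuildKey("bookings after comms", "full", date(2021, 4, 1), date(2021, 6, 1))
cache.store(current, "placeholder results")

# Switching to a new Evaluation Date needs a build; switching back does not.
earlier = current._replace(evaluation_date=date(2018, 4, 1))
```

Here cache.button_colour(earlier) is "green" until results are stored for that key, while cache.button_colour(current) stays "blue".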

Only the currently active results are saved - that is, results for all Dimensions under the current selections, sampling, Training and Evaluation Dates.

Opening the Event selection dialog, and then closing it via the OK button, results in the selection scenario being changed, even if no changes have been made within the dialog. This invalidates the current results. Use Cancel instead of OK to avoid this.

If you wish to force results to be updated (e.g. after rebuilding the system) the simplest way to achieve this is to change the date, drag off a copy of the Modelling Environment, and then change the date back again.

Use of current date as the scoring date for model variables

The ultimate aim of Behavioural Modelling is to create a model score virtual variable that takes the learning that has been obtained from the training process and applies it to recent data. This scoring process gives a high score to people who exhibit the same behavioural patterns, often so that you can then use these people as the target audience for a campaign.

The Scoring Date allows you to specify which data should be used in the scoring process. Previously this had to be a fixed date which was specified in the Wizard and recorded permanently with the virtual variable. Consequently, even if the variable was later refreshed, it would only score people based on their behaviour up to the fixed date recorded.

From the Q3 21 release, you can specify that the Scoring Date should be taken dynamically from the run date when the virtual variable is refreshed. This means that you can create a single score variable, refresh it periodically and, on each refresh, a customer’s latest behaviour is evaluated.

The model itself is not updated on each refresh. The same patterns learned during the training process are applied each time the score variable is refreshed. These patterns are captured by the coefficients in a PWE model, and by the structure of a Decision Tree.
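The difference between a fixed and a dynamic Scoring Date can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: ScoreVariable and previous_bookings are hypothetical names, and the year 2021 is assumed for the example dates.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class ScoreVariable:
    """A score variable whose learned patterns never change after creation."""
    fixed_scoring_date: Optional[date] = None  # None means "use the run date"

    def scoring_date(self, run_date: date) -> date:
        if self.fixed_scoring_date is not None:
            return self.fixed_scoring_date
        return run_date


def previous_bookings(booking_dates: Sequence[date], scoring_date: date) -> int:
    """The behaviour being scored: bookings made before the Scoring Date."""
    return sum(1 for d in booking_dates if d < scoring_date)


fixed = ScoreVariable(fixed_scoring_date=date(2021, 7, 7))
dynamic = ScoreVariable()

# One person who booked on 7 July; both variables are refreshed on 8 July.
bookings = [date(2021, 7, 7)]
refresh = date(2021, 7, 8)
seen_by_fixed = previous_bookings(bookings, fixed.scoring_date(refresh))
seen_by_dynamic = previous_bookings(bookings, dynamic.scoring_date(refresh))
```

The fixed variable still sees no previous bookings for this person, whereas the dynamic variable picks up the booking made on 7 July. In both cases only the data being scored moves; the model is unchanged.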

With reference to this particular example, the screenshot below shows the bookings made by people with a high score in the PWE model and in the top node of the Decision Tree model. These are the people that the models predict are most likely to book in the next month. The models were created on the 7th of July, and it is apparent that the model has correctly identified people who have bookings in the month after this.

The main thing to note is that these people do not have any bookings before the 7th July. This confirms the application of the behavioural feature learned from the training data - i.e. that people with no previous bookings are most likely to book in the next month.

The PWE model variable has already been updated on the 8th of July and has reapplied this learning, scoring people and including the more recent data. Consequently, the PWE Cube shows the same principle, but moved on by a day.

Creating a model score variable is a two-stage process, and you can end it early at the halfway point:

When a Profile or Decision Tree is launched from within the Modelling Environment, the results of actions taken are also recorded within the Modelling Environment. This includes listing any virtual variables that are created via a PWE or Decision Tree Wizard. The Modelling Environment evaluates the effectiveness of these model variables but, to do this, the model variables need to be selector type rather than numeric.

If a numeric "actual value" type variable is created by the PWE Wizard, the following message is displayed:

Evaluation processing

The purpose of the evaluation process is to test whether any patterns suggested by the training date range will also apply to the evaluation date range. The training process is explicitly limited to only analyse people who match the criteria of the Base event - in this scenario, that is people who received a communication.

Previously, the evaluation process also applied the Base selection and so would only test the learning on people who had, in this example, received a communication in the year before the Evaluation Date.

From Q3 21, the evaluation process removes the Base event and evaluates a fuller selection of people. This provides a more realistic evaluation of whether the patterns will apply to all people. In some situations, it is actually essential to remove the Base condition, since this might not be relevant to the evaluation date range. For example, the Base condition might refer to a specific campaign that was being analysed for the training process. If the evaluation also used this condition, it would try to find people with this campaign before the evaluation date and this may well result in no data to evaluate.

This change is apparent if the selections used in the model report are viewed. However, the change is applied in all places that a power figure is quoted (i.e. on the Dimensions and Results tabs of the Modelling Environment). The screenshot below is based on the example in this scenario. Notice that:

The Analysis event is still being used and, therefore, the effectiveness of the model at predicting people who make a booking is being evaluated.
The Base event, requiring people to have received a communication in the last year, has been removed and, therefore, the Model Report is evaluating ‘All People’.
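The effect of removing the Base event can be shown with plain sets of person identifiers. This is a simplified sketch: the function names and the apply_base flag (standing in for the pre-Q3 21 behaviour) are hypothetical.

```python
from typing import Set


def training_population(all_people: Set[str], base: Set[str]) -> Set[str]:
    """Training always analyses only people who match the Base event."""
    return all_people & base


def evaluation_population(
    all_people: Set[str], base: Set[str], apply_base: bool = False
) -> Set[str]:
    """From Q3 21 the Base event is dropped; apply_base=True mimics the old behaviour."""
    return all_people & base if apply_base else set(all_people)


people = {"p1", "p2", "p3"}

# A campaign-specific Base event that matches nobody in the evaluation date range.
base_in_evaluation_range: Set[str] = set()

old_style = evaluation_population(people, base_in_evaluation_range, apply_base=True)
new_style = evaluation_population(people, base_in_evaluation_range)
```

With the Base event applied, old_style is empty and there is nothing to evaluate; without it, new_style contains all people.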

Event selections should not contain date criteria

The event-driven method of creating Analysis and Base selections requires simple, transaction level selections that should not include date criteria. They should only include criteria based on other variable types - for example, the campaign or product codes. The Analysis and Base selections are then dynamically constructed from these event selections, and date criteria inserted based on the options chosen via dialogs.
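As an illustration of this rule, the check below flags any date-based criteria in an event selection. The Criterion structure, the variable type labels and the example variable names are all hypothetical; they only stand in for however a selection is actually represented.

```python
from typing import List, NamedTuple


class Criterion(NamedTuple):
    variable: str
    variable_type: str  # hypothetical labels, e.g. "selector", "numeric", "date"


def date_criteria(selection: List[Criterion]) -> List[str]:
    """Return the names of any date variables used in an event selection."""
    return [c.variable for c in selection if c.variable_type == "date"]


# Suitable: criteria on a campaign code only.
suitable = [Criterion("Campaign Code", "selector")]

# Not suitable: the date criterion would clash with the dates inserted dynamically.
unsuitable = [
    Criterion("Product Code", "selector"),
    Criterion("Booking Date", "date"),
]
```

An empty list from date_criteria means the selection can be used as an event; a non-empty list names the variables to remove.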

If the Event selection contains a date variable, it will not be possible to use the selection when constructing the event-driven logic and the following warning is displayed:

You will see a warning if Dimensions include non-behavioural features

The intention of the behavioural modelling functionality is that it automatically generates Dimensions known as “behavioural features”. These summarise an aspect of a person’s behaviour during a particular time frame, as you explore possible differences in behaviour between the Analysis and Base selections. These behavioural features will automatically insert the criteria to focus on the intended time frame - e.g. the month before the person received a communication.

If you create and drag on your own aggregation, such as the expression below, this will not be “under the control of” the behavioural modelling process. A warning to explain this is given when the Dimensions are built. This Dimension will count all bookings over the entire database for each person, regardless of the Training or Evaluation Dates. This is not in itself an error, since it could be a useful predictor - albeit a static one which does not depend on the date (like other person level variables such as demographics).

The booking level Continent variable is also operating in a non-time dependent way. It is assessing the impact of whether somebody has ever been to a certain destination and, again, it is operating over the whole database, and a warning is given.
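The distinction between a behavioural feature and a static aggregation can be sketched as two counting functions. The names are hypothetical, and the 30-day window is an assumed stand-in for "the month before".

```python
from datetime import date, timedelta
from typing import Sequence


def bookings_in_window(
    booking_dates: Sequence[date], reference_date: date, days_before: int = 30
) -> int:
    """Behavioural feature: depends on the Training or Evaluation Date."""
    start = reference_date - timedelta(days=days_before)
    return sum(1 for d in booking_dates if start <= d < reference_date)


def bookings_all_time(booking_dates: Sequence[date]) -> int:
    """Static aggregation: the same value whatever date is chosen."""
    return len(booking_dates)


bookings = [date(2018, 3, 15), date(2021, 5, 20)]

early = bookings_in_window(bookings, date(2018, 4, 1))
quiet = bookings_in_window(bookings, date(2020, 1, 1))
static = bookings_all_time(bookings)
```

The windowed count changes with the reference date (one booking before April 2018, none before January 2020), while the all-time count is always two. This is why the latter triggers the warning yet can still act as a static predictor.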