Selection: How do I do a Top N Selection?
FastStats lets you limit a selection, or part of a selection, to the records at the top or bottom of the list when they are ordered by a data variable you choose or by an expression you have created.
Top N selection ordered by a variable
Example:
Pick the 1,000 People with the highest price paid for a holiday by applying a Top N selection of Top 1,000 by Cost. If your other selection criteria identify more than 1,000 records, the selection is limited to the 1,000 records with the largest Cost.
The example below applies a Top N selection to find the 1,000 customers with the highest-priced holidays. To do this, set a selection on the People table of 'Cost > 0' and then apply a Top N limit of 1,000 records using the 'Cost' variable, as follows.
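Outside FastStats, the effect of this limit can be sketched in a few lines of Python. This illustrates the idea only and is not how FastStats implements it: the `Booking` record type and the `top_n` helper are hypothetical, the data is made up, and records with equal Cost are assumed to keep their original order (FastStats' own tie handling is not described in this article).

```python
from typing import Callable, Iterable, List, NamedTuple, TypeVar

T = TypeVar("T")


class Booking(NamedTuple):
    # Hypothetical record shape used only for this illustration.
    booking_id: int
    person_id: int
    cost: float


def top_n(records: Iterable[T], key: Callable[[T], float], n: int) -> List[T]:
    """Return the n records with the largest key value.

    Python's sort is stable (also with reverse=True), so records with
    equal keys keep their input order.
    """
    return sorted(records, key=key, reverse=True)[:n]


bookings = [
    Booking(1, 10, 500.0),
    Booking(2, 11, 0.0),
    Booking(3, 10, 1200.0),
    Booking(4, 12, 800.0),
]

# Apply the other selection criteria first (Cost > 0), then the Top N limit.
selected = [b for b in bookings if b.cost > 0]
top = top_n(selected, key=lambda b: b.cost, n=2)
print([b.booking_id for b in top])  # [3, 4]
```

With a limit of 2 instead of 1,000, only the two most expensive bookings with a positive Cost survive.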
To apply a Top N selection:
- Right-click the node in the selection tree to which you want to apply the limit
- Choose Apply Top N... from the pop-up menu, as shown below
The resolve table of the Top N is shown in the top right-hand corner ('Bookings' in the example below). This means the variable used to order the Top N must come from the 'Bookings' table.
In this example, these settings select the top 1,000 records by booking cost.
- Select the Top N check-box
- From the list box, choose Top, Bottom, Between the Top, Between the Bottom, Between the Top % or Between the Bottom %
- Choose either a fixed total or a percentage, and enter the amount to select in the relevant box
- Click OK
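The list-box options can be read as slices of the ordered records. The Python sketch below is a hypothetical model, not FastStats code: it assumes that the Between options select a band of ranks (for example, records ranked 4 to 5 from the top), and that a percentage is converted to a record count by rounding up. The function names are illustrative only.

```python
import math


def ranked(records, key, from_top):
    # Largest first for the "Top" options, smallest first for the "Bottom" options.
    return sorted(records, key=key, reverse=from_top)


def between(records, key, first, last, from_top=True):
    """Records ranked first..last (1-based, inclusive).

    Top N is between(records, key, 1, n); Bottom N adds from_top=False.
    """
    return ranked(records, key, from_top)[first - 1:last]


def percent_to_count(total, percent):
    # Assumption: round up, so any non-zero percentage selects at least one record.
    return math.ceil(total * percent / 100)


def between_percent(records, key, first_pct, last_pct, from_top=True):
    """Records in the band from first_pct to last_pct of the ordered list.

    Top 20% is between_percent(records, key, 0, 20).
    """
    start = percent_to_count(len(records), first_pct)
    end = percent_to_count(len(records), last_pct)
    return ranked(records, key, from_top)[start:end]


def identity(value):
    # The example records are plain numbers, so each record is its own key.
    return value


values = list(range(1, 11))
print(between(values, identity, 1, 3))                  # Top 3: [10, 9, 8]
print(between(values, identity, 1, 3, from_top=False))  # Bottom 3: [1, 2, 3]
print(between(values, identity, 4, 5))                  # ranks 4-5 from the top: [7, 6]
print(between_percent(values, identity, 0, 20))         # Top 20%: [10, 9]
```

Modelling every option as a slice of one ordered list keeps Top, Bottom and the Between variants consistent with each other; only the direction of the sort and the slice bounds change.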
The node with the Top N limit applied now displays TOP, as shown below:
To see a full description of any applied selection modifiers, such as a Top N limit, open the View Settings dialog from the Selection window toolbar and select Display full selection modifier information, as shown below.
To modify a selection that has a Top N limit, right-click the node with the limit applied and choose Modify Top N... from the pop-up menu. To remove the limit, uncheck the Top N check-box in the resulting dialog box.
Top N selection ordered by an expression
In the following example, we identify our top 100 customers by their length of engagement with our organisation, measured in months as the time between each customer's first and last transaction. The expression that calculates this is shown below:
If we now open a blank selection, we can right-click the New Selection heading and select Apply Top N...
By dragging the expression onto the drop box and setting the total to 100, we select our top 100 customers by the number of months they have engaged with us.
A Data Grid lets us view the results. In this example, each of the top 100 customers has been engaged with the organisation for either 70 or 71 months.
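The calculation behind this example can be sketched in Python. This is not the FastStats expression: the helper names and sample transactions are hypothetical, and the sketch assumes engagement is counted in whole calendar months between a customer's first and last transaction dates, ignoring the day of the month.

```python
from datetime import date


def engagement_months(transactions):
    """Map customer id -> whole calendar months between first and last transaction."""
    spans = {}
    for customer, when in transactions:
        first, last = spans.get(customer, (when, when))
        spans[customer] = (min(first, when), max(last, when))
    return {
        customer: (last.year - first.year) * 12 + (last.month - first.month)
        for customer, (first, last) in spans.items()
    }


def top_customers(transactions, n):
    # Order customers by engagement, longest first, and keep the first n.
    months = engagement_months(transactions)
    return sorted(months.items(), key=lambda item: item[1], reverse=True)[:n]


transactions = [
    ("A", date(2015, 1, 10)),
    ("A", date(2020, 12, 1)),
    ("B", date(2018, 3, 5)),
    ("B", date(2018, 9, 30)),
    ("C", date(2016, 6, 1)),
]
print(top_customers(transactions, 2))  # [('A', 71), ('B', 6)]
```

The ordering value is derived per customer before ranking, which is what using an expression instead of a stored variable amounts to.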
Aggregated Top N selection
This feature lets you use a numeric variable from a child table to order a Top N selection, without first creating a virtual variable.
For more information see Selection: How do I make an Aggregated Top N selection?
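Conceptually, an aggregated Top N rolls each parent record's child values up into a single number and ranks the parents by it. The Python sketch below is an illustration only: the function name and data are hypothetical, and it assumes a sum of booking costs per person by default. Which aggregations FastStats actually supports is not covered here.

```python
from collections import defaultdict


def aggregated_top_n(child_rows, n, aggregate=sum):
    """Rank parent ids by an aggregate of a child-table value and keep the top n.

    child_rows: iterable of (parent_id, value) pairs, e.g. (person_id, booking cost).
    aggregate: function applied to each parent's list of values (sum by default).
    """
    values = defaultdict(list)
    for parent, value in child_rows:
        values[parent].append(value)
    totals = {parent: aggregate(vals) for parent, vals in values.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


rows = [(10, 500.0), (11, 300.0), (10, 200.0), (12, 650.0)]
print(aggregated_top_n(rows, 2))  # [(10, 700.0), (12, 650.0)]
```

Passing `aggregate=max` instead ranks people by their single most expensive booking, which changes the order in this data.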