Modelling Environment: Sampling

The Modelling Environment in FastStats allows you to create and work on a sample of your data before applying your findings to the database as a whole. This is particularly useful when working on large databases, as it is often more convenient to do initial analysis on a smaller sample of data. Once preliminary investigation shows the model is promising, it can then be refined on larger volumes of data. You define a sample by setting the number of Analysis group records to use and the proportion of the sample that those records should make up.

You access Sampling from within the Modelling Environment:

  1. Click Sampling

  2. Check the Use sampled selections box

You can now define your sample by selecting the size of the Analysis group and the proportion of the sample that the Analysis group should make up.

  1. Set the Analysis Count to 100,000

The default settings ask for 100,000 Analysis records, with these records making up 50% of the sample. As such, the Base Total sample size is 200,000 (100,000 divided by 50%).

  1. Change the Analysis Count to 10,000 and the percentage to 25%

This results in a Base Sample size of 40,000 (10,000 divided by 25%), i.e. 10,000 records (25%) coming from the Analysis group and 30,000 records (75%) coming from the Non Analysis group.
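
The arithmetic behind both examples above can be checked with a short Python sketch. It is not part of FastStats: the function name sample_sizes and its parameters are hypothetical, and it assumes the total is simply the Analysis count divided by the Analysis percentage, rounded to the nearest record. How FastStats itself rounds, or what it does when too few records are available, is not covered here.

```python
def sample_sizes(analysis_count: int, analysis_percent: float) -> tuple[int, int]:
    """Return (total sample size, Non Analysis records).

    Hypothetical helper: assumes total = analysis_count / (analysis_percent / 100),
    rounded to the nearest whole record.
    """
    if analysis_count < 0:
        raise ValueError("analysis_count must not be negative")
    if not 0 < analysis_percent <= 100:
        raise ValueError("analysis_percent must be greater than 0 and at most 100")
    total = round(analysis_count * 100 / analysis_percent)
    return total, total - analysis_count


# Default settings: 100,000 Analysis records making up 50% of the sample.
print(sample_sizes(100_000, 50))  # (200000, 100000)

# Changed settings: 10,000 Analysis records making up 25% of the sample.
print(sample_sizes(10_000, 25))   # (40000, 30000)
```

A lower percentage with the same Analysis count gives a larger total sample, because more Non Analysis records are needed to dilute the Analysis group to that proportion.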

Each time you change your sample settings, you must rebuild the Modelling Environment for the results of those changes to be visible on the Profile tab.