Incremental Extracts

Incremental extracts are design to reduce the extract time of large transaction tables by only extracting new records instead of extracting the entire table on every build. They are used to speed up the extract process and reduce load on the source database. The rest of the FastStats Designer build process will still run on the resulting full extract file.

It is suitable for ‘append only’ tables, i.e. tables that only grow with new records (insert only). It is not suitable for tables where existing records may be amended (update, deletes). Examples of suitable tables include data such as Orders, Order Lines and Communication History. Tables which are unlikely to be suitable include Households, Customers.

Incremental extracts are only supported by the standard extract method. FastStats Designer does not support the use of BCP extracts for incremental extracts.

Definition

Defining an incremental extract is straight-forward and only requires the addition of an ‘ERN’ (Extract Record Number) into your data set. The same query is used for both an incremental extract and for a full extract.

Extracts are defined on a per-table basis so it very likely that some tables will still be defined using full extracts and some tables will be defined as an incremental source.

Defining an ERN (Extract Record Number)

An ERN is used by Designer to record the last transaction extracted. An ERN must be a positive 64 bit integer (sometimes called a ‘BigInt’ or ‘Long’ value). The ERN must be present in the source query. The ERN can be generated from other data types (such as a timestamp.

Defining an incremental source is done using a FastStats Designer Custom Query using the @ERN parameter:

Copy
SELECT <top> * FROM Orders WHERE OrderID > @ERN

When running an extract Designer will initially use -1 as the ERN to retrieve all the results. During the extract process Designer will monitor the extracted data and keep a record of the highest extracted ERN. Subsequent extracts will substitute the current highest recorded ERN into the query to retrieve the delta data set.

FastStats Designer needs to know which database field it should monitor. This is defined in the ‘ERN Field’ setting. The ERN Field does not need to be included in the FastStats design and the data does not have to be extracted in ERN order. The extracted records are appended to the end of the existing extract file.

Managing Incremental Extracts

Each incremental extract is recorded as a Checkpoint and at any stage the extract can be rolled back to any checkpoint.

Reset

To reset an incremental source and re-extract the ERN will be set to -1. This will cause Designer to delete the existing file and extract from scratch. If the data source definition changes then the extract will be automatically reset. This includes if changes are made to the extract columns (columns added or removed), changes to the file encoding, etc.

Rollback

Designer can roll back the extract file to any previous extract since the last full extract. It does this by recording the length of the extract file and the corresponding ERN. By truncating the extract file and setting the ERN back a roll back is performed and the incremental extract can be performed again.

Test Extract

Designer can run the query containing an ERN as a test and display the results without performing the extract. It will run the query with the ERN substituted for the current ERN value. It will display the number of records returned, the first 100 records and the last 100 records returned by the query without appending the data to a file or changing the ERN. The records extracted are discarded and are not added to the incremental extract file.

Examples

Using a URN field as an ERN

Copy
SELECT <top> * FROM [Bookings] WHERE [Booking URN] > @ERN

Using Oracle

For Oracle the ERN parameter must be specified with a colon like this:

Copy
SELECT * FROM Orders WHERE OrderID > :ERN

Using a DateTime field as an ERN

To use a DateTime field as an ERN you must calculate an ERN from it. One way of doing this is to compare the DateTime field to a fixed epoch. So to derive an ERN from a DateTime field you can use a query such as this:

Copy
SELECT <top> *,

DATEDIFF(s, '20000101', OrderDate) OrderDateERN

FROM Orders

WHERE OrderDate > DATEADD(s, @ERN, '20000101')

This specifies an ERN (OrderDateERN) as the number of seconds that have elapsed since the start of 2000.

Incremental Extract Rules

An incremental extract rule can be specified to maintain the size of the extract file. See Incremental Extract Rules.

Example Incremental Data Source

The Training Data table [BookingsIncremental] contains a version of the Bookings table divided up into 6 sets of append-only records. These sets are defined by the field 'ERN' which ranges from 0 to 5.

To configure this incremental data source set up a Custom Query for the Bookings table:

Set the 'ERN Field' to be the field 'ERN' :

The training data provider contains some simple logic where it always returns the (ERN + 1) set of values to give new records for each build. To fully build the Bookings records you will have to perform 6 builds, ensure you extract the Bookings data source each time to retrieve the new data: