Selections

Introduction

A FastStats Selection is represented in py-apteco by a Clause object, possibly containing other nested or connected Clause objects, which combine to make the rule defining a set of records to be selected. The table from which to select the records is also embedded in the rule.

Counting a selection, to see how many records in the table match the conditions defined by the rule, is the fundamental action. Selections also form the basis of other pieces of analysis, such as samples, limits and data grids.
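The idea of conditions combining into a single rule can be sketched in plain Python. This is only an illustration: `ToyClause`, its `matches` field and the four made-up booking rows are hypothetical and are not part of py-apteco, where the rule is evaluated by FastStats rather than in Python.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass(frozen=True)
class ToyClause:
    """Hypothetical stand-in for a selection rule (not the py-apteco Clause class)."""

    matches: Callable[[Record], bool]

    def __and__(self, other: "ToyClause") -> "ToyClause":
        # Nest both rules inside a new one that requires each of them to match.
        return ToyClause(lambda rec: self.matches(rec) and other.matches(rec))

    def __or__(self, other: "ToyClause") -> "ToyClause":
        # Nest both rules inside a new one that requires at least one to match.
        return ToyClause(lambda rec: self.matches(rec) or other.matches(rec))

    def count(self, records: List[Record]) -> int:
        # Counting is just tallying the records that satisfy the rule.
        return sum(1 for rec in records if self.matches(rec))


# Made-up rows, purely for illustration.
booking_rows = [
    {"Destination": "29", "Cost": 2500},
    {"Destination": "29", "Cost": 800},
    {"Destination": "07", "Cost": 3100},
    {"Destination": "07", "Cost": 450},
]

sweden = ToyClause(lambda rec: rec["Destination"] == "29")
at_least_2k = ToyClause(lambda rec: rec["Cost"] >= 2000)

print(sweden.count(booking_rows))                  # 2
print((sweden & at_least_2k).count(booking_rows))  # 1
print((sweden | at_least_2k).count(booking_rows))  # 3
```

Each combined object wraps the rules it was built from, which mirrors the nesting described above.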

Basic use

Setting up variables:

>>> from datetime import date
>>> dest = bookings["Destination"]
>>> trav = bookings["Travel Date"]
>>> cost = bookings["Cost"]

Creating a selection:

>>> sweden = dest == "29"
>>> at_least_2k = cost >= 2000
>>> before_2020 = trav <= date(2019, 12, 31)

Counting a selection:

>>> sweden.count()
25207

Combining selections:

>>> sweden_before_2020 = sweden & before_2020
>>> sweden_or_expensive = sweden | at_least_2k

Changing the table:

>>> been_to_sweden = people * sweden

Taking a sample:

>>> random_3_pct_sweden = sweden.sample(frac=0.03, sample_type="Random")

Applying a limit:

>>> top_1000_sweden_by_cost = sweden.limit(1000, by=cost)

API Reference

Core attributes & methods

table (Table) – resolve table of this selection

table_name (str) – name of the resolve table of this selection

count() – return the number of records in this selection

Sampling and limits

sample(n=None, frac=None, sample_type="Random", skip_first=0, *, label=None)

Take a sample of records from the selection.

Parameters

n (int) – Number of records to return from selection. Cannot be used with frac.
frac (float) – Proportion of records to return out of whole selection, given as a number between 0 and 1. Cannot be used with n.
sample_type ({'Random', 'Stratified', 'First'}) – Type of sampling to use. Default is ‘Random’.
skip_first (int) – Number of records to skip from start of selection. Default is 0.
label (str) – Optional textual name for this selection clause.
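How these parameters interact can be sketched on an ordinary Python list. This is a rough illustration, not py-apteco code: `toy_sample` and its `seed` argument are hypothetical, and the points marked as assumptions in the comments (exactly one of `n` and `frac` being required, rounding down, skipping before sampling) are not stated in the reference above. Only the 'First' and 'Random' sample types are covered.

```python
import random


def toy_sample(records, n=None, frac=None, sample_type="Random", skip_first=0, seed=0):
    """Illustrate the sample() parameters on a plain list (not py-apteco code)."""
    # Assumption: exactly one of n and frac has to be supplied.
    if (n is None) == (frac is None):
        raise ValueError("specify exactly one of n or frac")
    if frac is not None and not 0 <= frac <= 1:
        raise ValueError("frac must be between 0 and 1")

    # Assumption: frac is measured against the whole selection and rounded down.
    size = n if n is not None else int(len(records) * frac)

    # Assumption: skipped records are dropped before the sample is drawn.
    remaining = list(records[skip_first:])
    size = min(size, len(remaining))

    if sample_type == "First":
        return remaining[:size]
    if sample_type == "Random":
        # seed is a hypothetical extra argument, used here for repeatability.
        return random.Random(seed).sample(remaining, size)
    raise NotImplementedError(f"{sample_type!r} is not covered by this sketch")


booking_ids = list(range(1, 11))
print(toy_sample(booking_ids, n=3, sample_type="First"))                # [1, 2, 3]
print(toy_sample(booking_ids, n=3, sample_type="First", skip_first=2))  # [3, 4, 5]
print(len(toy_sample(booking_ids, frac=0.5)))                           # 5
```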

limit(n=None, frac=None, by=None, ascending=None, per=None, *, label=None)

Limit the selection to a subset of records.

Parameters

n (int or tuple) – Number of records to return from selection. Cannot be used with frac. If by is given, a tuple of two integers (i, j) may be passed to select from the ith to the jth record.
frac (float or tuple) – Proportion of records to return out of whole selection, given as a number between 0 and 1. Cannot be used with n. If by is given, a tuple of two numbers (p, q) may be passed to select the proportion of records between them. For example frac=(0.1, 0.25) with ascending=False would give the top 10–25% of records.
by (Variable) – Variable specifying order in which records are selected.
ascending (bool, optional) – Whether to order records ascending (True) or descending (False) when selecting limit. Must be used with by. Default is False.
per (Table or Variable) – Return n records per this entity. Cannot be used with frac. If per is a Table, it must be a parent or ancestor table of the selection’s table, and for each record on this table n child records are returned from the selection. If per is a Variable, n records are returned for each value of this variable. If per is a selector variable, this means n records for each selector category.
label (str) – Optional textual name for this selection clause.
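The ordering, range and per-entity behaviour described above can be sketched on plain Python lists. This is an illustration under stated assumptions, not py-apteco code: `toy_limit` and `toy_limit_per` are hypothetical helpers, `by` and `per` are ordinary key functions rather than Variable or Table objects, and the choices marked as assumptions in the comments (1-based inclusive ranges, rounding down) are not confirmed by the reference.

```python
def toy_limit(records, n=None, frac=None, by=None, ascending=None):
    """Illustrate n, frac, by and ascending on a plain list (not py-apteco code)."""
    # Assumption: exactly one of n and frac has to be supplied.
    if (n is None) == (frac is None):
        raise ValueError("specify exactly one of n or frac")
    ranged = isinstance(n, tuple) or isinstance(frac, tuple)
    if by is None and (ascending is not None or ranged):
        raise ValueError("ascending and (start, end) ranges need by")

    if by is None:
        ordered = list(records)
    else:
        # ascending=None falls back to descending, the documented default of False.
        ordered = sorted(records, key=by, reverse=not ascending)

    if n is not None:
        # Assumption: (i, j) counts from 1 and includes both ends.
        start, stop = (n[0] - 1, n[1]) if isinstance(n, tuple) else (0, n)
    else:
        low, high = frac if isinstance(frac, tuple) else (0.0, frac)
        # Assumption: fractional positions are rounded down to whole records.
        start, stop = int(len(ordered) * low), int(len(ordered) * high)
    return ordered[start:stop]


def toy_limit_per(records, n, per, by=None, ascending=None):
    """Return up to n records for each distinct value of per(record)."""
    groups = {}
    for rec in records:
        groups.setdefault(per(rec), []).append(rec)
    limited = []
    for members in groups.values():
        limited.extend(toy_limit(members, n=n, by=by, ascending=ascending))
    return limited


costs = list(range(100, 2100, 100))  # 20 made-up costs: 100, 200, ..., 2000
print(toy_limit(costs, n=3, by=lambda c: c))                              # [2000, 1900, 1800]
print(toy_limit(costs, n=(2, 4), by=lambda c: c))                         # [1900, 1800, 1700]
print(toy_limit(costs, frac=(0.1, 0.25), by=lambda c: c, ascending=False))  # [1800, 1700, 1600]

# Made-up (destination code, cost) pairs: two most expensive per destination.
rows = [("29", 300), ("29", 100), ("29", 200), ("07", 50), ("07", 80)]
print(toy_limit_per(rows, n=2, per=lambda r: r[0], by=lambda r: r[1]))
# [('29', 300), ('29', 200), ('07', 80), ('07', 50)]
```

With 20 records, `frac=(0.1, 0.25)` skips the top 10% (two records) and keeps everything down to the 25% mark (the fifth record), which matches the "top 10–25%" example in the parameter description.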

Data Grids and Cubes

datagrid(columns, table=None, max_rows=1000)

Build a data grid with this selection underlying it.

>>> cols = (
...     [people[var] for var in ("Initial", "Surname")]
...     + [bookings[var] for var in ("boDate", "boCost", "boDest")]
... )
>>> northern = households["Region"] == ["01", "02", "13"]
>>> datagrid = northern.datagrid(cols, table=bookings, max_rows=100)
>>> datagrid.to_df().head()
  Initial   Surname Booking Date     Cost    Destination
0       A     Allen   2020-08-11   551.81         France
1       W   Livesey   2021-08-02  1167.57   Sierra Leone
2       W   Livesey   2021-08-19   562.56  United States
3       W   Livesey   2021-08-08   960.55      Australia
4       O  Robinson   2021-08-22   455.60  United States