Variables¶
Introduction¶
The Variable
object represents a FastStats system variable.
It holds metadata about the FastStats variable,
and has a reference to the Table
object
corresponding to the FastStats table the variable belongs to.
But its main use is in enabling you to build selections
in a quick and intuitive way.
Variable
objects for a FastStats system are automatically created
when you initialize a Session
connected to that system.
Basic use¶
Retrieving a variable from a Session
object:
>>> cost = my_session.variables["Cost"]
Retrieving a variable from its Table
object:
>>> gender = people["Gender"]
Examining variable metadata:
>>> print(
... f"'{cost.description}' is a"
... f"{' virtual' if cost.is_virtual else ''}"
... f" {cost.type} variable on the '{cost.table_name}' table."
... )
...
'Cost' is a Numeric variable on the 'Bookings' table.
Building a selection:
>>> expensive_bookings = cost > 2000
>>> expensive_bookings.count()
53266
API reference¶
General variable properties¶
-
class
Variable
¶ The base class for all variables. It has various attributes for variable metadata common to all variable types.
-
name
¶ The variable’s short reference name (e.g. boDest).
-
description
¶ The variable’s descriptive name (e.g. Destination).
-
type
¶ A string of the variable’s type.
-
folder_name
¶ The FastStats system folder this variable belongs to.
-
table
¶ The table this variable is from (a
Table
object).
-
table_name
¶ The name of the table this variable is from (alias of
table.name
).
-
is_selectable
¶ Whether the variable is allowed to be used in selections.
-
is_browsable
¶ Whether the variable is allowed to be viewed with a client application (but not exported).
-
is_exportable
¶ Whether the variable is allowed to be exported by a client application.
-
is_virtual
¶ Whether the variable is a virtual variable.
-
Selector-type variables¶
-
class
BaseSelectorVariable
¶ The base class for selector-type variables with attributes common to all of them.
-
code_length
¶ The length (in bytes) of the var codes for this variable.
-
num_codes
¶ The number of different var codes this variable has.
-
var_code_min_count
¶ The number of records that have the var code with the smallest count.
-
var_code_max_count
¶ The number of records that have the var code with the largest count.
-
var_code_order
¶ How the var codes are ordered for this variable, out of: Nominal, Ascending, Descending.
-
Selector variable¶
-
class
SelectorVariable
¶ Subclass of
BaseSelectorVariable
which represents a standard FastStats Selector variable.
Operators¶
Use the equals operator ==
to select records
where this selector variable equals the given value.
If multiple values are given, it must be equal to one of them.
>>> sweden = bookings["Destination"] == "29"
>>> high_earners = people["Income"] == ["07", "08", "09", "10", "11"]
Use the ‘not equals’ operator !=
to select records
where this selector variable does not equal the given value.
If multiple values are given, it must not be equal to any of them.
>>> not_unclassified = people["Occupation"] != "!"
>>> england = households["Region"] != ["10", "11", "12", "14"]
Numeric variable¶
-
class
NumericVariable
¶ Class which represents a FastStats Numeric variable.
-
min_value
¶ The smallest value of this variable over all records.
-
max_value
¶ The largest value of this variable over all records.
-
is_currency
¶ Whether this variable represents a currency value.
-
currency_locale
¶ Locale name for the currency (if this is a currency variable).
-
currency_symbol
¶ Currency symbol for the currency (if this is a currency variable).
-
-
NumericVariable.
missing
(*, include=True, label=None)¶ Select records where this numeric variable’s value is missing.
- Parameters
include (bool) – set to False to exclude missing values from the selection (default is True)
label (str or None) – textual label for this selection clause
Operators¶
Use the ==
operator to select records
where this numeric variable equals the given value.
If multiple values are given, it must be equal to one of them.
>>> booked_3_days_ago = policies["Days Since Booking"] == 3
>>> cost_multiple_of_100 = bookings["Cost"] == [i * 100 for i in range(1, 284)]
Use the !=
operator to select records
where this numeric variable does not equal the given value.
If multiple values are given, it must not be equal to any of them.
>>> some_profit = bookings["Profit"] != 0
>>> waiting_more_than_week = journeys["Days Waiting"] != range(7)
Use the <=
operator to select records
where this numeric variable is less than or equal to the given value.
>>> premium_up_to_25 = policies["Premium"] <= 25
Use the >=
operator to select records
where this numeric variable is greater than or equal to the given value.
>>> high_profit = bookings["Profit"] >= 1000
Use the <
operator to select records
where this numeric variable is strictly less than the given value.
>>> visit_shorter_than_minute = web_visits["Duration"] < 60
Use the >
operator to select records
where this numeric variable is strictly greater than the given value.
>>> more_than_4_weeks_to_travel = policies["Days Until Travel"] > 28
Warning
You cannot use two comparison operators at once, for example, to try to pick values bounded within a range on either side. So the following code will not have the desired effect:
>>> cost_between_50_100 = 50 <= bookings["Cost"] <= 100
See the Operator chaining section below for more details.
Text variable¶
-
TextVariable.
max_length
¶ An integer giving the maximum length (in bytes) of text data per record for this variable.
-
TextVariable.
equals
(value, match_case=True, *, include=True, label=None)¶ Select records where this text variable equals the given value. If multiple values are given, it must be equal to one of them.
Can also use the
==
operator, or!=
for include=False.- Parameters
value (str or Iterable[str]) – value(s) to use in the selection
match_case (bool) – set to False to perform case-insensitive matching on the given values (default is True)
include (bool) – set to False to specify these as values to exclude from the selection (default is True)
label (str or None) – textual label for this selection clause
-
TextVariable.
contains
(value, match_case=True, *, include=True, label=None)¶ Select records where this text variable contains the given value. If multiple values are given, it must contain at least one of them.
- Parameters
value (str or Iterable[str]) – value(s) to use in the selection
match_case (bool) – set to False to perform case-insensitive matching on the given values (default is True)
include (bool) – set to False to specify these as values to exclude from the selection (default is True)
label (str or None) – textual label for this selection clause
-
TextVariable.
startswith
(value, match_case=True, *, include=True, label=None)¶ Select records where this text variable begins with the given value. If multiple values are given, it must begin with one of them.
- Parameters
value (str or Iterable[str]) – value(s) to use in the selection
match_case (bool) – set to False to perform case-insensitive matching on the given values (default is True)
include (bool) – set to False to specify these as values to exclude from the selection (default is True)
label (str or None) – textual label for this selection clause
-
TextVariable.
endswith
(value, match_case=True, *, include=True, label=None)¶ Select records where this text variable ends with the given value. If multiple values are given, it must end with one of them.
- Parameters
value (str or Iterable[str]) – value(s) to use in the selection
match_case (bool) – set to False to perform case-insensitive matching on the given values (default is True)
include (bool) – set to False to specify these as values to exclude from the selection (default is True)
label (str or None) – textual label for this selection clause
-
TextVariable.
before
(self, value, allow_equal=False, *, include=True, label=None)¶ Select records where this text variable is alphabetically before the given value. Set allow_equal=True to also select the value itself. This method is not case-sensitive.
Can also use the
<
operator, or<=
for allow_equal=True.- Parameters
value (str) – value to use in the selection
allow_equal (bool) – set to True to also include the value itself (default is False)
include (bool) – set to False to specify these as values to exclude from the selection (default is True)
label (str or None) – textual label for this selection clause
-
TextVariable.
after
(self, value, allow_equal=False, *, include=True, label=None)¶ Select records where this text variable is alphabetically after the given value. Set allow_equal=True to also select the value itself. This method is not case-sensitive.
Can also use the
>
operator, or>=
for allow_equal=True.- Parameters
value (str) – value to use in the selection
allow_equal (bool) – set to True to include the value itself (default is False)
include (bool) – set to False to specify these as values to exclude from the selection (default is True)
label (str or None) – textual label for this selection clause
-
TextVariable.
between
(start, end, *, include=True, label=None)¶ Select records where this text variable is alphabetically between start and end (inclusive). This method is not case-sensitive.
- Parameters
start (str) – start of permitted range
end (str) – end of permitted range
include (bool) – set to False to specify these as values to exclude from the selection (default is True)
label (str or None) – textual label for this selection clause
-
TextVariable.
matches
(value, match_case=True, *, include=True, label=None)¶ Select records where this text variable matches the given value, based on wildcard matching rules. If multiple values are given, it must match at least one of them.
- Parameters
value (str or Iterable[str]) – value(s) to use in the selection (see below for details of wildcards)
match_case (bool) – set to False to perform case-insensitive matching on the given values (default is True)
include (bool) – set to False to specify these as values to exclude from the selection (default is True)
label (str or None) – textual label for this selection clause
Wildcards
Wildcard
Explanation
Examples
?
matches any single character
Sm?th
matchesSmith
,Smyth
*
matches any number of characters, or none
Smith*
matchesSmith
,Smithers
,Smith-Wood
?
,*
(the two wildcards can be used in conjunction)
Sm?th*
matches all of the above, as well asSmethurst
,Smythe
Operators¶
Use the equals operator ==
to select records
where this text variable equals the given value.
If multiple values are given, it must be equal to one of them.
>>> smiths = people["Surname"] == "Smith"
>>> royal = people["Surname"] == ["King", "Queen", "Prince", "Princess"]
Use the ‘not equals’ operator !=
to select records
where this text variable does not equal the given value.
If multiple values are given, it must not be equal to any of them.
>>> not_s = people["Initial"] != "S"
>>> consonant = people["Initial"] != list("AEIOU")
Use the <
operator to select records
where this text variable is alphabetically before the given value.
This operation is case-insensitive.
>>> first_half_of_alphabet = people["Initial"] < "N"
Use the <=
operator to select records
where this text variable is alphabetically before the given value,
or equal to the value itself.
This operation is case-insensitive.
>>> up_to_jones = people["Surname"] <= "Jones"
Use the >
operator to select records
where this text variable is alphabetically after the given value.
This operation is case-insensitive.
>>> after_t = people["Initial"] > "T"
Use the >=
operator to select records
where this text variable is alphabetically after the given value,
or equal to the value itself.
This operation is case-insensitive.
>>> smith_onwards = people["Surname"] >= "Smith"
Array variable¶
-
class
ArrayVariable
¶ Subclass of
BaseSelectorVariable
which represents a FastStats Array variable.
Operators¶
Use the equals operator ==
to select records
where this array variable has the given value.
If multiple values are given, it must have one of them.
>>> james_bond = households["Car Make Code"] == "ASM"
>>> french_car = households["Car Make Code"] == ["CIT", "PEU", "REN"]
Use the ‘not equals’ operator !=
to select records
where this array variable does not have the given value.
If multiple values are given, it must not have any of them.
>>> not_unclassified_car = households["Car Make Code"] != " !"
>>> not_f_car = households["Car Make Code"] != ["FER", "FIA", "FOR"]
Flag array variable¶
-
class
FlagArrayVariable
¶ Subclass of
BaseSelectorVariable
which represents a FastStats FlagArray variable.
Operators¶
Use the equals operator ==
to select records
where this flag array variable has the given value.
If multiple values are given, it must have one of them.
>>> financial_times_reader = people["Newspapers"] == "Financial Times"
>>> tabloid_reader = people["Newspapers"] == [
... "Daily Express", "The Sun", "Daily Mirror", "Daily Mail", "Record"
... ]
Use the ‘not equals’ operator !=
to select records
where this flag array variable does not have the given value.
If multiple values are given, it must not have any of them.
>>> self_catering = bookings["Facilities"] != "Food"
>>> cannot_contact = people["Contact Permission"] != ["EPS", "MPS", "TPS", "FPS"]
Date-type variables¶
-
class
BaseDateVariable
¶ A subclass of
BaseSelectorVariable
which is the base class for date-type variables, with attributes common to all of them.-
min_date
¶ The earliest date value of this variable over all records.
-
max_date
¶ The latest date value of this variable over all records.
-
Date variable¶
-
class
DateVariable
¶ Subclass of
BaseDateVariable
which represents a FastStats Date variable.
Operators¶
Use the equals operator ==
to select records
where this date variable is the given date.
If multiple dates are given, it must be one of them.
>>> from datetime import date
>>> christmas_day_2018 = bookings["Booking Date"] == date(2018, 12, 25)
>>> valentines_day = bookings["Travel Date"] == [
... date(y, 2, 14) for y in range(2016, 2023)
... ]
Use the equals operator !=
to select records
where this date variable is not the given date.
If multiple dates are given, it must not be any of them.
>>> not_new_years_day_2020 = bookings["Travel Date"] != date(2020, 1, 1)
>>> not_easter = bookings["Travel Date"] != [
... date(2016, 3, 27)
... date(2017, 4, 16)
... date(2018, 4, 1)
... date(2019, 4, 21)
... date(2020, 4, 12)
... date(2021, 4, 4)
... date(2022, 4, 17)
... ]
Use the <=
operator to select records
where this date variable is before the given date (or is the date itself).
>>> bookings_before_2019 = bookings["Booking Date"] <= date(2018, 12, 31)
Use the >=
operator to select records
where this date variable is after the given date (or is the date itself).
>>> from dateutil.relativedelta import relativedelta
>>> under_30 = people["DOB"] >= date.today() - relativedelta(years=30)
Warning
You cannot use two comparison operators at once, for example, to try to pick values bounded within a range on either side. So the following code will not have the desired effect:
>>> summer_holiday_2019 = date(2019, 7, 1) <= bookings["Travel Date"] <= date(2019, 8, 31)
See the Operator chaining section below for more details.
Date-time variable¶
-
class
DateTimeVariable
¶ Subclass of
BaseDateVariable
which represents a FastStats DateTime variable.
Operators¶
Use the <=
operator to select records
where this datetime variable is before the given datetime
(or is the datetime itself).
>>> before_4pm_halloween_2019 = web_visits["wvTime"] <= datetime(
... 2019, 10, 31, 15, 59, 59
... )
Use the >=
operator to select records
where this datetime variable is after the given datetime
(or is the datetime itself).
>>> after_july_2016 = communications["cmCommDt"] >= datetime(2016, 8, 1, 0, 0, 0)
Warning
You cannot use two comparison operators at once, for example, to try to pick values bounded within a range on either side. So the following code will not have the desired effect:
>>> during_webinar = datetime(2019, 6, 3, 15) <= web_visits["wvTime"] <= datetime(2019, 6, 3, 16)
See the Operator chaining section below for more details.
Reference variable¶
-
class
ReferenceVariable
¶ Class which represents a FastStats Reference variable.
(no operators are currently supported for ReferenceVariable
objects)
Operator chaining¶
The following is true for all variable types,
as it also applies to the [not] equals operators ==
, !=
,
but is particularly relevant for those types that support the operators
<=
, >=
, <
, >
:
Numeric, Date, DateTime
Warning
You cannot use two comparison operators at once, for example, to try to pick values bounded within a range on either side. So the following code will not have the desired effect:
>>> born_in_1990 = date(1990, 1, 1) <= people["DOB"] <= date(1990, 12, 31)
Python does normally support this ‘operator chaining’ syntax when using the operators for standard comparison, but it doesn’t work in our situation where the operators have been overloaded for creating selections.
In this example, because of the way operator chaining and short-circuit evaluation work, this ends up being equivalent to just the right-hand part of the expression:
>>> born_in_1990 = people["DOB"] <= date(1990, 12, 31)
Variables missing from Session object¶
If the session initialization process encounters an unfamiliar variable type not supported by the package, it will log a warning using the Python built-in logging module. If a variable from your FastStats system seems to be missing from the Session object, this may be the cause.
To investigate, you will need to run the session initialization process with a
log handler configured to direct the log output to your chosen destination
to inspect the log.
The log message for this warning will begin Failed to initialize variable
.
You can report unsupported variable types by contacting Apteco support: support@apteco.com or creating a GitHub issue — this will help us prioritise developments.