5. Specifying selection values

In this part of the tutorial we’ll learn other ways of specifying the values within selections.

Excluding values from selections

So far we’ve always specified the values we want to include in our selections. But just as in the FastStats application, we can also choose what values we want to exclude instead. We do this using the Python ‘not equal’ operator != :

>>> uk_only = households["Region"] != "14"
>>> uk_only.count()
741572

(14 is the selector code for the Channel Islands region, so this a selection of all households in the UK ‘proper’)

Hint

Just as with the == operator seen in previous examples, instead of performing a comparison, the != operator has been ‘overloaded’ to create a selection but with the include flag set to False — to exclude the given value.

>>> type(uk_only)
<class 'apteco.query.SelectorClause'>
>>> uk_only.values
['14']
>>> uk_only.include
False

We can also specify multiple values to exclude:

>>> england = households["Region"] != ["10", "11", "12", "14"]
>>> england.count()
627550

(10, 11, 12, 14 are the selector codes for Scotland, Wales, Northern Ireland & Channel Islands)

You can use the != operator everywhere that you can use == .

Specifying ranges of values

As well as choosing one or more specific values, some variables also allow you to choose a whole range of values using the Python operators <= and >= :

Numeric ranges

>>> at_least_2k = bookings["Cost"] >= 2000
>>> at_least_2k.count()
53267
>>> low_profit = bookings["Profit"] <= 25
>>> low_profit.count()
211328

Numeric variables also support the operators < and > for selecting a range of values, but not include the boundary value itself.

Note

Unlike the == and != operators, the other comparison operators only accept a single value.

Date & DateTime ranges

We can use comparison operators to create a selection with a range of dates or times:

>>> from datetime import date, datetime
>>> bookings_before_2019 = bookings["Booking Date"] <= date(2018, 12, 31)
>>> bookings_before_2019.count()
972439
>>> web_visits = my_session.tables["WebVisits"]
>>> website_launch = datetime(2019, 5, 11, 15, 12, 36)
>>> visits_to_new_site = web_visits["Web Visit Time"] >= website_launch
>>> visits_to_new_site.count()
133564

Because Date and DateTime variables use datetime.date and datetime.datetime objects, we can take advantage of functionality available for working with these. For example, using the popular dateutil package:

>>> from dateutil.relativedelta import relativedelta
>>> under_30 = people["DOB"] >= date.today() - relativedelta(years=30)
>>> under_30.count()
207737

Text Ranges

Using the comparison operators with a Text variable allows you to select values that are alphabetically earlier or later than a given value.

>>> second_half_of_alphabet = people["Surname"] >= "N"
>>> second_half_of_alphabet.count()
410954

Restrictions on using comparison operators

All of the examples above specify an unbounded range of values — it is only limited by one value, and allows all values above or below this (depending on the operator used).

Warning

You cannot use two comparison operators at once, for example, to try to pick values bounded within a range on either side. So the following code will not have the desired effect:

>>> born_in_1990 = date(1990, 1, 1) <= people["DOB"] <= date(1990, 12, 31)

See the warning about Operator chaining in the variables reference guide for more information about this.

We will learn in the next part how to achieve the desired result by joining more than one selection together instead.