6. Combining selections using logic

In this part of the tutorial we’ll learn how to build more complex selections by joining separate parts together, and how to use various types of logic in the process.

Joining two selections together

So far we’ve only looked at selections with one condition based on a single variable, but we can combine these to create selections with multiple conditions.

We simply use the Python bitwise and operator & to combine two parts into one selection:

>>> sweden = bookings["Destination"] == "29"
>>> at_least_2k = bookings["Cost"] >= 2000
>>> expensive_sweden = sweden & at_least_2k
>>> expensive_sweden.count()
632

This works just like the AND logic in a selection tree in the FastStats application: a record (in this case, a booking) must match both conditions to be selected.

In a similar way, you can use the bitwise or operator | to combine selections using OR logic:

>>> student = people["Occupation"] == "4"
>>> under_21 = people["DOB"] >= date.today() - relativedelta(years=21)
>>> eligible_for_discount = student | under_21
>>> eligible_for_discount.count()
188364

Note

Be sure to use the & and | operators, i.e. the ampersand and vertical bar symbols. Using the and or or keywords will not work.

Returning to the example from the previous part, here’s how we can create a selection with values in a range bounded on both sides:

>>> born_in_1990 = (
...     (people["DOB"] >= date(1990, 1, 1))
...     & (people["DOB"] <= date(1990, 12, 31))
... )

Negating a clause

To exclude a selection from the final count, you can use the bitwise not operator ~ to apply NOT logic to it:

>>> high_earners = people["Income"] == (f"{i:02}" for i in range(7, 12))
>>> low_mid_earners = ~high_earners
>>> low_mid_earners.count()
1149439

Of course, we could have made the same selection by just selecting the low-mid income codes directly or by swapping the == operator for !=. But the ~ is particularly useful to apply to a selection made of more than one part, rather than trying to reverse the overall logic by changing the individual parts:

>>> eligible_for_discount = student | under_21
>>> pay_full_price = ~eligible_for_discount
>>> pay_full_price.count()
968189

Note

Be sure to use ~ operator, i.e. the tilde symbol. Using the not keyword will not work.

Joining across different tables

The previous examples joined a pair of clauses from the same table, but you can also combine clauses from different tables:

>>> high_earners = people["Income"] == (f"{i:02}" for i in range(7, 12))  # people selection
>>> at_least_2k = bookings["Cost"] >= 2000  # bookings selection
>>> high_affordability = high_earners | at_least_2k  # will resolve to people
>>> high_affordability.count()
56096
>>> high_affordability.table_name
'People'

When combining clauses from different tables the resolve table of the resulting selection is determined by the left-most part. In this example, that’s the high_earners selection, which is on the People table.

py-apteco automatically adapts other parts of the selection to match this, by inserting the required table changes using the ANY or THE operations familiar from FastStats selection trees. So the selection in this example comprises 56,096 people who:

  • either have income of £60k+

  • or have made ANY booking costing at least £2k

Although the automatic table changes are often what we want, we are also able to manually change the resolve table of a selection, which we’ll learn about in the next part.