“Sometimes you have to go through something else to find what you're looking for.”
–Robert Douglas Genn, Canadian landscape painter, May 15, 1936-May 27, 2014
In our last post on alternative column encodings, we discussed how storing clear-text approximations of sensitive data makes searching easier and faster. We looked at encoding names with Soundex and encoding a person’s birth year (or year and quarter) as examples. These alternate encodings are “loss-full” encodings that offer significant analytic value and do not “leak” PII data.[i]
Continuing the topic of database searching, let’s consider another possibility: can one perform range searches, that is searching for a data item between two values, on protected data? If the range is continuous, for example numeric values between fifty and one hundred, unfortunately the answer is no. Yet if the range is discrete, for example dates between January 1 and June 30 of this year, the answer is yes. If one, as Genn suggests, searches by going through something else.
That something else? A set of discrete protected values. When performing set searches, we can use the SQL[ii] WHERE IN clause to find what we’re looking for. Let’s compare two searches yielding the exact same output: the first will be a continuous range search using an inequality. And the second will be a discrete range search using a set.
Here’s the example problem: suppose we had a warehouse issue for all orders shipped between May 31 and June 5, 2019 inclusive. We can discover the impacted order numbers by running an SQL query. First, let’s express this as an inequality search:
SELECT order_number FROM shipments
WHERE ship_date >= '05/31/2019'
AND ship_date <= '06/05/2019';
Here we’re asking for a list of all order numbers from the shipments table where the date is between two values, an inequality. Now let’s express this same query as a set search:
SELECT order_number FROM shipments
WHERE ship_date IN ('05/31/2019', '06/01/2019', '06/02/2019',
'06/03/2019', '06/04/2019', '06/05/2019');
This is the same query except we are asking for rows where the ship date matches any one of six discrete values, or elements, in the set of needed values. We end up with the same outcome, however ( (again quoting Denn) by going through something else to get there.
Let that soak in for a while. Absorb the difference between a continuous query, using an inequality, and a discrete query, using a set. Get comfortable with the concept of going through something else. Then when these examples soak in, let’s consider a more general case.
We can specify an algorithm for performing discrete range searches on protected data using a user defined function, such as the following (expressed in pseudo code):
let searchValues = list(plain text values to be searched);
for all searchValues
let searchValue[i] = protect(searchValue[i]);
let columnNames = list(columns to be displayed);
let tableNames = list(tables to be searched);
let searchColumn = column to be searched;
let resultSet = SELECT columnNames
WHERE searchColumn IN (searchValues);
Now how about some application examples? Well, let’s save those for our next post!
Meanwhile, what is your experience with database operations? Have you run across the concept of searching on continuous vs. discrete values before? Please post your thoughts in the comments section below.
[ii] Structured Query Language
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.