In our last post, we discussed the fifth process in our six-process series on establishing a data security practice, namely implementing data security throughout the enterprise. We’ll discuss the sixth and final process, monitoring a data security practice, in our next post. For now, let’s take a technical detour and discuss the key (pardon the pun) cost and performance drivers for data security.
No information technology (IT) process is free. There is always a cost somewhere: maybe not to the service consumer, yet most definitely to the service provider. Data protection in the form of either encryption or tokenization is no exception: when these technologies are added to an existing IT process, costs go up.
These can be direct costs, such as acquiring more equipment or software licenses, or spending more on operations, such as power and staff. Or the costs can be indirect, for example taking more time to complete an IT process because data protection is CPU intensive. Each data protection operation may take only microseconds. Yet in a large organization protecting petabytes of data, these all add up.
The question then becomes: how can we mitigate these direct and indirect costs? By reducing the number of times we need to run a data protection IT process. We can do this in three ways: protect data at the application level, avoid decryption when possible, and reduce the life cycle cost of key management.
One of the benefits of protecting data at the application level is that protection need be applied only once. Last time we discussed that persistent and pervasive data benefit most from application-level protection, as the data need not be re-identified and protected again each time it moves through the enterprise. This avoids security gaps as well as the cost of repeated protect/unprotect cycles. So while application-level data protection may (or may not) cost more during deployment, this method has a lower operational cost.
So one method of reducing cost and increasing overall system performance is avoiding unnecessary protection. The second method is to avoid unnecessary re-identification, decryption or detokenization, whenever possible.
This raises the question: how is it possible to avoid re-identification? How can we use protected data in a business process? Earlier in this blog series, we discussed performing analytics on protected data. We gave the example of how a health care organization can help prevent deaths by detecting prescription drug abuse. This is a real-world use case employed by a number of such organizations.
What makes this magic possible? Data protection is a function with an inverse: protect a given input and we always get the same unique output. Re-identify that output and we always receive the original input. Thus we maintain relational integrity and may perform relational algebra on protected data, without spending the cost or time to first re-identify that data.
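The property described above can be sketched in a few lines of Python. This is a toy illustration only, not any vendor's method: the `TokenVault` class, the patient identifiers, and the threshold of three prescriptions are all hypothetical. The point is that counting and grouping work on protected values, and re-identification happens only for the few records that need follow-up.

```python
import secrets
from collections import Counter


class TokenVault:
    """Toy reversible, deterministic tokenizer (illustration only)."""

    def __init__(self):
        self._forward = {}  # clear value -> token
        self._reverse = {}  # token -> clear value

    def protect(self, value: str) -> str:
        # The same input always maps to the same token.
        if value not in self._forward:
            token = secrets.token_hex(8)
            while token in self._reverse:  # keep tokens unique
                token = secrets.token_hex(8)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def reidentify(self, token: str) -> str:
        # Inverse of protect(): recover the original value.
        return self._reverse[token]


vault = TokenVault()

# Hypothetical prescription records: (patient id, drug).
prescriptions = [
    ("patient-001", "oxycodone"),
    ("patient-002", "amoxicillin"),
    ("patient-001", "oxycodone"),
    ("patient-001", "oxycodone"),
]

# Protect once, at the point of capture.
protected = [(vault.protect(pid), drug) for pid, drug in prescriptions]

# Analytics run on protected identifiers; no re-identification needed.
counts = Counter(pid for pid, drug in protected if drug == "oxycodone")
flagged = [pid for pid, n in counts.items() if n >= 3]

# Only the flagged records are re-identified, by an authorized process.
flagged_patients = [vault.reidentify(pid) for pid in flagged]
```

Here three protected rows collapse to one flagged token because equal inputs yield equal outputs, which is exactly what keeps joins and group-by operations valid on protected data.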
We can extend the ability to avoid re-identification further via the concept of partial protection. Let’s take a credit card number for example: 5336 4312 3456 4059. Those of you who follow the payment industry will recognize this as a card number using “first six last four” partial protection. This means the first six digits of the card number, 5336 43, and the last four, 4059, are not protected, or in the clear, while the middle six digits, 12 3456, are protected, or cipher text.
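A minimal sketch of “first six last four” protection follows. It assumes a 16-digit card number, and the helper `toy_protect_digits` is a hypothetical stand-in: it derives six replacement digits with HMAC-SHA256, which is deterministic but not reversible. A real deployment would use format-preserving encryption or tokenization for the middle digits so the value can be re-identified. The input card number is made up for illustration.

```python
import hashlib
import hmac

# Assumption: a demo key; real keys come from a key manager.
DEMO_KEY = b"demo-key-not-for-production"


def toy_protect_digits(digits: str, key: bytes = DEMO_KEY) -> str:
    """Map a digit string to another digit string of the same length.

    Stand-in for format-preserving protection. Not reversible, and the
    modulo step introduces a small bias, so this is illustration only.
    """
    mac = hmac.new(key, digits.encode("ascii"), hashlib.sha256).digest()
    number = int.from_bytes(mac, "big") % (10 ** len(digits))
    return str(number).zfill(len(digits))


def partial_protect(pan: str) -> str:
    """Keep the first six and last four digits; protect the middle six."""
    digits = pan.replace(" ", "")
    if len(digits) != 16 or not digits.isdigit():
        raise ValueError("expected a 16-digit card number")
    first6, middle6, last4 = digits[:6], digits[6:12], digits[12:]
    return first6 + toy_protect_digits(middle6) + last4


protected_pan = partial_protect("5336 4398 7654 4059")
print(protected_pan[:6], protected_pan[-4:])  # clear-text portions
```

The output keeps the length and digit-only format of a card number, so downstream systems that validate field shape continue to work without changes.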
How does partial protection avoid cost and increase performance beyond that of full protection? It allows us to remove data sensitivity while still allowing more types of authorized use. One can use the first six digits to perform business analytics, for example answering the question: what percentage of participants in an online promotion hold a MasterCard? (5336 43 is a bank identification number, or BIN, in the MasterCard range.)
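As a rough sketch of that analytics question, the function below classifies partially protected card numbers by their clear-text prefix. It assumes the publicly documented MasterCard prefix ranges (51 through 55, and 2221 through 2720) rather than a full BIN table, and the sample values are invented.

```python
def is_mastercard_prefix(pan: str) -> bool:
    """Check the clear-text leading digits against MasterCard ranges."""
    digits = pan.replace(" ", "")
    return 51 <= int(digits[:2]) <= 55 or 2221 <= int(digits[:4]) <= 2720


def mastercard_share(protected_pans: list[str]) -> float:
    """Fraction of card numbers whose prefix falls in a MasterCard range."""
    if not protected_pans:
        return 0.0
    hits = sum(1 for pan in protected_pans if is_mastercard_prefix(pan))
    return hits / len(protected_pans)


# Hypothetical partially protected values: only the first six and
# last four digits are real; the middle six are cipher text.
promotion_entries = [
    "5336 4312 3456 4059",
    "4111 1198 7654 1111",
    "5500 0011 1111 0004",
    "3782 8200 0000 0005",
]

share = mastercard_share(promotion_entries)
```

The middle six digits are never read, so the analysis runs without a single re-identification call.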
We can also use this partially protected value in customer service operations, for example verifying a card holder via the last four digits of the account number (4059) in an interactive voice response (IVR) authentication system. Yet no one can use the partially protected number to commit fraud or theft!
The third method we can use to reduce cost and increase performance is to employ stateless key management. Encryption requires the use of a security key, one for every different access policy. And there are often thousands of granular access policies running within an enterprise.
Stateless key managers generate encryption keys for authenticated users on demand. If a user needs a key that protected data ten years ago, a stateless key manager has no trouble delivering that key today. And stateless key managers do not have an upper limit on the number of keys that may be in use at any point in time. If an enterprise needs ten million keys to protect its Internet of Things, a stateless key manager can handle the load. And a stateless key manager integrates with existing directory authentication services, avoiding the need for a dedicated staff.
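One way to picture on-demand key generation is key derivation from a single master secret. The sketch below is an assumption for illustration, not a description of how any particular stateless key manager works: it uses HMAC-SHA256 as a simple derivation function, and the `identity`, `policy`, and `period` parameters are hypothetical names for whatever context a real product would bind a key to.

```python
import hashlib
import hmac


def derive_key(master_secret: bytes, identity: str, policy: str, period: str) -> bytes:
    """Derive a 32-byte key from a master secret and a context string.

    Nothing is stored per key: the same inputs always reproduce the
    same key, so a key used years ago can be regenerated on demand.
    """
    # Join fields with a separator that is unlikely to appear in them,
    # so ("ab", "c") and ("a", "bc") produce different contexts.
    context = "\x1f".join((identity, policy, period)).encode("utf-8")
    return hmac.new(master_secret, context, hashlib.sha256).digest()


# Assumption: in practice the master secret lives in protected hardware.
master = b"demo-master-secret-not-for-production"

key_then = derive_key(master, "billing-service", "cardholder-data", "2015")
key_again = derive_key(master, "billing-service", "cardholder-data", "2015")
key_now = derive_key(master, "billing-service", "cardholder-data", "2025")
```

Because keys are computed rather than stored, the number of distinct keys is bounded only by the number of distinct contexts, and the only secret that needs backup is the master secret itself.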
Contrast this with an ordinary, or stateful, key manager: these generate keys first and then store, retrieve, and forward those keys to applications. Stateful key managers cannot regenerate the same key twice: if one deletes a key, it may never be recovered. These key managers have a hard upper limit on the number of keys that may be employed simultaneously, and the best top out in the single-digit millions. More importantly, most stateful key managers require dedicated staff and do not integrate with enterprise directory services.
Keep in mind that judicious use of data security does reduce its cost and increase its performance. Avoiding re-identification, employing partial protection, and focusing on key management cost of ownership are long-term methods for efficient implementation of a data security practice.
What are your thoughts on these techniques? Does protecting only the middle six digits of a credit card number offer sufficient security? Why or why not? We’d love to hear your thoughts in the comments below.