Test Data Management? Didn’t We Solve This Years Ago?

in CyberRes

What is Test Data Management (TDM)?

Since the first applications were developed, the need for a test database was recognized so that development operations did not impact production databases and systems.

Traditionally, TDM was about creating a relationally intact, reliable copy or subset of production data, or data very similar to it, that application and systems developers could use for different test use cases. The primary reason for subsetting was to reduce the size of the production dataset in order to save on storage and improve performance.

At its core, Test Data Management is the ability to create non-production data that is a realistic simulation of the actual production data.
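As a rough sketch of what "relationally intact" means in practice, the toy Python example below subsets a parent table and copies only the child rows that reference the sampled parents, so no foreign key in the test copy dangles. The customers/orders schema is a hypothetical illustration, not taken from any particular product:

```python
import sqlite3

# Hypothetical schema: customers(id, name, email), orders(id, customer_id, total).
# Sample a handful of customers, then pull only the orders that belong to them,
# so referential integrity survives in the smaller test database.

def subset_database(src: sqlite3.Connection, dst: sqlite3.Connection, sample_size: int) -> None:
    dst.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER REFERENCES customers(id),
                             total REAL);
    """)
    # Deterministic sample of parent rows (real tools offer smarter sampling).
    customers = src.execute(
        "SELECT id, name, email FROM customers ORDER BY id LIMIT ?", (sample_size,)
    ).fetchall()
    dst.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)

    # Only child rows whose parent made it into the subset.
    ids = [row[0] for row in customers]
    placeholders = ",".join("?" * len(ids))
    orders = src.execute(
        f"SELECT id, customer_id, total FROM orders WHERE customer_id IN ({placeholders})",
        ids,
    ).fetchall()
    dst.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    dst.commit()
```

Real TDM tools walk an entire foreign-key graph this way; the sketch shows the core idea on a single parent/child pair.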

Historically, these projects were very tactical and ranked low on the IT project list. They typically ran entirely within IT, between database administrators and the testing/quality assurance functions, without any oversight from regulatory bodies or auditors. Since there was only a “carrot” and no “stick,” many organizations chose to build clunky, manual in-house solutions rather than invest in the state-of-the-art solutions of the time.

What are the challenges?

In addition to a perceived low value around investing in TDM, there were several other challenges.

The first: the need to create a realistic dataset that represents actual production data and exhibits the same level of data integrity. Nothing irritates developers more than debugging good code that fails because of poor data quality rather than any defect in the code itself.

Also, under that constraint, how do you create a realistic, “right-size” dataset? And because these are test databases, how do you keep up with the creation and refresh rates required as development and release cycles accelerate year over year, with more companies adopting agile DevOps processes? Given the historically low investment in these “non-strategic” projects, many processes remain highly manual, difficult to repeat, and lacking in automation.

As a result, it is not unusual to hear of production database copies, perhaps containing personal and sensitive data, being cloned and used as testing databases. Even where the “critical” production systems had some protection, many other systems, marketing for example, were largely ignored.

Lastly, the majority of energy and focus goes to protecting production databases. Since test data is considered “internal use,” most organizations de-emphasize or lower their security posture for non-production systems. To exacerbate the issue, there are often many copies of a single production database. For core systems, 20+ copies of production to support development, quality assurance, load testing, model office, and education, as well as reporting and analytics, is not unheard of. So one lonely little production database with some sensitive content can spread like crazy across an enterprise.

What Changed?

It would be hard to argue against the fact that breaches are rising sharply. Further, a significant share of breaches happen on non-production systems, as the examples below show:

  • [Breach] involved “an isolated, self-enclosed demo lab in Australia – not connected to Symantec’s corporate network – used to [demonstrate] various Symantec security solutions and how they work together” [The Guardian]
  • … cosmetics giant Estée Lauder exposed its database containing over 440 million records on the internet. As per the company, the database was from an “education platform,” [StealthLabs.com]
  • Parenting retailer Kiddicare has suffered a data breach that exposed the names, addresses and telephone numbers of some of its customers … data had been taken from a version of its website set up for testing purposes. [BBC News]

These are in addition to countless breaches involving “unsecured servers.”

Data privacy regulations, including the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), have refocused attention on Test Data Management. These laws have put pressure on organizations to ensure that all personal and sensitive content has been identified and removed from test data. 

Therefore, in addition to the “carrot,” which has always been there, data privacy regulations now provide the “stick.” And a hefty stick it can be: non-compliance with regulations such as the GDPR and CCPA carries significant penalties.

Further, the changing regulatory landscape means TDM is no longer just an IT-only function. Data privacy mandates have raised the stakes considerably for organizations that fail to apply due diligence standards to protecting personal and sensitive data.

So chief compliance officers, legal counsel, chief data officers, and other senior executives have all begun paying closer attention to data risks, in both production and non-production environments. To demonstrate compliance with organizational standards for data protection, TDM projects and processes cannot exist in their current form. They must evolve.

Integrating the identification and remediation of personal and sensitive data in test data subsets with automated, data-centric protection is now a requirement for effective TDM. This ensures that data privacy compliance is built in.

To summarize, the new privacy laws demand a modern approach to TDM.

The Cyber Resilience Approach

What’s needed in today’s climate is a complete, reliable, adaptive, and automated framework to navigate the new world of data protection and privacy. The combination of Voltage Structured Data Manager (SDM) and Voltage SecureData (SD) provides exactly that.

The first piece: you have to “know what you don’t know.” That means analyzing every database and every dataset for personal and sensitive content.
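As a hedged illustration of what such discovery looks like at its simplest, the sketch below scans sampled rows with a few regular-expression patterns. Real discovery tools use far richer classifiers (dictionaries, checksums, column-name heuristics, machine learning); the patterns and column names here are illustrative assumptions only:

```python
import re

# Illustrative PII patterns only; production scanners go far beyond regexes.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan_rows(rows, columns):
    """Return {column_name: set(pii_labels)} found anywhere in the sampled rows."""
    findings = {}
    for row in rows:
        for col, value in zip(columns, row):
            if not isinstance(value, str):
                continue
            for label, pattern in PATTERNS.items():
                if pattern.search(value):
                    findings.setdefault(col, set()).add(label)
    return findings
```

A scan like this, run over a sample of every table, tells you which columns need protection before any test copy is created.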

Once you have done that discovery, you need an automated process for identifying and protecting personal and sensitive content using a predefined set of rules that would act on the content and protect it based on what kind of classification it falls under. The use of format-preserving encryption, anonymization, masking, and other techniques to prevent personal and sensitive content from getting exposed in the development and testing stages is now more important than ever.
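Voltage SecureData implements standards-based format-preserving encryption; the toy sketch below is not FPE and is not reversible, but it illustrates the weaker idea of deterministic, format-preserving masking: every digit or letter is replaced by another of the same class, so the masked test value keeps the exact shape of the production value:

```python
import hashlib
import hmac
import string

SECRET = b"demo-key"  # illustration only; never hard-code keys in real systems

def mask_preserving_format(value: str, key: bytes = SECRET) -> str:
    """Deterministically replace each digit/letter with another of the same class.

    This is NOT format-preserving encryption (there is no decryption); it only
    demonstrates that masked test data can keep the format of production data.
    """
    out = []
    for i, ch in enumerate(value):
        if ch.isdigit() or ch.isalpha():
            # Position-dependent keyed hash makes the mapping deterministic per key.
            digest = hmac.new(key, f"{i}:{ch}".encode(), hashlib.sha256).digest()
            if ch.isdigit():
                out.append(string.digits[digest[0] % 10])
            elif ch.isupper():
                out.append(string.ascii_uppercase[digest[0] % 26])
            else:
                out.append(string.ascii_lowercase[digest[0] % 26])
        else:
            out.append(ch)  # keep separators like '-' so the format survives
    return "".join(out)
```

Because the output is deterministic for a given key, joins across masked tables still line up, one of the main reasons format-preserving techniques matter for test data.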

Our capabilities therefore allow you to manage your non-production data from identification all the way through remediation in an automated, repeatable process. Additionally, dashboards and audit reports give you a defensible record, as supporting evidence, of what you found, what the associated risk was, how it was prioritized, and how it was remediated.

So, while this challenge was addressed many years ago, the solutions of that era did not understand or take into account the size and scope of today’s data privacy regulations. Every organization should therefore conduct a thorough review of its existing processes and develop new approaches where needed.
