What is Test Data Management (TDM)?

In my previous blog, The Importance of Test Data Management in the Modern Era, I discussed the need for test data generation and consumption in today’s DevOps and cloud era. In this blog I will focus on what test data management (TDM) is and how compliance and regulatory requirements play a major role.

The creation and consumption of test data by Quality Assurance is a mature process. However, the introduction of Agile and DevOps-based development methodologies, together with the automated and manual application-testing tools provided by cloud platforms, has made it an evolving one. Implementing a coherent test data strategy has never been more important.

Realising the benefits of Agile & DevOps-based development requires you to deliver the right data to the right place, at the right time. Yet developers and testers can spend as much as half of their time searching for the correct data, making testing a bottleneck to reducing costs and delivering valuable software early.

Test Data Management teams need to deliver: 

  • The right quality of data – Application functionality rarely changes and most test data sets are reused, so it makes sense to generate the data once and use it many times. The quality of test data depends on the schemas, the tables, and the associations between primary and foreign keys. Applications typically fetch data from more than one schema or table, so when data protection is applied to a primary key, the same transformation must be applied to the corresponding foreign keys to maintain referential integrity.
  • The right amount of data – Production databases are exceptionally large and contain both application and log information. To determine the correct amount of data to extract as a subset, consider the following:
    • Functionality of the application under test
    • Schema and tables from which the application functionality is fetching data
    • The number of test cases to be executed, which determines how much data is required to run them.
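
These considerations can be sketched with a toy subsetting example in Python using the standard-library sqlite3 module. The customers/orders schema and the chosen IDs are purely illustrative, not part of any particular product:

```python
import sqlite3

# Toy "production" schema: customers and their orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         total REAL);
    INSERT INTO customers VALUES (1,'Ana'),(2,'Bo'),(3,'Cy'),(4,'Di');
    INSERT INTO orders VALUES (10,1,9.99),(11,1,5.00),(12,3,42.0),(13,4,7.5);
""")

def subset(conn, customer_ids):
    """Extract just the rows the test cases need, keeping the parent
    rows (customers) for every extracted child row (orders)."""
    marks = ",".join("?" * len(customer_ids))
    custs = conn.execute(
        f"SELECT * FROM customers WHERE id IN ({marks}) ORDER BY id",
        customer_ids).fetchall()
    ords = conn.execute(
        f"SELECT * FROM orders WHERE customer_id IN ({marks}) ORDER BY id",
        customer_ids).fetchall()
    return custs, ords

# Suppose the test cases only exercise customers 1 and 3.
custs, ords = subset(conn, [1, 3])
print(custs)  # [(1, 'Ana'), (3, 'Cy')]
print(ords)   # [(10, 1, 9.99), (11, 1, 5.0), (12, 3, 42.0)]
```

In practice the driving set of IDs would come from the functionality under test, and every child table reachable through foreign keys would be walked the same way.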

Often, rare occurrences of data are quickly ‘burned’ for testing. This necessitates a complete database refresh request. This is costly and time-consuming. Data cloning allows you to extract just the specific attributes you need and ‘clone’ them multiple times, ensuring that they are always available for use. You can also significantly reduce the time required to deliver data into development.
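
The referential-integrity requirement mentioned above — mask a key the same way wherever it appears — can be illustrated with a minimal Python sketch. The HMAC-based mask, the key, and the record layout are assumptions for the example, not a description of any particular product:

```python
import hashlib
import hmac

SECRET = b"demo-masking-key"  # illustrative key, not a real secret

def mask(value: str, length: int = 8) -> str:
    """Deterministic, non-reversible mask: the same input always
    produces the same output, so primary/foreign key joins survive."""
    digest = hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()
    return digest[:length]

customers = [{"cust_id": "C1001", "name": "Ana"}]
orders = [{"order_id": "O1", "cust_id": "C1001"}]

# Apply the SAME transformation to the primary key and the foreign key.
masked_customers = [{**r, "cust_id": mask(r["cust_id"])} for r in customers]
masked_orders = [{**r, "cust_id": mask(r["cust_id"])} for r in orders]

# The join key still matches after masking:
assert masked_customers[0]["cust_id"] == masked_orders[0]["cust_id"]
```

If the primary key were masked with one transformation and the foreign key with another (or with fresh random values), every join in the test database would silently break.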

  • The right format of sensitive data – There are numerous ways of protecting sensitive data while creating test data from production databases.
    • Static Masking – replacing sensitive data with random alphabetic, alphanumeric, or fixed values. Reversible masking requires a dictionary or other storage to recover the original values, whereas non-reversible masking uses algorithms that generate unique random values to avoid duplicates in the masked data.
    • Dynamic Masking – because this type of masking is applied when sending data to untrusted applications or displaying it on screen, it is not useful for generating test data.
    • Tokenization – primarily used to replace numeric values such as credit card numbers, Social Security Numbers (SSN), or Permanent Account Numbers (PAN) with random values. Vault-based tokenization requires storage roughly equal in size to the data being tokenized, whereas vault-less tokenization uses seed-based algorithms that always generate the same token for a given value. This anonymization preserves the numeric format of the sensitive data.
    • Encryption – this method protects data in transit: data is encrypted while being transferred from source to destination. Using conventional encryption to generate test data is not recommended, because the encrypted data is binary and would require schema changes to store in a database. However, CyberRes Voltage’s patented Format-Preserving Encryption (FPE) ensures that encrypted data follows the format of the source data (alphabetic, numeric, case-sensitive), which allows test data to be created securely and in compliance with regulatory requirements.
  • Data available at the right time – Correcting data causes extensive rework and critical delays in testing. It also means that downstream teams sit idle, waiting for the right data to be provisioned.
  • The right tools – whether data is cloned, extracted, or copied from a production database, or produced manually with home-grown scripts or third-party tools, the quality, correctness, and security compliance of the data are critical to test data generation.
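
As a rough illustration of the seed-based, vault-less tokenization described above, here is a toy Python sketch that keeps the all-digit format and length of the original value. It is deterministic and format-preserving, but it is emphatically not cryptographically secure FPE or a NIST-approved algorithm:

```python
import hashlib
import hmac

SEED = b"tdm-demo-seed"  # illustrative seed for vault-less tokenization

def tokenize(number: str) -> str:
    """Seed-based, vault-less tokenization: the same input always yields
    the same token, no lookup vault is needed, and the token keeps the
    all-digit format and length of the original (a toy sketch only)."""
    digest = hmac.new(SEED, number.encode(), hashlib.sha256).digest()
    return "".join(str(b % 10) for b in digest)[: len(number)]

card = "4111111111111111"
token = tokenize(card)
assert len(token) == len(card) and token.isdigit()
assert tokenize(card) == token  # deterministic: no vault required
print(token)
```

Because the token has the same length and character class as the source, it can be stored in the existing schema without changes — the property that makes format-preserving protection attractive for test data.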

Users & Challenges in Test Data Management

In an enterprise there is data, the core of business processes, and there are users of diverse types, classified by how they utilize the data. They include:

  • Users – who work with the data day in and day out to keep the business running. Database administrators, developers, application teams, and QA teams are some of them.
  • Custodians – who own the data and are responsible for the security and sharing of the data internally within the organization or with external organizations depending on the business needs. Typically, this group consists of application owners, CISO/CTO, and organization’s compliance & legal team.
  • Monitoring Authority – who defines compliance and regulatory requirements on how data can be processed within a business process or within a country. This group is represented by regulations such as GDPR, PCI-DSS, and CCPA and the authorities that enforce them.
  • Exploiters or malicious users – unauthorized users who intend to extract the data for personal or financial/material gain. 

There are many methods for generating test data, including manual generation, synthetic data generation and data extraction from the production database, to validate new application functionality and execute various test cases like unit, integration, performance, and system tests. Each method has its own benefits and challenges.
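
Of the methods above, synthetic data generation can be as simple as producing records that are realistic in shape but contain no real personal data. A minimal sketch in Python follows; the field names, value ranges, and sample names are invented for illustration:

```python
import random

def synthetic_customers(n: int, seed: int = 42) -> list:
    """Generate synthetic test records: realistic in structure but
    containing no real personal data (names and domains are made up)."""
    rng = random.Random(seed)  # fixed seed -> reproducible test data
    first = ["Ana", "Bo", "Cy", "Di", "Ed"]
    last = ["Reyes", "Lund", "Okafor", "Tan"]
    rows = []
    for i in range(n):
        rows.append({
            "id": 1000 + i,
            "name": f"{rng.choice(first)} {rng.choice(last)}",
            "email": f"user{i}@example.test",
            "balance": round(rng.uniform(0, 5000), 2),
        })
    return rows

rows = synthetic_customers(3)
assert len(rows) == 3 and rows[0]["id"] == 1000
assert synthetic_customers(3) == rows  # same seed, same data
```

The benefit is that no sensitive data ever leaves production; the challenge, as noted, is making the synthetic values statistically and referentially faithful enough to exercise real application behaviour.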

Data vs Test Data Compliance

Our CyberRes solution for Test Data Creation and Management is based on extracting data from the production database. This method has its own challenges, such as:

  • Test data needs to mimic production as closely as possible. Poor data quality or improper obfuscation techniques can lengthen development cycles as developers debug issues related to poor data quality.
  • Finding and identifying sensitive and personal data, as well as documenting the remediation of the sensitive and personal data—to provide a defensible position in the event of a breach.
  • Keeping test data “fresh.” With multiple parallel initiatives occurring in rapidly diminishing DevOps release cycles, “out of date” databases and data structures can cause further delays in development efforts.
  • Test data management is typically “disconnected” from DevOps tools and processes. Test data management solutions should not only fit in with your current methodology, but also integrate with the existing tools and processes used by the development and testing community.

Questions on TDM Influenced by Digital Transformation

CyberRes Voltage Point of View

Why do compliance and regulatory requirements need to be considered while generating test data?

Most countries are developing regulations to secure personal, health, payment, and sensitive business data while it is in use, stored, or being processed. Most of these regulations recommend encrypting or anonymizing data depending on the use case. Regulations such as GDPR and CCPA recommend anonymizing sensitive data when generating test data from production databases.

What are the risks of semi-compliance or non-compliance with data privacy regulations?

The evolving enforcement of data privacy regulations mandates that organizations establish the terms of liability and exposure for the sensitive data they handle. The most common approach is to set up a structure of legal actions and financial implications for specific violations.

Why does sensitive data have to be discovered? How can the data discovery and protection in test data creation be automated?

To protect sensitive data, we must know where it is stored; regulatory requirements also mandate that organizations identify it. Micro Focus products automate the search for sensitive data, using AI and ML technology and grammars based on regulatory requirements to speed up discovery across most data repositories, on premises and in the cloud.
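
A greatly simplified version of such discovery can be sketched with regular expressions. Real products use much richer grammars, checksums (e.g. Luhn validation for card numbers), and contextual ML to cut false positives; the patterns below are illustrative only:

```python
import re

# Illustrative patterns only; production discovery tools use far
# richer grammars and validation than these simple regexes.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def discover(text: str) -> dict:
    """Scan free text and report which sensitive-data types appear."""
    return {name: pat.findall(text)
            for name, pat in PATTERNS.items() if pat.findall(text)}

sample = "Contact jane@corp.example, SSN 123-45-6789, card 4111 1111 1111 1111."
hits = discover(sample)
assert {"ssn", "email", "credit_card"} <= set(hits)
```

A scanner like this would be pointed at database columns or exported files, and its findings fed into the masking or tokenization step so nothing sensitive reaches the test environment unprotected.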

How can existing data be used and monetized effectively in generating test data?

Different protection techniques can be applied to sensitive data, based on how and where it will be used. However, if data is encrypted with a protection mechanism, such as Format-Preserving Encryption (FPE), it is easier to share the data with the TDM or analytics team because they can simply extract the data from the database.

How can sensitive data be secured to meet approved encryption or protection standards?

Sensitive data can be protected using any of the anonymization techniques, such as encryption, tokenization, or masking. Encryption is a reversible process in which the original data is recovered by decrypting with an encryption key, whereas tokenization and masking do not support a reversible operation. Micro Focus Format-Preserving Encryption (FPE) is a NIST-approved proprietary encryption algorithm based on AES.

Where in the geography will test data be used? What kind of compliance and regulatory requirements need to be satisfied?

Organizations that are geographically distributed (including their development and testing teams) must comply with regulations governing both where data is stored and where it is consumed. Fortunately, most regulations recommend similar data protection methods (such as encryption or tokenization) for sensitive data, so the same products and tools can be reused across the organization. Micro Focus products can be deployed on premises or in the cloud and integrate with most application platforms.

The CyberRes Voltage product suite provides a best-in-class solution for creating test data from an existing production database in compliance with current security regulations such as GDPR, CCPA, HIPAA, PIPEDA, PDPA (Australia), and more.

CyberRes Voltage Secure and Compliant Test Data Management solution integrates Voltage Structured Data Manager (SDM), and Voltage SecureData. It offers organizations an automated solution to discover and secure sensitive data. 

Thank you for reading my blog. In the next blog in this series, I will focus on our CyberRes SC-TDM architecture and how it generates test data securely, in compliance with regulatory requirements.

Connect With Us:

Join our Voltage Data Privacy and Protection Community. Have technical questions about Data Security and Encryption? Visit the Data Security User Discussion Forum. Keep up with the latest product announcements and Tips & Info about Data Security and Encryption. We’d love to hear your thoughts on this blog. Log in or register to comment below.
