Enforcing Third Normal Form in Database Design

In our previous post introducing this database security blog series, we identified six major topics of discussion. The first of which is enforcing third normal form (3NF) in database design. This is a fundamental data model necessary to guarantee relational integrity when we encrypt data items. Especially if one wants to rotate keys within a database.

Enforcing Third Normal Form in Database Design.pngOf course before we can talk about 3NF, we must talk about its predecessors: first normal form (1NF) and second normal form (2NF). In general, database normalization is an iterative process of organizing a data model.

For example, to be considered compliant with 2NF, a database must first be compliant with 1NF. The most normalized form is 6NF, so 3NF is a low standard of compliance. While discussing the requirements for 1NF, 2NF, and 3NF, please reference the above-linked Wikipedia articles for diagrams and more examples.

Let’s start with 1NF compliance: this means the domain (or column) of each attribute (or data item) contains only atomic (indivisible) values, and the value of each attribute contains only a single value from that domain.[1] Or in other words, being compliant with 1NF means all columns must contain exactly the same type with exactly one data item of that type in each row. Non-compliant examples include a column that could contain either an email or a telephone (not the same type) or a telephone column containing any a comma separated list of values (not atomic).

Moving on to 2NF compliance: this is defined as 1NF compliance and an additional constraint that there is no non-prime (not part of a key) attribute that is functionally dependent on any proper subset of any candidate key of the relation.[2] Or in other words, in 2NF every non-key column must be dependent on all keys, including the primary key or another candidate key.

A non-compliant example would be storing shipping addresses and line items in the same order table, with the shipping addresses dependent on the order number and the line items dependent on the order number and time stamp. Since the shipping address is identified by a subset (just one column) of the same key used by line items (both columns), this arrangement is not 2NF compliant.

Designing a database to be 2NF compliant does save on storage space. It also makes the database easier to query. However 3NF compliance is necessary when we encrypt data items for security. And 3NF compliance means the database design is 2NF compliant—as well as 1NF compliant.

So 3NF compliance requires the additional constraint that any non-prime attribute may not be dependent on any other non-prime attribute.[3] In practice, this means if a database is 3NF compliant, the primary key does not contain any sensitive information. Mathematically speaking, that’s not part of 3NF’s definition. Yet practically speaking, it is.

For example, let’s consider a customer table with the following attributes: customer name, credit card number, driver license, social security number. In the 2NF case, all of those attributes depend on at least one: customer name.

Even though in practice we could use the credit card number as a primary key to uniquely identify the row, 3NF compliance says we cannot. Then we would have a situation where the driver’s license number would be related to the customer name. Since both of those attributes are non-prime, we would be violating 3NF.

How do we comply with 3NF? Introduce another value that will uniquely identify each row yet has no relation to any other attribute in the table. We can then have all these other non-prime attributes depend on each other. Yet they would be independent—and have nothing to do with—the primary key.

This is why 3NF compliance is so important to database security as it removes all sensitive information from primary key columns. But why is this essential for database security? What’s wrong if we use, say, a driver’s license number as a primary key?

If our database design is not 3NF compliant, we create a problem applying encryption on a primary key column. Encryption does increase security while maintaining referential integrity, so that’s a solid practice. Yet the problem comes when a security policy calls for encryption key rotation: a single input value would be transformed into more than one output. And this, friends, breaks referential integrity.

Continuing our example, if we use a driver’s license as a primary key, encrypt it, and then rotate keys every year, we would break all foreign key relations to that primary key. However if we use 3NF, the primary key is not sensitive, would not be encrypted, and thus would never be subject to rotation.

Normally those of us in the database security practice do not design data models. However we must maintain awareness of how relational databases operate and insist compliance with 3NF. At least then we can rotate keys used to encrypt data items without breaking relational integrity.

In our next post, we will discuss choosing a database protection strategy from among the more prevalent technologies. Meanwhile what is your experience with rotating encryption keys in a database? How did (or does) that work for you? Please leave your thoughts on this discussion in the comments section below.


[1] Elmasri, Ramez, and Shamkant Navathe. Fundamentals of database systems. Addison-Wesley Publishing Company, 2010.

[2] Codd, Edgar F. "Further normalization of the data base relational model." Data base systems (1972): 33-64.

[3] Ibid.


Data security and encryption