Retaining Relational Integrity in the Data Warehouse
Find out why Michelle thinks you shouldn't enforce referential integrity at the data warehouse level.
June 2, 2008
Retaining relational integrity between the fact table and its dimensions is very important. The data warehouse can become a trusted repository for enterprise data only if the data is clean and its integrity is intact. If, because of operational problems, you can't justify leaving the fact table foreign key constraints in place, plan to execute a periodic program that searches for violations of referential integrity and reports on them, so you can remedy the situation. It’s one thing to have a dimension record that’s not reflected in a fact table—that’s pretty typical. But it’s another thing to have a foreign key value in a fact table that has no associated record in a dimension.
Related: Revisiting How to Avoid Referential Integrity Errors
The valid orphan child record condition is unusual, even in a transactional database. In a data warehouse, it can cause problems for business intelligence (BI) software or when trying to build cubes for SQL Server Analysis Services (SSAS). Your best approach in this case, if orphan child records in the fact table are valid, is to create a record in the associated dimension with a value of unknown and during the extraction, transformation and loading (ETL) process, set the foreign key value in the fact table to point to the unknown dimension record.
Not Enforcing Referenctial Integrity
My opinion might rub some readers the wrong way because many shops have ETL strategies that enforce referential integrity. They depend on the ETL applications in much of the same way as transactional applications enforce referential integrity in an OLTP database. However, as in the OLTP environment, all it takes is a single ETL process that fails to enforce referential integrity, and you’ve got the scenario I described in the preceding paragraph.
The other, more valid argument for not enforcing referential integrity at the data warehouse level is that ETL processes sometimes have to load data in such a way as to violate referential integrity. In such a situation, there’s no alternative to letting the ETL process enforce referential integrity. In this case, you must have strong standards and control over your ETL processes to prevent breaches of referential integrity.
Related: 5 Things to Consider for SQL Server Data Warehouse DBAs
About the Author
You May Also Like