Sidebar: The Pseudo-Parallel Approach to Purging Non-Partitioned Data
Parallelizing operations is easy in Oracle. SQL Server doesn't offer this capability, but you can run parallel operations.
December 19, 2010
The Pseudo-Parallel Approach to Purging Non-Partitioned Data
When you have related tables that need to be purged, particularly when foreign key constraints are involved, you typically need to run the purges for each table serially so that the data is deleted from the first table, then the second table, and so on, in the correct order. For small volumes of data, running purges serially is usually acceptable. However, for large volumes, you might need to break the work into pieces and purge the pieces concurrently (i.e., in parallel) where possible.
In Oracle, parallelizing operations is easy. SQL Server doesn't offer this capability, but you can run pseudo, or "fake," parallel operations. The pseudo-parallel approach works when you have only application-enforced foreign key relationships or very limited numbers of database-enforced foreign key relationships between the tables. In some systems, you might have a mix of both application-enforced and database enforced constraints, which would also allow you to use this approach.
The code in Listing 1 demonstrates the pseudo-parallel approach. (You can download this code by clicking the Download the Code Here button near the top of the page.) To run this code, you need to schedule multiple SQL Server jobs to run at the same time, with each job running a particular piece number.
With this design, you can greatly scale the purge process out rather than scaling it up—that is, you can spread the work across multiple threads of execution as opposed to throwing more and bigger hardware at the process. However, the pseudo-parallel approach has limited applicability due to the requirement that you must be able to split the DELETE operations into separate, unrelated, and preferably restartable processes. This approach doesn't work when there are significant numbers of foreign key relationships involved and the tables being purged don't stand alone without the presence of hard foreign key constraints, even though a given group of tables might all be part of a single database subsystem.
When you can deploy this method, though, you can achieve extremely high volume purges in a very short period of time. I've achieved delete throughput in the tens of millions of rows per hour with this design.
Web Listing 1: The Pseudo-Parallel Approach
CREATE PROCEDURE MyProc (@ParallelMode Bit = 0, @PieceNumber Tinyint = 0)ASSET NOCOUNT ONIF (@ParallelMode = 0) BEGIN DELETE FROM Table1 WHERE SomeField = SomeValue DELETE FROM ChildTable WHERE SomeField = SomeValue DELETE FROM ParentTable WHERE SomeField = SomeValue ENDIF (@ParallelMode = 1) BEGIN IF (@PieceNumber = 1) BEGIN DELETE FROM Table1 WHERE SomeField = SomeValue END IF (@PieceNumber = 2) BEGIN DELETE FROM ChildTable WHERE SomeField = SomeValue DELETE FROM ParentTable WHERE SomeField = SomeValue END END
About the Author
You May Also Like