Sidebar: The Pseudo-Parallel Approach to Purging Non-Partitioned Data

Parallelizing operations is easy in Oracle. SQL Server doesn't offer this capability, but you can run parallel operations.

Vic Newell

December 19, 2010

2 Min Read
ITPro Today logo in a gray background | ITPro Today

The Pseudo-Parallel Approach to Purging Non-Partitioned Data

When you have related tables that need to be purged, particularly when foreign key constraints are involved, you typically need to run the purges for each table serially so that the data is deleted from the first table, then the second table, and so on, in the correct order. For small volumes of data, running purges serially is usually acceptable. However, for large volumes, you might need to break the work into pieces and purge the pieces concurrently (i.e., in parallel) where possible.

In Oracle, parallelizing operations is easy. SQL Server doesn't offer this capability, but you can run pseudo, or "fake," parallel operations. The pseudo-parallel approach works when you have only application-enforced foreign key relationships or very limited numbers of database-enforced foreign key relationships between the tables. In some systems, you might have a mix of both application-enforced and database enforced constraints, which would also allow you to use this approach.

The code in Listing 1 demonstrates the pseudo-parallel approach. (You can download this code by clicking the Download the Code Here button near the top of the page.) To run this code, you need to schedule multiple SQL Server jobs to run at the same time, with each job running a particular piece number.

With this design, you can greatly scale the purge process out rather than scaling it up—that is, you can spread the work across multiple threads of execution as opposed to throwing more and bigger hardware at the process. However, the pseudo-parallel approach has limited applicability due to the requirement that you must be able to split the DELETE operations into separate, unrelated, and preferably restartable processes. This approach doesn't work when there are significant numbers of foreign key relationships involved and the tables being purged don't stand alone without the presence of hard foreign key constraints, even though a given group of tables might all be part of a single database subsystem.

When you can deploy this method, though, you can achieve extremely high volume purges in a very short period of time. I've achieved delete throughput in the tens of millions of rows per hour with this design.

 

 

Web Listing 1: The Pseudo-Parallel Approach

CREATE PROCEDURE MyProc (@ParallelMode Bit = 0, @PieceNumber Tinyint = 0)ASSET NOCOUNT ONIF (@ParallelMode = 0)  BEGIN    DELETE FROM Table1 WHERE SomeField = SomeValue    DELETE FROM ChildTable WHERE SomeField = SomeValue    DELETE FROM ParentTable WHERE SomeField = SomeValue  ENDIF (@ParallelMode = 1)  BEGIN    IF (@PieceNumber = 1)      BEGIN        DELETE FROM Table1 WHERE SomeField = SomeValue      END      IF (@PieceNumber = 2)          BEGIN            DELETE FROM ChildTable WHERE SomeField = SomeValue            DELETE FROM ParentTable WHERE SomeField = SomeValue          END  END

 

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like