DTS Scripts and Tricks
Put DTS through its paces by applying these common scripting patterns to package development, including invoking a package from within another package, accessing properties the Package Designer doesn't directly expose, and more.
Put DTS through its paces
During the past few months, we've seen numerous newsgroup postings and received several email requests for additional coverage of Data Transformation Services (DTS) scripting techniques within the Package Designer. So let's examine several common scripting patterns that you can apply to package development, including invoking a package from within another package, accessing properties the Package Designer doesn't directly expose, and dynamically changing the source or destination data store. We'll show you how to overcome a subtle but show-stopping gotcha that Visual Basic (VB) developers need to watch for, and how to create a splitter transformation to normalize one row with many repeating columns into a table where each of the repeating columns becomes a row.
Package Within a Package
How many times have you developed a complex transformation in one package, then found that you need the same functionality in another package? In this situation, most developers duplicate the code by creating a new package and copying the original task's scripts and transformations. Although this process works, it creates two separate code bases, resulting in two points of maintenance. Instead, we propose leveraging the original package by calling it from within the second package.
You have two options for invoking a package from another package. The first option is to use an Execute Process task to directly launch DTSRun.exe. The second option is to use an ActiveX Script task to programmatically create, load, and execute the original package. To illustrate these options, we'll create a package called PackageA, which Screen 1 shows. PackageA contains an ActiveX Script task that returns a simple message box. Now let's create a second package, PackageB, and add an Execute Process task to launch PackageA. Screen 2 shows how you invoke the DTSRun executable. You pass the /S parameter to specify which server the package is stored on, the /E parameter to instruct DTSRun to use a trusted connection when connecting to SQL Server, and the /N parameter to identify which package to execute. Executing PackageB, as you see in Screen 3, produces a spawned command process in which PackageA executes and produces a message box containing PackageA's output.
An alternative to using the Execute Process task is using an ActiveX Script task to invoke a package, which gives you greater control by exposing the entire package, through the DTS object model, to customization. For this example, we'll implement the simple ActiveX Script task that Screen 4 shows to load and execute PackageA. This task uses VBScript to directly access the DTS object model. To start, the task declares a local variable, moPackage, to hold the package. Next, the Create Object function instantiates a DTS package object. Then the package's LoadFromSQLServer method loads the definition of PackageA from the SQL Server machine Marble. The task is now ready to execute PackageA. After package execution, the script reclaims the package object's memory by setting its value to nothing and sets the task's completion status to report a successful execution. This time, instead of spawning a new process, PackageA executes within PackageB's memory space and produces the expected message box, which you see in Screen 5.
Looking Under the Covers
We've noted in previous articles that the Package Designer hides many details of using the DTS object model directly. For example, the Package Designer handles the task-to-step associations that control the task execution flow. The Package Designer also hides package methods such as GetExecutionErrorInfo, properties such as StartTime and FinishTime, and details of the underlying task and connection objects. Wouldn't you like to have access to this information, or better, be able to dynamically change the information at runtime? To get access, you need to write directly to the DTS object model. In May and June 1999 ("Unleash the Power of DTS" and "Pump Up the Power of DTS"), we explored how to create packages with VB. You can reuse many of those concepts within the Package Designer.
The key to accessing this information is creating a reference to the package you want to access. To see how this process works, imagine that you've developed a package that imports a data file. Later, your project's requirements change, and now you need to process two more data files with the same format and similar data content. Rather than creating two new packages or adding two additional tasks to your existing package, you can reuse the existing package and dynamically reset the source connection to process the three files.
Let's first build a package to process one file. Listing 1 describes the schema that holds the destination data. Before creating the task, run this script to set up your database environment. This task creates a database called StagingDB and the table SourceData, which will be the destination for the transformation. We'll use the other table, VisitPhysician, in a later example. Now let's create the package.
The package, DataLoader, consists of one SQL Server connection and one bulk insert task. Screen 6 shows the SQL Server connection, StagingDB, which defines the database connection. The bulk insert task, which Screen 7 shows, contains the reference to the data file that we'll later modify. We pointed the source file to the first data set, C:ProjectsArticleSampleData01.dat. For the purpose of this example, the data's format is unimportant. (Subscribers can download the format file, SampleData.fmt, along with all the other sample code, at the link to this article at http://www.sqlmag.com.) At this point, save the package to your local SQL Server and run it once to test it.
To complete the example, a second package, DataLoaderDriver, consists of a single ActiveX Script task. The task, in Listing 2, page 48, builds on the programmatic package-within-a-package technique we discussed earlier. This time, the task goes a step further by referencing the package's Tasks collection, specifically the DTSTask_DTSBulkInsertTask_1 task, and changing its DataFile property to one of the filenames in the FileNames array. With this process, you can quickly and easily iterate through the file list and process each file without having to recode any of the original package. (If you try this example, be sure to change the pathnames of the data files and the SQL Server server name in the script to match your environment.) To extend this example, you can read the file list from a database, the Active Directory, an external file, or even the Registry. Where the list originates is irrelevant. You need only to understand how to modify the package's properties from within another package.
Another variation on this example lets a package modify itself. Here, too, you need to create a reference to the underlying package and then leverage your knowledge of the DTS object model. For example, from within a package, you access the package through the DTSGlobalVariables meta data. Consider the ActiveX Script in Listing 3, page 48. This script drives the task in a package called CreatorName. The script begins by using the package's GlobalVariables collection to gain access to the parent package. After you have access to the package, you can access or manipulate any of its attributes. In this example, the task simply uses a message box to display the package's creator.
DTS Threading Model
When developing with DTS and VB, you need to be aware of a subtle but potentially show-stopping difference in the threading models of the two tools. Briefly, threads let multiple tasks execute within one process. For example, in a multithreaded application, one thread might handle disk I/O while another handles network traffic. Multithreading allows parallelism within an application. In November 1999 ("DTS Error Handling Revealed"), we discussed the importance of choosing thread-safe data providers that fully implement the OLE DB Service Components interface, and the ramifications of using non-thread-safe providers. This warning extends to developers who are mixing DTS with VB.
The problem with mixing DTS and VB is that DTS is free-threaded (a variation of multithreading), whereas VB is apartment-threaded (a concept born of the COM architecture). The exact differences between the two and the definition of apartments are topics well beyond the scope of this article. For further reading on COM threading models, please refer to the Microsoft Developer Network (MSDN) Online Library under the topic Platform SDK/COM/COM Fundamentals/ Processes, Apartments, and Threads (http://msdn.microsoft.com/ library/psdk/com/aptnthrd_8po3.htm). However, the differences are serious enough that not handling them appropriately within your projects will lead to access violations at runtime. What makes this compatibility issue hard to diagnose is that Enterprise Manager and the Package Designer are free-threaded. Therefore, packages executing through the Package Designer won't experience the problem. Executing under VB, however, often generates this error:
Run-Time Error -2177221499 (80040005)Provider generated code execution exception: EXCEPTION_ACCESS_VIOLATION
So, how do you avoid this access violation? It's simple. For any package that executes through VB or any package that references a custom task developed with VB, DTS is forced to run within the main thread. Programmatically, setting the step object's ExecuteOnMainThread attribute to TRUE does the same thing. Screen 8 shows how the Options tab of the Workflow Properties dialog box exposes this attribute within the Package Designer. The workflow properties belong to this package's Data Transform task. It's important to note that enabling the ExecuteOnMainThread attribute forces the associated tasks to run in serial rather than parallel, resulting in possible performance degradation (the performance degradation could be significant on an SMP machine).
Changing Columns into Rows
In relational database design, the first rule of normalization is to eliminate repeating groups. In an ideal world, all databases would follow this rule. However, in the real world, not all databases are relational and, even within many relational databases, repeating groups exist. This fact leaves database developers, especially in the data warehousing community, with the challenge of transforming repeating groups into normalized, relational tables, or to put it another way, splitting columns into rows.
At first glance, DTS might not seem to be a good candidate for solving this normalization problem. After all, as we've noted in several previous articles, transformation tasks are limited to a single operation or statement against the destination data store. To solve this problem, you'd potentially need many operations to properly convert one source row into several destination rows. DTS offers two possible solutions. The first solution uses a Data-Driven Query task with a stored procedure. With the data-driven query, DTS sends the entire source row to the server where the stored procedure resides, and then the stored procedure maps the columns into their appropriate INSERT statements. "The DTS Development Guide" (July 1999) covered the use of data-driven queries, so we won't go into detail here. Instead, we'll focus on the second solution to the splitter problem, using a transformation task's DTSTransformationStatus constants to control the flow of processing.
Within each transformation, you can control when the DataPump moves to the next row. By manipulating the DTSTransformationStatus value, you can force the transformation script to process the same row multiple times, thereby letting the transformation generate multiple INSERT statements from the same source row.
The following example illustrates this idea: At a hospital, multiple physicians are associated with a visit. The SourceData table, whose schema you see in Listing 1, represents the source system that supplies the data. The VisitPhysician table, whose schema is also in Listing 1, represents the destination in the warehouse.
First, build a package containing two connections, the first representing the sourcesystem and the second representing the warehouse. Next, add a Transform Task between the two connections and define a new ActiveX Script to handle the transformation. By default, DTS maps each source column to a destination column in column order. Before you add the transformation, you must delete the default column mappings by selecting and deleting each transformation line between the source and destination tables. Now, add the new transformation. As Screen 9 shows, when you're defining the transformation, select all columns in the source and destination tables and click New to create a new transformation of type ActiveX Script. Clicking New opens the ActiveX Script Transformation Properties dialog box. Place the VB Script from Listing 4 into this dialog box to implement the splitter.
In Listing 4, before the Main function declaration, establish and initialize a variable, nCounter, to keep track of where the code is in processing the row. The value 4 represents the number of output rows per source row. The function Main() begins by checking whether any rows are left to be inserted. When the function returns true, it first sets its return status to DTSTransformStat_SkipFetch. Here's the key to the splitter's functionality: By setting the status to SkipFetch, the transformation tells the DataPump not to get a new row from the source connection, but instead to resubmit the current row for additional processing. If nCounter equals 0, the transformation resets the nCounter value to 4 and sets the status to DTSTransformStat_SkipInsert. This time the transformation tells the DataPump not to write anything to the destination, but to move on to the next source row. The remainder of the function determines which PhysicianType to insert and sets the destination columns appropriately.
Using the data-driven query approach to solve this normalization problem offers a definite performance advantage over implementing the splitter by limiting the number of calls to the database. However, the splitter offers the flexibility of leveraging additional scripting and error-handling capabilities that aren't available to Transact SQL (T-SQL). Besides the splitter, you can use a similar technique to implement PivotTable-like functionality. A PivotTable goes in the opposite direction of the splitter—that is, where the splitter turns columns into rows, the PivotTable turns rows into columns. But we'll leave that example for a future article.
Being able to leverage common design and coding patterns is one of the hallmarks of a good developer. This month, we touched on several common scripting patterns that you can use on your DTS projects, but we've only scratched the surface. Send us email and tell us about your scripting tricks. We're always looking to expand our knowledge base and share that information with you.
About the Authors
You May Also Like