RowGen 2.0
Use RowGen 2.0 to create referentially valid but random sample data that you can use in your test environment.
October 29, 2008
In today’s world of Health Insurance Portability and Accountability Act (HIPAA) and other privacy-related regulations, Innovative Routines International’s (IRI’s) RowGen 2.0 offers a very useful capability. It can replace the sensitive client information you copied from your production environment into your test environment with realistic data that doesn’t infringe on your clients’ privacy. RowGen takes a file definition of your data structures and uses those structures as a template to create referentially valid but random test data for your applications. The product is compatible with data definitions from multiple data stores, including SQL Server, Oracle, DB2, and packaged applications such as PeopleSoft. You load the file containing your DDL into RowGen, and it automatically generates data based on your definition. The idea sounds promising; however, let me state up front that RowGen isn't a simple, easy-to-use product.
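To make "using your data structures as a template" concrete, here is a minimal Python sketch of type-driven random generation. It is not RowGen’s implementation or syntax. The column template, type names, and value ranges are hypothetical, and key columns are deliberately left out, because keeping keys consistent across tables is a separate problem.

```python
import random
import string
from datetime import date, timedelta

# Hypothetical column template: (column name, type, maximum length or None).
# RowGen derives this kind of information from your DDL; here it is hand-written.
EMPLOYEE_COLUMNS = [
    ("last_name", "varchar", 20),
    ("hire_date", "date", None),
    ("salary", "decimal", None),
    ("extension", "int", None),
]

def random_value(col_type, size, rng):
    """Return a random value that fits the declared column type and size."""
    if col_type == "int":
        return rng.randint(1000, 9999)
    if col_type == "varchar":
        length = rng.randint(1, size)
        return "".join(rng.choice(string.ascii_uppercase) for _ in range(length))
    if col_type == "date":
        return date(2000, 1, 1) + timedelta(days=rng.randint(0, 3000))
    if col_type == "decimal":
        return round(rng.uniform(20000, 150000), 2)
    raise ValueError(f"unsupported column type: {col_type}")

def generate_rows(columns, count, seed=None):
    """Generate `count` rows (dicts) whose values match each column's declared type."""
    rng = random.Random(seed)
    return [
        {name: random_value(col_type, size, rng) for name, col_type, size in columns}
        for _ in range(count)
    ]

rows = generate_rows(EMPLOYEE_COLUMNS, 3, seed=42)
for row in rows:
    print(row)
```

Passing a seed makes a test data set reproducible, which is handy when a failing test needs to be rerun against the same data.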
You can’t just install RowGen and run it. Learning its proprietary data definition syntax requires a significant time investment, so you'll want to have someone dedicated to working with the tool. The installation placed three executables on my test machine. The first, Setup, is for defining your license; the second is RowGen itself, a command-line executable that doesn’t have a menu item in Windows; and the third is the RapidACE-RowGen GUI executable. (RapidACE is the front end for the RowGen engine.) Once you’ve installed the package, run the Setup program to generate the text file that describes your system’s characteristics. You submit this information to CoSort to obtain the license you need to run the tool.
RowGen also installs three PDF files that contain the product’s documentation: one for the Setup executable (RowGen 2.1 Install Guide.pdf), one for the RowGen engine (RowGen 2.1 Manual.pdf), and one for RapidACE (RA-RapidACE Operations Guide.pdf). These documents are accessible from the RowGen folder on the Start menu. Use them to walk through the GUI version of the tool; the RapidACE Operations Guide in particular is crucial if you want to become productive with RowGen in a minimal amount of time.
The software also adds a set of configuration settings to the registry under HKEY_LOCAL_MACHINE\SOFTWARE\Innovative Routines Int'l, Inc.\RowGen 2.1\Global Configuration. These settings include the number of threads, which your license limits; if you get a runtime error about exceeding the number of available sort threads, you’ll need to edit these settings. Although the settings can be overridden in a local resource file, I recommend updating them here. More information about these settings is available in Appendix A of the RowGen21_manual.pdf file.
RowGen is written in Java and supposedly requires the Java SE Development Kit (JDK) 1.5 or later. However, the installer doesn’t check for the JDK; I installed and registered the tool before I had installed the JDK. When I realized I needed the JDK, I copied the JDK 1.5 installation package from another machine. (I copied JDK 1.5 to my test machine because, as I mentioned earlier, the information you submit to CoSort to obtain your license keys includes the specific machine name, and I had already gotten my keys and set up RowGen on my Windows Vista test machine.) After I installed JDK 1.5, a Java package error appeared when I attempted to run RowGen to generate data. I was then prompted to update my JDK to version 1.6, after which RowGen ran successfully.
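To avoid the same JDK surprise, you can check the installed Java version before you install RowGen. The following Python sketch runs `java -version`, which prints its version banner to standard error, and extracts the feature release from both the older "1.x" numbering and the newer scheme used since Java 9. The minimum of 6 reflects the JDK 1.6 that RowGen ran with in this review, not a documented requirement, and the function names are hypothetical.

```python
import re
import shutil
import subprocess

MINIMUM_FEATURE_RELEASE = 6  # JDK 1.6 worked in this review; an assumption, not an IRI spec

def java_feature_version(version_output):
    """Extract the Java feature release (5, 6, 8, 17, ...) from `java -version` text.

    Older JDKs report versions such as "1.5.0_22" or "1.6.0_07"; Java 9 and later
    report "9", "17.0.2", and so on.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return minor if major == 1 else major

def installed_java_version():
    """Run `java -version` if Java is on the PATH and return its feature release."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # The banner normally goes to stderr; check stdout too in case a build differs.
    return java_feature_version(result.stderr + result.stdout)

version = installed_java_version()
if version is None:
    print("No Java runtime found on the PATH")
elif version < MINIMUM_FEATURE_RELEASE:
    print(f"Java {version} found; upgrade to 1.{MINIMUM_FEATURE_RELEASE} or later")
else:
    print(f"Java {version} found")
```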
Because RowGen is a command-line executable, you can in theory go to the installation directory and execute the engine from the command line. However, doing so isn’t recommended, because you would first need to create a set of configuration files for the engine to process. Instead, open the RowGen folder on the Start menu and select the RapidACE RowGen menu item, which opens the Workspace window, RowGen's main window. Unlike the script-driven command-line tool, RapidACE is a GUI that lets you generate RowGen scripts and call the RowGen engine to generate your test data. (Note that according to IRI’s website, RapidACE is a separate product from RowGen. However, without this GUI to generate the necessary scripts, creating data with RowGen would require manually editing a set of scripts written in a proprietary scripting language, or 4GL as the company’s website calls it. RowGen therefore ships with RapidACE, which imports your DDL and automatically generates the scripts that drive test data generation.)
RapidACE has three main panes, shown in Figure 1. On the left side of the window is the Workspace pane, in which you identify the files that contain your data generation rules. The top right section is a text edit window in which you can edit the contents of your data generation files (i.e., an imported DDL file or a generated .rcl script file). Finally, the bottom right window is the Console pane, in which you can execute specific data generation commands.
One of the first things I realized as I worked with the tool is that it's database agnostic because it works only with files on the file system, so you need to export your T-SQL to a file. When exporting T-SQL, export only the table definitions, because any additional information is likely to confuse RowGen and keep it from recognizing your tables. When your file is imported, you should see your tables rather than a top-level database description.
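Trimming an export down to its table definitions can be scripted. The Python sketch below assumes a SQL Server Management Studio-style script in which batches are separated by lines containing only GO; it keeps the batches that begin with CREATE TABLE and drops USE and SET batches and single-line comment banners. It is a heuristic, not a T-SQL parser, and it hasn't been checked against what RapidACE actually accepts. Note that it also drops separate ALTER TABLE batches, which is where such exports often place constraints.

```python
import re

# A line containing only GO (any case) separates batches in SSMS-style scripts.
GO_SEPARATOR = re.compile(r"^[ \t]*GO[ \t]*$", re.IGNORECASE | re.MULTILINE)

def strip_comment_lines(batch):
    """Drop '--' comments and single-line /* ... */ banners such as SSMS object headers."""
    kept = []
    for line in batch.splitlines():
        stripped = line.strip()
        if stripped.startswith("--"):
            continue
        if stripped.startswith("/*") and stripped.endswith("*/"):
            continue
        kept.append(line)
    return "\n".join(kept).strip()

def table_definitions_only(script):
    """Keep only the batches that start with CREATE TABLE."""
    tables = []
    for batch in GO_SEPARATOR.split(script):
        body = strip_comment_lines(batch)
        if re.match(r"CREATE\s+TABLE\b", body, re.IGNORECASE):
            tables.append(body)
    return "\n\n".join(tables) + "\n"

# Hypothetical excerpt of an exported script.
exported = """USE [HR]
GO
SET ANSI_NULLS ON
GO
/****** Object:  Table [dbo].[Departments] ******/
CREATE TABLE [dbo].[Departments](
    [DeptID] [int] NOT NULL PRIMARY KEY,
    [Name] [varchar](30) NOT NULL
)
GO
ALTER TABLE [dbo].[Departments] ADD CONSTRAINT [DF_Name] DEFAULT ('') FOR [Name]
GO
"""

print(table_definitions_only(exported))
```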
RowGen supports several output file formats, including comma-separated value (CSV) and XML. To create an output file, you need a script (.rcl) file that describes both the data source definition and the output file definition. Generation produces two types of files. The first is a data table (.tab) file, which contains your generated data. The second is a Set (.set) file, which describes the generated primary keys so that tables that require referential integrity can reference your generated data. The RowGen engine uses Set files when generating multiple data tables.
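The role of a Set file is easier to see in miniature. In this hypothetical Python sketch, the parent table's generated keys are written to a plain text file (one key per line, an assumed format rather than RowGen's actual .set layout), and the child table draws its foreign key values only from that file, so every child row references an existing parent row.

```python
import random
import tempfile
from pathlib import Path

def write_key_set(path, keys):
    """Record generated primary keys, one per line (a stand-in for a .set file)."""
    Path(path).write_text("\n".join(str(k) for k in keys) + "\n")

def read_key_set(path):
    """Read the recorded keys back as integers."""
    return [int(token) for token in Path(path).read_text().split()]

def generate_departments(count):
    """Parent table: sequential primary keys."""
    return [{"dept_id": i, "name": f"Dept{i}"} for i in range(1, count + 1)]

def generate_employees(count, parent_keys, rng):
    """Child table: each foreign key is drawn from the recorded parent keys."""
    return [{"emp_id": i, "dept_id": rng.choice(parent_keys)} for i in range(1, count + 1)]

rng = random.Random(7)
with tempfile.TemporaryDirectory() as work:
    key_file = Path(work) / "departments.keys"  # hypothetical file name
    departments = generate_departments(5)
    write_key_set(key_file, [d["dept_id"] for d in departments])
    employees = generate_employees(20, read_key_set(key_file), rng)

valid_keys = {d["dept_id"] for d in departments}
orphans = [e for e in employees if e["dept_id"] not in valid_keys]
print(f"{len(employees)} employees, {len(orphans)} orphaned rows")
```

Writing the keys to a file, rather than passing them in memory, mirrors why a file-based engine needs Set files: each table can be generated by a separate script run and still see the keys produced earlier.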
To avoid errors related to exporting T-SQL, I suggest first walking through RowGen with the demo DDL included in the installation package. This file, emp_evals.ddl, is located in the DDL subfolder of the RA-RowGen folder, which is itself a subfolder of the RowGen21 folder. To import this set of data structures, choose File, Import in RapidACE and select emp_evals.ddl. You can then edit the DDL file by double-clicking it.
RapidACE doesn’t let you simply select a group of tables and create a category to describe how they should be processed. Instead, you need to right-click the Generation Categories node and explicitly define a category. The category defines, for example, how many rows of data should be generated and the delimiter between columns. Once you’ve created a category, you can move the tables defined in your DDL into it. Figure 1 shows RapidACE after the DDL file has been imported, a new category has been created, and several tables have been added to the category. Note that the DDL file is open for editing on the right side of the display.
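A category is essentially a bundle of shared settings applied to every table assigned to it. The Python sketch below models that idea with a hypothetical Category class holding a row count, a column delimiter, and a set of tables. It says nothing about how RapidACE actually stores categories, and the column values are random placeholders.

```python
import csv
import io
import random
from dataclasses import dataclass, field

@dataclass
class Category:
    """Hypothetical model of a generation category: shared settings for a group of tables."""
    name: str
    row_count: int
    delimiter: str = "\t"
    tables: dict = field(default_factory=dict)  # table name -> list of column names

def render_category(category, seed=None):
    """Produce delimited text for every table in the category using its shared settings."""
    rng = random.Random(seed)
    output = {}
    for table, columns in category.tables.items():
        buffer = io.StringIO()
        writer = csv.writer(buffer, delimiter=category.delimiter, lineterminator="\n")
        for _ in range(category.row_count):
            # Placeholder values; a real generator would follow each column's type.
            writer.writerow([rng.randint(1, 1000) for _ in columns])
        output[table] = buffer.getvalue()
    return output

hr = Category(
    "HR",
    row_count=4,
    delimiter="|",
    tables={"Employees": ["emp_id", "dept_id"], "Reviews": ["review_id", "score"]},
)
files = render_category(hr, seed=1)
print(files["Employees"])
```

Changing `row_count` or `delimiter` on the category changes the output for every table in it at once, which is the convenience the category concept provides.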
Figure 1 also shows the .rcl files that are created when you select your category and choose to generate its associated RowGen scripts. RowGen uses these .rcl scripts to generate your sample data. The scripts are written in RowGen’s custom 4GL and define your original data structure as well as the output needed to correctly generate data for the tables you grouped into the category. When RapidACE writes these scripts to disk, you’ll need to select a project folder on your machine to hold them. RapidACE places all of the generated .rcl files in this project folder, along with a .bat file that you can use to execute the RowGen engine and generate your data.
Web Listing 1 shows the Employees.rcl script, which was generated as part of this process and is used by the RowGen.exe engine to generate the table’s data. In theory, you can right-click any of the .rcl files and have RapidACE run RowGen for that script. In my experience, however, RapidACE failed to remember where my copy of RowGen was installed, so I recommend first navigating to your installation location. In addition, running RowGen this way doesn’t give any indication of success or failure, so it’s better to use the .bat file. Although you can launch the .bat file from within Windows, the generated batch file doesn’t pause when an error occurs, so its window closes before you can read the message. To see errors, such as those related to the number of threads the tool was attempting to use, execute the .bat file from the command line.
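If you script the batch run as part of an automated process, capture its output rather than relying on a console window. This Python sketch wraps `subprocess.run`, records the exit code and both output streams, and flags likely failures. The batch-file path in the comment is hypothetical, and because it isn't confirmed that the generated .bat passes RowGen's exit status through, the sketch treats any output on standard error as a warning sign as well as a nonzero exit code.

```python
import subprocess
import sys

def run_and_report(command):
    """Run a command, capture stdout/stderr, and flag likely failures.

    A batch file may not propagate the engine's exit status, so output on stderr
    is treated as a failure signal in addition to a nonzero exit code.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    failed = result.returncode != 0 or result.stderr.strip() != ""
    status = "FAILED" if failed else "OK"
    print(f"{status}: exit code {result.returncode}")
    if result.stdout:
        print(result.stdout)
    if result.stderr:
        print(result.stderr, file=sys.stderr)
    return failed, result

# On Windows, a .bat file runs through cmd.exe; this path is hypothetical:
#     run_and_report(["cmd", "/c", r"C:\RowGenProjects\HR\run_rowgen.bat"])

# Portable demonstration: a child process that writes an error and exits with code 2.
failed, result = run_and_report(
    [sys.executable, "-c", "import sys; print('simulated error', file=sys.stderr); sys.exit(2)"]
)
```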
Once the .bat file has finished running, you’ll have a .tab file containing the generated data for each .rcl script. You might also have one or more .set files related to each .rcl file; these record the identity values generated for the associated .tab file. The data in the .tab files matches your required data definition. At this point RowGen’s job is done: the generated data is in the .tab files, and by leveraging its generated .set files, RowGen has maintained referential integrity. However, importing the resulting files is left to you, and when I generated random data, the files defaulted to tab delimited regardless of my choice of delimiter. Even after I manually edited one of my .rcl files to change the field delimiter, the tool still defaulted to a tab delimiter for each row, which implies that you'd want a custom reader or one of IRI's other tools to read and import this data. This is also a good indicator of the level of knowledge users will need to review and make minor revisions to DDL files and to the custom .rcl script files.
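For SQL Server targets, the bcp utility and the BULK INSERT statement both use a tab as their default field terminator, so tab-delimited output is not hard to load. As a self-contained illustration, this Python sketch loads tab-delimited files into an in-memory SQLite database standing in for your test database. The table names, columns, and file contents are hypothetical. It loads the parent table before the child table so that the foreign keys resolve.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

def load_tab_file(conn, table, path, columns):
    """Insert every row of a tab-delimited file into `table` (columns in file order).

    Table and column names are interpolated into the SQL, so pass only trusted names.
    """
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    with open(path, newline="") as handle:
        conn.executemany(sql, csv.reader(handle, delimiter="\t"))
    conn.commit()

with tempfile.TemporaryDirectory() as work:
    dept_file = Path(work) / "Departments.tab"
    emp_file = Path(work) / "Employees.tab"
    dept_file.write_text("1\tSales\n2\tSupport\n")
    emp_file.write_text("100\tSmith\t1\n101\tJones\t2\n102\tLee\t1\n")

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the references
    conn.execute("CREATE TABLE Departments (dept_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE Employees (emp_id INTEGER PRIMARY KEY, last_name TEXT, "
        "dept_id INTEGER REFERENCES Departments(dept_id))"
    )
    # Parent table first, so every child row's dept_id already exists.
    load_tab_file(conn, "Departments", dept_file, ["dept_id", "name"])
    load_tab_file(conn, "Employees", emp_file, ["emp_id", "last_name", "dept_id"])

print(conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0])
```

SQLite's INTEGER column affinity converts the text fields read from the file into integers, so the foreign key comparisons behave as they would with typed data.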
If your environment supports a dedicated test team, or you have someone who can dedicate the time to get up to speed with this tool, RowGen can be useful. The tool runs quickly and generates data files efficiently. In addition, because RowGen is a command-line tool that reads its input from and writes its output to standard system files, you can incorporate it into your automated processes. However, Microsoft’s Visual Studio Team System 2008 Team Suite is going to include a similar tool, and mastering RowGen and RapidACE and setting up all of the required DDL and script files takes considerable time; together, these factors might make RowGen not worth its four-figure price. Still, if you have a heterogeneous database environment and a dedicated test team that’s familiar with test automation and writing scripts, this tool’s command-line capabilities might be of interest to you.