Function Creates Multidimensional Arrays from Delimited Text Files

This VBScript user-defined function can help streamline many text-based processes

James Turner

April 8, 2008

9 Min Read
ITPro Today logo in a gray background | ITPro Today


 Executive Summary:

Instead of having to write a separate routine for each text-based process that arranges data into a form that can be used within scripts, you can simply create a multidimensional array of the data and reference the data by rows and columns as if it were a database. To do so, you can use the Multi_Dimension function, a VBScript user-defined function that arranges data into a multidimensional array.

If you’ve ever had to write scripts that work with delimited text files, you know that virtually every one of them is unique in one fashion or another. Some text files are delimited with commas, some with tabs, some with spaces, and some with other unique characters like the tilde or semicolon. They also differ in size—some have only a few lines with one or two items on each line and some are relatively large, with many lines and many delimited items per line.

At my company, sometimes these delimited text files come from HR, which periodically sends the IT department delimited text files that contain new and modified user data. We then write scripts that use these files to update Active Directory (AD). Other times the delimited text files come from people who evaluate the applications (e.g., People Soft, Microsoft Excel, Microsoft Access) that our company uses. These delimited text files contain lists of jobs that need to be run and the associated parameter settings for each job. We then create scripts that use these files for specialized performance testing of those applications.

 

I wrote the Multi_Dimension function to streamline many of these text-based processes. Now, instead of having to write a separate routine for each of our text-based processes that basically arranges the data into a form that can be used within our scripts, I simply create a multidimensional array of the data and reference the data by rows and columns as if it were a database.

 

What the Function Does

The Multi_Dimension function can automatically create a two-dimensional array of any size, meaning any number of rows and any number of columns. The syntax used to create a multidimensional array is

 

YourWorkArray =  Multi_Dimension("TextFileName",  "DelimiterCharacter")

 

where YourWorkArray is the name of the multidimensional array you'll use as your data source; TextFileName is the path and filename (enclosed in double quotes) of the text file to be used in constructing your multidimensional array; and DelimiterCharacter is the character used within the text file to delimit each field. The delimiter character needs to be enclosed in double quotes unless the file is tab delimited, in which case you specify vbTab as the delimiter and omit the surrounding quotes.

 

For example, in the sample script Multi_D_ArrayFunc.vbs, which Listing 1 shows, here's how I call the Multi_Dimension function:

 

WorkArray = Multi_Dimension _  ("C:Temp~JobFileCSV~.csv", _  ",")

 

WorkArray is the name of the two-dimensional array that contains all of the data from the text file; C:Temp~JobFileCSV~.csv is the path and filename of the delimited text file; and "," specifies that the text file is delimited by commas.

 

Working with the Function

Let’s take a look at the Multi_Dimension function, which starts at callout A in Listing 1. Here you’ll notice that the function receives the filename and the delimiter character that were passed to it from the line of code I just referenced.

 

Next, in callout B, you’ll see that the function checks for the existence of the text file. If the file doesn't exist, the function displays a message box indicating that the file couldn't be found, then terminates the script. However, if the file does exist, it's opened and the entire contents are read into a storage variable called textData.

 

Because of the good possibility that the file might have a line feed at the end of the file, I perform a check on the last character and remove the line feed if one is found. Callout C shows this operation.

 

The next thing the function does is convert the data stored in the textData variable into a one-dimensional array, as callout D shows. The function converts the data by splitting the variable's string into substrings, using the carriage return/line feed character, or vbCrLf, as the delimiter.

 

Now there’s an array element for every line (or row) of text that existed in the text file. Each line contains one or more commas. Because each line contains commas, each array element can be split up into yet another array—in essence, each row can be split up into multiple columns or fields based on how many commas exist on each line.

 

Keeping that in mind, let’s move on down to callout E. Here’s where I begin to create the multidimensional array. First, I declare it as a dynamic array with this statement:

 

Dim BuildArray()

 

I then set the actual size of the multidimensional array by redimensioning the array and declaring both the Row and Col dimensions. The sample text file used by this script, ~JobFileCSV~.csv, consists of 20 rows and 4 columns. So, the dimensions of the array in this case would be 20 by 4, or 80 elements. However, since arrays are zero-based, the actual array dimensions would be 19 by 3. As you can see, I don’t hard code these values, but instead use the ubound function to programmatically determine both the total number of rows—TotalRows = UBound(textDataArray)—and the total number of columns—TotalColumns = UBound(Split(textDataArray(0),Delim))—that my multidimensional array will have. With these two values in hand I can accurately redimension my array with this statement:

 

ReDim BuildArray(TotalRows,TotalColumns)

 

The section of code in callout F basically builds the multidimensional array one row and one column at a time. It does so by stepping through each row of the textData one-dimensional array. For each row, it creates a temporary array of fields, then assigns the values of each row and each field until it reaches the end of the textData array.

 

After that’s complete, I pass my newly created multidimensional array back to where I first called the function. That’s done at callout G.

 

At this point, WorkArray contains the entire text file in a multidimensional array format. All you need to do after you have your multidimensional array is cycle through it and reference the array elements as needed, with code such as

 

Row0_Field0 = WorkArray(0,0)
Row0_Field1 = WorkArray(0,1)
' And so on

 

How to Use the Script and the Function

The Multi_D_ArrayFunc.vbs script in Listing 1 demonstrates how to use the Multi_Dimension function, then displays the multidimensional array's contents in a row and column format. To run the script using the test data file (~JobFileCSV~.csv), download all the files for this article and copy the ~JobFileCSV~.csv file to your C:Temp folder. Then simply double-click the Multi_D_ArrayFunc.vbs file.

 

Before you start to use the Multi_Dimension function in your own scripts, you need to be aware of a few points:

  • The function assumes that the data in your delimited text files are uniform and each record (or line) contains the same number of delimited fields. Blank fields are acceptable as long as there’s a delimiter where that field would be. For instance, if you looked at one of your delimited text files and saw two consecutive commas, this would indicate that that record contains a blank field.

  • You need to know your data. Equally important, the folks who are generating the delimited files need to know that they must use a unique delimiter in the files they're generating. You don’t want to discover late in the process that you were sent a comma-delimited text file that has fields with actual commas as part of the data.

  • Multi_D_ArrayFunc.vbs uses VBScript's MsgBox function to display the data in the array. If you were to use this function in your code, you would obviously be performing actual tasks rather than displaying it on screen. Multi_D_ArrayFunc.vbs is just for demonstration purposes, so you probably don’t want to use this script to display data from very large delimited text files. You could, of course, change the script to use Windows Script Host's (WSH's) Echo method (WScript.Echo) if you wanted to.

  • Keep in mind that you aren’t limited to just a few rows or columns of data in your text file. You can use nearly any size text file that’s accurately delimited by virtually any unique character. However, you might use a good deal of your computer memory if you have an extremely large delimited text file. When I tested the script using 65,000 records with each record having 11 variable length fields, I used approximately 14MB. This was an extreme test and a test only—normally I’m only dealing with a few hundred to a few thousand records in my processes. I might also add that the processing time with these ranges is usually pretty speedy even on my older PC.

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like