Understanding VBScript: Functions to Manipulate Strings
This monthly column covers all aspects of the VBScript language. The November column discusses how to use functions to initialize, format, extract, search for, and convert strings.
October 24, 1999
My coverage of functions continues this month with a look at the functions you can use to manipulate strings. VBScript has many functions that you can use to initialize, format, extract, edit, search for, and convert strings. Rather than briefly outline every function, I'll cover in some depth the more commonly used functions. Table 1 summarizes these commonly used functions. I'll also discuss how you can create custom string-manipulation functions to meet specific needs.
Initializing Strings
To initialize strings, you can use the String and Space functions. Both functions assign an initial value to a string by repeating a character a specified number of times. The Space function repeats a space. The function takes just one argument: the number of spaces that you want in the final string. For example, the code
str1 = Space(20)
results in a string that contains 20 spaces.
Unlike the Space function, the String function lets you specify the character you want to repeat. The first argument specifies how many times you want to repeat that character. The second argument specifies the character, either as an ASCII code or as a string whose first character the function uses. For example, the code
str2 = String(20, 32)
results in a string that contains 20 spaces. (The ASCII code for the space character is 32.)
The Space and String functions are quite useful when you work with components that expect their client to pass buffers of preallocated memory. When you use a Dim statement to declare a string variable, VBScript doesn't allocate memory for that variable. Consequently, if you use the declared variable as an argument for a component that requires preallocated buffers of memory, a runtime error occurs. (This error isn't likely to occur for novice scriptwriters because they typically work with COM components. COM components usually allocate and return strings rather than fill preallocated buffers.)
The length of a string can change throughout a script because the string's length depends on the content it holds. You can use the Len function at any point in a script to determine how many characters a string contains. For example, consider the code
str2 = String(20, 32)
MsgBox Len(str2)
str2 = "Hello"
MsgBox Len(str2)
The first Len function results in 20 because the previous line initializes the str2 string variable to 20 space characters. The second Len function results in 5 because the previous line sets the str2 variable to the string "Hello".
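Space and Len also combine naturally when you need fixed-width text. The following is a minimal sketch, not code from this column; the PadRight helper and the row variable are hypothetical names chosen for illustration.

```vbscript
' Hypothetical helper: pad text with trailing spaces up to a fixed width.
' Text that is already as long as the width is returned unchanged.
Function PadRight(text, width)
    If Len(text) >= width Then
        PadRight = text
    Else
        PadRight = text & Space(width - Len(text))
    End If
End Function

Dim row
row = PadRight("Name", 10) & "|"
MsgBox row                ' Name, six spaces, then |
MsgBox Len(row)           ' 11
MsgBox String(11, "-")    ' String also accepts a string; it repeats the first character
```

Because Len reports the current length each time you call it, the helper works for any input without hard-coding the number of spaces to add.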
Formatting Strings
Three popular formatting functions are FormatCurrency, FormatNumber, and FormatPercent. These functions let you create special strings from parametric values (i.e., values from parameters such as variables). For example, suppose you have a number representing a currency value, such as 1000, and you need this value to appear with a currency symbol and decimal point. You can use the FormatCurrency function, which returns the number you specify as the argument formatted as a string with a currency symbol. For example, the code
MsgBox FormatCurrency(1000)
displays the value of $1,000.00 on a machine with the regional setting of English (United States) or the value of L.1.000 on a machine with the regional setting of Italian (Standard).
To see a machine's regional setting, double-click the Control Panel Regional Settings applet and select the Regional Settings tab. If English (United States) is the selected regional setting, the Currency tab of Regional Settings specifies the dollar sign as the currency symbol by default. The Currency tab also specifies other currency formats, such as the character representing the decimal symbol (the default for the US setting is a period) and the number of digits to display after the decimal symbol (the default for the US setting is two). FormatCurrency defaults to the settings in the Currency tab for almost all its formats.
If you don't want to include the decimal point in your currency value, you can change the Currency tab's No. of digits after decimal setting from 2 to 0. An even easier approach is to set the function's optional second argument, which specifies the number of digits after the decimal point. For example, if you set the second argument to 0 with code such as
MsgBox FormatCurrency(1000,0)
the result is $1,000. FormatCurrency also has optional arguments that you can use to specify whether you want fractional values to include a leading 0, whether you want negative values in parentheses, and how you want to group digits. (For more information about these arguments, go to http://msdn.microsoft.com/scripting/default.htm?/scripting/vbscript/techinfo/vbsdocs.htm and select Language and Run-time Reference. From the Function drop-down list, select FormatCurrency Function and click Go.)
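As a rough illustration of those optional arguments, the sketch below passes each one explicitly. The variable names are hypothetical. The third, fourth, and fifth arguments take tristate values: -1 for true, 0 for false, and -2 to use the regional setting. The exact output depends on the machine's regional settings, so the comments describe what each argument requests rather than a guaranteed result.

```vbscript
Dim amount, withParens, plainNegative, ungrouped
amount = -0.5

' Arguments: value, digits after the decimal, leading zero,
' parentheses for negative values, group digits.
withParens = FormatCurrency(amount, 2, -1, -1)        ' leading zero, negative value in parentheses
plainNegative = FormatCurrency(amount, 2, 0, 0)       ' no leading zero, no parentheses
ungrouped = FormatCurrency(1234567, 0, -2, -2, 0)     ' no decimals, no digit grouping

MsgBox withParens
MsgBox plainNegative
MsgBox ungrouped
```

Setting the arguments in code keeps the script's output independent of whatever the user has selected in the Currency tab, at least for the formats that the arguments cover.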
The FormatNumber function works the same as FormatCurrency, except that FormatNumber doesn't include a currency symbol in the resulting value. For example, the code
MsgBox FormatNumber(1000)
displays the result of 1,000.00. FormatNumber uses the settings in the Number tab of Regional Settings unless you use the function's arguments to specify otherwise. FormatNumber has the same arguments as FormatCurrency.
The FormatPercent function is useful when you want to convert a fractional number to a percentage. The function multiplies the number you specify by 100, appends a percent sign (%), and applies the format that the Number tab specifies, unless you use the function's arguments to specify otherwise. (FormatPercent has the same arguments as FormatCurrency.) For example, the code
MsgBox FormatPercent(-0.1, 0)
displays -10%.
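The next sketch applies FormatNumber and FormatPercent to a computed ratio. The variable names are hypothetical, and the results in the comments assume the English (United States) regional setting.

```vbscript
Dim completed, total, ratio
completed = 45
total = 60
ratio = completed / total             ' 0.75

MsgBox FormatNumber(ratio, 3)         ' 0.750 with English (United States) settings
MsgBox FormatPercent(ratio, 1)        ' 75.0% with English (United States) settings
```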
Extracting Substrings
Three functions—Left, Right, and Mid—let you extract text from a string and return that text as a new string. You use the Left function to extract a specified number of characters from the left side of the string and return those extracted characters as a new string. You specify the number of characters to extract in the function's second argument and the string to extract those characters from in the first argument. For example, the code
path = "C:\mydir\mysubdir\myfile.ext"
MsgBox Left(path, 3)
displays C:\. The Right function works the same way, except that it extracts the specified number of characters from the right side of the string. For example, the code
path = "C:\mydir\mysubdir\myfile.ext"
MsgBox Right(path, 3)
displays ext. The Mid function is more flexible than the Right and Left functions because you can start extracting characters from any position in the string. The Mid function's syntax is
Mid(string, start[, length])
The mandatory start argument specifies the position where you want to start extracting a certain number of characters, which the length argument specifies. However, the length argument is optional. If you don't specify the length, Mid returns the substring from the start position to the end.
A typical error when using Mid is to interpret the length argument as the ending position of the extraction rather than the number of the characters to extract. In other words, the code
Mid(path, 2, 5)
doesn't tell VBScript to extract all the characters from position 2 to 5 but rather to extract 5 characters starting from and including position 2.
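The difference between a length and an ending position is easier to see side by side. In this sketch, MidRange is a hypothetical helper (not a built-in function) that converts an inclusive ending position into the length that Mid expects.

```vbscript
' Hypothetical helper: return the characters from startPos through endPos, inclusive.
Function MidRange(text, startPos, endPos)
    MidRange = Mid(text, startPos, endPos - startPos + 1)
End Function

Dim word
word = "VBScript"
MsgBox Mid(word, 2, 5)         ' BScri: five characters starting at position 2
MsgBox MidRange(word, 2, 5)    ' BScr: positions 2 through 5
MsgBox Mid(word, 3)            ' Script: from position 3 to the end
```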
Editing Strings
To edit strings, you can use the Replace, LCase, and UCase functions. The Replace function is a powerful tool to retrieve and edit substrings within a given string. Its syntax is
Replace(expression, find, replacewith[, start[, count[, compare]]])
The first three arguments are mandatory. You must specify the string to work on (expression), the substring to replace (find), and the new string to insert (replacewith). For example, the code
str = "Hello, world!"
MsgBox str
find = "world"
replacewith = "boys"
MsgBox Replace(str, find, replacewith)
displays the string Hello, world!, retrieves the substring "world", replaces that substring with "boys", and displays the result Hello, boys!
The three optional arguments (start, count, and compare) significantly increase the function's flexibility. The start argument lets you specify the position within the string where the code must start looking for possible replaceable substrings. By default, the start position is 1 (i.e., the beginning of the expression). The count argument specifies the maximum number of replacements allowed, which lets you control the number of replacements the code makes. The compare argument lets you specify whether the comparison is case-sensitive (the default value of 0) or case-insensitive (value of 1).
Although the optional arguments increase the function's usefulness, you need to know about the start argument's two quirks:
If you specify a start value that is greater than the string's length, VBScript returns an empty string rather than the original string.
If you specify a start value other than 1, VBScript returns a modified string that begins at the start position. In other words, the Replace function truncates the part of the string that precedes the start position. For example, the code
str = "Hi All"
find = "A"
replacewith = "a"
MsgBox Replace(str, find, replacewith, 4)
doesn't display Hi all but rather all.
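One way to work around the truncation is to reattach the part of the string that Replace drops. The sketch below is only an illustration under that assumption; ReplaceFrom is a hypothetical helper name. The last two calls show the count and compare arguments, where a count of -1 means "replace every occurrence".

```vbscript
' Hypothetical helper: replace from startPos onward, but keep the text before startPos.
Function ReplaceFrom(text, find, replacewith, startPos)
    ReplaceFrom = Left(text, startPos - 1) & _
                  Replace(text, find, replacewith, startPos)
End Function

MsgBox ReplaceFrom("Hi All", "A", "a", 4)                    ' Hi all
MsgBox Replace("a-b-c-d", "-", "+", 1, 2)                    ' a+b+c-d: at most two replacements
MsgBox Replace("Hello, WORLD!", "world", "boys", 1, -1, 1)   ' Hello, boys!: case-insensitive
```

A side effect of this approach is that a start value greater than the string's length yields the original string instead of an empty one, because Left returns the whole string and Replace contributes nothing.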
In addition to Replace, you can use the LCase and UCase functions to edit strings. As their names suggest, the LCase and UCase functions convert strings to lower case and upper case, respectively. For example, the code
str = "HELLO, World!"
MsgBox LCase(str)
displays hello, world!
Searching for Substrings Within a String
The Replace function is powerful but intrusive. Although it leverages a useful search engine to locate substrings in strings, the function also changes the content of that string. When you only need to know the position of a substring, you can use the InStr or InStrRev function. Each function returns the position of the first occurrence of a substring that it encounters within a string, or 0 if it doesn't find the substring. The difference between InStr and InStrRev lies in the direction in which they scan the original string. InStr starts at the beginning of the string and works its way to the end, so it finds the leftmost occurrence; InStrRev starts at the end of the string and works its way to the beginning, so it finds the rightmost occurrence.
Although these two functions accomplish a similar task, their syntax differs:
InStr([start, ]string1, string2[, compare])
InStrRev(string1, string2[, start[, compare]])
Both functions have two mandatory and two optional arguments, but the arguments' positions vary. In particular, the start argument (which denotes the position the search starts from) is the first argument in InStr but the third in InStrRev. Consequently, the string1 argument (which specifies the string to search) and the string2 argument (which specifies the substring to search for) have different positions. In both functions, the compare argument (which specifies whether the comparison is case-sensitive) is the last argument.
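The following sketch exercises the optional arguments in both positions. The sample string is made up for illustration. Note that InStr requires the start argument whenever you supply compare, and that a start value of -1 tells InStrRev to begin at the last character.

```vbscript
Dim sample
sample = "One fish, two Fish, red FISH"

MsgBox InStr(sample, "Fish")             ' 15: case-sensitive by default
MsgBox InStr(1, sample, "Fish", 1)       ' 5: case-insensitive search from position 1
MsgBox InStrRev(sample, "fish")          ' 5: the only lowercase match
MsgBox InStrRev(sample, "fish", -1, 1)   ' 25: case-insensitive search from the end
MsgBox InStr(sample, "bird")             ' 0: substring not found
```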
You can use InStrRev to find any filename in any fully qualified pathname with the code
path = "C:\mydir\mysubdir\myfile.ext"
pos = InStrRev(path, "\") + 1
MsgBox Mid(path, pos)
In this code, you use InStrRev to locate the position of the last occurrence of the backslash (in this example, 18). You add 1 to that position and assign the result (19) to the pos variable. You then use Mid to extract the substring from the start position (pos, or 19) to the end of the string. Finally, MsgBox displays the result, which is always the filename and its extension (myfile.ext). The next example shows how you can use InStr and InStrRev to find and extract the directory and subdirectory in a given path:
path = "C:\mydir\mysubdir\myfile.ext"
posA = InStr(1, path, "\") + 1
posB = InStrRev(path, "\")
MsgBox Mid(path, posA, posB - posA)
In this code, you first use InStr to locate the position of the first occurrence of the backslash (in this case, 3) and add 1, assigning the result (4) to the posA variable. You then use InStrRev to locate the position of the last occurrence of the backslash (18) and assign the result (18) to the posB variable. Next, you use Mid to extract the substring from the start position (posA, or 4). Instead of extracting all the characters from the start position to the end, you extract 14 characters (posB - posA, or 18 - 4). Thus, this code displays what is between the first and the last backslash, which is, in this case, the string mydir\mysubdir.
When you search for strings and substrings, a function you might find useful is the StrComp function. StrComp lets you perform case-sensitive and case-insensitive comparisons. For more information about StrComp, see my August column.
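As a quick reminder of how StrComp behaves, the sketch below compares a few sample strings. The function returns -1, 0, or 1, depending on whether the first string sorts before, equal to, or after the second string. The optional third argument uses the same values as the compare argument in InStr (0 for case-sensitive, 1 for case-insensitive).

```vbscript
MsgBox StrComp("apple", "APPLE")        ' 1: case-sensitive; lowercase "a" sorts after "A"
MsgBox StrComp("apple", "APPLE", 1)     ' 0: equal when case is ignored
MsgBox StrComp("apple", "banana", 1)    ' -1: "apple" sorts before "banana"
```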
Converting Strings
VBScript offers many functions to convert strings to other data subtypes or to convert other data subtypes to strings. For example, you can use the CDate function to convert a string to a date or the CStr function to convert a date to a string. For more information about these types of conversion functions, see my June column.
VBScript 5.0 also offers the Split and Join conversion functions for arrays. (Previous VBScript versions don't have these functions.)
The Split function lets you separate a string into substrings and assign those substrings to an array's indexes. The Join function lets you combine the substrings that each array index contains into one string and assign that string to a string variable. For more information about the Split and Join functions, see my July column.
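For a brief illustration, the sketch below splits a sample path at each backslash, changes the last element, and joins the pieces back together. The path and variable names are made up for the example. Split returns a zero-based array.

```vbscript
Dim fullPath, parts
fullPath = "C:\mydir\mysubdir\myfile.ext"

parts = Split(fullPath, "\")          ' C:, mydir, mysubdir, myfile.ext
MsgBox UBound(parts)                  ' 3: the index of the last element
MsgBox parts(UBound(parts))           ' myfile.ext

parts(UBound(parts)) = "other.txt"    ' replace the filename
MsgBox Join(parts, "\")               ' C:\mydir\mysubdir\other.txt
```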
Creating User-Defined String-Manipulation Functions
Although VBScript offers many functions to manipulate strings, you can develop custom functions to meet your specific needs. For example, I created the Quote function, which encloses a string in quotation marks:
Function Quote(str)
  Quote = Chr(34) & str & Chr(34)
End Function
This function uses the concatenation (&) operator to embed the string between two Chr(34) calls. The Chr function returns the character for the ASCII code you specify, and 34 is the ASCII code for the double quotation mark. (For information about how to create user-defined functions, see my October column.)
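A typical use for a function like Quote is building a command line around a path that contains spaces. The sketch below repeats the function so that it runs on its own; the program name, path, and cmdLine variable are illustrative assumptions.

```vbscript
Function Quote(str)
    Quote = Chr(34) & str & Chr(34)
End Function

Dim cmdLine
cmdLine = "notepad.exe " & Quote("C:\My Documents\notes.txt")
MsgBox cmdLine                  ' notepad.exe "C:\My Documents\notes.txt"

' Inside a string literal, two quotation marks in a row stand for one,
' so """" is a one-character string that equals Chr(34).
MsgBox ("""" = Chr(34))         ' True
```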
On the Win32 Scripting Journal Web site (http://www.winntmag.com/newsletter/scripting), you can find the Web-exclusive sidebar "Create a Custom Function to Manipulate Strings," which contains two more examples of user-defined string-manipulation functions. The first function, SentenceCase, transforms a string into sentence case (i.e., a string in which all the letters are lowercase except the first letter). To make this transformation, the SentenceCase function applies the built-in LCase, UCase, Replace, and Left functions. The second function, SetIndexAt, lets you replace one character at a fixed position in a string. The SetIndexAt function applies the built-in Left, Mid, and Replace functions to achieve this replacement. SetIndexAt also applies a workaround for the Replace function's truncation quirk.
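The sidebar's code isn't reproduced here. To give a sense of what such functions might look like, the following are independent sketches that borrow the sidebar's function names; they are assumptions about one possible approach, and they don't necessarily use the same built-in functions as the sidebar versions.

```vbscript
' Sketch only; not the sidebar's implementation.
' Uppercase the first character and lowercase the rest.
Function SentenceCase(text)
    If Len(text) = 0 Then
        SentenceCase = ""
    Else
        SentenceCase = UCase(Left(text, 1)) & LCase(Mid(text, 2))
    End If
End Function

' Sketch only; not the sidebar's implementation.
' Replace the character at position pos (counting from 1) with newChar.
' A position outside the string leaves the string unchanged.
Function SetIndexAt(text, pos, newChar)
    If pos < 1 Or pos > Len(text) Then
        SetIndexAt = text
    Else
        SetIndexAt = Left(text, pos - 1) & newChar & Mid(text, pos + 1)
    End If
End Function

MsgBox SentenceCase("hELLO, WORLD!")    ' Hello, world!
MsgBox SetIndexAt("Hi All", 4, "a")     ' Hi all
```

Building the result from Left and Mid sidesteps the Replace function's truncation quirk entirely, at the cost of handling the position checks yourself.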
A Happy Medium
If you know how to use only the most basic VBScript functions, you might create more work for yourself because VBScript offers many handy functions that let you perform common tasks with only one call. If you try to learn how to use every one of VBScript's numerous functions, you might spend more time reading than applying what you've learned. You can find a happy medium by knowing how to use the functions I've covered this month and last month. This set of functions will likely meet most of your daily scripting needs.