Progressive Perl for Windows: Strolling Down Win32 Paths
Understanding how Perl deals with Win32 paths might resolve some of your scripting problems.
September 23, 2001
Recently, I've noticed a flurry of questions about Win32 paths coursing through the Internet's Usenet and email lists. Systems administrators often take paths for granted, but paths are the cornerstone of what an OS does—namely, manage files. If you've encountered problems in your Perl scripts, understanding how Perl deals with paths in Win32 OSs might help you resolve those problems. Here's a detailed look into the differences between local paths and Universal Naming Convention (UNC) paths in Win32 OSs, how Win32 paths differ from UNIX and DOS paths, and how absolute and relative paths differ.
Local vs. UNC Paths
Win32 OSs have two basic types of paths: local paths and UNC paths. Local paths are the better known because they were introduced back in the days of DOS, before Windows came onto the scene.
Local paths, such as C:\temp, begin with a drive letter. Because only 26 valid drive letters exist (i.e., A through Z), you have a limited number of local paths with which to work. When you consider that some of those drives are already taken, you have even fewer local paths. For example, typically a computer's A drive is the 3.5" floppy disk drive, the C drive is the hard disk drive, and the D drive is the CD-ROM drive. Although the latest Win32 platforms let you change a drive's letter (e.g., change the drive for CD-ROMs from D to W), you're still using one of the 26 drives for that purpose. The only way to gain additional local paths is to map a drive letter to a remote machine's shared directory.
The UNC path points to a directory or file relative to a particular networked machine. The format of this type of path is
\\machine_name\share_name\temp
where machine_name is a computer's name and share_name is a shared directory's name. Double backslashes (\\) preface the computer's name, which can be a valid NetBIOS name, DNS name, or IP address. Any further entries at the end of the path (in this case, temp) are relative to the machine's shared directory.
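To make the UNC layout concrete, here is a minimal sketch that splits a UNC path into its machine name, share name, and remainder. The parse_unc() helper and the sample path are hypothetical and exist only for illustration; they aren't part of any Win32 module.

```perl
use strict;
use warnings;

# parse_unc() is a hypothetical helper, not part of any Win32 module.
# It returns (machine, share, remainder), or an empty list if the
# string doesn't look like a UNC path.
sub parse_unc {
    my ($path) = @_;

    # Two leading backslashes, the machine name, one backslash,
    # the share name, then an optional backslash and remainder.
    return unless $path =~ m{^\\\\([^\\]+)\\([^\\]+)(?:\\(.*))?$};
    return ( $1, $2, defined $3 ? $3 : '' );
}

# In single quotes, each \\ pair yields one backslash, so this
# string begins with two literal backslashes.
my ( $Machine, $Share, $Rest ) = parse_unc('\\\\server\share\temp\foo.pl');
print "machine=$Machine share=$Share rest=$Rest\n";
```

A local path such as C:\temp doesn't match the pattern, so the helper returns an empty list for it.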
UNIX vs. Win32 Paths
Paths exist in both the UNIX and Win32 worlds. Administrators new to the Win32 platform need to understand the differences between the two platforms' paths. The two main differences are the delimiters and the case of the characters.
Delimiters. The most important difference between UNIX and Win32 paths is the delimiter that separates each segment of the path. In UNIX paths, a forward slash (/) delimits directories and files (e.g., /usr/bin/foo.pl), whereas in Win32 paths, a backslash (\) is the delimiter (e.g., \temp\foo.pl). This distinction is important because Perl lets you use different delimiters to make porting scripts between platforms easier. Therefore, in Perl code, the UNIX path
@Statistics = stat( "/temp/foo.pl" );
is equivalent to the Win32 local path
@Statistics = stat( "\\temp\\foo.pl" );
Note the double backslashes in the Win32 local path. In Perl, you need to flag, or escape, a backslash with the escape character, which happens to be a backslash.
Interestingly, Perl lets you use single quotes (') as well as double quotes (") to enclose strings, including paths. However, single-quoted strings differ from double-quoted strings in two ways:
Unlike double-quoted strings, single-quoted strings don't support interpretation of variables. In other words, the Perl interpreter doesn't expand variable names in single-quoted strings.
Although you have to escape backslashes in double-quoted strings, you don't have to escape them in single-quoted strings. The Perl interpreter doesn't recognize escape sequences in single-quoted strings, except in two instances. First, the Perl interpreter recognizes the \' sequence (e.g., 'dave\'s script.pl') so that you can insert a single quote inside a single-quoted string. Second, the Perl interpreter recognizes the \\ sequence, which is useful when the sequence immediately precedes the ending single quote (e.g., '\temp\\'). This exception lets you insert a backslash at the end of a path. In this case, neither exception applies, so you can write the previous Win32 local path as
@Statistics = stat( '\temp\foo.pl' );
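The quoting rules above can be checked with a short sketch. The variable names and the temp directory are assumptions chosen for illustration; the script only builds strings and doesn't touch the file system.

```perl
use strict;
use warnings;

my $Dir = 'temp';    # hypothetical directory name

# Double quotes: $Dir is expanded and each \\ becomes one backslash.
my $Double = "\\$Dir\\foo.pl";

# Single quotes: no expansion, and these backslashes are literal.
my $Single = '\temp\foo.pl';

# Single quotes leave the variable name unexpanded.
my $Unexpanded = '\$Dir\foo.pl';

# The two exceptions: \' inserts a quote, and \\ before the closing
# quote inserts a single trailing backslash.
my $Quote    = 'dave\'s script.pl';
my $Trailing = '\temp\\';

print "$Double\n";
print "$Single\n";
print "$Unexpanded\n";
print "$Quote\n";
print "$Trailing\n";
print "Double- and single-quoted forms match\n" if $Double eq $Single;
```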
When you enclose a Win32 UNC path in single quotes, you run into a slight problem. The Perl interpreter reads the double backslash at the beginning of the UNC path as an escaped single backslash. The only workaround is to escape the first backslash in the pair with another backslash. Therefore, a UNC path in single quotes might look like
@Statistics = stat( '\\\server\temp\foo.pl' );
You handle double-quoted UNC paths similarly to single-quoted UNC paths. The only difference is that you need to escape every backslash:
@Statistics = stat( "\\\\server\\temp\\foo.pl" );
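As a quick check of the UNC quoting rules, the following sketch compares the single-quoted and double-quoted spellings. The server and share names are placeholders, and the script doesn't contact any machine. The third spelling, which escapes every backslash inside single quotes, is an additional option that Perl's quoting rules also allow.

```perl
use strict;
use warnings;

# Single quotes: \\ yields one backslash, and the third backslash
# is literal because it precedes an ordinary character.
my $Single = '\\\server\temp\foo.pl';

# Double quotes: every backslash must be escaped.
my $Double = "\\\\server\\temp\\foo.pl";

# Single quotes with every backslash escaped also works.
my $AllEscaped = '\\\\server\\temp\\foo.pl';

print "$Single\n";
print "All three spellings match\n"
    if $Single eq $Double && $Double eq $AllEscaped;
```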
Although the Perl language lets you use Win32 or UNIX paths, many Perl extensions aren't as flexible, so using a path with forward slashes instead of backslashes would likely result in an error. Thus, when you write Perl code for Win32 OSs, a good rule of thumb is to follow Win32 conventions.
Character case. Unlike UNIX paths, Win32 paths are typically case insensitive. One exception might occur if you use an alternative (non-FAT16, non-FAT32, or non-NTFS) Installable File System (IFS) driver. You can develop an IFS driver that imposes case restrictions. For example, if your company has an NFS file-system driver, you might decide to impose case sensitivity. However, the default Win32 file systems (i.e., FAT16, FAT32, and NTFS) are case insensitive, so the code that Listing 1 shows is completely legal to use with those file systems.
In Listing 1, the first line lowercases the first parameter passed into the script, then assigns that parameter to the $Path variable. The call to the open() function works regardless of the case of the $Path variable's value.
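The listing isn't reproduced here, so the following is only a sketch of the behavior the text describes, not the original Listing 1. The open_lowercased() helper is a hypothetical name introduced for illustration.

```perl
use strict;
use warnings;

# open_lowercased() is a hypothetical helper for illustration.
# It lowercases the path, then tries to open the file for reading.
# It returns a filehandle on success or nothing on failure.
sub open_lowercased {
    my ($Path) = @_;
    $Path = lc $Path;
    open( my $fh, '<', $Path ) or return;
    return $fh;
}

# When run with an argument, try to open that path in lowercase.
if (@ARGV) {
    if ( my $fh = open_lowercased( $ARGV[0] ) ) {
        print "Opened $ARGV[0]\n";
        close $fh;
    }
    else {
        print "Could not open $ARGV[0]: $!\n";
    }
}
```

On a case-insensitive file system, the open succeeds no matter how the file's name is capitalized on disk. On a case-sensitive file system, it succeeds only if the on-disk name is already lowercase.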
DOS vs. Win32 Paths
In a legacy DOS platform, the file and directory names must be uppercase, exclude spaces, and follow the 8.3 format (i.e., the name can be up to eight characters long and the extension can be up to three characters long). In the Win32 platform, file and directory names can be mixed case, include spaces, and be much longer. Thus, a path such as C:\Program Files\My test file.txt is valid in Win32 but not DOS.
Sometimes, however, Win32 paths can create problems. For example, if you run an early Windows or DOS program, it won't understand a long pathname or one that contains spaces or mixed cases. Under these circumstances, the Win32 OS converts the path into a DOS-compatible version. When necessary, the Win32 OS
shortens each component in the path (i.e., each directory name or filename) so that the component doesn't exceed the 8.3 format
removes all spaces in the path
uppercases all letters in the path
For example, a Win32 OS might convert C:\Program Files\My test file.txt to C:\PROGRA~1\MYTEST~1.TXT.
When the Win32 OS converts a filename or directory name, it renames the file or directory in relation to the files or directories that already exist. For example, suppose an application first creates the file C:\Program Files\My test file is cool.txt, then creates the file C:\Program Files\My test file is long.txt. When the Win32 OS converts the two filenames, it changes My test file is cool.txt to MYTEST~1.TXT and changes My test file is long.txt to MYTEST~2.TXT. If the application had created the files in reverse order, the Win32 OS would have changed My test file is long.txt to MYTEST~1.TXT and changed My test file is cool.txt to MYTEST~2.TXT. As this example shows, you can't determine a file's or directory's 8.3-compliant name (hereafter referred to as simply the short name) from its long name. Thus, you can run into problems if a backup program restores files in a different order than the original order of creation. In this situation, you shouldn't depend on the OS to determine the short names; rather, you should create 8.3-compliant filenames.
To determine the short name for a file or directory, you can use the Win32::GetShortPathName() function. This function returns the short pathname for a given long pathname. As Listing 2 shows, you simply pass in the long pathname as the function's sole parameter. Conversely, to obtain a long pathname, you can use the Win32::GetLongPathName() function. In this case, you pass the short pathname to the function. Win32 Perl (aka ActivePerl) includes both functions, so you don't need to load the Win32 extension before you use them.
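A minimal sketch of the round trip between long and short pathnames might look like the following. It isn't the article's Listing 2. The short_and_long() helper and the C:\Program Files path are assumptions for illustration, and the sketch loads the Win32 module explicitly, which is harmless even where the functions are built in. On a non-Windows machine, the helper simply returns nothing.

```perl
use strict;
use warnings;

# short_and_long() is a hypothetical helper for illustration.
# It returns (short name, long name) for a path, or an empty list
# when run off Windows or when no short name is available.
sub short_and_long {
    my ($Path) = @_;
    return unless $^O eq 'MSWin32';
    require Win32;

    my $Short = Win32::GetShortPathName($Path);
    return unless defined $Short && length $Short;

    my $Long = Win32::GetLongPathName($Short);
    return ( $Short, $Long );
}

# Assumed path; substitute any file or directory that exists.
my ( $Short, $Long ) = short_and_long('C:\Program Files');
if ( defined $Short ) {
    print "Short name: $Short\n";
    print "Long name:  $Long\n";
}
else {
    print "No short name available on this system\n";
}
```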
Absolute vs. Relative Paths
In a Perl script, you can use absolute paths (also called full paths) or relative paths. An absolute path contains all the path components, from the drive letter or network share to the filename (e.g., C:\temp\foo.pl). A relative path doesn't contain all the path components (e.g., foo.pl or somedir\foo.pl); its path is relative to the default directory.
All processes that run on a Win32 machine have a default directory. A process inherits its default directory from whatever launched it (e.g., the command prompt's current directory or a shortcut's Start in folder), so the default directory is often, but not always, the folder that contains the program or script. If you want a Perl script to open a file but you don't specify the file's absolute path, the script attempts to load the file from the process's default directory.
You can use the Win32::GetCwd() function to determine the default directory for the current process. Cwd stands for current working directory; working directory is another name for the default directory. You can change the current default directory by calling the Win32::SetCwd() function and specifying the new directory name. You can even specify a UNC directory name. (Some programs, though, are unable to handle a UNC directory name.) Listing 2 contains sample code for the Win32::GetCwd() and Win32::SetCwd() functions.
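For readers without the listing at hand, here is a hedged sketch of reading and changing the default directory. The visit_directory() helper is a hypothetical name, and the fallback directory C:\temp is an assumption. The sketch checks the result by calling Win32::GetCwd() again rather than relying on the return value of Win32::SetCwd().

```perl
use strict;
use warnings;

# visit_directory() is a hypothetical helper for illustration.
# It switches the default directory to $NewDir, reports what
# Win32::GetCwd() then returns, and restores the original directory.
# Off Windows it returns nothing.
sub visit_directory {
    my ($NewDir) = @_;
    return unless $^O eq 'MSWin32';
    require Win32;

    my $Original = Win32::GetCwd();
    Win32::SetCwd($NewDir);
    my $Current = Win32::GetCwd();
    Win32::SetCwd($Original);
    return ( $Original, $Current );
}

# Assumed target: the TEMP environment variable, else C:\temp.
my $Target = $ENV{TEMP} || 'C:\temp';
my ( $Before, $During ) = visit_directory($Target);
if ( defined $Before ) {
    print "Default directory was:  $Before\n";
    print "After Win32::SetCwd():  $During\n";
}
else {
    print "Win32::GetCwd() is available only on Windows\n";
}
```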
Relative paths are extremely flexible. However, determining the full path to a file or directory based on a relative path can be difficult. The Win32::GetFullPathName() function can help. When you call this function in an array context, the function returns an array of two strings. The first string specifies the pathname's drive and directory components, and the second string specifies the filename. For example, to determine your Perl interpreter's directory and filename, you can use either line of code that Listing 3 shows.
You can use the Win32::GetFullPathName() function to determine not only the absolute path to your Perl interpreter's directory but also the absolute path to the running script. As Listing 3 shows, you can call the function in an array context and assign the resulting array's first element to the $ScriptDir variable. You need to pass in the long pathname of the $0 variable to guarantee that you don't accidentally get the short pathname. I prefer to use the Win32::GetFullPathName() function rather than the Win32::GetCwd() function to determine the running script's directory. Using the Win32::GetCwd() function doesn't guarantee that you'll receive the location of the Perl script that's running because you can inadvertently start a process that specifies an alternative default directory.
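The approach described above can be sketched as follows. This isn't the article's Listing 3. The split_full_path() helper is a hypothetical name, and falling back to the original string when no long pathname comes back is an assumption made for this sketch. $^X is Perl's built-in variable that holds the name used to run the interpreter.

```perl
use strict;
use warnings;

# split_full_path() is a hypothetical helper for illustration.
# It converts a path to its long form, then returns the
# (directory, filename) pair from Win32::GetFullPathName().
# Off Windows it returns nothing.
sub split_full_path {
    my ($Path) = @_;
    return unless $^O eq 'MSWin32';
    require Win32;

    # Assumption: keep the original string if no long name comes back.
    my $Long = Win32::GetLongPathName($Path) || $Path;

    # In list context, the function returns the directory portion
    # and the filename as two separate strings.
    my ( $Dir, $File ) = Win32::GetFullPathName($Long);
    return ( $Dir, $File );
}

my ( $ScriptDir, $ScriptFile ) = split_full_path($0);     # running script
my ( $PerlDir,   $PerlFile )   = split_full_path($^X);    # interpreter

if ( defined $ScriptDir ) {
    print "Script:      $ScriptDir ($ScriptFile)\n";
    print "Interpreter: $PerlDir ($PerlFile)\n";
}
else {
    print "Win32::GetFullPathName() is available only on Windows\n";
}
```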
The Path Is Clear
I wanted to use this month's column to cover some of the finer points of Win32 paths. I hope that I've answered some of the questions about paths that might have been lurking in the back of your mind.