Clear Out Cached Data
A simple Perl script cleans up
October 5, 2003
Today's user-friendly programs and OSs often provide users instant access to recently used data and documents through cached Web pages, type-ahead lists, and most recently used (MRU) lists. All these features are helpful but can consume a great deal of disk space and can become a burden for systems administrators who need to protect users' privacy. When such problems arise, you can use the Perl script CleanCache.pl, which Web Listing 1 (http://www.winscriptingsolutions.com, InstantDoc ID 40232) shows, to clean out many of the most common Windows file caches and MRU lists. (This article assumes that you have a basic understanding of Perl; for more information about that language, see http://www.roth.net/perl.)
Internet Explorer Cache
When you use Microsoft Internet Explorer (IE) to surf the Web, IE stores Web pages, graphics, cookies, JavaScript files, and other downloaded content on your hard disk in a location known as the IE cache. Subsequent requests for such stored objects don't require you to reconnect to a Web server. Thus, you save connection and download time, especially for large downloadable files. The cache doesn't belong only to IE, however; rather, it belongs to a library known as WinInet.dll. This library provides Web, FTP, and Gopher protocol access as well as caching functionality. IE and other applications, such as Microsoft Outlook Express and Windows Media Player (WMP), use the WinInet cache, so the cache can contain interesting data files. To view the list of items in the cache, open the Control Panel Internet Options applet, go to the General tab, and click Settings (under Temporary Internet files). Then, click View Files in the Settings dialog box.
The WinInet cache consists of two components: a database and a repository. The repository is a directory or collection of directories in which the cached files reside. The database contains entries that map each URL to a cached item's location on disk. When you download a file from the Web, IE stores the file in the repository and adds a pointer entry to the database. Typically, the cache repository is under %systemdrive%\documents and settings\username\local settings\temporary internet files, where username is the name of the user.
Clearing the cache is fairly simple—you just need to remove the cached files from the hard disk's repository (typically the Temporary Internet Files directory) and remove the entries from the database. However, cache database entries can map not only to the repository but also to any file in any directory on any drive on the machine or even on a remote machine. Therefore, if you simply delete all the files in the Temporary Internet Files directory, you might miss some cached files.
In addition, the WinInet cache can store cached files individually or as a group. Therefore, any script that cleans out the cache should query the cache database for both individual files and groups of files. For example, WMP can cache content when streaming from a media server or progressively download content from a Web server. WMP stores these files as a group in the WinInet cache. Therefore, if a script searches for and deletes only files, the script might not remove the group of cached WMP files. Along the same lines, to delete only WMP media files, you need a script that will recognize cached group files.
Furthermore, deleting cached files doesn't clean out the cache database, which an application (e.g., IE) or script can still query. To properly clean out the cache, you need to enumerate all entries in the database, then delete each cached file and remove the related entry from the database. The WinInet library exposes functions that enumerate each cache entry and a function—DeleteUrlCacheEntry()—that both deletes the cached file and removes the entry from the cache database. (The same procedure is necessary to remove cached cookies and browser history information.) Because multiple processes can use the WinInet cache simultaneously, the database files are usually open. Consequently, a script can't delete the actual database files. Rather, the script must simply delete information from the database files.
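To make the database-versus-repository distinction concrete, here is a toy model in plain Perl. It is not WinInet code and not taken from CleanCache.pl: the %database hash, the two temporary directories, and the delete_cache_entry subroutine are all hypothetical stand-ins. The sketch shows why wiping the repository directory alone leaves stale entries and misses files stored elsewhere, whereas enumerating the entries and deleting each one cleans up both.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Toy stand-ins: a "repository" directory, a directory outside it, and a
# "database" hash that maps each URL to a cached file's location on disk.
my $repository = tempdir(CLEANUP => 1);
my $elsewhere  = tempdir(CLEANUP => 1);

my %database = (
    'http://example.com/a.html' => File::Spec->catfile($repository, 'a.html'),
    'http://example.com/b.gif'  => File::Spec->catfile($repository, 'b.gif'),
    'http://example.com/c.wmv'  => File::Spec->catfile($elsewhere,  'c.wmv'),
);

# Create the "cached" files.
for my $path (values %database) {
    open my $fh, '>', $path or die "Cannot create $path: $!";
    print {$fh} "cached content\n";
    close $fh;
}

# Hypothetical counterpart to DeleteUrlCacheEntry(): removes the database
# entry and the file that the entry points to.
sub delete_cache_entry {
    my ($db, $url) = @_;
    my $path = delete $db->{$url};
    return 0 unless defined $path;
    unlink $path if -e $path;
    return 1;
}

# Naive cleanup: delete only the files that sit in the repository directory.
opendir my $dh, $repository or die "Cannot open $repository: $!";
my @cached = grep { -f $_ } map { File::Spec->catfile($repository, $_) } readdir $dh;
closedir $dh;
unlink @cached;

my @stale  = grep { !-e $database{$_} } keys %database;
my @missed = grep { -e $_ } values %database;
printf "After naive cleanup: %d stale entries, %d file(s) missed\n",
    scalar @stale, scalar @missed;

# Full cleanup: enumerate every entry and delete both file and entry.
delete_cache_entry(\%database, $_) for keys %database;
printf "After full cleanup: %d entries left\n", scalar keys %database;
```

In the real cache, the enumeration and deletion happen through WinInet functions rather than a Perl hash, but the order of operations is the same: walk the entries, and let the delete call handle both the file and its database record.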
Temporary Files
Applications generate all sorts of temporary files. Although some applications automatically delete their temporary files when they shut down, many applications don't remove these files. (Microsoft Word is notorious for creating temporary files and never cleaning them out.)
Over time, lingering temporary files can take up considerable space, ranging from several kilobytes to several hundred megabytes or more. Worse yet, these files are generally small and can lead to fragmentation of larger files. Severe file fragmentation can degrade overall system performance (especially if the Windows pagefile becomes fragmented) and cause undue wear and tear on your hard disk. Therefore, cleaning out the temporary file directories from time to time is a good idea.
Temporary files reside in two locations: the Temporary Internet Files directory and the Temp directory. The WinInet library uses the Temporary Internet Files directory to store cached Internet files, as I explained earlier. The Temp directory stores temporary files that applications and the OS create. The TEMP environment variable determines the exact location of the Temp directory. You can view this variable by issuing the Set command from a command line. (In Perl, you can read the TEMP key of the %ENV hash, $ENV{'TEMP'}, to discover the path to the Temp directory.) To see the list of environment variables in the GUI, right-click the My Computer icon and select Properties. Go to the Advanced tab and click Environment Variables. Removing temporary files is as simple as deleting all files from the Temp directory; removing files from the Temporary Internet Files directory is a bit more complicated, as I explained earlier.
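As a rough sketch of that "delete all files" step (not the code from CleanCache.pl), the following Perl walks a directory, tallies the files it finds, and deletes them only when asked. The clean_directory subroutine and its return structure are hypothetical names chosen for this illustration, and the demonstration runs against a throwaway sandbox directory rather than the real Temp directory.

```perl
use strict;
use warnings;
use File::Find ();
use File::Spec;
use File::Temp qw(tempdir);

# Walk $dir, count files and bytes, and delete files only when $delete is true.
# Subdirectories are descended into but left in place.
sub clean_directory {
    my ($dir, $delete) = @_;
    my %tally = (files => 0, bytes => 0, deleted => 0, failed => 0);
    return \%tally unless defined $dir && -d $dir;

    File::Find::find({
        no_chdir => 1,
        wanted   => sub {
            my @st = lstat($File::Find::name) or return;
            return if -d _;                      # skip directories themselves
            $tally{files}++;
            $tally{bytes} += $st[7];             # element 7 is the size in bytes
            return unless $delete;
            if (unlink $File::Find::name) { $tally{deleted}++ }
            else                          { $tally{failed}++ }   # e.g., file in use
        },
    }, $dir);

    return \%tally;
}

# Demonstration in a sandbox so that nothing real is removed.
my $sandbox = tempdir(CLEANUP => 1);
for my $name (qw(~WRD0001.tmp ~WRD0002.tmp setup.log)) {
    my $path = File::Spec->catfile($sandbox, $name);
    open my $fh, '>', $path or die "Cannot create $path: $!";
    print {$fh} 'x' x 100;
    close $fh;
}

my $report = clean_directory($sandbox, 0);       # report only
printf "Found %d files using %d bytes\n", @{$report}{qw(files bytes)};

my $result = clean_directory($sandbox, 1);       # actually delete
printf "Deleted %d files, %d could not be deleted\n", @{$result}{qw(deleted failed)};
```

Pointing clean_directory at $ENV{'TEMP'} instead of the sandbox would clean the real Temp directory. Files that another process holds open typically can't be deleted, which is why the sketch counts failures rather than treating them as fatal.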
Other Data
Several other types of data can clutter up disk space. You can use CleanCache.pl to clean out the following data (often by simply removing the specified registry subkey's values).
IE form data. IE can remember data that a user enters into a Web form. This ability makes filling out forms easier for the user. This remembered information resides in the HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\IntelliForms\SPW registry subkey.
IE Typed URL list. When you type a URL into the IE address bar, IE stores the address in a Typed URL list from which you can then select previously typed addresses. The Typed URL list resides in the HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\TypedURLs registry subkey.
MRU list. Choosing Start, Run opens an edit box in which you can enter a command or path to run an application. This edit box provides a drop-down list of previously entered paths. This list is known as the MRU list. The MRU list is handy because you don't need to remember the complete command or path you previously used to run an application. However, anyone who accesses your logon account or the MRU setting's registry subkey can also find this information. The MRU list resides in the HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\RunMRU registry subkey.
Recent File list. Windows maintains a Recent File list (aka the My Documents list), which lists all the files that you've loaded recently, similar to the way the MRU list lists recently run applications. A Trojan horse or virus can query the Recent File list to gather data about a user's work habits. The list is actually a directory on the hard disk, typically %systemdrive%\documents and settings\username\recent. The directory contains shortcut link (.lnk) files that point to the actual files. The directory can contain hundreds of entries, but Windows displays only a short list of the most recently accessed files.
Recycle Bins. When you use Windows Explorer to delete files, Windows stores the files in a Recycle Bin so that you can recover the files if necessary. Each drive usually contains a Recycle Bin, and an aggregate Recycle Bin resides on the desktop. Recycle Bins can become quite large and should be emptied periodically.
The Script
CleanCache.pl cleans up all these various types of cached data. This script is useful when cleaning a user account from a machine so that other users can't discover which data the user accessed and used. The script might seem complicated at first glance but is really quite simple. Let's examine the most important sections of the script.
The code excerpt that Listing 1, page 11, shows deals with configuration. In this section, the script assigns values to various variables. I obtained many of these values from Microsoft Developer Network (MSDN) documentation; I discovered others through experimentation. The script encloses this section in a "no strict" block that disables Perl's strict pragma so that the many variables that aren't lexically scoped with "my" won't cause compile-time errors.
The code excerpt that Listing 2, page 12, shows loads various libraries to expose required functions. The script will use the Win32::API::Prototype module to call these functions to perform specific tasks such as emptying the Recycle Bin (the SHEmptyRecycleBin() function) and deleting a cache entry (the DeleteUrlCacheEntry() function). The code excerpt that Listing 3, page 13, shows calls various subroutines to delete files, remove values from registry subkeys, and call OS functions.
The DeleteUrlCacheGroups subroutine enumerates the cache groups that exist in the WinInet cache database. The script then gathers information (e.g., how much disk space the group consumes) about each cache group, as the code excerpt in Listing 4 shows. Notice that the block of code at callout A in Listing 4 uses a trick to assign an array of values to a hash. This trick works because the order of the array is well known. However, the block would cause errors if the code didn't disable the use of strict for that block. The code excerpt that Listing 5 shows deletes and clears the cache (assuming you've directed the script to do so, as I explain later).
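The array-to-hash trick is a Perl hash slice. The following sketch is an illustration rather than the code at callout A: the field names and the five-integer record layout are assumptions made for this example, and pack() stands in for the buffer that an OS call would fill. Because the hash here is declared with "my", the slice works with strict enabled.

```perl
use strict;
use warnings;

# Field names in the exact order in which the values appear in the record.
# (Illustrative names, not the ones CleanCache.pl uses.)
my @fields = qw(group_size flags type disk_usage disk_quota);

# Stand-in for a buffer that an OS call would fill:
# five 32-bit little-endian unsigned integers.
my $raw = pack 'V5', 20, 0, 1, 4096, 65536;

# Hash slice: assign the whole list of unpacked values to the named keys at once.
my %group;
@group{@fields} = unpack 'V5', $raw;

printf "%-10s %d\n", $_, $group{$_} for @fields;
```

The slice pairs keys and values purely by position, so it is only safe when the record layout is fixed and known in advance, which is the point the article makes about the order of the array.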
The DeleteUrlCacheFiles subroutine does essentially the same thing as the DeleteUrlCacheGroups subroutine but deletes individual cache entries rather than cache groups, and it is considerably more complicated. Each cached entry contains information and attributes such as the date and time when the file was cached, the URL that maps to the entry, and the cached file's expiration time. Each cached entry can be a different size, so the script first allocates a 1KB buffer to the $pCacheInfo variable. The code excerpt that Listing 6 shows begins the enumeration process but must determine whether the buffer is large enough to hold the cache entry data. If the buffer is insufficient, the script reallocates the buffer. The script uses this strategy each time it accesses a cache entry. The code excerpt that Listing 7 shows uses the same technique as the code in Listing 4 to unpack the cached entry data into a %Cache hash. After extracting the cache data, the script determines the cache entry type (i.e., a cookie, URL history entry, or cached file).
The CleanDirectory subroutine makes a call to the OS's SHGetFolderPath() function. When the script passes in a constant special item ID list (CSIDL) value, the function returns the full path to a specialized directory such as the My Documents directory, the Recent File list, or the Temporary Internet Files directory. (For more information about how the script discovers paths, see the sidebar "Discovering Paths," page 16.) The function returns a Unicode string, and the block removes any NULL character in the string; this action can be a problem for paths that contain non-ASCII characters. The subroutine then calls the CleanDirectoryAndFiles() function to remove files from the directory. If the deletion of a file fails, the script attempts to rename the file so that it can easily be identified for cleanup later.
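The allocate, test, and reallocate strategy can be sketched without touching WinInet. In the following illustration, fake_get_entry is a hypothetical stand-in that behaves the way such Win32 calls commonly do: it fails with ERROR_INSUFFICIENT_BUFFER (error code 122) and reports the size it needs when the buffer is too small. The read_entry subroutine and the 3000-byte sample entry are likewise assumptions for this example.

```perl
use strict;
use warnings;

use constant ERROR_INSUFFICIENT_BUFFER => 122;   # Win32 error code

my $entry_data = 'x' x 3000;   # pretend cache entry, larger than the 1KB buffer
my $last_error = 0;            # stands in for the OS's last-error value

# Hypothetical stand-in for an enumeration call: fills the caller's buffer,
# or fails and reports the required size through $$size_ref.
sub fake_get_entry {
    my ($buffer_ref, $size_ref) = @_;
    my $needed = length $entry_data;
    if ($$size_ref < $needed) {
        $$size_ref  = $needed;
        $last_error = ERROR_INSUFFICIENT_BUFFER;
        return 0;
    }
    substr($$buffer_ref, 0, $needed) = $entry_data;
    $$size_ref  = $needed;
    $last_error = 0;
    return 1;
}

# Start with 1KB; grow the buffer to the reported size and retry when needed.
sub read_entry {
    my $size     = 1024;
    my $buffer   = "\0" x $size;
    my $attempts = 0;
    until (fake_get_entry(\$buffer, \$size)) {
        die "Unexpected error $last_error\n"
            unless $last_error == ERROR_INSUFFICIENT_BUFFER;
        die "Buffer still too small after several attempts\n" if ++$attempts > 3;
        $buffer = "\0" x $size;                  # reallocate at the reported size
    }
    return substr($buffer, 0, $size);
}

my $entry = read_entry();
printf "Read %d bytes of entry data\n", length $entry;
```

Capping the number of retries is a defensive choice in this sketch; it guards against looping forever if the reported size keeps changing between calls.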
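The difference between stripping NULL characters and decoding the string properly can be shown in a few lines. This is a sketch under stated assumptions, not code from the script: the sample path and user name are invented, and the buffer is built with Perl's Encode module rather than returned by a Win32 call. It assumes the wide-character data is UTF-16LE, which is how Windows stores Unicode strings.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for the buffer a wide-character Win32 call would fill: a UTF-16LE
# path, a NUL terminator, then unused buffer space.
my $name   = "\x{0141}ukasz";   # invented user name that starts with U+0141
my $buffer = encode('UTF-16LE', "C:\\Documents and Settings\\$name\\Recent\0")
           . ("\0" x 20);

# Shortcut: strip every NUL byte. This works only when every character is
# ASCII; U+0141 is stored as the bytes 0x41 0x01, so it survives as "A\x01".
(my $stripped = $buffer) =~ tr/\0//d;

# Safer: decode the bytes as UTF-16LE, then cut at the first NUL terminator.
my $decoded = decode('UTF-16LE', $buffer);
$decoded =~ s/\0.*\z//s;

binmode STDOUT, ':encoding(UTF-8)';
printf "Stripped result matches decoded path: %s\n",
    $stripped eq $decoded ? 'yes' : 'no';
print "Decoded path: $decoded\n";
```

For purely ASCII paths the two approaches give the same result, which is why the shortcut often goes unnoticed until a user name or folder name contains a character outside that range.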
The ClearRegistryKey() subroutine removes all values from a specified registry subkey. The script calls this subroutine several times to clear out the Run MRU list, IE's form data, and IE's Typed URL list.
The EmptyRecycleBin subroutine queries the machine's Recycle Bins for statistics (such as how many files are in the bins) and empties the Recycle Bins. When the script calls the SHEmptyRecycleBin() Windows function to empty the bins, the script passes in several flags to prevent a confirmation dialog box from being offered to the user. The flags also suppress any sound signaling that the bins are being purged and any dialog box showing the progress of such purging.
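Those flags travel to the function as a single bitmask. The sketch below only builds that value; it doesn't call SHEmptyRecycleBin(), and it isn't taken from CleanCache.pl. The constant names and values are the ones the Windows SDK documents for SHEmptyRecycleBin(); the $flags variable is a name chosen for this example.

```perl
use strict;
use warnings;

# Flag values documented for SHEmptyRecycleBin() in the Windows SDK.
use constant {
    SHERB_NOCONFIRMATION => 0x00000001,   # no confirmation dialog box
    SHERB_NOPROGRESSUI   => 0x00000002,   # no progress dialog box
    SHERB_NOSOUND        => 0x00000004,   # no sound when the purge completes
};

# Combine the individual flags with bitwise OR into one value.
my $flags = SHERB_NOCONFIRMATION | SHERB_NOPROGRESSUI | SHERB_NOSOUND;
printf "dwFlags = 0x%08X\n", $flags;

# Test an individual flag with bitwise AND.
print "Confirmation dialog box suppressed\n" if $flags & SHERB_NOCONFIRMATION;
```

Leaving a flag out of the OR expression restores the corresponding behavior, so a more cautious variant could drop SHERB_NOCONFIRMATION and let Windows ask the user before purging.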
Running the Script
The script uses the Win32::API::Prototype module, which you can install using the Perl Package Manager (PPM). To do so, enter the following on a command line:
ppm install http://www.roth.net/perl/packages/win32-api-prototype.ppd
This module relies on the Win32-API extension, which comes standard with ActiveState's build of Perl. You can also get the extension at http://dada.perl.it/#api and use PPM to install it with the command
ppm install win32-api
When you run the script without passing in any parameters, it will collect information regarding how much disk space the cache is using, how many items are cached, how many items are in the MRU list, and so forth. The script then will display a tally of this information but won't delete any items or clean out the cache. When you run the script with the /v parameter, the script's display will be verbose. When you run the script with the /s parameter, the script will run in silent mode and won't display any text; this parameter overrides the /v parameter. When you run the script with the /d parameter, the script will delete cached files, clear the cache database, and clear the other types of data I discussed earlier.
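The switch handling described above, including the rule that /s overrides /v, can be sketched as follows. This is an assumption about one reasonable way to parse such switches, not the script's own parser: the parse_switches subroutine is a hypothetical name, and accepting a leading hyphen and ignoring letter case are choices made for this example.

```perl
use strict;
use warnings;

# Parse CleanCache.pl-style switches from a list of arguments.
# Unrecognized arguments are collected rather than treated as fatal.
sub parse_switches {
    my @args = @_;
    my %opt  = (verbose => 0, silent => 0, delete => 0, unknown => []);

    for my $arg (@args) {
        if    ($arg =~ m{^[/-]v$}i) { $opt{verbose} = 1 }
        elsif ($arg =~ m{^[/-]s$}i) { $opt{silent}  = 1 }
        elsif ($arg =~ m{^[/-]d$}i) { $opt{delete}  = 1 }
        else                        { push @{ $opt{unknown} }, $arg }
    }

    $opt{verbose} = 0 if $opt{silent};   # /s overrides /v
    return \%opt;
}

my $opt  = parse_switches(@ARGV);
my $mode = $opt->{delete} ? 'delete' : 'report only';
print "Mode: $mode\n" unless $opt->{silent};
```

With no arguments the sketch reports only, which mirrors the script's default of tallying cached data without deleting anything.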
The success of the script depends on exclusive access to the cache database. If another process is using the WinInet library, the cache database might not be cleaned out fully. For this reason, you should terminate all instances of IE, including instances embedded in other applications such as WMP's Media Guide, before running the script. In addition, some poorly written services use the WinInet library and should be stopped before you run the script.
You might also notice that even after you run the script with the /d parameter, Windows Explorer's Run MRU list might not appear to have been cleared. Windows Explorer loads the MRU list into memory and doesn't necessarily reload it from storage. To show the cleared MRU list, terminate and restart Windows Explorer by logging off and logging back on.