Progressive Perl for Windows: Cookies, the Cache, and You, Part 2
In Part 1 of this two-part series, Dave Roth explained how to use the WinInet library to retrieve and set cookies. In Part 2, he further explores how to use this library to prestuff, delete, and search for items in the cache.
July 22, 2001
The Win32 Internet (WinInet) library contains functions that let applications such as Microsoft Internet Explorer (IE) and MSN Messenger Service store, or cache, cookies and Web pages. Last month, I explained how to use the WinInet library to retrieve and set cookies. This month, I explore the library further. After I explain the concept of Web caching, I show you how to prestuff, delete, and search for items in the cache.
Web Caching
The best way to understand Web caching is to see what would happen if it didn't exist. Suppose your application needs to access a Web page. The application calls the WinInet library and requests the desired URL. In response, WinInet accesses the Internet, retrieves the requested Web page, and hands the page over to your application. A few minutes later, another application requests that WinInet retrieve the same URL. WinInet again accesses the Internet, retrieves the same Web page, and hands the page over to the second application. As you can see, this process is redundant and wastes bandwidth, time, and network resources.
Fortunately, Web caches exist. Think of a Web cache as a temporary storage location for Web files. WinInet maintains a cache of requested Web content on the hard disk. So, here's what really happens in the example just given. When your application requests the desired URL, WinInet accesses the Internet, retrieves the requested Web page, hands the page over to your application, and stores a temporary copy of the Web page in the cache. When the second application requests the same URL, WinInet simply retrieves the copy on the hard disk instead of retrieving the Web page from the Internet again. As you can see, this setup is faster and doesn't stress network resources.
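The difference between the two scenarios can be sketched in a few lines of pure Perl. This is a minimal in-memory illustration, not how WinInet actually stores its cache on disk. The fetch_from_network() routine is a hypothetical stand-in for a real download. It simply counts how often the "Internet" is contacted.

```perl
use strict;
use warnings;

my %cache;             # URL => page content (in-memory stand-in for the disk cache)
my $network_calls = 0; # how many times we "went to the Internet"

# Hypothetical stand-in for a real HTTP download.
sub fetch_from_network {
    my ($url) = @_;
    $network_calls++;
    return "<html>Content of $url</html>";
}

# Return a page, downloading it only if no cached copy exists.
sub get_page {
    my ($url) = @_;
    $cache{$url} = fetch_from_network($url) unless exists $cache{$url};
    return $cache{$url};
}

# Two "applications" request the same URL; only the first hits the network.
my $first  = get_page('http://www.mydomain.com/');
my $second = get_page('http://www.mydomain.com/');
print "Network calls: $network_calls\n";   # prints "Network calls: 1"
```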
If you open IE's Internet Options or the Control Panel Internet Options applet, you'll discover all the caching options. These options affect all applications that rely on the WinInet library to interact with the Web.
How the Cache Works
WinInet doesn't cache everything. Typically, WinInet caches the graphics, style sheets, and other downloadable Web objects that make up a Web page. However, a Web master might not want certain objects cached locally. For these objects, the Web master can specify a cache-control directive. Such a directive instructs WinInet either not to cache the object at all or to cache it only until a specified expiration date.
When Web content has an expiration date, WinInet can use the cached copy until the specified date. After that time, the library must reconnect to the Web server and check for a newer copy of the content. If a newer copy exists, WinInet downloads that copy and replaces the old copy with it. This process is known as freshness checking.
Web masters often set their content to expire every 24 hours. This expiration period suits Web pages that change daily (e.g., a daily news site) but still lets users cache those pages. A client that has cached such a page doesn't need to request it again for up to 24 hours, so the load on the hosting server is reduced.
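A cache that honors expiration dates adds one comparison to the lookup. The following pure-Perl sketch stores an expiration time with each cached copy and downloads a new copy only after the stored one has expired. It uses hypothetical names (%cache, get_page(), fetch_from_network()). It also simplifies WinInet's behavior. It skips the step of asking the server whether a newer copy exists and simply downloads again. Real servers convey expiration through HTTP headers such as Expires or Cache-Control, which this sketch doesn't parse. The 24-hour lifetime mirrors the daily-news example.

```perl
use strict;
use warnings;

use constant ONE_DAY => 24 * 60 * 60;  # 24 hours, in seconds

my %cache;             # URL => { content => ..., expires => epoch seconds }
my $network_calls = 0;

# Hypothetical download routine; counts calls so the behavior is visible.
sub fetch_from_network {
    my ($url) = @_;
    $network_calls++;
    return "<html>Content of $url (copy $network_calls)</html>";
}

# Return a page, using the cached copy while it is still fresh.
# $now is passed in (rather than calling time()) so the logic is easy to test.
sub get_page {
    my ($url, $now) = @_;
    my $entry = $cache{$url};
    if (!$entry || $now >= $entry->{expires}) {
        # Missing or expired: fetch a new copy and record its expiration date.
        $entry = $cache{$url} = {
            content => fetch_from_network($url),
            expires => $now + ONE_DAY,
        };
    }
    return $entry->{content};
}

my $t0 = 1_000_000;
get_page('http://news.example.com/', $t0);            # downloads
get_page('http://news.example.com/', $t0 + 3600);     # fresh: served from cache
get_page('http://news.example.com/', $t0 + ONE_DAY);  # expired: downloads again
print "Network calls: $network_calls\n";              # prints "Network calls: 2"
```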
The WinInet library saves cached Web objects in a special location on your computer's hard disk. However, the names of the directories and files are hard to decipher. For example, the file C:\documents and settings\rothd\local settings\temporary internet files\content.ie5\wxyz4127\844228[1].htm corresponds to a Web page from http://www.slashdot.com. Therefore, Perl scripts that interact with these cached objects must rely on WinInet's ability to make sense of the directory names and filenames.
As I mentioned last month, a Perl script can't directly access the WinInet caching functions. Although there's a Win32::Internet extension that uses WinInet to access Web and FTP content, this extension lacks the various caching functions. Therefore, to use WinInet's caching features, a script needs to use the Win32::API or Win32::API::Prototype extensions. In the code samples in this column, I use Win32::API::Prototype because the code is easier for non-C programmers to understand. You can use the Perl Package Manager (PPM) tool to install the Win32::API::Prototype extension by typing ppm install http://www.roth.net/perl/packages/win32-api-prototype.ppd. Alternatively, you can find the module on the Comprehensive Perl Archive Network (CPAN—http://www.cpan.org).
Prestuffing the Cache
Because the WinInet cache is simply a location on your hard disk in which the library stores files, you can add to, or prestuff, the cache. You prestuff a cache so that when users request the content, the content comes from the cache instead of the Web server.
Listing 1 contains the script Prestuff.pl, which you can use to prestuff a cache with an Internet or intranet Web page. To launch this script, you use the syntax
perl Prestuff.pl Url File
where Url is the address of the Internet or intranet Web page to stuff (e.g., http://www.amazon.com, http://www.mydomain.com) and File is the pathname of the file to prestuff (e.g., C:\temp\myamazonpage.htm, C:\temp\testcache.txt).
Here's how Prestuff.pl works. The code at callout A in Listing 1 calls the NewString() function, which Win32::API::Prototype exports. NewString() creates a Perl string of the specified size (in this case, 256 characters). Whenever you pass a string pointer to a WinInet function (or to any DLL function that Win32::API exposes) so that the function can fill in the string, you need to create such a Perl string first.
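In plain Perl, a preallocated buffer of this kind is just a string of null characters. The sketch below assumes that an equivalent buffer is 256 NUL bytes. It doesn't confirm how NewString() is implemented internally. It simulates a DLL function that writes a null-terminated path into the buffer. It then trims the result at the first NUL, which a script must do before it uses the returned value. The fake_dll_fill() routine and the sample path are hypothetical.

```perl
use strict;
use warnings;

# A 256-character buffer of NUL bytes, the kind of string a DLL can write into.
my $Path = "\0" x 256;

# Hypothetical stand-in for a DLL function that copies a null-terminated
# string into the caller's buffer without changing the buffer's length.
sub fake_dll_fill {
    my ($buffer_ref, $text) = @_;
    my $data = $text . "\0";                         # C strings end with a NUL
    die "buffer too small\n" if length($data) > length($$buffer_ref);
    substr($$buffer_ref, 0, length($data)) = $data;  # overwrite in place
    return 1;
}

# Keep only the characters before the first NUL.
sub trim_at_nul {
    my ($buffer) = @_;
    $buffer =~ s/\0.*//s;
    return $buffer;
}

fake_dll_fill(\$Path, 'C:\\temp\\example.htm');
print length($Path), "\n";       # still 256
print trim_at_nul($Path), "\n";  # the path without its NUL padding
```

The buffer must exist at full size before the call. The DLL function can't grow a Perl string, so it can only write into space that has already been allocated.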
After the script assigns the Perl string to the $Path variable, the script uses Win32::API::Prototype's exported ApiLink() function to create a link to WinInet's CreateUrlCacheEntry and CommitUrlCacheEntry functions. CreateUrlCacheEntry creates an appropriate cache filename for the file you want to prestuff. CommitUrlCacheEntry writes that file to the cache.
ApiLink() isn't as complex as it looks. To use this function, you pass in the name of the library (in this case, WinInet) as the first argument. The second argument is a string that represents the library's C-language function prototype for the particular WinInet function you're accessing. You can copy such strings from the Microsoft Developer Network (MSDN) Online Web Workshop. Go to the Microsoft Win32 Internet Functions Web page at http://msdn.microsoft.com/workshop/networking/wininet/wininet.asp. Under the Function heading, select the desired WinInet function.
The code at callout B in Listing 1 collects the data to prestuff. If you provide a file's pathname when you launch the script, the script opens that file and collects the data. If you don't provide a pathname, the script does nothing. If you prefer, you could adapt Prestuff.pl at this point so that it downloads the content from the URL you provide instead of reading a local file.
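Collecting the data amounts to reading the whole file into a string. The minimal sketch below is not the listing's code. It assumes the data should be read byte for byte (binmode) so that graphics and other binary objects survive unchanged. The read_prestuff_data() name is hypothetical.

```perl
use strict;
use warnings;

# Return the file's entire contents, or undef if no path was given.
sub read_prestuff_data {
    my ($file) = @_;
    return undef unless defined($file) && length($file);
    open(my $fh, '<', $file) or die "Can't open $file: $!\n";
    binmode $fh;              # read raw bytes (HTML, images, and so on)
    local $/;                 # slurp mode: read the whole file at once
    my $data = <$fh>;
    close $fh;
    return defined $data ? $data : '';
}

my ($Url, $File) = @ARGV;
my $data = read_prestuff_data($File);
print defined $data ? length($data) . " bytes read\n"
                    : "No file given; nothing to do.\n";
```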
The line of code after callout B creates a new entry in the WinInet cache. Note that the script passes the $Path variable to the function. Remember that $Path holds a 256-character string that doesn't contain any data yet. The call to CreateUrlCacheEntry assumes that the buffer $Path points to holds 256 characters.
If CreateUrlCacheEntry is successful, the script sets the $Path variable to the cache file's full path. Then, the code at callout C in Listing 1 opens that file and writes the collected data to it. Finally, the script calls CommitUrlCacheEntry, which records the new file in the cache and associates it with the URL.
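After the buffer holds a path, the write step is ordinary file I/O. The sketch below assumes that the buffer still carries NUL padding after the path, which must be stripped before the path can be used. The write_cache_file() helper is hypothetical. The sketch uses a temporary directory in place of the real cache folder. It leaves out the CommitUrlCacheEntry call because that call requires WinInet.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Write $data to the path held in a NUL-padded buffer such as $Path.
# Returns the cleaned-up path so it can be passed to the commit step.
sub write_cache_file {
    my ($path_buffer, $data) = @_;
    (my $path = $path_buffer) =~ s/\0.*//s;   # drop the NUL terminator and padding
    open(my $fh, '>', $path) or die "Can't create $path: $!\n";
    binmode $fh;                              # write the bytes unchanged
    print {$fh} $data;
    close($fh) or die "Can't close $path: $!\n";
    return $path;
}

# Demonstration: a temporary directory stands in for the cache folder.
my $dir    = tempdir(CLEANUP => 1);
my $target = File::Spec->catfile($dir, 'testcache[1].htm');
my $Path   = $target . ("\0" x 16);   # path followed by NUL padding
my $data   = "<html><body>Test cache page</body></html>\n";

my $written = write_cache_file($Path, $data);
print "Wrote ", (-s $written), " bytes to $written\n";
```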
If you want to experiment with prestuffing the cache, download Prestuff.pl from the Code Library on the Windows Scripting Solutions Web site (http://www.winscriptingsolutions.com). At the command line, run the code that Listing 2 shows. Then, open IE and go to http://www.mydomain.com. You should see your test cache page, unless the Web server's page is newer than your prestuffed copy. To prevent this problem, you can disable the Check for newer versions of stored pages option in the Internet Options applet. You can also reach this option from IE by selecting Tools, Internet Options.
Deleting Objects
After you've prestuffed all sorts of test pages into the cache, you might want to remove them. To do so, you can use the script DeleteCache.pl, which Listing 3 shows. You launch this script with the command
perl DeleteCache.pl URL
where URL is the address of the cached Web object you want to remove. You can pass in multiple URLs, separated by spaces.
In DeleteCache.pl, the code at callout A in Listing 3 does all the work. It calls the DeleteUrlCacheEntry function and passes in the URL you specified at the command line.
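DeleteCache.pl accepts several URLs, so its main job is to loop over the arguments and report each result. The sketch below shows that loop. The WinInet call is replaced by a hypothetical callback, so the control flow can run on any system. In the real script, the callback would call DeleteUrlCacheEntry through the link that ApiLink() creates. The delete_urls() name and the fake cache are illustrative only.

```perl
use strict;
use warnings;

# Try to delete each URL's cache entry and report the outcome.
# $delete_entry is a code reference that returns true on success.
sub delete_urls {
    my ($delete_entry, @urls) = @_;
    my %result;
    foreach my $url (@urls) {
        $result{$url} = $delete_entry->($url) ? 1 : 0;
        printf "%s: %s\n", $url, $result{$url} ? 'deleted' : 'not deleted';
    }
    return \%result;
}

# Hypothetical stand-in for DeleteUrlCacheEntry: pretend only this URL is cached.
my %fake_cache = ('http://www.mydomain.com/' => 1);
my $stub = sub { return defined(delete $fake_cache{ $_[0] }) };

my $r = delete_urls($stub, 'http://www.mydomain.com/', 'http://www.example.com/');
```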
Searching the Cache
Now that you know how to add and remove cached items, let's tackle the job of looking through WinInet's cache. In Part 1, I mentioned that cookies, cached items, and a user's URL history are all in the same database—the WinInet cache. In the Code Library, you can find the script ShowCache.pl, which interacts with the WinInet cache to procure lists of cookies, cached items, and URL histories.
To launch ShowCache.pl, you use the command
perl ShowCache.pl Filter SearchFor
where Filter is the filter you want to use during the search and SearchFor is the search string. For the Filter parameter, you have three options:
You can specify cookie as the Filter parameter if you want to list only cached cookies.
You can specify visited as the Filter parameter if you want to list only URL history entries (i.e., the sites that have been visited).
You can specify *.* as the Filter parameter if you want to list all the items in the cache. Alternatively, you can leave out that parameter.
For the SearchFor parameter, you pass in a regular-expression string that represents the URL for which you want to see the cached cookies or content. If you want the script to search for all the URLs in the cache, you don't pass in a value for the SearchFor parameter. If you don't receive any output, your Internet options might be preventing the storage of visited sites or cookies.
Because the string is a regular expression, you can search for specific types of cached content. For example, you can use amazon.com/.*.htm to search for only amazon.com Web pages (and not graphics or other types of files). However, be aware of how the period works. If you pass in a string such as amazon.com, the script matches not only amazon.com but also amazonacom, amazonbcom, and so forth, because in a regular expression the period (.) matches any character. If you need an exact match, escape the period with a backslash (amazon\.com). Fortunately, in most instances, a widened search doesn't present problems.
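The widened match is easy to see in Perl itself. This sketch assumes that the search string is used directly as a Perl pattern, as the description above implies. It compares an unescaped string with one escaped by Perl's built-in quotemeta() and with a pattern that matches only .htm pages. The sample URLs are made up.

```perl
use strict;
use warnings;

my @urls = (
    'http://www.amazon.com/index.htm',
    'http://www.amazonacom.example/',
    'http://www.amazon.com/logo.gif',
);

my $loose = 'amazon.com';              # "." matches any character
my $exact = quotemeta('amazon.com');   # becomes amazon\.com
my $pages = 'amazon\.com/.*\.htm';     # exact host, files ending in .htm

my @loose_hits = grep { /$loose/ } @urls;   # all three URLs
my @exact_hits = grep { /$exact/ } @urls;   # only the two amazon.com URLs
my @page_hits  = grep { /$pages/ } @urls;   # only index.htm

print scalar(@loose_hits), ' ', scalar(@exact_hits), ' ', scalar(@page_hits), "\n";  # prints "3 2 1"
```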
By specifying both the Filter and SearchFor parameters, you can effectively target the cached information you want. For example, if you use the command
perl ShowCache.pl cookie amazon.com
ShowCache.pl will list all the cached cookies for any site from amazon.com.
ShowCache.pl also accepts two switches: verbose (/v) and delete (/d). The /v switch tells the script to display not only each requested cache entry but also the actual data associated with that entry. The /d switch tells the script to delete each requested cache entry. You put the /v or /d switch after the SearchFor parameter:
perl ShowCache.pl cookie amazon.com /v
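Here is one way a script could separate these switches from the positional Filter and SearchFor arguments and apply the defaults described above. This is an assumption about structure, not an excerpt from ShowCache.pl. The parse_args() name and the returned keys are hypothetical. SearchFor defaults to '.', a pattern that matches any non-empty URL. The sketch avoids an empty default on purpose, because an empty pattern in Perl reuses the last successful match.

```perl
use strict;
use warnings;

# Split a ShowCache.pl-style command line into settings.
# Switches (/v, /d) may appear anywhere; the remaining words are
# Filter and SearchFor, in that order.
sub parse_args {
    my @args = @_;
    my %opt = (verbose => 0, delete => 0);
    my @positional;
    foreach my $arg (@args) {
        if    ($arg =~ m{^/v$}i) { $opt{verbose} = 1 }
        elsif ($arg =~ m{^/d$}i) { $opt{delete}  = 1 }
        else                     { push @positional, $arg }
    }
    $opt{filter}     = defined $positional[0] ? $positional[0] : '*.*'; # all items
    $opt{search_for} = defined $positional[1] ? $positional[1] : '.';   # any non-empty URL
    return \%opt;
}

my $opt = parse_args(@ARGV);
print join(', ', map { "$_=$opt->{$_}" } sort keys %$opt), "\n";
```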
Listing 4 contains an excerpt from ShowCache.pl. Here's a quick overview of how ShowCache.pl works. As the code at callout A in Listing 4 shows, the script first creates an empty 1KB block of memory for the data it will obtain from the cache. Then, WinInet's FindFirstUrlCacheEntryEx and FindNextUrlCacheEntryEx functions work together to perform a filtered enumeration of the cache and return a special packed cache-entry data structure. FindFirstUrlCacheEntryEx and FindNextUrlCacheEntryEx fill the empty 1KB block of memory with this data structure.
Next, the unpack() function cracks open the data structure to reveal the header size. This value specifies the length (in bytes) of the data structure's HeaderInfo member. You need to know HeaderInfo's length because you use it to procure all the values from the cache-entry data structure.
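The sketch below shows what cracking open a packed structure involves, using pure Perl. It builds a 1KB buffer and fills in a small, made-up structure, much as a Win32 function fills in the real one. It then reads a length field with unpack() and uses that length to extract exactly the right number of bytes. The layout is deliberately simplified and is not the real cache-entry structure. It uses little-endian DWORDs (the V template), which matches x86 Windows.

```perl
use strict;
use warnings;

# A zero-filled 1KB buffer, like the block a script hands to a Win32 function.
my $buffer = "\0" x 1024;

# Simulate the function filling in a (made-up) structure:
#   offset 0: DWORD total size of the structure
#   offset 4: DWORD length of the header text that follows
#   offset 8: the header text itself
my $header = "HTTP/1.0 200 OK\r\n";
my $struct = pack('V V', 8 + length($header), length($header)) . $header;
substr($buffer, 0, length($struct)) = $struct;

# Read the two DWORDs, then use the length to pull out exactly the header.
my ($size, $header_len) = unpack('V V', $buffer);
my $header_text = substr($buffer, 8, $header_len);

print "size=$size header_len=$header_len\n";   # prints "size=25 header_len=17"
print $header_text;
```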
Unpacking the cache-entry data structure results in a list of values. Callout B in Listing 4 highlights a neat trick that the script uses to populate a hash with those values. This trick is known as a hash slice. If you prefix a hash's name with the @ symbol and follow it with a list of keys in braces, you can assign a list of values to those keys in order. For example, the code
@List{ 'one', 'two', 'three' } = (1,2,3);
assigns hash keys one, two, and three the values of 1, 2, and 3, respectively.
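Combining unpack() with a hash slice turns a packed record into a readable hash in one statement. The field names and the four-DWORD record below are hypothetical and illustrate only the technique, not the layout of WinInet's cache-entry structure.

```perl
use strict;
use warnings;

# Field names in the order they appear in a (made-up) packed record.
my @fields = qw(StructSize EntryType UseCount HitRate);

# Build a sample record: four little-endian DWORDs.
my $record = pack('V4', 32, 1, 5, 2);

# A hash slice assigns the unpacked values to the keys, in order.
my %entry;
@entry{@fields} = unpack('V4', $record);

print "$_ = $entry{$_}\n" for @fields;
```

Keeping the field names in an array means the names and the unpack template are the only places that describe the layout. Adding a field means editing both together.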
If you didn't use any switches, ShowCache.pl prints the results. If you used the /v switch, the script opens the cache file, collects the extra data, and prints the results. If you used the /d switch, the script attempts to delete each applicable cache entry and prints whether that delete operation succeeded or failed.
Cache Away
I've touched only the tip of the proverbial iceberg of WinInet's caching capabilities. If you want to learn more, check out the Microsoft Win32 Internet Functions Web page. You might be surprised at how useful the WinInet functions can be. For example, you might use them in a network logon script that goes through the cache and deletes questionable cookies, or that prestuffs content you expect users to access.