Fast Cache, Fresh Data
Pull the trigger on stale files in your Web database cache
October 18, 2004
For most Web applications, caching relatively static database data in memory can boost application performance several fold. Caching not only takes some of the load off your database server, but it can also eliminate a network round-trip. Often, a Web application stores its cache in a memory location (either in-process or out of process) on a Web server or an application server; optimally, you'd store the cache in the Web application's local memory. In ASP.NET, the System.Web.Caching.Cache class is stored in local memory, which can greatly increase application performance. ASP.NET has made data-cache implementation so easy and safe that many people have switched their applications to ASP.NET, usually for the performance gains they can get from caching.
Most developers understand that caching is good for performance, but it has one major drawback. Namely, when the cache's underlying data changes, you can have stale data in your cache. No matter how fast your application can retrieve the data from the cache, if the data isn't current, it probably does users no good. ASP.NET lets you create time-based dependencies or dependencies between items in the cache and files in the file system. For example, if you use the ASP.NET Cache object to cache an XML file into memory and have cache and file dependencies set up, then when the file changes, the object automatically invalidates the cache. Or, if you have a time dependency defined, the file's contents would expire after the specified time constraint. Caching combined with cache dependencies is very powerful because it lets you cache data on your Web server without having to worry about it becoming stale. The only limitation on cache dependencies in Microsoft .NET Framework 1.1 and 1.0 is that these versions have no support for database cache dependencies. But what if you aren't developing in ASP.NET? What if you support Web applications on multiple platforms and still need database cache dependencies? Let's look at an example application with a flexible, generic design that lets you gain all the performance advantages of caching in your applications without having to worry if your cache has become out of sync with your database.
In this example scenario, you have Web applications operating alone or as part of a Web-server farm that consumes data from a relational database. You need to be able to invalidate Web-application data caches when the database tables that the caches are derived from change. The solution is to associate caches and their dependencies in a data store, essentially providing the framework to link a cache and its associated database tables. Also, you can define callbacks from the information stores that are connected with local memory caches within Web applications. When updates happen on the table that a cache is based on, a trigger on that table asynchronously invalidates the Web applications' local memory caches. Having the invalidation occur asynchronously is important so that SQL Server doesn't hold locks on the updated table while invalidating the cache.
The advantage of using this design is that Web applications can reap all the benefits of caching database data and still be able to invalidate their local caches when the database data changes. The downside to this design is that it requires extra processing on both the Web server and the database server to set up and invalidate the caches. You shouldn't use this design in scenarios where the database data will be constantly changing; the extra cache-invalidation processing will defeat the purpose of the cache.
This implementation uses SQL Server to track the cache dependencies and notify an application that its cache has expired when the underlying table's data changes. To facilitate the implementation of cache dependencies, I created the database structure that Figure 1 shows. This structure lets you associate more than one database table with the same Web-application cache. (For other solutions to this problem, see the sidebar "Related Solutions.")
Database Cache Dependencies Old and New
In the upcoming .NET Framework 2.0, Microsoft adds support for database cache dependencies, requiring either SQL Server 2000 or 7.0 or SQL Server 2005 (formerly code-named Yukon). The .NET database cache dependency support doesn't work with other databases. Table 1 shows the functional differences between using SQL Server 2000/7.0 and Yukon for database cache dependencies in .NET Framework 2.0. But Microsoft probably won't release .NET Framework 2.0 and Yukon to production for several months at least. So if you need to implement caching now or you need to add this functionality to Web applications that run on platforms other than ASP.NET, this article's solution is for you.
I wanted to develop a solution that works for Active Server Pages (ASP), ASP.NET, Common Gateway Interface (CGI), Cold Fusion, or any other Web-application platform, so I looked for a generic design. My next goal was to build a scalable solution that I could use in a production environment. With performance and scalability in mind, I then implemented the design, using SQL Server 2000 to track database cache dependencies and to asynchronously inform Web applications that a database cache has expired when the cache's underlying table data has changed. (I didn't deal with time dependencies in this solution, however.)
Note that this solution doesn't require SQL Server 2000; you could implement it with almost any relational database management system (RDBMS) that supports triggers, a way of making HTTP calls, and (optionally) asynchronous programming. SQL Server is my database of choice, so I used SQL Server 2000 as the cache-dependency server.
In the other solutions I've seen, the cache invalidations inside the trigger all happen synchronously. This synchronous invalidation could lead to serious performance degradation in a production environment. Even though the triggers in the other solutions don't do much work, they have to call an extended stored procedure. One of the procedures makes an HTTP call to a Web server, and if any delays occur during that HTTP call, either on the Web server or across the network, the trigger will block the table's transaction. Also, if it encounters an error during the HTTP call, SQL Server rolls back the UPDATE action on the table. The transaction on the table might have been legitimate, but because of an error during cache invalidation, nobody can update the table.
I wanted to avoid this problem and decouple the callback logic so that the cache invalidation would happen asynchronously. My solution dynamically creates a SQL Server job to invalidate the cache, then sets the job to execute a few seconds after creation and deletes itself if execution is successful. This way, the callback executes on a different thread from the trigger on the database table and doesn't block the current update transaction. Figure 2 uses a sequence diagram to show the flow of actions when the trigger fires. As Figure 2 shows, the table trigger first creates a job, then adds a job step to call the sp_ProcessCacheExpiryQueue stored procedure to invalidate the cache items. The SQL Server Agent service then executes the job, which calls the stored procedures that make an HTTP call back to the Web server to expire the cache.
The Table Trigger
As I mentioned, this solution is based on a trigger whose main task is to create the job, which calls a few system stored procedures all wrapped inside one stored procedure. Listing 1 shows the template code for creating the trigger. The trigger first calls the sp_CacheExpiryQueue_Insert stored procedure from Listing 2 to add to the CacheExpiryQueue table the name of the table that's being updated. The CacheExpiryQueue table lets the job know which tables' caches it needs to invalidate. As soon as the job invalidates the cache, the job deletes the table-name entry from this table.
The second stored procedure the trigger calls is sp_Create_Process_CacheExpiryQueue_Job. This stored procedure wraps all the system stored procedures you need to create and schedule a job. Listing 2 shows the code for sp_Create_Process_CacheExpiryQueue_Job, which has five steps. First, it creates a unique job name. Next, it creates the job in the msdb database, flagging the job to delete itself if it succeeds. Third, it adds a step for the job, designating that the step will use T-SQL to execute the sp_Process_CacheExpiryQueue stored procedure. Fourth, it calculates the date 10 seconds from the current system time, casts it as an integer, then stores this value in a variable. Finally, the procedure schedules the job to run only once, with a start time of the previously calculated integer variable value.
After it creates the job, the trigger's work is done. Ten seconds later, the SQLServerAgent service executes the job that the trigger created. This means that the job's thread (not the trigger's thread) executes the sp_Process_CacheExpiryQueue stored procedure when the job runs.
The sp_Process_CacheExpiryQueue stored procedure loops through all the table names in the CacheExpiryQueue table, calls sp_Expire_Cache to make the HTTP call to the Web server for each one, then deletes the table-name entry from the CacheExpiryQueue table. Listing 3 shows how to code all this without using a cursor.
To get around having to use a cursor, I simply create a table variable that has an IDENTITY column that starts at 1 and increments by 1. This technique lets you return a known sequence of all the values in the CacheExpiryQueue table. At callout A, the code starts a While loop where the counter variable is less than or equal to the count of records in the table variable. To return the table name, the code queries the table variable where the counter value is equal to the IDENTITY column value. Next, the code at callout B passes the table name as a parameter to the sp_Expire_Cache stored procedure. The code at callout C then increments the counter by 1 after every row in the table variable is processed. If you skip this important step of incrementing the counter, the SQL Server thread will spin forever in an infinite loop.
The sp_Expire_Cache stored procedure, which Listing 4 shows, is responsible for expiring all the Web-application data caches associated with the table name that the code at callout B in Listing 3 passed in to it. The procedure first finds the Web data caches that need invalidating by querying and joining the CacheExpiryUrl and CacheDependency tables. The sp_Expire_Cache stored procedure then puts the results of this query into a table variable that lets it loop through the results in much the same way as the looping code in sp_Process_CacheExpiryQueue does. During the loop, the MSXML2.ServerXMLHTTP COM object requests the cache expiration URL from the Web server. This object is easy to use and thread-safe (that is, you can call it from multiple threads without unwanted interaction between the threads), and it makes the code easy to deploy because you don't need to create any new extended stored procedures to make HTTP calls. You call this COM object by using the sp_OACreate, sp_OAGetErrorInfo, and sp_OAMethod system stored procedures. To execute these stored procedures, the calling user must either be a member of the sysadmin fixed server role or have explicit EXECUTE permissions on the stored procedures. Because you're executing this job in the context of the SQLServerAgent service, you just need to make sure that the account that the SQLServerAgent service is running under is a member of the sysadmin fixed server role or that it has EXECUTE permissions on these stored procedures.
At callout A in Listing 4, the stored procedure creates the COM object before the loop so that you can reuse the same COM object instance for all items in the loop. Calling the sp_OACreate system stored procedure creates the object. To do this, the code at callout A passes in the programmatic identifier (ProgID) parameter MSXML2.ServerXMLHTTP. This stored procedure has an integer type OUTPUT variable (@output), which the code uses as the reference pointer to calling the object's methods. The procedure then uses the sp_OAMethod system stored procedure to call the COM object's open and send methods so that it can make the HTTP request to the Web server defined in the URL. After sending the request, the code checks the HTTP status returned from the Web server. If the status is 200, the request was successful. After every call to the COM object, including its creation, you must check the HRESULT value to see whether an error occurred. If the value is anything other than 0, an error has occurred. In the case of an error, the code at callout B calls the sp_OAGetErrorInfo system stored procedure to retrieve the error information. The procedure then sends the error information to the sp_CacheExpiryError_Insert stored procedure, which adds an entry to the CacheExpiryError table. You don't need to explicitly destroy the reference to the COM object; SQL Server will automatically destroy it at the end of the stored procedure.
As I mentioned earlier, you could implement this design on other relational databases besides SQL Server. As long as your RDBMS supports triggers and some mechanism to make HTTP requests, you can at least implement the design synchronously. If your database system supports programmatically creating and scheduling jobs, you can implement the design asynchronously.
Implementing the Design on the Client
In this article, the client I refer to is the Web application. I implemented this article's client solutions in ASP and ASP.NET, but you could just as easily implement the clients in Cold Fusion, CGI, IBM WebSphere, and other Web-application platforms. Both sample clients I provided in this article have a dynamic Web page that retrieves all the rows from the Northwind database's Customers table, then displays the count of all customers to the screen.
When executing, each client application's Web page must perform the following steps:
Try to get the customer data from the cache.
If the data isn't in the cache, perform the following steps:
Get the customer data from the database.
Cache the customer data locally.
Create a cache dependency in the database.
Create a trigger on the database table.
Display the count of records in the set of records.
Of the Web page's execution steps, the only ones that really participate in my design are the ones that create the cache dependency in the database and create the trigger on the database table. To create the database cache dependency, you need to call the sp_CreateCacheDependency stored procedure, which Listing 5 creates.
This stored procedure accepts as its parameters the URL for the Web page that invalidates the local data cache and the database table that the cache is dependent on. The procedure first checks that no record already exists in the database for the given URL, then inserts the data into the CacheExpiryUrl and CacheDependency tables.
To create the trigger, the code from the Web application sends a dynamic SQL string to the database server and executes it. To perform this action in SQL Server, the user must be a member of the db_ddladmin fixed database role. Although the Web applications in the example solution create the trigger dynamically, you could also create it ahead of time.
In addition to the execution steps I just mentioned, each Web application must implement a Web page that does nothing but invalidate the local data cache. This client piece of the design is extremely important because it completes the cache-invalidation sequence diagram that Figure 2 shows. It's important that the Web page does nothing else because this page's URL is called by the database server's job, and you want this page to return a response to the database server as quickly as possible.
In the code downloads for this article, which you can find at InstantDoc ID 43982, I include a script that creates all the supporting database tables, constraints, indexes, stored procedures, and even a new database user that the client applications use in their database connection strings. I also include full client implementations in both classic ASP 3.0 and ASP.NET, using the .NET Framework 1.1. To get the client solutions working, you just need to create virtual directories that point to the solution files, then run the database script on your SQL Server 2000 database.
In this article, I showed you how to create database cache dependencies for your Web applications and implement the solution on one database platform and two Web-application platforms. This solution builds on the good ideas of others but takes those ideas a few steps further with more emphasis on performance, scalability, and flexibility. The solution is highly scalable because all the work required to invalidate the database cache dependency occurs on a separate thread. By leveraging the ideas and techniques in this article, you can confidently build database caching into your Web applications without having to worry about stale data.
About the Author
You May Also Like