Content Delivery Networks
Cache your Web content at different locations on the Internet to more quickly serve pages to users.
February 4, 2002
Who needs them?
You can easily become bogged down in the day-to-day aspects of an IIS administrator's job—such as installing the most recent security hotfixes—and overlook more interesting tasks such as tuning Web site performance, ensuring scalability, and optimizing bandwidth utilization. For the good of your Web site and your sanity, take a few minutes away from NTBugtraq to learn how Content Delivery Networks (CDNs) can improve your site's performance while reducing your costs.
What's a CDN?
A CDN is a set of multiple, coordinated HTTP proxy servers—called CDN cache nodes—distributed throughout the Internet. Proxy servers, such as Microsoft Internet Security and Acceleration (ISA) Server 2000, sit between a Web server and the public Internet and cache frequently accessed pages and images. As Figure 1 shows, a proxy server receives an incoming request exactly as though the proxy server were a Web server. The proxy server then impersonates a Web browser to request the page from the actual Web server. After the Web server returns the page, the proxy server forwards a copy to the end user and stores a copy on its local hard disk. The next time a user requests that page, the proxy server can serve it directly from its hard disk, saving a request to the Web server.
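If it helps to see the request flow described above as code, here's a minimal Python sketch of a caching proxy. It's a toy model, not how ISA Server or any CDN product is actually implemented: a dictionary stands in for the proxy's hard disk, and the `fetch_from_origin` callable is a hypothetical placeholder for the HTTP request the proxy sends to the real Web server.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy model of a proxy cache: serve from the cache on a hit,
    otherwise fetch from the origin Web server and keep a copy."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # fetch_from_origin is a hypothetical stand-in for the request the
        # proxy makes to the real Web server on the user's behalf.
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}  # stands in for the local hard disk
        self.origin_requests = 0  # how many requests reached the Web server

    def get(self, url: str) -> bytes:
        if url in self._cache:
            # Cache hit: the Web server never sees this request.
            return self._cache[url]
        # Cache miss: act like a browser and ask the origin server.
        self.origin_requests += 1
        body = self._fetch(url)
        self._cache[url] = body  # store a copy for the next user
        return body


if __name__ == "__main__":
    proxy = CachingProxy(lambda url: f"<html>page for {url}</html>".encode())
    proxy.get("/index.htm")  # miss: forwarded to the origin
    proxy.get("/index.htm")  # hit: served from the cache
    print(proxy.origin_requests)  # 1
```

The second request for the same page never reaches the Web server, which is exactly the saving the paragraph above describes.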
A CDN distributes Web site content throughout the Internet to cache nodes at the ISPs that provide access to the end users who visit your site. The cache nodes serve the Web pages faster than your Web server can because the pages don't have to cross the congested Internet backbone. Figure 2 shows a small CDN consisting of two cache nodes at two ISPs. However, CDN providers can have thousands of nodes distributed throughout the Internet.
To determine the ideal locations for its cache nodes, a CDN provider must understand where users connect to the Internet and how traffic travels between ISPs. After identifying locations, the CDN provider must make arrangements with ISPs to plug its cache nodes into their networks. Although most ISPs offer rack space for hosting Web servers, CDN providers need to place their cache nodes as close to the core of the ISP's network as possible, preferably connected directly to the ISP's main routers. CDN providers might pay an ISP to plug a cache node into the ISP's network, or they might convince the ISP that having a cache node on its network will improve performance for the ISP's customers and reduce the ISP's upstream bandwidth requirements.
Contracting with a CDN provider to move your Web site's content closer to your end users has several positive effects. First, it improves the performance of your Web site for users because their requests don't have to incur the delay of crossing the Internet. Second, a CDN takes the load off your Web server, so you don't need as many servers or processors to meet your users' demands. Finally, it reduces the bandwidth your Web server generates, which saves you money because you can purchase less bandwidth from your ISP or hosting service provider. The CDN provider charges you for the bandwidth you use, but this cost is typically lower than the alternative.
Of course, using a CDN presents some technical challenges. If you manage one IIS server, making a change to your Web site is as simple as editing a file. Changes are more complicated when your content is distributed around the Internet. At a minimum, CDN cache nodes respect the content expiration setting you configure on the HTTP Headers tab of the Web site's Properties dialog box (just as browsers do). Some CDN providers also offer a proprietary UI for removing content from cache nodes.
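Cache nodes and browsers learn about expiration through standard HTTP response headers. The exact headers IIS sends for each expiration option aren't covered here, so the following Python sketch simply assumes a response that carries either a `Cache-Control: max-age` directive or `Expires` and `Date` headers, and applies a simplified version of the HTTP/1.1 freshness rule: max-age wins, otherwise the lifetime is Expires minus Date. Real caches handle more cases, such as heuristic freshness and revalidation.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def freshness_lifetime(headers: Dict[str, str]) -> Optional[timedelta]:
    """How long a cached copy stays fresh, or None if the response says nothing.

    Simplified HTTP/1.1 rule: Cache-Control max-age takes precedence;
    otherwise the lifetime is Expires minus Date. Header names are assumed
    to use their usual capitalization.
    """
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return timedelta(seconds=int(value))
    if "Expires" in headers and "Date" in headers:
        try:
            return (parsedate_to_datetime(headers["Expires"])
                    - parsedate_to_datetime(headers["Date"]))
        except (TypeError, ValueError):
            # Unparseable or inconsistent dates count as already expired.
            return timedelta(0)
    return None


def is_fresh(headers: Dict[str, str], stored_at: datetime, now: datetime) -> bool:
    """True if the node may serve its copy without asking the Web server again."""
    lifetime = freshness_lifetime(headers)
    if lifetime is None:
        return False  # conservative choice for this sketch: go back to the origin
    return now - stored_at < lifetime


if __name__ == "__main__":
    stored = datetime(2002, 2, 4, 12, 0, tzinfo=timezone.utc)
    headers = {"Cache-Control": "public, max-age=3600"}  # expire after one hour
    print(is_fresh(headers, stored, stored + timedelta(minutes=30)))  # True
    print(is_fresh(headers, stored, stored + timedelta(hours=2)))     # False
```

Once a copy is stale, the node has to return to your Web server, which is why a short expiration time trades CDN offload for faster propagation of your changes.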
CDN Origins
CDNs were originally designed to compensate for Internet performance problems. From 1996 to 1998, Internet use was growing too quickly for most ISPs to keep up with. Internet backbones and peering points, in particular, were becoming performance bottlenecks. (Peering points are locations on the Internet at which ISPs freely exchange traffic with one another.) A Web site's performance for a given user depended on the number of network hops between the user and the site: while the backbone was congested, every additional hop across it made the site slower for that user. E-commerce sites lost money because performance was so bad that customers left the sites without ordering.
Given that the backbone was a significant cause of poor performance, the idea of placing a Web site in multiple places on the Internet made sense. The promise of performance improvement made CDNs a clear value proposition.
Then, in 2000, everything changed. When the dot-coms went bust, the number of e-commerce sites dwindled. As the growth of the Internet slowed, ISPs had a chance to catch up with the performance demands placed on the backbone. Now that backbone and Web site performance have improved, who still needs a CDN?
Scalability and Bandwidth
Perhaps you do, but you probably don't need a CDN to reduce Internet latency. Today, CDNs are most useful to those who have a different performance-related need: Web site scalability. You can reach a much larger audience by serving content from 100 devices spread across the Internet than from a single Web server. CDNs let a Web site scale to accommodate many times more users than would otherwise be possible, and they do so with the cost-efficiency of a shared infrastructure.
CDNs aren't the only way to reach more users—you could add more IIS servers instead. However, adding enough Web servers to meet your site's maximum-capacity requirements is an expensive proposition. In fact, the need to build to peak-capacity requirements is one of the greatest inefficiencies of large-scale Web sites. If your Web site requires 100 servers to meet its peak requirements, you must buy the servers and network capacity up front and pay the administrators to manage these resources year-round, even if you hit peak capacity only once in a year (or not at all). Most of the year, your Web servers will work far below maximum capacity, and precious computing and networking resources will be wasted.
CDN service providers eliminate this inefficiency by offering multiple Web sites a shared infrastructure. Companies purchase a subscription to a CDN, and the service provider distributes their Web sites to tens, hundreds, or thousands of different CDN cache nodes. A CDN cache node might serve pages for hundreds of different Web sites. Efficiency results because, in theory, the Web sites that share space on a CDN won't hit their peak-capacity requirements at the same time. A local news Web site might hit its peak utilization during local elections; an e-commerce site is at its busiest during the holiday shopping season; a movie site takes numerous hits near the movie release date; and an online tax service spikes in the days or weeks leading up to April 15. Because the peak requirements for these Web sites fall at different times of the year, the sites make more efficient use of processing power and network capacity than if they each relied on purely dedicated solutions.
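To make the efficiency argument concrete, the Python sketch below compares the capacity four dedicated sites would need (the sum of their individual peaks) with the capacity a shared CDN would need (the peak of their combined load). The monthly load figures are invented purely for illustration; they are not measurements from any real site or CDN.

```python
from typing import Dict, List

# Hypothetical monthly load (arbitrary units, January through December) for
# the four kinds of sites described above. These numbers are made up.
LOAD: Dict[str, List[int]] = {
    "local news": [10] * 10 + [80, 10],       # peak in November (elections)
    "e-commerce": [10] * 10 + [40, 90],       # holiday shopping season
    "movie":      [10] * 4 + [70] + [10] * 7, # release in May
    "tax":        [10, 10, 30, 90] + [10] * 8, # run-up to April 15
}


def dedicated_capacity(load: Dict[str, List[int]]) -> int:
    """Each site builds for its own peak, so the peaks add up."""
    return sum(max(months) for months in load.values())


def shared_capacity(load: Dict[str, List[int]]) -> int:
    """A shared infrastructure only needs the peak of the combined load."""
    combined = [sum(month) for month in zip(*load.values())]
    return max(combined)


if __name__ == "__main__":
    print(dedicated_capacity(LOAD), shared_capacity(LOAD))  # 330 140
```

Because the peaks fall in different months, the shared pool needs well under half the capacity of the dedicated builds in this made-up example. The savings shrink if the sites' peaks coincide, which is the risk the next paragraph describes.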
September 11, 2001, was an extraordinarily busy day for news Web sites. Fortunately, most of the major news organizations use CDNs. CDN service providers recognized the importance of getting the latest news to Internet users and dynamically increased news Web sites' resource allocation. Of course, allocating additional resources to news organizations meant resources were taken away from other types of sites. This situation illustrates both the key advantage and disadvantage of CDNs—you should have more capacity when you need it, but the possibility exists that the high demands of other Web sites will steal resources from you.
The bandwidth of your Internet connection is another factor that can limit your Web site's performance. Consider this example: Your Web site uses an average of 700Kbps of bandwidth on a T1 line to your ISP that can carry about 1500Kbps of traffic. However, you want the site to continue to function well during periods of peak utilization, in which traffic can soar to as much as 10 times the average utilization. To handle 7000Kbps of Web traffic, you could purchase more costly bandwidth from your ISP or have a CDN service provider cache your content to reduce traffic at your location. A CDN that cached 90 percent of your content would reduce the bandwidth requirement on your ISP connection to 70Kbps average and 700Kbps peak, making your current T1 connection sufficient.
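The arithmetic in this example is easy to check. The Python sketch below uses the figures from the paragraph above (700Kbps average, 10 times that at peak, a T1 treated as roughly 1500Kbps) and assumes that "caching 90 percent of your content" means the CDN serves 90 percent of the bytes your site delivers.

```python
T1_KBPS = 1500  # approximate T1 capacity used in the example


def origin_kbps(demand_kbps: float, cdn_share: float) -> float:
    """Bandwidth still crossing your own ISP link when the CDN serves
    cdn_share (0.0 to 1.0) of the bytes."""
    return demand_kbps * (1.0 - cdn_share)


if __name__ == "__main__":
    average_kbps = 700
    peak_kbps = average_kbps * 10  # 7000Kbps during spikes
    for label, demand in (("average", average_kbps), ("peak", peak_kbps)):
        without_cdn = origin_kbps(demand, 0.0)
        with_cdn = origin_kbps(demand, 0.90)
        print(f"{label}: {without_cdn:.0f}Kbps without a CDN, "
              f"{with_cdn:.0f}Kbps with one; fits on the T1: {with_cdn <= T1_KBPS}")
```

Without a CDN the 7000Kbps peak is more than four times what the T1 can carry; with 90 percent of the bytes served from cache nodes, even the peak drops to about 700Kbps.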
CDN pricing structures vary. Typically, CDN providers bill customers on a monthly basis. Customers commit to a minimum amount of bandwidth and are charged every month for this bandwidth, whether they use it or not. If they burst above this bandwidth during a given month, they're charged at the burst rate for that month. For example, if you consume an average of 3Mbps for 3 weeks of the month but burst to 10Mbps during the remaining week, your provider will charge you for 10Mbps for that month.
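The billing model described above reduces to taking the larger of your commitment and your highest burst. This Python sketch assumes usage is measured as one average per week, and the 3Mbps commitment in the usage example is an assumption because the example doesn't state one; real contracts measure usage in different ways.

```python
from typing import Sequence


def billable_mbps(commit_mbps: float, weekly_avg_mbps: Sequence[float]) -> float:
    """Rate billed for one month under the simple model described above.

    You always pay for the committed rate; if any measurement period bursts
    above it, the whole month is billed at the highest burst instead.
    """
    if not weekly_avg_mbps:
        return commit_mbps
    return max(commit_mbps, max(weekly_avg_mbps))


if __name__ == "__main__":
    # Hypothetical 3Mbps commitment: three quiet weeks, one 10Mbps week.
    print(billable_mbps(3, [3, 3, 3, 10]))  # 10
    # An unused commitment is still billed.
    print(billable_mbps(5, [2, 2, 2, 2]))   # 5
```

A single busy week sets the price for the whole month, so it's worth asking a provider exactly how it measures bursts before you sign.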
CDN Gotchas
Proxy servers and CDN cache nodes were originally designed to cache static content. As Web pages have become more complex, Web servers have begun to dynamically generate more content. All IIS administrators are familiar with Active Server Pages (ASP) applications, which return standard HTML to the end user but which IIS generates dynamically as the user requests them. Most commercial IIS sites have a Microsoft SQL Server database that ASP applications query for information such as personal preferences, product catalogs, and shipping information. IIS generates these types of pages independently for each incoming request, so today's CDNs can't cache them (although future CDNs will probably be able to).
Some CDN providers deceptively claim to cache dynamic content. What they might actually be doing is caching the output of ASP pages that don't change between requests. For example, a CDN can cache a page whose only ASP code is a server-side #include directive that pulls in another HTML file, because that page returns the same HTML to every user.
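Whether a cache node stores a given response is ultimately a policy decision made per response. The Python sketch below shows one plausible, simplified check a shared cache might apply, based on the HTTP/1.1 rule that shared caches must not store responses marked `no-store` or `private`. Caching only successful GETs and refusing responses that set a cookie are assumptions made for this sketch, not protocol requirements or any provider's documented behavior.

```python
from typing import Dict


def shared_cache_may_store(method: str, status: int, headers: Dict[str, str]) -> bool:
    """Rough check a shared cache (such as a CDN node) might apply
    before storing a response. Simplified for illustration."""
    if method.upper() != "GET" or status != 200:
        return False  # assumption: only successful GETs are cached here
    directives = {
        d.strip().split("=", 1)[0].lower()
        for d in headers.get("Cache-Control", "").split(",")
        if d.strip()
    }
    if directives & {"no-store", "private"}:
        return False  # HTTP/1.1: shared caches must not store these
    if "Set-Cookie" in headers:
        return False  # assumed policy: treat cookie-setting pages as per-user
    return True


if __name__ == "__main__":
    print(shared_cache_may_store("GET", 200, {"Content-Type": "text/html"}))  # True
    print(shared_cache_may_store("GET", 200, {"Cache-Control": "private"}))   # False
```

An ASP page whose output is the same for everyone passes a check like this; a page built from per-user database queries should be marked so that it doesn't.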
Before you write off CDNs, thinking that your Web pages are too dynamic to cache, consider that static images account for the majority of the bandwidth that the average Web site generates. Even if IIS dynamically generates a page, the images in the page are almost always static.
When evaluating CDN providers, make sure you understand their routing algorithm. Serving content from the CDN cache node closest to the end user isn't as cut and dried as it might seem. Imagine that you're in a big city with one-way streets, construction detours, and traffic jams, and you need to get to the nearest cash machine. In addition to a map of city streets and ATMs, you'd need up-to-the-minute traffic information. The Internet is far larger and more dynamic than even the largest city. The performance benefit a CDN can offer is only as good as the algorithm it uses to route users to the best cache node.
You should also make sure that your CDN provider supplies the usage reporting that you need. IIS can log all requests, and software such as Microsoft Commerce Server 2000 makes building reports on IIS logs easy. However, if you use a CDN, many user requests won't ever reach your Web servers. You can still track your users' behavior, but doing so is a little more difficult with a CDN. All the major CDN providers offer reporting and log-aggregation services, but you'll need to gather your log files from your CDN provider and correlate them with your own IIS logs.
The Future of CDNs
As I mentioned earlier, the original concept behind CDNs was to push simple, static Web site content out to the edges of the Internet. However, the modern Internet is neither simple nor static. Most popular Web sites include personalized content, shopping-cart capabilities, or search features. Because traditional CDNs can't cache dynamic content, they must forward all requests for such Web pages to the Web site.
In these situations, CDNs actually slow site performance by adding an unnecessary step.
Managed Edge Services, or edge computing, describes the next generation of CDNs, which are capable of dynamically generating some types of content at the edge of the Internet without sending a request to the host Web site. Personalization features are the first dynamic content type to be pushed to the edge of the Internet.
You've probably used a personalized Web site. If you've visited CNN.com, the site asked you the first time whether you wanted to visit the US, European, or Asian site. The site stored your answer in a cookie on your computer. Each subsequent time that you go to CNN.com, your browser sends the cookie to the site so that it can send you the correct regional version. Sending a particular page in response to a cookie is easy for a dynamic Web site but is a fairly recent advancement for CDNs.
Over the next few years, CDNs will add much more intelligence. You'll be able to browse an online catalog, purchase an item, and receive a receipt without the CDN sending your request on to the Web site. Transaction processing and database content will be on CDN nodes, and synchronization between the nodes and the Web site will happen transparently to users and Webmasters. Eventually, the need for an origin IIS server might be eliminated completely, and content authors will publish updates directly to the CDN.
Edge Side Includes (ESI) is the best-developed standard for dynamic CDN pages. For more information about ESI, visit http://www.esi.org.
Who Needs a CDN?
If you answer yes to any of these questions, you should investigate CDNs:
Does your site have extreme seasonal peaks in traffic?
Is your site prone to random bursts of traffic?
Is response time currently unacceptable?
Are you paying your ISP too much for bandwidth?
Do you lack a plan for scaling your site if your traffic increases?
For a list of CDN service providers, visit http://www.webreference.com/internet/software/site_management/cdns.html. For more general information about CDNs, visit http://www.caching.com.
CDNs aren't for everyone, but for many sites, they're a necessity. Some companies use them to improve their site's performance so that visitors will keep coming back to their Web site. The additional revenue they receive from customers more than offsets the CDN subscription. Other companies can justify a CDN because the CDN bandwidth charges are lower than those of their hosting service provider. CDNs can reduce the number of IIS servers a site needs, which offsets hardware, software, and administrative costs.
Understanding CDNs is important because they can save you some cash. More important, if your site suddenly gets busy, they can save your reputation.