Smart SEO
(March 2009 Issue)
October 30, 2009
asp:Feature
LANGUAGES: C# | VB.NET
ASP.NET VERSIONS: ALL
URL Rewriting Using ASP.NET and IIS 6.0
By Jeffrey Hasan
Search Engine Optimization (SEO) has become the major driver for Uniform Resource Locator (URL) rewriting in Web sites. A URL is the online discoverable address for a Web resource, including Web pages and Web-hosted documents and images. For simplicity, this article focuses on Web pages as the Web resource of primary concern.
In the world of online search engines, page rank and link equity are measures of how relevant and popular a specific Web page is for a certain search request. High page rank is what enables your Web page to appear on the first page of Google search results, instead of the thirtieth. In the ultra-competitive world of online commerce, this difference is significant, as Web site visitors are most likely to select their destination from the first or second page of results.
With SEO, the Web site becomes less important than a specific Web page, because search engines deconstruct your Web site into relevant pages. High page rank and link equity cannot be achieved overnight; instead, they must be earned over a period of time. And once earned, they must be preserved, which means your Web page URLs cannot change. Otherwise, you must start rebuilding the page rank and link equity from the ground up.
URL rewriting enables your Web application to support SEO-friendly URLs and logical site navigation, while using a different physical navigation structure in the backend. URL rewriting also allows you to make necessary adjustments to your Web site navigation structure without compromising your search engine rankings.
Note: This article focuses on URL rewriting using IIS 6.0 and all versions of ASP.NET. The newer IIS 7.0 and ASP.NET 3.5 integrate differently and provide different options for URL rewriting. This will be the topic of a companion article to appear in a future issue of asp.netPRO.
What Is URL Rewriting?
URL rewriting involves intercepting an incoming HTTP request, then remapping the request to an alternate URL. URL rewriting can be performed at the Web server level or at the Web application level. At the Web server level, the URL rewriting is handled directly by Internet Information Services (IIS), in the case of Windows server platforms. At the Web application level, the URL rewriting may be handled at various points within the application execution workflow. Your choice of intercept point depends on the nature of the URL rewriting and the type of Web resource being requested.
What Is an SEO-friendly URL?
SEO-friendly URLs avoid using querystring parameters and use qualified path formats instead. Here's an example of an SEO-friendly (static) URL for an HP printer on an e-commerce site:
http://www.jeffshop.com/printers/hp/P1006.aspx
Here's an alternate (dynamic) URL for the same product, but constructed in an unfriendly way for SEO:
http://www.jeffshop.com/printers.aspx?mfr=hp&model=p1006
The contrast between the two URLs is clear: the friendly (static) URL has a simple path structure, whereas the unfriendly (dynamic) URL uses multiple querystring parameters. Search engines interpret unique combinations of query parameters as different Web pages, which dilutes the relevance of the printers destination node within the e-commerce site.
Even better from an SEO perspective is to load up the product URL with rich keywords, as in:
http://www.jeffshop.com/forsale/hp-laserjet-p1006-printer-monochrome-17-ppm/876543.html
This URL format preserves the SEO-friendly simple qualified path structure, and adds a keyword-rich sub-path section to the URL. In this example, 876543.html could correspond to a product identifier, such as a stock keeping unit (SKU).
Finally, you could eliminate the .html Web page file extension and publish a simplified SEO-friendly URL, as in:
http://www.jeffshop.com/forsale/876543/hp-laserjet-p1006-printer-monochrome-17-ppm/
In this example the product SKU is part of the URL, but is not called out as such. The Web server is instructed how to interpret this sub portion of the URL, and could in fact ignore everything that comes after the product SKU portion of the URL. In effect, the Web server considers the relevant URL to be:
http://www.jeffshop.com/forsale/876543/
In reality, this URL doesn't have to physically exist on the Web server. In fact, SEO-friendly URLs typically do not, being instead logical constructs that are tailored for the benefit of the search engine and the site visitor rather than the application developer.
Now you can see the need for a URL bridge mechanism that rewrites logical URLs into their associated physical URLs. This URL bridge will interpret incoming SEO-friendly logical URLs and map them on the Web server to the actual physical resource address. In the above example, the Web server could process the incoming SEO-friendly product URL, then map the request to the actual product page URL, as in:
http://www.jeffshop.com/catalog.aspx?sku=876543
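The mapping from the logical URL to the physical one can be sketched in a few lines of C#. This is only an illustration under stated assumptions: the class name ProductUrlMapper and the method ToPhysicalPath are hypothetical, the SKU is assumed to be numeric, and the keyword portion after the SKU is simply ignored, as described above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of ASP.NET: maps a logical product path
// such as /forsale/876543/some-keywords/ to /catalog.aspx?sku=876543.
public static class ProductUrlMapper
{
    // /forsale/<numeric SKU>, optionally followed by a slash and any keyword text.
    private static readonly Regex LogicalUrl = new Regex(
        @"^/forsale/(?<sku>[0-9]+)(?:/.*)?$",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns the physical path, or null when the path is not a logical product URL.
    public static string ToPhysicalPath(string logicalPath)
    {
        if (logicalPath == null) return null;

        Match match = LogicalUrl.Match(logicalPath);
        return match.Success
            ? "/catalog.aspx?sku=" + match.Groups["sku"].Value
            : null;
    }
}
```

Calling ProductUrlMapper.ToPhysicalPath with the path portion of the keyword-rich URL above yields the catalog.aspx path with the sku parameter. Paths that do not follow the logical format return null, so the caller can let those requests proceed unchanged.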
How IIS 6.0 and ASP.NET Process HTTP Requests
Remember when you were first learning ASP.NET, and you skipped over the section that described the processing workflow for incoming HTTP requests? And how the ASP.NET engine processes the Web page and returns the HTTP response output via the Web server? Right, that section. Well, it's time to go back to the textbook, because you need to know how HTTP requests are processed by the Web server and the ASP.NET engine. Only then can you understand where HTTP requests can be intercepted, should you need to rewrite the requested URL and redirect the site visitor to an alternate location.
Grab a textbook if you have one handy; otherwise, refer to Figure 1, which summarizes the important steps in processing incoming HTTP requests and returning HTTP responses.
Figure 1: The ASP.NET HTTP pipeline.
Briefly, here's how it works. The client browses a Web page by typing a URL into their browser. The browser relays the HTTP request to the Web server over the Internet. The Web server, in turn, parses the URL to determine the type of resource being requested, whether it is a Web page, an image, or some other type of file.
Let's assume the client requests an ASP.NET page, with a *.aspx extension. The Web server, in this case IIS, will direct the HTTP request to the ASP.NET engine, alternatively known as the ASP.NET HTTP pipeline. The HTTP request passes through a variable number of HTTP modules, which include standard modules (e.g., for session state management and Forms Authentication), plus any custom modules that the Web site developer has installed in the pipeline. After passing through the sequence of HTTP modules, the HTTP request reaches the dedicated HTTP handler for *.aspx Web pages (other Web resource types, such as *.asmx, have their own dedicated handlers). The HTTP handler executes the compiled code for the Web page that was requested by the client. Finally, an HTTP response is returned to the client through the ASP.NET engine and out of the Web server. The HTTP response carries an HTTP status code, such as:
HTTP 200. The requested resource was found at the URL location and a response has been successfully returned.
HTTP 301. The requested resource has been permanently redirected to an alternate URL location and a response has been successfully returned.
HTTP 404. The requested resource was not found. Web users know this status code by the common Page Not Found default status page.
From a search engine perspective, these are the most significant of the various HTTP status codes, and the implications are noteworthy. A search engine spider successfully indexes HTTP 200 and 301 responses, and for the latter it will permanently record the new URL location for the Web resource and migrate the previous Web resource's page rank and history. However, for HTTP 404 responses the search engine spider will drop the Web resource from its index, thereby losing the page rank and link equity that this Web resource formerly had. This is why you should always redirect users from old pages to new ones, rather than present them with a Page Not Found error. URL redirection allows you to preserve hard-earned page rank and link equity, and so is essential for any SEO-focused Web site.
URL Rewriting in ASP.NET and IIS 6.0
Before you can rewrite a URL you must intercept the incoming HTTP request and inspect it. Let's assume that the Web site visitor has requested an ASP.NET Web page on your Web site. There are five appropriate locations to intercept HTTP requests within the Web server and ASP.NET HTTP pipeline. These are shown in Figure 1 by numbered location; they are:
1) Within IIS using an ISAPI filter, which is a plug-in that extends the functionality of IIS.
2) Using a custom HTTP handler to replace the default ASP.NET engine's *.aspx handler.
3) Using a custom HTTP module to intercept and inspect the HTTP request before it reaches the ASP.NET handler.
4) Within the hosting Web site's Global.asax application file (e.g., in the Application_BeginRequest event handler).
5) Within the compiled Web page code (e.g., in the Page_Load event handler).
Of these five options, we'll exclude looking at ISAPI filters and custom HTTP handlers, which are the first and second options, respectively. ISAPI filters are beyond the scope of our focus. Custom HTTP handlers are also out of scope, mainly because the URL rewriting modifications that we could make there can mostly be made in other code constructs instead, such as within HTTP modules.
Let's examine each of the remaining three options, and expand on how URL rewrites are accomplished using ASP.NET and IIS.
URL Rewriting in the Web Application's Global.asax File
The Global.asax file is an optional file that resides within the Web application, and which provides programmatic access to application lifecycle events and the HTTP request and response messages. The Global.asax file is like a simplified HTTP module, one that cannot be reused across multiple applications, but which is easier to use because it requires no special configuration within the Web application. With the Global.asax file you can simply start coding into application and session-level event handlers. The IIS Web server rejects any direct HTTP request made for this file. In short, the Global.asax file lets you stay focused on code and not be concerned about deployment issues.
Figure 2 illustrates a redirect scenario, where the site navigation for a product Web site has been redesigned. Originally, the Web site served product pages from a subdirectory named forsale. However, the site navigation has been flattened so that the product pages have all been moved to the root of the application. Solution Explorer contains two pages named 876543.aspx, one under the Web root (the active product page), the other under the forsale subdirectory (the obsolete product page). The Web site has built a high page ranking for the product page under the forsale subdirectory, so they want site visitors to continue to access this subdirectory, but these incoming requests must be redirected on the server to the corresponding page in the Web root.
Figure 2: URL rewriting for an obsolete Web page.
Figure 3 provides the code listing for the Global.asax Application_BeginRequest event handler, which runs at the start of every request (Application_Start would run only once, when the application starts, so it cannot intercept each request). The code first looks for HTTP request URLs that contain the forsale subdirectory. If none are found, the request is allowed to execute normally. However, if a match is found, the code uses a regular expression to extract the Web page name and rewrite the URL so this page is served from the Web root instead of from the forsale subdirectory.
// Requires <%@ Import Namespace="System.Text.RegularExpressions" %> in Global.asax
void Application_BeginRequest(object sender, EventArgs e)
{
    // Code that runs at the start of every request
    string requestPage;
    string requestURL = Context.Request.Url.ToString().ToLower();
    if (requestURL.Contains("/forsale/"))
    {
        // Capture the numeric page name, e.g. 876543 from 876543.aspx
        Regex regex = new Regex(@"([0-9]+)\.aspx", RegexOptions.IgnoreCase);
        MatchCollection matches = regex.Matches(requestURL);
        if (matches.Count > 0)
        {
            // Rewrite the URL
            requestPage = matches[0].Groups[1].ToString();
            Context.RewritePath("../" + requestPage + ".aspx");
            // Set HTTP Status 301 - Permanent Redirection
            Context.Response.Status = "301 Moved Permanently";
            Context.Response.AddHeader("Location", Context.Request.Url.AbsoluteUri);
        }
    }
}
Figure 3: Using Context.RewritePath in Global.asax.
After the URL is rewritten, the code manually updates the HTTP headers to record an updated HTTP status code 301 (permanent redirection) and the new fully qualified location of the replacement page. Without this update, the new page would have rendered, but the browser URL would have displayed the old URL and the HTTP status code would have read 200. This is adequate for people visiting the site, but not for search engine spiders, which need to record permanent redirections. Recall that this is the only way for the search engine spider to transfer to the replacement page the page ranking and link equity of the previous page.
Note that you should not use Response.Redirect in place of the RewritePath method, because it issues an HTTP status code 302 back to the client, which indicates a temporary redirect. This status code is confusing to search engine spiders: they will not drop the original URL from their index, but they may also record the destination URL. If the original and destination URLs appear very different, or if they are in a different domain, you risk having your site dropped, and with it all the link equity that you've earned. This is because less reputable sites make it a practice to guide you to one Web site and then redirect you to a completely different one to which you did not intend to go. So to be safe, never issue an HTTP status code 302, either directly or indirectly. Always use HTTP status code 301 to indicate permanent redirection.
URL Rewriting in an ASP.NET Web Page
From an SEO perspective, a URL rewrite within an ASP.NET page occurs too late in the process lifecycle, because the HTTP request has been allowed to get all the way to the destination page without interception. It is unlikely that you would want to rewrite a URL at this late stage. For one thing, the original destination Web page is already rendering, which defeats the purpose of redirecting the visitor somewhere else: it would be more efficient to redirect them earlier, if that's what you intend to do.
An SEO-driven URL rewrite and redirect is not about intelligence, it is about rules. It is possible that an ASP.NET Web page code-behind file contains some decision logic that is needed for a URL redirect. But this would typically be a different kind of URL redirect, not related to an SEO-driven URL rewrite. In short, do not use the ASP.NET Web page code-behind file for SEO-driven URL rewrites, although feel free to use this location for other types of URL redirects.
URL Rewriting Using an HTTP Module
HTTP modules are assemblies that plug in to the ASP.NET HTTP request pipeline. They intercept all incoming HTTP requests to a Web application and contain various event handlers that can be used to access the details of the HTTP request and modify it, if necessary. Custom HTTP modules can be developed in .NET, deployed on the Web server, and configured for use by registering them in a Web application's configuration file (Web.config).
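As a rough illustration of what a hand-written module involves, the following sketch handles the obsolete forsale scenario from Figure 2 by answering with an HTTP 301 at the start of the request. It is an assumption-laden example rather than the article's download code: the class name LegacyProductRedirectModule and the helper MapLegacyPath are hypothetical, the application is assumed to sit at the site root, and the 301 is sent directly instead of being combined with RewritePath.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Web;

// Hypothetical module: permanently redirects /forsale/<number>.aspx
// to /<number>.aspx. Assumes the application runs at the site root.
public class LegacyProductRedirectModule : IHttpModule
{
    private static readonly Regex LegacyPage = new Regex(
        @"^/forsale/([0-9]+)\.aspx$", RegexOptions.IgnoreCase);

    public void Init(HttpApplication application)
    {
        // Inspect every request before it reaches the *.aspx handler.
        application.BeginRequest += new EventHandler(OnBeginRequest);
    }

    public void Dispose() { }

    // Returns the root-level path for an obsolete path, or null if no rule applies.
    public static string MapLegacyPath(string path)
    {
        if (path == null) return null;
        Match match = LegacyPage.Match(path);
        return match.Success ? "/" + match.Groups[1].Value + ".aspx" : null;
    }

    private static void OnBeginRequest(object sender, EventArgs e)
    {
        HttpContext context = ((HttpApplication)sender).Context;
        string newPath = MapLegacyPath(context.Request.Path);
        if (newPath == null) return;  // not an obsolete URL; continue normally

        // Build the fully qualified URL for the Location header.
        string location = new Uri(context.Request.Url, newPath).AbsoluteUri;
        context.Response.StatusCode = 301;
        context.Response.StatusDescription = "Moved Permanently";
        context.Response.AddHeader("Location", location);
        context.Response.End();       // stop processing; the old page never renders
    }
}
```

Under IIS 6.0, a module like this would be registered in the httpModules section of Web.config. Keeping the path mapping in a static method separates the rule from the pipeline plumbing, so the rule can be checked without a running Web server.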
HTTP modules are straightforward to write, but the URL redirection logic can be tedious. If you don't want to write your own URL rewriting infrastructure code, I recommend using an open source module called UrlRewriter.NET, which allows you to configure your custom URL rewriting rules directly within the Web application's Web.config file using regular expression pattern-matching syntax. Not only can you encode rules, but you can add conditional statements to separate out rule groups based on the incoming request type.
Figure 4 shows an example of a Web.config file for an application that uses UrlRewriter.NET with a regular expression pattern-matching rule to map a keyword-rich product URL to a querystring formatted URL that is more suitable for generic database lookups. In this case, the rewrite is from:
http://www.jeffshop.com/forsale/876543.aspx
to:
http://www.jeffshop.com/catalog.aspx?sku=876543
to="^/catalog.aspx?sku=$1" />
Figure 4: Rewrite a URL using UrlRewriter.NET.
The code listing in Figure 4 rewrites the HTTP request to a different URL, but the HTTP status code in the response is returned as 200, which is not a redirect code. This is OK if your Web site visitor is a person, but for search engine spiders you'll want to return HTTP status code 301 to inform the search engine to record the permanent redirect, and transfer page rank and link equity from the previous URL. In this case you should use the <redirect> configuration element instead of the <rewrite> element. The <redirect> element returns the permanent redirect HTTP status code 301 by default.
UrlRewriter.NET is not the only available module, but it is currently one of the more popular ones, and has received good coverage in books and blog postings. In addition, being an open source project, the source code is freely available on http://SourceForge.net for you to browse and modify. You can download UrlRewriter.NET from http://urlrewriter.net/.
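The difference between a rewrite rule and a redirect rule can be sketched with a tiny rule engine. This is a hypothetical illustration of the general technique, not UrlRewriter.NET's implementation or API: the types UrlRule, RuleAction, and UrlRuleEngine are invented for the example, and the status code is simply what the client would see in each case.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical types, for illustration only.
public enum RuleAction { Rewrite, PermanentRedirect }

public sealed class UrlRule
{
    public readonly Regex Pattern;
    public readonly string Target;      // may contain $1, $2, ... substitutions
    public readonly RuleAction Action;

    public UrlRule(string pattern, string target, RuleAction action)
    {
        Pattern = new Regex(pattern, RegexOptions.IgnoreCase);
        Target = target;
        Action = action;
    }
}

public static class UrlRuleEngine
{
    // Applies the first matching rule and returns false when none matches.
    // statusCode is 200 for an internal rewrite, 301 for a permanent redirect.
    public static bool TryApply(UrlRule[] rules, string path,
                                out string target, out int statusCode)
    {
        foreach (UrlRule rule in rules)
        {
            Match match = rule.Pattern.Match(path);
            if (!match.Success) continue;

            target = match.Result(rule.Target);  // expands $1 from the match
            statusCode = rule.Action == RuleAction.PermanentRedirect ? 301 : 200;
            return true;
        }

        target = null;
        statusCode = 0;
        return false;
    }
}
```

A rule with the pattern ^/forsale/([0-9]+)\.aspx$ and the target /catalog.aspx?sku=$1 maps the Figure 4 request to the catalog page. Whether the client sees a 200 or a 301 then depends only on the rule's action, which mirrors the choice between the two configuration elements.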
Conclusion
SEO-driven URL rewrites are essential to preserving hard-earned page rank and link equity for Web pages. The leading search engines assign higher rankings to Web pages that support keyword-rich URLs; have many incoming links from reputable sites; and have longevity. URL rewrites allow you to preserve hard-earned search engine rankings even if your site navigation needs to change. IIS and ASP.NET provide good technical support for URL rewrites, especially in combination with popular third-party modules.
Source code accompanying this article is available for download.
Jeffrey Hasan, MCSD, is Senior VP of Strategic Consulting Services at Axis Technical Group, Inc. (http://www.axistechnical.com). He has been a professional systems architect and developer for 12 years. His work focuses on enterprise integration, business intelligence, data warehouses, and workflow-driven portals using SharePoint. Jeff has authored several .NET books, including Expert Service Oriented Architecture in C# (Apress, 2006).