The HTML Text Filter

Ken Spencer introduces you to a Windows 2000 Resource Kit tool that can improve performance without losing maintainability--the HTML Text Filter.

Ken Spencer

September 26, 2000


Improve performance without losing maintainability

Many Web sites and applications are going through a performance crisis because they either weren't designed to performance standards or weren't tested during the development and deployment cycles. Developers and systems administrators are trying to make sites and applications perform better. However, optimizing an application or site usually makes it harder to maintain.

This dilemma is particularly troublesome for Web applications and sites because the application or site almost always produces HTML code that must travel over a network connection to the requesting browser. Any increase in the HTML file's size results in increased network traffic, which might translate into slower response time for the application's users. This problem is especially evident when users access the site through a slower-speed analog modem connection.
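The relationship between file size and response time can be sketched with some simple arithmetic. The page sizes below are hypothetical, chosen only for illustration, and the calculation assumes an idealized 56Kbps modem link with no latency or protocol overhead, so real transfers will take longer.

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: int) -> float:
    """Idealized time to send size_bytes over a link.

    Ignores latency, compression, and protocol overhead.
    """
    return size_bytes * 8 / link_bits_per_second


# Hypothetical page sizes, not measurements from a real site.
original_size = 42_000  # bytes: page with comments and indentation
trimmed_size = 35_000   # bytes: same page with comments and white space removed

modem_speed = 56_000    # bits per second: nominal analog modem rate

for label, size in (("original", original_size), ("trimmed", trimmed_size)):
    print(f"{label}: {transfer_seconds(size, modem_speed):.1f} s")
# original: 6.0 s
# trimmed: 5.0 s
```

Under these assumptions, every 7,000 bytes trimmed from a page saves about one second per request on the modem link.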

You need to weigh this need for speed against the need for maintenance. To assist in application maintenance, developers insert comments into their code. For HTML code, developers insert comment blocks enclosed in the <!-- and --> delimiters.

The browser recognizes the comment tag (<!-- ... -->) and doesn't display the text inside it. This lets developers insert notes or other text into a page to describe what the HTML code does. (This comment syntax is the HTML 3.2 format. Developers can also use other formats in various languages.)

Developers and designers also use white space in an HTML application to improve the code's legibility. Because white space breaks the code into sections, the space lets a developer or designer quickly grasp the code's meaning. Design tools such as Microsoft FrontPage also insert formatting into the HTML they produce to make the code more readable.

The Microsoft Windows 2000 Resource Kit includes the HTML Text Filter tool, which you can use to automatically remove comments and extra white space from .htm, .html, and .asp files. This automation makes it easy to shrink a file before it goes into production; the resulting smaller file travels to the browser faster.
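To make the idea concrete, here is a minimal Python sketch of this kind of filtering. It is not the Resource Kit tool's algorithm, whose exact rules aren't described here; the function name and sample page are hypothetical. The sketch assumes that every comment can go and that leading and trailing white space is never significant, which isn't true for content such as <pre> blocks or scripts that contain comment-like text.

```python
import re

# Matches an HTML comment, including comments that span several lines.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments_and_whitespace(html: str) -> str:
    """Remove HTML comments, trim each line, and drop blank lines."""
    without_comments = COMMENT.sub("", html)
    lines = (line.strip() for line in without_comments.splitlines())
    return "\n".join(line for line in lines if line)


# Hypothetical sample page, used only for illustration.
sample = """<html>
  <!-- Page header: shows the site title -->
  <body>

    <p>Hello</p>
  </body>
</html>
"""

filtered = strip_comments_and_whitespace(sample)
print(filtered)
print(len(sample), "->", len(filtered), "characters")
```

As with the real tool, it makes sense to run a filter like this on a copy of the file destined for production and to keep the commented, indented original as the version you maintain.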

To test this tool, I created the simple .htm code (commenttester.htm) that Listing 1 shows. To execute the tool against this test file, use this syntax:

C:\>htmlfltr commenttester.htm

Demonstrating the Filter
Figure 1 shows the execution of this command and its results (i.e., that it compressed one file). When the tool finished processing, the file looked like the file in Listing 2.

Notice that the HTML in Listing 2 has a new starting tag . The filter inserted this tag to show that the tool filtered the text. Also, notice the structure of the HTML in Listing 2: The HTML is not as easy to scan as the HTML in Listing 1 because the filter removed most of the white space and squeezed the tags together. For example, the , , and tags are all on the same line. Reducing the amount of white space reduces the size of the file but makes the HTML much harder to read. This sample file doesn't represent the complexity of a file with hundreds of lines of code. The lack of white space becomes more important with larger files because HTML's complexity makes them much harder to read.

One other thing has changed from Listing 1 to Listing 2. The comment block after the tag is gone. However, the comment tag embedded in the
