Index Web Content with Site Server 3.0

Site Server 3.0's indexing features let you control how your Web server catalogs and searches your Web content.

T.J. Harty

November 30, 1998


Customize your search engine to fit your needs

Last month, I introduced you to Site Server 3.0 and described one of the software's most powerful features: the Microsoft Site Server 3.0 Publishing Solution (see "Publish Web Content with Site Server 3.0," November 1998). This month, I'll describe Site Server 3.0's other important feature, the Microsoft Site Server 3.0 Knowledge Management Solution, and show you how to build general catalogs and database catalogs for indexing and searching. (For an overview of Site Server, see Paula Sharick, "Site Server 2.0," June 1998.) The Knowledge Management Solution offers all the functionality of Microsoft's Index Server and a whole lot more.

I first wrote about Index Server 1.1, Microsoft's add-on indexing and search engine for Internet Information Server (IIS), in January 1997 (see "Microsoft's Index Server"). Since then, Microsoft has added Index Server 2.0 to IIS 4.0. In Site Server 3.0, Index Server takes another giant step forward. Whereas Index Server 2.0 indexed files on one computer, Site Server 3.0 indexes multiple Web sites, following various Web links. In addition, Site Server can index, catalog, and search content from Web pages, Microsoft Office documents, text documents, Open Database Connectivity (ODBC) databases, and Microsoft Exchange public folders; hand off the process of building catalogs to another server; and search across multiple catalogs with one query. Together, these features let you index more types of content than ever before and give you better control over the resources that content indexing and searching require.

Getting Started
Before you can index or search your Web content, you must build a catalog of your Web documents within the Knowledge Management Solution. A catalog contains information about each document (e.g., title and subject), a link to each document, and a full-text index of each document.
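To make the three parts of a catalog concrete, here is a minimal Python sketch. It is only an illustration, not Site Server's actual storage format: the Catalog class, its method names, and the sample documents are all hypothetical.

```python
from dataclasses import dataclass, field
import re


@dataclass
class Catalog:
    """Toy catalog: per-document properties, a link, and a full-text index."""
    properties: dict = field(default_factory=dict)  # link -> {"title": ..., "subject": ...}
    index: dict = field(default_factory=dict)       # lowercase word -> set of links

    def add(self, link, title, subject, text):
        # Store the document's properties under its link.
        self.properties[link] = {"title": title, "subject": subject}
        # Add every word of the text to the full-text index.
        for word in re.findall(r"\w+", text.lower()):
            self.index.setdefault(word, set()).add(link)

    def search(self, word):
        # Return the links of documents whose text contains the word.
        return sorted(self.index.get(word.lower(), set()))


# Hypothetical sample documents.
catalog = Catalog()
catalog.add("http://servername/a.html", "Index Server", "Search",
            "Index Server catalogs Web content")
catalog.add("http://servername/b.html", "Publishing", "Content",
            "Publish Web content")
```

Calling catalog.search("content") returns both links, whereas catalog.search("catalogs") returns only the first.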

Building a general catalog. To create a catalog, open the Site Server Microsoft Management Console (MMC) snap-in, double-click Search, and double-click your Web server's name. You will see two folders, Catalog Build Server and Search Server, as Screen 1 shows. Catalog Build Server contains the information that Site Server 3.0 needs to build your catalogs. Search Server contains the completed catalogs ready for searching.

To start the catalog building process, right-click Catalog Build Server, and select New Catalog with a Wizard. Enter a name for the new catalog, and click Next. Then, decide what kind of crawl (Web link crawl, File crawl, or Exchange crawl) you want to perform. A crawl consists of going through the content files and indexing them. The next screen asks you to provide the directory you want Site Server to crawl. You can enter a fully qualified path (e.g., c:\server\directory), or you can enter a Uniform Naming Convention (UNC) name (e.g., \\servername\sharename). Unfortunately, you can't browse to the directory you want. Then, select the Search Host (i.e., the server or servers that you want to host the completed catalog). You can place the catalog on one or many servers. After you select the Search Host, click Start build now to build the catalog.
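Because you have to type the start address by hand, it helps to know what the two accepted forms look like. The following Python sketch uses the standard pathlib module to tell them apart. The function name is hypothetical, and the check is only an illustration; it says nothing about how the wizard itself validates its input.

```python
from pathlib import PureWindowsPath


def describe_start_address(address):
    """Classify a crawl start address as a UNC share, a local path, or neither."""
    path = PureWindowsPath(address)
    if path.drive.startswith("\\\\"):
        # A UNC name: the "drive" is the \\server\share part.
        return "UNC"
    if path.drive and path.root:
        # A fully qualified local path: drive letter plus root backslash.
        return "local"
    return "unknown"
```

With this sketch, a UNC name has the form \\servername\sharename, and a fully qualified path has a drive letter followed by a backslash, as in c:\server\directory. A relative name such as directory falls into neither category.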

Site Server 3.0 will earn big points with many Web developers because the software typically lets you test your catalog as soon as Site Server finishes indexing. You can use the MMC to see whether the software is properly building your catalog and whether you're getting the results you expect. Double-click the Search Server folder, and double-click the name of the catalog you want to test. You might have to press F5 to refresh the screen. Select the Search page listed under your catalog name, and the MMC will display a Web page in the frame on the right. From this page, you can type any word or phrase you want to search on, and you'll see any documents the search returns.

Building a database catalog. Building a catalog to index databases is slightly different from building other types of catalogs. For starters, you use WebAdmin, Site Server 3.0's browser-based administration tool. Speaking from experience, I can tell you that building a database catalog never works on the first try. Every time I created this type of catalog, it timed out and failed on the first attempt but worked on subsequent attempts. I later found out that the default timeout value for building catalogs was too low.

To avoid this problem on complicated indexes that take a long time to catalog, you can go to the IIS folder in the MMC, right-click your Web server, and select Properties. Select the Home Directory tab, and click Configuration. Go to the App Options tab, where you'll see the ASP Script timeout field. You can increase this value to provide more time for the catalog to complete (e.g., increase the value to 300 seconds to allow 5 minutes). The downside to this fix is that IIS applies this timeout value to all your IIS scripts, and 5 minutes is a long default for every script. Alternatively, you can go to the microsoft site server\siteserver\admin\knowledge\search directory and open dbwiz6.asp. You then input

Server.ScriptTimeout = 300

inside any of the <% %> script blocks. This setting tells the server to give that one Site Server template 5 minutes before it times out.

To create an ODBC catalog, open the Site Server Search administration area at http://localhost/SiteServer/Admin/Knowledge/Search/default.asp. Start by defining the catalog. Click Catalog Build Definitions. Screen 2 shows all the available options. You can manage and create any general catalog from this page (note that you can create ODBC database catalogs only from this page).

Click Create, and select A database as the type of document you want to catalog. If you've already set up a data source, select it from the drop-down box of data sources available on the server. If you haven't set up your data source, you need to do so; otherwise, you can't connect to your database server. Select the data source you want to use, enter a new catalog name, and provide a username and password so that you'll have the proper permissions to access the table. If you have the proper permissions, the next screen lists all the tables that exist within your data source.

The next step involves testing how well your index and searches work. I used the recently released Windows NT Magazine article index database, which contains all Windows NT Magazine articles and other Web content. Because the database is relational, Site Server can cross-reference other materials that relate to the information a user searches on. For Site Server to cross-reference these materials, I needed to index the database's articles, which is the information users request most often on the Windows NT Magazine Web site.

After I selected the Articles table, I had to decide which content column and primary key I wanted to use for reference. I selected the Abstract field as the column and the ArticleID as my primary key. Site Server provides a sophisticated default output, so you can also tell the software which column to use for the cross-reference hyperlink; I used the Title (i.e., article title) column.

After I configured these settings, the next screen let me determine which columns the search engine searches and retrieves information from. If you mark a column as searchable, users can search on that field after Site Server indexes the column. If you mark a field as retrievable, Site Server makes that field content available for display after a user performs a search. I marked the ArticleID and IssueID fields as retrievable. Finally, I clicked the Build the databases catalog now box. Screen 3 shows the search page I used to test my new database catalog, and Screen 4 shows the details page after I selected a result from the search page.
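The difference between searchable and retrievable columns is easy to model. In the Python sketch below, the function name and the sample rows are hypothetical, and treating Title and Abstract as the searchable columns is an assumption for illustration; only the column names and the retrievable choices (ArticleID and IssueID) come from the setup described above.

```python
def search_rows(rows, term, searchable, retrievable):
    """Match term against searchable columns; return only retrievable columns."""
    term = term.lower()
    hits = []
    for row in rows:
        # A row matches if any searchable column contains the term.
        if any(term in str(row[col]).lower() for col in searchable):
            # Only retrievable columns are available for display.
            hits.append({col: row[col] for col in retrievable})
    return hits


# Hypothetical sample rows modeled on the Articles table.
rows = [
    {"ArticleID": 1, "IssueID": 10, "Title": "Index Server",
     "Abstract": "Indexing with IIS"},
    {"ArticleID": 2, "IssueID": 11, "Title": "Site Server",
     "Abstract": "Publishing Web content"},
]
hits = search_rows(rows, "indexing", searchable=["Title", "Abstract"],
                   retrievable=["ArticleID", "IssueID"])
```

Here hits holds the ArticleID and IssueID of the first row only. The Title and Abstract columns helped find the row, but they aren't part of what comes back.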

On the Windows NT Magazine Web site, I used Allaire's Cold Fusion Application Server rather than Active Server Pages (ASP) to provide Web-to-database connectivity. If your enterprise is similar, you'll appreciate how easily you can migrate the Site Server Search summary ASP output to your Cold Fusion templates to process and manipulate the data further. When you create your catalog, remember to select the check box for your identity field (i.e., the field that makes each record unique) during the build process, and you'll have all that you need to finish the pages. Screen 5 shows the Site Server 3.0 results page after you build the catalog. I made a few formatting changes to make this page easier to read.

I wanted to link the results from Screen 5 to the Article Index pages. When I created the catalog, I marked the ArticleID and IssueID fields as retrievable. I needed the values in these fields to drive the queries on my article pages. So I opened the results.asp page in the microsoft site server\siteserver\knowledge\search\databasesearch\articles directory and changed the link URL. The original value, <% = RS("DocAddress") %>, points to the view.asp page in the same directory with the ID value appended to it. I replaced it with the location of the Cold Fusion page that I already use on the site. The new URL is http://servername/template.cfm?IssueID=<% = RS("IssueID") %>&ArticleID=<% = RS("ArticleID") %>.
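The link rewrite amounts to building a query string from the two retrievable fields. This Python sketch shows the same construction outside ASP. The function name and the ID values are hypothetical, and the sketch doesn't reproduce the results.asp code itself.

```python
from urllib.parse import urlencode


def article_url(base, issue_id, article_id):
    """Build the template.cfm link from the two retrievable fields."""
    # urlencode escapes the values and joins the pairs with "&".
    return base + "?" + urlencode({"IssueID": issue_id, "ArticleID": article_id})


# Hypothetical ID values for illustration.
url = article_url("http://servername/template.cfm", 11, 42)
```

The result is http://servername/template.cfm?IssueID=11&ArticleID=42, which has the same shape as the URL that the edited results page emits for each search hit.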

Indexing and Cataloging a Remote Web Site
After I set up my site to index static files and an ODBC database, I wanted to index a site on another machine and then propagate the catalog back to the indexed server. To test this process, I indexed the Windows NT Magazine Web site from my home office. I wanted to create and manage the process from my home server, but I didn't want my home server to perform the actual indexing. Two servers (LiveServer and SearchServer) in my work office run Site Server 3.0. My objective was to index LiveServer from SearchServer and make the index searchable from LiveServer.

To start, I ensured I had proper permissions set on SearchServer. I opened the MMC, double-clicked SearchServer, right-clicked Catalog Build Server, selected Properties, and selected the Accounts tab. I made sure that both the Administrative access account and the Default content access account had the username and password of a user with permissions on LiveServer. To make things simple, I used my user account. (If you need to change the account information under the Accounts tab, you will probably have to reboot the server for the changes to take effect.)

Next, I checked to see whether I had added the hosts I needed to the MMC view. To add these hosts, I right-clicked the Search folder in the MMC and selected Add Host. I added LiveServer and SearchServer. Then on SearchServer, I right-clicked Catalog Build Server and selected New Catalog with a Wizard. I entered ToLiveServer for the catalog name, and I chose to do a File crawl. For the Start address, I entered the path to the \\LiveServer\ProfCon share. I then selected the names of the servers where I wanted to propagate the completed catalogs. I selected LiveServer and deselected the default setting, which was the name of my work-at-home server that I was using to perform this procedure. On the final screen, I selected the Start build now check box and clicked Finish. That's all I had to do to configure SearchServer to index LiveServer, create a catalog, and propagate the catalog back to LiveServer so users can perform searches.

Although Windows NT's Performance Monitor isn't very scientific, I used it to watch the two servers while the remote indexing took place. I immediately saw the advantage of removing the indexing function from LiveServer. Whereas SearchServer's processor usage during the indexing function worked up to 100 percent until the index was complete, LiveServer's processor usage never varied more than 10 percent.

To test how well my remote indexing and cataloging worked, I went to http://LiveServer/siteserver/knowledge/search/default.htm and entered a search word. Sure enough, the server displayed results with hot links to the right cross-referenced files. However, I noticed that all the links were file://LiveServer/filename.html. You can't avoid this type of labeling while you create the index, but you can remedy the situation after you build the index. To display the proper link names, I found the catalog I just built on SearchServer, right-clicked and selected Properties, and selected the URLs tab. I clicked Add in the Mappings section and added

//LiveServer

in the Access location box and

http://LiveServer

in the Display location box. Then all I had to do was start a new build on the catalog. When the build completed, the search results displayed the mapped http:// URLs to the user. Figure 1 shows the process of gathering, indexing, and displaying the search results. This process lets SearchServer perform normal file crawls behind a firewall with whatever permissions the Index Server has, but still lets you display the files to the end user from LiveServer in proper Web format.
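You can think of the mapping as a prefix substitution. The Python sketch below is one plausible reading of it, not a description of how Site Server matches addresses internally: the function name is hypothetical, and dropping the file: scheme before comparing is an assumption made so that the address lines up with the //LiveServer access location.

```python
def display_url(address, access_location, display_location):
    """Swap the access-location prefix of a crawled address for the display location."""
    # Assumption: drop the file: scheme so the address starts with //server.
    stripped = address[len("file:"):] if address.startswith("file:") else address
    if stripped.startswith(access_location):
        # Keep everything after the access location and put the display location in front.
        return display_location + stripped[len(access_location):]
    # No mapping applies; leave the address unchanged.
    return address
```

With the two values from the Mappings section, display_url("file://LiveServer/filename.html", "//LiveServer", "http://LiveServer") returns http://LiveServer/filename.html. An address on any other server passes through untouched.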

By setting up SearchServer, I relieved LiveServer of the strain that creating and maintaining a catalog puts on its resources. Also, I can now manage all my catalogs for numerous Web servers from one machine.

When you go to http://servername/siteserver/knowledge/search/default.htm, you can run a keyword search against all your catalogs at once or against each one individually. For example, if you build a couple of catalogs from different sources, you can provide one interface to all of them, or you can separate each type of content into its own area.
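The choice between one combined search and separate per-catalog searches can be sketched in a few lines of Python. Everything here is hypothetical (the function, the catalog contents, and the representation of a catalog as a word-to-links dictionary); it only illustrates the idea of running one query over several catalogs.

```python
def search_catalogs(catalogs, word, names=None):
    """Search the named catalogs (or all of them) and tag each hit with its catalog."""
    selected = names if names is not None else list(catalogs)
    results = []
    for name in selected:
        # Each catalog maps a lowercase word to the set of links containing it.
        for link in sorted(catalogs[name].get(word.lower(), ())):
            results.append((name, link))
    return results


# Hypothetical catalogs built from different sources.
catalogs = {
    "ToLiveServer": {"server": {"http://LiveServer/a.html"}},
    "Articles": {"server": {"http://servername/template.cfm?IssueID=1&ArticleID=2"}},
}
```

Calling search_catalogs(catalogs, "Server") returns one hit from each catalog, whereas passing names=["Articles"] restricts the same query to the database catalog.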

Site Server's search capabilities have changed a lot since Site Server 2.0. Site Server Search is powerful, easy to implement, and a tremendous service to your users. Most important, Site Server Search provides enough functionality to justify buying Site Server 3.0.
