Build a Better Search Engine

Use SQL Server 7.0's full-text search capability to build searchable Web databases

David Jones

May 23, 2000

8 Min Read
ITPro Today logo in a gray background | ITPro Today

Whether you're doing research, looking up a phone number, or just surfing, you probably use an Internet search engine every day. To meet this demand for retrievable information, many companies are building dynamic, searchable Web sites that use relational databases on the back end. As DBAs and application developers, we need to build better ways for consumers to search company information, stock reports, press releases, product catalogs, and so on. SQL Server 7.0's full-text search might be the answer. SQL Server's full-text search engine gives you fast queries and advanced pattern matching in an enterprise environment.

In earlier SQL Server versions, you had to query large blocks of text by using LIKE in a SELECT statement, but the LIKE statement is limited because it can only match patterns. Also, if you use a LIKE statement with a percent sign (%) before and after a search string, you'll produce a time-consuming table scan. Most users want advanced pattern matching, with results as soon as they press Enter. With SQL Server 7.0 and later, you can still use a LIKE statement if you want; but now you can use a new syntax option and index the data.

Instead of using indexes stored in your database, the full-text search engine uses Microsoft Search Service to catalog these indexes on your server's hard drive. When you select Full-Text Search in the SQL Server Custom Setup, the installation process automatically installs Microsoft Search Service. However, Microsoft Search Service is available only in Standard and Enterprise systems on Microsoft Windows 2000 or Windows NT, and not in the desktop or Windows 9x versions. The Microsoft Search Service has two distinct functions: building and populating your full-text catalog, and processing the search.

To install the full-text search engine after installing SQL Server, you need to rerun Setup from the SQL Server 7.0 CD-ROM. At the Select Components dialog box, select Full-Text Search, as Figure 1 shows, then reboot. Only SQL Server 7.0 running on NT Server supports full-text searching. An NT Server, Enterprise Edition clustering environment doesn't currently support SQL Server full-text search. Microsoft will support full-text search in a clustered environment in SQL Server 2000.

Microsoft Search Service stores full-text indexes in files on your server's hard drive. Microsoft Search Service stores these files, called full-text catalogs, by default in MSSQL7FTDATA, but you can change the path to these catalogs. You can't store full-text catalogs on a floppy drive, network drive, or removable drive. I always create full-text indexes on a partition with plenty of drive space because large tables will need it.

To use full-text indexing, your SQL Server table needs a unique index, or primary key. If you can't create a unique key from a potential indexed table, you need to add to your table an incrementing identity column or another column that will let you create a unique index. To help Microsoft Search Service run optimally, keep the unique index as small as possible. After you establish the unique index, or primary key, you can issue full-text indexing only on char, varchar, text, nchar, nvarchar, and ntext columns.

The full-text search engine doesn't populate full-text indexes dynamically. Therefore, any additions, changes, or deletions to your data won't be searchable until you repopulate the full-text catalog. You can repopulate the full-text catalog in two ways: run a manual population or schedule a job to repopulate the catalog. With both of these methods, you can choose a full repopulation or an incremental population.

A full population will clear all data stored in a full-text catalog and repopulate it with data related to its corresponding tables. Because a full population can consume time and resources for large tables, I prefer to use an incremental population, which populates only textual data added to the database since the last incremental population. For you to do an incremental population, one of the columns in the full-text-enabled table needs to be a timestamp data type. You won't need to fully repopulate an incrementally populated catalog until the table schema changes or you need to add tables to the catalog.

Be aware that full-text indexes aren't as exact as standard indexes. Although a search based on standard indexes will return only results that match an exact character pattern, a full-text search can query a word or phrase, the prefix of a word or phrase, a word or phrase that is near another, or an inflectional form of a word (for example, a search on "query" would also return results that contained "queries," "querying," and "queried"). Table 1, adapted from information in SQL Server Books Online (BOL), lists other differences between classic SQL and full-text indexes.

Although I believe the advantages predominate, the full-text search feature does have disadvantages.

  1. You can't back up a SQL Server full-text catalog. If your system crashes, you'll have to rebuild all full-text indexes manually during a restore. Keep this limitation in mind if you're working with critical applications that use full-text search. Remember that Microsoft doesn't support a full-text catalog failover in a clustered environment.

  2. Full-text indexes consume significant hard drive space. Because hard drive space is cheap, I use 9GB to 18GB hard drives.

  3. Unlike traditional indexes, full-text indexes aren't dynamically updated. For full-text repopulating, you need to set and monitor a job. I always repopulate my full-text indexes when database usage is minimal.

  4. Full-text repopulating of large tables can be time-consuming. Again, I recommend using the incremental repopulation to minimize resource usage.

  5. In a full-text catalog, you can run only eight searches at one time. If you begin a ninth search, Microsoft Search Service will hold it in a queue.

  6. It's difficult to write code that incorporates all methods of searching full-text data. I always code my search engine with the CONTAINS clause (which I'll explain later), and I've found this the best method to search my data.

  7. You can only use a wildcard (asterisk) as a placeholder for suffixes; you can't use a wildcard at the beginning of a search string. (For example, to find all occurrences of "sailboat," you need type "sail*" instead of "*boat.")

  8. You can't query segments of words without using a wildcard. For example, if you searched for "grass," "sawgrass" would not appear as a valid result.

  9. SQL Server full-text search can't span multiple databases. I've worked around this limitation by using cross-database joins.

  10. SQL Server full-text search can't index views.
     

After you've installed Microsoft Search Service to enable full-text searching, you need to enable your databases and tables for full-text indexing. (This step won't populate your tables; to populate tables, follow the instructions in the next section, "Populating Your Full-Text Indexes.") You can finish installing the full-text search engine by using either SQL Server built-in stored procedures or SQL Server Enterprise Manager. Follow these steps to enable a database and table for full-text indexing with SQL Server Enterprise Manager:

  1. Launch Enterprise Manager, and register your server.

  2. Open the server group where your database is located.

  3. Open the desired SQL Server 7.0 server.

  4. Open the Support Services folder, and verify that Full-Text Search is running.

  5. Open the Databases folder, and select the database you want.

  6. From the Tools menu, select Full-Text Indexing (as Figure 2 shows). This will launch the Full-Text Indexing Wizard.

  7. Select a table from the drop-down list. (Remember that the table must contain at least one text, ntext, char, nchar, varchar, or nvarchar column.)

  8. Select a unique index for the table selected (as Figure 3 shows). If the table has no index, you'll have to exit the Full-Text Indexing Wizard or go back and select another table.

  9. Select table columns (as Figure 4 shows). Move to Added columns any columns that you want to make eligible for a full-text search.

  10. Select a catalog (as Figure 5 shows). You can choose an existing catalog or create a new one. I use a different catalog for every table, but you don't need to. Always give a new catalog a name associated with your table.

  11. Select or create population schedules. Because full-text indexes aren't updated like regular indexes, choose this option if you want to schedule a full repopulation or an incremental population. All scheduled tasks will be run as a normal job in SQL Server Agent.

  12. Repeat these steps for additional databases.

To enable additional tables for full-text indexing, open the Databases folder and select your desired database. Open that database folder, and click Tables. When your tables appear, right-click a table and select Full-Text Index Table, as Figure 6 shows. This action launches the Full-Text Indexing Wizard used in the previous steps.Then, follow steps 8 through 12 in the previous instructions.

Populating Your Full-Text Indexes

To populate full-text indexes, open the Databases folder and select your full-text-enabled database. Click Full-Text Catalogs under the Databases folder. Your newly formed catalogs will appear to the right. Right-click a catalog. Select Start Population, as Figure 7 shows.

Querying Full-Text Indexes

In addition to LIKE, you can use several other T-SQL keywords to query a full-text catalog, including CONTAINS, FREETEXT, CONTAINSTABLE, and FREETEXTTABLE. CONTAINS and FREETEXT are easiest to use and can provide most of the functionality you'll need. You can use these keywords in the WHERE clause of a T-SQL statement only for searching character strings or phrases. You use CONTAINSTABLE and FREETEXTTABLE keywords, which determine relevant search rankings, in the FROM clause of a T-SQL statement. These statements will return a two-column table containing the unique row ID and a relevant ranking for the search. (For more information about using CONTAINSTABLE and FREETEXTTABLE keywords, see BOL.)

For instructions about building a sample full-text search engine and practice queries, see the sidebar "Sample Full-Text Search Engine." Despite some disadvantages, SQL Server 7.0's full-text search is a better way of searching large textual items such as text and varchar data. If you install this search capability, it can build the full-text catalog, populate it, and process a full-text search. You'll avoid time-consuming table scans, enable advanced pattern matching, and deliver quick results in high-performance, robust corporate applications and websites.

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like