Data Mining in SQL Server 2005

Chances are, if you're like most DBAs or developers, you don't know much about data mining. This month, Douglas McDowell gets you started.

Douglas McDowell

May 19, 2005

3 Min Read
ITPro Today logo in a gray background | ITPro Today

Chances are, if you're like most DBAs or developers, you don't know much about data mining. Data mining was added to SQL Server 2000 Analysis Services late in the development cycle (that's why they changed the name from OLAP Services). SQL Server 2000 data mining comes with two algorithms from Microsoft Research, Data Transformation Services (DTS) tasks, standardized APIs and OLE DB for data mining data access, a new SQL dialect for data-mining-specific Data Definition Language (DDL) and Data Manipulation Language (DML) commands and also user controls. In SQL Server 2005, data mining is completely revamped with many state-of-the-art algorithms and powerful APIs (Data Mining Extensions--DMX--and XML for Analysis). Microsoft is moving forward aggressively with its strong platform approach that's gone beyond delivering data mining to the masses by providing enterprise tools that help developers embed data mining in applications and core business processes.

Recently, I talked with the Lead Program Manager for data mining, Zhaohui Tang, and the data mining engine lead, Jamie MacLennan, about data mining in SQL Server 2005 and what readers should know that they might not be catching in the mainstream product marketing. The first topic they wanted to cover was text mining, which is profiling and pattern discovery in unstructured data such as written documents or transcripts from customer support calls. I hadn't dug into text mining yet, so I headed to the data mining team's resource Web site ( http://lists.sqlmag.com/t?ctl=A77B:7B3DB ). I got Peter Kim's (an engineer on the SQL Server Data Mining development team) Text Data Mining tutorial and I had soon imported sample user group posting data, created a dictionary and term vectors, trained and tested my samples, and built data-mining models that were ready to deploy. Kim wrote the tutorial for SQL Server 2005 Beta 2; so if you're using a later build a few things are different.

After talking about text mining, we discussed how valuable platform integration is for data mining in SQL Server 2005. We'll see deep integration with SQL Server 2005 Integration Services (SSIS), OLAP in Analysis Services, and SQL Server 2000 Reporting Services. SSIS becomes the primary vehicle for data cleansing and preparation for data mining model creation in addition to training and performing predictions against existing data mining models. In Analysis Services, dimensions, and subsequently OLAP cubes or Unified Dimensional Models (UDM), can be created based on data mining models letting you expose interesting patterns and segmentation in your reporting and analysis. Also Reporting Services offers the ability to create reports with an embedded DMX for SQL query builder for getting information from data mining models. This platform integration provides unrivaled empowerment for developers to rapidly become proficient in creating many varieties of data mining applications.

Microsoft's data mining focuses on accelerated deployment. Studying the Crisp data-mining process, Microsoft supports the full data-mining life cycle but lets organizations deploy models and predictions in minutes and hours rather than days and weeks using automation and embedding capabilities. Upon MacLennan's recommendation, I checked out http://lists.sqlmag.com/t?ctl=A779:7B3DB to get my specifics straight regarding the Crisp data-mining-process. As you experiment with data mining, you should read up to understand the industry standard best practices.

Another great data-mining tutorial on the http://lists.sqlmag.com/t?ctl=A77B:7B3DB Web site is the general Data Mining Tutorial. This step-by-step guide walks you through how to recreate most of the data mining items you see in the AdventureWorksDW Analysis Services sample solution that's installed as part of the SQL Server 2005 samples. You can start out looking at the data-mining viewers and getting a feel for the finished product, then dig into the tutorial to create it on your own.

An UPDATE commentary slot isn't enough space to properly cover everything going on with the data mining team, but I want to encourage you to get you feet wet with data mining sooner rather than later--you just might find you have a real passion for it. Happy Mining!

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like