SQL Server 2005's ETL Tool: Integration Services
Welcome to a New Year and a new focus for SQL Server Magazine UPDATE--Special Edition. These are exciting times as we approach SQL Server 2005's release.
March 30, 2005
Welcome to a New Year and a new focus for SQL Server Magazine UPDATE--Special Edition. These are exciting times as we approach SQL Server 2005's release. Over the coming months I'll be highlighting each tier of Microsoft's new business intelligence (BI) offerings, weighing in on SQL Server 2005's products, and passing along insights I gather from the development team and other industry experts who are already familiar with the newest 2005 features. If you aren't already a BI junkie, stay tuned; you'll catch the fever before 2005 hits the streets. In the meantime, I want to encourage you to dig into SQL Server 2005's betas and Community Technology Previews (CTPs). Read Brian Moran's article "Technology Previews Deliver Pre-Beta 3 Peeks at SQL Server 2005" at http://www.winnetmag.com/article/articleid/44291/44291.html for more information about these between-beta treats.
I want to begin my BI exploration with the foundation: extraction, transformation, and loading (ETL). If you've ever delivered an enterprise data warehouse then you probably know that most of the project risk and developer hours go into the ETL development, which is why SQL Server 2005's ETL capabilities are so exciting. In case you missed the memo ("PASS News: DTS Renamed Integration Services" at http://www.winnetmag.com/article/articleid/44154/44154.html ), Microsoft replaced Data Transformation Services (DTS) with SQL Server Integration Services (SSIS). Notice that I didn't say DTS has been renamed. SSIS is essentially a new product.
I spoke with SSIS Group Program Manager Donald Farmer to learn what he thinks is most exciting about the product. Donald started by telling me that SSIS is not just an ETL tool. Although it's an application that moves millions of rows of data to and from heterogeneous data sources, SSIS functionality doesn't stop there. The tool leverages the end-to-end BI suite by offering complete data integration, movement, and shaping, which means that SSIS provides data cleansing, extensibility, and interoperability. All of which improves application performance. SSIS comes with a lot of pre-built data-cleansing functionality, including completely integrated functions such as Fuzzy Matching and Fuzzy Grouping, which use algorithms to match or group disparate data to a configurable degree of accuracy. SSIS will offer broad extensibility for third-party components; so if a functionality doesn't come in the box, SSIS lets you build your own components or add a third-party component to solve your problem. Interoperability ties the Microsoft BI platform together; SSIS can load data directly into Analysis Services cubes and it offers robust data-mining features for including scalable data-mining model creation, training, and predictions. Perhaps most surprising to me was SQL Server 2000 Reporting Services integration, which lets you treat an SSIS package as the data source for reporting. And, the SSIS team has worked extensively on balancing performance and scalability; the result of their efforts lets complex, high-volume ETL run on lighter-weight servers, so you can scale down. It will be hard for the competition to keep up with SSIS performance.
I also cornered my colleague, Erik Veerman, who has been working with SSIS for more than a year and has a lot of experience with SSIS development and production deployment. Erik noted that SSIS has let him rethink his long-standing ETL best practices and build new architectures that reduce ETL data staging areas and minimize performance costs associated with data staging (disk I/O and serial processing). Erik explained that SSIS offers capabilities in its data flow to perform complex data transformations, data cleansing, and high-volume lookups--all inline from source to destination. Another specific feature Erik raves about is the Slowly Changing Dimension (SCD) wizard. Through the SCD interface, you can rapidly generate all the steps and required code to add unique handling of history to multiple attributes in a given dimension. The SCD Wizard is definitely a feature that will separate Microsoft from the rest of the ETL pack.
I've been excited about SSIS since last summer, when I got involved in an early beta program. Spending time talking with Donald and Erik has gotten me even more excited to see what SQL Server Magazine readers are going to do once they've gotten up to speed with SSIS. Keep me posted!
About the Author
You May Also Like