Jumbo Columns
Use SQL Server's support for LOBs and the MAX specifier to define extra-large columns
August 22, 2006
As I discussed last month in “Stretching the 8K Row Limit,” InstantDoc ID 50490, SQL Server 2005 uses row-overflow pages to store data in variable-length columns that exceed the 8060-byte maximum row size.But even with the ability to store more than 300 columns of variable-length data of up to 8000 bytes each, in every row, you might still want to circumvent the 8K-per-column limitation sometimes. You can do so by using SQL Server’s support for large object (LOB) data and the MAX specifier with varchar columns.
Large Objects in SQL Server
SQL Server has long been able to store large objects in LOB columns, by using the text and image data types (since SQL Server 4.0) and, additionally, the Unicode ntext data type (since SQL Server 7.0). Let’s take a closer look at LOB data by running the code in Listing 1, which creates the hugerows table with a fixed-length column of 1000 bytes, a variable-length column of up to 8000 bytes, and a text field and inserts one row into the table.
By default, the text data is stored completely off the row on another type of page, with only a 16-byte pointer in the row itself. So the data row inserted in Listing 1 will be only 2016 bytes long, plus a few overhead bytes, and fits on the regular data page. The LOB data will be stored on one or more additional LOB pages. One limitation in the INSERT statement in Listing 1 is although I’ve attempted to insert 10,000 bytes into the text field, which can hold 2GB of data, the REPLICATE function can’t return a value of type text and quietly truncates anything longer than 8000 bytes to that size. So if I run the following SELECT statement, SQL Server reports that the length of column c is 8000 bytes:
SELECT datalength(c) FROM hugerows
The code in Listing 2 executes DBCC IND on the hugerows table, captures the output, and returns some of the most relevant columns. Table 1 shows Listing 2’s output: one page for the regular in-row data, one page for the Index Allocation Map (IAM) for the in-row data,one IAM page for the LOB data, and two pages for the LOB data. (Your output, of course, will show different values for the actual page numbers.)
LOB Data Storage
Beginning with SQL Server 7.0, LOB data is stored in a tree structure, similar to the trees used to store indexes. This structure makes accessing part of the data efficient if you know the offset within the column. If the amount of data in a LOB column is less than about 40KB, the text pointer in the data row points to an 84- byte text root structure. This structure forms the root node of the B-tree structure, and each row has its own root. The root node points to the blocks of text, ntext, or image data. One of the LOB data pages in Table 1 is a page containing the root structure for the first row; the others are the actual LOB data—the string of 8000 c’s.
Let’s add another row to the hugerows table, by using the statement in Listing 3. Now when you run the code in Listing 2 again, you should get only one more row of output, indicating the LOB page needed to store the new text data with the string of f’s.The root structure for this second row can be stored on the same page as the first row’s root structure; no new page is needed. Although the data for LOB columns is arranged logically in a B-tree, physically both the root node and the individual blocks of data can be spread throughout the LOB pages for the table—wherever space is available. In this case, since the actual data took almost an entire page, SQL Server put the root structures on a separate page, but that might not always happen. If the text data took only half a page, the root structure could be stored on the same page, and you wouldn’t need a separate LOB page just for the root structures. The size of each block of data is determined by the size written by an application.
To see all three types of pages, insert one more row into the hugerows table, as Listing 4 shows.This time, the varchar column is so large that even without the text column,the row no longer fits in one page.The fixed- length column needs 1000 bytes,so an additional 8000 bytes in the varchar column won’t fit. Run the code in Listing 2 again, and you’ll get nine rows of output, similar to the output that Table 2 shows.
You now have one IAM page and one data page for the regular in-row data,an IAM page and a data page for the row-overflow data, and five LOB pages: an IAM page, a page containing the three root structures, and three pages for the actual text data.
Although LOB columns can hold up to 2GB of data in a column, they aren’t easy to work with. Many functions don’t take LOB data types as input, few functions can return a LOB data type as output (see the previous REPLICATE function),and LOB data columns can’t be accessed from the inserted and deleted tables in the body of an AFTER trigger. As of SQL Server 2000, LOB data columns can be accessed from the inserted and deleted tables in the body of an INSTEAD OF trigger. To do most of the initial population and manipulation of LOB data, you must use the special operators READ- TEXT, WRITETEXT, and UPDATE-TEXT,which are employed differently from SELECT,INSERT and UPDATE. For more information about these special operators, see “BLOB Access Technologies,”July 2003, InstantDoc ID 39058;“Importing Word Documents into SQL Server,” March 2003, InstantDoc ID 37903;and “Off the Record,” January 2003, InstantDoc ID 26997.
The MAX Solution
Before SQL Server 2005,you had to decide whether to limit your columns to 8000 bytes or to deal with large data columns by using different operators. SQL Server 2005 provides a solution that gives you the best of both worlds.
SQL Server 2005 lets you use the MAX specifier to define a variable-length column. You can use MAX with the nvarchar and varbinary LOB data types as well as varchar.You can indicate the MAX specifier instead of an actual size when you define a column, variable, or parameter that uses one of those types.
When you use MAX, by default you’re letting SQL Server determine whether to store the value as a regular varchar, nvarchar, or varbinary value or as a LOB.You can override the default by using a new table option called large value types out of row and enable this option by using the sp_tableoption procedure in Listing 5. If the option’s value is set to 1, the data in columns defined using MAX will be stored off row as LOB data with the 16-byte pointer stored in the row. If the table option is set to 0 and the column length is less than or equal to 8000 bytes, SQL Server checks the total row size.If the row size is less than 8060 bytes, SQL Server stores the column values as regular in-row data; if the row is greater than 8060 bytes, SQL Server stores the column data off row as LOB data, again with the 16-byte pointer stored in the row.
No matter where the data that has the MAX specifier is stored,in your code you can treat it as if it were regular variable-length data—varchar, nvarchar, or varbinary—and use all the regular string operators to work with the data. The special commands for LOB data aren’t required. In fact, SQL Server 2005 Books Online (BOL) indicates that,in a future SQL Server version,the text, image, and ntext data types will no longer exist, and you might want to consider not using them for new development.
Run the code in Listing 6 to re-create the hugerows table so that it contains a fixed-length column, a varchar(8000) column, and a varchar(MAX) column, then insert the two rows shown in Listing 7. Run the script in Listing 6 after each insert. Note that the REPLICATE function can return a value of type varchar(MAX) if REPLICATE is used with the CAST or CON- VERT function. REPLICATE returns a character expression of the same type as the supplied character expression. To have REPLICATE return an expression of type varchar(MAX), you must supply it with a parameter of that type. Using varchar(MAX), I can use REPLICATE to generate strings of more than 8000 characters, as Listing 7 shows.
After the first insert, you see only the regular data page and its IAM because SQL Server treats the 50-byte column as a regular varchar. However, after the second insert, you’ll have LOB data pages: one page for the LOB IAM and four LOB pages to hold the 30,000 bytes of LOB data and the root structure.
Finally, let’s insert another row into hugerows, as Listing 8 shows. You might guess that this INSERT will lead to row-overflow data because the 7000-byte column precludes the row’s fitting on one page. But when you run the script in Listing 6, you don’t get a row-overflow page, but rather another LOB page. If a column defined with the MAX specifier is too big to fit on a regular data page, SQL Server stores it on a LOB page. The only columns that can be stored on row-overflow pages are columns defined by using the regular variable-length definitions, which have a specific upper limit to their size.
The size of the columns defined by using MAX can reach the maximum size supported by LOB data, which is currently 2GB. By using MAX, though, you’re indicating that the maximum size should be the maximum the system supports. If in the future, you upgrade a table that has a varchar(MAX) column to a new SQL Server version, the MAX length will be whatever the new maximum in the new version is.
A Mixed Blessing
SQL Server’s LOB data types let you store large amounts of data in one column. Using the new MAX specifier lets you access and manipulate large amounts of data in your T-SQL code exactly as if that data were in regular variable-length columns. Although MAX lets you easily manipulate large data columns, the data can require much more storage space than data that fits in the regular data row, and SQL Server must work harder to access data stored off the regular data pages. There’s also overhead involved in keeping track of where the special data pages are actually kept. Only you can decide, ideally by doing a thorough performance testing of your applications, whether using the variable-length data types with the MAX specifier is worthwhile.
About the Author
You May Also Like