Text in Row Internals
Learn the ins and outs of text data
February 18, 2003
Dealing with the large object (LOB) data types (text, ntext, and image) is a little more complex than dealing with the other types, as you probably know. When you throw in the text in row option, which specifies that small amounts of data can be stored in the data row instead of on separate pages, you have even more details to be concerned about. As I mentioned last month in "Text Pointers," there's a crucial difference in locking behavior between normal LOB data and LOB data stored in a table that has the text in row option enabled. If a table isn't enabled for in-row text storage, SQL Server holds no row lock for the text pointer, and another process might invalidate the text pointer before you can use it. The only way to lock the data row and protect the text pointer is to raise the isolation level to at least REPEATABLE READ. The drawback of using the isolation level to prolong the lock is that you can't release the lock earlier than the end of the transaction.
When you're working with LOB data in a table that has the text in row option enabled, you can't acquire a text pointer except in a user-defined transaction. You must then use the text pointer in the same transaction because the pointer is invalidated automatically at the end of the transaction, and once the pointer is invalid, SQL Server releases the lock on the row. However, the text pointer can be invalidated in other ways. Let's examine some of the ways to invalidate the pointer and see how you can use DBCC commands to view text data to determine whether it's on the data page or on its own special LOB pages.
Releasing the Lock
In a table that has text in row enabled, you have control over when the lock is released—it's released whenever the text pointer becomes invalid. As I mentioned, the end of a transaction automatically invalidates a text pointer, but you can force SQL Server to invalidate the text pointer and release the lock before the end of the transaction. Note that the rules for invalidating text pointers apply to all rows in a table that have text in row enabled, even the rows in which the text data is too big to fit in the row. SQL Server also invalidates text pointers for tables that have text in row enabled when you perform the following DDL operations on the table:
CREATE CLUSTERED INDEX
DROP CLUSTERED INDEX
ALTER TABLE
DROP TABLE
TRUNCATE TABLE
sp_tableoption with the text in row option
sp_indexoption
You might need to force invalidation of text pointers because SQL Server sets an upper limit on the number of pointers for in-row text data. SQL Server maintains internal memory for each valid text pointer, and it needs to limit the amount of memory that's used in this way. You can have up to 1024 valid text pointers per transaction per database. If your transaction spans two databases, you can have 1024 text pointers in each database. Within a transaction, obtaining a text pointer twice on the same LOB data gives you the same text pointer structure and doesn't count against your limit of 1024. SQL Server doesn't limit the number of pointers you can have for LOB data in a table that doesn't have text in row enabled.
The code in Listing 1 shows an example, based on code I used in "Text Pointers," of using ALTER TABLE, which invalidates the text pointer. Be sure to drop the t1 table first if it already exists. The code in Listing 1 creates a table that has a text column, then enables the text in row option for the table. The code then inserts two rows into the table: one with data that fits into the supplied text in row limit of 50 bytes and one with text data that's too big. The code then starts a transaction, retrieves a text pointer, then uses the text pointer to access text data. Inside the same transaction, the code executes an ALTER TABLE statement; a second attempt to use the text pointer results in an error, demonstrating that the text pointer was invalid. Note that you must submit the whole transaction as one batch because the local variable holding the value of the text pointer exists only for the duration of the batch.
After SQL Server generates the error message that reports the invalid text pointer, it continues with the batch and commits the transaction, so the new column is part of the table. If you try to run the same transaction again, the ALTER TABLE statement generates an error because you can't add a second column of the same name. The error on ALTER TABLE terminates the batch, so the COMMIT never executes and you're left with an open transaction. By testing the value of @@trancount, you can check for an open transaction after the error is reported.
Another way you can invalidate a text pointer for a table that has text in row enabled is by using a special stored procedure, sp_invalidate_textptr, which takes a text pointer as an argument. You can verify this behavior for yourself by running Listing 1's code again, replacing the line containing ALTER TABLE with this one:
EXEC sp_invalidate_textptr @ptrval
Looking at Text Data
Besides knowing how to manipulate the LOB data and the text pointers, for database management and sizing purposes, you also need to understand where the LOB data is stored. In "Off the Record" (January 2003), I showed a simple way of verifying when SQL Server actually moved text data onto a data page after the text in row option was enabled. You use the sp_spaceused stored procedure and note when the number of pages reserved for the table changes. When the text data is on the special LOB page, SQL Server reserves more pages for a table than after you move the text data into the row. However, that technique of looking at the number of reserved pages is useful only for the simple case of a table containing just one row. In a table with multiple rows, if some LOB data is in the data rows because it's small enough and the LOB data for some rows is on its own special pages, the sp_spaceused procedure might not give any useful information because it shows only the total number of pages, not any information about what's on those pages. However, you can use the undocumented command DBCC PAGE to look at the data on the LOB pages themselves.
I won't go into complete detail about the workings of DBCC PAGE, but you can read my articles "More Fill-Factor Mysteries" (May 2001) and "Do the Splits" (June 2001) for more information. This command isn't an official part of the SQL Server documentation, but using the undocumented command is your only alternative if you want to view the storage structures within SQL Server.
For an example of how to look at the data on LOB pages, use the script that Listing 2 shows to build a table called hasText with five columns, one of which is a text data type. The code inserts one row into the table, putting a string of 250 Xs in the text column. The code then uses DBCC IND to discover which pages belong to the table. DBCC IND returns four rows, representing four pages in the table. The DBCC IND output has two rows for pages with a PageType value of 10, which give details about allocation pages; I'll ignore those. The page with PageType 1 is a data page; for my hasText table, the data page is page 284. The page with PageType 3 is the LOB data page (for my table, it's page 282). If you run this script, you'll have to replace the 284 and the 282 with your own page numbers. Also, if you don't use the Pubs database, you'll have to change the first argument to be the database ID of the database you're using.
Next, the script in Listing 2 enables the text in row option with a maximum of 500 bytes, then uses the OBJECTPROPERTY() function to verify this limit. The code enables DBCC PAGE to return output to the screen by enabling trace flag 3604, then uses DBCC PAGE with database ID 5 for the Pubs database to display both pages—the regular data page and the LOB data page.
In the DBCC PAGE output, SQL Server attempts to translate any character data on the page into readable ASCII format. You can see the contents of the row on the data page after the header that says Record Type = PRIMARY_RECORD. For now, let's ignore the cryptic byte output on the page and just look at the translation of the bytes on the far right. You can see the other character values that the code inserted into the row, but not the string of Xs. To see this string, you have to scroll down to where the LOB page's information begins, right after the header that says Record Type = BLOB_FRAGMENT. Again, just looking at the information on the far right, you can see the long string of Xs. Because the code enabled text in row after inserting the data into the table, the data doesn't automatically move into the row.
The script in Listing 2 then updates the row so that the string of Xs is now a string of Zs. The update causes the text data to move onto the data page. Running DBCC PAGE again for both the data and the LOB pages shows you that the string of Zs is on the data page. It also shows that SQL Server first updated the data on the LOB page; the Zs are on the LOB page. However, because this was the only row on the page, SQL Server deallocated the LOB page. The information is still on the page, but it no longer belongs to the hasText table. The indicator that this page isn't allocated is in the Allocation Status section of the page header. For the data page, I have this information: PFS (1:1) = 0x61 MIXED_EXT ALLOCATED. For the LOB page, I have something similar, but the word ALLOCATED is missing, which means the page isn't allocated to any table.
You can do more testing of LOB data behavior on your own. Use DBCC PAGE to see what happens when some of your text data is longer than the text in row limit and some is shorter or when you have text data that's too big to fit on one page. You can disable the text in row option and see what happens to the pages in this table.
LOB data can be useful for storing large chunks of data, and having the ability to store smaller pieces of LOB data in the table makes using these data types much more efficient than in versions earlier than SQL Server 2000. Understanding how to work with LOB data types and how they behave inside and outside of transactions can help you make the best decisions about how to store your LOB data and how to write efficient applications. And being able to examine the internals of the pages helps you understand what SQL Server does when you manipulate data of one of these special data types.
About the Author
You May Also Like