To BLOB or Not to BLOB
Externalizing BLOBs to optimize SharePoint storage
July 20, 2011
I’d like to tackle and frame the debate about BLOB externalization. If you find yourself supporting large content databases, you’re likely to be considering BLOB externalization as part of your storage optimization efforts.
(Previously, I discussed the content scalability limits and guidance that Microsoft revised.)
As you know, by default SharePoint stores all content for a site collection in a content database, including binary large objects (BLOBs) representing unstructured data such as documents in a document library and attachments on list items. Within the content database on a SQL Server, BLOBs are stored in separate tables from the metadata about documents and list items.
That should give you a clue that BLOBs impact performance. SQL Server as a product is optimized for small (we’re talking 8 KB or less), structured data, and performs best with access patterns against such data. By moving BLOBs to separate tables, SharePoint performance improves because SQL Server performance improves.
You have the option to move those BLOBs even further away from SQL Server, and potentially away from expensive Tier I storage that many organizations use for SQL Server, to other storage platforms including NAS, shared folders, or cloud storage.
You can do this with either of two mechanisms. External BLOB Storage (EBS) is a set of APIs introduced in SharePoint 2007 (MOSS 2007/WSS v3) Service Pack 1. Remote BLOB Store (RBS) is a framework and set of APIs now baked into SQL Server, available to SharePoint through the RBS version in the Feature Pack for SQL Server 2008 R2, which can also be installed on SQL Server 2008 SP1. Both EBS and RBS are supported by SharePoint 2010 (Foundation and Server) and across editions of SQL Server (including Express), with various caveats and limitations.
Both EBS and RBS require that you build or buy the code (called a provider for RBS) that implements the correct APIs for the storage platform. The RBS installer includes a provider, called the FILESTREAM provider, which can be used to externalize BLOBs to the local file system of the SQL Server (which can include mounted SAN volumes, as they appear as local volumes).
Microsoft also recently announced support for NAS volumes, as long as latency and time-to-first-byte (TTFB) are within guidelines (20 ms TTFB). The FILESTREAM provider sits within SQL Server. Third-party RBS (and EBS) providers can externalize to a wide range of storage platforms and will generally outperform the FILESTREAM provider. Some third-party providers are even free of charge, making them what I would call a “no-brainer” if you’re going to externalize BLOBs.
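To get a rough feel for the 20 ms TTFB guideline mentioned above, you could time how long it takes to open a file on the candidate volume and read its first byte. The following Python sketch is only an illustration: the function names are hypothetical, it is not Microsoft's measurement method, and operating system caching can make repeat reads look much faster than a cold read.

```python
import time

TTFB_GUIDELINE_MS = 20.0  # the guideline figure quoted in this article


def time_to_first_byte_ms(path):
    """Time opening a file and reading its first byte, in milliseconds.

    This is a rough spot check, not an official measurement procedure.
    """
    start = time.perf_counter()
    # buffering=0 avoids Python-level read-ahead; the OS may still cache.
    with open(path, "rb", buffering=0) as f:
        f.read(1)
    return (time.perf_counter() - start) * 1000.0


def within_guideline(ttfb_ms, limit_ms=TTFB_GUIDELINE_MS):
    """Return True if a measured TTFB is at or under the limit."""
    return ttfb_ms <= limit_ms
```

You would point `time_to_first_byte_ms` at a sample file on the volume you are evaluating and repeat the measurement under realistic load, since a single reading says very little.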
But the question is, should you externalize BLOBs? There’s a lot of noise and information out in the community about BLOB externalization.
Rather than regurgitate it all here, I am going to summarize and encapsulate the important issues, and highlight several key considerations that I believe are being overlooked or misrepresented in the debate.
The following issues are among the most important to consider as you decide whether to externalize BLOBs as part of your storage optimization strategy. I’m going to present the advantages and disadvantages of BLOB externalization as if you should externalize BLOBs.
But the most important message to remember is that BLOB externalization, while “good” for many scenarios, is not “right” for every scenario. Many factors must be considered, and BLOB externalization must be architected very carefully.
Please keep in mind that most advantages of externalization can be translated to disadvantages of not externalizing, and disadvantages of BLOB externalization are, conversely, advantages of keeping BLOBs in your SQL content database. I’m presenting the externalization side of the story for brevity’s sake.
Reduced cost of storage
BLOBs can be moved out of expensive, high-performance SQL Server storage to less costly storage platforms. This is an easy argument to understand, and an easy ROI calculation.
I’ve seen several customers whose storage savings per year were in the 7 and 8 figures! Don’t forget the impact on your transaction logs (detailed in the next section) when calculating storage savings.
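The basic storage arithmetic can be sketched in a few lines of Python. Every number below is a made-up assumption chosen only to show the calculation, and the function name is hypothetical. The sketch also ignores the transaction log savings mentioned above, so treat it as a starting point.

```python
def annual_storage_savings(blob_gb, tier1_cost_per_gb, blob_store_cost_per_gb):
    """Estimate yearly savings from moving BLOBs off Tier I storage.

    Costs are per GB per year. Transaction log savings are not modeled.
    """
    return blob_gb * (tier1_cost_per_gb - blob_store_cost_per_gb)


# Hypothetical figures, for illustration only.
savings = annual_storage_savings(
    blob_gb=10_000,              # assumed 10 TB of BLOB data
    tier1_cost_per_gb=20.0,      # assumed yearly cost per GB on SQL Server storage
    blob_store_cost_per_gb=4.0,  # assumed yearly cost per GB on the BLOB store
)
print(f"Estimated annual savings: ${savings:,.0f}")
```

Substitute your own capacity and cost figures; the per-GB prices above are placeholders, not quotes for any real platform.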
Improved performance of file access
This is a fairly well-documented benefit. BLOBs put a burden on the CPU and RAM resources of a SQL Server every time a read or write operation is performed. Write operations are particularly problematic because the BLOB is written twice: first to the transaction log for transactional consistency, and then to the appropriate table in the SQL content database.
Boiling down a lot of data, it’s pretty clear that files greater than 1 MB perform better (reads and writes) when the BLOB is externalized, and that very small files (< 256 KB) tend to perform better in the content database. But there’s a lot of “it depends”: access patterns and the characteristics of storage and RBS providers are among the most important variables.
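As a rule of thumb only, those two size thresholds can be written down as a small Python helper. The function and constant names are hypothetical, and the result reflects file size alone, which is just one of the variables that matter.

```python
KB = 1024
MB = 1024 * KB

SMALL_LIMIT = 256 * KB  # below this, in-database storage tends to perform better
LARGE_LIMIT = 1 * MB    # above this, externalized storage tends to perform better


def suggested_placement(size_bytes):
    """Suggest where a file of this size tends to perform better.

    Looks only at size; access patterns, storage characteristics, and
    the RBS provider are not considered.
    """
    if size_bytes < SMALL_LIMIT:
        return "content database"
    if size_bytes > LARGE_LIMIT:
        return "external store"
    return "it depends"


print(suggested_placement(100 * KB))  # content database
print(suggested_placement(5 * MB))    # external store
print(suggested_placement(512 * KB))  # it depends
```

The middle band between 256 KB and 1 MB deliberately returns “it depends”, because the size guidance gives no clear winner there.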
Improved performance of access to all content on the SQL Server
This, in my opinion, is terribly underdocumented: By externalizing BLOBs, even small BLOBs (<256 KB), you can improve the performance of SharePoint across the board.
To make a long story very short, if you have only one user, then this isn’t true. But when you look at a real-world workload, in which multiple users are hitting SQL, and some of those users are reading and writing BLOBs, even small ones, the cumulative impact is to reduce performance for all users—even those accessing content such as list views that have nothing to do with BLOBs.
The BLOBs accessed by some users “weigh” down the SQL Server’s resources for all. I’ve seen significant performance increases, across the board, at customer sites. Microsoft has released some test results that support this observation, and over the coming weeks you will see more test results published by the community to give quantitative metrics to this effect.
But believe me, it’s there. I’ve said it before, and I will try not to repeat it too many times: it depends, and it must be architected well. But this is a potential big win for SharePoint performance under many real-world production workloads.
Access to important features of the storage platform
Another under-emphasized benefit of BLOB externalization is that you get the benefit of features of the underlying storage platform. Even with the FILESTREAM provider, you could utilize the out-of-box capability of the NTFS file system to compress and encrypt files.
Other storage platforms provide de-duplication, differential compression, snapshots, and other features that can further reduce your storage footprint and manage storage more effectively.
Efficient content restructure
After you externalize BLOBs, it is easier and faster to restructure content—to move content between sites, site collections, and even web applications.
Some of this restructuring can be done with the improved Move-SPSite cmdlet in SharePoint 2010 SP1; other restructuring requires third-party tools. The bottom line is that you can move metadata in SQL with the “pointers” to the BLOBs intact, and you do not have to actually move the BLOBs.
Storage platform management
One of the key disadvantages of BLOB externalization is that now your content is, of course, "split." You now must manage two platforms—the SQL Server databases and the BLOB store.
Administration, patching, maintenance windows, monitoring, auditing, and all management activities must be accounted for. Among the most important are item and database recovery, disaster recovery, and high availability—so important that I’m addressing those as their own issues, below.
Backup and recovery of content databases and items
There is also a lot of noise about backup and restore of content with externalized BLOBs. The bottom line is that you must architect your backup and restore to account for them, but it doesn’t have to be anywhere near as complicated as some people make it out to be.
I’ll document this more later, but you can architect a solution that will meet your SLAs as long as you have the right technologies and processes in place. Again, certain storage platforms add capabilities (particularly snapshot-based backup and restore) that are important in very large scenarios, and some third-party tools provide for restore SLAs of less than two minutes!
Disaster recovery and high availability
When BLOBs are stored in the SQL content database, SQL clustering and mirroring are among the several options for providing disaster recovery and high availability. When you externalize BLOBs, you must architect a solution that meets your DR and HA requirements.
Be sure to read these resources:
And these great blog posts for some additional opinions:
Have you found other great resources for your evaluation of RBS? If so, let me know by posting a comment!
RBS takes a lot of “hits” because it’s complex (more complex than a single content database on a SQL Server), but my experience is that BLOB externalization is well suited to many key scenarios (particularly when migrating from file shares) and, if well architected, can be a “big win.”
Please remember that although Microsoft has now raised the bar of support (they will now support you in much larger content scenarios), it’s up to you to be fully informed of all of the requirements, caveats, and architectural considerations.