Using NoSQL with Windows Azure Roles: MongoDB and Azure Table Storage

Comparing MongoDB and Windows Azure Table Storage for use by Web and Worker role instances

Zoiner Tejada

June 14, 2012


In recent years, the NoSQL movement has gained a foothold in the developer community, offering open-source, non-relational alternatives to relational databases. NoSQL databases are typically characterized by three traits: they don't use SQL; they might not give full atomicity, consistency, isolation, durability (ACID) guarantees (many are eventually consistent instead); and they support a highly available, distributed architecture that can scale to handle large amounts of data. NoSQL databases also tend to avoid SQL-style joins between tables. Instead, they favor keeping all the necessary data in a single table and, when joins are required, performing them on the client side.

NoSQL is a viable database option for Windows Azure developers. In this article I discuss and compare two NoSQL options that you can use from your Windows Azure–hosted services: Windows Azure Table Storage, which is likely more familiar to readers of this column, and MongoDB, which is one of the more widely used NoSQL databases and offers a compelling set of features that can be hosted within Azure. (See "Exploring the MongoDB Document Database: A Primer for .NET Developers" for an introduction to MongoDB.) Both Azure Table Storage and MongoDB support the aforementioned key aspects of the NoSQL movement: high scalability and high availability. We'll look at the two options from a range of perspectives, from features to costs, so that you understand the value each option provides and how your Azure roles can interact with them.

A Brief Introduction

Windows Azure Table Storage has been around since the early days of Windows Azure (circa 2008) and has received improvements ever since. Microsoft provides Azure Table Storage as a service. At the top of the logical model for Table Storage is the subscription, which is the unit of billing. Each subscription can contain one or more storage accounts (a storage account defines both name/key-style access control and the physical storage location). A storage account contains schema-less tables; each table contains entities, and each entity has a variable set of properties. Property values support basic types, such as string, bool, binary, datetime, double, guid, and integers. Windows Azure Table Storage currently supports only a single composite primary-key index, which consists of two columns: PartitionKey and RowKey. Azure Table Storage has no support for secondary indexes (such as the non-clustered indexes you might create in a SQL Server database).
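To make the entity model concrete, here is a minimal in-memory sketch in Python. It is not the Azure SDK; the `Table` type, the helper functions, and the sample customers are all hypothetical names invented for illustration. It shows the two points that matter most in practice: entities in one table can carry different properties, and only a lookup on both PartitionKey and RowKey can use the index.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical in-memory stand-in for a single table; not the Azure SDK.
Entity = Dict[str, Any]
Table = Dict[Tuple[str, str], Entity]


def insert(table: Table, entity: Entity) -> None:
    """Store an entity under its composite (PartitionKey, RowKey) key."""
    key = (entity["PartitionKey"], entity["RowKey"])
    if key in table:
        raise KeyError(f"entity {key} already exists")
    table[key] = entity


def point_lookup(table: Table, partition_key: str, row_key: str) -> Entity:
    """Fast path: both key columns are known, so the single index applies."""
    return table[(partition_key, row_key)]


def scan_by_property(table: Table, name: str, value: Any) -> List[Entity]:
    """Slow path: with no secondary index, every entity must be examined."""
    return [e for e in table.values() if e.get(name) == value]


customers: Table = {}
# Entities in the same table may have different sets of properties.
insert(customers, {"PartitionKey": "US", "RowKey": "c-001", "Name": "Ada", "Tier": "gold"})
insert(customers, {"PartitionKey": "US", "RowKey": "c-002", "Name": "Lin"})
insert(customers, {"PartitionKey": "DE", "RowKey": "c-003", "Name": "Max", "Tier": "gold"})

by_key = point_lookup(customers, "US", "c-002")
gold = scan_by_property(customers, "Tier", "gold")
```

The practical consequence is that you should choose PartitionKey and RowKey values around the queries you expect to run most often, because any other filter degrades to a scan.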

MongoDB is a document-oriented database provided by 10gen and was initially released in 2009. It has gained a tremendous following in the developer community and has grown in popularity with the rise of the NoSQL movement. The logical storage model for MongoDB is a little more complex than that of Azure Table Storage, primarily because MongoDB usually isn't provided as Software as a Service (SaaS) but rather as a distributed system that you need to host in Azure yourself (although some companies now offer hosted solutions; we'll return to that point later).

A typical deployment of MongoDB that's designed to scale well consists of shards, each of which contains a subset of the total data, split out by a shard key. The data for an individual shard is stored across a replica set, which consists of multiple nodes (called replicas) that store the data to disk. Each replica set consists of a single primary replica (to which all writes and, by default, all read requests go) and usually two or more secondary nodes (which are eventually consistent with the primary replica) that provide redundancy for high availability. Secondary nodes can also be used to respond to read requests and thereby improve read performance. Distributed among shards and replica sets are databases, which themselves consist of collections (think of these as similar to tables), where a collection stores BSON documents (JSON-like documents that have a binary encoding). Each document contains fields, where a given field consists of a key and a value. A value can be a basic type (e.g., string, integer, float, date, binary), another nested BSON document, or an array.
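As a rough illustration of the document shape described above, the following Python sketch writes a document as a plain dictionary. The order, its field names, and its values are hypothetical; only `_id` is a MongoDB convention. The sketch serializes to JSON for readability and does not produce the actual BSON binary encoding.

```python
import json
from datetime import datetime

# Hypothetical "orders" document, written as a Python dict.
order = {
    "_id": 1001,                                      # basic type: integer
    "customer": {"name": "Ada", "city": "Seattle"},   # nested document
    "items": [                                        # array of nested documents
        {"sku": "A-1", "qty": 2, "price": 9.5},
        {"sku": "B-7", "qty": 1, "price": 20.0},
    ],
    "placed": datetime(2012, 6, 14),                  # basic type: date
}

# Everything needed to total the order lives in one document, so no join is required.
order_total = sum(item["qty"] * item["price"] for item in order["items"])

# JSON has no date type, so the date is rendered as a string for display only.
as_json = json.dumps(order, default=str, indent=2)
```

Keeping the line items inside the order document is the document-database counterpart of the "all the necessary data in a single table" approach mentioned earlier.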

Sharding is enabled as an opt-in process for each collection, and this effectively distributes a collection among multiple "shard servers," each of which consists of a replica set. A powerful feature of MongoDB is its support for indexing on any key or multiple keys of a document stored within a collection, and you can create as many secondary indexes as desired. When your applications query a MongoDB database, they connect to a routing server, which, using shard configuration data stored in supporting config servers, then forwards the requests on to the appropriate shard server and ultimately to a replica node.
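The routing step can be pictured with a small Python sketch. This is a deliberately simplified model, assuming range-based partitioning on a string shard key; the split points, shard names, and `route` function are hypothetical and stand in for the chunk metadata that the config servers hold. The real routing process handles many chunks per shard, rebalancing, and replica selection.

```python
from bisect import bisect_right
from typing import List

# Hypothetical chunk map (an assumption for this sketch):
# shard i owns shard-key values in the range [BOUNDS[i-1], BOUNDS[i]).
BOUNDS: List[str] = ["h", "p"]                      # split points on the shard key
SHARDS: List[str] = ["shard0", "shard1", "shard2"]  # one more shard than split points


def route(shard_key: str) -> str:
    """Return the name of the shard whose key range contains shard_key."""
    return SHARDS[bisect_right(BOUNDS, shard_key)]


targets = {key: route(key) for key in ["alice", "h", "mongo", "zoe"]}
```

A query that includes the shard key can be sent to a single shard in this way, whereas a query that omits it has to be sent to every shard.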

A basic, highly available MongoDB database typically runs using three medium-size Azure Worker role instances, each hosting a replica node in the replica set. However, if you enable sharding, you will add at least three additional small instances that run the routing and config server processes -- bringing the minimum size for a sharded implementation to six Azure role instances. The replica nodes and the config server nodes each store their data as Virtual Hard Disks (VHDs) in Windows Azure Blob storage.

Querying

So how can you interact with these NoSQL databases from your Azure Worker and Web roles? MongoDB provides drivers for use from .NET, Java, JavaScript, Node.js, Ruby, and many more development platforms that are officially supported or provided by the community. By comparison, Windows Azure Table Storage offers SDKs supporting access from .NET, Java, PHP, Node.js, and REST clients.

If you just want to query the database directly, with MongoDB you can do so using either the MongoDB command line or any of a plethora of third-party tools. Microsoft does not provide tooling for querying Azure Table Storage directly, but third-party products such as Red Gate Software's Cloud Storage Studio 2 make quick work of doing this.

Scale and Capacity

One thing that sets NoSQL databases apart is their horizontal scaling architecture, which is designed to provide support for big data and high availability. Both MongoDB and Windows Azure Table Storage deliver in this regard.

Scaling model. MongoDB provides support for sharding, which has a theoretical upper limit of 1,000 shard nodes, although it has been tested only to hundreds of nodes; even at that scale, more than 100TB of storage is possible. Within a collection, individual documents are limited to 16MB, but MongoDB provides support for transparently splitting huge files among multiple documents using its GridFS feature. As you need more storage, you simply add more shard nodes, and MongoDB will take care of balancing the data across them.
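The idea behind splitting a large file across several documents can be sketched in a few lines of Python. This is an illustration of the chunking technique only, not the GridFS implementation; the 256KB chunk size is an assumption chosen for the example, and the helper names and the file identifier are hypothetical.

```python
from typing import Any, Dict, List

CHUNK_SIZE = 256 * 1024  # assumed chunk size for this sketch, in bytes


def split_into_chunks(file_id: str, data: bytes,
                      chunk_size: int = CHUNK_SIZE) -> List[Dict[str, Any]]:
    """Split data into ordered chunk documents of at most chunk_size bytes each."""
    return [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks: List[Dict[str, Any]]) -> bytes:
    """Rebuild the original bytes by concatenating chunks in sequence order."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


payload = bytes(1_000_000)  # a 1,000,000-byte file of zeros
chunks = split_into_chunks("video-1", payload)
```

Because every chunk document stays far below the 16MB document limit, the size of the file that can be stored is bounded by the capacity of the cluster rather than by the per-document limit.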

Windows Azure tables scale automatically and without your involvement. Ultimately, you are limited to 100TB per storage account, 1MB per entity, and 255 properties per entity.

High availability. High availability for your data is provided by redundant storage. In MongoDB, this is achieved by replica sets, where data is always written to a node elected as the primary node and then synchronized out to secondary nodes in an eventually consistent fashion. Your deployment architecture determines the degree of fault tolerance through the number of replica nodes you place in a single replica set (typically you have three replica nodes and can grow the set to include up to seven nodes). In Azure Tables, your data is triple-replicated with immediate failover within the same data center and is also geo-replicated between two data centers hundreds of miles apart.
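One way to see how replica-set size translates into fault tolerance is the majority rule: a MongoDB replica set needs a majority of its voting members available to elect a primary. That rule is background added here rather than something stated above, and the function name is hypothetical. A minimal Python sketch of the arithmetic:

```python
def tolerated_failures(members: int) -> int:
    """Number of members that can fail while a majority can still elect a primary."""
    if members < 1:
        raise ValueError("a replica set needs at least one member")
    majority = members // 2 + 1
    return members - majority


tolerance_by_size = {n: tolerated_failures(n) for n in (3, 4, 5, 7)}
```

Under this rule, going from three members to four adds cost without adding tolerance, which is one reason odd-sized replica sets are the usual choice.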

Retail Pricing Comparison

Naturally, any discussion comparing cloud solutions needs to take at least a cursory look at the costs. Here are some pricing scenarios, first for MongoDB and then for Azure Table Storage.

In the case of MongoDB, when you are hosting it on Azure, you are always getting a dedicated MongoDB database. That said, there are third-party companies, such as MongoHQ, that will host MongoDB for you. Excluding any bandwidth costs between your Azure role instances hosting your application and your MongoDB databases, you could see costs in line with the following for a moderate load:

A single large instance:

$345.60 (720 hours/month × $0.48/hour for a large instance) +
$31.25 ($0.125 × 250GB VHD storage) +
$250 ($0.01 per 10,000 I/O requests × 25,000 units) =
Total: 345.60 + 31.25 + 250 = $626.85 per month

A replica set (of large instances):

$1,036.80 ($345.60 × 3 replica instances) +
$93.75 ($31.25 per VHD × 3) +
$250 ($0.01 per 10,000 I/O requests × 25,000 units) =
Total: 1,036.80 + 93.75 + 250 = $1,380.55 per month
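The arithmetic behind both totals can be reproduced with a short Python sketch. The rates are the ones used in the figures above; the constant and function names are hypothetical. Note that, as in the figures, the I/O charge is applied once per deployment rather than once per replica.

```python
HOURS_PER_MONTH = 720
INSTANCE_PER_HOUR = 0.48   # hourly compute rate used in the figures above
STORAGE_PER_GB = 0.125     # monthly price per gigabyte of VHD storage
VHD_GB = 250               # VHD size per instance
IO_UNIT_PRICE = 0.01       # price per billing unit of 10,000 I/O requests
IO_UNITS = 25_000          # billing units assumed for a moderate load


def mongodb_monthly_cost(instances: int) -> float:
    """Monthly cost in dollars of self-hosting MongoDB on the given number of instances."""
    compute = instances * HOURS_PER_MONTH * INSTANCE_PER_HOUR
    storage = instances * VHD_GB * STORAGE_PER_GB
    io = IO_UNITS * IO_UNIT_PRICE  # charged once, matching the figures above
    return round(compute + storage + io, 2)


single_instance = mongodb_monthly_cost(1)
replica_set = mongodb_monthly_cost(3)
```

Compute dominates in both cases, which is why tripling the instance count for a replica set roughly doubles the bill rather than tripling it: the I/O charge stays fixed.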

Note that when you are self-hosting within Azure, you can add support for sharding/routing. Alternatively, you can host your MongoDB with a third-party hosting service such as MongoHQ. The difference with third-party hosting is that it typically includes all I/O and bandwidth costs but does not provide support for sharding or routing.

For example, MongoHQ offers both shared and dedicated database plans that are hosted on Amazon EC2. (Shared plans run from $15/month for 2GB to $299/month for 10GB; dedicated plans run from $637/month for a 250GB single large server to $1,912/month for a replica set of three large servers.) MongoHQ also supports hosting in Joyent Cloud, which is usually less expensive than the Amazon options.

Windows Azure Table Storage has a fairly simple pricing structure:

  • no bandwidth costs when transferring between Azure instances and storage located in the same region

  • $0.125 per gigabyte

  • $0.01 per 100,000 I/O requests

So, taking the same 250GB of data and moderate workload used in the previous scenarios, you would see costs in the neighborhood of (250GB × $0.125) + (2,500 units of 100,000 I/O requests × $0.01) = $31.25 + $25.00 = $56.25 per month. Naturally, this amount is significantly lower than in any of the MongoDB scenarios, self-hosted or third-party, because you have eliminated the compute costs.
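The same calculation can be written as a small Python function so that you can plug in your own data size and request volume. The rates are the ones listed above; the function and parameter names are hypothetical.

```python
def table_storage_monthly_cost(gigabytes: float, io_units: float,
                               per_gb: float = 0.125,
                               per_io_unit: float = 0.01) -> float:
    """Monthly cost in dollars: a storage charge plus a charge per unit of 100,000 I/O requests."""
    return round(gigabytes * per_gb + io_units * per_io_unit, 2)


moderate_workload = table_storage_monthly_cost(gigabytes=250, io_units=2_500)
```

Because there is no compute term, the cost grows only with the amount of data stored and the number of requests made.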

Try It Out

For those interested in exploring NoSQL, you now know that Windows Azure supports two varieties of non-relational, NoSQL databases: the Microsoft variety -- Windows Azure Table Storage -- and the open-source variety -- MongoDB. If you want to try out MongoDB, running it on Azure in a moderate-workload scenario will likely be considerably less costly than using a third-party MongoDB hosting service. To continue your exploration of MongoDB and other Azure-related topics, check out the resources listed in the Learning Path. And keep an eye out for forthcoming solutions that make it easier to deploy MongoDB to Azure!
