New Ways to Enable High Availability for File Shares

Windows Server 2012 File and Storage Services and SMB 3.0 change how file sharing works

John Savill

August 21, 2012


What's the coolest feature in Windows Server? My guess is that file-sharing services didn't make your top five. But that might change with Windows Server 2012. File and Storage Services combined with the new Server Message Block (SMB) 3.0 protocol (formerly known as SMB 2.2) introduce some truly great new features that completely change how file sharing works and that can be used in a highly available configuration. In this article, I'll focus on two new capabilities of file services in a failover cluster: SMB Transparent Failover and SMB Scale-Out. I'll show how you can use these capabilities together to provide a file services environment that can be used for the most demanding workloads, including hosting Microsoft SQL Server databases and Hyper-V virtual machines (VMs).

File Services in a Failover Cluster Environment

Before I focus on the new features, I want to quickly describe how file services work in a failover cluster environment, which allows highly available file servers and, more specifically, file shares. A Server 2012 failover cluster consists of as many as 64 servers (up from 16 in Windows Server 2008 R2) that have the Failover Clustering feature installed and are configured to share a common set of storage and services.

The services that are defined in a cluster can be moved between the servers (aka nodes) in the cluster. These services consist of various resources, such as IP address, network name, storage, and the actual service, such as a file server, print server, VM, Microsoft Exchange Server mailbox server, and so on. Services can be moved between nodes in the cluster in a planned situation or in an unplanned scenario, such as a server failure. In the latter case, services that ran on the failed server are automatically redistributed among the remaining nodes in the cluster.

Figure 1 shows a four-node cluster and a file server resource. The file server offers a single file share, which stores its content on an NTFS-formatted LUN. The LUN is a block of space from the shared storage, to which all the nodes in the cluster can connect. The file server, and thus the file share, is initially online on the third node of the cluster. This node also mounts the LUN, which contains the file server content. If that node fails, the file server moves to the fourth node, which then mounts the LUN so that it can offer the share content. The LUN must be mounted by whichever node is offering the corresponding file server because NTFS is a shared-nothing file system and can't be accessed concurrently by more than one node. Therefore, when a file server moves to another node, the LUN must move between nodes as well. A file server is online on only one node at a time.


Figure 1: Basic failover cluster with a service moving between the nodes 

SMB Transparent Failover

The previous example involves challenges to using a file share that is moved between nodes in the cluster in planned and unplanned scenarios. First, when a file on a file share is used by an application, handles are typically created to allow the application to access the file and potentially to lock the file so that another application can't write to it at the same time. In addition, the handle defines how data is accessed, specifically whether data can be buffered on the file server, which can help to enhance performance. With Server 2008 R2 and earlier, any handles and locks are lost when the file server moves to another node. In general, this behavior doesn't cause a huge problem for regular users accessing Microsoft Word documents. However, it would be a serious problem if the file were a database used by SQL Server.

The second challenge involves the time that is needed for a file server client to recognize that a file server is no longer available and to start taking recovery steps. TCP/IP timeout values can typically cause an interruption of about 40 seconds -- unacceptable when server applications store data on file shares. For those 40 seconds, all activity that requires file I/O to the share pauses -- an event commonly known as a brownout. Removing these challenges is vital for SMB. If server applications such as SQL Server and Hyper-V are going to use SMB file shares, they can't lose file handles or suffer 40-second pauses in I/O!

The new SMB Transparent Failover feature addresses both issues. The feature enables continuously available file shares for SMB 3.0 clients, removing the loss of handles during a failover and reducing the time needed to detect that a file server has moved to another node, thus reducing brownouts.

Keeping file shares available. SMB Transparent Failover consists of several configuration changes and new technologies. One benefit that file servers traditionally offer clients is buffering of data writes to disk. This provides faster acknowledgments to client write requests because the file server caches the write operation in its volatile memory (meaning that if the server loses power, it loses the data), tells the client that the data is written so that the client can carry on with its work, then performs the write in the most optimal way. Certain applications always open handles with this caching disabled, through the use of the FILE_FLAG_WRITE_THROUGH attribute when creating the handle, ensuring that data is always written to the actual disk before an acknowledgment is received and avoiding any volatile cache. SMB Transparent Failover sets FILE_FLAG_WRITE_THROUGH as the default for all created handles, eliminating the use of the volatile memory cache. There might be some slight performance implications because the cache is no longer used, but the assurance of data integrity is a good trade for the possibility of a slight performance degradation.
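To see what write-through looks like from an application's point of view, here is a minimal PowerShell sketch that opens a file over SMB with the .NET equivalent of FILE_FLAG_WRITE_THROUGH. The UNC path and file name are hypothetical (the share name matches the lab example later in this article); this is an illustration, not a required configuration step.

# Minimal sketch (hypothetical path): open a file over SMB with write-through
# semantics via .NET, the managed equivalent of passing FILE_FLAG_WRITE_THROUGH
# to CreateFile.
$path = '\\WIN8FSTRAD\NonCSVData\test.dat'
$fs = New-Object -TypeName System.IO.FileStream -ArgumentList @(
    $path,
    [System.IO.FileMode]::OpenOrCreate,
    [System.IO.FileAccess]::ReadWrite,
    [System.IO.FileShare]::None,
    4096,                                   # buffer size in bytes
    [System.IO.FileOptions]::WriteThrough)  # acknowledge only after data reaches the disk
try {
    $bytes = [System.Text.Encoding]::UTF8.GetBytes('hello, world')
    $fs.Write($bytes, 0, $bytes.Length)     # write completes only when the data is on disk
}
finally {
    $fs.Dispose()
}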

The second change that SMB Transparent Failover makes is how the OS manages file handles. File handles typically are stored in the memory of the file server. However, if a node fails and the file server moves to another node in the cluster, the handles are lost -- bad news for the application using them. In addition to storing the handle state in memory, SMB Transparent Failover backs up the handle state in the Resume Key Database, in the System Volume Information folder of the disk on which the referenced file resides. Storing the handle information on disk maintains the handle state when the file server moves between nodes in the cluster. However, because disk access is orders of magnitude slower than memory access, heavy metadata-generating workloads such as creating, deleting, renaming, extending, opening, and closing files cause additional I/O in the Resume Key Database, taking available I/O away from normal disk usage. But again, this tradeoff is acceptable to ensure that handles are maintained when moving file servers between nodes. (See the sidebar "What About Performance?" for my rationale on this exchange.)

Reducing brownouts. To meet the second challenge and reduce the time that an SMB client takes to realize that its TCP connection has died, the cluster must be proactive. The cluster must notify SMB clients that connect to a cluster-hosted share whenever the hosting file server moves to another node. That way, the client can reconnect more quickly. Enter the new SMB Witness capability, which operates something like this:

SMB Client:"I want to connect to this share on your ServerA."

SMB ServerA:"OK, you are connected. This share is hosted on a cluster; let your SMB Witness process know."

SMB Client Witness:"Great! Tell me about all the nodes in the cluster."

SMB ServerA:"Here is a list of all the nodes in the cluster: ServerA, ServerB, ServerC . . ."

SMB Client Witness:"Hey, ServerB. I am connecting to this share with this IP address on ServerA. I want to register with you so that you can tell me if something happens toServerA or if the file server moves."

SMB ServerB:"Sure, I'll let you know."

After this exchange, if anything happens to that file server in the cluster, the SMB client is notified proactively via its SMB Witness process and can reconnect far more quickly than TCP/IP timeouts would allow. The new time to detect and react to a failure or file server move is likely in the range of 5 to 7 seconds instead of 40 seconds.
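If you want to confirm from the client side that an SMB 3.0 session was negotiated (and therefore that Transparent Failover and the Witness service apply), a quick check along these lines, run on the Windows 8 or Server 2012 client, shows the dialect per connection. This is just an illustrative query, not a configuration step:

# Run on the SMB client: list current SMB connections and the negotiated dialect.
# A Dialect of 3.00 indicates an SMB 3.0 session.
Get-SmbConnection | Select-Object ServerName, ShareName, Dialect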

To enable SMB Transparent Failover, you don't need to do a thing. When you use the Failover Cluster Manager, Server Manager, or Windows PowerShell to create a file share on a Server 2012 cluster file server, SMB Transparent Failover is enabled by default on that share. (Note that this isn't the case when you create the share by using Explorer or the Net Share command, neither of which understands SMB Transparent Failover.) Windows 8 or Server 2012 clients, which are SMB 3.0-compatible, will then use the SMB Witness capability and will open sessions that use write-through handles.
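As an illustration, here's a minimal sketch of creating such a share from PowerShell on a clustered file server; the share name, path, scope name, and account are hypothetical, and -ContinuouslyAvailable is shown explicitly even though it defaults to $true on a Server 2012 cluster file server:

# Minimal sketch (hypothetical names): create a continuously available share.
# Run on a cluster node; the scope name is the clustered file server's network name.
New-SmbShare -Name Data -Path E:\Shares\Data -ScopeName WIN8FSTRAD `
    -ContinuouslyAvailable $true -FullAccess 'CONTOSO\File Users'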

You can use PowerShell to confirm that this process is happening. In my lab, I have two nodes in a cluster with a file server resource and a share. I connected from my client machine, and from an elevated PowerShell window I executed the following command on a node in the cluster:

PS C:\> get-smbwitnessclient | select clientname, fileservernodename, witnessnodename

clientname  fileservernodename witnessnodename
----------  ------------------ ---------------
savdalwks08 WIN8FC01           WIN8FC02


As you can see, the output shows the name of my client computer (savdalwks08), the file server to which the client is connected (Win8FC01), and the node with which it has registered for notification (the witness, Win8FC02). (Another option is to use the Get-SmbOpenFile PowerShell cmdlet and look at the ContinuouslyAvailable property.)
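For example, a per-handle check like the following, run on the cluster node that hosts the file server, shows whether each open file was opened as continuously available:

# Run on the file server node: list open files and whether each handle is
# continuously available (i.e., protected by SMB Transparent Failover).
Get-SmbOpenFile | Select-Object ClientComputerName, Path, ContinuouslyAvailable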

To view a list of all the administrator-created shares and to determine whether they are configured for continuous availability, use the following PowerShell code:

PS C:\> Get-SmbShare | Where {$_.Scoped -eq "true" -and $_.Special -ne "True"} | Sort ClusterType | Format-Table Name, ScopeName, ClusterType, ContinuouslyAvailable, Path

Name       ScopeName   ClusterType ContinuouslyAvailable Path
----       ---------   ----------- --------------------- ----
NonCSVData WIN8FSTRAD  Traditional True                  E:\Shares\NonCSVData
DataCSV    WIN8FSSCOUT ScaleOut    True                  C:\ClusterStorage\Vo...


SMB Scale-Out

Using file servers in a cluster hasn't changed fundamentally since its introduction. Only one node in a cluster can mount and host shares for a particular NTFS-formatted LUN at any one time. This single-node offering of services can limit scalability and introduce delays because LUNs must be dismounted, moved, and mounted when the file server resource moves. This necessity has led storage and file services architects to make some sub-optimal design decisions when planning their clusters, to avoid nodes sitting idle.

Consider an organization that wants to share one NTFS volume but requires the share to be highly available. This scenario requires at least two hosts in a cluster, but only one host can actually offer the share. To avoid this active/passive situation in which one host does nothing, the storage administrators divide the LUN into two, create two NTFS volumes (one on each LUN), then create two file servers in the cluster, each with its own share. This setup allows each node to offer one share and to host the other node's share during a failure. This way, both hosts are working -- but the storage is now divided in ways the organization might not want. In addition, if you don't divide the content correctly, one share might get more traffic than the other, causing an imbalance and potentially forcing you to move data around. And this is with just two nodes. Now imagine that you have four nodes, as Figure 2 shows, or eight nodes; that's a lot of separate LUNs, NTFS volumes, and shares just to keep all the nodes in the cluster busy.


Figure 2: Required compromise with traditional clustered file servers 

The root of the problem is that NTFS volumes don't share and can't be used by more than one node simultaneously. This issue was partially solved in Server 2008 R2, which introduced Cluster Shared Volumes (CSVs). I wrote about CSVs in "Introduction to Cluster Shared Volumes," so I'm not going to discuss them in detail here. Basically, CSV enables a single NTFS-formatted LUN to be written to and read from all nodes in the cluster simultaneously, through some clever behind-the-scenes mechanics. CSVs in Server 2008 R2 were supported only for the storage of Hyper-V VMs running on the Hyper-V hosts in the cluster that contained the CSVs.

Server 2012 expands the use of CSV to a new type of cluster file server, namely the new SMB Scale-Out file server. The file server type -- Scale-Out or Traditional (i.e., the existing file server model) -- is selected at the time of creation. When you create a new file server of the Scale-Out type, you must create the shares on folders that are stored on CSV volumes. In Server 2012, NTFS volumes that have been CSV-enabled show as file system type CSVFS instead of NTFS. In reality, the file system is still NTFS, but the change in file-system labeling makes it easy to distinguish between volumes on disks that are CSV-enabled (i.e., CSVFS) and those that are not (i.e., NTFS). Remember that a CSV is available to all nodes in the cluster simultaneously, so this created share can now be offered by all the nodes in the cluster at the same time, and all the nodes can get to the content. When creating a Scale-Out file server, you don't need to specify an IP address. The IP addresses for the interfaces that are configured for client access on the cluster nodes are used; all nodes offer the service.
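Here's a minimal sketch of creating a Scale-Out file server and a share from PowerShell, assuming an existing cluster with a CSV already configured. The cluster name, CSV path, and account are hypothetical; WIN8FSSCOUT matches the example output shown earlier:

# Minimal sketch (hypothetical cluster name, path, and account): add the
# Scale-Out File Server role to an existing cluster, then create a share on a
# folder that lives on a Cluster Shared Volume.
Add-ClusterScaleOutFileServerRole -Name WIN8FSSCOUT -Cluster WIN8CLUS
New-SmbShare -Name DataCSV -Path C:\ClusterStorage\Volume1\Shares\DataCSV `
    -ScopeName WIN8FSSCOUT -FullAccess 'CONTOSO\Hyper-V Hosts'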

Another great feature is the ability to use SMB Transparent Failover to move a client from one node that offers a Scale-Out file server to another node, without any access interruption. Suppose, for example, that you want to place a node in maintenance mode. The following command moves a specific SMB client from one node to another; you can easily use PowerShell to execute this command for all clients that use a specific node in the cluster.

First, I determine which server an SMB client is using (we used this command previously):

PS C:\> get-smbwitnessclient | select clientname, fileservernodename, witnessnodename

clientname  fileservernodename witnessnodename
----------  ------------------ ---------------
savdalwks08 WIN8FC01           WIN8FC02

Now, I move that client to my other server:

PS C:\> Move-SmbWitnessClient -ClientName savdalwks08 -DestinationNode Win8FC02


To verify that the move happened, I rerun my command. I see that the client has moved to the other node in my cluster, and the witness is now my original server. (The file server and the witness can't be the same server; that wouldn't be useful!)

PS C:\> get-smbwitnessclient | select clientname, fileservernodename, witnessnodename

clientname  fileservernodename witnessnodename
----------  ------------------ ---------------
savdalwks08 WIN8FC02           WIN8FC01
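
As I mentioned a moment ago, you can extend this to every client on a node, which is handy before placing that node into maintenance. A hedged sketch, using the node names from my lab:

# Sketch: move every SMB client currently served by WIN8FC01 over to WIN8FC02,
# for example before taking WIN8FC01 down for maintenance.
Get-SmbWitnessClient | Where-Object { $_.FileServerNodeName -eq 'WIN8FC01' } |
    ForEach-Object { Move-SmbWitnessClient -ClientName $_.ClientName -DestinationNode WIN8FC02 }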


What does all this mean? Refer again to Figure 2. You can now create that single big LUN that you wanted, with one NTFS volume that all four nodes share simultaneously. (Microsoft supports as many as eight nodes offering one SMB Scale-Out file server.) This capability simplifies management, eliminating the need to associate numerous separate LUNs, shares, and IP addresses with each file server. So why does the traditional file server type still exist? Why would you ever use it?

As I mentioned previously, CSV performs some clever mechanics to enable one NTFS volume to be written to and read from all nodes in the cluster simultaneously. One of the cleverest parts is handling metadata writes to NTFS volumes, which is the biggest problem with multiple computers concurrently using one NTFS volume. Having two servers writing metadata at the same time is likely to cause corruption. CSV solves this problem by having a coordinator node for each CSV disk. This node mounts the disk locally and performs all metadata activity on behalf of the other cluster nodes, which send metadata writes over the cluster network to the coordinator. (These other nodes can still directly access the disk for standard data I/O.) This metadata redirection over the network can cause latency in operations. That's why the SMB Scale-Out file server is targeted at key application server workloads such as SQL Server and Hyper-V, which are very light on metadata activity and focus on data I/O. When you contrast the server application I/O characteristics with those of a typical information worker using Microsoft Office documents, the I/O for an information worker is typically 60 to 70 percent metadata operations. That's a lot of data being redirected. I'm not saying that using an SMB Scale-Out file server in such a scenario won't work or will perform badly if architected correctly, but it's certainly something to consider. At this time, the Scale-Out file server is recommended only for server applications like SQL Server and Hyper-V.
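If you want to observe this from PowerShell, Server 2012 adds a cmdlet that reports, per node, whether CSV I/O is going straight to the disk or is being redirected over the cluster network. The following read-only check is a sketch of that, run on any cluster node:

# Run on any cluster node: show each CSV's access mode as seen from each node.
# "Direct" means the node performs data I/O straight against the disk; the
# redirected states mean I/O travels over the cluster network to the coordinator.
Get-ClusterSharedVolumeState | Select-Object Name, Node, StateInfo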

There is another reason that the Scale-Out file server is unsuitable for storing Office documents and other user data. The Windows file server platform is used in many situations because of features such as quotas, file screening, file classification, BranchCache, and (in Server 2012) data de-duplication. None of these features are available on a Scale-Out file server. Server applications don't care about such features.

Closing Thoughts

When you combine the Scale-Out file server with the SMB Transparent Failover feature (which works for traditional and SMB Scale-Out file servers), you get a file services platform that allows multiple servers to serve the same share with the same content. The result is great scalability for clients and a resiliency that was previously impossible. Although Scale-Out focuses mainly on SQL Server and Hyper-V workloads, expect more types of workloads to be tested and recommended over time, offering customers many new options in their storage and overall IT architectures.


Sidebar: What About Performance?

I've talked about how the changes that SMB Transparent Failover makes could introduce a slight performance penalty because of the bypassing of the write cache and the increased I/O from metadata-heavy operations. This penalty might sound fairly off-putting. But in reality, many key server applications that would benefit from this technology, such as Microsoft SQL Server and Hyper-V, specify the use of FILE_FLAG_WRITE_THROUGH to bypass the write cache anyway. Also, such applications perform very few metadata operations. Rather, they read and write the data of the file, so they won't be much affected by the disk-based Resume Key Database. These changes are more likely to have an effect on user workloads, such as opening Microsoft Office documents. Such workloads aren't the focus of this feature.
