Virtualizing SQL Server: 8 Things to Avoid
November 20, 2013
I've been involved in virtualization projects for about 10 years now. In that time, I've had the opportunity to compile my own list of best practice items, which I share freely with customers who seek out virtualization advice. I can tell that virtualization (and cloud) efforts are on the rise simply by the number of requests I get for help, specifically for virtualizing SQL Server.
Related: Essential Tips for Virtualizing SQL Server
I like to call this list my "facepalm" special, as any one of these essentially triggers a facepalm reaction. They have helped my customers in the past and I'm certain they'll help you.
1. Build Your Own
Working in an IT department often affords many of us the opportunity to recycle spare parts for a variety of purposes. Sometimes it's a monitor, a battery, or extra RAM that gets redeployed. As such, we fall into the habit of squeezing the most life out of the hardware we've purchased. At some point, you'll look around your server room and think "Hey! I've got enough parts here to build a really big, beefy server that could be a virtual host!"
Don't be that guy.
You don't build a host (especially a PRODUCTION host) out of spare parts left over from servers nearing the end of their life. If your equipment is already behind the times now, imagine how much further behind it will be in 18 months. If you want to go cheap and use spare parts, do it for a development host and get ready to spend extra time keeping that bucket of bolts together.
Related: 5 SQL Server Virtualization Tips
If you're going virtual, you'll want to buy new hardware for your hosts. That hardware should be more powerful than the servers you currently have deployed as physical boxes. There are also licensing considerations here: it may well turn out to be cheaper to buy new hardware and have less to license overall.
2. No Performance Expectations
You cannot go virtual without having an idea of what an acceptable performance level looks like. VMware says it can get you 98 percent of the performance of a current physical implementation of SQL Server. Note that this doesn't mean you'll get better performance by moving to VMware. Oftentimes, you get better performance simply as a result of moving to newer hardware (see the previous item in this article). But if you don't know what your current performance SLAs are, you'll have no way of knowing whether you're still meeting them once you've converted to virtual. Set your expectations now so you can track them going forward.
When you're baselining for performance, don't focus on query duration as your only indicator. You'll also want to measure the logical I/O and CPU costs for your queries. If you're going to squeeze as much performance out of your hosts as possible, you'll want to be able to quickly identify the queries consuming the most CPU and memory and tune them first (see Figure 1).
Figure 1: Example of I/O and CPU Costs as shown in Ignite
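If you don't have a monitoring tool in place yet, the plan cache can give you a rough first cut at that baseline. Here's a minimal sketch using the sys.dm_exec_query_stats DMV to rank cached statements by total CPU time and logical reads; treat the numbers as relative indicators, since they reset whenever a plan leaves the cache or the instance restarts.

-- Top cached statements by CPU, with logical reads alongside.
-- total_worker_time is in microseconds, so divide by 1000 for milliseconds.
SELECT TOP (10)
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.total_logical_reads,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
            WHEN -1 THEN DATALENGTH(st.text)
            ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;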
3. Wrong Disk Configuration
When it comes to storage, you have two main options: raw device mappings (RDM) and virtual machine disk format (VMDK). Which one do you want to use, and when? The real differences are functional and architectural (I know I just scared away some DBAs because I used the word 'architecture,' but yeah, I went there).
VMware has published a list of scenarios where RDMs would be a better solution for your shop. You need to know these differences before you start deploying a solution that fails to meet some critical business requirement.
You should also understand the differences between RAID levels and which ones are best for SQL Server. Lastly, keep in mind that one of the reasons virtualization is such a desired tool these days is that shared storage allows servers to be easily moved between hosts as necessary. That's also its great weakness, as it allows a "noisy neighbor" to consume resources to the point of causing pain for the other servers using the same disk. Just something to keep in mind as you lay out your disk configuration options.
4. Thin Provisioning
Thin provisioning is one of those bad ideas that sounds good and often produces the same results as do-it-yourself dentistry. It starts out innocently enough: someone wants to save space and only allocate storage as needed. The end result is that no one keeps efficient track of what VMs have been thin provisioned and eventually, the files grow in size until they fill up all the available storage, causing all activity to stop because the disk is full.
The common recommendation for fixing the issue of full storage disks is to migrate the guests to new hosts where they'll fit. Great advice, but I'm guessing you didn't have enough room to start with; otherwise, you wouldn't have been using thin provisioning.
I help DBAs understand thin provisioning by comparing it to autogrowth for SQL Server. If you have a data file of considerable size (say, 500GB) and it's still at a default growth of 10 percent, then the next growth event will consume 50GB of disk space without warning. If you have less than 50GB of available disk space, then the drive will be filled completely. That's the same thing that happens with thin provisioning.
Most good senior DBAs take steps to minimize the chances that an autogrowth event will fill up a disk and cause headaches for end users. If you find yourself using thin provisioning in order to maximize your space, then you're also going to want to take the extra steps to manage your environment and make certain you don't have any sudden growth events.
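To turn that analogy into something you can check today, here's a minimal sketch that flags data files still using percent-based autogrowth, followed by an example fix. The database and file names in the ALTER DATABASE statement are placeholders, so substitute your own.

-- Find database files still using percent-based autogrowth.
-- size is stored in 8KB pages; multiply by 8 and divide by 1024 for MB.
SELECT DB_NAME(database_id) AS database_name,
       name                 AS logical_file_name,
       size * 8 / 1024      AS size_mb,
       growth               AS growth_percent
FROM sys.master_files
WHERE is_percent_growth = 1;

-- Example fix: switch to a predictable, fixed 512MB growth increment.
-- (SalesDB and SalesDB_Data are placeholder names.)
ALTER DATABASE SalesDB
    MODIFY FILE (NAME = SalesDB_Data, FILEGROWTH = 512MB);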
5. Over-allocation Of Memory/CPU
It's okay to want to over-allocate your memory and CPU resources. What you don't want is for your memory and CPU to become overcommitted, because that's where performance issues manifest themselves.
When I'm reviewing a customer's configuration, I tell them the line in the sand I draw is a 1.5:1 ratio as an upper-bound default for CPU resources (so 16 logical cores means you can allocate 24 vCPUs as a baseline, then adjust up or down as workload and load balancing allow).
For memory settings, I follow what is outlined in the VMware Performance Best Practices guide which states "...avoid over-allocating memory." In other words, I'm much more conservative with memory over allocation than with CPU over allocation.
It's easy to get carried away with over-allocating resources as you add more and more guests to your hosts. The next time you go to add a new guest to a host, take a few minutes to add up the memory and CPUs already allocated and compare the total against your upper bound. Those few minutes may save you hours of headaches later. (More information on Demand Based Memory Allocation for SQL Server is available on the Confio LogicalRead site.)
6. Trusting O/S Counters
When you go virtual, that means you have an additional layer of abstraction (i.e., the hypervisor) between you and the data on disk. I usually just say "there are many layers of delicious cake between you and your data." The trouble is that you need to know which layer is causing you performance issues. Is it the host? Is it the guest? As such, you need to rely on the VM performance counters in order to get a complete picture of what's happening.
DBAs are often aware that their server has been virtualized, but complain that they don't have insight into the VMware performance counters. Without that insight, DBAs have no idea where the bottleneck truly lies. One common example has to do with CPU pressure: SQL Server may display the symptoms of internal CPU pressure, and a DBA might start taking steps to correct the issue. However, that internal pressure may actually be the result of an issue with the guest, or even the host.
Not knowing where the bottleneck truly lies means the DBA wastes time trying to fix something that isn't broken. I hate wasting time, don't you? When the issue is external to the database engine, then outside the engine is where your time is spent effectively.
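From inside the guest, one rough sanity check is the ratio of signal waits (time spent waiting for a CPU after a resource wait ends) to total waits. Here's a minimal sketch; note that even a high percentage can't tell you whether the pressure comes from the workload, the vCPU allocation, or a noisy neighbor on the host, which is exactly why you still need the hypervisor's counters.

-- Signal waits as a percentage of total waits since the last restart
-- (or since the wait stats were last cleared). A consistently high
-- value hints at CPU pressure, but from inside a guest it cannot
-- tell you at which layer that pressure originates.
SELECT CAST(100.0 * SUM(signal_wait_time_ms) / SUM(wait_time_ms)
            AS DECIMAL(5, 2)) AS signal_wait_pct
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0;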
If you're still relying on standard O/S metrics for a virtualized server, then you're doing it wrong.
7. Running It All At Once
Remember how I said you want to avoid overcommitting all your resources at once? That's where load balancing and knowing your workloads are key. You cannot carve out a dozen guests as production database servers, run them all during regular business hours, and expect performance to remain at that 98 percent mark VMware suggests is attainable.
Load balancing can be more art than science at times. Despite your best intentions to place guests with staggered usage times and workloads onto the same host, some event will come along one day and disrupt the chain. When that event happens and things start to run slow, a domino effect kicks in, and it can take a few cycles before things return to normal.
You have to balance your workload; otherwise, you're going to find that your over-allocation of resources has become an overcommitment of resources. It's not enough just to balance, though. You also have to leave room for growth and usage, so that when an event happens you can weather the storm without disrupting too many other systems.
Yet, I still see customers stretching their hosts way too thin, too soon.
8. Capacity Planning
Speaking of stretching things too thin, too soon, we have the idea of proper capacity planning. In theory, it's a wonderful idea. In reality, it's often a futile effort.
As much as you want to say "the past six months we have seen a growth of x percent, and we expect to be at Y in the next six months," the reality is that tomorrow the business will decide that downloading 1,000 files that are 1GB in size shouldn't be any problem whatsoever.
In other words, business requirements will change, often without warning. Just as with load balancing, you want to allow for growth in your capacity planning. Don't stretch everything you have (CPU, memory, disk, network) to the max right out of the gate. If you do, you're asking for trouble in a very short amount of time.
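Even if the business will blindside you eventually, a baseline growth trend is still worth having. One low-effort sketch, assuming you take regular full backups, is to trend database size from the backup history SQL Server already keeps in msdb (AdventureWorks below is a placeholder database name, and keep in mind that msdb history is often pruned).

-- Average monthly full-backup size for one database: a rough growth trend.
-- backup_size is in bytes; divide twice by 1024 for MB. type = 'D' is a full backup.
SELECT YEAR(backup_start_date)           AS backup_year,
       MONTH(backup_start_date)          AS backup_month,
       AVG(backup_size / 1024 / 1024)    AS avg_backup_size_mb
FROM msdb.dbo.backupset
WHERE database_name = 'AdventureWorks'
  AND type = 'D'
GROUP BY YEAR(backup_start_date), MONTH(backup_start_date)
ORDER BY backup_year, backup_month;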
Summary
Virtualization technology has come a long way in the past 10 years. Database servers were long considered "untouchable" for any virtualization effort, but many shops are now comfortable with the idea of virtualizing them. As more and more of these efforts are completed, I see the above eight items as the pain points for many of them.
These eight items result in poor performance and instability that leave users and administrators frustrated. They're also easily avoidable with just a bit of up-front knowledge and requirements gathering.
Thomas LaRock (aka SQLRockstar) is a Microsoft Certified Master and president-elect of PASS. As Technical Evangelist at Confio Software, he helps DBAs and developers understand and resolve their biggest SQL Server performance problems, including those on VMware. LaRock's go-to software for DBAs coping with virtualization is IgniteVM.