My Query is (Not) Waiting Because of Parallelism?

I recently received a frantic email from a team member stating their query was suspended because of parallelism. Let's talk about why that's probably not true.

Tim Ford, Owner

October 18, 2014

7 Min Read
My Query is (Not) Waiting Because of Parallelism?
Wiki File:US_Navy_101104-N-6383T-508

It Always Seems to Start On a Friday

Last week, on Friday afternoon, around the time of the week when you start thinking about turning off the computers for the weekend (also known as 9 a.m.), I received a frantic message from an Application Analyst who has a fairly decent understanding of SQL Server. She attends our local SQL Server PASS Chapter meetings and so I knew that what she was seeing was something that would turn into an educatable experience for one, or both of us. In the back of my mind, I was already thinking about screen shots and code snippets for this article as well. Two birds, one stone . . . right?

The Application Analyst reported that when she looked at the sys.dm_exec_requests query, she was seeing her session_id in a suspended state with an associated CXPACKET wait type:

Certainly, if I was to see this information, I'd probably arrive at the same conclusion. There's one thing to always keep in mind though when you see CXPACKET waits, this always signifies that there is an associated task still being performed.

What is a CXPACKET Wait?

CXPACKET waits are indicators of one or more parallel task within Microsoft SQL Server waiting on active tasks to be completed before continuing on with their assigned workload. If a request to the query engine meets certain criteria for being parallelizable and its cost estimates exceed the instance's Cost Threshold for Parallelism setting and where the Max Degrees of Parallelism value is not equal to 1 (which basically signifies that you can parallelize queries so long as the maximum number of parallel processes is 1), then the request may be divided up to run across more than one thread. This work may not be divided evenly. If you're a parent, you see this all the time when it comes to dividing chores between your children. One is always going to complain that they're doing more work than their brother or sister. (Can you tell I'm writing this right after my sons come home from school?)

Since the work is most-likely not going to be evenly divided, this means that some tasks will be done before others. The waiting tasks then post out their wait time as CXPACKET while the other tasks continue to run or wait on other wait types and resources.

When a process goes parallel, you end up with something called "execution contexts"—the simplest way to think of these are as the individual chores being divided up to the various threads to run. There's also a special execution context created to coordinate all of this fun. It's often referred to as "Context Zero"  because of it's associated execution_context_id of 0 or as the "Coordinating Execution Context" or "Coordinator Thread." The interesting thing to remember: This thread will always post a CXPACKET wait

You may begin to see why on any instance of SQL Server where parallelism is an option (meeting the following criteria):

  • Serializable-only operations non-existant

  • Costs for queries exceed Cost Threshhold for Parallelism server-wide setting

  • Max Degrees of Parallelism server setting is 0 (dynamic) or is greater than 1

that CXPACKET usually is the highest wait type as a percentage of total waits. It's because any parallel-processed request is going to post at least 1 * (execution duration) of CXPACKET wait time (because of the Coordinator Thread), as well as the accumulated time for any other thread(s) waiting on active threads to process their workload.

The situation described above always reminds me of highway construction crews where you see five people standing around a hole watching one person shoveling. Think of all of those workers standing around posting CXPACKET waits (along with the crew chief who is drinking coffee in the cab of her truck), while the one guy in the hole is doing the work (or even better—possibly waiting on a new hire to go fetch a shovel and incurring a "SHOV_L" wait type). Putting that into SQL terms, what is the true wait to be concerned about here? The multiple CXPACKET waits, or the SHOV_L wait that is holding EVERYONE up? Of course, it's the non-CXPACKET wait.

So, how do we see this in SQL Server in order to focus on the SQL equivalent of a SHOV_L wait?

Focus on the Non-CXPACKET Execution Context of Parallelized Requests

The Analyst came to me because of information presented to her through sys.dm_exec_requests. While that provides information at the request level, you've now learned that in cases where requests can go parallel, you'll end up with multiple execution contexts.

We need to expand the query she was using beyond just sys.dm_exec_requests to also include additional Dynamic Management Views (DMVs) that will give granular detail at the execution_context_id level. Once we do that, you begin to see that, while a request may post out a single wait on CXPACKET in the Requests DMV, the underlying resource wait that is the true cause of the wait isn't really CXPACKET at all. You'll also see why CXPACKET always seems to be the top wait whenever you look at accumulated wait statistics for any SQL Server instance that has parallelism to any degree:

--==========================  --What is waiting?--==========================  SELECT DB_NAME(eR.database_id) AS database_name , eR.session_id , oWT.exec_context_id , oWT.blocking_exec_context_id , oWT.wait_duration_ms , oWT.wait_type , oWT.resource_descriptionFROM sys.dm_os_waiting_tasks oWT INNER JOIN sys.dm_exec_sessions eS   ON oWT.session_id = eS.session_id INNER JOIN sys.dm_exec_requests eR   ON oWT.session_id = eR.session_idWHERE eS.session_id <> @@SPID AND is_user_process = 1ORDER BY eR.session_id , oWT.exec_context_id , oWT.blocking_exec_context_id;

Taking a quick look at the first dozen-or-so results, we can see that there is actually a decent amount of CXPACKET waits accruing. This is happening because some execution contexts are still processing workloads while these execution contexts continue to wait (and post CXPACKET wait types as a result). You should also note that, while this information looks redundant, it's not. If you take a close look, you'll see the sys.dm_os_waiting_tasks DMV provides information at the exec_context_id level and that an execution context will show a block by any actively-engaged execution contexts working on the parallelized workload.

So, what is the wait we should be looking into if CXPACKET is no more than a signifier of other contraints? That information can be seen if we look at the end of this results set to those execution contexts that are not posting CXPACKET waits:

Here, you can see that under all the waiting execution contexts, we're really dealing with exclusive latch waits (LATCH_EX). I also include resource_description in the column list from sys.dm_os_waiting_tasks so you can get additional information if needed. 

In this case, if we expand any of the active records we see the following:

ACCESS_METHODS_DATASET_PARENT

This points to the active elements of this executing request waiting on getting a latch (lightweight lock) in order to do some internal work of knitting this mish-mash of individual results all back together to return to the calling client. What this can mean on a larger scale, though, is that the query driving this workload needs to be optimized to be less of a burden on the instance's buffer pool. Perhaps, there are indexes that can't be taken advantage of in the current structure of the query, or the query is too broad and could benefit from additional predicates to filter out unnecessary results earlier in the execution plan's timeline. 

What it does clearly state, is that your wait is really not CXPACKET.  It's LATCH_EX. However, what it does also say is that in all truth, parallelism is assisting in the waiting because what is probably an inefficient query is going parallel, and due to stress on the buffer pool, those parallel processes are competing on resources to knit intermediate results together in order to finish processing.

I'll still say, more often than not, a parallelized query is usually going to execute faster than as serial query of the same ilk. Keep in mind that there will always be some exceptions. Am I saying here that we need to turn off parallelism (essentially) by setting MAXDOP = 1? Of course not. I'm advocating, as I usually do, to looking at the underlying query and looking for tuning opportunities first and foremost.

Related: Parallelism in SQL Server Query Tuning

About the Author

Tim Ford

Owner, SQL Cruise

http://thesqlagentman.com/

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like