Understanding SQLIOSim Output
More information on the SQLIOSim IO testing tool I described in my Tool Time column.
June 26, 2007
One of the things that’s great about my job is the fact that I get to meet people who are a lot smarter than me. One such person I count among my friends is Jimmy May, a Microsoftee from the Indianapolis area. Jimmy has been using SQLIOSIM for a while, but has discovered that its reporting leaves a bit to be desired. Jimmy and I were discussing the reporting characteristics of SQLIOSIM and he thought it’d be useful to share his findings with the public. Thanks for sharing, Jimmy!
~~~~~
One thing which threw me off at first is the all-important “Target I/O” parameter. As I understand it—be clear that I’m paraphrasing here—SQLIOSim tries to hammer the system trying to get to that threshold. In fact, a well-tuned system with lots of capacity will resist high values. Note my comment below loosely comparing it to disk latency.
SQLIOSim Sources
KB: http://support.microsoft.com/kb/231619/en-us
SQL Server Storage Engine blog: http://blogs.msdn.com/sqlserverstorageengine/archive/2006/10/06/SQLIOSim-available-for-download.aspx
PSS SQL Server Engineers: http://blogs.msdn.com/psssql/archive/2007/04/10/sqliosim-outstanding-i-o-warnings.aspx
SQLIOSim Amalgamation of Undocumented Info
Characterization of Output:
Amalgamated from Bob Dorr & George Reynya
Target IO Duration (ms)
A threshold--SQLIOSim tries to hammer the system yet tries to stay under that threshold
If the average I/O duration exceeds the target I/O duration, the SQLIOSim utility throttles the number of outstanding I/O operations to decrease the load and to improve I/O completion time.
Pasted from <http://support.microsoft.com/kb/231619>
Running Average IO Duration (ms)
Low values are good
[Bob Dorr] Characterized as "Response time for the stress test"
[Bob Dorr] Target IO is 100ms; since ideal I/O is 4-8ms, anything higher is not a preferred outcome
[JWM] Though strictly speaking this isn't "disk latency" (Avg. Disk sec/[Read|Write|Transfer], acceptable values are similar:
Data files: 0-5ms or so are excellent
Log files: 0-2ms are acceptable
Warnings for I/Os exceeding 15s (à la SP4 diags)
Algorithm is designed to mitigate big swings, esp. because of one very large I/O (similar to SQL Server Checkpoint logic)
Used to control the number of outstanding I/O requests
IO request blocks
High numbers are good
The more requests a system can handle, the more I/Os are issued by the tool
Characterized as "Concurrent IOs"
Capped by MaxIO; not uncommon to have 100 or more, esp. for Enterprise Edition for which read-aheads & such are attempted
Accumulators: Reads, Scatter Reads, Writes, Gather Writes
High numbers are good
Simple Accumulators
Each time an operation is posted, the category for the specific request type is incremented
Total IO Time (ms)
Desirable numbers are relative
Useful for comparing drives, especially on the same system.
If the values for the Accumulators are similar for two drives, but one drive has significantly less Total IO Time,
then the latter is more performant because it was able to service the same I/O in less time.
Characterized as "Accumulated total duration time for the I/O in ms"
The same duration is used to track the "Running Average IO Duration (ms)"
Sum of accumulators * running average = ~Running Average IO Duration
Number of times IO throttled
Low numbers are better
Number of times a request was removed because the duration was exceeded
Additional Documentation
Additional information will be based on demand & will be added to KB article & Wiki
Architecture
SQLIOSim is built with lightweight UMS/SQLOS-like layer to better simulate SQL Server, incl. logical CPUs
Output:
Output Example #1:
Data
********** Final Summary for file S:sqliosim.mdx **********
Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 17, Number of times IO throttled = 9006, IO request blocks = 33
Reads = 75414, Scatter Reads = 88286, Writes = 3295, Gather Writes = 54845, Total IO Time (ms) = 44936446
********** Final Summary for file T:sqliosim.mdx **********
Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 0, Number of times IO throttled = 4135, IO request blocks = 106
Reads = 67091, Scatter Reads = 81249, Writes = 2501, Gather Writes = 53652, Total IO Time (ms) = 13033268
Interpretation
[George Reynya] The running average of 17ms for S: does not look all that bad as a target as it means that over the life of the test you maintained 17ms response time for the stress test. Your test shows that the number of concurrent IOs has dropped to 33 which is not so good. Especially when you compare the same basic work load against the T: drive--0ms w/ 106 I/O requests. I would look at the S drive IO path a bit to determine the differences.
[JWM] Note: The trial lasted 300s. The interpretation is consistent with physical layout: Disks were SAS DAS 73GB 15K RAID 10; S: was comprised of 6 disks, T: of 8 disks.
[JWM] Note: George's comment regarding 17ms vs. 33ms was relative; in fact, a neither value would not be acceptable to me—I’m shooting for <5ms for my data drives.
Output Example #2:
Data
Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 121, Number of times IO throttled = 1167, IO request blocks = 13
Interpretation
[George Reynya] Customer disk system exceeded Target IO Duration by 21ms for this file and had to throttle 1167 times in an attempt to keep the IO within the target. With the number of IOs we do for a file this is a bit concerning. We tried to throttle several times and they could not keep the average below the 100ms target.
Output Example #3:
Data
04/27/07 14:24:55 ********** Final Summary for file M:sqliosim.mdx **********
04/27/07 14:24:55 Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 15, Number of times IO throttled = 382, IO request blocks = 108
04/27/07 14:24:55 Reads = 187508, Scatter Reads = 259268, Writes = 15286, Gather Writes = 130498, Total IO Time (ms) = 5460993
04/27/07 14:24:55 DRIVE LEVEL: Sector size = 512, Cylinders = 163177, Media type = 12, Sectors per track = 63, Tracks per Cylinders = 255
04/27/07 14:24:55 DRIVE LEVEL: Read cache enabled = Yes, Write cache enabled = No
04/27/07 14:24:55 DRIVE LEVEL: Read count = 886653, Read time = 3333820, Write count = 616951, Write time = 6606954, Idle time = 170616, Bytes read = 74679347200, Bytes written = 74403336192, Split IO Count = 19, Storage number = 4, Storage manager name = FTDISK
04/27/07 14:24:55 ********** Final Summary for file M:sqliosim2.mdx **********
04/27/07 14:24:55 File Attributes: Compression = No, Encryption = No, Sparse = No
04/27/07 14:24:55 Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 16, Number of times IO throttled = 762, IO request blocks = 44
04/27/07 14:24:55 Reads = 183061, Scatter Reads = 256624, Writes = 15038, Gather Writes = 132071, Total IO Time (ms) = 5890665
04/27/07 14:24:55 DRIVE LEVEL: Sector size = 512, Cylinders = 163177, Media type = 12, Sectors per track = 63, Tracks per Cylinders = 255
04/27/07 14:24:55 DRIVE LEVEL: Read cache enabled = Yes, Write cache enabled = No
04/27/07 14:24:55 DRIVE LEVEL: Read count = 886717, Read time = 3333859, Write count = 616967, Write time = 6606962, Idle time = 170630, Bytes read = 74679609344, Bytes written = 74403405824, Split IO Count = 19, Storage number = 4, Storage manager name = FTDISK
04/27/07 14:24:55 ********** Final Summary for file M:sqliosim.ldx **********
04/27/07 14:24:55 File Attributes: Compression = No, Encryption = No, Sparse = No
04/27/07 14:24:55 Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 0, Number of times IO throttled = 0, IO request blocks = 8
04/27/07 14:24:55 Reads = 0, Scatter Reads = 0, Writes = 316141, Gather Writes = 0, Total IO Time (ms) = 1346919
04/27/07 14:24:55 DRIVE LEVEL: Read cache enabled = Yes, Write cache enabled = No
04/27/07 14:24:55 DRIVE LEVEL: Read count = 886653, Read time = 3333820, Write count = 616962, Write time = 6606959, Idle time = 170623, Bytes read = 74679347200, Bytes written = 74403381248, Split IO Count = 19, Storage number = 4, Storage manager name = FTDISK
Stress Test Attempt Complete
Interpretation
[JWM] This test was done against what was thought by the customer to be a well-architected disk I/O subsystem (EMC Clariion CX700, lots of 146GB 15K disks, RAID 5 (RAID5--when will they learn?)). The system is intended to support a system from which very high performance is required.
The results of this test belie this assumption. I would want Average IO Duration to be less than 5ms, no more than 10ms; however both data files have values exceeding this: 15ms & 16ms, respectively.
The IO Request Blocks aren't nearly as high as I'd expect, especially compared to the 8 disk RAID 10 drive T: in Output Example #1.
The far less respectable values for M:sqliosim2.mdx compared to the first file will prompt further investigation.
Note also that I/O was throttled 382 & 762 times, respectively.
The Average IO Duration for the log file is excellent. I don't have enough experience interpreting output for log files, so I don't know whether to be concerned about the log number of IO Request Blocks.
About the Author
You May Also Like