QUEUES (Perfmon Counters)
The Queues aspect of the Waits and Queues approach to performance analysis refers to PERFMON counters and other sources of information that measures resource usage. PERFMON counters provide a view of system performance from a resource perspective.
PERFMON Counters, correlation, possible conclusions and actions
Resource Component Disk
Perfmon Object: Physical Disk
Counters to Monitor
|
Description
|
Possible conclusions / actions
|
Current Disk Queue Length
|
Sustained high queues mean your IO subsystem is not keeping up.
|
Confirm IO issues with disk sec/read and disk sec/write.
Waitstats correlation:
1. IO_COMPLETION
2. ASYNC_IO_COMPLETION
3. WRITELOG
4. LOGMGR
|
Average Disk Queue Length
|
Average of disk queues over time. If this number is consistently high, disk sec/read and disk sec/write is also high indicating IO bandwidth issues.
|
Confirm IO issues with disk sec/read and disk sec/write. Waitstats correlation:
1. IO_COMPLETION
2. ASYNC_IO_COMPLETION
3. WRITELOG
4. LOGMGR
|
Disk Sec/Read
|
Under typical circumstances, reads should take 4-8 ms (confirm with hardware vendor for exact read time). Sustained queues skew this number higher because disk sec/read factors in the effects of disk queues. High numbers mean your IO subsystem is not keeping up with requests
Check individual drive performance if there are multiple drives. If it is a broad problem affecting all drives, the IO subsystem is not keeping up. More drives could be useful. If there is ONE very hot drive, examine disk activity such as location of paging file, database, transaction log, and other read/write activity.
|
If disk sec/read > normal read time (ask vendor for typical read time) you can consider the following options:
1. Resolve IO bottleneck by adding more drives; spreading IO across new drives if it is possible. For example, move files such as database, transaction log, other application files that are being written to or read from.
2. Check for memory pressure, see memory component.
3. Check for appropriate indexing of SQL tables. Correct indexing can save IO. Check SQL query plans looking for scans and sorts, and so on. Showplan identifies sorting steps.
4. Run SQL Profiler to identify Transact-SQL statements doing scans. In Profiler, select the scans event class and scan stopped event. Click the data column tab and add object Id. Run the trace. Save the profiler trace to a trace table, and then search for the scans event. Alternatively, you can search for high duration, reads, and writes.
Waitstats correlation:
1. IO_COMPLETION
2. ASYNC_IO_COMPLETION
3. WRITELOG
4. LOGMGR
|
Disk Sec/Write
|
Under typical circumstances, reads should take 4-8 ms (confirm with hardware vendor). Sustained queues skew this disk sec/write higher because this counter factors in the effects of disk queues. High numbers mean your IO subsystem is not keeping up with requests. In some SAN environments, writes can be as low as 1-2 ms.
|
See disk sec/read.
High performance (significant insert, update, and delete activity) requires the transaction log to be on a separate drive from the database.
Waitstats correlation:
1. IO_COMPLETION
2. ASYNC_IO_COMPLETION
3. WRITELOG
4. LOGMGR
|
Resource Component: Memory / Cache
Perfmon Object: Memory
Counters to Monitor
|
Description
|
Possible conclusions / actions
|
Page Faults/sec
|
Includes both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory.) Most processors can handle a large numbers of soft faults without significant consequences. However, hard faults that require disk access can cause significant delays. See the disk component for more information.
|
Check for memory pressure (see SQL Server buffer manager), low data page hit rates, and memory grants pending.
|
Pages/sec
|
Number of pages read from or written to disk to resolve hard page faults.
These are hard faults that require physical IO to fetch the page.
|
Compare with Page Faults/sec.
Check for memory pressure (see SQL Server buffer manager), low data page hit rates, and memory grants pending.
|
Resource Component: CPU
Perfmon Object: Processor
Counters to Monitor
|
Description
|
Possible conclusions / actions
|
% User Time
|
Percentage of time SQL Server runs in User mode. Privileged mode is designed for operating system components and enables direct access to hardware and all memory.
|
Make sure % user time >70%. Check task manager (taskmgr.exe) to see how much CPU sqlserver.exe is getting. If user time <70%, check on %Processor Time and % Privileged activity.
|
% Privileged Time
|
The operating system switches application threads to privileged mode to access operating system services
|
Should be <20%. Check task manager (taskmgr.exe) to see how much CPU sqlserver.exe is getting. If % privileged time >20%, check on % Processor Time and % User Time.
|
% Processor Time
|
Percentage of time the CPU is executing over sample interval.
|
Common uses of CPU resources:
1. Compilation and recompilation use CPU resources. Plan reuse and parameterization minimizes CPU consumption because of compilation. For more information about compilation, recompilation, parameterization, and plan reuse, see http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
Plan reuse is where usecounts are > 1
Select cacheobjtype, objtype, usecounts, or refcounts from sys.dm_exec_cached_plans and order by usecounts
Matches to PERFMON counters:
1. System: Processor Queue length
2. SQL Statistics: Compilations/sec
3. SQL Statistics: Re-Compilations/sec
4. SQL Statistics: Requests/sec
If both of the following are true, you are CPU bound:
1. Proc time >85% on average
2. Context switches (see system object) >20K / sec
Light weight pooling can provide a 15% boost. Light weight pooling (also known as fiber mode) divides a thread into 10 fibers. Overhead per fiber is less than that of individual threads.
|
% Idle Time
|
Percentage of time CPU is idle over sample interval
|
|
Interrupts/sec
|
Interrupts/sec is the average rate, in incidents per second, at which the processor received and serviced hardware interrupts.
|
Correlate with other perfmon counters such as IO, Network.
|
Resource Component Thread
Perfmon Object: Process
Counters to Monitor
|
Description
|
Possible conclusions / actions
|
Page Faults/sec
|
This counter includes both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory.) Most processors can handle large numbers of soft faults without significant consequences. However, hard faults, which require disk access, can cause significant delays. See the disk component for more information.
|
Check for memory pressure (see SQL Server buffer manager), low data page hit rates, and memory grants pending, page life expectancy.
|
Resource Component: System
Perfmon Object: System
Counters to Monitor
|
Description
|
Possible conclusions / actions
|
Processor Queue Length
|
|
Number of threads waiting to be scheduled for CPU time. Some common uses of CPU resources that can be avoided:
1. Unnecessary compilation and recompilation. Parameterization and plan reuse would reduce CPU consumption. See http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
2. memory pressure
3. lack of appropriate indexing
|
Context Switches/sec
|
|
|
Resource Component: SQL Server
Perfmon Object: SQLServer:Access Method
Counters to Monitor
|
Description
|
Possible conclusions / actions
|
Forwarded Records/sec
|
Number of records fetched through forwarded record pointers.
Tables without a clustered index. If you start with a short row, and update the row creating a wider row, the row might not fit on the data page. A pointer is put in its location and the row is forwarded to another page.
|
Look at code to determine where the short row is inserted followed by an update.
Can be avoided by:
1. Using Default values (so that an update does not result in a longer row that is the root cause of forwarded records).
2. Using Char instead of varchar (fixes length so that an update does not result in a longer row
|
Full Scan/sec
|
The entire table or index is scanned. Scans can cause excessive IO if an index would be useful.
|
SQL Profiler can be used to identify which Transact-SQL statements do scan. Select the scans event class and events scan:started and scan:completed. Include the object Id data column. Save the profiler trace to a trace table, and then search for the scans event.
The scan:completed event provides associated IO so that you can also search for high reads, writes, and duration.
|
Index Searches/
sec
|
Number of index searches. Index searches are used to start range scans, single index record fetches, and to reposition in an index.
|
Compare to Full Scan/sec. You want to see high values for index searches.
|
Page Splits/sec
|
Number of page splits occurring as the result of index pages overflowing. Typically associated with leaf pages of clustered indexes and non-clustered indexes.
|
Page splits are extra IO overhead that results from random inserts.
When there is no room on a data page, and the row must be inserted on the page (because of index order), SQL splits the page by inserting a new row and moving the rests of the rows to a new page.
Compare to Disk: page sec/write. If this is very high, you might want to reorganize the indexes on the tables causing the page splits, to reduce page splits temporarily. Fillfactor can be used to leave space for inserts.
|
Resource Component: SQL Server
Perfmon Object: SQLServer:Memory Mgr
Counters to Monitor
|
Description
|
Possible conclusions / actions
|
Memory Grants Pending
|
Memory resources are required for each user request. If sufficient memory is not available, the user waits until there is adequate memory for the query to run.
|
Compare with Memory grants outstanding. If grants pending increases, you can do the following:
1. add more memory to SQL Server
2. add more physical memory to the box.
3. check for memory pressure. See and correct indexing if you experience “out of memory” conditions.
Correlate to Waittype
1. RESOURCE_SEMAPHORE
|
Resource Component: SQL Server
Perfmon Object: SQLServer:Buffer Manager
Counters to Monitor
|
Description
|
Possible conclusions / actions
|
Buffer cache hit ratio
|
Percentage of time that the pages requested are already in cache
|
Check for memory pressure. See Checkpoint pages/sec, Lazy writes/sec and Page life expectancy.
|
Checkpoint pages/sec
|
Pages written to disk during the checkpoint process. This frees the SQL cache
|
Memory pressure is indicated if this counter is high together with high lazy writes/sec and low page life expectancy (<300 seconds)
|
Lazy writes/sec
|
Pages written to disk by the lazy writer, This frees the SQL cache
|
Memory pressure is indicated if this counter is high together with high lazy writes/sec and low page life expectancy (<300 seconds)
|
Page life expectancy
|
Time in seconds the data pages, on average, stay in SQL cache. Low page life <300 can indicate (1) SQL cache is cold, (2) memory problems or (3) missing indexes. Correlate to Lazy writes/sec and Checkpoint pages/sec
|
Memory pressure is indicated if this counter is low (<300) together with high lazy writes/sec and checkpoint pages/sec.
Check for missing indexes and bad query plans (scans in profiler)
Check for high page faults/sec.
|
Readahead pages/sec
|
If there are memory shortages, a cold cache, or low hit rates, SQL Server might use worker threads to readahead (bring in pages beforehand) to raise hit rates. By itself readahead is not a problem unless users are flushing each other’s pages consistently.
|
Correlate to counters for SQL buffer mgr: buffer cache hit ratio, page life expectancy, lazy writes, and checkpoint pages for memory pressure.
Check for appropriate indexing and bad query plans (scans in profiler)
|
Resource Component: SQL Server
Perfmon Object: SQLServer:Plan Cache
Counters to Monitor
|
Description
|
Possible conclusions / actions
|
Cache Hit Ratio
|
Percentage of time that the procedure plan pages are already in cache. For example,. procedure cache hits. That is, how frequently a compiled procedure is found in the procedure cache (therefore avoiding the need to recompile).
|
Check for memory pressure. See Checkpoint pages/sec, Lazy writes/sec and Page life expectancy.
See SQL Profiler: Stored Procedure: CacheHit, CacheMiss, and CacheInsert to see what stored procedure query plans are already in cache (Hit) vs. those not in cache (Miss,Insert)
Check for appropriate plan reuse. See section “Query Plan Reuse”. It is frequently desirable for query plans to be reused for similar SQL although not always.
See SQL Statistics: Compilations/sec for discussion of plan reuse.
If there is memory pressure, plans are discarded to make room for other data or procedure plans.
|
Resource Component: SQL Server
Perfmon Object: SQLServer:Databases
Counters to Monitor
|
Description
|
Possible conclusions / actions
|
Log Flush Wait Time
|
Waiting for transaction log writes (ms)
|
See disk performance counters
Check transaction log file sys.dm_io_virtual_file_stats(dbid, file#) for Io_stall (waits in ms)
|
Log Flush Waits/sec
|
This is the number of commits waiting on a log flush.
|
See disk performance counters, sys.dm_io_virtual_file_stats for Io_stall.
|
Log Growths
|
Microsoft Windows® automatically extends the transaction log to accommodate insert, update, and delete activity.
|
Generally, growths of the transaction log temporarily freezes writes to the transaction log when Windows extends the transaction log file. Check to see that the extend increment is large enough. If not, performance will decrease when the log is extending frequently.
|
Transactions /sec
|
SQL Server transactions per second
|
|
Resource Component: SQL Server
Perfmon Object: SQLServer:General Statistics
Counters to Monitor
|
Description
|
Possible conclusions / actions
|
Logins/sec
|
Number of logins per second
|
User connections
|
Logout/sec
|
Number of logouts per second
|
|
User connections
|
Number of user connections
|
|
Resource Component: SQL Server
Perfmon Object: SQLServer:Latches
Counters to Monitor
|
Description
|
Possible conclusions / actions
|
Average Latch Wait Time(ms)
|
Latches are short term light weight synchronization object. Latches are not held for the duration of a transaction. Typical latching operations during row transfers to memory, controlling modifications to row offset table, and so on.
|
If high, check PERFMON DISK and MEMORY objects for
1. IO bottlenecks
2. memory pressure
Typically reduced with more memory or IO capacity
|
Latch Waits/sec
|
See Average Latch Wait Time(ms)
|
|
Total Latch Wait Time(ms)
|
Short term light weight synchronization object. Latches are not held for the duration of a transaction. Typical latching operations during row transfers to memory, controlling modifications to row offset table, and so on.
|
If high, check PERFMON DISK and MEMORY objects for
1. IO bottlenecks
2. memory pressure
Typically reduced with more memory or IO capacity
|
Resource Component: SQL Server
Perfmon Object: SQLServer:Locks
Counters to Monitor
|
Description
|
Possible conclusions / actions
|
Average Wait Time(ms)
|
Transactions should be as short as possible to limit the blocking of other users.
|
Hint: check for memory pressure, which causes more physical IO, therefore prolonging the duration of transactions and locks.
|
Lock Wait Time(ms)
|
Transactions should be as short as possible to limit the blocking of other users.
|
Hint: check for memory pressure, which causes more physical IO, therefore prolonging the duration of transactions and locks
|
Lock Waits/sec
|
Transactions should be as short as possible to limit the blocking of other users.
|
Hint: check for memory pressure, which causes more physical IO, therefore prolonging the duration of transactions and locks
|
Resource Component: SQL Server
Perfmon Object: SQLServer:SQL Statistics
Counters to Monitor / Description
|
Possible conclusions / actions
|
SQL Compilations/sec
|
Before a SQL statement can be executed, the query optimizer must have created a query plan. A query plan consists of steps that return the results for a given SQL statement.
For more information about SQL Server 2005 Optimizer and Statistics refer to http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx
Compilations/sec Includes both initial compiles and subsequent recompiles. Compilation and recompilation are CPU intensive operations.
Unnecessary compilation can sometimes be avoided with query plan reuse. Check for appropriate plan reuse. See section “Query Plan Reuse”. It is frequently desirable for query plans to be reused for similar SQL although not always.
Parameterization is important for plan reuse. In addition, some types of recompilation can be avoided. See the SQL Server 2005 recompilation paper for more information: http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
To obtain initial compilations only, you must subtract recompilations/sec from compilations/sec.
Compare to batch requests/sec to see extent of compilation.
|
SQL Re-Compilations/sec
|
Only contains recompiles. SQL Profiler can provide information about what processors are recompiling, what statement, and the reason for recompilation. In Profiler, select the stored procedure event class and SP:recompilation event, and include the data column eventsubclass. Review the trace searching for eventsubclass values 1 through 6. The previous statements caused the recompilation. For more information about recompilation, see http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
|
Batch Requests/sec
|
Total batch requests should be compared with compilations/sec
|
Auto-Param Attempts/sec
|
Auto-param tries should be compared to failed auto-params/sec. Appropriate parameterization is important for plan reuse. In some cases, Sp_executeSQL could be used with ad hoc SQL. For more information about recompilation, see http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
|
Failed Auto-Params/sec
|
Auto-param tries should be compared to failed auto-params/sec. Appropriate parameterization is important for plan reuse. In some cases, Sp_executeSQL could be used with ad hoc SQL. For more information about recompilation, see http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
|
Interesting PERFMON Ratios and comparisons
Some counters in PERFMON have to be compared to other counters to draw the appropriate conclusion Although the following ratios and comparisons are not exhaustive, they nevertheless point you in the right direction.
-
Batch requests/sec versus SQL Compilations/sec. Plan reuse is desirable for OLTP workloads because of its tendency toward high volumes of identical transactions. See section titled "Plan reuse and Performance Counters". The worst case is when compilations are very high compared with batch requests as this could be a case of memory pressure where query plans are discarded quickly to make room for other activity. Another possibility is lack of parameterization, which is important for ad hoc SQL plan reuse. Parameterization is where variables are used instead of literal values. sp_executeSQL can be used to parameterize ad hoc SQL.
Perfmon counters are SQLServer:SQL Statistcs:Batch Requests/sec and SQLServer:SQL Statistics:SQL Compilations/sec
For more information about SQL Server 2005 Optimizer and Statistics see http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx
-
SQL Compilations/sec versus SQL Re-Compilations/sec. SQL Compilations/sec include all compilations (initial compiles and recompiles) but SQL Re-Compilations/sec only includes recompiles (excludes initial compiles). See section titled "Plan reuse and Performance Counters". If initial compiles are low (SQL Compilations – SQL Re-Compilations) compared to SQL recompilations, there is a probable recompilation problem. To find the actual statements that are recompiled, see section titled “Query Plans and DMVs".
Perfmon counters are SQLServer:SQL Statistcs: SQL Compilations/sec and SQLServer:SQL Statistics:SQL Re-Compilations/sec.
For more information, see the SQL Server 2005 recompilation paper: http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
-
Kernel CPU versus User CPU. If (Kernel CPU/ User CPU) > .25, this can indicate a network, disk driver, or hardware issue. Network and Disk IO is serviced in kernel mode. SQL is serviced in user mode. Look at Task Manager.
Perfmon counters are Processor:%Processor Time, Processor:%User time, and Processor:%Interrupt time.
-
Context switches per second. This value should remain under 20000. Values greater than this threshold indicate too much context switching. Values greater than 50000 indicate that the server might be spending more CPU clock cycles context switching threads instead of actively running given threads. This kind of scenario was visible on Microsoft Windows 2000 Server. In Windows Server® 2003 it is less likely but can still occur.
-
Disk Queue Length versus Disk sec/Transfer. As disk queue length increases, so does disk sec/transfer.
Performance monitor counters are PhysicalDisk:Avg Disk Queue Length and PhysicalDisk:Avg Disk sec/Transfer.
-
Page life expectancy, checkpoint pages/sec, lazy writes/sec comparison. Memory pressure is indicated with low page life expectancy, and high checkpoint pages and lazy writes/sec. Memory pressure, which adversely affects performance, can be lessened one or more of the following
-
Adding more memory to the box
-
Increasing SQL memory
-
Avoiding table and index scans with appropriate indexing
-
SQL buffer cache hit ratio. If this is consistently under 90% it indicates that the buffer cache is getting flushed.
-
Signal waits and the runnable queue comparison. Basics of execution model (simplified) is as follows.
-
If a session id is running and must lock a resource that is unavailable at the time, it moves to the resource wait list (time T0).
-
A signal indicates resource available, SPID moves to runnable queue at time T1.
-
SPID waits running status until T2 as CPU goes through the runnable queue in order of arrival
-
The resource wait time is the actual time waiting for the resource to be available, Time T0 to T1.
-
The signal wait time spent in the runnable queue. It starts from the time the resource is available (T1) to the point in which the process is running again at T2. Therefore, signal waits are T2-T1.
The larger the value T 1 – T 0 = means that the particular resource availability is tending to be more limited. If it is a lock that is waiting then blocking could likely be occurring. If it is IO that is waiting then the disk subsystem is likely to be bottlenecking.
T2 – T1 indicates CPU pressure. This indicates that as the value increases the time that is spent in the runnable queue also increases. Session_ids in the runnable queue are waiting only for CPU resource. If this is say >25%, there is a CPU bottleneck.
-
Key questions: Are Resource and Signal time significant?
-
Highest waits indicate the bottleneck you need to solve for scalability
-
Generally if you have LOW% SIGNAL WAITS, the CPU is handling the workload. For example, session_ids move through runnable queue quickly
-
HIGH % SIGNAL WAITS indicates CPU cannot keep up, significant time for SPIDs to move up the runnable queue to reach running status
-
Network: Current bandwidth, bytes total/sec, packets/sec. Network bandwidth issues should be corroborated with bytes total/sec. [Network interface: bytes total/sec] / [Network interface: Current Bandwidth] > .6, possible network bottleneck.
-
Page Faults/sec versus Pages/sec. Page faults include both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory). Most processors can handle large numbers of soft faults without significant consequences. However, hard faults, which require disk access, can cause significant delays. Pages/sec represents the number of hard page faults that require physical IO to bring the pages into memory.
Memory Issues
The SQL Server relational database system uses memory for many different purposes internally. For a complete discussion of the memory uses in SQL Server, see http://www.winnetmag.com/Article/ArticleID/43419/43419.html.
To summarize, the main uses are as follows:
-
Database page cache, which is used to cache database (table / index) pages
-
Query Workspace memory, which is used by memory intensive query operations such as Hash and Sort.
-
Plan cache, which is used to cache query plans so that they can be reused
-
Other; such as locks, connection memory, thread stacks, memory for utilities such as backup/restore and so on.
-
Memory used by other components linked into the SQL Server process such as XPs, OLE-DB providers and so on. The memory is typically referred to as the MemToLeave memory area because SQL Server refrains from allocating this memory so that these other components linked into the process can do so.
Comparison of 32-bit memory architecture vs. 64-bit flat memory
Of these, only the first use, database page cache, can use AWE memory on 32-bit systems. The rest of the uses require virtual memory, therefore they are limited to 2GB (or 3GB with the /3GB switch in boot.ini) on 32-bit systems. If an application stresses one or more of these other uses of memory in SQL Server to a point beyond what can be handled by the 32-bit virtual memory limits, you might consider the 64-bit option. To determine whether this is the case, see the following for some steps that you might want to consider:
-
Overall Server Memory: Look at counters under SQL Server:Memory Manager. If Total Server Memory is well below Target Server Memory at steady state, it tells you that the server is not experiencing memory pressure. In this case, you likely have no performance-related reason to consider SS64. Otherwise, you need to look further into the cause of memory pressure by following the following steps. Of course, if you do have memory pressure, you might already be using additional memory in SQL Server by enabling AWE. In this case, SQL Server allocates Max Server Memory at startup and therefore Total Server Memory does not change dynamically. Additionally in this case, follow these steps to tell if there continues to be memory pressure.
-
Also, examining the buffer cache hit ratio would be good. Typically one of the first signs of memory problems.
-
Query Workspace Memory: Look at counters under SQL Server:Memory Manager. Look at Memory Grants Outstanding and Memory Grants Pending. If you see a long queue of Pending grants compared to Outstanding grants, there is likely memory pressure because of query workspace memory. You can confirm this by checking the Granted Workspace Memory (KB) counter that tells you how much memory has currently been granted to running queries. If there is memory pressure because of workspace memory, this value should be at least 25% of the virtual memory available to SQL Server. If the memory pressure is severe, the server might even return errors such as 701 or 8645. If this is the case, this might be a good reason to consider using SS64.
-
Plan Cache: The counter SQL Server:Buffer Manager:Procedure Cache Pages captures the total number of pages in the plan cache. If this number is a significant fraction (typically, greater than 25 percent) of the total number of pages in the buffer pool, the application is plan cache intensive. However, this by itself is insufficient to consider a move to SS64. If the plan cache is large because it is full of plans that are rarely reused, moving to SS64 does not yield any benefits (and in fact might make the situation worse because of the larger size of plan cache as described previously). To determine what kinds of plan are in the plan cache see the section titled “Query Plan Reuse” to see whether these plans are being reused, examine the usecounts column. If the plan cache is full of plans that are being reused and yet there is memory pressure, this indicates the application would benefit from more virtual memory and therefore SS64 might be a good option to consider.
-
Memory pressure on MemToLeave, for example, because of XPs or OLE-DB providers. Typically, if you have pressure in the MemToLeave area, you might see errors such as 7399, 17802, or 17803. In these cases, you might have already considered altering the –g startup parameter for SQL Server to increase the MemToLeave value. This in turn might translate to some of the other kinds of memory pressure described here.
-
High CPU cost of AWE memory: In some cases, even if the memory use consists primarily of database page cache, the CPU cost of mapping and un-mapping database pages using AWE might become too expensive, as evidenced by high kernel CPU time. This is especially true when the number of CPUs in the system is 8 or more and/or when the size of physical memory exceeds 32GB on your 32-bit system. This is another point at which you might consider use of a 64-bit system.
64-bit flat memory vs. higher 32-bit clock speeds
As seen earlier, there are cases where memory pressure is genuine and SS64 might be an attractive option in those cases. However, even in these cases the choice of a 64-bit system over 32-bit systems is not straight-forward. Clock speeds on Itanium-based 64-bit systems are much lower than on Xeon-based 32-bit systems. The Itanium’s ability to execute multiple instructions at the same time does compensate for this to a degree. However, if your application is CPU-heavy, you might find you need as many or more processors on a 64-bit system as on the comparable 32-bit system to handle the same workload. It is always recommended that the relative performance of the two choices be verified through a prototype or proof of concept to verify that the 64-bit platform would be a good investment.
Application Design issues
There are application design considerations resulting from the Waits and Queues methodology. The following table describes some of the application design implications.
Observation
|
Application issue
|
Possible remedies
|
High IO waits
|
Database design
Memory pressure
|
Bad query plans resulting from inappropriate indexing.
Add correct indexes to minimize IO.
Add more memory
|
High CPU use
|
Memory pressure
Plan reuse
Parameterization
|
Check for correct plan reuse, parameterization, recompilation, see http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
|
High blocking / locking
|
Transaction management
|
Redo transaction management
Use correct transaction isolation levels
|
Recommendations
The Waits and Queues methodology is recommended as the most effective technique for identifying and resolving performance issues. It focuses on the best opportunities to improve performance, the so called “biggest bang for the buck”. These performance improvements are likely to have a significant return on the performance tuning time investment.
Conclusion
There are two complimentary sources of performance information for SQL Server. Wait types are an invaluable clue in analyzing overall system performance from an application point of view. Wait types provide a view of system performance from a SQL thread standpoint while Performance Monitor provides a view of system performance from a resource standpoint.
Wait statistics should be corroborated or associated with resource counters in Performance Monitor. For example, a high SQL Server wait types signal the need for additional PERFMON investigation of underlying resources such as processor, IO subsystem, network and so on. Together, these associations or correlations of wait types to performance counters, and other related counter ratios provide a broad picture of application performance.
In come cases, the experienced performance expert must look beyond the symptom to find the root problem. Although not exhaustive, the correlated performance information, possible conclusions and actions, and interesting ratios and comparisons sections shed light on actual root problems, given the symptoms. The waits and queues methodology presented here, identify system bottlenecks and propose additional corroboration and conclusions, where appropriate.
In sum, the performance methodology of waits and queues draws on the available performance information that consists of waitstats, PERFMON counters, and correlated information, to provide a broad profile of application performance. It is an invaluable tool in pinpointing bottlenecks and fixing performance problems.
For more information:
http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/default.mspx
Did this paper help you? Please give us your feedback. On a scale of 1 (poor) to 5 (excellent), how would you rate this paper?!href(mailto: sqlfback@microsoft.com?subject=Feedback: [Paper Title])
300>300>300> |