Compared to OLTP, data warehouse applications are characterized by low volumes of large transactions. This workload profile is exactly the opposite of OLTP. Data warehouse and reporting applications typically include big SELECT or read operations. The implications are significant for database design, resource usage, and system performance. Index fragmentation, cache turnover, and IO performance are often important factors for these workloads.
Data warehouse performance blue print objectives: There are performance problems if any of the resource issues in the following tables are true.
Note While the actual value in the value column could be debated, it is a good starting point when identifying common performance problems with DataWarehousing and Reporting applications.
Database Design
Rule
|
Description
|
Value
|
Source
|
Problem Description
|
1
|
Excessive sorting and RID lookup operations should be reduced with covered indexes.
|
|
Sys.dm_exec_sql_text
Sys.dm_exec_cached_plans
|
Large data warehouse can benefit from more indexes. Indexes can be used to cover queries and avoid sorting. The cost of index overhead is only paid when data is loaded.
|
2
|
Excessive fragmentation:
Average fragmentation_in_percent should be <25%
|
>25%
|
sys.dm_db _index_physical_stats
|
Reducing index fragmentation through index rebuilds can benefit big range scans, common in data warehouse and Reporting scenarios.
|
3
|
Scans and ranges are common. Look for missing indexes
|
>= 1
|
Perfmon object
SQL Server Access Methods
Sys.dm_db_missing_index_group_stats
Sys.dm_db_missing_index_groups
Sys.dm_db_missing_index_details
|
A missing index flushes the cache.
|
4
|
Unused Indexes should be avoided
|
If an index is NEVER used, it will not appear in the DMV sys.dm_db_index_usage_stats
|
Index maintenance for unused indexes should be avoided.
|
Resource issue: CPU
Rule
|
Description
|
Value
|
Source
|
Problem Description
|
1
|
Signal Waits
|
> 25%
|
Sys.dm_os_wait_stats
|
Time in runnable queue is pure CPU wait.
|
2
|
Avoid plan reuse
|
> 25%
|
Perfmon object
SQL Server Statistics
|
Data warehouse has fewer transactions than OLTP, each with significantly bigger IO. Therefore, having the correct plan is more important than reusing a plan. Unlike OLTP, data warehouse queries are not identical.
|
3
|
Parallelism: Cxpacket waits
|
<10%
|
Sys.dm_os_wait_stats
|
Parallelism is desirable in data warehouse or reporting workloads.
|
Resource issue: Memory
Rule
|
Description
|
Value
|
Source
|
Problem Description
|
1
|
Memory grants pending
|
>1
|
Perfmon object
SQL Server Memory Manager
|
Memory grant not available for query to run. Check for
Sufficient memory and page life expectancy.
|
2
|
Page life expectancy
|
Drops by 50%
|
Perfmon object
SQL Server Buffer Manager
|
Page life expectancy is the average number of seconds a data page stays in cache. Low values could indicate a cache flush that is caused by a big read.
Look for possible missing index.
|
Resource issue: IO
Rule
|
Description
|
Value
|
Source
|
Problem Description
|
1
|
Average Disk sec/read
|
>20 ms
|
Perfmon object
Physical Disk
|
Reads should take 4-8ms without any IO pressure.
|
2
|
Average Disk sec/write
|
>20 ms
|
Perfmon object
Physical Disk
|
Writes (sequential) can be as fast as 1 ms for transaction log.
|
3
|
Big scans
|
>1
|
Perfmon object
SQL Server Access Methods
|
A missing index flushes the cache.
|
4
|
If Top 2 values for wait stats are any of the following:
ASYNCH_IO_COMPLETION
IO_COMPLETION
LOGMGR
WRITELOG
PAGEIOLATCH_x
|
Top 2
|
Sys.dm_os_wait_stats
|
If top 2 wait_stats values include IO, there is an IO bottleneck
|
Resource issue: Blocking
Rule
|
Description
|
Value
|
Source
|
Problem Description
|
1
|
Block percentage
|
>2%
|
Sys.dm_db_index_operational_stats
|
Frequency of blocks.
|
2
|
Block process report
|
30 sec
|
Sp_configure, profiler
|
Report of statements.
|
3
|
Average Row Lock Waits
|
>100ms
|
Sys.dm_db_index_operational_stats
|
Duration of blocks.
|
4
|
If Top 2 values for wait stats are any of the following:
LCK_M_BU
LCK_M_IS
LCK_M_IU
LCK_M_IX
LCK_M_RIn_NL
LCK_M_RIn_S
LCK_M_RIn_U
LCK_M_RIn_X
LCK_M_RS_S
LCK_M_RS_U
LCK_M_RX_S
LCK_M_RX_U
LCK_M_RX_X
LCK_M_S
LCK_M_SCH_M
LCK_M_SCH_S
LCK_M_SIU
LCK_M_SIX
LCK_M_U
LCK_M_UIX
LCK_M_X
|
Top 2
|
Sys.dm_os_wait_stats
|
If top 2 wait_stats values include IO, there is a blocking bottleneck.
Consider using row versioning to minimize shared locking blocks.
|
Exactly the opposite of OLTP applications, reporting or relational data warehouse applications are characterized by small numbers of (different) big transactions. These are frequently SELECT intensive operations. The implications are significant for database design, resource usage, and system performance.
Reporting and data warehouse performance objectives are as follows:
-
Data warehouse and relational data warehouse designs can have more indexes as the cost of index maintenance is paid only one time, during the batch update process.
-
Plan reuse should generally be avoided. Plan reuse may result in picking up a plan that was good for some other query (with different data distribution), but may not be good for this query. The time taken for plan generation of a large DataWarehouse query is not nearly as important as having the right plan.
-
Sorts can and should be minimized with correct index usage.
-
Missing index situations should be investigated and corrected.
-
Large IOs such as range scans benefits from on disk contiguity. Index fragmentation should be frequently monitored and kept to a minimum with index rebuilds.
-
Blocking is generally uncommon as most data warehouse transactions are read operations.
-
Parallelism is generally desirable for data warehouse applications.
|