Troubleshoot slow-running queries

Article
06/20/2023

Original product version: SQL Server
Original KB number: 243589

Introduction

This article describes how to handle a performance issue that database applications may experience when using SQL Server: slow performance of a specific query or group of queries. The following methodology will help you narrow down the cause of the slow queries issue and direct you towards resolution.

Find slow queries

To establish that you have query performance issues on your SQL Server instance, start by examining queries by their execution time (elapsed time). Check if the time exceeds a threshold you have set (in milliseconds) based on an established performance baseline. For example, in a stress testing environment, you may have established a threshold for your workload to be no longer than 300 ms, and you can use this threshold. Then, you can identify all queries that exceed that threshold, focusing on each individual query and its pre-established performance baseline duration. Ultimately, business users care about the overall duration of database queries; therefore, the main focus is on execution duration. Other metrics like CPU time and logical reads are gathered to help with narrowing down the investigation.

For currently executing statements, check total_elapsed_time and cpu_time columns in sys.dm_exec_requests. Run the following query to get the data:

SELECT req.session_id , req.total_elapsed_time AS duration_ms , req.cpu_time AS cpu_time_ms , req.total_elapsed_time - req.cpu_time AS wait_time , req.logical_reads , SUBSTRING (REPLACE (REPLACE (SUBSTRING (ST.text, (req.statement_start_offset/2) + 1, ((CASE statement_end_offset WHEN -1 THEN DATALENGTH(ST.text) ELSE req.statement_end_offset END - req.statement_start_offset)/2) + 1) , CHAR(10), ' '), CHAR(13), ' '), 1, 512) AS statement_text FROM sys.dm_exec_requests AS req CROSS APPLY sys.dm_exec_sql_text(req.sql_handle) AS STORDER BY total_elapsed_time DESC;

For past executions of the query, check last_elapsed_time and last_worker_time columns in sys.dm_exec_query_stats. Run the following query to get the data:

SELECT t.text, (qs.total_elapsed_time/1000) / qs.execution_count AS avg_elapsed_time, (qs.total_worker_time/1000) / qs.execution_count AS avg_cpu_time, ((qs.total_elapsed_time/1000) / qs.execution_count ) - ((qs.total_worker_time/1000) / qs.execution_count) AS avg_wait_time, qs.total_logical_reads / qs.execution_count AS avg_logical_reads, qs.total_logical_writes / qs.execution_count AS avg_writes, (qs.total_elapsed_time/1000) AS cumulative_elapsed_time_all_executionsFROM sys.dm_exec_query_stats qs CROSS apply sys.Dm_exec_sql_text (sql_handle) tWHERE t.text like '<Your Query>%'-- Replace <Your Query> with your query or the beginning part of your query. The special chars like '[','_','%','^' in the query should be escaped.ORDER BY (qs.total_elapsed_time / qs.execution_count) DESC

Note

If avg_wait_time shows a negative value, it's a parallel query.

If you can execute the query on demand in SQL Server Management Studio (SSMS) or Azure Data Studio, run it with SET STATISTICS TIME ON and SET STATISTICS IO ON.

SET STATISTICS TIME ONSET STATISTICS IO ON<YourQuery>SET STATISTICS IO OFFSET STATISTICS TIME OFF

Then from Messages, you'll see the CPU time, elapsed time, and logical reads like this:

 Table 'tblTest'. Scan count 1, logical reads 3, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0. SQL Server Execution Times: CPU time = 460 ms, elapsed time = 470 ms.

If you can collect a query plan, check the data from the Execution plan properties.
1. Run the query with Include Actual Execution Plan on.
  See Also
  When is too many columns too many columns? – SQLServerCentral Forums A Workaround for the Access 255 Columns Limit What Do You Mean, SQL Can't Do Big Data?How to Optimize your SQL Database to handle millions of records ?
2. Select the left-most operator from Execution plan.
3. From Properties, expand QueryTimeStats property.
4. Check ElapsedTime and CpuTime.

Running vs. Waiting: why are queries slow?

If you find queries that exceed your predefined threshold, examine why they could be slow. The cause of performance problems can be grouped into two categories, running or waiting:

WAITING: Queries can be slow because they're waiting on a bottleneck for a long time. See a detailed list of bottlenecks in types of Waits.
RUNNING: Queries can be slow because they're running (executing) for a long time. In other words, these queries are actively using CPU resources.

A query can be running for some time and waiting for some time in its lifetime (duration). However, your focus is to determine which is the dominant category that contributes to its long elapsed time. Therefore, the first task is to establish in which category the queries fall. It's simple: if a query isn't running, it's waiting. Ideally, a query spends most of its elapsed time in a running state and very little time waiting for resources. Also, in the best-case scenario, a query runs within or below a predetermined baseline. Compare the elapsed time and CPU time of the query to determine the issue type.

Type 1: CPU-bound (runner)

If the CPU time is close, equal to, or higher than the elapsed time, you can treat it as a CPU-bound query. For example, if the elapsed time is 3000 milliseconds (ms) and the CPU time is 2900 ms, that means most of the elapsed time is spent on the CPU. Then we can say it's a CPU-bound query.

Examples of running (CPU-bound) queries:

Elapsed Time (ms)	CPU Time (ms)	Reads (logical)
3200	3000	300000
1080	1000	20

Logical reads - reading data/index pages in cache - are most frequently the drivers of CPU utilization in SQL Server. There could be scenarios where CPU use comes from other sources: a while loop (in T-SQL or other code like XProcs or SQL CRL objects). The second example in the table illustrates such a scenario, where the majority of the CPU is not from reads.

Note

If the CPU time is greater than the duration, this indicates a parallel query is executed; multiple threads are using the CPU at the same time. For more information, see Parallel queries - runner or waiter.

Type 2: Waiting on a bottleneck (waiter)

A query is waiting on a bottleneck if the elapsed time is significantly greater than the CPU time. The elapsed time includes the time executing the query on the CPU (CPU time) and the time waiting for a resource to be released (wait time). For example, if the elapsed time is 2000 ms and the CPU time is 300 ms, the wait time is 1700 ms (2000 - 300 = 1700). For more information, see Types of Waits.

Examples of waiting queries:

Elapsed Time (ms)	CPU Time (ms)	Reads (logical)
2000	300	28000
10080	700	80000

Parallel queries - runner or waiter

Parallel queries may use more CPU time than the overall duration. The goal of parallelism is to allow multiple threads to run parts of a query simultaneously. In one second of clock time, a query may use eight seconds of CPU time by executing eight parallel threads. Therefore, it becomes challenging to determine a CPU-bound or a waiting query based on the elapsed time and CPU time difference. However, as a general rule, follow the principles listed in the above two sections. The summary is:

If the elapsed time is much greater than the CPU time, consider it a waiter.
If the CPU time is much greater than the elapsed time, consider it a runner.

Examples of Parallel queries:

Elapsed Time (ms)	CPU Time (ms)	Reads (logical)
1200	8100	850000
3080	12300	1500000

High-level visual representation of the methodology

Diagnose and resolve waiting queries

If you established that your queries of interest are waiters, your next step is to focus on resolving bottleneck issues. Otherwise, go to step 4: Diagnose and resolve running queries.

To optimize a query that's waiting on bottlenecks, identify how long the wait is and where the bottleneck is (the wait type). Once the wait type is confirmed, reduce the wait time or eliminate the wait completely.

To calculate the approximate wait time, subtract the CPU time (worker time) from the elapsed time of a query. Typically, the CPU time is the actual execution time, and the remaining part of the lifetime of the query is waiting.

Examples of how to calculate approximate wait duration:

Elapsed Time (ms)	CPU Time (ms)	Wait time (ms)
3200	3000	200
7080	1000	6080

Identify the bottleneck or wait

To identify historical long-waiting queries (for example, >20% of the overall elapsed time is wait time), run the following query. This query uses performance statistics for cached query plans since the start of SQL Server.

SELECT t.text, qs.total_elapsed_time / qs.execution_count AS avg_elapsed_time, qs.total_worker_time / qs.execution_count AS avg_cpu_time, (qs.total_elapsed_time - qs.total_worker_time) / qs.execution_count AS avg_wait_time, qs.total_logical_reads / qs.execution_count AS avg_logical_reads, qs.total_logical_writes / qs.execution_count AS avg_writes, qs.total_elapsed_time AS cumulative_elapsed_timeFROM sys.dm_exec_query_stats qs CROSS apply sys.Dm_exec_sql_text (sql_handle) tWHERE (qs.total_elapsed_time - qs.total_worker_time) / qs.total_elapsed_time > 0.2ORDER BY qs.total_elapsed_time / qs.execution_count DESC

To identify currently executing queries with waits longer than 500 ms, run the following query:

SELECT r.session_id, r.wait_type, r.wait_time AS wait_time_msFROM sys.dm_exec_requests r JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id WHERE wait_time > 500AND is_user_process = 1

If you can collect a query plan, check the WaitStats from the execution plan properties in SSMS:
1. Run the query with Include Actual Execution Plan on.
2. Right-click the left-most operator in the Execution plan tab
3. Select Properties and then WaitStats property.
4. Check the WaitTimeMs and WaitType.
If you're familiar with PSSDiag/SQLdiag or SQL LogScout LightPerf/GeneralPerf scenarios, consider using either of them to collect performance statistics and identify waiting queries on your SQL Server instance. You can import the collected data files and analyze the performance data with SQL Nexus.

References to help eliminate or reduce waits

The causes and resolutions for each wait type vary. There's no one general method to resolve all wait types. Here are articles to troubleshoot and resolve common wait type issues:

Understand and resolve blocking issues (LCK_M_*)
Understand and resolve Azure SQL Database blocking problems
Troubleshoot slow SQL Server performance caused by I/O issues (PAGEIOLATCH_*, WRITELOG, IO_COMPLETION, BACKUPIO)
Resolve last-page insert PAGELATCH_EX contention in SQL Server
Memory grants explanations and solutions (RESOURCE_SEMAPHORE)
Troubleshoot slow queries that result from ASYNC_NETWORK_IO wait type
Troubleshooting High HADR_SYNC_COMMIT wait type with Always On Availability Groups
How It Works: CMEMTHREAD and Debugging Them
Making parallelism waits actionable (CXPACKET and CXCONSUMER)
THREADPOOL wait

For descriptions of many Wait types and what they indicate, see the table in Types of Waits.

Diagnose and resolve running queries

If CPU (worker) time is very close to the overall elapsed duration, the query spends most of its lifetime executing. Typically, when the SQL Server engine drives high CPU usage, the high CPU usage is coming from queries that drive a large number of logical reads (the most common reason).

To identify the queries that are responsible for high-CPU activity currently, run the following statement:

SELECT TOP 10 s.session_id, r.status, r.cpu_time, r.logical_reads, r.reads, r.writes, r.total_elapsed_time / (1000 * 60) 'Elaps M', SUBSTRING(st.TEXT, (r.statement_start_offset / 2) + 1, ((CASE r.statement_end_offset WHEN -1 THEN DATALENGTH(st.TEXT) ELSE r.statement_end_offset END - r.statement_start_offset) / 2) + 1) AS statement_text, COALESCE(QUOTENAME(DB_NAME(st.dbid)) + N'.' + QUOTENAME(OBJECT_SCHEMA_NAME(st.objectid, st.dbid)) + N'.' + QUOTENAME(OBJECT_NAME(st.objectid, st.dbid)), '') AS command_text, r.command, s.login_name, s.host_name, s.program_name, s.last_request_end_time, s.login_time, r.open_transaction_countFROM sys.dm_exec_sessions AS sJOIN sys.dm_exec_requests AS r ON r.session_id = s.session_id CROSS APPLY sys.Dm_exec_sql_text(r.sql_handle) AS stWHERE r.session_id != @@SPIDORDER BY r.cpu_time DESC

If queries aren't driving the CPU at this moment, you can run the following statement to look for historical CPU-bound queries:

SELECT TOP 10 st.text AS batch_text, SUBSTRING(st.TEXT, (qs.statement_start_offset / 2) + 1, ((CASE qs.statement_end_offset WHEN - 1 THEN DATALENGTH(st.TEXT) ELSE qs.statement_end_offset END - qs.statement_start_offset) / 2) + 1) AS statement_text, (qs.total_worker_time / 1000) / qs.execution_count AS avg_cpu_time_ms, (qs.total_elapsed_time / 1000) / qs.execution_count AS avg_elapsed_time_ms, qs.total_logical_reads / qs.execution_count AS avg_logical_reads, (qs.total_worker_time / 1000) AS cumulative_cpu_time_all_executions_ms, (qs.total_elapsed_time / 1000) AS cumulative_elapsed_time_all_executions_msFROM sys.dm_exec_query_stats qsCROSS APPLY sys.dm_exec_sql_text(sql_handle) stORDER BY(qs.total_worker_time / qs.execution_count) DESC

Common methods to resolve long-running, CPU-bound queries

Examine the query plan of the query
Update Statistics
Identify and apply Missing Indexes. For more steps on how to identify missing indexes, see Tune nonclustered indexes with missing index suggestions
Redesign or rewrite the queries
Identify and resolve parameter-sensitive plans
Identify and resolve SARG-ability issues
Identify and resolve Row goal issues where long-running nested loops can be caused by TOP, EXISTS, IN, FAST, SET ROWCOUNT, OPTION (FAST N). For more information, see Row Goals Gone Rogue and Showplan enhancements - Row Goal EstimateRowsWithoutRowGoal
Assess and resolve cardinality estimation issues. For more information, see Decreased query performance after upgrade from SQL Server 2012 or earlier to 2014 or later
Identify and resolve quries that seem no never complete, see Troubleshoot queries that seem to never end in SQL Server
Identify and resolve slow queries affected by optimizer timeout
Identify high CPU performance issues. For more information, see Troubleshoot high-CPU-usage issues in SQL Server
Troubleshoot a query that shows a significant performance difference between two servers
Increase computing resources on the system (CPUs)
Troubleshoot UPDATE performance issues with narrow and wide plans

Recommended resources

Detectable types of query performance bottlenecks in SQL Server and Azure SQL Managed Instance
Performance Monitoring and Tuning Tools
Auto-Tuning Options in SQL Server
Index architecture and Design Guidelines
Troubleshoot query time-out errors
Troubleshoot high-CPU-usage issues in SQL Server
Decreased query performance after upgrade from SQL Server 2012 or earlier to 2014 or later

I'm an expert in SQL Server performance optimization, and I can assure you that handling slow queries is a critical aspect of maintaining a well-performing database system. The methodology presented in the provided article offers a comprehensive approach to identifying and resolving performance issues related to slow queries. Let me break down the key concepts discussed in the article:

Finding Slow Queries:
- Queries are examined based on their execution time (elapsed time).
- Metrics such as total_elapsed_time, cpu_time, logical_reads, and statement_text are analyzed using sys.dm_exec_requests and sys.dm_exec_query_stats.
Identifying Query Performance Issues:
- Two main categories of performance issues are discussed: running (CPU-bound) and waiting.
- CPU-bound queries spend most of their elapsed time on the CPU, while waiting queries are delayed by bottlenecks.
- Parallel queries, which use multiple threads simultaneously, are also considered.
Diagnosing Waiting Queries:
- Focus is on resolving bottleneck issues for queries identified as waiters.
- Calculating wait time involves subtracting CPU time from the total elapsed time.
- Queries with wait times exceeding 20% of the overall elapsed time are identified for further analysis.
Identifying and Resolving Wait Types:
- Historical and currently executing queries with long waits are identified using sys.dm_exec_query_stats and sys.dm_exec_requests.
- Different wait types are discussed, and references are provided for troubleshooting and resolving each type.
Diagnosing Running Queries:
- Queries with high CPU time are identified using sys.dm_exec_sessions and sys.dm_exec_requests.
- Historical CPU-bound queries are also examined.
- Various methods for resolving long-running, CPU-bound queries are discussed, such as examining query plans, updating statistics, and identifying missing indexes.
Common Methods for Query Optimization:
- Strategies for optimizing queries include examining query plans, updating statistics, identifying missing indexes, and redesigning or rewriting queries.
- Specific issues like parameter sensitivity, SARG-ability, and row goals are addressed.
Recommended Resources:
- The article provides references for further reading on topics such as query performance bottlenecks, performance monitoring tools, auto-tuning options, index architecture, and troubleshooting.

In conclusion, the article offers a robust methodology for handling slow queries in SQL Server, covering both running and waiting scenarios. By following the outlined steps and leveraging the recommended resources, database administrators and developers can effectively diagnose and resolve performance issues, ensuring optimal SQL Server performance.

Troubleshoot slow-running queries - SQL Server (2024)