Write down the 3 biggest pain points
- "Is" it the database that's slowing things down, or is it environmental? Is perfmon the answer?
- Deadlocks / locking. Should we be running in a different isolation mode?
- Audit solution - triggers vs in-app, etc.
Gather SQL Server stats
- Gather wait statistics (see the query sketch after this list)
- Batch Requests per Second (measure of workload)
- "SQL Query Stress" tool
- rollback will give you a % complete status while 'kill' will not
- rollback only ever executes on a single thread, regardless of DOP
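A quick way to watch an in-flight rollback's progress via the DMVs (the filter is just illustrative):

```sql
-- percent_complete is populated for ROLLBACK commands
SELECT session_id, command, percent_complete, estimated_completion_time
FROM sys.dm_exec_requests
WHERE command LIKE '%ROLLBACK%';
```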
sp_WhoIsActive
- only shows you the last command in the transaction in the session
- make sure you're using the @get_locks parameter to see the locks held by the session
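For example, to include the lock detail (returned as an XML column per session):

```sql
EXEC sp_WhoIsActive @get_locks = 1;
```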
sp_AskBrent
- will diagnose whether or not there's really a problem with the SQL Server
- SOS_Scheduler_Yield waits point to a CPU issue
- Page IO Latch waits should return in less than 100 ms, otherwise there may be a storage issue

exec sp_configure 'remote admin connections', 1
- this will enable the Dedicated Admin Connection (DAC)
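A minimal sketch of turning the remote DAC on and reaching it (server name is a placeholder):

```sql
EXEC sp_configure 'remote admin connections', 1;
RECONFIGURE;
-- Connect to the DAC with the ADMIN: prefix, e.g.:  sqlcmd -S ADMIN:MyServer -E
```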
sp_BlitzCache
- cpu weight is a measure of cpu load
- focus on read weight to see who is doing the most work: exec sp_BlitzCache @top = 10, @sort_order = 'reads'
- also run it sorted by 'executions'
- Run sp_BlitzCache once per week and review to get a feel for what the top queries should be

Triage tools, in order:
- Monitoring software (flatline)
- sp_WhoIsActive
- sp_AskBrent
- sp_BlitzCache
- sp_BlitzIndex
- sp_Blitz
- set statistics io on
- when using profiler, set 75 milliseconds as a threshold to filter out the small statements
- Extended Events - uses Event Tracing for Windows; lighter weight than a server-side trace
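A sketch of an Extended Events session that captures only statements over the 75 ms threshold mentioned above (session and file names are placeholders; duration is in microseconds):

```sql
CREATE EVENT SESSION [LongStatements] ON SERVER
ADD EVENT sqlserver.sql_statement_completed (
    ACTION (sqlserver.sql_text, sqlserver.database_name)
    WHERE (duration > 75000)   -- 75 ms, expressed in microseconds
)
ADD TARGET package0.event_file (SET filename = N'LongStatements.xel');
GO
ALTER EVENT SESSION [LongStatements] ON SERVER STATE = START;
```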
- Single Statement Table Value Function - SQL Server can optimize and process this much more efficiently because SQL will inline the function
- SQL Server 2014 estimates that 100 rows will come back from a TVF. 2012 and earlier return 1.
- By pushing data into a temp table first, statistics are going to be accurate
- Look for usages of fn_split in our larger procedures. SQL server does a terrible job of estimating when this function is involved
- DON'T JOIN TO TABLE VALUE FUNCTIONS (or table variables for that matter). use temp tables instead
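A sketch of the temp-table pattern, assuming a dbo.fn_split(@list, @delimiter) style function that returns a 'value' column (the function signature and column names are assumptions):

```sql
DECLARE @IdList VARCHAR(MAX) = '1,2,3';

-- Materialize the TVF output first so the optimizer gets real row counts
SELECT value
INTO #SplitIds
FROM dbo.fn_split(@IdList, ',');

-- Now join to the temp table instead of the function
SELECT p.*
FROM dbo.Posts AS p
JOIN #SplitIds AS s ON s.value = p.Id;
```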
- CTEs are executed every time they are referenced.
- Dynamic SQL can result in a large plan cache if you're not using parameters properly
- SARG (search-argument-able). Non-SARGable means it won't use an index; this can happen with optional sproc parameters
- WHERE LEFT(...), UPPER(...), LIKE, YEAR(...), VARCHAR = @NVARCHAR - all of these result in non-sargable queries and will cause full table scans
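A sketch of the difference, using an assumed dbo.Users table with a CreationDate column:

```sql
-- Non-sargable: the function wrapped around the column forces a scan
SELECT Id FROM dbo.Users WHERE YEAR(CreationDate) = 2013;

-- Sargable rewrite: a range on the bare column can seek an index on CreationDate
SELECT Id FROM dbo.Users
WHERE CreationDate >= '20130101' AND CreationDate < '20140101';
```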
- Correlated subquery issues:
- has some of the same problems as scalar functions
- can use windowing functions ('OVER') instead
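A sketch of swapping a correlated subquery for a windowed aggregate (dbo.Posts and OwnerUserId reused from examples elsewhere in these notes):

```sql
-- Correlated subquery: conceptually re-evaluated per outer row
SELECT p.Id,
       (SELECT COUNT(*) FROM dbo.Posts p2
        WHERE p2.OwnerUserId = p.OwnerUserId) AS PostsByOwner
FROM dbo.Posts p;

-- Windowed version: same result from a single pass
SELECT p.Id,
       COUNT(*) OVER (PARTITION BY p.OwnerUserId) AS PostsByOwner
FROM dbo.Posts p;
```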
- Nested Views - can cause performance issues. Be careful
CXPACKET
- means there were inefficiencies in how work was handed out

Cost Threshold for Parallelism
- default of 5 is typically too low these days; a starting point should be 50 or 100 (see the sketch after this list)
- Amdahl's law - adding more workers becomes counterproductive at some point
- CXPACKET should be below 50%, but don't drive it to 0 - usually means there are missing indexes or query tuning opportunities
- this ensures that only big queries go parallel
- this is not fixed by setting max DOP, you need to index and tune queries
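A sketch for raising the threshold (50 is just the starting point suggested above):

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
```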
- Don't look at %s when looking at wait types, look at clock time
- Monitoring software will measure hours of wait time per hour. Very useful
- "Density Vector" average number of rows for any given value
- If an execution plan gets seeded with a small number of estimated rows, then when the sproc runs again with different parameters it won't necessarily use statistics to find the plan to run
- Look in the actual execution plan XML and you'll see ParameterCompiledValue differences from ParameterRuntimeValue. If it's zero, then it's not using statistics, it's just using the cached item.
- With procedures, look for differences in the avg duration between statements and the procedure. This could be an indication that parameter sniffing is an issue.
- stored procedures use statistics differently from direct statements. This can be a real issue when you're trying to trace down perf issues with stored procs inside of mgmt studio.
- `exec sp_helpstats 'dbo.Posts', 'All'`
- `DBCC SHOW_STATISTICS('Posts', 'kl_posts_OwnerUserId')`
- Steps will show you the number of histogram blocks. 200 is the max, 201 steps with nulls
- If you use a local variable in a sproc then it uses a different set of statistics - it uses density vector calculations instead. Talked with someone else at the conference who said they used this across all their sprocs to avoid parameter sniffing issues. This could be used instead of WITH RECOMPILE
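A sketch of that local-variable approach (proc, table, and column names are assumptions):

```sql
CREATE PROCEDURE dbo.usp_PostsByUser_DensityVector
    @UserId INT
AS
BEGIN
    -- Copying the parameter into a local variable prevents sniffing;
    -- the optimizer falls back to the density vector (average selectivity)
    DECLARE @LocalUserId INT = @UserId;

    SELECT Id, Title
    FROM dbo.Posts
    WHERE OwnerUserId = @LocalUserId;
END;
```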
- You won't be able to reproduce the problem outside of a sproc
- INSTEAD, create a temp sproc to find out what's going on...
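Temporary stored procedures (note the # prefix) compile and sniff parameters like regular procs, so they make a convenient scratch copy for this (names and body are placeholders):

```sql
CREATE PROCEDURE #ReproSlowProc
    @UserId INT
AS
    SELECT Id, Title
    FROM dbo.Posts
    WHERE OwnerUserId = @UserId;
GO

EXEC #ReproSlowProc @UserId = 557499;   -- a sample parameter value
```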
- With the WITH RECOMPILE option, the exec plan will be re-compiled and then not stored in cache
- Note that this only applies to the outermost statement
- Rather than 'WITH RECOMPILE' at the sproc level, you should use 'OPTION (RECOMPILE)' at the statement level within the sproc
- DON'T put RECOMPILE hints in the header of the sproc
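For instance, keeping the hint on just the one parameter-sensitive statement (names assumed):

```sql
CREATE PROCEDURE dbo.usp_PostsForUser
    @UserId INT
AS
BEGIN
    -- Only this statement is recompiled each run; the rest of the proc caches normally
    SELECT Id, Title
    FROM dbo.Posts
    WHERE OwnerUserId = @UserId
    OPTION (RECOMPILE);
END;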
- `OPTION (OPTIMIZE FOR UNKNOWN)` as an alternative to recompile. Forces it to calculate estimated rows based on averages
- Local variables inside a sproc CANNOT be sniffed
- `OPTION (OPTIMIZE FOR (@UserId = 557499))` - this will give you a consistent execution plan
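Both hints side by side, as they'd appear inside a proc that takes @UserId (table and column names assumed):

```sql
-- Build the plan as if @UserId were 557499 (a known representative value)
SELECT Id, Title
FROM dbo.Posts
WHERE OwnerUserId = @UserId
OPTION (OPTIMIZE FOR (@UserId = 557499));

-- Or ignore the sniffed value entirely and use the density-vector average
SELECT Id, Title
FROM dbo.Posts
WHERE OwnerUserId = @UserId
OPTION (OPTIMIZE FOR UNKNOWN);
```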
- Another option would be to have different versions of the sproc for small / large customers
- Prepared statements act just like sprocs
- NonClustered indexes always include the clustered index key
- When testing slow-running sprocs, create a temp sproc, look at the 'compiled for' value from the slow-running instance, and then OPTIMIZE FOR that value
- Updating statistics forces a recompile. This doesn't mean that the stats were bad; it could be that the cached plan was bad
- Read committed is the current default pessimistic isolation level in the boxed version of sql server
- Use monitoring tools or perfmon alerts to identify blocking issues
- Create an alert in SQL Agent on General Statistics:Processes Blocked. You can also set the response to run sp_WhoIsActive when this happens. Make sure you set a soak time (delay between responses) of about 5 minutes to prevent multiple alerts from coming in
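A sketch of that alert, assuming a default instance (so the perfmon object is SQLServer:General Statistics) and a pre-existing Agent job that captures sp_WhoIsActive output:

```sql
EXEC msdb.dbo.sp_add_alert
    @name = N'Blocked processes detected',
    @performance_condition = N'SQLServer:General Statistics|Processes blocked||>|0',
    @delay_between_responses = 300,          -- 5-minute soak time between firings
    @job_name = N'Capture sp_WhoIsActive';   -- hypothetical response job
```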
LCK_M_IX, LCK_M_S, LCK_M_U, LCK_M_IS
- locks are either row or object locks

LCK_M_SCH_S, LCK_M_SCH_M
- index rebuilds will also grab a schema lock when indexing occurs
- ALTER TABLE, DISABLE TRIGGER - nobody can use the table while you're running these
- Blocked process report (look for links in materials) can help you visualize the blocking chains every 5 seconds
- Deadlock monitor runs every 5 seconds. It also reports on blocking, which can be picked up by a trace, which in turn can be picked up by Extended Events.
- 'Lock Escalation' SQL Server will escalate to a larger scoped lock once you cross a certain threshold. If you do any type of clustered scan there's a risk that the lock could escalate
- Look at wait stats to see what types of locking are occurring
- Identify the root of the blocking chain
- Optimistic Concurrency
- Read Committed Snapshot Isolation (RCSI)
- most recent committed version at the statement level
- Snapshot Isolation
- transactionally consistent within a transaction
- Using snapshot isolation for reports can help without blocking the rest of the application. Make the big read-only reports use snapshot isolation. Best to use only on big read queries, not updates
- You can run RCSI and SNAPSHOT isolation together. SNAPSHOT is great for long running reports, just be sure to wrap the logic for the report in a transaction and set SNAPSHOT isolation
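A sketch of setting this up (database name is a placeholder; the report query is illustrative):

```sql
-- Enable the optimistic options at the database level
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- In the reporting session: everything inside the transaction sees one consistent snapshot
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
    SELECT OwnerUserId, COUNT(*) AS PostCount
    FROM dbo.Posts
    GROUP BY OwnerUserId;
COMMIT;
```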
- With RCSI and SNAPSHOT, performance of tempdb becomes vital.
- Monitor tempdb perf and look for abandoned transactions (look back through notes on how to do this)