What to do if a database performance issue is reported

A high level overview.

Expectation should be for all OLTP operations to take less than 1 second, and all batch processing jobs and background scheduled report jobs to take less than 1 hour each. This is deliverable on even large databases with moderate hardware, with some effort put in.

1) Is the issue even a performance issue at all?
Optionally quickly confirm this is a real performance issue rather than a general system issue being misreported as a performance issue. Commonly three non database-performance issues get reported this way:

1.1) Is there a system wide loss-of-service or near-loss-of-service due to server or database down, filesystem not mounted, space at 100%, rogue process consuming 100% cpu, etc.?

1.2) Are sessions waiting for a row lock to be released by another session?
In Oracle, $ORACLE_HOME/rdbms/admin/utllockt.sql or gv$sesion or Enterprise Manager GUI.
In Microsoft SQL Server right click instance in Management Studio > Activity Monitor.

1.3) Are application servers rather than database servers the bottleneck?
Quickly check one or more of the application servers cpu with:
In Windows > Right click bottom bar > Start Task Manager > Performance tab.
In Unix, top or topas or sar 5 5 etc.
Typically extra application servers can quickly and cheaply be cloned up in a ‘web garden’ architecture, or at least extra cpu resource can be allocated to the existing servers.
Requiring a lot of application servers is an indication that more application processing should be moved to within database, although that is normally more of a long term improvement aspiration.
Unlikely nowadays, but users PC could also be a bottleneck for client-server applications.

Now onto Database Tuning:

2) Identify the session or process consuming the most resources, if possible limiting your search to the sessions related to the particular application or user who has the performance problem.
In Oracle, query gv$session or look at top sessions in Oracle Enterprise Manager or Oracle SQL Developer.
In Microsoft SQL Server, use SQL Server Profiler or right click instance in Management Studio > Activity Monitor.

3) Obtain the SQL statements which that session recently ran. Focus first on the top SQL statement from that set.
In Oracle, use:
3.1) dbms_monitor.session_trace_enable + tkprof (best by far), or
3.2) $ORACLE_HOME/rdbms/admin/awrrpt.sql (if diagnostic pack licensed) (for RAC, awrgrpt.sql), or
3.3) $ORACLE_HOME/rdbms/admin/spreport.sql (if diagnostic pack not licensed), or
3.4) AWR performance view in Oracle Enterprise Manager (if diagnostic pack licensed).
In Microsoft SQL Server, use SQL Server Profiler.
Be careful here not to be drawn into wasting time looking at an irrelevant session or at sql from one of the irrelevant sessions.

4) Indexes – this is normally where the performance fix is at.
Find which indexes make that statement run faster.
In Oracle (if you have diagnostic and tuning pack licensed), $ORACLE_HOME/rdbms/admin/sqltrpt.sql or Enterprise Manager
In Microsoft SQL Server, use Database Engine Tuning Advisor – but only on a test server or out of hours, is itself very resource intensive.
The recommended SQL Profiles (Oracle) and create statistics (Microsoft SQL Server) tend to be spurious, but the index recommendations are better – at least as a starting point, will often recommend more indexes or more columns in an index than are actually necessary.
Indexes can also be created individually with execution plan used to see which indexes are beneficial.
In Oracle that is available in Oracle SQL Developer or with dbms_xplan.display_plan or set autotrace.
In Microsoft SQL Server that is available in query window of Management Studio.
In MySQL, prefix your SQL statement with the EXPLAIN keyword.
Function based indexes such as on UPPER(surname) or TRUNC(datetime) can be essential.
In Oracle, indexes can be switched off and on with visible/invisible to help speed up testing different scenarios.
Don’t be afraid to add indexes, the extra overhead on inserts is negligible compared to the savings on selects.

That may be enough to fix the problem, if not then the next steps are (unfortunately here things can start to be time consuming):

5) SQL Code. Can it be rewritten to be more efficient?
Look for slow-by-slow loop processing, unnecessary table joins (often happens with views, and views-of-views), unnecessary outer joins, old left over hints and sql profiles that are now inefficient.
Try to move the bulk of the main processing into a single SQL statement at the start, rather than issuing large numbers of separate SQL commands within a loop or repeated function calls.

6) Are you suffering Plan Instability?
A real curse for Oracle.
Means SQL runs fine some days, dead slow on other days.
Fixes include removing histograms on optimizer stats, setting cursor_sharing parameter to (default) value of exact, freezing optimizer statistics in place, using SQL Plan Management Baselines.
Plan instability is not (afaik) as much of an issue with Microsoft SQL Server, but freezing optimizer statistics in place may still be needed there.

7) Batch processing schedules.
Can they be moved to start earlier, or run immediately after each other in sequence with no gaps of wasted time, or set to run simultaneously rather than one at a time?
Adding a status or last_modified timestamp column to main table, with an index on that column, can greatly reduce the time taken for the initial select of records to be processed.

8) Reporting specific issues.
Consider moving slow reports to a dedicated reporting database away from the live OLTP database.
You can make queries automatically rewrite to use fast small summary objects containing the results the user wants. To do that, create:
Oracle – materialized views.
Microsoft SQL Server – indexed views.
It is also possible to manually build summary tables and point reports at those, although likely involves more work than the above automatic method.
Reporting data may need far more indexes than OLTP data.
Bitmap indexes and table and index compression may be useful in a reporting database.

9) Are database or server parameters wrongly configured?
Very unlikely in this day and age to find these set wrongly, used to be common back in the day (1990s) so still gets mentioned in old tuning books and articles.
Two that are still sometimes wrongly configured today are:
9.1) Oracle on linux without hugepages configured (the default is not to have hugepages configured).
9.2) Oracle on unix, non-ASM, with filesystems mounted using (the default) filesystem OS buffer caching instead of direct IO.
Memory assigned to the database should be generous, but that is now anyway done as default with the Oracle DBCA and Microsoft SQL Server install GUIs.
In MySQL this is innodb_buffer_pool_size=Xg in file my.cnf.
Don’t be tempted to skimp on pga_aggregate_target for Oracle on linux, especially for reporting and batch applications.
Physical memory chips are very cheap now, but cpu is still expensive, especially so for database servers (as opposed to application servers) because the database license likely depends on the cpu count. Database servers are anyway not normally cpu bound.

Gotchas:

1) It is very easy to be distracted in performance investigations, notice something which is maybe not quite right but is irrelevant for the issue at hand, wasting a lot of time investigating and fixing the irrelevant issue.
Especially easy if you know how to fix the irrelevant issue yet don’t know how to fix the real issue – human nature is to put off difficult tasks.
Make sure to focus on what is causing the problem, ignore everything else (or maybe take a note to go back and look at those other things in the future when you have spare time).

2) Higher spec hardware and/or newer versions of OS or database software is always a worthwhile thing to have, but most often makes no difference to a performance issue, beware of recommending that as a performance fix.

3) Performance testing is essential before releasing new applications or new versions to live, and before upgrading database versions.
Batch jobs are easy to run against a test database.
For OLTP browser interface applications, make a recording in Apache JMeter (free download) or a commercial equivalent like HP LoadRunner and replay that against the new environment before go live.

Further reading:

December 14, 2015

Leave a Reply

Your email address will not be published. Required fields are marked *