SQL Performance Issues
The initial investigation of this issue is written in the Production Emergency post. You might want to read that first if you haven’t already.
Addressing the underlying problem
We had identified the source of the problem. The SQL text can be extracted from the database as follows:
This displays the SQL text. If the application populates the module
, program
and client_id
of v$session
using
dbms_application_info
, then there is enough information to find out what code is causing the
problem and even who was running the program that caused it.
It is possible to find the execution plan that the database has chosen for the SQL as follows:
|
|
It is even possible to find out how much of the plan has been executed:
Having identified the SQL that is running slow, how do we force a good plan? That is difficult. Sometimes just purging the plan and forcing the SQL to get parsed again will work:
According to the documentation the first parameter is a concatenation of the address and
hash value. The second is anything apart from any single character in PpQqRrSsTt
. According to
Oracle base,
most people pick C
for Cursor.
Another option if the problem SQL is a single statement, is to use SQL Plan Management to fix a good plan or disable a bad one. To do this, we need to load the plans:
It is somewhat difficult to match the SQL_ID
and plan_hash_value
of a SQL to an SQL_Handle
and
Plan_Name
recorded in dba_sql_plan_baselines
. Typically it is easiest to search for parts of the
SQL that seem unique, or indeed it is possible to use the SQL as a key:
This is a little slow, but it brings back the handle. Next, how to work out which plan is which?
To do this I used a handy procedure from OracleProf It prompts for the SQL_Handle.
|
|
Running this prompts for an sql_handle
. It gives the sql_id
and the plan_hash_value
for each statement. The plan hash value can be matched with a poor plan identified e.g. from
the sql_monitor
or dbms_xplan.display_cursor
. I discovered that none of the plans loaded
was the one causing the problem. It appears it was already purged from the cache, suggesting
that the database had already abandoned it due to cardinality feedback.
It still seemed reasonable to give the database a hint that these are good plans to use, so they are enabled by evolving them:
This took a really long time (Significantly more than 24 hours) to run, so I left it.
However I could see that within 15 minutes or so, it had accepted the plans by looking
at the ACCEPTED
column of dba_sql_plan_baselines.
Conclusion
The above steps kept the system largely working while the dev team investigated.
The dev team worked out a more efficient way to write the SQL. There was no obvious reason that I could see for the change in performance of this SQL statement, but possibly it’s complexity combined patching and minor changes in stats just pushed it over a threshold from a good plan to a bad one.