This blog contains experience gained over the years of implementing (and de-implementing) large-scale IT applications/software.

SAP BI – ORA-04031 – Oracle 10.2.0.5 – Bug 9737897 – End Of.


If you use SAP BI 7.0 on an Oracle 10.2.0.5 64-bit database, you may experience this issue.
Our BI system performs the delta loads for FI data overnight.
These are large loads of records into the cubes.

The problem is Oracle error 04031:
----------------------------------------------------------------------------------
Mon Jul 11 00:46:26 GMT Daylight Time 2011
Errors in file j:\oracle\bwp\saptrace\usertrace\bwp_ora_3716.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 4120 bytes of shared memory ("shared pool","select name,online$,contents...","Typecheck","kgghteInit")

ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 4120 bytes of shared memory ("shared pool","select name,online$,contents...","Typecheck","kgghteInit")

ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 4120 bytes of shared memory ("shared pool","select name,online$,contents...","Typecheck","kgghteInit")

Mon Jul 11 00:46:26 GMT Daylight Time 2011
Errors in file j:\oracle\bwp\saptrace\background\bwp_smon_8756.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 4120 bytes of shared memory ("shared pool","select file#, block#, ts# fr...","Typecheck","seg:kggfaAllocSeg")

Mon Jul 11 00:46:27 GMT Daylight Time 2011
Errors in file j:\oracle\bwp\saptrace\background\bwp_smon_8756.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 4120 bytes of shared memory ("shared pool","select file#, block#, ts# fr...","Typecheck","seg:kggfaAllocSeg")

----------------------------------------------------------------------------------


Once it hits this problem, the rest of the chain fails on the index rebuilds.
Sometimes user queries also fail in the BEx Web front-end, where the user sees errors similar to those shown above.

Using V$SGASTAT I was able to take snapshots of the Shared Pool usage via a Windows scheduled task.
I traced the problem to the memory allocation of “obj stat memo” in the Shared Pool of the database.
It grows constantly, and although “free memory” reduces slightly, the reduction doesn’t correlate with the increase in “obj stat memo” usage.
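A minimal sketch of the kind of snapshot query involved (V$SGASTAT’s POOL, NAME and BYTES columns are standard; the scheduled-task wrapper that spools this to a timestamped file is assumed and not shown):

```sql
-- Sketch: snapshot the shared pool components of interest.
select name, bytes
from   v$sgastat
where  pool = 'shared pool'
and    name in ('obj stat memo', 'free memory');
```

Run at regular intervals, the output gives you the growth curve of “obj stat memo” over time.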

So the next step was to look on Oracle Support.
I found bug 9737897, which is valid *up to* 10.2.0.4, but it does not state that the bug has actually been fixed in 10.2.0.5.
The bug description says: “An SGA memory leak is possible for “obj stat memo” and “obj htab chun”
memory if a segment is dropped and recreated many times.”

The bug note recommends querying x$ksolsfts (see here about this table https://yong321.freeshell.org/computer/x$table.html).
The following SQL is recommended by Oracle:

select distinct fts_objd, fts_objn
from x$ksolsfts x
where not exists (select 1 from obj$ o
                  where o.dataobj# = x.fts_objd
                  and o.obj# = x.fts_objn);

The note says that a high number of records returned could indicate that you are seeing the symptoms of the bug.
I was seeing over 160,000 records in a database that had been restarted 24 hours prior.

I produced a lovely Excel chart showing the snapshot times against the memory use of the “obj stat memo” component.
The chart showed significant increases (step ups) of approximately 10MB at specific points in time.
I was then able to search SM37 using the “extended” option, to show background jobs active during the memory “step ups” identified by the chart.
Generally I saw 10-20 jobs active at the time, but the correlation showed that more often than not, the “step ups” were at near enough the same time as a BI_PROCESS_INDEX job.

Looking in the background job log, I was able to identify the process chain and the type of index/load that was being performed.
The indexes in question were FI cube related. They were quite large indexes and they were bitmap indexes (“CREATE BITMAP INDEX…”).
Each night these indexes got re-built after a load.
The re-build command was visible in the background job log and included “COMPUTE STATISTIC”.

So I had successfully identified a segment (or segments) that was being dropped and re-created regularly.
This matched the Oracle bug description.

At the bottom of the bug note, it states:

“Note:
If you suffer “obj stat memo” memory leak problem in 10.2.0.5, you can try
_object_statistics = false as a workaround, or following init parameter to see
if it solves problem.
_disable_objstat_del_broadcast=false”

So what does SAP say about this?
Well the following SAP notes were/are available:
1120481 – ORA-4031 – memory allocation errors for object-level stat
1070102 – High memory allocation by ‘obj stat memo’
869006 – Composite SAP note: ORA-04031

The only note of relevance is 1120481, which has a short reference to Oracle 10.2.0.4.
The other notes tend to point the reader at just blindly increasing the Shared Pool. I tried this; it didn’t work when 900MB was already used by “obj stat memo”!

So, stuck with only one possible solution (other than upgrading to 11g), I can confirm that setting “_object_statistics = false” (alter system set “_object_statistics”=false scope=spfile;) and bouncing the database solved the problem as it completely removes the use of the “obj stat memo” memory area (it’s no longer visible in V$SGASTAT).
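For reference, here is the workaround as a SQL*Plus session sketch (the parameter name comes from the bug note; the bounce is needed because scope=spfile only takes effect at restart, and underscore parameters should really be confirmed with Oracle Support first):

```sql
-- Workaround from bug 9737897 (hidden parameter - use with care).
alter system set "_object_statistics" = false scope=spfile;
shutdown immediate
startup
-- Verify: the component should no longer appear in the shared pool.
select name, bytes
from   v$sgastat
where  pool = 'shared pool'
and    name = 'obj stat memo';
```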

When Should I Pay For Oracle Diagnostics/Tuning Pack in 10G/11G


If you are unsure about which actions will result in you having to pay for a license for the Oracle Diagnostics Pack or Tuning Pack (or other options, come to think of it), then you could use the query below to determine the check conditions:

SELECT NAME, DESCRIPTION FROM WRI$_DBU_FEATURE_METADATA;

This will output the name of the feature and the description of the check that is performed.
You could also see the check “code” if you include the USG_DET_LOGIC column in the SELECT.
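For example, to see the check logic for just the AWR-related features (USG_DET_LOGIC holds the check “code” mentioned above; the LIKE filter is just an illustrative way of narrowing the output):

```sql
-- Show the usage-detection logic for AWR-related feature checks.
SELECT NAME, DESCRIPTION, USG_DET_LOGIC
FROM   WRI$_DBU_FEATURE_METADATA
WHERE  NAME LIKE '%AWR%';
```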

In 10G it is immediately possible to see that the AWR is only classed as used if “At least one Workload Repository Report has been created by the user”.
In 11G, AWR is classed as used if “At least one AWR Baseline has been created by the user” OR “At least one AWR Baseline Template has been created by the user”.

The AWR is licenseable (https://www.orafaq.com/wiki/AWR).

So does this mean you can select from those views? Well, Oracle do ask you to in some support situations, so maybe it is reasonable to assume so.
The Oracle 11G Diagnostics Pack doc https://www.oracle.com/technetwork/oem/grid-control/overview/diagnostics-pack-11gr2-datasheet-134502.pdf states otherwise: “These features may be accessible through Oracle Enterprise Manager, Oracle SQL Developer and APIs provided with Oracle Database software. The use of these and other features described in the product licensing documentation requires licensing of the Oracle Diagnostic Pack regardless of the access mechanism.”

The Oracle 11G Tuning Pack *does not* say the same: https://www.oracle.com/technetwork/oem/grid-control/overview/tuning-pack-11gr2-datasheet-134450.pdf
But the Tuning Pack APIs use features included in the Diagnostics Pack. Chicken and egg.

I guess it’s open to interpretation. Better stay safe and ensure that you don’t hit any of the checks.

Oracle Home Space Cleanup (how random that sounds)

Just a quick post to make you aware of a fantastic option I recently discovered in opatch.
I look after a large number of database systems, some having multiple Oracle homes on the same drive (I know I know, but when you’re limited on drives…).
These homes consist of things like Oracle RDBMS (obvious one), Oracle 10g EM Agent and finally Oracle 10gAS (and 10g OEM if you like to think that OMS is separate, but it’s a 10gAS so I didn’t).

Generally the Oracle home for a 10gR2 RDBMS Enterprise Edition binary codeset seems to be around 1GB in size.
Apply a couple of opatches and you could be looking at 2GB!
All of a sudden, the partition is looking quite full (it used to be 9i on there when it was originally designed!!).

Let’s bring in a new opatch command that came in as part of an opatch upgrade in 10gR2 (opatch patch 6880880).
The command is part of the “opatch util” command and is a sub-command called “cleanup”.
It supposedly removes the duplicated binary files taken during an opatch patch application.

Oracle doc 550522.1 states “Starting with 10.2, Opatch does not backup only the affected modules, it also takes a backup of the complete affected libraries to $ORACLE_HOME/.patch_storage//backup//.“.

The cleanup command only removes files that are not required as part of a patch rollback.

Syntax: “$ opatch util cleanup“.

You will be prompted to select the oracle home and confirm the cleanup. It will also inform you of the amount of space saving.
Enjoy the space you get back, while you can!

Checking Lack of Bind Variables & Adhoc SQL in Oracle

The following Oracle SQL is a crude method of checking the V$SQLAREA view for SQL statements that share the same execution plan but where the SQL text is unique, e.g. statements where only the predicate in the WHERE clause changes. These are otherwise known as adhoc queries (something that SQL Server 2008 is supposed to be very good at detecting).

The statement was constructed based on the “Understanding Shared Pool Memory” document by Oracle (https://www.oracle.com/technetwork/database/focus-areas/manageability/ps-s003-274003-106-1-fin-v2-128827.pdf).

The doc basically says that a count of the PLAN_HASH_VALUE column versus the number of SQL_IDs using the plan indicates how badly the SQL needs bind variables, since a plan should be shared by only one or two SQL statements (the fewer the better).

In my opinion, systems that return >60% for ACTION “Could use bind variables.” should either consider revising the application SQL code (permanent fix), or using the CURSOR_SHARING=FORCE init parameter (temporary fix!).

SELECT 'Multiple SQLs in SQLAREA using same plan:' DESCRIPTION,
       sum(pct_hash_use) PCT,
       'Could use bind variables.' ACTION
FROM (
  select hc.plan_hash_value,
         hc.hash_use_count,
         sc.sql_count,
         round((hc.hash_use_count/sc.sql_count*100),2) pct_hash_use
  from (select s.plan_hash_value, count(s.plan_hash_value) hash_use_count
        from v$sqlarea s
        group by s.plan_hash_value
        having plan_hash_value > 0) hc,
       (select count(1) sql_count
        from v$sqlarea t
        where plan_hash_value > 0) sc
  order by pct_hash_use desc
) WHERE hash_use_count > 1
UNION
SELECT 'Single SQLs in SQLAREA using same plan:' DESCRIPTION,
       sum(pct_hash_use) PCT,
       'No action needed.' ACTION
FROM (
  select hc.plan_hash_value,
         hc.hash_use_count,
         sc.sql_count,
         round((hc.hash_use_count/sc.sql_count*100),2) pct_hash_use
  from (select s.plan_hash_value, count(s.plan_hash_value) hash_use_count
        from v$sqlarea s
        group by s.plan_hash_value
        having plan_hash_value > 0) hc,
       (select count(1) sql_count
        from v$sqlarea t
        where plan_hash_value > 0) sc
  order by pct_hash_use desc
) WHERE hash_use_count <= 1;
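If you do opt for the temporary fix, CURSOR_SHARING is a dynamic parameter, so it can be changed without a bounce (remember this is a workaround, not a substitute for fixing the application SQL):

```sql
-- Temporary workaround only: share cursors for statements that
-- differ only in literal values.
alter system set cursor_sharing = FORCE scope=both;
-- Revert to the default behaviour with:
-- alter system set cursor_sharing = EXACT scope=both;
```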


You can also use the following query to determine roughly how much memory is wasted on SQL statements that have been executed only once (you should use this on a well bedded-in system that has been running for a while):

select count(1) num_sql,
sum(decode(executions, 1, 1, 0)) num_1_use_sql,
sum(sharable_mem)/1024/1024 mb_sql_mem,
sum(decode(executions, 1, sharable_mem, 0))/1024/1024 mb_1_use_sql_mem
from v$sqlarea
where sharable_mem >0;


You should double check the output with the SQL text in V$SQLAREA to ensure that the SQL is truly ad-hoc.

Remember to get some hard and fast values for the “Concurrency” wait class before and after the change (bind variables introduced or init parameter changed).

You can use the following SQL to determine the average wait time:

select * from v$system_event
where wait_class# = 4
order by average_wait desc;

I would expect the library cache operation wait times to reduce as hard parses are reduced.

Here is an example of a system that is not using bind variables to optimise SQL (the screenshots of the script output are not reproduced here).
You should note that an OLTP system such as SAP ERP should have a much lower count of adhoc SQL compared to an SAP BW system, where users can write their own queries.

Oracle Explain Plans: Cardinality & Histograms

This is one of the better white papers on how to read an Oracle explain plan:
https://www.oracle.com/technetwork/database/focus-areas/bi-datawarehousing/twp-explain-the-explain-plan-052011-393674.pdf
The most important point is made on cardinality on page 10 when explaining “Data Skew”.
The term “Data Skew” refers to the imbalance of values with respect to a normal distribution (https://en.wikipedia.org/wiki/Skewness).
The imbalance shows up in the Mode (https://en.wikipedia.org/wiki/Mode_(statistics)) of the values in a column of a table.

Example:
SELECT * FROM A_TABLE;

COL1 | COL2
-----|-----
   1 | A <== Mode = most frequent.
   2 | A
   3 | A <== Median = middle one
   4 | B
   5 | B

Histogram for COL2:

Value |Number
------|------
A     |###
B     |##

Data Skew means that more of the values in COL2 (usually a lot more, e.g. 60%+) have a value of “A” than of “B”.
As the Oracle whitepaper points out “One of the simplest formulas is used when there is a single equality predicate in a single table query (with no histogram). In this case the Optimizer assumes a uniform distribution and calculates the cardinality for the query by dividing the total number of rows in the table by the number of distinct values in the column used in the where clause predicate.“.

For the example “SELECT * FROM A_TABLE WHERE COL2='A'” this equates to: Cardinality = 5 / 2 = 2.5, which is rounded to 3.
Accurate enough.

The problem is that the larger the number of rows (and the more skewed the data), the further from reality this cardinality estimate becomes.
Imagine the table with values like so:

Value |Number
------|---------------
A     |###############
B     |##

Now the example “SELECT * FROM A_TABLE WHERE COL2='A'” equates to: Cardinality = 17 / 2 = 8.5, which is rounded to 9.
Definitely not 15.
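The two estimates can be reproduced with simple arithmetic (Oracle’s ROUND rounds .5 away from zero):

```sql
-- Uniform-distribution estimate: total rows / number of distinct values.
select round(5/2)  as small_table_estimate,  -- 3: close to the real 3 rows for 'A'
       round(17/2) as skewed_table_estimate  -- 9: nowhere near the real 15 rows for 'A'
from dual;
```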

To solve this problem, Oracle has the capability to collect statistics for individual columns.
You can use the DBMS_STATS.GATHER_TABLE_STATS command to collect stats on the column (https://www.dba-oracle.com/t_histograms.htm).

The below statement will collect stats on the COL2 column of A_TABLE in the OE schema with 20 buckets.  These column stats are histograms of the distribution of values.

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS (
    OWNNAME    => 'OE',
    TABNAME    => 'A_TABLE',
    METHOD_OPT => 'FOR COLUMNS SIZE 20 COL2');
END;
/

Two types of HISTOGRAM are available in Oracle, FREQUENCY and HEIGHT.
Oracle uses HEIGHT based histograms *only* when the number of distinct values of a column is greater than the number of buckets specified.
“In a height-balanced histogram, the column values are divided into bands so that each band contains approximately the same number of rows. The useful information that the histogram provides is where in the range of values the endpoints fall.”

Oracle will use FREQUENCY based histograms by default when the number of distinct values of a column is less than or equal to the number of buckets specified.
“In a frequency histogram, each value of the column corresponds to a single bucket of the histogram.”

(Both quotes are from the Oracle documentation: https://download.oracle.com/docs/cd/B19306_01/server.102/b14211/stats.htm)

It is possible to view the collected histogram for a column:

SELECT column_name, num_distinct, num_buckets, histogram
FROM USER_TAB_COL_STATISTICS
WHERE table_name = 'A_TABLE' AND column_name = 'COL2';
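It is also possible to look at the individual buckets via USER_TAB_HISTOGRAMS (for a frequency histogram, ENDPOINT_NUMBER is a cumulative row count per distinct value):

```sql
-- Inspect the histogram buckets for the column.
-- Note: for character columns the raw ENDPOINT_VALUE is a numeric
-- encoding; ENDPOINT_ACTUAL_VALUE (where populated) is more readable.
SELECT endpoint_value, endpoint_number
FROM   USER_TAB_HISTOGRAMS
WHERE  table_name = 'A_TABLE' AND column_name = 'COL2'
ORDER  BY endpoint_number;
```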