This blog contains experience gained over the years of implementing (and de-implementing) large scale IT applications/software.

Disaster Recovery – Where To Start

Perhaps IT implementors think that Disaster Recovery is a dark art, or maybe it’s just another item to de-scope from the implementation project due to time restrictions or budget cuts.
But I find that it’s coming up more and more as part of the yearly financial system audits (the ones we love to hate!).
Not only is it an action on the main financial systems, but the auditors are training their sights on the ancillary systems too.

So where do I normally start?
Usually it’s not a case of where I would like to start, it’s where I *have* to start due to missing system documentation and configuration information.
Being a DBA, I’m responsible for ensuring the continuity of business with respect to the database system and the preservation of the data.
So no matter what previous work has been completed to provide a DR capability, or an HA capability (the two are so often confused), it’s useless without any documentation or analysis to show what the tolerances are: how much data you can afford to lose (the Recovery Point Objective, RPO) and how long it will take to get a system back (the Recovery Time Objective, RTO).

First things first, ensure that the business recognises that DR is not just an IT task.
You should point them at the host of web pages devoted to creating the perfect Business Continuity Plan (BCP).
The common theme is that a BCP should include what *the business* intends to do in a disaster.  This more often than not involves a slight re-work of the standard business processes.  Whether that means falling back to a temporary paper-based system or just using a cut-down process, the business needs a plan.
It’s not all about Recovery Point Objective and Recovery Time Objective!
A nice “swim lane” diagram will help a lot when picturing the business process.
You should start at the beginning of your business process (normally someone wants to buy something), then work through to the end (your company gets paid).  Add all the steps in between and overlay the systems used.  Then imagine those systems were not present: how would you work around them?

Secondly, you need to evaluate the risks to the systems you support and identify the points of failure.
To be able to do this effectively, you will need to know your systems inside out.
You should be aware of *all* dependencies (e.g. interfaces) especially with the more complex business systems like ERP, BPM, BI, middleware etc.
Start by creating a baseline of your existing configuration:
– What applications do you have?
– What versions are they?
– How are they linked (interfaces)?
– Who uses them (business areas)?
– What type of business data do they store (do they store data at all)?

Third, you should make yourself aware of the hardware capabilities of important things like your storage sub-system (SAN, NAS etc.).  Can you leverage these devices to make yourself better protected and prepared?
An overall architecture diagram speaks a thousand words.

Once you understand what you have got, only then can you start to formulate a plan for the worst.
Decide what “the worst” actually is!  I’ve worked for companies that are convinced an aeroplane would decimate the data centre at any moment.  All the prep-work in the world couldn’t prevent it from happening, but maybe you can save the data (if not yourself 😉 ).

Report All Oracle User Accounts Through OEM Grid Control

You know the problem: You have Oracle Enterprise Manager Grid Control 10g installed and all your databases are configured in it.
It allows you to manage your entire landscape.  Great!

All of a sudden, a “leaver form” from HR lands on your desk with a long list of people who have left the company recently.
Do they have accounts in your Oracle databases?
If you have only one or two Oracle databases to check, then you can use the power of Grid Control to sort out the list in a few minutes.
What if you’ve got 50+ databases?
What if you’ve got 100+ databases?
Don’t they say: “The little jobs are the most time consuming”?

You may maintain an external spreadsheet of Oracle accounts, or you may not.  Either way, this article will show you how to configure an Oracle Enterprise Manager Grid Control 10g report that will show you all user accounts across the landscape at the click of a button.  You could extend this technique to almost anything that can be done in a SQL or PL/SQL session.

Architecture overview:

We will be creating a new OEM Grid Control job called LIST_USER_ACCOUNT_DETAILS, and a new report also called LIST_USER_ACCOUNT_DETAILS.
The job is executed on the monitored database targets and stores its output in the MGMT$JOB_STEP_HISTORY table in the OEM repository database.
The report simply pulls the job output back out of the repository and splits it into readable rows.

In OEM Grid Control create a new job called “LIST_USER_ACCOUNT_DETAILS”.
Add the database targets or target group.

Set the SQL script to be:

SET HEAD OFF;
SET PAGESIZE 9999;
SET LINESIZE 999;
SET TRIMOUT ON;
SET TRIMSPOOL ON;
SET FEEDBACK OFF;
SET COLSEP ' ';
COL MARKER FORMAT A2;
COL USERNAME FORMAT A30;
COL ACCOUNT_STATUS FORMAT A25;
COL LOCK_DATE FORMAT A12;
COL EXPIRY_DATE FORMAT A12;
COL HAS_REMOVED_ROLE FORMAT A30;
COL HAS_DBA_ROLE FORMAT A20;
WHENEVER SQLERROR EXIT FAILURE;
SELECT
'@|' marker,
username||'|' username,
account_status||'|' account_status,
'|'||TO_CHAR(lock_date,'DD-MM-RRRR')||'|' lock_date,
'|'||TO_CHAR(expiry_date,'DD-MM-RRRR')||'|' expiry_date,
'|'||DECODE((SELECT 'X' FROM dba_role_privs WHERE grantee = username AND granted_role ='REMOVED_ACCOUNTS'),'X','HAS ROLE "REMOVED_ACCOUNTS"','')||'|' has_removed_role,
'|'||DECODE((SELECT 'X' FROM dba_role_privs WHERE grantee = username AND granted_role ='DBA'),'X','HAS DBA ROLE!!','') has_dba_role
from dba_users;

Set the Parameters to be “-s” (silent).

Set the Credentials to use an account capable of querying DBA_USERS and DBA_ROLE_PRIVS (you could change this to use the ALL_ views and a lower privileged account).
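
Alternatively, rather than re-writing the script against the ALL_ views, you could grant a dedicated low-privileged account direct SELECT on just the two dictionary views the job needs. A minimal sketch (the account name and password here are only examples):

-- Hypothetical low-privileged account for the LIST_USER_ACCOUNT_DETAILS job
CREATE USER oem_list_users IDENTIFIED BY "Change_Me_1";
GRANT CREATE SESSION TO oem_list_users;
GRANT SELECT ON sys.dba_users TO oem_list_users;
GRANT SELECT ON sys.dba_role_privs TO oem_list_users;
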
Set the schedule for the job to however frequently you wish to query the account details.

Submit the job for execution.

Once the job runs, the job output will be visible in the OEM MGMT$JOB_STEP_HISTORY table.
The actual SQL*Plus output is stored in the OUTPUT column, which is a CLOB field.
We have had to insert a marker in the SQL so that the output contains a ‘@|’ at the beginning of each line. This is used to determine the start of each record in the OUTPUT CLOB.
We just need to write a report to pull out the details.
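
Before building the report, it’s worth a quick sanity check that the job output has actually landed in the repository. A simple sketch, run against the repository database:

-- Spot-check the raw job output for one execution (run as a repository user that can see the MGMT$ views)
SELECT target_name,
       TO_CHAR(end_time,'DD-MON-RRRR HH24:MI:SS') job_time,
       output
FROM   mgmt$job_step_history
WHERE  job_name LIKE 'LIST_USER_ACCOUNT_DETAILS.%'
AND    step_name = 'Command'
AND    ROWNUM = 1;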

Create a new report called “LIST_USER_ACCOUNT_DETAILS”.
On the “General” tab, select the “Use the specified target” option and enter the hostname of your OEM Grid Control repository server (DB server).
Select the “Run report using target privileges of the report owner (SYSMAN)” tick box.

On the “Elements” tab, add a “Table from SQL” element, set the header, then paste in the statement below:

SELECT TARGET_NAME,JOB_TIME,ACCOUNT_DETAILS FROM (
SELECT '===============' TARGET_NAME,'================' JOB_TIME,'==================================================' ACCOUNT_DETAILS FROM DUAL
UNION
SELECT
TARGET_NAME,
TO_CHAR(END_TIME,'DD-MON-RRRR HH24:MI:SS') JOB_TIME,
-- non-regexp option -- TO_CHAR(REPLACE(SUBSTR(mh.OUTPUT,INSTR(mh.OUTPUT,'@| ',1,ct.x),INSTR(mh.OUTPUT,'@| ',1,ct.x+1) - INSTR(mh.OUTPUT,'@| ',1,ct.x)),'@| ','')) ACCOUNT_DETAILS
TO_CHAR(REGEXP_SUBSTR(mh.OUTPUT,'[^@]+', 1,ct.x)) ACCOUNT_DETAILS
FROM MGMT$JOB_STEP_HISTORY mh,
(SELECT rownum x FROM ALL_OBJECTS WHERE rownum<=5000) ct
WHERE TARGET_TYPE='oracle_database'
AND JOB_OWNER='SYSMAN'
AND JOB_NAME LIKE 'LIST_USER_ACCOUNT_DETAILS.%'
AND STEP_NAME='Command'
)
WHERE ACCOUNT_DETAILS IS NOT NULL
ORDER BY TARGET_NAME, JOB_TIME;

NOTE: In the report SQL, we have assumed a maximum of 5000 lines of output (i.e. user accounts) per job execution on each database target.
If you exceed this (if you have a lot of user accounts) then you will need to increase the number.
We have also assumed that SYSMAN is the owner of the job that created the output. You should change this if necessary.
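
If you would rather not depend on ALL_OBJECTS containing at least 5000 rows, a CONNECT BY row generator does the same job. A sketch of the alternative inline view used to drive the REGEXP_SUBSTR calls:

-- Alternative row generator: swap this in for the ALL_OBJECTS inline view in the report SQL
SELECT LEVEL x
FROM   dual
CONNECT BY LEVEL <= 5000;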

You can choose to schedule the report if you wish (you should take into account the source job schedule).
It's probably best to secure the report so that only administrators can see it.

Test the report.

The output format consists of the following columns:
 TARGET NAME
 JOB TIME
 ACCOUNT DETAILS {
                 USERNAME
                 ACCOUNT STATUS
                 LOCK DATE
                 EXPIRY DATE
                 HAS REMOVED ROLE
                 HAS DBA ROLE
                }

The "HAS REMOVED ROLE" column relates to a "flag" role that I use to denote that an Oracle account has been de-activated purposely.
Instead of the account being deleted, it is retained for audit purposes and the REMOVED_ACCOUNTS role granted to it. You can choose to ignore or adapt this column.
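
For reference, a minimal sketch of that flag-role approach (the role name matches the one used above; the username is made up):

-- One-off: create the flag role used to mark purposely de-activated accounts
CREATE ROLE removed_accounts;

-- When a leaver is processed: keep the account for audit, but lock it, expire it and tag it
ALTER USER jbloggs ACCOUNT LOCK PASSWORD EXPIRE;
GRANT removed_accounts TO jbloggs;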

Once you have the output, you can extract it to Excel for comparison with an Active Directory dump or any other means.

Optimal Oracle RMAN Backup Command Options


Just a very short post about an optimal Oracle RMAN backup command run block.
Here’s what *I* think provides an optimal backup for a generic setup.

# Check that our existing files are available before we delete anything obsolete.
crosscheck archivelog all;
crosscheck backup;

# Delete any obsolete files before we start so that we have max space available.
# I have “REDUNDANCY 1” set so that I retain the previous backup until the next backup is completed successfully (you need double disk/tape space available for this).
delete noprompt obsolete;

# Do the DB and ARCH log backups but put each datafile into its own piece file.
# I prefer this because if I ever need a single datafile restored from tape, I only need to fetch the one piece that contains that datafile, saving tape restore time.
# For Oracle 9i remove the ‘as compressed backupset’ option.
backup as compressed backupset database filesperset 1 plus archivelog filesperset 1;

# Now only delete archive logs that are in two separate backups.  This is critical.
delete noprompt archivelog all backed up 2 times to disk;

# Delete the previous night’s backup from disk (obviously you need the space to be able to store two backups on disk).
delete noprompt obsolete;

# Finally, always a good idea to keep a list of datafiles and temp files that existed at the time of the backup.
report schema;

Make sure that you always save the RMAN backup logs.
I’ve found that they can be very useful during a restore scenario.
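
One way to do that is to let RMAN write the log itself with SPOOL LOG (the path here is just an example):

# At the top of the backup script, before the crosscheck commands:
spool log to 'd:\rman_logs\daily_backup.log' append;

# ... backup commands as above ...

# At the very end of the script:
spool log off;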

As a tip, you can also run a daily job to list what backups are required to restore a database using the “RESTORE DATABASE PREVIEW SUMMARY;” command.

SAP BI – ORA-04031 – Oracle 10.2.0.5 – Bug 9737897 – End Of.


If you use SAP BI 7.0 on an Oracle 10.2.0.5 64bit database you may experience this issue.
Our BI system performs the delta loads for FI data overnight.
These are large loads of records into the cubes.

The problem is Oracle error 04031:
----------------------------------------------------------------------------------
Mon Jul 11 00:46:26 GMT Daylight Time 2011
Errors in file j:\oracle\bwp\saptrace\usertrace\bwp_ora_3716.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 4120 bytes of shared memory ("shared pool","select name,online$,contents...","Typecheck","kgghteInit")

ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 4120 bytes of shared memory ("shared pool","select name,online$,contents...","Typecheck","kgghteInit")

ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 4120 bytes of shared memory ("shared pool","select name,online$,contents...","Typecheck","kgghteInit")

Mon Jul 11 00:46:26 GMT Daylight Time 2011
Errors in file j:\oracle\bwp\saptrace\background\bwp_smon_8756.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 4120 bytes of shared memory ("shared pool","select file#, block#, ts# fr...","Typecheck","seg:kggfaAllocSeg")

Mon Jul 11 00:46:27 GMT Daylight Time 2011
Errors in file j:\oracle\bwp\saptrace\background\bwp_smon_8756.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 4120 bytes of shared memory ("shared pool","select file#, block#, ts# fr...","Typecheck","seg:kggfaAllocSeg")

----------------------------------------------------------------------------------


Once it hits this problem, it causes the rest of the chain to fail on the index rebuilds.
Sometimes the user queries also fail in the BEx Web front-end, and the user sees errors similar to those shown above.

Using V$SGASTAT I was able to take snapshots of the Shared Pool usage via a Windows scheduled task.
I traced the problem to the memory allocation of “obj stat memo” in the Shared Pool of the database.
It’s constantly growing, and although the “free memory” reduces slightly, it doesn’t seem to correlate to the increase in “obj stat memo” usage.
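
The snapshot query itself was nothing special; something along these lines (a sketch, spooled to a file by the scheduled task):

-- Timestamped snapshot of the shared pool components of interest
SELECT TO_CHAR(SYSDATE,'DD-MON-RRRR HH24:MI:SS') AS snap_time,
       name,
       ROUND(bytes/1024/1024,1) AS size_mb
FROM   v$sgastat
WHERE  pool = 'shared pool'
AND    name IN ('obj stat memo','free memory')
ORDER BY name;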

So the next step was looking on Oracle Support.
I found bug 9737897, which is valid *up to* 10.2.0.4, but it does not state that the problem has actually been fixed in 10.2.0.5.
The bug description says: “An SGA memory leak is possible for “obj stat memo” and “obj htab chun”
memory if a segment is dropped and recreated many times.”

The bug note recommends querying x$ksolsfts (see https://yong321.freeshell.org/computer/x$table.html for background on this table).
The following SQL is recommended by Oracle:

select distinct fts_objd, fts_objn from x$ksolsfts x
where not exists (select 1 from obj$ o
where o.dataobj#=x.fts_objd and o.obj#=x.fts_objn);

The note says that a high number of records returned could indicate that you are seeing the symptoms of the bug.
I was seeing over 160,000 records in a database that had been restarted 24 hours prior.

I produced a lovely Excel chart showing the snapshot times against the memory use of the “obj stat memo” component.
The chart showed significant increases (step ups) of approximately 10MB at specific points in time.
I was then able to search SM37 using the “extended” option, to show background jobs active during the memory “step ups” identified by the chart.
Generally I saw 10-20 jobs active at the time, but the correlation showed that more often than not, the “step ups” were at near enough the same time as a BI_PROCESS_INDEX job.

Looking in the background job log, I was able to identify the process chain and the type of index/load that was being performed.
The indexes in question were FI cube related. They were quite large indexes and they were bitmap indexes (“CREATE BITMAP INDEX…”).
Each night these indexes got re-built after a load.
The re-build command was visible in the background job log and included “COMPUTE STATISTICS”.

So I had successfully identified segments that were being dropped and re-created regularly.
This matched the Oracle bug description.

At the bottom of the bug note, it states:

“Note:
If you suffer “obj stat memo” memory leak problem in 10.2.0.5, you can try
_object_statistics = false as a workaround, or following init parameter to see
if it solves problem.
_disable_objstat_del_broadcast=false”

So what does SAP say about this?
Well the following SAP notes were/are available:
1120481 – ORA-4031 – memory allocation errors for object-level stat
1070102 – High memory allocation by ‘obj stat memo’
869006 – Composite SAP note: ORA-04031

The only note of relevance is 1120481, which has a short reference to Oracle 10.2.0.4.
The other notes tend to point the reader at just blindly increasing the Shared Pool. I tried this; it didn’t work when 900MB was already being used by “obj stat memo”!

So, stuck with only one possible solution (other than upgrading to 11g), I can confirm that setting “_object_statistics” to FALSE and bouncing the database solved the problem. It completely removes the use of the “obj stat memo” memory area (it’s no longer visible in V$SGASTAT).
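
The workaround, and a quick check after the restart (it’s a hidden parameter, so the usual caveats apply):

-- The workaround: disable object-level statistics collection (requires a database bounce)
ALTER SYSTEM SET "_object_statistics" = FALSE SCOPE = SPFILE;

-- After the restart, the "obj stat memo" component should no longer appear in V$SGASTAT
SELECT pool, name, bytes
FROM   v$sgastat
WHERE  name = 'obj stat memo';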

When Should I Pay For Oracle Diagnostics/Tuning Pack in 10G/11G


If you are unsure about which actions will result in you having to pay for a license for the Oracle Diagnostic Pack or Tuning Pack (or other options, come to think of it), then you could use the query below to determine the check conditions:

SELECT NAME, DESCRIPTION FROM WRI$_DBU_FEATURE_METADATA;

This will output the name of the feature and the description of the check that is performed.
You could also see the check “code” if you include the USG_DET_LOGIC column in the SELECT.
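
For example, a sketch that includes the check logic and narrows the list down to the Workload Repository / AWR related features:

SELECT NAME, DESCRIPTION, USG_DET_LOGIC
FROM   WRI$_DBU_FEATURE_METADATA
WHERE  UPPER(NAME) LIKE '%AWR%'
OR     UPPER(NAME) LIKE '%WORKLOAD REPOSITORY%'
ORDER BY NAME;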

In 10G it is immediately possible to see that the AWR is only classed as used if “At least one Workload Repository Report has been created by the user”.
In 11G, AWR is classed as used if “At least one AWR Baseline has been created by the user” OR “At least one AWR Baseline Template has been created by the user”.

The AWR is licenseable (https://www.orafaq.com/wiki/AWR).

So does this mean you can select from those views? Well Oracle do ask you to in some support situations. So maybe it is reasonable to assume so.
The Oracle 11G Diagnostics Pack doc https://www.oracle.com/technetwork/oem/grid-control/overview/diagnostics-pack-11gr2-datasheet-134502.pdf states otherwise: “These features may be accessible through Oracle Enterprise Manager, Oracle SQL Developer and APIs provided with Oracle Database software. The use of these and other features described in the product licensing documentation requires licensing of the Oracle Diagnostic Pack regardless of the access mechanism.”

The Oracle 11G Tuning Pack *does not* say the same: https://www.oracle.com/technetwork/oem/grid-control/overview/tuning-pack-11gr2-datasheet-134450.pdf
But, the tuning pack APIs use features included in the Diagnostic Pack.  Chicken and egg.

I guess it’s open to interpretation. Better stay safe and ensure that you don’t hit any of the checks.