This blog contains experience gained over the years of implementing (and de-implementing) large-scale IT applications/software.

RMAN 10.2 Block Corruption Checking – Physical, Logical or Both

It’s an old topic, so I won’t dwell on the actual requirements or the process.

However, what I was not certain about was whether RMAN in 10.2 (10gR2) would perform both physical *and* logical corruption checking if you use the command:

RMAN> BACKUP VALIDATE CHECK LOGICAL DATABASE;

I kept finding various documents with wording like that found here: https://docs.oracle.com/cd/B19306_01/backup.102/b14191/rcmbackp.htm#i1006353

“For example, you can validate that all database files and archived redo logs can be backed up by running a command as follows:

RMAN> BACKUP VALIDATE DATABASE ARCHIVELOG ALL;

This form of the command would check for physical corruption. To check for logical corruption,

RMAN> BACKUP VALIDATE CHECK LOGICAL DATABASE ARCHIVELOG ALL;"

It took a while, but I found the original document from Oracle here: https://docs.oracle.com/cd/B19306_01/backup.102/b14191/rcmconc1.htm#i1008614

Right at the bottom, it confirms that ordinarily “BACKUP VALIDATE DATABASE;” would check for physical corruption.
The additional keywords “CHECK LOGICAL” will check for logical corruption *in addition* to physical corruption.

So RMAN doesn’t need to be run twice, once for each type of corruption check; a single command covers both.
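As a quick sketch (assuming you also want the archived redo logs included, as in the documented example), the single combined check would be:

RMAN> BACKUP VALIDATE CHECK LOGICAL DATABASE ARCHIVELOG ALL;

Any corrupt blocks found by the validation are recorded in V$DATABASE_BLOCK_CORRUPTION, which you can then query from SQL*Plus:

SQL> SELECT * FROM V$DATABASE_BLOCK_CORRUPTION;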

DBCLONE Is Still Running, Running & Running Running…

Scenario: You’re running through an upgrade of SAP on Oracle, either applying an EHP or a support package stack update.  You’re using the “Standard” downtime-minimized approach and you’ve got to the SUM stage MAIN_SHDCRE/SUBMOD_SHDDBCLONE/DBCLONE and it has just bailed out!

During the DBCLONE step, a number of background jobs are created that copy certain tables, program sources, etc., from the current SAP database schema to the shadow instance schema (SAPSR3SHD on Oracle).
The copies of the tables, sources, etc., are placed into the new tablespace that you were asked to create in the earlier steps.

During this copy process, the database will be generating a lot of redo information (it is performing a lot of INSERTs).  This means it will also be generating a lot of archive logs.  Most systems are in archive log mode by default, as this is the safest way of upgrading a production system.
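If you want to confirm the archiving mode before you start, a quick check from SQL*Plus is:

SQL> SELECT log_mode FROM v$database;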

The DBCLONE step can take a long time depending on a few factors:

  • Size of your SAP system master data.  Since the transactional data is not copied, most SAP systems will be roughly the same sort of size for the master data tables and sources, etc. (e.g. tables D010TAB, D010INC, REPOSRC).  Don’t forget, once the tables are cloned, it needs to build the required indexes on those tables too!  Then it will gather stats on the tables and indexes.
  • Quality of your database.  If your Oracle database is highly fragmented, the indexes are not in good shape, or there is a lack of memory allocated to the database, then the cloning will take longer.

  • Redo disk write times.  The faster the write times for redo, the quicker this is going to go.
  • Number of parallelised jobs.  The SUM tool recommends 3 jobs in parallel.  Prior to this SUM step, you would have been asked to configure the number of parallel jobs (and also your number of background work processes).  If you configure fewer than 3, it will take longer.  I would personally recommend having n+3, where n = your normal production number of background work processes.  This means you will not be hampering day-to-day usage by blocking background jobs.  The 3 jobs are created with high priority (Class A), so they get all the background processing they need.
  • Whether you elected to pre-size the new shadow tablespace data files.
    Setting them to autoextend is fine, but by default the SAP brspace commands create the files with only 200MB.  By setting these files to be as big as they need to be (no autoextend), you will save time (see the sketch after this list).
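As a sketch of what pre-sizing might look like (the data file path and the 20000M target size here are purely illustrative; use the file names and sizes from your own system):

SQL> ALTER DATABASE DATAFILE '/oracle/DB1/sapdata5/sr3731_1/sr3731.data1' RESIZE 20000M;
SQL> ALTER DATABASE DATAFILE '/oracle/DB1/sapdata5/sr3731_1/sr3731.data1' AUTOEXTEND OFF;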

During the DBCLONE step, the SUM tool monitors progress via an RFC connection into the SAP system.  It checks to see when the DBCLONE background jobs complete (and that they complete successfully).
If you have limited space available in your archive log area and this fills up, then the RFC connection from SUM stops working (archiver stuck issue).
This causes SUM to report that the step has aborted, even though DBCLONE is still running.
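If your archive logs go to the flash recovery area (an assumption; adjust accordingly if you archive to a plain file system destination), a quick sketch to see how full it is:

SQL> SELECT name, space_limit/1024/1024 AS mb_limit, space_used/1024/1024 AS mb_used FROM v$recovery_file_dest;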

You will still see DBCLONE running in the background once you resolve the archiver stuck issue.
At this point, you could choose to manually cancel the jobs via “Cancel without Core” in SM50 for the busy background work processes where DBCLONE is running.  However, since the jobs are still running, simply waiting until they have finished and then restarting SUM will continue from where it left off.

SUM knows where it got to because it records the tables already cloned in the SAPSR3.PUTTB_SHD table by setting the CLONSTATE column to ‘X’.
During the cloning process, the tables to be cloned are assigned to the background jobs using the same CLONSTATE column in SAPSR3.PUTTB_SHD.

You can monitor the cloning progress by using the following SQL:

SQL> select count(*) num, clonstate from SAPSR3.PUTTB_SHD group by clonstate;

You will notice that the CLONSTATE column will contain:
‘X’  – Tables cloned.
‘1’  – Tables on the work list of background job DBCLONE1.
‘2’  – Tables on the work list of background job DBCLONE2.
‘n’  – Tables on the work list of background job DBCLONEn.
‘  ‘  – Tables not being cloned.

As tables are cloned, the CLONSTATE changes from ‘n’ to ‘X’.
It seems that larger tables are processed first.

The method used to clone the tables is: “INSERT into <table> SELECT (<fields>) FROM <source>;”.
Then a “CREATE INDEX” is performed.
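As a purely illustrative sketch of the pattern (the table, index name, column list and tablespace below are my own example and not the exact statements SUM generates):

SQL> INSERT INTO "SAPSR3SHD"."REPOSRC" SELECT * FROM "SAPSR3"."REPOSRC";
SQL> CREATE UNIQUE INDEX "SAPSR3SHD"."REPOSRC~0" ON "SAPSR3SHD"."REPOSRC" ("PROGNAME", "R3STATE") TABLESPACE PSAPSR3731;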

It’s also worth noting that you may need enough PSAPTEMP space to account for the index creation.
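A quick sketch to see how much temp space is currently allocated (this just sums the temp files; whether it is sufficient depends on your largest index builds):

SQL> SELECT tablespace_name, SUM(bytes)/1024/1024 AS mb FROM dba_temp_files GROUP BY tablespace_name;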
In a Solution Manager 7.1 SPS10 system, there are 13109 tables to be cloned.

As a final parting comment, if you have Oracle 11gR2, you should consider the database compression options available to you.  Reducing the I/O requirements will massively help speed up the process.
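As one possible sketch (assuming you are licensed for Advanced Compression and that the shadow tablespace is called PSAPSR3731; both are assumptions), you could set a default compression attribute on the shadow tablespace before the clone starts, so that newly created tables inherit it:

SQL> ALTER TABLESPACE PSAPSR3731 DEFAULT COMPRESS FOR OLTP;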

HowTo: Read ST03 IO Redo Log Per Hour, Log Switches in SAP

Within the SAP ST03 transaction, the analysis view “Wait Event Analysis -> IO Redo Log Per Hour” or “Redo Log Switches” (from SAPKB70029 onwards) is able to show you the Oracle redo log switch measure.
[Screenshot: SAP Oracle IO Redo Log Per Hour]
You will need to adjust the “Minimum Time Between Switches[sec]” and “Maximum Time Between Switches [sec]” values, then click the refresh button.
The results tell you, for each hour period, the number of times a redo log switch occurred where the time between switches was within your defined range.
As an example, the screenshot above shows that on 29-11-2013 between 04:00 and 05:00, a redo log switch occurred on only 1 occasion where the time between one log switch and the next was within 120 seconds (2 minutes).
It’s difficult to say whether too many log switches is an actual problem for your specific database, but I would tend to investigate any database where the logs switch within 60 seconds of each other on more than a couple of occasions a day.
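If you prefer to cross-check the figures directly in the database, a simple sketch against V$LOG_HISTORY (which only keeps a limited history, so older hours may be missing) is:

SQL> SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour, COUNT(*) AS switches FROM v$log_history GROUP BY TO_CHAR(first_time, 'YYYY-MM-DD HH24') ORDER BY 1;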

Solaris 10 – Asynch I/O on UFS – Not Really

Whilst investigating a performance issue on an Oracle 10.2.0.5 database running on Solaris 10, I had to do a little digging to discover exactly how asynchronous I/O is implemented within Oracle when the database is using the UFS filesystem.

It was not as straightforward as I thought, because I had to start from the beginning and not trust any current Oracle settings.  For example, FILESYSTEMIO_OPTIONS was set to “ASYNCH” in the spfile.
This would normally indicate that a DBA has specifically determined that asynchronous I/O is possible and that they do not want to use DIRECTIO (or concurrent I/O).
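For reference, a quick sketch to see what is currently set (run from SQL*Plus):

SQL> show parameter filesystemio_options
SQL> show parameter disk_asynch_io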

Yet, reading the SAP notes, you would think that this is a moot point.
SAP note 830576 “Parameter recommendations for Oracle 10g” v227 clearly states that Oracle 10.2 on Solaris with UFS filesystems supports asynchronous I/O, and therefore recommends setting FILESYSTEMIO_OPTIONS to “SETALL”.

However, if you have access to “My Oracle Support” you can check Oracle document “SOLARIS: Asynchronous I/O (AIO) on Solaris (SPARC) servers (Doc ID 48769.1)”, which explains that, on Solaris with UFS, Oracle does not actually perform asynchronous I/O at the kernel (Solaris) level.  Instead, Solaris simulates AIO by performing parallel writes (pwrite() calls), but to Oracle it still looks and feels like KAIO.
So the details in the SAP note are not exactly accurate.  Kernel-level AIO is not supported on UFS, but is simulated, and the simulation simply issues more parallel I/O requests.

If your disk sub-system is slow, it’s still going to be slow, but Oracle writes might be slightly faster if you handle/tune for more parallel requests at the storage layer.

Here’s a nice article that explains it really well:  https://sunsite.uakom.sk/sunworldonline/swol-07-1998/swol-07-insidesolaris.html

SAP Netweaver 731 Oracle Create DB Statement

By default, when you use the Software Provisioning Manager (SWPM) to create a new NW731 Oracle database, it will generate and run an Oracle “CREATE DATABASE” statement as follows:

SQL> CREATE DATABASE DB1 CONTROLFILE REUSE 
MAXLOGFILES 255
MAXLOGMEMBERS 3
MAXLOGHISTORY 1000
MAXDATAFILES 1000
MAXINSTANCES 50
NOARCHIVELOG
CHARACTER SET UTF8
NATIONAL CHARACTER SET UTF8
DATAFILE '/oracle/DB1/sapdata1/system_1/system.data1' SIZE 350M REUSE AUTOEXTEND ON NEXT 20M MAXSIZE 10000M EXTENT MANAGEMENT LOCAL
DEFAULT TEMPORARY TABLESPACE PSAPTEMP TEMPFILE '/oracle/DB1/sapdata1/temp_1/temp.data1' SIZE 50M REUSE
AUTOEXTEND ON NEXT 20M MAXSIZE 10000M
UNDO TABLESPACE PSAPUNDO DATAFILE '/oracle/DB1/sapdata1/undo_1/undo.data1' SIZE 100M REUSE AUTOEXTEND ON NEXT 20M MAXSIZE 10000M
SYSAUX DATAFILE '/oracle/DB1/sapdata1/sysaux_1/sysaux.data1' SIZE 200M REUSE AUTOEXTEND ON NEXT 20M MAXSIZE 10000M
LOGFILE GROUP 1 ('/oracle/DB1/origlogA/log_g11m1.dbf') SIZE 200M  REUSE ,
GROUP 2 ('/oracle/DB1/origlogA/log_g12m1.dbf') SIZE 200M  REUSE ;


Notice that both the character set and national character set are UTF8.
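Once the database has been created, a quick sketch to confirm the character sets:

SQL> SELECT parameter, value FROM nls_database_parameters WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');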