This blog contains experience gained over the years of implementing (and de-implementing) large scale IT applications/software.

Analysing & Reducing HANA Backup Catalog Records

In honour of DBA Appreciation Day today 3rd July, I’ve written a small piece on a menial but crucial task that HANA database administrators may wish to check. It’s very easy to overlook but the impact can be quite amazing.

HANA Transaction Logging

In “normal” log mode (for recoverability), the HANA database, like Oracle, has an automatic transaction log backup process, which is responsible for backing up transaction log segments so that the HANA log volume disk space can be re-used by new transactions.
If there is no free disk space in the HANA log volume, the database will hang until free space becomes available.

SAP strongly recommends running your HANA database in log mode “normal”, since this offers point-in-time recovery through the use of the transaction log backups.

By default a transaction log backup will be triggered automatically by HANA every time a log segment becomes full or if the timeout for an individual service is hit, whichever of those is sooner.
This is known as “immediate” interval mode.

I’m not going to go into the differences of the various interval options and the pros and cons of each since this is highly scenario specific. A lot of companies have small HANA databases and are quite happy with the default options. Some companies have high throughput, super low latency requirements, and would be tuning the log backup process for maximum throughput, while other companies want minimal data-loss and adjust the parameters to ensure that transactions are backed up off the machine as soon as possible.

The SITREP

In the specific situation that I encountered, I have a small HANA database of ~200GB in memory, serving an SAP Solution Manager 7.2 system (so it has 2x tenant databases plus the SystemDB).

The settings are such that all databases run in log_mode “normal” with consolidated log backups enabled in “immediate” mode and a max_log_backup_size of 16GB (the default, but specified).

All backups are written to a specific disk area, before being pushed off the VM to an Azure Storage Account.

The Issue

I noticed that the local disk area where the HANA database backups are written was becoming quite full. Out of context, you might have put this down to an increase of activity in the system, but I knew that this system was not doing anything at all (it’s a test system for testing Solution Manager patches and nobody was using it).

What Was Causing the Disk Usage?

Looking at the disk backup file system at the O/S level, I could easily see that the HANA database log backups were the reason for the extra space usage.
Narrowing that down even further, I could be specific enough to see that the SYSTEMDB was to blame.

The SYSTEMDB in a very lightly used HANA database should not be transacting enough to have a day-to-day noticeable increase in log backup disk usage.
This was no ordinary increase!
I was looking at a total HANA database size on disk of ~120GB (SYSTEMDB plus 2x TenantDBs), and yet I was seeing ~200GB of transaction log backups per day from just the SYSTEMDB.

Drilling down further into the log backup directory for the SYSTEMDB, I could see the name of the log backup files and their sizes.
I was looking at log backup files of 2.8GB in size every ~10 to ~15 minutes.
The files that were biggest were….

… log_backup_0_0_0_0.<unix epoch time>
That’s right, the backup catalog backups!

Whenever HANA writes a backup, whether it is a complete data backup, or a transaction log backup, it also writes a backup of the backup catalog.
This is extremely useful if you have to restore a system and need to know about the backups that have taken place.
By default, the backup catalog backups are accumulated, which means that HANA doesn’t need to write out multiple backups of the backup catalog for each log backup (remember, we have 2x tenantDBs).

Why Were Catalog Backup Files So Big?

The catalog backups include the entire backup catalog.
This means every prior backup is in the backup file, so by default the backup catalog backup file will increase in size at each backup, unless you do some housekeeping of the backup catalog records.
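To put some rough numbers on that growth, here is a minimal Python sketch. It assumes a made-up, fixed size per catalog record (bytes_per_record) and exactly one new record between catalog backups; real record sizes vary, so treat the output as illustrative only.

```python
def catalog_backup_sizes(existing_records, new_backups, bytes_per_record):
    """Size in bytes of each successive catalog backup file.

    Each catalog backup contains every record written so far, so the
    n-th file holds existing_records + n records.
    """
    return [
        (existing_records + n) * bytes_per_record
        for n in range(1, new_backups + 1)
    ]


def total_bytes_written(existing_records, new_backups, bytes_per_record):
    """Total disk space used if none of the catalog backup files are removed."""
    return sum(catalog_backup_sizes(existing_records, new_backups, bytes_per_record))


if __name__ == "__main__":
    # Hypothetical figures: 1,000 old records, 3 more backups, 1 KB per record.
    print(catalog_backup_sizes(1000, 3, 1024))
    print(total_bytes_written(1000, 3, 1024))
```

Under these assumptions each file is a little bigger than the last, and the total written grows roughly with the square of the number of backups, which is why old catalog records are worth pruning.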

My task was to write some SQL to check the backup catalog to see how many backup catalog records existed, for what type of backups, in which database and how old they were.

I came up with the following SQL:

-- Breakdown of age of backup records in months, by type of record.
SELECT smbc.DATABASE_NAME,
       smbc.ENTRY_TYPE_NAME,
       MONTHS_BETWEEN(smbc.SYS_START_TIME, CURRENT_DATE) AS AGE_MONTHS,
       COUNT(MONTHS_BETWEEN(smbc.SYS_START_TIME, CURRENT_DATE)) AS RECORDS,
       t_smbc.YOUNGEST_BACKUP_ID
FROM "SYS_DATABASES"."M_BACKUP_CATALOG" AS smbc,
     -- Highest backup ID per database, backup type and age group.
     (SELECT xmbc.DATABASE_NAME,
             xmbc.ENTRY_TYPE_NAME,
             MONTHS_BETWEEN(xmbc.SYS_START_TIME, CURRENT_DATE) AS AGE_MONTHS,
             MAX(xmbc.BACKUP_ID) AS YOUNGEST_BACKUP_ID
      FROM "SYS_DATABASES"."M_BACKUP_CATALOG" xmbc
      GROUP BY xmbc.DATABASE_NAME,
               xmbc.ENTRY_TYPE_NAME,
               MONTHS_BETWEEN(xmbc.SYS_START_TIME, CURRENT_DATE)
     ) AS t_smbc
WHERE t_smbc.DATABASE_NAME = smbc.DATABASE_NAME
  AND t_smbc.ENTRY_TYPE_NAME = smbc.ENTRY_TYPE_NAME
  AND t_smbc.AGE_MONTHS = MONTHS_BETWEEN(smbc.SYS_START_TIME, CURRENT_DATE)
GROUP BY smbc.DATABASE_NAME,
         smbc.ENTRY_TYPE_NAME,
         MONTHS_BETWEEN(smbc.SYS_START_TIME, CURRENT_DATE),
         t_smbc.YOUNGEST_BACKUP_ID
ORDER BY DATABASE_NAME,
         AGE_MONTHS DESC,
         RECORDS;

The key points to note are:

  • I use the SYS_DATABASES.M_BACKUP_CATALOG view in the SYSTEMDB to see across all databases in the HANA system instead of checking in each one.
  • For each database, the SQL outputs:
    – type of backup (complete or log).
    – age in months of the backup.
    – number of backup records in that age group.
    – youngest backup id for that age group (so I can do some cleanup).
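
If it helps to see the grouping logic outside of SQL, the sketch below does the same counting in Python on a handful of made-up records. The age_in_months helper is my own whole-month approximation and may not match HANA’s MONTHS_BETWEEN exactly at month boundaries; the record tuples are hypothetical, not real catalog output.

```python
from collections import defaultdict
from datetime import date


def age_in_months(start, today):
    """Whole months between two dates (an approximation, see the note above)."""
    months = (today.year - start.year) * 12 + (today.month - start.month)
    if today.day < start.day:
        months -= 1
    return months


def summarise(records, today):
    """Group (database, entry_type, backup_id, start_date) tuples.

    Returns {(database, entry_type, age_months): (record_count, youngest_backup_id)}.
    """
    groups = defaultdict(lambda: [0, 0])  # [count, highest backup id]
    for database, entry_type, backup_id, start in records:
        key = (database, entry_type, age_in_months(start, today))
        groups[key][0] += 1
        groups[key][1] = max(groups[key][1], backup_id)
    return {key: tuple(value) for key, value in groups.items()}


if __name__ == "__main__":
    sample = [
        ("SYSTEMDB", "log backup", 10, date(2020, 1, 5)),
        ("SYSTEMDB", "log backup", 12, date(2020, 1, 20)),
        ("SYSTEMDB", "complete data backup", 20, date(2020, 6, 25)),
    ]
    for key, value in summarise(sample, date(2020, 7, 3)).items():
        print(key, value)
```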

An example execution is:

(NOTE: I made a mistake with the last column name, it’s correct in the SQL now – YOUNGEST_BACKUP_ID)

You can see that the SQL execution took only 3.8 seconds.
Based on my output, I could immediately see one problem: I had backup records from 6 months ago in the SYSTEMDB!

All of these records would be backed up on every transaction log backup.
For whatever reason, the backup process was not able to honour the “BACKUP CATALOG DELETE” which was meant to keep the catalog to less than 1 month of records.
I still cannot adequately explain why this had failed. The same process is used on other HANA databases and none had exhibited the same issue.

I can only presume something was preventing the deletion somehow, since in the next few steps you will see that I was able to use the exact same process with no reported issues.
For reference this is HANA 2.0 SPS04 rev47, patched all the way from SPS02 rev23.

Resolving the Issue

How did I resolve the issue? I simply re-ran the catalog deletion that was already running after each backup.
I was able to use the backup ID from the YOUNGEST_BACKUP_ID column to reduce the backup records.

In the SYSTEMDB:

BACKUP CATALOG DELETE ALL BEFORE BACKUP_ID xxxxxxxx

Then for each TenantDB (still in the SYSTEMDB):

BACKUP CATALOG DELETE FOR <TENANTDB> ALL BEFORE BACKUP_ID xxxxxxxx
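
If you end up doing this for several databases, a small helper can build the two statement forms shown above and reject obviously bad input. This is only a sketch: the function name is mine, the rule that database names are upper-case letters, digits and underscores is an assumption, and nothing here actually executes the SQL.

```python
import re

# Assumption: database names are upper-case letters, digits and underscores.
_DB_NAME = re.compile(r"[A-Z][A-Z0-9_]*")


def catalog_delete_statement(backup_id, tenant=None):
    """Build a BACKUP CATALOG DELETE statement for the SYSTEMDB or a tenant."""
    if isinstance(backup_id, bool) or not isinstance(backup_id, int) or backup_id <= 0:
        raise ValueError("backup_id must be a positive integer")
    if tenant is None:
        return f"BACKUP CATALOG DELETE ALL BEFORE BACKUP_ID {backup_id}"
    if not _DB_NAME.fullmatch(tenant):
        raise ValueError(f"unexpected database name: {tenant!r}")
    return f"BACKUP CATALOG DELETE FOR {tenant} ALL BEFORE BACKUP_ID {backup_id}"


if __name__ == "__main__":
    # TENANT1 is a hypothetical tenant name.
    print(catalog_delete_statement(1590747286179))
    print(catalog_delete_statement(1590747286179, "TENANT1"))
```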

At the end of the first DELETE execution *in the first Tenant*, I re-ran the initial SQL query to check and this was the output:

We now only have 1 backup record, which was the youngest record in that age group for that first tenant database (compare to screenshot of first execution of the SQL query with backup id 1,590,747,286,179).
Crucially, we have far fewer log backups for that tenant. We’ve gone down from 2247 to 495.
Nice!
I then progressed to do the delete in the SYSTEMDB and other TenantDB of this HANA system.

Checking the Results

As a final check, I was able to compare the log backup file sizes:

The catalog backup in file “log_backup_0_0_0_0.nnnnnnn” at 09:16 is before the cleanup and is 2.7GB in size.
Whereas the catalog backup in “log_backup_0_0_0_0.nnnnnnn” at 09:29 is after the cleanup and is only 76KB in size.
An absolutely massive reduction!

How do we know that file “log_backup_0_0_0_0.nnnnnnn” is a catalog backup?
Because we can check using the Linux “strings” command to see the file string contents.
Way further down the listing it says it is a catalog backup, but I thought it was more interesting to see the “MAGIC” of Berlin:

UPDATE: August 2020 – SAP note 2962726 has been released which contains some standard SQL to help remove failed backup entries from the catalog.

Summary

  • Check your HANA backup catalog backup sizes.
  • Ensure you have alerting on file systems (if doing backups to disk).
  • Double check the backup catalog record age.
  • Give tons of freebies and thanks to your DBAs on DBA Appreciation Day!

Useful Links

Enable and Disable Automatic Log Backup
https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.05/en-US/241c0f0020b2492fb93a69a40b1b1b9a.html

Accumulated Backups of the Backup Catalog
https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.05/en-US/3def15378b954aac85f2b93bb3f85a49.html

Log Modes
https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.05/en-US/c486a0a3bb571014ab46c0633224f02f.html

Consolidated Log Backups
https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.05/en-US/653b5c6d5f9d41808011a5bd0fac6709.html

Writing a Simple Backup Script


So you want a backup script to back up your databases to disk.
Sounds like a nice half-a-day scripting job, doesn’t it?
Let’s analyse this requirement a little more and you will start to get the idea that it’s never as simple as it sounds.

Requirements

Digging a little deeper we come up with the following requirements:

Backup Configuration

We need a standard backup location on all database servers.
When we back up a database we need a folder location on the server (unless you’re using a specific device/method like Tivoli, or backint for SAP).
If we are backing up to a local directory (let’s assume this) and the servers don’t all have the same drive/folder structure, then you will need to create a list of locations for each database and have the backup script look at the list or, worst case, hardcode the config into the script.
The easy way to solve this is to move the configuration out of the backup script altogether.
Most databases support the use of a predefined backup configuration.
For example, in ASE we can use “dump configurations”.
In HANA we adjust the ini file of the respective indexserver or nameserver to set the backup path locations.
Having done the above for each database, we can then “simply” pull the config out of the target database or, during the backup command execution, include the name of the config profile to use, which will dictate the target backup location.
We are going to ignore other issues, such as the size of the disk location, type of disk storage tier and other infrastructure related questions.
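
As a rough illustration of keeping the location in the database configuration rather than in the script, the Python sketch below reads the backup paths from HANA-style ini text. It assumes the paths live in the [persistence] section of global.ini under basepath_databackup and basepath_logbackup; the sample paths are made up, and a real system layers several ini files, so check your own configuration.

```python
import configparser

# Hypothetical ini content; on a real system this comes from global.ini.
SAMPLE_GLOBAL_INI = """
[persistence]
basepath_databackup = /hana/backup/data
basepath_logbackup = /hana/backup/log
"""


def backup_paths(ini_text):
    """Return the data and log backup paths found in the ini text (or None)."""
    # Interpolation is disabled so values such as $(DIR_INSTANCE) pass through untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["persistence"] if parser.has_section("persistence") else {}
    return {
        "data": section.get("basepath_databackup"),
        "log": section.get("basepath_logbackup"),
    }


if __name__ == "__main__":
    print(backup_paths(SAMPLE_GLOBAL_INI))
```

The backup script then only needs to know how to ask for the paths, not what they are on each server.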

Access to the Database

We need a way of pulling out the dump/backup path from the database.
As mentioned above, if we move the configuration of the backup location out of the script and into the database, we need to access it from the script.
This is definitely possible, but if it’s stored in the database (let’s assume this), at this point we have no way of logging into the database.
We therefore need a way of logging into the database in a standard way across all databases.
Therefore, we should elect to use a harmonised username for our backups (not a harmonised password!).

Multiple Scripts

We need more than one script.
Pulling out the configuration from the database is never easy due to the differences in the way that the command line interpreters work.
For example, in HANA 1.0, the hdbsql command outputs a slightly different format and provides less capability to customise the output, compared to the HANA 2.0 hdbsql command line tool.
SAP ASE is completely different and again needs a specific script setup.
The same applies to combinations of Linux and Windows systems. You may need some PowerShell in there!

Modularised Code Packages

We need it to be packaged and easy to read.
Based on the above, we can see that we will need more than one script to cater for different DB vendors and architectures/platforms.
Therefore, we need one script for backing up HANA databases and one for ASE databases.
What if you need a SQL Server one?  Well again, the DB call is slightly different, so another script is needed.
It’s possible to modularise the scripts in such a way that the DB call itself is a separate script for each DB, leaving your core script the same.
However, we are now into the realms of script readability and simplicity versus complexity and re-use.
But it’s worth considering the options within the limitations of the operating system landscape that you have.

Database Security

We need a secure method of storing the password for the harmonised backup user across all the databases.
Since we will need to log into the database to perform tasks such as getting the backup config settings and actually calling the backup commands, we need to think about how we will be storing the username and password.
In some cases, like HANA, we can just use the secure file system store (hdbuserstore) to add our required username and password.
There are some issues with certain versions of HANA (HANA 1.0 is different to HANA 2.0) in this area.
For ASE we may be able to use the sybxctrl binary to execute our script in an intelligent way, avoiding the need to pass passwords at all.
Recent versions of SAP ASE 16.0 SP03 include a new aseuserstore command, which I will highlight in another post. It’s very similar to the HANA hdbuserstore, except it allows ASE (for Business Suite) and ASE Enterprise Edition to use the same method of password-less access.
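
For the HANA case, the sketch below shows the general shape of a password-less call from Python: the script passes only the name of an hdbuserstore key to hdbsql, never a username or password. BACKUPKEY is a hypothetical key that must already exist in the store, the function names are mine, and the -x and -a output switches should be checked against the hdbsql help on your version.

```python
import subprocess


def hdbsql_command(store_key, sql):
    """Build an hdbsql call that authenticates via a secure user store key."""
    if not store_key or any(ch.isspace() for ch in store_key):
        raise ValueError("store_key must be a non-empty name without spaces")
    # -U: use the hdbuserstore key, -x: suppress extra output, -a: no column names.
    return ["hdbsql", "-U", store_key, "-x", "-a", sql]


def run_query(store_key, sql):
    """Run the query and return the output lines (requires hdbsql on the PATH)."""
    result = subprocess.run(
        hdbsql_command(store_key, sql),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Only prints the command; nothing is executed here.
    print(hdbsql_command("BACKUPKEY", "SELECT DATABASE_NAME FROM SYS.M_DATABASES"))
```

Passing the command as a list rather than a single string avoids shell quoting problems with the SQL text.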

Scheduling

We need a common backup scheduling capability across the servers.
You can use a central system (e.g. an enterprise scheduler), or something like cron or Windows Task Scheduler, but centrally controlled from a task controller script.
This will rely on either a shared storage ability (e.g. NFS/CIFS) or some method of the scripts talking centrally to a control plane.
Not only does the central location/control provide easy control of the schedule, but you will also find this useful for capturing error situations, where the backup script may have failed and needs to notify the operators.

Logging

We need the logging output of the scripts to be accessible.
When you are backing up over 100 databases, your administrators are not going to want to go onto each individual server to look at logs.
You will need a central location for logging and aggregation of that logging.
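
One simple way to approach this is for every server to write its log into a shared directory, with the host and database in the file name so that the logs can be aggregated later. The Python sketch below assumes such a shared directory is already mounted; the directory, the file naming scheme and the function name are all illustrative.

```python
import logging
import socket
from pathlib import Path


def build_logger(central_dir, database, hostname=None):
    """Return (logger, log_path) writing to one file per host and database."""
    hostname = hostname or socket.gethostname()
    log_path = Path(central_dir) / f"backup_{hostname}_{database}.log"
    logger = logging.getLogger(f"backup.{hostname}.{database}")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeat calls
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
    return logger, log_path


if __name__ == "__main__":
    # "." stands in for the shared log directory in this demo.
    log, path = build_logger(".", "SYSTEMDB")
    log.info("backup started")
    print(f"logging to {path}")
```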

O/S Users

We may need common O/S user accounts.
The execution environment of the script needs to include the capability to access the log areas.
The user account used to perform the execution of the script needs to have a common setup across all servers.
If you’re using CIFS or NFS for storing logs, you will have permissions issues if you use different users, unless you configure your NFS/SMB settings appropriately.
In Unix/Linux, it’s easy to create a specific user (could be linked into Windows Active Directory) with the same UID across many servers, or make the user a member of the same group.

Housekeeping

We need housekeeping for our logging capability.
During the execution of your scripts, the logs you generate will need to be kept according to your usual policies.
You may want to see a report of backups for the last month.
You may need to provide audit evidence that a backup has been performed.
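
A minimal housekeeping sketch in Python is shown below. It removes log files older than a chosen number of days based on their modification time; the 30-day figure and the *.log pattern are assumptions, and if you need audit evidence you would archive the files rather than delete them.

```python
import time
from pathlib import Path


def prune_old_logs(log_dir, max_age_days, now=None, pattern="*.log"):
    """Delete matching files older than max_age_days; return the removed names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    removed = []
    for path in sorted(Path(log_dir).glob(pattern)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed


if __name__ == "__main__":
    # "." stands in for the log directory; 30 days is an example retention.
    print(prune_old_logs(".", 30, pattern="*.backup-demo-log"))
```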

Disk Space

We need a defined amount of backup space.
If you are backing up to disk, you may need a way of calculating how much disk space you will need.
In HANA this is fairly easy as it provides views you can query to estimate the backup disk requirements prior to executing a backup.
You will need to call these and check against the target disk location, before your script starts the backup.
How will you account for additional space requirements over time?  Will you just fail or can you provide a warning?
How many backups or backup files will you retain on disk and over how many days?
Will these be removed by your script once they are no longer needed?
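
The space check itself can be very small. The Python sketch below compares the free space in the target location against an estimated backup size plus some headroom; the estimate is assumed to come from elsewhere (for example, from the database), and the 20% headroom is an arbitrary illustrative value.

```python
import shutil


def has_enough_space(target_dir, estimated_backup_bytes, headroom=1.2, free_bytes=None):
    """Return True if the target has room for the backup plus headroom.

    free_bytes can be supplied for testing; by default it is read from the
    file system that holds target_dir.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(target_dir).free
    return free_bytes >= estimated_backup_bytes * headroom


if __name__ == "__main__":
    # 50 GB is a made-up estimate; "." stands in for the backup directory.
    estimate = 50 * 1024 ** 3
    if has_enough_space(".", estimate):
        print("OK to start the backup")
    else:
        print("WARNING: not enough free space for the backup")
```

Returning a boolean rather than exiting lets the calling script decide whether to fail or just raise a warning.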

Backup Strategies

We should consider different backup strategies.
What type of backup will your script need to handle?
For example, with ASE or SQL Server, you may need to run transaction log backups as well as normal full backups.
Will this be the same script?  Can they run together at the same time?
If you are dumping to disk, performing a full backup once a day, then will you need those transaction log dumps from the previous day?
As well as performing a backup of the databases, your script should also back up the recommended configuration files.
For example on ASE, it is recommended to include the configuration file and also the dumphist file.
On HANA it is recommended to include the ini files and the backup.log, and it is useful to include the trace files.
Will your backup be encrypted and will you need to store the keys somewhere?

Validation

We may need backup validation.
Some databases provide post-backup validation, and some provide inline validation of the blocks.
Do you need to consider turning these checks on (most are turned off by default)?

Authenticity

We should consider backup file authenticity.
Do you need to know if the backup files have been tampered with?
Or maybe you just need to check that what was sent over the network to the target storage location is the exact same file that was originally created?
You may need to perform some sort of checksum on the original backup file to help establish and authenticate the backup files.
Ideally, this process should be the same for all databases.
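
Because a checksum works on the file rather than the database, the same code can serve every database type. Below is a minimal Python sketch using SHA-256; the choice of algorithm and the function names are mine, not a requirement of any of the databases mentioned.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(original_checksum, copied_path):
    """True if the copied file still matches the checksum taken at the source."""
    return sha256_of_file(copied_path) == original_checksum


if __name__ == "__main__":
    # Demo on this script itself; a real run would point at a backup file.
    checksum = sha256_of_file(__file__)
    print(checksum, verify_copy(checksum, __file__))
```

A checksum on its own only shows that the copy matches the original. To detect tampering, the checksum needs to be stored somewhere the backup files’ writer cannot change.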

Pre-Execution Checks

We should perform checks of the environment.
Before your script starts to run the backup, you may wish to include a common set of pre-checks.
The reason is that common issues can be integrated into the pre-checks over time.

Examples Include:

  • Are you running as the correct O/S user?
  • Do you have execution access to all required sub-scripts/log directories?
  • Is the type of target database that your script supports, installed?
  • Is the target database running?
  • Are any other required processes running (e.g. ASE backupserver)?
  • Is a backup script already executing?
  • Is the version of the database supported by your script?
  • Is the target backup destination available?  (e.g. file/folder location).
  • Is there enough disk space for your backup to complete?
  • Was the last backup a success?  If not, can you remove the previous dump files?
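
A couple of these pre-checks can be sketched in Python as follows. The lock file covers the “is a backup script already executing?” question, and the small runner collects the names of any failed checks so that new checks can be added over time. All names and paths are illustrative, and a real script would also need to deal with stale lock files left behind by a crash.

```python
import os


def acquire_lock(lock_path):
    """Atomically create a lock file; return False if one already exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # record who holds the lock
    return True


def release_lock(lock_path):
    """Remove the lock file if it is still there."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass


def failed_prechecks(checks):
    """checks is a list of (name, zero-argument callable returning True/False)."""
    return [name for name, check in checks if not check()]


if __name__ == "__main__":
    target = "."  # stands in for the backup destination
    checks = [
        ("target directory exists", lambda: os.path.isdir(target)),
        ("target directory writable", lambda: os.access(target, os.W_OK)),
    ]
    print(failed_prechecks(checks) or "all pre-checks passed")
```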

Once you’ve got all the above decided, then it will be a simple task of writing the script.

DB2 10.1 fp4 ICC Error

Scenario: After applying DB2 10.1 fp4 and trying to back up the database, we were seeing an ICC Error in the log:

2014-11-03-15.31.41.864652+000 I15689A2279 LEVEL: Error
PID : 18481304 TID : 1 PROC : nsrdb2ra
INSTANCE: db2sid NODE : 000
HOSTNAME: db01
EDUID : 1
FUNCTION: DB2 Common, Cryptography, cryptContextRealInit, probe:60
MESSAGE : ECF=0x90000403=-1879047165=ECF_CRYPT_UNEXPECTED_ERROR
Unexpected cryptographic error
DATA #1 : Hex integer, 4 bytes
0x00000002
DATA #2 : Hex integer, 4 bytes
0x00000002
DATA #3 : Hex integer, 4 bytes
0x00000000
DATA #4 : String, 32 bytes
Invalid data value: icclib.c:989
CALLSTCK: (Static functions may not be resolved correctly, as they are
resolved to the nearest symbol)
[0] 0x090000007ECEDB34 pdOSSeLoggingCallback + 0xC0
[1] 0x090000007DFB0224 oss_log__FP9OSSLogFacUiN32UlN26iPPc + 0x1C4
[2] 0x090000007DFB0024 ossLog + 0xC4
[3] 0x090000007F37A960 cryptLogICCErrorWithStatus + 0xF8
[4] 0x090000007F37A2D0 cryptContextRealInit + 0x544
[5] 0x090000007F37B160 cryptContextCheckAndInit + 0x58
[6] 0x090000007F37B5E4 cryptDHInit + 0x1BC
[7] 0x090000007F35E3C4 sqlexSlcServerEncryptAccsec + 0xD0
[8] 0x090000007F35E260 sqlexSlcServerEncryptAuthenticate__FP14db2UCinterfacelPUi + 0x130
[9] 0x090000007E50B258 sqlexAppAuthenticate__FP14db2UCinterface + 0x3550
[10] 0x090000007E6DCC80 sqljrDrdaArAttach__FP14db2UCinterface + 0x84
[11] 0x090000007E6DC26C sqleUCdrdaARinit__FP14db2UCconHandle + 0x4B4
[12] 0x090000007E56A39C sqleUCappAttach + 0x92C
[13] 0x090000007E6A1F28 sqleatin__FPcN41sP5sqlca + 0x23C
[14] 0x090000007E6A1530 sqleatcp_api + 0x1C0
[15] 0x090000007E6A127C sqleatin_api + 0x54
[16] 0x00000001002FB550 InstAttach + 0xC0
[17] 0x00000001002DD104 validate_db2_credentials + 0x128
[18] 0x00000001002DEE38 db2ra_verify_credentials + 0xC8
[19] 0x00000001000008C0 ra_message_handler + 0xF0
[20] 0x0000000100055134 ra_default_message_handler + 0x308
[21] 0x0000000100054BE0 ra_callback + 0x17C
[22] 0x00000001002D93A4 ssncommon_do_callback + 0x70
[23] 0x00000001002D6908 msgssn_get_expected_msg_varp + 0x674
[24] 0x00000001002D7DFC ssn_getmsg_poll_varp + 0xEC
[25] 0x0000000100054980 ra_startagent + 0x128
[26] 0x0000000100055478 ra_main + 0x278
[27] 0x0000000100000604 main + 0x114
[28] 0x0000000100000318 __start + 0x90

The quick resolution was that we needed to adjust the environment of the AIX database backup user to include the following:

> export ICC_IGNORE_FIPS=YES

This was put into the .profile of the AIX backup user.

SAP HANA Backup Allocation Failed

During a HANA backup, you get an “Allocation Failed” error.
This is caused by a lack of memory.  If possible, increase the memory available to HANA by increasing the free memory at the O/S level (e.g. shutting down other HANA instances), or increase the global allocation limit.