In honour of DBA Appreciation Day (today, 3rd July), I’ve written a short piece on a menial but crucial task that HANA database administrators may wish to check. It’s very easy to overlook, but the impact can be considerable.
HANA Transaction Logging
In “normal” log mode (for recoverability), the HANA database, like Oracle, has an automatic transaction log backup process. This process backs up transaction log segments so that the disk space in the HANA log volume can be re-used by new transactions. If the log volume runs out of free disk space, the database will hang until space becomes available.
SAP strongly recommends running your HANA database in log mode “normal”, since this provides point-in-time recovery through the transaction log backups.
By default, HANA automatically triggers a transaction log backup whenever a log segment becomes full or the log backup timeout for an individual service is reached, whichever happens first. This is known as “immediate” interval mode.
I’m not going to go into the differences between the various interval options, or the pros and cons of each, since this is highly scenario-specific. A lot of companies have small HANA databases and are quite happy with the default options. Some have high-throughput, low-latency requirements and tune the log backup process for maximum throughput, while others want minimal data loss and adjust the parameters so that transactions are backed up off the machine as soon as possible.
The SITREP
In this specific situation, I have a small HANA database of ~200GB in memory, serving an SAP Solution Manager 7.2 system (so it has 2x tenant databases plus the SystemDB).
The settings are such that all databases run in log_mode “normal”, with consolidated log backups enabled in “immediate” mode and max_log_backup_size set to 16GB (the default value, but set explicitly).
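If you want to confirm these settings from SQL rather than in the ini files, a query along the following lines, run in the SYSTEMDB, should list them across all configuration layers. This is a sketch: log_mode, enable_auto_log_backup and log_backup_timeout_s are standard HANA persistence parameters, max_log_backup_size is the one mentioned above, and log_backup_interval_mode is my assumption for the name of the interval-mode parameter, so check it against the documentation for your revision. A key that does not exist simply returns no rows.

```sql
-- Sketch: list log-backup related parameters from every ini layer (run in SYSTEMDB).
-- Assumption: 'log_backup_interval_mode' is the interval-mode parameter name on your revision.
SELECT FILE_NAME,
       LAYER_NAME,
       TENANT_NAME,
       SECTION,
       KEY,
       VALUE
  FROM SYS.M_INIFILE_CONTENTS
 WHERE KEY IN ('log_mode',
               'enable_auto_log_backup',
               'log_backup_timeout_s',
               'log_backup_interval_mode',
               'max_log_backup_size')
 ORDER BY TENANT_NAME, FILE_NAME, SECTION, KEY;
```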
All backups are written to a specific disk area, before being pushed off the VM to an Azure Storage Account.
The Issue
I noticed that the local disk area where the HANA database backups are written was becoming quite full. Out of context, you might say that is normal for an increase in system activity, but I knew this system was doing nothing at all (it’s a test system for trying out Solution Manager patches, and nobody was using it).
What Was Causing the Disk Usage?
Looking at the backup file system at the O/S level, I could easily see that the HANA database log backups were responsible for the extra space usage. Narrowing it down even further, I could see that the SYSTEMDB was to blame.
The SYSTEMDB of a very lightly used HANA database should not generate enough transactions to cause a noticeable day-to-day increase in log backup disk usage. This was no ordinary increase! The total HANA database size on disk was ~120GB (SYSTEMDB plus 2x TenantDBs), and yet I was seeing ~200GB of transaction log backups per day from the SYSTEMDB alone.
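The same kind of daily growth can be spotted from inside HANA, without looking at the file system. The query below is a sketch that sums the size of log backup files per database per day for the last 7 days and separates out the share written by catalog backups. It assumes that catalog backups are recorded in M_BACKUP_CATALOG_FILES with SOURCE_TYPE_NAME = 'catalog' and that BACKUP_SIZE is in bytes; the 7-day window is arbitrary.

```sql
-- Sketch: daily log backup volume per database, with the catalog backup share split out.
-- Assumptions: SOURCE_TYPE_NAME = 'catalog' marks catalog backups; BACKUP_SIZE is in bytes.
SELECT c.DATABASE_NAME,
       CAST(c.SYS_START_TIME AS DATE) AS BACKUP_DAY,
       COUNT(DISTINCT c.ENTRY_ID) AS LOG_BACKUP_ENTRIES,
       ROUND(SUM(f.BACKUP_SIZE) / 1024.0 / 1024.0 / 1024.0, 2) AS TOTAL_GB,
       ROUND(SUM(CASE WHEN f.SOURCE_TYPE_NAME = 'catalog'
                      THEN f.BACKUP_SIZE ELSE 0 END) / 1024.0 / 1024.0 / 1024.0, 2) AS CATALOG_GB
  FROM SYS_DATABASES.M_BACKUP_CATALOG AS c
  JOIN SYS_DATABASES.M_BACKUP_CATALOG_FILES AS f
    ON f.DATABASE_NAME = c.DATABASE_NAME
   AND f.ENTRY_ID = c.ENTRY_ID
 WHERE c.ENTRY_TYPE_NAME = 'log backup'
   AND c.SYS_START_TIME >= ADD_DAYS(CURRENT_DATE, -7)
 GROUP BY c.DATABASE_NAME, CAST(c.SYS_START_TIME AS DATE)
 ORDER BY c.DATABASE_NAME, BACKUP_DAY;
```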
Drilling down further into the log backup directory for the SYSTEMDB, I could see the names of the log backup files and their sizes: log backup files of 2.8GB were being written every ~10 to ~15 minutes. The biggest files were…
… the log_backup_0_0_0_0.<unix epoch time> files. That’s right, the backup catalog backups!
Whenever HANA writes a backup, whether it is a complete data backup or a transaction log backup, it also writes a backup of the backup catalog. This is extremely useful if you have to restore a system and need to know which backups have taken place. By default, the backup catalog backups are accumulated, which means that HANA doesn’t need to write out multiple backups of the backup catalog for each log backup (remember, we have 2x TenantDBs).
Why Were Catalog Backup Files So Big?
Each catalog backup contains the entire backup catalog, i.e. a record of every backup still held in the catalog. So, by default, the catalog backup file grows with every backup unless you do some housekeeping of the backup catalog records.
My task was to write some SQL against the backup catalog to see how many records existed, for which backup types, in which databases, and how old they were.
I came up with the following SQL:
--- Breakdown of age of backup records in months, by type of record.
--- Run in the SYSTEMDB: SYS_DATABASES.M_BACKUP_CATALOG covers all databases.
SELECT smbc.DATABASE_NAME,
       smbc.ENTRY_TYPE_NAME,
       MONTHS_BETWEEN(smbc.SYS_START_TIME, CURRENT_DATE) AS AGE_MONTHS,
       COUNT(MONTHS_BETWEEN(smbc.SYS_START_TIME, CURRENT_DATE)) AS RECORDS,
       t_smbc.YOUNGEST_BACKUP_ID
  FROM "SYS_DATABASES"."M_BACKUP_CATALOG" AS smbc,
       -- Derived table: highest (youngest) BACKUP_ID per database, entry type and age group.
       (SELECT xmbc.DATABASE_NAME,
               xmbc.ENTRY_TYPE_NAME,
               MONTHS_BETWEEN(xmbc.SYS_START_TIME, CURRENT_DATE) AS AGE_MONTHS,
               MAX(xmbc.BACKUP_ID) AS YOUNGEST_BACKUP_ID
          FROM "SYS_DATABASES"."M_BACKUP_CATALOG" xmbc
         GROUP BY xmbc.DATABASE_NAME,
                  xmbc.ENTRY_TYPE_NAME,
                  MONTHS_BETWEEN(xmbc.SYS_START_TIME, CURRENT_DATE)
       ) AS t_smbc
 -- Join each catalog record to the summary row of its own group.
 WHERE t_smbc.DATABASE_NAME = smbc.DATABASE_NAME
   AND t_smbc.ENTRY_TYPE_NAME = smbc.ENTRY_TYPE_NAME
   AND t_smbc.AGE_MONTHS = MONTHS_BETWEEN(smbc.SYS_START_TIME, CURRENT_DATE)
 GROUP BY smbc.DATABASE_NAME,
          smbc.ENTRY_TYPE_NAME,
          MONTHS_BETWEEN(smbc.SYS_START_TIME, CURRENT_DATE),
          t_smbc.YOUNGEST_BACKUP_ID
 ORDER BY DATABASE_NAME,
          AGE_MONTHS DESC,
          RECORDS;
The key points to note are:
I use the SYS_DATABASES.M_BACKUP_CATALOG view in the SYSTEMDB to see across all databases in the HANA system, instead of checking each one individually.
For each database, the SQL outputs:
– type of backup (complete or log)
– age of the backup in months
– number of backup records in that age group
– youngest backup ID for that age group (so I can do some clean-up)
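As a design note, the self-join in the query above is not strictly needed: each record is only ever joined to the summary row of its own group, so (assuming SYS_START_TIME is always populated) a single GROUP BY with MAX(BACKUP_ID) should return the same rows. This is an alternative formulation of the query above, not a replacement for it.

```sql
-- Alternative formulation: one pass over the catalog, same output columns as the query above.
SELECT DATABASE_NAME,
       ENTRY_TYPE_NAME,
       MONTHS_BETWEEN(SYS_START_TIME, CURRENT_DATE) AS AGE_MONTHS,
       COUNT(*) AS RECORDS,
       MAX(BACKUP_ID) AS YOUNGEST_BACKUP_ID
  FROM SYS_DATABASES.M_BACKUP_CATALOG
 GROUP BY DATABASE_NAME,
          ENTRY_TYPE_NAME,
          MONTHS_BETWEEN(SYS_START_TIME, CURRENT_DATE)
 ORDER BY DATABASE_NAME,
          AGE_MONTHS DESC,
          RECORDS;
```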
Here is an example execution (NOTE: the last column name in the screenshot is wrong; the SQL above now uses the correct name, YOUNGEST_BACKUP_ID):
The SQL took only 3.8 seconds to execute. From the output I could immediately see one problem: I had backup records from 6 months ago in the SYSTEMDB!
All of these records were being written out with every transaction log backup. For whatever reason, the “BACKUP CATALOG DELETE” step, which was meant to keep the catalog to less than 1 month of records, had not taken effect. I still cannot adequately explain why it failed; the same process is used on other HANA databases and none of them had exhibited the issue.
I can only presume that something was preventing the deletion, because, as the next steps show, running exactly the same process manually worked with no reported issues. For reference, this is HANA 2.0 SPS04 rev47, patched all the way up from SPS02 rev23.
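Because a silently failing clean-up like this only shows up once the disk fills, a periodic check of the oldest catalog entry per database may be worth scheduling. The sketch below reports, per database, the oldest entry, the total number of entries, and how many are older than 31 days. The 31-day threshold is my stand-in for the intended “less than 1 month” retention, so adjust it to your own policy.

```sql
-- Sketch: catalog age check per database (run in SYSTEMDB).
-- Assumption: 31 days approximates the intended one-month catalog retention.
SELECT DATABASE_NAME,
       MIN(SYS_START_TIME) AS OLDEST_ENTRY,
       COUNT(*) AS TOTAL_ENTRIES,
       SUM(CASE WHEN SYS_START_TIME < ADD_DAYS(CURRENT_TIMESTAMP, -31)
                THEN 1 ELSE 0 END) AS ENTRIES_OVER_RETENTION
  FROM SYS_DATABASES.M_BACKUP_CATALOG
 GROUP BY DATABASE_NAME
 ORDER BY OLDEST_ENTRY;
```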
Resolving the Issue
How did I resolve the issue? I simply re-ran, by hand, the catalog deletion that was supposed to be running after each backup already, using the backup ID from the YOUNGEST_BACKUP_ID column as the cut-off to reduce the backup records.
In the SYSTEMDB:
BACKUP CATALOG DELETE ALL BEFORE BACKUP_ID xxxxxxxx
For each TenantDB (also executed in the SYSTEMDB):
BACKUP CATALOG DELETE FOR <TENANTDB> ALL BEFORE BACKUP_ID xxxxxxxx
After the first DELETE, which I ran for the first tenant, I re-ran the initial SQL query to check, and this was the output:
We now have only 1 backup record, the youngest record in that age group for the first tenant database (compare with the screenshot of the first execution of the SQL query, backup ID 1,590,747,286,179). Crucially, we have far fewer log backup records for that tenant: down from 2,247 to 495. Nice! I then went on to run the delete in the SYSTEMDB and the other TenantDB of this HANA system.
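Picking the cut-off IDs by hand works for one system, but the statements can also be generated from the catalog. The following sketch, run in the SYSTEMDB, finds for each database the most recent successful complete data backup that started more than 31 days ago and builds the matching BACKUP CATALOG DELETE statement as text, for review before anything is executed. Anchoring on a complete data backup is my own choice, so that a recovery starting point survives; the 31-day threshold is again an assumption, and a database with no qualifying backup simply gets no row. As I understand it, ALL BEFORE removes entries older than the given backup and keeps that backup itself, and without a WITH FILE, WITH BACKINT or COMPLETE clause only the catalog records are removed, not the backup files.

```sql
-- Sketch: generate (not execute) one catalog clean-up statement per database.
-- Assumptions: keep everything from the newest successful complete data backup
-- that started more than 31 days ago; review the output before running it.
SELECT DATABASE_NAME,
       MAX(BACKUP_ID) AS CUTOFF_BACKUP_ID,
       CASE WHEN DATABASE_NAME = 'SYSTEMDB'
            THEN 'BACKUP CATALOG DELETE ALL BEFORE BACKUP_ID '
                 || TO_VARCHAR(MAX(BACKUP_ID))
            ELSE 'BACKUP CATALOG DELETE FOR ' || DATABASE_NAME
                 || ' ALL BEFORE BACKUP_ID ' || TO_VARCHAR(MAX(BACKUP_ID))
       END AS DELETE_STATEMENT
  FROM SYS_DATABASES.M_BACKUP_CATALOG
 WHERE ENTRY_TYPE_NAME = 'complete data backup'
   AND STATE_NAME = 'successful'
   AND SYS_START_TIME < ADD_DAYS(CURRENT_TIMESTAMP, -31)
 GROUP BY DATABASE_NAME
 ORDER BY DATABASE_NAME;
```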
Checking the Results
As a final check, I was able to compare the log backup file sizes:
The catalog backup in file “log_backup_0_0_0_0.nnnnnnn” at 09:16 was written before the cleanup and is 2.7GB in size, whereas the catalog backup in “log_backup_0_0_0_0.nnnnnnn” at 09:29 was written after the cleanup and is only 76KB. An absolutely massive reduction!
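A similar before/after comparison can be made from inside HANA instead of at the O/S level. The sketch below lists the 20 most recent catalog backups with their size in MB and file path; as before, it assumes catalog backups carry SOURCE_TYPE_NAME = 'catalog' in M_BACKUP_CATALOG_FILES.

```sql
-- Sketch: most recent catalog backups across all databases, newest first.
-- Assumption: SOURCE_TYPE_NAME = 'catalog' identifies catalog backups.
SELECT TOP 20
       c.DATABASE_NAME,
       c.SYS_START_TIME,
       c.BACKUP_ID,
       ROUND(f.BACKUP_SIZE / 1024.0 / 1024.0, 1) AS BACKUP_SIZE_MB,
       f.DESTINATION_PATH
  FROM SYS_DATABASES.M_BACKUP_CATALOG AS c
  JOIN SYS_DATABASES.M_BACKUP_CATALOG_FILES AS f
    ON f.DATABASE_NAME = c.DATABASE_NAME
   AND f.ENTRY_ID = c.ENTRY_ID
 WHERE f.SOURCE_TYPE_NAME = 'catalog'
 ORDER BY c.SYS_START_TIME DESC;
```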
How do we know that file “log_backup_0_0_0_0.nnnnnnn” is a catalog backup? We can check with the Linux “strings” command, which prints the readable text strings contained in the file. Much further down the listing it states that it is a catalog backup, but I thought it was more interesting to see the “MAGIC” of Berlin:
UPDATE (August 2020): SAP note 2962726 has been released; it contains some standard SQL to help remove failed backup entries from the catalog.
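To see how many non-successful entries are sitting in your catalog before looking at that note, a simple count per database, entry type and state is enough. This is not the SQL from SAP note 2962726, just a read-only overview, and it assumes that the STATE_NAME value for good backups is 'successful'.

```sql
-- Read-only overview of catalog entries that did not complete successfully.
SELECT DATABASE_NAME,
       ENTRY_TYPE_NAME,
       STATE_NAME,
       COUNT(*) AS RECORDS,
       MIN(SYS_START_TIME) AS OLDEST_ENTRY
  FROM SYS_DATABASES.M_BACKUP_CATALOG
 WHERE STATE_NAME <> 'successful'
 GROUP BY DATABASE_NAME, ENTRY_TYPE_NAME, STATE_NAME
 ORDER BY DATABASE_NAME, OLDEST_ENTRY;
```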
Summary
Check your HANA backup catalog backup sizes. Ensure you have alerting on file systems (if doing backups to disk). Double-check the age of your backup catalog records. Give tons of freebies and thanks to your DBAs on DBA Appreciation Day!
Useful Links
Enable and Disable Automatic Log Backup: https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.05/en-US/241c0f0020b2492fb93a69a40b1b1b9a.html
Accumulated Backups of the Backup Catalog: https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.05/en-US/3def15378b954aac85f2b93bb3f85a49.html
Log Modes: https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.05/en-US/c486a0a3bb571014ab46c0633224f02f.html
Consolidated Log Backups: https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.05/en-US/653b5c6d5f9d41808011a5bd0fac6709.html