Writing a Simple Backup Script » Musings of an IT Implementor

So you want a backup script to backup your databases to disk.
Sounds like a nice half-a-day scripting job, doesn’t it?
Let’s analyse this requirement a little more and you will start to get the idea that it’s never as simple as it sounds.

Requirements

Digging a little deeper we come up with the following requirements:

Backup Configuration

We need a standard backup location on all database servers.
When we backup a database we need a folder location on the server (unless you’re using a specific device/method like Tivoli, or backint for SAP).
If we are backing up to a local directory (let’s assume this) and you don’t have the same drive/folder structure, then you will need to create a list of locations for each database and have the backup script look at the list, or worst case, hardcode the config into the script.
The easy way to solve this is to move the configuration out of the backup script altogether.
Most databases support the use of a predefined backup configuration.
For example, in ASE we can use “dump configurations”.
In HANA we adjust the ini file of the respective indexserver or nameserver to set the backup path locations.
Having done the above, for each database, we can then “simply” pull out the config from the target database or during the backup command execution, include the name of the config profile to use, which will dictate the target backup location.
We are going to ignore other issues, such as the size of the disk location, type of disk storage tier and other infrastructure related questions.

Access to the Database

We need a way of pulling out the dump/backup path from the database.
As mentioned above, if we move the configuration of the backup location out of the databases, we need to access it from the script.
This is definitely possible, but if it’s stored in the database (let’s assume this), at this point we have no way of logging into the database.
We therefore need a way of logging into the database in a standard way across all databases.
Therefore, we should elect to use a harmonised username for our backups (not a harmonised password!).

Multiple Scripts

We need more than one script.
Pulling out the configuration from the database is never easy due to the differences in the way that the command line interpreters work.
For example, in HANA 1.0, the hdbsql command outputs a slightly different format and provides less capability to customise the output, compared to the HANA 2.0 hdbsql command line tool.
SAP ASE is completely different and again needs a specific script setup.
The same will be for combinations of Linux and Windows systems. You may need some PowerShell in there!

Modularised Code Packages

We need it to be packaged and easy to read.
Based on the above, we can see that we will need more than one script to cater for different DB vendors and architectures/platforms.
Therefore, one script for backing up HANA databases and one for ASE databases.
What if you need a SQL Server one? Well again, the DB call is slightly different, so another script is needed.
It’s possible to modularise the scripts in such a way that the DB call itself is a separate script for each DB, leaving your core script the same.
However, we are now into the realms of script readability and simplicity versus complexity and re-use.
But it’s worth considering the options within the limitations of the operating system landscape that you have.

Database Security

We need a secure method of storing the password for the harmonised backup user across all the databases.
Since we will need to log into the database to perform tasks such as getting the backup config settings and actually calling the backup commands, we need to think about how we will be storing the username and password.
In some cases, like HANA, we can just use the secure file system store (hdbuserstore) to add our required username and password.
There are some issues with certain versions of HANA (HANA 1.0 is different to HANA 2.0) in this area.
For ASE we may be able to use the sybxctrl binary to execute our script in an intelligent way, avoiding the need to pass passwords at all.
Recent versions of SAP ASE 16.0 SP03, include a new aseuserstore command, which I will highlight in another post. It’s very similar to the HANA hdbuserstore, except it allows ASE (for Business Suite) and ASE Enterprise Edition, to use the same method of password-less access.

Scheduling

We need a common backup scheduling capability across the servers.
You can use a central system (e.g. an enterprise scheduler), or something like cron or Windows Task Scheduler, but centrally controlled from a task controller script.
This will rely on either a shared storage ability (e.g. NFS/CIFS) or some method of the scripts talking centrally to a control plane.
Not only does the central location/control provide easy control of the schedule, but you will also find this useful for capturing error situations, where the backup script may have failed and needs to notify the operators.

Logging

We need the logging output of the scripts to be accessible.
When you are backing up over 100 databases, your administrators are not going to want to go onto each individual server to look at logs.
You will need a central location for logging and aggregation of that logging.

O/S Users

We may need common O/S user accounts.
The execution environment of the script needs to include the capability to access the log areas.
The user account used to perform the execution of the script needs to have a common setup across all servers.
If you’re using CIFS or NFS for storing logs, you will have permissions issues if you use different users, unless you configure your NFS/SMB settings appropriately.
In Unix/Linux, it’s easy to create a specific user (could be linked into Windows Active Directory) with the same UID across many servers, or make the user a member of the same group.

Housekeeping

We need housekeeping for our logging capability.
During the execution of your scripts, the logs you generate will need to be kept according to your usual policies.
You may want to see a report of backups for the last month.
You may need to provide audit evidence that a backup has been performed.

Disk Space

We need a defined amount of backup space.
If you are backing up to disk, you may need a way of calculating how much disk space you will need.
In HANA this is fairly easy as it provides views you can query to estimate the backup disk requirements prior to executing a backup.
You will need to call these and check against the target disk location, before your script starts the backup.
How will you account for additional space requirements over time? Will you just fail or can you provide a warning?
How many backups or backup files will you retain on disk and over how many days?
Will these be removed by your script once they are no longer needed?

Backup Strategies

We should consider different backup strategies.
What type of backup will your script need to handle?
For example, with ASE or SQL Server, you may need to run transaction log backups as well as normal full backups.
Will this be the same script? Can they run together at the same time?
If you are dumping to disk, performing a full backup once a day, then will you need those transaction log dumps from the previous day?
As well as performing a backup of the databases, your script should also backup the recommended configuration files.
For example on ASE, it is recommended to include the configuration file and also the dumphist file.
On HANA it is recommended to include the ini files, the backup.log and useful to include the trace files.
Will your backup be encrypted and will you need to store the keys somewhere?

Validation

We may need backup validation.
Some databases provide post-backup validation, and some provide inline validation of the blocks.
Do you need to consider to these check on (most are by default turned off)?

Authenticity

We should consider backup file authenticity.
Do you need to know if the backup files have been tampered with?
Or maybe just check that what was sent over the network to the target storage location, is the exact same file that was originally created?
You may need to perform some sort of checksum on the original backup file to help establish and authenticate the backup files.
This process should be the same ideally, for all databases.

Pre-Execution Checks

We should performing checking of the environment.
Before your script starts to run the backup, you may wish to include a common set of pre-checks.
The reason is that common issues can be integrated into the pre-checks over time.

Examples Include:

Are you running as the correct O/S user?
Do you have execution access to all required sub-scripts/log directories?
Is the type of target database that your script supports, installed?
Is the target database running?
Are any other required processes running (e.g. ASE backupserver)?
Is a backup script already executing?
Is the version of the database supported by your script?
Is the target backup destination available? (e.g. file/folder location).
Is there enough disk space for your backup to complete?
Was the last backup a success? If not, can you remove the previous dump files?

Once you’ve got all the above decided, then it will be a simple task of writing the script.