Script Archives » Page 2 of 4 » Musings of an IT Implementor

Checking Azure Disk Cache Settings on a Linux VM in Shell

In a previous blog post, I ended the post by showing how you can use the Azure Enhanced Monitoring for Linux to obtain the disk cache settings.
Except, as we found, it doesn’t easily allow you to relate the Linux O/S disk device names and volume groups, to the Azure data disk names.

You can read the previous post here: Listing Azure VM DataDisks and Cache Settings Using Azure Portal JMESPATH & Bash

In this short post, I pick up where I left off and outline a method that will allow you to correlate the O/S volume group name, with the Linux O/S disk devices and correlate those Linux disk devices with the Azure data disk names, and finally, the Azure data disks with their disk cache settings.

Using the method I will show you, you will see how easily you can verify that the disk cache settings are consistent for all disks that make up a single volume group (very important), and also be able to easily associate those volume groups with the type of usage of the underlying Azure disks (e.g. is it for database data, logs or executable binaries).

1. Check If AEM Is Installed

Our first step is to check if the Azure Enhanced Monitoring for Linux (AEM) extension is installed on the Azure VM.
This extension is required, for your VM to be supported by SAP.

We use standard Linux command line to check for the extension on the VM:

ls -1 /var/lib/waagent/Microsoft.OSTCExtensions.AzureEnhancedMonitorForLinux-*/config/0.settings

The listing should return at least 1 file called “0.settings”.
If you don’t have this and you don’t have a directory starting with “Microsoft.OSTCExtensions.AzureEnhancedMonitorForLinux-“, then you don’t have AEM and you should get it installed following standard Microsoft documentation.

2. Get the Number of Disks Known to AEM

We need to know how many disks AEM knows about:

grep -c 'disk;Caching;' /var/lib/AzureEnhancedMonitor/PerfCounters

3. Get the Number of SCSI Disks Known to Linux

We need to know how many disks Linux knows about (we exclude the root disk /dev/sda):

lsscsi --size --size | grep -cv '/dev/sda'

4. Compare Disk Counts

Compare the disks quantity from AEM and from Linux. They should be the same. This is the number of data disks attached to the VM.

If you have a lower number from the AEM PerfCounters file, then you may be suffering the effects of an Azure bug in the AEM extension which is unable to handle more than 9 data disks.
Do you have more than 9 data disks?

At this point if you do not have matching numbers, then you will not be able to continue, as the AEM output is vital in the next steps.

Mapping Disks to the Cache Settings

Once we know our AEM PerfCounters file contains all our data disks, we are now ready to map the physical volumes (on our disk devices) to the cache settings. On the Linux VM:

pvs -o "pv_name,vg_name" --separator=' ' --noheadings

Your output should be a list of disks and their volume groups like so (based on our diagram earlier in the post):

/dev/sdc vg_data
/dev/sdd vg_data

Next we look for a line in the AEM PerfCounters file that contains that disk device name, to get the cache setting:

awk -F';' '/;disk;Caching;/ { sub(/\/dev\//,"",$4); printf "/dev/%s %s\n", tolower($4), tolower($6) }' /var/lib/AzureEnhancedMonitor/PerfCounters

The output will be the Linux disk device name and the Azure data disk cache setting:

/dev/sdc none
/dev/sdd none

For each line of disks from the cache setting, we can now see what volume group it belongs to.
Example: /dev/sdc is vg_data and the disk in Azure has a cache setting of “none”.

If there are multiple disks in the volume group, they all must have the same cache setting applied!

Finally, we look for the device name in the PerfCounters file again, to get the name of the Azure disk:

NOTE: Below is looking specifically for “sdc”.

awk -F';' '/;Phys. Disc to Storage Mapping;sdc;/ { print $6 }' /var/lib/AzureEnhancedMonitor/PerfCounters

The output will be like so:

None sapserver01-datadisk1
None sapserver01-datadisk2

We can ignore the first column output (“None”) in the above, it’s not needed.

Summary

If you package the AEM disk count check and the subsequent AEM PerfCounters AWK scripts into one neat script with the required loops, then you can get the output similar to this, in one call:

/dev/sdd none vg_data sapserver01-datadisk2
/dev/sdc none vg_data sapserver01-datadisk1
/dev/sda readwrite

Based on the above output, I can see that my vg_data volume group disks (sdc & sdd) all have the correct setting for Azure data disk caching in Azure for a HANA database data disk location.

Taking a step further, if you have intelligently named your volume group names, you then also check in your script, the cache setting based on the name of the volume group to determine if it is correct, or not.
You can then embed this validation script into a “custom validation” within SAP LaMa and it will alert you automatically if your VM disk cache settings are not correct.

You may be wondering, why not do all this from the Azure Portal?
Well, the answer to that is that you don’t know what Linux VM volume groups those Azure disks are used by, unless you have tagged them or named them intelligently in Azure.

You may also be interested in:

HowTo: Show Current Role of a HA SAP Cloud Connector

If you have installed the SAP Cloud Connector, you will know that out-of-the-box it is capable of providing a High Availability feature at the application layer.

Essentially, you can install 2x SAP Cloud Connectors on 2x VMs and then they can be paired so that one acts as Master and one as “Shadow” (secondary).

The Shadow instance connects to the Master to replicate the required configuration.

If you decide to patch the Cloud Connector (everything needs patching right?!), then you can simply patch the Shadow instance, trigger a failover then patch the old Master.

There is only one complication in this, and that is that it’s not “easy” to see which is acting in which role unless you log into the web administration console.

You can go through the log files to see which has taken over the role of Master at some point, but this is a not easy and doesn’t lend itself to being scripted for automated detection of the current role.

Here’s a nice easy way to detect the current role, and could be used (for example) as part of a Custom Instance monitor script for SAP LaMa automation of the Cloud Connector:

awk '/<haRole>/ { match($1,/<haRole>(.*)<\/haRole>/,role); if (role[1] != "" ) { print role[1]; exit } }' /opt/sap/scc/scc_config/scc_config.ini

Out will be either “shadow”, or “master”.

I use awk a lot of the time for pattern group matching because I like the simplicity, it’s a powerful tool and deserves the very long O’Reilly book.

Here’s what that single code line is doing:

awk	The call to the program binary.
‘	Start the contents of the inline AWK script (prevents interpretation by the shell).
/<haRole>/	Match every line that contains the <haRole> tag.
{	On each line match, execute this block of code (we close with “}”).
$1	Match against the 1st space delimited parameter on the line.
/<haRole>(.)<\/haRole>/,*	Obtain any text “.*” between <haRole> tag.
role	Store the match in a new array called “role”.
if (role[1] != “” )	Check that after the matching, the role array has 2 entries (zero initialised array).
{ print role[1]; exit }	If we do have 2 entries, print the second one (1st is the complete matched text string) from the array and exit.
}’	Close off the command and AWK script.
/opt/sap/scc/ scc_config/ scc_config.ini	The name of the input file for AWK to scan.

It’s a nice simple way of checking the current role, and can be embedded into a shell script for easy execution.

Writing a Simple Backup Script

So you want a backup script to backup your databases to disk.
Sounds like a nice half-a-day scripting job, doesn’t it?
Let’s analyse this requirement a little more and you will start to get the idea that it’s never as simple as it sounds.

Requirements

Digging a little deeper we come up with the following requirements:

Backup Configuration

We need a standard backup location on all database servers.
When we backup a database we need a folder location on the server (unless you’re using a specific device/method like Tivoli, or backint for SAP).
If we are backing up to a local directory (let’s assume this) and you don’t have the same drive/folder structure, then you will need to create a list of locations for each database and have the backup script look at the list, or worst case, hardcode the config into the script.
The easy way to solve this is to move the configuration out of the backup script altogether.
Most databases support the use of a predefined backup configuration.
For example, in ASE we can use “dump configurations”.
In HANA we adjust the ini file of the respective indexserver or nameserver to set the backup path locations.
Having done the above, for each database, we can then “simply” pull out the config from the target database or during the backup command execution, include the name of the config profile to use, which will dictate the target backup location.
We are going to ignore other issues, such as the size of the disk location, type of disk storage tier and other infrastructure related questions.

Access to the Database

We need a way of pulling out the dump/backup path from the database.
As mentioned above, if we move the configuration of the backup location out of the databases, we need to access it from the script.
This is definitely possible, but if it’s stored in the database (let’s assume this), at this point we have no way of logging into the database.
We therefore need a way of logging into the database in a standard way across all databases.
Therefore, we should elect to use a harmonised username for our backups (not a harmonised password!).

Multiple Scripts

We need more than one script.
Pulling out the configuration from the database is never easy due to the differences in the way that the command line interpreters work.
For example, in HANA 1.0, the hdbsql command outputs a slightly different format and provides less capability to customise the output, compared to the HANA 2.0 hdbsql command line tool.
SAP ASE is completely different and again needs a specific script setup.
The same will be for combinations of Linux and Windows systems. You may need some PowerShell in there!

Modularised Code Packages

We need it to be packaged and easy to read.
Based on the above, we can see that we will need more than one script to cater for different DB vendors and architectures/platforms.
Therefore, one script for backing up HANA databases and one for ASE databases.
What if you need a SQL Server one? Well again, the DB call is slightly different, so another script is needed.
It’s possible to modularise the scripts in such a way that the DB call itself is a separate script for each DB, leaving your core script the same.
However, we are now into the realms of script readability and simplicity versus complexity and re-use.
But it’s worth considering the options within the limitations of the operating system landscape that you have.

Database Security

We need a secure method of storing the password for the harmonised backup user across all the databases.
Since we will need to log into the database to perform tasks such as getting the backup config settings and actually calling the backup commands, we need to think about how we will be storing the username and password.
In some cases, like HANA, we can just use the secure file system store (hdbuserstore) to add our required username and password.
There are some issues with certain versions of HANA (HANA 1.0 is different to HANA 2.0) in this area.
For ASE we may be able to use the sybxctrl binary to execute our script in an intelligent way, avoiding the need to pass passwords at all.
Recent versions of SAP ASE 16.0 SP03, include a new aseuserstore command, which I will highlight in another post. It’s very similar to the HANA hdbuserstore, except it allows ASE (for Business Suite) and ASE Enterprise Edition, to use the same method of password-less access.

Scheduling

We need a common backup scheduling capability across the servers.
You can use a central system (e.g. an enterprise scheduler), or something like cron or Windows Task Scheduler, but centrally controlled from a task controller script.
This will rely on either a shared storage ability (e.g. NFS/CIFS) or some method of the scripts talking centrally to a control plane.
Not only does the central location/control provide easy control of the schedule, but you will also find this useful for capturing error situations, where the backup script may have failed and needs to notify the operators.

Logging

We need the logging output of the scripts to be accessible.
When you are backing up over 100 databases, your administrators are not going to want to go onto each individual server to look at logs.
You will need a central location for logging and aggregation of that logging.

O/S Users

We may need common O/S user accounts.
The execution environment of the script needs to include the capability to access the log areas.
The user account used to perform the execution of the script needs to have a common setup across all servers.
If you’re using CIFS or NFS for storing logs, you will have permissions issues if you use different users, unless you configure your NFS/SMB settings appropriately.
In Unix/Linux, it’s easy to create a specific user (could be linked into Windows Active Directory) with the same UID across many servers, or make the user a member of the same group.

Housekeeping

We need housekeeping for our logging capability.
During the execution of your scripts, the logs you generate will need to be kept according to your usual policies.
You may want to see a report of backups for the last month.
You may need to provide audit evidence that a backup has been performed.

Disk Space

We need a defined amount of backup space.
If you are backing up to disk, you may need a way of calculating how much disk space you will need.
In HANA this is fairly easy as it provides views you can query to estimate the backup disk requirements prior to executing a backup.
You will need to call these and check against the target disk location, before your script starts the backup.
How will you account for additional space requirements over time? Will you just fail or can you provide a warning?
How many backups or backup files will you retain on disk and over how many days?
Will these be removed by your script once they are no longer needed?

Backup Strategies

We should consider different backup strategies.
What type of backup will your script need to handle?
For example, with ASE or SQL Server, you may need to run transaction log backups as well as normal full backups.
Will this be the same script? Can they run together at the same time?
If you are dumping to disk, performing a full backup once a day, then will you need those transaction log dumps from the previous day?
As well as performing a backup of the databases, your script should also backup the recommended configuration files.
For example on ASE, it is recommended to include the configuration file and also the dumphist file.
On HANA it is recommended to include the ini files, the backup.log and useful to include the trace files.
Will your backup be encrypted and will you need to store the keys somewhere?

Validation

We may need backup validation.
Some databases provide post-backup validation, and some provide inline validation of the blocks.
Do you need to consider to these check on (most are by default turned off)?

Authenticity

We should consider backup file authenticity.
Do you need to know if the backup files have been tampered with?
Or maybe just check that what was sent over the network to the target storage location, is the exact same file that was originally created?
You may need to perform some sort of checksum on the original backup file to help establish and authenticate the backup files.
This process should be the same ideally, for all databases.

Pre-Execution Checks

We should performing checking of the environment.
Before your script starts to run the backup, you may wish to include a common set of pre-checks.
The reason is that common issues can be integrated into the pre-checks over time.

Examples Include:

Are you running as the correct O/S user?
Do you have execution access to all required sub-scripts/log directories?
Is the type of target database that your script supports, installed?
Is the target database running?
Are any other required processes running (e.g. ASE backupserver)?
Is a backup script already executing?
Is the version of the database supported by your script?
Is the target backup destination available? (e.g. file/folder location).
Is there enough disk space for your backup to complete?
Was the last backup a success? If not, can you remove the previous dump files?

Once you’ve got all the above decided, then it will be a simple task of writing the script.

Listing Azure VM DataDisks and Cache Settings Using Azure Portal JMESPATH & Bash

As part of a SAP HANA deployment, there are a set of recommendations around the Azure VM disk caching settings and the use of the Azure VM WriteAccelerator.
These features should be applied to the SAP HANA database data volume and log volume disks to ensure optimum performance of the database I/O operations.

This post is not about the cache settings, but about how it’s possible to gather the required information about the current settings across your landscape.

There are 3 main methods available to an infrastructure person, to see the current Azure VM disk cache settings.
I will discuss these method below.

1, Using the Azure Portal

You can use the Azure Portal to locate the VM you are interested in, then checking the disks, and looking on each disk.
You can only see the disk cache settings under the VM view inside the Azure Portal.

While slightly counter intuitive (you would expect to see the same under the “Disks” view), it’s because the disk cache feature is provided for by the VM onto which the disks are bound, therefore it’s tied to the VM view.

2, Using the Azure CLI

Using the Azure CLI (bash or powershell) to find the disks and get the settings.

This is by far the most common approach for anyone managing a large estate. It uses the existing Azure API layers and the Azure CLI to query your Azure subscription, return the data in JSON format and parse it.
The actual query is written in JMESPATH (https://jmespath.org/) and is similar to XPath (for XML).

A couple of sample queries in BASH (my favourite shell):

List all VM names:

az vm list --query [].name -o table

List VM names, powerstate, vmsize, O/S and RG:

az vm list --show-details --query '[].{name:name, state:powerState, OS:storageProfile.osDisk.osType, Type:hardwareProfile.vmSize, rg:resourceGroup, diskName:storageProfile.dataDisks.name, diskLUN:storageProfile.dataDisks.lun, diskCaching:storageProfile.dataDisks.caching, diskSizeG:storageProfile.dataDisks.diskSizeGb, WAEnabled:storageProfile.dataDisks.writeAcceleratorEnabled }' -o table

List all VMs with names ending d01 or d02 or d03, then pull out the data disk details and whether the WriteAccelerator is enabled:

az vm list --query "[?ends_with(name,'d01')||ends_with(name,'d02')||ends_with(name,'d03')]|[].storageProfile.dataDisks[].[lun,name,caching,diskSizeGb,writeAcceleratorEnabled]" -o tsv

To execute the above, simply launch the Cloud Shell and select “Bash” in the Azure Portal:

Then paste in the query and hit return:

3, A Most Obscure Method.

Since SAP require you to have the “Enhanced Monitoring for Linux” (OEM) agent extension installed, you can obtain the disk details directly on each VM.

For Linux VMs, the OEM creates a special text file for performance counters, which is used by the Saposcol (remember that) for use by SAP diagnostic agents, ABAP stacks and other tools.

Using a simple piece of awk scripting, we can pull out the disk cache settings from the file like so:

awk -F';' '/;disk;Caching;/ { sub(//dev//,"",$4); printf "/dev/%s %sn", tolower($4), tolower($6) }' /var/lib/AzureEnhancedMonitor/PerfCounters

There’s a lot more information in the text file (/var/lib/AzureEnhancedMonitor/PerfCounters) and my later post Checking Azure Disk Cache Settings on a Linux VM in Shell, I show how you can pull out the complete mapping between Linux disk devices, disk volume groups, Azure disk names and the disk caching settings, like so:

Useful Links

You may also be interested in:

Generate HMAC for Azure Storage from KSH

Generating an Azure HMAC Signature for calling Azure Storage Services from KSH

While custom writing an Azure Storage Service blob deletion script, I experienced a problem using the OpenSSL method for generating an HMAC.

For those not familiar with Azure Storage Services (or even signature based authentication) the act of sending a signature as part of an HTTP request serves to prove to the target server that you are in possession of the secret key and that you also would like to perform a specific operation.

The shared key (that you have been given out-of-band) is used to sign the HTTP call. This is so the target server can then perform the same signing operation at its end, and if the signature it obtains matches the one you’ve sent, then it trusts and permits you to perform the specific HTTP operation you’ve requested.

See here for more details: https://docs.microsoft.com/en-us/rest/api/storageservices/authenticate-with-shared-key#Constructing_Element

In my example, the operation is a simple BLOB deletion from an Azure Storage Account, but that is irrelevant to this particular post.
The HMAC generation routine is the same no matter what HTTP operation you wish to perform.

Based on searching in Google, the following OpenSSL method seems popular and able to provide a method of generating an HMAC:

l_input=”your HTTP operation to be signed”
l_key=”your big long Azure storage account key”
l_key_decoded=”$(echo -n “${l_key}”|base64 -d)”
l_hmac=”$(echo -n “${l_input}”|openssl dgst -sha256 -hmac “${l_key_decoded}” -binary | base64)”

The above works, with KSH, most of the time.
There have been one or two occasions when for no apparent reason, an incorrect HMAC is generated.
It’s possible that this stems from the character set interpretation e.g. UTF-8 and/or some strangeness in the way the KSH interpreter works with specific characters. I really wasn’t able to investigate deep enough with the time I had.

Instead of the above, I decided to take a leaf out of the Blobxfer utility team’s book and use a Python based solution instead.
Browsing the Blobxfer source in GitHub, I isolated the specific Python routine that was used to provide the HMAC.
Putting this routine into KSH makes it look like the following:

l_hmac=”$(cat <<EOF | python –
import sys
import hmac
import hashlib
import base64

def _encode_base64(data):
encoded = base64.b64encode(data)
return encoded

def _decode_base64_to_bytes(data):
return base64.b64decode(data)

def _sign_string(key, string_to_sign):
key = _decode_base64_to_bytes(key.encode(‘utf-8’))
string_to_sign = string_to_sign.encode(‘utf-8’)
signed_hmac_sha256 = hmac.HMAC(key, string_to_sign, hashlib.sha256)
digest = signed_hmac_sha256.digest()
encoded_digest = _encode_base64(digest)
return encoded_digest

data = “””${l_input}”””
key = “${l_key}”
print (_sign_string(key,data))
EOF
)”

I’m using a combination of HERE document and KSH in-line sub-shell execution to call python and pass in the stdin containing the python code to be executed.
KSH is responsible for embedding the required variables into the Python code, such as l_input and l_key.

So far, this routine has proved successful 100% of the time.

1. Check If AEM Is Installed

2. Get the Number of Disks Known to AEM

3. Get the Number of SCSI Disks Known to Linux

4. Compare Disk Counts

Mapping Disks to the Cache Settings

Summary

You may also be interested in:

Requirements

Backup Configuration

Access to the Database

Multiple Scripts

Modularised Code Packages

Database Security

Scheduling

Logging

O/S Users

Housekeeping

Disk Space

Backup Strategies

Validation

Authenticity

Pre-Execution Checks

1, Using the Azure Portal

2, Using the Azure CLI

3, A Most Obscure Method.

Useful Links

You may also be interested in:

l_input=”your HTTP operation to be signed”l_key=”your big long Azure storage account key”l_key_decoded=”$(echo -n “${l_key}”|base64 -d)”l_hmac=”$(echo -n “${l_input}”|openssl dgst -sha256 -hmac “${l_key_decoded}” -binary | base64)”

l_input=”your HTTP operation to be signed”
l_key=”your big long Azure storage account key”
l_key_decoded=”$(echo -n “${l_key}”|base64 -d)”
l_hmac=”$(echo -n “${l_input}”|openssl dgst -sha256 -hmac “${l_key_decoded}” -binary | base64)”