This blog contains experience gained over the years of implementing (and de-implementing) large scale IT applications/software.

Checking Azure Disk Cache Settings on a Linux VM in Shell

In a previous blog post, I ended the post by showing how you can use the Azure Enhanced Monitoring for Linux to obtain the disk cache settings.
Except, as we found, it doesn’t easily allow you to relate the Linux O/S disk device names and volume groups, to the Azure data disk names.

You can read the previous post here: Listing Azure VM DataDisks and Cache Settings Using Azure Portal JMESPATH & Bash

In this short post, I pick up where I left off and outline a method that will allow you to correlate the O/S volume group name, with the Linux O/S disk devices and correlate those Linux disk devices with the Azure data disk names, and finally, the Azure data disks with their disk cache settings.

Using the method I will show you, you will see how easily you can verify that the disk cache settings are consistent for all disks that make up a single volume group (very important), and also be able to easily associate those volume groups with the type of usage of the underlying Azure disks (e.g. is it for database data, logs or executable binaries).

1. Check If AEM Is Installed

Our first step is to check if the Azure Enhanced Monitoring for Linux (AEM) extension is installed on the Azure VM.
This extension is required, for your VM to be supported by SAP.

We use standard Linux command line to check for the extension on the VM:

ls -1 /var/lib/waagent/Microsoft.OSTCExtensions.AzureEnhancedMonitorForLinux-*/config/0.settings

The listing should return at least 1 file called “0.settings”.
If you don’t have this and you don’t have a directory starting with “Microsoft.OSTCExtensions.AzureEnhancedMonitorForLinux-“, then you don’t have AEM and you should get it installed following standard Microsoft documentation.

2. Get the Number of Disks Known to AEM

We need to know how many disks AEM knows about:

grep -c 'disk;Caching;' /var/lib/AzureEnhancedMonitor/PerfCounters

3. Get the Number of SCSI Disks Known to Linux

We need to know how many disks Linux knows about (we exclude the root disk /dev/sda):

lsscsi --size --size | grep -cv '/dev/sda'

4. Compare Disk Counts

Compare the disks quantity from AEM and from Linux.  They should be the same.  This is the number of data disks attached to the VM.

If you have a lower number from the AEM PerfCounters file, then you may be suffering the effects of an Azure bug in the AEM extension which is unable to handle more than 9 data disks.
Do you have more than 9 data disks?

At this point if you do not have matching numbers, then you will not be able to continue, as the AEM output is vital in the next steps.

Mapping Disks to the Cache Settings

Once we know our AEM PerfCounters file contains all our data disks, we are now ready to map the physical volumes (on our disk devices) to the cache settings. On the Linux VM:

pvs -o "pv_name,vg_name" --separator=' ' --noheadings

Your output should be a list of disks and their volume groups like so (based on our diagram earlier in the post):

/dev/sdc vg_data
/dev/sdd vg_data

Next we look for a line in the AEM PerfCounters file that contains that disk device name, to get the cache setting:

awk -F';' '/;disk;Caching;/ { sub(/\/dev\//,"",$4); printf "/dev/%s %s\n", tolower($4), tolower($6) }' /var/lib/AzureEnhancedMonitor/PerfCounters

The output will be the Linux disk device name and the Azure data disk cache setting:

/dev/sdc none
/dev/sdd none

For each line of disks from the cache setting, we can now see what volume group it belongs to.
Example: /dev/sdc is vg_data and the disk in Azure has a cache setting of “none”.

If there are multiple disks in the volume group, they all must have the same cache setting applied!

Finally, we look for the device name in the PerfCounters file again, to get the name of the Azure disk:

NOTE: Below is looking specifically for “sdc”.

awk -F';' '/;Phys. Disc to Storage Mapping;sdc;/ { print $6 }' /var/lib/AzureEnhancedMonitor/PerfCounters

The output will be like so:

None sapserver01-datadisk1
None sapserver01-datadisk2

We can ignore the first column output (“None”) in the above, it’s not needed.

Summary

If you package the AEM disk count check and the subsequent AEM PerfCounters AWK scripts into one neat script with the required loops, then you can get the output similar to this, in one call:

/dev/sdd none vg_data sapserver01-datadisk2
/dev/sdc none vg_data sapserver01-datadisk1
/dev/sda readwrite

Based on the above output, I can see that my vg_data volume group disks (sdc & sdd) all have the correct setting for Azure data disk caching in Azure for a HANA database data disk location.

Taking a step further, if you have intelligently named your volume group names, you then also check in your script, the cache setting based on the name of the volume group to determine if it is correct, or not.
You can then embed this validation script into a “custom validation” within SAP LaMa and it will alert you automatically if your VM disk cache settings are not correct.

You may be wondering, why not do all this from the Azure Portal?
Well, the answer to that is that you don’t know what Linux VM volume groups those Azure disks are used by, unless you have tagged them or named them intelligently in Azure.