
Listing & Tagging Orphaned Azure Disks using Azure CLI from Bash Cloud Shell

Nobody likes wastage, but it happens.
Finding orphaned Azure disks is super simple using the Cloud Shell.

In this post I show how you can use the CLI, with a little Bash scripting in the Cloud Shell, to find unattached disks and also tag them for future removal.
There are plenty of PowerShell examples in the PowerShell Runbook gallery, but I want to show a CLI example.
I then go on to show how this could be scheduled, but it’s not as simple as first thought.

Using Bash Cloud Shell & the Az Command (CLI)

We can use the az command (CLI) in a bash Cloud Shell to list the disks, then use JMESPath to filter and find the ones that are not attached (“Unattached”) to any VM:

az disk list --query "[?diskState=='Unattached'].{name:name,state:diskState,size:diskSizeGb,sku:sku.name,tag:to_string(tags)}" -o tsv

ase01_datadisk_2        Unattached      256     Standard_LRS    null
ase01_OsDisk_1_8e31587ee6604463ada5167f91e6345f Unattached      30      StandardSSD_LRS null

Notice I have also included the “tags” column with some processing.
If you wanted to search for disks that are not attached, and filter by tag, then you can do the following.
First I apply a tag; I’m going to create a tag called “testTag”, with a value “testTagValue”:

az disk update --name ase01_datadisk_2 --resource-group UK-West --set tags.testTag=testTagValue

Now that I have set a value for “testTag”, let’s query based on that specific tag, for “Unattached” disks:

az disk list --query "[?diskState=='Unattached'&&tags.testTag=='testTagValue'].{name:name,state:diskState,size:diskSizeGb,sku:sku.name,tag:to_string(tags)}" -o tsv

ase01_datadisk_2        Unattached      256     Standard_LRS    {"testTag":"testTagValue"}

You can search for a specific tag value, for part of a value (using the JMESPath contains() function), or alternatively, search for disks where the tag value is “null”.
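For example, still using the “testTag” example from above, the first query below matches part of a value with contains(), and the second finds unattached disks where “testTag” is not set at all (both are sketches to adapt to your own tags):

az disk list --query "[?diskState=='Unattached'&&contains(to_string(tags.testTag),'Value')].{name:name,tag:to_string(tags)}" -o tsv

az disk list --query "[?diskState=='Unattached'&&to_string(tags.testTag)=='null'].{name:name,tag:to_string(tags)}" -o tsv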

Adjusting the Disks

Now that we have a list of disks output from our query, we could just delete them.
But it’s probably better to tag them for deletion, allowing deletion at some point after a more detailed review. I would look to produce a report that could go to a CAB review, ready for deletion.
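For the report itself, one simple option (just a sketch; the file name is only an example) is to switch the same query to table output and redirect it to a file for the review:

az disk list --query "[?diskState=='Unattached'].{name:name,rg:resourceGroup,size:diskSizeGb,sku:sku.name,tag:to_string(tags)}" -o table > orphaned-disks-report.txt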

Here’s how we can use the power of the Cloud Shell, the Azure CLI and Bash scripting to tag those disks that are Unattached:

updated=0
failed=0
tabchar="$(printf '\t')"
deldate="$(date -d "+30 days" "+%Y%m%d")"
while IFS= read -r line
do
   diskname="${line%%${tabchar}*}"
   rgname="${line##*${tabchar}}"
   echo -n "Updating: $diskname ... "
   az disk update --name "${diskname}" --resource-group "${rgname}" --set tags.deleteDate="${deldate}" >/dev/null
   if [[ $? -eq 0 ]] ; then
      echo "[SUCCESS]"
      (( updated++ ))
   else
      echo "[FAILED]"
      (( failed++ ))
   fi
done < <(az disk list --query "[?diskState=='Unattached'].{name:name,rg:resourceGroup}" -o tsv)
printf "### SUMMARY ###\n Updated: %s\n Failed: %s\n" "$updated" "$failed"

Let’s look at the above script in detail below:

  1. We define the variable to hold our updated count.
  2. We define the variable to hold our failed count.
  3. We capture the character that represents a TAB character, used to split the CLI output.
  4. We calculate the deletion date: today + 30 days, in the format yyyymmdd.
  5. The start of the loop that reads the CLI output a line at a time.
  6. The start of the loop body.
  7. We get the disk name out of the line by splitting the line by TAB from the left.
  8. We get the disk resource group out of the line by splitting the line by TAB from the right.
  9. A little output text to say which disk is being worked on.
  10. The call to the CLI to update the disk, applying the tag “deleteDate” with the calculated deletion date.
  11. We detect the successful (or not) execution of the CLI command.
  12. Output “success” for a successful execution.
  13. Update the success variable count.
  14. Alternatively, if we failed.
  15. Output “failed” for a failed execution.
  16. Update the failed variable count.
  17. Closure of the “if” statement.
  18. Closure of the loop; this is also where the CLI query that lists the unattached disks is executed and fed into the loop (process substitution is used so the counters survive outside the loop).
  19. Output a summary count of success vs failed.

When executed, the script prints one line per disk showing [SUCCESS] or [FAILED], followed by the summary counts.

Reporting on the deleteDate Tag

Now we have our disks tagged ready for deletion, we might want to find disks that have a deleteDate older than a specific date. These disks would then be ripe for deletion.
Here’s how we can do that:

az disk list --query "[?tags.deleteDate<'20210402'].{name:name,rg:resourceGroup,tag:to_string(tags)}" -o tsv

We can even go as far as using Bash to inject the current date into the query, thereby allowing us to look for any disks due for deletion from today without adjusting the query each time:

az disk list --query "[?tags.deleteDate<'$(date "+%Y%m%d")'].{name:name,rg:resourceGroup,tag:to_string(tags)}" -o tsv

How About Scheduled Execution?

Well, using the CLI and the Bash shell inside an Azure Automation account is not possible right now.
Instead, we would need to convert our scripted CLI example to PowerShell to make it Runbook compatible; we can then schedule it inside Azure Automation as a PowerShell Runbook.
The code needs to change slightly, because we need to use the Automation “Run As account” feature to connect from our Runbook, but it looks very much like our CLI code in Bash:

IMPORTANT: All the “simple” help guides on creating a PowerShell Runbook fail to mention that you need to import the PowerShell modules required by your code. In the Automation Account section, look at “Modules”. For the code below you need 2 additional modules: Az.Accounts and Az.Compute, which you can import from the Gallery. Seems simple, but as I said, it’s not obvious.

# Authenticate using the Automation account RunAs connection
$Conn = Get-AutomationConnection -Name 'AzureRunAsConnection'
try {
    $auth = Connect-AzAccount -ServicePrincipal -Tenant $Conn.TenantID -ApplicationId $Conn.ApplicationID -CertificateThumbprint $Conn.CertificateThumbprint
} catch {
    throw $_.Exception
}
# Counters for the summary output
$updated=0
$failed=0
# Deletion date: today + 30 days, in the format yyyyMMdd
$deldate=(Get-Date).AddDays(30).ToString("yyyyMMdd")
# The tag update to apply to each unattached disk
$updateConfig = New-AzDiskUpdateConfig -Tag @{ "deleteDate" = "$deldate" }
Get-AzDisk |? { $_.DiskState -eq "Unattached" } |% { 
   write-output "Updating: $($_.Name) ..."
   Update-AzDisk -ResourceGroupName $_.ResourceGroupName -DiskName $_.Name -DiskUpdate $updateConfig >$null 2>&1
   if ( $? -eq $true ) { 
      echo "   [SUCCESS]"
      $updated++
    }
    else {
      echo "   [FAILED]"
      $failed++
   }
}
echo "### SUMMARY ### `n Updated: ${updated}`n Failed: ${failed}`n"

With the above code in a PowerShell Runbook, we can test it using the Runbook editor’s “Test pane” before publishing and scheduling it.

Obviously, if you will be scheduling the code above, you may wish to change it slightly so that it excludes any orphaned disks that already have a deleteDate tag.

Summary

Using the CLI, Cloud Shell and some Bash scripting, we have a simple mechanism to tag unattached disks, then use that tag to report on disks due for deletion after a specific date (or we could use today’s date).
This is a great solution for those with Bash shell scripting skills and as shown it is reasonably simple.

We have also looked at the possibility of scheduling the code and found that it is not possible for CLI. Instead a PowerShell Runbook is a possible solution that allows the scheduling of PowerShell code.
PowerShell could be your pain point, but it really isn’t that far from shell.

Korn Shell Calling SAP HANA – Hello Hello!

So you’ve automated SAP HANA stuff huh?
What tools do you use?
Python? Chef? Puppet? Ansible? DSC/Powershell?

No. I use Korn shell. ¯\_(ツ)_/¯

Me, Trying to Support Korn…

I find Korn shell is a lowest common denominator across many Linux/Unix systems, and also extremely simple to support.
It does exactly what I need.

For me it’s readable, fairly self-explanatory, easily editable and dependable.

…and Failing?

But I do know what you’re saying:

  • there’s no built in version control
  • it’s not easy to debug
  • it’s definitely not very cool 🤓
  • you can’t easily do offline development
  • my Grandad told me about something called Korn shell…

Have you Considered

If you have no budget for tools, then you can start automating by using what you already have. Why not.
Don’t wait for the right tool, start somewhere and only then will you understand what works for you and doesn’t work for you.

Sometimes it’s not about budget. There are companies out there that do not allow certain sets of utilities and tools to be installed on servers, because they can provide too much help to hackers. Guess what they do allow? Yep, Korn shell is allowed.

Let’s Get Started

Here’s a sample function to run some SQL in HANA by calling the hdbsql (delivered with the HANA client) and return the output:

#!/bin/ksh
function run_sql {
   typeset -i l_inst="${1}" 
   typeset l_db="${2}" 
   typeset l_user="${3}" 
   typeset l_pw="${4}" 
   typeset l_col_sep="${5}" 
   typeset l_sql="${6}" 
   typeset l_output="" 
   typeset l_auth="" 
   typeset -i l_ret=0

   # Check if PW is blank, then use hdbuserstore (-U). 
   if [[ -n "${l_pw}" && "${l_pw}" != " " ]] ; then
      l_auth="-u ${l_user} -p ${l_pw}"
   else
      l_auth="-U ${l_user}"
   fi

   l_output="$(hdbsql -quiet -x -z -a -C -j -i ${l_inst} ${l_auth} -d ${l_db} -F "${l_col_sep}" <<-EOF1 2>>/tmp/some-script.log
	${l_sql};
	quit
EOF1
)"
   
   l_ret=$?

   # For HANA 1.0 we need to trim the first 6 lines of output, because it doesn't understand "-quiet". 
   #if [[ "$(check_major_version)" -lt 2 ]] ; then 
      # l_output="$(print "${l_output}"| tail -n +7)" 
   #fi

   print "${l_output}" 
   return $l_ret 

}

To call the above function, we then just do (in the same script):

l_result="$(run_sql "10" "SystemDB" "SYSTEM" "SYSTEMPW" " " "ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET ('persistence','log_mode')='overwrite' WITH RECONFIGURE")"

We are passing in HANA instance number 10; use whatever your instance number is.

We can check the function return code (did the function return cleanly) like so:

if [[ $? -ne 0 ]] ; then 
   print "FAILED" 
   exit 1; 
fi

Here’s what we’re passing in our call to hdbsql (you can find this output by calling “hdbsql --help”):

-i instance number of the database engine
-d name of the database to connect
-U use credentials from user store
-u user name to connect
-p password to connect
-x run quietly (no messages, only query output)
-quiet Do not print the welcome screen
-F <sep> use <sep> as the field separator (default: ‘,’)
-C suppress escape output format
-j switch the page by page scroll output off
-Q show each column on a separate line
-a do not print any header for SELECT commands

If you wanted to return a value, then the “l_result” variable would contain the output.

Ideally, the function we wrote would be put into a chunk of modular code that could be referenced time and again from other Korn shell scripts.

You would also be looking to create some sets of standard functions for logging of messages to help with debugging. You can make it as complex as you wish.
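As a purely hypothetical sketch (the library path, function name and log file below are my own examples, not a standard), the shared functions could live in one file that each script sources, together with a simple logging helper:

#!/bin/ksh
# Source the shared function library (hypothetical path).
. /usr/local/lib/hana_functions.ksh

# A simple logging helper to aid debugging.
function log_msg {
   print "$(date '+%Y-%m-%d %H:%M:%S') $*" >> /tmp/some-script.log
}

log_msg "Checking log_mode..."
l_result="$(run_sql "10" "SystemDB" "SYSTEM" "SYSTEMPW" " " "SELECT value FROM SYS.M_INIFILE_CONTENTS WHERE key='log_mode'")"
log_msg "Result: ${l_result}"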

In the call to “run_sql” we pass a column separator.
I usually like to use a “|” (pipe), then parse the returned values using the “awk” utility like so:

l_result="$(run_sql "10" "SystemDB" "SYSTEM" "SYSTEMPW" "|" "SELECT file_name,layer_name,section,key, value FROM SYS.M_INIFILE_CONTENTS WHERE layer_name='SYSTEM'")"

echo "${l_result}" | /bin/awk -F'|' '{ print $2" "$3" "$4 }'

When we execute the script we get the first 3 columns like so:

daemon.ini SYSTEM daemon 
diserver.ini SYSTEM communication 
global.ini SYSTEM auditing 
global.ini SYSTEM configuration
global.ini SYSTEM backup
...

Obviously we don’t really embed the password in the script; it gets passed in.
You can either pass it in using the command line parameter method (./myscript.ksh someparam) or via the Linux environment variables (export myparam=something; ./myscript.ksh).
If you want you can even pipe it in (echo “myparam”| ./myscript.ksh) and “read” it into a variable.
You can also take a look at the “expect” utility to automate command line input.
Also, take a look at the “hdbuserstore” utility to store credentials for automation scripts (remember to set appropriately secure privs on these database users).
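To illustrate the hdbuserstore route, here is a sketch (the key name, host and port are examples only; 3<instance>13 is typically the SystemDB SQL port on a multi-tenant system). The credentials are stored once as the executing OS user, then run_sql is called with an empty password so it switches to the -U option:

# One-time setup of the secure user store key (example values; avoid typing real passwords into a shared shell history).
hdbuserstore SET AUTOKEY myhanahost:31013 SYSTEM MyS3cretPw

# The key name is passed in the user position and the password is left empty.
l_result="$(run_sql "10" "SystemDB" "AUTOKEY" "" " " "SELECT * FROM DUMMY")"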

That’s all there is to it for you to get started using Korn shell to call HANA.

Checking Azure Disk Cache Settings on a Linux VM in Shell

In a previous blog post, I ended the post by showing how you can use the Azure Enhanced Monitoring for Linux to obtain the disk cache settings.
Except, as we found, it doesn’t easily allow you to relate the Linux O/S disk device names and volume groups to the Azure data disk names.

You can read the previous post here: Listing Azure VM DataDisks and Cache Settings Using Azure Portal JMESPATH & Bash

In this short post, I pick up where I left off and outline a method that will allow you to correlate the O/S volume group names with the Linux O/S disk devices, correlate those Linux disk devices with the Azure data disk names, and finally correlate the Azure data disks with their disk cache settings.

Using the method I will show you, you will see how easily you can verify that the disk cache settings are consistent for all disks that make up a single volume group (very important), and also how easily you can associate those volume groups with the usage type of the underlying Azure disks (e.g. is it for database data, logs or executable binaries).

1. Check If AEM Is Installed

Our first step is to check if the Azure Enhanced Monitoring for Linux (AEM) extension is installed on the Azure VM.
This extension is required for your VM to be supported by SAP.

We use standard Linux command line to check for the extension on the VM:

ls -1 /var/lib/waagent/Microsoft.OSTCExtensions.AzureEnhancedMonitorForLinux-*/config/0.settings

The listing should return at least 1 file called “0.settings”.
If you don’t have this and you don’t have a directory starting with “Microsoft.OSTCExtensions.AzureEnhancedMonitorForLinux-“, then you don’t have AEM and you should get it installed following standard Microsoft documentation.

2. Get the Number of Disks Known to AEM

We need to know how many disks AEM knows about:

grep -c 'disk;Caching;' /var/lib/AzureEnhancedMonitor/PerfCounters

3. Get the Number of SCSI Disks Known to Linux

We need to know how many disks Linux knows about (we exclude the root disk /dev/sda):

lsscsi --size | grep -cv '/dev/sda'

4. Compare Disk Counts

Compare the disk quantities from AEM and from Linux. They should be the same. This is the number of data disks attached to the VM.

If you have a lower number from the AEM PerfCounters file, then you may be suffering the effects of an Azure bug in the AEM extension which is unable to handle more than 9 data disks.
Do you have more than 9 data disks?

At this point if you do not have matching numbers, then you will not be able to continue, as the AEM output is vital in the next steps.
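As a minimal sketch, using the same commands and file locations as above, the comparison could be scripted like this:

aem_count=$(grep -c 'disk;Caching;' /var/lib/AzureEnhancedMonitor/PerfCounters)
os_count=$(lsscsi --size | grep -cv '/dev/sda')
if [[ ${aem_count} -ne ${os_count} ]] ; then
   echo "Disk counts do not match (AEM: ${aem_count}, Linux: ${os_count}), cannot continue."
   exit 1
fi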

Mapping Disks to the Cache Settings

Once we know our AEM PerfCounters file contains all our data disks, we are now ready to map the physical volumes (on our disk devices) to the cache settings. On the Linux VM:

pvs -o "pv_name,vg_name" --separator=' ' --noheadings

Your output should be a list of disks and their volume groups like so:

/dev/sdc vg_data
/dev/sdd vg_data

Next we look for a line in the AEM PerfCounters file that contains that disk device name, to get the cache setting:

awk -F';' '/;disk;Caching;/ { sub(/\/dev\//,"",$4); printf "/dev/%s %s\n", tolower($4), tolower($6) }' /var/lib/AzureEnhancedMonitor/PerfCounters

The output will be the Linux disk device name and the Azure data disk cache setting:

/dev/sdc none
/dev/sdd none

For each disk device in the output above, we can now see which volume group it belongs to.
Example: /dev/sdc is in vg_data and the disk in Azure has a cache setting of “none”.

If there are multiple disks in the volume group, they all must have the same cache setting applied!

Finally, we look for the device name in the PerfCounters file again, to get the name of the Azure disk:

NOTE: Below is looking specifically for “sdc”.

awk -F';' '/;Phys. Disc to Storage Mapping;sdc;/ { print $6 }' /var/lib/AzureEnhancedMonitor/PerfCounters

The output will be like so:

None sapserver01-datadisk1
None sapserver01-datadisk2

We can ignore the first column (“None”) in the above output; it’s not needed.

Summary

If you package the AEM disk count check and the subsequent AEM PerfCounters AWK scripts into one neat script with the required loops, then you can get output similar to this, in one call:

/dev/sdd none vg_data sapserver01-datadisk2
/dev/sdc none vg_data sapserver01-datadisk1
/dev/sda readwrite

Based on the above output, I can see that my vg_data volume group disks (sdc & sdd) all have the correct setting for Azure data disk caching in Azure for a HANA database data disk location.
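Here is a minimal sketch of how such a combined script might look, assuming the PerfCounters field positions shown above (it only covers disks that belong to a volume group; adapt it to your environment):

# For each physical volume: print device, Azure cache setting, volume group and Azure disk name.
pvs -o "pv_name,vg_name" --separator=' ' --noheadings | while read pv vg
do
   dev="${pv##*/}"
   cache=$(awk -F';' -v d="${dev}" '/;disk;Caching;/ { n=$4; sub(/\/dev\//,"",n); if (tolower(n) == d) print tolower($6) }' /var/lib/AzureEnhancedMonitor/PerfCounters)
   azdisk=$(awk -F';' -v d="${dev}" 'index($0, ";Phys. Disc to Storage Mapping;" d ";") { print $6 }' /var/lib/AzureEnhancedMonitor/PerfCounters)
   echo "${pv} ${cache} ${vg} ${azdisk}"
done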

Taking it a step further, if you have named your volume groups intelligently, your script can also check the cache setting against the name of the volume group to determine whether or not it is correct.
You can then embed this validation script into a “custom validation” within SAP LaMa and it will alert you automatically if your VM disk cache settings are not correct.

You may be wondering, why not do all this from the Azure Portal?
Well, the answer to that is that you don’t know what Linux VM volume groups those Azure disks are used by, unless you have tagged them or named them intelligently in Azure.