This blog contains experience gained over the years of implementing (and de-implementing) large scale IT applications/software.

Listing Azure VM DataDisks and Cache Settings Using Azure Portal, JMESPath & Bash

As part of a SAP HANA deployment, there is a set of recommendations around the Azure VM disk caching settings and the use of the Azure VM WriteAccelerator.
These features should be applied to the SAP HANA database data volume and log volume disks to ensure optimum performance of the database I/O operations.

This post is not about the cache settings, but about how it’s possible to gather the required information about the current settings across your landscape.

There are 3 main methods available to an infrastructure person to see the current Azure VM disk cache settings.
I will discuss these methods below.

1, Using the Azure Portal

You can use the Azure Portal to locate the VM you are interested in, then check its disks and look at the cache setting on each disk.
You can only see the disk cache settings under the VM view inside the Azure Portal.

While slightly counter-intuitive (you would expect to see the same under the “Disks” view), it’s because the disk cache feature is provided by the VM onto which the disks are bound, therefore it’s tied to the VM view.

2, Using the Azure CLI

Using the Azure CLI (Bash or PowerShell) to find the disks and get the settings.

This is by far the most common approach for anyone managing a large estate. It uses the existing Azure API layers and the Azure CLI to query your Azure subscription, return the data in JSON format and parse it.
The actual query is written in JMESPath (https://jmespath.org/) and is similar to XPath (for XML).

A couple of sample queries in BASH (my favourite shell):

List all VM names:

az vm list --query "[].name" -o table

List VM names, power state, O/S type, VM size, resource group and the data disk details (name, LUN, caching, size and whether the WriteAccelerator is enabled):

az vm list --show-details --query '[].{name:name, state:powerState, OS:storageProfile.osDisk.osType, Type:hardwareProfile.vmSize, rg:resourceGroup, diskName:storageProfile.dataDisks[].name, diskLUN:storageProfile.dataDisks[].lun, diskCaching:storageProfile.dataDisks[].caching, diskSizeG:storageProfile.dataDisks[].diskSizeGb, WAEnabled:storageProfile.dataDisks[].writeAcceleratorEnabled}' -o table

List all VMs with names ending d01 or d02 or d03, then pull out the data disk details and whether the WriteAccelerator is enabled:

az vm list --query "[?ends_with(name,'d01')||ends_with(name,'d02')||ends_with(name,'d03')]|[].storageProfile.dataDisks[].[lun,name,caching,diskSizeGb,writeAcceleratorEnabled]" -o tsv

To execute the above, simply launch the Cloud Shell in the Azure Portal and select “Bash”.

Then paste in the query and hit return.
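
If your landscape spans more than one subscription, the same query can be wrapped in a small loop over every subscription visible to your signed-in account. This is a minimal sketch, and the name filter is just the example pattern from above; adjust it to your own naming convention:

# Run the data disk / WriteAccelerator query against every subscription in turn
for sub in $(az account list --query "[].id" -o tsv); do
    az account set --subscription "$sub"
    az vm list --query "[?ends_with(name,'d01')||ends_with(name,'d02')||ends_with(name,'d03')]|[].storageProfile.dataDisks[].[lun,name,caching,diskSizeGb,writeAcceleratorEnabled]" -o tsv
done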

3, A Most Obscure Method.

Since SAP requires you to have the “Enhanced Monitoring for Linux” (OEM) agent extension installed, you can obtain the disk details directly on each VM.

For Linux VMs, the OEM extension creates a special text file of performance counters, which is read by saposcol (remember that?) for use by SAP diagnostic agents, ABAP stacks and other tools.

Using a simple piece of awk scripting, we can pull out the disk cache settings from the file like so:

awk -F';' '/;disk;Caching;/ { sub(/\/dev\//,"",$4); printf "/dev/%s %s\n", tolower($4), tolower($6) }' /var/lib/AzureEnhancedMonitor/PerfCounters

There’s a lot more information in the text file (/var/lib/AzureEnhancedMonitor/PerfCounters), and in my later post, Checking Azure Disk Cache Settings on a Linux VM in Shell, I show how you can pull out the complete mapping between Linux disk devices, disk volume groups, Azure disk names and the disk caching settings.
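
To gather this across a whole landscape rather than one VM at a time, the same extraction can be pushed out over SSH. A minimal sketch, assuming password-less SSH to each VM and a plain text file of hostnames (the file name hostlist.txt is just an example):

# For each host, grab the caching lines from the PerfCounters file and format them locally
while read -r host; do
    ssh "$host" "grep ';disk;Caching;' /var/lib/AzureEnhancedMonitor/PerfCounters" | \
        awk -F';' -v h="$host" '{ sub(/\/dev\//,"",$4); printf "%s /dev/%s %s\n", h, tolower($4), tolower($6) }'
done < hostlist.txt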


Making saptune Actually Work & Patching to v2

Having recently spent some time analysing the performance of a HANA database system, I got down to the depths of Linux device I/O performance on an Azure hosted VM.

There was no reason to suspect any issue, because during the implementation of the VM image build process, we had followed all the relevant SAP notes.
In our case, on SUSE Enterprise Linux for SAP 12, we were explicitly following SAP Note 1275776 “Linux: Preparing SLES for SAP environments”.
Inside that SAP note, you go through the process of understanding the difference between sapconf and saptune, and then actually configure saptune (since it comes automatically with the “for SAP” versions of SLES 12).

Once configured, saptune should apply all the best practices that are encompassed in a number of SAP notes including SAP Note 2205917 “SAP HANA DB: Recommended OS settings for SLES 12 / SLES for SAP Applications 12”, which is itself needed during the HANA DB installation preparation work.
If you follow the note, there are a number of O/S adjustments required for HANA, which can be applied either manually or (as recommended) automatically via saptune, provided the correct saptune profile is selected.

As part of our configuration, we had applied saptune solution profile S4HANA-DBSERVER (also noted in the SUSE documentation for SAP HANA).
This is applied using the standard:

saptune solution apply S4HANA-DBSERVER

You don’t get a lot of feedback from the saptune execution, but the fact there are no errors, indicates (normally) that it has done what has been requested.
You can check it has applied the profile by executing:

saptune solution list

The item that is starred in the returned list, is the profile that has been applied.
That’s it.

As part of my troubleshooting I even took the trouble of running the publicly available script sapconf_saptune_check (see here: https://github.com/scmschmidt/sapconf_saptune_check/blob/master/sapconf_saptune_check ), which just confirmed that saptune was indeed active/enabled and had a valid profile configured.

Back to the task of checking out the performance issue, and you can probably see where this is going now.
On investigation of the actual saptune profile contents, it was possible to see that a large majority of O/S changes had not been applied.
Specifically, we were not seeing the NOOP scheduler selected for the HANA disk devices.

By executing either of the following, you can check the currently selected scheduler:

grep -H '.*' /sys/block/s??/queue/scheduler

or

cat /sys/block/s??/queue/scheduler

The selected scheduler will be in square brackets.
In my case, I was seeing “[cfq]” for all devices. Not good and not the recommendation from SAP and SUSE.
This setting should be automatically adjusted by the tuned daemon.
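
Although the proper fix was to sort out saptune, the scheduler can also be switched by hand as a stop-gap. This is a minimal sketch only: it does not persist across reboots, and it assumes the HANA disks appear as standard /dev/sd* devices and that you run it as root:

# Temporarily select the noop scheduler on every SCSI disk device (run as root)
for dev in /sys/block/sd*/queue/scheduler; do
    echo noop > "$dev"
done

# Confirm: the active scheduler is shown in square brackets
grep -H '.*' /sys/block/sd*/queue/scheduler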

Looking at my version of saptune, I could see it was version 1.1.7 (from the output of the execution of the sapconf_saptune_check script).

Reading some of the recent blog posts from Soeren Schmidt (here: https://blogs.sap.com/2019/05/03/a-new-saptune-is-knocking-on-your-door/),
I could see that version 2 of saptune had now been released.

Downloading the newer version (not installing directly!), reverting the old solution profile, installing the new saptune version and finally re-applying the same profile, confirmed that saptune was the culprit.

The new saptune2 fixed the issue, immediately activating a number of critical O/S changes, including the NOOP scheduler setting on each device.

The moral of the story is therefore that, as well as following the SAP processes, you still need to validate that the tooling has actually done what it says it has done.
The new saptune2 has been incorporated into our build process, plus the configuration check scripts will be specifically checking for it.
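
For the checking part, saptune version 2 can verify the solution it has applied, which is what our check scripts now lean on. A minimal sketch of the sort of check we run (the solution name matches the one applied earlier; the /dev/sd* device naming is an assumption):

# Ask saptune 2 to verify that all notes in the applied solution are really in effect
saptune solution verify S4HANA-DBSERVER

# Belt and braces: list any disk device that is NOT using the noop scheduler
grep -H '.*' /sys/block/sd*/queue/scheduler | grep -v '\[noop\]'
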
However, since the upgrade from saptune1 to saptune2 could cause issues if it just blindly re-applied the “new” profile settings, SAP have made saptune follow a backwards compatible upgrade process, whereby the O/S settings are retained as they were before the upgrade was executed.

Therefore, as per the SAP Note 2816790 “Differences between sapconf and saptune” links, the upgrade process for an already applied profile is to revert it prior to the saptune upgrade, then apply the upgrade, then re-apply the profile.
This could therefore not just be rolled out via our standard SLES patching routine. We had to develop an automated script that would specifically pre-patch saptune to saptune2 using the correct procedure, before we embarked on the next SLES patching round.
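
The automated script itself boiled down to the documented revert / upgrade / re-apply sequence. A minimal sketch of the idea (the package name saptune and the S4HANA-DBSERVER solution reflect our SLES 12 setup; treat it as an outline rather than a drop-in script):

#!/bin/bash
# Pre-patch saptune from version 1 to version 2 ahead of the main SLES patching round
set -e

# 1. Revert the currently applied solution so the old settings are cleanly removed
saptune solution revert S4HANA-DBSERVER

# 2. Upgrade only the saptune package
zypper --non-interactive update saptune

# 3. Re-apply the same solution using the new saptune version
saptune solution apply S4HANA-DBSERVER

# 4. Show what is now active (the applied solution is starred)
saptune solution list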

As a post-note, you should make yourself familiar with the coming changes to the SLES scheduler settings, with the introduction of the NONE scheduler (see the links below for the related blog post).

Useful notes/links:
https://www.suse.com/c/sles-1112-os-tuning-optimisation-guide-part-1/
https://blogs.sap.com/2019/06/25/sapconf-versus-saptune-in-more-detail/
https://blogs.sap.com/2019/05/03/a-new-saptune-is-knocking-on-your-door/
https://www.suse.com/c/noop-now-named-none/

HANA 2.0 – Calc View – SAP DBTech JDBC 2048 Column Store Error

Scenario: During DB access of a HANA 2.0 SPS3 Calculation View from S/4HANA ABAP stack (via ABAP) or even directly in HANA Studio (via “Raw Data”), an error is displayed in a short dump or on screen along the lines of “SAP DBTech JDBC (2048: column store error: search table error: (nnnn) Instantiation of calculation model failed: exception 30600. An Internal error occurred”.

After investigation you observe the following error inside the indexserver trace log: “Could not get template scenario <SID>::_SYS_BIC:_SYS_SS_CE_<nnnn>_vers2_lang6_type1_CS_2_2_TMP (t -1) of calculation index <SID>::_SYS_BIC:<PACKAGE>/<CALCVIEW> (t -1). reason: CalculationEngine read from metadata failed.; Condition ‘aScenarioHandle.is_valid()’ failed.”.

The error clearly references the name of your Calculation View (calculation index) but it also references another object with a name like “_SYS_SS_CE_*”.

SAP note 1646743 explains that objects with a naming convention of “_SYS_SS_CE_<guid>_TMPTBL” are temporary tables created during compilation of procedure objects. Whilst our objects naming convention is not an exact match, the assumption is that the object is temporary in nature and created during the compilation of our Calculation View.

To backup the above theory, SAP note 2717365 matches the initial error message in some respects and shows the method to recompile the temporary object.
The note explains that to reproduce its described situation you must “Create a script calculation view which is created based on a procedure.”.

With this in mind, after checking our erroring Calculation View, it is clearly possible to see that ours utilises a “Script” as part of its design.

Therefore, we can assume that the temporary object with naming convention “_SYS_SS_CE_<nnnn>_vers2_lang6_type1_CS_2_2_TMP” is the temporary representation of the script from within our Calculation View.

Following the SAP note, we can recompile the object via its source Calculation View as follows using HANA Studio SQL execution (or hdbsql command line):

(NOTE: in our case the object is owned by user SAPABAP1, so we login/connect as that user in order to execute)

ALTER PROCEDURE "_SYS_BIC"."<PACKAGE>/<CALCVIEW>/proc" RECOMPILE;
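
For reference, the same statement can also be run from the O/S command line with hdbsql instead of HANA Studio. A minimal sketch, where the instance number, tenant database name and password are placeholders, and the package/view name is the same placeholder as above:

# Recompile the temporary procedure behind the scripted Calculation View, connected as SAPABAP1
hdbsql -i 00 -d HDB -u SAPABAP1 -p '<password>' 'ALTER PROCEDURE "_SYS_BIC"."<PACKAGE>/<CALCVIEW>/proc" RECOMPILE'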

The execution succeeds.
However on retrying to access the data within the view, we still get an error.
What happened? Well, looking again at our Calculation View, it appears that it references another Calculation View!
So we must recompile the procedures of all referenced Calculation Views as well.

To cut a long story short, it turned out that we had over 4 levels of Calculation Views before I decided to just recompile all procedures (where they existed) of all known Calculation Views. Some of the views were even in different schemas.

How do we obtain a list of all Calculation Views that use a script and would have temporary procedures?

We can use this SQL string to create the required list of “type 6” objects:

SELECT 'ALTER PROCEDURE "'||schema||'"."'||name||'" RECOMPILE;' FROM sys.p_objects_ WHERE type=6 AND name LIKE '%/proc'

How did I find this? All (or most) HANA objects are represented in the SYS.P_OBJECTS table.

Even temporary SQL objects need to be accounted for in the general administrative operations of the database; they need to be listed somewhere and have a corresponding object ID.
By executing the SQL as the SAPABAP1 user, we get output in a similar fashion as to that shown below, with the first line being a column header:

'ALTER PROCEDURE "'||SCHEMA||'"."'||NAME||'" RECOMPILE;'

ALTER PROCEDURE "_SYS_BIC"."sap.erp.sfin.co.pl/FCO_C_ACCOUNT_ASSIGNMENT/proc" RECOMPILE;

ALTER PROCEDURE "_SYS_BIC"."sap.erp.sfin.rtc/RTC_C_FISCMAPA/proc" RECOMPILE;

We can then simply execute the output SQL lines for each object to be recompiled.
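
If there are many statements, they can be run in a loop rather than pasted one at a time. A minimal sketch, assuming the generated statements have been saved (for example, exported from the HANA Studio result set) to a plain text file, one statement per line; the file name recompile.sql and the connection details are placeholders:

# Execute each generated ALTER PROCEDURE ... RECOMPILE statement as SAPABAP1
while read -r stmt; do
    hdbsql -i 00 -d HDB -u SAPABAP1 -p '<password>' "$stmt"
done < recompile.sql
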
On attempting access to the Calculation View, it now correctly returns data (or no data), and does not show an error message.

The next question is, why did we get this problem?

Looking back at SAP note 2717365 it says “This error happens because the temporary created objects were not cleared up properly when this happened with an error.”.
Remember that this is not an exact match for our error, but I think the explanation is good enough.

An error occurred during the creation of the temporary procedures that underpin our scripted Calculation Views.

We don’t know what the error or issue was, but subsequently recompiling those Calculation View temporary procedures fixed the issue.