This blog contains experience gained over the years of implementing (and de-implementing) large scale IT applications/software.

New SAP ASE Audit Logging Destination in 16.0.4

Let’s face it, auditing in SAP ASE 16 is difficult to configure due to the requirement to produce your own stored procedure and correctly size the rotating table setup with multiple database segments for each of the multiple audit tables. Once configured, you then had the realisation that to obtain the records, you needed to extract them from the database somehow, and then the problem of who does this task, what privileges they need, should they themselves be audited etc etc.

Good news! With the introduction of ASE 16.0 SP04, configuring auditing in the SAP ASE database just got dramatically easier and way more useable!

Introducing “audit trail type”

In previous versions of the SAP ASE database, database audit records were stored only in the sybsecurity database in a specific set of tables that you had to size and create and rotate yourself (or with a stored procedure).

Once the records were in the database, it was then up to you to define how and when those records would be analysed.
Depending on whether your SIEM tool supports direct ODBC/JDBC access to SAP ASE or not, would depend on how complex the extraction process would be.

In SP04 a new parameter was introduced called “audit trail type” where you can now set the audit store to be “syslog”.

When setting the store to be “syslog”, the audit records are pushed out to the Linux syslogd daemon (or rsyslogd or syslog-ng) and written to the O/S defined location according to the configuration of syslogd:

Each audit record gets a tag/program name of “SAP_ASE_AUDIT”, which means you can define a custom syslogd log file to hold the records, and also then specify a custom rotation should you wish.
Your syslogd logs may already be pulled into your SIEM tools, in which case you will simply need to classify and store those records for analysis.

With the new parameter set to “syslog” and the audit records being stored as file(s) on the file system, you will need to ensure that the file system has adequate space and establish a comfortable file retention (logrotate) configuration to ensure that audit records do not cause the file system to fill (preventing persistence of additional audit records).

Of course, should you enjoy torture, you can always go ahead and continue to use the database to store the audit records. Simply setting the new parameter “audit trail type” to “table”, will store the audit records in the database just like the previous versions of ASE.

Useful Links

What’s new in ASE 16.0.4

Parameter: audit trail type

Listing & Tagging Orphaned Azure Disks using Azure CLI from Bash Cloud Shell

Nobody likes wastage, but it happens.
Finding orphaned Azure disks is super simple using the Cloud Shell.

In this post I show how you can use the CLI, with a little Bash scripting in the Cloud Shell, to find unattached disks and also tag them for future removal.
There are plenty of PowerShell examples in the PowerShell Runbook gallery, but I want to show a CLI example.
I then go on to show how this could be scheduled, but it’s not as simple as first thought.

Using Bash Cloud Shell & the Az Command (CLI)

We can use the az command (CLI) in a bash Cloud Shell to list the disks, then use JMESPath to filter and find the ones that are not attached (“Unattached”) to any VM:

az disk list --query "[?diskState=='Unattached'].{name:name,state:diskState,size:diskSizeGb,sku:sku.name,tag:to_string(tags)}" -o tsv

ase01_datadisk_2        Unattached      256     Standard_LRS    null
ase01_OsDisk_1_8e31587ee6604463ada5167f91e6345f Unattached      30      StandardSSD_LRS null

Notice I have also included the “tags” column with some processing.
If you wanted to search for disks that are not attached, and filter by tag, then you can do the following.
First I apply a tag; I’m going to create a tag called “testTag”, with a value “testTagValue”:

az disk update --name ase01_datadisk_2 --resource-group UK-West --set tags.testTag=testTagValue

Now I have set a value for “testTag”, let’s query based on that specific tag, for “Unattached” disks:

az disk list --query "[?diskState=='Unattached'&&tags.testTag=='testTagValue'].{name:name,state:diskState,size:diskSizeGb,sku:sku.name,tag:to_string(tags)}" -o tsv

ase01_datadisk_2        Unattached      256     Standard_LRS    {"testTag":"testTagValue"}

You can search for a specific value, contains a part of a value, or alternatively, search for a value that is “null”.

Adjusting the Disks

Now we have a list of disks output from our query, we could just delete them.
But it’s probably better to tag them for deletion, allowing a deletion at some point after a more detailed review. I would look to produce a report that could go to a CAB review, ready for deletion.

Here’s how we can use the power of the Cloud Shell, the Azure CLI and Bash scripting to tag those disks that are Unattached:

set updated=0
set failed=0
set tabchar="$(printf "\t")"
set deldate="$(date -d "+ 30 days" "+%Y%m%d")"
az disk list --query "[?diskState=='Unattached'].{name:name,rg:resourceGroup}" -o tsv | while read line
do
   diskname="${line%%${tabchar}*}"
   rgname="${line##*${tabchar}}"
   echo -n "Updating: $diskname ... "
   az disk update --name ${diskname} --resource-group ${rgname} --set tags.deleteDate=${deldate} >/dev/null
   if [[ $? -eq 0 ]] ; then 
      echo "[SUCCESS]"
      (( updated++ ))
    else
      echo "[FAILED]"
      (( failed++ ))
   fi
done
printf "### SUMMARY ### \n Updated: %s\n Failed: %s\n" $updated $failed

Let’s look at the above script in detail below:

  1. We define the variable to hold our updated count.
  2. We define the variable to hold our failed count.
  3. We execute the CLI query to get the list of disks, outputting the disk name and resource group into a “do” loop a line at a time.
  4. The start of the loop body.
  5. We have to capture the character that represents a TAB character.
  6. We get the disk name out of the line by splitting the line by TAB from the left.
  7. We get the disk resource group out of the line by splitting the line by TAB from the right.
  8. A little output text to say which disk is being worked on.
  9. The call to the CLI to update the disk, applying the tag “deleteDate” with a value of today + 30 days in the format yyyymmdd.
  10. We detect the successful (or not) execution of the CLI command.
  11. Output “success” for a successful execution.
  12. Update the success variable count.
  13. Alternatively, if we failed.
  14. Output “failed” for a failed execution.
  15. Update the failed variable count.
  16. Closure of the “if” statement.
  17. Closure of the loop.
  18. Output a summary count of success vs failed.

Here is the execution sample:

Reporting on the deleteDate Tag

Now we have our disks tagged ready for deletion, we might want to find disks that have a deleteDate older than a specific date. These disks would then be ripe for deletion.
Here’s how we can do that,:

az disk list --query "[?tags.deleteDate<'20210402'].{name:name,rg:resourceGroup,tag:to_string(tags)}" -o tsv

We can even go as far as using Bash to inject the current date into the query, thereby allowing us to look for any disks due for deletion from today without adjusting the query each time:

az disk list --query "[?tags.deleteDate<'$(date "+%Y%m%d")'].{name:name,rg:resourceGroup,tag:to_string(tags)}" -o tsv

How About Scheduled Execution?

Well, using the CLI and the Bash shell inside an Azure Automation account is not possible right now.
Instead we would need to convert our disk update scripted CLI example to PowerShell to make it Runbook compatible, then we can schedule inside an Azure Runbook as a PowerShell Runbook.
The code needs to change slightly because we need to use the automation “RunAs Account” feature to connect from our Runbook, but it looks very much like our CLI code in Bash:

IMPORTANT: All the “simple” help guides on creating a PowerShell Runbook fail to mention the need to actually import the required PowerShell modules that you will need to run your code. In the Automation Account section, look at the “modules”. For the code below you need 2 additional modules: Az.Accounts and Az.Compute which you can import from the Gallery. Seems simple, but as I said, not obvious.

$Conn = Get-AutomationConnection -Name 'AzureRunAsConnection'
try{
    $auth = Connect-AzAccount -ServicePrincipal -Tenant $Conn.TenantID -ApplicationId $Conn.ApplicationID -CertificateThumbprint $Conn.CertificateThumbprint
} Catch {
    throw $_.Exception
}
$updated=0
$failed=0
$deldate=(get-date -Format "yyyyMMdd" -date $([datetime]::parseexact($(get-date -format 'yyyyMMdd'), 'yyyyMMdd', $null)).AddDays(+30))
$updateConfig = New-AzDiskUpdateConfig -tag @{ "deleteDate" = "$deldate" }
Get-AzDisk |? { $_.DiskState -eq "Unattached" } |% { 
   write-output "Updating: $($_.Name) ..."
   Update-AzDisk -ResourceGroupName $_.ResourceGroupName -DiskName $_.Name -DiskUpdate $updateConfig >$null 2>&1
   if ( $? -eq $true ) { 
      echo "   [SUCCESS]"
      $updated++
    }
    else {
      echo "   [FAILED]"
      $failed++
   }
}
echo "### SUMMARY ### `n Updated: ${updated}`n Failed: ${failed}`n"

With the above code in a PowerShell Runbook, we can test it:

Obviously if you will be scheduling the code above, then you may wish to change it slightly so that it excludes any orphaned disks already with a deleteDate tag.

Summary

Using the CLI, Cloud Shell and some Bash scripting, we have a simple mechanism to tag unattached disks, then use that tag to report on disks due for deletion after a specific date (or we could use today’s date).
This is a great solution for those with Bash shell scripting skills and as shown it is reasonably simple.

We have also looked at the possibility of scheduling the code and found that it is not possible for CLI. Instead a PowerShell Runbook is a possible solution that allows the scheduling of PowerShell code.
PowerShell could be your pain point, but it really isn’t that far from shell.

SAP Netweaver ICM Fast Channel Architecture

SAP Netweaver has been around for many, many years now. In fact we have had very nearly 20 years of Netweaver.
Back in March 2001, SAP acquired TopTier and went on to use TopTier’s application as the underpinning to the SAP Netweaver application server (WebAS).
Now this would not have been the Netweaver Java stack, that was to come later in the form of WebAS 6.30.
My point is, you would imagine by now that Netweaver is known inside and out by most BASIS professionals, but this is just not the case. It’s a complex and very capable application server and there are things that we know and things that we know in detail.
One of the things that seems to be little known is the FCA and it’s role within the ICM of the Netweaver Java stack.

In this post I want to explain the function of the SAP Netweaver Internet Communication Manager (ICM) Fast Channel Architecture (FCA) and how this is responsible for routing the HTTP communications to your Netweaver Java stack.

As usual, a little context will help set the scene.

A History of Netweaver Java

Before Netweaver 7.1, the Java stack did not have an Internet Communication Manager (ICM). This was reserved only for the Netweaver ABAP stack.
Instead, these old Netweaver Java versions had additional Java nodes (JVMs) called dispatcher nodes (in addition to the server0 node).

The dispatcher node was responsible for receiving and dispatching the inbound HTTP requests to the server nodes of the instance.

The ICM Was Added

Since Netweaver 7.1, the Java stack was given the ICM, which runs from the Kernel binaries, instead of a JVM.


The benefits of this change were:

  • Faster startup and response time (Kernel is C++ compiled binary code).
  • Smaller memory requirements.
  • Same ICM in Netweaver ABAP and Netweaver Java (same Kernel DB independent part).
  • Use of profile files for configuration (for SSL, security, memory params) instead of ConfigTool.

Identifying the FCA

We know the ICM is visible as a separate binary executable process at the operating system level.
In Windows we see “icman.exe” and in Unix/Linux we see “icman”.
At execution, the icman program reads the instance profile to determine it’s configuration.

The Fast Channel Architecture (FCA) is a specific, dedicated set of memory pipes (MPIs) in the shared memory region, accessible by both the ICM and the Java server nodes and used as a method of super fast inter-process communication between the ICM and the Java server nodes.
In Linux, shared memory segments are visible using the “ipcs -m” command, in Windows these are memory mapped files and you cannot see them so easily, you would need a 3rd party tool.

By using shared memory and the concept of memory pipes, it avoids the need for the data in a HTTP request/response to be sent from the ICM to the Java Server node. Instead of sending the actual data, a simple memory pointer can be sent (smaller and consistent in size), telling the Java Server node where to look in memory, for the data.
Effectively what this means is that the shared memory area for the MPIs, sits logically between the ICM and the Java Server nodes.

According to the Netweaver AS Java documentation, the FCA is itself just another MPI, that acts as a FIFO queue.
The HTTP requests coming into the ICM via a TCP port, travel through a regular (anonymous) MPI, before the ICM dispatches the request into a specific FCA queue.
If you have two server nodes on your Java stack (server0 and server1), then the ICM will query the server node to determine the back-end load, then push the request to the specific FCA queue of the target server node that has capacity to handle the request.
Therefore, if you have two server nodes, you will have a dedicated FCA queue for each.
It is the responsibility of the Java server node, to create the FCA queue in the ICM shared memory during start-up.

Once the HTTP request (or rather, the memory pointer to the request) hits the FCA, it becomes the responsibility of the Java server node to pull the request off the queue into a thread for processing.
Inside the Java Server node, these threads are known as the FCA threads or HTTP Worker Threads.
If you run a SAP PI/PO system, then you may already be familiar with these threads and their configuration.
You may have seen these threads when running thread dumps for SAP support incidents.

There are two methods to actually see the FCA Queues:

  • Within the SAP ICM Web Administration page.
  • Using the “icmon” command line tool.

We can call the icmon tool as follows:

icmon pf=<path-to-instance-profile>

then from the menu select "m"
then from the menu select "y"

Once the MPI list is dumped (option “y”), the the FCA queues are visible at the end of the output:

...
MPI<174>: 4d50494d 'ANON' 11 50 0 0 0 0(4996) 1(30001) 1(30001)
MPI<173>: 4d50494d 'ANON' 10 50 0 0 0 0(4996) 1(30001) 1(30001)
MPI<60>: 4d50494d 'TS1_00_1234650_HTTP_WAIT' 5 -1 20 0 0 0(4996) 1(10002) 0(-1)
MPI<5f>: 4d50494d 'TS1_00_1234650_HTTP' 4 -1 20 0 0 0(4996) 1(10002) 1(30001)
MPI<58>: 4d50494d 'TS1_00_1234651_HTTP_WAIT' 2 -1 20 0 4406 0(4996) 1(10003) 0(-1)
MPI<57>: 4d50494d 'TS1_00_1234651_HTTP' 7 -1 20 0 0 0(4996) 1(10003) 1(30001)
MPI<52>: 4d50494d 'TS1_00_1234650_P4' 6 -1 20 0 0 0(4996) 1(10002) 1(30001)
MPI<4d>: 4d50494d 'TS1_00_1234651_P4' 3 -1 20 0 0 0(4996) 1(10003) 1(30001)
MPI<4>: 4d50494d 'ANON' 1 1 0 0 0 0(4996) 1(30001) 1(30001)
MPI<2>: 4d50494d 'ANON' 0 1 0 0 0 0(4996) 1(30001) 1(30001)
 
    q - quit
    m - menue 

NOTE: For those interested, the 4d 50 49 4d at the beginning of each line, translates from HEX to ASCII as “MPIM”.

In my example, you can see I have 2 Java server nodes registered at this ICM: 1234650 and 1234651.
You will notice that there are 3 queues for each Java server node.
The P4 queue is self explanatory, it is used to talk to the Java server node on it’s P4 port (SAP proprietary protocol) and is probably used to acquire capacity/load information from the server node.
Of the other 2 queues, one queue is the “WAIT” queue and is where (I think) the inbound requests (destined to the Java server node) are held, before they enter the other request queue which is where (I think) the Java server node is waiting to process the requests.
(There is not a great deal of documentation on the above, but I have seen instances where the WAIT queue fills, which makes me believe it’s a holding area).

In the dev_icm trace we can also see the joining of the server nodes to the ICM for the HTTP protocol (other protocols are supported, such as Telnet, P4):

[Thr 140608759801600] Wed Mar 17 22:59:32:934 2021
[Thr 140608759801600] JNCMIHttpCallLBListener: node 1234650, service Http joins load balancing
[Thr 140608759801600] HttpJ2EELbPut: server 1234650 started protocol HTTP, attached to request queue TS1_00_1234650_HTTP
[Thr 140608759801600] JNCMIHttpMsPutLogon: set http logon port (port:50000) (lbcount: 2)
[Thr 140608759801600] JNCMIHttpMsPutLogon: set https logon port (port:50001) (lbcount: 2)

In the Java server node developer trace files (e.g. dev_server0 and dev_server1), we can see the name of the node (JNODE_10002 for server0) which is also visible in the dev_icm trace output in column 10:

F [Thr 139637668607872] Wed Mar 17 22:53:49 2021
F [Thr 139637668607872] JSFSetLocalAddr: using NI defaults for bind()
I [Thr 139637668607872] MtxInit: JNODE_10002 0 2

The relevant dev_icm output:

MPI<60>: 4d50494d ‘TS1_00_1234650_HTTP_WAIT’ 5 -1 20 0 0 0(4996) 1(10002) 0(-1)
MPI<5f>: 4d50494d ‘TS1_00_1234650_HTTP’ 4 -1 20 0 0 0(4996) 1(10002) 1(30001)

Sizing the FCA

The size of the FCA is not directly configurable.
Instead, we can configure the size of the shared memory area (total area) for all the MPIs using parameter “mpi/total_size_MB“, then from this total size, the maximum possible size of any individual MPI is fixed to 25% of the total area size.

In later Netweaver versions (7.40+), it is not recommended to adjust “mpi/total_size_MB“, instead, adjust the “icm/max_conn” parameter, which is then used to calculate “mpi/total_size_MB“.
The internal formula is described as:
mpi/total_size_MB = min(0.06 * $(icm/max_conn) + 50, 2000)

There is another undocumented (apart from SAP notes) parameter, which can allow you to increase the max size of an MPI. However it means any one MPI can consume more of the total area than the default 25%.
It is therefore not advised to be adjusted.

We can see the value of the parameter “mpi/total_size_MB” in the ICM developer trace file (dev_icm) during it’s start up. This is useful as it shows us the calculation based on the formula mentioned above.
We are looing at “total size MB” right at the end of the line:

[Thr 140610607359872] MPI init, created: pipes=40010 buffers=19985 reserved=5995 quota=10%, buffer size=65536, total size MB=1250

Common FCA Errors

There are a dedicated set of SAP notes for FCA errors, such as 1867119.
Based on the architecture we can see that they describe issues with throughput (through the FCA Queue), and with issues in the Java server node threads causing the FCA Queues to fill.
They also show issues with sizing of the MPIs, and the number of the worker threads (for high throughput scenarios).

In my experience the following types of FCA errors can be seen in the Java server developer traces “dev_server<n>” files:

  • “-3” error: The Java server node is unable to put a response back onto the FCA Queue, probably because the MPI area is full from a full FCA Queue. This can happen if one of the Java server node HTTP Worker threads has become stuck (waiting) for resources or for the database.
    As you will see from my previous diagram, a full MPI area will then start to affect HTTP access to both Java server nodes as they share the ICM (it’s a single point of failure).
  • “-7” error: This affects one individual Java server node and prevents it from pulling requests off the FCA queue in a timely manner. This specific issue is usually a timeout mismatch between the HTTP provider and the ICM.

Both of the above errors look similar, but one is a lack of resources in the Java stack and the other is a full FCA Queue (in shared memory) due to inaction (stuck threads) in the Java stack.
The “-7” error can therefore present itself as an issue in the ICM or in the Java stack, but it is usually a problem in the Java stack that causes it to close the connection early.

Summary

There you have it, the simple FCA queue that serves HTTP requests to your Java Server nodes.
We learned:

  • Netweaver Java was given the ICM in 7.1 onwards.
  • The ICM in the Netweaver Java and ABAP stacks is the same binary.
  • The ICM uses shared memory for the MPIs.
  • The shared memory area is controlled via a parameter of which it’s value is controlled via 1 parameter (in NW 7.40+).
  • The FCA queues are MPIs.
  • Only memory pointers are passed through the FCA Queues.
  • The Java server nodes are responsible for creating the FCA queues in the ICM shared memory.
  • There are 2 FCA queues for each server node.
  • The developer traces store information about the size of the ICM shared memory and the registration of the Java Server nodes to a queue.
  • There are a known set of errors that can occur and are documented in SAP notes.
Useful SAP References
  • SAP Note 1867119 – No more memory for FCA
  • SAP Note 2417488 – Resource leak for MPI buffers in FCA communication
  • SAP Note 1945745 – How to increase HTTP Worker (FCA) threads in PI
  • SAP Note 2579836 – AS Java system has performance problem – FCAException – Best practices and tuning recommendations.
  • SAP Note 2997765 – AS Java system has performance problem – FCAException – Best practices for analysis
  • SAP Note 2276273 – AS Java – How to identify the largest MPI buffer consumer by MPI dump

HowTo: Install Azure Enhanced Monitoring for Linux for SAP

One SAP support prerequisite for running SAP on Azure, is that you must have Azure Enhanced Monitoring for Linux installed onto the Azure Linux VMs where your SAP application runs (including DB servers). Details are in SAP note 2015553.

In this brief post I show how to check if it is already installed, then how to install it, without needing to install the Powershell Azure Cmdlets.

What is Azure Enhanced Monitoring for Linux?

Azure Enhanced Monitoring for Linux (AEM) is an Azure VM extension installed onto the target Linux VM.
The extension uses the Azure Instance Agent to pull additional telemetry information down onto the local VM, and places it into a file on the Linux file system called /var/lib/AzureEnhancedMonitor/PerfCounters.

This special file is pure ASCII text with data inside that is semi-colon separated.
You can use Linux command line utilities to query information from the file (it’s readable by any user).

The file is parsed by the SAP Host Agent (also installed on every SAP VM) and made available in the monitoring memory segment used by the Netweaver ABAP stack, with the data being visible in transaction ST06 (OS06).

How to Check if AEM Is Installed

There are a number of ways to check if Azure Enhanced Monitoring for Linux is installed on a VM:

  • Inside the VM in Linux we can check for the existence of file: “/var/lib/AzureEnhancedMonitor/PerfCounters”
  • Inside the VM in Linux we can check the extension home dir exists: “/var/lib/waagent/Microsoft.OSTCExtensions.AzureEnhancedMonitorForLinux-*”
  • In the Azure Portal, we can check the status of the extension in the Azure Portal:
  • In the Azure Cloud Shell, we can either Test or Get the AEM Extension to see if it is installed:
Get-AzVMAEMExtension -ResourceGroupName <RG-NAME> -VMName <VM-Name>
Test-AzVMAEMExtension -ResourceGroupName <RG-NAME> -VMName <VM-Name>

Installing AEM

There are two ways to install the Azure Enhanced Monitoring for Linux extension into a VM:

  • Using local PowerShell (on your computer) with the Azure Cmdlets installed.
    You will need to have the rights on the local machine to perform the install of the Azure Cmdlets.
    I will not cover this method as it is quite tedious to setup and the chances are that your PowerShell is locked down by your company and will not allow you to install the required Cmdlets.
  • Using Powershell in the Azure Portal Cloud Shell.
    This has all the required Cmdlets already installed, but to setup the Cloud Shell you will need rights in Azure to be able to create a Storage Account to use for your shell home location.

Out of the two options, I usually opt for the Cloud Shell. Once you have it setup, you will find you can use it for many other things and access it from anywhere!
In this post I will be using Cloud Shell to do the installation.

To install the AEM extension, we use Powershell commands to do the following sequence of tasks:

  • Obtain our subscription context.
  • Deploy the extension to the specific VM in the subscription.

Let’s start the Cloud Shell (NOTE: You will need a Storage Account for the Cloud Shell to work).
Go to the Azure Portal and click the button on the button bar:

Make sure that you are in a PowerShell Shell:

We may need to switch to a specific subscription.
We can list all subscriptions by calling Get-AzSubscription and filtering on the Id property:

Get-AzSubscription | Select-Object Id

We can then set the context of our Cloud Shell to the specific subscription Id as follows:

$context = Get-AzSubscription -SubscriptionId '<SubscriptionID>'
Set-AzContext -SubscriptionObject $context

Once the code has executed, we can check if the AEM extension is already installed:

Get-AzVMAEMExtension -ResourceGroupName <RG-NAME> -VMName <VM-Name>

If the AEM extension is already installed, then we will see output being returned from the Get command:

ResourceGroupName       : UK-West
VMName                  : vm01
Name                    : AzureEnhancedMonitorForLinux
Location                : ukwest
Etag                    : null
Publisher               : Microsoft.OSTCExtensions
ExtensionType           : AzureEnhancedMonitorForLinux
TypeHandlerVersion      : 3.0
Id                      : /subscriptions/mybigid/resourceGroups/UK-West/providers/Microsoft.Compute/virtualMachines
                          /vm01/extensions/AzureEnhancedMonitorForLinux
PublicSettings          : {
                            "cfg": [
                              {
                                "key": "vmsize",
                                "value": "Standard_D4s_v3"
                              },
                              {
                                "key": "vm.role",
                                "value": "IaaS"
                              },
                              {
                                "key": "vm.memory.isovercommitted",
                                "value": 0
                              },
                              {
                                "key": "vm.cpu.isovercommitted",
                                "value": 0
                              },
                              {
                                "key": "script.version",
                                "value": "3.0.0.0"
                              },
                              {
                                "key": "verbose",
                                "value": "0"
                              },
                              {
                                "key": "href",
                                "value": "http://aka.ms/sapaem"
                              },
                              {
                                "key": "vm.sla.throughput",
                                "value": 96
                              },
                              {
                                "key": "vm.sla.iops",
                                "value": 6400
                              },
                              {
                                "key": "wad.isenabled",
                                "value": 0
                              }
                            ]
                          }
ProtectedSettings       :
ProvisioningState       : Succeeded
Statuses                :
SubStatuses             :
AutoUpgradeMinorVersion : True
ForceUpdateTag          : 637516905202791108
EnableAutomaticUpgrade  :


If the AEM extension is not installed, not output will be seen from the “Get” command.
We can then install the AEM extension with the “Set-AzVMAEMExtension” command as follows:

Set-AzVMAEMExtension -ResourceGroupName <RG-NAME> -VMName <VM-Name>

The extension should be installed successfully.
If you need to remove it, you can use the “Remove-AzVMAEMExtension” command.

There is a “Test” command that you can call to test the AEM:

Test-AzVMAEMExtension -ResourceGroupName <RG-NAME> -VMName <VM-Name>

Finally, if you want to see the additional command line options, then use the standard “Get-Help” as follows:

Get-Help Set-AzVMAEMExtension -Full

Issues with AEM

There’s one known issue with Azure Enhanced Monitoring for Linux, the number of data disks reported in the PerfCounters file seems to be limited to 9.
This means that if you have more than 9 data disks, the performance data may not be visible in the file and therefore not visible in SAP.
It’s possible a fix is on the way.

SAP PI/PO Performance Statistics Data Extract and Analysis using Excel & Power Query

End-to-End performance analysis of a specific SAP PO/PI interface is a tricky business using the out-of-the-box tools. For example, you might be thinking that SAP Solution Manager can provide this detail, but it is not able to correlate inbound SAP Web Dispatcher HTTP logs or Netweaver Java ICM logs to the PI/PO interface.
Instead you need to implement your own monitoring to bring these components together, unless you implement a complete tool like AppDynamics.

In this post I will show how to extract the SAP PI/PO performance data and import into Excel, using a Power Query to transform it from the XML format into something that we can report on.
After seeing my example, you can take the same design away and, using PowerBI, construct a dashboard for reporting. You can also choose to incorporate the HTTP logs, to give you an the end-to-end performance analysis by interface.

This post is similar to another set of blog posts I published, showing how to pull data about your virtual machines directly from Azure, see: List Your Azure VMs in Excel. That other post used Power Query to transform JSON.

Data Sources

Before we go and start Excel, we need to understand where our data will be coming from.
There are 4 locations that we can report on in a SAP Netweaver AS Java stack running SAP PI/PO:

  • SAP Web Dispatcher (load balancer across multiple App instances).
  • SAP ICM (load balancer across multiple NW java server nodes).
  • SAP NW HTTP Provider.
  • SAP PI/PO performance data servlet.

The last item is the key one, which will require some Power Query goodness to transform the data.
We use the 2014 blog post from Vadim Klimov to see how to pull this data direct from PI/PO using the servlet “PerformanceDataQueryServlet”.

For the Web Dispatcher, the ICM and the HTTP Provider, we really need to choose just one of those points to collect the log information.
Since our source system is handing over processing to “us” at the Web Dispatcher, then that would be the logical location to collect the HTTP logs.
However, some companies use a load balancing appliance like an F5 at the entry-point, in which case, you would be better gathering the HTTP logs from each of the ICM server processes.

The reason for using the HTTP logs from the front-end of the architecture stack, is because you want to capture any HTTP 50x messages caused by unavailability of the back-end parts.
For example, if SAP Netweaver is down, then the Web Disp logs would show a HTTP 503 (service unavailable).
If the PO application is not started inside SAP Netweaver, then the ICM logs would show a HTTP 503.
You want to collect the logs from the closest point of the handover between your source system and the Netweaver stack.

For the HTTP log sources, we have a little bit of an issue.
In most cases, logging is not enabled in Web Dispatcher and ICM configurations. To enable the logging we need to look at the parameter “icm/HTTP/logging_<xx>“.
Ideally what we need for the log format is: %h %u %t “%r” %s %H %L
This will give:

  • %h = Source IP.
  • %u = BASIC Auth username or cert common name.
  • %t = Date/time.
  • %r = The request with query string !IMPORTANT!
  • %s = The HTTP response code e.g. 200 or 500 etc.
  • %H = The name of the server host.
  • %L = Response time in milliseconds.

The log file should be switched DAILY (SWITCHF=day) to prevent it from growing too large.
We will need to transform the data in the log, but we can do this in Power Query or in a script at source.

Data Transfer

We now need to understand how we will transfer the data from the data source to Excel (or PowerBI).
Natively, Excel (and PowerBI) can query a HTTP target to obtain data in a variety of formats.
Since XML is supported with no problems, this enables us to call the PI/PO PerformanceDataQueryServlet directly from Excel.

For those feeling adventurous, the HTTP logs can actually be picked up by Azure Log Analytics. You may need to adjust the date/time format with a script, but then it will be possible to have them stored in your workspace for pulling into PowerBI.

Alternatively, you will need to place the HTTP logs into a storage location on a regular interval, somewhere accessible from Excel/PowerBI. This could be Sharepoint or an Azure Storage Account.
Another option is to have them placed into a location that serves HTTP, such as the SAP instance agent like I have shown before. For the Web Dispatcher you may have 2 logs (in an active/active setup) for the ICM you will have a log file for each Application server instance.
By naming the log files in an intelligent manner, you can ensure that your Power Query can always find the files (e.g. don’t include the date/time in the file name).

Data Aquisition

With your data accessible via HTTP, you can use Excel or PowerBI to process it.
In this example, we will go through the processing for the PerformanceDataQueryServlet, since that is the hardest to process in its raw XML format, with multiple nested tables. The nested tables is the reason we use Power Query to transform it.

Open Excel and create a new workbook, then select the “Data” tab:

Click “Get Data” and select “From Other Sources”, then click “Blank Query”:

Click “Advanced Editor”:

Remove any existing text from the query box:

At this point, we can paste in the code necessary to obtain our data, but first we need to understand the URL composition correctly.
For this we can refer to the 2014 blog post from Vadim Klimov to understand the required parameters.

Here’s my sample HTTP call:
https://sapts1app1:50001/mdt/performancedataqueryservlet?component=af.ts1.sapts1db01&begin=2021-01-10T00:00:00.000Z&end=2021-01-11T00:00:00.000Z

This can be broken down as follows:

EntryValue
Https://sapts1app1:50001Host address of PO App server (this can be any one of the Java instances).
/mdt/performancedataqueryservletURI for the PerformanceDataQueryServlet
component=af.ts1.sapts1db01The name of our AAEX
begin=2021-01-10T00:00:00.000ZThe begin time of our data selection period.
end=2021-01-11T00:00:00.000ZThe end time of our data selection period.

Something you will notice about our URL is that we are using the HOURLY data selection period, listing data for a 24 hour period aggregated by hour.
We don’t really have much choice with the PerformanceDataQueryServlet, as we can only choose from MINUTE, HOURLY or DAILY with aggregation levels being 15mins, 1hour or 1day.

If we were to decide not to pull the data via HTTP, then we could save it to a flat file.
The data format that will be returned from the HTTP call could be pre-saved.
Here’s my sample data returned from the call to the PerformanceDataQueryServlet:

<?xml version="1.0" encoding="UTF-8" ?>
<PerformanceDataQueryResults xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://sapts1app1:50001/mdt/monitor/PerformanceDataQuery.xsd">
<Result>
<Code>OK</Code>
<Details>OK</Details>
<Text>Successfully&#x20;read&#x20;performance&#x20;data.</Text>
<Component>af.ts1.sapts1db01</Component>
<PeriodType>DAILY</PeriodType>
<BeginTime timezone="UTC">2021-01-10&#x20;00&#x3a;00&#x3a;00.0</BeginTime>
<EndTime timezone="UTC">2021-01-11&#x20;00&#x3a;00&#x3a;00.0</EndTime>
</Result>
<Data>
<ColumnNames>
<Column>INBOUND_CHANNEL</Column>
<Column>OUTBOUND_CHANNEL</Column>
<Column>DIRECTION</Column>
<Column>DELIVERY_SEMANTICS</Column>
<Column>SERVER_NODE</Column>
<Column>FROM_PARTY_NAME</Column>
<Column>FROM_SERVICE_NAME</Column>
<Column>TO_PARTY_NAME</Column>
<Column>TO_SERVICE_NAME</Column>
<Column>ACTION_NAME</Column>
<Column>ACTION_TYPE</Column>
<Column>SCENARIO_IDENTIFIER</Column>
<Column>MESSAGE_COUNTER</Column>
<Column>MAX_MESSAGE_SIZE</Column>
<Column>MIN_MESSAGE_SIZE</Column>
<Column>AVG_MESSAGE_SIZE</Column>
<Column>MAX_RETRY_COUNTER</Column>
<Column>MIN_RETRY_COUNTER</Column>
<Column>AVG_RETRY_COUNTER</Column>
<Column>AVG_PROCESSING_TIME</Column>
<Column>TOTAL_PROCESSING_TIME</Column>
<Column>MEASURING_POINTS</Column>
</ColumnNames>
<DataRows>
<Row>
<Entry>HTTPSender_TS1</Entry>
<Entry>SOAPReceiver_Proxy</Entry>
<Entry>OUTBOUND</Entry>
<Entry>BE</Entry>
<Entry>1234567</Entry>
<Entry></Entry>
<Entry>BSR_SAP</Entry>
<Entry></Entry>
<Entry>TS_SERVER</Entry>
<Entry>ProcessingIn</Entry>
<Entry>urn&#x3a;in2db.com&#x3a;ERP&#x3a;Test</Entry>
<Entry>&#x7c;BSR_SAP&#x7c;ProcessingOut&#x7c;&#x7c;</Entry>
<Entry>3</Entry>
<Entry>2396</Entry>
<Entry>1391</Entry>
<Entry>1817</Entry>
<Entry>0</Entry>
<Entry>0</Entry>
<Entry>0.00</Entry>
<Entry>524476</Entry>
<Entry>1573428</Entry>
<Entry> <MeasuringPoints><MP> <Name>MS&#x3a;module_in&#x3a;CallSapAdapter</Name> <Sequence>1</Sequence> <Max>394</Max> <Avg>349</Avg> <Min>261</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;SI</Name> <Sequence>2</Sequence> <Max>12</Max> <Avg>9</Avg> <Min>8</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;BI</Name> <Sequence>3</Sequence> <Max>73</Max> <Avg>60</Avg> <Min>52</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;VI</Name> <Sequence>4</Sequence> <Max>12</Max> <Avg>8</Avg> <Min>7</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;MS</Name> <Sequence>5</Sequence> <Max>1266</Max> <Avg>1050</Avg> <Min>771</Min></MP><MP> <Name>MS&#x3a;Message_Put_In_Store</Name> <Sequence>6</Sequence> <Max>155</Max> <Avg>112</Avg> <Min>90</Min></MP><MP> <Name>MS&#x3a;Message_Put_In_Disp_Queue</Name> <Sequence>7</Sequence> <Max>2328</Max> <Avg>836</Avg> <Min>82</Min></MP><MP> <Name>MS&#x3a;Message_Wait_In_Disp_Queue</Name> <Sequence>8</Sequence> <Max>1445</Max> <Avg>630</Avg> <Min>203</Min></MP><MP> <Name>MS&#x3a;Message_Put_In_Queue</Name> <Sequence>9</Sequence> <Max>44</Max> <Avg>42</Avg> <Min>42</Min></MP><MP> <Name>MS&#x3a;Message_Wait_In_Queue</Name> <Sequence>10</Sequence> <Max>323</Max> <Avg>263</Avg> <Min>195</Min></MP><MP> <Name>MS&#x3a;Message_Update_Status</Name> <Sequence>11</Sequence> <Max>233</Max> <Avg>166</Avg> <Min>128</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;AM</Name> <Sequence>12</Sequence> <Max>114891</Max> <Avg>41811</Avg> <Min>2755</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;SO</Name> <Sequence>13</Sequence> <Max>59</Max> <Avg>40</Avg> <Min>24</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;VO</Name> <Sequence>14</Sequence> <Max>44</Max> <Avg>33</Avg> <Min>25</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;AT</Name> <Sequence>15</Sequence> <Max>468</Max> <Avg>364</Avg> <Min>304</Min></MP><MP> <Name>MS&#x3a;module_out&#x3a;sap.com&#x2f;com.sap.aii.af.soapadapter&#x2f;XISOAPAdapterBean</Name> <Sequence>16</Sequence> <Max>1008279</Max> <Avg>478000</Avg> <Min>131434</Min></MP><MP> <Name>MS&#x3a;Resp&#x3a;stage&#x3a;BI</Name> <Sequence>17</Sequence> <Max>575</Max> <Avg>481</Avg> <Min>395</Min></MP><MP> <Name>MS&#x3a;Resp&#x3a;Message_Put_In_Store</Name> <Sequence>18</Sequence> <Max>157</Max> <Avg>136</Avg> <Min>121</Min></MP><MP> <Name>MS&#x3a;Resp&#x3a;Message_Update_Status</Name> <Sequence>19</Sequence> <Max>89</Max> <Avg>86</Avg> <Min>81</Min></MP> </MeasuringPoints></Entry>
</Row>
<Row>
<Entry>SOAP_Sender</Entry>
<Entry>SOAPReceiver_Proxy</Entry>
<Entry>OUTBOUND</Entry>
<Entry>EO</Entry>
<Entry>1234567</Entry>
<Entry></Entry>
<Entry>BSR_SAP</Entry>
<Entry></Entry>
<Entry>TS_SERVER</Entry>
<Entry>Confirmation_In</Entry>
<Entry>http&#x3a;&#x2f;&#x2f;sap.com&#x2f;xi&#x2f;IS-U&#x2f;Global2</Entry>
<Entry>&#x7c;BSR_SAP&#x7c;Confirmation_Out&#x7c;&#x7c;</Entry>
<Entry>48</Entry>
<Entry>672</Entry>
<Entry>672</Entry>
<Entry>672</Entry>
<Entry>0</Entry>
<Entry>0</Entry>
<Entry>0.00</Entry>
<Entry>89338</Entry>
<Entry>4288227</Entry>
<Entry> <MeasuringPoints><MP> <Name>MS&#x3a;SOAPHandler.processSOAPtoXMB</Name> <Sequence>1</Sequence> <Max>488</Max> <Avg>296</Avg> <Min>190</Min></MP><MP> <Name>MS&#x3a;module_in&#x3a;CallSapAdapter</Name> <Sequence>2</Sequence> <Max>521</Max> <Avg>211</Avg> <Min>144</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;SI</Name> <Sequence>3</Sequence> <Max>55</Max> <Avg>6</Avg> <Min>5</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;BI</Name> <Sequence>4</Sequence> <Max>195</Max> <Avg>37</Avg> <Min>26</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;VI</Name> <Sequence>5</Sequence> <Max>28</Max> <Avg>5</Avg> <Min>4</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;MS</Name> <Sequence>6</Sequence> <Max>7495</Max> <Avg>2675</Avg> <Min>1340</Min></MP><MP> <Name>MS&#x3a;Message_Put_In_Store</Name> <Sequence>7</Sequence> <Max>28648</Max> <Avg>8891</Avg> <Min>6457</Min></MP><MP> <Name>MS&#x3a;Message_Put_In_Disp_Queue</Name> <Sequence>8</Sequence> <Max>12290</Max> <Avg>6102</Avg> <Min>3558</Min></MP><MP> <Name>MS&#x3a;Message_Put_In_Queue</Name> <Sequence>9</Sequence> <Max>191</Max> <Avg>46</Avg> <Min>21</Min></MP><MP> <Name>MS&#x3a;Message_Wait_In_Queue</Name> <Sequence>10</Sequence> <Max>401</Max> <Avg>229</Avg> <Min>153</Min></MP><MP> <Name>MS&#x3a;Message_Wait_In_Disp_Queue</Name> <Sequence>11</Sequence> <Max>18855</Max> <Avg>5289</Avg> <Min>8</Min></MP><MP> <Name>MS&#x3a;Message_Update_Status</Name> <Sequence>12</Sequence> <Max>25237</Max> <Avg>9398</Avg> <Min>5056</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;AM</Name> <Sequence>13</Sequence> <Max>390</Max> <Avg>183</Avg> <Min>124</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;SO</Name> <Sequence>14</Sequence> <Max>102</Max> <Avg>17</Avg> <Min>16</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;VO</Name> <Sequence>15</Sequence> <Max>155</Max> <Avg>22</Avg> <Min>17</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;AT</Name> <Sequence>16</Sequence> <Max>1813</Max> <Avg>332</Avg> <Min>205</Min></MP><MP> <Name>MS&#x3a;module_out&#x3a;sap.com&#x2f;com.sap.aii.af.soapadapter&#x2f;XISOAPAdapterBean</Name> <Sequence>17</Sequence> <Max>91602</Max> <Avg>55588</Avg> <Min>46038</Min></MP> </MeasuringPoints></Entry>
</Row>
</DataRows>
</Data>

</PerformanceDataQueryResults>

The XML data is complex and contains nested tables for the “MeasuringPoint” elements. This is not something that is possible to extract using the Excel data import GUI alone. You will need to use my code 😉
In the code there are two points that do the required pre-processing to transpose, fillUp and then remove some data parts, returning it in the required format so that you can report on it with all the “MeasuringPoints” if you need them.
Could the above be done in another tool? Probably. But everyone has Excel.

Let’s put my Power Query code into the Excel query editor:

let
    // Uncomment to use a URL source,
    // Source = Xml.Tables(Web.Contents("https://sapts1app1:50001/mdt/performancedataqueryservlet?component=af.ts1.sapts1db01&begin=2021-01-10T00:00:00.000Z&end=2021-01-11T00:00:00.000Z")), 
    Source = Xml.Tables(File.Contents("C:\Users\darryl\Documents\Projects\po-perf-metrics\performancedataqueryservlet-1.xml")),
    Data = Source{1}[Table],
    DataRows = Data{1}[Table],
    Row = DataRows{0}[Table],
    #"Expanded Entry" = Table.TransformColumns(Row, {"Entry", each Table.RemoveLastN(Table.FillUp(Table.Transpose(_), {"Column22"}),1)}),
    #"Expanded Entry1" = Table.ExpandTableColumn(#"Expanded Entry", "Entry", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13", "Column14", "Column15", "Column16", "Column17", "Column18", "Column19", "Column20", "Column21", "Column22"}, {"Entry.Column1", "Entry.Column2", "Entry.Column3", "Entry.Column4", "Entry.Column5", "Entry.Column6", "Entry.Column7", "Entry.Column8", "Entry.Column9", "Entry.Column10", "Entry.Column11", "Entry.Column12", "Entry.Column13", "Entry.Column14", "Entry.Column15", "Entry.Column16", "Entry.Column17", "Entry.Column18", "Entry.Column19", "Entry.Column20", "Entry.Column21", "Entry.Column22"}),
    #"Renamed Columns" = Table.RenameColumns(#"Expanded Entry1",{{"Entry.Column1", "INBOUND_CHANNEL"}, {"Entry.Column2", "OUTBOUND_CHANNEL"}, {"Entry.Column3", "DIRECTION"}, {"Entry.Column4", "DELIVERY_SEMANTICS"}, {"Entry.Column5", "SERVER_NODE"}, {"Entry.Column6", "FROM_PARTY_NAME"}, {"Entry.Column7", "FROM_SERVICE_NAME"}, {"Entry.Column8", "TO_PARTY_NAME"}, {"Entry.Column9", "TO_SERVICE_NAME"}, {"Entry.Column10", "ACTION_NAME"}, {"Entry.Column11", "ACTION_TYPE"}, {"Entry.Column12", "SCENARIO_IDENTIFIER"}, {"Entry.Column13", "MESSAGE_COUNTER"}, {"Entry.Column14", "MAX_MESSAGE_SIZE"}, {"Entry.Column15", "MIN_MESSAGE_SIZE"}, {"Entry.Column16", "AVG_MESSAGE_SIZE"}, {"Entry.Column17", "MAX_RETRY_COUNTER"}, {"Entry.Column18", "MIN_RETRY_COUNTER"}, {"Entry.Column19", "AVG_RETRY_COUNTER"}, {"Entry.Column20", "AVG_PROCESSING_TIME"}, {"Entry.Column21", "TOTAL_PROCESSING_TIME"}}),
    #"Expanded MP" = Table.ExpandTableColumn(#"Renamed Columns", "Entry.Column22", {"MP"}, {"Entry.Column22.MP"})
in
    #"Expanded MP"

In the code above, you will notice the “Source=” is using a local file. You can uncomment the “Web” source and comment out the “File” source if you are pulling the data direct via HTTP.

With the Power Query code entered into the editor, check there are no syntax errors and click “Done”:

When querying the data directly over HTTP you will need to edit the credentials at this point.
In the credentials screen, enter the “Basic” username and password to use.
The data will be displayed.
In my sample I see two rows of data:

At the initial top-level, you will see we have the Average Processing Time (in milliseconds) for each interface:

We also have an additional column at the end which contains embedded tables of additional metric data for each specific stage of processing within PI/PO:

By clicking the double arrows in the top of the header for the “Entry.Column22.MP” we can expand the embedded table (should you wish to), and you will see that it presents the following additional columns of data:

When you click “OK” it adds those columns to the main list, but it will create additional rows of data for each of those additional columns that have been expanded:

With the above data expanded, we can really produce some nice graphs.
Here’s an example showing the breakdown of average response time for each of those processing stages.
First I put them into a pivot table and apply an average to the “Avg” column for each of the “Name” column values :

Then I create a pie chart for the data and we can report on which processing stage inside PI/PO is consuming the most time:

By applying additional graphs and filters we could report on individual interfaces’ overall response times, then allow drill-down into the specifics.

Any Potential Issues?

There is one big caveat with the above process of pulling the data from the servlet.
The servlet is extracting data from a memory cache area inside PI/PO.
This cache is an LRU cache, meaning it has a limited size and gets overwritten when it becomes full.
The quantity of data is therefore limited.

It is possible that you could switch on the database persistence (logging) of successful messages in PI/PO, but this has a detrimental impact to message throughput performance and is not recommended by SAP for production systems.

To try and get around the cache limitations, my advice would be to extract the data using the smallest granular frequency that the servlet allows (MINUTE), and save to a disk file which could be accessible from Excel somehow.
Another method could be to use Microsoft Power Automate (previously MS Flow) to pull the file into Sharepoint or other storage.
By extracting the data frequently, you are trying to ensure that you have it stored before the cache is cleared, but also you are building a time-series from which you could potentially include into your reporting tool.
A time-series would allow you to scroll through data in windows of at least 15 mins in size. Not bad.

Summary

We identified the important areas of data collection in PI/PO (and Netweaver in general) to allow response times to be seen.
We also noted that HTTP response codes such as 503 should be obtained from the first outermost point of the Netweaver stack (if possible).

We saw an example of using the “PerformanceDataQueryServlet” to pull data from the PI/PO memory cache and transformed it using Power Query to allow detailed reporting on the response times.
I created a demonstration graph from a pivot table using my sample data, which showed a possible drill-down in the response time of individual processing stages within SAP PI/PO.

Hopefully I have given you some ideas on for how you can solve your PI/PO performance reporting requirement.