This blog contains experience gained over the years of implementing (and de-implementing) large scale IT applications/software.

SAP ASE Error – Process No Longer Found After Startup

This post is about a strange issue I was hitting during the configuration of SAP LaMa 3.0 to start/stop a SAP ABAP 7.52 system (with Kernel 7.53) that was running with a SAP ASE 16.0 database.

During the LaMa start task, the task would fail with an error message: “ASE process no longer found after startup. (fault code: 127)“.

When I logged directly onto the SAP server Linux host, I could see that the database had indeed started up, eventually.
So what was causing the failure?

The Investigation

At first I thought this was related to the Kernel, but having checked the versions of the Kernel components, I found that they were the same as another SAP system that was starting up perfectly fine using the exact same LaMa system.

The next check I did was to turn on tracing on the hostagent itself. This is a simple task of putting the trace value to “3” in the host_profile of the hostagent and restarting it:

service/trace = 3
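If it helps, this is a minimal sketch of making the change and restarting the hostagent, assuming the standard hostagent installation path (run as root):

# add or adjust the trace level in the host_profile
echo 'service/trace = 3' >> /usr/sap/hostctrl/exe/host_profile

# restart the hostagent so the new trace level takes effect
/usr/sap/hostctrl/exe/saphostexec -restart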

The trace output is shown in a number of different trace files in the work directory of the hostagent but the trace file we were interested in is called dev_sapdbctrl.

The developer trace file for the sapdbctrl binary executable is important, because the sapdbctrl binary is executed by SAP hostagent (saphostexec) to perform the database start. If you observe the contents of the sapdbctrl trace output, you will see that it loads the Sybase specific shared library which contains the required code to start/stop the ASE database.

The same sapdbctrl also contains the ability to load the required libraries for other database systems.

As a side note, it is still not known to me how the Sybase shared library comes to exist in the hostagent executable directory. When SAP ASE is patched, this library must also be patched; otherwise, how does the hostagent stay in step with the ASE database that it needs to talk to?
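If you want to see what is delivered, you can list the database-specific libraries in the hostagent executable directory (the db*slib.so naming pattern is my assumption based on the usual hostagent library naming; check your own directory listing):

# list the database-specific shared libraries delivered with the hostagent
ls -l /usr/sap/hostctrl/exe/db*slib.so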

Once tracing was turned on, I shut the SAP ASE instance down again and then used SAP LaMa to initiate the SAP system start once again.
Sure enough, the LaMa start task failed again.

Looking in the trace file dev_sapdbctrl I could see the same error message that I was seeing in SAP LaMa:

Error: Command execution failed. : ASE process no longer found after startup. 
(fault code: 127) Operation ID: 000D3A3862631EEAAEDDA232BE624001
----- Log messages ---- 
Info: saphostcontrol: Executing StartDatabase 
Error: sapdbctrl: ASE process no longer found after startup. 
Error: saphostcontrol: StartDatabase failed (sapdbctrl exit code = 1)

This was great. It confirmed that SAP LaMa was just seeing the symptom of some other issue, since LaMa just calls the hostagent to do the start.

Now that I knew the hostagent was seeing the error, I tried using the hostagent directly to perform the start, using the following:

/usr/sap/hostctrl/exe/saphostctrl -debug -function StartDatabase -dbname <SID> -dbtype syb -dbhost <the-ASE-host>

NOTE: The hostagent “-debug” command line option puts out the same information without the need for the hostagent tracing to be turned on in the host_profile.
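While testing, the companion hostagent functions are also handy (same placeholders as above; the function names below are standard saphostctrl functions, but double-check the options available at your hostagent patch level):

/usr/sap/hostctrl/exe/saphostctrl -function GetDatabaseStatus -dbname <SID> -dbtype syb
/usr/sap/hostctrl/exe/saphostctrl -function StopDatabase -dbname <SID> -dbtype syb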

Once again, the start process failed and the same error message was present in the dev_sapdbctrl trace file.

This was now really strange.
I decided that the next course of action was to start the process of raising the issue with SAP via an incident.
If you suspect that something could take a while to fix, then it’s always best to raise it with SAP early and continue to look at the issue in parallel.

Continuing the Diagnosis

While the SAP incident was in progress, I continued the process of trying to self-diagnose the issue further.
I tried a couple more things such as:

  • Starting and stopping SAP ASE manually using stopdb/startdb commands.
  • Restarting the whole server (this step has a place in every troubleshooting process, eventually).
  • Checking the server patch level.
  • Checking the environment of the Linux user, the shell, the profile files, the O/S limits applied.
  • Checking what happens if McAfee anti-virus was disabled (I’ve seen the ePO blocking processes before).

Eventually exhaustion set in and I decided to give the SAP support processor time to get back to me with some hints.

Some Sleep

I spend a lot of time solving SAP problems. A lot of time.
Something doesn’t work according to the docs, something did work but has stopped working, something has never worked well…
It builds up in your mind and you carry this stuff around in your head.
Subconsciously you think about these problems.

Then, at about 3am when you can’t get back to sleep, you have a revelation.
The hostagent forks the process that starts the database as the syb<sid> Linux user (it uses "su"), switching from the root user (the hostagent runs as root).
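A simple way to see the effect of this user switch for yourself is to time it from the root user, since that is essentially what the hostagent is doing (substitute your own syb<sid> user):

# time how long the switch to the ASE O/S user takes,
# including the user lookup that goes via SSSD/Active Directory
time su - syb<sid> -c "id"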

Linux Domain Users

The revelation I had regarding the forking of the user was just the trigger I needed to make me consider the way the Linux authentication was set up on this specific server with the problem ASE startup.

I remembered at the beginning of the project that I had hit an issue with the SSSD Linux daemon, which is responsible for interfacing between Linux and Microsoft Active Directory. At that time, the issue was causing the hostagent to hang when operations were executed which required a switch to another Linux user.
This previous issue was actually a hostagent issue that was fixed in a later hostagent patch. During that particular investigation, I requested that the Linux team re-configure the SSSD daemon to be more efficient with its Active Directory traversals, when it was looking to see if the desired user account was local to Linux or if it was a domain account.

With this previous issue in mind, I checked the SSSD configuration on the problem server. This is a simple conf file in /etc/sssd.

The Solution

After all the troubleshooting, the raising of the incident, the sleeping, I had finally got to the solution.

After checking the SSSD daemon configuration file /etc/sssd/sssd.conf, I could clearly see that there was one entry missing compared to the other servers that didn’t experience the SAP ASE start error.

The parameter: “subdomain_enumerate = none” was missing.
Looking at the manual page for SSSD it would seem that without this parameter there is additional overhead during any Active Directory traversal.
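For illustration, the parameter belongs in the domain section of /etc/sssd/sssd.conf; the domain name below is just an example and the rest of your domain section will differ:

# /etc/sssd/sssd.conf (illustrative excerpt only)
[domain/example.corp.local]
id_provider = ad
access_provider = ad
subdomain_enumerate = none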

I set the parameter accordingly in the /etc/sssd/sssd.conf file and restarted the SSSD daemon using:

service sssd restart
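On newer systemd-based SLES/RHEL releases, the equivalent restart would be:

systemctl restart sssd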

Then I retried the start of the database using the hostagent command shown previously.
It worked!
I then retried with SAP LaMa and that also now started ASE without error messages.

Root Cause

What seems to have been happening is that the sapdbctrl binary has some sort of internal pre-set timeout; when that timeout is hit, sapdbctrl simply abandons the operation and throws the error that I was seeing. The ASE database process, having already been initiated, carries on and eventually starts, but from the hostagent's point of view the start has failed.
By adding the "subdomain_enumerate = none" parameter, the delay caused by unnecessary calls to Active Directory was massively reduced and subsequent start activities were successful.

Azure Disk Cache Settings for an SAP Database on Linux

One of your go-live tasks, once you have built a VM in Azure, should be to ensure that the Azure disk cache settings on the Linux VM data disks are set correctly, in accordance with the Microsoft recommended settings.
In this post I explain the disk cache options and how they apply to SAP, and especially to SAP databases such as SAP ASE and SAP HANA, to ensure you get optimum performance.

What Are the Azure Disk Cache Settings?

In Microsoft Azure you can configure different disk cache settings on data disks that are attached to a VM.
NOTE: You do not need to consider changing the O/S root disk cache settings, as by default they are applied as per the Azure recommendations.

Only specific VMs and specific disks (Standard or Premium Storage) have the ability to use caching.
If you use Azure Standard storage, the cache is provided by local disks on the physical server hosting your Linux VM.
If you use Azure Premium storage, the cache is provided by a combination of RAM and local SSD on the physical server hosting your Linux VM.

There are 3 different Azure disk cache settings:

  • None
  • ReadOnly (or “read-only”)
  • ReadWrite (or “read/write”)

The cache settings can influence the performance and also the consistency of the data written to the Azure storage service where your data disks are stored.

Cache Setting: None

By specifying “None” as the cache setting, no caching is used and a write operation at the VM O/S level is confirmed as completed once the data is written to the storage service.
All read operations for data not already in the VM O/S file system cache, will be read from the storage service.

Cache Setting: ReadOnly

By specifying “ReadOnly” as the cache setting, a write operation at the VM O/S level is confirmed as completed once the data is written to the storage service.
All read operations for data not already in the VM O/S file system cache, will be read from the read cache on the underlying physical machine, before being read from the storage service.

Cache Setting: ReadWrite

By specifying “ReadWrite” as the cache setting, a write operation at the VM O/S level is confirmed as completed once the data is written to the cache on the underlying physical machine.
All read operations for data not already in the VM O/S file system cache, will be read from the read cache on the underlying physical machine, before being read from the storage service.

Where Do We Configure the Disk Cache Settings?

The disk cache settings are configured in Azure against the VM (in the Disks settings), since the disk cache is both physical host and VM series dependent. It is *not* configured against the disk resource itself, as explained in my previous blog post: Listing Azure VM DataDisks and Cache Settings Using Azure Portal JMESPATH & Bash
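If you prefer the Azure CLI to the portal, a hedged sketch of changing the cache setting on the first data disk (array index 0) of a VM is the generic property update below; the resource group and VM names are placeholders, and be prepared for a brief I/O interruption, as changing the cache setting re-attaches the disk:

az vm update --resource-group my-rg --name my-sap-vm \
   --set storageProfile.dataDisks[0].caching=ReadOnly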

Any Recommendations for Disk Cache Settings?

There are specific recommendations for Azure disk cache settings, especially when running SAP and especially when running databases like SAP ASE or SAP HANA.

In general, the rules are:

Disk Usage                   Azure Disk Cache Setting
Root O/S disk (/)            ReadWrite – ALWAYS!
HANA Shared                  ReadOnly
ASE Home (/sybase/<SID>)     ReadOnly
Database Data                HANA=None, ASE=ReadOnly
Database Log                 None

The above settings for SAP ASE have been obtained from SAP note 2367194 (the SQL Server recommendations apply equally to ASE) and from the general deployment guide here: https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/sap/dbms_guide_general
Write caching on the ASE home is optional: choosing ReadOnly would help protect the ASE config file in one very specific failure scenario. It is envisaged that when using ASE 16.0 with SRS/HADR you would have a separate data disk for the Replication Server data (I'll talk about this in another post).

The above settings for HANA have been taken from the updated guide here: https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/sap/hana-vm-operations-storage which is designed to meet the KPIs mentioned in SAP note 2762990.

The reason for not using a write cache everywhere is that an issue at the physical host level, affecting the cache, could cause the application (e.g. a database) to think it has committed data when it has not actually been written to disk. This is not good for databases, especially if the issue affects the transaction/redo log area: data loss could occur.

It's worth noting that this cache "issue" has always been true of every caching technology on which databases run. Storage vendors try to mitigate it by putting batteries into their appliances, but since the write cache in Azure is at the physical host level, there is simply no guarantee that when the VM O/S thinks a write operation has committed to disk, it has actually been written to disk.

How About Write Accelerator?

There are specific Azure VM series (currently the M-series) that support something known as "Write Accelerator".
This is an extra VM level setting for Premium Storage disks attached to M-series VMs.

Enabling the Write Accelerator setting is a requirement from Microsoft for production SAP HANA transaction log disks on M-series VMs. This setting enables the Azure VM to meet the SAP HANA key performance indicators in note 2762990. Azure Write Accelerator is designed to provide lower latency write times on Premium Storage.

You should ensure that the Write Accelerator setting is enabled where appropriate, for your HANA database transaction log disks. You can check if it is enabled following my previous blog post: Listing Azure VM DataDisks and Cache Settings Using Azure Portal JMESPATH & Bash
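As a hedged example, Write Accelerator can be enabled per data disk LUN with the Azure CLI (the resource names are placeholders and LUN 1 is assumed to be the HANA log disk here):

az vm update --resource-group my-rg --name my-hana-vm --write-accelerator 1=true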

I’ve tried my best to find more detailed information on how the Write Accelerator feature is actually provided, but unfortunately it seems very elusive. Robert Boban (of Microsoft) commented on a LinkedIn post here: “It is special caching impl. for M-Series VM to fulfill SAP HANA req. for <1ms latency between VM and storage layer.“.

Check the IOPS

Once you have configured your disks and the cache settings, you should ensure that you test the IOPS achieved using the Microsoft recommended process.
You can follow similar steps as my previous post: Recreating SAP ASE Database I/O Workload using Fio on Azure

As mentioned elsewhere in the Microsoft documentation and in SAP notes such as 2367194, you need to ensure that you choose the correct size and series of VM, so that the VM's maximum IOPS aligns with the intended number of data disks and their combined potential IOPS. Otherwise you could hit the VM IOPS limit before you ever touch the disk IOPS maximum.

Enable Accelerated Networking

Since the storage is itself connected to your VM via the network, you should ensure that Accelerated Networking is enabled in your VM's network settings.
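A hedged Azure CLI equivalent is shown below (resource names are placeholders; depending on the VM size and state, you may need to deallocate the VM before the NIC change is accepted):

az vm deallocate --resource-group my-rg --name my-sap-vm
az network nic update --resource-group my-rg --name my-sap-vm-nic --accelerated-networking true
az vm start --resource-group my-rg --name my-sap-vm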

Checking Cache Settings Directly on the VM

As per my previous post Checking Azure Disk Cache Settings on a Linux VM in Shell, you can actually check the Azure disk cache settings on the VM itself. You can do it manually, or write a script (better option for whole landscape validation).

Summary:

I discussed the two types of storage (standard or premium) that offer disk caching, plus where in Azure you need to change the setting.
The table provided a list of cache settings for both SAP ASE and SAP HANA databases and their data disk areas, based on available best-practices.

I mentioned Write Accelerator for HANA transaction log disks and ensuring that you enable Accelerated Networking.
Also provided was a link to my previous post about running a check of IOPS for your data disks, as recommended by Microsoft as part of your go-live checks.

A final mention was made of another post of mine, which gives a great way of checking the disk cache settings across the VMs in the landscape.

Useful Links:

Windows File Cache

https://docs.microsoft.com/en-us/azure/virtual-machines/linux/premium-storage-performance

https://docs.microsoft.com/en-us/azure/virtual-machines/windows/how-to-enable-write-accelerator

https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/sap/hana-vm-operations-storage#production-storage-solution-with-azure-write-accelerator-for-azure-m-series-virtual-machines

https://petri.com/digging-into-azure-vm-disk-performance-features

https://techcommunity.microsoft.com/t5/running-sap-applications-on-the/sap-on-azure-general-update-march-2019/ba-p/377456

https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/sap/dbms_guide_general

https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/sap/hana-vm-operations-storage

SAP Note 2762990 – How to interpret the report of HWCCT File System Test

SAP Note 2367194 – Use of Azure Premium SSD Storage for SAP DBMS Instance

Recreating SAP ASE Database I/O Workload using Fio on Azure

After you have deployed SAP databases onto Azure hosted virtual machines, you may find that sometimes you don’t get the performance you were expecting.

 

How can this be? It’s guaranteed isn’t it?
Well, the answer is, as with everything, sometimes it just doesn’t work that way and there are a large number of factors involved.
Even the Microsoft consultants I've spoken with have a checkpoint for customers to confirm, at the VM level, that they are seeing the IOPS that they are expecting to see.
Especially when deploying high performance applications such as SAP HANA in Azure.
I can’t comment on the reasons why performance may not be as expected, although I do have my own theories.

Let’s look at how we can simply simulate an SAP ASE 16.0 SP03 database I/O operation, so that we can run a reasonably representative and repetitive test, without the need for ASE to even be installed.
Remember, your specific workload could be different due to the design of your database, type and size of transactions and other factors.
What I’m really trying to show here, is how you can use an approximation to provide a simple test that is repetitive and doesn’t need ASE installing.

Microsoft have their own page dedicated to running I/O tests in Azure, and they document the use of the Fio tool for this process.
Read further detail about Fio here: https://docs.microsoft.com/en-gb/azure/virtual-machines/linux/disks-benchmarks

Since you may need to show your I/O results to your local Microsoft representative, I would recommend you use the tool that Microsoft are familiar with, and not some other tool. This should help speed up any fault resolution process.

NOTE: The IOPS will not hit the maximum achievable, because in our test, the page/block size is too high for this. Microsoft’s quoted Azure disk values are achievable only with random read, 8KB page sizes, multiple threads/jobs and a queue depth of 256 (see here: https://docs.microsoft.com/en-gb/azure/virtual-machines/linux/disks-benchmarks).

In SAP ASE 16.0 SP03 (this is the version I had to hand) on a SUSE Linux 12.3 server, imagine we run a SQL operation like "SELECT * FROM MYTABLE WHERE COL2='X'" which in our example causes an execution path that performs a table scan of the table MYTABLE.
The table scan results in an asynchronous sequential read of the single database data file (data device) on the VM disk which is an LVM logical volume striped over 3 physical disks that make up the one volume group.

We are going to assume that you have saptune installed and configured correctly for SAP ASE, so we will not touch on the Linux configuration.
One thing to note, is that our assumption includes that the Linux file system hosting the database devices is configured to permit direct I/O (avoiding the Linux filesystem cache). This helps with the test configuration setup.

SAP ASE will try to optimise our SQL operation if ASE has been configured correctly, and use a read-ahead algorithm with large I/O pages of up to 128KB. But even with the right ASE configuration, the use of 128KB pages is not always possible, for example if the table is in some way fragmented.
As part of our testing we will assume that 128KB pages are not being used. We will instead use 16KB, which is the smallest page size in ASE (worst case scenario).
We will also assume that our SQL statement results in exactly 1GB of data to be read from the disk each time.
This is highly unlikely in a tuned ASE instance, due to the database datacache. However, we will assume this instance is not tuned and under slight load, causing the datacache to have re-used the memory pages between tests.

If we look at the help page for the Fio tool, it’s a fairly hefty read.
Let’s start by translating some of the notations used to something we can appreciate with regards to our test scenario:

Fio Config Item          Our Test Values/Setup
I/O type                 sequential read
Block size               16KB
I/O size                 1024m (amount of data)
I/O engine               async I/O – direct (unbuffered)
I/O depth                2048 (disk queue depth)
Target file/device       /sybase/AS1/sapdata/AS1_data_001.dat
Threads/processes/jobs   1

We can see from the list above that the queue depth is the only thing we are not yet sure of.
The actual value can be determined by querying the Linux disk devices; in essence it represents how much I/O can be queued for a specific disk device.
In checking my setup, I can see that I have 2048 defined on SLES 12 SP3.
More information on queue depth in Azure can be found here: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/premium-storage-performance#queue-depth

On SLES you can check the queue depth using the lsscsi command with the Long, Long, Long format (-lll):

lsscsi -lll

 

[5:0:0:4] disk Msft Virtual Disk 1.0 /dev/sdd
device_blocked=0
iocounterbits=32
iodone_cnt=0x2053eea
ioerr_cnt=0x0
iorequest_cnt=0x2053eea
queue_depth=2048
queue_type=simple
scsi_level=6
state=running
timeout=300
type=0

An alternative way to check is to output the content of the /proc/scsi/sg/devices file and look at the values in the 7th column:

cat /proc/scsi/sg/devices

 

2 0 0 0 0 1 2048 1 1
3 0 1 0 0 1 2048 0 1
5 0 0 0 0 1 2048 0 1
5 0 0 4 0 1 2048 0 1
5 0 0 2 0 1 2048 0 1
5 0 0 1 0 1 2048 0 1
5 0 0 3 0 1 2048 0 1

For the target file (source file in our read test case), we can either use an existing data device file (if ASE is installed and database exists), or we could create a new data file containing zeros, of 1GB in size.

Using “dd” you can quickly create a 1GB file full of zeros:

dd if=/dev/zero of=/sybase/AS1/sapdata/AS1_data_001.dat bs=1024 count=1048576

 

1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.4592 s, 166 MB/s

We will be using only 1 job/thread in Fio to perform the I/O test.
Generally in ASE 16.0 SP03, the number of “disk tasks” is configured using “sp_configure” and visible in the configuration file.
The configured value is usually 1 in a default installation and very rarely needs adjusting.

See here: https://help.sap.com/viewer/379424e5820941d0b14683dd3a992d5c/16.0.3.5/en-US/a778c8d8bc2b10149f11a28571f24818.html

Once we’re happy with the above settings, we just need to apply them to the Fio command line as follows:

fio --name=global --readonly --rw=read --direct=1 --bs=16k --size=1024m --iodepth=2048 --filename=/sybase/AS1/sapdata/AS1_data_001.dat --numjobs=1 --name=job1

You will see the output of Fio on the screen as it performs the I/O work.
In testing, the amount of clock time that Fio takes to perform the work is reflective of the performance of the I/O subsystem.
In extremely fast cases, you will need to look at the statistics that have been output to the screen.

The Microsoft documentation and examples show running very lengthy operations on Fio, to ensure that the disk caches are populated properly.
In my experience, I've never had the liberty of explaining to a customer that they just need to repeat the same operation for 30 minutes, over and over, and it will be much better. I prefer to run this test cold and see what I get as a possible worst case.

job1: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=2048
fio-3.10
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=109MiB/s][r=6950 IOPS][eta 00m:00s]
job1: (groupid=0, jobs=1): err= 0: pid=87654: Tue Jan 14 06:36:01 2020
read: IOPS=6524, BW=102MiB/s (107MB/s)(1024MiB/10044msec)
clat (usec): min=49, max=12223, avg=148.22, stdev=228.29
lat (usec): min=49, max=12223, avg=148.81, stdev=228.39
clat percentiles (usec):
| 1.00th=[ 61], 5.00th=[ 67], 10.00th=[ 70], 20.00th=[ 75],
| 30.00th=[ 81], 40.00th=[ 88], 50.00th=[ 96], 60.00th=[ 108],
| 70.00th=[ 125], 80.00th=[ 159], 90.00th=[ 322], 95.00th=[ 412],
| 99.00th=[ 644], 99.50th=[ 848], 99.90th=[ 3097], 99.95th=[ 5145],
| 99.99th=[ 7963]
bw ( KiB/s): min=64576, max=131712, per=99.98%, avg=104379.00, stdev=21363.19, samples=20
iops : min= 4036, max= 8232, avg=6523.65, stdev=1335.24, samples=20
lat (usec) : 50=0.01%, 100=54.55%, 250=32.72%, 500=10.48%, 750=1.59%
lat (usec) : 1000=0.31%
lat (msec) : 2=0.20%, 4=0.07%, 10=0.07%, 20=0.01%
cpu : usr=6.25%, sys=20.35%, ctx=65541, majf=0, minf=13
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=65536,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=2048

 

Run status group 0 (all jobs):
READ: bw=102MiB/s (107MB/s), 102MiB/s-102MiB/s (107MB/s-107MB/s), io=1024MiB (1074MB), run=10044-10044msec

Disk stats (read/write):
dm-8: ios=64233/2, merge=0/0, ticks=7416/8, in_queue=7436, util=74.54%, aggrios=21845/0, aggrmerge=0/0, aggrticks=2580/2, aggrin_queue=2581, aggrutil=25.78%
sdg: ios=21844/0, merge=0/0, ticks=2616/0, in_queue=2616, util=25.78%
sdh: ios=21844/1, merge=0/0, ticks=2600/4, in_queue=2600, util=25.63%
sdi: ios=21848/1, merge=0/0, ticks=2524/4, in_queue=2528, util=24.92%

The lines of significance to you, will be:

– Line: IOPS.

Shows the min, max and average IOPS that were obtained during the execution. This should roughly correspond to the IOPS expected for the type of Azure disk on which your source data file is located. Remember that if you have a file system striped across multiple disks under a logical volume manager, then you should expect to see more IOPS because you have more disks.

NOTE: The IOPS will not hit the maximum achievable, because our page/block size is too high for this. The Azure disk values are achievable only with random read, 8KB page sizes, multiple threads/jobs and a queue depth of 256 (https://docs.microsoft.com/en-gb/azure/virtual-machines/linux/disks-benchmarks).

– Lines: “lat (usec)” and “lat (msec)”.

These are the proportions of latency in micro and milliseconds respectively.
If you have high percentages in the millisecond ranges, then you may have an issue. You would not expect this for the type of disks you would want to be running an SAP ASE database on.

In my example above, I am using 3x P40 Premium Storage SSD disks.
You can tell it is a striped logical volume setup because the very last 3 lines of output show my 3 Linux disk device names (sdg, sdh and sdi), which sit under my volume group.

You can use the useful links here to determine what you should be seeing on your setup:

NOTE: If you are running SAP on the ASE database, then you will more than likely be using Premium Storage (it’s the only option supported by SAP) and it will be Azure Managed (not un-managed).

Let’s look at the same Fio output using a 128KB page size (like ASE would if it was using large I/O).
We use the same command line but just change the "--bs" parameter to 128KB:

fio --name=global --readonly --rw=read --direct=1 --bs=128k --size=1024m --iodepth=2048 --filename=/sybase/AS1/sapdata/AS1_data_001.dat --numjobs=1 --name=job1

 

job1: (g=0): rw=read, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=psync, iodepth=2048
fio-3.10
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=128MiB/s][r=1021 IOPS][eta 00m:00s]
job1: (groupid=0, jobs=1): err= 0: pid=93539: Tue Jan 14 06:54:48 2020
read: IOPS=1025, BW=128MiB/s (134MB/s)(1024MiB/7987msec)
clat (usec): min=90, max=46843, avg=971.48, stdev=5784.85
lat (usec): min=91, max=46844, avg=972.04, stdev=5784.84
clat percentiles (usec):
| 1.00th=[ 101], 5.00th=[ 109], 10.00th=[ 113], 20.00th=[ 119],
| 30.00th=[ 124], 40.00th=[ 130], 50.00th=[ 137], 60.00th=[ 145],
| 70.00th=[ 157], 80.00th=[ 176], 90.00th=[ 210], 95.00th=[ 273],
| 99.00th=[42206], 99.50th=[42730], 99.90th=[43254], 99.95th=[43254],
| 99.99th=[46924]
bw ( KiB/s): min=130299, max=143616, per=100.00%, avg=131413.00, stdev=3376.53, samples=15
iops : min= 1017, max= 1122, avg=1026.60, stdev=26.40, samples=15
lat (usec) : 100=0.87%, 250=93.13%, 500=3.26%, 750=0.43%, 1000=0.13%
lat (msec) : 2=0.18%, 4=0.01%, 10=0.04%, 50=1.95%
cpu : usr=0.55%, sys=4.12%, ctx=8194, majf=0, minf=41
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=8192,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=2048

Run status group 0 (all jobs):
READ: bw=128MiB/s (134MB/s), 128MiB/s-128MiB/s (134MB/s-134MB/s), io=1024MiB (1074MB), run=7987-7987msec

Disk stats (read/write):
dm-8: ios=8059/0, merge=0/0, ticks=7604/0, in_queue=7640, util=95.82%, aggrios=5461/0, aggrmerge=0/0, aggrticks=5114/0, aggrin_queue=5114, aggrutil=91.44%
sdg: ios=5461/0, merge=0/0, ticks=564/0, in_queue=564, util=6.96%
sdh: ios=5461/0, merge=0/0, ticks=7376/0, in_queue=7376, util=91.08%
sdi: ios=5462/0, merge=0/0, ticks=7404/0, in_queue=7404, util=91.44%

You can see that we actually got a lower IOPS value, but we returned all the data quicker and got a higher throughput.
This is simply how IOPS and throughput interact: a larger page/block size means we can potentially read more data with each I/O request.

Some of the performance randomness now becomes apparent, with the inconsistency of the “util” for each disk device. However, there is a note on the Fio webpage about how this metric (util) is not necessarily reliable.

You should note that, although we are doing a simulated direct I/O (unbuffered) operation at the Linux level, outside of Linux at the Azure level, there could be caching (data disk caching, which is actually cached on the underlying Azure physical host).

You can check your current setup directly in Azure or at the Linux level, by reading through my previous post on how to do this easily.

https://www.it-implementor.co.uk/2019/12/17/listing-azure-vm-datadisks-and-cache-settings-using-azure-portal-jmespath-bash/

Now for the final test.
Can we get the IOPS that we should be getting for our current setup and disks?

Following the Microsoft documentation to create the fioread.ini and execute (note it needs 120GB of disk space – 4 reader jobs x 30GB):

cat <<EOF > /tmp/fioread.ini
[global]
size=30g
direct=1
iodepth=256
ioengine=libaio
bs=8k

 

[reader1]
rw=randread
directory=/sybase/AS1/sapdata/

[reader2]
rw=randread
directory=/sybase/AS1/sapdata/

[reader3]
rw=randread
directory=/sybase/AS1/sapdata/

[reader4]
rw=randread
directory=/sybase/AS1/sapdata/
EOF

fio --runtime 30 /tmp/fioread.ini
reader1: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
reader2: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
reader3: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
reader4: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
fio-3.10
Starting 4 processes
reader1: Laying out IO file (1 file / 30720MiB)
reader2: Laying out IO file (1 file / 30720MiB)
reader3: Laying out IO file (1 file / 30720MiB)
reader4: Laying out IO file (1 file / 30720MiB)
Jobs: 4 (f=4): [r(4)][100.0%][r=128MiB/s][r=16.3k IOPS][eta 00m:00s]
reader1: (groupid=0, jobs=1): err= 0: pid=120284: Tue Jan 14 08:16:38 2020
read: IOPS=4250, BW=33.2MiB/s (34.8MB/s)(998MiB/30067msec)
slat (usec): min=3, max=7518, avg=10.06, stdev=43.39
clat (usec): min=180, max=156683, avg=60208.81, stdev=32909.11
lat (usec): min=196, max=156689, avg=60219.59, stdev=32908.61
clat percentiles (usec):
| 1.00th=[ 1549], 5.00th=[ 3294], 10.00th=[ 4883], 20.00th=[ 45351],
| 30.00th=[ 47973], 40.00th=[ 49021], 50.00th=[ 51643], 60.00th=[ 54789],
| 70.00th=[ 94897], 80.00th=[ 98042], 90.00th=[100140], 95.00th=[101188],
| 99.00th=[143655], 99.50th=[145753], 99.90th=[149947], 99.95th=[149947],
| 99.99th=[149947]
bw ( KiB/s): min=25168, max=46800, per=26.07%, avg=34003.88, stdev=4398.09, samples=60
iops : min= 3146, max= 5850, avg=4250.45, stdev=549.78, samples=60
lat (usec) : 250=0.01%, 500=0.02%, 750=0.12%, 1000=0.28%
lat (msec) : 2=1.35%, 4=5.69%, 10=5.72%, 20=1.15%, 50=30.21%
lat (msec) : 100=45.60%, 250=9.86%
cpu : usr=1.29%, sys=5.58%, ctx=6247, majf=0, minf=523
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=127794,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
reader2: (groupid=0, jobs=1): err= 0: pid=120285: Tue Jan 14 08:16:38 2020
read: IOPS=4183, BW=32.7MiB/s (34.3MB/s)(983MiB/30067msec)
slat (usec): min=3, max=8447, avg= 9.92, stdev=54.73
clat (usec): min=194, max=154937, avg=61163.27, stdev=32365.78
lat (usec): min=217, max=154945, avg=61173.85, stdev=32365.26
clat percentiles (usec):
| 1.00th=[ 1778], 5.00th=[ 3294], 10.00th=[ 5145], 20.00th=[ 46400],
| 30.00th=[ 47973], 40.00th=[ 49546], 50.00th=[ 52167], 60.00th=[ 55313],
| 70.00th=[ 94897], 80.00th=[ 98042], 90.00th=[100140], 95.00th=[101188],
| 99.00th=[111674], 99.50th=[145753], 99.90th=[147850], 99.95th=[149947],
| 99.99th=[149947]
bw ( KiB/s): min=26816, max=43104, per=25.67%, avg=33474.27, stdev=3881.96, samples=60
iops : min= 3352, max= 5388, avg=4184.27, stdev=485.26, samples=60
lat (usec) : 250=0.01%, 500=0.03%, 750=0.08%, 1000=0.15%
lat (msec) : 2=1.02%, 4=6.31%, 10=5.05%, 20=1.12%, 50=27.79%
lat (msec) : 100=49.09%, 250=9.37%
cpu : usr=1.14%, sys=5.53%, ctx=6362, majf=0, minf=522
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=125800,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
reader3: (groupid=0, jobs=1): err= 0: pid=120286: Tue Jan 14 08:16:38 2020
read: IOPS=3919, BW=30.6MiB/s (32.1MB/s)(921MiB/30066msec)
slat (usec): min=3, max=12886, avg= 9.40, stdev=56.68
clat (usec): min=276, max=151726, avg=65256.88, stdev=31578.48
lat (usec): min=283, max=151733, avg=65266.86, stdev=31578.73
clat percentiles (usec):
| 1.00th=[ 1958], 5.00th=[ 3884], 10.00th=[ 10421], 20.00th=[ 47449],
| 30.00th=[ 49021], 40.00th=[ 51119], 50.00th=[ 53740], 60.00th=[ 65274],
| 70.00th=[ 96994], 80.00th=[ 99091], 90.00th=[100140], 95.00th=[101188],
| 99.00th=[139461], 99.50th=[145753], 99.90th=[149947], 99.95th=[149947],
| 99.99th=[149947]
bw ( KiB/s): min=21344, max=42960, per=24.04%, avg=31354.32, stdev=5530.77, samples=60
iops : min= 2668, max= 5370, avg=3919.27, stdev=691.34, samples=60
lat (usec) : 500=0.01%, 750=0.05%, 1000=0.12%
lat (msec) : 2=0.92%, 4=4.15%, 10=4.59%, 20=0.59%, 50=25.92%
lat (msec) : 100=53.48%, 250=10.18%
cpu : usr=0.96%, sys=5.22%, ctx=7986, majf=0, minf=521
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=117853,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
reader4: (groupid=0, jobs=1): err= 0: pid=120287: Tue Jan 14 08:16:38 2020
read: IOPS=3955, BW=30.9MiB/s (32.4MB/s)(928MiB/30020msec)
slat (usec): min=3, max=9635, avg= 9.57, stdev=52.03
clat (usec): min=163, max=151463, avg=64699.59, stdev=32233.21
lat (usec): min=176, max=151468, avg=64709.90, stdev=32232.66
clat percentiles (usec):
| 1.00th=[ 1729], 5.00th=[ 3720], 10.00th=[ 7832], 20.00th=[ 46924],
| 30.00th=[ 48497], 40.00th=[ 51119], 50.00th=[ 53740], 60.00th=[ 87557],
| 70.00th=[ 96994], 80.00th=[ 99091], 90.00th=[100140], 95.00th=[102237],
| 99.00th=[109577], 99.50th=[143655], 99.90th=[147850], 99.95th=[147850],
| 99.99th=[147850]
bw ( KiB/s): min=21488, max=46320, per=24.22%, avg=31592.63, stdev=4760.10, samples=60
iops : min= 2686, max= 5790, avg=3949.05, stdev=595.03, samples=60
lat (usec) : 250=0.02%, 500=0.07%, 750=0.07%, 1000=0.09%
lat (msec) : 2=1.31%, 4=4.04%, 10=5.13%, 20=1.28%, 50=24.76%
lat (msec) : 100=52.89%, 250=10.35%
cpu : usr=1.06%, sys=5.21%, ctx=8226, majf=0, minf=522
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=118743,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
READ: bw=127MiB/s (134MB/s), 30.6MiB/s-33.2MiB/s (32.1MB/s-34.8MB/s), io=3830MiB (4016MB), run=30020-30067msec

Disk stats (read/write):
dm-8: ios=490190/1, merge=0/0, ticks=30440168/64, in_queue=30570784, util=99.79%, aggrios=163396/0, aggrmerge=0/0, aggrticks=10170760/21, aggrin_queue=10172817, aggrutil=99.60%
sdg: ios=162989/1, merge=0/0, ticks=10134108/64, in_queue=10135484, util=99.59%
sdh: ios=163379/0, merge=0/0, ticks=10175316/0, in_queue=10177440, util=99.60%
sdi: ios=163822/0, merge=0/0, ticks=10202856/0, in_queue=10205528, util=99.59%

throughput = IOPS * block size
example: 3,000 IOPS * 8KB = 24,000KB/s (24MB/s)

From our output, we can see how the IOPS and block size affect the throughput calculation:
16,300 IOPS (total) * 8KB = 130,400KB/s (approx. 127MB/s)

Simple answer, no, we don’t get what we expect for our P40 disks. Further investigation required. 🙁

Complications of using SAP ASE 16.0 in a HADR pair plus DR node Setup

Firstly, we need to clarify that HADR, in SAP ASE speak, is the SAP ASE feature-set name for an HA or DR setup consisting of 2 SAP ASE database instances with a defined replication mode.

The pair can be used either for HA or for DR, but rarely both, due to the problem of latency.
Latency works against distance: the further away your second datacentre, the better from a DR perspective, but the worse your latency becomes, meaning the pair can only seriously be used for DR, and not for HA.

If you can find a sweet spot between distance (better for DR) and latency (better for HA), then you would have a HADR setup. But this is unlikely.

As of ASE 16 SP03, an additional DR node is supported to be incorporated into a HADR pair of ASE database instances.
This produces a 3 node setup, with 2 nodes forming a pair (designed to be for HA), then a remote 3rd node (designed for DR).
The reason you may consider such a setup is to provide HA between the two nodes, maybe within an existing datacentre, then DR is provided by a remote 3rd node.
Since the two nodes within the HA pair would likely have low latency between them, they would use one replication mode (e.g. synchronous replication), keeping the data better protected, with the replication mode to the third database being asynchronous to cope with the higher latency, at the cost of less well protected data.

In the scenarios and descriptions below, we are highlighting the possibility of running a two node HADR pair plus DR node in public cloud using a paired region.

Whilst an SAP application layer is also supported on the 3 node setup, there are complications that should be understood prior to implementation.
These complications will drive up both cost of implementation and also administrative overhead, so you should ensure that you fully understand how the setup will work before embarking on this solution.

Setup Process:

We will briefly describe the process for setting up the 3 nodes.
In this setup we will use the remote, co-located replication server setup, whereby the SAP SRS (replication server) is installed onto the same servers as the ASE database instances.

1. Install the primary ASE database instance.

2. Install the Data Movement (DM) component into the binary software installation of the primary ASE database instance.

3. Install the secondary ASE database instance.

4. Install the Data Movement (DM) component into the binary software installation of the secondary ASE database instance.

5. Run the setuphadr utility to configure the replication between primary and secondary.

This step involves the materialisation of the master and <SID> databases. The master database materialisation is automatic; the <SID> database is manual and requires dump & load.

Therefore, if you have a large <SID> database, then materialisation can take a while.

6. Install the tertiary ASE database instance.

7. Install the Data Movement (DM) component into the binary software installation of the tertiary ASE database instance.

8. Run the setuphadr utility to configure the tertiary ASE instance as a DR node.

This step involves the materialisation of the master and <SID> databases. The master database materialisation is automatic; the <SID> database is manual and requires dump & load.
Therefore, if you have a large <SID> database, then materialisation can take a while.

In the above, you can adjust the replication mode between primary and secondary, depending on your latency.
In public cloud (Microsoft Azure), we found that the latency between paired regions was perfectly fine for asynchronous replication mode.
This also permitted the RPO to be met, so we actually went asynchronous all the way through.

POINT 1:

Based on the above, we have our first point to make.

When doing the dump & load for the tertiary database, both master and <SID> databases are taken from the primary database, which in most cases will be in a different datacentre, so materialisation of a large <SID> database will take longer than the secondary database materialisation timings.

You will need to develop a process for quickly getting the dump across the network to the tertiary database node (before the transaction log fills up on the primary).

Developing this fast materialisation process is crucial to the operation of the 3 node setup, since you will be doing this step a lot.
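As a rough illustration of the manual part of that materialisation process (the SID AS1, the sapsa login, server names, host names and file paths are all example values; the exact materialisation options should come from the SAP HADR documentation and the setuphadr/DR agent output):

# on the primary node: dump the <SID> database with compression
isql -Usapsa -P<sapsa-password> -SAS1 -X <<EOF
dump database AS1 to '/backup/AS1_mat.dmp' with compression = 101
go
EOF

# push the dump to the DR node as quickly as the network allows
scp /backup/AS1_mat.dmp drnode:/backup/

# on the DR node: load the dump as part of the materialisation flow
isql -Usapsa -P<sapsa-password> -SAS1_DR -X <<EOF
load database AS1 from '/backup/AS1_mat.dmp'
go
EOF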

Operational Process:

We now have a 3 node setup, with replication happily pushing primary database transactions (from the Replication Agent within the primary ASE instance) to the SRS on the secondary ASE node.
The SRS on the secondary instance then pushes the transactions into the secondary ASE instance databases (master & <SID>) and also on to the SRS on the tertiary ASE database instance.

While this is working, you can see the usual SRS output by connecting into the SRS DR Agent on the secondary node and issuing the “sap_status path” command.
The usual monitoring functions exist for monitoring the 3 ASE nodes: you can use the DBACockpit (DB02) in a NetWeaver ABAP stack, the ASE Fault Manager, or work manually at the command line.
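For example, checking the replication path status by connecting to the DR agent on the secondary node could look like the sketch below (the DR_admin user is the usual HADR admin login, but the host, port and password are assumptions; use the values from your own setup):

isql -UDR_admin -P<DR_admin-password> -Ssecondaryhost:4909 -X <<EOF
sap_status path
go
EOF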

One of the critical processes with an ASE HADR setup, is the flow of transactions from primary. You will be constantly engaged trying to prevent the backlog of transactions, which could cause primary to halt database commits until transaction log space is freed.
By correctly sizing the whole chain (primary, secondary and tertiary transaction logs) plus sizing the inbound queues of the SRS, you should have little work to do on a daily basis.

POINT 2:

It’s not the daily monitoring that will impact, but the exceptional change scenarios.
As an example, all 3 ASE database instances should have the same database device sizes, transaction log sizes and configuration settings.
Remembering to increase the device, database, transaction log and queue sizes on each of them can be arduous, and mistakes can be made.
Putting a solid change process around the database and SRS is very important to avoid primary database outages.
Since all 3 databases are independent, you can't rely on auto-growby to grow the devices and databases in sync, so you may need to consider increasing the device and database sizes manually.
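A hedged sketch of the kind of manual resize you end up repeating on each of the three nodes (the device, database and size values are examples only):

isql -Usapsa -P<sapsa-password> -SAS1 -X <<EOF
-- grow the underlying device first
disk resize name = 'AS1_data_001', size = '10G'
go
-- then extend the database onto the grown device
alter database AS1 on AS1_data_001 = '10G'
go
EOF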

Failover Process:

During a failover, the team need to be trained in the scenario of recovery of the data to whichever database server node is active/available/healthy.
The exact scenario training could be difficult as it may involve public cloud, in which case it may not be possible to accurately simulate.
For the 3 node SAP ASE HADR + DR node setup, the failure scenario that you experience could make a big difference to your overall recovery time.

When we mention recovery time, we are not just talking about RPO/RTO for getting production systems working, we are talking about the time to actually recover the service to a protected state.
For example, recovery of the production database to a point where it is once again adequately protected from failure through database replication.

Loss of the primary database in a 3 node setup, means that the secondary node is the choice to become primary.
In this scenario, the secondary SRS is no longer used. Instead the SRS on the DR node would be configured to be the recipient of transactions from the Replication Agent of the secondary ASE.
If done quickly enough, then re-materialisation of the tertiary database can be avoided as both secondary and tertiary should have the same point-in-time.
In practice, however, you will find more often than not that you are just re-materialising the DR node from the secondary.
In some cases, you may decide not to bother until the original primary is back in action; the effort is just too much.

Loss of the secondary database in a 3 node setup, means that the primary becomes instantly unprotected!
Both the secondary node and the tertiary node will drift out of sync.
In this scenario, you will more than likely find that you are pushed for time and need to tear down the replication on the primary database to prevent the primary transaction log from filling.

Loss of the tertiary database in a 3 node setup, means that you no longer have DR protection for your data!
The transaction log on the primary will start to fill because secondary SRS will be unable to commit transactions in the queue to the tertiary database.
In this scenario, you will more than likely find that you will be pushed for time and need to re-materialise the DR database from the primary.
Time will be of the essence, because you will need transaction log space available in the primary database, and queue space in the SRS, for the time it takes to perform the re-materialisation.

POINT 3:

Sizing of the production transaction log size is crucial.
The same size is needed on the secondary and tertiary databases (to allow materialisation (dump & load) to work).
The SRS queue size also needs to be a hefty size (bigger than the transaction log size) to accommodate the transactions from the transaction log.
The primary transaction log size is no longer just about daily database transactional throughput; it is also intertwined with the time it takes to dump & load the database across the network to the DR node (the slowest link in the chain).
Plus, on top of the above sizings, you should accommodate some additional buffer space for delays, troubleshooting and decision making.

You should understand your dump & load timings intricately to be able to understand your actual time to return production to a protected state. This will help you decide which is the best route to that state.

Maintenance Process:

Patching a two node ASE HADR setup, is fairly simple and doesn’t take too much effort in planning.
Patching a three node setup (HADR + DR node), involves a little more thought due to the complex way you are recommended to patch.
The basics of the process are that you should be patching the inactive portions of the HADR + DR setup.
Therefore, you end up partially patching the ASE binary stack, leaving the currently active primary SRS (on the secondary node) until last.
As well as patching the ASE binaries, you will also have to patch the SAP Hostagent on each of the three nodes, especially since the Hostagent is used to perform the ASE patching process.
Since there is also a SAP instance agent present on each database node, you will also need to patch the SAP Kernel (SAPEXE part only) on each database node.

POINT 4:

Database patching & maintenance effort increases with each node added. Since the secondary and DR nodes have a shared nothing architecture, you patch specific items more than once across the three nodes.

Summary:

The complexity of managing a two node SAP ASE HADR pair plus DR node should not be underestimated.
You can gain the ability to have HA and DR, especially in a public cloud scenario, but you will pay a heavy price in overhead from maintenance and potentially lose time during a real DR due to the complexity.
It really does depend on how rigid you can be at defining your failover processes and most importantly, testing them.

Carefully consider the cost of HA and DR, versus just DR (using a two node HADR setup with the same asynchronous replication mode).
Do you really need HA? Is your latency small enough to permit a small amount of time running across regions (in public cloud)?