This blog contains experience gained over the years of implementing (and de-implementing) large scale IT applications/software.

Recreating SAP ASE Database I/O Workload using Fio on Azure

After you have deployed SAP databases onto Azure hosted virtual machines, you may find that sometimes you don’t get the performance you were expecting.

 

How can this be? It’s guaranteed isn’t it?
Well, the answer is, as with everything, sometimes it just doesn’t work that way and there are a large number of factors involved.
Even the Microsoft consultants I’ve spoken with have a check point for customers to confirm at the VM level, that they are seeing the IOPS that they are expecting to see.
Especially when deploying high performance applications such as SAP HANA in Azure.
I can’t comment on the reasons why performance may not be as expected, although I do have my own theories.

Let’s look at how we can simply simulate an SAP ASE 16.0 SP03 database I/O operation, so that we can run a reasonably representative and repetitive test, without the need for ASE to even be installed.
Remember, your specific workload could be different due to the design of your database, type and size of transactions and other factors.
What I’m really trying to show here, is how you can use an approximation to provide a simple test that is repetitive and doesn’t need ASE installing.

Microsoft have their own page dedicated to running I/O tests in Azure, and they document the use of the Fio tool for this process.
Read further detail about Fio here: https://docs.microsoft.com/en-gb/azure/virtual-machines/linux/disks-benchmarks

Since you may need to show your I/O results to your local Microsoft representative, I would recommend you use the tool that Microsoft are familiar with, and not some other tool. This should help speed up any fault resolution process.

NOTE: The IOPS will not hit the maximum achievable, because in our test, the page/block size is too high for this. Microsoft’s quoted Azure disk values are achievable only with random read, 8KB page sizes, multiple threads/jobs and a queue depth of 256 (see here: https://docs.microsoft.com/en-gb/azure/virtual-machines/linux/disks-benchmarks).

In SAP ASE 16.0 SP03 (this is the version I had to hand) on a SUSE Linux 12.3 server, imagine we run a SQL operation like “SELECT * FROM MYTABLE WHERE COL2=’X'” which in our example causes an execution path that performs a table scan of the table MYTABLE.
The table scan results in an asynchronous sequential read of the single database data file (data device) on the VM disk which is an LVM logical volume striped over 3 physical disks that make up the one volume group.

We are going to assume that you have saptune installed and configured correctly for SAP ASE, so we will not touch on the Linux configuration.
One thing to note, is that our assumption includes that the Linux file system hosting the database devices is configured to permit direct I/O (avoiding the Linux filesystem cache). This helps with the test configuration setup.

SAP ASE will try and optimise our SQL operation if ASE has been configured correctly, and use a read-ahead algorithm with large I/O pages up-to 128KB. But even with the right ASE configuration, the use of 128KB pages is is not always possible, for example if the table is in some ways fragmented.
As part of our testing we will assume that 128KB pages are not being used. We will instead use 16KB, which is the smallest page size in ASE (worst case scenario).
We will also assume that our SQL statement results in exactly 1GB of data to be read from the disk each time.
This is highly unlikely in a tuned ASE instance, due to the database datacache. However, we will assume this instance is not tuned and under slight load, causing the datacache to have re-used the memory pages between tests.

If we look at the help page for the Fio tool, it’s a fairly hefty read.
Let’s start by translating some of the notations used to something we can appreciate with regards to our test scenario:

Fio Config Item            Our Test Values/Setup
I/O type                    = sequential read
Blocksize                 = 16KB
I/O size                    = 1024m (amount of data)
I/O engine               = asynch I/O – direct (unbuffered)
I/O depth                 = 2048 (disk queue depth)
Target file/device    = /sybase/AS1/sapdata/AS1_data_001.dat
Threads/processes/jobs = 1

We can see that from the list above, the queue depth is the only thing that we are not sure on.
The actual values can be determined by querying the Linux disk devices but in essence what this is doing is asking for a value that represents how much I/O can be queued for a specific disk device.
In checking my setup, I can see that I have 2048 defined on SLES 12 SP3.
More information on queue depth in Azure can be found here: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/premium-storage-performance#queue-depth

On SLES you can check the queue depth using the lsscsi command with the Long, Long, Long format (-lll):

lsscsi -lll

 

[5:0:0:4] disk Msft Virtual Disk 1.0 /dev/sdd
device_blocked=0
iocounterbits=32
iodone_cnt=0x2053eea
ioerr_cnt=0x0
iorequest_cnt=0x2053eea
queue_depth=2048
queue_type=simple
scsi_level=6
state=running
timeout=300
type=0

An alternative way to check is to output the content of the /proc/scsi/sg/devices file and look at the values in the 7th column:

cat /proc/scsi/sg/devices

 

2 0 0 0 0 1 2048 1 1
3 0 1 0 0 1 2048 0 1
5 0 0 0 0 1 2048 0 1
5 0 0 4 0 1 2048 0 1
5 0 0 2 0 1 2048 0 1
5 0 0 1 0 1 2048 0 1
5 0 0 3 0 1 2048 0 1

For the target file (source file in our read test case), we can either use an existing data device file (if ASE is installed and database exists), or we could create a new data file containing zeros, of 1GB in size.

Using “dd” you can quickly create a 1GB file full of zeros:

dd if=/dev/zero of=/sybase/AS1/sapdata/AS1_data_001.dat bs=1024 count=1048576

 

1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.4592 s, 166 MB/s

We will be using only 1 job/thread in Fio to perform the I/O test.
Generally in ASE 16.0 SP03, the number of “disk tasks” is configured using “sp_configure” and visible in the configuration file.
The configured value is usually 1 in a default installation and vary rarely needs adjusting.

See here: https://help.sap.com/viewer/379424e5820941d0b14683dd3a992d5c/16.0.3.5/en-US/a778c8d8bc2b10149f11a28571f24818.html

Once we’re happy with the above settings, we just need to apply them to the Fio command line as follows:

fio –name=global –readonly –rw=read –direct=1 –bs=16k –size=1024m –iodepth=2048 –filename=/sybase/AS1/sapdata/AS1_data_001.dat –numjobs=1 –name=job1

You will see the output of Fio on the screen as it performs the I/O work.
In testing, the amount of clock time that Fio takes to perform the work is reflective of the performance of the I/O subsystem.
In extremely fast cases, you will need to look at the statistics that have been output to the screen.

The Microsoft documentation and examples show running very lengthy operations on Fio, to ensure that the disk caches are populated properly.
In my experience, I’ve never had the liberty to explain to the customer that they just need to do the same operation for 30 minutes, over and over and it will be much better. I prefer to run this test cold and see what I get as a possible worst-case.

job1: (g=0): rw=read, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=2048
fio-3.10
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=109MiB/s][r=6950 IOPS][eta 00m:00s]
job1: (groupid=0, jobs=1): err= 0: pid=87654: Tue Jan 14 06:36:01 2020
read: IOPS=6524, BW=102MiB/s (107MB/s)(1024MiB/10044msec)
clat (usec): min=49, max=12223, avg=148.22, stdev=228.29
lat (usec): min=49, max=12223, avg=148.81, stdev=228.39
clat percentiles (usec):
| 1.00th=[ 61], 5.00th=[ 67], 10.00th=[ 70], 20.00th=[ 75],
| 30.00th=[ 81], 40.00th=[ 88], 50.00th=[ 96], 60.00th=[ 108],
| 70.00th=[ 125], 80.00th=[ 159], 90.00th=[ 322], 95.00th=[ 412],
| 99.00th=[ 644], 99.50th=[ 848], 99.90th=[ 3097], 99.95th=[ 5145],
| 99.99th=[ 7963]
bw ( KiB/s): min=64576, max=131712, per=99.98%, avg=104379.00, stdev=21363.19, samples=20
iops : min= 4036, max= 8232, avg=6523.65, stdev=1335.24, samples=20
lat (usec) : 50=0.01%, 100=54.55%, 250=32.72%, 500=10.48%, 750=1.59%
lat (usec) : 1000=0.31%
lat (msec) : 2=0.20%, 4=0.07%, 10=0.07%, 20=0.01%
cpu : usr=6.25%, sys=20.35%, ctx=65541, majf=0, minf=13
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=65536,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=2048

 

Run status group 0 (all jobs):
READ: bw=102MiB/s (107MB/s), 102MiB/s-102MiB/s (107MB/s-107MB/s), io=1024MiB (1074MB), run=10044-10044msec

Disk stats (read/write):
dm-8: ios=64233/2, merge=0/0, ticks=7416/8, in_queue=7436, util=74.54%, aggrios=21845/0, aggrmerge=0/0, aggrticks=2580/2, aggrin_queue=2581, aggrutil=25.78%
sdg: ios=21844/0, merge=0/0, ticks=2616/0, in_queue=2616, util=25.78%
sdh: ios=21844/1, merge=0/0, ticks=2600/4, in_queue=2600, util=25.63%
sdi: ios=21848/1, merge=0/0, ticks=2524/4, in_queue=2528, util=24.92%

The lines of significance to you, will be:

– Line: IOPS.

Shows the min, max and average IOPS that were obtained during the execution. This should roughly correspond to the IOPS expected for the type of Azure disk on which your source data file is located. Remember that if you have striped file system with RAID under a logical volume manager, then you should expect to see more IOPS because you have more disks.

NOTE: The IOPS will not hit the maximum achievable, because our page/block size is too high for this. The Azure disk values are achievable only with random read, 8KB page sizes, multiple threads/jobs and a queue depth of 256 (https://docs.microsoft.com/en-gb/azure/virtual-machines/linux/disks-benchmarks).

– Lines: “lat (usec)” and “lat (msec)”.

These are the proportions of latency in micro and milliseconds respectively.
If you have high percentages in the millisecond ranges, then you may have an issue. You would not expect this for the type of disks you would want to be running an SAP ASE database on.

In my example above, I am using 3x P40 Premium Storage SSD disks.
You can tell it is a striped logical volume setup, because the very last 3 lines of output shows my 3 Linux disk device names (sdg, sdh and sdi) which sit under my volume group.

You can use the useful links here to determine what you should be seeing on your setup:

NOTE: If you are running SAP on the ASE database, then you will more than likely be using Premium Storage (it’s the only option supported by SAP) and it will be Azure Managed (not un-managed).

Let’s look at the same Fio output using a 128KB page size (like ASE would if it was using large I/O).
We use the same command line but just change the “-bs” parameter to 128KB:

fio –name=global –readonly –rw=read –direct=1 –bs=128k –size=1024m –iodepth=2048 –filename=/sybase/AS1/sapdata/AS1_data_001.dat –numjobs=1 –name=job1

 

job1: (g=0): rw=read, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=psync, iodepth=2048
fio-3.10
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=128MiB/s][r=1021 IOPS][eta 00m:00s]
job1: (groupid=0, jobs=1): err= 0: pid=93539: Tue Jan 14 06:54:48 2020
read: IOPS=1025, BW=128MiB/s (134MB/s)(1024MiB/7987msec)
clat (usec): min=90, max=46843, avg=971.48, stdev=5784.85
lat (usec): min=91, max=46844, avg=972.04, stdev=5784.84
clat percentiles (usec):
| 1.00th=[ 101], 5.00th=[ 109], 10.00th=[ 113], 20.00th=[ 119],
| 30.00th=[ 124], 40.00th=[ 130], 50.00th=[ 137], 60.00th=[ 145],
| 70.00th=[ 157], 80.00th=[ 176], 90.00th=[ 210], 95.00th=[ 273],
| 99.00th=[42206], 99.50th=[42730], 99.90th=[43254], 99.95th=[43254],
| 99.99th=[46924]
bw ( KiB/s): min=130299, max=143616, per=100.00%, avg=131413.00, stdev=3376.53, samples=15
iops : min= 1017, max= 1122, avg=1026.60, stdev=26.40, samples=15
lat (usec) : 100=0.87%, 250=93.13%, 500=3.26%, 750=0.43%, 1000=0.13%
lat (msec) : 2=0.18%, 4=0.01%, 10=0.04%, 50=1.95%
cpu : usr=0.55%, sys=4.12%, ctx=8194, majf=0, minf=41
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=8192,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=2048

Run status group 0 (all jobs):
READ: bw=128MiB/s (134MB/s), 128MiB/s-128MiB/s (134MB/s-134MB/s), io=1024MiB (1074MB), run=7987-7987msec

Disk stats (read/write):
dm-8: ios=8059/0, merge=0/0, ticks=7604/0, in_queue=7640, util=95.82%, aggrios=5461/0, aggrmerge=0/0, aggrticks=5114/0, aggrin_queue=5114, aggrutil=91.44%
sdg: ios=5461/0, merge=0/0, ticks=564/0, in_queue=564, util=6.96%
sdh: ios=5461/0, merge=0/0, ticks=7376/0, in_queue=7376, util=91.08%
sdi: ios=5462/0, merge=0/0, ticks=7404/0, in_queue=7404, util=91.44%

You can see that we actually got a lower IOPS value, but we returned all the data quicker and got a higher throughput.
This is due to the laws of how IOPS and throughput interact. A higher page/block size means we can potentially read more data in each I/O request.

Some of the performance randomness now becomes apparent, with the inconsistency of the “util” for each disk device. However, there is a note on the Fio webpage about how this metric (util) is not necessarily reliable.

You should note that, although we are doing a simulated direct I/O (unbuffered) operation at the Linux level, outside of Linux at the Azure level, there could be caching (data disk caching, which is actually cached on the underlying Azure physical host).

You can check your current setup directly in Azure or at the Linux level, by reading through my previous post on how to do this easily.

https://www.it-implementor.co.uk/2019/12/17/listing-azure-vm-datadisks-and-cache-settings-using-azure-portal-jmespath-bash/

Now for the final test.
Can we get the IOPS that we should be getting for our current setup and disks?

Following the Microsoft documentation to create the fioread.ini and execute (note it needs 120GB of disk space – 4 reader jobs x 30GB):

cat <<EOF > /tmp/fioread.ini
[global]
size=30g
direct=1
iodepth=256
ioengine=libaio
bs=8k

 

[reader1]
rw=randread
directory=/sybase/AS1/sapdata/

[reader2]
rw=randread
directory=/sybase/AS1/sapdata/

[reader3]
rw=randread
directory=/sybase/AS1/sapdata/

[reader4]
rw=randread
directory=/sybase/AS1/sapdata/
EOF

fio –runtime 30 /tmp/fioread.ini
reader1: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
reader2: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
reader3: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
reader4: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
fio-3.10
Starting 4 processes
reader1: Laying out IO file (1 file / 30720MiB)
reader2: Laying out IO file (1 file / 30720MiB)
reader3: Laying out IO file (1 file / 30720MiB)
reader4: Laying out IO file (1 file / 30720MiB)
Jobs: 4 (f=4): [r(4)][100.0%][r=128MiB/s][r=16.3k IOPS][eta 00m:00s]
reader1: (groupid=0, jobs=1): err= 0: pid=120284: Tue Jan 14 08:16:38 2020
read: IOPS=4250, BW=33.2MiB/s (34.8MB/s)(998MiB/30067msec)
slat (usec): min=3, max=7518, avg=10.06, stdev=43.39
clat (usec): min=180, max=156683, avg=60208.81, stdev=32909.11
lat (usec): min=196, max=156689, avg=60219.59, stdev=32908.61
clat percentiles (usec):
| 1.00th=[ 1549], 5.00th=[ 3294], 10.00th=[ 4883], 20.00th=[ 45351],
| 30.00th=[ 47973], 40.00th=[ 49021], 50.00th=[ 51643], 60.00th=[ 54789],
| 70.00th=[ 94897], 80.00th=[ 98042], 90.00th=[100140], 95.00th=[101188],
| 99.00th=[143655], 99.50th=[145753], 99.90th=[149947], 99.95th=[149947],
| 99.99th=[149947]
bw ( KiB/s): min=25168, max=46800, per=26.07%, avg=34003.88, stdev=4398.09, samples=60
iops : min= 3146, max= 5850, avg=4250.45, stdev=549.78, samples=60
lat (usec) : 250=0.01%, 500=0.02%, 750=0.12%, 1000=0.28%
lat (msec) : 2=1.35%, 4=5.69%, 10=5.72%, 20=1.15%, 50=30.21%
lat (msec) : 100=45.60%, 250=9.86%
cpu : usr=1.29%, sys=5.58%, ctx=6247, majf=0, minf=523
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=127794,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
reader2: (groupid=0, jobs=1): err= 0: pid=120285: Tue Jan 14 08:16:38 2020
read: IOPS=4183, BW=32.7MiB/s (34.3MB/s)(983MiB/30067msec)
slat (usec): min=3, max=8447, avg= 9.92, stdev=54.73
clat (usec): min=194, max=154937, avg=61163.27, stdev=32365.78
lat (usec): min=217, max=154945, avg=61173.85, stdev=32365.26
clat percentiles (usec):
| 1.00th=[ 1778], 5.00th=[ 3294], 10.00th=[ 5145], 20.00th=[ 46400],
| 30.00th=[ 47973], 40.00th=[ 49546], 50.00th=[ 52167], 60.00th=[ 55313],
| 70.00th=[ 94897], 80.00th=[ 98042], 90.00th=[100140], 95.00th=[101188],
| 99.00th=[111674], 99.50th=[145753], 99.90th=[147850], 99.95th=[149947],
| 99.99th=[149947]
bw ( KiB/s): min=26816, max=43104, per=25.67%, avg=33474.27, stdev=3881.96, samples=60
iops : min= 3352, max= 5388, avg=4184.27, stdev=485.26, samples=60
lat (usec) : 250=0.01%, 500=0.03%, 750=0.08%, 1000=0.15%
lat (msec) : 2=1.02%, 4=6.31%, 10=5.05%, 20=1.12%, 50=27.79%
lat (msec) : 100=49.09%, 250=9.37%
cpu : usr=1.14%, sys=5.53%, ctx=6362, majf=0, minf=522
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=125800,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
reader3: (groupid=0, jobs=1): err= 0: pid=120286: Tue Jan 14 08:16:38 2020
read: IOPS=3919, BW=30.6MiB/s (32.1MB/s)(921MiB/30066msec)
slat (usec): min=3, max=12886, avg= 9.40, stdev=56.68
clat (usec): min=276, max=151726, avg=65256.88, stdev=31578.48
lat (usec): min=283, max=151733, avg=65266.86, stdev=31578.73
clat percentiles (usec):
| 1.00th=[ 1958], 5.00th=[ 3884], 10.00th=[ 10421], 20.00th=[ 47449],
| 30.00th=[ 49021], 40.00th=[ 51119], 50.00th=[ 53740], 60.00th=[ 65274],
| 70.00th=[ 96994], 80.00th=[ 99091], 90.00th=[100140], 95.00th=[101188],
| 99.00th=[139461], 99.50th=[145753], 99.90th=[149947], 99.95th=[149947],
| 99.99th=[149947]
bw ( KiB/s): min=21344, max=42960, per=24.04%, avg=31354.32, stdev=5530.77, samples=60
iops : min= 2668, max= 5370, avg=3919.27, stdev=691.34, samples=60
lat (usec) : 500=0.01%, 750=0.05%, 1000=0.12%
lat (msec) : 2=0.92%, 4=4.15%, 10=4.59%, 20=0.59%, 50=25.92%
lat (msec) : 100=53.48%, 250=10.18%
cpu : usr=0.96%, sys=5.22%, ctx=7986, majf=0, minf=521
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=117853,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
reader4: (groupid=0, jobs=1): err= 0: pid=120287: Tue Jan 14 08:16:38 2020
read: IOPS=3955, BW=30.9MiB/s (32.4MB/s)(928MiB/30020msec)
slat (usec): min=3, max=9635, avg= 9.57, stdev=52.03
clat (usec): min=163, max=151463, avg=64699.59, stdev=32233.21
lat (usec): min=176, max=151468, avg=64709.90, stdev=32232.66
clat percentiles (usec):
| 1.00th=[ 1729], 5.00th=[ 3720], 10.00th=[ 7832], 20.00th=[ 46924],
| 30.00th=[ 48497], 40.00th=[ 51119], 50.00th=[ 53740], 60.00th=[ 87557],
| 70.00th=[ 96994], 80.00th=[ 99091], 90.00th=[100140], 95.00th=[102237],
| 99.00th=[109577], 99.50th=[143655], 99.90th=[147850], 99.95th=[147850],
| 99.99th=[147850]
bw ( KiB/s): min=21488, max=46320, per=24.22%, avg=31592.63, stdev=4760.10, samples=60
iops : min= 2686, max= 5790, avg=3949.05, stdev=595.03, samples=60
lat (usec) : 250=0.02%, 500=0.07%, 750=0.07%, 1000=0.09%
lat (msec) : 2=1.31%, 4=4.04%, 10=5.13%, 20=1.28%, 50=24.76%
lat (msec) : 100=52.89%, 250=10.35%
cpu : usr=1.06%, sys=5.21%, ctx=8226, majf=0, minf=522
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=118743,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
READ: bw=127MiB/s (134MB/s), 30.6MiB/s-33.2MiB/s (32.1MB/s-34.8MB/s), io=3830MiB (4016MB), run=30020-30067msec

Disk stats (read/write):
dm-8: ios=490190/1, merge=0/0, ticks=30440168/64, in_queue=30570784, util=99.79%, aggrios=163396/0, aggrmerge=0/0, aggrticks=10170760/21, aggrin_queue=10172817, aggrutil=99.60%
sdg: ios=162989/1, merge=0/0, ticks=10134108/64, in_queue=10135484, util=99.59%
sdh: ios=163379/0, merge=0/0, ticks=10175316/0, in_queue=10177440, util=99.60%
sdi: ios=163822/0, merge=0/0, ticks=10202856/0, in_queue=10205528, util=99.59%

throughput = [IOPS] * [block size]
example: 3000 IOPS * 8 (8KB) = 24000KB/s (24MB/s)

From our output, we can see how the IOPS and blocksize affect the throughput calculation:
16,300 (IOPS total) * 8 (8KB) = 130400KB/s (127MB/s)

Simple answer, no, we don’t get what we expect for our P40 disks. Further investigation required. 🙁

Complications of using SAP ASE 16.0 in a HADR pair plus DR node Setup

Firstly, we need to clarify that HADR in SAP ASE speak, is the SAP ASE feature-set name for a HA or DR setup consisting of 2 SAP ASE database instances with a defined replication mode.

The pair can be either for HA or DR, but rarely both, due to the problem of latency.
The problem of latency is inverse to the solution of DR. The further away your second datacentre, the better, from a DR perspective.
Conversely, the worse your latency will become, meaning it can only seriously be used for DR, and not for HA.

If you can find a sweet spot between distance (better for DR) and latency (better for HA), then you would have a HADR setup. But this is unlikely.

As of ASE 16 SP03, an additional DR node is supported to be incorporated into a HADR pair of ASE database instances.
This produces a 3 node setup, with 2 nodes forming a pair (designed to be for HA), then a remote 3rd node (designed for DR).
The reason you may consider such a setup is to provide HA between the two nodes, maybe within an existing datacentre, then DR is provided by a remote 3rd node.
Since the two nodes within the HA pair would likely have low latency, they would have one replication mode (e.g synchronous replication) keeping the data better protected, with the replication mode to the third database being asynchronous, for higher latency scenarios, but less protected data.

In the scenarios and descriptions below, we are highlighting the possibility of running a two node HADR pair plus DR node in public cloud using a paired region:

Whilst an SAP application layer is also supported on the 3 node setup, there are complications that should be understood prior to implementation.
These complications will drive up both cost of implementation and also administrative overhead, so you should ensure that you fully understand how the setup will work before embarking on this solution.

Setup Process:

We will briefly describe the process for setting up the 3 nodes.
In this setup we will use the remote, co-located replication server setup, whereby the SAP SRS (replication server) is installed onto the same servers as the ASE database instances.

1, Install primary ASE database instance.

2, Install Data Movement (DM) component into the binary software installation of the primary ASE database instance.

3, Install secondary ASE database instance.

4, Install Data Movement (DM) component into the binary software installation of the secondary ASE database instance.

5, Run the setuphadr utility to configure the replication between primary and secondary.

This step involves the materialisation of the master and <SID> databases. The master database materialisation is automatic, the <SID> database is manual and requires dump & load.

Therefore, if you have a large <SID> database, then materialisation can take a while.

6, Install tertiary ASE database instance.

7, Install Data Movement (DM) component into the binary software installation of the tertiary ASE database instance.

8, Run the setuphadr utility to configure the tertiary ASE instance as a DR node.

This step involves the materialisation of the master and <SID> databases. The master database materialisation is automatic, the <SID> database is manual and requires dump & load.
Therefore, if you have a large <SID> database, then materialisation can take a while.

In the above, you can adjust the replication mode between primary and secondary, depending on your latency.
In Public cloud (Microsoft Azure), we found that the latency between paired regions was perfectly fine for asynchronous replication mode.
This also permitted the RPO to be met, so we actually went asynchronous all the way through.

POINT 1:

Based on the above, we have our first point to make.

When doing the dump & load for the tertiary database, both master and <SID> databases are taken from the primary database, which in most cases will be in a different datacentre, so materialisation of a large <SID> database will take longer than the secondary database materialisation timings.

You will need to develop a process for quickly getting the dump across the network to the tertiary database node (before the transaction log fills up on the primary).

Developing this fast materialisation process is crucial to the operation of the 3 node setup, since you will be doing this step a lot.

Operational Process:

We now have a 3 node setup, with replication happily pushing primary database transactions from primary (they go from the Replication Agent within the primary ASE instance), to the SRS on the secondary ASE node.
The SRS on the secondary instance then pushes the transactions into the secondary ASE instance databases (master & <SID>) and also to the SRS on the tertiary ASE database instance.

While this is working, you can see the usual SRS output by connecting into the SRS DR Agent on the secondary node and issuing the “sap_status path” command.
The usual monitoring functions exist for monitoring the 3 ASE nodes. You can use the DBACockpit (DB02) in a Netweaver ABAP stack, the ASE Fault Manager or manually at the command line.

One of the critical processes with an ASE HADR setup, is the flow of transactions from primary. You will be constantly engaged trying to prevent the backlog of transactions, which could cause primary to halt database commits until transaction log space is freed.
By correctly sizing the whole chain (primary, secondary and tertiary transaction logs) plus sizing the inbound queues of the SRS, you should have little work to do on a daily basis.

POINT 2:

It’s not the daily monitoring that will impact, but the exceptional change scenarios.
As an example, all 3 ASE database instances should have the same database device sizes, transaction log sizes and configuration settings.
Remembering to increase the device, database, transaction log, queue on each of them can be arduous and mistakes can be made.
Putting a solid change process around the database and SRS is very important to avoid primary database outages.
Since all 3 databases are independent, you can’t rely on auto-growby to grow the devices and databases in sync. So you may need to consider manually increasing the device and database sizes.

Failover Process:

During a failover, the team need to be trained in the scenario of recovery of the data to whichever database server node is active/available/healthy.
The exact scenario training could be difficult as it may involve public cloud, in which case it may not be possible to accurately simulate.
For the 3 node SAP ASE HADR + DR node, the failure scenario that you experience could make a big difference to you overall recovery time.

When we mention recovery time, we are not just talking about RPO/RTO for getting production systems working, we are talking about the time to actually recover the service to a protected state.
For example, recovery of the production database to a point where it is once again adequately protected from failure through database replication.

Loss of the primary database in a 3 node setup, means that the secondary node is the choice to become primary.
In this scenario, the secondary SRS is no longer used. Instead the SRS on the DR node would be configured to be the recipient of transactions from the Replication Agent of the secondary ASE.
If done quickly enough, then re-materialisation of the tertiary database can be avoided as both secondary and tertiary should have the same point-in-time.
In practice however, you will find more often than not, that you are just re-materialising the DR node from the secondary.
In some cases, you may decide not to both until the original primary is back in action. The effort is just too much.

Loss of the secondary database in a 3 node setup, means that the primary becomes instantly unprotected!
Both the secondary node and the tertiary node will drift out of sync.
In this scenario, you will more than likely find that you will be pushed for time and need to teardown the replication on the primary database to prevent the primary transaction lo filling.

Loss of the tertiary database in a 3 node setup, means that you no longer have DR protection for your data!
The transaction log on the primary will start to fill because secondary SRS will be unable to commit transactions in the queue to the tertiary database.
In this scenario, you will more than likely find that you will be pushed for time and need to re-materialise the DR database from the primary.
Time will be of the essence, because you will need transaction log space available in the primary database and queue space in the SRS, for the time to perform the re-materialsation.

POINT 3:

Sizing of the production transaction log size is crucial.
The same size is needed on the secondary and tertiary databases (to allow materialisation (dump & load) to work.
The SRS queue size also needs to be a hefty size (bigger than the transaction log size) to accommodate the transactions from the transaction log.
The primary transaction log size is no longer now just about daily database transactional throughput, but is also intertwined with the requirement for the time it takes to dmp & load the DB across the network to the DR node (slowest link in the chain).
Plus, on top of the above sizings, you should accommodate some additional buffer space for added delays, troubleshooting, decision making.

You should understand your dump & load timings intricately to be able to understand your actual time to return production to a protected state. This will help you decide which is the best route to that state.

Maintenance Process:

Patching a two node ASE HADR setup, is fairly simple and doesn’t take too much effort in planning.
Patching a three node setup (HADR + DR node), involves a little more thought due to the complex way you are recommended to patch.
The basics of the process are that you should be patching the inactive portions of the HADR + DR setup.
Therefore, you end up partially patching the ASE binary stack, leaving the currently active primary SRS (on the secondary node) until last.
As well patching the ASE binaries, you will also have to patch the SAP Hostagent on each of the three nodes. Especially since the Hostagent is used to perform the ASE patching process.
Since there is also a SAP instance agent present on each database node, you will also need to patch the SAP Kernel (SAPEXE part only) on each database node.

POINT 4:

Database patching & maintenance effort increases with each node added. Since the secondary and DR nodes have a shared nothing architecture, you patch specific items more than once across the three nodes.

Summary:

The complexity of managing a two node SAP ASE HADR pair plus DR node should not be underestimated.
You can gain the ability to have HA and DR, especially in a public cloud scenario, but you will pay a heavy price in overhead from maintenance and potentially lose time during a real DR due to the complexity.
It really does depend on how rigid you can be at defining your failover processes and most importantly, testing them.

Carefully consider the cost of HA and DR, versus just DR (using a two node HADR setup with the same asynchronous replication mode).
Do you really need HA? Is your latency small enough to permit a small amount of time running across regions (in public cloud)?

HowTo: Shrink SAP ASE Transaction Log

Providing that you’ve understood that with SAP ASE, you cannot shrink the device size, then you may be looking to shrink the size of a transaction log or datafile within a device.
From isql, list out the segments of the transaction log using “sp_helpdb <dbid>” as follows (I’ve cleaned up the output a little):

1> sp_helpdb saptools
2> go

name
         db_size
         owner                dbid
         created
         durability
         lobcomplvl
         inrowlen
         status                       
——————————————————–

saptools
             4628.0 MB
         sapsa                   5
         Jul 20, 2015
         full
                100
             2000
         trunc log on chkpt, ddl in tran, allow nulls by default, abort tran on
         log full, allow wide dol rows, allow incremental dumps,full logging for all

(1 row affected)

device_fragments               size          usage
         created                   free kbytes
—————————— ————- ——————–
         ————————- —————-
saptools_data_001                  2048.0 MB data only
         Jul 20 2015  6:50PM                1626832
saptools_log_001                    204.0 MB log only
         Jul 20 2015  6:50PM       not applicable
saptools_log_001                    308.0 MB log only
         Oct 26 2015 10:16AM       not applicable
saptools_data_001                  2048.0 MB data only
         Nov 10 2015 11:51AM                2088960
saptools_data_001                    20.0 MB data only
         Nov 10 2015 11:53AM                  20400
——————————————————–

log only free kbytes = 522128                       

(return status = 0)

You can see that the saptools_log_001 has been extended in the past as there are multiple lines under “device fragments” for the saptools_log_001 device.
There are two segments, one is 204MB and the other 308MB.
We can remove one of these segments to free up space within the device saptools_log_001 (remember this will not return the space to the operating system, you can’t do that in ASE).
Since this is a transaction log, we will need to ensure that the segment we are removing is no longer used for redo information.
In our case we are happy to simply truncate, but you may wish to actually dump the transaction log to your backup tool or to disk.

1> dump tran saptools with truncate_only
2> go

NOTE: You may need to do the above multiple times before the log segments become free.
Once cleared, we can then tell the saptools database to remove the 308MB segment, by specifying the exact size of the segment we would like to remove.
NOTE: You cannot just choose a segment, you must start with the last segment and work backwards else you will create “holes” in your device.

1> alter database saptools log off saptools_log_001 = 308
2> go

Removing 19712 pages (308.0 MB) from disk ‘saptools_log_001’ in database ‘saptools’.

Processed 571 allocation unit(s) out of 640 units (allocation page 145920). 89%

completed.

Processed 635 allocation unit(s) out of 640 units (allocation page 162304). 99%

completed.

Now let’s check:

1> sp_helpdb saptools
2> go

name
         db_size
         owner                dbid
         created
         durability
         lobcomplvl
         inrowlen
         status                       
——————————————————–

saptools
             4320.0 MB
         sapsa                   5
         Jul 20, 2015
         full
                100
             2000
         trunc log on chkpt, ddl in tran, allow nulls by default, abort tran on
         log full, allow wide dol rows, allow incremental dumps,full logging for all

(1 row affected)

device_fragments               size          usage
         created                   free kbytes
—————————— ————- ——————–
         ————————- —————-
saptools_data_001                  2048.0 MB data only
         Jul 20 2015  6:50PM                1626832
saptools_log_001                    204.0 MB log only
         Jul 20 2015  6:50PM       not applicable
saptools_data_001                  2048.0 MB data only
         Nov 10 2015 11:51AM                2088960
saptools_data_001                    20.0 MB data only
         Nov 10 2015 11:53AM                  20400
——————————————————–

log only free kbytes = 207968, log only unavailable kbytes = 315392

(return status = 0)

Only one segment for the saptools_log_001 device remains.
We have increased the “log only unavailable kbytes” as the space is still used by the device, but is there for us to expand into again if we wish.
NOTE: The saptools_log_001 is a “log only” device.  So we cannot use that new free space for a data segment expansion.  Only for log expansion.

Manual Explain Plan in SAP ASE

When running SAP on top of SAP ASE database, you can use the DBACOCKPIT transaction to help produce an EXPLAIN PLAN (execution plan).
It’s also possible to use the “EXPLAIN” button from an SQL trace (from ST05 or ST12).
However, under certain circumstances it may not be possible (or it just may not function – it’s Java based) and you may want to use command line iSQL to generate the EXPLAIN PLAN output at the database level directly.
Here’s how.
From the SQL trace (ST05 or ST01) in the SAP GUI, you can double click to obtain the SQL text, which will usually be a prepared statement (contain ‘?’ in place of actual data).
It may look like the following (sorry for the large statement):

SELECT
    “MANDT” ,”PERNR” ,”SUBTY” ,”OBJPS” ,”SPRPS” ,”ENDDA” ,”BEGDA” ,”SEQNR” , “AEDTM” ,”UNAME” ,”HISTO” ,”ITXEX” ,”REFEX” ,”ORDEX” ,”ITBLD” ,”PREAS” , “FLAG1″ ,”FLAG2″ ,”FLAG3″ ,”FLAG4″ ,”RESE1″ ,”RESE2″ ,”GRPVL” ,”ANSSA” ,   “NAME2″ ,”STRAS” ,”ORT01″ ,”ORT02″ ,”PSTLZ” ,”LAND1″ ,”TELNR” ,”ENTKM” , “WKWNG” ,”BUSRT” ,”LOCAT” ,”ADR03″ ,”ADR04″ ,”STATE” ,”HSNMR” ,”POSTA” , “BLDNG” ,”FLOOR” ,”STRDS” ,”ENTK2″ ,”COM01″ ,”NUM01″ ,”COM02″ ,”NUM02″ , “COM03″ ,”NUM03″ ,”COM04″ ,”NUM04″ ,”COM05″ ,”NUM05″ ,”COM06″ ,”NUM06” , “INDRL” ,”COUNC” ,”RCTVC” ,”OR2KK” ,”CONKK” ,”OR1KK” ,”RAILW”
  FROM
    “PA0006”
  WHERE
    “MANDT” = ? AND “PERNR” IN ( ? , ? , ? , ? , ? )  /* R3:SAPDBPNP:11498 T:PA0006 M:100 */  /*unc. rd.*/
Variable
A0(CH,3)  = 100
A1(NU,8)  = 00000001
A2(NU,8)  = 00000002
A3(NU,8)  = 00000003
A4(NU,8)  = 00000004
A5(NU,8)  = 00000005

The text after the first “/*” is comment text added by the SAP DBSL layer to indicate the calling module and line number, as well as the client and table against which it is executing.
All you need to do to use this SQL directly at the DB level is to populate the ‘?’ with the actual data which you can see in the variable list at the bottom left.
The variables run in order, left to right.
Therefore, A1 will be the first ‘?’ in the SQL statement.
You will also see from the variable list that the ABAP data type is included.
CH = CHARacter and NU = Number.
All you need to know, is that SAP rarely uses the underlying database data type, ensuring the agnostic nature of the OPEN SQL.
This means you enclose all of the variables in quotes as though they were characters.
Based on the example above, removing the comment area, you should then have the following SQL statement:

SELECT
    “MANDT” ,”PERNR” ,”SUBTY” ,”OBJPS” ,”SPRPS” ,”ENDDA” ,”BEGDA” ,”SEQNR” , “AEDTM” ,”UNAME” ,”HISTO” ,”ITXEX” ,”REFEX” ,”ORDEX” ,”ITBLD” ,”PREAS” , “FLAG1″ ,”FLAG2″ ,”FLAG3″ ,”FLAG4″ ,”RESE1″ ,”RESE2″ ,”GRPVL” ,”ANSSA” , “NAME2″ ,”STRAS” ,”ORT01″ ,”ORT02″ ,”PSTLZ” ,”LAND1″ ,”TELNR” ,”ENTKM” , “WKWNG” ,”BUSRT” ,”LOCAT” ,”ADR03″ ,”ADR04″ ,”STATE” ,”HSNMR” ,”POSTA” , “BLDNG” ,”FLOOR” ,”STRDS” ,”ENTK2″ ,”COM01″ ,”NUM01″ ,”COM02″ ,”NUM02″ , “COM03″ ,”NUM03″ ,”COM04″ ,”NUM04″ ,”COM05″ ,”NUM05″ ,”COM06″ ,”NUM06” , “INDRL” ,”COUNC” ,”RCTVC” ,”OR2KK” ,”CONKK” ,”OR1KK” ,”RAILW”
  FROM
    “PA0006”
  WHERE
    “MANDT” = ‘100’ AND “PERNR” IN (‘00000001’, ‘00000002’ , ‘00000003’ , ‘00000004’ , ‘00000005’ )

Now to run it at the database level.
Log onto the database server as either the syb<sid> or <sid>adm (I’m using Unix/Linux) user (both usually have the isql binary in their path).
Once logged on, run isql and connect into the <SID> database as the SAPSR3 (or SAPSR3DB on Java stacks) user:

isql –USAPSR3 –S<SID> -D<SID> -W999 –X

Now you’re connected, we need to set some session settings so that we get the explain plan output.
NOTE: The “use” is optional as we’ve already specified the DB we want to use with the “-D” parameter at the isql command line.

use <SID>
go

We want to show the resultant execution (EXPLAIN) plan.

set showplan on
go

We would like some additional useful details from the EXPLAIN plan.

set statistics io on
go
set statistics time on
go
set statistics plancost on
go

We don’t want any of our SQL to be cached, just incase we want to change it and the system decides to try and use the most efficient one.

set statement_cache off
go

We make life easier by enabling the use of quotes, since SAP has already put them there for us.

set quoted_identifier on
go

Finally, we paste the SQL itself, followed by the ASE SQL command terminator “go”:

SELECT
    “MANDT” ,”PERNR” ,”SUBTY” ,”OBJPS” ,”SPRPS” ,”ENDDA” ,”BEGDA” ,”SEQNR” , “AEDTM” ,”UNAME” ,”HISTO” ,”ITXEX” ,”REFEX” ,”ORDEX” ,”ITBLD” ,”PREAS” , “FLAG1″ ,”FLAG2″ ,”FLAG3″ ,”FLAG4″ ,”RESE1″ ,”RESE2″ ,”GRPVL” ,”ANSSA” , “NAME2″ ,”STRAS” ,”ORT01″ ,”ORT02″ ,”PSTLZ” ,”LAND1″ ,”TELNR” ,”ENTKM” , “WKWNG” ,”BUSRT” ,”LOCAT” ,”ADR03″ ,”ADR04″ ,”STATE” ,”HSNMR” ,”POSTA” , “BLDNG” ,”FLOOR” ,”STRDS” ,”ENTK2″ ,”COM01″ ,”NUM01″ ,”COM02″ ,”NUM02″ , “COM03″ ,”NUM03″ ,”COM04″ ,”NUM04″ ,”COM05″ ,”NUM05″ ,”COM06″ ,”NUM06” , “INDRL” ,”COUNC” ,”RCTVC” ,”OR2KK” ,”CONKK” ,”OR1KK” ,”RAILW”
  FROM
    “PA0006”
  WHERE
    “MANDT” = ‘100’ AND “PERNR” IN (‘00000001’, ‘00000002’ , ‘00000003’ , ‘00000004’ , ‘00000005’ )

Here’s the sample output plan:

================ Lava Operator Tree ================
                        Emit
                        (VA = 3)
                        r:36 er:39
                        cpu: 0
             /
            NestLoopJoin
            Inner Join
            (VA = 2)
            r:36 er:39
            l:0 el:31
            p:0 ep:25
/                     
OrScan                  IndexScan
Max Rows: 5             PA0006~0
(VA = 0)                (VA = 1)
r:5 er:-1               r:36 er:39
l:0 el:-1               l:35 el:31
p:0 ep:-1               p:1 ep:25
=====================================================

Table: PA0006 scan count 5, logical reads: (regular=35 apf=0 total=35), physical
reads: (regular=1 apf=0 total=1), apf IOs used=0
Total writes for this command: 0
Execution Time 0.
Adaptive Server cpu time: 10 ms.  Adaptive Server elapsed time: 24 ms.

(36 rows affected)

Reading the plan from the lowest “VA” value, we start with an OrScan (breaks down the 5 “IN” list values we passed into the query, into an SQL “OR” statement).
The OrScan is returning 5 rows (r:5) and performed zero logical reads (l:0) and zero physical reads (p:0).
There were no estimated logical or physical reads (el & ep) due to the type of operation.
We were then using an index (IndexScan) of PA0006~0 (a primary key on table PA0006).  Of which we estimated that we would return 39 rows (er:39), based on statistics (I would hope), but we actually returned 36 (r:36).
We performed 35 logical reads and 1 physical read on the index (I believe that logical reads encompass physical reads, just like Oracle).
Which was better than we anticipated with the estimated logical and physical values of 31 and 25 respectively.
Both the OrScan and the IndexScan are accessed from the parent NestLoopJoin(VA = 2).
We return 36 rows (r:36) upto the EMIT for return back to the client (isql in this case, but it would normally be our SAP dialog work process).
The summary at the very bottom of the SQL output shows a nice set of easy to interpret values.
Because our query was broken into an “OR” statement like this:

SELECT column1
  FROM table1
WHERE column1 = value1
      OR   column1 = value2
      OR   column1 = value3
      OR   column1 = value4
      OR   column1  = value5

It means that the index was scanned 5 times against the matching key column (scan count 5).
The sum of the logical reads and physical reads is shown (APF reads – Asynchronous Pre-Fetch, are reads to/from the data cache in a hopeful way).
We then have visibility of the actual SQL execution time plus the required CPU time and the overall elapsed time.
Additional information can be found on the Sybase infocenter site, although I feel it’s lifetime is limited.