This blog contains experience gained over the years of implementing (and de-implementing) large-scale IT applications/software.

Staying on Top of SAP Kernel Patches

Staying on top of patching & maintenance in a large SAP landscape can be a daunting and exhausting task.
The BASIS team need to understand every nook & cranny of software installed in the landscape, provide a strategy for patching it, and do the patching with minimal effort and business outage.
It’s harder than you can imagine!

There are numerous tools available to help automate a lot of the process; however, there’s one aspect that still needs some grey matter applying!

Analysing the SAP security notes, reading the latest Kernel distribution notes and staying on top of software component developments is not something that is easily automated.
As an example: how do you determine the criticality of a Kernel bug in a feature that you don’t currently use, but plan to use? Or how do you decide which future Kernel version is the most stable for you to move to?

The BASIS team need to review the information, classify it and apply the appropriate actions.

For this reason, one of my daily reading habits is checking the “SAP Kernel: Important News” wiki page.
It contains endless amounts of useful information covering the whole Kernel spectrum, from 721 to Kernels not yet in general use.
You will gain useful information from understanding what Kernels are available, what doesn’t work properly and the future direction of the Kernels in current distribution.

It’s easily read on the commute into the office.

https://wiki.scn.sap.com/wiki/display/SI/SAP+Kernel:+Important+News

SAP ABAP Kernel SNAPSHOTS

If you haven’t already seen it, quite some time ago I wrote a brief blog post on the Flight Recorder for the NW AS Java stack.
Many years later, I’ve still very rarely seen a company use the flight recorder information.
Snapshots (also known as SAP Kernel Snapshots) in the NW AS ABAP stack are much the same thing.

When a serious condition occurs within the ABAP stack, a system message is registered in the system log (SM21) and a snapshot of the current system status is generated.
Example conditions include a work process dying without warning, a lack of resources (background or dialog), a hard shutdown of the SAP system/server, or a full dispatcher queue.

Each of these scenarios is logged under system log message code “Q41”, in category “DP” and “Process No” “000”.
The only difference between the failures is the text following “reason:”, which is passed in by the calling Kernel function.

Q41 in system log

According to SAP note 1786182 “CreateSnapshot: Collecting developer traces using sapcontrol”, the feature was first provided in 2013.
It was provided as an additional set of web service functions on the SAP instance agent (sapstartsrv) and is therefore accessible from outside of the SAP system if need be.

Originally, it appears to have been designed to be operated independently, outside the SAP system; however, SAP note 2640476 “How to analyze Server Snapshot with kernel snapshot analyser” from 2019 indicates that it was integrated into the SAP Kernel in 7.40 (i.e. the Kernel itself can instigate a snapshot).

How can we use SNAPSHOTS?

You can either access the snapshot zip files directly at O/S level, using O/S commands to extract and inspect them, or use the ABAP transaction SNAPSHOTS to see an ALV list of the snapshot files.

At O/S level, the files are stored in /sapmnt/<SID>/global/sapcontrol/snapshots.

Usually, access is prompted by an extraordinary event within the ABAP stack: you see the system log entry, which informs you that a snapshot exists. You may also choose to check SNAPSHOTS regularly as part of your daily checks.

SNAPSHOTS (program: RS_DOWNLOAD_SNAPSHOTS)

As you can see, the ABAP transaction shows the reason for each snapshot, whereas the files in the O/S listing are not so easy to identify.
If you work at O/S level, unzipping a file reveals a file called “description.txt”, which states the reason for the snapshot.
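If you prefer to stay at O/S level, a small shell function can print the reason from every snapshot without fully extracting the archives. This is a minimal sketch, assuming the snapshots are `.zip` files directly under `/sapmnt/<SID>/global/sapcontrol/snapshots` and that each archive contains a `description.txt` somewhere in its tree; the function name `show_snapshot_reasons` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (a sketch, not an SAP tool): print the reason recorded
# in each ABAP snapshot archive.
# Assumptions: snapshots are *.zip files in /sapmnt/<SID>/global/sapcontrol/snapshots
# (override with the second argument) and each zip contains a description.txt.
show_snapshot_reasons() {
  local sid="$1"
  local dir="${2:-/sapmnt/${sid}/global/sapcontrol/snapshots}"
  local zipfile
  for zipfile in "$dir"/*.zip; do
    # If nothing matches, the glob stays literal, so skip it
    [ -e "$zipfile" ] || continue
    echo "== $(basename "$zipfile")"
    # -p writes the matching member to stdout; by default unzip's '*'
    # also matches '/', so description.txt is found at any depth
    unzip -p "$zipfile" '*description.txt'
    echo
  done
}
```

Run it as <sid>adm, for example `show_snapshot_reasons ABC`, where ABC is a placeholder SID.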

From the SNAPSHOTS transaction (program RS_DOWNLOAD_SNAPSHOTS) you can download the snapshot file to your front-end.
There you can unzip the file and expose its contents.

Once you have extracted the snapshot zip file, you will see a directory tree containing a number of XML files.

The names of the XML files are fairly self-explanatory.
ABAPGetWPTable, for example, is the name of the sapcontrol web service function that is used to get the ABAP work process table (the same as transaction SM50).

Opening any of the XML files is a lot easier with Microsoft Excel.
However, the XML is not suitable for Excel without a little manipulation (this is a real pain, but once the data is in Excel you will love it).

Edit the XML file in a text editor and delete the header lines that result from the web service function call, leaving just the raw XML.
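If you have many files to convert, the same edit can be scripted. This is a minimal sketch, assuming that everything above the first line starting with `<` is web service header text and that the raw XML runs from that line to the end of the file; compare the result with a manually edited copy before relying on it. The function name `strip_ws_header` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the web service header lines from a snapshot
# XML file so that Excel will open it.
# Assumption: the raw XML starts at the first line beginning with '<'
# (optionally indented); everything above that line is treated as header.
strip_ws_header() {
  local in="$1" out="$2"
  # -n suppresses default output; the address range prints from the first
  # line that starts with '<' through to the last line of the file
  sed -n '/^[[:space:]]*</,$p' "$in" > "$out"
}
```

For example, `strip_ws_header ABAPGetWPTable.xml ABAPGetWPTable_clean.xml` (file names illustrative), then open the cleaned file in Excel.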

Save the file and then it will happily open in Excel!

As mentioned, this is a snapshot of the work process table at the point when an issue occurred.
Very useful indeed.
You have lots of other XML files to examine.

As an added bonus, further down the directory structure of the snapshot zip file is a complete XML snapshot of all the developer trace files for this app server.

How can we manually create SNAPSHOTS?

You can manually create and administer the snapshots (they will need clearing down) using the SAP instance agent (sapcontrol) web service commands as follows:

sapcontrol -nr <##> -function [snapshot_function]
  CreateSnapshot [<description> [<datcol_param> [<analyse_severity -1..2>
  [<analyse_maxentries> [<analyse_starttime YYYY MM DD HH:MM:SS>
  <analyse_endtime YYYY MM DD HH:MM:SS> [<maxentries>
  [<filename1> … <filenameN>]]]]]]]
  ReadSnapshot <filename> [<local filename>]
  ListSnapshots
  DeleteSnapshots <filename1> [<filename2>… <filenameN>]

The ABAP transaction SNAPSHOTS only allows you to view, download and delete the snapshots. You cannot trigger them from any standard transaction that I can find.
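Since creating a snapshot on demand means going through sapcontrol, a few shell wrappers can save some typing. These are a sketch only: the function names are hypothetical, they must be run as <sid>adm on the instance host, and they pass only the positional arguments shown in the usage text above. The housekeeping helper `snap_old_files` additionally assumes GNU find and that `DeleteSnapshots` accepts the bare file names as they appear in the snapshots directory; check this against `ListSnapshots` output on your own system first.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around the sapcontrol snapshot functions.
# Assumptions: run as <sid>adm on the instance host; $1 is always the
# two-digit instance number; only the positional arguments from the
# sapcontrol usage text are passed.

snap_create() {   # $1 = instance number, $2 = free-text description
  sapcontrol -nr "$1" -function CreateSnapshot "$2"
}

snap_list() {     # $1 = instance number
  sapcontrol -nr "$1" -function ListSnapshots
}

snap_read() {     # $1 = instance number, $2 = snapshot file, $3 = optional local file name
  sapcontrol -nr "$1" -function ReadSnapshot "$2" ${3:+"$3"}
}

snap_delete() {   # $1 = instance number, remaining args = snapshot file names
  local nr="$1"
  shift
  sapcontrol -nr "$nr" -function DeleteSnapshots "$@"
}

# Housekeeping aid: print the names of snapshot zips older than N days.
# $1 = snapshots directory, $2 = age in days (GNU find assumed for -printf)
snap_old_files() {
  find "$1" -maxdepth 1 -type f -name '*.zip' -mtime +"$2" -printf '%f\n'
}
```

For example, `snap_create 00 "manual snapshot"` followed by `snap_list 00`; clearing down could then be `snap_delete 00 $(snap_old_files /sapmnt/ABC/global/sapcontrol/snapshots 30)`, where 00 and ABC are placeholders and the delete should only be run when the list is non-empty.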

SUSE Linux 12 – Kernel 4.4.73 – Boot Hang – BTRFS Issue

I had a VMware guest running SUSE Linux 12 SP3 64-bit (kernel 4.4.73).
One day, after a power outage, the VM failed to boot.
It would arrive at the SUSE Linux “lizard” splash screen and then just hang.

Prior to this error, I had noticed that SUSE 12 creates its root partition inside a logical volume called “/dev/system/root”, which is then formatted as a BTRFS filesystem.

At this point I decided that I must have a corrupt disk block.
I launched the VM with the CDROM attached and pointing at the SUSE 12 installation ISO file.
While the VM starts, you need to press F2 to enter the “BIOS” boot options and make the CDROM bootable before the hard disks.

Once the installation CDROM was booting, I selected “Recovery” from the SUSE menu.
This drops you into a recovery session with access to the BTRFS filesystem check tools.

Following a fair amount of Google action, I discovered I could run a “check” of the BTRFS file system (much like the old fsck on EXT file systems).

Since I already knew the device name for the root file system, things were pretty easy:

# btrfs check /dev/system/root
Checking filesystem on /dev/system/root

found 5274484736 bytes used err is 0

Looks like the command worked, but it is showing no errors.
So I tried to mount the partition:

# mkdir /old_root
# mount -t btrfs /dev/system/root /old_root

At this point the whole VM hung again!
I had to restart the whole process.
So there was definitely an issue with the BTRFS filesystem on the root partition.

Starting the VM again and re-entering the recovery mode of SUSE, I decided to try and mount the partition in recovery mode:

# mkdir /old_root
# mount -t btrfs /dev/system/root /old_root -o ro,recovery

It worked!
No problems. Weird.
So I unmounted it and tried to mount it in read-write mode again:

# umount /old_root
# mount -t btrfs /dev/system/root /old_root

BAM! The VM hung again.

Starting the VM again and re-entering the recovery mode of SUSE, I decided to just run btrfs check with the “--repair” option (although the btrfs documentation warns that this should be a last resort).

# btrfs check --repair /dev/system/root
enabling repair mode
Checking filesystem on /dev/system/root
UUID: a09b7c3c-9d33-4195-af6e-9519fe550694
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don’t match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 5274484736 bytes used err is 0
total csum bytes: 4909484
total tree bytes: 236126208
total fs tree bytes: 215973888
total extent tree bytes: 13647872
btree space waste bytes: 38681887
file data blocks allocated: 5186543616

Maybe the space cache problem that it fixed was the issue.

# mkdir /old_root
# mount -t btrfs /dev/system/root /old_root

Yay!
So, weird problem fixed.
Maybe this is a Kernel-level issue and later Kernels have a patch; I’m not sure. Fixing it is not my primary concern, as I don’t plan on having many power outages, but if it were my production system I might be more concerned and motivated.
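For reference, the sequence that eventually worked can be condensed into a script for the SUSE recovery shell. This is a sketch under stated assumptions, not a tested recovery procedure: the device defaults to `/dev/system/root`, the mount point is `/old_root` as above, the `btrfs check --repair` step only runs when explicitly requested, and `DRY_RUN=1` prints the commands instead of running them. The function names are hypothetical. Note that `recovery` is the mount option name on this 4.4 Kernel; newer Kernels deprecate it in favour of `usebackuproot`.

```shell
#!/usr/bin/env bash
# Hypothetical recap of the BTRFS recovery steps described above.
# Assumptions: run from the SUSE installation media "Recovery" shell;
# DRY_RUN=1 only prints the commands.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

btrfs_recover() {   # $1 = device (default /dev/system/root), $2 = "--repair" to allow repair
  local dev="${1:-/dev/system/root}" mode="${2:-}" mnt="/old_root"
  run mkdir -p "$mnt"
  # Read-only consistency check; makes no changes
  run btrfs check "$dev"
  # Read-only mount that lets the Kernel fall back to backup tree roots,
  # proving the data is reachable before anything is written
  run mount -t btrfs -o ro,recovery "$dev" "$mnt" || return 1
  run umount "$mnt"
  if [ "$mode" = "--repair" ]; then
    # Last resort: writes to the filesystem (here it invalidated the space cache)
    run btrfs check --repair "$dev"
    run mount -t btrfs "$dev" "$mnt"
  fi
}
```

Running `DRY_RUN=1 btrfs_recover` first shows exactly what would be executed; only add `--repair` once the read-only steps have confirmed the data can be reached.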

When SLES for SAP is not SLES for SAP

I recently downloaded and installed “SUSE Linux Enterprise Server for SAP Applications 12 SP3” into a local virtual machine.
It seemed to contain all the SAP-related Linux packages that I expected it to contain.

Notable were the following in my local VM:

# which saptune
/usr/sbin/saptune
# rpm -qa | grep sap
cyrus-sasl-gssapi-32bit-2.1.26-7.1.x86_64
sap-netscape-link-0.1-1.2.noarch
sap-installation-wizard-3.1.81-3.1.x86_64
yast2-sap-scp-1.0.3-11.2.noarch
saptune-1.1.3-1.1.x86_64
saprouter-systemd-0.2-1.1.noarch
cyrus-sasl-gssapi-2.1.26-7.1.x86_64
patterns-sles-sap_server-12-77.8.x86_64
patterns-sles-sap_server-32bit-12-77.8.x86_64
yast2-saptune-1.2-1.5.noarch
sap-locale-32bit-1.0-92.4.x86_64
sapconf-4.1.8-1.18.noarch
sap-locale-1.0-92.4.x86_64
yast2-sap-scp-prodlist-1.0.2-4.2.noarch
# cat /etc/os-release
NAME="SLES"
VERSION="12-SP3"
VERSION_ID="12.3"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP3"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles_sap:12:sp3"
# uname -a
Linux hana01 4.4.73-7-default #1 SMP Fri Jul 21 13:26:40 UTC 2017 (6beeafd) x86_64 x86_64 x86_64 GNU/Linux

All looks good to me.

I then created an Azure-hosted virtual machine using the image “SLES for SAP 12 SP3 (BYOS)”.

The Azure VM seems to be missing a lot of the packages that I would expect to be in place:

# which saptune
which: no saptune in (/sbin:/usr/sbin:/usr/local/sbin:/root/bin:/usr/local/bin:/usr/bin:/bin:/usr/games:/usr/lib/mit/bin)
# rpm -qa | grep sap
patterns-sles-sap_server-12-77.8.x86_64
yast2-sap-scp-prodlist-1.0.2-4.2.noarch
yast2-sap-scp-1.0.3-11.2.noarch
cyrus-sasl-gssapi-2.1.26-7.1.x86_64
sapconf-4.1.10-40.37.1.noarch
# cat /etc/os-release
NAME="SLES"
VERSION="12-SP3"
VERSION_ID="12.3"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP3"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles_sap:12:sp3"
# uname -a
Linux hana01 4.4.82-6.3-default #1 SMP Mon Aug 14 14:14:02 UTC 2017 (4c72484) x86_64 x86_64 x86_64 GNU/Linux

Notice also that both the Kernel release and the sapconf package version are slightly newer on the Azure image.
The most important point is that the Azure image is missing the saptune package.
This is important because saptune is a method presented in numerous SAP notes for automatically applying the recommended O/S settings (that’s right, they don’t all get applied out of the box).
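Given these differences, it is worth checking each new VM rather than trusting the image name. This is a minimal sketch, assuming an RPM-based SLES host and using the package names from the listings above; the function name `check_sap_os` is hypothetical. On a registered system with the SLES for SAP repositories available, a missing saptune should then be installable with `zypper install saptune`.

```shell
#!/usr/bin/env bash
# Hypothetical check: is this SLES for SAP, and are the SAP tuning
# packages (names taken from the listings above) actually installed?
# Returns 1 if any package is missing.
check_sap_os() {
  local osrel="${1:-/etc/os-release}" pkg missing=0
  # CPE_NAME contains "sles_sap" on SLES for SAP, even when NAME is just "SLES"
  if grep -q '^CPE_NAME=.*sles_sap' "$osrel"; then
    echo "product: SLES for SAP"
  else
    echo "product: not SLES for SAP"
  fi
  for pkg in saptune sapconf; do
    # rpm -q exits non-zero when the package is not installed
    if rpm -q "$pkg" >/dev/null 2>&1; then
      echo "installed: $pkg"
    else
      echo "missing:   $pkg"
      missing=1
    fi
  done
  return "$missing"
}
```

On the Azure image above this would be expected to report saptune as missing while sapconf is installed.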