This blog contains experience gained over the years of implementing (and de-implementing) large scale IT applications/software.

Korn Shell Calling SAP HANA – Hello Hello!

So you’ve automated SAP HANA stuff huh?
What tools do you use?
Python? Chef? Puppet? Ansible? DSC/Powershell?

No. I use Korn shell. ¯\_(ツ)_/¯

Me, Trying to Support Korn…

I find Korn shell is a lowest common denominator across many Linux/Unix systems, and also extremely simple to support.
It does exactly what I need.

For me it’s readable, fairly self-explanatory, easily editable and dependable.

…and Failing?

But I do know what you’re saying:

  • there’s no built in version control
  • it’s not easy to debug
  • it’s definitely not very cool 🤓
  • you can’t easily do offline development
  • my Grandad told me about something called Korn shell ?

Have you Considered

If you have no budget for tools, then you can start automating by using what you already have. Why not.
Don’t wait for the right tool, start somewhere and only then will you understand what works for you and doesn’t work for you.

Sometimes it’s not about budget. There are companies out there that do not allow certain sets of utilities and tools to be installed on servers, because they can provide too much help to hackers. Guess what they do allow? Yep, Korn shell is allowed.

Let’s Get Started

Here’s a sample function to run some SQL in HANA by calling the hdbsql (delivered with the HANA client) and return the output:

#!/bin/ksh
function run_sql {
   typeset -i l_inst="${1}" 
   typeset l_db="${2}" 
   typeset l_user="${3}" 
   typeset l_pw="${4}" 
   typeset l_col_sep="${5}" 
   typeset l_sql="${6}" 
   typeset l_output="" 
   typeset l_auth="" 
   typeset -i l_ret=0

   # Check if PW is blank, then use hdbuserstore (-U). 
   if [[ -n "${l_pw}" && "${l_pw}" != " " ]] ; then 
      l_auth="-u ${l_user} -p ${l_pw}" 
    else l_auth="-U ${l_user}" 
   fi

   l_output="$(hdbsql -quiet -x -z -a -C -j -i ${l_inst} ${l_auth} -d ${l_db} -F "${l_col_sep}"<<-EOF1 2>>/tmp/some-script.log 
		${l_sql}; 
		quit 
EOF1 )"
   
   l_ret=$?

   # For HANA 1.0 we need to trim the first 6 lines of output, because it doesn't understand "-quiet". 
   #if [[ "$(check_major_version)" -lt 2 ]] ; then 
      # l_output="$(print "${l_output}"| tail -n +7)" 
   #fi

   print "${l_output}" 
   return $l_ret 

}

To call the above function, we then just do (in the same script):

l_result="$(run_sql "10" "SystemDB" "SYSTEM" "SYSTEMPW" " " "ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET ('persistence','log_mode')='overwrite' WITH RECONFIGURE")"

We are passing in the HANA instance number 10, you can use whatever your instance number is.

We can check the function return code (did the function return cleanly) like so:

if [[ $? -ne 0 ]] ; then 
   print "FAILED" 
   exit 1; 
fi

Here’s what we’re passing in our call to hdbsql (you can find this output by calling “hdbsql –help”):

-i instance number of the database engine
-d name of the database to connect
-U use credentials from user store
-u user name to connect
-p password to connect
-x run quietly (no messages, only query output)
-quiet Do not print the welcome screen
-F use as the field separator (default: ‘,’)
-C suppress escape output format
-j switch the page by page scroll output off
-Q show each column on a separate line
-a do not print any header for SELECT commands

If you wanted to return a value, then the “l_result” variable would contain the output.

Ideally, the function we wrote would be put into a chunk of modular code that could be referenced time and again from other Korn shell scripts.

You would also be looking to create some sets of standard functions for logging of messages to help with debugging. You can make it as complex as you wish.

In the call to “run_sql” we pass a column separator.
I usually like to use a “|” (pipe), then parse the returned values using the “awk” utility like so:

l_result="$(run_sql "10" "SystemDB" "SYSTEM" "SYSTEMPW" "|" "SELECT file_name,layer_name,section,key, value FROM SYS.M_INIFILE_CONTENTS WHERE layer_name='SYSTEM'")"

echo "${l_result}" | /bin/awk -F'|' '{ print $2" "$3" "$4 }'

When we execute the script we get the first 3 columns like so:

daemon.ini SYSTEM daemon 
diserver.ini SYSTEM communication 
global.ini SYSTEM auditing 
configuration global.ini SYSTEM 
backup global.ini SYSTEM
...

Obviously we don’t really embed the password in the script; it gets passed in.
You can either pass it in using the command line parameter method (./myscript.ksh someparam) or via the Linux environment variables (export myparam=something; ./myscript.ksh).
If you want you can even pipe it in (echo “myparam”| ./myscript.ksh) and “read” it into a variable.
You can also take a look at the “expect” utility to automate command line input.
Also, take a look at the “hdbuserstore” utility to store credentials for automation scripts (remember to set appropriatly secure privs on these database users).

That’s all there is to it for you to get started using Korn shell to call HANA.

Hardening SAP Hostagent SSL Connections

You may have recently had a penetration test and in the report you find that the SSL port for the SAP Hosagent (saphostexec) are listed as allowing weak encryption cipher strength and older SSL protocols.
You want to know how you can remedy this.

In this post I will show how we can appease the Cyber Security penetration testing team, by hardening the SSL ciphers and protocols used for connections to the Hostagent.

What Are Weak Ciphers?

Ciphers, like Triple-DES and Blowfish use 64-bit block sizes (the cipher text is split up into blocks of 64-bit in length) which makes a block cipher more vulnerable to compromise, compared to a cipher that uses a larger 128-bit block size.

The cipher is agreed upon during the setup of the SSL connection between the client and the server (SAP Hostagent in our scenario).
If a server advertises that it supports weaker ciphers, and a client elected to use one of the supported weaker ciphers during the SSL connection negotiation, then the connection could be vulnerable to decryption.

What Are Older SSL Protocols?

Through time the SSL protocol has been improved and strengthened.
The SSL protocol versions go from SSL v1.0 to SSL v3.0, then re-named to TLS and the versions again incremented from TLS 1.0, TLS 1.1, TLS 1.2 and the most recent TLS 1.3 (in 2018).

The old SSL versions of the protocol are deprecated and should not be used. The slightly newer TLS versions 1.0 and 1.1 are also now widely deprecated (do not confuse “deprecated” with “unused”).

It is therefore recommended, generally, to use TLS 1.2 and above.

Why Harden the Hostagent SSL Service?

Now we have an appreciation of our older ciphers and protocols, let’s look at the Hostagent.
Usually the PEN test report will highlight the SSL TCP port 1129, and the report will state two things:

  • The SSL ciphers accepted by the Hostagent include weaker ciphers (such as RC4).
  • The SSL protocols accepted by the Hostagent include TLS 1.0 and 1.1.

The above issues present opportunities for hackers that may allow them to more easily compromise a SAP Hostagent on a SAP server.
Whilst this may not sound too bad, it is just the Hostagent, when we realise that the Hostagent runs as the Linux root (or Windows SYSTEM user) and there are known vulnerabilities that allow remote exploitation, we can see that the Hostagent could be a window into the SAP system as the highest privileged user on the server!
It is therefore quite important to try and protect the Hostagent as much as possible.

How Can We Harden the Hostagent SSL Service?

To ensure that weak ciphers are not used, the server needs to be configured to not use them. In the context of SAP Hostagents, they are the SSL servers and they need to be configured to only use stronger ciphers.

The SAP Hostagent is really the same as the SAP Instance Agent in disguise.
Because of this, it is possible to find documented parameters that allow us to harden the SSL service of the Hostagent in the same way.

By following SAP note 510007, we can see two SAP recommended parameters and settings that can be used to harden the SSL ciphers used:

  • ssl/ciphersuites = 135:PFS:HIGH::EC_P256:EC_HIGH
  • ssl/client_ciphersuites = 150:PFS:HIGH::EC_P256:EC_HIGH

The SAP note 510007 includes an extremely good description of the SAP cryptographic library’s capabilities, the role of SSL and even some commentary on the probability of an older protocol being abused.
I feel that the note has been written by someone with a lot of experience.

The above two parameters apply a numeric calculation that selects an appropriate strength of cryptographic ciphers to be used for server and client connectivity.
With the Hostagent, we are more concerned with the server side, but the Hostagent can also do client calls, so we apply both parameters in unison.

The values assigned to the two parameters are described by the SAP note as being good, but also allow flexibility for backwards compatibility with the older SAP and non-SAP software. Again the SAP note stresses the importance of compatibility (and having stuff continue to work) versus security.

What is the Impact of the Parameters?

To be able to see the impact to the Hostagent, we first need to see what the Hostagent supports out-of-the-box.

Thanks to a great post here: www.ise.io/using-openssl-determine-ciphers-enabled-server
we can use a super simple shell script (on Unix/Linux) to call the OpenSSL executable, make a connection to the target server (the Hostagent) and check the list of ciphers and protocols that are advertised.
The code from the above site is here:

for v in ssl2 ssl3 tls1 tls1_1 tls1_2; do 
   for c in $(openssl ciphers 'ALL:eNULL' | tr ':' ' '); do 
      openssl s_client -connect localhost:1129 -cipher $c -$v < /dev/null > /dev/null 2>&1 && echo -e "$v:\t$c" 
   done 
done

You can see that I have placed “localhost” and “1129” in the code.
This is because I am running the script on a Linux host with a SAP Hostagent installed, and the SSL port is 1129 (default).

The output is something like this (depending on your version of the Hostagent):

tls1: ECDHE-RSA-AES256-SHA 
tls1: AES256-SHA 
tls1: ECDHE-RSA-AES128-SHA 
tls1: AES128-SHA 
tls1: RC4-SHA 
tls1: RC4-MD5 
tls1: DES-CBC3-SHA 
tls1_1: ECDHE-RSA-AES256-SHA 
tls1_1: AES256-SHA 
tls1_1: ECDHE-RSA-AES128-SHA 
tls1_1: AES128-SHA 
tls1_1: RC4-SHA 
tls1_1: RC4-MD5 
tls1_1: DES-CBC3-SHA 
tls1_2: ECDHE-RSA-AES256-GCM-SHA384 
tls1_2: ECDHE-RSA-AES256-SHA384 
tls1_2: ECDHE-RSA-AES256-SHA 
tls1_2: AES256-GCM-SHA384 
tls1_2: AES256-SHA 
tls1_2: ECDHE-RSA-AES128-GCM-SHA256 
tls1_2: ECDHE-RSA-AES128-SHA 
tls1_2: AES128-GCM-SHA256 
tls1_2: AES128-SHA 
tls1_2: RC4-SHA 
tls1_2: RC4-MD5 
tls1_2: DES-CBC3-SHA

You can see that we have some RC4 and some DES ciphers listed in the TLS 1.0, TLS 1.1 and TLS 1.2 sections.
We now use SAP note 510007 to decide that we want to use the more secure settings that remove these weaker ciphers.

In the case of SAP Host Agents, we adjust the profile file /usr/sap/hostctrl/exe/host_profile (as root), and add our two SAP recommended parameters (mentioned previously):
ssl/ciphersuites = 135:PFS:HIGH::EC_P256:EC_HIGH
ssl/client_ciphersuites = 150:PFS:HIGH::EC_P256:EC_HIGH

NOTE: You should be running the latest SAP Hostagent, this is very important for security of your system. There are known vulnerabilities in older versions that allow remote compromise.

Once set, we need to restart the agent:

/usr/sap/hostctrl/exe/saphostexec -restart

We can re-execute our check script to see that we have a more secure configuration:

tls1: ECDHE-RSA-AES256-SHA 
tls1: AES256-SHA 
tls1: ECDHE-RSA-AES128-SHA 
tls1: AES128-SHA 
tls1_1: ECDHE-RSA-AES256-SHA 
tls1_1: AES256-SHA 
tls1_1: ECDHE-RSA-AES128-SHA 
tls1_1: AES128-SHA 
tls1_2: ECDHE-RSA-AES256-GCM-SHA384 
tls1_2: ECDHE-RSA-AES256-SHA384 
tls1_2: ECDHE-RSA-AES256-SHA 
tls1_2: AES256-GCM-SHA384 
tls1_2: AES256-SHA 
tls1_2: ECDHE-RSA-AES128-GCM-SHA256 
tls1_2: ECDHE-RSA-AES128-SHA 
tls1_2: AES128-GCM-SHA256 
tls1_2: AES128-SHA

The more insecure ciphers are removed, but we still see those older protocols (TLS 1.0 and TLS 1.1) in the list.
We decide that we would like to further harden the setup by removing those protocols.

If we look at SAP note 2384290, we can see that an alternate set of parameter values are provided:

  • ssl/ciphersuites = 545:PFS:HIGH::EC_P256:EC_HIGH
  • ssl/client_ciphersuites = 560:PFS:HIGH::EC_P256:EC_HIGH

Let’s apply these and re-run the test for a final time.
We can see that we get a super refined list of protocols and ciphers:

tls1_2: ECDHE-RSA-AES256-GCM-SHA384 
tls1_2: ECDHE-RSA-AES256-SHA384 
tls1_2: ECDHE-RSA-AES256-SHA 
tls1_2: AES256-GCM-SHA384 
tls1_2: AES256-SHA 
tls1_2: ECDHE-RSA-AES128-GCM-SHA256 
tls1_2: ECDHE-RSA-AES128-SHA 
tls1_2: AES128-GCM-SHA256 
tls1_2: AES128-SHA

Our Hostagent SSL service is now as secure as it can be at this point in time, within reason. If we try and adjust the ciphers any further, we may end up breaking compatibility with other SAP systems in your landscape.

Summary

We’ve seen how applying two SAP standard parameters to the SAP Hostagent and restarting it, can significantly strengthen the posture of the Hostagent’s SSL service.

However, we need to be cautious of compatibility with other SAP and non-SAP software in the landscape, which may talk to the Hostagent only with older protocols.

As a final note, you may be wondering if we can remove the HTTP service from the Hostagent? At this point in time I have not found a SAP note that would indicate this is possible or recommended. However, since the HTTP protocol is known to be insecure, just don’t use it. This is in comparison with SSL which should be secure, but might not be as secure as it could be.

Best Disk Topology for SAP ASE Databases on Azure

Maybe you are considering migration of on-premise SAP ASE databases to Microsoft Azure, or you may be considering migrating from your existing database vendor to SAP ASE on Azure.
Either way, you will benefit from understanding a good, practical disk topology for SAP ASE on Azure.

In this post, I show how you can optimise use of the SAP ASE, Linux and Azure technical layers to provide a balanced approach to disk use, considering both performance and disk (ASE device) management.

The Different Layers

In an ASE on Linux on Azure (IaaS) setup, you have the following layers:

  • Azure Storage Services
  • Azure Data Disk Cache Settings
  • Linux Physical Disks
  • Linux Logical Volumes
  • Linux File Systems
  • ASE Database Data Devices
  • ASE Instance

Each layer has different options around tuning and setup, which I will highlight below.

Azure Storage Services

Starting at the bottom of the diagram we need to consider the Azure Disk Storage that we wish to use.
There are 2 design considerations here:

  • size of disk space required.
  • performance of disk device.

For performance, you are more than likely tied by the SAP requirements for running SAP on Azure.
Currently these require a minimum of Premium SSD storage, since it provides a guaranteed SLA. However, as of June 2020, Standard SSD was also given an SLA by Microsoft, potentially paving the way for cheaper disk (when certified by SAP) provided that it meets your SLA expectations.

Generally, the size of disk determines the performance (IOPS and MBps), but this can also be influenced by the quantity of data disk devices.
For example, by using 2 data disks striped together you can double the available IOPS. The IOPS are an important factor for databases, especially on high throughput database systems.

When considering multiple data disks, you also need to remember that each VM has limitations. There is a VM level IOPS limit, a VM level throughput limit (megabytes per second) plus a limit to the number of data disks that can be attached. These limit values are different for different Azure VM types.

Also remember that in Linux, each disk device has its own set of queues and buffers. Making use of multiple Linux disk devices (which translates directly to the number of Azure data disks) usually means better performance.

Essentials:

  • Choose minimum of Premium SSD (until Standard SSD is supported by SAP).
  • Spread database space requirements over multiple data disks.
  • Be aware of the VM level limits.

Azure Data Disk Cache Settings

Correct configuration of the Azure data disk cache settings on the Azure VM can help with performance and is an easy step to complete.
I have already documented the best practice Azure Disk Cache settings for ASE on Azure in a previous post.

Essentials:

  • Correctly set Azure VM disk cache settings on Azure data disks at the point of creation.

Use LVM For Managing Disks

Always use a logical volume manager, instead of formatting the Linux physical disk devices directly.
This allows the most flexibility for growing, shrinking and striping the disks for size and performance.

You should stripe the data logical volumes with a minimum of 2 physical disks and a maximum stripe size of 128KB (test it!). This fits within the window of testing that Microsoft have performed in order to achieve the designated IOPS for the underlying disk. It’s also the maximum size that ASE will read at. Depending on your DB read/write profile, you may choose a smaller stripe size such as 64KB, but it depends on the amount of Large I/O and pre-fetch. When reading the Microsoft documents, consider ASE to be the same as MS SQL Server (they are are from the same code lineage).

Stripe the transaction log logical volume(s) with a smaller stripe size, maybe start at 32KB and go lower but test it (remember HANA is 2KB stripe size for log volumes, but HANA uses Azure WriteAccelerator).

Essentials:

  • Use LVM to create volume groups and logical volumes.
  • Stipe the data logical volumes with (max) 128KB stripe size & test it.

Use XFS File System

You can essentially choose to use your preferred file system format; there are no restrictions – see note 405827.
However, if you already run or are planning to run HANA databases in your landscape, then choosing XFS for ASE will make your landscape architecture simpler, because HANA is recommended to run on an XFS file system (when on local disk) on Linux; again see SAP note 405827.

Where possible you will need to explicitly disable any Linux file system write barrier caching, because Azure will be handling the caching for you.
In SUSE Linux this is the “nobarrier” setting on the mount options of the XFS partition and for EXT4 partitions it is option “barrier=0”.

Essentials:

  • Choose disk file system wisely.
  • Disable write barriers.

Correctly Partition ASE

For SAP ASE, you should segregate the disk partitions of the database to avoid certain system specific databases or logging areas, from filling other disk locations and causing a general database system crash.

If you are using database replication (maybe SAP Replication Server a.k.a HADR for ASE), then you will have additional replication queue disk requirements, which should also be segregated.

A simple but flexible example layout is:

Volume
Group
Logical
Volume
Mount PointDescription
vg_aselv_ase<SID>/sybase/<SID>For ASE binaries
vg_sapdatalv_sapdata<SID>_1./sapdata_1One for each ASE device for SAP SID database.
vg_saploglv_saplog<SID>_1./saplog_1One for each log device for SAP SID database.
vg_asedatalv_asesec<SID>./sybsecurityASE security database.
lv_asesyst<SID>./sybsystemASE system databases (master, sybmgmtdb).
lv_saptemp<SID>./saptempThe SAP SID temp database.
lv_asetemp<SID>./sybtempThe ASE temp database.
lv_asediag<SID>./sapdiagThe ASE saptools database.
vg_asehadrlv_repdata<SID>./repdataThe HADR queue location.
vg_backupslv_backups<SID>./backupsDisk backup location.

The above will allow each disk partition usage type to be separately expanded, but more importantly, it allows specific Azure data disk cache settings to be applied to the right locations.
For instance, you can use read-write caching on the vg_ase volume group disks, because that location is only for storing binaries, text logs and config files for the ASE instance. The vg_asedata contains all the small ASE system databases, which will not use too much space, but could still benefit from read caching on the data disks.

TIP: Depending on the size of your database, you may decide to also separate the saptemp database into its own volume group. If you use HADR you may benefit from doing this.

You may not need the backups disk area if you are using a backup utility, but you may benefit from a scratch area of disk for system copies or emergency dumps.

You should choose a good naming standard for volume groups and logical volumes, because this will help you during the check phase, where you can script the checking of disk partitioning and cache settings.

Essentials:

  • Segregate disk partitions correctly.
  • Use a good naming standard for volume groups and LVs.
  • Remember the underlying cache settings on those affected disks.

Add Whole New ASE Devices

Follow the usual SAP ASE database practices of adding additional ASE data devices on additional file system partitions sapdata_2, sapdata_3 etc.
Do not be tempted to constantly (or automatically) expand the ASE device on sapdata_1 by adding new disks, you will find this difficult to maintain because striped logical volumes need at least 2 disks in the stripe set.
It will get complicated and is not easy to shrink back from this.

When you add new disks to an existing volume group and then expand an existing lv_sapdata<SID>_n logical volume, it is not as clean as adding a whole new logical volume (e.g. lv_sapdata<SID>_n+1) and then adding a whole new ASE data device.
The old problem of shrinking data devices is more easily solved by being able to drop a whole ASE device, instead of trying to shrink one.

NOTE: The Microsoft notes suggest enabling automatic DB expansion, but on Azure I don’t think it makes sense from a DB administration perspective.
Yes, by adding a new ASE device, as data ages you may end up with “hot” devices, but you can always move specific devices around and add more underlying disks and re-stripe etc. Keep the layout flexible.

Essentials:

  • Add new disks to new logical volumes (sapdata_n+1).
  • Add big whole new ASE devices to the new LVs.

Summary:

We’ve been through each of the layers in detail and now we can summarise as follows:

  • Choose a minimum of Premium SSD.
  • Spread database space requirements over multiple data disks.
  • Correctly set Azure VM disk cache settings on Azure data disks at the point of creation.
  • Use LVM to create volume groups and logical volumes.
  • Stipe the logical volumes with (max) 128KB stripe size & test it.
  • Choose disk file system wisely.
  • Disable write barriers.
  • Segregate disk partitions correctly.
  • Use a good naming standard for volume groups (and LVs).
  • Remember the underlying cache settings on those affected disks.
  • Add new disks to new logical volumes (sapdata_n).
  • Add big whole new ASE devices to the new LVs.

Useful Links:

HowTo: Check Netweaver 7.02 Secure Store Keyphrase

For Netweaver 7.1 and above, SAP provide a Java class that you can use to check the Secure Store keyphrase.
See SAP note 1895736 “Check if secure store keyphrase is correct”.
However, in the older Netweaver 7.02, the Java check function does not exist.

In this post I provide a simple way to check the keyphrase without making any destructive changes in Netweaver AS Java 7.02.

Why Check the Keyphrase?

Being able to check the Netweaver AS Java Secure Store keyphrase is useful when setting up SAP ASE HADR. The Software Provisioning Manager requests the keyphrase when installing the companion database on the standby/DR server.

The Check Process

In NW 7.02, you can use the following method, to check that you have the correct keyphrase for the Secure Store.
The method does not cause any outage or overwrite anything.
It is completely non-destructive, so you can run it as many times as you need.
I guess in a way it could also be used as a brute force method of guessing the keyphrase.

As the adm Linux user on the Java Central Instance, we first set up some useful variables:

setenv SLTOOLS /sapmnt/${SAPSYSTEMNAME}/global/sltools
setenv LIB ${SLTOOLS}/sharedlib
setenv IAIK ${SLTOOLS}/../security/lib/tools

Now we can call the java code that allows us to create a temporary Secure Store using the same keyphrase that we think is the real Secure Store keyphrase:
NOTE: We change “thepw” for the keyphrase that we think is correct.

/usr/sap/${SAPSYSTEMNAME}/J*/exe/sapjvm_*/bin/java -classpath "${LIB}/tc_sec_secstorefs.jar:${LIB}/exception.jar:${IAIK}/iaik_jce.jar:${LIB}/logging.jar" com.sap.security.core.server.secstorefs.SecStoreFS create -s ${SAPSYSTEMNAME} -f /tmp/${SAPSYSTEMNAME}sec.properties -k /tmp/${SAPSYSTEMNAME}sec.key -enc -p "thepw"

The output of the command above is 2 files in the /tmp folder, called sec.key and sec.properties.
If we now compare the checksum of the new temporary key file, to the current system Secure Store key file (in our case this is called SecStore.key):

cksum /sapmnt/${SAPSYSTEMNAME}/global/security/data/SecStore.key 
cksum /tmp/${SAPSYSTEMNAME}Sec.key


If both the check sum values are the same, then you have the correct keyphrase.

Is my GCP hosted SLES 12 Linux VM Affected by the BootHole Vulnerability

In an effort to really drag this topic out (it’s now a trilogy), I’ve taken my previous Azure specific post and also the AWS specific post and decided to do some further research into whether the same is true in Google Cloud Platform (a.k.a GCP).

Previously

(If I was writing this like a true screenwriter, it would get shorter and faster each recap).

In July 2020, a GRUB2 bootloader vulnerability was discovered which could allow attackers to replace the bootloader on a machine which has Secure Boot turned on.
The vulnerability is designated CVE-2020-10713 and is rated 8.2 HIGH on the CVSS (see here).

Let’s recap what this is (honestly, please see my Azure post for details, it’s quite technical), and how it impacts a GCP virtual machine running SUSE Enterprise Linux 12, which is commonly used to run SAP systems such as SAP HANA or other SAP products.

What is the Vulnerability?

Essentially, some evil input data can be entered into some part of the GRUB2 program binaries, which is not checked/validated.
By carefully crafting the data that is the overflow, it is possible to cause a specifically targeted memory area to be overwritten.

As described by Eclypsium here (the security company that detected this) “Attackers exploiting this vulnerability can install persistent and stealthy bootkits or malicious bootloaders that could give them near-total control over the victim device“.

Essentially, the vulnerability allows an attacker with root privileges to replace the bootloader with a malicious one.

What is GRUB2?

GRUB2 is v2 of the GRand Unified Bootloader (see here for the manual).
It can be used to load the main operating system of a computer.

What is Secure Boot?

There are commonly two boot methods: “Legacy Boot” and “Secure Boot” (a.k.a UEFI boot).
Until Secure Boot was invented, the bootloader would sit in a designated location on the hard disk and would be executed by the computer BIOS to start the chain of processes for the computer start up.

With Secure Boot, certificates are used to secure the boot process chain.
This BootHole vulnerability means a new CA certificate needs to be implemented in every machine that uses Secure Boot!

But the attackers Need Root?

Yes, the vulnerability is in a GRUB2 configuration text file owned by the root user. Additional text added to the file can cause the buffer overflow.
Anti-virus can’t remove the bootloader if the bootloader boots first and “adjusts” the anti-virus.

NOTE: The flaw also exists if you also use the network boot capability (PXE boot).

What is the Patch?

Due to the complexity of the problem (did you read the prior Eclypsium link?), it needs more than one piece of software to be patched and in different layers of the boot chain.

The vulnerable GRUB2 software needs patching.
To be able to stop the vulnerable version of GRUB2 being re-installed and used, three things need to happen:

  1. The O/S vendor (SUSE) needs to adjust their code (known as the “shim”) so that it no longer trusts the vulnerable version of GRUB2. Again, this is a software patch from the O/S vendor (SUSE) which will need a reboot.
  2. Since someone with root could simply re-install O/S vendor code (the “shim”) that trusts the vulnerable version of GRUB2, the adjusted O/S vendor code will need signing and trusting by the certificates further up the chain.
  3. The revocation list of Secure Boot needs to be adjusted to prevent the vulnerable version of the O/S vendor code (“shim”) from being called during boot. (This is known as the “dbx” (exclusion database), which will need updating with a firmware update).

What is SUSE doing about it?

There needs to be a multi-pronged patching process because SUSE also found some additional bugs during their analysis.

You can see the SUSE page on CVE-2020-10713 here, which includes the mention of the additional bugs.

How does this impact GCP VMs?

In the previous paragraphs we found that a firmware update is needed to update the “dbx” exclusion database.
Since GCP virtual machines are hosted in a KVM based hypervisor, the “firmware” is actually software.

Whilst looking for details on “Secure Boot” in GCP virtual machines, we come across the Google Compute Engine’s “Shielded VM” option.
You can read about it in detail here.
In brief, in GCP a Shielded VM is deployed using a pre-defined set of Google specific guest operating systems:

As noted above, the documentation specifically mentions that the “firmware” underpinning the virtual machine contains Google’s Certificate Authority (CA) certificate, as the root of the trust chain.
This is important because the Eclypsium description of the vulnerability is specifically citing a problem with the Microsoft CA.
What this means is that Google actually decide on the trust chain themselves and can probably more rapidly adjust the firmware with a new CA certificate.
To reiterate, this is specific to Google specific VM images that you deploy as a Shielded VM.

Another point worth noting is that when creating a Shielded VM, you can enable the vTPM (virtual trusted platform module), which allows integrity monitoring of the boot process. Any change to the boot process and a validation alert is triggered. Whilst this would not prevent compromise, it would at least alert an administrator.

Reading the Google infrastructure security document, we find that just like AWS, Google have designed and are implementing their own security chip called Titan, on the physical hosts. This is used to ensure that physical hosts boot securely, but it is not clear if this chip is used in anyway for Shielded VMs booted on the physical host.

If we delve further into the GCP documentation we find that we also have the option to create a custom image for deployment into a Shielded VM.
See the documentation on how to create a custom Shielded VM image:

The above states that you can create your own Secure Boot capable VM image for deployment in GCP as a Shielded VM.
If we read further down that page under section “Default certificates“, we find a slight difference compared to the Google “curated” images:

The above is telling us, by default the standard Microsoft CA certificates are used for the Secure Boot setup of VMs created using a custom image (remember non-custom Secure Boot images use Google’s root CA) in GCP.
When it says “default values”, right now, they are the only values because of a small note further up the page:

OK, so you can only use the defaults for now. The same compromised defaults that will need fixing. 🤷‍♂️

What do we think needs to happen once Google create the ability to replace the certificates?
From reading those previously mentioned documents, I would guess that to rebuild the certificate database used during the creation of the custom Shielded VM image, you are going to need to re-create the VM image and then re-deploy a VM from that image!

The question remains, is SLES 12 supported as a Shielded VM guest-OS on GCP?
According to the Shielded VM page here, it is not by default. You will need to therefore create your own image:

Summary:

The BootHole vulnerability is far reaching and will impact many, many devices (servers, laptops, IoT devices, TVs, fridges, cars?).
However, only those devices that actually *use* Secure Boot will truly be impacted, since the devices not using Secure Boot do not need to be patched (it’s fruitless).

If you run SLES 12 on GCP virtual machines, using public images, then by default you will not being using the Shielded VM instances, so there is no point patching to fix a vulnerability for which you are not affected.
You are only introducing more risk by patching.

If however, you do decide to patch (even if you don’t need to) then follow the advice from SUSE and patch to fix GRUB2, the “shim” and the other vulnerabilities that were found.

On a final closing point, you could be running a custom SLES image deployed in GCP as a Shielded VM. An image that your company has built and which uses Secure Boot. You would be wise to contact your cloud administrators to ensure that they are preparing for a VM rebuild and subsequent patching required to ensure that Secure Boot remains secure.

Useful Links: