This blog contains experience gained over the years of implementing (and de-implementing) large scale IT applications/software.

SAP’s Deeper Partnership with Red Hat

An announcement back in February 2023 from Walldorf tells us of a “deepening” partnership between SAP and the Enterprise Linux Operating System vendor Red Hat.

The two companies already have a long history together, with Red Hat’s tech team part of the SAP LinuxLab to ensure that SAP on Red Hat Enterprise Linux works and performs as it should.
 

Here are the lines of significance from the SAP news article: https://news.sap.com/2023/02/red-hat-and-sap-deepen-partnership/

…SAP is boosting support for the RISE with SAP solution using Red Hat Enterprise Linux as the preferred operating system for net new business for RISE with SAP solution deployments.

The platform builds on this trust by offering a consistent, reliable foundation for SAP software deployments, providing a standard Linux backbone to support SAP customers across hybrid and multi-cloud environments.

…building on Red Hat’s scalable, flexible, open hybrid cloud infrastructure.

…SAP’s internal IT environments and SAP Enterprise Cloud Services adopting Red Hat Enterprise Linux can gain greater flexibility to address modern and future technology requirements.

“…Red Hat Enterprise Linux offers enhanced performance capabilities to support RISE with SAP solution deployments across cloud environments…

There are a lot of points to cover and, as always, a little history is useful.
Grab a bagel (that’s what Americans eat, right?), put some Obatzda cheese on it (it’s German; I’m trying to tie the food to the subject of this article) and settle in for a read.

Who is Red Hat?

You can read all about Red Hat on Wikipedia here: https://en.wikipedia.org/wiki/Red_Hat , but suffice to say:

  • It has been owned by IBM since 2019.
  • It owns Ansible.
  • It owns Red Hat Enterprise Linux CoreOS (RHCOS), which is the production Linux Operating System beneath the container platform OpenShift.  RHCOS is built on the same kernel as Red Hat Enterprise Linux (RHEL).

What is RISE with SAP?

There are many views on why “RISE with SAP” came to fruition and who it benefits, but the official line is that RISE with SAP is a solution designed to support the needs of the customer’s business in any industry, with SAP responsible for the holistic service level agreement (SLA), cloud operations and technical support, while the partner (insert any global SI) provides sales, consulting and application managed services (AMS).

…SAP is boosting support for the RISE with SAP solution using Red Hat Enterprise Linux as the preferred operating system for net new business for RISE with SAP solution deployments.

When the article talks about “net new”, it just means brand new RISE subscriptions.

Notice that one of the significant lines I pulled out of the article says:

…providing a standard Linux backbone to support SAP customers across hybrid and multi-cloud environments.

Since SAP are doing the hosting, the “multi-cloud” part is probably referring to SAP’s own hybrid and multi-cloud estate, i.e. SAP’s own datacentres plus the hyperscalers.

An enticing option that comes as part of the RISE deal (depending on the customer spend) is SAP Business Technology Platform (BTP).
SAP BTP is a PaaS solution under a subscription model, in which SAP customers can combine and deploy curated services from SAP or third parties, or use the platform services to code their own solutions in a variety of languages, including SAP’s proprietary ABAP language.

The SAP BTP environments are hybrid and multi-cloud: they are hosted in Cloud Foundry (the newest) or Neo (currently sun-setting), run from a combination of SAP’s own datacentres and/or the main hyperscalers (for Cloud Foundry).  There are two other environments: Kyma, a micro-services runtime based on Kubernetes, and the ABAP environment, which is hosted in Cloud Foundry.

To conclude this section, I suggest that the described “net new business” is actually internal business inside SAP and not directly the hosting of customers’ S/4HANA systems.  In fact, S/4HANA is only very loosely mentioned in the article, which leads me to believe that this announcement is mainly about BTP and other surrounding services.

SAP HANA and Compute Power

In one of the statements from SAP on this “deepening” partnership, we see:

“…Red Hat Enterprise Linux offers enhanced performance capabilities to support RISE with SAP solution deployments across cloud environments…”

I can’t see anything specifically mentioned about how Red Hat’s Linux operating system is more performant than SUSE’s, other than an article from 2019 in which an SAP Business Warehouse (BW) on HANA system (possibly BW/4HANA, it is difficult to tell) holds a world record.

See here for more:  https://www.redhat.com/en/resources/red-hat-enterprise-linux-for-sap-solutions-datasheet   which links to here:  https://www.redhat.com/en/blog/red-hat-enterprise-linux-intels-newest-xeon-processors-posts-record-performance-results-across-wide-range-industry-benchmarks?source=blogchannel

The things to note about those claims are:

  • This was based on a 2nd Gen Intel Xeon (3rd Gen is already available).
  • The CPU used the Intel Advanced Vector Extensions 512 (AVX-512) instruction set, which Intel says arrived in 3rd Gen chips (is the Red Hat article quoting the wrong chip generation?).
  • Generally, HANA on the hyperscalers runs on Intel Skylake or Cascade Lake (Xeon Scalable) CPUs; only HANA on bare metal may allow the very latest Xeon generations.
  • The Red Hat Enterprise Linux version used for the world record was 7.2, but 7.9 is the latest 7.x release and 9.0 is out now.  Also, 7.2 is now only supported for older versions of HANA 2.0 (up to SPS03).
  • Intel Optane DC (Intel’s non-volatile memory persistence technology) was used in the world record, but in 2022 it was announced as defunct (superseded by another initiative).
  • 2019 was the year that the IBM acquisition of Red Hat concluded.  Coincidence?

My summary of this section is that I don’t believe performance is the reason for any switch by SAP from (mainly) SUSE to Red Hat.  The one article of relevance that I can find seems just too old and outdated.

What I think is that the announcement from SAP is referring to something other than the Linux Operating System alone.

Red Hat’s Scalable, Flexible, Open Hybrid Cloud Infrastructure

Maybe we need to look past the Red Hat Linux Operating System itself and at the infrastructure eco-system that the Operating System is part of.

…building on Red Hat’s scalable, flexible, open hybrid cloud infrastructure.

When the article talks about “open” we are inclined to think about Open Source, freely available or even open APIs (sometimes just having APIs can make something “open”).

In my mind, something that can run seamlessly almost anywhere on hybrid cloud would involve containers.  Containers provide scalability (scale-out) and flexibility (multiple environments offered).

Let me introduce you to OpenShift.  Yeah, it’s got “open” in the name.

See here for a wiki article:  https://en.wikipedia.org/wiki/OpenShift

As a summary of OpenShift, the Red Hat Enterprise Linux CoreOS (RHCOS) underpins the OpenShift hybrid cloud platform and RHCOS uses the same kernel as Red Hat Enterprise Linux.

The orchestration of OpenShift containers is done using Kubernetes, and Red Hat is the second largest contributor to Kubernetes after Google (it is a platinum member of the CNCF: https://www.cncf.io/about/members/).

I think you might be able to see where we are heading in this section.

Could SAP be adopting OpenShift internally for its future container hosting platform strategy?

IBM Cloud deprecated support for Cloud Foundry in mid-2022.  As suspected, Red Hat OpenShift is one of the touted solutions to replace it: https://cloud.ibm.com/docs/cloud-foundry-public?topic=cloud-foundry-public-deprecation#dep_nextsteps

Need greater efficiency and revolutionary delivery? Red Hat OpenShift on IBM Cloud might be your solution.

The above quote on the IBM Cloud site does provide some hint that operating Cloud Foundry platform services at scale could be less efficient and less innovative compared to Red Hat OpenShift.


Maybe this is something that, internally, SAP have also concluded?

What Does SUSE Offer to Compete with Red Hat and its OpenShift Offering?

The SUSE Linux Enterprise Server (SLES) Operating System has been a solid foundation for running SAP systems.

Similar to Red Hat, SUSE has a varied portfolio of products in the Linux container technology space.
One of those products is Rancher (which came to SUSE through its acquisition of Rancher Labs), an open source container management platform similar to Red Hat’s OpenShift, which makes Kubernetes easier to manage, especially as the number of containers grows.

SUSE is also a contributor to Kubernetes (it is a silver member of the CNCF).

The SUSE Rancher product is open-armed, in that it embraces many different operating systems and a number of license options, whereas Red Hat OpenShift supports only Red Hat Enterprise Linux CoreOS and requires a Red Hat subscription.

While being open is a good thing, it also adds complexity; Red Hat’s CoreOS is a purpose-built Operating System with all the required features, and it appears to offer a simpler method of deployment and maintenance.

It’s possible that SAP’s announcement comes after some internal evaluation of the two products, with Red Hat’s being favoured.

Conclusions

We’ve looked at the article from the SAP site where the new “deeper” partnership with Red Hat was announced.

I think I ruled out performance as a reason for the Operating System change.  The article just didn’t have enough depth for my liking.

I have speculated on how this SAP and Red Hat partnership could be about the internal SAP hosting of PaaS and maybe SaaS related systems and not directly related to hosting of customer’s S/4HANA systems.

What we could be looking at is the next generation of hosting platform for SAP BTP, or possibly SAP S/4HANA Cloud public edition.
Red Hat’s OpenShift platform, underpinned by Red Hat CoreOS and the Red Hat tools to monitor, automate and orchestrate, could combine to provide a solid foundation that solves SAP’s internal strategic issues.

It’s one of the platforms chosen by IBM Cloud (a no brainer for them really), with the justification that Cloud Foundry was no longer the strategic platform.

The announcement has no impact on the certification of SUSE for running S/4HANA and therefore should not affect any customer decisions during their RISE with SAP journey for their S/4HANA systems.

Resources:

https://news.sap.com/2023/02/red-hat-and-sap-deepen-partnership/
https://blogs.sap.com/2019/07/15/evolution-of-sap-cloud-platform-retirement-of-sap-managed-backing-services/
https://blogs.sap.com/2023/06/14/farewell-neo-sap-btp-multi-cloud-environment-the-deployment-environment-of-choice/
https://me.sap.com/notes/2235581
https://learn.microsoft.com/en-us/azure/virtual-machines/mv2-series
https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-compute
https://www.intel.com/content/www/us/en/architecture-and-technology/avx-512-solution-brief.html
https://www.redhat.com/en/resources/red-hat-enterprise-linux-for-sap-solutions-datasheet
https://www.redhat.com/en/blog/red-hat-enterprise-linux-intels-newest-xeon-processors-posts-record-performance-results-across-wide-range-industry-benchmarks
https://docs.openshift.com/container-platform/4.8/architecture/architecture-rhcos.html#rhcos-key-features_architecture-rhcos
https://www.anandtech.com/show/14146/intel-xeon-scalable-cascade-lake-deep-dive-now-with-optane
https://www.sap.com/products/erp/s4hana.html
https://en.wikipedia.org/wiki/Red_Hat
https://en.wikipedia.org/wiki/Rancher_Labs
https://en.wikipedia.org/wiki/OpenStack
https://en.wikipedia.org/wiki/OpenShift
https://en.wikipedia.org/wiki/Cloud_Foundry
https://en.wikipedia.org/wiki/3D_XPoint
https://www.ibm.com/support/pages/sap-s4hana-red-hat-openshift-container-platform-business-perspective-cloud-hosting-provider
https://cloud.ibm.com/docs/cloud-foundry-public?topic=cloud-foundry-public-deprecation
https://www.cncf.io/about/members/

Is my GCP hosted SLES 12 Linux VM Affected by the BootHole Vulnerability

In an effort to really drag this topic out (it’s now a trilogy), I’ve taken my previous Azure specific post and also the AWS specific post and decided to do some further research into whether the same is true in Google Cloud Platform (a.k.a GCP).

Previously

(If I was writing this like a true screenwriter, it would get shorter and faster each recap).

In July 2020, a GRUB2 bootloader vulnerability was discovered which could allow attackers to replace the bootloader on a machine which has Secure Boot turned on.
The vulnerability is designated CVE-2020-10713 and is rated 8.2 HIGH on the CVSS (see here).

Let’s recap what this is (honestly, please see my Azure post for details, it’s quite technical), and how it impacts a GCP virtual machine running SUSE Enterprise Linux 12, which is commonly used to run SAP systems such as SAP HANA or other SAP products.

What is the Vulnerability?

Essentially, malicious input data can be fed to GRUB2 in a way that is not checked/validated, causing a buffer overflow.
By carefully crafting the overflowing data, it is possible to cause a specifically targeted memory area to be overwritten.

As described by Eclypsium here (the security company that detected this) “Attackers exploiting this vulnerability can install persistent and stealthy bootkits or malicious bootloaders that could give them near-total control over the victim device“.

Essentially, the vulnerability allows an attacker with root privileges to replace the bootloader with a malicious one.

What is GRUB2?

GRUB2 is v2 of the GRand Unified Bootloader (see here for the manual).
It can be used to load the main operating system of a computer.

What is Secure Boot?

There are commonly two boot methods: “Legacy Boot” and “Secure Boot” (a.k.a UEFI boot).
Until Secure Boot was invented, the bootloader would sit in a designated location on the hard disk and would be executed by the computer BIOS to start the chain of processes for the computer start up.

With Secure Boot, certificates are used to secure the boot process chain.
This BootHole vulnerability means a new CA certificate needs to be implemented in every machine that uses Secure Boot!

But the Attackers Need Root?

Yes, the vulnerability is in a GRUB2 configuration text file owned by the root user. Additional text added to the file can cause the buffer overflow.
Anti-virus can’t remove the bootloader if the bootloader boots first and “adjusts” the anti-virus.

NOTE: The flaw also exists if you also use the network boot capability (PXE boot).

What is the Patch?

Due to the complexity of the problem (did you read the prior Eclypsium link?), it needs more than one piece of software to be patched and in different layers of the boot chain.

The vulnerable GRUB2 software needs patching.
To be able to stop the vulnerable version of GRUB2 being re-installed and used, three things need to happen:

  1. The O/S vendor (SUSE) needs to adjust their code (known as the “shim”) so that it no longer trusts the vulnerable version of GRUB2. Again, this is a software patch from the O/S vendor (SUSE) which will need a reboot.
  2. Since someone with root could simply re-install O/S vendor code (the “shim”) that trusts the vulnerable version of GRUB2, the adjusted O/S vendor code will need signing and trusting by the certificates further up the chain.
  3. The revocation list of Secure Boot needs to be adjusted to prevent the vulnerable version of the O/S vendor code (“shim”) from being called during boot. (This is known as the “dbx” (exclusion database), which will need updating with a firmware update).

What is SUSE doing about it?

There needs to be a multi-pronged patching process because SUSE also found some additional bugs during their analysis.

You can see the SUSE page on CVE-2020-10713 here, which includes the mention of the additional bugs.

How does this impact GCP VMs?

In the previous paragraphs we found that a firmware update is needed to update the “dbx” exclusion database.
Since GCP virtual machines are hosted in a KVM based hypervisor, the “firmware” is actually software.

Whilst looking for details on “Secure Boot” in GCP virtual machines, we come across the Google Compute Engine’s “Shielded VM” option.
You can read about it in detail here.
In brief, in GCP a Shielded VM is deployed using a pre-defined set of Google-specific guest operating system images.

The documentation specifically mentions that the “firmware” underpinning the virtual machine contains Google’s Certificate Authority (CA) certificate as the root of the trust chain.
This is important because the Eclypsium description of the vulnerability is specifically citing a problem with the Microsoft CA.
What this means is that Google actually decide on the trust chain themselves and can probably more rapidly adjust the firmware with a new CA certificate.
To reiterate, this applies to the Google-curated VM images that you deploy as a Shielded VM.

Another point worth noting is that when creating a Shielded VM, you can enable the vTPM (virtual trusted platform module), which allows integrity monitoring of the boot process. Any change to the boot process and a validation alert is triggered. Whilst this would not prevent compromise, it would at least alert an administrator.
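As an aside, if you want to try this yourself, a Shielded VM with Secure Boot, the vTPM and integrity monitoring all switched on can be created with gcloud roughly as follows (a minimal sketch; the instance name, zone, machine type and image family are just placeholder values):

# Sketch: deploy a Shielded VM from one of Google's curated (UEFI-capable) images,
# with Secure Boot, vTPM and integrity monitoring enabled.
gcloud compute instances create demo-shielded-vm \
  --zone=europe-west2-a \
  --machine-type=n1-standard-2 \
  --image-family=ubuntu-1804-lts \
  --image-project=ubuntu-os-cloud \
  --shielded-secure-boot \
  --shielded-vtpm \
  --shielded-integrity-monitoring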

Reading the Google infrastructure security document, we find that, just like AWS, Google have designed and are implementing their own security chip, called Titan, on the physical hosts. This is used to ensure that physical hosts boot securely, but it is not clear if this chip is used in any way for Shielded VMs booted on the physical host.

If we delve further into the GCP documentation we find that we also have the option to create a custom image for deployment into a Shielded VM.
See the documentation on how to create a custom Shielded VM image:

The above states that you can create your own Secure Boot capable VM image for deployment in GCP as a Shielded VM.
If we read further down that page under section “Default certificates“, we find a slight difference compared to the Google “curated” images:

The above is telling us that, by default, the standard Microsoft CA certificates are used for the Secure Boot setup of VMs created from a custom image in GCP (remember, the non-custom Secure Boot images use Google’s root CA).
When it says “default values”, right now, they are the only values because of a small note further up the page:

OK, so you can only use the defaults for now. The same compromised defaults that will need fixing. 🤷‍♂️

What do we think needs to happen once Google create the ability to replace the certificates?
From reading those previously mentioned documents, I would guess that to rebuild the certificate database used during the creation of the custom Shielded VM image, you are going to need to re-create the VM image and then re-deploy a VM from that image!

The question remains, is SLES 12 supported as a Shielded VM guest-OS on GCP?
According to the Shielded VM page here, it is not supported by default; you will therefore need to create your own image.
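Based on my reading of that documentation, creating the image would look roughly like the sketch below (the image name, disk, zone and certificate files are placeholders; the certificate flags are the ones that currently fall back to the Microsoft defaults if omitted):

# Sketch: turn an existing SLES 12 build disk into a UEFI-compatible image for Shielded VM use.
# The PK/KEK/db/dbx files are placeholders for when Google allow non-default certificates.
gcloud compute images create sles12-secureboot-image \
  --source-disk=sles12-build-disk \
  --source-disk-zone=europe-west2-a \
  --guest-os-features="UEFI_COMPATIBLE" \
  --platform-key-file=PK.der \
  --key-exchange-key-file=KEK.der \
  --signature-database-file=db.der \
  --forbidden-database-file=dbx.der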

Summary:

The BootHole vulnerability is far reaching and will impact many, many devices (servers, laptops, IoT devices, TVs, fridges, cars?).
However, only those devices that actually *use* Secure Boot will truly be impacted, since the devices not using Secure Boot do not need to be patched (it’s fruitless).

If you run SLES 12 on GCP virtual machines using public images, then by default you will not be using Shielded VM instances, so there is no point patching to fix a vulnerability that does not affect you.
You are only introducing more risk by patching.

If however, you do decide to patch (even if you don’t need to) then follow the advice from SUSE and patch to fix GRUB2, the “shim” and the other vulnerabilities that were found.

On a final closing point, you could be running a custom SLES image deployed in GCP as a Shielded VM. An image that your company has built and which uses Secure Boot. You would be wise to contact your cloud administrators to ensure that they are preparing for a VM rebuild and subsequent patching required to ensure that Secure Boot remains secure.


Is my AWS hosted SLES 12 Linux VM Affected by the BootHole Vulnerability

In an effort to spin this story out a little further, I’ve taken my previous Azure specific post and decided to do some further research into whether the same is true in Amazon Web Services (a.k.a AWS).

Previously

In July 2020, a GRUB2 bootloader vulnerability was discovered which could allow attackers to replace the bootloader on a machine which has Secure Boot turned on.
The vulnerability is designated CVE-2020-10713 and is rated 8.2 HIGH on the CVSS (see here).

Let’s recap what this is (honestly, please see my other post for details, it’s quite technical), and how it impacts an AWS virtual machine running SUSE Enterprise Linux 12, which is commonly used to run SAP systems such as SAP HANA or other SAP products.

What is the Vulnerability?

Essentially, malicious input data can be fed to GRUB2 in a way that is not checked/validated, causing a buffer overflow.
By carefully crafting the overflowing data, it is possible to cause a specifically targeted memory area to be overwritten.

As described by Eclypsium here (the security company that detected this) “Attackers exploiting this vulnerability can install persistent and stealthy bootkits or malicious bootloaders that could give them near-total control over the victim device“.

Essentially, the vulnerability allows an attacker with root privileges to replace the bootloader with a malicious one.

What is GRUB2?

GRUB2 is v2 of the GRand Unified Bootloader (see here for the manual).
It can be used to load the main operating system of a computer.

What is Secure Boot?

There are commonly two boot methods: “Legacy Boot” and “Secure Boot” (a.k.a UEFI boot).
Until Secure Boot was invented, the bootloader would sit in a designated location on the hard disk and would be executed by the computer BIOS to start the chain of processes for the computer start up.

With Secure Boot, certificates are used to secure the boot process chain.
This BootHole vulnerability means a new CA certificate needs to be implemented in every machine that uses Secure Boot!

But the Attackers Need Root?

Yes, the vulnerability is in a GRUB2 configuration text file owned by the root user. Additional text added to the file can cause the buffer overflow.
Anti-virus can’t remove the bootloader if the bootloader boots first and “adjusts” the anti-virus.

NOTE: The flaw also exists if you also use the network boot capability (PXE boot).

What is the Patch?

Due to the complexity of the problem (did you read the prior Eclypsium link?), it needs more than one piece of software to be patched and in different layers of the boot chain.

The vulnerable GRUB2 software needs patching.
To be able to stop the vulnerable version of GRUB2 being re-installed and used, three things need to happen:

  1. The O/S vendor (SUSE) needs to adjust their code (known as the “shim”) so that it no longer trusts the vulnerable version of GRUB2. Again, this is a software patch from the O/S vendor (SUSE) which will need a reboot.
  2. Since someone with root could simply re-install O/S vendor code (the “shim”) that trusts the vulnerable version of GRUB2, the adjusted O/S vendor code will need signing and trusting by the certificates further up the chain.
  3. The revocation list of Secure Boot needs to be adjusted to prevent the vulnerable version of the O/S vendor code (“shim”) from being called during boot. (This is known as the “dbx” (exclusion database), which will need updating with a firmware update).

What is SUSE doing about it?

There needs to be a multi-pronged patching process because SUSE also found some additional bugs during their analysis.

You can see the SUSE page on CVE-2020-10713 here, which includes the mention of the additional bugs.

How does this impact AWS VMs?

In the previous paragraphs we found that a firmware update is needed to update the “dbx” exclusion database.
Since AWS virtual machines are hosted in a KVM based hypervisor, the “firmware” is actually software.

Whilst looking for details on “Secure Boot” in AWS virtual machines, there is absolutely no mention of it being supported for Linux.
If we dig into the VM import/export documents here on the AWS docs site, we find:

So the above states that for VMs imported/exported, “UEFI/EFI boot partitions are supported only for Windows boot volumes with VHDX as the image format. Otherwise, a VM’s boot volume must use Master Boot Record (MBR) partitions.“.
The words “…only for Windows…” are the key part of this, because if we scan just a little further down the page, it says that the UEFI boot partitions are actually “supported” for Windows by being converted to MBR (which is not Secure Boot compatible).

I feel we can surmise that AWS does not support running Linux VMs with Secure Boot.
Apart from this little gem of information here.
This slide shows that the launch of the AWS Graviton2 chip enables ARM-based Linux distributions to support Secure Boot.
We can read the Amazon EC2 User Guide here (updated August 28, 2020) to find that SLES 15 is the only SUSE Linux that supports ARM CPUs on AWS.

So we know that Secure Boot is not available in AWS on any of the SLES x86 operating systems, and SLES 12 is not supported on the ARM-based Graviton CPUs.
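If you want to double-check this from inside a running instance, a quick sanity test (my own sketch, not an AWS-documented procedure) is to look for the UEFI variables directory that the kernel only exposes when the machine was booted via UEFI:

# On a UEFI-booted system the kernel exposes /sys/firmware/efi;
# on a legacy BIOS/MBR boot (the normal case for SLES 12 x86 on AWS) it is absent.
if [ -d /sys/firmware/efi ]; then
  echo "UEFI boot (Secure Boot possible)"
else
  echo "Legacy BIOS boot (Secure Boot not in use)"
fi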

Summary:

The BootHole vulnerability is far reaching and will impact many, many devices (servers, laptops, IoT devices, TVs, fridges, cars?).
However, only those devices that actually *use* Secure Boot will truly be impacted, since the devices not using Secure Boot do not need to be patched (it’s fruitless).

If you run SLES 12 on AWS virtual machines, you cannot possibly use Secure Boot, so there is no point patching to fix a vulnerability that does not affect you.
You are only introducing more risk by patching.

If however, you do decide to patch (even if you don’t need to) then follow the advice from SUSE and patch to fix GRUB2, the “shim” and the other vulnerabilities that were found.

If you are running SLES 12 on AWS, then there is no specific order of patching, because you do not use Secure Boot, so there is no possibility of breaking a trust chain that doesn’t exist.

On a final closing point, you could be running a HANA system in AWS on what is known as “Bare Metal” (“High Memory Instances”, a.k.a. “*.metal”). These are physical machines using the Nitro-based hypervisor. So whilst EC2 virtual machines can’t use Secure Boot, these “Bare Metal” machines may well do so through the use of the Nitro Security Chip (see a good deep dive here). You would be wise to contact your AWS account representative to establish if they will be patching the firmware.


Is my Azure hosted SLES 12 Linux VM Affected by the BootHole Vulnerability

In July 2020, a GRUB2 bootloader vulnerability was discovered which could allow attackers to replace the bootloader on a machine which has Secure Boot turned on.
The vulnerability is designated CVE-2020-10713 and is rated 8.2 HIGH on the CVSS (see here).

Let’s look at what this is and how it impacts a Microsoft Azure virtual machine running SUSE Enterprise Linux 12, which is commonly used to run SAP systems such as SAP HANA or other SAP products.

What is the Vulnerability?

It is a “Classic Buffer Overflow” vulnerability in the GRUB2 bootloader for versions prior to 2.06.
Essentially, malicious input data can be fed to GRUB2 in a way that is not checked/validated.
The input data causes an overflow of the holding memory area into adjacent memory areas.
By carefully crafting the overflowing data, it is possible to cause a specifically targeted memory area to be overwritten.

As described by Eclypsium here (the security company that detected this) “Attackers exploiting this vulnerability can install persistent and stealthy bootkits or malicious bootloaders that could give them near-total control over the victim device“.

Essentially, the vulnerability allows an attacker with root privileges to replace the bootloader with a malicious one, boot into it and then have further capability to effectively set up camp (a backdoor) on the server.
This backdoor would be hard to remove because the bootloader is one of the first things to be booted (anti-virus can’t remove the bootloader if the bootloader boots first and “adjusts” the anti-virus).

What is GRUB2?

GRUB2 is v2 of the GRand Unified Bootloader (see here for the manual).
It is used to load the main operating system of a computer.
Usually on Linux virtual machines, GRUB is used to load Linux. It is possible to install GRUB on machines that then boot into Windows.

What is Secure Boot?

There are commonly two boot methods: “Legacy Boot” and “Secure Boot” (a.k.a UEFI boot).
Until Secure Boot was invented, the bootloader would sit in a designated location on the hard disk and would be executed by the computer BIOS to start the chain of processes for the computer start up.
This is clearly quite insecure, since any program could put itself at the designated location and then be executed at boot up.

With Secure Boot, certificates are used to secure the boot process chain.
As with any certificate based process, at the top (root) level there needs to exist a certificate which is valid for many years and is ultimately trusted – the Certificate Authority (CA).
The next levels in the chain trust that CA certificate implicitly and if any point in the chain is compromised, then the trust is broken and will need re-establishing with new certificates.
Depending which level of the chain is compromised, will dictate the amount of effort needed to fix it.

This BootHole vulnerability means a new CA certificate needs to be implemented in every machine that uses Secure Boot!
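As a side note, on a Linux machine that has the mokutil package installed, you can check whether Secure Boot is actually enabled and peek at the enrolled certificate databases, along these lines:

# Is Secure Boot enabled in the current firmware?
mokutil --sb-state

# List the certificates in the signature database (db) and the exclusion database (dbx)
mokutil --db
mokutil --dbx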

But the Attackers Need Root?

Yes, the vulnerability is in a GRUB2 configuration text file owned by the root user. Additional text added to the file can cause the buffer overflow.
Once the attacker has used malware to instigate the overflow, and installed a malicious bootloader, they then have a backdoor to the server, which would be executed every time the server is rebooted.
This backdoor would be hard to remove because the bootloader is one of the first things to be booted (anti-virus can’t remove the bootloader if the bootloader boots first and “adjusts” the anti-virus).
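To make the root requirement concrete, the file in question is the GRUB2 configuration, which on SLES is a root-owned text file (the exact paths vary, especially on UEFI systems, so treat these as examples only):

# The GRUB2 configuration parsed at boot is owned by root
ls -l /boot/grub2/grub.cfg /etc/default/grub

# On a UEFI/Secure Boot system the active grub.cfg may sit on the EFI system partition
ls -l /boot/efi/EFI/*/grub.cfg 2>/dev/null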

NOTE: The flaw also exists if you also use the network boot capability (PXE boot).

What is the Patch?

Due to the complexity of the problem (did you read the prior Eclypsium link?), it needs more than one piece of software to be patched and in different layers of the boot chain.

First off, the vulnerable GRUB2 software needs patching; this is quite easy and will require a reboot of the Linux O/S.
The problem with patching just GRUB2, is that it is still possible for an attacker with root to re-install a vulnerable version of GRUB2 and then use that vulnerable version to compromise the system further.
Remember, the chain of trust is still trusting that vulnerable version of GRUB2.
Therefore, to be able to stop the vulnerable version of GRUB2 being re-installed and used, three things need to happen:

  1. The O/S vendor (SUSE) needs to adjust their code (known as the “shim”) so that it no longer trusts the vulnerable version of GRUB2. Again, this is a software patch from the O/S vendor (SUSE) which will need a reboot.
  2. Since someone with root could simply re-install O/S vendor code (the “shim”) that trusts the vulnerable version of GRUB2, the adjusted O/S vendor code will need signing and trusting by the certificates further up the chain.
  3. The revocation list of Secure Boot needs to be adjusted to prevent the vulnerable version of the O/S vendor code (“shim”) from being called during boot. (This is known as the “dbx” (exclusion database), which will need updating with a firmware update).
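Once the patches start to arrive, a simple way to keep track of which of those pieces you are actually running on SLES (a hedged sketch; package names as typically seen on SLES 12) is to query the installed boot-chain packages and watch for their updates:

# Currently installed GRUB2 and shim package versions (shim is only present on UEFI installs)
rpm -q grub2 grub2-x86_64-efi shim

# Any pending updates for the boot chain packages in the configured repositories
zypper list-updates | grep -Ei 'grub2|shim'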

What is SUSE doing about it?

There needs to be a multi-pronged patching process because SUSE also found some additional bugs during their analysis.

You can see the SUSE page on CVE-2020-10713 here, which includes the mention of the additional bugs.

The key point is that you *could* start patching, but if it were me, I would be tempted to wait until the SUSE “shim” has been updated with the new chain certificate, then patch GRUB2 and then update the “dbx”.

How does this impact Azure VMs?

In the previous paragraphs we found that a firmware update is needed to update the “dbx” exclusion database.
Since Microsoft Azure is using the Hyper-V hypervisor, the “firmware” is actually software in Hyper-V.
See here, which says: “Secure Boot or UEFI firmware isn’t required on the physical Hyper-V host. Hyper-V provides virtual firmware to virtual machines that is independent of what’s on the Hyper-V host.”

So the above would indicate that the Virtual Machine contains the necessary code from Hyper-V.
I would imagine that this is included at VM creation time.

If we dig into the VM details a little bit here on the Microsoft sites, we find:

So the above states that “…generation 2 VMs in Azure do not support Secure Boot…“.
The words “…in Azure…” are the key part of this.

OK, then how about Hyper-V in general (on-premise):

The above states “To Secure Boot generation 2 Linux virtual machines, you need to choose the UEFI CA Secure Boot template when you create the virtual machine.“.
BUT this is for Hyper-V in general, not for Azure virtual machines.

So we know that Secure Boot is not available in Azure on any of the generation 1 or generation 2 VMs (at the time of writing there are only these two generations).

Summary:

The BootHole vulnerability is far reaching and will impact many, many devices (servers, laptops, IoT devices, TVs, fridges, cars?).
However, only those devices that actually *use* Secure Boot will truly be impacted, since the devices not using Secure Boot do not need to be patched (it’s fruitless).

If you run SLES 12 on Azure virtual machines, you cannot possibly use Secure Boot, so there is no point patching to fix a vulnerability that does not affect you.
You are only introducing more risk by patching.

If however, you do decide to patch (even if you don’t need to) then follow the advice from SUSE and patch to fix GRUB2, the “shim” and the other vulnerabilities that were found.

If you are running SLES on Azure, then there is no specific order of patching, because you do not use Secure Boot, so there is no possibility of breaking a trust chain that doesn’t exist.

On a final closing point, you could be running a HANA system in Azure on what is known as “HANA Large Instances” (HLI). These are physical machines. So whilst Virtual Machines can’t use Secure Boot, these physical machines may well do so. You would be wise to contact your Microsoft account representative to establish if they will be patching the firmware.


Making saptune Actually Work & Patching to v2

Having recently spent some time analysing the performance of a HANA database system, I got down to the depths of Linux device I/O performance on an Azure hosted VM.

There was no reason to suspect any issue, because during the implementation of the VM image build process, we had followed all the relevant SAP notes.
In our case, on SUSE Enterprise Linux for SAP 12, we were explicitly following SAP Note 1275776 “Linux: Preparing SLES for SAP environments”.
Inside that SAP note, you go through the process of understanding the difference between sapconf and saptune, plus actually configure saptune (since it comes automatically with the “for SAP” versions of SLES 12).

Once configured, saptune should apply all the best practices that are encompassed in a number of SAP notes including SAP Note 2205917 “SAP HANA DB: Recommended OS settings for SLES 12 / SLES for SAP Applications 12”, which is itself needed during the HANA DB installation preparation work.
If you follow the note, there are a number of required O/S adjustments that are needed for HANA, which can be either applied manually, or (as recommended) automatically via saptune, provided the correct saptune profile is selected.

As part of our configuration, we had applied saptune solution profile S4HANA-DBSERVER (also noted in the SUSE documentation for SAP HANA).
This is applied using the standard:

saptune solution apply S4HANA-DBSERVER

You don’t get a lot of feedback from the saptune execution, but the fact that there are no errors indicates (normally) that it has done what was requested.
You can check it has applied the profile by executing:

saptune solution list

The item that is starred in the returned list is the solution profile that has been applied.
That’s it.
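With hindsight, a better check than just listing the applied solution is to ask saptune to verify it against the running system, for example (the exact output format differs between saptune versions):

# Verify the applied solution against the live operating system settings
saptune solution verify S4HANA-DBSERVER

# Or verify a single SAP note, e.g. the HANA OS settings note
saptune note verify 2205917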

As part of my troubleshooting I even took the trouble of running the publicly available script sapconf_saptune_check (see here: https://github.com/scmschmidt/sapconf_saptune_check/blob/master/sapconf_saptune_check ), which just confirmed that saptune was indeed active/enabled and had a valid profile configured:

Back to the task of checking out the performance issue, and you can probably see where this is going now.
On investigation of the actual saptune profile contents, it was possible to see that a large majority of O/S changes had not been applied.
Specifically, we were not seeing the NOOP scheduler selected for the HANA disk devices.

By executing either of the following, you can check the currently selected scheduler:

grep -H '.*' /sys/block/s??/queue/scheduler

or

cat /sys/block/s??/queue/scheduler

The selected scheduler will be in square brackets.
In my case, I was seeing “[cfq]” for all devices. Not good and not the recommendation from SAP and SUSE.
This setting should be automatically adjusted by the tuned daemon.
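For illustration, the output on the affected system looked something like the commented line below (the device name is just an example), and the scheduler can be switched at runtime per device while the underlying tuning gets fixed:

# Example output for one device (cfq active, noop available but not selected):
# /sys/block/sdc/queue/scheduler:noop deadline [cfq]

# Temporary, per-device fix (not persistent across reboots):
echo noop > /sys/block/sdc/queue/scheduler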

Looking at my version of saptune, I could see it was version 1.1.7 (from the output of the execution of the sapconf_saptune_check script).

Reading some of the recent blog posts from Soeren Schmidt here: https://blogs.sap.com/2019/05/03/a-new-saptune-is-knocking-on-your-door/
I could see that version 2 of saptune was now released.

Downloading the newer version (not installing it directly!), reverting the old solution profile, installing the new saptune version and finally re-applying the same profile confirmed that saptune was the culprit.

The new saptune2 fixed the issue, immediately activating a number of critical O/S changes, including the NOOP scheduler setting on each device.

The moral of the story is therefore that, as well as following the SAP processes, you still need to validate that what should have been applied actually has been applied.
The new saptune2 has been incorporated into our build process, plus the configuration check scripts will be specifically checking for it.
However, since the upgrade from saptune1 to saptune2 could cause issues if it just blindly re-applied the “new” profile settings, SAP have made saptune follow a backwards compatible upgrade process, whereby the O/S settings are retained as they were before the upgrade was executed.

Therefore, as per the SAP Note 2816790 “Differences between sapconf and saptune” links, the upgrade process for an already applied profile is to revert it prior to the saptune upgrade, then apply the upgrade, then re-apply the profile.
This could therefore not just be rolled out via our standard SLES patching routine. We had to develop an automated script that would specifically pre-patch saptune to saptune2 using the correct procedure, before we embarked on the next SLES patching round.
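The core of that script boiled down to something like the following sketch (the solution name matches our landscape, and it assumes the saptune v2 package is available in the configured repositories):

# 1. Revert the applied solution while still on saptune v1
saptune solution revert S4HANA-DBSERVER

# 2. Upgrade the saptune package itself
zypper --non-interactive install saptune

# 3. Re-apply the same solution with the new saptune v2
saptune solution apply S4HANA-DBSERVER

# 4. Verify that the settings really are in effect this time
saptune solution verify S4HANA-DBSERVER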

As a post-note, you should make yourself familiar with the coming changes to the SLES scheduler settings, with the introduction of the NONE scheduler (see the links below for the related blog post).

Useful notes/links:
https://www.suse.com/c/sles-1112-os-tuning-optimisation-guide-part-1/
https://blogs.sap.com/2019/06/25/sapconf-versus-saptune-in-more-detail/
https://blogs.sap.com/2019/05/03/a-new-saptune-is-knocking-on-your-door/
https://www.suse.com/c/noop-now-named-none/