Ops Archives » Page 3 of 16 » Musings of an IT Implementor

Is my Azure hosted SLES 12 Linux VM Affected by the BootHole Vulnerability?

In July 2020, a GRUB2 bootloader vulnerability was discovered which could allow attackers to replace the bootloader on a machine that has Secure Boot turned on.
The vulnerability is designated CVE-2020-10713 and is rated 8.2 HIGH on the CVSS scale (see the NIST link under Useful Links).

Let’s look at what this is and how it impacts a Microsoft Azure virtual machine running SUSE Linux Enterprise Server (SLES) 12, which is commonly used to run SAP systems such as SAP HANA or other SAP products.

What is the Vulnerability?

It is a “Classic Buffer Overflow” vulnerability in the GRUB2 bootloader for versions prior to 2.06.
Essentially, malicious input data can be fed to a part of the GRUB2 program that does not check or validate it.
The input data causes an overflow of the holding memory area into adjacent memory areas.
By carefully crafting the data that is the overflow, it is possible to cause a specifically targeted memory area to be overwritten.

As described by Eclypsium (the security company that discovered this; see Useful Links): “Attackers exploiting this vulnerability can install persistent and stealthy bootkits or malicious bootloaders that could give them near-total control over the victim device”.

Essentially, the vulnerability allows an attacker with root privileges to replace the bootloader with a malicious one, boot into it and then have further capability to effectively set up camp (a backdoor) on the server.
This backdoor would be hard to remove because the bootloader is one of the first things to be booted (anti-virus can’t remove the bootloader if the bootloader boots first and “adjusts” the anti-virus).

What is GRUB2?

GRUB2 is v2 of the GRand Unified Bootloader (see here for the manual).
It is used to load the main operating system of a computer.
Usually on Linux virtual machines, GRUB is used to load Linux. It is possible to install GRUB on machines that then boot into Windows.

What is Secure Boot?

There are commonly two boot methods: “Legacy Boot” (BIOS) and UEFI boot. “Secure Boot” is a feature of UEFI boot.
With legacy boot, the bootloader sits in a designated location on the hard disk and is executed by the computer BIOS to start the chain of processes for the computer start up.
This is clearly quite insecure, since any program could put itself at the designated location and then be executed at boot up.

With Secure Boot, certificates are used to secure the boot process chain.
As with any certificate based process, at the top (root) level there needs to exist a certificate which is valid for many years and is ultimately trusted – the Certificate Authority (CA).
The next levels in the chain trust that CA certificate implicitly and if any point in the chain is compromised, then the trust is broken and will need re-establishing with new certificates.
The level of the chain that is compromised dictates the amount of effort needed to fix it.

This BootHole vulnerability means a new CA certificate needs to be implemented in every machine that uses Secure Boot!

But the Attackers Need Root?

Yes, the vulnerability is triggered through the GRUB2 configuration text file (grub.cfg), which is owned by the root user. Additional text added to the file can cause the buffer overflow when GRUB2 parses it.
Once the attacker has used malware to instigate the overflow, and installed a malicious bootloader, they then have a backdoor to the server, which would be executed every time the server is rebooted.

NOTE: The flaw also exists if you use the network boot capability (PXE boot).

What is the Patch?

Due to the complexity of the problem (did you read the prior Eclypsium link?), it needs more than one piece of software to be patched and in different layers of the boot chain.

First off, the vulnerable GRUB2 software needs patching; this is quite easy and will require a reboot of the Linux O/S.
The problem with patching just GRUB2, is that it is still possible for an attacker with root to re-install a vulnerable version of GRUB2 and then use that vulnerable version to compromise the system further.
Remember, the chain of trust is still trusting that vulnerable version of GRUB2.
Therefore, to be able to stop the vulnerable version of GRUB2 being re-installed and used, three things need to happen:

The O/S vendor (SUSE) needs to adjust their code (known as the “shim”) so that it no longer trusts the vulnerable version of GRUB2. Again, this is a software patch from the O/S vendor (SUSE) which will need a reboot.
Since someone with root could simply re-install O/S vendor code (the “shim”) that trusts the vulnerable version of GRUB2, the adjusted O/S vendor code will need signing and trusting by the certificates further up the chain.
The revocation list of Secure Boot needs to be adjusted to prevent the vulnerable version of the O/S vendor code (“shim”) from being called during boot. (This is known as the “dbx” (exclusion database), which will need updating with a firmware update).

What is SUSE doing about it?

There needs to be a multi-pronged patching process because SUSE also found some additional bugs during their analysis.

You can see the SUSE page on CVE-2020-10713 here, which includes the mention of the additional bugs.

The key point is that you *could* start patching, but if it were me, I would be tempted to wait until the SUSE “shim” has been updated with the new chain certificate, patch GRUB2 and then update the “dbx”.

How does this impact Azure VMs?

In the previous paragraphs we found that a firmware update is needed to update the “dbx” exclusion database.
Since Microsoft Azure is using the Hyper-V hypervisor, the “firmware” is actually software in Hyper-V.
The Microsoft Hyper-V documentation (see Useful Links) says: “Secure Boot or UEFI firmware isn’t required on the physical Hyper-V host. Hyper-V provides virtual firmware to virtual machines that is independent of what’s on the Hyper-V host.”

So the above would indicate that the Virtual Machine contains the necessary code from Hyper-V.
I would imagine that this is included at VM creation time.

If we dig into the VM details a little bit here on the Microsoft sites, we find:

So the above states that “…generation 2 VMs in Azure do not support Secure Boot…“.
The words “…in Azure…” are the key part of this.

OK, then how about Hyper-V in general (on-premises):

The above states “To Secure Boot generation 2 Linux virtual machines, you need to choose the UEFI CA Secure Boot template when you create the virtual machine.“.
BUT this is for Hyper-V in general, not for Azure virtual machines.

So we know that Secure Boot is not available in Azure on any of the generation 1 or generation 2 VMs (as of writing there are only 2).
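Rather than rely only on documentation, you can also ask the running VM how it booted. The following is a minimal sketch, not an official SUSE or Microsoft check: it assumes a Linux kernel that exposes /sys/firmware/efi when booted via UEFI, and that the mokutil tool is installed for the Secure Boot query. The function names are my own.

```shell
# boot_mode [EFI_SYSFS_DIR]
# Prints "uefi" when the kernel exposes the EFI directory in sysfs,
# otherwise "legacy-bios". The optional directory argument exists only
# so the function can be exercised against a test directory.
boot_mode() {
    efi_dir="${1:-/sys/firmware/efi}"
    if [ -d "$efi_dir" ]; then
        echo "uefi"
    else
        echo "legacy-bios"
    fi
}

# secure_boot_state
# Prints the Secure Boot state reported by mokutil, or "unknown" when
# mokutil is not installed or cannot read the EFI variables.
secure_boot_state() {
    if command -v mokutil >/dev/null 2>&1; then
        mokutil --sb-state 2>/dev/null || echo "unknown"
    else
        echo "unknown"
    fi
}
```

Calling boot_mode followed by secure_boot_state on the VM gives a quick indication of whether the Secure Boot discussion applies to that machine at all.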

Summary:

The BootHole vulnerability is far reaching and will impact many, many devices (servers, laptops, IoT devices, TVs, fridges, cars?).
However, only those devices that actually *use* Secure Boot will truly be impacted, since the devices not using Secure Boot do not need to be patched (it’s fruitless).

If you run SLES 12 on Azure virtual machines, you cannot possibly use Secure Boot, so there is no point patching to fix a vulnerability by which you are not affected.
You are only introducing more risk by patching.

If however, you do decide to patch (even if you don’t need to) then follow the advice from SUSE and patch to fix GRUB2, the “shim” and the other vulnerabilities that were found.
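If you do patch, one way to confirm that an installed package carries the fix is to look for the CVE number in the package changelog. This is a rough sketch under assumptions: the helper name is hypothetical, it assumes the vendor records CVE numbers in the RPM changelog, and it assumes the relevant packages are called grub2 and shim on your system.

```shell
# changelog_mentions_cve CVE_ID
# Reads an RPM changelog on stdin and reports whether the given CVE
# identifier appears anywhere in it. Returns 0 when found, 1 otherwise.
changelog_mentions_cve() {
    cve="$1"
    if grep -q -F -- "$cve"; then
        echo "$cve: mentioned in changelog"
    else
        echo "$cve: not mentioned in changelog"
        return 1
    fi
}
```

Example usage on the VM: rpm -q --changelog grub2 | changelog_mentions_cve CVE-2020-10713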

If you are running SLES on Azure, then there is no specific order of patching, because you do not use Secure Boot, so there is no possibility of breaking the trust chain that doesn’t exist.

On a final closing point, you could be running a HANA system in Azure on what is known as “HANA Large Instances” (HLI). These are physical machines. So whilst Virtual Machines can’t use Secure Boot, these physical machines may well do so. You would be wise to contact your Microsoft account representative to establish if they will be patching the firmware.

Useful Links:

NIST CVE-2020-10713: nvd.nist.gov/vuln/detail/CVE-2020-10713
Eclypsium BootHole: eclypsium.com/2020/07/29/theres-a-hole-in-the-boot/
Azure Virtual Machines – Generation 2 & Secure Boot: docs.microsoft.com/en-us/azure/virtual-machines/generation-2
Hyper-V and Secure Boot: docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/should-i-create-a-generation-1-or-2-virtual-machine-in-hyper-v

Checking Azure Disk Cache Settings on a Linux VM in Shell

In a previous blog post, I ended the post by showing how you can use the Azure Enhanced Monitoring for Linux to obtain the disk cache settings.
Except, as we found, it doesn’t easily allow you to relate the Linux O/S disk device names and volume groups, to the Azure data disk names.

You can read the previous post here: Listing Azure VM DataDisks and Cache Settings Using Azure Portal JMESPATH & Bash

In this short post, I pick up where I left off and outline a method that will allow you to correlate the O/S volume group name, with the Linux O/S disk devices and correlate those Linux disk devices with the Azure data disk names, and finally, the Azure data disks with their disk cache settings.

Using the method I will show you, you will see how easily you can verify that the disk cache settings are consistent for all disks that make up a single volume group (very important), and also be able to easily associate those volume groups with the type of usage of the underlying Azure disks (e.g. is it for database data, logs or executable binaries).

1. Check If AEM Is Installed

Our first step is to check if the Azure Enhanced Monitoring for Linux (AEM) extension is installed on the Azure VM.
This extension is required for your VM to be supported by SAP.

We use standard Linux command line to check for the extension on the VM:

ls -1 /var/lib/waagent/Microsoft.OSTCExtensions.AzureEnhancedMonitorForLinux-*/config/0.settings

The listing should return at least 1 file called “0.settings”.
If you don’t have this and you don’t have a directory starting with “Microsoft.OSTCExtensions.AzureEnhancedMonitorForLinux-“, then you don’t have AEM and you should get it installed following standard Microsoft documentation.
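For use inside a script, the same check can return a status code instead of relying on the ls error message. A minimal sketch, with a hypothetical function name; the base directory is a parameter only so it can be pointed at a test location:

```shell
# find_aem_settings [WAAGENT_DIR]
# Prints every AEM "0.settings" file found below the waagent directory.
# Returns 0 if at least one file exists, 1 if AEM appears to be missing.
find_aem_settings() {
    base="${1:-/var/lib/waagent}"
    found=1
    for f in "$base"/Microsoft.OSTCExtensions.AzureEnhancedMonitorForLinux-*/config/0.settings; do
        if [ -f "$f" ]; then
            echo "$f"
            found=0
        fi
    done
    return "$found"
}
```

Example usage: find_aem_settings || echo "AEM extension not found"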

2. Get the Number of Disks Known to AEM

We need to know how many disks AEM knows about:

grep -c 'disk;Caching;' /var/lib/AzureEnhancedMonitor/PerfCounters

3. Get the Number of SCSI Disks Known to Linux

We need to know how many disks Linux knows about (we exclude the root disk /dev/sda):

lsscsi --size --size | grep -cv '/dev/sda'

4. Compare Disk Counts

Compare the disk counts from AEM and from Linux. They should be the same. This is the number of data disks attached to the VM.

If you have a lower number from the AEM PerfCounters file, then you may be suffering the effects of an Azure bug in the AEM extension which is unable to handle more than 9 data disks.
Do you have more than 9 data disks?

At this point if you do not have matching numbers, then you will not be able to continue, as the AEM output is vital in the next steps.
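Steps 2 to 4 can be combined into one small function. This is only a sketch: the function name is made up, it reuses the two grep patterns shown above, and it reads the lsscsi listing from standard input so that it can be tested without real disks.

```shell
# compare_disk_counts PERFCOUNTERS_FILE
# Reads lsscsi output on stdin, counts the disks other than /dev/sda and
# compares that with the number of "disk;Caching;" entries in the AEM
# PerfCounters file. Returns 0 on a match, 1 on a mismatch, 2 on error.
# Caveat: like the one-liner it is based on, the /dev/sda pattern would
# also exclude devices such as /dev/sdaa on VMs with very many disks.
compare_disk_counts() {
    perf_file="$1"
    if [ ! -r "$perf_file" ]; then
        echo "ERROR: cannot read $perf_file"
        return 2
    fi
    aem_count=$(grep -c 'disk;Caching;' "$perf_file")
    linux_count=$(grep -cv '/dev/sda')
    if [ "$aem_count" -eq "$linux_count" ]; then
        echo "OK: $aem_count data disks"
    else
        echo "MISMATCH: AEM=$aem_count Linux=$linux_count"
        return 1
    fi
}
```

Example usage: lsscsi | compare_disk_counts /var/lib/AzureEnhancedMonitor/PerfCounters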

Mapping Disks to the Cache Settings

Once we know our AEM PerfCounters file contains all our data disks, we are now ready to map the physical volumes (on our disk devices) to the cache settings. On the Linux VM:

pvs -o "pv_name,vg_name" --separator=' ' --noheadings

Your output should be a list of disks and their volume groups like so (based on our diagram earlier in the post):

/dev/sdc vg_data
/dev/sdd vg_data

Next we look for a line in the AEM PerfCounters file that contains that disk device name, to get the cache setting:

awk -F';' '/;disk;Caching;/ { sub(/\/dev\//,"",$4); printf "/dev/%s %s\n", tolower($4), tolower($6) }' /var/lib/AzureEnhancedMonitor/PerfCounters

The output will be the Linux disk device name and the Azure data disk cache setting:

/dev/sdc none
/dev/sdd none

For each line of disks from the cache setting, we can now see what volume group it belongs to.
Example: /dev/sdc is vg_data and the disk in Azure has a cache setting of “none”.

If there are multiple disks in the volume group, they all must have the same cache setting applied!

Finally, we look for the device name in the PerfCounters file again, to get the name of the Azure disk:

NOTE: Below is looking specifically for “sdc”.

awk -F';' '/;Phys. Disc to Storage Mapping;sdc;/ { print $6 }' /var/lib/AzureEnhancedMonitor/PerfCounters

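The output will be like so (the first line is for “sdc”; the second line is what the same command returns when looking for “sdd”):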

None sapserver01-datadisk1
None sapserver01-datadisk2

We can ignore the first column of the output (“None”) in the above; it’s not needed.

Summary

If you package the AEM disk count check and the subsequent AEM PerfCounters AWK scripts into one neat script with the required loops, then you can get the output similar to this, in one call:

/dev/sdd none vg_data sapserver01-datadisk2
/dev/sdc none vg_data sapserver01-datadisk1
/dev/sda readwrite

Based on the above output, I can see that my vg_data volume group disks (sdc & sdd) all have the correct Azure data disk cache setting for a HANA database data disk location.
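A combined report along those lines could be sketched as follows. This is not the exact script behind the output above: the function name is hypothetical, it assumes the PerfCounters field positions implied by the awk one-liners earlier in the post (device in field 4, value in field 6), and it assumes physical volumes sit on whole disks rather than partitions.

```shell
# disk_cache_report PERFCOUNTERS_FILE
# Reads "pv_name vg_name" pairs on stdin (as printed by the pvs command
# shown earlier) and prints: device, cache setting, volume group and
# Azure disk name. Values missing from PerfCounters print as "unknown".
disk_cache_report() {
    perf_file="$1"
    awk -F';' '
        # First input: the AEM PerfCounters file.
        NR == FNR && /;disk;Caching;/ {
            dev = $4; sub(/^\/dev\//, "", dev)
            cache[dev] = tolower($6)
            next
        }
        NR == FNR && /;Phys. Disc to Storage Mapping;/ {
            dev = $4; sub(/^\/dev\//, "", dev)
            n = split($6, part, " ")
            azure[dev] = part[n]      # last word is the Azure disk name
            next
        }
        NR == FNR { next }
        # Second input (stdin): the pvs listing.
        {
            if (split($0, f, " ") < 2) next
            dev = f[1]; sub(/^\/dev\//, "", dev)
            c = "unknown"; if (dev in cache) c = cache[dev]
            a = "unknown"; if (dev in azure) a = azure[dev]
            printf "/dev/%s %s %s %s\n", dev, c, f[2], a
        }
    ' "$perf_file" -
}
```

Example usage: pvs -o "pv_name,vg_name" --separator=' ' --noheadings | disk_cache_report /var/lib/AzureEnhancedMonitor/PerfCounters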

Taking a step further, if you have intelligently named your volume groups, you can then also check in your script whether the cache setting is correct, based on the name of the volume group.
You can then embed this validation script into a “custom validation” within SAP LaMa and it will alert you automatically if your VM disk cache settings are not correct.
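As a starting point for such a validation, the sketch below checks only the consistency rule stated earlier (all disks in one volume group must share the same cache setting). It assumes report lines in the "device cache vg azure-disk" layout shown in the Summary; the function name and the MISMATCH wording are my own, and a name-based rule would need to be added on top.

```shell
# check_vg_cache_consistency
# Reads report lines "device cache vg [azure-disk]" on stdin and prints
# "MISMATCH <vg>" for every volume group whose disks do not all share
# the same cache setting. Lines without a volume group are ignored.
# Returns 0 when everything is consistent, 1 otherwise.
check_vg_cache_consistency() {
    awk '
        NF >= 3 {
            if (!($3 in seen)) {
                seen[$3] = $2
            } else if (seen[$3] != $2) {
                bad[$3] = 1
            }
        }
        END {
            status = 0
            for (vg in bad) {
                printf "MISMATCH %s\n", vg
                status = 1
            }
            exit status
        }
    '
}
```

A non-zero return code is what a monitoring tool would typically act on, so the function can be called directly from a validation wrapper script.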

You may be wondering, why not do all this from the Azure Portal?
Well, the answer to that is that you don’t know what Linux VM volume groups those Azure disks are used by, unless you have tagged them or named them intelligently in Azure.

Critical SAP Host Agent Security Changes in PL47 – PermissionPolicy

The SAP Host Agent is a critical part of the SAP landscape infrastructure, used to control and, importantly, help automate some aspects of SAP systems.
Generally, writing custom scripts for the Host Agent has been easy.
With experience, it’s easy to see how the Host Agent could be abused to gain highly privileged access to the server host, unless certain security considerations are implemented.

As of the SAP Host Agent 7.21 PL47, the security of the SAP Host Agent and the way that it executes custom scripts is changing.
In this post I will describe how this could break a few things.

What Can The Host Agent Be Used For?

In my experience I have used the Host Agent for the following:

Detecting SAP instances on a server host.
Patching SAP instances on a server host.
Starting/Stopping SAP instances on a server host.
Executing scripts on a server host.

Some of the above actions have been performed direct on the server, from SAP BPA (Business Process Automation), from scripts or from tools like Postman, and a lot of the time from SAP LaMa (Landscape Management).

See a previous post for a more detailed example: How an Azure hosted SAP LaMa Controlled SAP System Starts Up

In the majority of cases I have been calling custom scripts, written to perform specific tasks on the target server host.
The scripts are generally hosted in a central location, accessible from all server hosts. This makes it simple to call whichever script.

To be able to execute a custom script, a Host Agent operation descriptor file is required to be deployed into the operations.d directory of the Host Agent home executable directory (usually /usr/sap/hostctrl/exe or C:\Program Files\SAP\hostctrl\exe).
The descriptor allows the Host Agent to understand how to execute the custom script. It contains, for example, the target platform (Windows/Linux), the name and path of the target script, the operating system user needed to execute the script and any parameters.

On Linux, the descriptor can be specified to execute the target script as any operating system user on Linux, including the root user.
For this reason, the Host Agent and its installation directory location are owned by the root user. All files are only modifiable by the root user.

On Windows it is more secure by default.
The Windows security mechanisms prevent the Host Agent from executing any script as any user other than the Computer SYSTEM user (this is the user that the Host Agent executes as). NOTE: I have a workaround for this which I have developed.

Even though the Host Agent installation location and descriptor location and files are not necessarily easily modified, the weakest link in the security chain is the target script/executable and the location of the target script/executable.

What is Changing With Patch Level 47?

From June 2020, with the introduction of Host Agent 7.21 PL47, a new set of security requirements (PermissionPolicy) is introduced, which makes the Host Agent more secure when executing custom scripts.

In fact, the changes were introduced before PL47, probably in PL44 or 45, as I remember seeing the PermissionPolicy check output in a previous trace file. It was obviously disabled by default in those prior patch levels.

The main changes introduced by the new PermissionPolicy are:

The target script and its directory must be owned by the same user as is specified in the descriptor file for the execution of the script, or it should be executable by the root user (Linux).
The script’s source directory must be writeable by this same user or root (Linux), or be writeable by the primary group of the user.
If the script is located on an NFS share, “root squash” must be disabled.
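Before applying the patch, you could pre-check your own scripts against the first of these rules. The sketch below is only an approximation written for this purpose: it is not the Host Agent’s actual check, the function name is hypothetical, it tests ownership only (not the writeable or root squash rules), and it assumes GNU stat is available.

```shell
# check_script_ownership SCRIPT_PATH EXPECTED_USER
# Verifies that the script and the directory containing it are both
# owned by the user named in the descriptor file.
# Returns 0 when both match, 1 on a mismatch, 2 if stat fails.
check_script_ownership() {
    script="$1"
    user="$2"
    dir=$(dirname -- "$script")
    for path in "$script" "$dir"; do
        owner=$(stat -c '%U' -- "$path") || return 2
        if [ "$owner" != "$user" ]; then
            echo "FAIL: $path is owned by $owner, expected $user"
            return 1
        fi
    done
    echo "OK: $script"
}
```

Running it over every target script referenced in your operations.d descriptors gives an early list of candidates that are likely to be rejected.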

What Is Impacted By the New PermissionPolicy Change?

Any descriptor in the Host Agent operations.d directory, will be impacted.
Any target script will be checked by the new Host Agent security policy.
Only Linux/Unix servers will be affected due to the way that Windows security works (as mentioned before).

Because the new security policy affects Linux and affects any descriptor, this will also have a direct impact on some SAP HANA HSR operations performed from SAP LaMa, plus impact any custom operations that you have created.

By default the new security policy is enabled in the Host Agent as soon as you apply patch level 47.

How to Minimise Disruption?

A lot of customers implement the Host Agent auto-update feature, which saves significant effort when applying the frequent SAP Host Agent patches to the entire landscape.

The auto-update feature has one downside; it’s too easy to apply a patch to the whole landscape without reading the SAP notes to discover the contents of the patch or any changes in the patch. Make sure you always read the notes and make sure your auto-update architecture is designed to allow selective roll-out of the Host Agent patches to a portion of your landscape at a time (not the whole landscape in one go).

See here for a brief overview of SAP Host Agent auto-update.

The SAP note 2932953 mentions a method of adjusting the descriptor file to disable the new PermissionPolicy setting completely.
However, this needs pro-active adjustment, since some of the operations affected may only be used in a HANA HSR failover scenario (you will not know it doesn’t work until you need to use it).

Disabling the new security policy is obviously not a long term solution, since it could be enforced in the future.

Remember: Make your desired PermissionPolicy changes to your descriptor files before you apply the Host Agent patch.

HowTo: Show Current Role of a HA SAP Cloud Connector

If you have installed the SAP Cloud Connector, you will know that out-of-the-box it is capable of providing a High Availability feature at the application layer.

Essentially, you can install 2x SAP Cloud Connectors on 2x VMs and then they can be paired so that one acts as Master and one as “Shadow” (secondary).

The Shadow instance connects to the Master to replicate the required configuration.

If you decide to patch the Cloud Connector (everything needs patching right?!), then you can simply patch the Shadow instance, trigger a failover then patch the old Master.

There is only one complication in this, and that is that it’s not “easy” to see which is acting in which role unless you log into the web administration console.

You can go through the log files to see which has taken over the role of Master at some point, but this is not easy and doesn’t lend itself to being scripted for automated detection of the current role.

Here’s a nice easy way to detect the current role, which could be used (for example) as part of a Custom Instance monitor script for SAP LaMa automation of the Cloud Connector:

awk '/<haRole>/ { match($1,/<haRole>(.*)<\/haRole>/,role); if (role[1] != "" ) { print role[1]; exit } }' /opt/sap/scc/scc_config/scc_config.ini

Output will be either “shadow” or “master”.

I use awk a lot of the time for pattern group matching because I like the simplicity. It’s a powerful tool and deserves the very long O’Reilly book. Note that the three-argument form of match() used here is a GNU awk (gawk) extension.

Here’s what that single code line is doing:

awk	The call to the program binary.
'	Start the contents of the inline AWK script (the single quotes prevent interpretation by the shell).
/<haRole>/	Match every line that contains the <haRole> tag.
{	On each line match, execute this block of code (we close with “}”).
$1	Match against the 1st whitespace-delimited field on the line.
/<haRole>(.*)<\/haRole>/,	Capture any text “.*” between the opening and closing <haRole> tags.
role	Store the match in a new array called “role”.
if (role[1] != "" )	Check that the first capture group, role[1], is not empty (role[0] holds the complete matched text string).
{ print role[1]; exit }	If it is not empty, print the captured role from the array and exit.
}'	Close off the code block and the AWK script.
/opt/sap/scc/scc_config/scc_config.ini	The name of the input file for AWK to scan.

It’s a nice simple way of checking the current role, and can be embedded into a shell script for easy execution.
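If GNU awk is not available on the host, the same extraction can be sketched with sed, wrapped in a function for use in a script. The function name is hypothetical and the config file path is the one used above; this is an alternative technique, not a replacement recommended by SAP.

```shell
# scc_ha_role [CONFIG_FILE]
# Prints the text found between the first <haRole> and </haRole> tags
# in the Cloud Connector config file (expected: "master" or "shadow").
# Prints nothing if the tag is not present.
scc_ha_role() {
    config="${1:-/opt/sap/scc/scc_config/scc_config.ini}"
    sed -n 's/.*<haRole>\(.*\)<\/haRole>.*/\1/p' "$config" | head -n 1
}
```

Example usage: [ "$(scc_ha_role)" = "master" ] && echo "This node is the Master"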

Java 8 SE – I Just Removed It

I have just removed Java 8 SE from my computer.
I wrote a blog post a while back about how Oracle was changing the way it licenses the Java virtual machine 8 Standard Edition (SE).
You can read it here: SAP JVM and the Oracle Java SE 8 Licensing Confusion

At the time of the post, it was not very clear how the license changes were going to impact the use of the Java 8 SE virtual machine.

What’s Changed?

Briefly, at the end of January 2019, Oracle have essentially now stopped free updates to the Java 8 SE for non-personal customers.
This means that you as a personal user can continue to update Java 8 SE, but as a corporate user you may only apply updates to Java 8 SE if you have purchased a subscription to receive the updates.

I’m a Corporate User

If you are a corporate user of Oracle Java 8 SE, unless you have a subscription, you can no longer update Java 8 SE.
If you wish to remain secure and remove security risks from your computers, you should de-install it if you do not want to purchase a subscription from Oracle.
If you do not uninstall Java 8 SE, but continue to update it, and you are audited by Oracle, then you may need to pay for a subscription.

Can Oracle Audit Me?

The Java 8 SE auto-update application now displays a prompt on machines that have the auto-update enabled and that have an internet connection.
If you choose to “Install” (you already have it installed) then at that point Oracle deem that you have accepted the license agreement and they can audit your company for the use of Oracle products.

How Do I Remove Java 8 SE?

For me it was easy.
My Java 8 SE installation has the auto-update function enabled, so it simply told me the license terms had changed and offered me the button to simply remove it. So I did.

You may need to uninstall it from within the Windows program uninstallation tool within Windows Control Panel.
Your IT teams may have already started the removal process automatically.

What If I Need an Up-to-date Java 8 VM?

If you need a Java 8 JVM, you can move to an open source version of Java, such as OpenJDK, or a number of others.

For SAP customers wanting to run their SAP tools, you can actually use the SAPJVM for use with your SAP tools such as SAP Software Download Manager, SAP HANA Studio, SAP ABAP tools on Eclipse and other Java based tools.

How Do I Download SAPJVM?

Downloading the SAPJVM is simple.
Take a look at SAP note 1442124 “How to download a SAP JVM patch from the SMP”.

References:

https://www.it-implementor.co.uk/2019/01/sap-jvm-and-oracle-java-se-8-licensing.html
https://blogs.oracle.com/java-platform-group/extension-of-oracle-java-se-8-public-updates-and-java-web-start-support
upperedge.com/oracle/top-3-reasons-oracle-java-users-are-unknowingly-out-of-compliance/
www.oracle.com/downloads/licenses/binary-code-license.html
www.oracle.com/downloads/licenses/javase-license1.html
www.oracle.com/technetwork/java/javase/terms/oaa.html
