This blog contains experience gained over the years of implementing (and de-implementing) large scale IT applications/software.

Is my Azure hosted SLES 12 Linux VM Affected by the BootHole Vulnerability

In July 2020, a GRUB2 bootloader vulnerability was discovered which could allow attackers to replace the bootloader on a machine which has Secure Boot turned on.
The vulnerability is designated CVE-2020-10713 and is rated 8.2 HIGH on the CVSS (see here).

Let’s look at what this is and how it impacts a Microsoft Azure virtual machine running SUSE Enterprise Linux 12, which is commonly used to run SAP systems such as SAP HANA or other SAP products.

What is the Vulnerability?

It is a “Classic Buffer Overflow” vulnerability in the GRUB2 bootloader for versions prior to 2.06.
Essentially, some evil input data can be entered into some part of the GRUB2 program binaries, which is not checked/validated.
The input data causes an overflow of the holding memory area into adjacent memory areas.
By carefully crafting the data that is the overflow, it is possible to cause a specifically targeted memory area to be overwritten.

As described by Eclypsium here (the security company that detected this) “Attackers exploiting this vulnerability can install persistent and stealthy bootkits or malicious bootloaders that could give them near-total control over the victim device“.

Essentially, the vulnerability allows an attacker with root privileges to replace the bootloader with a malicious one, boot into it and then have further capability to effectively set up camp (a backdoor) on the server.
This backdoor would be hard to remove because the bootloader is one of the first things to be booted (anti-virus can’t remove the bootloader if the bootloader boots first and “adjusts” the anti-virus).

What is GRUB2?

GRUB2 is v2 of the GRand Unified Bootloader (see here for the manual).
It is used to load the main operating system of a computer.
Usually on Linux virtual machines, GRUB is used to load Linux. It is possible to install GRUB on machines that then boot into Windows.

What is Secure Boot?

There are commonly two boot methods: “Legacy Boot” and “Secure Boot” (a.k.a UEFI boot).
Until Secure Boot was invented, the bootloader would sit in a designated location on the hard disk and would be executed by the computer BIOS to start the chain of processes for the computer start up.
This is clearly quite insecure, since any program could put itself at the designated location and then be executed at boot up.

With Secure Boot, certificates are used to secure the boot process chain.
As with any certificate based process, at the top (root) level there needs to exist a certificate which is valid for many years and is ultimately trusted – the Certificate Authority (CA).
The next levels in the chain trust that CA certificate implicitly and if any point in the chain is compromised, then the trust is broken and will need re-establishing with new certificates.
Depending which level of the chain is compromised, will dictate the amount of effort needed to fix it.

This BootHole vulnerability means a new CA certificate needs to be implemented in every machine that uses Secure Boot!

But the attackers Need Root?

Yes, the vulnerability is in a GRUB2 configuration text file owned by the root user. Additional text added to the file can cause the buffer overflow.
Once the attacker has used malware to instigate the overflow, and installed a malicious bootloader, they then have a backdoor to the server, which would be executed every time the server is rebooted.
This backdoor would be hard to remove because the bootloader is one of the first things to be booted (anti-virus can’t remove the bootloader if the bootloader boots first and “adjusts” the anti-virus).

NOTE: The flaw also exists if you also use the network boot capability (PXE boot).

What is the Patch?

Due to the complexity of the problem (did you read the prior Eclypsium link?), it needs more than one piece of software to be patched and in different layers of the boot chain.

First off, the vulnerable GRUB2 software needs patching; this is quite easy and will require a reboot of the Linux O/S.
The problem with patching just GRUB2, is that it is still possible for an attacker with root to re-install a vulnerable version of GRUB2 and then use that vulnerable version to compromise the system further.
Remember, the chain of trust is still trusting that vulnerable version of GRUB2.
Therefore, to be able to stop the vulnerable version of GRUB2 being re-installed and used, three things need to happen:

  1. The O/S vendor (SUSE) needs to adjust their code (known as the “shim”) so that it no longer trusts the vulnerable version of GRUB2. Again, this is a software patch from the O/S vendor (SUSE) which will need a reboot.
  2. Since someone with root could simply re-install O/S vendor code (the “shim”) that trusts the vulnerable version of GRUB2, the adjusted O/S vendor code will need signing and trusting by the certificates further up the chain.
  3. The revocation list of Secure Boot needs to be adjusted to prevent the vulnerable version of the O/S vendor code (“shim”) from being called during boot. (This is known as the “dbx” (exclusion database), which will need updating with a firmware update).

What is SUSE doing about it?

There needs to be a multi-pronged patching process because SUSE also found some additional bugs during their analysis.

You can see the SUSE page on CVE-2020-10713 here, which includes the mention of the additional bugs.

They key point is that you *could* start patching, but if it were me, I would be tempted to wait until the SUSE “shim” has been updated with the new chain certificate, patch GRUB2 and then update the “dbx”.

How does this impact Azure VMs?

In the previous paragraphs we found that a firmware update is needed to update the “dbx” exclusion database.
Since Microsoft Azure is using the Hyper-V hypervisor, the “firmware” is actually software in Hyper-v.
See here, which says: “Secure Boot or UEFI firmware isn’t required on the physical Hyper-V host. Hyper-V provides virtual firmware to virtual machines that is independent of what’s on the Hyper-V host.

So the above would indicate that the Virtual Machine contains the necessary code from Hyper-V.
I would imagine that this is included at VM creation time.

If we dig into the VM details a little bit here on the Microsoft sites, we find:

So the above states that “…generation 2 VMs in Azure do not support Secure Boot…“.
The words “…in Azure…” are the key part of this.

OK, then how about Hyper-V in general (on-premise):

The above states “To Secure Boot generation 2 Linux virtual machines, you need to choose the UEFI CA Secure Boot template when you create the virtual machine.“.
BUT this is for Hyper-V in general, not for Azure virtual machines.

So we know that Secure Boot is not available in Azure on any of the generation 1 or generation 2 VMs (as of writing there are only 2).

Summary:

The BootHole vulnerability is far reaching and will impact many, many devices (servers, laptops, IoT devices, TVs, fridges, cars?).
However, only those devices that actually *use* Secure Boot will truly be impacted, since the devices not using Secure Boot do not need to be patched (it’s fruitless).

If you run SLES 12 on Azure virtual machines, you cannot possibly use Secure Boot, so there is no point patching to fix a vulnerability for which you are not affected.
You are only introducing more risk by patching.

If however, you do decide to patch (even if you don’t need to) then follow the advice from SUSE and patch to fix GRUB2, the “shim” and the other vulnerabilities that were found.

If you are running SLES on Azure, then there is no specific order of patching, because you do not use Secure Boot, so there is no possibility of breaking the trust chain that doesn’t exist.

On a final closing point, you could be running a HANA system in Azure on what is known as “HANA Large Instances” (HLI). These are physical machines. So whilst Virtual Machines can’t use Secure Boot, these physical machines may well do so. You would be wise to contact your Microsoft account representative to establish if they will be patching the firmware.

Useful Links:

Cookies, SAP Analytics Cloud and CORS in Netweaver & HANA

Back in 2019 (now designated as 2019AC – Anno-Covid19), I wrote a post explaining in simple terms what CORS is and how it can affect a SAP landscape.
In that post I showed a simple “on-premise” setup using Fiori, a back-end system and how a Web Dispatcher can help alleviate CORS issues without needing too much complexity.
This post is about a recent CORS related issue that impacts access to back-end SAP data repositories.

Back To The Future

If we hit the “Fast-Forward” button to 2020MC (Mid-Covid19), CORS is now an extremely important technical setup to enable Web Browser based user interfaces to be served from Internet based SAP SaaS services (like SAP Analytics Cloud) and communicate with back-end on-premise/private data sources such as SAP BW systems or SAP HANA databases.

We see that CORS is going to become ever more important going forward, since Web Browser based user interfaces will become more abundant (due to the increase of SaaS products) for the types of back-end data access. The old world of installing a software application on-premise takes too much time and effort to keep up with changing technology.
Using SaaS applications as user interfaces to on-premise data allows a far more agile delivery of user functionality.

The next generation of Web Interfaces will be capable of processing ever larger data sets, with richer capabilities and more in-built intelligence. We’re talking about the Web Browser being a central hub of cross-connected Web Based services.
Imagine, one “web application” that needs a connection to a SaaS product that provides the analytical interface and version management, a connection to one or more back-end data repositories, a connection to a separate SaaS product for AI data analysis and pattern matching (deep insights), a connection to a separate SaaS product for content management (publishing), a connection to a separate SaaS product for marketing and customer engagement.

All of that, from one central web “origin” will mean CORS will become critical to prevent unwanted connections and data leaks. The Web Browser is already the target of many cyber security exploits, therefore staying secure is extremely important, but security is always at the expense of functionality.

IETF Is On It

The Internet Engineering Task Force already have this in hand. That’s how we have CORS in the first place (tools.ietf.org/html/rfc6454).
The Web Origin Concept is constantly evolving to provide features for useability and also security. Way back in 2016 an update to RFC 6265 was proposed, to enhance the HTTP state management mechanism, which is commonly known to you and I as “cookies”.

This amendment (the RFC details are here: tools.ietf.org/html/draft-ietf-httpbis-cookie-same-site-00) was the SameSite attribute that can be set for cookies.
Even in this RFC, you can see that it actually attributes the idea of “samedomain-cookies” back to Mozilla, in 2011. So this is not really a “new” security feature, it’s a long time coming!

The Deal With SAC

The “problem” that has brought me back around to CORS, is recent experience with a CORS issue and SAP Analytics Cloud (SAC).
The issue led me to a blog post by Dong Pan of SAP Canada in Feb 2020 and a recent blog post by Ian Henry, also of SAP in Aug 2020.

Dong Pan wrote quite a long technical blog post on how to fix or work-around the full introduction of the SameSite cookie attribute in Google Chrome version 80 when using SAP Analytics Cloud (SAC).

Ian Henry’s post is also based on the same set of solutions that Dong Pan wrote about, but his issue was accessing a backend HANA XS Engine via Web Dispatcher.

The problem in both cases is that SAP Analytics Cloud (SAC) uses the Web Browser as a middleman to create a “Live Connection” back to an “on-premise” data repository (such as SAP BW or SAP S/4HANA), but the back-end SAP Netweaver/SAP ABAP Platform stack/HANA XS engine, that hosts the “on-premise” data repository does not apply the “SameSite” attribute to cookies that it creates.

You can read Dong Pan’s blog post here: www.sapanalytics.cloud/direct-live-connections-in-sap-analytics-cloud-and-samesite-cookies/
You can read Ian Henry’s blog post here: https://blogs.sap.com/2020/08/26/how-to-fix-google-chrome-samesite-cookie-issue-with-sac-and-hana/

By not applying the “SameSite” attribute to the cookie, Google Chrome browsers of version 80+ will not allow SAC to establish a full session to the back-end system.
You will see an HTTP 400 “session expired” error when viewing the HTTP browser traffic, because SAC tries to establish the connection to the back-end, but no back-end system cookies are allowed to be visible to SAC. Therefore SAC thinks you have no session to the back-end.

How to See the Problem

You will need to be proficient at tracing HTTP requests to be able to capture the problem, but it looks like the following in the HTTP response from the back-end system:

You will see (in Google Chrome) two yellow warning triangles on the “set-cookie” headers in the response from the back-end during the call to “GetServerInfo” to establish the actual connection.
The call is the GET for URL “/sap/bw/ina/GetServerInfo?sap-client=xxx&sap-language=EN&sap-sessionviaurl=X“, with the sap-sessionviaurl in the query-string being the key part.
The text when you hover over the yellow triangle is: “This Set-Cookie didn’t specify a “SameSite” attribute and was defaulted to “SameSite=Lax,” and was blocked because it came from a cross-site response which was not the response to a top-level navigation. The Set-Cookie had to have been set with “SameSite=None” to enable cross-site usage.“.

The Fix(es)

SAP Netweaver (or SAP ABAP Platform) needs some code fixes to add the required cookie attribute “SameSite”.

A workaround (it is a workaround) is possible by using the rewrite module capability of the Internet Communication Management (ICM) or using a rewrite rule in a Web Dispatcher, to re-write the responses and include a generic “SameSite” attribute on each cookie.
This is a workaround for a reason, because using the rewrite method causes unnecessary extra work in the ICM (or Web Dispatcher) for every request (matched or not matched) by the rewrite engine.

It’s always better (more secure, more efficient) to apply the code fix to Netweaver (or ABAP Platform) so the “SameSite” attribute is added at the point of the cookie creation.
For HANA XS, it will need a patch to be applied (if it ever gets fixed in the XS since it is soon deprecated).
With the workaround, we are forcing a setting onto cookies outside of the creation process of those cookies.

Don’t get me wrong, I’m not saying that the workaround should not be used. In some cases it will be the only way to fix this problem in some older SAP systems. I’m just pointing out that there are consequences and it’s not ideal.

Dong Pan and Ian Henry have done a good job of providing options for fixing this in a way that should work for 99% of cases.

Is There a Pretty Picture?

This is something I always find useful when I try and work something through in my mind.
I’ve adjusted my original CORS diagram to include an overview of how I think this “SameSite” attribute issue can be imagined.
Hopefully it will help.

We see the following architecture setup with SAC and it’s domain “sapanalytics.cloud”, issuing CORS requests to back-end system BE2, which sits in domain “corp.net”:

Using the above picture for reference, we can now show where the “SameSite” issue occurs in the processing of the “Resource Response” when it comes back to the browser from the BE2 back-end system:

The blocking, by the Chrome Web browser, of the cookies set by the back-end system in domain “corp.net”, means that from the point of view of SAC, no session was established.
There are a couple more “Request”, “Response” exchanges, before the usual HTTP Authorization header is sent from SAC, but at that point it’s really too late as the returned SAP SSO cookie will also be blocked.

At this point you could see a number of different error messages in SAC, but in the Chrome debugging you will see no HTTP errors because the actual HTTP request/response mechanism is working and HTTP content is being returned. It’s just that SAC will know it does not have a session established, because it will not be finding the usual cookies that it would expect from a successfully established session.

Hopefully I’ve helped explain what was already a highly technical topic, in a more visual way and helped convey the problem and the solution.


Useful Links:

Azure Front Door in a SAP Context

In April 2019, Microsoft announced the general availability of the Azure Front Door service.
The highlight of this service is layer 7 (HTTP/S) load balancing.
In this post I want to briefly explore how Azure Front Door could sit in an example SAP landscape.

But We Have Azure Application Gateway…

Yes, while the Azure Front Door service does provide similar capabilities with regards to load balancing an HTTP/s based back-end service, the similarities end when we start to consider multi-regional distribution of services. That is, multiple Azure regions actively servicing global clients.

Azure Application Gateway

The Azure Application Gateway service is the go-to service for HTTP/S load balancing for your Azure hosted HTTP/S IaaS or Container based services that are contained within a region.

Event for some, limited, SAP uses, the Azure Application Gateway may be sufficient, but you really need an experienced SAP Solution Architect to help you plan your SAP landscape architecture at this point. The consequences of doing it wrong, could cause you to completely re-implement a new architecture pattern in your landscape and, of course, additional cost.

… and SAP Web Dispatcher

I have discussed the features of the SAP Web Dispatcher before.
The need for a SAP Web Dispatcher in a SAP landscape is clear and even more appropriate in a cloud deployment of SAP.
Just like Azure Application Gateway, the SAP Web Dispatcher’s context should be limited to a single region. This is especially true because it is IaaS, which means the VMs on which the Web Dispatcher is deployed, are themselves bound to a specific region.

However, what is not clear is how disparate Web Dispatcher systems (i.e. different SAPGLOBALHOST values) can be used in different regions to correctly load balance. This is not the same as a single system with different instances in different regions!

How It All Hangs Together

If we go back to the purpose of this post, I wanted to show how Azure Front Door could be used within the context of a SAP system deployment in Azure.

To help convey the idea, I’ve put together a simple diagram:

In the above diagram, you can see that the Azure Front Door service is used to balance inbound requests from a customer booking system, across multiple Azure regions, directly from the internet. This means that Azure Front Door is most definitely suited as a global customer facing load balancer.
An example scenario is a 2 (or more) region architecture with primary region and disaster recovery region. If the primary region for our customer booking system is unavailable, a DR could be invoked and customers could be routed to the DR region, allowing customer bookings to be taken.

In the diagram, traffic routed from Azure Front Door, is then (for the sake of example) routed through Azure Application Gateway. This is just for example, but in reality it’s not really needed. It could be that you have a real mixture of SAP and non-SAP in some converged sub-domain, and it may be easier to load balance this mix of URLs at this level.
The main point at this point is, you are committed to returning data from a single region.

In our example diagram, the Azure Application Gateway then routes traffic to the SAP Web Dispatcher, which load balances the traffic over the back-end SAP ECC system available application servers using the ABAP stack message server (a feature that is not easily replicated in any other load balancer).

Where Does Azure Traffic Manager Sit?

The Azure Traffic Manager service is a DNS based routing and distribution service. If your company is a multinational conglomerate with a latency sensitive web based customer service, then the Azure Traffic Manager can be used to route customers to their nearest region, where you have your web service hosted and where they can potentially get the speedist and most appropriate content.
If you have only 2 or 3 regions, do not have latency issues and have no need to provide region specific content, then Azure Front Door is probably what you need.

Summary:

I’ve tried to show how the Azure Front Door service can provide your internet sourced, customer entry point into your multi-region web service.
The diagram I’ve provided hopefully shows how Azure Front Door can be distinguished from other similar technologies in a SAP landscape including how Azure Application Gateway could also be in the mix (although rare).
Finally I discuss how Azure Traffic Manager may not always be appropriate for load distribution.

Useful Links

SAP Web Dispatcher Reverse Proxy Features

If you read through any SAP documentation, you may be forgiven for thinking that the SAP Web Dispatcher is just a reverse HTTP proxy.
It can be located in front of a SAP WebAS and balance the load.
Therefore, it is a simple reverse proxy, right?

In this post, I am going to highlight some of the core features of the SAP Web Dispatcher, so that you may understand its strengths in comparison with other solutions such as Azure Application Gateway and even Azure Front-Door.

Heavily Engineered

There’s a common misconception that SAP is just another piece of software using an array of different components lumped together with some bits of Open Source. In some small cases this may be true of acquired software.
However, the core SAP software offerings are actually far more coherent and intricately linked than you may first imagine.

Ask any Oracle EBS administrator about their software stack and you will be impressed at how well the SAP software stack has been engineered.
This is especially true for the lower SAP Kernel level software components. The older parts of the software stack, are reused so often because of their robustness.

3 Routing Principals

The main thing to remember is that the SAP Web Dispatcher can route requests according to 3 main principles:

  1. Capability
    Is the desired target URL path served by the configured target back-end system(s).

  2. Availability
    Is the desired target URL path served by a configured back-end system that is available (i.e. not in maintenance mode).

  3. Capacity
    Are there more than one target back-end servers capable of handling the request and which one has more capacity.

Load Balancing Act

The SAP Web Dispatcher takes the HTTP/S request from the end-user and as part of the routing determination it analyses the target back-end system load.
It’s actually continually aware of the back-end systems.

There’s a great picture here, which highlights the load balancing methods used for the different types of SAP back-end: https://help.sap.com/viewer/683d6a1797a34730a6e005d1e8de6f22/7.40.18/en-US/4899c3d999273987e10000000a421937.html

What is not mentioned on the help.sap.com page linked above, is target back-end systems configured as “EXTSRV” (non-SAP routing) and also the “flat-file” routing method (very rarely used – at least, I’ve not used it).

The “EXTSRV” back-end systems will use basic round-robin to distribute the request between a comma separated list of target servers. Sticky-ness is achieved through HTTP headers, allowing the Web Dispatcher to determine which back-end system it routed your previous request to.

Even though “EXTSRV” is really designed for non-SAP back-ends, I have used “EXTSRV” for SAP systems, especially when using the SAP Web Dispatcher to avoid issues for system-to-system communications and wanting to avoid CORS issues (see CORS in a SAP Netweaver Landscape).

The “flat-file” method simply uses a static text file as a kind of false load response from a Message Server. The flat-file can be generated by anything and the Web Dispatcher configuration is then defined to route to whatever is in the flat-file.

Back-End

Apart from “EXTSRV” and “flat-file”, all other routing mechanisms use SAP proprietary methods to determine the back-end system load.
As you can see in the SAP Help page link referenced above, the SAP Web Dispatcher knows about the back-end because in the SAP Web Dispatcher configuration, we tell it what it is going to be routing to.

As an example, ABAP back-end systems are added to the Web Dispatcher profile file with the ABAP Message Server described in the configuration.
The Web Dispatcher connects to the target system’s Message Server and says “hello”.
Once connected, the SAP Web Dispatcher retrieves the list of URLs that are provided by the ABAP back-end system, the servers that are served by the Message Server and the relative load of those servers.
All of this information is used during the routing determination.

Protocols

The Web Dispatcher can handle HTTP 1.0, 1.1 and 2.0 (HTTP/2) protocols delivered over TLS (SSL).

Since Kernel 7.49, HTTP/2 has been supported in the Web Dispatcher and also in the ABAP Netweaver stack. This is significant for the latest HTTP based SAP UX known as SAP Fiori. The use of HTTP/2 allows request multiplexing over a single continuous TCP connection, reducing latency and increasing throughput.

NOTE: There are some great SAP blogs out on there on how and why to enable HTTP/2 for Fiori!

For many years now, the SAP Web Dispatcher has supported the Web Socket protocol.
The Web Socket protocol allows developers to utilise push-notifications and provide a more real-time interactive experience for HTML 5.0 content.
Bringing a closer level of integration with the consuming Web Browser.

Security

Some of the more complex uses of the SAP Web Dispatcher involve specific security scenarios.

One such scenario that comes to mind, is Principal Propagation, which can use the Web Dispatcher to front a set of common back-end systems.
The whole premise of Principal Propagation, is that the iDP (identity provider) is “impersonating” the authenticated user, by issuing a generated certificate of authenticity to the target system, on behalf of the user.
With a reverse proxy between the Web Browser and the target HTTP service, things can become complex because the generated X.509 client certificate can become consumed by the proxy server, instead of being forwarded to the target HTTP server.
To prevent the certificate from being interpreted in the wrong way, the SAP Web Dispatcher can be configured to shift the client certificate out to a predefined HTTP header., allowing a kind of X.509 client certificate forwarding.
(More information can be found here: Principal Propagation with SAP Cloud Platform).

Update Aug-2020: As pointed out by a reader, the SAP Web Dispatcher is also capable of reverse invocation. This is an added security feature which allows the target SAP system to open the connection to the SAP Web Dispatcher (instead of the other way around). The SAP Web Dispatcher then uses this open connection channel to send load balanced requests back to the target SAP system. The Reverse Invoke feature is obviously meant for scenarios where the Web Dispatcher exists in a separate network segment (DMZ) to the target SAP system, meaning you only need to open the firewall in the outbound (from the target SAP system) direction.
(Details here: https://help.sap.com/doc/7b196aab728810148a4b1a83b0e91070/1511%20000/en-US/frameset.htm)

Manageability

There’s nothing I like about trying to trace a HTTP call through a proxy server.
The SAP Web Dispatcher comes with it’s own secure administration page from where an administrator can enable advanced tracing capabilities.

The SAP Web Dispatcher makes it much easier to trace requests and responses, with the ability to show the complete unencrypted trace of SSL encrypted sessions (not using pass-through encryption).

The trace is able to show the exact ABAP work process number that processed the request in the target back-end system.

An administrator is able to move individual back-end systems into “maintenance mode” and provide a custom HTTP 503 (service unavailable) message, without affecting the other back-end systems serviced by the same Web Dispatcher.

The SAP Web Dispatcher comes with a vast array of configuration parameters to hone the characteristics of the service you are trying to deliver.
As an example, parameter “wdisp/handle_webdisp_ap_header” can be set to allow the Web Dispatcher to add additional HTTP headers to the request, thereby informing the target back-end system of the Web Dispatcher forward-facing TCP ports. This feature allows the target back-end systems to correctly rewrite HTML links and referral URLs, with the ports on which the SAP Web Dispatcher is listening for requests.
This is just one example of where the back-end SAP system is actually aware that it is being called via a SAP Web Dispatcher.

The Future

With the seemingly constant evolution of cloud based services, what do I imagine the future is for the SAP Web Dispatcher?
In my opinion it is here for another few years yet. The feature list is too specific to SAP landscapes for any real profit to be made by a competitive product.
However, what we may see in this hyper-competitive race for cloud adoption, is the use of a SaaS based version of SAP Web Dispatcher, provided for by the major cloud providers.
Right now, a SAP Web Dispatcher consumes far too much cost/resources/effort than it needs to. Therefore, a simple button click and subsequent configuration in something like the Azure Portal, would be a great saving and more importantly, a great incentive to potential cloud customers with SAP landscapes.

Summary

In this short article, we have discussed how the robust engineering of the SAP Web Dispatcher makes it the ideal front-end reverse proxy for the back-end systems of a SAP landscape.

In fact, in some situations it is the only possibility due to the way the Web Dispatcher is acutely SAP back-end aware, with many features built for native SAP compatibility.

Conversely we’ve seen how, in some situations, the back-end system is actually aware of the presence of the SAP Web Dispatcher and can rewrite HTML links and referral URLs accordingly.

We know the latest HTTP/2 protocol is supported and that this is in line with SAP’s goal of having Fiori as the future SAP presentation layer.

We discussed the extensive tracing capabilities, helping SAP administrators to diagnose complex HTTP connectivity, and authentication issues.

We can conclude that, SAP Web Dispatcher is not just a simple reverse proxy and its use within your SAP landscape is more than likely going to be beneficial in some way or another.
The SAP Web Dispatcher will be with us for a while longer.

References:

SAP ASE Error – Process No Longer Found After Startup

This post is about a strange issue I was hitting during the configuration of SAP LaMa 3.0 to start/stop a SAP ABAP 7.52 system (with Kernel 7.53) that was running with a SAP ASE 16.0 database.

During the LaMa start task, the task would fail with an error message: “ASE process no longer found after startup. (fault code: 127)“.

When I logged directly onto the SAP server Linux host, I could see that the database had indeed started up, eventually.
So what was causing the failure?

The Investigation

At first I thought this was related to the Kernel, but having checked the versions of the Kernel components, I found that they were the same as another SAP system that was starting up perfectly fine using the exact same LaMa system.

The next check I did was to turn on tracing on the hostagent itself. This is a simple task of putting the trace value to “3” in the host_profile of the hostagent and restarting it:

service/trace = 3

The trace output is shown in a number of different trace files in the work directory of the hostagent but the trace file we were interested in is called dev_sapdbctrl.

The developer trace file for the sapdbctrl binary executable is important, because the sapdbctrl binary is executed by SAP hostagent (saphostexec) to perform the database start. If you observe the contents of the sapdbctrl trace output, you will see that it loads the Sybase specific shared library which contains the required code to start/stop the ASE database.

The same sapdbctrl also contains the ability to load the required libraries for other database systems.

As a side note, it is still not known to me, how the Sybase shared library comes to exist in the hostagent executable directory. When SAP ASE is patched, this library must also be patched, otherwise how does the hostagent stay in-step with the ASE database that it needs to talk with?

Once tracing was turned on, I shut the SAP ASE instance down again and then used SAP LaMa to initiate the SAP system start once again.
Sure enough, the LaMa start task failed again.

Looking in the trace file dev_sapdbctrl I could see the same error message that I was seeing in SAP LaMa:

Error: Command execution failed. : ASE process no longer found after startup. 
(fault code: 127) Operation ID: 000D3A3862631EEAAEDDA232BE624001
----- Log messages ---- 
Info: saphostcontrol: Executing StartDatabase 
Error: sapdbctrl: ASE process no longer found after startup. 
Error: saphostcontrol: StartDatabase failed (sapdbctrl exit code = 1)

This was great. It confirmed that SAP LaMa was just seeing the symptom of some other issue, since LaMa just calls the hostagent to do the start.

Now I knew the hostagent was seeing the error, I tried using the hostagent directly to perform the start, using the following:

/usr/sap/hostctrl/exe/saphostctrl -debug -function StartDatabase -dbname <SID> -dbtype syb -dbhost <the-ASE-host>

NOTE: The hostagent “-debug” command line option puts out the same information without the need for the hostagent tracing to be turned on in the host_profile.

Once again, the start process failed and the same error message was present in the dev_sapdbctrl trace file.

This was now really strange.
I decided that the next course of action was to start the process of raising the issue with SAP via an incident.
If you suspect that something could take a while to fix, then it’s always best to raise it with SAP early and continue to look at the issue in parallel.

Continuing the Diagnosis

While the SAP incident was in progress, I continued the process of trying to self-diagnose the issue further.
I tried a couple more things such as:

  • Starting and stopping SAP ASE manually using stopdb/startdb commands.
  • Restarting the whole server (this step has a place in every troubleshooting process, eventually).
  • Checking the server patch level.
  • Checking the environment of the Linux user, the shell, the profile files, the O/S limits applied.
  • Checking what happens if McAfee anti-virus was disabled (I’ve seen the ePO blocking processes before).

Eventually exhaustion set in and I decided to give the SAP support processor time to get back to me with some hints.

Some Sleep

I spend a lot of time solving SAP problems. A lot of time.
Something doesn’t work according to the docs, something did work but has stopped working, something has never worked well…
It builds up in your mind and you carry this stuff around in your head.
Subconsciously you think about these problems.

Then, at about 3am when you can’t get back to sleep, you have a revelation.
The hostagent is forking the process to start the database as the syb<sid> Linux user (it uses “su”), from the root user (hostagent runs as the root user).

Linux Domain Users

The revelation I had regarding the forking of the user, was just the trigger I needed to make me consider the way the Linux authentication was setup on this specific server with the problem ASE startup.

I remembered at the beginning of the project that I had hit an issue with the SSSD Linux daemon, which is responsible for interfacing between Linux and Microsoft Active Directory. At that time, the issue was causing the hostagent to hang when operations were executed which required a switch to another Linux user.
This previous issue was actually a hostagent issue that was fixed in a later hostagent patch. During that particular investigation, I requested that the Linux team re-configure the SSSD daemon to be more efficient with its Active Directory traversals, when it was looking to see if the desired user account was local to Linux or if it was a domain account.

With this previous issue in mind, I checked the SSSD configuration on the problem server. This is a simple conf file in /etc/sssd.

The Solution

After all the troubleshooting, the raising of the incident, the sleeping, I had finally got to the solution.

After checking the SSSD daemon configuration file /etc/sssd/sssd.conf, I could clearly see that there was one entry missing compared to the other servers that didn’t experience the SAP ASE start error.

The parameter: “subdomain_enumerate = none” was missing.
Looking at the manual page for SSSD it would seem that without this parameter there is additional overhead during any Active Directory traversal.

I set the parameter accordingly in the /etc/sssd/sssd.conf file and restarted the SSSD daemon using:

service sssd restart

Then I retried the start of the database using the hostagent command shown previously.
It worked!
I then retried with SAP LaMa and that also now started ASE without error messages.

Root Cause

What it seems was happening, was some sort of internal pre-set timeout in the sapdbctrl binary, when hit, the sapdbctrl just abandons and throws the error that I was seeing. This leaves the ASE database to continue and start (the process was initiated), but in the hostagent it looked like it had failed to start.
By adding the “subdomain_enumerate = none” parameter, any “delay”, caused by inappropriate call to Active Directory was massively reduced and subsequent start activities were successful.