I have never been fully satisfied with the reference architecture on the Microsoft site for running active-active SAP Web Dispatchers in an Azure IaaS platform.
Don’t get me wrong, from a high-level Azure perspective they represent what you will want to build. However, they lack the detail needed, forcing you to think about the solution more than I feel you should have to.
To re-cap, in an Active-Active SAP Web Dispatcher in Azure, you rely on the inherent capabilities of the Azure Internal Load Balancer (ILB) to provide availability routing to the active-working VMs that are running the Web Dispatcher instances.
To help you (and me) understand what needs to be configured, I’ve put together what I feel is a pretty good low-level architecture diagram.
It’s almost a version two to SAP on Azure Web Dispatcher High Availability.
Show Us the Picture or It Never Happened!
Below is the diagram that I have created.
There is quite a lot of detail in it, and also quite a lot of detail that is not included (required O/S parameters, instance parameters, configuration for the network layer, etc.). It is really not as simple as you might first imagine to include all the required detail in one picture.
It Happened, Now Explain Please
If we look at the diagram above, we can easily see that WD1 is the SAP system name, with two instances of WD1, both with an instance number of 98, but installed against two virtual hostnames: sapwd1 and sapwd2.
Could we have installed on the server hostname directly? Yes, we could have. But that is not in line with the SAP Adaptive Computing design principle of separating the SAP instance from the host.
Notice that we have a Highly Available NFS share that hosts our SAP system instance profile files and a single shared PSE (SAPSSLS.pse).
We don’t show that this could be an HA fileshare, NetApp or some other technology, but please use your imagination here. For production use, the Azure Files service is not currently supported.
Our ILB is configured to accept HTTP and HTTPS connections on their native ports (80 and 443) and routes these through to ports 8098 and 44398, which the Web Dispatchers are configured to listen on. You can configure whatever ports you want, but ultimately, having separately addressable back-end ports allows you to re-use the SSL port for Web Dispatcher administration and lock down access to a DMZ-hosted jump box (definitely not on the diagram).
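As an illustrative sketch only (the port values come from the diagram, but the parameter index numbers and any timeouts are assumptions), the listener side of this port layout would sit in the Web Dispatcher instance profile something like this:

```
# HTTP and HTTPS listeners on the instance-specific ports the ILB routes to
icm/server_port_0 = PROT=HTTP, PORT=8098
icm/server_port_1 = PROT=HTTPS, PORT=44398
```

The ILB load-balancing rules then map front-end port 80 to back-end 8098 and front-end 443 to back-end 44398.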
The ILB probes both back-end VM members on tcp/8098 to see if the Web Dispatcher is responding. It’s a basic TCP SYN check (can I open a connection? Yes? OK). For a better check, you can use an HTTP health probe on tcp/8098, which allows you to set the Web Dispatcher to “maintenance” mode. This causes an HTTP “service unavailable” response to be returned to the ILB probe, which removes that particular Web Dispatcher from the ILB routing. If you also followed the other suggestion of accessing the admin page via the virtual hostname on port 44398, then an administrator would still have admin control for maintenance purposes. Nice.
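To make the probe behaviour concrete, here is a minimal Python sketch (my own illustration, not Azure code) of the decision an HTTP health probe makes: only an HTTP 200 keeps a back end in the pool, so the 503 returned in maintenance mode removes it from routing.

```python
# Sketch of the Azure ILB HTTP health probe decision: a back end is
# considered healthy only when the probe URL answers HTTP 200.
def ilb_considers_healthy(status_code: int) -> bool:
    """Return True only for HTTP 200, mirroring the ILB probe rule."""
    return status_code == 200

# Simulated responses from the two Web Dispatcher states described above:
print(ilb_considers_healthy(200))  # normal operation -> stays in the pool
print(ilb_considers_healthy(503))  # maintenance mode -> removed from routing
```

The TCP SYN probe, by contrast, only confirms a listener exists, which is why it cannot distinguish a working Web Dispatcher from one in maintenance mode.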
We have a SAN-enabled SSL certificate inside our shared PSE, with three names in its Subject Alternative Name list: one for the ILB “host” name (sapwd) and one for each of the virtual hostnames against which we have installed the Web Dispatcher instances (sapwd1 and sapwd2).
Our “icm/host_name_full” parameter sets both Web Dispatchers to think that they are accessed through sapwd.corp.net. However, we have to be careful not to use EXTBIND in this particular case, because we do not have the IP address of the ILB bound onto the servers (although my post on adding a secondary IP address to the loopback device shows how this is possible and why you may want to do it).
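Tying the two instances together under the one external identity might look like this in the instance profile (the PSE parameter name and path are assumptions based on the shared /sapmnt/&lt;SID&gt; area described earlier; check them against your release):

```
# Both instances present themselves under the ILB "host" name
icm/host_name_full = sapwd.corp.net
# Shared PSE on the HA NFS share, so both instances serve the same SAN certificate
ssl/server_pse = /sapmnt/WD1/sec/SAPSSLS.pse
```

Because both instances read the same PSE, a certificate renewal is done once and picked up by both Web Dispatchers.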
How Do We Cater for DR?
Because a Web Dispatcher VM does not have high disk I/O throughput, it is a perfect candidate for protection by Azure Site Recovery (ASR).
This means the VM is replicated across to the Azure DR region (the region of your choice).
But wait, we’re only replicating one VM! Yes: we don’t need to pay for both, since a cost-optimised approach is to simply re-install the second Web Dispatcher after a DR failover.
We have a dependency on some sort of NFS share replication to exist in the DR region, but it doesn’t necessarily need to be fancy in the case of the SAP Web Dispatcher, because very little will be changing on the /sapmnt/<SID> area.
NOTE: The replicated VM is not accessible until a failover is instigated.
What Happens In a Failover to DR
In a DR scenario, the decision to failover to the DR region is manual.
The decision maker can choose to failover as soon as the primary region is unavailable, or they can choose to wait to see if the primary region is quickly recovered by Microsoft.
I’ve put together a diagram of how this could affect our simple HA Web Dispatcher setup:
The decision to failover should not be taken lightly, because it will take a lot of effort to fail back (especially for databases).
Generally the recommendation from Microsoft is to use an Azure Automation Runbook to execute a pre-defined script of tasks.
In our case, the runbook will create the ILB in front of the replicated Web Dispatcher VM and add that VM to the ILB back-end pool.
Our runbook will also then add secondary IP addresses to the VM and finally adjust DNS for all our hostnames and virtual host names, assigning the new IP addresses to the DNS records.
Once our Web Dispatcher is online and working, we could choose to then build out a further VM and add it into the ILB back-end pool, depending on how long we think we will be living in the DR region.
Did we successfully include more detail in the architecture diagram? Yes we sure did!
Was it all the detail? No. There’s a lot that I have not included still.
Will I be enhancing this diagram? Probably; I hate leaving holes.
I’ve shown above how an active-active SAP Web Dispatcher architecture can work in Azure and how it could be set up for DR protection.
We also briefly touched on some good points about separation of administration traffic, using a HTTP health probe for an ILB aware Web Dispatcher maintenance capability, and how the SSL setup uses a SAN certificate.
Would this diagram be more complicated by having an active-active HA Web Dispatcher alongside an ASCS or SCS? Yes, it gets more complicated, but there are some great features in the ILB that allow simplification of the rules, letting you use the ILB for more than one purpose and saving cost.
Update Jun-2020: This duplicate Web Dispatcher architecture is known in SAP as “Parallel Web Dispatcher” and a basic description is visible here: https://help.sap.com/viewer/683d6a1797a34730a6e005d1e8de6f22/1709%20002/en-US/489a9a6b48c673e8e10000000a42189b.html
Update Mar-2021: Some of you have asked how the “Maintenance Mode” activation works with the ILB. Quite simply, the Web Dispatcher returns an HTTP 503 when maintenance mode is enabled.
By default the ILB health probe URL will be “http://&lt;the-vm&gt;:&lt;your-port&gt;/”, but if you don’t have a back-end service allocated to “/” then you will constantly get an HTTP 404. You need to adjust the URL to an actual working location based on the config of your back-end systems.
If you don’t want the health probe to make a call to an actual back-end system (during the health probe ping), then use parameter “icm/HTTP/file_access_&lt;xx&gt;” to define a custom local directory and place a blank file called “health.htm” in it. Then just adjust the health probe URL with the path to the “health.htm”, and the health probe pings will never call a back-end system URL. It also means that you can touch or remove the health.htm to permit or prevent the ILB using that specific Web Dispatcher.
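As a hedged example (the index number, URL prefix and DOCROOT path here are all my own assumptions, not values from the original setup), the local file access could be configured like this:

```
# Serve static files from a local directory; no back-end system is called
icm/HTTP/file_access_0 = PREFIX=/health, DOCROOT=/usr/sap/WD1/W98/healthcheck
```

With an empty health.htm placed in that DOCROOT, the ILB probe URL becomes http://&lt;the-vm&gt;:8098/health/health.htm; removing the file makes the probe receive an HTTP 404, taking that Web Dispatcher out of the ILB rotation.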