This blog contains experience gained over the years of implementing (and de-implementing) large scale IT applications/software.

HowTo: Find the Datacentre Region and Physical Host of your Azure VM

With VMs hosted in Azure, you need to strike a fine balance between protection from hardware failure on the underlying Azure platform and the performance gained from having the tiers of your SAP application physically close together.

For this very purpose, Microsoft introduced Proximity Placement Groups (PPGs) to allow an administrator to ensure that specific tiers (e.g. application and database) are located close together, potentially even in the same server rack.
The PPGs also affect the location of the storage assigned to the VMs, although the storage infrastructure is actually transparent to administrators.

The PPGs still allow Azure to honor the Availability Sets, Fault Domains and Update Domains.

In this post, I show a method of finding the physical hostname of your Linux VM which could be part of a check before/after implementing a PPG.
NOTE: PPGs should be created at the time a VM is created, and assigned to the “lead” system of the rarest size. For example, an M-series VM is rare, so it should be the lead system when creating the PPG. This will anchor the other VMs to this M-series VM’s location.

A separate post shows how to do this for a Windows VM.
On a Linux VM in Azure, as any Linux user, you can use the following to see the name of the physical host on which your VM is running:

awk -F 'H' '{ sub(/ostName/,"",$2); print $2 }' /var/lib/hyperv/.kvp_pool_3

Example output: DUB012345678910

In this case, we take the first 3 characters to be “Dublin”, which corresponds to the North Europe Azure region.
The remaining characters consist of the rack and physical hostname.
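
As a rough illustration, you can split the output into these two parts (a sketch; tr strips out the hidden characters discussed in the update below):

awk -F 'H' '{ sub(/ostName/,"",$2); print $2 }' /var/lib/hyperv/.kvp_pool_3 | tr -cd '[:print:]' | cut -c1-3
awk -F 'H' '{ sub(/ostName/,"",$2); print $2 }' /var/lib/hyperv/.kvp_pool_3 | tr -cd '[:print:]' | cut -c4-

The first command returns the location code (e.g. DUB), the second returns the rack and physical hostname portion.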

If you have 2 VMs in the same rack on the same physical host, then you will have minimal latency for networking between them.

Conversely, if you have 2 VMs on the same physical host, you are exposed to high availability issues: a failure of that one host takes out both VMs.

Therefore, you need a good balance for SAP.
You should expect to see SAP S/4HANA application servers and HANA DBs in the same Proximity Placement Groups, within the same rack, even potentially on the same host; provided you have availability sets across the tiers, you will be safe.

Update: 23-Apr-2020
The output of the above script contains hidden (non-printable) characters. To get it cleanly into a bash variable, we can use the following:

awk '{ gsub(/[^[:print:]]/,""); split ($0,a,"H"); sub(/ostName/,"",a[2]); print a[2]}' /var/lib/hyperv/.kvp_pool_3
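
For example, capturing the cleaned output into a variable (PHYS_HOST is just an illustrative name):

PHYS_HOST=$(awk '{ gsub(/[^[:print:]]/,""); split ($0,a,"H"); sub(/ostName/,"",a[2]); print a[2]}' /var/lib/hyperv/.kvp_pool_3)
echo "${PHYS_HOST}"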

Update: 05-Oct-2020
I have since found that there is another location where the above information can also be found.
Depending on your Linux O/S, you may also find the physical server name in the network scripts as follows:

grep BOOTSERVERNAME /var/run/netconfig/eth0/netconfig0

The above will return something like:
BOOTSERVERNAME='AMS072nnnnnnnnn'
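
If you want just the value, here is a sketch assuming the single-quoted format shown above:

PHYS_HOST=$(grep BOOTSERVERNAME /var/run/netconfig/eth0/netconfig0 | cut -d"'" -f2)
echo "${PHYS_HOST}"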

SUSE Cloud-Netconfig and Azure VMs – Dynamic Network Configuration

What is SUSE Cloud-Netconfig:
Within the SUSE SLES 12 (and openSUSE) operating system lies a piece of functionality called Cloud-Netconfig.
It is provided as part of the System/Management group of packages.

The Cloud-Netconfig software consists of a set of shell functions and init scripts that are responsible for control of the network interfaces on the SUSE VM when running inside of a cloud framework such as Microsoft Azure.
The core code is part of the SUSE-Enceladus project (code & documents for use with public cloud) and hosted on GitHub here: https://github.com/SUSE-Enceladus/cloud-netconfig.
Cloud-Netconfig requires the sysconfig-netconfig package, as it essentially provides a netconfig module.
Upon installation, the Cloud-Netconfig module is prepended to the front of the netconfig module list like this: NETCONFIG_MODULES_ORDER="cloud-netconfig dns-resolver dns-bind dns-dnsmasq nis ntp-runtime".
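
You can confirm the module order on your own system like so:

grep NETCONFIG_MODULES_ORDER /etc/sysconfig/network/config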

What Cloud-Netconfig does:
As with every public cloud platform, a deployed VM is allocated and booted with its network configuration provided by the cloud platform, from outside of the VM.
In order to provide the usual networking devices and modules inside the VM with the required configuration information, the VM must know about its environment and be able to make a call out to the cloud platform.
This is where Cloud-Netconfig does its work.
The Cloud-Netconfig code will be called at boot time from the standard SUSE Linux init process (systemd).
It has the ability to detect the cloud platform that it is running within and make the necessary calls to obtain the networking configuration.
Once it has the configuration, this is persisted into the usual network configuration files inside the /etc/sysconfig/network/scripts and /etc/netconfig.d/cloud-netconfig locations.
The configuration files are then used by the wicked service to adjust the networking configuration of the VM accordingly.

What information does Cloud-Netconfig obtain:
Cloud-Netconfig has the ability to influence the following aspects of networking inside the VM.
– DHCP.
– DNS.
– IPv4.
– IPv6.
– Hostname.
– MAC address.

All of the above information is obtained and can be persisted and updated accordingly.

What is the impact of changing the networking configuration of a VM in Azure Portal:
Changing the configuration of the SUSE VM within Azure (for example: changing the DNS server list), will trigger an update inside the VM via the Cloud-Netconfig module.
This happens because Cloud-Netconfig is able to poll the Azure VM Instance metadata service (see my previous blog post on the Azure VM Instance metadata service).
If the information has changed since the last poll, then the networking changes are instigated.
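
You can query the same instance metadata service manually from inside the VM; for example (the api-version shown is just one known-valid version):

curl -s -H "Metadata: true" ""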

What happens if a network interface is to remain static:
If you wish for Cloud-Netconfig to not manage a network interface, there is a capability to disable its management of that interface.
Simply adjust the interface’s configuration file in /etc/sysconfig/network and set the variable CLOUD_NETCONFIG_MANAGE=no.
This will prevent future adjustments to this network interface.
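
As a minimal sketch, assuming a hypothetical second interface eth1, the file /etc/sysconfig/network/ifcfg-eth1 might look like this:

BOOTPROTO='dhcp'
STARTMODE='auto'
# stop cloud-netconfig from managing this interface
CLOUD_NETCONFIG_MANAGE='no'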

How does Cloud-Netconfig interact with Wicked:
SUSE SLES 12 uses the Wicked network manager.
The Cloud-Netconfig scripts adjust the network configuration files in the /etc/sysconfig/network/scripts location, which are then detected by Wicked and the necessary adjustments made (e.g. interfaces brought online, IP addresses assigned or DNS server lists updated).
As soon as the network configuration files have been written by Cloud-Netconfig, this is where the interaction ends.
From this point the usual netconfig services take over (wicked and nanny – for detecting the carrier on the interface).
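
To see what wicked has done with a given interface (eth0 used as an example), you can query its status and effective configuration:

wicked ifstatus eth0
wicked show-config eth0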

What happens in the event of a VM primary IP address change:
If the primary IP address of the VM is adjusted in Azure, then the same process as before takes place.
The interface is brought down and then brought back up again by wicked.
This means that in an Azure Site Recovery replicated VM, should you activate the replica, the VM will boot and Cloud-Netconfig will automatically adjust the network configuration to that provided by Azure, even though this VM only contained the config for the previous hosting location (region or zone).
This significantly speeds up your failover process during a DR situation.

Are there any issues with this dynamic network config capability:
Yes, I have seen a number of issues.
In SLES 12 SP3 I have seen issues whereby a delay in the provision of the Azure VM Instance metadata during the boot cycle has caused the VM to lose sight of any secondary IP addresses assigned to the VM in Azure.
On tracing, the problem seemed to originate from slowness in the full startup of the Azure Linux agent, possibly due to boot diagnostics being enabled. At the time of writing, a SLES patch for this fix was still awaited.
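
When diagnosing the missing secondary IP addresses, a quick check is to compare what the VM actually has against what Azure says it should have (eth0 as an example):

ip -4 addr show eth0
curl -s -H "Metadata: true" ""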

I have also seen a “problem” whereby an incorrect entry inside the /etc/hosts file can cause the reconfiguration of the VM’s hostname.
Quite surprising. This caused issues in other custom SAP deployment scripts, which relied on the hostname following a specific intelligent naming convention; instead, it was being changed to a temporary hostname for resolution during an installation of SAP using the Software Provisioning Manager.

How can I debug the Cloud-Netconfig scripts:
According to the manuals, debug logging can be enabled through the standard DEBUG="yes" and WICKED_DEBUG="all" variables in the config file /etc/sysconfig/network/config.
However, casting an eye over the scripts and functions inside of the Cloud-Netconfig module, these settings don’t seem to be picked up, and insufficient logging is produced, especially around the polling of the Azure VM Instance metadata service.
I found that when debugging I actually had to resort to adjusting the function script functions.cloud-netconfig.
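
For reference, the documented (if seemingly ineffective) settings go into /etc/sysconfig/network/config, after which wicked’s own logging can be inspected via the journal:

# in /etc/sysconfig/network/config
DEBUG="yes"
WICKED_DEBUG="all"

journalctl -b -u wickedd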

Additional information:
https://www.suse.com/c/multi-nic-cloud-netconfig-ec2-azure/
https://www.suse.com/documentation/sles-12/singlehtml/book_sle_admin/book_sle_admin.html
https://github.com/SUSE-Enceladus/cloud-netconfig
https://www.suse.com/media/presentation/wicked.pdf
https://github.com/openSUSE/wicked

Recovery From: Operation start is not allowed on VM since the VM is generalized – Linux

Scenario: In Azure you had a Linux virtual machine. In the Azure portal you clicked the “Capture” button in the Portal view of your Linux virtual machine, and now you are unable to start the virtual machine because you get the Azure error: “Operation ‘start’ is not allowed on VM ‘abcd’ since the VM is generalized.”

What this error/information prompt is telling you is that the “Capture” button actually creates a generic image of your virtual machine, meaning it is effectively a template that can be used to create new VMs.
The process that is applied to your original VM modifies it in such a way that it is no longer able to boot up normally. On Windows this process is called “sysprep”; on Linux the equivalent generalization is performed by the Azure Linux agent (waagent deprovisioning).

Can you recover your original VM? No. It’s not possible to recover it properly using the Azure Portal capabilities. You could do it if you downloaded the disk image, but there’s no need.
Plus, there is no telling what changes have been made to the O/S that might affect your applications that have been installed.

It’s possible for you to create a new VM from your captured image, or even to use your old VM’s O/S disk to create a new VM.
However, both of the above mean you will have a new VM. Like I said, who knows what changes could have been introduced by the generalization process. Maybe it’s better to rebuild…

Because the disk files are still present you can rescue your data and look at the original O/S disk files.
Here’s how I did it.

I’m starting from the point of holding my head in my hands after clicking “Capture“!
The next steps I took were:

– Delete your original VM (just the VM).  The disk files will remain, but at least you can create a new VM of the same name (I liked the original name).

– Create a new Linux VM, same as you did for the one you’ve just lost.
Use the same install image if possible.

– Within the properties of your new VM, go to the “Disks” area.

– Click to add a new data disk.
We will then be able to attach the existing O/S disk to the virtual machine (you will need to find it in the list).
You can add other data disks from the old VM if you need to.

Once your disks are attached to your new Linux VM, you just need to mount them up.
For my specific scenario, I could see that the root partition “/” on my new Linux VM was of type “ext4” (check the output of the ‘df -h’ command).
This means that my old VM’s root partition format would have also been ext4.
Therefore I just needed to find and mount the new disk in the O/S of my new VM.
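
Once you know the device name (found in the next step), you can verify the filesystem type directly rather than assuming:

# blkid /dev/sdc2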

As root on the new Linux VM find the last disk device added:

# ls -ltr /dev/sd*

The last line is your old VM’s disk. Mine was device /dev/sdc; specifically, I needed partition 2 (the whole disk), so I chose /dev/sdc2.
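
Alternatively, lsblk gives a clear overview of disks, partitions, filesystem types and current mount points:

# lsblk -f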

Mount the disk:

# mkdir /old_vm
# mount -t ext4 /dev/sdc2 /old_vm

I could then access the disk and copy any required files/settings:

# cd /old_vm
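
From here it is ordinary file copying; for example (hypothetical paths):

# cp -a /old_vm/etc/fstab /root/old_vm_fstab
# cp -a /old_vm/home /data/rescue/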

Once completed, I unmounted the old O/S disk in the new Linux VM:

# umount /old_vm

Then, back in the Azure Portal in the disks area of the new VM (in Edit mode), I detached the old disk.

Once those disks are not owned by a VM anymore (you can see in the properties for the specific disk), then it’s safe to delete them.