
SUSE Cloud-Netconfig and Azure VMs – Dynamic Network Configuration

What is SUSE Cloud-Netconfig:
Within the SUSE SLES 12 (and OpenSUSE) operating system lies a piece of functionality called Cloud-Netconfig.
It is provided as part of the System/Management group of packages.

The Cloud-Netconfig software consists of a set of shell functions and init scripts that are responsible for control of the network interfaces on the SUSE VM when running inside of a cloud framework such as Microsoft Azure.
The core code is part of the SUSE-Enceladus project (code & documents for use with public cloud) and hosted on GitHub here: https://github.com/SUSE-Enceladus/cloud-netconfig.
Cloud-Netconfig requires the sysconfig-netconfig package, as it essentially provides a netconfig module.
Upon installation, the Cloud-Netconfig module is prepended to the front of the netconfig module list like this: NETCONFIG_MODULES_ORDER="cloud-netconfig dns-resolver dns-bind dns-dnsmasq nis ntp-runtime".
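
As a quick sanity check, the module order can be read back and inspected. The following is only a minimal sketch: the function name netconfig_modules is my own, and it assumes you pass in the text of the netconfig configuration file (on my systems that is /etc/sysconfig/network/config, but treat that as an assumption for yours).

```python
def netconfig_modules(config_text):
    """Return the module names from NETCONFIG_MODULES_ORDER, in order."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("NETCONFIG_MODULES_ORDER="):
            value = line.split("=", 1)[1]
            # Drop the surrounding shell quotes, then split on whitespace
            return value.strip().strip("\"'").split()
    return []


# Sample line copied from the text above (not read from a live system)
sample = 'NETCONFIG_MODULES_ORDER="cloud-netconfig dns-resolver dns-bind dns-dnsmasq nis ntp-runtime"'
modules = netconfig_modules(sample)
print(modules[0] == "cloud-netconfig")
```

If the first module is not cloud-netconfig, then the package has probably not been installed or the list has been edited by hand.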

What Cloud-Netconfig does:
As with every public cloud platform, a deployed VM is allocated and booted with the configuration for the networking provided by the cloud platform, outside of the VM.
In order to provide the usual networking devices and modules inside the VM with the required configuration information, the VM must know about its environment and be able to make a call out to the cloud platform.
This is where Cloud-Netconfig does its work.
The Cloud-Netconfig code will be called at boot time from the standard SUSE Linux init process (systemd).
It has the ability to detect the cloud platform that it is running within and make the necessary calls to obtain the networking configuration.
Once it has the configuration, it persists it into the usual network configuration files inside the /sysconfig/network/scripts and /netconfig.d/cloud-netconfig locations.
The configuration files are then used by the wicked service to adjust the networking configuration of the VM accordingly.

What information does Cloud-Netconfig obtain:
Cloud-Netconfig has the ability to influence the following aspects of networking inside the VM.
– DHCP.
– DNS.
– IPv4.
– IPv6.
– Hostname.
– MAC address.

All of the above information is obtained and can be persisted and updated accordingly.

What is the impact of changing the networking configuration of a VM in Azure Portal:
Changing the configuration of the SUSE VM within Azure (for example, changing the DNS server list) will trigger an update inside the VM via the Cloud-Netconfig module.
This happens because Cloud-Netconfig is able to poll the Azure VM Instance metadata service (see my previous blog post on the Azure VM Instance metadata service).
If the information has changed since the last poll, then the networking changes are instigated.
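
The "has it changed since the last poll" idea can be sketched in a few lines. This is not the Cloud-Netconfig implementation (which is shell); it is an illustration only, and the dictionary keys and the changed_settings name are hypothetical.

```python
def changed_settings(previous, current):
    """Return {key: (old, new)} for every setting that differs between polls."""
    keys = set(previous) | set(current)
    return {
        key: (previous.get(key), current.get(key))
        for key in sorted(keys)
        if previous.get(key) != current.get(key)
    }


# Hypothetical results of two consecutive polls of the metadata service
last_poll = {"ipv4": ["10.0.0.4"], "dns": ["168.63.129.16"]}
this_poll = {"ipv4": ["10.0.0.4", "10.0.0.5"], "dns": ["168.63.129.16"]}

print(changed_settings(last_poll, this_poll))
```

Only when the returned dictionary is non-empty would any reconfiguration need to be instigated.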

What happens if a network interface is to remain static:
If you do not want Cloud-Netconfig to manage a network interface, you can disable its management of that interface.
Simply adjust the interface’s network configuration file in /etc/sysconfig/network and set the variable CLOUD_NETCONFIG_MANAGE=no.
This will prevent future adjustments to this network interface.
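
When checking a number of VMs, it can help to script the test for this variable. A minimal sketch follows, assuming the interface configuration file uses simple shell-style NAME='value' lines and that an interface with no such variable is treated as managed (that default is my assumption). The function name is hypothetical.

```python
def is_managed_by_cloud_netconfig(ifcfg_text):
    """Return False only when CLOUD_NETCONFIG_MANAGE is explicitly set to no."""
    managed = True  # assumption: an absent variable means "managed"
    for line in ifcfg_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == "CLOUD_NETCONFIG_MANAGE":
            # As in the shell, the last assignment in the file wins
            managed = value.strip().strip("\"'").lower() != "no"
    return managed


# Hypothetical interface configuration content
sample_ifcfg = """BOOTPROTO='dhcp'
STARTMODE='auto'
CLOUD_NETCONFIG_MANAGE='no'
"""
print(is_managed_by_cloud_netconfig(sample_ifcfg))
```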

How does Cloud-Netconfig interact with Wicked:
SUSE SLES 12 uses the Wicked network manager.
The Cloud-Netconfig scripts adjust the network configuration files in the /sysconfig/network/scripts location; Wicked then detects the changes and makes the necessary adjustments (e.g. bringing interfaces online, assigning IP addresses or updating DNS server lists).
As soon as the network configuration files have been written by Cloud-Netconfig, this is where the interaction ends.
From this point the usual netconfig services take over (wicked and nanny – for detecting the carrier on the interface).

What happens in the event of a VM primary IP address change:
If the primary IP address of the VM is adjusted in Azure, then the same process as before takes place.
The interface is brought down and then brought back up again by wicked.
This means that in an Azure Site Recovery replicated VM, should you activate the replica, the VM will boot and Cloud-Netconfig will automatically adjust the network configuration to that provided by Azure, even though this VM only contained the config for the previous hosting location (region or zone).
This significantly speeds up your failover process during a DR situation.

Are there any issues with this dynamic network config capability:
Yes, I have seen a number of issues.
In SLES 12 sp3 I have seen issues whereby a delay in the provision of the Azure VM Instance metadata during the boot cycle has caused the VM to lose sight of any secondary IP addresses assigned to the VM in Azure.
On tracing, the problem seemed to originate from a slowness in the full startup of the Azure Linux agent, possibly due to boot diagnostics being enabled. I am still waiting on a SLES patch for this fix.

I have also seen a “problem” whereby an incorrect entry inside the /etc/hosts file can cause the reconfiguration of the VM’s hostname, which was quite surprising.
This caused issues with other custom SAP deployment scripts, as the hostname was relied on to follow a specific intelligent naming convention; instead, it was being changed to a temporary hostname used for resolution during an installation of SAP using the Software Provisioning Manager.

How can I debug the Cloud-Netconfig scripts:
According to the manuals, debug logging can be enabled through the standard DEBUG="yes" and WICKED_DEBUG="all" variables in the config file /etc/sysconfig/network/config.
However, casting an eye over the scripts and functions inside the Cloud-Netconfig module, these settings don’t seem to be picked up, and sufficient logging is not produced, especially around the polling of the Azure VM Instance metadata service.
I found that when debugging I had to resort to adjusting the function script functions.cloud-netconfig.

Additional information:
https://www.suse.com/c/multi-nic-cloud-netconfig-ec2-azure/
https://www.suse.com/documentation/sles-12/singlehtml/book_sle_admin/book_sle_admin.html
https://github.com/SUSE-Enceladus/cloud-netconfig
https://www.suse.com/media/presentation/wicked.pdf
https://github.com/openSUSE/wicked

Understand and Use the Azure Instance Metadata Service with SAP

In the below post, we will explore the Azure Instance Metadata service and how we can make use of the service when deploying our SAP landscape.

What Is the Azure Instance Metadata Service?

The Azure Metadata Service is a locally accessed (on each VM deployed in Azure), REST enabled, API versioned HTTP service endpoint that provides a gateway to the Azure “fabric” hosting your VMs.

New features are added through new versions of the API, accessed through the URI and by appending the required version as a querystring parameter.

What Can You Do With the Azure Instance Metadata Service?

A simple example would be to query the service to show the current VM size (Azure VM Size) from within the VM itself, without needing access to the Azure Portal or any Azure authorisation (e.g. Service Principals).

How Can You Query the Azure Instance Metadata Service?

Depending on whether you’re using Linux or Windows as your VM operating system, you can call the REST API for the Azure Instance Metadata Service using something similar to the following in Linux:

curl -H Metadata:true --noproxy "*" "http://169.254.169.254/metadata/instance?api-version=2019-06-01"

or in PowerShell 6.0+ on Windows (includes -NoProxy):

Invoke-RestMethod -Headers @{"Metadata"="true"} -Method GET -NoProxy -Uri "http://169.254.169.254/metadata/instance?api-version=2019-06-01"

or PowerShell <6.0 compatible (excludes -NoProxy):

Invoke-RestMethod -Headers @{"Metadata"="true"} -Method GET -Uri "http://169.254.169.254/metadata/instance?api-version=2019-06-01"

This will return a JSON string which, among other things, will contain the current VM size.
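
The same call can be made from Python, which is handy when the result needs further processing. This is a minimal sketch rather than a definitive client: the function names are my own, and the sample values (sapvm01, Standard_E16s_v3) are made up. The compute.vmSize field is the one that carries the VM size in the returned JSON.

```python
import json
import urllib.request

IMDS_URL = "http://169.254.169.254/metadata/instance?api-version=2019-06-01"


def fetch_instance_metadata(url=IMDS_URL, timeout=5):
    """Call the metadata service from inside an Azure VM and parse the JSON."""
    request = urllib.request.Request(url, headers={"Metadata": "true"})
    # An empty ProxyHandler bypasses any configured proxy, like curl --noproxy "*"
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def vm_size(metadata):
    """Pick the VM size out of the parsed instance metadata."""
    return metadata["compute"]["vmSize"]


# Made-up sample so the parsing can be tried without being on an Azure VM
sample = json.loads('{"compute": {"name": "sapvm01", "vmSize": "Standard_E16s_v3"}}')
print(vm_size(sample))
```

On a real VM you would call vm_size(fetch_instance_metadata()) instead of using the sample.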

You can use the querystring parameter “format=text” to get a raw text response:

169.254.169.254/metadata/instance?api-version=2017-08-01&format=text
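
Because the API version and the format are just querystring parameters, building the URL is easy to script. A small sketch, where imds_url is a hypothetical helper name of my own:

```python
from urllib.parse import urlencode

IMDS_BASE = "http://169.254.169.254/metadata"


def imds_url(path, api_version, response_format=None):
    """Build a metadata service URL with the version (and format) appended."""
    params = {"api-version": api_version}
    if response_format:
        params["format"] = response_format
    return "{}/{}?{}".format(IMDS_BASE, path.strip("/"), urlencode(params))


print(imds_url("instance", "2017-08-01", "text"))
```

Keeping the version in one place like this makes it simpler to move to a newer API version later.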

For more information on the API options and returned data use the following links for Windows or Linux VMs:

What Is Providing the 169.254.x.x Address?

The Azure Instance Metadata service is provided by the WAAGENT. This (in Linux) is a daemon service and in Windows is a Windows Service installed during the VM build process when a VM is built using the Azure Resource Manager (not the Classic Azure VM build process).

The agent is a set of python routines. These python routines are visible on GitHub here: https://github.com/Azure/WALinuxAgent
The agent is not required to be installed inside VMs hosted in Azure but it is used by a multitude of Azure features.

If you analyse the agent log files (see /var/log/waagent.log in Linux), you will see that the agent is in constant communication with Azure APIs over HTTP (and HTTPS).

Can I Disable the Azure Instance Metadata Service?

Yes, you can disable it (see here: https://github.com/Azure/WALinuxAgent/wiki/VMs-without-WALinuxAgent), but without the agent running, you will not be able to run the Azure Enhanced Monitoring for Linux (AEM) plugin which is required in a production SAP system, because of the required use of Premium disks (see SAP note 2191498).
The Azure Instance Metadata service will auto-start with the VM.

There are noted downsides to having the agent running (documented here: https://raymii.org/s/blog/Linux_on_Microsoft_Azure_Disable_this_built_in_root_access_backdoor.html) but as mentioned, for SAP support, you need Azure Enhanced Monitoring (for Linux) which is a plugin for this agent.

Is the Azure Instance Metadata Service Used by SAP?

Yes, although indirectly.
The SAP Hostagent (7.21) is able to query the metadata service statistics of the guest VM.
The statistics are recorded into local file system files by the Azure Enhanced Monitoring for Linux agent plugin (also listed on GitHub under here: https://github.com/Azure/azure-linux-extensions/tree/master/AzureEnhancedMonitor).

The AEM plugin is a basic set of Python routines for the recording of the Azure disk and CPU statistics into designated flat text files (in Linux see /var/lib/AzureEnhancedMonitor/PerfCounters), and these files are then consumed by the SAP Hostagent.

As you may know, the Hostagent includes the SAPOSCOL (SAP O/S Collector) binary executable, which is the actual process within the SAP Hostagent delivered binaries, responsible for digesting the AEM statistics.
It makes the statistical information available in a shared memory segment, which can be accessed by a SAP Netweaver stack (in fact you can access it manually also by using the SAPOSCOL interactive command line).

In SAP Netweaver (AS ABAP) you can use transaction ST06 to access this SAPOSCOL information, where you will see a summary page for the O/S details (including the Azure provided details) plus a historical report of statistical data, all obtained from the SAPOSCOL memory segment.

Is the Azure Instance Metadata Service ReadOnly?

Yes, all of the data is readonly.
However, there is one area that you can influence using an HTTP POST, as outlined in the information provided here:
https://docs.microsoft.com/en-us/azure/virtual-machines/linux/scheduled-events

As you will see the ScheduledEvents API doesn’t really give you any control of the VM, as it’s more of a notification provider that gives you fair warning and allows you time to perform some provisional processing prior to a scheduled event execution.
It’s not used by the SAP Hostagent as far as I can determine.
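
To give a feel for the notification style of the ScheduledEvents API, here is a minimal sketch that works on an already-retrieved events document. The field names follow the Microsoft documentation linked above; the function names and the sample values are my own, and no HTTP call is made here.

```python
import json


def pending_events(document):
    """Return (EventId, EventType, NotBefore) for each event still scheduled."""
    return [
        (event["EventId"], event["EventType"], event.get("NotBefore", ""))
        for event in document.get("Events", [])
        if event.get("EventStatus") == "Scheduled"
    ]


def start_request_body(event_id):
    """JSON body for the HTTP POST that asks Azure to start the event early."""
    return json.dumps({"StartRequests": [{"EventId": event_id}]})


# Made-up sample document in the documented shape
sample = {
    "DocumentIncarnation": 3,
    "Events": [{
        "EventId": "602d9444-d2cd-49c7-8624-8643e7171297",
        "EventType": "Reboot",
        "ResourceType": "VirtualMachine",
        "Resources": ["sapvm01"],
        "EventStatus": "Scheduled",
        "NotBefore": "Mon, 19 Sep 2016 18:29:47 GMT",
    }],
}

for event_id, event_type, not_before in pending_events(sample):
    print(event_type, not_before, start_request_body(event_id))
```

The POST only acknowledges the event so that it can start sooner; it does not cancel or postpone anything.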

How Can We Utilise the Azure Instance Metadata Service During SAP Deployment Projects?

During deployments of SAP into Microsoft Azure, I have found it very useful to script access to the Azure Instance Metadata service to form part of a basic configuration check of VMs.

As an example, a Custom Operation can be defined in SAP LaMa (SAP Landscape Manager) which can be executed across all known SAP Hostagents and can return the information back into SAP LaMa as part of a Custom Validation execution (see more about SAP LaMa Custom Validation here: https://blogs.sap.com/2018/05/14/how-to-use-sap-landscape-management-custom-validations).

This then provides you with an easy SAP level reporting capability to see what size of VMs you’re running in your landscape and the configuration of such items like Azure disk cache settings (an important topic for HANA databases!).
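
As an illustration of such a check, the disk cache settings can be pulled out of the instance metadata. This sketch assumes an API version whose response includes compute.storageProfile (2019-06-01 did in my experience; treat that as an assumption), and the function name, disk names and values are hypothetical.

```python
def disk_cache_report(metadata):
    """Return (disk name, role, caching) tuples from the instance metadata."""
    storage = metadata["compute"].get("storageProfile", {})
    report = []
    os_disk = storage.get("osDisk")
    if os_disk:
        report.append((os_disk.get("name"), "os", os_disk.get("caching")))
    for disk in storage.get("dataDisks", []):
        role = "lun " + str(disk.get("lun"))
        report.append((disk.get("name"), role, disk.get("caching")))
    return report


# Made-up sample in the shape returned by the metadata service
sample = {
    "compute": {
        "vmSize": "Standard_E16s_v3",
        "storageProfile": {
            "osDisk": {"name": "sapvm01-os", "caching": "ReadWrite"},
            "dataDisks": [
                {"name": "sapvm01-hana-data", "lun": "0", "caching": "None"},
                {"name": "sapvm01-hana-log", "lun": "1", "caching": "None"},
            ],
        },
    }
}

for name, role, caching in disk_cache_report(sample):
    print(name, role, caching)
```

The output of a script like this is what I would return to SAP LaMa from the Custom Operation.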

What is /usr/sbin/azuremetadata ?

In distributions of SUSE Linux (including OpenSUSE), a commandline binary executable exists which calls the Azure Instance Metadata service.

It has a fixed set of command line options and can be used to retrieve a reduced subset of the data that can be queried using “curl” or “wget”.

If you need only the barest, quickest method of calling the Azure Instance Metadata service, then this binary executable will probably suffice.

This executable is also used by other SUSE features, so it is unlikely that it will be deprecated, however, it may not use the latest version of the API.

What Is the Latest Version of the Azure Instance Metadata Service API?

If you look at the two URLs provided previously for Windows and Linux, you will notice they contain a section called “Versioning” on the pages which details the currently supported versions of the API.

Are There Any Issues With the Azure Instance Metadata Service?

Yes, I’ve seen a couple of issues.
The service is relied upon in various areas of SUSE Linux cloud-netconfig to provide the VM with IP address details at boot time.
If this integration fails or is slow, your Linux VM may not have all IP addresses after boot (only the primary IP).

Sometimes (quite a lot of times) you will notice timeout errors in the agent log file as it tries to talk to Azure APIs.
Apparently this is normal and noted in a few forum posts in places. However, it means that the agent is obviously “stalling” while it experiences this “timeout”. Therefore I would argue that it is not ideal.

Korn Shell vs Powershell and the New AZ Module

Do you know Korn and are thinking about learning Powershell?

Look at this:

function What-am-I {
    echo "Korn or powershell?"
}

What-am-I
echo $?

Looks like Korn, but it also looks like Powershell.
In actual fact, it executes in both Korn shell and Powershell.

There’s a slight difference in the output from “$?” because Powershell will output “True” and Korn will output “0”.
Not much in it really. That is just another reason Linux people are feeling the Microsoft love right now.

Plus, as recently highlighted by a Microsoft blog post, the Azure CLI known as “az” which allows you to interact with Azure APIs and functions, will now also be the name of the new Powershell module used to perform the same operations and replacing “AzureRM”.

It makes sense for Microsoft to harmonise the two names.
It could save them an awful lot of documentation because currently they have to write examples for both “az” CLI and Powershell cmdlets for each new Azure feature/function.

Generate HMAC for Azure Storage from KSH

Generating an Azure HMAC Signature for calling Azure Storage Services from KSH

While custom writing an Azure Storage Service blob deletion script, I experienced a problem using the OpenSSL method for generating an HMAC.

For those not familiar with Azure Storage Services (or even signature based authentication) the act of sending a signature as part of an HTTP request serves to prove to the target server that you are in possession of the secret key and that you also would like to perform a specific operation.

The shared key (that you have been given out-of-band) is used to sign the HTTP call. This is so the target server can then perform the same signing operation at its end, and if the signature it obtains matches the one you’ve sent, then it trusts and permits you to perform the specific HTTP operation you’ve requested.

See here for more details: https://docs.microsoft.com/en-us/rest/api/storageservices/authenticate-with-shared-key#Constructing_Element

In my example, the operation is a simple BLOB deletion from an Azure Storage Account, but that is irrelevant to this particular post.
The HMAC generation routine is the same no matter what HTTP operation you wish to perform.

Based on searching in Google, the following OpenSSL method seems popular and able to provide a method of generating an HMAC:

l_input="your HTTP operation to be signed"
l_key="your big long Azure storage account key"
l_key_decoded="$(echo -n "${l_key}"|base64 -d)"
l_hmac="$(echo -n "${l_input}"|openssl dgst -sha256 -hmac "${l_key_decoded}" -binary | base64)"

The above works, with KSH, most of the time.
There have been one or two occasions when for no apparent reason, an incorrect HMAC is generated.
It’s possible that this stems from the character set interpretation e.g. UTF-8 and/or some strangeness in the way the KSH interpreter works with specific characters. I really wasn’t able to investigate deep enough with the time I had.
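
One plausible explanation, which I have not confirmed, is that the decoded key is raw binary. A shell variable and a command line argument are both NUL-terminated strings, so a decoded key that happens to contain a zero byte cannot be passed to openssl intact. The sketch below models that effect in Python by simply dropping the zero byte; the key used is made up, not a real storage account key.

```python
import base64
import hashlib
import hmac


def sign(key_bytes, message):
    """HMAC-SHA256 of message with the raw key bytes, base64 encoded."""
    digest = hmac.new(key_bytes, message.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical key whose decoded form contains a zero (NUL) byte
key_b64 = base64.b64encode(b"secret\x00key-material").decode("ascii")
real_key = base64.b64decode(key_b64)

# Model of what could survive the trip through a shell variable (an assumption)
mangled_key = real_key.replace(b"\x00", b"")

message = "string to sign"
print(sign(real_key, message) == sign(mangled_key, message))  # prints False
```

Whether a given key is affected depends purely on the bytes it decodes to, which would fit with the failure only showing up occasionally.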

Instead of the above, I decided to take a leaf out of the Blobxfer utility team’s book and use a Python based solution instead.
Browsing the Blobxfer source in GitHub, I isolated the specific Python routine that was used to provide the HMAC.
Putting this routine into KSH makes it look like the following:

l_hmac="$(cat <<EOF | python -
import hmac
import hashlib
import base64

def _encode_base64(data):
    # b64encode returns bytes; decode so print() shows plain text on Python 3
    return base64.b64encode(data).decode('utf-8')

def _decode_base64_to_bytes(data):
    return base64.b64decode(data)

def _sign_string(key, string_to_sign):
    # The storage account key is base64 text; HMAC needs the raw key bytes
    key = _decode_base64_to_bytes(key.encode('utf-8'))
    string_to_sign = string_to_sign.encode('utf-8')
    signed_hmac_sha256 = hmac.HMAC(key, string_to_sign, hashlib.sha256)
    digest = signed_hmac_sha256.digest()
    return _encode_base64(digest)

# KSH expands these two variables before Python sees the code
data = """${l_input}"""
key = "${l_key}"
print(_sign_string(key, data))
EOF
)"

I’m using a combination of HERE document and KSH in-line sub-shell execution to call python and pass in the stdin containing the python code to be executed.
KSH is responsible for embedding the required variables into the Python code, such as l_input and l_key.

So far, this routine has proved successful 100% of the time.

Recovery From: Operation start is not allowed on VM since the VM is generalized – Linux

Scenario: In Azure you had a Linux virtual machine. In the Azure Portal you clicked the “Capture” button in the Portal view of your Linux virtual machine, and now you are unable to start the virtual machine because you get the Azure error: “Operation ‘start’ is not allowed on VM ‘abcd’ since the VM is generalized.”

What this error/information prompt is telling you, is that the “Capture” button actually creates a generic image of your virtual machine, which means it is effectively a template that can be used to create a new VM.
Because the process that is applied to your original VM modifies it in such a way, it is now unable to boot up normally. The process is called “sysprep”.

Can you recover your original VM? No. It’s not possible to recover it properly using the Azure Portal capabilities. You could do it if you downloaded the disk image, but there’s no need.
Plus, there is no telling what changes have been made to the O/S that might affect your applications that have been installed.

It’s possible for you to create a new VM from your captured image, or even to use your old VM’s O/S disk to create a new VM.
However, both of the above mean you will have a new VM. Like I said, who knows what changes could have been introduced from the sysprep process. Maybe it’s better to rebuild…

Because the disk files are still present you can rescue your data and look at the original O/S disk files.
Here’s how I did it.

I’m starting from the point of holding my head in my hands after clicking “Capture“!
The next steps I took were:

– Delete your original VM (just the VM). The disk files will remain, but at least you can create a new VM of the same name (I liked the original name).

– Create a new Linux VM, same as you did for the one you’ve just lost.
Use the same install image if possible.

– Within the properties of your new VM, go to the “Disks” area.

– Click to add a new data disk.
We will then be able to attach the existing O/S disk to the virtual machine (you will need to find it in the list).
You can add other data disks from the old VM if you need to.

Once your disks are attached to your new Linux VM, you just need to mount them up.
For my specific scenario, I could see that the root partition “/” on my new Linux VM, was of type “ext4” (check the output of ‘df -h’ command).
This means that my old VM’s root partition format would have also been ext4.
Therefore I just needed to find and mount the new disk in the O/S of my new VM.

As root on the new Linux VM find the last disk device added:

# ls -ltr /dev/sd*

The last line is your old VM disk. Mine was device /dev/sdc and specifically, I needed partition 2 (the whole disk), so I would choose /dev/sdc2.

Mount the disk:

# mkdir /old_vm

# mount -t ext4 /dev/sdc2 /old_vm

I could then access the disk and copy any required files/settings:

# cd /old_vm

Once completed, I unmounted the old O/S disk in the new Linux VM:

# umount /old_vm

Then, back in the Azure Portal in the disks area of the new VM (in Edit mode), I detached the old disk.

Once those disks are not owned by a VM anymore (you can see in the properties for the specific disk), then it’s safe to delete them.