This blog contains experience gained over the years of implementing (and de-implementing) large scale IT applications/software.

SUSE Cloud-Netconfig and Azure VMs – Dynamic Network Configuration

What is SUSE Cloud-Netconfig:
Within the SUSE SLES 12 (and OpenSUSE) operating system, lies a piece of functionality called Cloud-Netconfig.
It is provided as part of the System/Management group of packages.

The Cloud-Netconfig software consists of a set of shell functions and init scripts that are responsible for control of the network interfaces on the SUSE VM when running inside of a cloud framework such as Microsoft Azure.
The core code is part of the SUSE-Enceladus project (code & documents for use with public cloud) and hosted on GitHub here: https://github.com/SUSE-Enceladus/cloud-netconfig.
Cloud-Netconfig requires the sysconfig-netconfig package, as it essentially provides a netconfig module.
Upon installation, the Cloud-Netconfig module is prepended to the front of the netconfig module list like this: NETCONFIG_MODULES_ORDER=”cloud-netconfig dns-resolver dns-bind dns-dnsmasq nis ntp-runtime”.

What Cloud-Netconfig does:
As with every public cloud platform, a deployed VM is allocated and booted with the configuration for the networking provided by the cloud platform, outside of the VM.
In order to provide the usual networking devices and modules inside the VM with the required configuration information, the VM must know about its environment and be able to make a call out to the cloud platform.
This is where Cloud-Netconfig does its work.
The Cloud-Netconfig code will be called at boot time from the standard SUSE Linux init process (systemd).
It has the ability to detect the cloud platform that it is running within and make the necessary calls to obtain the networking configuration.
Once it has the configuration, this is persisted into the usual network configuration files inside the /sysconfig/network/scripts and /netconfig.d/cloud-netconfig locations.
The configuration files are then used by the wicked service to adjust the networking configuration of the VM accordingly.

What information does Cloud-Netconfig obtain:
Cloud-Netconfig has the ability to influence the following aspects of networking inside the VM.
– DHCP.
– DNS.
– IPv4.
– IPv6.
– Hostname.
– MAC address.

All of the above information is obtained and can be persisted and updated accordingly.

What is the impact of changing the networking configuration of a VM in Azure Portal:
Changing the configuration of the SUSE VM within Azure (for example: changing the DNS server list), will trigger an update inside the VM via the Cloud-Netconfig module.
This happens because Cloud-Netconfig is able to poll the Azure VM Instance metadata service (see my previous blog post on the Azure VM Instance metadata service).
If the information has changed since the last poll, then the networking changes are instigated.

What happens if a network interface is to remain static:
If you wish for Cloud-Netconfig to not manage a networking interface, then there exists the capability to disable management by Cloud-Netconfig.
Simply adjusting the network configuration file in /etc/sysconfig/network and set the variable CLOUD_NETCONFIG_MANAGE=no.
This will prevent future adjustments to this network interface.

How does Cloud-Netconfig interact with Wicked:
SUSE SLES 12 uses the Wicked network manager.
The Cloud-Netconfig scripts adjust the network configuration files in the locations /sysconfig/network/scripts which are then detected by Wicked and the necessary adjustments made (e.g. interfaces brought online, IP addresses assigned or DNS server lists updated).
As soon as the network configuration files have been written by Cloud-Netconfig, this is where the interaction ends.
From this point the usual netconfig services take over (wicked and nanny – for detecting the carrier on the interface).

What happens in the event of a VM primary IP address change:
If the primary IP address of the VM is adjusted in Azure, then the same process as before takes place.
The interface is brought down and then brought back up again by wicked.
This means that in an Azure Site Recovery replicated VM, should you activate the replica, the VM will boot and Cloud-Netconfig will automatically adjust the network configuration to that provided by Azure, even though this VM only contained the config for the previous hosting location (region or zone).
This significantly speeds up your failover process during a DR situation.

Are there any issues with this dynamic network config capability:
Yes, I have seen a number of issues.
In SLES 12 sp3 I have seen issues whereby a delay in the provision of the Azure VM Instance metadata during the boot cycle has caused the VM to lose sight of any secondary IP addresses assigned to the VM in Azure.
On tracing, the problem seemed to originate from a slowness in the full startup of the Azure Linux agent – possibly due to boot diagnostics being enabled.  A SLES patch is still being waited on for this fix.

I have also seen a “problem” whereby an incorrect entry inside the /etc/hosts file can cause the reconfiguration of the VM’s hostname.
Quite surprising.  This caused other custom SAP deployment script related issues as the hostname was being relied on to be in a specific intelligent naming convention, when instead, it was being changed to a temporary hostname for resolution during an installation of SAP sing the Software Provisioning Manager.

How can I debug the Cloud-Netconfig scripts:
According to the manuals, debug logging can be enabled through the standard DEBUG=”yes” and WICKED_DEBUG=”all” variables in config file /etc/sysconfig/network/config.
However, casting an eye over the scripts and functions inside of the Cloud-Netconfig module, these settings don’t seem to be picked up and sufficient logging produced.  Especially around the polling of the Azure VM Instance metadata service.
I found that when debugging I had to actually resort to adjusting the function script functions.cloud-netconfig.

Additional information:
https://www.suse.com/c/multi-nic-cloud-netconfig-ec2-azure/
https://www.suse.com/documentation/sles-12/singlehtml/book_sle_admin/book_sle_admin.html
https://github.com/SUSE-Enceladus/cloud-netconfig
https://www.suse.com/media/presentation/wicked.pdf
https://github.com/openSUSE/wicked

Korn Shell vs Powershell and the New AZ Module

Do you know Korn and are thinking about learning Powershell?

Look at this:

function What-am-I {
   echo “Korn or powershell?”
}

what-am-i
echo $?

Looks like Korn, but it also looks like Powershell.
In actual fact, it executes in both Korn shell and Powershell.

There’s a slight difference in the output from “$?” because Powershell will output “True” and Korn will output “0”.
Not much in it really. That is just another reason Linux people are feeling the Microsoft love right now.

Plus, as recently highlighted by a Microsoft blog post, the Azure CLI known as “az” which allows you to interact with Azure APIs and functions, will now also be the name of the new Powershell module used to perform the same operations and replacing “AzureRM”.

It makes sense for Microsoft to harmonise the two names.
It could save them an awful lot of documentation because currently they have to write examples for both “az” CLI and Powershell cmdlets for each new Azure feature/function.

Generate HMAC for Azure Storage from KSH

Generating an Azure HMAC Signature for calling Azure Storage Services from KSH

While custom writing an Azure Storage Service blob deletion script, I experienced a problem using the OpenSSL method for generating an HMAC.

For those not familiar with Azure Storage Services (or even signature based authentication) the act of sending a signature as part of an HTTP request serves to prove to the target server that you are in possession of the secret key and that you also would like to perform a specific operation.

The shared key (that you have been given out-of-band) is used to sign the HTTP call. This is so the target server can then perform the same signing operation at its end, and if the signature it obtains matches the one you’ve sent, then it trusts and permits you to perform the specific HTTP operation you’ve requested.

See here for more details: https://docs.microsoft.com/en-us/rest/api/storageservices/authenticate-with-shared-key#Constructing_Element

In my example, the operation is a simple BLOB deletion from an Azure Storage Account, but that is irrelevant to this particular post.
The HMAC generation routine is the same no matter what HTTP operation you wish to perform.

Based on searching in Google, the following OpenSSL method seems popular and able to provide a method of generating an HMAC:

l_input=”your HTTP operation to be signed”
l_key=”your big long Azure storage account key”
l_key_decoded=”$(echo -n “${l_key}”|base64 -d)”
l_hmac=”$(echo -n “${l_input}”|openssl dgst -sha256 -hmac “${l_key_decoded}” -binary | base64)”

The above works, with KSH, most of the time.
There have been one or two occasions when for no apparent reason, an incorrect HMAC is generated.
It’s possible that this stems from the character set interpretation e.g. UTF-8 and/or some strangeness in the way the KSH interpreter works with specific characters. I really wasn’t able to investigate deep enough with the time I had.

Instead of the above, I decided to take a leaf out of the Blobxfer utility team’s book and use a Python based solution instead.
Browsing the Blobxfer source in GitHub, I isolated the specific Python routine that was used to provide the HMAC.
Putting this routine into KSH makes it look like the following:

l_hmac=”$(cat <<EOF | python –
import sys
import hmac
import hashlib
import base64

def _encode_base64(data):
encoded = base64.b64encode(data)
return encoded

def _decode_base64_to_bytes(data):
return base64.b64decode(data)

def _sign_string(key, string_to_sign):
key = _decode_base64_to_bytes(key.encode(‘utf-8’))
string_to_sign = string_to_sign.encode(‘utf-8’)
signed_hmac_sha256 = hmac.HMAC(key, string_to_sign, hashlib.sha256)
digest = signed_hmac_sha256.digest()
encoded_digest = _encode_base64(digest)
return encoded_digest

data = “””${l_input}”””
key = “${l_key}”
print (_sign_string(key,data))
EOF
)”

I’m using a combination of HERE document and KSH in-line sub-shell execution to call python and pass in the stdin containing the python code to be executed.
KSH is responsible for embedding the required variables into the Python code, such as l_input and l_key.

So far, this routine has proved successful 100% of the time.

Recovery From: Operation start is not allowed on VM since the VM is generalized – Linux

Scenario: In Azure you had a Linux virtual machine.  In the Azure portal you clicked the “Capture” button in the Portal view of your Linux virtual machine, now you are unable to start the virtual machine as you get the Azure error: “Operation ‘start’ is not allowed on VM ‘abcd’ since the VM is generalized.“.

What this error/information prompt is telling you, is that the “Capture” button actually creates a generic image of your virtual machine, which means it is effectively a template that can be used to create a new VM.
Because the process that is applied to your original VM modifies it in such a way, it is now unable to boot up normally.  The process is called “sysprep”.

Can you recover your original VM?  no.  It’s not possible to recover it properly using the Azure Portal capabilities.  You could do it if you downloaded the disk image, but there’s no need.
Plus, there is no telling what changes have been made to the O/S that might affect your applications that have been installed.

It’s possible for you to create a new VM from your captured image, or even to use your old VM’s O/S disk to create a new VM.
However, both of the above mean you will have a new VM.  Like I said, who knows what changes could have been introduced from the sysprep process.  Maybe it’s better to rebuild…

Because the disk files are still present you can rescue your data and look at the original O/S disk files.
Here’s how I did it.

I’m starting from the point of holding my head in my hands after clicking “Capture“!
The next steps I took were:

– Delete your original VM (just the VM).  The disk files will remain, but at least you can create a new VM of the same name (I liked the original name).

– Create a new Linux VM, same as you did for the one you’ve just lost.
Use the same install image if possible.

– Within the properties of your new VM, go to the “Disks” area.

– Click to add a new data disk.
We will then be able to attach the existing O/S disk to the virtual machine (you will need to find itin the list).
You can add other data disks from the old VM if you need to.

Once your disks are attached to your new Linux VM, you just need to mount them up.
For my specific scenario, I could see that the root partition “/” on my new Linux VM, was of type “ext4” (check the output of ‘df -h’ command).
This means that my old VM’s root partition format would have also been ext4.
Therefore I just needed to find and mount the new disk in the O/S of my new VM.

As root on the new Linux VM find the last disk device added:

# ls -ltr /dev/sd*

The last line is your old VM disk.  Mine was device /dev/sdc and specifically, I needed partition 2 (the whole disk), so I would choose /dev/sdc2.

Mount the disk:

# mkdir /old_vm
# mount -t ext4 /dev/sdc2 /old_vm

I could then access the disk and copy any required files/settings:

# cd /old_vm

Once completed, I unmounted the old O/S disk in the new Linux VM:

# umount /old_vm

Then, back in the Azure Portal in the disks area of the new VM (in Edit mode), I detatched the old disk:

 

Once those disks are not owned by a VM anymore (you can see in the properties for the specific disk), then it’s safe to delete them.