This blog contains experience gained over the years of implementing (and de-implementing) large scale IT applications/software.

SUSE Linux 12 – Kernel 4.4.73 – Boot Hang – BTRFS Issue

I had a VMWare guest running SUSE Linux 12 SP3 64bit (kernel 4.4.73).
One day after a power outage, the VM failed to boot.
It would arrive at the SUSE Linux “lizard” splash screen and then just hang.

I noticed prior to this error that the SUSE 12 operating system creates it’s root partition inside a logical volume call “/dev/system/root” and it is then formatted as a BTRFS filesystem.

At this point I decided that I must have a corrupt disk block.
I launched the VM with the CDROM attached and pointing at the SUSE 12 installation ISO file.
While the VM starts you need to press F2 to get to the “BIOS” boot options to enable the CDROM to be bootable before the hard-disks.

Once the installation cdrom was booting, I selected “Recovery” from the SUSE menu.
This drops you into a recovery session with access to the BTRFS filesystem check tools.

Following a fair amount of Google action, I discovered I could run a “check” of the BTRFS file system (much like the old fsck on EXT file systems).

Since I already knew the device name for the root file system, things were pretty easy:

# btrfs check /dev/system/root
Checking filesystem on /dev/system/root

found 5274484736 bytes used err is 0

Looks like the command worked, but it is showing no errors.
So I tried to mount the partition:

# mkdir /old_root
# mount -t btrfs /dev/system/root /old_root

At this point the whole VM hung again!
I had to restart the whole process.
So there was definately an issue with the BTRFS filesystem on the root partition.

Starting the VM again and re-entering the recovery mode of SUSE, I decided to try and mount the partition in recovery mode:

# mkdir /old_root
# mount -t btrfs /dev/system/root /old_root -o ro,recovery

It worked!
No problems.  Weird.
So I unmounted and tried to re-mount in read-write mode again:

# umount /old_root
# mount -t btrfs /dev/system/root /old_root

BAM! The VM hung again.

Starting the VM again and re-entering the recovery mode of SUSE, I decided to just run the btrfs command with the “repair” command (although it says this should be a last resort).

# btrfs check –repair /dev/system/root
enabling repair mode
Checking filesystem on /dev/system/root
UUID: a09b7c3c-9d33-4195-af6e-9519fe550694
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don’t match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 5274484736 bytes used err is 0
total csum bytes: 4909484
total tree bytes: 236126208
total fs tree bytes: 215973888
total extent tree bytes: 13647872
btree space waste bytes: 38681887
file data blocks allocated: 5186543616

Maybe this cache problem that it fixed is the issue.

# mkdir /old_root
# mount -t btrfs /dev/system/root /old_root

Yay!
So, weird problem fixed.
Maybe this is a Kernel level issue and later Kernels have a patch, not sure.  It’s not my primary concern to fix this as I don’t plan on having many power outages, but if it was my production system then I might be more concerned and motivated.