This blog contains experience gained over the years of implementing (and de-implementing) large scale IT applications/software.

DB2 10.1 fp4 ICC Error

Scenario: After applying DB2 10.1 fp4 and trying to back up the database, we were seeing an ICC error in the log:

2014-11-03-15.31.41.864652+000 I15689A2279 LEVEL: Error
PID : 18481304 TID : 1 PROC : nsrdb2ra
INSTANCE: db2sid NODE : 000
HOSTNAME: db01
EDUID : 1
FUNCTION: DB2 Common, Cryptography, cryptContextRealInit, probe:60
MESSAGE : ECF=0x90000403=-1879047165=ECF_CRYPT_UNEXPECTED_ERROR
Unexpected cryptographic error
DATA #1 : Hex integer, 4 bytes
0x00000002
DATA #2 : Hex integer, 4 bytes
0x00000002
DATA #3 : Hex integer, 4 bytes
0x00000000
DATA #4 : String, 32 bytes
Invalid data value: icclib.c:989
CALLSTCK: (Static functions may not be resolved correctly, as they are
resolved to the nearest symbol)
[0] 0x090000007ECEDB34 pdOSSeLoggingCallback + 0xC0
[1] 0x090000007DFB0224 oss_log__FP9OSSLogFacUiN32UlN26iPPc + 0x1C4
[2] 0x090000007DFB0024 ossLog + 0xC4
[3] 0x090000007F37A960 cryptLogICCErrorWithStatus + 0xF8
[4] 0x090000007F37A2D0 cryptContextRealInit + 0x544
[5] 0x090000007F37B160 cryptContextCheckAndInit + 0x58
[6] 0x090000007F37B5E4 cryptDHInit + 0x1BC
[7] 0x090000007F35E3C4 sqlexSlcServerEncryptAccsec + 0xD0
[8] 0x090000007F35E260 sqlexSlcServerEncryptAuthenticate__FP14db2UCinterfacelPUi + 0x130
[9] 0x090000007E50B258 sqlexAppAuthenticate__FP14db2UCinterface + 0x3550
[10] 0x090000007E6DCC80 sqljrDrdaArAttach__FP14db2UCinterface + 0x84
[11] 0x090000007E6DC26C sqleUCdrdaARinit__FP14db2UCconHandle + 0x4B4
[12] 0x090000007E56A39C sqleUCappAttach + 0x92C
[13] 0x090000007E6A1F28 sqleatin__FPcN41sP5sqlca + 0x23C
[14] 0x090000007E6A1530 sqleatcp_api + 0x1C0
[15] 0x090000007E6A127C sqleatin_api + 0x54
[16] 0x00000001002FB550 InstAttach + 0xC0
[17] 0x00000001002DD104 validate_db2_credentials + 0x128
[18] 0x00000001002DEE38 db2ra_verify_credentials + 0xC8
[19] 0x00000001000008C0 ra_message_handler + 0xF0
[20] 0x0000000100055134 ra_default_message_handler + 0x308
[21] 0x0000000100054BE0 ra_callback + 0x17C
[22] 0x00000001002D93A4 ssncommon_do_callback + 0x70
[23] 0x00000001002D6908 msgssn_get_expected_msg_varp + 0x674
[24] 0x00000001002D7DFC ssn_getmsg_poll_varp + 0xEC
[25] 0x0000000100054980 ra_startagent + 0x128
[26] 0x0000000100055478 ra_main + 0x278
[27] 0x0000000100000604 main + 0x114
[28] 0x0000000100000318 __start + 0x90

The quick resolution was that we needed to adjust the environment of the AIX database backup user to include the following:

> export ICC_IGNORE_FIPS=YES

This was put into the .profile of the AIX backup user.
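A minimal way to make the setting persistent for that user (assuming a standard home directory and that .profile is read at login) is to append it and then check it is there:

> echo 'export ICC_IGNORE_FIPS=YES' >> ~/.profile
> grep ICC_IGNORE_FIPS ~/.profile

A fresh login (or sourcing .profile) before running the backup should then pick up the variable.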

SUSE Linux 12 – Kernel 4.4.73 – Boot Hang – BTRFS Issue

I had a VMware guest running SUSE Linux 12 SP3 64-bit (kernel 4.4.73).
One day after a power outage, the VM failed to boot.
It would arrive at the SUSE Linux “lizard” splash screen and then just hang.

I had noticed prior to this error that the SUSE 12 operating system creates its root partition inside a logical volume called “/dev/system/root”, which is then formatted as a BTRFS filesystem.

At this point I decided that I must have a corrupt disk block.
I launched the VM with the CDROM attached and pointing at the SUSE 12 installation ISO file.
While the VM starts, you need to press F2 to get to the “BIOS” boot options and make the CD-ROM bootable before the hard disks.

Once the installation CD-ROM was booting, I selected “Recovery” from the SUSE menu.
This drops you into a recovery session with access to the BTRFS filesystem check tools.

Following a fair amount of Google action, I discovered I could run a “check” of the BTRFS file system (much like the old fsck on EXT file systems).
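If you don’t already know the device name for the root filesystem, the recovery session lets you list the block devices and LVM logical volumes first; a quick check along these lines (the “system” volume group name is from my setup and may differ on yours):

# lsblk -f
# lvs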

Since I already knew the device name for the root file system, things were pretty easy:

# btrfs check /dev/system/root
Checking filesystem on /dev/system/root

found 5274484736 bytes used err is 0

Looks like the command worked, but it is showing no errors.
So I tried to mount the partition:

# mkdir /old_root
# mount -t btrfs /dev/system/root /old_root

At this point the whole VM hung again!
I had to restart the whole process.
So there was definitely an issue with the BTRFS filesystem on the root partition.

Starting the VM again and re-entering the recovery mode of SUSE, I decided to try and mount the partition in recovery mode:

# mkdir /old_root
# mount -t btrfs /dev/system/root /old_root -o ro,recovery

It worked!
No problems.  Weird.
So I unmounted and tried to re-mount in read-write mode again:

# umount /old_root
# mount -t btrfs /dev/system/root /old_root

BAM! The VM hung again.

Starting the VM again and re-entering the recovery mode of SUSE, I decided to just run the btrfs check with the “--repair” option (although the documentation says this should be a last resort).

# btrfs check --repair /dev/system/root
enabling repair mode
Checking filesystem on /dev/system/root
UUID: a09b7c3c-9d33-4195-af6e-9519fe550694
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 5274484736 bytes used err is 0
total csum bytes: 4909484
total tree bytes: 236126208
total fs tree bytes: 215973888
total extent tree bytes: 13647872
btree space waste bytes: 38681887
file data blocks allocated: 5186543616

Maybe the free space cache mismatch that it fixed was the cause of the hang, so I tried the read-write mount again:

# mkdir /old_root
# mount -t btrfs /dev/system/root /old_root

Yay!
So, weird problem fixed.
Maybe this is a kernel-level issue and later kernels include a fix; I’m not sure.  It’s not my priority to chase this down as I don’t plan on having many power outages, but if this were my production system I would be more concerned and motivated.
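As a footnote: if the stale free space cache really was the culprit, BTRFS can also rebuild it at mount time without running a full repair; a minimal sketch, assuming the same device and mount point as above:

# mount -t btrfs /dev/system/root /old_root -o clear_cache

The clear_cache mount option invalidates and rebuilds the free space cache for that mount, which is a gentler first step than “btrfs check --repair”.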

Recovery From: Operation start is not allowed on VM since the VM is generalized – Linux

Scenario: In Azure you had a Linux virtual machine.  In the Azure Portal you clicked the “Capture” button in the Portal view of your Linux virtual machine, and now you are unable to start the virtual machine because you get the Azure error: “Operation ‘start’ is not allowed on VM ‘abcd’ since the VM is generalized.”

What this error/information prompt is telling you is that the “Capture” button actually creates a generic image of your virtual machine, which means it is effectively a template that can be used to create a new VM.
Because of the changes this process makes to your original VM, it is now unable to boot up normally.  The process is called “sysprep”.
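For reference, on a Linux VM the equivalent of the Windows sysprep step is the deprovisioning performed by the Azure Linux agent, which you would normally run yourself before capturing an image.  It is destructive, which is why the original VM cannot really be used afterwards; it looks something like this:

# waagent -deprovision+user

(Shown purely to illustrate what “generalizing” involves; don’t run it unless you really do intend to turn the VM into an image.)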

Can you recover your original VM?  No.  It’s not possible to recover it properly using the Azure Portal capabilities.  You could do it if you downloaded the disk image, but there’s no need.
Plus, there is no telling what changes have been made to the O/S that might affect the applications you have installed.

It’s possible for you to create a new VM from your captured image, or even to use your old VM’s O/S disk to create a new VM.
However, both of the above mean you will have a new VM.  Like I said, who knows what changes could have been introduced from the sysprep process.  Maybe it’s better to rebuild…

Because the disk files are still present you can rescue your data and look at the original O/S disk files.
Here’s how I did it.

I’m starting from the point of holding my head in my hands after clicking “Capture“!
The next steps I took were:

– Delete your original VM (just the VM).  The disk files will remain, but at least you can create a new VM of the same name (I liked the original name).

– Create a new Linux VM, same as you did for the one you’ve just lost.
Use the same install image if possible.

– Within the properties of your new VM, go to the “Disks” area.

– Click to add a new data disk.
We will then be able to attach the existing O/S disk to the virtual machine (you will need to find it in the list).
You can add other data disks from the old VM if you need to.

Once your disks are attached to your new Linux VM, you just need to mount them up.
For my specific scenario, I could see that the root partition “/” on my new Linux VM was of type “ext4” (check the output of the ‘df -h’ command).
This means that my old VM’s root partition format would have also been ext4.
Therefore I just needed to find and mount the new disk in the O/S of my new VM.

As root on the new Linux VM find the last disk device added:

# ls -ltr /dev/sd*

The last line is your old VM’s disk.  Mine was device /dev/sdc and, specifically, I needed partition 2 (the whole disk), so I chose /dev/sdc2.
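If you want to double-check the partition layout and filesystem type before mounting (the device names here are from my example and will differ on your VM), something like this will confirm it:

# lsblk -f /dev/sdc
# blkid /dev/sdc2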

Mount the disk:

# mkdir /old_vm
# mount -t ext4 /dev/sdc2 /old_vm

I could then access the disk and copy any required files/settings:

# cd /old_vm
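
For example, to pull a whole directory tree off the old disk while preserving ownership and permissions (the paths here are purely illustrative):

# mkdir -p /root/old_vm_rescue
# cp -a /old_vm/etc /root/old_vm_rescue/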

Once completed, I unmounted the old O/S disk in the new Linux VM:

# umount /old_vm

Then, back in the Azure Portal, in the disks area of the new VM (in Edit mode), I detached the old disk.

Once those disks are no longer owned by a VM (you can see this in the properties of the specific disk), it’s safe to delete them.

SAP ASE Job Server Error

Whilst administering an SAP ASE based SAP system, I came across an issue in the ASE Job Server error log “JSTASK.log”:

00:140737306879744:140737340581728:2016/02/24 16:50:00.87 worker  ct_connect() failed.
00:140737306879744:140737340581728:2016/02/24 16:50:00.87 worker  jsj__RunSQLJob: jsd_MakeConnection() failed for user sapsa to server SID
00:140737306879744:140737340581728:2016/02/24 16:50:00.87 worker  jsj__RunSQLJob() failed for xid 66430
00:140737317369600:140737340581728:2016/02/24 16:55:00.87 worker  Client message: ct_connect(): protocol specific layer: external error: The attempt to connect to the server failed.

The issue was caused by a change of the sapsa user password where the SAP recommended method of using the hostctrl process wasn’t followed.
The recommended method updates the sapsa user, the secure storage file and also the external login for the Job Server.
This is mentioned at the very end of SAP note 1706410 (although it is suggested that the password-change process in that note is no longer followed).
To fix the issue, follow the final steps in SAP note 1706410:

isql -X -Usapsa -S<SID> -w999

use master
go
sp_helpexternlogin
go

Server                 Login                Externlogin
---------------------- -------------------- ------------
SYB_JSTASK             sapsa                sapsa

Drop the SYB_JSTASK entry:

exec sp_dropexternlogin SYB_JSTASK, sapsa
go

Re-create it with the new password:

exec sp_addexternlogin SYB_JSTASK, sapsa, sapsa, '<new sapsa password>'
go
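
To confirm the change, you could list the external logins for the Job Server’s server entry again before disconnecting:

exec sp_helpexternlogin SYB_JSTASK
go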

This should fix the issue.

SAP ASE Backup Server Error Writing to Archive

When running SAP Business Suite on the SAP ASE database platform, I was trying to dump and load from one database to another database.
A Backup Server error was seen on the target database side, in the Backup Server log file (<SID>_BS.log), during the LOAD statement execution:

Backup Server: 4.145.2.22: [3] Error for database/archive device while working on stripe device ‘/<file1>’. Error writing to archive device /<file1>. Attempted to write 65536 bytes, 32768 bytes were written.

This specific issue turned out to be caused by my target database not having exactly the same sizes for the data and log devices as the source database.

I even found, in some cases, that the log device needed to be a tiny bit bigger (we’re talking about 1 MB bigger) than on the source database.
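
A quick way to compare the layouts before the DUMP and LOAD is to run sp_helpdb against the database on both the source and target servers; if the target log turns out too small, it can be grown slightly with ALTER DATABASE.  A minimal sketch, assuming the SAP default database name <SID> and an illustrative log device name:

isql -X -Usapsa -S<SID> -w999

use master
go
exec sp_helpdb <SID>
go
-- grow the target log a little if the fragment sizes don't match the source
alter database <SID> log on <SID>_log_001 = '8M'
go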