
SAP ASE Error – Process No Longer Found After Startup

This post is about a strange issue I was hitting during the configuration of SAP LaMa 3.0 to start/stop a SAP ABAP 7.52 system (with Kernel 7.53) that was running with a SAP ASE 16.0 database.

During the LaMa start task, the start would fail with the error message: “ASE process no longer found after startup. (fault code: 127)”.

When I logged directly onto the SAP server Linux host, I could see that the database had indeed started up, eventually.
So what was causing the failure?

The Investigation

At first I thought this was related to the Kernel, but having checked the versions of the Kernel components, I found they were the same as on another SAP system that was starting up perfectly fine using the exact same LaMa system.

The next check I did was to turn on tracing on the hostagent itself. This is a simple task of setting the trace level to “3” in the host_profile of the hostagent and restarting it:

service/trace = 3
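
The hostagent can then be restarted using its own executable (standard installation path shown):

/usr/sap/hostctrl/exe/saphostexec -restart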

The trace output is written to a number of different trace files in the work directory of the hostagent, but the trace file we are interested in is called dev_sapdbctrl.

The developer trace file for the sapdbctrl binary executable is important, because the sapdbctrl binary is executed by SAP hostagent (saphostexec) to perform the database start. If you observe the contents of the sapdbctrl trace output, you will see that it loads the Sybase specific shared library which contains the required code to start/stop the ASE database.

The same sapdbctrl also contains the ability to load the required libraries for other database systems.

As a side note, I still do not know how the Sybase shared library comes to exist in the hostagent executable directory. When SAP ASE is patched, this library must also be patched; otherwise, how does the hostagent stay in step with the ASE database that it needs to talk to?

Once tracing was turned on, I shut the SAP ASE instance down again and then used SAP LaMa to initiate the SAP system start once again.
Sure enough, the LaMa start task failed again.

Looking in the trace file dev_sapdbctrl I could see the same error message that I was seeing in SAP LaMa:

Error: Command execution failed. : ASE process no longer found after startup. 
(fault code: 127) Operation ID: 000D3A3862631EEAAEDDA232BE624001
----- Log messages ---- 
Info: saphostcontrol: Executing StartDatabase 
Error: sapdbctrl: ASE process no longer found after startup. 
Error: saphostcontrol: StartDatabase failed (sapdbctrl exit code = 1)

This was great. It confirmed that SAP LaMa was just seeing the symptom of some other issue, since LaMa just calls the hostagent to do the start.

Now that I knew the hostagent was seeing the error, I tried using the hostagent directly to perform the start, using the following:

/usr/sap/hostctrl/exe/saphostctrl -debug -function StartDatabase -dbname <SID> -dbtype syb -dbhost <the-ASE-host>

NOTE: The hostagent “-debug” command line option puts out the same information without the need for the hostagent tracing to be turned on in the host_profile.
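
As an aside, the hostagent can also report what it believes the database status to be, which is useful for checking either side of a start attempt:

/usr/sap/hostctrl/exe/saphostctrl -function GetDatabaseStatus -dbname <SID> -dbtype syb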

Once again, the start process failed and the same error message was present in the dev_sapdbctrl trace file.

This was now really strange.
I decided that the next course of action was to start the process of raising the issue with SAP via an incident.
If you suspect that something could take a while to fix, then it’s always best to raise it with SAP early and continue to look at the issue in parallel.

Continuing the Diagnosis

While the SAP incident was in progress, I continued the process of trying to self-diagnose the issue further.
I tried a couple more things such as:

  • Starting and stopping SAP ASE manually using the stopdb/startdb commands (see the sketch after this list).
  • Restarting the whole server (this step has a place in every troubleshooting process, eventually).
  • Checking the server patch level.
  • Checking the environment of the Linux user, the shell, the profile files, the O/S limits applied.
  • Checking what happens if McAfee anti-virus was disabled (I’ve seen the ePO blocking processes before).
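
For the manual stop/start test, this is roughly what I ran (a sketch; the exact O/S user and script availability depend on your installation):

# Switch from root to the ASE O/S user, then stop and start the database.
su - syb<sid>
stopdb
startdb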

Eventually exhaustion set in and I decided to give the SAP support processor time to get back to me with some hints.

Some Sleep

I spend a lot of time solving SAP problems. A lot of time.
Something doesn’t work according to the docs, something did work but has stopped working, something has never worked well…
It builds up in your mind and you carry this stuff around in your head.
Subconsciously you think about these problems.

Then, at about 3am when you can’t get back to sleep, you have a revelation.
The hostagent forks the process that starts the database as the syb<sid> Linux user, switching from the root user (under which the hostagent runs) using “su”.
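
In hindsight, a quick way to see a delay in this user switch directly (my own suggested check, not part of the original investigation) is to time it from the root user:

# Run as root: measure how long the switch to the database O/S user takes.
time su - syb<sid> -c "id"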

Linux Domain Users

The revelation about the forking of the user was just the trigger I needed to consider the way Linux authentication was set up on this specific server with the problem ASE startup.

I remembered at the beginning of the project that I had hit an issue with the SSSD Linux daemon, which is responsible for interfacing between Linux and Microsoft Active Directory. At that time, the issue was causing the hostagent to hang when operations were executed which required a switch to another Linux user.
That previous issue was actually a hostagent defect, fixed in a later hostagent patch. During that particular investigation, I had asked the Linux team to re-configure the SSSD daemon to be more efficient with its Active Directory traversals when checking whether a desired user account was local to Linux or a domain account.

With this previous issue in mind, I checked the SSSD configuration on the problem server. This is a simple conf file in /etc/sssd.

The Solution

After all the troubleshooting, the raising of the incident and the sleeping, I had finally arrived at the solution.

After checking the SSSD daemon configuration file /etc/sssd/sssd.conf, I could clearly see that there was one entry missing compared to the other servers that didn’t experience the SAP ASE start error.

The parameter: “subdomain_enumerate = none” was missing.
Looking at the manual page for SSSD it would seem that without this parameter there is additional overhead during any Active Directory traversal.
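
For illustration, the relevant domain section on a healthy server looked something like this (the domain name and providers here are placeholders, not the real configuration):

# /etc/sssd/sssd.conf (extract)
[domain/example.com]
id_provider = ad
access_provider = ad
# Do not enumerate trusted subdomains when resolving accounts.
subdomain_enumerate = none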

I set the parameter accordingly in the /etc/sssd/sssd.conf file and restarted the SSSD daemon using:

service sssd restart

Then I retried the start of the database using the hostagent command shown previously.
It worked!
I then retried with SAP LaMa and that also now started ASE without error messages.

Root Cause

What seems to have been happening is that sapdbctrl has some sort of internal pre-set timeout; when that timeout is hit, sapdbctrl simply abandons the operation and throws the error that I was seeing. This leaves the ASE database to continue and start (the process was already initiated), but to the hostagent it looks like the start has failed.
By adding the “subdomain_enumerate = none” parameter, the delay caused by inappropriate calls to Active Directory was massively reduced, and subsequent start activities were successful.

Analysing & Reducing HANA Backup Catalog Records

In honour of DBA Appreciation Day today, 3rd July, I’ve written a small piece on a menial but crucial task that HANA database administrators may wish to check. It’s very easy to overlook, but the impact can be quite amazing.

HANA Transaction Logging

In “normal” log mode (for recoverability), the HANA database, like Oracle, has an automatic transaction log backup process, which is responsible for backing up transaction log segments so that the HANA log volume disk space can be re-used by new transactions.
No free disk space in the HANA log volume means the database will hang until free space becomes available.

It is strongly recommended by SAP to have your HANA database in log mode “normal”, since this offers point-in-time recovery through the use of the transaction log backups.

By default a transaction log backup will be triggered automatically by HANA every time a log segment becomes full or if the timeout for an individual service is hit, whichever of those is sooner.
This is known as “immediate” interval mode.
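
For reference, these settings live in the “persistence” section of global.ini for each database. A quick way to review the values in effect (a sketch using the standard configuration view; run it in the database you are checking):

-- Show the log mode and log backup parameters currently in effect.
SELECT FILE_NAME, SECTION, KEY, VALUE
FROM M_INIFILE_CONTENTS
WHERE FILE_NAME = 'global.ini'
AND SECTION = 'persistence'
AND (KEY = 'log_mode' OR KEY LIKE 'log_backup%');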

I’m not going to go into the differences of the various interval options and the pros and cons of each since this is highly scenario specific. A lot of companies have small HANA databases and are quite happy with the default options. Some companies have high throughput, super low latency requirements, and would be tuning the log backup process for maximum throughput, while other companies want minimal data-loss and adjust the parameters to ensure that transactions are backed up off the machine as soon as possible.

The SITREP

In this specific situation that I encountered, I have a small HANA database of ~200GB in memory, serving a SAP Solution Manager 7.2 system (so it has 2x tenant databases plus the SystemDB).

The settings are such that all databases run in log_mode “normal” with consolidated log backups enabled in “immediate” mode and a max_log_backup_size of 16GB (the default, but specified).

All backups are written to a specific disk area, before being pushed off the VM to an Azure Storage Account.

The Issue

I noticed that the local disk area where the HANA database backups are written was becoming quite full. Out of context, you might attribute this to an increase of activity in the system, but I knew that this system was not doing anything at all (it’s a test system for testing Solution Manager patches and nobody was using it).

What Was Causing the Disk Usage?

Looking at the disk backup file system, I could easily see at the O/S level, that the HANA database log backups were the reason for the extra space usage.
Narrowing that down even further, I could see that the SYSTEMDB specifically was to blame.

The SYSTEMDB in a very lightly used HANA database should not be transacting enough to have a day-to-day noticeable increase in log backup disk usage.
This was no ordinary increase!
I was looking at a total HANA database size on disk of ~120GB (SYSTEMDB plus 2x TenantDBs), and yet I was seeing ~200GB of transaction log backups per day from just the SYSTEMDB.

Drilling down further into the log backup directory for the SYSTEMDB, I could see the name of the log backup files and their sizes.
I was looking at log backup files of 2.8GB in size every ~10 to ~15 minutes.
The files that were biggest were….

… log_backup_0_0_0_0.<unix epoch time>
That’s right, the backup catalog backups!

Whenever HANA writes a backup, whether it is a complete data backup, or a transaction log backup, it also writes a backup of the backup catalog.
This is extremely useful if you have to restore a system and need to know about the backups that have taken place.
By default, the backup catalog backups are accumulated, which means that HANA doesn’t need to write out multiple backups of the backup catalog for each log backup (remember, we have 2x tenantDBs).

Why Were Catalog Backup Files So Big?

The catalog backups include the entire backup catalog.
This means every prior backup is in the backup file, so by default the backup catalog backup file will increase in size at each backup, unless you do some housekeeping of the backup catalog records.
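
As a quick first check (run from the SystemDB, using the same cross-database view as the fuller query below), you can count the raw catalog records per database:

-- Quick sanity check: backup catalog record counts per database.
SELECT DATABASE_NAME, ENTRY_TYPE_NAME, COUNT(*) AS RECORDS
FROM "SYS_DATABASES"."M_BACKUP_CATALOG"
GROUP BY DATABASE_NAME, ENTRY_TYPE_NAME
ORDER BY DATABASE_NAME, RECORDS DESC;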

My task was to write some SQL to check the backup catalog to see how many backup catalog records existed, for what type of backups, in which database and how old they were.

I came up with the following SQL:

-- Breakdown of age of backup records in months, by type of record.
SELECT smbc.DATABASE_NAME,
       smbc.ENTRY_TYPE_NAME,
       MONTHS_BETWEEN(smbc.SYS_START_TIME, CURRENT_DATE) AS AGE_MONTHS,
       COUNT(MONTHS_BETWEEN(smbc.SYS_START_TIME, CURRENT_DATE)) AS RECORDS,
       t_smbc.YOUNGEST_BACKUP_ID
FROM "SYS_DATABASES"."M_BACKUP_CATALOG" AS smbc,
     (SELECT xmbc.DATABASE_NAME,
             xmbc.ENTRY_TYPE_NAME,
             MONTHS_BETWEEN(xmbc.SYS_START_TIME, CURRENT_DATE) AS AGE_MONTHS,
             MAX(xmbc.BACKUP_ID) AS YOUNGEST_BACKUP_ID
        FROM "SYS_DATABASES"."M_BACKUP_CATALOG" xmbc
       GROUP BY xmbc.DATABASE_NAME,
                xmbc.ENTRY_TYPE_NAME,
                MONTHS_BETWEEN(xmbc.SYS_START_TIME, CURRENT_DATE)
     ) AS t_smbc
WHERE t_smbc.DATABASE_NAME = smbc.DATABASE_NAME
  AND t_smbc.ENTRY_TYPE_NAME = smbc.ENTRY_TYPE_NAME
  AND t_smbc.AGE_MONTHS = MONTHS_BETWEEN(smbc.SYS_START_TIME, CURRENT_DATE)
GROUP BY smbc.DATABASE_NAME,
         smbc.ENTRY_TYPE_NAME,
         MONTHS_BETWEEN(smbc.SYS_START_TIME, CURRENT_DATE),
         t_smbc.YOUNGEST_BACKUP_ID
ORDER BY DATABASE_NAME,
         AGE_MONTHS DESC,
         RECORDS

The key points to note are:

  • I use the SYS_DATABASES.M_BACKUP_CATALOG view in the SYSTEMDB to see across all databases in the HANA system instead of checking in each one.
  • For each database, the SQL outputs:
    – type of backup (complete or log).
    – age in months of the backup.
    – number of backup records in that age group.
    – youngest backup id for that age group (so I can do some cleanup).

An example execution is shown in the screenshot below:

[Screenshot: query output listing backup records per database by entry type and age in months, with record counts and the youngest backup ID]

(NOTE: I made a mistake with the last column name in the screenshot; it’s correct in the SQL now – YOUNGEST_BACKUP_ID)

You can see that the SQL execution took only 3.8 seconds.
Based on my output, I could immediately see one problem: I had backup records from 6 months ago in the SYSTEMDB!

All of these records would be backed up on every transaction log backup.
For whatever reason, the backup process was not able to honour the “BACKUP CATALOG DELETE” which was meant to keep the catalog to less than 1 month of records.
I still cannot adequately explain why this had failed. The same process is used on other HANA databases and none had exhibited the same issue.

I can only presume something was preventing the deletion somehow, since in the next few steps you will see that I was able to use the exact same process with no reported issues.
For reference this is HANA 2.0 SPS04 rev47, patched all the way from SPS02 rev23.

Resolving the Issue

How did I resolve the issue? I simply re-ran the catalog deletion that was already supposed to be running after each backup.
I was able to use the backup ID from the YOUNGEST_BACKUP_ID column to reduce the backup records.

In the SYSTEMDB:

BACKUP CATALOG DELETE ALL BEFORE BACKUP_ID xxxxxxxx

Then for each TenantDB (still in the SYSTEMDB):

BACKUP CATALOG DELETE FOR <TENANTDB> ALL BEFORE BACKUP_ID xxxxxxxx
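
(A side note: the BACKUP CATALOG DELETE syntax also accepts a trailing COMPLETE keyword, which additionally deletes the physical backups from their storage location; I deliberately did not use it here, as I only wanted to trim the catalog records.)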

At the end of the first DELETE execution *in the first Tenant*, I re-ran the initial SQL query to check, and this was the output:

[Screenshot: query output after the delete, showing a single remaining record in that age group for the first tenant database]

We now only have 1 backup record, which was the youngest record in that age group for that first tenant database (compare with the first execution of the SQL query, backup id 1,590,747,286,179).
Crucially, we have far fewer log backups for that tenant: we’ve gone down from 2,247 to 495.
Nice!
I then progressed to do the delete in the SYSTEMDB and other TenantDB of this HANA system.

Checking the Results

As a final check, I was able to compare the log backup file sizes:

The catalog backup in file “log_backup_0_0_0_0.nnnnnnn” at 09:16 is from before the cleanup and is 2.7GB in size,
whereas the catalog backup in “log_backup_0_0_0_0.nnnnnnn” at 09:29 is from after the cleanup and is only 76KB in size.
An absolutely massive reduction!

How do we know that file “log_backup_0_0_0_0.nnnnnnn” is a catalog backup?
Because we can check using the Linux “strings” command to see the file string contents.
Way further down the listing it says it is a catalog backup, but I thought it was more interesting to see the “MAGIC” of Berlin near the top of the output.
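
For example (the epoch suffix here is made up; substitute one of your own files):

# Dump the printable strings from a log backup file; per the above, the
# catalog backup marker appears well down the listing.
strings log_backup_0_0_0_0.1593765321 | less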

UPDATE: August 2020 – SAP note 2962726 has been released which contains some standard SQL to help remove failed backup entries from the catalog.

Summary

  • Check your HANA backup catalog backup sizes.
  • Ensure you have alerting on file systems (if doing backups to disk).
  • Double check the backup catalog record age.
  • Give tons of freebies and thanks to your DBAs on DBA Appreciation Day!

Useful Links

Enable and Disable Automatic Log Backup
https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.05/en-US/241c0f0020b2492fb93a69a40b1b1b9a.html

Accumulated Backups of the Backup Catalog
https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.05/en-US/3def15378b954aac85f2b93bb3f85a49.html

Log Modes
https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.05/en-US/c486a0a3bb571014ab46c0633224f02f.html

Consolidated Log Backups
https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/2.0.05/en-US/653b5c6d5f9d41808011a5bd0fac6709.html

HANA 2.0 – Calc View – SAP DBTech JDBC 2048 Column Store Error

Scenario: During DB access of a HANA 2.0 SPS3 Calculation View from S/4HANA ABAP stack (via ABAP) or even directly in HANA Studio (via “Raw Data”), an error is displayed in a short dump or on screen along the lines of “SAP DBTech JDBC (2048: column store error: search table error: (nnnn) Instantiation of calculation model failed: exception 30600. An Internal error occurred”.

After investigation you observe the following error inside the indexserver trace log: “Could not get template scenario <SID>::_SYS_BIC:_SYS_SS_CE_<nnnn>_vers2_lang6_type1_CS_2_2_TMP (t -1) of calculation index <SID>::_SYS_BIC:<PACKAGE>/<CALCVIEW> (t -1). reason: CalculationEngine read from metadata failed.; Condition ‘aScenarioHandle.is_valid()’ failed.”.

The error clearly references the name of your Calculation View (calculation index) but it also references another object with a name like “_SYS_SS_CE_*”.

SAP note 1646743 explains that objects with a naming convention of “_SYS_SS_CE_<guid>_TMPTBL” are temporary tables created during compilation of procedure objects. Whilst our object’s naming convention is not an exact match, the assumption is that the object is temporary in nature and created during the compilation of our Calculation View.

To back up the above theory, SAP note 2717365 matches the initial error message in some respects and shows the method to recompile the temporary object.
The note explains that to reproduce its described situation you must “Create a script calculation view which is created based on a procedure.”.

With this in mind, after checking our failing Calculation View, we can clearly see that it utilises a “Script” as part of its design.

Therefore, we can assume that the temporary object with naming convention “_SYS_SS_CE_<nnnn>_vers2_lang6_type1_CS_2_2_TMP” is the temporary representation of the script from within our Calculation View.

Following the SAP note, we can recompile the object via its source Calculation View as follows using HANA Studio SQL execution (or hdbsql command line):

(NOTE: in our case the object is owned by user SAPABAP1, so we login/connect as that user in order to execute)

ALTER PROCEDURE "_SYS_BIC"."<PACKAGE>/<CALCVIEW>/proc" RECOMPILE;

The execution succeeds.
However on retrying to access the data within the view, we still get an error.
What happened? Well, looking again at our Calculation View, it appears that it references another Calculation View!
So we must recompile all referencing Calculation Views also.

To cut a long story short, it turned out that we had over 4 levels of Calculation Views before I decided to just recompile all procedures (where they existed) of all known Calculation Views. Some of the views were even in different schemas.

How do we obtain a list of all Calculation Views that use a script and would have temporary procedures?

We can use this SQL string to create the required list of “type 6” objects:

SELECT 'ALTER PROCEDURE "'||schema||'"."'||name||'" RECOMPILE;' FROM sys.p_objects_ WHERE type = 6 AND name LIKE '%/proc'

How did I find this? All (or most) HANA objects are represented in the SYS.P_OBJECTS_ table.

Even temporary SQL objects need to be accounted for in the general administrative operations of the database; they need to be listed somewhere and have a corresponding object ID.
By executing the SQL as the SAPABAP1 user, we get output in a similar fashion as to that shown below, with the first line being a column header:

'ALTER PROCEDURE "'||SCHEMA||'"."'||NAME||'" RECOMPILE;'

ALTER PROCEDURE "_SYS_BIC"."sap.erp.sfin.co.pl/FCO_C_ACCOUNT_ASSIGNMENT/proc" RECOMPILE;

ALTER PROCEDURE "_SYS_BIC"."sap.erp.sfin.rtc/RTC_C_FISCMAPA/proc" RECOMPILE;

We can then simply execute the output SQL lines for each object to be recompiled.
On attempting access to the Calculation View, it now correctly returns data (or no data), and does not show an error message.

The next question is, why did we get this problem?

Looking back at SAP note 2717365 it says “This error happens because the temporary created objects were not cleared up properly when this happened with an error.”.
Remember that this is not an exact match for our error, but I think the explanation is good enough.

An error occurred during the creation of the temporary procedures that underpin our scripted Calculation Views.

We don’t know what the error or issue was, but subsequently recompiling those Calculation View temporary procedures fixes the issue.

Powershell PSCredential – Exception calling .ctor with 2 arguments

Scenario: You are using Powershell with “System.Management.Automation.PSCredential” to capture credentials to an encrypted file using “ConvertFrom-SecureString”.
You are then trying to read those credentials back under a different Windows account from the one under which they were originally captured, using “ConvertTo-SecureString”.

You get an error like “Exception calling .ctor with 2 arguments”.

As per David Lee’s blog post (https://blog.kloud.com.au/2016/04/21/using-saved-credentials-securely-in-powershell-scripts/):

“…PowerShell is using the native Windows Data Protection API (DPAPI) functionality to encrypt the password from the ‘secure string’ into a text string. This string can be written to a plain text file, but the way that DPAPI works is that the encryption is such that only the original user on the original machine the encryption was performed on can decrypt the string back into a ‘Secure string’ to be reused.”

Therefore, following David’s blog, simply use the “-key” parameter when calling “ConvertFrom-SecureString” or “ConvertTo-SecureString” along with a pre-determined AES key.

What I actually did was store the Key file securely with ACL permissions along with the credential file.
Then, when reading them back in using “ConvertTo-SecureString” from a different Windows account on the same machine, I was able to successfully decrypt and use the credential.

Example using credential to map a target UNC to the H: drive on the current machine:

# Some variable declarations.
$keyFile = "C:\somefile.key"
$credFile = "C:\somefile.cred"
$myUser = "someuser"
$targetPath = "H"
$sourcePath = "\\some\unc\path"

# Generate a 256-bit AES key and save it to the key file.
$AESKey = New-Object Byte[] 32
[Security.Cryptography.RNGCryptoServiceProvider]::Create().GetBytes($AESKey)
Set-Content $keyFile $AESKey

# Prompt for a user/password and save the password to a credential file,
# encrypted with the AES key instead of DPAPI.
$credential = Get-Credential "$myUser"
$credential.Password | ConvertFrom-SecureString -Key (Get-Content $keyFile) | Set-Content $credFile

# As the other user account.
# We can now reference the key file (imagine we are running as another user account in another script or something at this point).
# Read the credential back in from the file and use it.
$encrypted = Get-Content $credFile | ConvertTo-SecureString -Key (Get-Content $keyFile)
$credential = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList "$myUser", $encrypted

# Map a drive using the credential.
New-PSDrive -Persist -Name $targetPath -PSProvider "FileSystem" -Scope "Global" -Root $sourcePath -Credential $credential -ErrorAction Stop

Using the separate AES key got around the error.

SAP ABAP Tools in Eclipse – ACM_UNEXPECTED_VALUE

Scenario: You’ve installed the SAP ABAP Tools in Eclipse and you’re trying to browse the catalog in a SAP Netweaver system.
You keep getting an error prompt which contains the error ACM_UNEXPECTED_VALUE.  Checking for SAP notes takes you down the path of ACLs and DCL for ABAP CDS views.  However, you’re not convinced this is the right direction to resolve your problem.

I had the above issue, and it was simply a case of a large number of the ABAP CDS (Core Data Services) views not existing in the database (they had either not been created or had failed to activate).

It was nothing to do with authorisations or permissions (I had profile SAP_ALL).
I was also seeing quite a number of short dumps related to “table not found” type errors from the database layer.

To fix the problem, SAP note 2519431 documents transaction SDDLAR (I was on a NW 7.52 system).

You can run transaction SDDLAR, then select “All Missing DDL Views” in the “Repair” section and click “Activate”.
It gives you the option of running with “Check Only”, without modification to the database.
Browse the list of returned objects that are missing; then, when you’re ready, try to activate them all by re-running the report with the “Check and Repair” option selected.