This blog contains experience gained over the years of implementing (and de-implementing) large-scale IT applications and software.

ASE Issue During Setuphadr Removes Logins on Companion

There’s a specific scenario that can occur during the setup of SAP ASE HADR, which can leave the companion (secondary) database with no logins or roles. You will be unable to log into it. It will be completely unusable.

This issue could apply to any ASE SP03 where auditing is turned on (the default for PL06 and above), but I have seen it in two systems: one on ASE SP03 PL08 and one on PL08-HF1.

The symptoms are seen during setuphadr in the step for materialisation of the master database: the setuphadr screen output doesn’t change, and eventually it displays a timeout.
When doing a “ps” on the primary ASE server, you can see a number of BCP processes that appear to be “stuck”. They are connecting to the companion database.
Inside the repdata directory location, the BCP data and out files are present, but they do not change.

When attempting to restart the setuphadr, the process fails as it cannot log into the companion ASE.
When manually trying to log into the companion ASE using isql, you are unable to log in as sapsa or sapsso. You can log in as “sa” only, but you have no roles/privileges to do anything.
You check the list of logins and roles in the companion master DB (table master..syslogins) and the table is empty; all logins have been removed!
You may also see login errors mentioned in the RMA log file on the companion server (the active SRS) with “SQLState: ZZZZZ”.
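As a quick way of confirming the symptom, you can check syslogins directly as “sa”. This is a minimal sketch; the login password and server name resolution (via the interfaces file) are assumptions for your environment:

# Sketch: list the remaining logins in the companion master DB.
isql -Usa -P<password> -S<SID> -w200 <<EOF
select name, suid from master..syslogins
go
EOF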

From what I can see, the cause of the issue is ASE auditing, which is enabled on the companion DB (the sp_configure parameter “auditing” has a value of “1”).
The audit table is full or nearly full, and/or the sybsecurity database or its transaction log is full or nearly full.
This prevents the BCP processes from successfully loading the table data from the primary.
It doesn’t seem to prevent the truncation of the companion master DB tables, however, which leaves the master DB with no table data for roles and logins.
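You can confirm this cause by checking the auditing configuration and the space in the sybsecurity database. A minimal sketch using isql; the login and server name are assumptions for your environment:

# Sketch: check the auditing parameter and sybsecurity space usage.
# Assumes a privileged login (e.g. sapsa) still works on this instance.
isql -Usapsa -P<password> -S<SID> <<EOF
exec sp_configure 'auditing'
go
exec sp_helpdb sybsecurity
go
EOF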

The workaround that I have found that works most effectively is to completely disable ASE auditing (auditing = 0) in the companion ASE instance, to prevent the issue from happening in the first place.
There are a couple of params that can change the way auditing works and maybe adjusting the filtering to prevent auditing of system users would also solve the problem, but this is the setup of replication so I don’t see why you would want that process audited at all. You can always re-enable auditing after the HADR setup is completed.
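If you take this approach, disabling (and later re-enabling) auditing is a one-line change. A sketch, assuming the sapsso login is usable on the companion (changing “auditing” requires the sso_role):

# Sketch: disable ASE auditing on the companion before running setuphadr.
isql -Usapsso -P<password> -S<SID> <<EOF
exec sp_configure 'auditing', 0
go
EOF

Setting the value back to “1” re-enables auditing once the HADR setup is complete.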

Another prevention tip is to ensure that sybsecurity database and tran log have space before the setuphadr process is started.

What if you have already hit the above issue and are looking for help right now?
If the above issue is hit, then the companion master DB could be restored from backup, except you cannot log into it to start the LOAD command.
You could copy the master.dat from primary, but the inbuilt GUID that gets set on DB creation would then be identical to primary (companion and primary should be different).

This leaves two possible options.

Option 1:

  • Re-create the companion master DB following SAP note 2631728 “How to rebuild master device and master database from backup – SAP ASE”.
  • Restore the master DB from backup (this could be a backup from the primary master DB).
  • If a backup from primary was used, then you may need to re-create the SID database (to be on the safe side), using the DDLGEN commands.

Option 2:

  • Blast the companion ASE away and re-install using SWPM.

Good luck, and make sure you take regular backups 🙂

Tuning SAP ASE – Decompressing Small Lookup Tables

By default, for SAP applications based on SAP Netweaver, SAP ASE 16.0 will compress the data pages within the tables inside the database.
In some scenarios this will not help with performance.

In this brief post I explain the scenario, correct a SAP note and show you how to check and adjust the tables in a safe way (do not just follow the note; you could corrupt the table data).

Data Page Compression

In ASE 16.0, table data pages can be compressed.

For SAP Netweaver based SAP systems such as SAP ERP, the use of table data page compression can help improve performance in two ways:

  1. Reduced I/O
    Reading and writing compressed data pages to disk is more efficient. Imagine that it’s a bit like copying a 4MB zip file on your computer, instead of an uncompressed 40MB text file!
  2. Reduced Memory Consumption
    In memory (in the data cache), the data pages can be held in compressed form and may not need to be uncompressed in order to satisfy certain queries.

The above sounds good and indeed it is good.
But there are some borderline cases where compression on a table may not be beneficial.

The Drawback with Compression

In some cases, compression can affect the response time of queries in very specific circumstances.

If you have “small” tables and those tables are accessed frequently with SQL queries that perform full table scans, then the extra CPU time required to decompress the table contents in the data cache can impact the response time.

How do We Identify Candidate Tables?

If there are tables that are adversely affected by having compression turned on, how do we identify those tables?

We can follow SAP note 1775764, which provides a nice simple SQL statement to list likely candidates. Great!

However, the provided SQL statement actually does not do what it should!
Oh dear!
Instead, the SQL incorrectly lists tables that are definitely not good candidates for decompression. In fact, some of them are the complete opposite of good candidates!

The SQL taken from the SAP note is:

--- DO NOT RUN THIS IT IS WRONG!!! ---
use <SID>
go
select top 10 ObjectName, 
LogicalReads 
from master..monOpenObjectActivity 
where DBID = db_id() 
and LogicalReads > 1000000 
-->   and data_pages(DBID,0) < 6400      <-- WRONG LINE HERE!
and object_attr(DBName || '.SAPSR3.' || ObjectName,'compression') <> 'none' 
order by LogicalReads desc
go

You will see in the above SQL taken from the SAP note that the “data_pages” function call has 2 parameters: the ID of the current database and an object ID. Except the object ID (on the line marked above) has been hard-coded to “0”.

This causes a value of “0” to be returned from the “data_pages” call, which is obviously always less than 6400.

Essentially, any table that has compression enabled, with high numbers of LogicalReads (table scans in the data cache), will be reported as a candidate! How confusing.

Correcting the SQL

We can see that we need to correct the SQL.
Here’s how:

--- This is the corrected statement ---
use <SID>
go
select top 10 ObjectName, 
LogicalReads 
from master..monOpenObjectActivity 
where DBID = db_id() 
and LogicalReads > 1000000 
and data_pages(DBID,ObjectID) < 6400
and object_attr(DBName || '.SAPSR3.' || ObjectName,'compression') <> 'none' 
order by LogicalReads desc
go

It’s simple enough: we just substitute the incorrect “0” with the column “ObjectID” from the monOpenObjectActivity table we are selecting from.
This correction then allows the correct output from the call to function “data_pages“.
With the correction in place, tables with a data page count of less than 6400 pages (with 16KB pages, that means tables under 100MB in size), with compression enabled and a high number of LogicalReads, will be listed as candidates. Yay!
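If you want to sanity-check an individual table from the list, you can call the two functions directly. A small sketch; the table name T000 is just an illustrative example:

--- Sketch: check one table (T000 is just an example name) ---
use <SID>
go
select data_pages(db_id(), object_id('SAPSR3.T000'))
go
select object_attr(db_name() || '.SAPSR3.T000', 'compression')
go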

Adjusting the Candidate Tables

Once you have your table list, you might now wish to turn off compression.
This is simple enough.
You can use the additional SQL supplied in the SAP note:

--- WARNING: check SAP note 2614712 first! ---
use <SAPSID>
go
setuser 'SAPSR3'
set quoted_identifier on
set chained on
go
alter table SAPSR3.<tablename> set compression = none
reorg rebuild SAPSR3.<tablename> [with online]  -- check SAP note 2614712 first!
commit
go

As you can see, you will need to perform a REORG on the table.
The SQL statement specifies the optional “with online” option.
BE WARNED: There are lots of SAP notes about issues with REORG ONLINE; please ensure that you check SAP note 2614712 first! Shame on SAP for not mentioning this in the SAP note!
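What you can easily verify at this point is that the attribute change itself has taken effect, by reusing the “object_attr” function from earlier. A minimal sketch:

--- Sketch: confirm compression is now off for the adjusted table ---
use <SID>
go
select object_attr(db_name() || '.SAPSR3.' || '<tablename>', 'compression')
go

This should now return “none”.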

Checking the Result

We’ve gone through the effort of adjusting tables we think might benefit from no compression, but how do we now confirm that we have positively impacted the response time?

Well that’s a difficult one. I’ll leave that for another post.

Summary

The summary for this post, is a little bit “Boris Johnson”:

  • Follow the SAP notes.
  • But don’t follow the SAP notes.
  • Check and double check what is output from commands, don’t just take the output as being correct!
  • Check your results.
  • Compressing tables is a good thing.
  • Compressing specific tables may not be a good thing.

SAP ASE Instance Log File Rotation

SAP ASE does a bare minimum of rotating its text log files.
I’m talking about the logs that you can use to see what’s going on under the covers.
Each time the ASE instance is restarted, it pushes the previous instance log file to a date-stamped file and starts afresh. This is not great if you have had issues with the database and the log has grown huge, because who restarts the database every day?

ASE also does not preserve the backupserver log or the job server agent log, which could be critical in solving problems.
Instead, it’s better to devise your own custom daily log rotation process.

In this post I describe the main SAP ASE log files, and my recommendation for a rotation configuration on Linux (yay!) and Windows (here be dragons!).

What Needs to Be Rotated?


The following are standard ASE 16.0 text log files that just keep on growing:

  • ASE database instance log: /sybase/<SID>/ASE-16_0/install/<SID>.log
  • ASE backup server log: /sybase/<SID>/ASE-16_0/install/<SID>_BS.log
  • ASE Job Server log: /sybase/<SID>/ASE-16_0/install/<SID>_JSAGENT.log

I do not include the dumphist file, which (if dumphistory is enabled) gets backed up daily from /sybase/<SID>/ASE-16_0/dumphist to /sybase/<SID>/ASE-16_0/dumphist.nnnnnnnnn.
These can also accumulate, so it might be best to just clear them down, especially if (as is good practice) you copy them off with your database backups.

NOTE: For ASE instances with HADR installed (a.k.a. SAP Replication Server) there are also an additional 4 log files that should be rotated, which I won’t go into in this post. (Hire me, and I’ll tell you.)

What Happens When ASE Rotates Those Logs?

When the ASE instance is restarted, some of the above standard logs are rotated and some are cleared down. For the ones that are rotated, there is no end to the rotation, they keep building up. They are never removed.

As part of your log rotation solution, you should also rotate the ASE rotated logs, making sure that they don’t build up over the years of ASE instance restarts (especially if starting/stopping systems frequently in The Cloud).

This is what happens to the auto-rotated logs:

  • <SID>.log is moved to <SID>.log.nnnnnnnn_nnnnnn
  • <SID>_BS.log is cleared down.
  • <SID>_JSAGENT.log is cleared down.

We can see that for those log files that are cleared down, a daily rotation should capture them, but it will be hit and miss. If the ASE instance is restarted part way through the day, then the morning logs will not have been preserved (unless you remember to manually run your log rotation). There’s not much we can do about this.

For the log that is auto-rotated by ASE (on restart), then we also need to capture that in our rotation schedule. We have a choice to maybe just compress it and only retain a specific number.

What Log Retention Works Best?

For production systems, I usually say 30 rotations is ideal (1 month).
It’s good to keep the logs so you can see when a problem may have started. Critically, this might help you decide how far back you may need to restore a database (in the dire event of corruption). If you keep 30 days of database backups, then having the <SID>_BS.log for 30 days might just help.

Rotation Schedule

With tools like logrotate, you can decide to rotate logs on a time basis, or even on a size basis.

Since SAP ASE automatically rotates the <SID>.log file on instance restart, it would get a bit messy if we also have a daily rotation on top of this.
Instead, it works a little bit better to rotate this file on a size basis. Keeping the <SID>.log within a defined file size means that you are able to prevent the log file from growing huge, but also reducing the number of rotations within rotations.

We’ve already mentioned that the ASE backup server log is quite important for helping us to understand when backups were taken and any issues during that backup. This is the reason we rotate this file daily, no matter what size it is.

The ASE job server is a little less critical, so we really don’t need it rotated every day. However it can grow quite big if the job server experiences a database space issue. So instead we can choose to rotate on size.

With the above in mind, this is what I would recommend for the ASE log files:

  • <SID>.log rotated on size >5MB with 30 rotations.
  • <SID>_BS.log rotated daily with 30 rotations.
  • <SID>_JSAGENT.log rotated on size>5MB with 30 rotations.

For ASE rotated copies of <SID>.log (called <SID>.log.nnnnnnnn_nnnnnn), it’s best to retain something like 10 rotations for production systems.
This captures the logs from the last 10 restarts. If you are restarting the ASE instance every day, then you may wish to retain more than 10.

Log File Truncation

During the rotation process, you have an option to keep the existing log file or truncate it. My preference is to truncate the original log file, to prevent the log from growing.
If you don’t truncate the original, then you will have a cumulative collection of logs for 30 days, with each day’s copy being bigger than the last and including the previous day’s content.

The exception to this rule is obviously the ASE auto-rotated copies of the <SID>.log. We don’t want to be truncating that at all.

Log File Compression

After the copy of the logs has been taken, the copy can then be compressed. This makes complete sense, because text files (these are ASCII text logs) compress really, really well. You should expect ~90% compression ratios.

Just make sure that you choose a compression tool/utility that has an easy method to access the compressed contents.

For example, in Linux I can use gzip. Then if I need to scan the log for a particular error I can use zcat and pipe through grep to check the contents (like this: “zcat <theGZlog>|grep error”).
I would not want to first uncompress the log, then go through it. That would be painful.

On Windows, I would need to use PowerShell to stream the zipped contents and pattern match. So a script or a separate utility would be needed to help with this.

Log Copy Naming

Once the log file has been copied to a new file and compressed, the name of the copied log file should be appended with the date time stamp.
This allows you to easily identify when a log file was copied.
Just bear in mind that if you run your log rotation at exactly midnight, the date time stamp will be for the current day, but the log contents will be for the previous day!

The exception to this rule is the ASE auto-rotated copies of <SID>.log.
These copies already have a date time stamp on the end, so we don’t need another one.

Sample Log Rotation Config

Here’s a sample logrotate configuration file that includes all that we have mentioned above.
As part of the logrotate setup we have a designated “status” file which is used by logrotate to record its current state of play, in preparation for its next execution.

The call to logrotate looks like this: logrotate --state <state-file> <config-file>
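On Linux, a cron entry is the simplest way to schedule the daily run. A sketch; the paths are examples only and should be adjusted to your environment:

# Sketch crontab entry: run logrotate daily at 23:55 with our own state file.
55 23 * * * /usr/sbin/logrotate --state /sybase/logrotate/ase_logrotate.status /sybase/logrotate/ase_logrotate.conf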

# logrotate configuration file. 
# This file defines the config that logrotate uses.
################################################ 
# The global defaults are established first.
# Compress all files using default gzip -9. 
compress
# Don't rotate the file if it's empty. 
notifempty
# Ignore if missing. 
missingok
# Add date as extension instead of just a number. 
dateext
################################################
# Each section below is specific to a log file match. 
# The contents of the section can override the global defaults.
#
# The ASE dataserver log. 
/sybase/[A-Z][0-9][A-Z]/ASE-[0-9][0-9]*/install/[A-Z][0-9][A-Z].log { 
# Truncate the original log once rotated. 
copytruncate 
# Permit 30 rotations. 
rotate 30 
# Rotate the file based on size. 
size 5M 
}
#
# The auto-rotated ASE dataserver log after a restart of ASE. 
/sybase/[A-Z][0-9][A-Z]/ASE-[0-9][0-9]*/install/[A-Z][0-9][A-Z].log.[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9] { 
# We just zip this file. 
# We don't truncate it. 
nocopytruncate 
# We don't copy it. 
nocopy 
# We don't add the date on the end, it's already there. 
nodateext
# Check daily for these files. 
daily 
# Permit 10 rotations = 10 restarts of ASE. 
rotate 10 
}
#
# The ASE backup server log. 
/sybase/[A-Z][0-9][A-Z]/ASE-[0-9][0-9]*/install/*_BS.log { 
# Truncate the original log once rotated. 
copytruncate 
# Permit 30 rotations. 
rotate 30 
# Rotate the file daily. 
daily 
}
#
# The ASE jsagent server log. 
/sybase/[A-Z][0-9][A-Z]/ASE-[0-9][0-9]*/install/*_JSAGENT.log { 
# Truncate the original log once rotated. 
copytruncate 
# Permit 30 rotations. 
rotate 30 
# Rotate the file based on size. 
size 5M 
}

The above is just the start.
As I mentioned right at the beginning, if you have an ASE instance with HADR configured, then you should also include the 4 other log files (1 for RMA and 3 for RepServer).

There is also the issue of the <SID>.cfg configuration file which is copied to <SID>.nnnnnnnnn on each ASE instance configuration change. You can have a lot of these files build up.
It’s good practice to include this file in your backup regime, but you can also include it in a rotation schedule and remove the ASE created copies.

Executing File Removal

In the logrotate configuration file, in every log rotation file block, it is possible to include a script block that gets executed every time a successful file rotation is performed.
In the logrotate manual, look at the “lastaction” block and include the removal of old dumphist files in there using the “rm” command.
A word of warning when using these script blocks: logrotate can pass the path and name of the rotated file as the 1st parameter (i.e. as “$1”). I have found that in my specific version it includes a space on the end. So rotating “somefile.ext” and then having a script block in “lastaction” with a pattern like “rm $1.[0-9][0-9][0-9]” would actually remove the file passed in as “$1” (somefile.ext), plus anything that matches the now-separate glob pattern “.[0-9][0-9][0-9]”. Quite problematic and dangerous.
Instead, I found it more reliable to code the file pattern manually, just like in the main part of the file block; see the sketch below.
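Here is a hedged sketch of such a block for the dumphist copies, following the path conventions used earlier in this post:

# Sketch: rotate dumphist and clear down the ASE-created dumphist.nnnnnnnnn copies.
/sybase/[A-Z][0-9][A-Z]/ASE-[0-9][0-9]*/dumphist {
# Truncate the original once rotated.
copytruncate
# Keep 30 rotations, checked daily.
rotate 30
daily
lastaction
# Code the pattern explicitly instead of relying on "$1".
rm -f /sybase/[A-Z][0-9][A-Z]/ASE-[0-9][0-9]*/dumphist.[0-9]*
endscript
}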

Rotation Fun with Windows

Contrary to the header, it’s not fun to try and rotate logs on Windows.
There are generally 2 issues I’ve found:

  1. Windows locks files and can cause issues with log file truncation unless you use a utility able to handle this (LogRotateWin can).
  2. There is no built-in log rotation utility on Windows.

If you have a heterogeneous landscape you generally do not want to be maintaining too many different tools and configurations.
I needed to rotate ASE logs on Linux and Windows, so I found that using the free LogRotateWin worked reasonably well. It is mostly compatible with the same configuration options as the Linux version.
It does not support all the configuration settings of the Linux version; for example, “notifempty” is not supported and throws an error in the debug output.
It also does not handle pattern matching properly. What it seems to do is traverse the whole directory structure from the moment it sees a wildcard, looking for files at each branch, instead of following my specified pattern.
An example being pattern: F:\sybase\???\ASE-16_0\???.log.
The LogRotateWin tool seems to look through all directories under “F:\sybase”, trying to match *files* and folders with wildcard “???”, when it is clearly a folder that I have specified.
For this reason, I actually found it far more reliable to use PowerShell to transform an existing configuration file, finding and replacing those wildcards with the actual file names.
I used PowerShell to find the files I needed to rotate and adjust a temporary version of the LogRotateWin configuration.
In fact this worked so well that I went a step further! I created some PowerShell to download my Linux logrotate configuration file and adjust it to make it LogRotateWin compatible: swapping the slash directions, pre-finding the files (by traversing the system drives) and replacing the wildcards, then also removing unsupported features like “notifempty”.
It also swapped my Linux “rm” syntax with a call to “pwsh” (PowerShell) and the short-hand call to the “remove-item” (rm) command-let.
The PowerShell script then calls the LogRotateWin with the temporary configuration file.

Not fun, but it works and it means I only have 1 configuration file I need to worry about.

Summary:

A good log rotation strategy for SAP ASE databases is important to maintain the ASE text log files effectively.
It’s not just about disk space management and housekeeping, but it is also about retaining vital information to help resolve issues and having easy and quick access to that information.

Maintaining a consistent approach across heterogeneous landscapes is not easy, and may require you to be creative in your method.
You should seek to harmonise configuration as much as possible and reduce maintenance effort.

As part of the ASE log rotation configuration, you should also look to include the dumphist file and also the HADR log files (from SRS and RMA).

A final comment: in some cases, I’ve witnessed an ASE system error that caused thousands of retries of “sp_configure”, which in turn created thousands of versions of the <SID>.cfg file. So many, that it took multiple executions of “rm” to remove them.
You may wish to look at including these <SID>.cfg file copies in a rotation configuration as a preventative measure.

Best Disk Topology for SAP ASE Databases on Azure

Maybe you are considering migration of on-premise SAP ASE databases to Microsoft Azure, or you may be considering migrating from your existing database vendor to SAP ASE on Azure.
Either way, you will benefit from understanding a good, practical disk topology for SAP ASE on Azure.

In this post, I show how you can optimise use of the SAP ASE, Linux and Azure technical layers to provide a balanced approach to disk use, considering both performance and disk (ASE device) management.

The Different Layers

In an ASE on Linux on Azure (IaaS) setup, you have the following layers:

  • Azure Storage Services
  • Azure Data Disk Cache Settings
  • Linux Physical Disks
  • Linux Logical Volumes
  • Linux File Systems
  • ASE Database Data Devices
  • ASE Instance

Each layer has different options around tuning and setup, which I will highlight below.

Azure Storage Services

Starting at the bottom of the stack, we need to consider the Azure Disk Storage that we wish to use.
There are 2 design considerations here:

  • size of disk space required.
  • performance of disk device.

For performance, you are more than likely tied by the SAP requirements for running SAP on Azure.
Currently these require a minimum of Premium SSD storage, since it provides a guaranteed SLA. However, as of June 2020, Standard SSD was also given an SLA by Microsoft, potentially paving the way for cheaper disk (when certified by SAP) provided that it meets your SLA expectations.

Generally, the size of disk determines the performance (IOPS and MBps), but this can also be influenced by the quantity of data disk devices.
For example, by using 2 data disks striped together you can double the available IOPS. The IOPS are an important factor for databases, especially on high throughput database systems.

When considering multiple data disks, you also need to remember that each VM has limitations. There is a VM level IOPS limit, a VM level throughput limit (megabytes per second) plus a limit to the number of data disks that can be attached. These limit values are different for different Azure VM types.

Also remember that in Linux, each disk device has its own set of queues and buffers. Making use of multiple Linux disk devices (which translates directly to the number of Azure data disks) usually means better performance.

Essentials:

  • Choose minimum of Premium SSD (until Standard SSD is supported by SAP).
  • Spread database space requirements over multiple data disks.
  • Be aware of the VM level limits.

Azure Data Disk Cache Settings

Correct configuration of the Azure data disk cache settings on the Azure VM can help with performance and is an easy step to complete.
I have already documented the best practice Azure Disk Cache settings for ASE on Azure in a previous post.

Essentials:

  • Correctly set Azure VM disk cache settings on Azure data disks at the point of creation.

Use LVM For Managing Disks

Always use a logical volume manager, instead of formatting the Linux physical disk devices directly.
This allows the most flexibility for growing, shrinking and striping the disks for size and performance.

You should stripe the data logical volumes with a minimum of 2 physical disks and a maximum stripe size of 128KB (test it!). This fits within the window of testing that Microsoft have performed in order to achieve the designated IOPS for the underlying disk. It’s also the maximum size that ASE will read at. Depending on your DB read/write profile, you may choose a smaller stripe size such as 64KB, but it depends on the amount of Large I/O and pre-fetch. When reading the Microsoft documents, consider ASE to be the same as MS SQL Server (they are from the same code lineage).

Stripe the transaction log logical volume(s) with a smaller stripe size, maybe start at 32KB and go lower but test it (remember HANA is 2KB stripe size for log volumes, but HANA uses Azure WriteAccelerator).
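As a minimal sketch of the above, assuming two Azure data disks that appear in Linux as /dev/sdc and /dev/sdd (the device names and the volume/LV names are examples only):

# Sketch: stripe two data disks into one data volume with a 128KB stripe size.
pvcreate /dev/sdc /dev/sdd
vgcreate vg_sapdata /dev/sdc /dev/sdd
lvcreate --name lv_sapdata<SID>_1 --stripes 2 --stripesize 128k --extents 100%FREE vg_sapdata

For a transaction log volume, the same commands apply with a smaller --stripesize value.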

Essentials:

  • Use LVM to create volume groups and logical volumes.
  • Stripe the data logical volumes with (max) 128KB stripe size & test it.

Use XFS File System

You can essentially choose to use your preferred file system format; there are no restrictions – see SAP note 405827.
However, if you already run or are planning to run HANA databases in your landscape, then choosing XFS for ASE will make your landscape architecture simpler, because HANA is recommended to run on an XFS file system (when on local disk) on Linux; again see SAP note 405827.

Where possible, you should explicitly disable Linux file system write barriers, because Azure will be handling the caching for you.
In SUSE Linux this is the “nobarrier” setting in the mount options of the XFS partition; for EXT4 partitions it is the option “barrier=0”.
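To make this concrete, here is a sketch of formatting and mounting a data volume with barriers disabled, reusing the example LV from earlier (the mount point follows the layout shown later in this post):

# Sketch: format with XFS and mount with write barriers disabled (SUSE).
mkfs.xfs /dev/vg_sapdata/lv_sapdata<SID>_1
mkdir -p /sybase/<SID>/sapdata_1
mount -o nobarrier /dev/vg_sapdata/lv_sapdata<SID>_1 /sybase/<SID>/sapdata_1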

Essentials:

  • Choose disk file system wisely.
  • Disable write barriers.

Correctly Partition ASE

For SAP ASE, you should segregate the disk partitions of the database, to prevent certain system-specific databases or logging areas from filling other disk locations and causing a general database system crash.

If you are using database replication (maybe SAP Replication Server a.k.a HADR for ASE), then you will have additional replication queue disk requirements, which should also be segregated.

A simple but flexible example layout is:

Volume Group | Logical Volume    | Mount Point   | Description
vg_ase       | lv_ase<SID>       | /sybase/<SID> | For ASE binaries.
vg_sapdata   | lv_sapdata<SID>_1 | ./sapdata_1   | One for each ASE device for the SAP SID database.
vg_saplog    | lv_saplog<SID>_1  | ./saplog_1    | One for each log device for the SAP SID database.
vg_asedata   | lv_asesec<SID>    | ./sybsecurity | ASE security database.
vg_asedata   | lv_asesyst<SID>   | ./sybsystem   | ASE system databases (master, sybmgmtdb).
vg_asedata   | lv_saptemp<SID>   | ./saptemp     | The SAP SID temp database.
vg_asedata   | lv_asetemp<SID>   | ./sybtemp     | The ASE temp database.
vg_asedata   | lv_asediag<SID>   | ./sapdiag     | The ASE saptools database.
vg_asehadr   | lv_repdata<SID>   | ./repdata     | The HADR queue location.
vg_backups   | lv_backups<SID>   | ./backups     | Disk backup location.

The above will allow each disk partition usage type to be separately expanded, but more importantly, it allows specific Azure data disk cache settings to be applied to the right locations.
For instance, you can use read-write caching on the vg_ase volume group disks, because that location is only for storing binaries, text logs and config files for the ASE instance. The vg_asedata contains all the small ASE system databases, which will not use too much space, but could still benefit from read caching on the data disks.

TIP: Depending on the size of your database, you may decide to also separate the saptemp database into its own volume group. If you use HADR you may benefit from doing this.

You may not need the backups disk area if you are using a backup utility, but you may benefit from a scratch area of disk for system copies or emergency dumps.

You should choose a good naming standard for volume groups and logical volumes, because this will help you during the check phase, where you can script the checking of disk partitioning and cache settings.

Essentials:

  • Segregate disk partitions correctly.
  • Use a good naming standard for volume groups and LVs.
  • Remember the underlying cache settings on those affected disks.

Add Whole New ASE Devices

Follow the usual SAP ASE database practices of adding additional ASE data devices on additional file system partitions sapdata_2, sapdata_3 etc.
Do not be tempted to constantly (or automatically) expand the ASE device on sapdata_1 by adding new disks, you will find this difficult to maintain because striped logical volumes need at least 2 disks in the stripe set.
It will get complicated and is not easy to shrink back from this.

When you add new disks to an existing volume group and then expand an existing lv_sapdata<SID>_n logical volume, it is not as clean as adding a whole new logical volume (e.g. lv_sapdata<SID>_n+1) and then adding a whole new ASE data device.
The old problem of shrinking data devices is more easily solved by being able to drop a whole ASE device, instead of trying to shrink one.

NOTE: The Microsoft notes suggest enabling automatic DB expansion, but on Azure I don’t think it makes sense from a DB administration perspective.
Yes, by adding a new ASE device, as data ages you may end up with “hot” devices, but you can always move specific devices around and add more underlying disks and re-stripe etc. Keep the layout flexible.
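To illustrate the approach, here is a sketch of adding a whole new ASE device on a new sapdata file system and extending the database onto it (the device name, password and sizes are examples only):

# Sketch: add a new ASE device on sapdata_2 and extend the SID database.
isql -Usapsa -P<password> -S<SID> <<EOF
disk init name = '<SID>_data_2',
physname = '/sybase/<SID>/sapdata_2/<SID>_data_2.dat',
size = '100G'
go
alter database <SID> on <SID>_data_2 = '100G'
go
EOF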

Essentials:

  • Add new disks to new logical volumes (sapdata_n+1).
  • Add big whole new ASE devices to the new LVs.

Summary:

We’ve been through each of the layers in detail and now we can summarise as follows:

  • Choose a minimum of Premium SSD.
  • Spread database space requirements over multiple data disks.
  • Correctly set Azure VM disk cache settings on Azure data disks at the point of creation.
  • Use LVM to create volume groups and logical volumes.
  • Stripe the logical volumes with (max) 128KB stripe size & test it.
  • Choose disk file system wisely.
  • Disable write barriers.
  • Segregate disk partitions correctly.
  • Use a good naming standard for volume groups (and LVs).
  • Remember the underlying cache settings on those affected disks.
  • Add new disks to new logical volumes (sapdata_n+1).
  • Add big whole new ASE devices to the new LVs.


SAP ASE Error – Process No Longer Found After Startup

This post is about a strange issue I was hitting during the configuration of SAP LaMa 3.0 to start/stop a SAP ABAP 7.52 system (with Kernel 7.53) that was running with a SAP ASE 16.0 database.

During the LaMa start task, the task would fail with an error message: “ASE process no longer found after startup. (fault code: 127)“.

When I logged directly onto the SAP server Linux host, I could see that the database had indeed started up, eventually.
So what was causing the failure?

The Investigation

At first I thought this was related to the Kernel, but having checked the versions of the Kernel components, I found that they were the same as another SAP system that was starting up perfectly fine using the exact same LaMa system.

The next check I did was to turn on tracing in the hostagent itself. This is a simple task of setting the trace value to “3” in the host_profile of the hostagent and restarting it:

service/trace = 3

The trace output is written to a number of different trace files in the work directory of the hostagent, but the trace file we are interested in is called dev_sapdbctrl.

The developer trace file for the sapdbctrl binary executable is important, because the sapdbctrl binary is executed by SAP hostagent (saphostexec) to perform the database start. If you observe the contents of the sapdbctrl trace output, you will see that it loads the Sybase specific shared library which contains the required code to start/stop the ASE database.

The same sapdbctrl also contains the ability to load the required libraries for other database systems.

As a side note, it is still not known to me how the Sybase shared library comes to exist in the hostagent executable directory. When SAP ASE is patched, this library must also be patched; otherwise, how does the hostagent stay in step with the ASE database that it needs to talk to?

Once tracing was turned on, I shut the SAP ASE instance down again and then used SAP LaMa to initiate the SAP system start once again.
Sure enough, the LaMa start task failed again.

Looking in the trace file dev_sapdbctrl I could see the same error message that I was seeing in SAP LaMa:

Error: Command execution failed. : ASE process no longer found after startup. 
(fault code: 127) Operation ID: 000D3A3862631EEAAEDDA232BE624001
----- Log messages ---- 
Info: saphostcontrol: Executing StartDatabase 
Error: sapdbctrl: ASE process no longer found after startup. 
Error: saphostcontrol: StartDatabase failed (sapdbctrl exit code = 1)

This was great. It confirmed that SAP LaMa was just seeing the symptom of some other issue, since LaMa just calls the hostagent to do the start.

Now I knew the hostagent was seeing the error, I tried using the hostagent directly to perform the start, using the following:

/usr/sap/hostctrl/exe/saphostctrl -debug -function StartDatabase -dbname <SID> -dbtype syb -dbhost <the-ASE-host>

NOTE: The hostagent “-debug” command line option puts out the same information without the need for the hostagent tracing to be turned on in the host_profile.

Once again, the start process failed and the same error message was present in the dev_sapdbctrl trace file.

This was now really strange.
I decided that the next course of action was to start the process of raising the issue with SAP via an incident.
If you suspect that something could take a while to fix, then it’s always best to raise it with SAP early and continue to look at the issue in parallel.

Continuing the Diagnosis

While the SAP incident was in progress, I continued the process of trying to self-diagnose the issue further.
I tried a couple more things such as:

  • Starting and stopping SAP ASE manually using stopdb/startdb commands.
  • Restarting the whole server (this step has a place in every troubleshooting process, eventually).
  • Checking the server patch level.
  • Checking the environment of the Linux user, the shell, the profile files, the O/S limits applied.
  • Checking what happens if McAfee anti-virus was disabled (I’ve seen the ePO blocking processes before).

Eventually exhaustion set in and I decided to give the SAP support processor time to get back to me with some hints.

Some Sleep

I spend a lot of time solving SAP problems. A lot of time.
Something doesn’t work according to the docs, something did work but has stopped working, something has never worked well…
It builds up in your mind and you carry this stuff around in your head.
Subconsciously you think about these problems.

Then, at about 3am when you can’t get back to sleep, you have a revelation.
The hostagent forks the process that starts the database as the syb<sid> Linux user (it uses “su”), from the root user (the hostagent runs as the root user).

Linux Domain Users

The revelation I had regarding the forking of the user was just the trigger I needed to make me consider the way Linux authentication was set up on this specific server with the problem ASE startup.

I remembered that at the beginning of the project I had hit an issue with the SSSD Linux daemon, which is responsible for interfacing between Linux and Microsoft Active Directory. At that time, the issue was causing the hostagent to hang when operations were executed that required a switch to another Linux user.
That previous issue was actually a hostagent issue, fixed in a later hostagent patch. During that particular investigation, I requested that the Linux team re-configure the SSSD daemon to be more efficient with its Active Directory traversals, when looking to see if the desired user account was local to Linux or was a domain account.

With this previous issue in mind, I checked the SSSD configuration on the problem server. This is a simple conf file in /etc/sssd.

The Solution

After all the troubleshooting, the raising of the incident, the sleeping, I had finally got to the solution.

After checking the SSSD daemon configuration file /etc/sssd/sssd.conf, I could clearly see that there was one entry missing compared to the other servers that didn’t experience the SAP ASE start error.

The parameter “subdomain_enumerate = none” was missing.
Looking at the manual page for SSSD, it would seem that without this parameter there is additional overhead during any Active Directory traversal.
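For reference, this is roughly what the relevant fragment looks like (the domain section name is site specific, so treat this as a sketch):

# /etc/sssd/sssd.conf (fragment)
[domain/YOUR.AD.DOMAIN]
subdomain_enumerate = none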

I set the parameter accordingly in the /etc/sssd/sssd.conf file and restarted the SSSD daemon using:

service sssd restart

Then I retried the start of the database using the hostagent command shown previously.
It worked!
I then retried with SAP LaMa and that also now started ASE without error messages.

Root Cause

What seems to have been happening was some sort of internal pre-set timeout in the sapdbctrl binary; when hit, sapdbctrl just abandons the operation and throws the error that I was seeing. This leaves the ASE database to continue and start (the process was already initiated), but to the hostagent it looked like the start had failed.
By adding the “subdomain_enumerate = none” parameter, any “delay” caused by inappropriate calls to Active Directory was massively reduced, and subsequent start activities were successful.