This blog contains experience gained over the years of implementing (and de-implementing) large scale IT applications/software.

SAP ASE HADR Overview – Part3

In this multi-part post, I’m explaining the basics behind how SAP Replication Server works when replicating from a SAP ASE database to a SAP ASE database as part of an HADR (ASE always-on) setup for a SAP Business Suite system.
The post will be based on SAP ASE (Adaptive Server Enterprise) 16.0 HADR with SAP Replication Server (SRS) 16.0.

In Part 1 we started with:

  • What is SRS.
  • The basic premise of HADR with SRS.
  • What a transaction is.

In Part 2 we went on to discuss:

  • What is the ASE transaction log.
  • Which databases are replicated.
  • How do transactions move to the SRS.

In this part we discuss the role of the active SRS and how the internal processing moves the transactions into the secondary ASE database.

What is the Active SRS?

In a standard HADR setup with primary and secondary databases, there are two instances of SRS.

  • The inactive SRS is running on the primary database node.
  • The active SRS is running on the secondary database node.

The active SRS receives the transactions (commands) from the Replication Agents in the primary databases.

To those with DB2 or Oracle experience, this replication hop seems strange at first. On closer inspection it achieves the same desired result: the transactional data is successfully persisted on a server separate from the primary database.

The inactive SRS is unused until a failover occurs.
During a failover the inactive SRS, on the old primary server, can switch replication paths to become the active SRS; the inactive SRS therefore provides the reverse path of replication.
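
A quick way to confirm which replication path (and therefore which SRS) is currently active is to query the Replication Management Agent (RMA), sometimes called the DR agent, that forms part of the HADR setup. The command below is only a hedged sketch: it is run from an isql session connected to the RMA port, and the login, port and exact output columns depend on your installation and SP level.

sap_status path
go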

What are the Key Parts of SRS?

In my view, there are 5 key parts to the SRS 16.0 architecture.
At each of these stages the data is persisted.

  1. The primary database.
    Sends transactions from the transaction logs, via the Replication Agent threads to the active SRS.
  2. The SRS Simple Persistent Queue (SPQ).
    This is a simple set of disk files for persisting the unordered transactions received on the SRS.
    In synchronous replication mode, once a replicated transaction is persisted in the SPQ, it is classified as “received” on the secondary which allows the transaction to commit on the primary database.
    A backlog on the SPQ could mean the underlying disk is not fast enough for the replication workload, or that the server hosting the active SRS is suffering CPU/IO saturation.
    If you have anti-virus installed, you should treat the SPQ disk just like you treat other database data files (i.e. exclude them from the A/V scanning).
  3. The SRS Stable Inbound Queue (IBQ).
    An ordered table of the replicated transaction commands that represent the open or final transactions.
    In the case of transactions that are rolled back, they are removed at the IBQ once the rollback command is seen (needs to come via the SPQ).
    There is one IBQ for each primary database and it only ever holds transactions from that database, never from another Rep Server.
    The SRS internals process the transactions on the IBQ that have a corresponding commit record (needs to come via the SPQ).
    These committed transactions are grouped/compacted, ordered and translated into the correct target language for the target database platform and moved to the respective outbound queue for the target databases.
    A backlog on the IBQ could mean the SRS internals may not be keeping up with the replication workload.
    It is important to re-state that transactions on the IBQ could be open and waiting for a rollback or commit, which means the SPQ needs space for the transactions that contain those commands to make it through to the IBQ.
  4. The SRS Stable Outbound Queue (OBQ).
    Committed transactions are moved from the IBQ onto the OBQ.
    The OBQ actually shares a portion of the same partition space as the IBQ, so moving between the IBQ and OBQ is very quick.
    There is one OBQ for each target database and one if the target is another SRS (in scenarios with a third DR node).
  5. The Target (a.k.a Standby or Secondary or Companion) Databases.
    The SRS applies the transactions from the OBQs to the respective target databases via its DSI (Data Server Interface) threads; internally, the distributor (DIST) threads are what move the committed transactions onto the OBQs.
    NOTE: In my diagrams I’ve positioned the DSI as slightly separate, but it is actually a module/component of the SRS.

    For scenarios with a DR node also, the target is the DR Rep Server.
    In the target databases you will see that the <SID>_maint user is used to apply the transactions.
    A backlog on the OBQ could indicate a problem with the performance of the target database (see the monitoring sketch just after this list).
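
If you want to see where a backlog is building up, the active SRS can be queried directly from an isql session connected to the SRS port. The sketch below is a minimal, hedged example using standard SRS admin commands; the exact output columns vary between SRS versions:

admin health
go
admin who, sqm
go
admin disk_space
go

Broadly speaking, “admin who, sqm” shows the stable queue manager threads and the segments in use for each inbound/outbound queue, while “admin disk_space” shows how full the stable device partitions are.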

Replication Synchronisation Modes

With ASE HADR there are 3 replication modes available. The different modes affect how transactions are treated.

In asynchronous replication mode (a.k.a “Warm Standby” or “DR”), the Replication Agent threads operate in a lazy way, scanning for and sending transactions to the active SRS in batches when they can.

There is a replication delay in asynchronous mode, but it does mean that the responsiveness of the primary database may seem better to the client applications (SAP Business Suite) because the commit to the primary database does not wait for the Replication Agent to send the transaction to the SRS.
Because of the inherent delay, there is a higher possibility of data-loss in a failover scenario than with the synchronous modes.
Even in asynchronous mode, the transaction log of the primary database cannot be freed until transactions are actually committed on the secondary (the STP cannot be moved until then).

When HADR is configured in synchronous replication mode (a.k.a “Hot Standby” or “HA”), each transaction is immediately sent by the replication agent to the SRS for persisting on disk at the SPQ.
Once safely in the SPQ, the primary database is allowed to commit the transaction.

This means synchronous replication mode has a direct impact on the responsiveness of the primary database transactions, because the commit on primary will be delayed by the Replication Agent + network transfer + memory + I/O latency for persisting to the SPQ of the SRS.

Lastly, near-synchronous mode works the same way as synchronous, but it is designed for slower disks hosting the SPQ on the SRS. This means that the SRS acknowledgement is sent back to the Replication Agent as soon as the transaction is received in memory, but before it is persisted to the SPQ files on disk.

Compared to synchronous mode, near-sync has a slightly higher possibility of data-loss, in exchange for a slight reduction in latency.

Open Transaction Handling

As mentioned previously, transactions are replicated from the primary almost as soon as they are started (as soon as the transaction log records the start of the transaction). This means “open” transactions are replicated as soon as they are started and flow all the way to the IBQ.

(See Part 1 for a description of “Open Transactions”)

If transactions were only replicated once they were committed on primary, there would be a large delay before that transaction was safely applied to the secondary database.
When enabled, the “early dispatch” feature (“parallel_dist”) even allows large open transactions (using specific SRS threads) to start applying to the secondary database early, once the commit record is seen at the SPQ.

What are the Implications of Replication Delay?

There are two main implications of delay (latency) in replication:

  1. Potential Data Loss.
    Any delay in the replication from primary to SRS introduces potential data-loss.
    This is because the aim of replication is to get the transactional data off the primary databases as soon as possible and safely onto the active SRS node (server) in another location. Any delay between the active SRS and the secondary database could mean data-loss but only if the SRS software itself is lost or corrupted.
    The ultimate goal is to get from the primary database to the active SRS as quickly as possible, with the second goal being to get from the primary database to the secondary database as quickly as possible.
    This opens up possibilities regarding architecture design and sizing of the secondary databases, which I will cover in another post.
  2. Response Time.
    Any delay in synchronous replication mode can also delay the commit of the transactions on the primary databases. In a Netweaver ABAP stack, this delay would be seen in the SAP Business Suite as a longer database response time in the Netweaver UPDATE work processes; therefore the delay is unlikely to be seen by the users themselves. In a Java stack, there is no such UPDATE work process, so the Java stack is likely to be more sensitive to longer response times.
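
If you need to put a number on the replication latency, SRS provides the rs_ticket mechanism: a small “ticket” is executed in the primary database, travels through the whole replication path and is recorded at the companion, collecting timestamps on the way. Below is a minimal, hedged sketch; it assumes the rs_ticket procedure exists in the primary <SID> database and that the rs_ticket_history table exists in the companion <SID> database, which depends on how your HADR setup was materialised:

-- run in the primary <SID> database: send a ticket through the replication path
rs_ticket 'ticket1'
go

-- run in the companion <SID> database: see when the ticket arrived
select * from rs_ticket_history
go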

In the next part (part 4 is here), we will step through the replication of a single transaction and discuss the impact of different scenarios along the way.

HowTo: Script ASE Configuration Check

Maybe you have lots of ASE database instances and you would like to ensure that they are all in alignment with regards to configuration.

In this post I show how to output a generic line of text for each configuration line in the ASE <SID>.cfg file.
The line includes the following items:

  • The filename (e.g. <SID>.cfg)
  • The section name of the file (e.g. [somesection])
  • The name of a parameter within the section.
  • The value of the parameter.

The output is in this format:
<filename>:<section>~<parameter> = <value>
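
For example, a single output line might look like this (the section and parameter names are real ASE configuration items, but the value is purely illustrative):

<SID>.cfg:[Physical Memory]~max memory = 4194304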

This allows you to combine multiple ASE instance configuration files into a single log file, which can then either be downloaded into Excel for value comparison across the landscape or you can use tools such as “grep” to compare individual parameters.

The Process

First you need to gather up all the configuration files from across your landscape and place them all in one directory:

<SID1>.cfg
<SID2>.cfg
etc

For the above, I can recommend a custom operation via HostAgent and either a website with upload capability, an FTP site, SCP script, shared NFS location or some other common shared area where you can upload the files from each server.

Now we execute the code against the directory where the configuration files have been collected.

Switch to bash or ksh:

ksh

Run the code (change “your_dir” for your config files location). Note that the script uses the three-argument form of awk’s match() function, so it needs GNU awk (gawk):

ls -1 /your_dir/???.cfg | while read file 
do 
   awk '/^\[/ { print $0 }' $file | sort | while read line 
   do 
      awk -v input="$line" '{ if ( $0 == input ) { getline; while ( $0 !~ /^\[/ ) { if(match($0,/\t(.*)$/,aval)) { if (length(lval)>0){ lval=lval"\n" }; lval=lval""FILENAME":"input"~"aval[1] }; if (! getline) break; }}}END{if (lval) print lval}' $file 
   done 
done

This will list all output to the screen.
To make it go to a new file, append “ > newfile.log” to the very end like this:

...
   done 
done > newfile.log

You will see inside newfile.log that you have a great file format for use with “grep”, and can query the value of parameters like so:

NOTE: Replace “<section>” and “<parameter>” with your query values.

grep ':<section>~<parameter> ='  newfile.log

The above will list all matching parameters in a specific section, for each of the ASE instances, for comparison on the screen.
You can use this information to align parameters across your landscape or just as a configuration check.

SAP ASE HADR Overview – Part2

In this multi-part post, I’m explaining the basics behind how SAP Replication Server works when replicating from a SAP ASE database to a SAP ASE database as part of an HADR (ASE always-on) setup for a SAP Business Suite system.
The post will be based on SAP ASE (Adaptive Server Enterprise) 16.0 HADR with SAP Replication Server (SRS) 16.0.

A recap of what we discussed in Part 1:

  • What is SRS.
  • The basic premise of HADR with SRS.
  • What a transaction is.

In this part we discuss the ASE transaction log and how important it is for the working of the database, then we discuss how transactions are replicated to the active SRS and touch on the internals of the SRS.

What is the ASE Transaction Log?

The ASE transaction log is the most active part of an ASE database.
When we talk about database ACID principles, the transaction log is what helps deliver those things. It is simply a log of transactions applied to a database.
The transaction log is where changes to the database are first made; it is the register used to ensure that the database can be recovered to a consistent point-in-time.
Some of the seven ASE databases hosted in an ASE 16.0 instance do not need this point-in-time recovery capability, so they have a special setting that simply truncates the records of the transaction log frequently. The setting/option is called “trunc log on chkpt” and should *never* be applied to the SID database (the BusSite database) in production.
The SID database holds our vital business data, and needs to be recoverable right up to the last transaction that had a final state (committed or rolled-back) applied on that database.

On a regular interval, data changes contained in the transaction log that have a final state are applied to the database data devices (the database’s main persistence).
This process is known as a checkpoint.
In those ASE databases with option “trunc log on chkpt” turned on, a checkpoint causes the used transaction log entries to be simply truncated (cleared), with no dump required.
In ASE databases with “trunc log on chkpt” turned off (like the SID database), after a checkpoint has occurred, the used transaction log segments are marked as freeable, and these freeable segments can become available for re-use only once a transaction log backup (dump) is performed to a backup disk or device.

This is a cycle that allows the database to continue to be used for making changes to the database data while also protecting the register of transactions in case of a database crash or even complete corruption.

If the transaction log ever becomes full, then the database is unable to allow new changes and no new transactions can be started until space is made available in the transaction log. It is therefore very important for regular database transaction log dumps to be performed, not only to free space in the transaction log, but also to protect the database in the event that the transaction log is lost or corrupted.
For added protection, these transaction dump file(s) can also be moved off to a separate location, away from the database server itself. This is a function of almost every backup tool and, as you will find, a requirement of almost every database that abides by the ACID principles.
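
As a quick illustration, the block below shows how you might check the database options and take a manual transaction log dump from isql. This is a hedged sketch: the dump device path is just a placeholder, and in most SAP installations the log dumps are driven by a scheduled backup tool rather than run by hand.

use master
go

-- the status column of sp_helpdb shows options such as "trunc log on chkpt"
sp_helpdb <SID>
go

-- back up (dump) the transaction log, which allows the freeable segments to be re-used
dump transaction <SID> to '/backup/<SID>_tlog.dmp'
go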

Which Databases are Replicated?

Each ASE instance hosts multiple ASE databases.
In a Business Suite system, the ASE 16.0 instance hosts 7 databases (plus 2 temporary databases and a model database):

  • Master
  • Sybsystem
  • Sybmgmtdb
  • Sybsecurity
  • Sybsystemprocs
  • Saptools
  • SID

In a HADR system, we usually only replicate the Master database and the SID database. This is because the master database holds vital information about the user logins, roles and privileges, which need to be the same in the target system. The SID database is the database for the SAP Business Suite system; it holds our business data, which we need to protect.

How do Transactions Move to the SRS?

Inside each replicated primary ASE database, there exists a Replication Agent that runs in its own thread.
The Replication Agent scans the transaction log of the database it serves and sends the transactions across to the active SAP Replication Server (SRS).
In the active SRS a matching receiver thread (Rep Agent User component thread) receives the transactions and pushes them into the Simple Persistent Queue (SPQ) on the disk of the active SRS server.

Subsequent components inside the active SRS process the transactions from the SPQ and eventually push them into the secondary database.
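
From the ASE side, you can check what the Replication Agent thread for a replicated database is doing using the RepAgent helper procedures. This is a hedged sketch; the available options and output vary slightly between ASE versions:

use master
go

-- show the RepAgent log scan status for the replicated database
sp_help_rep_agent <SID>, 'scan'
go

-- list the RepAgent configuration parameters for the same database
sp_config_rep_agent <SID>
go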

Part 3 is now available here.

SAP ASE HADR Overview – Part1

In this multi-part post, I’m going to attempt to explain (mainly for my own understanding – as usual), the basics behind how SAP Replication Server works when replicating from a SAP ASE database to a SAP ASE database as part of an HADR (ASE always-on) setup for a SAP Business Suite system.
The post will be based on SAP ASE (Adaptive Server Enterprise) 16.0 HADR with SAP Replication Server (SRS) 16.0.

‘And what is the use of a book,’ thought Alice, ‘without pictures or conversation?’
The quote is from the story, Alice in Wonderland, and I think you may find a picture is most definitely worth a thousand words when it comes to understanding the rabbit hole that is SAP Replication Server.

Hold onto your hats, it is about to get mighty bumpy!

What is SAP Replication Server (SRS)?

Replication Server was originally a Sybase product, ingested by SAP when it bought Sybase.
SAP Replication Server (SRS) is not database specific, it can support a number of other source and target database systems such as SAP ASE, SAP HANA, Oracle and SQLAnywhere. Because of this heterogeneous database support, SRS is quite a complex product, offering a multitude of replication scenarios.
In fact, the SRS product is the underpinning to a number of other SAP products such as SAP Landscape Transformation (SLT) and the Near-Zero Downtime (NZDT) option for database migrations.

SRS can be used to provide HA and/or DR for databases, but it can also be used to produce active-active setups with multi-regional replicas of databases for improved local access times.
This is somewhat of an exceptional case and most definitely rare with SAP Business Suite applications.

In this post, we will be using the simple example of just a primary and a secondary (companion) database.
This is known as either SAP ASE HADR or ASE always-on and is the most common use case. In SRS language our secondary database is referred to as the “companion” database.

With SRS it is also possible to have a three tiered architecture, primary, secondary and tertiary. In this three tier setup the secondary is known as the companion and the disaster recovery (DR) database is known as the “DR node” or tertiary database. You would usually use a three tier HADR architecture setup if you want HA in a primary datacentre or cloud region and also a separate DR in a secondary datacentre or cloud region.

There are multiple replication options: synchronous, near-synchronous or asynchronous, depending on your latency between source and target databases, your required RPO and also depending on your required use of HADR.

SRS is the recommended option for SAP ASE database replication.

What is the Basic Premise of HADR with SRS?

For SAP systems with SAP ASE HADR (always-on), the SRS provides replication of “transactions” from the source database(s) to secondary (companion) and/or DR database(s).

In a SAP landscape, the SAP system is configured to fail-over its connection to the companion in the event of a database failover. No cluster is needed for the database network connectivity, because the ASE database driver (dbsl) is “HA aware”.

Inside the primary database, a process called the Replication Agent is responsible for sending the transactions to the Replication Server on the companion database node. It is configured in Stream Replication mode, which is the only supported mode in HADR.
NOTE: Stream Replication is also known as “ci” (Component Interface) throughout the SRS administration manuals. There are many occasions where you will need to know this information.

In “ci” mode, the SRS proprietary language Log Transfer Language (LTL) is not used.
With “ci” mode, there are three possible synchronisation modes: synchronous, near-synchronous or asynchronous.

With SRS, the primary and secondary databases are NOT the same database (regarding layout, size, blocks), unlike HANA System Replication.
They are their own databases, requiring all the usual care and attention that would be applied to the primary, such as frequent transaction log backups, health checks etc.

With SRS enabled, database transactions that are started on the primary database are replicated to the secondary while still in the open state.

What is a Transaction?

A transaction is an ATOMIC unit of work with a beginning and an end, with work performed in between.


Each transaction can have one of two final states. It is either committed (saved) to the database, or it can be rolled-back (undone).

A transaction that is not yet in a final state is called an “open” transaction. Any transaction in the “open” state is still in progress and occupies space in the database transaction (tran) log. Used space in the tran log cannot be used by other transactions.
In a HADR system, the oldest open transaction executing on the primary database, is usually what is responsible for the position of the Secondary Truncation Point (STP) in the primary database.
The STP is a marker point placed into the primary database transaction log by the Replication Agent and is used to determine the current commit point in the companion database (i.e. it marks the oldest transaction that is not yet committed on the companion database).
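
On the primary ASE instance, one practical way to see both the oldest open transaction per database and the secondary truncation point is the master..syslogshold table. This is a hedged example; the truncation point normally shows up as a row with a name similar to “$replication_truncation_point”:

use master
go

-- one row per database for the oldest active transaction, plus a row for the
-- secondary truncation point of each replicated database
select dbid, spid, starttime, name
from master..syslogshold
go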

That’s it for part 1.
In part 2 we will go into the internals of SRS and how transactions are replicated through it.

HowTo: Call sp_dump_history with Purge in ASE 16.0 SP03

Due to a bug in ASE 16.0 SP03 PL08 (and maybe earlier versions) the sp_dump_history call does not work properly.
It just returns an empty screen.
When you look at the procedure code, you will see that it fails to find the tempdb for whatever reason.

We can use the sp_dump_history procedure to look at the dump history (backup history) for an ASE database.
From ASE 15.7 onwards we can also use it to purge the history, reducing the size of the dumphist file and in some cases improving the performance of backups themselves (see SAP note 2753062 for details on issues).

By prefixing the tempdb onto the call from the master database, you can get sp_dump_history to work as it should.

Below is an example that purges all backup records from the dump history file (dumphist) from before January 29th @ 15:00:00 :

use master
go

tempdb..sp_dump_history @operation='purge', @until_time="january 29,2020 15:00:00:00"
go
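
Before purging, you can also list the current dump history first, using the same tempdb prefix workaround (listing should be the default operation, though the available parameters vary by SP level):

use master
go

tempdb..sp_dump_history
go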

Tip: In Linux the correct date format can be obtained with the date command:

date "+%b %d,%Y %T"

As another tip, you can use the sp_helptext procedure to look at the source code of any stored procedures. Take a look on help.sap.com for the parameters for sp_helptext.