This blog contains experience gained over the years of implementing (and de-implementing) large scale IT applications/software.

HowTo: Extract SAP PI/PO Message Payload from SAP ASE DB

Sometimes you may need to directly the extract the SAP PO message payload from the underlying database tables such as BC_MSG_LOG in SAP ASE 16.0 database.
This could also potentially be called: extracting hex encoded ASCII data from an ASE image column. Because the SAP PO tables use an ASE image data type to store the payload as an off-row LOB.

There are plenty of examples for doing this extraction in Oracle, but in ASE it is not so easy because the message size could be larger than that supported by the page size of ASE (usually 16k in an ASE for BusSuite).
This means you won’t be able to store it into a T-SQL variable and use the ASE functions.

Instead, we can use the below simple approach to extract the raw hex, and then use Python 2 to convert it to ASCII:

1, Execute the selection SQL using isql at the Linux command prompt on the database server:

isql -USAPSR3DB -S<SID> -w999 -X

select MSG_BYTES
from [SAPSR3DB.BC_MSG_LOG]
where MSG_ID='<YOUR MSG ID>'
and DIRECTION='OUTBOUND'
and LOG_LOCATION='MS'

go

The output will consist of hexadecimal output, which starts with “0x” and should look something like this:

0x2d2d5341505f6

Copy & paste into a text file on the Linux server (use your favourite text editor) and call the file data.txt.

Edit the data.txt file and remove the first “0x” characters from the data.
Remove all newlines and carriage returns in the file.

Now create a simple Python script to read the data from our file data.txt and translate from hex to ASCII then print to the screen:

with open('data.txt', 'r') as file:
    data = file.read()
print data.decode('hex')

Run the Python script:

python ./myscript.py

The output should contain a header and a footer which start with:  “–SAP_”.
If you get an error from the Python script, then it could be because there are additional newlines or carriage returns in the data.txt file.

SAP ASE HADR Overview – Part5

In this multi-part post, I’m explaining the basics behind how SAP Replication Server works when replicating from a SAP ASE database to a SAP ASE database as part of an HADR (ASE always-on) setup for a SAP Business Suite system.
The post will be based on SAP ASE (Adaptive Server Enterprise) 16.0 HADR with SAP Replication Server (SRS) 16.0.

In Part 1 we started with:

  • What is SRS.
  • The basic premise of HADR with SRS.
  • What a transaction is.

In Part 2 we went on to discuss:

  • What is the ASE transaction log.
  • Which databases are replicated.
  • How do transactions move to the SRS.

In Part 3 we covered:

  • What is the Active SRS.
  • What are the Key Parts of SRS.
  • Replication Synchronisation Modes
  • Open Transaction Handling
  • What are the Implications of Replication Delay

In Part 4 we stepped through the replication of Bob’s data change and saw how the transactional data was replicated first to the SRS and eventually to the companion (secondary) database.

This is the last part of my ASE 16.0 HADR mini-series, and in this final part I will discuss possible issues that can impact an ASE 16.0 HADR system, which might be useful when planning operational acceptance testing.

Companion ASE Unavailable

When the companion (secondary) database is unavailable for whatever reason (undergoing maintenance, it’s broken or for other reasons), then the replicated transactions will still continue to move through the SRS until they get to the outbound queue (OBQ).
The assumption is that the active SRS (on same server as the companion DB) is still up and working.

In the OBQ the transactions will wait until the companion is available again.
As soon as the companion is available, the transactions will move from the OBQ into the companion.
The primary database will be unaffected during this time and transactions will continue through the SRS until the OBQ becomes full.

If the OBQ fills up, then transactions will start to accumulate in the inbound queue (IBQ).

If the companion database is down for a long period of time, you may need to make a decision:

  • Increase the stable queue partition space to hold more transactions.
    With the hope that the companion can be brought back online.
  • Disable replication, removing the Secondary Truncation Point (STP) from the primary database and acknowledging that the companion will need re-materialisation to bring it back in-sync with primary.

Inbound Queue Full

When the inbound queue becomes full, the transactions will start to build up into the simple persistent queue (SPQ).
You should note that the IBQ, by default, is only allowed to use up to 70% of the stable queue partition size. The rest is for the OBQ. So “full” is not actually 100% of the queue space.

There can be two common reasons for the IBQ to fill:

  1. The OBQ is also full due to an issue with the connection to the companion ASE database, or the companion is unavailable.
    or
  2. There is an open transaction in the IBQ and the SRS is waiting for the “commit” or “rollback” command to come through from the SPQ for the open transaction.

To resolve the full IBQ, you are going to need to establish which of the two issues is occurring.
An easy way to do this is to check the OBQ fill level.
If transactions are moving from the OBQ to the companion, then the issue is an open transaction.

If an open transaction has caused the IBQ to fill, then the “commit” or “rollback” transaction could now be stuck in the SPQ. Since there is no space in the IBQ, the SRS is also unable to process the SPQ records, which leaves the IBQ open transaction in a stale-mate situation.
You will need to make a decision:

  • Add more space to the stable queues to increase the IBQ size.
    or
  • Increase the proportion of stable queue size that the IBQ can use (if OBQ is empty).
    or
  • Zap (remove) the open transaction from the IBQ (will mean data-loss on companion so a rematerialise may be needed).

Normally, you can just add more space by adding another partition to the stable queues, hopefully resolve the issue, then remove the extra space again. How much is needed? Nobody will know.
However, if you have to zap the open transaction, then make sure you dump the queue contents out first, so you can see what DML was open, you can then make a decision on how the missing transaction will affect the companion database integrity (could negate the need for rematerialisation).

During this problematic period, the SPQ has remained functional, which has meant that the primary database has been able to continue to send transactions to the active SRS and therefore allowed it to continue to commit data in a timely manner. The primary database will have no issues.

Simple Persistent Queue Full

This is probably the most serious of the scenarios.
Once the SPQ becomes full, it immediately impacts the Replication Agent running in the primary ASE database.

Transactions are unable to move from the primary database transaction log to the active SRS.
You will start to get urgent warnings in the primary ASE database error log, informing you that the SPQ is full and that the buffers in the primary Replication Agent are full.

You will also see that the Replication Agent will be producing error messages and informing you that it has switched from synchronous to asynchronous replication mode.

The integrity of your production ASE database is now at risk!

It is possible you can add more space to the SPQ if you think you can resolve the problem in the IBQ and have the time to wait for the IBQ to empty!

You should note that this scenario has the symptoms if the active SRS is not working at all. If the SPQ is not available, then you need to troubleshoot the SRS error log. It’s also possible you may have issues with the Replication Agent in the primary ASE.

Primary Transaction Log Full

If your active SRS has now filled up to the SPQ, your primary ASE database is now at serious risk.
With the Replication Agent unable to move the Secondary Truncation Point (STP) then the transaction log of the primary ASE will start to fill up.
You will not be able to release the segments, even with a transaction log dump, because they are still needed by the Replication Agent.

You have the following options available:

  • Add more space to the transaction log by adding a new device (recommended, instead of expanding the existing device).
    This will give you more time to maybe fix the SRS.
    or
  • Disable replication, which removes the STP and allows the transaction log to be dumped. A rematerialise of the companion will be needed at a later date.
    or
  • If the transaction log is already full, then even trying to disable replication will not work (no transactions will be permitted).
    You will have to use DBCC to remove (ignore) the “LTM” (Last Truncation Marker), which will have a similar outcome to disabling replication.

At this point, if all else fails and your transaction log is full DO NOT RESTART THE PRIMARY ASE!
If you restart the primary ASE with a full transaction log, then you will be in a world of pain trying to get it to start up again.
You need to stop all processing (it won’t be working anyway), then try and resolve the issue.

Summary

This concludes the 5 part mini-series on SAP ASE 16.0 HADR.
I hope it has given you enough of an overview to be able to explain a typical setup, the way replication occurs and the most common issues that put your production ASE database at risk.

Feel free to contact me with any questions and feedback.

SAP ASE HADR Overview – Part4

In this multi-part post, I’m explaining the basics behind how SAP Replication Server works when replicating from a SAP ASE database to a SAP ASE database as part of an HADR (ASE always-on) setup for a SAP Business Suite system.
The post will be based on SAP ASE (Adaptive Server Enterprise) 16.0 HADR with SAP Replication Server (SRS) 16.0.

In Part 1 we started with:

  • What is SRS.
  • The basic premise of HADR with SRS.
  • What a transaction is.

In Part 2 we went on to discuss:

  • What is the ASE transaction log.
  • Which databases are replicated.
  • How do transactions move to the SRS.

In Part 3 we covered:

  • What is the Active SRS.
  • What are the Key Parts of SRS.
  • Replication Synchronisation Modes
  • Open Transaction Handling
  • What are the Implications of Replication Delay

In this penultimate part we step through the process of replication for an individual transaction, looking at how each of the previously discussed components plays it part along the way.

Step 1 -Begin Transaction

In the first step, our SAP Business Suite application is being used by Bob, our end-user.
Bob has a screen open and saves a change to an item of data.
The Business Suite application calls the Netweaver application stack to persist the change to the data according to the application’s dictionary model.
The dictionary dictates the relationship of the business object (e.g. an invoice) to the relational database tables.
The Netweaver code uses the SAP Kernel supplied libraries to save the change to the necessary system database tables.

There could be many tables involved in Bob’s one change.
It would not be right to update just one table without also updating the others that are related. Otherwise we would not have consistency.
For this reason, a single database transaction can include many data manipulation language (DML) statements.
By grouping the DML statements for the related table updates into a single database transaction, the SAP Kernel can enforce consistency.

In our example, our transaction will include updates to 2 different tables: tableA and tableB.
To designate the start of the database transaction, the SAP Kernel calls the database library to “BEGIN TRANSACTION”.

The affect of the database call to “BEGIN TRANSACTION”, is that a new transaction log record is OPENened:

Step 2 – Replication Agent – Open Transaction

Once a transaction has started (opened), the ASE HADR Replication Agent will see it.
Remember, we discussed the Replication Agent in Part 2.

The Replication Agent sends the transaction data across the network to the target SAP Replication Server (SRS). It knows where to send the data because of the configuration applied to the Replication Agent during the HADR setup process.

The SRS receives the data from the Replication Agent and writes it to the Simple Persistent Queue (SPQ), then sends an acknowledgement back to the Replication Agent.

Step 3a – Update Tables – DML

So far, all that has happened is a new transaction has been opened.
Now Netweaver will apply the required DML to the transaction:

UPDATE tableA SET column1=”ABC”
UPDATE tableB SET column1=”123″

The Kernel will apply the required DML to the opened transaction, this will also update the transaction log, which will be seen by the Replication Agent and sent across to the SRS as before.

You will notice that at this point, we are still using transaction log space, but we are also consuming space in the SPQ.

At this step, if one of the required “UPDATE” statements was to fail, then the whole transaction could be cancelled (a rollback) and no changes would be permanently made to any of the tables.
This is one of the requirements of the ACID principles.

Step 3b – The SRS Inbound Queue

At the same time as the DML is being applied to the open transaction in step 3a, the SRS continues to process the open transaction.

Inside the SRS, there are various component modules that process the incoming transactions from the SPQ, placing them in the correct order and compacting them (grouping) into larger, more efficient transactions.
Once this initial processing has completed, the new transaction is placed into the inbound queue (IBQ).

Something you will notice, is that we now have consumed space in:

  • Primary database transaction log.
  • Simple Persistent Queue.
  • Inbound Queue.

Once the transaction is safely persisted into the IBQ, the record in the SPQ is now free for re-use.

Step 4 – End Transaction

In steps 3a and 3b, we see the opening transaction record and the DML move across to the SRS.
At this point in time, Bob’s changes are still not replicated to the companion (secondary/standby) database.
In fact, Bob’s changes are not even visible to other users of the primary database, because Bob’s changes are not yet committed.

Once all the DML in Bob’s transaction has been applied at the primary database successfully (still not committed), then the SAP Kernel can issue the “END TRANSACTION”.
This signifies that this group of changes are finished.
After the “END TRANSACTION”, the SAP Kernel can issue one of two things; a “COMMIT” or a “ROLLBACK”.
In our case, the Kernel issues a “COMMIT”.

The “COMMIT” on the primary database is now performed, at the same time, the “COMMIT” record is also sent by the Replication Agent to the SPQ of the SRS.

I can hear the DB experts gasp at this point!
Yes, in Part 3 I mentioned that the commit is not allowed on the primary until after the Replication Agent has successfully sent the commit to the SPQ. In actual fact, this is not a hard and fast rule. The Replication Agent will attempt to send the commit record to the SPQ; it will wait for a given amount of time, before switching to asynchronous replication mode (see Part 3 for a description of this mode).
The commit to the primary database is therefore allowed to happen, even if it has not yet been acknowledged by the SPQ. This is the trade-off between performance and protection. The HADR solution has flexibility that allows a configurable amount of replication delay before synchronous replication is switched to asynchronous.

The acknowledgement that the “COMMIT” record has been successfully stored in the SRS SPQ, allows the primary database Replication Agent to move the Secondary Truncation Point (STP) forward and release the transaction log record, which allows it to be freed when the next “DUMP TRANSACTION” (transaction log backup) is performed.

The primary database data change becomes visible to all users of the database.
Bob’s screen returns a message telling him that his data is saved.

Step 5 – The SRS Outbound Queue

Inside the SRS IBQ, all of our transaction is now complete, we have a “BEGIN” and an “END”, a “COMMIT” and some DML in between that contains the updates to the required tables.

Once the “COMMIT” record is seen by the Inbound Queue (IBQ) of the SRS, then the SRS will process the re-packaged transaction from the IBQ to the Outbound Queue (OBQ).

This processing could involve adjusting the SQL language used, if the target database is not ASE.

From the OBQ, the Data Server Interface (DSI) component of the SRS, applies the transaction to the target database.

Finally, the replicated transaction data is applied to the target secondary database and harmony is achieved.

Bob’s View

Throughout this whole replication cycle, Bob had no idea that his data was being replicated to the target secondary database hundreds of kilometres away.
The response time of the database to the SAP Kernel was only slightly impacted by the addition of the Replication Agent processing time, plus the network transfer time to the active SRS, plus the processing and persistence time to the SPQ of the SRS.

As well as the response time, we noticed how the storage requirements of the SRS are bigger than the actual transaction log of the primary database due to the way the transaction is processed/transformed and re-processed through the SRS, then queued for application to the target database.

In the final part 5, I will discuss some common issues that can occur with HADR, allowing you to comprehensively plan your operational acceptance testing.

SAP ASE HADR Overview – Part3

In this multi-part post, I’m explaining the basics behind how SAP Replication Server works when replicating from a SAP ASE database to a SAP ASE database as part of an HADR (ASE always-on) setup for a SAP Business Suite system.
The post will be based on SAP ASE (Adaptive Server Enterprise) 16.0 HADR with SAP Replication Server (SRS) 16.0.

In Part 1 we started with:

  • What is SRS.
  • The basic premise of HADR with SRS.
  • What a transaction is.

In Part 2 we went on to discuss:

  • What is the ASE transaction log.
  • Which databases are replicated.
  • How do transactions move to the SRS.

In this part we discuss the role of the active SRS and how the internal processing moves the transactions into the secondary ASE database.

What is the Active SRS?

In a standard HADR setup with primary and secondary databases, there are two instances of SRS.

  • The inactive SRS is running on the primary database node.
  • The active SRS is running on the secondary database node.

The active SRS receives the transactions (commands) from the Replication Agents in the primary databases.

To those with DB2 or Oracle experience, this replication hop seems strange at first. On closer inspection it achieves the same desired result, the transactional data is successfully persisted on a server separate to the primary database..

The inactive SRS is unused until a failover occurs.
During a failover the inactive SRS, on the old primary server, can switch replication paths to become the active SRS. Therefore the inactive SRS is the reverse path of replication.

What are the Key Parts of SRS?

In my view, there are 5 key parts to the SRS 16.0 architecture.
At each of these stages the data is persisted.

  1. The primary database.
    Sends transactions from the transaction logs, via the Replication Agent threads to the active SRS.
  2. The SRS Simple Persistent Queue (SPQ).
    This is a simple set of disk files for persisting the unordered transactions received on the SRS.
    In synchronous replication mode, once a replicated transaction is persisted in the SPQ, it is classified as “received” on the secondary which allows the transaction to commit on the primary database.
    A backlog on the SPQ could mean the underlying disk is not fast enough for the replication workload, or that the server hosting the actives SRS is suffering CPU/IO saturation.
    If you have anti-virus installed, you should treat the SPQ disk just like you treat other database data files (i.e. exclude them from the A/V scanning).
  3. The SRS Stable Inbound Queue (IBQ).
    An ordered table of the replicated transaction commands that represent the open or final transactions.
    In the case of transactions that are rolled back, they are removed at the IBQ once the rollback command is seen (needs to come via the SPQ).
    There is one IBQ for each primary database and it only ever holds transactions from a database and never from another Rep Server.
    The SRS internals process the transactions on the IBQ that have a corresponding commit record (needs to come via the SPQ).
    These committed transactions are grouped/compacted, ordered and translated into the correct target language for the target database platform and moved to the respective outbound queue for the target databases.
    A backlog on the IBQ could mean the SRS internals may not be keeping up with the replication workload.
    It is important to re-state that transactions on the IBQ could be open and waiting for a rollback or commit, which means the SPQ needs space for the transactions that contain those commands to make it through to the IBQ.
  4. The SRS Stable Outbound Queue (OBQ).
    Committed transactions are moved from the IBQ onto the the OBQ.
    The OBQ actually shares a portion of the same partition space as the IBQ, so moving between the IBQ and OBQ is very quick.
    There is one OBQ for each target database and one if the target is another SRS (in scenarios with a third DR node).
  5. The Target (a.k.a Standby or Secondary or Companion) Databases.
    The SRS has a set of distribution threads (DIST) that apply the transactions from the OBQs to the respective target databases via the DSI (Data Server Interface).
    NOTE: In my diagrams I’ve positioned the DSI as slightly separate, but it is actually a module/component of the DSI.

    For scenarios with a DR node also, the target is the DR Rep Server.
    In the target databases you will the <SID>_maint user is used to apply the transactions.
    A backlog on the OBQ could indicate a problem with the performance of the target database.

Replication Synchronisation Modes

With ASE HADR there are 3 replication modes available. The different modes affect how transactions are treated.

In asynchronous replication mode (a.k.a “Warm Standby” or “DR”), the Replication Agent threads operate in a lazy way. Scanning for and sending transactions to the active SRS in batches when they can.

There is a replication delay in asynchronous mode, but it does mean that the responsiveness of the primary database may seem better to the client applications (SAP Business Suite) because the commit to the primary database does not wait for the Replication Agent to send the transaction to the SRS.
Because of the inherent delay, there is a high possibility of data-loss in a failover scenario.
Even in asynchronous mode, the transaction log of the primary database cannot be freed until transactions are actually committed to secondary (the STP cannot be moved until transactions are committed on secondary).

When HADR is configured in synchronous replication mode (a.k.a “Hot Standby” or “HA”), each transaction is immediately sent by the replication agent to the SRS for persisting on disk at the SPQ.
Once safely in the SPQ, the primary database is allowed to commit the transaction.

This means synchronous replication mode has a direct impact on the responsiveness of the primary database transactions, because the commit on primary will be delayed by the Replication Agent + network transfer + memory + I/O latency for persisting to the SPQ of the SRS.

Lastly, near-synchronous mode works the same way as synchronous, but it is designed for slower disks hosting the SPQ on the SRS. This means that the SRS acknowledgement is sent back to the Replication Agent as soon as transaction is received in memory, but before it is persisted to the SPQ files on disk.

Compared to synchronous mode, near-sync has a slightly higher possibility of data-loss, in exchange for a slight reduction in latency.

Open Transaction Handling

As mentioned previously, transactions are replicated from the primary almost as soon as they are started (as soon as the transaction log records the start of the transaction). This means “open” transactions are replicated as soon as they are started and flow all the way to the IBQ.

(See Part 1 for a description of “Open Transactions”)

If transactions were only replicated once they were committed on primary, there would be a large delay before that transaction was safely applied to the secondary database.
When enabled, the “early dispatch” feature (“parallel_dist”) even allows large open transactions (using specific SRS threads) to start applying to the secondary database early, once the commit record is seen at the SPQ.

What are the Implications of Replication Delay?

There are two main implications of delay (latency) in replication:

  1. Potential Data Loss.
    Any delay in the replication from primary to SRS introduces potential data-loss.
    This is because the aim of replication is to get the transactional data off the primary databases as soon as possible and safely onto the active SRS node (server) in another location. Any delay between the active SRS and the secondary database could mean data-loss but only if the SRS software itself is lost or corrupted.
    The ultimate goal is to get from the primary database to the active SRS as quickly as possible, with the second goal being to get from the primary database to the secondary database as quickly as possible.
    This opens up possibilities regarding architecture design and sizing of the secondary databases, which I will cover in another post.
  2. Response Time.
    Any delay in synchronous replication mode can also delay the commit of the transactions on the primary databases. In a Netweaver ABAP stack, this delay would be seen in the SAP Business Suite as a longer database response time in the Netweaver UPDATE work processes, therefore the delay is unlikely to be seen by the users themselves. In a Java stack, there is no such UPDATE work process, so the Java stack is likely to be more sensitive to longer response times.

In the next part, we will step through the replication of a single transaction and discuss the impact of different scenarios along the way.

SAP ASE HADR Overview – Part2

In this multi-part post, I’m explaining the basics behind how SAP Replication Server works when replicating from a SAP ASE database to a SAP ASE database as part of an HADR (ASE always-on) setup for a SAP Business Suite system.
The post will be based on SAP ASE (Adaptive Server Enterprise) 16.0 HADR with SAP Replication Server (SRS) 16.0.

A recap of what we discussed in Part 1:

  • What is SRS.
  • The basic premise of HADR with SRS.
  • What a transaction is.

In this part we discuss the ASE transaction log and how important it is for the working of the database then we discuss how transactions are replicated to the active SRS and touch on the internals of the SRS.

What is the ASE Transaction Log?

The ASE transaction log is the most active part of an ASE database.
When we talk about database ACID principles, the transaction log is what helps deliver those things. It is simply a log of transactions applied to a database.
The transaction log is where changes to the database are first made, it is the register, used to ensure that the database can be recovered to a consistent point-in-time.
Some of the seven ASE databases hosted in an ASE 16.0 instance do not need this point-in-time recovery capability, so they have a special setting that simply truncates the records of the transaction log frequently. The setting/option is called “trunc log on chkpt” and should *never* be applied to the SID database (the BusSite database) in production.
The SID database holds our vital business data, and needs to be recoverable right up to the last transaction that had a final state (committed or rolled-back) applied on that database.

On a regular interval, data changes contained in the transaction log that have a final state, are applied to the database data devices (the database main persistence).
This process is known as a checkpoint.
In those ASE databases with option “trunc log on chkpt” turned on, a checkpoint causes the used transaction log entries to be simply truncated (cleared), with no dump required.
In ASE databases with “trunc log on chkpt” turned off (like the SID database), after a checkpoint has occurred, the used transaction log segments are marked as freeable, and these freeable segments can become available for re-use only once a transaction log backup (dump) is performed to a backup disk or device.

This is a cycle that allows the database to continue to be used for making changes to the database data while also protecting the register of transactions in case of a database crash or even complete corruption.

If the transaction log ever becomes full, then the database is unable to allow new changes and no new transactions can be started until space is made available in the transaction log. It is therefore very important for regular database transaction logs dumps to be performed.
Not only to free space in the transaction log, but also to protect the database in the event that the transaction log is lost or corrupted. For added protection, these transaction dump file(s) can also be moved off to a separate location, away from the database server itself. This is a function of almost every backup tool and as you will find a requirement of almost every database that abides by the ACID principles.

Which Databases are Replicated?

Each ASE instance hosts multiple ASE databases.
In a Business Suite system, the ASE 16.0 instance hosts 7 databases (plus 2 temporary databases and a model database):

  • Master
  • Sybsystem
  • Sybmgmtdb
  • Sybsecurity
  • Sybsystemprocs
  • Saptools
  • SID

In a HADR system, we usually only replicate the Master database and the SID database. This is because the master database holds vital information about the user logins, roles and privileges, which are needed to be the same in the target system. The SID database is the database for the SAP Business Suite system, it holds our business data, which we need to protect.

How do Transactions Move to the SRS?

Inside the replicated primary ASE databases, there exists a Replication Agent process that runs in its own process thread.
The Replication Agent scans the transaction log of the database it serves and sends the transactions across to the active SAP Replication Server (SRS).
In the active SRS a matching receiver thread (Rep Agent User component thread) receives the transactions and pushes them into the Simple Persistent Queue (SPQ) on the disk of the active SRS server.

Subsequent components inside the active SRS process the transactions from the SPQ and eventually push them into the secondary database.

Part 3 is now available here.