This blog contains experience gained over the years of implementing (and de-implementing) large scale IT applications/software.

SAP PI/PO Performance Statistics Data Extract and Analysis using Excel & Power Query

End-to-End performance analysis of a specific SAP PO/PI interface is a tricky business using the out-of-the-box tools. For example, you might be thinking that SAP Solution Manager can provide this detail, but it is not able to correlate inbound SAP Web Dispatcher HTTP logs or Netweaver Java ICM logs to the PI/PO interface.
Instead you need to implement your own monitoring to bring these components together, unless you implement a complete tool like AppDynamics.

In this post I will show how to extract the SAP PI/PO performance data and import into Excel, using a Power Query to transform it from the XML format into something that we can report on.
After seeing my example, you can take the same design away and, using PowerBI, construct a dashboard for reporting. You can also choose to incorporate the HTTP logs, to give you an end-to-end performance analysis by interface.

This post is similar to another set of blog posts I published, showing how to pull data about your virtual machines directly from Azure, see: List Your Azure VMs in Excel. That other post used Power Query to transform JSON.

Data Sources

Before we go and start Excel, we need to understand where our data will be coming from.
There are 4 locations that we can report on in a SAP Netweaver AS Java stack running SAP PI/PO:

  • SAP Web Dispatcher (load balancer across multiple App instances).
  • SAP ICM (load balancer across multiple NW java server nodes).
  • SAP NW HTTP Provider.
  • SAP PI/PO performance data servlet.

The last item is the key one, which will require some Power Query goodness to transform the data.
We use the 2014 blog post from Vadim Klimov to see how to pull this data direct from PI/PO using the servlet “PerformanceDataQueryServlet”.

For the Web Dispatcher, the ICM and the HTTP Provider, we really need to choose just one of those points to collect the log information.
Since our source system is handing over processing to “us” at the Web Dispatcher, then that would be the logical location to collect the HTTP logs.
However, some companies use a load balancing appliance like an F5 at the entry-point, in which case, you would be better gathering the HTTP logs from each of the ICM server processes.

The reason for using the HTTP logs from the front-end of the architecture stack is that you want to capture any HTTP 50x messages caused by unavailability of the back-end parts.
For example, if SAP Netweaver is down, then the Web Disp logs would show a HTTP 503 (service unavailable).
If the PO application is not started inside SAP Netweaver, then the ICM logs would show a HTTP 503.
You want to collect the logs from the closest point of the handover between your source system and the Netweaver stack.

For the HTTP log sources, we have a little bit of an issue.
In most cases, logging is not enabled in Web Dispatcher and ICM configurations. To enable the logging we need to look at the parameter “icm/HTTP/logging_<xx>“.
Ideally what we need for the log format is: %h %u %t “%r” %s %H %L
This will give:

  • %h = Source IP.
  • %u = BASIC Auth username or cert common name.
  • %t = Date/time.
  • %r = The request with query string !IMPORTANT!
  • %s = The HTTP response code e.g. 200 or 500 etc.
  • %H = The name of the server host.
  • %L = Response time in milliseconds.

The log file should be switched DAILY (SWITCHF=day) to prevent it from growing too large.
We will need to transform the data in the log, but we can do this in Power Query or in a script at source.
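As an example of doing that transformation at source, here is a minimal Python sketch that restructures one log line before Power Query (or Log Analytics) sees it. The field layout is an assumption based on Apache-style log conventions for the format above; the sample line is invented, so check it against a real log before relying on the pattern:

```python
import re
from datetime import datetime

# Hypothetical layout for the log format: %h %u %t "%r" %s %H %L
# (Apache-style conventions are assumed; verify against a real log line.)
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<host>\S+) (?P<resp_ms>\d+)'
)

def parse_icm_line(line):
    """Parse one access log line into a typed dict, or None if it doesn't match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    row = m.groupdict()
    # Re-emit the timestamp as ISO 8601 so downstream tools can type it
    # without a custom format string.
    row["time"] = datetime.strptime(row["time"], "%d/%b/%Y:%H:%M:%S %z").isoformat()
    row["status"] = int(row["status"])
    row["resp_ms"] = int(row["resp_ms"])
    return row

# Invented sample line for illustration:
sample = ('10.0.0.1 jdoe [10/Jan/2021:08:15:23 +0100] '
          '"POST /XISOAPAdapter/MessageServlet HTTP/1.1" 200 sapts1app1 143')
print(parse_icm_line(sample))
```

Run over a whole log file, this gives you rows that Power Query can consume with no further date/time gymnastics.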

Data Transfer

We now need to understand how we will transfer the data from the data source to Excel (or PowerBI).
Natively, Excel (and PowerBI) can query a HTTP target to obtain data in a variety of formats.
Since XML is supported with no problems, this enables us to call the PI/PO PerformanceDataQueryServlet directly from Excel.

For those feeling adventurous, the HTTP logs can actually be picked up by Azure Log Analytics. You may need to adjust the date/time format with a script, but then it will be possible to have them stored in your workspace for pulling into PowerBI.

Alternatively, you can place the HTTP logs, at a regular interval, into a storage location accessible from Excel/PowerBI. This could be Sharepoint or an Azure Storage Account.
Another option is to have them placed into a location that serves HTTP, such as the SAP instance agent like I have shown before. For the Web Dispatcher you may have 2 logs (in an active/active setup); for the ICM you will have a log file for each Application Server instance.
By naming the log files in an intelligent manner, you can ensure that your Power Query can always find the files (e.g. don’t include the date/time in the file name).

Data Acquisition

With your data accessible via HTTP, you can use Excel or PowerBI to process it.
In this example, we will go through the processing for the PerformanceDataQueryServlet, since that is the hardest to process in its raw XML format, with multiple nested tables. The nested tables are the reason we use Power Query to transform it.

Open Excel and create a new workbook, then select the “Data” tab:

Click “Get Data” and select “From Other Sources”, then click “Blank Query”:

Click “Advanced Editor”:

Remove any existing text from the query box:

At this point, we can paste in the code necessary to obtain our data, but first we need to understand the URL composition correctly.
For this we can refer to the 2014 blog post from Vadim Klimov to understand the required parameters.

Here’s my sample HTTP call:
https://sapts1app1:50001/mdt/performancedataqueryservlet?component=af.ts1.sapts1db01&begin=2021-01-10T00:00:00.000Z&end=2021-01-11T00:00:00.000Z

This can be broken down as follows:

  • https://sapts1app1:50001 (host address of the PO App server; this can be any one of the Java instances).
  • /mdt/performancedataqueryservlet (URI of the PerformanceDataQueryServlet).
  • component=af.ts1.sapts1db01 (the name of our AAEX).
  • begin=2021-01-10T00:00:00.000Z (the begin time of our data selection period).
  • end=2021-01-11T00:00:00.000Z (the end time of our data selection period).
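If you need to generate these URLs for several components or selection windows, the composition is easy to script. A short Python sketch using only the three parameters shown above (the host, port and component values are the examples from this post):

```python
from urllib.parse import urlencode

def perf_query_url(host, port, component, begin, end):
    """Compose a PerformanceDataQueryServlet URL from its parts.

    begin/end are UTC timestamps in the servlet's expected format,
    e.g. 2021-01-10T00:00:00.000Z; urlencode percent-encodes the colons,
    which is harmless in a query string.
    """
    query = urlencode({"component": component, "begin": begin, "end": end})
    return f"https://{host}:{port}/mdt/performancedataqueryservlet?{query}"

url = perf_query_url("sapts1app1", 50001, "af.ts1.sapts1db01",
                     "2021-01-10T00:00:00.000Z", "2021-01-11T00:00:00.000Z")
print(url)
```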

Something you will notice about our URL is that we are using the HOURLY data selection period, listing data for a 24 hour period aggregated by hour.
We don’t really have much choice with the PerformanceDataQueryServlet, as we can only choose from MINUTE, HOURLY or DAILY with aggregation levels being 15mins, 1hour or 1day.

If we decide not to pull the data via HTTP, we can instead pre-save the response from the HTTP call to a flat file and point the query at that.
Here’s my sample data returned from the call to the PerformanceDataQueryServlet:

<?xml version="1.0" encoding="UTF-8" ?>
<PerformanceDataQueryResults xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://sapts1app1:50001/mdt/monitor/PerformanceDataQuery.xsd">
<Result>
<Code>OK</Code>
<Details>OK</Details>
<Text>Successfully&#x20;read&#x20;performance&#x20;data.</Text>
<Component>af.ts1.sapts1db01</Component>
<PeriodType>DAILY</PeriodType>
<BeginTime timezone="UTC">2021-01-10&#x20;00&#x3a;00&#x3a;00.0</BeginTime>
<EndTime timezone="UTC">2021-01-11&#x20;00&#x3a;00&#x3a;00.0</EndTime>
</Result>
<Data>
<ColumnNames>
<Column>INBOUND_CHANNEL</Column>
<Column>OUTBOUND_CHANNEL</Column>
<Column>DIRECTION</Column>
<Column>DELIVERY_SEMANTICS</Column>
<Column>SERVER_NODE</Column>
<Column>FROM_PARTY_NAME</Column>
<Column>FROM_SERVICE_NAME</Column>
<Column>TO_PARTY_NAME</Column>
<Column>TO_SERVICE_NAME</Column>
<Column>ACTION_NAME</Column>
<Column>ACTION_TYPE</Column>
<Column>SCENARIO_IDENTIFIER</Column>
<Column>MESSAGE_COUNTER</Column>
<Column>MAX_MESSAGE_SIZE</Column>
<Column>MIN_MESSAGE_SIZE</Column>
<Column>AVG_MESSAGE_SIZE</Column>
<Column>MAX_RETRY_COUNTER</Column>
<Column>MIN_RETRY_COUNTER</Column>
<Column>AVG_RETRY_COUNTER</Column>
<Column>AVG_PROCESSING_TIME</Column>
<Column>TOTAL_PROCESSING_TIME</Column>
<Column>MEASURING_POINTS</Column>
</ColumnNames>
<DataRows>
<Row>
<Entry>HTTPSender_TS1</Entry>
<Entry>SOAPReceiver_Proxy</Entry>
<Entry>OUTBOUND</Entry>
<Entry>BE</Entry>
<Entry>1234567</Entry>
<Entry></Entry>
<Entry>BSR_SAP</Entry>
<Entry></Entry>
<Entry>TS_SERVER</Entry>
<Entry>ProcessingIn</Entry>
<Entry>urn&#x3a;in2db.com&#x3a;ERP&#x3a;Test</Entry>
<Entry>&#x7c;BSR_SAP&#x7c;ProcessingOut&#x7c;&#x7c;</Entry>
<Entry>3</Entry>
<Entry>2396</Entry>
<Entry>1391</Entry>
<Entry>1817</Entry>
<Entry>0</Entry>
<Entry>0</Entry>
<Entry>0.00</Entry>
<Entry>524476</Entry>
<Entry>1573428</Entry>
<Entry> <MeasuringPoints><MP> <Name>MS&#x3a;module_in&#x3a;CallSapAdapter</Name> <Sequence>1</Sequence> <Max>394</Max> <Avg>349</Avg> <Min>261</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;SI</Name> <Sequence>2</Sequence> <Max>12</Max> <Avg>9</Avg> <Min>8</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;BI</Name> <Sequence>3</Sequence> <Max>73</Max> <Avg>60</Avg> <Min>52</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;VI</Name> <Sequence>4</Sequence> <Max>12</Max> <Avg>8</Avg> <Min>7</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;MS</Name> <Sequence>5</Sequence> <Max>1266</Max> <Avg>1050</Avg> <Min>771</Min></MP><MP> <Name>MS&#x3a;Message_Put_In_Store</Name> <Sequence>6</Sequence> <Max>155</Max> <Avg>112</Avg> <Min>90</Min></MP><MP> <Name>MS&#x3a;Message_Put_In_Disp_Queue</Name> <Sequence>7</Sequence> <Max>2328</Max> <Avg>836</Avg> <Min>82</Min></MP><MP> <Name>MS&#x3a;Message_Wait_In_Disp_Queue</Name> <Sequence>8</Sequence> <Max>1445</Max> <Avg>630</Avg> <Min>203</Min></MP><MP> <Name>MS&#x3a;Message_Put_In_Queue</Name> <Sequence>9</Sequence> <Max>44</Max> <Avg>42</Avg> <Min>42</Min></MP><MP> <Name>MS&#x3a;Message_Wait_In_Queue</Name> <Sequence>10</Sequence> <Max>323</Max> <Avg>263</Avg> <Min>195</Min></MP><MP> <Name>MS&#x3a;Message_Update_Status</Name> <Sequence>11</Sequence> <Max>233</Max> <Avg>166</Avg> <Min>128</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;AM</Name> <Sequence>12</Sequence> <Max>114891</Max> <Avg>41811</Avg> <Min>2755</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;SO</Name> <Sequence>13</Sequence> <Max>59</Max> <Avg>40</Avg> <Min>24</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;VO</Name> <Sequence>14</Sequence> <Max>44</Max> <Avg>33</Avg> <Min>25</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;AT</Name> <Sequence>15</Sequence> <Max>468</Max> <Avg>364</Avg> <Min>304</Min></MP><MP> <Name>MS&#x3a;module_out&#x3a;sap.com&#x2f;com.sap.aii.af.soapadapter&#x2f;XISOAPAdapterBean</Name> <Sequence>16</Sequence> <Max>1008279</Max> <Avg>478000</Avg> <Min>131434</Min></MP><MP> 
<Name>MS&#x3a;Resp&#x3a;stage&#x3a;BI</Name> <Sequence>17</Sequence> <Max>575</Max> <Avg>481</Avg> <Min>395</Min></MP><MP> <Name>MS&#x3a;Resp&#x3a;Message_Put_In_Store</Name> <Sequence>18</Sequence> <Max>157</Max> <Avg>136</Avg> <Min>121</Min></MP><MP> <Name>MS&#x3a;Resp&#x3a;Message_Update_Status</Name> <Sequence>19</Sequence> <Max>89</Max> <Avg>86</Avg> <Min>81</Min></MP> </MeasuringPoints></Entry>
</Row>
<Row>
<Entry>SOAP_Sender</Entry>
<Entry>SOAPReceiver_Proxy</Entry>
<Entry>OUTBOUND</Entry>
<Entry>EO</Entry>
<Entry>1234567</Entry>
<Entry></Entry>
<Entry>BSR_SAP</Entry>
<Entry></Entry>
<Entry>TS_SERVER</Entry>
<Entry>Confirmation_In</Entry>
<Entry>http&#x3a;&#x2f;&#x2f;sap.com&#x2f;xi&#x2f;IS-U&#x2f;Global2</Entry>
<Entry>&#x7c;BSR_SAP&#x7c;Confirmation_Out&#x7c;&#x7c;</Entry>
<Entry>48</Entry>
<Entry>672</Entry>
<Entry>672</Entry>
<Entry>672</Entry>
<Entry>0</Entry>
<Entry>0</Entry>
<Entry>0.00</Entry>
<Entry>89338</Entry>
<Entry>4288227</Entry>
<Entry> <MeasuringPoints><MP> <Name>MS&#x3a;SOAPHandler.processSOAPtoXMB</Name> <Sequence>1</Sequence> <Max>488</Max> <Avg>296</Avg> <Min>190</Min></MP><MP> <Name>MS&#x3a;module_in&#x3a;CallSapAdapter</Name> <Sequence>2</Sequence> <Max>521</Max> <Avg>211</Avg> <Min>144</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;SI</Name> <Sequence>3</Sequence> <Max>55</Max> <Avg>6</Avg> <Min>5</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;BI</Name> <Sequence>4</Sequence> <Max>195</Max> <Avg>37</Avg> <Min>26</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;VI</Name> <Sequence>5</Sequence> <Max>28</Max> <Avg>5</Avg> <Min>4</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;MS</Name> <Sequence>6</Sequence> <Max>7495</Max> <Avg>2675</Avg> <Min>1340</Min></MP><MP> <Name>MS&#x3a;Message_Put_In_Store</Name> <Sequence>7</Sequence> <Max>28648</Max> <Avg>8891</Avg> <Min>6457</Min></MP><MP> <Name>MS&#x3a;Message_Put_In_Disp_Queue</Name> <Sequence>8</Sequence> <Max>12290</Max> <Avg>6102</Avg> <Min>3558</Min></MP><MP> <Name>MS&#x3a;Message_Put_In_Queue</Name> <Sequence>9</Sequence> <Max>191</Max> <Avg>46</Avg> <Min>21</Min></MP><MP> <Name>MS&#x3a;Message_Wait_In_Queue</Name> <Sequence>10</Sequence> <Max>401</Max> <Avg>229</Avg> <Min>153</Min></MP><MP> <Name>MS&#x3a;Message_Wait_In_Disp_Queue</Name> <Sequence>11</Sequence> <Max>18855</Max> <Avg>5289</Avg> <Min>8</Min></MP><MP> <Name>MS&#x3a;Message_Update_Status</Name> <Sequence>12</Sequence> <Max>25237</Max> <Avg>9398</Avg> <Min>5056</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;AM</Name> <Sequence>13</Sequence> <Max>390</Max> <Avg>183</Avg> <Min>124</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;SO</Name> <Sequence>14</Sequence> <Max>102</Max> <Avg>17</Avg> <Min>16</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;VO</Name> <Sequence>15</Sequence> <Max>155</Max> <Avg>22</Avg> <Min>17</Min></MP><MP> <Name>MS&#x3a;stage&#x3a;AT</Name> <Sequence>16</Sequence> <Max>1813</Max> <Avg>332</Avg> <Min>205</Min></MP><MP> 
<Name>MS&#x3a;module_out&#x3a;sap.com&#x2f;com.sap.aii.af.soapadapter&#x2f;XISOAPAdapterBean</Name> <Sequence>17</Sequence> <Max>91602</Max> <Avg>55588</Avg> <Min>46038</Min></MP> </MeasuringPoints></Entry>
</Row>
</DataRows>
</Data>

</PerformanceDataQueryResults>

The XML data is complex and contains nested tables for the “MeasuringPoint” elements. This is not something that is possible to extract using the Excel data import GUI alone. You will need to use my code 😉
The code performs the required pre-processing (transpose, fillUp, then removal of some data parts), returning the data in a shape that lets you report on all the “MeasuringPoints” if you need them.
Could the above be done in another tool? Probably. But everyone has Excel.

Let’s put my Power Query code into the Excel query editor:

let
    // Uncomment to pull direct from PI/PO over HTTP instead of a local file:
    // Source = Xml.Tables(Web.Contents("https://sapts1app1:50001/mdt/performancedataqueryservlet?component=af.ts1.sapts1db01&begin=2021-01-10T00:00:00.000Z&end=2021-01-11T00:00:00.000Z")),
    Source = Xml.Tables(File.Contents("C:\Users\darryl\Documents\Projects\po-perf-metrics\performancedataqueryservlet-1.xml")),
    // Navigate down to the <Row> elements inside <Data>/<DataRows>.
    Data = Source{1}[Table],
    DataRows = Data{1}[Table],
    Row = DataRows{0}[Table],
    // Each <Row> holds its <Entry> values as a nested table: transpose it into
    // columns, fill the nested MeasuringPoints table up into Column22, then
    // drop the leftover last row.
    #"Expanded Entry" = Table.TransformColumns(Row, {"Entry", each Table.RemoveLastN(Table.FillUp(Table.Transpose(_), {"Column22"}),1)}),
    // Expand the 22 transposed columns into the main table.
    #"Expanded Entry1" = Table.ExpandTableColumn(#"Expanded Entry", "Entry", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13", "Column14", "Column15", "Column16", "Column17", "Column18", "Column19", "Column20", "Column21", "Column22"}, {"Entry.Column1", "Entry.Column2", "Entry.Column3", "Entry.Column4", "Entry.Column5", "Entry.Column6", "Entry.Column7", "Entry.Column8", "Entry.Column9", "Entry.Column10", "Entry.Column11", "Entry.Column12", "Entry.Column13", "Entry.Column14", "Entry.Column15", "Entry.Column16", "Entry.Column17", "Entry.Column18", "Entry.Column19", "Entry.Column20", "Entry.Column21", "Entry.Column22"}),
    // Apply the real column names listed in <ColumnNames>.
    #"Renamed Columns" = Table.RenameColumns(#"Expanded Entry1",{{"Entry.Column1", "INBOUND_CHANNEL"}, {"Entry.Column2", "OUTBOUND_CHANNEL"}, {"Entry.Column3", "DIRECTION"}, {"Entry.Column4", "DELIVERY_SEMANTICS"}, {"Entry.Column5", "SERVER_NODE"}, {"Entry.Column6", "FROM_PARTY_NAME"}, {"Entry.Column7", "FROM_SERVICE_NAME"}, {"Entry.Column8", "TO_PARTY_NAME"}, {"Entry.Column9", "TO_SERVICE_NAME"}, {"Entry.Column10", "ACTION_NAME"}, {"Entry.Column11", "ACTION_TYPE"}, {"Entry.Column12", "SCENARIO_IDENTIFIER"}, {"Entry.Column13", "MESSAGE_COUNTER"}, {"Entry.Column14", "MAX_MESSAGE_SIZE"}, {"Entry.Column15", "MIN_MESSAGE_SIZE"}, {"Entry.Column16", "AVG_MESSAGE_SIZE"}, {"Entry.Column17", "MAX_RETRY_COUNTER"}, {"Entry.Column18", "MIN_RETRY_COUNTER"}, {"Entry.Column19", "AVG_RETRY_COUNTER"}, {"Entry.Column20", "AVG_PROCESSING_TIME"}, {"Entry.Column21", "TOTAL_PROCESSING_TIME"}}),
    // Keep the nested MeasuringPoints (MP) tables as an expandable column.
    #"Expanded MP" = Table.ExpandTableColumn(#"Renamed Columns", "Entry.Column22", {"MP"}, {"Entry.Column22.MP"})
in
    #"Expanded MP"

In the code above, you will notice the “Source=” is using a local file. You can uncomment the “Web” source and comment out the “File” source if you are pulling the data direct via HTTP.

With the Power Query code entered into the editor, check there are no syntax errors and click “Done”:

When querying the data directly over HTTP you will need to edit the credentials at this point.
In the credentials screen, enter the “Basic” username and password to use.
The data will be displayed.
In my sample I see two rows of data:

At the top level, you will see we have the Average Processing Time (in milliseconds) for each interface:

We also have an additional column at the end which contains embedded tables of additional metric data for each specific stage of processing within PI/PO:

By clicking the double arrows at the top of the “Entry.Column22.MP” column header, we can expand the embedded table (should you wish to), and you will see that it presents the following additional columns of data:

When you click “OK”, it adds those columns to the main list, but it will also create an additional row of data for each nested measuring point that has been expanded:
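If you want to prototype the same expansion outside Excel, the row duplication is easy to see in a few lines of Python over a trimmed stand-in for the servlet's XML (only a couple of fields kept, values shortened for the demo):

```python
import xml.etree.ElementTree as ET

# Minimal fragment in the servlet's shape (trimmed, values invented).
xml = """<Row>
  <Channel>HTTPSender_TS1</Channel>
  <MeasuringPoints>
    <MP><Name>MS:stage:SI</Name><Sequence>1</Sequence><Avg>9</Avg></MP>
    <MP><Name>MS:stage:BI</Name><Sequence>2</Sequence><Avg>60</Avg></MP>
  </MeasuringPoints>
</Row>"""

row = ET.fromstring(xml)
channel = row.findtext("Channel")

# Expanding the nested table duplicates the parent columns onto every
# measuring point -- one output row per <MP>, just as Power Query does.
flat = [
    {"Channel": channel,
     "Name": mp.findtext("Name"),
     "Sequence": int(mp.findtext("Sequence")),
     "Avg": int(mp.findtext("Avg"))}
    for mp in row.iter("MP")
]
for r in flat:
    print(r)
```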

With the above data expanded, we can really produce some nice graphs.
Here’s an example showing the breakdown of average response time for each of those processing stages.
First I put them into a pivot table and apply an average to the “Avg” column for each of the “Name” column values:

Then I create a pie chart for the data and we can report on which processing stage inside PI/PO is consuming the most time:

By applying additional graphs and filters we could report on individual interfaces’ overall response times, then allow drill-down into the specifics.

Any Potential Issues?

There is one big caveat with the above process of pulling the data from the servlet.
The servlet is extracting data from a memory cache area inside PI/PO.
This cache is an LRU cache, meaning it has a limited size and gets overwritten when it becomes full.
The quantity of data is therefore limited.

It is possible that you could switch on the database persistence (logging) of successful messages in PI/PO, but this has a detrimental impact on message throughput performance and is not recommended by SAP for production systems.

To try and get around the cache limitations, my advice would be to extract the data at the smallest granular frequency that the servlet allows (MINUTE), and save it to a disk file somewhere accessible from Excel.
Another method could be to use Microsoft Power Automate (previously MS Flow) to pull the file into Sharepoint or other storage.
By extracting the data frequently, you are trying to ensure that you have it stored before the cache is cleared, but also you are building a time-series from which you could potentially include into your reporting tool.
A time-series would allow you to scroll through data in windows of at least 15 mins in size. Not bad.
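To sketch that frequent-extract idea: the begin/end pairs for a rolling extract could be generated ahead of time, one pair per 15-minute window (the window size matches the smallest aggregation level; the scheduling and storage around this are left out as assumptions):

```python
from datetime import datetime, timedelta, timezone

def fmt(ts):
    """Servlet timestamp format: UTC, milliseconds, trailing 'Z'."""
    return ts.strftime("%Y-%m-%dT%H:%M:%S.000Z")

def windows(start, hours, step_minutes=15):
    """Yield (begin, end) selection windows covering `hours` from `start`.

    A 15-minute step matches the finest aggregation the servlet offers,
    so each extract should land before the memory cache wraps.
    """
    step = timedelta(minutes=step_minutes)
    t = start
    stop = start + timedelta(hours=hours)
    while t < stop:
        yield fmt(t), fmt(t + step)
        t += step

start = datetime(2021, 1, 10, tzinfo=timezone.utc)
wins = list(windows(start, hours=1))
print(wins[0])  # first 15-minute window of the hour
```

Each (begin, end) pair then becomes one servlet call, with the response saved under a window-stamped file name to build the time-series.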

Summary

We identified the important areas of data collection in PI/PO (and Netweaver in general) to allow response times to be seen.
We also noted that HTTP response codes such as 503 should be obtained from the first outermost point of the Netweaver stack (if possible).

We saw an example of using the “PerformanceDataQueryServlet” to pull data from the PI/PO memory cache and transformed it using Power Query to allow detailed reporting on the response times.
I created a demonstration graph from a pivot table using my sample data, which showed a possible drill-down in the response time of individual processing stages within SAP PI/PO.

Hopefully I have given you some ideas for how you can solve your PI/PO performance reporting requirement.

Analysing High SAP Roll Wait Time – NW702

One of my clients identified an increase in the Response Time measurements they monitor.  They use the EarlyWatch Report to monitor the SAP system SLAs against response time performance.
I initially thought this was going to be a nice easy assignment: identify the database performance issues and propose some reorganisation work.  After all, this does seem to be the case in ~75% of the cases I’ve seen.

Basic info: SAP NW 702 on Oracle 11gR2 on Unix.
SLAs are monitored against DIALOG response time from 9am to 6pm weekdays only.

I brought up the ST03 screen and checked the “Ranking Lists -> Top Response Time of Dialog Steps” (also known as the Hit List) for the past 3 month’s data, one month at a time.
All three months showed a similar pattern.  The worst performing reports were customer developed reports.  The database time looked reasonable (< 40% of Response Time – Wait Time), processing time was long-ish and load/gen time was tiny.
What didn’t look good was the “Roll Wait Time”.

I backtracked a step and looked again at the “Workload Overview”, and specifically tab “Parts of response time“:


That’s right, 26% of the response time for Dialog, is Roll Wait time.
Time to dig out the definition of SAP Roll Wait Time…
I’ve blogged about the basic performance tuning steps before, but I’ve yet to deep dive into the Roll Wait Time metric.

What is “Roll Wait Time”?
SAP note 8963 says:

The roll wait time is time that elapses on the client while the user context is in the roll area. Therefore, no resources are used on the client and there is no bottleneck here if the roll wait time is long.

So we may not be looking at a problem with the SAP server, but a problem between the SAP application and the front-end.
The note goes on to state that for DIALOG processing, the roll out of the context can be caused by calls to other external applications from the R/3 system, or from a call to an external RFC.
More importantly, it says:

As of Release 4.6, roll wait times also occur when the R/3 system communicates with the controls in the front end. While the data in the front end is processed, the R/3 context is rolled out, and the work process is released as a result. This may occur several times during a transaction step. If front-end office applications (such as Microsoft Word) are started and only closed after a long time (more than several minutes), a very long roll wait time also occurs.

This means that the communication to the front-end (SAP GUI, Microsoft Excel, Word etc), can cause the DIALOG work process to roll out, subsequently increasing the “Roll Wait Time”.
Even further clarification is provided in SAP notes 364625 and 376148 which mention the “new” GUI controls introduced in R/3 4.7.
SAP note 606610 explains how an ABAP “WAIT” command causes a task to roll-out of the work process.

SAP note 203924 provides some more detailed information on high Roll Wait Time:

As of Release 4.6 the roll wait time (and therefore the total response time) contains the time for installing “Enjoy Elements” (=controls) and therefore the essential part of the communication between the GUI and the R/3 instance. In this case, the response time displayed in Transaction ST03 is closer to (almost identical to) the time experienced by the user.

Plus it also confirms what we’re thinking:

As of Release 4.6, therefore, a high roll wait time is a first indication of a slow network between the GUI and the R/3 instance.

Section 2a of the note provides some good pointers in diagnosing network performance issues and checking the size of the SAP user menus.

As per the note, I opened up transaction ST06 and via the “Detailed Analysis” button, I went to the “LAN Check by PING” screen, then clicked “Presentation Server“:


Once here, I selected a random selection of “Presentation Server” clients and initiated a 10x PING.
What I found was that specific “Presentation Servers” (client PCs) were not responding within the expected 20ms:


I knew that we were operating in a WAN environment (multiple offices across different cities) so I should be expecting a WAN connection time of between 50ms and 250ms (according to SAP note 203924).
I was seeing ~60ms in some cases.  So I can conclude that we have a moderately quick WAN setup.
The main point is that I was not seeing any packet loss.  Which is a good sign.

Whilst the immediate data is always good to see, it’s worth mentioning that the speed could be circumstantial.  It’s best to check the times in our ST03 statistics.  Opening up ST03 again and navigating to the “Workload Overview” analysis view, I can see on the “Times” tab, an “Average GUI Time per Dialog Step” of 90.4ms:


GUI Time:  Measured at the Application Server, this is the average time taken for all requests (of a Dialog step) to go from the Application Server to the front-end SAP GUI (it does not include the time taken for the SAP GUI to send data to the Application Server).
The time I see is not too bad and will vary depending on the amount of data sent.

We can verify the value by checking the “Average Front-end Network Time” (per Dialog step):


Front-end Network Time:  Measured at the SAP GUI, we see that the front-end has recorded an average of 91.4ms, covering the time taken for the first request (of a Dialog step) to go from the Application Server to the front-end SAP GUI, plus the time taken for the last request (of a Dialog step) to go from the Application Server to the SAP GUI, plus the processing required on the front-end for screen formatting.  This roughly agrees with our network finding of 60ms ping time (with no GUI processing at all), which means that on average we’re probably seeing ~30ms of time for the SAP GUI to perform its rendering.

Based on the above findings I can rule out networking as a probable cause of the high Roll Wait Time, as it seems to be (on average) pretty normal.  Although I could recommend to my client that they use the “Low Speed Connection” setting in the SAP GUI, as this is recommended by SAP in WAN setups (see SAP note 203924 section 4).  It’s also possible I could recommend reverting to the Classic Theme in the GUI, which is also recommended in the mentioned note.

SAP note 62418 discusses the typical amount of data sent per Dialog step in SAP ECC 6.0.  Around 4.6KB is to be expected (per Dialog step).  It doesn’t state if that is a normal connection setup or using the “Low Speed Connection” setup.
We can also look at the relationship of GUI Time to Roll Wait Time.  There’s something there that doesn’t look quite right.

If I go back to the Workload Overview, I can see that for DIALOG tasks, the “Average GUI Time” is 90.4ms, which should be almost equal to the “Roll-out” + “Roll Wait” + “Roll In” (these happen whilst the GUI is doing its stuff – during RFC).

Except, in my case, I can see that the GUI (plus the time to talk to it) is doing its stuff much quicker than the Application Server is rolling out and waiting (NOTE: We’ve had to drag the columns around in ST03 and we’re on the “All Data” tab):


0.4 + 147.7 + 15.8 = 163.9ms.

This is 163.9 – 90.4 = 73.5ms slower (on average, per Dialog step) than I would have expected!
This is ~12% of the response time (the Roll Wait Time is ~26% of the response time).

These are the possible reasons I considered to explain this:

  • Bad network performance.  We’ve looked at this and we can see that GUI Time is actually pretty normal.  Why would waiting around take longer than the time to do something?  Network performance seems good.
  • Lack of dialog work processes.  No it can’t be this, because this would not be attributable to Roll Wait Time, but instead would be measured as Dispatcher Queue Time (also known as “Wait Time” without the word “Roll”).
  • Longer time to Roll In and Roll Out.  It’s possible that Roll In and Roll Out time could affect the calculations.  I’m seeing an average Roll In time (per dialog step) of 15.8ms and a Roll Out time of 0.4ms.  But this still doesn’t add up to 73.5ms.
  • Time taken to initialise and open the RFC connection to the SAP GUI.  It’s possible that the network lookup/hostname buffer is slow to get going before we manage to open the connection to the SAP GUI, but 75ms is pretty slow.

I needed to get out of the aggregated data and into the real nitty gritty.
I needed transaction STAD and some real live data.

Opening STAD, I left the default options, reading records from the last 10 minutes.
Once the records were displayed, I changed the layout of the screen to remove fields I wasn’t interested in, I then added fields I was interested in:
– Started
– Server
– Transaction
– Program
– Type
– Screen
– Work Process
– User
– Response Time
– RFC+CPIC
– Roll Wait
– GUI Time

Once I had the screen setup, it was a simple case of scanning through the records looking for any that had “Roll Wait” > “GUI Time” with 0 “RFC+CPIC”.
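If you export the STAD list to a file, that scan can be scripted.  A Python sketch over hypothetical exported records; the field names and values here are invented for illustration:

```python
# Hypothetical STAD export: one dict per dialog step, all times in ms.
records = [
    {"program": "ZFILE_MOVER", "resp_ms": 5600, "rfc_cpic_ms": 0,
     "roll_wait_ms": 5020, "gui_ms": 0},
    {"program": "SAPLSMTR",   "resp_ms": 410,  "rfc_cpic_ms": 120,
     "roll_wait_ms": 95,     "gui_ms": 90},
]

# The scan criterion: Roll Wait exceeds GUI Time with zero RFC+CPIC time,
# i.e. waiting that front-end communication cannot explain.
suspicious = [
    r for r in records
    if r["roll_wait_ms"] > r["gui_ms"] and r["rfc_cpic_ms"] == 0
]
for r in suspicious:
    print(r["program"], r["roll_wait_ms"])
```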

I found some!
One of the records (shown below) has a huge “Roll Wait” time of around 5 seconds, yet “GUI Time” is zero:


It just so happens that at the same time as this STAD record was created, I was also running an Enqueue Trace, SQL Trace and Buffer Trace from transaction ST05 (yeah, it was a quiet day so I felt confident that dipping in and out of tracing wouldn’t hurt performance too bad).

So I had an opportunity to display these traces for the same period of time (and the same Work Process number).
I found that there were no long durations in the entire trace.  In fact, the total duration of the trace period didn’t add up to even 1 second.  What!

I sorted the trace by the “Time” column and manually scrolled through, looking for the 5 seconds between one row and the next.  Sure enough, I found it.  The missing 5 seconds:


I single clicked the first record after the time gap, and because it was an ABAP program, I was able to click the “Display Call Positions in ABAP Programs” button:


The source code position that I arrived at didn’t seem to be doing anything other than an OPENSQL statement, so I scrolled back up the program a little.
Then I saw it.  The program was moving UNIX files around.  Not only that, but there was an ABAP command “WAIT UP TO 5 SECONDS.”:


Here’s what the ABAP syntax help says:

This statement interrupts the execution of the program by the number of seconds specified… … after the specified time has passed, the program continues with the statement following WAIT.

It also states:

Each time the statement WAIT is used, a database commit is performed.  For this reason, WAIT must not be used between Open SQL statements that open or close a database cursor.


SUMMARY:

We’ve seen how “GUI Time” is measured and checked the network performance stats to see how accurate it is.
We’ve also learned how “GUI Time” is actually related in some ways to the value of the “Roll Wait Time”.

It’s been a long hard slog to get to the bottom of why I have a high average “Roll Wait Time” shown in ST03, when the average “GUI Time” is much lower.  A hardcoded ABAP statement was causing my work processes to WAIT for a fixed period of time, increasing the overall Response Time reported in ST03.  We referenced SAP note 606610 at the beginning of our article, but it seems very difficult to actually find out (without going through the ABAP source) if a WAIT statement has been the cause of Roll Wait Time.

We have subsequently learned that the ST03 Dialog Response Time measurements should be taken lightly, and that you should always try to exclude “GUI Time” by using the “Response Time Distribution” analysis view and clicking the “Dialog Without GUI Time” button.  This will exclude the “Roll Wait Time” as described in SAP note

ADDITIONAL NOTES FOR REFERENCE:
During my investigation, I also found a few other things.

We were actually suffering from the program error described in SAP note 1789729, so there is some potential to get better performance from the system by eliminating the additional database & buffer calls.

Some of the records in STAD contained HTTP records.
When I analysed these, I could see that certain document related transactions were calling out to the SAP HTTP Content Server to access documents.
I managed to calculate that the “Call Time” for the HTTP access was recorded as “Processing Time” in the overall Response Time.
So, if you use a Content Server, be sure to check the response times, as this could also be a factor in slower response times, and this wasted time *is* recorded in the overall Response Time.
Obviously, using SSL will make this process slightly slower, so maybe some form of front-end cache would be better.

Thanks for reading.