This blog contains experience gained over the years of implementing (and de-implementing) large scale IT applications/software.

Dropping Empty SAP BW Table Partitions in SAP ASE

In a SAP BW 7.4 on SAP ASE database, table partitions are used as a method of storing data, primarily for query performance but also for object (table) management.

In this post I show a simple way to identify tables with many empty partitions, so that you can more quickly drop those empty partitions.
Less empty partitions reduces the downtime of database migrations, and can also increase the performance of business-as-usual queries.

Partitioning in SAP ASE

To use SAP BW on SAP ASE, the “partitioning” license needs to be bought/included in the database license.
The license is automatically included in the runtime license for ASE for SAP Business Suite.

SAP note 2187579 “SYB: Pros and cons of physical partitioning of fact tables” list all of the benefits and the options of partitions for ASE 15.7 and ASE 16.0.

During normal business usage, the database can use less effort to obtain data from a partitioned table, when the partition key column is used as a predicate in the query.
This is because the database knows exactly where the data is held.
It’s in its own table partition and is therefore more manageable.

A good analogy is to imagine that you have two holding pens, one with all cats, and one with all dogs. The partition key is “animal type” and each holding pen is a partition.
Both holding pens together make the table.
If we wanted to get all data from the table where the occupant was a cat, we simply go to the pen with all the cats and read the data.

Now imagine that we had 3 partitions that make up our table, but one of those partitions was actually empty.
In some cases, depending on the database settings, certain types of data queries will still scan for data in that empty partition.
These additional scans do not take a huge amount of time individually, but it does cause extra effort nevertheless.

If we upscale our scenario to a large multi-terabyte SAP BW system, and to a BW FACT table with thousands of partitions.
Imagine if we had thousands of empty partitions and we were accessing all records of the table (a full table scan), this would give a reasonable delay before the query could return the results.
For this exact reason, SAP provide a BW report.

The Standard SAP BW Report

The standard ABAP report SAP_DROP_EMPTY_FPARTITIONS is specifically for the FACT tables of a BW system and it is a recommendation in the ASE Best Practices document for this report to be run before a BW system migration/copy is performed.

By reducing the empty partitions, we also reduce the system export time. Easy winner.

The problem with the SAP standard report, is that you will need to go through each individual BW info-cube and execute the report in “show” mode.
This is really, really painfully slow.

A Better Way

Instead of the standard report, I decided to go straight to the DB layer and use SQL.
The example below is for SAP ASE 16.0 (should work on 15.7 also):

select distinct 
       convert(varchar(20),so.name) as tabname, 
       t_spc.numparts-1 as num_parts, 
       t_spn.numparts-1 as num_emptyparts 
from sysobjects so, 
     (select sp1.id, 
             count(sp1.partitionid) as numparts 
      from syspartitions sp1 
      where sp1.indid = 0 
      group by sp1.id 
     ) as t_spc,
     (select sp2.id, 
            count(sp2.partitionid) as numparts 
      from syspartitions sp2, 
           systabstats sts 
      where sp2.indid = 0 
        and sp2.partitionid = sts.partitionid
        and sts.indid = 0 
        and sts.rowcnt = 0 
      group by sp2.id 
     ) as t_spn 
where so.name like '/BIC/F%' 
  and so.id = t_spc.id 
  and so.id = t_spn.id 
  and so.loginame = 'SAPSR3' 
  and t_spn.numparts > 1 
order by t_spn.numparts asc, so.name

It’s fairly crude because it restricts the tables to those owned by SAPSR3 (change this if your schema/owner is different) and it is looking for FACT tables by their name (“/BIC/F*”) which may not be conclusive.

Below is an example output of the SQL versus the report SAP_DROP_EMPTY_FPARTITIONS in “show” mode:

You can see we are only 1 count out (I’ve corrected in the SQL now) but essentially we get a full list of the tables on which we can have the most impact!

Let’s look at a sample SELECT statement against that table:

We used the following SQL:

set statistics time on 
select count(*) from [SAPSR3./BIC/FZPSC0201]
go

Execution time on that statement was 25.9 seconds (elapsed time of 25931 ms).
We spent 2 seconds parsing and compiling the SQL statement (lots of partitions probably doesn’t help this either).
Since the CPU time is only 7 seconds, we have to assume that I/O was the reason for the delay while ASE scanned over the partitions.

Dropping The Partitions

Let’s go ahead and actually drop those empty partitions using another standard ABAP report SAP_DROP_EMPTY_FPARTITIONS.

NOTE: See SAP note 1974523 “SYB: Adaption of SAP_DROP_EMPTY_FPARTITIONS for SAP ASE” for more details on how to use the report.


We need to run this in the background, because dropping over 1,000 partitions will take a while.

Once dropped, we can re-run our select statement:

Total elapsed time is now down to just 6 seconds.
Admittedly there could be some time saving due to the data cache and plan cache already being populated for this table, so I ran ASE stored procedure: sp_unbindcache, which seemed to have done something.
Then I re-ran the query:

Being unsure if the unbind worked or not (I could not bounce the DB to be absolutely sure), I’m going to just accept that we have improved the result by dropping those empty partitions.

HANA OOM Error Tip #1 – Partition Tables Correctly

If your HANA system is regularly experiencing OOM (Out Of Memory) errors, then there are a number of things that you can do to try and reduce memory consumption.

Tip #1:  Partition Large Tables Correctly
If there are large Column Store tables in your HANA system, you should partition them.
Whilst this is an absolute must in a HANA scale-out scenario (for enabling multi-node parallelism), it might not be so obvious that it can also help in a single node HANA system.
Partitioning a column table means that only the required partitions of the table are loaded into memory when accessed, you would think.

Partitioning a large table into smaller chunks will therefore help to reduce the memory usage of the table during SQL queries and also during updates.
During updates, each partition gets its own delta cache area.

Choosing how to partition a table is slightly more difficult and will depend on whether the table is a custom table, SAP standard table or other.
Plus, you will need to know what and how queries or updates are executed against the table.  A period of monitoring is suggested, to enable to you collect the required information to be able to make a decision.

One thing you should try to do, is partition the table using the most logical, selective columns.
Read on for a simple example with a twist!

A simple example, a single node HANA system has 1 large column table T1.
The table is partitioned into ranges based on the date column INVOICEDATE:

CREATE COLUMN TABLE “DARRYL”.”T1″ (“INVOICEREF” VARCHAR(1) NOT NULL ,
     “INVOICEDATE” DAYDATE CS_DAYDATE NOT NULL ) UNLOAD PRIORITY 5 AUTO MERGE WITH PARAMETERS (‘PARTITION_SPEC’ = ‘RANGE year(INVOICEDATE) 2000-2001,2001-2002,2002-2003,*’)
;
CREATE UNIQUE INDEX “I1” ON “DARRYL”.”T1″ ( “INVOICEREF” ASC ) NONLEAF PARTIAL KEY LENGTH 1;

As you can see, I’ve created 3 partitions by year:  2000 to 2001,  2001 to 2002 and 2002 to 2003.
This will actually create 4 partitions:  year 2000,  year 2001, year 2002 and year <OTHER>.

HANA Table Distribution

Insert 5 records into the table:

insert into darryl.t1 (INVOICEREF,INVOICEDATE) values(‘1′,’2000-01-01’)
insert into darryl.t1 (INVOICEREF,INVOICEDATE) values(‘2′,’2001-01-01’)
insert into darryl.t1 (INVOICEREF,INVOICEDATE) values(‘3′,’2002-01-01’)
insert into darryl.t1 (INVOICEREF,INVOICEDATE) values(‘4′,’2003-01-01’)
insert into darryl.t1 (INVOICEREF,INVOICEDATE) values(‘5′,’2004-01-01’)

Inside the Table Distribution tab, you will now see that the records have been inserted according to their values into the respective partitions (see Raw Record Count field on the right):

HANA Table Distribution

The last two records for year 2004 and 2003 are in the fourth partition.
You can also see that each partition has a Delta Size, and that the Delta Size for the fourth partition with the most records, is larger than the other partitions.
Unload the table from memory:

HANA Unload Table from memory

Refreshing the Table Distribution tab now shows the table partitions to have zero size in memory:

HANA Table Distribution

Now select the records for the years 2004 and 2005 only:

select * from darryl.t1 where invoicedate in (‘2004-01-01′,’2003-01-01’)

Refreshing the Table Distribution tab now shows the tables to have non-zero size in memory for ALL partitions!

HANA Table Distribution

All of the records from all of the partitions appear to be loaded!
What went wrong?
Well, it’s simple, we didn’t create an index on the column INVOICEDATE.
This forced HANA to scan through the entire table to access the required records, meaning that it needed to load them all into memory.

Let’s create an index in INVOICEDATE:

CREATE UNIQUE INDEX “I1” ON “DARRYL”.”T1″ ( “INVOICEDATE” ASC ) NONLEAF PARTIAL KEY LENGTH 1;

Unload the table from memory:

HANA Unload Table from memory

Refreshing the Table Distribution tab now shows the tables to have zero size in memory:

HANA Table Distribution

Now re-execute the SELECT statement:

select * from darryl.t1 where invoicedate in (‘2004-01-01′,’2003-01-01’)

Once again, on the Table Distribution tab, we can see that it has accessed all partitions, AGAIN!:

HANA Table Distribution

What went wrong this time?  Well, HANA doesn’t yet have any statistics on the table data, so it simply ignored the index.
If you now unload the table from memory once again (we haven’t done anything else):

HANA Unload Table from memory

Now re-execute the SELECT statement:

select * from darryl.t1 where invoicedate in (‘2004-01-01′,’2003-01-01’)

Look at the Table Distribution tab:

HANA Table Distribution

You can see that HANA has now only accessed the final partition of the table.  The other partitions have not been loaded into memory.
At first I thought this feature might be due to statistics, so I tried removing them from the table T1 (drop statistics on T1;).  Then I retried the process of unloading and re-running the query.  This had no effect, HANA correctly went straight to the fourth partition.
This left me with one other option, the Plan Cache.

Clearing the Plan Cache using:

ALTER SYSTEM CLEAR SQL PLAN CACHE

I then re-ran the test by unloading the table from memory:

HANA Unload Table from memory

Re-executing the SQL SELECT:

select * from darryl.t1 where invoicedate in (‘2004-01-01′,’2003-01-01’)

Look at the Table Distribution tab:

HANA Table Distribution

Bingo!
The Plan Cache was storing some form of execution plan statistics that meant that it was accessing the fourth partition straight away.
Each time the table is unloaded, the statistics from the existing Plan Cache remain and are re-used upon next execution of the query, which means HANA is able to go straight to the fourth partition.

Summary:
Partitioning is a great way of parallelising access to a table in HANA.
It also serves to help reduce memory by only loading specific partitions into memory when they are required.
In order to effectively use partitioning, you need to partition on an indexed column.
The initial access of a newly partition table with a new index, does not enable the benefits of partition until the second subsequent access of the table due to the Plan Cache.  A method/process of pre-loading the execution plan/statistics into the cache is required.