Perhaps IT implementors think that Disaster Recovery is a dark art, or maybe it’s just another item to de-scope from the implementation project due to time restrictions or budget cuts.
But I find that it’s coming up more and more as part of the yearly financial system audits (the ones we love to hate!).
Not only is it an action on the main financial systems, but the auditors are training their sights on the ancillary systems too.
So where do I normally start?
Usually it’s not a case of where I would like to start, it’s where I *have* to start due to missing system documentation and configuration information.
Being a DBA, I’m responsible for ensuring the continuity of business with respect to the database system and the preservation of the data.
So no matter what previous work has been completed to provide a DR capability or an HA capability (the two are so often confused), it’s useless without documentation or analysis to show what the tolerances are: how much data you can afford to lose (the Recovery Point Objective, RPO) and how long you can afford to take to get a system back (the Recovery Time Objective, RTO).
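To make those two tolerances concrete, here is a minimal Python sketch of the kind of check I mean. Everything in it is an assumption for illustration: the `RecoveryProfile` class, the system name and the numbers are hypothetical, and it uses the simplification that the worst-case data loss equals the gap between recoverable points (backups or shipped logs).

```python
from dataclasses import dataclass


@dataclass
class RecoveryProfile:
    """Hypothetical record of one system's tolerances and measured capability."""
    name: str
    recovery_point_interval_min: int  # gap between backups or shipped logs
    measured_restore_min: int         # time a test restore actually took
    rpo_min: int                      # data loss the business says it can accept
    rto_min: int                      # downtime the business says it can accept

    def worst_case_data_loss_min(self) -> int:
        # Simplifying assumption: a failure just before the next recovery
        # point loses one whole interval of work.
        return self.recovery_point_interval_min

    def meets_rpo(self) -> bool:
        return self.worst_case_data_loss_min() <= self.rpo_min

    def meets_rto(self) -> bool:
        return self.measured_restore_min <= self.rto_min


# Illustrative figures only: hourly recovery points, a four-hour restore,
# a 15-minute RPO and an eight-hour RTO.
finance_db = RecoveryProfile(
    name="finance_db",
    recovery_point_interval_min=60,
    measured_restore_min=240,
    rpo_min=15,
    rto_min=480,
)

print(finance_db.name, "meets RPO:", finance_db.meets_rpo())
print(finance_db.name, "meets RTO:", finance_db.meets_rto())
```

With these made-up figures the system would be back within its RTO but could lose up to an hour of data against a 15-minute RPO, which is exactly the sort of gap that only shows up once the tolerances are written down.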
First things first, ensure that the business recognises that DR is not just an IT task.
You should point them at the host of web pages devoted to creating the perfect Business Continuity Plan (BCP).
These generally state that a BCP should include what *the business* intends to do in a disaster. This more often than not involves a slight re-work of the standard business processes. Whether this involves going to a paper-based temporary system or just using a cut-down process, the business needs a plan.
It’s not all about Recovery Point Objective and Recovery Time Objective!
A nice “swim lane” diagram will help a lot when picturing the business process.
You should start at the beginning of your business process (normally someone wants to buy something), then work through to the end (your company gets paid). Add all the steps in between and overlay the systems used. Then imagine those systems were not present: how could you work around them?
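If a diagram is not to hand, the same overlay can be sketched as data. The following Python is only an illustration: the steps, lanes, system names and workarounds are all hypothetical, and `steps_without_workaround` is a helper I made up to ask "if this system is gone, which steps have no plan?".

```python
from typing import List, NamedTuple, Optional, Tuple


class Step(NamedTuple):
    """One box in the swim lane diagram (all values here are hypothetical)."""
    name: str
    lane: str                         # business area that owns the step
    systems: Tuple[str, ...]          # systems the step relies on
    workaround: Optional[str] = None  # what the business does without them


# From "someone wants to buy something" through to "your company gets paid".
PROCESS = [
    Step("Customer places order", "Sales", ("CRM",), "Paper order form"),
    Step("Order is picked and shipped", "Warehouse", ("ERP",)),
    Step("Invoice is raised", "Finance", ("ERP", "Email")),
    Step("Payment is received", "Finance", ("Banking interface",),
         "Manual bank reconciliation"),
]


def steps_without_workaround(process: List[Step], failed_system: str) -> List[str]:
    """Names of steps that use the failed system and have no documented workaround."""
    return [
        step.name
        for step in process
        if failed_system in step.systems and step.workaround is None
    ]


print(steps_without_workaround(PROCESS, "ERP"))
```

Any step this returns is a question to take back to the business rather than something IT can answer alone.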
Secondly, you need to evaluate the risks among the systems you support. Identify points of failure.
To be able to do this effectively, you will need to know your systems inside out.
You should be aware of *all* dependencies (e.g. interfaces), especially with the more complex business systems like ERP, BPM, BI, middleware etc.
Start by creating a baseline of your existing configuration:
– What applications do you have?
– What versions are they?
– How are they linked (interfaces)?
– Who uses them (business areas)?
– What type of business data do they store (do they store data at all)?
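The baseline above can be captured in something as simple as a small script. This is a rough Python sketch under stated assumptions: the `Application` record, the application names, the version strings and the interface links are all hypothetical, and `downstream_of` is an illustrative helper that follows the interfaces to show what else is affected when one application is lost.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Application:
    """One row of the baseline (every value below is a made-up example)."""
    name: str
    version: str
    business_areas: List[str]
    stores_data: bool
    feeds: List[str] = field(default_factory=list)  # apps it sends data to


def downstream_of(inventory: Dict[str, Application], failed: str) -> List[str]:
    """Apps that receive data, directly or indirectly, from the failed app."""
    seen = set()
    stack = list(inventory[failed].feeds)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(inventory[name].feeds)
    seen.discard(failed)  # guard against circular interfaces
    return sorted(seen)


apps = [
    Application("ERP", "1.0", ["Finance", "Warehouse"], True, ["BI", "Middleware"]),
    Application("Middleware", "2.0", ["IT"], False, ["CRM"]),
    Application("CRM", "3.0", ["Sales"], True),
    Application("BI", "4.0", ["Finance"], True),
]
inventory = {app.name: app for app in apps}

print(downstream_of(inventory, "ERP"))
```

An application with a long downstream list is a good candidate for a point of failure, and one that does not store data may need rebuilding rather than restoring.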
Third, you should make yourself aware of the hardware capabilities of important things like your storage sub-system (SAN, NAS etc). Can you leverage these devices to make yourself better protected and better prepared?
An overall architecture diagram speaks a thousand words.
Once you understand what you have got, only then can you start to formulate a plan for the worst.
Decide what “the worst” actually is! I’ve worked for companies that are convinced an aeroplane would decimate the data centre at any moment. All the prep-work in the world couldn’t prevent it from happening, but maybe you can save the data (if not yourself 😉 ).