| DRP Disaster Recovery Plan | | Print | |
|
The article describes a plan and implementation process for disaster recovery planning. The secret to success in our experience is to involve the local response team from the outset of the project. Copyright 2006 D.Lieberman. This work is licensed under the Creative Commons Attribution License The disaster recovery plan is designed to assist companies in responding quickly and effectively to a disaster in a local office and restore business as quickly as possible. In our experience, participation in the planning and implementation process is more important than the process itself and helps ensure that the local response teams understand what they need to do and that resources they need will be available. Keywords
DR planning is not about writing a procedure, getting people to sign up and then filing it away somewhere. In the BIT (business impact timeline) we see a continuum of actions before and after an incident. In the pre-incident phase, the teams are built, plans are written, and preparedness is maintained with training and audit. After an incident, the team responds, recovers, restores service and assesses effectiveness of the plan. T=ZERO is the time an incident happens. Even though one hopes that disaster will never strike, refresher training should be conducted every 6 months because of employee turnover and system changes and self-audits conducted by the ERT every 3 months. Building the DR planBuild the ERTAssign a 2-person team in each major office (for small offices with one or two people, then the employee will do it himself) to be the ERT. The people in the ERT need to have both technical and social skills to handle the job. Technical skills means being able to call an IT vendor and being able to help the vendor diagnose a major issue such as an unrecoverable hard disk crash on an office file and print server. Social skills means staying cool under pressure and following procedure in major events such as fire, flooding or terror attack. In addition to an ERT in each office, one ERT will be designated as response manager. The response manager is a more senior person (with a backup person) that will command the local teams during crisis, maintain the DRP documentation and provide escalation. The local response team becomes involved and committed to the DRP by planning their responses to incidents and documenting locations of resources they need in order to respond and restore service. DR Planning Pre-incident activitiesKickoff callThe purpose of the call is to introduce the DRP process and set expectations for the local ERT. Two days before the call, the local team will receive a PowerPoint presentation describing DRP, the implementation process and the BIA worksheet. At the end of the call, the team will take a commitment to fill out the worksheet and prepare for a review session on the phone one week later. Business Impact Assessment (BIA)In the BIA, the team lists possible incidents that might happen and assesses the impact of a disaster on the business. For example there are no monsoons in Las Vegas but there might be an earthquake (Vegas is surrounded by tectonic faults and number 3 in the US for seismic activity) and an earthquake could put a customer service center in Vegas out of business for several days at least. Recover and RestoreRecovery is about the ERT having detailed and accessible information about backups data, server, people and alternative office space. Within 30 days after a disaster, full service should be restored by the ERT working with local vendors and the response manager. DR Plan ReviewThe purpose of the call is to allow each team to present their worksheet and discuss appropriate responses with the global response manager. Two days before the call, the teams will send in their BIA worksheet. The day after the call the revised DRP will be posted. Filling out the DRP worksheetsThere are two worksheets the BIA worksheet (which turns into the primary response checklist) and the RRP (recover and restore plan) worksheet, which contains a detailed list of how to recover backup resources and restore service. Filling out the BIA worksheet.In the BIA worksheet, the team lists possible incidents and assesses the impact of a disaster on the business. In order to assess the impact of a disaster on the business we grade incidents using a tic-tac-toe matrix. The team will mark the probability and impact rating for an incident going across a row of the matrix. A risk might have probability 2 and impact 5 making it a 7, while another risk might have probability 1 and impact 3 making it a 4. Countermeasures would be implemented for the 7 risk before being implemented for the 4 risk. BIA worksheet step by step
Filling out the RRP worksheet.In the RRP worksheet, the team documents in detail how to locate and restore backups and how to access servers (in the network and physically). Maintaining the DR planDR exercisesOnce every 6 months, the response manager will run an unannounced exercise, simulating an emergency. In a typical DR exercise the local ERT will be required to:
Self-AuditAfter completion of the ER plan the local response team needs to perform periodic self-audits. A member of the local ERT will schedule an audit once every 3 months and notify the response manager by email regarding the date.
|
