Measuring downtime: Recovery Point Objective and Recovery Time Objective

Business data is the lifeblood of every organization. As a result, uptime is one of the most important necessities in the digital world—and perhaps its most fragile as well.

When that data can’t be accessed because of an outage, cyberattack, human error or some other cause, the resulting unplanned downtime has a range of negative impacts on businesses of all sizes.

How do we measure downtime and the risks associated with it?

To better understand and plan how downtime impacts an organization, you need to set two critical metrics—Recovery Point Objective (RPO) and Recovery Time Objective (RTO).

What is Recovery Point Objective?

Recovery Point Objective describes the maximum amount of data that can be lost in an incident before there is an unacceptable impact on business. It is expressed as a length of time because it depends on the regular intervals at which your data is backed up. For example, if the last available copy of data is from 18 hours ago and your business continuity plan allows for an RPO of no more than 20 hours, then you’re still within your Recovery Point Objective.

Recovery Point Objective also describes the worst-case data loss associated with downtime. If you back up your data every 12 hours, then you will lose at most the last 12 hours of data.
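The arithmetic behind these two examples can be sketched in a few lines of Python. This is only an illustration: the function names and the sample timestamps are hypothetical, and it assumes the only inputs are the time of the last good backup and the agreed RPO.

```python
from datetime import datetime, timedelta

def data_at_risk(last_backup: datetime, now: datetime) -> timedelta:
    """Age of the newest restorable copy: the data lost if a failure happened now."""
    return now - last_backup

def within_rpo(last_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """True while the age of the last backup does not exceed the RPO."""
    return data_at_risk(last_backup, now) <= rpo

def worst_case_loss(backup_interval: timedelta) -> timedelta:
    """A failure just before the next scheduled backup loses one full interval."""
    return backup_interval

# Sample values mirroring the text: last copy is 18 hours old, RPO is 20 hours.
now = datetime(2024, 1, 2, 18, 0)
last_backup = now - timedelta(hours=18)

print(within_rpo(last_backup, now, rpo=timedelta(hours=20)))  # True
print(worst_case_loss(timedelta(hours=12)))                   # 12:00:00
```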

The Recovery Point Objective should be as low as possible in order to minimize the amount of damage caused by an incident. You should also set up notifications that warn you when the age of your latest backup is approaching the Recovery Point Objective, and set individual RPO targets for each application based on the thresholds in your service-level agreements, rather than relying on a single shared Recovery Point Objective for your entire business.
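As a rough sketch of per-application targets with an early warning, the Python below classifies the age of each application's latest backup. The application names, the targets and the 80% warning threshold are all assumptions made for illustration; real values would come from your service-level agreements.

```python
from datetime import timedelta

# Hypothetical per-application RPO targets (not taken from any real SLA).
RPO_TARGETS = {
    "billing-db": timedelta(hours=1),
    "file-share": timedelta(hours=12),
}

# Assumed threshold: warn once 80% of the RPO window has elapsed.
WARN_FRACTION = 0.8

def rpo_status(app: str, backup_age: timedelta) -> str:
    """Classify the age of an application's latest backup against its own RPO."""
    target = RPO_TARGETS[app]
    if backup_age > target:
        return "breached"
    if backup_age >= target * WARN_FRACTION:
        return "warning"
    return "ok"

# The same 50-minute-old backup is a warning for one application
# and perfectly fine for another.
print(rpo_status("billing-db", timedelta(minutes=50)))
print(rpo_status("file-share", timedelta(minutes=50)))
```

Keeping the targets in a per-application table makes it obvious when one shared figure would be too loose for a critical system or needlessly strict for a minor one.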

Of course, reducing Recovery Point Objective means increasing the frequency of backups, which in turn increases the data and bandwidth requirements over time.
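The trade-off can be made concrete with a deliberately simplified model. The sketch below assumes every backup run transfers the same average amount of data (the 5 GB figure is invented); incremental or deduplicated backups would behave differently, so treat the numbers as illustrative only.

```python
def backups_per_day(rpo_hours: float) -> float:
    """Number of backup runs per day needed to keep the backup interval at the RPO."""
    return 24 / rpo_hours

def daily_transfer_gb(rpo_hours: float, avg_backup_gb: float) -> float:
    """Data moved per day, assuming each run transfers avg_backup_gb (a simplification)."""
    return backups_per_day(rpo_hours) * avg_backup_gb

# Tightening the RPO from 12 hours to 1 hour multiplies the number of runs by 12.
for rpo in (12, 4, 1):
    print(f"RPO {rpo:>2} h: {backups_per_day(rpo):.0f} runs/day, "
          f"{daily_transfer_gb(rpo, avg_backup_gb=5):.0f} GB/day")
```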

What is Recovery Time Objective?

In contrast, Recovery Time Objective (RTO) is the length of time within which a business process must be restored, and the service level to which it must be restored, after a disruption is reported in order to avoid unacceptable consequences from the interruption.

Essentially, it’s the answer to the question in your recovery plan: “How long will it take before we’re back in business after a service outage or downtime?”

Just like RPO, an RTO should be as low as possible. For many businesses, every minute of downtime can mean thousands of dollars in lost revenue.

Why is understanding the difference between RPO and RTO so important?

Understanding the difference between RPO and RTO is critical in your planning for disaster.

Knowing the maximum amount of time your business can tolerate being offline (RTO) and how much data loss it can tolerate (RPO) helps shape your backup and recovery strategy. For example, it answers questions such as what types of backups you should run for business-critical applications and how frequently those backups should take place.

Why can RPOs and RTOs be considered calculations of risk?

Both RPO and RTO are calculations of risk because they measure how much loss a business can tolerate from a disaster or outage: RTO in time spent offline, and RPO in data.

The amount of risk is complex to quantify as it is unique to every company, application and dataset.

That's why it's so important that all the stakeholders invested in the availability of a business’s applications and data agree on how much risk can be tolerated when it comes to downtime.

What are failover and failback?

Failover is the ability to switch automatically and seamlessly from one system of operation to another with minimal or no downtime for users. To provide redundancy when the active system fails or terminates abnormally, standby components must always be ready to take over automatically.

All backup and recovery services must themselves be resistant to failure because disaster recovery relies on failover being successful.

Failback, or 'failing back', is the process of switching back to normal operations on the original system once it has been restored.
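The relationship between the two can be sketched as a small state machine. This is a minimal illustration rather than a description of any particular product: the class name and site names are hypothetical, and a real failback would normally resynchronize data to the primary before switching back.

```python
class FailoverPair:
    """Tracks which of two systems is serving users (illustrative only)."""

    def __init__(self, primary: str, standby: str) -> None:
        self.primary = primary
        self.standby = standby
        self.active = primary

    def update(self, primary_healthy: bool) -> str:
        """Apply the latest health check result and return the active system."""
        if not primary_healthy and self.active == self.primary:
            # Failover: the standby takes over from the failed primary.
            self.active = self.standby
        elif primary_healthy and self.active == self.standby:
            # Failback: return to normal operations on the primary.
            # (Data resynchronization is omitted from this sketch.)
            self.active = self.primary
        return self.active

pair = FailoverPair(primary="site-a", standby="site-b")
print(pair.update(primary_healthy=True))   # site-a
print(pair.update(primary_healthy=False))  # site-b (failover)
print(pair.update(primary_healthy=True))   # site-a (failback)
```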

Defining RTO and RPO values for client applications

When defining an RTO for your MSP client's business, you should factor in:

  • What does an outage cost per minute, hour or day?
  • Are there recovery SLAs in place with customers?
  • Which applications or systems are a priority for being restored?
  • What is the ideal order in which critical applications need to be recovered?
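Two of the factors above, outage cost and recovery order, lend themselves to a quick calculation. The sketch below uses invented applications and figures, and it assumes one simple ordering rule (tightest RTO first, then highest hourly cost); a real plan would also account for dependencies between systems.

```python
from dataclasses import dataclass

@dataclass
class App:
    name: str
    cost_per_hour: float  # assumed loss per hour of outage, in dollars
    rto_hours: float      # agreed Recovery Time Objective

def outage_cost(app: App, outage_hours: float) -> float:
    """Estimated cost of an outage of the given length."""
    return app.cost_per_hour * outage_hours

def recovery_order(apps: list[App]) -> list[str]:
    """Tightest RTO first; ties go to the application that costs more per hour."""
    ranked = sorted(apps, key=lambda a: (a.rto_hours, -a.cost_per_hour))
    return [a.name for a in ranked]

# Hypothetical client applications.
apps = [
    App("email", cost_per_hour=500, rto_hours=8),
    App("billing", cost_per_hour=2000, rto_hours=1),
    App("crm", cost_per_hour=800, rto_hours=8),
]

print(recovery_order(apps))        # ['billing', 'crm', 'email']
print(outage_cost(apps[1], 2.5))   # 5000.0
```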

When defining the RPO for your MSP client's business, you should factor in:

  • How much data, if any, can you stand to lose?
  • What are the potential financial implications?
  • What are the potential legal implications?
  • How does data loss affect your brand and reputation?

Why do you need DRaaS in your MSP toolkit?

Traditional backup only protects a segment of data. That's why our practical and free white paper, "Most MSPs Have Inadequate Disaster Recovery Solutions", outlines everything your MSP needs to know about the importance of DRaaS.
