Summary

Recovery time objective (RTO) and recovery point objective (RPO) are two concepts that are used in business continuity and disaster recovery planning to establish a business’s tolerance for data loss and recovery time in the event of a failure.


Recovery time objective (RTO) and recovery point objective (RPO) are two important concepts used in disaster recovery planning. Both mark thresholds beyond which an outage causes unacceptable harm to the business.

RTO is the service level defining how long a recovery may take before unacceptable levels of damage occur from an outage. Meanwhile, RPO is the service level defining the point in time when data loss resulting from an outage becomes unacceptable. Exceeding either has the same result: the business suffers.

In this article, we’ll take a deep dive into the key differences between RTO and RPO and why they matter in a disaster. We’ll also cover how to incorporate them into disaster planning and what to prioritize.

RTO in Disaster Recovery

RTO, as stated above, is the maximum amount of time a business can function without a specific application. After that time, the business suffers. The damage can range from the inability to appropriately onboard and schedule staff if a staff resourcing system is down to the inability to process revenue if a digital storefront or payment processing system is down.

That means RTO will vary between systems, and even with how a given system interacts with different parts of the business. The level of granularity depends on how critical the system is to supporting business operations.

Organizations typically define RTO as a time period greater than zero. Very critical systems might have a very low RTO, ranging from one to four hours. Less critical systems may have RTOs ranging from hours to days. Establishing those times depends on the priorities a business sets for itself and the resources it can bring to bear during disaster recovery.

RPO in Disaster Recovery

RPO is a corollary to RTO with a much different focus. RPO refers to the amount of data, typically expressed in time, that can be lost before business operations suffer. The volume of data will change drastically, depending on the services provided by the downed system. 

For example, hospitals use electronic health record (EHR) systems to sustain clinical care operations. EHR systems tend to have low RPO due to the patient safety issues downtime presents. Building access systems that can operate independently in a disaster may have a very high RPO, especially if they safeguard a low-trafficked area. 

RPO helps define RTO. If high volumes of data loss are unsustainable or unacceptable, then the RTO must be low for that system. The system must be up to receive data within the specified RTO so that the RPO isn’t compromised. There are ways to mitigate that, and they will depend on how you structure a disaster recovery strategy.

Remember, though, RPO is focused almost entirely on the data itself and preserving its integrity. 
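
To make the two objectives concrete, the following minimal Python sketch scores a single outage against both. The Objectives and Incident types, the assess function, and every timestamp and target in it are hypothetical illustrations, not part of any product or standard. It also assumes that the most recent usable backup is the only recovery point available.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Objectives:
    rto: timedelta  # longest tolerable outage
    rpo: timedelta  # longest tolerable window of lost data


@dataclass(frozen=True)
class Incident:
    last_good_backup: datetime  # most recent usable recovery point
    failure: datetime           # when the system went down
    service_restored: datetime  # when the system was usable again


def assess(incident: Incident, objectives: Objectives) -> dict:
    """Compare the actual downtime and data-loss window with the objectives."""
    downtime = incident.service_restored - incident.failure
    data_loss_window = incident.failure - incident.last_good_backup
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= objectives.rto,
        "rpo_met": data_loss_window <= objectives.rpo,
    }


# Hypothetical targets and timestamps, chosen only for illustration.
targets = Objectives(rto=timedelta(hours=1), rpo=timedelta(minutes=15))
outage = Incident(
    last_good_backup=datetime(2024, 1, 8, 9, 50),
    failure=datetime(2024, 1, 8, 10, 0),
    service_restored=datetime(2024, 1, 8, 12, 30),
)
result = assess(outage, targets)
print(result["rto_met"], result["rpo_met"])  # prints: False True
```

In this made-up outage the RPO holds (10 minutes of data lost against a 15-minute target) while the RTO is missed (2.5 hours of downtime against a 1-hour target), so each objective has to be checked on its own.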

How RTO and RPO Play Into a Disaster Recovery Strategy

RTO and RPO both play heavily into a disaster recovery strategy. They converge for very technologically oriented processes and tend to align for business-critical processes. That means that for very critical, system-dependent business processes or functions, both the tolerable downtime and the tolerable data loss approach zero.

When thinking about RTO, keep a few concepts in mind as you craft a strategy:

  • Resource availability: Since no organization has infinite staff or money, there are limits on resources that can be brought to bear recovering from a technological disaster. RTOs for different systems need to reflect their criticality to business operations so finite resources can be leveraged to bring up more critical systems first. 
  • Business priorities: A disaster recovery strategy must account for business priorities like revenue, supply chain, staff management, and others. Those priorities should be ranked by importance. Systems supporting more important functions should be prioritized over those that support less important functions.
  • Support systems: Some systems aid in disaster recovery but might otherwise be rated as low importance because they support mission-critical priorities only indirectly. Systems that ease and expedite disaster recovery efforts should be prioritized for recovery.
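
One way to turn these considerations into a recovery sequence is sketched below in Python. It is only an illustration under stated assumptions: the system names, RTO values, and dependency map are hypothetical, every dependency is assumed to have its own RTO entry, and "priority" is reduced to a single number (RTO in hours). A support system inherits the tightest RTO of anything that depends on it, so it is restored early even if its own RTO is loose. The ordering uses graphlib.TopologicalSorter from the Python standard library (3.9+).

```python
import heapq
from graphlib import TopologicalSorter


def effective_rto(rto_hours: dict, depends_on: dict) -> dict:
    """Give each dependency the tightest RTO of the systems that rely on it."""
    eff = dict(rto_hours)
    changed = True
    while changed:  # repeat until no value shrinks (handles chains)
        changed = False
        for system, deps in depends_on.items():
            for dep in deps:
                if eff[system] < eff[dep]:
                    eff[dep] = eff[system]
                    changed = True
    return eff


def recovery_order(rto_hours: dict, depends_on: dict) -> list:
    """Order systems so dependencies come first, most urgent first otherwise."""
    eff = effective_rto(rto_hours, depends_on)
    sorter = TopologicalSorter(
        {name: depends_on.get(name, ()) for name in rto_hours}
    )
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    ready, order = [], []
    while sorter.is_active():
        for name in sorter.get_ready():
            heapq.heappush(ready, (eff[name], name))
        _, name = heapq.heappop(ready)  # most urgent system that is unblocked
        order.append(name)
        sorter.done(name)
    return order


# Hypothetical systems: RTO in hours, and what each one needs to come up first.
rto_hours = {"storefront": 1, "payments": 2, "hr": 8, "identity": 24}
depends_on = {"storefront": ["identity"], "payments": ["identity"]}

print(recovery_order(rto_hours, depends_on))
# prints: ['identity', 'storefront', 'payments', 'hr']
```

Here the identity service has a loose RTO of its own, but it moves to the front of the queue because the storefront cannot be recovered without it.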

Similarly, when thinking about RPO, business continuity and disaster recovery plans should factor in these key concepts:

  • Backups: For business processes with relatively low RPO, think about frequent backups to assure continuity and preservation of RPO levels.
  • Data volumes: Data volumes affect RPOs because they determine how much data is at risk between backups. Very high data volume processes are difficult to back up frequently if backups aren’t correctly architected, which could result in large volumes of lost data during downtime.
  • Cost: Backups cost money. All backups take up storage space, which can be priced by the gigabyte, and cloud backups may incur ingress and egress charges depending on architecture. A careful cost versus risk of loss analysis is a helpful tool to plan for disaster recovery. 
  • Data criticality: Less critical data might not need to be backed up as frequently as highly critical data. Evaluating data criticality to business processes is key to managing appropriate recovery objectives.
  • Data changes: Some data stores may experience regular high volumes of changes, while others will rarely change. Backup frequency and maintenance should account for how often the target data set changes.

Based on these factors, it’s straightforward to see why high-velocity, high-volatility, data-driven technology workflows that support critical business functions may have both low RTO and low RPO. Fortunately, these correlated concepts have a common resolution: tailored backup solutions with rapid recovery. 
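
The backup, data volume, and cost factors above can be combined into a rough back-of-the-envelope estimate. The Python sketch below is only that: the backup_plan function and all of its inputs are hypothetical, it assumes incremental backups taken once per RPO window that store only changed data, and it ignores compression, deduplication, full backups, and any ingress or egress charges.

```python
import math


def backup_plan(rpo_minutes: float, change_rate_gb_per_hour: float,
                retention_days: int, price_per_gb_month: float) -> dict:
    """Rough sizing for incremental backups taken once per RPO window."""
    if rpo_minutes <= 0:
        raise ValueError("rpo_minutes must be positive")
    backups_per_day = math.ceil(24 * 60 / rpo_minutes)
    # Worst case: a failure just before the next backup loses a full interval.
    data_at_risk_gb = change_rate_gb_per_hour * rpo_minutes / 60
    # Assumes every changed gigabyte is kept for the whole retention period.
    retained_gb = change_rate_gb_per_hour * 24 * retention_days
    return {
        "backups_per_day": backups_per_day,
        "data_at_risk_gb": data_at_risk_gb,
        "retained_gb": retained_gb,
        "monthly_storage_cost": retained_gb * price_per_gb_month,
    }


# Hypothetical inputs: 15-minute RPO, 4 GB of changes per hour,
# 30 days of retention, and a made-up price of 0.02 per GB per month.
plan = backup_plan(rpo_minutes=15, change_rate_gb_per_hour=4,
                   retention_days=30, price_per_gb_month=0.02)
for key, value in plan.items():
    print(f"{key}: {value}")
```

With these made-up inputs, the plan calls for 96 backups a day, puts at most 1 GB of changes at risk between backups, and retains 2,880 GB at steady state. Under these assumptions, a tighter RPO raises the backup count and lowers the data at risk, while the storage bill is driven by change rate and retention, not by backup frequency.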

SLA Perception vs. Reality: RPO and RTO

Many IT managers believe meeting their RPO and RTO SLAs is achievable. But reality often fails to meet perception. In a report by ESG, 90% of respondents said their organization could not withstand more than an hour’s worth of lost data before experiencing significant business impact, equating to an estimated mean RPO of 22 minutes. This is especially troublesome when nearly half of these organizations indicated that their data is typically at least a week old, with the overall average age being 49 days. The amount of data restored as part of a typical recovery effort tends to skew older, especially for larger recoveries.

The vast majority (71%) of one-day-old recoveries are less than 50 GB. This could simply be explained by the nature of these types of recoveries, such as corrupted tables or deleted files, and it makes operational recovery SLAs much easier to deliver on. Past the one-day window, a significant and progressive jump occurs toward larger recoveries: longer time means more data, and likely more recovery time and resources.

Disaster Recovery as a Service with Pure Protect

A cyber resiliency architecture can significantly minimize RTO and RPO by implementing robust backup and disaster recovery solutions that ensure rapid system restoration and data retrieval after a cyber incident.

Pure Protect™ //DRaaS does exactly that. Its tailored solutions are right-sized for businesses and their critical assets. Pure Protect //DRaaS also keeps your data where you need it: in your custody and at your fingertips. By maintaining it in your AWS cloud, Pure Protect //DRaaS also ensures that you can maximize recovery speed through cloud-preconfigured workloads.

What’s more, Pure Protect //DRaaS helps you test your backups and resilience in a segmented test environment that maximizes your preparation for the worst, while avoiding unwanted disruptions to your production environment. Pure Protect //DRaaS is a highly resilient, highly transparent solution for all your disaster recovery needs. 

Conclusion

Disaster recovery planning can be a daunting proposition. Keeping RTO and RPO in mind as you plan is key. Thinking about recovery in terms of hours of downtime and gigabytes of data lost makes it easier to focus on key metrics and requirements.

Those requirements and metrics revolve around doing business. What’s important to your business and keeping it running is the focus of your disaster recovery efforts. Those hours and gigabytes directly translate into dollars, and understanding how to minimize that loss as quickly as possible will make your organization more resilient and ready for disaster.