Why These Concepts Matter
Every business wants zero downtime, but in reality, solutions must balance cost, complexity, and risk tolerance. Two key metrics — RTO (Recovery Time Objective) and RPO (Recovery Point Objective) — drive continuity planning.
As a Solution Architect, you must design systems that meet these objectives while leveraging Azure’s high availability (HA) and disaster recovery (DR) capabilities.
Key Terms
Recovery Time Objective (RTO):
- Maximum time allowed to restore service after an outage.
- Example: “Web app must be back online within 2 hours.”
Recovery Point Objective (RPO):
- Maximum acceptable data loss measured in time.
- Example: “Database can lose max 15 minutes of transactions.”
High Availability (HA):
- Ensures app/service stays online during localized failures (VM, rack, datacenter zone).
Disaster Recovery (DR):
- Ensures workloads can recover from regional or catastrophic failures.
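To make the two metrics concrete, here is a minimal sketch that measures an incident against the example targets above (RTO of 2 hours, RPO of 15 minutes). The function name `recovery_metrics` and the incident timestamps are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta

def recovery_metrics(last_good_copy, outage_start, service_restored):
    """Return (downtime, data_loss_window) for a single incident."""
    downtime = service_restored - outage_start   # compared against the RTO
    data_loss = outage_start - last_good_copy    # compared against the RPO
    return downtime, data_loss

# Hypothetical incident timeline (assumed values, not from any real outage)
last_good_copy = datetime(2024, 1, 10, 9, 50)
outage_start = datetime(2024, 1, 10, 10, 0)
service_restored = datetime(2024, 1, 10, 11, 30)

downtime, data_loss = recovery_metrics(last_good_copy, outage_start, service_restored)

rto = timedelta(hours=2)      # "back online within 2 hours"
rpo = timedelta(minutes=15)   # "lose max 15 minutes of transactions"

rto_met = downtime <= rto
rpo_met = data_loss <= rpo
print(f"Downtime {downtime}, RTO met: {rto_met}")
print(f"Data loss window {data_loss}, RPO met: {rpo_met}")
```

In this timeline the service was down for 90 minutes and 10 minutes of data were at risk, so both objectives are met. RTO is measured forward from the outage, while RPO is measured backward from it.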
Azure High Availability Building Blocks
- Availability Sets
  - Protect against rack-level failures within a datacenter by spreading VMs across fault domains and update domains.
  - SLA = 99.95% (for two or more VMs in the set).
- Availability Zones (AZs)
  - Physically separate datacenters within a region.
  - Protect against datacenter failures.
  - SLA = 99.99% (for two or more VMs across two or more zones).
- Paired Regions
  - Azure links regions into pairs for disaster recovery.
  - Example: East US ↔ West US.
  - Platform updates are rolled out to one region of a pair at a time, and geo-redundant services replicate data to the paired region.
- Geo-Redundant Storage (GRS)
  - Data is replicated asynchronously to the secondary (paired) region.
  - Provides durability in case of a regional outage.
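The SLA percentages above are easier to compare as a downtime budget. The sketch below converts an SLA into permitted downtime per month. The helper name `allowed_downtime_minutes` and the 30-day month are assumptions for illustration, not part of any Azure tooling.

```python
def allowed_downtime_minutes(sla_percent: float, period_hours: float) -> float:
    """Downtime (in minutes) that an SLA permits over the given period."""
    return (1 - sla_percent / 100) * period_hours * 60

MONTH_HOURS = 30 * 24  # assumption: a 30-day month

availability_set_budget = allowed_downtime_minutes(99.95, MONTH_HOURS)
availability_zone_budget = allowed_downtime_minutes(99.99, MONTH_HOURS)

print(f"99.95% SLA: {availability_set_budget:.2f} min/month")
print(f"99.99% SLA: {availability_zone_budget:.2f} min/month")
```

Under these assumptions, 99.95% allows roughly 21.6 minutes of downtime per month, while 99.99% allows roughly 4.3 minutes, a fivefold difference.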
Design Considerations
- Match SLA to Business Needs
  - Mission-critical apps → Zones + multi-region failover.
  - Internal apps → Availability Sets may be enough.
- Define RTO & RPO per workload
  - Finance system → RPO of seconds, RTO of minutes.
  - HR system → RPO of hours, RTO of a day.
- Cost vs Continuity
  - High availability = higher cost (multiple regions, more redundancy).
  - Not every app needs multi-region DR.
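One way to apply these considerations is to map each workload's objectives to a protection tier. The sketch below is illustrative only: the function `suggest_protection` and its cut-off values (1 hour, 15 minutes, 8 hours) are assumptions, not Azure guidance, and real thresholds depend on the business.

```python
from datetime import timedelta

def suggest_protection(rto: timedelta, rpo: timedelta) -> str:
    """Map RTO/RPO targets to a protection tier (thresholds are assumed)."""
    if rto <= timedelta(hours=1) or rpo <= timedelta(minutes=15):
        return "Availability Zones + multi-region failover"
    if rto <= timedelta(hours=8):
        return "Availability Sets or Zones in a single region"
    return "Backup and restore"

# Workloads from the list above, with assumed concrete values
finance = suggest_protection(rto=timedelta(minutes=10), rpo=timedelta(seconds=30))
hr = suggest_protection(rto=timedelta(days=1), rpo=timedelta(hours=4))

print("Finance system:", finance)
print("HR system:", hr)
```

The finance system lands in the most expensive tier and the HR system in the cheapest, which reflects the cost-versus-continuity trade-off: tighter objectives require more redundancy.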
Example Enterprise Scenario
A global e-commerce company requires:
- Customer-facing website always available (HA + DR).
- Database must not lose more than 5 minutes of data.
- Internal HR portal can be offline for up to 24 hours in a disaster.
Correct design:
- Use Availability Zones for the website VMs (HA), with failover to a second region for DR.
- Replicate the database with active geo-replication (Azure SQL Database) or Always On availability groups (SQL Server on VMs) to meet the 5-minute RPO.
- Protect the HR portal with Azure Backup and accept a longer RTO/RPO.
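The database requirement shows why replication is chosen over backups alone. As a rough sketch, worst-case data loss equals the interval between successive copies of the data. The names `worst_case_data_loss` and `meets_rpo`, the hourly backup schedule, and the 5-second replication lag are all assumed values for illustration.

```python
from datetime import timedelta

def worst_case_data_loss(copy_interval: timedelta) -> timedelta:
    # A failure just before the next copy loses everything written
    # since the previous copy, i.e. up to one full interval.
    return copy_interval

def meets_rpo(copy_interval: timedelta, rpo: timedelta) -> bool:
    return worst_case_data_loss(copy_interval) <= rpo

DB_RPO = timedelta(minutes=5)  # from the scenario

# Hypothetical protection options for the database
options = {
    "hourly backup": timedelta(hours=1),
    "replication with ~5 s lag": timedelta(seconds=5),
}

results = {name: meets_rpo(interval, DB_RPO) for name, interval in options.items()}
for name, ok in results.items():
    print(f"{name}: {'meets' if ok else 'misses'} the 5-minute RPO")
```

Under these assumptions, an hourly backup can lose up to 60 minutes of data and misses the target, while continuous replication stays well inside it.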
Confusion Buster
- HA vs DR
  - HA = handles small/localized failures.
  - DR = handles regional/catastrophic failures.
- RTO vs RPO
  - RTO = “How fast can we recover?”
  - RPO = “How much data can we afford to lose?”
- GRS vs RA-GRS
  - GRS = replicates to another region; the secondary copy is not readable unless a failover occurs.
  - RA-GRS = adds read access to the secondary region.
Exam Tips
- “Which metric defines acceptable downtime?” → RTO.
- “Which metric defines acceptable data loss?” → RPO.
- “Which Azure feature provides intra-region HA?” → Availability Zones.
- “Which Azure feature ensures regional disaster recovery for storage?” → GRS.
What to Expect in the Exam
- Direct Q: “What does RPO measure?” → Max data loss in time.
- Scenario Q: “Finance system must recover within 15 minutes, losing max 5 min of data.” → Low RTO + Low RPO → use Availability Zones + Geo-replication.
- Trick Q: “High availability and disaster recovery are the same.” → False.