Database & Storage Resiliency (SQL Auto-Failover, Cosmos DB, GRS, RA-GRS)

Applications are only as resilient as their data layer. If the database or storage account becomes unavailable, every service that depends on it goes down with it. Azure provides built-in mechanisms for high availability (HA), disaster recovery (DR), and global distribution of databases and storage accounts.

1. SQL Database Resiliency

a. Built-in HA

Azure SQL Database automatically replicates data within a region.
All service tiers (DTU: Basic, Standard, Premium; vCore: General Purpose, Business Critical, Hyperscale) include built-in HA backed by an availability SLA.

b. Auto-Failover Groups

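Enable automatic failover of a group of databases between two regions.
Supports both SQL Database and SQL Managed Instance.
Provides read-write and read-only listener endpoints whose DNS names stay the same after failover, so applications keep the same connection strings.
Supports both planned failover (no data loss, e.g., DR drills) and forced or automatic failover during a regional outage (possible data loss).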
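A minimal sketch of how an application might target the failover group listeners instead of a specific server. It assumes the Azure SQL Database listener naming <fog-name>.database.windows.net (read-write) and <fog-name>.secondary.database.windows.net (read-only); the group name shop-fog, the database orders, and the helper function are hypothetical.

```python
# Sketch: build ODBC connection strings that point at failover group listeners
# rather than at a specific logical server. All names here are hypothetical.

def listener_connection_string(fog_name: str, database: str, read_only: bool = False) -> str:
    """Return an ODBC connection string for an Azure SQL Database failover group.

    Assumed listener naming convention:
      read-write: <fog-name>.database.windows.net
      read-only:  <fog-name>.secondary.database.windows.net
    """
    if read_only:
        host = f"{fog_name}.secondary.database.windows.net"
    else:
        host = f"{fog_name}.database.windows.net"

    parts = [
        "Driver={ODBC Driver 18 for SQL Server}",
        f"Server=tcp:{host},1433",
        f"Database={database}",
        "Encrypt=yes",
    ]
    if read_only:
        # Declare read-only intent for connections made through the read-only listener.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"


if __name__ == "__main__":
    print(listener_connection_string("shop-fog", "orders"))
    print(listener_connection_string("shop-fog", "orders", read_only=True))
```

Neither string names the current primary or secondary server; the listener DNS records follow the primary role, which is what lets the application survive a failover without a configuration change.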

c. Geo-Replication

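Active geo-replication → up to 4 readable secondaries (SQL Database only; not available for SQL Managed Instance).
Failover is manual (customer-initiated), and each secondary has its own server name.
Ideal for read-intensive apps distributed globally.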
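The four-secondary limit and the manual failover can be modeled in a few lines. This is a conceptual sketch, not an Azure SDK: the class, its methods, and the server names are hypothetical, and write_endpoint assumes the standard <server>.database.windows.net host name. The point to notice is that the write endpoint changes after failover.

```python
# Conceptual model (not an Azure API) of active geo-replication:
# up to four readable secondaries, and failover only happens when called explicitly.

MAX_GEO_SECONDARIES = 4


class GeoReplicatedDatabase:
    def __init__(self, primary_server: str):
        self.primary = primary_server
        self.secondaries: list[str] = []

    def add_secondary(self, server: str) -> None:
        if len(self.secondaries) >= MAX_GEO_SECONDARIES:
            raise ValueError("active geo-replication supports at most 4 secondaries")
        self.secondaries.append(server)

    def failover(self, target: str) -> None:
        """Manually promote a secondary; nothing in this model triggers it automatically."""
        if target not in self.secondaries:
            raise ValueError(f"{target} is not a geo-secondary")
        self.secondaries.remove(target)
        self.secondaries.append(self.primary)  # the old primary becomes a secondary
        self.primary = target

    def write_endpoint(self) -> str:
        # Each server has its own name, so this value changes after a failover.
        return f"{self.primary}.database.windows.net"
```

Because write_endpoint returns a different host after failover, an application using plain geo-replication has to switch its connection string itself; failover groups avoid this with stable listener names.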

Exam Tip: If you see “cross-region automatic failover with minimal downtime” → Auto-Failover Groups.

2. Cosmos DB Resiliency

a. Multi-Region Distribution

Natively distributes data across multiple Azure regions.
Clients are routed to the nearest (first preferred) available region → low latency.
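A minimal sketch of the routing idea, assuming the application supplies an ordered preference list with the nearest region first, which is how the Cosmos DB SDKs are typically configured (for example, the preferred_locations option of the Python azure-cosmos client). The function below is hypothetical and only models the choice; it does not call any SDK.

```python
def choose_read_region(preferred: list[str], available: set[str]) -> str:
    """Return the first preferred region that is currently available.

    Models (does not call) SDK region routing: the app lists regions
    nearest-first; if a region is down, requests move to the next one.
    This sketch simply raises if none of the preferred regions is up.
    """
    for region in preferred:
        if region in available:
            return region
    raise RuntimeError("no preferred region is available")
```

With preferences ["West Europe", "North Europe", "East US"], a client in Europe reads from West Europe; if West Europe drops out of the available set, the same call returns North Europe.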

b. Consistency Models (from strongest to weakest):

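Strong (linearizable: reads always return the latest committed write).
Bounded Staleness (reads lag writes by at most K versions or T time).
Session (default, good balance: read-your-own-writes within a client session).
Consistent Prefix (reads never see writes out of order).
Eventual (lowest latency, no ordering guarantee; replicas converge over time).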
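The guarantees can be compared with a simplified model in which every write gets an increasing log sequence number (LSN) and a read is allowed from a replica only if that replica has applied enough writes. This is a teaching sketch, not Cosmos DB internals: the function and parameter names are hypothetical, bounded staleness is modeled by a version count only (not the time bound), and the ordering difference between consistent prefix and eventual is not modeled.

```python
from enum import IntEnum


class Consistency(IntEnum):
    # Ordered strongest (0) to weakest (4), matching the list in the text.
    STRONG = 0
    BOUNDED_STALENESS = 1
    SESSION = 2
    CONSISTENT_PREFIX = 3
    EVENTUAL = 4


def replica_can_serve(level: Consistency, replica_lsn: int, latest_lsn: int,
                      session_lsn: int = 0, max_lag: int = 0) -> bool:
    """Decide whether a replica that has applied writes up to replica_lsn may
    serve a read under the given level (simplified LSN model)."""
    if level is Consistency.STRONG:
        return replica_lsn == latest_lsn            # must reflect the latest committed write
    if level is Consistency.BOUNDED_STALENESS:
        return latest_lsn - replica_lsn <= max_lag  # may lag by at most max_lag writes
    if level is Consistency.SESSION:
        return replica_lsn >= session_lsn           # must include this session's own writes
    # Consistent prefix and eventual accept any replica in this model.
    return True
```

Reading down the enum, each level accepts more replicas than the one above it, which is why weaker levels give lower latency and higher availability.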

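c. Multi-Master Write Support (now called multi-region writes)

Enables writes in multiple regions.
Conflicts are resolved automatically by policy: Last Writer Wins (default, based on the _ts timestamp or a custom numeric property) or a custom policy (a merge stored procedure, or conflicts sent to the conflicts feed).
Strong consistency cannot be used with multi-region writes.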
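A sketch of the Last Writer Wins idea, assuming each conflicting version carries the _ts timestamp that Cosmos DB stamps on items (a custom numeric path can be passed instead). The region field and the tie-break on it are hypothetical additions that keep the example deterministic; they are not documented Cosmos DB behavior.

```python
def resolve_last_writer_wins(versions: list[dict], path: str = "_ts") -> dict:
    """Pick the winning version among conflicting writes from different regions.

    The version with the highest value at `path` wins (Last Writer Wins).
    Ties are broken by the hypothetical "region" field, for this sketch only.
    """
    if not versions:
        raise ValueError("no versions to resolve")
    return max(versions, key=lambda doc: (doc[path], doc["region"]))
```

For example, if East US writes a price of 19.99 at _ts 1700000100 and West Europe writes 17.99 at _ts 1700000105, the West Europe version wins and the other is discarded.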

Exam Tip: If scenario needs “global distribution + tunable consistency” → Cosmos DB.

3. Storage Account Resiliency

a. Replication Options:

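LRS (Locally Redundant Storage): 3 synchronous copies within a single datacenter in the primary region.
ZRS (Zone-Redundant Storage): 3 synchronous copies across three availability zones in the primary region.
GRS (Geo-Redundant Storage): LRS in the primary region + asynchronous replication to the paired secondary region (stored there as LRS, 6 copies total).
RA-GRS (Read-Access GRS): Same as GRS, plus read-only access to the secondary region at any time.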
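With RA-GRS the secondary copy is exposed through a separate read-only endpoint whose host name appends -secondary to the account name (for example, https://<account>-secondary.blob.core.windows.net). The sketch below assumes a caller-supplied fetch function (hypothetical; in practice it would wrap an HTTP or SDK call) and an illustrative account name. It tries the primary first, falls back to the secondary on a connection error, and reports which endpoint answered, because secondary data can be slightly stale under asynchronous replication.

```python
from typing import Callable, Tuple


def blob_endpoints(account: str) -> Tuple[str, str]:
    """Primary and RA-GRS secondary Blob endpoints for a storage account."""
    return (
        f"https://{account}.blob.core.windows.net",
        f"https://{account}-secondary.blob.core.windows.net",
    )


def read_with_fallback(account: str, path: str,
                       fetch: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Read from the primary; on a connection failure retry on the secondary.

    Returns the data and the endpoint that served it. Secondary reads may lag
    the primary because geo-replication is asynchronous.
    """
    primary, secondary = blob_endpoints(account)
    try:
        return fetch(f"{primary}/{path}"), primary
    except ConnectionError:
        # Only RA-GRS (not plain GRS) makes this secondary read possible.
        return fetch(f"{secondary}/{path}"), secondary
```

Returning the serving endpoint lets the caller decide, for example, to show product images from the secondary during an outage while blocking uploads until the primary is back.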

b. Use Cases:

LRS → cheapest, non-critical data.
ZRS → high availability within region.
GRS → disaster recovery with cross-region replication.
RA-GRS → disaster recovery + a read-only secondary endpoint usable during a primary-region outage.
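The use-case mapping above can be expressed as a small selection function. This is a hypothetical helper, not an Azure tool: it assumes the cost order LRS < ZRS < GRS < RA-GRS and returns the first option in that order meeting every requirement. Zone protection combined with geo replication is not offered by any of these four options, so the sketch raises in that case.

```python
# Properties of the four options described in the text.
# Copies: 3 in the primary region, plus 3 in the secondary for geo options.
REDUNDANCY = {
    "LRS":    {"copies": 3, "zones": False, "geo": False, "secondary_read": False},
    "ZRS":    {"copies": 3, "zones": True,  "geo": False, "secondary_read": False},
    "GRS":    {"copies": 6, "zones": False, "geo": True,  "secondary_read": False},
    "RA-GRS": {"copies": 6, "zones": False, "geo": True,  "secondary_read": True},
}

COST_ORDER = ["LRS", "ZRS", "GRS", "RA-GRS"]  # assumed cheapest first


def choose_redundancy(zone_ha: bool = False, geo_dr: bool = False,
                      secondary_read: bool = False) -> str:
    """Return the cheapest listed option that satisfies all requirements."""
    for name in COST_ORDER:
        opt = REDUNDANCY[name]
        if zone_ha and not opt["zones"]:
            continue
        if geo_dr and not opt["geo"]:
            continue
        if secondary_read and not opt["secondary_read"]:
            continue
        return name
    raise ValueError("no listed option combines zone and geo redundancy")
```

For the product-image requirement in the scenario, choose_redundancy(secondary_read=True) returns "RA-GRS".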

Exam Tip: If the requirement says “read access to data during a primary-region outage” → RA-GRS.

Example Enterprise Scenario

An e-commerce platform requires:

SQL Database must failover automatically to another region.
Global catalog DB must support multi-region writes.
Storage account must allow read access from secondary during outage.

Correct design:

Use SQL Auto-Failover Groups for DB continuity.
Deploy Cosmos DB with multi-region writes (multi-master) for the catalog.
Configure RA-GRS storage for product images.

Confusion Buster

Geo-Replication vs Auto-Failover Groups (SQL)
- Geo-replication = manual failover; the app must switch to the new primary's server name.
- Auto-Failover Groups = automatic failover with listener endpoints that keep the same names after failover.
Cosmos DB vs SQL DB
- Cosmos DB = global, multi-model, tunable consistency.
- SQL DB = relational, with built-in in-region HA and cross-region DR via geo-replication or failover groups.
GRS vs RA-GRS
- GRS = replicated, but the secondary is not readable unless the account fails over.
- RA-GRS = replicated + readable secondary.

Exam Tips

“Which feature enables cross-region automatic DB failover?” → Auto-Failover Groups.
“Which Azure DB offers tunable consistency models?” → Cosmos DB.
“Which storage redundancy option allows read access to secondary region?” → RA-GRS.
“Which Cosmos DB consistency level balances performance and correctness?” → Session.

What to Expect in the Exam

Direct Q: “Which redundancy option allows zone-level protection within a region?” → ZRS.
Scenario Q: “Company requires multi-region DB writes with conflict resolution.” → Cosmos DB multi-master.
Scenario Q: “Company requires read access during regional outage.” → RA-GRS.
Trick Q: “SQL geo-replication provides automatic failover.” → False (failover is manual; use Auto-Failover Groups for automatic failover).