Why Application-Level Resiliency?
Even if your infrastructure is redundant, apps can still fail due to transient faults, high load, or dependency failures. Resilient apps are designed to recover gracefully and continue serving users. Azure supports these patterns through platform services (Service Bus, Event Grid, Functions) and design principles.
1. Retry Logic
Definition:
Apps should automatically retry failed operations (e.g., DB connection drops, API call fails).
Best Practices:
- Use exponential backoff (increasing retry intervals).
- Limit max retries to avoid infinite loops.
- Use SDKs with built-in retry policies (e.g., Azure Storage SDK).
Exam Tip: If the question says “handle transient SQL connectivity errors” → use Retry Logic.
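The retry practices above can be sketched in a few lines of Python. This is a minimal illustration, not an Azure SDK policy: `TransientError`, `retry_with_backoff`, `flaky_query`, and the default delays are hypothetical names and values chosen for the example. The `sleep` parameter is injectable so the demo runs without actually waiting.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for faults that are worth retrying."""


def retry_with_backoff(operation, max_retries=3, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Call operation(); on TransientError wait and retry, at most max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_retries:
                raise  # retry budget exhausted: surface the failure to the caller
            # Exponential backoff: base_delay, 2x, 4x, ... capped at max_delay.
            sleep(min(max_delay, base_delay * (2 ** attempt)))


# Demo: a simulated query that fails twice, then succeeds on the third call.
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("connection dropped")
    return "rows"


delays = []  # record the requested delays instead of sleeping
result = retry_with_backoff(flaky_query, sleep=delays.append)
```

Only errors classified as transient are retried here; anything else propagates immediately, which is usually what you want for faults such as bad credentials or malformed queries.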
2. Queues & Asynchronous Messaging
Definition:
Decouples components by buffering requests in a queue.
Options in Azure:
- Azure Storage Queues → simple, lightweight messaging.
- Azure Service Bus → enterprise-grade (topics, sessions, dead-letter queues).
- Event Grid → event-based, push model for serverless workflows.
- Event Hubs → big data ingestion, telemetry.
Benefits:
- Avoids bottlenecks.
- Increases resiliency when downstream services are unavailable.
- Enables async processing.
Exam Tip: If the scenario says “app must remain available even if backend service is down” → Queue-based decoupling.
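To make the buffering idea concrete, here is a small sketch that uses Python's in-memory `queue.Queue` as a stand-in for a durable queue such as Storage Queues or Service Bus. It does not call any Azure SDK; `accept_request`, `drain`, and the `backend_up` flag are hypothetical names used only for illustration.

```python
import queue

request_queue = queue.Queue()  # in-memory stand-in for a durable message queue


def accept_request(request):
    """Front end: enqueue and return immediately, regardless of backend health."""
    request_queue.put(request)
    return "accepted"


def drain(process, backend_available):
    """Worker: process buffered messages only while the backend is reachable."""
    processed = []
    while backend_available() and not request_queue.empty():
        processed.append(process(request_queue.get()))
    return processed


# Demo: the backend is down, yet the front end keeps accepting work.
backend_up = False
accepted = [accept_request({"id": 1}), accept_request({"id": 2})]
while_down = drain(lambda msg: msg["id"], lambda: backend_up)
pending_while_down = request_queue.qsize()

# Once the backend recovers, the worker catches up in FIFO order.
backend_up = True
after_recovery = drain(lambda msg: msg["id"], lambda: backend_up)
```

The producer never waits on the consumer, so a backend outage delays processing but does not reject requests. A real queue adds durability, so buffered messages also survive a restart of either side.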
3. Decoupling & Loose Coupling
Definition:
Design so components don’t directly depend on each other’s availability.
Techniques:
- Microservices with API Gateway.
- Queue-based buffering.
- Event-driven architecture (publish/subscribe).
- Azure Functions triggered by queues/events.
Benefit:
- Failures in one component don’t cascade to others.
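As a rough illustration of publish/subscribe isolation, the sketch below uses a tiny in-process event bus. It is an assumption-laden toy, not how Event Grid or Service Bus topics are implemented: `EventBus`, the `ride.completed` topic, and both handlers are hypothetical. It shows only that a failing subscriber does not stop the others from receiving the event.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self.failures = []  # (topic, handler name, error) for failed deliveries

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        """Deliver event to every subscriber; return the number of successes."""
        delivered = 0
        for handler in self._subscribers.get(topic, []):
            try:
                handler(event)
                delivered += 1
            except Exception as exc:
                # Isolate the failure so the remaining subscribers still run.
                self.failures.append((topic, handler.__name__, repr(exc)))
        return delivered


# Demo: billing is down, but the notification subscriber still gets the event.
bus = EventBus()
notifications = []


def send_notification(event):
    notifications.append(event["ride_id"])


def charge_customer(event):
    raise RuntimeError("billing service unavailable")


bus.subscribe("ride.completed", charge_customer)
bus.subscribe("ride.completed", send_notification)
delivered = bus.publish("ride.completed", {"ride_id": 42})
```

The publisher knows only the topic name, not who is listening, so subscribers can be added, removed, or fail without any change to the publishing component.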
4. Circuit Breaker Pattern
Definition:
Prevents apps from endlessly calling a failing service by “breaking” the connection until it recovers.
How It Works:
- If a service fails repeatedly → breaker trips (opens) → requests fail fast.
- After a cooldown, a test request is sent (half-open state). If it succeeds, the breaker resets (closes); if it fails, the breaker stays open.
Example:
- Payment service dependency fails → circuit breaker stops calls → fallback response provided.
Azure Implementation:
- Custom code with libraries like Polly in .NET.
- Logic Apps/Functions retry policies with error handling (these complement a circuit breaker; they do not implement one on their own).
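The trip, fail-fast, and trial-request behaviour described in this section can be sketched as a small class. This is a single-threaded illustration and does not reproduce Polly's API: `CircuitBreaker`, `CircuitOpenError`, `charge`, the threshold, and the cooldown are all hypothetical. The clock is injectable so the demo can advance time without waiting.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the breaker is closed

    def call(self, operation):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.cooldown_seconds:
                raise CircuitOpenError("failing fast; dependency presumed down")
            # Cooldown elapsed: fall through and let this call act as the trial request.
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # trip, or re-open after a failed trial
            raise
        # Success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


# Demo with a fake clock and a simulated billing API.
now = [0.0]
service_healthy = [False]
attempts = [0]
breaker = CircuitBreaker(failure_threshold=2, cooldown_seconds=10.0, clock=lambda: now[0])


def charge():
    attempts[0] += 1
    if not service_healthy[0]:
        raise ConnectionError("billing API down")
    return "charged"


def charge_with_fallback():
    try:
        return breaker.call(charge)
    except CircuitOpenError:
        return "fallback: bill later"  # breaker open: the API was not called
    except ConnectionError:
        return "failed"


outcomes = [charge_with_fallback() for _ in range(3)]

# After the cooldown the service has recovered, so the trial request succeeds.
now[0] = 11.0
service_healthy[0] = True
recovered = charge_with_fallback()
```

In the demo the third call never reaches the billing API, which is the point of the pattern: the failing dependency gets room to recover while callers receive a fast fallback instead of waiting on timeouts.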
Example Enterprise Scenario
A ride-sharing app requires:
- Retry when database is under heavy load.
- Continue accepting ride requests even if billing service is down.
- Prevent repeated billing API failures from overwhelming backend.
Correct design:
- Implement retry with exponential backoff for DB queries.
- Use Service Bus queues to decouple ride requests from billing.
- Implement circuit breaker for billing API.
Confusion Buster
- Retry vs Circuit Breaker
  - Retry = assume issue is temporary, try again.
  - Circuit Breaker = assume issue is persistent, stop hammering.
- Service Bus vs Event Grid
  - Service Bus = reliable delivery, ordered when sessions are used (transactions, DLQ).
  - Event Grid = lightweight, push events (serverless triggers).
- Queue vs Direct API Call
  - Queue = decoupled, resilient.
  - Direct call = tight coupling, failure cascades.
Exam Tips
- “Which design pattern handles transient SQL errors?” → Retry Logic.
- “Which Azure service supports async messaging with dead-letter queue?” → Service Bus.
- “Which pattern prevents apps from repeatedly calling a failing service?” → Circuit Breaker.
- “Which service provides event-driven triggers with low latency?” → Event Grid.
What to Expect in the Exam
- Direct Q: “Which pattern prevents cascading failures when a service is down?” → Circuit Breaker.
- Scenario Q: “E-commerce app must continue taking orders even if payment service is down.” → Queue decoupling with Service Bus.
- Scenario Q: “App must automatically retry transient SQL connection drops.” → Retry logic with exponential backoff.
- Trick Q: “Queues guarantee instant processing of requests.” → False (they enable async, not instant).