Azure Monitor & Metrics Design

In Azure, resources are dynamic and distributed. Without effective monitoring, issues can go unnoticed until they cause downtime or compliance failures.
Azure Monitor is the central service for collecting, analyzing, and acting on telemetry data from Azure resources and applications.

As a Solution Architect, your role is to design a monitoring strategy that ensures visibility, proactive alerting, and alignment with business SLAs.

Core Components of Azure Monitor

1. Metrics

Lightweight numeric values stored as time series and available in near real time.
Examples: CPU %, memory usage, request count, disk IOPS.
Granularity: typically 1-minute intervals.
Best for performance monitoring and autoscaling triggers.
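The following minimal Python sketch makes the 1-minute granularity and the autoscale-trigger idea concrete. It assumes raw CPU samples arrive as (epoch_seconds, percent) pairs, averages them into 1-minute buckets, and then checks a hypothetical scale-out rule: the average over the last 5 minutes is above 70%. The function names and the 5-minute window are illustrative assumptions, not Azure Monitor APIs.

```python
from collections import defaultdict
from statistics import mean

def bucket_by_minute(samples):
    """Average raw (epoch_seconds, value) samples into 1-minute buckets.

    Returns a list of (minute_start_seconds, average_value), sorted by time.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        minute_start = int(ts) - int(ts) % 60  # floor to the start of the minute
        buckets[minute_start].append(value)
    return [(minute, mean(values)) for minute, values in sorted(buckets.items())]

def should_scale_out(minute_averages, threshold=70.0, window_minutes=5):
    """Hypothetical autoscale rule: scale out when the mean of the last
    `window_minutes` 1-minute averages exceeds `threshold`."""
    if len(minute_averages) < window_minutes:
        return False  # not enough data yet to evaluate the rule
    recent = [avg for _, avg in minute_averages[-window_minutes:]]
    return mean(recent) > threshold

# Two samples per minute for 5 minutes, with CPU climbing from about 60% to 86%.
raw = [(m * 60 + s, 60 + m * 6 + (2 if s else 0)) for m in range(5) for s in (0, 30)]
per_minute = bucket_by_minute(raw)
print(per_minute)
print("scale out:", should_scale_out(per_minute))
```

Averaging over a window instead of reacting to a single sample keeps a short CPU spike from triggering a scale-out.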

2. Logs

Detailed records of events and activities, stored as structured tables (text and numeric fields).
Examples: sign-in logs, network requests, app errors.
Stored in a Log Analytics workspace and queried with Kusto Query Language (KQL).
Best for root cause analysis and auditing.
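Logs keep every field of each event, so questions are answered by filtering and aggregating records. The Python sketch below imitates the shape of a simple KQL query (filter rows with `where`, then `summarize count() by` a column) over in-memory records. The record fields (`time`, `user`, `ip`, `result`) and the helper names are hypothetical stand-ins. A real Log Analytics table, such as one holding Microsoft Entra sign-in logs, has its own schema.

```python
from collections import Counter

# Hypothetical sign-in records; real Log Analytics tables have their own schemas.
records = [
    {"time": "2024-05-01T10:00:05Z", "user": "alice", "ip": "203.0.113.7", "result": "failure"},
    {"time": "2024-05-01T10:00:40Z", "user": "alice", "ip": "203.0.113.7", "result": "failure"},
    {"time": "2024-05-01T10:01:10Z", "user": "bob", "ip": "198.51.100.4", "result": "success"},
    {"time": "2024-05-01T10:02:00Z", "user": "carol", "ip": "203.0.113.7", "result": "failure"},
    {"time": "2024-05-01T10:03:30Z", "user": "bob", "ip": "198.51.100.4", "result": "failure"},
]

def where(rows, predicate):
    """Keep only rows matching predicate (analogous to KQL `where`)."""
    return [row for row in rows if predicate(row)]

def count_by(rows, column):
    """Count rows per value of `column` (analogous to KQL `summarize count() by column`)."""
    return Counter(row[column] for row in rows)

failures = where(records, lambda r: r["result"] == "failure")
print(count_by(failures, "ip").most_common())
```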

3. Alerts

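Alert rules evaluate conditions on metrics, logs, or the activity log and fire when the condition is met.
An action group defines what happens next: emails, SMS, voice calls, push notifications, or automation (Automation runbooks, Logic Apps, Azure Functions, webhooks).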
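An alert has two parts: a rule that evaluates a condition, and the actions that run when the rule fires. The plain-Python sketch below models that split. `AlertRule`, `ActionGroup`, the 80% threshold, and the list-appending "actions" are hypothetical illustrations. A real action group would send an email or SMS, or call a runbook or Logic App.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ActionGroup:
    """Hypothetical stand-in for an action group: a named list of callbacks."""
    name: str
    actions: List[Callable[[str], None]] = field(default_factory=list)

    def notify(self, message: str) -> None:
        for action in self.actions:
            action(message)

@dataclass
class AlertRule:
    """Hypothetical metric alert rule: fires when the value exceeds a static threshold."""
    name: str
    metric: str
    threshold: float
    action_group: ActionGroup

    def evaluate(self, value: float) -> bool:
        fired = value > self.threshold
        if fired:
            self.action_group.notify(f"{self.name}: {self.metric}={value} > {self.threshold}")
        return fired

sent = []  # collects (channel, message) so the example's effect is observable
ops = ActionGroup("ops-team", [
    lambda msg: sent.append(("email", msg)),
    lambda msg: sent.append(("sms", msg)),
])
rule = AlertRule("high-cpu", "Percentage CPU", 80.0, ops)

rule.evaluate(65.0)  # below threshold: no actions run
rule.evaluate(91.5)  # above threshold: both actions run
print(sent)
```

Keeping actions in a separate group lets several rules share one notification setup.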

4. Insights

Pre-built monitoring dashboards for common services:
- VM Insights → CPU, memory, disk usage.
- Container Insights → AKS clusters.
- Application Insights → web apps and APIs.

Designing a Monitoring Strategy

Plan What to Monitor

Infrastructure: VM health, network latency, disk performance.
Applications: API response times, request failures, exceptions.
Security: suspicious sign-ins, firewall changes, DDoS attempts.
Business Metrics: transaction volume, checkout success rate.

Choose the Right Data Source

Use Metrics for near real-time monitoring and autoscaling.
Use Logs for deep analysis and compliance.

Alerting Design

Avoid alert fatigue (too many alerts = ignored alerts).
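Use severity levels: Azure Monitor defines Sev 0 (Critical), Sev 1 (Error), Sev 2 (Warning), Sev 3 (Informational), and Sev 4 (Verbose).
Route alerts to the right teams (Ops, Security, Dev) through separate action groups.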
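The three alerting guidelines can be combined in one small routing function. This sketch assumes each alert carries a numeric severity and a category. It routes alerts by category, pages someone only for high-severity alerts, and suppresses repeats of the same alert inside a cooldown window to limit fatigue. The team mapping, the paging cutoff, and the 15-minute cooldown are hypothetical choices, not Azure defaults.

```python
from dataclasses import dataclass

# Hypothetical mapping from alert category to owning team.
TEAM_BY_CATEGORY = {"infra": "Ops", "security": "Security", "app": "Dev"}
PAGE_MAX_SEVERITY = 1       # Sev 0 and Sev 1 page someone; the rest become tickets
COOLDOWN_SECONDS = 15 * 60  # suppress repeats of the same alert for 15 minutes

@dataclass(frozen=True)
class Alert:
    name: str
    category: str
    severity: int   # 0 = Critical ... 4 = Verbose (Sev 0-4 scale)
    timestamp: int  # epoch seconds

class AlertRouter:
    def __init__(self):
        self._last_sent = {}  # (name, category) -> timestamp of last delivered alert

    def route(self, alert):
        """Return (team, channel) for a delivered alert, or None if suppressed."""
        key = (alert.name, alert.category)
        last = self._last_sent.get(key)
        if last is not None and alert.timestamp - last < COOLDOWN_SECONDS:
            return None  # duplicate inside the cooldown window: drop it
        self._last_sent[key] = alert.timestamp
        team = TEAM_BY_CATEGORY.get(alert.category, "Ops")  # default owner
        channel = "page" if alert.severity <= PAGE_MAX_SEVERITY else "ticket"
        return team, channel

router = AlertRouter()
print(router.route(Alert("disk-full", "infra", 1, 0)))        # ('Ops', 'page')
print(router.route(Alert("disk-full", "infra", 1, 300)))      # None (within cooldown)
print(router.route(Alert("odd-signin", "security", 3, 400)))  # ('Security', 'ticket')
```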

Integration

Integrate with ITSM (e.g., ServiceNow).
Automate remediation via Logic Apps or Azure Automation runbooks.
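Automated remediation usually means mapping a fired alert to a known fix and escalating everything else. The sketch below is a hypothetical dispatcher of the kind a Logic App or runbook might implement. The alert names, remediation functions, and ticket stub are placeholders, not real Azure or ServiceNow calls.

```python
from typing import Callable, Dict

def restart_service(resource: str) -> str:
    # Placeholder: a real runbook would call a management API here.
    return f"restarted service on {resource}"

def clear_temp_files(resource: str) -> str:
    # Placeholder for a disk cleanup step.
    return f"cleared temp files on {resource}"

def open_ticket(resource: str, alert_name: str) -> str:
    # Placeholder for an ITSM integration (for example, creating an incident).
    return f"ticket opened for {alert_name} on {resource}"

# Hypothetical mapping from alert rule name to an automated fix.
REMEDIATIONS: Dict[str, Callable[[str], str]] = {
    "app-unresponsive": restart_service,
    "disk-nearly-full": clear_temp_files,
}

def handle_alert(alert_name: str, resource: str) -> str:
    """Run the known fix for this alert, or escalate to a human via a ticket."""
    fix = REMEDIATIONS.get(alert_name)
    if fix is None:
        return open_ticket(resource, alert_name)
    return fix(resource)

print(handle_alert("disk-nearly-full", "vm-web-01"))
print(handle_alert("cert-expiring", "vm-web-01"))
```

Falling back to a ticket means that an unrecognized alert still reaches a person rather than being dropped.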

Example Enterprise Scenario

A SaaS provider wants to:

Scale web servers when CPU > 70%.
Alert the security team if more than 5 failed sign-ins come from the same IP within 10 minutes.
Analyze API latency trends over time.

Correct design:

Use Metrics for CPU % and autoscale rules.
Use Log Analytics to query sign-in logs with KQL, plus a log search alert rule to notify the security team.
Use Application Insights for API response time monitoring.
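The sign-in requirement is essentially a per-IP sliding-window count. The Python sketch below applies that logic to (epoch_seconds, ip) pairs of failed sign-ins. In Azure, the equivalent check would run as a KQL query behind a log search alert rule, which re-evaluates on a schedule rather than on each event. The input shape and the function name are illustrative assumptions. The thresholds (more than 5 failures, 10 minutes) come from the scenario.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 10 * 60
MAX_FAILURES = 5  # flag an IP when failures inside the window exceed this

def suspicious_ips(failed_signins):
    """Return IPs with more than MAX_FAILURES failed sign-ins within any
    WINDOW_SECONDS span. `failed_signins` is an iterable of
    (epoch_seconds, ip) pairs sorted by time."""
    recent = defaultdict(deque)  # ip -> timestamps of failures still in the window
    flagged = set()
    for ts, ip in failed_signins:
        window = recent[ip]
        window.append(ts)
        # Drop failures that are 10 minutes or older relative to this event.
        while ts - window[0] >= WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_FAILURES:
            flagged.add(ip)
    return flagged

events = [(i * 60, "203.0.113.7") for i in range(6)]     # 6 failures in 5 minutes
events += [(i * 180, "198.51.100.4") for i in range(6)]  # 6 failures over 15 minutes
events.sort()
print(suspicious_ips(events))  # {'203.0.113.7'}
```

Only the first IP is flagged. The second IP has the same total number of failures, but no 10-minute span contains more than 5 of them.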

Confusion Buster

Metrics vs Logs
- Metrics = near real-time, numeric, fast decisions.
- Logs = detailed, structured event records, deep analysis.
Azure Monitor vs Application Insights
- Monitor = umbrella service.
- App Insights = feature of Azure Monitor specialized for apps and APIs (APM).
Alerts vs Automation
- Alerts = notify/trigger.
- Automation = fix/act.

Exam Tips

“Real-time CPU utilization tracking” → Metrics.
“Query login failures with KQL” → Log Analytics.
“Which tool monitors app performance (APM)?” → Application Insights.
“Company wants to auto-scale VMs based on performance” → Azure Monitor metrics + autoscale rules (on a virtual machine scale set).

What to Expect in the Exam

Direct Q: “Which Azure service collects and analyzes telemetry data from resources?” → Azure Monitor.
Scenario Q: “Company wants proactive alerts for suspicious sign-ins.” → Logs (Log Analytics) + log search alerts.
Trick Q: “Metrics provide detailed log data.” → False.