Observability is critical for understanding complex microservices systems. This guide covers the three pillars: metrics, logs, and traces.
The Three Pillars
Metrics
Numerical measurements over time:
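```python
from prometheus_client import Counter, Histogram

# Counter: a value that only ever increases (total requests served).
request_count = Counter('http_requests_total', 'Total requests')
# Histogram: observations sorted into buckets (request latency in seconds).
request_duration = Histogram('http_request_duration_seconds', 'Request duration')

@request_duration.time()  # records how long each call takes
def handle_request():
    request_count.inc()
    # Handle request
```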
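To make the histogram less of a black box, here is a minimal, library-free sketch of what a Prometheus-style histogram keeps track of: cumulative bucket counts keyed by upper bound, plus a running sum and count. The class name `SimpleHistogram` and the bucket boundaries are illustrative assumptions and are not part of `prometheus_client`.

```python
import bisect
import math


class SimpleHistogram:
    """Illustrative stand-in for a Prometheus-style histogram (hypothetical class)."""

    def __init__(self, upper_bounds=(0.1, 0.5, 1.0)):
        # Sorted bucket upper bounds; +Inf catches every observation.
        self.upper_bounds = sorted(upper_bounds) + [math.inf]
        self._per_bucket = [0] * len(self.upper_bounds)
        self.total = 0.0
        self.count = 0

    def observe(self, value):
        # Find the first bucket whose upper bound is >= value.
        index = bisect.bisect_left(self.upper_bounds, value)
        self._per_bucket[index] += 1
        self.total += value
        self.count += 1

    def cumulative_buckets(self):
        # Buckets are exposed cumulatively: each one counts every
        # observation less than or equal to its upper bound.
        running = 0
        result = {}
        for bound, n in zip(self.upper_bounds, self._per_bucket):
            running += n
            result[bound] = running
        return result


hist = SimpleHistogram()
for duration in (0.05, 0.3, 0.3, 2.0):
    hist.observe(duration)
print(hist.cumulative_buckets())
```

Because the buckets are cumulative, the last (`+Inf`) bucket always equals the total observation count, which is what lets a query engine estimate quantiles from bucket counts alone.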
Logs
Discrete events:
```json
{
  "timestamp": "2024-03-09T10:00:00Z",
  "level": "ERROR",
  "service": "user-service",
  "traceId": "abc123",
  "message": "Database connection failed"
}
```
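One way to emit entries in this shape is a custom formatter on top of Python's standard `logging` module. The sketch below is a minimal illustration, not a production logger: the `JsonFormatter` class and `build_logger` helper are hypothetical names, and it assumes the trace ID is passed per call through `extra`.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (hypothetical helper)."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": timestamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "service": self.service,
            # traceId is attached per call through `extra`; None if absent.
            "traceId": getattr(record, "traceId", None),
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def build_logger(service):
    """Create a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


logger = build_logger("user-service")
logger.error("Database connection failed", extra={"traceId": "abc123"})
```

Keeping every entry as a single JSON object with fixed field names is what lets a log pipeline filter by `level`, `service`, or `traceId` without fragile text parsing.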
Traces
Request flow across services:
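```javascript
const { trace } = require('@opentelemetry/api');

const tracer = trace.getTracer('my-service');

// processOrder is the application's own handler, assumed to be defined elsewhere.
async function tracedProcessOrder(orderId) {
  const span = tracer.startSpan('processOrder');
  span.setAttribute('order.id', orderId);
  try {
    await processOrder(orderId);
  } finally {
    // Always end the span, even if processOrder throws.
    span.end();
  }
}
```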
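A trace only spans several services if each call carries the trace identity along with it. The sketch below illustrates that idea with the W3C Trace Context `traceparent` header (version `00`), using only the Python standard library. The function names are hypothetical, and in practice an OpenTelemetry propagator would normally handle this rather than hand-written code.

```python
import re
import secrets

# version "00" - 32 hex chars trace ID - 16 hex chars span ID - 2 hex chars flags
TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_traceparent(sampled=True):
    """Start a new trace: random 16-byte trace ID and 8-byte span ID."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(header):
    """Return (trace_id, parent_span_id, flags), or None if malformed."""
    match = TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid per the specification.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return trace_id, span_id, flags


def child_traceparent(header):
    """Header for an outgoing call: same trace ID, fresh span ID."""
    parsed = parse_traceparent(header)
    if parsed is None:
        # No usable incoming context: start a new trace instead.
        return new_traceparent()
    trace_id, _, flags = parsed
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


incoming = new_traceparent()
outgoing = child_traceparent(incoming)
print(incoming)
print(outgoing)
```

The two printed headers share the same trace ID but carry different span IDs, which is how a tracing backend stitches spans from separate services into one request flow.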
OpenTelemetry
Unified observability framework:
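```python
from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

tracer = trace.get_tracer(__name__)

app = FastAPI()
# Auto-instrument the app so each incoming request gets a span.
FastAPIInstrumentor.instrument_app(app)

@app.get("/users/{user_id}")
async def get_user(user_id: int):
    # Manual child span around the application-level lookup.
    with tracer.start_as_current_span("get_user"):
        # fetch_user is the application's own data access function, defined elsewhere.
        return await fetch_user(user_id)
```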
Monitoring Stack
Prometheus + Grafana
```yaml
# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
```
ELK Stack
```yaml
# 8.x is a placeholder: replace it with a concrete 8.x release tag.
services:
  elasticsearch:
    image: elasticsearch:8.x
  logstash:
    image: logstash:8.x
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
  kibana:
    image: kibana:8.x
    ports:
      - "5601:5601"
```
Jaeger for Tracing
```shell
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 14268:14268 \
  jaegertracing/all-in-one:latest
```
Best Practices
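- Use structured logging so logs can be parsed and queried
- Propagate trace context consistently across services
- Choose meaningful metric names
- Alert on symptoms, not causes
- Build a dashboard for each service
- Correlate signals across the three pillars
- Define retention policies for each signal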
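Several of these practices reinforce each other: once logs are structured and carry a trace ID, correlating them with a trace becomes a simple lookup. The following is a minimal sketch of that idea, assuming JSON log lines with a `traceId` field as in the log example earlier. The function name and the sample lines are hypothetical.

```python
import json
from collections import defaultdict


def index_logs_by_trace(lines):
    """Group JSON log lines by their traceId field."""
    by_trace = defaultdict(list)
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip unstructured lines
        if not isinstance(entry, dict):
            continue  # valid JSON, but not a log object
        trace_id = entry.get("traceId")
        if trace_id:
            by_trace[trace_id].append(entry)
    return dict(by_trace)


# Hypothetical sample data from two services handling the same request.
sample_lines = [
    '{"level": "INFO", "service": "api-gateway", "traceId": "abc123", "message": "Request received"}',
    '{"level": "ERROR", "service": "user-service", "traceId": "abc123", "message": "Database connection failed"}',
    '{"level": "INFO", "service": "user-service", "traceId": "def456", "message": "Health check ok"}',
    'plain text line without structure',
]

by_trace = index_logs_by_trace(sample_lines)
for entry in by_trace["abc123"]:
    print(entry["service"], entry["level"], entry["message"])
```

In a real deployment this lookup would usually be a query in the log backend rather than application code, but the prerequisite is the same: every service must log the same trace ID for the same request.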
Conclusion
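Effective observability requires integrating metrics, logs, and traces rather than treating them as separate systems. OpenTelemetry provides a common instrumentation layer, while tools such as Prometheus, Grafana, the ELK Stack, and Jaeger handle storage and visualization.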