Observability in Microservices: Metrics, Logs, and Traces

Observability is essential for understanding how a distributed microservices system behaves in production. This guide covers its three pillars: metrics, logs, and traces.

The Three Pillars

Metrics

Metrics are numerical measurements aggregated over time, such as request counts and latency distributions:

from prometheus_client import Counter, Histogram

request_count = Counter('http_requests_total', 'Total requests')
request_duration = Histogram('http_request_duration_seconds', 'Request duration')

@request_duration.time()
def handle_request():
    request_count.inc()
    # Handle request
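
To make the histogram above less opaque, the following is a minimal, stdlib-only sketch of how cumulative bucket counting works in a Prometheus-style histogram. The `SimpleHistogram` class and its bucket bounds are illustrative assumptions, not part of `prometheus_client`.

```python
import bisect
import math


class SimpleHistogram:
    """Illustrative Prometheus-style histogram (hypothetical, not the prometheus_client API)."""

    def __init__(self, bounds=(0.1, 0.5, 1.0)):
        # Upper bound of each bucket, plus +Inf to catch everything else.
        self.bounds = sorted(bounds) + [math.inf]
        self.counts = [0] * len(self.bounds)
        self.total = 0.0  # running sum of all observed values

    def observe(self, value):
        # Find the first bucket whose upper bound is >= value.
        index = bisect.bisect_left(self.bounds, value)
        self.counts[index] += 1
        self.total += value

    def cumulative(self):
        # Buckets are exposed cumulatively: each bucket counts every
        # observation less than or equal to its upper bound.
        running = 0
        result = {}
        for bound, count in zip(self.bounds, self.counts):
            running += count
            result[bound] = running
        return result


hist = SimpleHistogram()
for seconds in (0.05, 0.3, 0.3, 2.0):
    hist.observe(seconds)

print(hist.cumulative())
```

With these four observations, the cumulative counts for the 0.1, 0.5, 1.0, and +Inf buckets are 1, 3, 3, and 4. Storing counts per bucket rather than raw samples is what keeps histograms cheap enough to record on every request.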

Logs

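Logs are timestamped records of discrete events. Emitting them as structured JSON with a trace ID makes them searchable and linkable to traces: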

{
  "timestamp": "2024-03-09T10:00:00Z",
  "level": "ERROR",
  "service": "user-service",
  "traceId": "abc123",
  "message": "Database connection failed"
}
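
One way to produce entries shaped like the one above is a custom formatter for Python's standard `logging` module. This is a sketch under assumptions: the `JsonFormatter` class is a hypothetical helper, and the trace ID is passed in by hand through `extra` rather than read from a tracing library.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (illustrative helper)."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "service": self.service,
            # traceId arrives via `extra`; it is None outside a traced request.
            "traceId": getattr(record, "traceId", None),
            "message": record.getMessage(),
        }
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter(service="user-service"))

logger = logging.getLogger("user-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.error("Database connection failed", extra={"traceId": "abc123"})
```

Keeping the field names identical across services is what lets a log backend filter on `traceId` or `service` without per-service parsing rules.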

Traces

Traces record the path of a single request as it flows across services, represented as a tree of timed spans:

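const { trace, SpanStatusCode } = require('@opentelemetry/api');

const tracer = trace.getTracer('my-service');

// processOrder is assumed to be defined elsewhere in the service
async function handleOrder(orderId) {
    const span = tracer.startSpan('processOrder');
    span.setAttribute('order.id', orderId);

    try {
        await processOrder(orderId);
    } catch (err) {
        // Record the failure on the span before rethrowing
        span.recordException(err);
        span.setStatus({ code: SpanStatusCode.ERROR });
        throw err;
    } finally {
        // Always end the span so its duration is recorded
        span.end();
    }
}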
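
For a trace to span several services, each hop has to pass its trace context to the next one, typically in the W3C `traceparent` HTTP header. The Python sketch below shows only the header mechanics, using the standard library. The function names are hypothetical, and in practice OpenTelemetry's propagators handle this for you.

```python
import re
import secrets

# W3C Trace Context header layout: version-traceid-parentid-flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent():
    """Start a new trace: random 16-byte trace ID and 8-byte span ID."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-01"  # flags 01 = sampled


def child_traceparent(incoming):
    """Keep the caller's trace ID, but mint a new span ID for this hop."""
    match = TRACEPARENT_RE.match(incoming)
    if match is None:
        # Malformed or missing header: start a fresh trace instead.
        return new_traceparent()
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


# Service A starts the trace; service B continues it.
header_a = new_traceparent()
header_b = child_traceparent(header_a)

print(header_a)
print(header_b)
```

Both headers share the same trace ID, which is what allows a tracing backend to stitch spans from different services into one request timeline.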

OpenTelemetry

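OpenTelemetry is a vendor-neutral framework that provides a single set of APIs and SDKs for traces, metrics, and logs. In Python, it can instrument a FastAPI application automatically: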

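from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

tracer = trace.get_tracer(__name__)
app = FastAPI()

# Automatically create a span for every incoming HTTP request
FastAPIInstrumentor.instrument_app(app)

@app.get("/users/{user_id}")
async def get_user(user_id: int):
    # Add a manual child span around the lookup
    with tracer.start_as_current_span("get_user"):
        # fetch_user is assumed to be defined elsewhere in the application
        return await fetch_user(user_id)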

Monitoring Stack

Prometheus + Grafana

# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"

ELK Stack

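# docker-compose.yml
# "8.x" is a placeholder: pin all three images to the same concrete 8.x release
services:
  elasticsearch:
    image: elasticsearch:8.x
    environment:
      - discovery.type=single-node  # single-node cluster for local use

  logstash:
    image: logstash:8.x
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf

  kibana:
    image: kibana:8.x
    ports:
      - "5601:5601"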

Jaeger for Tracing

# 16686: Jaeger UI; 14268: collector HTTP endpoint for Jaeger-format spans
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 14268:14268 \
  jaegertracing/all-in-one:latest

Best Practices

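Structured logging: emit logs as JSON with consistent field names so they can be parsed and queried.
Consistent trace context: propagate the same trace ID through every service and include it in logs.
Meaningful metric names: follow one naming convention with units in the name, as in http_request_duration_seconds.
Alert on symptoms, not causes: page on user-visible problems such as error rate and latency rather than on internal signals such as CPU usage.
Dashboard for each service: show request rate, error rate, and latency for every service.
Correlate across pillars: link metrics, logs, and traces through shared identifiers such as trace IDs.
Retention policies: decide how long each kind of telemetry is kept, balancing storage cost against debugging needs.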
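
As an illustration of alerting on symptoms, the sketch below decides whether to alert based on the ratio of failed requests in an evaluation window, which is something users actually experience. The `WindowStats` shape and the 5% threshold are assumptions chosen for the example. A real setup would usually express this as an alerting rule in the monitoring system rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Request counts observed in one evaluation window (hypothetical shape)."""
    total_requests: int
    failed_requests: int


def error_ratio(stats):
    # Avoid dividing by zero when the service received no traffic.
    if stats.total_requests == 0:
        return 0.0
    return stats.failed_requests / stats.total_requests


def should_alert(stats, threshold=0.05):
    """Fire when the user-visible error ratio exceeds the threshold."""
    return error_ratio(stats) > threshold


healthy = WindowStats(total_requests=1000, failed_requests=10)
degraded = WindowStats(total_requests=1000, failed_requests=80)

print(should_alert(healthy))
print(should_alert(degraded))
```

The healthy window (1% errors) stays quiet, while the degraded one (8% errors) triggers the alert, regardless of which underlying cause produced the failures.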

Conclusion

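Effective observability requires integrating metrics, logs, and traces so that a signal in one can be followed into the others. OpenTelemetry provides a common instrumentation layer, while tools such as Prometheus, Grafana, the ELK Stack, and Jaeger handle storage, querying, and visualization.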