Observability is critical for understanding complex microservice systems. This guide covers the three pillars: metrics, logs, and traces.

The Three Pillars

Metrics

Numerical measurements over time:

from prometheus_client import Counter, Histogram

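# Counter: a cumulative value that only ever increases
request_count = Counter('http_requests_total', 'Total requests')
# Histogram: observations counted into buckets, here request latency in seconds
request_duration = Histogram('http_request_duration_seconds', 'Request duration')

@request_duration.time()  # records how long each call takes
def handle_request():
    request_count.inc()  # one more request served
    # Handle request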
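
To make the histogram less abstract, here is a minimal stdlib-only sketch of how cumulative buckets work and what the scraped text looks like. It is not how prometheus_client is implemented; the class name TinyHistogram and the bucket bounds are assumptions chosen for illustration.

```python
import bisect


class TinyHistogram:
    """Illustrative cumulative-bucket histogram (hypothetical, not prometheus_client)."""

    def __init__(self, name, bounds):
        self.name = name
        self.bounds = sorted(bounds)                 # bucket upper bounds, in seconds
        self.counts = [0] * (len(self.bounds) + 1)   # last slot is the +Inf bucket
        self.total = 0.0

    def observe(self, value):
        # Index of the first bucket whose upper bound is >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value

    def render(self):
        # Text in the style of the Prometheus exposition format.
        lines = [f"# TYPE {self.name} histogram"]
        running = 0
        labels = [str(b) for b in self.bounds] + ["+Inf"]
        for le, count in zip(labels, self.counts):
            running += count  # cumulative: each bucket includes all smaller ones
            lines.append(f'{self.name}_bucket{{le="{le}"}} {running}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {running}")
        return "\n".join(lines)


hist = TinyHistogram("http_request_duration_seconds", [0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.3, 2.0):
    hist.observe(seconds)
print(hist.render())
```

Because the buckets are cumulative, the `+Inf` bucket always equals the total observation count, which is what lets a query engine estimate percentiles from bucket counts alone.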

Logs

Timestamped records of discrete events, ideally structured so they can be queried by field:

{
  "timestamp": "2024-03-09T10:00:00Z",
  "level": "ERROR",
  "service": "user-service",
  "traceId": "abc123",
  "message": "Database connection failed"
}
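
A record shaped like the one above can be produced with Python's standard logging module. The following is a rough sketch, assuming the trace ID is passed in by the caller through `extra=`; the JsonFormatter class is a hypothetical helper, not part of any library.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical helper)."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "service": self.service,
            # traceId arrives via `extra=`; it is None when the caller omits it.
            "traceId": getattr(record, "traceId", None),
            "message": record.getMessage(),
        }
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter(service="user-service"))
logger = logging.getLogger("user-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.error("Database connection failed", extra={"traceId": "abc123"})
```

Keeping one JSON object per line means a log shipper can parse every entry without multi-line handling, and the traceId field gives a direct join key to the matching trace.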

Traces

Request flow across services:

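const { trace, SpanStatusCode } = require('@opentelemetry/api');

const tracer = trace.getTracer('my-service');

// processOrder is the application's own function
async function tracedProcessOrder(orderId) {
    const span = tracer.startSpan('processOrder');
    span.setAttribute('order.id', orderId);

    try {
        await processOrder(orderId);
    } catch (err) {
        // Mark the span as failed so the error shows up in the trace
        span.recordException(err);
        span.setStatus({ code: SpanStatusCode.ERROR });
        throw err;
    } finally {
        span.end();  // always close the span, even on failure
    }
}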
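
For a trace to follow a request across services, each outgoing call has to carry the trace context. Below is a minimal sketch of that idea using the W3C `traceparent` header layout (version, trace ID, parent span ID, flags). The function names are hypothetical, and the sketch skips some validation rules from the specification, such as rejecting all-zero IDs. In practice an OpenTelemetry SDK's propagators handle this for you.

```python
import re
import secrets

# version (2 hex) - trace-id (32 hex) - parent span id (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def new_traceparent():
    """Start a new trace: random trace ID and span ID, sampled flag set."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(incoming):
    """Keep the caller's trace ID but mint a new span ID for the outgoing call."""
    match = TRACEPARENT_RE.match(incoming)
    if match is None:
        return new_traceparent()  # malformed header: start a fresh trace
    version, trace_id, _parent_span_id, flags = match.groups()
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = child_traceparent(incoming)
print(outgoing)
```

The trace ID stays constant for the whole request while every hop gets its own span ID, which is what lets a tracing backend reassemble the call tree.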

OpenTelemetry

A vendor-neutral framework that provides a single set of APIs and SDKs for traces, metrics, and logs:

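from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

tracer = trace.get_tracer(__name__)
app = FastAPI()

# Auto-instrument the app: a span is created for every incoming HTTP request
FastAPIInstrumentor.instrument_app(app)

@app.get("/users/{user_id}")
async def get_user(user_id: int):
    # Manual child span around the lookup; fetch_user is the application's own coroutine
    with tracer.start_as_current_span("get_user"):
        return await fetch_user(user_id)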

Monitoring Stack

Prometheus + Grafana

# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"

ELK Stack

# docker-compose.yml
# Note: 8.x is a placeholder; pin a concrete 8.x release tag for each image
services:
  elasticsearch:
    image: elasticsearch:8.x
  
  logstash:
    image: logstash:8.x
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
  
  kibana:
    image: kibana:8.x
    ports:
      - "5601:5601"

Jaeger for Tracing

docker run -d --name jaeger \
  -p 16686:16686 \
  -p 14268:14268 \
  jaegertracing/all-in-one:latest

Best Practices

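  1. Use structured logging so logs can be filtered by field
  2. Propagate a consistent trace context across every service
  3. Choose meaningful metric names
  4. Alert on symptoms, not causes
  5. Build a dashboard for each service
  6. Correlate across pillars, for example by logging the trace ID
  7. Set retention policies for each type of telemetry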
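
As one illustration of alerting on symptoms, the sketch below derives a user-facing error ratio from two readings of cumulative counters and compares it to a threshold. The CounterSample structure, the function names, and the 5% threshold are all assumptions for the example; a real setup would normally express this as an alerting rule in the monitoring system rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Cumulative counter values read at one scrape (hypothetical structure)."""
    total_requests: int
    failed_requests: int


def error_ratio(earlier, later):
    """Fraction of requests that failed between two scrapes."""
    requests = later.total_requests - earlier.total_requests
    failures = later.failed_requests - earlier.failed_requests
    if requests <= 0:
        return 0.0  # no traffic, or a counter reset: nothing to judge
    return failures / requests


def should_alert(earlier, later, threshold=0.05):
    """Alert on the symptom (users seeing errors), whatever the cause."""
    return error_ratio(earlier, later) > threshold


before = CounterSample(total_requests=10_000, failed_requests=20)
after = CounterSample(total_requests=11_000, failed_requests=120)
print(error_ratio(before, after))   # 100 failures out of 1000 requests
print(should_alert(before, after))
```

A rule like this fires whether the failures come from a database outage, a bad deploy, or a full disk, so one alert covers causes nobody thought to anticipate.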

Conclusion

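Effective observability requires integrating metrics, logs, and traces rather than treating each in isolation. OpenTelemetry provides a common instrumentation layer, while tools such as Prometheus, Grafana, the ELK stack, and Jaeger handle storage and visualization.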