OpenTelemetry Reference: SDK Setup, Collector Config, Auto-Instrumentation & Debugging
OpenTelemetry (OTel) is the CNCF standard for application observability — unified tracing, metrics, and logs via a single SDK and Collector. This reference covers the patterns you need to actually instrument production services.
1. Signals: What OTel Collects
Traces, Metrics, Logs — the three pillars
| Signal | What it captures | When to use | OTel status |
|---|---|---|---|
| Traces | Request flow across services, latency per span, errors | Distributed latency debugging, service dependency mapping | Stable |
| Metrics | Aggregated numbers (counters, gauges, histograms) | Dashboards, alerting, SLOs | Stable |
| Logs | Structured events with trace correlation | Error details, audit trails linked to spans | Stable spec; SDK support varies by language |
# Core concepts:
# Span: a single unit of work (HTTP request, DB query) — has name, start/end time, attributes, events
# Trace: tree of spans across services — tied together by trace_id
# SpanContext: trace_id + span_id + trace_flags (sampled bit)
# Baggage: key/value pairs propagated across service boundaries (use sparingly — propagates to all downstream services)

# Attribute naming conventions (semantic conventions):
http.method              # HTTP verb
http.status_code         # response code
db.system                # "postgresql", "redis"
db.statement             # SQL query (be careful with PII)
rpc.service              # gRPC service name
service.name             # your service (set via OTEL_SERVICE_NAME)
service.version          # your version (set via OTEL_SERVICE_VERSION)
deployment.environment   # "production", "staging"
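Baggage travels between services as the W3C `baggage` header: comma-separated `key=value` pairs with percent-encoded values. A minimal stdlib sketch of the wire format, illustrative only, not the SDK's propagator:

```python
from urllib.parse import quote, unquote

def encode_baggage(items: dict) -> str:
    # W3C baggage header: comma-separated key=value, values percent-encoded
    return ",".join(f"{k}={quote(str(v), safe='')}" for k, v in items.items())

def decode_baggage(header: str) -> dict:
    out = {}
    for member in header.split(","):
        key, _, value = member.strip().partition("=")
        if key:
            out[key] = unquote(value)
    return out
```

In real services, use the SDK's baggage propagator rather than hand-rolling this; the sketch only shows why large baggage is costly (every entry rides on every outbound request).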
2. SDK: Go
Go SDK setup — traces + metrics with OTLP export
# Install:
go get go.opentelemetry.io/otel@latest
go get go.opentelemetry.io/otel/sdk@latest
go get go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc@latest
go get go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc@latest
# Tracer provider setup (main.go):
import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)
func initTracer(ctx context.Context) (func(), error) {
	res := resource.NewWithAttributes(
		semconv.SchemaURL,
		semconv.ServiceName("my-service"),
		semconv.ServiceVersion("1.2.0"),
		semconv.DeploymentEnvironment("production"),
	)
	exporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("otel-collector:4317"),
		otlptracegrpc.WithInsecure(), // use TLS in prod
	)
	if err != nil {
		return nil, err
	}
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(res),
		sdktrace.WithSampler(sdktrace.ParentBased(
			sdktrace.TraceIDRatioBased(0.1), // sample 10% in prod
		)),
	)
	otel.SetTracerProvider(tp)
	return func() { _ = tp.Shutdown(ctx) }, nil
}
// Create spans (also import go.opentelemetry.io/otel/attribute,
// go.opentelemetry.io/otel/trace, and go.opentelemetry.io/otel/codes):
var tracer = otel.Tracer("my-package")

func processOrder(ctx context.Context, orderID string) error {
	ctx, span := tracer.Start(ctx, "processOrder",
		trace.WithAttributes(
			attribute.String("order.id", orderID),
			attribute.String("order.currency", "GBP"),
		),
	)
	defer span.End()
	// Record events and errors:
	span.AddEvent("inventory.checked")
	if err := chargeCard(ctx, orderID); err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		return err
	}
	return nil
}
3. SDK: Python
Python SDK — traces + auto-instrumentation for FastAPI / Flask / Django
pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc
pip install opentelemetry-instrumentation-fastapi # or flask, django, psycopg2, redis, requests
# Manual setup:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource, SERVICE_NAME, SERVICE_VERSION
resource = Resource(attributes={
    SERVICE_NAME: "my-service",
    SERVICE_VERSION: "1.0.0",
    "deployment.environment": "production",
})
provider = TracerProvider(resource=resource)
exporter = OTLPSpanExporter(endpoint="http://otel-collector:4317", insecure=True)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)
def process_order(order_id: str):
    with tracer.start_as_current_span(
        "process_order", attributes={"order.id": order_id}
    ) as span:
        try:
            result = charge_card(order_id)
            span.set_attribute("payment.status", result.status)
        except Exception as e:
            span.record_exception(e)
            span.set_status(trace.StatusCode.ERROR, str(e))
            raise
# FastAPI auto-instrumentation:
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
from opentelemetry.instrumentation.redis import RedisInstrumentor
FastAPIInstrumentor.instrument_app(app) # adds span per request automatically
SQLAlchemyInstrumentor().instrument(engine=db_engine) # records db.statement on query spans
RedisInstrumentor().instrument()
4. SDK: Java
Java — zero-code agent vs SDK instrumentation
# Zero-code (OpenTelemetry Java Agent — recommended for Spring, Quarkus, Micronaut):
# Download: https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases
java -javaagent:opentelemetry-javaagent.jar \
-Dotel.service.name=my-service \
-Dotel.exporter.otlp.endpoint=http://otel-collector:4317 \
-Dotel.traces.sampler=parentbased_traceidratio \
-Dotel.traces.sampler.arg=0.1 \
-jar app.jar
# The agent auto-instruments: Spring MVC, JDBC, Kafka, gRPC, Redis, HTTP clients, etc.
# ~100 libraries instrumented with zero code changes.
# Manual SDK for custom spans:
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;

Tracer tracer = GlobalOpenTelemetry.getTracer("my-service", "1.0.0");
Span span = tracer.spanBuilder("processOrder")
    .setAttribute("order.id", orderId)
    .setAttribute("order.currency", "GBP")
    .startSpan();
try (var scope = span.makeCurrent()) {
    processLogic();
} catch (Exception e) {
    span.recordException(e);
    span.setStatus(StatusCode.ERROR, e.getMessage());
    throw e;
} finally {
    span.end();
}
5. SDK: Node.js
Node.js — SDK setup and auto-instrumentation for Express / HTTP
npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node \
@opentelemetry/exporter-trace-otlp-grpc @opentelemetry/resources \
@opentelemetry/semantic-conventions
# instrumentation.js (loaded BEFORE app code via --require):
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');
const sdk = new NodeSDK({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'my-service',
    [SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
    'deployment.environment': process.env.NODE_ENV,
  }),
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4317',
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-fs': { enabled: false }, // too noisy
    }),
  ],
});
sdk.start();
process.on('SIGTERM', () => sdk.shutdown());
# Start app with:
node --require ./instrumentation.js app.js
# Manual spans:
const { trace, SpanStatusCode } = require('@opentelemetry/api');
const tracer = trace.getTracer('my-service', '1.0.0');
async function processOrder(orderId) {
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      span.setAttribute('order.id', orderId);
      const result = await chargeCard(orderId);
      span.setAttribute('payment.status', result.status);
      return result;
    } catch (err) {
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
      throw err;
    } finally {
      span.end();
    }
  });
}
6. OpenTelemetry Collector
Collector config — receivers, processors, exporters, pipelines
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317 # gRPC (preferred — binary, lower overhead)
      http:
        endpoint: 0.0.0.0:4318 # HTTP/JSON (for browsers, languages without gRPC)
  prometheus:
    config:
      scrape_configs:
        - job_name: my-service
          static_configs:
            - targets: ['localhost:8080']

processors:
  batch: # always add — buffers and compresses
    timeout: 5s
    send_batch_size: 1000
  memory_limiter: # prevents OOM under load
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  resource:
    attributes:
      - action: insert # add deployment.environment to all telemetry
        key: deployment.environment
        value: production
  filter/drop_health: # drop health check spans (reduce noise)
    spans:
      exclude:
        match_type: regexp
        attributes:
          - key: http.target
            value: "^/(health|readyz|livez)"

exporters:
  otlp/jaeger: # send traces to Jaeger / Tempo
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheusremotewrite: # send metrics to Prometheus / Mimir
    endpoint: http://prometheus:9090/api/v1/write
  loki: # send logs to Loki
    endpoint: http://loki:3100/loki/api/v1/push
  debug: # log to stdout for debugging
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, filter/drop_health, resource]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, batch, resource]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loki]
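A common misconfiguration: a pipeline references a component name that was never defined in the top-level receivers/processors/exporters sections, and the Collector refuses to start. A small sketch of that cross-check in Python, assuming the YAML has already been parsed into nested dicts (the abbreviated config literal below mirrors the file above; the undefined 'debug' exporter is deliberate):

```python
def check_pipelines(config: dict) -> list:
    """Return components referenced by pipelines but never defined."""
    missing = []
    for name, pipeline in config.get("service", {}).get("pipelines", {}).items():
        for section in ("receivers", "processors", "exporters"):
            defined = config.get(section, {})
            for component in pipeline.get(section, []):
                if component not in defined:
                    # section[:-1] drops the plural 's' for the message
                    missing.append(f"{name}: {section[:-1]} '{component}' not defined")
    return missing

config = {
    "receivers": {"otlp": {}},
    "processors": {"batch": {}, "memory_limiter": {}},
    "exporters": {"otlp/jaeger": {}},
    "service": {"pipelines": {"traces": {
        "receivers": ["otlp"],
        "processors": ["memory_limiter", "batch"],
        "exporters": ["otlp/jaeger", "debug"],  # 'debug' deliberately undefined
    }}},
}
```

In practice `otelcol validate` does this (and much more) for you; the sketch just shows the shape of the wiring the pipelines section expects.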
7. Auto-Instrumentation
Kubernetes operator zero-code injection
# OpenTelemetry Operator — injects agents automatically via annotations
# Install:
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
# Annotate deployments for automatic instrumentation:
kubectl annotate deployment my-app \
instrumentation.opentelemetry.io/inject-python="true" # Python
# or: inject-java, inject-nodejs, inject-dotnet, inject-go (requires eBPF)
# Create Instrumentation CR to configure the agent:
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  exporter:
    endpoint: http://otel-collector:4317
  env:
    - name: OTEL_EXPORTER_OTLP_TIMEOUT
      value: "20000" # milliseconds per the OTLP exporter spec
  sampler:
    type: parentbased_traceidratio
    argument: "0.1"
  python:
    env:
      - name: OTEL_PYTHON_LOG_CORRELATION
        value: "true" # inject trace_id into log records
# Note: Go eBPF auto-instrumentation requires SYS_PTRACE + privileged: true.
# Not recommended in all environments. Manual SDK instrumentation is safer for Go.
8. Kubernetes Deployment
OTel Collector as DaemonSet + Sidecar patterns
# Option 1: DaemonSet (one Collector per node — good for node-level metrics)
# Option 2: Deployment (central gateway Collector — fan-out to multiple backends)
# Option 3: Sidecar (per-pod — highest isolation, more resource overhead)
# Collector as Deployment (central):
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: central
spec:
  mode: deployment # or: daemonset, sidecar
  config: |
    # your collector config yaml here
# Sidecar injection:
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    metadata:
      annotations:
        sidecar.opentelemetry.io/inject: "true" # inject Collector sidecar (annotation goes on the pod template)
    spec:
      containers:
        - name: my-app
          env:
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://localhost:4317" # local sidecar
            - name: OTEL_SERVICE_NAME
              value: "my-service"
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: "k8s.pod.name=$(POD_NAME),k8s.namespace.name=$(POD_NAMESPACE)"
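`OTEL_RESOURCE_ATTRIBUTES` is a comma-separated list of `key=value` pairs. A simplified sketch of how an SDK reads it (the spec additionally allows percent-encoded values, which this illustration skips):

```python
def parse_resource_attributes(value: str) -> dict:
    """Parse a comma-separated key=value list into resource attributes."""
    attrs = {}
    for pair in value.split(","):
        key, sep, val = pair.strip().partition("=")
        if sep and key:  # skip malformed members with no '='
            attrs[key] = val
    return attrs
```

This is why values containing commas need encoding: the splitter above (like the real SDKs) treats every comma as a pair separator.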
# Resource attributes from Kubernetes metadata (K8s Attributes Processor):
processors:
  k8sattributes:
    passthrough: false
    extract:
      metadata:
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.namespace.name
        - k8s.node.name
9. Backends & Context Propagation
Jaeger, Grafana Tempo, and W3C Trace Context propagation
| Backend | Good for | OTel endpoint | Notes |
|---|---|---|---|
| Jaeger | Trace UI, self-hosted | OTLP gRPC :4317 | Free, UI is excellent for trace exploration |
| Grafana Tempo | Trace storage + Grafana integration | OTLP gRPC :4317 | Pairs with Loki (logs) + Prometheus (metrics) in Grafana |
| Honeycomb | Sampling + querying | OTLP gRPC | Best-in-class querying; free tier available |
| Datadog | All-in-one commercial | OTLP via Agent | Set DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT |
| Prometheus | Metrics only | Prometheus exporter / remote write | OTLP metrics → prometheusremotewrite exporter |
# W3C Trace Context (standard propagation — all OTel SDKs default to this):
# traceparent: 00-{trace_id}-{parent_span_id}-{flags}
# Example: traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
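The `traceparent` format is simple enough to parse by hand, which is handy when eyeballing headers during a propagation debug session. A minimal sketch (not a replacement for the SDK's W3C propagator, which also handles version negotiation):

```python
def parse_traceparent(header: str) -> dict:
    """Split a W3C traceparent header: version-trace_id-parent_span_id-flags."""
    version, trace_id, span_id, flags = header.split("-")
    if trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError("all-zero trace_id/span_id is invalid per the spec")
    return {
        "version": version,
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": int(flags, 16) & 0x01 == 1,  # low bit of flags = sampled
    }
```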
# B3 propagation (Zipkin legacy — still needed if you have old services):
# Add to your SDK config:
from opentelemetry.propagators.b3 import B3MultiFormat
from opentelemetry.propagate import set_global_textmap
set_global_textmap(B3MultiFormat())
# Log correlation — inject trace_id into every log line:
import logging
from opentelemetry import trace
class OTelLoggingHandler(logging.StreamHandler):
    def emit(self, record):
        span = trace.get_current_span()
        ctx = span.get_span_context()
        if ctx.is_valid:
            record.trace_id = format(ctx.trace_id, '032x')
            record.span_id = format(ctx.span_id, '016x')
        super().emit(record)
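The same correlation pattern also works with a `logging.Filter`, which applies before formatting and so guarantees the attribute exists. This stdlib-only sketch stands in a fake trace id via a contextvar so the formatter shape is visible without the SDK; in real use you would read the ids from `trace.get_current_span()` as in the handler above:

```python
import contextvars
import logging

# Hypothetical stand-in for the current span's trace id (illustration only)
current_trace_id = contextvars.ContextVar("trace_id", default="-")

class TraceIdFilter(logging.Filter):
    def filter(self, record):
        record.trace_id = current_trace_id.get()  # attach id to every record
        return True  # never drop records, only annotate them

logger = logging.getLogger("orders")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s"))
handler.addFilter(TraceIdFilter())
logger.addHandler(handler)
```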
10. Debugging OpenTelemetry
When spans don’t appear — common causes and fixes
# 1. Is the SDK initialized BEFORE your app code?
# Python/Node: SDK setup must come before any import/require of instrumented libraries
# Go: call initTracer() before any HTTP server/client initialization
# 2. Are spans being sampled out?
OTEL_TRACES_SAMPLER=always_on # force 100% sampling for debugging (revert in prod!)
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=1.0
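`traceidratio` sampling is deterministic: the decision is derived from the trace id itself, so every SDK that sees the same trace id makes the same call, and a 10% ratio keeps whole traces rather than random fragments. A simplified Python sketch of the mechanism (real SDKs use a spec-defined bound and comparison; this illustrates the idea only):

```python
def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Deterministic sampling: compare the low 8 bytes of the id to ratio * 2^64."""
    low64 = int(trace_id_hex, 16) & 0xFFFFFFFFFFFFFFFF
    return low64 < ratio * 2**64
```

Because the input is the (effectively uniform) trace id, roughly `ratio` of all traces pass, and re-running the decision anywhere yields the same answer.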
# 3. Is the exporter reaching the Collector?
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 # check this is correct
# Add debug exporter to Collector config to see what arrives:
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    traces:
      exporters: [debug, otlp/backend]
# 4. Check Collector logs:
kubectl logs -n monitoring deployment/otel-collector | grep -E 'error|drop|refused'
# 5. Test connectivity directly:
grpcurl -plaintext otel-collector:4317 list # list gRPC services
# Should return: opentelemetry.proto.collector.trace.v1.TraceService etc.
# 6. Verify with a test span (Python):
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
exporter = InMemorySpanExporter()
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(exporter))
# Check exporter.get_finished_spans() after your code runs
# 7. Common gotchas:
# - BatchSpanProcessor buffers spans — shutdown() flushes remaining spans (important for short-lived processes)
# - Context propagation: you must pass ctx through function calls or use context.Context injection
# - Go: span.End() MUST be called — use defer span.End() immediately after tracer.Start()
# - Head-based sampling drops at SDK level (never reaches Collector) vs tail-based sampling (Collector decides after full trace received)
# - Flush on exit: call sdk.shutdown() (Python/Node) or tp.Shutdown(ctx) (Go) before exiting.
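The batching gotcha is worth internalizing: a batch processor holds finished spans in memory and only exports when a size or timeout trigger fires, so a process that exits without a final flush silently loses the tail of its telemetry. A toy sketch of that buffering behavior (not the SDK's BatchSpanProcessor):

```python
class TinyBatcher:
    """Toy batch processor: buffers spans, exports on size trigger or flush."""

    def __init__(self, export, batch_size=3):
        self.export = export          # callback that ships a list of spans
        self.batch_size = batch_size
        self.buffer = []

    def on_end(self, span):
        self.buffer.append(span)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):                  # what shutdown() must call
        if self.buffer:
            self.export(list(self.buffer))
            self.buffer.clear()
```

Note that a fourth span after a batch of three stays buffered until `flush()` runs; that buffered remainder is exactly what disappears when a short-lived process skips shutdown.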
Track the OpenTelemetry release cycle — the OTel Collector and each SDK ship on independent schedules.