OpenTelemetry Reference: SDK Setup, Collector Config, Auto-Instrumentation & Debugging
OpenTelemetry (OTel) is the CNCF standard for application observability — unified tracing, metrics, and logs via a single SDK and Collector. This reference covers the patterns you need to actually instrument production services.
1. Signals: What OTel Collects
Traces, Metrics, Logs — the three pillars
| Signal | What it captures | When to use | OTel status |
|---|---|---|---|
| Traces | Request flow across services, latency per span, errors | Distributed latency debugging, service dependency mapping | Stable |
| Metrics | Aggregated numbers (counters, gauges, histograms) | Dashboards, alerting, SLOs | Stable |
| Logs | Structured events with trace correlation | Error details, audit trails linked to spans | Stable spec; SDK support varies by language |
# Core concepts:
# Span: a single unit of work (HTTP request, DB query) — has name, start/end time, attributes, events
# Trace: tree of spans across services — tied together by trace_id
# SpanContext: trace_id + span_id + trace_flags (sampled bit)
# Baggage: key/value pairs propagated across service boundaries (use sparingly — propagates to all downstream services)

# Attribute naming conventions (semantic conventions):
http.method              # HTTP verb
http.status_code         # response code
db.system                # "postgresql", "redis"
db.statement             # SQL query (be careful with PII)
rpc.service              # gRPC service name
service.name             # your service (set via OTEL_SERVICE_NAME)
service.version          # your version (set via OTEL_SERVICE_VERSION)
deployment.environment   # "production", "staging"
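Baggage travels between services as the W3C `baggage` header: comma-separated `key=value` pairs with percent-encoded values. A minimal stdlib sketch of the wire format, illustrative only, not the SDK's propagator:

```python
from urllib.parse import quote, unquote

def encode_baggage(items: dict) -> str:
    # W3C baggage header: comma-separated key=value, values percent-encoded
    return ",".join(f"{k}={quote(str(v), safe='')}" for k, v in items.items())

def decode_baggage(header: str) -> dict:
    out = {}
    for member in header.split(","):
        key, _, value = member.strip().partition("=")
        if key:
            out[key] = unquote(value)
    return out
```

In real services, use the SDK's baggage propagator rather than hand-rolling this; the sketch only shows why large baggage is costly (every entry rides on every outbound request).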
2. SDK: Go
Go SDK setup — traces + metrics with OTLP export
# Install:
go get go.opentelemetry.io/otel@latest
go get go.opentelemetry.io/otel/sdk@latest
go get go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc@latest
go get go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc@latest
# Tracer provider setup (main.go):
import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)
func initTracer(ctx context.Context) (func(), error) {
	res := resource.NewWithAttributes(
		semconv.SchemaURL,
		semconv.ServiceName("my-service"),
		semconv.ServiceVersion("1.2.0"),
		semconv.DeploymentEnvironment("production"),
	)
	exporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("otel-collector:4317"),
		otlptracegrpc.WithInsecure(), // use TLS in prod
	)
	if err != nil {
		return nil, err
	}
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(res),
		sdktrace.WithSampler(sdktrace.ParentBased(
			sdktrace.TraceIDRatioBased(0.1), // sample 10% in prod
		)),
	)
	otel.SetTracerProvider(tp)
	return func() { _ = tp.Shutdown(ctx) }, nil
}
// Create spans (also import go.opentelemetry.io/otel/attribute,
// go.opentelemetry.io/otel/trace, and go.opentelemetry.io/otel/codes):
var tracer = otel.Tracer("my-package")

func processOrder(ctx context.Context, orderID string) error {
	ctx, span := tracer.Start(ctx, "processOrder",
		trace.WithAttributes(
			attribute.String("order.id", orderID),
			attribute.String("order.currency", "GBP"),
		),
	)
	defer span.End()
	// Record events and errors:
	span.AddEvent("inventory.checked")
	if err := chargeCard(ctx, orderID); err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		return err
	}
	return nil
}
3. SDK: Python
Python SDK — traces + auto-instrumentation for FastAPI / Flask / Django
pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc
pip install opentelemetry-instrumentation-fastapi # or flask, django, psycopg2, redis, requests
# Manual setup:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource, SERVICE_NAME, SERVICE_VERSION
resource = Resource(attributes={
    SERVICE_NAME: "my-service",
    SERVICE_VERSION: "1.0.0",
    "deployment.environment": "production",
})
provider = TracerProvider(resource=resource)
exporter = OTLPSpanExporter(endpoint="http://otel-collector:4317", insecure=True)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)
def process_order(order_id: str):
    with tracer.start_as_current_span(
        "process_order", attributes={"order.id": order_id}
    ) as span:
        try:
            result = charge_card(order_id)
            span.set_attribute("payment.status", result.status)
        except Exception as e:
            span.record_exception(e)
            span.set_status(trace.StatusCode.ERROR, str(e))
            raise
# FastAPI auto-instrumentation:
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
from opentelemetry.instrumentation.redis import RedisInstrumentor
FastAPIInstrumentor.instrument_app(app) # adds span per request automatically
SQLAlchemyInstrumentor().instrument(engine=db_engine) # records db.statement on query spans
RedisInstrumentor().instrument()
4. SDK: Java
Java — zero-code agent vs SDK instrumentation
# Zero-code (OpenTelemetry Java Agent — recommended for Spring, Quarkus, Micronaut):
# Download: https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases
java -javaagent:opentelemetry-javaagent.jar \
-Dotel.service.name=my-service \
-Dotel.exporter.otlp.endpoint=http://otel-collector:4317 \
-Dotel.traces.sampler=parentbased_traceidratio \
-Dotel.traces.sampler.arg=0.1 \
-jar app.jar
# The agent auto-instruments: Spring MVC, JDBC, Kafka, gRPC, Redis, HTTP clients, etc.
# ~100 libraries instrumented with zero code changes.
# Manual SDK for custom spans:
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;

Tracer tracer = GlobalOpenTelemetry.getTracer("my-service", "1.0.0");
Span span = tracer.spanBuilder("processOrder")
    .setAttribute("order.id", orderId)
    .setAttribute("order.currency", "GBP")
    .startSpan();
try (var scope = span.makeCurrent()) {
    processLogic();
} catch (Exception e) {
    span.recordException(e);
    span.setStatus(StatusCode.ERROR, e.getMessage());
    throw e;
} finally {
    span.end();
}
5. SDK: Node.js
Node.js — SDK setup and auto-instrumentation for Express / HTTP
npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node \
@opentelemetry/exporter-trace-otlp-grpc @opentelemetry/resources \
@opentelemetry/semantic-conventions
# instrumentation.js (loaded BEFORE app code via --require):
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');
const sdk = new NodeSDK({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'my-service',
    [SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
    'deployment.environment': process.env.NODE_ENV,
  }),
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4317',
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-fs': { enabled: false }, // too noisy
    }),
  ],
});
sdk.start();
process.on('SIGTERM', () => sdk.shutdown());
# Start app with:
node --require ./instrumentation.js app.js
# Manual spans:
const { trace, SpanStatusCode } = require('@opentelemetry/api');
const tracer = trace.getTracer('my-service', '1.0.0');
async function processOrder(orderId) {
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      span.setAttribute('order.id', orderId);
      const result = await chargeCard(orderId);
      span.setAttribute('payment.status', result.status);
      return result;
    } catch (err) {
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
      throw err;
    } finally {
      span.end();
    }
  });
}
6. OpenTelemetry Collector
Collector config — receivers, processors, exporters, pipelines
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317 # gRPC (preferred — binary, lower overhead)
      http:
        endpoint: 0.0.0.0:4318 # HTTP/JSON (for browsers, languages without gRPC)
  prometheus:
    config:
      scrape_configs:
        - job_name: my-service
          static_configs:
            - targets: ['localhost:8080']

processors:
  batch: # always add — buffers and compresses
    timeout: 5s
    send_batch_size: 1000
  memory_limiter: # prevents OOM under load
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  resource:
    attributes:
      - action: insert # add deployment.environment to all telemetry
        key: deployment.environment
        value: production
  filter/drop_health: # drop health check spans (reduce noise)
    spans:
      exclude:
        match_type: regexp
        attributes:
          - key: http.target
            value: "^/(health|readyz|livez)"

exporters:
  otlp/jaeger: # send traces to Jaeger / Tempo
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheusremotewrite: # send metrics to Prometheus / Mimir
    endpoint: http://prometheus:9090/api/v1/write
  loki: # send logs to Loki
    endpoint: http://loki:3100/loki/api/v1/push
  debug: # log to stdout for debugging
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, filter/drop_health, resource]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, batch, resource]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loki]
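A common misconfiguration: a pipeline references a component name that was never defined in the top-level receivers/processors/exporters sections, and the Collector refuses to start. A small sketch of that cross-check in Python, assuming the YAML has already been parsed into nested dicts (the abbreviated config literal below mirrors the file above; the undefined 'debug' exporter is deliberate):

```python
def check_pipelines(config: dict) -> list:
    """Return components referenced by pipelines but never defined."""
    missing = []
    for name, pipeline in config.get("service", {}).get("pipelines", {}).items():
        for section in ("receivers", "processors", "exporters"):
            defined = config.get(section, {})
            for component in pipeline.get(section, []):
                if component not in defined:
                    # section[:-1] drops the plural 's' for the message
                    missing.append(f"{name}: {section[:-1]} '{component}' not defined")
    return missing

config = {
    "receivers": {"otlp": {}},
    "processors": {"batch": {}, "memory_limiter": {}},
    "exporters": {"otlp/jaeger": {}},
    "service": {"pipelines": {"traces": {
        "receivers": ["otlp"],
        "processors": ["memory_limiter", "batch"],
        "exporters": ["otlp/jaeger", "debug"],  # 'debug' deliberately undefined
    }}},
}
```

In practice `otelcol validate` does this (and much more) for you; the sketch just shows the shape of the wiring the pipelines section expects.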
7. Auto-Instrumentation
Kubernetes operator zero-code injection
# OpenTelemetry Operator — injects agents automatically via annotations
# Install:
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
# Annotate deployments for automatic instrumentation:
kubectl annotate deployment my-app \
instrumentation.opentelemetry.io/inject-python="true" # Python
# or: inject-java, inject-nodejs, inject-dotnet, inject-go (requires eBPF)
# Create Instrumentation CR to configure the agent:
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  exporter:
    endpoint: http://otel-collector:4317
  env:
    - name: OTEL_EXPORTER_OTLP_TIMEOUT
      value: "20000" # milliseconds per the OTLP exporter spec
  sampler:
    type: parentbased_traceidratio
    argument: "0.1"
  python:
    env:
      - name: OTEL_PYTHON_LOG_CORRELATION
        value: "true" # inject trace_id into log records
# Note: Go eBPF auto-instrumentation requires SYS_PTRACE + privileged: true.
# Not recommended in all environments. Manual SDK instrumentation is safer for Go.
8. Kubernetes Deployment
OTel Collector as DaemonSet + Sidecar patterns
# Option 1: DaemonSet (one Collector per node — good for node-level metrics)
# Option 2: Deployment (central gateway Collector — fan-out to multiple backends)
# Option 3: Sidecar (per-pod — highest isolation, more resource overhead)
# Collector as Deployment (central):
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: central
spec:
  mode: deployment # or: daemonset, sidecar
  config: |
    # your collector config yaml here
# Sidecar injection:
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    metadata:
      annotations:
        sidecar.opentelemetry.io/inject: "true" # inject Collector sidecar (annotation goes on the pod template)
    spec:
      containers:
        - name: my-app
          env:
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://localhost:4317" # local sidecar
            - name: OTEL_SERVICE_NAME
              value: "my-service"
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: "k8s.pod.name=$(POD_NAME),k8s.namespace.name=$(POD_NAMESPACE)"
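`OTEL_RESOURCE_ATTRIBUTES` is a comma-separated list of `key=value` pairs. A simplified sketch of how an SDK reads it (the spec additionally allows percent-encoded values, which this illustration skips):

```python
def parse_resource_attributes(value: str) -> dict:
    """Parse a comma-separated key=value list into resource attributes."""
    attrs = {}
    for pair in value.split(","):
        key, sep, val = pair.strip().partition("=")
        if sep and key:  # skip malformed members with no '='
            attrs[key] = val
    return attrs
```

This is why values containing commas need encoding: the splitter above (like the real SDKs) treats every comma as a pair separator.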
# Resource attributes from Kubernetes metadata (K8s Attributes Processor):
processors:
  k8sattributes:
    passthrough: false
    extract:
      metadata:
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.namespace.name
        - k8s.node.name
9. Backends & Context Propagation
Jaeger, Grafana Tempo, and W3C Trace Context propagation
| Backend | Good for | OTel endpoint | Notes |
|---|---|---|---|
| Jaeger | Trace UI, self-hosted | OTLP gRPC :4317 | Free, UI is excellent for trace exploration |
| Grafana Tempo | Trace storage + Grafana integration | OTLP gRPC :4317 | Pairs with Loki (logs) + Prometheus (metrics) in Grafana |
| Honeycomb | Sampling + querying | OTLP gRPC | Best-in-class querying; free tier available |
| Datadog | All-in-one commercial | OTLP via Agent | Set DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT |
| Prometheus | Metrics only | Prometheus exporter / remote write | OTLP metrics → prometheusremotewrite exporter |
# W3C Trace Context (standard propagation — all OTel SDKs default to this):
# traceparent: 00-{trace_id}-{parent_span_id}-{flags}
# Example: traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
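The `traceparent` format is simple enough to parse by hand, which is handy when eyeballing headers during a propagation debug session. A minimal sketch (not a replacement for the SDK's W3C propagator, which also handles version negotiation):

```python
def parse_traceparent(header: str) -> dict:
    """Split a W3C traceparent header: version-trace_id-parent_span_id-flags."""
    version, trace_id, span_id, flags = header.split("-")
    if trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError("all-zero trace_id/span_id is invalid per the spec")
    return {
        "version": version,
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": int(flags, 16) & 0x01 == 1,  # low bit of flags = sampled
    }
```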
# B3 propagation (Zipkin legacy — still needed if you have old services):
# Add to your SDK config:
from opentelemetry.propagators.b3 import B3MultiFormat
from opentelemetry.propagate import set_global_textmap
set_global_textmap(B3MultiFormat())
# Log correlation — inject trace_id into every log line:
import logging
from opentelemetry import trace
class OTelLoggingHandler(logging.StreamHandler):
    def emit(self, record):
        span = trace.get_current_span()
        ctx = span.get_span_context()
        if ctx.is_valid:
            record.trace_id = format(ctx.trace_id, '032x')
            record.span_id = format(ctx.span_id, '016x')
        super().emit(record)
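The same correlation pattern also works with a `logging.Filter`, which applies before formatting and so guarantees the attribute exists. This stdlib-only sketch stands in a fake trace id via a contextvar so the formatter shape is visible without the SDK; in real use you would read the ids from `trace.get_current_span()` as in the handler above:

```python
import contextvars
import logging

# Hypothetical stand-in for the current span's trace id (illustration only)
current_trace_id = contextvars.ContextVar("trace_id", default="-")

class TraceIdFilter(logging.Filter):
    def filter(self, record):
        record.trace_id = current_trace_id.get()  # attach id to every record
        return True  # never drop records, only annotate them

logger = logging.getLogger("orders")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s"))
handler.addFilter(TraceIdFilter())
logger.addHandler(handler)
```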
10. Debugging OpenTelemetry
When spans don’t appear — common causes and fixes
# 1. Is the SDK initialized BEFORE your app code?
# Python/Node: SDK setup must come before any import/require of instrumented libraries
# Go: call initTracer() before any HTTP server/client initialization
# 2. Are spans being sampled out?
OTEL_TRACES_SAMPLER=always_on # force 100% sampling for debugging (revert in prod!)
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=1.0
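`traceidratio` sampling is deterministic: the decision is derived from the trace id itself, so every SDK that sees the same trace id makes the same call, and a 10% ratio keeps whole traces rather than random fragments. A simplified Python sketch of the mechanism (real SDKs use a spec-defined bound and comparison; this illustrates the idea only):

```python
def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Deterministic sampling: compare the low 8 bytes of the id to ratio * 2^64."""
    low64 = int(trace_id_hex, 16) & 0xFFFFFFFFFFFFFFFF
    return low64 < ratio * 2**64
```

Because the input is the (effectively uniform) trace id, roughly `ratio` of all traces pass, and re-running the decision anywhere yields the same answer.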
# 3. Is the exporter reaching the Collector?
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 # check this is correct
# Add debug exporter to Collector config to see what arrives:
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    traces:
      exporters: [debug, otlp/backend]
# 4. Check Collector logs:
kubectl logs -n monitoring deployment/otel-collector | grep -E 'error|drop|refused'
# 5. Test connectivity directly:
grpcurl -plaintext otel-collector:4317 list # list gRPC services
# Should return: opentelemetry.proto.collector.trace.v1.TraceService etc.
# 6. Verify with a test span (Python):
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
exporter = InMemorySpanExporter()
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(exporter))
# Check exporter.get_finished_spans() after your code runs
# 7. Common gotchas:
# - BatchSpanProcessor buffers spans — shutdown() flushes remaining spans (important for short-lived processes)
# - Context propagation: you must pass ctx through function calls or use context.Context injection
# - Go: span.End() MUST be called — use defer span.End() immediately after tracer.Start()
# - Head-based sampling drops at SDK level (never reaches Collector) vs tail-based sampling (Collector decides after full trace received)
# - Flush on exit: call sdk.shutdown() (Python/Node) or tp.Shutdown(ctx) (Go) before exiting.
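The batching gotcha is worth internalizing: a batch processor holds finished spans in memory and only exports when a size or timeout trigger fires, so a process that exits without a final flush silently loses the tail of its telemetry. A toy sketch of that buffering behavior (not the SDK's BatchSpanProcessor):

```python
class TinyBatcher:
    """Toy batch processor: buffers spans, exports on size trigger or flush."""

    def __init__(self, export, batch_size=3):
        self.export = export          # callback that ships a list of spans
        self.batch_size = batch_size
        self.buffer = []

    def on_end(self, span):
        self.buffer.append(span)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):                  # what shutdown() must call
        if self.buffer:
            self.export(list(self.buffer))
            self.buffer.clear()
```

Note that a fourth span after a batch of three stays buffered until `flush()` runs; that buffered remainder is exactly what disappears when a short-lived process skips shutdown.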
Track the OpenTelemetry release cycle — the OTel Collector and each SDK ship on independent schedules.