OpenTelemetry Go: Instrumenting gRPC Services
Earlier this year I migrated a set of internal Go gRPC services at IBM from OpenTracing/Jaeger to OpenTelemetry. The migration involved about a dozen services, an existing Jaeger deployment, and the constraint that we couldn't break tracing during the rollout. Here's how OTel works in Go and what the migration looked like.
Why OpenTelemetry
OpenTracing is archived. OpenCensus is archived. OpenTelemetry is the merger of both and is now the standard. For new services, there's no reason to use anything else. For existing OpenTracing services, the bridge package makes migration incremental.
Initializing the TracerProvider
The TracerProvider is the central object. You create one at startup, register it globally, and then every piece of code that wants to create spans pulls a tracer from it.
```go
import (
	"context"
	"fmt"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.12.0"
)

func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
	exporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("otel-collector:4317"),
		otlptracegrpc.WithInsecure(), // use WithTLSCredentials in prod
	)
	if err != nil {
		return nil, fmt.Errorf("creating OTLP exporter: %w", err)
	}

	res := resource.NewWithAttributes(
		semconv.SchemaURL,
		semconv.ServiceNameKey.String("key-management-service"),
		semconv.ServiceVersionKey.String(version),
		attribute.String("deployment.environment", env),
	)

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(res),
		sdktrace.WithSampler(sdktrace.AlwaysSample()), // adjust for production volume
	)

	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{},
		propagation.Baggage{},
	))
	return tp, nil
}
```
Call this in main and defer tp.Shutdown(ctx) on exit so pending spans are flushed before the process terminates.
The two global setter calls matter: SetTracerProvider makes the provider available to any code calling otel.GetTracerProvider(), and SetTextMapPropagator configures how trace context is serialized into and extracted from headers. propagation.TraceContext{} is the W3C standard format. If you're talking to services that use B3 headers (Zipkin, older Jaeger), add b3.New() to the composite propagator.
Resource Attributes
The resource is metadata that gets attached to every span from this service. The three I always set:
- service.name: identifies the service in your trace backend UI. Use the same name as your Kubernetes service.
- service.version: build version. Makes it trivial to correlate a latency spike to a specific deployment.
- deployment.environment: production, staging, etc. Lets you filter in Jaeger/Grafana Tempo.
These follow the OpenTelemetry semantic conventions — the attribute names are standardized so backends can index and display them consistently.
gRPC Interceptors
This is where most of the value is for gRPC services. The otelgrpc package provides interceptors that automatically create spans for every RPC call and propagate trace context across service boundaries.
On the server:
```go
import "go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"

grpcServer := grpc.NewServer(
	grpc.UnaryInterceptor(otelgrpc.UnaryServerInterceptor()),
	grpc.StreamInterceptor(otelgrpc.StreamServerInterceptor()),
)
```
On the client:
```go
conn, err := grpc.DialContext(ctx, target,
	grpc.WithUnaryInterceptor(otelgrpc.UnaryClientInterceptor()),
	grpc.WithStreamInterceptor(otelgrpc.StreamClientInterceptor()),
)
```
These interceptors:
1. Extract incoming trace context from gRPC metadata (on the server)
2. Create a span for the RPC with the method name as the span name
3. Inject outgoing trace context into gRPC metadata (on the client)
4. Record the RPC status code as a span attribute
You get distributed traces across all your gRPC services without any manual instrumentation in your handlers: a trace that hits five services shows a clean parent-child span hierarchy in Jaeger with zero per-handler work. One caveat: newer otelgrpc releases deprecate these interceptor constructors in favor of stats handlers (otelgrpc.NewServerHandler and otelgrpc.NewClientHandler, passed via grpc.StatsHandler), so check which API your pinned version expects.
Manual Spans for Important Operations
The interceptors handle the RPC boundary. For operations within a handler that are worth tracing — database queries, external calls, expensive computations — add manual spans:
```go
import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	otelcodes "go.opentelemetry.io/otel/codes" // aliased: gRPC also exports a codes package
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

func (s *Server) WrapDEK(ctx context.Context, req *pb.WrapDEKRequest) (*pb.WrapDEKResponse, error) {
	tracer := otel.Tracer("key-management-service")
	ctx, span := tracer.Start(ctx, "wrap-dek")
	defer span.End()

	span.SetAttributes(
		attribute.String("key.version", req.KeyVersion),
		attribute.String("key.algorithm", req.Algorithm),
	)

	ciphertext, err := s.kms.Wrap(ctx, req.Plaintext, req.KeyVersion)
	if err != nil {
		span.RecordError(err)
		span.SetStatus(otelcodes.Error, err.Error())
		return nil, status.Errorf(codes.Internal, "wrap failed: %v", err)
	}

	span.SetAttributes(attribute.Int("ciphertext.length", len(ciphertext)))
	return &pb.WrapDEKResponse{Ciphertext: ciphertext}, nil
}
```
The hierarchy here: the otelgrpc interceptor creates the root span for the WrapDEK RPC. tracer.Start(ctx, "wrap-dek") creates a child span under it. If s.kms.Wrap also creates spans (because your KMS client is instrumented), those become grandchildren.
Always pass ctx through. The context carries the active span, and tracer.Start uses it to establish the parent-child relationship. If you drop the context, your spans become orphans.
The OTel Collector
Don't send traces directly from your service to Jaeger. Use an OTel Collector as an intermediary. The architecture:
```
Services → OTel Collector → Jaeger        (traces)
                          → Prometheus    (metrics)
                          → Cloud Logging (logs, if you configure it)
```
The Collector gives you:
- Fan-out: one SDK config in your service, the Collector routes to multiple backends
- Batching and retry: the Collector handles export failures gracefully; your service just fires and forgets
- Tail-based sampling: you can't make sampling decisions based on full trace data in your service (you only see one span at a time); the Collector can see the full trace and make intelligent decisions — e.g., always keep traces where any span has an error
Deploy the Collector as a DaemonSet or a Deployment in your cluster. The services export to it at otel-collector:4317 (the gRPC OTLP port).
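A minimal Collector config matching this setup might look like the following sketch. It assumes the contrib distribution (the tail_sampling processor lives there), a recent Jaeger that accepts OTLP directly, and hostnames from my environment; the sampling percentages are illustrative:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch: {}
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: sample-the-rest
        type: probabilistic
        probabilistic: {sampling_percentage: 10}

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [otlp/jaeger]
```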
Migrating from OpenTracing
If you have existing services using the opentracing-go library, you can migrate incrementally using the bridge:
```go
import (
	"github.com/opentracing/opentracing-go"

	"go.opentelemetry.io/otel"
	otbridge "go.opentelemetry.io/otel/bridge/opentracing"
)

// In your tracer init, after creating tp:
bridgeTracer, wrapperProvider := otbridge.NewTracerPair(tp.Tracer(""))
otel.SetTracerProvider(wrapperProvider) // OTel-side code should go through the wrapper
opentracing.SetGlobalTracer(bridgeTracer)
```
This makes the OpenTracing global tracer delegate to OTel. Old code using opentracing.StartSpan continues to work and produces OTel spans. You can migrate service by service, removing the bridge and the opentracing-go dependency once a service is fully converted.
Sampling in Production
sdktrace.AlwaysSample() is fine for development and low-volume services. In production with high RPC rates, you want to sample. Two options:
sdktrace.TraceIDRatioBased(0.1) samples 10% of traces based on trace ID. Simple, predictable overhead, but you'll miss low-frequency errors.
Head-based sampling in the service + tail-based sampling in the Collector is the production pattern. Set a high ratio in the service (50-100%), let the Collector apply tail-based sampling to keep error traces and drop routine successful traces.
The migration from OpenTracing was smoother than I expected, mostly because the bridge worked well and the gRPC interceptors handled the hardest part (context propagation) automatically. The bigger win was the OTel Collector replacing a mix of Jaeger clients and custom metrics exporters with a single pipeline.