
Distributed Tracing

Distributed tracing provides end-to-end visibility into requests as they travel through your Fluxbase instance. Fluxbase uses OpenTelemetry for standardized, vendor-agnostic tracing.

Distributed tracing tracks a request as it moves through different services and components. Each unit of work is called a “span”, and spans are combined into a “trace” that shows the complete journey of a request.

Key Benefits:

  • Performance Analysis: Identify slow database queries, API calls, and function executions
  • Error Debugging: Trace errors across service boundaries
  • Architecture Understanding: Visualize service dependencies and data flow
  • Capacity Planning: Make data-driven decisions about scaling

Enable OpenTelemetry tracing in your fluxbase.yaml:

observability:
  tracing:
    enabled: true
    endpoint: "localhost:4317"   # OTLP collector endpoint
    service_name: "fluxbase"     # Service name for traces
    environment: "production"    # Environment (development, staging, production)
    sample_rate: 1.0             # Sample rate (0.0-1.0; 1.0 = 100%)
    insecure: false              # Keep false in production so TLS is used

Environment Variables:

export FLUXBASE_OBSERVABILITY_TRACING_ENABLED=true
export FLUXBASE_OBSERVABILITY_TRACING_ENDPOINT="collector.example.com:4317"
export FLUXBASE_OBSERVABILITY_TRACING_SERVICE_NAME="fluxbase"
export FLUXBASE_OBSERVABILITY_TRACING_ENVIRONMENT="production"
export FLUXBASE_OBSERVABILITY_TRACING_SAMPLE_RATE=0.1 # Sample 10% of traces

Fluxbase uses the OTLP (OpenTelemetry Protocol) format, which is compatible with many backends:

Jaeger is a popular open-source tracing backend.

Run Jaeger with Docker:

docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 4317:4317 \
  -p 16686:16686 \
  jaegertracing/all-in-one:latest

Configure Fluxbase:

observability:
  tracing:
    enabled: true
    endpoint: "localhost:4317"
    insecure: true # For local development

Access Jaeger UI:

  • Navigate to http://localhost:16686
  • Browse traces by service, operation, and tags

Grafana Tempo is a scalable, high-performance distributed tracing backend.

Run Tempo with Docker:

docker run -d --name tempo \
  -p 4317:4317 \
  -p 3200:3200 \
  grafana/tempo:latest \
  -server.http-listen-port=3200 \
  -storage.trace.backend=local \
  -storage.trace.local.path=/tmp/tempo

Configure Fluxbase:

observability:
  tracing:
    enabled: true
    endpoint: "localhost:4317"

Access Tempo UI:

  • Use Grafana with Tempo data source
  • Navigate to Grafana → Explore → Select Tempo data source

For production deployments, use the OpenTelemetry Collector as a central processing pipeline:

otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 5s
    send_batch_size: 1000

exporters:
  # Recent Collector releases removed the dedicated jaeger exporter;
  # Jaeger accepts OTLP natively on port 4317.
  otlp:
    endpoint: jaeger:4317
    tls:
      insecure: true
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, debug]

Run Collector:

docker run -d --name otel-collector \
  -v $(pwd)/otel-collector-config.yaml:/etc/otelcol/config.yaml \
  -p 4317:4317 \
  otel/opentelemetry-collector:latest

AWS X-Ray:

# Use the AWS Distro for OpenTelemetry (ADOT) Collector
observability:
  tracing:
    enabled: true
    endpoint: "localhost:4317" # ADOT Collector endpoint

Google Cloud Trace:

# Use the OpenTelemetry Collector with the Google Cloud Trace exporter
observability:
  tracing:
    enabled: true
    endpoint: "localhost:4317"

Azure Monitor:

# Use the Azure Monitor Application Insights exporter
observability:
  tracing:
    enabled: true
    endpoint: "localhost:4317"

Fluxbase automatically creates spans for database queries, authentication, and storage operations:

All PostgreSQL queries are automatically traced:

// Automatic span created for this query
ctx, span := observability.StartDBSpan(ctx, "SELECT", "users")
// Wrap in a closure so the final value of err is recorded,
// not its value at the time defer is evaluated.
defer func() { observability.EndDBSpan(span, err) }()

rows, err := db.Query(ctx, "SELECT * FROM users WHERE id = $1", userID)

Span Attributes:

  • db.system: “postgresql”
  • db.operation: “SELECT”, “INSERT”, “UPDATE”, “DELETE”
  • db.table: Table name
  • Error status if query fails
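The db.operation attribute can be derived from the query text itself. A simplified sketch of that idea (not Fluxbase's actual implementation; real SQL can begin with comments or CTEs and needs fuller parsing):

```go
package main

import (
	"fmt"
	"strings"
)

// dbOperation derives the db.operation span attribute from the query
// text by taking its first keyword, uppercased. A deliberate
// simplification: it ignores leading comments and WITH clauses.
func dbOperation(query string) string {
	fields := strings.Fields(query)
	if len(fields) == 0 {
		return "UNKNOWN"
	}
	return strings.ToUpper(fields[0])
}

func main() {
	fmt.Println(dbOperation("select * from users where id = $1"))  // SELECT
	fmt.Println(dbOperation("INSERT INTO users (id) VALUES ($1)")) // INSERT
}
```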

All auth operations create spans:

  • auth.signup
  • auth.signin
  • auth.signout
  • auth.oauth
  • auth.magic_link

File storage operations are traced:

  • storage.upload
  • storage.download
  • storage.delete

Create custom spans for additional visibility:

import "github.com/nimbleflux/fluxbase/internal/observability"

// Start a custom span
ctx, span := observability.StartSpan(ctx, "my-custom-operation")
defer span.End()

// Your code here
result, err := doSomething(ctx)

// Record error if failed
if err != nil {
    observability.RecordError(ctx, err)
}
import (
    "go.opentelemetry.io/otel/attribute"

    "github.com/nimbleflux/fluxbase/internal/observability"
)

ctx, span := observability.StartSpan(ctx, "process-data")
defer span.End()

// Add custom attributes
observability.SetSpanAttributes(ctx,
    attribute.String("user.id", userID),
    attribute.Int("record.count", len(records)),
    attribute.String("processing.type", "batch"),
)

// Add events to track progress
observability.AddSpanEvent(ctx, "validation.started",
    attribute.Int("record.count", len(records)),
)

// ... validation code ...

observability.AddSpanEvent(ctx, "validation.completed",
    attribute.Int("valid.records", validCount),
    attribute.Int("invalid.records", invalidCount),
)

Fluxbase automatically traces edge function execution:

// Your Deno function
// Span context is automatically available via environment variables
const traceParent = Deno.env.get("TRACEPARENT");
const traceId = Deno.env.get("OTEL_TRACE_ID");
const spanId = Deno.env.get("OTEL_SPAN_ID");

// Fluxbase automatically creates function spans with attributes:
// - function.execution_id
// - function.name
// - function.namespace
// - user.id (if authenticated)
// - http.method
// - http.url

Function Span Events:

// Propagate the function's trace context to downstream calls
await fetch("https://api.example.com/data", {
  headers: {
    "traceparent": traceParent, // Propagate trace context
  },
});

Jobs are automatically traced with progress tracking:

// Job span is created when the job starts
ctx, span := observability.StartJobSpan(ctx, observability.JobSpanConfig{
    JobID:       jobID,
    JobName:     "send-email",
    Namespace:   "notifications",
    Priority:    5,
    ScheduledAt: scheduledAt,
    WorkerID:    workerID,
    WorkerName:  "worker-1",
    UserID:      userID,
})
defer span.End()

// Track job progress
observability.SetJobProgress(ctx, 25, "Email queued")
// ... send email ...
observability.SetJobProgress(ctx, 50, "Email sent")
// ... update database ...
observability.SetJobProgress(ctx, 100, "Completed")

// Set final result
observability.SetJobResult(ctx, "completed", duration, nil)

Trace context automatically propagates to:

  1. Database Queries: All queries carry trace context
  2. HTTP Clients: Use traceparent header
  3. Background Jobs: Jobs inherit parent trace
  4. Edge Functions: Trace context passed as environment variables
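The traceparent header used for propagation follows the W3C Trace Context format: version, trace ID, parent span ID, and flags, joined by dashes. A stdlib-only sketch of parsing it (illustrative; production code should use an OpenTelemetry propagator):

```go
package main

import (
	"fmt"
	"strings"
)

// TraceContext holds the four dash-separated fields of a W3C
// traceparent header value.
type TraceContext struct {
	Version string // 2 hex chars, currently "00"
	TraceID string // 32 hex chars
	SpanID  string // 16 hex chars (the parent span ID)
	Flags   string // 2 hex chars; "01" means sampled
}

// parseTraceparent splits and length-checks a traceparent value.
func parseTraceparent(h string) (TraceContext, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || len(parts[0]) != 2 || len(parts[1]) != 32 ||
		len(parts[2]) != 16 || len(parts[3]) != 2 {
		return TraceContext{}, fmt.Errorf("malformed traceparent: %q", h)
	}
	return TraceContext{Version: parts[0], TraceID: parts[1], SpanID: parts[2], Flags: parts[3]}, nil
}

func main() {
	tc, err := parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		panic(err)
	}
	fmt.Println(tc.TraceID, "sampled:", tc.Flags == "01")
}
```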

Manual Propagation:

// Get trace context for subprocesses
env := observability.GetTraceContextEnv(ctx)

// Pass to subprocess
cmd := exec.CommandContext(ctx, "my-subprocess")
cmd.Env = append(os.Environ(), flattenEnv(env)...)

Reduce tracing overhead with smart sampling:

# Sample all traces in development
observability:
  tracing:
    sample_rate: 1.0 # 100% sampling

# Sample 10% of traces in production
observability:
  tracing:
    sample_rate: 0.1 # 10% sampling

# Low baseline for high-traffic deployments
observability:
  tracing:
    sample_rate: 0.01 # 1% baseline
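Ratio-based samplers typically make a deterministic keep/drop decision from the trace ID itself, so every service in the request path reaches the same verdict for a given trace. A sketch of the idea (not Fluxbase's actual sampler):

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"math"
)

// shouldSample decides deterministically from the trace ID: the first
// 8 bytes are read as an unsigned integer and compared against
// rate * MaxUint64, so all services agree on the same trace.
func shouldSample(traceID string, rate float64) bool {
	b, err := hex.DecodeString(traceID)
	if err != nil || len(b) < 8 {
		return false
	}
	v := binary.BigEndian.Uint64(b[:8])
	return float64(v) < rate*math.MaxUint64
}

func main() {
	id := "4bf92f3577b34da6a3ce929d0e0e4736"
	fmt.Println(shouldSample(id, 1.0)) // full sampling keeps the trace
	fmt.Println(shouldSample(id, 0.0)) // zero rate drops it
}
```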

Head-Based Sampling:

// Always trace slow operations
if duration > time.Second {
    observability.SetSpanAttributes(ctx,
        attribute.Bool("slow.request", true),
    )
}

// Always trace errors
if err != nil {
    observability.RecordError(ctx, err)
}

Look for database spans with high duration:

  1. Open Jaeger UI or Grafana Tempo
  2. Filter by operation db.query or db.SELECT
  3. Sort by duration
  4. Click on slow spans to see SQL query

Follow an error through the system:

  1. Find traces with error status
  2. Expand the trace timeline
  3. Look for red error spans
  4. Click on error spans to see stack traces

Identify optimization opportunities:

  1. Look for spans with high duration
  2. Check if spans run sequentially (could be parallelized)
  3. Identify N+1 query patterns
  4. Find slow external API calls

Follow OpenTelemetry semantic conventions:

import semconv "go.opentelemetry.io/otel/semconv/v1.24.0"

observability.SetSpanAttributes(ctx,
    semconv.HTTPMethodKey.String("GET"),
    semconv.HTTPStatusCodeKey.Int(200),
    semconv.EnduserIDKey.String(userID),
)

Track important events in spans:

observability.AddSpanEvent(ctx, "cache.miss",
    attribute.String("cache.key", cacheKey),
)

observability.AddSpanEvent(ctx, "db.query.started")
// ... run query ...
observability.AddSpanEvent(ctx, "db.query.completed",
    attribute.Int("db.row_count", rowCount),
)

// Set span status (codes is go.opentelemetry.io/otel/codes)
if err != nil {
    observability.RecordError(ctx, err)
    span.SetStatus(codes.Error, err.Error())
} else {
    span.SetStatus(codes.Ok, "")
}

Connect related spans:

// Link to background job span
span.AddLink(trace.Link{
SpanContext: jobSpan.SpanContext(),
Attributes: []attribute.KeyValue{
attribute.String("job.id", jobID),
},
})

Identify the service generating traces:

observability:
  tracing:
    service_name: "fluxbase"
    environment: "production"

Resource attributes added automatically:

  • service.name: Service name
  • service.version: Fluxbase version
  • deployment.environment: Environment name
  • service.namespace: “fluxbase”

Check 1: Verify tracing is enabled

# Check logs for initialization message
grep "OpenTelemetry tracing initialized" /var/log/fluxbase/fluxbase.log

Check 2: Verify endpoint connectivity

# Test connection to collector
telnet localhost 4317

Check 3: Check sample rate

# Ensure sample_rate > 0
observability:
  tracing:
    sample_rate: 1.0 # Try 100% sampling for testing

Check 4: Verify collector configuration

# Check collector logs
docker logs otel-collector

Issue: Spans appear but don’t form a complete trace.

Solution: Ensure trace context propagation is working:

  1. Check that requests include traceparent header
  2. Verify context is passed through function calls
  3. Check that spans use defer span.End()

Issue: Tracing causes high memory usage.

Solutions:

  1. Reduce sample rate:

     observability:
       tracing:
         sample_rate: 0.1 # Sample only 10%

  2. Use batch processing:

     # Collector configuration
     processors:
       batch:
         send_batch_size: 1000
         timeout: 10s

  3. Limit span attributes:

     // Avoid adding large attributes
     observability.SetSpanAttributes(ctx,
         attribute.String("huge.data", hugeDataString),     // Bad: bloats every export
         attribute.String("data.hash", hashData(hugeData)), // Good: constant-size reference
     )

Tracing overhead is minimal with proper configuration:

Configuration     Overhead   Use Case
Sampling: 100%    ~5-10%     Development, critical paths
Sampling: 10%     ~1-2%      General production
Sampling: 1%      <1%        High-traffic production

Optimization Tips:

  1. Use sampling in production
  2. Disable tracing for health checks
  3. Use batch exporters
  4. Filter sensitive data from spans
  5. Set appropriate span timeout