Monitoring & Observability

Fluxbase provides monitoring and observability features for tracking system health and performance and for troubleshooting issues in production.

Fluxbase exposes metrics, health checks, and system statistics through multiple endpoints:

  • Prometheus Metrics (/metrics) - Standard Prometheus format metrics
  • System Metrics (/api/v1/monitoring/metrics) - JSON system statistics
  • Health Checks (/api/v1/monitoring/health) - Component health status
  • Logs - Structured JSON logging with zerolog

┌─────────────────────────────────────────────────────────────┐
│                      Monitoring Stack                       │
│                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │   Fluxbase   │───▶│  Prometheus  │───▶│   Grafana    │   │
│  │   /metrics   │    │  (Scraper)   │    │ (Dashboard)  │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│                                                             │
│  ┌──────────────┐    ┌──────────────┐                       │
│  │  Structured  │───▶│   Loki/ELK   │                       │
│  │     Logs     │    │  (Log Agg.)  │                       │
│  └──────────────┘    └──────────────┘                       │
│                                                             │
│  ┌──────────────┐    ┌──────────────┐                       │
│  │    Health    │───▶│ Uptime Mon.  │                       │
│  │    Checks    │    │  (AlertMgr)  │                       │
│  └──────────────┘    └──────────────┘                       │
└─────────────────────────────────────────────────────────────┘

Fluxbase exposes Prometheus-compatible metrics at the /metrics endpoint.

| Category | Metric | Type | Labels | Description |
|----------|--------|------|--------|-------------|
| HTTP | fluxbase_http_requests_total | Counter | method, path, status | Total HTTP requests |
| HTTP | fluxbase_http_request_duration_seconds | Histogram | method, path, status | HTTP request latency |
| HTTP | fluxbase_http_requests_in_flight | Gauge | - | Active requests |
| Database | fluxbase_db_queries_total | Counter | operation, table | Total database queries |
| Database | fluxbase_db_query_duration_seconds | Histogram | operation, table | Database query latency |
| Database | fluxbase_db_connections | Gauge | - | Current connections |
| Database | fluxbase_db_connections_idle | Gauge | - | Idle connections |
| Database | fluxbase_db_connections_max | Gauge | - | Maximum connections |
| Realtime | fluxbase_realtime_connections | Gauge | - | WebSocket connections |
| Realtime | fluxbase_realtime_channels | Gauge | - | Active channels |
| Realtime | fluxbase_realtime_subscriptions | Gauge | - | Total subscriptions |
| Realtime | fluxbase_realtime_messages_total | Counter | channel_type | Messages sent |
| Storage | fluxbase_storage_bytes_total | Counter | operation, bucket | Bytes stored/retrieved |
| Storage | fluxbase_storage_operations_total | Counter | operation, bucket, status | Storage operations |
| Storage | fluxbase_storage_operation_duration_seconds | Histogram | operation, bucket | Storage latency |
| Auth | fluxbase_auth_attempts_total | Counter | method, result | Auth attempts |
| Auth | fluxbase_auth_success_total | Counter | method | Successful auths |
| Auth | fluxbase_auth_failure_total | Counter | method, reason | Failed auths |
| Rate Limiting | fluxbase_rate_limit_hits_total | Counter | limiter_type, identifier | Rate limit hits |
| System | fluxbase_system_uptime_seconds | Gauge | - | System uptime |

1. Create prometheus.yml:

scrape_configs:
  - job_name: "fluxbase"
    static_configs:
      - targets: ["localhost:8080"]
    metrics_path: "/metrics"

2. Run Prometheus:

docker run -d -p 9090:9090 -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus

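3. Verify: Visit http://localhost:9090 and query fluxbase_http_requests_total. If the target shows as down, note that inside the Prometheus container localhost refers to the container itself, so the scrape target must be an address the container can reach (for example host.docker.internal:8080 on Docker Desktop).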
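
To check the endpoint without Prometheus, the text exposition format can be read directly. The following is a minimal Python sketch, not part of Fluxbase. It assumes samples carry no trailing timestamps, and the helper names (parse_metrics, total, fetch_metrics) are hypothetical.

```python
import re
import urllib.request

# Matches label pairs such as method="GET" inside the {...} part of a sample.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse exposition-format text into (name, labels, value) tuples.

    Assumption: each sample line ends with the value (no timestamp).
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        series, value = line.rsplit(" ", 1)
        name, _, label_part = series.partition("{")
        labels = dict(LABEL_RE.findall(label_part))
        samples.append((name, labels, float(value)))
    return samples


def total(samples, metric):
    """Sum a metric across all of its label combinations."""
    return sum(value for name, _, value in samples if name == metric)


def fetch_metrics(url="http://localhost:8080/metrics"):
    """Download the raw metrics text (address taken from the scrape config)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

Calling total(parse_metrics(fetch_metrics()), "fluxbase_http_requests_total") against a running instance should give the same figure as sum(fluxbase_http_requests_total) in the Prometheus UI.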


The JSON metrics endpoint at /api/v1/monitoring/metrics returns system statistics:

curl http://localhost:8080/api/v1/monitoring/metrics -H "Authorization: Bearer TOKEN"

Response includes:

| Category | Metrics |
|----------|---------|
| System | uptime_seconds, go_version, num_goroutines |
| Memory | memory_alloc_mb, memory_sys_mb, num_gc, gc_pause_ms |
| Database | acquired_conns, idle_conns, max_conns, acquire_duration_ms |
| Realtime | total_connections, active_channels, total_subscriptions |
| Storage | total_buckets, total_files, total_size_gb |
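
As an illustration of consuming this endpoint, the Python sketch below derives connection pool utilization from the Database fields. The exact nesting of the JSON response is not documented here, so the code searches for keys by name instead of assuming a layout. The helper names are hypothetical.

```python
import json
import urllib.request


def find_key(obj, key):
    """Depth-first search for the first occurrence of key in nested JSON data."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def pool_utilization(stats):
    """Return acquired_conns / max_conns, or None if either field is missing."""
    acquired = find_key(stats, "acquired_conns")
    max_conns = find_key(stats, "max_conns")
    if acquired is None or not max_conns:
        return None
    return acquired / max_conns


def fetch_stats(base_url, token):
    """GET the JSON statistics using the bearer token from the curl example."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/monitoring/metrics",
        headers={"Authorization": "Bearer " + token},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)
```

For example, pool_utilization(fetch_stats("http://localhost:8080", "TOKEN")) returns a fraction between 0 and 1 when both fields are present.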

Endpoint: /api/v1/monitoring/health

curl http://localhost:8080/api/v1/monitoring/health

The endpoint returns 200 OK when healthy and 503 when unhealthy. It checks the database, realtime, and storage services.

Docker Compose:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/api/v1/monitoring/health"]
  interval: 30s
  timeout: 5s
  retries: 3
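
Outside Docker, the same probe can be scripted. The Python sketch below treats 200 as healthy and anything else, including an unreachable server, as unhealthy. Its retry loop only loosely mirrors the Compose settings above. The function names are hypothetical.

```python
import time
import urllib.error
import urllib.request


def check_health(url, timeout=5.0):
    """Return (healthy, detail) for one probe of the health endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200, f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:  # raised for 503 and other error codes
        return False, f"HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:  # refused, DNS failure, timeout
        return False, f"unreachable: {err}"


def is_up(check, retries=3, interval=30.0, sleep=time.sleep):
    """Report down only after `retries` consecutive failed probes."""
    for attempt in range(retries):
        healthy, _ = check()
        if healthy:
            return True
        if attempt < retries - 1:
            sleep(interval)
    return False
```

A typical call would be is_up(lambda: check_health("http://localhost:8080/api/v1/monitoring/health")). Passing sleep as a parameter keeps the loop easy to test without waiting.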

1. Run Grafana:

docker run -d -p 3000:3000 grafana/grafana

2. Add Prometheus data source:

  • Open http://localhost:3000 (login: admin / admin)
  • Configuration → Data Sources → Add Prometheus
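  • URL: http://prometheus:9090 (this hostname resolves only if Grafana and Prometheus share a Docker network in which the Prometheus container is named prometheus; otherwise use an address that is reachable from the Grafana container)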

3. Key dashboard queries:

| Panel | Query |
|-------|-------|
| Request Rate | rate(fluxbase_http_requests_total[5m]) |
| P95 Latency | histogram_quantile(0.95, rate(fluxbase_http_request_duration_seconds_bucket[5m])) |
| Error Rate | rate(fluxbase_http_requests_total{status=~"5xx"}[5m]) |
| DB Connections | fluxbase_db_connections |
| Realtime Connections | fluxbase_realtime_connections |
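
The same queries can be run outside Grafana through the Prometheus HTTP API instant-query endpoint (/api/v1/query). The Python sketch below assumes Prometheus listens on localhost:9090 as in the setup steps and handles only vector results. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL with the expression URL-encoded."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def extract_values(payload):
    """Convert a vector response into a list of (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    # Each sample's "value" is [unix_timestamp, "value as string"].
    return [(s["metric"], float(s["value"][1])) for s in payload["data"]["result"]]


def instant_query(base_url, promql):
    """Run one PromQL expression and return its current values."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return extract_values(json.load(resp))
```

For example, instant_query("http://localhost:9090", "fluxbase_db_connections") returns the gauge's current value, which is useful for smoke-testing a query before adding it to a panel.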

Key alert rules for Prometheus:

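| Alert | Condition | Description |
|-------|-----------|-------------|
| HighErrorRate | sum(rate(fluxbase_http_requests_total{status="5xx"}[5m])) / sum(rate(fluxbase_http_requests_total[5m])) > 0.05 | 5xx error rate > 5% |
| HighLatency | histogram_quantile(0.95, rate(fluxbase_http_request_duration_seconds_bucket[5m])) > 1 | P95 latency > 1s |
| ConnectionPoolExhausted | fluxbase_db_connections >= fluxbase_db_connections_max * 0.9 | Connection pool at or above 90% |
| HighAuthFailures | rate(fluxbase_auth_failure_total[5m]) > 10 | Auth failures > 10/sec |
| FluxbaseDown | up{job="fluxbase"} == 0 | Instance unreachable |

The HighErrorRate condition divides the 5xx rate by the total request rate, because rate() on its own yields requests per second rather than a percentage. The status="5xx" matcher assumes the status label holds a status class; if it holds numeric codes such as 500, match with status=~"5.." instead.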
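
A percentage threshold such as 5% compares the 5xx request rate with the total request rate, not with a fixed number of requests per second. The Python sketch below works through that arithmetic for two counter readings taken five minutes apart. It approximates rate() as a plain increase over the window, which is simpler than Prometheus's extrapolation. The counter values in the usage note are invented.

```python
def per_second_rate(first, last, window_seconds):
    """Approximate rate(): counter increase divided by the window length."""
    increase = last - first
    if increase < 0:  # counter reset: count from zero again
        increase = last
    return increase / window_seconds


def error_ratio(err_first, err_last, total_first, total_last, window_seconds=300):
    """Fraction of requests in the window that were 5xx responses."""
    total_rate = per_second_rate(total_first, total_last, window_seconds)
    if total_rate == 0:
        return 0.0  # no traffic, so no errors to report
    return per_second_rate(err_first, err_last, window_seconds) / total_rate
```

If the error counter goes from 10 to 40 while the total goes from 1000 to 1600, the error rate is 0.1 requests per second and the total is 2 requests per second, giving a ratio of 0.05, or 5%.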

Fluxbase uses structured JSON logging (zerolog):

{
  "level": "info",
  "time": "2024-01-15T10:30:00Z",
  "message": "HTTP request",
  "method": "POST",
  "path": "/api/v1/tables/users",
  "status": 200,
  "duration_ms": 25.5
}

Log levels: debug, info, warn, error, fatal

Configuration:

FLUXBASE_DEBUG=true # Enable debug logging
FLUXBASE_DEBUG=false # Production (info+)

Logged events: HTTP requests, auth events, database queries, realtime connections, storage operations, webhooks, rate limits, security events
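
Because each log entry is a JSON object, ad hoc analysis needs no special tooling. The Python sketch below filters entries by level and by request duration, using the field names from the sample entry above. It assumes one JSON object per line, and the helper names are hypothetical.

```python
import json

# Severity order of the documented log levels.
LEVELS = {"debug": 0, "info": 1, "warn": 2, "error": 3, "fatal": 4}


def parse_log_lines(lines):
    """Yield one dict per valid JSON log line, skipping anything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # non-JSON output, for example a startup banner
        if isinstance(event, dict):
            yield event


def at_least(events, min_level):
    """Keep events whose level is min_level or more severe."""
    threshold = LEVELS[min_level]
    return [e for e in events if LEVELS.get(e.get("level"), -1) >= threshold]


def slow_requests(events, threshold_ms):
    """Keep events whose duration_ms exceeds the threshold."""
    return [e for e in events if e.get("duration_ms", 0) > threshold_ms]
```

For example, slow_requests(parse_log_lines(open("fluxbase.log")), 200) lists requests slower than 200 ms, where fluxbase.log is a placeholder file name.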


Key metrics and targets:

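| Metric | Target | Query |
|--------|--------|-------|
| Request Latency | P95 < 200ms, P99 < 500ms | histogram_quantile(0.95, rate(fluxbase_http_request_duration_seconds_bucket[5m])) |
| DB Query Latency | P95 < 50ms, P99 < 100ms | histogram_quantile(0.95, rate(fluxbase_db_query_duration_seconds_bucket[5m])) |
| Error Rate | < 0.1% | sum(rate(fluxbase_http_requests_total{status="5xx"}[5m])) / sum(rate(fluxbase_http_requests_total[5m])) |
| Connection Pool | < 80% | fluxbase_db_connections / fluxbase_db_connections_max |
| Memory | Stable | Monitor memory_alloc_mb over time |
| Goroutines | Stable | Monitor num_goroutines over time |

The latency queries return seconds, so a 200ms target corresponds to a value of 0.2. For P99, replace 0.95 with 0.99. The error rate query aggregates both sides with sum() so that the 5xx series are divided by all requests rather than matched only against themselves.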
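
To make the latency targets concrete, the Python sketch below reproduces the basic calculation behind histogram_quantile: find the bucket containing the requested rank, then interpolate linearly inside it. The bucket boundaries and counts in the usage note are invented, since the buckets Fluxbase actually exports are not listed here.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets.

    buckets: list of (upper_bound_seconds, cumulative_count), sorted by
    upper bound and ending with a +Inf bucket, as Prometheus exports them.
    """
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, the last one +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if math.isinf(upper):
        return buckets[-2][0]  # beyond the last finite bound: report that bound
    lower, below = (0.0, 0.0) if index == 0 else buckets[index - 1]
    in_bucket = count - below
    if in_bucket == 0:
        return upper
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / in_bucket
```

With buckets [(0.05, 60), (0.1, 85), (0.2, 96), (0.5, 99), (math.inf, 100)], the P95 falls in the 0.1 to 0.2 second bucket and comes out near 0.191 seconds, just inside a 200ms target. The estimate can be no more precise than the bucket boundaries allow.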

| Issue | Symptoms | Diagnosis | Solutions |
|-------|----------|-----------|-----------|
| High Latency | Slow API responses | Check slow endpoints, DB query latency | Add indexes, optimize queries, increase connection pool |
| High Error Rate | 5xx errors | Monitor rate(fluxbase_http_requests_total{status="5xx"}[5m]) | Check logs, verify DB connectivity, review deployments |
| Memory Leaks | Increasing memory | Monitor memory_alloc_mb and goroutine growth | Review long-running ops, check unclosed connections, update version |
| Connection Pool Exhaustion | Slow queries, timeouts | Check fluxbase_db_connections >= fluxbase_db_connections_max | Increase max_connections, reduce query time, add replicas |

| Practice | Description |
|----------|-------------|
| Set up monitoring early | Configure Prometheus scraping, health checks, log aggregation, and alerting rules before production |
| Monitor key metrics | Focus on request latency (P95, P99), error rate, database performance, connection pool usage, auth failures |
| Set up alerts | Create alerts for high error rate (> 1%), high latency (P95 > 500ms), service unavailable, connection pool exhaustion |
| Regular review | Review dashboards daily, analyze trends weekly, optimize based on metrics, update alert thresholds |
| Document runbooks | Create runbooks: High latency → check indexes; 5xx errors → check logs; Memory leaks → restart & investigate |