
Observability

What this module does

With 40+ agencies running on 80+ Elastic Beanstalk instances, knowing what is happening at runtime is critical. This module covers:

  • Structured JSON logging to CloudWatch
  • Per-request SQL query metrics
  • AWS X-Ray distributed tracing
  • PostHog product analytics
  • Error deduplication and tracking

Key files

  • orchid/
    • utils/
      • logging_config.py 9 KB — structured JSON logging, CloudWatch, circuit breaker
      • metrics_middleware.py 5 KB — request metrics collection
      • query_metrics.py 7 KB — per-request SQL query counting and timing
      • xray_config.py 3 KB — AWS X-Ray distributed tracing
      • posthog_config.py 8 KB — PostHog product analytics
      • external_api_metrics.py 8 KB — external API call tracking
      • cron_metrics.py 9 KB — cron job performance metrics
      • pool_metrics.py 4 KB — DB connection pool metrics
      • sqldebug.py 7 KB — SQL debugging helper
    • models/
      • __init__.py Errors, ErrAPI, Err400 — error deduplication models

Structured logging

All logs are emitted as structured JSON using the configuration in orchid/utils/logging_config.py. A typical request log record looks like this:

{
"level": "INFO",
"timestamp": "2025-05-29T14:32:01Z",
"request_id": "550e8400-e29b-41d4-a716-446655440000",
"message": "Request completed",
"status": 200,
"duration_ms": 142,
"query_count": 7,
"query_total_ms": 38,
"endpoint": "admin_views.case_detail",
"user_type": "admin"
}

Logs are shipped to CloudWatch via the watchtower library.
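A minimal sketch of a formatter that produces records shaped like the one above, using only the standard library. The class name JsonFormatter and the EXTRA_FIELDS tuple are assumptions made for illustration; the real configuration lives in orchid/utils/logging_config.py and may be structured differently.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical list of per-request fields copied onto the JSON payload
# when they are present on the log record.
EXTRA_FIELDS = ("request_id", "status", "duration_ms", "query_count",
                "query_total_ms", "endpoint", "user_type")


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object (illustrative only)."""

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "level": record.levelname,
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "message": record.getMessage(),
        }
        # Fields passed through logging's `extra=` argument become
        # attributes on the record, so they can be copied by name.
        for name in EXTRA_FIELDS:
            if hasattr(record, name):
                payload[name] = getattr(record, name)
        return json.dumps(payload)
```

With a formatter like this attached to a handler, a call such as logger.info("Request completed", extra={"status": 200, "duration_ms": 142}) would carry those values into the JSON output.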

The circuit breaker

The CloudWatch logger has a circuit breaker to prevent CloudWatch failures from taking down the app:

  • After 3 consecutive CloudWatch failures, the circuit trips
  • All CloudWatch logging is suspended for 60 seconds
  • After the cooldown, the circuit resets and CloudWatch logging resumes
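The three rules above can be sketched as a small state machine. This is an illustration under stated assumptions, not the code in logging_config.py: the class name CircuitBreaker, its method names, and the injectable clock are all hypothetical.

```python
import time


class CircuitBreaker:
    """Suspend a failing operation after repeated consecutive failures."""

    def __init__(self, max_failures=3, cooldown_seconds=60.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock        # injectable so the cooldown can be tested
        self._failures = 0
        self._opened_at = None     # None means the circuit is closed

    def allow(self) -> bool:
        """Return True if a CloudWatch write may be attempted right now."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.cooldown_seconds:
            # Cooldown elapsed: reset the circuit and resume logging.
            self._opened_at = None
            self._failures = 0
            return True
        return False

    def record_success(self) -> None:
        # Failures must be consecutive to trip the circuit.
        self._failures = 0

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.max_failures:
            self._opened_at = self._clock()
```

A handler built on this would call allow() before each CloudWatch write, skip the write when it returns False, and report the outcome with record_success() or record_failure().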

Request correlation — X-Request-ID

Every request gets a unique UUID4 assigned in before_request:

g.request_id = str(uuid.uuid4())

This ID is:

  1. Logged with every log entry for this request
  2. Returned in the X-Request-ID response header

When a user reports an error, ask them for the value of X-Request-ID from their browser’s network tab. Use that ID to find all related logs in CloudWatch.
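Once log lines have been exported from CloudWatch (for example to a local file), the lookup described above amounts to filtering on the request_id field. The helper name logs_for_request is hypothetical, and the sketch assumes one JSON object per line.

```python
import json


def logs_for_request(lines, request_id):
    """Return parsed log records whose request_id matches, in input order."""
    matches = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if isinstance(record, dict) and record.get("request_id") == request_id:
            matches.append(record)
    return matches
```

Because every entry for a request carries the same ID, the result is the full log trail for that request in the order it was written.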

Per-request SQL metrics

orchid/utils/query_metrics.py tracks SQL queries for each request:

  • g.query_count — how many queries ran during this request
  • g.query_total_ms — total time spent in database queries

These are logged in after_request as part of the structured request log. If a page is slow, comparing query_total_ms with duration_ms tells you immediately whether the bottleneck is in the database or in Python.

You can also view query details at the debug page templates/debug_queries.html (admin-only, local dev).
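To make the two counters concrete, here is a minimal sketch of per-request query accounting. It uses sqlite3 only so that it runs anywhere; the QueryMetrics class and the time_outside_db_ms helper are hypothetical, and query_metrics.py may well hook into the database layer in a different way.

```python
import sqlite3
import time


class QueryMetrics:
    """Accumulate a query count and total query time for one request."""

    def __init__(self):
        self.query_count = 0
        self.query_total_ms = 0.0

    def execute(self, conn, sql, params=()):
        """Run a query on `conn`, recording that it ran and how long it took."""
        start = time.perf_counter()
        try:
            return conn.execute(sql, params)
        finally:
            # Count the query even if it raised, so failures stay visible.
            self.query_count += 1
            self.query_total_ms += (time.perf_counter() - start) * 1000.0


def time_outside_db_ms(duration_ms, query_total_ms):
    """Rough time spent outside the database for one request."""
    return max(duration_ms - query_total_ms, 0)
```

For the sample log record shown earlier (duration_ms 142, query_total_ms 38), time_outside_db_ms gives 104, which suggests most of that request was spent outside SQL.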

AWS X-Ray

orchid/utils/xray_config.py configures AWS X-Ray distributed tracing. X-Ray provides:

  • Request latency breakdown (Python processing vs. DB vs. external API)
  • Service map showing how components interact
  • Trace IDs for correlating across services

X-Ray is active in production. In local mode, X-Ray tracing is a no-op.

PostHog analytics

orchid/utils/posthog_config.py configures PostHog for product analytics. PostHog captures:

  • Page view events (fired in after_request)
  • Custom product events (feature usage, key actions)

PostHog is for Orchid’s product team to understand usage patterns. It is not for agency-level operational reporting (use the Reports module for that).

Error tracking models

orchid/models/__init__.py defines three error tables with content-based deduplication:

Model     Used for
Errors    General application errors
ErrAPI    API-specific errors
Err400    HTTP 400-series errors

Content-based deduplication: The add_error() function computes a hash of the error message. If that hash already exists in the table, no new record is created. This prevents high-volume errors from flooding the table.

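# Inside an except block, where exc is the caught exception,
# use this instead of just logging (g is assumed to be Flask's g):
from flask import g
from orchid.models import add_error
add_error(exc, action='some_operation', user=g.admin)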
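The deduplication idea can be sketched with an in-memory stand-in for the table. Everything here is an assumption made for illustration: the ErrorStore class is hypothetical, SHA-256 is one possible hash (the source does not say which is used), and the real add_error() takes an exception and writes to the database.

```python
import hashlib


class ErrorStore:
    """In-memory stand-in for an error table keyed by a content hash."""

    def __init__(self):
        self.rows = {}

    def add_error(self, message: str, action: str = "") -> bool:
        """Store the error unless an identical message was already recorded.

        Returns True if a new row was created, False for a duplicate.
        """
        digest = hashlib.sha256(message.encode("utf-8")).hexdigest()
        if digest in self.rows:
            return False
        self.rows[digest] = {"message": message, "action": action}
        return True
```

One consequence of hashing the message text: messages that embed varying values (IDs, timestamps) hash differently, so they would be stored as separate rows under this scheme.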

External API metrics

orchid/utils/external_api_metrics.py tracks calls to external services (SendGrid, HelloSign, Gmail API, etc.) including:

  • Latency per call
  • Success/failure rates
  • Error types

This helps diagnose whether a slowness problem is in Orchid’s code or in a third-party API.
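A minimal sketch of how latency, outcome, and error type could be captured around an external call. The names track_external_call, failure_rate, and the in-memory CALLS store are hypothetical; external_api_metrics.py may record and report these values differently.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-memory store:
# service name -> list of (latency_ms, error_type or None)
CALLS = defaultdict(list)


@contextmanager
def track_external_call(service):
    """Record latency and outcome of one external call, re-raising any error."""
    start = time.perf_counter()
    error_type = None
    try:
        yield
    except Exception as exc:
        error_type = type(exc).__name__
        raise
    finally:
        latency_ms = (time.perf_counter() - start) * 1000.0
        CALLS[service].append((latency_ms, error_type))


def failure_rate(service):
    """Fraction of recorded calls to `service` that raised an exception."""
    calls = CALLS[service]
    if not calls:
        return 0.0
    return sum(1 for _, err in calls if err is not None) / len(calls)
```

Wrapping a call as `with track_external_call("sendgrid"): ...` records one entry whether the call succeeds or raises, so slow or failing third-party services show up separately from Orchid's own processing time.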

Gotchas