Observability
What this module does
With 40+ agencies running on 80+ Elastic Beanstalk instances, knowing what is happening at runtime is critical. This module covers:
- Structured JSON logging to CloudWatch
- Per-request SQL query metrics
- AWS X-Ray distributed tracing
- PostHog product analytics
- Error deduplication and tracking
Key files
- orchid/
  - utils/
    - logging_config.py (9 KB): structured JSON logging, CloudWatch, circuit breaker
    - metrics_middleware.py (5 KB): request metrics collection
    - query_metrics.py (7 KB): per-request SQL query counting and timing
    - xray_config.py (3 KB): AWS X-Ray distributed tracing
    - posthog_config.py (8 KB): PostHog product analytics
    - external_api_metrics.py (8 KB): external API call tracking
    - cron_metrics.py (9 KB): cron job performance metrics
    - pool_metrics.py (4 KB): DB connection pool metrics
    - sqldebug.py (7 KB): SQL debugging helper
  - models/
    - __init__.py: Errors, ErrAPI, Err400 error deduplication models
Structured logging
All logs are emitted as structured JSON using the configuration in orchid/utils/logging_config.py. Log records include:
```json
{
  "level": "INFO",
  "timestamp": "2025-05-29T14:32:01Z",
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "message": "Request completed",
  "status": 200,
  "duration_ms": 142,
  "query_count": 7,
  "query_total_ms": 38,
  "endpoint": "admin_views.case_detail",
  "user_type": "admin"
}
```

Logs are shipped to CloudWatch via the watchtower library.
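As a sketch of what such a formatter can look like, here is a stdlib-only `logging.Formatter` that renders records in the shape above. The class name `JsonFormatter` and the `EXTRA_FIELDS` tuple are illustrative assumptions, not the contents of logging_config.py; in production the handler would be a watchtower CloudWatch handler rather than the `StreamHandler` used here.

```python
import json
import logging
from datetime import datetime, timezone

# Optional fields copied from the record when present (an illustrative list).
EXTRA_FIELDS = ("request_id", "status", "duration_ms", "query_count",
                "query_total_ms", "endpoint", "user_type")


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        # record.created is a POSIX timestamp; render it as UTC ISO 8601.
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "level": record.levelname,
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "message": record.getMessage(),
        }
        # Values passed as logger.info(..., extra={...}) become record attributes.
        for field in EXTRA_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("orchid.example")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("Request completed", extra={"request_id": "abc", "status": 200})
```

Passing per-request values through `extra=` keeps the call sites ordinary `logging` calls while the formatter decides the output shape.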
The circuit breaker
The CloudWatch logger has a circuit breaker to prevent CloudWatch failures from taking down the app:
- After 3 consecutive CloudWatch failures, the circuit trips
- All CloudWatch logging is suspended for 60 seconds
- After the cooldown, the circuit resets and CloudWatch logging resumes
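The breaker logic can be sketched as a small state machine. This is a hypothetical `CircuitBreaker` class, not the code in logging_config.py: it counts consecutive failures, opens after `failure_threshold` (3) of them, and lets calls through again once `cooldown_s` (60 seconds) has passed. The clock is injectable so the behavior can be tested without sleeping.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Suspend a flaky sink after repeated failures, then retry after a cooldown."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.consecutive_failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def allow(self) -> bool:
        """Return True if a call to the protected sink should be attempted."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_s:
            # Cooldown elapsed: reset and let calls through again.
            self.opened_at = None
            self.consecutive_failures = 0
            return True
        return False

    def record_success(self) -> None:
        self.consecutive_failures = 0

    def record_failure(self) -> None:
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = self.clock()  # trip the circuit

    def call(self, fn: Callable[[], None]) -> bool:
        """Run fn if the circuit allows it, swallowing its errors. True on success."""
        if not self.allow():
            return False
        try:
            fn()
        except Exception:
            self.record_failure()
            return False
        self.record_success()
        return True
```

A CloudWatch handler could route each send through `breaker.call(...)`, dropping records while the circuit is open so that a CloudWatch outage never raises into request handling.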
Request correlation — X-Request-ID
Every request gets a unique UUID4 assigned in before_request:
```python
g.request_id = str(uuid.uuid4())
```

This ID is:
- Logged with every log entry for this request
- Returned in the X-Request-ID response header
When a user reports an error, ask them for the value of X-Request-ID from their browser’s network tab. Use that ID to find all related logs in CloudWatch.
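To automate that lookup, a sketch like the following builds a CloudWatch Logs JSON filter pattern for one request ID and pages through matching events with boto3's `filter_log_events` paginator. It assumes each CloudWatch event message is one JSON log line like the record above; the function names and the log group name you pass in are placeholders. Normalizing through `uuid.UUID` rejects malformed input and lowercases the ID to match the output of `str(uuid.uuid4())`.

```python
import json
import uuid
from typing import Any, Dict, Iterator


def request_id_filter(request_id: str) -> str:
    """Build a CloudWatch Logs JSON filter pattern matching one request_id."""
    # Validate first so arbitrary user input never ends up in the pattern.
    canonical = str(uuid.UUID(request_id))
    return '{ $.request_id = "%s" }' % canonical


def logs_for_request(client: Any, log_group: str,
                     request_id: str) -> Iterator[Dict[str, Any]]:
    """Yield parsed JSON log entries for one request.

    `client` is expected to be a boto3 CloudWatch Logs client, e.g.
    boto3.client("logs"); `log_group` is a placeholder name.
    """
    paginator = client.get_paginator("filter_log_events")
    pages = paginator.paginate(logGroupName=log_group,
                               filterPattern=request_id_filter(request_id))
    for page in pages:
        for event in page.get("events", []):
            yield json.loads(event["message"])
```

For example, `list(logs_for_request(boto3.client("logs"), "<your-log-group>", request_id))` returns every structured entry logged during that request.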
Per-request SQL metrics
orchid/utils/query_metrics.py tracks SQL queries for each request:
- g.query_count: how many queries ran during this request
- g.query_total_ms: total time spent in database queries
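These are logged in after_request as part of the structured request log. If a page is slow, compare query_total_ms with duration_ms to see whether the bottleneck is in Python or in the database: in the example record above, 38 ms of the 142 ms request was spent in queries, leaving 104 ms outside the database.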
You can also view query details on the debug page rendered by templates/debug_queries.html (admin-only, local development).
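The counting and timing idea can be illustrated with the standard library alone. This sketch wraps each query execution in a timing context manager that updates a per-request `QueryMetrics` object; `QueryMetrics`, `track_query` and `execute` are hypothetical names, and `sqlite3` stands in for the real database. If the app uses SQLAlchemy (an assumption), the same counters would more typically be updated from its `before_cursor_execute` and `after_cursor_execute` engine events and stored on g.

```python
import sqlite3
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator


@dataclass
class QueryMetrics:
    """Per-request counters, mirroring g.query_count and g.query_total_ms."""
    query_count: int = 0
    query_total_ms: float = 0.0


@contextmanager
def track_query(metrics: QueryMetrics) -> Iterator[None]:
    """Count one query and add its wall-clock duration to the totals."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record even if the query raised, so failed queries still show up.
        metrics.query_count += 1
        metrics.query_total_ms += (time.perf_counter() - start) * 1000.0


def execute(conn: sqlite3.Connection, metrics: QueryMetrics, sql: str, params=()):
    """Run one query and fetch its rows while recording its cost."""
    with track_query(metrics):
        return conn.execute(sql, params).fetchall()
```

In a Flask app, a fresh `QueryMetrics` would be created per request (for example in before_request) so the totals never leak between requests.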
AWS X-Ray
orchid/utils/xray_config.py configures AWS X-Ray distributed tracing. X-Ray provides:
- Request latency breakdown (Python processing vs. DB vs. external API)
- Service map showing how components interact
- Trace IDs for correlating across services
X-Ray is active in production. In local mode, X-Ray tracing is a no-op.
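A sketch of the production-only setup, using the aws_xray_sdk package's Flask middleware. The `ORCHID_ENV` variable, the service name and the function name `configure_xray` are assumptions for illustration; the real xray_config.py may decide the mode differently. The SDK is imported lazily, so local development does not need it installed.

```python
import os
from typing import Any, Optional


def configure_xray(app: Any, env: Optional[str] = None) -> bool:
    """Enable X-Ray tracing in production; do nothing elsewhere.

    ORCHID_ENV is a hypothetical environment variable used for illustration.
    Returns True if tracing was enabled.
    """
    env = env or os.environ.get("ORCHID_ENV", "local")
    if env != "production":
        return False  # local mode: tracing is a no-op
    # Imported here so local development does not need the X-Ray SDK.
    from aws_xray_sdk.core import patch_all, xray_recorder
    from aws_xray_sdk.ext.flask.middleware import XRayMiddleware

    xray_recorder.configure(service="orchid")  # service name is an assumption
    XRayMiddleware(app, xray_recorder)  # one segment per Flask request
    patch_all()  # record supported libraries (e.g. requests, botocore) as subsegments
    return True
```

Keeping the environment check inside one function means the rest of the app can call it unconditionally at startup.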
PostHog analytics
orchid/utils/posthog_config.py configures PostHog for product analytics. PostHog captures:
- Page view events (fired in after_request)
- Custom product events (feature usage, key actions)
PostHog is for Orchid’s product team to understand usage patterns. It is not for agency-level operational reporting (use the Reports module for that).
Error tracking models
orchid/models/__init__.py defines three error tables with content-based deduplication:
| Model | Used for |
|---|---|
| Errors | General application errors |
| ErrAPI | API-specific errors |
| Err400 | HTTP 400-series errors |
Content-based deduplication: The add_error() function computes a hash of the error message. If that hash already exists in the table, no new record is created. This prevents high-volume errors from flooding the table.
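The deduplication rule can be sketched with an in-memory table keyed by message hash. SHA-256 is an assumed choice of hash, and the hypothetical `ErrorTable` class stands in for the real Errors, ErrAPI and Err400 models; the real add_error() also takes context such as action and user, which this sketch omits.

```python
import hashlib
from typing import Dict


def error_hash(message: str) -> str:
    """Stable content hash of an error message (SHA-256 is an assumption)."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


class ErrorTable:
    """In-memory stand-in for an error table keyed by message hash."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    def add_error(self, exc: BaseException) -> bool:
        """Insert a row unless one with the same message hash exists.

        Returns True if a new row was created.
        """
        message = str(exc)
        key = error_hash(message)
        if key in self.rows:
            return False  # duplicate content: no new record
        self.rows[key] = message
        return True
```

Because the key depends only on the message text, an error raised thousands of times produces a single row.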
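```python
from orchid.models import add_error

try:
    ...  # the operation that may fail
except Exception as exc:
    # Use add_error() instead of just logging the exception:
    add_error(exc, action='some_operation', user=g.admin)
```

External API metrics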
orchid/utils/external_api_metrics.py tracks calls to external services (SendGrid, HelloSign, Gmail API, etc.) including:
- Latency per call
- Success/failure rates
- Error types
This helps diagnose whether slowness comes from Orchid's code or from a third-party API.
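A minimal sketch of this kind of tracking is a decorator that times each call and tallies failures by exception type. The decorator name `track_external_call` and the in-memory `STATS` dictionary are hypothetical; external_api_metrics.py may record and ship these numbers differently.

```python
import functools
import time
from collections import defaultdict
from typing import Any, Callable, Dict, TypeVar

F = TypeVar("F", bound=Callable[..., Any])

# service name -> aggregated stats (a hypothetical in-memory store)
STATS: Dict[str, Dict[str, Any]] = defaultdict(
    lambda: {"calls": 0, "failures": 0, "total_ms": 0.0, "errors": defaultdict(int)}
)


def track_external_call(service: str) -> Callable[[F], F]:
    """Record latency, success/failure and error type for calls to `service`."""
    def decorator(fn: F) -> F:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            stats = STATS[service]
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                stats["failures"] += 1
                stats["errors"][type(exc).__name__] += 1
                raise  # metrics must not swallow the original error
            finally:
                stats["calls"] += 1
                stats["total_ms"] += (time.perf_counter() - start) * 1000.0
        return wrapper  # type: ignore[return-value]
    return decorator
```

From these totals, the failure rate is `failures / calls` and mean latency is `total_ms / calls` per service, which is enough to tell a slow SendGrid call apart from slow Orchid code.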