
Monitoring & Alerting

The monitoring stack

| Tool | Purpose | Who uses it |
| --- | --- | --- |
| AWS CloudWatch | Structured JSON logs from all EB instances | Developers, on-call |
| AWS X-Ray | Distributed request tracing | Developers |
| PostHog | Product analytics / feature usage | Product team |
| Error models (Errors, ErrAPI, Err400) | Per-agency error deduplication table | Developers, support |
| Cron metrics | Job success/failure tracking | Developers |
| External API metrics | Third-party API latency and errors | Developers |

CloudWatch — structured logs

All logs from flaskapp.py and cronapp.py are shipped to CloudWatch as structured JSON via watchtower.

Finding logs for a specific request

Every request has a request_id (UUID4). To find all logs for a failing request:

  1. Get the X-Request-ID from the failing HTTP response (browser DevTools → Network → response headers)
  2. Open CloudWatch → Log Groups → find the EB environment’s log group
  3. Filter events by the request_id value
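
The same lookup can be scripted. The sketch below is a minimal example, assuming boto3 is installed, AWS credentials are configured, and the logs are JSON with a top-level request_id field as shown under "Key log fields". The function names and the log group name are hypothetical, not part of the codebase.

```python
import time


def request_id_filter(request_id: str) -> str:
    """Build a CloudWatch Logs JSON filter pattern matching one request_id."""
    # JSON filter patterns take the form { $.field = "value" }.
    return '{ $.request_id = "%s" }' % request_id


def find_request_logs(log_group: str, request_id: str, lookback_minutes: int = 60):
    """Return raw log messages for one request (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so request_id_filter works without boto3

    logs = boto3.client("logs")
    now_ms = int(time.time() * 1000)
    paginator = logs.get_paginator("filter_log_events")
    pages = paginator.paginate(
        logGroupName=log_group,  # hypothetical: the EB environment's log group
        filterPattern=request_id_filter(request_id),
        startTime=now_ms - lookback_minutes * 60 * 1000,
        endTime=now_ms,
    )
    return [event["message"] for page in pages for event in page["events"]]
```

The string returned by request_id_filter can also be pasted directly into the "Filter events" box in the CloudWatch console.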

Key log fields

{
"level": "ERROR",
"request_id": "550e8400-...",
"endpoint": "admin_views.case_detail",
"status": 500,
"duration_ms": 234,
"query_count": 12,
"query_total_ms": 89,
"agency_db": "agency_production_db",
"user_type": "admin"
}

CloudWatch circuit breaker

If logs stop appearing in CloudWatch, the circuit breaker may have tripped. After 3 consecutive CloudWatch failures, the logger suspends CloudWatch shipping for 60 seconds. Logs fall back to console (EB instance system logs) during this time.

Check the EB instance’s system logs directly if CloudWatch logs are missing.
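
To make the behaviour concrete, here is a minimal sketch of a circuit breaker with the same parameters (3 consecutive failures, 60-second suspension, console fallback). It illustrates the pattern only and is not the logger's actual implementation; the class, the function, and the injectable clock are all hypothetical.

```python
import time


class CircuitBreaker:
    """Suspend a failing sink after N consecutive failures, for a fixed cooldown."""

    def __init__(self, max_failures=3, cooldown_s=60.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.open_until = 0.0  # shipping is suspended while clock() < open_until

    def allow(self) -> bool:
        return self.clock() >= self.open_until

    def record_success(self):
        self.failures = 0  # any success resets the consecutive-failure count

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.open_until = self.clock() + self.cooldown_s
            self.failures = 0


def emit(record, breaker, ship, fallback=print):
    """Try the primary sink; use the fallback if it is suspended or fails."""
    if breaker.allow():
        try:
            ship(record)
            breaker.record_success()
            return "shipped"
        except Exception:
            breaker.record_failure()
    fallback(record)
    return "fallback"
```

While the breaker is open, emit does not attempt the primary sink at all, which is why a CloudWatch outage shows up as a gap in CloudWatch rather than as delayed entries.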

AWS X-Ray — distributed tracing

X-Ray provides latency breakdowns for each request — how much time was spent in Python vs. database vs. external APIs.

To use X-Ray:

  1. Open the AWS X-Ray console
  2. Find the service map (shows Orchid → MySQL → external services)
  3. Filter by time range or trace ID
  4. Drill into a specific trace to see the waterfall of operations

X-Ray trace IDs can be correlated with CloudWatch logs by searching for the trace ID string.
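
When searching, note that X-Ray trace IDs contain hyphens (format 1-, 8 hex digits, -, 24 hex digits), so a CloudWatch Logs filter term has to be wrapped in double quotes. The helper below is a small sketch that validates the ID and builds that term. The function name is hypothetical, and it assumes the application writes the trace ID into its log lines.

```python
import re

# X-Ray trace ID: version "1", 8 hex digits of epoch time, 24 hex digits of randomness.
_TRACE_ID = re.compile(r"1-[0-9a-f]{8}-[0-9a-f]{24}")


def trace_id_search_term(trace_id: str) -> str:
    """Return a quoted filter term for a trace ID, or raise ValueError if malformed."""
    if not _TRACE_ID.fullmatch(trace_id):
        raise ValueError(f"not an X-Ray trace ID: {trace_id!r}")
    # Quoting is needed because the ID contains non-alphanumeric characters.
    return f'"{trace_id}"'
```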

Error models — per-agency error tracking

The Errors, ErrAPI, and Err400 models in each agency’s database store deduplicated error records. These are queryable directly:

-- Recent errors in an agency's DB
SELECT message, count, last_seen_at, path
FROM err_api
ORDER BY last_seen_at DESC
LIMIT 20;

Content-based deduplication means each unique error message is stored only once, with an incrementing count and an updated last_seen_at. Errors with a high count occur most often, so they are usually the first ones to investigate.
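
As an illustration of how content-based deduplication behaves, the sketch below keeps one row per unique message in an in-memory dict. It is not the models' actual code: keying on a SHA-256 of the message and the record_error name are assumptions, and only the column names (message, count, last_seen_at, path) come from the query above.

```python
import hashlib
from datetime import datetime, timezone


def record_error(table: dict, message: str, path: str, now=None) -> dict:
    """Insert a new error row, or bump the existing row for the same message."""
    now = now or datetime.now(timezone.utc)
    # Assumption: the dedup key is a hash of the message content.
    key = hashlib.sha256(message.encode("utf-8")).hexdigest()
    row = table.get(key)
    if row is None:
        row = {"message": message, "path": path, "count": 0, "last_seen_at": now}
        table[key] = row
    row["count"] += 1
    row["last_seen_at"] = now
    return row
```

Recording the same message three times leaves a single row with count = 3, which is why sorting by count surfaces the most frequent failures.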

Watching for slow pages

The query_count and query_total_ms fields in every request log tell you immediately whether a slow page is a Python problem or a database problem:

  • High duration_ms, low query_total_ms → Python is slow (computation, external API call)
  • High query_count (40+) → likely an N+1 query problem; add eager loading
  • High query_total_ms with normal query_count → individual queries are slow — check MySQL indexes
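
The rules above can be sketched as a small triage function over a parsed log record. Only the 40-query threshold comes from this page. The slow_ms and db_share defaults, and the function name, are assumptions to tune for your own traffic.

```python
def diagnose_slow_request(record, slow_ms=1000, n_plus_one_queries=40, db_share=0.5):
    """Return a coarse label for where a slow request spent its time."""
    duration = record["duration_ms"]
    if duration < slow_ms:
        return "not slow"
    # Many queries in one request usually means a query per row (N+1).
    if record["query_count"] >= n_plus_one_queries:
        return "likely N+1 queries"
    # Few queries, but they account for most of the request time.
    if record["query_total_ms"] >= db_share * duration:
        return "slow individual queries"
    # Little time in the database: the cost is in Python or an external call.
    return "slow Python or external call"
```

The N+1 check runs before the database-share check because an N+1 pattern also inflates query_total_ms, and eager loading is the fix to try first.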

Checking email sync health

For email sync issues:

-- Check sync status for all admins in an agency
SELECT es.admin_id, es.sync_type, es.status, es.last_synced_at,
       dr.reason AS disconnect_reason
FROM email_sync es
LEFT JOIN email_sync_disconnect_reason dr ON es.disconnect_reason_id = dr.id
ORDER BY es.last_synced_at DESC;

A row with status = 'error' or status = 'disconnected' has stopped syncing; its disconnect_reason, when present, says why.

Checking cron job health

-- Recent cron job outcomes (in cron metrics table)
SELECT job_name, db_name, status, duration_ms, error_message, run_at
FROM cron_metrics
ORDER BY run_at DESC
LIMIT 50;

Alerting

No automated alerts are currently configured. When adding them, start with the highest-value signals:

  • CloudWatch Alarms can trigger on metric thresholds (error rate, latency p95)
  • Consider adding alarms for: 5xx error rate > 1%, request latency p95 > 5s, cron job failure rate > 10%
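
Before wiring these into CloudWatch Alarms, the suggested thresholds can be sanity-checked against a window of parsed request logs and cron_metrics rows. The sketch below is one way to do that. The function names are hypothetical, and treating any cron status other than "success" as a failure is an assumption about the status values.

```python
def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def breached_alerts(request_logs, cron_rows):
    """Return the names of the suggested thresholds that this window exceeds."""
    breached = []
    if request_logs:
        errors = sum(1 for r in request_logs if r["status"] >= 500)
        if errors / len(request_logs) > 0.01:  # 5xx error rate > 1%
            breached.append("5xx_error_rate")
        if p95([r["duration_ms"] for r in request_logs]) > 5000:  # p95 > 5s
            breached.append("latency_p95")
    if cron_rows:
        # Assumption: "success" is the only non-failure status value.
        failed = sum(1 for r in cron_rows if r["status"] != "success")
        if failed / len(cron_rows) > 0.10:  # cron failure rate > 10%
            breached.append("cron_failure_rate")
    return breached
```

Running this over a few days of historical data gives a rough idea of how often each alarm would have fired.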

Emergency access to logs

If the CloudWatch console is unavailable, logs are also written to the EB instance’s system log:

# SSH into the EB instance (requires AWS key and correct security group)
eb ssh [environment-name]
# View the last 100 lines of the app log
tail -n 100 /var/log/app-1.log