# 11. Application Metrics & Dashboards

meta:
  id: web-production-11
  feature: web-production
  priority: P2
  depends_on: []
  tags: [observability, metrics, production]

objective:
  - Collect and visualize application metrics for performance monitoring and capacity planning

deliverables:
  - Prometheus metrics endpoint
  - Custom business metrics
  - Grafana or Datadog dashboards
  - Alerting on metric thresholds

steps:
  1. Add metrics collection:
     - Install prom-client for Node.js metrics
     - Create web/src/server/lib/metrics.ts
     - Expose a /metrics endpoint for Prometheus scraping
  2. Collect standard metrics:
     - HTTP request duration (histogram)
     - HTTP request count (counter, labeled by status code and endpoint)
     - Active connections (gauge)
     - Memory usage (gauge)
     - Event loop lag (gauge)
  3. Collect business metrics:
     - Signup rate (counter)
     - Login success/failure rate (counter)
     - Subscription conversions (counter)
     - DarkWatch scan completions (counter)
     - Alert generation rate (counter)
     - Average threat score (gauge)
  4. Set up dashboards in Grafana or Datadog:
     - Request latency percentiles (p50, p95, p99)
     - Error rate over time
     - Business funnel (landing → signup → subscribe)
     - Infrastructure health (CPU, memory, DB connections)
  5. Configure alerts:
     - p99 latency > 500ms for 5 minutes
     - Error rate > 1% for 2 minutes
     - Memory usage > 80% for 10 minutes
     - DB connection pool utilization > 90% for 5 minutes

tests:
  - Unit: Verify that metrics increment correctly
  - Integration: Verify that /metrics returns valid Prometheus exposition format
  - Dashboard: Confirm that all panels show data

acceptance_criteria:
  - /metrics endpoint serves valid Prometheus exposition format
  - Request duration histogram uses 0.1, 0.5, 1, 2, and 5 second buckets
  - Business metrics are visible in the dashboard
  - Alert fires when p99 latency exceeds 500ms for 5 minutes
  - Dashboard refreshes every 10 seconds with live data
  - Metrics are retained for 30 days

validation:
  - `curl /metrics` → valid Prometheus output
  - Grafana dashboard shows the request latency graph
  - Trigger a slow endpoint → alert fires once the 5-minute threshold window has elapsed

notes:
  - Prometheus + Grafana is open source and cost-effective
  - Datadog is easier to set up but more expensive
  - Consider Vercel Analytics if deployed on Vercel
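The histogram acceptance criterion is easier to check once the exposition format is concrete. prom-client generates this text on its own. The dependency-free TypeScript sketch below only reproduces how a histogram with the 0.1, 0.5, 1, 2 and 5 second buckets is rendered: cumulative `_bucket` series, a final `+Inf` bucket, then `_sum` and `_count`. The metric name `http_request_duration_seconds`, the `route` label and the `renderHistogram` helper are illustrative assumptions and do not come from the spec.

```typescript
// Bucket upper bounds (seconds) taken from the acceptance criteria.
const DEFAULT_BUCKETS: number[] = [0.1, 0.5, 1, 2, 5];

// Hypothetical helper: renders one histogram series in Prometheus text format.
// prom-client emits equivalent output itself. Label order inside {} carries no
// meaning, and label values are assumed not to need escaping.
function renderHistogram(
  name: string,
  help: string,
  labels: Record<string, string>,
  observations: number[],
  buckets: number[] = DEFAULT_BUCKETS,
): string {
  const base = Object.keys(labels).map((k) => `${k}="${labels[k]}"`);
  // Builds the {...} label block, appending extra pairs such as le.
  const labelBlock = (extra: string[] = []): string => {
    const all = [...base, ...extra];
    return all.length > 0 ? `{${all.join(",")}}` : "";
  };

  const lines: string[] = [`# HELP ${name} ${help}`, `# TYPE ${name} histogram`];
  for (const le of buckets) {
    // Cumulative: every observation <= le counts toward this bucket.
    const count = observations.filter((v) => v <= le).length;
    lines.push(`${name}_bucket${labelBlock([`le="${le}"`])} ${count}`);
  }
  // The +Inf bucket always equals the total number of observations.
  lines.push(`${name}_bucket${labelBlock(['le="+Inf"'])} ${observations.length}`);
  const sum = observations.reduce((acc, v) => acc + v, 0);
  lines.push(`${name}_sum${labelBlock()} ${sum}`);
  lines.push(`${name}_count${labelBlock()} ${observations.length}`);
  return lines.join("\n") + "\n";
}

// Five requests on one route, durations in seconds.
console.log(
  renderHistogram(
    "http_request_duration_seconds",
    "HTTP request duration in seconds",
    { route: "/api/scan" },
    [0.0625, 0.25, 0.5, 0.75, 3],
  ),
);
```

An observation of exactly 0.5 s is counted in the `le="0.5"` bucket, because bucket bounds are inclusive upper limits.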
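The p99 > 500ms alert is evaluated from histogram buckets rather than raw request times, so its precision depends on the bucket layout. PromQL's `histogram_quantile()` finds the bucket that contains the requested rank and interpolates linearly inside it. The sketch below mirrors that logic in TypeScript for one series of cumulative bucket counts. The `Bucket` shape and `estimateQuantile` name are assumptions for illustration. Edge cases that Prometheus handles, such as NaN values and non-monotonic buckets, are skipped.

```typescript
// One cumulative bucket: `count` observations were <= `le` seconds.
interface Bucket {
  le: number; // upper bound; Infinity stands for the +Inf bucket
  count: number;
}

// Hypothetical helper approximating histogram_quantile() for a single series.
// Assumes buckets are sorted by le, cumulative, and end with an Infinity bucket.
function estimateQuantile(q: number, buckets: Bucket[]): number {
  if (q <= 0 || q > 1 || buckets.length < 2) return NaN;
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  for (let i = 0; i < buckets.length; i++) {
    if (buckets[i].count < rank) continue;
    // Rank lands in +Inf: fall back to the highest finite upper bound.
    if (buckets[i].le === Infinity) return buckets[i - 1].le;
    const lower = i === 0 ? 0 : buckets[i - 1].le;
    const countBefore = i === 0 ? 0 : buckets[i - 1].count;
    const inBucket = buckets[i].count - countBefore;
    // Assume observations are spread evenly inside the bucket.
    return lower + (buckets[i].le - lower) * ((rank - countBefore) / inBucket);
  }
  return NaN;
}

// Hypothetical cumulative counts for 1,000 requests in one 5-minute window.
const sample: Bucket[] = [
  { le: 0.1, count: 900 },
  { le: 0.5, count: 980 },
  { le: 1, count: 995 },
  { le: 2, count: 999 },
  { le: 5, count: 1000 },
  { le: Infinity, count: 1000 },
];
// Rank 990 falls in the (0.5, 1] bucket: 0.5 + 0.5 * (10 / 15), about 0.833 s.
const p99 = estimateQuantile(0.99, sample);
console.log(p99 > 0.5 ? "p99 above 500ms" : "p99 within budget");
```

0.5 s is itself one of the bucket bounds, so the estimate exceeds 500ms exactly when fewer than 99% of the window's requests landed in the `le="0.5"` bucket. Interpolation therefore does not blur the alert threshold. In Prometheus the rule would typically use `histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5` with `for: 5m`. That expression also assumes the metric name above.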
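For the integration test that checks /metrics, a light structural check catches the most common histogram mistakes without a full exposition-format parser. It flags missing buckets, non-cumulative counts, and a `+Inf` bucket that disagrees with `_count`. The sketch below is one possible test helper. The `checkHistogram` name and the expected `le` strings are assumptions. The helper also assumes label values contain no commas and that buckets appear in ascending order, as prom-client typically emits them. The format itself does not guarantee that order.

```typescript
// Bucket bounds expected by the acceptance criteria, as rendered in `le`.
const EXPECTED_LE = ["0.1", "0.5", "1", "2", "5", "+Inf"];

// Matches `metric{labels} value` or `metric value`; timestamps are not expected.
const SAMPLE_LINE = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$/;

// Hypothetical test helper: returns human-readable problems (empty means OK).
function checkHistogram(text: string, name: string): string[] {
  const buckets = new Map<string, Array<[string, number]>>();
  const counts = new Map<string, number>();

  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("#")) continue; // HELP/TYPE comments
    const m = SAMPLE_LINE.exec(line);
    if (!m) continue;
    const metric = m[1];
    const labels = m[2] ? m[2].split(",") : [];
    const value = Number(m[3]);
    // Series key = every label except le, so each label set is checked alone.
    const key = labels.filter((l) => !l.startsWith("le=")).sort().join(",");
    if (metric === `${name}_bucket`) {
      const le = labels.find((l) => l.startsWith("le="))?.slice(4, -1) ?? "";
      const list = buckets.get(key) ?? [];
      list.push([le, value]);
      buckets.set(key, list);
    } else if (metric === `${name}_count`) {
      counts.set(key, value);
    }
  }

  const problems: string[] = [];
  if (buckets.size === 0) problems.push(`no ${name}_bucket samples found`);
  for (const [key, list] of buckets) {
    const series = `${name}{${key}}`;
    if (list.map(([le]) => le).join(",") !== EXPECTED_LE.join(",")) {
      problems.push(`${series}: unexpected buckets`);
    }
    for (let i = 1; i < list.length; i++) {
      if (list[i][1] < list[i - 1][1]) problems.push(`${series}: buckets not cumulative`);
    }
    if (list[list.length - 1][1] !== counts.get(key)) {
      problems.push(`${series}: +Inf bucket does not match _count`);
    }
  }
  return problems;
}
```

In the integration test, the input would be the body returned by the /metrics endpoint, and the test asserts that `checkHistogram(body, "http_request_duration_seconds")` returns an empty array.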
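The error-rate alert (> 1% for 2 minutes) is a ratio of two counters. Counters reset to zero when the Node.js process restarts, so the ratio must be computed from increases rather than raw values. The TypeScript sketch below shows the reset handling that Prometheus' `rate()` and `increase()` apply: any drop in value is treated as a restart from 0. Prometheus additionally extrapolates to the edges of the range window, and this sketch omits that step. The `counterIncrease` and `errorRatio` helpers and the sample values are hypothetical.

```typescript
// Hypothetical helper: total increase of a counter across ordered scrape
// samples, treating any decrease as a counter reset.
function counterIncrease(samples: number[]): number {
  let total = 0;
  for (let i = 1; i < samples.length; i++) {
    const delta = samples[i] - samples[i - 1];
    // After a reset the counter restarted from 0, so its new value is the increase.
    total += delta >= 0 ? delta : samples[i];
  }
  return total;
}

// Error ratio over a window: increase of 5xx responses / increase of all requests.
function errorRatio(errors: number[], requests: number[]): number {
  const denom = counterIncrease(requests);
  return denom === 0 ? 0 : counterIncrease(errors) / denom;
}

// Four scrapes of a request counter and its 5xx subset; the process
// restarted between the 2nd and 3rd scrape.
const requestSamples = [100, 200, 50, 150];
const errorSamples = [0, 2, 1, 3];
// Increases: requests 100 + 50 + 100 = 250, errors 2 + 1 + 2 = 5, ratio 0.02.
const observedRatio = errorRatio(errorSamples, requestSamples);
console.log(observedRatio > 0.01 ? "error rate above 1%" : "error rate OK");
```

In PromQL the same check is commonly written as `sum(rate(http_requests_total{status=~"5.."}[2m])) / sum(rate(http_requests_total[2m])) > 0.01` with `for: 2m`. The `http_requests_total` name and `status` label are assumptions about how the request counter is labeled.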