get to prod tasks

2026-05-26 16:06:34 -04:00
parent 04e839640f
commit 5214412fff
105 changed files with 7447 additions and 38 deletions
--- a/tasks/web-production/08-health-checks-shutdown.md
+++ b/tasks/web-production/08-health-checks-shutdown.md
@@ -0,0 +1,67 @@
+# 08. Graceful Shutdown & Health Check Endpoints
+
+meta:
+  id: web-production-08
+  feature: web-production
+  priority: P1
+  depends_on: []
+  tags: [reliability, infrastructure, production]
+
+objective:
+- Implement health checks and graceful shutdown to ensure zero-downtime deployments and reliable operations
+
+deliverables:
+- Health check endpoint (/health)
+- Readiness probe endpoint (/ready)
+- Graceful shutdown handler
+- Dependency health checks (DB, Redis, Stripe)
+
+steps:
+1. Create health check endpoints:
+   - GET /health → basic liveness (HTTP 200 if process running)
+   - GET /ready → readiness check (DB, Redis, Stripe connectivity)
+   - GET /health/deep → comprehensive check with dependency status
+2. Implement dependency health checks:
+   - Database: simple SELECT 1 query
+   - Redis: PING command
+   - Stripe: retrieve account info (cached)
+   - WebSocket server: report active connection count
+3. Add graceful shutdown:
+   - Handle SIGTERM/SIGINT signals
+   - Stop accepting new connections
+   - Wait for active requests to complete (30s timeout)
+   - Close database connections
+   - Close Redis connections
+   - Exit process cleanly
+4. Add startup probe:
+   - Delay readiness until all services are initialized
+   - Retry logic for DB connection on startup
+5. Add metrics endpoint (/metrics) for Prometheus:
+   - Request count and duration
+   - Error rates
+   - Active connections
+   - Dependency health status
+
+tests:
+- Unit: Test health check responses
+- Integration: Test graceful shutdown with active requests
+- Load: Verify zero failed requests during rolling restart
+
+acceptance_criteria:
+- /health returns 200 within 100ms
+- /ready returns 200 only when all dependencies are healthy
+- /ready returns 503 with a detailed error when any dependency is down
+- Graceful shutdown completes within 30 seconds
+- Zero failed requests during rolling deployment
+- Prometheus metrics endpoint available
+
+validation:
+- `curl http://<host>/health` → {"status":"ok"}
+- `curl http://<host>/ready` → {"status":"ok","dependencies":{"db":"ok","redis":"ok","stripe":"ok"}}
+- Stop container with active requests → all complete before exit
+- Block DB port → /ready returns 503
+
+notes:
+- Nitro/SolidStart may need a custom server plugin for signal handling
+- Use node-graceful-shutdown or a similar library
+- Kubernetes/Docker health checks rely on these endpoints
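
The readiness behavior in steps 1 and 2 and in the acceptance criteria of the task file (200 only when every dependency is healthy, 503 with per-dependency detail otherwise) could be sketched roughly as follows. This is a minimal illustration and not code from the commit. The names `CheckFn`, `ReadinessResult`, `withTimeout`, and `runReadiness`, the 2-second default timeout, and the `"error"` status string are all assumptions. Each dependency check is assumed to be an async function that throws on failure.

```typescript
// Hypothetical sketch: none of these names come from the task file.
// A dependency check resolves when healthy and throws/rejects when not.
type CheckFn = () => Promise<void>;

interface ReadinessResult {
  httpStatus: 200 | 503;
  body: {
    status: "ok" | "error";
    dependencies: Record<string, string>;
  };
}

// Reject if the check has not settled within `ms` milliseconds, so one
// hung dependency cannot stall the whole readiness response.
function withTimeout(check: CheckFn, ms: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${ms}ms`)),
      ms,
    );
    Promise.resolve()
      .then(check)
      .then(
        () => {
          clearTimeout(timer);
          resolve();
        },
        (err: unknown) => {
          clearTimeout(timer);
          reject(err);
        },
      );
  });
}

// Run all checks concurrently and fold them into one response.
// Assumption: a 2000 ms per-check timeout; tune to the probe period.
async function runReadiness(
  checks: Record<string, CheckFn>,
  timeoutMs = 2000,
): Promise<ReadinessResult> {
  const results = await Promise.all(
    Object.entries(checks).map(
      async ([name, check]): Promise<[string, string]> => {
        try {
          await withTimeout(check, timeoutMs);
          return [name, "ok"];
        } catch (err) {
          // Surface the failure reason so a 503 body is actionable.
          return [name, err instanceof Error ? err.message : String(err)];
        }
      },
    ),
  );

  const dependencies: Record<string, string> = {};
  for (const [name, state] of results) {
    dependencies[name] = state;
  }
  const healthy = results.every(([, state]) => state === "ok");

  return {
    httpStatus: healthy ? 200 : 503,
    body: { status: healthy ? "ok" : "error", dependencies },
  };
}
```

A `/ready` handler would pass in the real checks (for example a `SELECT 1` query and a Redis `PING`, as listed in step 2) and send `httpStatus` and `body` as the response. Running the checks concurrently with a per-check timeout bounds the response time by the slowest allowed check rather than the sum of all checks.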