Kordant/tasks/web-production/08-health-checks-shutdown.md
2026-05-26 16:06:34 -04:00


# 08. Graceful Shutdown & Health Check Endpoints
meta:
id: web-production-08
feature: web-production
priority: P1
depends_on: []
tags: [reliability, infrastructure, production]
objective:
- Implement health checks and graceful shutdown to ensure zero-downtime deployments and reliable operations
deliverables:
- Health check endpoint (/health)
- Readiness probe endpoint (/ready)
- Graceful shutdown handler
- Dependency health checks (DB, Redis, Stripe)
steps:
1. Create health check endpoints:
- GET /health → basic liveness (HTTP 200 if process running)
- GET /ready → readiness check (DB, Redis, Stripe connectivity)
- GET /health/deep → comprehensive check with dependency status
2. Implement dependency health checks:
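- Database: run a lightweight `SELECT 1` query
- Redis: send a `PING` command
- Stripe: retrieve account info, caching the result so probes do not call the Stripe API on every request
- WebSocket server: report the current connection count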
3. Add graceful shutdown:
- Handle SIGTERM/SIGINT signals
- Stop accepting new connections
- Wait for active requests to complete (30s timeout)
- Close database connections
- Close Redis connections
- Exit process cleanly
4. Add startup probe:
- Delay readiness until all services are initialized
- Add retry logic for the DB connection on startup
5. Add a Prometheus metrics endpoint (/metrics):
- Request count and duration
- Error rates
- Active connections
- Dependency health status
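A minimal TypeScript sketch of the readiness check from steps 1 and 2, assuming each dependency check is an async function that rejects on failure (the actual `SELECT 1`, `PING`, and Stripe calls are not shown). `checkReadiness`, `withTimeout`, the error-string format, and the 2-second default per-check timeout are hypothetical; the response body mirrors the shape in the validation section.

```typescript
// Hypothetical readiness aggregator for /ready and /health/deep.
// Each check resolves when the dependency is healthy and rejects otherwise.
type CheckFn = () => Promise<void>;

interface ReadinessResult {
  statusCode: 200 | 503;
  body: { status: "ok" | "error"; dependencies: Record<string, string> };
}

// Reject if `p` has not settled within `ms`, so one hung dependency
// cannot stall the probe past the orchestrator's own probe timeout.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

export async function checkReadiness(
  checks: Record<string, CheckFn>,
  timeoutMs = 2000, // assumed default, not from the task spec
): Promise<ReadinessResult> {
  // Run every check concurrently and record a per-dependency outcome,
  // so one failure does not hide the status of the others.
  const results = await Promise.all(
    Object.entries(checks).map(async ([name, check]) => {
      try {
        await withTimeout(check(), timeoutMs);
        return [name, "ok"] as const;
      } catch (err) {
        const reason = err instanceof Error ? err.message : String(err);
        return [name, `error: ${reason}`] as const;
      }
    }),
  );
  const healthy = results.every(([, status]) => status === "ok");
  return {
    statusCode: healthy ? 200 : 503,
    body: { status: healthy ? "ok" : "error", dependencies: Object.fromEntries(results) },
  };
}
```

The liveness route (`/health`) deliberately would not call this, so it stays well inside the 100ms target; `/ready` would send `statusCode` and `body` unchanged, which gives the 503-with-details behavior from the acceptance criteria.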
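For the cached Stripe check in step 2, one option is to wrap whatever function performs the account lookup in a small TTL cache so frequent probes reuse one upstream call. This is a sketch under assumptions: `cachedCheck`, the injectable `now` clock, and the 60-second TTL in the usage are illustrative, and the Stripe call itself is left as the `check` parameter.

```typescript
// Hypothetical TTL wrapper: the wrapped check hits the upstream at most
// once per `ttlMs`; callers in between share the same pending/settled promise.
type CheckFn = () => Promise<void>;

export function cachedCheck(
  check: CheckFn,
  ttlMs: number,
  now: () => number = Date.now, // injectable clock so the cache is testable
): CheckFn {
  let last: { at: number; result: Promise<void> } | undefined;
  return () => {
    const t = now();
    if (last === undefined || t - last.at >= ttlMs) {
      last = { at: t, result: check() };
    }
    return last.result;
  };
}
```

Because the promise itself is cached, a failed lookup is also remembered for the full TTL, so `/ready` may keep reporting Stripe as down for up to that long after recovery; a shorter TTL trades more API calls for faster recovery.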
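A minimal sketch of step 4, assuming startup work can be expressed as async steps (for example a hypothetical `connectDb()`). `retryWithBackoff`, `initialize`, the `startup.ready` flag, and the backoff defaults (5 retries, 200 ms doubling up to 5 s) are illustrative, not taken from the project.

```typescript
// Retry an async startup step with capped exponential backoff.
export async function retryWithBackoff<T>(
  attempt: () => Promise<T>,
  { retries = 5, baseDelayMs = 200, maxDelayMs = 5_000 } = {}, // assumed defaults
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i <= retries; i++) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err;
      if (i === retries) break; // out of attempts; rethrow below
      const delay = Math.min(maxDelayMs, baseDelayMs * 2 ** i);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}

// Hypothetical readiness gate: /ready should answer 503 while this is false.
export const startup = { ready: false };

// Run each startup step (DB connect, Redis connect, ...) with retries,
// and only mark the service ready once all of them have succeeded.
export async function initialize(steps: Array<() => Promise<void>>): Promise<void> {
  for (const step of steps) {
    await retryWithBackoff(step);
  }
  startup.ready = true;
}
```

If a step still fails after the last retry, `initialize` rejects and `startup.ready` stays false, so the instance never reports ready; letting the process exit at that point and leaving restarts to the orchestrator is one reasonable policy.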
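A dependency-free sketch of the Prometheus text exposition behind `/metrics` in step 5. In practice a client library such as prom-client is the usual choice; this hand-rolled `Metrics` class, its metric names, and the use of a summary (sum and count only) for request duration are assumptions for illustration. Error rate is not stored separately; it can be derived in PromQL from the `status` label.

```typescript
// Hypothetical, dependency-free Prometheus text-format exporter.
type Labels = Record<string, string>;

// Serialize labels in sorted order so the same label set always maps to the same key.
function labelKey(labels: Labels): string {
  const parts = Object.keys(labels)
    .sort()
    .map((k) => {
      // Label values must escape backslash, double quote and newline.
      const v = labels[k].replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
      return `${k}="${v}"`;
    });
  return parts.length > 0 ? `{${parts.join(",")}}` : "";
}

function bump(map: Map<string, number>, key: string, by: number): void {
  map.set(key, (map.get(key) ?? 0) + by);
}

export class Metrics {
  private readonly requests = new Map<string, number>();
  private readonly durationSum = new Map<string, number>();
  private readonly durationCount = new Map<string, number>();
  private readonly dependencyUp = new Map<string, number>();
  activeConnections = 0;

  // Call once per finished request; errors show up as 4xx/5xx `status` series.
  observeRequest(method: string, route: string, status: number, seconds: number): void {
    bump(this.requests, labelKey({ method, route, status: String(status) }), 1);
    const key = labelKey({ method, route });
    bump(this.durationSum, key, seconds);
    bump(this.durationCount, key, 1);
  }

  // Record the latest dependency check result as 1 (up) or 0 (down).
  setDependency(name: string, healthy: boolean): void {
    this.dependencyUp.set(labelKey({ dependency: name }), healthy ? 1 : 0);
  }

  render(): string {
    const out: string[] = [];
    const header = (name: string, help: string, type: string) =>
      out.push(`# HELP ${name} ${help}`, `# TYPE ${name} ${type}`);

    header("http_requests_total", "Total HTTP requests.", "counter");
    for (const [labels, n] of this.requests) out.push(`http_requests_total${labels} ${n}`);

    header("http_request_duration_seconds", "HTTP request duration in seconds.", "summary");
    for (const [labels, sum] of this.durationSum) {
      out.push(`http_request_duration_seconds_sum${labels} ${sum}`);
      out.push(`http_request_duration_seconds_count${labels} ${this.durationCount.get(labels) ?? 0}`);
    }

    header("http_active_connections", "Currently open HTTP connections.", "gauge");
    out.push(`http_active_connections ${this.activeConnections}`);

    header("dependency_up", "1 if the last dependency check passed, else 0.", "gauge");
    for (const [labels, v] of this.dependencyUp) out.push(`dependency_up${labels} ${v}`);

    // The text format is newline-terminated.
    return out.join("\n") + "\n";
  }
}
```

The `/metrics` handler would return `render()` with `Content-Type: text/plain; version=0.0.4`, and the readiness check could call `setDependency` with each result so dependency health is scraped alongside request metrics.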
tests:
- Unit: Test health check responses
- Integration: Test graceful shutdown with active requests
- Load: Verify zero failed requests during rolling restart
acceptance_criteria:
- /health returns 200 within 100ms
- /ready returns 200 only when all dependencies are healthy
- /ready returns 503 with per-dependency error details when any dependency is down
- Graceful shutdown completes within 30 seconds
- Zero failed requests during rolling deployment
- Prometheus metrics endpoint available
validation:
- `curl /health` → {"status":"ok"}
- `curl /ready` → {"status":"ok","dependencies":{"db":"ok","redis":"ok","stripe":"ok"}}
- Stop the container while requests are in flight → all requests complete before the process exits
- Block the DB port → /ready returns 503
notes:
- Nitro/SolidStart may need a custom server plugin for signal handling
- Use node-graceful-shutdown or a similar library
- Kubernetes/Docker health checks rely on these endpoints
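A minimal sketch of the step 3 shutdown sequence for a plain Node `http.Server`, assuming Node 18.2 or later (for `closeIdleConnections` and `closeAllConnections`). `gracefulShutdown`, `installSignalHandlers`, the `state.shuttingDown` flag, and the `closers` list (for example hypothetical functions that end the DB pool and quit the Redis client) are illustrative; under Nitro/SolidStart the same call would be wired through a server plugin, or replaced by a library such as node-graceful-shutdown.

```typescript
import * as http from "node:http";

// Releases a resource such as a DB pool or Redis client (hypothetical).
type Closer = () => Promise<void>;

// Flipped at the start of shutdown; a /ready handler can check it and
// return 503 so the load balancer stops sending new traffic here.
export const state = { shuttingDown: false };

export async function gracefulShutdown(
  server: http.Server,
  closers: Closer[],
  timeoutMs = 30_000, // the 30 s budget from the task spec
): Promise<void> {
  if (state.shuttingDown) return; // ignore a second SIGTERM/SIGINT
  state.shuttingDown = true;

  // Stop accepting new connections; the callback runs once every open
  // connection has ended (or immediately with an error if not listening).
  const drained = new Promise<void>((resolve) => server.close(() => resolve()));
  // Idle keep-alive sockets would otherwise keep close() waiting.
  server.closeIdleConnections();

  // Wait for in-flight requests, but force-close whatever is left at the deadline.
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<void>((resolve) => {
    timer = setTimeout(() => {
      server.closeAllConnections();
      resolve();
    }, timeoutMs);
  });
  await Promise.race([drained, deadline]);
  clearTimeout(timer);

  // Only now release DB/Redis clients, since request handlers may have been using them.
  await Promise.allSettled(closers.map((close) => close()));
}

// Hypothetical wiring for a plain Node server process.
export function installSignalHandlers(server: http.Server, closers: Closer[]): void {
  for (const signal of ["SIGTERM", "SIGINT"] as const) {
    process.once(signal, () => {
      void gracefulShutdown(server, closers).then(
        () => process.exit(0),
        () => process.exit(1),
      );
    });
  }
}
```

Setting the flag before closing the listener lets `/ready` start failing while in-flight requests finish. In Kubernetes, endpoint removal is asynchronous, so deployments sometimes add a short delay (for example a preStop sleep) before the server stops accepting connections; this sketch omits that.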