get to prod tasks
67
tasks/web-production/08-health-checks-shutdown.md
Normal file
@@ -0,0 +1,67 @@
# 08. Graceful Shutdown & Health Check Endpoints

meta:
  id: web-production-08
  feature: web-production
  priority: P1
  depends_on: []
  tags: [reliability, infrastructure, production]

objective:
- Implement health checks and graceful shutdown to ensure zero-downtime deployments and reliable operations

deliverables:
- Health check endpoint (/health)
- Readiness probe endpoint (/ready)
- Graceful shutdown handler
- Dependency health checks (DB, Redis, Stripe)

steps:
1. Create health check endpoints:
   - GET /health → basic liveness (HTTP 200 if process running)
   - GET /ready → readiness check (DB, Redis, Stripe connectivity)
   - GET /health/deep → comprehensive check with dependency status
2. Implement dependency health checks:
   - Database: simple SELECT 1 query
   - Redis: PING command
   - Stripe: retrieve account info (cached)
   - WebSocket server: connection count
3. Add graceful shutdown:
   - Handle SIGTERM/SIGINT signals
   - Stop accepting new connections
   - Wait for active requests to complete (30s timeout)
   - Close database connections
   - Close Redis connections
   - Exit process cleanly
4. Add startup probe:
   - Delay readiness until all services initialized
   - Retry logic for DB connection on startup
5. Add metrics endpoint (/metrics) for Prometheus:
   - Request count and duration
   - Error rates
   - Active connections
   - Dependency health status
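
The dependency checks in step 2 could be combined into one readiness report along the following lines. This is a minimal sketch, not the project's implementation: `Check`, `withTimeout`, `checkReadiness`, and the 2-second default timeout are hypothetical names and values chosen for illustration. The actual checks (SELECT 1, PING, the cached Stripe lookup) would be passed in as functions that reject on failure.

```typescript
// Hypothetical shape: a check resolves when the dependency is healthy
// and rejects (or throws) otherwise.
type Check = () => Promise<void>;

interface ReadinessReport {
  status: "ok" | "error";
  httpStatus: 200 | 503;
  dependencies: Record<string, string>;
}

// Reject if the check has not settled within `ms` milliseconds,
// so one hung dependency cannot stall the probe.
function withTimeout(check: Check, ms: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    check().then(
      () => { clearTimeout(timer); resolve(); },
      (err: unknown) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Run all checks in parallel and fold the results into one report.
// Each dependency maps to "ok" or to the error message that made it fail.
async function checkReadiness(
  checks: Record<string, Check>,
  timeoutMs = 2000, // assumed default, not taken from the task file
): Promise<ReadinessReport> {
  const pairs = await Promise.all(
    Object.entries(checks).map(async ([name, check]): Promise<[string, string]> => {
      try {
        await withTimeout(check, timeoutMs);
        return [name, "ok"];
      } catch (err) {
        return [name, err instanceof Error ? err.message : String(err)];
      }
    }),
  );
  const dependencies: Record<string, string> = {};
  let healthy = true;
  for (const [name, state] of pairs) {
    dependencies[name] = state;
    if (state !== "ok") healthy = false;
  }
  return { status: healthy ? "ok" : "error", httpStatus: healthy ? 200 : 503, dependencies };
}
```

Running the checks in parallel keeps the probe latency close to the slowest single dependency rather than the sum of all of them.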

tests:
- Unit: Test health check responses
- Integration: Test graceful shutdown with active requests
- Load: Verify zero failed requests during rolling restart

acceptance_criteria:
- /health returns 200 within 100ms
- /ready returns 200 only when all dependencies healthy
- /ready returns 503 with detailed error when dependency down
- Graceful shutdown completes within 30 seconds
- Zero failed requests during rolling deployment
- Prometheus metrics endpoint available
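
For the metrics criterion, a real deployment would most likely use an existing Prometheus client library. The sketch below only illustrates what a /metrics response body looks like in the Prometheus text exposition format. The `Counter` class, its methods, and the metric name in the usage note are hypothetical, not an existing API.

```typescript
type Labels = Record<string, string>;

// Escape backslash, double quote, and newline, as the text format
// requires inside label values.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render labels in sorted key order so the same label set always
// produces the same series key.
function formatLabels(labels: Labels): string {
  const pairs = Object.keys(labels)
    .sort()
    .map((key) => `${key}="${escapeLabelValue(labels[key] ?? "")}"`);
  return pairs.length > 0 ? `{${pairs.join(",")}}` : "";
}

// Hypothetical minimal counter: one running total per distinct label set.
class Counter {
  private readonly name: string;
  private readonly help: string;
  private readonly values = new Map<string, number>();

  constructor(name: string, help: string) {
    this.name = name;
    this.help = help;
  }

  inc(labels: Labels = {}, amount = 1): void {
    const key = formatLabels(labels);
    this.values.set(key, (this.values.get(key) ?? 0) + amount);
  }

  // Emit the HELP and TYPE comment lines followed by one sample line per label set.
  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    this.values.forEach((value, key) => {
      lines.push(`${this.name}${key} ${value}`);
    });
    return lines.join("\n") + "\n";
  }
}
```

A request handler would call something like `requests.inc({ method: "GET", status: "200" })` per request, and the /metrics route would return `requests.render()` with the content type `text/plain; version=0.0.4`.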

validation:
- `curl /health` → {"status":"ok"}
- `curl /ready` → {"status":"ok","dependencies":{"db":"ok","redis":"ok","stripe":"ok"}}
- Stop container with active requests → all complete before exit
- Block DB port → /ready returns 503
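
The response shapes in the validation list could be produced by a small routing function like the one below. It is a sketch using Node's built-in `node:http` module rather than Nitro/SolidStart routes. `probeResponse`, `createProbeServer`, and the `"error"` status string used for the 503 body are assumptions, since the task file only specifies the healthy responses.

```typescript
import { createServer } from "node:http";

// Maps each dependency name to "ok" or to an error description.
type DependencyStates = Record<string, string>;

interface ProbeResponse {
  code: number;
  body: Record<string, unknown>;
}

// Pure function: decide status code and JSON body for a probe path.
function probeResponse(path: string, deps: DependencyStates): ProbeResponse {
  if (path === "/health") {
    // Liveness never touches dependencies, so it stays fast.
    return { code: 200, body: { status: "ok" } };
  }
  if (path === "/ready") {
    const healthy = Object.values(deps).every((state) => state === "ok");
    return {
      code: healthy ? 200 : 503,
      body: { status: healthy ? "ok" : "error", dependencies: deps },
    };
  }
  return { code: 404, body: { status: "not_found" } };
}

// Wire the function into an HTTP server. `getDeps` is a hypothetical
// callback returning the most recent dependency check results.
function createProbeServer(getDeps: () => DependencyStates) {
  return createServer((req, res) => {
    const path = new URL(req.url ?? "/", "http://localhost").pathname;
    const { code, body } = probeResponse(path, getDeps());
    res.writeHead(code, { "Content-Type": "application/json" });
    res.end(JSON.stringify(body));
  });
}
```

Keeping the decision logic in a pure function makes the unit test for health check responses straightforward, because no server has to be started.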

notes:
- Nitro/SolidStart may need custom server plugin for signal handling
- Use node-graceful-shutdown or similar library
- Kubernetes/Docker health checks rely on these endpoints
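
If a library is not used, the shutdown sequence from step 3 might look roughly like this with a plain `node:http` server. This is a sketch under assumptions: `gracefulShutdown`, `installSignalHandlers`, `createApp`, and the `Closer` type are hypothetical names, and how a Nitro/SolidStart app exposes its underlying server is not covered here. `closeIdleConnections` and `closeAllConnections` require Node 18.2 or later.

```typescript
import { createServer, Server } from "node:http";

// Hypothetical cleanup hook, e.g. closing a DB pool or a Redis client.
type Closer = () => Promise<void>;

// Stop accepting connections, wait for in-flight requests (bounded by
// timeoutMs), then run the cleanup hooks in order.
async function gracefulShutdown(
  server: Server,
  closers: Closer[],
  timeoutMs = 30_000,
): Promise<"drained" | "timed_out"> {
  const outcome = await new Promise<"drained" | "timed_out">((resolve) => {
    const timer = setTimeout(() => resolve("timed_out"), timeoutMs);
    // close() refuses new connections; the callback fires once existing ones end.
    server.close(() => {
      clearTimeout(timer);
      resolve("drained");
    });
    // Drop keep-alive sockets that are not currently serving a request.
    server.closeIdleConnections();
  });
  if (outcome === "timed_out") {
    // Deadline passed: force remaining sockets closed.
    server.closeAllConnections();
  }
  for (const close of closers) {
    await close();
  }
  return outcome;
}

// Register the handler once per signal and exit with a code that
// reflects whether the drain finished in time.
function installSignalHandlers(server: Server, closers: Closer[]): void {
  for (const signal of ["SIGTERM", "SIGINT"] as const) {
    process.once(signal, () => {
      gracefulShutdown(server, closers).then(
        (outcome) => process.exit(outcome === "drained" ? 0 : 1),
        () => process.exit(1),
      );
    });
  }
}

// Tiny stand-in application so the sketch is self-contained.
function createApp(): Server {
  return createServer((_req, res) => {
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end('{"status":"ok"}');
  });
}
```

The cleanup hooks run only after the server has drained or the deadline has passed, so in-flight requests can still use the database and Redis while they finish.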
|
||||