Kordant/tasks/web-production/08-health-checks-shutdown.md
2026-05-26 16:06:34 -04:00

08. Graceful Shutdown & Health Check Endpoints

meta:

  • id: web-production-08
  • feature: web-production
  • priority: P1
  • depends_on: []
  • tags: [reliability, infrastructure, production]

objective:

  • Implement health checks and graceful shutdown to ensure zero-downtime deployments and reliable operations

deliverables:

  • Health check endpoint (/health)
  • Readiness probe endpoint (/ready)
  • Graceful shutdown handler
  • Dependency health checks (DB, Redis, Stripe)

steps:

  1. Create health check endpoints:
    • GET /health → basic liveness (HTTP 200 if process running)
    • GET /ready → readiness check (DB, Redis, Stripe connectivity)
    • GET /health/deep → comprehensive check with dependency status
  2. Implement dependency health checks:
    • Database: simple SELECT 1 query
    • Redis: PING command
    • Stripe: retrieve account info (cache the result so probes do not call the Stripe API on every request)
    • WebSocket server: connection count
  3. Add graceful shutdown:
    • Handle SIGTERM/SIGINT signals
    • Stop accepting new connections
    • Wait for active requests to complete (30s timeout)
    • Close database connections
    • Close Redis connections
    • Exit process cleanly
  4. Add startup probe:
    • Delay readiness until all services initialized
    • Retry logic for DB connection on startup
  5. Add metrics endpoint (/metrics) for Prometheus:
    • Request count and duration
    • Error rates
    • Active connections
    • Dependency health status
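
Two pieces of the steps above, the retry logic for the DB connection on startup and the cached Stripe check, can be sketched as small generic helpers. This is a minimal sketch under stated assumptions, not the project's implementation: `retryWithBackoff` and `cachedCheck` are hypothetical names, and the attempt count, base delay, and TTL are placeholder values that the spec does not define.

```typescript
// Sketch only: helper names and default values are illustrative assumptions.

type Check = () => Promise<void>;

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry `connect` with exponential backoff (base, 2x base, 4x base, ...).
// Rethrows the last error once all attempts are used up.
async function retryWithBackoff<T>(
  connect: () => Promise<T>,
  attempts = 5,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      // No point sleeping after the final failed attempt.
      if (attempt < attempts - 1) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}

// Wrap a check so a successful result is reused for `ttlMs`.
// Failures are not cached, so a recovering dependency is noticed on the next probe.
function cachedCheck(
  check: Check,
  ttlMs: number,
  now: () => number = Date.now,
): Check {
  let lastOkAt: number | undefined;
  return async () => {
    if (lastOkAt !== undefined && now() - lastOkAt < ttlMs) return;
    await check(); // throws on failure and leaves the cache untouched
    lastOkAt = now();
  };
}
```

In this sketch the startup code would hold readiness back until `retryWithBackoff` resolves, and the Stripe check passed to the readiness handler would be the `cachedCheck` wrapper rather than the raw API call. Caching only successes is one possible choice; caching failures for a short period would reduce load on Stripe during an outage at the cost of slower recovery.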

tests:

  • Unit: Test health check responses
  • Integration: Test graceful shutdown with active requests
  • Load: Verify zero failed requests during rolling restart

acceptance_criteria:

  • /health returns 200 within 100ms
  • /ready returns 200 only when all dependencies healthy
  • /ready returns 503 with detailed error when dependency down
  • Graceful shutdown completes within 30 seconds
  • Zero failed requests during rolling deployment
  • Prometheus metrics endpoint available
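
The two /ready criteria above (200 only when every dependency is healthy, 503 with a per-dependency error otherwise) could be met with an aggregation step along these lines. This is a minimal sketch rather than the project's code: `DependencyCheck`, `withTimeout`, and `runReadiness` are hypothetical names, the 2-second per-check timeout is an assumed default, and the `"error"` status value for the failing case is an assumption, since the spec only shows the healthy payload. The real checks (SELECT 1, PING, the Stripe account lookup) would be supplied by the caller.

```typescript
// Sketch only: type and function names are illustrative assumptions.

type DependencyCheck = () => Promise<void>; // resolves when healthy, rejects otherwise

interface ReadinessReport {
  httpStatus: 200 | 503;
  body: { status: "ok" | "error"; dependencies: Record<string, string> };
}

// Reject if the check has not settled within `ms` milliseconds,
// so one hung dependency cannot stall the whole probe.
function withTimeout(check: DependencyCheck, ms: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${ms}ms`)),
      ms,
    );
    Promise.resolve()
      .then(check)
      .then(
        () => { clearTimeout(timer); resolve(); },
        (err) => { clearTimeout(timer); reject(err); },
      );
  });
}

// Run every check in parallel and map the combined outcome to 200 or 503.
async function runReadiness(
  checks: Record<string, DependencyCheck>,
  timeoutMs = 2000,
): Promise<ReadinessReport> {
  const results = await Promise.all(
    Object.entries(checks).map(
      async ([name, check]): Promise<[string, string]> => {
        try {
          await withTimeout(check, timeoutMs);
          return [name, "ok"];
        } catch (err) {
          return [name, err instanceof Error ? err.message : String(err)];
        }
      },
    ),
  );

  const dependencies: Record<string, string> = {};
  let healthy = true;
  for (const [name, status] of results) {
    dependencies[name] = status;
    if (status !== "ok") healthy = false;
  }

  return {
    httpStatus: healthy ? 200 : 503,
    body: { status: healthy ? "ok" : "error", dependencies },
  };
}
```

A route handler would then set the response status to `report.httpStatus` and send `report.body` as JSON. Running the checks in parallel keeps the probe latency close to that of the slowest dependency instead of the sum of all of them.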

validation:

  • curl /health → {"status":"ok"}
  • curl /ready → {"status":"ok","dependencies":{"db":"ok","redis":"ok","stripe":"ok"}}
  • Stop container with active requests → all complete before exit
  • Block DB port → /ready returns 503

notes:

  • Nitro/SolidStart may need a custom server plugin for signal handling
  • Use node-graceful-shutdown or a similar library
  • Kubernetes/Docker health checks rely on these endpoints
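
Leaving the Nitro/SolidStart integration aside, the shutdown sequence from step 3 can be illustrated against a plain `node:http` server. This is a hedged sketch that does not use node-graceful-shutdown: `drainAndClose` and `installSignalHandlers` are hypothetical names, the cleanup hooks stand in for closing the database and Redis connections, and Node 18.2 or later is assumed for `closeIdleConnections()` and `closeAllConnections()`.

```typescript
import { createServer, Server } from "node:http";

// Hypothetical cleanup hook, e.g. closing a DB pool or a Redis client.
type Cleanup = () => Promise<void>;

// Stop accepting connections, wait for in-flight requests (up to timeoutMs),
// then run the cleanup hooks in order. Reports whether the drain was forced.
async function drainAndClose(
  server: Server,
  cleanups: Cleanup[],
  timeoutMs = 30_000,
): Promise<"drained" | "forced"> {
  let outcome: "drained" | "forced" = "drained";

  await new Promise<void>((resolve) => {
    // If requests are still open at the deadline, destroy their sockets.
    const timer = setTimeout(() => {
      outcome = "forced";
      server.closeAllConnections();
    }, timeoutMs);

    // close() stops new connections; the callback runs once all have ended.
    server.close(() => {
      clearTimeout(timer);
      resolve();
    });

    // Idle keep-alive sockets would otherwise keep the server open.
    server.closeIdleConnections();
  });

  for (const cleanup of cleanups) {
    await cleanup().catch((err) => console.error("cleanup failed", err));
  }
  return outcome;
}

// Wire SIGTERM/SIGINT to the drain sequence and exit when it finishes.
function installSignalHandlers(server: Server, cleanups: Cleanup[]): void {
  let shuttingDown = false;
  for (const signal of ["SIGTERM", "SIGINT"] as const) {
    process.once(signal, () => {
      if (shuttingDown) return; // ignore a second signal during shutdown
      shuttingDown = true;
      drainAndClose(server, cleanups).then((outcome) => {
        process.exit(outcome === "drained" ? 0 : 1);
      });
    });
  }
}
```

A typical call site would be `installSignalHandlers(server, [closeDb, closeRedis])` right after the server starts listening, where `closeDb` and `closeRedis` are placeholders for the application's own teardown functions. In a Kubernetes deployment the 30-second drain would also need to fit within the pod's termination grace period, otherwise the process is killed before the drain finishes.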