# 18. Load & Stress Testing

meta:
  id: web-production-18
  feature: web-production
  priority: P2
  depends_on: []
  tags: [testing, performance, production]

objective:
  - Validate application performance under production-like load and identify bottlenecks

deliverables:
  - Load test suite with k6 or Artillery
  - Performance baseline documentation
  - Bottleneck identification report
  - Scaling recommendations

steps:
  1. Set up load testing tool:
     - Install k6 or Artillery
     - Create tests/ directory for load tests
     - Configure test environment (staging)
  2. Write load tests for critical endpoints:
     - GET / (landing page)
     - POST /api/trpc/user.login
     - GET /api/trpc/user.me (authenticated)
     - GET /api/trpc/darkwatch.getExposures
     - GET /api/trpc/alerts.getAlerts
     - WebSocket connection and alert subscription
  3. Define load scenarios:
     - Baseline: 100 concurrent users, 5 minutes
     - Target: 1000 concurrent users, 10 minutes
     - Stress: 5000 concurrent users, 5 minutes
     - Spike: 0 to 2000 users in 10 seconds
  4. Measure and record:
     - Response time percentiles (p50, p95, p99)
     - Error rate
     - Requests per second (throughput)
     - CPU and memory usage on server
     - Database connection pool utilization
     - Redis memory usage
  5. Identify bottlenecks:
     - Slow queries from database
     - Memory leaks
     - Connection pool exhaustion
     - CPU-bound operations
  6. Document scaling recommendations:
     - Horizontal scaling (more instances)
     - Vertical scaling (bigger instances)
     - Caching improvements
     - Query optimization

tests:
  - Load: Baseline test passes with <200ms p95
  - Stress: App remains functional under 5x normal load
  - Spike: App recovers within 30 seconds after spike

acceptance_criteria:
  - Baseline load (100 concurrent) → p95 < 200ms, 0% errors
  - Target load (1000 concurrent) → p95 < 500ms, <1% errors
  - Stress load (5000 concurrent) → no crashes, <5% errors
  - Spike test → recovery within 30 seconds
  - Performance baseline documented with metrics
  - Bottleneck report with actionable recommendations
  - Scaling plan documented

validation:
  - Run k6 against staging → results within acceptable thresholds
  - Check server metrics during test → CPU <80%, memory <80%
  - Database connections → pool not exhausted
  - Review report → identified 3+ bottlenecks with fixes

notes:
  - Always test against staging, never production
  - Schedule load tests during low-traffic periods
  - Use k6 Cloud for distributed load testing if needed
  - Consider using Vercel Analytics for real-user monitoring (RUM)
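
The percentile measurements in step 4 and the latency and error thresholds in acceptance_criteria can be checked mechanically once raw results are exported from the load tool. The block below is a minimal sketch of such a check, not part of k6 or Artillery. The names `Criterion`, `RunResult`, `percentile`, and `evaluate` are hypothetical, the nearest-rank percentile method is an assumption (a load tool may interpolate and report slightly different values), and "0% errors" is read as an error rate of exactly zero. The spike recovery criterion is time-based and is not covered here.

```typescript
// Thresholds copied from acceptance_criteria. All type and function names
// in this sketch are hypothetical helpers, not part of any load-testing tool.
interface Criterion {
  name: string;
  users: number;
  maxP95Ms: number | null;     // null: no latency bound is stated for this scenario
  maxErrorRate: number;        // fraction of failed requests, e.g. 0.01 for 1%
  errorRateInclusive: boolean; // true: rate <= max, false: rate < max
}

const CRITERIA: Criterion[] = [
  { name: "baseline", users: 100, maxP95Ms: 200, maxErrorRate: 0, errorRateInclusive: true },
  { name: "target", users: 1000, maxP95Ms: 500, maxErrorRate: 0.01, errorRateInclusive: false },
  { name: "stress", users: 5000, maxP95Ms: null, maxErrorRate: 0.05, errorRateInclusive: false },
];

// Raw measurements from one scenario run (assumed export shape).
interface RunResult {
  latenciesMs: number[];
  totalRequests: number;
  failedRequests: number;
}

interface Verdict {
  name: string;
  p50: number;
  p95: number;
  p99: number;
  errorRate: number;
  pass: boolean;
}

// Nearest-rank percentile: the smallest sample such that at least q percent
// of all samples are less than or equal to it.
function percentile(samples: number[], q: number): number {
  if (samples.length === 0) {
    throw new Error("percentile: no samples");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so integer q and length stay exact.
  const rank = Math.max(1, Math.ceil((q * sorted.length) / 100));
  return sorted[rank - 1];
}

// Compare one run against one criterion and report the recorded metrics.
function evaluate(criterion: Criterion, run: RunResult): Verdict {
  const p95 = percentile(run.latenciesMs, 95);
  const errorRate =
    run.totalRequests === 0 ? 0 : run.failedRequests / run.totalRequests;
  const latencyOk = criterion.maxP95Ms === null || p95 < criterion.maxP95Ms;
  const errorsOk = criterion.errorRateInclusive
    ? errorRate <= criterion.maxErrorRate
    : errorRate < criterion.maxErrorRate;
  return {
    name: criterion.name,
    p50: percentile(run.latenciesMs, 50),
    p95,
    p99: percentile(run.latenciesMs, 99),
    errorRate,
    pass: latencyOk && errorsOk,
  };
}
```

A call such as `evaluate(CRITERIA[0], run)` returns the p50, p95, and p99 values to record in the performance baseline together with a pass flag, so the same figures can feed both the baseline documentation and the acceptance check.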