18. Load & Stress Testing
meta:
  id: web-production-18
  feature: web-production
  priority: P2
  depends_on: []
  tags: [testing, performance, production]
objective:
- Validate application performance under production-like load and identify bottlenecks
deliverables:
- Load test suite with k6 or Artillery
- Performance baseline documentation
- Bottleneck identification report
- Scaling recommendations
steps:
- Set up load testing tool:
  - Install k6 or Artillery
  - Create tests/ directory for load tests
  - Configure test environment (staging)
- Write load tests for critical endpoints:
  - GET / (landing page)
  - POST /api/trpc/user.login
  - GET /api/trpc/user.me (authenticated)
  - GET /api/trpc/darkwatch.getExposures
  - GET /api/trpc/alerts.getAlerts
  - WebSocket connection and alert subscription
- Define load scenarios:
  - Baseline: 100 concurrent users, 5 minutes
  - Target: 1000 concurrent users, 10 minutes
  - Stress: 5000 concurrent users, 5 minutes
  - Spike: 0 to 2000 users in 10 seconds
- Measure and record:
  - Response time percentiles (p50, p95, p99)
  - Error rate
  - Requests per second (throughput)
  - CPU and memory usage on server
  - Database connection pool utilization
  - Redis memory usage
- Identify bottlenecks:
  - Slow queries from database
  - Memory leaks
  - Connection pool exhaustion
  - CPU-bound operations
- Document scaling recommendations:
  - Horizontal scaling (more instances)
  - Vertical scaling (bigger instances)
  - Caching improvements
  - Query optimization
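
The four load scenarios defined in the steps above could be written down as data before any test logic exists. The sketch below uses the object shape that k6 accepts for a `ramping-vus` scenario (`executor`, `startVUs`, `stages`). User counts and hold durations come from the steps; the scenario keys, the helper names `rampAndHold` and `peakUsers`, the ramp-up times for baseline, target and stress, the 1-minute hold after the spike, and the 30-second ramp-down are all assumptions made for illustration.

```typescript
// Sketch only: plain objects in the shape k6 expects for ramping-vus scenarios.
// Ramp-up and ramp-down durations are assumed, not taken from the task spec.

type Stage = { duration: string; target: number };

type RampingScenario = {
  executor: "ramping-vus";
  startVUs: number;
  stages: Stage[];
};

// Ramp from 0 to `users`, hold at `users` for `hold`, then ramp back to 0.
function rampAndHold(users: number, rampUp: string, hold: string): RampingScenario {
  return {
    executor: "ramping-vus",
    startVUs: 0,
    stages: [
      { duration: rampUp, target: users },
      { duration: hold, target: users },
      { duration: "30s", target: 0 }, // assumed ramp-down
    ],
  };
}

const scenarios = {
  baseline: rampAndHold(100, "30s", "5m"),
  target: rampAndHold(1000, "1m", "10m"),
  stress: rampAndHold(5000, "2m", "5m"),
  spike: rampAndHold(2000, "10s", "1m"), // 0 to 2000 users in 10 seconds
};

// Highest virtual-user count a scenario reaches; handy for sanity checks.
function peakUsers(scenario: RampingScenario): number {
  return Math.max(...scenario.stages.map((stage) => stage.target));
}
```

In a k6 script, one of these would typically be exported per run, for example `export const options = { scenarios: { baseline: scenarios.baseline } }`. Exporting all four at once would run them concurrently unless each is given its own `startTime`.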
tests:
- Load: Baseline test passes with <200ms p95
- Stress: App remains functional under 5000 concurrent users (5x target load)
- Spike: App recovers within 30 seconds after spike
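
"Recovers within 30 seconds" needs a definition before it can be tested. One possible definition, assumed here and not stated in the spec, is the time between the end of the spike and the first latency sample from which p95 stays at or below a threshold for the rest of the run. The sketch below implements that; `LatencySample`, `recoverySeconds`, and the 200 ms threshold in the usage note are hypothetical.

```typescript
// Sketch only: recovery time under an assumed definition of "recovered".

type LatencySample = { tSec: number; p95Ms: number };

// Returns seconds from spike end until p95 is back under the threshold and
// stays there, or null if the run never recovers (or has no data after the spike).
function recoverySeconds(
  samples: LatencySample[],
  spikeEndSec: number,
  thresholdMs: number,
): number | null {
  const afterSpike = samples
    .filter((sample) => sample.tSec >= spikeEndSec)
    .sort((a, b) => a.tSec - b.tSec);

  // Index of the last sample that still exceeds the threshold (-1 if none).
  let lastBad = -1;
  afterSpike.forEach((sample, index) => {
    if (sample.p95Ms > thresholdMs) lastBad = index;
  });

  // The first sample after the last violation marks the recovery point.
  const firstGood = afterSpike[lastBad + 1];
  if (firstGood === undefined) return null;
  return firstGood.tSec - spikeEndSec;
}
```

With samples taken every 10 seconds, a run whose p95 is 900 ms at spike end, 600 ms ten seconds later, and under 200 ms from twenty seconds onward would report a recovery time of 20 seconds against a 200 ms threshold, which satisfies the 30-second limit. The sampling interval bounds the precision of the result.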
acceptance_criteria:
- Baseline load (100 concurrent) → p95 < 200ms, 0% errors
- Target load (1000 concurrent) → p95 < 500ms, <1% errors
- Stress load (5000 concurrent) → no crashes, <5% errors
- Spike test → recovery within 30 seconds
- Performance baseline documented with metrics
- Bottleneck report with actionable recommendations
- Scaling plan documented
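
The numeric acceptance criteria above can be checked mechanically once per-request results are collected. The following sketch computes nearest-rank percentiles and the error rate from raw results, then applies the thresholds for the baseline, target, and stress runs. The type and function names (`RequestResult`, `RunSummary`, `percentile`, `summarize`, `acceptance`) are hypothetical, and the nearest-rank method is an assumption; load testing tools may interpolate percentiles differently. The "no crashes" criterion for the stress run is not captured here and would need a separate check.

```typescript
// Sketch only: summarize raw request results and apply the acceptance thresholds.

type RequestResult = { durationMs: number; ok: boolean };
type RunSummary = { p50Ms: number; p95Ms: number; p99Ms: number; errorRate: number };

// Nearest-rank percentile over an array sorted in ascending order.
function percentile(sortedAsc: number[], p: number): number {
  const rank = Math.ceil((p * sortedAsc.length) / 100); // 1-based rank
  const value = sortedAsc[Math.max(0, rank - 1)];
  if (value === undefined) throw new Error("percentile of empty sample set");
  return value;
}

function summarize(results: RequestResult[]): RunSummary {
  const sorted = results.map((r) => r.durationMs).sort((a, b) => a - b);
  const failures = results.filter((r) => !r.ok).length;
  return {
    p50Ms: percentile(sorted, 50),
    p95Ms: percentile(sorted, 95),
    p99Ms: percentile(sorted, 99),
    errorRate: failures / results.length,
  };
}

// Thresholds from the acceptance criteria; comparisons are strict, as written there.
const acceptance: Record<"baseline" | "target" | "stress", (run: RunSummary) => boolean> = {
  baseline: (run) => run.p95Ms < 200 && run.errorRate === 0,
  target: (run) => run.p95Ms < 500 && run.errorRate < 0.01,
  stress: (run) => run.errorRate < 0.05,
};
```

Because the criteria use strict inequalities, a target run with exactly 1% errors fails the check. If k6 is the chosen tool, the same limits can be expressed in its `thresholds` option, for example `http_req_duration: ['p(95)<500']` and `http_req_failed: ['rate<0.01']`.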
validation:
- Run k6 against staging → results meet the thresholds in acceptance_criteria
- Check server metrics during test → CPU <80%, memory <80%
- Database connections → pool not exhausted
- Review report → at least 3 bottlenecks identified, each with a proposed fix
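
The server-side validation checks above (CPU and memory below 80%, pool not exhausted) can be applied to whatever metrics the monitoring stack exports during a run. This sketch assumes the metrics have already been collected into an array of samples; `ResourceSample`, `resourceViolations`, and the field names are hypothetical, and "exhausted" is interpreted here as in-use connections reaching the configured pool size.

```typescript
// Sketch only: check peak resource usage from a test run against the validation limits.

type ResourceSample = { cpuPct: number; memPct: number; dbConnsInUse: number };

// Returns a list of human-readable violations; an empty list means all checks passed.
function resourceViolations(samples: ResourceSample[], poolSize: number): string[] {
  if (samples.length === 0) throw new Error("no resource samples collected");

  // Peak value of one metric across the whole run.
  const peak = (pick: (s: ResourceSample) => number): number =>
    Math.max(...samples.map(pick));

  const violations: string[] = [];
  if (peak((s) => s.cpuPct) >= 80) violations.push("CPU reached 80% or more");
  if (peak((s) => s.memPct) >= 80) violations.push("memory reached 80% or more");
  if (peak((s) => s.dbConnsInUse) >= poolSize) {
    violations.push("database connection pool exhausted");
  }
  return violations;
}
```

Checking peaks is the strictest reading of the limits. A team that tolerates brief excursions might compare a high percentile of each metric instead.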
notes:
- Always test against staging, never production
- Schedule load tests during low-traffic periods, in case staging shares infrastructure with production
- Use k6 Cloud for distributed load testing if needed
- Consider using Vercel Analytics for real-user monitoring (RUM)