get to prod tasks

2026-05-26 16:06:34 -04:00
parent 04e839640f
commit 5214412fff
105 changed files with 7447 additions and 38 deletions

@@ -0,0 +1,61 @@
# 01. Security Headers & CORS Configuration
meta:
id: web-production-01
feature: web-production
priority: P1
depends_on: []
tags: [security, infrastructure, production]
objective:
- Implement comprehensive security headers and CORS configuration to protect against common web vulnerabilities
deliverables:
- Security headers middleware in web/src/middleware.ts or Nitro config
- CORS configuration for API endpoints
- Content Security Policy (CSP) headers
- Remove X-Powered-By and other identifying headers
steps:
1. Add helmet-like security headers via Nitro hooks or Vite plugin:
- Strict-Transport-Security (HSTS)
- X-Content-Type-Options: nosniff
- X-Frame-Options: DENY
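- X-XSS-Protection: 0 (the legacy XSS auditor has been removed from current browsers, and `1; mode=block` can introduce vulnerabilities in older ones; rely on CSP instead)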
- Referrer-Policy: strict-origin-when-cross-origin
- Permissions-Policy for camera, microphone, geolocation
2. Implement CSP header allowing only necessary sources:
- script-src: 'self', stripe.com, clerk.dev
- style-src: 'self', 'unsafe-inline' (needed for Tailwind)
- img-src: 'self', data:, blob:, gravatar.com
- connect-src: 'self', api endpoints, websocket URL
- frame-src: 'self', stripe.com (for Checkout)
3. Configure CORS for /api/trpc endpoints:
- Allow origins: production domain, mobile app origins
- Allow methods: GET, POST
- Allow headers: Content-Type, Authorization, x-api-key
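- Credentials: true (requires echoing an explicit allowed origin; browsers reject `Access-Control-Allow-Origin: *` on credentialed requests)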
4. Remove server-identifying headers (X-Powered-By, Server)
5. Add tests verifying headers are present on all responses
tests:
- Unit: Test each header is present and correct value
- Integration: Test API endpoints return correct CORS headers
- Security scan: Use securityheaders.com or similar to verify A+ rating
acceptance_criteria:
- All security headers listed in steps 1 and 2 present on every HTTP response
- CSP blocks inline scripts except those approved by nonce or hash
- CORS preflight requests handled correctly for API endpoints
- SecurityHeaders.com scan returns A+ rating
- No server version information leaked in headers
validation:
- Run `curl -I http://localhost:3000` (or `curl -kI https://localhost:3000` if TLS is enabled locally) and verify headers
- Run automated security header scanner
- Check browser dev tools Network tab for all response headers
notes:
- SolidStart/Nitro may require custom plugin for headers
- CSP 'unsafe-inline' for styles is acceptable with Tailwind v4 but document the trade-off
- Consider using nonce-based CSP once Tailwind supports it fully
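
The header list in steps 1 and 2 can be sketched as a pure function that a Nitro hook or middleware would apply to each response. This is a minimal sketch, not the project's implementation: `buildCsp`, `securityHeaders`, the HSTS `max-age`, and the `https://js.stripe.com` source are illustrative assumptions, and only a subset of the listed headers is shown.

```typescript
// Hypothetical helpers; names and values are assumptions for illustration.
type CspDirectives = Record<string, string[]>;

// CSP syntax: sources within a directive are space-separated,
// directives are separated by "; ".
function buildCsp(directives: CspDirectives): string {
  return Object.entries(directives)
    .map(([directive, sources]) => `${directive} ${sources.join(" ")}`)
    .join("; ");
}

// Returns header name/value pairs to set on every response.
function securityHeaders(csp: CspDirectives): Record<string, string> {
  return {
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Referrer-Policy": "strict-origin-when-cross-origin",
    "Permissions-Policy": "camera=(), microphone=(), geolocation=()",
    "Content-Security-Policy": buildCsp(csp),
  };
}

const exampleHeaders = securityHeaders({
  "default-src": ["'self'"],
  "script-src": ["'self'", "https://js.stripe.com"],
  "style-src": ["'self'", "'unsafe-inline'"],
  "img-src": ["'self'", "data:", "blob:"],
});
```

Keeping the header map in one pure function makes the unit test in the tests section a plain equality check per header.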

@@ -0,0 +1,58 @@
# 02. Rate Limiting & DDoS Protection
meta:
id: web-production-02
feature: web-production
priority: P1
depends_on: []
tags: [security, infrastructure, production]
objective:
- Implement robust rate limiting and DDoS protection beyond the basic in-memory tRPC middleware
deliverables:
- Redis-backed rate limiting for distributed deployment
- Per-endpoint rate limit tiers
- IP-based and user-based limiting
- DDoS protection via Cloudflare or similar
steps:
1. Replace in-memory rate limit map with Redis-backed solution:
- Use ioredis or @upstash/ratelimit for distributed rate limiting
- Create web/src/server/lib/ratelimit.ts with configurable tiers
2. Define rate limit tiers:
- Public endpoints (login, signup): 5 req/min per IP
- Authenticated API: 100 req/min per user
- Sensitive operations (password reset): 3 req/hour per email
- WebSocket connections: 1 per user, reconnect max 5/min
- Admin endpoints: 50 req/min per admin
3. Add IP-based rate limiting at edge/Nitro level for anonymous traffic
4. Configure Cloudflare (or alternative) for:
- DDoS protection
- Bot management
- Challenge pages for suspicious traffic
5. Add rate limit response headers (X-RateLimit-Remaining, X-RateLimit-Reset)
6. Implement sliding window algorithm for fairer limiting
tests:
- Unit: Test rate limiter correctly counts and resets
- Integration: Flood endpoint with requests, verify 429 responses
- Load: Use k6 or artillery to test limits under load
acceptance_criteria:
- Redis-backed rate limiting active on all endpoints
- 429 responses include Retry-After header
- Rate limits enforced per-IP, per-user, and per-endpoint
- DDoS protection layer active at edge
- No single IP can exceed 1000 req/min to any endpoint
- Rate limit headers present on all API responses
validation:
- `ab -n 1000 -c 10` against login endpoint → 429s after limit
- Verify Redis keys exist for rate limit counters
- Check Cloudflare dashboard for blocked threats
notes:
- Current in-memory rate limit in web/src/server/api/utils.ts will not work across multiple server instances
- Upstash Redis recommended for serverless deployments
- Consider implementing token bucket for burst tolerance
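
The sliding window algorithm from step 6 can be sketched in memory as below. This is an illustration of the counting logic only, under the assumption that a Redis-backed version would replace the `Map` with shared storage; the class name and the injected clock parameter `nowMs` are hypothetical.

```typescript
type LimitDecision = { allowed: boolean; remaining: number; retryAfterMs: number };

// Sliding window log: keep the timestamps of recent hits per key and
// count only those that fall inside the last `windowMs` milliseconds.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(key: string, nowMs: number): LimitDecision {
    const windowStart = nowMs - this.windowMs;
    // Drop hits that have slid out of the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > windowStart);
    this.hits.set(key, recent);
    if (recent.length >= this.limit) {
      // The oldest hit leaves the window at oldest + windowMs.
      const oldest = recent[0] ?? nowMs;
      return { allowed: false, remaining: 0, retryAfterMs: oldest + this.windowMs - nowMs };
    }
    recent.push(nowMs);
    return { allowed: true, remaining: this.limit - recent.length, retryAfterMs: 0 };
  }
}
```

For the public-endpoint tier, `new SlidingWindowLimiter(5, 60000)` keyed by IP would map `remaining` to X-RateLimit-Remaining and `retryAfterMs` to the Retry-After header on a 429.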

@@ -0,0 +1,62 @@
# 03. Input Validation & XSS Prevention Audit
meta:
id: web-production-03
feature: web-production
priority: P1
depends_on: []
tags: [security, validation, production]
objective:
- Audit and harden all input validation to prevent XSS, injection attacks, and malformed data
deliverables:
- XSS prevention audit report
- Input sanitization layer
- HTML escaping on all user-generated content
- SQL injection protection verification
steps:
1. Audit all tRPC routers for input validation gaps:
- Check web/src/server/api/routers/*.ts for missing valibot schemas
- Ensure all user inputs have strict type validation
- Add maxLength constraints to all string inputs
2. Implement output escaping for user-generated content:
- Blog posts, user names, alert messages
- Use DOMPurify or similar on client-side rendering
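- Store user content unescaped and escape HTML entities at render time (escaping before DB storage risks double-encoding and ties stored data to a single output context)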
3. Audit database queries for SQL injection:
- Verify all queries use Drizzle parameterized queries
- Check raw SQL usage in jobs and services
- Ensure no string concatenation in SQL
4. Add content validation for file uploads (if any):
- MIME type verification
- File size limits
- Scan for malware
5. Implement request body size limits:
- 1MB max for JSON payloads
- 10MB max for file uploads
6. Add tests for malformed input handling
tests:
- Unit: Test each router with XSS payloads, SQL injection attempts
- Integration: Submit malicious inputs via API, verify safe handling
- Security: Run OWASP ZAP or Burp Suite against app
acceptance_criteria:
- All tRPC inputs have strict valibot validation with bounds
- User-generated content escaped before rendering
- No SQL injection vectors in any query
- XSS payloads rendered as plain text, not executed
- Request body size limits enforced
- OWASP ZAP scan shows no high/critical vulnerabilities
validation:
- Submit `<script>alert('xss')</script>` in all text fields → rendered safely
- Submit SQL injection in search fields → no database errors
- Run `pnpm audit` and address all high severity issues
notes:
- Valibot schemas already in use — expand them with stricter bounds
- Consider using zod for more complex validation if valibot is limiting
- Sanitize inputs at API boundary, not just client-side
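
The escaping and bounds checks described in steps 1 and 2 can be illustrated with two small helpers. This is a sketch under assumptions: `escapeHtml` and `assertBoundedString` are hypothetical names, not valibot or DOMPurify APIs, and a real router would express the length bound in its valibot schema.

```typescript
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replace the five HTML-significant characters so the text renders literally.
function escapeHtml(input: string): string {
  return input.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// Reject non-strings and over-length strings at the API boundary.
function assertBoundedString(value: unknown, maxLength: number): string {
  if (typeof value !== "string") throw new TypeError("expected a string");
  if (value.length > maxLength) {
    throw new RangeError(`string exceeds ${maxLength} characters`);
  }
  return value;
}
```

With these, the validation payload `<script>alert('xss')</script>` becomes inert text rather than markup.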

@@ -0,0 +1,71 @@
# 04. Authentication & Session Security Hardening
meta:
id: web-production-04
feature: web-production
priority: P1
depends_on: []
tags: [security, auth, production]
objective:
- Harden authentication and session management to prevent session hijacking, fixation, and brute force attacks
deliverables:
- Secure session configuration
- JWT hardening
- Brute force protection
- Session invalidation on logout
- Multi-factor authentication foundation
steps:
1. Harden JWT implementation in web/src/server/auth/jwt.ts:
- Remove fallback secret (currently uses dev secret if env missing)
- Add JWT issuer and audience claims
- Implement token blacklisting for logout
- Add refresh token rotation
2. Harden session management in web/src/server/auth/session.ts:
- Use httpOnly, secure, sameSite=strict cookies
- Add session fingerprinting (user agent hash)
- Implement concurrent session limits (max 5 per user)
- Add automatic session expiry refresh on activity
3. Add brute force protection:
- Track failed login attempts per IP/email
- Progressive delays: 1s, 2s, 4s, 8s, 16s
- Lock account after 10 failed attempts (1 hour)
4. Implement secure logout:
- Invalidate session in database
- Clear all cookies
- Blacklist JWT token
- Revoke refresh token
5. Add MFA foundation:
- TOTP secret generation
- QR code for authenticator apps
- Backup codes
6. Audit Clerk integration for security:
- Verify webhook signature validation
- Check Clerk session sync with custom sessions
tests:
- Unit: Test JWT signing/verification with invalid tokens
- Integration: Test brute force lockout, session expiry
- Security: Test session hijacking resistance
acceptance_criteria:
- No hardcoded or fallback secrets in auth code
- All cookies have httpOnly, secure, sameSite=strict
- Brute force protection active on login endpoints
- Logout invalidates session completely
- JWTs include iss, aud, iat, exp claims
- Session fingerprinting rejects a stolen cookie replayed from a client with a different user agent (a mitigation only, since the user agent can be spoofed)
- MFA TOTP generation working with Google Authenticator
validation:
- Attempt 10 failed logins → account locked
- Replay a session cookie from a browser with a different user agent → session rejected (fingerprinting)
- Logout → session token rejected on subsequent requests
- Decode a test-environment JWT (jwt.io or a local decoder; never paste production tokens) → valid iss and aud claims
notes:
- Current JWT has fallback secret — this is critical to fix before production
- Clerk handles frontend auth but backend needs its own hardening
- Consider using Lucia Auth or NextAuth patterns for session management
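
The progressive delay and lockout rule from step 3 can be sketched as a pure function of the failed-attempt count. The function name is hypothetical, and holding the delay at 16s for attempts 6 through 9 is an assumption, since the task lists only five delay values.

```typescript
const MAX_FAILED_ATTEMPTS = 10;
const LOCKOUT_MS = 60 * 60 * 1000; // 1 hour

type LoginPenalty = { delayMs: number; lockedForMs: number };

// Maps the number of consecutive failed logins to a delay or a lockout.
function loginPenalty(failedAttempts: number): LoginPenalty {
  if (failedAttempts >= MAX_FAILED_ATTEMPTS) {
    return { delayMs: 0, lockedForMs: LOCKOUT_MS };
  }
  if (failedAttempts <= 0) {
    return { delayMs: 0, lockedForMs: 0 };
  }
  // 1s, 2s, 4s, 8s, 16s, then held at 16s (assumption) until lockout.
  const exponent = Math.min(failedAttempts - 1, 4);
  return { delayMs: 1000 * 2 ** exponent, lockedForMs: 0 };
}
```

The counter itself would live in shared storage keyed by IP and email so that the rule holds across server instances.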

@@ -0,0 +1,61 @@
# 05. CDN & Asset Optimization
meta:
id: web-production-05
feature: web-production
priority: P2
depends_on: []
tags: [performance, infrastructure, production]
objective:
- Configure CDN for static assets and optimize frontend bundle delivery
deliverables:
- CDN configuration (Cloudflare, Vercel Edge, or AWS CloudFront)
- Asset optimization (images, fonts, JS/CSS)
- Brotli/Gzip compression
- Cache-Control headers for static assets
steps:
1. Configure CDN for static assets:
- Set up Cloudflare or Vercel Edge Network
- Point CDN to web/dist/client or .output/public
- Configure cache rules for static files
2. Optimize image delivery:
- Convert landing page SVGs to optimized formats where appropriate
- Add responsive image srcset for photos
- Implement lazy loading for below-fold images
3. Configure compression:
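- Enable Brotli compression (typically smaller output than gzip for text assets), with gzip as a fallback for clients that do not send `br` in Accept-Encoding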
- Ensure Nitro/Vite build outputs compressed assets
4. Set Cache-Control headers:
- Immutable assets (hashed filenames): 1 year
- HTML pages: no-cache (for SSR)
- API responses: no-store or short cache
5. Implement resource hints:
- Preconnect to API domain, Stripe, Clerk
- Prefetch critical routes
6. Add tests verifying asset optimization
tests:
- Unit: Test asset hashing and cache headers
- Integration: Test CDN cache hit rates
- Performance: Lighthouse performance audit >90
acceptance_criteria:
- Static assets served from CDN with <50ms TTFB
- Brotli compression active on all text assets
- Cache-Control headers correct per asset type
- Image optimization reducing total page weight by >30%
- Lighthouse Performance score ≥ 90
- Preconnect hints present on critical pages
validation:
- `curl -I https://cdn.example.com/assets/main.js` → Cache-Control: public, max-age=31536000, immutable
- Lighthouse CI run shows Performance ≥ 90
- PageSpeed Insights shows <2s LCP on mobile
notes:
- SolidStart with Nitro should handle asset hashing automatically
- Vercel deployment may include CDN automatically
- Consider using @solidjs/start image optimization if available
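
The Cache-Control rules in step 4 can be sketched as a path-based lookup. This assumes hashed build assets are served under `/assets/` (as in the validation URL) and API routes under `/api/`; both prefixes and the function name are assumptions to adjust for the actual build output.

```typescript
// Picks a Cache-Control value from the request path.
function cacheControlFor(path: string): string {
  if (path.startsWith("/api/")) {
    return "no-store"; // API responses: never cached by shared caches
  }
  if (path.startsWith("/assets/")) {
    // Hashed filenames change on every build, so a year-long TTL is safe.
    return "public, max-age=31536000, immutable";
  }
  return "no-cache"; // SSR HTML: revalidate on every request
}
```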

@@ -0,0 +1,62 @@
# 06. Database Connection Pooling & Query Optimization
meta:
id: web-production-06
feature: web-production
priority: P1
depends_on: []
tags: [performance, database, production]
objective:
- Optimize database connections and queries for production load
deliverables:
- Connection pooling configuration
- Query performance audit
- Index optimization
- Slow query logging
steps:
1. Configure connection pooling:
- If using PostgreSQL: configure PgBouncer or the driver's built-in pool (e.g., node-postgres `Pool`); `@libsql/client` applies only to libSQL/Turso, not PostgreSQL
- Set max connections based on server instances (e.g., 20 per instance)
- Add connection timeout and idle timeout settings
2. Audit all Drizzle queries for performance:
- Check web/src/server/db/schema/*.ts for missing indexes
- Review web/src/server/api/routers/*.ts for N+1 queries
- Add pagination to all list endpoints (default 50, max 100)
3. Add database indexes:
- createdAt indexes for time-range queries (alerts, exposures)
- Composite indexes for common filter combinations
- userId indexes on all user-scoped tables
4. Implement query result caching:
- Cache user profile lookups (5 min TTL)
- Cache subscription status (1 min TTL)
- Cache dashboard summary (30 sec TTL)
5. Add slow query logging:
- Log queries taking >500ms
- Alert on >1s queries
6. Set up database performance monitoring
tests:
- Unit: Test query execution plans for major endpoints
- Load: Run 1000 concurrent dashboard loads, verify <200ms p95
- Integration: Test pagination boundaries
acceptance_criteria:
- Database connection pool configured with max 20 connections per instance
- No N+1 queries in any API endpoint
- All list endpoints paginated with cursor or offset
- Slow query logging active
- Dashboard load query <100ms p95
- Alert endpoint query <50ms p95
validation:
- EXPLAIN ANALYZE on major queries shows index usage
- Load test with k6: 1000 concurrent users, p95 < 200ms
- Database CPU <50% under normal load
notes:
- Current schema has some indexes but may need more for production scale
- Drizzle ORM doesn't automatically handle connection pooling — configure at driver level
- Consider read replicas if dashboard load is heavy
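
Two of the numeric rules above, the pagination bounds from step 2 and the slow-query thresholds from step 5, can be sketched as small pure helpers. The names are hypothetical; a real implementation would apply them in the tRPC input schema and in a query-timing wrapper.

```typescript
const DEFAULT_PAGE_SIZE = 50;
const MAX_PAGE_SIZE = 100;

// Clamp a client-supplied page size to [1, 100], defaulting to 50.
function resolvePageSize(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested)) {
    return DEFAULT_PAGE_SIZE;
  }
  return Math.min(Math.max(Math.trunc(requested), 1), MAX_PAGE_SIZE);
}

type QuerySeverity = "ok" | "slow" | "alert";

// Over 500ms is logged as slow; over 1s raises an alert.
function classifyQuery(durationMs: number): QuerySeverity {
  if (durationMs > 1000) return "alert";
  if (durationMs > 500) return "slow";
  return "ok";
}
```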

@@ -0,0 +1,61 @@
# 07. Caching Strategy (Redis + HTTP Cache)
meta:
id: web-production-07
feature: web-production
priority: P2
depends_on: []
tags: [performance, caching, production]
objective:
- Implement multi-layer caching to reduce database load and improve response times
deliverables:
- Redis caching layer for API responses
- HTTP cache headers for client-side caching
- Cache invalidation strategy
- Stale-while-revalidate pattern
steps:
1. Implement Redis caching for API responses:
- Create web/src/server/lib/cache.ts with Redis-backed cache
- Cache user profile: key `user:{id}`, TTL 5 minutes
- Cache subscription: key `sub:{userId}`, TTL 1 minute
- Cache dashboard summary: key `dash:{userId}`, TTL 30 seconds
- Cache blog posts: key `blog:{slug}`, TTL 1 hour
2. Add cache decorators/procedures:
- Create cachedProcedure wrapper for tRPC
- Support cache tags for invalidation
3. Implement HTTP caching headers:
- Static assets: Cache-Control: public, max-age=31536000, immutable
- API responses: Cache-Control: private, max-age=30
- HTML pages: Cache-Control: no-cache (SSR)
4. Add cache invalidation:
- Invalidate user cache on profile update
- Invalidate subscription cache on billing event
- Invalidate blog cache on publish/edit
5. Implement stale-while-revalidate for dashboard data
6. Add cache hit/miss metrics
tests:
- Unit: Test cache set/get/delete operations
- Integration: Test cache invalidation on mutations
- Performance: Compare cached vs uncached response times
acceptance_criteria:
- Redis cache layer active on all read-heavy endpoints
- Cache hit rate >80% for user profile and subscription endpoints
- Cache invalidation working on all mutations
- HTTP cache headers correct per endpoint type
- Stale-while-revalidate pattern on dashboard widgets
- Cache metrics visible in monitoring dashboard
validation:
- Load test: cached endpoint p95 < 20ms
- Verify Redis keys created for cached data
- Update profile → cache invalidated, next request hits DB
notes:
- Redis already used for BullMQ jobs — share connection or use separate DB index
- Be careful caching authenticated data — always include userId in key
- Consider using Vercel KV or Upstash Redis for serverless
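
The TTL and invalidation behavior in steps 1 and 4 can be sketched with an in-memory stand-in for Redis. This is an illustration only: `TtlCache`, `userCacheKey`, and the injected clock `nowMs` are hypothetical, and a Redis-backed version would rely on key expiry instead of the manual check.

```typescript
// In-memory stand-in for a Redis cache with per-entry TTL.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();

  set(key: string, value: V, ttlMs: number, nowMs: number): void {
    this.entries.set(key, { value, expiresAt: nowMs + ttlMs });
  }

  // Returns undefined on a miss or once the entry has expired.
  get(key: string, nowMs: number): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= nowMs) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  // Called from mutations, e.g. after a profile update.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}

// Always include the user ID in keys for authenticated data.
const userCacheKey = (userId: string): string => `user:${userId}`;
const USER_TTL_MS = 5 * 60 * 1000;
```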

@@ -0,0 +1,67 @@
# 08. Graceful Shutdown & Health Check Endpoints
meta:
id: web-production-08
feature: web-production
priority: P1
depends_on: []
tags: [reliability, infrastructure, production]
objective:
- Implement health checks and graceful shutdown to ensure zero-downtime deployments and reliable operations
deliverables:
- Health check endpoint (/health)
- Readiness probe endpoint (/ready)
- Graceful shutdown handler
- Dependency health checks (DB, Redis, Stripe)
steps:
1. Create health check endpoints:
- GET /health → basic liveness (HTTP 200 if process running)
- GET /ready → readiness check (DB, Redis, Stripe connectivity)
- GET /health/deep → comprehensive check with dependency status
2. Implement dependency health checks:
- Database: simple SELECT 1 query
- Redis: PING command
- Stripe: retrieve account info (cached)
- WebSocket server: connection count
3. Add graceful shutdown:
- Handle SIGTERM/SIGINT signals
- Stop accepting new connections
- Wait for active requests to complete (30s timeout)
- Close database connections
- Close Redis connections
- Exit process cleanly
4. Add startup probe:
- Delay readiness until all services initialized
- Retry logic for DB connection on startup
5. Add metrics endpoint (/metrics) for Prometheus:
- Request count and duration
- Error rates
- Active connections
- Dependency health status
tests:
- Unit: Test health check responses
- Integration: Test graceful shutdown with active requests
- Load: Verify zero failed requests during rolling restart
acceptance_criteria:
- /health returns 200 within 100ms
- /ready returns 200 only when all dependencies healthy
- /ready returns 503 with detailed error when dependency down
- Graceful shutdown completes within 30 seconds
- Zero failed requests during rolling deployment
- Prometheus metrics endpoint available
validation:
- `curl /health` → {"status":"ok"}
- `curl /ready` → {"status":"ok","dependencies":{"db":"ok","redis":"ok","stripe":"ok"}}
- Stop container with active requests → all complete before exit
- Block DB port → /ready returns 503
notes:
- Nitro/SolidStart may need custom server plugin for signal handling
- Use node-graceful-shutdown or similar library
- Kubernetes/Docker health checks rely on these endpoints
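
The /ready contract in the acceptance criteria (200 only when every dependency is healthy, 503 otherwise) can be sketched as a function over already-collected check results. The function name and the `"unavailable"` status string are assumptions; the individual checks (SELECT 1, PING, Stripe lookup) are left out.

```typescript
type DependencyState = "ok" | "down";

type ReadinessResult = {
  httpStatus: number;
  body: { status: string; dependencies: Record<string, DependencyState> };
};

// Aggregates per-dependency results into the /ready response.
function readinessResponse(deps: Record<string, DependencyState>): ReadinessResult {
  const allOk = Object.values(deps).every((state) => state === "ok");
  return {
    httpStatus: allOk ? 200 : 503,
    body: { status: allOk ? "ok" : "unavailable", dependencies: deps },
  };
}
```

Separating aggregation from the checks keeps the unit test free of network calls.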

@@ -0,0 +1,66 @@
# 09. Structured Logging & Log Aggregation
meta:
id: web-production-09
feature: web-production
priority: P2
depends_on: []
tags: [observability, logging, production]
objective:
- Replace ad-hoc logging with structured, aggregated logging for production debugging and auditing
deliverables:
- Structured logging library integration (Pino or Winston)
- Log aggregation pipeline (Datadog, Logtail, or CloudWatch)
- Request ID propagation across all logs
- Log rotation and retention policy
steps:
1. Add structured logging library:
- Install pino or winston in web/package.json
- Create web/src/server/lib/logger.ts with configured logger
- Replace all console.log/console.error with logger
2. Implement request context logging:
- Generate request ID for each incoming request
- Attach user ID, session ID to log context
- Propagate request ID through tRPC context
3. Configure log levels:
- ERROR: unhandled exceptions, auth failures, DB errors
- WARN: rate limit hits, slow queries, deprecated API usage
- INFO: requests, logins, signups, billing events
- DEBUG: query details, cache hits/misses (dev only)
4. Set up log aggregation:
- Configure log shipping to aggregation service
- Set up log parsing and indexing
- Create saved searches for common issues
5. Implement log rotation:
- 100MB max per file
- 7 days retention for production
- 30 days retention for audit logs
6. Add sensitive data redaction:
- Mask credit card numbers, SSNs, passwords in logs
- Redact JWT tokens (show only first 10 chars)
tests:
- Unit: Test logger outputs valid JSON
- Integration: Test request ID propagation
- Security: Verify no sensitive data in logs
acceptance_criteria:
- All logs output as structured JSON
- Request ID present on every log line for a given request
- Log aggregation service receiving logs in real-time
- Sensitive data redacted from all log output
- Log rotation preventing disk fill
- Searchable logs by user ID, request ID, endpoint
validation:
- Trigger error → log appears in aggregation with stack trace, request ID, user ID
- Search logs by request ID → all related logs returned
- Check log files → no credit card numbers, passwords, full JWTs
notes:
- Pino is among the fastest Node.js loggers and writes JSON by default
- Use pino-pretty for local development, JSON for production
- Consider OpenTelemetry for unified tracing + logging
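
The redaction rules in step 6 can be sketched as a transform applied to log fields before they are written. The key list, the JWT pattern, and the function names are assumptions for illustration; Pino also ships its own `redact` option, which may be a better fit for key-based masking.

```typescript
// Field names treated as secrets (assumed list; extend per schema).
const SENSITIVE_KEYS = new Set(["password", "ssn", "cardNumber"]);

// Three base64url segments starting with "eyJ" look like a JWT.
const JWT_PATTERN = /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g;

function redactValue(key: string, value: unknown): unknown {
  if (SENSITIVE_KEYS.has(key)) return "[REDACTED]";
  if (typeof value === "string") {
    // Keep only the first 10 characters of any embedded JWT.
    return value.replace(JWT_PATTERN, (token) => `${token.slice(0, 10)}...[REDACTED]`);
  }
  return value;
}

// Applies redaction to each top-level field of a log record.
function redactRecord(record: Record<string, unknown>): Record<string, unknown> {
  const safe: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(record)) {
    safe[key] = redactValue(key, value);
  }
  return safe;
}
```

This sketch handles only top-level fields; nested objects would need a recursive walk.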

@@ -0,0 +1,69 @@
# 10. Error Tracking & Alerting (Sentry Integration)
meta:
id: web-production-10
feature: web-production
priority: P1
depends_on: []
tags: [observability, error-tracking, production]
objective:
- Implement comprehensive error tracking with Sentry to catch and alert on production errors in real-time
deliverables:
- Sentry integration for backend and frontend
- Error alerting rules
- Source maps upload for production builds
- Breadcrumbs for error context
steps:
1. Add Sentry SDK:
- Install @sentry/node for backend
- Install @sentry/solid or @sentry/browser for frontend
- Configure DSN from environment variable
2. Initialize Sentry in backend:
- Add to web/src/entry-server.tsx or Nitro plugin
- Capture unhandled exceptions
- Capture unhandled promise rejections
- Attach user context (ID, email) when available
3. Initialize Sentry in frontend:
- Add to web/src/entry-client.tsx
- Capture JavaScript errors
- Capture SolidJS component errors via ErrorBoundary
- Attach release version and environment
4. Configure error alerting:
- Slack/Discord/PagerDuty integration for P1 errors
- Email alerts for new error types
- Digest emails for recurring errors
- Alert thresholds: >10 errors/minute or >1 unhandled exception
5. Upload source maps:
- Configure Vite plugin for source map generation
- Upload maps to Sentry during build
- Verify error stack traces show original source
6. Add breadcrumbs:
- Log navigation changes
- Log API calls with response status
- Log user actions (clicks, form submissions)
tests:
- Unit: Test Sentry capture in error scenarios
- Integration: Trigger error, verify appears in Sentry
- Alert: Verify alert fires within 1 minute of error
acceptance_criteria:
- 100% of unhandled exceptions captured in Sentry
- All errors include user context, request URL, and environment
- Source maps working → stack traces show original TypeScript
- Alert fired within 60 seconds of first occurrence
- No duplicate alerts for same error (grouping working)
- Error rate dashboard showing trends over time
validation:
- Deploy with intentional bug → error appears in Sentry within 30s
- Check alert channel → notification received
- View error detail → correct file, line number, user context
notes:
- Sentry free tier: 5k errors/month — may need paid plan for scale
- Use Sentry releases to track which deploy introduced errors
- Consider integrating with GitHub for suspect commits

@@ -0,0 +1,70 @@
# 11. Application Metrics & Dashboards
meta:
id: web-production-11
feature: web-production
priority: P2
depends_on: []
tags: [observability, metrics, production]
objective:
- Collect and visualize application metrics for performance monitoring and capacity planning
deliverables:
- Prometheus metrics endpoint
- Custom business metrics
- Grafana or Datadog dashboards
- Alerting on metric thresholds
steps:
1. Add metrics collection:
- Install prom-client for Node.js metrics
- Create web/src/server/lib/metrics.ts
- Expose /metrics endpoint for Prometheus scraping
2. Collect standard metrics:
- HTTP request duration (histogram)
- HTTP request count (counter, by status code, endpoint)
- Active connections (gauge)
- Memory usage (gauge)
- Event loop lag (gauge)
3. Collect business metrics:
- Signup rate (counter)
- Login success/failure rate (counter)
- Subscription conversions (counter)
- DarkWatch scan completions (counter)
- Alert generation rate (counter)
- Average threat score (gauge)
4. Set up dashboards:
- Grafana dashboard or Datadog dashboard
- Request latency percentiles (p50, p95, p99)
- Error rate over time
- Business funnel (landing → signup → subscribe)
- Infrastructure health (CPU, memory, DB connections)
5. Configure alerts:
- p99 latency > 500ms for 5 minutes
- Error rate > 1% for 2 minutes
- Memory usage > 80% for 10 minutes
- DB connection pool > 90% for 5 minutes
tests:
- Unit: Test metrics increment correctly
- Integration: Verify /metrics endpoint returns valid Prometheus format
- Dashboard: Confirm all panels show data
acceptance_criteria:
- /metrics endpoint serving valid Prometheus exposition format
- Request duration histogram with 0.1, 0.5, 1, 2, 5 second buckets
- Business metrics visible in dashboard
- Alert fires when p99 latency exceeds 500ms
- Dashboard refreshes every 10 seconds with live data
- Metrics retention for 30 days
validation:
- `curl /metrics` → valid Prometheus output
- Grafana dashboard shows request latency graph
- Trigger slow endpoint → alert fires within 5 minutes
notes:
- Prometheus + Grafana is open source and cost-effective
- Datadog is easier but more expensive
- Consider using Vercel Analytics if deployed on Vercel
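
The request-duration histogram from the acceptance criteria can be sketched by hand to show what the /metrics output looks like. This is an illustration of cumulative Prometheus buckets, not the prom-client API (which provides its own `Histogram`); the class and metric names are hypothetical.

```typescript
const BUCKET_BOUNDS_SECONDS = [0.1, 0.5, 1, 2, 5];

// Prometheus histograms are cumulative: each observation increments
// every bucket whose upper bound (le) is >= the observed value.
class DurationHistogram {
  private readonly bucketCounts = new Map<number, number>();
  private count = 0;
  private sum = 0;

  observe(seconds: number): void {
    for (const le of BUCKET_BOUNDS_SECONDS) {
      if (seconds <= le) {
        this.bucketCounts.set(le, (this.bucketCounts.get(le) ?? 0) + 1);
      }
    }
    this.count += 1;
    this.sum += seconds;
  }

  // Renders sample lines in the Prometheus text exposition format.
  render(metricName: string): string[] {
    const lines = BUCKET_BOUNDS_SECONDS.map(
      (le) => `${metricName}_bucket{le="${le}"} ${this.bucketCounts.get(le) ?? 0}`,
    );
    lines.push(`${metricName}_bucket{le="+Inf"} ${this.count}`);
    lines.push(`${metricName}_sum ${this.sum}`);
    lines.push(`${metricName}_count ${this.count}`);
    return lines;
  }
}
```

Percentile alerts such as p99 > 500ms are estimated from these buckets, so bucket bounds near the alert threshold give more precise estimates.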

@@ -0,0 +1,69 @@
# 12. Uptime & Performance Monitoring
meta:
id: web-production-12
feature: web-production
priority: P2
depends_on: []
tags: [observability, uptime, production]
objective:
- Monitor application uptime and performance from external vantage points to ensure reliability
deliverables:
- External uptime monitoring (Pingdom, UptimeRobot, or Datadog Synthetics)
- Synthetic monitoring for critical user journeys
- Performance budget enforcement
- Status page for incident communication
steps:
1. Set up uptime monitoring:
- Configure checks for homepage, API health, dashboard
- Check from multiple regions (US East, US West, EU)
- 1-minute interval checks
- Alert on 2 consecutive failures
2. Implement synthetic monitoring:
- Signup flow: homepage → signup → verify email
- Login flow: login → dashboard → view alerts
- Billing flow: dashboard → pricing → checkout (test mode)
- DarkWatch flow: dashboard → darkwatch → add watchlist item
3. Set performance budgets:
- LCP (Largest Contentful Paint) < 2.5s mobile, < 1.5s desktop
- INP (Interaction to Next Paint) < 200ms (INP replaced FID as a Core Web Vital in March 2024)
- CLS (Cumulative Layout Shift) < 0.1
- TTFB (Time to First Byte) < 200ms
- API response p95 < 200ms
4. Configure alerting:
- Downtime alert via Slack/SMS
- Performance degradation alert (LCP > 3s)
- SSL certificate expiry alert (30 days before)
- Domain expiry alert (30 days before)
5. Set up status page:
- Use statuspage.io or instatus.com
- Auto-update from monitoring checks
- Subscribe users for incident notifications
- Post incident updates and post-mortems
tests:
- Integration: Verify monitoring catches simulated outage
- Performance: Confirm synthetic tests complete successfully
- Alert: Test alert channels with deliberate failure
acceptance_criteria:
- Uptime monitoring checking every 60 seconds from 3+ regions
- 99.9% uptime SLA measured over 30 days
- Synthetic tests covering signup, login, and core flows
- Performance budget alerts for LCP > 2.5s
- Status page accessible and auto-updating
- SSL certificate expiry alert 30 days in advance
validation:
- Simulate outage → alert received within 2 minutes
- Check status page → shows incident with timeline
- Run synthetic test → completes in <30 seconds
- Lighthouse CI shows all metrics within budget
notes:
- UptimeRobot free tier: 50 monitors, 5-minute intervals (the 1-minute checks in step 1 need a paid plan)
- Pingdom more reliable but paid
- Consider using Checkly for synthetic monitoring with JS
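
Two figures in this task can be worked out directly: the downtime allowed by a 99.9% SLA over 30 days, and the "alert on 2 consecutive failures" rule. The helpers below are a sketch with hypothetical names; `results` is assumed to be ordered oldest to newest, with `true` meaning the check passed.

```typescript
// Allowed downtime in minutes for a given SLA over a period of days.
// For 99.9% over 30 days: 0.001 * 43200 minutes, about 43.2 minutes.
function downtimeBudgetMinutes(slaFraction: number, days: number): number {
  return (1 - slaFraction) * days * 24 * 60;
}

// Alert only when the most recent `threshold` checks all failed,
// which filters out single transient failures.
function shouldAlert(results: boolean[], threshold: number = 2): boolean {
  const tail = results.slice(-threshold);
  return tail.length === threshold && tail.every((passed) => !passed);
}
```

With 1-minute checks and a threshold of 2, an outage is detected after roughly two minutes, which matches the validation target of an alert within 2 minutes.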

@@ -0,0 +1,72 @@
# 13. GitHub Actions CI Pipeline
meta:
id: web-production-13
feature: web-production
priority: P1
depends_on: [web-production-17, web-production-18, web-production-19, web-production-20]
tags: [cicd, automation, production]
objective:
- Build a comprehensive CI pipeline that runs tests, linting, type checking, and security scans on every pull request
deliverables:
- GitHub Actions workflow files
- PR checks for web and browser-ext
- Test reporting and coverage
- Dependency vulnerability scanning
steps:
1. Create .github/workflows/ci.yml:
- Trigger on pull_request and push to main
- Set up Node.js 22 with pnpm
- Install dependencies with frozen lockfile
2. Add job: lint-and-typecheck:
- Run `pnpm lint` (tsc --noEmit)
- Run `pnpm lint:ext`
- Fail on any TypeScript errors
3. Add job: test:
- Run `pnpm test` (vitest for web)
- Run `pnpm test:ext` (vitest for browser-ext)
- Generate coverage reports with @vitest/coverage-v8
- Upload coverage to Codecov or similar
4. Add job: build:
- Run `pnpm build` for web
- Run `pnpm build:ext` for browser-ext
- Verify build artifacts exist
5. Add job: security-scan:
- Run `pnpm audit` with --audit-level=high
- Post `pnpm audit --fix` suggestions as a PR comment
- Add OWASP dependency check
6. Add job: docker-build:
- Build scheduler Dockerfile
- Verify Docker image builds successfully
7. Configure branch protection:
- Require all checks to pass before merge
- Require 1 reviewer approval
- Require up-to-date branch before merge
tests:
- Integration: Create test PR, verify all checks run
- Security: Introduce vulnerable dependency, verify scan catches it
- Build: Verify build artifacts are created
acceptance_criteria:
- All PRs trigger CI pipeline automatically
- Lint, typecheck, test, build, and security jobs run in parallel
- Failing tests block PR merge
- Coverage report uploaded for every PR
- Security vulnerabilities (high+) block PR merge
- Docker build verified on every PR
- Pipeline completes in <10 minutes
validation:
- Open test PR → all checks green
- Introduce TypeScript error → lint job fails
- Add vulnerable package → security scan fails
- Check Codecov → coverage diff visible in PR
notes:
- Use pnpm/action-setup for proper pnpm installation
- Cache the pnpm store between runs for speed (e.g., `cache: pnpm` in actions/setup-node) rather than node_modules
- Consider using GitHub Actions matrix for multiple Node versions

@@ -0,0 +1,75 @@
# 14. Automated Deployment Pipeline
meta:
id: web-production-14
feature: web-production
priority: P1
depends_on: [web-production-13, web-production-15, web-production-16]
tags: [cicd, deployment, production]
objective:
- Build automated deployment pipelines for staging and production environments with rollback capability
deliverables:
- Staging deployment on merge to main
- Production deployment with manual approval
- Database migration automation
- Rollback strategy
steps:
1. Create .github/workflows/deploy-staging.yml:
- Trigger on push to main
- Build web application
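- Run database migrations (`drizzle-kit migrate` against generated migration files; `drizzle-kit push` skips migration files and is meant for prototyping)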
- Deploy to staging environment (Vercel, Railway, or VPS)
- Run smoke tests against staging
2. Create .github/workflows/deploy-production.yml:
- Trigger on release published or manual dispatch
- Require manual approval from 1 team member
- Build and tag Docker image
- Run database migrations in dry-run first
- Deploy to production with blue-green or rolling strategy
- Run post-deploy smoke tests
3. Implement database migration safety:
- Migrations run before app deployment
- Backward-compatible migrations only (add columns, don't drop)
- Migration rollback script for each migration
- Database backup before production migration
4. Add deployment notifications:
- Slack notification on deploy start, success, failure
- Include commit SHA, author, and changelog
5. Implement rollback:
- One-click rollback to previous release
- Database migration rollback (if safe)
- CDN cache purge on rollback
6. Add smoke tests:
- Test homepage loads
- Test login API responds
- Test health endpoint
- Test critical user journey with Playwright
tests:
- Integration: Deploy to staging, verify app functional
- Rollback: Trigger rollback, verify previous version restored
- Migration: Test migration failure doesn't break deployment
acceptance_criteria:
- Every merge to main auto-deploys to staging
- Production deploy requires manual approval
- Database migrations run automatically before app start
- Rollback completes in <5 minutes
- Smoke tests pass before marking deploy successful
- Deployment notifications sent to Slack
- Zero-downtime deployment for web app
validation:
- Merge PR → staging deploys automatically within 5 minutes
- Trigger production deploy → approval gate shown
- Approve → production deploys, smoke tests pass
- Introduce bug → rollback to previous version in <5 minutes
notes:
- Vercel offers automatic preview deployments per PR
- For VPS deployment, use Docker Compose with rolling restart
- Consider using GitHub Environments for approval gates
- Database migrations should be additive-only in production
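
The "backward-compatible migrations only" rule in step 3 could be enforced with a small pre-deploy guard. The following is a minimal sketch, not part of the plan itself: `checkMigration` and the pattern list are hypothetical, and splitting on `;` is a simplification that ignores semicolons inside string literals or comments.

```typescript
// Hypothetical pre-deploy guard: flags SQL statements that are not additive.
// The pattern list is an illustrative assumption, not an exhaustive policy.
const DESTRUCTIVE_PATTERNS: RegExp[] = [
  /\bdrop\s+(table|column|index|schema)\b/i,
  /\balter\s+table\b[^;]*\bdrop\b/i,
  /\balter\s+table\b[^;]*\brename\b/i,
  /\balter\s+table\b[^;]*\balter\s+column\b[^;]*\btype\b/i,
  /\btruncate\b/i,
];

export interface MigrationCheck {
  safe: boolean;
  violations: string[];
}

export function checkMigration(sql: string): MigrationCheck {
  // Naive statement split; good enough for simple generated migrations.
  const statements = sql
    .split(";")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  // A statement is a violation if any destructive pattern matches it.
  const violations = statements.filter((stmt) =>
    DESTRUCTIVE_PATTERNS.some((pattern) => pattern.test(stmt)),
  );
  return { safe: violations.length === 0, violations };
}
```

A CI step could run this over each new migration file and stop the production workflow when `safe` is false, leaving a human to decide whether the destructive change is intentional.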

@@ -0,0 +1,75 @@
# 15. Docker & Infrastructure Optimization
meta:
id: web-production-15
feature: web-production
priority: P2
depends_on: []
tags: [infrastructure, docker, production]
objective:
- Optimize Docker images and infrastructure for production deployment with security and efficiency
deliverables:
- Multi-stage optimized Dockerfile for web app
- Docker Compose for local production simulation
- Infrastructure as Code (Terraform or Pulumi)
- Security scanning for Docker images
steps:
1. Create optimized Dockerfile for web app:
- Multi-stage build (deps → build → runtime)
- Use node:22-alpine for minimal image size
- Run as non-root user
- Copy only necessary files to runtime stage
- Health check in Dockerfile
2. Optimize scheduler Dockerfile:
- Reduce image size (currently copies many files)
- Use .dockerignore to exclude unnecessary files
- Pin base image versions
3. Create docker-compose.prod.yml:
- Web app service with replicas
- Redis service with persistence
- PostgreSQL service (or external)
- Nginx reverse proxy with SSL termination
- Watchtower for automatic updates
4. Add security scanning:
- Trivy or Snyk scan in CI pipeline
- Fail build on CRITICAL vulnerabilities
- Weekly automated scan of production images
5. Implement Infrastructure as Code:
- Terraform configuration for AWS/GCP/Vultr
- VPC, subnets, security groups
- ECS/Fargate or Kubernetes deployment
- Load balancer with SSL
- RDS/Cloud SQL for PostgreSQL
- ElastiCache/Memorystore for Redis
6. Add environment-specific configs:
- Production nginx.conf with rate limiting
- SSL certificate management (Let's Encrypt)
- Firewall rules
tests:
- Integration: Build image, verify size <200MB
- Security: Trivy scan shows no CRITICAL vulnerabilities
- Deploy: Terraform apply creates infrastructure
acceptance_criteria:
- Web Docker image <200MB compressed
- Scheduler Docker image <150MB compressed
- No CRITICAL vulnerabilities in image scans
- docker-compose.prod.yml runs full stack locally
- Terraform creates reproducible infrastructure
- Nginx reverse proxy with SSL and rate limiting
- Non-root user running containers
validation:
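- `docker images` → web image <200MB (this reports uncompressed size; check the registry for the compressed size named in the acceptance criteria)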
- `trivy image kordant-web` → no CRITICAL
- `docker compose -f docker-compose.prod.yml up` → full stack running
- `terraform plan` → no unexpected changes
notes:
- Current scheduler/Dockerfile copies many source files — optimize with .dockerignore
- Consider using distroless images for even smaller footprint
- Use AWS Fargate or Google Cloud Run for serverless containers
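
The "fail build on CRITICAL vulnerabilities" rule in step 4 can be sketched as a filter over a scan report. This assumes a report produced with `trivy image --format json`; the interfaces below model only the fields used here and are an assumption about that output, not the full schema. `findBlocking` is a hypothetical helper name.

```typescript
// Partial, assumed shape of a Trivy JSON report (only the fields used below).
interface TrivyVulnerability {
  VulnerabilityID: string;
  PkgName: string;
  Severity: string;
}

interface TrivyResult {
  Target: string;
  Vulnerabilities?: TrivyVulnerability[];
}

interface TrivyReport {
  Results?: TrivyResult[];
}

// Returns every finding whose severity is in the blocking list.
export function findBlocking(
  report: TrivyReport,
  blocking: string[] = ["CRITICAL"],
): TrivyVulnerability[] {
  const found: TrivyVulnerability[] = [];
  for (const result of report.Results ?? []) {
    // Targets without findings omit the Vulnerabilities field entirely.
    for (const vuln of result.Vulnerabilities ?? []) {
      if (blocking.indexOf(vuln.Severity) !== -1) found.push(vuln);
    }
  }
  return found;
}
```

For the plain pass/fail case, Trivy's own `--severity CRITICAL --exit-code 1` flags are simpler; a filter like this is mainly useful when the findings also feed a report or a Slack message.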

@@ -0,0 +1,75 @@
# 16. Environment Management & Secrets Rotation
meta:
id: web-production-16
feature: web-production
priority: P1
depends_on: []
tags: [security, infrastructure, production]
objective:
- Implement secure environment variable management and automated secrets rotation
deliverables:
- Environment variable validation on startup
- Secrets manager integration (AWS Secrets Manager, Doppler, or 1Password)
- Automated secrets rotation
- Environment documentation
steps:
1. Create environment validation:
- Create web/src/server/lib/env.ts with Zod/Valibot schema
- Validate all required env vars on server startup
- Fail fast with clear error messages for missing vars
- Type-safe env access throughout codebase
2. Migrate to secrets manager:
- Set up Doppler or AWS Secrets Manager
- Move DATABASE_URL, JWT_SECRET, STRIPE_SECRET_KEY, CLERK_SECRET_KEY to secrets manager
- Remove secrets from .env files in production
- Use short-lived tokens where possible
3. Implement secrets rotation:
- JWT secret: rotate quarterly
- Database credentials: rotate monthly
- Stripe keys: rotate after any suspected leak
- API keys: rotate every 6 months
- Automated rotation scripts
4. Add environment documentation:
- Document all environment variables in docs/ENVIRONMENT.md
- Mark required vs optional
- Include examples and validation rules
- Document secrets rotation schedule
5. Secure local development:
- .env.example with dummy values
- .env.local in .gitignore
- Pre-commit hook to prevent secret commits
- Use 1Password CLI or Doppler CLI for local secrets
6. Audit existing secrets:
- Scan git history for leaked secrets (git-secrets, truffleHog)
- Rotate any potentially leaked secrets
- Enable GitHub secret scanning
tests:
- Unit: Test env validation catches missing vars
- Security: Verify no secrets in codebase with scanner
- Integration: Test secrets manager integration
acceptance_criteria:
- Server fails to start with clear error if required env var missing
- Zero secrets in codebase or git history
- All production secrets stored in secrets manager
- Rotation schedule documented and automated
- Environment documentation complete and accurate
- GitHub secret scanning enabled
- Pre-commit hooks preventing secret commits
validation:
- Remove DATABASE_URL → server exits with clear error
- Run truffleHog → no secrets found in history
- Check secrets manager → all production secrets stored
- Run rotation script → new JWT secret generated, app continues working
notes:
- Doppler is excellent for team secret management
- AWS Secrets Manager integrates well with ECS/Fargate
- Never commit .env files — use .env.example only
- Consider using sealed secrets for Kubernetes
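
The fail-fast validation in step 1 might look like the sketch below. The plan calls for a Zod or Valibot schema; this version uses hand-written checks only so that it stays dependency-free, and `validateEnv` is a hypothetical name. The variable names are the ones listed in step 2.

```typescript
// Variable names come from step 2; the "non-empty string" rule is an assumption.
const REQUIRED_VARS = [
  "DATABASE_URL",
  "JWT_SECRET",
  "STRIPE_SECRET_KEY",
  "CLERK_SECRET_KEY",
] as const;

type RequiredVar = (typeof REQUIRED_VARS)[number];
export type Env = Record<RequiredVar, string>;

export function validateEnv(source: Record<string, string | undefined>): Env {
  // Collect every missing variable so the error lists them all at once.
  const missing = REQUIRED_VARS.filter((name) => {
    const value = source[name];
    return value === undefined || value.trim() === "";
  });
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const env = {} as Env;
  for (const name of REQUIRED_VARS) {
    env[name] = source[name] as string;
  }
  return env;
}

// Usage at server startup, before anything else is initialised:
// const env = validateEnv(process.env);
```

Reporting all missing variables in one error, rather than stopping at the first, saves a restart cycle per variable when a new environment is being set up.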

@@ -0,0 +1,73 @@
# 17. End-to-End Testing (Playwright)
meta:
id: web-production-17
feature: web-production
priority: P1
depends_on: []
tags: [testing, e2e, quality]
objective:
- Implement comprehensive end-to-end tests covering critical user journeys using Playwright
deliverables:
- Playwright test suite for critical flows
- Test database seeding and cleanup
- Visual regression testing setup
- CI integration for E2E tests
steps:
1. Install and configure Playwright:
- Install @playwright/test in web/package.json
- Create playwright.config.ts with project settings
- Configure test database (separate from dev)
2. Create test utilities:
- Test user creation helper
- Database reset between tests
- Authentication state management
- API mocking helpers
3. Write critical path tests:
- Landing page → Signup → Onboarding → Dashboard
- Login → Dashboard → DarkWatch → Add watchlist item
- Login → Settings → Update profile
- Login → Billing → View pricing → Checkout (test mode)
- Admin login → Blog → Create post → Publish
- Real-time alerts: WebSocket connection and alert display
4. Add visual regression tests:
- Screenshot comparison for landing page
- Screenshot comparison for dashboard
- Screenshot comparison for mobile responsive layout
5. Configure test data:
- Seed test database with known data
- Use test Stripe keys for billing tests
- Mock external APIs (Twilio, FCM) in tests
6. Add CI integration:
- Run E2E tests on PR (not blocking initially)
- Upload test artifacts (screenshots, videos)
- Parallel test execution across browsers
tests:
- E2E: All critical paths pass in CI
- Visual: Screenshot diffs reviewed and approved
- Cross-browser: Tests pass on Chromium, Firefox, WebKit
acceptance_criteria:
- 10+ E2E tests covering critical user journeys
- Tests run in <5 minutes with parallel execution
- Tests pass on Chromium, Firefox, and WebKit
- Visual regression catching UI changes
- Test artifacts (screenshots, videos) uploaded on failure
- Tests use isolated test database
- Mobile viewport tests included
validation:
- `npx playwright test` → all tests pass
- CI pipeline runs E2E tests on PR
- Change button color → visual regression test fails
- Check test report → screenshots and traces available
notes:
- Playwright covers Chromium, Firefox, and WebKit with built-in parallel execution; it is generally considered faster and less flaky than Cypress
- Use test database to avoid polluting dev data
- Start with 5 critical paths, expand over time
- Consider using MSW for API mocking in tests
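
A starting point for the `playwright.config.ts` from step 1 could look like this. It is a sketch under assumptions: the `./e2e` test directory, the `E2E_BASE_URL` variable, and port 3000 are placeholders rather than values taken from the repository.

```typescript
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./e2e", // assumed location of the E2E specs
  fullyParallel: true, // run tests within a file in parallel as well
  retries: process.env.CI ? 2 : 0, // retry only in CI to surface flakiness locally
  reporter: [["html", { open: "never" }]],
  use: {
    // E2E_BASE_URL and the port are assumptions for illustration.
    baseURL: process.env.E2E_BASE_URL ?? "http://localhost:3000",
    trace: "on-first-retry",
    screenshot: "only-on-failure",
    video: "retain-on-failure",
  },
  projects: [
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
    { name: "firefox", use: { ...devices["Desktop Firefox"] } },
    { name: "webkit", use: { ...devices["Desktop Safari"] } },
    // Mobile viewport project for the responsive layout checks.
    { name: "mobile-chrome", use: { ...devices["Pixel 5"] } },
  ],
});
```

The three desktop projects map to the cross-browser acceptance criterion, and the `use` options produce the screenshots, videos, and traces that the CI step uploads on failure. Visual regression checks can then use `expect(page).toHaveScreenshot()` inside individual tests.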

@@ -0,0 +1,78 @@
# 18. Load & Stress Testing
meta:
id: web-production-18
feature: web-production
priority: P2
depends_on: []
tags: [testing, performance, production]
objective:
- Validate application performance under production-like load and identify bottlenecks
deliverables:
- Load test suite with k6 or Artillery
- Performance baseline documentation
- Bottleneck identification report
- Scaling recommendations
steps:
1. Set up load testing tool:
- Install k6 or Artillery
- Create tests/ directory for load tests
- Configure test environment (staging)
2. Write load tests for critical endpoints:
- GET / (landing page)
- POST /api/trpc/user.login
- GET /api/trpc/user.me (authenticated)
- GET /api/trpc/darkwatch.getExposures
- GET /api/trpc/alerts.getAlerts
- WebSocket connection and alert subscription
3. Define load scenarios:
- Baseline: 100 concurrent users, 5 minutes
- Target: 1000 concurrent users, 10 minutes
- Stress: 5000 concurrent users, 5 minutes
- Spike: 0 to 2000 users in 10 seconds
4. Measure and record:
- Response time percentiles (p50, p95, p99)
- Error rate
- Requests per second (throughput)
- CPU and memory usage on server
- Database connection pool utilization
- Redis memory usage
5. Identify bottlenecks:
- Slow queries from database
- Memory leaks
- Connection pool exhaustion
- CPU-bound operations
6. Document scaling recommendations:
- Horizontal scaling (more instances)
- Vertical scaling (bigger instances)
- Caching improvements
- Query optimization
tests:
- Load: Baseline test passes with <200ms p95
- Stress: App remains functional under 5x normal load
- Spike: App recovers within 30 seconds after spike
acceptance_criteria:
- Baseline load (100 concurrent) → p95 < 200ms, 0% errors
- Target load (1000 concurrent) → p95 < 500ms, <1% errors
- Stress load (5000 concurrent) → no crashes, <5% errors
- Spike test → recovery within 30 seconds
- Performance baseline documented with metrics
- Bottleneck report with actionable recommendations
- Scaling plan documented
validation:
- Run k6 against staging → results within acceptable thresholds
- Check server metrics during test → CPU <80%, memory <80%
- Database connections → pool not exhausted
- Review report → identified 3+ bottlenecks with fixes
notes:
- Always test against staging, never production
- Schedule load tests during low-traffic periods
- Use k6 Cloud for distributed load testing if needed
- Consider using Vercel Analytics for real-user monitoring (RUM)
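
The thresholds in the acceptance criteria can be checked mechanically once latency samples are exported from the load tool. The sketch below uses a nearest-rank percentile; the `ScenarioResult` shape and the function names are hypothetical, and the error-rate limit is treated as inclusive, which slightly simplifies the "<1%" wording.

```typescript
export interface ScenarioResult {
  name: string;
  latenciesMs: number[];
  requests: number;
  errors: number;
}

export interface ScenarioThreshold {
  maxP95Ms?: number; // exclusive upper bound; omitted for the stress scenario
  maxErrorRate: number; // inclusive limit (a simplification)
}

// Values taken from the acceptance criteria above.
export const THRESHOLDS: Record<string, ScenarioThreshold> = {
  baseline: { maxP95Ms: 200, maxErrorRate: 0 },
  target: { maxP95Ms: 500, maxErrorRate: 0.01 },
  stress: { maxErrorRate: 0.05 },
};

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
export function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile needs at least one sample");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

export function evaluate(result: ScenarioResult, threshold: ScenarioThreshold) {
  const p95 = percentile(result.latenciesMs, 95);
  const errorRate = result.requests === 0 ? 0 : result.errors / result.requests;
  const latencyOk = threshold.maxP95Ms === undefined || p95 < threshold.maxP95Ms;
  return { p95, errorRate, passed: latencyOk && errorRate <= threshold.maxErrorRate };
}
```

k6 can enforce similar limits natively through its thresholds option; a standalone check like this is mostly useful for the baseline documentation, where the same numbers need to be recomputed from stored results.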

@@ -0,0 +1,78 @@
# 19. Accessibility Audit & WCAG Compliance
meta:
id: web-production-19
feature: web-production
priority: P2
depends_on: []
tags: [testing, accessibility, compliance]
objective:
- Ensure the web application meets WCAG 2.1 AA standards and is usable by people with disabilities
deliverables:
- Automated accessibility testing with axe-core
- Manual keyboard navigation audit
- Screen reader testing
- Accessibility statement page
steps:
1. Set up automated accessibility testing:
- Install axe-core with jest-axe for component tests (@axe-core/react is React-only and does not fit the SolidJS components)
- Add accessibility tests to component test suite
- Integrate axe-core with Playwright E2E tests
- Fail build on critical accessibility violations
2. Run automated audit:
- Test all pages: landing, auth, dashboard, settings
- Check for: missing alt text, low contrast, missing labels, focus issues
- Generate report with violation severity
3. Manual keyboard navigation audit:
- Navigate entire app using only Tab, Enter, Space, Escape
- Verify focus indicators visible on all interactive elements
- Test skip links and logical tab order
- Verify no keyboard traps
4. Screen reader testing:
- Test with NVDA (Windows) or VoiceOver (macOS)
- Verify all interactive elements have accessible names
- Test live regions for dynamic content (alerts, toasts)
- Verify form error messages announced
5. Fix critical issues:
- Add missing aria-labels and aria-describedby
- Fix color contrast ratios (minimum 4.5:1 for normal text)
- Ensure all images have alt text
- Add proper heading hierarchy (h1 → h2 → h3)
6. Create accessibility statement:
- Page at /accessibility
- Commitment to WCAG 2.1 AA
- Known limitations
- Contact for accessibility feedback
7. Add accessibility CI check:
- Lighthouse accessibility audit ≥ 95
- axe-core scan in CI pipeline
tests:
- Automated: axe-core scan passes with 0 critical or serious violations
- Manual: Keyboard navigation completes all flows
- Screen reader: All critical paths navigable
acceptance_criteria:
- WCAG 2.1 AA compliance on all pages
- Lighthouse accessibility score ≥ 95
- 0 critical or serious axe-core violations
- All interactive elements keyboard accessible
- Focus indicators visible and logical
- All images have descriptive alt text
- Color contrast ratios ≥ 4.5:1 for normal text
- Accessibility statement page live
validation:
- Run axe-core → 0 critical/serious violations
- Lighthouse CI → Accessibility score ≥ 95
- Navigate with keyboard only → complete signup flow
- Screen reader test → all elements announced correctly
notes:
- Current app has some accessibility features (skip link, aria-live) but needs audit
- SolidJS components need proper aria attributes
- Consider using Kobalte for built-in accessibility (Radix UI primitives are React-only)
- Test with actual assistive technology, not just automated tools
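
The 4.5:1 contrast requirement in step 5 follows the WCAG 2.x relative-luminance formula, which is short enough to check in a unit test. The sketch below implements that formula; the function names are hypothetical and only 6-digit hex colors are handled.

```typescript
// Parses "#rrggbb" (leading "#" optional) into 0-255 channel values.
export function parseHex(hex: string): [number, number, number] {
  const match = /^#?([0-9a-f]{6})$/i.exec(hex.trim());
  if (!match) throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  const value = parseInt(match[1], 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

// Converts an sRGB channel to linear light, per the WCAG 2.x definition.
function channelToLinear(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

export function relativeLuminance(hex: string): number {
  const [r, g, b] = parseHex(hex);
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// Contrast ratio = (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
export function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// WCAG 2.1 AA threshold for normal-size text.
export function meetsAANormalText(foreground: string, background: string): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}
```

Running the design tokens through a check like this catches contrast regressions before the axe-core scan does, although it does not replace the scan, which evaluates rendered elements rather than token pairs.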

@@ -0,0 +1,71 @@
# 20. Dependency Vulnerability Scanning
meta:
id: web-production-20
feature: web-production
priority: P1
depends_on: []
tags: [security, dependencies, production]
objective:
- Implement continuous dependency vulnerability scanning and automated updates
deliverables:
- npm audit integration in CI
- Snyk or Dependabot monitoring
- Automated security patch PRs
- SBOM (Software Bill of Materials) generation
steps:
1. Set up automated scanning:
- Enable Dependabot alerts in GitHub repository settings
- Configure Dependabot version updates (weekly)
- Add Snyk integration for deeper analysis
- Configure Snyk to fail builds on high+ severity
2. Add CI scanning:
- `pnpm audit --audit-level=high` in GitHub Actions
- `snyk test` in CI pipeline
- Block PR merge on high/critical vulnerabilities
3. Implement automated patching:
- Dependabot auto-PR for patch updates
- Snyk auto-fix PRs for fixable vulnerabilities
- Manual review required for major version updates
4. Generate SBOM:
- Use cyclonedx or spdx-sbom-generator
- Generate on every release
- Store with release artifacts
5. Audit current dependencies:
- Run `pnpm audit` and fix all high/critical issues
- Check for unmaintained packages
- Review direct dependencies for necessity
- Remove unused dependencies
6. Set up alerting:
- Slack notification for new vulnerabilities
- Weekly vulnerability report
- Emergency alert for critical CVEs
tests:
- Security: Introduce vulnerable package → CI blocks merge
- Integration: Verify Dependabot creates PR for outdated package
- Audit: SBOM generated and contains all dependencies
acceptance_criteria:
- Zero high or critical vulnerabilities in dependencies
- Dependabot monitoring all dependencies
- CI fails on high+ severity vulnerabilities
- SBOM generated for every release
- Automated PRs for security patches within 24 hours
- Weekly dependency update report
- All unused dependencies removed
validation:
- `pnpm audit` → 0 high/critical findings
- Check GitHub Security tab → no open alerts
- Open PR with vulnerable package → CI fails and merge is blocked
- Create release → SBOM artifact attached
notes:
- Some vulnerabilities may be in devDependencies — these are lower priority
- Focus on production dependencies first
- Consider using pnpm overrides for emergency patches
- Review major version updates carefully for breaking changes
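
The "block on high+ severity" rule from step 2 reduces to a threshold over severity counts. The sketch below assumes the counts have already been extracted from the audit tool's JSON output; where exactly they sit in that output depends on the pnpm version and should be checked. `blockingFindings` is a hypothetical helper.

```typescript
// Severity levels in ascending order, matching the levels accepted by --audit-level.
const SEVERITY_ORDER = ["info", "low", "moderate", "high", "critical"] as const;

export type Severity = (typeof SEVERITY_ORDER)[number];
export type SeverityCounts = Partial<Record<Severity, number>>;

// Sums the findings at or above the given threshold.
export function blockingFindings(counts: SeverityCounts, threshold: Severity): number {
  const start = SEVERITY_ORDER.indexOf(threshold);
  return SEVERITY_ORDER.slice(start).reduce(
    (sum, level) => sum + (counts[level] ?? 0),
    0,
  );
}
```

`pnpm audit --audit-level=high` already applies this rule for the CI exit code; a helper like this is more relevant to the weekly vulnerability report, where the same counts are summarised for Slack.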

@@ -0,0 +1,78 @@
# 21. Privacy Policy, TOS & Legal Pages
meta:
id: web-production-21
feature: web-production
priority: P2
depends_on: []
tags: [compliance, legal, production]
objective:
- Create and deploy all required legal pages for production operation
deliverables:
- Privacy Policy page (/privacy)
- Terms of Service page (/terms)
- Cookie Policy page (/cookies)
- Data Processing Agreement (DPA) page
- Legal pages linked in footer
steps:
1. Create Privacy Policy:
- Data collection practices (what, why, how long)
- Third-party services (Stripe, Clerk, Twilio, Firebase)
- User rights (access, rectification, deletion, portability)
- Contact information for privacy inquiries
- Last updated date
2. Create Terms of Service:
- Service description and limitations
- User responsibilities and prohibited conduct
- Subscription terms and billing
- Termination clauses
- Limitation of liability
- Dispute resolution
3. Create Cookie Policy:
- Types of cookies used (essential, analytics, marketing)
- Purpose of each cookie
- How to manage cookies
- Third-party cookies
4. Create Data Processing Agreement:
- Roles and responsibilities
- Data security measures
- Subprocessor list
- Breach notification procedures
5. Add legal pages to app:
- Create routes: /privacy, /terms, /cookies, /dpa
- Add links in Footer component
- Ensure pages are server-rendered for SEO
6. Review with legal counsel:
- Have privacy policy reviewed by attorney
- Ensure compliance with applicable jurisdictions
- Update based on feedback
tests:
- Unit: Test routes render correctly
- Integration: Verify links in footer navigate correctly
- Compliance: Review with legal counsel
acceptance_criteria:
- Privacy Policy live at /privacy
- Terms of Service live at /terms
- Cookie Policy live at /cookies
- DPA live at /dpa
- All pages linked in site footer
- Pages reviewed and approved by legal counsel
- Last updated date within 30 days of launch
- Contact email for privacy inquiries functional
validation:
- Navigate to /privacy → complete policy displayed
- Click footer links → correct pages load
- Legal counsel approval documented
- Email to privacy@kordant.com → received
notes:
- Consider using Termly or iubenda for generated policies
- Ensure policies cover all data processors (Stripe, Clerk, etc.)
- Update policies when adding new third-party services
- Keep records of user consent to terms

@@ -0,0 +1,80 @@
# 22. Cookie Consent & GDPR Compliance
meta:
id: web-production-22
feature: web-production
priority: P2
depends_on: []
tags: [compliance, gdpr, cookies, production]
objective:
- Implement GDPR-compliant cookie consent with granular controls and data processing transparency
deliverables:
- Cookie consent banner component
- Granular cookie preference management
- Consent storage and enforcement
- GDPR compliance verification
steps:
1. Create cookie consent banner:
- Banner appears on first visit
- Accept all, reject non-essential, customize options
- Links to cookie policy
- Dismissible but persistent until choice made
- Mobile-responsive design
2. Implement granular controls:
- Essential cookies (always on): auth, security
- Analytics cookies (opt-in): PostHog, Plausible
- Marketing cookies (opt-in): retargeting, ads
- Preference cookies (opt-in): theme, language
3. Create preference modal:
- Toggle switches for each category
- Description of each cookie type
- Save preferences button
- Re-openable from footer link
4. Implement consent enforcement:
- Store consent in cookie/localStorage
- Block analytics scripts until consent given
- Block marketing scripts until consent given
- Respect "Do Not Track" browser setting
5. Add GDPR-specific features:
- Data processing notice in signup flow
- Right to access data (export tool)
- Right to erasure (delete account)
- Right to portability (data export)
- Data retention periods documented
6. Add consent logging:
- Log consent choices with timestamp
- Store for compliance audit trail
- Allow users to view their consent history
tests:
- Unit: Test consent banner rendering and interaction
- Integration: Test analytics blocked until consent
- Compliance: Verify DNT respected
acceptance_criteria:
- Cookie banner appears on first visit to all users
- Users can accept, reject, or customize cookie preferences
- Analytics scripts load only after opt-in consent
- Marketing scripts load only after opt-in consent
- Essential cookies function without consent
- Consent preferences persist across sessions
- "Do Not Track" browser setting respected
- Consent choice logged with timestamp
- GDPR rights accessible from settings page
- Cookie policy linked from banner and footer
validation:
- Clear cookies → visit site → banner appears
- Click "Reject" → analytics network requests blocked
- Click "Customize" → toggle analytics on → requests allowed
- Enable DNT in browser → banner shows "DNT detected"
- Check localStorage → consent object stored
notes:
- Use CookieConsent by Orestbida or build custom with SolidJS
- Must comply with both GDPR (EU) and CCPA (California)
- Analytics must be completely blocked, not just paused
- Retain consent records for 2 years (confirm the exact retention period with legal counsel; GDPR requires demonstrable consent but sets no fixed duration)
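
The enforcement rules in steps 2 and 4 can be captured in one decision function. This is a sketch under assumptions: the type and function names are hypothetical, and treating Do Not Track as overriding an explicit opt-in for analytics and marketing is one possible reading of "respect DNT", not a settled policy.

```typescript
export type CookieCategory = "essential" | "analytics" | "marketing" | "preferences";

export interface ConsentChoices {
  analytics: boolean;
  marketing: boolean;
  preferences: boolean;
}

export interface ConsentRecord extends ConsentChoices {
  decidedAt: string; // ISO 8601 timestamp, kept for the audit trail
}

// Decides whether scripts or cookies of a category may load.
export function isAllowed(
  category: CookieCategory,
  consent: ConsentRecord | null,
  doNotTrack: boolean,
): boolean {
  if (category === "essential") return true; // always on
  if (consent === null) return false; // no choice yet: block everything optional
  // Assumption: DNT wins over an opt-in for tracking categories.
  if (doNotTrack && (category === "analytics" || category === "marketing")) return false;
  return consent[category];
}

// Stamps the user's choices so they can be stored and logged.
export function recordConsent(choices: ConsentChoices, now: Date = new Date()): ConsentRecord {
  return { ...choices, decidedAt: now.toISOString() };
}
```

Script loaders would call `isAllowed` before injecting any analytics or marketing tag, which keeps those requests fully blocked rather than merely paused.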

@@ -0,0 +1,76 @@
# 23. Data Export & Deletion Tools
meta:
id: web-production-23
feature: web-production
priority: P2
depends_on: []
tags: [compliance, gdpr, privacy, production]
objective:
- Implement user-facing data export and account deletion tools to comply with GDPR and CCPA requirements
deliverables:
- Data export API and UI (/settings/data-export)
- Account deletion API and UI (/settings/delete-account)
- Data retention policy enforcement
- Deletion confirmation and grace period
steps:
1. Create data export functionality:
- API endpoint: POST /api/trpc/user.exportData
- Collect all user data: profile, alerts, exposures, subscriptions, family members
- Format as JSON or machine-readable format
- Include metadata: export date, data categories
- Email download link or provide direct download
- Complete within 30 days (GDPR requirement)
2. Create account deletion:
- UI in settings page with warning and confirmation
- Require password re-entry for confirmation
- API endpoint: POST /api/trpc/user.delete
- Soft delete first (mark deletedAt, anonymize)
- Hard delete after 30-day grace period
- Cancel active subscriptions via Stripe
- Remove from email lists
3. Implement family data handling:
- If family group owner: transfer ownership or delete group
- If family member: remove from group
- Notify family members of account deletion
4. Add data retention policy:
- Define retention periods per data type
- Automated cleanup of deleted accounts after 30 days
- Audit logs retained for 1 year
- Backup deletion after retention period
5. Add admin tools:
- Admin endpoint to fulfill data export requests
- Admin endpoint to process deletion requests
- Audit log of all export/deletion actions
tests:
- Unit: Test export includes all user data
- Integration: Test deletion flow end-to-end
- Compliance: Verify grace period and hard delete
acceptance_criteria:
- Users can export all personal data from settings
- Export includes: profile, alerts, exposures, watchlist, subscriptions, family data
- Export delivered within 30 seconds (async for large data)
- Account deletion requires password confirmation
- Deleted accounts soft-deleted immediately, hard-deleted after 30 days
- Active subscriptions cancelled on deletion
- Family group handled correctly (ownership transfer)
- Deletion audit log maintained
- Data retention policy documented and enforced
validation:
- Export data → JSON file contains all user data
- Delete account → user marked deleted, can log in to restore within 30 days
- After 30 days → user data completely removed from DB
- Check Stripe → subscription cancelled
- Check audit log → deletion action recorded
notes:
- Soft delete preserves referential integrity for family groups
- Hard delete must cascade through all related tables
- Consider GDPR Article 17 exceptions (legal obligations)
- Backup restoration may temporarily restore deleted data
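
The soft-delete and grace-period rules in steps 2 and 4 amount to a small state calculation. The sketch below is illustrative: `AccountRecord` is a hypothetical minimal shape with only the `deletedAt` field from the plan, and the grace period is measured as 30 full 24-hour days.

```typescript
const GRACE_PERIOD_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

export interface AccountRecord {
  id: string;
  deletedAt: Date | null; // null while the account is active
}

export type DeletionState = "active" | "soft_deleted" | "due_for_hard_delete";

export function deletionState(account: AccountRecord, now: Date): DeletionState {
  if (account.deletedAt === null) return "active";
  const elapsedMs = now.getTime() - account.deletedAt.getTime();
  return elapsedMs >= GRACE_PERIOD_DAYS * MS_PER_DAY ? "due_for_hard_delete" : "soft_deleted";
}

// A user may restore the account only while it is still inside the grace period.
export function canRestore(account: AccountRecord, now: Date): boolean {
  return deletionState(account, now) === "soft_deleted";
}

// IDs that the scheduled cleanup job should hard-delete on this run.
export function selectForHardDelete(accounts: AccountRecord[], now: Date): string[] {
  return accounts
    .filter((account) => deletionState(account, now) === "due_for_hard_delete")
    .map((account) => account.id);
}
```

Keeping the decision in a pure function that takes `now` as an argument makes the 30-day boundary easy to test without waiting or mocking the clock.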

@@ -0,0 +1,79 @@
# 24. Security.txt & Responsible Disclosure
meta:
id: web-production-24
feature: web-production
priority: P2
depends_on: []
tags: [security, compliance, production]
objective:
- Implement security.txt and responsible disclosure process for security researchers
deliverables:
- security.txt file at /.well-known/security.txt
- security@kordant.com email address
- Responsible disclosure policy page
- Bug bounty program foundation
steps:
1. Create security.txt:
- Contact: mailto:security@kordant.com
- Expires: date less than 1 year in the future (RFC 9116 recommendation)
- Encryption: link to PGP key (optional)
- Acknowledgments: link to hall of fame page
- Policy: link to disclosure policy
- Hiring: link to security jobs (if applicable)
2. Create responsible disclosure policy:
- Page at /security/disclosure
- Scope of testing (what's in scope, what's out)
- Rules of engagement (no DDoS, no data exfiltration)
- Safe harbor promise (won't prosecute good faith research)
- Reporting process and expected response time
- Reward/recognition program details
3. Set up security email:
- Create security@kordant.com alias
- Forward to engineering team
- Set up auto-responder with acknowledgment
- Create internal triage process
4. Create vulnerability response process:
- Internal SLA: acknowledge within 48 hours
- Triage within 72 hours
- Fix critical vulnerabilities within 7 days
- Fix high severity within 30 days
- Public disclosure after fix deployed
5. Add hall of fame page:
- Page at /security/hall-of-fame
- List researchers who reported valid vulnerabilities
- Include date, severity, and researcher name (with permission)
6. Add security page to footer:
- Link to disclosure policy
- Link to security.txt
- Link to hall of fame
tests:
- Integration: Verify security.txt accessible
- Process: Test email auto-responder
- Content: Review policy with security team
acceptance_criteria:
- security.txt accessible at /.well-known/security.txt
- Disclosure policy live at /security/disclosure
- security@kordant.com email active with auto-responder
- Hall of fame page live at /security/hall-of-fame
- Safe harbor promise clearly stated
- Response SLA documented and followed
- Security links in site footer
- PGP key available for encrypted communication (optional)
validation:
- `curl https://kordant.com/.well-known/security.txt` → valid security.txt
- Email security@kordant.com → auto-responder received
- Navigate to /security/disclosure → complete policy visible
- Check footer → security links present
notes:
- security.txt standard defined by RFC 9116
- Safe harbor is critical for encouraging responsible disclosure
- Consider joining HackerOne or Bugcrowd for managed bug bounty
- Document vulnerability severity classification (CVSS)
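
Because the `Expires` field has to be refreshed regularly, generating security.txt from code is one way to keep it current. The following is a minimal sketch: `buildSecurityTxt` and its options type are hypothetical, while the field names are those defined by RFC 9116.

```typescript
export interface SecurityTxtOptions {
  contact: string; // e.g. a mailto: URI
  expires: Date;
  encryption?: string;
  acknowledgments?: string;
  policy?: string;
  hiring?: string;
}

// Renders the fields in the order used in step 1; optional fields are skipped when absent.
export function buildSecurityTxt(options: SecurityTxtOptions): string {
  const lines: string[] = [
    `Contact: ${options.contact}`,
    `Expires: ${options.expires.toISOString()}`,
  ];
  if (options.encryption) lines.push(`Encryption: ${options.encryption}`);
  if (options.acknowledgments) lines.push(`Acknowledgments: ${options.acknowledgments}`);
  if (options.policy) lines.push(`Policy: ${options.policy}`);
  if (options.hiring) lines.push(`Hiring: ${options.hiring}`);
  return lines.join("\n") + "\n";
}
```

A route handler for `/.well-known/security.txt` could return this string with a `text/plain` content type, computing `expires` relative to the build or deploy date.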

@@ -0,0 +1,83 @@
# 25. Sitemap, Robots.txt & Open Graph
meta:
id: web-production-25
feature: web-production
priority: P2
depends_on: []
tags: [seo, marketing, production]
objective:
- Implement SEO fundamentals including sitemap, robots.txt, and Open Graph meta tags for all pages
deliverables:
- Dynamic sitemap.xml generation
- robots.txt configuration
- Open Graph meta tags on all pages
- Twitter Card meta tags
- Canonical URLs
steps:
1. Create dynamic sitemap:
- Route: /sitemap.xml
- Include all public pages: /, /about, /features, /pricing, /blog/*
- Include auth pages: /login, /signup
- Exclude admin pages and user-specific pages
- Set priorities and change frequencies
- Auto-update when blog posts published
2. Create robots.txt:
- Allow: all public pages
- Disallow: /(admin)/*, /api/*, /billing/*, /auth/*
- Sitemap reference
- Crawl-delay for respectful crawling (honored by some crawlers; Googlebot ignores it)
3. Add Open Graph tags to all pages:
- og:title matching page title
- og:description from meta description
- og:image with branded preview image (1200x630)
- og:url with canonical URL
- og:type (website, article for blog)
- og:site_name: Kordant
4. Add Twitter Card tags:
- twitter:card: summary_large_image
- twitter:title, twitter:description, twitter:image
5. Add canonical URLs:
- Prevent duplicate content issues
- Use absolute URLs with https
- Handle query parameters correctly
6. Create branded OG image:
- Design 1200x630px image with Kordant branding
- Include logo, tagline, and shield icon
- Generate dynamically for blog posts (optional)
7. Add structured data:
- Organization schema on homepage
- WebSite schema with SearchAction
- Article schema for blog posts
- SoftwareApplication schema for app
tests:
- Unit: Test sitemap XML generation
- Integration: Verify meta tags on all pages
- SEO: Test with Facebook Sharing Debugger and Twitter Card Validator
acceptance_criteria:
- Sitemap accessible at /sitemap.xml with all public pages
- robots.txt accessible at /robots.txt with correct directives
- Open Graph tags present on all public pages
- Twitter Card tags present on all public pages
- Canonical URL on every page
- Branded OG image displaying correctly in social shares
- Structured data valid per schema.org (test with Google Rich Results)
- Blog posts have Article schema
validation:
- `curl /sitemap.xml` → valid XML with all routes
- `curl /robots.txt` → correct allow/disallow directives
- Facebook Sharing Debugger → OG image and title display correctly
- Google Rich Results Test → structured data valid
- View page source → all meta tags present
notes:
- SolidJS MetaProvider already in use — extend with OG tags
- Use @solidjs/meta for dynamic meta tags per route
- Consider using @vercel/og or similar for dynamic OG images
- Blog sitemap should update automatically on publish
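
The dynamic sitemap from step 1 is essentially a list of routes rendered as XML. The sketch below shows one way to do that; `buildSitemap` and `SitemapEntry` are hypothetical names, and the caller is assumed to supply the list of public paths, including blog slugs loaded from the database.

```typescript
export interface SitemapEntry {
  path: string; // route path starting with "/"
  lastModified?: Date;
  changeFrequency?: "daily" | "weekly" | "monthly";
  priority?: number; // 0.0 to 1.0
}

// Escapes the five XML special characters so URLs with "&" stay valid.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

export function buildSitemap(origin: string, entries: SitemapEntry[]): string {
  const urls = entries.map((entry) => {
    const parts = [`    <loc>${escapeXml(origin + entry.path)}</loc>`];
    if (entry.lastModified) {
      // Date-only form (YYYY-MM-DD) is sufficient for lastmod.
      parts.push(`    <lastmod>${entry.lastModified.toISOString().slice(0, 10)}</lastmod>`);
    }
    if (entry.changeFrequency) parts.push(`    <changefreq>${entry.changeFrequency}</changefreq>`);
    if (entry.priority !== undefined) parts.push(`    <priority>${entry.priority.toFixed(1)}</priority>`);
    return `  <url>\n${parts.join("\n")}\n  </url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n") + "\n";
}
```

Building the entry list at request time from the published blog posts is what makes the sitemap update automatically on publish, as the notes require.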

@@ -0,0 +1,83 @@
# 26. Analytics Integration (Plausible/PostHog)
meta:
id: web-production-26
feature: web-production
priority: P2
depends_on: []
tags: [analytics, marketing, production]
objective:
- Implement privacy-respecting analytics to understand user behavior and measure conversion funnels
deliverables:
- Analytics tracking setup
- Custom event tracking for key actions
- Conversion funnel measurement
- Dashboard for key metrics
steps:
1. Set up analytics platform:
- Choose: Plausible (privacy-first, simple) or PostHog (powerful, self-hostable)
- Create account and add tracking script
- Configure domain and goals
2. Add tracking to app:
- Add script to web/src/entry-client.tsx or layout
- Respect cookie consent (load only after opt-in)
- Respect Do Not Track
- Exclude admin traffic
3. Track page views:
- All public pages
- Dashboard pages (anonymized)
- Blog post reads
4. Track custom events:
- signup_started, signup_completed
- login, logout
- subscription_started, subscription_completed
- darkwatch_scan_initiated
- alert_viewed, alert_resolved
- feature_page_viewed (voiceprint, spamshield, etc.)
5. Create conversion funnels:
- Landing → Signup → Onboarding → Dashboard
- Dashboard → Pricing → Checkout → Subscription
- Blog → Signup (content marketing ROI)
6. Set up dashboards:
- Daily/weekly active users
- Signup conversion rate
- Subscription conversion rate
- Feature adoption (DarkWatch, VoicePrint, etc.)
- Churn rate
- Revenue metrics (via Stripe integration)
7. Add A/B testing foundation:
- PostHog feature flags or Split.io
- Test landing page variants
- Test pricing page variants
tests:
- Integration: Verify events fire correctly
- Privacy: Confirm no PII in analytics payload
- Consent: Test analytics blocked until cookie consent
acceptance_criteria:
- Analytics tracking active on all public pages
- Custom events firing for signup, login, subscription, key features
- Conversion funnels visible in dashboard
- No PII (names, emails, IDs) sent to analytics
- Analytics loads only after cookie consent (if required)
- Admin pages excluded from tracking
- Daily active users metric available
- Subscription conversion rate tracked
- A/B testing framework ready for use
validation:
- Visit landing page → pageview event in analytics
- Sign up → signup_completed event with funnel progression
- Check analytics dashboard → conversion rates visible
- Inspect network tab → no email addresses in payload
- Reject cookies → analytics script not loaded
notes:
- Plausible sets no cookies, so it can usually run without a consent banner; confirm with legal review
- PostHog offers more features, but its default cookie-based tracking requires consent in the EU
- Consider self-hosting Plausible for complete data control
- Revenue data can come from Stripe; confirm the chosen platform has a Stripe integration, otherwise forward revenue events from the Stripe webhook handler
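
The consent, Do Not Track, admin-exclusion, and no-PII requirements above could be sketched as two small pure functions. This is a minimal sketch, not the project's implementation: `shouldLoadAnalytics`, `scrubPii`, the `TrackingContext` shape, the `/admin` path prefix, and the PII key list are all hypothetical names chosen for illustration.

```typescript
// Hypothetical helpers; names, shapes, and the /admin prefix are assumptions.
type ConsentState = "granted" | "denied" | "unset";

interface TrackingContext {
  consent: ConsentState; // outcome of the cookie-consent banner
  doNotTrack: boolean;   // e.g. derived from navigator.doNotTrack === "1"
  path: string;          // current route
}

// Decide whether the analytics script may be loaded at all.
function shouldLoadAnalytics(ctx: TrackingContext): boolean {
  if (ctx.consent !== "granted") return false; // load only after opt-in
  if (ctx.doNotTrack) return false;            // respect Do Not Track
  if (ctx.path === "/admin" || ctx.path.startsWith("/admin/")) return false; // exclude admin traffic
  return true;
}

type EventProps = Record<string, string | number | boolean>;

// Keys treated as PII (compared in lower case) and a loose email matcher.
const PII_KEYS = new Set(["name", "email", "userid", "user_id", "id", "phone"]);
const EMAIL_PATTERN = /[^\s@]+@[^\s@]+\.[^\s@]+/;

// Drop properties whose key names a PII field or whose value looks like an email.
function scrubPii(props: EventProps): EventProps {
  const clean: EventProps = {};
  for (const [key, value] of Object.entries(props)) {
    if (PII_KEYS.has(key.toLowerCase())) continue;
    if (typeof value === "string" && EMAIL_PATTERN.test(value)) continue;
    clean[key] = value;
  }
  return clean;
}
```

Calling `scrubPii` on every custom-event payload before it reaches the analytics client keeps the "no PII in payload" test cheap to automate, although a key list like this one is only a first line of defense.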


@@ -0,0 +1,82 @@
# 27. Structured Data & Rich Snippets
meta:
id: web-production-27
feature: web-production
priority: P2
depends_on: []
tags: [seo, marketing, production]
objective:
- Implement schema.org structured data to enable rich snippets in search results and improve SEO
deliverables:
- JSON-LD structured data on all relevant pages
- Organization schema
- WebSite schema with search
- Article schema for blog posts
- SoftwareApplication schema
- BreadcrumbList schema
steps:
1. Add Organization schema to homepage:
- @type: Organization
- name: Kordant
- url: https://kordant.com
- logo: URL to logo image
- sameAs: social media profiles
- description: AI-powered identity protection
2. Add WebSite schema:
- @type: WebSite
- url: https://kordant.com
- potentialAction: SearchAction with search URL template
3. Add SoftwareApplication schema:
- @type: SoftwareApplication
- name: Kordant
- applicationCategory: SecurityApplication
- operatingSystem: Web, iOS, Android
- offers: Free tier, Plus ($12/mo), Premium ($29/mo)
- aggregateRating (once reviews collected)
- featureList: DarkWatch, VoicePrint, SpamShield, HomeTitle, RemoveBrokers
4. Add Article schema for blog posts:
- @type: Article
- headline, author, datePublished, dateModified
- image, articleBody, keywords
- publisher (Organization reference)
5. Add BreadcrumbList schema:
- Dynamic breadcrumbs based on current route
- Include in all non-home pages
6. Add FAQPage schema (optional, low priority: Google now limits FAQ rich results to well-known government and health sites):
- For /about or /features pages
- Common questions and answers
7. Validate all structured data:
- Test with Google Rich Results Test
- Test with Schema Markup Validator
- Fix any warnings or errors
tests:
- Unit: Test JSON-LD generation for each schema type
- Integration: Verify schema present in page source
- SEO: Validate with Google's tools
acceptance_criteria:
- Organization schema on homepage
- WebSite schema with SearchAction on homepage
- SoftwareApplication schema with pricing and features
- Article schema on all blog posts
- BreadcrumbList on all non-home pages
- All schemas pass Google Rich Results Test
- No errors or warnings in Schema Markup Validator
- Schemas dynamically generated based on page data
validation:
- View homepage source → Organization and WebSite JSON-LD present
- View blog post source → Article JSON-LD with correct dates
- Google Rich Results Test → all schemas valid
- Search console → rich results reported
notes:
- Use @solidjs/meta or script tags in JSX for JSON-LD
- Keep JSON-LD in <head> for optimal crawler discovery
- Update SoftwareApplication schema when pricing changes
- Consider adding Review schema once user reviews available
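
As a rough illustration of the Organization and BreadcrumbList steps, the JSON-LD objects can be built by plain functions and serialized into a `<script type="application/ld+json">` tag. The function names (`organizationSchema`, `breadcrumbSchema`, `toJsonLd`) and the `Crumb` type are hypothetical, and the sketch assumes breadcrumb paths always start with `/`.

```typescript
// Hypothetical builders; only the schema.org property names are standard.
interface Crumb {
  name: string;
  path: string; // assumed to start with "/"
}

const SITE_URL = "https://kordant.com"; // from the task spec

function organizationSchema(logoUrl: string, sameAs: string[]) {
  return {
    "@context": "https://schema.org",
    "@type": "Organization",
    name: "Kordant",
    url: SITE_URL,
    logo: logoUrl,
    sameAs,
    description: "AI-powered identity protection",
  };
}

function breadcrumbSchema(crumbs: Crumb[]) {
  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: crumbs.map((crumb, index) => ({
      "@type": "ListItem",
      position: index + 1, // positions are 1-based
      name: crumb.name,
      item: SITE_URL + crumb.path,
    })),
  };
}

// Serialize for embedding in a script tag. Escaping "<" stops a literal
// "</script>" inside the data from closing the tag early.
function toJsonLd(schema: object): string {
  return JSON.stringify(schema).replace(/</g, "\\u003c");
}
```

Keeping the builders free of framework code makes the "Unit: Test JSON-LD generation" item straightforward; the rendering layer only has to place the string returned by `toJsonLd` in the page head.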


@@ -0,0 +1,73 @@
# 28. API Versioning & Deprecation Strategy
meta:
id: web-production-28
feature: web-production
priority: P2
depends_on: []
tags: [api, stability, mobile]
objective:
- Establish API versioning and deprecation strategy to support mobile app updates without breaking existing clients
deliverables:
- API versioning scheme
- Deprecation policy documentation
- Backward compatibility testing
- Mobile client version tracking
steps:
1. Implement API versioning:
- Current library: tRPC v10 (consider upgrade to v11); the tRPC library version is separate from the API version exposed to clients
- Add version header or URL prefix for breaking changes
- Version format: v1, v2, etc.
- Mobile apps send X-API-Version header
2. Create deprecation policy:
- Document in docs/API_VERSIONING.md
- Breaking changes only in major versions
- Support previous version for minimum 6 months
- Announce deprecations 3 months in advance
- Sunset dates for old versions
3. Add version negotiation:
- Backend supports multiple tRPC router versions
- Route to correct router based on version header
- Default to latest for web clients
4. Track client versions:
- Log app version from User-Agent or X-Client-Version
- Dashboard showing active client versions
- Alert when old versions still in use near sunset
5. Add compatibility tests:
- Test all mobile app versions against current API
- Automated compatibility matrix
- Breaking change detection in CI
6. Document API changes:
- Changelog for all API modifications
- Migration guides for major versions
- Breaking vs non-breaking classification
tests:
- Unit: Test version routing
- Integration: Test old client with new API
- Compatibility: Verify mobile app versions work
acceptance_criteria:
- API versioning scheme documented and implemented
- Mobile apps send version header in all requests
- Backend supports at least 2 API versions simultaneously
- Deprecation policy published and followed
- 6-month support window for old versions
- Client version tracking dashboard active
- Compatibility tests passing for all supported versions
- Changelog maintained for all API changes
validation:
- Mobile app sends X-API-Version: 1 → receives v1 responses
- Deploy v2 changes → v1 clients continue working
- Check dashboard → active client versions visible
- Review changelog → all changes documented
notes:
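- Upgrading the tRPC library from v10 to v11 is a breaking change for server and client code; plan the migration carefully, and note that it does not by itself change the API version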
- Mobile apps may take weeks to update — long support windows needed
- Consider using feature flags instead of versioning for minor changes
- Track iOS and Android app versions separately
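
The version negotiation step could look roughly like the following. It is a sketch under stated assumptions: `resolveApiVersion`, the `Resolution` type, and the supported-version list are hypothetical, and accepting both `1` and `v1` as header values is an illustrative choice rather than something the spec requires.

```typescript
// Hypothetical negotiation helper; version numbers are placeholders.
type ApiVersion = 1 | 2;

const SUPPORTED_VERSIONS: readonly ApiVersion[] = [1, 2];
const LATEST_VERSION: ApiVersion = 2;

type Resolution =
  | { ok: true; version: ApiVersion }
  | { ok: false; reason: string };

// Map the raw X-API-Version header value to a supported router version.
function resolveApiVersion(header: string | undefined): Resolution {
  // Web clients send no header and default to the latest version.
  if (header === undefined || header.trim() === "") {
    return { ok: true, version: LATEST_VERSION };
  }
  const match = /^v?(\d+)$/i.exec(header.trim());
  if (!match) {
    return { ok: false, reason: `Malformed X-API-Version: ${header}` };
  }
  const requested = Number(match[1]);
  const found = SUPPORTED_VERSIONS.find((v) => v === requested);
  if (found === undefined) {
    return { ok: false, reason: `Unsupported API version: ${requested}` };
  }
  return { ok: true, version: found };
}
```

Returning an explicit failure for unknown versions, rather than silently falling back to the latest router, lets a sunset version produce a clear error that the mobile app can turn into an upgrade prompt.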


@@ -0,0 +1,82 @@
# 29. API Documentation (OpenAPI/tRPC Docs)
meta:
id: web-production-29
feature: web-production
priority: P2
depends_on: []
tags: [api, documentation, production]
objective:
- Generate and publish comprehensive API documentation for internal and external developers
deliverables:
- Auto-generated API documentation
- Interactive API explorer
- Authentication documentation
- Error code reference
steps:
1. Set up tRPC documentation generation:
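- Use trpc-openapi to generate an OpenAPI spec from procedure metadata
- Alternatives named during planning (@trpc/openapi-v3, trpc-docs, @trpc/doc-generator) are unverified; confirm each exists on npm and supports the installed tRPC version before adopting it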
- Export spec as JSON/YAML
2. Create documentation site:
- Use Swagger UI or Scalar for interactive docs
- Host at /api/docs or separate docs subdomain
- Include request/response examples
- Include authentication requirements
3. Document all routers:
- User router: login, signup, profile, family
- Billing router: subscription, checkout, webhooks
- DarkWatch router: watchlist, exposures, scan
- VoicePrint router: enrollments, analysis
- SpamShield router: rules, phone check
- HomeTitle router: properties, monitoring
- RemoveBrokers router: listings, removals
- Alerts router: list, resolve, correlation
- Admin router: user management, blog
4. Add authentication docs:
- Session cookie authentication
- JWT bearer token authentication
- API key authentication (for extensions)
- Clerk webhook handling
5. Add error documentation:
- Standard error codes (400, 401, 403, 404, 429, 500)
- tRPC error codes and meanings
- Rate limit headers explanation
6. Add webhook documentation:
- Stripe webhook events
- Clerk webhook events
- Payload schemas and verification
7. Keep docs in sync:
- Auto-generate on build
- CI check for doc changes
- Version docs with API versions
tests:
- Unit: Test OpenAPI spec generation
- Integration: Verify docs site loads and examples work
- Review: Team review for accuracy
acceptance_criteria:
- API docs accessible at /api/docs
- All tRPC routers documented with input/output schemas
- Interactive explorer allowing test requests
- Authentication methods documented with examples
- All error codes explained with examples
- Webhook payloads documented with verification steps
- Docs auto-generated from code (single source of truth)
- Examples use realistic test data
validation:
- Navigate to /api/docs → interactive explorer loads
- Try user.me endpoint → returns example response
- Check auth section → all methods documented
- Review webhook docs → verification steps clear
notes:
- trpc-openapi requires `openapi` metadata on each exposed procedure, for example `.meta({ openapi: { method: 'GET', path: '/user/me' } })`
- Consider using Scalar (modern alternative to Swagger UI)
- Docs should be public but sensitive endpoints marked as auth-required
- Keep examples updated when schemas change
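
For the error documentation step, one option is to keep the error reference as data and render the docs table from it, so the published table and the code cannot drift apart. The sketch below covers only the six HTTP statuses listed in the steps; the tRPC code names and their HTTP statuses follow tRPC's documented mapping, while `ErrorDoc`, `ERROR_REFERENCE`, `renderErrorTable`, and the wording in the "meaning" column are hypothetical.

```typescript
// Hypothetical docs-generation helper; meanings are illustrative wording.
interface ErrorDoc {
  trpcCode: string;
  httpStatus: number;
  meaning: string;
}

const ERROR_REFERENCE: readonly ErrorDoc[] = [
  { trpcCode: "BAD_REQUEST", httpStatus: 400, meaning: "Input failed validation" },
  { trpcCode: "UNAUTHORIZED", httpStatus: 401, meaning: "Missing or invalid credentials" },
  { trpcCode: "FORBIDDEN", httpStatus: 403, meaning: "Authenticated but not allowed" },
  { trpcCode: "NOT_FOUND", httpStatus: 404, meaning: "Resource or procedure does not exist" },
  { trpcCode: "TOO_MANY_REQUESTS", httpStatus: 429, meaning: "Rate limit exceeded" },
  { trpcCode: "INTERNAL_SERVER_ERROR", httpStatus: 500, meaning: "Unhandled server error" },
];

// Render the reference as a markdown table for the docs site.
function renderErrorTable(rows: readonly ErrorDoc[]): string {
  const header = "| tRPC code | HTTP status | Meaning |\n| --- | --- | --- |";
  const body = rows.map(
    (row) => `| ${row.trpcCode} | ${row.httpStatus} | ${row.meaning} |`,
  );
  return [header, ...body].join("\n");
}
```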


@@ -0,0 +1,82 @@
# 30. WebSocket Production Hardening
meta:
id: web-production-30
feature: web-production
priority: P1
depends_on: []
tags: [security, websockets, production]
objective:
- Harden WebSocket server for production with authentication, rate limiting, and connection management
deliverables:
- Authenticated WebSocket connections
- Connection rate limiting
- Connection cleanup on logout
- Horizontal scaling support (Redis adapter)
steps:
1. Harden WebSocket authentication:
- Validate JWT token in connection query param
- Reject unauthenticated connections immediately
- Re-authenticate periodically (every 15 minutes)
- Close connection on token expiry
2. Implement connection rate limiting:
- Max 1 WebSocket connection per user
- Max 5 reconnection attempts per minute
- IP-based connection limits (100 per IP)
3. Add connection management:
- Track active connections per user
- Close duplicate connections
- Heartbeat with timeout (current implementation good)
- Graceful close on server shutdown
4. Implement horizontal scaling:
- The ws library has no Redis adapter; socket.io-redis and its successor @socket.io/redis-adapter work only with Socket.IO
- With ws, use Redis pub/sub to broadcast across instances
- Ensure alerts reach all connected clients regardless of instance
5. Add message validation:
- Validate all incoming message schemas
- Reject malformed messages
- Limit message size (max 10KB)
- Sanitize message content
6. Add monitoring:
- Track active connection count
- Track messages per second
- Track connection duration
- Alert on connection spikes (possible DDoS)
7. Secure WebSocket server:
- Run on separate port or path
- TLS encryption (wss://)
- No mixed content (never open ws:// from an https:// page)
tests:
- Unit: Test authentication rejection
- Integration: Test duplicate connection handling
- Load: Test 1000 concurrent WebSocket connections
- Security: Test unauthenticated connection rejection
acceptance_criteria:
- All WebSocket connections authenticated with valid JWT
- Unauthenticated connections rejected immediately
- Max 1 connection per user (duplicates closed)
- Heartbeat/ping-pong working with 30s interval
- Redis pub/sub (or the Socket.IO Redis adapter, if Socket.IO is adopted) active for multi-instance deployment
- Message size limited to 10KB
- TLS encryption (wss://) in production
- Connection metrics visible in monitoring
- Graceful shutdown closes all connections cleanly
validation:
- Connect without token → connection rejected
- Connect with valid token → connection accepted
- Open second connection → first connection closed
- Send 20KB message → connection closed with error
- Scale to 2 server instances → alerts broadcast to all clients
- Check metrics → active connections, message rate visible
notes:
- Current WebSocket in web/src/lib/websocket.ts and web/src/server/websocket.ts
- The ws library has no built-in Redis adapter; scale it with Redis pub/sub, or adopt Socket.IO to use @socket.io/redis-adapter
- Consider using Socket.io for more robust connection management
- Tokens in query params are common but can leak into server and proxy logs; consider cookie-based auth or a short-lived single-use ticket instead
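
The "max 1 connection per user" and graceful-shutdown requirements can be sketched as a small registry that is independent of the WebSocket library. `ConnectionRegistry` and `Closable` are hypothetical names, and close code 4001 is an arbitrary pick from the application-defined range (4000 to 4999); 1001 is the standard "going away" code.

```typescript
// Minimal socket surface the registry needs; a ws WebSocket satisfies it.
interface Closable {
  close(code: number, reason: string): void;
}

// Hypothetical per-user registry enforcing a single live connection.
class ConnectionRegistry<S extends Closable> {
  private readonly byUser = new Map<string, S>();

  // Register a socket; any existing socket for the same user is closed first.
  add(userId: string, socket: S): void {
    const previous = this.byUser.get(userId);
    if (previous && previous !== socket) {
      previous.close(4001, "Replaced by newer connection");
    }
    this.byUser.set(userId, socket);
  }

  // Remove only if the stored socket is the one that closed, so a late
  // "close" event from a replaced socket cannot evict its replacement.
  remove(userId: string, socket: S): void {
    if (this.byUser.get(userId) === socket) {
      this.byUser.delete(userId);
    }
  }

  // Number of users with a live connection (feeds the active-connections metric).
  get size(): number {
    return this.byUser.size;
  }

  // Graceful shutdown: close every socket with "going away", then forget them.
  closeAll(): void {
    for (const socket of this.byUser.values()) {
      socket.close(1001, "Server shutting down");
    }
    this.byUser.clear();
  }
}
```

The registry only covers a single instance; across instances the same rule would need shared state, for example through the Redis layer already planned for broadcasting. For the 10KB message limit, the ws server's `maxPayload` option is likely a simpler fit than a hand-written size check.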


@@ -0,0 +1,77 @@
# 31. Backup Strategy & Point-in-Time Recovery
meta:
id: web-production-31
feature: web-production
priority: P1
depends_on: []
tags: [database, reliability, production]
objective:
- Implement automated database backups with point-in-time recovery capability
deliverables:
- Automated daily backups
- Point-in-time recovery setup
- Backup testing and verification
- Retention policy
steps:
1. Set up automated backups:
- If PostgreSQL: configure pg_dump cron job or managed backups (RDS, Cloud SQL)
- If SQLite/Turso: configure Turso database branching/backups
- Daily full backups at off-peak hours (3 AM UTC)
- Incremental backups at least every 15 minutes to match the RPO target (WAL archiving for Postgres)
2. Configure backup storage:
- Store in separate region/cloud provider (S3, GCS, R2)
- Encrypt backups at rest
- Versioning enabled (protect against deletion)
3. Implement point-in-time recovery:
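- WAL archiving for PostgreSQL on top of a physical base backup (pg_basebackup or managed snapshots); pg_dump output is a logical snapshot and cannot be replayed to a point in time
- Ship WAL segments at least every 15 minutes (for example via archive_timeout) to meet the RPO target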
- Test recovery to specific timestamp
4. Add backup monitoring:
- Alert on backup failure
- Track backup size and duration
- Verify backup integrity (checksum)
5. Test restore procedures:
- Monthly restore test to staging environment
- Document step-by-step restore process
- Measure RTO (Recovery Time Objective) and RPO (Recovery Point Objective)
- Target: RTO < 1 hour, RPO < 15 minutes
6. Document retention:
- Daily backups: 7 days
- Weekly backups: 4 weeks
- Monthly backups: 12 months
- Annual backups: 7 years (compliance)
7. Add Redis backup:
- RDB snapshots every 6 hours
- AOF persistence for point-in-time
- Backup to S3/GCS
tests:
- Integration: Test backup creation
- Recovery: Test restore to staging
- Monitoring: Verify backup alerts
acceptance_criteria:
- Daily automated backups running successfully
- Backups stored in separate region with encryption
- Point-in-time recovery tested and working
- Backup failures trigger alerts within 5 minutes
- Monthly restore test completed and documented
- RTO < 1 hour, RPO < 15 minutes
- Retention policy enforced automatically
- Redis backups included in strategy
validation:
- Check backup storage → daily backups present
- Trigger restore test → staging database restored successfully
- Simulate backup failure → alert received
- Check retention → old backups purged per policy
notes:
- Turso offers automatic backups for SQLite — verify configuration
- RDS automated backups are easiest for PostgreSQL
- Test restores are critical — untested backups are useless
- Document restore process for on-call engineers
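
The retention policy above can be expressed as data plus a purge selector, which makes "retention policy enforced automatically" testable without touching real storage. This is a sketch only: `Backup`, `isExpired`, and `selectForPurge` are hypothetical names, and months and years are approximated as 30 and 365 days.

```typescript
// Hypothetical retention model; tier names mirror the policy in the steps.
type BackupTier = "daily" | "weekly" | "monthly" | "annual";

const DAY_MS = 24 * 60 * 60 * 1000;

// Retention windows in days (month = 30 days and year = 365 days, as an approximation).
const RETENTION_DAYS: Record<BackupTier, number> = {
  daily: 7,
  weekly: 4 * 7,
  monthly: 12 * 30,
  annual: 7 * 365,
};

interface Backup {
  key: string; // object key in backup storage
  tier: BackupTier;
  createdAt: Date;
}

// A backup is expired once its age exceeds the window for its tier.
function isExpired(backup: Backup, now: Date): boolean {
  const ageMs = now.getTime() - backup.createdAt.getTime();
  return ageMs > RETENTION_DAYS[backup.tier] * DAY_MS;
}

// Return the storage keys that the purge job should delete.
function selectForPurge(backups: readonly Backup[], now: Date): string[] {
  return backups.filter((backup) => isExpired(backup, now)).map((backup) => backup.key);
}
```

If the storage provider supports lifecycle rules, those may enforce the same windows with less code; a selector like this is still useful for the "old backups purged per policy" validation check.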


@@ -0,0 +1,79 @@
# 32. Migration Safety & Rollback Procedures
meta:
id: web-production-32
feature: web-production
priority: P1
depends_on: []
tags: [database, reliability, production]
objective:
- Ensure database migrations are safe, reversible, and won't cause downtime or data loss in production
deliverables:
- Migration safety guidelines
- Backward-compatible migration policy
- Rollback scripts for each migration
- Migration testing in staging
steps:
1. Create migration safety guidelines:
- Document in docs/MIGRATIONS.md
- Additive changes only in production (add columns, create tables)
- No destructive changes during deployment (no DROP COLUMN)
- Two-phase migrations for destructive changes:
- Phase 1: Add new column/table, deploy code to use it
- Phase 2: Remove old column/table after code stable
2. Audit existing migrations:
- Review all drizzle migrations in web/src/server/db/
- Check for any destructive operations
- Add rollback scripts where missing
3. Implement migration testing:
- Run migrations against staging database copy
- Verify app works after migration
- Test rollback script
- Measure migration duration (must be <30 seconds)
4. Add migration safety checks:
- CI check: verify no destructive migrations in PR
- Pre-deploy: dry-run migration against a fresh copy of production data
- Post-deploy: verify migration applied successfully
5. Document rollback procedures:
- Step-by-step rollback for each migration
- Database backup before migration
- Code rollback procedure
- Data recovery steps if needed
6. Add migration monitoring:
- Log migration start, duration, success/failure
- Alert on migration failure
- Track migration duration trends
7. Set up migration automation:
- GitHub Action to run migrations on staging deploy
- Manual approval for production migrations
- Automated rollback on migration failure
tests:
- Unit: Test migration scripts in isolation
- Integration: Test migration on staging database
- Rollback: Test rollback procedure
acceptance_criteria:
- All production migrations are additive-only
- Two-phase migration process documented for destructive changes
- Rollback script exists for every migration
- Migrations tested on staging before production
- Migration duration <30 seconds
- Automated CI check preventing destructive migrations
- Backup taken before every production migration
- Migration failure triggers automatic alert and rollback
validation:
- Review migration history → no destructive changes in production
- Test rollback → database restored to previous state
- Run destructive migration in PR → CI blocks merge
- Check migration logs → all migrations completed successfully
notes:
- Drizzle migrations are generally safe but review generated SQL; Drizzle Kit does not generate down migrations, so rollback scripts must be written by hand
- Use drizzle-kit generate with --custom for complex migrations
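- gh-ost and pt-online-schema-change are MySQL-only tools; for large PostgreSQL tables prefer CREATE INDEX CONCURRENTLY, constraints added as NOT VALID and validated later, and a short lock_timeout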
- Always have a database backup before running production migrations
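
The CI check for destructive migrations could start as a line-based scan of the generated SQL files. The sketch below is a heuristic, not a SQL parser: `findDestructiveStatements` and the rule list are hypothetical, statements split across several lines can slip through, and `--` inside a string literal would be misread as a comment.

```typescript
// Hypothetical CI helper; the rule list is a starting point, not exhaustive.
interface Finding {
  line: number; // 1-based line number in the migration file
  statement: string;
  rule: string;
}

const DESTRUCTIVE_RULES: ReadonlyArray<{ rule: string; pattern: RegExp }> = [
  { rule: "DROP TABLE", pattern: /\bDROP\s+TABLE\b/i },
  { rule: "DROP COLUMN", pattern: /\bDROP\s+COLUMN\b/i },
  { rule: "TRUNCATE", pattern: /\bTRUNCATE\b/i },
  { rule: "RENAME COLUMN", pattern: /\bRENAME\s+COLUMN\b/i },
];

// Scan migration SQL line by line and report statements matching a destructive rule.
function findDestructiveStatements(sql: string): Finding[] {
  const findings: Finding[] = [];
  sql.split("\n").forEach((text, index) => {
    // Strip "--" line comments, which also removes trailing
    // "--> statement-breakpoint" markers in generated files.
    const statement = text.replace(/--.*$/, "").trim();
    if (statement === "") return;
    for (const { rule, pattern } of DESTRUCTIVE_RULES) {
      if (pattern.test(statement)) {
        findings.push({ line: index + 1, statement, rule });
      }
    }
  });
  return findings;
}
```

A CI job would run this over each changed migration file and fail when the result is non-empty, with an explicit override for the second phase of a two-phase migration, where the drop is intentional.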


@@ -0,0 +1,93 @@
# Web Production Readiness
Objective: Harden, optimize, and operationalize the SolidStart web application for production deployment with enterprise-grade security, performance, monitoring, and compliance.
Status legend: [ ] todo, [~] in-progress, [x] done
## Tasks
### Security & Hardening
- [ ] 01 — Security Headers & CORS Configuration → `01-security-headers-cors.md`
- [ ] 02 — Rate Limiting & DDoS Protection → `02-rate-limiting-ddos.md`
- [ ] 03 — Input Validation & XSS Prevention Audit → `03-input-validation-xss.md`
- [ ] 04 — Authentication & Session Security Hardening → `04-auth-session-hardening.md`
### Performance & Reliability
- [ ] 05 — CDN & Asset Optimization → `05-cdn-asset-optimization.md`
- [ ] 06 — Database Connection Pooling & Query Optimization → `06-db-connection-pooling.md`
- [ ] 07 — Caching Strategy (Redis + HTTP Cache) → `07-caching-strategy.md`
- [ ] 08 — Graceful Shutdown & Health Check Endpoints → `08-health-checks-shutdown.md`
### Monitoring & Observability
- [ ] 09 — Structured Logging & Log Aggregation → `09-structured-logging.md`
- [ ] 10 — Error Tracking & Alerting (Sentry Integration) → `10-error-tracking.md`
- [ ] 11 — Application Metrics & Dashboards → `11-metrics-dashboards.md`
- [ ] 12 — Uptime & Performance Monitoring → `12-uptime-monitoring.md`
### CI/CD & DevOps
- [ ] 13 — GitHub Actions CI Pipeline → `13-github-actions-ci.md`
- [ ] 14 — Automated Deployment Pipeline → `14-deployment-pipeline.md`
- [ ] 15 — Docker & Infrastructure Optimization → `15-docker-infra.md`
- [ ] 16 — Environment Management & Secrets Rotation → `16-env-secrets.md`
### Testing & Quality Assurance
- [ ] 17 — End-to-End Testing (Playwright) → `17-e2e-testing.md`
- [ ] 18 — Load & Stress Testing → `18-load-testing.md`
- [ ] 19 — Accessibility Audit & WCAG Compliance → `19-accessibility-audit.md`
- [ ] 20 — Dependency Vulnerability Scanning → `20-dependency-scanning.md`
### Compliance & Legal
- [ ] 21 — Privacy Policy, TOS & Legal Pages → `21-legal-pages.md`
- [ ] 22 — Cookie Consent & GDPR Compliance → `22-cookie-gdpr.md`
- [ ] 23 — Data Export & Deletion Tools → `23-data-export-deletion.md`
- [ ] 24 — Security.txt & Responsible Disclosure → `24-security-txt.md`
### SEO & Marketing
- [ ] 25 — Sitemap, Robots.txt & Open Graph → `25-seo-meta.md`
- [ ] 26 — Analytics Integration (Plausible/PostHog) → `26-analytics.md`
- [ ] 27 — Structured Data & Rich Snippets → `27-structured-data.md`
### API & Backend Stability
- [ ] 28 — API Versioning & Deprecation Strategy → `28-api-versioning.md`
- [ ] 29 — API Documentation (OpenAPI/tRPC Docs) → `29-api-documentation.md`
- [ ] 30 — WebSocket Production Hardening → `30-websocket-production.md`
### Database Production Readiness
- [ ] 31 — Backup Strategy & Point-in-Time Recovery → `31-db-backup.md`
- [ ] 32 — Migration Safety & Rollback Procedures → `32-migration-safety.md`
## Dependencies
- 01, 02, 03, 04 can be done in parallel (security foundation)
- 05, 06, 07, 08 can be done in parallel (performance foundation)
- 09, 10, 11, 12 can be done in parallel (observability)
- 13 depends on 17, 18, 19, 20 (test suites must exist before CI can run them)
- 14 depends on 13, 15, 16 (CI + infra + env)
- 21, 22, 23, 24 can be done in parallel (compliance)
- 25, 26, 27 can be done in parallel (SEO)
- 28, 29, 30 can be done in parallel (API stability)
- 31, 32 can be done in parallel (DB ops)
- Apart from the 13 and 14 dependencies above, groups can proceed independently
## Exit Criteria
- All security headers present and scoring A+ on Security Headers scan
- Rate limiting active on all public endpoints (100 req/min)
- Database queries optimized with connection pooling (PgBouncer or equivalent)
- Redis caching layer active for hot paths
- Health check endpoint responding with 200 and dependency status
- Structured logging shipping to aggregation service
- Error tracking capturing 100% of unhandled exceptions
- CI pipeline running tests, lint, typecheck, and build on every PR
- Automated deployment to staging on merge to main
- E2E tests covering critical user journeys (signup → dashboard → billing)
- Load tests confirming 1000 concurrent users with <200ms p95 latency
- Accessibility audit passing WCAG 2.1 AA
- No known high or critical vulnerabilities in production dependencies
- Legal pages live and linked in footer
- Cookie consent banner functional with granular controls
- GDPR data export and deletion APIs operational
- SEO meta tags, sitemap, and robots.txt serving correctly
- Analytics tracking page views and conversion events
- API documentation publicly accessible and up-to-date
- WebSocket connections stable with reconnection logic tested
- Database backups automated with 7-day retention
- Migration rollback tested and documented