# 05. Periodic Scan Scheduling, WebSocket Progress, and Alert Deduplication

meta:
  id: core-services-05
  feature: core-services-implementation
  priority: P1
  depends_on: [core-services-03, core-services-04]
  tags: [darkwatch, scheduler, websocket, real-time, deduplication, alerts]

objective:
  - Make DarkWatch continuously useful by scheduling periodic scans, providing real-time progress via WebSocket, and eliminating alert fatigue through intelligent deduplication.

deliverables:
  - Cron-based scan scheduler with configurable frequency per tier
  - WebSocket real-time scan progress updates (already have `websocket.ts`)
  - Alert cooldown periods to prevent duplicate notifications
  - Digest mode: batch low-priority alerts into daily/weekly summaries
  - Scan history and metrics dashboard data

steps:
  1. Implement cron job scheduler in `jobs/handlers/darkwatch.scan.ts`:
     - Daily scans for active subscriptions
     - Respects tier limits (Shield = HIBP only daily, Guard+ = full suite weekly)
  2. Add `scanFrequency` field to subscription schema (daily, weekly, monthly)
  3. Wire WebSocket push from existing `websocket.ts` into scan engine:
     - Emit `scan:started`, `scan:progress` (completedSources/totalSources), `scan:completed` events
     - Client dashboard subscribes to user-specific scan events
  4. Enhance alert deduplication beyond existing exposure dedup:
     - Add `alertCooldownHours` per alert type (e.g., 24h for same breach, 72h for property changes)
     - Track `lastAlertSentAt` per (userId, alertType, source) tuple
     - Don't create new alerts during cooldown unless severity increases
  5. Implement digest mode:
     - Low-priority alerts (info) batched into daily digest email
     - Warning/critical alerts sent immediately via push + email
     - User preference: immediate vs. digest per severity level
  6. Add scan metrics:
     - Store scan duration, sources checked, exposures found, alerts generated
     - Aggregate for dashboard "threat score" calculation
  7. Implement scan failure recovery:
     - Partial scan results saved even if one source fails
     - Failed sources retried individually in next scan window
  8. Add rate limit per user: max 1 concurrent scan, queue subsequent requests

tests:
  - Unit: Verify cron expression parsing, cooldown logic, digest batching
  - Integration: Trigger scheduled scan, verify WebSocket events emitted in correct order
  - E2E: Start scan from dashboard → watch progress bar → receive completion notification

acceptance_criteria:
  - [ ] Scans run automatically on schedule without manual trigger (cron job)
  - [ ] WebSocket pushes real-time progress: `scan:progress` events with percentage complete
  - [ ] Only one scan runs per user at a time; additional requests are queued
  - [ ] Duplicate alerts are suppressed during cooldown period (configurable per type)
  - [ ] Info-level alerts are batched into daily digest; warning/critical sent immediately
  - [ ] Scan history is persisted and visible in dashboard (last scan date, sources checked, findings)
  - [ ] Failed sources don't fail the entire scan; partial results are saved
  - [ ] Dashboard threat score updates automatically after each scan completion
  - [ ] Free tier gets weekly scans; paid tiers get daily scans
  - [ ] No duplicate notifications for same exposure across multiple scans

validation:
  - Run cron job manually: `bun run job:darkwatch:scan`, verify scan completes and exposures created
  - Connect to WebSocket: `wscat -c ws://localhost:3000/ws`, subscribe to scan events
  - Check dashboard: Scan progress bar animates during active scan, threat score updates after
  - Test cooldown: Trigger same scan twice rapidly, verify second scan doesn't create duplicate alerts

notes:
  - The existing `scanStates` Map in `darkwatch.service.ts` is in-memory; move it to Redis for multi-instance safety
  - WebSocket infrastructure exists at `websocket.ts`; extend it for scan-specific events
  - The scheduler directory (`scheduler/`) currently only has Dockerfiles; this task creates the actual job logic
  - Consider using Honker (Rust queue) for scan job distribution once it's production-ready
  - Alert fatigue is a real churn driver; aggressive deduplication is a competitive advantage
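
For steps 1 and 2, one way the cron handler could decide which subscriptions to scan on each tick is sketched below. This is a minimal illustration, not the DarkWatch implementation: the `Subscription` shape, the `dueForScan` helper, and the 30-day approximation of "monthly" are all assumptions. Only the `scanFrequency` field and its three values come from the spec.

```typescript
// Hypothetical sketch: every name except `scanFrequency` is an assumption.
type ScanFrequency = "daily" | "weekly" | "monthly";

const HOUR_MS = 3_600_000;

// Minimum gap between scans for each frequency.
// "monthly" is approximated as 30 days (assumption).
const FREQUENCY_HOURS: Record<ScanFrequency, number> = {
  daily: 24,
  weekly: 24 * 7,
  monthly: 24 * 30,
};

interface Subscription {
  userId: string;
  active: boolean;
  scanFrequency: ScanFrequency;
  lastScanAt: number | null; // epoch milliseconds, null if never scanned
}

// Returns the user IDs whose next scan is due at `now` (epoch milliseconds).
// Inactive subscriptions are skipped; never-scanned ones are always due.
function dueForScan(subscriptions: Subscription[], now: number): string[] {
  return subscriptions
    .filter((sub) => {
      if (!sub.active) return false;
      if (sub.lastScanAt === null) return true;
      const intervalMs = FREQUENCY_HOURS[sub.scanFrequency] * HOUR_MS;
      return now - sub.lastScanAt >= intervalMs;
    })
    .map((sub) => sub.userId);
}
```

Keeping the "is it due" decision in a pure function lets the cron job run on a fixed tick (for example hourly) while per-tier frequency lives in data. It also makes the unit test listed under `tests` straightforward to write.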
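
The cooldown rule in step 4 and the routing rule in step 5 can be sketched together as follows. This is an in-memory illustration under stated assumptions: `AlertCooldownTracker`, `routeAlert`, the alert type names `breach` and `property_change`, and the 24-hour default are hypothetical. A multi-instance deployment would presumably keep this state in Redis or the database, as the notes suggest for `scanStates`. The 24h and 72h values mirror the examples given in step 4.

```typescript
// Hypothetical sketch: type, class, and function names are assumptions.
type Severity = "info" | "warning" | "critical";

const SEVERITY_RANK: Record<Severity, number> = { info: 0, warning: 1, critical: 2 };

// Cooldown per alert type, in hours (values taken from the step 4 examples).
const ALERT_COOLDOWN_HOURS: Record<string, number> = {
  breach: 24,
  property_change: 72,
};
const DEFAULT_COOLDOWN_HOURS = 24; // assumption for unlisted alert types

interface AlertCandidate {
  userId: string;
  alertType: string;
  source: string;
  severity: Severity;
}

interface LastAlert {
  sentAt: number; // epoch milliseconds
  severity: Severity;
}

class AlertCooldownTracker {
  private last = new Map<string, LastAlert>();

  // One entry per (userId, alertType, source) tuple.
  private key(a: AlertCandidate): string {
    return JSON.stringify([a.userId, a.alertType, a.source]);
  }

  // True if no alert was sent yet, the cooldown has elapsed, or severity increased.
  shouldSend(a: AlertCandidate, now: number): boolean {
    const prev = this.last.get(this.key(a));
    if (!prev) return true;
    const hours = ALERT_COOLDOWN_HOURS[a.alertType] ?? DEFAULT_COOLDOWN_HOURS;
    if (now - prev.sentAt >= hours * 3_600_000) return true;
    return SEVERITY_RANK[a.severity] > SEVERITY_RANK[prev.severity];
  }

  record(a: AlertCandidate, now: number): void {
    this.last.set(this.key(a), { sentAt: now, severity: a.severity });
  }
}

type Delivery = "immediate" | "digest" | "suppressed";

// Suppress during cooldown; otherwise info goes to the digest and
// warning/critical go out immediately.
function routeAlert(tracker: AlertCooldownTracker, a: AlertCandidate, now: number): Delivery {
  if (!tracker.shouldSend(a, now)) return "suppressed";
  tracker.record(a, now);
  return a.severity === "info" ? "digest" : "immediate";
}
```

Note that an escalated alert resets the cooldown window in this sketch, so a repeat at the new severity is suppressed for a full period. Per-user "immediate vs. digest" preferences from step 5 are left out for brevity and would replace the hard-coded check on `info`.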