Kordant/tasks/core-services-implementation/03-darkwatch-hibp.md

# 03. HaveIBeenPwned API Integration for Email Breach Monitoring

meta:
  id: core-services-03
  feature: core-services-implementation
  priority: P0
  depends_on: [core-services-01]
  tags: [darkwatch, hibp, breach-monitoring, api-integration, table-stakes]

objective:
- Replace the stub `scanHIBP()` function in the DarkWatch scan engine with a real HaveIBeenPwned API integration that checks user emails against known breach databases and creates exposure records.

deliverables:
- HIBP API client with k-anonymity support for password checking
- Email breach lookup with result parsing and normalization
- Exposure record creation in database with proper severity scoring
- Alert generation via existing alert pipeline
- Circuit breaker integration (already exists in scan engine)

steps:
1. Obtain an HIBP API key at https://haveibeenpwned.com/API/Key (the `breachedaccount` endpoint requires a paid subscription key; only the Pwned Passwords range API is free and keyless)
2. Add `HIBP_API_KEY` to `.env.example` and validate in `env.ts`
3. Create `darkwatch/hibp.client.ts` with functions:
   - `checkEmail(email): Promise<BreachResult[]>`: query the `breachedaccount` endpoint
   - `checkPassword(passwordHash): Promise<PwnedPasswordResult>`: query the `pwnedpasswords` range endpoint using k-anonymity (only the first 5 hex characters of the SHA-1 hash are sent)
   - `getBreaches(): Promise<Breach[]>`: fetch breach metadata for caching
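4. Parse HIBP response (request with `truncateResponse=false`, since the default response contains only breach names): breach name (`Name`), date (`BreachDate`), compromised data types (`DataClasses`), affected accounts (`PwnCount`)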
5. Map data types to internal schema: email, password, phone, address, ssn, domain
6. Calculate severity: critical if SSN, credit card, or password; warning if email/phone; info if username only
7. Deduplicate against existing exposures using `identifierHash` (already implemented)
8. Create exposure records via existing `processExposure()` pipeline
9. Cache breach metadata in Redis (update daily) to reduce API calls
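10. Handle rate limits: HIBP sets the limit per subscription tier in requests per minute, so make the limit configurable, route calls through a request queue, and honor the `retry-after` header on 429 responses
11. Handle error responses explicitly: 404 (no breach, not an error), 429 (rate limited, retry after the indicated delay), 503 (service unavailable, counts toward the circuit breaker), plus 401 (missing or invalid API key) and 403 (missing `user-agent` header)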
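
The request and status handling in steps 3 and 11 can be sketched as below. This is a minimal illustration, not the final client: `lookupEmail`, `buildRequest`, `interpret`, the `HttpGet` type, and the `kordant-darkwatch` user agent are hypothetical names introduced here, and the HTTP call is injected so the logic can be exercised without network access.

```typescript
// Subset of the HIBP breach model fields named in step 4.
type HibpBreach = { Name: string; BreachDate: string; DataClasses: string[]; PwnCount: number };

type LookupOutcome =
  | { kind: "breaches"; breaches: HibpBreach[] }
  | { kind: "retry"; afterSeconds: number }
  | { kind: "failure"; status: number };

// Minimal response surface (hypothetical); a thin wrapper around fetch could provide it.
type HttpResponse = { status: number; retryAfter: string | null; json: () => Promise<unknown> };
type HttpGet = (url: string, headers: Record<string, string>) => Promise<HttpResponse>;

function buildRequest(email: string, apiKey: string) {
  // The account must be URL-encoded; truncateResponse=false asks for full breach metadata.
  const account = encodeURIComponent(email.trim());
  return {
    url: `https://haveibeenpwned.com/api/v3/breachedaccount/${account}?truncateResponse=false`,
    // "kordant-darkwatch" is a placeholder user agent, not a value from this spec.
    headers: { "hibp-api-key": apiKey, "user-agent": "kordant-darkwatch" },
  };
}

function interpret(status: number, body: unknown, retryAfter: string | null): LookupOutcome {
  if (status === 200) return { kind: "breaches", breaches: body as HibpBreach[] };
  // 404 means "no breach found": an empty result, not an error.
  if (status === 404) return { kind: "breaches", breaches: [] };
  // 429 carries a retry-after value in seconds; fall back to 1 if it is absent or unparsable.
  if (status === 429) return { kind: "retry", afterSeconds: Number(retryAfter ?? "1") || 1 };
  // Everything else (401, 403, 503, ...) is surfaced so the circuit breaker can count it.
  return { kind: "failure", status };
}

async function lookupEmail(email: string, apiKey: string, get: HttpGet): Promise<LookupOutcome> {
  const { url, headers } = buildRequest(email, apiKey);
  const res = await get(url, headers);
  const body = res.status === 200 ? await res.json() : null;
  return interpret(res.status, body, res.retryAfter);
}
```

Returning a tagged outcome instead of throwing keeps the "404 is not an error" rule in one place and lets `scanHIBP()` decide whether a `retry` or `failure` should count against the circuit breaker.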

tests:
- Unit: Mock HIBP API responses, verify parsing and severity scoring
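- Integration: Test with the real HIBP API using HIBP's documented test accounts on the `hibp-integration-tests.com` domain (for example `account-exists@hibp-integration-tests.com`); avoid `test@example.com`, which is a widely used address and cannot be assumed breach-free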
- E2E: Add email to watchlist → trigger scan → verify exposure records created for breached email
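
For the unit tests on severity scoring, a pure function is the easiest target. The sketch below follows the tiers from step 6, and additionally treats passwords as critical to match the alert criterion in acceptance_criteria (an assumption, since step 6 does not place passwords in a tier). The data class strings are assumed to match HIBP's naming and should be verified against the `/api/v3/dataclasses` endpoint; `severityFor` is a hypothetical name.

```typescript
type Severity = "critical" | "warning" | "info";

// Assumed HIBP data class names; confirm against /api/v3/dataclasses before relying on them.
const CRITICAL_CLASSES = new Set(["Social security numbers", "Credit cards", "Passwords"]);
const WARNING_CLASSES = new Set(["Email addresses", "Phone numbers"]);

// The highest tier present in the breach wins; anything unrecognized falls through to "info".
function severityFor(dataClasses: string[]): Severity {
  if (dataClasses.some((c) => CRITICAL_CLASSES.has(c))) return "critical";
  if (dataClasses.some((c) => WARNING_CLASSES.has(c))) return "warning";
  return "info";
}
```

A Vitest case can then assert, for instance, that a breach exposing both email addresses and credit cards scores `critical` rather than `warning`.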

acceptance_criteria:
- [ ] `scanHIBP(email)` makes a real HTTP request to `https://haveibeenpwned.com/api/v3/breachedaccount/{email}` (email URL-encoded) with the `hibp-api-key` and `user-agent` headers set
- [ ] Breached emails create exposure records with correct breach metadata (name, date, data classes)
- [ ] Non-breached emails return empty results without creating false exposure records
- [ ] Rate limits are respected (limit is configurable to match the subscribed HIBP tier; `retry-after` is honored on 429)
- [ ] 404 responses are handled gracefully (no breach = no exposure, not an error)
- [ ] Circuit breaker opens after 3 consecutive failures and stays open for 60 seconds
- [ ] Exposure deduplication prevents duplicate records for same email + breach combination
- [ ] Alerts are generated for critical exposures (SSN, credit card, password) via existing pipeline
- [ ] HIBP breach metadata is cached in Redis and refreshed daily
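
One way to satisfy the rate-limit criterion is a small serial queue that spaces request starts by a configurable interval. The sketch below is illustrative only: `RequestQueue` and `Clock` are hypothetical names, and the clock is injectable so the spacing can be tested without real delays.

```typescript
type Clock = { now(): number; sleep(ms: number): Promise<void> };

const systemClock: Clock = {
  now: () => Date.now(),
  sleep: (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
};

class RequestQueue {
  private tail: Promise<void> = Promise.resolve();
  private nextSlot = 0;
  private readonly minIntervalMs: number;
  private readonly clock: Clock;

  constructor(minIntervalMs: number, clock: Clock = systemClock) {
    this.minIntervalMs = minIntervalMs;
    this.clock = clock;
  }

  // Runs tasks one at a time, never starting two less than minIntervalMs apart.
  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const run = this.tail.then(async () => {
      const wait = this.nextSlot - this.clock.now();
      if (wait > 0) await this.clock.sleep(wait);
      this.nextSlot = this.clock.now() + this.minIntervalMs;
      return task();
    });
    // Keep the chain usable even when a task rejects.
    this.tail = run.then(() => undefined, () => undefined);
    return run;
  }
}
```

For a tier that allows N requests per minute, `minIntervalMs` would be roughly `60000 / N`; a 429 handler could additionally sleep for the `retry-after` duration before re-enqueueing.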

validation:
- Run `vitest run darkwatch.test.ts` — all tests pass
- Manual: Add known breached email to watchlist, trigger scan, verify alert received
- Check Redis: `GET hibp:breaches` returns cached breach metadata
- Monitor logs: No `"not yet implemented"` or `console.log("[darkwatch] stub")` messages
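
The `hibp:breaches` check above assumes a cache-aside read with a one-day expiry. A minimal sketch follows; `KeyValueStore` is a hypothetical interface that a thin wrapper around the project's Redis client would implement, and `getCachedBreaches` is likewise an illustrative name.

```typescript
// Hypothetical storage surface; a Redis wrapper would map set() to a write with an expiry.
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

const BREACHES_KEY = "hibp:breaches";
const ONE_DAY_SECONDS = 24 * 60 * 60;

// Returns cached breach metadata when present; otherwise fetches, stores for a day, and returns it.
async function getCachedBreaches<T>(
  store: KeyValueStore,
  fetchBreaches: () => Promise<T[]>,
): Promise<T[]> {
  const cached = await store.get(BREACHES_KEY);
  if (cached !== null) return JSON.parse(cached) as T[];
  const fresh = await fetchBreaches();
  await store.set(BREACHES_KEY, JSON.stringify(fresh), ONE_DAY_SECONDS);
  return fresh;
}
```

Letting the key expire gives the daily refresh without a scheduler; a scheduled job that overwrites the key would be an alternative if a cold-cache request is unacceptable.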

notes:
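- Email breach lookups require a paid HIBP subscription in every environment, including development; pricing and per-minute rate limits vary by tier, so confirm current figures at https://haveibeenpwned.com/API/Key before budgeting
- The k-anonymity password check sends only the first 5 hex characters of the SHA-1 hash, so the full hash never leaves the service; the Pwned Passwords range API needs no API key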
- The existing `scan.engine.ts` has the circuit breaker infrastructure — wire HIBP client into it
- HIBP does NOT crawl the dark web; it only aggregates known public breaches. For live dark web monitoring, add Breachsense later (Phase 3)
- Consider subscribing to HIBP domain monitoring for enterprise upsell later
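
The k-anonymity note can be made concrete with a short sketch, assuming Node's built-in `crypto` module. The helper names (`sha1Upper`, `splitForRangeQuery`, `countInRange`) are hypothetical; the range endpoint `https://api.pwnedpasswords.com/range/{prefix}` returns one `SUFFIX:COUNT` pair per line, and matching happens locally.

```typescript
import { createHash } from "node:crypto";

// Uppercase hex SHA-1, the form used by the Pwned Passwords range API.
function sha1Upper(password: string): string {
  return createHash("sha1").update(password, "utf8").digest("hex").toUpperCase();
}

// Only `prefix` (5 hex characters) is sent to the API; `suffix` stays local.
function splitForRangeQuery(sha1Hex: string): { prefix: string; suffix: string } {
  const hash = sha1Hex.toUpperCase();
  return { prefix: hash.slice(0, 5), suffix: hash.slice(5) };
}

// Scans the range response for our suffix and returns its occurrence count, or 0 if absent.
function countInRange(rangeBody: string, suffix: string): number {
  for (const line of rangeBody.split(/\r?\n/)) {
    const [candidate, count] = line.trim().split(":");
    if (candidate === suffix) return Number(count);
  }
  return 0;
}
```

A `checkPassword(passwordHash)` implementation would call `splitForRangeQuery`, fetch the range for `prefix`, and pass the response text to `countInRange`.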