# 03. HaveIBeenPwned API Integration for Email Breach Monitoring

meta:
  id: core-services-03
  feature: core-services-implementation
  priority: P0
  depends_on: [core-services-01]
  tags: [darkwatch, hibp, breach-monitoring, api-integration, table-stakes]

objective:
- Replace the stub `scanHIBP()` function in the DarkWatch scan engine with a real HaveIBeenPwned (HIBP) API integration that checks user emails against known breach databases and creates exposure records.

deliverables:
- HIBP API client with k-anonymity support for password checking
- Email breach lookup with result parsing and normalization
- Exposure record creation in the database with proper severity scoring
- Alert generation via the existing alert pipeline
- Circuit breaker integration (already exists in the scan engine)

steps:
1. Sign up for an HIBP API key at https://haveibeenpwned.com/API/Key. The `breachedaccount` endpoint requires a paid subscription key, and request quotas depend on the subscription tier.
2. Add `HIBP_API_KEY` to `.env.example` and validate it in `env.ts`
3. Create `darkwatch/hibp.client.ts` with these functions:
   - `checkEmail(email): BreachResult[]`: query the `breachedaccount` endpoint, sending the `hibp-api-key` and `user-agent` headers and `truncateResponse=false` so that full breach metadata is returned instead of breach names only
   - `checkPassword(passwordHash): PwnedPasswordResult`: query the `pwnedpasswords` range endpoint using k-anonymity
   - `getBreaches(): Breach[]`: fetch breach metadata for caching
4. Parse the HIBP response: breach name, date, compromised data types, affected accounts
5. Map data types to the internal schema: email, password, phone, address, ssn, domain
6. Calculate severity: critical if SSN/credit card, warning if email/phone, info if username only
7. Deduplicate against existing exposures using `identifierHash` (already implemented)
8. Create exposure records via the existing `processExposure()` pipeline
9. Cache breach metadata in Redis (update daily) to reduce API calls
10. Handle rate limits: HIBP enforces a requests-per-minute limit set by the subscription tier, so make the rate configurable and implement a request queue
11. Add comprehensive error handling for 404 (no breach), 429 (rate limit), and 503 (service unavailable)

tests:
- Unit: Mock HIBP API responses, verify parsing and severity scoring
- Integration: Test with the real HIBP API using test email `test@example.com` (no breaches expected)
- E2E: Add email to watchlist → trigger scan → verify exposure records created for breached email

acceptance_criteria:
- [ ] `scanHIBP(email)` makes a real HTTP request to `https://haveibeenpwned.com/api/v3/breachedaccount/{email}` (email URL-encoded)
- [ ] Breached emails create exposure records with correct breach metadata (name, date, data classes)
- [ ] Non-breached emails return empty results without creating false exposure records
- [ ] Rate limits are respected (rate configurable per subscription tier; `retry-after` honored on 429)
- [ ] 404 responses are handled gracefully (no breach = no exposure, not an error)
- [ ] Circuit breaker opens after 3 consecutive failures and stays open for 60 seconds
- [ ] Exposure deduplication prevents duplicate records for the same email + breach combination
- [ ] Alerts are generated for critical exposures (SSN, password) via the existing pipeline
- [ ] HIBP breach metadata is cached in Redis and refreshed daily

validation:
- Run `vitest run darkwatch.test.ts`: all tests pass
- Manual: Add a known breached email to the watchlist, trigger a scan, verify the alert is received
- Check Redis: `GET hibp:breaches` returns cached breach metadata
- Monitor logs: No `"not yet implemented"` or `console.log("[darkwatch] stub")` messages

notes:
- The `breachedaccount` endpoint requires a paid HIBP key (this plan budgets $3.50/mo); confirm current tier pricing and request limits before production. The Pwned Passwords range API and the breach list endpoint do not require a key.
- The k-anonymity password check sends only the first 5 characters of the SHA-1 hash, so it is already privacy-safe
- The existing `scan.engine.ts` has the circuit breaker infrastructure; wire the HIBP client into it
- HIBP does NOT crawl the dark web; it only aggregates known public breaches. For live dark web monitoring, add Breachsense later (Phase 3)
- Consider subscribing to HIBP domain monitoring for an enterprise upsell later
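
Steps 5 and 6 (data type mapping and severity scoring) could look roughly like the sketch below. The function names, the `"other"` fallback type, and the exact HIBP data class labels are assumptions for illustration; check the labels against the live data class list before relying on them. Passwords are not in the critical set here because step 6 does not list them, even though the acceptance criteria do, so that rule needs a decision before implementation.

```typescript
// Severity levels named in step 6.
type Severity = "critical" | "warning" | "info";

// Internal data types from step 5, plus an assumed "other" fallback.
type InternalDataType =
  | "email" | "password" | "phone" | "address" | "ssn" | "domain" | "other";

// Hypothetical mapping from HIBP data class labels to the internal schema.
const DATA_CLASS_MAP: Record<string, InternalDataType> = {
  "Email addresses": "email",
  "Passwords": "password",
  "Phone numbers": "phone",
  "Physical addresses": "address",
  "Social security numbers": "ssn",
};

// Data classes that drive each severity level, following step 6.
const CRITICAL_CLASSES = new Set(["Social security numbers", "Credit cards"]);
const WARNING_CLASSES = new Set(["Email addresses", "Phone numbers"]);

// Map HIBP labels to internal types, dropping duplicates.
function mapDataClasses(dataClasses: string[]): InternalDataType[] {
  const mapped = dataClasses.map((label) => DATA_CLASS_MAP[label] ?? "other");
  return Array.from(new Set(mapped));
}

// The highest matching level wins; anything unmatched falls through to "info".
function scoreSeverity(dataClasses: string[]): Severity {
  if (dataClasses.some((label) => CRITICAL_CLASSES.has(label))) return "critical";
  if (dataClasses.some((label) => WARNING_CLASSES.has(label))) return "warning";
  return "info";
}
```

Keeping both functions pure makes the unit tests in the `tests:` section straightforward: they need only arrays of data class strings, not mocked HTTP responses.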
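
The k-anonymity lookup behind `checkPassword` can be sketched as follows: hash locally, send only the 5-character prefix, and match the remaining 35 characters against the returned list. The helper names (`sha1Hex`, `splitHash`, `countInRange`) and the numeric return type are assumptions; the spec's `PwnedPasswordResult` type is not defined in this document.

```typescript
import { createHash } from "node:crypto";

// Uppercase SHA-1 hex digest, matching the casing of the range API response.
function sha1Hex(password: string): string {
  return createHash("sha1").update(password, "utf8").digest("hex").toUpperCase();
}

// Only the prefix is sent to the API; the suffix never leaves the process.
function splitHash(hashHex: string): { prefix: string; suffix: string } {
  return { prefix: hashHex.slice(0, 5), suffix: hashHex.slice(5) };
}

// The range response has one "SUFFIX:COUNT" entry per line.
// Returns the count for our suffix, or 0 when it is absent.
function countInRange(body: string, suffix: string): number {
  for (const line of body.split(/\r?\n/)) {
    const [candidate, count] = line.trim().split(":");
    if (candidate === suffix) return parseInt(count ?? "0", 10);
  }
  return 0;
}

// Hypothetical client function: takes the SHA-1 hex hash, as in step 3.
async function checkPassword(passwordHash: string): Promise<number> {
  const { prefix, suffix } = splitHash(passwordHash.toUpperCase());
  const res = await fetch(`https://api.pwnedpasswords.com/range/${prefix}`);
  if (!res.ok) throw new Error(`Pwned Passwords request failed: ${res.status}`);
  return countInRange(await res.text(), suffix);
}
```

A caller would use it as `await checkPassword(sha1Hex(candidate))` and treat any count above zero as a compromised password.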
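
For step 11 and the 404 acceptance criterion, one possible shape is to classify each response into an explicit outcome so that "no breach" never reaches the circuit breaker as a failure. The `LookupOutcome` type, the function names, the 2-second fallback delay, and the `user-agent` value below are illustrative assumptions, not part of the existing scan engine.

```typescript
// Possible results of one breachedaccount lookup.
type LookupOutcome =
  | { kind: "breaches"; names: string[] }   // 200: account found in breaches
  | { kind: "clean" }                       // 404: no breach, not an error
  | { kind: "retry"; afterMs: number }      // 429: rate limited, requeue
  | { kind: "failure"; status: number };    // 503 and others: count as failure

// Assumed fallback delay when the retry-after header is missing or invalid.
const DEFAULT_RETRY_MS = 2000;

function interpretResponse(
  status: number,
  retryAfter: string | null,
  body: string,
): LookupOutcome {
  if (status === 200) {
    const breaches = JSON.parse(body) as Array<{ Name: string }>;
    return { kind: "breaches", names: breaches.map((b) => b.Name) };
  }
  if (status === 404) return { kind: "clean" };
  if (status === 429) {
    // retry-after is expressed in seconds.
    const seconds = Number(retryAfter);
    const afterMs =
      isFinite(seconds) && seconds > 0 ? seconds * 1000 : DEFAULT_RETRY_MS;
    return { kind: "retry", afterMs };
  }
  return { kind: "failure", status };
}

// Hypothetical wrapper around fetch; the API key is passed in by the caller.
async function lookupEmail(email: string, apiKey: string): Promise<LookupOutcome> {
  const url =
    "https://haveibeenpwned.com/api/v3/breachedaccount/" +
    encodeURIComponent(email) +
    "?truncateResponse=false";
  const res = await fetch(url, {
    headers: { "hibp-api-key": apiKey, "user-agent": "DarkWatch" },
  });
  return interpretResponse(res.status, res.headers.get("retry-after"), await res.text());
}
```

In this arrangement only the `failure` outcome would feed the circuit breaker's consecutive-failure count, while `retry` would return the request to the queue after the indicated delay.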