Kordant/tasks/core-services-implementation/03-darkwatch-hibp.md
2026-05-31 22:03:18 -04:00

03. HaveIBeenPwned API Integration for Email Breach Monitoring

meta:
  id: core-services-03
  feature: core-services-implementation
  priority: P0
  depends_on: [core-services-01]
  tags: [darkwatch, hibp, breach-monitoring, api-integration, table-stakes]

objective:

  • Replace the stub scanHIBP() function in the DarkWatch scan engine with a real HaveIBeenPwned API integration that checks user emails against known breach databases and creates exposure records.

deliverables:

  • HIBP API client with k-anonymity support for password checking
  • Email breach lookup with result parsing and normalization
  • Exposure record creation in database with proper severity scoring
  • Alert generation via existing alert pipeline
  • Circuit breaker integration (already exists in scan engine)

steps:

  1. Sign up for an HIBP API key at https://haveibeenpwned.com/API/Key (free tier: 1,500 req/mo); confirm the current tiers and rate limits there before sizing the request queue in step 10
  2. Add HIBP_API_KEY to .env.example and validate in env.ts
  3. Create darkwatch/hibp.client.ts with functions:
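    • checkEmail(email): BreachResult[] (query the breachedaccount endpoint with the hibp-api-key and user-agent headers; pass truncateResponse=false, since by default only breach names are returned)
    • checkPassword(passwordHash): PwnedPasswordResult (query the Pwned Passwords range endpoint with only the first 5 characters of the SHA-1 hash, i.e. k-anonymity; no API key required)
    • getBreaches(): Breach[] (fetch metadata for all breaches from the /breaches endpoint for caching; no API key required)
  4. Parse each HIBP breach: name (Name), date (BreachDate), compromised data types (DataClasses), affected accounts (PwnCount)
  5. Map data types to the internal schema: email, password, phone, address, ssn, credit_card, username, domain
  6. Calculate severity, highest match wins: critical if SSN, credit card, or password; warning if email or phone; info if username only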
  7. Deduplicate against existing exposures using identifierHash (already implemented)
  8. Create exposure records via existing processExposure() pipeline
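  9. Cache breach metadata from getBreaches() in Redis under hibp:breaches (refresh daily) to reduce API calls
  10. Handle rate limits (1 req/sec free tier, 10 req/sec paid) by routing every HIBP call through a request queue with a configurable interval
  11. Add explicit handling for 404 (no breach: return an empty result, not an error), 401 (missing or invalid API key), 429 (rate limited: back off per the Retry-After header), and 503 (service unavailable)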
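A minimal sketch of checkEmail() from step 3, including the status handling from step 11. It assumes a hypothetical injected HttpGet transport (so the client can be unit-tested without network access), a hypothetical HibpError class that the scan engine's circuit breaker would count as a failure, and a placeholder user-agent string. BreachResult lists only the fields step 4 parses.

```typescript
// Minimal subset of a Fetch API Response that this sketch relies on.
interface HttpResponse {
  status: number;
  headers: { get(name: string): string | null };
  text(): Promise<string>;
}

// Hypothetical injected transport, so the client can be tested without network access.
type HttpGet = (url: string, headers: Record<string, string>) => Promise<HttpResponse>;

// Fields parsed in step 4 (HIBP returns more fields than these).
interface BreachResult {
  Name: string;
  BreachDate: string;
  DataClasses: string[];
  PwnCount: number;
}

// Hypothetical error type; the scan engine's circuit breaker would count it as a failure.
class HibpError extends Error {
  readonly status: number;
  readonly retryAfterSeconds: number | undefined;

  constructor(status: number, retryAfterSeconds?: number) {
    super(`HIBP request failed with HTTP ${status}`);
    this.name = "HibpError";
    this.status = status;
    this.retryAfterSeconds = retryAfterSeconds;
  }
}

const HIBP_API_BASE = "https://haveibeenpwned.com/api/v3";

// The email must be URL-encoded; truncateResponse=false returns full breach objects, not just names.
function breachedAccountUrl(email: string): string {
  return `${HIBP_API_BASE}/breachedaccount/${encodeURIComponent(email)}?truncateResponse=false`;
}

async function checkEmail(email: string, apiKey: string, httpGet: HttpGet): Promise<BreachResult[]> {
  const res = await httpGet(breachedAccountUrl(email), {
    "hibp-api-key": apiKey,
    "user-agent": "kordant-darkwatch", // placeholder value; HIBP requires a user agent
  });
  // 404 means the account appears in no breach: an empty result, not an error.
  if (res.status === 404) return [];
  if (res.status === 200) return JSON.parse(await res.text()) as BreachResult[];
  // 401, 429, 503, etc.: surface the status (and Retry-After, sent on 429) to the caller.
  const retryAfter = res.headers.get("retry-after");
  throw new HibpError(res.status, retryAfter === null ? undefined : Number(retryAfter));
}
```

In production, httpGet can be as thin as (url, headers) => fetch(url, { headers }), since a Fetch API Response provides status, headers.get() and text().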
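Steps 5 and 6 can be sketched as two pure functions. The HIBP data class labels in DATA_CLASS_MAP are assumptions to verify against GET https://haveibeenpwned.com/api/v3/dataclasses; unmapped labels are ignored, and the internal domain type is omitted because its matching HIBP label is not specified here. The sketch treats passwords as critical, following the alert criterion, and returns null where the spec defines no level (for example, an address-only breach) so that gap stays visible.

```typescript
// Internal exposure types from step 5 that step 6 scores on.
type ExposureType = "email" | "password" | "phone" | "address" | "ssn" | "credit_card" | "username";
type Severity = "critical" | "warning" | "info";

// ASSUMPTION: HIBP data class labels; verify against GET /api/v3/dataclasses before relying on them.
const DATA_CLASS_MAP = new Map<string, ExposureType>([
  ["Email addresses", "email"],
  ["Passwords", "password"],
  ["Phone numbers", "phone"],
  ["Physical addresses", "address"],
  ["Social security numbers", "ssn"],
  ["Credit cards", "credit_card"],
  ["Usernames", "username"],
]);

// Maps a breach's DataClasses to distinct internal types, ignoring labels with no mapping.
function mapDataClasses(dataClasses: string[]): ExposureType[] {
  const types = new Set<ExposureType>();
  for (const label of dataClasses) {
    const type = DATA_CLASS_MAP.get(label);
    if (type !== undefined) types.add(type);
  }
  return Array.from(types);
}

// Highest matching level wins; null where the spec defines no level (e.g. address only).
function scoreSeverity(types: ExposureType[]): Severity | null {
  const has = (...wanted: ExposureType[]): boolean => types.some((t) => wanted.includes(t));
  if (has("ssn", "credit_card", "password")) return "critical";
  if (has("email", "phone")) return "warning";
  if (has("username")) return "info";
  return null;
}
```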
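One way to build the request queue in step 10 is a minimum-interval scheduler: each call reserves the next free time slot, so consecutive HIBP requests start at least minIntervalMs apart even when scans run concurrently, and a 429's Retry-After value pushes the next slot back. RequestQueue is a hypothetical name, the clock and sleep functions are injectable for testing, and the sketch assumes a single process; several workers sharing one API key would need a shared limiter instead.

```typescript
// Hypothetical single-process scheduler: HIBP requests start at least minIntervalMs apart.
class RequestQueue {
  private nextSlotMs = 0;
  private readonly minIntervalMs: number;
  private readonly now: () => number;
  private readonly sleep: (ms: number) => Promise<void>;

  constructor(
    minIntervalMs: number,
    now: () => number = () => Date.now(),
    sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  ) {
    this.minIntervalMs = minIntervalMs;
    this.now = now;
    this.sleep = sleep;
  }

  // Reserves the next free slot and returns how long (ms) the caller must wait for it.
  reserve(): number {
    const now = this.now();
    const slot = Math.max(now, this.nextSlotMs);
    this.nextSlotMs = slot + this.minIntervalMs;
    return slot - now;
  }

  // After a 429, delays future slots by Retry-After seconds (already-reserved slots are unchanged).
  backOff(retryAfterSeconds: number): void {
    this.nextSlotMs = Math.max(this.nextSlotMs, this.now() + retryAfterSeconds * 1000);
  }

  // Waits for a reserved slot, then runs the HIBP call.
  async run<T>(task: () => Promise<T>): Promise<T> {
    const waitMs = this.reserve();
    if (waitMs > 0) await this.sleep(waitMs);
    return task();
  }
}
```

With the free-tier figure from step 10, new RequestQueue(1000) spaces requests one second apart; the paid-tier interval would come from configuration.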

tests:

  • Unit: Mock HIBP API responses, verify parsing and severity scoring
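  • Integration: Test against the real HIBP API with an address that has no breaches. test@example.com is a common placeholder and may itself appear in breaches; HIBP's documented test accounts on the hibp-integration-tests.com domain are a more predictable fixture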
  • E2E: Add email to watchlist → trigger scan → verify exposure records created for breached email

acceptance_criteria:

  • scanHIBP(email) makes a real HTTP request to https://haveibeenpwned.com/api/v3/breachedaccount/{email} (email URL-encoded)
  • Breached emails create exposure records with correct breach metadata (name, date, data classes)
  • Non-breached emails return empty results without creating false exposure records
  • Rate limits are respected (1 req/sec free tier, configurable for paid)
  • 404 responses are handled gracefully (no breach = no exposure, not an error)
  • Circuit breaker opens after 3 consecutive failures and stays open for 60 seconds
  • Exposure deduplication prevents duplicate records for same email + breach combination
  • Alerts are generated for critical exposures (SSN, credit card, password) via the existing pipeline
  • HIBP breach metadata is cached in Redis and refreshed daily
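The circuit breaker criterion can be read as the state machine below: three consecutive failures open the breaker, it rejects calls for 60 seconds, then lets a trial request through (half-open), which closes it on success or reopens it on failure. This is a hypothetical sketch for pinning down the criterion's semantics, not the existing breaker in scan.engine.ts, whose API may differ. Because a 404 means "no breach", it should be recorded as a success.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Hypothetical breaker illustrating the acceptance criterion, not the scan.engine.ts API.
class CircuitBreaker {
  private consecutiveFailures = 0;
  private openedAtMs: number | null = null;
  private readonly threshold: number;
  private readonly openMs: number;
  private readonly now: () => number;

  constructor(threshold = 3, openMs = 60_000, now: () => number = () => Date.now()) {
    this.threshold = threshold;
    this.openMs = openMs;
    this.now = now;
  }

  state(): BreakerState {
    if (this.openedAtMs === null) return "closed";
    // Once openMs has elapsed, a trial request is allowed through.
    return this.now() - this.openedAtMs >= this.openMs ? "half-open" : "open";
  }

  canRequest(): boolean {
    return this.state() !== "open";
  }

  // Call for 200 and 404 responses (404 means "no breach", not a failure).
  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAtMs = null;
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
    // Opens at the threshold; a failed half-open trial reopens it for another openMs.
    if (this.consecutiveFailures >= this.threshold) this.openedAtMs = this.now();
  }
}
```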

validation:

  • Run vitest run darkwatch.test.ts and confirm all tests pass
  • Manual: Add known breached email to watchlist, trigger scan, verify alert received
  • Check Redis: GET hibp:breaches returns cached breach metadata
  • Monitor logs: No "not yet implemented" or console.log("[darkwatch] stub") messages

notes:

  • The HIBP free tier (1,500 requests/month) is enough for development; production needs a paid tier ($3.50/mo). Confirm current pricing before launch
  • The k-anonymity password check sends only the first 5 hex characters of the SHA-1 hash and matches the returned suffixes locally, so neither the password nor its full hash leaves the service
  • The existing scan.engine.ts already has the circuit breaker infrastructure; wire the HIBP client into it
  • HIBP does NOT crawl the dark web; it only aggregates known public breaches. For live dark web monitoring, add Breachsense later (Phase 3)
  • Consider subscribing to HIBP domain monitoring for enterprise upsell later
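The k-anonymity exchange described in the password-check note can be sketched as three small functions. The caller supplies the SHA-1 hex digest, as checkPassword(passwordHash) does; only the 5-character prefix goes into the request URL, and the SUFFIX:COUNT lines in the response are matched locally. The function names are hypothetical; the range endpoint itself needs no API key.

```typescript
const PWNED_PASSWORDS_RANGE = "https://api.pwnedpasswords.com/range/";

// Splits a SHA-1 hex digest into the 5-char prefix that is sent and the 35-char suffix that stays local.
function splitSha1(sha1Hex: string): { prefix: string; suffix: string } {
  const hash = sha1Hex.toUpperCase();
  if (!/^[0-9A-F]{40}$/.test(hash)) throw new Error("expected a 40-character SHA-1 hex digest");
  return { prefix: hash.slice(0, 5), suffix: hash.slice(5) };
}

// Only the prefix appears in the request.
function rangeUrl(prefix: string): string {
  return PWNED_PASSWORDS_RANGE + prefix;
}

// Scans the "SUFFIX:COUNT" lines of a range response; returns 0 if the suffix is absent.
function pwnedCount(rangeBody: string, suffix: string): number {
  for (const line of rangeBody.split(/\r?\n/)) {
    const [candidate, count] = line.trim().split(":");
    if (candidate !== undefined && count !== undefined && candidate.toUpperCase() === suffix.toUpperCase()) {
      return Number.parseInt(count, 10);
    }
  }
  return 0;
}
```

For example, the SHA-1 of the string "password" is 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8, so the request goes to .../range/5BAA6 and the remaining 35 characters are compared only on the Kordant side.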