03. HaveIBeenPwned API Integration for Email Breach Monitoring

meta:

id: core-services-03
feature: core-services-implementation
priority: P0
depends_on: [core-services-01]
tags: [darkwatch, hibp, breach-monitoring, api-integration, table-stakes]

objective:

Replace the stub scanHIBP() function in the DarkWatch scan engine with a real HaveIBeenPwned API integration that checks user emails against known breach databases and creates exposure records.

deliverables:

HIBP API client with k-anonymity support for password checking
Email breach lookup with result parsing and normalization
Exposure record creation in database with proper severity scoring
Alert generation via existing alert pipeline
Circuit breaker integration (already exists in scan engine)
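
The parsing, normalization, and severity-scoring deliverables could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual schema: the `HibpBreach` fields follow the HIBP v3 breach model, but the data class labels in `DATA_CLASS_MAP` should be checked against HIBP's data classes list, and `credit_card` and `username` are assumed additions to the internal types because the severity rules refer to them. Passwords are scored as critical here, following the acceptance criteria that list password exposures as alert-worthy.

```typescript
// Subset of the HIBP breach model; fields beyond these are ignored here.
interface HibpBreach {
  Name: string;
  BreachDate: string; // ISO date string
  PwnCount: number;
  DataClasses: string[];
}

// Internal data types from the spec, plus "credit_card" and "username",
// which are assumed additions because the severity rules reference them.
type DataType =
  | "email" | "password" | "phone" | "address" | "ssn" | "domain"
  | "credit_card" | "username";

type Severity = "critical" | "warning" | "info";

interface BreachResult {
  name: string;
  date: string;
  affectedAccounts: number;
  dataTypes: DataType[];
  severity: Severity;
}

// Assumed mapping from HIBP data class labels (lowercased) to internal types.
const DATA_CLASS_MAP: Readonly<Record<string, DataType | undefined>> = {
  "email addresses": "email",
  "passwords": "password",
  "phone numbers": "phone",
  "physical addresses": "address",
  "social security numbers": "ssn",
  "credit cards": "credit_card",
  "usernames": "username",
};

const CRITICAL_TYPES = new Set<DataType>(["ssn", "credit_card", "password"]);
const WARNING_TYPES = new Set<DataType>(["email", "phone"]);

// Highest matching tier wins; anything else (e.g. username only) is "info".
function scoreSeverity(types: DataType[]): Severity {
  if (types.some((t) => CRITICAL_TYPES.has(t))) return "critical";
  if (types.some((t) => WARNING_TYPES.has(t))) return "warning";
  return "info";
}

// Converts one raw HIBP breach into the normalized internal shape.
// Unknown data classes are dropped and duplicates are removed.
function normalizeBreach(raw: HibpBreach): BreachResult {
  const mapped = raw.DataClasses
    .map((label) => DATA_CLASS_MAP[label.trim().toLowerCase()])
    .filter((t): t is DataType => t !== undefined);
  const dataTypes = Array.from(new Set(mapped));
  return {
    name: raw.Name,
    date: raw.BreachDate,
    affectedAccounts: raw.PwnCount,
    dataTypes,
    severity: scoreSeverity(dataTypes),
  };
}
```

Keeping the tier sets as constants makes the scoring rule easy to adjust if the team decides that passwords should be scored differently.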

steps:

Sign up for HIBP API key at https://haveibeenpwned.com/API/Key (free tier: 1,500 req/mo)
Add HIBP_API_KEY to .env.example and validate in env.ts
Create darkwatch/hibp.client.ts with functions:
- checkEmail(email): Promise<BreachResult[]>, which queries the breachedaccount endpoint
- checkPassword(passwordHash): Promise<PwnedPasswordResult>, which queries the pwnedpasswords range endpoint using k-anonymity
- getBreaches(): Promise<Breach[]>, which fetches breach metadata for caching
Parse HIBP response (request it with truncateResponse=false, because the default response contains only breach names): breach name, date, compromised data types, affected accounts
Map data types to internal schema: email, password, phone, address, ssn, domain
Calculate severity: critical if SSN, credit card, or password; warning if email/phone; info if username only
Deduplicate against existing exposures using identifierHash (already implemented)
Create exposure records via existing processExposure() pipeline
Cache breach metadata in Redis (update daily) to reduce API calls
Handle rate limits: 1 req/sec free tier, 10 req/sec paid — implement request queue
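Handle HTTP errors explicitly: 404 (no breach; return an empty result), 429 (rate limited; honor the retry-after header before retrying), 503 (service unavailable; report as a failure to the circuit breaker)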
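
The email lookup and status handling from the steps above can be sketched as follows. This is a minimal illustration, not the final client: `HttpGet`, `HttpResponse`, and `HibpError` are hypothetical names introduced so the function can be tested without network access, the `user-agent` value is a placeholder, and the function returns raw HIBP breach objects rather than the normalized `BreachResult`. The `hibp-api-key` header, the `truncateResponse=false` query parameter, and the `retry-after` header on 429 responses are part of the HIBP v3 API.

```typescript
// Subset of the HIBP breach model; fields beyond these are ignored here.
interface HibpBreach {
  Name: string;
  BreachDate: string;
  PwnCount: number;
  DataClasses: string[];
}

// Hypothetical HTTP abstraction so the client can be unit-tested offline.
// On Node 18+, (url, headers) => fetch(url, { headers }) satisfies HttpGet.
interface HttpResponse {
  status: number;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}
type HttpGet = (url: string, headers: Record<string, string>) => Promise<HttpResponse>;

// Error type (hypothetical) carrying the status so callers can decide
// whether to retry or count the failure toward the circuit breaker.
class HibpError extends Error {
  readonly status: number;
  readonly retryAfterSec?: number;
  constructor(message: string, status: number, retryAfterSec?: number) {
    super(message);
    this.name = "HibpError";
    this.status = status;
    this.retryAfterSec = retryAfterSec;
  }
}

const HIBP_BASE = "https://haveibeenpwned.com/api/v3";

async function checkEmail(email: string, apiKey: string, httpGet: HttpGet): Promise<HibpBreach[]> {
  const url = `${HIBP_BASE}/breachedaccount/${encodeURIComponent(email)}?truncateResponse=false`;
  const res = await httpGet(url, {
    "hibp-api-key": apiKey,
    "user-agent": "darkwatch-scan-engine", // placeholder application name
  });
  switch (res.status) {
    case 200:
      return (await res.json()) as HibpBreach[];
    case 404:
      return []; // no breach for this account: a normal result, not an error
    case 429: {
      const header = res.headers.get("retry-after");
      const seconds = header === null ? NaN : Number(header);
      throw new HibpError("HIBP rate limit hit", 429, Number.isFinite(seconds) ? seconds : undefined);
    }
    case 503:
      throw new HibpError("HIBP service unavailable", 503);
    default:
      throw new HibpError(`Unexpected HIBP status ${res.status}`, res.status);
  }
}
```

Passing the HTTP function in as a parameter keeps the unit tests independent of the real API; the scan engine would supply a wrapper around its own HTTP client.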

tests:

Unit: Mock HIBP API responses, verify parsing and severity scoring
Integration: Test with real HIBP API using test email test@example.com (no breaches expected)
E2E: Add email to watchlist → trigger scan → verify exposure records created for breached email

acceptance_criteria:

scanHIBP(email) makes a real HTTP request to https://haveibeenpwned.com/api/v3/breachedaccount/{email}, with the email URL-encoded and the API key sent in the hibp-api-key header
Breached emails create exposure records with correct breach metadata (name, date, data classes)
Non-breached emails return empty results without creating false exposure records
Rate limits are respected (1 req/sec free tier, configurable for paid)
404 responses are handled gracefully (no breach = no exposure, not an error)
Circuit breaker opens after 3 consecutive failures and stays open for 60 seconds
Exposure deduplication prevents duplicate records for same email + breach combination
Alerts are generated for critical exposures (SSN, password) via existing pipeline
HIBP breach metadata is cached in Redis and refreshed daily
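
One way to meet the rate-limit criterion is a small serial queue that spaces request starts by a configurable interval. The sketch below is an assumption about how such a queue could be built, not the scan engine's existing code; `RateLimitedQueue` and its injectable `now` and `sleep` parameters are hypothetical names chosen so that tests can run on a fake clock.

```typescript
type Sleep = (ms: number) => Promise<void>;

// Runs tasks one at a time and keeps at least minIntervalMs between the
// start of one task and the start of the next.
class RateLimitedQueue {
  private tail: Promise<void> = Promise.resolve();
  private nextSlot = 0;
  private readonly minIntervalMs: number;
  private readonly now: () => number;
  private readonly sleep: Sleep;

  constructor(
    minIntervalMs: number,
    now: () => number = Date.now,
    sleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  ) {
    this.minIntervalMs = minIntervalMs;
    this.now = now;
    this.sleep = sleep;
  }

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const run = this.tail.then(async () => {
      const wait = this.nextSlot - this.now();
      if (wait > 0) await this.sleep(wait);
      this.nextSlot = this.now() + this.minIntervalMs;
      return task();
    });
    // A rejected task must not block the tasks queued behind it.
    this.tail = run.then(() => undefined, () => undefined);
    return run;
  }
}

// Example: 1000 ms spacing corresponds to the 1 req/sec limit in the spec;
// the interval would come from configuration for other plans.
const hibpQueue = new RateLimitedQueue(1000);
```

Each HIBP call would then be wrapped as `hibpQueue.enqueue(() => ...)`, so callers keep their normal promise-based flow while the queue enforces the spacing.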

validation:

Run vitest run darkwatch.test.ts — all tests pass
Manual: Add known breached email to watchlist, trigger scan, verify alert received
Check Redis: GET hibp:breaches returns cached breach metadata
Monitor logs: No "not yet implemented" or console.log("[darkwatch] stub") messages
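
The daily metadata cache checked above could follow a simple read-through pattern. The sketch below assumes a minimal `KeyValueStore` interface rather than any particular Redis client, since the spec does not name one; `getBreachesCached` and `MemoryStore` are hypothetical names, and with a real client `setWithTtl` would typically map to a Redis `SET` with the `EX` option.

```typescript
// Minimal store abstraction (hypothetical); a Redis-backed implementation
// would wrap whichever client library the project already uses.
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  setWithTtl(key: string, value: string, ttlSeconds: number): Promise<void>;
}

const BREACH_CACHE_KEY = "hibp:breaches";
const ONE_DAY_SECONDS = 24 * 60 * 60;

// Returns cached breach metadata when present; otherwise loads it once,
// stores it with a one-day expiry, and returns the fresh copy.
async function getBreachesCached<T>(
  store: KeyValueStore,
  loadBreaches: () => Promise<T[]>,
): Promise<T[]> {
  const cached = await store.get(BREACH_CACHE_KEY);
  if (cached !== null) return JSON.parse(cached) as T[];
  const fresh = await loadBreaches();
  await store.setWithTtl(BREACH_CACHE_KEY, JSON.stringify(fresh), ONE_DAY_SECONDS);
  return fresh;
}

// In-memory stand-in for Redis, intended for unit tests only.
class MemoryStore implements KeyValueStore {
  private readonly entries = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | null> {
    const entry = this.entries.get(key);
    if (entry === undefined || entry.expiresAt <= Date.now()) return null;
    return entry.value;
  }

  async setWithTtl(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.entries.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}
```

Letting the key expire after a day gives the daily refresh without a separate scheduled job; the first scan after expiry reloads the metadata.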

notes:

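HIBP free tier is 1,500 requests/month, which is enough for development; production needs the paid tier ($3.50/mo). Confirm current quotas and pricing on the HIBP API key page before launch, since plan terms change
The k-anonymity password check sends only the first 5 characters of the SHA-1 hash, so the full hash never leaves the client; the response lists matching hash suffixes, which are compared locally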
The existing scan.engine.ts has the circuit breaker infrastructure — wire HIBP client into it
HIBP does NOT crawl the dark web; it only aggregates known public breaches. For live dark web monitoring, add Breachsense later (Phase 3)
Consider subscribing to HIBP domain monitoring for enterprise upsell later
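
The k-anonymity check mentioned in the notes can be sketched as below. The helper names (`sha1Hex`, `splitForRangeQuery`, `countMatches`) are hypothetical; the sketch relies on the Pwned Passwords range endpoint at `https://api.pwnedpasswords.com/range/{prefix}`, which returns one `SUFFIX:COUNT` line per hash sharing the 5-character prefix. The network call itself is left out so the example stays runnable offline.

```typescript
import { createHash } from "node:crypto";

// Uppercase hex SHA-1 of a password, the form used by the range API.
function sha1Hex(password: string): string {
  return createHash("sha1").update(password, "utf8").digest("hex").toUpperCase();
}

// Only the 5-character prefix is sent to the API; the 35-character suffix
// stays local and is matched against the response.
function splitForRangeQuery(hashHex: string): { prefix: string; suffix: string } {
  const upper = hashHex.toUpperCase();
  return { prefix: upper.slice(0, 5), suffix: upper.slice(5) };
}

// Scans a range response body and returns how many times the password
// behind the given suffix appears in known breaches, or 0 if absent.
function countMatches(rangeBody: string, suffix: string): number {
  const wanted = suffix.toUpperCase();
  for (const line of rangeBody.split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    if (line.slice(0, sep).trim().toUpperCase() === wanted) {
      return Number(line.slice(sep + 1).trim());
    }
  }
  return 0;
}
```

A `checkPassword` implementation would fetch the range for `prefix`, pass the response text to `countMatches`, and treat any count above zero as a pwned password.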
