shortcomings

2026-05-31 22:03:18 -04:00
parent 3b29de3234
commit c159f07322
17 changed files with 1535 additions and 4 deletions
--- a/tasks/core-services-implementation/01-stripe-checkout-webhooks.md
+++ b/tasks/core-services-implementation/01-stripe-checkout-webhooks.md
@@ -0,0 +1,57 @@
+# 01. Stripe Checkout, Webhooks, and Subscription State Management
+
+meta:
+  id: core-services-01
+  feature: core-services-implementation
+  priority: P0
+  depends_on: []
+  tags: [billing, stripe, payments, foundation]
+
+objective:
+- Enable paid customer acquisition by implementing the complete Stripe payment lifecycle: checkout, webhook handling, subscription state machine, and customer portal.
+
+deliverables:
+- Stripe Checkout session creation for each plan tier (Shield, Guard, Fortress, Family Fortress)
+- Webhook endpoint handling all critical Stripe events
+- Subscription state machine in Drizzle ORM
+- Customer portal (billing settings, plan change, cancellation)
+- Trial period support (14-day free trial)
+
+steps:
+1. Add `STRIPE_WEBHOOK_SECRET` to `.env.example` and validate in `env.ts`
+2. Implement `createCheckoutSession(planId, customerId?, trial?)` in `billing.service.ts`
+3. Implement `POST /api/webhooks/stripe` route handler with signature verification
+4. Handle events: `checkout.session.completed`, `invoice.payment_succeeded`, `invoice.payment_failed`, `customer.subscription.updated`, `customer.subscription.deleted`
+5. Update subscription record in database on each event (status, tier, period end, payment method)
+6. Implement `createCustomerPortalSession(customerId)` for subscription management
+7. Add trial logic: create subscription with `trial_end`, handle trial-to-paid transition
+8. Add proration logic for tier upgrades/downgrades using `proration_behavior: 'create_prorations'`
+9. Update billing router tRPC procedures: `getCheckoutUrl`, `getPortalUrl`, `getSubscription`, `cancelSubscription`
+10. Add rate limiting on checkout creation (prevent abuse)
+
+tests:
+- Unit: Mock Stripe API responses, verify database state transitions for each webhook event
+- Integration: Create real Stripe test-mode checkout session, complete payment, verify subscription activation
+- E2E: End-to-end checkout flow from dashboard → Stripe Checkout → webhook → active subscription
+
+acceptance_criteria:
+- [ ] Customer can click "Subscribe" on Shield plan and be redirected to Stripe Checkout
+- [ ] After successful payment, webhook creates active subscription record in database
+- [ ] Customer can access billing portal to view invoices, change plan, or cancel
+- [ ] Trial subscription auto-converts to paid or suspends after trial ends
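+- [ ] Tier upgrade updates the subscription immediately and creates proration line items (with `proration_behavior: 'create_prorations'` these are billed on the next invoice, not invoiced at once)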
+- [ ] `invoice.payment_failed` sets grace period status and sends retry email
+- [ ] All webhook events are idempotent (duplicate events don't create duplicate records)
+- [ ] Webhook handler returns 200 for handled events, 400 for invalid signatures
+
+validation:
+- Run `stripe trigger checkout.session.completed` in Stripe CLI, verify database record
+- Run `stripe trigger invoice.payment_failed`, verify grace period status
+- Create test checkout, pay with `4242 4242 4242 4242`, verify active subscription in dashboard
+- Run test suite: `vitest run billing.test.ts`
+
+notes:
+- Stripe API version: `2026-04-22.dahlia` (already configured in `stripe.ts`)
+- Webhook endpoint must be publicly accessible for Stripe to deliver — use ngrok for local dev
+- Store `stripeCustomerId` and `stripeSubscriptionId` on user/subscription records
+- Use `stripe-webhook` event type in database for audit trail
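The idempotency criterion in task 01 (duplicate webhook events must not create duplicate records) could be sketched as follows. This is a minimal illustration, not the project's implementation: `SubscriptionStore`, `StripeEventLike`, and the in-memory `Set`/`Map` are hypothetical stand-ins for the Drizzle-backed tables, and the event-to-status mapping is simplified (for example, a checkout that starts a trial would land in a trialing state rather than active).

```typescript
// Hypothetical types and names; a real handler would persist to the database.
type SubscriptionStatus = "active" | "past_due" | "canceled";

interface StripeEventLike {
  id: string;             // Stripe event ID, used as the idempotency key
  type: string;           // e.g. "invoice.payment_failed"
  subscriptionId: string;
}

// Simplified mapping from handled event types to the resulting status.
const STATUS_BY_EVENT: Record<string, SubscriptionStatus> = {
  "checkout.session.completed": "active",
  "invoice.payment_succeeded": "active",
  "invoice.payment_failed": "past_due",
  "customer.subscription.deleted": "canceled",
};

class SubscriptionStore {
  private processed = new Set<string>();
  readonly statuses = new Map<string, SubscriptionStatus>();
  readonly auditLog: string[] = [];

  // Returns true if the event changed state, false if it was a duplicate
  // delivery or an event type this sketch does not handle.
  apply(event: StripeEventLike): boolean {
    if (this.processed.has(event.id)) return false;
    this.processed.add(event.id);

    const next = STATUS_BY_EVENT[event.type];
    if (next === undefined) return false;

    this.statuses.set(event.subscriptionId, next);
    this.auditLog.push(`${event.id}:${event.type}`);
    return true;
  }
}
```

In a database-backed version, a unique constraint on the stored event ID would play the role of the `processed` set, so that a retried delivery fails the insert instead of repeating the state change.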
--- a/tasks/core-services-implementation/02-removebrokers-top-20.md
+++ b/tasks/core-services-implementation/02-removebrokers-top-20.md
@@ -0,0 +1,61 @@
+# 02. Automated Removal Engine for Top 20 Data Brokers
+
+meta:
+  id: core-services-02
+  feature: core-services-implementation
+  priority: P0
+  depends_on: [core-services-01]
+  tags: [removebrokers, automation, playwright, scraping, revenue]
+
+objective:
+- Replace the `submitAutomatedRemoval()` stub, which returns `crypto.randomUUID()`, with real Playwright-based browser automation that submits opt-out requests to the top 20 data brokers.
+
+deliverables:
+- Playwright-based removal engine in `removebrokers/removal.engine.ts`
+- Per-broker adapter modules for top 20 brokers (Spokeo, Whitepages, MyLife, BeenVerified, etc.)
+- CAPTCHA detection and graceful failure (manual fallback flow)
+- Removal request status tracking with actual polling
+- Email notification service integration for opt-out confirmations
+
+steps:
+1. Install Playwright: `npm install -D playwright @playwright/test`
+2. Analyze opt-out flows for top 20 brokers from existing registry data
+3. Create `removebrokers/adapters/` directory with one module per broker
+4. Implement base adapter interface: `scanForProfile`, `submitOptOut`, `verifyRemoval`, `getStatus`
+5. Implement adapters for each top 20 broker with navigation, form filling, and submission logic
+6. Add proxy rotation support (BrightData or similar) to avoid IP blocking
+7. Add stealth mode (playwright-stealth) to reduce detection
+8. Implement `submitAutomatedRemoval()` to select correct adapter by broker ID and execute
+9. Store actual request IDs from brokers (not generated UUIDs) in database
+10. Implement `trackRemovalStatus()` with periodic re-scans for submitted requests
+11. Integrate with notification service to email user when removal is confirmed
+12. Add job handler for batch removal processing queue
+13. Handle failures gracefully: retry with backoff, escalate to manual queue after 3 failures
+
+tests:
+- Unit: Mock Playwright browser, verify adapter navigation sequences
+- Integration: Run adapter against real broker site in headful mode, verify opt-out form submission
+- E2E: Full flow — add broker to watchlist → trigger removal → verify status progression
+
+acceptance_criteria:
+- [ ] Top 20 broker adapters are implemented and tested against live sites
+- [ ] `submitAutomatedRemoval()` no longer returns mock UUIDs — it submits real opt-out requests
+- [ ] Removal status tracks actual broker state (pending → submitted → completed/failed)
+- [ ] Failed removals are retried 3 times with exponential backoff, then escalated to manual queue
+- [ ] CAPTCHA challenges are detected and flagged for manual processing (not silently failing)
+- [ ] Job queue processes removals asynchronously without blocking API responses
+- [ ] User dashboard shows real removal progress per broker
+- [ ] All Playwright browsers are properly closed after each session (no resource leaks)
+
+validation:
+- Run `vitest run removebrokers.service.test.ts` — all tests pass
+- Manual test: Trigger removal for Spokeo, verify opt-out email received
+- Check database: `removal_requests` table has real request IDs and actual status values
+- Run removal job: `bun run job:removebrokers` processes queue without errors
+
+notes:
+- Broker sites change frequently — expect 15–25% of adapters to break per quarter
+- Some brokers require email verification sent to the listed email (often outdated) — flag these
+- Start with brokers that have simple form-based opt-outs; defer email/physical mail brokers to Phase 3
+- The existing broker registry in `broker.registry.ts` already has removal URLs — use these as starting points
+- Budget $1K–$3K/mo for proxy infrastructure at scale
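The failure-handling rule in task 02 (retry with exponential backoff, escalate to the manual queue after 3 failures) might look roughly like this. It is a sketch under assumptions: `submitWithBackoff`, `RemovalOutcome`, and the 1-second base delay are hypothetical, and the `submit` callback stands in for whatever a broker adapter's `submitOptOut` ends up doing.

```typescript
// Hypothetical result type: either a broker-issued request ID or an escalation.
type RemovalOutcome =
  | { status: "submitted"; requestId: string; attempts: number }
  | { status: "manual_queue"; attempts: number; lastError: string };

// Delay before the retry that follows a failed attempt (1-based):
// base, 2 * base, 4 * base, ...
function backoffDelayMs(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** (attempt - 1);
}

async function submitWithBackoff(
  submit: () => Promise<string>,
  maxAttempts = 3,
  baseDelayMs = 1000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<RemovalOutcome> {
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const requestId = await submit();
      return { status: "submitted", requestId, attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      // Wait only if another attempt will follow.
      if (attempt < maxAttempts) {
        await sleep(backoffDelayMs(attempt, baseDelayMs));
      }
    }
  }
  // All attempts failed: hand the request to the manual queue.
  return { status: "manual_queue", attempts: maxAttempts, lastError };
}
```

Passing `sleep` as a parameter keeps unit tests fast, since a test can substitute a no-op instead of waiting on real timers.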
--- a/tasks/core-services-implementation/03-darkwatch-hibp.md
+++ b/tasks/core-services-implementation/03-darkwatch-hibp.md
@@ -0,0 +1,63 @@
+# 03. HaveIBeenPwned API Integration for Email Breach Monitoring
+
+meta:
+  id: core-services-03
+  feature: core-services-implementation
+  priority: P0
+  depends_on: [core-services-01]
+  tags: [darkwatch, hibp, breach-monitoring, api-integration, table-stakes]
+
+objective:
+- Replace the stub `scanHIBP()` function in the DarkWatch scan engine with a real HaveIBeenPwned API integration that checks user emails against known breach databases and creates exposure records.
+
+deliverables:
+- HIBP API client with k-anonymity support for password checking
+- Email breach lookup with result parsing and normalization
+- Exposure record creation in database with proper severity scoring
+- Alert generation via existing alert pipeline
+- Circuit breaker integration (already exists in scan engine)
+
+steps:
+1. Obtain an HIBP API key at https://haveibeenpwned.com/API/Key (the breachedaccount endpoint requires a paid subscription; choose the tier based on the expected request rate)
+2. Add `HIBP_API_KEY` to `.env.example` and validate in `env.ts`
+3. Create `darkwatch/hibp.client.ts` with functions:
+   - `checkEmail(email): BreachResult[]` — query breachedaccount endpoint
+   - `checkPassword(passwordHash): PwnedPasswordResult` — query pwnedpasswords endpoint using k-anonymity
+   - `getBreaches(): Breach[]` — fetch breach metadata for caching
+4. Parse HIBP response: breach name, date, compromised data types, affected accounts
+5. Map data types to internal schema: email, password, phone, address, ssn, domain
+6. Calculate severity: critical if SSN/credit card, warning if email/phone, info if username only
+7. Deduplicate against existing exposures using `identifierHash` (already implemented)
+8. Create exposure records via existing `processExposure()` pipeline
+9. Cache breach metadata in Redis (update daily) to reduce API calls
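+10. Handle rate limits: the allowed request rate depends on the HIBP subscription tier, so make it configurable, implement a request queue, and honor the `retry-after` header on 429 responses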
+11. Add comprehensive error handling for 404 (no breach), 429 (rate limit), 503 (service unavailable)
+
+tests:
+- Unit: Mock HIBP API responses, verify parsing and severity scoring
+- Integration: Test with real HIBP API using test email `test@example.com` (no breaches expected)
+- E2E: Add email to watchlist → trigger scan → verify exposure records created for breached email
+
+acceptance_criteria:
+- [ ] `scanHIBP(email)` makes real HTTP request to `https://haveibeenpwned.com/api/v3/breachedaccount/{email}`
+- [ ] Breached emails create exposure records with correct breach metadata (name, date, data classes)
+- [ ] Non-breached emails return empty results without creating false exposure records
+- [ ] Rate limits are respected (configurable to match the HIBP subscription tier)
+- [ ] 404 responses are handled gracefully (no breach = no exposure, not an error)
+- [ ] Circuit breaker opens after 3 consecutive failures and stays open for 60 seconds
+- [ ] Exposure deduplication prevents duplicate records for same email + breach combination
+- [ ] Alerts are generated for critical exposures (SSN, password) via existing pipeline
+- [ ] HIBP breach metadata is cached in Redis and refreshed daily
+
+validation:
+- Run `vitest run darkwatch.test.ts` — all tests pass
+- Manual: Add known breached email to watchlist, trigger scan, verify alert received
+- Check Redis: `GET hibp:breaches` returns cached breach metadata
+- Monitor logs: No `"not yet implemented"` or `console.log("[darkwatch] stub")` messages
+
+notes:
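+- HIBP has no free tier for email search: the breachedaccount endpoint requires a paid API key, and the rate limit depends on the subscription level. The Pwned Passwords range endpoint is free and needs no key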
+- The k-anonymity password check sends only first 5 chars of SHA-1 hash — already privacy-safe
+- The existing `scan.engine.ts` has the circuit breaker infrastructure — wire HIBP client into it
+- HIBP does NOT crawl dark web — it only aggregates known public breaches. For live dark web monitoring, add Breachsense later (Phase 3)
+- Consider subscribing to HIBP domain monitoring for enterprise upsell later
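The k-anonymity password check mentioned in task 03 can be illustrated with two small pure functions. This is a sketch, assuming the caller already holds the SHA-1 hex digest (matching the `checkPassword(passwordHash)` signature above); `splitSha1` and `countInRangeResponse` are hypothetical helper names, and the HTTP call to `https://api.pwnedpasswords.com/range/{prefix}` is left out so the example stays self-contained.

```typescript
// Split a SHA-1 hex digest into the 5-character prefix that is sent to the
// range endpoint and the 35-character suffix that never leaves the process.
function splitSha1(sha1Hex: string): { prefix: string; suffix: string } {
  const hash = sha1Hex.trim().toUpperCase();
  if (!/^[0-9A-F]{40}$/.test(hash)) {
    throw new Error("expected a 40-character SHA-1 hex digest");
  }
  return { prefix: hash.slice(0, 5), suffix: hash.slice(5) };
}

// The range endpoint answers with one "SUFFIX:COUNT" pair per line.
// Returns how often the suffix was seen, or 0 if it is absent.
function countInRangeResponse(body: string, suffix: string): number {
  for (const line of body.split(/\r?\n/)) {
    const [candidate, count] = line.trim().split(":");
    if (candidate === suffix) {
      return Number.parseInt(count ?? "0", 10);
    }
  }
  return 0;
}
```

A caller would request the range for `prefix`, then pass the response body and `suffix` to `countInRangeResponse`; a result above zero means the password appears in known breach data.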
--- a/tasks/core-services-implementation/04-darkwatch-attack-surface.md
+++ b/tasks/core-services-implementation/04-darkwatch-attack-surface.md
@@ -0,0 +1,75 @@
+# 04. SecurityTrails, Censys, and Shodan API Integrations
+
+meta:
+  id: core-services-04
+  feature: core-services-implementation
+  priority: P1
+  depends_on: [core-services-03]
+  tags: [darkwatch, securitytrails, censys, shodan, attack-surface, api-integration]
+
+objective:
+- Integrate SecurityTrails, Censys, and Shodan APIs into the DarkWatch scan engine to monitor domain/IP attack surface exposure, complementing HIBP's breach monitoring.
+
+deliverables:
+- SecurityTrails client for DNS/WHOIS monitoring and subdomain enumeration
+- Censys client for internet-wide host scanning and certificate transparency
+- Shodan client for IoT/device exposure and Tor exit node monitoring
+- Unified exposure normalization from all three sources
+- Cost-aware scanning (respect rate limits, cache aggressively)
+
+steps:
+1. Sign up for API keys:
+   - SecurityTrails: https://securitytrails.com (free: 50 req/mo, Pro: $49/mo)
+   - Censys: https://censys.io (free: 250 req/mo, Pro: $79/mo)
+   - Shodan: https://shodan.io (free: 1,250 results/mo, Small Biz: $299/mo)
+2. Add `SECURITYTRAILS_API_KEY`, `CENSYS_API_ID`, `CENSYS_API_SECRET`, `SHODAN_API_KEY` to `.env.example`
+3. Create `darkwatch/securitytrails.client.ts`:
+   - `getDomainInfo(domain)` — WHOIS, DNS records, subdomains
+   - `getSubdomains(domain)` — enumerate all subdomains
+   - `getHistory(domain)` — historical DNS changes
+4. Create `darkwatch/censys.client.ts`:
+   - `searchHosts(query)` — find exposed hosts by IP/domain
+   - `getCertificates(domain)` — certificate transparency logs
+   - `viewHost(ip)` — detailed host fingerprinting
+5. Create `darkwatch/shodan.client.ts`:
+   - `search(query)` — search exposed devices and services
+   - `host(ip)` — detailed host information
+   - `count(query)` — result counts for monitoring
+6. Implement unified `processScanResult(source, result)` that normalizes all API responses to internal exposure schema
+7. Map exposure types:
+   - SecurityTrails: subdomain exposure, DNS misconfiguration, domain hijacking risk
+   - Censys: exposed services, outdated TLS, certificate issues
+   - Shodan: open ports, default credentials, IoT exposure, Tor association
+8. Add tier-aware scan limits: Shield = HIBP only, Guard+ = all sources
+9. Implement intelligent caching: cache SecurityTrails DNS data for 24h, Censys/Shodan for 7d
+10. Add cost-per-scan tracking in database for billing/usage analytics
+
+tests:
+- Unit: Mock all three API responses, verify normalization and exposure creation
+- Integration: Test each client against real APIs using low-risk test queries
+- E2E: Add domain to watchlist → trigger scan → verify exposures from all three sources
+
+acceptance_criteria:
+- [ ] SecurityTrails client queries real API and returns parsed domain/subdomain data
+- [ ] Censys client queries real API and returns host/certificate information
+- [ ] Shodan client queries real API and returns device/service exposure data
+- [ ] Each client respects rate limits (SecurityTrails: 10 req/sec, Censys: 200 req/min, Shodan: 5 req/sec)
+- [ ] Circuit breakers open after 3 failures and reset after 60 seconds for each source
+- [ ] Exposure records are normalized regardless of source (consistent schema)
+- [ ] Alerts are generated for critical findings (open admin panels, exposed databases, certificate expiry)
+- [ ] Cache hit reduces API calls — verify Redis stores and returns cached data
+- [ ] Cost tracking records API usage per scan for later billing optimization
+- [ ] Shield tier users only get HIBP; Guard and higher tiers unlock SecurityTrails, Censys, Shodan
+
+validation:
+- Run `vitest run darkwatch.test.ts` — all tests pass
+- Manual: Query `example.com` across all three APIs, verify meaningful results returned
+- Check Redis: Cached responses reduce subsequent API calls
+- Monitor cost: API call counts tracked in database
+
+notes:
+- SecurityTrails is most useful for domain monitoring; Censys/Shodan for IP/host exposure
+- Shodan's dark web relevance is limited — it sees Tor exit nodes, not .onion content. Consider DarkOwl ($40K+/yr) for deep dark web later
+- The free tiers are sufficient for development but production needs paid plans ($500–$1,000/mo combined)
+- Focus on actionable findings: exposed RDP, default credentials, certificate expiry — not just raw port scans
+- The existing scan engine in `darkwatch.service.ts` already routes by watchlist item type — wire in new clients there
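Steps 8 and 9 of task 04 (tier-aware scan limits and per-source cache lifetimes) amount to a small lookup table. The sketch below assumes hypothetical identifiers (`PlanTier`, `ScanSource`, `sourcesForTier`, `cacheKey`) and a made-up Redis key layout; only the tier rule (Shield = HIBP only, Guard and above = all sources) and the TTL values (24h for SecurityTrails, 7d for Censys/Shodan, daily refresh for HIBP metadata) come from the task text.

```typescript
// Plan tiers as listed in task 01; the lowercase identifiers are an assumption.
type PlanTier = "shield" | "guard" | "fortress" | "family_fortress";
type ScanSource = "hibp" | "securitytrails" | "censys" | "shodan";

const ALL_SOURCES: readonly ScanSource[] = ["hibp", "securitytrails", "censys", "shodan"];

// Shield gets HIBP only; every higher tier gets the full source list.
function sourcesForTier(tier: PlanTier): ScanSource[] {
  return tier === "shield" ? ["hibp"] : [...ALL_SOURCES];
}

const HOUR_SECONDS = 60 * 60;

// Cache lifetimes in seconds, suitable for a Redis EX argument.
const CACHE_TTL_SECONDS: Record<ScanSource, number> = {
  hibp: 24 * HOUR_SECONDS,
  securitytrails: 24 * HOUR_SECONDS,
  censys: 7 * 24 * HOUR_SECONDS,
  shodan: 7 * 24 * HOUR_SECONDS,
};

// Hypothetical key layout: one cache entry per source and scan target.
function cacheKey(source: ScanSource, target: string): string {
  return `darkwatch:${source}:${target.toLowerCase()}`;
}
```

Keeping the rule in one table means a later change to tier entitlements or cache lifetimes touches a single place rather than each client.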
--- a/tasks/core-services-implementation/05-darkwatch-scheduler.md
+++ b/tasks/core-services-implementation/05-darkwatch-scheduler.md
@@ -0,0 +1,72 @@
+# 05. Periodic Scan Scheduling, WebSocket Progress, and Alert Deduplication
+
+meta:
+  id: core-services-05
+  feature: core-services-implementation
+  priority: P1
+  depends_on: [core-services-03, core-services-04]
+  tags: [darkwatch, scheduler, websocket, real-time, deduplication, alerts]
+
+objective:
+- Make DarkWatch continuously useful by scheduling periodic scans, providing real-time progress via WebSocket, and eliminating alert fatigue through intelligent deduplication.
+
+deliverables:
+- Cron-based scan scheduler with configurable frequency per tier
+- WebSocket real-time scan progress updates (already have `websocket.ts`)
+- Alert cooldown periods to prevent duplicate notifications
+- Digest mode: batch low-priority alerts into daily/weekly summaries
+- Scan history and metrics dashboard data
+
+steps:
+1. Implement cron job scheduler in `jobs/handlers/darkwatch.scan.ts`:
+   - Daily scans for active subscriptions
+   - Respects tier limits (Shield = HIBP only daily, Guard+ = full suite weekly)
+2. Add `scanFrequency` field to subscription schema (daily, weekly, monthly)
+3. Wire WebSocket push from existing `websocket.ts` into scan engine:
+   - Emit `scan:started`, `scan:progress` (completedSources/totalSources), `scan:completed` events
+   - Client dashboard subscribes to user-specific scan events
+4. Enhance alert deduplication beyond existing exposure dedup:
+   - Add `alertCooldownHours` per alert type (e.g., 24h for same breach, 72h for property changes)
+   - Track lastAlertSentAt per (userId, alertType, source) tuple
+   - Don't create new alerts during cooldown unless severity increases
+5. Implement digest mode:
+   - Low-priority alerts (info) batched into daily digest email
+   - Warning/critical alerts sent immediately via push + email
+   - User preference: immediate vs. digest per severity level
+6. Add scan metrics:
+   - Store scan duration, sources checked, exposures found, alerts generated
+   - Aggregate for dashboard "threat score" calculation
+7. Implement scan failure recovery:
+   - Partial scan results saved even if one source fails
+   - Failed sources retried individually in next scan window
+8. Add rate limit per user: max 1 concurrent scan, queue subsequent requests
+
+tests:
+- Unit: Verify cron expression parsing, cooldown logic, digest batching
+- Integration: Trigger scheduled scan, verify WebSocket events emitted in correct order
+- E2E: Start scan from dashboard → watch progress bar → receive completion notification
+
+acceptance_criteria:
+- [ ] Scans run automatically on schedule without manual trigger (cron job)
+- [ ] WebSocket pushes real-time progress: `scan:progress` events with percentage complete
+- [ ] Only one scan runs per user at a time; additional requests are queued
+- [ ] Duplicate alerts are suppressed during cooldown period (configurable per type)
+- [ ] Info-level alerts are batched into daily digest; warning/critical sent immediately
+- [ ] Scan history is persisted and visible in dashboard (last scan date, sources checked, findings)
+- [ ] Failed sources don't fail entire scan — partial results are saved
+- [ ] Dashboard threat score updates automatically after each scan completion
+- [ ] Free tier gets weekly scans; paid tiers get daily scans
+- [ ] No duplicate notifications for same exposure across multiple scans
+
+validation:
+- Run cron job manually: `bun run job:darkwatch:scan`, verify scan completes and exposures created
+- Connect to WebSocket: `wscat -c ws://localhost:3000/ws`, subscribe to scan events
+- Check dashboard: Scan progress bar animates during active scan, threat score updates after
+- Test cooldown: Trigger same scan twice rapidly, verify second scan doesn't create duplicate alerts
+
+notes:
+- The existing `scanStates` Map in `darkwatch.service.ts` is in-memory — move to Redis for multi-instance safety
+- WebSocket infrastructure exists at `websocket.ts` — extend it for scan-specific events
+- The scheduler directory (`scheduler/`) currently only has Dockerfiles — this task creates actual job logic
+- Consider using Honker (Rust queue) for scan job distribution once it's production-ready
+- Alert fatigue is a real churn driver — aggressive deduplication is a competitive advantage
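The cooldown rule from step 4 of task 05 (track the last alert per `(userId, alertType, source)` tuple and suppress repeats unless severity increases) could be sketched like this. `AlertCooldown` and its in-memory `Map` are hypothetical; the notes above suggest Redis for shared state, and the 24-hour default is an assumption taken from the example in the step.

```typescript
type Severity = "info" | "warning" | "critical";

const SEVERITY_RANK: Record<Severity, number> = { info: 0, warning: 1, critical: 2 };

interface AlertKey {
  userId: string;
  alertType: string;
  source: string;
}

interface LastAlert {
  sentAtMs: number;
  severity: Severity;
}

class AlertCooldown {
  private last = new Map<string, LastAlert>();
  private cooldownHours: Record<string, number>;
  private defaultHours: number;

  constructor(cooldownHours: Record<string, number>, defaultHours = 24) {
    this.cooldownHours = cooldownHours;
    this.defaultHours = defaultHours;
  }

  // Returns true and records the alert if it should be sent now.
  // A repeat inside the cooldown window is suppressed unless it is
  // more severe than the alert that opened the window.
  shouldSend(key: AlertKey, severity: Severity, nowMs: number): boolean {
    const id = `${key.userId}|${key.alertType}|${key.source}`;
    const prev = this.last.get(id);
    const hours = this.cooldownHours[key.alertType] ?? this.defaultHours;
    const windowMs = hours * 3_600_000;

    const inCooldown = prev !== undefined && nowMs - prev.sentAtMs < windowMs;
    const escalated =
      prev !== undefined && SEVERITY_RANK[severity] > SEVERITY_RANK[prev.severity];

    if (inCooldown && !escalated) return false;
    this.last.set(id, { sentAtMs: nowMs, severity });
    return true;
  }
}
```

Taking `nowMs` as an argument instead of reading the clock makes the cooldown logic straightforward to unit test.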
--- a/tasks/core-services-implementation/06-spamshield-reputation.md
+++ b/tasks/core-services-implementation/06-spamshield-reputation.md
@@ -0,0 +1,70 @@
+# 06. Twilio Lookup and Phone Reputation API Integration
+
+meta:
+  id: core-services-06
+  feature: core-services-implementation
+  priority: P1
+  depends_on: [core-services-01]
+  tags: [spamshield, reputation, twilio, caller-id, api-integration, table-stakes]
+
+objective:
+- Replace the stub Hiya/Truecaller lookup functions that return `{ score: 0, isSpam: false }` with real phone reputation API integrations (Twilio Lookup) and integrate results into the spam classification pipeline.
+
+deliverables:
+- Twilio Lookup API client for caller name, line type, and carrier info
+- Phone reputation scoring system with caching
+- Integration with existing rule engine (reputation score augments rule-based decisions)
+- STIR/SHAKEN attestation verification (if carrier partnership available)
+- Rate-limited, cost-aware API usage
+
+steps:
+1. Sign up for Twilio account and enable Lookup API at https://www.twilio.com/lookup
+2. Add `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN` to `.env.example`
+3. Create `spamshield/twilio.client.ts`:
+   - `lookupPhone(phoneNumber, type?)` — caller name, line type (mobile/landline/VoIP), carrier
+   - `lookupReputation(phoneNumber)` — spam risk score, call volume, report counts
+   - `verifyStirShaken(phoneNumber)` — attestation level (A/B/C) if available
+4. Replace stub `lookupHiya()` and `lookupTruecaller()` in `reputation.api.ts` with real Twilio calls
+5. Implement reputation scoring algorithm:
+   - Twilio spam risk score (0–100) mapped to internal confidence (0.0–1.0)
+   - Line type weighting: VoIP = higher risk, landline = lower risk
+   - Carrier reputation: known spam carriers = +20 risk
+   - STIR/SHAKEN attestation: Full attestation (A) = -30 risk, Gateway attestation (C) = +20 risk
+6. Cache results in Redis with 24h TTL (phone numbers don't change reputation rapidly)
+7. Wire into `spamshield.service.ts`:
+   - Before rule engine, check reputation
+   - If reputation confidence > 0.7, block immediately
+   - If reputation confidence 0.4–0.7, flag for review
+   - If reputation confidence < 0.4, proceed to rule engine + ML classifier
+8. Add cost tracking: $0.004–$0.03 per lookup, track monthly usage per user
+9. Implement fallback: if Twilio API fails, use internal rule engine only (graceful degradation)
+
+tests:
+- Unit: Mock Twilio API responses, verify reputation scoring algorithm
+- Integration: Test with real Twilio Lookup API using known spam number
+- E2E: Submit spam check for phone number → verify reputation lookup → get classification result
+
+acceptance_criteria:
+- [ ] `lookupPhone()` makes real HTTP request to Twilio Lookup API
+- [ ] Reputation scores are calculated from real Twilio data (not hardcoded zeros)
+- [ ] High-risk numbers (spam confidence > 0.7) trigger automatic block without rule/ML processing
+- [ ] Cache stores reputation results for 24 hours, reducing API costs
+- [ ] Twilio API failures gracefully fall back to rule engine (no crashes)
+- [ ] Cost tracking records each lookup for billing analytics
+- [ ] STIR/SHAKEN attestation is checked and factored into score when available
+- [ ] VoIP lines get +20 risk weighting compared to landline
+- [ ] Internal DB cache (`lookupInternalDB`) is checked before Twilio API call
+- [ ] Rate limits: max 100 lookups/minute per user to prevent abuse
+
+validation:
+- Run `vitest run spamshield.service.test.ts` — all tests pass
+- Manual: Check reputation for known spam number (e.g., reported robocall number), verify high score
+- Check cache: Redis `GET spamshield:reputation:+15551234567` returns cached result
+- Monitor cost: Database shows lookup usage per user per month
+
+notes:
+- Twilio Lookup costs $0.004 per basic lookup, $0.03 per advanced lookup (reputation, caller name)
+- At 100 lookups/user/month, cost is $0.40–$3.00 per user — manageable at $12+/mo ARPU
+- Hiya and Truecaller have proprietary APIs but require carrier partnerships — Twilio is the best consumer-accessible option
+- STIR/SHAKEN requires telecom partner for full attestation data — implement if/when partnership exists
+- The existing rule engine (`ruleEngine()`) is functional — reputation augments it, doesn't replace it
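The scoring and routing rules in steps 5 and 7 of task 06 can be written down as two pure functions. This is a sketch under assumptions: the input shape, the function names, and the choice to apply the +20 VoIP weighting only to VoIP lines (treating mobile and landline as neutral) are hypothetical; the adjustments and the 0.4/0.7 thresholds are taken from the task text.

```typescript
type LineType = "mobile" | "landline" | "voip";
type Attestation = "A" | "B" | "C";
type Decision = "block" | "review" | "rule_engine";

// Hypothetical input shape; field names are not from the Twilio API.
interface ReputationInput {
  spamRiskScore: number;      // 0-100 score from the reputation provider
  lineType: LineType;
  knownSpamCarrier: boolean;
  attestation?: Attestation;  // absent when no STIR/SHAKEN data is available
}

// Combine the signals into a 0.0-1.0 confidence that the number is spam.
function reputationConfidence(input: ReputationInput): number {
  let risk = input.spamRiskScore;
  if (input.lineType === "voip") risk += 20;
  if (input.knownSpamCarrier) risk += 20;
  if (input.attestation === "A") risk -= 30;
  if (input.attestation === "C") risk += 20;
  // Clamp to the 0-100 range before scaling.
  return Math.min(100, Math.max(0, risk)) / 100;
}

// Route according to the thresholds in step 7.
function decide(confidence: number): Decision {
  if (confidence > 0.7) return "block";
  if (confidence >= 0.4) return "review";
  return "rule_engine";
}
```

For instance, a VoIP number with a provider score of 60 and gateway attestation clamps to a confidence of 1.0 and is blocked, while a landline at 50 with full attestation drops to 0.2 and proceeds to the rule engine.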
--- a/tasks/core-services-implementation/07-spamshield-ml-classifier.md
+++ b/tasks/core-services-implementation/07-spamshield-ml-classifier.md
@@ -0,0 +1,84 @@
+# 07. Fine-Tuned DistilBERT SMS Spam Classifier with ONNX Deployment
+
+meta:
+  id: core-services-07
+  feature: core-services-implementation
+  priority: P1
+  depends_on: [core-services-06]
+  tags: [spamshield, ml, nlp, distilbert, onnx, text-classification]
+
+objective:
+- Replace the stub `classifyTextBERT()` function that returns `{ isSpam: false, confidence: 1.0 }` with a production ML pipeline: fine-tune DistilBERT on SMS spam data, export to ONNX for fast inference, and integrate into the spam classification flow.
+
+deliverables:
+- Training pipeline for fine-tuning DistilBERT on SMS spam dataset
+- ONNX-exported model for low-latency CPU inference (~50ms per message)
+- Inference server with batching and caching
+- Integration with existing spam classification service
+- Model versioning and A/B testing framework
+
+steps:
+1. Set up Python training environment:
+   - Install `transformers`, `datasets`, `onnxruntime`, `torch`, `optimum[onnxruntime]`
+   - Create `ml/spam-classifier/` directory in project root
+2. Acquire training data:
+   - SMS Spam Collection Dataset (UCI ML Repository, 5,574 messages)
+   - Enron Spam Dataset (email corpus, filter to SMS-like short messages)
+   - Custom labeled data from user feedback (Phase 2)
+3. Fine-tune DistilBERT-base-uncased:
+   - Binary classification: spam vs. ham
+   - 3 epochs, batch size 32, learning rate 2e-5
+   - Expected accuracy: 97–99% on SMS Spam Collection
+4. Export to ONNX:
+   - Use Optimum CLI: `optimum-cli export onnx --model distilbert-spam ./onnx_model/`
+   - Quantize to INT8 for 2x speedup with minimal accuracy loss
+   - Target model size: roughly 260MB (DistilBERT base, FP32), roughly 65MB (INT8)
+5. Create Node.js ONNX inference wrapper:
+   - Install `onnxruntime-node`
+   - Load model once at startup, reuse session
+   - Preprocess: tokenize with DistilBERT tokenizer (max length 128)
+   - Postprocess: softmax over the two class logits (or sigmoid if the head emits a single logit) → spam probability → binary decision
+   - Target latency: <50ms per message on CPU, <10ms on GPU
+6. Integrate into `spamshield.service.ts`:
+   - Replace `classifyTextBERT()` call with real ONNX inference
+   - Classification flow: reputation lookup → rule engine → ML classifier (ensemble)
+   - Threshold tuning: default 0.5, adjustable per user preference
+7. Implement feedback loop:
+   - User can report false positive/negative
+   - Store feedback in `spamFeedback` table (already exists)
+   - Weekly retraining batch using accumulated feedback
+8. Add model versioning:
+   - Store model artifact in S3-compatible storage
+   - A/B test new models on subset of traffic
+   - Rollback capability if accuracy degrades
+
+tests:
+- Unit: Verify ONNX inference produces correct labels for known spam/ham test cases
+- Integration: End-to-end classification flow with real model loading
+- E2E: Submit SMS text → receive classification with confidence score
+
+acceptance_criteria:
+- [ ] `classifyTextBERT()` runs real ONNX inference (not returning hardcoded `{ isSpam: false }`)
+- [ ] Model accuracy > 95% on held-out test set from SMS Spam Collection
+- [ ] Inference latency < 50ms per message on CPU (measured in production)
+- [ ] Model file is versioned and loadable from external storage (S3/local path)
+- [ ] False positive rate < 2% (legitimate messages incorrectly flagged as spam)
+- [ ] User feedback ("not spam" / "spam") is stored and used for model improvement
+- [ ] Classification threshold is configurable per user (strict/moderate/lenient)
+- [ ] ONNX model loads once at server startup, not per-request
+- [ ] Graceful fallback to rule engine if ONNX runtime fails
+- [ ] Model size < 100MB for reasonable cold-start time
+
+validation:
+- Run `vitest run spamshield.service.test.ts` — tests use real ONNX model
+- Benchmark: `bun run benchmark:spamshield` — measure 1000 inferences, report p50/p95/p99 latency
+- Manual: Classify known spam message "Congratulations! You've won $1000...", verify `isSpam: true, confidence > 0.9`
+- Check feedback: Database `spamFeedback` table accumulates user corrections
+
+notes:
+- DistilBERT is chosen over BERT for 40% smaller size and 60% faster inference with minimal accuracy loss
+- ONNX Runtime Node.js has limited platform support — test on your deployment target (Linux x64, macOS ARM)
+- Training can happen in CI (GitHub Actions with GPU runner) or locally — inference happens in production
+- Consider TensorFlow Lite or ONNX Runtime Web for on-device mobile inference later
+- The SMS Spam Collection is small (5,574 messages) — augment with synthetic spam variants for robustness
+- For European languages, consider multilingual model like `distilbert-base-multilingual-cased`
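The postprocessing step in task 07 (logits to probability to binary decision, with a per-user threshold) might be sketched as below. Several things are assumptions: that the exported model emits two logits in `[ham, spam]` order, and that the strict/moderate/lenient presets map to 0.3/0.5/0.7 (only the 0.5 default appears in the task text). Tokenization and the `onnxruntime-node` session call are omitted so the example stays self-contained.

```typescript
// Numerically stable two-class softmax; returns P(spam).
// Assumes logits are ordered [ham, spam].
function spamProbability(logits: readonly [number, number]): number {
  const [ham, spam] = logits;
  const max = Math.max(ham, spam);
  const expHam = Math.exp(ham - max);
  const expSpam = Math.exp(spam - max);
  return expSpam / (expHam + expSpam);
}

// Hypothetical presets: a lower threshold flags more messages as spam.
const THRESHOLDS = { strict: 0.3, moderate: 0.5, lenient: 0.7 } as const;
type Sensitivity = keyof typeof THRESHOLDS;

interface Classification {
  isSpam: boolean;
  spamProbability: number;
}

function classify(
  logits: readonly [number, number],
  sensitivity: Sensitivity = "moderate",
): Classification {
  const p = spamProbability(logits);
  return { isSpam: p >= THRESHOLDS[sensitivity], spamProbability: p };
}
```

For two classes, this softmax equals a sigmoid applied to the difference `spam - ham`, which is why either description of the postprocessing step leads to the same probability.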
--- a/tasks/core-services-implementation/08-removebrokers-50-plus.md
+++ b/tasks/core-services-implementation/08-removebrokers-50-plus.md
@@ -0,0 +1,79 @@
+# 08. Expand Broker Coverage to 50+ with CAPTCHA Solving and Re-Scan Pipeline
+
+meta:
+  id: core-services-08
+  feature: core-services-implementation
+  priority: P2
+  depends_on: [core-services-02]
+  tags: [removebrokers, automation, captcha, scaling, maintenance]
+
+objective:
+- Scale from top 20 brokers to 50+ automated removals, implement CAPTCHA solving, and build the re-scan pipeline that detects re-listings.
+
+deliverables:
+- 30+ additional broker adapters (total 50+)
+- CAPTCHA solving integration (2Captcha or AntiCaptcha API)
+- Re-scan scheduler that checks if removed profiles have reappeared
+- Email verification handling for opt-out confirmation emails
+- Removal success rate dashboard metric
+
+steps:
+1. Select next 30 brokers from registry by opt-out complexity (medium-difficulty form-based flows)
+2. Create adapter modules for each broker in `removebrokers/adapters/`
+3. Implement CAPTCHA solving:
+   - Detect reCAPTCHA v2/v3, hCaptcha, image challenges
+   - Integrate 2Captcha API ($0.001–$0.01 per solve)
+   - Add `CAPTCHA_SOLVER_API_KEY` to environment config
+   - Fallback to manual queue if CAPTCHA solving fails 3 times
+4. Implement email verification handling:
+   - Monitor mailbox for opt-out confirmation emails
+   - Parse confirmation links and auto-click them
+   - Store confirmation status in database
+5. Build re-scan pipeline:
+   - Weekly scheduled job that re-scans all "completed" removals
+   - If profile reappears, create new removal request automatically
+   - Track re-listing rate per broker (some re-list every 30 days)
+6. Add success metrics:
+   - Track removal success rate per broker (% of opt-outs that stick)
+   - Dashboard widget showing "X of Y brokers removed"
+   - Alert user when re-listing detected
+7. Implement proxy rotation pool:
+   - Use residential proxy service (BrightData, IPRoyal)
+   - Rotate IP per broker session to avoid blocks
+   - Budget $1K–$3K/mo for proxy infrastructure
+8. Add adapter health monitoring:
+   - Track adapter breakage rate
+   - Alert engineering when >5% of adapters fail in 24h
+   - Auto-disable broken adapters, queue for manual fix
+
+tests:
+- Unit: Mock CAPTCHA solver, verify retry and fallback logic
+- Integration: Test CAPTCHA solving against real broker site
+- E2E: Complete removal for broker with CAPTCHA → verify re-scan detects re-listing
+
+acceptance_criteria:
+- [ ] 50+ broker adapters implemented and tested
+- [ ] CAPTCHA challenges are detected and solved automatically (2Captcha integration)
+- [ ] Failed CAPTCHA solving escalates to manual queue after 3 attempts
+- [ ] Email confirmation links are parsed and clicked automatically
+- [ ] Re-scan job runs weekly and detects re-listings within 7 days
+- [ ] Re-listed profiles trigger automatic new removal requests
+- [ ] Dashboard shows accurate removal progress: "47 of 50 brokers completed"
+- [ ] Per-broker success rate is tracked and visible in admin panel
+- [ ] Proxy rotation prevents IP blocking on high-volume brokers
+- [ ] Adapter breakage is detected within 24 hours and auto-disabled
+- [ ] Monthly proxy + CAPTCHA cost per user < $4 (within gross margin target)
+
+validation:
+- Run `vitest run removebrokers.service.test.ts` — extended tests for 50 brokers
+- Manual: Test CAPTCHA broker (e.g., MyLife), verify automatic solving works
+- Check re-scan: Run `bun run job:removebrokers:rescan`, verify re-listings detected
+- Monitor costs: Dashboard shows monthly proxy/CAPTCHA spend per customer
+
+notes:
+- Broker sites change frequently — budget 20% engineering time for adapter maintenance
+- Some brokers (Acxiom, Epsilon) require physical mail — flag these for manual processing
+- Re-listing is common — data brokers rebuild databases from public records every 30–90 days
+- Consider AI-assisted form field detection (GPT-4 Vision) to reduce per-adapter development time
+- The existing `broker.registry.ts` already has 100+ entries — prioritize by traffic/popularity
+- Success rate target: 80%+ for automated removals, 90%+ with manual fallback
--- a/tasks/core-services-implementation/09-hometitle-attom-api.md
+++ b/tasks/core-services-implementation/09-hometitle-attom-api.md
@@ -0,0 +1,74 @@
+# 09. Attom Data Solutions API for Property Record Snapshots
+
+meta:
+  id: core-services-09
+  feature: core-services-implementation
+  priority: P2
+  depends_on: [core-services-01]
+  tags: [hometitle, attom, property-records, api-integration, real-estate]
+
+objective:
+- Replace the `fetchCountyRecords()` stub that returns `{ ownerName: "Unknown Owner" }` with a real property data API integration using Attom Data Solutions, enabling actual property snapshot and change detection.
+
+deliverables:
+- Attom API client for property search, owner info, and tax/assessment data
+- Property snapshot creation and storage in database
+- Change detection pipeline wired to real data (your detector logic already works)
+- Alert generation for ownership changes, liens, and tax status changes
+
+steps:
+1. Sign up for Attom Data API at https://attomdata.com (pricing: ~$0.05–$0.10/record, enterprise plans available)
+2. Add `ATTOM_API_KEY` to `.env.example` and validate in `env.ts`
+3. Create `hometitle/attom.client.ts`:
+   - `searchProperty(address)` — find property by address, return parcel ID and metadata
+   - `getPropertyProfile(parcelId)` — full property record: owner, deed date, tax info, liens
+   - `getPropertyHistory(parcelId)` — historical ownership and transaction records
+   - `getTaxInfo(parcelId)` — tax amount, delinquency status, exemptions
+4. Replace `fetchCountyRecords()` in `scanner.ts` with Attom API call:
+   - Use geocoding result (Google Maps API, already works) to get normalized address
+   - Query Attom by address → get parcel ID → fetch full property profile
+   - Parse response into `CountyRecord` / `SnapshotData` schema
+5. Implement snapshot storage:
+   - Store initial snapshot in `propertySnapshots` table
+   - On re-scan, fetch new snapshot → compare with last → detect changes
+6. Wire change detection (your `change.detector.ts` is already implemented):
+   - `ownership_transfer`: owner name changed → critical alert
+   - `lien_filing`: lien count increased → warning/critical alert
+   - `tax_change`: tax amount changed → info alert
+   - `deed_change`: deed date changed → critical alert
+7. Implement tier limits:
+   - Guard: 1 property monitored
+   - Fortress: 3 properties monitored
+   - Family: 5 properties monitored
+8. Add cost tracking: ~$0.05–$0.10 per property lookup, track per-user usage
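The four change types listed in step 6 can be illustrated with a small snapshot comparison. This is only a sketch for reasoning about the wiring: the existing `change.detector.ts` is the source of truth and may work differently. The `SnapshotSketch` fields and the case-insensitive owner comparison are assumptions, and severity is deliberately left to the existing `severityForChange()` logic.

```typescript
type ChangeType = "ownership_transfer" | "lien_filing" | "tax_change" | "deed_change";

// Hypothetical snapshot fields; the real `SnapshotData` schema may differ.
interface SnapshotSketch {
  ownerName: string;
  deedDate: string; // ISO date string, e.g. "2021-03-14"
  taxAmount: number;
  lienCount: number;
}

// Returns the change types detected between two snapshots of the same property.
function diffSnapshots(prev: SnapshotSketch, next: SnapshotSketch): ChangeType[] {
  const changes: ChangeType[] = [];
  // Assumption: owner names are compared ignoring case and surrounding whitespace.
  if (prev.ownerName.trim().toLowerCase() !== next.ownerName.trim().toLowerCase()) {
    changes.push("ownership_transfer");
  }
  // Only an increase in lien count is treated as a new filing.
  if (next.lienCount > prev.lienCount) changes.push("lien_filing");
  if (next.taxAmount !== prev.taxAmount) changes.push("tax_change");
  if (next.deedDate !== prev.deedDate) changes.push("deed_change");
  return changes;
}
```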
+
+tests:
+- Unit: Mock Attom API responses, verify parsing and snapshot creation
+- Integration: Test with real Attom API using known property address
+- E2E: Add property to watchlist → trigger scan → verify snapshot created → simulate change → verify alert
+
+acceptance_criteria:
+- [ ] `fetchCountyRecords()` makes real HTTP request to Attom API (not returning mock data)
+- [ ] Property snapshots contain real owner name, deed date, tax amount, lien count
+- [ ] Change detection compares real snapshots and identifies actual changes
+- [ ] Ownership transfer creates critical alert with property address in message
+- [ ] Lien filing creates warning or critical alert depending on lien amount
+- [ ] Alert severity matches existing `severityForChange()` logic
+- [ ] Geocoding → Attom search → snapshot pipeline works end-to-end
+- [ ] Cost tracking records each Attom API call for billing analytics
+- [ ] Tier limits enforced: Guard = 1 property, Fortress = 3, Family = 5
+- [ ] Graceful fallback: if Attom API fails, retry once, then alert user of monitoring gap
+
+validation:
+- Run `vitest run hometitle.test.ts` — all tests pass against mocked Attom API responses
+- Manual: Add real property address, trigger scan, verify snapshot in database
+- Simulate change: Update snapshot in database with different owner, trigger detector, verify alert
+- Check cost: Database shows Attom API usage per user per month
+
+notes:
+- Attom covers ~150M US properties but not all counties equally — some rural areas may have gaps
+- For counties not covered by Attom, Phase 3 (task 10) implements county recorder web scrapers
+- Property fraud is a real and growing problem: FTC reports $1B+ in losses annually
+- This is a unique differentiator — no major identity protection competitor offers property monitoring
+- Consider partnership with title insurance companies for added credibility
+- The existing Google Maps geocoding already works — verify `GEOCODING_API_KEY` is set
--- a/tasks/core-services-implementation/10-hometitle-county-scrapers.md
+++ b/tasks/core-services-implementation/10-hometitle-county-scrapers.md
@@ -0,0 +1,83 @@
+# 10. County Recorder Web Scrapers for Top 100 US Counties
+
+meta:
+  id: core-services-10
+  feature: core-services-implementation
+  priority: P2
+  depends_on: [core-services-09]
+  tags: [hometitle, scraping, county-records, fallback, coverage]
+
+objective:
+- Build Playwright-based web scrapers for county recorder websites in the top 100 US counties by population, providing a fallback for counties not covered by Attom API and reducing API costs.
+
+deliverables:
+- Scrapers for 100 US county recorder websites (starting with top 50)
+- Unified property record parser that normalizes disparate HTML formats
+- Fallback logic: Attom API → county scraper → manual request (in order)
+- Scraper health monitoring and breakage detection
+
+steps:
+1. Identify top 100 US counties by population (start with top 50):
+   - Los Angeles County, CA; Cook County, IL; Harris County, TX; Maricopa County, AZ; etc.
+2. Research each county's recorder website:
+   - Search URL pattern (usually `https://{county}.gov/recorder` or similar)
+   - Record search interface (by owner name, parcel ID, or address)
+   - Result format (HTML table, PDF, JSON API, proprietary system)
+3. Create `hometitle/county-scrapers/` directory with one module per county
+4. Implement base scraper interface:
+   - `searchByAddress(address): Promise<CountyRecord[]>`
+   - `searchByParcelId(parcelId): Promise<CountyRecord | null>`
+   - `parseResults(html): CountyRecord[]`
+5. Implement scrapers for each county using Playwright:
+   - Navigate to recorder website
+   - Fill search form (address or parcel ID)
+   - Submit and wait for results
+   - Parse HTML table or detail page
+   - Extract: owner name, deed date, tax info, lien status
+6. Implement unified `parseDeedRecords(html)` that handles common formats:
+   - HTML tables with standard columns
+   - Detail pages with labeled fields
+   - PDF records (download + text extraction)
+7. Add fallback chain in `scanner.ts`:
+   - Try Attom API first (fastest, most reliable)
+   - If Attom returns null/empty, try county scraper
+   - If scraper fails, queue for manual request (email to user)
+8. Add scraper monitoring:
+   - Track success/failure rate per county
+   - Alert when >20% of scrapers fail in 24h (site changes)
+   - Auto-disable broken scrapers, fall back to Attom/manual
+9. Handle rate limiting:
+   - Throttle requests to county sites (max 1 req/5 sec per county)
+   - Use residential proxies if county blocks datacenter IPs
+   - Respect robots.txt and terms of service
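The per-county limit in step 9 (at most one request every 5 seconds) might be enforced with a small throttle keyed by county. This is a minimal sketch, assuming a single-process scraper; `CountyThrottle` and `tryAcquire` are hypothetical names, and the clock is passed in so the logic stays testable.

```typescript
// Per-county throttle: at most one request per `minIntervalMs` for each county.
class CountyThrottle {
  private lastRequestAt = new Map<string, number>();
  private minIntervalMs: number;

  constructor(minIntervalMs = 5000) {
    this.minIntervalMs = minIntervalMs;
  }

  // Returns 0 and records the request if it may proceed now;
  // otherwise returns the number of milliseconds the caller should wait.
  tryAcquire(county: string, nowMs: number): number {
    const last = this.lastRequestAt.get(county);
    if (last !== undefined && nowMs - last < this.minIntervalMs) {
      return this.minIntervalMs - (nowMs - last);
    }
    this.lastRequestAt.set(county, nowMs);
    return 0;
  }
}
```

A scraper would call `tryAcquire(county, Date.now())` before each navigation and sleep for the returned duration when it is non-zero. If scrapers run across several workers, the same idea would need a shared store instead of an in-memory map.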
+
+tests:
+- Unit: Mock HTML responses for common county formats, verify parser normalization
+- Integration: Test 5 representative county scrapers against live sites
+- E2E: Property in county without Attom coverage → scraper fetches real data → snapshot created
+
+acceptance_criteria:
+- [ ] 50+ county recorder scrapers implemented and tested against live sites
+- [ ] `parseDeedRecords()` parses real HTML and returns structured CountyRecord objects
+- [ ] Fallback chain works: Attom → county scraper → manual request
+- [ ] Each scraper handles the county's specific search interface and result format
+- [ ] Rate limiting respects county sites (max 1 request per 5 seconds)
+- [ ] Broken scrapers are auto-detected within 24 hours and disabled
+- [ ] Scraper success rate > 70% across all implemented counties
+- [ ] Property records from scrapers match Attom data quality (owner name, deed date, liens)
+- [ ] Failed scraper attempts fall back to manual queue with user notification
+- [ ] No county site is overwhelmed by scraping (responsible rate limits)
+
+validation:
+- Run `vitest run hometitle.test.ts` — extended tests for county scrapers
+- Manual: Search property in Cook County IL, verify scraper returns real owner data
+- Check fallback: Disable Attom API key, trigger scan, verify county scraper activates
+- Monitor health: Dashboard shows per-county scraper success rate
+
+notes:
+- County recorder sites are notoriously fragile — expect 30–40% of scrapers to break per quarter
+- Many counties use proprietary systems (e.g., Tyler Technologies, Fidlar Technologies) with complex JavaScript
+- Some counties require payment per record ($1–$5) — flag these for manual processing
+- Consider partnering with Attom for counties they don't cover rather than building scrapers
+- Legal: Ensure scraping complies with each county's terms of service and state public records laws
+- The existing `parseDeedRecords()` currently logs "not yet implemented" — replace with real parsing
--- a/tasks/core-services-implementation/11-voiceprint-azure-api.md
+++ b/tasks/core-services-implementation/11-voiceprint-azure-api.md
@@ -0,0 +1,84 @@
+# 11. Azure Voice Live API for Synthetic Voice Detection
+
+meta:
+  id: core-services-11
+  feature: core-services-implementation
+  priority: P2
+  depends_on: [core-services-01]
+  tags: [voiceprint, azure, voice-clone-detection, liveness, api-integration]
+
+objective:
+- Replace the stub `detectSynthetic()` that returns `{ isSynthetic: false, confidence: 1.0 }` with a real Azure Voice Live API integration, enabling consumer-facing voice clone detection via uploaded call recordings or live microphone capture.
+
+deliverables:
+- Azure Speech Services client with Voice Live API endpoint
+- Audio preprocessing pipeline (resampling, normalization, VAD)
+- Voice enrollment system for trusted contacts (family member voice templates)
+- Synthetic detection endpoint that returns real confidence scores
+- Call recording upload and analysis workflow
+
+steps:
+1. Sign up for Azure Speech Services at https://azure.microsoft.com/services/cognitive-services/speech-services/
+2. Add `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION` to `.env.example`
+3. Create `voiceprint/azure.client.ts`:
+   - `detectLiveness(audioBuffer, referenceText?)` — Voice Live API for challenge-response liveness
+   - `verifySpeaker(audioBuffer, enrollmentId)` — speaker verification against enrolled voice
+   - `enrollSpeaker(audioSamples): Promise<enrollmentId>` — create voice template from samples
+4. Implement audio preprocessing:
+   - Convert to 16kHz mono PCM (Azure requirement)
+   - Normalize amplitude to -3 dBFS
+   - Trim silence using VAD (WebRTC or Silero)
+   - Max duration: 30 seconds per analysis
+5. Implement enrollment flow:
+   - User records 3–5 samples of family member saying phrases
+   - Store enrollment in database with `voiceEnrollments` schema (already exists)
+   - Generate enrollment ID, link to user account
+6. Implement detection flow:
+   - User uploads suspicious call recording or captures live audio
+   - Preprocess audio → Azure Voice Live API → get liveness score
+   - If enrollment exists, also run speaker verification → similarity score
+   - Combine scores: synthetic = low liveness AND low speaker match
+7. Implement `detectSynthetic()` to return real analysis:
+   - Score: 0.0–1.0 (synthetic likelihood)
+   - Confidence: based on audio quality and API response certainty
+   - Decision: synthetic if score > 0.7, suspicious if 0.4–0.7, genuine if < 0.4
+8. Add analysis history:
+   - Store every analysis in database (audio hash, score, decision)
+   - Dashboard shows history of analyzed calls
+   - User can report false positive/negative for model improvement
+9. Implement tier limits:
+   - Fortress+: VoicePrint included
+   - Lower tiers: not available or limited to 5 analyses/month
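The decision thresholds in step 7 can be written down as a small mapping from score to label. This is a sketch only: the `decide` name and the `"unavailable"` label are hypothetical, with the latter modeling the requirement that an upstream failure must not be reported as a genuine voice.

```typescript
type Decision = "synthetic" | "suspicious" | "genuine" | "unavailable";

// `score` is the synthetic likelihood in the range 0.0 to 1.0.
// `null` models a failed or missing upstream analysis.
function decide(score: number | null): Decision {
  if (score === null || Number.isNaN(score)) return "unavailable";
  if (score > 0.7) return "synthetic";   // strictly above 0.7
  if (score >= 0.4) return "suspicious"; // 0.4 to 0.7 inclusive
  return "genuine";                      // below 0.4
}
```

Keeping the boundaries in one function makes the unit test for "score calculation and decision logic" straightforward, including the edge values 0.4 and 0.7.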
+
+tests:
+- Unit: Mock Azure API responses, verify score calculation and decision logic
+- Integration: Test with real Azure Voice Live API using synthetic and genuine audio samples
+- E2E: Upload suspicious call recording → receive analysis result with confidence score
+
+acceptance_criteria:
+- [ ] `detectSynthetic()` calls real Azure Voice Live API (not returning hardcoded `isSynthetic: false`)
+- [ ] Audio preprocessing converts to 16kHz mono PCM and normalizes amplitude
+- [ ] Voice enrollment creates usable template from 3–5 user-provided samples
+- [ ] Speaker verification returns similarity score between 0.0 and 1.0
+- [ ] Liveness detection returns pass/fail with confidence for challenge-response mode
+- [ ] Combined score correctly flags known synthetic voice samples (>0.7 threshold)
+- [ ] Analysis results are stored in database with audio hash and metadata
+- [ ] Dashboard shows analysis history with play button for uploaded audio
+- [ ] Tier enforcement: VoicePrint only available on Fortress+ plans
+- [ ] Graceful fallback: if Azure API fails, return "analysis unavailable" (not false negative)
+- [ ] False positive rate < 5% on genuine voice samples (tested with 100+ samples)
+
+validation:
+- Run `vitest run voiceprint.test.ts` — all tests pass with Azure mock
+- Manual: Upload genuine voice sample, verify `isSynthetic: false` with confidence > 0.9
+- Manual: Upload synthetic voice (e.g., from ElevenLabs), verify `isSynthetic: true` with confidence > 0.7
+- Check enrollment: Database `voiceEnrollments` table has real templates with Azure enrollment IDs
+
+notes:
+- Azure Voice Live API costs ~$0.016/minute of audio analyzed
+- At 50 analyses/user/month (1–2 min each), cost is ~$0.80–$1.60/user/month
+- This is the ONLY practical path for a startup — building in-house costs $840K–$1.25M Year 1
+- The differentiator isn't the detection tech (everyone uses Azure/Daon/Pindrop) — it's the consumer UX and integration
+- Consider adding forensic analysis mode: detailed spectrogram visualization for user education
+- Mobile integration (iOS CallKit, Android Telecom) is Phase 4 (task 12) — this task is server-side only
+- Store audio samples securely (encrypted at rest) and allow user deletion (privacy compliance)
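The per-user cost range quoted in the notes follows from simple multiplication, shown here as a worked example. The rate is the approximate figure from the notes, not a verified price, and `monthlyCostUsd` is a hypothetical helper name.

```typescript
// Approximate per-minute rate taken from the notes above; verify against current pricing.
const COST_PER_MINUTE_USD = 0.016;

// Monthly analysis cost for one user.
function monthlyCostUsd(analysesPerMonth: number, minutesPerAnalysis: number): number {
  return analysesPerMonth * minutesPerAnalysis * COST_PER_MINUTE_USD;
}

const lowEstimate = monthlyCostUsd(50, 1);  // about 0.80 USD
const highEstimate = monthlyCostUsd(50, 2); // about 1.60 USD
```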
--- a/tasks/core-services-implementation/12-voiceprint-mobile-integration.md
+++ b/tasks/core-services-implementation/12-voiceprint-mobile-integration.md
@@ -0,0 +1,84 @@
+# 12. iOS CallKit and Android Telecom API for Real-Time Call Analysis
+
+meta:
+  id: core-services-12
+  feature: core-services-implementation
+  priority: P2
+  depends_on: [core-services-11]
+  tags: [voiceprint, ios, android, callkit, telecom-api, real-time, mobile]
+
+objective:
+- Integrate VoicePrint into the iOS and Android mobile apps via CallKit and Telecom API, enabling real-time call recording, analysis, and synthetic voice alerts during active phone calls.
+
+deliverables:
+- iOS CallKit extension for call interception and recording
+- Android Telecom API integration for call screening and recording
+- Real-time audio streaming to server for analysis
+- Push notification alert when synthetic voice detected during call
+- On-device audio capture and upload pipeline
+
+steps:
+1. **iOS Implementation:**
+   - Create CallKit extension (`CallDirectoryExtension`) for caller identification
+   - Implement `CXProvider` delegate for call state monitoring
+   - Add audio recording permission (NSMicrophoneUsageDescription in Info.plist)
+   - Stream call audio to server via WebSocket or upload after call ends
+   - Show in-call alert overlay when synthetic voice detected
+   - Handle app backgrounding and call recording continuity
+2. **Android Implementation:**
+   - Implement `TelecomManager` with `ConnectionService` for call monitoring
+   - Add `READ_PHONE_STATE`, `RECORD_AUDIO`, `FOREGROUND_SERVICE` permissions
+   - Create call screening service that triggers on incoming/outgoing calls
+   - Record call audio using `MediaRecorder` or `AudioRecord`
+   - Upload audio to server for analysis after call ends
+   - Show heads-up notification when synthetic voice detected
+3. **Server-side integration:**
+   - Extend VoicePrint tRPC router with `analyzeCallRecording` endpoint
+   - Handle multipart audio upload (WAV/MP3 format)
+   - Queue analysis job, push result via WebSocket or push notification
+   - Store analysis result linked to call metadata (number, duration, timestamp)
+4. **Real-time vs. post-call analysis:**
+   - Phase 1: Post-call upload + analysis (simpler, no strict latency requirement)
+   - Phase 2: Real-time streaming chunks during call (requires <500ms analysis)
+5. **User experience:**
+   - Settings toggle: "Analyze calls for voice cloning"
+   - After each analyzed call: summary card in app (genuine/suspicious/synthetic)
+   - Emergency override: one-tap hangup + block number when synthetic detected
+6. **Privacy and compliance:**
+   - Two-party consent state detection (disable recording in 2-party consent states)
+   - User must explicitly opt-in before any call recording
+   - Audio data encrypted in transit and at rest
+   - Auto-delete audio after analysis (configurable retention: 0–30 days)
+
+tests:
+- Unit: Mock CallKit/Telecom callbacks, verify audio capture and upload logic
+- Integration: Test audio upload and analysis flow on device simulator
+- E2E: Receive call on device → record audio → upload → receive analysis notification
+
+acceptance_criteria:
+- [ ] iOS app can record incoming call audio and upload to server for analysis
+- [ ] Android app can record incoming call audio and upload to server for analysis
+- [ ] Call recording only happens after explicit user opt-in
+- [ ] Two-party consent states are detected and recording is disabled (legal compliance)
+- [ ] Uploaded audio is analyzed by Azure Voice Live API and result pushed to device
+- [ ] Push notification sent within 30 seconds of analysis completion
+- [ ] In-app call summary shows: caller number, duration, analysis result, confidence score
+- [ ] Emergency hangup button available when synthetic voice detected
+- [ ] Audio data is encrypted in transit (TLS) and deleted after analysis (0-day retention default)
+- [ ] App handles backgrounding without losing call recording session
+- [ ] Recording doesn't interfere with normal call audio quality
+
+validation:
+- iOS: Test on physical device (simulator doesn't support CallKit), verify recording and upload
+- Android: Test on physical device, verify Telecom API integration and notification delivery
+- Server: Verify `analyzeCallRecording` endpoint accepts multipart upload and returns analysis
+- Legal review: Confirm 2-party consent logic covers all US states correctly
+
+notes:
+- iOS CallKit extensions run in separate process — share data via App Groups
+- Android Telecom API requires phone app to be default dialer (limited market penetration)
+- Alternative: Use accessibility service on Android for broader call recording (more invasive UX)
+- Real-time analysis requires chunking audio into 3–5 second segments and streaming — much harder than post-call
+- Consider starting with post-call analysis and adding real-time as Phase 2
+- Audio file sizes: 1 minute of WAV at 16kHz mono = ~1.9MB; compress to AAC/MP3 for upload
+- The existing iOS `VoicePrintViewModel.swift` and Android `VoicePrintViewModel.kt` need updating
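The file-size figure in the notes can be checked with a short calculation, assuming 16-bit PCM samples (the notes give the sample rate and channel count but not the bit depth). The helper names below are hypothetical; the chunk count illustrates how many segments a streamed recording would produce at a given segment length.

```typescript
// Uncompressed PCM payload size in bytes; the WAV header is ignored.
// Assumption: 16-bit samples (2 bytes each).
function pcmBytes(seconds: number, sampleRateHz = 16000, channels = 1, bytesPerSample = 2): number {
  return seconds * sampleRateHz * channels * bytesPerSample;
}

// Number of fixed-length chunks needed to stream a recording (the last chunk may be shorter).
function chunkCount(seconds: number, chunkSeconds: number): number {
  return Math.ceil(seconds / chunkSeconds);
}

const oneMinuteBytes = pcmBytes(60);       // 1,920,000 bytes, roughly 1.9 MB
const oneMinuteChunks = chunkCount(60, 4); // 15 chunks of 4 seconds
```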
--- a/tasks/core-services-implementation/13-correlation-engine.md
+++ b/tasks/core-services-implementation/13-correlation-engine.md
@@ -0,0 +1,81 @@
+# 13. Cross-Service Threat Correlation Scoring and Unified Alert Feed
+
+meta:
+  id: core-services-13
+  feature: core-services-implementation
+  priority: P2
+  depends_on: [core-services-05, core-services-07, core-services-08]
+  tags: [correlation, threat-scoring, unified-alerts, intelligence, dashboard]
+
+objective:
+- Activate the correlation service to cross-reference findings across VoicePrint, DarkWatch, SpamShield, HomeTitle, and RemoveBrokers, generating unified threat scores and correlated alert narratives that explain multi-vector attacks.
+
+deliverables:
+- Cross-service correlation rules (e.g., breached email + spam call from same source = coordinated attack)
+- Unified threat score algorithm (0–100) per user and per family member
+- Correlated alert narratives: "Your email was breached on Monday, and today you received a spam call to that number — this may be a targeted attack"
+- Dashboard threat score widget with historical trend
+
+steps:
+1. Analyze existing correlation service (`services/correlation/`):
+   - Review current schema and logic in `correlation.service.ts`
+   - Identify data sources available from each service
+2. Define correlation rules:
+   - Rule 1: Same email found in HIBP breach AND receiving spam calls → coordinated attack (+30 threat score)
+   - Rule 2: Property lien filed AND data broker listing active → identity theft in progress (+40 threat score)
+   - Rule 3: Voice clone detected AND family member SSN on dark web → targeted family scam (+50 threat score)
+   - Rule 4: Multiple breaches in 30 days → compromised identity (+20 threat score)
+   - Rule 5: Spam call from number associated with known scam campaign → high risk (+25 threat score)
+3. Implement correlation detection pipeline:
+   - Subscribe to alert creation events from all 5 services
+   - Window function: look back 30 days for related findings
+   - Match on shared entities (email, phone, SSN, address, name)
+4. Implement threat scoring algorithm:
+   - Base score: sum of individual alert severities (info=1, warning=3, critical=5)
+   - Correlation bonus: +10–50 per matched rule
+   - Time decay: scores decrease by 10% per week (old alerts matter less)
+   - Family aggregation: highest individual score + average of others / 2
+   - Cap at 100, floor at 0
+5. Implement unified alert feed:
+   - Merge individual service alerts into chronological feed
+   - Group correlated alerts into "attack narratives"
+   - Show narrative summary: "3 related events detected — possible coordinated attack"
+6. Update dashboard widgets:
+   - Threat Score widget: current score with color coding (green <30, yellow 30–60, red >60)
+   - Trend graph: score over last 90 days
+   - Alert Feed widget: unified feed with narrative grouping
+7. Add proactive recommendations:
+   - If score > 60: recommend password changes, credit freeze, family notification
+   - If HomeTitle + RemoveBrokers correlated: recommend title insurance review
+   - If VoicePrint detected: recommend warning family members, filing FTC report
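The scoring rules in step 4 and the color bands in step 6 could be sketched as follows. Several details are assumptions, because the steps leave them open: the weekly decay is treated as compounding and applied per alert by whole weeks of age, correlation bonuses are passed in precomputed and are not decayed, the result is rounded to an integer, and the family formula is read as highest + (average of the other members) / 2. All names are hypothetical.

```typescript
type Severity = "info" | "warning" | "critical";

interface ScoredAlert {
  severity: Severity;
  ageDays: number; // days since the alert was created
}

const SEVERITY_POINTS: Record<Severity, number> = { info: 1, warning: 3, critical: 5 };

// Cap at 100, floor at 0.
const clampScore = (x: number): number => Math.min(100, Math.max(0, x));

// Assumption: 10% decay compounds per whole week of alert age; bonuses are not decayed.
function threatScore(alerts: ScoredAlert[], correlationBonuses: number[] = []): number {
  const base = alerts.reduce(
    (sum, a) => sum + SEVERITY_POINTS[a.severity] * Math.pow(0.9, Math.floor(a.ageDays / 7)),
    0,
  );
  const bonus = correlationBonuses.reduce((sum, b) => sum + b, 0);
  return clampScore(Math.round(base + bonus));
}

// Assumption: family score = highest member score + (average of the others) / 2.
function familyScore(memberScores: number[]): number {
  if (memberScores.length === 0) return 0;
  const highest = Math.max(...memberScores);
  const total = memberScores.reduce((s, x) => s + x, 0);
  const othersCount = memberScores.length - 1;
  const avgOthers = othersCount === 0 ? 0 : (total - highest) / othersCount;
  return clampScore(Math.round(highest + avgOthers / 2));
}

// Color bands from the dashboard step: green below 30, yellow 30 to 60, red above 60.
function scoreColor(score: number): "green" | "yellow" | "red" {
  if (score < 30) return "green";
  return score <= 60 ? "yellow" : "red";
}
```

Under these assumptions, three fresh critical alerts score 15, and a matched +40 rule raises that to 55, which lines up with the "15 to 55" check in the validation section.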
+
+tests:
+- Unit: Mock alerts from multiple services, verify correlation rules fire correctly
+- Integration: Create correlated alerts in database, verify threat score calculation
+- E2E: Trigger breach alert + spam alert for same email → verify unified narrative created
+
+acceptance_criteria:
+- [ ] Correlation rules detect cross-service relationships within 30-day window
+- [ ] Threat score is calculated from individual alert severities + correlation bonuses
+- [ ] Score decays by 10% per week (time-weighted relevance)
+- [ ] Family plan aggregates scores across members
+- [ ] Unified alert feed groups correlated events into narrative summaries
+- [ ] Dashboard threat score widget updates in real-time as new alerts arrive
+- [ ] Proactive recommendations appear based on current threat score and active correlations
+- [ ] Correlation engine doesn't create false positives (test with 100 random alerts, <5% false correlation rate)
+- [ ] Historical trend graph shows score changes over 90 days
+- [ ] Each correlated narrative links to individual alert details
+
+validation:
+- Run `vitest run correlation.test.ts` — all tests pass
+- Manual: Create test alerts (breached email + spam call), verify correlation detected
+- Dashboard: Threat score updates from 15 to 55 after correlation bonus applied
+- Trend: 90-day graph shows spike during test period
+
+notes:
+- The existing `correlation.service.ts` and `correlation.ts` router need activation — not just stubs
+- Correlation is the key differentiator from point-solution competitors (Aura, LifeLock)
+- False positive rate must be low — users will ignore alerts if too many are irrelevant
+- Consider using graph database (Neo4j) for complex relationship queries at scale
+- The existing `normalizedAlerts` table already stores cross-service alerts — use this as correlation source
+- Mobile apps should show simplified threat score and latest narrative, not full correlation graph
--- a/tasks/core-services-implementation/14-family-plans.md
+++ b/tasks/core-services-implementation/14-family-plans.md
@@ -0,0 +1,91 @@
+# 14. Family Plan Member Management, Billing Proration, and Multi-User Dashboard
+
+meta:
+  id: core-services-14
+  feature: core-services-implementation
+  priority: P2
+  depends_on: [core-services-01]
+  tags: [billing, family-plans, multi-user, proration, dashboard, member-management]
+
+objective:
+- Implement family plan support: invite family members, manage their access, prorate billing on member changes, and provide a multi-user dashboard showing consolidated family security status.
+
+deliverables:
+- Family member invitation system (email invites with acceptance flow)
+- Role-based access control (primary account holder vs. member)
+- Billing proration for adding/removing family members mid-cycle
+- Family dashboard showing all members' threat scores and alerts
+- Per-member service configuration (what each member monitors)
+
+steps:
+1. Extend database schema:
+   - Add `familyGroups` table: id, primaryUserId, planTier, maxMembers, createdAt
+   - Add `familyMembers` table: id, familyGroupId, userId, role (primary/member), status (pending/active/removed), invitedAt, joinedAt
+   - Add `familyInvitations` table: id, familyGroupId, email, token, expiresAt, acceptedAt
+2. Implement invitation flow:
+   - Primary user sends invite by email → generates signed token
+   - Invitee clicks link → creates account (if new) or links existing account
+   - Invitation expires after 7 days
+   - Send reminder email after 3 days if not accepted
+3. Implement member management:
+   - Primary user can view all members, their active services, and threat scores
+   - Primary user can remove members (prorated refund or credit)
+   - Members can leave family group voluntarily
+   - Members cannot see other members' sensitive data (SSN, specific breach details)
+4. Implement billing proration:
+   - Add member mid-cycle: charge prorated amount for remaining days via Stripe
+   - Remove member mid-cycle: credit prorated amount to account balance
+   - Change plan tier: prorate difference, apply to next invoice
+   - Use Stripe's `proration_behavior: 'create_prorations'` for all changes
+5. Implement family dashboard:
+   - Sidebar shows family group name and member count
+   - Main view: cards for each member with photo, name, threat score, recent alert count
+   - Click member → detailed view with their services, alerts, and settings
+   - Consolidated family threat score (from correlation engine)
+6. Implement per-member service configuration:
+   - Primary user assigns which services each member gets
+   - Default: all members get DarkWatch + SpamShield + RemoveBrokers
+   - HomeTitle and VoicePrint limited by property/voice enrollment slots
+   - Members can configure their own watchlist items within assigned services
+7. Implement notification routing:
+   - Critical alerts notify primary user AND affected member
+   - Billing notifications go to primary user only
+   - Members can opt in to or out of specific alert types
+8. Add family plan tiers:
+   - Family Fortress: 5 adults + unlimited children, $45/mo
+   - Family Guard: 3 adults + unlimited children, $35/mo
+   - Enforce max member limits at invitation time
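The seat limits in step 8 and the invitation timing in step 2 might be captured by two small helpers. This is a sketch under stated assumptions: tier identifiers and function names are hypothetical, pending invitations are assumed to reserve a seat, the primary user is assumed to count as one adult, and children are not counted because they are unlimited.

```typescript
type FamilyTier = "family_guard" | "family_fortress";

// Adult seat limits from the plan tiers; children are unlimited and not counted.
const MAX_ADULTS: Record<FamilyTier, number> = { family_guard: 3, family_fortress: 5 };

const DAY_MS = 24 * 60 * 60 * 1000;

// Assumption: pending invitations reserve a seat, and the primary user counts as an adult.
function canInviteAdult(tier: FamilyTier, activeAdults: number, pendingInvites: number): boolean {
  return activeAdults + pendingInvites < MAX_ADULTS[tier];
}

type InviteState = "accepted" | "expired" | "reminder_due" | "pending";

// Derives the invitation state from timestamps: reminder after 3 days, expiry after 7 days.
function inviteState(invitedAtMs: number, nowMs: number, acceptedAtMs?: number): InviteState {
  if (acceptedAtMs !== undefined) return "accepted";
  const age = nowMs - invitedAtMs;
  if (age >= 7 * DAY_MS) return "expired";
  if (age >= 3 * DAY_MS) return "reminder_due";
  return "pending";
}
```

Because `reminder_due` stays true from day 3 until expiry, the caller would still need to record that a reminder was sent to avoid sending it more than once.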
+
+tests:
+- Unit: Proration calculation for add/remove/upgrade scenarios
+- Integration: Full invitation flow from email to account linking
+- E2E: Create family plan → invite 2 members → verify billing → remove member → verify prorated credit
+
+acceptance_criteria:
+- [ ] Primary user can send email invitations to family members
+- [ ] Invitations expire after 7 days and can be resent
+- [ ] Members can accept invitations and join family group
+- [ ] Adding member mid-cycle creates prorated charge on next invoice
+- [ ] Removing member mid-cycle creates prorated credit on next invoice
+- [ ] Family dashboard shows all members with threat scores and alert counts
+- [ ] Primary user can configure which services each member has access to
+- [ ] Members cannot see other members' sensitive breach details (only score + summary)
+- [ ] Billing notifications route to primary user; security alerts route to affected member
+- [ ] Max member limits enforced at invitation (5 adults for Family Fortress, 3 adults for Family Guard)
+- [ ] Plan downgrade prevents inviting beyond new tier's member limit
+- [ ] All family plan changes handled via Stripe proration (no manual calculations)
+
+validation:
+- Run `vitest run billing.test.ts` — extended tests for family proration
+- Manual: Send invitation to test email, click link, verify member joins family
+- Stripe Dashboard: Verify proration items appear on invoices after member changes
+- Dashboard: Family view shows 3 member cards with individual threat scores
+
+notes:
+- Family plans have 30–50% lower churn than individual plans — this is a critical retention driver
+- Stripe's `proration_behavior` handles most math automatically — trust it
+- Children's accounts should be restricted: no dark web monitoring for minors, only spam/basic alerts
+- Consider adding "family safety alerts" — notify primary user if child receives suspicious contact
+- The existing `invitation.ts` schema may need extension for family-specific invitation tokens
+- Member removal should not delete their account — just unlink from family group
+- Children (under 18) should have simplified dashboard — no breach details, only "safe/attention needed"
--- a/tasks/core-services-implementation/README.md
+++ b/tasks/core-services-implementation/README.md
@@ -0,0 +1,45 @@
+# Core Services Implementation
+
+**Objective:** Convert all stub/placeholder services into production-ready implementations with real API integrations, enabling paid customer subscriptions and revenue.
+
+**Status legend:** [ ] todo, [~] in-progress, [x] done
+
+## Tasks
+
+### Phase 1 — Foundation (Revenue Enabler)
+- [ ] 01 — Stripe Checkout, webhooks, and subscription state management → `01-stripe-checkout-webhooks.md`
+- [ ] 02 — Automated removal engine for top 20 data brokers → `02-removebrokers-top-20.md`
+
+### Phase 2 — Core Services (Table Stakes)
+- [ ] 03 — HIBP API integration for email breach monitoring → `03-darkwatch-hibp.md`
+- [ ] 04 — SecurityTrails, Censys, Shodan API integrations → `04-darkwatch-attack-surface.md`
+- [ ] 05 — Periodic scan scheduling, WebSocket progress, alert deduplication → `05-darkwatch-scheduler.md`
+- [ ] 06 — Twilio Lookup and phone reputation API integration → `06-spamshield-reputation.md`
+- [ ] 07 — Fine-tuned DistilBERT SMS spam classifier with ONNX deployment → `07-spamshield-ml-classifier.md`
+
+### Phase 3 — Scale & Expand
+- [ ] 08 — Expand broker coverage to 50+ with CAPTCHA solving → `08-removebrokers-50-plus.md`
+- [ ] 09 — Attom Data Solutions API for property record snapshots → `09-hometitle-attom-api.md`
+- [ ] 10 — County recorder web scrapers for top 100 US counties → `10-hometitle-county-scrapers.md`
+- [ ] 11 — Azure Voice Live API for synthetic voice detection → `11-voiceprint-azure-api.md`
+
+### Phase 4 — Differentiation & Polish
+- [ ] 12 — iOS CallKit and Android Telecom API for real-time call analysis → `12-voiceprint-mobile-integration.md`
+- [ ] 13 — Cross-service threat correlation scoring and unified alert feed → `13-correlation-engine.md`
+- [ ] 14 — Family plan member management, billing proration, multi-user dashboard → `14-family-plans.md`
+
+## Dependencies
+- 02 → 08 (expand broker coverage only after automation for the initial 20 brokers works)
+- 03 → 04 → 05 (HIBP before attack surface APIs before scheduling)
+- 06 → 07 (reputation APIs before ML classifier)
+- 09 → 10 (Attom API before county scraping fallback)
+- 11 → 12 (Azure API before mobile integration)
+- 01 → 14 (billing before family plan management)
+- 05, 07, 08 → 13 (core services feed into correlation engine)
+
+## Exit Criteria
+- All 5 core services make real API calls or run real ML inference — no stub responses remain in production code
+- Billing supports Stripe Checkout, webhooks, tier upgrades/downgrades, and trial periods
+- A paying customer can sign up, receive real alerts, and see tangible value within 48 hours
+- Mobile apps display real data from all working services
+- No mock responses remain in production paths: no placeholder IDs from `crypto.randomUUID()`, no hard-coded `isSynthetic: false` or `isSpam: false`, and no `Unknown Owner` values