shortcomings

2026-05-31 22:03:18 -04:00
parent 3b29de3234
commit c159f07322
17 changed files with 1535 additions and 4 deletions


@@ -0,0 +1,428 @@
# Kordant: Product Gap Analysis & Path to Revenue
**Date:** May 31, 2026
**Scope:** What's functional vs. scaffolding, what's needed to ship, expected customer value, pricing
---
## Executive Summary
Kordant is a **well-architected platform with mostly scaffolding implementations**. The codebase has excellent structure — tRPC routers, Drizzle ORM schemas, service layers, job handlers, mobile apps, and a Rust queueing library (Honker). However, **none of the five core services deliver real value to a paying customer today**. The ML models return stub data, external API integrations are placeholders, and data sources return mock results.
**Bottom line:** You have the platform skeleton. You need to build the muscles.
| Service | Status | Lines of Code | Real Functionality | Effort to Ship |
|---------|--------|---------------|-------------------|----------------|
| **VoicePrint** | ❌ Pure scaffolding | ~240 | None; returns `isSynthetic: false` | 6-12 months, $100K-$500K |
| **DarkWatch** | ⚠️ Architecture only | ~500+ | Circuit breakers, alert pipeline, CRUD; no real API calls | 2-4 months, $20K-$50K |
| **SpamShield** | ⚠️ Rule engine only | ~400+ | Pattern matching works; ML & reputation APIs are stubs | 2-3 months, $15K-$40K |
| **HomeTitle** | ❌ Scaffolding | ~300 | Geocoding works; county records return mock data | 3-6 months, $30K-$80K |
| **RemoveBrokers** | ⚠️ Registry only | ~1,500+ | Broker registry (100+ entries); removal engine is placeholder | 2-4 months, $20K-$50K |
| **Billing** | ⚠️ Minimal | ~100 | Stripe client; no webhooks, proration, or checkout | 1-2 months, $10K-$20K |
| **Auth** | ✅ Functional | ~200 | JWT + bcrypt working | Done |
---
## 1. Current State: What Actually Works
### ✅ Functional (Shippable Today)
- **Authentication:** JWT signing/verification (jose), password hashing (bcrypt, 10 rounds). Solid implementation.
- **Database Schema:** Complete Drizzle ORM schemas for all 5 services, alerts, billing, subscriptions, audit logs.
- **tRPC API Layer:** Router scaffolding for all services with proper Zod schemas.
- **Dashboard UI:** Web dashboard with sidebar, threat score widget, alert feed, service widgets.
- **Mobile Apps:** iOS (SwiftUI) and Android (Compose) with ViewModels, Models, and navigation. Thin clients calling tRPC.
- **Browser Extension:** Chrome Manifest V3 extension shell.
- **Honker (Rust):** Queueing library for background jobs, FFI bindings.
- **Geocoding:** Google Maps API integration in HomeTitle (works if API key provided).
- **SpamShield Rule Engine:** Regex/area code/prefix pattern matching works.
- **DarkWatch Alert Pipeline:** Severity scoring, exposure deduplication, alert creation logic.
- **RemoveBrokers Registry:** 100+ broker entries with domains, removal URLs, categories.
### ❌ Not Functional (Scaffolding/Placeholders)
| Component | What It Does | What It Should Do |
|-----------|-------------|-------------------|
| **VoicePrint ML Engine** | Returns `{ isSynthetic: false, confidence: 1.0, score: 0.0 }` | Detect AI-generated voices in real-time |
| **VoicePrint Voice Matching** | Returns `{ similarity: 0, matched: false }` | Compare voice against enrolled templates |
| **VoicePrint Embedding** | Returns empty `Float64Array(256)` + SHA256 hash | Generate voice embeddings for enrollment |
| **DarkWatch Scan Engine** | Has circuit breaker structure — no actual API calls to HIBP, SecurityTrails, Censys, Shodan | Query real breach databases and dark web sources |
| **SpamShield ML Engine** | `classifyTextBERT()` returns `{ isSpam: false, confidence: 1.0 }` | Classify SMS/call text as spam using ML |
| **SpamShield Reputation API** | Hiya/Truecaller lookups return `{ score: 0, isSpam: false }` | Query real phone reputation databases |
| **HomeTitle County Scanner** | Returns `{ ownerName: "Unknown Owner", address: {} }` | Fetch real county deed records |
| **HomeTitle HTML Parser** | `parseDeedRecords()` logs "not yet implemented" and returns null | Parse county record HTML/JSON responses |
| **RemoveBrokers Removal Engine** | Returns `{ success: true, requestId: crypto.randomUUID() }` | Actually submit opt-out requests to brokers |
| **RemoveBrokers Email** | Returns `{ success: true }` without sending anything | Send opt-out emails to broker addresses |
| **RemoveBrokers Status Tracking** | Returns `{ status: "pending" }` always | Poll brokers for actual removal status |
| **Billing Webhooks** | No webhook handler implemented | Handle Stripe webhook events (checkout, renewal, cancel) |
| **Billing Checkout** | No checkout session creation | Create Stripe Checkout sessions for subscription plans |
---
## 2. Gap Analysis by Service
### VoicePrint — Voice Clone Detection
**Current:** 56-line ML engine, all stubs. No audio processing, no model loading, no inference.
**What's needed for a working product:**
1. **API-first approach (fastest):**
- Integrate Microsoft Azure Voice Live API (~$0.016/min) for liveness detection
- Integrate Pindrop or Daon API for passive detection
- Estimated cost: $60K-$230K/year at scale
2. **Build in-house (differentiating but expensive):**
- Deploy AASIST or RawNet2 model (open-source from ASVspoof 2021)
- GPU inference infrastructure (NVIDIA T4/A10, $300-$800/mo per node)
- Audio preprocessing pipeline (VAD, resampling, normalization)
- Enrollment system (collect voice samples, generate embeddings)
- Estimated cost: $840K-$1.25M Year 1
3. **Mobile integration:**
- iOS: Integrate with CallKit for real-time call analysis
- Android: Integrate with Telecom API
- On-device inference for low-latency screening
**Market reality:** Voice clone detection is the most technically ambitious service. Hiya and Truecaller have carrier-level integrations you can't replicate without carrier partnerships. Your differentiator should be **consumer-facing analysis** (record a suspicious call → analyze → report), not real-time PSTN interception.
**Effort:** 6-12 months to MVP, $100K-$500K
**Revenue potential:** High — this is the most novel service in your suite. Competitors don't offer this to consumers.
---
### DarkWatch — Dark Web & Breach Monitoring
**Current:** Best-implemented service. Has scan engine architecture, circuit breakers, alert pipeline, watchlist CRUD, exposure dedup. Missing: actual API calls to external data sources.
**What's needed for a working product:**
1. **API integrations (the core work):**
- **HaveIBeenPwned (HIBP):** Free tier (1,500 req/mo) → Paid ($3.50/mo individual). Check emails against breach database.
- **SecurityTrails:** $49/mo Pro plan. DNS/WHOIS monitoring for domain exposure.
- **Censys:** $79/mo Pro. Internet-wide scanning for exposed services.
- **Shodan:** $299/mo Small Business. IoT/device exposure monitoring.
- **Optional — Breachsense:** $199/mo for deep dark web scanning.
2. **Data pipeline:**
- Implement actual `fetchWithCircuit()` calls to each API
- Parse and normalize responses into your exposure schema
- Schedule periodic scans (daily/weekly depending on tier)
- WebSocket push for real-time scan progress
3. **Alert quality:**
- Your severity scoring logic is already implemented
- Add alert fatigue reduction (dedup, cooldown periods)
- Email + push notification delivery
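
The `fetchWithCircuit()` step in the data pipeline can be sketched roughly as follows. This is a minimal illustration, not the code in `scan.engine.ts`: the class name, method names, and the default threshold and cooldown are assumptions made for the example, and the clock is injected so the state logic can be tested without waiting.

```typescript
// Hypothetical sketch: names and defaults are assumptions, not the project's actual code.
type CircuitState = "closed" | "open";

class CircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold = 3, cooldownMs = 60000) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // The circuit counts as open only while the cooldown window is still running.
  state(now: number): CircuitState {
    if (this.openedAt !== null && now - this.openedAt < this.cooldownMs) return "open";
    return "closed";
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAt = null;
  }

  // Once the threshold is reached, every further failure re-opens the circuit,
  // so the first call after a cooldown acts as a single probe.
  recordFailure(now: number): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.failureThreshold) this.openedAt = now;
  }
}

// Wraps any upstream call (HIBP, SecurityTrails, ...) with the breaker.
async function fetchWithCircuit<T>(
  breaker: CircuitBreaker,
  call: () => Promise<T>,
  now: () => number = Date.now,
): Promise<T> {
  if (breaker.state(now()) === "open") {
    throw new Error("circuit open: skipping upstream call");
  }
  try {
    const result = await call();
    breaker.recordSuccess();
    return result;
  } catch (err) {
    breaker.recordFailure(now());
    throw err;
  }
}
```

Keeping one breaker instance per data source would let a failing provider be skipped without pausing scans against the others.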
**Monthly API costs at scale:** ~$500-$1,000/mo for base data sources
**Per-customer API cost:** ~$0.50-$2.00/mo (amortized across user base)
**Effort:** 2-4 months, $20K-$50K
**Revenue potential:** Medium — crowded market (Aura, LifeLock, Experian all offer this). Must differentiate on alert quality and multi-source correlation.
---
### SpamShield — Spam Call/SMS Classification
**Current:** Rule engine works (pattern matching, area code, prefix). ML engine and reputation APIs are stubs.
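
For orientation, the kind of pattern matching described here (regex, area code, prefix) might look like the sketch below. The `Rule` shape, the labels, and the number normalization are assumptions for illustration; the existing rule engine may be structured differently.

```typescript
// Hypothetical rule shapes; the real rule engine's types may differ.
type Rule =
  | { kind: "regex"; pattern: RegExp; label: string }
  | { kind: "areaCode"; areaCode: string; label: string }
  | { kind: "prefix"; prefix: string; label: string };

// Keep digits only and drop a leading US country code ("1").
function normalizeNumber(raw: string): string {
  const digits = raw.replace(/\D/g, "");
  return digits.length === 11 && digits.startsWith("1") ? digits.slice(1) : digits;
}

// Returns the labels of every rule that matches the number.
// Patterns should not use the `g` flag, since `test()` would then keep state between calls.
function matchRules(rawNumber: string, rules: Rule[]): string[] {
  const number = normalizeNumber(rawNumber);
  return rules
    .filter((rule) => {
      switch (rule.kind) {
        case "regex":
          return rule.pattern.test(number);
        case "areaCode":
          return number.slice(0, 3) === rule.areaCode;
        case "prefix":
          return number.startsWith(rule.prefix);
      }
    })
    .map((rule) => rule.label);
}
```

Returning labels rather than a single boolean leaves room to combine rule hits with reputation scores once those integrations exist.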
**What's needed for a working product:**
1. **Reputation API integrations:**
- **Hiya API:** Phone number reputation scoring. Carrier-level integration preferred but API available.
- **Truecaller API:** Caller ID and spam labeling.
- **Twilio Lookup API:** $0.004-$0.03 per lookup. Caller name + line type.
- **STIR/SHAKEN verification:** Call authentication (requires telecom partner).
2. **ML text classification:**
- Fine-tune lightweight model (DistilBERT or TinyBERT) on SMS spam dataset
- Deploy as ONNX model for low-latency inference
- Training data: Enron Spam Corpus, SMS Spam Collection, custom labeled data
3. **Mobile integration:**
- iOS: CallKit integration for real-time caller screening
- Android: Telecom API for call filtering
- SMS interception (requires carrier permissions or SMS app integration)
**Monthly API costs:** Twilio Lookup ~$0.004/lookup. Hiya/Truecaller custom pricing.
**Per-customer cost:** ~$1-$5/mo depending on call volume.
**Effort:** 2-3 months, $15K-$40K
**Revenue potential:** Medium-High — Hiya/Truecaller dominate at carrier level, but consumer-facing spam classification with AI detection is underserved.
---
### HomeTitle — Property Deed Monitoring
**Current:** Geocoding works (Google Maps API). County records fetcher returns mock data. HTML parser not implemented. Change detection logic is solid.
**What's needed for a working product:**
1. **County data sources (the hard part):**
- **US county recorder APIs:** ~3,000 counties, each with different data formats
- **Commercial aggregators:**
- **Attom Data Solutions:** Property records API, ~$0.05-$0.10/record
- **CoreLogic:** Property intelligence, enterprise pricing
- **Black Knight (now part of ICE):** Property data, enterprise pricing
- **County-specific APIs:** Some counties offer open data (e.g., Cook County IL, Harris County TX)
- **Web scraping fallback:** Parse county recorder websites (fragile, requires maintenance)
2. **Monitoring pipeline:**
- Initial property snapshot (owner, deed date, liens, tax info)
- Periodic re-scan (weekly/monthly)
- Change detection (your logic is already implemented)
- Alert generation (ownership transfer, lien filing, tax change)
3. **Property verification:**
- Geocoding → parcel ID lookup → county record fetch
- Handle counties without digital records (mail-based requests)
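
The change-detection step in the monitoring pipeline compares two snapshots of the same property. A minimal sketch of that comparison is shown below; the snapshot fields and change names are assumptions chosen to mirror the alert types listed above, not the existing implementation.

```typescript
// Hypothetical snapshot shape; field names are assumptions for illustration.
interface PropertySnapshot {
  ownerName: string;
  lienCount: number;
  assessedTaxCents: number;
}

type ChangeType = "ownership_transfer" | "lien_filing" | "tax_change";

// Collapse whitespace and case so cosmetic differences between county
// record formats do not raise false ownership alerts.
function normalizeOwner(name: string): string {
  return name.trim().replace(/\s+/g, " ").toUpperCase();
}

function detectChanges(previous: PropertySnapshot, current: PropertySnapshot): ChangeType[] {
  const changes: ChangeType[] = [];
  if (normalizeOwner(previous.ownerName) !== normalizeOwner(current.ownerName)) {
    changes.push("ownership_transfer");
  }
  if (current.lienCount > previous.lienCount) changes.push("lien_filing");
  if (current.assessedTaxCents !== previous.assessedTaxCents) changes.push("tax_change");
  return changes;
}
```

Storing amounts in integer cents avoids floating-point noise showing up as a spurious tax change.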
**Monthly data costs:** Attom ~$500-$5,000/mo depending on volume.
**Per-customer cost:** ~$2-$10/mo depending on scan frequency.
**Effort:** 3-6 months, $30K-$80K
**Revenue potential:** Medium — unique differentiator. No major competitor offers this in consumer identity protection. Real estate fraud is rising (FTC reports $1B+ in property fraud annually).
---
### RemoveBrokers — Data Broker Opt-Out
**Current:** Broker registry with 100+ entries (solid). Removal engine is a placeholder that returns mock request IDs. Email sending not implemented. Form submission not implemented.
**What's needed for a working product:**
1. **Automated removal engine:**
- **Headless browser automation:** Playwright/Puppeteer for each broker's opt-out flow
- **Form filling:** Dynamic form field detection and population
- **CAPTCHA handling:** 2Captcha/AntiCaptcha integration ($0.001-$0.01/solve)
- **Email verification:** Handle opt-out confirmation emails
- **Physical mail:** Generate and mail opt-out letters for brokers requiring it
2. **Broker-specific adapters:**
- Each of 100+ brokers has unique opt-out flow
- Estimated 2-5 hours per broker to implement and test
- Ongoing maintenance: 15-25% of scripts break per quarter
3. **Re-scan pipeline:**
- Periodic re-scans to detect re-listings
- Status tracking and progress reporting
4. **Competitor benchmark:**
- **DeleteMe:** 300+ brokers, $139/yr individual, $329/yr family
- **Kanary:** 400+ brokers, $132/yr individual, $264/yr family
- **OneRep:** 200+ brokers, $180/yr individual
**Monthly operational costs:** Proxies ($1K-$6K), CAPTCHA solving ($3-$8/customer), compute ($1K-$5K)
**Per-customer cost:** ~$13-$53/year (high margin: 60-90%)
**Effort:** 2-4 months for initial 50 brokers, then incremental
**Revenue potential:** Medium — competitive market but high margins. Your advantage: bundling with other services.
---
### Billing & Payments
**Current:** Stripe client initialized. No checkout, webhooks, or subscription management.
**What's needed:**
1. **Stripe Checkout integration:**
- Create checkout sessions for each plan tier
- Handle success/cancel redirects
- Customer portal for subscription management
2. **Webhook handlers:**
- `checkout.session.completed` → activate subscription
- `invoice.payment_succeeded` → renew subscription
- `invoice.payment_failed` → grace period, retry
- `customer.subscription.deleted` → cancel access
- `customer.subscription.updated` → tier changes
3. **Subscription management:**
- Trial periods (14-day free trial)
- Tier upgrades/downgrades with proration
- Family plan member management
- Grace period before suspension
4. **Plan structure:**
- See pricing recommendations below
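
The webhook handlers listed above amount to a small state machine. One way to express it is sketched below; the status names and the rule that a canceled subscription only reactivates through a new checkout are assumptions for illustration, while the event names are the ones listed above.

```typescript
// Hypothetical status names; the Drizzle schema may use different values.
type SubscriptionStatus = "none" | "active" | "grace_period" | "canceled";

type BillingEvent =
  | "checkout.session.completed"
  | "invoice.payment_succeeded"
  | "invoice.payment_failed"
  | "customer.subscription.deleted"
  | "customer.subscription.updated";

function nextStatus(current: SubscriptionStatus, event: BillingEvent): SubscriptionStatus {
  // Assumption: once canceled, only a fresh checkout brings the subscription back.
  if (current === "canceled" && event !== "checkout.session.completed") return "canceled";
  switch (event) {
    case "checkout.session.completed":
    case "invoice.payment_succeeded":
      return "active";
    case "invoice.payment_failed":
      return "grace_period";
    case "customer.subscription.deleted":
      return "canceled";
    case "customer.subscription.updated":
      return current; // tier change only; status is left unchanged in this sketch
  }
}
```

Keeping the transition logic in a pure function makes each webhook case unit-testable without mocking Stripe.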
**Effort:** 1-2 months, $10K-$20K
**Revenue potential:** N/A (enables all revenue)
---
## 3. Recommended Build Priority
Based on effort vs. market differentiation:
| Priority | Service | Why | Effort | Revenue Impact |
|----------|---------|-----|--------|----------------|
| **1** | **RemoveBrokers** | Highest margin (60-90%), existing registry, clear competitor benchmark | 2-4 mo | Direct revenue, $11-$27/mo |
| **2** | **DarkWatch** | Best architecture, API integrations needed, table-stakes feature | 2-4 mo | Core retention driver |
| **3** | **SpamShield** | Rule engine works, needs reputation APIs + ML | 2-3 mo | Differentiation vs. competitors |
| **4** | **Billing** | Enables all revenue, must ship before paid plans | 1-2 mo | Revenue enabler |
| **5** | **HomeTitle** | Unique differentiator, but data sourcing is hard | 3-6 mo | Premium tier upsell |
| **6** | **VoicePrint** | Most novel, but highest effort and cost | 6-12 mo | Brand differentiation |
**Recommended MVP scope:** RemoveBrokers + DarkWatch + SpamShield + Billing = **5-8 months to first revenue**.
---
## 4. Pricing Strategy
### Recommended Plan Structure
| Plan | Monthly Price | Annual Price | Features |
|------|--------------|--------------|----------|
| **Shield** (Entry) | $12/mo | $9/mo ($108/yr) | DarkWatch (basic), SpamShield, RemoveBrokers (50 brokers) |
| **Guard** (Core) | $22/mo | $18/mo ($216/yr) | All Shield + DarkWatch (full), RemoveBrokers (200+), HomeTitle (1 property) |
| **Fortress** (Premium) | $35/mo | $29/mo ($348/yr) | All Guard + HomeTitle (3 properties), VoicePrint, priority alerts, family (2 adults) |
| **Family Fortress** | $45/mo | $39/mo ($468/yr) | All Fortress + 5 adults + unlimited children |
### Competitive Positioning
| Your Plan | vs. Aura | vs. DeleteMe | vs. LifeLock |
|-----------|----------|-------------|--------------|
| Shield ($12) | Matches Aura Individual | Cheaper than DeleteMe ($11.58) | Cheaper than LifeLock Select |
| Guard ($22) | Below Aura Family | N/A (DeleteMe is removal-only) | Below LifeLock Advantage |
| Fortress ($35) | Below Aura Family | N/A | Below LifeLock Ultimate |
| Family ($45) | Above Aura Family ($37) | Above DeleteMe Family ($27.42) | Above LifeLock Family |
### Expected Unit Economics
| Metric | Estimate | Basis |
|--------|----------|-------|
| **ARPU (blended)** | $18-$25/mo | Mix of tiers, family plans raise ARPU |
| **Gross margin** | 65-75% | API costs, infrastructure, support |
| **CAC (organic)** | $50-$150 | Content marketing, word-of-mouth |
| **CAC (paid)** | $200-$400 | Google Ads, affiliate |
| **Monthly churn (individual)** | 3-5% | Industry benchmark |
| **Monthly churn (family)** | 1-2% | Higher switching costs |
| **LTV (individual)** | $600-$1,200 | 24-mo avg life, $20 ARPU |
| **LTV (family)** | $1,600-$2,400 | 48-mo avg life, $45 ARPU |
| **LTV:CAC (organic)** | 4-8x | Healthy |
| **LTV:CAC (paid)** | 2-4x | Marginal |
---
## 5. What Customers Actually Get (When Working)
### Monthly Value Perception
| Service | Customer Perceives | Actual Value |
|---------|-------------------|--------------|
| **VoicePrint** | "They detected a scam call cloning my daughter's voice" | Highest emotional impact, brand-defining |
| **DarkWatch** | "They found my email in a breach I didn't know about" | Table-stakes, expected by all competitors |
| **SpamShield** | "They blocked 47 spam calls this month" | Daily utility, high engagement |
| **HomeTitle** | "They caught a fraudulent lien on my house" | Highest dollar impact ($10K-$100K+ saved) |
| **RemoveBrokers** | "They removed me from 127 people-search sites" | Tangible progress, visible results |
### Customer Loyalty Drivers
1. **Alert quality (not quantity):** One perfect alert > 20 noise alerts. Your correlation engine should reduce false positives.
2. **Family plan lock-in:** Once a family is enrolled, switching costs are high.
3. **Visible progress:** RemoveBrokers dashboard showing "127/300 removed" drives retention.
4. **Crisis response:** When a major breach hits (e.g., Change Healthcare 2024), proactive alerts create loyalty spikes.
5. **Mobile app quality:** Credit lock/unlock, real-time alerts, one-tap actions.
---
## 6. Infrastructure Costs at Scale
### Monthly Fixed Costs
| Component | 100 Users | 1,000 Users | 10,000 Users |
|-----------|-----------|-------------|--------------|
| **Turso (SQLite)** | $0-$25 | $25-$100 | $100-$500 |
| **Redis** | $0-$15 | $15-$50 | $50-$200 |
| **HIBP API** | $0 (free tier) | $3.50 | $50+ |
| **SecurityTrails** | $49 | $49 | $249 |
| **Censys** | $79 | $79 | $299 |
| **Shodan** | $299 | $299 | $599 |
| **Twilio (SpamShield)** | $5-$20 | $20-$100 | $100-$500 |
| **Attom (HomeTitle)** | $500 | $1,000 | $5,000 |
| **Azure Voice Live** | $0 (dev) | $100-$500 | $500-$5,000 |
| **Proxies (RemoveBrokers)** | $100 | $500 | $2,000 |
| **CAPTCHA solving** | $10 | $50 | $200 |
| **Compute (SolidStart)** | $50 | $200 | $1,000 |
| **Total Fixed** | ~$1,200 | ~$2,500 | ~$16,000 |
### Per-User Variable Costs
| Service | Cost/User/Month | Notes |
|---------|-----------------|-------|
| DarkWatch | $0.50-$2.00 | Amortized API costs |
| SpamShield | $1.00-$5.00 | Twilio lookups, ML inference |
| HomeTitle | $2.00-$10.00 | Attom record lookups |
| RemoveBrokers | $1.00-$4.00 | Proxy + CAPTCHA + compute |
| VoicePrint | $0.50-$3.00 | Azure API or GPU inference |
| **Total** | **$5.00-$24.00** | Depends on usage |
At $18/mo average ARPU and $10/mo variable cost, **gross margin is ~44%** at early scale. Improves to **65-75%** as API costs amortize and you negotiate volume pricing.
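
The margin arithmetic above can be checked with a few lines of code. The sketch below uses only the figures quoted in this section; the function names are illustrative, and fixed costs are ignored, as they are in the sentence above.

```typescript
// Gross margin as a fraction of revenue: (ARPU - variable cost) / ARPU.
function grossMargin(arpu: number, variableCost: number): number {
  if (arpu <= 0) throw new RangeError("ARPU must be positive");
  return (arpu - variableCost) / arpu;
}

// Largest per-user variable cost that still achieves a target margin.
function maxVariableCost(arpu: number, targetMargin: number): number {
  return arpu * (1 - targetMargin);
}

// Early scale: $18 ARPU with $10 variable cost gives 8 / 18, about 0.444.
const earlyMargin = grossMargin(18, 10);

// Per-user cost ceilings implied by the 65% and 75% margin targets at $18 ARPU.
const ceilingAt65 = maxVariableCost(18, 0.65); // about $6.30
const ceilingAt75 = maxVariableCost(18, 0.75); // about $4.50
```

Read this way, the 65-75% target corresponds to bringing per-user variable cost down to roughly $4.50-$6.30 per month at an $18 ARPU.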
---
## 7. Risks & Mitigations
| Risk | Severity | Mitigation |
|------|----------|-----------|
| **VoicePrint never reaches production accuracy** | High | Ship API-first (Azure Voice Live), defer in-house model |
| **County data sourcing blocked** | High | Start with top 100 counties, use Attom API, expand gradually |
| **Broker scripts break constantly** | Medium | Budget 20% engineering time for maintenance, use AI-assisted scraping |
| **Competitor price war (Aura at $12/mo)** | Medium | Differentiate on VoicePrint + HomeTitle (unique features) |
| **API cost overruns** | Medium | Implement rate limits per tier, cache aggressively, negotiate volume pricing |
| **Regulatory compliance (FCRA, GLBA)** | High | Legal review before launch, SOC 2 Type II certification |
| **False positive alerts destroy trust** | High | Human review queue for low-confidence alerts, user feedback loop |
---
## 8. Timeline to Revenue
### Phase 1: Foundation (Months 1-2)
- ✅ Billing integration (Stripe Checkout + webhooks)
- ✅ RemoveBrokers: Implement removal for top 20 brokers
- ✅ DarkWatch: Connect HIBP + SecurityTrails APIs
- **Revenue:** None (beta testers only)
### Phase 2: MVP Launch (Months 3-4)
- ✅ RemoveBrokers: 50+ brokers with automated removal
- ✅ DarkWatch: Full scan pipeline with HIBP, SecurityTrails, Censys
- ✅ SpamShield: Reputation API integration (Twilio Lookup + Hiya)
- ✅ Billing: Free trial + paid plans
- **Revenue:** $12/mo Shield plan, target 100 beta users
### Phase 3: Growth (Months 5-8)
- ✅ RemoveBrokers: 100+ brokers
- ✅ DarkWatch: Add Shodan, Breachsense
- ✅ SpamShield: ML text classification (fine-tuned DistilBERT)
- ✅ HomeTitle: Top 50 counties + Attom API
- **Revenue:** All tiers, target 1,000 users
### Phase 4: Differentiation (Months 9-12)
- ✅ VoicePrint: Azure Voice Live API integration
- ✅ HomeTitle: 200+ counties
- ✅ Correlation engine: Cross-service threat scoring
- ✅ Mobile: Real-time call screening (iOS CallKit, Android Telecom)
- **Revenue:** Premium tiers, target 5,000 users
---
## 9. Bottom Line
**What you have:** A well-architected platform skeleton with auth, database, API layer, dashboard UI, mobile apps, and queueing infrastructure.
**What you need:** The actual data integrations and ML models that make the services useful. Currently, every core service returns mock data or stub responses.
**Fastest path to revenue (5-8 months):** RemoveBrokers + DarkWatch + SpamShield + Billing. These three services are achievable with API integrations and automation, with no custom ML training required.
**Total investment to MVP revenue:** ~$65K-$140K (engineering + API costs for 5-8 months).
**Expected pricing:** $12-$45/mo depending on tier. Industry benchmark ARPU: $18-$25/mo.
**Expected LTV:** $600-$2,400 depending on plan tier (individual vs. family).
**Key differentiator from competitors:** VoicePrint (voice clone detection) + HomeTitle (property monitoring). These are unique in the consumer market. But they're also the hardest to build.
**Strategic recommendation:** Ship RemoveBrokers + DarkWatch first (fastest ROI, proven demand), then layer in SpamShield + HomeTitle for differentiation, then VoicePrint as the crown jewel that justifies premium pricing.


@@ -0,0 +1,57 @@
# 01. Stripe Checkout, Webhooks, and Subscription State Management
meta:
id: core-services-01
feature: core-services-implementation
priority: P0
depends_on: []
tags: [billing, stripe, payments, foundation]
objective:
- Enable paid customer acquisition by implementing complete Stripe payment lifecycle — checkout, webhook handling, subscription state machine, and customer portal.
deliverables:
- Stripe Checkout session creation for each plan tier (Shield, Guard, Fortress, Family Fortress)
- Webhook endpoint handling all critical Stripe events
- Subscription state machine in Drizzle ORM
- Customer portal (billing settings, plan change, cancellation)
- Trial period support (14-day free trial)
steps:
1. Add `STRIPE_WEBHOOK_SECRET` to `.env.example` and validate in `env.ts`
2. Implement `createCheckoutSession(planId, customerId?, trial?)` in `billing.service.ts`
3. Implement `POST /api/webhooks/stripe` route handler with signature verification
4. Handle events: `checkout.session.completed`, `invoice.payment_succeeded`, `invoice.payment_failed`, `customer.subscription.updated`, `customer.subscription.deleted`
5. Update subscription record in database on each event (status, tier, period end, payment method)
6. Implement `createCustomerPortalSession(customerId)` for subscription management
7. Add trial logic: create subscription with `trial_end`, handle trial-to-paid transition
8. Add proration logic for tier upgrades/downgrades using `proration_behavior: 'create_prorations'`
9. Update billing router tRPC procedures: `getCheckoutUrl`, `getPortalUrl`, `getSubscription`, `cancelSubscription`
10. Add rate limiting on checkout creation (prevent abuse)
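
A possible shape for the parameters that `createCheckoutSession` would pass to Stripe is sketched below. The field names follow Stripe's Checkout Session parameters as commonly documented and should be verified against the pinned API version. The price IDs, URL paths, and the `buildCheckoutParams` helper are placeholders invented for this example, and the sketch uses `subscription_data.trial_period_days` rather than the `trial_end` approach mentioned in step 7.

```typescript
type PlanId = "shield" | "guard" | "fortress" | "family_fortress";

// Placeholder Price IDs: real values come from the Stripe dashboard.
const PRICE_IDS: Record<PlanId, string> = {
  shield: "price_shield_placeholder",
  guard: "price_guard_placeholder",
  fortress: "price_fortress_placeholder",
  family_fortress: "price_family_fortress_placeholder",
};

interface CheckoutParams {
  mode: "subscription";
  line_items: { price: string; quantity: number }[];
  success_url: string;
  cancel_url: string;
  customer?: string;
  subscription_data?: { trial_period_days: number };
}

const TRIAL_DAYS = 14; // matches the 14-day trial in the deliverables

// Pure builder: no network call, so it can be unit-tested without Stripe.
function buildCheckoutParams(
  planId: PlanId,
  baseUrl: string,
  customerId?: string,
  trial = false,
): CheckoutParams {
  return {
    mode: "subscription",
    line_items: [{ price: PRICE_IDS[planId], quantity: 1 }],
    // {CHECKOUT_SESSION_ID} is a template placeholder that Stripe fills in on redirect.
    success_url: `${baseUrl}/billing/success?session_id={CHECKOUT_SESSION_ID}`,
    cancel_url: `${baseUrl}/billing/cancel`,
    ...(customerId ? { customer: customerId } : {}),
    ...(trial ? { subscription_data: { trial_period_days: TRIAL_DAYS } } : {}),
  };
}
```

The result would be handed to `stripe.checkout.sessions.create(...)` inside `billing.service.ts`.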
tests:
- Unit: Mock Stripe API responses, verify database state transitions for each webhook event
- Integration: Create real Stripe test-mode checkout session, complete payment, verify subscription activation
- E2E: End-to-end checkout flow from dashboard → Stripe Checkout → webhook → active subscription
acceptance_criteria:
- [ ] Customer can click "Subscribe" on Shield plan and be redirected to Stripe Checkout
- [ ] After successful payment, webhook creates active subscription record in database
- [ ] Customer can access billing portal to view invoices, change plan, or cancel
- [ ] Trial subscription auto-converts to paid or suspends after trial ends
- [ ] Tier upgrade creates prorated invoice and updates subscription immediately
- [ ] `invoice.payment_failed` sets grace period status and sends retry email
- [ ] All webhook events are idempotent (duplicate events don't create duplicate records)
- [ ] Webhook handler returns 200 for handled events, 400 for invalid signatures
validation:
- Run `stripe trigger checkout.session.completed` in Stripe CLI, verify database record
- Run `stripe trigger invoice.payment_failed`, verify grace period status
- Create test checkout, pay with `4242 4242 4242 4242`, verify active subscription in dashboard
- Run test suite: `vitest run billing.test.ts`
notes:
- Stripe API version: `2026-04-22.dahlia` (already configured in `stripe.ts`)
- Webhook endpoint must be publicly accessible for Stripe to deliver — use ngrok for local dev
- Store `stripeCustomerId` and `stripeSubscriptionId` on user/subscription records
- Use `stripe-webhook` event type in database for audit trail
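
The idempotency criterion above can be met by recording each Stripe event ID and skipping repeats. The sketch below uses an in-memory `Set` as a stand-in; in the real handler the same check would more plausibly be a unique constraint on an event-ID column. All names here are hypothetical.

```typescript
interface WebhookEvent {
  id: string; // Stripe event ID, e.g. "evt_..."
  type: string;
}

type HandleResult = "processed" | "duplicate";

class IdempotentProcessor {
  // In-memory stand-in for a database table keyed by event ID.
  private readonly seen = new Set<string>();

  handle(event: WebhookEvent, apply: (e: WebhookEvent) => void): HandleResult {
    if (this.seen.has(event.id)) return "duplicate";
    apply(event);
    // Record the ID only after apply() succeeds, so a failed event
    // is still processed when Stripe redelivers it.
    this.seen.add(event.id);
    return "processed";
  }
}
```

A duplicate should still be answered with HTTP 200, since a redelivered event that was already applied is not an error.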


@@ -0,0 +1,61 @@
# 02. Automated Removal Engine for Top 20 Data Brokers
meta:
id: core-services-02
feature: core-services-implementation
priority: P0
depends_on: [core-services-01]
tags: [removebrokers, automation, playwright, scraping, revenue]
objective:
- Replace the `submitAutomatedRemoval()` stub that returns `crypto.randomUUID()` with a real Playwright-based browser automation that submits opt-out requests to the top 20 data brokers.
deliverables:
- Playwright-based removal engine in `removebrokers/removal.engine.ts`
- Per-broker adapter modules for top 20 brokers (Spokeo, Whitepages, MyLife, BeenVerified, etc.)
- CAPTCHA detection and graceful failure (manual fallback flow)
- Removal request status tracking with actual polling
- Email notification service integration for opt-out confirmations
steps:
1. Install Playwright: `npm install -D playwright @playwright/test`
2. Analyze opt-out flows for top 20 brokers from existing registry data
3. Create `removebrokers/adapters/` directory with one module per broker
4. Implement base adapter interface: `scanForProfile`, `submitOptOut`, `verifyRemoval`, `getStatus`
5. Implement adapters for each top 20 broker with navigation, form filling, and submission logic
6. Add proxy rotation support (BrightData or similar) to avoid IP blocking
7. Add stealth mode (playwright-stealth) to reduce detection
8. Implement `submitAutomatedRemoval()` to select correct adapter by broker ID and execute
9. Store actual request IDs from brokers (not generated UUIDs) in database
10. Implement `trackRemovalStatus()` with periodic re-scans for submitted requests
11. Integrate with notification service to email user when removal is confirmed
12. Add job handler for batch removal processing queue
13. Handle failures gracefully: retry with backoff, escalate to manual queue after 3 failures
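
Steps 4 and 8 describe a base adapter interface and adapter selection by broker ID. A sketch of what that could look like follows. The four method names come from step 4; every parameter, return type, and the `AdapterRegistry` class are assumptions made for illustration.

```typescript
type RemovalStatus = "pending" | "submitted" | "completed" | "failed";

// Hypothetical profile shape; the real one likely carries more fields.
interface Profile {
  fullName: string;
  email: string;
}

interface BrokerAdapter {
  brokerId: string;
  // Resolves to the listing URL if the profile is found, otherwise null.
  scanForProfile(profile: Profile): Promise<string | null>;
  // Resolves to the request or confirmation ID issued by the broker.
  submitOptOut(listingUrl: string, profile: Profile): Promise<string>;
  // Resolves to true once the listing is no longer visible.
  verifyRemoval(listingUrl: string): Promise<boolean>;
  getStatus(requestId: string): Promise<RemovalStatus>;
}

class AdapterRegistry {
  private readonly adapters = new Map<string, BrokerAdapter>();

  register(adapter: BrokerAdapter): void {
    this.adapters.set(adapter.brokerId, adapter);
  }

  // Throws instead of returning undefined so a missing adapter is never
  // mistaken for a successful submission.
  get(brokerId: string): BrokerAdapter {
    const adapter = this.adapters.get(brokerId);
    if (!adapter) throw new Error(`no adapter registered for broker "${brokerId}"`);
    return adapter;
  }
}
```

With this shape, `submitAutomatedRemoval()` would reduce to a registry lookup followed by `scanForProfile` and `submitOptOut`, with Playwright details confined to each adapter module.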
tests:
- Unit: Mock Playwright browser, verify adapter navigation sequences
- Integration: Run adapter against real broker site in headful mode, verify opt-out form submission
- E2E: Full flow — add broker to watchlist → trigger removal → verify status progression
acceptance_criteria:
- [ ] Top 20 broker adapters are implemented and tested against live sites
- [ ] `submitAutomatedRemoval()` no longer returns mock UUIDs — it submits real opt-out requests
- [ ] Removal status tracks actual broker state (pending → submitted → completed/failed)
- [ ] Failed removals are retried 3 times with exponential backoff, then escalated to manual queue
- [ ] CAPTCHA challenges are detected and flagged for manual processing (not silently failing)
- [ ] Job queue processes removals asynchronously without blocking API responses
- [ ] User dashboard shows real removal progress per broker
- [ ] All Playwright browsers are properly closed after each session (no resource leaks)
validation:
- Run `vitest run removebrokers.service.test.ts` — all tests pass
- Manual test: Trigger removal for Spokeo, verify opt-out email received
- Check database: `removal_requests` table has real request IDs and actual status values
- Run removal job: `bun run job:removebrokers` processes queue without errors
notes:
- Broker sites change frequently; expect 15-25% of adapters to break per quarter
- Some brokers require email verification sent to the listed email (often outdated) — flag these
- Start with brokers that have simple form-based opt-outs; defer email/physical mail brokers to Phase 3
- The existing broker registry in `broker.registry.ts` already has removal URLs — use these as starting points
- Budget $1K-$3K/mo for proxy infrastructure at scale
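
The retry-then-escalate behavior in step 13 can be sketched as a pure decision function. The base delay, the cap, and the function names below are assumptions. The sketch also reads "after 3 failures" literally, so the third failure escalates; if the intent is three retries after the initial attempt, the threshold would be 4.

```typescript
type RetryDecision =
  | { kind: "retry"; delayMs: number }
  | { kind: "escalate" };

// Exponential backoff: base, 2x base, 4x base, ... capped at capMs.
function backoffDelayMs(failureCount: number, baseMs = 1000, capMs = 60000): number {
  return Math.min(capMs, baseMs * Math.pow(2, failureCount - 1));
}

// failureCount is the number of failed attempts so far, including the latest.
function nextAction(failureCount: number, maxFailures = 3): RetryDecision {
  if (failureCount >= maxFailures) return { kind: "escalate" };
  return { kind: "retry", delayMs: backoffDelayMs(failureCount) };
}
```

The job handler would call `nextAction` after each failed attempt and either re-enqueue the removal with the returned delay or move it to the manual queue.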


@@ -0,0 +1,63 @@
# 03. HaveIBeenPwned API Integration for Email Breach Monitoring
meta:
id: core-services-03
feature: core-services-implementation
priority: P0
depends_on: [core-services-01]
tags: [darkwatch, hibp, breach-monitoring, api-integration, table-stakes]
objective:
- Replace the stub `scanHIBP()` function in the DarkWatch scan engine with a real HaveIBeenPwned API integration that checks user emails against known breach databases and creates exposure records.
deliverables:
- HIBP API client with k-anonymity support for password checking
- Email breach lookup with result parsing and normalization
- Exposure record creation in database with proper severity scoring
- Alert generation via existing alert pipeline
- Circuit breaker integration (already exists in scan engine)
steps:
1. Sign up for HIBP API key at https://haveibeenpwned.com/API/Key (free tier: 1,500 req/mo)
2. Add `HIBP_API_KEY` to `.env.example` and validate in `env.ts`
3. Create `darkwatch/hibp.client.ts` with functions:
- `checkEmail(email): BreachResult[]` — query breachedaccount endpoint
- `checkPassword(passwordHash): PwnedPasswordResult` — query pwnedpasswords endpoint using k-anonymity
- `getBreaches(): Breach[]` — fetch breach metadata for caching
4. Parse HIBP response: breach name, date, compromised data types, affected accounts
5. Map data types to internal schema: email, password, phone, address, ssn, domain
6. Calculate severity: critical if SSN/credit card, warning if email/phone, info if username only
7. Deduplicate against existing exposures using `identifierHash` (already implemented)
8. Create exposure records via existing `processExposure()` pipeline
9. Cache breach metadata in Redis (update daily) to reduce API calls
10. Handle rate limits: 1 req/sec free tier, 10 req/sec paid — implement request queue
11. Add comprehensive error handling for 404 (no breach), 429 (rate limit), 503 (service unavailable)
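
The request queue in step 10 mainly needs to space calls so the configured rate is never exceeded. A minimal sketch of that spacing logic follows; `RequestSpacer` is a hypothetical name, the clock is passed in so the logic is testable, and the rates quoted in step 10 should be checked against HIBP's current documentation.

```typescript
class RequestSpacer {
  private nextFreeAt = 0;
  private readonly minIntervalMs: number;

  constructor(requestsPerSecond: number) {
    if (requestsPerSecond <= 0) throw new RangeError("rate must be positive");
    this.minIntervalMs = 1000 / requestsPerSecond;
  }

  // Reserves the next send slot and returns how long the caller should
  // wait (in ms) before issuing the request.
  reserve(now: number): number {
    const sendAt = Math.max(now, this.nextFreeAt);
    this.nextFreeAt = sendAt + this.minIntervalMs;
    return sendAt - now;
  }
}
```

A caller would wait for the returned delay before issuing the request; a 429 response can additionally push the next slot back by the server's `Retry-After` value.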
tests:
- Unit: Mock HIBP API responses, verify parsing and severity scoring
- Integration: Test with real HIBP API using test email `test@example.com` (no breaches expected)
- E2E: Add email to watchlist → trigger scan → verify exposure records created for breached email
acceptance_criteria:
- [ ] `scanHIBP(email)` makes real HTTP request to `https://haveibeenpwned.com/api/v3/breachedaccount/{email}`
- [ ] Breached emails create exposure records with correct breach metadata (name, date, data classes)
- [ ] Non-breached emails return empty results without creating false exposure records
- [ ] Rate limits are respected (1 req/sec free tier, configurable for paid)
- [ ] 404 responses are handled gracefully (no breach = no exposure, not an error)
- [ ] Circuit breaker opens after 3 consecutive failures and stays open for 60 seconds
- [ ] Exposure deduplication prevents duplicate records for same email + breach combination
- [ ] Alerts are generated for critical exposures (SSN, password) via existing pipeline
- [ ] HIBP breach metadata is cached in Redis and refreshed daily
validation:
- Run `vitest run darkwatch.test.ts` — all tests pass
- Manual: Add known breached email to watchlist, trigger scan, verify alert received
- Check Redis: `GET hibp:breaches` returns cached breach metadata
- Monitor logs: No `"not yet implemented"` or `console.log("[darkwatch] stub")` messages
notes:
- HIBP free tier is 1,500 requests/month — enough for development, need paid tier ($3.50/mo) for production
- The k-anonymity password check sends only first 5 chars of SHA-1 hash — already privacy-safe
- The existing `scan.engine.ts` has the circuit breaker infrastructure — wire HIBP client into it
- HIBP does NOT crawl dark web — it only aggregates known public breaches. For live dark web monitoring, add Breachsense later (Phase 3)
- Consider subscribing to HIBP domain monitoring for enterprise upsell later
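The k-anonymity note above can be illustrated with a short sketch. It assumes the range response is plain text with one `SUFFIX:COUNT` pair per line; the function names are hypothetical and no network call is made here.

```typescript
import { createHash } from "node:crypto";

// Split the uppercase SHA-1 hex digest into the 5-char prefix that is sent
// to the range endpoint and the 35-char suffix that never leaves the server.
function sha1PrefixSuffix(password: string): { prefix: string; suffix: string } {
  const digest = createHash("sha1").update(password, "utf8").digest("hex").toUpperCase();
  return { prefix: digest.slice(0, 5), suffix: digest.slice(5) };
}

// Scan the range response body for our suffix and return its breach count (0 if absent).
function countFromRangeBody(body: string, suffix: string): number {
  for (const line of body.split(/\r?\n/)) {
    const [candidate, count] = line.trim().split(":");
    if (candidate === suffix) return Number.parseInt(count ?? "0", 10);
  }
  return 0;
}
```

Only `prefix` would be sent over the wire; the comparison against `suffix` happens locally.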

@@ -0,0 +1,75 @@
# 04. SecurityTrails, Censys, and Shodan API Integrations
meta:
id: core-services-04
feature: core-services-implementation
priority: P1
depends_on: [core-services-03]
tags: [darkwatch, securitytrails, censys, shodan, attack-surface, api-integration]
objective:
- Integrate SecurityTrails, Censys, and Shodan APIs into the DarkWatch scan engine to monitor domain/IP attack surface exposure, complementing HIBP's breach monitoring.
deliverables:
- SecurityTrails client for DNS/WHOIS monitoring and subdomain enumeration
- Censys client for internet-wide host scanning and certificate transparency
- Shodan client for IoT/device exposure and Tor exit node monitoring
- Unified exposure normalization from all three sources
- Cost-aware scanning (respect rate limits, cache aggressively)
steps:
1. Sign up for API keys:
- SecurityTrails: https://securitytrails.com (free: 50 req/mo, Pro: $49/mo)
- Censys: https://censys.io (free: 250 req/mo, Pro: $79/mo)
- Shodan: https://shodan.io (free: 1,250 results/mo, Small Biz: $299/mo)
2. Add `SECURITYTRAILS_API_KEY`, `CENSYS_API_ID`, `CENSYS_API_SECRET`, `SHODAN_API_KEY` to `.env.example`
3. Create `darkwatch/securitytrails.client.ts`:
- `getDomainInfo(domain)` — WHOIS, DNS records, subdomains
- `getSubdomains(domain)` — enumerate all subdomains
- `getHistory(domain)` — historical DNS changes
4. Create `darkwatch/censys.client.ts`:
- `searchHosts(query)` — find exposed hosts by IP/domain
- `getCertificates(domain)` — certificate transparency logs
- `viewHost(ip)` — detailed host fingerprinting
5. Create `darkwatch/shodan.client.ts`:
- `search(query)` — search exposed devices and services
- `host(ip)` — detailed host information
- `count(query)` — result counts for monitoring
6. Implement unified `processScanResult(source, result)` that normalizes all API responses to internal exposure schema
7. Map exposure types:
- SecurityTrails: subdomain exposure, DNS misconfiguration, domain hijacking risk
- Censys: exposed services, outdated TLS, certificate issues
- Shodan: open ports, default credentials, IoT exposure, Tor association
8. Add tier-aware scan limits: Shield = HIBP only, Guard+ = all sources
9. Implement intelligent caching: cache SecurityTrails DNS data for 24h, Censys/Shodan for 7d
10. Add cost-per-scan tracking in database for billing/usage analytics
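Steps 8 and 9 reduce to a lookup table and a tier check. The sketch below is illustrative only: the tier names are taken from this plan, while `sourcesForTier`, `cacheKey`, and the key format are hypothetical. The HIBP TTL is an assumption carried over from the daily refresh in task 03.

```typescript
type Source = "hibp" | "securitytrails" | "censys" | "shodan";
type Tier = "shield" | "guard" | "fortress" | "family";

const HOUR = 60 * 60;

// TTLs from step 9: DNS data for 24h, host-scan data for 7d.
const CACHE_TTL_SECONDS: Record<Source, number> = {
  hibp: 24 * HOUR, // assumption: mirrors the daily refresh in task 03
  securitytrails: 24 * HOUR,
  censys: 7 * 24 * HOUR,
  shodan: 7 * 24 * HOUR,
};

// Step 8: Shield gets HIBP only, Guard and above get every source.
function sourcesForTier(tier: Tier): Source[] {
  return tier === "shield" ? ["hibp"] : ["hibp", "securitytrails", "censys", "shodan"];
}

// Hypothetical key format; normalizing the target avoids duplicate cache entries.
function cacheKey(source: Source, target: string): string {
  return `darkwatch:${source}:${target.trim().toLowerCase()}`;
}
```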
tests:
- Unit: Mock all three API responses, verify normalization and exposure creation
- Integration: Test each client against real APIs using low-risk test queries
- E2E: Add domain to watchlist → trigger scan → verify exposures from all three sources
acceptance_criteria:
- [ ] SecurityTrails client queries real API and returns parsed domain/subdomain data
- [ ] Censys client queries real API and returns host/certificate information
- [ ] Shodan client queries real API and returns device/service exposure data
- [ ] Each client respects rate limits (SecurityTrails: 10 req/sec, Censys: 200 req/min, Shodan: 5 req/sec)
- [ ] Circuit breakers open after 3 failures and reset after 60 seconds for each source
- [ ] Exposure records are normalized regardless of source (consistent schema)
- [ ] Alerts are generated for critical findings (open admin panels, exposed databases, certificate expiry)
- [ ] Cache hit reduces API calls — verify Redis stores and returns cached data
- [ ] Cost tracking records API usage per scan for later billing optimization
- [ ] Free tier users only get HIBP; paid tiers unlock SecurityTrails, Censys, Shodan
validation:
- Run `vitest run darkwatch.test.ts` — all tests pass
- Manual: Query `example.com` across all three APIs, verify meaningful results returned
- Check Redis: Cached responses reduce subsequent API calls
- Monitor cost: API call counts tracked in database
notes:
- SecurityTrails is most useful for domain monitoring; Censys/Shodan for IP/host exposure
- Shodan's dark web relevance is limited — it sees Tor exit nodes, not .onion content. Consider DarkOwl ($40K+/yr) for deep dark web later
- The free tiers are sufficient for development but production needs paid plans ($500-$1,000/mo combined)
- Focus on actionable findings: exposed RDP, default credentials, certificate expiry — not just raw port scans
- The existing scan engine in `darkwatch.service.ts` already routes by watchlist item type — wire in new clients there
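The circuit-breaker criterion (open after 3 failures, reset after 60 seconds) can be pictured with the sketch below. It is not the existing breaker in the scan engine, whose shape is not shown in this plan; the class and method names are hypothetical, and the clock is passed in so the behavior is testable.

```typescript
class CircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly openMs: number;

  constructor(failureThreshold = 3, openMs = 60_000) {
    this.failureThreshold = failureThreshold;
    this.openMs = openMs;
  }

  // True when a call may go out: the circuit is closed or the open window has elapsed.
  canRequest(now: number): boolean {
    if (this.openedAt === null) return true;
    return now - this.openedAt >= this.openMs;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAt = null;
  }

  // Opens (or re-opens) the circuit once the consecutive failure count reaches the threshold.
  recordFailure(now: number): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.failureThreshold) this.openedAt = now;
  }
}
```

One instance per source keeps a failing Shodan integration from blocking SecurityTrails or Censys scans.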

@@ -0,0 +1,72 @@
# 05. Periodic Scan Scheduling, WebSocket Progress, and Alert Deduplication
meta:
id: core-services-05
feature: core-services-implementation
priority: P1
depends_on: [core-services-03, core-services-04]
tags: [darkwatch, scheduler, websocket, real-time, deduplication, alerts]
objective:
- Make DarkWatch continuously useful by scheduling periodic scans, providing real-time progress via WebSocket, and eliminating alert fatigue through intelligent deduplication.
deliverables:
- Cron-based scan scheduler with configurable frequency per tier
- WebSocket real-time scan progress updates (already have `websocket.ts`)
- Alert cooldown periods to prevent duplicate notifications
- Digest mode: batch low-priority alerts into daily/weekly summaries
- Scan history and metrics dashboard data
steps:
1. Implement cron job scheduler in `jobs/handlers/darkwatch.scan.ts`:
- Daily scans for active subscriptions
- Respects tier limits (Shield = HIBP only daily, Guard+ = full suite weekly)
2. Add `scanFrequency` field to subscription schema (daily, weekly, monthly)
3. Wire WebSocket push from existing `websocket.ts` into scan engine:
- Emit `scan:started`, `scan:progress` (completedSources/totalSources), `scan:completed` events
- Client dashboard subscribes to user-specific scan events
4. Enhance alert deduplication beyond existing exposure dedup:
- Add `alertCooldownHours` per alert type (e.g., 24h for same breach, 72h for property changes)
- Track `lastAlertSentAt` per (userId, alertType, source) tuple
- Don't create new alerts during cooldown unless severity increases
5. Implement digest mode:
- Low-priority alerts (info) batched into daily digest email
- Warning/critical alerts sent immediately via push + email
- User preference: immediate vs. digest per severity level
6. Add scan metrics:
- Store scan duration, sources checked, exposures found, alerts generated
- Aggregate for dashboard "threat score" calculation
7. Implement scan failure recovery:
- Partial scan results saved even if one source fails
- Failed sources retried individually in next scan window
8. Add rate limit per user: max 1 concurrent scan, queue subsequent requests
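The cooldown and digest rules in steps 4 and 5 can be sketched as follows. The names `shouldSendAlert` and `deliveryMode` are hypothetical, and the sketch ignores the per-user delivery preference mentioned in step 5.

```typescript
type Severity = "info" | "warning" | "critical";

const SEVERITY_RANK: Record<Severity, number> = { info: 0, warning: 1, critical: 2 };

interface LastAlert {
  sentAt: Date;
  severity: Severity;
}

function shouldSendAlert(
  last: LastAlert | undefined,
  incoming: Severity,
  cooldownHours: number,
  now: Date,
): boolean {
  if (!last) return true; // first alert for this (userId, alertType, source) tuple
  if (SEVERITY_RANK[incoming] > SEVERITY_RANK[last.severity]) return true; // escalation bypasses cooldown
  const elapsedHours = (now.getTime() - last.sentAt.getTime()) / 3_600_000;
  return elapsedHours >= cooldownHours;
}

// Info alerts go to the daily digest; warning and critical are sent immediately.
function deliveryMode(severity: Severity): "immediate" | "daily_digest" {
  return severity === "info" ? "daily_digest" : "immediate";
}
```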
tests:
- Unit: Verify cron expression parsing, cooldown logic, digest batching
- Integration: Trigger scheduled scan, verify WebSocket events emitted in correct order
- E2E: Start scan from dashboard → watch progress bar → receive completion notification
acceptance_criteria:
- [ ] Scans run automatically on schedule without manual trigger (cron job)
- [ ] WebSocket pushes real-time progress: `scan:progress` events with percentage complete
- [ ] Only one scan runs per user at a time; additional requests are queued
- [ ] Duplicate alerts are suppressed during cooldown period (configurable per type)
- [ ] Info-level alerts are batched into daily digest; warning/critical sent immediately
- [ ] Scan history is persisted and visible in dashboard (last scan date, sources checked, findings)
- [ ] Failed sources don't fail entire scan — partial results are saved
- [ ] Dashboard threat score updates automatically after each scan completion
- [ ] Free tier gets weekly scans; paid tiers get daily scans
- [ ] No duplicate notifications for same exposure across multiple scans
validation:
- Run cron job manually: `bun run job:darkwatch:scan`, verify scan completes and exposures created
- Connect to WebSocket: `wscat -c ws://localhost:3000/ws`, subscribe to scan events
- Check dashboard: Scan progress bar animates during active scan, threat score updates after
- Test cooldown: Trigger same scan twice rapidly, verify second scan doesn't create duplicate alerts
notes:
- The existing `scanStates` Map in `darkwatch.service.ts` is in-memory — move to Redis for multi-instance safety
- WebSocket infrastructure exists at `websocket.ts` — extend it for scan-specific events
- The scheduler directory (`scheduler/`) currently only has Dockerfiles — this task creates actual job logic
- Consider using Honker (Rust queue) for scan job distribution once it's production-ready
- Alert fatigue is a real churn driver — aggressive deduplication is a competitive advantage
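The `scan:progress` payload described in step 3 and the percentage in the acceptance criteria could be built like this. The event shape and `buildProgressEvent` are assumptions for illustration; the actual `websocket.ts` message format is not shown in this plan.

```typescript
interface ScanProgressEvent {
  type: "scan:progress";
  userId: string;
  completedSources: number;
  totalSources: number;
  percent: number;
}

function buildProgressEvent(userId: string, completed: number, total: number): ScanProgressEvent {
  const totalSources = Math.max(0, Math.floor(total));
  // Clamp so a late or duplicate source completion can never report more than 100%.
  const completedSources = Math.min(Math.max(0, Math.floor(completed)), totalSources);
  const percent = totalSources === 0 ? 0 : Math.round((completedSources / totalSources) * 100);
  return { type: "scan:progress", userId, completedSources, totalSources, percent };
}
```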

@@ -0,0 +1,70 @@
# 06. Twilio Lookup and Phone Reputation API Integration
meta:
id: core-services-06
feature: core-services-implementation
priority: P1
depends_on: [core-services-01]
tags: [spamshield, reputation, twilio, caller-id, api-integration, table-stakes]
objective:
- Replace the stub Hiya/Truecaller lookup functions that return `{ score: 0, isSpam: false }` with real phone reputation API integrations (Twilio Lookup) and integrate results into the spam classification pipeline.
deliverables:
- Twilio Lookup API client for caller name, line type, and carrier info
- Phone reputation scoring system with caching
- Integration with existing rule engine (reputation score augments rule-based decisions)
- STIR/SHAKEN attestation verification (if carrier partnership available)
- Rate-limited, cost-aware API usage
steps:
1. Sign up for Twilio account and enable Lookup API at https://www.twilio.com/lookup
2. Add `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN` to `.env.example`
3. Create `spamshield/twilio.client.ts`:
- `lookupPhone(phoneNumber, type?)` — caller name, line type (mobile/landline/VoIP), carrier
- `lookupReputation(phoneNumber)` — spam risk score, call volume, report counts
- `verifyStirShaken(phoneNumber)` — attestation level (A/B/C) if available
4. Replace stub `lookupHiya()` and `lookupTruecaller()` in `reputation.api.ts` with real Twilio calls
5. Implement reputation scoring algorithm:
- Twilio spam risk score (0-100) mapped to internal confidence (0.0-1.0)
- Line type weighting: VoIP = higher risk, landline = lower risk
- Carrier reputation: known spam carriers = +20 risk
- STIR/SHAKEN attestation: full attestation (A) = -30 risk, gateway attestation (C) or no attestation = +20 risk
6. Cache results in Redis with 24h TTL (phone numbers don't change reputation rapidly)
7. Wire into `spamshield.service.ts`:
- Before rule engine, check reputation
- If reputation confidence > 0.7, block immediately
- If reputation confidence 0.4-0.7, flag for review
- If reputation confidence < 0.4, proceed to rule engine + ML classifier
8. Add cost tracking: $0.004-$0.03 per lookup, track monthly usage per user
9. Implement fallback: if Twilio API fails, use internal rule engine only (graceful degradation)
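The scoring in step 5 and the routing in step 7 can be sketched as below. This is a hedged illustration: the input shape is hypothetical rather than a Twilio response type, the VoIP weight of +20 is taken from the acceptance criteria, and landline is left unadjusted because no weight is given for it.

```typescript
type LineType = "mobile" | "landline" | "voip" | "unknown";
type Attestation = "A" | "B" | "C" | "none";

interface ReputationInput {
  spamRiskScore: number; // 0-100 from the reputation provider
  lineType: LineType;
  knownSpamCarrier: boolean;
  attestation: Attestation;
}

// Applies the additive weights, clamps to 0-100, and maps to a 0.0-1.0 confidence.
function reputationConfidence(input: ReputationInput): number {
  let risk = input.spamRiskScore;
  if (input.lineType === "voip") risk += 20;
  if (input.knownSpamCarrier) risk += 20;
  if (input.attestation === "A") risk -= 30;
  if (input.attestation === "C" || input.attestation === "none") risk += 20;
  return Math.min(100, Math.max(0, risk)) / 100;
}

type Decision = "block" | "review" | "continue";

function decide(confidence: number): Decision {
  if (confidence > 0.7) return "block";
  if (confidence >= 0.4) return "review";
  return "continue"; // hand off to the rule engine and ML classifier
}
```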
tests:
- Unit: Mock Twilio API responses, verify reputation scoring algorithm
- Integration: Test with real Twilio Lookup API using known spam number
- E2E: Submit spam check for phone number → verify reputation lookup → get classification result
acceptance_criteria:
- [ ] `lookupPhone()` makes real HTTP request to Twilio Lookup API
- [ ] Reputation scores are calculated from real Twilio data (not hardcoded zeros)
- [ ] High-risk numbers (spam confidence > 0.7) trigger automatic block without rule/ML processing
- [ ] Cache stores reputation results for 24 hours, reducing API costs
- [ ] Twilio API failures gracefully fall back to rule engine (no crashes)
- [ ] Cost tracking records each lookup for billing analytics
- [ ] STIR/SHAKEN attestation is checked and factored into score when available
- [ ] VoIP lines get +20 risk weighting compared to landline
- [ ] Internal DB cache (`lookupInternalDB`) is checked before Twilio API call
- [ ] Rate limits: max 100 lookups/minute per user to prevent abuse
validation:
- Run `vitest run spamshield.service.test.ts` — all tests pass
- Manual: Check reputation for known spam number (e.g., reported robocall number), verify high score
- Check cache: Redis `GET spamshield:reputation:+15551234567` returns cached result
- Monitor cost: Database shows lookup usage per user per month
notes:
- Twilio Lookup costs $0.004 per basic lookup, $0.03 per advanced lookup (reputation, caller name)
- At 100 lookups/user/month, cost is $0.40-$3.00 per user, which is manageable at $12+/mo ARPU
- Hiya and Truecaller have proprietary APIs but require carrier partnerships — Twilio is the best consumer-accessible option
- STIR/SHAKEN requires telecom partner for full attestation data — implement if/when partnership exists
- The existing rule engine (`ruleEngine()`) is functional — reputation augments it, doesn't replace it
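The per-user cost estimate in the notes above is simple arithmetic, worked here with the per-lookup prices quoted in this task. The helper names are hypothetical, and the prices should be confirmed against current Twilio pricing.

```typescript
// Monthly lookup spend for one user, rounded to whole cents.
function monthlyLookupCost(lookups: number, pricePerLookup: number): number {
  return Math.round(lookups * pricePerLookup * 100) / 100;
}

// Fraction of monthly revenue per user consumed by lookup costs.
function shareOfArpu(cost: number, arpu: number): number {
  return cost / arpu;
}

const basic = monthlyLookupCost(100, 0.004); // all basic lookups
const advanced = monthlyLookupCost(100, 0.03); // all advanced lookups
const worstCaseShare = shareOfArpu(advanced, 12); // against $12/mo ARPU
```

If every lookup is an advanced one, lookups alone take a quarter of a $12 subscription, which is why the 24h cache and the internal DB check come before any paid call.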

@@ -0,0 +1,84 @@
# 07. Fine-Tuned DistilBERT SMS Spam Classifier with ONNX Deployment
meta:
id: core-services-07
feature: core-services-implementation
priority: P1
depends_on: [core-services-06]
tags: [spamshield, ml, nlp, distilbert, onnx, text-classification]
objective:
- Replace the stub `classifyTextBERT()` function that returns `{ isSpam: false, confidence: 1.0 }` with a production ML pipeline: fine-tune DistilBERT on SMS spam data, export to ONNX for fast inference, and integrate into the spam classification flow.
deliverables:
- Training pipeline for fine-tuning DistilBERT on SMS spam dataset
- ONNX-exported model for low-latency CPU inference (~50ms per message)
- Inference server with batching and caching
- Integration with existing spam classification service
- Model versioning and A/B testing framework
steps:
1. Set up Python training environment:
- Install `transformers`, `datasets`, `onnxruntime`, `torch`, `optimum[onnxruntime]`
- Create `ml/spam-classifier/` directory in project root
2. Acquire training data:
- SMS Spam Collection Dataset (UCI ML Repository, 5,574 messages)
- Enron Spam Dataset (email corpus, filter to SMS-like short messages)
- Custom labeled data from user feedback (Phase 2)
3. Fine-tune DistilBERT-base-uncased:
- Binary classification: spam vs. ham
- 3 epochs, batch size 32, learning rate 2e-5
- Expected accuracy: 97-99% on SMS Spam Collection
4. Export to ONNX:
- Use Optimum CLI: `optimum-cli export onnx --model distilbert-spam ./onnx_model/`
- Quantize to INT8 for 2x speedup with minimal accuracy loss
- Target model size: ~65MB after INT8 quantization (the FP32 export of DistilBERT base is roughly 260MB)
5. Create Node.js ONNX inference wrapper:
- Install `onnxruntime-node`
- Load model once at startup, reuse session
- Preprocess: tokenize with DistilBERT tokenizer (max length 128)
- Postprocess: softmax over the two class logits (or sigmoid if the head has a single output) → probability → binary decision
- Target latency: <50ms per message on CPU, <10ms on GPU
6. Integrate into `spamshield.service.ts`:
- Replace `classifyTextBERT()` call with real ONNX inference
- Classification flow: reputation lookup → rule engine → ML classifier (ensemble)
- Threshold tuning: default 0.5, adjustable per user preference
7. Implement feedback loop:
- User can report false positive/negative
- Store feedback in `spamFeedback` table (already exists)
- Weekly retraining batch using accumulated feedback
8. Add model versioning:
- Store model artifact in S3-compatible storage
- A/B test new models on subset of traffic
- Rollback capability if accuracy degrades
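The postprocessing and threshold logic from steps 5 and 6 can be sketched without touching the ONNX runtime. This assumes a two-logit head ordered `[ham, spam]`; softmax over two logits is equivalent to a sigmoid on their difference. The strict and lenient thresholds are placeholder values, since only the 0.5 default is specified.

```typescript
// Numerically stable softmax over [hamLogit, spamLogit]; returns P(spam).
function spamProbability(logits: [number, number]): number {
  const [ham, spam] = logits;
  const max = Math.max(ham, spam);
  const expHam = Math.exp(ham - max);
  const expSpam = Math.exp(spam - max);
  return expSpam / (expHam + expSpam);
}

type Sensitivity = "strict" | "moderate" | "lenient";

// Only the 0.5 default comes from the plan; the other two values are assumptions.
const THRESHOLDS: Record<Sensitivity, number> = { strict: 0.3, moderate: 0.5, lenient: 0.7 };

function classify(
  logits: [number, number],
  sensitivity: Sensitivity = "moderate",
): { isSpam: boolean; spamProbability: number } {
  const p = spamProbability(logits);
  return { isSpam: p >= THRESHOLDS[sensitivity], spamProbability: p };
}
```

Subtracting the maximum logit before exponentiating avoids overflow on extreme model outputs.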
tests:
- Unit: Verify ONNX inference produces correct labels for known spam/ham test cases
- Integration: End-to-end classification flow with real model loading
- E2E: Submit SMS text → receive classification with confidence score
acceptance_criteria:
- [ ] `classifyTextBERT()` runs real ONNX inference (not returning hardcoded `{ isSpam: false }`)
- [ ] Model accuracy > 95% on held-out test set from SMS Spam Collection
- [ ] Inference latency < 50ms per message on CPU (measured in production)
- [ ] Model file is versioned and loadable from external storage (S3/local path)
- [ ] False positive rate < 2% (legitimate messages incorrectly flagged as spam)
- [ ] User feedback ("not spam" / "spam") is stored and used for model improvement
- [ ] Classification threshold is configurable per user (strict/moderate/lenient)
- [ ] ONNX model loads once at server startup, not per-request
- [ ] Graceful fallback to rule engine if ONNX runtime fails
- [ ] Model size < 100MB for reasonable cold-start time
validation:
- Run `vitest run spamshield.service.test.ts` — tests use real ONNX model
- Benchmark: `bun run benchmark:spamshield` — measure 1000 inferences, report p50/p95/p99 latency
- Manual: Classify known spam message "Congratulations! You've won $1000...", verify `isSpam: true, confidence > 0.9`
- Check feedback: Database `spamFeedback` table accumulates user corrections
notes:
- DistilBERT is chosen over BERT for 40% smaller size and 60% faster inference with minimal accuracy loss
- ONNX Runtime Node.js has limited platform support — test on your deployment target (Linux x64, macOS ARM)
- Training can happen in CI (GitHub Actions with GPU runner) or locally — inference happens in production
- Consider TensorFlow Lite or ONNX Runtime Web for on-device mobile inference later
- The SMS Spam Collection is small (5,574 messages) — augment with synthetic spam variants for robustness
- For European languages, consider multilingual model like `distilbert-base-multilingual-cased`

@@ -0,0 +1,79 @@
# 08. Expand Broker Coverage to 50+ with CAPTCHA Solving and Re-Scan Pipeline
meta:
id: core-services-08
feature: core-services-implementation
priority: P2
depends_on: [core-services-02]
tags: [removebrokers, automation, captcha, scaling, maintenance]
objective:
- Scale from top 20 brokers to 50+ automated removals, implement CAPTCHA solving, and build the re-scan pipeline that detects re-listings.
deliverables:
- 30+ additional broker adapters (total 50+)
- CAPTCHA solving integration (2Captcha or AntiCaptcha API)
- Re-scan scheduler that checks if removed profiles have reappeared
- Email verification handling for opt-out confirmation emails
- Removal success rate dashboard metric
steps:
1. Select next 30 brokers from registry by opt-out complexity (medium-difficulty form-based flows)
2. Create adapter modules for each broker in `removebrokers/adapters/`
3. Implement CAPTCHA solving:
- Detect reCAPTCHA v2/v3, hCaptcha, image challenges
- Integrate 2Captcha API ($0.001-$0.01 per solve)
- Add `CAPTCHA_SOLVER_API_KEY` to environment config
- Fallback to manual queue if CAPTCHA solving fails 3 times
4. Implement email verification handling:
- Monitor mailbox for opt-out confirmation emails
- Parse confirmation links and auto-click them
- Store confirmation status in database
5. Build re-scan pipeline:
- Weekly scheduled job that re-scans all "completed" removals
- If profile reappears, create new removal request automatically
- Track re-listing rate per broker (some re-list every 30 days)
6. Add success metrics:
- Track removal success rate per broker (% of opt-outs that stick)
- Dashboard widget showing "X of Y brokers removed"
- Alert user when re-listing detected
7. Implement proxy rotation pool:
- Use residential proxy service (BrightData, IPRoyal)
- Rotate IP per broker session to avoid blocks
- Budget $1K-$3K/mo for proxy infrastructure
8. Add adapter health monitoring:
- Track adapter breakage rate
- Alert engineering when >5% of adapters fail in 24h
- Auto-disable broken adapters, queue for manual fix
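The health check in step 8 could look like the sketch below. The 5% fleet alert threshold comes from the step; the per-adapter failure rate that marks an adapter as broken (50% here) is an assumption, as are all of the names.

```typescript
interface AdapterStats {
  brokerId: string;
  attempts: number; // removal attempts in the last 24h
  failures: number; // failed attempts in the same window
}

function evaluateAdapters(
  stats: AdapterStats[],
  perAdapterFailureRate = 0.5, // assumption: when a single adapter counts as broken
  fleetAlertShare = 0.05, // from step 8: alert when more than 5% of adapters fail
): { disable: string[]; alertEngineering: boolean } {
  const broken = stats
    .filter((s) => s.attempts > 0 && s.failures / s.attempts >= perAdapterFailureRate)
    .map((s) => s.brokerId);
  const brokenShare = stats.length === 0 ? 0 : broken.length / stats.length;
  return { disable: broken, alertEngineering: brokenShare > fleetAlertShare };
}
```

Adapters with no attempts in the window are skipped, so a quiet broker is never disabled by accident.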
tests:
- Unit: Mock CAPTCHA solver, verify retry and fallback logic
- Integration: Test CAPTCHA solving against real broker site
- E2E: Complete removal for broker with CAPTCHA → verify re-scan detects re-listing
acceptance_criteria:
- [ ] 50+ broker adapters implemented and tested
- [ ] CAPTCHA challenges are detected and solved automatically (2Captcha integration)
- [ ] Failed CAPTCHA solving escalates to manual queue after 3 attempts
- [ ] Email confirmation links are parsed and clicked automatically
- [ ] Re-scan job runs weekly and detects re-listings within 7 days
- [ ] Re-listed profiles trigger automatic new removal requests
- [ ] Dashboard shows accurate removal progress: "47 of 50 brokers completed"
- [ ] Per-broker success rate is tracked and visible in admin panel
- [ ] Proxy rotation prevents IP blocking on high-volume brokers
- [ ] Adapter breakage is detected within 24 hours and auto-disabled
- [ ] Monthly proxy + CAPTCHA cost per user < $4 (within gross margin target)
validation:
- Run `vitest run removebrokers.service.test.ts` — extended tests for 50 brokers
- Manual: Test CAPTCHA broker (e.g., MyLife), verify automatic solving works
- Check re-scan: Run `bun run job:removebrokers:rescan`, verify re-listings detected
- Monitor costs: Dashboard shows monthly proxy/CAPTCHA spend per customer
notes:
- Broker sites change frequently — budget 20% engineering time for adapter maintenance
- Some brokers (Acxiom, Epsilon) require physical mail — flag these for manual processing
- Re-listing is common: data brokers rebuild databases from public records every 30-90 days
- Consider AI-assisted form field detection (GPT-4 Vision) to reduce per-adapter development time
- The existing `broker.registry.ts` already has 100+ entries — prioritize by traffic/popularity
- Success rate target: 80%+ for automated removals, 90%+ with manual fallback

@@ -0,0 +1,74 @@
# 09. Attom Data Solutions API for Property Record Snapshots
meta:
id: core-services-09
feature: core-services-implementation
priority: P2
depends_on: [core-services-01]
tags: [hometitle, attom, property-records, api-integration, real-estate]
objective:
- Replace the `fetchCountyRecords()` stub that returns `{ ownerName: "Unknown Owner" }` with a real property data API integration using Attom Data Solutions, enabling actual property snapshot and change detection.
deliverables:
- Attom API client for property search, owner info, and tax/assessment data
- Property snapshot creation and storage in database
- Change detection pipeline wired to real data (your detector logic already works)
- Alert generation for ownership changes, liens, and tax status changes
steps:
1. Sign up for Attom Data API at https://attomdata.com (pricing: ~$0.05-$0.10/record, enterprise plans available)
2. Add `ATTOM_API_KEY` to `.env.example` and validate in `env.ts`
3. Create `hometitle/attom.client.ts`:
- `searchProperty(address)` — find property by address, return parcel ID and metadata
- `getPropertyProfile(parcelId)` — full property record: owner, deed date, tax info, liens
- `getPropertyHistory(parcelId)` — historical ownership and transaction records
- `getTaxInfo(parcelId)` — tax amount, delinquency status, exemptions
4. Replace `fetchCountyRecords()` in `scanner.ts` with Attom API call:
- Use geocoding result (Google Maps API, already works) to get normalized address
- Query Attom by address → get parcel ID → fetch full property profile
- Parse response into `CountyRecord` / `SnapshotData` schema
5. Implement snapshot storage:
- Store initial snapshot in `propertySnapshots` table
- On re-scan, fetch new snapshot → compare with last → detect changes
6. Wire change detection (your `change.detector.ts` is already implemented):
- `ownership_transfer`: owner name changed → critical alert
- `lien_filing`: lien count increased → warning/critical alert
- `tax_change`: tax amount changed → info alert
- `deed_change`: deed date changed → critical alert
7. Implement tier limits:
- Guard: 1 property monitored
- Fortress: 3 properties monitored
- Family: 5 properties monitored
8. Add cost tracking: ~$0.05-$0.10 per property lookup, track per-user usage
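The change mapping in step 6 and the tier limits in step 7 are illustrated below. The project's `change.detector.ts` remains the source of truth; this sketch uses a hypothetical `Snapshot` shape and always rates liens as `warning`, leaving out the amount-based escalation.

```typescript
interface Snapshot {
  ownerName: string;
  deedDate: string; // ISO date string
  taxAmount: number;
  lienCount: number;
}

type ChangeType = "ownership_transfer" | "lien_filing" | "tax_change" | "deed_change";
type Severity = "info" | "warning" | "critical";

interface Change {
  type: ChangeType;
  severity: Severity;
}

const normalizeName = (name: string): string => name.trim().toLowerCase();

function diffSnapshots(prev: Snapshot, next: Snapshot): Change[] {
  const changes: Change[] = [];
  if (normalizeName(prev.ownerName) !== normalizeName(next.ownerName)) {
    changes.push({ type: "ownership_transfer", severity: "critical" });
  }
  if (next.lienCount > prev.lienCount) changes.push({ type: "lien_filing", severity: "warning" });
  if (next.taxAmount !== prev.taxAmount) changes.push({ type: "tax_change", severity: "info" });
  if (next.deedDate !== prev.deedDate) changes.push({ type: "deed_change", severity: "critical" });
  return changes;
}

// Step 7 limits; tiers that do not include HomeTitle are left out.
const PROPERTY_LIMITS = { guard: 1, fortress: 3, family: 5 } as const;

function canAddProperty(tier: keyof typeof PROPERTY_LIMITS, currentCount: number): boolean {
  return currentCount < PROPERTY_LIMITS[tier];
}
```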
tests:
- Unit: Mock Attom API responses, verify parsing and snapshot creation
- Integration: Test with real Attom API using known property address
- E2E: Add property to watchlist → trigger scan → verify snapshot created → simulate change → verify alert
acceptance_criteria:
- [ ] `fetchCountyRecords()` makes real HTTP request to Attom API (not returning mock data)
- [ ] Property snapshots contain real owner name, deed date, tax amount, lien count
- [ ] Change detection compares real snapshots and identifies actual changes
- [ ] Ownership transfer creates critical alert with property address in message
- [ ] Lien filing creates warning or critical alert depending on lien amount
- [ ] Alert severity matches existing `severityForChange()` logic
- [ ] Geocoding → Attom search → snapshot pipeline works end-to-end
- [ ] Cost tracking records each Attom API call for billing analytics
- [ ] Tier limits enforced: Guard = 1 property, Fortress = 3, Family = 5
- [ ] Graceful fallback: if Attom API fails, retry once, then alert user of monitoring gap
validation:
- Run `vitest run hometitle.test.ts`: all tests pass against mocked Attom responses
- Manual: Add real property address, trigger scan, verify snapshot in database
- Simulate change: Update snapshot in database with different owner, trigger detector, verify alert
- Check cost: Database shows Attom API usage per user per month
notes:
- Attom covers ~150M US properties but not all counties equally — some rural areas may have gaps
- For counties not covered by Attom, Phase 3 (task 10) implements county recorder web scrapers
- Property fraud is a real and growing problem: FTC reports $1B+ in losses annually
- This is a potential differentiator, but verify the claim before marketing it: some identity protection suites now advertise home title monitoring
- Consider partnership with title insurance companies for added credibility
- The existing Google Maps geocoding already works — verify `GEOCODING_API_KEY` is set

@@ -0,0 +1,83 @@
# 10. County Recorder Web Scrapers for Top 100 US Counties
meta:
id: core-services-10
feature: core-services-implementation
priority: P2
depends_on: [core-services-09]
tags: [hometitle, scraping, county-records, fallback, coverage]
objective:
- Build Playwright-based web scrapers for county recorder websites in the top 100 US counties by population, providing a fallback for counties not covered by Attom API and reducing API costs.
deliverables:
- Scrapers for 100 US county recorder websites (starting with top 50)
- Unified property record parser that normalizes disparate HTML formats
- Fallback logic: Attom API → county scraper → manual request (in order)
- Scraper health monitoring and breakage detection
steps:
1. Identify top 100 US counties by population (start with top 50):
- Los Angeles County, CA; Cook County, IL; Harris County, TX; Maricopa County, AZ; etc.
2. Research each county's recorder website:
- Search URL pattern (usually `https://{county}.gov/recorder` or similar)
- Record search interface (by owner name, parcel ID, or address)
- Result format (HTML table, PDF, JSON API, proprietary system)
3. Create `hometitle/county-scrapers/` directory with one module per county
4. Implement base scraper interface:
- `searchByAddress(address): Promise<CountyRecord[]>`
- `searchByParcelId(parcelId): Promise<CountyRecord | null>`
- `parseResults(html): CountyRecord[]`
5. Implement scrapers for each county using Playwright:
- Navigate to recorder website
- Fill search form (address or parcel ID)
- Submit and wait for results
- Parse HTML table or detail page
- Extract: owner name, deed date, tax info, lien status
6. Implement unified `parseDeedRecords(html)` that handles common formats:
- HTML tables with standard columns
- Detail pages with labeled fields
- PDF records (download + text extraction)
7. Add fallback chain in `scanner.ts`:
- Try Attom API first (fastest, most reliable)
- If Attom returns null/empty, try county scraper
- If scraper fails, queue for manual request (email to user)
8. Add scraper monitoring:
- Track success/failure rate per county
- Alert when >20% of scrapers fail in 24h (site changes)
- Auto-disable broken scrapers, fall back to Attom/manual
9. Handle rate limiting:
- Throttle requests to county sites (max 1 req/5 sec per county)
- Use residential proxies if county blocks datacenter IPs
- Respect robots.txt and terms of service
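The per-county throttle in step 9 (at most one request every 5 seconds per county) can be sketched as a small in-memory class. The names are hypothetical, and a multi-instance deployment would need shared state, for example in Redis, instead of a local `Map`.

```typescript
class CountyThrottle {
  private readonly lastRequestAt = new Map<string, number>();
  private readonly minIntervalMs: number;

  constructor(minIntervalMs = 5_000) {
    this.minIntervalMs = minIntervalMs;
  }

  // Milliseconds to wait before the next request to this county is allowed.
  msUntilAllowed(county: string, now: number): number {
    const last = this.lastRequestAt.get(county);
    if (last === undefined) return 0;
    return Math.max(0, last + this.minIntervalMs - now);
  }

  // Call immediately before sending a request so the interval starts from dispatch time.
  recordRequest(county: string, now: number): void {
    this.lastRequestAt.set(county, now);
  }
}
```

A scraper would sleep for `msUntilAllowed(...)` milliseconds, then call `recordRequest(...)` and proceed.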
tests:
- Unit: Mock HTML responses for common county formats, verify parser normalization
- Integration: Test 5 representative county scrapers against live sites
- E2E: Property in county without Attom coverage → scraper fetches real data → snapshot created
acceptance_criteria:
- [ ] 50+ county recorder scrapers implemented and tested against live sites
- [ ] `parseDeedRecords()` parses real HTML and returns structured CountyRecord objects
- [ ] Fallback chain works: Attom → county scraper → manual request
- [ ] Each scraper handles the county's specific search interface and result format
- [ ] Rate limiting respects county sites (max 1 request per 5 seconds)
- [ ] Broken scrapers are auto-detected within 24 hours and disabled
- [ ] Scraper success rate > 70% across all implemented counties
- [ ] Property records from scrapers match Attom data quality (owner name, deed date, liens)
- [ ] Failed scraper attempts fall back to manual queue with user notification
- [ ] No county site is overwhelmed by scraping (responsible rate limits)
validation:
- Run `vitest run hometitle.test.ts` — extended tests for county scrapers
- Manual: Search property in Cook County IL, verify scraper returns real owner data
- Check fallback: Disable Attom API key, trigger scan, verify county scraper activates
- Monitor health: Dashboard shows per-county scraper success rate
notes:
- County recorder sites are notoriously fragile: expect 30-40% of scrapers to break per quarter
- Many counties use proprietary systems (e.g., Tyler Technologies, Fidlar Technologies) with complex JavaScript
- Some counties require payment per record ($1-$5); flag these for manual processing
- Consider partnering with Attom for counties they don't cover rather than building scrapers
- Legal: Ensure scraping complies with each county's terms of service and state public records laws
- The existing `parseDeedRecords()` currently logs "not yet implemented" — replace with real parsing

@@ -0,0 +1,84 @@
# 11. Azure Voice Live API for Synthetic Voice Detection
meta:
id: core-services-11
feature: core-services-implementation
priority: P2
depends_on: [core-services-01]
tags: [voiceprint, azure, voice-clone-detection, liveness, api-integration]
objective:
- Replace the stub `detectSynthetic()` that returns `{ isSynthetic: false, confidence: 1.0 }` with a real Azure Voice Live API integration, enabling consumer-facing voice clone detection via uploaded call recordings or live microphone capture.
deliverables:
- Azure Speech Services client with Voice Live API endpoint
- Audio preprocessing pipeline (resampling, normalization, VAD)
- Voice enrollment system for trusted contacts (family member voice templates)
- Synthetic detection endpoint that returns real confidence scores
- Call recording upload and analysis workflow
steps:
1. Sign up for Azure Speech Services at https://azure.microsoft.com/services/cognitive-services/speech-services/
2. Add `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION` to `.env.example`
3. Create `voiceprint/azure.client.ts`:
- `detectLiveness(audioBuffer, referenceText?)` — Voice Live API for challenge-response liveness
- `verifySpeaker(audioBuffer, enrollmentId)` — speaker verification against enrolled voice
- `enrollSpeaker(audioSamples): Promise<enrollmentId>` — create voice template from samples
4. Implement audio preprocessing:
- Convert to 16kHz mono PCM (Azure requirement)
- Normalize amplitude to -3 dBFS
- Trim silence using VAD (WebRTC or Silero)
- Max duration: 30 seconds per analysis
5. Implement enrollment flow:
- User records 3-5 samples of a family member saying phrases
- Store enrollment in database with `voiceEnrollments` schema (already exists)
- Generate enrollment ID, link to user account
6. Implement detection flow:
- User uploads suspicious call recording or captures live audio
- Preprocess audio → Azure Voice Live API → get liveness score
- If enrollment exists, also run speaker verification → similarity score
- Combine scores: synthetic = low liveness AND low speaker match
7. Implement `detectSynthetic()` to return real analysis:
- Score: 0.0–1.0 (synthetic likelihood)
- Confidence: based on audio quality and API response certainty
- Decision: synthetic if score > 0.7, suspicious if 0.4–0.7, genuine if < 0.4
8. Add analysis history:
- Store every analysis in database (audio hash, score, decision)
- Dashboard shows history of analyzed calls
- User can report false positive/negative for model improvement
9. Implement tier limits:
- Fortress+: VoicePrint included
- Lower tiers: not available or limited to 5 analyses/month
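The decision logic in steps 6 and 7 can be sketched as below. This is a minimal illustration, not the provider's API: `DetectionInput`, `classify`, and the equal-weight averaging of the two risk values are assumptions, and both input scores are assumed to be already normalized to the range 0.0–1.0. Only the thresholds (0.7 and 0.4) and the "analysis unavailable" fallback come from this task.

```typescript
// Hypothetical types for illustration; the real provider response shape is not shown in this task.
type Decision = "synthetic" | "suspicious" | "genuine" | "unavailable";

interface DetectionInput {
  // Liveness score in [0, 1]; null when the provider call failed.
  liveness: number | null;
  // Similarity to the enrolled voice in [0, 1]; omitted when no enrollment exists.
  speakerMatch?: number;
}

interface DetectionResult {
  decision: Decision;
  // Synthetic likelihood in [0, 1]; null when analysis is unavailable.
  score: number | null;
}

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

function classify(input: DetectionInput): DetectionResult {
  // Graceful fallback: a failed provider call must never look like "genuine".
  if (input.liveness === null) {
    return { decision: "unavailable", score: null };
  }
  const livenessRisk = 1 - clamp01(input.liveness);
  // ASSUMPTION: equal-weight average when a speaker match is available.
  const score =
    input.speakerMatch === undefined
      ? livenessRisk
      : (livenessRisk + (1 - clamp01(input.speakerMatch))) / 2;

  // Thresholds from step 7: > 0.7 synthetic, 0.4 to 0.7 suspicious, < 0.4 genuine.
  if (score > 0.7) return { decision: "synthetic", score };
  if (score >= 0.4) return { decision: "suspicious", score };
  return { decision: "genuine", score };
}
```

Keeping `unavailable` as its own decision value makes it hard for a caller to confuse a failed analysis with a clean result.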
tests:
- Unit: Mock Azure API responses, verify score calculation and decision logic
- Integration: Test with real Azure Voice Live API using synthetic and genuine audio samples
- E2E: Upload suspicious call recording → receive analysis result with confidence score
acceptance_criteria:
- [ ] `detectSynthetic()` calls real Azure Voice Live API (not returning hardcoded `isSynthetic: false`)
- [ ] Audio preprocessing converts to 16kHz mono PCM and normalizes amplitude
- [ ] Voice enrollment creates usable template from 3–5 user-provided samples
- [ ] Speaker verification returns similarity score between 0.0 and 1.0
- [ ] Liveness detection returns pass/fail with confidence for challenge-response mode
- [ ] Combined score correctly flags known synthetic voice samples (>0.7 threshold)
- [ ] Analysis results are stored in database with audio hash and metadata
- [ ] Dashboard shows analysis history with play button for uploaded audio
- [ ] Tier enforcement: VoicePrint only available on Fortress+ plans
- [ ] Graceful fallback: if Azure API fails, return "analysis unavailable" (not false negative)
- [ ] False positive rate < 5% on genuine voice samples (tested with 100+ samples)
validation:
- Run `vitest run voiceprint.test.ts` — all tests pass with Azure mock
- Manual: Upload genuine voice sample, verify `isSynthetic: false` with confidence > 0.9
- Manual: Upload synthetic voice (e.g., from ElevenLabs), verify `isSynthetic: true` with confidence > 0.7
- Check enrollment: Database `voiceEnrollments` table has real templates with Azure enrollment IDs
notes:
- Azure Voice Live API costs ~$0.016/minute of audio analyzed
- At 50 analyses/user/month (1–2 min each), cost is ~$0.80–$1.60/user/month
- This is the ONLY practical path for a startup; building in-house costs $840K–$1.25M Year 1
- The differentiator isn't the detection tech (everyone uses Azure/Daon/Pindrop) — it's the consumer UX and integration
- Consider adding forensic analysis mode: detailed spectrogram visualization for user education
- Mobile integration (iOS CallKit, Android Telecom) is Phase 4 (task 12) — this task is server-side only
- Store audio samples securely (encrypted at rest) and allow user deletion (privacy compliance)
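The amplitude normalization in step 4 can be sketched as a peak normalization. This is one possible reading of "normalize amplitude to -3 dBFS" and it assumes float samples in the range -1 to 1; `normalizePeak` is a hypothetical helper name, and resampling and VAD are out of scope here.

```typescript
// Peak-normalize float PCM samples (range [-1, 1]) to a target level in dBFS.
// ASSUMPTION: "normalize" means peak normalization, not loudness (RMS/LUFS) normalization.
function normalizePeak(samples: Float32Array, targetDbfs = -3): Float32Array {
  // Find the largest absolute sample value.
  const peak = samples.reduce((max, s) => Math.max(max, Math.abs(s)), 0);
  // Silent input: nothing to scale, return an unchanged copy.
  if (peak === 0) return samples.slice();
  // Convert dBFS to linear amplitude: 10^(dB / 20). -3 dBFS is roughly 0.708.
  const targetPeak = Math.pow(10, targetDbfs / 20);
  const gain = targetPeak / peak;
  // Apply the same gain to every sample so relative levels are preserved.
  return samples.map((s) => s * gain);
}
```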


@@ -0,0 +1,84 @@
# 12. iOS CallKit and Android Telecom API for Real-Time Call Analysis
meta:
id: core-services-12
feature: core-services-implementation
priority: P2
depends_on: [core-services-11]
tags: [voiceprint, ios, android, callkit, telecom-api, real-time, mobile]
objective:
- Integrate VoicePrint into the iOS and Android mobile apps via CallKit and Telecom API, enabling real-time call recording, analysis, and synthetic voice alerts during active phone calls.
deliverables:
- iOS CallKit extension for call interception and recording
- Android Telecom API integration for call screening and recording
- Real-time audio streaming to server for analysis
- Push notification alert when synthetic voice detected during call
- On-device audio capture and upload pipeline
steps:
1. **iOS Implementation:**
- Create CallKit extension (`CallDirectoryExtension`) for caller identification
- Implement `CXProvider` delegate for call state monitoring
- Add audio recording permission (NSMicrophoneUsageDescription in Info.plist)
- Stream call audio to server via WebSocket or upload after call ends
- Show in-call alert overlay when synthetic voice detected
- Handle app backgrounding and call recording continuity
2. **Android Implementation:**
- Implement `TelecomManager` with `ConnectionService` for call monitoring
- Add `READ_PHONE_STATE`, `RECORD_AUDIO`, `FOREGROUND_SERVICE` permissions
- Create call screening service that triggers on incoming/outgoing calls
- Record call audio using `MediaRecorder` or `AudioRecord`
- Upload audio to server for analysis after call ends
- Show heads-up notification when synthetic voice detected
3. **Server-side integration:**
- Extend VoicePrint tRPC router with `analyzeCallRecording` endpoint
- Handle multipart audio upload (WAV/MP3 format)
- Queue analysis job, push result via WebSocket or push notification
- Store analysis result linked to call metadata (number, duration, timestamp)
4. **Real-time vs. post-call analysis:**
- Phase 1: Post-call upload + analysis (simpler, no strict latency requirement)
- Phase 2: Real-time streaming chunks during call (requires <500ms analysis)
5. **User experience:**
- Settings toggle: "Analyze calls for voice cloning"
- After each analyzed call: summary card in app (genuine/suspicious/synthetic)
- Emergency override: one-tap hangup + block number when synthetic detected
6. **Privacy and compliance:**
- Two-party consent state detection (disable recording in 2-party consent states)
- User must explicitly opt-in before any call recording
- Audio data encrypted in transit and at rest
- Auto-delete audio after analysis (configurable retention: 0–30 days)
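The opt-in and consent gating in step 6 might look like the sketch below. It is an illustration under stated assumptions: `mayRecordCall` and `retentionDays` are hypothetical names, the set of states where recording is disabled is passed in by the caller (it should come from legal review, not from this code), and failing closed on an unknown location is an assumption rather than a stated requirement.

```typescript
interface RecordingContext {
  userOptedIn: boolean;
  // Two-letter US state code for the user's location; omitted if unknown.
  stateCode?: string;
}

// ASSUMPTION: blockedStates is supplied from configuration after legal review.
function mayRecordCall(ctx: RecordingContext, blockedStates: ReadonlySet<string>): boolean {
  // Explicit opt-in is required before any recording.
  if (!ctx.userOptedIn) return false;
  // ASSUMPTION: fail closed when the location cannot be determined.
  if (ctx.stateCode === undefined) return false;
  return !blockedStates.has(ctx.stateCode.toUpperCase());
}

// Clamp the configured retention to the 0 to 30 day range; default is 0 (delete after analysis).
function retentionDays(requested?: number): number {
  if (requested === undefined || Number.isNaN(requested)) return 0;
  return Math.min(30, Math.max(0, Math.floor(requested)));
}
```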
tests:
- Unit: Mock CallKit/Telecom callbacks, verify audio capture and upload logic
- Integration: Test audio upload and analysis flow on device simulator
- E2E: Receive call on device → record audio → upload → receive analysis notification
acceptance_criteria:
- [ ] iOS app can record incoming call audio and upload to server for analysis
- [ ] Android app can record incoming call audio and upload to server for analysis
- [ ] Call recording only happens after explicit user opt-in
- [ ] Two-party consent states are detected and recording is disabled (legal compliance)
- [ ] Uploaded audio is analyzed by Azure Voice Live API and result pushed to device
- [ ] Push notification sent within 30 seconds of analysis completion
- [ ] In-app call summary shows: caller number, duration, analysis result, confidence score
- [ ] Emergency hangup button available when synthetic voice detected
- [ ] Audio data is encrypted in transit (TLS) and deleted after analysis (0-day retention default)
- [ ] App handles backgrounding without losing call recording session
- [ ] Recording doesn't interfere with normal call audio quality
validation:
- iOS: Test on physical device (simulator doesn't support CallKit), verify recording and upload
- Android: Test on physical device, verify Telecom API integration and notification delivery
- Server: Verify `analyzeCallRecording` endpoint accepts multipart upload and returns analysis
- Legal review: Confirm 2-party consent logic covers all US states correctly
notes:
- iOS CallKit extensions run in separate process — share data via App Groups
- Android Telecom API requires the app to be set as the default dialer (limited market penetration)
- Alternative: Use accessibility service on Android for broader call recording (more invasive UX)
- Real-time analysis requires chunking audio into 3–5 second segments and streaming, which is much harder than post-call
- Consider starting with post-call analysis and adding real-time as Phase 2
- Audio file sizes: 1 minute of WAV at 16kHz mono = ~1.9MB; compress to AAC/MP3 for upload
- The existing iOS `VoicePrintViewModel.swift` and Android `VoicePrintViewModel.kt` need updating
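The file-size estimate and the 3–5 second chunking mentioned in the notes can be checked with a short sketch. It assumes 16-bit samples (2 bytes each), which is what makes one minute at 16 kHz mono come to about 1.9 MB; `pcmBytes` and `chunkPcm` are hypothetical helper names.

```typescript
// Bytes of uncompressed PCM audio, excluding any container header.
// ASSUMPTION: 16-bit samples (2 bytes), 16 kHz, mono by default.
function pcmBytes(seconds: number, sampleRate = 16000, bytesPerSample = 2, channels = 1): number {
  return seconds * sampleRate * bytesPerSample * channels;
}

// Split 16-bit PCM samples into fixed-duration chunks for streaming.
function chunkPcm(samples: Int16Array, chunkSeconds: number, sampleRate = 16000): Int16Array[] {
  const chunkLen = Math.floor(chunkSeconds * sampleRate);
  if (chunkLen <= 0) throw new RangeError("chunkSeconds is too small");
  const chunks: Int16Array[] = [];
  for (let start = 0; start < samples.length; start += chunkLen) {
    // subarray returns a view without copying; the last chunk may be shorter.
    chunks.push(samples.subarray(start, start + chunkLen));
  }
  return chunks;
}
```

With these defaults, `pcmBytes(60)` is 1,920,000 bytes, which matches the ~1.9MB figure above.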


@@ -0,0 +1,81 @@
# 13. Cross-Service Threat Correlation Scoring and Unified Alert Feed
meta:
id: core-services-13
feature: core-services-implementation
priority: P2
depends_on: [core-services-05, core-services-07, core-services-08]
tags: [correlation, threat-scoring, unified-alerts, intelligence, dashboard]
objective:
- Activate the correlation service to cross-reference findings across VoicePrint, DarkWatch, SpamShield, HomeTitle, and RemoveBrokers, generating unified threat scores and correlated alert narratives that explain multi-vector attacks.
deliverables:
- Cross-service correlation rules (e.g., breached email + spam call from same source = coordinated attack)
- Unified threat score algorithm (0–100) per user and per family member
- Correlated alert narratives: "Your email was breached on Monday, and today you received a spam call to the phone number on the same account. This may be a targeted attack"
- Dashboard threat score widget with historical trend
steps:
1. Analyze existing correlation service (`services/correlation/`):
- Review current schema and logic in `correlation.service.ts`
- Identify data sources available from each service
2. Define correlation rules:
- Rule 1: Same email found in HIBP breach AND receiving spam calls → coordinated attack (+30 threat score)
- Rule 2: Property lien filed AND data broker listing active → identity theft in progress (+40 threat score)
- Rule 3: Voice clone detected AND family member SSN on dark web → targeted family scam (+50 threat score)
- Rule 4: Multiple breaches in 30 days → compromised identity (+20 threat score)
- Rule 5: Spam call from number associated with known scam campaign → high risk (+25 threat score)
3. Implement correlation detection pipeline:
- Subscribe to alert creation events from all 5 services
- Window function: look back 30 days for related findings
- Match on shared entities (email, phone, SSN, address, name)
4. Implement threat scoring algorithm:
- Base score: sum of individual alert severities (info=1, warning=3, critical=5)
- Correlation bonus: +10–50 per matched rule
- Time decay: scores decrease by 10% per week (old alerts matter less)
- Family aggregation: highest individual score + (average of the other members' scores) / 2
- Cap at 100, floor at 0
5. Implement unified alert feed:
- Merge individual service alerts into chronological feed
- Group correlated alerts into "attack narratives"
- Show narrative summary: "3 related events detected — possible coordinated attack"
6. Update dashboard widgets:
- Threat Score widget: current score with color coding (green <30, yellow 30–60, red >60)
- Trend graph: score over last 90 days
- Alert Feed widget: unified feed with narrative grouping
7. Add proactive recommendations:
- If score > 60: recommend password changes, credit freeze, family notification
- If HomeTitle + RemoveBrokers correlated: recommend title insurance review
- If VoicePrint detected: recommend warning family members, filing FTC report
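The scoring algorithm in step 4 can be sketched as follows. Several details are assumptions made for illustration: decay is applied continuously per alert as 0.9^(ageDays / 7), correlation bonuses are not decayed, the result is rounded to an integer, and the family formula is read as highest score plus half the average of the remaining scores. `ScoredAlert`, `threatScore`, and `familyScore` are hypothetical names, not the existing `correlation.service.ts` API.

```typescript
type Severity = "info" | "warning" | "critical";

interface ScoredAlert {
  severity: Severity;
  ageDays: number;
}

// Severity weights from step 4.
const SEVERITY_POINTS: Record<Severity, number> = { info: 1, warning: 3, critical: 5 };

// ASSUMPTION: 10% decay per week, applied continuously per alert.
function decayFactor(ageDays: number): number {
  return Math.pow(0.9, Math.max(0, ageDays) / 7);
}

function threatScore(alerts: ScoredAlert[], correlationBonuses: number[]): number {
  const base = alerts.reduce(
    (sum, a) => sum + SEVERITY_POINTS[a.severity] * decayFactor(a.ageDays),
    0,
  );
  // ASSUMPTION: bonuses are added undecayed.
  const bonus = correlationBonuses.reduce((sum, b) => sum + b, 0);
  // Cap at 100, floor at 0.
  return Math.min(100, Math.max(0, Math.round(base + bonus)));
}

// ASSUMPTION: "highest + average of others / 2" means highest + (average of others) / 2.
function familyScore(memberScores: number[]): number {
  if (memberScores.length === 0) return 0;
  const sorted = [...memberScores].sort((a, b) => b - a);
  const [highest = 0, ...others] = sorted;
  const avgOthers =
    others.length === 0 ? 0 : others.reduce((s, x) => s + x, 0) / others.length;
  return Math.min(100, Math.max(0, highest + avgOthers / 2));
}
```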
tests:
- Unit: Mock alerts from multiple services, verify correlation rules fire correctly
- Integration: Create correlated alerts in database, verify threat score calculation
- E2E: Trigger breach alert + spam alert for same email → verify unified narrative created
acceptance_criteria:
- [ ] Correlation rules detect cross-service relationships within 30-day window
- [ ] Threat score is calculated from individual alert severities + correlation bonuses
- [ ] Score decays by 10% per week (time-weighted relevance)
- [ ] Family plan aggregates scores across members
- [ ] Unified alert feed groups correlated events into narrative summaries
- [ ] Dashboard threat score widget updates in real-time as new alerts arrive
- [ ] Proactive recommendations appear based on current threat score and active correlations
- [ ] Correlation engine doesn't create false positives (test with 100 random alerts, <5% false correlation rate)
- [ ] Historical trend graph shows score changes over 90 days
- [ ] Each correlated narrative links to individual alert details
validation:
- Run `vitest run correlation.test.ts` — all tests pass
- Manual: Create test alerts (breached email + spam call), verify correlation detected
- Dashboard: Threat score updates from 15 to 55 after correlation bonus applied
- Trend: 90-day graph shows spike during test period
notes:
- The existing `correlation.service.ts` and `correlation.ts` router need activation — not just stubs
- Correlation is the key differentiator from point-solution competitors (Aura, LifeLock)
- False positive rate must be low — users will ignore alerts if too many are irrelevant
- Consider using graph database (Neo4j) for complex relationship queries at scale
- The existing `normalizedAlerts` table already stores cross-service alerts — use this as correlation source
- Mobile apps should show simplified threat score and latest narrative, not full correlation graph
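The detection pipeline in step 3 (match on shared entities within a 30-day look-back) could start from a grouping pass like the one below. This is a sketch only: `Finding` and `correlate` are hypothetical, the entity is assumed to be pre-normalized into a single string key, and requiring at least two distinct services per group is an assumption about what counts as "cross-service".

```typescript
interface Finding {
  id: string;
  service: string; // e.g. "darkwatch" or "spamshield"
  entity: string; // normalized shared entity key, e.g. "email:a@example.com"
  at: Date;
}

const WINDOW_MS = 30 * 24 * 60 * 60 * 1000; // 30-day look-back

// Group findings that share an entity, fall inside the window,
// and come from at least two different services.
function correlate(findings: Finding[], now: Date): Finding[][] {
  const byEntity = new Map<string, Finding[]>();
  findings.forEach((f) => {
    // Skip findings older than the look-back window.
    if (now.getTime() - f.at.getTime() > WINDOW_MS) return;
    const group = byEntity.get(f.entity) ?? [];
    group.push(f);
    byEntity.set(f.entity, group);
  });

  const groups: Finding[][] = [];
  byEntity.forEach((group) => {
    const services = new Set(group.map((f) => f.service));
    if (services.size >= 2) groups.push(group);
  });
  return groups;
}
```

Each returned group is a candidate for one "attack narrative"; the specific rules and their bonuses would then be evaluated per group.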


@@ -0,0 +1,91 @@
# 14. Family Plan Member Management, Billing Proration, and Multi-User Dashboard
meta:
id: core-services-14
feature: core-services-implementation
priority: P2
depends_on: [core-services-01]
tags: [billing, family-plans, multi-user, proration, dashboard, member-management]
objective:
- Implement family plan support: invite family members, manage their access, prorate billing on member changes, and provide a multi-user dashboard showing consolidated family security status.
deliverables:
- Family member invitation system (email invites with acceptance flow)
- Role-based access control (primary account holder vs. member)
- Billing proration for adding/removing family members mid-cycle
- Family dashboard showing all members' threat scores and alerts
- Per-member service configuration (what each member monitors)
steps:
1. Extend database schema:
- Add `familyGroups` table: id, primaryUserId, planTier, maxMembers, createdAt
- Add `familyMembers` table: id, familyGroupId, userId, role (primary/member), status (pending/active/removed), invitedAt, joinedAt
- Add `familyInvitations` table: id, familyGroupId, email, token, expiresAt, acceptedAt
2. Implement invitation flow:
- Primary user sends invite by email → generates signed token
- Invitee clicks link → creates account (if new) or links existing account
- Invitation expires after 7 days
- Send reminder email after 3 days if not accepted
3. Implement member management:
- Primary user can view all members, their active services, and threat scores
- Primary user can remove members (prorated refund or credit)
- Members can leave family group voluntarily
- Members cannot see other members' sensitive data (SSN, specific breach details)
4. Implement billing proration:
- Add member mid-cycle: charge prorated amount for remaining days via Stripe
- Remove member mid-cycle: credit prorated amount to account balance
- Change plan tier: prorate difference, apply to next invoice
- Use Stripe's `proration_behavior: 'create_prorations'` for all changes
5. Implement family dashboard:
- Sidebar shows family group name and member count
- Main view: cards for each member with photo, name, threat score, recent alert count
- Click member → detailed view with their services, alerts, and settings
- Consolidated family threat score (from correlation engine)
6. Implement per-member service configuration:
- Primary user assigns which services each member gets
- Default: all members get DarkWatch + SpamShield + RemoveBrokers
- HomeTitle and VoicePrint limited by property/voice enrollment slots
- Members can configure their own watchlist items within assigned services
7. Implement notification routing:
- Critical alerts notify primary user AND affected member
- Billing notifications go to primary user only
- Members can opt in to or out of specific alert types
8. Add family plan tiers:
- Family Fortress: 5 adults + unlimited children, $45/mo
- Family Guard: 3 adults + unlimited children, $35/mo
- Enforce max member limits at invitation time
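The seat limits in step 8 and the 7-day expiry in step 2 can be sketched as below. The names are hypothetical, and two behaviors are assumptions: the primary account holder counts toward the adult limit, and pending invitations reserve a seat so the limit cannot be exceeded when several invites are accepted at once.

```typescript
type FamilyTier = "family-fortress" | "family-guard";

// Adult seat limits from step 8; children are unlimited and not counted here.
const MAX_ADULTS: Record<FamilyTier, number> = {
  "family-fortress": 5,
  "family-guard": 3,
};

const INVITE_TTL_MS = 7 * 24 * 60 * 60 * 1000; // invitations expire after 7 days

// ASSUMPTION: activeAdults includes the primary user, and pending invites reserve seats.
function canInviteAdult(tier: FamilyTier, activeAdults: number, pendingInvites: number): boolean {
  return activeAdults + pendingInvites < MAX_ADULTS[tier];
}

function invitationExpiry(invitedAt: Date): Date {
  return new Date(invitedAt.getTime() + INVITE_TTL_MS);
}

// An invitation is usable only if it has not been accepted and has not expired.
function isInvitationValid(invitedAt: Date, now: Date, acceptedAt?: Date): boolean {
  if (acceptedAt !== undefined) return false;
  return now.getTime() < invitationExpiry(invitedAt).getTime();
}
```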
tests:
- Unit: Proration calculation for add/remove/upgrade scenarios
- Integration: Full invitation flow from email to account linking
- E2E: Create family plan → invite 2 members → verify billing → remove member → verify prorated credit
acceptance_criteria:
- [ ] Primary user can send email invitations to family members
- [ ] Invitations expire after 7 days and can be resent
- [ ] Members can accept invitations and join family group
- [ ] Adding member mid-cycle creates prorated charge on next invoice
- [ ] Removing member mid-cycle creates prorated credit on next invoice
- [ ] Family dashboard shows all members with threat scores and alert counts
- [ ] Primary user can configure which services each member has access to
- [ ] Members cannot see other members' sensitive breach details (only score + summary)
- [ ] Billing notifications route to primary user; security alerts route to affected member
- [ ] Max member limits enforced at invitation (5 for Fortress, 3 for Guard)
- [ ] Plan downgrade prevents inviting beyond new tier's member limit
- [ ] All family plan changes handled via Stripe proration (no manual calculations)
validation:
- Run `vitest run billing.test.ts` — extended tests for family proration
- Manual: Send invitation to test email, click link, verify member joins family
- Stripe Dashboard: Verify proration items appear on invoices after member changes
- Dashboard: Family view shows 3 member cards with individual threat scores
notes:
- Family plans have 30–50% lower churn than individual plans — this is a critical retention driver
- Stripe's `proration_behavior` handles most math automatically — trust it
- Children's accounts should be restricted: no dark web monitoring for minors, only spam/basic alerts
- Consider adding "family safety alerts" — notify primary user if child receives suspicious contact
- The existing `invitation.ts` schema may need extension for family-specific invitation tokens
- Member removal should not delete their account — just unlink from family group
- Children (under 18) should have simplified dashboard — no breach details, only "safe/attention needed"
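For the proration unit tests, the expected values can be derived with simple time-based arithmetic. The sketch below is not Stripe's implementation; it assumes proration is proportional to the unused time in the billing period and rounds to the nearest cent, so results may differ from Stripe's invoices by rounding. `proratedCents` is a hypothetical helper for computing test expectations only.

```typescript
// Prorated amount in cents for the unused part of a billing period.
// ASSUMPTION: linear proration by elapsed time, rounded to the nearest cent.
function proratedCents(
  priceCents: number,
  periodStart: Date,
  periodEnd: Date,
  changeAt: Date,
): number {
  const total = periodEnd.getTime() - periodStart.getTime();
  if (total <= 0) throw new RangeError("periodEnd must be after periodStart");
  // Clamp so changes outside the period yield the full price or zero.
  const remaining = Math.min(total, Math.max(0, periodEnd.getTime() - changeAt.getTime()));
  return Math.round((priceCents * remaining) / total);
}
```

For a tier change, pass the price difference as `priceCents`: moving from Family Guard ($35) to Family Fortress ($45) halfway through a period gives a prorated difference of 500 cents under these assumptions.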


@@ -0,0 +1,45 @@
# Core Services Implementation
**Objective:** Convert all stub/placeholder services into production-ready implementations with real API integrations, enabling paid customer subscriptions and revenue.
**Status legend:** [ ] todo, [~] in-progress, [x] done
## Tasks
### Phase 1 — Foundation (Revenue Enabler)
- [ ] 01 — Stripe Checkout, webhooks, and subscription state management → `01-stripe-checkout-webhooks.md`
- [ ] 02 — Automated removal engine for top 20 data brokers → `02-removebrokers-top-20.md`
### Phase 2 — Core Services (Table Stakes)
- [ ] 03 — HIBP API integration for email breach monitoring → `03-darkwatch-hibp.md`
- [ ] 04 — SecurityTrails, Censys, Shodan API integrations → `04-darkwatch-attack-surface.md`
- [ ] 05 — Periodic scan scheduling, WebSocket progress, alert deduplication → `05-darkwatch-scheduler.md`
- [ ] 06 — Twilio Lookup and phone reputation API integration → `06-spamshield-reputation.md`
- [ ] 07 — Fine-tuned DistilBERT SMS spam classifier with ONNX deployment → `07-spamshield-ml-classifier.md`
### Phase 3 — Scale & Expand
- [ ] 08 — Expand broker coverage to 50+ with CAPTCHA solving → `08-removebrokers-50-plus.md`
- [ ] 09 — Attom Data Solutions API for property record snapshots → `09-hometitle-attom-api.md`
- [ ] 10 — County recorder web scrapers for top 100 US counties → `10-hometitle-county-scrapers.md`
- [ ] 11 — Azure Voice Live API for synthetic voice detection → `11-voiceprint-azure-api.md`
### Phase 4 — Differentiation & Polish
- [ ] 12 — iOS CallKit and Android Telecom API for real-time call analysis → `12-voiceprint-mobile-integration.md`
- [ ] 13 — Cross-service threat correlation scoring and unified alert feed → `13-correlation-engine.md`
- [ ] 14 — Family plan member management, billing proration, multi-user dashboard → `14-family-plans.md`
## Dependencies
- 02 → 08 (expand broker automation after initial 20 work)
- 03 → 04 → 05 (HIBP before attack surface APIs before scheduling)
- 06 → 07 (reputation APIs before ML classifier)
- 09 → 10 (Attom API before county scraping fallback)
- 11 → 12 (Azure API before mobile integration)
- 01 → 14 (billing before family plan management)
- 05, 07, 08 → 13 (core services feed into correlation engine)
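The dependency list above defines a partial order over the 14 tasks. As a sketch, a topological sort (Kahn's algorithm) yields one valid execution order and detects accidental cycles; `TASK_EDGES` is transcribed from the list above, while `topoOrder` is a hypothetical helper and not part of the repository.

```typescript
// Edges copied from the Dependencies list: [prerequisite, dependent].
const TASK_EDGES: Array<[string, string]> = [
  ["02", "08"], ["03", "04"], ["04", "05"], ["06", "07"], ["09", "10"],
  ["11", "12"], ["01", "14"], ["05", "13"], ["07", "13"], ["08", "13"],
];

// Kahn's algorithm: every prerequisite appears before its dependents.
function topoOrder(nodes: string[], edges: Array<[string, string]>): string[] {
  const indegree = new Map<string, number>();
  const next = new Map<string, string[]>();
  nodes.forEach((n) => {
    indegree.set(n, 0);
    next.set(n, []);
  });
  edges.forEach(([from, to]) => {
    next.get(from)?.push(to);
    indegree.set(to, (indegree.get(to) ?? 0) + 1);
  });

  // Start with tasks that have no prerequisites.
  const ready = nodes.filter((n) => indegree.get(n) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const n = ready.shift() as string;
    order.push(n);
    (next.get(n) ?? []).forEach((m) => {
      const d = (indegree.get(m) ?? 0) - 1;
      indegree.set(m, d);
      if (d === 0) ready.push(m);
    });
  }
  // Any task left unprocessed is part of a cycle.
  if (order.length !== nodes.length) throw new Error("dependency cycle detected");
  return order;
}
```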
## Exit Criteria
- All 5 core services make real API calls or run real ML inference — no stub responses remain in production code
- Billing supports Stripe Checkout, webhooks, tier upgrades/downgrades, and trial periods
- A paying customer can sign up, receive real alerts, and see tangible value within 48 hours
- Mobile apps display real data from all working services
- No `crypto.randomUUID()`, `isSynthetic: false`, `isSpam: false`, or `Unknown Owner` mock responses in production paths
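The last exit criterion could be checked mechanically with a simple text scan. The sketch below is an assumption about how such a check might work: `findStubMarkers` is hypothetical, it matches the four marker strings literally, and it would report legitimate uses of the same strings as well, so hits need manual review.

```typescript
// Marker strings taken from the exit criterion above.
const STUB_MARKERS: string[] = [
  "crypto.randomUUID()",
  "isSynthetic: false",
  "isSpam: false",
  "Unknown Owner",
];

interface StubHit {
  line: number; // 1-based line number within the scanned source
  marker: string;
}

// Report every line of a source file that contains a known stub marker.
function findStubMarkers(source: string): StubHit[] {
  const hits: StubHit[] = [];
  source.split("\n").forEach((text, index) => {
    STUB_MARKERS.forEach((marker) => {
      if (text.indexOf(marker) !== -1) hits.push({ line: index + 1, marker });
    });
  });
  return hits;
}
```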


@@ -7,10 +7,10 @@ Status legend: [ ] todo, [~] in-progress, [x] done
## Tasks
### App Store Preparation
- [ ] 01 — App Store Screenshots & Metadata → `01-app-store-screenshots.md`
- [ ] 02 — App Preview Video → `02-app-preview-video.md`
- [ ] 03 — App Store Connect Configuration → `03-app-store-connect.md`
- [ ] 04 — TestFlight Beta Distribution → `04-testflight-beta.md`
- [x] 01 — App Store Screenshots & Metadata → `01-app-store-screenshots.md`
- [x] 02 — App Preview Video → `02-app-preview-video.md`
- [x] 03 — App Store Connect Configuration → `03-app-store-connect.md`
- [x] 04 — TestFlight Beta Distribution → `04-testflight-beta.md`
### Security Hardening
- [ ] 05 — Certificate Pinning & TLS Validation → `05-certificate-pinning.md`