Kordant/tasks/core-services-implementation/08-removebrokers-50-plus.md
2026-05-31 22:03:18 -04:00


08. Expand Broker Coverage to 50+ with CAPTCHA Solving and Re-Scan Pipeline

meta:
  id: core-services-08
  feature: core-services-implementation
  priority: P2
  depends_on: [core-services-02]
  tags: [removebrokers, automation, captcha, scaling, maintenance]

objective:

  • Scale from the top 20 brokers to 50+ brokers with automated removals, implement CAPTCHA solving, and build a re-scan pipeline that detects re-listings.

deliverables:

  • 30+ additional broker adapters (total 50+)
  • CAPTCHA solving integration (2Captcha or AntiCaptcha API)
  • Re-scan scheduler that checks if removed profiles have reappeared
  • Email verification handling for opt-out confirmation emails
  • Removal success rate dashboard metric

steps:

  1. Select the next 30 brokers from the registry by opt-out complexity, starting with medium-difficulty, form-based flows
  2. Create adapter modules for each broker in removebrokers/adapters/
  3. Implement CAPTCHA solving:
    • Detect reCAPTCHA v2/v3, hCaptcha, image challenges
    • Integrate the 2Captcha API ($0.001 to $0.01 per solve)
    • Add CAPTCHA_SOLVER_API_KEY to environment config
    • Fall back to the manual queue if CAPTCHA solving fails 3 times
  4. Implement email verification handling:
    • Monitor mailbox for opt-out confirmation emails
    • Parse confirmation links and auto-click them
    • Store confirmation status in database
  5. Build re-scan pipeline:
    • Weekly scheduled job that re-scans all "completed" removals
    • If profile reappears, create new removal request automatically
    • Track re-listing rate per broker (some re-list every 30 days)
  6. Add success metrics:
    • Track removal success rate per broker (% of opt-outs that stick)
    • Dashboard widget showing "X of Y brokers removed"
    • Alert the user when a re-listing is detected
  7. Implement proxy rotation pool:
    • Use residential proxy service (BrightData, IPRoyal)
    • Rotate IP per broker session to avoid blocks
    • Budget $1K to $3K/mo for proxy infrastructure
  8. Add adapter health monitoring:
    • Track adapter breakage rate
    • Alert engineering when >5% of adapters fail in 24h
    • Auto-disable broken adapters, queue for manual fix
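The detection bullet in step 3 could start from a simple classifier over the fetched page HTML. This is a sketch under assumptions: `detectCaptcha` and its return shape are hypothetical names, not part of the existing adapters. The markers it checks (the `h-captcha` and `g-recaptcha` container classes, the `data-sitekey` attribute, and reCAPTCHA v3's `api.js?render=<siteKey>` loader) follow the widgets' public embed conventions; the image-challenge branch is only a rough heuristic.

```typescript
// Hypothetical CAPTCHA classifier for step 3 (names are illustrative).
type CaptchaKind = "recaptcha_v2" | "recaptcha_v3" | "hcaptcha" | "image" | "none";

interface CaptchaDetection {
  kind: CaptchaKind;
  siteKey?: string; // needed when submitting the challenge to a solver service
}

function detectCaptcha(html: string): CaptchaDetection {
  const siteKey = html.match(/data-sitekey=["']([^"']+)["']/)?.[1];

  // hCaptcha: widget container class or the js.hcaptcha.com loader script.
  if (/class=["'][^"']*\bh-captcha\b/.test(html) || /hcaptcha\.com\/1\/api\.js/.test(html)) {
    return { kind: "hcaptcha", siteKey };
  }

  // reCAPTCHA v3 loads api.js?render=<siteKey>; v2 uses render=explicit or omits it.
  // The pattern covers both google.com/recaptcha and recaptcha.net/recaptcha loaders.
  const v3 = html.match(/recaptcha\/api\.js\?[^"']*render=([^&"']+)/);
  if (v3 && v3[1] !== "explicit") {
    return { kind: "recaptcha_v3", siteKey: v3[1] };
  }

  // reCAPTCHA v2 checkbox or invisible widget.
  if (/class=["'][^"']*\bg-recaptcha\b/.test(html) || /recaptcha\/api\.js/.test(html)) {
    return { kind: "recaptcha_v2", siteKey };
  }

  // Heuristic only: an <img> whose src or alt mentions "captcha".
  if (/<img[^>]+(src|alt)=["'][^"']*captcha/i.test(html)) {
    return { kind: "image" };
  }
  return { kind: "none" };
}
```

Returning the site key alongside the kind keeps the adapter from re-parsing the page when it hands the challenge to the solver.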
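For step 4, the parsing half of "parse confirmation links and auto-click them" might look like the sketch below. It assumes a hypothetical per-broker `ConfirmationRule` (allowed sender hosts plus path keywords) so that unsubscribe or marketing links in the same email are never followed; the mailbox polling and the request that actually follows the link are left to the adapter.

```typescript
// Hypothetical per-broker rule; real values would be tuned per broker's emails.
interface ConfirmationRule {
  brokerId: string;
  allowedHosts: string[]; // domains the broker sends confirmation links from
  pathKeywords: string[]; // e.g. "confirm", "verify"
}

function extractConfirmationLinks(body: string, rule: ConfirmationRule): string[] {
  // Match http(s) URLs in plain-text or HTML bodies, stopping at quotes, angle brackets, whitespace.
  const candidates = body.match(/https?:\/\/[^\s"'<>]+/g) ?? [];
  const found = new Set<string>();
  for (const raw of candidates) {
    // Drop sentence punctuation glued to the end of a plain-text URL,
    // then undo HTML entity escaping inside href attributes.
    const cleaned = raw.replace(/[.,;:)]+$/, "").replace(/&amp;/g, "&");
    let url: URL;
    try {
      url = new URL(cleaned);
    } catch {
      continue; // not a parseable URL
    }
    const hostOk = rule.allowedHosts.some(
      (h) => url.hostname === h || url.hostname.endsWith(`.${h}`),
    );
    const target = (url.pathname + url.search).toLowerCase();
    const keywordOk = rule.pathKeywords.some((k) => target.includes(k));
    if (hostOk && keywordOk) found.add(url.toString());
  }
  return Array.from(found);
}
```

Requiring both a host match and a keyword match is a deliberately conservative choice: a missed confirmation only delays the removal, while following the wrong link could subscribe the user to something.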
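Step 6's two numbers can be derived from removal records. This sketch assumes a hypothetical status model and defines "success" as a removal that reached `completed` and was not later re-listed, counted only over removals with a final outcome; the real schema and definition may differ.

```typescript
// Hypothetical status model for step 6.
type RemovalStatus = "pending" | "submitted" | "completed" | "relisted" | "failed";

interface RemovalRecord {
  userId: string;
  brokerId: string;
  status: RemovalStatus;
}

// Per-broker success rate: completed (still removed) / all resolved removals.
// In-flight removals are skipped so a busy week does not drag the rate down.
function successRateByBroker(records: RemovalRecord[]): Map<string, number> {
  const counts = new Map<string, { stuck: number; resolved: number }>();
  for (const r of records) {
    if (r.status === "pending" || r.status === "submitted") continue;
    const c = counts.get(r.brokerId) ?? { stuck: 0, resolved: 0 };
    c.resolved += 1;
    if (r.status === "completed") c.stuck += 1;
    counts.set(r.brokerId, c);
  }
  const rates = new Map<string, number>();
  counts.forEach((c, brokerId) => rates.set(brokerId, c.stuck / c.resolved));
  return rates;
}

// Dashboard copy for one user, e.g. "47 of 50 brokers removed".
function brokerProgressLabel(records: RemovalRecord[], userId: string): string {
  const mine = records.filter((r) => r.userId === userId);
  const brokers = new Set(mine.map((r) => r.brokerId));
  const removed = new Set(mine.filter((r) => r.status === "completed").map((r) => r.brokerId));
  return `${removed.size} of ${brokers.size} brokers removed`;
}
```

Counting brokers through sets means a user with both an old re-listed record and a fresh completed one for the same broker is still shown that broker once.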
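Step 7's "rotate IP per broker session" could be handled by a small pool wrapper like the one below. It is a sketch under assumptions: `ProxyRotator` is a hypothetical class, the endpoint strings stand in for whatever the residential proxy provider issues, and nothing here models a BrightData or IPRoyal API.

```typescript
// Hypothetical rotation pool: each broker walks the endpoint list round-robin,
// so consecutive sessions against the same broker exit from different IPs.
class ProxyRotator {
  private readonly endpoints: string[];
  private readonly nextIndexByBroker = new Map<string, number>();
  private readonly blocked = new Set<string>(); // "brokerId|endpoint" pairs

  constructor(endpoints: string[]) {
    if (endpoints.length === 0) throw new Error("proxy pool is empty");
    this.endpoints = endpoints.slice();
  }

  // Pick the endpoint for a new session with `brokerId`, skipping any endpoint
  // that broker has already blocked.
  acquire(brokerId: string): string {
    const start = this.nextIndexByBroker.get(brokerId) ?? 0;
    for (let i = 0; i < this.endpoints.length; i++) {
      const idx = (start + i) % this.endpoints.length;
      const endpoint = this.endpoints[idx];
      if (!this.blocked.has(`${brokerId}|${endpoint}`)) {
        this.nextIndexByBroker.set(brokerId, (idx + 1) % this.endpoints.length);
        return endpoint;
      }
    }
    throw new Error(`no unblocked proxy left for ${brokerId}`);
  }

  // A block is tracked per broker: one broker banning an IP does not
  // remove it from rotation for every other broker.
  markBlocked(brokerId: string, endpoint: string): void {
    this.blocked.add(`${brokerId}|${endpoint}`);
  }
}
```

Keeping a separate cursor per broker spreads each broker's traffic across the whole pool, which matters most for the high-volume brokers named in the acceptance criteria.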
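The step 8 rule (alert engineering when more than 5% of adapters fail in 24 hours, and auto-disable the broken ones) needs a working definition of "broken". The sketch below assumes one: at least `minRuns` runs in the window with a failure ratio of at least `maxFailureRatio`. `assessAdapterHealth`, `AdapterRun`, and both thresholds are hypothetical; only the 24-hour window and the 5% alert line come from this task.

```typescript
interface AdapterRun {
  adapterId: string;
  at: number; // epoch milliseconds
  ok: boolean;
}

interface HealthReport {
  broken: string[]; // adapters to auto-disable and queue for a manual fix
  alertEngineering: boolean;
}

const WINDOW_MS = 24 * 60 * 60 * 1000;

function assessAdapterHealth(
  runs: AdapterRun[],
  totalAdapters: number,
  now: number,
  minRuns = 3, // assumption: ignore adapters with too few runs to judge
  maxFailureRatio = 0.5, // assumption: half or more failures counts as broken
): HealthReport {
  const stats = new Map<string, { runs: number; failures: number }>();
  for (const run of runs) {
    if (now - run.at > WINDOW_MS) continue; // outside the 24h window
    const s = stats.get(run.adapterId) ?? { runs: 0, failures: 0 };
    s.runs += 1;
    if (!run.ok) s.failures += 1;
    stats.set(run.adapterId, s);
  }
  const broken: string[] = [];
  stats.forEach((s, adapterId) => {
    if (s.runs >= minRuns && s.failures / s.runs >= maxFailureRatio) broken.push(adapterId);
  });
  // Step 8 threshold: alert when more than 5% of all adapters broke in the window.
  return { broken, alertEngineering: broken.length / totalAdapters > 0.05 };
}
```

With 50 adapters the 5% line sits at 2.5, so two broken adapters are disabled quietly and a third triggers the engineering alert.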

tests:

  • Unit: Mock CAPTCHA solver, verify retry and fallback logic
  • Integration: Test CAPTCHA solving against a real broker site
  • E2E: Complete removal for broker with CAPTCHA → verify re-scan detects re-listing
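The unit test above targets the retry-then-escalate rule from step 3. A minimal version of that rule, written against a hypothetical `CaptchaSolver` interface (a 2Captcha or AntiCaptcha client would sit behind it; this is not either service's API), could look like this:

```typescript
// Hypothetical solver abstraction so tests can inject a mock.
interface CaptchaSolver {
  solve(siteKey: string, pageUrl: string): Promise<string>; // resolves to a token
}

type SolveOutcome =
  | { status: "solved"; token: string; attempts: number }
  | { status: "manual"; attempts: number; lastError: string };

const MAX_ATTEMPTS = 3;

async function solveOrEscalate(
  solver: CaptchaSolver,
  siteKey: string,
  pageUrl: string,
  enqueueManual: (pageUrl: string, reason: string) => Promise<void>, // hypothetical manual-queue hook
): Promise<SolveOutcome> {
  let lastError = "";
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      const token = await solver.solve(siteKey, pageUrl);
      return { status: "solved", token, attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  // Three failed solves: hand the removal to the manual queue instead of retrying.
  await enqueueManual(pageUrl, lastError);
  return { status: "manual", attempts: MAX_ATTEMPTS, lastError };
}
```

A mock that fails twice and then returns a token exercises the retry path; one that always throws exercises the escalation path.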

acceptance_criteria:

  • 50+ broker adapters implemented and tested
  • CAPTCHA challenges are detected and solved automatically (2Captcha integration)
  • Failed CAPTCHA solving escalates to manual queue after 3 attempts
  • Email confirmation links are parsed and clicked automatically
  • Re-scan job runs weekly and detects re-listings within 7 days
  • Re-listed profiles trigger automatic new removal requests
  • Dashboard shows accurate removal progress: "47 of 50 brokers completed"
  • Per-broker success rate is tracked and visible in admin panel
  • Proxy rotation prevents IP blocking on high-volume brokers
  • Adapter breakage is detected within 24 hours and auto-disabled
  • Monthly proxy + CAPTCHA cost per user < $4 (within gross margin target)
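The last criterion is simple arithmetic over the budgets stated in steps 3 and 7. The helpers below are hypothetical and their inputs are placeholders for real billing data; they only show how the per-user figure and the user base needed to absorb the fixed proxy bill follow from those budgets.

```typescript
interface MonthlyCostInputs {
  proxySpendUsd: number; // fixed proxy bill (step 7 budgets $1K to $3K/mo)
  captchaSolves: number; // total solves across all users this month
  costPerSolveUsd: number; // step 3 quotes $0.001 to $0.01 per solve
  activeUsers: number;
}

// Combined proxy + CAPTCHA spend divided across active users.
function costPerUser(c: MonthlyCostInputs): number {
  if (c.activeUsers <= 0) throw new Error("activeUsers must be positive");
  return (c.proxySpendUsd + c.captchaSolves * c.costPerSolveUsd) / c.activeUsers;
}

// Smallest user count n with proxySpendUsd / n strictly below the target.
function minUsersUnderTarget(proxySpendUsd: number, targetPerUserUsd = 4): number {
  return Math.floor(proxySpendUsd / targetPerUserUsd) + 1;
}
```

At the top of the proxy budget, `minUsersUnderTarget(3000)` is 751: the proxy bill alone stays under $4 per user only above 750 active users, before any CAPTCHA spend is added.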

validation:

  • Run vitest run removebrokers.service.test.ts, with tests extended to cover all 50+ brokers
  • Manual: Test a CAPTCHA-protected broker (e.g., MyLife) and verify that automatic solving works
  • Check re-scan: Run bun run job:removebrokers:rescan and verify that re-listings are detected
  • Monitor costs: Dashboard shows monthly proxy/CAPTCHA spend per customer
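The re-scan pass from step 5, which the job:removebrokers:rescan check exercises, might reduce to the loop below. It is a sketch under assumptions: `rescanCompletedRemovals` and the `CompletedRemoval` shape are hypothetical, `isListed` stands in for the broker adapter's search, and `requeue` for whatever creates a new removal request.

```typescript
// Hypothetical shape; the real schema lives in the removebrokers service.
interface CompletedRemoval {
  id: string;
  userId: string;
  brokerId: string;
}

interface RescanResult {
  checked: number;
  relisted: CompletedRemoval[];
  relistRateByBroker: Map<string, number>;
}

// One weekly pass over every completed removal. Checks run sequentially to
// keep per-broker request rates low.
async function rescanCompletedRemovals(
  removals: CompletedRemoval[],
  isListed: (r: CompletedRemoval) => Promise<boolean>,
  requeue: (r: CompletedRemoval) => Promise<void>,
): Promise<RescanResult> {
  const relisted: CompletedRemoval[] = [];
  const totals = new Map<string, { checked: number; relisted: number }>();
  for (const r of removals) {
    const t = totals.get(r.brokerId) ?? { checked: 0, relisted: 0 };
    t.checked += 1;
    if (await isListed(r)) {
      t.relisted += 1;
      relisted.push(r);
      await requeue(r); // profile reappeared: open a fresh removal request
    }
    totals.set(r.brokerId, t);
  }
  const relistRateByBroker = new Map<string, number>();
  totals.forEach((t, brokerId) => relistRateByBroker.set(brokerId, t.relisted / t.checked));
  return { checked: removals.length, relisted, relistRateByBroker };
}
```

The per-broker re-listing rate returned here is the input for spotting brokers that re-list on a fixed cycle, such as the ones that re-list every 30 days.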

notes:

  • Broker sites change frequently — budget 20% engineering time for adapter maintenance
  • Some brokers (Acxiom, Epsilon) require physical mail — flag these for manual processing
  • Re-listing is common: data brokers rebuild their databases from public records every 30 to 90 days
  • Consider AI-assisted form field detection (GPT-4 Vision) to reduce per-adapter development time
  • The existing broker.registry.ts already has 100+ entries — prioritize by traffic/popularity
  • Success rate target: 80%+ for automated removals, 90%+ with manual fallback
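Step 1 and the notes about broker.registry.ts and physical-mail brokers combine into a batch-planning rule: take registry entries without an adapter, set aside physical-mail brokers for manual processing, keep medium-difficulty form flows, and order them by traffic. The `BrokerEntry` fields below are assumptions; the real registry's fields may differ, and `planNextBatch` is a hypothetical helper.

```typescript
// Hypothetical registry entry shape.
interface BrokerEntry {
  id: string;
  optOutMethod: "form" | "email" | "physical_mail";
  difficulty: "easy" | "medium" | "hard";
  monthlyVisits: number; // traffic/popularity signal
  hasAdapter: boolean;
}

interface BatchPlan {
  automate: BrokerEntry[]; // next adapters to build
  manual: BrokerEntry[]; // physical-mail brokers flagged for manual processing
}

function planNextBatch(registry: BrokerEntry[], batchSize = 30): BatchPlan {
  const pending = registry.filter((b) => !b.hasAdapter);
  const manual = pending.filter((b) => b.optOutMethod === "physical_mail");
  const automate = pending
    .filter((b) => b.optOutMethod === "form" && b.difficulty === "medium")
    .sort((a, b) => b.monthlyVisits - a.monthlyVisits) // most-visited first
    .slice(0, batchSize);
  return { automate, manual };
}
```

Ordering by traffic within the medium-difficulty band keeps both prioritization signals from this task: complexity decides which brokers are eligible, popularity decides which are built first.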