08. Expand Broker Coverage to 50+ with CAPTCHA Solving and Re-Scan Pipeline
meta:
- id: core-services-08
- feature: core-services-implementation
- priority: P2
- depends_on: [core-services-02]
- tags: [removebrokers, automation, captcha, scaling, maintenance]
objective:
- Scale automated removals from the top 20 brokers to 50+, implement CAPTCHA solving, and build a re-scan pipeline that detects re-listings.
deliverables:
- 30+ additional broker adapters (total 50+)
- CAPTCHA solving integration (2Captcha or AntiCaptcha API)
- Re-scan scheduler that checks if removed profiles have reappeared
- Email verification handling for opt-out confirmation emails
- Removal success rate dashboard metric
steps:
- Select the next 30 brokers from the registry by opt-out complexity (medium-difficulty, form-based flows)
- Create an adapter module for each broker in `removebrokers/adapters/`
- Implement CAPTCHA solving:
- Detect reCAPTCHA v2/v3, hCaptcha, and image challenges
- Integrate the 2Captcha API ($0.001–$0.01 per solve)
- Add `CAPTCHA_SOLVER_API_KEY` to the environment config
- Fall back to the manual queue if CAPTCHA solving fails 3 times
- Implement email verification handling:
- Monitor mailbox for opt-out confirmation emails
- Parse confirmation links and auto-click them
- Store confirmation status in database
- Build re-scan pipeline:
- Weekly scheduled job that re-scans all "completed" removals
- If profile reappears, create new removal request automatically
- Track re-listing rate per broker (some re-list every 30 days)
- Add success metrics:
- Track removal success rate per broker (% of opt-outs that stick)
- Dashboard widget showing "X of Y brokers removed"
- Alert user when re-listing detected
- Implement proxy rotation pool:
- Use residential proxy service (BrightData, IPRoyal)
- Rotate IP per broker session to avoid blocks
- Budget $1K–$3K/mo for proxy infrastructure
- Add adapter health monitoring:
- Track adapter breakage rate
- Alert engineering when >5% of adapters fail in 24h
- Auto-disable broken adapters, queue for manual fix
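The CAPTCHA detection step could begin as a heuristic pass over the opt-out page's HTML. The sketch below is illustrative only: `detectCaptcha`, `siteKeyForTag`, and `CaptchaDetection` are hypothetical names, and it assumes reCAPTCHA v2 and hCaptcha widgets appear as elements carrying the `g-recaptcha` / `h-captcha` class with a `data-sitekey` attribute, that reCAPTCHA v3 is loaded via `api.js?render=<siteKey>`, and that any `<img>` tag mentioning "captcha" is an image challenge.

```typescript
// Hypothetical result shape for CAPTCHA detection on a broker opt-out page.
type CaptchaKind = "recaptcha_v2" | "recaptcha_v3" | "hcaptcha" | "image" | "none";

interface CaptchaDetection {
  kind: CaptchaKind;
  siteKey?: string; // widget site key, when it can be read from the markup
}

// Returns the data-sitekey of the first tag whose markup contains `marker`.
// `marker` is assumed to be regex-safe (e.g. "g-recaptcha").
function siteKeyForTag(html: string, marker: string): string | undefined {
  const tag = html.match(new RegExp(`<[^>]*\\b${marker}\\b[^>]*>`));
  return tag?.[0].match(/data-sitekey=["']([^"']+)["']/)?.[1];
}

// Heuristic detection from static HTML; widgets injected later by scripts
// would need a rendered DOM instead.
function detectCaptcha(html: string): CaptchaDetection {
  // Assumed marker: reCAPTCHA v3 loads api.js with ?render=<siteKey>;
  // v2 in explicit mode uses render=explicit, so that value is skipped.
  const v3 = html.match(/recaptcha\/api\.js\?[^"'>]*render=([\w-]+)/);
  if (v3 && v3[1] !== "explicit") {
    return { kind: "recaptcha_v3", siteKey: v3[1] };
  }
  if (/\bh-captcha\b/.test(html)) {
    return { kind: "hcaptcha", siteKey: siteKeyForTag(html, "h-captcha") };
  }
  if (/\bg-recaptcha\b/.test(html)) {
    return { kind: "recaptcha_v2", siteKey: siteKeyForTag(html, "g-recaptcha") };
  }
  // Crude assumption: an <img> mentioning "captcha" is an image challenge.
  if (/<img[^>]*captcha[^>]*>/i.test(html)) {
    return { kind: "image" };
  }
  return { kind: "none" };
}
```

The returned `siteKey` is what a token-based solving service generally needs alongside the page URL; the image branch carries no key, so an adapter would have to capture the image itself.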
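For the email verification step, link extraction can be kept separate from mailbox monitoring so that it is testable on its own. A minimal sketch, assuming message bodies have already been fetched (for example over IMAP) and that confirmation URLs live on the broker's own domain or a subdomain; `extractConfirmationLinks` and the keyword list in `CONFIRM_HINT` are hypothetical and would need tuning per broker.

```typescript
// Keywords that suggest an opt-out confirmation link (assumed; tune per broker).
const CONFIRM_HINT = /confirm|verify|opt-?out|unsubscribe|remove/i;

// Returns de-duplicated links in an email body that point at the broker's
// domain (or a subdomain of it) and look like confirmation URLs.
function extractConfirmationLinks(body: string, brokerDomain: string): string[] {
  const found = new Set<string>();
  const candidates = body.match(/https?:\/\/[^\s"'<>]+/g) ?? [];
  for (const raw of candidates) {
    // Drop punctuation that ends the sentence rather than the URL.
    const cleaned = raw.replace(/[.,;:!?)]+$/, "");
    let url: URL;
    try {
      url = new URL(cleaned);
    } catch {
      continue; // not a parseable URL
    }
    const host = url.hostname.toLowerCase();
    const onBroker = host === brokerDomain || host.endsWith(`.${brokerDomain}`);
    if (onBroker && CONFIRM_HINT.test(url.pathname + url.search)) {
      found.add(url.toString());
    }
  }
  return Array.from(found);
}
```

Restricting matches to the broker's domain keeps the auto-click step from following tracking or lookalike links that happen to contain "confirm". HTML bodies may also need entity decoding (e.g. `&amp;` in query strings) before parsing.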
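One way to read "rotate IP per broker session" is a sticky assignment: a session keeps one exit IP from start to finish, and the next session gets a different one. The sketch below assumes a locally managed list of proxy endpoints; `ProxyRotator`, `acquire`, and `release` are hypothetical, and the 10-minute cooldown for blocked proxies is an arbitrary placeholder. If the chosen provider already rotates IPs behind a single gateway, much of this logic may move to the provider's side.

```typescript
interface ProxyEndpoint {
  host: string;
  port: number;
}

// Hypothetical local pool: each new broker session gets the next proxy in
// round-robin order and keeps it until released. Proxies reported as
// blocked are skipped until their cooldown expires.
class ProxyRotator {
  private readonly pool: ProxyEndpoint[];
  private readonly cooldownMs: number;
  private nextIndex = 0;
  private readonly sessions = new Map<string, ProxyEndpoint>();
  private readonly coolingUntil = new Map<ProxyEndpoint, number>();

  constructor(pool: ProxyEndpoint[], cooldownMs = 10 * 60 * 1000) {
    if (pool.length === 0) throw new Error("proxy pool is empty");
    this.pool = pool;
    this.cooldownMs = cooldownMs;
  }

  acquire(sessionId: string, now = Date.now()): ProxyEndpoint {
    const sticky = this.sessions.get(sessionId);
    if (sticky) return sticky; // same IP for the whole broker session
    for (let i = 0; i < this.pool.length; i++) {
      const idx = (this.nextIndex + i) % this.pool.length;
      const candidate = this.pool[idx];
      if ((this.coolingUntil.get(candidate) ?? 0) <= now) {
        this.nextIndex = (idx + 1) % this.pool.length;
        this.sessions.set(sessionId, candidate);
        return candidate;
      }
    }
    throw new Error("every proxy in the pool is cooling down");
  }

  // Ends a session; if the broker blocked this IP, put it on cooldown.
  release(sessionId: string, blocked = false, now = Date.now()): void {
    const proxy = this.sessions.get(sessionId);
    if (!proxy) return;
    this.sessions.delete(sessionId);
    if (blocked) this.coolingUntil.set(proxy, now + this.cooldownMs);
  }
}
```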
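The adapter health rule can be expressed as a pure function over recent run results, which keeps the 5% / 24h threshold easy to test. This is a sketch under one assumption the spec does not state: an adapter counts as broken when its most recent run in the last 24 hours failed. `AdapterRun`, `assessAdapterHealth`, and the report fields are hypothetical names.

```typescript
interface AdapterRun {
  adapterId: string;
  ok: boolean;
  at: number; // epoch milliseconds
}

interface HealthReport {
  broken: string[]; // adapters to auto-disable and queue for a manual fix
  brokenFraction: number;
  alertEngineering: boolean;
}

const WINDOW_MS = 24 * 60 * 60 * 1000;
const ALERT_FRACTION = 0.05; // spec: alert when >5% of adapters fail in 24h

// Assumption: an adapter is broken when its latest run inside the window
// failed; adapters with no runs in the window are not counted as broken.
function assessAdapterHealth(runs: AdapterRun[], adapterIds: string[], now: number): HealthReport {
  const latest = new Map<string, AdapterRun>();
  for (const run of runs) {
    if (now - run.at > WINDOW_MS) continue; // outside the 24h window
    const prev = latest.get(run.adapterId);
    if (!prev || run.at > prev.at) latest.set(run.adapterId, run);
  }
  const broken = adapterIds.filter((id) => latest.get(id)?.ok === false);
  const brokenFraction = adapterIds.length === 0 ? 0 : broken.length / adapterIds.length;
  return { broken, brokenFraction, alertEngineering: brokenFraction > ALERT_FRACTION };
}
```

A caller would disable every adapter in `broken`, queue it for a manual fix, and page engineering only when `alertEngineering` is true.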
tests:
- Unit: Mock CAPTCHA solver, verify retry and fallback logic
- Integration: Test CAPTCHA solving against a real broker site
- E2E: Complete a removal for a broker with a CAPTCHA → verify the re-scan detects a re-listing
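The unit test for retry and fallback logic is easiest to write if the solver is injected. Below is a minimal sketch, assuming a hypothetical `CaptchaSolver` interface that a 2Captcha or AntiCaptcha client would implement (no provider HTTP API is shown); `solveWithFallback` and the `enqueueManual` callback are likewise illustrative. It encodes the rule from the steps: three failed solves send the page to the manual queue.

```typescript
// Hypothetical solver contract; a 2Captcha or AntiCaptcha client would implement it.
interface CaptchaSolver {
  solve(siteKey: string, pageUrl: string): Promise<string>; // resolves to a response token
}

type SolveOutcome =
  | { status: "solved"; token: string; attempts: number }
  | { status: "manual_queue"; attempts: number; lastError: string };

const MAX_SOLVE_ATTEMPTS = 3; // spec: escalate after 3 failed attempts

// Tries the solver up to MAX_SOLVE_ATTEMPTS times, then hands the page to a
// manual-review queue. This sketch does not back off between attempts.
async function solveWithFallback(
  solver: CaptchaSolver,
  siteKey: string,
  pageUrl: string,
  enqueueManual: (pageUrl: string, reason: string) => void,
): Promise<SolveOutcome> {
  let lastError = "";
  for (let attempt = 1; attempt <= MAX_SOLVE_ATTEMPTS; attempt++) {
    try {
      const token = await solver.solve(siteKey, pageUrl);
      return { status: "solved", token, attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  enqueueManual(pageUrl, lastError);
  return { status: "manual_queue", attempts: MAX_SOLVE_ATTEMPTS, lastError };
}
```

In a vitest suite, the mock solver can simply be an object whose `solve` rejects a fixed number of times before resolving. A production version would likely add backoff between attempts and treat permanent errors (such as an invalid API key) differently from transient ones.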
acceptance_criteria:
- 50+ broker adapters implemented and tested
- CAPTCHA challenges are detected and solved automatically (2Captcha integration)
- Failed CAPTCHA solving escalates to manual queue after 3 attempts
- Email confirmation links are parsed and clicked automatically
- Re-scan job runs weekly and detects re-listings within 7 days
- Re-listed profiles trigger automatic new removal requests
- Dashboard shows accurate removal progress: "47 of 50 brokers completed"
- Per-broker success rate is tracked and visible in admin panel
- Proxy rotation prevents IP blocking on high-volume brokers
- Adapter breakage is detected within 24 hours and auto-disabled
- Monthly proxy + CAPTCHA cost per user < $4 (within gross margin target)
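The per-broker success rate and the "47 of 50 brokers completed" label can both be derived from removal records. A sketch, assuming a hypothetical `Removal` record whose `relisted` status is set by the re-scan job; it defines "opt-outs that stick" as completed removals divided by all resolved ones (completed, failed, or relisted), excluding pending requests. `successRateByBroker` and `progressLabel` are illustrative names.

```typescript
type RemovalStatus = "pending" | "completed" | "failed" | "relisted";

interface Removal {
  userId: string;
  brokerId: string;
  removalStatus: RemovalStatus;
}

// Per-broker "stick" rate: completed / (completed + failed + relisted).
// Pending requests are excluded because their outcome is not known yet.
function successRateByBroker(removals: Removal[]): Map<string, number> {
  const tally = new Map<string, { stuck: number; resolved: number }>();
  for (const r of removals) {
    if (r.removalStatus === "pending") continue;
    const t = tally.get(r.brokerId) ?? { stuck: 0, resolved: 0 };
    t.resolved += 1;
    if (r.removalStatus === "completed") t.stuck += 1;
    tally.set(r.brokerId, t);
  }
  const rates = new Map<string, number>();
  tally.forEach((t, brokerId) => rates.set(brokerId, t.stuck / t.resolved));
  return rates;
}

// Dashboard label for one user, e.g. "47 of 50 brokers completed".
function progressLabel(removals: Removal[], userId: string, totalBrokers: number): string {
  const done = new Set<string>();
  for (const r of removals) {
    if (r.userId === userId && r.removalStatus === "completed") done.add(r.brokerId);
  }
  return `${done.size} of ${totalBrokers} brokers completed`;
}
```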
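The cost criterion reduces to arithmetic: (proxy spend + CAPTCHA solves × price per solve) / active users must stay below $4. Even before any CAPTCHA cost, the top of the $1K–$3K proxy budget needs more than 750 active users ($3,000 / 750 = $4.00 exactly, which does not pass), and the bottom needs more than 250. The function below encodes that check; `MonthlyCosts` and `costPerUser` are hypothetical, and the usage numbers that follow are illustrative, not measured.

```typescript
interface MonthlyCosts {
  proxySpendUsd: number; // total residential-proxy bill for the month
  captchaSolves: number; // solves across all users that month
  captchaPricePerSolveUsd: number;
  activeUsers: number;
}

const MAX_COST_PER_USER_USD = 4; // acceptance criterion: < $4 per user per month

// Blended monthly infrastructure cost per active user and whether it meets the target.
function costPerUser(c: MonthlyCosts): { perUserUsd: number; withinTarget: boolean } {
  if (c.activeUsers <= 0) throw new Error("activeUsers must be positive");
  const totalUsd = c.proxySpendUsd + c.captchaSolves * c.captchaPricePerSolveUsd;
  const perUserUsd = totalUsd / c.activeUsers;
  return { perUserUsd, withinTarget: perUserUsd < MAX_COST_PER_USER_USD };
}
```

For example, a hypothetical month with $3,000 of proxy spend, 50,000 solves at $0.01 ($500), and 1,000 active users gives $3,500 / 1,000 = $3.50 per user, which passes.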
validation:
- Run `vitest run removebrokers.service.test.ts` (extended tests covering 50 brokers)
- Manual: Test a CAPTCHA-protected broker (e.g., MyLife) and verify that automatic solving works
- Check re-scan: Run `bun run job:removebrokers:rescan` and verify that re-listings are detected
- Monitor costs: Confirm the dashboard shows monthly proxy/CAPTCHA spend per customer
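The re-scan job could wrap logic like the following. It is a sketch, assuming a hypothetical `ProfileScanner` that reuses an adapter's search step and resolves `true` when the profile is listed again; `rescanCompleted`, the `relisted` status, and the `:retry` id suffix are illustrative. The weekly schedule is left to whatever runs `bun run job:removebrokers:rescan`.

```typescript
interface RemovalRecord {
  id: string;
  userId: string;
  brokerId: string;
  status: "pending" | "completed" | "relisted";
}

// Hypothetical scanner: resolves true if the user's profile is listed again.
type ProfileScanner = (userId: string, brokerId: string) => Promise<boolean>;

interface RescanSummary {
  newRequests: RemovalRecord[]; // fresh removal requests to submit
  relistRateByBroker: Map<string, number>; // relisted / rescanned, per broker
}

// Re-scans every completed removal once; intended to be called by a weekly job.
async function rescanCompleted(records: RemovalRecord[], scan: ProfileScanner): Promise<RescanSummary> {
  const tally = new Map<string, { scanned: number; relisted: number }>();
  const newRequests: RemovalRecord[] = [];
  for (const record of records) {
    if (record.status !== "completed") continue;
    const listedAgain = await scan(record.userId, record.brokerId);
    const t = tally.get(record.brokerId) ?? { scanned: 0, relisted: 0 };
    t.scanned += 1;
    if (listedAgain) {
      t.relisted += 1;
      record.status = "relisted"; // keep history; the new request carries the retry
      newRequests.push({
        id: `${record.id}:retry`, // hypothetical id scheme
        userId: record.userId,
        brokerId: record.brokerId,
        status: "pending",
      });
    }
    tally.set(record.brokerId, t);
  }
  const relistRateByBroker = new Map<string, number>();
  tally.forEach((t, brokerId) => relistRateByBroker.set(brokerId, t.relisted / t.scanned));
  return { newRequests, relistRateByBroker };
}
```

Scans run sequentially here for simplicity; a real job would probably bound concurrency per broker to avoid tripping rate limits. Each entry in `newRequests` is also a natural trigger for the user-facing re-listing alert.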
notes:
- Broker sites change frequently: budget 20% of engineering time for adapter maintenance
- Some brokers (Acxiom, Epsilon) require physical mail — flag these for manual processing
- Re-listing is common — data brokers rebuild databases from public records every 30–90 days
- Consider AI-assisted form field detection (GPT-4 Vision) to reduce per-adapter development time
- The existing `broker.registry.ts` already has 100+ entries; prioritize by traffic/popularity
- Success rate target: 80%+ for automated removals, 90%+ with manual fallback
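Selecting the next batch from `broker.registry.ts` could be a filter plus a sort. The registry's actual schema is not shown here, so `RegistryEntry` and its fields (`optOutMethod`, `difficulty`, `monthlyVisits`, `hasAdapter`) are assumptions, as are `selectNextBrokers` and `manualProcessingQueue`. The sketch follows the notes and steps: prefer medium-difficulty form flows ranked by traffic, and route physical-mail brokers such as Acxiom and Epsilon to manual processing.

```typescript
// Hypothetical shape of a broker.registry.ts entry; the real schema may differ.
interface RegistryEntry {
  id: string;
  optOutMethod: "form" | "email" | "mail" | "phone";
  difficulty: "easy" | "medium" | "hard";
  monthlyVisits: number; // popularity signal used for ranking (assumed field)
  hasAdapter: boolean;
}

// Next batch for automation: medium-difficulty form flows without an
// adapter yet, most-visited first. filter() returns a new array, so the
// sort does not reorder the registry itself.
function selectNextBrokers(registry: RegistryEntry[], count = 30): RegistryEntry[] {
  return registry
    .filter((b) => !b.hasAdapter && b.optOutMethod === "form" && b.difficulty === "medium")
    .sort((x, y) => y.monthlyVisits - x.monthlyVisits)
    .slice(0, count);
}

// Brokers that require physical mail are routed to manual processing instead.
function manualProcessingQueue(registry: RegistryEntry[]): string[] {
  return registry.filter((b) => b.optOutMethod === "mail").map((b) => b.id);
}
```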