# 08. Expand Broker Coverage to 50+ with CAPTCHA Solving and Re-Scan Pipeline

meta:
id: core-services-08
feature: core-services-implementation
priority: P2
depends_on: [core-services-02]
tags: [removebrokers, automation, captcha, scaling, maintenance]

objective:
- Scale automated removals from the top 20 brokers to 50+, implement CAPTCHA solving, and build the re-scan pipeline that detects re-listings.

deliverables:
- 30+ additional broker adapters (total 50+)
- CAPTCHA solving integration (2Captcha or AntiCaptcha API)
- Re-scan scheduler that checks if removed profiles have reappeared
- Email verification handling for opt-out confirmation emails
- Removal success rate dashboard metric

steps:
1. Select next 30 brokers from registry by opt-out complexity (medium-difficulty form-based flows)
2. Create adapter modules for each broker in `removebrokers/adapters/`
3. Implement CAPTCHA solving:
   - Detect reCAPTCHA v2/v3, hCaptcha, and image challenges
   - Integrate 2Captcha API ($0.001–$0.01 per solve)
   - Add `CAPTCHA_SOLVER_API_KEY` to environment config
   - Fall back to manual queue if CAPTCHA solving fails 3 times
4. Implement email verification handling:
   - Monitor mailbox for opt-out confirmation emails
   - Parse confirmation links and auto-click them
   - Store confirmation status in database
5. Build re-scan pipeline:
   - Weekly scheduled job that re-scans all "completed" removals
   - If a profile reappears, create a new removal request automatically
   - Track re-listing rate per broker (some re-list every 30 days)
6. Add success metrics:
   - Track removal success rate per broker (% of opt-outs that stick)
   - Dashboard widget showing "X of Y brokers removed"
   - Alert the user when a re-listing is detected
7. Implement proxy rotation pool:
   - Use a residential proxy service (e.g., Bright Data, IPRoyal)
   - Rotate IP per broker session to avoid blocks
   - Budget $1K–$3K/mo for proxy infrastructure
8. Add adapter health monitoring:
   - Track adapter breakage rate
   - Alert engineering when >5% of adapters fail in 24h
   - Auto-disable broken adapters and queue them for manual fix
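The health check in step 8 could be sketched roughly as follows. This is a minimal sketch, not the existing implementation: `AdapterRun`, `HealthReport`, and `evaluateAdapterHealth` are hypothetical names, and it assumes an adapter counts as broken when it ran in the last 24 hours and none of those runs succeeded. The ratio is taken over adapters that actually ran in the window, which is also an assumption.

```typescript
// Hypothetical record of one adapter execution; field names are assumptions.
interface AdapterRun {
  brokerId: string;
  succeeded: boolean;
  finishedAt: number; // epoch milliseconds
}

interface HealthReport {
  brokenAdapters: string[]; // candidates for auto-disable and the manual-fix queue
  failureRatio: number; // broken adapters / adapters that ran in the window
  alertEngineering: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Assumption: an adapter is "broken" when it ran at least once in the last
// 24 hours and none of those runs succeeded.
function evaluateAdapterHealth(
  runs: AdapterRun[],
  now: number,
  alertThreshold: number = 0.05,
): HealthReport {
  const tallies = new Map<string, { ok: number; failed: number }>();
  for (const run of runs) {
    const age = now - run.finishedAt;
    if (age < 0 || age > DAY_MS) continue; // outside the 24h window
    const tally = tallies.get(run.brokerId) ?? { ok: 0, failed: 0 };
    if (run.succeeded) tally.ok += 1;
    else tally.failed += 1;
    tallies.set(run.brokerId, tally);
  }

  const brokenAdapters: string[] = [];
  tallies.forEach((tally, brokerId) => {
    if (tally.ok === 0) brokenAdapters.push(brokerId);
  });

  const failureRatio =
    tallies.size === 0 ? 0 : brokenAdapters.length / tallies.size;
  return {
    brokenAdapters,
    failureRatio,
    alertEngineering: failureRatio > alertThreshold, // ">5% of adapters fail in 24h"
  };
}
```

A scheduled job could call this with the last day of run records, disable every entry in `brokenAdapters`, and page engineering when `alertEngineering` is true.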

tests:
- Unit: Mock CAPTCHA solver, verify retry and fallback logic
- Integration: Test CAPTCHA solving against a real broker site
- E2E: Complete removal for a broker with CAPTCHA → verify re-scan detects re-listing
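The retry and fallback logic that the unit test targets might look like the sketch below. All names (`CaptchaSolver`, `solveWithFallback`, `enqueueManual`) are hypothetical, and the solver is injected so a test can pass a mock; a real solver would wrap the 2Captcha or AntiCaptcha HTTP API, which is deliberately not shown here.

```typescript
// All names below are hypothetical illustrations, not existing project code.
type CaptchaKind = "recaptcha-v2" | "recaptcha-v3" | "hcaptcha" | "image";

interface CaptchaChallenge {
  kind: CaptchaKind;
  pageUrl: string;
}

// A solver resolves with a token, or rejects when solving fails.
type CaptchaSolver = (challenge: CaptchaChallenge) => Promise<string>;

type SolveOutcome =
  | { status: "solved"; token: string; attempts: number }
  | { status: "manual_queue"; attempts: number; lastError: string };

const MAX_CAPTCHA_ATTEMPTS = 3;

// Tries the solver up to three times, then hands the challenge to the
// manual queue via the injected callback.
async function solveWithFallback(
  challenge: CaptchaChallenge,
  solver: CaptchaSolver,
  enqueueManual: (challenge: CaptchaChallenge) => void,
): Promise<SolveOutcome> {
  let lastError = "unknown error";
  for (let attempt = 1; attempt <= MAX_CAPTCHA_ATTEMPTS; attempt++) {
    try {
      const token = await solver(challenge);
      return { status: "solved", token, attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  enqueueManual(challenge);
  return { status: "manual_queue", attempts: MAX_CAPTCHA_ATTEMPTS, lastError };
}
```

In a unit test, a mock solver that rejects twice and then resolves should yield `status: "solved"` with `attempts: 3`, while a solver that always rejects should produce exactly one manual-queue entry.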

acceptance_criteria:
- [ ] 50+ broker adapters implemented and tested
- [ ] CAPTCHA challenges are detected and solved automatically (2Captcha integration)
- [ ] Failed CAPTCHA solving escalates to manual queue after 3 attempts
- [ ] Email confirmation links are parsed and clicked automatically
- [ ] Re-scan job runs weekly and detects re-listings within 7 days
- [ ] Re-listed profiles trigger automatic new removal requests
- [ ] Dashboard shows accurate removal progress, e.g., "47 of 50 brokers completed"
- [ ] Per-broker success rate is tracked and visible in admin panel
- [ ] Proxy rotation prevents IP blocking on high-volume brokers
- [ ] Adapter breakage is detected within 24 hours and the broken adapter is auto-disabled
- [ ] Monthly proxy + CAPTCHA cost per user < $4 (within gross margin target)
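The cost criterion above can be checked with simple arithmetic. The sketch below assumes the fixed proxy bill is spread evenly over active users and that CAPTCHA spend scales with solves per user; `CostInputs`, `monthlyCostPerUser`, and `withinCostTarget` are hypothetical names, and the user and solve counts are inputs this spec does not define.

```typescript
// Hypothetical inputs; only the budget ranges come from this spec.
interface CostInputs {
  proxyMonthlyUsd: number; // within the $1K–$3K/mo proxy budget
  activeUsers: number; // assumption: proxy cost is split evenly across users
  captchaSolvesPerUser: number; // assumption: average solves per user per month
  costPerSolveUsd: number; // within the $0.001–$0.01 per-solve range
}

const COST_TARGET_USD = 4;

// Per-user monthly cost = proxy share + CAPTCHA spend.
function monthlyCostPerUser(inputs: CostInputs): number {
  if (inputs.activeUsers <= 0) {
    throw new RangeError("activeUsers must be positive");
  }
  const proxyShare = inputs.proxyMonthlyUsd / inputs.activeUsers;
  const captchaShare = inputs.captchaSolvesPerUser * inputs.costPerSolveUsd;
  return proxyShare + captchaShare;
}

// True when the combined cost is strictly below the $4 target.
function withinCostTarget(inputs: CostInputs): boolean {
  return monthlyCostPerUser(inputs) < COST_TARGET_USD;
}
```

At the top of the proxy budget ($3K/mo), the proxy share alone equals $4 per user at 750 active users, so under these assumptions the criterion can only hold with more than 750 active users.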

validation:
- Run `vitest run removebrokers.service.test.ts` (extended tests for 50 brokers)
- Manual: Test a CAPTCHA-protected broker (e.g., MyLife) and verify that automatic solving works
- Check re-scan: Run `bun run job:removebrokers:rescan`, verify re-listings are detected
- Monitor costs: Dashboard shows monthly proxy/CAPTCHA spend per customer
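For context on what the re-scan check exercises, the core of the re-scan job could be sketched as below. The record shape, the synchronous `ProfileScanner` callback, and `rescanCompletedRemovals` are all assumptions made for illustration; the real job would likely scan asynchronously and persist its results.

```typescript
// Hypothetical shapes; the real removal records and scanner are not shown in this spec.
interface RemovalRecord {
  userId: string;
  brokerId: string;
  status: "pending" | "completed";
}

// Returns true when the user's profile is visible on the broker again.
// Assumption: synchronous for brevity; a real scanner would be async.
type ProfileScanner = (userId: string, brokerId: string) => boolean;

interface RescanResult {
  newRequests: RemovalRecord[]; // fresh removal requests to enqueue
  relistingRate: Record<string, number>; // per broker: re-listed / re-scanned
}

function rescanCompletedRemovals(
  records: RemovalRecord[],
  scan: ProfileScanner,
): RescanResult {
  const tallies = new Map<string, { scanned: number; relisted: number }>();
  const newRequests: RemovalRecord[] = [];

  for (const record of records) {
    if (record.status !== "completed") continue; // only re-scan finished removals
    const tally = tallies.get(record.brokerId) ?? { scanned: 0, relisted: 0 };
    tally.scanned += 1;
    if (scan(record.userId, record.brokerId)) {
      tally.relisted += 1;
      newRequests.push({
        userId: record.userId,
        brokerId: record.brokerId,
        status: "pending",
      });
    }
    tallies.set(record.brokerId, tally);
  }

  const relistingRate: Record<string, number> = {};
  tallies.forEach((tally, brokerId) => {
    relistingRate[brokerId] = tally.relisted / tally.scanned;
  });
  return { newRequests, relistingRate };
}
```

Each entry in `newRequests` corresponds to one detected re-listing, so the same list could also drive the user alert.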

notes:
- Broker sites change frequently; budget 20% of engineering time for adapter maintenance
- Some brokers (Acxiom, Epsilon) require physical mail; flag these for manual processing
- Re-listing is common: data brokers rebuild databases from public records every 30–90 days
- Consider AI-assisted form field detection (GPT-4 Vision) to reduce per-adapter development time
- The existing `broker.registry.ts` already has 100+ entries; prioritize by traffic/popularity
- Success rate target: 80%+ for automated removals, 90%+ with manual fallback
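The success-rate targets could be evaluated per broker along these lines. This is a sketch under assumptions: the `BrokerStats` counters are hypothetical, and it assumes the targets apply to each broker individually rather than to the fleet as a whole.

```typescript
// Hypothetical per-broker counters; the real schema is not shown in this spec.
interface BrokerStats {
  brokerId: string;
  optOutsSubmitted: number;
  stuckAutomated: number; // opt-outs that stuck without human help
  stuckAfterManual: number; // additional opt-outs that stuck after manual fallback
}

interface SuccessRates {
  automated: number;
  withManual: number;
  meetsTargets: boolean;
}

const AUTOMATED_TARGET = 0.8; // 80%+ for automated removals
const WITH_MANUAL_TARGET = 0.9; // 90%+ with manual fallback

function successRates(stats: BrokerStats): SuccessRates {
  if (stats.optOutsSubmitted <= 0) {
    return { automated: 0, withManual: 0, meetsTargets: false };
  }
  const automated = stats.stuckAutomated / stats.optOutsSubmitted;
  const withManual =
    (stats.stuckAutomated + stats.stuckAfterManual) / stats.optOutsSubmitted;
  return {
    automated,
    withManual,
    meetsTargets:
      automated >= AUTOMATED_TARGET && withManual >= WITH_MANUAL_TARGET,
  };
}

// Dashboard label in the "X of Y brokers removed" format.
function progressLabel(completed: number, total: number): string {
  return `${completed} of ${total} brokers removed`;
}
```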