Kordant/tasks/core-services-implementation/08-removebrokers-50-plus.md
2026-05-31 22:03:18 -04:00

# 08. Expand Broker Coverage to 50+ with CAPTCHA Solving and Re-Scan Pipeline
meta:
id: core-services-08
feature: core-services-implementation
priority: P2
depends_on: [core-services-02]
tags: [removebrokers, automation, captcha, scaling, maintenance]
objective:
- Scale automated removals from the top 20 brokers to 50+, implement CAPTCHA solving, and build the re-scan pipeline that detects re-listings.
deliverables:
- 30+ additional broker adapters (total 50+)
- CAPTCHA solving integration (2Captcha or AntiCaptcha API)
- Re-scan scheduler that checks if removed profiles have reappeared
- Email verification handling for opt-out confirmation emails
- Removal success rate dashboard metric
steps:
1. Select the next 30 brokers from the registry, prioritizing medium-complexity, form-based opt-out flows
2. Create an adapter module for each broker in `removebrokers/adapters/`
3. Implement CAPTCHA solving:
   - Detect reCAPTCHA v2/v3, hCaptcha, and image challenges
   - Integrate the 2Captcha API ($0.001 to $0.01 per solve)
   - Add `CAPTCHA_SOLVER_API_KEY` to the environment config
   - Fall back to the manual queue if CAPTCHA solving fails 3 times
4. Implement email verification handling:
   - Monitor the mailbox for opt-out confirmation emails
   - Parse confirmation links and auto-click them
   - Store confirmation status in the database
5. Build the re-scan pipeline:
   - Weekly scheduled job that re-scans all "completed" removals
   - If a profile reappears, automatically create a new removal request
   - Track the re-listing rate per broker (some re-list every 30 days)
6. Add success metrics:
   - Track the removal success rate per broker (% of opt-outs that stick)
   - Dashboard widget showing "X of Y brokers removed"
   - Alert the user when a re-listing is detected
7. Implement a proxy rotation pool:
   - Use a residential proxy service (BrightData, IPRoyal)
   - Rotate the IP per broker session to avoid blocks
   - Budget $1K to $3K/mo for proxy infrastructure
8. Add adapter health monitoring:
   - Track the adapter breakage rate
   - Alert engineering when >5% of adapters fail within 24h
   - Auto-disable broken adapters and queue them for a manual fix
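Step 3's retry rule (three solver attempts, then the manual queue) is easy to get subtly wrong, so here is a minimal TypeScript sketch of it. `CaptchaSolver`, `solveWithFallback` and the outcome shape are hypothetical names, not part of the existing codebase. A real solver would wrap the 2Captcha or AntiCaptcha HTTP API, and those request details are deliberately left out.

```typescript
// Challenge types from step 3; the type and interface names are hypothetical.
type CaptchaKind = "recaptcha_v2" | "recaptcha_v3" | "hcaptcha" | "image";

interface CaptchaChallenge {
  kind: CaptchaKind;
  pageUrl: string;
  siteKey?: string; // widget site key for reCAPTCHA/hCaptcha; absent for image challenges
}

// Hypothetical solver abstraction. A production implementation would call the
// 2Captcha (or AntiCaptcha) API with CAPTCHA_SOLVER_API_KEY; tests inject a mock.
interface CaptchaSolver {
  solve(challenge: CaptchaChallenge): Promise<string>; // resolves to a response token
}

type SolveOutcome =
  | { status: "solved"; token: string; attempts: number }
  | { status: "escalated"; attempts: number; lastError: string };

const MAX_SOLVE_ATTEMPTS = 3; // from the spec: escalate after 3 failed solves

async function solveWithFallback(
  solver: CaptchaSolver,
  challenge: CaptchaChallenge,
  enqueueManual: (challenge: CaptchaChallenge, reason: string) => void,
): Promise<SolveOutcome> {
  let lastError = "unknown error";
  for (let attempt = 1; attempt <= MAX_SOLVE_ATTEMPTS; attempt++) {
    try {
      const token = await solver.solve(challenge);
      if (token.length > 0) {
        return { status: "solved", token, attempts: attempt };
      }
      lastError = "solver returned an empty token";
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  // Every attempt failed: hand the challenge to a human operator.
  enqueueManual(challenge, lastError);
  return { status: "escalated", attempts: MAX_SOLVE_ATTEMPTS, lastError };
}
```

Keeping the solver behind an interface lets the unit test listed under `tests:` swap in a mock. A mock that throws on every call should produce exactly one manual-queue entry after three attempts.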
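For step 4, the risky part is deciding which links in a confirmation email are safe to auto-click. Below is a minimal sketch built on two assumptions that would need checking per broker:

- each adapter knows its broker's domain;
- confirmation URLs contain a keyword such as "confirm" or "verify".

`extractConfirmationLinks` is a hypothetical helper, and mailbox access is out of scope.

```typescript
// Hypothetical helper for step 4: find opt-out confirmation links in an email body.
// Assumption: confirmation URLs contain one of the CONFIRM_HINTS keywords (tune per broker).
const URL_PATTERN = /https?:\/\/[^\s"'<>]+/gi;
const CONFIRM_HINTS = /confirm|verify|opt-?out|remove/i;

function extractConfirmationLinks(body: string, brokerDomain: string): string[] {
  const domain = brokerDomain.toLowerCase();
  const links = new Set<string>();
  for (const candidate of body.match(URL_PATTERN) ?? []) {
    const raw = candidate
      .replace(/&amp;/g, "&") // HTML bodies escape '&' inside href attributes
      .replace(/[.,;:!?)\]]+$/, ""); // drop sentence punctuation stuck to plain-text URLs
    let url: URL;
    try {
      url = new URL(raw);
    } catch {
      continue; // not a parseable URL
    }
    const host = url.hostname.toLowerCase();
    const onBrokerDomain = host === domain || host.endsWith("." + domain);
    // Only follow links on the broker's own domain (or its subdomains).
    if (onBrokerDomain && CONFIRM_HINTS.test(url.pathname + url.search)) {
      links.add(url.toString());
    }
  }
  return Array.from(links);
}
```

The domain check accepts the broker's own domain and its subdomains. It rejects look-alike hosts such as `evil-examplebroker.com`, which keeps the auto-clicker away from tracking or phishing links.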
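The step 5 job boils down to three actions:

1. Re-check every completed removal.
2. Open a new request for each profile that reappeared.
3. Aggregate a per-broker re-listing rate.

Below is a minimal sketch. The record shapes and `rescanCompletedRemovals` are hypothetical, and the `isStillListed` callback stands in for whatever search step each broker adapter exposes.

```typescript
// Hypothetical record shapes for step 5; the real schema may differ.
interface CompletedRemoval {
  id: string;
  userId: string;
  brokerId: string;
}

interface NewRemovalRequest {
  userId: string;
  brokerId: string;
  reason: "relisted";
  previousRemovalId: string;
}

interface RescanResult {
  requests: NewRemovalRequest[];
  relistRateByBroker: Map<string, number>; // share of re-scanned removals found listed again
}

// `isStillListed` stands in for the broker adapter's search step (hypothetical hook).
async function rescanCompletedRemovals(
  removals: CompletedRemoval[],
  isStillListed: (removal: CompletedRemoval) => Promise<boolean>,
): Promise<RescanResult> {
  const requests: NewRemovalRequest[] = [];
  const counts = new Map<string, { scanned: number; relisted: number }>();

  for (const removal of removals) {
    const c = counts.get(removal.brokerId) ?? { scanned: 0, relisted: 0 };
    c.scanned++;
    if (await isStillListed(removal)) {
      c.relisted++;
      // Re-listing detected: queue a fresh removal linked to the original one.
      requests.push({
        userId: removal.userId,
        brokerId: removal.brokerId,
        reason: "relisted",
        previousRemovalId: removal.id,
      });
    }
    counts.set(removal.brokerId, c);
  }

  const relistRateByBroker = new Map<string, number>();
  counts.forEach((c, brokerId) => relistRateByBroker.set(brokerId, c.relisted / c.scanned));
  return { requests, relistRateByBroker };
}
```

The loop scans sequentially for clarity. A production job would probably cap concurrency per broker so the weekly run does not trip rate limits.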
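Step 6 defines success as the share of opt-outs that stick. One way to compute that, together with the dashboard string, is sketched below. The status names and record shapes are assumptions. The 0.8 threshold is the 80%+ automated-removal target from the notes.

```typescript
// Hypothetical status values; the real removal state machine may use different names.
type RemovalStatus = "pending" | "submitted" | "removed" | "relisted" | "manual";

interface BrokerRemovalState {
  brokerId: string;
  status: RemovalStatus;
}

// Dashboard text for one user: brokers currently removed out of brokers where they were found.
function progressLabel(states: BrokerRemovalState[]): string {
  const removed = states.filter((s) => s.status === "removed").length;
  return `${removed} of ${states.length} brokers removed`;
}

interface CompletedOptOut {
  brokerId: string;
  relisted: boolean; // true if a later re-scan found the profile again
}

// Success ("stick") rate per broker: opt-outs that did not re-list / all completed opt-outs.
function stickRateByBroker(optOuts: CompletedOptOut[]): Map<string, number> {
  const totals = new Map<string, { done: number; stuck: number }>();
  for (const o of optOuts) {
    const t = totals.get(o.brokerId) ?? { done: 0, stuck: 0 };
    t.done++;
    if (!o.relisted) t.stuck++;
    totals.set(o.brokerId, t);
  }
  const rates = new Map<string, number>();
  totals.forEach((t, brokerId) => rates.set(brokerId, t.stuck / t.done));
  return rates;
}

const AUTOMATED_SUCCESS_TARGET = 0.8; // 80%+ target for automated removals (from the notes)

// Brokers whose stick rate misses the target, sorted for stable admin-panel output.
function brokersBelowTarget(rates: Map<string, number>): string[] {
  const below: string[] = [];
  rates.forEach((rate, brokerId) => {
    if (rate < AUTOMATED_SUCCESS_TARGET) below.push(brokerId);
  });
  return below.sort();
}
```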
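Step 7 only says to rotate the IP per broker session. The sketch below is one interpretation:

- round-robin over a pool of distinct proxy endpoints;
- never give the same broker the same proxy twice in a row when the pool has an alternative.

`ProxyRotator` is hypothetical and the pool entries are placeholders. Some residential providers instead expose a single gateway that rotates IPs through session parameters. In that case the provider's mechanism would replace this class.

```typescript
// Hypothetical rotator for step 7; assumes a pool of distinct proxy endpoint URLs.
class ProxyRotator {
  private readonly proxies: string[];
  private cursor = 0;
  private readonly lastByBroker = new Map<string, string>();

  constructor(proxies: string[]) {
    if (proxies.length === 0) throw new Error("proxy pool is empty");
    this.proxies = proxies.slice();
  }

  // Returns the proxy for a new broker session. The session keeps this proxy for
  // its lifetime; the next session for the same broker gets a different one.
  nextForBroker(brokerId: string): string {
    let proxy = this.take();
    if (this.proxies.length > 1 && proxy === this.lastByBroker.get(brokerId)) {
      // Consecutive pool slots differ (entries are assumed distinct), so this is a new endpoint.
      proxy = this.take();
    }
    this.lastByBroker.set(brokerId, proxy);
    return proxy;
  }

  private take(): string {
    const proxy = this.proxies[this.cursor % this.proxies.length];
    this.cursor++;
    return proxy;
  }
}
```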
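Step 8's alert rule can be made concrete as follows. The ">5% of adapters fail in 24h" threshold comes from the spec. The rest is assumed here:

- a broken adapter is one with at least three runs in the window, all of them failed;
- the `AdapterRun` shape and the `assessAdapterHealth` name are hypothetical.

```typescript
// Hypothetical run record emitted by each adapter execution.
interface AdapterRun {
  adapterId: string;
  at: Date;
  ok: boolean;
}

interface HealthReport {
  broken: string[]; // adapters to auto-disable and queue for a manual fix
  breakageRate: number; // broken adapters / all registered adapters
  alert: boolean; // notify engineering when true
}

const WINDOW_MS = 24 * 60 * 60 * 1000;
const ALERT_RATE = 0.05; // spec: alert when more than 5% of adapters fail in 24h
const MIN_RUNS_TO_JUDGE = 3; // assumption: ignore adapters with too few recent runs

function assessAdapterHealth(runs: AdapterRun[], adapterIds: string[], now: Date): HealthReport {
  const windowStart = now.getTime() - WINDOW_MS;
  const stats = new Map<string, { total: number; failed: number }>();
  for (const run of runs) {
    if (run.at.getTime() < windowStart) continue; // outside the 24h window
    const s = stats.get(run.adapterId) ?? { total: 0, failed: 0 };
    s.total++;
    if (!run.ok) s.failed++;
    stats.set(run.adapterId, s);
  }
  // Assumption: an adapter is "broken" when every run in the window failed.
  const broken = adapterIds.filter((id) => {
    const s = stats.get(id);
    return s !== undefined && s.total >= MIN_RUNS_TO_JUDGE && s.failed === s.total;
  });
  const breakageRate = adapterIds.length === 0 ? 0 : broken.length / adapterIds.length;
  return { broken, breakageRate, alert: breakageRate > ALERT_RATE };
}
```

The comparison is strict. One broken adapter out of 20 is exactly 5% and does not alert, while two out of 20 does.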
tests:
- Unit: Mock the CAPTCHA solver and verify the retry and fallback logic
- Integration: Test CAPTCHA solving against a real broker site
- E2E: Complete a removal for a CAPTCHA-protected broker → verify that the re-scan detects a re-listing
acceptance_criteria:
- [ ] 50+ broker adapters implemented and tested
- [ ] CAPTCHA challenges are detected and solved automatically (2Captcha integration)
- [ ] Failed CAPTCHA solving escalates to manual queue after 3 attempts
- [ ] Email confirmation links are parsed and clicked automatically
- [ ] Re-scan job runs weekly and detects re-listings within 7 days
- [ ] Re-listed profiles trigger automatic new removal requests
- [ ] Dashboard shows accurate removal progress: "47 of 50 brokers completed"
- [ ] Per-broker success rate is tracked and visible in admin panel
- [ ] Proxy rotation prevents IP blocking on high-volume brokers
- [ ] Adapter breakage is detected within 24 hours and auto-disabled
- [ ] Monthly proxy + CAPTCHA cost per user < $4 (within gross margin target)
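The per-user cost ceiling in the last criterion interacts with the proxy budget in step 7. Proxy spend is treated here as a fixed monthly bill (an assumption), so its per-user share only falls as the user base grows. At the top of the $1K to $3K/mo range, 3000 / 4 = 750, so the proxy bill alone stays strictly under $4 per user only with more than 750 active users. A small sketch of the check follows; the field names are hypothetical and the inputs in any test are illustrative, not measured.

```typescript
// Hypothetical monthly cost inputs.
interface MonthlyCosts {
  proxySpendUsd: number; // total residential proxy bill for the month
  captchaSolves: number; // solves billed this month
  captchaPricePerSolveUsd: number; // spec range: $0.001 to $0.01
  activeUsers: number;
}

const MAX_COST_PER_USER_USD = 4; // acceptance criterion: < $4 per user per month

function costPerUser(c: MonthlyCosts): number {
  if (c.activeUsers <= 0) throw new Error("activeUsers must be positive");
  const captchaSpend = c.captchaSolves * c.captchaPricePerSolveUsd;
  return (c.proxySpendUsd + captchaSpend) / c.activeUsers;
}

function withinCostTarget(c: MonthlyCosts): boolean {
  return costPerUser(c) < MAX_COST_PER_USER_USD;
}

// Smallest user count for which the proxy bill alone is strictly under the ceiling.
function minUsersForProxyBill(proxySpendUsd: number): number {
  return Math.floor(proxySpendUsd / MAX_COST_PER_USER_USD) + 1;
}
```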
validation:
- Run `vitest run removebrokers.service.test.ts` (tests extended to cover 50 brokers)
- Manual: Test a CAPTCHA-protected broker (e.g., MyLife) and verify that automatic solving works
- Check re-scan: Run `bun run job:removebrokers:rescan` and verify that re-listings are detected
- Monitor costs: Dashboard shows monthly proxy/CAPTCHA spend per customer
notes:
- Broker sites change frequently; budget 20% of engineering time for adapter maintenance
- Some brokers (Acxiom, Epsilon) require physical mail for opt-outs; flag these for manual processing
- Re-listing is common: data brokers rebuild their databases from public records every 30 to 90 days
- Consider AI-assisted form field detection (GPT-4 Vision) to reduce per-adapter development time
- The existing `broker.registry.ts` already has 100+ entries — prioritize by traffic/popularity
- Success rate target: 80%+ for automated removals, 90%+ with manual fallback
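Step 1 and the notes suggest a simple selection rule for the next batch:

- take unimplemented, form-based, medium-difficulty brokers from the registry, most popular first;
- route physical-mail brokers to manual processing.

The sketch assumes registry fields (`monthlyTraffic`, `optOutMethod`, `difficulty`) that may not match what `broker.registry.ts` actually stores. `selectNextBrokers` is a hypothetical helper.

```typescript
// Hypothetical registry entry; the fields in broker.registry.ts may be named differently.
interface BrokerEntry {
  id: string;
  monthlyTraffic: number; // popularity signal used for ordering
  optOutMethod: "form" | "email" | "mail";
  difficulty: "easy" | "medium" | "hard";
}

interface BrokerSelection {
  next: string[]; // adapters to build in this batch
  manualMail: string[]; // physical-mail brokers routed to manual processing
}

function selectNextBrokers(
  registry: BrokerEntry[],
  implemented: Set<string>,
  count = 30,
): BrokerSelection {
  const manualMail = registry.filter((b) => b.optOutMethod === "mail").map((b) => b.id);
  const next = registry
    .filter((b) => !implemented.has(b.id) && b.optOutMethod === "form" && b.difficulty === "medium")
    .sort((a, b) => b.monthlyTraffic - a.monthlyTraffic) // most-visited first
    .slice(0, count)
    .map((b) => b.id);
  return { next, manualMail };
}
```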