02. Automated Removal Engine for Top 20 Data Brokers
meta:
  id: core-services-02
  feature: core-services-implementation
  priority: P0
  depends_on: [core-services-01]
  tags: [removebrokers, automation, playwright, scraping, revenue]
objective:
- Replace the `submitAutomatedRemoval()` stub that returns `crypto.randomUUID()` with real Playwright-based browser automation that submits opt-out requests to the top 20 data brokers.
deliverables:
- Playwright-based removal engine in `removebrokers/removal.engine.ts`
- Per-broker adapter modules for top 20 brokers (Spokeo, Whitepages, MyLife, BeenVerified, etc.)
- CAPTCHA detection and graceful failure (manual fallback flow)
- Removal request status tracking with actual polling
- Email notification service integration for opt-out confirmations
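The CAPTCHA deliverable above could look roughly like the sketch below. It is a minimal illustration, not the engine's actual code: `PageLike` is a hypothetical structural slice of a Playwright `Page` (only `locator(...).count()` is used), and the selector list, `detectCaptcha`, `submitOrFlag`, and `OptOutOutcome` are assumed names for illustration. Real brokers will likely need per-site selectors.

```typescript
// Minimal structural slice of a Playwright Page; a real `Page` satisfies it.
interface PageLike {
  locator(selector: string): { count(): Promise<number> };
}

// Assumed selectors for common CAPTCHA widgets; tune these per broker.
const CAPTCHA_SELECTORS: readonly string[] = [
  'iframe[src*="recaptcha"]',
  'iframe[src*="hcaptcha"]',
  ".g-recaptcha",
  ".h-captcha",
];

type OptOutOutcome =
  | { kind: "submitted"; brokerRequestId: string }
  | { kind: "manual_required"; reason: "captcha" };

// Returns true as soon as any known CAPTCHA selector matches an element.
async function detectCaptcha(page: PageLike): Promise<boolean> {
  for (const selector of CAPTCHA_SELECTORS) {
    if ((await page.locator(selector).count()) > 0) return true;
  }
  return false;
}

// Runs the submit step only when no CAPTCHA is present;
// otherwise reports an explicit manual-fallback outcome instead of failing silently.
async function submitOrFlag(
  page: PageLike,
  submit: () => Promise<string>,
): Promise<OptOutOutcome> {
  if (await detectCaptcha(page)) {
    return { kind: "manual_required", reason: "captcha" };
  }
  return { kind: "submitted", brokerRequestId: await submit() };
}
```

Returning a tagged outcome rather than throwing keeps "needs a human" distinct from "broke", which is what the manual fallback flow needs to route on.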
steps:
- Install Playwright: `npm install -D playwright @playwright/test`
- Analyze opt-out flows for top 20 brokers from existing registry data
- Create `removebrokers/adapters/` directory with one module per broker
- Implement base adapter interface: `scanForProfile`, `submitOptOut`, `verifyRemoval`, `getStatus`
- Implement adapters for each top 20 broker with navigation, form filling, and submission logic
- Add proxy rotation support (BrightData or similar) to avoid IP blocking
- Add stealth mode (playwright-stealth) to reduce detection
- Implement `submitAutomatedRemoval()` to select correct adapter by broker ID and execute
- Store actual request IDs from brokers (not generated UUIDs) in database
- Implement `trackRemovalStatus()` with periodic re-scans for submitted requests
- Integrate with notification service to email user when removal is confirmed
- Add job handler for batch removal processing queue
- Handle failures gracefully: retry with backoff, escalate to manual queue after 3 failures
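The base adapter interface and the adapter lookup in the steps above might be sketched as follows. The four method names come from the steps; every signature, the `Profile` type, the `registerAdapter` helper, and the return shape of `submitAutomatedRemoval()` are assumptions made for illustration, since the existing stub's signature is not shown here.

```typescript
type RemovalStatus = "pending" | "submitted" | "completed" | "failed";

// Hypothetical profile shape; the real service may carry more fields.
interface Profile {
  fullName: string;
  email: string;
}

// Method names are from the spec; parameter and return types are assumed.
interface BrokerAdapter {
  scanForProfile(profile: Profile): Promise<string | null>; // listing URL, or null if none found
  submitOptOut(listingUrl: string, profile: Profile): Promise<string>; // broker-issued request ID
  verifyRemoval(listingUrl: string): Promise<boolean>;
  getStatus(brokerRequestId: string): Promise<RemovalStatus>;
}

// Registry keyed by broker ID; each adapter module would register itself.
const adapters = new Map<string, BrokerAdapter>();

function registerAdapter(brokerId: string, adapter: BrokerAdapter): void {
  adapters.set(brokerId, adapter);
}

// Selects the adapter by broker ID and returns the broker's own request ID
// rather than a locally generated UUID.
async function submitAutomatedRemoval(
  brokerId: string,
  profile: Profile,
): Promise<{ status: RemovalStatus; brokerRequestId: string | null }> {
  const adapter = adapters.get(brokerId);
  if (!adapter) {
    throw new Error(`No adapter registered for broker "${brokerId}"`);
  }
  const listingUrl = await adapter.scanForProfile(profile);
  if (listingUrl === null) {
    // Assumption: no listing found means there is nothing to remove.
    return { status: "completed", brokerRequestId: null };
  }
  const brokerRequestId = await adapter.submitOptOut(listingUrl, profile);
  return { status: "submitted", brokerRequestId };
}
```

Throwing on an unknown broker ID makes a missing adapter visible immediately, instead of letting a request sit in the queue with no handler.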
tests:
- Unit: Mock Playwright browser, verify adapter navigation sequences
- Integration: Run adapter against real broker site in headful mode, verify opt-out form submission
- E2E: Full flow — add broker to watchlist → trigger removal → verify status progression
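For the unit-level test above, one option is a hand-rolled fake that records calls instead of driving a browser. The sketch below assumes an adapter that only uses `goto`, `fill`, and `click` (all real Playwright `Page` methods); `FormPage`, `RecordingPage`, `submitExampleOptOut`, the URL, and the selectors are placeholders, not any real broker's form. In the actual suite the same idea could be expressed with Vitest mocks.

```typescript
// Slice of the Playwright Page API the adapter uses; a real `Page` satisfies it.
interface FormPage {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
}

// Fake page that records each call in order.
class RecordingPage implements FormPage {
  readonly calls: string[] = [];

  async goto(url: string): Promise<null> {
    this.calls.push(`goto ${url}`);
    return null;
  }

  async fill(selector: string, value: string): Promise<void> {
    this.calls.push(`fill ${selector}=${value}`);
  }

  async click(selector: string): Promise<void> {
    this.calls.push(`click ${selector}`);
  }
}

// Hypothetical adapter step: open the opt-out form, enter the email, submit.
async function submitExampleOptOut(page: FormPage, email: string): Promise<void> {
  await page.goto("https://broker.example/opt-out");
  await page.fill("#email", email);
  await page.click("button[type=submit]");
}

// Runs the adapter step against the fake and returns the recorded sequence.
async function recordNavigation(email: string): Promise<string[]> {
  const page = new RecordingPage();
  await submitExampleOptOut(page, email);
  return page.calls;
}
```

A test can then compare the returned array against the expected sequence, so a reordered or dropped step fails loudly without any network access.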
acceptance_criteria:
- Top 20 broker adapters are implemented and tested against live sites
- `submitAutomatedRemoval()` no longer returns mock UUIDs; it submits real opt-out requests
- Removal status tracks actual broker state (pending → submitted → completed/failed)
- Failed removals are retried 3 times with exponential backoff, then escalated to manual queue
- CAPTCHA challenges are detected and flagged for manual processing (not silently failing)
- Job queue processes removals asynchronously without blocking API responses
- User dashboard shows real removal progress per broker
- All Playwright browsers are properly closed after each session (no resource leaks)
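The retry-then-escalate criterion above could be sketched as below. It assumes "3 failures" means three attempts in total (the wording could also be read as three retries after the first attempt), with delays of `baseMs`, then `2 * baseMs`, between attempts. `withRetryThenEscalate`, `backoffDelayMs`, and `AttemptResult` are hypothetical names; the `wait` parameter is injectable so tests do not have to sleep.

```typescript
type AttemptResult<T> =
  | { outcome: "ok"; value: T; attempts: number }
  | { outcome: "escalated"; attempts: number; lastError: string };

// Delay before retry n (1-based): baseMs, 2 * baseMs, 4 * baseMs, ...
function backoffDelayMs(retry: number, baseMs: number): number {
  return baseMs * 2 ** (retry - 1);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Tries `task` up to `maxAttempts` times, waiting with exponential backoff
// between failures; after the final failure it reports an escalation
// so the caller can move the request to the manual queue.
async function withRetryThenEscalate<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 1000,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<AttemptResult<T>> {
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { outcome: "ok", value: await task(), attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      // No wait after the last attempt: escalate immediately.
      if (attempt < maxAttempts) {
        await wait(backoffDelayMs(attempt, baseMs));
      }
    }
  }
  return { outcome: "escalated", attempts: maxAttempts, lastError };
}
```

Keeping the escalation as a return value rather than an exception lets the job handler record the last error alongside the manual-queue entry.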
validation:
- Run `vitest run removebrokers.service.test.ts`; all tests pass
- Manual test: Trigger removal for Spokeo, verify opt-out email received
- Check database: `removal_requests` table has real request IDs and actual status values
- Run removal job: `bun run job:removebrokers` processes queue without errors
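One way to keep the status values in `removal_requests` trustworthy during the database check is to guard every status write with a transition table. The sketch below mirrors only the pending → submitted → completed/failed progression from the acceptance criteria; `TRANSITIONS`, `canTransition`, and `applyTransition` are hypothetical names, and a real implementation may need extra edges (for example, a retry path out of `failed`).

```typescript
type RemovalStatus = "pending" | "submitted" | "completed" | "failed";

// Allowed forward transitions; terminal states have none.
const TRANSITIONS: Record<RemovalStatus, readonly RemovalStatus[]> = {
  pending: ["submitted"],
  submitted: ["completed", "failed"],
  completed: [],
  failed: [],
};

function canTransition(from: RemovalStatus, to: RemovalStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// Returns the new status, or throws if the move is not allowed,
// so an out-of-order update never reaches the database.
function applyTransition(from: RemovalStatus, to: RemovalStatus): RemovalStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid status transition: ${from} -> ${to}`);
  }
  return to;
}
```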
notes:
- Broker sites change frequently — expect 15–25% of adapters to break per quarter
- Some brokers require email verification sent to the listed email (often outdated) — flag these
- Start with brokers that have simple form-based opt-outs; defer email/physical mail brokers to Phase 3
- The existing broker registry in `broker.registry.ts` already has removal URLs; use these as starting points
- Budget $1K–$3K/mo for proxy infrastructure at scale