Kordant/tasks/core-services-implementation/02-removebrokers-top-20.md
2026-05-31 22:03:18 -04:00

02. Automated Removal Engine for Top 20 Data Brokers

meta:

  • id: core-services-02
  • feature: core-services-implementation
  • priority: P0
  • depends_on: [core-services-01]
  • tags: [removebrokers, automation, playwright, scraping, revenue]

objective:

  • Replace the submitAutomatedRemoval() stub that returns crypto.randomUUID() with a real Playwright-based browser automation that submits opt-out requests to the top 20 data brokers.

deliverables:

  • Playwright-based removal engine in removebrokers/removal.engine.ts
  • Per-broker adapter modules for top 20 brokers (Spokeo, Whitepages, MyLife, BeenVerified, etc.)
  • CAPTCHA detection and graceful failure (manual fallback flow)
  • Removal request status tracking with actual polling
  • Email notification service integration for opt-out confirmations
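A minimal sketch of the CAPTCHA detection and manual fallback deliverable, assuming detection by scanning the page HTML for well-known widget class names. `PageLike`, `CAPTCHA_MARKERS`, `SubmitOutcome`, `detectCaptcha`, and `submitOrFlag` are hypothetical names; `PageLike` mirrors only the `content()` method of a Playwright page, and the marker list is illustrative rather than exhaustive.

```typescript
// Structural subset of a Playwright Page; only what this sketch needs.
interface PageLike {
  content(): Promise<string>;
}

// Assumed markers: class names used by reCAPTCHA, hCaptcha and Cloudflare
// Turnstile widgets, plus a generic catch-all. Brokers may use others.
const CAPTCHA_MARKERS = ["g-recaptcha", "h-captcha", "cf-turnstile", "captcha"];

// Hypothetical result type: either the opt-out went through, or it is
// flagged for the manual queue with a reason (never a silent failure).
type SubmitOutcome =
  | { kind: "submitted"; brokerRequestId: string | null }
  | { kind: "manual_required"; reason: string };

// Returns the first marker found in the page HTML, or null if none match.
async function detectCaptcha(page: PageLike): Promise<string | null> {
  const html = (await page.content()).toLowerCase();
  return CAPTCHA_MARKERS.find((marker) => html.includes(marker)) ?? null;
}

// Runs the adapter's submit step only when no CAPTCHA marker is present.
async function submitOrFlag(
  page: PageLike,
  submit: () => Promise<string | null>,
): Promise<SubmitOutcome> {
  const marker = await detectCaptcha(page);
  if (marker !== null) {
    return { kind: "manual_required", reason: `captcha detected (${marker})` };
  }
  return { kind: "submitted", brokerRequestId: await submit() };
}
```

Returning a tagged outcome instead of throwing lets the caller route `manual_required` results to the manual queue without treating them as retryable errors.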

steps:

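  1. Install Playwright as a runtime dependency (the engine runs in production, not only in tests) and the test runner as a dev dependency: npm install playwright && npm install -D @playwright/test, then download a browser with npx playwright install chromium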
  2. Analyze opt-out flows for top 20 brokers from existing registry data
  3. Create removebrokers/adapters/ directory with one module per broker
  4. Implement base adapter interface: scanForProfile, submitOptOut, verifyRemoval, getStatus
  5. Implement adapters for each top 20 broker with navigation, form filling, and submission logic
  6. Add proxy rotation support (BrightData or similar) to avoid IP blocking
  7. Add stealth mode (for Node, e.g. playwright-extra with puppeteer-extra-plugin-stealth) to reduce detection
  8. Implement submitAutomatedRemoval() to select correct adapter by broker ID and execute
  9. Store actual request IDs from brokers (not generated UUIDs) in database
  10. Implement trackRemovalStatus() with periodic re-scans for submitted requests
  11. Integrate with notification service to email user when removal is confirmed
  12. Add job handler for batch removal processing queue
  13. Handle failures gracefully: retry with backoff, escalate to manual queue after 3 failures
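Steps 4 and 8 could be sketched roughly as follows. The four method names and `submitAutomatedRemoval` come from the steps above; every signature, plus `SubjectProfile`, `AdapterRegistry`, and `RemovalResult`, is an assumption made for illustration, not the existing service API.

```typescript
type RemovalStatus = "pending" | "submitted" | "completed" | "failed";

// Hypothetical shape of the person whose listing is being removed.
interface SubjectProfile {
  fullName: string;
  email: string;
  city?: string;
  state?: string;
}

// Base adapter interface; one implementation per broker.
interface BrokerAdapter {
  readonly brokerId: string;
  // Resolves to the listing URL if a profile is found, otherwise null.
  scanForProfile(subject: SubjectProfile): Promise<string | null>;
  // Resolves to the broker's own request ID, or null if the broker issues none.
  submitOptOut(listingUrl: string, subject: SubjectProfile): Promise<string | null>;
  // Resolves to true once the listing is no longer reachable.
  verifyRemoval(listingUrl: string): Promise<boolean>;
  getStatus(brokerRequestId: string): Promise<RemovalStatus>;
}

// Looks up adapters by broker ID and fails loudly on unknown brokers.
class AdapterRegistry {
  private readonly adapters = new Map<string, BrokerAdapter>();

  register(adapter: BrokerAdapter): void {
    this.adapters.set(adapter.brokerId, adapter);
  }

  get(brokerId: string): BrokerAdapter {
    const adapter = this.adapters.get(brokerId);
    if (adapter === undefined) {
      throw new Error(`No adapter registered for broker "${brokerId}"`);
    }
    return adapter;
  }
}

interface RemovalResult {
  status: RemovalStatus;
  brokerRequestId: string | null;
  listingUrl: string | null;
}

// Selects the adapter by broker ID, scans, then submits the opt-out.
async function submitAutomatedRemoval(
  registry: AdapterRegistry,
  brokerId: string,
  subject: SubjectProfile,
): Promise<RemovalResult> {
  const adapter = registry.get(brokerId);
  const listingUrl = await adapter.scanForProfile(subject);
  if (listingUrl === null) {
    // Assumption: no listing found means there is nothing left to remove.
    return { status: "completed", brokerRequestId: null, listingUrl: null };
  }
  const brokerRequestId = await adapter.submitOptOut(listingUrl, subject);
  return { status: "submitted", brokerRequestId, listingUrl };
}
```

Keeping `brokerRequestId` nullable reflects that some brokers may not return an identifier; in that case the stored listing URL is the only handle for later re-scans.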

tests:

  • Unit: Mock Playwright browser, verify adapter navigation sequences
  • Integration: Run adapter against real broker site in headful mode, verify opt-out form submission
  • E2E: Full flow — add broker to watchlist → trigger removal → verify status progression
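For the unit-test bullet, one way to verify an adapter's navigation sequence without launching a browser is a recording fake that implements only the page methods the adapter calls. The sketch below assumes an adapter that uses `goto`, `fill`, and `click`; `FormPage`, `submitExampleOptOut`, `createRecordingPage`, the `broker.example` URL, and the selectors are all hypothetical.

```typescript
// Structural subset of a Playwright Page used by the adapter under test.
interface FormPage {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
}

// Hypothetical adapter step: open the opt-out form, fill it, submit it.
async function submitExampleOptOut(
  page: FormPage,
  listingUrl: string,
  email: string,
): Promise<void> {
  await page.goto("https://broker.example/optout");
  await page.fill("#listing-url", listingUrl);
  await page.fill("#email", email);
  await page.click("button[type=submit]");
}

// Fake page that records every call in order instead of driving a browser.
function createRecordingPage(): { page: FormPage; calls: string[] } {
  const calls: string[] = [];
  const page: FormPage = {
    goto: async (url) => {
      calls.push(`goto ${url}`);
      return null;
    },
    fill: async (selector, value) => {
      calls.push(`fill ${selector}=${value}`);
    },
    click: async (selector) => {
      calls.push(`click ${selector}`);
    },
  };
  return { page, calls };
}
```

In a vitest test, the recorded `calls` array can be compared against the expected sequence with `expect(calls).toEqual([...])`. Because the adapter depends only on the structural `FormPage` type, a real Playwright page can be passed in unchanged for the integration run.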

acceptance_criteria:

  • Top 20 broker adapters are implemented and tested against live sites
  • submitAutomatedRemoval() no longer returns mock UUIDs — it submits real opt-out requests
  • Removal status tracks actual broker state (pending → submitted → completed/failed)
  • Failed removals are retried with exponential backoff and escalated to the manual queue after 3 failed attempts
  • CAPTCHA challenges are detected and flagged for manual processing (not silently failing)
  • Job queue processes removals asynchronously without blocking API responses
  • User dashboard shows real removal progress per broker
  • All Playwright browsers are properly closed after each session (no resource leaks)
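The retry criterion above might look like the following sketch. It assumes "3 failures" means three attempts in total, a base delay of 1000 ms that doubles after each failure, and an injectable `sleep` so tests do not wait on real timers. `withRetry`, `backoffDelayMs`, `RetryOptions`, and `AttemptOutcome` are hypothetical names.

```typescript
// Either the task succeeded, or it exhausted its attempts and must be
// handed to the manual queue by the caller.
type AttemptOutcome<T> =
  | { kind: "ok"; value: T; attempts: number }
  | { kind: "escalated"; attempts: number; lastError: string };

interface RetryOptions {
  maxAttempts: number; // assumed: 3 attempts in total
  baseDelayMs: number; // assumed default: 1000 ms
  sleep: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay to wait after the given 1-based attempt: base, 2x base, 4x base, ...
function backoffDelayMs(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** (attempt - 1);
}

async function withRetry<T>(
  task: () => Promise<T>,
  opts: Partial<RetryOptions> = {},
): Promise<AttemptOutcome<T>> {
  const { maxAttempts = 3, baseDelayMs = 1000, sleep = defaultSleep } = opts;
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { kind: "ok", value: await task(), attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      // No point sleeping after the final failed attempt.
      if (attempt < maxAttempts) {
        await sleep(backoffDelayMs(attempt, baseDelayMs));
      }
    }
  }
  return { kind: "escalated", attempts: maxAttempts, lastError };
}
```

A caller would wrap the adapter's submit step in `withRetry` and enqueue the request for manual processing when the outcome is `escalated`, storing `lastError` so the operator can see why automation gave up.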

validation:

  • Run vitest run removebrokers.service.test.ts — all tests pass
  • Manual test: Trigger removal for Spokeo, verify opt-out email received
  • Check database: removal_requests table has real request IDs and actual status values
  • Run removal job: bun run job:removebrokers processes queue without errors

notes:

  • Broker sites change frequently; expect 15-25% of adapters to break per quarter
  • Some brokers require email verification sent to the listed email (often outdated) — flag these
  • Start with brokers that have simple form-based opt-outs; defer email/physical mail brokers to Phase 3
  • The existing broker registry in broker.registry.ts already has removal URLs — use these as starting points
  • Budget $1K-$3K/mo for proxy infrastructure at scale