Kordant/tasks/core-services-implementation/02-removebrokers-top-20.md
2026-05-31 22:03:18 -04:00

# 02. Automated Removal Engine for Top 20 Data Brokers
meta:
id: core-services-02
feature: core-services-implementation
priority: P0
depends_on: [core-services-01]
tags: [removebrokers, automation, playwright, scraping, revenue]
objective:
- Replace the `submitAutomatedRemoval()` stub, which currently returns `crypto.randomUUID()`, with real Playwright-based browser automation that submits opt-out requests to the top 20 data brokers.
deliverables:
- Playwright-based removal engine in `removebrokers/removal.engine.ts`
- Per-broker adapter modules for top 20 brokers (Spokeo, Whitepages, MyLife, BeenVerified, etc.)
- CAPTCHA detection and graceful failure (manual fallback flow)
- Removal request status tracking with actual polling
- Email notification service integration for opt-out confirmations
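A minimal sketch of the CAPTCHA-detection deliverable. It assumes the adapter already has the page HTML (for example from Playwright's `page.content()`). The marker patterns cover common reCAPTCHA, hCaptcha, and Cloudflare Turnstile embeds and are illustrative, not exhaustive; `detectCaptcha`, `captchaGate`, and `ManualFallback` are hypothetical names.

```typescript
// Sketch of CAPTCHA detection for the manual-fallback deliverable.
// ASSUMPTION: the marker patterns are illustrative, not exhaustive.
export type CaptchaKind = "recaptcha" | "hcaptcha" | "turnstile";

// Script hosts and widget class names these providers embed in page HTML.
const CAPTCHA_MARKERS: ReadonlyArray<readonly [CaptchaKind, RegExp]> = [
  ["recaptcha", /google\.com\/recaptcha|g-recaptcha/i],
  ["hcaptcha", /hcaptcha\.com|h-captcha/i],
  ["turnstile", /challenges\.cloudflare\.com\/turnstile|cf-turnstile/i],
];

// Hypothetical result shape for requests routed to manual processing.
export interface ManualFallback {
  status: "needs_manual";
  reason: string;
}

// Returns the first provider whose marker appears in the HTML, or null.
export function detectCaptcha(html: string): CaptchaKind | null {
  for (const [kind, pattern] of CAPTCHA_MARKERS) {
    if (pattern.test(html)) return kind;
  }
  return null;
}

// Null means "no CAPTCHA found, continue"; otherwise the request is
// explicitly flagged for manual processing instead of failing silently.
export function captchaGate(html: string): ManualFallback | null {
  const kind = detectCaptcha(html);
  return kind === null ? null : { status: "needs_manual", reason: `captcha:${kind}` };
}
```

Checking the HTML before submitting keeps a CAPTCHA from surfacing later as an opaque form timeout.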
steps:
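1. Install Playwright: `npm install playwright` (a runtime dependency, not `-D`, because the engine runs in production jobs), then `npx playwright install chromium` to download the browser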
2. Analyze opt-out flows for top 20 brokers from existing registry data
3. Create `removebrokers/adapters/` directory with one module per broker
4. Implement base adapter interface: `scanForProfile`, `submitOptOut`, `verifyRemoval`, `getStatus`
5. Implement adapters for each top 20 broker with navigation, form filling, and submission logic
6. Add proxy rotation support (BrightData or similar) to avoid IP blocking
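7. Add stealth mode to reduce detection (on Node, e.g. `playwright-extra` with `puppeteer-extra-plugin-stealth`; `playwright-stealth` is primarily a Python package)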
8. Implement `submitAutomatedRemoval()` to select correct adapter by broker ID and execute
9. Store the request IDs actually issued by brokers (not generated UUIDs) in the `removal_requests` table
10. Implement `trackRemovalStatus()` with periodic re-scans for submitted requests
11. Integrate with notification service to email user when removal is confirmed
12. Add job handler for batch removal processing queue
13. Handle failures gracefully: retry up to 3 times with exponential backoff, then escalate to the manual queue
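The adapter contract from step 4 and the dispatch by broker ID from step 8 might look roughly like this. It is a sketch under assumptions: `PageLike` is a hypothetical subset of Playwright's `Page` (so the code runs without a browser), and `Session`, `registerAdapter`, and `selectAdapter` are illustrative names. The `finally` block closes the session on every path, which is one way to meet the no-resource-leak criterion.

```typescript
// Sketch of the step 4 adapter contract and step 8 dispatch.
// ASSUMPTION: PageLike is a hypothetical subset of Playwright's Page; a real
// Page satisfies it structurally, and a fake can stand in for tests.
export interface PageLike {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
  content(): Promise<string>;
}

export interface UserIdentity {
  fullName: string;
  email: string;
}

export type RemovalStatus = "pending" | "submitted" | "completed" | "failed";

// The four operations from step 4. brokerRequestId is the reference the
// broker itself issues (step 9), or null when the broker returns none.
export interface BrokerAdapter {
  readonly brokerId: string;
  scanForProfile(page: PageLike, user: UserIdentity): Promise<string | null>;
  submitOptOut(page: PageLike, profileUrl: string, user: UserIdentity): Promise<{ brokerRequestId: string | null }>;
  verifyRemoval(page: PageLike, profileUrl: string): Promise<boolean>;
  getStatus(page: PageLike, brokerRequestId: string): Promise<RemovalStatus>;
}

// Hypothetical wrapper around a browser the engine owns and must close.
export interface Session {
  page: PageLike;
  close(): Promise<void>;
}

const registry = new Map<string, BrokerAdapter>();

export function registerAdapter(adapter: BrokerAdapter): void {
  registry.set(adapter.brokerId, adapter);
}

// Step 8: pick the adapter by broker ID; unknown brokers are an explicit error.
export function selectAdapter(brokerId: string): BrokerAdapter {
  const adapter = registry.get(brokerId);
  if (!adapter) throw new Error(`No adapter registered for broker "${brokerId}"`);
  return adapter;
}

export async function submitAutomatedRemoval(
  brokerId: string,
  user: UserIdentity,
  openSession: () => Promise<Session>,
): Promise<{ status: RemovalStatus; brokerRequestId: string | null }> {
  const adapter = selectAdapter(brokerId);
  const session = await openSession();
  try {
    const profileUrl = await adapter.scanForProfile(session.page, user);
    // ASSUMPTION: no listing found means there is nothing to remove.
    if (profileUrl === null) return { status: "completed", brokerRequestId: null };
    const { brokerRequestId } = await adapter.submitOptOut(session.page, profileUrl, user);
    return { status: "submitted", brokerRequestId };
  } finally {
    // Runs on success, early return, and thrown errors alike.
    await session.close();
  }
}
```

A real `openSession` would launch Chromium through Playwright, open a page, and expose `browser.close()` as `close`; injecting it keeps the engine testable with a fake page.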
tests:
- Unit: Mock Playwright browser, verify adapter navigation sequences
- Integration: Run each adapter against its live broker site in headed mode (`headless: false`) and verify the opt-out form submits
- E2E: Full flow — add broker to watchlist → trigger removal → verify status progression
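For the unit-test bullet, one option is a recording fake in place of a real Playwright page, so a test can assert the exact navigation sequence an adapter performs. `PageLike`, `RecordingPage`, and `submitExampleOptOut` are hypothetical, and the URL and selectors are placeholders rather than any real broker's form.

```typescript
// Sketch: a recording fake standing in for Playwright's Page in unit tests.
// ASSUMPTION: PageLike is a hypothetical subset of Playwright's Page API.
export interface PageLike {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
}

// Records every call in order so a test can assert the navigation sequence.
export class RecordingPage implements PageLike {
  readonly calls: string[] = [];

  async goto(url: string): Promise<null> {
    this.calls.push(`goto ${url}`);
    return null;
  }

  async fill(selector: string, value: string): Promise<void> {
    this.calls.push(`fill ${selector}=${value}`);
  }

  async click(selector: string): Promise<void> {
    this.calls.push(`click ${selector}`);
  }
}

// Hypothetical adapter step with placeholder URL and selectors.
export async function submitExampleOptOut(page: PageLike, profileUrl: string, email: string): Promise<void> {
  await page.goto("https://broker.example/opt-out");
  await page.fill("#profile-url", profileUrl);
  await page.fill("#email", email);
  await page.click("button[type=submit]");
}
```

In `vitest`, the same check would read `expect(page.calls).toEqual([...])` after awaiting the adapter call.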
acceptance_criteria:
- [ ] Top 20 broker adapters are implemented and tested against live sites
- [ ] `submitAutomatedRemoval()` no longer returns mock UUIDs — it submits real opt-out requests
- [ ] Removal status tracks actual broker state (pending → submitted → completed/failed)
- [ ] Failed removals are retried 3 times with exponential backoff, then escalated to manual queue
- [ ] CAPTCHA challenges are detected and flagged for manual processing (not silently failing)
- [ ] Job queue processes removals asynchronously without blocking API responses
- [ ] User dashboard shows real removal progress per broker
- [ ] All Playwright browsers are properly closed after each session (no resource leaks)
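One way to satisfy the retry and CAPTCHA criteria together, sketched under assumptions: the 30-second base delay, the 15-minute cap, and the `CaptchaRequiredError` type are illustrative choices, not values from this task. A CAPTCHA skips the remaining retries and goes straight to the manual queue, on the assumption that an immediate retry is unlikely to clear it. `sleep` is injectable so tests do not actually wait.

```typescript
// Sketch: 3 retries with exponential backoff, then escalation to manual queue.
// ASSUMPTIONS: base delay, cap, and CaptchaRequiredError are illustrative.
export class CaptchaRequiredError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "CaptchaRequiredError";
    // Keeps instanceof reliable even when compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

export type AttemptOutcome<T> =
  | { kind: "ok"; value: T; attempts: number }
  | { kind: "manual"; reason: string; attempts: number };

// Delay before retry n (1-based): baseMs * 2^(n-1), capped at capMs.
export function backoffDelayMs(retry: number, baseMs = 30_000, capMs = 15 * 60_000): number {
  return Math.min(capMs, baseMs * 2 ** (retry - 1));
}

export async function runWithRetries<T>(
  attempt: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<AttemptOutcome<T>> {
  let lastError = "unknown error";
  // One initial attempt plus up to maxRetries retries.
  for (let i = 0; i <= maxRetries; i++) {
    if (i > 0) await sleep(backoffDelayMs(i));
    try {
      return { kind: "ok", value: await attempt(), attempts: i + 1 };
    } catch (err) {
      // ASSUMPTION: retrying is unlikely to clear a CAPTCHA, so flag it now.
      if (err instanceof CaptchaRequiredError) {
        return { kind: "manual", reason: `captcha: ${err.message}`, attempts: i + 1 };
      }
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  return { kind: "manual", reason: `retries exhausted: ${lastError}`, attempts: maxRetries + 1 };
}
```

In the batch job, a `manual` outcome would move the request to the manual queue, and an `ok` outcome would record the broker's request ID.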
validation:
- Run `vitest run removebrokers.service.test.ts` — all tests pass
- Manual test: Trigger removal for Spokeo, verify opt-out email received
- Check database: `removal_requests` table has real request IDs and actual status values
- Run removal job: `bun run job:removebrokers` processes queue without errors
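For the database check, a small audit over `removal_requests` rows could flag leftovers from the stub. This sketch assumes a hypothetical row shape (`RemovalRequestRow`). Because `crypto.randomUUID()` returns version-4 UUIDs, a v4-shaped request ID is suspicious, but a broker could legitimately issue UUIDs, so flagged rows need review rather than automatic deletion.

```typescript
// Heuristic audit of removal_requests rows.
// ASSUMPTION: this row shape is hypothetical; map it to the real schema.
export interface RemovalRequestRow {
  id: number;
  brokerId: string;
  brokerRequestId: string | null;
  status: string;
}

// Status values from the acceptance criteria.
const VALID_STATUSES = new Set(["pending", "submitted", "completed", "failed"]);

// Shape of a version-4 UUID, which is what crypto.randomUUID() returns.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

export function auditRows(rows: RemovalRequestRow[]): string[] {
  const problems: string[] = [];
  for (const row of rows) {
    if (!VALID_STATUSES.has(row.status)) {
      problems.push(`row ${row.id}: unexpected status "${row.status}"`);
    }
    // A v4-shaped ID may be a stub leftover; treat it as "needs review".
    if (row.brokerRequestId !== null && UUID_V4.test(row.brokerRequestId)) {
      problems.push(`row ${row.id}: request ID looks like a generated UUID`);
    }
  }
  return problems;
}
```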
notes:
- Broker sites change frequently; expect 15-25% of adapters to break per quarter
- Some brokers require email verification sent to the listed email (often outdated) — flag these
- Start with brokers that have simple form-based opt-outs; defer email/physical mail brokers to Phase 3
- The existing broker registry in `broker.registry.ts` already has removal URLs — use these as starting points
- Budget $1K-$3K/mo for proxy infrastructure at scale
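Given the expected adapter breakage rate, a periodic health check could flag adapters whose recent failure rate jumps, so they are repaired before the retry queue fills up. The `AttemptRecord` shape, 20-attempt window, 50% threshold, and 5-sample minimum are hypothetical. CAPTCHA outcomes are excluded on the assumption that they reflect bot detection rather than a changed form.

```typescript
// Sketch: flag adapters whose recent failure rate suggests the site changed.
// ASSUMPTIONS: record shape, window, threshold, and minimum are illustrative.
export interface AttemptRecord {
  brokerId: string;
  ok: boolean;
  // CAPTCHA outcomes signal bot detection, not a broken adapter.
  captcha: boolean;
}

export function flagBrokenAdapters(
  history: AttemptRecord[], // oldest first
  window = 20,
  maxFailureRate = 0.5,
  minSamples = 5,
): string[] {
  // Group non-CAPTCHA results per broker, preserving chronological order.
  const byBroker = new Map<string, boolean[]>();
  for (const rec of history) {
    if (rec.captcha) continue;
    const results = byBroker.get(rec.brokerId) ?? [];
    results.push(rec.ok);
    byBroker.set(rec.brokerId, results);
  }

  const flagged: string[] = [];
  for (const [brokerId, results] of byBroker) {
    const recent = results.slice(-window);
    if (recent.length < minSamples) continue; // too little data to judge
    const failures = recent.filter((ok) => !ok).length;
    if (failures / recent.length > maxFailureRate) flagged.push(brokerId);
  }
  return flagged.sort();
}
```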