# 02. Automated Removal Engine for Top 20 Data Brokers

meta:
id: core-services-02
feature: core-services-implementation
priority: P0
depends_on: [core-services-01]
tags: [removebrokers, automation, playwright, scraping, revenue]

objective:
- Replace the `submitAutomatedRemoval()` stub, which currently returns `crypto.randomUUID()`, with real Playwright-based browser automation that submits opt-out requests to the top 20 data brokers.

deliverables:
- Playwright-based removal engine in `removebrokers/removal.engine.ts`
- Per-broker adapter modules for top 20 brokers (Spokeo, Whitepages, MyLife, BeenVerified, etc.)
- CAPTCHA detection and graceful failure (manual fallback flow)
- Removal request status tracking with actual polling
- Email notification service integration for opt-out confirmations

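The CAPTCHA deliverable could be sketched as a pure check over the page HTML, with the outcome modelled as a tagged result so that a CAPTCHA never surfaces as a silent failure. The marker list, `detectCaptcha`, `outcomeForPage`, and the `SubmitOutcome` type are illustrative assumptions, not part of the existing codebase, and real broker pages may need additional markers.

```typescript
// Hypothetical outcome type: a CAPTCHA is an explicit state, not an exception.
type SubmitOutcome =
  | { kind: "submitted"; brokerRequestId: string }
  | { kind: "manual_required"; reason: string };

// Substrings commonly present in pages that embed a CAPTCHA widget.
// Assumed list; extend per broker as needed.
const CAPTCHA_MARKERS: readonly string[] = [
  "g-recaptcha",          // Google reCAPTCHA container class
  "google.com/recaptcha", // reCAPTCHA script/iframe URL
  "h-captcha",            // hCaptcha container class
  "hcaptcha.com",         // hCaptcha script/iframe URL
  "cf-turnstile",         // Cloudflare Turnstile container class
];

// Returns the first marker found in the HTML, or null if none is present.
function detectCaptcha(html: string): string | null {
  const lower = html.toLowerCase();
  return CAPTCHA_MARKERS.find((marker) => lower.includes(marker)) ?? null;
}

// Maps a page to an outcome: flag for manual handling when a CAPTCHA is seen.
function outcomeForPage(html: string, brokerRequestId: string): SubmitOutcome {
  const marker = detectCaptcha(html);
  if (marker !== null) {
    return { kind: "manual_required", reason: `captcha detected (${marker})` };
  }
  return { kind: "submitted", brokerRequestId };
}
```

In an adapter, the HTML would typically come from Playwright's `page.content()`, and a `manual_required` outcome would route the request to the manual fallback flow.
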
steps:
1. Install Playwright as a runtime dependency and its test runner as a dev dependency: `npm install playwright` and `npm install -D @playwright/test`, then download the browser binaries with `npx playwright install chromium`
2. Analyze opt-out flows for top 20 brokers from existing registry data
3. Create `removebrokers/adapters/` directory with one module per broker
4. Implement base adapter interface: `scanForProfile`, `submitOptOut`, `verifyRemoval`, `getStatus`
5. Implement adapters for each top 20 broker with navigation, form filling, and submission logic
6. Add proxy rotation support (BrightData or similar) to avoid IP blocking
7. Add stealth mode (playwright-stealth or an equivalent plugin) to reduce bot detection
8. Implement `submitAutomatedRemoval()` to select the correct adapter by broker ID and execute it
9. Store actual request IDs from brokers (not generated UUIDs) in the database
10. Implement `trackRemovalStatus()` with periodic re-scans for submitted requests
11. Integrate with notification service to email the user when removal is confirmed
12. Add job handler for batch removal processing queue
13. Handle failures gracefully: retry with exponential backoff, escalate to manual queue after 3 failures

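Steps 4 and 8 could be sketched roughly as follows. Only the four method names (`scanForProfile`, `submitOptOut`, `verifyRemoval`, `getStatus`) and `submitAutomatedRemoval` come from the plan; the parameter and return types, `PersonQuery`, `registerAdapter`, and the in-memory registry are assumptions made for illustration.

```typescript
// Assumed shapes; only the four adapter method names come from the plan.
interface PersonQuery {
  fullName: string;
  city?: string;
  state?: string;
}

type RemovalStatus = "pending" | "submitted" | "completed" | "failed";

interface BrokerAdapter {
  readonly brokerId: string;
  // Resolves to the profile URL, or null when no matching profile is found.
  scanForProfile(person: PersonQuery): Promise<string | null>;
  // Resolves to the request ID issued by the broker (not a generated UUID).
  submitOptOut(profileUrl: string): Promise<string>;
  verifyRemoval(profileUrl: string): Promise<boolean>;
  getStatus(requestId: string): Promise<RemovalStatus>;
}

// Hypothetical in-memory registry keyed by broker ID.
const adapters = new Map<string, BrokerAdapter>();

function registerAdapter(adapter: BrokerAdapter): void {
  adapters.set(adapter.brokerId, adapter);
}

// Selects the adapter by broker ID and runs scan + opt-out.
// Returns the broker's request ID, or null when there is nothing to remove.
async function submitAutomatedRemoval(
  brokerId: string,
  person: PersonQuery,
): Promise<string | null> {
  const adapter = adapters.get(brokerId);
  if (!adapter) {
    throw new Error(`No adapter registered for broker "${brokerId}"`);
  }
  const profileUrl = await adapter.scanForProfile(person);
  if (profileUrl === null) return null;
  return adapter.submitOptOut(profileUrl);
}
```

Keeping the registry separate from the adapters means a broken adapter can be swapped or disabled without touching the dispatch logic.
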
tests:
- Unit: Mock the Playwright browser, verify adapter navigation sequences
- Integration: Run an adapter against a real broker site in headful mode, verify opt-out form submission
- E2E: Full flow (add broker to watchlist → trigger removal → verify status progression)

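For the unit-level test, one possible approach is a hand-rolled fake page that records calls in order, so the navigation sequence can be asserted without launching a browser. `PageLike` is an assumed structural subset of Playwright's `Page` (`goto`, `fill`, `click`); `submitExampleOptOut`, the URL, and the selectors are placeholders rather than a real broker flow.

```typescript
// Minimal structural subset of Playwright's Page that this adapter step needs.
interface PageLike {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
}

// Hypothetical adapter step; URL and selectors are placeholders.
async function submitExampleOptOut(
  page: PageLike,
  profileUrl: string,
  email: string,
): Promise<void> {
  await page.goto("https://broker.example/optout");
  await page.fill("#profile-url", profileUrl);
  await page.fill("#email", email);
  await page.click("button[type=submit]");
}

// Fake page that records every call, in order, as a readable string.
function createRecordingPage(): { page: PageLike; calls: string[] } {
  const calls: string[] = [];
  const page: PageLike = {
    goto: async (url) => {
      calls.push(`goto ${url}`);
      return null;
    },
    fill: async (selector, value) => {
      calls.push(`fill ${selector}=${value}`);
    },
    click: async (selector) => {
      calls.push(`click ${selector}`);
    },
  };
  return { page, calls };
}
```

Inside a Vitest test, the recorded `calls` array would be compared against the expected sequence with `expect(calls).toEqual([...])`.
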
acceptance_criteria:
- [ ] Top 20 broker adapters are implemented and tested against live sites
- [ ] `submitAutomatedRemoval()` no longer returns mock UUIDs; it submits real opt-out requests
- [ ] Removal status tracks actual broker state (pending → submitted → completed/failed)
- [ ] Failed removals are retried 3 times with exponential backoff, then escalated to manual queue
- [ ] CAPTCHA challenges are detected and flagged for manual processing (not silently failing)
- [ ] Job queue processes removals asynchronously without blocking API responses
- [ ] User dashboard shows real removal progress per broker
- [ ] All Playwright browsers are properly closed after each session (no resource leaks)

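The retry criterion could look roughly like the sketch below. `withRetry`, `AttemptResult`, and the injectable `wait` parameter are hypothetical names. The sketch assumes three attempts in total before escalation; if three retries after the initial attempt are intended, pass `4` as `maxAttempts`.

```typescript
// Hypothetical result type: either a value, or a signal to escalate manually.
type AttemptResult<T> =
  | { kind: "ok"; value: T; attempts: number }
  | { kind: "escalated"; attempts: number; lastError: string };

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `task` up to `maxAttempts` times, doubling the delay after each failure
// (baseDelayMs, 2x, 4x, ...). `wait` is injectable so tests can skip real sleeps.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<AttemptResult<T>> {
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { kind: "ok", value: await task(), attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      // No delay after the final failed attempt; escalate immediately.
      if (attempt < maxAttempts) {
        await wait(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  return { kind: "escalated", attempts: maxAttempts, lastError };
}
```

A caller would enqueue the request on the manual queue whenever the result is `escalated`, keeping `lastError` for the operator.
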
validation:
- Run `vitest run removebrokers.service.test.ts`; all tests pass
- Manual test: Trigger removal for Spokeo, verify opt-out email received
- Check database: `removal_requests` table has real request IDs and actual status values
- Run removal job: `bun run job:removebrokers` processes queue without errors

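To keep the status values in `removal_requests` consistent with the pending → submitted → completed/failed progression, a small transition guard is one option. The transition table below is an assumption, in particular that `pending` may move directly to `failed` and that `completed` and `failed` are terminal; `canTransition` and `applyTransition` are hypothetical helper names.

```typescript
type RemovalStatus = "pending" | "submitted" | "completed" | "failed";

// Assumed transition table derived from pending → submitted → completed/failed.
const ALLOWED_TRANSITIONS: Record<RemovalStatus, readonly RemovalStatus[]> = {
  pending: ["submitted", "failed"],
  submitted: ["completed", "failed"],
  completed: [],
  failed: [],
};

function canTransition(from: RemovalStatus, to: RemovalStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}

// Returns the new status, or throws so an invalid write never reaches the database.
function applyTransition(current: RemovalStatus, next: RemovalStatus): RemovalStatus {
  if (!canTransition(current, next)) {
    throw new Error(`Invalid status transition: ${current} -> ${next}`);
  }
  return next;
}
```
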
notes:
- Broker sites change frequently; expect 15–25% of adapters to break per quarter
- Some brokers require email verification sent to the listed email (often outdated); flag these
- Start with brokers that have simple form-based opt-outs; defer email/physical mail brokers to Phase 3
- The existing broker registry in `broker.registry.ts` already has removal URLs; use these as starting points
- Budget $1K–$3K/mo for proxy infrastructure at scale