Kordant/tasks/shieldai-unified-restructure/17-spamshield-router.md
2026-05-25 12:23:23 -04:00

17. Backend Router — SpamShield (Spam Detection & Call Analysis)

meta:
  id: shieldai-unified-restructure-17
  feature: shieldai-unified-restructure
  priority: P1
  depends_on: [shieldai-unified-restructure-12, shieldai-unified-restructure-13, shieldai-unified-restructure-14]
  tags: [backend, trpc, spamshield, ml, api]

objective:

  • Build the tRPC router for SpamShield, the spam detection and call analysis service. Port all logic from services/spamshield/ and packages/api/src/routes/spamshield.routes.ts into a unified spamshield router and service layer.

deliverables:

  • web/src/server/api/routers/spamshield.ts — SpamShield router:
    • spamshield.checkNumber: publicProcedure (or API-key protected) checking phone number reputation
    • spamshield.classifySMS: publicProcedure classifying SMS text as spam/ham
    • spamshield.classifyCall: publicProcedure analyzing call metadata for spam likelihood
    • spamshield.getRules: protectedProcedure returning the user's spam rules
    • spamshield.createRule: protectedProcedure creating a custom spam rule
    • spamshield.deleteRule: protectedProcedure deleting a rule
    • spamshield.submitFeedback: protectedProcedure submitting false positive/negative feedback
    • spamshield.getStats: protectedProcedure returning spam detection statistics
  • web/src/server/services/spamshield.service.ts — Core business logic:
    • checkNumberReputation(phoneNumber) — query Hiya/Truecaller/other reputation APIs
    • classifySMS(text) — run BERT-based spam classification
    • classifyCall(metadata) — run rule engine + ML model on call data
    • createRule(userId, ruleType, pattern, action) — save custom rule
    • applyRules(userId, phoneNumber, text?) — evaluate custom rules against input
    • submitFeedback(userId, phoneNumber, isSpam, feedbackType) — log feedback for model retraining
    • getStats(userId, period?) — aggregate detection stats
  • web/src/server/services/spamshield/ml.engine.ts — ML inference:
    • classifyTextBERT(text) — BERT model inference for SMS spam
    • extractFeatures(metadata) — feature extraction for call analysis
    • ruleEngine(rules, input) — evaluate user-defined and global rules
  • web/src/server/services/spamshield/reputation.api.ts — External reputation lookups:
    • lookupHiya(phoneNumber) — Hiya API
    • lookupTruecaller(phoneNumber) — Truecaller API
    • lookupInternalDB(phoneNumber) — query cached reputation scores
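
As a rough illustration of the ruleEngine(rules, input) deliverable, the sketch below evaluates user and global rules in priority order. The rule types, actions, the scope field, and the tie-break (user rules before global rules at equal priority) are assumptions made for the example; the spec leaves the enum values and the return shape open.

```typescript
// Hypothetical shapes: the spec names ruleType, pattern, action and priority
// but does not fix their enum values or the engine's return type.
type RuleType = "regex" | "prefix" | "areaCode";
type RuleAction = "block" | "allow" | "flag";

interface SpamRule {
  scope: "user" | "global"; // assumed field distinguishing admin rules
  ruleType: RuleType;
  pattern: string;
  action: RuleAction;
  priority: number;
}

interface RuleInput {
  phoneNumber: string; // assumed to be normalized to E.164 already
  text?: string;
}

function matchesRule(rule: SpamRule, input: RuleInput): boolean {
  switch (rule.ruleType) {
    case "prefix":
      return input.phoneNumber.startsWith(rule.pattern);
    case "areaCode":
      // Assumes North American numbers: "+1" followed by the area code.
      return input.phoneNumber.startsWith("+1" + rule.pattern);
    case "regex":
      try {
        // Test the message text when present, otherwise the number.
        return new RegExp(rule.pattern, "i").test(input.text ?? input.phoneNumber);
      } catch {
        return false; // an invalid user-supplied pattern never matches
      }
  }
}

// Returns the first matching rule, or null when nothing matches.
function ruleEngine(rules: SpamRule[], input: RuleInput): SpamRule | null {
  const ordered = [...rules].sort((a, b) => {
    if (b.priority !== a.priority) return b.priority - a.priority;
    // Equal priority: user rules are checked before global rules.
    return (a.scope === "user" ? 0 : 1) - (b.scope === "user" ? 0 : 1);
  });
  return ordered.find((rule) => matchesRule(rule, input)) ?? null;
}
```

Returning the matched rule rather than a bare action keeps enough context for audit logging. User-supplied regular expressions can be expensive to evaluate, so a real implementation would probably also cap pattern length or evaluation time.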

steps:

  1. Create web/src/server/api/routers/spamshield.ts.
  2. Define Zod schemas:
    • checkNumberSchema: phoneNumber: z.string() (E.164 format validation)
    • classifySMSSchema: text: z.string().max(2000)
    • classifyCallSchema: callerNumber: z.string(), duration: z.number().optional(), timeOfDay: z.number().optional()
    • createRuleSchema: ruleType: z.enum([...]), pattern: z.string(), action: z.enum([...]), priority: z.number().default(0)
    • feedbackSchema: phoneNumber: z.string(), isSpam: z.boolean(), feedbackType: z.enum([...])
  3. Implement router procedures:
    • Number reputation check (may be called by extension or mobile apps)
    • SMS and call classification
    • Rule CRUD with user scoping
    • Feedback submission
  4. Create web/src/server/services/spamshield.service.ts:
    • Port from services/spamshield/src/
    • Implement number normalization (E.164)
    • Implement reputation caching (Redis or in-memory with TTL)
  5. Create ML engine:
    • classifyTextBERT: placeholder for BERT model. If not available in JS, create a Python bridge or use a pre-trained ONNX model.
    • extractFeatures: derive features from call metadata (time patterns, area code, duration)
    • ruleEngine: evaluate regex patterns, area code blocks, prefix blocks, reputation scores
  6. Create reputation API module:
    • Implement circuit breaker for external APIs (reference legacy services/spamshield/test/circuit-breaker.test.ts)
    • Cache results in DB or Redis for 24 hours
    • Fallback to internal database if external APIs fail
  7. Implement audit logging:
    • Every classification decision is logged to AuditLog table
    • Include input, output, confidence, model version, timestamp
  8. Wire router into web/src/server/api/root.ts.
  9. Write unit tests with mocked ML engine and reputation APIs.
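
Step 6 asks for a circuit breaker with fallback to the internal database. The following is a minimal sketch of that pattern, not the legacy implementation: the class name, thresholds, and the lookupWithFallback helper are hypothetical, and the clock is injected so the state machine can be tested without real delays.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold: number,
    resetTimeoutMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  // True when a call to the external API may be attempted.
  canRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half-open"; // allow probe requests after the cool-down
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed probe reopens immediately; otherwise open at the threshold.
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}

// Tries the external lookup unless the breaker is open; falls back otherwise.
async function lookupWithFallback<T>(
  breaker: CircuitBreaker,
  external: () => Promise<T>,
  fallback: () => Promise<T>,
): Promise<T> {
  if (!breaker.canRequest()) return fallback();
  try {
    const result = await external();
    breaker.recordSuccess();
    return result;
  } catch {
    breaker.recordFailure();
    return fallback();
  }
}
```

In this sketch, external would wrap lookupHiya or lookupTruecaller and fallback would wrap lookupInternalDB, with one breaker instance per provider. The half-open state here does not limit the number of concurrent probes, which a production breaker may want to do.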

tests:

  • Unit: checkNumberReputation normalizes phone and queries APIs with circuit breaker
  • Unit: classifySMS returns spam/ham with confidence
  • Unit: ruleEngine evaluates custom rules correctly
  • Unit: submitFeedback creates feedback record
  • Unit: Audit logging captures all classification decisions
  • Integration: tRPC checkNumber returns reputation for valid E.164 number
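
One way to make these unit tests practical is to inject the ML engine and the audit sink into the service, so a test can substitute in-memory fakes. The sketch below assumes hypothetical MlEngine and AuditSink interfaces, a 0.5 threshold, and a synchronous engine call for brevity; the real engine would most likely return a Promise, and the real sink would write to the AuditLog table.

```typescript
type Verdict = "spam" | "ham";

// Hypothetical engine contract: returns the probability in [0, 1] of spam.
interface MlEngine {
  modelVersion: string;
  classifyTextBERT(text: string): number;
}

// Fields follow step 7: input, output, confidence, model version, timestamp.
interface AuditEntry {
  input: string;
  verdict: Verdict;
  confidence: number;
  modelVersion: string;
  timestamp: string;
}

interface AuditSink {
  write(entry: AuditEntry): void;
}

interface SmsResult {
  verdict: Verdict;
  confidence: number;
}

function classifySMS(
  text: string,
  engine: MlEngine,
  audit: AuditSink,
  threshold = 0.5, // assumed cut-off, not specified in the task
): SmsResult {
  const spamProbability = engine.classifyTextBERT(text);
  const verdict: Verdict = spamProbability >= threshold ? "spam" : "ham";
  const confidence = verdict === "spam" ? spamProbability : 1 - spamProbability;
  audit.write({
    input: text,
    verdict,
    confidence,
    modelVersion: engine.modelVersion,
    timestamp: new Date().toISOString(),
  });
  return { verdict, confidence };
}

// Test doubles: a deterministic engine and an in-memory audit log.
function createMockEngine(): MlEngine {
  return {
    modelVersion: "mock-1",
    classifyTextBERT: (text) =>
      text.toLowerCase().includes("free prize") ? 0.75 : 0.25,
  };
}

function createMemoryAudit(): AuditSink & { entries: AuditEntry[] } {
  const entries: AuditEntry[] = [];
  return { entries, write: (entry) => { entries.push(entry); } };
}

function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

function testClassifySMS(): void {
  const engine = createMockEngine();
  const audit = createMemoryAudit();
  const spam = classifySMS("Claim your FREE PRIZE now", engine, audit);
  const ham = classifySMS("Running late, see you at 6", engine, audit);
  check(spam.verdict === "spam" && spam.confidence === 0.75, "spam verdict");
  check(ham.verdict === "ham" && ham.confidence === 0.75, "ham verdict");
  check(audit.entries.length === 2, "every decision is audit-logged");
}
```

In the project itself these checks would presumably be expressed with whatever test runner pnpm test invokes; the hand-rolled check helper only keeps the sketch free of dependencies.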

acceptance_criteria:

  • Phone numbers are normalized to E.164 before processing
  • Number reputation checks query external APIs with circuit breaker and caching
  • SMS classification returns spam/ham verdict with confidence score
  • Call analysis evaluates rules and ML model
  • Users can create, list, and delete custom spam rules
  • Feedback submissions are logged for model improvement
  • All classification decisions are audit-logged
  • Stats endpoint returns aggregated detection metrics per user
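
For the first criterion, a minimal normalization sketch is shown below. It assumes that numbers without a "+" or "00" prefix are national numbers belonging to a default country code ("1" here, an arbitrary choice for the example), and it only checks the general E.164 shape of "+" followed by at most 15 digits with a non-zero leading digit. The function name normalizeToE164 is hypothetical.

```typescript
// E.164 shape: "+", a non-zero leading digit, at most 15 digits in total.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

// Returns the normalized number, or null when the input cannot be normalized.
function normalizeToE164(raw: string, defaultCountryCode = "1"): string | null {
  const trimmed = raw.trim();
  const hasPlus = trimmed.startsWith("+");
  const digits = trimmed.replace(/\D/g, ""); // drop spaces, dashes, parentheses

  let candidate: string;
  if (hasPlus) {
    candidate = digits; // already carries a country code
  } else if (digits.startsWith("00")) {
    candidate = digits.slice(2); // "00" international prefix (assumption)
  } else {
    candidate = defaultCountryCode + digits; // treat as a national number
  }

  const normalized = "+" + candidate;
  return E164_PATTERN.test(normalized) ? normalized : null;
}
```

For example, normalizeToE164("(415) 555-0123") yields "+14155550123" under the default country code. The sketch does not validate per-country numbering plans or trunk prefixes, so a dedicated phone-number parsing library is likely the better choice for the ported service.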

validation:

  • Call spamshield.checkNumber with a test phone number → verify reputation response
  • Call spamshield.classifySMS with known spam text → verify high spam score
  • Create a custom rule and verify it blocks matching numbers
  • Submit feedback and verify record created in DB
  • Run cd web && pnpm test for SpamShield unit tests

notes:

  • Reference legacy: services/spamshield/src/, packages/api/src/routes/spamshield.routes.ts
  • The BERT model for SMS classification may require Python. Use the same approach as VoicePrint: pluggable ML engine with Python bridge or ONNX.
  • Hiya and Truecaller APIs require commercial agreements. For development, mock these or use free alternatives like NumVerify.
  • The checkNumber endpoint may receive high traffic from the browser extension. Ensure it is rate-limited and cached aggressively.
  • Consider adding a global spam database that accumulates feedback from all users (anonymized) to improve detection.
  • The rule engine should support both user-specific rules and global admin rules.
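
The note on checkNumber traffic could be addressed along the following lines. This is an in-memory sketch under stated assumptions: TtlCache and FixedWindowLimiter are hypothetical names, the limits are placeholders, and a deployment with several server instances would need a shared store such as Redis instead.

```typescript
// Cache with a fixed time-to-live per entry; the clock is injectable for tests.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired entries are evicted lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Allows at most `limit` calls per client within each window of `windowMs`.
class FixedWindowLimiter {
  private readonly hits = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  allow(clientId: string): boolean {
    const t = this.now();
    const current = this.hits.get(clientId);
    if (current === undefined || t - current.windowStart >= this.windowMs) {
      this.hits.set(clientId, { windowStart: t, count: 1 }); // start a new window
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

A checkNumber handler could consult the limiter first, then the cache keyed by the E.164 number, and only call the reputation APIs on a miss. A TTL of 24 * 60 * 60 * 1000 ms would match the 24-hour caching window in step 6.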