17. Backend Router — SpamShield (Spam Detection & Call Analysis)

meta:

id: shieldai-unified-restructure-17
feature: shieldai-unified-restructure
priority: P1
depends_on: [shieldai-unified-restructure-12, shieldai-unified-restructure-13, shieldai-unified-restructure-14]
tags: [backend, trpc, spamshield, ml, api]

objective:

Build the tRPC router for SpamShield, the spam detection and call analysis service. Port all logic from services/spamshield/ and packages/api/src/routes/spamshield.routes.ts into a unified spamshield router and service layer.

deliverables:

web/src/server/api/routers/spamshield.ts — SpamShield router:
- spamshield.checkNumber — publicProcedure (or API-key protected) checking phone number reputation
- spamshield.classifySMS — publicProcedure classifying SMS text as spam/ham
- spamshield.classifyCall — publicProcedure analyzing call metadata for spam likelihood
- spamshield.getRules — protectedProcedure returning user's spam rules
- spamshield.createRule — protectedProcedure creating a custom spam rule
- spamshield.deleteRule — protectedProcedure deleting a rule
- spamshield.submitFeedback — protectedProcedure submitting false positive/negative feedback
- spamshield.getStats — protectedProcedure returning spam detection statistics
web/src/server/services/spamshield.service.ts — Core business logic:
- checkNumberReputation(phoneNumber) — query Hiya/Truecaller/other reputation APIs
- classifySMS(text) — run BERT-based spam classification
- classifyCall(metadata) — run rule engine + ML model on call data
- createRule(userId, ruleType, pattern, action) — save custom rule
- applyRules(userId, phoneNumber, text?) — evaluate custom rules against input
- submitFeedback(userId, phoneNumber, isSpam, feedbackType) — log feedback for model retraining
- getStats(userId, period?) — aggregate detection stats
web/src/server/services/spamshield/ml.engine.ts — ML inference:
- classifyTextBERT(text) — BERT model inference for SMS spam
- extractFeatures(metadata) — feature extraction for call analysis
- ruleEngine(rules, input) — evaluate user-defined and global rules
web/src/server/services/spamshield/reputation.api.ts — External reputation lookups:
- lookupHiya(phoneNumber) — Hiya API
- lookupTruecaller(phoneNumber) — Truecaller API
- lookupInternalDB(phoneNumber) — query cached reputation scores
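
The cached reputation scores behind lookupInternalDB could be backed by something like the in-memory cache below. This is only a minimal sketch: the `TtlCache` class, the `ReputationEntry` shape, and its field names are assumptions made for illustration and do not come from the legacy code. A Redis-backed store would expose the same get/set surface.

```typescript
// Hypothetical shape of a cached reputation result (field names are assumptions).
interface ReputationEntry {
  score: number; // 0 = clean, 1 = certain spam
  source: string; // e.g. "hiya", "truecaller", "internal"
}

// Minimal in-memory cache with a fixed time-to-live per entry.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so tests can control time without real delays.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined if it is missing or expired.
  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.entries.delete(key); // evict lazily on read
      return undefined;
    }
    return hit.value;
  }

  // Stores a value that expires ttlMs after the current time.
  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// 24 hours, matching the cache lifetime named in the steps.
const REPUTATION_TTL_MS = 24 * 60 * 60 * 1000;
const reputationCache = new TtlCache<ReputationEntry>(REPUTATION_TTL_MS);
```

Expired entries are only evicted when they are read, so a long-running process would also need a size bound or a periodic sweep.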

steps:

Create web/src/server/api/routers/spamshield.ts.
Define Zod schemas:
- checkNumberSchema: phoneNumber: z.string() (E.164 format validation)
- classifySMSSchema: text: z.string().max(2000)
- classifyCallSchema: callerNumber: z.string(), duration: z.number().optional(), timeOfDay: z.number().optional()
- createRuleSchema: ruleType: z.enum([...]), pattern: z.string(), action: z.enum([...]), priority: z.number().default(0)
- feedbackSchema: phoneNumber: z.string(), isSpam: z.boolean(), feedbackType: z.enum([...])
Implement router procedures:
- Number reputation check (may be called by extension or mobile apps)
- SMS and call classification
- Rule CRUD with user scoping
- Feedback submission
Create web/src/server/services/spamshield.service.ts:
- Port from services/spamshield/src/
- Implement number normalization (E.164)
- Implement reputation caching (Redis or in-memory with TTL)
Create ML engine:
- classifyTextBERT: start with a placeholder for the BERT model. If the model cannot run in JS, create a Python bridge or use a pre-trained ONNX model.
- extractFeatures: derive features from call metadata (time patterns, area code, duration)
- ruleEngine: evaluate regex patterns, area code blocks, prefix blocks, reputation scores
Create reputation API module:
- Implement circuit breaker for external APIs (reference legacy services/spamshield/test/circuit-breaker.test.ts)
- Cache results in DB or Redis for 24 hours
- Fallback to internal database if external APIs fail
Implement audit logging:
- Log every classification decision to the AuditLog table
- Include the input, output, confidence, model version, and timestamp in each entry
Wire router into web/src/server/api/root.ts.
Write unit tests with mocked ML engine and reputation APIs.
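
The circuit breaker and internal-database fallback described for the reputation API module could be sketched as follows. The legacy implementation referenced by circuit-breaker.test.ts is not shown here, so the `CircuitBreaker` class, the `withBreaker` helper, and the threshold and reset parameters are all assumptions chosen for illustration.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Simple circuit breaker: opens after N consecutive failures, then allows a
// probe request once resetMs has elapsed. (Hypothetical design, not the legacy one.)
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";
  private readonly failureThreshold: number;
  private readonly resetMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetMs: number, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.resetMs = resetMs;
    this.now = now;
  }

  // Moves from open to half-open once the reset window has passed.
  getState(): BreakerState {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  canRequest(): boolean {
    return this.getState() !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  // A failed probe in half-open state reopens the breaker immediately.
  recordFailure(): void {
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }
}

// Runs `primary` through the breaker and uses `fallback` when the breaker is
// open or the primary call throws (e.g. fall back to lookupInternalDB).
async function withBreaker<T>(
  breaker: CircuitBreaker,
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
): Promise<T> {
  if (!breaker.canRequest()) return fallback();
  try {
    const result = await primary();
    breaker.recordSuccess();
    return result;
  } catch {
    breaker.recordFailure();
    return fallback();
  }
}
```

One breaker instance per external provider keeps a Hiya outage from short-circuiting Truecaller lookups. This sketch lets every caller through while half-open; a stricter version would admit a single probe at a time.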

tests:

Unit: checkNumberReputation normalizes phone and queries APIs with circuit breaker
Unit: classifySMS returns spam/ham with confidence
Unit: ruleEngine evaluates custom rules correctly
Unit: submitFeedback creates feedback record
Unit: Audit logging captures all classification decisions
Integration: tRPC checkNumber returns reputation for valid E.164 number

acceptance_criteria:

Phone numbers are normalized to E.164 before processing
Number reputation checks query external APIs with circuit breaker and caching
SMS classification returns spam/ham verdict with confidence score
Call analysis evaluates rules and ML model
Users can create, list, and delete custom spam rules
Feedback submissions are logged for model improvement
All classification decisions are audit-logged
Stats endpoint returns aggregated detection metrics per user
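
The first criterion, E.164 normalization, can be illustrated with a hand-rolled helper. This is a rough sketch under stated assumptions: `normalizeToE164` and its `defaultCallingCode` parameter are hypothetical, a leading `00` is treated as an international prefix (which is not true in every country), and national trunk prefixes are not stripped. A production service would more likely rely on a dedicated phone-number library.

```typescript
// E.164: a plus sign followed by at most 15 digits, the first of them non-zero.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

// Returns the number in E.164 form, or null if it cannot be normalized.
// `defaultCallingCode` is applied only when the input has no international prefix.
function normalizeToE164(raw: string, defaultCallingCode: string = "1"): string | null {
  const trimmed = raw.trim();
  const digits = trimmed.replace(/\D/g, ""); // drop spaces, dashes, parentheses
  if (digits.length === 0) return null;

  let candidate: string;
  if (trimmed.startsWith("+")) {
    candidate = "+" + digits; // already international
  } else if (trimmed.startsWith("00")) {
    candidate = "+" + digits.slice(2); // assumed international dialing prefix
  } else {
    candidate = "+" + defaultCallingCode + digits; // assumed national number
  }
  return E164_PATTERN.test(candidate) ? candidate : null;
}

console.log(normalizeToE164("+1 (415) 555-0123")); // +14155550123
console.log(normalizeToE164("415-555-0123")); // +14155550123
console.log(normalizeToE164("not a number")); // null
```

Returning null instead of throwing lets the router map invalid input to a validation error in one place.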

validation:

Call spamshield.checkNumber with a test phone number → verify reputation response
Call spamshield.classifySMS with known spam text → verify high spam score
Create a custom rule and verify it blocks matching numbers
Submit feedback and verify record created in DB
Run cd web && pnpm test for SpamShield unit tests

notes:

Reference legacy: services/spamshield/src/, packages/api/src/routes/spamshield.routes.ts
The BERT model for SMS classification may require Python. Use the same approach as VoicePrint: pluggable ML engine with Python bridge or ONNX.
Hiya and Truecaller APIs require commercial agreements. For development, mock these or use free alternatives like NumVerify.
The checkNumber endpoint may receive high traffic from the browser extension. Ensure it is rate-limited and cached aggressively.
Consider adding a global spam database that accumulates feedback from all users (anonymized) to improve detection.
The rule engine should support both user-specific rules and global admin rules.
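
A rule engine that handles both user-specific and global admin rules could be sketched as below. The spec elides the actual ruleType and action enums, so the `"prefix"`/`"regex"` and `"block"`/`"allow"` values, the `scope` field, and the tie-breaking policy (higher priority first, user rules ahead of global rules on a tie) are all assumptions made for illustration.

```typescript
type RuleType = "prefix" | "regex"; // hypothetical values; the real enum is not specified
type RuleAction = "block" | "allow"; // hypothetical values; the real enum is not specified

interface SpamRule {
  scope: "user" | "global"; // assumed way to distinguish admin rules
  ruleType: RuleType;
  pattern: string;
  action: RuleAction;
  priority: number;
}

interface RuleInput {
  phoneNumber: string; // expected in E.164 form
  text?: string;
}

// Prefix rules match the phone number; regex rules match the message text.
function matchesRule(rule: SpamRule, input: RuleInput): boolean {
  if (rule.ruleType === "prefix") {
    return input.phoneNumber.startsWith(rule.pattern);
  }
  try {
    return new RegExp(rule.pattern, "i").test(input.text ?? "");
  } catch {
    return false; // an invalid user-supplied pattern never matches
  }
}

// Returns the first matching rule, or null when no rule applies.
function ruleEngine(rules: SpamRule[], input: RuleInput): SpamRule | null {
  const ordered = [...rules].sort((a, b) => {
    if (a.priority !== b.priority) return b.priority - a.priority; // higher first
    if (a.scope === b.scope) return 0;
    return a.scope === "user" ? -1 : 1; // user rules win ties (assumed policy)
  });
  return ordered.find((rule) => matchesRule(rule, input)) ?? null;
}
```

With this ordering a user can allow a number that a global rule blocks at the same priority, while an admin can still force a rule by assigning it a higher priority. User-supplied regular expressions can be expensive to evaluate, so a real implementation would likely bound pattern length or run time.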
