16. Backend Router — VoicePrint (Voice Cloning Detection)

meta:

id: kordant-unified-restructure-16
feature: kordant-unified-restructure
priority: P1
depends_on: [kordant-unified-restructure-12, kordant-unified-restructure-13, kordant-unified-restructure-14]
tags: [backend, trpc, voiceprint, ml, api]

objective:

Build the tRPC router for VoicePrint, the AI voice cloning detection service. Port all logic from services/voiceprint/ and packages/api/src/routes/voiceprint.routes.ts into a unified voiceprint router and service layer.

deliverables:

web/src/server/api/routers/voiceprint.ts — VoicePrint router:
- voiceprint.getEnrollments — protectedProcedure returning voice enrollments
- voiceprint.createEnrollment — protectedProcedure uploading and processing voice sample
- voiceprint.deleteEnrollment — protectedProcedure removing enrollment
- voiceprint.analyzeAudio — protectedProcedure analyzing audio for synthetic voice detection
- voiceprint.getAnalyses — protectedProcedure returning analysis history
- voiceprint.getAnalysisResult — protectedProcedure returning detailed analysis results
- voiceprint.getJobStatus — protectedProcedure checking batch analysis job status
web/src/server/services/voiceprint.service.ts — Core business logic:
- createEnrollment(userId, name, audioBuffer, metadata) — save audio, generate embedding hash
- deleteEnrollment(userId, enrollmentId) — remove audio file and DB record
- analyzeAudio(userId, audioBuffer, enrollmentId?) — run ML detection:
  - Preprocess audio (VAD, noise reduction)
  - Run ECAPA-TDNN model for synthetic detection
  - If enrollment provided, run FAISS vector matching
  - Return confidence score and verdict
- getAnalyses(userId, filters?) — query analysis history
- createBatchJob(userId, audioFilePath) — create analysis job for async processing
web/src/server/services/voiceprint/ml.engine.ts — ML inference:
- preprocessAudio(audioBuffer) — VAD, resampling, noise reduction
- detectSynthetic(audioFeatures) — ECAPA-TDNN inference
- matchVoice(embedding, enrollmentId) — FAISS vector index search
- generateEmbedding(audioFeatures) — create voice embedding vector
web/src/server/services/voiceprint/storage.ts — Audio file storage:
- saveAudio(userId, audioBuffer) — save to local disk or S3-compatible storage
- getAudioUrl(userId, audioHash) — generate signed URL for retrieval
- deleteAudio(audioHash) — remove file
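storage.ts needs user-scoped keys and expiring retrieval URLs. The following is a minimal sketch for the local-disk case. It assumes audio is addressed by a SHA-256 hex digest and that a retrieval link is signed with an HMAC over the key and an expiry timestamp. audioKey, sign, verifyAudioUrl, the secret, and the expires/sig query parameters are hypothetical names. For S3 or R2, the provider's own presigned-URL support would replace the HMAC scheme.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical: audio is stored under a per-user prefix, named by its SHA-256 hex digest.
const AUDIO_HASH_RE = /^[a-f0-9]{64}$/;
const USER_ID_RE = /^[A-Za-z0-9_-]+$/;

function audioKey(userId: string, audioHash: string): string {
  // Validate both path segments so neither can inject "../" or extra slashes.
  if (!USER_ID_RE.test(userId)) throw new Error("invalid user id");
  if (!AUDIO_HASH_RE.test(audioHash)) throw new Error("invalid audio hash");
  return `voiceprint/${userId}/${audioHash}.wav`;
}

function sign(secret: string, key: string, expiresAt: number): string {
  return createHmac("sha256", secret).update(`${key}\n${expiresAt}`).digest("hex");
}

// Builds a link that is valid for ttlSeconds; nowMs is injectable for tests.
function getAudioUrl(
  secret: string,
  baseUrl: string,
  userId: string,
  audioHash: string,
  ttlSeconds = 300,
  nowMs = Date.now(),
): string {
  const key = audioKey(userId, audioHash);
  const expiresAt = Math.floor(nowMs / 1000) + ttlSeconds;
  const url = new URL(`${baseUrl}/${key}`);
  url.searchParams.set("expires", String(expiresAt));
  url.searchParams.set("sig", sign(secret, key, expiresAt));
  return url.toString();
}

// Called by whatever route serves the file; rejects expired or tampered links.
function verifyAudioUrl(
  secret: string,
  key: string,
  expires: string,
  sig: string,
  nowMs = Date.now(),
): boolean {
  const expiresAt = Number(expires);
  if (!Number.isInteger(expiresAt) || expiresAt < Math.floor(nowMs / 1000)) return false;
  const expected = Buffer.from(sign(secret, key, expiresAt), "hex");
  const given = Buffer.from(sig, "hex");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Because the signature covers the full user-scoped key, a link issued for one user's file cannot be edited to point at another user's file.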

steps:

Create web/src/server/api/routers/voiceprint.ts.
Define Zod schemas:
- createEnrollmentSchema: name: z.string().min(1), audioBase64: z.string() (or accept a multipart upload instead of base64)
- analyzeAudioSchema: audioBase64: z.string(), enrollmentId: z.string().uuid().optional()
- analysisFilterSchema: page, limit, and an optional verdict
Implement router procedures:
- Enrollment CRUD with user ownership
- Audio analysis with optional enrollment matching
- Job status queries
Create web/src/server/services/voiceprint.service.ts:
- Port from services/voiceprint/src/voiceprint.service.ts
- Handle audio preprocessing pipeline
- Integrate with ML engine
Create ML engine:
- preprocessAudio: use WebRTC VAD logic or a Node.js equivalent (e.g., node-vad)
- detectSynthetic: placeholder for ECAPA-TDNN model integration. If the model cannot run in JS, create a Python microservice bridge or use ONNX Runtime.
- matchVoice: placeholder for FAISS integration. If FAISS cannot be used directly from JS, use the faiss-node bindings or a Python bridge.
- generateEmbedding: create embedding vector for storage
Create storage module:
- For local dev: save to uploads/voiceprint/{userId}/{hash}.wav
- For production: integrate with S3, R2, or similar
- Generate presigned URLs for client retrieval
Implement analysis pipeline:
- Save audio → preprocess → run detection → store result → create alert if synthetic detected
- If enrollment provided, also run matching and include similarity score
Wire router into web/src/server/api/root.ts.
Write unit tests for service functions (mock ML engine).
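The analysis pipeline in the steps above can be sketched as one service function whose collaborators are injected, which also makes the mocked-ML unit tests straightforward. All type and dependency names here (MlEngine, AnalysisDeps, AnalysisRecord, saveResult, createAlert, newId) are hypothetical placeholders, and the sketch assumes the router has already verified that the enrollment belongs to the caller.

```typescript
// Hypothetical types and dependency interfaces; the real service will use its own DB and storage modules.
type Verdict = "synthetic" | "natural" | "uncertain";

interface MlEngine {
  preprocessAudio(audio: Uint8Array): Promise<Float32Array>;
  detectSynthetic(features: Float32Array): Promise<{ verdict: Verdict; confidence: number }>;
  generateEmbedding(features: Float32Array): Promise<Float32Array>;
  matchVoice(embedding: Float32Array, enrollmentId: string): Promise<number>;
}

interface AnalysisRecord {
  id: string;
  userId: string;
  audioHash: string;
  verdict: Verdict;
  confidence: number;
  enrollmentId?: string;
  similarity?: number;
}

interface AnalysisDeps {
  ml: MlEngine;
  saveAudio(userId: string, audio: Uint8Array): Promise<string>; // resolves to the stored audio hash
  saveResult(record: AnalysisRecord): Promise<void>;
  createAlert(userId: string, analysisId: string): Promise<void>;
  newId(): string;
}

// save audio -> preprocess -> detect -> (optional) match -> store result -> alert if synthetic
async function analyzeAudio(
  deps: AnalysisDeps,
  userId: string,
  audio: Uint8Array,
  enrollmentId?: string,
): Promise<AnalysisRecord> {
  const audioHash = await deps.saveAudio(userId, audio);
  const features = await deps.ml.preprocessAudio(audio);
  const { verdict, confidence } = await deps.ml.detectSynthetic(features);

  const record: AnalysisRecord = { id: deps.newId(), userId, audioHash, verdict, confidence };
  if (enrollmentId !== undefined) {
    // Assumes the router has already checked that enrollmentId belongs to userId.
    const embedding = await deps.ml.generateEmbedding(features);
    record.enrollmentId = enrollmentId;
    record.similarity = await deps.ml.matchVoice(embedding, enrollmentId);
  }

  await deps.saveResult(record);
  if (verdict === "synthetic") {
    await deps.createAlert(userId, record.id);
  }
  return record;
}
```

Injecting the ML engine keeps the service indifferent to whether detection runs in-process (ONNX Runtime), behind a Python bridge, or as a test mock.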

tests:

Unit: createEnrollment saves audio and creates DB record
Unit: analyzeAudio returns verdict and confidence
Unit: matchVoice returns similarity score for enrolled voice
Unit: Storage module saves and retrieves files correctly
Unit: ML engine placeholders return mock results
Integration: tRPC procedures enforce user ownership of enrollments
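For the matchVoice unit test, a placeholder that computes cosine similarity against a single stored enrollment embedding is enough. The sketch is hypothetical (InMemoryVoiceIndex, l2Normalize and cosineSimilarity are illustrative names, not the legacy code). An exact FAISS inner-product index (IndexFlatIP) over L2-normalized vectors returns the same scores, so the stub could later be swapped for faiss-node or a Python bridge without changing callers.

```typescript
// Hypothetical in-memory stand-in for matchVoice.
type Embedding = Float32Array;

function l2Normalize(v: Embedding): Embedding {
  let sumSq = 0;
  for (let i = 0; i < v.length; i++) sumSq += v[i] * v[i];
  const norm = Math.sqrt(sumSq);
  if (norm === 0) throw new Error("cannot normalize a zero-length embedding");
  return v.map((x) => x / norm);
}

// Cosine similarity in [-1, 1]: the inner product of the two L2-normalized vectors.
function cosineSimilarity(a: Embedding, b: Embedding): number {
  if (a.length !== b.length) {
    throw new Error(`embedding dimension mismatch: ${a.length} vs ${b.length}`);
  }
  const na = l2Normalize(a);
  const nb = l2Normalize(b);
  let dot = 0;
  for (let i = 0; i < na.length; i++) dot += na[i] * nb[i];
  return dot;
}

class InMemoryVoiceIndex {
  private readonly byEnrollment = new Map<string, Embedding>();

  add(enrollmentId: string, embedding: Embedding): void {
    this.byEnrollment.set(enrollmentId, embedding);
  }

  remove(enrollmentId: string): boolean {
    return this.byEnrollment.delete(enrollmentId);
  }

  // Same shape as matchVoice(embedding, enrollmentId) in ml.engine.ts.
  matchVoice(embedding: Embedding, enrollmentId: string): number {
    const stored = this.byEnrollment.get(enrollmentId);
    if (!stored) throw new Error(`unknown enrollment: ${enrollmentId}`);
    return cosineSimilarity(embedding, stored);
  }
}
```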

acceptance_criteria:

Voice enrollments can be created, listed, and deleted per user
Audio analysis returns synthetic/natural/uncertain verdict with confidence score
If enrollment is provided, analysis includes voice matching similarity
Analysis history is queryable with pagination
Batch jobs can be created and their status tracked
Audio files are stored securely with user-scoped access
Synthetic voice detection triggers an alert notification
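The synthetic/natural/uncertain verdict can be derived from a single model score with two thresholds, leaving an uncertain band between them. This sketch assumes the detector outputs a probability of synthetic speech in [0, 1]; toVerdict and the 0.8/0.2 cut-offs are hypothetical placeholders, not values from the legacy service.

```typescript
type Verdict = "synthetic" | "natural" | "uncertain";

interface VerdictThresholds {
  synthetic: number; // scores at or above this are reported as synthetic
  natural: number; // scores at or below this are reported as natural
}

// Placeholder cut-offs; real values should be calibrated on labelled data.
const DEFAULT_THRESHOLDS: VerdictThresholds = { synthetic: 0.8, natural: 0.2 };

function toVerdict(
  pSynthetic: number,
  t: VerdictThresholds = DEFAULT_THRESHOLDS,
): { verdict: Verdict; confidence: number; alert: boolean } {
  // The negated form also rejects NaN.
  if (!(pSynthetic >= 0 && pSynthetic <= 1)) {
    throw new RangeError(`score must be in [0, 1], got ${pSynthetic}`);
  }
  if (!(t.natural < t.synthetic)) {
    throw new RangeError("natural threshold must be below synthetic threshold");
  }
  // Confidence = probability of whichever class the model favours.
  const confidence = Math.max(pSynthetic, 1 - pSynthetic);
  let verdict: Verdict;
  if (pSynthetic >= t.synthetic) verdict = "synthetic";
  else if (pSynthetic <= t.natural) verdict = "natural";
  else verdict = "uncertain";
  // Only a synthetic verdict triggers the alert notification.
  return { verdict, confidence, alert: verdict === "synthetic" };
}
```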

validation:

Upload a test audio file via tRPC client, verify enrollment created
Request analysis on test audio, verify result structure (verdict, confidence, metadata)
Verify that user A cannot access user B's enrollments or analyses
Run cd web && pnpm test and confirm the VoicePrint unit tests pass
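The ownership and pagination checks come down to scoping every query by userId before any other filtering or paging. Below is a minimal in-memory sketch, assuming hypothetical Analysis and AnalysisFilter shapes that mirror analysisFilterSchema (page, limit, optional verdict). In the real service this would be a database query with userId in the WHERE clause, and the 100-row limit cap is an assumed value.

```typescript
type Verdict = "synthetic" | "natural" | "uncertain";

interface Analysis { id: string; userId: string; verdict: Verdict; createdAt: number }
interface AnalysisFilter { page: number; limit: number; verdict?: Verdict }
interface Page<T> { items: T[]; total: number; page: number; limit: number }

function getAnalyses(all: readonly Analysis[], userId: string, f: AnalysisFilter): Page<Analysis> {
  if (!Number.isInteger(f.page) || f.page < 1) throw new RangeError("page must be an integer >= 1");
  if (!Number.isInteger(f.limit) || f.limit < 1 || f.limit > 100) {
    throw new RangeError("limit must be an integer in [1, 100]");
  }
  // Owner scoping comes first, so no later filter can widen access.
  const owned = all
    .filter((a) => a.userId === userId && (f.verdict === undefined || a.verdict === f.verdict))
    .sort((a, b) => b.createdAt - a.createdAt || a.id.localeCompare(b.id)); // newest first, stable tiebreak
  const start = (f.page - 1) * f.limit;
  return { items: owned.slice(start, start + f.limit), total: owned.length, page: f.page, limit: f.limit };
}

// Another user's record is reported as "not found" rather than "forbidden", so IDs do not leak.
function getAnalysisResult(all: readonly Analysis[], userId: string, id: string): Analysis | null {
  return all.find((a) => a.id === id && a.userId === userId) ?? null;
}
```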

notes:

Reference legacy: services/voiceprint/src/, packages/api/src/routes/voiceprint.routes.ts
The ECAPA-TDNN and FAISS components may require Python or compiled native modules. If they cannot run in the Node.js monolith:
- Option A: Create a lightweight Python gRPC/HTTP service for ML inference, call it from the monolith
- Option B: Use ONNX Runtime Node.js bindings if a converted model is available
- Option C: Keep the Python service separate but unify the API layer in tRPC (the monolith calls the Python service internally)
For this task, implement the service layer with a pluggable ML engine interface. Use mock/stub implementations if native ML is not yet available.
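Audio files can be large, and base64 encoding inflates the payload by about a third. Consider streaming or multipart uploads instead of base64 for production.
The analysis pipeline should be idempotent: analyzing the same audio twice should return the cached result (for example, keyed by the audio content hash) instead of rerunning inference.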
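One way to make analysis idempotent is to key results on a SHA-256 hash of the audio together with the user and optional enrollment, and to cache the in-flight promise so concurrent duplicate requests share a single run. IdempotentAnalyzer and analysisCacheKey are hypothetical names, and the in-memory Map stands in for persistent storage.

```typescript
import { createHash } from "node:crypto";

type Verdict = "synthetic" | "natural" | "uncertain";

interface AnalysisResult {
  verdict: Verdict;
  confidence: number;
  similarity?: number;
}

// The key covers the user, the optional enrollment, and the raw audio bytes, so a cached
// result is never shared across users or reused for a different enrollment.
function analysisCacheKey(userId: string, audio: Uint8Array, enrollmentId?: string): string {
  return createHash("sha256")
    .update(userId)
    .update("\0")
    .update(enrollmentId ?? "")
    .update("\0")
    .update(audio)
    .digest("hex");
}

type RunAnalysis = (audio: Uint8Array, enrollmentId?: string) => Promise<AnalysisResult>;

class IdempotentAnalyzer {
  // In-process map for illustration only; a real deployment would persist results,
  // e.g. behind a unique constraint on the cache key in the database.
  private readonly results = new Map<string, Promise<AnalysisResult>>();

  constructor(private readonly run: RunAnalysis) {}

  analyze(userId: string, audio: Uint8Array, enrollmentId?: string): Promise<AnalysisResult> {
    const key = analysisCacheKey(userId, audio, enrollmentId);
    const existing = this.results.get(key);
    if (existing) return existing;

    // Cache the promise, not the value, so concurrent duplicates share one ML run.
    const pending = this.run(audio, enrollmentId).catch((err: unknown) => {
      this.results.delete(key); // failed runs are retried on the next request
      throw err;
    });
    this.results.set(key, pending);
    return pending;
  }
}
```

Including userId in the key trades some duplicate inference (two users uploading the same clip) for never returning one user's analysis record to another.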