Kordant/tasks/kordant-unified-restructure/16-voiceprint-router.md
2026-05-25 22:49:37 -04:00

16. Backend Router — VoicePrint (Voice Cloning Detection)

meta:

  • id: kordant-unified-restructure-16
  • feature: kordant-unified-restructure
  • priority: P1
  • depends_on: [kordant-unified-restructure-12, kordant-unified-restructure-13, kordant-unified-restructure-14]
  • tags: [backend, trpc, voiceprint, ml, api]

objective:

  • Build the tRPC router for VoicePrint, the AI voice cloning detection service. Port all logic from services/voiceprint/ and packages/api/src/routes/voiceprint.routes.ts into a unified voiceprint router and service layer.

deliverables:

  • web/src/server/api/routers/voiceprint.ts — VoicePrint router:
    • voiceprint.getEnrollments (protectedProcedure): returns the user's voice enrollments
    • voiceprint.createEnrollment (protectedProcedure): uploads and processes a voice sample
    • voiceprint.deleteEnrollment (protectedProcedure): removes an enrollment
    • voiceprint.analyzeAudio (protectedProcedure): analyzes audio for synthetic voice detection
    • voiceprint.getAnalyses (protectedProcedure): returns analysis history
    • voiceprint.getAnalysisResult (protectedProcedure): returns detailed analysis results
    • voiceprint.getJobStatus (protectedProcedure): checks batch analysis job status
  • web/src/server/services/voiceprint.service.ts — Core business logic:
    • createEnrollment(userId, name, audioBuffer, metadata) — save audio, generate embedding hash
    • deleteEnrollment(userId, enrollmentId) — remove audio file and DB record
    • analyzeAudio(userId, audioBuffer, enrollmentId?) — run ML detection:
      • Preprocess audio (VAD, noise reduction)
      • Run ECAPA-TDNN model for synthetic detection
      • If enrollment provided, run FAISS vector matching
      • Return confidence score and verdict
    • getAnalyses(userId, filters?) — query analysis history
    • createBatchJob(userId, audioFilePath) — create analysis job for async processing
  • web/src/server/services/voiceprint/ml.engine.ts — ML inference:
    • preprocessAudio(audioBuffer) — VAD, resampling, noise reduction
    • detectSynthetic(audioFeatures) — ECAPA-TDNN inference
    • matchVoice(embedding, enrollmentId) — FAISS vector index search
    • generateEmbedding(audioFeatures) — create voice embedding vector
  • web/src/server/services/voiceprint/storage.ts — Audio file storage:
    • saveAudio(userId, audioBuffer) — save to local disk or S3-compatible storage
    • getAudioUrl(userId, audioHash) — generate signed URL for retrieval
    • deleteAudio(audioHash) — remove file
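
As a rough illustration of the storage module's local-dev path, the sketch below writes audio to uploads/voiceprint/{userId}/{hash}.wav using only Node built-ins. Several details are assumptions rather than part of this task: the VOICEPRINT_UPLOAD_DIR environment variable, SHA-256 as the audio hash, the loadAudio helper, and passing userId to deleteAudio (the deliverable lists deleteAudio(audioHash); the extra argument is used here only because files are user-scoped on disk). Signed URLs and S3-compatible backends are left out.

```typescript
import { createHash } from "node:crypto";
import { mkdir, readFile, rm, writeFile } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical override for the upload root; the task only specifies the
// relative layout uploads/voiceprint/{userId}/{hash}.wav for local dev.
const UPLOAD_ROOT = process.env.VOICEPRINT_UPLOAD_DIR ?? join("uploads", "voiceprint");

// Reject ids that could escape the user's directory (e.g. "../other-user").
function assertSafeSegment(segment: string): void {
  if (!/^[A-Za-z0-9_-]+$/.test(segment)) {
    throw new Error(`unsafe path segment: ${segment}`);
  }
}

function audioPath(userId: string, audioHash: string): string {
  assertSafeSegment(userId);
  assertSafeSegment(audioHash);
  return join(UPLOAD_ROOT, userId, `${audioHash}.wav`);
}

// Saves the buffer under a content-derived name and returns that hash.
// Assumption: the "audio hash" is the SHA-256 of the raw bytes.
export async function saveAudio(userId: string, audioBuffer: Buffer): Promise<string> {
  const audioHash = createHash("sha256").update(audioBuffer).digest("hex");
  const filePath = audioPath(userId, audioHash);
  await mkdir(join(UPLOAD_ROOT, userId), { recursive: true });
  await writeFile(filePath, audioBuffer);
  return audioHash;
}

// Hypothetical helper (not in the deliverables) for reading a file back.
export async function loadAudio(userId: string, audioHash: string): Promise<Buffer> {
  return readFile(audioPath(userId, audioHash));
}

// Removes the file; a missing file is not treated as an error.
export async function deleteAudio(userId: string, audioHash: string): Promise<void> {
  await rm(audioPath(userId, audioHash), { force: true });
}
```

Naming files by content hash means re-uploading identical audio overwrites the same file instead of creating a duplicate, which may or may not be the desired behavior for enrollments.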

steps:

  1. Create web/src/server/api/routers/voiceprint.ts.
  2. Define Zod schemas:
    • createEnrollmentSchema: name: z.string().min(1), audioBase64: z.string() (or multipart handling)
    • analyzeAudioSchema: audioBase64: z.string(), enrollmentId: z.string().uuid().optional()
    • analysisFilterSchema: page, limit, verdict optional
  3. Implement router procedures:
    • Enrollment CRUD with user ownership
    • Audio analysis with optional enrollment matching
    • Job status queries
  4. Create web/src/server/services/voiceprint.service.ts:
    • Port from services/voiceprint/src/voiceprint.service.ts
    • Handle audio preprocessing pipeline
    • Integrate with ML engine
  5. Create ML engine:
    • preprocessAudio: use WebRTC VAD logic or a Node.js equivalent (e.g., node-vad)
    • detectSynthetic: placeholder for ECAPA-TDNN model integration. If the model cannot run in JS, bridge to a Python microservice or use ONNX Runtime.
    • matchVoice: placeholder for FAISS integration. If FAISS is not available in JS, use faiss-node or a Python bridge.
    • generateEmbedding: create embedding vector for storage
  6. Create storage module:
    • For local dev: save to uploads/voiceprint/{userId}/{hash}.wav
    • For production: integrate with S3, R2, or similar
    • Generate presigned URLs for client retrieval
  7. Implement analysis pipeline:
    • Save audio → preprocess → run detection → store result → create alert if synthetic detected
    • If enrollment provided, also run matching and include similarity score
  8. Wire router into web/src/server/api/root.ts.
  9. Write unit tests for service functions (mock ML engine).
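
The pipeline in step 7 could be wired roughly as follows. This is a minimal sketch, not the ported legacy logic: the MlEngine and PipelineDeps interfaces, the dependency-injection style (deps as the first argument), the storeResult and createAlert callbacks, and the fixed values returned by stubEngine are all assumptions made for illustration.

```typescript
export type Verdict = "synthetic" | "natural" | "uncertain";

export interface DetectionResult {
  verdict: Verdict;
  confidence: number;
}

export interface AnalysisResult extends DetectionResult {
  userId: string;
  audioHash: string;
  similarity?: number; // only set when an enrollment was supplied
}

// Pluggable engine: a stub, an ONNX-backed engine, or a Python bridge
// could all satisfy this (hypothetical) interface.
export interface MlEngine {
  preprocessAudio(audio: Buffer): Promise<Float32Array>;
  detectSynthetic(features: Float32Array): Promise<DetectionResult>;
  generateEmbedding(features: Float32Array): Promise<Float32Array>;
  matchVoice(embedding: Float32Array, enrollmentId: string): Promise<number>;
}

// Everything the pipeline touches is injected so tests can mock it.
export interface PipelineDeps {
  engine: MlEngine;
  saveAudio(userId: string, audio: Buffer): Promise<string>;
  storeResult(result: AnalysisResult): Promise<void>;
  createAlert(userId: string, audioHash: string): Promise<void>;
}

// Save audio, preprocess, detect, optionally match, store, then alert.
export async function analyzeAudio(
  deps: PipelineDeps,
  userId: string,
  audio: Buffer,
  enrollmentId?: string,
): Promise<AnalysisResult> {
  const audioHash = await deps.saveAudio(userId, audio);
  const features = await deps.engine.preprocessAudio(audio);
  const detection = await deps.engine.detectSynthetic(features);
  const result: AnalysisResult = { userId, audioHash, ...detection };

  if (enrollmentId !== undefined) {
    const embedding = await deps.engine.generateEmbedding(features);
    result.similarity = await deps.engine.matchVoice(embedding, enrollmentId);
  }

  await deps.storeResult(result);
  if (result.verdict === "synthetic") {
    await deps.createAlert(userId, audioHash);
  }
  return result;
}

// Stub engine with fixed placeholder outputs until a real model is wired in.
export const stubEngine: MlEngine = {
  async preprocessAudio(audio): Promise<Float32Array> {
    return new Float32Array(audio.length); // zero-filled placeholder features
  },
  async detectSynthetic(): Promise<DetectionResult> {
    return { verdict: "uncertain", confidence: 0.5 };
  },
  async generateEmbedding(features): Promise<Float32Array> {
    return features;
  },
  async matchVoice(): Promise<number> {
    return 0;
  },
};
```

Injecting the engine and side effects keeps the service testable with mocks, which matches step 9; the router would build the real deps once and pass ctx.session's user id through.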

tests:

  • Unit: createEnrollment saves audio and creates DB record
  • Unit: analyzeAudio returns verdict and confidence
  • Unit: matchVoice returns similarity score for enrolled voice
  • Unit: Storage module saves and retrieves files correctly
  • Unit: ML engine placeholders return mock results
  • Integration: tRPC procedures enforce user ownership of enrollments
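
For the matchVoice unit test, a brute-force stand-in can replace the vector index until FAISS is integrated. The sketch below is not FAISS: it is a plain in-memory cosine similarity lookup, and the InMemoryVoiceIndex class and its method names are hypothetical.

```typescript
// Cosine similarity between two equal-length vectors, in [-1, 1].
export function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("embeddings must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as matching nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical in-memory stand-in for the enrollment vector index.
export class InMemoryVoiceIndex {
  private readonly embeddings = new Map<string, Float32Array>();

  add(enrollmentId: string, embedding: Float32Array): void {
    this.embeddings.set(enrollmentId, embedding);
  }

  // Returns the similarity between the probe and one enrolled voice.
  match(embedding: Float32Array, enrollmentId: string): number {
    const enrolled = this.embeddings.get(enrollmentId);
    if (enrolled === undefined) {
      throw new Error(`unknown enrollment: ${enrollmentId}`);
    }
    return cosineSimilarity(embedding, enrolled);
  }
}
```

A test can enroll a fixed vector, match the same vector, and assert a similarity close to 1, then match an orthogonal vector and assert a similarity close to 0.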

acceptance_criteria:

  • Voice enrollments can be created, listed, and deleted per user
  • Audio analysis returns synthetic/natural/uncertain verdict with confidence score
  • If enrollment is provided, analysis includes voice matching similarity
  • Analysis history is queryable with pagination
  • Batch jobs can be created and their status tracked
  • Audio files are stored securely with user-scoped access
  • Synthetic voice detection triggers an alert notification
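
One way to produce the synthetic/natural/uncertain verdict is to threshold a single model score. The sketch below assumes the detector emits a probability that the audio is synthetic; the 0.7 and 0.3 cutoffs and the definition of confidence as max(p, 1 - p) are illustrative placeholders, not values taken from the legacy service.

```typescript
export type Verdict = "synthetic" | "natural" | "uncertain";

export interface VerdictThresholds {
  synthetic: number; // at or above this probability: "synthetic"
  natural: number;   // at or below this probability: "natural"
}

// Hypothetical defaults; real cutoffs should come from model evaluation.
export const DEFAULT_THRESHOLDS: VerdictThresholds = { synthetic: 0.7, natural: 0.3 };

export function toVerdict(
  syntheticProbability: number,
  thresholds: VerdictThresholds = DEFAULT_THRESHOLDS,
): { verdict: Verdict; confidence: number } {
  // Written as a negated range check so NaN is rejected as well.
  if (!(syntheticProbability >= 0 && syntheticProbability <= 1)) {
    throw new RangeError("syntheticProbability must be within [0, 1]");
  }
  // Assumption: confidence is the distance from a coin flip, folded to [0.5, 1].
  const confidence = Math.max(syntheticProbability, 1 - syntheticProbability);
  if (syntheticProbability >= thresholds.synthetic) {
    return { verdict: "synthetic", confidence };
  }
  if (syntheticProbability <= thresholds.natural) {
    return { verdict: "natural", confidence };
  }
  return { verdict: "uncertain", confidence };
}
```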

validation:

  • Upload a test audio file via tRPC client, verify enrollment created
  • Request analysis on test audio, verify result structure (verdict, confidence, metadata)
  • Verify that user A cannot access user B's enrollments or analyses
  • Run cd web && pnpm test for VoicePrint unit tests

notes:

  • Reference legacy: services/voiceprint/src/, packages/api/src/routes/voiceprint.routes.ts
  • The ECAPA-TDNN and FAISS components may require Python or compiled native modules. If they cannot run in the Node.js monolith:
    • Option A: Create a lightweight Python gRPC/HTTP service for ML inference, call it from the monolith
    • Option B: Use ONNX Runtime Node.js bindings if a converted model is available
    • Option C: Keep the Python service separate but unify the API layer in tRPC (the monolith calls the Python service internally)
  • For this task, implement the service layer with a pluggable ML engine interface. Use mock/stub implementations if native ML is not yet available.
  • Audio files can be large. Consider streaming uploads instead of base64 encoding for production.
  • The analysis pipeline should be idempotent: analyzing the same audio twice should return cached results.
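
The idempotency note could be satisfied by keying results on a content hash of the audio. The sketch below is one possible shape under stated assumptions: SHA-256 as the hash, a cache key that also includes userId and enrollmentId, and an in-memory Map standing in for whatever persistent store (most likely the analyses table) the real service would use. The AnalysisCache class is hypothetical.

```typescript
import { createHash } from "node:crypto";

export interface CachedAnalysis {
  verdict: "synthetic" | "natural" | "uncertain";
  confidence: number;
}

type Analyzer = (audio: Buffer, enrollmentId?: string) => Promise<CachedAnalysis>;

// Hypothetical wrapper: identical (user, enrollment, audio) requests share
// one result instead of re-running the model.
export class AnalysisCache {
  private readonly results = new Map<string, Promise<CachedAnalysis>>();
  private readonly analyze: Analyzer;

  constructor(analyze: Analyzer) {
    this.analyze = analyze;
  }

  // Key on the audio bytes, not the file name, so re-uploads still hit.
  static key(userId: string, audio: Buffer, enrollmentId?: string): string {
    const audioHash = createHash("sha256").update(audio).digest("hex");
    return `${userId}:${enrollmentId ?? "-"}:${audioHash}`;
  }

  get(userId: string, audio: Buffer, enrollmentId?: string): Promise<CachedAnalysis> {
    const key = AnalysisCache.key(userId, audio, enrollmentId);
    let pending = this.results.get(key);
    if (pending === undefined) {
      // Store the promise itself so concurrent duplicates also share one run;
      // drop it on failure so a later retry is not served the cached error.
      pending = this.analyze(audio, enrollmentId).catch((err) => {
        this.results.delete(key);
        throw err;
      });
      this.results.set(key, pending);
    }
    return pending;
  }
}
```

Including enrollmentId in the key matters because the same audio analyzed against a different enrollment yields a different similarity score.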