16. Backend Router — VoicePrint (Voice Cloning Detection)
meta:
  id: kordant-unified-restructure-16
  feature: kordant-unified-restructure
  priority: P1
  depends_on: [kordant-unified-restructure-12, kordant-unified-restructure-13, kordant-unified-restructure-14]
  tags: [backend, trpc, voiceprint, ml, api]
objective:
- Build the tRPC router for VoicePrint, the AI voice cloning detection service. Port all logic from services/voiceprint/ and packages/api/src/routes/voiceprint.routes.ts into a unified voiceprint router and service layer.
deliverables:
- web/src/server/api/routers/voiceprint.ts (VoicePrint router):
  - voiceprint.getEnrollments: protectedProcedure returning voice enrollments
  - voiceprint.createEnrollment: protectedProcedure uploading and processing a voice sample
  - voiceprint.deleteEnrollment: protectedProcedure removing an enrollment
  - voiceprint.analyzeAudio: protectedProcedure analyzing audio for synthetic voice detection
  - voiceprint.getAnalyses: protectedProcedure returning analysis history
  - voiceprint.getAnalysisResult: protectedProcedure returning detailed analysis results
  - voiceprint.getJobStatus: protectedProcedure checking batch analysis job status
- web/src/server/services/voiceprint.service.ts (core business logic):
  - createEnrollment(userId, name, audioBuffer, metadata): save audio, generate embedding hash
  - deleteEnrollment(userId, enrollmentId): remove audio file and DB record
  - analyzeAudio(userId, audioBuffer, enrollmentId?): run ML detection:
    - Preprocess audio (VAD, noise reduction)
    - Run ECAPA-TDNN model for synthetic detection
    - If enrollment provided, run FAISS vector matching
    - Return confidence score and verdict
  - getAnalyses(userId, filters?): query analysis history
  - createBatchJob(userId, audioFilePath): create analysis job for async processing
- web/src/server/services/voiceprint/ml.engine.ts (ML inference):
  - preprocessAudio(audioBuffer): VAD, resampling, noise reduction
  - detectSynthetic(audioFeatures): ECAPA-TDNN inference
  - matchVoice(embedding, enrollmentId): FAISS vector index search
  - generateEmbedding(audioFeatures): create voice embedding vector
- web/src/server/services/voiceprint/storage.ts (audio file storage):
  - saveAudio(userId, audioBuffer): save to local disk or S3-compatible storage
  - getAudioUrl(userId, audioHash): generate signed URL for retrieval
  - deleteAudio(audioHash): remove file
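The storage deliverable can be sketched for the local-disk case as follows. This is a minimal sketch under stated assumptions, not the legacy implementation: SHA-256 as the content hash, the DEFAULT_ROOT constant, the hashAudio and audioPath helpers, and the optional root parameter are all hypothetical. deleteAudio here also takes userId (the deliverable lists only audioHash) because the local path layout includes the user directory. getAudioUrl is omitted since signed URLs depend on the chosen object store.

```typescript
import { createHash } from "node:crypto";
import { mkdir, rm, writeFile } from "node:fs/promises";
import { dirname, join } from "node:path";

// Hypothetical default root for local development.
const DEFAULT_ROOT = join("uploads", "voiceprint");

// Content hash of the audio bytes. SHA-256 is an assumption, not from the spec.
export function hashAudio(audio: Uint8Array): string {
  return createHash("sha256").update(audio).digest("hex");
}

// Builds {root}/{userId}/{hash}.wav and rejects values that could escape the user directory.
export function audioPath(userId: string, audioHash: string, root: string = DEFAULT_ROOT): string {
  if (!/^[A-Za-z0-9_-]+$/.test(userId) || !/^[a-f0-9]{64}$/.test(audioHash)) {
    throw new Error("invalid userId or audioHash");
  }
  return join(root, userId, `${audioHash}.wav`);
}

// Writes the audio under the user's directory and returns its content hash.
export async function saveAudio(
  userId: string,
  audio: Uint8Array,
  root: string = DEFAULT_ROOT,
): Promise<string> {
  const audioHash = hashAudio(audio);
  const filePath = audioPath(userId, audioHash, root);
  await mkdir(dirname(filePath), { recursive: true });
  await writeFile(filePath, audio);
  return audioHash;
}

// Removes the file; a missing file is not treated as an error.
export async function deleteAudio(
  userId: string,
  audioHash: string,
  root: string = DEFAULT_ROOT,
): Promise<void> {
  await rm(audioPath(userId, audioHash, root), { force: true });
}
```

Naming files by content hash means saving the same bytes twice for one user overwrites the same file instead of creating a duplicate.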
steps:
- Create web/src/server/api/routers/voiceprint.ts.
- Define Zod schemas:
  - createEnrollmentSchema: name: z.string().min(1), audioBase64: z.string() (or multipart handling)
  - analyzeAudioSchema: audioBase64: z.string(), enrollmentId: z.string().uuid().optional()
  - analysisFilterSchema: page, limit, verdict optional
- Implement router procedures:
  - Enrollment CRUD with user ownership
  - Audio analysis with optional enrollment matching
  - Job status queries
- Create web/src/server/services/voiceprint.service.ts:
  - Port from services/voiceprint/src/voiceprint.service.ts
  - Handle audio preprocessing pipeline
  - Integrate with ML engine
- Create ML engine:
  - preprocessAudio: use WebRTC VAD logic or a Node.js equivalent (e.g., node-vad)
  - detectSynthetic: placeholder for ECAPA-TDNN model integration. If the model is not available in JS, create a Python microservice bridge or use ONNX Runtime.
  - matchVoice: placeholder for FAISS integration. If FAISS is not available in JS, use faiss-node or a Python bridge.
  - generateEmbedding: create embedding vector for storage
- Create storage module:
  - For local dev: save to uploads/voiceprint/{userId}/{hash}.wav
  - For production: integrate with S3, R2, or similar
  - Generate presigned URLs for client retrieval
- Implement analysis pipeline:
  - Save audio → preprocess → run detection → store result → create alert if synthetic detected
  - If enrollment provided, also run matching and include similarity score
- Wire router into web/src/server/api/root.ts.
- Write unit tests for service functions (mock ML engine).
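The analysis pipeline step (save audio, preprocess, detect, optionally match, store, alert) can be sketched with injected dependencies so that unit tests can mock the ML engine. The PipelineDeps bundle, the Detection and AnalysisResult shapes, and the storeResult and createAlert names are hypothetical; the legacy service may structure this differently.

```typescript
type Verdict = "synthetic" | "natural" | "uncertain";

interface Detection {
  verdict: Verdict;
  confidence: number;
}

interface AnalysisResult extends Detection {
  audioHash: string;
  similarity?: number; // only present when an enrollment was supplied
}

// Hypothetical dependency bundle: storage, ML engine, persistence, and alerting.
interface PipelineDeps {
  saveAudio(userId: string, audio: Uint8Array): Promise<string>;
  preprocessAudio(audio: Uint8Array): Promise<Float32Array>;
  detectSynthetic(features: Float32Array): Promise<Detection>;
  generateEmbedding(features: Float32Array): Promise<Float32Array>;
  matchVoice(embedding: Float32Array, enrollmentId: string): Promise<number>;
  storeResult(userId: string, result: AnalysisResult): Promise<void>;
  createAlert(userId: string, result: AnalysisResult): Promise<void>;
}

export async function analyzeAudio(
  deps: PipelineDeps,
  userId: string,
  audio: Uint8Array,
  enrollmentId?: string,
): Promise<AnalysisResult> {
  // 1. Persist the raw audio and keep its hash as the result key.
  const audioHash = await deps.saveAudio(userId, audio);
  // 2. Preprocess, then 3. run synthetic-voice detection.
  const features = await deps.preprocessAudio(audio);
  const detection = await deps.detectSynthetic(features);
  const result: AnalysisResult = { audioHash, ...detection };
  // 4. Matching runs only when the caller supplied an enrollment.
  if (enrollmentId !== undefined) {
    const embedding = await deps.generateEmbedding(features);
    result.similarity = await deps.matchVoice(embedding, enrollmentId);
  }
  // 5. Store the result, then 6. alert only on a synthetic verdict.
  await deps.storeResult(userId, result);
  if (result.verdict === "synthetic") {
    await deps.createAlert(userId, result);
  }
  return result;
}
```

Passing the dependencies in as a plain object keeps the service function free of direct imports from the ML engine, which is one way to satisfy the "mock ML engine" requirement in the unit tests.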
tests:
- Unit: createEnrollment saves audio and creates DB record
- Unit: analyzeAudio returns verdict and confidence
- Unit: matchVoice returns similarity score for enrolled voice
- Unit: Storage module saves and retrieves files correctly
- Unit: ML engine placeholders return mock results
- Integration: tRPC procedures enforce user ownership of enrollments
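The ownership check that the integration test exercises could look roughly like the sketch below. The Enrollment shape, the in-memory Map standing in for the database, the NotFoundError class, and the requireOwnedEnrollment name are all hypothetical; in the real router the error would presumably be translated into a tRPC error such as NOT_FOUND.

```typescript
interface Enrollment {
  id: string;
  userId: string;
  name: string;
}

// Hypothetical error type for the sketch.
export class NotFoundError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NotFoundError";
  }
}

// Returns the enrollment only if it exists and belongs to the caller.
// Missing and foreign enrollments raise the same error, so a caller
// cannot probe which enrollment ids exist for other users.
export function requireOwnedEnrollment(
  enrollments: ReadonlyMap<string, Enrollment>,
  userId: string,
  enrollmentId: string,
): Enrollment {
  const enrollment = enrollments.get(enrollmentId);
  if (enrollment === undefined || enrollment.userId !== userId) {
    throw new NotFoundError(`Enrollment ${enrollmentId} not found`);
  }
  return enrollment;
}
```

Calling a helper like this at the top of deleteEnrollment and analyzeAudio (when an enrollmentId is supplied) keeps the ownership rule in one place rather than repeating it per procedure.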
acceptance_criteria:
- Voice enrollments can be created, listed, and deleted per user
- Audio analysis returns synthetic/natural/uncertain verdict with confidence score
- If enrollment is provided, analysis includes voice matching similarity
- Analysis history is queryable with pagination
- Batch jobs can be created and their status tracked
- Audio files are stored securely with user-scoped access
- Synthetic voice detection triggers an alert notification
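The synthetic/natural/uncertain verdict with a confidence score can be derived from a model's output probability. The sketch below assumes the detector emits a probability in [0, 1] that the audio is synthetic; the 0.7 and 0.3 thresholds and the toVerdict name are illustrative assumptions that would need tuning against real model output.

```typescript
type Verdict = "synthetic" | "natural" | "uncertain";

// Hypothetical thresholds, not taken from the legacy service.
const SYNTHETIC_THRESHOLD = 0.7;
const NATURAL_THRESHOLD = 0.3;

// Maps P(synthetic) to a verdict. Confidence is the probability of the
// more likely class, so it ranges from 0.5 (coin flip) to 1.
export function toVerdict(syntheticProbability: number): { verdict: Verdict; confidence: number } {
  if (!(syntheticProbability >= 0 && syntheticProbability <= 1)) {
    throw new RangeError("syntheticProbability must be within [0, 1]");
  }
  const confidence = Math.max(syntheticProbability, 1 - syntheticProbability);
  if (syntheticProbability >= SYNTHETIC_THRESHOLD) {
    return { verdict: "synthetic", confidence };
  }
  if (syntheticProbability <= NATURAL_THRESHOLD) {
    return { verdict: "natural", confidence };
  }
  return { verdict: "uncertain", confidence };
}
```

Keeping an explicit uncertain band avoids raising alerts on borderline scores, at the cost of leaving some audio unclassified.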
validation:
- Upload a test audio file via tRPC client, verify enrollment created
- Request analysis on test audio, verify result structure (verdict, confidence, metadata)
- Verify that user A cannot access user B's enrollments or analyses
- Run cd web && pnpm test for VoicePrint unit tests
notes:
- Reference legacy: services/voiceprint/src/, packages/api/src/routes/voiceprint.routes.ts
- The ECAPA-TDNN and FAISS components may require Python or compiled native modules. If they cannot run in the Node.js monolith:
  - Option A: Create a lightweight Python gRPC/HTTP service for ML inference, call it from the monolith
  - Option B: Use ONNX Runtime Node.js bindings if a converted model is available
  - Option C: Keep the Python service separate but unify the API layer in tRPC (the monolith calls the Python service internally)
- For this task, implement the service layer with a pluggable ML engine interface. Use mock/stub implementations if native ML is not yet available.
- Audio files can be large. Consider streaming uploads instead of base64 encoding for production.
- The analysis pipeline should be idempotent: analyzing the same audio twice should return cached results.
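Two of the notes above, the pluggable ML engine interface and idempotent analysis, can be combined in one sketch. The MlEngine interface, StubMlEngine, and CachingAnalyzer names are hypothetical, the in-memory Map stands in for what would more likely be a database lookup by audio hash, and SHA-256 as the cache key is an assumption.

```typescript
import { createHash } from "node:crypto";

type Verdict = "synthetic" | "natural" | "uncertain";

interface Detection {
  verdict: Verdict;
  confidence: number;
}

// Pluggable engine: a Python bridge, ONNX Runtime, or a stub could all implement this.
export interface MlEngine {
  detect(audio: Uint8Array): Promise<Detection>;
}

// Stub for use until native ML is available; it counts calls so tests can observe caching.
export class StubMlEngine implements MlEngine {
  calls = 0;

  async detect(_audio: Uint8Array): Promise<Detection> {
    this.calls += 1;
    return { verdict: "uncertain", confidence: 0.5 };
  }
}

// Caches results per user and content hash so repeated analysis of the
// same bytes returns the earlier result instead of re-running inference.
export class CachingAnalyzer {
  private readonly cache = new Map<string, Promise<Detection>>();
  private readonly engine: MlEngine;

  constructor(engine: MlEngine) {
    this.engine = engine;
  }

  analyze(userId: string, audio: Uint8Array): Promise<Detection> {
    const audioHash = createHash("sha256").update(audio).digest("hex");
    const key = `${userId}:${audioHash}`;
    let pending = this.cache.get(key);
    if (pending === undefined) {
      pending = this.engine.detect(audio);
      this.cache.set(key, pending);
      // Drop failed attempts so a later call can retry.
      pending.catch(() => this.cache.delete(key));
    }
    return pending;
  }
}
```

Caching the promise rather than the resolved value also means two concurrent requests for the same audio share a single inference call.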