tasks/kordant-unified-restructure/16-voiceprint-router.md
# 16. Backend Router — VoicePrint (Voice Cloning Detection)

meta:
  id: kordant-unified-restructure-16
  feature: kordant-unified-restructure
  priority: P1
  depends_on: [kordant-unified-restructure-12, kordant-unified-restructure-13, kordant-unified-restructure-14]
  tags: [backend, trpc, voiceprint, ml, api]

objective:
- Build the tRPC router for VoicePrint, the AI voice cloning detection service. Port all logic from `services/voiceprint/` and `packages/api/src/routes/voiceprint.routes.ts` into a unified `voiceprint` router and service layer.

deliverables:
- `web/src/server/api/routers/voiceprint.ts` — VoicePrint router:
  - `voiceprint.getEnrollments` — `protectedProcedure` returning the current user's voice enrollments
  - `voiceprint.createEnrollment` — `protectedProcedure` uploading and processing a voice sample
  - `voiceprint.deleteEnrollment` — `protectedProcedure` removing an enrollment owned by the current user
  - `voiceprint.analyzeAudio` — `protectedProcedure` analyzing audio for synthetic voice detection
  - `voiceprint.getAnalyses` — `protectedProcedure` returning paginated analysis history
  - `voiceprint.getAnalysisResult` — `protectedProcedure` returning detailed results for a single analysis
  - `voiceprint.getJobStatus` — `protectedProcedure` checking the status of a batch analysis job
- `web/src/server/services/voiceprint.service.ts` — Core business logic:
  - `createEnrollment(userId, name, audioBuffer, metadata)` — save audio, generate embedding hash
  - `deleteEnrollment(userId, enrollmentId)` — remove the audio file and DB record
  - `analyzeAudio(userId, audioBuffer, enrollmentId?)` — run ML detection:
    - Preprocess audio (VAD, noise reduction)
    - Run the ECAPA-TDNN model for synthetic detection
    - If an enrollment is provided, run FAISS vector matching
    - Return a confidence score and verdict
  - `getAnalyses(userId, filters?)` — query analysis history
  - `createBatchJob(userId, audioFilePath)` — create an analysis job for async processing
- `web/src/server/services/voiceprint/ml.engine.ts` — ML inference:
  - `preprocessAudio(audioBuffer)` — VAD, resampling, noise reduction
  - `detectSynthetic(audioFeatures)` — ECAPA-TDNN inference
  - `matchVoice(embedding, enrollmentId)` — FAISS vector index search
  - `generateEmbedding(audioFeatures)` — create a voice embedding vector
- `web/src/server/services/voiceprint/storage.ts` — Audio file storage:
  - `saveAudio(userId, audioBuffer)` — save to local disk or S3-compatible storage
  - `getAudioUrl(userId, audioHash)` — generate a signed URL for retrieval
  - `deleteAudio(userId, audioHash)` — remove the file (storage paths are user-scoped, so `userId` is needed to locate it)

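The ML engine functions listed above can sit behind one TypeScript interface, so a Python bridge, ONNX Runtime, or a stub can be swapped in without touching `voiceprint.service.ts`. The sketch below is illustrative only: it assumes features are mono `Float32Array` samples and embeddings are plain `number[]` vectors, and `AudioFeatures`, `DetectionResult`, `MlEngine`, and `MockMlEngine` are hypothetical names. The mock's byte decoding and 4-dimensional "embedding" are toys for tests. In place of a FAISS index search, `matchVoice` computes cosine similarity against an in-memory map, which is the score a FAISS inner-product index returns when vectors are L2-normalized.

```typescript
// Hypothetical shapes; real formats depend on the chosen model and bridge.
interface AudioFeatures {
  sampleRate: number;
  samples: Float32Array; // mono PCM after VAD, resampling, and noise reduction
}

interface DetectionResult {
  syntheticProbability: number; // 0..1, higher means more likely synthetic
}

interface MlEngine {
  preprocessAudio(audioBuffer: Uint8Array): Promise<AudioFeatures>;
  detectSynthetic(features: AudioFeatures): Promise<DetectionResult>;
  generateEmbedding(features: AudioFeatures): Promise<number[]>;
  matchVoice(embedding: number[], enrollmentId: string): Promise<number>;
}

// Cosine similarity; brute-force stand-in for a FAISS index search.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // undefined for zero vectors
  return dot / Math.sqrt(normA * normB);
}

// Test double with deterministic outputs and an in-memory enrollment store.
class MockMlEngine implements MlEngine {
  private readonly enrolled = new Map<string, number[]>();
  private readonly syntheticProbability: number;

  constructor(syntheticProbability = 0.1) {
    this.syntheticProbability = syntheticProbability;
  }

  register(enrollmentId: string, embedding: number[]): void {
    this.enrolled.set(enrollmentId, embedding);
  }

  async preprocessAudio(audioBuffer: Uint8Array): Promise<AudioFeatures> {
    // Toy decoding: treat each byte as unsigned 8-bit PCM centred on 128.
    const samples = Float32Array.from(audioBuffer, (b) => (b - 128) / 128);
    return { sampleRate: 16000, samples };
  }

  async detectSynthetic(_features: AudioFeatures): Promise<DetectionResult> {
    return { syntheticProbability: this.syntheticProbability };
  }

  async generateEmbedding(features: AudioFeatures): Promise<number[]> {
    // Toy 4-dimensional embedding: mean sample value of each quarter of the clip.
    const dims = 4;
    const chunk = Math.max(1, Math.ceil(features.samples.length / dims));
    const embedding: number[] = [];
    for (let d = 0; d < dims; d++) {
      const part = features.samples.subarray(d * chunk, (d + 1) * chunk);
      let sum = 0;
      for (let i = 0; i < part.length; i++) sum += part[i];
      embedding.push(part.length > 0 ? sum / part.length : 0);
    }
    return embedding;
  }

  async matchVoice(embedding: number[], enrollmentId: string): Promise<number> {
    const stored = this.enrolled.get(enrollmentId);
    if (!stored) throw new Error(`unknown enrollment ${enrollmentId}`);
    return cosineSimilarity(embedding, stored);
  }
}
```

Because the service would depend only on `MlEngine`, unit tests can construct `MockMlEngine` with a fixed synthetic probability to exercise both the alert and no-alert paths.
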
steps:
1. Create `web/src/server/api/routers/voiceprint.ts`.
2. Define Zod schemas:
   - `createEnrollmentSchema`: `name: z.string().min(1)`, `audioBase64: z.string()` (or multipart upload handling instead of base64)
   - `analyzeAudioSchema`: `audioBase64: z.string()`, `enrollmentId: z.string().uuid().optional()`
   - `analysisFilterSchema`: optional `page`, `limit`, and `verdict` fields
3. Implement router procedures:
   - Enrollment CRUD scoped to the owning user
   - Audio analysis with optional enrollment matching
   - Job status queries
4. Create `web/src/server/services/voiceprint.service.ts`:
   - Port from `services/voiceprint/src/voiceprint.service.ts`
   - Handle the audio preprocessing pipeline
   - Integrate with the ML engine
5. Create the ML engine:
   - `preprocessAudio`: use WebRTC VAD logic or a Node.js equivalent (e.g., `node-vad`)
   - `detectSynthetic`: placeholder for ECAPA-TDNN model integration. If the model cannot run in JS, bridge to a Python microservice or load a converted model with ONNX Runtime.
   - `matchVoice`: placeholder for FAISS integration. FAISS itself is a C++ library, so from Node.js use the `faiss-node` native bindings or a Python bridge.
   - `generateEmbedding`: create the embedding vector stored with the enrollment
6. Create the storage module:
   - Local dev: save to `uploads/voiceprint/{userId}/{hash}.wav`
   - Production: integrate with S3, R2, or similar
   - Generate presigned URLs for client retrieval
7. Implement the analysis pipeline:
   - Save audio → preprocess → run detection → store result → create an alert if synthetic speech is detected
   - If an enrollment is provided, also run matching and include the similarity score
8. Wire the router into `web/src/server/api/root.ts`.
9. Write unit tests for service functions (mock the ML engine).

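The step 7 pipeline can be sketched as a single function that receives its collaborators as an argument, which also keeps step 9's mocked unit tests simple. Everything here is an assumption for illustration: `PipelineDeps` and its method names are hypothetical, `detectSynthetic` is assumed to return a probability in [0, 1], the 0.8/0.2 verdict thresholds are placeholders rather than calibrated values, and `confidence` is defined as `max(p, 1 - p)`.

```typescript
type Verdict = "synthetic" | "natural" | "uncertain";

interface AnalysisResult {
  audioHash: string;
  verdict: Verdict;
  confidence: number; // max(p, 1 - p): how strongly the model leans either way
  similarity?: number; // present only when an enrollment was supplied
}

// Hypothetical collaborators standing in for storage.ts, ml.engine.ts,
// the database, and the alerting system.
interface PipelineDeps {
  saveAudio(userId: string, audio: Uint8Array): Promise<string>; // returns audio hash
  preprocess(audio: Uint8Array): Promise<Float32Array>;
  detectSynthetic(samples: Float32Array): Promise<number>; // P(synthetic) in [0, 1]
  matchEnrollment(samples: Float32Array, enrollmentId: string): Promise<number>;
  storeResult(userId: string, result: AnalysisResult): Promise<void>;
  createAlert(userId: string, result: AnalysisResult): Promise<void>;
}

// Placeholder thresholds; real values would come from model calibration.
const SYNTHETIC_THRESHOLD = 0.8;
const NATURAL_THRESHOLD = 0.2;

function toVerdict(p: number): { verdict: Verdict; confidence: number } {
  const confidence = Math.max(p, 1 - p);
  if (p >= SYNTHETIC_THRESHOLD) return { verdict: "synthetic", confidence };
  if (p <= NATURAL_THRESHOLD) return { verdict: "natural", confidence };
  return { verdict: "uncertain", confidence };
}

async function analyzeAudio(
  deps: PipelineDeps,
  userId: string,
  audio: Uint8Array,
  enrollmentId?: string,
): Promise<AnalysisResult> {
  const audioHash = await deps.saveAudio(userId, audio); // save
  const samples = await deps.preprocess(audio); // preprocess
  const p = await deps.detectSynthetic(samples); // run detection
  const result: AnalysisResult = { audioHash, ...toVerdict(p) };
  if (enrollmentId !== undefined) {
    // Optional matching against the user's enrolled voice.
    result.similarity = await deps.matchEnrollment(samples, enrollmentId);
  }
  await deps.storeResult(userId, result); // store result
  if (result.verdict === "synthetic") {
    await deps.createAlert(userId, result); // alert only on synthetic speech
  }
  return result;
}
```
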
tests:
- Unit: `createEnrollment` saves audio and creates a DB record
- Unit: `analyzeAudio` returns a verdict and confidence score
- Unit: `matchVoice` returns a similarity score for an enrolled voice
- Unit: the storage module saves and retrieves files correctly
- Unit: ML engine placeholders return mock results
- Integration: tRPC procedures enforce user ownership of enrollments

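For the ownership integration test to pass, every enrollment lookup has to be scoped by `userId`. A minimal sketch, assuming a hypothetical in-memory `EnrollmentRepo` in place of the real database layer: a foreign enrollment is reported exactly like a missing one, so user B cannot learn that user A's enrollment ID exists. In the tRPC router this error would naturally become a `TRPCError` with code `NOT_FOUND`; the local `NotFoundError` class only keeps the sketch dependency-free.

```typescript
interface Enrollment {
  id: string;
  userId: string;
  name: string;
}

// Stand-in for TRPCError({ code: "NOT_FOUND" }).
class NotFoundError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NotFoundError";
  }
}

// Hypothetical repository; a database version would filter on both id and userId.
class EnrollmentRepo {
  private readonly rows = new Map<string, Enrollment>();

  add(row: Enrollment): void {
    this.rows.set(row.id, row);
  }

  listForUser(userId: string): Enrollment[] {
    return Array.from(this.rows.values()).filter((r) => r.userId === userId);
  }

  // Missing and foreign enrollments raise the same error, so callers
  // cannot probe for IDs that belong to other users.
  getOwned(userId: string, enrollmentId: string): Enrollment {
    const row = this.rows.get(enrollmentId);
    if (!row || row.userId !== userId) {
      throw new NotFoundError("Enrollment not found");
    }
    return row;
  }

  deleteOwned(userId: string, enrollmentId: string): void {
    this.getOwned(userId, enrollmentId); // ownership check before any mutation
    this.rows.delete(enrollmentId);
  }
}
```
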
acceptance_criteria:
- [ ] Voice enrollments can be created, listed, and deleted per user
- [ ] Audio analysis returns a synthetic/natural/uncertain verdict with a confidence score
- [ ] If an enrollment is provided, the analysis includes a voice-matching similarity score
- [ ] Analysis history is queryable with pagination
- [ ] Batch jobs can be created and their status tracked
- [ ] Audio files are stored securely with user-scoped access
- [ ] Synthetic voice detection triggers an alert notification

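One way to meet the user-scoped storage criterion in local development, before S3 or R2 presigned URLs are wired up, is to derive the `uploads/voiceprint/{userId}/{hash}.wav` path from step 6 with strict input checks and to sign retrieval URLs with an HMAC. This is a sketch under assumptions: the file name is taken to be the SHA-256 of the audio bytes, the `/api/voiceprint/audio` route, its query parameter names, and the helpers `signAudioUrl`/`verifyAudioUrl` are hypothetical, and the secret would come from server configuration. In production, the storage provider's own presigned URLs would replace both helpers.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";
import * as path from "node:path";

const UPLOAD_ROOT = "uploads/voiceprint";

// Assumed naming scheme: hex SHA-256 of the raw audio bytes.
function hashAudio(audio: Uint8Array): string {
  return createHash("sha256").update(audio).digest("hex");
}

// Validate both segments so a crafted userId or hash cannot escape the user's directory.
function audioPath(userId: string, audioHash: string): string {
  if (!/^[A-Za-z0-9_-]+$/.test(userId)) throw new Error("invalid userId");
  if (!/^[a-f0-9]{64}$/.test(audioHash)) throw new Error("invalid audio hash");
  return path.join(UPLOAD_ROOT, userId, `${audioHash}.wav`);
}

function signature(secret: string, userId: string, audioHash: string, expires: number): string {
  return createHmac("sha256", secret).update(`${userId}:${audioHash}:${expires}`).digest("hex");
}

// Hypothetical local route; the HMAC binds user, file, and expiry together.
function signAudioUrl(secret: string, userId: string, audioHash: string, ttlSeconds = 300, nowMs = Date.now()): string {
  const expires = Math.floor(nowMs / 1000) + ttlSeconds;
  const params = new URLSearchParams({
    user: userId,
    hash: audioHash,
    expires: String(expires),
    sig: signature(secret, userId, audioHash, expires),
  });
  return `/api/voiceprint/audio?${params.toString()}`;
}

// Returns the authorized user and file, or null if the URL is expired or tampered with.
function verifyAudioUrl(secret: string, url: string, nowMs = Date.now()): { userId: string; audioHash: string } | null {
  const params = new URL(url, "http://localhost").searchParams;
  const userId = params.get("user");
  const audioHash = params.get("hash");
  const sig = params.get("sig");
  const expires = Number(params.get("expires"));
  if (!userId || !audioHash || !sig || !Number.isInteger(expires)) return null;
  if (expires < Math.floor(nowMs / 1000)) return null; // expired
  const expected = Buffer.from(signature(secret, userId, audioHash, expires), "hex");
  const given = Buffer.from(sig, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  return { userId, audioHash };
}
```

The route handler would call `verifyAudioUrl` and then serve `audioPath(userId, audioHash)` only if it returns a non-null result.
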
validation:
- Upload a test audio file via the tRPC client and verify that the enrollment is created
- Request analysis of the test audio and verify the result structure (verdict, confidence, metadata)
- Verify that user A cannot access user B's enrollments or analyses
- Run `cd web && pnpm test` for the VoicePrint unit tests

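The result-structure check in the second validation item can be automated with a runtime type guard. This sketch assumes `confidence` lies in [0, 1] and that `metadata` is a plain object whose fields this task does not fix; `AnalysisResult` and `isAnalysisResult` are hypothetical names. Inside the router itself, tRPC's `.output()` with a Zod schema would enforce the same shape on every response.

```typescript
type Verdict = "synthetic" | "natural" | "uncertain";

interface AnalysisResult {
  verdict: Verdict;
  confidence: number;
  metadata: Record<string, unknown>;
}

const VERDICTS: readonly string[] = ["synthetic", "natural", "uncertain"];

// Runtime guard for responses from analyzeAudio / getAnalysisResult.
function isAnalysisResult(value: unknown): value is AnalysisResult {
  if (typeof value !== "object" || value === null) return false;
  const { verdict, confidence, metadata } = value as Record<string, unknown>;
  return (
    typeof verdict === "string" &&
    VERDICTS.indexOf(verdict) !== -1 &&
    typeof confidence === "number" &&
    confidence >= 0 && // also rejects NaN
    confidence <= 1 &&
    typeof metadata === "object" &&
    metadata !== null &&
    !Array.isArray(metadata)
  );
}
```
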
notes:
- Reference legacy: `services/voiceprint/src/`, `packages/api/src/routes/voiceprint.routes.ts`
- The ECAPA-TDNN and FAISS components may require Python or compiled native modules. If they cannot run in the Node.js monolith:
  - Option A: Create a lightweight Python gRPC/HTTP service for ML inference and call it from the monolith
  - Option B: Use ONNX Runtime Node.js bindings if a converted model is available
  - Option C: Keep the Python service separate but unify the API layer in tRPC (the monolith calls the Python service internally)
- For this task, implement the service layer against a pluggable ML engine interface. Use mock/stub implementations if native ML is not yet available.
- Audio files can be large, and base64 encoding inflates payloads by about a third. Consider streaming uploads instead of base64 for production.
- The analysis pipeline should be idempotent: analyzing the same audio twice should return the cached result (for example, keyed on a hash of the audio content) instead of rerunning inference.

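A minimal sketch of the idempotency note, assuming an in-process cache: the key combines the user ID, a SHA-256 hash of the audio bytes, and the optional `enrollmentId`, because matching results depend on the enrollment and cached results should never cross users. Caching the in-flight promise also collapses concurrent duplicate requests into one inference run, and failures are evicted so a retry can succeed. `IdempotentAnalyzer` is a hypothetical name; a persistent version would more likely be a unique index on those three values in the analyses table.

```typescript
import { createHash } from "node:crypto";

type Analyze<T> = (audio: Uint8Array, enrollmentId?: string) => Promise<T>;

// Hypothetical in-process cache around the analysis pipeline.
class IdempotentAnalyzer<T> {
  private readonly results = new Map<string, Promise<T>>();
  private readonly analyze: Analyze<T>;

  constructor(analyze: Analyze<T>) {
    this.analyze = analyze;
  }

  // userId keeps cached results user-scoped; enrollmentId matters because
  // the similarity score depends on which enrollment is matched.
  static cacheKey(userId: string, audio: Uint8Array, enrollmentId?: string): string {
    const audioHash = createHash("sha256").update(audio).digest("hex");
    return JSON.stringify([userId, audioHash, enrollmentId ?? null]);
  }

  run(userId: string, audio: Uint8Array, enrollmentId?: string): Promise<T> {
    const key = IdempotentAnalyzer.cacheKey(userId, audio, enrollmentId);
    const cached = this.results.get(key);
    if (cached) return cached;
    // Cache the in-flight promise so concurrent duplicates share one run.
    const pending = this.analyze(audio, enrollmentId).catch((err: unknown) => {
      this.results.delete(key); // do not cache failures; a retry reruns analysis
      throw err;
    });
    this.results.set(key, pending);
    return pending;
  }
}
```
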