# Technical Architecture: AudiobookPipeline Web Platform

## Executive Summary

This document outlines the technical architecture for turning the AudiobookPipeline CLI tool into a full-featured SaaS platform with a web interface, user management, and cloud infrastructure.

**Target Stack:** SolidStart + Turso (SQLite) + S3-compatible storage

---

## Current State Assessment

### Existing Assets
- **CLI Tool**: Mature Python pipeline with 8 stages (parser → analyzer → annotator → voices → segmentation → generation → assembly → validation)
- **TTS Models**: Qwen3-TTS-12Hz-1.7B (VoiceDesign + Base models)
- **Checkpoint System**: Resume capability for long-running jobs
- **Config System**: YAML-based configuration with overrides
- **Output Formats**: WAV + MP3 with loudness normalization

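The checkpoint system is what lets a long job pick up where it stopped. A minimal sketch of the resume decision, assuming the checkpoint records the names of completed stages in order; the stage names come from the list above, while the function name and checkpoint format are hypothetical:

```python
from typing import List

# Stage order of the existing CLI pipeline
STAGES = [
    "parser", "analyzer", "annotator", "voices",
    "segmentation", "generation", "assembly", "validation",
]


def remaining_stages(completed: List[str]) -> List[str]:
    """Return the stages still to run, given the stages a checkpoint marks done.

    Assumes stages finish strictly in order, so a valid checkpoint is always a
    prefix of STAGES; anything else is treated as a corrupt checkpoint.
    """
    if completed != STAGES[: len(completed)]:
        raise ValueError(f"checkpoint is not a prefix of the pipeline: {completed}")
    return STAGES[len(completed):]


if __name__ == "__main__":
    # A job interrupted during TTS generation resumes at "generation"
    done = ["parser", "analyzer", "annotator", "voices", "segmentation"]
    print(remaining_stages(done))  # prints ['generation', 'assembly', 'validation']
```

Treating the checkpoint as an ordered prefix keeps the resume rule trivial and makes out-of-order or partial checkpoints fail loudly instead of silently skipping work.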
### Gaps to Address
1. No user authentication or multi-tenancy
2. No job queue or async processing
3. No API layer for web clients
4. No usage tracking or billing integration
5. CLI-only UX (no dashboard, history, or file management)

---

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│                        Client Layer                         │
│  ┌─────────────┐ ┌───────────┐ ┌─────────────────────────┐  │
│  │     Web     │ │    CLI    │ │   REST API (public)     │  │
│  │     App     │ │ (enhanced)│ │                         │  │
│  │ (SolidStart)│ │           │ │  /api/jobs, /api/files  │  │
│  └─────────────┘ └───────────┘ └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      API Gateway Layer                      │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ SolidStart API Routes                                 │  │
│  │  - Auth middleware (Clerk or custom JWT)              │  │
│  │  - Rate limiting + quota enforcement                  │  │
│  │  - Request validation (Zod)                           │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                        Service Layer                        │
│   ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐   │
│   │    Job    │ │   File    │ │   User    │ │  Billing  │   │
│   │  Service  │ │  Service  │ │  Service  │ │  Service  │   │
│   └───────────┘ └───────────┘ └───────────┘ └───────────┘   │
└─────────────────────────────────────────────────────────────┘
                               │
        ┌──────────────────────┼──────────────────────┐
        ▼                      ▼                      ▼
┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│     Turso     │      │      S3       │      │  GPU Workers  │
│   (SQLite)    │      │   (Storage)   │      │  (TTS Jobs)   │
│               │      │               │      │               │
│ - Users       │      │ - Uploads     │      │ - Qwen3-TTS   │
│ - Jobs        │      │ - Outputs     │      │ - Assembly    │
│ - Files       │      │ - Models      │      │               │
│ - Usage       │      │ - Logs        │      │               │
└───────────────┘      └───────────────┘      └───────────────┘
```

---

## Technology Decisions

### Frontend: SolidStart

**Why SolidStart?**
- Built on SolidJS, a lightweight, high-performance alternative to React (fine-grained reactivity, no virtual DOM)
- Server-side rendering + static generation out of the box
- Built-in API routes (reduces the need for a separate backend)
- Excellent TypeScript support
- Typically smaller client bundles than Next.js

**Key Packages:**
```json
{
  "@solidjs/start": "^1.0.0",
  "solid-js": "^1.8.0",
  "@solidjs/router": "^0.14.0",
  "zod": "^3.22.0"
}
```

### Database: Turso (SQLite)

**Why Turso?**
- Serverless SQLite built on libSQL (Turso's open-source fork of SQLite)
- Edge-compatible: clients connect over HTTP or WebSockets, so it works from serverless and edge runtimes
- Built-in replication and failover
- Free tier: 1GB storage, 1M reads/day
- Comfortably sized for a SaaS with <10k users

**Schema Design:**
```sql
-- Users and auth
CREATE TABLE users (
  id TEXT PRIMARY KEY,
  email TEXT UNIQUE NOT NULL,
  stripe_customer_id TEXT,
  subscription_status TEXT DEFAULT 'free',
  credits INTEGER DEFAULT 0,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Processing jobs
CREATE TABLE jobs (
  id TEXT PRIMARY KEY,
  user_id TEXT NOT NULL REFERENCES users(id),
  status TEXT NOT NULL DEFAULT 'pending'
    CHECK (status IN ('pending', 'processing', 'completed', 'failed')),
  current_stage TEXT,          -- e.g. 'parsing', 'generating'; shown in the UI
  config TEXT,                 -- JSON-encoded pipeline config for this job
  input_file_id TEXT REFERENCES files(id),
  output_file_id TEXT REFERENCES files(id),
  progress INTEGER DEFAULT 0,  -- percent complete, 0-100
  error_message TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  completed_at TIMESTAMP
);

-- File metadata (not the files themselves)
CREATE TABLE files (
  id TEXT PRIMARY KEY,
  user_id TEXT REFERENCES users(id),
  filename TEXT NOT NULL,
  s3_key TEXT UNIQUE NOT NULL,
  file_size INTEGER,           -- bytes
  mime_type TEXT,
  purpose TEXT CHECK (purpose IN ('input', 'output', 'model')),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Usage tracking for billing
CREATE TABLE usage_events (
  id TEXT PRIMARY KEY,
  user_id TEXT NOT NULL REFERENCES users(id),
  job_id TEXT REFERENCES jobs(id),
  minutes_generated REAL,
  cost_cents INTEGER,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Nearly every query filters by user_id
CREATE INDEX idx_jobs_user_id ON jobs(user_id);
CREATE INDEX idx_files_user_id ON files(user_id);
CREATE INDEX idx_usage_events_user_id ON usage_events(user_id);
```

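Because libSQL is SQLite-compatible, the schema can be exercised locally with Python's built-in `sqlite3` module. The sketch below uses a trimmed schema and assumes a `config` column on `jobs` for the job's JSON settings. Two points it illustrates: plain SQLite only enforces `REFERENCES` constraints when `PRAGMA foreign_keys = ON` is set on each connection (worth confirming for whichever libSQL client is used), and every read is scoped by `user_id`. The helper names are hypothetical.

```python
import sqlite3
from typing import List

# Trimmed version of the schema above, enough to show FK enforcement and scoping
SCHEMA = """
CREATE TABLE users (
    id TEXT PRIMARY KEY,
    email TEXT UNIQUE NOT NULL,
    credits INTEGER DEFAULT 0
);
CREATE TABLE jobs (
    id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL REFERENCES users(id),
    status TEXT DEFAULT 'pending',
    config TEXT
);
CREATE INDEX idx_jobs_user_id ON jobs(user_id);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite ignores REFERENCES clauses unless this is enabled per connection
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def jobs_for_user(conn: sqlite3.Connection, user_id: str) -> List[str]:
    # Every read filters by user_id so one tenant never sees another's jobs
    rows = conn.execute(
        "SELECT id FROM jobs WHERE user_id = ? ORDER BY id", (user_id,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = connect()
    conn.executemany("INSERT INTO users (id, email) VALUES (?, ?)",
                     [("u1", "a@example.com"), ("u2", "b@example.com")])
    conn.executemany("INSERT INTO jobs (id, user_id) VALUES (?, ?)",
                     [("j1", "u1"), ("j2", "u2"), ("j3", "u1")])
    print(jobs_for_user(conn, "u1"))  # prints ['j1', 'j3']
```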
### Storage: S3-Compatible

**Why S3?**
- Industry standard for file storage
- Cheap (~$0.023/GB/month for S3 Standard in us-east-1)
- CDN integration (CloudFront)
- Lifecycle policies for automatic cleanup (e.g. expiring old uploads and logs)

**Use Cases:**
- User uploads (input ebooks)
- Generated audiobooks (output WAV/MP3)
- Model checkpoints (Qwen3-TTS weights)
- Processing logs

**Directory Structure:**
```
s3://audiobookpipeline-{env}/
├── uploads/{user_id}/{timestamp}_{filename}
├── outputs/{user_id}/{job_id}/
│   ├── audiobook.wav
│   ├── audiobook.mp3
│   └── metadata.json
├── models/
│   ├── qwen3-tts-voicedesign/
│   └── qwen3-tts-base/
└── logs/{date}/{job_id}.log
```

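The key layout above can be centralized in a few helpers so every upload and output key is built the same way and always carries the `user_id` prefix. A sketch under stated assumptions: the helper names, the UTC `%Y%m%dT%H%M%SZ` timestamp format, and the filename sanitization rule are hypothetical choices, not part of the existing pipeline.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def _safe_filename(filename: str) -> str:
    """Reduce a user-supplied name to a single safe key segment."""
    # Keep only the last path component, then replace anything outside a
    # conservative character whitelist
    base = filename.replace("\\", "/").rsplit("/", 1)[-1]
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base)
    return cleaned or "upload"


def upload_key(user_id: str, filename: str, now: Optional[datetime] = None) -> str:
    now = now or datetime.now(timezone.utc)
    return f"uploads/{user_id}/{now:%Y%m%dT%H%M%SZ}_{_safe_filename(filename)}"


def output_key(user_id: str, job_id: str, name: str) -> str:
    # name is one of the fixed artifact names, e.g. "audiobook.mp3"
    return f"outputs/{user_id}/{job_id}/{name}"


def log_key(job_id: str, now: Optional[datetime] = None) -> str:
    now = now or datetime.now(timezone.utc)
    return f"logs/{now:%Y-%m-%d}/{job_id}.log"
```

Because keys are derived server-side from `user_id` and `job_id`, neither the web app nor the workers ever need to trust a client-supplied path.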
### GPU Workers: Serverless or Containerized

**Option A: Autoscaling GPU containers on AWS (EKS with GPU node groups; AWS Lambda itself offers no GPUs)**
- Pros: Auto-scaling, pay only while GPU nodes are running
- Cons: Complex setup, cold starts while GPU nodes boot and model weights load

**Option B: RunPod / Lambda Labs**
- Pros: GPU-optimized, simple API
- Cons: Vendor lock-in

**Option C: Self-hosted on EC2 g4dn.xlarge (1× NVIDIA T4)**
- Pros: Full control, predictable pricing (budgeted here at ~$0.75/hr, above the ~$0.53/hr on-demand rate in us-east-1)
- Cons: Manual scaling, always-on cost

**Recommendation:** Start with **Option C** (1-2 GPU instances) + job queue. Scale to serverless later.

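The recommended setup (one or two GPU instances behind a job queue) behaves like a fixed pool of workers, each taking one job at a time so a single TTS model stays resident per GPU. A minimal in-process sketch, using `asyncio.Queue` as a stand-in for the real Redis-backed queue; `run_pipeline` is a hypothetical placeholder for the actual TTS work.

```python
import asyncio
from typing import Dict, List


async def run_pipeline(job_id: str) -> str:
    # Hypothetical stand-in for running the full pipeline on one GPU
    await asyncio.sleep(0.01)
    return f"{job_id}: done"


async def gpu_worker(queue: asyncio.Queue, results: Dict[str, str]) -> None:
    # One worker per GPU instance; each handles a single job at a time
    while True:
        job_id = await queue.get()
        try:
            results[job_id] = await run_pipeline(job_id)
        except Exception as exc:  # record the failure, keep the worker alive
            results[job_id] = f"{job_id}: failed ({exc})"
        finally:
            queue.task_done()


async def process_all(job_ids: List[str], gpu_count: int = 2) -> Dict[str, str]:
    queue: asyncio.Queue = asyncio.Queue()
    for job_id in job_ids:
        queue.put_nowait(job_id)

    results: Dict[str, str] = {}
    workers = [
        asyncio.create_task(gpu_worker(queue, results)) for _ in range(gpu_count)
    ]
    await queue.join()  # returns once every queued job has been marked done
    for worker in workers:
        worker.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(asyncio.run(process_all(["job-1", "job-2", "job-3"])))
```

With a real queue, `gpu_count` corresponds to the number of worker processes (one per GPU instance), so scaling out later means starting more consumers rather than changing the API.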
---

## Core Components

### 1. Job Processing Pipeline

```python
# services/job_processor.py
from pathlib import Path


class JobProcessor:
    """Processes audiobook generation jobs pulled from the job queue.

    The stage methods (parse_ebook, analyze_book, ...) wrap the existing CLI
    pipeline stages and are not shown; annotator, voices and validation are
    omitted here for brevity but slot in the same way.
    """

    def __init__(self, db, file_service, work_dir: Path) -> None:
        self.db = db
        self.file_service = file_service
        self.work_dir = work_dir

    async def process_job(self, job_id: str) -> None:
        job = await self.db.get_job(job_id)
        job_dir = self.work_dir / job_id
        job_dir.mkdir(parents=True, exist_ok=True)

        try:
            # Download the input ebook from S3 into this job's scratch directory
            input_path = await self.file_service.download(job.input_file_id, job_dir)

            # Run pipeline stages in order, recording stage and percent before each
            stages = [
                ("parsing", self.parse_ebook),
                ("analyzing", self.analyze_book),
                ("segmenting", self.segment_text),
                ("generating", self.generate_audio),
                ("assembling", self.assemble_audiobook),
            ]
            for index, (stage_name, stage_func) in enumerate(stages):
                percent = index * 100 // len(stages)
                await self.update_progress(job_id, stage_name, percent)
                await stage_func(input_path, job.config)

            # Upload outputs under outputs/{user_id}/{job_id}/
            output_file_id = await self.file_service.upload(
                user_id=job.user_id,
                job_id=job_id,
                files=[job_dir / "audiobook.wav", job_dir / "audiobook.mp3"],
            )

            await self.db.complete_job(job_id, output_file_id)

        except Exception as e:
            # Record the failure for the dashboard, then re-raise so the
            # queue can apply its retry policy
            await self.db.fail_job(job_id, str(e))
            raise
```

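`complete_job` and `fail_job` should only move a job along valid status transitions, so that, for example, a late retry cannot overwrite a job that already finished. A sketch of that guard using the status values from the `jobs` table; allowing `failed → pending` (for manual retries) is an assumption, and the names are hypothetical.

```python
from enum import Enum


class JobStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# Allowed transitions: "completed" is terminal, "failed" may be re-queued
TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.PROCESSING},
    JobStatus.PROCESSING: {JobStatus.COMPLETED, JobStatus.FAILED},
    JobStatus.FAILED: {JobStatus.PENDING},
    JobStatus.COMPLETED: set(),
}


def transition(current: str, new: str) -> JobStatus:
    """Validate a status change and return the new status."""
    cur, nxt = JobStatus(current), JobStatus(new)  # unknown values raise ValueError
    if nxt not in TRANSITIONS[cur]:
        raise ValueError(f"invalid job transition {cur.value} -> {nxt.value}")
    return nxt
```

In the database, the same guard can be applied atomically with a conditional update such as `UPDATE jobs SET status = 'completed' WHERE id = ? AND status = 'processing'`, treating zero affected rows as a rejected transition.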
### 2. API Routes (SolidStart)

```typescript
// src/routes/api/jobs.ts
import type { APIEvent } from "@solidjs/start/server";
import { z } from "zod";
// App-specific helpers: auth, data access, queue client (not shown)
import { requireAuth } from "~/lib/auth";
import { db } from "~/lib/db";
import { jobQueue } from "~/lib/queue";

const createJobSchema = z.object({
  fileId: z.string(),
  config: z
    .object({
      voices: z.object({
        narrator: z.string().optional(),
      }),
    })
    .optional(),
});

export async function POST(event: APIEvent) {
  const user = await requireAuth(event);

  // Reject malformed bodies with 400 instead of letting parse() throw a 500
  const parsed = createJobSchema.safeParse(await event.request.json());
  if (!parsed.success) {
    return Response.json({ error: parsed.error.flatten() }, { status: 400 });
  }
  const { fileId, config } = parsed.data;

  // Only allow jobs on files the caller owns
  const file = await db.getFile(fileId);
  if (!file || file.userId !== user.id) {
    return Response.json({ error: "File not found" }, { status: 404 });
  }

  // Check quota
  const credits = await db.getUserCredits(user.id);
  if (credits < 1) {
    return Response.json({ error: "Insufficient credits" }, { status: 402 });
  }

  // Create job
  const job = await db.createJob({
    userId: user.id,
    inputFileId: fileId,
    config,
  });

  // Queue for processing
  await jobQueue.add("process-audiobook", { jobId: job.id });

  return Response.json({ job }, { status: 201 });
}
```

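The quota check above reads the balance and creates the job in separate steps, so two concurrent requests can both pass with only one credit left. A common fix is to deduct the credit in a single conditional `UPDATE` and treat zero affected rows as "insufficient credits". A sketch against SQLite via Python's `sqlite3` (libSQL accepts the same SQL); the function name is hypothetical.

```python
import sqlite3


def try_consume_credit(conn: sqlite3.Connection, user_id: str, amount: int = 1) -> bool:
    """Atomically deduct credits; return False if the balance is too low."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            # The balance check and the decrement happen in one statement,
            # so concurrent requests cannot both spend the last credit
            "UPDATE users SET credits = credits - ? WHERE id = ? AND credits >= ?",
            (amount, user_id, amount),
        )
    return cur.rowcount == 1
```

In the route, this would replace the read-then-compare on `getUserCredits`; ideally the job insert runs in the same transaction, or the credit is refunded if job creation fails.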
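### 3. Dashboard UI

```tsx
// src/routes/dashboard.tsx
import { Show } from "solid-js";
import { cache, createAsync } from "@solidjs/router";
// App-specific helpers and components (not shown)
import { getSessionUser } from "~/lib/auth";
import { db } from "~/lib/db";
import { JobList, StatsCard, UploadButton } from "~/components";

// Runs on the server: the user comes from the session, never from a
// client-supplied userId query parameter
const getDashboard = cache(async () => {
  "use server";
  const user = await getSessionUser();
  const jobs = await db.listJobs(user.id);
  return { credits: user.credits, jobs };
}, "dashboard");

export default function Dashboard() {
  const data = createAsync(() => getDashboard());

  return (
    <div class="dashboard">
      <h1>Audiobook Pipeline</h1>

      <Show when={data()} fallback={<p>Loading...</p>}>
        {(d) => (
          <>
            <StatsCard
              credits={d().credits}
              booksGenerated={d().jobs.filter((j) => j.status === "completed").length}
            />
            <UploadButton />
            <JobList jobs={d().jobs} />
          </>
        )}
      </Show>
    </div>
  );
}
```

---
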
## Security Considerations

### Authentication
- **Option 1:** Clerk (fastest to implement, $0-25/mo)
- **Option 2:** Custom JWT with email magic links
- **Recommendation:** Clerk for MVP

### Authorization
- Per-user row scoping enforced in application queries (SQLite/libSQL has no built-in row-level security)
- S3 pre-signed URLs with short expiration for uploads and downloads
- API rate limiting per user

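Per-user rate limiting is often implemented as a token bucket: each user accrues tokens at a fixed rate up to a burst cap, and each request spends one. An in-memory sketch with illustrative rate and capacity values; a deployment with several server instances would keep the buckets in shared storage such as Redis instead. The class name is hypothetical, and the injectable clock exists only to make the behavior testable.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class TokenBucketLimiter:
    rate: float = 1.0        # tokens added per second
    capacity: float = 10.0   # maximum burst size
    clock: Callable[[], float] = time.monotonic
    _buckets: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        # New users start with a full bucket
        tokens, last = self._buckets.get(user_id, (self.capacity, now))
        # Refill in proportion to elapsed time, never beyond capacity
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[user_id] = (tokens - 1, now)
            return True
        self._buckets[user_id] = (tokens, now)
        return False
```

When `allow` returns `False`, the API route would typically respond with HTTP 429.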
### Data Isolation
- All user-owned S3 keys (`uploads/`, `outputs/`) include a `user_id` prefix
- Database queries always filter by `user_id`
- GPU workers validate job ownership before downloading inputs

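One lightweight form of "GPU workers validate job ownership" is to check that the job's input file belongs to the job's user, and that its key sits under that user's upload prefix, before anything is downloaded. A sketch with hypothetical record types that mirror the `jobs` and `files` tables:

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    id: str
    user_id: str
    s3_key: str


@dataclass
class JobRecord:
    id: str
    user_id: str
    input_file_id: str


def assert_job_owns_input(job: JobRecord, file: FileRecord) -> None:
    """Raise PermissionError unless the input file belongs to the job's user."""
    if file.id != job.input_file_id:
        raise PermissionError(f"job {job.id} does not reference file {file.id}")
    if file.user_id != job.user_id:
        raise PermissionError(f"file {file.id} is owned by another user")
    # Defense in depth: the key itself must live under the user's prefix.
    # The trailing slash stops "uploads/u1/" from matching "uploads/u10/".
    if not file.s3_key.startswith(f"uploads/{job.user_id}/"):
        raise PermissionError(f"unexpected key for user {job.user_id}: {file.s3_key}")
```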
---

## Deployment Architecture

### Development
```bash
# Local setup (run each in its own terminal)
npm run dev                  # SolidStart dev server
turso dev                    # Local libSQL (SQLite) server
minio server ./minio-data    # Local S3-compatible storage
```

### Production (Vercel + Turso)
```
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│    Vercel    │─────▶│    Turso     │◀─────│  GPU Fleet   │
│ (SolidStart) │      │  (Database)  │      │  (Workers)   │
└──────────────┘      └──────────────┘      └──────────────┘
       │                                        ▲      │
       │              ┌──────────────┐          │      │
       ├─────────────▶│    Redis     │──────────┘      │
       │              │ (Job Queue)  │                 │
       │              └──────────────┘                 │
       │              ┌──────────────┐                 │
       └─────────────▶│      S3      │◀────────────────┘
                      │  (Storage)   │
                      └──────────────┘
```

### CI/CD Pipeline
```yaml
# .github/workflows/deploy.yml
name: Deploy

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm test

  deploy:
    needs: test
    runs-on: ubuntu-latest
    env:
      VERCEL_ORG_ID: ${{ secrets.VERCEL_ORG_ID }}
      VERCEL_PROJECT_ID: ${{ secrets.VERCEL_PROJECT_ID }}
    steps:
      - uses: actions/checkout@v4
      # Build and deploy with the Vercel CLI
      - run: npm install --global vercel@latest
      - run: vercel pull --yes --environment=production --token=${{ secrets.VERCEL_TOKEN }}
      - run: vercel build --prod --token=${{ secrets.VERCEL_TOKEN }}
      - run: vercel deploy --prebuilt --prod --token=${{ secrets.VERCEL_TOKEN }}
```

---

## MVP Implementation Plan

### Phase 1: Foundation (Weeks 1-2)
- [ ] Set up SolidStart project structure
- [ ] Integrate Turso database
- [ ] Implement user auth (Clerk)
- [ ] Create file upload endpoint (S3)
- [ ] Build basic dashboard UI

### Phase 2: Pipeline Integration (Weeks 2-3)
- [ ] Containerize existing Python pipeline
- [ ] Set up a Redis-backed job queue (e.g. BullMQ; the Python workers need a compatible consumer)
- [ ] Implement job processor service
- [ ] Add progress tracking API
- [ ] Connect GPU workers

### Phase 3: User Experience (Weeks 3-4)
- [ ] Job history UI with status indicators
- [ ] Audio player for preview/download
- [ ] Usage dashboard + credit system
- [ ] Stripe integration for payments
- [ ] Email notifications on job completion

---

## Cost Analysis

### Infrastructure Costs (Monthly)

| Component | Tier | Cost |
|-----------|------|------|
| Vercel | Pro | $20/mo |
| Turso | Free tier | $0/mo (<1M reads/day) |
| S3 Storage | 1TB (storage only; requests and egress extra) | $23/mo |
| GPU (g4dn.xlarge) | 730 hrs/mo at $0.75/hr | $548/mo |
| Redis (job queue) | Hobby | $9/mo |
| **Total** | | **~$600/mo** |

### Unit Economics

- GPU cost per hour: $0.75
- Average book processing time: 2 hours (30k words)
- Cost per book: ~$1.50 (GPU only)
- Subscription price: $39/mo (unlimited books, subject to fair use)
- **Gross margin: ~96% at one book per subscriber per month**, falling as usage rises (~85% at four books/month), because the price is flat while GPU cost scales with books generated

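The arithmetic behind these figures can be made explicit using the numbers above ($0.75/hr, 2 hours per book, $39/month, ~$600/month infrastructure). With Option C the GPU is always on, so GPU spend is effectively fixed: the per-book figure is an allocation. The more useful questions are therefore how many books one instance can produce and how many subscribers cover the monthly bill. The function names below are illustrative only.

```python
import math

GPU_COST_PER_HOUR = 0.75
HOURS_PER_BOOK = 2.0
SUBSCRIPTION_PRICE = 39.0
FIXED_MONTHLY_COST = 600.0  # infrastructure total from the cost table


def gross_margin(books_per_month: float) -> float:
    """GPU-only gross margin for one subscriber at a given usage level."""
    gpu_cost = books_per_month * HOURS_PER_BOOK * GPU_COST_PER_HOUR
    return 1 - gpu_cost / SUBSCRIPTION_PRICE


def books_per_instance_month(hours_in_month: float = 730) -> int:
    """Upper bound on books one always-on GPU instance can process."""
    return int(hours_in_month // HOURS_PER_BOOK)


def breakeven_subscribers() -> int:
    """Subscribers needed to cover the fixed monthly infrastructure bill."""
    return math.ceil(FIXED_MONTHLY_COST / SUBSCRIPTION_PRICE)


if __name__ == "__main__":
    for books in (1, 4, 10):
        print(f"{books} book(s)/month -> {gross_margin(books):.1%} margin")
    print(books_per_instance_month(), breakeven_subscribers())
```

At these inputs, one instance tops out at 365 two-hour books a month, and 16 subscribers cover the ~$600 bill, so utilization rather than per-book cost is what drives profitability early on.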
---

## Next Steps

1. **Immediate:** Set up SolidStart + Turso scaffolding
2. **This Week:** Implement auth + file upload
3. **Next Week:** Containerize Python pipeline + job queue
4. **Week 3:** Dashboard UI + Stripe integration

---

## Appendix: Environment Variables

```bash
# Database
TURSO_DATABASE_URL="libsql://frenocorp.turso.io"
TURSO_AUTH_TOKEN="..."

# Storage
AWS_ACCESS_KEY_ID="..."
AWS_SECRET_ACCESS_KEY="..."
AWS_S3_BUCKET="audiobookpipeline-prod"
AWS_REGION="us-east-1"

# Auth (Vite, which SolidStart builds on, only exposes VITE_-prefixed variables to client code)
CLERK_SECRET_KEY="..."
VITE_CLERK_PUBLISHABLE_KEY="..."

# Billing
STRIPE_SECRET_KEY="..."
STRIPE_WEBHOOK_SECRET="..."

# GPU Workers
GPU_WORKER_ENDPOINT="https://workers.audiobookpipeline.com"
GPU_API_KEY="..."
```

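The web app and the GPU workers each need a subset of these variables, and failing fast at startup on a missing one is cheaper than failing mid-job. A sketch for the Python workers; which variables a worker actually needs is an assumption here, as is the `load_config` name.

```python
import os
from typing import Dict, Mapping, Sequence

# Assumed subset the GPU workers need (database + storage access)
WORKER_REQUIRED = (
    "TURSO_DATABASE_URL",
    "TURSO_AUTH_TOKEN",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_BUCKET",
    "AWS_REGION",
)


def load_config(required: Sequence[str] = WORKER_REQUIRED,
                env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, raising if any are missing or empty."""
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in required}
```

Calling `load_config()` once at worker startup surfaces every missing variable in a single error message instead of one per crash.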