# Technical Architecture: AudiobookPipeline Web Platform

## Executive Summary

This document outlines the technical architecture for transforming the AudiobookPipeline CLI tool into a full-featured SaaS platform with a web interface, user management, and cloud infrastructure.

**Target Stack:** SolidStart + Turso (SQLite) + S3-compatible storage

---

## Current State Assessment

### Existing Assets
- **CLI Tool**: Mature Python pipeline with 8 stages (parser → analyzer → annotator → voices → segmentation → generation → assembly → validation)
- **TTS Models**: Qwen3-TTS-12Hz-1.7B (VoiceDesign + Base models)
- **Checkpoint System**: Resume capability for long-running jobs
- **Config System**: YAML-based configuration with overrides
- **Output Formats**: WAV + MP3 with loudness normalization

### Gaps to Address
1. No user authentication or multi-tenancy
2. No job queue or async processing
3. No API layer for web clients
4. No usage tracking or billing integration
5. CLI-only UX (no dashboard, history, or file management)

---

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│                        Client Layer                         │
│ ┌──────────────┐  ┌────────────┐  ┌───────────────────────┐ │
│ │     Web      │  │    CLI     │  │   REST API (public)   │ │
│ │     App      │  │ (enhanced) │  │                       │ │
│ │ (SolidStart) │  │            │  │ /api/jobs, /api/files │ │
│ └──────────────┘  └────────────┘  └───────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      API Gateway Layer                      │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                 SolidStart API Routes                 │  │
│  │  - Auth middleware (Clerk or custom JWT)              │  │
│  │  - Rate limiting + quota enforcement                  │  │
│  │  - Request validation (Zod)                           │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                        Service Layer                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐   │
│  │   Job    │  │   File   │  │   User   │  │  Billing   │   │
│  │ Service  │  │ Service  │  │ Service  │  │  Service   │   │
│  └──────────┘  └──────────┘  └──────────┘  └────────────┘   │
└─────────────────────────────────────────────────────────────┘
                               │
           ┌───────────────────┼──────────────────┐
           ▼                   ▼                  ▼
  ┌─────────────────┐   ┌──────────────┐   ┌──────────────┐
  │      Turso      │   │      S3      │   │     GPU      │
  │    (SQLite)     │   │  (Storage)   │   │   Workers    │
  │                 │   │              │   │  (TTS Jobs)  │
  │ - Users         │   │ - Uploads    │   │              │
  │ - Jobs          │   │ - Outputs    │   │ - Qwen3-TTS  │
  │ - Usage         │   │ - Models     │   │ - Assembly   │
  │ - Subscriptions │   │              │   │              │
  └─────────────────┘   └──────────────┘   └──────────────┘
```

---

## Technology Decisions

### Frontend: SolidStart

**Why SolidStart?**
- Built on SolidJS, a lightweight, high-performance alternative to React
- Server-side rendering + static generation out of the box
- Built-in API routes (reduces the need for a separate backend)
- Excellent TypeScript support
- Typically smaller client bundles than a comparable Next.js app

**Key Packages:**
```json
{
  "@solidjs/start": "^1.0.0",
  "solid-js": "^1.8.0",
  "@solidjs/router": "^0.14.0",
  "zod": "^3.22.0"
}
```

### Database: Turso (SQLite)

**Why Turso?**
- Managed, serverless SQLite built on libSQL
- Edge-compatible (reachable over HTTP from edge and serverless runtimes)
- Built-in replication and failover
- Free tier: 1GB storage, 1M reads/day (plan limits change; confirm before launch)
- Well suited to a SaaS with fewer than 10k users

**Schema Design:**
```sql
-- Users and auth
CREATE TABLE users (
  id TEXT PRIMARY KEY,
  email TEXT UNIQUE NOT NULL,
  stripe_customer_id TEXT,
  subscription_status TEXT DEFAULT 'free',
  credits INTEGER DEFAULT 0,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Processing jobs
CREATE TABLE jobs (
  id TEXT PRIMARY KEY,
  user_id TEXT REFERENCES users(id),
  status TEXT DEFAULT 'pending', -- pending, processing, completed, failed
  input_file_id TEXT,
  output_file_id TEXT,
  config TEXT, -- JSON-encoded job config (passed by the API, read by the processor as job.config)
  progress INTEGER DEFAULT 0,
  error_message TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  completed_at TIMESTAMP
);

-- File metadata (not the files themselves)
CREATE TABLE files (
  id TEXT PRIMARY KEY,
  user_id TEXT REFERENCES users(id),
  filename TEXT NOT NULL,
  s3_key TEXT UNIQUE NOT NULL,
  file_size INTEGER,
  mime_type TEXT,
  purpose TEXT, -- input, output, model
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Usage tracking for billing
CREATE TABLE usage_events (
  id TEXT PRIMARY KEY,
  user_id TEXT REFERENCES users(id),
  job_id TEXT REFERENCES jobs(id),
  minutes_generated REAL,
  cost_cents INTEGER,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
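
As a rough illustration of how the `users.credits` and `usage_events` columns might work together, the sketch below deducts a credit and records a usage event in one transaction. It runs against an in-memory SQLite database using Python's standard `sqlite3` module rather than Turso itself; the one-credit-per-job rule and the helper names `record_usage` and `total_minutes` are assumptions made for this example, not part of the existing codebase.

```python
import sqlite3
import uuid

# Trimmed copy of the schema above (only the columns this sketch touches).
SCHEMA = """
CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT UNIQUE NOT NULL,
                    credits INTEGER DEFAULT 0);
CREATE TABLE jobs (id TEXT PRIMARY KEY, user_id TEXT REFERENCES users(id),
                   status TEXT DEFAULT 'pending');
CREATE TABLE usage_events (
  id TEXT PRIMARY KEY,
  user_id TEXT REFERENCES users(id),
  job_id TEXT REFERENCES jobs(id),
  minutes_generated REAL,
  cost_cents INTEGER,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""


def record_usage(conn, user_id, job_id, minutes, cost_cents):
    """Deduct one credit (assumed price of a job) and log the usage event."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "UPDATE users SET credits = credits - 1 "
            "WHERE id = ? AND credits >= 1",
            (user_id,),
        )
        if cur.rowcount != 1:
            raise ValueError("insufficient credits")
        conn.execute(
            "INSERT INTO usage_events "
            "(id, user_id, job_id, minutes_generated, cost_cents) "
            "VALUES (?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), user_id, job_id, minutes, cost_cents),
        )


def total_minutes(conn, user_id):
    """Sum of generated audio minutes for one user (0 if none)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(minutes_generated), 0) "
        "FROM usage_events WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return row[0]
```

Guarding the `UPDATE` with `credits >= 1` keeps the balance from going negative even if two requests race, which a separate read-then-write check would not.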

### Storage: S3-Compatible

**Why S3?**
- Industry standard for file storage
- Cheap (~$0.023/GB/month)
- CDN integration (CloudFront)
- Lifecycle policies for cleanup

**Use Cases:**
- User uploads (input ebooks)
- Generated audiobooks (output WAV/MP3)
- Model checkpoints (Qwen3-TTS weights)
- Processing logs

**Directory Structure:**
```
s3://audiobookpipeline-{env}/
├── uploads/{user_id}/{timestamp}_{filename}
├── outputs/{user_id}/{job_id}/
│   ├── audiobook.wav
│   ├── audiobook.mp3
│   └── metadata.json
├── models/
│   ├── qwen3-tts-voicedesign/
│   └── qwen3-tts-base/
└── logs/{date}/{job_id}.log
```

### GPU Workers: Serverless or Containerized

**Option A: AWS autoscaling GPU containers (EKS)**
- Pros: Auto-scaling, pay-per-use
- Cons: Complex setup, cold starts (AWS Lambda itself offers no GPU support, so GPU work must run on EKS GPU nodes)

**Option B: RunPod / Lambda Labs**
- Pros: GPU-optimized, simple API
- Cons: Vendor lock-in

**Option C: Self-hosted on EC2 g4dn.xlarge**
- Pros: Full control, predictable pricing (~$0.75/hr)
- Cons: Manual scaling, always-on cost

**Recommendation:** Start with **Option C** (1-2 GPU instances) behind a job queue. Move to a serverless option once demand justifies it.
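
To make the "small fixed fleet plus queue" idea concrete, here is a minimal sketch in which a fixed number of workers drain a shared queue of job IDs. An in-process `asyncio.Queue` stands in for the real queue (the plan elsewhere in this document is a Redis-backed one), and `handler` is a hypothetical coroutine that would run the pipeline for one job. The point is only that concurrency is capped at the number of GPU workers while a failing job does not stop the others.

```python
import asyncio


async def worker(queue: asyncio.Queue, handler, results: dict) -> None:
    """Pull job IDs forever; record the outcome of each one."""
    while True:
        job_id = await queue.get()
        try:
            results[job_id] = await handler(job_id)
        except Exception as exc:  # one bad job must not kill the worker
            results[job_id] = f"failed: {exc}"
        finally:
            queue.task_done()


async def run_workers(job_ids, handler, num_workers: int = 2) -> dict:
    """Process all job_ids with at most num_workers running concurrently."""
    queue: asyncio.Queue = asyncio.Queue()
    for job_id in job_ids:
        queue.put_nowait(job_id)

    results: dict = {}
    tasks = [
        asyncio.create_task(worker(queue, handler, results))
        for _ in range(num_workers)
    ]
    await queue.join()  # wait until every queued job has been handled
    for task in tasks:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*tasks, return_exceptions=True)
    return results
```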

---

## Core Components

### 1. Job Processing Pipeline

```python
# services/job_processor.py
class JobProcessor:
    """Processes audiobook generation jobs."""

    def __init__(self, db, file_service):
        # Injected services; the stage methods below wrap the existing CLI stages (not shown)
        self.db = db
        self.file_service = file_service

    async def process_job(self, job_id: str) -> None:
        job = await self.db.get_job(job_id)

        try:
            # Download input file from S3
            input_path = await self.file_service.download(job.input_file_id)

            # Run pipeline stages with progress updates
            stages = [
                ("parsing", self.parse_ebook),
                ("analyzing", self.analyze_book),
                ("segmenting", self.segment_text),
                ("generating", self.generate_audio),
                ("assembling", self.assemble_audiobook),
            ]

            # Assumption: each stage returns the input for the next stage
            artifact = input_path
            for stage_name, stage_func in stages:
                await self.update_progress(job_id, stage_name)
                artifact = await stage_func(artifact, job.config)

            # Upload output to S3 (file names match the outputs/ layout)
            output_file_id = await self.file_service.upload(
                job_id=job_id,
                files=["audiobook.wav", "audiobook.mp3"],
            )

            await self.db.complete_job(job_id, output_file_id)

        except Exception as e:
            # Record the failure for the dashboard, then re-raise for the queue
            await self.db.fail_job(job_id, str(e))
            raise
```
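
The processor reports progress by stage name, while the `jobs.progress` column is an integer. One possible bridge is a weighted mapping from stage to percentage, sketched below. The weights are illustrative assumptions (audio generation is assumed to dominate runtime) and `stage_progress` is a hypothetical helper.

```python
# (stage name, share of total runtime in percent); the shares are assumed values
STAGE_WEIGHTS = [
    ("parsing", 5),
    ("analyzing", 10),
    ("segmenting", 5),
    ("generating", 70),
    ("assembling", 10),
]


def stage_progress(stage: str, fraction_done: float = 0.0) -> int:
    """Overall percent complete when `stage` is `fraction_done` (0..1) finished."""
    if not 0.0 <= fraction_done <= 1.0:
        raise ValueError("fraction_done must be between 0 and 1")
    completed = 0
    for name, weight in STAGE_WEIGHTS:
        if name == stage:
            return int(completed + weight * fraction_done)
        completed += weight
    raise KeyError(f"unknown stage: {stage}")
```

With these weights, a job halfway through audio generation would report `stage_progress("generating", 0.5)`, which is 55.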

### 2. API Routes (SolidStart)

```typescript
// src/routes/api/jobs.ts
import type { APIEvent } from "@solidjs/start/server";
import { z } from "zod";
// Hypothetical app modules: auth helper, data-access layer, job queue
import { requireAuth } from "~/lib/auth";
import { db } from "~/lib/db";
import { jobQueue } from "~/lib/queue";

const schema = z.object({
  fileId: z.string(),
  config: z.object({
    voices: z.object({
      narrator: z.string().optional(),
    }),
  }).optional(),
});

export async function POST(event: APIEvent) {
  const user = await requireAuth(event);

  // Return 400 on a malformed body instead of throwing
  const parsed = schema.safeParse(await event.request.json());
  if (!parsed.success) {
    return Response.json({ error: parsed.error.flatten() }, { status: 400 });
  }
  const { fileId, config } = parsed.data;

  // Check quota
  const credits = await db.getUserCredits(user.id);
  if (credits < 1) {
    return Response.json({ error: "Insufficient credits" }, { status: 402 });
  }

  // Create job
  const job = await db.createJob({
    userId: user.id,
    inputFileId: fileId,
    config,
  });

  // Queue for processing
  await jobQueue.add("process-audiobook", { jobId: job.id });

  return Response.json({ job });
}
```
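
### 3. Dashboard UI

```tsx
// src/routes/dashboard.tsx
import { createResource, Show } from "solid-js";
// Hypothetical app modules
import { useUser } from "~/lib/auth";
import { StatsCard, UploadButton, JobList } from "~/components";

async function fetchJobs() {
  // The server scopes results to the authenticated user; no userId in the URL
  const res = await fetch("/api/jobs");
  if (!res.ok) throw new Error(`Failed to load jobs: ${res.status}`);
  return res.json();
}

export default function Dashboard() {
  const user = useUser();
  const [jobs] = createResource(fetchJobs);

  return (
    <div class="dashboard">
      <h1>Audiobook Pipeline</h1>

      {/* Render stats and list only once the jobs have loaded */}
      <Show when={jobs()} fallback={<p>Loading jobs...</p>}>
        {(list) => (
          <>
            <StatsCard credits={user.credits} booksGenerated={list().length} />
            <UploadButton />
            <JobList jobs={list()} />
          </>
        )}
      </Show>
    </div>
  );
}
```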

---

## Security Considerations

### Authentication
- **Option 1:** Clerk (fastest to implement, $0-25/mo)
- **Option 2:** Custom JWT with email magic links
- **Recommendation:** Clerk for MVP

### Authorization
- Row-level access control enforced in application queries (SQLite/libSQL has no built-in row-level security)
- S3 pre-signed URLs with expiration
- API rate limiting per user

### Data Isolation
- All user-owned S3 keys (`uploads/`, `outputs/`) include the `user_id` as a path segment
- Database queries always filter by `user_id`
- GPU workers validate job ownership
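
The "always filter by `user_id`" rule is easiest to keep when every lookup goes through a helper that requires the owner. A minimal sketch, using Python's standard `sqlite3` module and a hypothetical `get_job_for_user` function: a job that exists but belongs to someone else is indistinguishable from a missing one, so callers can answer 404 in both cases.

```python
import sqlite3


def get_job_for_user(conn: sqlite3.Connection, user_id: str, job_id: str):
    """Return (id, status, progress) only if the job belongs to user_id, else None."""
    return conn.execute(
        "SELECT id, status, progress FROM jobs WHERE id = ? AND user_id = ?",
        (job_id, user_id),
    ).fetchone()
```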

---

## Deployment Architecture

### Development
```bash
# Local setup
npm run dev                  # SolidStart dev server
turso dev                    # Local libSQL (SQLite-compatible) server
minio server ./minio-data    # Local S3-compatible storage (data directory is an example path)
```

### Production (Vercel + Turso)
```
┌─────────────┐     ┌──────────────┐     ┌──────────┐
│   Vercel    │────▶│    Turso     │     │    S3    │
│ (SolidStart)│     │  (Database)  │     │(Storage) │
└─────────────┘     └──────────────┘     └──────────┘
       │
       ▼
┌─────────────┐
│  GPU Fleet  │
│  (Workers)  │
└─────────────┘
```
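
### CI/CD Pipeline
```yaml
# .github/workflows/deploy.yml
name: Deploy
on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test

  deploy:
    needs: test
    runs-on: ubuntu-latest
    env:
      # Assumed repository secrets identifying the Vercel project
      VERCEL_ORG_ID: ${{ secrets.VERCEL_ORG_ID }}
      VERCEL_PROJECT_ID: ${{ secrets.VERCEL_PROJECT_ID }}
    steps:
      - uses: actions/checkout@v4
      # Build and deploy with the Vercel CLI
      - run: npx vercel pull --yes --environment=production --token=${{ secrets.VERCEL_TOKEN }}
      - run: npx vercel build --prod --token=${{ secrets.VERCEL_TOKEN }}
      - run: npx vercel deploy --prebuilt --prod --token=${{ secrets.VERCEL_TOKEN }}
```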

---

## MVP Implementation Plan

### Phase 1: Foundation (Week 1-2)
- [ ] Set up SolidStart project structure
- [ ] Integrate Turso database
- [ ] Implement user auth (Clerk)
- [ ] Create file upload endpoint (S3)
- [ ] Build basic dashboard UI

### Phase 2: Pipeline Integration (Week 2-3)
- [ ] Containerize existing Python pipeline
- [ ] Set up job queue (BullMQ on Redis)
- [ ] Implement job processor service
- [ ] Add progress tracking API
- [ ] Connect GPU workers

### Phase 3: User Experience (Week 3-4)
- [ ] Job history UI with status indicators
- [ ] Audio player for preview/download
- [ ] Usage dashboard + credit system
- [ ] Stripe integration for payments
- [ ] Email notifications on job completion

---

## Cost Analysis

### Infrastructure Costs (Monthly)

| Component | Tier | Cost |
|-----------|------|------|
| Vercel | Pro | $20/mo |
| Turso | Free tier | $0/mo (<1M reads/day) |
| S3 Storage | 1TB | $23/mo |
| GPU (g4dn.xlarge) | 730 hrs/mo | $548/mo |
| Redis (job queue) | Hobby | $9/mo |
| **Total** | | **~$600/mo** |

### Unit Economics

- GPU cost per hour: $0.75
- Average book processing time: 2 hours (30k words)
- Cost per book: ~$1.50 (GPU only)
- Price: $39/mo subscription (unlimited books, subject to fair use)
- **Gross margin: ~96% at one book per subscriber per month (GPU cost only), falling to ~81% at five books per month**
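
Because the subscription is flat while GPU cost scales with usage, the margin depends on how many books a subscriber generates. The short calculation below uses only the figures listed above ($0.75/hr, 2 hours per book, $39/mo, 730 GPU hours per month); the function names are illustrative, and fixed infrastructure costs are left out, as in the list above.

```python
GPU_COST_PER_HOUR = 0.75   # USD
HOURS_PER_BOOK = 2.0       # average processing time per book
PRICE_PER_MONTH = 39.0     # USD, flat subscription


def cost_per_book() -> float:
    """GPU-only cost of generating one book."""
    return GPU_COST_PER_HOUR * HOURS_PER_BOOK


def gross_margin(books_per_month: float) -> float:
    """Per-subscriber margin, counting GPU cost only."""
    return 1 - (books_per_month * cost_per_book()) / PRICE_PER_MONTH


def break_even_books() -> float:
    """Books per month at which GPU cost equals the subscription price."""
    return PRICE_PER_MONTH / cost_per_book()


def books_per_instance(gpu_hours: float = 730) -> int:
    """Monthly capacity of one always-on GPU instance."""
    return int(gpu_hours // HOURS_PER_BOOK)


print(f"cost per book:       ${cost_per_book():.2f}")
print(f"margin at 1 book:    {gross_margin(1):.1%}")
print(f"margin at 5 books:   {gross_margin(5):.1%}")
print(f"break-even books:    {break_even_books():.0f}")
print(f"books per instance:  {books_per_instance()}")
```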

---

## Next Steps

1. **Immediate:** Set up SolidStart + Turso scaffolding
2. **This Week:** Implement auth + file upload
3. **Next Week:** Containerize Python pipeline + job queue
4. **Week 3:** Dashboard UI + Stripe integration

---

## Appendix: Environment Variables

```bash
# Database
TURSO_DATABASE_URL="libsql://frenocorp.turso.io"
TURSO_AUTH_TOKEN="..."

# Storage
AWS_ACCESS_KEY_ID="..."
AWS_SECRET_ACCESS_KEY="..."
AWS_S3_BUCKET="audiobookpipeline-prod"
AWS_REGION="us-east-1"

# Auth
CLERK_SECRET_KEY="..."
# Client-exposed key: SolidStart builds with Vite, which exposes VITE_-prefixed variables
VITE_CLERK_PUBLISHABLE_KEY="..."

# Billing
STRIPE_SECRET_KEY="..."
STRIPE_WEBHOOK_SECRET="..."

# GPU Workers
GPU_WORKER_ENDPOINT="https://workers.audiobookpipeline.com"
GPU_API_KEY="..."
```
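
A missing variable is easier to diagnose at startup than in the middle of a job. The sketch below shows how a Python GPU worker might validate its configuration before accepting work. Which variables the worker actually needs is an assumption here, and `load_settings` is a hypothetical helper.

```python
import os

# Variables a worker is assumed to need: database, storage, and its own API key
REQUIRED_VARS = [
    "TURSO_DATABASE_URL",
    "TURSO_AUTH_TOKEN",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_BUCKET",
    "AWS_REGION",
    "GPU_API_KEY",
]


def load_settings(env=None) -> dict:
    """Return the required settings, or raise listing every missing variable."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` once at process start reports all missing names in a single error instead of failing on the first one.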