current org

2026-03-09 09:21:48 -04:00
commit 22e4864b8e
82 changed files with 4587 additions and 0 deletions

technical-architecture.md Normal file

@@ -0,0 +1,462 @@
# Technical Architecture: AudiobookPipeline Web Platform
## Executive Summary
This document outlines the technical architecture for transforming the AudiobookPipeline CLI tool into a full-featured SaaS platform with web interface, user management, and cloud infrastructure.
**Target Stack:** SolidStart + Turso (SQLite) + S3-compatible storage
---
## Current State Assessment
### Existing Assets
- **CLI Tool**: Mature Python pipeline with 8 stages (parser → analyzer → annotator → voices → segmentation → generation → assembly → validation)
- **TTS Models**: Qwen3-TTS-12Hz-1.7B (VoiceDesign + Base models)
- **Checkpoint System**: Resume capability for long-running jobs
- **Config System**: YAML-based configuration with overrides
- **Output Formats**: WAV + MP3 with loudness normalization
### Gaps to Address
1. No user authentication or multi-tenancy
2. No job queue or async processing
3. No API layer for web clients
4. No usage tracking or billing integration
5. CLI-only UX (no dashboard, history, or file management)
---
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────┐
│                        Client Layer                         │
│  ┌─────────────┐  ┌───────────┐  ┌───────────────────────┐  │
│  │   Web App   │  │    CLI    │  │   REST API (public)   │  │
│  │ (SolidStart)│  │ (enhanced)│  │ /api/jobs, /api/files │  │
│  └─────────────┘  └───────────┘  └───────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                      API Gateway Layer                      │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ SolidStart API Routes                                 │  │
│  │  - Auth middleware (Clerk or custom JWT)              │  │
│  │  - Rate limiting + quota enforcement                  │  │
│  │  - Request validation (Zod)                           │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                        Service Layer                        │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐  │
│  │   Job    │   │   File   │   │   User   │   │ Billing  │  │
│  │ Service  │   │ Service  │   │ Service  │   │ Service  │  │
│  └──────────┘   └──────────┘   └──────────┘   └──────────┘  │
└─────────────────────────────────────────────────────────────┘
         ┌─────────────────────┼────────────────────┐
         ▼                     ▼                    ▼
┌─────────────────┐    ┌───────────────┐    ┌───────────────┐
│      Turso      │    │      S3       │    │      GPU      │
│    (SQLite)     │    │   (Storage)   │    │    Workers    │
│                 │    │               │    │  (TTS Jobs)   │
│ - Users         │    │ - Uploads     │    │               │
│ - Jobs          │    │ - Outputs     │    │ - Qwen3-TTS   │
│ - Usage         │    │ - Models      │    │ - Assembly    │
│ - Subscriptions │    │               │    │               │
└─────────────────┘    └───────────────┘    └───────────────┘
```
---
## Technology Decisions
### Frontend: SolidStart
**Why SolidStart?**
- Lightweight, high-performance meta-framework built on SolidJS (fine-grained reactivity, no virtual DOM)
- Server-side rendering + static generation out of the box
- Built-in API routes (reduces need for separate backend)
- Excellent TypeScript support
- Typically smaller client bundles than an equivalent Next.js app
**Key Packages:**
```json
{
  "@solidjs/start": "^1.0.0",
  "solid-js": "^1.8.0",
  "@solidjs/router": "^0.14.0",
  "zod": "^3.22.0"
}
```
### Database: Turso (SQLite)
**Why Turso?**
- Serverless SQLite with libSQL
- Edge-compatible (runs anywhere)
- Built-in replication and failover
- Free tier: 1GB storage, 1M reads/day (figures assumed for planning; verify against current Turso pricing)
- Adequate for an early-stage SaaS with fewer than ~10k users
**Schema Design:**
```sql
-- Users and auth
CREATE TABLE users (
    id TEXT PRIMARY KEY,
    email TEXT UNIQUE NOT NULL,
    stripe_customer_id TEXT,
    subscription_status TEXT DEFAULT 'free',
    credits INTEGER DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Processing jobs
CREATE TABLE jobs (
    id TEXT PRIMARY KEY,
    user_id TEXT REFERENCES users(id),
    status TEXT DEFAULT 'pending', -- pending, processing, completed, failed
    input_file_id TEXT,
    output_file_id TEXT,
    config TEXT, -- JSON-encoded pipeline config passed at job creation
    progress INTEGER DEFAULT 0, -- percent complete, 0-100
    error_message TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    completed_at TIMESTAMP
);

-- File metadata (not the files themselves)
CREATE TABLE files (
    id TEXT PRIMARY KEY,
    user_id TEXT REFERENCES users(id),
    filename TEXT NOT NULL,
    s3_key TEXT UNIQUE NOT NULL,
    file_size INTEGER,
    mime_type TEXT,
    purpose TEXT, -- input, output, model
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Usage tracking for billing
CREATE TABLE usage_events (
    id TEXT PRIMARY KEY,
    user_id TEXT REFERENCES users(id),
    job_id TEXT REFERENCES jobs(id),
    minutes_generated REAL,
    cost_cents INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
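As a rough illustration of how the `usage_events` table could feed billing, the sketch below records events and totals one user's minutes and cost. It uses Python's built-in `sqlite3` module with an in-memory database standing in for Turso; the helper names `record_usage` and `usage_summary` are hypothetical and not part of the existing pipeline.

```python
import sqlite3
import uuid

# Same columns as the usage_events table above; foreign keys are omitted
# so the sketch runs without the users and jobs tables.
SCHEMA = """
CREATE TABLE usage_events (
    id TEXT PRIMARY KEY,
    user_id TEXT,
    job_id TEXT,
    minutes_generated REAL,
    cost_cents INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""


def record_usage(conn, user_id, job_id, minutes, cost_cents):
    """Insert one usage event and return its generated id (hypothetical helper)."""
    event_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO usage_events (id, user_id, job_id, minutes_generated, cost_cents) "
        "VALUES (?, ?, ?, ?, ?)",
        (event_id, user_id, job_id, minutes, cost_cents),
    )
    conn.commit()
    return event_id


def usage_summary(conn, user_id):
    """Return (total_minutes, total_cost_cents) for one user (hypothetical helper)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(minutes_generated), 0), COALESCE(SUM(cost_cents), 0) "
        "FROM usage_events WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return float(row[0]), int(row[1])


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_usage(conn, "user_1", "job_1", 90.5, 150)
record_usage(conn, "user_1", "job_2", 30.0, 50)
record_usage(conn, "user_2", "job_3", 10.0, 20)
print(usage_summary(conn, "user_1"))
```

Filtering by `user_id` in the aggregate keeps the summary scoped to a single tenant, which matches the isolation rules described later in this document.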
### Storage: S3-Compatible
**Why S3?**
- Industry standard for file storage
- Cheap (~$0.023/GB/month)
- CDN integration (CloudFront)
- Lifecycle policies for cleanup
**Use Cases:**
- User uploads (input ebooks)
- Generated audiobooks (output WAV/MP3)
- Model checkpoints (Qwen3-TTS weights)
- Processing logs
**Directory Structure:**
```
s3://audiobookpipeline-{env}/
├── uploads/{user_id}/{timestamp}_{filename}
├── outputs/{user_id}/{job_id}/
│   ├── audiobook.wav
│   ├── audiobook.mp3
│   └── metadata.json
├── models/
│   ├── qwen3-tts-voicedesign/
│   └── qwen3-tts-base/
└── logs/{date}/{job_id}.log
```
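The layout above could be produced by a few small key-building helpers. The following is a minimal sketch, assuming filenames are sanitized before being embedded in a key; the function names (`upload_key`, `output_key`, `log_key`) and the sanitization rule are illustrative assumptions, not existing code.

```python
import re
import time


def _safe_name(filename: str) -> str:
    """Keep only the base name and replace characters outside a conservative set."""
    base = filename.replace("\\", "/").rsplit("/", 1)[-1]
    return re.sub(r"[^A-Za-z0-9._-]", "_", base) or "upload"


def upload_key(user_id: str, filename: str, timestamp=None) -> str:
    """Key for a user upload: uploads/{user_id}/{timestamp}_{filename}."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"uploads/{user_id}/{ts}_{_safe_name(filename)}"


def output_key(user_id: str, job_id: str, name: str) -> str:
    """Key for a generated artifact: outputs/{user_id}/{job_id}/{name}."""
    return f"outputs/{user_id}/{job_id}/{name}"


def log_key(date: str, job_id: str) -> str:
    """Key for a processing log: logs/{date}/{job_id}.log."""
    return f"logs/{date}/{job_id}.log"


print(upload_key("u1", "../../My Book.epub", 1700000000))
print(output_key("u1", "j1", "audiobook.mp3"))
```

Stripping directory components from the uploaded filename prevents a client-supplied name from escaping the user's prefix.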
### GPU Workers: Serverless or Containerized
**Option A: Auto-scaled GPU containers on AWS (e.g., EKS GPU node groups; AWS Lambda itself does not offer GPUs)**
- Pros: Auto-scaling, pay-per-use
- Cons: Complex setup, cold starts
**Option B: RunPod / Lambda Labs**
- Pros: GPU-optimized, simple API
- Cons: Vendor lock-in
**Option C: Self-hosted on EC2 g4dn.xlarge**
- Pros: Full control, predictable pricing (~$0.75/hr assumed here; check current on-demand rates for your region)
- Cons: Manual scaling, always-on cost
**Recommendation:** Start with **Option C** (1-2 GPU instances) + job queue. Scale to serverless later.
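With one or two always-on workers, each worker needs a way to claim a job without a second worker picking up the same one. The sketch below shows that idea with a database-polling claim against the `jobs` table, using Python's built-in `sqlite3` as a stand-in for Turso. This is a simplification for illustration, not the Redis-backed queue planned for Phase 2, and `claim_next_job` is a hypothetical helper.

```python
import sqlite3


def claim_next_job(conn):
    """Move the oldest pending job to 'processing' and return its id, or None."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT id FROM jobs WHERE status = 'pending' "
            "ORDER BY created_at, id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status guard makes the claim safe if another worker got there first.
        updated = conn.execute(
            "UPDATE jobs SET status = 'processing' WHERE id = ? AND status = 'pending'",
            (row[0],),
        )
        return row[0] if updated.rowcount == 1 else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT DEFAULT 'pending', created_at TEXT)"
)
conn.executemany(
    "INSERT INTO jobs (id, created_at) VALUES (?, ?)",
    [("job_a", "2026-03-09 09:00:00"), ("job_b", "2026-03-09 09:05:00")],
)
conn.commit()

first = claim_next_job(conn)
second = claim_next_job(conn)
third = claim_next_job(conn)
print(first, second, third)
```

A worker would call `claim_next_job` in a loop, sleeping briefly when it returns `None`, and hand each claimed id to the job processor.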
---
## Core Components
### 1. Job Processing Pipeline
```python
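# services/job_processor.py
from typing import Any, Awaitable, Callable

# A stage receives the input path and the job config. The stage methods and
# update_progress wrap the existing CLI pipeline and are omitted here.
Stage = Callable[[str, Any], Awaitable[None]]


class JobProcessor:
    """Processes audiobook generation jobs."""

    def __init__(self, db, file_service) -> None:
        # Injected dependencies; their interfaces are assumed, not defined in this document.
        self.db = db
        self.file_service = file_service

    async def process_job(self, job_id: str) -> None:
        job = await self.db.get_job(job_id)
        try:
            # Download input file from S3
            input_path = await self.file_service.download(job.input_file_id)

            # Run pipeline stages with progress updates
            stages: list[tuple[str, Stage]] = [
                ("parsing", self.parse_ebook),
                ("analyzing", self.analyze_book),
                ("segmenting", self.segment_text),
                ("generating", self.generate_audio),
                ("assembling", self.assemble_audiobook),
            ]
            for stage_name, stage_func in stages:
                await self.update_progress(job_id, stage_name)
                await stage_func(input_path, job.config)

            # Upload output to S3
            output_file_id = await self.file_service.upload(
                job_id=job_id,
                files=["output.wav", "output.mp3"],
            )
            await self.db.complete_job(job_id, output_file_id)
        except Exception as e:
            # Record the failure, then re-raise so the caller can apply its retry policy
            await self.db.fail_job(job_id, str(e))
            raise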
```
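The `jobs.progress` column is an integer, while the processor reports progress by stage name. One way to bridge the two is to map each stage to a percentage. The sketch below assumes every stage carries equal weight, which is a simplification (audio generation will likely dominate wall-clock time); `progress_percent` is a hypothetical helper.

```python
# Stage names in the order the job processor runs them.
STAGES = ["parsing", "analyzing", "segmenting", "generating", "assembling"]


def progress_percent(stage_name: str, stage_fraction: float = 0.0) -> int:
    """Percent complete when `stage_name` is `stage_fraction` (0..1) done.

    Assumes equal weight per stage; raises ValueError for unknown stages.
    """
    index = STAGES.index(stage_name)
    fraction = min(max(stage_fraction, 0.0), 1.0)
    return int((index + fraction) * 100 / len(STAGES))


for name in STAGES:
    print(name, progress_percent(name))
```

If per-stage timings become available from the CLI's checkpoint logs, the equal weights could be replaced with measured ones without changing the call sites.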
### 2. API Routes (SolidStart)
```typescript
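// app/routes/api/jobs.ts
import type { APIEvent } from "@solidjs/start/server";
import { z } from "zod";
// Project-local helpers; these module paths and exports are assumed.
import { requireAuth } from "~/lib/auth";
import { db } from "~/lib/db";
import { jobQueue } from "~/lib/queue";

const createJobSchema = z.object({
  fileId: z.string(),
  config: z
    .object({
      voices: z.object({
        narrator: z.string().optional(),
      }),
    })
    .optional(),
});

export async function POST(event: APIEvent) {
  const user = await requireAuth(event);

  // Validate the request body; reject malformed input with 400
  const parsed = createJobSchema.safeParse(await event.request.json());
  if (!parsed.success) {
    return Response.json({ error: parsed.error.flatten() }, { status: 400 });
  }
  const { fileId, config } = parsed.data;

  // Check quota
  const credits = await db.getUserCredits(user.id);
  if (credits < 1) {
    return Response.json({ error: "Insufficient credits" }, { status: 402 });
  }

  // Create job
  const job = await db.createJob({
    userId: user.id,
    inputFileId: fileId,
    config,
  });

  // Queue for processing
  await jobQueue.add("process-audiobook", { jobId: job.id });

  return Response.json({ job }, { status: 201 });
}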
```
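The quota check in the route reads the credit balance and then creates the job as separate steps, so two concurrent requests could both pass the check with a single credit. One way to close that gap is a conditional `UPDATE` that deducts the credit only if the balance allows it. The sketch below shows the SQL pattern using Python's built-in `sqlite3` as a stand-in for Turso; `spend_credit` is a hypothetical helper, and the same statement could be issued from the TypeScript route.

```python
import sqlite3


def spend_credit(conn, user_id, amount=1):
    """Deduct `amount` credits if the balance allows it; return True on success."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE users SET credits = credits - ? WHERE id = ? AND credits >= ?",
            (amount, user_id, amount),
        )
    # rowcount is 1 only when the WHERE clause matched, i.e. the balance was sufficient
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, credits INTEGER DEFAULT 0)")
conn.execute("INSERT INTO users (id, credits) VALUES ('u1', 1)")
conn.commit()

first_attempt = spend_credit(conn, "u1")
second_attempt = spend_credit(conn, "u1")
print(first_attempt, second_attempt)
```

Because the check and the deduction happen in one statement, the database decides which request wins, and the balance cannot go negative.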
### 3. Dashboard UI
```tsx
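// app/routes/dashboard.tsx
import { createResource, Show } from "solid-js";
// Project-local modules; these paths and exports are assumed.
import { useUser } from "~/lib/auth";
import { JobList, StatsCard, UploadButton } from "~/components";

type Job = { id: string; status: string; progress: number };

// The server derives the user from the session, so no userId is sent by the client.
// The response is assumed to be a JSON array of jobs.
async function fetchJobs(): Promise<Job[]> {
  const res = await fetch("/api/jobs");
  if (!res.ok) throw new Error(`Failed to load jobs: ${res.status}`);
  return res.json();
}

export default function Dashboard() {
  const user = useUser();
  const [jobs] = createResource(fetchJobs);
  return (
    <div class="dashboard">
      <h1>Audiobook Pipeline</h1>
      <Show when={jobs()} fallback={<p>Loading jobs...</p>}>
        {(list) => (
          <>
            <StatsCard credits={user.credits} booksGenerated={list().length} />
            <UploadButton />
            <JobList jobs={list()} />
          </>
        )}
      </Show>
    </div>
  );
}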
```
---
## Security Considerations
### Authentication
- **Option 1:** Clerk (fastest to implement, $0-25/mo)
- **Option 2:** Custom JWT with email magic links
- **Recommendation:** Clerk for MVP
### Authorization
- Row-level access control enforced in application queries (SQLite/libSQL has no built-in row-level security)
- S3 pre-signed URLs with expiration
- API rate limiting per user
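Per-user rate limiting is commonly implemented with a token bucket. The sketch below is a minimal in-process version, assuming a burst capacity and a steady refill rate per user; the class names and default limits are illustrative assumptions. A deployment with several API instances would need the bucket state in a shared store such as Redis.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, now=None):
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Consume one token if available; return whether the request may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerUserRateLimiter:
    """Keeps one bucket per user id (in-process only)."""

    def __init__(self, capacity=5, refill_per_second=1.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._buckets = {}

    def allow(self, user_id, now=None):
        bucket = self._buckets.get(user_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_second, now)
            self._buckets[user_id] = bucket
        return bucket.allow(now)


limiter = PerUserRateLimiter(capacity=2, refill_per_second=1.0)
# Explicit timestamps keep the example deterministic.
results = [limiter.allow("u1", now=0.0) for _ in range(3)]
print(results)
```

A request that is refused would typically receive an HTTP 429 response from the API layer.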
### Data Isolation
- All user-owned S3 keys (`uploads/`, `outputs/`) include a `user_id` prefix
- Database queries always filter by `user_id`
- GPU workers validate job ownership
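The key-prefix rule can be checked before issuing a pre-signed URL or handing a key to a worker. The sketch below is one possible check, assuming the `uploads/{user_id}/...` and `outputs/{user_id}/...` layout described earlier; `owns_key` is a hypothetical helper.

```python
def owns_key(user_id: str, s3_key: str) -> bool:
    """True if `s3_key` sits under the user's uploads/ or outputs/ prefix."""
    parts = s3_key.split("/")
    # Reject traversal segments and empty segments (e.g. double or trailing slashes).
    if ".." in parts or "" in parts:
        return False
    return (
        len(parts) >= 3
        and parts[0] in ("uploads", "outputs")
        and parts[1] == user_id
    )


print(owns_key("u1", "uploads/u1/1700000000_book.epub"))
print(owns_key("u1", "outputs/u2/j1/audiobook.mp3"))
```

Comparing whole path segments instead of using a string prefix match avoids treating `u1` as the owner of keys that belong to `u10`.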
---
## Deployment Architecture
### Development
```bash
# Local setup (run each in its own terminal)
npm run dev          # SolidStart dev server
turso dev            # Local libSQL (SQLite-compatible) server
minio server ./data  # Local S3-compatible storage
```
### Production (Vercel + Turso)
```
┌─────────────┐     ┌──────────────┐     ┌──────────┐
│   Vercel    │────▶│    Turso     │     │    S3    │
│ (SolidStart)│     │  (Database)  │     │(Storage) │
└─────────────┘     └──────────────┘     └──────────┘
┌─────────────┐
│  GPU Fleet  │
│  (Workers)  │
└─────────────┘
```
### CI/CD Pipeline
```yaml
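# .github/workflows/deploy.yml
name: Deploy
on:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test
  deploy:
    needs: test
    runs-on: ubuntu-latest
    env:
      # Project identifiers read by the Vercel CLI
      VERCEL_ORG_ID: ${{ secrets.VERCEL_ORG_ID }}
      VERCEL_PROJECT_ID: ${{ secrets.VERCEL_PROJECT_ID }}
    steps:
      - uses: actions/checkout@v4
      # Deploy with the Vercel CLI
      - run: npx vercel pull --yes --environment=production --token=${{ secrets.VERCEL_TOKEN }}
      - run: npx vercel build --prod --token=${{ secrets.VERCEL_TOKEN }}
      - run: npx vercel deploy --prebuilt --prod --token=${{ secrets.VERCEL_TOKEN }}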
```
---
## MVP Implementation Plan
### Phase 1: Foundation (Week 1-2)
- [ ] Set up SolidStart project structure
- [ ] Integrate Turso database
- [ ] Implement user auth (Clerk)
- [ ] Create file upload endpoint (S3)
- [ ] Build basic dashboard UI
### Phase 2: Pipeline Integration (Week 2-3)
- [ ] Containerize existing Python pipeline
- [ ] Set up job queue (BullMQ on Redis)
- [ ] Implement job processor service
- [ ] Add progress tracking API
- [ ] Connect GPU workers
### Phase 3: User Experience (Week 3-4)
- [ ] Job history UI with status indicators
- [ ] Audio player for preview/download
- [ ] Usage dashboard + credit system
- [ ] Stripe integration for payments
- [ ] Email notifications on job completion
---
## Cost Analysis
### Infrastructure Costs (Monthly)
| Component | Tier | Cost |
|-----------|------|------|
| Vercel | Pro | $20/mo |
| Turso | Free tier | $0/mo (<1M reads/day) |
| S3 Storage | 1TB | $23/mo |
| GPU (g4dn.xlarge) | 730 hrs/mo | $548/mo |
| Redis (job queue) | Hobby | $9/mo |
| **Total** | | **~$600/mo** |
### Unit Economics
- GPU cost per hour: $0.75
- Average book processing time: 2 hours (30k words)
- Cost per book: ~$1.50 (GPU only)
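- Price: $39/mo subscription (unlimited, subject to fair use)
- **Gross margin: ~96% at one book per subscriber per month** (GPU cost only: ($39 - $1.50) / $39); the margin falls as usage rises, e.g. ~62% at ten books per month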
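The arithmetic behind these figures can be checked with a short calculation. The sketch below uses only the numbers stated in this section ($0.75/hr GPU, 2 hours per book, $39/mo subscription, 730 hours per month) and counts GPU cost only; the function names are illustrative.

```python
GPU_COST_PER_HOUR = 0.75    # USD, from the figures above
HOURS_PER_BOOK = 2.0        # average processing time per book
SUBSCRIPTION_PRICE = 39.0   # USD per subscriber per month
HOURS_PER_MONTH = 730       # always-on instance


def gpu_cost_per_book() -> float:
    """GPU cost attributable to one book."""
    return GPU_COST_PER_HOUR * HOURS_PER_BOOK


def gross_margin(books_per_month: float) -> float:
    """Gross margin per subscriber, counting GPU cost only."""
    cost = gpu_cost_per_book() * books_per_month
    return (SUBSCRIPTION_PRICE - cost) / SUBSCRIPTION_PRICE


def books_per_gpu_month() -> int:
    """Upper bound on books one always-on GPU can process per month."""
    return int(HOURS_PER_MONTH // HOURS_PER_BOOK)


for books in (1, 5, 10, 26):
    print(books, round(gross_margin(books), 3))
print("capacity per GPU:", books_per_gpu_month())
```

Under these assumptions a subscriber who generates 26 books in a month consumes the full subscription price in GPU time, which gives a concrete starting point for a fair-use threshold.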
---
## Next Steps
1. **Immediate:** Set up SolidStart + Turso scaffolding
2. **This Week:** Implement auth + file upload
3. **Next Week:** Containerize Python pipeline + job queue
4. **Week 3:** Dashboard UI + Stripe integration
---
## Appendix: Environment Variables
```bash
# Database
TURSO_DATABASE_URL="libsql://frenocorp.turso.io"
TURSO_AUTH_TOKEN="..."
# Storage
AWS_ACCESS_KEY_ID="..."
AWS_SECRET_ACCESS_KEY="..."
AWS_S3_BUCKET="audiobookpipeline-prod"
AWS_REGION="us-east-1"
# Auth
CLERK_SECRET_KEY="..."
VITE_CLERK_PUBLISHABLE_KEY="..."
# Billing
STRIPE_SECRET_KEY="..."
STRIPE_WEBHOOK_SECRET="..."
# GPU Workers
GPU_WORKER_ENDPOINT="https://workers.audiobookpipeline.com"
GPU_API_KEY="..."
```
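On the worker side, it helps to fail fast when a variable is missing instead of discovering it mid-job. The sketch below is a minimal loader, assuming the GPU worker needs the database, storage, and worker-key variables from the list above; which variables are actually required is an assumption, and `load_config` is a hypothetical helper.

```python
import os

# Variables the worker is assumed to need; adjust to the real deployment.
REQUIRED_VARS = [
    "TURSO_DATABASE_URL",
    "TURSO_AUTH_TOKEN",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_BUCKET",
    "AWS_REGION",
    "GPU_API_KEY",
]


def load_config(env=None):
    """Return the required settings as a dict, or raise if any are missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_VARS}


# Example with an explicit mapping instead of the real process environment.
example_env = {name: "placeholder" for name in REQUIRED_VARS}
example_env["AWS_REGION"] = "us-east-1"
config = load_config(example_env)
print(config["AWS_REGION"])
```

Passing the environment as a parameter keeps the loader testable without modifying the real process environment.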