# Technical Architecture: AudiobookPipeline Web Platform

## Executive Summary

This document outlines the technical architecture for transforming the AudiobookPipeline CLI tool into a full-featured SaaS platform with a web interface, user management, and cloud infrastructure.

**Target Stack:** SolidStart + Turso (SQLite) + S3-compatible storage

---

## Current State Assessment

### Existing Assets

- **CLI Tool**: Mature Python pipeline with 8 stages (parser → analyzer → annotator → voices → segmentation → generation → assembly → validation)
- **TTS Models**: Qwen3-TTS-12Hz-1.7B (VoiceDesign + Base models)
- **Checkpoint System**: Resume capability for long-running jobs
- **Config System**: YAML-based configuration with overrides
- **Output Formats**: WAV + MP3 with loudness normalization

### Gaps to Address

1. No user authentication or multi-tenancy
2. No job queue or async processing
3. No API layer for web clients
4. No usage tracking or billing integration
5. CLI-only UX (no dashboard, history, or file management)

---

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│                        Client Layer                         │
│  ┌────────────┐  ┌───────────┐  ┌─────────────────────────┐ │
│  │    Web     │  │    CLI    │  │   REST API (public)     │ │
│  │    App     │  │ (enhanced)│  │                         │ │
│  │(SolidStart)│  │           │  │  /api/jobs, /api/files  │ │
│  └────────────┘  └───────────┘  └─────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                      API Gateway Layer                      │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  SolidStart API Routes                               │   │
│  │  - Auth middleware (Clerk or custom JWT)             │   │
│  │  - Rate limiting + quota enforcement                 │   │
│  │  - Request validation (Zod)                          │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                        Service Layer                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐  │
│  │   Job    │  │   File   │  │   User   │  │  Billing   │  │
│  │ Service  │  │ Service  │  │ Service  │  │  Service   │  │
│  └──────────┘  └──────────┘  └──────────┘  └────────────┘  │
└─────────────────────────────────────────────────────────────┘
                              │
               ┌──────────────┼──────────────┐
               ▼              ▼              ▼
     ┌─────────────────┐  ┌──────────────┐  ┌──────────────┐
     │      Turso      │  │      S3      │  │     GPU      │
     │    (SQLite)     │  │  (Storage)   │  │   Workers    │
     │                 │  │              │  │  (TTS Jobs)  │
     │ - Users         │  │ - Uploads    │  │              │
     │ - Jobs          │  │ - Outputs    │  │ - Qwen3-TTS  │
     │ - Usage         │  │ - Models     │  │ - Assembly   │
     │ - Subscriptions │  │              │  │              │
     └─────────────────┘  └──────────────┘  └──────────────┘
```

---

## Technology Decisions

### Frontend: SolidStart

**Why SolidStart?**

- Built on Solid, a lightweight, high-performance alternative to React
- Server-side rendering + static generation out of the box
- Built-in API routes (reduces the need for a separate backend)
- Excellent TypeScript support
- Smaller bundle sizes than Next.js

**Key Packages:**

```json
{
  "@solidjs/start": "^1.0.0",
  "solid-js": "^1.8.0",
  "@solidjs/router": "^0.14.0",
  "zod": "^3.22.0"
}
```

### Database: Turso (SQLite)

**Why Turso?**

- Serverless SQLite built on libSQL
- Edge-compatible (runs anywhere)
- Built-in replication and failover
- Free tier: 1GB storage, 1M reads/day
- A good fit for a SaaS with <10k users

**Schema Design:**

```sql
-- Users and auth
CREATE TABLE users (
  id TEXT PRIMARY KEY,
  email TEXT UNIQUE NOT NULL,
  stripe_customer_id TEXT,
  subscription_status TEXT DEFAULT 'free',
  credits INTEGER DEFAULT 0,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Processing jobs
CREATE TABLE jobs (
  id TEXT PRIMARY KEY,
  user_id TEXT REFERENCES users(id),
  status TEXT DEFAULT 'pending',  -- pending, processing, completed, failed
  input_file_id TEXT,
  output_file_id TEXT,
  progress INTEGER DEFAULT 0,
  error_message TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  completed_at TIMESTAMP
);

-- File metadata (not the files themselves)
CREATE TABLE files (
  id TEXT PRIMARY KEY,
  user_id TEXT REFERENCES users(id),
  filename TEXT NOT
NULL,
  s3_key TEXT UNIQUE NOT NULL,
  file_size INTEGER,
  mime_type TEXT,
  purpose TEXT,  -- input, output, model
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Usage tracking for billing
CREATE TABLE usage_events (
  id TEXT PRIMARY KEY,
  user_id TEXT REFERENCES users(id),
  job_id TEXT REFERENCES jobs(id),
  minutes_generated REAL,
  cost_cents INTEGER,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

### Storage: S3-Compatible

**Why S3?**

- Industry standard for file storage
- Cheap (~$0.023/GB/month)
- CDN integration (CloudFront)
- Lifecycle policies for cleanup

**Use Cases:**

- User uploads (input ebooks)
- Generated audiobooks (output WAV/MP3)
- Model checkpoints (Qwen3-TTS weights)
- Processing logs

**Directory Structure:**

```
s3://audiobookpipeline-{env}/
├── uploads/{user_id}/{timestamp}_{filename}
├── outputs/{user_id}/{job_id}/
│   ├── audiobook.wav
│   ├── audiobook.mp3
│   └── metadata.json
├── models/
│   ├── qwen3-tts-voicedesign/
│   └── qwen3-tts-base/
└── logs/{date}/{job_id}.log
```

### GPU Workers: Serverless or Containerized

**Option A: AWS EKS with GPU node groups** (Lambda itself does not offer GPUs)

- Pros: Auto-scaling, pay-per-use
- Cons: Complex setup, cold starts

**Option B: RunPod / Lambda Labs**

- Pros: GPU-optimized, simple API
- Cons: Vendor lock-in

**Option C: Self-hosted on EC2 g4dn.xlarge**

- Pros: Full control, predictable pricing (~$0.75/hr)
- Cons: Manual scaling, always-on cost

**Recommendation:** Start with **Option C** (1-2 GPU instances) + a job queue. Scale to serverless later.

---

## Core Components

### 1. Job Processing Pipeline

```python
# services/job_processor.py
class JobProcessor:
    """Processes audiobook generation jobs."""

    async def process_job(self, job_id: str) -> None:
        job = await self.db.get_job(job_id)
        try:
            # Download the input file from S3
            input_path = await self.file_service.download(job.input_file_id)

            # Run pipeline stages with progress updates
            stages = [
                ("parsing", self.parse_ebook),
                ("analyzing", self.analyze_book),
                ("segmenting", self.segment_text),
                ("generating", self.generate_audio),
                ("assembling", self.assemble_audiobook),
            ]
            for stage_name, stage_func in stages:
                await self.update_progress(job_id, stage_name)
                await stage_func(input_path, job.config)

            # Upload outputs to S3
            output_file_id = await self.file_service.upload(
                job_id=job_id,
                files=["output.wav", "output.mp3"],
            )
            await self.db.complete_job(job_id, output_file_id)
        except Exception as e:
            await self.db.fail_job(job_id, str(e))
            raise
```

### 2. API Routes (SolidStart)

```typescript
// app/routes/api/jobs.ts
import { z } from "zod";
import type { APIEvent } from "@solidjs/start/server";

export async function POST(event: APIEvent) {
  const user = await requireAuth(event);
  const body = await event.request.json();

  const schema = z.object({
    fileId: z.string(),
    config: z
      .object({
        voices: z.object({
          narrator: z.string().optional(),
        }),
      })
      .optional(),
  });
  const { fileId, config } = schema.parse(body);

  // Check quota
  const credits = await db.getUserCredits(user.id);
  if (credits < 1) {
    return Response.json({ error: "Insufficient credits" }, { status: 402 });
  }

  // Create job
  const job = await db.createJob({
    userId: user.id,
    inputFileId: fileId,
    config,
  });

  // Queue for processing
  await jobQueue.add("process-audiobook", { jobId: job.id });

  return Response.json({ job });
}
```

### 3. Dashboard UI

```tsx
// app/routes/dashboard.tsx
import { createResource } from "solid-js";

export default function Dashboard() {
  const user = useUser();
  const [jobs] = createResource(() =>
    fetch(`/api/jobs?userId=${user.id}`).then((res) => res.json())
  );

  return (
    <main>
      <h1>Audiobook Pipeline</h1>
      {/* Job list rendering from jobs() omitted in this sketch */}
    </main>
  );
}
```

---

## Security Considerations

### Authentication

- **Option 1:** Clerk (fastest to implement, $0-25/mo)
- **Option 2:** Custom JWT with email magic links
- **Recommendation:** Clerk for the MVP

### Authorization

- Per-user row filtering in every Turso query
- S3 pre-signed URLs with expiration
- API rate limiting per user

### Data Isolation

- All S3 keys include a `user_id` prefix
- Database queries always filter by `user_id`
- GPU workers validate job ownership

---

## Deployment Architecture

### Development

```bash
# Local setup
npm run dev          # SolidStart dev server
turso dev            # Local SQLite
minio server ./data  # Local S3-compatible storage
```

### Production (Vercel + Turso)

```
┌─────────────┐     ┌──────────────┐     ┌──────────┐
│   Vercel    │────▶│    Turso     │     │    S3    │
│ (SolidStart)│     │  (Database)  │     │(Storage) │
└─────────────┘     └──────────────┘     └──────────┘
       │
       ▼
┌─────────────┐
│  GPU Fleet  │
│  (Workers)  │
└─────────────┘
```

### CI/CD Pipeline

```yaml
# .github/workflows/deploy.yml
name: Deploy
on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test

  deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: vercel/actions@v2
        with:
          token: ${{ secrets.VERCEL_TOKEN }}
```

---

## MVP Implementation Plan

### Phase 1: Foundation (Weeks 1-2)

- [ ] Set up SolidStart project structure
- [ ] Integrate Turso database
- [ ] Implement user auth (Clerk)
- [ ] Create file upload endpoint (S3)
- [ ] Build basic dashboard UI

### Phase 2: Pipeline Integration (Weeks 2-3)

- [ ] Containerize the existing Python pipeline
- [ ] Set up a job queue (BullMQ on Redis)
- [ ] Implement the job processor service
- [ ] Add progress tracking API
- [ ] Connect GPU workers

### Phase 3: User Experience (Weeks 3-4)

- [ ] Job history UI with status indicators
- [ ] Audio player for preview/download
- [ ] Usage dashboard + credit system
- [ ] Stripe integration for payments
- [ ] Email notifications on job completion

---

## Cost Analysis

### Infrastructure Costs (Monthly)
| Component | Tier | Cost |
|-----------|------|------|
| Vercel | Pro | $20/mo |
| Turso | Free tier | $0/mo (<1M reads/day) |
| S3 Storage | 1TB | $23/mo |
| GPU (g4dn.xlarge) | 730 hrs/mo | $548/mo |
| Redis (job queue) | Hobby | $9/mo |
| **Total** | | **~$600/mo** |

### Unit Economics

- GPU cost per hour: $0.75
- Average book processing time: 2 hours (~30k words)
- Cost per book: ~$1.50 (GPU only)
- Pricing: $39/mo subscription (unlimited, subject to fair use)
- **Gross margin: >95%** ($1.50 in GPU cost against $39 in revenue, assuming roughly one book per subscriber per month)

---

## Next Steps

1. **Immediate:** Set up SolidStart + Turso scaffolding
2. **This Week:** Implement auth + file upload
3. **Next Week:** Containerize the Python pipeline + job queue
4. **Week 3:** Dashboard UI + Stripe integration

---

## Appendix: Environment Variables

```bash
# Database
TURSO_DATABASE_URL="libsql://frenocorp.turso.io"
TURSO_AUTH_TOKEN="..."

# Storage
AWS_ACCESS_KEY_ID="..."
AWS_SECRET_ACCESS_KEY="..."
AWS_S3_BUCKET="audiobookpipeline-prod"
AWS_REGION="us-east-1"

# Auth
CLERK_SECRET_KEY="..."
VITE_CLERK_PUBLISHABLE_KEY="..."

# Billing
STRIPE_SECRET_KEY="..."
STRIPE_WEBHOOK_SECRET="..."

# GPU Workers
GPU_WORKER_ENDPOINT="https://workers.audiobookpipeline.com"
GPU_API_KEY="..."
```
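## Appendix: Key Layout Sketch

The data-isolation rules above (every S3 key carries a `user_id` prefix, and workers validate ownership before touching a key) can be sketched as a small helper. This is a sketch under assumptions: the function names (`buildUploadKey`, `buildOutputKey`, `ownsKey`) and the exact timestamp format are illustrative, not part of the existing codebase; only the key layout itself comes from the Storage section.

```typescript
// storage/keys.ts
// Illustrative helpers mirroring the S3 directory structure:
//   uploads/{user_id}/{timestamp}_{filename}
//   outputs/{user_id}/{job_id}/{artifact}

export function buildUploadKey(
  userId: string,
  filename: string,
  now: Date = new Date()
): string {
  // ISO timestamp with ":" and "." replaced so the key stays path-friendly
  const timestamp = now.toISOString().replace(/[:.]/g, "-");
  return `uploads/${userId}/${timestamp}_${filename}`;
}

export function buildOutputKey(
  userId: string,
  jobId: string,
  artifact: "audiobook.wav" | "audiobook.mp3" | "metadata.json"
): string {
  return `outputs/${userId}/${jobId}/${artifact}`;
}

// Data isolation: a user may only touch keys under their own prefix.
export function ownsKey(userId: string, key: string): boolean {
  return (
    key.startsWith(`uploads/${userId}/`) ||
    key.startsWith(`outputs/${userId}/`)
  );
}
```

Centralizing key construction like this means the `user_id` prefix rule is enforced in one place rather than re-derived in every service.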
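## Appendix: Unit Economics Check

The unit-economics bullets in the Cost Analysis section reduce to one line of arithmetic. The sketch below (the function name is illustrative, not existing code) makes the implicit assumption explicit: the quoted margin treats GPU time as the only cost of goods and assumes roughly one generated book per subscriber per month.

```typescript
// Figures taken from the Cost Analysis section.
const GPU_HOURLY_USD = 0.75; // g4dn.xlarge, per the GPU worker options
const HOURS_PER_BOOK = 2;    // ~30k-word book

// Gross margin (%) for a given per-book revenue, counting GPU time only.
export function grossMarginPct(revenuePerBookUsd: number): number {
  const costPerBook = GPU_HOURLY_USD * HOURS_PER_BOOK; // $1.50 per book
  return (100 * (revenuePerBookUsd - costPerBook)) / revenuePerBookUsd;
}
```

At $39/mo and one book per month, `grossMarginPct(39)` is about 96.2, consistent with the ">95%" claim; at four books per month it drops to about 84.6.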