FrenoCorp/technical-architecture.md
2026-03-09 09:21:48 -04:00

Technical Architecture: AudiobookPipeline Web Platform

Executive Summary

This document outlines the technical architecture for transforming the AudiobookPipeline CLI tool into a full-featured SaaS platform with web interface, user management, and cloud infrastructure.

Target Stack: SolidStart + Turso (SQLite) + S3-compatible storage


Current State Assessment

Existing Assets

  • CLI Tool: Mature Python pipeline with 8 stages (parser → analyzer → annotator → voices → segmentation → generation → assembly → validation)
  • TTS Models: Qwen3-TTS-12Hz-1.7B (VoiceDesign + Base models)
  • Checkpoint System: Resume capability for long-running jobs
  • Config System: YAML-based configuration with overrides
  • Output Formats: WAV + MP3 with loudness normalization
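
As an illustration, the base-config-plus-override pattern might look like the sketch below (two YAML documents; all key names are hypothetical — the authoritative schema lives with the CLI):

```yaml
# config.yaml -- base settings (hypothetical keys)
output:
  formats: [wav, mp3]
  loudness_target_lufs: -19
voices:
  narrator: default_narrator
---
# override.yaml -- merged over the base for a single run
voices:
  narrator: warm_male_01
```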

Gaps to Address

  1. No user authentication or multi-tenancy
  2. No job queue or async processing
  3. No API layer for web clients
  4. No usage tracking or billing integration
  5. CLI-only UX (no dashboard, history, or file management)

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                        Client Layer                         │
│  ┌─────────────┐  ┌────────────┐  ┌──────────────────────┐  │
│  │     Web     │  │    CLI     │  │  REST API (public)   │  │
│  │     App     │  │ (enhanced) │  │                      │  │
│  │ (SolidStart)│  │            │  │ /api/jobs, /api/files│  │
│  └─────────────┘  └────────────┘  └──────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                   API Gateway Layer                         │
│  ┌──────────────────────────────────────────────────────┐   │
│  │           SolidStart API Routes                      │   │
│  │  - Auth middleware (Clerk or custom JWT)            │   │
│  │  - Rate limiting + quota enforcement                │   │
│  │  - Request validation (Zod)                         │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                    Service Layer                            │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐  │
│  │  Job     │  │   File   │  │   User   │  │   Billing  │  │
│  │ Service  │  │  Service │  │  Service │  │  Service   │  │
│  └──────────┘  └──────────┘  └──────────┘  └────────────┘  │
└─────────────────────────────────────────────────────────────┘
                            │
              ┌─────────────┼─────────────┐
              ▼             ▼             ▼
┌───────────────┐  ┌──────────────┐  ┌──────────────┐
│   Turso       │  │    S3        │  │   GPU        │
│   (SQLite)    │  │  (Storage)   │  │  Workers     │
│               │  │              │  │  (TTS Jobs)  │
│ - Users       │  │ - Uploads    │  │              │
│ - Jobs        │  │ - Outputs    │  │ - Qwen3-TTS  │
│ - Usage       │  │ - Models     │  │ - Assembly   │
│ - Subscriptions│ │              │  │              │
└───────────────┘  └──────────────┘  └──────────────┘

Technology Decisions

Frontend: SolidStart

Why SolidStart?

  • Built on Solid, a lightweight, high-performance alternative to React
  • Server-side rendering + static generation out of the box
  • Built-in API routes (reduces need for separate backend)
  • Excellent TypeScript support
  • Smaller bundle sizes than Next.js

Key Packages:

{
  "@solidjs/start": "^1.0.0",
  "solid-js": "^1.8.0",
  "@solidjs/router": "^0.14.0",
  "zod": "^3.22.0"
}

Database: Turso (SQLite)

Why Turso?

  • Serverless SQLite with libSQL
  • Edge-compatible (runs anywhere)
  • Built-in replication and failover
  • Free tier: 1GB storage, 1M reads/day
  • Perfect for SaaS with <10k users

Schema Design:

-- Users and auth
CREATE TABLE users (
  id TEXT PRIMARY KEY,
  email TEXT UNIQUE NOT NULL,
  stripe_customer_id TEXT,
  subscription_status TEXT DEFAULT 'free',
  credits INTEGER DEFAULT 0,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Processing jobs
CREATE TABLE jobs (
  id TEXT PRIMARY KEY,
  user_id TEXT REFERENCES users(id),
  status TEXT DEFAULT 'pending', -- pending, processing, completed, failed
  input_file_id TEXT,
  output_file_id TEXT,
  progress INTEGER DEFAULT 0,
  error_message TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  completed_at TIMESTAMP
);

-- File metadata (not the files themselves)
CREATE TABLE files (
  id TEXT PRIMARY KEY,
  user_id TEXT REFERENCES users(id),
  filename TEXT NOT NULL,
  s3_key TEXT UNIQUE NOT NULL,
  file_size INTEGER,
  mime_type TEXT,
  purpose TEXT, -- input, output, model
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Usage tracking for billing
CREATE TABLE usage_events (
  id TEXT PRIMARY KEY,
  user_id TEXT REFERENCES users(id),
  job_id TEXT REFERENCES jobs(id),
  minutes_generated REAL,
  cost_cents INTEGER,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
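
As a sanity check, the schema runs unchanged on plain SQLite (Turso's libSQL is SQLite-compatible). A minimal sketch, trimmed to two tables, showing the always-filter-by-user_id query pattern that the Security section relies on:

```python
import sqlite3

# In-memory SQLite stands in for Turso; table and column names mirror
# the schema above, trimmed to the columns this demo needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
  id TEXT PRIMARY KEY,
  email TEXT UNIQUE NOT NULL,
  credits INTEGER DEFAULT 0,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE jobs (
  id TEXT PRIMARY KEY,
  user_id TEXT REFERENCES users(id),
  status TEXT DEFAULT 'pending'
);
""")
conn.execute("INSERT INTO users (id, email, credits) VALUES ('u1', 'u1@example.com', 5)")
conn.executemany(
    "INSERT INTO jobs (id, user_id) VALUES (?, ?)",
    [("j1", "u1"), ("j2", "u1"), ("j3", "someone-else")],
)

def jobs_for_user(user_id: str) -> list[str]:
    # Every query filters by user_id -- the row-level isolation rule.
    rows = conn.execute(
        "SELECT id FROM jobs WHERE user_id = ? ORDER BY id", (user_id,)
    ).fetchall()
    return [job_id for (job_id,) in rows]
```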

Storage: S3-Compatible

Why S3?

  • Industry standard for file storage
  • Cheap (~$0.023/GB/month)
  • CDN integration (CloudFront)
  • Lifecycle policies for cleanup

Use Cases:

  • User uploads (input ebooks)
  • Generated audiobooks (output WAV/MP3)
  • Model checkpoints (Qwen3-TTS weights)
  • Processing logs

Directory Structure:

s3://audiobookpipeline-{env}/
├── uploads/{user_id}/{timestamp}_{filename}
├── outputs/{user_id}/{job_id}/
│   ├── audiobook.wav
│   ├── audiobook.mp3
│   └── metadata.json
├── models/
│   ├── qwen3-tts-voicedesign/
│   └── qwen3-tts-base/
└── logs/{date}/{job_id}.log

GPU Workers: Serverless or Containerized

Option A: Managed Kubernetes on AWS (EKS with GPU node groups; Lambda itself has no GPU support)

  • Pros: Auto-scaling, pay-per-use
  • Cons: Complex setup, cold starts

Option B: RunPod / Lambda Labs

  • Pros: GPU-optimized, simple API
  • Cons: Vendor lock-in

Option C: Self-hosted on EC2 g4dn.xlarge

  • Pros: Full control, predictable pricing (~$0.75/hr)
  • Cons: Manual scaling, always-on cost

Recommendation: Start with Option C (1-2 GPU instances) + job queue. Scale to serverless later.
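
The queue-plus-fixed-workers shape is simple: each GPU instance runs a loop that pulls job ids and hands them to the processor. A sketch with an in-process `Queue` standing in for Redis/BullMQ:

```python
from queue import Empty, Queue
from typing import Callable

def drain_queue(jobs: Queue, process: Callable[[str], None]) -> list[str]:
    """Pull job ids until the queue is empty, processing each in turn.
    A real worker would block on Redis (e.g. BRPOP, or a BullMQ worker)
    instead of returning when idle; this sketch just drains what's there."""
    done: list[str] = []
    while True:
        try:
            job_id = jobs.get_nowait()
        except Empty:
            return done
        process(job_id)  # e.g. JobProcessor.process_job(job_id)
        done.append(job_id)
```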


Core Components

1. Job Processing Pipeline

# services/job_processor.py
class JobProcessor:
    """Processes audiobook generation jobs."""
    
    async def process_job(self, job_id: str) -> None:
        job = await self.db.get_job(job_id)
        
        try:
            # Download input file from S3
            input_path = await self.file_service.download(job.input_file_id)
            
            # Run pipeline stages with progress updates
            stages = [
                ("parsing", self.parse_ebook),
                ("analyzing", self.analyze_book),
                ("segmenting", self.segment_text),
                ("generating", self.generate_audio),
                ("assembling", self.assemble_audiobook),
            ]
            
            for stage_name, stage_func in stages:
                await self.update_progress(job_id, stage_name)
                await stage_func(input_path, job.config)
            
            # Upload output to S3
            output_file_id = await self.file_service.upload(
                job_id=job_id,
                files=["output.wav", "output.mp3"]
            )
            
            await self.db.complete_job(job_id, output_file_id)
            
        except Exception as e:
            await self.db.fail_job(job_id, str(e))
            raise
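
The `progress` column on `jobs` is an integer 0-100; one simple convention (an assumption — even weighting understates how long generation takes) maps each stage to the percentage completed when it begins:

```python
# Stage names as passed to update_progress above.
STAGES = ["parsing", "analyzing", "segmenting", "generating", "assembling"]

def stage_progress(stage: str) -> int:
    """Percent complete when `stage` begins, assuming equal stage weighting."""
    return round(100 * STAGES.index(stage) / len(STAGES))
```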

2. API Routes (SolidStart)

// app/routes/api/jobs.ts
import type { APIEvent } from "@solidjs/start/server";
import { z } from "zod";
// requireAuth, db, and jobQueue are app-level modules, imported elsewhere.

const bodySchema = z.object({
  fileId: z.string(),
  config: z
    .object({
      voices: z.object({
        narrator: z.string().optional(),
      }),
    })
    .optional(),
});

export async function POST(event: APIEvent) {
  const user = await requireAuth(event);

  const { fileId, config } = bodySchema.parse(await event.request.json());

  // Check quota
  const credits = await db.getUserCredits(user.id);
  if (credits < 1) {
    return Response.json({ error: "Insufficient credits" }, { status: 402 });
  }

  // Create job
  const job = await db.createJob({
    userId: user.id,
    inputFileId: fileId,
    config,
  });

  // Queue for processing
  await jobQueue.add("process-audiobook", { jobId: job.id });

  return Response.json({ job });
}
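
Note the check-then-create sequence above has a race: two simultaneous requests can both pass the credit check. One fix is to fold the check and the decrement into a single conditional UPDATE, which SQLite (and therefore Turso) applies atomically. A sketch against plain SQLite, with a minimal stand-in for the users table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, credits INTEGER DEFAULT 0)")
conn.execute("INSERT INTO users (id, credits) VALUES ('u1', 1)")

def deduct_credit(user_id: str) -> bool:
    """Spend one credit iff the user has one; the WHERE clause makes the
    quota check and the decrement a single atomic statement."""
    cur = conn.execute(
        "UPDATE users SET credits = credits - 1 WHERE id = ? AND credits >= 1",
        (user_id,),
    )
    conn.commit()
    return cur.rowcount == 1
```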

3. Dashboard UI

// app/routes/dashboard.tsx
import { createResource, Show } from "solid-js";
// useUser (auth hook), StatsCard, UploadButton, JobList are app-level imports.

export default function Dashboard() {
  const user = useUser();
  // createResource is Solid's async data primitive (there is no built-in useQuery).
  const [jobs] = createResource(async () => {
    const res = await fetch(`/api/jobs?userId=${user.id}`);
    return res.json();
  });

  return (
    <div class="dashboard">
      <h1>Audiobook Pipeline</h1>

      <StatsCard
        credits={user.credits}
        booksGenerated={jobs()?.length ?? 0}
      />

      <UploadButton />

      <Show when={jobs()}>
        <JobList jobs={jobs()} />
      </Show>
    </div>
  );
}

Security Considerations

Authentication

  • Option 1: Clerk (fastest to implement, $0-25/mo)
  • Option 2: Custom JWT with email magic links
  • Recommendation: Clerk for MVP

Authorization

  • Row-level security in Turso queries
  • S3 pre-signed URLs with expiration
  • API rate limiting per user

Data Isolation

  • All S3 keys include user_id prefix
  • Database queries always filter by user_id
  • GPU workers validate job ownership

Deployment Architecture

Development

# Local setup
npm run dev          # SolidStart dev server
turso dev            # Local libSQL (SQLite)
minio server ./data  # Local S3-compatible storage

Production (Vercel + Turso)

┌─────────────┐     ┌──────────────┐     ┌──────────┐
│   Vercel    │────▶│    Turso     │     │    S3    │
│ (SolidStart)│     │  (Database)  │     │(Storage) │
└─────────────┘     └──────────────┘     └──────────┘
       │
       ▼
┌─────────────┐
│  GPU Fleet  │
│  (Workers)  │
└─────────────┘

CI/CD Pipeline

# .github/workflows/deploy.yml
name: Deploy
on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test
      
  deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm i -g vercel
      - run: vercel deploy --prod --token=${{ secrets.VERCEL_TOKEN }}

MVP Implementation Plan

Phase 1: Foundation (Week 1-2)

  • Set up SolidStart project structure
  • Integrate Turso database
  • Implement user auth (Clerk)
  • Create file upload endpoint (S3)
  • Build basic dashboard UI

Phase 2: Pipeline Integration (Week 2-3)

  • Containerize existing Python pipeline
  • Set up job queue (BullMQ, backed by Redis)
  • Implement job processor service
  • Add progress tracking API
  • Connect GPU workers

Phase 3: User Experience (Week 3-4)

  • Job history UI with status indicators
  • Audio player for preview/download
  • Usage dashboard + credit system
  • Stripe integration for payments
  • Email notifications on job completion

Cost Analysis

Infrastructure Costs (Monthly)

Component            Tier         Cost
Vercel               Pro          $20/mo
Turso                Free tier    $0/mo (<1M reads/day)
S3 Storage           1 TB         $23/mo
GPU (g4dn.xlarge)    730 hrs/mo   $548/mo
Redis (job queue)    Hobby        $9/mo
Total                             ~$600/mo

Unit Economics

  • GPU cost per hour: $0.75
  • Average book processing time: 2 hours (30k words)
  • Cost per book: ~$1.50 (GPU only)
  • Pricing: $39/mo subscription (unlimited, subject to fair use)
  • Gross margin: >95% at roughly one book per subscriber per month
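
The arithmetic behind those numbers, with the one-book-per-subscriber-per-month assumption made explicit:

```python
GPU_COST_PER_HOUR = 0.75   # g4dn.xlarge, from the cost table above
HOURS_PER_BOOK = 2.0       # ~30k words
PRICE_PER_MONTH = 39.00    # subscription price

cost_per_book = GPU_COST_PER_HOUR * HOURS_PER_BOOK
# Assumption: one book generated per subscriber per month.
gross_margin = (PRICE_PER_MONTH - cost_per_book) / PRICE_PER_MONTH
print(f"${cost_per_book:.2f}/book, {gross_margin:.1%} margin")
```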

Next Steps

  1. Immediate: Set up SolidStart + Turso scaffolding
  2. This Week: Implement auth + file upload
  3. Next Week: Containerize Python pipeline + job queue
  4. Week 3: Dashboard UI + Stripe integration

Appendix: Environment Variables

# Database
TURSO_DATABASE_URL="libsql://frenocorp.turso.io"
TURSO_AUTH_TOKEN="..."

# Storage
AWS_ACCESS_KEY_ID="..."
AWS_SECRET_ACCESS_KEY="..."
AWS_S3_BUCKET="audiobookpipeline-prod"
AWS_REGION="us-east-1"

# Auth
CLERK_SECRET_KEY="..."
VITE_CLERK_PUBLISHABLE_KEY="..."

# Billing
STRIPE_SECRET_KEY="..."
STRIPE_WEBHOOK_SECRET="..."

# GPU Workers
GPU_WORKER_ENDPOINT="https://workers.audiobookpipeline.com"
GPU_API_KEY="..."