---
date: 2026-03-08
day_of_week: Sunday
task_id: FRE-17
title: Add Memory-Efficient Model Loading
status: todo
company_id: FrenoCorp
objective: Implement gradient checkpointing and mixed precision for lower VRAM usage
context: |
  - Qwen3-TTS 1.7B may not fit on low-end GPUs
  - Gradient checkpointing trades compute for memory
  - Mixed precision (FP16) roughly halves memory usage
issue_type: enhancement
priority: medium
assignee: Atlas
parent_task: FRE-32
goal_id: MVP_Pipeline_Working
blocking_tasks: []
expected_outcome: |
  - Model runs on GPUs with <8GB VRAM
  - Configurable precision (FP32/FP16/BF16)
  - Graceful degradation when memory is insufficient
acceptance_criteria:
  - FP16 mode reduces memory usage by ~50%
  - Gradient checkpointing option available
  - Clear error when memory is still insufficient
notes:
  - Use torch.cuda.amp for mixed precision
  - Set gradient_checkpointing=True in model config
links:
  tts_model: /home/mike/code/AudiobookPipeline/src/generation/tts_model.py
---
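
The "clear error when memory is still insufficient" criterion could be satisfied with a pre-flight VRAM check before the model is loaded. A minimal pure-Python sketch follows; the function names, the `InsufficientVRAMError` type, and the 1.2x activation-overhead factor are illustrative assumptions, not part of the existing `tts_model.py`:

```python
# Sketch of a pre-flight VRAM check for configurable-precision loading.
# The overhead factor and all names here are assumptions for illustration.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


class InsufficientVRAMError(RuntimeError):
    """Raised when the model cannot fit at the requested precision."""


def estimate_weight_bytes(num_params: int, precision: str) -> int:
    """Return the raw size of the model weights at the given precision."""
    if precision not in BYTES_PER_PARAM:
        raise ValueError(
            f"Unknown precision {precision!r}; "
            f"expected one of {sorted(BYTES_PER_PARAM)}"
        )
    return num_params * BYTES_PER_PARAM[precision]


def check_vram(num_params: int, precision: str, free_vram_bytes: int,
               overhead: float = 1.2) -> None:
    """Fail fast with a clear error if the weights (plus a rough
    activation overhead) exceed the available VRAM."""
    needed = int(estimate_weight_bytes(num_params, precision) * overhead)
    if needed > free_vram_bytes:
        raise InsufficientVRAMError(
            f"Model needs ~{needed / 1e9:.1f} GB at {precision} but only "
            f"{free_vram_bytes / 1e9:.1f} GB is free; try fp16/bf16 or "
            f"enable gradient checkpointing."
        )
```

For a 1.7B-parameter model against an 8 GB budget, `check_vram(1_700_000_000, "fp32", 8_000_000_000)` raises (about 8.2 GB needed with overhead), while `"fp16"` passes at about 4.1 GB, which lines up with the ~50% reduction in the acceptance criteria.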