---
date: 2026-03-08
day_of_week: Sunday
task_id: FRE-17
title: Add Memory-Efficient Model Loading
status: done
completed_date: 2026-03-15
company_id: FrenoCorp
objective: Implement gradient checkpointing and mixed precision for lower VRAM usage
context: |
  - Qwen3-TTS 1.7B may not fit on low-end GPUs
  - Gradient checkpointing trades compute for memory
  - Mixed precision (FP16) halves memory usage
issue_type: enhancement
priority: medium
assignee: Atlas
parent_task: FRE-32
goal_id: MVP_Pipeline_Working
blocking_tasks: []
expected_outcome: |
  - Model runs on GPUs with <8GB VRAM
  - Configurable precision (FP32/FP16/BF16)
  - Graceful degradation when memory is insufficient
acceptance_criteria:
- FP16 mode reduces memory usage by ~50%
- Gradient checkpointing option available
- Clear error when memory still insufficient
notes:
- Use torch.cuda.amp for mixed precision
- Set gradient_checkpointing=True in model config
- COMPLETED: Added memory-efficient model loading with auto-detection
completion_notes: |
  Completed 2026-03-15. Deliverables:

  **New Parameters:**
  - `memory_efficient` (bool, default=True): Enable all memory-saving features
  - `use_gradient_checkpointing` (bool, default=False): Trade compute for memory
  - Enhanced `dtype` support with auto-selection based on available GPU memory

  **New Methods:**
  - `_check_gpu_memory()`: Returns (total_gb, available_gb)
  - `_select_optimal_dtype(available_gb)`: Auto-selects fp32/bf16/fp16
  - `get_memory_stats()`: Returns dict with current GPU memory usage
  - `estimate_model_memory()`: Returns estimated memory for different precisions
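  The auto-selection could be sketched roughly as below. This is an illustrative stand-in for `_select_optimal_dtype`, not the code in tts_model.py; the thresholds are taken from the memory estimates in this task, and the real implementation may differ.

  ```python
  # Sketch of dtype auto-selection; thresholds follow this task's estimates
  # (fp32 ~6.8 GB, fp16/bf16 ~3.9 GB for the 1.7B model).
  FP32_GB = 6.8   # estimated fp32 footprint
  HALF_GB = 3.9   # estimated fp16/bf16 footprint

  def select_optimal_dtype(available_gb, bf16_supported):
      """Pick the widest precision that fits in available GPU memory.

      bf16_supported would normally be derived from the device's compute
      capability, e.g. torch.cuda.get_device_capability() >= (8, 0)
      (Ampere or newer).
      """
      if available_gb >= FP32_GB:
          return "fp32"
      if available_gb >= HALF_GB:
          return "bf16" if bf16_supported else "fp16"
      # Below the half-precision floor: return fp16 anyway so loading can
      # fail later with a clear, actionable OOM message.
      return "fp16"
  ```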
  **Features:**
  - Auto-detects GPU memory and selects optimal dtype (bf16 for Ampere+, fp16 otherwise)
  - Graceful degradation: fp32 → bf16 → fp16 based on available memory
  - Enhanced OOM error messages with actionable suggestions
  - Memory stats reported on load/unload
  - Gradient checkpointing support for training scenarios
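  The "actionable suggestions" could look something like this sketch; the function name and message wording are hypothetical, not the exact text in tts_model.py.

  ```python
  # Hypothetical sketch of an actionable OOM message builder.
  def format_oom_error(required_gb, available_gb, dtype):
      """Return an error message that tells the user what to try next."""
      suggestions = []
      if dtype == "fp32":
          suggestions.append("set dtype to fp16/bf16 (~50% less memory)")
      suggestions.append("enable use_gradient_checkpointing=True")
      suggestions.append("close other processes using GPU memory")
      return (
          f"Model needs ~{required_gb:.1f} GB in {dtype} but only "
          f"{available_gb:.1f} GB is available. Try: "
          + "; ".join(suggestions)
      )
  ```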
  **Memory Estimates:**
  - FP32: ~6.8GB (1.7B params × 4 bytes + overhead)
  - FP16/BF16: ~3.9GB (~50% reduction)
  - Minimum recommended: 4GB VRAM
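  These estimates follow directly from parameter count × bytes per parameter. A minimal stand-in for `estimate_model_memory()` (the 10% overhead factor is an illustrative assumption, not a value from tts_model.py):

  ```python
  # Stand-in for estimate_model_memory(); the overhead factor is an
  # assumed buffer for activations, CUDA context, and auxiliary tensors.
  PARAM_COUNT = 1.7e9
  BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}

  def estimate_model_memory(dtype="fp32", overhead=1.10):
      """Estimated GB to load the 1.7B model's weights at a given precision."""
      return PARAM_COUNT * BYTES_PER_PARAM[dtype] * overhead / 1e9
  ```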
links:
  tts_model: /home/mike/code/AudiobookPipeline/src/generation/tts_model.py
---