---
date: 2026-03-08
day_of_week: Sunday
task_id: FRE-17
title: Add Memory-Efficient Model Loading
status: done
completed_date: 2026-03-15
company_id: FrenoCorp
objective: Implement gradient checkpointing and mixed precision for lower VRAM usage
context: |
  - Qwen3-TTS 1.7B may not fit on low-end GPUs
  - Gradient checkpointing trades compute for memory
  - Mixed precision (FP16) reduces memory by half
issue_type: enhancement
priority: medium
assignee: Atlas
parent_task: FRE-32
goal_id: MVP_Pipeline_Working
blocking_tasks: []
expected_outcome: |
  - Model runs on GPUs with <8GB VRAM
  - Configurable precision (FP32/FP16/BF16)
  - Graceful degradation when memory insufficient
acceptance_criteria:
  - FP16 mode reduces memory usage by ~50%
  - Gradient checkpointing option available
  - Clear error when memory still insufficient

notes:
  - Use torch.cuda.amp for mixed precision
  - Set gradient_checkpointing=True in model config
  - "COMPLETED: Added memory-efficient model loading with auto-detection"

completion_notes: |
  Completed 2026-03-15. Deliverables:

  **New Parameters:**
  - `memory_efficient` (bool, default=True): Enable all memory-saving features
  - `use_gradient_checkpointing` (bool, default=False): Trade compute for memory
  - Enhanced `dtype` support with auto-selection based on available GPU memory

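  The parameters above can be captured as a small config sketch (the
  dataclass is illustrative only, not the actual constructor signature):

  ```python
  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class LoadConfig:
      memory_efficient: bool = True             # enable all memory-saving features
      use_gradient_checkpointing: bool = False  # trade compute for memory
      dtype: Optional[str] = None               # None -> auto-select from GPU memory
  ```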
  **New Methods:**
  - `_check_gpu_memory()`: Returns (total_gb, available_gb)
  - `_select_optimal_dtype(available_gb)`: Auto-selects fp32/bf16/fp16
  - `get_memory_stats()`: Returns dict with current GPU memory usage
  - `estimate_model_memory()`: Returns estimated memory for different precisions

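  A sketch of the auto-selection logic; the thresholds and the standalone
  function shape here are illustrative assumptions (the real method also
  reads live GPU state via `_check_gpu_memory()`):

  ```python
  def select_optimal_dtype(available_gb: float, bf16_supported: bool) -> str:
      """Pick the widest precision that fits: fp32 -> bf16 -> fp16."""
      if available_gb >= 6.8:   # full fp32 weights fit
          return "fp32"
      if available_gb >= 3.9:   # half precision fits
          # bf16 preferred on Ampere+ GPUs (compute capability >= 8.0)
          return "bf16" if bf16_supported else "fp16"
      return "fp16"             # last resort; may still OOM with a clear error
  ```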
  **Features:**
  - Auto-detects GPU memory and selects optimal dtype (bf16 for Ampere+, fp16 otherwise)
  - Graceful degradation: fp32 → bf16 → fp16 based on available memory
  - Enhanced OOM error messages with actionable suggestions
  - Memory stats reported on load/unload
  - Gradient checkpointing support for training scenarios

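  The degradation path can be sketched as follows; `load_with_fallback` and
  its signature are hypothetical (the real code lives in `tts_model.py`), and
  the model class is assumed to follow the Hugging Face `from_pretrained`
  convention:

  ```python
  import torch

  def load_with_fallback(model_cls, model_id: str,
                         dtypes=(torch.float32, torch.bfloat16, torch.float16)):
      """Try fp32 -> bf16 -> fp16, raising an actionable error if all fail."""
      last_err = None
      for dtype in dtypes:
          try:
              model = model_cls.from_pretrained(model_id, torch_dtype=dtype)
              model.gradient_checkpointing_enable()  # trade compute for memory
              return model
          except torch.cuda.OutOfMemoryError as err:
              last_err = err
              torch.cuda.empty_cache()  # release the failed allocation
      raise RuntimeError(
          "Model does not fit in GPU memory even at fp16; "
          "try CPU offload or a smaller model"
      ) from last_err
  ```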
  **Memory Estimates:**
  - FP32: ~6.8GB (1.7B params × 4 bytes + overhead)
  - FP16/BF16: ~3.9GB (50% reduction)
  - Minimum recommended: 4GB VRAM

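  The estimates follow directly from parameter count × bytes per element; a
  quick arithmetic check (helper name and the absence of an overhead term are
  illustrative, not the actual `estimate_model_memory()` implementation):

  ```python
  BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}

  def estimate_weight_memory_gb(n_params: float, dtype: str) -> float:
      """Weight memory in GB, before activation/runtime overhead."""
      return n_params * BYTES_PER_PARAM[dtype] / 1e9

  # 1.7B params: fp32 -> 6.8 GB; fp16/bf16 -> 3.4 GB (~3.9 GB with overhead)
  print(estimate_weight_memory_gb(1.7e9, "fp32"))  # 6.8
  print(estimate_weight_memory_gb(1.7e9, "fp16"))  # 3.4
  ```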
links:
  tts_model: /home/mike/code/AudiobookPipeline/src/generation/tts_model.py
---