Auto-commit 2026-03-15 02:40
@@ -3,7 +3,8 @@ date: 2026-03-08
 day_of_week: Sunday
 task_id: FRE-17
 title: Add Memory-Efficient Model Loading
-status: todo
+status: done
+completed_date: 2026-03-15
 company_id: FrenoCorp
 objective: Implement gradient checkpointing and mixed precision for lower VRAM usage
 context: |
@@ -28,6 +29,33 @@ acceptance_criteria:
 notes:
 - Use torch.cuda.amp for mixed precision
 - Set gradient_checkpointing=True in model config
+- COMPLETED: Added memory-efficient model loading with auto-detection
+
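The two notes above can be sketched together: autocast-based mixed precision plus gradient checkpointing on a stand-in module. This is a minimal sketch, not the real loader in tts_model.py; `TinyModel` is hypothetical.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

# Stand-in for the real TTS model: a small block whose activations we
# recompute in backward instead of storing (gradient checkpointing).
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))

    def forward(self, x):
        # use_reentrant=False is the recommended checkpointing mode in recent PyTorch
        return checkpoint(self.block, x, use_reentrant=False)

model = TinyModel()
x = torch.randn(4, 16)
device_type = "cuda" if torch.cuda.is_available() else "cpu"

# torch.autocast generalizes torch.cuda.amp.autocast and also runs on CPU,
# so this sketch works without a GPU.
with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
    out = model(x)
out.sum().backward()
```

For fp16 training one would typically pair the autocast context with `torch.cuda.amp.GradScaler` to avoid gradient underflow; bf16 usually does not need loss scaling.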
||||
completion_notes: |
|
||||
Completed 2026-03-15. Deliverables:
|
||||
|
||||
**New Parameters:**
|
||||
- `memory_efficient` (bool, default=True): Enable all memory-saving features
|
||||
- `use_gradient_checkpointing` (bool, default=False): Trade compute for memory
|
||||
- Enhanced `dtype` support with auto-selection based on available GPU memory
|
||||
|
+  **New Methods:**
+  - `_check_gpu_memory()`: Returns (total_gb, available_gb)
+  - `_select_optimal_dtype(available_gb)`: Auto-selects fp32/bf16/fp16
+  - `get_memory_stats()`: Returns dict with current GPU memory usage
+  - `estimate_model_memory()`: Returns estimated memory for different precisions
+
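The first two helpers might look like this, sketched as free functions rather than methods. The 8 GB cutoff is an assumption for illustration, not taken from the task notes.

```python
import torch

def check_gpu_memory() -> tuple[float, float]:
    """Return (total_gb, available_gb); (0.0, 0.0) when no GPU is present."""
    if not torch.cuda.is_available():
        return 0.0, 0.0
    free_b, total_b = torch.cuda.mem_get_info()  # bytes: (free, total)
    return total_b / 1e9, free_b / 1e9

def select_optimal_dtype(available_gb: float) -> torch.dtype:
    """Prefer fp32 when memory is plentiful, otherwise bf16 on hardware
    that supports it (Ampere+), else fp16. Cutoff value is an assumption."""
    if available_gb >= 8.0:
        return torch.float32
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        return torch.bfloat16
    return torch.float16
```

`torch.cuda.mem_get_info()` reports driver-level free memory, which is more honest about what a fresh model load can actually use than the allocator's own counters.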
+  **Features:**
+  - Auto-detects GPU memory and selects optimal dtype (bf16 for Ampere+, fp16 otherwise)
+  - Graceful degradation: fp32 → bf16 → fp16 based on available memory
+  - Enhanced OOM error messages with actionable suggestions
+  - Memory stats reported on load/unload
+  - Gradient checkpointing support for training scenarios
+
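The stats reporting and the "actionable OOM message" features could be sketched like this; the function names and dict fields are assumptions, not the real API.

```python
import torch

def get_memory_stats() -> dict:
    """Snapshot of GPU memory usage; falls back to zeros on CPU-only hosts."""
    if not torch.cuda.is_available():
        return {"device": "cpu", "allocated_gb": 0.0, "reserved_gb": 0.0}
    return {
        "device": torch.cuda.get_device_name(0),
        "allocated_gb": torch.cuda.memory_allocated() / 1e9,
        "reserved_gb": torch.cuda.memory_reserved() / 1e9,
    }

def oom_hint(required_gb: float, available_gb: float) -> str:
    """An actionable message of the kind described above (wording is illustrative)."""
    return (
        f"Out of memory: need ~{required_gb:.1f} GB, have {available_gb:.1f} GB free. "
        "Try dtype='fp16', enable use_gradient_checkpointing, or close other GPU processes."
    )
```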
+  **Memory Estimates:**
+  - FP32: ~6.8GB (1.7B params × 4 bytes + overhead)
+  - FP16/BF16: ~3.9GB (50% reduction)
+  - Minimum recommended: 4GB VRAM
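The estimates above follow directly from parameter count times bytes per element: 1.7B params × 4 bytes ≈ 6.8 GB for fp32, and half that for fp16/bf16 before overhead (which brings the ~3.4 GB half-precision figure up to the quoted ~3.9 GB).

```python
# Bytes per element for each supported precision.
BYTES_PER_DTYPE = {"fp32": 4, "fp16": 2, "bf16": 2}

def estimate_model_memory_gb(n_params: float, dtype: str, overhead_gb: float = 0.0) -> float:
    """Weights-only estimate: params x bytes, plus an optional overhead term."""
    return n_params * BYTES_PER_DTYPE[dtype] / 1e9 + overhead_gb

print(estimate_model_memory_gb(1.7e9, "fp32"))  # 6.8
print(estimate_model_memory_gb(1.7e9, "bf16"))  # 3.4 (≈3.9 with overhead)
```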
 
 links:
   tts_model: /home/mike/code/AudiobookPipeline/src/generation/tts_model.py