Auto-commit 2026-03-15 02:40

This commit is contained in:
2026-03-15 02:40:30 -04:00
parent d7a37079f1
commit 891b25318a
6 changed files with 324 additions and 3 deletions

@@ -43,6 +43,12 @@ completion_notes: |
Testing requires: docker-compose up -d redis
**Code Review Improvements (2026-03-15):**
- Fixed hardcoded subscriptionStatus="free" - now fetched from database via getUserSubscription()
- Fixed hardcoded demo user data in job completion/failure notifications
- Notifications now use actual user_id, email, and job data from database
- Added getUserEmailFromUserId() helper for fetching user emails
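The email-lookup helper mentioned above could be sketched as follows. This is a hypothetical illustration only: the real `getUserEmailFromUserId()` signature, database schema, and driver are not shown in the diff, so an in-memory SQLite table stands in for the actual users store.

```python
import sqlite3
from typing import Optional

def get_user_email_from_user_id(conn: sqlite3.Connection, user_id: int) -> Optional[str]:
    """Sketch of a getUserEmailFromUserId()-style helper: fetch the real
    email for notifications instead of hardcoded demo data."""
    row = conn.execute(
        "SELECT email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None

# Demo against an in-memory database (schema is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'user@example.com')")
```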
review_notes: |
Code review completed 2026-03-14 by Code Reviewer:
- Found solid implementation with proper separation of concerns

@@ -3,7 +3,8 @@ date: 2026-03-08
day_of_week: Sunday
task_id: FRE-17
title: Add Memory-Efficient Model Loading
-status: todo
+status: done
+completed_date: 2026-03-15
company_id: FrenoCorp
objective: Implement gradient checkpointing and mixed precision for lower VRAM usage
context: |
@@ -28,6 +29,33 @@ acceptance_criteria:
notes:
- Use torch.cuda.amp for mixed precision
- Set gradient_checkpointing=True in model config
- COMPLETED: Added memory-efficient model loading with auto-detection
completion_notes: |
Completed 2026-03-15. Deliverables:
**New Parameters:**
- `memory_efficient` (bool, default=True): Enable all memory-saving features
- `use_gradient_checkpointing` (bool, default=False): Trade compute for memory
- Enhanced `dtype` support with auto-selection based on available GPU memory
**New Methods:**
- `_check_gpu_memory()`: Returns (total_gb, available_gb)
- `_select_optimal_dtype(available_gb)`: Auto-selects fp32/bf16/fp16
- `get_memory_stats()`: Returns dict with current GPU memory usage
- `estimate_model_memory()`: Returns estimated memory for different precisions
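The dtype auto-selection can be sketched as a pure function. The names mirror `_select_optimal_dtype()` above, but the 8GB fp32 threshold is an illustrative assumption (the shipped cutoffs aren't in the diff); the Ampere check follows the bf16-for-Ampere+ rule noted under Features.

```python
def select_optimal_dtype(available_gb: float, compute_capability: tuple) -> str:
    """Pick the widest precision that fits available GPU memory.

    compute_capability is a (major, minor) pair as reported by the GPU;
    (8, 0) and above is Ampere or newer, which supports bf16 natively.
    The 8GB threshold below is an assumption for illustration.
    """
    if available_gb >= 8.0:
        return "fp32"          # enough headroom for full precision
    if compute_capability >= (8, 0):
        return "bf16"          # Ampere+: prefer bf16 (fp32 dynamic range)
    return "fp16"              # older GPUs: fall back to fp16
```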
**Features:**
- Auto-detects GPU memory and selects optimal dtype (bf16 for Ampere+, fp16 otherwise)
- Graceful degradation: fp32 → bf16 → fp16 based on available memory
- Enhanced OOM error messages with actionable suggestions
- Memory stats reported on load/unload
- Gradient checkpointing support for training scenarios
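The "actionable OOM message" idea can be sketched like this; the wording and the specific suggestions are assumptions, not the shipped strings, though `use_gradient_checkpointing` matches the parameter listed above.

```python
def format_oom_message(required_gb: float, available_gb: float, dtype: str) -> str:
    """Build an out-of-memory error message with concrete next steps
    (illustrative sketch; actual message text is an assumption)."""
    lines = [
        f"CUDA out of memory: model needs ~{required_gb:.1f}GB "
        f"but only {available_gb:.1f}GB is free.",
        "Suggestions:",
    ]
    if dtype == "fp32":
        lines.append("- Load in bf16/fp16 to roughly halve weight memory.")
    lines.append("- Set use_gradient_checkpointing=True to trade compute for memory.")
    lines.append("- Close other GPU processes or reduce batch size.")
    return "\n".join(lines)
```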
**Memory Estimates:**
- FP32: ~6.8GB (1.7B params × 4 bytes + overhead)
- FP16/BF16: ~3.9GB (weights halved to ~3.4GB, plus overhead)
- Minimum recommended: 4GB VRAM
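The estimates above are params × bytes-per-param plus a fixed overhead. A minimal sketch of that arithmetic (the 0.5GB overhead figure is an assumption chosen to match the ~3.9GB half-precision estimate; the real `estimate_model_memory()` may differ):

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}

def estimate_model_memory_gb(n_params: float, dtype: str,
                             overhead_gb: float = 0.5) -> float:
    """Rough model memory: weights at the given precision plus a fixed
    overhead (CUDA context, buffers). overhead_gb is an assumption."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9 + overhead_gb

# 1.7B params: fp32 weights alone are 1.7e9 * 4 / 1e9 = 6.8GB
print(round(estimate_model_memory_gb(1.7e9, "fp32", overhead_gb=0.0), 1))
```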
links:
tts_model: /home/mike/code/AudiobookPipeline/src/generation/tts_model.py