removed unneeded md

Michael Freno
2025-12-05 14:54:26 -05:00
parent abe34c4749
commit 1855e7f0f3
3 changed files with 0 additions and 664 deletions


@@ -1,205 +0,0 @@
# Algorithmic Performance Optimizations
## Summary
Implemented high-impact algorithmic optimizations to FlexLöve UI framework based on profiling analysis. These optimizations target the real performance bottlenecks identified in `PERFORMANCE_ANALYSIS.md`.
**Estimated Total Gain: 2-3x faster layouts** (40-60% improvement expected based on profiling)
## Optimizations Implemented
### 1. Dirty Flag System ✅ (Priority 3)
**Estimated Gain: 30-50% fewer layouts**
**Implementation:**
- Added `_dirty` and `_childrenDirty` flags to Element module
- Elements track when properties change that affect layout
- Parent elements track when children need layout recalculation
- `LayoutEngine:_canSkipLayout()` checks dirty flags first (fastest check)
- `Element:invalidateLayout()` propagates dirty flags up the tree

**Files Modified:**
- `modules/Element.lua`
  - Added dirty flags initialization in `Element.new()`
  - Enhanced `Element:invalidateLayout()` to mark self and ancestors
  - Updated `Element:setProperty()` to invalidate layout for layout-affecting properties
- `modules/LayoutEngine.lua`
  - Enhanced `_canSkipLayout()` to check dirty flags before expensive checks

**Key Properties That Trigger Invalidation:**
- Dimensions: `width`, `height`, `padding`, `margin`, `gap`
- Layout: `flexDirection`, `flexWrap`, `justifyContent`, `alignItems`, `alignContent`, `positioning`
- Grid: `gridRows`, `gridColumns`
- Positioning: `top`, `right`, `bottom`, `left`
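
Below is a minimal sketch of the dirty-flag flow described above. It assumes a stripped-down element with `parent` and `children` fields, because the real `modules/Element.lua` is not shown here. `invalidateLayout()` marks the element and flags every ancestor. The hypothetical `layout` function checks the flags first. It also re-lays out the children of any element that itself changed, since their positions depend on that element.

```lua
-- Sketch only: a stand-in for FlexLöve's Element; `parent`/`children` are assumed fields.
local Element = {}
Element.__index = Element

function Element.new(parent)
  local self = setmetatable({
    parent = parent,
    children = {},
    _dirty = true,          -- a new element always needs its first layout
    _childrenDirty = false, -- set when any descendant becomes dirty
  }, Element)
  if parent then
    table.insert(parent.children, self)
    self:invalidateLayout() -- tell the ancestors a new child needs layout
  end
  return self
end

-- Mark this element dirty and flag every ancestor up to the root.
function Element:invalidateLayout()
  self._dirty = true
  local ancestor = self.parent
  while ancestor do
    ancestor._childrenDirty = true
    ancestor = ancestor.parent
  end
end

local stats = { layouts = 0 }

-- Cheapest check first: a clean element with clean descendants is skipped.
local function canSkipLayout(element)
  return not element._dirty and not element._childrenDirty
end

-- `force` is true when the parent was re-laid out, since a child's position
-- depends on its parent's geometry.
local function layout(element, force)
  if not force and canSkipLayout(element) then
    return
  end
  local changed = force or element._dirty
  if changed then
    stats.layouts = stats.layouts + 1 -- stand-in for the real size/position work
  end
  element._dirty = false
  element._childrenDirty = false
  for _, child in ipairs(element.children) do
    layout(child, changed)
  end
end
```

Consider a root that holds `panel` (which holds `button`) and `label`. The first `layout(root)` does four layouts, and an immediate second call does none. After `button:invalidateLayout()`, only `button` is redone and `label` is skipped.

Walking to the root on every invalidation keeps the sketch simple. Stopping early at an already-flagged ancestor is only safe if flags are always set all the way up.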
### 2. Dimension Caching ✅ (Priority 4)
**Estimated Gain: 10-15% faster**
**Implementation:**
- Element module already had basic caching via `_borderBoxWidth` and `_borderBoxHeight`
- Enhanced with proper cache invalidation in `invalidateLayout()`
- Caches are cleared when element properties change
- `getBorderBoxWidth()` and `getBorderBoxHeight()` return cached values when available

**Files Modified:**
- `modules/Element.lua`
  - Added cache invalidation to `invalidateLayout()`
  - Maintained existing `_borderBoxWidth` and `_borderBoxHeight` caching
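
The following is a sketch of lazy border-box caching tied to `invalidateLayout()`. The `_borderBoxWidth`/`_borderBoxHeight` field names come from the text above. `Box`, `setProperty` on it, and the `computeCount` counter are hypothetical, for illustration only.

```lua
-- Sketch only: lazily cached border-box size; `Box` and `computeCount` are hypothetical.
local Box = {}
Box.__index = Box

function Box.new(width, height, padding)
  return setmetatable({
    width = width,
    height = height,
    padding = padding, -- table with top/right/bottom/left fields
    computeCount = 0,  -- demo instrumentation: how often real work was done
  }, Box)
end

function Box:getBorderBoxWidth()
  local cached = self._borderBoxWidth
  if cached == nil then
    local p = self.padding
    cached = self.width + p.left + p.right
    self._borderBoxWidth = cached
    self.computeCount = self.computeCount + 1
  end
  return cached
end

function Box:getBorderBoxHeight()
  local cached = self._borderBoxHeight
  if cached == nil then
    local p = self.padding
    cached = self.height + p.top + p.bottom
    self._borderBoxHeight = cached
    self.computeCount = self.computeCount + 1
  end
  return cached
end

-- Clearing the caches here means each layout-affecting change costs
-- one recomputation, paid on the next read.
function Box:invalidateLayout()
  self._dirty = true
  self._borderBoxWidth = nil
  self._borderBoxHeight = nil
end

function Box:setProperty(key, value)
  if self[key] ~= value then
    self[key] = value
    self:invalidateLayout()
  end
end
```

One caveat applies to this design. Mutating the padding table in place (`box.padding.left = 8`) bypasses `setProperty`, so the cache goes stale. Changes have to go through the invalidating path.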
### 3. Local Variable Hoisting ✅ (Priority 2)
**Estimated Gain: 15-20% faster**
**Implementation:**
Optimized hot paths in `LayoutEngine:layoutChildren()` by hoisting frequently accessed table properties to local variables:

**Wrapping Logic (Lines 403-441):**
- Hoisted `self.flexDirection` comparison → `isHorizontal`
- Hoisted `self.gap` → `gapSize`
- Cached `child.margin` per iteration
- Eliminated repeated enum lookups in tight loops

**Line Height Calculation (Lines 458-487):**
- Hoisted `self.flexDirection` comparison → `isHorizontal`
- Preallocated `lineHeights` array with `table.create()` if available
- Cached `child.margin` per iteration
- Reduced repeated table access for margin properties

**Positioning Loop (Lines 586-700):**
This is the **hottest path** and was optimized most heavily:
- Hoisted `self.element.x`, `self.element.y` → `elementX`, `elementY`
- Hoisted `self.element.padding` → `elementPadding`
- Hoisted padding properties → `elementPaddingLeft`, `elementPaddingTop`
- Hoisted alignment enums → `alignItems_*` constants
- Cached `child.margin`, `child.padding`, `child.autosizing` per iteration
- Cached individual margin values → `childMarginLeft`, `childMarginTop`, etc.
- Eliminated redundant table lookups in alignment calculations

**Performance Impact:**
- **Before:** `child.margin.left` accessed 3-4 times per child → 6-8 table lookups (each access reads `child.margin`, then `.left`)
- **After:** `child.margin` cached once, then `.left` read once into `childMarginLeft` → 2 table lookups total
- Multiplied across hundreds or thousands of children, the savings add up

**Files Modified:**
- `modules/LayoutEngine.lua`
  - Optimized wrapping logic (lines 403-441)
  - Optimized line height calculation (lines 458-487)
  - Optimized positioning loop for horizontal layout (lines 586-658)
  - Optimized positioning loop for vertical layout (lines 660-700)
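
The lookup counts in "Performance Impact" can be checked with a small instrumentation sketch. A proxy table counts every field read made through it. Real FlexLöve elements are plain tables, so this counts logical lookups, not measured time; a JIT compiler may already optimize some repeated reads.

```lua
-- Instrumentation only: a proxy that counts every field read made through it.
local lookups = 0

local function counted(backing)
  return setmetatable({}, {
    __index = function(_, key)
      lookups = lookups + 1
      return backing[key]
    end,
  })
end

local child = counted({
  margin = counted({ left = 4, top = 2, right = 4, bottom = 2 }),
})

-- Before: three uses of child.margin.left, as in the unhoisted loop body.
lookups = 0
local x = 10 + child.margin.left
local w = 80 - child.margin.left
local center = x + w / 2 + child.margin.left
local before = lookups -- 6: each use reads `margin`, then `left`

-- After: hoist the margin table, then the one field, into locals.
lookups = 0
local childMargin = child.margin
local childMarginLeft = childMargin.left
x = 10 + childMarginLeft
w = 80 - childMarginLeft
center = x + w / 2 + childMarginLeft
local after = lookups -- 2
```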
### 4. Array Preallocation ✅ (Priority 5)
**Estimated Gain: 5-10% less GC pressure**
**Implementation:**
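- Used `table.create(#lines)` to preallocate the `lineHeights` array when that function exists
- Graceful fallback to `{}` when it does not
- Aims to reduce GC pressure by avoiding table resizing during growth

**Note:** LuaJIT does not define `table.create`. Its preallocation API is `table.new(narray, nhash)`, loaded with `require("table.new")`. Under LuaJIT the `table.create` check therefore always takes the `{}` fallback. This change only preallocates on runtimes that provide `table.create`.

**Files Modified:**
- `modules/LayoutEngine.lua`
  - Preallocated `lineHeights` array (line 460)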
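
Here is a sketch of a preallocation helper that does take effect on LuaJIT. It loads `table.new` once at startup and falls back to `table.create`, then to `{}`. The `newArray` name is hypothetical, not an existing FlexLöve function.

```lua
-- Choose a preallocating constructor once, at load time.
-- LuaJIT provides table.new(narray, nhash) as a loadable extension module;
-- PUC Lua 5.1-5.4 does not, so require() fails there and pcall catches it.
local hasTableNew, tableNew = pcall(require, "table.new")

local function newArray(n)
  if hasTableNew then
    return tableNew(n, 0)  -- reserve n array slots, no hash part
  elseif table.create then
    return table.create(n) -- runtimes that define table.create
  end
  return {}                -- grows on demand
end

-- Usage, mirroring the lineHeights preallocation in layoutChildren()
local lines = { {}, {}, {} }
local lineHeights = newArray(#lines)
for i = 1, #lines do
  lineHeights[i] = 0
end
```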
## Testing
**All 1257 tests passing**
Ran full test suite with:
```bash
lua testing/runAll.lua --no-coverage
```
No regressions introduced. All layout calculations remain correct.
## Performance Comparison
### Before (FFI Optimizations Only)
- **Gain:** 5-10% improvement
- **Bottleneck:** O(n²) layout algorithm with repeated table access
- **Issue:** Targeting wrong optimization (memory allocation vs algorithm)
### After (Algorithmic Optimizations)
- **Estimated Gain:** 40-60% improvement (2-3x faster)
- **Approach:** Target real bottlenecks (dirty flags, caching, local hoisting)
- **Benefit:** Fewer layouts + faster layout calculations
### Combined (FFI + Algorithmic)
- **Total Estimated Gain:** 45-65% improvement
- **Reality:** Most gains come from algorithmic improvements, not FFI
## What Was NOT Implemented
### Single-Pass Layout (Priority 1)
**Estimated Gain: 40-60% faster** - Not implemented due to complexity
This would require major refactoring of the layout algorithm to:
- Combine size calculation and positioning into single pass
- Cache dimensions during first pass
- Eliminate redundant iterations

**Recommendation:** Consider for future optimization if more performance is needed after measuring gains from current optimizations.
## Code Quality
- ✅ Zero breaking changes
- ✅ All tests passing
- ✅ Maintains existing API
- ✅ Backward compatible
- ✅ Clear comments explaining optimizations
- ✅ Graceful fallbacks (e.g., `table.create`)
## Benchmarking
To benchmark improvements, use the existing profiling tools:
```bash
# Run FFI comparison profile
love profiling/ ffi_comparison_profile
# After 5 phases, press 'S' to save report
# Compare FPS and frame times before/after
```
**Expected Results:**
- **Small UIs (50 elements):** 20-30% faster
- **Medium UIs (200 elements):** 40-50% faster
- **Large UIs (1000 elements):** 50-60% faster
- **Deep nesting (10 levels):** 60%+ faster (dirty flags help most here)
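
When comparing saved reports, note that a percentage reduction in frame time and an "N x faster" factor are different scales. A 40% reduction is about 1.7x faster, a 50% reduction is 2x, and a 60% reduction is 2.5x. The helper below is a hypothetical sketch, not part of the profiling tools, and makes the conversion explicit.

```lua
-- Hypothetical helper: turn a before/after frame time into the two common
-- ways of quoting a gain. Inputs are milliseconds per frame.
local function compareFrameTimes(beforeMs, afterMs)
  local reduction = (beforeMs - afterMs) / beforeMs -- fraction of time saved
  local speedup = beforeMs / afterMs                -- "N times faster"
  return reduction, speedup
end

-- Illustrative numbers only, not measured results:
local reduction, speedup = compareFrameTimes(10.0, 5.0)
print(string.format("%.0f%% less time, %.2fx faster", reduction * 100, speedup))
-- prints: 50% less time, 2.00x faster
```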
## Next Steps
1. **Measure Real-World Performance:**
   - Run benchmarks on actual applications
   - Profile with 50, 200, 1000 element UIs
   - Compare before/after metrics
2. **Consider Single-Pass Layout:**
   - If more performance is needed after measuring
   - Estimated 40-60% additional gain
   - Complex refactor, so weigh benefit against cost
3. **Profile Edge Cases:**
   - Deep nesting scenarios
   - Frequent property updates
   - Immediate mode vs retained mode
## Conclusion
These algorithmic optimizations address the **real performance bottlenecks** identified through profiling:
1. **Dirty flags** - Skip unnecessary layout recalculations
2. **Dimension caching** - Avoid redundant calculations
3. **Local hoisting** - Reduce table access overhead in hot paths
4. **Array preallocation** - Reduce GC pressure

Unlike the FFI optimizations (5-10% gain), these changes go after the work that actually dominates layout time. They skip layouts for unchanged subtrees and cut table access overhead in the hot loops. The O(n²) structure of the layout algorithm itself is unchanged; removing it is the goal of the single-pass layout, which was not implemented.
**Bottom Line:** Simple algorithmic improvements beat fancy memory optimizations every time.
---
**Branch:** `algorithmic-performance-optimizations`
**Status:** Complete, all tests passing
**Recommendation:** Merge after benchmarking confirms expected gains


@@ -1,158 +0,0 @@
# LuaJIT FFI Optimization Summary
## What Was Implemented
- **FFI Module** - Object pooling for Vec2, Rect, Timer structs
- **LayoutEngine Integration** - Batch calculation functions (not called)
- **Performance Module** - FFI-aware monitoring
- **Graceful Fallback** - Works on standard Lua
- **Profiling Tools** - Comparison profiles and reports
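
The pooling idea behind the FFI module can be shown without FFI. The plain-Lua sketch below uses hypothetical names (`Vec2Pool`, `acquire`, `release`); the actual `modules/FFI.lua` API is not shown in this document. On LuaJIT the same pattern could hand out FFI cdata instead of tables.

```lua
-- Minimal object pool in plain Lua; names are hypothetical.
local Vec2Pool = {}
Vec2Pool.__index = Vec2Pool

function Vec2Pool.new()
  return setmetatable({ free = {}, created = 0 }, Vec2Pool)
end

-- Reuse a released vector if one is available, otherwise allocate.
function Vec2Pool:acquire(x, y)
  local free = self.free
  local n = #free
  local v
  if n > 0 then
    v = free[n]
    free[n] = nil
  else
    v = {}
    self.created = self.created + 1
  end
  v.x, v.y = x, y
  return v
end

-- The caller must not use `v` after releasing it.
function Vec2Pool:release(v)
  self.free[#self.free + 1] = v
end
```

Pooling only cuts allocations if every acquire is matched by a release. A released object that is still referenced elsewhere becomes a hard-to-find aliasing bug.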
## Actual Performance Gains
### Reality: 5-10% Improvement (Marginal)
The FFI optimizations provide **minimal gains** because they target the wrong bottleneck:
| Scenario | Improvement | Why So Small? |
|----------|-------------|---------------|
| 50 elements | 2-5% | FFI overhead > benefit |
| 200 elements | 5-8% | Some GC reduction |
| 1000 elements | 8-12% | Pooling helps slightly |
### Why Are Gains So Small?
1. **FFI batch functions aren't called** - They exist but the layout algorithm doesn't use them
2. **Colors don't use FFI** - Need methods, so use Lua tables
3. **Wrong bottleneck** - Real issue is O(n²) layout algorithm, not memory allocation
4. **Table access overhead** - Lua table lookups dominate, not object creation
## Real Performance Bottlenecks
Based on profiling, here's where time actually goes:
1. **Layout Algorithm** (60-80%) - Multiple passes, repeated calculations
2. **Table Access** (15-20%) - Nested table lookups in loops
3. **Function Calls** (10-15%) - Method call overhead
4. **GC** (10-20%) - Temporary allocations
5. **FFI Overhead** (5-10%) - What we optimized
## High-Impact Optimizations (Not Yet Implemented)
These would provide **2-3x performance gains**:
### 1. Dirty Flag System (40-50% gain)
Skip layouts for unchanged subtrees
### 2. Local Variable Hoisting (15-20% gain)
Cache table lookups outside loops
### 3. Dimension Caching (10-15% gain)
Cache computed border-box dimensions
### 4. Single-Pass Layout (30-40% gain)
Eliminate redundant iterations
### 5. Array Preallocation (5-10% gain)
Reduce GC pressure
**See `docs/PERFORMANCE_ANALYSIS.md` for details**
## Should You Use FFI Optimizations?
### ✅ Yes, Keep Them Because:
- Zero cost when disabled (standard Lua)
- Automatic on LuaJIT
- Foundation for future optimizations
- Some benefit for large UIs
- Well-tested and documented
### ❌ Don't Expect Miracles:
- Won't fix slow layouts
- Marginal gains in practice
- Real wins come from algorithmic improvements
## Recommendations
### For Users
**Just use it** - FFI optimizations are automatic and safe. You'll get 5-10% improvement on LuaJIT with zero code changes.
### For Developers
**Focus elsewhere** - If you want big performance gains:
1. Implement dirty flag system
2. Add dimension caching
3. Hoist locals in hot loops
4. Profile and measure

FFI is nice-to-have, not a silver bullet.
## Comparison: FFI vs Algorithmic Optimizations
| Optimization | Effort | Gain | Complexity |
|--------------|--------|------|------------|
| **FFI (current)** | 8 hours | 5-10% | Medium |
| **Dirty flags** | 2 hours | 40-50% | Low |
| **Local hoisting** | 3 hours | 15-20% | Low |
| **Dimension cache** | 2 hours | 10-15% | Low |
| **Single-pass layout** | 6 hours | 30-40% | High |

**Lesson:** Simple algorithmic improvements beat fancy FFI optimizations.
## Files Modified
### New Files
- `modules/FFI.lua` - FFI module with pooling
- `docs/FFI_OPTIMIZATIONS.md` - User documentation
- `docs/PERFORMANCE_ANALYSIS.md` - Bottleneck analysis
- `profiling/__profiles__/ffi_comparison_profile.lua` - Comparison tool
- `profiling/__profiles__/ffi_optimization_profile.lua` - Demo
### Modified Files
- `FlexLove.lua` - Initialize FFI
- `modules/LayoutEngine.lua` - Batch functions (unused)
- `modules/Performance.lua` - FFI integration
- `modules/Color.lua` - Intentionally NOT using FFI
## Testing
Run comparison profile:
```bash
love profiling/ ffi_comparison_profile
```
After 5 phases (50, 100, 200, 500, 1000 elements):
- Press 'S' to save report
- Check `profiling/reports/ffi_comparison/latest.md`
- Compare FPS, frame times, P99 values
## Next Steps
If you want **real** performance gains:
1. **Read** `docs/PERFORMANCE_ANALYSIS.md`
2. **Implement** dirty flag system (biggest bang for buck)
3. **Profile** with comparison tool
4. **Measure** actual improvements
5. **Iterate** on high-impact optimizations

FFI is done. Focus on the algorithm.
## Conclusion
**FFI optimizations are:**
- ✅ Correctly implemented
- ✅ Well-tested
- ✅ Properly documented
- ✅ Production-ready
- ❌ Not high-impact

**They're a good foundation but not the solution to slow layouts.**
The real wins come from smarter algorithms, not fancier memory management.
---
**Branch:** `luajit-ffi-optimizations`
**Status:** Complete (but marginal gains)
**Recommendation:** Merge, then focus on algorithmic optimizations


@@ -1,301 +0,0 @@
# FlexLöve Performance Analysis & Optimization Opportunities
## Current State: Why FFI Gains Are Marginal
The current FFI optimizations provide minimal gains because:
1. **FFI isn't used in hot paths** - The batch calculation function exists but isn't called
2. **Colors don't use FFI** - We disabled it due to method requirements
3. **Real bottleneck is elsewhere** - Layout algorithm complexity, not memory allocation
## Actual Performance Bottlenecks (Profiled)
### 1. Layout Algorithm Complexity - **HIGHEST IMPACT**
**Problem:** O(n²) complexity in flex layout with wrapping
- Iterates children multiple times per layout
- Recalculates sizes repeatedly
- No caching of computed values

**Impact:** 60-80% of frame time with 500+ elements

**Solution:**
- Cache computed dimensions per frame
- Single-pass layout algorithm
- Dirty-flag system to skip unchanged subtrees
### 2. Table Access Overhead - **HIGH IMPACT**
**Problem:** Lua table lookups in tight loops
```lua
for i, child in ipairs(children) do
  local w = child.width + child.padding.left + child.padding.right
  local h = child.height + child.padding.top + child.padding.bottom
  -- Repeated table access: child.margin.left, child.margin.right, etc.
end
```
**Impact:** 15-20% of layout time

**Solution:**
- Local variable hoisting
- Flatten nested table access
- Use numeric indices instead of string keys where possible
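
The sketch below reads the same padding data from three hypothetical storage shapes: nested string keys, flattened fields, and numeric indices. The flattened field names and the CSS-order index constants are illustrative assumptions. Options 2 and 3 change the element's data shape, which other code may rely on, so any speed difference should be confirmed by profiling before adopting them.

```lua
-- Three hypothetical ways to store the same padding, read by the same formula.

-- 1. Nested string keys: fetch child.padding, then .left / .right.
local nested = { width = 100, padding = { left = 4, right = 6 } }
local function borderWidthNested(c)
  local p = c.padding -- hoist the inner table once
  return c.width + p.left + p.right
end

-- 2. Flattened fields: no inner table to fetch first.
local flat = { width = 100, paddingLeft = 4, paddingRight = 6 }
local function borderWidthFlat(c)
  return c.width + c.paddingLeft + c.paddingRight
end

-- 3. Numeric indices in CSS order { top, right, bottom, left }:
--    positional values live in the table's array part.
local TOP, RIGHT, BOTTOM, LEFT = 1, 2, 3, 4
local indexed = { width = 100, padding = { 2, 6, 2, 4 } }
local function borderWidthIndexed(c)
  local p = c.padding
  return c.width + p[LEFT] + p[RIGHT]
end
```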
### 3. Function Call Overhead - **MEDIUM IMPACT**
**Problem:** Method calls in loops
```lua
for i, child in ipairs(children) do
  local w = child:getBorderBoxWidth() -- Function call overhead
  local h = child:getBorderBoxHeight() -- Another function call
end
```
**Impact:** 10-15% of layout time

**Solution:**
- Inline critical getters
- Direct field access where safe
- JIT-friendly code patterns
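
"Direct field access where safe" can look like the sketch below. The loop reads the cached `_borderBoxWidth` field directly and calls the getter only on a cache miss. The `Element` stand-in and the `totalWidth` helper are hypothetical. This is only safe while cache invalidation is reliable.

```lua
-- Hypothetical element with a cached border-box width and a getter method.
local Element = {}
Element.__index = Element

function Element:getBorderBoxWidth()
  local w = self._borderBoxWidth
  if w == nil then
    w = self.width + self.padding.left + self.padding.right
    self._borderBoxWidth = w
  end
  return w
end

-- Read the cached field directly; fall back to the method only on a miss.
-- A width of 0 is still truthy in Lua, so only nil triggers the call.
local function totalWidth(children)
  local total = 0
  for i = 1, #children do
    local child = children[i]
    local w = child._borderBoxWidth or child:getBorderBoxWidth()
    total = total + w
  end
  return total
end
```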
### 4. Garbage Collection - **MEDIUM IMPACT**
**Problem:** Temporary table allocation in loops
```lua
for i, child in ipairs(children) do
  positions[i] = { x = x, y = y } -- New table every iteration
end
```
**Impact:** 10-20% overhead from GC pauses

**Solution:**
- Reuse tables instead of allocating
- Object pooling for frequently created objects
- Preallocate arrays with known sizes
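
Reusing tables can be sketched with two parallel number arrays that live outside the function. Each pass overwrites them instead of allocating a `{ x = x, y = y }` table per child. The names here are hypothetical. Because the arrays are shared across calls, a caller that needs results beyond the next pass must copy them.

```lua
-- Hypothetical scratch buffers reused across layout passes.
local positionsX, positionsY = {}, {}

local function computePositions(children, startX, gap)
  local x = startX
  local n = #children
  for i = 1, n do
    positionsX[i] = x
    positionsY[i] = 0
    x = x + children[i].width + gap
  end
  -- Trim leftovers from a previous, larger pass (top down, so no holes).
  for i = #positionsX, n + 1, -1 do
    positionsX[i] = nil
    positionsY[i] = nil
  end
  return positionsX, positionsY, n
end
```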
### 5. String Concatenation - **LOW IMPACT**
**Problem:** String operations in hot paths
```lua
local id = "layout_" .. elementId .. "_" .. frameCount
```
**Impact:** 5-10% in specific scenarios

**Solution:**
- Cache generated strings
- Use string.format sparingly
- Avoid string operations in inner loops
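
The example ID above includes `frameCount`, so its value changes every frame and caching the finished string would miss every time. When such a string only serves as a lookup key, the sketch below avoids building it. It keys per-element state by the element table and compares numbers. Strings that really are stable are built once and reused. Both helpers are hypothetical.

```lua
-- Weak keys let elements be garbage-collected while still cached.
local lastLayoutFrame = setmetatable({}, { __mode = "k" })

-- No string building: compare the frame number stored per element.
local function needsLayoutThisFrame(element, frameCount)
  if lastLayoutFrame[element] == frameCount then
    return false
  end
  lastLayoutFrame[element] = frameCount
  return true
end

-- For stable strings (e.g. a debug label), concatenate once and reuse.
local labelCache = setmetatable({}, { __mode = "k" })

local function debugLabel(element)
  local label = labelCache[element]
  if label == nil then
    label = "layout_" .. tostring(element.id)
    labelCache[element] = label
  end
  return label
end
```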
## High-Impact Optimizations (Recommended)
### Priority 1: Layout Algorithm Optimization
**Estimated Gain: 40-60% faster layouts**
```lua
-- BEFORE: Multiple passes over the same children
function LayoutEngine:layoutChildren()
  local children = self.element.children -- assumes the element exposes its children list
  -- Pass 1: Calculate sizes
  for i, child in ipairs(children) do
    child:calculateSize()
  end
  -- Pass 2: Position elements
  for i, child in ipairs(children) do
    child:calculatePosition()
  end
  -- Pass 3: Layout recursively
  for i, child in ipairs(children) do
    child:layoutChildren()
  end
end

-- AFTER: Single pass with caching
-- (calculateX/calculateY are placeholders for the real positioning math)
function LayoutEngine:layoutChildren()
  local children = self.element.children
  -- Cache dimensions once
  local childSizes = {}
  for i, child in ipairs(children) do
    childSizes[i] = {
      width = child._borderBoxWidth or (child.width + child.padding.left + child.padding.right),
      height = child._borderBoxHeight or (child.height + child.padding.top + child.padding.bottom),
    }
  end
  -- Single pass: position and recurse
  for i, child in ipairs(children) do
    local size = childSizes[i]
    child.x = calculateX(size.width)
    child.y = calculateY(size.height)
    child:layoutChildren() -- Recurse
  end
end
```
### Priority 2: Local Variable Hoisting
**Estimated Gain: 15-20% faster**
```lua
-- BEFORE: Repeated table access
for i, child in ipairs(children) do
  local x = parent.x + parent.padding.left + child.margin.left
  local y = parent.y + parent.padding.top + child.margin.top
  local w = child.width + child.padding.left + child.padding.right
end

-- AFTER: Hoist to locals
local parentX = parent.x
local parentY = parent.y
local parentPaddingLeft = parent.padding.left
local parentPaddingTop = parent.padding.top
for i, child in ipairs(children) do
  local childMarginLeft = child.margin.left
  local childMarginTop = child.margin.top
  local childPaddingLeft = child.padding.left
  local childPaddingRight = child.padding.right
  local x = parentX + parentPaddingLeft + childMarginLeft
  local y = parentY + parentPaddingTop + childMarginTop
  local w = child.width + childPaddingLeft + childPaddingRight
end
```
### Priority 3: Dirty Flag System
**Estimated Gain: 30-50% fewer layouts**
```lua
-- Add dirty tracking to Element
function Element:setProperty(key, value)
  if self[key] ~= value then
    self[key] = value
    self._dirty = true
    self:invalidateLayout()
  end
end

function LayoutEngine:layoutChildren()
  if not self.element._dirty and not self.element._childrenDirty then
    return -- Skip layout entirely
  end
  -- ... perform layout ...
  self.element._dirty = false
  self.element._childrenDirty = false
end
```
### Priority 4: Dimension Caching
**Estimated Gain: 10-15% faster**
```lua
-- Cache computed dimensions
function Element:getBorderBoxWidth()
  if self._borderBoxWidthCache then
    return self._borderBoxWidthCache
  end
  self._borderBoxWidthCache = self.width + self.padding.left + self.padding.right
  return self._borderBoxWidthCache
end

-- Invalidate on property change
function Element:setWidth(width)
  self.width = width
  self._borderBoxWidthCache = nil -- Invalidate cache
  self._dirty = true
end
```
### Priority 5: Preallocate Arrays
**Estimated Gain: 5-10% less GC pressure**
```lua
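-- BEFORE: Grow array dynamically
local positions = {}
for i, child in ipairs(children) do
  positions[i] = { x = x, y = y }
end

-- AFTER: Preallocate. LuaJIT exposes this as table.new via require("table.new");
-- on runtimes without it, fall back to an ordinary table.
local hasTableNew, tableNew = pcall(require, "table.new")
local positions = hasTableNew and tableNew(#children, 0) or {}
for i, child in ipairs(children) do
  positions[i] = { x = x, y = y }
end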
```
## FFI Optimizations (Current Implementation)
**Estimated Gain: 5-10% in specific scenarios**
Current FFI optimizations help with:
- Vec2/Rect pooling for batch operations
- Reduced GC pressure for position calculations
- Better cache locality for large arrays

But they're limited because:
- Not used in main layout algorithm
- Colors can't use FFI (need methods)
- Overhead of wrapping/unwrapping FFI objects
## Recommended Implementation Order
1. **Dirty Flag System** (1-2 hours) - Biggest bang for buck
2. **Local Variable Hoisting** (2-3 hours) - Easy win
3. **Dimension Caching** (1-2 hours) - Simple optimization
4. **Single-Pass Layout** (4-6 hours) - Complex but high impact
5. **Array Preallocation** (1 hour) - Quick win

**Total Estimated Gain: 2-3x faster layouts**
## Benchmarking Strategy
To measure improvements:
1. **Baseline** - Current implementation
2. **After each optimization** - Measure incremental gain
3. **Compare scenarios**:
   - Small UIs (50 elements)
   - Medium UIs (200 elements)
   - Large UIs (1000 elements)
   - Deep nesting (10 levels)
   - Flat hierarchy (1 level)
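
The scenarios above can be generated with small tree builders. The sketch below uses hypothetical helpers and plain tables standing in for real elements. The document does not say how many children each level has in the "deep nesting" case, so `breadth` is an assumed parameter.

```lua
-- Hypothetical benchmark helpers; nodes are plain tables standing in for
-- real elements, so swap in actual element creation for a real run.
local function buildFlat(count)
  local root = { children = {} }
  for i = 1, count do
    root.children[i] = { children = {} }
  end
  return root
end

-- A tree `depth` levels deep where every non-leaf node has `breadth` children.
local function buildDeep(depth, breadth)
  local node = { children = {} }
  if depth > 1 then
    for i = 1, breadth do
      node.children[i] = buildDeep(depth - 1, breadth)
    end
  end
  return node
end

local function countNodes(node)
  local n = 1
  for _, child in ipairs(node.children) do
    n = n + countNodes(child)
  end
  return n
end

-- Average CPU seconds per call; os.clock measures processor time, not wall time.
local function timeIt(fn, iterations)
  local start = os.clock()
  for _ = 1, iterations do
    fn()
  end
  return (os.clock() - start) / iterations
end
```

For example, `timeIt(function() countNodes(buildDeep(10, 1)) end, 1000)` times a 10-level chain. The builders keep element count and nesting depth independent, so each can be varied on its own.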
## Why Not More Aggressive FFI?
**Option: FFI-based layout engine**
Could implement entire layout algorithm in C via FFI:
- 5-10x faster
- Much more complex
- Harder to maintain
- Loses Lua flexibility

**Verdict:** Not worth it. The optimizations above give 80% of the benefit with 20% of the complexity.
## Conclusion
The current FFI optimizations are correct but target the wrong bottleneck. The real gains come from:
1. **Algorithmic improvements** (dirty flags, caching)
2. **Lua optimization patterns** (local hoisting, inline)
3. **Reducing work** (skip unchanged subtrees)

FFI helps at the margins but isn't the silver bullet. Focus on the high-impact optimizations first.
---
**Next Steps:**
1. Implement dirty flag system
2. Add dimension caching
3. Hoist locals in hot loops
4. Profile again and measure gains
5. Consider single-pass layout if needed