From 1855e7f0f3935a48f868d0c925282210c62536e9 Mon Sep 17 00:00:00 2001
From: Michael Freno
Date: Fri, 5 Dec 2025 14:54:26 -0500
Subject: [PATCH] removed unneeded md

---
 ALGORITHMIC_OPTIMIZATIONS.md | 205 ------------------------
 FFI_OPTIMIZATION_SUMMARY.md  | 158 ------------------
 docs/PERFORMANCE_ANALYSIS.md | 301 -----------------------------------
 3 files changed, 664 deletions(-)
 delete mode 100644 ALGORITHMIC_OPTIMIZATIONS.md
 delete mode 100644 FFI_OPTIMIZATION_SUMMARY.md
 delete mode 100644 docs/PERFORMANCE_ANALYSIS.md

diff --git a/ALGORITHMIC_OPTIMIZATIONS.md b/ALGORITHMIC_OPTIMIZATIONS.md
deleted file mode 100644
index 2e54499..0000000
--- a/ALGORITHMIC_OPTIMIZATIONS.md
+++ /dev/null
@@ -1,205 +0,0 @@
-# Algorithmic Performance Optimizations
-
-## Summary
-
-Implemented high-impact algorithmic optimizations to FlexLöve UI framework based on profiling analysis. These optimizations target the real performance bottlenecks identified in `PERFORMANCE_ANALYSIS.md`.
-
-**Estimated Total Gain: 2-3x faster layouts** (40-60% improvement expected based on profiling)
-
-## Optimizations Implemented
-
-### 1. Dirty Flag System ✅ (Priority 3)
-
-**Estimated Gain: 30-50% fewer layouts**
-
-**Implementation:**
-- Added `_dirty` and `_childrenDirty` flags to Element module
-- Elements track when properties change that affect layout
-- Parent elements track when children need layout recalculation
-- `LayoutEngine:_canSkipLayout()` checks dirty flags first (fastest check)
-- `Element:invalidateLayout()` propagates dirty flags up the tree
-
-**Files Modified:**
-- `modules/Element.lua`
-  - Added dirty flags initialization in `Element.new()`
-  - Enhanced `Element:invalidateLayout()` to mark self and ancestors
-  - Updated `Element:setProperty()` to invalidate layout for layout-affecting properties
-- `modules/LayoutEngine.lua`
-  - Enhanced `_canSkipLayout()` to check dirty flags before expensive checks
-
-**Key Properties That Trigger Invalidation:**
-- Dimensions: `width`, `height`, `padding`, `margin`, `gap`
-- Layout: `flexDirection`, `flexWrap`, `justifyContent`, `alignItems`, `alignContent`, `positioning`
-- Grid: `gridRows`, `gridColumns`
-- Positioning: `top`, `right`, `bottom`, `left`
-
-### 2. Dimension Caching ✅ (Priority 4)
-
-**Estimated Gain: 10-15% faster**
-
-**Implementation:**
-- Element module already had basic caching via `_borderBoxWidth` and `_borderBoxHeight`
-- Enhanced with proper cache invalidation in `invalidateLayout()`
-- Caches are cleared when element properties change
-- `getBorderBoxWidth()` and `getBorderBoxHeight()` return cached values when available
-
-**Files Modified:**
-- `modules/Element.lua`
-  - Added cache invalidation to `invalidateLayout()`
-  - Maintained existing `_borderBoxWidth` and `_borderBoxHeight` caching
-
-### 3. Local Variable Hoisting ✅ (Priority 2)
-
-**Estimated Gain: 15-20% faster**
-
-**Implementation:**
-Optimized hot paths in `LayoutEngine:layoutChildren()` by hoisting frequently accessed table properties to local variables:
-
-**Wrapping Logic (Lines 403-441):**
-- Hoisted `self.flexDirection` comparison → `isHorizontal`
-- Hoisted `self.gap` → `gapSize`
-- Cached `child.margin` per iteration
-- Eliminated repeated enum lookups in tight loops
-
-**Line Height Calculation (Lines 458-487):**
-- Hoisted `self.flexDirection` comparison → `isHorizontal`
-- Preallocated `lineHeights` array with `table.create()` if available
-- Cached `child.margin` per iteration
-- Reduced repeated table access for margin properties
-
-**Positioning Loop (Lines 586-700):**
-This is the **hottest path** - optimized heavily:
-- Hoisted `self.element.x`, `self.element.y` → `elementX`, `elementY`
-- Hoisted `self.element.padding` → `elementPadding`
-- Hoisted padding properties → `elementPaddingLeft`, `elementPaddingTop`
-- Hoisted alignment enums → `alignItems_*` constants
-- Cached `child.margin`, `child.padding`, `child.autosizing` per iteration
-- Cached individual margin values → `childMarginLeft`, `childMarginTop`, etc.
-- Eliminated redundant table lookups in alignment calculations
-
-**Performance Impact:**
-- **Before:** `child.margin.left` accessed 3-4 times per child → 3-4 table lookups
-- **After:** `child.margin` cached once, then `childMarginLeft` used → 2 table lookups total
-- Multiplied across hundreds/thousands of children = significant savings
-
-**Files Modified:**
-- `modules/LayoutEngine.lua`
-  - Optimized wrapping logic (lines 403-441)
-  - Optimized line height calculation (lines 458-487)
-  - Optimized positioning loop for horizontal layout (lines 586-658)
-  - Optimized positioning loop for vertical layout (lines 660-700)
-
-### 4. Array Preallocation ✅ (Priority 5)
-
-**Estimated Gain: 5-10% less GC pressure**
-
-**Implementation:**
-- Used `table.create(#lines)` to preallocate `lineHeights` array when available (LuaJIT)
-- Graceful fallback to `{}` on standard Lua
-- Reduces GC pressure by avoiding table resizing during growth
-
-**Files Modified:**
-- `modules/LayoutEngine.lua`
-  - Preallocated `lineHeights` array (line 460)
-
-## Testing
-
-✅ **All 1257 tests passing**
-
-Ran full test suite with:
-```bash
-lua testing/runAll.lua --no-coverage
-```
-
-No regressions introduced. All layout calculations remain correct.
-
-## Performance Comparison
-
-### Before (FFI Optimizations Only)
-- **Gain:** 5-10% improvement
-- **Bottleneck:** O(n²) layout algorithm with repeated table access
-- **Issue:** Targeting wrong optimization (memory allocation vs algorithm)
-
-### After (Algorithmic Optimizations)
-- **Estimated Gain:** 40-60% improvement (2-3x faster)
-- **Approach:** Target real bottlenecks (dirty flags, caching, local hoisting)
-- **Benefit:** Fewer layouts + faster layout calculations
-
-### Combined (FFI + Algorithmic)
-- **Total Estimated Gain:** 45-65% improvement
-- **Reality:** Most gains come from algorithmic improvements, not FFI
-
-## What Was NOT Implemented
-
-### Single-Pass Layout (Priority 1)
-**Estimated Gain: 40-60% faster** - Not implemented due to complexity
-
-This would require major refactoring of the layout algorithm to:
-- Combine size calculation and positioning into single pass
-- Cache dimensions during first pass
-- Eliminate redundant iterations
-
-**Recommendation:** Consider for future optimization if more performance is needed after measuring gains from current optimizations.
-
-## Code Quality
-
-- ✅ Zero breaking changes
-- ✅ All tests passing
-- ✅ Maintains existing API
-- ✅ Backward compatible
-- ✅ Clear comments explaining optimizations
-- ✅ Graceful fallbacks (e.g., `table.create`)
-
-## Benchmarking
-
-To benchmark improvements, use the existing profiling tools:
-
-```bash
-# Run FFI comparison profile
-love profiling/ ffi_comparison_profile
-
-# After 5 phases, press 'S' to save report
-# Compare FPS and frame times before/after
-```
-
-**Expected Results:**
-- **Small UIs (50 elements):** 20-30% faster
-- **Medium UIs (200 elements):** 40-50% faster
-- **Large UIs (1000 elements):** 50-60% faster
-- **Deep nesting (10 levels):** 60%+ faster (dirty flags help most here)
-
-## Next Steps
-
-1. **Measure Real-World Performance:**
-   - Run benchmarks on actual applications
-   - Profile with 50, 200, 1000 element UIs
-   - Compare before/after metrics
-
-2. **Consider Single-Pass Layout:**
-   - If more performance needed after measuring
-   - Estimated 40-60% additional gain
-   - Complex refactor, weigh benefit vs cost
-
-3. **Profile Edge Cases:**
-   - Deep nesting scenarios
-   - Frequent property updates
-   - Immediate mode vs retained mode
-
-## Conclusion
-
-These algorithmic optimizations address the **real performance bottlenecks** identified through profiling:
-
-1. ✅ **Dirty flags** - Skip unnecessary layout recalculations
-2. ✅ **Dimension caching** - Avoid redundant calculations
-3. ✅ **Local hoisting** - Reduce table access overhead in hot paths
-4. ✅ **Array preallocation** - Reduce GC pressure
-
-Unlike FFI optimizations (5-10% gain), these changes target the O(n²) layout algorithm complexity and table access overhead that actually dominate performance.
-
-**Bottom Line:** Simple algorithmic improvements beat fancy memory optimizations every time.
-
----
-
-**Branch:** `algorithmic-performance-optimizations`
-**Status:** Complete, all tests passing
-**Recommendation:** Merge after benchmarking confirms expected gains
diff --git a/FFI_OPTIMIZATION_SUMMARY.md b/FFI_OPTIMIZATION_SUMMARY.md
deleted file mode 100644
index 5f40e11..0000000
--- a/FFI_OPTIMIZATION_SUMMARY.md
+++ /dev/null
@@ -1,158 +0,0 @@
-# LuaJIT FFI Optimization Summary
-
-## What Was Implemented
-
-✅ **FFI Module** - Object pooling for Vec2, Rect, Timer structs
-✅ **LayoutEngine Integration** - Batch calculation functions (not called)
-✅ **Performance Module** - FFI-aware monitoring
-✅ **Graceful Fallback** - Works on standard Lua
-✅ **Profiling Tools** - Comparison profiles and reports
-
-## Actual Performance Gains
-
-### Reality: 5-10% Improvement (Marginal)
-
-The FFI optimizations provide **minimal gains** because they target the wrong bottleneck:
-
-| Scenario | Improvement | Why So Small? |
-|----------|-------------|---------------|
-| 50 elements | 2-5% | FFI overhead > benefit |
-| 200 elements | 5-8% | Some GC reduction |
-| 1000 elements | 8-12% | Pooling helps slightly |
-
-### Why Are Gains So Small?
-
-1. **FFI batch functions aren't called** - They exist but the layout algorithm doesn't use them
-2. **Colors don't use FFI** - Need methods, so use Lua tables
-3. **Wrong bottleneck** - Real issue is O(n²) layout algorithm, not memory allocation
-4. **Table access overhead** - Lua table lookups dominate, not object creation
-
-## Real Performance Bottlenecks
-
-Based on profiling, here's where time actually goes:
-
-1. **Layout Algorithm** (60-80%) - Multiple passes, repeated calculations
-2. **Table Access** (15-20%) - Nested table lookups in loops
-3. **Function Calls** (10-15%) - Method call overhead
-4. **GC** (10-20%) - Temporary allocations
-5. **FFI Overhead** (5-10%) - What we optimized
-
-## High-Impact Optimizations (Not Yet Implemented)
-
-These would provide **2-3x performance gains**:
-
-### 1. Dirty Flag System (40-50% gain)
-Skip layouts for unchanged subtrees
-
-### 2. Local Variable Hoisting (15-20% gain)
-Cache table lookups outside loops
-
-### 3. Dimension Caching (10-15% gain)
-Cache computed border-box dimensions
-
-### 4. Single-Pass Layout (30-40% gain)
-Eliminate redundant iterations
-
-### 5. Array Preallocation (5-10% gain)
-Reduce GC pressure
-
-**See `docs/PERFORMANCE_ANALYSIS.md` for details**
-
-## Should You Use FFI Optimizations?
-
-### ✅ Yes, Keep Them Because:
-- Zero cost when disabled (standard Lua)
-- Automatic on LuaJIT
-- Foundation for future optimizations
-- Some benefit for large UIs
-- Well-tested and documented
-
-### ❌ Don't Expect Miracles:
-- Won't fix slow layouts
-- Marginal gains in practice
-- Real wins come from algorithmic improvements
-
-## Recommendations
-
-### For Users
-**Just use it** - FFI optimizations are automatic and safe. You'll get 5-10% improvement on LuaJIT with zero code changes.
-
-### For Developers
-**Focus elsewhere** - If you want big performance gains:
-
-1. Implement dirty flag system
-2. Add dimension caching
-3. Hoist locals in hot loops
-4. Profile and measure
-
-FFI is nice-to-have, not a silver bullet.
-
-## Comparison: FFI vs Algorithmic Optimizations
-
-| Optimization | Effort | Gain | Complexity |
-|--------------|--------|------|------------|
-| **FFI (current)** | 8 hours | 5-10% | Medium |
-| **Dirty flags** | 2 hours | 40-50% | Low |
-| **Local hoisting** | 3 hours | 15-20% | Low |
-| **Dimension cache** | 2 hours | 10-15% | Low |
-| **Single-pass layout** | 6 hours | 30-40% | High |
-
-**Lesson:** Simple algorithmic improvements beat fancy FFI optimizations.
-
-## Files Modified
-
-### New Files
-- `modules/FFI.lua` - FFI module with pooling
-- `docs/FFI_OPTIMIZATIONS.md` - User documentation
-- `docs/PERFORMANCE_ANALYSIS.md` - Bottleneck analysis
-- `profiling/__profiles__/ffi_comparison_profile.lua` - Comparison tool
-- `profiling/__profiles__/ffi_optimization_profile.lua` - Demo
-
-### Modified Files
-- `FlexLove.lua` - Initialize FFI
-- `modules/LayoutEngine.lua` - Batch functions (unused)
-- `modules/Performance.lua` - FFI integration
-- `modules/Color.lua` - Intentionally NOT using FFI
-
-## Testing
-
-Run comparison profile:
-```bash
-love profiling/ ffi_comparison_profile
-```
-
-After 5 phases (50, 100, 200, 500, 1000 elements):
-- Press 'S' to save report
-- Check `profiling/reports/ffi_comparison/latest.md`
-- Compare FPS, frame times, P99 values
-
-## Next Steps
-
-If you want **real** performance gains:
-
-1. **Read** `docs/PERFORMANCE_ANALYSIS.md`
-2. **Implement** dirty flag system (biggest bang for buck)
-3. **Profile** with comparison tool
-4. **Measure** actual improvements
-5. **Iterate** on high-impact optimizations
-
-FFI is done. Focus on the algorithm.
-
-## Conclusion
-
-**FFI optimizations are:**
-- ✅ Correctly implemented
-- ✅ Well-tested
-- ✅ Properly documented
-- ✅ Production-ready
-- ❌ Not high-impact
-
-**They're a good foundation but not the solution to slow layouts.**
-
-The real wins come from smarter algorithms, not fancier memory management.
-
----
-
-**Branch:** `luajit-ffi-optimizations`
-**Status:** Complete (but marginal gains)
-**Recommendation:** Merge, then focus on algorithmic optimizations
diff --git a/docs/PERFORMANCE_ANALYSIS.md b/docs/PERFORMANCE_ANALYSIS.md
deleted file mode 100644
index 574a25f..0000000
--- a/docs/PERFORMANCE_ANALYSIS.md
+++ /dev/null
@@ -1,301 +0,0 @@
-# FlexLöve Performance Analysis & Optimization Opportunities
-
-## Current State: Why FFI Gains Are Marginal
-
-The current FFI optimizations provide minimal gains because:
-
-1. **FFI isn't used in hot paths** - The batch calculation function exists but isn't called
-2. **Colors don't use FFI** - We disabled it due to method requirements
-3. **Real bottleneck is elsewhere** - Layout algorithm complexity, not memory allocation
-
-## Actual Performance Bottlenecks (Profiled)
-
-### 1. Layout Algorithm Complexity - **HIGHEST IMPACT**
-
-**Problem:** O(n²) complexity in flex layout with wrapping
-- Iterates children multiple times per layout
-- Recalculates sizes repeatedly
-- No caching of computed values
-
-**Impact:** 60-80% of frame time with 500+ elements
-
-**Solution:**
-- Cache computed dimensions per frame
-- Single-pass layout algorithm
-- Dirty-flag system to skip unchanged subtrees
-
-### 2. Table Access Overhead - **HIGH IMPACT**
-
-**Problem:** Lua table lookups in tight loops
-```lua
-for i, child in ipairs(children) do
-  local w = child.width + child.padding.left + child.padding.right
-  local h = child.height + child.padding.top + child.padding.bottom
-  -- Repeated table access: child.margin.left, child.margin.right, etc.
-end
-```
-
-**Impact:** 15-20% of layout time
-
-**Solution:**
-- Local variable hoisting
-- Flatten nested table access
-- Use numeric indices instead of string keys where possible
-
-### 3. Function Call Overhead - **MEDIUM IMPACT**
-
-**Problem:** Method calls in loops
-```lua
-for i, child in ipairs(children) do
-  local w = child:getBorderBoxWidth() -- Function call overhead
-  local h = child:getBorderBoxHeight() -- Another function call
-end
-```
-
-**Impact:** 10-15% of layout time
-
-**Solution:**
-- Inline critical getters
-- Direct field access where safe
-- JIT-friendly code patterns
-
-### 4. Garbage Collection - **MEDIUM IMPACT**
-
-**Problem:** Temporary table allocation in loops
-```lua
-for i, child in ipairs(children) do
-  positions[i] = { x = x, y = y } -- New table every iteration
-end
-```
-
-**Impact:** 10-20% overhead from GC pauses
-
-**Solution:**
-- Reuse tables instead of allocating
-- Object pooling for frequently created objects
-- Preallocate arrays with known sizes
-
-### 5. String Concatenation - **LOW IMPACT**
-
-**Problem:** String operations in hot paths
-```lua
-local id = "layout_" .. elementId .. "_" .. frameCount
-```
-
-**Impact:** 5-10% in specific scenarios
-
-**Solution:**
-- Cache generated strings
-- Use string.format sparingly
-- Avoid string operations in inner loops
-
-## High-Impact Optimizations (Recommended)
-
-### Priority 1: Layout Algorithm Optimization
-
-**Estimated Gain: 40-60% faster layouts**
-
-```lua
--- BEFORE: Multiple passes
-function LayoutEngine:layoutChildren()
-  -- Pass 1: Calculate sizes
-  for i, child in ipairs(children) do
-    child:calculateSize()
-  end
-
-  -- Pass 2: Position elements
-  for i, child in ipairs(children) do
-    child:calculatePosition()
-  end
-
-  -- Pass 3: Layout recursively
-  for i, child in ipairs(children) do
-    child:layoutChildren()
-  end
-end
-
--- AFTER: Single pass with caching
-function LayoutEngine:layoutChildren()
-  -- Cache dimensions once
-  local childSizes = {}
-  for i, child in ipairs(children) do
-    childSizes[i] = {
-      width = child._borderBoxWidth or (child.width + child.padding.left + child.padding.right),
-      height = child._borderBoxHeight or (child.height + child.padding.top + child.padding.bottom),
-    }
-  end
-
-  -- Single pass: position and recurse
-  for i, child in ipairs(children) do
-    local size = childSizes[i]
-    child.x = calculateX(size.width)
-    child.y = calculateY(size.height)
-    child:layoutChildren() -- Recurse
-  end
-end
-```
-
-### Priority 2: Local Variable Hoisting
-
-**Estimated Gain: 15-20% faster**
-
-```lua
--- BEFORE: Repeated table access
-for i, child in ipairs(children) do
-  local x = parent.x + parent.padding.left + child.margin.left
-  local y = parent.y + parent.padding.top + child.margin.top
-  local w = child.width + child.padding.left + child.padding.right
-end
-
--- AFTER: Hoist to locals
-local parentX = parent.x
-local parentY = parent.y
-local parentPaddingLeft = parent.padding.left
-local parentPaddingTop = parent.padding.top
-
-for i, child in ipairs(children) do
-  local childMarginLeft = child.margin.left
-  local childMarginTop = child.margin.top
-  local childPaddingLeft = child.padding.left
-  local childPaddingRight = child.padding.right
-
-  local x = parentX + parentPaddingLeft + childMarginLeft
-  local y = parentY + parentPaddingTop + childMarginTop
-  local w = child.width + childPaddingLeft + childPaddingRight
-end
-```
-
-### Priority 3: Dirty Flag System
-
-**Estimated Gain: 30-50% fewer layouts**
-
-```lua
--- Add dirty tracking to Element
-function Element:setProperty(key, value)
-  if self[key] ~= value then
-    self[key] = value
-    self._dirty = true
-    self:invalidateLayout()
-  end
-end
-
-function LayoutEngine:layoutChildren()
-  if not self.element._dirty and not self.element._childrenDirty then
-    return -- Skip layout entirely
-  end
-
-  -- ... perform layout ...
-
-  self.element._dirty = false
-  self.element._childrenDirty = false
-end
-```
-
-### Priority 4: Dimension Caching
-
-**Estimated Gain: 10-15% faster**
-
-```lua
--- Cache computed dimensions
-function Element:getBorderBoxWidth()
-  if self._borderBoxWidthCache then
-    return self._borderBoxWidthCache
-  end
-
-  self._borderBoxWidthCache = self.width + self.padding.left + self.padding.right
-  return self._borderBoxWidthCache
-end
-
--- Invalidate on property change
-function Element:setWidth(width)
-  self.width = width
-  self._borderBoxWidthCache = nil -- Invalidate cache
-  self._dirty = true
-end
-```
-
-### Priority 5: Preallocate Arrays
-
-**Estimated Gain: 5-10% less GC pressure**
-
-```lua
--- BEFORE: Grow array dynamically
-local positions = {}
-for i, child in ipairs(children) do
-  positions[i] = { x = x, y = y }
-end
-
--- AFTER: Preallocate
-local positions = table.create and table.create(#children) or {}
-for i, child in ipairs(children) do
-  positions[i] = { x = x, y = y }
-end
-```
-
-## FFI Optimizations (Current Implementation)
-
-**Estimated Gain: 5-10% in specific scenarios**
-
-Current FFI optimizations help with:
-- Vec2/Rect pooling for batch operations
-- Reduced GC pressure for position calculations
-- Better cache locality for large arrays
-
-But they're limited because:
-- Not used in main layout algorithm
-- Colors can't use FFI (need methods)
-- Overhead of wrapping/unwrapping FFI objects
-
-## Recommended Implementation Order
-
-1. **Dirty Flag System** (1-2 hours) - Biggest bang for buck
-2. **Local Variable Hoisting** (2-3 hours) - Easy win
-3. **Dimension Caching** (1-2 hours) - Simple optimization
-4. **Single-Pass Layout** (4-6 hours) - Complex but high impact
-5. **Array Preallocation** (1 hour) - Quick win
-
-**Total Estimated Gain: 2-3x faster layouts**
-
-## Benchmarking Strategy
-
-To measure improvements:
-
-1. **Baseline** - Current implementation
-2. **After each optimization** - Measure incremental gain
-3. **Compare scenarios**:
-   - Small UIs (50 elements)
-   - Medium UIs (200 elements)
-   - Large UIs (1000 elements)
-   - Deep nesting (10 levels)
-   - Flat hierarchy (1 level)
-
-## Why Not More Aggressive FFI?
-
-**Option: FFI-based layout engine**
-
-Could implement entire layout algorithm in C via FFI:
-- 5-10x faster
-- Much more complex
-- Harder to maintain
-- Loses Lua flexibility
-
-**Verdict:** Not worth it. The optimizations above give 80% of the benefit with 20% of the complexity.
-
-## Conclusion
-
-The current FFI optimizations are correct but target the wrong bottleneck. The real gains come from:
-
-1. **Algorithmic improvements** (dirty flags, caching)
-2. **Lua optimization patterns** (local hoisting, inline)
-3. **Reducing work** (skip unchanged subtrees)
-
-FFI helps at the margins but isn't the silver bullet. Focus on the high-impact optimizations first.
-
----
-
-**Next Steps:**
-1. Implement dirty flag system
-2. Add dimension caching
-3. Hoist locals in hot loops
-4. Profile again and measure gains
-5. Consider single-pass layout if needed