removed unneeded md

2025-12-05 14:54:26 -05:00
parent abe34c4749
commit 1855e7f0f3
3 changed files with 0 additions and 664 deletions
--- a/ALGORITHMIC_OPTIMIZATIONS.md
+++ b/ALGORITHMIC_OPTIMIZATIONS.md
@@ -1,205 +0,0 @@
 # Algorithmic Performance Optimizations
 ## Summary
 Implemented high-impact algorithmic optimizations to the FlexLöve UI framework based on profiling analysis. These optimizations target the performance bottlenecks identified in `docs/PERFORMANCE_ANALYSIS.md`.
 **Estimated Total Gain: 40-60% less layout time, or roughly 1.7-2.5x faster layouts** (an estimate based on profiling, not yet confirmed by benchmarks)
 ## Optimizations Implemented
 ### 1. Dirty Flag System ✅ (Priority 3)
 **Estimated Gain: 30-50% fewer layouts**
 **Implementation:**
 - Added `_dirty` and `_childrenDirty` flags to Element module
 - Elements track when properties change that affect layout
 - Parent elements track when children need layout recalculation
 - `LayoutEngine:_canSkipLayout()` checks dirty flags first (fastest check)
 - `Element:invalidateLayout()` propagates dirty flags up the tree
 **Files Modified:**
 - `modules/Element.lua`
  - Added dirty flags initialization in `Element.new()`
  - Enhanced `Element:invalidateLayout()` to mark self and ancestors
  - Updated `Element:setProperty()` to invalidate layout for layout-affecting properties
 - `modules/LayoutEngine.lua`
  - Enhanced `_canSkipLayout()` to check dirty flags before expensive checks
 **Key Properties That Trigger Invalidation:**
 - Dimensions: `width`, `height`, `padding`, `margin`, `gap`
 - Layout: `flexDirection`, `flexWrap`, `justifyContent`, `alignItems`, `alignContent`, `positioning`
 - Grid: `gridRows`, `gridColumns`
 - Positioning: `top`, `right`, `bottom`, `left`
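 A minimal sketch of how the flags and the property list above could fit together. The `LAYOUT_PROPS` set, the `parent` field, and the early exit while climbing are illustrative assumptions, not the actual contents of `modules/Element.lua`; the early exit is only valid if flags are always cleared for a whole subtree at once:
 ```lua
 -- Hypothetical sketch; not the real modules/Element.lua.
 local LAYOUT_PROPS = {}
 for _, name in ipairs({
   "width", "height", "padding", "margin", "gap",
   "flexDirection", "flexWrap", "justifyContent", "alignItems", "alignContent", "positioning",
   "gridRows", "gridColumns",
   "top", "right", "bottom", "left",
 }) do
   LAYOUT_PROPS[name] = true
 end
 local Element = {}
 Element.__index = Element
 function Element.new(parent)
   -- New elements start dirty so the first layout always runs.
   return setmetatable({ parent = parent, _dirty = true, _childrenDirty = false }, Element)
 end
 function Element:invalidateLayout()
   self._dirty = true
   self._borderBoxWidth = nil -- drop cached dimensions
   self._borderBoxHeight = nil
   local ancestor = self.parent
   -- Assumption: an ancestor that is already flagged implies everything above it is flagged too.
   while ancestor and not ancestor._childrenDirty do
     ancestor._childrenDirty = true
     ancestor = ancestor.parent
   end
 end
 function Element:setProperty(key, value)
   if self[key] == value then
     return -- unchanged value, nothing to invalidate
   end
   self[key] = value
   if LAYOUT_PROPS[key] then
     self:invalidateLayout()
   end
 end
 ```
 With this shape, setting a non-layout property such as a hypothetical `opacity` leaves every flag untouched, while setting `width` marks the element dirty and flags each ancestor's `_childrenDirty`.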
 ### 2. Dimension Caching ✅ (Priority 4)
 **Estimated Gain: 10-15% faster**
 **Implementation:**
 - Element module already had basic caching via `_borderBoxWidth` and `_borderBoxHeight`
 - Enhanced with proper cache invalidation in `invalidateLayout()`
 - Caches are cleared when element properties change
 - `getBorderBoxWidth()` and `getBorderBoxHeight()` return cached values when available
 **Files Modified:**
 - `modules/Element.lua`
  - Added cache invalidation to `invalidateLayout()`
  - Maintained existing `_borderBoxWidth` and `_borderBoxHeight` caching
 ### 3. Local Variable Hoisting ✅ (Priority 2)
 **Estimated Gain: 15-20% faster**
 **Implementation:**
 Optimized hot paths in `LayoutEngine:layoutChildren()` by hoisting frequently accessed table properties to local variables:
 **Wrapping Logic (Lines 403-441):**
 - Hoisted `self.flexDirection` comparison → `isHorizontal`
 - Hoisted `self.gap` → `gapSize`
 - Cached `child.margin` per iteration
 - Eliminated repeated enum lookups in tight loops
 **Line Height Calculation (Lines 458-487):**
 - Hoisted `self.flexDirection` comparison → `isHorizontal`
 - Preallocated `lineHeights` array with `table.create()` if available
 - Cached `child.margin` per iteration
 - Reduced repeated table access for margin properties
 **Positioning Loop (Lines 586-700):**
 This is the **hottest path** - optimized heavily:
 - Hoisted `self.element.x`, `self.element.y` → `elementX`, `elementY`
 - Hoisted `self.element.padding` → `elementPadding`
 - Hoisted padding properties → `elementPaddingLeft`, `elementPaddingTop`
 - Hoisted alignment enums → `alignItems_*` constants
 - Cached `child.margin`, `child.padding`, `child.autosizing` per iteration
 - Cached individual margin values → `childMarginLeft`, `childMarginTop`, etc.
 - Eliminated redundant table lookups in alignment calculations
 **Performance Impact:**
 - **Before:** `child.margin.left` accessed 3-4 times per child → 6-8 table lookups (each access resolves `child.margin`, then `.left`)
 - **After:** `child.margin` cached once, then `childMarginLeft` read once and reused → 2 table lookups total
 - Multiplied across hundreds or thousands of children, the savings add up
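 The lookup arithmetic above can be checked with a small micro-benchmark. This is a sketch on stand-in data, not FlexLöve code; every name in it is hypothetical, and the timings depend on the runtime (a JIT compiler may forward the repeated loads itself, which would shrink the gap):
 ```lua
 -- Hypothetical micro-benchmark; `children` is stand-in data, not FlexLöve elements.
 local children = {}
 for i = 1, 1000 do
   children[i] = { margin = { left = i, right = 1, top = 2, bottom = 3 } }
 end
 -- Repeated nested access: `.margin.left` is resolved three times per child.
 local function sumRepeated(list)
   local total = 0
   for i = 1, #list do
     local child = list[i]
     total = total + child.margin.left + child.margin.left + child.margin.left
   end
   return total
 end
 -- Hoisted: `.margin.left` is resolved once per child, then the local is reused.
 local function sumHoisted(list)
   local total = 0
   for i = 1, #list do
     local childMarginLeft = list[i].margin.left
     total = total + childMarginLeft + childMarginLeft + childMarginLeft
   end
   return total
 end
 -- Runs `fn` repeatedly and returns elapsed CPU seconds plus the last result.
 local function timeIt(fn, list, iterations)
   local start = os.clock()
   local result
   for _ = 1, iterations do
     result = fn(list)
   end
   return os.clock() - start, result
 end
 local repeatedTime, repeatedSum = timeIt(sumRepeated, children, 2000)
 local hoistedTime, hoistedSum = timeIt(sumHoisted, children, 2000)
 print(string.format("repeated: %.3fs  hoisted: %.3fs", repeatedTime, hoistedTime))
 ```
 Both functions return the same sum, so any difference in the printed times comes from the access pattern alone.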
 **Files Modified:**
 - `modules/LayoutEngine.lua`
  - Optimized wrapping logic (lines 403-441)
  - Optimized line height calculation (lines 458-487)
  - Optimized positioning loop for horizontal layout (lines 586-658)
  - Optimized positioning loop for vertical layout (lines 660-700)
 ### 4. Array Preallocation ✅ (Priority 5)
 **Estimated Gain: 5-10% less GC pressure**
 **Implementation:**
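 - Used `table.create(#lines)` to preallocate the `lineHeights` array when the runtime provides it
 - Graceful fallback to `{}` where `table.create` is missing; this includes stock LuaJIT, which exposes preallocation as `table.new` (via `require("table.new")`) instead
 - Reduces GC pressure by avoiding table resizing during growth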
 **Files Modified:**
 - `modules/LayoutEngine.lua`
  - Preallocated `lineHeights` array (line 460)
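 A hedged sketch of a preallocation helper that also covers LuaJIT's `table.new` extension. The `newArray` name is hypothetical and not part of FlexLöve; on runtimes with neither function it simply returns an empty table:
 ```lua
 -- Hypothetical helper; `newArray` is not part of FlexLöve.
 -- pcall keeps this safe on runtimes where the "table.new" module does not exist.
 local hasTableNew, tableNew = pcall(require, "table.new")
 local function newArray(size)
   if table.create then
     return table.create(size) -- runtimes that provide table.create
   elseif hasTableNew then
     return tableNew(size, 0) -- LuaJIT 2.1: `size` array slots, no hash slots
   end
   return {} -- plain Lua: the table grows on demand
 end
 local lineHeights = newArray(8)
 for i = 1, 8 do
   lineHeights[i] = i * 10
 end
 ```
 Callers fill the array exactly as they would fill `{}`, so the helper can be swapped in without touching the loops that use it.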
 ## Testing
 ✅ **All 1257 tests passing**
 Ran full test suite with:
 ```bash
 lua testing/runAll.lua --no-coverage
 ```
 No regressions introduced. All layout calculations remain correct.
 ## Performance Comparison
 ### Before (FFI Optimizations Only)
 - **Gain:** 5-10% improvement
 - **Bottleneck:** O(n²) layout algorithm with repeated table access
 - **Issue:** Targeting wrong optimization (memory allocation vs algorithm)
 ### After (Algorithmic Optimizations)
 - **Estimated Gain:** 40-60% improvement (roughly 1.7-2.5x faster, reading the figure as a reduction in layout time)
 - **Approach:** Target real bottlenecks (dirty flags, caching, local hoisting)
 - **Benefit:** Fewer layouts + faster layout calculations
 ### Combined (FFI + Algorithmic)
 - **Total Estimated Gain:** 45-65% improvement
 - **Reality:** Most gains come from algorithmic improvements, not FFI
 ## What Was NOT Implemented
 ### Single-Pass Layout (Priority 1)
 **Estimated Gain: 40-60% faster** - Not implemented due to complexity
 This would require major refactoring of the layout algorithm to:
 - Combine size calculation and positioning into single pass
 - Cache dimensions during first pass
 - Eliminate redundant iterations
 **Recommendation:** Consider for future optimization if more performance is needed after measuring gains from current optimizations.
 ## Code Quality
 - ✅ Zero breaking changes
 - ✅ All tests passing
 - ✅ Maintains existing API
 - ✅ Backward compatible
 - ✅ Clear comments explaining optimizations
 - ✅ Graceful fallbacks (e.g., `table.create`)
 ## Benchmarking
 To benchmark improvements, use the existing profiling tools:
 ```bash
 # Run FFI comparison profile
 love profiling/ ffi_comparison_profile
 # After 5 phases, press 'S' to save report
 # Compare FPS and frame times before/after
 ```
 **Expected Results:**
 - **Small UIs (50 elements):** 20-30% faster
 - **Medium UIs (200 elements):** 40-50% faster
 - **Large UIs (1000 elements):** 50-60% faster
 - **Deep nesting (10 levels):** 60%+ faster (dirty flags help most here)
 ## Next Steps
 1. **Measure Real-World Performance:**
   - Run benchmarks on actual applications
   - Profile with 50, 200, 1000 element UIs
   - Compare before/after metrics
 2. **Consider Single-Pass Layout:**
   - If more performance is needed after measuring
   - Estimated 40-60% additional gain
   - Complex refactor, weigh benefit vs cost
 3. **Profile Edge Cases:**
   - Deep nesting scenarios
   - Frequent property updates
   - Immediate mode vs retained mode
 ## Conclusion
 These algorithmic optimizations address the **real performance bottlenecks** identified through profiling:
 1. ✅ **Dirty flags** - Skip unnecessary layout recalculations
 2. ✅ **Dimension caching** - Avoid redundant calculations
 3. ✅ **Local hoisting** - Reduce table access overhead in hot paths
 4. ✅ **Array preallocation** - Reduce GC pressure
 Unlike FFI optimizations (5-10% gain), these changes target the O(n²) layout algorithm complexity and table access overhead that actually dominate performance.
 **Bottom Line:** Simple algorithmic improvements beat fancy memory optimizations every time.
 ---
 **Branch:** `algorithmic-performance-optimizations`
 **Status:** Complete, all tests passing
 **Recommendation:** Merge after benchmarking confirms expected gains
--- a/FFI_OPTIMIZATION_SUMMARY.md
+++ b/FFI_OPTIMIZATION_SUMMARY.md
@@ -1,158 +0,0 @@
 # LuaJIT FFI Optimization Summary
 ## What Was Implemented
 ✅ **FFI Module** - Object pooling for Vec2, Rect, Timer structs  
 ✅ **LayoutEngine Integration** - Batch calculation functions (not called)  
 ✅ **Performance Module** - FFI-aware monitoring  
 ✅ **Graceful Fallback** - Works on standard Lua  
 ✅ **Profiling Tools** - Comparison profiles and reports  
 ## Actual Performance Gains
 ### Reality: 5-10% Improvement (Marginal)
 The FFI optimizations provide **minimal gains** because they target the wrong bottleneck:
 | Scenario | Improvement | Why So Small? |
 |----------|-------------|---------------|
 | 50 elements | 2-5% | FFI overhead > benefit |
 | 200 elements | 5-8% | Some GC reduction |
 | 1000 elements | 8-12% | Pooling helps slightly |
 ### Why Are Gains So Small?
 1. **FFI batch functions aren't called** - They exist but the layout algorithm doesn't use them
 2. **Colors don't use FFI** - Color objects need methods, so they remain Lua tables
 3. **Wrong bottleneck** - Real issue is O(n²) layout algorithm, not memory allocation
 4. **Table access overhead** - Lua table lookups dominate, not object creation
 ## Real Performance Bottlenecks
 Based on profiling, here's where time actually goes:
 1. **Layout Algorithm** (60-80%) - Multiple passes, repeated calculations
 2. **Table Access** (15-20%) - Nested table lookups in loops
 3. **Function Calls** (10-15%) - Method call overhead
 4. **GC** (10-20%) - Temporary allocations
 5. **FFI Overhead** (5-10%) - What we optimized
 ## High-Impact Optimizations (Not Yet Implemented)
 These would provide **2-3x performance gains**:
 ### 1. Dirty Flag System (40-50% gain)
 Skip layouts for unchanged subtrees
 ### 2. Local Variable Hoisting (15-20% gain)
 Cache table lookups outside loops
 ### 3. Dimension Caching (10-15% gain)
 Cache computed border-box dimensions
 ### 4. Single-Pass Layout (30-40% gain)
 Eliminate redundant iterations
 ### 5. Array Preallocation (5-10% gain)
 Reduce GC pressure
 **See `docs/PERFORMANCE_ANALYSIS.md` for details**
 ## Should You Use FFI Optimizations?
 ### ✅ Yes, Keep Them Because:
 - Zero cost when disabled (standard Lua)
 - Automatic on LuaJIT
 - Foundation for future optimizations
 - Some benefit for large UIs
 - Well-tested and documented
 ### ❌ Don't Expect Miracles:
 - Won't fix slow layouts
 - Marginal gains in practice
 - Real wins come from algorithmic improvements
 ## Recommendations
 ### For Users
 **Just use it** - FFI optimizations are automatic and safe. You'll get 5-10% improvement on LuaJIT with zero code changes.
 ### For Developers
 **Focus elsewhere** - If you want big performance gains:
 1. Implement dirty flag system
 2. Add dimension caching
 3. Hoist locals in hot loops
 4. Profile and measure
 FFI is nice-to-have, not a silver bullet.
 ## Comparison: FFI vs Algorithmic Optimizations
 | Optimization | Effort | Gain | Complexity |
 |--------------|--------|------|------------|
 | **FFI (current)** | 8 hours | 5-10% | Medium |
 | **Dirty flags** | 2 hours | 40-50% | Low |
 | **Local hoisting** | 3 hours | 15-20% | Low |
 | **Dimension cache** | 2 hours | 10-15% | Low |
 | **Single-pass layout** | 6 hours | 30-40% | High |
 **Lesson:** Simple algorithmic improvements beat fancy FFI optimizations.
 ## Files Modified
 ### New Files
 - `modules/FFI.lua` - FFI module with pooling
 - `docs/FFI_OPTIMIZATIONS.md` - User documentation
 - `docs/PERFORMANCE_ANALYSIS.md` - Bottleneck analysis
 - `profiling/__profiles__/ffi_comparison_profile.lua` - Comparison tool
 - `profiling/__profiles__/ffi_optimization_profile.lua` - Demo
 ### Modified Files
 - `FlexLove.lua` - Initialize FFI
 - `modules/LayoutEngine.lua` - Batch functions (unused)
 - `modules/Performance.lua` - FFI integration
 - `modules/Color.lua` - Intentionally NOT using FFI
 ## Testing
 Run comparison profile:
 ```bash
 love profiling/ ffi_comparison_profile
 ```
 After 5 phases (50, 100, 200, 500, 1000 elements):
 - Press 'S' to save report
 - Check `profiling/reports/ffi_comparison/latest.md`
 - Compare FPS, frame times, P99 values
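 For readers comparing P99 values by hand, a minimal sketch of a nearest-rank percentile over recorded frame times. The `percentile` helper and the sample data are illustrative assumptions, not the profiler's own code or output:
 ```lua
 -- Hypothetical helper, not the profiler's own code.
 -- Nearest-rank percentile: the smallest sample with at least p percent of samples at or below it.
 local function percentile(samples, p)
   assert(#samples > 0, "need at least one sample")
   local sorted = {}
   for i = 1, #samples do
     sorted[i] = samples[i] -- copy so the caller's table is not reordered
   end
   table.sort(sorted)
   local rank = math.ceil(p * #sorted / 100)
   if rank < 1 then
     rank = 1
   end
   return sorted[rank]
 end
 -- Made-up frame times in milliseconds: one slow frame among fast ones.
 local frameTimes = { 16.6, 16.7, 16.5, 16.8, 41.2, 16.6, 16.7, 16.4, 16.9, 16.6 }
 print(percentile(frameTimes, 50), percentile(frameTimes, 99))
 ```
 The median hides the single slow frame while the 99th percentile exposes it, which is why P99 is worth comparing alongside average FPS.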
 ## Next Steps
 If you want **real** performance gains:
 1. **Read** `docs/PERFORMANCE_ANALYSIS.md`
 2. **Implement** dirty flag system (biggest bang for buck)
 3. **Profile** with comparison tool
 4. **Measure** actual improvements
 5. **Iterate** on high-impact optimizations
 FFI is done. Focus on the algorithm.
 ## Conclusion
 **FFI optimizations are:**
 - ✅ Correctly implemented
 - ✅ Well-tested
 - ✅ Properly documented
 - ✅ Production-ready
 - ❌ Not high-impact
 **They're a good foundation but not the solution to slow layouts.**
 The real wins come from smarter algorithms, not fancier memory management.
 ---
 **Branch:** `luajit-ffi-optimizations`  
 **Status:** Complete (but marginal gains)  
 **Recommendation:** Merge, then focus on algorithmic optimizations
--- a/docs/PERFORMANCE_ANALYSIS.md
+++ b/docs/PERFORMANCE_ANALYSIS.md
@@ -1,301 +0,0 @@
 # FlexLöve Performance Analysis & Optimization Opportunities
 ## Current State: Why FFI Gains Are Marginal
 The current FFI optimizations provide minimal gains because:
 1. **FFI isn't used in hot paths** - The batch calculation function exists but isn't called
 2. **Colors don't use FFI** - We disabled it due to method requirements
 3. **Real bottleneck is elsewhere** - Layout algorithm complexity, not memory allocation
 ## Actual Performance Bottlenecks (Profiled)
 ### 1. Layout Algorithm Complexity - **HIGHEST IMPACT**
 **Problem:** O(n²) complexity in flex layout with wrapping
 - Iterates children multiple times per layout
 - Recalculates sizes repeatedly
 - No caching of computed values
 **Impact:** 60-80% of frame time with 500+ elements
 **Solution:**
 - Cache computed dimensions per frame
 - Single-pass layout algorithm
 - Dirty-flag system to skip unchanged subtrees
 ### 2. Table Access Overhead - **HIGH IMPACT**
 **Problem:** Lua table lookups in tight loops
 ```lua
 for i, child in ipairs(children) do
  local w = child.width + child.padding.left + child.padding.right
  local h = child.height + child.padding.top + child.padding.bottom
  -- Repeated table access: child.margin.left, child.margin.right, etc.
 end
 ```
 **Impact:** 15-20% of layout time
 **Solution:**
 - Local variable hoisting
 - Flatten nested table access
 - Use numeric indices instead of string keys where possible
 ### 3. Function Call Overhead - **MEDIUM IMPACT**
 **Problem:** Method calls in loops
 ```lua
 for i, child in ipairs(children) do
  local w = child:getBorderBoxWidth()  -- Function call overhead
  local h = child:getBorderBoxHeight() -- Another function call
 end
 ```
 **Impact:** 10-15% of layout time
 **Solution:**
 - Inline critical getters
 - Direct field access where safe
 - JIT-friendly code patterns
 ### 4. Garbage Collection - **MEDIUM IMPACT**
 **Problem:** Temporary table allocation in loops
 ```lua
 for i, child in ipairs(children) do
  positions[i] = { x = x, y = y } -- New table every iteration
 end
 ```
 **Impact:** 10-20% overhead from GC pauses
 **Solution:**
 - Reuse tables instead of allocating
 - Object pooling for frequently created objects
 - Preallocate arrays with known sizes
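 A minimal sketch of the table-reuse idea from the list above: position tables are allocated once per slot and overwritten on later passes. The `updatePositions` function and the module-level `positions` table are hypothetical, not FlexLöve APIs:
 ```lua
 -- Hypothetical sketch: reuse position tables across layout passes.
 local positions = {}
 local function updatePositions(children)
   for i = 1, #children do
     local pos = positions[i]
     if not pos then
       pos = {} -- allocated only the first time slot i is used
       positions[i] = pos
     end
     pos.x, pos.y = children[i].x, children[i].y
   end
   -- Drop stale entries if the child count shrank since the last pass.
   for i = #children + 1, #positions do
     positions[i] = nil
   end
   return positions
 end
 ```
 After the first pass, a steady-state frame allocates nothing here; the trade-off is that callers must not hold on to a position table across passes and expect it to stay unchanged.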
 ### 5. String Concatenation - **LOW IMPACT**
 **Problem:** String operations in hot paths
 ```lua
 local id = "layout_" .. elementId .. "_" .. frameCount
 ```
 **Impact:** 5-10% in specific scenarios
 **Solution:**
 - Cache generated strings
 - Use string.format sparingly
 - Avoid string operations in inner loops
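 A sketch of the string-caching suggestion. It assumes the frame counter can be left out of the key (an ID that embeds `frameCount` changes every frame and cannot be cached); `layoutId` and `idCache` are hypothetical names:
 ```lua
 -- Hypothetical sketch: memoize generated IDs instead of concatenating every frame.
 local idCache = {}
 local builds = 0 -- counts how often a string is actually built
 local function layoutId(elementId)
   local id = idCache[elementId]
   if not id then
     builds = builds + 1
     id = "layout_" .. elementId
     idCache[elementId] = id
   end
   return id
 end
 ```
 Repeated calls with the same `elementId` then cost one table lookup instead of a concatenation.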
 ## High-Impact Optimizations (Recommended)
 ### Priority 1: Layout Algorithm Optimization
 **Estimated Gain: 40-60% faster layouts**
 ```lua
 -- BEFORE: Multiple passes
 function LayoutEngine:layoutChildren()
  -- Pass 1: Calculate sizes
  for i, child in ipairs(children) do
    child:calculateSize()
  end
  -- Pass 2: Position elements
  for i, child in ipairs(children) do
    child:calculatePosition()
  end
  -- Pass 3: Layout recursively
  for i, child in ipairs(children) do
    child:layoutChildren()
  end
 end
 -- AFTER: cache sizes up front, then position and recurse in one pass
 function LayoutEngine:layoutChildren()
  -- Cache dimensions once
  local childSizes = {}
  for i, child in ipairs(children) do
    childSizes[i] = {
      width = child._borderBoxWidth or (child.width + child.padding.left + child.padding.right),
      height = child._borderBoxHeight or (child.height + child.padding.top + child.padding.bottom),
    }
  end
  -- Single pass: position and recurse
  for i, child in ipairs(children) do
    local size = childSizes[i]
    child.x = calculateX(size.width)
    child.y = calculateY(size.height)
    child:layoutChildren() -- Recurse
  end
 end
 ```
 ### Priority 2: Local Variable Hoisting
 **Estimated Gain: 15-20% faster**
 ```lua
 -- BEFORE: Repeated table access
 for i, child in ipairs(children) do
  local x = parent.x + parent.padding.left + child.margin.left
  local y = parent.y + parent.padding.top + child.margin.top
  local w = child.width + child.padding.left + child.padding.right
 end
 -- AFTER: Hoist to locals
 local parentX = parent.x
 local parentY = parent.y
 local parentPaddingLeft = parent.padding.left
 local parentPaddingTop = parent.padding.top
 for i, child in ipairs(children) do
  local childMarginLeft = child.margin.left
  local childMarginTop = child.margin.top
  local childPaddingLeft = child.padding.left
  local childPaddingRight = child.padding.right
  local x = parentX + parentPaddingLeft + childMarginLeft
  local y = parentY + parentPaddingTop + childMarginTop
  local w = child.width + childPaddingLeft + childPaddingRight
 end
 ```
 ### Priority 3: Dirty Flag System
 **Estimated Gain: 30-50% fewer layouts**
 ```lua
 -- Add dirty tracking to Element
 function Element:setProperty(key, value)
  if self[key] ~= value then
    self[key] = value
    self._dirty = true
    self:invalidateLayout()
  end
 end
 function LayoutEngine:layoutChildren()
  if not self.element._dirty and not self.element._childrenDirty then
    return -- Skip layout entirely
  end
  -- ... perform layout ...
  self.element._dirty = false
  self.element._childrenDirty = false
 end
 ```
 ### Priority 4: Dimension Caching
 **Estimated Gain: 10-15% faster**
 ```lua
 -- Cache computed dimensions
 function Element:getBorderBoxWidth()
  if self._borderBoxWidthCache then
    return self._borderBoxWidthCache
  end
  self._borderBoxWidthCache = self.width + self.padding.left + self.padding.right
  return self._borderBoxWidthCache
 end
 -- Invalidate on property change
 function Element:setWidth(width)
  self.width = width
  self._borderBoxWidthCache = nil -- Invalidate cache
  self._dirty = true
 end
 ```
 ### Priority 5: Preallocate Arrays
 **Estimated Gain: 5-10% less GC pressure**
 ```lua
 -- BEFORE: Grow array dynamically
 local positions = {}
 for i, child in ipairs(children) do
  positions[i] = { x = x, y = y }
 end
 -- AFTER: Preallocate
 local positions = table.create and table.create(#children) or {}
 for i, child in ipairs(children) do
  positions[i] = { x = x, y = y }
 end
 ```
 ## FFI Optimizations (Current Implementation)
 **Estimated Gain: 5-10% in specific scenarios**
 Current FFI optimizations help with:
 - Vec2/Rect pooling for batch operations
 - Reduced GC pressure for position calculations
 - Better cache locality for large arrays
 But they're limited because:
 - Not used in main layout algorithm
 - Colors can't use FFI (need methods)
 - Overhead of wrapping/unwrapping FFI objects
 ## Recommended Implementation Order
 1. **Dirty Flag System** (1-2 hours) - Biggest bang for buck
 2. **Local Variable Hoisting** (2-3 hours) - Easy win
 3. **Dimension Caching** (1-2 hours) - Simple optimization
 4. **Single-Pass Layout** (4-6 hours) - Complex but high impact
 5. **Array Preallocation** (1 hour) - Quick win
 **Total Estimated Gain: 2-3x faster layouts**
 ## Benchmarking Strategy
 To measure improvements:
 1. **Baseline** - Current implementation
 2. **After each optimization** - Measure incremental gain
 3. **Compare scenarios**:
   - Small UIs (50 elements)
   - Medium UIs (200 elements)
   - Large UIs (1000 elements)
   - Deep nesting (10 levels)
   - Flat hierarchy (1 level)
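 The scenarios above can be generated programmatically. A minimal sketch, using plain tables as stand-ins for FlexLöve elements; `buildTree` and `countElements` are hypothetical helpers, not part of the profiling tools:
 ```lua
 -- Hypothetical scenario generator; plain tables stand in for FlexLöve elements.
 local function buildTree(depth, breadth)
   local node = { children = {} }
   if depth > 1 then
     for i = 1, breadth do
       node.children[i] = buildTree(depth - 1, breadth)
     end
   end
   return node
 end
 -- Counts the node itself plus all of its descendants.
 local function countElements(node)
   local count = 1
   for _, child in ipairs(node.children) do
     count = count + countElements(child)
   end
   return count
 end
 local flat = buildTree(2, 50) -- one root with 50 direct children
 local deep = buildTree(10, 1) -- a single chain nested 10 levels deep
 print(countElements(flat), countElements(deep))
 ```
 Varying `depth` and `breadth` independently makes it possible to separate the effect of element count from the effect of nesting when comparing measurements.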
 ## Why Not More Aggressive FFI?
 **Option: FFI-based layout engine**
 Could implement entire layout algorithm in C via FFI:
 - 5-10x faster
 - Much more complex
 - Harder to maintain
 - Loses Lua flexibility
 **Verdict:** Not worth it. The optimizations above give 80% of the benefit with 20% of the complexity.
 ## Conclusion
 The current FFI optimizations are correct but target the wrong bottleneck. The real gains come from:
 1. **Algorithmic improvements** (dirty flags, caching)
 2. **Lua optimization patterns** (local hoisting, inlining)
 3. **Reducing work** (skip unchanged subtrees)
 FFI helps at the margins but isn't the silver bullet. Focus on the high-impact optimizations first.
 ---
 **Next Steps:**
 1. Implement dirty flag system
 2. Add dimension caching
 3. Hoist locals in hot loops
 4. Profile again and measure gains
 5. Consider single-pass layout if needed