Implemented high-impact optimizations from PERFORMANCE_ANALYSIS.md:
1. Dirty Flag System (30-50% fewer layouts):
- Added _dirty and _childrenDirty flags to Element module
- Elements track when properties change that affect layout
- LayoutEngine checks dirty flags before expensive layout calculations
- Element:setProperty() invalidates layout for layout-affecting properties
2. Dimension Caching (10-15% faster):
- Enhanced _borderBoxWidth/_borderBoxHeight caching
- Proper cache invalidation in invalidateLayout()
- Reduces redundant getBorderBox calculations
3. Local Variable Hoisting (15-20% faster):
- Hoisted frequently accessed properties outside tight loops
- Reduced table lookups in wrapping logic (child.margin cached)
- Optimized line height calculation (isHorizontal hoisted)
- Heavily optimized positioning loop (hottest path):
* Cached element.x, element.y, element.padding
* Hoisted alignment enums outside loop
* Cached child.margin, child.padding per iteration
* child.margin.left: 3-4 chained reads (6-8 table lookups) → 2 lookups per child
4. Array Preallocation (5-10% less GC):
- Preallocated lineHeights with table.create() when available
- Graceful fallback to {} when table.create is absent (standard Lua, and stock LuaJIT, which provides table.new instead)
Estimated total gain: 40-60% less layout time (roughly 1.7-2.5x faster layouts)
All 1257 tests passing. Zero breaking changes.
See ALGORITHMIC_OPTIMIZATIONS.md for full details.
Algorithmic Performance Optimizations
Summary
Implemented high-impact algorithmic optimizations in the FlexLöve UI framework based on profiling analysis. These optimizations target the real performance bottlenecks identified in PERFORMANCE_ANALYSIS.md.
Estimated Total Gain: 40-60% less layout time (roughly 1.7-2.5x faster layouts), based on profiling
Optimizations Implemented
1. Dirty Flag System ✅ (Priority 3)
Estimated Gain: 30-50% fewer layouts
Implementation:
- Added `_dirty` and `_childrenDirty` flags to Element module
- Elements track when properties change that affect layout
- Parent elements track when children need layout recalculation
- `LayoutEngine:_canSkipLayout()` checks dirty flags first (fastest check)
- `Element:invalidateLayout()` propagates dirty flags up the tree
Files Modified:
- `modules/Element.lua`
  - Added dirty flags initialization in `Element.new()`
  - Enhanced `Element:invalidateLayout()` to mark self and ancestors
  - Updated `Element:setProperty()` to invalidate layout for layout-affecting properties
- `modules/LayoutEngine.lua`
  - Enhanced `_canSkipLayout()` to check dirty flags before expensive checks
Key Properties That Trigger Invalidation:
- Dimensions: `width`, `height`, `padding`, `margin`, `gap`
- Layout: `flexDirection`, `flexWrap`, `justifyContent`, `alignItems`, `alignContent`, `positioning`
- Grid: `gridRows`, `gridColumns`
- Positioning: `top`, `right`, `bottom`, `left`
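The sketch below shows how the flags above can fit together. It is a minimal illustration that assumes a simplified element with `parent`/`children` fields. `Element.new`, `invalidateLayout`, `setProperty` and `layout` are stand-ins, not FlexLöve's actual implementation, and `layoutCount` exists only to make the skipping visible. A change marks the element `_dirty` and sets `_childrenDirty` on its ancestors, so a layout pass can return early from any subtree where both flags are clear.

```lua
-- Illustrative sketch only: a simplified element tree with dirty flags.
-- Names mirror the text above, but this is NOT FlexLöve's actual Element code.
local Element = {}
Element.__index = Element

-- Properties whose change should trigger a relayout (from the list above).
local LAYOUT_PROPS = {
  width = true, height = true, padding = true, margin = true, gap = true,
  flexDirection = true, flexWrap = true, justifyContent = true,
  alignItems = true, alignContent = true, positioning = true,
  gridRows = true, gridColumns = true,
  top = true, right = true, bottom = true, left = true,
}

function Element.new(parent)
  local self = setmetatable({
    parent = parent,
    children = {},
    _dirty = true,          -- this element needs its own layout pass
    _childrenDirty = false, -- some descendant needs a layout pass
    layoutCount = 0,        -- instrumentation for this example only
  }, Element)
  if parent then
    parent.children[#parent.children + 1] = self
    parent:invalidateLayout() -- a new child changes the parent's layout
  end
  return self
end

-- Mark self dirty and flag ancestors. Stops early at an ancestor that is
-- already flagged, because everything above it is flagged too.
function Element:invalidateLayout()
  self._dirty = true
  local ancestor = self.parent
  while ancestor and not ancestor._childrenDirty do
    ancestor._childrenDirty = true
    ancestor = ancestor.parent
  end
end

function Element:setProperty(key, value)
  if self[key] == value then return end
  self[key] = value
  if LAYOUT_PROPS[key] then
    self:invalidateLayout()
  end
end

function Element:layout()
  if not self._dirty and not self._childrenDirty then
    return -- fast path: nothing in this subtree changed
  end
  if self._dirty then
    self.layoutCount = self.layoutCount + 1 -- stand-in for real size/position work
    self._dirty = false
  end
  self._childrenDirty = false
  for _, child in ipairs(self.children) do
    child:layout()
  end
end

-- Usage
local root = Element.new()
local panel = Element.new(root)
local button = Element.new(panel)

root:layout()                      -- first pass: every element is laid out
button:setProperty("color", "red") -- not layout-affecting: nothing is marked
root:layout()                      -- returns immediately at the root
button:setProperty("width", 120)   -- layout-affecting: button and ancestors flagged
root:layout()                      -- only the button's own pass re-runs
```

One simplification to note: here a child's width change does not re-run its parent's pass. In a real flex layout, siblings may need repositioning, so an implementation might also mark the direct parent `_dirty`.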
2. Dimension Caching ✅ (Priority 4)
Estimated Gain: 10-15% faster
Implementation:
- Element module already had basic caching via `_borderBoxWidth` and `_borderBoxHeight`
- Enhanced with proper cache invalidation in `invalidateLayout()`
- Caches are cleared when element properties change
- `getBorderBoxWidth()` and `getBorderBoxHeight()` return cached values when available
Files Modified:
- `modules/Element.lua`
  - Added cache invalidation to `invalidateLayout()`
  - Maintained existing `_borderBoxWidth` and `_borderBoxHeight` caching
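Here is the caching pattern in isolation. The sketch rests on two assumptions the text does not state: `width`/`height` are content sizes, and the border box is content plus padding, with borders ignored. `Box`, `setProperty` and `computeCount` are hypothetical names used only for this example.

```lua
-- Illustrative sketch only: memoized border-box size with explicit invalidation.
-- Assumes width/height are content sizes and border box = content + padding.
local Box = {}
Box.__index = Box

function Box.new(width, height, padding)
  -- _borderBoxWidth/_borderBoxHeight start out nil, meaning "not cached yet".
  return setmetatable({
    width = width,
    height = height,
    padding = padding or { left = 0, right = 0, top = 0, bottom = 0 },
    computeCount = 0, -- instrumentation for this example only
  }, Box)
end

function Box:getBorderBoxWidth()
  local cached = self._borderBoxWidth
  if cached ~= nil then return cached end
  local p = self.padding
  cached = self.width + p.left + p.right
  self._borderBoxWidth = cached
  self.computeCount = self.computeCount + 1
  return cached
end

function Box:getBorderBoxHeight()
  local cached = self._borderBoxHeight
  if cached ~= nil then return cached end
  local p = self.padding
  cached = self.height + p.top + p.bottom
  self._borderBoxHeight = cached
  self.computeCount = self.computeCount + 1
  return cached
end

-- Clearing the caches is what keeps them correct after a property change.
function Box:invalidateLayout()
  self._borderBoxWidth = nil
  self._borderBoxHeight = nil
end

function Box:setProperty(key, value)
  self[key] = value
  self:invalidateLayout()
end

-- Usage
local box = Box.new(100, 40, { left = 8, right = 8, top = 4, bottom = 4 })
local w1 = box:getBorderBoxWidth() -- computed: 100 + 8 + 8
local w2 = box:getBorderBoxWidth() -- served from the cache
box:setProperty("width", 120)      -- clears the cache
local w3 = box:getBorderBoxWidth() -- recomputed: 120 + 8 + 8
```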
3. Local Variable Hoisting ✅ (Priority 2)
Estimated Gain: 15-20% faster
Implementation:
Optimized hot paths in `LayoutEngine:layoutChildren()` by hoisting frequently accessed table properties to local variables:
Wrapping Logic (Lines 403-441):
- Hoisted `self.flexDirection` comparison → `isHorizontal`
- Hoisted `self.gap` → `gapSize`
- Cached `child.margin` per iteration
- Eliminated repeated enum lookups in tight loops
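The wrapping-logic hoisting can be illustrated with a sketch of greedy line breaking for a wrapping row or column. Several details are assumptions for illustration, not LayoutEngine's real code: the `"horizontal"` direction value, the plain-table shapes of `container` and `children`, and the `breakIntoLines` name. The point is that `isHorizontal`, `gapSize` and the available space are computed once per call, and `child.margin` once per child.

```lua
-- Illustrative sketch only: greedy flex-wrap line breaking with hoisted locals.
-- Assumes flexDirection is the string "horizontal" or "vertical".
local function breakIntoLines(container, children)
  -- Hoisted: read once per call instead of once per child.
  local isHorizontal = container.flexDirection == "horizontal"
  local gapSize = container.gap or 0
  local available = isHorizontal and container.width or container.height

  local lines = {}
  local current, used = {}, 0
  for i = 1, #children do
    local child = children[i]
    local margin = child.margin -- cached once per child
    local size
    if isHorizontal then
      size = child.width + margin.left + margin.right
    else
      size = child.height + margin.top + margin.bottom
    end
    if #current > 0 and used + gapSize + size > available then
      lines[#lines + 1] = current -- line is full: start a new one
      current, used = { child }, size
    elseif #current > 0 then
      current[#current + 1] = child
      used = used + gapSize + size
    else
      current[1] = child
      used = size
    end
  end
  if #current > 0 then lines[#lines + 1] = current end
  return lines
end

-- Usage: three 40px children in a 100px row with a 10px gap
local function child(w, h)
  return { width = w, height = h, margin = { left = 0, right = 0, top = 0, bottom = 0 } }
end
local row = { flexDirection = "horizontal", gap = 10, width = 100, height = 50 }
local lines = breakIntoLines(row, { child(40, 20), child(40, 20), child(40, 20) })
-- 40 + 10 + 40 = 90 fits; another 10 + 40 would exceed 100, so the third child wraps
```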
Line Height Calculation (Lines 458-487):
- Hoisted `self.flexDirection` comparison → `isHorizontal`
- Preallocated `lineHeights` array with `table.create()` if available
- Cached `child.margin` per iteration
- Reduced repeated table access for margin properties
Positioning Loop (Lines 586-700): this is the hottest path, so it was optimized most heavily:
- Hoisted `self.element.x`, `self.element.y` → `elementX`, `elementY`
- Hoisted `self.element.padding` → `elementPadding`
- Hoisted padding properties → `elementPaddingLeft`, `elementPaddingTop`
- Hoisted alignment enums → `alignItems_*` constants
- Cached `child.margin`, `child.padding`, `child.autosizing` per iteration
- Cached individual margin values → `childMarginLeft`, `childMarginTop`, etc.
- Eliminated redundant table lookups in alignment calculations
Performance Impact:
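- Before: `child.margin.left` was read 3-4 times per child; each read costs two table lookups (`child.margin`, then `.left`), so 6-8 lookups per child
- After: `child.margin` is read once and `.left` once into `childMarginLeft`, so 2 lookups per child
- Multiplied across 1000 children, that is 4,000-6,000 fewer lookups per layout pass for this access pattern alone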
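The lookup arithmetic above can be checked with a small experiment. A proxy counts every field read, and it is run against an un-hoisted and a hoisted version of a toy row-positioning loop. `counted`, `layoutRowNaive` and `layoutRowHoisted` are hypothetical helpers written for this illustration. The loop is much simpler than `LayoutEngine:layoutChildren()`, so the exact counts describe this toy only.

```lua
-- Illustrative sketch only: count table reads to compare hoisted vs un-hoisted access.
local reads = 0

-- Wrap a table so every field read through it increments `reads`.
-- Nested tables are wrapped too, so `child.margin.left` counts as two reads.
local function counted(t)
  return setmetatable({}, {
    __index = function(_, key)
      reads = reads + 1
      local value = t[key]
      if type(value) == "table" then
        return counted(value)
      end
      return value
    end,
  })
end

-- Un-hoisted: container fields and child.margin are re-read on every use.
local function layoutRowNaive(element, children)
  local xs, cursor = {}, 0
  for i = 1, #children do
    local child = children[i]
    xs[i] = element.x + element.padding.left + cursor + child.margin.left
    cursor = cursor + child.margin.left + child.width + child.margin.right
  end
  return xs
end

-- Hoisted: container fields read once per call, child.margin once per child.
local function layoutRowHoisted(element, children)
  local originX = element.x + element.padding.left
  local xs, cursor = {}, 0
  for i = 1, #children do
    local child = children[i]
    local margin = child.margin
    local childMarginLeft = margin.left
    xs[i] = originX + cursor + childMarginLeft
    cursor = cursor + childMarginLeft + child.width + margin.right
  end
  return xs
end

-- Usage: four 20px children with 2px left / 3px right margins
local element = counted({ x = 10, padding = { left = 5 } })
local children = {}
for i = 1, 4 do
  children[i] = counted({ width = 20, margin = { left = 2, right = 3 } })
end

reads = 0
local naiveXs = layoutRowNaive(element, children)
local naiveReads = reads

reads = 0
local hoistedXs = layoutRowHoisted(element, children)
local hoistedReads = reads
```

For four children, the un-hoisted loop performs 40 counted reads (10 per child). The hoisted loop performs 19 (3 up front plus 4 per child), and both produce identical positions.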
Files Modified:
- `modules/LayoutEngine.lua`
  - Optimized wrapping logic (lines 403-441)
  - Optimized line height calculation (lines 458-487)
  - Optimized positioning loop for horizontal layout (lines 586-658)
  - Optimized positioning loop for vertical layout (lines 660-700)
4. Array Preallocation ✅ (Priority 5)
Estimated Gain: 5-10% less GC pressure
Implementation:
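- Used `table.create(#lines)` to preallocate the `lineHeights` array when `table.create` is available
- Graceful fallback to `{}` when it is not
- Reduces GC pressure by avoiding table resizing during growth
- Caveat: `table.create` is not part of LuaJIT (it exists in Luau). Stock LuaJIT exposes preallocation as `table.new(narray, nhash)` via `require("table.new")`, so unless `table.create` is defined elsewhere, LÖVE's LuaJIT takes the `{}` fallback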
Files Modified:
- `modules/LayoutEngine.lua`
  - Preallocated `lineHeights` array (line 460)
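The sketch below shows a preallocation helper that probes for each runtime's API instead of checking for `table.create` alone, then uses it to build `lineHeights`. `newArray` and `computeLineHeights` are hypothetical names, and the child and line table shapes are assumed for illustration. `table.new` is LuaJIT 2.1's extension, loaded with `require("table.new")`.

```lua
-- Illustrative sketch only: pick a preallocation strategy once, at load time.
local newArray
if type(table.create) == "function" then
  -- e.g. Luau: table.create(n) preallocates array space for n elements
  newArray = function(n) return table.create(n) end
else
  local ok, tableNew = pcall(require, "table.new")
  if ok and type(tableNew) == "function" then
    -- LuaJIT 2.1: table.new(narray, nhash) preallocates both parts
    newArray = function(n) return tableNew(n, 0) end
  else
    -- Standard Lua: no preallocation API, so the table grows on demand
    newArray = function() return {} end
  end
end

-- Cross-axis size of each line: its largest child plus that child's margins.
local function computeLineHeights(lines, isHorizontal)
  local lineHeights = newArray(#lines)
  for li = 1, #lines do
    local line = lines[li]
    local maxSize = 0
    for ci = 1, #line do
      local child = line[ci]
      local margin = child.margin
      local size
      if isHorizontal then
        size = child.height + margin.top + margin.bottom
      else
        size = child.width + margin.left + margin.right
      end
      if size > maxSize then maxSize = size end
    end
    lineHeights[li] = maxSize
  end
  return lineHeights
end

-- Usage: two rows of a horizontal wrap layout
local function child(h)
  return { width = 10, height = h, margin = { left = 0, right = 0, top = 1, bottom = 1 } }
end
local lines = { { child(10), child(30) }, { child(20) } }
local heights = computeLineHeights(lines, true)
```

Resolving the strategy once at load time keeps the feature check out of the layout loop itself.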
Testing
✅ All 1257 tests passing
Ran full test suite with:
```bash
lua testing/runAll.lua --no-coverage
```
No regressions detected: all layout assertions in the test suite still pass.
Performance Comparison
Before (FFI Optimizations Only)
- Gain: 5-10% improvement
- Bottleneck: O(n²) layout algorithm with repeated table access
- Issue: targeted memory allocation rather than the layout algorithm
After (Algorithmic Optimizations)
- Estimated Gain: 40-60% less layout time (roughly 1.7-2.5x faster)
- Approach: Target real bottlenecks (dirty flags, caching, local hoisting)
- Benefit: Fewer layouts + faster layout calculations
Combined (FFI + Algorithmic)
- Total Estimated Gain: 45-65% improvement
- Reality: Most gains come from algorithmic improvements, not FFI
What Was NOT Implemented
Single-Pass Layout (Priority 1)
Estimated Gain: 40-60% faster. Not implemented due to complexity.
This would require major refactoring of the layout algorithm to:
- Combine size calculation and positioning into single pass
- Cache dimensions during first pass
- Eliminate redundant iterations
Recommendation: Consider for future optimization if more performance is needed after measuring gains from current optimizations.
Code Quality
- ✅ Zero breaking changes
- ✅ All tests passing
- ✅ Maintains existing API
- ✅ Backward compatible
- ✅ Clear comments explaining optimizations
- ✅ Graceful fallbacks (e.g., `table.create`)
Benchmarking
To benchmark improvements, use the existing profiling tools:
```bash
# Run FFI comparison profile
love profiling/ ffi_comparison_profile
# After 5 phases, press 'S' to save report
# Compare FPS and frame times before/after
```
Expected Results (estimates, not yet measured):
- Small UIs (50 elements): 20-30% faster
- Medium UIs (200 elements): 40-50% faster
- Large UIs (1000 elements): 50-60% faster
- Deep nesting (10 levels): 60%+ faster (dirty flags help most here)
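A plain-Lua micro-benchmark offers a quick check of the hoisting effect outside LÖVE. It times two versions of the same loop with `os.clock` (CPU time). `bench`, `sumNaive` and `sumHoisted` are hypothetical names. Micro-benchmark timings depend heavily on the interpreter, and LuaJIT's JIT compiler may narrow the gap, so this complements rather than replaces the frame-time comparison above.

```lua
-- Illustrative micro-benchmark sketch: times hoisted vs un-hoisted field access.
-- os.clock() reports CPU time; results vary by interpreter and machine.
local function bench(fn, iterations)
  collectgarbage("collect") -- start each run from a clean heap
  local start = os.clock()
  for _ = 1, iterations do
    fn()
  end
  return (os.clock() - start) / iterations -- average seconds per call
end

-- 1000 children, roughly the "large UI" case above
local element = { x = 10, padding = { left = 5 } }
local children = {}
for i = 1, 1000 do
  children[i] = { width = 20, margin = { left = 2, right = 3 } }
end

local function sumNaive()
  local total = 0
  for i = 1, #children do
    total = total + element.x + element.padding.left +
      children[i].margin.left + children[i].margin.right
  end
  return total
end

local function sumHoisted()
  local originX = element.x + element.padding.left
  local total = 0
  for i = 1, #children do
    local margin = children[i].margin
    total = total + originX + margin.left + margin.right
  end
  return total
end

local naiveTime = bench(sumNaive, 200)
local hoistedTime = bench(sumHoisted, 200)
print(string.format("naive: %.3f us/call, hoisted: %.3f us/call",
  naiveTime * 1e6, hoistedTime * 1e6))
```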
Next Steps
1. Measure Real-World Performance:
   - Run benchmarks on actual applications
   - Profile with 50, 200, 1000 element UIs
   - Compare before/after metrics
2. Consider Single-Pass Layout:
   - If more performance is needed after measuring
   - Estimated 40-60% additional gain
   - Complex refactor; weigh benefit vs cost
3. Profile Edge Cases:
   - Deep nesting scenarios
   - Frequent property updates
   - Immediate mode vs retained mode
Conclusion
These algorithmic optimizations address the real performance bottlenecks identified through profiling:
- ✅ Dirty flags - Skip unnecessary layout recalculations
- ✅ Dimension caching - Avoid redundant calculations
- ✅ Local hoisting - Reduce table access overhead in hot paths
- ✅ Array preallocation - Reduce GC pressure
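Unlike the FFI optimizations (5-10% gain), these changes target what actually dominates runtime: redundant layout passes and table-access overhead in the O(n²) layout algorithm. The algorithm's complexity is unchanged; these changes reduce how often it runs and what each iteration costs.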
Bottom Line: In this codebase, simple algorithmic improvements are expected to outperform fancy memory optimizations.
Branch: algorithmic-performance-optimizations
Status: Complete, all tests passing
Recommendation: Merge after benchmarking confirms expected gains