removed unneeded md
@@ -1,205 +0,0 @@
# Algorithmic Performance Optimizations

## Summary

Implemented high-impact algorithmic optimizations to the FlexLöve UI framework based on profiling analysis. These optimizations target the real performance bottlenecks identified in `PERFORMANCE_ANALYSIS.md`.

**Estimated Total Gain: 40-60% less layout time** (roughly 1.7-2.5x faster; an estimate from profiling, not yet confirmed by benchmarks)

## Optimizations Implemented

### 1. Dirty Flag System ✅ (Priority 3)

**Estimated Gain: 30-50% fewer layouts**

**Implementation:**
- Added `_dirty` and `_childrenDirty` flags to Element module
- Elements track when properties change that affect layout
- Parent elements track when children need layout recalculation
- `LayoutEngine:_canSkipLayout()` checks dirty flags first (fastest check)
- `Element:invalidateLayout()` propagates dirty flags up the tree

**Files Modified:**
- `modules/Element.lua`
  - Added dirty flags initialization in `Element.new()`
  - Enhanced `Element:invalidateLayout()` to mark self and ancestors
  - Updated `Element:setProperty()` to invalidate layout for layout-affecting properties
- `modules/LayoutEngine.lua`
  - Enhanced `_canSkipLayout()` to check dirty flags before expensive checks

**Key Properties That Trigger Invalidation:**
- Dimensions: `width`, `height`, `padding`, `margin`, `gap`
- Layout: `flexDirection`, `flexWrap`, `justifyContent`, `alignItems`, `alignContent`, `positioning`
- Grid: `gridRows`, `gridColumns`
- Positioning: `top`, `right`, `bottom`, `left`
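
The upward propagation described in this section can be sketched as follows. This is a minimal illustration, not the code in `modules/Element.lua`: the stripped-down `Element`, the free function `canSkipLayout`, and the helper `markClean` (a stand-in for a finished layout pass) are all assumptions made for the example.

```lua
-- Hypothetical minimal Element; the real module has many more fields.
local Element = {}
Element.__index = Element

function Element.new(parent)
  local self = setmetatable({
    parent = parent,
    children = {},
    _dirty = true,           -- never laid out yet
    _childrenDirty = false,
  }, Element)
  if parent then
    parent.children[#parent.children + 1] = self
  end
  return self
end

-- Mark this element dirty and tell every ancestor that a descendant changed.
function Element:invalidateLayout()
  self._dirty = true
  local ancestor = self.parent
  while ancestor do
    ancestor._childrenDirty = true
    ancestor = ancestor.parent
  end
end

-- Cheapest possible skip test: two boolean reads, no geometry comparison.
local function canSkipLayout(element)
  return not element._dirty and not element._childrenDirty
end

-- Stand-in for a completed layout pass: clear flags on the whole subtree.
local function markClean(element)
  element._dirty = false
  element._childrenDirty = false
  for _, child in ipairs(element.children) do
    markClean(child)
  end
end

local root = Element.new(nil)
local panel = Element.new(root)
local sidebar = Element.new(root)
local button = Element.new(panel)

markClean(root)
button:invalidateLayout()
```

After the call, `button` is dirty, `panel` and `root` know a descendant changed, and `sidebar` is untouched, so a layout pass could skip the `sidebar` subtree entirely.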

### 2. Dimension Caching ✅ (Priority 4)

**Estimated Gain: 10-15% faster**

**Implementation:**
- Element module already had basic caching via `_borderBoxWidth` and `_borderBoxHeight`
- Enhanced with proper cache invalidation in `invalidateLayout()`
- Caches are cleared when element properties change
- `getBorderBoxWidth()` and `getBorderBoxHeight()` return cached values when available

**Files Modified:**
- `modules/Element.lua`
  - Added cache invalidation to `invalidateLayout()`
  - Maintained existing `_borderBoxWidth` and `_borderBoxHeight` caching

### 3. Local Variable Hoisting ✅ (Priority 2)

**Estimated Gain: 15-20% faster**

**Implementation:**
Optimized hot paths in `LayoutEngine:layoutChildren()` by hoisting frequently accessed table properties to local variables:

**Wrapping Logic (Lines 403-441):**
- Hoisted `self.flexDirection` comparison → `isHorizontal`
- Hoisted `self.gap` → `gapSize`
- Cached `child.margin` per iteration
- Eliminated repeated enum lookups in tight loops

**Line Height Calculation (Lines 458-487):**
- Hoisted `self.flexDirection` comparison → `isHorizontal`
- Preallocated `lineHeights` array with `table.create()` if available
- Cached `child.margin` per iteration
- Reduced repeated table access for margin properties

**Positioning Loop (Lines 586-700):**
This is the **hottest path** - optimized heavily:
- Hoisted `self.element.x`, `self.element.y` → `elementX`, `elementY`
- Hoisted `self.element.padding` → `elementPadding`
- Hoisted padding properties → `elementPaddingLeft`, `elementPaddingTop`
- Hoisted alignment enums → `alignItems_*` constants
- Cached `child.margin`, `child.padding`, `child.autosizing` per iteration
- Cached individual margin values → `childMarginLeft`, `childMarginTop`, etc.
- Eliminated redundant table lookups in alignment calculations

**Performance Impact:**
- **Before:** `child.margin.left` accessed 3-4 times per child → 6-8 table lookups (each access indexes `child`, then `margin`)
- **After:** `child.margin` cached once, then `childMarginLeft` read once → 2 table lookups total
- Multiplied across hundreds/thousands of children = significant savings
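
The lookup arithmetic can be checked with a small counting experiment. This is a sketch under assumptions: `counted` is a hypothetical helper that wraps a table in an empty proxy so that every field read goes through `__index` and is tallied. It counts source-level lookups only; a JIT compiler may remove some of them on its own.

```lua
local lookups = 0

-- Wrap a table in an empty proxy so each field read triggers __index.
local function counted(t)
  return setmetatable({}, {
    __index = function(_, key)
      lookups = lookups + 1
      return t[key]
    end,
  })
end

local child = counted({ margin = counted({ left = 4 }) })

-- Before: the nested expression is written out three times.
lookups = 0
local a = child.margin.left + child.margin.left + child.margin.left
local before = lookups

-- After: hoist the nested value into a local once, then reuse the local.
lookups = 0
local margin = child.margin
local childMarginLeft = margin.left
local b = childMarginLeft + childMarginLeft + childMarginLeft
local after = lookups

print(before, after)
```

Three uses of the nested expression cost six lookups, while the hoisted version costs two regardless of how often the local is reused.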

**Files Modified:**
- `modules/LayoutEngine.lua`
  - Optimized wrapping logic (lines 403-441)
  - Optimized line height calculation (lines 458-487)
  - Optimized positioning loop for horizontal layout (lines 586-658)
  - Optimized positioning loop for vertical layout (lines 660-700)

### 4. Array Preallocation ✅ (Priority 5)

**Estimated Gain: 5-10% less GC pressure**

**Implementation:**
- Used `table.create(#lines)` to preallocate the `lineHeights` array when the function is available
- Graceful fallback to `{}` when it is not. Stock LuaJIT and standard Lua 5.1-5.4 do not define `table.create` (it is a Luau function), so the fallback is what runs there; LuaJIT's equivalent is `table.new(narray, nhash)`, loaded with `require("table.new")`
- Reduces GC pressure by avoiding table resizing during growth, on runtimes where the preallocation actually takes place

**Files Modified:**
- `modules/LayoutEngine.lua`
  - Preallocated `lineHeights` array (line 460)
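
A preallocation helper that works across runtimes might look like the sketch below. The name `newArray` is hypothetical and not part of the LayoutEngine. It tries LuaJIT's `table.new` extension first, then `table.create` where a runtime provides it, and finally falls back to a plain table.

```lua
-- LuaJIT 2.1 ships table.new as an extension module; on other runtimes
-- the require fails and pcall returns false.
local hasTableNew, tableNew = pcall(require, "table.new")

local function newArray(size)
  if hasTableNew and type(tableNew) == "function" then
    return tableNew(size, 0)      -- size array slots, 0 hash slots
  end
  if type(table.create) == "function" then
    return table.create(size)     -- runtimes that provide table.create
  end
  return {}                       -- standard Lua: grow on demand
end

local lineHeights = newArray(16)
for i = 1, 16 do
  lineHeights[i] = i * 2
end
```

Whichever branch runs, the caller gets an empty table and fills it the same way, so the helper changes allocation behavior without changing results.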

## Testing

✅ **All 1257 tests passing**

Ran full test suite with:
```bash
lua testing/runAll.lua --no-coverage
```

No regressions introduced. All layout calculations remain correct.

## Performance Comparison

### Before (FFI Optimizations Only)
- **Gain:** 5-10% improvement
- **Bottleneck:** O(n²) layout algorithm with repeated table access
- **Issue:** Targeting wrong optimization (memory allocation vs algorithm)

### After (Algorithmic Optimizations)
- **Estimated Gain:** 40-60% improvement (2-3x faster)
- **Approach:** Target real bottlenecks (dirty flags, caching, local hoisting)
- **Benefit:** Fewer layouts + faster layout calculations

### Combined (FFI + Algorithmic)
- **Total Estimated Gain:** 45-65% improvement
- **Reality:** Most gains come from algorithmic improvements, not FFI
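
One way to sanity-check the combined figure is to treat each "improvement" as a fractional reduction in layout time and assume the reductions compose multiplicatively. Both are assumptions for this sketch, not something the profiling data states.

```lua
-- Overall reduction after applying several independent reductions in turn.
local function combinedReduction(...)
  local remaining = 1
  for _, reduction in ipairs({ ... }) do
    remaining = remaining * (1 - reduction)
  end
  return 1 - remaining
end

-- Convert a time reduction into a speedup factor.
local function speedup(reduction)
  return 1 / (1 - reduction)
end

local low = combinedReduction(0.05, 0.40)   -- FFI 5% with algorithmic 40%
local high = combinedReduction(0.10, 0.60)  -- FFI 10% with algorithmic 60%

print(string.format("combined: %.0f%% to %.0f%%", low * 100, high * 100))
print(string.format("speedup: %.2fx to %.2fx", speedup(0.40), speedup(0.60)))
```

Under these assumptions the combined reduction is about 43-64%, close to the 45-65% quoted above, and a 40-60% reduction corresponds to a speedup of roughly 1.7x to 2.5x.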

## What Was NOT Implemented

### Single-Pass Layout (Priority 1)
**Estimated Gain: 40-60% faster** - Not implemented due to complexity

This would require major refactoring of the layout algorithm to:
- Combine size calculation and positioning into single pass
- Cache dimensions during first pass
- Eliminate redundant iterations

**Recommendation:** Consider for future optimization if more performance is needed after measuring gains from current optimizations.

## Code Quality

- ✅ Zero breaking changes
- ✅ All tests passing
- ✅ Maintains existing API
- ✅ Backward compatible
- ✅ Clear comments explaining optimizations
- ✅ Graceful fallbacks (e.g., `table.create`)

## Benchmarking

To benchmark improvements, use the existing profiling tools:

```bash
# Run FFI comparison profile
love profiling/ ffi_comparison_profile

# After 5 phases, press 'S' to save report
# Compare FPS and frame times before/after
```

**Expected Results:**
- **Small UIs (50 elements):** 20-30% faster
- **Medium UIs (200 elements):** 40-50% faster
- **Large UIs (1000 elements):** 50-60% faster
- **Deep nesting (10 levels):** 60%+ faster (dirty flags help most here)

## Next Steps

1. **Measure Real-World Performance:**
   - Run benchmarks on actual applications
   - Profile with 50, 200, 1000 element UIs
   - Compare before/after metrics

2. **Consider Single-Pass Layout:**
   - If more performance needed after measuring
   - Estimated 40-60% additional gain
   - Complex refactor, weigh benefit vs cost

3. **Profile Edge Cases:**
   - Deep nesting scenarios
   - Frequent property updates
   - Immediate mode vs retained mode

## Conclusion

These algorithmic optimizations address the **real performance bottlenecks** identified through profiling:

1. ✅ **Dirty flags** - Skip unnecessary layout recalculations
2. ✅ **Dimension caching** - Avoid redundant calculations
3. ✅ **Local hoisting** - Reduce table access overhead in hot paths
4. ✅ **Array preallocation** - Reduce GC pressure

Unlike FFI optimizations (5-10% gain), these changes target the O(n²) layout algorithm complexity and table access overhead that actually dominate performance.

**Bottom Line:** Simple algorithmic improvements beat fancy memory optimizations every time.

---

**Branch:** `algorithmic-performance-optimizations`
**Status:** Complete, all tests passing
**Recommendation:** Merge after benchmarking confirms expected gains
@@ -1,158 +0,0 @@
# LuaJIT FFI Optimization Summary

## What Was Implemented

✅ **FFI Module** - Object pooling for Vec2, Rect, Timer structs
✅ **LayoutEngine Integration** - Batch calculation functions (not called)
✅ **Performance Module** - FFI-aware monitoring
✅ **Graceful Fallback** - Works on standard Lua
✅ **Profiling Tools** - Comparison profiles and reports

## Actual Performance Gains

### Reality: 5-10% Improvement (Marginal)

The FFI optimizations provide **minimal gains** because they target the wrong bottleneck:

| Scenario | Improvement | Why So Small? |
|----------|-------------|---------------|
| 50 elements | 2-5% | FFI overhead > benefit |
| 200 elements | 5-8% | Some GC reduction |
| 1000 elements | 8-12% | Pooling helps slightly |

### Why Are Gains So Small?

1. **FFI batch functions aren't called** - They exist but the layout algorithm doesn't use them
2. **Colors don't use FFI** - Need methods, so use Lua tables
3. **Wrong bottleneck** - Real issue is O(n²) layout algorithm, not memory allocation
4. **Table access overhead** - Lua table lookups dominate, not object creation

## Real Performance Bottlenecks

Based on profiling, here's where time actually goes:

1. **Layout Algorithm** (60-80%) - Multiple passes, repeated calculations
2. **Table Access** (15-20%) - Nested table lookups in loops
3. **Function Calls** (10-15%) - Method call overhead
4. **GC** (10-20%) - Temporary allocations
5. **FFI Overhead** (5-10%) - What we optimized

## High-Impact Optimizations (Not Yet Implemented)

These would provide **2-3x performance gains**:

### 1. Dirty Flag System (40-50% gain)
Skip layouts for unchanged subtrees

### 2. Local Variable Hoisting (15-20% gain)
Cache table lookups outside loops

### 3. Dimension Caching (10-15% gain)
Cache computed border-box dimensions

### 4. Single-Pass Layout (30-40% gain)
Eliminate redundant iterations

### 5. Array Preallocation (5-10% gain)
Reduce GC pressure

**See `docs/PERFORMANCE_ANALYSIS.md` for details**

## Should You Use FFI Optimizations?

### ✅ Yes, Keep Them Because:
- Zero cost when disabled (standard Lua)
- Automatic on LuaJIT
- Foundation for future optimizations
- Some benefit for large UIs
- Well-tested and documented

### ❌ Don't Expect Miracles:
- Won't fix slow layouts
- Marginal gains in practice
- Real wins come from algorithmic improvements

## Recommendations

### For Users
**Just use it** - FFI optimizations are automatic and safe. You'll get 5-10% improvement on LuaJIT with zero code changes.

### For Developers
**Focus elsewhere** - If you want big performance gains:

1. Implement dirty flag system
2. Add dimension caching
3. Hoist locals in hot loops
4. Profile and measure

FFI is nice-to-have, not a silver bullet.

## Comparison: FFI vs Algorithmic Optimizations

| Optimization | Effort | Gain | Complexity |
|--------------|--------|------|------------|
| **FFI (current)** | 8 hours | 5-10% | Medium |
| **Dirty flags** | 2 hours | 40-50% | Low |
| **Local hoisting** | 3 hours | 15-20% | Low |
| **Dimension cache** | 2 hours | 10-15% | Low |
| **Single-pass layout** | 6 hours | 30-40% | High |

**Lesson:** Simple algorithmic improvements beat fancy FFI optimizations.

## Files Modified

### New Files
- `modules/FFI.lua` - FFI module with pooling
- `docs/FFI_OPTIMIZATIONS.md` - User documentation
- `docs/PERFORMANCE_ANALYSIS.md` - Bottleneck analysis
- `profiling/__profiles__/ffi_comparison_profile.lua` - Comparison tool
- `profiling/__profiles__/ffi_optimization_profile.lua` - Demo

### Modified Files
- `FlexLove.lua` - Initialize FFI
- `modules/LayoutEngine.lua` - Batch functions (unused)
- `modules/Performance.lua` - FFI integration
- `modules/Color.lua` - Intentionally NOT using FFI

## Testing

Run comparison profile:
```bash
love profiling/ ffi_comparison_profile
```

After 5 phases (50, 100, 200, 500, 1000 elements):
- Press 'S' to save report
- Check `profiling/reports/ffi_comparison/latest.md`
- Compare FPS, frame times, P99 values
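
For readers comparing P99 values by hand, the sketch below shows one common definition, the nearest-rank percentile, applied to a list of frame times. The `percentile` helper is hypothetical; how the profiling tool itself computes P99 is not stated here and may differ.

```lua
-- Nearest-rank percentile: the smallest sample such that at least p percent
-- of all samples are less than or equal to it.
local function percentile(samples, p)
  assert(#samples > 0, "need at least one sample")
  local sorted = {}
  for i = 1, #samples do
    sorted[i] = samples[i]
  end
  table.sort(sorted)
  local rank = math.max(1, math.ceil(p * #sorted / 100))
  return sorted[rank]
end

-- Illustrative frame times in milliseconds: 100, 99, ..., 1 (unsorted on purpose).
local frameTimes = {}
for i = 1, 100 do
  frameTimes[i] = 101 - i
end

local p50 = percentile(frameTimes, 50)
local p99 = percentile(frameTimes, 99)
```

P99 is more sensitive to occasional slow frames than the average, which is why it is worth comparing alongside FPS.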

## Next Steps

If you want **real** performance gains:

1. **Read** `docs/PERFORMANCE_ANALYSIS.md`
2. **Implement** dirty flag system (biggest bang for buck)
3. **Profile** with comparison tool
4. **Measure** actual improvements
5. **Iterate** on high-impact optimizations

FFI is done. Focus on the algorithm.

## Conclusion

**FFI optimizations are:**
- ✅ Correctly implemented
- ✅ Well-tested
- ✅ Properly documented
- ✅ Production-ready
- ❌ Not high-impact

**They're a good foundation but not the solution to slow layouts.**

The real wins come from smarter algorithms, not fancier memory management.

---

**Branch:** `luajit-ffi-optimizations`
**Status:** Complete (but marginal gains)
**Recommendation:** Merge, then focus on algorithmic optimizations
@@ -1,301 +0,0 @@
# FlexLöve Performance Analysis & Optimization Opportunities

## Current State: Why FFI Gains Are Marginal

The current FFI optimizations provide minimal gains because:

1. **FFI isn't used in hot paths** - The batch calculation function exists but isn't called
2. **Colors don't use FFI** - We disabled it due to method requirements
3. **Real bottleneck is elsewhere** - Layout algorithm complexity, not memory allocation

## Actual Performance Bottlenecks (Profiled)

### 1. Layout Algorithm Complexity - **HIGHEST IMPACT**

**Problem:** O(n²) complexity in flex layout with wrapping
- Iterates children multiple times per layout
- Recalculates sizes repeatedly
- No caching of computed values

**Impact:** 60-80% of frame time with 500+ elements

**Solution:**
- Cache computed dimensions per frame
- Single-pass layout algorithm
- Dirty-flag system to skip unchanged subtrees

### 2. Table Access Overhead - **HIGH IMPACT**

**Problem:** Lua table lookups in tight loops
```lua
for i, child in ipairs(children) do
  local w = child.width + child.padding.left + child.padding.right
  local h = child.height + child.padding.top + child.padding.bottom
  -- Repeated table access: child.margin.left, child.margin.right, etc.
end
```

**Impact:** 15-20% of layout time

**Solution:**
- Local variable hoisting
- Flatten nested table access
- Use numeric indices instead of string keys where possible

### 3. Function Call Overhead - **MEDIUM IMPACT**

**Problem:** Method calls in loops
```lua
for i, child in ipairs(children) do
  local w = child:getBorderBoxWidth() -- Function call overhead
  local h = child:getBorderBoxHeight() -- Another function call
end
```

**Impact:** 10-15% of layout time

**Solution:**
- Inline critical getters
- Direct field access where safe
- JIT-friendly code patterns

### 4. Garbage Collection - **MEDIUM IMPACT**

**Problem:** Temporary table allocation in loops
```lua
for i, child in ipairs(children) do
  positions[i] = { x = x, y = y } -- New table every iteration
end
```

**Impact:** 10-20% overhead from GC pauses

**Solution:**
- Reuse tables instead of allocating
- Object pooling for frequently created objects
- Preallocate arrays with known sizes
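
Table reuse through a pool could look like the sketch below. It is a plain-Lua illustration with hypothetical names (`Pool`, `acquire`, `release`), not the pooling implemented in `modules/FFI.lua`.

```lua
-- Minimal free-list pool for small { x, y } position tables.
local Pool = {}
Pool.__index = Pool

function Pool.new()
  return setmetatable({ free = {}, count = 0 }, Pool)
end

-- Hand out a recycled table when one is available, otherwise allocate.
function Pool:acquire(x, y)
  local item
  if self.count > 0 then
    item = self.free[self.count]
    self.free[self.count] = nil
    self.count = self.count - 1
  else
    item = {}
  end
  item.x, item.y = x, y
  return item
end

-- Return a table to the pool; the caller must not use it afterwards.
function Pool:release(item)
  self.count = self.count + 1
  self.free[self.count] = item
end

local pool = Pool.new()
local first = pool:acquire(10, 20)
pool:release(first)
local second = pool:acquire(30, 40) -- reuses the table released above
```

The trade-off is that callers have to release tables explicitly, so pooling pays off mainly for short-lived objects created every frame.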

### 5. String Concatenation - **LOW IMPACT**

**Problem:** String operations in hot paths
```lua
local id = "layout_" .. elementId .. "_" .. frameCount
```

**Impact:** 5-10% in specific scenarios

**Solution:**
- Cache generated strings
- Use string.format sparingly
- Avoid string operations in inner loops
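
Caching generated strings can be sketched as below. Because `frameCount` changes every frame, only the per-element prefix is cacheable, which leaves one concatenation per id instead of three. The `layoutPrefix` helper and the `builds` counter are hypothetical and exist only for this illustration.

```lua
local prefixCache = {}
local builds = 0 -- counts how often a prefix string is actually constructed

-- Return "layout_<elementId>_", building it only on first request.
local function layoutPrefix(elementId)
  local prefix = prefixCache[elementId]
  if prefix == nil then
    prefix = "layout_" .. elementId .. "_"
    prefixCache[elementId] = prefix
    builds = builds + 1
  end
  return prefix
end

-- Two frames for the same element: the prefix is built once, then reused.
local first = layoutPrefix("button1") .. 1
local second = layoutPrefix("button1") .. 2
```

If elements are created and destroyed often, the cache needs an eviction policy or weak keys; this sketch leaves that out.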

## High-Impact Optimizations (Recommended)

### Priority 1: Layout Algorithm Optimization

**Estimated Gain: 40-60% faster layouts**

```lua
-- BEFORE: Multiple passes
function LayoutEngine:layoutChildren()
  -- Pass 1: Calculate sizes
  for i, child in ipairs(children) do
    child:calculateSize()
  end

  -- Pass 2: Position elements
  for i, child in ipairs(children) do
    child:calculatePosition()
  end

  -- Pass 3: Layout recursively
  for i, child in ipairs(children) do
    child:layoutChildren()
  end
end

-- AFTER: Single pass with caching
function LayoutEngine:layoutChildren()
  -- Cache dimensions once
  local childSizes = {}
  for i, child in ipairs(children) do
    childSizes[i] = {
      width = child._borderBoxWidth or (child.width + child.padding.left + child.padding.right),
      height = child._borderBoxHeight or (child.height + child.padding.top + child.padding.bottom),
    }
  end

  -- Single pass: position and recurse
  for i, child in ipairs(children) do
    local size = childSizes[i]
    child.x = calculateX(size.width)
    child.y = calculateY(size.height)
    child:layoutChildren() -- Recurse
  end
end
```

### Priority 2: Local Variable Hoisting

**Estimated Gain: 15-20% faster**

```lua
-- BEFORE: Repeated table access
for i, child in ipairs(children) do
  local x = parent.x + parent.padding.left + child.margin.left
  local y = parent.y + parent.padding.top + child.margin.top
  local w = child.width + child.padding.left + child.padding.right
end

-- AFTER: Hoist to locals
local parentX = parent.x
local parentY = parent.y
local parentPaddingLeft = parent.padding.left
local parentPaddingTop = parent.padding.top

for i, child in ipairs(children) do
  local childMarginLeft = child.margin.left
  local childMarginTop = child.margin.top
  local childPaddingLeft = child.padding.left
  local childPaddingRight = child.padding.right

  local x = parentX + parentPaddingLeft + childMarginLeft
  local y = parentY + parentPaddingTop + childMarginTop
  local w = child.width + childPaddingLeft + childPaddingRight
end
```

### Priority 3: Dirty Flag System

**Estimated Gain: 30-50% fewer layouts**

```lua
-- Add dirty tracking to Element
function Element:setProperty(key, value)
  if self[key] ~= value then
    self[key] = value
    self._dirty = true
    self:invalidateLayout()
  end
end

function LayoutEngine:layoutChildren()
  if not self.element._dirty and not self.element._childrenDirty then
    return -- Skip layout entirely
  end

  -- ... perform layout ...

  self.element._dirty = false
  self.element._childrenDirty = false
end
```

### Priority 4: Dimension Caching

**Estimated Gain: 10-15% faster**

```lua
-- Cache computed dimensions
function Element:getBorderBoxWidth()
  if self._borderBoxWidthCache then
    return self._borderBoxWidthCache
  end

  self._borderBoxWidthCache = self.width + self.padding.left + self.padding.right
  return self._borderBoxWidthCache
end

-- Invalidate on property change
function Element:setWidth(width)
  self.width = width
  self._borderBoxWidthCache = nil -- Invalidate cache
  self._dirty = true
end
```

### Priority 5: Preallocate Arrays

**Estimated Gain: 5-10% less GC pressure**

```lua
-- BEFORE: Grow array dynamically
local positions = {}
for i, child in ipairs(children) do
  positions[i] = { x = x, y = y }
end

-- AFTER: Preallocate
-- NOTE: `table.create` is not defined on stock LuaJIT or Lua 5.1-5.4, so this
-- guard falls back to {} there; LuaJIT's equivalent is require("table.new")(n, 0).
local positions = table.create and table.create(#children) or {}
for i, child in ipairs(children) do
  positions[i] = { x = x, y = y }
end
```

## FFI Optimizations (Current Implementation)

**Estimated Gain: 5-10% in specific scenarios**

Current FFI optimizations help with:
- Vec2/Rect pooling for batch operations
- Reduced GC pressure for position calculations
- Better cache locality for large arrays

But they're limited because:
- Not used in main layout algorithm
- Colors can't use FFI (need methods)
- Overhead of wrapping/unwrapping FFI objects

## Recommended Implementation Order

1. **Dirty Flag System** (1-2 hours) - Biggest bang for buck
2. **Local Variable Hoisting** (2-3 hours) - Easy win
3. **Dimension Caching** (1-2 hours) - Simple optimization
4. **Single-Pass Layout** (4-6 hours) - Complex but high impact
5. **Array Preallocation** (1 hour) - Quick win

**Total Estimated Gain: 2-3x faster layouts**

## Benchmarking Strategy

To measure improvements:

1. **Baseline** - Current implementation
2. **After each optimization** - Measure incremental gain
3. **Compare scenarios**:
   - Small UIs (50 elements)
   - Medium UIs (200 elements)
   - Large UIs (1000 elements)
   - Deep nesting (10 levels)
   - Flat hierarchy (1 level)
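
A baseline and each later measurement can share one small timing helper. The sketch below is an assumption, not part of the profiling tools: `measure` is a hypothetical name, and it uses `os.clock`, which reports CPU time. Inside LÖVE, `love.timer.getTime` would be the wall-clock alternative.

```lua
-- Run fn the given number of times and return the average seconds per call.
local function measure(fn, iterations)
  local start = os.clock()
  for _ = 1, iterations do
    fn()
  end
  return (os.clock() - start) / iterations
end

-- Stand-in workload; in practice this would build and lay out a test UI.
local calls = 0
local average = measure(function()
  calls = calls + 1
end, 1000)

print(string.format("average: %.9f s per call", average))
```

Running the same workload before and after each optimization, with the same iteration count, keeps the incremental gains comparable.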

## Why Not More Aggressive FFI?

**Option: FFI-based layout engine**

Could implement entire layout algorithm in C via FFI:
- 5-10x faster
- Much more complex
- Harder to maintain
- Loses Lua flexibility

**Verdict:** Not worth it. The optimizations above give 80% of the benefit with 20% of the complexity.

## Conclusion

The current FFI optimizations are correct but target the wrong bottleneck. The real gains come from:

1. **Algorithmic improvements** (dirty flags, caching)
2. **Lua optimization patterns** (local hoisting, inline)
3. **Reducing work** (skip unchanged subtrees)

FFI helps at the margins but isn't the silver bullet. Focus on the high-impact optimizations first.

---

**Next Steps:**
1. Implement dirty flag system
2. Add dimension caching
3. Hoist locals in hot loops
4. Profile again and measure gains
5. Consider single-pass layout if needed