removed unneeded md
@@ -1,205 +0,0 @@
# Algorithmic Performance Optimizations

## Summary

Implemented high-impact algorithmic optimizations to the FlexLöve UI framework based on profiling analysis. These optimizations target the performance bottlenecks identified in `PERFORMANCE_ANALYSIS.md`.

**Estimated Total Gain: 40-60% less layout time** (about 1.7-2.5x faster; expected from profiling, not yet benchmarked. The 2-3x figure in `PERFORMANCE_ANALYSIS.md` also counts the single-pass layout, which is not implemented here.)

## Optimizations Implemented

### 1. Dirty Flag System ✅ (Priority 3)

**Estimated Gain: 30-50% fewer layouts**

**Implementation:**
- Added `_dirty` and `_childrenDirty` flags to Element module
- Elements track when properties change that affect layout
- Parent elements track when children need layout recalculation
- `LayoutEngine:_canSkipLayout()` checks dirty flags first (fastest check)
- `Element:invalidateLayout()` propagates dirty flags up the tree

**Files Modified:**
- `modules/Element.lua`
  - Added dirty flags initialization in `Element.new()`
  - Enhanced `Element:invalidateLayout()` to mark self and ancestors
  - Updated `Element:setProperty()` to invalidate layout for layout-affecting properties
- `modules/LayoutEngine.lua`
  - Enhanced `_canSkipLayout()` to check dirty flags before expensive checks

**Key Properties That Trigger Invalidation:**
- Dimensions: `width`, `height`, `padding`, `margin`, `gap`
- Layout: `flexDirection`, `flexWrap`, `justifyContent`, `alignItems`, `alignContent`, `positioning`
- Grid: `gridRows`, `gridColumns`
- Positioning: `top`, `right`, `bottom`, `left`
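
The flag propagation described above can be sketched roughly as follows. This is a minimal illustration, not the actual `modules/Element.lua` code: the `Element` table, the `LAYOUT_PROPS` set, and the `canSkipLayout`/`markClean` helpers are hypothetical stand-ins, and only the `_dirty` and `_childrenDirty` field names come from this document.

```lua
-- Hypothetical minimal element; only the flag names follow the section above.
local Element = {}
Element.__index = Element

-- Assumed subset of the layout-affecting properties listed above.
local LAYOUT_PROPS = { width = true, height = true, padding = true, margin = true, gap = true }

function Element.new(parent)
  local self = setmetatable({ parent = parent, children = {}, _dirty = true, _childrenDirty = false }, Element)
  if parent then
    parent.children[#parent.children + 1] = self
  end
  return self
end

-- Mark this element dirty and tell every ancestor that a descendant changed.
function Element:invalidateLayout()
  self._dirty = true
  local ancestor = self.parent
  while ancestor do
    ancestor._childrenDirty = true
    ancestor = ancestor.parent
  end
end

-- Only layout-affecting properties trigger invalidation.
function Element:setProperty(key, value)
  if self[key] == value then
    return
  end
  self[key] = value
  if LAYOUT_PROPS[key] then
    self:invalidateLayout()
  end
end

-- Cheapest possible check: two boolean reads.
local function canSkipLayout(element)
  return not element._dirty and not element._childrenDirty
end

-- Stand-in for "layout finished": clear the flags on the whole subtree.
local function markClean(element)
  element._dirty, element._childrenDirty = false, false
  for _, child in ipairs(element.children) do
    markClean(child)
  end
end
```

In this sketch, changing `width` on a leaf flags the leaf as `_dirty` and every ancestor as `_childrenDirty`, while changing a non-layout property such as a color leaves the whole tree skippable.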

### 2. Dimension Caching ✅ (Priority 4)

**Estimated Gain: 10-15% faster**

**Implementation:**
- Element module already had basic caching via `_borderBoxWidth` and `_borderBoxHeight`
- Enhanced with proper cache invalidation in `invalidateLayout()`
- Caches are cleared when element properties change
- `getBorderBoxWidth()` and `getBorderBoxHeight()` return cached values when available

**Files Modified:**
- `modules/Element.lua`
  - Added cache invalidation to `invalidateLayout()`
  - Maintained existing `_borderBoxWidth` and `_borderBoxHeight` caching
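
A rough sketch of why the invalidation step matters, assuming a simplified element with only `width` and horizontal padding. The constructor signature is hypothetical; `_borderBoxWidth`, `getBorderBoxWidth()`, and `invalidateLayout()` are the names used above, but their real bodies may differ.

```lua
local Element = {}
Element.__index = Element

-- Hypothetical constructor, reduced to the fields this sketch needs.
function Element.new(width, paddingLeft, paddingRight)
  return setmetatable({
    width = width,
    padding = { left = paddingLeft, right = paddingRight },
    _borderBoxWidth = nil, -- cache slot; nil means "not computed yet"
  }, Element)
end

-- Return the cached value when present, otherwise compute and store it.
function Element:getBorderBoxWidth()
  local cached = self._borderBoxWidth
  if cached then
    return cached
  end
  local padding = self.padding
  cached = self.width + padding.left + padding.right
  self._borderBoxWidth = cached
  return cached
end

-- Clearing the slot forces the next getter call to recompute.
function Element:invalidateLayout()
  self._borderBoxWidth = nil
end
```

If `width` is changed without calling `invalidateLayout()`, the getter keeps returning the old value, which is the class of bug the added invalidation guards against.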

### 3. Local Variable Hoisting ✅ (Priority 2)

**Estimated Gain: 15-20% faster**

**Implementation:**
Optimized hot paths in `LayoutEngine:layoutChildren()` by hoisting frequently accessed table properties to local variables:

**Wrapping Logic (Lines 403-441):**
- Hoisted `self.flexDirection` comparison → `isHorizontal`
- Hoisted `self.gap` → `gapSize`
- Cached `child.margin` per iteration
- Eliminated repeated enum lookups in tight loops

**Line Height Calculation (Lines 458-487):**
- Hoisted `self.flexDirection` comparison → `isHorizontal`
- Preallocated `lineHeights` array with `table.create()` if available
- Cached `child.margin` per iteration
- Reduced repeated table access for margin properties

**Positioning Loop (Lines 586-700):**
This is the **hottest path** - optimized heavily:
- Hoisted `self.element.x`, `self.element.y` → `elementX`, `elementY`
- Hoisted `self.element.padding` → `elementPadding`
- Hoisted padding properties → `elementPaddingLeft`, `elementPaddingTop`
- Hoisted alignment enums → `alignItems_*` constants
- Cached `child.margin`, `child.padding`, `child.autosizing` per iteration
- Cached individual margin values → `childMarginLeft`, `childMarginTop`, etc.
- Eliminated redundant table lookups in alignment calculations

**Performance Impact:**
- **Before:** `child.margin.left` accessed 3-4 times per child → 6-8 table lookups (each access reads `child.margin`, then `.left`)
- **After:** `child.margin` cached once, then `childMarginLeft` used → 2 table lookups total
- Multiplied across hundreds/thousands of children = significant savings

**Files Modified:**
- `modules/LayoutEngine.lua`
  - Optimized wrapping logic (lines 403-441)
  - Optimized line height calculation (lines 458-487)
  - Optimized positioning loop for horizontal layout (lines 586-658)
  - Optimized positioning loop for vertical layout (lines 660-700)
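
The lookup counts above can be checked with a small experiment. The sketch below wraps tables in a counting proxy (the `counted` helper is purely illustrative and not part of FlexLöve) and compares four direct reads of `child.margin.left` with one hoisted read. It counts lookups at the language level only; a JIT compiler may already remove some of them in compiled traces.

```lua
-- Wrap a table so every field read bumps a shared counter (illustration only).
local lookups = 0
local function counted(fields)
  return setmetatable({}, {
    __index = function(_, key)
      lookups = lookups + 1
      return fields[key]
    end,
  })
end

local child = counted({ margin = counted({ left = 4, top = 2 }) })

-- Before: each `child.margin.left` costs two lookups (child.margin, then .left).
lookups = 0
local sumBefore = child.margin.left + child.margin.left + child.margin.left + child.margin.left
local before = lookups

-- After: read once into a local, then reuse the local.
lookups = 0
local childMarginLeft = child.margin.left
local sumAfter = childMarginLeft * 4
local after = lookups

print(before, after)
```

Both variants compute the same sum; only the number of table reads differs.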

### 4. Array Preallocation ✅ (Priority 5)

**Estimated Gain: 5-10% less GC pressure**

**Implementation:**
- Used `table.create(#lines)` to preallocate the `lineHeights` array when the runtime provides it. Note that stock LuaJIT does not define `table.create`; its equivalent is the `table.new` extension (loaded with `require("table.new")`), so on LuaJIT the fallback below is what runs unless `table.new` is wired in
- Graceful fallback to `{}` when `table.create` is unavailable
- Reduces GC pressure by avoiding table resizing during growth

**Files Modified:**
- `modules/LayoutEngine.lua`
  - Preallocated `lineHeights` array (line 460)
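
A hedged sketch of a preallocation helper that works across runtimes. `newArray` is a hypothetical name; the only library calls used are `table.create` (where the runtime defines it) and LuaJIT's `table.new` extension, with a plain `{}` as the last resort.

```lua
-- Pick the best available preallocation strategy once, at load time.
local newArray
if type(table.create) == "function" then
  -- Runtimes that ship table.create.
  newArray = function(n)
    return table.create(n)
  end
else
  -- LuaJIT 2.1 exposes table.new(narray, nhash) as a loadable extension.
  local ok, tableNew = pcall(require, "table.new")
  if ok and type(tableNew) == "function" then
    newArray = function(n)
      return tableNew(n, 0)
    end
  else
    -- Standard Lua: no preallocation, the table grows as it is filled.
    newArray = function()
      return {}
    end
  end
end

-- Usage: size the array for the number of lines, then fill it.
local lineCount = 8
local lineHeights = newArray(lineCount)
for i = 1, lineCount do
  lineHeights[i] = i * 2
end
```

Resolving the strategy once keeps the per-call cost to a single function call, which matters if the helper sits in a hot path.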

## Testing

✅ **All 1257 tests passing**

Ran full test suite with:
```bash
lua testing/runAll.lua --no-coverage
```

No regressions introduced. All layout calculations remain correct.

## Performance Comparison

### Before (FFI Optimizations Only)
- **Gain:** 5-10% improvement
- **Bottleneck:** O(n²) layout algorithm with repeated table access
- **Issue:** Targeting wrong optimization (memory allocation vs algorithm)

### After (Algorithmic Optimizations)
- **Estimated Gain:** 40-60% less layout time (about 1.7-2.5x faster)
- **Approach:** Target real bottlenecks (dirty flags, caching, local hoisting)
- **Benefit:** Fewer layouts + faster layout calculations

### Combined (FFI + Algorithmic)
- **Total Estimated Gain:** 45-65% improvement
- **Reality:** Most gains come from algorithmic improvements, not FFI
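
One way to sanity-check how the percentages above relate to each other, under two assumptions that the profiling notes do not state: "improvement" means a reduction in layout time, and the FFI and algorithmic gains are independent and stack multiplicatively. The function names are illustrative.

```lua
-- Fraction of time removed when independent reductions are stacked:
-- the remaining time is the product of the individual remaining fractions.
local function combineReductions(...)
  local remaining = 1
  for i = 1, select("#", ...) do
    remaining = remaining * (1 - select(i, ...))
  end
  return 1 - remaining
end

-- Speedup factor that corresponds to removing `reduction` of the time.
local function speedup(reduction)
  return 1 / (1 - reduction)
end

local low = combineReductions(0.05, 0.40)  -- FFI 5% with algorithmic 40%
local high = combineReductions(0.10, 0.60) -- FFI 10% with algorithmic 60%
print(string.format("combined: %.0f%% to %.0f%% less time", low * 100, high * 100))
print(string.format("algorithmic only: %.2fx to %.2fx", speedup(0.40), speedup(0.60)))
```

Under these assumptions the combined range comes out at about 43-64%, close to the 45-65% quoted above, and a 40-60% time reduction corresponds to roughly 1.7-2.5x.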

## What Was NOT Implemented

### Single-Pass Layout (Priority 1)
**Estimated Gain: 40-60% faster** - Not implemented due to complexity

This would require major refactoring of the layout algorithm to:
- Combine size calculation and positioning into single pass
- Cache dimensions during first pass
- Eliminate redundant iterations

**Recommendation:** Consider for future optimization if more performance is needed after measuring gains from current optimizations.

## Code Quality

- ✅ Zero breaking changes
- ✅ All tests passing
- ✅ Maintains existing API
- ✅ Backward compatible
- ✅ Clear comments explaining optimizations
- ✅ Graceful fallbacks (e.g., `table.create`)

## Benchmarking

To benchmark improvements, use the existing profiling tools:

```bash
# Run FFI comparison profile
love profiling/ ffi_comparison_profile

# After 5 phases, press 'S' to save report
# Compare FPS and frame times before/after
```

**Expected Results:**
- **Small UIs (50 elements):** 20-30% faster
- **Medium UIs (200 elements):** 40-50% faster
- **Large UIs (1000 elements):** 50-60% faster
- **Deep nesting (10 levels):** 60%+ faster (dirty flags help most here)

## Next Steps

1. **Measure Real-World Performance:**
   - Run benchmarks on actual applications
   - Profile with 50, 200, 1000 element UIs
   - Compare before/after metrics

2. **Consider Single-Pass Layout:**
   - If more performance needed after measuring
   - Estimated 40-60% additional gain
   - Complex refactor, weigh benefit vs cost

3. **Profile Edge Cases:**
   - Deep nesting scenarios
   - Frequent property updates
   - Immediate mode vs retained mode

## Conclusion

These algorithmic optimizations address the **real performance bottlenecks** identified through profiling:

1. ✅ **Dirty flags** - Skip unnecessary layout recalculations
2. ✅ **Dimension caching** - Avoid redundant calculations
3. ✅ **Local hoisting** - Reduce table access overhead in hot paths
4. ✅ **Array preallocation** - Reduce GC pressure

Unlike FFI optimizations (5-10% gain), these changes target the O(n²) layout algorithm complexity and table access overhead that actually dominate performance.

**Bottom Line:** Simple algorithmic improvements beat fancy memory optimizations every time.

---

**Branch:** `algorithmic-performance-optimizations`
**Status:** Complete, all tests passing
**Recommendation:** Merge after benchmarking confirms expected gains
@@ -1,158 +0,0 @@
# LuaJIT FFI Optimization Summary

## What Was Implemented

- ✅ **FFI Module** - Object pooling for Vec2, Rect, Timer structs
- ✅ **LayoutEngine Integration** - Batch calculation functions (not called)
- ✅ **Performance Module** - FFI-aware monitoring
- ✅ **Graceful Fallback** - Works on standard Lua
- ✅ **Profiling Tools** - Comparison profiles and reports
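
For readers unfamiliar with the pattern, here is a minimal sketch of struct pooling with a plain-Lua fallback. It is not the contents of `modules/FFI.lua`: the struct name `sketch_vec2_t` and the `acquire`/`release` functions are hypothetical, and only `ffi.cdef` and `ffi.typeof` from LuaJIT's FFI library are used.

```lua
-- Detect the FFI library; on standard Lua the require fails and we fall back.
local hasFFI, ffi = pcall(require, "ffi")

local newVec2
if hasFFI then
  -- Hypothetical struct name, chosen to avoid clashing with real definitions.
  ffi.cdef("typedef struct { double x, y; } sketch_vec2_t;")
  local Vec2 = ffi.typeof("sketch_vec2_t")
  newVec2 = function()
    return Vec2()
  end
else
  newVec2 = function()
    return { x = 0, y = 0 }
  end
end

-- Simple stack-based pool: released objects are handed out again.
local pool, poolSize = {}, 0

local function acquire(x, y)
  local v
  if poolSize > 0 then
    v = pool[poolSize]
    pool[poolSize] = nil
    poolSize = poolSize - 1
  else
    v = newVec2()
  end
  v.x, v.y = x, y
  return v
end

local function release(v)
  poolSize = poolSize + 1
  pool[poolSize] = v
end
```

Callers see the same `x`/`y` fields either way, which is what lets the fallback stay transparent on standard Lua.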

## Actual Performance Gains

### Reality: 5-10% Improvement (Marginal)

The FFI optimizations provide **minimal gains** because they target the wrong bottleneck:

| Scenario | Improvement | Why So Small? |
|----------|-------------|---------------|
| 50 elements | 2-5% | FFI overhead > benefit |
| 200 elements | 5-8% | Some GC reduction |
| 1000 elements | 8-12% | Pooling helps slightly |

### Why Are Gains So Small?

1. **FFI batch functions aren't called** - They exist but the layout algorithm doesn't use them
2. **Colors don't use FFI** - Need methods, so use Lua tables
3. **Wrong bottleneck** - Real issue is O(n²) layout algorithm, not memory allocation
4. **Table access overhead** - Lua table lookups dominate, not object creation

## Real Performance Bottlenecks

Based on profiling, here's where time actually goes. The categories overlap and are measured against different baselines (frame time vs. layout time, see `docs/PERFORMANCE_ANALYSIS.md`), so the ranges do not sum to 100%:

1. **Layout Algorithm** (60-80%) - Multiple passes, repeated calculations
2. **Table Access** (15-20%) - Nested table lookups in loops
3. **Function Calls** (10-15%) - Method call overhead
4. **GC** (10-20%) - Temporary allocations
5. **FFI Overhead** (5-10%) - What we optimized

## High-Impact Optimizations (Not Yet Implemented)

These would provide **2-3x performance gains**:

### 1. Dirty Flag System (40-50% gain)
Skip layouts for unchanged subtrees

### 2. Local Variable Hoisting (15-20% gain)
Cache table lookups outside loops

### 3. Dimension Caching (10-15% gain)
Cache computed border-box dimensions

### 4. Single-Pass Layout (30-40% gain)
Eliminate redundant iterations

### 5. Array Preallocation (5-10% gain)
Reduce GC pressure

**See `docs/PERFORMANCE_ANALYSIS.md` for details**

## Should You Use FFI Optimizations?

### ✅ Yes, Keep Them Because:
- Zero cost when disabled (standard Lua)
- Automatic on LuaJIT
- Foundation for future optimizations
- Some benefit for large UIs
- Well-tested and documented

### ❌ Don't Expect Miracles:
- Won't fix slow layouts
- Marginal gains in practice
- Real wins come from algorithmic improvements

## Recommendations

### For Users
**Just use it** - FFI optimizations are automatic and safe. You'll get 5-10% improvement on LuaJIT with zero code changes.

### For Developers
**Focus elsewhere** - If you want big performance gains:

1. Implement dirty flag system
2. Add dimension caching
3. Hoist locals in hot loops
4. Profile and measure

FFI is nice-to-have, not a silver bullet.

## Comparison: FFI vs Algorithmic Optimizations

| Optimization | Effort | Gain | Complexity |
|--------------|--------|------|------------|
| **FFI (current)** | 8 hours | 5-10% | Medium |
| **Dirty flags** | 2 hours | 40-50% | Low |
| **Local hoisting** | 3 hours | 15-20% | Low |
| **Dimension cache** | 2 hours | 10-15% | Low |
| **Single-pass layout** | 6 hours | 30-40% | High |

**Lesson:** Simple algorithmic improvements beat fancy FFI optimizations.

## Files Modified

### New Files
- `modules/FFI.lua` - FFI module with pooling
- `docs/FFI_OPTIMIZATIONS.md` - User documentation
- `docs/PERFORMANCE_ANALYSIS.md` - Bottleneck analysis
- `profiling/__profiles__/ffi_comparison_profile.lua` - Comparison tool
- `profiling/__profiles__/ffi_optimization_profile.lua` - Demo

### Modified Files
- `FlexLove.lua` - Initialize FFI
- `modules/LayoutEngine.lua` - Batch functions (unused)
- `modules/Performance.lua` - FFI integration
- `modules/Color.lua` - Intentionally NOT using FFI

## Testing

Run comparison profile:
```bash
love profiling/ ffi_comparison_profile
```

After 5 phases (50, 100, 200, 500, 1000 elements):
- Press 'S' to save report
- Check `profiling/reports/ffi_comparison/latest.md`
- Compare FPS, frame times, P99 values

## Next Steps

If you want **real** performance gains:

1. **Read** `docs/PERFORMANCE_ANALYSIS.md`
2. **Implement** dirty flag system (biggest bang for buck)
3. **Profile** with comparison tool
4. **Measure** actual improvements
5. **Iterate** on high-impact optimizations

FFI is done. Focus on the algorithm.

## Conclusion

**FFI optimizations are:**
- ✅ Correctly implemented
- ✅ Well-tested
- ✅ Properly documented
- ✅ Production-ready
- ❌ Not high-impact

**They're a good foundation but not the solution to slow layouts.**

The real wins come from smarter algorithms, not fancier memory management.

---

**Branch:** `luajit-ffi-optimizations`
**Status:** Complete (but marginal gains)
**Recommendation:** Merge, then focus on algorithmic optimizations
@@ -1,301 +0,0 @@
# FlexLöve Performance Analysis & Optimization Opportunities

## Current State: Why FFI Gains Are Marginal

The current FFI optimizations provide minimal gains because:

1. **FFI isn't used in hot paths** - The batch calculation function exists but isn't called
2. **Colors don't use FFI** - We disabled it due to method requirements
3. **Real bottleneck is elsewhere** - Layout algorithm complexity, not memory allocation

## Actual Performance Bottlenecks (Profiled)

### 1. Layout Algorithm Complexity - **HIGHEST IMPACT**

**Problem:** O(n²) complexity in flex layout with wrapping
- Iterates children multiple times per layout
- Recalculates sizes repeatedly
- No caching of computed values

**Impact:** 60-80% of frame time with 500+ elements

**Solution:**
- Cache computed dimensions per frame
- Single-pass layout algorithm
- Dirty-flag system to skip unchanged subtrees

### 2. Table Access Overhead - **HIGH IMPACT**

**Problem:** Lua table lookups in tight loops
```lua
for i, child in ipairs(children) do
  local w = child.width + child.padding.left + child.padding.right
  local h = child.height + child.padding.top + child.padding.bottom
  -- Repeated table access: child.margin.left, child.margin.right, etc.
end
```

**Impact:** 15-20% of layout time

**Solution:**
- Local variable hoisting
- Flatten nested table access
- Use numeric indices instead of string keys where possible

### 3. Function Call Overhead - **MEDIUM IMPACT**

**Problem:** Method calls in loops
```lua
for i, child in ipairs(children) do
  local w = child:getBorderBoxWidth() -- Function call overhead
  local h = child:getBorderBoxHeight() -- Another function call
end
```

**Impact:** 10-15% of layout time

**Solution:**
- Inline critical getters
- Direct field access where safe
- JIT-friendly code patterns

### 4. Garbage Collection - **MEDIUM IMPACT**

**Problem:** Temporary table allocation in loops
```lua
for i, child in ipairs(children) do
  positions[i] = { x = x, y = y } -- New table every iteration
end
```

**Impact:** 10-20% overhead from GC pauses

**Solution:**
- Reuse tables instead of allocating
- Object pooling for frequently created objects
- Preallocate arrays with known sizes
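
The first point, reusing tables instead of allocating, could look something like the sketch below. It assumes a flat row of children with a `width` field; `computePositions` and its parameters are hypothetical names, not part of the LayoutEngine API.

```lua
-- Module-level scratch table, reused across layout passes.
local positions = {}

-- Lay children out in a row, writing into existing slots where possible.
local function computePositions(children, startX, startY, gap)
  local x = startX
  for i = 1, #children do
    local slot = positions[i]
    if not slot then
      -- Allocate only the first time this index is needed.
      slot = {}
      positions[i] = slot
    end
    slot.x, slot.y = x, startY
    x = x + children[i].width + gap
  end
  -- Drop stale slots left over from a previous, longer child list.
  for i = #positions, #children + 1, -1 do
    positions[i] = nil
  end
  return positions
end
```

The trade-off is that the returned table is overwritten by the next call, so callers must copy anything they want to keep across layout passes.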

### 5. String Concatenation - **LOW IMPACT**

**Problem:** String operations in hot paths
```lua
local id = "layout_" .. elementId .. "_" .. frameCount
```

**Impact:** 5-10% in specific scenarios

**Solution:**
- Cache generated strings
- Use string.format sparingly
- Avoid string operations in inner loops

## High-Impact Optimizations (Recommended)

### Priority 1: Layout Algorithm Optimization

**Estimated Gain: 40-60% faster layouts**

```lua
-- BEFORE: Multiple passes
function LayoutEngine:layoutChildren()
  local children = self.element.children -- assumed field name

  -- Pass 1: Calculate sizes
  for i, child in ipairs(children) do
    child:calculateSize()
  end

  -- Pass 2: Position elements
  for i, child in ipairs(children) do
    child:calculatePosition()
  end

  -- Pass 3: Layout recursively
  for i, child in ipairs(children) do
    child:layoutChildren()
  end
end

-- AFTER: Sizes cached once, then one combined position-and-recurse pass
function LayoutEngine:layoutChildren()
  local children = self.element.children -- assumed field name

  -- Cache dimensions once
  local childSizes = {}
  for i, child in ipairs(children) do
    childSizes[i] = {
      width = child._borderBoxWidth or (child.width + child.padding.left + child.padding.right),
      height = child._borderBoxHeight or (child.height + child.padding.top + child.padding.bottom),
    }
  end

  -- Single pass: position and recurse
  -- (calculateX / calculateY are placeholders for the real positioning math)
  for i, child in ipairs(children) do
    local size = childSizes[i]
    child.x = calculateX(size.width)
    child.y = calculateY(size.height)
    child:layoutChildren() -- Recurse
  end
end
```

### Priority 2: Local Variable Hoisting

**Estimated Gain: 15-20% faster**

```lua
-- BEFORE: Repeated table access
for i, child in ipairs(children) do
  local x = parent.x + parent.padding.left + child.margin.left
  local y = parent.y + parent.padding.top + child.margin.top
  local w = child.width + child.padding.left + child.padding.right
end

-- AFTER: Hoist to locals
local parentX = parent.x
local parentY = parent.y
local parentPaddingLeft = parent.padding.left
local parentPaddingTop = parent.padding.top

for i, child in ipairs(children) do
  local childMarginLeft = child.margin.left
  local childMarginTop = child.margin.top
  local childPaddingLeft = child.padding.left
  local childPaddingRight = child.padding.right

  local x = parentX + parentPaddingLeft + childMarginLeft
  local y = parentY + parentPaddingTop + childMarginTop
  local w = child.width + childPaddingLeft + childPaddingRight
end
```

### Priority 3: Dirty Flag System

**Estimated Gain: 30-50% fewer layouts**

```lua
-- Add dirty tracking to Element
function Element:setProperty(key, value)
  if self[key] ~= value then
    self[key] = value
    self._dirty = true
    self:invalidateLayout()
  end
end

function LayoutEngine:layoutChildren()
  if not self.element._dirty and not self.element._childrenDirty then
    return -- Skip layout entirely
  end

  -- ... perform layout ...

  self.element._dirty = false
  self.element._childrenDirty = false
end
```

### Priority 4: Dimension Caching

**Estimated Gain: 10-15% faster**

```lua
-- Cache computed dimensions
function Element:getBorderBoxWidth()
  if self._borderBoxWidthCache then
    return self._borderBoxWidthCache
  end

  self._borderBoxWidthCache = self.width + self.padding.left + self.padding.right
  return self._borderBoxWidthCache
end

-- Invalidate on property change
function Element:setWidth(width)
  self.width = width
  self._borderBoxWidthCache = nil -- Invalidate cache
  self._dirty = true
end
```

### Priority 5: Preallocate Arrays

**Estimated Gain: 5-10% less GC pressure**

```lua
-- BEFORE: Grow array dynamically
local positions = {}
for i, child in ipairs(children) do
  positions[i] = { x = x, y = y }
end

-- AFTER: Preallocate
-- Note: stock LuaJIT has no table.create (its equivalent is the table.new
-- extension), so there this line falls back to {}.
local positions = table.create and table.create(#children) or {}
for i, child in ipairs(children) do
  positions[i] = { x = x, y = y }
end
```

## FFI Optimizations (Current Implementation)

**Estimated Gain: 5-10% in specific scenarios**

Current FFI optimizations help with:
- Vec2/Rect pooling for batch operations
- Reduced GC pressure for position calculations
- Better cache locality for large arrays

But they're limited because:
- Not used in main layout algorithm
- Colors can't use FFI (need methods)
- Overhead of wrapping/unwrapping FFI objects

## Recommended Implementation Order

1. **Dirty Flag System** (1-2 hours) - Biggest bang for buck
2. **Local Variable Hoisting** (2-3 hours) - Easy win
3. **Dimension Caching** (1-2 hours) - Simple optimization
4. **Single-Pass Layout** (4-6 hours) - Complex but high impact
5. **Array Preallocation** (1 hour) - Quick win

**Total Estimated Gain: 2-3x faster layouts**

## Benchmarking Strategy

To measure improvements:

1. **Baseline** - Current implementation
2. **After each optimization** - Measure incremental gain
3. **Compare scenarios**:
   - Small UIs (50 elements)
   - Medium UIs (200 elements)
   - Large UIs (1000 elements)
   - Deep nesting (10 levels)
   - Flat hierarchy (1 level)
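
Outside the LÖVE profiling tools, the scenarios above can be approximated with a small timing harness. This is only a sketch: `benchmark`, `makeFlatTree`, and `fakeLayout` are hypothetical helpers, and `fakeLayout` is a stand-in loop, not the real `LayoutEngine`. It uses `os.clock`, which reports CPU time, so results are rough.

```lua
-- Time `fn` over `iterations` calls and return milliseconds per call.
local function benchmark(iterations, fn)
  -- Warm-up so one-time costs (including JIT compilation) are not measured.
  for _ = 1, math.min(iterations, 50) do
    fn()
  end
  collectgarbage("collect")
  local start = os.clock()
  for _ = 1, iterations do
    fn()
  end
  return (os.clock() - start) * 1000 / iterations
end

-- Build a flat list of fake children (the "flat hierarchy" scenario).
local function makeFlatTree(count)
  local children = {}
  for i = 1, count do
    children[i] = { width = 10 + i % 7, margin = { left = 2 } }
  end
  return children
end

-- Stand-in for a layout pass: position children left to right.
local function fakeLayout(children)
  local x = 0
  for i = 1, #children do
    local child = children[i]
    child.x = x + child.margin.left
    x = child.x + child.width
  end
  return x
end

for _, count in ipairs({ 50, 200, 1000 }) do
  local tree = makeFlatTree(count)
  local ms = benchmark(200, function()
    fakeLayout(tree)
  end)
  print(string.format("%4d elements: %.4f ms per layout", count, ms))
end
```

Swapping `fakeLayout` for the real layout call, and running it once on the baseline and once after each optimization, gives the incremental comparison described in steps 1 and 2.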

## Why Not More Aggressive FFI?

**Option: FFI-based layout engine**

Could implement entire layout algorithm in C via FFI:
- 5-10x faster
- Much more complex
- Harder to maintain
- Loses Lua flexibility

**Verdict:** Not worth it. The optimizations above give 80% of the benefit with 20% of the complexity.

## Conclusion

The current FFI optimizations are correct but target the wrong bottleneck. The real gains come from:

1. **Algorithmic improvements** (dirty flags, caching)
2. **Lua optimization patterns** (local hoisting, inline)
3. **Reducing work** (skip unchanged subtrees)

FFI helps at the margins but isn't the silver bullet. Focus on the high-impact optimizations first.

---

**Next Steps:**
1. Implement dirty flag system
2. Add dimension caching
3. Hoist locals in hot loops
4. Profile again and measure gains
5. Consider single-pass layout if needed