- Fix parallel trie store_in_db_trie: was setting true when masks were present regardless of whether they were empty. Now correctly checks that at least one mask is non-empty. - Remove leftover TrieMasks struct usage in proof_v2 test
15 KiB
Reth Memory Layout Optimization Analysis
Engine, Trie, Multiproof, and Prewarming Components
This document provides a comprehensive analysis of memory layout optimization opportunities across the reth codebase, focusing on engine, trie, multiproof, and prewarming components.
Executive Summary
After thorough exploration of the codebase, I've identified 30+ data structures with potential memory layout improvements. The highest priority optimizations are in:
- Trie node representations -
SparseNodeenum (40+ bytes per node) - Proof structures -
MultiProof,StorageMultiProofwith duplicate hash mask maps - Execution cache -
ExecutionCache,PrewarmContextwith scattered boolean fields - State tracking -
TreeState,BlockStatewith suboptimal field ordering
Estimated memory savings: 5-15% reduction in hot path allocations with proper field reordering and consolidation.
Table of Contents
- Critical Priority - Trie Node Structures
- Critical Priority - Proof Structures
- High Priority - Engine Execution Structures
- High Priority - State Management
- Medium Priority - Supporting Structures
- Alloy Type Considerations
- Recommended Actions
1. Critical Priority - Trie Node Structures
1.1 SparseNode Enum
File: crates/trie/sparse/src/trie.rs:1835
pub enum SparseNode {
Empty,
Hash(B256), // 32 bytes
Leaf { key: Nibbles, hash: Option<B256> }, // Nibbles + 40 bytes
Extension {
key: Nibbles,
hash: Option<B256>,
store_in_db_trie: Option<bool>
},
Branch {
state_mask: TrieMask, // u16 = 2 bytes
hash: Option<B256>, // 33 bytes (Option)
store_in_db_trie: Option<bool> // 2 bytes (Option<bool>)
},
}
Issues:
- Enum discriminant + largest variant = 40+ bytes per node
Option<B256>adds 33 bytes (1 byte tag + 32 bytes value)Option<bool>wastes 2 bytes when 1 bit sufficesTrieMask(u16) followed byOption<B256>causes 6 bytes padding
Recommendations:
- Pack
store_in_db_trieandhash.is_some()into a singleu8flags field - Store hash externally in a separate HashMap keyed by node path
- Consider smaller representation for common patterns
1.2 SerialSparseTrie
File: crates/trie/sparse/src/trie.rs:297-315
pub struct SerialSparseTrie {
nodes: HashMap<Nibbles, SparseNode>,
branch_node_tree_masks: HashMap<Nibbles, TrieMask>, // TrieMask = u16
branch_node_hash_masks: HashMap<Nibbles, TrieMask>, // Duplicate structure
values: HashMap<Nibbles, Vec<u8>>,
prefix_set: PrefixSetMut,
updates: Option<SparseTrieUpdates>,
rlp_buf: Vec<u8>,
}
Issues:
- Two identical HashMaps (
tree_masksandhash_masks) with same key type - 4 HashMap allocations with
Nibbleskeys (variable-length) rlp_buf: Vec<u8>at end is a reusable buffer that could beArcshared
Recommendations:
- Consolidate mask maps:
HashMap<Nibbles, (TrieMask, TrieMask)>orHashMap<Nibbles, TrieMasks> - Reorder fields:
rlp_bufbeforeupdates(often accessed after updates) - Consider
Arc<RwLock<Vec<u8>>>forrlp_bufif shared across threads
1.3 ParallelSparseTrie
File: crates/trie/sparse-parallel/src/trie.rs:105-127
pub struct ParallelSparseTrie {
upper_subtrie: Box<SparseSubtrie>,
lower_subtries: [LowerSparseSubtrie; 16], // Fixed array of 16
prefix_set: PrefixSetMut,
updates: Option<SparseTrieUpdates>,
branch_node_tree_masks: HashMap<Nibbles, TrieMask>,
branch_node_hash_masks: HashMap<Nibbles, TrieMask>,
update_actions_buffers: Vec<Vec<SparseTrieUpdatesAction>>,
parallelism_thresholds: ParallelismThresholds,
#[cfg(feature = "metrics")]
metrics: ParallelSparseTrieMetrics,
}
Issues:
- Array of 16 subtries after
Box- severe padding potential - Same duplicate mask map pattern as
SerialSparseTrie Option<SparseTrieUpdates>before two HashMaps
Recommendations:
- Move
lower_subtriesarray to end (largest field) - Consolidate mask maps
- Group
Optionand small scalar fields together
2. Critical Priority - Proof Structures
2.1 MultiProof
File: crates/trie/common/src/proofs.rs:173-182
pub struct MultiProof {
pub account_subtree: ProofNodes,
pub branch_node_hash_masks: HashMap<Nibbles, TrieMask>,
pub branch_node_tree_masks: HashMap<Nibbles, TrieMask>,
pub storages: B256Map<StorageMultiProof>,
}
Issues:
- Two separate HashMaps with identical structure
ProofNodes(large) followed by two maps, then another map
Recommendations:
- Consolidate mask maps: Create
TrieMasks { hash: TrieMask, tree: TrieMask }struct HashMap<Nibbles, TrieMasks>saves one HashMap allocation
2.2 StorageMultiProof
File: crates/trie/common/src/proofs.rs:450-460
pub struct StorageMultiProof {
pub root: B256, // 32 bytes
pub subtree: ProofNodes,
pub branch_node_hash_masks: HashMap<Nibbles, TrieMask>,
pub branch_node_tree_masks: HashMap<Nibbles, TrieMask>,
}
Issues:
- Same duplicate HashMap pattern
root(32 bytes) at start is optimal for alignment- Two HashMaps at end should be consolidated
2.3 ProofResultMessage
File: crates/trie/parallel/src/proof_task.rs:598-607
pub struct ProofResultMessage {
pub sequence_number: u64,
pub result: Result<ProofResult, ParallelStateRootError>,
pub elapsed: Duration,
pub state: HashedPostState,
}
Issues:
u64(8 bytes) at start,Duration(16 bytes) laterResultenum size varies significantly by variant
Recommendations:
- Reorder:
elapsed(16 bytes) →sequence_number(8 bytes) →result→state - Consider boxing the error variant in
Result
3. High Priority - Engine Execution Structures
3.1 PayloadProcessor
File: crates/engine/tree/src/tree/payload_processor/mod.rs:104
pub struct PayloadProcessor<Evm> {
executor: WorkloadExecutor,
execution_cache: ExecutionCache,
trie_metrics: MultiProofTaskMetrics,
cross_block_cache_size: u64,
disable_transaction_prewarming: bool, // scattered bools
disable_state_cache: bool,
evm_config: Evm,
precompile_cache_disabled: bool,
precompile_cache_map: PrecompileCacheMap<SpecFor<Evm>>,
sparse_state_trie: Arc<Mutex<Option<...>>>,
disable_parallel_sparse_trie: bool,
prewarm_max_concurrency: usize,
}
Issues:
- 4 boolean fields scattered throughout structure
- Booleans between large fields cause padding waste
- Generic
Evmtype size varies
Recommendations:
- Create
ProcessorFlagsbitfield:struct ProcessorFlags { bits: u8, // 4 bools = 4 bits } impl ProcessorFlags { const DISABLE_TX_PREWARMING: u8 = 0b0001; const DISABLE_STATE_CACHE: u8 = 0b0010; const DISABLE_PRECOMPILE_CACHE: u8 = 0b0100; const DISABLE_PARALLEL_SPARSE_TRIE: u8 = 0b1000; } - Move all config to struct end
- Group related fields (executor, evm_config)
3.2 PrewarmContext
File: crates/engine/tree/src/tree/payload_processor/prewarm.rs:457
pub(super) struct PrewarmContext<N, P, Evm> {
pub(super) env: ExecutionEnv<Evm>,
pub(super) evm_config: Evm,
pub(super) saved_cache: Option<SavedCache>,
pub(super) provider: StateProviderBuilder<N, P>,
pub(super) metrics: PrewarmMetrics,
pub(super) terminate_execution: Arc<AtomicBool>,
pub(super) precompile_cache_disabled: bool,
pub(super) precompile_cache_map: PrecompileCacheMap<SpecFor<Evm>>,
}
Issues:
- Frequently cloned for worker distribution
boolafterArc<AtomicBool>causes padding- Generic types make layout unpredictable
Recommendations:
- Move
terminate_executionandprecompile_cache_disabledtogether - Consider
Arc<PrewarmContextShared>for thread-shared parts - Separate mutable worker state from immutable config
3.3 ExecutionCache
File: crates/engine/tree/src/tree/cached_state.rs:342
pub(crate) struct ExecutionCache {
code_cache: Cache<B256, Option<Bytecode>>,
storage_cache: Cache<Address, Arc<AccountStorageCache>>,
account_cache: Cache<Address, Option<Account>>,
}
Issues:
- 3 separate Moka caches = 3 allocations
- Accessed on every state read (hot path)
Recommendations:
- Consider a unified cache with tagged keys if access patterns permit
- Pre-size caches based on expected workload
- Evaluate if
Arc<AccountStorageCache>sharing is necessary
4. High Priority - State Management
4.1 TreeState
File: crates/engine/tree/src/tree/state.rs:24
pub struct TreeState<N: NodePrimitives = EthPrimitives> {
pub(crate) blocks_by_hash: HashMap<B256, ExecutedBlock<N>>,
pub(crate) blocks_by_number: BTreeMap<BlockNumber, Vec<ExecutedBlock<N>>>,
pub(crate) parent_to_child: HashMap<B256, HashSet<B256>>,
pub(crate) current_canonical_head: BlockNumHash,
pub(crate) engine_kind: EngineApiKind, // 1-2 bytes
}
Issues:
engine_kind(1-2 bytes) at end after large collections- 6-7 bytes padding before
engine_kind
Recommendations:
- Move
engine_kindandcurrent_canonical_head(16 bytes) to struct start - Order:
engine_kind→current_canonical_head→ collections
4.2 BlockState
File: crates/chain-state/src/in_memory.rs:575
pub struct BlockState<N: NodePrimitives = EthPrimitives> {
block: ExecutedBlock<N>,
parent: Option<Arc<Self>>,
}
Issues:
Option<Arc<Self>>is 16 bytes (Option tag + pointer)- Small struct, but frequently allocated
4.3 HashedStorage
File: crates/trie/common/src/hashed_state.rs:404-409
pub struct HashedStorage {
pub wiped: bool, // 1 byte
pub storage: B256Map<U256>, // HashMap
}
Issues:
boolbefore HashMap causes 7 bytes padding (HashMap aligned to 8)
Recommendations:
- Move
wipedafterstorage, or - Pack with other metadata if available
5. Medium Priority - Supporting Structures
5.1 InvalidHeaderCache
File: crates/engine/tree/src/tree/invalid_headers.rs:19
pub struct InvalidHeaderCache {
headers: LruMap<B256, HeaderEntry>,
metrics: InvalidHeaderCacheMetrics,
}
struct HeaderEntry {
hit_count: u8, // 1 byte
header: BlockWithParent, // Large
}
Issues:
hit_count: u8before large struct = 7 bytes padding
5.2 BlockBuffer
File: crates/engine/tree/src/tree/block_buffer.rs:19
pub struct BlockBuffer<B: Block> {
pub(crate) blocks: HashMap<BlockHash, SealedBlock<B>>,
pub(crate) parent_to_child: HashMap<BlockHash, HashSet<BlockHash>>,
pub(crate) earliest_blocks: BTreeMap<BlockNumber, HashSet<BlockHash>>,
pub(crate) block_queue: VecDeque<BlockHash>,
pub(crate) max_blocks: usize,
pub(crate) metrics: BlockBufferMetrics,
}
Issues:
- 5 collection types in sequence
max_blocks(8 bytes) between collections
5.3 AccountProof
File: crates/trie/common/src/proofs.rs:573-585
pub struct AccountProof {
pub address: Address, // 20 bytes
pub info: Option<Account>,
pub proof: Vec<Bytes>,
pub storage_root: B256, // 32 bytes
pub storage_proofs: Vec<StorageProof>,
}
Issues:
Address(20 bytes) at start, misaligned for 32-byte B256
Recommendations:
- Reorder:
storage_root(32 bytes) →address(20 bytes) → Vecs/Options
6. Alloy Type Considerations
6.1 Core Types
| Type | Size | Alignment | Notes |
|---|---|---|---|
B256 |
32 bytes | 1 (byte array) | Optimal for hashing |
Address |
20 bytes | 1 (byte array) | Often padded to 24 |
U256 |
32 bytes | 8 | Little-endian limbs |
TrieMask |
2 bytes | 2 (u16) | Branch child bitmap |
Nibbles |
8-72 bytes | 8 | SmallVec-backed |
6.2 Map Types
B256Map<V>- Usesfoldhashhasher optimized for 32-byte keysB256Set- Same optimization for set operations- Both avoid unnecessary hashing computation on fixed-size keys
6.3 Potential Alloy Improvements
Addresspadding: Consider#[repr(align(8))]for Address to reduce struct paddingTrieMaskpacking: Could be combined with flags in same u16Nibblesvariants: Consider fixed-size array for common path lengths (≤ 64)
7. Recommended Actions
Immediate Wins (Low Risk, High Impact)
| Priority | Structure | Change | Est. Savings |
|---|---|---|---|
| 1 | HashedStorage |
Move wiped after storage |
7 bytes/instance |
| 2 | TreeState |
Reorder engine_kind to start |
6 bytes/instance |
| 3 | PayloadProcessor |
Consolidate bools to bitfield | 24 bytes/instance |
| 4 | HeaderEntry |
Move hit_count to end |
7 bytes/entry |
Medium Effort (Good ROI)
| Priority | Structure | Change | Est. Savings |
|---|---|---|---|
| 5 | MultiProof |
Consolidate mask maps | 1 HashMap alloc |
| 6 | StorageMultiProof |
Consolidate mask maps | 1 HashMap alloc |
| 7 | SerialSparseTrie |
Consolidate mask maps | 1 HashMap alloc |
| 8 | PrewarmContext |
Group related fields | Better cache locality |
| 9 | AccountProof |
Reorder by size | 4-8 bytes/instance |
Strategic (Requires Deeper Changes)
| Priority | Structure | Change | Impact |
|---|---|---|---|
| 10 | SparseNode |
External hash storage | 8-16 bytes/node |
| 11 | SparseNode |
Pack flags into u8 | 2 bytes/node |
| 12 | ExecutionCache |
Unified cache design | Reduced allocations |
| 13 | PrewarmContext |
Arc-shared immutable data | Reduced cloning |
Mask Map Consolidation Pattern
// Before: Two separate HashMaps
pub branch_node_hash_masks: HashMap<Nibbles, TrieMask>,
pub branch_node_tree_masks: HashMap<Nibbles, TrieMask>,
// After: Single HashMap with tuple value
#[derive(Clone, Copy, Default)]
pub struct TrieMasks {
pub hash: TrieMask, // 2 bytes
pub tree: TrieMask, // 2 bytes
} // Total: 4 bytes, no padding
pub branch_node_masks: HashMap<Nibbles, TrieMasks>,
This pattern applies to:
MultiProofDecodedMultiProofStorageMultiProofDecodedStorageMultiProofSerialSparseTrieParallelSparseTrie
Appendix: Files to Review
Critical Files
crates/trie/sparse/src/trie.rs- SparseNode, SerialSparseTriecrates/trie/common/src/proofs.rs- MultiProof, StorageMultiProofcrates/engine/tree/src/tree/payload_processor/mod.rs- PayloadProcessorcrates/engine/tree/src/tree/payload_processor/prewarm.rs- PrewarmContext
High Priority Files
crates/engine/tree/src/tree/cached_state.rs- ExecutionCachecrates/engine/tree/src/tree/state.rs- TreeStatecrates/trie/common/src/hashed_state.rs- HashedStoragecrates/trie/parallel/src/proof_task.rs- ProofResultMessage
Supporting Files
crates/engine/tree/src/tree/invalid_headers.rs- InvalidHeaderCachecrates/engine/tree/src/tree/block_buffer.rs- BlockBuffercrates/chain-state/src/in_memory.rs- BlockState