Files
reth/reth-memory-layout-analysis.md
yongkangc 35430830c6 fix: correct store_in_db_trie logic and remove leftover TrieMasks usage
- Fix parallel trie store_in_db_trie: was setting true when masks were
  present regardless of whether they were empty. Now correctly checks
  that at least one mask is non-empty.
- Remove leftover TrieMasks struct usage in proof_v2 test
2026-01-03 04:50:08 +00:00

15 KiB

Reth Memory Layout Optimization Analysis

Engine, Trie, Multiproof, and Prewarming Components

This document provides a comprehensive analysis of memory layout optimization opportunities across the reth codebase, focusing on engine, trie, multiproof, and prewarming components.


Executive Summary

After thorough exploration of the codebase, I've identified 30+ data structures with potential memory layout improvements. The highest priority optimizations are in:

  1. Trie node representations - SparseNode enum (40+ bytes per node)
  2. Proof structures - MultiProof, StorageMultiProof with duplicate hash mask maps
  3. Execution cache - ExecutionCache, PrewarmContext with scattered boolean fields
  4. State tracking - TreeState, BlockState with suboptimal field ordering

Estimated memory savings: 5-15% reduction in hot path allocations with proper field reordering and consolidation.


Table of Contents

  1. Critical Priority - Trie Node Structures
  2. Critical Priority - Proof Structures
  3. High Priority - Engine Execution Structures
  4. High Priority - State Management
  5. Medium Priority - Supporting Structures
  6. Alloy Type Considerations
  7. Recommended Actions

1. Critical Priority - Trie Node Structures

1.1 SparseNode Enum

File: crates/trie/sparse/src/trie.rs:1835

pub enum SparseNode {
    Empty,
    Hash(B256),                                     // 32 bytes
    Leaf { key: Nibbles, hash: Option<B256> },     // Nibbles + 40 bytes
    Extension {
        key: Nibbles,
        hash: Option<B256>,
        store_in_db_trie: Option<bool>
    },
    Branch {
        state_mask: TrieMask,                       // u16 = 2 bytes
        hash: Option<B256>,                         // 33 bytes (Option)
        store_in_db_trie: Option<bool>              // 2 bytes (Option<bool>)
    },
}

Issues:

  • Enum discriminant + largest variant = 40+ bytes per node
  • Option<B256> adds 33 bytes (1 byte tag + 32 bytes value)
  • Option<bool> wastes 2 bytes when 1 bit suffices
  • TrieMask (u16) followed by Option<B256> causes 6 bytes padding

Recommendations:

  1. Pack store_in_db_trie and hash.is_some() into a single u8 flags field
  2. Store hash externally in a separate HashMap keyed by node path
  3. Consider smaller representation for common patterns

1.2 SerialSparseTrie

File: crates/trie/sparse/src/trie.rs:297-315

pub struct SerialSparseTrie {
    nodes: HashMap<Nibbles, SparseNode>,
    branch_node_tree_masks: HashMap<Nibbles, TrieMask>,  // TrieMask = u16
    branch_node_hash_masks: HashMap<Nibbles, TrieMask>,  // Duplicate structure
    values: HashMap<Nibbles, Vec<u8>>,
    prefix_set: PrefixSetMut,
    updates: Option<SparseTrieUpdates>,
    rlp_buf: Vec<u8>,
}

Issues:

  • Two identical HashMaps (tree_masks and hash_masks) with same key type
  • 4 HashMap allocations with Nibbles keys (variable-length)
  • rlp_buf: Vec<u8> at end is a reusable buffer that could be Arc shared

Recommendations:

  1. Consolidate mask maps: HashMap<Nibbles, (TrieMask, TrieMask)> or HashMap<Nibbles, TrieMasks>
  2. Reorder fields: rlp_buf before updates (often accessed after updates)
  3. Consider Arc<RwLock<Vec<u8>>> for rlp_buf if shared across threads

1.3 ParallelSparseTrie

File: crates/trie/sparse-parallel/src/trie.rs:105-127

pub struct ParallelSparseTrie {
    upper_subtrie: Box<SparseSubtrie>,
    lower_subtries: [LowerSparseSubtrie; 16],  // Fixed array of 16
    prefix_set: PrefixSetMut,
    updates: Option<SparseTrieUpdates>,
    branch_node_tree_masks: HashMap<Nibbles, TrieMask>,
    branch_node_hash_masks: HashMap<Nibbles, TrieMask>,
    update_actions_buffers: Vec<Vec<SparseTrieUpdatesAction>>,
    parallelism_thresholds: ParallelismThresholds,
    #[cfg(feature = "metrics")]
    metrics: ParallelSparseTrieMetrics,
}

Issues:

  • Array of 16 subtries after Box - severe padding potential
  • Same duplicate mask map pattern as SerialSparseTrie
  • Option<SparseTrieUpdates> before two HashMaps

Recommendations:

  1. Move lower_subtries array to end (largest field)
  2. Consolidate mask maps
  3. Group Option and small scalar fields together

2. Critical Priority - Proof Structures

2.1 MultiProof

File: crates/trie/common/src/proofs.rs:173-182

pub struct MultiProof {
    pub account_subtree: ProofNodes,
    pub branch_node_hash_masks: HashMap<Nibbles, TrieMask>,
    pub branch_node_tree_masks: HashMap<Nibbles, TrieMask>,
    pub storages: B256Map<StorageMultiProof>,
}

Issues:

  • Two separate HashMaps with identical structure
  • ProofNodes (large) followed by two maps, then another map

Recommendations:

  1. Consolidate mask maps: Create TrieMasks { hash: TrieMask, tree: TrieMask } struct
  2. HashMap<Nibbles, TrieMasks> saves one HashMap allocation

2.2 StorageMultiProof

File: crates/trie/common/src/proofs.rs:450-460

pub struct StorageMultiProof {
    pub root: B256,                                     // 32 bytes
    pub subtree: ProofNodes,
    pub branch_node_hash_masks: HashMap<Nibbles, TrieMask>,
    pub branch_node_tree_masks: HashMap<Nibbles, TrieMask>,
}

Issues:

  • Same duplicate HashMap pattern
  • root (32 bytes) at start is optimal for alignment
  • Two HashMaps at end should be consolidated

2.3 ProofResultMessage

File: crates/trie/parallel/src/proof_task.rs:598-607

pub struct ProofResultMessage {
    pub sequence_number: u64,
    pub result: Result<ProofResult, ParallelStateRootError>,
    pub elapsed: Duration,
    pub state: HashedPostState,
}

Issues:

  • u64 (8 bytes) at start, Duration (16 bytes) later
  • Result enum size varies significantly by variant

Recommendations:

  1. Reorder: elapsed (16 bytes) → sequence_number (8 bytes) → resultstate
  2. Consider boxing the error variant in Result

3. High Priority - Engine Execution Structures

3.1 PayloadProcessor

File: crates/engine/tree/src/tree/payload_processor/mod.rs:104

pub struct PayloadProcessor<Evm> {
    executor: WorkloadExecutor,
    execution_cache: ExecutionCache,
    trie_metrics: MultiProofTaskMetrics,
    cross_block_cache_size: u64,
    disable_transaction_prewarming: bool,     // scattered bools
    disable_state_cache: bool,
    evm_config: Evm,
    precompile_cache_disabled: bool,
    precompile_cache_map: PrecompileCacheMap<SpecFor<Evm>>,
    sparse_state_trie: Arc<Mutex<Option<...>>>,
    disable_parallel_sparse_trie: bool,
    prewarm_max_concurrency: usize,
}

Issues:

  • 4 boolean fields scattered throughout structure
  • Booleans between large fields cause padding waste
  • Generic Evm type size varies

Recommendations:

  1. Create ProcessorFlags bitfield:
    struct ProcessorFlags {
        bits: u8,  // 4 bools = 4 bits
    }
    impl ProcessorFlags {
        const DISABLE_TX_PREWARMING: u8 = 0b0001;
        const DISABLE_STATE_CACHE: u8 = 0b0010;
        const DISABLE_PRECOMPILE_CACHE: u8 = 0b0100;
        const DISABLE_PARALLEL_SPARSE_TRIE: u8 = 0b1000;
    }
    
  2. Move all config to struct end
  3. Group related fields (executor, evm_config)

3.2 PrewarmContext

File: crates/engine/tree/src/tree/payload_processor/prewarm.rs:457

pub(super) struct PrewarmContext<N, P, Evm> {
    pub(super) env: ExecutionEnv<Evm>,
    pub(super) evm_config: Evm,
    pub(super) saved_cache: Option<SavedCache>,
    pub(super) provider: StateProviderBuilder<N, P>,
    pub(super) metrics: PrewarmMetrics,
    pub(super) terminate_execution: Arc<AtomicBool>,
    pub(super) precompile_cache_disabled: bool,
    pub(super) precompile_cache_map: PrecompileCacheMap<SpecFor<Evm>>,
}

Issues:

  • Frequently cloned for worker distribution
  • bool after Arc<AtomicBool> causes padding
  • Generic types make layout unpredictable

Recommendations:

  1. Move terminate_execution and precompile_cache_disabled together
  2. Consider Arc<PrewarmContextShared> for thread-shared parts
  3. Separate mutable worker state from immutable config

3.3 ExecutionCache

File: crates/engine/tree/src/tree/cached_state.rs:342

pub(crate) struct ExecutionCache {
    code_cache: Cache<B256, Option<Bytecode>>,
    storage_cache: Cache<Address, Arc<AccountStorageCache>>,
    account_cache: Cache<Address, Option<Account>>,
}

Issues:

  • 3 separate Moka caches = 3 allocations
  • Accessed on every state read (hot path)

Recommendations:

  1. Consider a unified cache with tagged keys if access patterns permit
  2. Pre-size caches based on expected workload
  3. Evaluate if Arc<AccountStorageCache> sharing is necessary

4. High Priority - State Management

4.1 TreeState

File: crates/engine/tree/src/tree/state.rs:24

pub struct TreeState<N: NodePrimitives = EthPrimitives> {
    pub(crate) blocks_by_hash: HashMap<B256, ExecutedBlock<N>>,
    pub(crate) blocks_by_number: BTreeMap<BlockNumber, Vec<ExecutedBlock<N>>>,
    pub(crate) parent_to_child: HashMap<B256, HashSet<B256>>,
    pub(crate) current_canonical_head: BlockNumHash,
    pub(crate) engine_kind: EngineApiKind,              // 1-2 bytes
}

Issues:

  • engine_kind (1-2 bytes) at end after large collections
  • 6-7 bytes padding before engine_kind

Recommendations:

  1. Move engine_kind and current_canonical_head (16 bytes) to struct start
  2. Order: engine_kindcurrent_canonical_head → collections

4.2 BlockState

File: crates/chain-state/src/in_memory.rs:575

pub struct BlockState<N: NodePrimitives = EthPrimitives> {
    block: ExecutedBlock<N>,
    parent: Option<Arc<Self>>,
}

Issues:

  • Option<Arc<Self>> is 16 bytes (Option tag + pointer)
  • Small struct, but frequently allocated

4.3 HashedStorage

File: crates/trie/common/src/hashed_state.rs:404-409

pub struct HashedStorage {
    pub wiped: bool,                    // 1 byte
    pub storage: B256Map<U256>,         // HashMap
}

Issues:

  • bool before HashMap causes 7 bytes padding (HashMap aligned to 8)

Recommendations:

  1. Move wiped after storage, or
  2. Pack with other metadata if available

5. Medium Priority - Supporting Structures

5.1 InvalidHeaderCache

File: crates/engine/tree/src/tree/invalid_headers.rs:19

pub struct InvalidHeaderCache {
    headers: LruMap<B256, HeaderEntry>,
    metrics: InvalidHeaderCacheMetrics,
}

struct HeaderEntry {
    hit_count: u8,              // 1 byte
    header: BlockWithParent,    // Large
}

Issues:

  • hit_count: u8 before large struct = 7 bytes padding

5.2 BlockBuffer

File: crates/engine/tree/src/tree/block_buffer.rs:19

pub struct BlockBuffer<B: Block> {
    pub(crate) blocks: HashMap<BlockHash, SealedBlock<B>>,
    pub(crate) parent_to_child: HashMap<BlockHash, HashSet<BlockHash>>,
    pub(crate) earliest_blocks: BTreeMap<BlockNumber, HashSet<BlockHash>>,
    pub(crate) block_queue: VecDeque<BlockHash>,
    pub(crate) max_blocks: usize,
    pub(crate) metrics: BlockBufferMetrics,
}

Issues:

  • 5 collection types in sequence
  • max_blocks (8 bytes) between collections

5.3 AccountProof

File: crates/trie/common/src/proofs.rs:573-585

pub struct AccountProof {
    pub address: Address,           // 20 bytes
    pub info: Option<Account>,
    pub proof: Vec<Bytes>,
    pub storage_root: B256,         // 32 bytes
    pub storage_proofs: Vec<StorageProof>,
}

Issues:

  • Address (20 bytes) at start, misaligned for 32-byte B256

Recommendations:

  1. Reorder: storage_root (32 bytes) → address (20 bytes) → Vecs/Options

6. Alloy Type Considerations

6.1 Core Types

Type Size Alignment Notes
B256 32 bytes 1 (byte array) Optimal for hashing
Address 20 bytes 1 (byte array) Often padded to 24
U256 32 bytes 8 Little-endian limbs
TrieMask 2 bytes 2 (u16) Branch child bitmap
Nibbles 8-72 bytes 8 SmallVec-backed

6.2 Map Types

  • B256Map<V> - Uses foldhash hasher optimized for 32-byte keys
  • B256Set - Same optimization for set operations
  • Both avoid unnecessary hashing computation on fixed-size keys

6.3 Potential Alloy Improvements

  1. Address padding: Consider #[repr(align(8))] for Address to reduce struct padding
  2. TrieMask packing: Could be combined with flags in same u16
  3. Nibbles variants: Consider fixed-size array for common path lengths (≤ 64)

Immediate Wins (Low Risk, High Impact)

Priority Structure Change Est. Savings
1 HashedStorage Move wiped after storage 7 bytes/instance
2 TreeState Reorder engine_kind to start 6 bytes/instance
3 PayloadProcessor Consolidate bools to bitfield 24 bytes/instance
4 HeaderEntry Move hit_count to end 7 bytes/entry

Medium Effort (Good ROI)

Priority Structure Change Est. Savings
5 MultiProof Consolidate mask maps 1 HashMap alloc
6 StorageMultiProof Consolidate mask maps 1 HashMap alloc
7 SerialSparseTrie Consolidate mask maps 1 HashMap alloc
8 PrewarmContext Group related fields Better cache locality
9 AccountProof Reorder by size 4-8 bytes/instance

Strategic (Requires Deeper Changes)

Priority Structure Change Impact
10 SparseNode External hash storage 8-16 bytes/node
11 SparseNode Pack flags into u8 2 bytes/node
12 ExecutionCache Unified cache design Reduced allocations
13 PrewarmContext Arc-shared immutable data Reduced cloning

Mask Map Consolidation Pattern

// Before: Two separate HashMaps
pub branch_node_hash_masks: HashMap<Nibbles, TrieMask>,
pub branch_node_tree_masks: HashMap<Nibbles, TrieMask>,

// After: Single HashMap with tuple value
#[derive(Clone, Copy, Default)]
pub struct TrieMasks {
    pub hash: TrieMask,  // 2 bytes
    pub tree: TrieMask,  // 2 bytes
}  // Total: 4 bytes, no padding

pub branch_node_masks: HashMap<Nibbles, TrieMasks>,

This pattern applies to:

  • MultiProof
  • DecodedMultiProof
  • StorageMultiProof
  • DecodedStorageMultiProof
  • SerialSparseTrie
  • ParallelSparseTrie

Appendix: Files to Review

Critical Files

  • crates/trie/sparse/src/trie.rs - SparseNode, SerialSparseTrie
  • crates/trie/common/src/proofs.rs - MultiProof, StorageMultiProof
  • crates/engine/tree/src/tree/payload_processor/mod.rs - PayloadProcessor
  • crates/engine/tree/src/tree/payload_processor/prewarm.rs - PrewarmContext

High Priority Files

  • crates/engine/tree/src/tree/cached_state.rs - ExecutionCache
  • crates/engine/tree/src/tree/state.rs - TreeState
  • crates/trie/common/src/hashed_state.rs - HashedStorage
  • crates/trie/parallel/src/proof_task.rs - ProofResultMessage

Supporting Files

  • crates/engine/tree/src/tree/invalid_headers.rs - InvalidHeaderCache
  • crates/engine/tree/src/tree/block_buffer.rs - BlockBuffer
  • crates/chain-state/src/in_memory.rs - BlockState