mirror of https://github.com/paradigmxyz/reth.git synced 2026-01-09 07:17:56 -05:00

Files

yongkangc 35430830c6 fix: correct store_in_db_trie logic and remove leftover TrieMasks usage

- Fix parallel trie store_in_db_trie: was setting true when masks were
  present regardless of whether they were empty. Now correctly checks
  that at least one mask is non-empty.
- Remove leftover TrieMasks struct usage in proof_v2 test

2026-01-03 04:50:08 +00:00

15 KiB

Raw Blame History

Reth Memory Layout Optimization Analysis

Engine, Trie, Multiproof, and Prewarming Components

This document provides a comprehensive analysis of memory layout optimization opportunities across the reth codebase, focusing on engine, trie, multiproof, and prewarming components.

Executive Summary

After thorough exploration of the codebase, I've identified 30+ data structures with potential memory layout improvements. The highest priority optimizations are in:

Trie node representations - SparseNode enum (40+ bytes per node)
Proof structures - MultiProof, StorageMultiProof with duplicate hash mask maps
Execution cache - ExecutionCache, PrewarmContext with scattered boolean fields
State tracking - TreeState, BlockState with suboptimal field ordering

Estimated memory savings: 5-15% reduction in hot path allocations with proper field reordering and consolidation.

Critical Priority - Trie Node Structures
Critical Priority - Proof Structures
High Priority - Engine Execution Structures
High Priority - State Management
Medium Priority - Supporting Structures
Alloy Type Considerations
Recommended Actions

1. Critical Priority - Trie Node Structures

1.1 `SparseNode` Enum

File: crates/trie/sparse/src/trie.rs:1835

pub enum SparseNode {
    Empty,
    Hash(B256),                                     // 32 bytes
    Leaf { key: Nibbles, hash: Option<B256> },     // Nibbles + 40 bytes
    Extension {
        key: Nibbles,
        hash: Option<B256>,
        store_in_db_trie: Option<bool>
    },
    Branch {
        state_mask: TrieMask,                       // u16 = 2 bytes
        hash: Option<B256>,                         // 33 bytes (Option)
        store_in_db_trie: Option<bool>              // 2 bytes (Option<bool>)
    },
}

Issues:

Enum discriminant + largest variant = 40+ bytes per node
Option<B256> adds 33 bytes (1 byte tag + 32 bytes value)
Option<bool> wastes 2 bytes when 1 bit suffices
TrieMask (u16) followed by Option<B256> causes 6 bytes padding

Recommendations:

Pack store_in_db_trie and hash.is_some() into a single u8 flags field
Store hash externally in a separate HashMap keyed by node path
Consider smaller representation for common patterns

1.2 `SerialSparseTrie`

File: crates/trie/sparse/src/trie.rs:297-315

pub struct SerialSparseTrie {
    nodes: HashMap<Nibbles, SparseNode>,
    branch_node_tree_masks: HashMap<Nibbles, TrieMask>,  // TrieMask = u16
    branch_node_hash_masks: HashMap<Nibbles, TrieMask>,  // Duplicate structure
    values: HashMap<Nibbles, Vec<u8>>,
    prefix_set: PrefixSetMut,
    updates: Option<SparseTrieUpdates>,
    rlp_buf: Vec<u8>,
}

Issues:

Two identical HashMaps (tree_masks and hash_masks) with same key type
4 HashMap allocations with Nibbles keys (variable-length)
rlp_buf: Vec<u8> at end is a reusable buffer that could be Arc shared

Recommendations:

Consolidate mask maps: HashMap<Nibbles, (TrieMask, TrieMask)> or HashMap<Nibbles, TrieMasks>
Reorder fields: rlp_buf before updates (often accessed after updates)
Consider Arc<RwLock<Vec<u8>>> for rlp_buf if shared across threads

1.3 `ParallelSparseTrie`

File: crates/trie/sparse-parallel/src/trie.rs:105-127

pub struct ParallelSparseTrie {
    upper_subtrie: Box<SparseSubtrie>,
    lower_subtries: [LowerSparseSubtrie; 16],  // Fixed array of 16
    prefix_set: PrefixSetMut,
    updates: Option<SparseTrieUpdates>,
    branch_node_tree_masks: HashMap<Nibbles, TrieMask>,
    branch_node_hash_masks: HashMap<Nibbles, TrieMask>,
    update_actions_buffers: Vec<Vec<SparseTrieUpdatesAction>>,
    parallelism_thresholds: ParallelismThresholds,
    #[cfg(feature = "metrics")]
    metrics: ParallelSparseTrieMetrics,
}

Issues:

Array of 16 subtries after Box - severe padding potential
Same duplicate mask map pattern as SerialSparseTrie
Option<SparseTrieUpdates> before two HashMaps

Recommendations:

Move lower_subtries array to end (largest field)
Consolidate mask maps
Group Option and small scalar fields together

2. Critical Priority - Proof Structures

2.1 `MultiProof`

File: crates/trie/common/src/proofs.rs:173-182

pub struct MultiProof {
    pub account_subtree: ProofNodes,
    pub branch_node_hash_masks: HashMap<Nibbles, TrieMask>,
    pub branch_node_tree_masks: HashMap<Nibbles, TrieMask>,
    pub storages: B256Map<StorageMultiProof>,
}

Issues:

Two separate HashMaps with identical structure
ProofNodes (large) followed by two maps, then another map

Recommendations:

Consolidate mask maps: Create TrieMasks { hash: TrieMask, tree: TrieMask } struct
HashMap<Nibbles, TrieMasks> saves one HashMap allocation

2.2 `StorageMultiProof`

File: crates/trie/common/src/proofs.rs:450-460

pub struct StorageMultiProof {
    pub root: B256,                                     // 32 bytes
    pub subtree: ProofNodes,
    pub branch_node_hash_masks: HashMap<Nibbles, TrieMask>,
    pub branch_node_tree_masks: HashMap<Nibbles, TrieMask>,
}

Issues:

Same duplicate HashMap pattern
root (32 bytes) at start is optimal for alignment
Two HashMaps at end should be consolidated

2.3 `ProofResultMessage`

File: crates/trie/parallel/src/proof_task.rs:598-607

pub struct ProofResultMessage {
    pub sequence_number: u64,
    pub result: Result<ProofResult, ParallelStateRootError>,
    pub elapsed: Duration,
    pub state: HashedPostState,
}

Issues:

u64 (8 bytes) at start, Duration (16 bytes) later
Result enum size varies significantly by variant

Recommendations:

Reorder: elapsed (16 bytes) → sequence_number (8 bytes) → result → state
Consider boxing the error variant in Result

3. High Priority - Engine Execution Structures

3.1 `PayloadProcessor`

File: crates/engine/tree/src/tree/payload_processor/mod.rs:104

pub struct PayloadProcessor<Evm> {
    executor: WorkloadExecutor,
    execution_cache: ExecutionCache,
    trie_metrics: MultiProofTaskMetrics,
    cross_block_cache_size: u64,
    disable_transaction_prewarming: bool,     // scattered bools
    disable_state_cache: bool,
    evm_config: Evm,
    precompile_cache_disabled: bool,
    precompile_cache_map: PrecompileCacheMap<SpecFor<Evm>>,
    sparse_state_trie: Arc<Mutex<Option<...>>>,
    disable_parallel_sparse_trie: bool,
    prewarm_max_concurrency: usize,
}

Issues:

4 boolean fields scattered throughout structure
Booleans between large fields cause padding waste
Generic Evm type size varies

Recommendations:

Create ProcessorFlags bitfield:

struct ProcessorFlags {
    bits: u8,  // 4 bools = 4 bits
}
impl ProcessorFlags {
    const DISABLE_TX_PREWARMING: u8 = 0b0001;
    const DISABLE_STATE_CACHE: u8 = 0b0010;
    const DISABLE_PRECOMPILE_CACHE: u8 = 0b0100;
    const DISABLE_PARALLEL_SPARSE_TRIE: u8 = 0b1000;
}

Move all config to struct end
Group related fields (executor, evm_config)

3.2 `PrewarmContext`

File: crates/engine/tree/src/tree/payload_processor/prewarm.rs:457

pub(super) struct PrewarmContext<N, P, Evm> {
    pub(super) env: ExecutionEnv<Evm>,
    pub(super) evm_config: Evm,
    pub(super) saved_cache: Option<SavedCache>,
    pub(super) provider: StateProviderBuilder<N, P>,
    pub(super) metrics: PrewarmMetrics,
    pub(super) terminate_execution: Arc<AtomicBool>,
    pub(super) precompile_cache_disabled: bool,
    pub(super) precompile_cache_map: PrecompileCacheMap<SpecFor<Evm>>,
}

Issues:

Frequently cloned for worker distribution
bool after Arc<AtomicBool> causes padding
Generic types make layout unpredictable

Recommendations:

Move terminate_execution and precompile_cache_disabled together
Consider Arc<PrewarmContextShared> for thread-shared parts
Separate mutable worker state from immutable config

3.3 `ExecutionCache`

File: crates/engine/tree/src/tree/cached_state.rs:342

pub(crate) struct ExecutionCache {
    code_cache: Cache<B256, Option<Bytecode>>,
    storage_cache: Cache<Address, Arc<AccountStorageCache>>,
    account_cache: Cache<Address, Option<Account>>,
}

Issues:

3 separate Moka caches = 3 allocations
Accessed on every state read (hot path)

Recommendations:

Consider a unified cache with tagged keys if access patterns permit
Pre-size caches based on expected workload
Evaluate if Arc<AccountStorageCache> sharing is necessary

4. High Priority - State Management

4.1 `TreeState`

File: crates/engine/tree/src/tree/state.rs:24

pub struct TreeState<N: NodePrimitives = EthPrimitives> {
    pub(crate) blocks_by_hash: HashMap<B256, ExecutedBlock<N>>,
    pub(crate) blocks_by_number: BTreeMap<BlockNumber, Vec<ExecutedBlock<N>>>,
    pub(crate) parent_to_child: HashMap<B256, HashSet<B256>>,
    pub(crate) current_canonical_head: BlockNumHash,
    pub(crate) engine_kind: EngineApiKind,              // 1-2 bytes
}

Issues:

engine_kind (1-2 bytes) at end after large collections
6-7 bytes padding before engine_kind

Recommendations:

Move engine_kind and current_canonical_head (16 bytes) to struct start
Order: engine_kind → current_canonical_head → collections

4.2 `BlockState`

File: crates/chain-state/src/in_memory.rs:575

pub struct BlockState<N: NodePrimitives = EthPrimitives> {
    block: ExecutedBlock<N>,
    parent: Option<Arc<Self>>,
}

Issues:

Option<Arc<Self>> is 16 bytes (Option tag + pointer)
Small struct, but frequently allocated

4.3 `HashedStorage`

File: crates/trie/common/src/hashed_state.rs:404-409

pub struct HashedStorage {
    pub wiped: bool,                    // 1 byte
    pub storage: B256Map<U256>,         // HashMap
}

Issues:

bool before HashMap causes 7 bytes padding (HashMap aligned to 8)

Recommendations:

Move wiped after storage, or
Pack with other metadata if available

5. Medium Priority - Supporting Structures

5.1 `InvalidHeaderCache`

File: crates/engine/tree/src/tree/invalid_headers.rs:19

pub struct InvalidHeaderCache {
    headers: LruMap<B256, HeaderEntry>,
    metrics: InvalidHeaderCacheMetrics,
}

struct HeaderEntry {
    hit_count: u8,              // 1 byte
    header: BlockWithParent,    // Large
}

Issues:

hit_count: u8 before large struct = 7 bytes padding

5.2 `BlockBuffer`

File: crates/engine/tree/src/tree/block_buffer.rs:19

pub struct BlockBuffer<B: Block> {
    pub(crate) blocks: HashMap<BlockHash, SealedBlock<B>>,
    pub(crate) parent_to_child: HashMap<BlockHash, HashSet<BlockHash>>,
    pub(crate) earliest_blocks: BTreeMap<BlockNumber, HashSet<BlockHash>>,
    pub(crate) block_queue: VecDeque<BlockHash>,
    pub(crate) max_blocks: usize,
    pub(crate) metrics: BlockBufferMetrics,
}

Issues:

5 collection types in sequence
max_blocks (8 bytes) between collections

5.3 `AccountProof`

File: crates/trie/common/src/proofs.rs:573-585

pub struct AccountProof {
    pub address: Address,           // 20 bytes
    pub info: Option<Account>,
    pub proof: Vec<Bytes>,
    pub storage_root: B256,         // 32 bytes
    pub storage_proofs: Vec<StorageProof>,
}

Issues:

Address (20 bytes) at start, misaligned for 32-byte B256

Recommendations:

Reorder: storage_root (32 bytes) → address (20 bytes) → Vecs/Options

6. Alloy Type Considerations

6.1 Core Types

Type	Size	Alignment	Notes
`B256`	32 bytes	1 (byte array)	Optimal for hashing
`Address`	20 bytes	1 (byte array)	Often padded to 24
`U256`	32 bytes	8	Little-endian limbs
`TrieMask`	2 bytes	2 (u16)	Branch child bitmap
`Nibbles`	8-72 bytes	8	SmallVec-backed

6.2 Map Types

B256Map<V> - Uses foldhash hasher optimized for 32-byte keys
B256Set - Same optimization for set operations
Both avoid unnecessary hashing computation on fixed-size keys

6.3 Potential Alloy Improvements

Address padding: Consider #[repr(align(8))] for Address to reduce struct padding
TrieMask packing: Could be combined with flags in same u16
Nibbles variants: Consider fixed-size array for common path lengths (≤ 64)

7. Recommended Actions

Immediate Wins (Low Risk, High Impact)

Priority	Structure	Change	Est. Savings
1	`HashedStorage`	Move `wiped` after `storage`	7 bytes/instance
2	`TreeState`	Reorder `engine_kind` to start	6 bytes/instance
3	`PayloadProcessor`	Consolidate bools to bitfield	24 bytes/instance
4	`HeaderEntry`	Move `hit_count` to end	7 bytes/entry

Medium Effort (Good ROI)

Priority	Structure	Change	Est. Savings
5	`MultiProof`	Consolidate mask maps	1 HashMap alloc
6	`StorageMultiProof`	Consolidate mask maps	1 HashMap alloc
7	`SerialSparseTrie`	Consolidate mask maps	1 HashMap alloc
8	`PrewarmContext`	Group related fields	Better cache locality
9	`AccountProof`	Reorder by size	4-8 bytes/instance

Strategic (Requires Deeper Changes)

Priority	Structure	Change	Impact
10	`SparseNode`	External hash storage	8-16 bytes/node
11	`SparseNode`	Pack flags into u8	2 bytes/node
12	`ExecutionCache`	Unified cache design	Reduced allocations
13	`PrewarmContext`	Arc-shared immutable data	Reduced cloning

Mask Map Consolidation Pattern

// Before: Two separate HashMaps
pub branch_node_hash_masks: HashMap<Nibbles, TrieMask>,
pub branch_node_tree_masks: HashMap<Nibbles, TrieMask>,

// After: Single HashMap with tuple value
#[derive(Clone, Copy, Default)]
pub struct TrieMasks {
    pub hash: TrieMask,  // 2 bytes
    pub tree: TrieMask,  // 2 bytes
}  // Total: 4 bytes, no padding

pub branch_node_masks: HashMap<Nibbles, TrieMasks>,

This pattern applies to:

MultiProof
DecodedMultiProof
StorageMultiProof
DecodedStorageMultiProof
SerialSparseTrie
ParallelSparseTrie

Appendix: Files to Review

Critical Files

crates/trie/sparse/src/trie.rs - SparseNode, SerialSparseTrie
crates/trie/common/src/proofs.rs - MultiProof, StorageMultiProof
crates/engine/tree/src/tree/payload_processor/mod.rs - PayloadProcessor
crates/engine/tree/src/tree/payload_processor/prewarm.rs - PrewarmContext

High Priority Files

crates/engine/tree/src/tree/cached_state.rs - ExecutionCache
crates/engine/tree/src/tree/state.rs - TreeState
crates/trie/common/src/hashed_state.rs - HashedStorage
crates/trie/parallel/src/proof_task.rs - ProofResultMessage

Supporting Files

crates/engine/tree/src/tree/invalid_headers.rs - InvalidHeaderCache
crates/engine/tree/src/tree/block_buffer.rs - BlockBuffer
crates/chain-state/src/in_memory.rs - BlockState

15 KiB Raw Blame History

Reth Memory Layout Optimization Analysis

Engine, Trie, Multiproof, and Prewarming Components

Executive Summary

Table of Contents

1. Critical Priority - Trie Node Structures

1.1 SparseNode Enum

1.2 SerialSparseTrie

1.3 ParallelSparseTrie

2. Critical Priority - Proof Structures

2.1 MultiProof

2.2 StorageMultiProof

2.3 ProofResultMessage

3. High Priority - Engine Execution Structures

3.1 PayloadProcessor

3.2 PrewarmContext

3.3 ExecutionCache

4. High Priority - State Management

4.1 TreeState

4.2 BlockState

4.3 HashedStorage

5. Medium Priority - Supporting Structures

5.1 InvalidHeaderCache

5.2 BlockBuffer

5.3 AccountProof

6. Alloy Type Considerations

6.1 Core Types

6.2 Map Types

6.3 Potential Alloy Improvements

7. Recommended Actions

Immediate Wins (Low Risk, High Impact)

Medium Effort (Good ROI)

Strategic (Requires Deeper Changes)

Mask Map Consolidation Pattern

Appendix: Files to Review

Critical Files

High Priority Files

Supporting Files

15 KiB

Raw Blame History

1.1 `SparseNode` Enum

1.2 `SerialSparseTrie`

1.3 `ParallelSparseTrie`

2.1 `MultiProof`

2.2 `StorageMultiProof`

2.3 `ProofResultMessage`

3.1 `PayloadProcessor`

3.2 `PrewarmContext`

3.3 `ExecutionCache`

4.1 `TreeState`

4.2 `BlockState`

4.3 `HashedStorage`

5.1 `InvalidHeaderCache`

5.2 `BlockBuffer`

5.3 `AccountProof`