Compare commits

..

1 Commits

Author SHA1 Message Date
Georgios Konstantopoulos
d701d70dc1 perf(db): disable prefault_write to avoid mincore() syscall overhead
In WRITEMAP mode, MDBX by default calls mincore() before each page write to check if
the page is resident in memory. Profiling shows this accounts for ~20-22% of
persistence time, yet pages are usually already warm from Engine Task reads.

Disabling prefault_write skips the mincore() syscall and lets the kernel handle
page faults directly. Since pages are likely resident, this trades potential
page faults (which rarely occur due to cache warmth) for syscall overhead
elimination.

Amp-Thread-ID: https://ampcode.com/threads/T-019c233a-3fa9-733b-8985-991c393eb610
Co-authored-by: Amp <amp@ampcode.com>
2026-02-03 11:23:57 +00:00
1230 changed files with 86321 additions and 62408 deletions

View File

@@ -1,4 +0,0 @@
---
---
Added site-level meta description for SEO.

View File

@@ -1,5 +0,0 @@
---
reth-transaction-pool: patch
---
Renamed and documented validation methods for clarity. `validate_one_no_state` and `validate_one_against_state` are now public methods `validate_stateless` and `validate_stateful` with improved documentation explaining their respective validation phases.

View File

@@ -1,10 +0,0 @@
---
reth-engine-primitives: patch
reth-engine-tree: patch
reth-node-core: patch
reth-trie-parallel: minor
---
Removed legacy proof calculation system and V2-specific configuration flags.
Removed the legacy (non-V2) proof calculation code paths, simplified multiproof task architecture by removing the dual-mode system, and cleaned up V2-specific CLI flags (`--engine.disable-proof-v2`, `--engine.disable-trie-cache`) that are no longer needed. The codebase now exclusively uses V2 proofs with the sparse trie cache.

View File

@@ -1,5 +0,0 @@
---
reth-trie-sparse: patch
---
Refactored sparse trie node state tracking to use RLP nodes instead of hashes. Replaced `Option<B256>` hash fields with `SparseNodeState` enum that tracks either dirty nodes or cached RLP nodes with optional database storage flags. Added debug assertions to validate leaf path lengths and improved pruning logic to use node paths directly instead of path-hash tuples.

View File

@@ -1,20 +0,0 @@
# Changelogs configuration for reth
# https://github.com/wevm/changelogs
# How to bump packages that depend on changed packages
dependent_bump = "patch"
[changelog]
# Generate per-crate changelogs (vs single root changelog)
format = "per-crate"
# Fixed groups: all always share the same version
# reth binaries share version
[[fixed]]
members = ["reth"]
# Packages to ignore (internal/test-only crates)
ignore = [
"reth-testing-utils",
"reth-bench",
]

View File

@@ -1,5 +0,0 @@
---
reth-trie-sparse: patch
---
Fixed a bug in `merge_subtrie_updates` where source insertions did not cancel destination removals (and vice versa), causing inconsistent trie updates accumulated across multiple `root()` calls without intermediate `take_updates()`. Added a test covering the cross-cancellation behavior.

View File

@@ -1,5 +0,0 @@
---
reth-transaction-pool: minor
---
Added support for optional custom stateless and stateful validation hooks in `EthTransactionValidator` via `set_additional_stateless_validation` and `set_additional_stateful_validation` methods. Also implemented a manual `Debug` impl to handle the non-`Debug` function pointer fields.

View File

@@ -1,17 +0,0 @@
---
reth: minor
reth-cli-commands: minor
reth-e2e-test-utils: minor
reth-ethereum-cli: minor
reth-node-core: minor
reth-optimism-bin: minor
reth-optimism-cli: minor
reth-prune: patch
reth-stages: patch
reth-storage-api: minor
reth-storage-db-api: minor
reth-storage-db-common: patch
reth-storage-provider: patch
---
Introduced `--storage.v2` flag to control storage mode defaults, replacing the `edge` feature flag with `rocksdb` feature. The new flag enables v2 storage settings (static files + RocksDB routing) while individual `--static-files.*` and `--rocksdb.*` flags can still override defaults. Updated feature gates from `edge` to `rocksdb` across all affected crates.

View File

@@ -1,5 +0,0 @@
---
reth-tasks: patch
---
Added panic handler to all rayon thread pools that logs panics via `tracing::error` instead of aborting the process.

View File

@@ -1,5 +0,0 @@
---
reth: patch
---
Removed Windows platform support from the codebase, including the Windows cross-compilation Dockerfile, build targets in Cross.toml and Makefile, and Windows-specific options in the bug report template.

View File

@@ -1,5 +0,0 @@
---
reth-network: minor
---
Added reason label to backed_off_peers metric. The metric now tracks backed off peers by reason (too_many_peers, graceful_close, connection_error) to improve observability.

View File

@@ -1,5 +0,0 @@
---
reth-network-types: patch
---
Increased default maximum concurrent outbound dials from 15 to 30.

View File

@@ -1,6 +0,0 @@
---
reth-trie-common: minor
reth-trie: minor
---
Added `contains_range` method to `PrefixSet` for checking if any key falls within a half-open range. Added prefix set support to `ProofCalculator` via `with_prefix_set`, enabling stale cached hash invalidation and branch collapse detection when keys are inserted or removed; propagated storage prefix sets through `SyncAccountValueEncoder`.

View File

@@ -1,5 +0,0 @@
---
reth-trie-sparse: patch
---
Refactored arena trie internals by adding a `BranchChildIdx::sibling()` helper, deduplicating `Index`/`NodeArena` type aliases, and replacing `is_empty()` with a `drop_root()` method. Fixed a bug where `cursor.pop()` was called before checking if the leaf was the root node, which could cause incorrect dirty-state propagation.

View File

@@ -1,5 +0,0 @@
---
reth-trie-sparse: patch
---
Added recording of `SetRoot` operation in `ParallelSparseTrie::set_root` when the `trie-debug` feature is enabled.

View File

@@ -1,5 +0,0 @@
---
reth-trie-sparse: minor
---
Fixed a bug in `ArenaParallelSparseTrie` where subtrie updates that would completely empty a subtrie were incorrectly dispatched to parallel workers instead of being processed inline, preventing correct branch collapse detection when blinded siblings are present. Refactored the `SparseTrie` test suite to accept a `fn() -> T` factory instead of requiring `T: Default`, enabling a new `arena_parallel_sparse_trie_always_parallel` test variant that exercises all tests with parallelism thresholds set to 1. Added `test_branch_collapse_multi_empty_subtries_blinded_remaining` to cover the case where removing multiple revealed leaves empties their subtries and leaves a single blinded sibling requiring a proof.

View File

@@ -1,5 +0,0 @@
---
reth-trie-sparse-parallel: patch
---
Fixed parallel sparse trie to skip revealing disconnected leaves by checking parent branch reachability before inserting leaf nodes.

View File

@@ -1,5 +0,0 @@
---
ef-tests: patch
---
Removed reth-stateless crate and stateless validation from ef-tests.

View File

@@ -1,6 +0,0 @@
---
reth-rpc-convert: minor
reth-storage-rpc-provider: minor
---
Replaced the separate `TryFromBlockResponse`, `TryFromReceiptResponse`, and `TryFromTransactionResponse` traits with a unified `RpcResponseConverter` trait and default `EthRpcConverter` implementation. Removed the `op-alloy-network` dependency and refactored `RpcBlockchainProvider` to store a dynamic converter instance instead of relying on per-type trait bounds.

View File

@@ -1,6 +0,0 @@
---
reth-engine-tree: patch
reth-trie-sparse-parallel: patch
---
Added tracing spans and debug logs to sparse trie operations for better observability during parallel state root computation.

View File

@@ -1,6 +0,0 @@
---
reth-exex: patch
reth-exex-types: patch
---
Added configurable backfill thresholds to ExEx notifications stream and added regression tests for state provider parity between pipeline and backfill execution paths.

View File

@@ -1,5 +0,0 @@
---
reth-payload-builder: minor
---
Added observability metrics for payload resolve latency and new payload job creation latency to the payload builder service.

View File

@@ -1,4 +0,0 @@
---
---
Added WebSocket subscription integration tests for eth_subscribe.

View File

@@ -1,5 +0,0 @@
---
reth-transaction-pool: minor
---
Added `consensus_ref` method to `PoolTransaction` trait for borrowing consensus transactions without cloning.

View File

@@ -1,4 +0,0 @@
---
---
Improved nightly Docker build failure Slack notification with more detailed formatting and context.

View File

@@ -1,5 +0,0 @@
---
reth-trie-sparse: patch
---
Fixed another branch collapse edge case where `check_subtrie_collapse_needs_proof` incorrectly compared removal count against total update count (including `Touched` entries), causing it to skip proof requests for blinded siblings and panic when the subtrie emptied. Added a regression test covering the removals + `Touched` + blinded sibling scenario.

View File

@@ -1,10 +0,0 @@
---
reth-chain-state: minor
reth-engine-primitives: minor
reth-engine-tree: minor
reth-node-core: minor
reth-node-events: minor
reth: patch
---
Added configurable slow block logging (`--engine.slow-block-threshold`) that emits a structured `warn!` log with detailed timing, state-operation counts, and cache hit-rate metrics for blocks whose total processing time exceeds the threshold. Introduced `ExecutionTimingStats`, `CacheStats`, `StateProviderStats`, and `SlowBlockInfo` types to carry execution statistics from block validation through persistence, and refactored `PersistenceResult` to carry commit duration alongside the last persisted block.

View File

@@ -1,5 +0,0 @@
---
reth-rpc-eth-types: patch
---
Updated `eth_simulateV1` revert error code from `-32000` to `3` to be consistent with `eth_call`, per [execution-apis#748](https://github.com/ethereum/execution-apis/pull/748).

View File

@@ -1,5 +0,0 @@
---
reth-engine-tree: patch
---
Reordered cache size calculations in `ExecutionCache::new` to group related operations together.

View File

@@ -1,7 +0,0 @@
---
reth: patch
reth-cli-commands: patch
reth-node-core: patch
---
Removed experimental ress protocol support for stateless Ethereum nodes.

View File

@@ -1,6 +0,0 @@
---
reth-trie-db: minor
reth-engine-tree: minor
---
Added `PendingChangeset` and `PendingChangesetGuard` to `ChangesetCache` so concurrent readers wait for an in-progress computation instead of falling back to the expensive DB-based path. The guard automatically cancels the pending entry on drop (e.g. task panic), ensuring waiters always make progress.

View File

@@ -1,5 +0,0 @@
---
reth-engine-tree: patch
---
Added sub-phase timing histograms to the sparse trie event loop, tracking channel wait, proof coalescing, multiproof reveal, and trie update durations separately.

View File

@@ -1,5 +0,0 @@
---
reth-node-builder: patch
---
Removed biased select in engine service loop to allow fair scheduling of shutdown requests alongside event processing.

View File

@@ -1,5 +0,0 @@
---
reth-transaction-pool: patch
---
Fixed swapped arguments in `blob_tx_priority` function calls, correcting the parameter order to match the function signature.

View File

@@ -1,4 +0,0 @@
---
---
Improved documentation overview page with better structure and clarity.

View File

@@ -1,5 +0,0 @@
---
reth: patch
---
Re-enabled changelog workflow to run automatically on pull requests.

View File

@@ -1,5 +0,0 @@
---
reth-node-events: patch
---
Updated consensus engine log message to be more accurate about received updates.

View File

@@ -1,8 +0,0 @@
---
reth-engine-primitives: minor
reth-engine-tree: minor
reth-node-core: minor
reth-trie-parallel: minor
---
Added `--engine.proof-jitter` CLI option behind the `trie-debug` feature flag. When set, each proof worker sleeps for a random duration up to the specified value before starting proof computation, useful for stress-testing timing-sensitive proof logic.

View File

@@ -1,6 +0,0 @@
---
reth-rpc-eth-api: minor
reth-rpc-server-types: minor
---
Added `eth_getStorageValues` RPC method for batch storage slot retrieval across multiple addresses.

View File

@@ -1,6 +0,0 @@
---
reth-chainspec: minor
reth-network-peers: minor
---
Removed OP stack bootnodes from default chain configurations and network peers module.

View File

@@ -1,5 +0,0 @@
---
reth-rpc-convert: patch
---
Updated `alloy-evm` dependency to git revision `9bc2dba` and adapted `TxEnvConverter` impl to the updated `TryIntoTxEnv` trait signature that now includes a `Spec` generic parameter.

View File

@@ -1,6 +0,0 @@
---
reth-trie: patch
reth-trie-sparse: patch
---
Refactored test harness for sparse trie tests by extracting `TrieTestHarness` into a shared `reth-trie` test utility, replacing duplicated inline harness code across multiple test modules. Updated `proof_v2` return type to include an optional root hash, and converted `original_root` and `storage` from public fields to accessor methods.

View File

@@ -1,5 +0,0 @@
---
reth-trie: patch
---
Removed the local `increment_and_strip_trailing_zeros` function and `PATH_ALL_ZEROS` static in `proof_v2`, replacing them with the equivalent `Nibbles::next_without_prefix` and `Nibbles::is_zeroes` builtins. Also replaced manual `.get()` calls on `state_mask`/`hash_mask` with direct field access and switched to `Nibbles::unpack_array` over the unsafe `unpack_unchecked`.

View File

@@ -1,9 +0,0 @@
---
reth-network-api: minor
reth-network-types: minor
reth-network: minor
reth-node-core: minor
reth: minor
---
Added optional ENR fork ID enforcement to filter out peers from incompatible networks during peer discovery, controlled by the `--enforce-enr-fork-id` CLI flag.

View File

@@ -1,5 +0,0 @@
---
reth-primitives: patch
---
Moved feature-referenced dependencies from dev-dependencies to optional dependencies to ensure they are available when their corresponding features are enabled.

View File

@@ -1,5 +0,0 @@
---
reth-engine-tree: patch
---
Added idle-time pre-computation of account trie upper hashes in the sparse trie payload processor when no pending proof results are available.

View File

@@ -1,5 +0,0 @@
---
reth-engine-tree: patch
---
Fixed `compare_trie_updates` to return `bool` indicating whether differences were found, and updated the caller to properly use the return value instead of treating all successful comparisons as having no differences.

View File

@@ -1,7 +0,0 @@
---
reth-cli-commands: minor
reth-node-core: minor
reth: patch
---
Made v2 storage the default for all new databases, deprecating the `--storage.v2` flag to a hidden no-op kept for backwards compatibility. Updated CLI reference docs to remove the now-hidden flag from all command help pages.

View File

@@ -1,5 +0,0 @@
---
reth-node-core: minor
---
Added `with_dev_block_time` helper method to `NodeConfig` for configuring dev miner block production interval.

View File

@@ -1,5 +0,0 @@
---
reth-transaction-pool: minor
---
Added `IntoIter: Send` bounds to `validate_transactions` and `validate_transactions_with_origin` in the `TransactionValidator` trait, avoiding unnecessary `Vec` collects. Simplified default `validate_transactions_with_origin` to delegate to `validate_transactions`.

View File

@@ -1,5 +0,0 @@
---
reth-db-api: patch
---
Changed `StoredNibblesSubKey` encoding to use a stack-allocated `[u8; 65]` array instead of a heap-allocated `Vec<u8>`, avoiding unnecessary heap allocation.

View File

@@ -1,7 +0,0 @@
---
reth-engine-tree: patch
reth-trie-sparse: patch
reth-tasks: patch
---
Offloaded deallocation of expensive proof node buffers to a persistent background thread (`Runtime::spawn_drop`) to avoid blocking state root computation or lock-holding code.

View File

@@ -1,5 +0,0 @@
---
reth-storage-api: patch
---
Added `Arc` to `auto_impl` derive for storage-api traits to support automatic `Arc` wrapper implementations.

View File

@@ -1,8 +0,0 @@
---
reth: patch
reth-engine-tree: patch
reth-node-builder: patch
reth-trie-sparse: minor
---
Added `trie-debug` feature for recording sparse trie mutations to aid in debugging state root mismatches.

View File

@@ -1,6 +0,0 @@
---
reth-static-file-types: patch
reth-provider: patch
---
Move changeset offsets from segment header to external `.csoff` sidecar file for incremental writes and crash recovery.

View File

@@ -1,5 +0,0 @@
---
reth-provider: patch
---
Removed unused staging types from ProviderFactoryBuilder.

View File

@@ -1,6 +0,0 @@
---
reth-trie-sparse: patch
reth-engine-tree: patch
---
Removed the `skip_proof_node_filtering` flag, `revealed_account_paths`/`revealed_paths` tracking, and the `filter_revealed_v2_proof_nodes` function from the sparse trie implementation. Also removed the corresponding skipped-nodes metrics, simplifying the proof node reveal path to always pass nodes directly to the sparse trie without pre-filtering.

View File

@@ -1,5 +0,0 @@
---
reth-trie-sparse: minor
---
Added a comprehensive generic `SparseTrie` test suite covering `set_root`, `reveal_nodes`, `update_leaves`, `root`, `take_updates`, `commit_updates`, `prune`, `wipe`/`clear`, `get_leaf_value`, `find_leaf`, `size_hint`, and integration lifecycle scenarios. Tests are stamped out for all concrete `SparseTrie` implementations via a macro.

View File

@@ -1,5 +0,0 @@
---
reth-provider: patch
---
Fixed sender pruning during block reorg to skip when sender_recovery is fully pruned, preventing a fatal crash when no sender data exists in static files.

View File

@@ -1,5 +0,0 @@
---
reth-engine-tree: patch
---
Downgraded per-transaction prewarm span from `debug_span!` to `trace_span!` to reduce noise in debug-level logging.

View File

@@ -1,7 +0,0 @@
---
reth-network-types: minor
reth-network: minor
reth-node-core: patch
---
Added `PersistedPeerInfo` struct to persist richer peer metadata (kind, fork ID, reputation) to disk. Updated `PeersConfig::with_basic_nodes_from_file` to support both the new `PersistedPeerInfo` format and the legacy `Vec<NodeRecord>` format with automatic conversion, and updated `write_peers_to_file` to exclude backed-off and banned peers.

View File

@@ -1,5 +0,0 @@
---
reth: patch
---
Added automated changelog generation infrastructure using wevm/changelogs-rs with Claude Code integration. Configured per-crate changelog format with fixed version groups for reth binaries and exclusions for internal test utilities.

View File

@@ -1,5 +0,0 @@
---
reth-trie-sparse: minor
---
Removed `SerialSparseTrie` from the workspace, consolidating on `ParallelSparseTrie` as the single sparse trie implementation in `reth-trie-sparse`.

View File

@@ -1,5 +0,0 @@
---
reth-cli-commands: minor
---
Added `reth_version` field to `SnapshotManifest` to record the Reth version that produced a snapshot. The field is optional and populated automatically during manifest generation.

View File

@@ -1,5 +0,0 @@
---
reth-network: minor
---
Added `fork_id` as a tiebreaker in peer selection when reputations are equal, preferring peers with a discovered `fork_id` as it indicates fork compatibility. Added a test to verify the tiebreaker behavior.

View File

@@ -1,5 +0,0 @@
---
reth-trie-sparse: patch
---
Fixed a bug where trie nodes could appear in both `updated_nodes` and `removed_nodes` simultaneously by removing entries from `removed_nodes` when a node is inserted as updated.

View File

@@ -1,8 +0,0 @@
---
reth-trie-common: minor
reth-trie: minor
reth-trie-parallel: minor
reth-engine-tree: patch
---
Moved `ProofV2Target`, `MultiProofTargetsV2`, and `ChunkedMultiProofTargetsV2` from `reth-trie-parallel::targets_v2` into a new `reth-trie-common::target_v2` module, making these types available at a lower level without pulling in the full parallel trie crate. Added a `multiproof_v2` method to `Proof` in `reth-trie` that generates a state multiproof using the V2 proof calculator with synchronous account value encoding.

View File

@@ -1,14 +0,0 @@
---
reth-engine-tree: patch
reth-evm-ethereum: patch
reth-evm: patch
reth-primitives-traits: patch
reth-revm: patch
reth-rpc-eth-api: patch
reth-rpc-eth-types: patch
reth-provider: patch
example-custom-evm: patch
example-precompile-cache: patch
---
Bumped revm to v35.0.0, revm-inspectors to 0.35.0, and alloy-evm to 0.29.0. Updated call sites throughout the codebase to align with the new APIs, including `ExecutionResult` field changes (`gas_used``gas.used()`/`gas.final_refunded()`), removal of `.without_state_clear()`, updated `EthPrecompiles::new(spec)` constructor, and updated `block_hashes.lowest()` access.

View File

@@ -1,9 +0,0 @@
---
reth-trie-sparse: minor
reth-engine-primitives: minor
reth-engine-tree: minor
reth-node-core: minor
reth-trie-common: patch
---
Added an arena-based sparse trie implementation (`ArenaParallelSparseTrie`) using `slotmap` arena allocation for node storage, enabling parallel subtrie mutation without per-node hashing overhead. Added `ConfigurableSparseTrie` enum to switch between the arena and hash-map implementations, and a `--engine.enable-arena-sparse-trie` CLI flag to opt in at runtime.

View File

@@ -1,5 +0,0 @@
---
reth: patch
---
Updated Alloy dependencies from 1.5.2 to 1.6.1.

View File

@@ -1,4 +0,0 @@
---
---
Expanded CLI integration tests with subcommand help coverage, config TOML validation, genesis JSON validation, and send transaction round-trip test for dev mode.

View File

@@ -1,6 +0,0 @@
---
reth-engine-local: minor
reth-node-builder: minor
---
Added trigger-based `MiningMode` variant that allows blocks to be built on-demand via custom streams, and exposed `with_mining_mode` method on `DebugNodeLauncherFuture` to override default mining configuration.

View File

@@ -1,5 +0,0 @@
---
reth-network: minor
---
Added direction labels to `closed_sessions` and `pending_session_failures` metrics. Operators can now distinguish session closures and failures by direction (`active`, `incoming_pending`, `outgoing_pending` for closed sessions; `inbound`, `outbound` for pending session failures).

View File

@@ -1,4 +0,0 @@
---
---
Moved Kurtosis CI failure notifications to the hive Slack channel.

View File

@@ -1,7 +0,0 @@
---
reth-rpc-api: minor
reth-rpc-builder: patch
reth-rpc: minor
---
Added `subscribeFinalizedChainNotifications` RPC endpoint that buffers committed chain notifications and emits them once a new finalized block is received.

View File

@@ -1,5 +0,0 @@
---
reth-transaction-pool: patch
---
Fixed a bug where transactions from the same sender were added to the pending subpool out of nonce order. Ensured `process_updates` runs before `add_new_transaction` so that lower-nonce promotions are enqueued before the newly inserted higher-nonce transaction, preserving correct ordering for live `BestTransactions` iterators.

View File

@@ -1,7 +0,0 @@
---
reth-engine-primitives: patch
reth-engine-tree: patch
reth-node-core: patch
---
Removed `--engine.enable-arena-sparse-trie` CLI flag and made the arena-based sparse trie the default implementation. The hash-map-based `ParallelSparseTrie` variant is no longer selectable.

View File

@@ -1,7 +0,0 @@
---
reth-trie: major
reth-trie-db: major
reth-provider: minor
---
Added `MaskedTrieCursorFactory` and `MaskedTrieCursor` to handle prefix-set-based hash invalidation at the cursor layer, replacing the `DatabaseTrieWitness` trait abstraction. Removed `with_prefix_sets_mut` from `TrieWitness` and deleted `DatabaseTrieWitness` — callers should now wrap their cursor factory with `MaskedTrieCursorFactory` to apply prefix sets during witness/proof computation.

View File

@@ -1,6 +0,0 @@
---
reth-trie: minor
reth-trie-parallel: minor
---
Added `root_node` and `storage_root_node` methods to proof calculators for efficient root-only calculations. These methods directly return the root node without requiring dummy targets, replacing the previous workaround of passing fake targets to proof generation.

View File

@@ -1,5 +0,0 @@
---
reth-trie: patch
---
Fixed a potential panic in `ProofCalculator` by clearing internal computation state (`branch_stack`, `child_stack`, `branch_path`, etc.) after errors, preventing stale state from causing `usize` underflow panics when the calculator is reused. Added a test verifying correct behavior after simulated mid-computation errors.

View File

@@ -1,6 +1,6 @@
[profile.default]
retries = { backoff = "exponential", count = 2, delay = "2s", jitter = true }
slow-timeout = { period = "30s", terminate-after = 2 }
slow-timeout = { period = "30s", terminate-after = 4 }
[[profile.default.overrides]]
filter = "test(general_state_tests)"

View File

@@ -20,11 +20,6 @@
# include dist directory, where the reth binary is located after compilation
!/dist
# include PGO build helper used by Dockerfile.depot
!/.github
!/.github/scripts
!/.github/scripts/build_pgo_bolt.sh
# include licenses
!LICENSE-*

5
.github/CODEOWNERS vendored
View File

@@ -1,7 +1,7 @@
* @gakonst
crates/chain-state/ @fgimenez @mattsse
crates/chainspec/ @Rjected @joshieDo @mattsse
crates/cli/ @mattsse @Rjected
crates/cli/ @mattsse
crates/config/ @shekhirin @mattsse @Rjected
crates/consensus/ @mattsse @Rjected
crates/e2e-test-utils/ @mattsse @Rjected @klkvr @fgimenez
@@ -19,6 +19,7 @@ crates/metrics/ @mattsse @Rjected
crates/net/ @mattsse @Rjected
crates/net/downloaders/ @Rjected
crates/node/ @mattsse @Rjected @klkvr
crates/optimism/ @mattsse @Rjected
crates/payload/ @mattsse @Rjected
crates/primitives-traits/ @Rjected @mattsse @klkvr
crates/primitives/ @Rjected @mattsse @klkvr
@@ -38,7 +39,7 @@ crates/storage/libmdbx-rs/ @shekhirin
crates/storage/nippy-jar/ @joshieDo @shekhirin
crates/storage/provider/ @joshieDo @shekhirin @yongkangc
crates/storage/storage-api/ @joshieDo
crates/tasks/ @mattsse @DaniPopes
crates/tasks/ @mattsse
crates/tokio-util/ @mattsse
crates/tracing/ @mattsse @shekhirin
crates/tracing-otlp/ @mattsse @Rjected

View File

@@ -43,6 +43,7 @@ body:
- `~/.cache/reth/logs` on Linux
- `~/Library/Caches/reth/logs` on macOS
- `%localAppData%/reth/logs` on Windows
render: text
validations:
required: false
@@ -57,6 +58,8 @@ body:
- Linux (ARM)
- Mac (Intel)
- Mac (Apple Silicon)
- Windows (x86)
- Windows (ARM)
- type: dropdown
id: container_type
attributes:

View File

@@ -5,4 +5,3 @@ self-hosted-runner:
- depot-ubuntu-latest-4
- depot-ubuntu-latest-8
- depot-ubuntu-latest-16
- available

View File

@@ -0,0 +1,36 @@
ethereum_package:
participants:
- el_type: reth
el_extra_params:
- "--rpc.eth-proof-window=100"
cl_type: teku
network_params:
preset: minimal
genesis_delay: 5
additional_preloaded_contracts: '
{
"0x4e59b44847b379578588920cA78FbF26c0B4956C": {
"balance": "0ETH",
"code": "0x7fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffe03601600081602082378035828234f58015156039578182fd5b8082525050506014600cf3",
"storage": {},
"nonce": "1"
}
}'
optimism_package:
chains:
chain0:
participants:
node0:
el:
type: op-geth
cl:
type: op-node
node1:
el:
type: op-reth
image: "ghcr.io/paradigmxyz/op-reth:kurtosis-ci"
cl:
type: op-node
network_params:
holocene_time_offset: 0
isthmus_time_offset: 0

View File

@@ -4,17 +4,3 @@ updates:
directory: "/"
schedule:
interval: "weekly"
- package-ecosystem: "cargo"
directory: "/"
schedule:
interval: "weekly"
labels:
- "A-dependencies"
commit-message:
prefix: "chore(deps)"
open-pull-requests-limit: 1
groups:
cargo-weekly:
applies-to: "version-updates"
patterns: ["*"]
update-types: ["minor", "patch"]

View File

@@ -1,106 +0,0 @@
// Generates a rich GitHub Actions job summary for reth-bench results.
//
// Reads from environment:
// BENCH_WORK_DIR Directory containing summary.json
// BENCH_PR PR number (may be empty)
// BENCH_ACTOR GitHub user who triggered the bench
// BENCH_CORES CPU core limit (0 = all)
// BENCH_WARMUP_BLOCKS Number of warmup blocks
// BENCH_SAMPLY 'true' if samply profiling was enabled
// BENCH_ABBA 'true' if ABBA interleaved order was used
//
// Usage from actions/github-script:
// const jobSummary = require('./.github/scripts/bench-job-summary.js');
// await jobSummary({ core, context, chartSha, grafanaUrl, runId });
const fs = require('fs');
const { verdict, loadSamplyUrls, blocksLabel, metricRows, waitTimeRows } = require('./bench-utils');
module.exports = async function ({ core, context, chartSha, grafanaUrl, runId }) {
let summary;
try {
summary = JSON.parse(fs.readFileSync(process.env.BENCH_WORK_DIR + '/summary.json', 'utf8'));
} catch (e) {
await core.summary.addRaw('⚠️ Benchmark completed but failed to load summary.').write();
return;
}
const repo = `${context.repo.owner}/${context.repo.repo}`;
const prNumber = process.env.BENCH_PR;
const actor = process.env.BENCH_ACTOR;
const commitUrl = `https://github.com/${repo}/commit`;
const { emoji, label } = verdict(summary.changes);
const baselineLink = `[\`${summary.baseline.name}\`](${commitUrl}/${summary.baseline.ref})`;
const featureLink = `[\`${summary.feature.name}\`](${commitUrl}/${summary.feature.ref})`;
const diffUrl = `https://github.com/${repo}/compare/${summary.baseline.ref}...${summary.feature.ref}`;
// Header & metadata
const metaParts = [];
if (prNumber) metaParts.push(`**[PR #${prNumber}](https://github.com/${repo}/pull/${prNumber})**`);
metaParts.push(`triggered by @${actor}`);
let md = `# ${emoji} ${label}\n\n`;
md += metaParts.join(' · ') + '\n\n';
md += `**Baseline:** ${baselineLink}\n`;
md += `**Feature:** ${featureLink} ([diff](${diffUrl}))\n`;
md += blocksLabel(summary).map(p => `**${p.key}:** ${p.value}`).join(' · ') + '\n\n';
// Main comparison table
const rows = metricRows(summary);
md += `| Metric | Baseline | Feature | Change |\n`;
md += `|--------|----------|---------|--------|\n`;
for (const r of rows) {
md += `| ${r.label} | ${r.baseline} | ${r.feature} | ${r.change} |\n`;
}
md += '\n';
// Wait time breakdown
const wtRows = waitTimeRows(summary);
if (wtRows.length > 0) {
md += `### Wait Time Breakdown\n\n`;
md += `| Metric | Baseline | Feature |\n`;
md += `|--------|----------|--------|\n`;
for (const r of wtRows) {
md += `| ${r.title} | ${r.baseline} | ${r.feature} |\n`;
}
md += '\n';
}
// Charts
if (chartSha) {
const prNum = prNumber || '0';
const baseUrl = `https://raw.githubusercontent.com/decofe/reth-bench-charts/${chartSha}/pr/${prNum}/${runId}`;
const charts = [
{ file: 'latency_throughput.png', label: 'Latency, Throughput & Diff' },
{ file: 'wait_breakdown.png', label: 'Wait Time Breakdown' },
{ file: 'gas_vs_latency.png', label: 'Gas vs Latency' },
];
md += `### Charts\n\n`;
for (const chart of charts) {
md += `<details><summary>${chart.label}</summary>\n\n`;
md += `![${chart.label}](${baseUrl}/${chart.file})\n\n`;
md += `</details>\n\n`;
}
}
// Samply profiles
const samplyUrls = loadSamplyUrls(process.env.BENCH_WORK_DIR);
const samplyLinks = Object.entries(samplyUrls).map(([run, url]) => `- **${run}**: [Firefox Profiler](${url})`);
if (samplyLinks.length > 0) {
md += `### Samply Profiles\n\n${samplyLinks.join('\n')}\n\n`;
}
// Grafana
if (grafanaUrl) {
md += `### Grafana Dashboard\n\n[View real-time metrics](${grafanaUrl})\n\n`;
}
// Node errors
try {
const errors = fs.readFileSync(process.env.BENCH_WORK_DIR + '/errors.md', 'utf8');
if (errors.trim()) md += '\n' + errors + '\n';
} catch {}
await core.summary.addRaw(md).write();
};

View File

@@ -1,276 +0,0 @@
#!/usr/bin/env python3
"""
Prometheus metrics proxy that fetches from a local reth node and
re-exposes with additional benchmark labels.
Reads labels from a JSON file (updated by local-reth-bench.sh between runs)
and injects them into every Prometheus metric line.
Returns empty 200 when reth is not running (clean Grafana gaps).
"""
import argparse
import ipaddress
import json
import subprocess
import sys
import time
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.request import urlopen
from urllib.error import URLError
def read_labels(path):
try:
with open(path) as f:
return json.load(f)
except (FileNotFoundError, json.JSONDecodeError):
return {}
def inject_labels(metrics_bytes, label_str, label_names):
"""Inject labels into Prometheus text format.
Operates on bytes and uses simple string ops instead of regex
for speed on large payloads (reth exposes thousands of metrics).
Skips injecting into lines that already contain any of the label names
to avoid duplicate labels (which Prometheus rejects).
"""
if not label_str:
return metrics_bytes
label_bytes = label_str.encode("utf-8")
# Pre-encode label names for fast duplicate detection
label_name_bytes = [n.encode("utf-8") for n in label_names]
out = []
for line in metrics_bytes.split(b"\n"):
# Skip comments and blank lines
if line.startswith(b"#") or not line:
out.append(line)
continue
brace = line.find(b"{")
space = line.find(b" ")
if space == -1:
# Malformed, pass through
out.append(line)
elif brace != -1 and brace < space:
# Has labels: metric{existing="val"} 123
close = line.find(b"}", brace)
if close == -1:
out.append(line)
continue
# Filter out labels that already exist in this line
existing = line[brace + 1:close]
inject = label_bytes
if existing:
for name in label_name_bytes:
if name + b"=" in existing:
# Rebuild inject string excluding this label
inject = _remove_label(inject, name)
if not inject:
out.append(line)
continue
if close == brace + 1:
# Empty braces: metric{} 123
out.append(line[:close] + inject + line[close:])
else:
out.append(line[:close] + b"," + inject + line[close:])
else:
# No labels: metric 123
out.append(line[:space] + b"{" + label_bytes + b"}" + line[space:])
return b"\n".join(out)
def _remove_label(label_bytes, name):
"""Remove a single label (name=\"...\") from a comma-separated label string."""
parts = []
for part in label_bytes.split(b","):
if not part.startswith(name + b"="):
parts.append(part)
return b",".join(parts)
def build_label_str(labels):
"""Pre-format the label injection string: key1="val1",key2="val2" """
if not labels:
return ""
return ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
def build_elapsed_gauge(labels):
"""Build a bench_elapsed_seconds gauge from run_start_epoch in labels."""
start = labels.get("run_start_epoch")
if not start:
return b""
try:
elapsed = time.time() - float(start)
except (ValueError, TypeError):
return b""
# Build labels excluding internal keys
display = {k: v for k, v in labels.items()
if k not in ("run_start_epoch", "reference_epoch")}
lstr = build_label_str(display)
return (
f"# HELP bench_elapsed_seconds Seconds since benchmark run started\n"
f"# TYPE bench_elapsed_seconds gauge\n"
f"bench_elapsed_seconds{{{lstr}}} {elapsed:.1f}\n"
).encode("utf-8")
def compute_timestamp_ms(labels):
"""Compute a synthetic timestamp so all runs share a common time origin.
Returns the timestamp in milliseconds, or None if not enough info.
Uses: reference_epoch + (now - run_start_epoch) → all runs overlay at
the same Grafana time range.
"""
ref = labels.get("reference_epoch")
start = labels.get("run_start_epoch")
if not ref or not start:
return None
try:
elapsed = time.time() - float(start)
return int((float(ref) + elapsed) * 1000)
except (ValueError, TypeError):
return None
def inject_timestamps(metrics_bytes, timestamp_ms):
"""Append a Prometheus timestamp (ms) to every data line.
Prometheus text format: metric{labels} value [timestamp_ms]
Adding timestamps causes Prometheus to store all runs' samples
at the same relative time, enabling natural overlay in Grafana.
"""
if timestamp_ms is None:
return metrics_bytes
ts = str(timestamp_ms).encode("utf-8")
out = []
for line in metrics_bytes.split(b"\n"):
if line.startswith(b"#") or not line:
out.append(line)
else:
out.append(line + b" " + ts)
return b"\n".join(out)
class MetricsHandler(BaseHTTPRequestHandler):
# Use HTTP/1.1 so Content-Length is respected and Prometheus
# doesn't have to rely on connection close to detect end of body.
protocol_version = "HTTP/1.1"
def do_GET(self):
src = self.client_address[0]
try:
resp = urlopen(self.server.upstream, timeout=2)
metrics = resp.read()
except (URLError, ConnectionError, OSError):
# reth not running — return empty 200
self._send(b"")
#print(f" scrape from {src}: empty (reth not running)", flush=True)
return
all_labels = read_labels(self.server.labels_file)
# Internal keys — not injected as Prometheus labels
internal = ("run_start_epoch", "reference_epoch")
labels = {k: v for k, v in all_labels.items() if k not in internal}
label_str = build_label_str(labels)
label_names = sorted(labels.keys())
t0 = time.monotonic()
result = inject_labels(metrics, label_str, label_names)
result += build_elapsed_gauge(all_labels)
ts_ms = compute_timestamp_ms(all_labels)
result = inject_timestamps(result, ts_ms)
dt = time.monotonic() - t0
self._send(result)
print(f" scrape from {src}: {len(metrics)} -> {len(result)} bytes, "
f"inject {dt*1000:.1f}ms", flush=True)
def _send(self, body):
self.send_response(200)
self.send_header("Content-Type", "text/plain; version=0.0.4")
self.send_header("Content-Length", str(len(body)))
self.send_header("Connection", "close")
self.end_headers()
if body:
self.wfile.write(body)
def log_message(self, format, *args):
pass # suppress per-request logging
def resolve_bind_address(subnet_cidr):
"""Find the local IP address that belongs to the given subnet.
Uses ``ip -j addr show`` to enumerate interfaces and returns the first
address that falls within *subnet_cidr* (e.g. ``10.10.0.0/24``).
"""
network = ipaddress.ip_network(subnet_cidr, strict=False)
try:
result = subprocess.run(
["ip", "-j", "addr", "show"],
capture_output=True, text=True, check=True,
)
interfaces = json.loads(result.stdout)
except (subprocess.CalledProcessError, FileNotFoundError, json.JSONDecodeError) as exc:
print(f"Error: cannot enumerate interfaces: {exc}", file=sys.stderr)
sys.exit(1)
for iface in interfaces:
for addr_info in iface.get("addr_info", []):
try:
addr = ipaddress.ip_address(addr_info["local"])
except (KeyError, ValueError):
continue
if addr in network:
return str(addr)
print(f"Error: no interface address found in subnet {subnet_cidr}", file=sys.stderr)
sys.exit(1)
def main():
parser = argparse.ArgumentParser(description="Prometheus metrics proxy with label injection")
parser.add_argument("--labels", default="/tmp/bench-metrics-labels.json",
help="Path to JSON file with labels to inject (default: /tmp/bench-metrics-labels.json)")
parser.add_argument("--upstream", default="http://127.0.0.1:9100/",
help="Upstream reth metrics URL (default: http://127.0.0.1:9100/)")
bind_group = parser.add_mutually_exclusive_group()
bind_group.add_argument("--bind", default=None,
help="Address to bind the proxy (default: 0.0.0.0)")
bind_group.add_argument("--subnet", default=None,
help="Auto-detect bind address from a local interface in this subnet (e.g. 10.10.0.0/24)")
parser.add_argument("--port", type=int, default=9090,
help="Port to bind the proxy (default: 9090)")
args = parser.parse_args()
if args.subnet:
bind_addr = resolve_bind_address(args.subnet)
elif args.bind:
bind_addr = args.bind
else:
bind_addr = "0.0.0.0"
server = HTTPServer((bind_addr, args.port), MetricsHandler)
server.upstream = args.upstream
server.labels_file = args.labels
print(f"bench-metrics-proxy listening on {bind_addr}:{args.port}")
print(f" upstream: {args.upstream}")
print(f" labels: {args.labels}")
sys.stdout.flush()
server.serve_forever()
if __name__ == "__main__":
main()

View File

@@ -1,131 +0,0 @@
#!/usr/bin/env bash
#
# Builds (or fetches from cache) reth binaries for benchmarking.
#
# Usage: bench-reth-build.sh <baseline|feature> <source-dir> <commit> [branch-sha]
#
# baseline — build/fetch the baseline binary at <commit> (merge-base)
# source-dir must be checked out at <commit>
# feature — build/fetch the candidate binary + reth-bench at <commit>
# source-dir must be checked out at <commit>
# optional branch-sha is the PR head commit for cache key
#
# Outputs:
# baseline: <source-dir>/target/profiling/reth
# feature: <source-dir>/target/profiling/reth, reth-bench installed to cargo bin
#
# Required: mc (MinIO client) with a configured alias
set -euo pipefail
MC="mc"
MODE="$1"
SOURCE_DIR="$2"
COMMIT="$3"
# Tracy support: when BENCH_TRACY is "on" or "full", add Tracy cargo features
# and frame pointers for accurate stack traces.
EXTRA_FEATURES=""
EXTRA_RUSTFLAGS=""
if [ "${BENCH_TRACY:-off}" != "off" ]; then
EXTRA_FEATURES="tracy,tracy-client/ondemand"
EXTRA_RUSTFLAGS=" -C force-frame-pointers=yes"
fi
# Cache suffix: hash of features+rustflags so different build configs get separate cache entries
if [ -n "$EXTRA_FEATURES" ] || [ -n "$EXTRA_RUSTFLAGS" ]; then
BUILD_SUFFIX="-$(echo "${EXTRA_FEATURES}${EXTRA_RUSTFLAGS}" | sha256sum | cut -c1-12)"
else
BUILD_SUFFIX=""
fi
# Verify a cached reth binary was built from the expected commit.
# `reth --version` outputs "Commit SHA: <full-sha>" on its own line.
verify_binary() {
local binary="$1" expected_commit="$2"
local version binary_sha
version=$("$binary" --version 2>/dev/null) || return 1
binary_sha=$(echo "$version" | sed -n 's/^Commit SHA: *//p')
if [ -z "$binary_sha" ]; then
echo "Warning: could not extract commit SHA from version output"
return 1
fi
if [ "$binary_sha" = "$expected_commit" ]; then
return 0
fi
echo "Cache mismatch: binary built from ${binary_sha} but expected ${expected_commit}"
return 1
}
case "$MODE" in
baseline|main)
BUCKET="minio/reth-binaries/${COMMIT}${BUILD_SUFFIX}"
mkdir -p "${SOURCE_DIR}/target/profiling"
CACHE_VALID=false
if $MC stat "${BUCKET}/reth" &>/dev/null; then
echo "Cache hit for baseline (${COMMIT}), downloading binary..."
$MC cp "${BUCKET}/reth" "${SOURCE_DIR}/target/profiling/reth"
chmod +x "${SOURCE_DIR}/target/profiling/reth"
if verify_binary "${SOURCE_DIR}/target/profiling/reth" "${COMMIT}"; then
CACHE_VALID=true
else
echo "Cached baseline binary is stale, rebuilding..."
fi
fi
if [ "$CACHE_VALID" = false ]; then
echo "Building baseline (${COMMIT}) from source..."
cd "${SOURCE_DIR}"
FEATURES_ARG=""
WORKSPACE_ARG=""
if [ -n "$EXTRA_FEATURES" ]; then
# --workspace is needed for cross-package feature syntax (tracy-client/ondemand)
FEATURES_ARG="--features ${EXTRA_FEATURES}"
WORKSPACE_ARG="--workspace"
fi
# shellcheck disable=SC2086
RUSTFLAGS="-C target-cpu=native${EXTRA_RUSTFLAGS}" \
cargo build --profile profiling --bin reth $WORKSPACE_ARG $FEATURES_ARG
$MC cp target/profiling/reth "${BUCKET}/reth"
fi
;;
feature|branch)
BRANCH_SHA="${4:-$COMMIT}"
BUCKET="minio/reth-binaries/${BRANCH_SHA}${BUILD_SUFFIX}"
CACHE_VALID=false
if $MC stat "${BUCKET}/reth" &>/dev/null && $MC stat "${BUCKET}/reth-bench" &>/dev/null; then
echo "Cache hit for ${BRANCH_SHA}, downloading binaries..."
mkdir -p "${SOURCE_DIR}/target/profiling"
$MC cp "${BUCKET}/reth" "${SOURCE_DIR}/target/profiling/reth"
$MC cp "${BUCKET}/reth-bench" /home/ubuntu/.cargo/bin/reth-bench
chmod +x "${SOURCE_DIR}/target/profiling/reth" /home/ubuntu/.cargo/bin/reth-bench
if verify_binary "${SOURCE_DIR}/target/profiling/reth" "${COMMIT}"; then
CACHE_VALID=true
else
echo "Cached feature binary is stale, rebuilding..."
fi
fi
if [ "$CACHE_VALID" = false ]; then
echo "Building feature (${COMMIT}) from source..."
cd "${SOURCE_DIR}"
rustup show active-toolchain || rustup default stable
if [ -n "$EXTRA_FEATURES" ]; then
# Can't use `make profiling` when adding features; build explicitly
# --workspace is needed for cross-package feature syntax (tracy-client/ondemand)
RUSTFLAGS="-C target-cpu=native${EXTRA_RUSTFLAGS}" \
cargo build --profile profiling --workspace --bin reth --features "${EXTRA_FEATURES}"
else
make profiling
fi
make install-reth-bench
$MC cp target/profiling/reth "${BUCKET}/reth"
$MC cp "$(which reth-bench)" "${BUCKET}/reth-bench"
fi
;;
*)
echo "Usage: $0 <baseline|feature> <source-dir> <commit> [branch-sha]"
exit 1
;;
esac

View File

@@ -1,260 +0,0 @@
#!/usr/bin/env python3
"""Generate benchmark charts from reth-bench CSV output.
Usage:
bench-engine-charts.py <combined_csv> --output-dir <dir> [--baseline <baseline_csv>]
Generates three PNG charts:
1. newPayload latency + Ggas/s per block (+ latency diff when baseline present)
2. Wait breakdown (persistence, execution cache, sparse trie) per block
3. Scatter plot of gas used vs latency
When --baseline is provided, charts overlay both datasets for comparison.
"""
import argparse
import csv
import sys
from pathlib import Path
import numpy as np
try:
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
except ImportError:
print("matplotlib is required: pip install matplotlib", file=sys.stderr)
sys.exit(1)
GIGAGAS = 1_000_000_000
def parse_combined_csv(path: str) -> list[dict]:
rows = []
with open(path) as f:
reader = csv.DictReader(f)
for row in reader:
rows.append(
{
"block_number": int(row["block_number"]),
"gas_used": int(row["gas_used"]),
"new_payload_latency_us": int(row["new_payload_latency"]),
"persistence_wait_us": int(row["persistence_wait"])
if row.get("persistence_wait")
else None,
"execution_cache_wait_us": int(row.get("execution_cache_wait", 0)),
"sparse_trie_wait_us": int(row.get("sparse_trie_wait", 0)),
}
)
return rows
def plot_latency_and_throughput(
feature: list[dict], baseline: list[dict] | None, out: Path,
baseline_name: str = "baseline", feature_name: str = "feature",
):
num_plots = 3 if baseline else 2
fig, axes = plt.subplots(num_plots, 1, figsize=(12, 4 * num_plots), sharex=True)
ax1, ax2 = axes[0], axes[1]
feat_x = [r["block_number"] for r in feature]
feat_lat = [r["new_payload_latency_us"] / 1_000 for r in feature]
feat_ggas = []
for r in feature:
lat_s = r["new_payload_latency_us"] / 1_000_000
feat_ggas.append(r["gas_used"] / lat_s / GIGAGAS if lat_s > 0 else 0)
if baseline:
base_x = [r["block_number"] for r in baseline]
base_lat = [r["new_payload_latency_us"] / 1_000 for r in baseline]
base_ggas = []
for r in baseline:
lat_s = r["new_payload_latency_us"] / 1_000_000
base_ggas.append(r["gas_used"] / lat_s / GIGAGAS if lat_s > 0 else 0)
l, = ax1.plot(base_x, base_lat, linewidth=0.8, label=baseline_name, alpha=0.7)
ax1.axhline(np.median(base_lat), color=l.get_color(), linestyle="--", linewidth=1, alpha=0.7, label=f"{baseline_name} median")
l, = ax2.plot(base_x, base_ggas, linewidth=0.8, label=baseline_name, alpha=0.7)
ax2.axhline(np.median(base_ggas), color=l.get_color(), linestyle="--", linewidth=1, alpha=0.7, label=f"{baseline_name} median")
l, = ax1.plot(feat_x, feat_lat, linewidth=0.8, label=feature_name)
ax1.axhline(np.median(feat_lat), color=l.get_color(), linestyle="--", linewidth=1, label=f"{feature_name} median")
ax1.set_ylabel("Latency (ms)")
ax1.set_title("newPayload Latency per Block")
ax1.grid(True, alpha=0.3)
ax1.legend()
l, = ax2.plot(feat_x, feat_ggas, linewidth=0.8, label=feature_name)
ax2.axhline(np.median(feat_ggas), color=l.get_color(), linestyle="--", linewidth=1, label=f"{feature_name} median")
ax2.set_ylabel("Ggas/s")
ax2.set_title("Execution Throughput per Block")
ax2.grid(True, alpha=0.3)
ax2.legend()
if baseline:
ax3 = axes[2]
base_by_block = {r["block_number"]: r["new_payload_latency_us"] for r in baseline}
blocks, diffs = [], []
for r in feature:
bn = r["block_number"]
if bn in base_by_block and base_by_block[bn] > 0:
pct = (r["new_payload_latency_us"] - base_by_block[bn]) / base_by_block[bn] * 100
blocks.append(bn)
diffs.append(pct)
if blocks:
colors = ["green" if d <= 0 else "red" for d in diffs]
ax3.bar(blocks, diffs, width=1.0, color=colors, alpha=0.7, edgecolor="none")
ax3.axhline(0, color="black", linewidth=0.5)
ax3.set_ylabel("Δ Latency (%)")
ax3.set_title("Per-Block newPayload Latency Change (feature vs baseline)")
ax3.grid(True, alpha=0.3, axis="y")
axes[-1].set_xlabel("Block Number")
fig.tight_layout()
fig.savefig(out, dpi=150)
plt.close(fig)
def plot_wait_breakdown(
feature: list[dict], baseline: list[dict] | None, out: Path,
baseline_name: str = "baseline", feature_name: str = "feature",
):
series = [
("Persistence Wait", "persistence_wait_us"),
("State Cache Wait", "execution_cache_wait_us"),
("Trie Cache Wait", "sparse_trie_wait_us"),
]
fig, axes = plt.subplots(len(series), 1, figsize=(12, 3 * len(series)), sharex=True)
for ax, (label, key) in zip(axes, series):
if baseline:
bx = [r["block_number"] for r in baseline if r[key] is not None]
by = [r[key] / 1_000 for r in baseline if r[key] is not None]
if bx:
ax.plot(bx, by, linewidth=0.8, label=baseline_name, alpha=0.7)
fx = [r["block_number"] for r in feature if r[key] is not None]
fy = [r[key] / 1_000 for r in feature if r[key] is not None]
if fx:
ax.plot(fx, fy, linewidth=0.8, label=feature_name)
ax.set_ylabel("ms")
ax.set_title(label)
ax.grid(True, alpha=0.3)
if baseline:
ax.legend()
axes[-1].set_xlabel("Block Number")
fig.suptitle("Wait Time Breakdown per Block", fontsize=14, y=1.01)
fig.tight_layout()
fig.savefig(out, dpi=150, bbox_inches="tight")
plt.close(fig)
def _add_regression(ax, x, y, color, label):
"""Add a linear regression line to the axes."""
if len(x) < 2:
return
xa, ya = np.array(x), np.array(y)
m, b = np.polyfit(xa, ya, 1)
x_range = np.linspace(xa.min(), xa.max(), 100)
ax.plot(x_range, m * x_range + b, color=color, linewidth=1.5, alpha=0.8,
label=label)
def plot_gas_vs_latency(
feature: list[dict], baseline: list[dict] | None, out: Path,
baseline_name: str = "baseline", feature_name: str = "feature",
):
fig, ax = plt.subplots(figsize=(8, 6))
if baseline:
bgas = [r["gas_used"] / 1_000_000 for r in baseline]
blat = [r["new_payload_latency_us"] / 1_000 for r in baseline]
ax.scatter(bgas, blat, s=8, alpha=0.5)
_add_regression(ax, bgas, blat, "tab:blue", baseline_name)
fgas = [r["gas_used"] / 1_000_000 for r in feature]
flat = [r["new_payload_latency_us"] / 1_000 for r in feature]
ax.scatter(fgas, flat, s=8, alpha=0.6)
_add_regression(ax, fgas, flat, "tab:orange", feature_name)
ax.set_xlabel("Gas Used (Mgas)")
ax.set_ylabel("newPayload Latency (ms)")
ax.set_title("Gas Used vs Latency")
ax.grid(True, alpha=0.3)
ax.legend()
fig.tight_layout()
fig.savefig(out, dpi=150)
plt.close(fig)
def merge_csvs(paths: list[str]) -> list[dict]:
"""Parse and merge multiple CSVs, averaging values for duplicate blocks."""
by_block: dict[int, list[dict]] = {}
for path in paths:
for row in parse_combined_csv(path):
by_block.setdefault(row["block_number"], []).append(row)
merged = []
for bn in sorted(by_block):
rows = by_block[bn]
if len(rows) == 1:
merged.append(rows[0])
else:
avg = {"block_number": bn}
for key in ("gas_used", "new_payload_latency_us"):
avg[key] = int(sum(r[key] for r in rows) / len(rows))
for key in ("persistence_wait_us", "execution_cache_wait_us", "sparse_trie_wait_us"):
vals = [r[key] for r in rows if r[key] is not None]
avg[key] = int(sum(vals) / len(vals)) if vals else None
merged.append(avg)
return merged
def main():
parser = argparse.ArgumentParser(description="Generate benchmark charts")
parser.add_argument(
"--feature", nargs="+", required=True,
help="Path(s) to feature combined_latency.csv",
)
parser.add_argument(
"--output-dir", required=True, help="Output directory for PNG charts"
)
parser.add_argument(
"--baseline", nargs="+", help="Path(s) to baseline combined_latency.csv"
)
parser.add_argument("--baseline-name", default="baseline", help="Label for baseline")
parser.add_argument("--feature-name", "--branch-name", default="feature", help="Label for feature")
args = parser.parse_args()
feature = merge_csvs(args.feature)
if not feature:
print("No results found in feature CSV(s)", file=sys.stderr)
sys.exit(1)
baseline = None
if args.baseline:
baseline = merge_csvs(args.baseline)
if not baseline:
print(
"Warning: no results in baseline CSV(s), skipping comparison",
file=sys.stderr,
)
baseline = None
out_dir = Path(args.output_dir)
out_dir.mkdir(parents=True, exist_ok=True)
bname = args.baseline_name
fname = args.feature_name
plot_latency_and_throughput(feature, baseline, out_dir / "latency_throughput.png", bname, fname)
plot_wait_breakdown(feature, baseline, out_dir / "wait_breakdown.png", bname, fname)
plot_gas_vs_latency(feature, baseline, out_dir / "gas_vs_latency.png", bname, fname)
print(f"Charts written to {out_dir}")
if __name__ == "__main__":
main()

View File

@@ -1,581 +0,0 @@
#!/usr/bin/env bash
#
# local-reth-bench.sh — Run the reth Engine API benchmark locally.
#
# Replicates the CI bench.yml workflow (build, snapshot, system tuning,
# interleaved B-F-F-B execution, summary, charts) without any GitHub
# Actions glue (no PR comments, no artifact upload, no Slack).
#
# Usage:
# local-reth-bench.sh <baseline-ref> <feature-ref> [options]
#
# Options:
# --blocks N Number of blocks to benchmark (default: 500)
# --warmup N Number of warmup blocks (default: 100)
# --cores N Limit reth to N CPU cores, 0 = all available (default: 0)
# --samply Enable samply profiling
# --tracy MODE Tracy profiling: off, on, full (default: off)
# --tracy-filter F Tracy tracing filter (default: debug)
# --no-tune Skip system tuning (useful on dev machines / macOS)
#
# Requires: the reth repo at RETH_REPO (default: ~/reth)
#
# Dependencies (install before first run):
# mc (MinIO client), schelk, cpupower, taskset, stdbuf, python3, curl,
# make, uv, pzstd, jq, Rust toolchain (cargo/rustup)
#
# The script delegates to the existing bench-reth-*.sh scripts in the reth
# repo for the actual build, snapshot, and run steps.
set -euo pipefail
# ── PATH ──────────────────────────────────────────────────────────────
# Ensure cargo and user-local bins (mc, uv) are visible
export PATH="$HOME/.local/bin:$HOME/.cargo/bin:$PATH"
# ── Defaults ──────────────────────────────────────────────────────────
RETH_REPO="${RETH_REPO:-$HOME/reth}"
BLOCKS=500
WARMUP=100
CORES=0
SAMPLY=false
TRACY="off"
TRACY_FILTER="debug"
TUNE=true
BASELINE_REF=""
FEATURE_REF=""
# ── Parse arguments ──────────────────────────────────────────────────
usage() {
cat <<EOF
Usage: $(basename "$0") <baseline-ref> <feature-ref> [options]
Options:
--blocks N Number of blocks to benchmark (default: 500)
--warmup N Number of warmup blocks (default: 100)
--cores N Limit reth to N CPU cores (default: 0 = all)
--samply Enable samply profiling
--tracy MODE Tracy profiling: off, on, full (default: off)
on = tracing only (lower overhead)
full = tracing + CPU sampling (higher overhead)
--tracy-filter F Tracy tracing filter (default: debug)
--no-tune Skip system tuning
EOF
exit 1
}
while [[ $# -gt 0 ]]; do
case "$1" in
--blocks) BLOCKS="$2"; shift 2 ;;
--warmup) WARMUP="$2"; shift 2 ;;
--cores) CORES="$2"; shift 2 ;;
--samply) SAMPLY=true; shift ;;
--tracy) TRACY="$2"; shift 2 ;;
--tracy-filter) TRACY_FILTER="$2"; shift 2 ;;
--no-tune) TUNE=false; shift ;;
--help|-h) usage ;;
-*) echo "Unknown option: $1"; usage ;;
*)
if [ -z "$BASELINE_REF" ]; then
BASELINE_REF="$1"
elif [ -z "$FEATURE_REF" ]; then
FEATURE_REF="$1"
else
echo "Unexpected argument: $1"; usage
fi
shift
;;
esac
done
if [ -z "$BASELINE_REF" ] || [ -z "$FEATURE_REF" ]; then
echo "Error: both <baseline-ref> and <feature-ref> are required."
usage
fi
# Validate --tracy value
case "$TRACY" in
off|on|full) ;;
*) echo "Error: --tracy must be off, on, or full (got: $TRACY)"; usage ;;
esac
# Samply + tracy=full are mutually exclusive (both use perf sampling)
if [ "$SAMPLY" = "true" ] && [ "$TRACY" = "full" ]; then
echo "Warning: samply and tracy=full both use perf sampling; downgrading tracy to 'on'."
TRACY="on"
fi
# ── Check dependencies ───────────────────────────────────────────────
missing=()
for cmd in mc schelk cpupower taskset stdbuf python3 curl make uv pzstd jq cargo; do
command -v "$cmd" &>/dev/null || missing+=("$cmd")
done
if [ ${#missing[@]} -gt 0 ]; then
echo "Error: missing required tools: ${missing[*]}"
echo "See the CI 'Install dependencies' step in .github/workflows/bench.yml for install instructions."
exit 1
fi
if [ "$TRACY" != "off" ]; then
if ! command -v tracy-capture &>/dev/null; then
echo "Error: tracy-capture is required for --tracy $TRACY"
exit 1
fi
fi
# Ensure tools that run via sudo are in a sudo-visible path.
# The bench scripts use `sudo schelk` / `sudo samply` but cargo installs
# them to ~/.cargo/bin which sudo's secure_path doesn't include.
for cmd in schelk samply; do
if command -v "$cmd" &>/dev/null && ! sudo sh -c "command -v $cmd" &>/dev/null; then
echo "Installing $cmd to /usr/local/bin (needed for sudo)..."
sudo install "$(command -v "$cmd")" /usr/local/bin/
fi
done
if [ ! -d "$RETH_REPO/.git" ]; then
echo "Error: RETH_REPO=$RETH_REPO is not a git repository."
echo "Set RETH_REPO or clone reth to ~/reth"
exit 1
fi
# ── Resolve paths ────────────────────────────────────────────────────
SELF_DIR="$(cd "$(dirname "$0")" && pwd)"
SCRIPTS_DIR="${RETH_REPO}/.github/scripts"
BENCH_WORK_DIR="${RETH_REPO}/../bench-work-$(date +%Y%m%d-%H%M%S)"
BASELINE_SRC="${RETH_REPO}/../reth-baseline"
FEATURE_SRC="${RETH_REPO}/../reth-feature"
mkdir -p "$BENCH_WORK_DIR"
BENCH_WORK_DIR="$(cd "$BENCH_WORK_DIR" && pwd)"
# ── Global cleanup trap (restores system tuning on any exit) ─────────
TUNING_APPLIED=false
CSTATE_PID=
METRICS_PROXY_PID=
cleanup_global() {
[ -n "$METRICS_PROXY_PID" ] && kill "$METRICS_PROXY_PID" 2>/dev/null || true
if [ "$TUNING_APPLIED" = true ]; then
echo
echo "▸ Restoring system settings..."
[ -n "$CSTATE_PID" ] && kill "$CSTATE_PID" 2>/dev/null || true
sudo systemctl start irqbalance cron atd 2>/dev/null || true
echo " System settings restored."
fi
}
trap cleanup_global EXIT
echo "═══════════════════════════════════════════════════════════"
echo " reth local benchmark"
echo "═══════════════════════════════════════════════════════════"
echo " Baseline ref : $BASELINE_REF"
echo " Feature ref : $FEATURE_REF"
echo " Blocks : $BLOCKS"
echo " Warmup : $WARMUP"
echo " Cores : $CORES"
echo " Samply : $SAMPLY"
echo " Tracy : $TRACY"
echo " Tracy filter : $TRACY_FILTER"
echo " System tune : $TUNE"
echo " Work dir : $BENCH_WORK_DIR"
echo " Reth repo : $RETH_REPO"
echo "═══════════════════════════════════════════════════════════"
echo
# Enable sccache if available (matches CI's RUSTC_WRAPPER=sccache)
if command -v sccache &>/dev/null; then
export RUSTC_WRAPPER="sccache"
fi
# Export env vars expected by the bench-reth-*.sh scripts
export BENCH_BLOCKS="$BLOCKS"
export BENCH_WARMUP_BLOCKS="$WARMUP"
export BENCH_CORES="$CORES"
export BENCH_SAMPLY="$SAMPLY"
export BENCH_TRACY="$TRACY"
export BENCH_TRACY_FILTER="$TRACY_FILTER"
export BENCH_WORK_DIR
export SCHELK_MOUNT="${SCHELK_MOUNT:-/reth-bench}"
export BENCH_RPC_URL="${BENCH_RPC_URL:-https://ethereum.reth.rs/rpc}"
export BENCH_METRICS_ADDR="127.0.0.1:9100"
# ── Step 1: Resolve refs to full SHAs ────────────────────────────────
echo "▸ Resolving git refs..."
cd "$RETH_REPO"
resolve_ref() {
local ref="$1"
git fetch origin "$ref" --quiet 2>/dev/null || true
git rev-parse "$ref" 2>/dev/null \
|| git rev-parse "origin/$ref" 2>/dev/null \
|| { echo "Error: cannot resolve ref '$ref'"; exit 1; }
}
BASELINE_SHA="$(resolve_ref "$BASELINE_REF")"
FEATURE_SHA="$(resolve_ref "$FEATURE_REF")"
echo " Baseline SHA : $BASELINE_SHA"
echo " Feature SHA : $FEATURE_SHA"
echo
# ── Step 2: Prepare source directories ───────────────────────────────
echo "▸ Preparing source directories..."
prepare_source() {
local src_dir="$1" ref="$2"
if [ -d "$src_dir" ]; then
git -C "$src_dir" fetch origin "$ref" 2>/dev/null || true
else
git clone --recurse-submodules "$RETH_REPO" "$src_dir"
fi
git -C "$src_dir" checkout "$ref" --force
git -C "$src_dir" submodule update --init --recursive
}
prepare_source "$BASELINE_SRC" "$BASELINE_SHA"
prepare_source "$FEATURE_SRC" "$FEATURE_SHA"
BASELINE_SRC="$(cd "$BASELINE_SRC" && pwd)"
FEATURE_SRC="$(cd "$FEATURE_SRC" && pwd)"
echo " Baseline src : $BASELINE_SRC"
echo " Feature src : $FEATURE_SRC"
echo
# ── Step 3: Check / download snapshot ────────────────────────────────
echo "▸ Checking snapshot..."
cd "$RETH_REPO"
SNAPSHOT_NEEDED=false
if ! "${SCRIPTS_DIR}/bench-reth-snapshot.sh" --check; then
SNAPSHOT_NEEDED=true
echo " Snapshot needs update."
else
echo " Snapshot is up-to-date."
fi
echo
# ── Step 4: Build binaries (+ snapshot download) in parallel ─────────
echo "▸ Building binaries (parallel)..."
cd "$RETH_REPO"
FAIL=0
"${SCRIPTS_DIR}/bench-reth-build.sh" baseline "$BASELINE_SRC" "$BASELINE_SHA" &
PID_BASELINE=$!
"${SCRIPTS_DIR}/bench-reth-build.sh" feature "$FEATURE_SRC" "$FEATURE_SHA" &
PID_FEATURE=$!
PID_SNAPSHOT=
if [ "$SNAPSHOT_NEEDED" = "true" ]; then
echo " Also downloading snapshot in parallel..."
"${SCRIPTS_DIR}/bench-reth-snapshot.sh" &
PID_SNAPSHOT=$!
fi
wait $PID_BASELINE || FAIL=1
wait $PID_FEATURE || FAIL=1
[ -n "$PID_SNAPSHOT" ] && { wait $PID_SNAPSHOT || FAIL=1; }
if [ $FAIL -ne 0 ]; then
echo "Error: one or more parallel tasks failed (builds / snapshot)"
exit 1
fi
echo " Binaries built successfully."
echo
# ── Step 5: System tuning (optional) ────────────────────────────────
if [ "$TUNE" = "true" ]; then
echo "▸ Applying system tuning..."
sudo cpupower frequency-set -g performance 2>/dev/null || true
# Disable turbo boost (Intel + AMD)
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo 2>/dev/null || true
echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost 2>/dev/null || true
sudo swapoff -a 2>/dev/null || true
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space 2>/dev/null || true
# Disable SMT (hyperthreading)
for cpu in /sys/devices/system/cpu/cpu*/topology/thread_siblings_list; do
[ -f "$cpu" ] || continue
first=$(cut -d, -f1 < "$cpu" | cut -d- -f1)
current=$(echo "$cpu" | grep -o 'cpu[0-9]*' | grep -o '[0-9]*')
if [ "$current" != "$first" ]; then
echo 0 | sudo tee "/sys/devices/system/cpu/cpu${current}/online" 2>/dev/null || true
fi
done
echo " Online CPUs: $(nproc)"
# Disable transparent huge pages
for p in /sys/kernel/mm/transparent_hugepage /sys/kernel/mm/transparent_hugepages; do
if [ -d "$p" ]; then
echo never | sudo tee "$p/enabled" 2>/dev/null || true
echo never | sudo tee "$p/defrag" 2>/dev/null || true
break
fi
done
# Prevent deep C-states
sudo sh -c 'exec 3<>/dev/cpu_dma_latency; echo -ne "\x00\x00\x00\x00" >&3; sleep infinity' &
CSTATE_PID=$!
# Pin IRQs to core 0
for irq in /proc/irq/*/smp_affinity_list; do
echo 0 | sudo tee "$irq" 2>/dev/null || true
done
# Stop noisy background services
sudo systemctl stop irqbalance cron atd unattended-upgrades snapd 2>/dev/null || true
TUNING_APPLIED=true
# Log environment for reproducibility (matches CI)
echo " === Benchmark environment ==="
echo " Kernel : $(uname -r)"
lscpu | grep -E 'Model name|CPU\(s\)|MHz|NUMA' | sed 's/^/ /'
echo " Governor : $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 2>/dev/null || echo unknown)"
echo " Freq : $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq 2>/dev/null || echo unknown)"
echo " THP : $(cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null || cat /sys/kernel/mm/transparent_hugepages/enabled 2>/dev/null || echo unknown)"
free -h | sed 's/^/ /'
echo " System tuning applied."
echo
fi
# ── Step 5b: Tracefs mount (tracy=full only) ─────────────────────────
if [ "$TRACY" = "full" ] && [ "$(uname)" = "Linux" ]; then
echo "▸ Mounting tracefs for Tracy full mode..."
sudo mount -t tracefs tracefs /sys/kernel/tracing -o mode=755 2>/dev/null || true
fi
# ── Tracy upload & viewer helpers ────────────────────────────────────
TRACY_VIEWER_BASE="${TRACY_VIEWER_BASE:-}"
tracy_viewer_url() {
local profile_url="$1"
if [ -z "$TRACY_VIEWER_BASE" ]; then
echo ""
return
fi
local encoded
encoded=$(python3 -c "import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1], safe=''))" "$profile_url")
echo "${TRACY_VIEWER_BASE}?profile_url=${encoded}"
}
upload_tracy() {
local label="$1" output_dir="$2" sha="$3"
local tracy_file="$output_dir/tracy-profile.tracy"
if [ ! -f "$tracy_file" ]; then
echo " Tracy: no profile found, skipping upload."
return
fi
local timestamp short_sha remote_name bucket mc_alias
timestamp=$(date +%Y%m%d-%H%M%S)
short_sha="${sha:0:7}"
remote_name="${label}-${short_sha}-${timestamp}.tracy"
bucket="${TRACY_BUCKET:-tracy-profiles}"
mc_alias="${MC_ALIAS:-minio}"
local minio_base="${TRACY_MINIO_URL:-http://minio.minio.svc.cluster.local:9000}"
echo " Tracy: uploading profile..."
if mc cp "$tracy_file" "${mc_alias}/${bucket}/${remote_name}"; then
local url="${minio_base}/${bucket}/${remote_name}"
echo "$url" > "$output_dir/tracy_url.txt"
local viewer
viewer=$(tracy_viewer_url "$url")
if [ -n "$viewer" ]; then
echo "$viewer" > "$output_dir/tracy_viewer_url.txt"
echo " Tracy: uploaded → $viewer"
else
echo " Tracy: uploaded → $url"
fi
else
echo " Tracy: upload failed (non-fatal)."
fi
# Delete large profile to free disk
rm -f "$tracy_file"
}
# ── Step 6: Pre-flight cleanup ───────────────────────────────────────
echo "▸ Pre-flight cleanup..."
pkill -f bench-metrics-proxy 2>/dev/null || true
sudo pkill -9 reth 2>/dev/null || true
sleep 1
if mountpoint -q "$SCHELK_MOUNT" 2>/dev/null; then
sudo umount -l "$SCHELK_MOUNT" 2>/dev/null || true
sudo schelk recover -y 2>/dev/null || true
fi
echo
# ── Step 7: Interleaved benchmark runs (B-F-F-B) ────────────────────
# This ordering reduces systematic bias from thermal drift and cache warming.
BASELINE_BIN="${BASELINE_SRC}/target/profiling/reth"
FEATURE_BIN="${FEATURE_SRC}/target/profiling/reth"
# Start metrics proxy (reth → label injection → Prometheus)
LABELS_FILE="/tmp/bench-metrics-labels.json"
echo '{}' > "$LABELS_FILE"
METRICS_SUBNET="${METRICS_SUBNET:-10.10.0.0/24}"
METRICS_PORT="${METRICS_PORT:-9090}"
python3 "${SELF_DIR}/bench-metrics-proxy.py" \
--labels "$LABELS_FILE" \
--upstream "http://${BENCH_METRICS_ADDR}/" \
--subnet "$METRICS_SUBNET" \
--port "$METRICS_PORT" &
METRICS_PROXY_PID=$!
echo "▸ Metrics proxy started (PID $METRICS_PROXY_PID) on subnet ${METRICS_SUBNET}, port ${METRICS_PORT}"
# Unique benchmark ID: local-<timestamp> for local runs, ci-<run_id> for CI
BENCH_ID="local-$(basename "$BENCH_WORK_DIR" | sed 's/bench-work-//')"
# Reference epoch: shared time origin so all runs overlay in Grafana.
# The proxy maps each run's elapsed time onto this common origin.
BENCH_REFERENCE_EPOCH=$(date +%s)
write_labels() {
local run_label="$1" run_type="$2" ref="$3" sha="$4"
LAST_RUN_START=$(date +%s)
cat > "$LABELS_FILE" <<-EOF
{"benchmark_run":"${run_label}","run_type":"${run_type}","git_ref":"${ref}","bench_sha":"${sha}","benchmark_id":"${BENCH_ID}","run_start_epoch":"${LAST_RUN_START}","reference_epoch":"${BENCH_REFERENCE_EPOCH}"}
EOF
}
run_bench() {
local label="$1" binary="$2" output_dir="$3"
echo "▸ Running benchmark: ${label}..."
cd "$RETH_REPO"
if command -v taskset &>/dev/null; then
taskset -c 0 "${SCRIPTS_DIR}/bench-reth-run.sh" "$label" "$binary" "$output_dir"
else
"${SCRIPTS_DIR}/bench-reth-run.sh" "$label" "$binary" "$output_dir"
fi
echo "${label} complete."
echo
}
write_labels "baseline-1" "baseline" "$BASELINE_REF" "$BASELINE_SHA"
run_bench "baseline-1" "$BASELINE_BIN" "$BENCH_WORK_DIR/baseline-1"
write_labels "feature-1" "feature" "$FEATURE_REF" "$FEATURE_SHA"
run_bench "feature-1" "$FEATURE_BIN" "$BENCH_WORK_DIR/feature-1"
write_labels "feature-2" "feature" "$FEATURE_REF" "$FEATURE_SHA"
run_bench "feature-2" "$FEATURE_BIN" "$BENCH_WORK_DIR/feature-2"
write_labels "baseline-2" "baseline" "$BASELINE_REF" "$BASELINE_SHA"
run_bench "baseline-2" "$BASELINE_BIN" "$BENCH_WORK_DIR/baseline-2"
# ── Compute Grafana URL ──────────────────────────────────────────────
GRAFANA_BASE_URL="https://tempoxyz.grafana.net/d/reth-bench-ghr/reth-bench-ghr"
GRAFANA_DATASOURCE="ef57fux92e9z4e"
LAST_RUN_DURATION=$(( $(date +%s) - LAST_RUN_START ))
FROM_MS=$(( BENCH_REFERENCE_EPOCH * 1000 ))
TO_MS=$(( (BENCH_REFERENCE_EPOCH + LAST_RUN_DURATION) * 1000 ))
GRAFANA_URL="${GRAFANA_BASE_URL}?orgId=1&from=${FROM_MS}&to=${TO_MS}&timezone=browser&var-datasource=${GRAFANA_DATASOURCE}&var-job=reth-bench&var-benchmark_id=${BENCH_ID}&var-benchmark_run=\$__all"
# ── Step 8: Scan logs for errors ─────────────────────────────────────
echo "▸ Scanning logs for errors..."
ERRORS_FILE="$BENCH_WORK_DIR/errors.md"
found_errors=false
for run_dir in baseline-1 feature-1 feature-2 baseline-2; do
LOG="$BENCH_WORK_DIR/$run_dir/node.log"
[ -f "$LOG" ] || continue
panics=$(grep -c -E 'panicked at' "$LOG" 2>/dev/null || true)
errors=$(grep -c ' ERROR ' "$LOG" 2>/dev/null || true)
if [ "$panics" -gt 0 ] || [ "$errors" -gt 0 ]; then
if [ "$found_errors" = false ]; then
printf '### ⚠️ Node Errors\n\n' >> "$ERRORS_FILE"
found_errors=true
fi
printf '<details><summary><b>%s</b>: %d panic(s), %d error(s)</summary>\n\n' \
"$run_dir" "$panics" "$errors" >> "$ERRORS_FILE"
if [ "$panics" -gt 0 ]; then
printf '**Panics:**\n```\n' >> "$ERRORS_FILE"
grep -E 'panicked at' "$LOG" | head -10 >> "$ERRORS_FILE"
printf '```\n' >> "$ERRORS_FILE"
fi
if [ "$errors" -gt 0 ]; then
printf '**Errors (first 20):**\n```\n' >> "$ERRORS_FILE"
grep ' ERROR ' "$LOG" | head -20 >> "$ERRORS_FILE"
printf '```\n' >> "$ERRORS_FILE"
fi
printf '\n</details>\n\n' >> "$ERRORS_FILE"
fi
done
if [ "$found_errors" = true ]; then
echo " ⚠ Errors found — see $ERRORS_FILE"
else
echo " No errors found."
fi
echo
# ── Step 9: Parse results ───────────────────────────────────────────
echo "▸ Parsing results..."
cd "$RETH_REPO"
SUMMARY_ARGS=(
--output-summary "$BENCH_WORK_DIR/summary.json"
--output-markdown "$BENCH_WORK_DIR/comment.md"
--repo "paradigmxyz/reth"
--baseline-ref "$BASELINE_SHA"
--baseline-name "$BASELINE_REF"
--feature-name "$FEATURE_REF"
--feature-ref "$FEATURE_SHA"
--baseline-csv "$BENCH_WORK_DIR/baseline-1/combined_latency.csv" "$BENCH_WORK_DIR/baseline-2/combined_latency.csv"
--feature-csv "$BENCH_WORK_DIR/feature-1/combined_latency.csv" "$BENCH_WORK_DIR/feature-2/combined_latency.csv"
--gas-csv "$BENCH_WORK_DIR/feature-1/total_gas.csv"
--grafana-url "$GRAFANA_URL"
)
python3 "${SCRIPTS_DIR}/bench-reth-summary.py" "${SUMMARY_ARGS[@]}"
echo
# ── Step 10: Generate charts ─────────────────────────────────────────
echo "▸ Generating charts..."
CHART_ARGS=(
--output-dir "$BENCH_WORK_DIR/charts"
--feature "$BENCH_WORK_DIR/feature-1/combined_latency.csv" "$BENCH_WORK_DIR/feature-2/combined_latency.csv"
--baseline "$BENCH_WORK_DIR/baseline-1/combined_latency.csv" "$BENCH_WORK_DIR/baseline-2/combined_latency.csv"
--baseline-name "$BASELINE_REF"
--feature-name "$FEATURE_REF"
)
if python3 -c "import matplotlib" 2>/dev/null; then
python3 "${SCRIPTS_DIR}/bench-reth-charts.py" "${CHART_ARGS[@]}"
elif command -v uv &>/dev/null; then
uv run --with matplotlib python3 "${SCRIPTS_DIR}/bench-reth-charts.py" "${CHART_ARGS[@]}"
else
echo " Warning: matplotlib not available, skipping chart generation."
fi
echo
# ── Step 11: Upload Tracy profiles ────────────────────────────────────
if [ "$TRACY" != "off" ]; then
echo "▸ Uploading Tracy profiles..."
upload_tracy "baseline-1" "$BENCH_WORK_DIR/baseline-1" "$BASELINE_SHA"
upload_tracy "feature-1" "$BENCH_WORK_DIR/feature-1" "$FEATURE_SHA"
upload_tracy "feature-2" "$BENCH_WORK_DIR/feature-2" "$FEATURE_SHA"
upload_tracy "baseline-2" "$BENCH_WORK_DIR/baseline-2" "$BASELINE_SHA"
echo
fi
# ── Done (system restore happens via EXIT trap) ─────────────────────
echo "═══════════════════════════════════════════════════════════"
echo " Benchmark complete!"
echo "═══════════════════════════════════════════════════════════"
echo " Results : $BENCH_WORK_DIR/summary.json"
echo " Markdown : $BENCH_WORK_DIR/comment.md"
echo " Charts : $BENCH_WORK_DIR/charts/"
if [ -f "$ERRORS_FILE" ]; then
echo " Errors : $ERRORS_FILE"
fi
echo " Grafana : $GRAFANA_URL"
if [ "$TRACY" != "off" ]; then
echo " ─── Tracy Profiles ───"
for run_dir in baseline-1 feature-1 feature-2 baseline-2; do
url_file="$BENCH_WORK_DIR/$run_dir/tracy_viewer_url.txt"
if [ -f "$url_file" ]; then
echo " $run_dir : $(cat "$url_file")"
fi
done
fi
echo "═══════════════════════════════════════════════════════════"

View File

@@ -1,300 +0,0 @@
#!/usr/bin/env bash
#
# Runs a single reth-bench cycle: mount snapshot → start node → warmup →
# benchmark → stop node → recover snapshot.
#
# Usage: bench-reth-run.sh <label> <binary> <output-dir>
#
# Required env: SCHELK_MOUNT, BENCH_RPC_URL, BENCH_BLOCKS, BENCH_WARMUP_BLOCKS
# Optional env: BENCH_BIG_BLOCKS (true/false), BENCH_WORK_DIR (for big blocks path)
# BENCH_RETH_NEW_PAYLOAD (true/false, default true)
# BENCH_WAIT_TIME (duration like 500ms, default empty)
# BENCH_BASELINE_ARGS (extra reth node args for baseline runs)
# BENCH_FEATURE_ARGS (extra reth node args for feature runs)
# BENCH_OTLP_TRACES_ENDPOINT (OTLP HTTP endpoint for traces, e.g. https://host/insert/opentelemetry/v1/traces)
# BENCH_OTLP_LOGS_ENDPOINT (OTLP HTTP endpoint for logs, e.g. https://host/insert/opentelemetry/v1/logs)
set -euo pipefail
LABEL="$1"
BINARY="$2"
OUTPUT_DIR="$3"
DATADIR="$SCHELK_MOUNT/datadir"
mkdir -p "$OUTPUT_DIR"
LOG="${OUTPUT_DIR}/node.log"
cleanup() {
kill "$TAIL_PID" 2>/dev/null || true
# Stop tracy-capture first (SIGINT makes it disconnect and flush to disk)
# Must happen before killing reth, otherwise reth keeps streaming data.
if [ -n "${TRACY_PID:-}" ] && kill -0 "$TRACY_PID" 2>/dev/null; then
echo "Stopping tracy-capture..."
kill -INT "$TRACY_PID" 2>/dev/null || true
for i in $(seq 1 30); do
kill -0 "$TRACY_PID" 2>/dev/null || break
if [ $((i % 10)) -eq 0 ]; then
echo "Waiting for tracy-capture to finish writing... (${i}s)"
fi
sleep 1
done
if kill -0 "$TRACY_PID" 2>/dev/null; then
echo "tracy-capture still running after 30s, killing..."
kill -9 "$TRACY_PID" 2>/dev/null || true
fi
wait "$TRACY_PID" 2>/dev/null || true
fi
if [ -n "${RETH_PID:-}" ] && sudo kill -0 "$RETH_PID" 2>/dev/null; then
if [ "${BENCH_SAMPLY:-false}" = "true" ]; then
# Send SIGINT to the inner reth process by exact name (not -f which
# would also match samply's cmdline containing "reth"). Samply will
# capture reth's exit and save the profile.
sudo pkill -INT -x reth 2>/dev/null || true
# Wait for samply to finish writing the profile and exit
for i in $(seq 1 120); do
sudo pgrep -x samply > /dev/null 2>&1 || break
if [ $((i % 10)) -eq 0 ]; then
echo "Waiting for samply to finish writing profile... (${i}s)"
fi
sleep 1
done
if sudo pgrep -x samply > /dev/null 2>&1; then
echo "Samply still running after 120s, sending SIGTERM..."
sudo pkill -x samply 2>/dev/null || true
fi
else
sudo kill "$RETH_PID"
for i in $(seq 1 30); do
sudo kill -0 "$RETH_PID" 2>/dev/null || break
sleep 1
done
fi
sudo kill -9 "$RETH_PID" 2>/dev/null || true
sleep 1
fi
# Fix ownership of reth-created files (reth runs as root)
sudo chown -R "$(id -un):$(id -gn)" "$OUTPUT_DIR" 2>/dev/null || true
if mountpoint -q "$SCHELK_MOUNT"; then
sudo umount -l "$SCHELK_MOUNT" || true
sudo schelk recover -y || true
fi
}
TAIL_PID=
TRACY_PID=
trap cleanup EXIT
# Clean up stale schelk state from a previous cancelled run.
# If schelk thinks it's still mounted (e.g. a cancelled run skipped cleanup),
# recover first to reset state.
sudo schelk recover -y -k || true
# Mount
sudo schelk mount -y
sync
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
echo "=== Cache state after drop ==="
free -h
grep Cached /proc/meminfo
# Start reth
# CPU layout: core 0 = OS/IRQs/reth-bench/aux, cores 1+ = reth node
RETH_BENCH="$(which reth-bench)"
ONLINE=$(nproc --all)
MAX_RETH=$(( ONLINE - 1 ))
if [ "${BENCH_CORES:-0}" -gt 0 ] && [ "$BENCH_CORES" -lt "$MAX_RETH" ]; then
MAX_RETH=$BENCH_CORES
fi
RETH_CPUS="1-${MAX_RETH}"
BIG_BLOCKS="${BENCH_BIG_BLOCKS:-false}"
RETH_ARGS=(
node
--datadir "$DATADIR"
--log.file.directory "$OUTPUT_DIR/reth-logs"
--engine.accept-execution-requests-hash
--http
--http.port 8545
--ws
--ws.api all
--authrpc.port 8551
--disable-discovery
--no-persist-peers
)
# Gate flag on binary support (older baselines may not have it).
# Uses --help which exits immediately via clap without node init.
SYNC_STATE_IDLE=false
if "$BINARY" node --help 2>/dev/null | grep -qF -- '--debug.startup-sync-state-idle'; then
RETH_ARGS+=(--debug.startup-sync-state-idle)
SYNC_STATE_IDLE=true
fi
# Big blocks mode requires the testing API, skip-invalid-transactions, and
# skip-gas-limit-ramp-check + gas-limit override to avoid the 6800-block ramp.
if [ "$BIG_BLOCKS" = "true" ]; then
RETH_ARGS+=(--http.api eth,net,web3,reth,testing --rpc.max-request-size max --testing.skip-invalid-transactions --testing.skip-gas-limit-ramp-check --testing.gas-limit 1000000000)
fi
# Append per-label extra node args (baseline or feature)
EXTRA_NODE_ARGS=""
case "$LABEL" in
baseline*) EXTRA_NODE_ARGS="${BENCH_BASELINE_ARGS:-}" ;;
feature*) EXTRA_NODE_ARGS="${BENCH_FEATURE_ARGS:-}" ;;
esac
if [ -n "$EXTRA_NODE_ARGS" ]; then
# Word-split the string into individual args
# shellcheck disable=SC2206
RETH_ARGS+=($EXTRA_NODE_ARGS)
fi
if [ -n "${BENCH_METRICS_ADDR:-}" ]; then
RETH_ARGS+=(--metrics "$BENCH_METRICS_ADDR")
fi
# OTLP traces and logs export
if [ -n "${BENCH_OTLP_TRACES_ENDPOINT:-}" ]; then
RETH_ARGS+=(--tracing-otlp="${BENCH_OTLP_TRACES_ENDPOINT}" --tracing-otlp.service-name=reth-bench)
fi
if [ -n "${BENCH_OTLP_LOGS_ENDPOINT:-}" ]; then
RETH_ARGS+=(--logs-otlp="${BENCH_OTLP_LOGS_ENDPOINT}" --logs-otlp.filter=debug)
fi
# Tracy profiling: add --log.tracy flags and set environment
if [ "${BENCH_TRACY:-off}" != "off" ]; then
RETH_ARGS+=(--log.tracy --log.tracy.filter "${BENCH_TRACY_FILTER:-debug}")
if [ "${BENCH_TRACY}" = "on" ]; then
export TRACY_NO_SYS_TRACE=1
elif [ "${BENCH_TRACY}" = "full" ]; then
export TRACY_SAMPLING_HZ="${BENCH_TRACY_SAMPLING_HZ:-1}"
fi
fi
SUDO_ENV=()
if [ -n "${OTEL_RESOURCE_ATTRIBUTES:-}" ]; then
SUDO_ENV+=("OTEL_RESOURCE_ATTRIBUTES=${OTEL_RESOURCE_ATTRIBUTES}")
SUDO_ENV+=("OTEL_BSP_MAX_QUEUE_SIZE=65536" "OTEL_BLRP_MAX_QUEUE_SIZE=65536")
fi
# Limit reth memory to 95% of available RAM to prevent OOM kills
TOTAL_MEM_KB=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
MEM_LIMIT=$(( TOTAL_MEM_KB * 95 / 100 * 1024 ))
echo "Memory limit: $(( MEM_LIMIT / 1024 / 1024 ))MB (95% of $(( TOTAL_MEM_KB / 1024 ))MB)"
if [ "${BENCH_SAMPLY:-false}" = "true" ]; then
RETH_ARGS+=(--log.samply)
SAMPLY="$(which samply)"
sudo systemd-run --scope -p MemoryMax="$MEM_LIMIT" -p AllowedCPUs="$RETH_CPUS" \
env "${SUDO_ENV[@]}" nice -n -20 \
"$SAMPLY" record --save-only --presymbolicate --rate 10000 \
--output "$OUTPUT_DIR/samply-profile.json.gz" \
-- "$BINARY" "${RETH_ARGS[@]}" \
> "$LOG" 2>&1 &
else
sudo systemd-run --scope -p MemoryMax="$MEM_LIMIT" -p AllowedCPUs="$RETH_CPUS" \
env "${SUDO_ENV[@]}" nice -n -20 "$BINARY" "${RETH_ARGS[@]}" \
> "$LOG" 2>&1 &
fi
RETH_PID=$!
stdbuf -oL tail -f "$LOG" | sed -u "s/^/[reth] /" &
TAIL_PID=$!
for i in $(seq 1 60); do
if curl -sf http://127.0.0.1:8545 -X POST \
-H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
> /dev/null 2>&1; then
echo "reth (${LABEL}) RPC is up after ${i}s"
break
fi
if [ "$i" -eq 60 ]; then
echo "::error::reth (${LABEL}) failed to start within 60s"
cat "$LOG"
exit 1
fi
sleep 1
done
# Wait for the pipeline to finish (eth_syncing returns false) so the
# engine is in live mode and can accept newPayload calls.
# Only possible when --debug.startup-sync-state-idle is supported.
if [ "$SYNC_STATE_IDLE" = "true" ]; then
for i in $(seq 1 300); do
SYNC_RESULT=$(curl -sf http://127.0.0.1:8545 -X POST \
-H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' 2>/dev/null || true)
if [ -n "$SYNC_RESULT" ] && jq -e '.result == false' <<< "$SYNC_RESULT" > /dev/null 2>&1; then
echo "reth (${LABEL}) pipeline finished after ${i}s, engine is live"
break
fi
if [ "$i" -eq 300 ]; then
echo "::error::reth (${LABEL}) pipeline did not finish within 300s"
cat "$LOG"
exit 1
fi
sleep 1
done
else
echo "reth (${LABEL}) binary does not support --debug.startup-sync-state-idle, skipping sync wait"
fi
# Run reth-bench with high priority but as the current user so output
# files are not root-owned (avoids EACCES on next checkout).
BENCH_NICE="sudo nice -n -20 sudo -u $(id -un)"
# Build optional flags
EXTRA_BENCH_ARGS=()
if [ "${BENCH_RETH_NEW_PAYLOAD:-true}" != "false" ]; then
EXTRA_BENCH_ARGS+=(--reth-new-payload)
fi
if [ -n "${BENCH_WAIT_TIME:-}" ]; then
EXTRA_BENCH_ARGS+=(--wait-time "$BENCH_WAIT_TIME")
fi
if [ "$BIG_BLOCKS" = "true" ]; then
# Big blocks mode: replay pre-generated payloads
BIG_BLOCKS_DIR="${BENCH_WORK_DIR}/big-blocks"
# Start tracy-capture so profile only covers the benchmark
if [ "${BENCH_TRACY:-off}" != "off" ]; then
echo "Starting tracy-capture..."
tracy-capture -f -o "$OUTPUT_DIR/tracy-profile.tracy" &
TRACY_PID=$!
sleep 0.5 # give tracy-capture time to connect
fi
echo "Running big blocks benchmark (replay-payloads)..."
$BENCH_NICE "$RETH_BENCH" replay-payloads \
"${EXTRA_BENCH_ARGS[@]}" \
--payload-dir "$BIG_BLOCKS_DIR/payloads" \
--engine-rpc-url http://127.0.0.1:8551 \
--jwt-secret "$DATADIR/jwt.hex" \
--output "$OUTPUT_DIR" 2>&1 | sed -u "s/^/[bench] /"
else
# Standard mode: warmup + new-payload-fcu
# Warmup
$BENCH_NICE "$RETH_BENCH" new-payload-fcu \
--rpc-url "$BENCH_RPC_URL" \
--engine-rpc-url http://127.0.0.1:8551 \
--jwt-secret "$DATADIR/jwt.hex" \
--advance "${BENCH_WARMUP_BLOCKS:-50}" \
"${EXTRA_BENCH_ARGS[@]}" 2>&1 | sed -u "s/^/[bench] /"
# Start tracy-capture after warmup so profile only covers the benchmark
if [ "${BENCH_TRACY:-off}" != "off" ]; then
echo "Starting tracy-capture..."
tracy-capture -f -o "$OUTPUT_DIR/tracy-profile.tracy" &
TRACY_PID=$!
sleep 0.5 # give tracy-capture time to connect
fi
# Benchmark
$BENCH_NICE "$RETH_BENCH" new-payload-fcu \
--rpc-url "$BENCH_RPC_URL" \
--engine-rpc-url http://127.0.0.1:8551 \
--jwt-secret "$DATADIR/jwt.hex" \
--advance "$BENCH_BLOCKS" \
"${EXTRA_BENCH_ARGS[@]}" \
--output "$OUTPUT_DIR" 2>&1 | sed -u "s/^/[bench] /"
fi
# cleanup runs via trap

View File

@@ -1,120 +0,0 @@
#!/usr/bin/env bash
#
# Downloads the latest snapshot into the schelk volume using
# `reth download` with progress reporting to the GitHub PR comment.
#
# Skips the download if the manifest content hasn't changed since
# the last successful download (checked via SHA-256 of the manifest).
#
# Usage: bench-reth-snapshot.sh [--check]
# --check Only check if a download is needed; exits 0 if up-to-date, 10 if not.
#
# Required env:
# SCHELK_MOUNT schelk mount point (e.g. /reth-bench)
# BENCH_RETH_BINARY path to the reth binary
# GITHUB_TOKEN token for GitHub API calls (only for download)
# BENCH_COMMENT_ID PR comment ID to update (optional)
# BENCH_REPO owner/repo (e.g. paradigmxyz/reth)
# BENCH_JOB_URL link to the Actions job
# BENCH_ACTOR user who triggered the benchmark
# BENCH_CONFIG config summary line
set -euo pipefail
MC="mc"
BUCKET="minio/reth-snapshots"
MANIFEST_PATH="reth-1-minimal-stable/manifest.json"
DATADIR="$SCHELK_MOUNT/datadir"
HASH_FILE="$HOME/.reth-bench-snapshot-hash"
# Fetch manifest and compute content hash for reliable freshness check
MANIFEST_CONTENT=$($MC cat "${BUCKET}/${MANIFEST_PATH}" 2>/dev/null) || {
echo "::error::Failed to fetch snapshot manifest from ${BUCKET}/${MANIFEST_PATH}"
exit 2
}
REMOTE_HASH=$(echo "$MANIFEST_CONTENT" | sha256sum | awk '{print $1}')
LOCAL_HASH=""
[ -f "$HASH_FILE" ] && LOCAL_HASH=$(cat "$HASH_FILE")
if [ "$REMOTE_HASH" = "$LOCAL_HASH" ]; then
echo "Snapshot is up-to-date (manifest hash: ${REMOTE_HASH:0:16}…)"
exit 0
fi
echo "Snapshot needs update (local: ${LOCAL_HASH:+${LOCAL_HASH:0:16}}${LOCAL_HASH:-<none>}, remote: ${REMOTE_HASH:0:16}…)"
if [ "${1:-}" = "--check" ]; then
exit 10
fi
RETH="${BENCH_RETH_BINARY:?BENCH_RETH_BINARY must be set}"
if [ ! -x "$RETH" ]; then
echo "::error::reth binary not found or not executable at $RETH"
exit 1
fi
# Resolve the MinIO HTTP endpoint from the mc alias so reth can
# fetch archives over HTTP (the manifest's embedded base_url points
# to the cluster-internal address which is unreachable from runners).
MINIO_ENDPOINT=$($MC alias list minio --json 2>/dev/null | jq -r '.URL // empty') || true
if [ -z "$MINIO_ENDPOINT" ]; then
echo "::error::Failed to resolve MinIO endpoint from mc alias 'minio'"
exit 1
fi
BASE_URL="${MINIO_ENDPOINT}/reth-snapshots/reth-1-minimal-stable"
# Rewrite manifest's base_url with the runner-reachable endpoint
MANIFEST_TMP=$(mktemp --suffix=.json)
trap 'rm -f -- "$MANIFEST_TMP"' EXIT
echo "$MANIFEST_CONTENT" \
| jq --arg base "$BASE_URL" '.base_url = $base' > "$MANIFEST_TMP"
# Prepare mount
mountpoint -q "$SCHELK_MOUNT" && sudo schelk recover -y || true
sudo schelk mount -y
sudo rm -rf "$DATADIR"
sudo mkdir -p "$DATADIR"
# reth download runs as current user (not root), needs write access
sudo chown -R "$(id -u):$(id -g)" "$DATADIR"
update_comment() {
local status="$1"
[ -z "${BENCH_COMMENT_ID:-}" ] && return 0
local body
body="$(printf 'cc @%s\n\n🚀 Benchmark started! [View job](%s)\n\n⏳ **Status:** %s\n\n%s' \
"$BENCH_ACTOR" "$BENCH_JOB_URL" "$status" "$BENCH_CONFIG")"
curl -sf -X PATCH \
-H "Authorization: token ${GITHUB_TOKEN}" \
-H "Accept: application/vnd.github.v3+json" \
"https://api.github.com/repos/${BENCH_REPO}/issues/comments/${BENCH_COMMENT_ID}" \
-d "$(jq -nc --arg body "$body" '{body: $body}')" \
> /dev/null 2>&1 || true
}
update_comment "Downloading snapshot…"
# Download using reth download (manifest-path with rewritten base_url)
"$RETH" download \
--manifest-path "$MANIFEST_TMP" \
-y \
--minimal \
--datadir "$DATADIR"
update_comment "Downloading snapshot… done"
echo "Snapshot download complete"
# Sanity check: verify expected directories exist
if [ ! -d "$DATADIR/db" ] || [ ! -d "$DATADIR/static_files" ]; then
echo "::error::Snapshot download did not produce expected directory layout (missing db/ or static_files/)"
ls -la "$DATADIR" || true
exit 1
fi
# Promote the new snapshot to become the schelk baseline (virgin volume).
# This copies changed blocks from scratch → virgin so that future
# `schelk recover` calls restore to this new state.
sync
sudo schelk promote -y
# Save manifest hash
echo "$REMOTE_HASH" > "$HASH_FILE"
echo "Snapshot promoted to schelk baseline (manifest hash: ${REMOTE_HASH:0:16}…)"

View File

@@ -1,590 +0,0 @@
#!/usr/bin/env python3
"""Parse reth-bench CSV output and generate a summary JSON + markdown comparison.
Usage:
bench-reth-summary.py <combined_csv> <gas_csv> \
--output-summary <summary.json> \
--output-markdown <comment.md> \
--baseline-csv <baseline_combined.csv> \
[--repo <owner/repo>] \
[--baseline-ref <sha>] \
[--feature-name <name>] \
[--feature-sha <sha>]
Generates a paired statistical comparison between baseline and feature.
Matches blocks by number and computes per-block diffs to cancel out gas
variance. Fails if baseline or feature CSV is missing or empty.
"""
import argparse
import csv
import json
import math
import random
import sys
GIGAGAS = 1_000_000_000
T_CRITICAL = 1.96 # two-tailed 95% confidence
BOOTSTRAP_ITERATIONS = 10_000
def _opt_int(row: dict, key: str) -> int | None:
"""Return int value for a CSV field, or None if missing/empty."""
v = row.get(key)
if v is None or v == "":
return None
return int(v)
def parse_combined_csv(path: str) -> list[dict]:
"""Parse combined_latency.csv into a list of per-block dicts."""
rows = []
with open(path) as f:
reader = csv.DictReader(f)
for row in reader:
rows.append(
{
"block_number": int(row["block_number"]),
"gas_used": int(row["gas_used"]),
"gas_limit": int(row["gas_limit"]),
"transaction_count": int(row["transaction_count"]),
"new_payload_latency_us": int(row["new_payload_latency"]),
"fcu_latency_us": int(row["fcu_latency"]),
"total_latency_us": int(row["total_latency"]),
"persistence_wait_us": _opt_int(row, "persistence_wait"),
"execution_cache_wait_us": _opt_int(row, "execution_cache_wait"),
"sparse_trie_wait_us": _opt_int(row, "sparse_trie_wait"),
}
)
return rows
def parse_gas_csv(path: str) -> list[dict]:
"""Parse total_gas.csv into a list of per-block dicts."""
rows = []
with open(path) as f:
reader = csv.DictReader(f)
for row in reader:
rows.append(
{
"block_number": int(row["block_number"]),
"gas_used": int(row["gas_used"]),
"time_us": int(row["time"]),
}
)
return rows
def stddev(values: list[float], mean: float) -> float:
if len(values) < 2:
return 0.0
return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
def percentile(sorted_vals: list[float], pct: int) -> float:
if not sorted_vals:
return 0.0
idx = int(len(sorted_vals) * pct / 100)
idx = min(idx, len(sorted_vals) - 1)
return sorted_vals[idx]
def compute_stats(combined: list[dict]) -> dict:
"""Compute per-run statistics from parsed CSV data."""
n = len(combined)
if n == 0:
return {}
latencies_ms = [r["new_payload_latency_us"] / 1_000 for r in combined]
sorted_lat = sorted(latencies_ms)
mean_lat = sum(latencies_ms) / n
std_lat = stddev(latencies_ms, mean_lat)
mgas_s_values = []
for r in combined:
lat_s = r["new_payload_latency_us"] / 1_000_000
if lat_s > 0:
mgas_s_values.append(r["gas_used"] / lat_s / 1_000_000)
mean_mgas_s = sum(mgas_s_values) / len(mgas_s_values) if mgas_s_values else 0
total_latencies_ms = [r["total_latency_us"] / 1_000 for r in combined]
wall_clock_s = sum(total_latencies_ms) / 1_000
mean_total_lat_ms = sum(total_latencies_ms) / n
return {
"n": n,
"mean_ms": mean_lat,
"stddev_ms": std_lat,
"p50_ms": percentile(sorted_lat, 50),
"p90_ms": percentile(sorted_lat, 90),
"p99_ms": percentile(sorted_lat, 99),
"mean_mgas_s": mean_mgas_s,
"wall_clock_s": wall_clock_s,
"mean_total_lat_ms": mean_total_lat_ms,
}
def compute_wait_stats(combined: list[dict], field: str) -> dict:
"""Compute mean/p50/p95 for a wait time field (in ms)."""
values_ms = []
for r in combined:
v = r.get(field)
if v is not None:
values_ms.append(v / 1_000)
if not values_ms:
return {}
n = len(values_ms)
mean_val = sum(values_ms) / n
sorted_vals = sorted(values_ms)
return {
"mean_ms": mean_val,
"p50_ms": percentile(sorted_vals, 50),
"p95_ms": percentile(sorted_vals, 95),
}
def _paired_data(
baseline: list[dict], feature: list[dict]
) -> tuple[list[tuple[float, float]], list[float], list[float], list[float]]:
"""Match blocks and return paired latencies and per-block diffs.
Returns:
pairs: list of (baseline_ms, feature_ms) tuples
lat_diffs_ms: list of feature baseline latency diffs in ms
mgas_diffs: list of feature baseline Mgas/s diffs
total_lat_diffs_ms: list of feature baseline total latency diffs in ms
"""
baseline_by_block = {r["block_number"]: r for r in baseline}
feature_by_block = {r["block_number"]: r for r in feature}
common_blocks = sorted(set(baseline_by_block) & set(feature_by_block))
pairs = []
lat_diffs_ms = []
mgas_diffs = []
total_lat_diffs_ms = []
for bn in common_blocks:
b = baseline_by_block[bn]
f = feature_by_block[bn]
b_ms = b["new_payload_latency_us"] / 1_000
f_ms = f["new_payload_latency_us"] / 1_000
pairs.append((b_ms, f_ms))
lat_diffs_ms.append(f_ms - b_ms)
b_lat_s = b["new_payload_latency_us"] / 1_000_000
f_lat_s = f["new_payload_latency_us"] / 1_000_000
if b_lat_s > 0 and f_lat_s > 0:
mgas_diffs.append(
f["gas_used"] / f_lat_s / 1_000_000
- b["gas_used"] / b_lat_s / 1_000_000
)
total_lat_diffs_ms.append(
f["total_latency_us"] / 1_000 - b["total_latency_us"] / 1_000
)
return pairs, lat_diffs_ms, mgas_diffs, total_lat_diffs_ms
def compute_paired_stats(
baseline_runs: list[list[dict]],
feature_runs: list[list[dict]],
) -> dict:
"""Compute paired statistics between baseline and feature runs.
Each pair (baseline_runs[i], feature_runs[i]) produces per-block diffs.
All diffs are pooled for the final CI.
"""
all_pairs = []
all_lat_diffs = []
all_mgas_diffs = []
all_total_lat_diffs = []
blocks_per_pair = []
for baseline, feature in zip(baseline_runs, feature_runs):
pairs, lat_diffs, mgas_diffs, total_lat_diffs = _paired_data(baseline, feature)
all_pairs.extend(pairs)
all_lat_diffs.extend(lat_diffs)
all_mgas_diffs.extend(mgas_diffs)
all_total_lat_diffs.extend(total_lat_diffs)
blocks_per_pair.append(len(pairs))
if not all_lat_diffs:
return {}
n = len(all_lat_diffs)
mean_diff = sum(all_lat_diffs) / n
std_diff = stddev(all_lat_diffs, mean_diff)
se = std_diff / math.sqrt(n) if n > 0 else 0.0
ci = T_CRITICAL * se
# Bootstrap CI on difference-of-percentiles (resample paired blocks)
base_lats = sorted([p[0] for p in all_pairs])
feature_lats = sorted([p[1] for p in all_pairs])
p50_diff = percentile(feature_lats, 50) - percentile(base_lats, 50)
p90_diff = percentile(feature_lats, 90) - percentile(base_lats, 90)
p99_diff = percentile(feature_lats, 99) - percentile(base_lats, 99)
rng = random.Random(42)
p50_boot, p90_boot, p99_boot = [], [], []
for _ in range(BOOTSTRAP_ITERATIONS):
sample = rng.choices(all_pairs, k=n)
b_sorted = sorted(p[0] for p in sample)
f_sorted = sorted(p[1] for p in sample)
p50_boot.append(percentile(f_sorted, 50) - percentile(b_sorted, 50))
p90_boot.append(percentile(f_sorted, 90) - percentile(b_sorted, 90))
p99_boot.append(percentile(f_sorted, 99) - percentile(b_sorted, 99))
p50_boot.sort()
p90_boot.sort()
p99_boot.sort()
lo = int(BOOTSTRAP_ITERATIONS * 0.025)
hi = int(BOOTSTRAP_ITERATIONS * 0.975)
mean_mgas_diff = sum(all_mgas_diffs) / len(all_mgas_diffs) if all_mgas_diffs else 0.0
std_mgas_diff = stddev(all_mgas_diffs, mean_mgas_diff) if len(all_mgas_diffs) > 1 else 0.0
mgas_se = std_mgas_diff / math.sqrt(len(all_mgas_diffs)) if all_mgas_diffs else 0.0
mgas_ci = T_CRITICAL * mgas_se
mean_total_diff = sum(all_total_lat_diffs) / len(all_total_lat_diffs) if all_total_lat_diffs else 0.0
std_total_diff = stddev(all_total_lat_diffs, mean_total_diff) if len(all_total_lat_diffs) > 1 else 0.0
total_se = std_total_diff / math.sqrt(len(all_total_lat_diffs)) if all_total_lat_diffs else 0.0
wall_clock_ci_ms = T_CRITICAL * total_se
return {
"n": n,
"mean_diff_ms": mean_diff,
"ci_ms": ci,
"p50_diff_ms": p50_diff,
"p50_ci_ms": (p50_boot[hi] - p50_boot[lo]) / 2,
"p90_diff_ms": p90_diff,
"p90_ci_ms": (p90_boot[hi] - p90_boot[lo]) / 2,
"p99_diff_ms": p99_diff,
"p99_ci_ms": (p99_boot[hi] - p99_boot[lo]) / 2,
"mean_mgas_diff": mean_mgas_diff,
"mgas_ci": mgas_ci,
"wall_clock_ci_ms": wall_clock_ci_ms,
"blocks": max(blocks_per_pair),
}
def format_duration(seconds: float) -> str:
if seconds >= 60:
return f"{seconds / 60:.1f}min"
return f"{seconds}s"
def format_gas(gas: int) -> str:
if gas >= GIGAGAS:
return f"{gas / GIGAGAS:.1f}G"
if gas >= 1_000_000:
return f"{gas / 1_000_000:.1f}M"
return f"{gas:,}"
def fmt_ms(v: float) -> str:
return f"{v:.2f}ms"
def fmt_mgas(v: float) -> str:
return f"{v:.2f}"
def fmt_s(v: float) -> str:
return f"{v:.2f}s"
def significance(pct: float, ci_pct: float, lower_is_better: bool) -> str:
"""Return significance label: 'good', 'bad', or 'neutral'."""
significant = abs(pct) > ci_pct
if not significant:
return "neutral"
elif (pct < 0) == lower_is_better:
return "good"
else:
return "bad"
def change_str(pct: float, ci_pct: float, lower_is_better: bool) -> str:
"""Format change% with paired CI significance.
Significant if the CI doesn't cross zero (i.e. |pct| > ci_pct).
"""
sig = significance(pct, ci_pct, lower_is_better)
emoji = {"good": "", "bad": "", "neutral": ""}[sig]
return f"{pct:+.2f}% {emoji}{ci_pct:.2f}%)"
def compute_changes(
baseline_stats: dict, feature_stats: dict, paired_stats: dict
) -> dict:
"""Pre-compute change percentages and significance for each metric."""
def pct(base: float, feat: float) -> float:
return (feat - base) / base * 100.0 if base > 0 else 0.0
def ci_pct(ci_ms: float, base_ms: float) -> float:
return ci_ms / base_ms * 100.0 if base_ms > 0 else 0.0
metrics = [
("mean", "mean_ms", "ci_ms", "mean_ms", True),
("p50", "p50_ms", "p50_ci_ms", "p50_ms", True),
("p90", "p90_ms", "p90_ci_ms", "p90_ms", True),
("p99", "p99_ms", "p99_ci_ms", "p99_ms", True),
("mgas_s", "mean_mgas_s", "mgas_ci", "mean_mgas_s", False),
("wall_clock", "wall_clock_s", "wall_clock_ci_ms", "mean_total_lat_ms", True),
]
changes = {}
for name, stat_key, ci_key, base_key, lower_is_better in metrics:
p = pct(baseline_stats[stat_key], feature_stats[stat_key])
c = ci_pct(paired_stats[ci_key], baseline_stats[base_key])
changes[name] = {
"pct": round(p, 4),
"ci_pct": round(c, 4),
"sig": significance(p, c, lower_is_better),
}
return changes
def generate_comparison_table(
run1: dict,
run2: dict,
paired: dict,
repo: str,
baseline_ref: str,
baseline_name: str,
feature_name: str,
feature_sha: str,
big_blocks: bool = False,
) -> str:
"""Generate a markdown comparison table between baseline and feature."""
n = paired["blocks"]
def pct(base: float, feat: float) -> float:
return (feat - base) / base * 100.0 if base > 0 else 0.0
mean_pct = pct(run1["mean_ms"], run2["mean_ms"])
gas_pct = pct(run1["mean_mgas_s"], run2["mean_mgas_s"])
wall_pct = pct(run1["wall_clock_s"], run2["wall_clock_s"])
p50_pct = pct(run1["p50_ms"], run2["p50_ms"])
p90_pct = pct(run1["p90_ms"], run2["p90_ms"])
p99_pct = pct(run1["p99_ms"], run2["p99_ms"])
# Bootstrap CIs as % of baseline percentile
p50_ci_pct = paired["p50_ci_ms"] / run1["p50_ms"] * 100.0 if run1["p50_ms"] > 0 else 0.0
p90_ci_pct = paired["p90_ci_ms"] / run1["p90_ms"] * 100.0 if run1["p90_ms"] > 0 else 0.0
p99_ci_pct = paired["p99_ci_ms"] / run1["p99_ms"] * 100.0 if run1["p99_ms"] > 0 else 0.0
# CI as a percentage of baseline mean
lat_ci_pct = paired["ci_ms"] / run1["mean_ms"] * 100.0 if run1["mean_ms"] > 0 else 0.0
mgas_ci_pct = paired["mgas_ci"] / run1["mean_mgas_s"] * 100.0 if run1["mean_mgas_s"] > 0 else 0.0
wall_ci_pct = paired["wall_clock_ci_ms"] / run1["mean_total_lat_ms"] * 100.0 if run1["mean_total_lat_ms"] > 0 else 0.0
base_url = f"https://github.com/{repo}/commit"
baseline_label = f"[`{baseline_name}`]({base_url}/{baseline_ref})"
feature_label = f"[`{feature_name}`]({base_url}/{feature_sha})"
lines = [
f"| Metric | {baseline_label} | {feature_label} | Change |",
"|--------|------|--------|--------|",
f"| Mean | {fmt_ms(run1['mean_ms'])} | {fmt_ms(run2['mean_ms'])} | {change_str(mean_pct, lat_ci_pct, lower_is_better=True)} |",
f"| StdDev | {fmt_ms(run1['stddev_ms'])} | {fmt_ms(run2['stddev_ms'])} | |",
f"| P50 | {fmt_ms(run1['p50_ms'])} | {fmt_ms(run2['p50_ms'])} | {change_str(p50_pct, p50_ci_pct, lower_is_better=True)} |",
f"| P90 | {fmt_ms(run1['p90_ms'])} | {fmt_ms(run2['p90_ms'])} | {change_str(p90_pct, p90_ci_pct, lower_is_better=True)} |",
f"| P99 | {fmt_ms(run1['p99_ms'])} | {fmt_ms(run2['p99_ms'])} | {change_str(p99_pct, p99_ci_pct, lower_is_better=True)} |",
f"| Mgas/s | {fmt_mgas(run1['mean_mgas_s'])} | {fmt_mgas(run2['mean_mgas_s'])} | {change_str(gas_pct, mgas_ci_pct, lower_is_better=False)} |",
f"| Wall Clock | {fmt_s(run1['wall_clock_s'])} | {fmt_s(run2['wall_clock_s'])} | {change_str(wall_pct, wall_ci_pct, lower_is_better=True)} |",
"",
f"*{n} {'big blocks' if big_blocks else 'blocks'}*",
]
return "\n".join(lines)
def generate_wait_time_table(
title: str,
baseline_stats: dict,
feature_stats: dict,
baseline_label: str,
feature_label: str,
) -> str:
"""Generate a markdown table for a wait time metric."""
if not baseline_stats or not feature_stats:
return ""
lines = [
f"### {title}",
"",
f"| Metric | {baseline_label} | {feature_label} |",
"|--------|------|--------|",
f"| Mean | {fmt_ms(baseline_stats['mean_ms'])} | {fmt_ms(feature_stats['mean_ms'])} |",
f"| P50 | {fmt_ms(baseline_stats['p50_ms'])} | {fmt_ms(feature_stats['p50_ms'])} |",
f"| P95 | {fmt_ms(baseline_stats['p95_ms'])} | {fmt_ms(feature_stats['p95_ms'])} |",
]
return "\n".join(lines)
def generate_markdown(
summary: dict, comparison_table: str,
wait_time_tables: list[str] | None = None,
behind_baseline: int = 0, repo: str = "", baseline_ref: str = "", baseline_name: str = "",
grafana_url: str | None = None,
) -> str:
"""Generate a markdown comment body."""
lines = ["## Benchmark Results", ""]
if behind_baseline > 0:
s = "s" if behind_baseline > 1 else ""
diff_link = f"https://github.com/{repo}/compare/{baseline_ref[:12]}...{baseline_name}"
lines.append(f"> ⚠️ Feature is [**{behind_baseline} commit{s} behind `{baseline_name}`**]({diff_link}). Consider rebasing for accurate results.")
lines.append("")
lines.append(comparison_table)
if wait_time_tables:
lines.append("")
lines.append("<details>")
lines.append("<summary>Wait Time Breakdown</summary>")
lines.append("")
for table in wait_time_tables:
if table:
lines.append(table)
lines.append("")
lines.append("</details>")
if grafana_url:
lines.append("")
lines.append(f"**[Grafana Dashboard]({grafana_url})**")
return "\n".join(lines)
def main():
parser = argparse.ArgumentParser(description="Parse reth-bench ABBA results")
parser.add_argument(
"--baseline-csv", nargs="+", required=True,
help="Baseline combined_latency.csv files (A1, A2)",
)
parser.add_argument(
"--feature-csv", "--branch-csv", nargs="+", required=True,
help="Feature combined_latency.csv files (B1, B2)",
)
parser.add_argument("--gas-csv", required=True, help="Path to total_gas.csv")
parser.add_argument(
"--output-summary", required=True, help="Output JSON summary path"
)
parser.add_argument("--output-markdown", required=True, help="Output markdown path")
parser.add_argument(
"--repo", default="paradigmxyz/reth", help="GitHub repo (owner/name)"
)
parser.add_argument("--baseline-ref", default=None, help="Baseline commit SHA")
parser.add_argument("--baseline-name", default=None, help="Baseline display name")
parser.add_argument("--feature-name", "--branch-name", default=None, help="Feature branch name")
parser.add_argument("--feature-ref", "--branch-sha", "--feature-sha", default=None, help="Feature commit SHA")
parser.add_argument("--behind-baseline", "--behind-main", type=int, default=0, help="Commits behind baseline")
parser.add_argument("--big-blocks", action="store_true", default=False, help="Big blocks mode")
parser.add_argument("--grafana-url", default=None, help="Grafana dashboard URL for this benchmark run")
args = parser.parse_args()
if len(args.baseline_csv) != len(args.feature_csv):
print("Must provide equal number of baseline and feature CSVs", file=sys.stderr)
sys.exit(1)
baseline_runs = []
feature_runs = []
for path in args.baseline_csv:
data = parse_combined_csv(path)
if not data:
print(f"No results in {path}", file=sys.stderr)
sys.exit(1)
baseline_runs.append(data)
for path in args.feature_csv:
data = parse_combined_csv(path)
if not data:
print(f"No results in {path}", file=sys.stderr)
sys.exit(1)
feature_runs.append(data)
gas = parse_gas_csv(args.gas_csv)
all_baseline = [r for run in baseline_runs for r in run]
all_feature = [r for run in feature_runs for r in run]
baseline_stats = compute_stats(all_baseline)
feature_stats = compute_stats(all_feature)
paired_stats = compute_paired_stats(baseline_runs, feature_runs)
if not paired_stats:
print("No common blocks between baseline and feature runs", file=sys.stderr)
sys.exit(1)
baseline_ref = args.baseline_ref or "main"
baseline_name = args.baseline_name or "baseline"
feature_name = args.feature_name or "feature"
feature_sha = args.feature_ref or "unknown"
comparison_table = generate_comparison_table(
baseline_stats,
feature_stats,
paired_stats,
repo=args.repo,
baseline_ref=baseline_ref,
baseline_name=baseline_name,
feature_name=feature_name,
feature_sha=feature_sha,
big_blocks=args.big_blocks,
)
print(f"Generated comparison ({paired_stats['n']} paired blocks, "
f"mean diff {paired_stats['mean_diff_ms']:+.3f}ms ± {paired_stats['ci_ms']:.3f}ms)")
base_url = f"https://github.com/{args.repo}/commit"
baseline_label = f"[`{baseline_name}`]({base_url}/{baseline_ref})"
feature_label = f"[`{feature_name}`]({base_url}/{feature_sha})"
wait_fields = [
("persistence_wait_us", "Persistence Wait"),
("sparse_trie_wait_us", "Trie Cache Update Wait"),
("execution_cache_wait_us", "Execution Cache Update Wait"),
]
wait_time_tables = []
wait_time_data = {}
for field, title in wait_fields:
b_stats = compute_wait_stats(all_baseline, field)
f_stats = compute_wait_stats(all_feature, field)
if b_stats and f_stats:
wait_time_data[field] = {
"title": title,
"baseline": b_stats,
"feature": f_stats,
}
table = generate_wait_time_table(title, b_stats, f_stats, baseline_label, feature_label)
if table:
wait_time_tables.append(table)
summary = {
"blocks": paired_stats["blocks"],
"big_blocks": args.big_blocks,
"baseline": {
"name": baseline_name,
"ref": baseline_ref,
"stats": baseline_stats,
},
"feature": {
"name": feature_name,
"ref": feature_sha,
"stats": feature_stats,
},
"paired": paired_stats,
"changes": compute_changes(baseline_stats, feature_stats, paired_stats),
"wait_times": wait_time_data,
}
with open(args.output_summary, "w") as f:
json.dump(summary, f, indent=2)
print(f"Summary written to {args.output_summary}")
markdown = generate_markdown(
summary, comparison_table,
wait_time_tables=wait_time_tables,
behind_baseline=args.behind_baseline,
repo=args.repo,
baseline_ref=baseline_ref,
baseline_name=baseline_name,
grafana_url=args.grafana_url,
)
with open(args.output_markdown, "w") as f:
f.write(markdown)
print(f"Markdown written to {args.output_markdown}")
if __name__ == "__main__":
main()

View File

@@ -1,132 +0,0 @@
#!/usr/bin/env bash
#
# Resolves baseline and feature refs for nightly regression benchmark runs.
#
# Queries the latest successful scheduled docker.yml run via GitHub API
# to find the commit that built the nightly Docker image. Compares with
# the last successful feature ref (from GH Actions cache) to determine
# baseline, detect staleness, and decide whether to skip.
#
# Usage: bench-nightly-refs.sh [--force]
#
# Outputs (via GITHUB_OUTPUT):
# baseline-ref — commit SHA for baseline
# feature-ref — commit SHA for feature (current nightly)
# should-skip — "true" if no new nightly since last run
# is-stale — "true" if latest nightly build is >24h old
# stale-age-hours — age of the nightly build in hours (only if stale)
# nightly-created — ISO timestamp of the nightly build
#
# Reads:
# .nightly-state/last-feature-ref (from GH Actions cache, may not exist)
#
# Requires: gh (GitHub CLI), jq, date
set -euo pipefail
FORCE="${1:-false}"
REPO="${GITHUB_REPOSITORY:-paradigmxyz/reth}"
# --- Step 1: Query latest successful scheduled docker.yml run ---
echo "::group::Querying latest nightly docker build"
RUNS_JSON=$(gh run list \
-R "$REPO" \
--workflow=docker.yml \
--event=schedule \
--status=completed \
--limit 5 \
--json headSha,createdAt,conclusion)
# Find the most recent successful run
LATEST=$(echo "$RUNS_JSON" | jq -r '[.[] | select(.conclusion == "success")] | first // empty')
if [ -z "$LATEST" ]; then
echo "::error::No successful scheduled docker.yml run found in the last 5 runs"
echo "Runs found: $RUNS_JSON"
exit 1
fi
FEATURE_REF=$(echo "$LATEST" | jq -r '.headSha')
CREATED_AT=$(echo "$LATEST" | jq -r '.createdAt')
echo "Latest nightly commit: $FEATURE_REF"
echo "Built at: $CREATED_AT"
echo "::endgroup::"
# --- Step 2: Staleness check ---
echo "::group::Checking staleness"
NOW_EPOCH=$(date +%s)
# Handle both GNU date (-d) and BSD date (-j -f) for cross-platform compat
CREATED_EPOCH=$(date -d "$CREATED_AT" +%s 2>/dev/null || \
date -j -f "%Y-%m-%dT%H:%M:%SZ" "$CREATED_AT" +%s 2>/dev/null || \
date -j -f "%Y-%m-%dT%T%z" "$CREATED_AT" +%s 2>/dev/null || \
{ echo "::error::Cannot parse date: $CREATED_AT"; exit 1; })
AGE_SECONDS=$(( NOW_EPOCH - CREATED_EPOCH ))
AGE_HOURS=$(( AGE_SECONDS / 3600 ))
IS_STALE="false"
if [ "$AGE_HOURS" -gt 24 ]; then
IS_STALE="true"
echo "::warning::STALE NIGHTLY: Build is ${AGE_HOURS}h old (>24h threshold)"
echo "This indicates the nightly docker build failed — no new image was produced"
else
echo "Nightly build age: ${AGE_HOURS}h (within 24h threshold)"
fi
echo "::endgroup::"
# --- Step 3: Read last successful feature ref from cache ---
echo "::group::Reading cached state"
LAST_FEATURE_REF=""
STATE_FILE=".nightly-state/last-feature-ref"
if [ -f "$STATE_FILE" ]; then
LAST_FEATURE_REF=$(tr -d '[:space:]' < "$STATE_FILE")
echo "Previous feature ref: $LAST_FEATURE_REF"
else
echo "No cached state found (first run)"
fi
echo "::endgroup::"
# --- Step 4: Determine baseline and skip logic ---
echo "::group::Resolving refs"
SHOULD_SKIP="false"
BASELINE_REF="$FEATURE_REF" # default for first run
if [ "$IS_STALE" = "true" ]; then
# Stale = error path, don't skip (will alert and fail downstream)
SHOULD_SKIP="false"
BASELINE_REF="${LAST_FEATURE_REF:-$FEATURE_REF}"
echo "Stale nightly detected — will alert and fail"
elif [ -z "$LAST_FEATURE_REF" ]; then
# First run: baseline = feature (self-comparison to establish baseline)
BASELINE_REF="$FEATURE_REF"
echo "First run — will benchmark nightly against itself to establish baseline"
elif [ "$LAST_FEATURE_REF" = "$FEATURE_REF" ]; then
# No new nightly since last successful run
if [ "$FORCE" = "true" ] || [ "$FORCE" = "--force" ]; then
echo "No new nightly, but force=true — running anyway"
BASELINE_REF="$LAST_FEATURE_REF"
else
SHOULD_SKIP="true"
echo "No new nightly since last run — will skip"
fi
else
# Normal case: new nightly available
BASELINE_REF="$LAST_FEATURE_REF"
echo "New nightly detected"
fi
echo "Baseline: $BASELINE_REF"
echo "Feature: $FEATURE_REF"
echo "Skip: $SHOULD_SKIP"
echo "Stale: $IS_STALE"
echo "::endgroup::"
# --- Step 5: Write outputs ---
{
echo "baseline-ref=$BASELINE_REF"
echo "feature-ref=$FEATURE_REF"
echo "should-skip=$SHOULD_SKIP"
echo "is-stale=$IS_STALE"
echo "stale-age-hours=$AGE_HOURS"
echo "nightly-created=$CREATED_AT"
} >> "$GITHUB_OUTPUT"

View File

@@ -1,308 +0,0 @@
// Sends Slack notifications for reth-bench results.
//
// Reads from environment:
// SLACK_BENCH_BOT_TOKEN Slack Bot User OAuth Token (xoxb-...)
// SLACK_BENCH_CHANNEL Public channel ID for significant improvements
// BENCH_WORK_DIR Directory containing summary.json
// BENCH_PR PR number (may be empty)
// BENCH_ACTOR GitHub user who triggered the bench
// BENCH_JOB_URL URL to the Actions job page
// BENCH_BASELINE_ARGS Extra CLI args for the baseline reth node
// BENCH_FEATURE_ARGS Extra CLI args for the feature reth node
// BENCH_SAMPLY 'true' if samply profiling was enabled
//
// Usage from actions/github-script:
// const notify = require('./.github/scripts/bench-slack-notify.js');
// await notify.success({ core, context });
// await notify.failure({ core, context, failedStep: '...' });
const fs = require('fs');
const path = require('path');
const { fmtChange, fmtMs, verdict, loadSamplyUrls, blocksLabel, metricRows, waitTimeRows } = require('./bench-utils');
const SLACK_API = 'https://slack.com/api/chat.postMessage';
function loadSlackUsers(repoRoot) {
try {
const raw = fs.readFileSync(path.join(repoRoot, '.github', 'scripts', 'bench-slack-users.json'), 'utf8');
const data = JSON.parse(raw);
// Filter out non-user-ID entries (like _comment)
const users = {};
for (const [k, v] of Object.entries(data)) {
if (!k.startsWith('_') && typeof v === 'string' && v.startsWith('U')) {
users[k] = v;
}
}
return users;
} catch {
return {};
}
}
async function postToSlack(token, channel, blocks, text, core, threadTs) {
const payload = { channel, blocks, text, unfurl_links: false };
if (threadTs) payload.thread_ts = threadTs;
const resp = await fetch(SLACK_API, {
method: 'POST',
headers: {
'Authorization': `Bearer ${token}`,
'Content-Type': 'application/json',
},
body: JSON.stringify(payload),
});
const data = await resp.json();
if (!data.ok) {
core.warning(`Slack API error (channel ${channel}): ${JSON.stringify(data)}`);
}
return data;
}
function cell(text) {
const s = String(text);
return { type: 'raw_text', text: s || ' ' };
}
// Slack shortcodes for verdict (Block Kit header doesn't support unicode emoji)
const SLACK_VERDICT = {
'⚠️': ':warning:',
'❌': ':x:',
'✅': ':white_check_mark:',
'⚪': ':white_circle:',
};
function buildSuccessBlocks({ summary, prNumber, actor, actorSlackId, jobUrl, repo, samplyUrls }) {
const { emoji, label } = verdict(summary.changes);
const headerEmoji = SLACK_VERDICT[emoji] || emoji;
const prUrl = prNumber ? `https://github.com/${repo}/pull/${prNumber}` : '';
const commitUrl = `https://github.com/${repo}/commit`;
const baselineLink = `<${commitUrl}/${summary.baseline.ref}|${summary.baseline.name}>`;
const featureLink = `<${commitUrl}/${summary.feature.ref}|${summary.feature.name}>`;
// Meta line
const metaParts = [];
if (prNumber) metaParts.push(`*<${prUrl}|PR #${prNumber}>*`);
metaParts.push(`triggered by ${actorSlackId ? `<@${actorSlackId}>` : `@${actor}`}`);
// Baseline/feature lines with samply profile links
let baselineLine = `*Baseline:* ${baselineLink}`;
const bl1 = samplyUrls['baseline-1'];
const bl2 = samplyUrls['baseline-2'];
if (bl1) baselineLine += ` | <${bl1}|Samply 1>`;
if (bl2) baselineLine += ` | <${bl2}|Samply 2>`;
let featureLine = `*Feature:* ${featureLink}`;
const fl1 = samplyUrls['feature-1'];
const fl2 = samplyUrls['feature-2'];
if (fl1) featureLine += ` | <${fl1}|Samply 1>`;
if (fl2) featureLine += ` | <${fl2}|Samply 2>`;
const countsLine = blocksLabel(summary).map(p => `*${p.key}:* ${p.value}`).join(' | ');
const baselineArgs = process.env.BENCH_BASELINE_ARGS || '';
const featureArgs = process.env.BENCH_FEATURE_ARGS || '';
const argsLines = [];
if (baselineArgs) argsLines.push(`*Baseline Args:* \`${baselineArgs}\``);
if (featureArgs) argsLines.push(`*Feature Args:* \`${featureArgs}\``);
const sectionText = [metaParts.join(' | '), '', baselineLine, featureLine, ...argsLines, countsLine].join('\n');
// Action buttons
const diffUrl = `https://github.com/${repo}/compare/${summary.baseline.ref}...${summary.feature.ref}`;
const buttons = [
{
type: 'button',
text: { type: 'plain_text', text: 'CI :github:', emoji: true },
url: jobUrl,
action_id: 'ci_button',
},
{
type: 'button',
text: { type: 'plain_text', text: 'Diff :github:', emoji: true },
url: diffUrl,
action_id: 'diff_button',
},
];
// Build table rows from shared metricRows
const rows = metricRows(summary);
const tableRows = [
[cell('Metric'), cell('Baseline'), cell('Feature'), cell('Change')],
...rows.map(r => [cell(r.label), cell(r.baseline), cell(r.feature), cell(r.change || ' ')]),
];
const blocks = [
{
type: 'header',
text: { type: 'plain_text', text: `${headerEmoji} ${label}`, emoji: true },
},
{
type: 'section',
text: { type: 'mrkdwn', text: sectionText },
},
{
type: 'table',
column_settings: [
{ align: 'left' },
{ align: 'right' },
{ align: 'right' },
{ align: 'right' },
],
rows: tableRows,
},
{
type: 'actions',
elements: buttons,
},
];
// Wait times as a separate table block (sent as threaded reply due to Slack one-table limit)
const threadBlocks = [];
const wtRows = waitTimeRows(summary);
if (wtRows.length > 0) {
const waitTableRows = [
[cell('Wait Time'), cell('Baseline'), cell('Feature')],
...wtRows.map(r => [cell(r.title), cell(r.baseline), cell(r.feature)]),
];
threadBlocks.push({
type: 'table',
column_settings: [
{ align: 'left' },
{ align: 'right' },
{ align: 'right' },
],
rows: waitTableRows,
});
}
return { blocks, threadBlocks };
}
function buildFailureBlocks({ prNumber, actor, actorSlackId, jobUrl, repo, failedStep }) {
const prUrl = prNumber ? `https://github.com/${repo}/pull/${prNumber}` : '';
const actorMention = actorSlackId ? `<@${actorSlackId}>` : `@${actor}`;
const parts = [
prNumber ? `*<${prUrl}|PR #${prNumber}>*` : '',
`by ${actorMention}`,
`failed while *${failedStep}*`,
].filter(Boolean);
const buttons = [
{
type: 'button',
text: { type: 'plain_text', text: 'CI :github:', emoji: true },
url: jobUrl,
action_id: 'ci_button',
},
];
return [
{
type: 'header',
text: { type: 'plain_text', text: ':rotating_light: Bench Failed', emoji: true },
},
{
type: 'section',
text: { type: 'mrkdwn', text: parts.join(' | ') },
},
{
type: 'actions',
elements: buttons,
},
];
}
async function success({ core, context }) {
const token = process.env.SLACK_BENCH_BOT_TOKEN;
if (!token) {
core.info('SLACK_BENCH_BOT_TOKEN not set, skipping Slack notification');
return;
}
let summary;
try {
summary = JSON.parse(fs.readFileSync(process.env.BENCH_WORK_DIR + '/summary.json', 'utf8'));
} catch (e) {
core.warning('Could not read summary.json for Slack notification');
return;
}
const repo = `${context.repo.owner}/${context.repo.repo}`;
const prNumber = process.env.BENCH_PR;
const actor = process.env.BENCH_ACTOR;
const jobUrl = process.env.BENCH_JOB_URL ||
`${context.serverUrl}/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}`;
const samplyUrls = loadSamplyUrls(process.env.BENCH_WORK_DIR);
const slackUsers = loadSlackUsers(process.env.GITHUB_WORKSPACE || '.');
const actorSlackId = slackUsers[actor];
const { blocks, threadBlocks } = buildSuccessBlocks({ summary, prNumber, actor, actorSlackId, jobUrl, repo, samplyUrls });
const text = `Bench: ${summary.baseline.name} vs ${summary.feature.name}`;
async function sendWithThread(ch) {
const res = await postToSlack(token, ch, blocks, text, core);
if (res.ok && res.ts && threadBlocks.length > 0) {
for (const tb of threadBlocks) {
await postToSlack(token, ch, [tb], 'Wait time breakdown', core, res.ts);
}
}
}
// Post to public channel if any metric shows significant improvement or regression
const channel = process.env.SLACK_BENCH_CHANNEL;
let postedToChannel = false;
if (channel) {
const changes = summary.changes || {};
const hasImprovement = Object.values(changes).some(c => c.sig === 'good');
if (hasImprovement) {
await sendWithThread(channel);
postedToChannel = true;
} else {
core.info('No significant improvement, skipping public channel notification');
}
}
// DM the actor only when results were not posted to the public channel
if (!postedToChannel) {
if (actorSlackId) {
await sendWithThread(actorSlackId);
} else {
core.info(`No Slack user mapping for GitHub user '${actor}', skipping DM`);
}
} else {
core.info(`Results posted to channel, skipping DM to ${actor}`);
}
}
async function failure({ core, context, failedStep }) {
const token = process.env.SLACK_BENCH_BOT_TOKEN;
if (!token) {
core.info('SLACK_BENCH_BOT_TOKEN not set, skipping Slack notification');
return;
}
const repo = `${context.repo.owner}/${context.repo.repo}`;
const prNumber = process.env.BENCH_PR;
const actor = process.env.BENCH_ACTOR;
const jobUrl = process.env.BENCH_JOB_URL ||
`${context.serverUrl}/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}`;
const slackUsers = loadSlackUsers(process.env.GITHUB_WORKSPACE || '.');
const actorSlackId = slackUsers[actor];
const blocks = buildFailureBlocks({ prNumber, actor, actorSlackId, jobUrl, repo, failedStep });
const text = `Bench failed while ${failedStep}`;
// Always DM the actor
if (actorSlackId) {
await postToSlack(token, actorSlackId, blocks, text, core);
} else {
core.info(`No Slack user mapping for GitHub user '${actor}', skipping DM`);
}
// Only DM for failures, don't post to public channel
}
module.exports = { success, failure };

View File

@@ -1,24 +0,0 @@
{
"_comment": "Maps GitHub usernames to Slack user IDs. Find yours: Slack profile > ··· > Copy member ID.",
"shekhirin": "U09FAL2UMLJ",
"mattsse": "U09FQNPMRT3",
"klkvr": "U09FAK95FC2",
"joshieDo": "U09LHN6GYAU",
"mediocregopher": "U09FF75KMQU",
"yongkangc": "U09FB0ECTD4",
"gakonst": "U092SEPDM40",
"Rjected": "U09F6SCKRGT",
"DaniPopes": "U09FAT8EK2A",
"emmajam": "U0A34UN92HW",
"onbjerg": "U09FB0UK5AA",
"fgimenez": "U09G3GP7CSU",
"rakita": "U09FB3Z2M7Y",
"jxom": "U09F72MG083",
"tmm": "U0AD0U8E88N",
"pepyakin": "U0A7HKMGEHJ",
"grandizzy": "U09F8DBDDRT",
"SuperFluffy": "U095BKHB2Q4",
"kamsz": "U0A2563UBRD",
"zerosnacks": "U09FARPMN74",
"samczsun": "U096R14E4H3"
}

View File

@@ -1,27 +0,0 @@
// Updates the reth-bench PR comment with current status.
//
// Reads from environment:
// BENCH_COMMENT_ID GitHub comment ID to update
// BENCH_JOB_URL URL to the Actions job page
// BENCH_CONFIG Config line (blocks, warmup, refs)
// BENCH_ACTOR User who triggered the benchmark
//
// Usage from actions/github-script:
// const s = require('./.github/scripts/bench-update-status.js');
// await s({github, context, status: 'Building baseline binary...'});
function buildBody(status) {
return `cc @${process.env.BENCH_ACTOR}\n\n🚀 Benchmark started! [View job](${process.env.BENCH_JOB_URL})\n\n⏳ **Status:** ${status}\n\n${process.env.BENCH_CONFIG}`;
}
async function updateStatus({ github, context, status }) {
await github.rest.issues.updateComment({
owner: context.repo.owner,
repo: context.repo.repo,
comment_id: parseInt(process.env.BENCH_COMMENT_ID),
body: buildBody(status),
});
}
updateStatus.buildBody = buildBody;
module.exports = updateStatus;

View File

@@ -1,95 +0,0 @@
// Shared utilities for reth-bench result rendering.
//
// Used by bench-job-summary.js and bench-slack-notify.js.
const fs = require('fs');
const path = require('path');
const SIG_EMOJI = { good: '✅', bad: '❌', neutral: '⚪' };
function fmtMs(v) { return v.toFixed(2) + 'ms'; }
function fmtMgas(v) { return v.toFixed(2); }
function fmtS(v) { return v.toFixed(2) + 's'; }
function fmtChange(ch) {
if (!ch || (!ch.pct && !ch.ci_pct)) return '';
const pctStr = `${ch.pct >= 0 ? '+' : ''}${ch.pct.toFixed(2)}%`;
const ciStr = ch.ci_pct ? `${ch.ci_pct.toFixed(2)}%)` : '';
return `${pctStr}${ciStr} ${SIG_EMOJI[ch.sig]}`;
}
function verdict(changes) {
const vals = Object.values(changes);
const hasBad = vals.some(v => v.sig === 'bad');
const hasGood = vals.some(v => v.sig === 'good');
if (hasBad && hasGood) return { emoji: '⚠️', label: 'Mixed Results' };
if (hasBad) return { emoji: '❌', label: 'Regression' };
if (hasGood) return { emoji: '✅', label: 'Improvement' };
return { emoji: '⚪', label: 'No Difference' };
}
function loadSamplyUrls(workDir) {
const urls = {};
for (const run of ['baseline-1', 'baseline-2', 'feature-1', 'feature-2']) {
try {
const url = fs.readFileSync(path.join(workDir, run, 'samply-profile-url.txt'), 'utf8').trim();
if (url) urls[run] = url;
} catch {}
}
return urls;
}
function blocksLabel(summary) {
const parts = [];
if (summary.big_blocks) {
parts.push({ key: 'Big Blocks', value: summary.blocks });
} else {
const warmup = summary.warmup_blocks || process.env.BENCH_WARMUP_BLOCKS || '';
if (warmup) parts.push({ key: 'Warmup', value: warmup });
parts.push({ key: 'Blocks', value: summary.blocks });
}
const cores = process.env.BENCH_CORES || '0';
if (cores !== '0') parts.push({ key: 'Cores', value: cores });
return parts;
}
// The 7 metric rows shared by all renderers.
// Returns an array of { label, baseline, feature, change } objects.
function metricRows(summary) {
const b = summary.baseline.stats;
const f = summary.feature.stats;
const c = summary.changes;
return [
{ label: 'Mean', baseline: fmtMs(b.mean_ms), feature: fmtMs(f.mean_ms), change: fmtChange(c.mean) },
{ label: 'StdDev', baseline: fmtMs(b.stddev_ms), feature: fmtMs(f.stddev_ms), change: '' },
{ label: 'P50', baseline: fmtMs(b.p50_ms), feature: fmtMs(f.p50_ms), change: fmtChange(c.p50) },
{ label: 'P90', baseline: fmtMs(b.p90_ms), feature: fmtMs(f.p90_ms), change: fmtChange(c.p90) },
{ label: 'P99', baseline: fmtMs(b.p99_ms), feature: fmtMs(f.p99_ms), change: fmtChange(c.p99) },
{ label: 'Mgas/s', baseline: fmtMgas(b.mean_mgas_s), feature: fmtMgas(f.mean_mgas_s), change: fmtChange(c.mgas_s) },
{ label: 'Wall Clock', baseline: fmtS(b.wall_clock_s), feature: fmtS(f.wall_clock_s), change: fmtChange(c.wall_clock) },
];
}
// Wait time rows: one row per metric showing mean values.
function waitTimeRows(summary) {
const waitTimes = summary.wait_times || {};
const rows = [];
for (const key of Object.keys(waitTimes)) {
const wt = waitTimes[key];
rows.push({ title: wt.title, baseline: fmtMs(wt.baseline.mean_ms), feature: fmtMs(wt.feature.mean_ms) });
}
return rows;
}
module.exports = {
SIG_EMOJI,
fmtMs,
fmtMgas,
fmtS,
fmtChange,
verdict,
loadSamplyUrls,
blocksLabel,
metricRows,
waitTimeRows,
};

Some files were not shown because too many files have changed in this diff Show More