This PR moves kzg commitments to bid. The rationale behind is captured
in this [issue](https://github.com/ethereum/consensus-specs/issues/4870)
* Moves blob KZG commitments to the earlier point where builder intent
is known
* Removes duplicated commitments from data column sidecars which saves
descent b/w per slot
* Enables nodes to start fetching blobs via getBlobs as soon as the bid
is received
* Slightly increases bid size and may add minor bidding channel latency
but the tradeoff favors lower network load and simpler DA handling
**What type of PR is this?**
Other
**What does this PR do? Why is it needed?**
Follow up to https://github.com/OffchainLabs/prysm/pull/16215 this pr
improves logging, fixes stuttering in package naming, adds additional
unit tests, and deduplicates fallback node code.
**Which issues(s) does this PR fix?**
fixes a potential race if reconnecting to the same host very quickly
which has a stale connection still.
**Other notes for review**
**Acknowledgements**
- [x] I have read
[CONTRIBUTING.md](https://github.com/prysmaticlabs/prysm/blob/develop/CONTRIBUTING.md).
- [x] I have included a uniquely named [changelog fragment
file](https://github.com/prysmaticlabs/prysm/blob/develop/CONTRIBUTING.md#maintaining-changelogmd).
- [x] I have added a description with sufficient context for reviewers
to understand this PR.
- [x] I have tested that my changes work as expected and I added a
testing plan to the PR description (if applicable).
<!-- Thanks for sending a PR! Before submitting:
1. If this is your first PR, check out our contribution guide here
https://docs.prylabs.network/docs/contribute/contribution-guidelines
You will then need to sign our Contributor License Agreement (CLA),
which will show up as a comment from a bot in this pull request after
you open it. We cannot review code without a signed CLA.
2. Please file an associated tracking issue if this pull request is
non-trivial and requires context for our team to understand. All
features and most bug fixes should have
an associated issue with a design discussed and decided upon. Small bug
fixes and documentation improvements don't need issues.
3. New features and bug fixes must have tests. Documentation may need to
be updated. If you're unsure what to update, send the PR, and we'll
discuss
in review.
4. Note that PRs updating dependencies and new Go versions are not
accepted.
Please file an issue instead.
5. A changelog entry is required for user facing issues.
-->
**What type of PR is this?**
## Summary
This PR implements gRPC fallback support for the validator client,
allowing it to automatically switch between multiple beacon node
endpoints when the primary node becomes unavailable or unhealthy.
## Changes
- Added `grpcConnectionProvider` to manage multiple gRPC connections
with circular failover
- Validator automatically detects unhealthy beacon nodes and switches to
the next available endpoint
- Health checks verify both node responsiveness AND sync status before
accepting a node
- Improved logging to only show "Found fully synced beacon node" when an
actual switch occurs (reduces log noise)
I removed the old middleware that uses gRPC's built in load balancer
because:
- gRPC's pick_first load balancer doesn't provide sync-status-aware
failover
- The validator needs to ensure it connects to a fully synced node, not
just a reachable one
## Test Scenario
### Setup
Deployed a 4-node Kurtosis testnet with local validator connecting to 2
beacon nodes:
```yaml
# kurtosis-grpc-fallback-test.yaml
participants:
- el_type: nethermind
cl_type: prysm
validator_count: 128 # Keeps chain advancing
- el_type: nethermind
cl_type: prysm
validator_count: 64
- el_type: nethermind
cl_type: prysm
validator_count: 64 # Keeps chain advancing
- el_type: nethermind
cl_type: prysm
validator_count: 64 # Keeps chain advancing
network_params:
fulu_fork_epoch: 0
seconds_per_slot: 6
```
Local validator started with:
```bash
./validator --beacon-rpc-provider=127.0.0.1:33005,127.0.0.1:33012 ...
```
### Test 1: Primary Failover (cl-1 → cl-2)
1. Stopped cl-1 beacon node
2. Validator detected failure and switched to cl-2
**Logs:**
```
WARN Beacon node is not responding, switching host currentHost=127.0.0.1:33005 nextHost=127.0.0.1:33012
DEBUG Trying gRPC endpoint newHost=127.0.0.1:33012 previousHost=127.0.0.1:33005
INFO Failover succeeded: connected to healthy beacon node failedAttempts=[127.0.0.1:33005] newHost=127.0.0.1:33012 previousHost=127.0.0.1:33005
```
**Result:** ✅ PASSED - Validator continued submitting attestations on
cl-2
### Test 2: Circular Failover (cl-2 → cl-1)
1. Restarted cl-1, stopped cl-2
2. Validator detected failure and switched back to cl-1
**Logs:**
```
WARN Beacon node is not responding, switching host currentHost=127.0.0.1:33012 nextHost=127.0.0.1:33005
DEBUG Trying gRPC endpoint newHost=127.0.0.1:33005 previousHost=127.0.0.1:33012
INFO Failover succeeded: connected to healthy beacon node failedAttempts=[127.0.0.1:33012] newHost=127.0.0.1:33005 previousHost=127.0.0.1:33012
```
**Result:** ✅ PASSED - Circular fallback works correctly
## Key Log Messages
| Log Level | Message | Source |
|-----------|---------|--------|
| WARN | "Beacon node is not responding, switching host" |
`changeHost()` in validator.go |
| INFO | "Switched gRPC endpoint" | `SetHost()` in
grpc_connection_provider.go |
| INFO | "Found fully synced beacon node" | `FindHealthyHost()` in
validator.go (only on actual switch) |
## Test Plan
- [x] Verify primary failover (cl-1 → cl-2)
- [x] Verify circular failover (cl-2 → cl-1)
- [x] Verify validator continues producing attestations after switch
- [x] Verify "Found fully synced beacon node" only logs on actual switch
(not every health check)
**What does this PR do? Why is it needed?**
**Which issues(s) does this PR fix?**
Fixes # https://github.com/OffchainLabs/prysm/pull/7133
**Other notes for review**
**Acknowledgements**
- [x] I have read
[CONTRIBUTING.md](https://github.com/prysmaticlabs/prysm/blob/develop/CONTRIBUTING.md).
- [x] I have included a uniquely named [changelog fragment
file](https://github.com/prysmaticlabs/prysm/blob/develop/CONTRIBUTING.md#maintaining-changelogmd).
- [x] I have added a description with sufficient context for reviewers
to understand this PR.
- [x] I have tested that my changes work as expected and I added a
testing plan to the PR description (if applicable).
---------
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com>
#### This PR sets the foundation for the new logging features.
---
The goal of this big PR is the following:
1. Adding a log.go file to every package:
[_commit_](54f6396d4c)
- Writing a bash script that adds the log.go file to every package that
imports logrus, except the excluded packages, configured at the top of
the bash script.
- the log.go file creates a log variable and sets a field called
`package` to the full path of that package.
- I have tried to fix every error/problem that came from mass generation
of this file. (duplicate declarations, different prefix names, etc...)
- some packages had the log.go file from before, and had some helper
functions in there as well. I've moved all of them to a `log_helpers.go`
file within each package.
2. Create a CI rule which verifies that:
[_commit_](b799c3a0ef)
- every package which imports logrus, also has a log.go file, except the
excluded packages.
- the `package` field of each log.go variable, has the correct path. (to
detect when we move a package or change it's name)
- I pushed a commit with a manually changed log.go file to trigger the
ci check failure and it worked.
3. Alter the logging system to read the prefix from this `package` field
for every log while outputing:
[_commit_](b0c7f1146c)
- some packages have/want/need a different log prefix than their package
name (like `kv`). This can be solved by keeping a map of package paths
to prefix names somewhere.
---
**Some notes:**
- Please review everything carefully.
- I created the `prefixReplacement` map and populated the data that I
deemed necessary. Please check it and complain if something doesn't make
sense or is missing. I attached at the bottom, the list of all the
packages that used to use a different name than their package name as
their prefix.
- I have chosen to mark some packages to be excluded from this whole
process. They will either not log anything, or log without a prefix, or
log using their previously defined prefix. See the list of exclusions in
the bottom.
- I fixed all the tests that failed because of this change. These were
failing because they were expecting the old prefix to be in the
generated logs. I have changed those to expect the new `package` field
instead. This might not be a great solution. Ideally we might want to
remove this from the tests so they only test for relevant fields in the
logs. but this is a problem for another day.
- Please run the node with this config, and mention if you see something
weird in the logs. (use different verbosities)
- The CI workflow uses a script that basically runs the
`hack/gen-logs.sh` and checks that the git diff is zero. that script is
`hack/check-logs.sh`. This means that if one runs this script locally,
it will not actually _check_ anything, rather than just regenerate the
log.go files and fix any mistake. This might be confusing. Please
suggest solutions if you think it's a problem.
---
**A list of packages that used a different prefix than their package
names for their logs:**
- beacon-chain/cache/depositsnapshot/ package depositsnapshot, prefix
"cache"
- beacon-chain/core/transition/log.go — package transition, prefix
"state"
- beacon-chain/db/kv/log.go — package kv, prefix "db"
- beacon-chain/db/slasherkv/log.go — package slasherkv, prefix
"slasherdb"
- beacon-chain/db/pruner/pruner.go — package pruner, prefix "db-pruner"
- beacon-chain/light-client/log.go — package light_client, prefix
"light-client"
- beacon-chain/operations/attestations/log.go — package attestations,
prefix "pool/attestations"
- beacon-chain/operations/slashings/log.go — package slashings, prefix
"pool/slashings"
- beacon-chain/rpc/core/log.go — package core, prefix "rpc/core"
- beacon-chain/rpc/eth/beacon/log.go — package beacon, prefix
"rpc/beaconv1"
- beacon-chain/rpc/eth/validator/log.go — package validator, prefix
"beacon-api"
- beacon-chain/rpc/prysm/v1alpha1/beacon/log.go — package beacon, prefix
"rpc"
- beacon-chain/rpc/prysm/v1alpha1/validator/log.go — package validator,
prefix "rpc/validator"
- beacon-chain/state/stategen/log.go — package stategen, prefix
"state-gen"
- beacon-chain/sync/checkpoint/log.go — package checkpoint, prefix
"checkpoint-sync"
- beacon-chain/sync/initial-sync/log.go — package initialsync, prefix
"initial-sync"
- cmd/prysmctl/p2p/log.go — package p2p, prefix "prysmctl-p2p"
- config/features/log.go -- package features, prefix "flags"
- io/file/log.go — package file, prefix "fileutil"
- proto/prysm/v1alpha1/log.go — package eth, prefix "protobuf"
- validator/client/beacon-api/log.go — package beacon_api, prefix
"beacon-api"
- validator/db/kv/log.go — package kv, prefix "db"
- validator/db/filesystem/db.go — package filesystem, prefix "db"
- validator/keymanager/derived/log.go — package derived, prefix
"derived-keymanager"
- validator/keymanager/local/log.go — package local, prefix
"local-keymanager"
- validator/keymanager/remote-web3signer/log.go — package
remote_web3signer, prefix "remote-keymanager"
- validator/keymanager/remote-web3signer/internal/log.go — package
internal, prefix "remote-web3signer-
internal"
- beacon-chain/forkchoice/doubly... prefix is
"forkchoice-doublylinkedtree"
**List of excluded directories (their subdirectories are also
excluded):**
```
EXCLUDED_PATH_PREFIXES=(
"testing"
"validator/client/testutil"
"beacon-chain/p2p/testing"
"beacon-chain/rpc/eth/config"
"beacon-chain/rpc/prysm/v1alpha1/debug"
"tools"
"runtime"
"monitoring"
"io"
"cmd"
".well-known"
"changelog"
"hack"
"specrefs"
"third_party"
"bazel-out"
"bazel-bin"
"bazel-prysm"
"bazel-testlogs"
"build"
".github"
".jj"
".idea"
".vscode"
)
```
* Remove Beacon API endpoints that were deprecated in Electra
* changelog <3
* build fix
* remove more stuff
* fix post-submit e2e and remove structs
* list endpoints in the changelog
---------
Co-authored-by: james-prysm <90280386+james-prysm@users.noreply.github.com>
* Move ssz_query objects into testing folder (ensuring test objects only used in test environment)
* Add containers for response
* Export sszInfo
* Add QueryBeaconState/Block
* Add comments and few refactor
* Fix merge conflict issues
* Return 500 when calculate offset fails
* Add test for QueryBeaconState
* Add test for QueryBeaconBlock
* Changelog :)
* Rename `QuerySSZRequest` to `SSZQueryRequest`
* Fix middleware hooks for RPC to accept JSON from client and return SSZ
* Convert to `SSZObject` directly from proto
* Move marshalling/calculating hash tree root part after `CalculateOffsetAndLength`
* Make nogo happy
* Add informing comment for using proto unsafe conversion
---------
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* Calculate max epoch and churn for slashing once
* calculate once for proposer and attester slashings
* changelog <3
* introduce struct
* check if err is nil in ProcessVoluntaryExits
* rename exitData to exitInfo and return from functions
* cleanup + tests
* cleanup after rebase
* Potuz's review
* pre-calculate total active balance
* remove `slashValidatorFunc` closure
* Avoid a second validator loop
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* remove balance parameter from slashing functions
---------
Co-authored-by: terence tsao <terence@prysmaticlabs.com>
Co-authored-by: potuz <potuz@prysmaticlabs.com>
* fix: submitPoolSyncCommitteeSignatures reponse inconsistent
* update: bazel build file
* update: add changelog fragment file
* update api/server/structs/BUILD.bazel format
* update the unit test
* update: the error format
---------
Co-authored-by: james-prysm <90280386+james-prysm@users.noreply.github.com>
* removing ssz-only flag
* gaz
* reverting other uses of sszonly
* gaz
* adding kasey and radek's suggestions
* update changelog
* adding test
* radek advice with new headers and tests
* adding logs and fixing comments
* adding logs and fixing comments
* gaz
* Update validator/client/beacon-api/rest_handler_client.go
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* Update api/apiutil/header.go
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* Update api/apiutil/header.go
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* radek's comments
* adding another failing case based on radek's suggestion
* another unit test
---------
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* Convert genesis times from seconds to time.Time
* Fixing failed forkchoice tests in a new commit so it doesn't get worse
Fixing failed spectest tests in a new commit so it doesn't get worse
Fixing forkchoice tests, then spectests
* Fixing forkchoice tests, then spectests. Now asking for help...
* Fix TestForkChoice_GetProposerHead
* Fix broken build
* Resolve TODO(preston) items
* Changelog fragment
* Resolve TODO(preston) items again
* Resolve lint issues
* Use consistant field names for sinceSlotStart (no spaces)
* Manu's feedback
* Renamed StartTime -> UnsafeStartTime, marked as deprecated because it doesn't handle overflow scenarios.
Renamed SlotTime -> StartTime
Renamed SlotAt -> At
Handled the error in cases where StartTime was used.
@james-prysm feedback
* Revert beacon-chain/blockchain/receive_block_test.go from 1b7844de
* Fixing issues after rebase
* Accepted suggestions from @potuz
* Remove CanonicalHeadSlot from merge conflicts
---------
Co-authored-by: potuz <potuz@prysmaticlabs.com>
* Add log capitalization analyzer and apply fixes across codebase
Implements a new nogo analyzer to enforce proper log message capitalization and applies the fixes to all affected log statements throughout the beacon chain, validator, and supporting components.
Co-Authored-By: Claude <noreply@anthropic.com>
* Radek's feedback
---------
Co-authored-by: Claude <noreply@anthropic.com>
* poc changes for safe validator shutdown
* simplifying health routine and adding safe shutdown after max restarts reached
* fixing health tests
* fixing tests
* changelog
* gofmt
* fixing runner
* improve how runner times out
* improvements to ux on logs
* linting
* adding in max healthcheck flag
* changelog
* Update james-prysm_safe-validator-shutdown.md
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* Update validator/client/runner.go
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* Update validator/client/service.go
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* Update validator/client/runner.go
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* Update validator/client/runner.go
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* addressing some feedback from radek
* addressing some more feedback
* fixing name based on feedback
* fixing mistake on max health checks
* conflict accidently checked in
* go 1.23 no longer needs you to stop for the ticker
* Update flags.go
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* wip no unit test for recursive healthy host find
* rework healthcheck
* gaz
* fixing bugs and improving logs with new monitor
* removing health tracker, fixing runner tests, and adding placeholder for monitor tests
* fixing event stream check
* linting
* adding in health monitor tests
* gaz
* improving test
* removing some log.fatals
* forgot to remove comment
* missed fatal removal
* doppleganger should exit the node safely now
* Update validator/client/health_monitor.go
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* radek review
* Update validator/client/validator.go
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* Update validator/client/validator.go
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* Update validator/client/health_monitor.go
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* Update validator/client/health_monitor.go
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* Update validator/client/health_monitor.go
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* Update validator/client/validator.go
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* radek feedback
* read up on more suggestions and making fixes to channel
* suggested updates after more reading
* reverting some of this because it froze the validator after healthcheck failed
* fully reverting
* some improvements I found during testing
* Update cmd/validator/flags/flags.go
Co-authored-by: Preston Van Loon <pvanloon@offchainlabs.com>
* preston's feedback
* clarifications on changelog
* converted to using an event feed instead of my own channel publishing implementation, adding relevant logs
* preston log suggestion
---------
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
Co-authored-by: Preston Van Loon <pvanloon@offchainlabs.com>
* Add the new Fulu state with the new field
* fix the hasher for the fulu state
* Fix ToProto() and ToProtoUnsafe()
* Add the fields as shared
* Add epoch transition code
* short circuit the proposer cache to use the state
* Marshal the state JSON
* update spectests to 1.6.0-alpha.1
* Remove deneb and electra entries from blob schedule
This was cherry picked from PR #15364
and edited to remove the minimal cases
* Fix minimal tests
* Increase deadling for processing blocks in spectests
* Preston's review
* review
---------
Co-authored-by: terence tsao <terence@prysmaticlabs.com>
* wip
* more wip
* commenting out builder redefinition of payload types
* removing unneeded commented out code and unused type conversions
* fixing unit tests and removing irrelevant unit tests
* gaz
* adding changelog
* Update api/server/structs/conversions_block_execution.go
Co-authored-by: Preston Van Loon <pvanloon@offchainlabs.com>
* Update api/server/structs/conversions_block_execution.go
Co-authored-by: Preston Van Loon <pvanloon@offchainlabs.com>
* Revert some changes to test files, change the minimum to support the new type changes
* Remove commented block
* fixing import
---------
Co-authored-by: Preston Van Loon <pvanloon@offchainlabs.com>
* Add blob schedule support
* Feedback
* Fix e2e test
* adding in temporary fix until web3signer is supported for blob schedule in config
* Add fallback
* Uncomment blob schedule from plcae holder fields for test
* Log blob limit increase
* Sort
---------
Co-authored-by: james-prysm <james@prysmaticlabs.com>
* Migrate Prysm repo to Offchain Labs organization ahead of Pectra upgrade v6
* Replace prysmaticlabs with OffchainLabs on general markdowns
* Update mock
* Gazelle and add mock.go to excluded generated mock file
* Implement static analysis to prevent panics
* Add nopanic to nogo
* Fix violations and add exclusions
Fix violations and add exclusions for all
* Changelog fragment
* Use pass.Report instead of pass.Reportf
* Remove strings.ToLower for checking init method name
* Add exclusion for herumi init
* Move api/client/beacon template function to init and its own file
* Fix nopanic testcase
* feat(event-stream): Add block_gossip topic support
* feat(event-stream): Add block_gossip topic support
* feat(event-stream): Add block_gossip topic support
* feat: add block gossip topic support to beacon api event stream
* fix: sync_fuzz_test panic
* fix: check for nil operationNotifier before sending block gossip
The operationNotifier was not being checked for nil before being used,
which could lead to a panic if it was not initialized. This commit adds
a nil check to prevent the panic.
---------
Co-authored-by: james-prysm <90280386+james-prysm@users.noreply.github.com>