<!-- Thanks for sending a PR! Before submitting:
1. If this is your first PR, check out our contribution guide here
https://docs.prylabs.network/docs/contribute/contribution-guidelines
You will then need to sign our Contributor License Agreement (CLA),
which will show up as a comment from a bot in this pull request after
you open it. We cannot review code without a signed CLA.
2. Please file an associated tracking issue if this pull request is
non-trivial and requires context for our team to understand. All
features and most bug fixes should have
an associated issue with a design discussed and decided upon. Small bug
fixes and documentation improvements don't need issues.
3. New features and bug fixes must have tests. Documentation may need to
be updated. If you're unsure what to update, send the PR, and we'll
discuss
in review.
4. Note that PRs updating dependencies and new Go versions are not
accepted.
Please file an issue instead.
5. A changelog entry is required for user facing issues.
-->
**What type of PR is this?**
## Summary
This PR implements gRPC fallback support for the validator client,
allowing it to automatically switch between multiple beacon node
endpoints when the primary node becomes unavailable or unhealthy.
## Changes
- Added `grpcConnectionProvider` to manage multiple gRPC connections
with circular failover
- Validator automatically detects unhealthy beacon nodes and switches to
the next available endpoint
- Health checks verify both node responsiveness AND sync status before
accepting a node
- Improved logging to only show "Found fully synced beacon node" when an
actual switch occurs (reduces log noise)
I removed the old middleware that uses gRPC's built in load balancer
because:
- gRPC's pick_first load balancer doesn't provide sync-status-aware
failover
- The validator needs to ensure it connects to a fully synced node, not
just a reachable one
## Test Scenario
### Setup
Deployed a 4-node Kurtosis testnet with local validator connecting to 2
beacon nodes:
```yaml
# kurtosis-grpc-fallback-test.yaml
participants:
- el_type: nethermind
cl_type: prysm
validator_count: 128 # Keeps chain advancing
- el_type: nethermind
cl_type: prysm
validator_count: 64
- el_type: nethermind
cl_type: prysm
validator_count: 64 # Keeps chain advancing
- el_type: nethermind
cl_type: prysm
validator_count: 64 # Keeps chain advancing
network_params:
fulu_fork_epoch: 0
seconds_per_slot: 6
```
Local validator started with:
```bash
./validator --beacon-rpc-provider=127.0.0.1:33005,127.0.0.1:33012 ...
```
### Test 1: Primary Failover (cl-1 → cl-2)
1. Stopped cl-1 beacon node
2. Validator detected failure and switched to cl-2
**Logs:**
```
WARN Beacon node is not responding, switching host currentHost=127.0.0.1:33005 nextHost=127.0.0.1:33012
DEBUG Trying gRPC endpoint newHost=127.0.0.1:33012 previousHost=127.0.0.1:33005
INFO Failover succeeded: connected to healthy beacon node failedAttempts=[127.0.0.1:33005] newHost=127.0.0.1:33012 previousHost=127.0.0.1:33005
```
**Result:** ✅ PASSED - Validator continued submitting attestations on
cl-2
### Test 2: Circular Failover (cl-2 → cl-1)
1. Restarted cl-1, stopped cl-2
2. Validator detected failure and switched back to cl-1
**Logs:**
```
WARN Beacon node is not responding, switching host currentHost=127.0.0.1:33012 nextHost=127.0.0.1:33005
DEBUG Trying gRPC endpoint newHost=127.0.0.1:33005 previousHost=127.0.0.1:33012
INFO Failover succeeded: connected to healthy beacon node failedAttempts=[127.0.0.1:33012] newHost=127.0.0.1:33005 previousHost=127.0.0.1:33012
```
**Result:** ✅ PASSED - Circular fallback works correctly
## Key Log Messages
| Log Level | Message | Source |
|-----------|---------|--------|
| WARN | "Beacon node is not responding, switching host" |
`changeHost()` in validator.go |
| INFO | "Switched gRPC endpoint" | `SetHost()` in
grpc_connection_provider.go |
| INFO | "Found fully synced beacon node" | `FindHealthyHost()` in
validator.go (only on actual switch) |
## Test Plan
- [x] Verify primary failover (cl-1 → cl-2)
- [x] Verify circular failover (cl-2 → cl-1)
- [x] Verify validator continues producing attestations after switch
- [x] Verify "Found fully synced beacon node" only logs on actual switch
(not every health check)
**What does this PR do? Why is it needed?**
**Which issues(s) does this PR fix?**
Fixes # https://github.com/OffchainLabs/prysm/pull/7133
**Other notes for review**
**Acknowledgements**
- [x] I have read
[CONTRIBUTING.md](https://github.com/prysmaticlabs/prysm/blob/develop/CONTRIBUTING.md).
- [x] I have included a uniquely named [changelog fragment
file](https://github.com/prysmaticlabs/prysm/blob/develop/CONTRIBUTING.md#maintaining-changelogmd).
- [x] I have added a description with sufficient context for reviewers
to understand this PR.
- [x] I have tested that my changes work as expected and I added a
testing plan to the PR description (if applicable).
---------
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com>
#### This PR sets the foundation for the new logging features.
---
The goal of this big PR is the following:
1. Adding a log.go file to every package:
[_commit_](54f6396d4c)
- Writing a bash script that adds the log.go file to every package that
imports logrus, except the excluded packages, configured at the top of
the bash script.
- the log.go file creates a log variable and sets a field called
`package` to the full path of that package.
- I have tried to fix every error/problem that came from mass generation
of this file. (duplicate declarations, different prefix names, etc...)
- some packages had the log.go file from before, and had some helper
functions in there as well. I've moved all of them to a `log_helpers.go`
file within each package.
2. Create a CI rule which verifies that:
[_commit_](b799c3a0ef)
- every package which imports logrus, also has a log.go file, except the
excluded packages.
- the `package` field of each log.go variable, has the correct path. (to
detect when we move a package or change it's name)
- I pushed a commit with a manually changed log.go file to trigger the
ci check failure and it worked.
3. Alter the logging system to read the prefix from this `package` field
for every log while outputing:
[_commit_](b0c7f1146c)
- some packages have/want/need a different log prefix than their package
name (like `kv`). This can be solved by keeping a map of package paths
to prefix names somewhere.
---
**Some notes:**
- Please review everything carefully.
- I created the `prefixReplacement` map and populated the data that I
deemed necessary. Please check it and complain if something doesn't make
sense or is missing. I attached at the bottom, the list of all the
packages that used to use a different name than their package name as
their prefix.
- I have chosen to mark some packages to be excluded from this whole
process. They will either not log anything, or log without a prefix, or
log using their previously defined prefix. See the list of exclusions in
the bottom.
- I fixed all the tests that failed because of this change. These were
failing because they were expecting the old prefix to be in the
generated logs. I have changed those to expect the new `package` field
instead. This might not be a great solution. Ideally we might want to
remove this from the tests so they only test for relevant fields in the
logs. but this is a problem for another day.
- Please run the node with this config, and mention if you see something
weird in the logs. (use different verbosities)
- The CI workflow uses a script that basically runs the
`hack/gen-logs.sh` and checks that the git diff is zero. that script is
`hack/check-logs.sh`. This means that if one runs this script locally,
it will not actually _check_ anything, rather than just regenerate the
log.go files and fix any mistake. This might be confusing. Please
suggest solutions if you think it's a problem.
---
**A list of packages that used a different prefix than their package
names for their logs:**
- beacon-chain/cache/depositsnapshot/ package depositsnapshot, prefix
"cache"
- beacon-chain/core/transition/log.go — package transition, prefix
"state"
- beacon-chain/db/kv/log.go — package kv, prefix "db"
- beacon-chain/db/slasherkv/log.go — package slasherkv, prefix
"slasherdb"
- beacon-chain/db/pruner/pruner.go — package pruner, prefix "db-pruner"
- beacon-chain/light-client/log.go — package light_client, prefix
"light-client"
- beacon-chain/operations/attestations/log.go — package attestations,
prefix "pool/attestations"
- beacon-chain/operations/slashings/log.go — package slashings, prefix
"pool/slashings"
- beacon-chain/rpc/core/log.go — package core, prefix "rpc/core"
- beacon-chain/rpc/eth/beacon/log.go — package beacon, prefix
"rpc/beaconv1"
- beacon-chain/rpc/eth/validator/log.go — package validator, prefix
"beacon-api"
- beacon-chain/rpc/prysm/v1alpha1/beacon/log.go — package beacon, prefix
"rpc"
- beacon-chain/rpc/prysm/v1alpha1/validator/log.go — package validator,
prefix "rpc/validator"
- beacon-chain/state/stategen/log.go — package stategen, prefix
"state-gen"
- beacon-chain/sync/checkpoint/log.go — package checkpoint, prefix
"checkpoint-sync"
- beacon-chain/sync/initial-sync/log.go — package initialsync, prefix
"initial-sync"
- cmd/prysmctl/p2p/log.go — package p2p, prefix "prysmctl-p2p"
- config/features/log.go -- package features, prefix "flags"
- io/file/log.go — package file, prefix "fileutil"
- proto/prysm/v1alpha1/log.go — package eth, prefix "protobuf"
- validator/client/beacon-api/log.go — package beacon_api, prefix
"beacon-api"
- validator/db/kv/log.go — package kv, prefix "db"
- validator/db/filesystem/db.go — package filesystem, prefix "db"
- validator/keymanager/derived/log.go — package derived, prefix
"derived-keymanager"
- validator/keymanager/local/log.go — package local, prefix
"local-keymanager"
- validator/keymanager/remote-web3signer/log.go — package
remote_web3signer, prefix "remote-keymanager"
- validator/keymanager/remote-web3signer/internal/log.go — package
internal, prefix "remote-web3signer-
internal"
- beacon-chain/forkchoice/doubly... prefix is
"forkchoice-doublylinkedtree"
**List of excluded directories (their subdirectories are also
excluded):**
```
EXCLUDED_PATH_PREFIXES=(
"testing"
"validator/client/testutil"
"beacon-chain/p2p/testing"
"beacon-chain/rpc/eth/config"
"beacon-chain/rpc/prysm/v1alpha1/debug"
"tools"
"runtime"
"monitoring"
"io"
"cmd"
".well-known"
"changelog"
"hack"
"specrefs"
"third_party"
"bazel-out"
"bazel-bin"
"bazel-prysm"
"bazel-testlogs"
"build"
".github"
".jj"
".idea"
".vscode"
)
```
* Migrate Prysm repo to Offchain Labs organization ahead of Pectra upgrade v6
* Replace prysmaticlabs with OffchainLabs on general markdowns
* Update mock
* Gazelle and add mock.go to excluded generated mock file
* wip passing e2e
* reverting temp comment
* remove unneeded comments
* fixing merge errors
* fixing more bugs from merge
* fixing test
* WIP moving code around and fixing tests
* unused linting
* gaz
* temp removing these tests as we need placeholder/wrapper APIs for them with the removal of the gateway
* attempting to remove dependencies to gRPC gateway , 1 mroe left in deps.bzl
* renaming flags and other gateway services to http
* goimport
* fixing deepsource
* git mv
* Update validator/package/validator.yaml
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* Update validator/package/validator.yaml
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* Update cmd/beacon-chain/flags/base.go
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* Update cmd/beacon-chain/flags/base.go
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* Update cmd/beacon-chain/flags/base.go
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* addressing feedback
* missed lint
* renaming import
* reversal based on feedback
* fixing web ui registration
* don't require mux handler
* gaz
* removing gRPC service from validator completely, merged with http service, renames are a work in progress
* updating go.sum
* linting
* trailing white space
* realized there was more cleanup i could do with code reuse
* adding wrapper for routes
* reverting version
* fixing dependencies from merging develop
* gaz
* fixing unit test
* fixing dependencies
* reverting unit test
* fixing conflict
* updating change log
* Update log.go
Co-authored-by: Preston Van Loon <pvanloon@offchainlabs.com>
* gaz
* Update api/server/httprest/server.go
Co-authored-by: Preston Van Loon <pvanloon@offchainlabs.com>
* addressing some feedback
* forgot to remove deprecated flag in usage
* gofmt
* fixing test
* fixing deepsource issue
* moving deprecated flag and adding timeout handler
* missed removal of a flag
* fixing test:
* Update CHANGELOG.md
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
* addressing feedback
* updating comments based on feedback
* removing unused field for now, we can add it back in if we need to use the option
* removing unused struct
* changing api-timeout flag based on feedback
---------
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
Co-authored-by: Preston Van Loon <pvanloon@offchainlabs.com>
* First take at updating everything to v5
* Patch gRPC gateway to use prysm v5
Fix patch
* Update go ssz
---------
Co-authored-by: Preston Van Loon <pvanloon@offchainlabs.com>