Files
prysm/validator/node/node_test.go
james-prysm 641d90990d grpc fallback improvements (#16215)
<!-- Thanks for sending a PR! Before submitting:

1. If this is your first PR, check out our contribution guide here
https://docs.prylabs.network/docs/contribute/contribution-guidelines
You will then need to sign our Contributor License Agreement (CLA),
which will show up as a comment from a bot in this pull request after
you open it. We cannot review code without a signed CLA.
2. Please file an associated tracking issue if this pull request is
non-trivial and requires context for our team to understand. All
features and most bug fixes should have
an associated issue with a design discussed and decided upon. Small bug
   fixes and documentation improvements don't need issues.
3. New features and bug fixes must have tests. Documentation may need to
be updated. If you're unsure what to update, send the PR, and we'll
discuss
   in review.
4. Note that PRs updating dependencies and new Go versions are not
accepted.
   Please file an issue instead.
5. A changelog entry is required for user facing issues.
-->

**What type of PR is this?**

## Summary

This PR implements gRPC fallback support for the validator client,
allowing it to automatically switch between multiple beacon node
endpoints when the primary node becomes unavailable or unhealthy.

## Changes

- Added `grpcConnectionProvider` to manage multiple gRPC connections
with circular failover
- Validator automatically detects unhealthy beacon nodes and switches to
the next available endpoint
- Health checks verify both node responsiveness AND sync status before
accepting a node
- Improved logging to only show "Found fully synced beacon node" when an
actual switch occurs (reduces log noise)


I removed the old middleware that uses gRPC's built in load balancer
because:

- gRPC's pick_first load balancer doesn't provide sync-status-aware
failover
- The validator needs to ensure it connects to a fully synced node, not
just a reachable one

## Test Scenario

### Setup
Deployed a 4-node Kurtosis testnet with local validator connecting to 2
beacon nodes:

```yaml
# kurtosis-grpc-fallback-test.yaml
participants:
  - el_type: nethermind
    cl_type: prysm
    validator_count: 128  # Keeps chain advancing
  - el_type: nethermind
    cl_type: prysm
    validator_count: 64
  - el_type: nethermind
    cl_type: prysm
    validator_count: 64   # Keeps chain advancing
  - el_type: nethermind
    cl_type: prysm
    validator_count: 64   # Keeps chain advancing

network_params:
  fulu_fork_epoch: 0
  seconds_per_slot: 6
```

Local validator started with:
```bash
./validator --beacon-rpc-provider=127.0.0.1:33005,127.0.0.1:33012 ...
```

### Test 1: Primary Failover (cl-1 → cl-2)

1. Stopped cl-1 beacon node
2. Validator detected failure and switched to cl-2

**Logs:**
```
WARN  Beacon node is not responding, switching host currentHost=127.0.0.1:33005 nextHost=127.0.0.1:33012
DEBUG Trying gRPC endpoint newHost=127.0.0.1:33012 previousHost=127.0.0.1:33005
INFO  Failover succeeded: connected to healthy beacon node failedAttempts=[127.0.0.1:33005] newHost=127.0.0.1:33012 previousHost=127.0.0.1:33005
```

**Result:**  PASSED - Validator continued submitting attestations on
cl-2

### Test 2: Circular Failover (cl-2 → cl-1)

1. Restarted cl-1, stopped cl-2
2. Validator detected failure and switched back to cl-1

**Logs:**
```
WARN  Beacon node is not responding, switching host currentHost=127.0.0.1:33012 nextHost=127.0.0.1:33005
DEBUG Trying gRPC endpoint newHost=127.0.0.1:33005 previousHost=127.0.0.1:33012
INFO  Failover succeeded: connected to healthy beacon node failedAttempts=[127.0.0.1:33012] newHost=127.0.0.1:33005 previousHost=127.0.0.1:33012
```

**Result:**  PASSED - Circular fallback works correctly

## Key Log Messages

| Log Level | Message | Source |
|-----------|---------|--------|
| WARN | "Beacon node is not responding, switching host" |
`changeHost()` in validator.go |
| INFO | "Switched gRPC endpoint" | `SetHost()` in
grpc_connection_provider.go |
| INFO | "Found fully synced beacon node" | `FindHealthyHost()` in
validator.go (only on actual switch) |

## Test Plan

- [x] Verify primary failover (cl-1 → cl-2)
- [x] Verify circular failover (cl-2 → cl-1)
- [x] Verify validator continues producing attestations after switch
- [x] Verify "Found fully synced beacon node" only logs on actual switch
(not every health check)

**What does this PR do? Why is it needed?**

**Which issues(s) does this PR fix?**

Fixes # https://github.com/OffchainLabs/prysm/pull/7133


**Other notes for review**

**Acknowledgements**

- [x] I have read
[CONTRIBUTING.md](https://github.com/prysmaticlabs/prysm/blob/develop/CONTRIBUTING.md).
- [x] I have included a uniquely named [changelog fragment
file](https://github.com/prysmaticlabs/prysm/blob/develop/CONTRIBUTING.md#maintaining-changelogmd).
- [x] I have added a description with sufficient context for reviewers
to understand this PR.
- [x] I have tested that my changes work as expected and I added a
testing plan to the PR description (if applicable).

---------

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Co-authored-by: Radosław Kapka <rkapka@wp.pl>
Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com>
2026-02-02 14:51:56 +00:00

327 lines
11 KiB
Go

package node
import (
"flag"
"fmt"
"os"
"path"
"path/filepath"
"testing"
"github.com/OffchainLabs/prysm/v7/cmd"
"github.com/OffchainLabs/prysm/v7/cmd/validator/flags"
"github.com/OffchainLabs/prysm/v7/io/file"
"github.com/OffchainLabs/prysm/v7/testing/assert"
"github.com/OffchainLabs/prysm/v7/testing/require"
"github.com/OffchainLabs/prysm/v7/validator/accounts"
"github.com/OffchainLabs/prysm/v7/validator/accounts/wallet"
"github.com/OffchainLabs/prysm/v7/validator/db/kv"
"github.com/OffchainLabs/prysm/v7/validator/keymanager"
remoteweb3signer "github.com/OffchainLabs/prysm/v7/validator/keymanager/remote-web3signer"
logtest "github.com/sirupsen/logrus/hooks/test"
"github.com/urfave/cli/v2"
)
// Test that the sharding node can build with default flag values.
func TestNode_Builds(t *testing.T) {
app := cli.App{}
set := flag.NewFlagSet("test", 0)
set.String("datadir", t.TempDir()+"/datadir", "the node data directory")
dir := t.TempDir() + "/walletpath"
passwordDir := t.TempDir() + "/password"
require.NoError(t, os.MkdirAll(passwordDir, os.ModePerm))
passwordFile := filepath.Join(passwordDir, "password.txt")
walletPassword := "$$Passw0rdz2$$"
require.NoError(t, os.WriteFile(
passwordFile,
[]byte(walletPassword),
os.ModePerm,
))
set.String("wallet-dir", dir, "path to wallet")
set.String("wallet-password-file", passwordFile, "path to wallet password")
set.String("keymanager-kind", "imported", "keymanager kind")
set.String("verbosity", "debug", "log verbosity")
set.String("beacon-rpc-provider", "localhost:4000", "beacon node RPC endpoint")
set.String("beacon-rest-api-provider", "http://localhost:3500", "beacon node REST API endpoint")
require.NoError(t, set.Set(flags.WalletPasswordFileFlag.Name, passwordFile))
ctx := cli.NewContext(&app, set, nil)
opts := []accounts.Option{
accounts.WithWalletDir(dir),
accounts.WithKeymanagerType(keymanager.Local),
accounts.WithWalletPassword(walletPassword),
accounts.WithSkipMnemonicConfirm(true),
}
acc, err := accounts.NewCLIManager(opts...)
require.NoError(t, err)
_, err = acc.WalletCreate(ctx.Context)
require.NoError(t, err)
valClient, err := NewValidatorClient(ctx)
require.NoError(t, err, "Failed to create ValidatorClient")
err = valClient.db.Close()
require.NoError(t, err)
}
func TestGetLegacyDatabaseLocation(t *testing.T) {
dataDir := t.TempDir()
dataFile := path.Join(dataDir, "dataFile")
nonExistingDataFile := path.Join(dataDir, "nonExistingDataFile")
_, err := os.Create(dataFile)
require.NoError(t, err, "Failed to create data file")
walletDir := t.TempDir()
derivedDir := path.Join(walletDir, "derived")
err = file.MkdirAll(derivedDir)
require.NoError(t, err, "Failed to create derived dir")
derivedDbFile := path.Join(derivedDir, kv.ProtectionDbFileName)
_, err = os.Create(derivedDbFile)
require.NoError(t, err, "Failed to create derived db file")
dbFile := path.Join(walletDir, kv.ProtectionDbFileName)
_, err = os.Create(dbFile)
require.NoError(t, err, "Failed to create db file")
nonExistingWalletDir := t.TempDir()
testCases := []struct {
name string
isInteropNumValidatorsSet bool
isWeb3SignerURLFlagSet bool
dataDir string
dataFile string
walletDir string
validatorClient *ValidatorClient
wallet *wallet.Wallet
expectedDataDir string
expectedDataFile string
}{
{
name: "interop num validators set",
isInteropNumValidatorsSet: true,
dataDir: dataDir,
dataFile: dataFile,
expectedDataDir: dataDir,
expectedDataFile: dataFile,
},
{
name: "dataDir differs from default",
dataDir: dataDir,
dataFile: dataFile,
expectedDataDir: dataDir,
expectedDataFile: dataFile,
},
{
name: "dataFile exists",
dataDir: cmd.DefaultDataDir(),
dataFile: dataFile,
expectedDataDir: cmd.DefaultDataDir(),
expectedDataFile: dataFile,
},
{
name: "wallet is nil",
dataDir: cmd.DefaultDataDir(),
dataFile: nonExistingDataFile,
expectedDataDir: cmd.DefaultDataDir(),
expectedDataFile: nonExistingDataFile,
},
{
name: "web3signer url is not set and legacy data file does not exist",
dataDir: cmd.DefaultDataDir(),
dataFile: nonExistingDataFile,
wallet: wallet.New(&wallet.Config{
WalletDir: nonExistingWalletDir,
KeymanagerKind: keymanager.Derived,
}),
expectedDataDir: cmd.DefaultDataDir(),
expectedDataFile: nonExistingDataFile,
},
{
name: "web3signer url is not set and legacy data file does exist",
dataDir: cmd.DefaultDataDir(),
dataFile: nonExistingDataFile,
wallet: wallet.New(&wallet.Config{
WalletDir: walletDir,
KeymanagerKind: keymanager.Derived,
}),
expectedDataDir: path.Join(walletDir, "derived"),
expectedDataFile: path.Join(walletDir, "derived", kv.ProtectionDbFileName),
},
{
name: "web3signer url is set and legacy data file does not exist",
isWeb3SignerURLFlagSet: true,
dataDir: cmd.DefaultDataDir(),
dataFile: nonExistingDataFile,
walletDir: nonExistingWalletDir,
wallet: wallet.New(&wallet.Config{
WalletDir: walletDir,
KeymanagerKind: keymanager.Derived,
}),
expectedDataDir: cmd.DefaultDataDir(),
expectedDataFile: nonExistingDataFile,
},
{
name: "web3signer url is set and legacy data file does exist",
isWeb3SignerURLFlagSet: true,
dataDir: cmd.DefaultDataDir(),
dataFile: nonExistingDataFile,
walletDir: walletDir,
wallet: wallet.New(&wallet.Config{
WalletDir: walletDir,
KeymanagerKind: keymanager.Derived,
}),
expectedDataDir: walletDir,
expectedDataFile: path.Join(walletDir, kv.ProtectionDbFileName),
},
}
for _, tt := range testCases {
t.Run(tt.name, func(t *testing.T) {
validatorClient := &ValidatorClient{wallet: tt.wallet}
actualDataDir, actualDataFile, err := validatorClient.getLegacyDatabaseLocation(
tt.isInteropNumValidatorsSet,
tt.isWeb3SignerURLFlagSet,
tt.dataDir,
tt.dataFile,
tt.walletDir,
)
require.NoError(t, err, "Failed to get legacy database location")
assert.Equal(t, tt.expectedDataDir, actualDataDir, "data dir should be equal")
assert.Equal(t, tt.expectedDataFile, actualDataFile, "data file should be equal")
})
}
}
// TestClearDB tests clearing the database
func TestClearDB(t *testing.T) {
for _, isMinimalDatabase := range []bool{false, true} {
t.Run(fmt.Sprintf("isMinimalDatabase=%v", isMinimalDatabase), func(t *testing.T) {
hook := logtest.NewGlobal()
tmp := filepath.Join(t.TempDir(), "datadirtest")
require.NoError(t, clearDB(t.Context(), tmp, true, isMinimalDatabase))
require.LogsContain(t, hook, "Removing database")
})
}
}
// TestWeb3SignerConfig tests the web3 signer config returns the correct values.
func TestWeb3SignerConfig(t *testing.T) {
type args struct {
baseURL string
publicKeysOrURLs []string
persistentFile string
}
tests := []struct {
name string
args *args
want *remoteweb3signer.SetupConfig
wantErrMsg string
}{
{
name: "happy path with public keys",
args: &args{
baseURL: "http://localhost:8545",
publicKeysOrURLs: []string{"0xa99a76ed7796f7be22d5b7e85deeb7c5677e88e511e0b337618f8c4eb61349b4bf2d153f649f7b53359fe8b94a38e44c," +
"0xb89bebc699769726a318c8e9971bd3171297c61aea4a6578a7a4f94b547dcba5bac16a89108b6b6a1fe3695d1a874a0b"},
},
want: &remoteweb3signer.SetupConfig{
BaseEndpoint: "http://localhost:8545",
GenesisValidatorsRoot: nil,
PublicKeysURL: "",
ProvidedPublicKeys: []string{
"0xa99a76ed7796f7be22d5b7e85deeb7c5677e88e511e0b337618f8c4eb61349b4bf2d153f649f7b53359fe8b94a38e44c",
"0xb89bebc699769726a318c8e9971bd3171297c61aea4a6578a7a4f94b547dcba5bac16a89108b6b6a1fe3695d1a874a0b",
},
},
},
{
name: "happy path with external url",
args: &args{
baseURL: "http://localhost:8545",
publicKeysOrURLs: []string{"http://localhost:8545/api/v1/eth2/publicKeys"},
},
want: &remoteweb3signer.SetupConfig{
BaseEndpoint: "http://localhost:8545",
GenesisValidatorsRoot: nil,
PublicKeysURL: "http://localhost:8545/api/v1/eth2/publicKeys",
ProvidedPublicKeys: nil,
},
},
{
name: "Bad base URL",
args: &args{
baseURL: "0xa99a76ed7796f7be22d5b7e85deeb7c5677e88,",
publicKeysOrURLs: []string{"0xa99a76ed7796f7be22d5b7e85deeb7c5677e88e511e0b337618f8c4eb61349b4bf2d153f649f7b53359fe8b94a38e44c," +
"0xb89bebc699769726a318c8e9971bd3171297c61aea4a6578a7a4f94b547dcba5bac16a89108b6b6a1fe3695d1a874a0b"},
},
want: nil,
wantErrMsg: "web3signer url 0xa99a76ed7796f7be22d5b7e85deeb7c5677e88, is invalid: parse \"0xa99a76ed7796f7be22d5b7e85deeb7c5677e88,\": invalid URI for request",
},
{
name: "Base URL missing scheme or host",
args: &args{
baseURL: "localhost:8545",
publicKeysOrURLs: []string{"localhost"},
},
want: nil,
wantErrMsg: "web3signer url must be in the format of http(s)://host:port url used: localhost:8545",
},
{
name: "happy path with persistentFile",
args: &args{
baseURL: "http://localhost:8545",
persistentFile: "/remote/key/file.txt",
},
want: &remoteweb3signer.SetupConfig{
BaseEndpoint: "http://localhost:8545",
KeyFilePath: "/remote/key/file.txt",
},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
app := cli.App{}
set := flag.NewFlagSet(tt.name, 0)
set.String("validators-external-signer-url", tt.args.baseURL, "baseUrl")
set.String(flags.Web3SignerKeyFileFlag.Name, "", "")
c := &cli.StringSliceFlag{
Name: "validators-external-signer-public-keys",
}
err := c.Apply(set)
require.NoError(t, err)
require.NoError(t, set.Set(flags.Web3SignerURLFlag.Name, tt.args.baseURL))
for _, key := range tt.args.publicKeysOrURLs {
require.NoError(t, set.Set(flags.Web3SignerPublicValidatorKeysFlag.Name, key))
}
if tt.args.persistentFile != "" {
require.NoError(t, set.Set(flags.Web3SignerKeyFileFlag.Name, tt.args.persistentFile))
}
cliCtx := cli.NewContext(&app, set, nil)
got, err := Web3SignerConfig(cliCtx)
if tt.wantErrMsg != "" {
require.ErrorContains(t, tt.wantErrMsg, err)
return
}
require.DeepEqual(t, tt.want, got)
})
}
}
func Test_parseBeaconApiHeaders(t *testing.T) {
t.Run("ok", func(t *testing.T) {
h := parseBeaconApiHeaders("key1=value1,key1=value2,key2=value3")
assert.Equal(t, 2, len(h))
assert.DeepEqual(t, []string{"value1", "value2"}, h["key1"])
assert.DeepEqual(t, []string{"value3"}, h["key2"])
})
t.Run("ignores malformed", func(t *testing.T) {
h := parseBeaconApiHeaders("key1=value1,key2value2,key3=,=key4")
assert.Equal(t, 1, len(h))
assert.DeepEqual(t, []string{"value1"}, h["key1"])
})
}