changes for safe validator shutdown and restarts on healthcheck (#15401)

* poc changes for safe validator shutdown

* simplifying health routine and adding safe shutdown after max restarts reached

* fixing health tests

* fixing tests

* changelog

* gofmt

* fixing runner

* improve how runner times out

* improvements to ux on logs

* linting

* adding in max healthcheck flag

* changelog

* Update james-prysm_safe-validator-shutdown.md

Co-authored-by: Radosław Kapka <rkapka@wp.pl>

* Update validator/client/runner.go

Co-authored-by: Radosław Kapka <rkapka@wp.pl>

* Update validator/client/service.go

Co-authored-by: Radosław Kapka <rkapka@wp.pl>

* Update validator/client/runner.go

Co-authored-by: Radosław Kapka <rkapka@wp.pl>

* Update validator/client/runner.go

Co-authored-by: Radosław Kapka <rkapka@wp.pl>

* addressing some feedback from radek

* addressing some more feedback

* fixing name based on feedback

* fixing mistake on max health checks

* conflict accidently checked in

* go 1.23 no longer needs you to stop for the ticker

* Update flags.go

Co-authored-by: Radosław Kapka <rkapka@wp.pl>

* wip no unit test for recursive healthy host find

* rework healthcheck

* gaz

* fixing bugs and improving logs with new monitor

* removing health tracker, fixing runner tests, and adding placeholder for monitor tests

* fixing event stream check

* linting

* adding in health monitor tests

* gaz

* improving test

* removing some log.fatals

* forgot to remove comment

* missed fatal removal

* doppleganger should exit the node safely now

* Update validator/client/health_monitor.go

Co-authored-by: Radosław Kapka <rkapka@wp.pl>

* radek review

* Update validator/client/validator.go

Co-authored-by: Radosław Kapka <rkapka@wp.pl>

* Update validator/client/validator.go

Co-authored-by: Radosław Kapka <rkapka@wp.pl>

* Update validator/client/health_monitor.go

Co-authored-by: Radosław Kapka <rkapka@wp.pl>

* Update validator/client/health_monitor.go

Co-authored-by: Radosław Kapka <rkapka@wp.pl>

* Update validator/client/health_monitor.go

Co-authored-by: Radosław Kapka <rkapka@wp.pl>

* Update validator/client/validator.go

Co-authored-by: Radosław Kapka <rkapka@wp.pl>

* radek feedback

* read up on more suggestions and making fixes to channel

* suggested updates after more reading

* reverting some of this because it froze the validator after healthcheck failed

* fully reverting

* some improvements I found during testing

* Update cmd/validator/flags/flags.go

Co-authored-by: Preston Van Loon <pvanloon@offchainlabs.com>

* preston's feedback

* clarifications on changelog

* converted to using an event feed instead of my own channel publishing implementation, adding relevant logs

* preston log suggestion

---------

Co-authored-by: Radosław Kapka <rkapka@wp.pl>
Co-authored-by: Preston Van Loon <pvanloon@offchainlabs.com>
This commit is contained in:
james-prysm
2025-07-09 10:39:06 -05:00
committed by GitHub
parent 7025e50a6c
commit f2d57f0b5f
31 changed files with 1154 additions and 539 deletions

View File

@@ -19,6 +19,8 @@ const (
WalletDefaultDirName = "prysm-wallet-v2"
// DefaultHTTPServerHost for the validator client.
DefaultHTTPServerHost = "127.0.0.1"
DefaultMaxHealthChecks = 0
)
var (
@@ -394,6 +396,13 @@ var (
Usage: "Disables polling of duties on dependent root changes.",
Value: false,
}
// MaxHealthChecksFlag sets a maximum amount of times to check for beacon node health before validator client times out and shuts down
MaxHealthChecksFlag = &cli.IntFlag{
Name: "max-health-checks",
Usage: "Maximum number of health checks to perform before exiting if not healthy. Set to 0 or a negative number for indefinite checks.",
Value: DefaultMaxHealthChecks,
}
)
// DefaultValidatorDir returns OS-specific default validator directory.

View File

@@ -76,6 +76,7 @@ var appFlags = []cli.Flag{
flags.EnableDistributed,
flags.AuthTokenPathFlag,
flags.DisableDutiesPolling,
flags.MaxHealthChecksFlag,
// Consensys' Web3Signer flags
flags.Web3SignerURLFlag,
flags.Web3SignerPublicValidatorKeysFlag,

View File

@@ -141,6 +141,7 @@ var appHelpFlagGroups = []flagGroup{
flags.EnableDistributed,
flags.AuthTokenPathFlag,
flags.DisableDutiesPolling,
flags.MaxHealthChecksFlag,
},
},
{