To avoid data loss, this commit adds a grace period for lagging replicas to
catch up the replication offset.
Done:
* Wait for replicas when shutdown is triggered by SIGTERM and SIGINT.
* Wait for replicas when shutdown is triggered by the SHUTDOWN command. A new
blocked client type BLOCKED_SHUTDOWN is introduced, allowing multiple clients
to call SHUTDOWN in parallel.
Note that they don't expect a response unless an error happens and shutdown is aborted.
* Log warning for each replica lagging behind when finishing shutdown.
* CLIENT_PAUSE_WRITE while waiting for replicas.
* Configurable grace period 'shutdown-timeout' in seconds (default 10).
* New flags for the SHUTDOWN command:
- NOW disables the grace period for lagging replicas.
- FORCE ignores errors writing the RDB or AOF files which would normally
prevent a shutdown.
- ABORT cancels ongoing shutdown. Can't be combined with other flags.
* New field in the output of the INFO command: 'shutdown_in_milliseconds'. The
value is the remaining maximum time to wait for lagging replicas before
finishing the shutdown. This field is present in the Server section **only**
during shutdown.
Not directly related:
* When shutting down, if there is an AOF saving child, it is killed **even** if AOF
is disabled. This can happen if BGREWRITEAOF is used when AOF is off.
* Client pause now has end time and type (WRITE or ALL) per purpose. The
different pause purposes are *CLIENT PAUSE command*, *failover* and
*shutdown*. If clients are unpaused for one purpose, it doesn't affect client
pause for other purposes. For example, the CLIENT UNPAUSE command doesn't
affect client pause initiated by the failover or shutdown procedures. A completed
failover or a failed shutdown doesn't unpause clients paused by the CLIENT
PAUSE command.
Notes:
* DEBUG RESTART doesn't wait for replicas.
* We already have a warning logged when a replica disconnects. This means that
if any replica connection is lost during the shutdown, it is either logged as
disconnected or as lagging at the time of exit.
Co-authored-by: Oran Agra <oran@redislabs.com>
Redis Test Suite
The normal execution mode of the test suite involves starting and manipulating
local redis-server instances, inspecting process state, log files, etc.
The test suite also supports execution against an external server, which is
enabled using the --host and --port parameters. When executing against an
external server, tests tagged external:skip are skipped.
There are additional runtime options that can further adjust the test suite to match different external server configurations:
| Option | Impact |
|---|---|
--singledb |
Only use database 0, don't assume others are supported. |
--ignore-encoding |
Skip all checks for specific encoding. |
--ignore-digest |
Skip key value digest validations. |
--cluster-mode |
Run in strict Redis Cluster compatibility mode. |
--large-memory |
Enables tests that consume more than 100mb |
Tags
Tags are applied to tests to classify them according to the subsystem they test, but also to indicate compatibility with different run modes and required capabilities.
Tags can be applied in different context levels:
start_servercontexttagscontext that bundles several tests together- A single test context.
The following compatibility and capability tags are currently used:
| Tag | Indicates |
|---|---|
external:skip |
Not compatible with external servers. |
cluster:skip |
Not compatible with --cluster-mode. |
large-memory |
Test that requires more than 100mb |
tls:skip |
Not campatible with --tls. |
needs:repl |
Uses replication and needs to be able to SYNC from server. |
needs:debug |
Uses the DEBUG command or other debugging focused commands (like OBJECT). |
needs:pfdebug |
Uses the PFDEBUG command. |
needs:config-maxmemory |
Uses CONFIG SET to manipulate memory limit, eviction policies, etc. |
needs:config-resetstat |
Uses CONFIG RESETSTAT to reset statistics. |
needs:reset |
Uses RESET to reset client connections. |
needs:save |
Uses SAVE to create an RDB file. |
When using an external server (--host and --port), filtering using the
external:skip tags is done automatically.
When using --cluster-mode, filtering using the cluster:skip tag is done
automatically.
When not using --large-memory, filtering using the largemem:skip tag is done
automatically.
In addition, it is possible to specify additional configuration. For example, to
run tests on a server that does not permit SYNC use:
./runtest --host <host> --port <port> --tags -needs:repl