Fixes https://github.com/redis/redis/issues/14218
Before, we replicate HINCRBYFLOAT as an HSET command with the final
value in order to make sure that differences in float precision or
formatting will not create differences in replicas or after an AOF
restart.
However, on the replica side, if the field has an expiration time, HSET
will remove it, even though the master retains it. This leads to
inconsistencies between the master and the replica.
For Redis 8.0 and above, [PR
#14224](https://github.com/redis/redis/pull/14224) has fixed this issue.
However, Redis 7.4 does not support the HSETEX command. To address this,
HINCRBYFLOAT will be replicated as a simple HSET if no expiration is
set. If an expiration is specified, it will be replicated as a
MULTI/EXEC containing HSET and HPEXPIREAT commands.
Close#13973
This PR fixed two bugs.
1) `overhead_hashtable_lut` isn't updated correctly
This bug was introduced by https://github.com/redis/redis/pull/12913
We only update `overhead_hashtable_lut` at the beginning and end of
rehashing, but we forgot to update it when a dict is emptied or
released.
This PR introduces a new `bucketChanged` callback to track the change
changes in the bucket size.
Now, `rehashingStarted` and `rehashingCompleted` callbacks are no longer
responsible for bucket changes, but are entirely handled by
`bucketChanged`, this can also avoid that we need to register three
callbacks to track the change of bucket size, now only one is needed.
In most cases, it will be triggered together with `rehashingStarted` or
`rehashingCompleted`,
except when a dict is being emptied or released, in these cases, even if
the dict is not rehashing, we still need to subtract its current size.
On the other hand, `overhead_hashtable_lut` was duplicated with
`bucket_count`, so we remove `overhead_hashtable_lut` and use
`bucket_count` instead.
Note that this bug only happens with cluster mode, because we don't use
KVSTORE_FREE_EMPTY_DICTS without cluster.
2) The size of `dict_size_index` repeatedly counted in terms of memory
usage.
`dict_size_index` is created at startup, so its memory usage has been
counted into `used_memory_startup`.
However, when we want to count the overhead, we repeat the calculation,
which may cause the overhead to exceed the total memory usage.
---------
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
### Problem
A previous PR (https://github.com/redis/redis/pull/13932) fixed the TCP
port issue in CLUSTER SLOTS, but it seems the handling of the TLS port
was overlooked.
There is this comment in the `addNodeToNodeReply` function in the
`cluster.c` file:
```c
/* Report TLS ports to TLS client, and report non-TLS port to non-TLS client. */
addReplyLongLong(c, clusterNodeClientPort(node, shouldReturnTlsInfo()));
addReplyBulkCBuffer(c, clusterNodeGetName(node), CLUSTER_NAMELEN);
```
### Fixed
This PR fixes the TLS port issue and adds relevant tests.
This PR fix the lag calculation by ensuring that when consumer group's last_id
is behind the first entry, the consumer group's entries read is considered
invalid and recalculated from the start of the stream
Supplement to PR #13473Close#13957
Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Close https://github.com/redis/redis/issues/13892
config set port cmd updates server.port. cluster slot retrieves
information about cluster slots and their associated nodes. the fix
updates this info when config set port cmd is done, so cluster slots cmd
returns the right value.
Close https://github.com/redis/redis/issues/13868
This bug was introduced by https://github.com/redis/redis/pull/13468
## Issue
To maintain compatibility with older versions that do not support
shardid, when a replica passes a shardid, we also update the master’s
shardid accordingly.
However, when both the master and replica support shardid, an issue
arises: in one moment, the master may pass a shardid, causing us to
update both the master and all its replicas to match the master’s
shardid. But if the replica later passes a different shardid, we would
then update the master’s shardid again, leading to continuous changes in
shardid.
## Solution
Regardless of the situation, we always ensure that the replica’s shardid
remains consistent with the master’s shardid.
This PR is based on: https://github.com/valkey-io/valkey/pull/1801
[SoftlyRaining](https://github.com/SoftlyRaining) was hunting for defrag
bugs with Jim and found a couple of improvements to make. Jim pointed
out that in several of the callbacks, if the encoding were to change it
simply returns without doing anything to `cursor` to make it reach 0,
meaning that it would continue no-op working on that item without making
any progress. Type and encoding can change while the defrag scan is in
progress if the value is mutated or replaced by something else with the
same key.
---------
Signed-off-by: Rain Valentine <rsg000@gmail.com>
Co-authored-by: Rain Valentine <rsg000@gmail.com>
After https://github.com/redis/redis/pull/13167, when a client calls
`FLUSHDB` command, we still async empty database, and the client was
blocked until the lazyfree completes.
1) If another client calls `SLAVEOF` command during this time, the
server will unblock all blocked clients, including those blocked by the
lazyfree. However, when unblocking a lazyfree blocked client, we forgot
to call `updateStatsOnUnblock()`, which ultimately triggered the
following assertion.
2) If a client blocked by Lazyfree is unblocked midway, and at this
point the `bio_comp_list` has already received the completion
notification for the bio, we might end up processing a client that has
already been unblocked in `flushallSyncBgDone()`. Therefore, we need to
filter it out.
---------
Co-authored-by: oranagra <oran@redislabs.com>
Close#13628
This PR changes behavior of special `+` id of XREAD command. Now it uses
`streamLastValidID` to find last entry instead of `last_id` field of
stream object.
This PR adds test for the issue.
**Notes**
Initial idea to update `last_id` while executing XDEL seems to be wrong.
`last_id` is used to strore last generated id and not id of last entry.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: guybe7 <guy.benoish@redislabs.com>
```
if (server.aof_fsync == AOF_FSYNC_EVERYSEC &&
server.aof_last_incr_fsync_offset != server.aof_last_incr_size &&
server.mstime - server.aof_last_fsync >= 1000 &&
!(sync_in_progress = aofFsyncInProgress())) {
goto try_fsync;
```
In https://github.com/redis/redis/pull/12622, when when
appendfsync=everysecond, if redis has written some data to AOF but not
`fsync`, and less than 1 second has passed since the last `fsync `,
redis will won't fsync AOF, but we will update `
fsynced_reploff_pending`, so it cause the `WAITAOF` to return
prematurely.
this bug is introduced in https://github.com/redis/redis/pull/12622,
from 7.4
The bug fix
1bd6688bca
is just as follows:
```diff
diff --git a/src/aof.c b/src/aof.c
index 8ccd8d8f8..521b30449 100644
--- a/src/aof.c
+++ b/src/aof.c
@@ -1096,8 +1096,11 @@ void flushAppendOnlyFile(int force) {
* in which case master_repl_offset will increase but fsynced_reploff_pending won't be updated
* (because there's no reason, from the AOF POV, to call fsync) and then WAITAOF may wait on
* the higher offset (which contains data that was only propagated to replicas, and not to AOF) */
- if (!sync_in_progress && server.aof_fsync != AOF_FSYNC_NO)
+ if (server.aof_last_incr_fsync_offset == server.aof_last_incr_size &&
+ !(sync_in_progress = aofFsyncInProgress()))
+ {
atomicSet(server.fsynced_reploff_pending, server.master_repl_offset);
+ }
return;
```
Additionally, we slightly refactored fsync AOF to make it simpler, as
584f008d1c
Starting from https://github.com/redis/redis/pull/13133, we allocate a
jemalloc thread cache and use it for lua vm.
On certain cases, like `script flush` or `function flush` command, we
free the existing thread cache and create a new one.
Though, for `function flush`, we were not actually destroying the
existing thread cache itself. Each call creates a new thread cache on
jemalloc and we leak the previous thread cache instances. Jemalloc
allows maximum 4096 thread cache instances. If we reach this limit,
Redis prints "Failed creating the lua jemalloc tcache" log and abort.
There are other cases that can cause this memory leak, including
replication scenarios when emptyData() is called.
The implication is that it looks like redis `used_memory` is low, but
`allocator_allocated` and RSS remain high.
Co-authored-by: debing.sun <debing.sun@redis.com>
PR #13428 doesn't fully resolve an issue where corruption errors can
still occur on loading of cluster.nodes file - seen on upgrade where
there were no shard_ids (from old Redis), 7.2.5 loading generated new
random ones, and persisted them to the file before gossip/handshake
could propagate the correct ones (or some other nodes unreachable).
This results in a primary/replica having differing shard_id in the
cluster.nodes and then the server cannot startup - reports corruption.
This PR builds on #13428 by simply ignoring the replica's shard_id in
cluster.nodes (if it exists), and uses the replica's primary's shard_id.
Additional handling was necessary to cover the case where the replica
appears before the primary in cluster.nodes, where it will first use a
generated shard_id for the primary, and then correct after it loads the
primary cluster.nodes entry.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
1. Ubuntu Lunar reached End of Life on January 25, 2024, so upgrade the
ubuntu version to plucky in action `test-ubuntu-jemalloc-fortify` to
pass the daily CI
2. The macOS-12 environment is deprecated so upgrade macos-12 to
macos-13 in daily CI
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
From 7.4, Redis allows `GET` options in cluster mode when the pattern maps to
the same slot as the key, but GET # pattern that represents key itself is missed.
This commit resolves it, bug report #13607.
---------
Co-authored-by: Yuan Wang <yuan.wang@redis.com>
Test 1 - give more time for expiration
Test 2 - Evaluate expiration time boundaries [+1,+2] before setting expiration [+1]
Test 3 - Avoid race on test HFEs propagated to replica
If the hash previously had HFEs (hash-fields with expiration) but later no longer
does, the key ref in the hash might become outdated after a MOVE, COPY,
RENAME or RESTORE operation. These commands maintain the key ref only
if HFEs are present. That is, we can only be sure that key ref is valid as long as the
hash has HFEs.
Found by @oranagra
Currently, when the size of dict becomes 1, we do not check whether
`delta` is positive or negative.
As a result, `non_empty_dicts` is still incremented when the size of
dict changes from 2 to 1.
We should only increment `non_empty_dicts` when `delta` is positive, as
this indicates the first time an element is inserted into the dict.
---------
Co-authored-by: oranagra <oran@redislabs.com>
This is a missing of the PR https://github.com/redis/redis/pull/13383.
We will call `functionsLibCtxClear()` in bio, so we shouldn't touch
`curr_functions_lib_ctx` in it.
## Describe
When using the `XTRIM` command to trim a stream, it does not update the
maximal tombstone (`max_deleted_entry_id`). This leads to an issue where
the lag calculation incorrectly assumes that there are no tombstones
after the consumer group's last_id, resulting in an inaccurate lag.
The reason XTRIM doesn't need to update the maximal tombstone is that it
always trims from the beginning of the stream. This means that it
consistently changes the position of the first entry, leading to the
following scenarios:
1) First entry trimmed after maximal tombstone:
If the first entry is trimmed to a position after the maximal tombstone,
all tombstones will be before the first entry, so they won't affect the
consumer group's lag.
2) First entry trimmed before maximal tombstone:
If the first entry is trimmed to a position before the maximal
tombstone, the maximal tombstone will not be updated.
## Solution
Therefore, this PR optimizes the lag calculation by ensuring that when
both the consumer group's last_id and the maximal tombstone are behind
the first entry, the consumer group's lag is always equal to the number
of remaining elements in the stream.
Supplement to PR https://github.com/redis/redis/pull/13338
Hash field expiration is optimized to avoid frequent update global HFE DS for
each field deletion. Eventually active-expiration will run and update or remove
the hash from global HFE DS gracefully. Nevertheless, statistic "subexpiry"
might reflect wrong number of hashes with HFE to the user if HDEL deletes
the last field with expiration in hash (yet there are more fields without expiration).
Following this change, if HDEL the last field with expiration in the hash then
take care to remove the hash from global HFE DS as well.
This PR is based on the commits from PR
https://github.com/valkey-io/valkey/pull/52.
Ref: https://github.com/redis/redis/pull/12760
Close https://github.com/redis/redis/issues/13401
This PR will replace https://github.com/redis/redis/pull/13449
Fixes compatibilty of Redis cluster (7.2 - extensions enabled by
default) with older Redis cluster (< 7.0 - extensions not handled) .
With some of the extensions enabled by default in 7.2 version, new nodes
running 7.2 and above start sending out larger clusterbus message
payload including the ping extensions. This caused an incompatibility
with node running engine versions < 7.0. Old nodes (< 7.0) would receive
the payload from new nodes (> 7.2) would observe a payload length
(totlen) > (estlen) and would perform an early exit and won't process
the message.
This fix does the following things:
1. Always set `CLUSTERMSG_FLAG0_EXT_DATA`, because during the meet
phase, we do not know whether the connected node supports ext data, we
need to make sure that it knows and send back its ext data if it has.
2. If another node does not support ext data, we will not send it ext
data to avoid the handshake failure due to the incorrect payload length.
Note: A successful `PING`/`PONG` is required as a sender for a given
node to be marked as `CLUSTERMSG_FLAG0_EXT_DATA` and then extensions
message
will be sent to it. This could cause a slight delay in receiving the
extensions message(s).
---------
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
---------
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
First, we need to ensure that `curmaster` in
`clusterUpdateSlotsConfigWith()` is not NULL in the line
82f00f5179/src/cluster_legacy.c (L2320)
otherwise, it will crash in the
82f00f5179/src/cluster_legacy.c (L2395)
So when loading cluster node config, we need to ensure that the
following conditions are met:
1. A node must be at least one of the master or replica.
2. If a node is a replica, its master can't be NULL.
Close https://github.com/redis/redis/issues/13414
When the cluster's master node fails and is switched to another node,
the first node in the shard node list (the old master) is no longer
valid.
Add a new method clusterGetMasterFromShard() to obtain the current
master.
Fix#13337
Ths PR fixes fixed two bugs that caused lag calculation errors.
1. When the latest tombstone is before the first entry, the tombstone
may stil be after the last id of consume group.
2. When a tombstone is after the last id of consume group, the group's
counter will be invalid, we should caculate the entries_read by using
estimates.