12255 Commits

Author SHA1 Message Date
Yuan Wang
1d96a95d75 Fix HINCRBYFLOAT removes field expiration on replica (#14227)
Fixes https://github.com/redis/redis/issues/14218

Before, we replicate HINCRBYFLOAT as an HSET command with the final
value in order to make sure that differences in float precision or
formatting will not create differences in replicas or after an AOF
restart.
However, on the replica side, if the field has an expiration time, HSET
will remove it, even though the master retains it. This leads to
inconsistencies between the master and the replica.

For Redis 8.0 and above, [PR
#14224](https://github.com/redis/redis/pull/14224) has fixed this issue.
However, Redis 7.4 does not support the HSETEX command. To address this,
HINCRBYFLOAT will be replicated as a simple HSET if no expiration is
set. If an expiration is specified, it will be replicated as a
MULTI/EXEC containing HSET and HPEXPIREAT commands.
2025-07-30 19:02:19 +08:00
YaacovHazan
7e0f533932 Redis 7.4.5 7.4.5 2025-07-06 14:59:57 +03:00
Mincho Paskalev
dfa5b12627 Remove string cat usage in tcl tests in order to support tcl8.5 2025-07-06 14:59:57 +03:00
Ozan Tezcan
39ea429d03 Retry accept() even if accepted connection reports an error (CVE-2025-48367)
In case of accept4() returns an error, we should check errno value and decide if we should retry accept4() without waiting next event loop iteration.
2025-07-06 14:59:57 +03:00
debing.sun
8658e2aeb9 Fix out of bounds write in hyperloglog commands (CVE-2025-32023)
Co-authored-by: oranagra <oran@redislabs.com>
2025-07-06 14:59:57 +03:00
yzc-yzc
4c6ffd5125 Fix negative offset issue for ZRANGEBY[SCORE|LEX] command (#14043)
Fix #13952

This PR ensures that ZRANGE_SCORE/LEX command with a negative offset
will return empty.
2025-07-06 14:59:57 +03:00
YaacovHazan
08961e16db Redis 7.4.4 7.4.4 2025-05-27 15:38:26 +03:00
YaacovHazan
e93df78b8b Check length of AOF file name in redis-check-aof (CVE-2025-27151)
Ensure that the length of the input file name does not exceed PATH_MAX
2025-05-27 15:38:26 +03:00
debing.sun
f9530e77d1 Correctly update kvstore overhead after emptying or releasing dict (#13984)
Close #13973

This PR fixed two bugs.
1)  `overhead_hashtable_lut` isn't updated correctly
    This bug was introduced by https://github.com/redis/redis/pull/12913
We only update `overhead_hashtable_lut` at the beginning and end of
rehashing, but we forgot to update it when a dict is emptied or
released.

This PR introduces a new `bucketChanged` callback to track the change
changes in the bucket size.
Now, `rehashingStarted` and `rehashingCompleted` callbacks are no longer
responsible for bucket changes, but are entirely handled by
`bucketChanged`, this can also avoid that we need to register three
callbacks to track the change of bucket size, now only one is needed.

In most cases, it will be triggered together with `rehashingStarted` or
`rehashingCompleted`,
except when a dict is being emptied or released, in these cases, even if
the dict is not rehashing, we still need to subtract its current size.

On the other hand, `overhead_hashtable_lut` was duplicated with
`bucket_count`, so we remove `overhead_hashtable_lut` and use
`bucket_count` instead.

Note that this bug only happens with cluster mode, because we don't use
KVSTORE_FREE_EMPTY_DICTS without cluster.

2) The size of `dict_size_index` repeatedly counted in terms of memory
usage.
`dict_size_index` is created at startup, so its memory usage has been
counted into `used_memory_startup`.
However, when we want to count the overhead, we repeat the calculation,
which may cause the overhead to exceed the total memory usage.

---------

Co-authored-by: Yuan Wang <yuan.wang@redis.com>
2025-05-27 15:38:26 +03:00
Vitah Lin
9d11a4f862 Fix tls port update not reflected in CLUSTER SLOTS (#13966)
### Problem 

A previous PR (https://github.com/redis/redis/pull/13932) fixed the TCP
port issue in CLUSTER SLOTS, but it seems the handling of the TLS port
was overlooked.

There is this comment in the `addNodeToNodeReply` function in the
`cluster.c` file:
```c
    /* Report TLS ports to TLS client, and report non-TLS port to non-TLS client. */
    addReplyLongLong(c, clusterNodeClientPort(node, shouldReturnTlsInfo()));
    addReplyBulkCBuffer(c, clusterNodeGetName(node), CLUSTER_NAMELEN);
```

### Fixed 

This PR fixes the TLS port issue and adds relevant tests.
2025-05-27 15:38:26 +03:00
nesty92
5f317a632c Fix incorrect lag due to trimming stream via XTRIM or XADD command (#13958)
This PR fix the lag calculation by ensuring that when consumer group's last_id
is behind the first entry, the consumer group's entries read is considered
invalid and recalculated from the start of the stream

Supplement to PR #13473 

Close #13957

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
2025-05-27 15:38:26 +03:00
Stav-Levi
38e9e8ed38 Fix port update not reflected in CLUSTER SLOTS (#13932)
Close https://github.com/redis/redis/issues/13892 
config set port cmd updates server.port. cluster slot retrieves
information about cluster slots and their associated nodes. the fix
updates this info when config set port cmd is done, so cluster slots cmd
returns the right value.
2025-05-27 15:38:26 +03:00
YaacovHazan
24080114cc Redis 7.4.3 (#13963) 7.4.3 2025-04-23 14:54:51 +03:00
YaacovHazan
b9405a7d9d Redis 7.4.3 2025-04-23 10:26:24 +03:00
YaacovHazan
a93a319ba2 Limiting output buffer for unauthenticated client (CVE-2025-21605)
For unauthenticated clients the output buffer is limited to prevent
them from abusing it by not reading the replies
2025-04-23 10:18:06 +03:00
YaacovHazan
c214e0f1eb Avoid sanitizer warning for stable CI 2025-04-22 14:53:32 +03:00
Vitah Lin
c2289ea402 Fix oldTC CI dk.archive.ubuntu.com could not connect (#13961) 2025-04-22 12:16:02 +03:00
Jason
3138dbac1b Ignore shardId updates from replica nodes (#13877)
Close https://github.com/redis/redis/issues/13868

This bug was introduced by https://github.com/redis/redis/pull/13468

## Issue
To maintain compatibility with older versions that do not support
shardid, when a replica passes a shardid, we also update the master’s
shardid accordingly.

However, when both the master and replica support shardid, an issue
arises: in one moment, the master may pass a shardid, causing us to
update both the master and all its replicas to match the master’s
shardid. But if the replica later passes a different shardid, we would
then update the master’s shardid again, leading to continuous changes in
shardid.

## Solution
Regardless of the situation, we always ensure that the replica’s shardid
remains consistent with the master’s shardid.
2025-04-22 10:50:48 +03:00
debing.sun
3415d7ae63 Fix defrag when type/encoding changes during scan (#13883)
This PR is based on: https://github.com/valkey-io/valkey/pull/1801

[SoftlyRaining](https://github.com/SoftlyRaining) was hunting for defrag
bugs with Jim and found a couple of improvements to make. Jim pointed
out that in several of the callbacks, if the encoding were to change it
simply returns without doing anything to `cursor` to make it reach 0,
meaning that it would continue no-op working on that item without making
any progress. Type and encoding can change while the defrag scan is in
progress if the value is mutated or replaced by something else with the
same key.

---------

Signed-off-by: Rain Valentine <rsg000@gmail.com>
Co-authored-by: Rain Valentine <rsg000@gmail.com>
2025-04-22 10:50:37 +03:00
Benson-li
6717df1388 Fix potential infinite loop of RANDOMKEY during client pause (#13863)
The bug mentioned in this
[#13862](https://github.com/redis/redis/issues/13862) has been fixed.

---------

Signed-off-by: li-benson <1260437731@qq.com>
Signed-off-by: youngmore1024 <youngmore1024@outlook.com>
Co-authored-by: youngmore1024 <youngmore1024@outlook.com>
2025-04-22 10:50:21 +03:00
debing.sun
949d4efa36 Fix crash during SLAVEOF when clients are blocked on lazyfree (#13853)
After https://github.com/redis/redis/pull/13167, when a client calls
`FLUSHDB` command, we still async empty database, and the client was
blocked until the lazyfree completes.

1) If another client calls `SLAVEOF` command during this time, the
server will unblock all blocked clients, including those blocked by the
lazyfree. However, when unblocking a lazyfree blocked client, we forgot
to call `updateStatsOnUnblock()`, which ultimately triggered the
following assertion.

2) If a client blocked by Lazyfree is unblocked midway, and at this
point the `bio_comp_list` has already received the completion
notification for the bio, we might end up processing a client that has
already been unblocked in `flushallSyncBgDone()`. Therefore, we need to
filter it out.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2025-04-22 10:29:35 +03:00
Denis Nevmerzhitskii
9b45a148f2 Fix wrong behavior of XREAD + after last entry of stream have been removed (#13632)
Close #13628

This PR changes behavior of special `+` id of XREAD command. Now it uses
`streamLastValidID` to find last entry instead of `last_id` field of
stream object.
This PR adds test for the issue.

**Notes**

Initial idea to update `last_id` while executing XDEL seems to be wrong.
`last_id` is used to strore last generated id and not id of last entry.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: guybe7 <guy.benoish@redislabs.com>
2025-04-22 10:25:22 +03:00
Yuan Wang
5d7b60e117 Fix wrongly updating fsynced_reploff_pending when appendfsync=everysecond (#13793)
```
if (server.aof_fsync == AOF_FSYNC_EVERYSEC &&
    server.aof_last_incr_fsync_offset != server.aof_last_incr_size &&
    server.mstime - server.aof_last_fsync >= 1000 &&
    !(sync_in_progress = aofFsyncInProgress())) {
    goto try_fsync;
```
In https://github.com/redis/redis/pull/12622, when when
appendfsync=everysecond, if redis has written some data to AOF but not
`fsync`, and less than 1 second has passed since the last `fsync `,
redis will won't fsync AOF, but we will update `
fsynced_reploff_pending`, so it cause the `WAITAOF` to return
prematurely.

this bug is introduced in https://github.com/redis/redis/pull/12622,
from 7.4

The bug fix
1bd6688bca
is just as follows:
```diff
diff --git a/src/aof.c b/src/aof.c
index 8ccd8d8f8..521b30449 100644
--- a/src/aof.c
+++ b/src/aof.c
@@ -1096,8 +1096,11 @@ void flushAppendOnlyFile(int force) {
              * in which case master_repl_offset will increase but fsynced_reploff_pending won't be updated
              * (because there's no reason, from the AOF POV, to call fsync) and then WAITAOF may wait on
              * the higher offset (which contains data that was only propagated to replicas, and not to AOF) */
-            if (!sync_in_progress && server.aof_fsync != AOF_FSYNC_NO)
+            if (server.aof_last_incr_fsync_offset == server.aof_last_incr_size &&
+                !(sync_in_progress = aofFsyncInProgress()))
+            {
                 atomicSet(server.fsynced_reploff_pending, server.master_repl_offset);
+            }
             return;
```
Additionally, we slightly refactored fsync AOF to make it simpler, as
584f008d1c
2025-04-22 10:25:15 +03:00
Mingyi Kang
7a9584b15d Bump actions/upload-artifact from 3 to 4 (#13780)
Update `upload-artifact` from v3 to v4 to avoid the failure of `External
Server Tests` (I encountered this error when opening
[#13779](https://github.com/redis/redis/pull/13779)):

> Error: This request has been automatically failed because it uses a
deprecated version of `actions/upload-artifact: v3`. Learn more:
https://github.blog/changelog/2024-04-16-deprecation-notice-v3-of-the-artifact-actions/
2025-04-22 10:25:07 +03:00
debing.sun
98077651bf Update info.tcl test to revert client output limits sooner (#13738)
This PR is based on: https://github.com/valkey-io/valkey/pull/1462

We set the client output buffer limits to 10 bytes, and then execute
info stats which produces more than 10 bytes of output, which can cause
that command to throw an error.

I'm not sure why it wasn't consistently erroring before, might have been
some change related to the ubuntu upgrade though.

failed CI:
https://github.com/redis/redis/actions/runs/12738281720/job/35500381299

------
Co-authored-by: Madelyn Olson
[madelyneolson@gmail.com](mailto:madelyneolson@gmail.com)
2025-04-22 10:25:00 +03:00
Ozan Tezcan
b2a8c675a7 Fix memory leak of jemalloc tcache on function flush command (#13661)
Starting from https://github.com/redis/redis/pull/13133, we allocate a
jemalloc thread cache and use it for lua vm.
On certain cases, like `script flush` or `function flush` command, we
free the existing thread cache and create a new one.

Though, for `function flush`, we were not actually destroying the
existing thread cache itself. Each call creates a new thread cache on
jemalloc and we leak the previous thread cache instances. Jemalloc
allows maximum 4096 thread cache instances. If we reach this limit,
Redis prints "Failed creating the lua jemalloc tcache" log and abort.

There are other cases that can cause this memory leak, including
replication scenarios when emptyData() is called.

The implication is that it looks like redis `used_memory` is low, but
`allocator_allocated` and RSS remain high.

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-04-22 10:24:30 +03:00
nafraf
a96216bb9b Fix module loadex command crash due to invalid config (#13653)
Fix to https://github.com/redis/redis/issues/13650

providing an invalid config to a module with datatype crashes when redis
tries to unload the module due to the invalid config

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-04-22 10:16:24 +03:00
Steve
2619411a4c Avoid cluster.nodes load corruption due to shard-id generation (#13468)
PR #13428 doesn't fully resolve an issue where corruption errors can
still occur on loading of cluster.nodes file - seen on upgrade where
there were no shard_ids (from old Redis), 7.2.5 loading generated new
random ones, and persisted them to the file before gossip/handshake
could propagate the correct ones (or some other nodes unreachable).
This results in a primary/replica having differing shard_id in the
cluster.nodes and then the server cannot startup - reports corruption.

This PR builds on #13428 by simply ignoring the replica's shard_id in
cluster.nodes (if it exists), and uses the replica's primary's shard_id.
Additional handling was necessary to cover the case where the replica
appears before the primary in cluster.nodes, where it will first use a
generated shard_id for the primary, and then correct after it loads the
primary cluster.nodes entry.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-04-22 10:16:14 +03:00
YaacovHazan
a0a6f23d99 Redis 7.4.2 7.4.2 2025-01-06 16:00:10 +02:00
Vitah Lin
8b3cc8d6c6 Deprecate ubuntu lunar and macos-12 in workflows (#13669)
1. Ubuntu Lunar reached End of Life on January 25, 2024, so upgrade the
ubuntu version to plucky in action `test-ubuntu-jemalloc-fortify` to
pass the daily CI
2. The macOS-12 environment is deprecated so upgrade macos-12 to
macos-13 in daily CI

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-01-06 16:00:10 +02:00
YaacovHazan
781658e44e Fix LUA garbage collector (CVE-2024-46981)
Reset GC state before closing the lua VM to prevent user data
to be wrongly freed while still might be used on destructor callbacks.
2025-01-06 16:00:10 +02:00
YaacovHazan
67a442258c Fix Read/Write key pattern selector (CVE-2024-51741)
The '%' rule must contain one or both of R/W
2025-01-06 16:00:10 +02:00
guybe7
f097bef7c6 Modules: defrag CB should take robj, not sds (#13627)
Added a log of the keyname in the test modules to reproduce the problem
(tests crash without the fix)
2025-01-06 16:00:10 +02:00
Moti Cohen
2e473e9067 Fix memory leak on rdbload error (#13626)
On RDB load error, if an invalid `expireAt` value is read,
`dupSearchDict` is not released.
2025-01-06 16:00:10 +02:00
opt-m
cae8201754 Fix get # option in sort command (#13608)
From 7.4, Redis allows `GET` options in cluster mode when the pattern maps to
the same slot as the key, but GET # pattern that represents key itself is missed.
This commit resolves it, bug report #13607.

---------

Co-authored-by: Yuan Wang <yuan.wang@redis.com>
2025-01-06 16:00:10 +02:00
Moti Cohen
d427a2618e Fix race in HFE tests (#13563)
Test 1 - give more time for expiration
Test 2 - Evaluate expiration time boundaries [+1,+2] before setting expiration [+1]
Test 3 - Avoid race on test HFEs propagated to replica
2025-01-06 16:00:10 +02:00
Moti Cohen
232f15f6aa Correct spelling error at t_hash.c comment (#13540)
spell check error : ./src/t_hash.c:1141: RESOTRE ==> RESTORE
2025-01-06 16:00:10 +02:00
Moti Cohen
0daa3dde60 HFE - Fix key ref by the hash on RENAME/MOVE/SWAPDB/RESTORE (#13539)
If the hash previously had HFEs (hash-fields with expiration) but later no longer
does, the key ref in the hash might become outdated after a MOVE, COPY,
RENAME or RESTORE operation. These commands maintain the key ref only
if HFEs are present. That is, we can only be sure that key ref is valid as long as the
hash has HFEs.
2025-01-06 16:00:10 +02:00
debing.sun
dd1bbaad26 Increment kvstore's non_empty_dicts only on first insert (#13528)
Found by @oranagra 

Currently, when the size of dict becomes 1, we do not check whether
`delta` is positive or negative.
As a result, `non_empty_dicts` is still incremented when the size of
dict changes from 2 to 1.
We should only increment `non_empty_dicts` when `delta` is positive, as
this indicates the first time an element is inserted into the dict.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2025-01-06 16:00:10 +02:00
debing.sun
7145dc09b2 Fix a race condition issue in the cache_memory of functionsLibCtx (#13476)
This is a missing of the PR https://github.com/redis/redis/pull/13383.
We will call `functionsLibCtxClear()` in bio, so we shouldn't touch
`curr_functions_lib_ctx` in it.
2025-01-06 16:00:10 +02:00
debing.sun
9de665c613 Fix incorrect lag due to trimming stream via XTRIM command (#13473)
## Describe
When using the `XTRIM` command to trim a stream, it does not update the
maximal tombstone (`max_deleted_entry_id`). This leads to an issue where
the lag calculation incorrectly assumes that there are no tombstones
after the consumer group's last_id, resulting in an inaccurate lag.

The reason XTRIM doesn't need to update the maximal tombstone is that it
always trims from the beginning of the stream. This means that it
consistently changes the position of the first entry, leading to the
following scenarios:

1) First entry trimmed after maximal tombstone:
If the first entry is trimmed to a position after the maximal tombstone,
all tombstones will be before the first entry, so they won't affect the
consumer group's lag.

2) First entry trimmed before maximal tombstone:
If the first entry is trimmed to a position before the maximal
tombstone, the maximal tombstone will not be updated.

## Solution
Therefore, this PR optimizes the lag calculation by ensuring that when
both the consumer group's last_id and the maximal tombstone are behind
the first entry, the consumer group's lag is always equal to the number
of remaining elements in the stream.

Supplement to PR https://github.com/redis/redis/pull/13338
2025-01-06 16:00:10 +02:00
Moti Cohen
344bac9b44 On HDEL last field with expiry, update global HFE DS (#13470)
Hash field expiration is optimized to avoid frequent update global HFE DS for
each field deletion. Eventually active-expiration will run and update or remove
the hash from global HFE DS gracefully. Nevertheless, statistic "subexpiry"
might reflect wrong number of hashes with HFE to the user if HDEL deletes
the last field with expiration in hash (yet there are more fields without expiration).

Following this change, if HDEL the last field with expiration in the hash then
take care to remove the hash from global HFE DS as well.
2025-01-06 16:00:10 +02:00
debing.sun
1bce7ef755 Pass extensions to node if extension processing is handled by it (#13465)
This PR is based on the commits from PR
https://github.com/valkey-io/valkey/pull/52.
Ref: https://github.com/redis/redis/pull/12760
Close https://github.com/redis/redis/issues/13401
This PR will replace https://github.com/redis/redis/pull/13449

Fixes compatibilty of Redis cluster (7.2 - extensions enabled by
default) with older Redis cluster (< 7.0 - extensions not handled) .

With some of the extensions enabled by default in 7.2 version, new nodes
running 7.2 and above start sending out larger clusterbus message
payload including the ping extensions. This caused an incompatibility
with node running engine versions < 7.0. Old nodes (< 7.0) would receive
the payload from new nodes (> 7.2) would observe a payload length
(totlen) > (estlen) and would perform an early exit and won't process
the message.

This fix does the following things:
1. Always set `CLUSTERMSG_FLAG0_EXT_DATA`, because during the meet
phase, we do not know whether the connected node supports ext data, we
need to make sure that it knows and send back its ext data if it has.
2. If another node does not support ext data, we will not send it ext
data to avoid the handshake failure due to the incorrect payload length.

Note: A successful `PING`/`PONG` is required as a sender for a given
node to be marked as `CLUSTERMSG_FLAG0_EXT_DATA` and then extensions
message
will be sent to it. This could cause a slight delay in receiving the
extensions message(s).

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
2025-01-06 16:00:10 +02:00
debing.sun
b0d5f813f9 Ensure validity of myself as master or replica when loading cluster config (#13443)
First, we need to ensure that `curmaster` in
`clusterUpdateSlotsConfigWith()` is not NULL in the line
82f00f5179/src/cluster_legacy.c (L2320)
otherwise, it will crash in the
82f00f5179/src/cluster_legacy.c (L2395)

So when loading cluster node config, we need to ensure that the
following conditions are met:
1. A node must be at least one of the master or replica.
2. If a node is a replica, its master can't be NULL.
2025-01-06 16:00:10 +02:00
debing.sun
fe7f73f1d6 Fix CLUSTER SHARDS command returns empty array (#13422)
Close https://github.com/redis/redis/issues/13414

When the cluster's master node fails and is switched to another node,
the first node in the shard node list (the old master) is no longer
valid.
Add a new method clusterGetMasterFromShard() to obtain the current
master.
2025-01-06 16:00:10 +02:00
debing.sun
9768e45413 Fix incorrect lag field in XINFO when tombstone is after the last_id of consume group (#13338)
Fix #13337

Ths PR fixes fixed two bugs that caused lag calculation errors.
1. When the latest tombstone is before the first entry, the tombstone
may stil be after the last id of consume group.
2. When a tombstone is after the last id of consume group, the group's
counter will be invalid, we should caculate the entries_read by using
estimates.
2025-01-06 16:00:10 +02:00
Oran Agra
74b289a0e1 Release Redis 7.4.1 7.4.1 2024-10-02 22:04:05 +03:00
Oran Agra
9a931df94f Prevent pattern matching abuse (CVE-2024-31228) 2024-10-02 22:04:05 +03:00
Oran Agra
c5d0936b64 Fix ACL SETUSER Read/Write key pattern selector (CVE-2024-31227)
The '%' rule must contain one or both of R/W
2024-10-02 22:04:05 +03:00
Oran Agra
b483cc5071 Fix lua bit.tohex (CVE-2024-31449)
INT_MIN value must be explicitly checked, and cannot be negated.
2024-10-02 22:04:05 +03:00