8690 Commits

Author SHA1 Message Date
YaacovHazan
2ed2a678a0 Redis 7.2.13 2026-02-22 21:12:17 +02:00
debing.sun
857e9ac483 Strip CRLF from error and simple string replies (#826)
Because in some cases, the client put \r\n in the command parameters.
When Redis returns these parameters to the client via an error reply, the presence of \r\n in the middle can cause the client to only parse the portion before the \r\n when handling the error reply. This disrupts the protocol parsing and ultimately causes the connection to become stuck.
2026-02-22 20:35:23 +02:00
YaacovHazan
e4dcf76670 Redis 7.2.12 2025-11-01 22:19:46 +02:00
debing.sun
681246c248 Fix MurmurHash64A overflow in HyperLogLog with 2GB+ entries
The MurmurHash64A function in hyperloglog.c used an int parameter for length,
causing integer overflow when processing PFADD entries larger than 2GB.
This could lead to server crashes.

Changed the len parameter from int to size_t to properly handle
large inputs up to SIZE_MAX in HyperLogLog operations.
Refer to the implementation in facebook/mcrouter@2dbee3d/mcrouter/lib/fbi/hash.c#L54
2025-11-01 21:36:37 +02:00
YaacovHazan
d4c381df7a Redis 7.2.11 2025-10-03 09:03:04 +03:00
Ozan Tezcan
dccb672d83 Lua script can be executed in the context of another user (CVE-2025-46818) 2025-10-02 22:25:24 +03:00
debing.sun
0c5a8cca96 Fix missing expires_cursor check when existing defrag cycle (#14270)
Fix https://github.com/redis/redis/issues/13612
This bug was introduced by https://github.com/redis/redis/issues/11465

This PR added the incremental defragmentation for db->expires.
If the time for the defragmentation cycle has not arrived, it will exit
unless both cursor and expires_cursor are 0, meaning both keys and
expires have been fragmented.
However, the expires_cursor check has now been overlooked, which leads
to we will exit immediately even if the defragmentation doesn't reach
the end time, which will cause the efficiency of defragmentation to
become very low.

Note that this bug was already fixed in version
7.4(https://github.com/redis/redis/pull/13058)
2025-08-14 10:37:23 +08:00
YaacovHazan
5a752e1978 Redis 7.2.10 2025-07-06 15:12:33 +03:00
Ozan Tezcan
c76d618209 Retry accept() even if accepted connection reports an error (CVE-2025-48367)
In case of accept4() returns an error, we should check errno value and
decide if we should retry accept4() without waiting next event loop iteration.
2025-07-06 15:12:33 +03:00
debing.sun
f35b72dd17 Fix out of bounds write in hyperloglog commands (CVE-2025-32023)
Co-authored-by: oranagra <oran@redislabs.com>
2025-07-06 15:12:33 +03:00
YaacovHazan
080b99d982 Redis 7.2.9 2025-05-27 15:39:09 +03:00
YaacovHazan
d0eeee6e31 Check length of AOF file name in redis-check-aof (CVE-2025-27151)
Ensure that the length of the input file name does not exceed PATH_MAX
2025-05-27 15:39:09 +03:00
debing.sun
35eff3d49a Resolve bounds checks on cluster_legacy.c (#13970)
Based on https://github.com/valkey-io/valkey/pull/1463 and
https://github.com/valkey-io/valkey/pull/1481

In the failure of fully
CI(https://github.com/redis/redis/actions/runs/14595343452/job/40979173087?pr=13965)
in version 7.0 we are getting a number of errors like:
```
array subscript ‘clusterMsg[0]’ is partly outside array bounds of ‘unsigned char[2272]’
```

Which is basically GCC telling us that we have an object which is longer
than the underlying storage of the allocation. We actually do this a
lot, but GCC is generally not aware of how big the underlying allocation
is, so it doesn't throw this error. We are specifically getting this
error because the msgBlock can be of variable length depending on the
type of message, but GCC assumes it's the longest one possible. The
solution I went with here was make the message type optional, so that it
wasn't included in the size. I think this also makes some sense, since
it's really just a helper for us to easily cast the object around.

This compilation warning only occurs in version 7.2, because in [this
PR](https://github.com/redis/redis/pull/13073), we started passing
`-flto` to `CFLAGS` by default. It seems that in this case, GCC is
unable to detect such warnings. However, this change is not present in
version 7.2.
So, to reproduce this compilation warning in versions after 7.2, we can
pass `OPTIMIZATION=-O2` manually.

---------

Co-authored-by: madolson <34459052+madolson@users.noreply.github.com>
2025-05-27 15:39:09 +03:00
Vitah Lin
e7cd611be1 Fix tls port update not reflected in CLUSTER SLOTS (#13966)
### Problem 

A previous PR (https://github.com/redis/redis/pull/13932) fixed the TCP
port issue in CLUSTER SLOTS, but it seems the handling of the TLS port
was overlooked.

There is this comment in the `addNodeToNodeReply` function in the
`cluster.c` file:
```c
    /* Report TLS ports to TLS client, and report non-TLS port to non-TLS client. */
    addReplyLongLong(c, clusterNodeClientPort(node, shouldReturnTlsInfo()));
    addReplyBulkCBuffer(c, clusterNodeGetName(node), CLUSTER_NAMELEN);
```

### Fixed 

This PR fixes the TLS port issue and adds relevant tests.
2025-05-27 15:39:09 +03:00
nesty92
89aee9556d Fix incorrect lag due to trimming stream via XTRIM or XADD command (#13958)
This PR fix the lag calculation by ensuring that when consumer group's last_id
is behind the first entry, the consumer group's entries read is considered
invalid and recalculated from the start of the stream

Supplement to PR #13473 

Close #13957

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
2025-05-27 15:39:09 +03:00
Stav-Levi
50e91ca7db Fix port update not reflected in CLUSTER SLOTS (#13932)
Close https://github.com/redis/redis/issues/13892 
config set port cmd updates server.port. cluster slot retrieves
information about cluster slots and their associated nodes. the fix
updates this info when config set port cmd is done, so cluster slots cmd
returns the right value.
2025-05-27 15:39:09 +03:00
YaacovHazan
62b766aef6 Redis 7.2.8 2025-04-23 08:11:54 +00:00
YaacovHazan
42fb340ce4 Limiting output buffer for unauthenticated client (CVE-2025-21605)
For unauthenticated clients the output buffer is limited to prevent
them from abusing it by not reading the replies
2025-04-23 08:09:40 +00:00
guybe7
21d5e64ace Module unblock on keys: updateStatsOnUnblock is called twice (#13405)
This commit reverts the deletion of the condition `!bc->blocked_on_keys`
that was accidentally introduced by
https://github.com/redis/redis/pull/12817.
In case a blocked-on-keys module client is unblocked both
`moduleUnblockClientOnKey` and `moduleHandleBlockedClients` are called
which resulted in `updateStatsOnUnblock` being called twice

Now, that `moduleHandleBlockedClients` doesn't call
`updateStatsOnUnblock` in case of unblocked module key-blocked clients,
in the unlikely event that the module decides to call `RM_UnblockClient`
on a key-blocked client, we need to call `updateStatsOnUnblock` from
within `moduleBlockedClientTimedOut`, but since
`moduleBlockedClientTimedOut` is not tread-safe we can't call it
directly from withing `RM_UnblockClient`.
Added a new flag `blocked_on_keys_explicit_unblock` for that specific
case, which will cause `moduleBlockedClientTimedOut` to be called from
`moduleHandleBlockedClients` (which is only called from the main thread)

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-04-22 20:56:52 +08:00
YaacovHazan
9af9c4deff Avoid sanitizer warning for stable CI 2025-04-22 14:54:27 +03:00
Jason
779a20058b Ignore shardId updates from replica nodes (#13877)
Close https://github.com/redis/redis/issues/13868

This bug was introduced by https://github.com/redis/redis/pull/13468

To maintain compatibility with older versions that do not support
shardid, when a replica passes a shardid, we also update the master’s
shardid accordingly.

However, when both the master and replica support shardid, an issue
arises: in one moment, the master may pass a shardid, causing us to
update both the master and all its replicas to match the master’s
shardid. But if the replica later passes a different shardid, we would
then update the master’s shardid again, leading to continuous changes in
shardid.

Regardless of the situation, we always ensure that the replica’s shardid
remains consistent with the master’s shardid.
2025-04-22 13:49:12 +03:00
Benson-li
a26774cee1 Fix potential infinite loop of RANDOMKEY during client pause (#13863)
The bug mentioned in this
[#13862](https://github.com/redis/redis/issues/13862) has been fixed.

---------

Signed-off-by: li-benson <1260437731@qq.com>
Signed-off-by: youngmore1024 <youngmore1024@outlook.com>
Co-authored-by: youngmore1024 <youngmore1024@outlook.com>
2025-04-22 13:49:12 +03:00
nafraf
6e819e188b Fix module loadex command crash due to invalid config (#13653)
Fix to https://github.com/redis/redis/issues/13650

providing an invalid config to a module with datatype crashes when redis
tries to unload the module due to the invalid config

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-04-22 13:49:12 +03:00
guybe7
8e72b42a9a Modules: defrag CB should take robj, not sds (#13627)
Added a log of the keyname in the test modules to reproduce the problem
(tests crash without the fix)
2025-04-22 13:49:12 +03:00
Ping Xie
bdc78175b5 Fix PONG message processing for primary-ship tracking during failovers (#13055)
This commit updates the processing of PONG gossip messages in the
cluster. When a node (B) becomes a replica due to a failover, its PONG
messages include its new primary node's (A) information and B's
configuration epoch is aligned with A's. This allows observer nodes to
identify changes in primary-ship, addressing issues of intermediate
states and enhancing cluster state consistency during topology changes.

Fix #13018
2025-04-22 13:49:12 +03:00
debing.sun
30fe743638 Fix race condition issues between the main thread and module threads (#12817)
Fix #12785 and other race condition issues.
See the following isolated comments.

The following report was obtained using SANITIZER thread.
```sh
make SANITIZER=thread
./runtest-moduleapi --config io-threads 4 --config io-threads-do-reads yes --accurate
```

1. Fixed thread-safe issue in RM_UnblockClient()
Related discussion:
https://github.com/redis/redis/pull/12817#issuecomment-1831181220
* When blocking a client in a module using `RM_BlockClientOnKeys()` or
`RM_BlockClientOnKeysWithFlags()`
with a timeout_callback, calling RM_UnblockClient() in module threads
can lead to race conditions
     in `updateStatsOnUnblock()`.

     - Introduced: 
        Version: 6.2
        PR: #7491

     - Touch:
`server.stat_numcommands`, `cmd->latency_histogram`, `server.slowlog`,
and `server.latency_events`
     
     - Harm Level: High
Potentially corrupts the memory data of `cmd->latency_histogram`,
`server.slowlog`, and `server.latency_events`

     - Solution:
Differentiate whether the call to moduleBlockedClientTimedOut() comes
from the module or the main thread.
Since we can't know if RM_UnblockClient() comes from module threads, we
always assume it does and
let `updateStatsOnUnblock()` asynchronously update the unblock status.
     
* When error reply is called in timeout_callback(), ctx is not
thread-safe, eventually lead to race conditions in `afterErrorReply`.

     - Introduced: 
        Version: 6.2
        PR: #8217

     - Touch
       `server.stat_total_error_replies`, `server.errors`, 

     - Harm Level: High
       Potentially corrupts the memory data of `server.errors`
   
      - Solution: 
Make the ctx in `timeout_callback()` with `REDISMODULE_CTX_THREAD_SAFE`,
and asynchronously reply errors to the client.

2. Made RM_Reply*() family API thread-safe
Related discussion:
https://github.com/redis/redis/pull/12817#discussion_r1408707239
Call chain: `RM_Reply*()` -> `_addReplyToBufferOrList()` -> touch
server.current_client

    - Introduced: 
       Version: 7.2.0
       PR: #12326

   - Harm Level: None
Since the module fake client won't have the `CLIENT_PUSHING` flag, even
if we touch server.current_client,
     we can still exit after `c->flags & CLIENT_PUSHING`.

   - Solution
      Checking `c->flags & CLIENT_PUSHING` earlier.

3. Made freeClient() thread-safe
    Fix #12785

    - Introduced: 
       Version: 4.0
Commit:
3fcf959e60

    - Harm Level: Moderate
       * Trigger assertion
It happens when the module thread calls freeClient while the io-thread
is in progress,
which just triggers an assertion, and doesn't make any race condiaions.

* Touch `server.current_client`, `server.stat_clients_type_memory`, and
`clientMemUsageBucket->clients`.
It happens between the main thread and the module threads, may cause
data corruption.
1. Error reset `server.current_client` to NULL, but theoretically this
won't happen,
because the module has already reset `server.current_client` to old
value before entering freeClient.
2. corrupts `clientMemUsageBucket->clients` in
updateClientMemUsageAndBucket().
3. Causes server.stat_clients_type_memory memory statistics to be
inaccurate.
    
    - Solution:
* No longer counts memory usage on fake clients, to avoid updating
`server.stat_clients_type_memory` in freeClient.
* No longer resetting `server.current_client` in unlinkClient, because
the fake client won't be evicted or disconnected in the mid of the
process.
* Judgment assertion `io_threads_op == IO_THREADS_OP_IDLE` only if c is
not a fake client.

4. Fixed free client args without GIL
Related discussion:
https://github.com/redis/redis/pull/12817#discussion_r1408706695
When freeing retained strings in the module thread (refcount decr), or
using them in some way (refcount incr), we should do so while holding
the GIL,
otherwise, they might be simultaneously freed while the main thread is
processing the unblock client state.

    - Introduced: 
       Version: 6.2.0
       PR: #8141

   - Harm Level: Low
     Trigger assertion or double free or memory leak. 

   - Solution:
Documenting that module API users need to ensure any access to these
retained strings is done with the GIL locked

5. Fix adding fake client to server.clients_pending_write
    It will incorrectly log the memory usage for the fake client.
Related discussion:
https://github.com/redis/redis/pull/12817#issuecomment-1851899163

    - Introduced: 
       Version: 4.0
Commit:
9b01b64430

    - Harm Level: None
      Only result in NOP

    - Solution:
       * Don't add fake client into server.clients_pending_write
* Add c->conn assertion for updateClientMemUsageAndBucket() and
updateClientMemoryUsage() to avoid same
         issue in the future.
So now it will be the responsibility of the caller of both of them to
avoid passing in fake client.

6. Fix calling RM_BlockedClientMeasureTimeStart() and
RM_BlockedClientMeasureTimeEnd() without GIL
    - Introduced: 
       Version: 6.2
       PR: #7491

   - Harm Level: Low
Causes inaccuracies in command latency histogram and slow logs, but does
not corrupt memory.

   - Solution:
Module API users, if know that non-thread-safe APIs will be used in
multi-threading, need to take responsibility for protecting them with
their own locks instead of the GIL, as using the GIL is too expensive.

### Other issue
1. RM_Yield is not thread-safe, fixed via #12905.

### Summarize
1. Fix thread-safe issues for `RM_UnblockClient()`, `freeClient()` and
`RM_Yield`, potentially preventing memory corruption, data disorder, or
assertion.
2. Updated docs and module test to clarify module API users'
responsibility for locking non-thread-safe APIs in multi-threading, such
as RM_BlockedClientMeasureTimeStart/End(), RM_FreeString(),
RM_RetainString(), and RM_HoldString().

### About backpot to 7.2
1. The implement of (1) is not too satisfying, would like to get more
eyes.
2. (2), (3) can be safely for backport
3. (4), (6) just modifying the module tests and updating the
documentation, no need for a backpot.
4. (5) is harmless, no need for a backpot.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2025-04-22 12:35:44 +03:00
debing.sun
7278a0c26a Make RM_Yield thread-safe (#12905)
## Issues and solutions from #12817
1. Touch ProcessingEventsWhileBlocked and calling moduleCount() without
GIL in afterSleep()
    - Introduced: 
       Version: 7.0.0
       PR: #9963

   - Harm Level: Very High
If the module thread calls `RM_Yield()` before the main thread enters
afterSleep(),
and modifies `ProcessingEventsWhileBlocked`(+1), it will cause the main
thread to not wait for GIL,
which can lead to all kinds of unforeseen problems, including memory
data corruption.

   - Initial / Abandoned Solution:
      * Added `__thread` specifier for ProcessingEventsWhileBlocked.
`ProcessingEventsWhileBlocked` is used to protect against nested event
processing, but event processing
in the main thread and module threads should be completely independent
and unaffected, so it is safer
         to use TLS.
* Adding a cached module count to keep track of the current number of
modules, to avoid having to use `dictSize()`.
    
    - Related Warnings:
```
WARNING: ThreadSanitizer: data race (pid=1136)
  Write of size 4 at 0x0001045990c0 by thread T4 (mutexes: write M0):
    #0 processEventsWhileBlocked networking.c:4135 (redis-server:arm64+0x10006d124)
    #1 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c)
    #2 bg_call_worker <null>:83232836 (blockedclient.so:arm64+0x16a8)

  Previous read of size 4 at 0x0001045990c0 by main thread:
    #0 afterSleep server.c:1861 (redis-server:arm64+0x100024f98)
    #1 aeProcessEvents ae.c:408 (redis-server:arm64+0x10000fd64)
    #2 aeMain ae.c:496 (redis-server:arm64+0x100010f0c)
    #3 main server.c:7220 (redis-server:arm64+0x10003f38c)
```

2. aeApiPoll() is not thread-safe
When using RM_Yield to handle events in a module thread, if the main
thread has not yet
entered `afterSleep()`, both the module thread and the main thread may
touch `server.el` at the same time.

    - Introduced: 
       Version: 7.0.0
       PR: #9963

   - Old / Abandoned Solution:
Adding a new mutex to protect timing between after beforeSleep() and
before afterSleep().
Defect: If the main thread enters the ae loop without any IO events, it
will wait until
the next timeout or until there is any event again, and the module
thread will
always hang until the main thread leaves the event loop.

    - Related Warnings:
```
SUMMARY: ThreadSanitizer: data race ae_kqueue.c:55 in addEventMask
==================
==================
WARNING: ThreadSanitizer: data race (pid=14682)
  Write of size 4 at 0x000100b54000 by thread T9 (mutexes: write M0):
    #0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588)
    #1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84)
    #2 processEventsWhileBlocked networking.c:4138 (redis-server:arm64+0x10006d3c4)
    #3 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c)
    #4 bg_call_worker <null>:16042052 (blockedclient.so:arm64+0x169c)

  Previous write of size 4 at 0x000100b54000 by main thread:
    #0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588)
    #1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84)
    #2 aeMain ae.c:496 (redis-server:arm64+0x100010da8)
    #3 main server.c:7238 (redis-server:arm64+0x10003f51c)
```

## The final fix as the comments:
https://github.com/redis/redis/pull/12817#discussion_r1436427232
Optimized solution based on the above comment:

First, we add `module_gil_acquring` to indicate whether the main thread
is currently in the acquiring GIL state.

When the module thread starts to yield, there are two possibilities(we
assume the caller keeps the GIL):
1. The main thread is in the mid of beforeSleep() and afterSleep(), that
is, `module_gil_acquring` is not 1 now.
At this point, the module thread will wake up the main thread through
the pipe and leave the yield,
waiting for the next yield when the main thread may already in the
acquiring GIL state.
    
2. The main thread is in the acquiring GIL state.
The module thread release the GIL, yielding CPU to give the main thread
an opportunity to start
event processing, and then acquire the GIL again until the main thread
releases it.
This is what
https://github.com/redis/redis/pull/12817#discussion_r1436427232
mentioned direction.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
2025-04-22 12:35:36 +03:00
YaacovHazan
ba18105722 Redis 7.2.7 2025-01-06 16:03:47 +02:00
YaacovHazan
e344b2b587 Fix LUA garbage collector (CVE-2024-46981)
Reset GC state before closing the lua VM to prevent user data
to be wrongly freed while still might be used on destructor callbacks.
2025-01-06 16:03:47 +02:00
YaacovHazan
15e212bf69 Fix Read/Write key pattern selector (CVE-2024-51741)
The '%' rule must contain one or both of R/W
2025-01-06 16:03:47 +02:00
Steve
52b6c0a27b Avoid cluster.nodes load corruption due to shard-id generation (#13468)
PR #13428 doesn't fully resolve an issue where corruption errors can
still occur on loading of cluster.nodes file - seen on upgrade where
there were no shard_ids (from old Redis), 7.2.5 loading generated new
random ones, and persisted them to the file before gossip/handshake
could propagate the correct ones (or some other nodes unreachable).
This results in a primary/replica having differing shard_id in the
cluster.nodes and then the server cannot startup - reports corruption.

This PR builds on #13428 by simply ignoring the replica's shard_id in
cluster.nodes (if it exists), and uses the replica's primary's shard_id.
Additional handling was necessary to cover the case where the replica
appears before the primary in cluster.nodes, where it will first use a
generated shard_id for the primary, and then correct after it loads the
primary cluster.nodes entry.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-01-06 16:03:47 +02:00
debing.sun
a904067f6f Fix incorrect lag due to trimming stream via XTRIM command (#13473)
## Describe
When using the `XTRIM` command to trim a stream, it does not update the
maximal tombstone (`max_deleted_entry_id`). This leads to an issue where
the lag calculation incorrectly assumes that there are no tombstones
after the consumer group's last_id, resulting in an inaccurate lag.

The reason XTRIM doesn't need to update the maximal tombstone is that it
always trims from the beginning of the stream. This means that it
consistently changes the position of the first entry, leading to the
following scenarios:

1) First entry trimmed after maximal tombstone:
If the first entry is trimmed to a position after the maximal tombstone,
all tombstones will be before the first entry, so they won't affect the
consumer group's lag.

2) First entry trimmed before maximal tombstone:
If the first entry is trimmed to a position before the maximal
tombstone, the maximal tombstone will not be updated.

## Solution
Therefore, this PR optimizes the lag calculation by ensuring that when
both the consumer group's last_id and the maximal tombstone are behind
the first entry, the consumer group's lag is always equal to the number
of remaining elements in the stream.

Supplement to PR https://github.com/redis/redis/pull/13338
2025-01-06 16:03:47 +02:00
debing.sun
3ce29e0b11 Pass extensions to node if extension processing is handled by it (#13465)
This PR is based on the commits from PR
https://github.com/valkey-io/valkey/pull/52.
Ref: https://github.com/redis/redis/pull/12760
Close https://github.com/redis/redis/issues/13401
This PR will replace https://github.com/redis/redis/pull/13449

Fixes compatibilty of Redis cluster (7.2 - extensions enabled by
default) with older Redis cluster (< 7.0 - extensions not handled) .

With some of the extensions enabled by default in 7.2 version, new nodes
running 7.2 and above start sending out larger clusterbus message
payload including the ping extensions. This caused an incompatibility
with node running engine versions < 7.0. Old nodes (< 7.0) would receive
the payload from new nodes (> 7.2) would observe a payload length
(totlen) > (estlen) and would perform an early exit and won't process
the message.

This fix does the following things:
1. Always set `CLUSTERMSG_FLAG0_EXT_DATA`, because during the meet
phase, we do not know whether the connected node supports ext data, we
need to make sure that it knows and send back its ext data if it has.
2. If another node does not support ext data, we will not send it ext
data to avoid the handshake failure due to the incorrect payload length.

Note: A successful `PING`/`PONG` is required as a sender for a given
node to be marked as `CLUSTERMSG_FLAG0_EXT_DATA` and then extensions
message
will be sent to it. This could cause a slight delay in receiving the
extensions message(s).

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
2025-01-06 16:03:47 +02:00
debing.sun
17be6d92e8 Ensure validity of myself as master or replica when loading cluster config (#13443)
First, we need to ensure that `curmaster` in
`clusterUpdateSlotsConfigWith()` is not NULL in the line
82f00f5179/src/cluster_legacy.c (L2320)
otherwise, it will crash in the
82f00f5179/src/cluster_legacy.c (L2395)

So when loading cluster node config, we need to ensure that the
following conditions are met:
1. A node must be at least one of the master or replica.
2. If a node is a replica, its master can't be NULL.
2025-01-06 16:03:47 +02:00
debing.sun
f4a6721dfd Fix CLUSTER SHARDS command returns empty array (#13422)
Close https://github.com/redis/redis/issues/13414

When the cluster's master node fails and is switched to another node,
the first node in the shard node list (the old master) is no longer
valid.
Add a new method clusterGetMasterFromShard() to obtain the current
master.
2025-01-06 16:03:47 +02:00
debing.sun
0e06e67a36 Fix incorrect lag field in XINFO when tombstone is after the last_id of consume group (#13338)
Fix #13337

Ths PR fixes fixed two bugs that caused lag calculation errors.
1. When the latest tombstone is before the first entry, the tombstone
may stil be after the last id of consume group.
2. When a tombstone is after the last id of consume group, the group's
counter will be invalid, we should caculate the entries_read by using
estimates.
2025-01-06 16:03:47 +02:00
Oran Agra
045617e10a Fix possible crash due to OOM panic on invalid command (#13380)
getKeysUsingKeySpece had the range check AFTER the allocation, of the
keys buffer, which could lead to an OOM panic when invalid arguments are
provided, leading to an overflow.
The allocated memory is only used after the range check, so there's no
risk of buffer overrun.
The OOM panic can happen on 32bit builds, or 64 builds running on
systems with less than 4GB of RAM, and is reachable via the COMMAND
GETKEYSANDFLAGS, and ACL key name validation.
2025-01-06 16:03:47 +02:00
debing.sun
4a92d66ca2 Don't keep global replication buffer reference for replicas marked CLIENT_CLOSE_ASAP (#13363)
In certain situations, we might generate a large number of propagates
(e.g., multi/exec, Lua script, or a single command generating tons of
propagations) within an event loop.
During the process of propagating to a replica, if the replica is
disconnected(marked as CLIENT_CLOSE_ASAP) due to exceeding the output
buffer limit, we should remove its reference to the global replication
buffer to avoid the global replication buffer being unable to be
properly trimmed due to being referenced.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2025-01-06 16:03:47 +02:00
gms
2c8ae06b67 Fix crash due to unblock client during slot migration (#13311)
In #13224, we found a crash during cluster slot migration but don't know
why. So i check all the return C_OK in processCommand to see if we are
missing some duration reset and see this.

This fix is like #12247, when we reject the command, we should reset the
duration. I test it and verify it can fix #13224.

So the reason may because we are using stream block and then during the
slot migration, it got a redirect and then crash the server.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-01-06 16:03:47 +02:00
Ted Lyngmo
e2aaa9096f Log the real reason for why posix_fadvise failed (#13246)
`reclaimFilePageCache` did not set `errno` but `rdbSaveInternal` which
is logging the error assumed it did. This makes sure `errno` is set.

Fixes #13245

Signed-off-by: Ted Lyngmo <ted@lyncon.se>
2025-01-06 16:03:47 +02:00
debing.sun
9f823685d2 Have consistent behavior of SPUBLISH within multi/exec like regular command (#13276)
This PR is based on the commits from PR #12944.

Allow SPUBLISH command within multi/exec on replica

Behavior on unstable:

```
127.0.0.1:6380> CLUSTER NODES
39ce8aa20f1f0d91f1a88d976ee1926dfefcdf1a 127.0.0.1:6380@16380 myself,slave 8b0feb120b68aac489d6a5af9c77dc40d71bc792 0 0 0 connected
8b0feb120b68aac489d6a5af9c77dc40d71bc792 127.0.0.1:6379@16379 master - 0 1705091681202 0 connected 0-16383
127.0.0.1:6380> SPUBLISH hello world
(integer) 0
127.0.0.1:6380> MULTI
OK
127.0.0.1:6380(TX)> SPUBLISH hello world
QUEUED
127.0.0.1:6380(TX)> EXEC
(error) MOVED 866 127.0.0.1:6379
```

With this change:

```
127.0.0.1:6380> SPUBLISH hello world
(integer) 0
127.0.0.1:6380> MULTI
OK
127.0.0.1:6380(TX)> SPUBLISH hello world
QUEUED
127.0.0.1:6380(TX)> EXEC
1) (integer) 0
```

---------

Co-authored-by: Harkrishn Patro <harkrisp@amazon.com>
Co-authored-by: oranagra <oran@redislabs.com>
2025-01-06 16:03:47 +02:00
Oran Agra
ae6a2aa95c Release Redis 7.2.6 2024-10-02 22:13:33 +03:00
Oran Agra
c8649f8e85 Prevent pattern matching abuse (CVE-2024-31228) 2024-10-02 22:13:33 +03:00
Oran Agra
b351d5a321 Fix ACL SETUSER Read/Write key pattern selector (CVE-2024-31227)
The '%' rule must contain one or both of R/W
2024-10-02 22:13:33 +03:00
debing.sun
2ad2548747 Fixed crashes due to missed slotToKeyInit() and missed expires_cursor reset (#13315)
this PR fixes two crashes:

1. Fix missing slotToKeyInit() when using `flushdb async` under cluster
mode.
    https://github.com/redis/redis/issues/13205

2. Fix missing expires_cursor reset when stopping active defrag in the
middle of defragment.
    https://github.com/redis/redis/issues/13307
If we stop active defrag in the middle of defragging db->expires, if
`expires_cursor` is not reset to 0, the next time we enable active
defrag again, defragLaterStep(db, ...) will be entered. However, at this
time, `db` has been reset to NULL, which results in crash.

The affected code were removed by #11695 and #13058 in usntable, so we
just need backport this to 7.2.
2024-06-18 18:02:22 +08:00
YaacovHazan
f60370ce28 Redis 7.2.5 2024-05-19 09:12:35 +03:00
Yanqi Lv
464aad9ee7 fix wrong data type conversion in zrangeResultBeginStore (#13148)
In `beginResultEmission`, -1 means the result length is not known in
advance. But after #12185, if we pass -1 to `zrangeResultBeginStore`, it
will convert to SIZE_MAX in `zsetTypeCreate` and try to `dictExpand`.
Although `dictExpand` won't succeed because the size overflows, I think
we'd better to avoid this wrong conversion.

This bug can be triggered when the source of `zrangestore` doesn't exist
or we use `zrangestore` command with `byscore` or `bylex`.
The impact is that dst keys will be converted to use skiplist instead of
listpack.

(cherry picked from commit bad33f8738)
2024-05-19 09:12:35 +03:00
Binbin
439b8da475 Fix redis-check-aof incorrectly considering data in manifest format as MP-AOF (#12958)
The check in fileIsManifest misjudged the manifest file. For example,
if resp aof contains "file", it will be considered a manifest file and
the check will fail:
```
*3
$3
set
$4
file
$4
file
```

In #12951, if the preamble aof also contains it, it will also fail.
Fixes #12951.

the bug was happening if the the word "file" is mentioned
in the first 1024 lines of the AOF. and now as soon as it finds
a non-comment line it'll break (if it contains "file" or doesn't)

(cherry picked from commit da727ad445)
2024-05-19 09:12:35 +03:00
Matthew Douglass
1caaf581f1 Fix conversion of numbers in lua args to redis args (#13115)
Since lua_Number is not explicitly an integer or a double, we need to
make an effort
to convert it as an integer when that's possible, since the string could
later be used
in a context that doesn't support scientific notation (e.g. 1e9 instead
of 100000000).

Since fpconv_dtoa converts numbers with the equivalent of `%f` or `%e`,
which ever is shorter,
this would break if we try to pass a long integer number to a command
that takes integer.
we'll get an implicit conversion to string in Lua, and then the parsing
in getLongLongFromObjectOrReply will fail.

```
> eval "redis.call('hincrby', 'key', 'field', '1000000000')" 0
(nil)
> eval "redis.call('hincrby', 'key', 'field', tonumber('1000000000'))" 0
(error) ERR value is not an integer or out of range script: ac99c32e4daf7e300d593085b611de261954a946, on @user_script:1.
```

Switch to using ll2string if the number can be safely represented as a
long long.

The problem was introduced in #10587 (Redis 7.2).
closes #13113.

---------

Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 5fdaa53d20)
2024-05-19 09:12:35 +03:00
LiiNen
8f70fcc65b Fix redis-cli --count (for --scan, --bigkeys, etc) was ignored unless --pattern was also used (#13092)
The --count option for redis-cli has been released in redis 7.2.
https://github.com/redis/redis/pull/12042
But I have found in code, that some logic was missing for using this
'count' option.

```
static redisReply *sendScan(unsigned long long *it) {
    redisReply *reply;

    if (config.pattern)
        reply = redisCommand(context, "SCAN %llu MATCH %b COUNT %d",
            *it, config.pattern, sdslen(config.pattern), config.count);
    else
        reply = redisCommand(context,"SCAN %llu",*it);
```

The intention was being able to using scan count.
But in this case, the --count will be only applied when 'pattern' is
declared.
So, I had fix it simply, to be worked properly - even if --pattern
option is not being used.

I tested it simply with time() command several times, and I could see it
works as intended with this commit.
The examples of test results are below:
```
# unstable build

time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan >/dev/null 2>/dev/null)

real    0m1.287s
user    0m0.011s
sys     0m0.022s

# count is not applied
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 >/dev/null 2>/dev/null)

real    0m1.117s
user    0m0.011s
sys     0m0.020s

# count is applied with --pattern

time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 --pattern "hash:*" >/dev/null 2>/dev/null)

real    0m0.045s
user    0m0.002s
sys     0m0.002s
```

```
# fix-redis-cli-scan-count build
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan >/dev/null 2>/dev/null)

real    0m1.084s
user    0m0.008s
sys     0m0.024s

# count is applied even if --pattern is not declared
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 >/dev/null 2>/dev/null)

real    0m0.043s
user    0m0.000s
sys     0m0.004s

# of course this also applied
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 --pattern "hash:*" >/dev/null 2>/dev/null)

real    0m0.031s
user    0m0.002s
sys     0m0.002s
```

Thanks a lot.

(cherry picked from commit 763827c981)
2024-05-19 09:12:35 +03:00