1464 Commits

Author SHA1 Message Date
Oran Agra
37b3b2a7e0 Fix range issues in ZRANDMEMBER and HRANDFIELD (CVE-2023-22458)
missing range check in ZRANDMEMBER and HRANDIFLD leading to panic due
to protocol limitations
2023-01-16 18:41:08 +02:00
Oran Agra
5453899878 Avoid integer overflows in SETRANGE and SORT (CVE-2022-35977)
Authenticated users issuing specially crafted SETRANGE and SORT(_RO)
commands can trigger an integer overflow, resulting with Redis attempting
to allocate impossible amounts of memory and abort with an OOM panic.
2023-01-16 18:41:08 +02:00
Oran Agra
3148f3e8a5 Obuf limit, exit during loop in *RAND* commands and KEYS
Related to the hang reported in #11671
Currently, redis can disconnect a client due to reaching output buffer limit,
it'll also avoid feeding that output buffer with more data, but it will keep
running the loop in the command (despite the client already being marked for
disconnection)

This PR is an attempt to mitigate the problem, specifically for commands that
are easy to abuse, specifically: KEYS, HRANDFIELD, SRANDMEMBER, ZRANDMEMBER.
The RAND family of commands can take a negative COUNT argument (which is not
bound to the number of elements in the key), so it's enough to create a key
with one field, and then these commands can be used to hang redis.
For KEYS the caller can use the existing keyspace in redis (if big enough).
2023-01-16 18:41:08 +02:00
Yossi Gottlieb
de4d78d7df Fix TLS tests on newer tcl-tls/OpenSSL. (#10910)
Before this commit, TLS tests on Ubuntu 22.04 would fail as dropped
connections result with an ECONNABORTED error thrown instead of an empty
read.

(cherry picked from commit 69d5576832)
2022-12-12 17:02:54 +02:00
Yossi Gottlieb
e5e642aa9b Use 'gcc' instead of 'ld' to link test modules. (#9710)
This solves several problems in a more elegant way:

* No need to explicitly use `-lc` on x86_64 when building with `-m32`.
* Avoids issues with undefined floating point emulation funcs on ARM.

(cherry picked from commit f26e90be0c)
2022-12-12 17:02:54 +02:00
Yossi Gottlieb
52cccfbe94 Fix daily failures due to macos-latest change. (#9637)
* Fix test modules linking on macOS 11.x.
* Use macOS 10.x for FreeBSD VM as VirtualBox is not yet supported on
  11.

(cherry picked from commit 6d5a911707)
2022-12-12 17:02:54 +02:00
Oran Agra
0a3ae0f0a4 Try to fix a race in psync2 test (#11553)
This test sets the master ping interval to 1 hour, in order to avoid
pings in the replicatoin stream incrementing the replication offset,
however, it didn't increase the repl-timeout so on slow machines
where the test took more than 60 seconds, the replicas would drop
and reconnect.

```
*** [err]: PSYNC2: Partial resync after restart using RDB aux fields in tests/integration/psync2.tcl
Replica didn't partial sync
```

The test would detect 4 additional partial syncs where it expects
only one.

(cherry picked from commit b0250b4508)
2022-12-12 17:02:54 +02:00
uriyage
39e5a6fa2f Module CLIENT_CHANGE, Fix crash on free blocked client with DB!=0 (#11500)
In moduleFireServerEvent we change the real client DB to 0 on freeClient in case the event is REDISMODULE_EVENT_CLIENT_CHANGE.
It results in a crash if the client is blocked on a key on other than DB 0.

The DB change is not necessary even for module-client, as we set its DB to 0 on either createClient or moduleReleaseTempClient.

Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit e4eb18b303)
2022-12-12 17:02:54 +02:00
Oran Agra
4ac3d79bfc fixes for fork child exit and test: #11463 (#11499)
Fix a few issues with the recent #11463
* use exitFromChild instead of exit
* test should ignore defunct process since that's what we expect to
  happen for thees child processes when the parent dies.
* fix typo

Co-authored-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit 4c54528f0f)
2022-12-12 17:02:54 +02:00
Oran Agra
51fa40ff42 diskless master, avoid bgsave child hung when fork parent crashes (#11463)
During a diskless sync, if the master main process crashes, the child would
have hung in `write`. This fix closes the read fd on the child side, so that if the
parent crashes, the child will get a write error and exit.

This change also fixes disk-based replication, BGSAVE and AOFRW.
In that case the child wouldn't have been hang, it would have just kept
running until done which may be pointless.

There is a certain degree of risk here. in case there's a BGSAVE child that could
maybe succeed and the parent dies for some reason, the old code would have let
the child keep running and maybe succeed and avoid data loss.
On the other hand, if the parent is restarted, it would have loaded an old rdb file
(or none), and then the child could reach the end and rename the rdb file (data
conflicting with what the parent has), or also have a race with another BGSAVE
child that the new parent started.

Note that i removed a comment saying a write error will be ignored in the child
and handled by the parent (this comment was very old and i don't think relevant).

(cherry picked from commit ccaef5c923)
2022-12-12 17:02:54 +02:00
DarrenJiang13
ec5564a4cf fix the client type in trackingInvalidateKey() (#11052)
Fix bug with scripts ignoring client tracking NOLOOP and
send an invalidation message anyway.

(cherry picked from commit 44859a41ee)
2022-12-12 17:02:54 +02:00
Huang Zhw
2cd5f6f3ff tracking pending invalidation message of flushdb sent by (#11068)
trackingHandlePendingKeyInvalidations should use proto.

(cherry picked from commit 61451b02cb)
2022-12-12 17:02:54 +02:00
Huang Zhw
fbcd591b28 When client tracking is on, invalidation message of flushdb in a (#11038)
When FLUSHDB / FLUSHALL / SWAPDB is inside MULTI / EXEC, the
client side tracking invalidation message was interleaved with transaction response.

(cherry picked from commit 6f0a27e38e)
2022-12-12 17:02:54 +02:00
Meir Shpilraien (Spielrein)
e1b17f1232 Fix #11030, use lua_rawget to avoid triggering metatables and crash. (#11032)
Fix #11030, use lua_rawget to avoid triggering metatables.

#11030 shows how return `_G` from the Lua script (either function or eval), cause the
Lua interpreter to Panic and the Redis processes to exit with error code 1.
Though return `_G` only panic on Redis 7 and 6.2.7, the underline issue exists on older
versions as well (6.0 and 6.2). The underline issue is returning a table with a metatable
such that the metatable raises an error.

The following example demonstrate the issue:
```
127.0.0.1:6379> eval "local a = {}; setmetatable(a,{__index=function() foo() end}) return a" 0
Error: Server closed the connection
```
```
PANIC: unprotected error in call to Lua API (user_script:1: Script attempted to access nonexistent global variable 'foo')
```

The Lua panic happened because when returning the result to the client, Redis needs to
introspect the returning table and transform the table into a resp. In order to scan the table,
Redis uses `lua_gettable` api which might trigger the metatable (if exists) and might raise an error.
This code is not running inside `pcall` (Lua protected call), so raising an error causes the
Lua to panic and exit. Notice that this is not a crash, its a Lua panic that exit with error code 1.

Returning `_G` panics on Redis 7 and 6.2.7 because on those versions `_G` has a metatable
that raises error when trying to fetch a none existing key.

### Solution

Instead of using `lua_gettable` that might raise error and cause the issue, use `lua_rawget`
that simply return the value from the table without triggering any metatable logic.
This is promised not to raise and error.

The downside of this solution is that it might be considered as breaking change, if someone
rely on metatable in the returned value. An alternative solution is to wrap this entire logic
with `pcall` (Lua protected call), this alternative require a much bigger refactoring.

### Back Porting

The same fix will work on older versions as well (6.2, 6.0). Notice that on those version,
the issue can cause Redis to crash if inside the metatable logic there is an attempt to accesses
Redis (`redis.call`). On 7.0, there is not crash and the `redis.call` is executed as if it was done
from inside the script itself.

### Tests

Tests was added the verify the fix

(cherry picked from commit 020e046b42)
2022-12-12 17:02:54 +02:00
Oran Agra
70aaf50082 optimize zset conversion on large ZRANGESTORE (#10789)
when we know the size of the zset we're gonna store in advance,
we can check if it's greater than the listpack encoding threshold,
in which case we can create a skiplist from the get go, and avoid
converting the listpack to skiplist later after it was already populated.

(cherry picked from commit 2189100383)
2022-12-12 17:02:54 +02:00
Vitaly
fd7bde5726 Fix ZRANGESTORE crash when zset_max_listpack_entries is 0 (#10767)
When `zrangestore` is called container destination object is created.
Before this PR we used to create a listpack based object even if `zset-max-ziplist-entries`
or equivalent`zset-max-listpack-entries` was set to 0.
This triggered immediate conversion of the listpack into a skiplist in `zrangestore`, which hits
an assertion resulting in an engine crash.

Added a TCL test that reproduces this issue.

(cherry picked from commit 6461f09f43)
2022-12-12 17:02:54 +02:00
Yossi Gottlieb
3053337043 Fix test modules build issue on OS X 11. (#9658)
(cherry picked from commit 8bf4c2e38c)
2022-04-27 16:31:52 +03:00
Oran Agra
215c5eacb0 Skip cluster test unit in TLS mode.
see 7d6744c739
2022-04-27 16:31:52 +03:00
Oran Agra
afb48c6cc9 Whitelist Lua print function to avoid breaking change in old releases 2022-04-27 16:31:52 +03:00
Yossi Gottlieb
ba2feb3004 Clean unused var compiler warning in module test. (#9289)
(cherry picked from commit 8bf433dc86)
2022-04-27 16:31:52 +03:00
sundb
aa0606db08 Fix memory leak due to missing freeCallback in blockonbackground moduleapi test (#9499)
Before #9497, before redis-server was shut down, we did not manually shut down all the clients,
which would have prevented valgrind from detecting a memory leak in the client's argc.

(cherry picked from commit 1376d83363)
2022-04-27 16:31:52 +03:00
Oran Agra
d92f2f5ad6 test suite improvements pulled back from 7.0 for cherry picked commits 2022-04-27 16:31:52 +03:00
qetu3790
159981e73c Fix geo search bounding box check causing missing results (#10018)
Consider the following example:
1. geoadd k1 -0.15307903289794921875 85 n1 0.3515625 85.00019260486917005437 n2.
2. geodist k1 n1 n2 returns  "4891.9380"
3. but GEORADIUSBYMEMBER k1 n1 4891.94 m only returns n1.
n2 is in the  boundingbox but out of search areas.So we let  search areas contain boundingbox to get n2.

Co-authored-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit b2d393b990)
2022-04-27 16:31:52 +03:00
Oran Agra
d6a8e64e69 Fix and improve module error reply statistics (#10278)
This PR handles several aspects
1. Calls to RM_ReplyWithError from thread safe contexts don't violate thread safety.
2. Errors returning from RM_Call to the module aren't counted in the statistics (they
  might be handled silently by the module)
3. When a module propagates a reply it got from RM_Call to it's client, then the error
  statistics are counted.

This is done by:
1. When appending an error reply to the output buffer, we avoid updating the global
  error statistics, instead we cache that error in a deferred list in the client struct.
2. When creating a RedisModuleCallReply object, the deferred error list is moved from
  the client into that object.
3. when a module calls RM_ReplyWithCallReply we copy the deferred replies to the dest
  client (if that's a real client, then that's when the error statistics are updated to the server)

Note about RM_ReplyWithCallReply: if the original reply had an array with errors, and the module
replied with just a portion of the original reply, and not the entire reply, the errors are currently not
propagated and the errors stats will not get propagated.

Fix #10180

(cherry picked from commit b099889a3a)
2022-04-27 16:31:52 +03:00
Binbin
e24b947c9b LPOP/RPOP with count against non existing list return null array (#10095)
It used to return `$-1` in RESP2, now we will return `*-1`.
This is a bug in redis 6.2 when COUNT was added, the `COUNT`
option was introduced in #8179. Fix #10089.

the documentation of [LPOP](https://redis.io/commands/lpop) says
```
When called without the count argument:
Bulk string reply: the value of the first element, or nil when key does not exist.

When called with the count argument:
Array reply: list of popped elements, or nil when key does not exist.
```

(cherry picked from commit 39feee8e3a)
2022-04-27 16:31:52 +03:00
guybe7
c5a753c890 lpGetInteger returns int64_t, avoid overflow (#10068)
Fix #9410

Crucial for the ms and sequence deltas, but I changed all
calls, just in case (e.g. "flags")

Before this commit:
`ms_delta` and `seq_delta` could have overflown, causing `currid` to be wrong,
which in turn would cause `streamTrim` to trim the entire rax node (see new test)

(cherry picked from commit 7cd6a64d2f)
2022-04-27 16:31:52 +03:00
Madelyn Olson
066b68328e Redact ACL SETUSER arguments if the user has spaces (#9935)
(cherry picked from commit c40d23b89f)
2022-04-27 16:31:52 +03:00
Meir Shpilraien (Spielrein)
93c1d31d97 Clean Lua stack before parsing call reply to avoid crash on a call with many arguments (#9809)
This commit 0f8b634cd (CVE-2021-32626 released in 6.2.6, 6.0.16, 5.0.14)
fixes an invalid memory write issue by using `lua_checkstack` API to make
sure the Lua stack is not overflow. This fix was added on 3 places:
1. `luaReplyToRedisReply`
2. `ldbRedis`
3. `redisProtocolToLuaType`

On the first 2 functions, `lua_checkstack` is handled gracefully while the
last is handled with an assert and a statement that this situation can
not happened (only with misbehave module):

> the Redis reply might be deep enough to explode the LUA stack (notice
that currently there is no such command in Redis that returns such a nested
reply, but modules might do it)

The issue that was discovered is that user arguments is also considered part
of the stack, and so the following script (for example) make the assertion reachable:
```
local a = {}
for i=1,7999 do
    a[i] = 1
end
return redis.call("lpush", "l", unpack(a))
```

This is a regression because such a script would have worked before and now
its crashing Redis. The solution is to clear the function arguments from the Lua
stack which makes the original assumption true and the assertion unreachable.

(cherry picked from commit 6b0b04f1b2)
2022-04-27 16:31:52 +03:00
Binbin
8fca090ede Add tests to cover EXPIRE overflow fix (#9839)
In #8287, some overflow checks have been added. But when
`when *= 1000` overflows, it will become a positive number.
And the check not able to catch it. The key will be added with
a short expiration time and will deleted a few seconds later.

In #9601, will check the overflow after `*=` and return an
error first, and avoiding this situation.

In this commit, added some tests to cover those code paths.
Found it in #9825, and close it.

(cherry picked from commit 9273d09dd4)
2022-04-27 16:31:52 +03:00
Oran Agra
e38d0b5a5f fix invalid read on corrupt ziplist (#9831)
If the last bytes in ziplist are corrupt and we decode from tail to head,
we may reach slightly outside the ziplist.

(cherry picked from commit a3a014294f)
2022-04-27 16:31:52 +03:00
Itamar Haber
968cd2b967 Fixes LPOP/RPOP wrong replies when count is 0 (#9692)
Introduced in #8179, this fixes the command's replies in the 0 count edge case.
[BREAKING] changes the reply type when count is 0 to an empty array (instead of nil)
Moves LPOP ... 0 fast exit path after type check to reply with WRONGTYPE

(cherry picked from commit 06dd202a05)
2022-04-27 16:31:52 +03:00
Oran Agra
3ba7c6acbe fix new cluster tests issues (#9657)
Following #9483 the daily CI exposed a few problems.

* The cluster creation code (uses redis-cli) is complicated to test with TLS enabled.
  for now i'm just skipping them since the tests we run there don't really need that kind of coverage
* cluster port binding failures
  note that `find_available_port` already looks for a free cluster port
  but the code in `wait_server_started` couldn't detect the failure of binding
  (the text it greps for wasn't found in the log)

(cherry picked from commit 7d6744c739)
2022-04-27 16:31:52 +03:00
qetu3790
936ee01759 Release clients blocked on module commands in cluster resharding and down state (#9483)
Prevent clients from being blocked forever in cluster when they block with their own module command
and the hash slot is migrated to another master at the same time.
These will get a redirection message when unblocked.
Also, release clients blocked on module commands when cluster is down (same as other blocked clients)

This commit adds basic tests for the main (non-cluster) redis test infra that test the cluster.
This was done because the cluster test infra can't handle some common test features,
but most importantly we only build the test modules with the non-cluster test suite.

note that rather than really supporting cluster operations by the test infra, it was added (as dup code)
in two files, one for module tests and one for non-modules tests, maybe in the future we'll refactor that.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 4962c5526d)
2022-04-27 16:31:52 +03:00
Huang Zhw
0fb96d55bd Make tracking invalidation messages always after command's reply (#9422)
Tracking invalidation messages were sometimes sent in inconsistent order,
before the command's reply rather than after.
In addition to that, they were sometimes embedded inside other commands
responses, like MULTI-EXEC and MGET.

(cherry picked from commit fd135f3e2d)
2022-04-27 16:31:52 +03:00
Binbin
f5d8a36983 Add missing pause tcl test to test_helper.tcl (#9158)
* Add keyname tags to avoid CROSSSLOT errors in external server CI
* Use new wait_for_blocked_clients_count in pause.tcl

(cherry picked from commit 5dddf496ce)
2022-04-27 16:31:52 +03:00
meir
11b602fbf8 Protect any table which is reachable from globals and added globals allow list.
The allow list is done by setting a metatable on the global table before initializing
any library. The metatable set the `__newindex` field to a function that check
the allow list before adding the field to the table. Fields which is not on the
allow list are simply ignored.

After initialization phase is done we protect the global table and each table
that might be reachable from the global table. For each table we also protect
the table metatable if exists.
2022-04-27 16:31:52 +03:00
meir
b2ce3719af Protect globals of evals scripts.
Use the new `lua_enablereadonlytable` Lua API to protect the global tables of
evals scripts. The implemetation is easy, we simply call `lua_enablereadonlytable`
on the global table to turn it into a readonly table.
2022-04-27 16:31:52 +03:00
Vo Trong Phuc
34505d26f7 add check good slaves to write when execute script (#10249)
There was no check min-slave-* config when evaluating Lua script.
Add check enough good slaves for write command when evaluating scripts.

Co-authored-by: Phuc. Vo Trong <phucvt@vng.com.vn>
2022-04-11 12:59:27 +03:00
Oran Agra
aba9517542 corrupt-dump-fuzzer test, avoid creating junk keys (#9302)
The execution of the RPOPLPUSH command by the fuzzer created junk keys,
that were later being selected by RANDOMKEY and modified.
This also meant that lists were statistically tested more than other
files.

Fix the fuzzer not to pass junk key names to RPOPLPUSH, and add a check
that detects that new keys are not added by the fuzzer to detect future
similar issues.

(cherry picked from commit 3f3f678a47)
2021-10-04 13:59:40 +03:00
sundb
7540708a61 Fix missing check for sanitize_dump in corrupt-dump-fuzzer test (#9285)
this means the assertion that checks that when deep sanitization is enabled,
there are no crashes, was missing.

(cherry picked from commit 3db0f1a284)
2021-10-04 13:59:40 +03:00
Oran Agra
73d286d523 Fix stream sanitization for non-int first value (#9553)
This was recently broken in #9321 when we validated stream IDs to be
integers but did that after to the stepping next record instead of before.

(cherry picked from commit 5a4ab7c7d2)
2021-10-04 13:59:40 +03:00
zhaozhao.zz
9b25484a13 Fix wrong offset when replica pause (#9448)
When a replica paused, it would not apply any commands event the command comes from master, if we feed the non-applied command to replication stream, the replication offset would be wrong, and data would be lost after failover(since replica's `master_repl_offset` grows but command is not applied).

To fix it, here are the changes:
* Don't update replica's replication offset or propagate commands to sub-replicas when it's paused in `commandProcessed`.
* Show `slave_read_repl_offset` in info reply.
* Add an assert to make sure master client should never be blocked unless pause or module (some modules may use block way to do background (parallel) processing and forward original block module command to the replica, it's not a good way but it can work, so the assert excludes module now, but someday in future all modules should rewrite block command to propagate like what `BLPOP` does).

(cherry picked from commit 1b83353dc3)
2021-10-04 13:59:40 +03:00
Madelyn Olson
49f8f43890 Add test verifying PUBSUB NUMPAT behavior (#9209)
(cherry picked from commit 8b8f05c86c)
2021-10-04 13:59:40 +03:00
sundb
a2e8a3a241 Sanitize dump payload: fix double free after insert dup nodekey to stream rax and returns 0 (#9399)
(cherry picked from commit 492d8d0961)
2021-10-04 13:59:40 +03:00
sundb
09c63c45dd Sanitize dump payload: handle remaining empty key when RDB loading and restore command (#9349)
This commit mainly fixes empty keys due to RDB loading and restore command,
which was omitted in #9297.

1) When loading quicklsit, if all the ziplists in the quicklist are empty, NULL will be returned.
    If only some of the ziplists are empty, then we will skip the empty ziplists silently.
2) When loading hash zipmap, if zipmap is empty, sanitization check will fail.
3) When loading hash ziplist, if ziplist is empty, NULL will be returned.
4) Add RDB loading test with sanitize.

(cherry picked from commit cbda492909)
2021-10-04 13:59:40 +03:00
DarrenJiang13
1ed0f049fe [BUGFIX] Add some missed error statistics (#9328)
add error counting for some missed behaviors.

(cherry picked from commit 43eb0ce3bf)
2021-10-04 13:59:40 +03:00
Oran Agra
4b04ca0b18 Improvements to corrupt payload sanitization (#9321)
Recently we found two issues in the fuzzer tester: #9302 #9285
After fixing them, more problems surfaced and this PR (as well as #9297) aims to fix them.

Here's a list of the fixes
- Prevent an overflow when allocating a dict hashtable
- Prevent OOM when attempting to allocate a huge string
- Prevent a few invalid accesses in listpack
- Improve sanitization of listpack first entry
- Validate integrity of stream consumer groups PEL
- Validate integrity of stream listpack entry IDs
- Validate ziplist tail followed by extra data which start with 0xff

Co-authored-by: sundb <sundbcn@gmail.com>
(cherry picked from commit 0c90370e6d)
2021-10-04 13:59:40 +03:00
sundb
2f54107289 Sanitize dump payload: fix empty keys when RDB loading and restore command (#9297)
When we load rdb or restore command, if we encounter a length of 0, it will result in the creation of an empty key.
This could either be a corrupt payload, or a result of a bug (see #8453 )

This PR mainly fixes the following:
1) When restore command will return `Bad data format` error.
2) When loading RDB, we will silently discard the key.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 8ea777a6a0)
2021-10-04 13:59:40 +03:00
Viktor Söderqvist
77386ae011 redis-cli ASK redirect test: Add retry loop to fix timing issue (#9315)
(cherry picked from commit 1c59567a7f)
2021-10-04 13:59:40 +03:00
Oran Agra
0c959294a8 Skip new redis-cli ASK test in TLS mode (#9312)
(cherry picked from commit 52df350fe5)
2021-10-04 13:59:40 +03:00