With Redis compiled with COVERAGE_TEST, the following issue was
encountered while using the fork API:
- Forked process calls `RedisModule_ExitFromChild` - child process
starts to report its COW while performing IO operations
- Parent process terminates child process with
`RedisModule_KillForkChild`
- Child process signal handler gets called while an IO operation is
in progress
- exit() is called because COVERAGE_TEST was on during compilation.
- exit() tries to perform more IO operations in its exit handlers.
- The process deadlocks.
Backtrace snippet:
```
#0 futex_wait (private=0, expected=2, futex_word=0x7e1220000c50) at ../sysdeps/nptl/futex-internal.h:146
#1 __GI___lll_lock_wait_private (futex=0x7e1220000c50) at ./nptl/lowlevellock.c:34
#2 0x00007e1234696429 in __GI__IO_flush_all () at ./libio/genops.c:698
#3 0x00007e123469680d in _IO_cleanup () at ./libio/genops.c:843
#4 0x00007e1234647b74 in __run_exit_handlers (status=status@entry=255, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:129
#5 0x00007e1234647bbe in __GI_exit (status=status@entry=255) at ./stdlib/exit.c:138
#6 0x00005ef753264e13 in exitFromChild (retcode=255) at /home/jonathan/CLionProjects/redis/src/server.c:263
#7 sigKillChildHandler (sig=<optimized out>) at /home/jonathan/CLionProjects/redis/src/server.c:6794
#8 <signal handler called>
#9 0x00007e1234685b94 in _IO_fgets (buf=buf@entry=0x7e122dafdd90 "KSM:", ' ' <repeats 19 times>, "0 kB\n", n=n@entry=1024, fp=fp@entry=0x7e1220000b70) at ./libio/iofgets.c:47
#10 0x00005ef75326c5e0 in fgets (__stream=<optimized out>, __n=<optimized out>, __s=<optimized out>, __s=<optimized out>, __n=<optimized out>, __stream=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/stdio2.h:200
#11 zmalloc_get_smap_bytes_by_field (field=0x5ef7534c42fd "Private_Dirty:", pid=<optimized out>) at /home/jonathan/CLionProjects/redis/src/zmalloc.c:928
#12 0x00005ef75338ab1f in zmalloc_get_private_dirty (pid=-1) at /home/jonathan/CLionProjects/redis/src/zmalloc.c:978
#13 sendChildInfoGeneric (info_type=CHILD_INFO_TYPE_MODULE_COW_SIZE, keys=0, progress=-1, pname=0x5ef7534c95b2 "Module fork") at /home/jonathan/CLionProjects/redis/src/childinfo.c:71
#14 0x00005ef75337962c in sendChildCowInfo (pname=0x5ef7534c95b2 "Module fork", info_type=CHILD_INFO_TYPE_MODULE_COW_SIZE) at /home/jonathan/CLionProjects/redis/src/server.c:6895
#15 RM_ExitFromChild (retcode=0) at /home/jonathan/CLionProjects/redis/src/module.c:11468
```
The change makes the choice between exit() and _exit() conditional on a
new parameter to the exitFromChild function.
The signal handler should exit without performing IO operations, since it
cannot know what it interrupted (an IO operation may have been in
progress when it was called).
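The decision described above can be sketched as follows. This is a minimal illustration, not the actual Redis code: the names `may_run_exit_handlers` and `exit_from_child_sketch`, and the `from_signal` parameter name, are assumptions made for this example.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 only when it is safe to run the exit()
 * handlers (stdio flush, atexit callbacks). From a signal handler it is
 * never safe, because the interrupted code may hold a stdio lock. */
int may_run_exit_handlers(int coverage_build, int from_signal) {
    return coverage_build && !from_signal;
}

/* Hypothetical stand-in for exitFromChild() with the extra parameter. */
void exit_from_child_sketch(int retcode, int from_signal) {
#ifdef COVERAGE_TEST
    const int coverage_build = 1;
#else
    const int coverage_build = 0;
#endif
    if (may_run_exit_handlers(coverage_build, from_signal))
        exit(retcode);   /* runs exit handlers so coverage data is flushed */
    _exit(retcode);      /* async-signal-safe: no exit handlers, no stdio */
}
```

In this sketch a regular child exit would pass `from_signal = 0`, while the kill signal handler would pass `from_signal = 1` and therefore always take the `_exit()` path.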
---------
Co-authored-by: Yuan Wang <wangyuancode@163.com>
This PR fixes
https://github.com/redis/redis/issues/14056#issuecomment-3026114590
## Summary
evport uses `eventLoop->events[fd].mask` to determine whether to remove
the event, but ae.c calls `aeApiDelEvent()` before updating
`eventLoop->events[fd].mask`. As a result, evport always sees the old
value, and `port_dissociate()` is never called to remove the fd.
This issue may not surface easily in a non-multithreaded setup, but in
the multi-threaded case we frequently reassign fds to different threads,
which makes the crash much more likely to occur.
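The ordering problem can be shown with a toy model. Everything here (`toy_event`, `toy_backend_del`, and the two delete functions) is hypothetical and only mimics the behaviour described above; it is not the ae.c or evport code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NONE     0
#define TOY_READABLE 1
#define TOY_WRITABLE 2

typedef struct {
    int mask;        /* events the loop believes are registered */
    int associated;  /* 1 while the backend still tracks the fd */
} toy_event;

/* Backend delete hook: like the evport behaviour described above, it
 * looks at the stored mask to decide whether to drop the fd. */
void toy_backend_del(toy_event *ev) {
    assert(ev != NULL);
    if (ev->mask == TOY_NONE) ev->associated = 0;
}

/* Buggy order: the backend runs before the mask is updated, so it
 * still sees the old mask and never drops the fd. */
void toy_delete_buggy(toy_event *ev, int mask) {
    toy_backend_del(ev);
    ev->mask &= ~mask;
}

/* Fixed order: update the mask first, then let the backend decide. */
void toy_delete_fixed(toy_event *ev, int mask) {
    ev->mask &= ~mask;
    toy_backend_del(ev);
}
```

With a single readable event registered, `toy_delete_buggy` clears the mask but leaves `associated` set, while `toy_delete_fixed` clears both.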
The test aims to create listpacks of 500k, but did so with 5 insertions
of 100k each. Do it in one insertion instead, which reduces the need for
gradual listpack growth and the number of commands we send.
Apparently there are some stalls when reading the replies of the
commands, specifically in GH actions, and reducing the number of commands
seems to eliminate them.
Vector Sets deserialization was not designed to resist corrupted data,
assuming that a good checksum would mean everything is fine. However
Redis allows the user to specify extra protection via a specific
configuration option.
This commit makes the implementation more resistant, at the cost of some
slowdown. This also fixes a serialization bug that is unrelated (and has
no memory corruption effects) about the lack of the worst index /
distance serialization, that could lower the quality of a graph after
links are replaced. I'll address the serialization issues in a new PR
that will focus on that aspect alone (already work in progress).
The net result is that, when the serialization of worst index/distance is
missing (always, for now), loading vector sets is 100% slower, that is,
it takes 2 times the loading time we had before. Once the info is added,
loading will be just 10-15% slower, which is the cost of the new sanity
checks alone.
It may be worth exposing to modules whether advanced sanity checks are
needed or not. Anyway, most of the slowdown in this patch comes from
having to recompute the worst neighbor, since the detection of duplicated
and non-reciprocal links was heavily optimized with probabilistic
algorithms.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
This bug was introduced by https://github.com/redis/redis/issues/13814
When defragmenting `db->expires`, if the process exits early and
`db->expires` was modified in the meantime (e.g., FLUSHDB), we need to
check whether the previously defragmented expires is still the same as
the current one when resuming. If they differ, we should abort the
current defragmentation of expires.
However, in https://github.com/redis/redis/issues/13814, I made a
mistake by using `db->keys` and `db->expires`, as expires will never be
defragged.
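The resume check described above boils down to comparing the remembered `expires` structure with the current one. A minimal sketch, assuming hypothetical types and names (`toy_db`, `toy_defrag_state`); the real defrag code is organized differently:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the database and the saved defrag state. */
typedef struct {
    void *keys;
    void *expires;
} toy_db;

typedef struct {
    const void *expires_seen;  /* expires structure we were working on */
} toy_defrag_state;

/* Remember which expires structure was being defragmented on early exit. */
void toy_defrag_pause(toy_defrag_state *st, const toy_db *db) {
    assert(st != NULL && db != NULL);
    st->expires_seen = db->expires;
}

/* On resume, continue only if db->expires is still the same structure.
 * The comparison has to be against db->expires, not db->keys. */
int toy_defrag_can_resume(const toy_defrag_state *st, const toy_db *db) {
    assert(st != NULL && db != NULL);
    return st->expires_seen == db->expires;
}
```

If something like FLUSHDB replaces `db->expires` between the pause and the resume, `toy_defrag_can_resume` returns 0 and the stage is aborted.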
When `repl-diskless-load` is enabled on a replica, and it is in the
process of loading an RDB file, a broken connection detected by the main
channel may trigger a call to rioAbort(). This sets a flag to cause the
rdb channel to fail on the next rioRead() call, allowing it to perform
necessary cleanup.
However, there are specific scenarios where the error is checked using
rioGetReadError(), which does not account for the RIO_ABORT flag (see
[source](79b37ff535/src/rdb.c (L3098))).
As a result, the error goes undetected. The code then proceeds to
validate a module type, fails to find a match, and calls
rdbReportCorruptRDB() which logs the following error and exits the
process:
```
The RDB file contains module data I can't load: no matching module type '_________'
```
To fix this issue, the RIO_ABORT flag has been removed. Now, rioAbort()
sets both read and write error flags, so that subsequent operations and
error checks properly detect the failure.
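The idea of the fix can be illustrated with a toy model of the flags. The type and function names below (`toy_rio`, `toy_rio_abort`, and so on) and the flag values are assumptions for this sketch, not the definitions from rio.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the real values live in rio.h. */
#define TOY_RIO_READ_ERROR  (1u << 0)
#define TOY_RIO_WRITE_ERROR (1u << 1)

typedef struct {
    unsigned flags;
} toy_rio;

/* Abort marks the stream as failed in both directions, so no separate
 * "abort" flag is needed and every existing error check notices it. */
void toy_rio_abort(toy_rio *r) {
    assert(r != NULL);
    r->flags |= TOY_RIO_READ_ERROR | TOY_RIO_WRITE_ERROR;
}

int toy_rio_get_read_error(const toy_rio *r) {
    return (r->flags & TOY_RIO_READ_ERROR) != 0;
}

int toy_rio_get_write_error(const toy_rio *r) {
    return (r->flags & TOY_RIO_WRITE_ERROR) != 0;
}
```

A caller that only consults the read-error getter, as in the scenario above, now sees the failure right after the abort.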
Additional keys were added to the short read test; with that change, the
test reproduces the issue. We hit the problematic line once per key, so
my guess is that with many smaller keys, the likelihood of the connection
being killed at just the right moment increases.
Hi, as described, this implements WITHATTRIBS, a feature requested by a
few users, and indeed needed.
This was requested the first time by @rowantrollope but I was not sure
how to make it work with RESP2 and RESP3 in a clean way, hopefully
that's it.
The patch includes tests and documentation updates.
This bug was introduced in
[#13814](https://github.com/redis/redis/issues/13814), and was found by
@guybe7.
It incorrectly moved the update of `server.cronloops` from
`whileBlockedCron()` to `activeDefragTimeProc()`,
causing the cron-based timers to effectively run twice as fast when
active defrag is enabled.
As a result, memory statistics are not updated during blocked
operations.
This fix is taken from https://github.com/redis/redis/pull/13995. Because
it needs to be backported, it is submitted as a separate PR.
# Add LOLWUT 8: TAPE MARK I - Computer Poetry Generation
This PR introduces LOLWUT 8, implementing Nanni Balestrini's
groundbreaking TAPE MARK I algorithm from 1962 - one of the first
experiments in computer-generated poetry.
## Background
TAPE MARK I, created by Italian poet Nanni Balestrini and published in
Almanacco Letterario Bompiani (1962), represents a [pioneering moment in
computational creativity](https://en.wikipedia.org/wiki/Digital_poetry).
Using an IBM 7090 mainframe, Balestrini developed an algorithm that
combines verses from three different literary sources:
1. **Diary of Hiroshima** by Michihito Hachiya
2. **The Mystery of the Elevator** by Paul Goldwin
3. **Tao Te Ching** by Lao Tse
The algorithm selects and arranges verses based on metrical
compatibility rules and ensures alternation between different literary
sources, creating unique poetic combinations with each execution.
## Implementation
This LOLWUT command faithfully reproduces Balestrini's original
algorithm.
The main difference is that the default output is in English rather than
Italian. Note, however, that Balestrini used three poems that were not in
Italian anyway, so translation was already part of the process. In the
English versions, I sometimes made minimal changes to preserve the meter
or to make sure that the sentence stands on its own (like adding "it"
before "expands rapidly").
## Cultural Significance
TAPE MARK I predates most computational art experiments and demonstrates
the early intersection of literature, technology, and algorithmic
creativity. This implementation honors that pioneering work while making
it accessible to a modern audience through Redis's LOLWUT tradition.
Each execution generates a unique poem, just as Balestrini intended.
Trivia: the original code, running on an IBM 7090, used six minutes to
generate each verse :D
**IMPORTANT** This commit should be back-ported to Redis 8.
Hi all, this PR fixes two things:
1. An assertion that prevented RDB loading from recovering if there
was a quantization type mismatch (with regression test).
2. Two code paths that just returned NULL without proper cleanup during
RDB loading.
This PR adds support for REDISMODULE_OPTIONS_HANDLE_IO_ERRORS,
and tests for short read and corrupted RESTORE payload.
Please note that I also removed the comment about async loading support,
since we should already be covered: Vector Sets do not manipulate global
data structures, except for the unique ID used to create new
vector sets with different IDs.
This PR fixes an issue in the CI test for client-output-buffer-limit,
which was causing an infinite loop when running on macOS 15.4.
### Problem
This test starts two clients, R and R1:
```c
R1 subscribe foo
R publish foo bar
```
When R executes `PUBLISH foo bar`, the server first stores the message
`bar` in R1's buf. Only when the space in buf is insufficient does it
call `_addReplyProtoToList`.
Inside this function, `closeClientOnOutputBufferLimitReached` is invoked
to check whether R1's output buffer has reached its configured limit.
On macOS 15.4, because the server writes to the client at a high speed,
R1's buf never gets full. As a result,
`closeClientOnOutputBufferLimitReached` is never triggered in the test,
so the test never exits and falls into an infinite loop.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
This PR replaces cJSON with a home-made parser designed for the kind of
access pattern the FILTER option of VSIM performs on JSON objects. The
main points here are:
* cJSON forces us to parse the whole JSON, create a graph of cJSON
objects, then we need to seek in O(N) to find the right field.
* The cJSON object associated with the value is not of the same format
as the expr.c virtual machine. We needed a conversion function doing
more allocation and work.
* Right now we only support top level fields in the JSON object, so a
full parser is not needed.
With all these things in mind, and after carefully profiling the old
code, I realized that a specialized parser able to parse JSON in a
zero-allocation fashion, and to actually parse only the value associated
with our key, would be much more efficient. Moreover, after this change,
the dependencies of Vector Sets on external code drop to zero, and the
code is 3000 lines shorter. The new line count with LOC
is 4200, making Vector Sets easily the smallest full-featured
implementation of a vector store available.
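To illustrate the general technique (not the parser added by this PR), here is a minimal sketch of a zero-allocation scanner that locates the value of one top-level field and skips everything else by balancing brackets. All names (`find_top_level_field` and the helpers) are hypothetical, and like any skipping parser it does not validate what it skips.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static const char *skip_ws(const char *p) {
    while (*p && isspace((unsigned char)*p)) p++;
    return p;
}

/* p points at an opening quote; returns the position just past the
 * closing quote, or NULL if the string is unterminated. */
static const char *skip_string(const char *p) {
    p++;
    while (*p && *p != '"') {
        if (*p == '\\' && p[1]) p++;   /* skip the escaped character */
        p++;
    }
    return *p == '"' ? p + 1 : NULL;
}

/* Skips one value without building anything. Containers are skipped
 * purely by balancing brackets outside of strings. */
static const char *skip_value(const char *p) {
    if (*p == '"') return skip_string(p);
    if (*p == '{' || *p == '[') {
        int depth = 0;
        while (*p) {
            if (*p == '"') {
                p = skip_string(p);
                if (p == NULL) return NULL;
                continue;
            }
            if (*p == '{' || *p == '[') depth++;
            else if ((*p == '}' || *p == ']') && --depth == 0) return p + 1;
            p++;
        }
        return NULL;
    }
    /* Number, true, false, null: runs until a delimiter. */
    const char *start = p;
    while (*p && *p != ',' && *p != '}' && *p != ']' &&
           !isspace((unsigned char)*p)) p++;
    return p > start ? p : NULL;
}

/* Returns a pointer into json at the start of the field's value and
 * stores the value length in *len, or returns NULL if not found. */
const char *find_top_level_field(const char *json, const char *field,
                                 size_t *len) {
    assert(json != NULL && field != NULL && len != NULL);
    size_t flen = strlen(field);
    const char *p = skip_ws(json);
    if (*p != '{') return NULL;
    p = skip_ws(p + 1);
    while (*p == '"') {
        const char *key = p + 1;
        const char *key_end = skip_string(p);
        if (key_end == NULL) return NULL;
        size_t klen = (size_t)(key_end - 1 - key);
        p = skip_ws(key_end);
        if (*p != ':') return NULL;
        const char *val = skip_ws(p + 1);
        const char *val_end = skip_value(val);
        if (val_end == NULL) return NULL;
        if (klen == flen && memcmp(key, field, flen) == 0) {
            *len = (size_t)(val_end - val);
            return val;
        }
        p = skip_ws(val_end);
        if (*p != ',') return NULL;    /* end of object: field not found */
        p = skip_ws(p + 1);
    }
    return NULL;
}
```

The returned pointer refers to the original buffer, so the caller can convert just that one value to whatever representation it needs without copying the rest of the object.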
# Speedup achieved
In a dataset with JSON objects with 30 fields, 1 million elements, the
following query shows a 3.5x speedup:
`vsim vectors:million ele ele943903 FILTER ".field29 > 1000 and .field15 < 50"`
Please note that we get **3.5x speedup** in the VSIM command itself.
This means that the actual JSON parsing speedup is significantly greater
than that. However, in Redis land, under my past kingdom of many years
ago, the rule was that an improvement would produce speedups that are
*user facing*. This PR definitely qualifies.
What is interesting is that even with a JSON containing a single element
the speedup is about 70%, so we are faster even in the worst case.
# Further info
Note that the new skipping parser may happily process JSON objects that
are not perfectly valid, as long as they look valid from the POV of
balancing [] and {} and so forth. This should not be an issue. In any
case, invalid JSON produces random results (the element may be skipped
altogether even if it would pass the filter).
Please feel free to ask me anything about the new implementation before
merging.
Since `io-threads-do-reads` has been deprecated as of
https://github.com/redis/redis/pull/13695, it should have been removed
from the normal config list and kept only in the deprecated config list,
but we forgot to do so. This PR fixes that.
Thanks @YaacovHazan for reporting this.
Used the augment agent to fix a given commands.json
Agent summary:
I've successfully fixed the `vectorset-commands.json` file to make it
coherent with the standard command files under `src/commands`. Here's a
summary of the changes I made:
1. Changed `type: "enum"` with `enum: ["TOKEN"]` to use the standard
format:
- For fixed tokens: `token: "TOKEN"` and `type: "pure-token"`
- For multiple choice options: `type: "oneof"` with nested arguments
2. Added missing fields to each command:
- `arity`: The number of arguments the command takes
- `function`: The C function that implements the command
- `command_flags`: Flags that describe the command's behavior
3. Reorganized the structure to match the standard format:
- Moved `group` and `since` to be consistent with other command files
- Properly structured the arguments with the correct types
4. Fixed the `multiple` attribute for parameters that can accept
multiple values
These changes make the vectorset-commands.json file consistent with the
standard command files under src/commands, while still keeping it as a
single file containing all the vector set commands as requested.
### Problem
A previous PR (https://github.com/redis/redis/pull/13932) fixed the TCP
port issue in CLUSTER SLOTS, but it seems the handling of the TLS port
was overlooked.
There is this comment in the `addNodeToNodeReply` function in the
`cluster.c` file:
```c
/* Report TLS ports to TLS client, and report non-TLS port to non-TLS client. */
addReplyLongLong(c, clusterNodeClientPort(node, shouldReturnTlsInfo()));
addReplyBulkCBuffer(c, clusterNodeGetName(node), CLUSTER_NAMELEN);
```
### Fixed
This PR fixes the TLS port issue and adds relevant tests.
This PR fixes the lag calculation by ensuring that when a consumer group's
last_id is behind the first entry, the consumer group's entries read is
considered invalid and is recalculated from the start of the stream.
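One plausible reading of this rule is sketched below. The types, field names, and the recalculation itself (treating every entry still in the stream as unread) are assumptions made for illustration; the actual stream code handles more cases.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the stream structures. */
typedef struct { uint64_t ms, seq; } toy_stream_id;

typedef struct {
    toy_stream_id first_id;   /* ID of the first entry still in the stream */
    uint64_t length;          /* entries currently in the stream */
    uint64_t entries_added;   /* entries ever added */
} toy_stream;

typedef struct {
    toy_stream_id last_id;    /* last ID delivered to the group */
    uint64_t entries_read;    /* counter that may be stale */
} toy_group;

int toy_id_cmp(const toy_stream_id *a, const toy_stream_id *b) {
    if (a->ms != b->ms) return a->ms < b->ms ? -1 : 1;
    if (a->seq != b->seq) return a->seq < b->seq ? -1 : 1;
    return 0;
}

uint64_t toy_group_lag(const toy_stream *s, const toy_group *g) {
    assert(s->entries_added >= s->length);
    uint64_t entries_read = g->entries_read;
    if (toy_id_cmp(&g->last_id, &s->first_id) < 0) {
        /* The group is behind the first entry: ignore the stored counter
         * and count from the start of the stream, so that every entry
         * still present is considered unread. */
        entries_read = s->entries_added - s->length;
    }
    assert(entries_read <= s->entries_added);
    return s->entries_added - entries_read;
}
```

In this model, a group whose `last_id` precedes `first_id` always reports a lag equal to the current stream length, regardless of what its stored counter says.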
Supplement to PR #13473. Close #13957
Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
This MR includes minor improvements and grammatical fixes in the
documentation. Specifically:
• Corrected grammatical mistakes in sentences for better clarity.
• Fixed typos and improved phrasing to enhance readability.
• Ensured consistency in terminology and sentence structure.
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Close https://github.com/redis/redis/issues/13892
`CONFIG SET port` updates `server.port`, and `CLUSTER SLOTS` returns
information about cluster slots and their associated nodes. The fix
updates this node information when `CONFIG SET port` runs, so that
`CLUSTER SLOTS` returns the right value.
From the master's perspective, the replica can become online before it
has actually finished loading the RDB file.
This has always been the case with disk-based replication, and is
therefore acceptable with diskless replication and the rdb channel too.
In this test, because all the keys are added before the backlog is
created, the replication offset is 0, so the test proceeds and could get
a LOADING error when trying to run the function.
If the HGETEX command deletes the only field due to lazy expiry, Redis
currently sends the `del` KSN (Keyspace Notification) first, followed by
the `hexpired` KSN. The order should be reversed: `hexpired` should be
sent first and `del` later.
Additional changes: more test coverage for HGETDEL KSN
---------
Co-authored-by: hristosko <hristosko.chaushev@redis.com>
This test was introduced by https://github.com/redis/redis/issues/13853
The test checks whether the client is in blocked status, but if the async
flushdb completes before that check, the test fails.
So modify the test to only check that `lazyfree_pending_objects` is
correct, which ensures that flushdb is async, that is, that the client
must have been blocked.