## Crash fix
### Current behavior
We might crash if we fail to collect some of the threads' output. If it exceeds timeout for example.
The threads mngr API guarantees that the output array length will be `tids_len`, however, some
indices can be NULL, in case it fails to collect some of the threads' outputs.
When we use the threads mngr to collect the threads' stacktraces, we rely on this and skip NULL
entries. Since the output array was allocated with malloc, instead of NULL, it contained garbage,
so we got a segmentation fault when trying to read this garbage. (in debug.c:writeStacktraces() )
### fix
Allocate the global output array with zcalloc.
### To reproduce the bug, you'll have to change the code:
**in threadsmngr:ThreadsManager_runOnThreads():**
make sure the g_output_array allocation is initialized with garbage and not 0s
(add `memset(g_output_array, 2, sizeof(void*) * tids_len);` below the allocation).
Force one of the threads to write to the array:
add a global var: `static redisAtomic size_t return_now = 0;`
add to `invoke_callback()` before writing to the output array:
```
size_t i_return;
atomicGetIncr(return_now, i_return, 1);
if(i_return == 1) return;
```
compile, start the server with `--enable-debug-command local` and run `redis-cli debug assert`
The assertion triggers the the stacktrace collection.
Expect to get 2 prints of the stack trace - since we get the segmentation fault after we return from
the threads mngr, it can be safely triggered again.
## Added global variables r/w lock in ThreadsManager
To avoid a situation where the main thread runs `ThreadsManager_cleanups` while threads are still
invoking the signal handler, we use a r/w lock.
For cleanups, we will acquire the write lock.
The threads will acquire the read lock to enable them to write simultaneously.
If we fail to acquire the read lock, it means cleanups are in progress and we return immediately.
After acquiring the lock we can safely check that the global output array wasn't nullified and proceed
to write to it.
This way we ensure the threads are not modifying the global variables/ trying to write to the output
array after they were zeroed/nullified/destroyed(the semaphore).
## other minor logging change
1. removed logging if the semaphore times out because the threads can still write to the output array
after this check. Instead, we print the total number of printed stacktraces compared to the exacted
number (len_tids).
2. use noinline attribute to make sure the uplevel number of ignored stack trace entries stays correct.
3. improve testing
Co-authored-by: Oran Agra <oran@redislabs.com>
Redis Test Suite
The normal execution mode of the test suite involves starting and manipulating
local redis-server instances, inspecting process state, log files, etc.
The test suite also supports execution against an external server, which is
enabled using the --host and --port parameters. When executing against an
external server, tests tagged external:skip are skipped.
There are additional runtime options that can further adjust the test suite to match different external server configurations:
| Option | Impact |
|---|---|
--singledb |
Only use database 0, don't assume others are supported. |
--ignore-encoding |
Skip all checks for specific encoding. |
--ignore-digest |
Skip key value digest validations. |
--cluster-mode |
Run in strict Redis Cluster compatibility mode. |
--large-memory |
Enables tests that consume more than 100mb |
Tags
Tags are applied to tests to classify them according to the subsystem they test, but also to indicate compatibility with different run modes and required capabilities.
Tags can be applied in different context levels:
start_servercontexttagscontext that bundles several tests together- A single test context.
The following compatibility and capability tags are currently used:
| Tag | Indicates |
|---|---|
external:skip |
Not compatible with external servers. |
cluster:skip |
Not compatible with --cluster-mode. |
large-memory |
Test that requires more than 100mb |
tls:skip |
Not compatible with --tls. |
needs:repl |
Uses replication and needs to be able to SYNC from server. |
needs:debug |
Uses the DEBUG command or other debugging focused commands (like OBJECT REFCOUNT). |
needs:pfdebug |
Uses the PFDEBUG command. |
needs:config-maxmemory |
Uses CONFIG SET to manipulate memory limit, eviction policies, etc. |
needs:config-resetstat |
Uses CONFIG RESETSTAT to reset statistics. |
needs:reset |
Uses RESET to reset client connections. |
needs:save |
Uses SAVE or BGSAVE to create an RDB file. |
When using an external server (--host and --port), filtering using the
external:skip tags is done automatically.
When using --cluster-mode, filtering using the cluster:skip tag is done
automatically.
When not using --large-memory, filtering using the largemem:skip tag is done
automatically.
In addition, it is possible to specify additional configuration. For example, to
run tests on a server that does not permit SYNC use:
./runtest --host <host> --port <port> --tags -needs:repl