mirror of
https://github.com/redis/redis.git
synced 2026-04-21 03:01:35 -04:00
## Introduction

Redis introduced IO threads in 6.0, allowing them to handle client request reading, command parsing, and reply writing, thereby improving performance. The current IO thread implementation has a few drawbacks:

- The main thread is blocked during IO thread read/write operations and must wait for all IO threads to complete their current tasks before it can continue execution. In other words, the entire process is synchronous, which prevents efficient use of multi-core CPUs for parallel processing.
- When the number of clients and requests increases moderately, all IO threads reach full CPU utilization because of the busy-wait mechanism they use. This makes it hard to determine which part of Redis has become the bottleneck.
- When IO threads are enabled with TLS and `io-threads-do-reads`, a disconnection of a connection with pending data may result in it being assigned to multiple IO threads simultaneously. This can cause race conditions and trigger assertion failures. Related issue: https://github.com/redis/redis/issues/12540

Therefore, we designed an asynchronous IO threads solution. The IO threads adopt an event-driven model: the main thread is dedicated to command processing, while the IO threads handle client read and write operations in parallel.

## Implementation

### Overall

As before, all client commands must still be executed on the main thread, because Redis was originally designed to be single-threaded, and processing commands in a multi-threaded manner would inevitably introduce numerous race and synchronization issues. But now each IO thread has an independent event loop, so IO threads can use a multiplexing approach to handle client read and write operations, eliminating the CPU overhead caused by busy-waiting.
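To illustrate the difference between multiplexed waiting and busy-waiting, here is a minimal sketch. It assumes POSIX `poll` and uses a pipe as a stand-in for a client connection; Redis itself uses its own event loop, and the helper names `wait_readable` and `demo_event_driven_wait` are hypothetical, not Redis functions.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper (not a Redis function): wait until fd is readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. The calling thread
 * sleeps inside poll() instead of spinning on a shared variable. */
int wait_readable(int fd, int timeout_ms) {
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0) return -1;
    return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Demo: a pipe stands in for a client connection. Returns 1 when the read
 * end is reported idle before the write and readable after it. */
int demo_event_driven_wait(void) {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    int before = wait_readable(fds[0], 0);  /* nothing written yet */
    char byte = 'x';
    ssize_t written = write(fds[1], &byte, 1);
    int after = wait_readable(fds[0], 0);   /* one byte is pending */

    close(fds[0]);
    close(fds[1]);
    return written == 1 && before == 0 && after == 1;
}
```

With a negative timeout, `poll` blocks until an event arrives, so an idle thread consumes no CPU. That is the property the per-thread event loop relies on.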
The execution process can be briefly described as follows:

1. The main thread assigns clients to IO threads after accepting connections.
2. IO threads notify the main thread when clients finish reading and parsing queries.
3. The main thread processes the queries from IO threads and generates replies.
4. IO threads write the replies to clients after receiving the client list from the main thread, and then continue to handle client read and write events.

### Each IO thread has independent event loop

We now assign each IO thread its own event loop. This approach eliminates the need for the main thread to perform the costly `epoll_wait` operation for handling connections (except for specific ones). Instead, the main thread processes requests from the IO threads and hands them back once completed, fully offloading read and write events to the IO threads.

Additionally, all TLS operations, including handling pending data, have been moved entirely to the IO threads. This resolves the issue where `io-threads-do-reads` could not be used with TLS.

### Event-notified client queue

To facilitate communication between the IO threads and the main thread, we designed an event-notified client queue. Each IO thread and the main thread have two such queues to store clients waiting to be processed. These queues are also integrated with the event loop to enable handling. We use `pthread_mutex` to ensure the safety of queue operations, as well as data visibility and ordering. Lock contention is minimized because each IO thread and the main thread operate on independent queues, which avoids thread suspension. We also implemented an event notifier based on `eventfd` or `pipe` to support event-driven handling.

### Thread safety

Since the main thread and IO threads can execute in parallel, we must handle data race issues carefully.

**client->flags**

The primary tasks of IO threads are reading and writing, i.e. `readQueryFromClient` and `writeToClient`.
However, IO threads and the main thread may concurrently modify or access `client->flags`, leading to potential race conditions. To address this, we introduced an io-flags variable to record operations performed by IO threads, thereby avoiding race conditions on `client->flags`.

**Pause IO thread**

In the main thread, we may want to operate on data owned by IO threads, for example to uninstall an event handler, access or modify the query/output buffer, or resize the event loop. We need a clean and safe context to do that. We pause the IO thread in `IOThreadBeforeSleep`, do the work, and then resume it. To avoid suspending the thread, we use busy waiting to confirm the target status. In addition, we use atomic variables to ensure memory visibility and ordering. We introduce the following functions to pause/resume IO threads.

```
pauseIOThread, resumeIOThread
pauseAllIOThreads, resumeAllIOThreads
pauseIOThreadsRange, resumeIOThreadsRange
```

Testing has shown that `pauseIOThread` is highly efficient, allowing the main thread to execute nearly 200,000 operations per second during stress tests. Similarly, `pauseAllIOThreads` with 8 IO threads can handle up to nearly 56,000 operations per second. But operations performed between pausing and resuming IO threads must be quick; otherwise, they could cause the IO threads to reach full CPU utilization.

**freeClient and freeClientAsync**

The main thread may need to terminate a client currently running on an IO thread, for example, due to ACL rule changes, reaching the output buffer limit, or evicting a client. In such cases, we need to pause the IO thread to safely operate on the client.

**maxclients and maxmemory-clients updating**

When adjusting `maxclients`, we need to resize the event loop for all IO threads. Similarly, when modifying `maxmemory-clients`, we need to traverse all clients to calculate their memory usage. To ensure safe operations, we pause all IO threads during these adjustments.
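The pause/resume handshake described under "Pause IO thread" can be sketched as a small state machine on one atomic variable. This is an illustrative sketch only: the state names, the `io_thread_sketch` type, and every function below are hypothetical and are not taken from the Redis source. The functions are non-blocking steps; in a real multi-threaded setting the main thread would spin on `is_paused` after `request_pause`, and the IO thread would spin on `io_thread_safe_point` until it returns 1.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical states for this sketch. */
enum pause_state { UNPAUSED, PAUSING, PAUSED, RESUMING };

typedef struct {
    atomic_int state; /* accessed by both threads, hence atomic */
} io_thread_sketch;

void sketch_init(io_thread_sketch *t) { atomic_init(&t->state, UNPAUSED); }

/* Main thread: ask the IO thread to pause at its next safe point. */
void request_pause(io_thread_sketch *t) { atomic_store(&t->state, PAUSING); }

/* Main thread: has the IO thread acknowledged the pause yet? */
int is_paused(io_thread_sketch *t) { return atomic_load(&t->state) == PAUSED; }

/* Main thread: release the IO thread after the protected work is done. */
void request_resume(io_thread_sketch *t) { atomic_store(&t->state, RESUMING); }

/* IO thread: called at a safe point (for example before sleeping in its
 * event loop). Returns 1 if the thread may continue handling clients,
 * 0 if it must keep waiting. */
int io_thread_safe_point(io_thread_sketch *t) {
    switch (atomic_load(&t->state)) {
    case PAUSING:
        atomic_store(&t->state, PAUSED); /* acknowledge the request */
        return 0;
    case PAUSED:
        return 0;                        /* main thread is still working */
    case RESUMING:
        atomic_store(&t->state, UNPAUSED);
        return 1;
    default:
        return 1;                        /* UNPAUSED: nothing to do */
    }
}
```

Because the main thread only writes `PAUSING` and `RESUMING`, and the IO thread only writes `PAUSED` and `UNPAUSED`, each transition has a single writer. The default sequentially consistent atomics also make the main thread's changes to client data visible to the IO thread once it observes `RESUMING`.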
**Client info reading**

The main thread may need to read a client’s fields to generate a descriptive string, such as for the `CLIENT LIST` command or logging purposes. In such cases, we need to pause the IO thread handling that client. If information for all clients needs to be displayed, all IO threads must be paused.

**Tracking redirect**

Redis supports the tracking feature and can even send invalidation messages to a connection with a specified ID. But the target client may be running on an IO thread, so directly manipulating the client’s output buffer is not thread-safe, and the IO thread may not be aware that the client requires a response. In such cases, we pause the IO thread handling the client, modify the output buffer, and install a write event handler to ensure proper handling.

**clientsCron**

In the `clientsCron` function, the main thread needs to traverse all clients to perform operations such as timeout checks, verifying whether they have reached the soft output buffer limit, resizing the output/query buffer, or updating memory usage. To safely operate on a client, the IO thread handling that client must be paused.

Pausing the IO thread for each client individually would be very inefficient. Conversely, pausing all IO threads simultaneously would be costly, especially when there are many IO threads, as `clientsCron` is invoked relatively frequently. To address this, we adopted a batched approach: at most 8 IO threads are paused at a time. The operations mentioned above are only performed on clients running in the paused IO threads, significantly reducing overhead while maintaining safety.

### Observability

In the current design, the main thread always assigns clients to the IO thread with the fewest clients. To make the number of clients handled by each IO thread observable, we added a new section to the INFO output. The `INFO THREADS` section shows the client count for each IO thread.
```
# Threads
io_thread_0:clients=0
io_thread_1:clients=2
io_thread_2:clients=2
```

Additionally, in the `CLIENT LIST` output, we added a field to indicate the thread to which each client is assigned.

`id=244 addr=127.0.0.1:41870 laddr=127.0.0.1:6379 ... resp=2 lib-name= lib-ver= io-thread=1`

## Trade-off

### Special Clients

For certain special types of clients, keeping them running on IO threads would result in severe race issues that are difficult to resolve. Therefore, we chose not to offload these clients to the IO threads.

For replica, monitor, subscribe, and tracking clients, the main thread may write a reply to them directly when conditions are met. Race issues are difficult to resolve, so we have them processed in the main thread. This includes the Lua debug clients as well, since we may operate on the connection directly.

For a blocking client, after the IO thread reads and parses a command and hands it over to the main thread, if the client is identified as a blocking type, it remains in the main thread. Once the blocking operation completes and the reply is generated, the client is transferred back to the IO thread to send the reply and wait for event triggers.

### Clients Eviction

To support client eviction, it is necessary to update each client’s memory usage promptly during operations such as read, write, or command execution. However, when a client operates on an IO thread, it is not feasible to update the memory usage immediately due to the risk of data races. As a result, memory usage can only be updated either in the main thread while processing commands or periodically in `clientsCron`. The downside of this approach is that updates might experience a delay of up to one second, which could impact the precision of memory management for eviction.

To avoid incorrectly evicting clients, we adopted a best-effort compensation solution: when we decide to evict a client, we update its memory usage again before evicting. If the memory used by the client does not decrease or its memory usage bucket is not changed, we evict it; otherwise, we do not. However, we have not completely solved this problem. Due to the delay in memory usage updates, we may still make incorrect decisions about the need to evict clients.

### Defragment

In the majority of cases we do NOT use the data from argv directly in the db.

1. key names
   We store a copy that we allocate in the main thread, see `sdsdup()` in `dbAdd()`.
2. hash key and value
   We store the key as hfield and the value as sds, see `hfieldNew()` and `sdsdup()` in `hashTypeSet()`.
3. other datatypes
   They don't even use SDS, so there are no reference issues.

But in some cases the data from argv may be retained by the main thread. As a result, during fragmentation cleanup, we need to move allocations from the IO thread’s arena to the main thread’s arena. We always allocate new memory in the main thread’s arena, but the memory released by IO threads may not yet have been reclaimed. This ultimately causes the fragmentation rate to be higher compared to creating and allocating entirely within a single thread.

The following cases lead to memory allocated by the IO thread being kept by the main thread.

1. string related commands: `append`, `getset`, `mset` and `set`.
   If `tryObjectEncoding()` does not change argv, we keep it directly in the main thread, see the code in `tryObjectEncoding()` (specifically `trimStringObjectIfNeeded()`).
2. block related commands.
   The key names are kept in `c->db->blocking_keys`.
3. watch command
   The key names are kept in `c->db->watched_keys`.
4. [s]subscribe command
   The channel name is kept in `serverPubSubChannels`.
5. script load command
   The script is kept in `server.lua_scripts`.
6. some module API: `RM_RetainString`, `RM_HoldString`

Those issues will be handled in other PRs.

## Testing

### Functional Testing

The commit with IO threads enabled has passed all TCL tests, but we made some changes:

**Client query buffer**: In the original code, when using a reusable query buffer, ownership of the query buffer would be released after the command was processed. However, with IO threads enabled, the client transitions from an IO thread to the main thread for processing. This causes the ownership release to occur earlier than the command execution. As a result, when IO threads are enabled, the client's information will never indicate that a shared query buffer is in use. Therefore, we skip the corresponding query buffer tests in this case.

**Defragment**: Added a new defragmentation test to verify the effect of IO threads on defragmentation.

**Command delay**: For deferred clients in TCL tests, delays may occur because clients are assigned to different threads for execution. To address this, we introduced conditional waiting: the process proceeds to the next step only when the `client list` contains the corresponding commands.

### Sanitizer Testing

The commit passed all TCL tests and reported no errors when compiled with the `-fsanitize=thread` and `-fsanitize=address` options enabled. But we made the following modification: we suppressed the sanitizer warnings for clients with watched keys when updating `client->flags`. IO threads read `client->flags` but never modify it or read the `CLIENT_DIRTY_CAS` bit, and the main thread only modifies this bit, so there is no actual data race.

## Others

### IO thread number

In the new multi-threaded design, the main thread is primarily focused on command processing to improve performance. Typically, the main thread does not handle regular client I/O operations but is responsible for clients such as replication and tracking clients.
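Since the main thread normally keeps no regular clients, new connections are spread over the other threads. The least-clients assignment policy described under Observability can be sketched as follows. This is a minimal illustration, not the Redis implementation: the function name `pick_io_thread` is hypothetical, and it assumes that index 0 denotes the main thread, as the `INFO THREADS` sample (where `io_thread_0` reports `clients=0`) suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: choose the thread that should receive a new client.
 * clients[i] is the number of clients currently handled by thread i.
 * Assumption: index 0 is the main thread, which is skipped whenever at
 * least one other thread exists. Ties go to the lowest index. */
size_t pick_io_thread(const int *clients, size_t num_threads) {
    if (num_threads <= 1) return 0; /* only the main thread exists */
    size_t best = 1;
    for (size_t i = 2; i < num_threads; i++) {
        if (clients[i] < clients[best]) best = i;
    }
    return best;
}
```

With the counts from the sample output (0, 2, 2), the sketch selects thread 1. Under this numbering, N configured threads leave N-1 threads for regular client I/O.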
To avoid breaking changes, we still consider the main thread as the first IO thread. When the `io-threads` configuration is set to a low value (e.g., 2), performance does not show a significant improvement compared to a single-threaded setup for simple commands (such as SET or GET), as the main thread does not consume much CPU for these simple operations. This results in underutilized multi-core capacity. However, for more complex commands, having a low number of IO threads may still be beneficial. Therefore, it’s important to adjust `io-threads` based on your own performance tests.

Additionally, you can monitor the CPU utilization of the main thread and IO threads using `top -H -p $redis_pid`. This allows you to easily identify where the bottleneck is. If the IO thread is the bottleneck, increasing `io-threads` will improve performance. If the main thread is the bottleneck, the overall performance can only be scaled by increasing the number of shards or replicas.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: oranagra <oran@redislabs.com>
525 lines
23 KiB
Tcl
proc cmdstat {cmd} {
    return [cmdrstat $cmd r]
}

proc errorstat {cmd} {
    return [errorrstat $cmd r]
}

proc latency_percentiles_usec {cmd} {
    return [latencyrstat_percentiles $cmd r]
}

start_server {tags {"info" "external:skip"}} {
    start_server {} {

        test {latencystats: disable/enable} {
            r config resetstat
            r CONFIG SET latency-tracking no
            r set a b
            assert_match {} [latency_percentiles_usec set]
            r CONFIG SET latency-tracking yes
            r set a b
            assert_match {*p50=*,p99=*,p99.9=*} [latency_percentiles_usec set]
            r config resetstat
            assert_match {} [latency_percentiles_usec set]
        }

        test {latencystats: configure percentiles} {
            r config resetstat
            assert_match {} [latency_percentiles_usec set]
            r CONFIG SET latency-tracking yes
            r SET a b
            r GET a
            assert_match {*p50=*,p99=*,p99.9=*} [latency_percentiles_usec set]
            assert_match {*p50=*,p99=*,p99.9=*} [latency_percentiles_usec get]
            r CONFIG SET latency-tracking-info-percentiles "0.0 50.0 100.0"
            assert_match [r config get latency-tracking-info-percentiles] {latency-tracking-info-percentiles {0 50 100}}
            assert_match {*p0=*,p50=*,p100=*} [latency_percentiles_usec set]
            assert_match {*p0=*,p50=*,p100=*} [latency_percentiles_usec get]
            r config resetstat
            assert_match {} [latency_percentiles_usec set]
        }

        test {latencystats: bad configure percentiles} {
            r config resetstat
            set configlatencyline [r config get latency-tracking-info-percentiles]
            catch {r CONFIG SET latency-tracking-info-percentiles "10.0 50.0 a"} e
            assert_match {ERR CONFIG SET failed*} $e
            assert_equal [s total_error_replies] 1
            assert_match [r config get latency-tracking-info-percentiles] $configlatencyline
            catch {r CONFIG SET latency-tracking-info-percentiles "10.0 50.0 101.0"} e
            assert_match {ERR CONFIG SET failed*} $e
            assert_equal [s total_error_replies] 2
            assert_match [r config get latency-tracking-info-percentiles] $configlatencyline
            r config resetstat
            assert_match {} [errorstat ERR]
        }

        test {latencystats: blocking commands} {
            r config resetstat
            r CONFIG SET latency-tracking yes
            r CONFIG SET latency-tracking-info-percentiles "50.0 99.0 99.9"
            set rd [redis_deferring_client]
            r del list1{t}

            $rd blpop list1{t} 0
            wait_for_blocked_client
            r lpush list1{t} a
            assert_equal [$rd read] {list1{t} a}
            $rd blpop list1{t} 0
            wait_for_blocked_client
            r lpush list1{t} b
            assert_equal [$rd read] {list1{t} b}
            assert_match {*p50=*,p99=*,p99.9=*} [latency_percentiles_usec blpop]
            $rd close
        }

        test {latencystats: subcommands} {
            r config resetstat
            r CONFIG SET latency-tracking yes
            r CONFIG SET latency-tracking-info-percentiles "50.0 99.0 99.9"
            r client id

            assert_match {*p50=*,p99=*,p99.9=*} [latency_percentiles_usec client\\|id]
            assert_match {*p50=*,p99=*,p99.9=*} [latency_percentiles_usec config\\|set]
        }

        test {latencystats: measure latency} {
            r config resetstat
            r CONFIG SET latency-tracking yes
            r CONFIG SET latency-tracking-info-percentiles "50.0"
            r DEBUG sleep 0.05
            r SET k v
            set latencystatline_debug [latency_percentiles_usec debug]
            set latencystatline_set [latency_percentiles_usec set]
            regexp "p50=(.+\..+)" $latencystatline_debug -> p50_debug
            regexp "p50=(.+\..+)" $latencystatline_set -> p50_set
            assert {$p50_debug >= 50000}
            assert {$p50_set >= 0}
            assert {$p50_debug >= $p50_set}
        } {} {needs:debug}

        test {errorstats: failed call authentication error} {
            r config resetstat
            assert_match {} [errorstat ERR]
            assert_equal [s total_error_replies] 0
            catch {r auth k} e
            assert_match {ERR AUTH*} $e
            assert_match {*count=1*} [errorstat ERR]
            assert_match {*calls=1,*,rejected_calls=0,failed_calls=1} [cmdstat auth]
            assert_equal [s total_error_replies] 1
            r config resetstat
            assert_match {} [errorstat ERR]
        }

        test {errorstats: failed call within MULTI/EXEC} {
            r config resetstat
            assert_match {} [errorstat ERR]
            assert_equal [s total_error_replies] 0
            r multi
            r set a b
            r auth a
            catch {r exec} e
            assert_match {ERR AUTH*} $e
            assert_match {*count=1*} [errorstat ERR]
            assert_match {*calls=1,*,rejected_calls=0,failed_calls=0} [cmdstat set]
            assert_match {*calls=1,*,rejected_calls=0,failed_calls=1} [cmdstat auth]
            assert_match {*calls=1,*,rejected_calls=0,failed_calls=0} [cmdstat exec]
            assert_equal [s total_error_replies] 1

            # MULTI/EXEC command errors should still be pinpointed to him
            catch {r exec} e
            assert_match {ERR EXEC without MULTI} $e
            assert_match {*calls=2,*,rejected_calls=0,failed_calls=1} [cmdstat exec]
            assert_match {*count=2*} [errorstat ERR]
            assert_equal [s total_error_replies] 2
        }

        test {errorstats: failed call within LUA} {
            r config resetstat
            assert_match {} [errorstat ERR]
            assert_equal [s total_error_replies] 0
            catch {r eval {redis.pcall('XGROUP', 'CREATECONSUMER', 's1', 'mygroup', 'consumer') return } 0} e
            assert_match {*count=1*} [errorstat ERR]
            assert_match {*calls=1,*,rejected_calls=0,failed_calls=1} [cmdstat xgroup\\|createconsumer]
            assert_match {*calls=1,*,rejected_calls=0,failed_calls=0} [cmdstat eval]

            # EVAL command errors should still be pinpointed to him
            catch {r eval a} e
            assert_match {ERR wrong*} $e
            assert_match {*calls=1,*,rejected_calls=1,failed_calls=0} [cmdstat eval]
            assert_match {*count=2*} [errorstat ERR]
            assert_equal [s total_error_replies] 2
        }

        test {errorstats: failed call NOSCRIPT error} {
            r config resetstat
            assert_equal [s total_error_replies] 0
            assert_match {} [errorstat NOSCRIPT]
            catch {r evalsha NotValidShaSUM 0} e
            assert_match {NOSCRIPT*} $e
            assert_match {*count=1*} [errorstat NOSCRIPT]
            assert_match {*calls=1,*,rejected_calls=0,failed_calls=1} [cmdstat evalsha]
            assert_equal [s total_error_replies] 1
            r config resetstat
            assert_match {} [errorstat NOSCRIPT]
        }

        test {errorstats: failed call NOGROUP error} {
            r config resetstat
            assert_match {} [errorstat NOGROUP]
            r del mystream
            r XADD mystream * f v
            catch {r XGROUP CREATECONSUMER mystream mygroup consumer} e
            assert_match {NOGROUP*} $e
            assert_match {*count=1*} [errorstat NOGROUP]
            assert_match {*calls=1,*,rejected_calls=0,failed_calls=1} [cmdstat xgroup\\|createconsumer]
            r config resetstat
            assert_match {} [errorstat NOGROUP]
        }

        test {errorstats: rejected call unknown command} {
            r config resetstat
            assert_equal [s total_error_replies] 0
            assert_match {} [errorstat ERR]
            catch {r asdf} e
            assert_match {ERR unknown*} $e
            assert_match {*count=1*} [errorstat ERR]
            assert_equal [s total_error_replies] 1
            r config resetstat
            assert_match {} [errorstat ERR]
        }

        test {errorstats: rejected call within MULTI/EXEC} {
            r config resetstat
            assert_equal [s total_error_replies] 0
            assert_match {} [errorstat ERR]
            r multi
            catch {r set} e
            assert_match {ERR wrong number of arguments for 'set' command} $e
            catch {r exec} e
            assert_match {EXECABORT*} $e
            assert_match {*count=1*} [errorstat ERR]
            assert_match {*count=1*} [errorstat EXECABORT]
            assert_equal [s total_error_replies] 2
            assert_match {*calls=0,*,rejected_calls=1,failed_calls=0} [cmdstat set]
            assert_match {*calls=1,*,rejected_calls=0,failed_calls=0} [cmdstat multi]
            assert_match {*calls=1,*,rejected_calls=0,failed_calls=1} [cmdstat exec]
            assert_equal [s total_error_replies] 2
            r config resetstat
            assert_match {} [errorstat ERR]
        }

        test {errorstats: rejected call due to wrong arity} {
            r config resetstat
            assert_equal [s total_error_replies] 0
            assert_match {} [errorstat ERR]
            catch {r set k} e
            assert_match {ERR wrong number of arguments for 'set' command} $e
            assert_match {*count=1*} [errorstat ERR]
            assert_match {*calls=0,*,rejected_calls=1,failed_calls=0} [cmdstat set]
            # ensure that after a rejected command, valid ones are counted properly
            r set k1 v1
            r set k2 v2
            assert_match {calls=2,*,rejected_calls=1,failed_calls=0} [cmdstat set]
            assert_equal [s total_error_replies] 1
        }

        test {errorstats: rejected call by OOM error} {
            r config resetstat
            assert_equal [s total_error_replies] 0
            assert_match {} [errorstat OOM]
            r config set maxmemory 1
            catch {r set a b} e
            assert_match {OOM*} $e
            assert_match {*count=1*} [errorstat OOM]
            assert_match {*calls=0,*,rejected_calls=1,failed_calls=0} [cmdstat set]
            assert_equal [s total_error_replies] 1
            r config resetstat
            assert_match {} [errorstat OOM]
            r config set maxmemory 0
        }

        test {errorstats: rejected call by authorization error} {
            r config resetstat
            assert_equal [s total_error_replies] 0
            assert_match {} [errorstat NOPERM]
            r ACL SETUSER alice on >p1pp0 ~cached:* +get +info +config
            r auth alice p1pp0
            catch {r set a b} e
            assert_match {NOPERM*} $e
            assert_match {*count=1*} [errorstat NOPERM]
            assert_match {*calls=0,*,rejected_calls=1,failed_calls=0} [cmdstat set]
            assert_equal [s total_error_replies] 1
            r config resetstat
            assert_match {} [errorstat NOPERM]
            r auth default ""
        }

        test {errorstats: blocking commands} {
            r config resetstat
            set rd [redis_deferring_client]
            $rd client id
            set rd_id [$rd read]
            r del list1{t}

            $rd blpop list1{t} 0
            wait_for_blocked_client
            r client unblock $rd_id error
            assert_error {UNBLOCKED*} {$rd read}
            assert_match {*count=1*} [errorstat UNBLOCKED]
            assert_match {*calls=1,*,rejected_calls=0,failed_calls=1} [cmdstat blpop]
            assert_equal [s total_error_replies] 1
            $rd close
        }

        test {errorstats: limit errors will not increase indefinitely} {
            r config resetstat
            for {set j 1} {$j <= 1100} {incr j} {
                assert_error "$j my error message" {
                    r eval {return redis.error_reply(string.format('%s my error message', ARGV[1]))} 0 $j
                }
            }

            assert_equal [count_log_message 0 "Errorstats stopped adding new errors"] 1
            assert_equal [count_log_message 0 "Current errors code list"] 1
            assert_equal "count=1" [errorstat ERRORSTATS_DISABLED]

            # Since we currently have no metrics exposed for server.errors, we use lazyfree
            # to verify that we only have 128 errors.
            wait_for_condition 50 100 {
                [s lazyfreed_objects] eq 128
            } else {
                fail "errorstats resetstat lazyfree error"
            }
        }

        test {stats: eventloop metrics} {
            set info1 [r info stats]
            set cycle1 [getInfoProperty $info1 eventloop_cycles]
            set el_sum1 [getInfoProperty $info1 eventloop_duration_sum]
            set cmd_sum1 [getInfoProperty $info1 eventloop_duration_cmd_sum]
            assert_morethan $cycle1 0
            assert_morethan $el_sum1 0
            assert_morethan $cmd_sum1 0
            after 110 ;# default hz is 10, wait for a cron tick.
            set info2 [r info stats]
            set cycle2 [getInfoProperty $info2 eventloop_cycles]
            set el_sum2 [getInfoProperty $info2 eventloop_duration_sum]
            set cmd_sum2 [getInfoProperty $info2 eventloop_duration_cmd_sum]
            if {$::verbose} { puts "eventloop metrics cycle1: $cycle1, cycle2: $cycle2" }
            assert_morethan $cycle2 $cycle1
            assert_lessthan $cycle2 [expr $cycle1+10] ;# we expect 2 or 3 cycles here, but allow some tolerance
            if {$::verbose} { puts "eventloop metrics el_sum1: $el_sum1, el_sum2: $el_sum2" }
            assert_morethan $el_sum2 $el_sum1
            assert_lessthan $el_sum2 [expr $el_sum1+100000] ;# we expect roughly 100ms here, but allow some tolerance
            if {$::verbose} { puts "eventloop metrics cmd_sum1: $cmd_sum1, cmd_sum2: $cmd_sum2" }
            assert_morethan $cmd_sum2 $cmd_sum1
            assert_lessthan $cmd_sum2 [expr $cmd_sum1+15000] ;# we expect about tens of ms here, but allow some tolerance
        }

        test {stats: instantaneous metrics} {
            r config resetstat
            set retries 0
            for {set retries 1} {$retries < 4} {incr retries} {
                after 1600 ;# hz is 10, wait for 16 cron tick so that sample array is fulfilled
                set value [s instantaneous_eventloop_cycles_per_sec]
                if {$value > 0} break
            }

            assert_lessthan $retries 4
            if {$::verbose} { puts "instantaneous metrics instantaneous_eventloop_cycles_per_sec: $value" }
            assert_morethan $value 0
            assert_lessthan $value [expr $retries*15] ;# default hz is 10
            set value [s instantaneous_eventloop_duration_usec]
            if {$::verbose} { puts "instantaneous metrics instantaneous_eventloop_duration_usec: $value" }
            assert_morethan $value 0
            assert_lessthan $value [expr $retries*22000] ;# default hz is 10, so duration < 1000 / 10, allow some tolerance
        }

        test {stats: debug metrics} {
            # make sure debug info is hidden
            set info [r info]
            assert_equal [getInfoProperty $info eventloop_duration_aof_sum] {}
            set info_all [r info all]
            assert_equal [getInfoProperty $info_all eventloop_duration_aof_sum] {}

            set info1 [r info debug]

            set aof1 [getInfoProperty $info1 eventloop_duration_aof_sum]
            assert {$aof1 >= 0}
            set cron1 [getInfoProperty $info1 eventloop_duration_cron_sum]
            assert {$cron1 > 0}
            set cycle_max1 [getInfoProperty $info1 eventloop_cmd_per_cycle_max]
            assert {$cycle_max1 > 0}
            set duration_max1 [getInfoProperty $info1 eventloop_duration_max]
            assert {$duration_max1 > 0}

            after 110 ;# hz is 10, wait for a cron tick.
            set info2 [r info debug]

            set aof2 [getInfoProperty $info2 eventloop_duration_aof_sum]
            assert {$aof2 >= $aof1} ;# AOF is disabled, we expect $aof2 == $aof1, but allow some tolerance.
            set cron2 [getInfoProperty $info2 eventloop_duration_cron_sum]
            assert_morethan $cron2 $cron1
            set cycle_max2 [getInfoProperty $info2 eventloop_cmd_per_cycle_max]
            assert {$cycle_max2 >= $cycle_max1}
            set duration_max2 [getInfoProperty $info2 eventloop_duration_max]
            assert {$duration_max2 >= $duration_max1}
        }

        test {stats: client input and output buffer limit disconnections} {
            r config resetstat
            set info [r info stats]
            assert_equal [getInfoProperty $info client_query_buffer_limit_disconnections] {0}
            assert_equal [getInfoProperty $info client_output_buffer_limit_disconnections] {0}
            # set qbuf limit to minimum to test stat
            set org_qbuf_limit [lindex [r config get client-query-buffer-limit] 1]
            r config set client-query-buffer-limit 1048576
            catch {r set key [string repeat a 1048576]}
            set info [r info stats]
            assert_equal [getInfoProperty $info client_query_buffer_limit_disconnections] {1}
            r config set client-query-buffer-limit $org_qbuf_limit
            # set outbuf limit to just 10 to test stat
            set org_outbuf_limit [lindex [r config get client-output-buffer-limit] 1]
            r config set client-output-buffer-limit "normal 10 0 0"
            r set key [string repeat a 100000] ;# to trigger output buffer limit check this needs to be big
            catch {r get key}
            set info [r info stats]
            assert_equal [getInfoProperty $info client_output_buffer_limit_disconnections] {1}
            r config set client-output-buffer-limit $org_outbuf_limit
        } {OK} {logreqres:skip} ;# same as obuf-limits.tcl, skip logreqres

        test {clients: pubsub clients} {
            set info [r info clients]
            assert_equal [getInfoProperty $info pubsub_clients] {0}
            set rd1 [redis_deferring_client]
            set rd2 [redis_deferring_client]
            # basic count
            assert_equal {1} [ssubscribe $rd1 {chan1}]
            assert_equal {1} [subscribe $rd2 {chan2}]
            set info [r info clients]
            assert_equal [getInfoProperty $info pubsub_clients] {2}
            # unsubscribe non existing channel
            assert_equal {1} [unsubscribe $rd2 {non-exist-chan}]
            set info [r info clients]
            assert_equal [getInfoProperty $info pubsub_clients] {2}
            # count change when client unsubscribe all channels
            assert_equal {0} [unsubscribe $rd2 {chan2}]
            set info [r info clients]
            assert_equal [getInfoProperty $info pubsub_clients] {1}
            # non-pubsub clients should not be involved
            assert_equal {0} [unsubscribe $rd2 {non-exist-chan}]
            set info [r info clients]
            assert_equal [getInfoProperty $info pubsub_clients] {1}
            # close all clients
            $rd1 close
            $rd2 close
            wait_for_condition 100 50 {
                [getInfoProperty [r info clients] pubsub_clients] eq {0}
            } else {
                fail "pubsub clients did not clear"
            }
        }

        test {clients: watching clients} {
            set r2 [redis_client]
            assert_equal [s watching_clients] 0
            assert_equal [s total_watched_keys] 0
            assert_match {*watch=0*} [r client info]
            assert_match {*watch=0*} [$r2 client info]
            # count after watch key
            $r2 watch key
            assert_equal [s watching_clients] 1
            assert_equal [s total_watched_keys] 1
            assert_match {*watch=0*} [r client info]
            assert_match {*watch=1*} [$r2 client info]
            # the same client watch the same key has no effect
            $r2 watch key
            assert_equal [s watching_clients] 1
            assert_equal [s total_watched_keys] 1
            assert_match {*watch=0*} [r client info]
            assert_match {*watch=1*} [$r2 client info]
            # different client watch different key
            r watch key2
            assert_equal [s watching_clients] 2
            assert_equal [s total_watched_keys] 2
            assert_match {*watch=1*} [$r2 client info]
            assert_match {*watch=1*} [r client info]
            # count after unwatch
            r unwatch
            assert_equal [s watching_clients] 1
            assert_equal [s total_watched_keys] 1
            assert_match {*watch=0*} [r client info]
            assert_match {*watch=1*} [$r2 client info]
            $r2 unwatch
            assert_equal [s watching_clients] 0
            assert_equal [s total_watched_keys] 0
            assert_match {*watch=0*} [r client info]
            assert_match {*watch=0*} [$r2 client info]

            # count after watch/multi/exec
            $r2 watch key
            assert_equal [s watching_clients] 1
            $r2 multi
            $r2 exec
            assert_equal [s watching_clients] 0
            # count after watch/multi/discard
            $r2 watch key
            assert_equal [s watching_clients] 1
            $r2 multi
            $r2 discard
            assert_equal [s watching_clients] 0
            # discard without multi has no effect
            $r2 watch key
            assert_equal [s watching_clients] 1
            catch {$r2 discard} e
            assert_equal [s watching_clients] 1
            # unwatch without watch has no effect
            r unwatch
            assert_equal [s watching_clients] 1
            # after disconnect, since close may arrive later, or the client may
            # be freed asynchronously, we use a wait_for_condition
            $r2 close
            wait_for_watched_clients_count 0
        }
    }
}

start_server {tags {"info" "external:skip"}} {
    test {memory: database and pubsub overhead and rehashing dict count} {
        r flushall
        set info_mem [r info memory]
        set mem_stats [r memory stats]
        assert_equal [getInfoProperty $info_mem mem_overhead_db_hashtable_rehashing] {0}
        assert_equal [dict get $mem_stats overhead.db.hashtable.lut] {0}
        assert_equal [dict get $mem_stats overhead.db.hashtable.rehashing] {0}
        assert_equal [dict get $mem_stats db.dict.rehashing.count] {0}
        # Initial dict expand is not rehashing
        r set a b
        set info_mem [r info memory]
        set mem_stats [r memory stats]
        assert_equal [getInfoProperty $info_mem mem_overhead_db_hashtable_rehashing] {0}
        assert_range [dict get $mem_stats overhead.db.hashtable.lut] 1 64
        assert_equal [dict get $mem_stats overhead.db.hashtable.rehashing] {0}
        assert_equal [dict get $mem_stats db.dict.rehashing.count] {0}
        # set 4 more keys to trigger rehashing
        # get the info within a transaction to make sure the rehashing is not completed
        r multi
        r set b c
        r set c d
        r set d e
        r set e f
        r info memory
        r memory stats
        set res [r exec]
        set info_mem [lindex $res 4]
        set mem_stats [lindex $res 5]
        assert_range [getInfoProperty $info_mem mem_overhead_db_hashtable_rehashing] 1 64
        assert_range [dict get $mem_stats overhead.db.hashtable.lut] 1 192
        assert_range [dict get $mem_stats overhead.db.hashtable.rehashing] 1 64
        assert_equal [dict get $mem_stats db.dict.rehashing.count] {1}
    }
}