Propagate pending write back-pressure via the
paused command channels, rather than by "blocking"
`Connection::next()` itself until a write completes.
Notably, no new buffers are introduced. When a frame write
cannot complete, command channels are paused and I/O
reads can continue while the write is pending. The paused
command channels exercise write back-pressure on the streams
and API, and ensure that the only frames we still try
to send are those produced as a result of reading a frame.
These do wait for the completion of any prior pending
send operation, but since they are sent in reaction to a
read, the remote can in turn write again, should it have
been waiting to be able to do so before it can read again.
Unexpected write deadlocks between peers that otherwise
read and write concurrently on substreams are thus avoided.
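The control flow above can be sketched as a small state machine. All names here are illustrative, not the crate's actual internals:

```rust
// Hypothetical sketch: when a frame write cannot complete, the command
// channels are paused instead of blocking `Connection::next()`. Reads
// continue, and only frames produced as a reaction to a read await the
// pending write.
struct Connection {
    write_pending: bool,
    commands_paused: bool,
}

impl Connection {
    fn new() -> Self {
        Connection { write_pending: false, commands_paused: false }
    }

    // A frame write returned pending: stop accepting new commands from
    // the streams and the API, but keep reading.
    fn on_write_pending(&mut self) {
        self.write_pending = true;
        self.commands_paused = true;
    }

    // The pending write completed: resume the command channels.
    fn on_write_complete(&mut self) {
        self.write_pending = false;
        self.commands_paused = false;
    }

    // Reads are never gated on the write state.
    fn can_read(&self) -> bool {
        true
    }
}

fn main() {
    let mut c = Connection::new();
    c.on_write_pending();
    assert!(c.commands_paused && c.can_read());
    c.on_write_complete();
    assert!(!c.commands_paused);
    println!("ok");
}
```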
By limiting the data frame payload size one gains the following
benefits:
1. Reduce delays sending time-sensitive frames, e.g. window updates.
2. Minimize head-of-line blocking across streams.
3. Enable better interleaving of send and receive operations, as each is
carried out atomically instead of concurrently with its respective
counterpart.
Limiting the frame size to 16 KiB does not introduce a large
overhead: a Yamux frame header is 12 bytes, so this change adds
an overhead of only ~0.07%.
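The overhead figure can be verified with a quick calculation; the 12-byte header is the Yamux frame header size, 16 KiB is the payload limit introduced here:

```rust
// Per-frame overhead of splitting data into 16 KiB payloads:
// the 12-byte Yamux frame header divided by the payload size.
const HEADER_LEN: f64 = 12.0;
const MAX_PAYLOAD: f64 = 16.0 * 1024.0;

fn overhead_percent() -> f64 {
    HEADER_LEN / MAX_PAYLOAD * 100.0
}

fn main() {
    // 12 / 16384 * 100 ≈ 0.07%
    println!("per-frame overhead: {:.2}%", overhead_percent());
}
```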
There are benign cases where frames from unknown streams can occur and
where a stream reset is not appropriate. See
https://github.com/paritytech/yamux/issues/110 for details.
This commit ignores data, window update and ping frames from
unknown streams and adds documentation referencing issue #110.
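A minimal sketch of this behaviour, with illustrative names rather than the crate's actual types:

```rust
use std::collections::HashSet;

// Hypothetical sketch: frames for stream IDs the connection no longer
// knows are ignored rather than answered with a stream reset, since the
// stream may have been dropped locally while the remote was still
// sending (see https://github.com/paritytech/yamux/issues/110).
#[derive(Debug, PartialEq)]
enum Action {
    Deliver,
    Ignore,
}

fn on_frame(known_streams: &HashSet<u32>, stream_id: u32) -> Action {
    if known_streams.contains(&stream_id) {
        Action::Deliver
    } else {
        Action::Ignore // benign: no reset, no connection error
    }
}

fn main() {
    let known: HashSet<u32> = [1, 3].into_iter().collect();
    assert_eq!(on_frame(&known, 1), Action::Deliver);
    assert_eq!(on_frame(&known, 7), Action::Ignore);
    println!("ok");
}
```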
Hold the lock on `Shared` long enough to prevent interleaved calls to
`next_window_update` possibly granting a receive window higher than the
configured one.
In `WindowUpdateMode::OnRead` one only grants the sender new credit once
existing bytes have been read. Thus there is no need to check whether to
send a new WindowUpdate message when receiving new bytes. One only needs
to check whether to send a new WindowUpdate message when reading bytes.
To prevent senders from being blocked every time they have sent
RECEIVE_WINDOW (e.g. 256 KiB) bytes, waiting for a WindowUpdate
message granting further sending credit, this commit makes Yamux send a
WindowUpdate message once half or more of the window has been received
(and consumed in `WindowUpdateMode::OnRead`). Benchmarking shows that
sending WindowUpdate messages early prevents senders from being blocked
waiting for additional credit on common network types.
For a detailed discussion as well as various benchmark results see
https://github.com/paritytech/yamux/issues/100.
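The early-update rule can be sketched as follows; the function name and bookkeeping are illustrative, only the half-window threshold mirrors this commit:

```rust
const RECEIVE_WINDOW: u32 = 256 * 1024;

// Hypothetical sketch of the early WindowUpdate check: once half or
// more of the receive window has been consumed, grant the consumed
// amount back to the sender instead of waiting until the window is
// fully exhausted.
fn next_window_update(current_window: u32) -> Option<u32> {
    let consumed = RECEIVE_WINDOW - current_window;
    if consumed >= RECEIVE_WINDOW / 2 {
        Some(consumed) // credit granted back to the sender
    } else {
        None
    }
}

fn main() {
    assert_eq!(next_window_update(RECEIVE_WINDOW), None); // nothing consumed
    assert_eq!(next_window_update(RECEIVE_WINDOW / 2), Some(RECEIVE_WINDOW / 2));
    assert_eq!(next_window_update(0), Some(RECEIVE_WINDOW)); // fully consumed
    println!("ok");
}
```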
Next to the above, this commit includes the following changes:
- Use `WindowUpdateMode::OnRead` in benchmarks. I would argue that
`OnRead` should be the default setting, given the importance of
flow control. With that in mind, I suggest benchmarking that
default case.
- Ignore WindowUpdate messages for unknown streams. With this commit
WindowUpdate messages are sent more aggressively, so the following
scenario could surface: a sender closes its channel, eventually
being garbage collected. The receiver, before receiving the closing
message, sends out a WindowUpdate message. The sender, unable to
associate the WindowUpdate message with a stream, resets the whole
connection. This commit ignores WindowUpdate messages for unknown
streams instead.
When sending very large messages, WindowUpdate messages are still
not returned to the sender in time, so the sender can still be
blocked in such cases. This can be prevented by splitting large
messages into
smaller Yamux frames, see
https://github.com/paritytech/yamux/issues/100#issuecomment-774461550.
This additional optimization can be done in a future commit without
interfering with the optimization introduced in this commit.
Previously the benchmarks would use an unbounded channel to simulate a
network connection. While this is helpful to measure the CPU impact of
Yamux-specific changes, it is difficult to estimate how changes
might affect network throughput in different environments.
The crate `constrained-connection` can help simulate different
environments (DSL, HSDPA, GBit LAN, ...) by providing a channel
enforcing a specified bandwidth and delay. That said, it does not
simulate the OS network stack, routers, buffers, packet loss, etc.
Thus, as one would expect, it can be used to detect trends, but
not to e.g. determine bandwidth expectations.
This commit makes the benchmarks run on top of a subset of the
connection types provided by `constrained-connection`. In addition it
retains the simulation on top of an unconstrained connection.
When simulating with low bandwidth-delay-product connections, Yamux
can deadlock in the roundtrip scenario. Imagine a client sending a
large payload exceeding the bandwidth-delay product of the
connection to a server. The server will try to echo the payload
back to the client. Once both client and server have exhausted the
buffer provided through the bandwidth-delay product of the
connection, neither will be able to make progress, as the other
won't read from the connection. This commit rewrites the
benchmarks so that only the client sends to the server, without
the server echoing back to the client.
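A back-of-the-envelope check of the deadlock condition; the link parameters below are illustrative, not taken from `constrained-connection`:

```rust
// Bandwidth-delay product: the number of bytes a connection can hold
// "in flight" before the writer must wait for the reader to drain it.
fn bdp_bytes(bandwidth_bits_per_sec: u64, delay_secs: f64) -> u64 {
    ((bandwidth_bits_per_sec / 8) as f64 * delay_secs) as u64
}

fn main() {
    // Illustrative link: ~1.5 MBit/s bandwidth, 250 ms delay.
    let bdp = bdp_bytes(1_500_000, 0.25);
    let payload: u64 = 1024 * 1024; // 1 MiB roundtrip payload

    // A payload exceeding the BDP cannot be fully in flight: if both
    // peers write such payloads without reading, neither can progress.
    assert!(payload > bdp);
    println!("BDP: {bdp} bytes, payload: {payload} bytes");
}
```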