Fix bug that was causing peers to pass the refinery even when the version exchange failed. We use futures::select instead of the system method timeout() for more fine grained control over the different return types.
Before we would start the refine() session first because it contains a
SelfHandshake that ProtocolAddr/Seed were dependent on.
Since it's an internal process that deals with locks and HostState it
feels more logical to start() stop() it last in the sequence.
Previously we would do a version exchange with our own external
address and perform a time stamp when the version exchange occured
successfully. This tuple of (external_addr, last_seen) would be shared
with other nodes in ProtocolAddr and ProtocolSeed.
This commit removes SelfHandshake. Instead, we simply set the last_seen
to now when we send our addrs in ProtocolAddr/Seed.
Rationale:
1. Doing a version exchange with ourselves requires creating two
channels, an inbound and an outbound channel. This causes unintended
message broadcasting to our own addrs in p2p.broadcast(), and race
conditions when one of the two channels disconnects.
2. Being reachable by our own node does not guarantee our node is
reachable by other nodes, so it's not a reliable method for
determining whether an address is accessible.
3. As the external addrs ends up on the greylist of other peers, the
work is essentially redundant, since they will also ping the address
via their GreylistRefinery.
One notable impact of this change is that setting the timestamp to now
means that external addresses from other nodes will be placed on the top
of our greylist. Hostlists are sorted by last_seen, with the most
recently seen timestamp on the top of the list. However, this does not
impact the refinery ordering because refinery selects peers to refine
randomly.
The one area this could impact is outbound session, since we
search for an address to connect to by looping through the hostlists
(incl. greylist) starting from index 0 (i.e. the most recently seen
node). Therefore, this commit increases the probability of connecting to
a node that has recently sent its address to us (or marking that node as
"Suspend" if we cannot connect to it).
This change required adding a new abstraction called Slot that stores a
Connector, and refactoring ManualSession significantly such that it more
closely resembles the format of other sessions.
We also use FuturesUnordered to stop process concurrently wherever
relevant.
`socket2::connect_with_timeout()` is blocking, and so would wait up to
`outbound_connect_timeout` seconds before returning even when the
underlying connection has been stopped.
To make the connection non-blocking, we wrap it in an Async object and
wait for it to complete using the socket method `take_error()`. If
there's a timeout configured, we use futures::select to return whatever
completes first. Otherwise, we wait on the connection.
For more info, see: https://github.com/rust-lang/socket2/issues/466
Also helpful: https://stackoverflow.com/questions/73101446/non-blocking-tcp-socket-fails-to-connect-using-socket2
also give the seed node enough time to run the refinery twice, to ensure
peers have at least 1 addr from the seed node (since the node that was
refined will filter out its own addr in the first addr propagation).
We introduce a new abstraction called Slot which runs a connector for
each seed specified in Settings. On `p2p.start()`, seed slots are
started but sit in a pending state until a subsequent call to
`notify()`.
This change allows us to have more fine grained control over the `Connector`
(such as allowing us to stop the `Connector` when it is in progress).
Notably there has been a change to the error handling since we no longer
return a success or failure from the seedsync process. Instead, we store
success or failure of a given `Slot` in a AtomicBool, which allows us to
notify other parts of the codebase if all attempts to seed have failed.