From 9a646c02830e273259587db6c2fda2028df5ec27 Mon Sep 17 00:00:00 2001
From: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com>
Date: Mon, 12 Dec 2022 22:00:46 +0200
Subject: [PATCH] kad-dht/: Recommend new values for Provider Record Republish and Expiration (#451)

Recommend new values for provider record republish and expiration (22h/48h) based on request-for-measurement 17 results.

Co-authored-by: Marcin Rataj
---
 kad-dht/README.md | 89 ++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 76 insertions(+), 13 deletions(-)

diff --git a/kad-dht/README.md b/kad-dht/README.md
index 5caa104..b0d7596 100644
--- a/kad-dht/README.md
+++ b/kad-dht/README.md
@@ -2,7 +2,7 @@

 | Lifecycle Stage | Maturity       | Status | Latest Revision |
 |-----------------|----------------|--------|-----------------|
-| 3A              | Recommendation | Active | r1, 2021-10-30  |
+| 3A              | Recommendation | Active | r2, 2022-12-09  |

 Authors: [@raulk], [@jhiesey], [@mxinden]

@@ -75,14 +75,22 @@ nodes, unrestricted nodes should operate in _server mode_ and restricted nodes,
 e.g. those with intermittent availability, high latency, low bandwidth, low
 CPU/RAM/Storage, etc., should operate in _client mode_.

-As an example, running the libp2p Kademlia protocol on top of the Internet,
-publicly routable nodes, e.g. servers in a datacenter, might operate in _server
+As an example, publicly routable nodes running the libp2p Kademlia protocol,
+e.g. servers in a datacenter, should operate in _server
 mode_ and non-publicly routable nodes, e.g. laptops behind a NAT and firewall,
-might operate in _client mode_. The concrete factors used to classify nodes into
+should operate in _client mode_. The concrete factors used to classify nodes into
 _clients_ and _servers_ depend on the characteristics of the network topology
-and the properties of the Kademlia DHT . Factors to take into account are e.g.
+and the properties of the Kademlia DHT. Factors to take into account are e.g.
 network size, replication factor and republishing period.

+For instance, setting the replication factor to a low value would require more
+reliable peers, whereas having higher replication factor could allow for less
+reliable peers at the cost of more overhead. Ultimately, peers that act as
+servers should help the network (i.e., provide positive utility in terms of
+availability, reachability, bandwidth). Any factor that slows down network
+operations (e.g., a node not being reachable, or overloaded) for the majority
+of times it is being contacted should instead be operating as a client node.
+
 Nodes, both those operating in _client_ and _server mode_, add another node to
 their routing table if and only if that node operates in _server mode_. This
 distinction allows restricted nodes to utilize the DHT, i.e. query the DHT,
@@ -228,7 +236,7 @@ Then we loop:
 becomes the new best peer (`Pb`).
 2. If the new value loses, we add the current peer to `Po`.
 2. If successful with or without a value, the response will contain the
- closest nodes the peer knows to the key `Key`. Add them to the candidate
+ closest nodes the peer knows to the `Key`. Add them to the candidate
 list `Pn`, except for those that have already been queried.
 3. If an error or timeout occurs, discard it.
 4. Go to 1.
@@ -256,7 +264,7 @@ type Validator interface {
 ```

 `Validate()` should be a pure function that reports the validity of a record. It
-may validate a cryptographic signature, or else. It is called on two occasions:
+may validate a cryptographic signature, or similar. It is called on two occasions:

 1. To validate values retrieved in a `GET_VALUE` query.
 2. To validate values received in a `PUT_VALUE` query before storing them in the
@@ -268,23 +276,76 @@ heuristic of the value to make the decision.

 ### Content provider advertisement and discovery

-Nodes must keep track of which nodes advertise that they provide a given key
-(CID). These provider advertisements should expire, by default, after 24 hours.
-These records are managed through the `ADD_PROVIDER` and `GET_PROVIDERS`
+There are two things at play with regard to provider record (and therefore content)
+liveness and reachability:
+
+Content needs to be reachable, despite peer churn;
+and nodes that store and serve provider records should not serve records for stale content,
+i.e., content that the original provider does not wish to make available anymore.
+
+The following two parameters help cover both of these cases.
+
+1. **Provider Record Republish Interval:** The content provider
+needs to make sure that the nodes chosen to store the provider record
+are still online when clients ask for the record. In order to
+guarantee this, while taking into account the peer churn, content providers
+republish the records they want to provide. Choosing the particular value for the
+Republish interval is network-specific and depends on several parameters, such as
+peer reliability and churn.
+
+ - For the IPFS network it is currently set to **22 hours**.
+
+2. **Provider Record Expiration Interval:** The network needs to provide
+content that content providers are still interested in providing. In other words,
+nodes should not keep records for content that content providers have stopped
+providing (aka stale records). In order to guarantee this, provider records
+should _expire_ after some interval, i.e., nodes should stop serving those records,
+unless the content provider has republished the provider record. Again, the specific
+setting depends on the characteristics of the network.
+
+ - In the IPFS DHT the Expiration Interval is set to **48 hours**.
+
+The values chosen for those parameters should be subject to continuous monitoring
+and investigation. Ultimately, the values of those parameters should balance
+the tradeoff between provider record liveness (due to node churn) and traffic overhead
+(to republish records).
+The latest parameters are based on the comprehensive study published
+in [provider-record-measurements].
+
+Provider records are managed through the `ADD_PROVIDER` and `GET_PROVIDERS`
 messages.

+It is also worth noting that the keys for provider records are multihashes. This
+is because:
+
+- Provider records are used as a rendezvous point for all the parties who have
+advertised that they store some piece of content.
+- The same multihash can be in different CIDs (e.g. CIDv0 vs CIDv1 of a SHA-256 dag-pb object,
+or the same multihash but with different codecs such as dag-pb vs raw).
+- Therefore, the rendezvous point should converge on the minimal thing everyone agrees on,
+which is the multihash, not the CID.
+
 #### Content provider advertisement

 When the local node wants to indicate that it provides the value for a given
-key, the DHT finds the closest peers to the key using the `FIND_NODE` RPC (see
+key, the DHT finds the (`k` = 20) closest peers to the key using the `FIND_NODE` RPC (see
 [peer routing section](#peer-routing)), and then sends an `ADD_PROVIDER` RPC with
-its own `PeerInfo` to each of these peers.
+its own `PeerInfo` to each of these peers. The study in [provider-record-measurements]
+proved that the replication factor of `k` = 20 is a good setting, although continuous
+monitoring and investigation may change this recommendation in the future.

 Each peer that receives the `ADD_PROVIDER` RPC should validate that the received
 `PeerInfo` matches the sender's `peerID`, and if it does, that peer should store
 the `PeerInfo` in its datastore. Implementations may choose to not store the
 addresses of the providing peer e.g. to reduce the amount of required storage or
-to prevent storing potentially outdated address information.
+to prevent storing potentially outdated address information. Implementations that choose
+to keep the network address (i.e., the `multiaddress`) of the providing peer should do it for
+a period of time that they are confident the network addresses of peers do not change after the
+provider record has been (re-)published. As with previous constant values, this is dependent
+on the network's characteristics. A safe value here is the Routing Table Refresh Interval.
+In the kubo IPFS implementation, this is set to 30 mins. After that period, peers provide
+the provider's `peerID` only, in order to avoid pointing to stale network addresses
+(i.e., the case where the peer has moved to a new network address).

 #### Content provider discovery

@@ -470,3 +531,5 @@ multiaddrs are stored in the node's peerbook.
 [ping]: https://github.com/libp2p/specs/issues/183

 [go-libp2p-xor]: https://github.com/libp2p/go-libp2p-xor
+
+[provider-record-measurements]: https://github.com/protocol/network-measurements/blob/master/results/rfm17-provider-record-liveness.md