Fix typos and improve grammar in dht.md

This commit is contained in:
Jimmy Debe
2025-11-03 16:31:25 -05:00
committed by GitHub
parent 0ff88ad40d
commit 6b2096d52b


@@ -10,16 +10,16 @@ contributors:
## Abstract
This document explians the Codex DHT (Distributed Hash Table) component.
This document explains the Codex DHT (Distributed Hash Table) component.
It is used to store Codex's signed-peer-record (SPR) entries,
as well as content identifiers (CID) for each host.
## Background
Codex is a network of nodes, identified as providers,
particapting in a decentralized peer-to-peer storage protocol.
participating in a decentralized peer-to-peer storage protocol.
The decentralized storage solution offers data durability guarantees,
incentive mechanisms and data persistenace guarantees.
incentive mechanisms and data persistence guarantees.
The Codex DHT component is a modified version of
[DiscV5 DHT](https://github.com/ethereum/devp2p/blob/master/discv5/discv5.md) protocol used on the Ethereum network.
@@ -27,7 +27,7 @@ DiscV5 is a node discovery system used to find peers who are registered on a dis
This allows a provider to publish to the network their connection information and
information about what content they are storing.
A Codex provider will support this protocol at no extra cost other than the use of resources to store node records.
This allows any proviser node to be used as the entry point for new providers to connect to live nodes on the Codex netowork.
This allows any provider node to be used as the entry point for new providers to connect to live nodes on the Codex network.
## Wire Format
@@ -39,13 +39,13 @@ A `provider` is a node running the Codex protocol and providing resources to the
To become a `provider`, the node MUST have a node identity which is used to create a node record.
The record will be shared and
stored by other `provider`s in their local routing table.
A `FINDNODEMESSAGE` messsage is used by the new `provider` to query other nodes who may choose to store the new record.
A `FINDNODEMESSAGE` message is used by the new `provider` to query other nodes who may choose to store the new record.
Once stored by one `provider`,
the new `provider` will be accessible to the network.
This record SHOULD include node identity information, connection information,
timing information, and reliability information.
Information provided in the record can be updated any time to match the live details of the `provider`.
Information provided in the record can be updated at any time to match the live details of the `provider`.
The following is the `provider` node record in the Codex network.
``` js
@@ -66,7 +66,7 @@ The following is the `provider` node record in the Codex network.
"SignedPeerRecord" : {
"seqNum": uint64, // sequence number of record update
"pubkey": PublicKey,
"ip": IpAddress, // ip address is optionaloptional
"ip": IpAddress, // ip address is optional
"tcpPort": Port,
"udpPort": Port
}
@@ -76,32 +76,32 @@ The following is the `provider` node record in the Codex network.
```
The `NodeId` is the sameused in the discv5 protocol to identitfy other nodes.
When contructing the `NodeId`,
the `provider` MUST use the Keccak256 hash function of it's `PublicKey`.
The `NodeId` is the same as that used in the discv5 protocol to identify other nodes.
When constructing the `NodeId`,
the `provider` MUST use the Keccak256 hash function of its `PublicKey`.
### Signed Peer Record
The `record`, which contains the `provider`'s connection information,
MUST be generated by the `provider`.
On the Codex network,
the `record` is identitfied a `SignedPeerRecord`, SPR.
The strucutue of an SPR is modified from the ENR,
the `record` is identified as a `SignedPeerRecord` (SPR).
The structure of an SPR is modified from the ENR,
[Ethereum Node Record](https://github.com/ethereum/devp2p/blob/master/enr.md) structure definition.
All values, excluding the `ip`, is REQUIRED in a SPR.
Which nodes and the amount of nodes in this set is described in the {routing table](#routingtable) section.
All values, excluding the `ip`, are REQUIRED in an SPR.
Which nodes and the number of nodes in this set are described in the [routing table](#routingtable) section.
The private key MUST be used to sign the `record`.
A `provider` SHOULD disregard messages from a node if the `record` is unsigned or become stale.
The `provider` SHOULD contact other live nodes to disseinate new and updated records.
The update will increase the `seqNum` then sign the new version of the `record`.
A `provider` SHOULD disregard messages from a node if the `record` is unsigned or becomes stale.
The `provider` SHOULD contact other live nodes to disseminate new and updated records.
An update increases the `seqNum`; the `provider` then signs the new version of the `record`.
### Distance calculation
#### Routing table
Each `provider` has a local routing table which stores the SPR of other nodes in the netwokr.
The routing table is accessiable by any node who knows the identity of the `provider`.
Each `provider` has a local routing table which stores the SPR of other nodes in the network.
The routing table is accessible by any node that knows the identity of the `provider`.
@@ -133,7 +133,7 @@ The routing table is accessiable by any node who knows the identity of the `prov
The `bitsPerHop` MUST indicate the minimum number of bits of a `NodeId` needed to get closer to finding the target per query.
Practically, it also tells a `provider` how often the "not in range" branch will split off.
Setting this value to 1 is the basic, non accelerated version,
Setting this value to 1 is the basic, non-accelerated version,
which will never split off the "not in range" branch and
which will result in $ \log_2 n $ hops per lookup.
Setting it higher will increase the amount of splitting on a "not in range" branch,
@@ -142,31 +142,31 @@ will result in an improvement of $ \log_{2^b} n $ hops per lookup.
- `DistanceCalculator`: value MUST be generated with
- `istart`: The range of `NodeId`s this `KBucket` covers.
This is not a simple logarithmic distance as buckets can be split over a prefix that
This is not a simple logarithmic distance, as buckets can be split over a prefix that
does not cover the `localNode` id.
- `providers`: Node entries of the KBucket are sorted according to the time they were last seen.
First entry (head) is considered the most recently seen node and
The first entry (head) is considered the most recently seen node, and
the last entry (tail) is considered the least recently seen node.
Here "seen" indicates a successful request-response.
This can also not have occured yet.
Here, "seen" indicates a successful request-response.
It is also possible that no successful request-response has occurred yet.
- `IpLimits`: The routing table IP limits are applied both to the total table
and to the individual buckets.
In each case, not only the active node entries,
but also the entries waiting in the replacement cache are accounted for.
but also the entries waiting in the replacement cache, are accounted for.
This way, the replacement cache can't get filled with nodes that then can't be added due to the limits that apply.
As entries are not verified immediately before or on entry,
it is possible that a malicious node could fill the routing table or
a specific bucket with SPRs that have `ip`s it does not control.
This would effect the node that actually owns the `ip`,
as they could have a difficult time getting its SPR distrubuted in the DHT.
This would affect the node that actually owns the `ip`,
as it could have a difficult time getting its SPR distributed in the DHT.
However, that `provider` can still search and find nodes to connect to.
There is the possiblity to set the `IPLimits` on verified `providers` only,
There is the possibility to set the `IPLimits` on verified `providers` only,
but that would allow for lookups to be done on a larger set of nodes owned by the same identity.
This is a worse alternative.
Doing lookups only on verified nodes would slow down discovery start up.
Doing lookups only on verified nodes would slow down discovery startup.
## Copyright