Compare commits

...

16 Commits

Author SHA1 Message Date
Jimmy Debe
887437973e Merge branch 'main' into dht-codex 2025-11-12 08:58:41 -05:00
Jimmy Debe
d65332a465 Fixess 2025-11-05 10:09:45 -05:00
Jimmy Debe
714d9e700e Fixes 2025-11-05 10:04:27 -05:00
Jimmy Debe
a311c357c7 Update DHT 2025-11-04 01:12:04 -05:00
Jimmy Debe
750909f382 Update DHT 2025-11-03 23:07:45 -05:00
Jimmy Debe
6b2096d52b Fix typos and improve grammar in dht.md 2025-11-03 16:31:25 -05:00
Jimmy Debe
0ff88ad40d Update codex/raw/dht.md
Co-authored-by: Eric <5089238+emizzle@users.noreply.github.com>
2025-11-03 15:45:25 -05:00
Jimmy Debe
9adf19a74c Update dht.md 2025-10-16 17:01:03 -04:00
Jimmy Debe
4a9da183c3 up dht.md 2025-10-14 20:29:25 -04:00
Jimmy Debe
d9c31267bc update dht.md 2025-10-14 20:22:40 -04:00
Jimmy Debe
9e0de41824 update dht.md
Updated terminology for clarity and corrected typos.
2025-10-13 11:33:28 -04:00
Jimmy Debe
3972db75b0 Update dht.md 2025-09-19 04:18:20 -04:00
Jimmy Debe
db15263b3f Update dht.md 2025-09-19 04:16:29 -04:00
Jimmy Debe
4f2a343c4b update dht.md 2025-09-19 03:49:12 -04:00
Jimmy Debe
8778f39d51 Update dht.md 2025-09-09 12:08:09 -04:00
Jimmy Debe
72b1f0e5d9 Create dht.md 2025-09-02 17:37:09 -04:00

186
codex/raw/dht.md Normal file
View File

@@ -0,0 +1,186 @@
---
title: CODEX-DHT
name: Codex Discovery
status: raw
tags:
editor:
contributors:
- Jimmy Debe <jimmy@status.im>
---
## Abstract
This document explains the Codex DHT (Distributed Hash Table) component.
It is used to store Codex's signed-peer-record (SPR) entries,
as well as content identifiers (CID) for each host.
## Background
Codex is a network of nodes, identified as providers,
participating in a decentralized peer-to-peer storage protocol.
The decentralized storage solution offers data durability guarantees,
incentive mechanisms and data persistence guarantees.
The Codex DHT component is a modified version of
[DiscV5 DHT](https://github.com/ethereum/devp2p/blob/master/discv5/discv5.md) protocol to store Codex SPR entries,
as well as the CID for each host.
DiscV5 is a node discovery system used to find peers who are registered on a distributed hash table (DHT).
This allows a provider to publish to the network their connection information and
information about what content they are storing.
A Codex provider will support this protocol at no extra cost other than the use of resources to store node records.
This allows any provider node to be used as the entry point for new providers to connect to live nodes on the Codex network.
## Wire Format
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”,
“SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and
“OPTIONAL” in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
A `provider` is a node running the Codex protocol and providing resources to the Codex network.
To become a `provider`, the node MUST have a `NodeId`,
which is generated with the keccak256 hash function of its `PublicKey`.
The record will be shared and
stored by other `provider`s in their local routing table.
A `FINDNODEMESSAGE` message is used by the new `provider` to query other nodes who may choose to store the new record.
Once stored by one `provider`,
the new `provider` will be accessible to the network.
This record SHOULD include node identity information, connection information,
timing information, and reliability information.
Information provided in the record can be updated at any time to match the live details of the `provider`.
The following is the `provider` node record in the Codex network.
``` js
{
"Provider" : {
id: NodeId
pubkey: PublicKey
address: Array[peerId]
record: SignedPeerRecord
seen: float // Indicates if there was at least one successful
// request-response with this provider, or if the node was verified
// through the underlying transport mechanisms. After first contact
// it tracks how reliable is the communication with the provider.
stats: Stats // traffic measurements and statistics
}
"SignedPeerRecord" : {
"seqNum": uint64, // sequence number of record update
"pubkkey": PublicKey,
"ip": IpAddress, // ip address is optional
"tcpPort": Port,
"udpPort": Port
}
"NodeId" : keccak256(PublicKey)
"Stats" : {
"rttMin": float // millisec
"rttAvg": float // millisec
"bwAvg": float // bps
"bwMax": float // bps
}
}
```
As in the discV5 protocol, the `NodeId` is a unqiue identifier to find other nodes within the network.
The `peerId` and `NodeId` MUST NOT be the same.
### Signed Peer Record
The `record` MUST be generated by the `provider`,
which contains their connection information.
On the Codex network,
the `record` is identified a `SignedPeerRecord`, SPR.
All values, excluding the `ip`, are REQUIRED in a SPR.
Which nodes and the number of nodes in this set are described in the [routing table](#routing-table) section.
The `PrivateKey` MUST be used to sign the `record`.
A `provider` SHOULD disregard messages from a node if the `record` is unsigned or becomes stale.
The `provider` SHOULD contact other live nodes to disseminate new and updated records.
The update will increase the `seqNum`, then sign the new version of the `record`.
### Distance calculation
#### Routing Table
Each `provider` has a local routing table which stores the SPR of other `provider`s it has discovered.
New `provider`s SHOULD query live nodes to update their local routing table with SPRs.
``` js
"RoutingTable" : {
"localProvider": Provider,
"buckets": seq[KBucket],
"bitsPerHop": number,
"ipLimits": IpLimits, // IP limits for total routing table: all buckets and
// replacement caches.
"distanceCalculator" : DistanceCalculator,
"rng" : ref HmacDrbgContext
}
"KBucket" : {
"istart", "iend": NodeId,
"providers": seq[Provider],
"replacementCache": seq[Providers], // Nodes that could not be added to the `providers`
// seq as it is full and without stale nodes. This is practically a small
// LRU cache.
"ipLimits": IpLimits, ## IP limits for bucket: node entries and replacement
// cache entries combined.
}
"IpLimits" : {
"tableIpLimit": number,
"bucketIpLimit": number
}
```
The `bitsPerHop` MUST indicate the minimum number of bits of a `NodeId` needed to get closer to finding the target per query.
Practically, it tells a `provider` also how often a node "not in range", based on `NodeId` prefix similarities,
will cause a branch to split off.
Setting this value to 1 is the basic, non-accelerated version,
which will never split off the "not in range" branch and
which will result in $ \log_2 n $ hops per lookup.
Setting it higher will increase the amount of splitting on a "not in range" branch,
thus holding more `providers` with a better keyspace coverage and
will result in an improvement of $ \log_{2^b} n $ hops per lookup.
- `DistanceCalculator`: value SHOULD be generated with the defined node distance algorithm used in the [discv5 specification](https://github.com/ethereum/devp2p/blob/master/discv5/discv5.md).
- `istart`: The range of `NodeId`s this `KBucket` covers.
This is not a simple logarithmic distance, as buckets can be split over a prefix that
does not cover the `localNode` id.
- `providers`: Node entries of the KBucket are sorted according to last time seen.
First entry (head) is considered the most recently seen node, and
the last entry (tail) is considered the least recently seen node.
Here, "seen" indicates a successful request-response.
This can also not have occurred yet.
- `IpLimits`: The routing table IP limits are applied on both the total table,
and on the individual buckets.
In each case, the active node entries,
but also the entries waiting in the replacement cache, are accounted for.
This way, the replacement cache can't get filled with nodes that then can't be added due to the limits that apply.
As entries are not verified immediately before or on entry,
a malicious node MAY fill the routing table or
a specific bucket with SPRs that have `ip`s not controlled by that adversary.
This would affect the `provider` that actually controls the `ip`,
as they could have a difficult time disseminating its SPR to be stored in the DHT by other `provider`s.
However, that `provider` can still search and find nodes to connect to.
There is the possibility to set the `IPLimits` on verified `providers` only,
but that would allow for lookups to be done on a higher set of nodes owned by the same identity.
This is a worse alternative.
Doing lookups only on verified nodes would slow down discovery startup.
## Copyright
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
## References
- [DiscV5 DHT](https://github.com/ethereum/devp2p/blob/master/discv5/discv5.md)
- [Ethereum Node Record](https://github.com/ethereum/devp2p/blob/master/enr.md)