mirror of
https://github.com/vacp2p/rfc-index.git
synced 2026-01-09 15:48:03 -05:00
1715
codex/raw/codex-block-exchange.md
Normal file
File diff suppressed because it is too large
802
codex/raw/codex-marketplace.md
Normal file
@@ -0,0 +1,802 @@
---
slug: codex-marketplace
title: CODEX-MARKETPLACE
name: Codex Storage Marketplace
status: raw
category: Standards Track
tags: codex, storage, marketplace, smart-contract
editor: Codex Team and Dmitriy Ryajov <dryajov@status.im>
contributors:
- Mark Spanbroek <mark@codex.storage>
- Adam Uhlíř <adam@codex.storage>
- Eric Mastro <eric@codex.storage>
- Jimmy Debe <jimmy@status.im>
- Filip Dimitrijevic <filip@status.im>
---

## Abstract

Codex Marketplace and its interactions are defined by a smart contract deployed on an EVM-compatible blockchain. This specification describes these interactions for the various roles within the network.

The document is intended for implementors of Codex nodes.

## Semantics

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).

### Definitions

| Terminology | Description |
|----------------------------|-------------|
| Storage Provider (SP) | A node in the Codex network that provides storage services to the marketplace. |
| Validator | A node that assists in identifying missing storage proofs. |
| Client | A node that interacts with other nodes in the Codex network to store, locate, and retrieve data. |
| Storage Request or Request | A request created by a client node to persist data on the Codex network. |
| Slot or Storage Slot | A space allocated by the storage request to store a piece of the request's dataset. |
| Smart Contract | A smart contract implementing the marketplace functionality. |
| Token | The ERC20-based token used within the Codex network. |

## Motivation

The Codex network aims to create a peer-to-peer storage engine with robust data durability, data persistence guarantees, and a comprehensive incentive structure.

The marketplace is a critical component of the Codex network, serving as a platform where all involved parties interact to ensure data persistence. It provides mechanisms to enforce agreements and facilitate data repair when SPs fail to fulfill their duties.

Implemented as a smart contract on an EVM-compatible blockchain, the marketplace enables various scenarios where nodes assume one or more roles to maintain a reliable persistence layer for users. This specification details these interactions.

The marketplace contract manages storage requests, maintains the state of allocated storage slots, and orchestrates SP rewards, collaterals, and storage proofs.

A node that wishes to participate in the Codex persistence layer MUST implement one or more roles described in this document.

### Roles

A node can assume one or more of the three main roles in the network: client, SP, and validator.

A client is a potentially short-lived node in the network with the purpose of persisting its data in the Codex persistence layer.

An SP is a long-lived node providing storage for clients in exchange for profit. To ensure a reliable, robust service for clients, SPs are required to periodically provide proofs that they are persisting the data.

A validator checks that SPs have submitted valid proofs for each period in which the smart contract required a proof for the slots they fill.

---

## Part I: Protocol Specification

This part defines the **normative requirements** for the Codex Marketplace protocol. All implementations MUST comply with these requirements to participate in the Codex network. The protocol is defined by smart contract interactions on an EVM-compatible blockchain.

## Storage Request Lifecycle

The diagram below depicts the lifecycle of a storage request:

```text
                      ┌───────────┐
                      │ Cancelled │
                      └───────────┘
                            ▲
                            │ Not all
                            │ Slots filled
                            │
    ┌───────────┐    ┌──────┴─────────────┐           ┌─────────┐
    │ Submitted ├───►│ Slots Being Filled ├──────────►│ Started │
    └───────────┘    └────────────────────┘ All Slots └────┬────┘
                                            Filled         │
                                                           │
                                   ┌───────────────────────┘
                           Proving ▼
    ┌────────────────────────────────────────────────────────────┐
    │                                                            │
    │                 Proof submitted                            │
    │       ┌─────────────────────────► All good                 │
    │       │                                                    │
    │ Proof required                                             │
    │       │                                                    │
    │       │         Proof missed                               │
    │       └─────────────────────────► After some time slashed  │
    │                                   eventually Slot freed    │
    │                                                            │
    └────────┬─┬─────────────────────────────────────────────────┘
             │ │                                      ▲
             │ │                                      │
             │ │ SP kicked out and Slot freed ┌───────┴────────┐
    All good │ ├─────────────────────────────►│ Repair process │
Time ran out │ │                              └────────────────┘
             │ │
             │ │ Too many Slots freed         ┌────────┐
             │ └─────────────────────────────►│ Failed │
             ▼                                └────────┘
        ┌──────────┐
        │ Finished │
        └──────────┘
```

## Client Role

A node implementing the client role mediates the persistence of data within the Codex network.

A client has two primary responsibilities:

- Requesting storage from the network by sending a storage request to the smart contract.
- Withdrawing funds from the storage requests previously created by the client.

### Creating Storage Requests

When a user prompts the client node to create a storage request, the client node SHOULD receive the input parameters for the storage request from the user.

To create a request to persist a dataset on the Codex network, client nodes MUST split the dataset into data chunks, $(c_1, c_2, c_3, \ldots, c_{n})$. Using the erasure coding method and the provided input parameters, the data chunks are encoded and distributed over a number of slots. The applied erasure coding method MUST use the [Reed-Solomon algorithm](https://hackmd.io/FB58eZQoTNm-dnhu0Y1XnA). The final slot roots and other metadata MUST be placed into a `Manifest` (TODO: Manifest RFC). The CID for the `Manifest` MUST then be used as the `cid` for the stored dataset.

After the dataset is prepared, a client node MUST call the smart contract function `requestStorage(request)`, providing the desired request parameters in the `request` parameter. The `request` parameter is of type `Request`:

```solidity
struct Request {
    address client;
    Ask ask;
    Content content;
    uint64 expiry;
    bytes32 nonce;
}

struct Ask {
    uint256 proofProbability;
    uint256 pricePerBytePerSecond;
    uint256 collateralPerByte;
    uint64 slots;
    uint64 slotSize;
    uint64 duration;
    uint64 maxSlotLoss;
}

struct Content {
    bytes cid;
    bytes32 merkleRoot;
}
```

The table below describes the attributes of `Request` and its associated types:

| attribute | type | description |
|-----------|------|-------------|
| `client` | `address` | The Codex node requesting storage. |
| `ask` | `Ask` | Parameters of the request. |
| `content` | `Content` | The dataset that will be hosted with the storage request. |
| `expiry` | `uint64` | Timeout in seconds within which all the slots have to be filled; otherwise the request is cancelled. The final deadline timestamp is calculated at the moment the transaction is mined. |
| `nonce` | `bytes32` | Random value that differentiates the request from other requests with the same parameters. It SHOULD be a random byte array. |
| `pricePerBytePerSecond` | `uint256` | Amount of tokens that will be awarded to SPs for finishing the storage request. It MUST be expressed as tokens per byte of slot size per second, and it applies to every slot. The Ethereum address that submits the `requestStorage()` transaction MUST have [approval](https://docs.openzeppelin.com/contracts/2.x/api/token/erc20#IERC20-approve-address-uint256-) for the transfer of at least the full reward (`pricePerBytePerSecond * duration * slots * slotSize`) in tokens. |
| `collateralPerByte` | `uint256` | The amount of tokens per byte of the slot's size that SPs submit when they fill slots. Collateral is then slashed or forfeited if SPs fail to provide the service requested by the storage request (more information in the [Slashing](#slashing) section). |
| `proofProbability` | `uint256` | Determines the average frequency with which a proof is required within a period: $\frac{1}{proofProbability}$. SPs are required to provide proofs of storage to the marketplace contract when challenged. To prevent SPs from only coming online when proofs are required, the frequency at which proofs are requested from SPs is stochastic and is influenced by the `proofProbability` parameter. |
| `duration` | `uint64` | Total duration of the storage request in seconds. It MUST NOT exceed the limit specified in the configuration `config.requestDurationLimit`. |
| `slots` | `uint64` | The number of requested slots. The slots will all have the same size. |
| `slotSize` | `uint64` | Amount of storage per slot in bytes. |
| `maxSlotLoss` | `uint64` | Maximum number of slots that can be lost without the data being considered lost. |
| `cid` | `bytes` | An identifier used to locate the Manifest representing the dataset. It MUST be a [CIDv1](https://github.com/multiformats/cid#cidv1) with a SHA-256 [multihash](https://github.com/multiformats/multihash), and the data it represents SHOULD be discoverable in the network; otherwise the request will eventually be cancelled. |
| `merkleRoot` | `bytes32` | Merkle root of the dataset, used to verify storage proofs. |

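As a worked example of the two allowance formulas in this specification, the Python sketch below computes the full reward a client has to approve (`pricePerBytePerSecond * duration * slots * slotSize`) and the collateral an SP has to approve for one slot (`collateralPerByte * slotSize`). The `Ask` dataclass, the function names, and all numeric values are illustrative assumptions; they are not part of the contract interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ask:
    """Plain-integer mirror of the Solidity `Ask` struct (illustrative only)."""
    proof_probability: int
    price_per_byte_per_second: int
    collateral_per_byte: int
    slots: int
    slot_size: int   # bytes
    duration: int    # seconds
    max_slot_loss: int


def client_allowance(ask: Ask) -> int:
    """Full reward the client must approve before calling requestStorage()."""
    return ask.price_per_byte_per_second * ask.duration * ask.slots * ask.slot_size


def slot_collateral(ask: Ask) -> int:
    """Collateral an SP must approve before calling fillSlot() for one slot."""
    return ask.collateral_per_byte * ask.slot_size


# Made-up values, chosen only to keep the arithmetic easy to follow.
example = Ask(
    proof_probability=5,
    price_per_byte_per_second=2,
    collateral_per_byte=3,
    slots=4,
    slot_size=1_000,
    duration=3_600,
    max_slot_loss=1,
)
```

With these values, `client_allowance(example)` is `2 * 3600 * 4 * 1000 = 28_800_000` tokens and `slot_collateral(example)` is `3 * 1000 = 3_000` tokens.
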
#### Renewal of Storage Requests

The marketplace does not support extending requests. To extend the duration of a request, the user MUST [create](#creating-storage-requests) a new request with the same CID **before the original request completes**.

This ensures that the data will continue to persist in the network at the time when the new (or existing) SPs need to retrieve the complete dataset to fill the slots of the new request.

### Monitoring and State Management

Client nodes MUST implement the following smart contract interactions for monitoring and state management:

- **getRequest(requestId)**: Retrieve the full `Request` data from the marketplace. This function is used for recovery and state verification after restarts or failures.

- **requestState(requestId)**: Query the current state of a storage request. Used for monitoring request progress and determining the appropriate client actions.

- **requestExpiresAt(requestId)**: Query when the request will expire if not fulfilled.

- **getRequestEnd(requestId)**: Query when a fulfilled request will end (used to determine when to call `freeSlot` or `withdrawFunds`).

Client nodes MUST subscribe to the following marketplace events:

- **RequestFulfilled(requestId)**: Emitted when a storage request has enough filled slots to start. Clients monitor this event to determine when their request becomes active and transitions from the submission phase to the active phase.

- **RequestFailed(requestId)**: Emitted when a storage request fails due to proof failures or other reasons. Clients observe this event to detect failed requests and initiate fund withdrawal.

### Withdrawing Funds

The client node MUST monitor the status of the requests it created. When a storage request enters the `Cancelled`, `Failed`, or `Finished` state, the client node MUST initiate the withdrawal of the remaining or refunded funds from the smart contract using the `withdrawFunds(requestId)` function.

Request states are determined as follows:

- The request is considered `Cancelled` if no `RequestFulfilled(requestId)` event is observed before the time returned by the `requestExpiresAt(requestId)` function.
- The request is considered `Failed` when the `RequestFailed(requestId)` event is observed.
- The request is considered `Finished` once the time returned by the `getRequestEnd(requestId)` function has passed.

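The three rules above can be sketched as a small classification function. This is a minimal Python illustration, not the reference implementation: the `ClientView` enum, the function names, and the boolean event flags are hypothetical, and treating a timestamp as reached with `>=` is an assumption about the boundary.

```python
from enum import Enum


class ClientView(Enum):
    """How a client might classify one of its requests (hypothetical names)."""
    PENDING = "pending"      # still waiting for RequestFulfilled
    ACTIVE = "active"        # RequestFulfilled seen, request still running
    CANCELLED = "cancelled"
    FAILED = "failed"
    FINISHED = "finished"


def classify_request(now: int, expires_at: int, request_end: int,
                     fulfilled_seen: bool, failed_seen: bool) -> ClientView:
    """Apply the three rules from the text.

    `expires_at` stands for the value of requestExpiresAt(requestId) and
    `request_end` for the value of getRequestEnd(requestId), both as Unix
    timestamps; the flags record whether the events were observed.
    """
    if failed_seen:
        return ClientView.FAILED
    if not fulfilled_seen:
        return ClientView.CANCELLED if now >= expires_at else ClientView.PENDING
    return ClientView.FINISHED if now >= request_end else ClientView.ACTIVE


def should_withdraw(view: ClientView) -> bool:
    """withdrawFunds(requestId) is due in exactly these three states."""
    return view in {ClientView.CANCELLED, ClientView.FAILED, ClientView.FINISHED}
```

A client that keeps the two timestamps and the two event flags per request can call `classify_request` on every tick and trigger `withdrawFunds(requestId)` as soon as `should_withdraw` returns `True`.
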
## Storage Provider Role

A Codex node acting as an SP persists data across the network by hosting slots requested by clients in their storage requests.

The following tasks need to be considered when hosting a slot:

- Filling a slot
- Proving
- Repairing a slot
- Collecting request reward and collateral

### Filling Slots

When a new request is created, the `StorageRequested(requestId, ask, expiry)` event is emitted with the following properties:

- `requestId` - the ID of the request.
- `ask` - the specification of the request parameters. For details, see the definition of the `Ask` type in the [Creating Storage Requests](#creating-storage-requests) section above.
- `expiry` - a Unix timestamp specifying when the request will be cancelled if not all slots are filled by then.

It is then up to the SP node to decide, based on the emitted parameters and the node operator's configuration, whether it wants to participate in the request and attempt to fill its slot(s) (note that one SP can fill more than one slot). If the SP node decides to ignore the request, no further action is required. However, if the SP decides to fill a slot, it MUST follow the remaining steps described below.

The node acting as an SP MUST decide which slot, specified by the slot index, it wants to fill. The SP MAY attempt to fill more than one slot. To fill a slot, the SP MUST first reserve the slot in the smart contract using `reserveSlot(requestId, slotIndex)`. If reservations for this slot are full, or if the SP has already reserved the slot, the transaction will revert. If the reservation was unsuccessful, the SP is not allowed to fill the slot.

If the reservation was successful, the node MUST then download the slot data using the CID of the manifest (**TODO: Manifest RFC**) and the slot index. The CID is specified in `request.content.cid`, which can be retrieved from the smart contract using `getRequest(requestId)`. Then, the node MUST generate a proof over the downloaded data (**TODO: Proving RFC**).

When the proof is ready, the SP MUST call `fillSlot()` on the smart contract with the following REQUIRED parameters:

- `requestId` - the ID of the request.
- `slotIndex` - the slot index that the node wants to fill.
- `proof` - the `Groth16Proof` proof structure, generated over the slot data.

The Ethereum address of the SP node from which the transaction originates MUST have [approval](https://docs.openzeppelin.com/contracts/2.x/api/token/erc20#IERC20-approve-address-uint256-) for the transfer of at least the amount of tokens required as collateral for the slot (`collateralPerByte * slotSize`).

If the proof delivered by the SP is invalid or the slot was already filled by another SP, then the transaction will revert. Otherwise, a `SlotFilled(requestId, slotIndex)` event is emitted. If the transaction is successful, the SP SHOULD transition into the **proving** state, where it will need to submit proof of data possession when challenged by the smart contract.

If the SP node observes a `SlotFilled` event for the slot it is currently downloading the dataset for or generating the proof for, the slot has been filled by another node in the meantime. In response, the SP SHOULD stop its current operation and attempt to fill a different, unfilled slot.

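The reserve, download, prove, fill sequence described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: `FakeMarket` is an in-memory stand-in for the marketplace contract rather than a real client API, `Reverted` models a reverted transaction, the reservation limit of 3 is made up, and the `download` and `prove` callbacks are hypothetical placeholders for the Manifest and Proving RFCs.

```python
from typing import Callable


class Reverted(Exception):
    """Stands in for a reverted marketplace transaction (hypothetical)."""


class FakeMarket:
    """In-memory stand-in for the marketplace contract; not a real API."""

    def __init__(self, cid: bytes, max_reservations: int = 3) -> None:
        self.cid = cid
        self.max_reservations = max_reservations  # made-up limit
        self.reservations: dict[tuple[str, int], set[str]] = {}
        self.filled: dict[tuple[str, int], str] = {}

    def reserve_slot(self, sp: str, request_id: str, slot_index: int) -> None:
        holders = self.reservations.setdefault((request_id, slot_index), set())
        if sp in holders or len(holders) >= self.max_reservations:
            raise Reverted("reservation not allowed")
        holders.add(sp)

    def get_request_cid(self, request_id: str) -> bytes:
        return self.cid  # models request.content.cid from getRequest()

    def fill_slot(self, sp: str, request_id: str, slot_index: int, proof: bytes) -> None:
        if not proof or (request_id, slot_index) in self.filled:
            raise Reverted("invalid proof or slot already filled")
        self.filled[(request_id, slot_index)] = sp


def try_fill_slot(market: FakeMarket, sp: str, request_id: str, slot_index: int,
                  download: Callable[[bytes, int], bytes],
                  prove: Callable[[bytes], bytes]) -> bool:
    """Run the fill sequence once; return True only if the slot was filled."""
    try:
        market.reserve_slot(sp, request_id, slot_index)
    except Reverted:
        return False  # without a reservation the SP must not fill the slot
    data = download(market.get_request_cid(request_id), slot_index)
    proof = prove(data)
    try:
        market.fill_slot(sp, request_id, slot_index, proof)
    except Reverted:
        return False  # invalid proof, or another SP filled the slot first
    return True


market = FakeMarket(cid=b"manifest-cid")
first = try_fill_slot(market, "sp1", "req", 0, lambda cid, i: b"data", lambda d: b"proof")
repeat = try_fill_slot(market, "sp1", "req", 0, lambda cid, i: b"data", lambda d: b"proof")
late = try_fill_slot(market, "sp2", "req", 0, lambda cid, i: b"data", lambda d: b"proof")
```

Here `first` succeeds, `repeat` stops at the reservation step because `sp1` already holds a reservation, and `late` reserves successfully but fails at `fill_slot` because the slot is already filled. A real node would also cancel an in-flight download or proof when it observes `SlotFilled` for the same slot, which this sketch omits.
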
### Proving

Once an SP fills a slot, it MUST submit proofs to the marketplace contract when a challenge is issued by the contract. SPs SHOULD detect that a proof is required for the current period using the `isProofRequired(slotId)` function, or that it will be required using the `willProofBeRequired(slotId)` function in the case that the [proving clock pointer is in downtime](https://github.com/codex-storage/codex-research/blob/41c4b4409d2092d0a5475aca0f28995034e58d14/design/storage-proof-timing.md).

Once an SP knows it has to provide a proof, it MUST get the proof challenge using `getChallenge(slotId)`, which MUST then be incorporated into the proof generation as described in the Proving RFC (**TODO: Proving RFC**).

When the proof is generated, it MUST be submitted by calling the `submitProof(slotId, proof)` smart contract function.

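One proving step, as described above, could look like the following Python sketch. `FakeProofs` is a hypothetical in-memory stand-in for the contract calls `isProofRequired`, `willProofBeRequired`, `getChallenge`, and `submitProof`; the `prove` callback is a placeholder for the proof generation that the Proving RFC will define.

```python
from typing import Callable


class FakeProofs:
    """Hypothetical stand-in for the proof-related contract calls."""

    def __init__(self, required_now: bool, required_soon: bool, challenge: bytes) -> None:
        self.required_now = required_now
        self.required_soon = required_soon
        self.challenge = challenge
        self.submitted: list[tuple[str, bytes]] = []

    def is_proof_required(self, slot_id: str) -> bool:
        return self.required_now

    def will_proof_be_required(self, slot_id: str) -> bool:
        return self.required_soon

    def get_challenge(self, slot_id: str) -> bytes:
        return self.challenge

    def submit_proof(self, slot_id: str, proof: bytes) -> None:
        self.submitted.append((slot_id, proof))


def prove_for_period(market: FakeProofs, slot_id: str,
                     prove: Callable[[str, bytes], bytes]) -> bool:
    """Submit a proof if one is, or is about to be, required for this slot."""
    if not (market.is_proof_required(slot_id) or market.will_proof_be_required(slot_id)):
        return False
    challenge = market.get_challenge(slot_id)     # challenge feeds the proof
    market.submit_proof(slot_id, prove(slot_id, challenge))
    return True


idle = FakeProofs(required_now=False, required_soon=False, challenge=b"c0")
busy = FakeProofs(required_now=True, required_soon=False, challenge=b"c1")
skipped = prove_for_period(idle, "slot-1", lambda s, c: b"proof:" + c)
proved = prove_for_period(busy, "slot-1", lambda s, c: b"proof:" + c)
```

An SP would run such a step once per period for every slot it hosts; how the period boundaries are tracked is outside the scope of this sketch.
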
#### Slashing

There is a slashing scheme orchestrated by the smart contract to incentivize correct behavior and proper proof submissions by SPs. This scheme is configured at the smart contract level and applies uniformly to all participants in the network. The configuration of the slashing scheme can be obtained via the `configuration()` contract call.

The slashing works as follows:

- When an SP misses a proof and a validator triggers detection of this event using the `markProofAsMissing()` call, the SP is slashed by `config.collateral.slashPercentage` **of the originally required collateral** (hence the slashing amount is always the same for a given request).
- If the number of slashes exceeds `config.collateral.maxNumberOfSlashes`, the slot is freed, the remaining collateral is burned, and the slot is offered to other nodes for repair. The smart contract also emits the `SlotFreed(requestId, slotIndex)` event.

If, at any time, the number of freed slots exceeds the value specified by the `request.ask.maxSlotLoss` parameter, the dataset is considered lost, and the request is deemed _failed_. The collateral of all SPs that hosted the slots associated with the storage request is burned, and the `RequestFailed(requestId)` event is emitted.

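The arithmetic of the slashing rules can be illustrated with a short Python sketch. It follows the wording of the text literally ("exceeds" is modelled as `>`), and it assumes that `slashPercentage` is an integer between 0 and 100; the contract may encode these values differently. `SlotAccount` and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlotAccount:
    """Bookkeeping for one filled slot (hypothetical structure)."""
    required_collateral: int   # collateralPerByte * slotSize at fill time
    current_collateral: int
    slashes: int = 0
    freed: bool = False


def slash(slot: SlotAccount, slash_percentage: int, max_number_of_slashes: int) -> int:
    """Apply one missed-proof slash and return the slashed amount.

    The amount is a fixed share of the originally required collateral,
    so it is the same for every slash of a given slot.
    """
    amount = slot.required_collateral * slash_percentage // 100
    slot.current_collateral -= amount
    slot.slashes += 1
    if slot.slashes > max_number_of_slashes:
        slot.freed = True
        slot.current_collateral = 0  # remaining collateral is burned
    return amount


def request_failed(freed_slots: int, max_slot_loss: int) -> bool:
    """The request fails once more than maxSlotLoss slots have been freed."""
    return freed_slots > max_slot_loss


# Made-up numbers: 3000 tokens of collateral, 10% per slash, at most 2 slashes.
slot = SlotAccount(required_collateral=3_000, current_collateral=3_000)
amounts = [slash(slot, 10, 2) for _ in range(3)]
```

With these made-up numbers every slash removes 300 tokens; the third slash exceeds the limit of 2, so the slot is freed and the remaining 2100 tokens are burned.
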
### Repair

When a slot is freed due to too many missed proofs, which SHOULD be detected by listening to the `SlotFreed(requestId, slotIndex)` event, an SP node can decide whether to participate in repairing the slot. Similar to filling a slot, the node SHOULD consider the operator's configuration when making this decision. The SP that originally hosted the slot but failed to comply with proving requirements MAY also participate in the repair. However, by refilling the slot, the SP **will not** recover its original collateral and must submit new collateral using the `fillSlot()` call.

The repair process is similar to filling slots. If the original slot dataset is no longer present in the network, the SP MAY use erasure coding to reconstruct the dataset. Reconstructing the original slot dataset requires retrieving other pieces of the dataset stored in other slots belonging to the request. For this reason, the node that successfully repairs a slot is entitled to an additional reward. (**TODO: Implementation**)

The repair process proceeds as follows:

1. The SP observes the `SlotFreed` event and decides to repair the slot.
2. The SP MUST reserve the slot with the `reserveSlot(requestId, slotIndex)` call. For more information see the [Filling Slots](#filling-slots) section.
3. The SP MUST download the chunks of data required to reconstruct the freed slot's data. The node MUST use the [Reed-Solomon algorithm](https://hackmd.io/FB58eZQoTNm-dnhu0Y1XnA) to reconstruct the missing data.
4. The SP MUST generate a proof over the reconstructed data.
5. The SP MUST call the `fillSlot()` smart contract function with the same parameters and collateral allowance as described in the [Filling Slots](#filling-slots) section.

### Collecting Funds

An SP node SHOULD monitor the requests and the associated slots it hosts.

When a storage request enters the `Cancelled`, `Finished`, or `Failed` state, the SP node SHOULD call the `freeSlot(slotId)` smart contract function.

The aforementioned storage request states (`Cancelled`, `Finished`, and `Failed`) can be detected as follows:

- A storage request is considered `Cancelled` if no `RequestFulfilled(requestId)` event is observed within the time indicated by the `expiry` request parameter. Note that a `RequestCancelled` event may also be emitted, but the node SHOULD NOT rely on this event to assert the request expiration, as the `RequestCancelled` event is not guaranteed to be emitted at the time of expiry.
- A storage request is considered `Finished` when the time indicated by the value returned from the `getRequestEnd(requestId)` function has elapsed.
- A node concludes that a storage request has `Failed` upon observing the `RequestFailed(requestId)` event.

For each of the states listed above, funds are handled as follows:

- In the `Cancelled` state, the collateral is returned along with a proportional payout based on the time the node actually hosted the dataset before the expiry was reached.
- In the `Finished` state, the full reward for hosting the slot, along with the collateral, is collected.
- In the `Failed` state, no funds are collected. The reward is returned to the client, and the collateral is burned. The slot is removed from the SP's list of slots and is no longer returned by the `mySlots()` function.

## Validator Role

In a blockchain, a contract cannot change its state without a transaction, and the gas that pays for it, initiating the state change. Therefore, the smart contract requires an external trigger to periodically check and confirm that a storage proof has been delivered by the SP. This is where the validator role is essential.

The validator role is fulfilled by nodes that help to verify that SPs have submitted the required storage proofs.

It is the smart contract that checks if the proof requested from an SP has been delivered. The validator only triggers the decision-making function in the smart contract. To incentivize validators, they receive a reward each time they correctly mark a proof as missing. The reward is the percentage of the slashed collateral defined by `config.collateral.validatorRewardPercentage`.

Each time a validator observes the `SlotFilled` event, it SHOULD add the slot reported in the `SlotFilled` event to the validator's list of watched slots. Then, after the end of each period, a validator has up to `config.proofs.timeout` seconds (a configuration parameter retrievable with `configuration()`) to validate all the slots. If a slot lacks the required proof, the validator SHOULD call the `markProofAsMissing(slotId, period)` function on the smart contract. This function validates the correctness of the claim and, if the claim is correct, sends a reward to the validator.

If validating all the slots observed by the validator is not feasible within the specified `timeout`, the validator MAY choose to validate only a subset of the observed slots.

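The validator loop described above can be sketched in Python as follows. This is an illustration under assumptions, not the reference implementation: `FakeMarket` is a hypothetical in-memory stand-in whose `mark_proof_as_missing` reverts when the claim is wrong, the `deadline` and `clock` arguments model the `config.proofs.timeout` window, and `validator_reward` assumes that `validatorRewardPercentage` is an integer between 0 and 100. A real validator would normally check a claim with a read-only call before paying gas for the transaction; the sketch skips that step.

```python
import time
from typing import Callable


class Reverted(Exception):
    """Stands in for a reverted marketplace transaction (hypothetical)."""


class FakeMarket:
    """In-memory stand-in that knows which (slot, period) proofs are missing."""

    def __init__(self, missing: set[tuple[str, int]]) -> None:
        self.missing = missing
        self.marked: list[tuple[str, int]] = []

    def mark_proof_as_missing(self, slot_id: str, period: int) -> None:
        if (slot_id, period) not in self.missing:
            raise Reverted("proof was not missing")  # the contract checks the claim
        self.marked.append((slot_id, period))


class Validator:
    def __init__(self, market: FakeMarket) -> None:
        self.market = market
        self.watched: set[str] = set()

    def on_slot_filled(self, slot_id: str) -> None:
        """Handler for the SlotFilled event: start watching the slot."""
        self.watched.add(slot_id)

    def validate_period(self, period: int, deadline: float,
                        clock: Callable[[], float] = time.monotonic) -> list[str]:
        """Check watched slots until the deadline; return the slots marked."""
        marked = []
        for slot_id in sorted(self.watched):
            if clock() >= deadline:
                break  # out of time: only a subset of slots was validated
            try:
                self.market.mark_proof_as_missing(slot_id, period)
                marked.append(slot_id)
            except Reverted:
                continue  # proof was delivered or not required
        return marked


def validator_reward(slashed_amount: int, validator_reward_percentage: int) -> int:
    """Share of the slashed collateral paid to the validator."""
    return slashed_amount * validator_reward_percentage // 100


market = FakeMarket(missing={("slot-a", 7)})
validator = Validator(market)
validator.on_slot_filled("slot-a")
validator.on_slot_filled("slot-b")
marked = validator.validate_period(period=7, deadline=1.0, clock=lambda: 0.0)
expired = validator.validate_period(period=7, deadline=1.0, clock=lambda: 2.0)
```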
---

## Part II: Implementation Suggestions

> **IMPORTANT**: The sections above (Abstract through Validator Role) define the normative Codex Marketplace protocol requirements. All implementations MUST comply with those protocol requirements to participate in the Codex network.
>
> **The sections below are non-normative**. They document implementation approaches used in the nim-codex reference implementation. These are suggestions to guide implementors but are NOT required by the protocol. Alternative implementations MAY use different approaches as long as they satisfy the protocol requirements defined in Part I.

## Implementation Suggestions

This section describes implementation approaches used in reference implementations. These are **suggestions and not normative requirements**. Implementations are free to use different internal architectures, state machines, and data structures as long as they correctly implement the protocol requirements defined above.

### Storage Provider Implementation

The nim-codex reference implementation provides a complete Storage Provider implementation with state machine management, slot queueing, and resource management. This section documents the nim-codex approach.

#### State Machine

The Sales module implements a deterministic state machine for each slot, progressing through the following states:

1. **SalePreparing** - Find a matching availability and create a reservation
2. **SaleSlotReserving** - Reserve the slot on the marketplace
3. **SaleDownloading** - Stream and persist the slot's data
4. **SaleInitialProving** - Wait for stable challenge and generate initial proof
5. **SaleFilling** - Compute collateral and fill the slot
6. **SaleFilled** - Post-filling operations and expiry updates
7. **SaleProving** - Generate and submit proofs periodically
8. **SalePayout** - Free slot and calculate collateral
9. **SaleFinished** - Terminal success state
10. **SaleFailed** - Free slot on market and transition to error
11. **SaleCancelled** - Cancellation path
12. **SaleIgnored** - Sale ignored (no matching availability or other conditions)
13. **SaleErrored** - Terminal error state
14. **SaleUnknown** - Recovery state for crash recovery
15. **SaleProvingSimulated** - Proving with injected failures for testing

All states move to `SaleErrored` if an error is raised.

##### SalePreparing

- Find a matching availability based on the following criteria: `freeSize`, `duration`, `collateralPerByte`, `minPricePerBytePerSecond` and `until`
- Create a reservation
- Move to `SaleSlotReserving` if successful
- Move to `SaleIgnored` if no availability is found or if `BytesOutOfBoundsError` is raised because no space is available
- Move to `SaleFailed` on a `RequestFailed` event from the `marketplace`
- Move to `SaleCancelled` when the cancellation timer, set to the storage contract expiry, elapses

##### SaleSlotReserving

- Check if the slot can be reserved
- Move to `SaleDownloading` if successful
- Move to `SaleIgnored` if `SlotReservationNotAllowedError` is raised or the slot cannot be reserved. The collateral is returned.
- Move to `SaleFailed` on a `RequestFailed` event from the `marketplace`
- Move to `SaleCancelled` when the cancellation timer, set to the storage contract expiry, elapses

##### SaleDownloading

- Select the correct data expiry:
  - When the request is started, the request end date is used
  - Otherwise the expiry date is used
- Stream and persist data via `onStore`
- For each written batch, release bytes from the reservation
- Move to `SaleInitialProving` if successful
- Move to `SaleFailed` on a `RequestFailed` event from the `marketplace`
- Move to `SaleCancelled` when the cancellation timer, set to the storage contract expiry, elapses
- Move to `SaleFilled` on a `SlotFilled` event from the `marketplace`

##### SaleInitialProving

- Wait for a stable initial challenge
- Produce the initial proof via `onProve`
- Move to `SaleFilling` if successful
- Move to `SaleFailed` on a `RequestFailed` event from the `marketplace`
- Move to `SaleCancelled` when the cancellation timer, set to the storage contract expiry, elapses

##### SaleFilling

- Get the slot collateral
- Fill the slot
- Move to `SaleFilled` if successful
- Move to `SaleIgnored` on `SlotStateMismatchError`. The collateral is returned.
- Move to `SaleFailed` on a `RequestFailed` event from the `marketplace`
- Move to `SaleCancelled` when the cancellation timer, set to the storage contract expiry, elapses

##### SaleFilled

- Ensure that the current host has filled the slot by checking the signer address
- Notify by calling the `onFilled` hook
- Call `onExpiryUpdate` to change the data expiry from the expiry date to the request end date
- Move to `SaleProving` (or `SaleProvingSimulated` for simulated mode)
- Move to `SaleFailed` on a `RequestFailed` event from the `marketplace`
- Move to `SaleCancelled` when the cancellation timer, set to the storage contract expiry, elapses

##### SaleProving

- For each period: fetch challenge, call `onProve`, and submit proof
- Move to `SalePayout` when the slot request ends
- Re-raise `SlotFreedError` when the slot is freed
- Raise `SlotNotFilledError` when the slot is not filled
- Move to `SaleFailed` on a `RequestFailed` event from the `marketplace`
- Move to `SaleCancelled` when the cancellation timer, set to the storage contract expiry, elapses

##### SaleProvingSimulated

- Submit invalid proofs every `N` periods (`failEveryNProofs` in configuration) to test failure scenarios

##### SalePayout

- Get the current collateral and try to free the slot to ensure that the slot is freed after payout
- Forward the returned collateral to cleanup
- Move to `SaleFinished` if successful
- Move to `SaleFailed` on a `RequestFailed` event from the `marketplace`
- Move to `SaleCancelled` when the cancellation timer, set to the storage contract expiry, elapses

##### SaleFinished

- Call `onClear` hook
- Call `onCleanUp` hook

##### SaleFailed

- Free the slot
- Move to `SaleErrored` with the failure message

##### SaleCancelled

- Ensure that the node hosting the slot frees the slot
- Call `onClear` hook
- Call `onCleanUp` hook with the current collateral

##### SaleIgnored

- Call `onCleanUp` hook with the current collateral

##### SaleErrored

- Call `onClear` hook
- Call `onCleanUp` hook

##### SaleUnknown

- Recovery entry: get the on-chain state and jump to the appropriate state

#### Slot Queue
|
||||
|
||||
Slot queue schedules slot work and instantiates one `SalesAgent` per item with bounded concurrency.
|
||||
|
||||
- Accepts `(requestId, slotIndex, …)` items and orders them by priority
|
||||
- Spawns exactly one `SalesAgent` per dequeued item
|
||||
- Caps concurrent agents to `maxWorkers`
|
||||
- Supports pause/resume
|
||||
- Allows controlled requeue when an agent finishes with `reprocessSlot`
|
||||
|
||||
##### Slot Ordering
|
||||
|
||||
The criteria are in the following order:
|
||||
|
||||
1) **Unseen before seen** - Items that have not been seen are dequeued first.
|
||||
2) **More profitable first** - Higher `profitability` wins. `profitability` is `duration * pricePerSlotPerSecond`.
|
||||
3) **Less collateral first** - The item with the smaller `collateral` wins.
|
||||
4) **Later expiry first** - If both items carry an `expiry`, the one with the greater timestamp wins.
|
||||
|
||||
Within a single request, per-slot items are shuffled before enqueuing so the default slot-index order does not influence priority.
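
The ordering criteria above can be sketched as a comparator. This is an illustrative Python model rather than the nim-codex code: the class, field, and function names are hypothetical, and only the four rules listed above are implemented.

```python
from dataclasses import dataclass
from functools import cmp_to_key
from typing import List, Optional


@dataclass(frozen=True)
class SlotQueueItem:
    request_id: str
    slot_index: int
    duration: int                     # seconds
    price_per_slot_per_second: int
    collateral: int
    expiry: Optional[int] = None      # timestamp; may be absent
    seen: bool = False

    @property
    def profitability(self) -> int:
        # profitability = duration * pricePerSlotPerSecond
        return self.duration * self.price_per_slot_per_second


def compare(a: SlotQueueItem, b: SlotQueueItem) -> int:
    """Return a negative number when `a` should be dequeued before `b`."""
    if a.seen != b.seen:                          # 1) unseen before seen
        return -1 if not a.seen else 1
    if a.profitability != b.profitability:        # 2) more profitable first
        return -1 if a.profitability > b.profitability else 1
    if a.collateral != b.collateral:              # 3) less collateral first
        return -1 if a.collateral < b.collateral else 1
    if a.expiry is not None and b.expiry is not None and a.expiry != b.expiry:
        return -1 if a.expiry > b.expiry else 1   # 4) later expiry first
    return 0


def dequeue_order(items: List[SlotQueueItem]) -> List[SlotQueueItem]:
    # Sort a copy of the items into the order in which they would be dequeued.
    return sorted(items, key=cmp_to_key(compare))
```

Each rule is consulted only when all earlier rules tie, so a seen item is never dequeued ahead of an unseen one, regardless of its profitability.
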
|
||||
|
||||
##### Pause / Resume
|
||||
|
||||
When the Slot queue processes an item with `seen = true`, it means that the item was already evaluated against the current availabilities and did not match.
|
||||
To avoid draining the queue with untenable requests (due to insufficient availability), the queue pauses itself.
|
||||
|
||||
The queue resumes when:
|
||||
|
||||
- `OnAvailabilitySaved` fires after an availability update that increases one of: `freeSize`, `duration`, `minPricePerBytePerSecond`, or `totalRemainingCollateral`.
|
||||
- A new unseen item (`seen = false`) is pushed.
|
||||
- `unpause()` is called explicitly.
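
A minimal sketch of the pause and resume rules follows. All names are hypothetical. Two behaviours are assumptions of the sketch, not statements about nim-codex: a seen item stays in the queue when it triggers a pause, and how seen items become eligible again after a resume is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Availability fields whose increase resumes the queue.
WATCHED_FIELDS = (
    "free_size",
    "duration",
    "min_price_per_byte_per_second",
    "total_remaining_collateral",
)


@dataclass
class Availability:
    free_size: int
    duration: int
    min_price_per_byte_per_second: int
    total_remaining_collateral: int


def should_resume(old: Availability, new: Availability) -> bool:
    # Resume only when at least one watched field increased.
    return any(getattr(new, f) > getattr(old, f) for f in WATCHED_FIELDS)


class PausableSlotQueue:
    def __init__(self) -> None:
        self._items: List[Tuple[str, bool]] = []   # (slot, seen)
        self.paused = False

    def push(self, slot: str, seen: bool = False) -> None:
        self._items.append((slot, seen))
        self._items.sort(key=lambda item: item[1])  # stable sort: unseen first
        if not seen:
            self.paused = False                     # a new unseen item resumes

    def pop(self) -> Optional[str]:
        if self.paused or not self._items:
            return None
        slot, seen = self._items[0]
        if seen:
            # Assumption: the seen item stays queued while the queue pauses.
            self.paused = True
            return None
        self._items.pop(0)
        return slot

    def on_availability_saved(self, old: Availability, new: Availability) -> None:
        if should_resume(old, new):
            self.paused = False

    def unpause(self) -> None:
        self.paused = False
```
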
|
||||
|
||||
##### Reprocess
|
||||
|
||||
Availability matching occurs in `SalePreparing`.
|
||||
If no availability fits at that time, the sale is ignored with `reprocessSlot` set to true, meaning that the slot is added back to the queue with the `seen` flag set to true.
|
||||
|
||||
##### Startup
|
||||
|
||||
On `SlotQueue.start()`, the sales module first deletes reservations associated with inactive storage requests, then starts a new `SalesAgent` for each active storage request:
|
||||
|
||||
- Fetch the active slots `on-chain`.
|
||||
- Delete the local reservations for slots that are not in the active list.
|
||||
- Create a new agent for each slot and assign the `onCleanUp` callback.
|
||||
- Start the agent in the `SaleUnknown` state.
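
The startup steps above can be sketched as a pure planning function. The function name and data shapes are hypothetical: slots are modelled as `(requestId, slotIndex)` tuples and local reservations as a mapping from a reservation id to its slot.

```python
from typing import Dict, List, Set, Tuple

SlotId = Tuple[str, int]  # (requestId, slotIndex)


def plan_startup(
    active_slots: Set[SlotId],
    local_reservations: Dict[str, SlotId],
) -> Tuple[List[str], List[Tuple[SlotId, str]]]:
    """Return the reservation ids to delete and the agents to start."""
    # Reservations whose slot is no longer active on-chain are stale.
    stale = sorted(
        rid for rid, slot in local_reservations.items() if slot not in active_slots
    )
    # One agent per active slot, each starting in the recovery state.
    agents = [(slot, "SaleUnknown") for slot in sorted(active_slots)]
    return stale, agents
```

Keeping the planning step free of side effects makes it straightforward to test separately from the on-chain calls and the local reservation store.
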
|
||||
|
||||
#### Main Behaviour
|
||||
|
||||
When a new slot request is received, the sales module extracts the tuple `(requestId, slotIndex, …)` from the request.
|
||||
A `SlotQueueItem` is then created with metadata such as `profitability`, `collateral`, `expiry`, and the `seen` flag set to `false`.
|
||||
This item is pushed into the `SlotQueue`, where it will be prioritised according to the ordering rules.
|
||||
|
||||
#### SalesAgent
|
||||
|
||||
SalesAgent is the instance that executes the state machine for a single slot.
|
||||
|
||||
- Executes the sale state machine across the slot lifecycle
|
||||
- Holds a `SalesContext` with dependencies and host hooks
|
||||
- Supports crash recovery via the `SaleUnknown` state
|
||||
- Handles errors by entering `SaleErrored`, which runs cleanup routines
|
||||
|
||||
#### SalesContext
|
||||
|
||||
SalesContext is a container for dependencies used by all sales.
|
||||
|
||||
- Provides external interfaces: `Market` (marketplace) and `Clock`
|
||||
- Provides access to `Reservations`
|
||||
- Provides host hooks: `onStore`, `onProve`, `onExpiryUpdate`, `onClear`, `onSale`
|
||||
- Shares the `SlotQueue` handle for scheduling work
|
||||
- Provides configuration such as `simulateProofFailures`
|
||||
- Passed to each `SalesAgent`
|
||||
|
||||
#### Marketplace Subscriptions
|
||||
|
||||
The sales module subscribes to on-chain events to keep the queue and agents consistent.
|
||||
|
||||
##### StorageRequested
|
||||
|
||||
When the marketplace signals a new request, the sales module:
|
||||
|
||||
- Computes collateral for free slots.
|
||||
- Creates per-slot `SlotQueueItem` entries (one per `slotIndex`) with `seen = false`.
|
||||
- Pushes the items into the `SlotQueue`.
|
||||
|
||||
##### SlotFreed
|
||||
|
||||
When the marketplace signals a freed slot (needs repair), the sales module:
|
||||
|
||||
- Retrieves the request data for the `requestId`.
|
||||
- Computes collateral for repair.
|
||||
- Creates a `SlotQueueItem`.
|
||||
- Pushes the item into the `SlotQueue`.
|
||||
|
||||
##### RequestCancelled
|
||||
|
||||
When a request is cancelled, the sales module removes all queue items for that `requestId`.
|
||||
|
||||
##### RequestFulfilled
|
||||
|
||||
When a request is fulfilled, the sales module removes all queue items for that `requestId` and notifies active agents bound to the request.
|
||||
|
||||
##### RequestFailed
|
||||
|
||||
When a request fails, the sales module removes all queue items for that `requestId` and notifies active agents bound to the request.
|
||||
|
||||
##### SlotFilled
|
||||
|
||||
When a slot is filled, the sales module removes the queue item for that specific `(requestId, slotIndex)` and notifies the active agent for that slot.
|
||||
|
||||
##### SlotReservationsFull
|
||||
|
||||
When the marketplace signals that reservations are full, the sales module removes the queue item for that specific `(requestId, slotIndex)`.
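
The queue-removal side of these subscriptions can be sketched as follows. This is a simplified model with hypothetical names: queue items are reduced to `(requestId, slotIndex)` tuples, and agent notification as well as the enqueueing done for `StorageRequested` and `SlotFreed` are left out.

```python
from typing import List, Optional, Tuple

QueueItem = Tuple[str, int]  # (requestId, slotIndex)

# Events that drop every queued slot of a request.
REQUEST_WIDE = {"RequestCancelled", "RequestFulfilled", "RequestFailed"}
# Events that drop a single queued slot.
SLOT_SPECIFIC = {"SlotFilled", "SlotReservationsFull"}


def apply_event(
    queue: List[QueueItem],
    event: str,
    request_id: str,
    slot_index: Optional[int] = None,
) -> List[QueueItem]:
    """Return the queue contents after handling a marketplace event."""
    if event in REQUEST_WIDE:
        return [item for item in queue if item[0] != request_id]
    if event in SLOT_SPECIFIC:
        return [item for item in queue if item != (request_id, slot_index)]
    raise ValueError(f"unhandled event: {event}")
```
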
|
||||
|
||||
#### Reservations
|
||||
|
||||
The Reservations module manages both Availabilities and Reservations.
|
||||
When an Availability is created, it reserves bytes in the storage module so no other modules can use those bytes.
|
||||
Before a dataset for a slot is downloaded, a Reservation is created, and the freeSize of the Availability is reduced.
|
||||
When bytes are downloaded, the reservation of those bytes in the storage module is released.
|
||||
Accounting of both reserved bytes in the storage module and freeSize in the Availability are cleaned up upon completion of the state machine.
|
||||
|
||||
```mermaid
graph TD
    A[Availability] -->|creates| R[Reservation]
    A -->|reserves bytes in| SM[Storage Module]
    R -->|reduces| AF[Availability.freeSize]
    R -->|downloads data| D[Dataset]
    D -->|releases bytes to| SM
    TC[Terminal State] -->|triggers cleanup| C[Cleanup]
    C -->|returns bytes to| AF
    C -->|deletes| R
    C -->|returns collateral to| A
```
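
The byte accounting described in this section can be sketched as below. This is a simplified model, not the nim-codex implementation: all class and method names are hypothetical, and collateral handling is omitted.

```python
from dataclasses import dataclass


@dataclass
class StorageModule:
    reserved_bytes: int = 0

    def reserve(self, n: int) -> None:
        self.reserved_bytes += n

    def release(self, n: int) -> None:
        if n > self.reserved_bytes:
            raise ValueError("cannot release more bytes than are reserved")
        self.reserved_bytes -= n


@dataclass
class Availability:
    free_size: int


@dataclass
class Reservation:
    size: int  # bytes reserved for the slot and not yet downloaded


class Reservations:
    def __init__(self, storage: StorageModule) -> None:
        self.storage = storage

    def create_availability(self, size: int) -> Availability:
        # Creating an availability reserves its bytes in the storage module.
        self.storage.reserve(size)
        return Availability(free_size=size)

    def create_reservation(self, availability: Availability, slot_size: int) -> Reservation:
        if slot_size > availability.free_size:
            raise ValueError("availability too small for slot")
        availability.free_size -= slot_size
        return Reservation(size=slot_size)

    def on_bytes_downloaded(self, reservation: Reservation, n: int) -> None:
        # Downloaded bytes now hold real data, so their reservation is released.
        n = min(n, reservation.size)
        reservation.size -= n
        self.storage.release(n)

    def cleanup(self, availability: Availability, reservation: Reservation) -> None:
        # Bytes that were reserved but never downloaded return to the availability.
        availability.free_size += reservation.size
        reservation.size = 0
```

In this model, once cleanup has run, `free_size` again equals the number of bytes still reserved in the storage module, which is the reconciliation described above.
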
|
||||
|
||||
#### Hooks
|
||||
|
||||
- **onStore**: streams data into the node's storage
|
||||
- **onProve**: produces proofs for initial and periodic proving
|
||||
- **onExpiryUpdate**: notifies the client node of a change in the expiry data
|
||||
- **onSale**: notifies that the host is now responsible for the slot
|
||||
- **onClear**: notification emitted once the state machine has concluded; used to reconcile Availability bytes and reserved bytes in the storage module
|
||||
- **onCleanUp**: cleanup hook called in terminal states to release resources, delete reservations, and return collateral to availabilities
|
||||
|
||||
#### Error Handling
|
||||
|
||||
- Always catch `CancelledError` from `nim-chronos` and log a trace, exiting gracefully
|
||||
- Catch `CatchableError`, log it, and route to `SaleErrored`
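
As an analogy only, the same pattern can be sketched with Python's `asyncio`, where `asyncio.CancelledError` stands in for the `CancelledError` of `nim-chronos` and `Exception` stands in for `CatchableError`. The helper name `run_state` and its parameters are hypothetical.

```python
import asyncio
import logging

log = logging.getLogger("sales")


async def run_state(step, on_error):
    """Run one state step, applying the two error-handling rules above."""
    try:
        return await step()
    except asyncio.CancelledError:
        # Cancellation is logged at a low level and the state exits gracefully.
        log.debug("state cancelled, exiting")
        return None
    except Exception as exc:  # stand-in for CatchableError
        log.error("state failed: %s", exc)
        return on_error(exc)  # e.g. route to SaleErrored with the message
```

The cancellation handler comes first so that a cancelled state is never routed to the error state.
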
|
||||
|
||||
#### Cleanup
|
||||
|
||||
Cleanup releases resources held by a sales agent and optionally requeues the slot.
|
||||
|
||||
- Return reserved bytes to the availability if a reservation exists
|
||||
- Delete the reservation and return any remaining collateral
|
||||
- If `reprocessSlot` is true, push the slot back into the queue marked as seen
|
||||
- Remove the agent from the sales set and track the removal future
|
||||
|
||||
#### Resource Management Approach
|
||||
|
||||
The nim-codex implementation uses Availabilities and Reservations to manage local storage resources:
|
||||
|
||||
##### Reservation Management
|
||||
|
||||
- Maintain `Availability` and `Reservation` records locally
|
||||
- Match incoming slot requests to available capacity using prioritisation rules
|
||||
- Lock capacity and collateral when creating a reservation
|
||||
- Release reserved bytes progressively during download and free all remaining resources in terminal states
|
||||
|
||||
**Note:** Availabilities and Reservations are completely local to the Storage Provider implementation and are not visible at the protocol level. They provide one approach to managing storage capacity, but other implementations may use different resource management strategies.
|
||||
|
||||
---
|
||||
|
||||
> **Protocol Compliance Note**: The Storage Provider implementation described above is specific to nim-codex. The only normative requirements for Storage Providers are defined in the [Storage Provider Role](#storage-provider-role) section of Part I. Implementations must satisfy those protocol requirements but may use completely different internal designs.
|
||||
|
||||
### Client Implementation
|
||||
|
||||
The nim-codex reference implementation provides a complete Client implementation with state machine management for storage request lifecycles. This section documents the nim-codex approach.
|
||||
|
||||
The nim-codex implementation uses a state machine pattern to manage purchase lifecycles, providing deterministic state transitions, explicit terminal states, and recovery support. The state machine definitions (state identifiers, transitions, state descriptions, requirements, data models, and interfaces) are documented in the subsections below.
|
||||
|
||||
> **Note**: The Purchase module terminology and state machine design are specific to the nim-codex implementation. The protocol only requires that clients interact with the marketplace smart contract as specified in the Client Role section.
|
||||
|
||||
#### State Identifiers
|
||||
|
||||
- PurchasePending: `pending`
|
||||
- PurchaseSubmitted: `submitted`
|
||||
- PurchaseStarted: `started`
|
||||
- PurchaseFinished: `finished`
|
||||
- PurchaseErrored: `errored`
|
||||
- PurchaseCancelled: `cancelled`
|
||||
- PurchaseFailed: `failed`
|
||||
- PurchaseUnknown: `unknown`
|
||||
|
||||
#### General Rules for All States
|
||||
|
||||
- If a `CancelledError` is raised, the state machine logs the cancellation message and takes no further action.
|
||||
- If a `CatchableError` is raised, the state machine moves to `errored` with the error message.
|
||||
|
||||
#### State Transitions
|
||||
|
||||
```text
                                    |
                                    v
       ------------------------- unknown
       |                            /                                /
       v                           v                                /
    pending ----> submitted ----> started ---------> finished <----/
                       \              \                           /
                        \              ------------> failed <----/
                         \                                      /
                          --> cancelled <-----------------------
```
|
||||
|
||||
**Note:**
|
||||
|
||||
Any state can transition to `errored` upon a `CatchableError`.
|
||||
`failed` is an intermediate state before `errored`.
|
||||
`finished`, `cancelled`, and `errored` are terminal states.
|
||||
|
||||
#### State Descriptions
|
||||
|
||||
**Pending State (`pending`)**
|
||||
|
||||
A storage request is being created by making a call `on-chain`. If the storage request creation fails, the state machine moves to the `errored` state with the corresponding error.
|
||||
|
||||
**Submitted State (`submitted`)**
|
||||
|
||||
The storage request has been created and the purchase waits for the request to start. When it starts, an `on-chain` event `RequestFulfilled` is emitted, triggering the subscription callback, and the state machine moves to the `started` state. If the expiry is reached before the callback is called, the state machine moves to the `cancelled` state.
|
||||
|
||||
**Started State (`started`)**
|
||||
|
||||
The purchase is active and waits until the end of the request, defined by the storage request parameters, before moving to the `finished` state. A subscription is made to the marketplace to be notified about request failure. If a request failure is notified, the state machine moves to `failed`.
|
||||
|
||||
Marketplace subscription signature:
|
||||
|
||||
```nim
method subscribeRequestFailed*(market: Market, requestId: RequestId, callback: OnRequestFailed): Future[Subscription] {.base, async.}
```
|
||||
|
||||
**Finished State (`finished`)**
|
||||
|
||||
The purchase is considered successful and cleanup routines are called. The purchase module calls `marketplace.withdrawFunds` to release the funds locked by the marketplace:
|
||||
|
||||
```nim
method withdrawFunds*(market: Market, requestId: RequestId) {.base, async: (raises: [CancelledError, MarketError]).}
```
|
||||
|
||||
After that, the purchase is done; no more states are called and the state machine stops successfully.
|
||||
|
||||
**Failed State (`failed`)**
|
||||
|
||||
If the marketplace emits a `RequestFailed` event, the state machine moves to the `failed` state and the purchase module calls `marketplace.withdrawFunds` (same signature as above) to release the funds locked by the marketplace. After that, the state machine moves to `errored`.
|
||||
|
||||
**Cancelled State (`cancelled`)**
|
||||
|
||||
The purchase is cancelled and the purchase module calls `marketplace.withdrawFunds` to release the funds locked by the marketplace (same signature as above). After that, the purchase is terminated; no more states are called and the state machine stops with the reason of failure as error.
|
||||
|
||||
**Errored State (`errored`)**
|
||||
|
||||
The purchase is terminated; no more states are called and the state machine stops with the reason of failure as error.
|
||||
|
||||
**Unknown State (`unknown`)**
|
||||
|
||||
The purchase is in recovery mode, meaning that the state has to be determined. The purchase module calls the marketplace to get the request data (`getRequest`) and the request state (`requestState`):
|
||||
|
||||
```nim
method getRequest*(market: Market, id: RequestId): Future[?StorageRequest] {.base, async: (raises: [CancelledError]).}

method requestState*(market: Market, requestId: RequestId): Future[?RequestState] {.base, async.}
```
|
||||
|
||||
Based on this information, it moves to the corresponding next state.
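
The transitions described in the state descriptions above can be sketched as a lookup table. The names are hypothetical. Two points are assumptions of the sketch: recovery from `unknown` may resume in any other state, and terminal states are treated as having no outgoing transitions at all.

```python
TERMINAL = {"finished", "cancelled", "errored"}

TRANSITIONS = {
    "pending": {"submitted"},
    "submitted": {"started", "cancelled"},
    "started": {"finished", "failed"},
    "failed": {"errored"},
    # Assumption: recovery may resume in any state other than `unknown` itself.
    "unknown": {
        "pending", "submitted", "started",
        "finished", "failed", "cancelled", "errored",
    },
}


def is_terminal(state: str) -> bool:
    return state in TERMINAL


def can_transition(src: str, dst: str) -> bool:
    if is_terminal(src):
        return False  # terminal states run no further states
    if dst == "errored":
        return True   # any running state may error on a CatchableError
    return dst in TRANSITIONS.get(src, set())
```
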
|
||||
|
||||
> **Note**: Functional and non-functional requirements for the client role are summarized in the [Codex Marketplace Specification](https://github.com/codex-storage/codex-spec/blob/master/specs/marketplace.md). The requirements listed below are specific to the nim-codex Purchase module implementation.
|
||||
|
||||
#### Functional Requirements
|
||||
|
||||
##### Purchase Definition
|
||||
|
||||
- Every purchase MUST represent exactly one `StorageRequest`
|
||||
- The purchase MUST have a unique, deterministic identifier `PurchaseId` derived from `requestId`
|
||||
- It MUST be possible to restore any purchase from its `requestId` after a restart
|
||||
- A purchase is considered expired when the expiry timestamp in its `StorageRequest` is reached before the request starts, i.e., before the `marketplace` emits a `RequestFulfilled` event
|
||||
|
||||
##### State Machine Progression
|
||||
|
||||
- New purchases MUST start in the `pending` state (submission flow)
|
||||
- Recovered purchases MUST start in the `unknown` state (recovery flow)
|
||||
- The state machine MUST progress step-by-step until a deterministic terminal state is reached
|
||||
- The choice of terminal state MUST be based on the `RequestState` returned by the `marketplace`
|
||||
|
||||
##### Failure Handling
|
||||
|
||||
- On marketplace failure events, the purchase MUST immediately transition to `errored` without retries
|
||||
- If a `CancelledError` is raised, the state machine MUST log the cancellation and stop further processing
|
||||
- If a `CatchableError` is raised, the state machine MUST transition to `errored` and record the error
|
||||
|
||||
#### Non-Functional Requirements
|
||||
|
||||
##### Execution Model
|
||||
|
||||
A purchase MUST be handled by a single thread; only one worker SHOULD process a given purchase instance at a time.
|
||||
|
||||
##### Reliability
|
||||
|
||||
`load` supports recovery after process restarts.
|
||||
|
||||
##### Performance
|
||||
|
||||
State transitions should be non-blocking; all I/O is async.
|
||||
|
||||
##### Logging
|
||||
|
||||
All state transitions and errors should be clearly logged for traceability.
|
||||
|
||||
##### Safety
|
||||
|
||||
- Avoid side effects during `new` other than initialising internal fields; `on-chain` interactions are delegated to states using the `marketplace` dependency.
|
||||
- Retry policy for external calls.
|
||||
|
||||
##### Testing
|
||||
|
||||
- Unit tests check that each state handles success and error properly.
|
||||
- Integration tests check that a full purchase flows correctly through states.
|
||||
|
||||
---
|
||||
|
||||
> **Protocol Compliance Note**: The Client implementation described above is specific to nim-codex. The only normative requirements for Clients are defined in the [Client Role](#client-role) section of Part I. Implementations must satisfy those protocol requirements but may use completely different internal designs.
|
||||
|
||||
---
|
||||
|
||||
## Copyright
|
||||
|
||||
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
|
||||
|
||||
## References
|
||||
|
||||
### Normative
|
||||
|
||||
- [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt) - Key words for use in RFCs to Indicate Requirement Levels
|
||||
- [Reed-Solomon algorithm](https://hackmd.io/FB58eZQoTNm-dnhu0Y1XnA) - Erasure coding algorithm used for data encoding
|
||||
- [CIDv1](https://github.com/multiformats/cid#cidv1) - Content Identifier specification
|
||||
- [multihash](https://github.com/multiformats/multihash) - Self-describing hashes
|
||||
- [Proof-of-Data-Possession](https://hackmd.io/2uRBltuIT7yX0CyczJevYg?view) - Zero-knowledge proof system for storage verification
|
||||
- [Original Codex Marketplace Spec](https://github.com/codex-storage/codex-spec/blob/master/specs/marketplace.md) - Source specification for this document
|
||||
|
||||
### Informative
|
||||
|
||||
- [Codex Implementation](https://github.com/codex-storage/nim-codex) - Reference implementation in Nim
|
||||
- [Codex market implementation](https://github.com/codex-storage/nim-codex/blob/master/codex/market.nim) - Marketplace module implementation
|
||||
- [Codex Sales Component Spec](https://github.com/codex-storage/codex-docs-obsidian/blob/main/10%20Notes/Specs/Component%20Specification%20-%20Sales.md) - Storage Provider implementation details
|
||||
- [Codex Purchase Component Spec](https://github.com/codex-storage/codex-docs-obsidian/blob/main/10%20Notes/Specs/Component%20Specification%20-%20Purchase.md) - Client implementation details
|
||||
- [Nim Chronos](https://github.com/status-im/nim-chronos) - Async/await framework for Nim
|
||||
- [Storage proof timing design](https://github.com/codex-storage/codex-research/blob/41c4b4409d2092d0a5475aca0f28995034e58d14/design/storage-proof-timing.md) - Proof timing mechanism
|
||||
@@ -2,5 +2,5 @@
|
||||
|
||||
Nomos is building a secure, flexible, and
|
||||
scalable infrastructure for developers creating applications for the network state.
|
||||
To learn more about Nomos current protocols under discussion,
|
||||
head over to [Nomos Specs](https://github.com/logos-co/nomos-specs).
|
||||
Published Specifications are currently available here,
|
||||
[Nomos Specifications](https://nomos-tech.notion.site/project).
|
||||
|
||||
170
nomos/raw/nomosda-encoding.md
Normal file
@@ -0,0 +1,170 @@
|
||||
---
|
||||
title: NOMOSDA-ENCODING
|
||||
name: NomosDA Encoding Protocol
|
||||
status: raw
|
||||
category:
|
||||
tags: data-availability
|
||||
editor: Daniel Sanchez-Quiros <danielsq@status.im>
|
||||
contributors:
|
||||
- Daniel Kashepava <danielkashepava@status.im>
|
||||
- Álvaro Castro-Castilla <alvaro@status.im>
|
||||
- Filip Dimitrijevic <filip@status.im>
|
||||
- Thomas Lavaur <thomaslavaur@status.im>
|
||||
- Mehmet Gonen <mehmet@status.im>
|
||||
---
|
||||
|
||||
## Introduction
|
||||
|
||||
This document describes the encoding and verification processes of NomosDA, which is the data availability (DA) solution used by the Nomos blockchain. NomosDA provides an assurance that all data from Nomos blobs are accessible and verifiable by every network participant.
|
||||
|
||||
This document presents an implementation specification describing how:
|
||||
|
||||
- Encoders encode blobs they want to upload to the Data Availability layer.
|
||||
- Other nodes implement the verification of blobs that were already uploaded to DA.
|
||||
|
||||
## Definitions
|
||||
|
||||
- **Encoder**: An encoder is any actor who performs the encoding process described in this document. This involves committing to the data, generating proofs, and submitting the result to the DA layer.
|
||||
|
||||
In the Nomos architecture, the rollup sequencer typically acts as the encoder, but the role is not exclusive: any actor in the DA layer can also act as an encoder.
|
||||
- **Verifier**: Verifies its portion of the distributed blob data as per the verification protocol. In the Nomos architecture, the DA nodes act as the verifiers.
|
||||
|
||||
## Overview
|
||||
|
||||
In the encoding stage, the encoder takes the DA parameters and the padded blob data and creates an initial matrix of data chunks. This matrix is expanded using Reed-Solomon coding and various commitments and proofs are created for the data.
|
||||
|
||||
When a verifier receives a sample, it verifies the data it receives from the encoder and broadcasts the information if the data is verified. Finally, the verifier stores the sample data for the required length of time.
|
||||
|
||||
## Construction
|
||||
|
||||
The encoder and verifier use the [NomosDA cryptographic protocol](https://www.notion.so/NomosDA-Cryptographic-Protocol-1fd261aa09df816fa97ac81304732e77?pvs=21) to carry out their respective functions. These functions are implemented as abstracted and configurable software entities that allow the original data to be encoded and verified via high-level operations.
|
||||
|
||||
### Glossary
|
||||
|
||||
| Name | Description | Representation |
| --- | --- | --- |
| `Commitment` | Commitment as per the [NomosDA Cryptographic Protocol](https://www.notion.so/NomosDA-Cryptographic-Protocol-1fd261aa09df816fa97ac81304732e77?pvs=21) | `bytes` |
| `Proof` | Proof as per the [NomosDA Cryptographic Protocol](https://www.notion.so/NomosDA-Cryptographic-Protocol-1fd261aa09df816fa97ac81304732e77?pvs=21) | `bytes` |
| `ChunksMatrix` | Matrix of chunked data. Each chunk is **31 bytes.** Row and Column sizes depend on the encoding necessities. | `List[List[bytes]]` |
|
||||
|
||||
### Encoder
|
||||
|
||||
An encoder takes a set of parameters and the blob data, and creates a matrix of chunks that it uses to compute the necessary cryptographic data. It produces the set of Reed-Solomon (RS) encoded data, the commitments, and the proofs that are needed prior to [dispersal](https://www.notion.so/NomosDA-Dispersal-1fd261aa09df815288c9caf45ed72c95?pvs=21).
|
||||
|
||||
```mermaid
flowchart LR
    A[DaEncoderParams] -->|Input| B(Encoder)
    I[31bytes-padded-input] -->|Input| B
    B -->|Creates| D[Chunks matrix]
    D --> |Input| C[NomosDA encoding]
    C --> E{Encoded data📄}
```
|
||||
|
||||
#### Encoding Process
|
||||
|
||||
The encoder executes the encoding process as follows:
|
||||
|
||||
1. The encoder takes the following input parameters:
|
||||
|
||||
```python
from dataclasses import dataclass

@dataclass
class DAEncoderParams:
    column_count: int  # usize in the specification
    bytes_per_field_element: int  # usize in the specification
```
|
||||
|
||||
| Name | Description | Representation |
| --- | --- | --- |
| `column_count` | The number of subnets available for dispersal in the system | `usize`, `int` in Python |
| `bytes_per_field_element` | The number of bytes per data chunk. This is set to 31 bytes. Each chunk has 31 bytes rather than 32 to ensure that the chunk value does not exceed the maximum value on the [BLS12-381 elliptic curve](https://electriccoin.co/blog/new-snark-curve/). | `usize`, `int` in Python |
|
||||
|
||||
2. The encoder also includes the blob data to be encoded, which must be of a size that is a multiple of `bytes_per_field_element` bytes. Clients are responsible for padding the data so it fits this constraint.
|
||||
3. The encoder splits the data into `bytes_per_field_element`-sized chunks. It also arranges these chunks into rows and columns, creating a matrix.
    a. The number of columns of the matrix needs to fit with the `column_count` parameter, taking into account the `rs_expansion_factor` (currently fixed to 2).
        i. This means that the size of each row in this matrix is `(bytes_per_field_element*column_count)/rs_expansion_factor`.
    b. The number of rows depends on the size of the data.
|
||||
4. The data is encoded as per [the cryptographic details](https://www.notion.so/NomosDA-Cryptographic-Protocol-1fd261aa09df816fa97ac81304732e77?pvs=21).
|
||||
5. The encoder provides the encoded data set:
|
||||
|
||||
| Name | Description | Representation |
| --- | --- | --- |
| `data` | Original data | `bytes` |
| `chunked_data` | Matrix before RS expansion | `ChunksMatrix` |
| `extended_matrix` | Matrix after RS expansion | `ChunksMatrix` |
| `row_commitments` | Commitments for each matrix row | `List[Commitment]` |
| `combined_column_proofs` | Proofs for each matrix column | `List[Proof]` |
|
||||
|
||||
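```python
from dataclasses import dataclass
from typing import List

# Type aliases following the representations given in the glossary.
Commitment = bytes
Proof = bytes
ChunksMatrix = List[List[bytes]]

@dataclass
class EncodedData:
    data: bytes
    chunked_data: ChunksMatrix
    extended_matrix: ChunksMatrix
    row_commitments: List[Commitment]
    combined_column_proofs: List[Proof]
```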
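
Step 3 of the encoding process (arranging the blob into a chunk matrix) can be sketched as follows. This is an illustration under stated assumptions, not the reference encoder: the function name `chunkify` is hypothetical, `column_count` is assumed to be divisible by `rs_expansion_factor`, and the input is assumed to be already padded by the client.

```python
from typing import List

ChunksMatrix = List[List[bytes]]


def chunkify(
    data: bytes,
    column_count: int,
    bytes_per_field_element: int = 31,
    rs_expansion_factor: int = 2,
) -> ChunksMatrix:
    """Arrange padded blob data into the matrix used before RS expansion."""
    if len(data) % bytes_per_field_element != 0:
        raise ValueError("data must be padded to a multiple of the chunk size")
    # Row size in bytes: (bytes_per_field_element * column_count) / rs_expansion_factor
    row_bytes = (bytes_per_field_element * column_count) // rs_expansion_factor
    matrix: ChunksMatrix = []
    for offset in range(0, len(data), row_bytes):
        row = data[offset:offset + row_bytes]
        # Split the row into chunks of bytes_per_field_element bytes each.
        matrix.append(
            [
                row[i:i + bytes_per_field_element]
                for i in range(0, len(row), bytes_per_field_element)
            ]
        )
    return matrix
```

For example, with `column_count = 8` each row holds four 31-byte chunks before expansion, so a 248-byte blob produces a 2 x 4 matrix. When the data does not fill the last row, that row is shorter in this sketch; how such a row is handled is not covered here.
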
|
||||
|
||||
#### Encoder Limits
|
||||
|
||||
NomosDA does not impose a fixed limit on blob size at the encoding level. However, protocols that involve resource-intensive operations must include upper bounds to prevent abuse. In the case of NomosDA, blob size limits are expected to be enforced, as part of the protocol's broader responsibility for resource management and fairness.
|
||||
|
||||
Larger blobs naturally result in higher computational and bandwidth costs, particularly for the encoder, who must compute a proof for each column. Without size limits, malicious clients could exploit the system by attempting to stream unbounded data to DA nodes. Since payment is provided before blob dispersal, DA nodes are protected from performing unnecessary work. This enables the protocol to safely accept very large blobs, as the primary computational cost falls on the encoder. The protocol can accommodate generous blob sizes in practice, while rejecting only absurdly large blobs, such as those exceeding 1 GB, to prevent denial-of-service attacks and ensure network stability.
|
||||
|
||||
To mitigate such abuse, the protocol defines acceptable blob size limits, and DA implementations enforce local mitigation strategies, such as flagging or blacklisting clients that violate these constraints.
|
||||
|
||||
### Verifier
|
||||
|
||||
A verifier checks the proper encoding of data blobs it receives. A verifier executes the verification process as follows:
|
||||
|
||||
1. The verifier receives a `DAShare` with the required verification data:
|
||||
|
||||
| Name | Description | Representation |
| --- | --- | --- |
| `column` | Column chunks (31 bytes) from the encoded matrix | `List[bytes]` |
| `column_idx` | Column id (`0..2047`). It is directly related to the `subnetworks` in the [network specification](https://www.notion.so/NomosDA-Network-Specification-1fd261aa09df81188e76cb083791252d?pvs=21). | `u16`, unsigned int of 16 bits. `int` in Python |
| `combined_column_proof` | Proof of the random linear combination of the column elements. | `Proof` |
| `row_commitments` | Commitments for each matrix row | `List[Commitment]` |
| `blob_id` | This is computed as the hash (**blake2b**) of `row_commitments` | `bytes` |
|
||||
|
||||
2. Upon receiving the above data it verifies the column data as per the [cryptographic details](https://www.notion.so/NomosDA-Cryptographic-Protocol-1fd261aa09df816fa97ac81304732e77?pvs=21). If the verification is successful, the node triggers the [replication protocol](https://www.notion.so/NomosDA-Subnetwork-Replication-1fd261aa09df811d93f8c6280136bfbb?pvs=21) and stores the blob.
|
||||
|
||||
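```python
from dataclasses import dataclass
from hashlib import blake2b
from typing import List

# Type aliases following the representations given in the tables above.
Commitment = bytes
Proof = bytes
Column = List[bytes]
BlobId = bytes

@dataclass
class DAShare:
    column: Column
    column_idx: int  # u16 in the specification
    combined_column_proof: Proof
    row_commitments: List[Commitment]

    def blob_id(self) -> BlobId:
        hasher = blake2b(digest_size=32)
        for c in self.row_commitments:
            hasher.update(bytes(c))
        return hasher.digest()
```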
|
||||
|
||||
### Verification Logic
|
||||
|
||||
```mermaid
sequenceDiagram
    participant N as Node
    participant S as Subnetwork Column N
    loop For each incoming blob column
        N-->>N: If blob is valid
        N-->>S: Replication
        N->>N: Stores blob
    end
```
|
||||
|
||||
## Details
|
||||
|
||||
The encoder and verifier processes described above make use of a variety of cryptographic functions to facilitate the correct verification of column data by verifiers. These functions rely on primitives such as polynomial commitments and Reed-Solomon erasure codes, the details of which are outside the scope of this document. These details, as well as introductions to the cryptographic primitives being used, can be found in the NomosDA Cryptographic Protocol:
|
||||
|
||||
[NomosDA Cryptographic Protocol](https://www.notion.so/NomosDA-Cryptographic-Protocol-1fd261aa09df816fa97ac81304732e77?pvs=21)
|
||||
|
||||
## References
|
||||
|
||||
- Encoder Specification: [GitHub/encoder.py](https://github.com/logos-co/nomos-specs/blob/master/da/encoder.py)
|
||||
- Verifier Specification: [GitHub/verifier.py](https://github.com/logos-co/nomos-specs/blob/master/da/verifier.py)
|
||||
- Cryptographic protocol: [NomosDA Cryptographic Protocol](https://www.notion.so/NomosDA-Cryptographic-Protocol-1fd261aa09df816fa97ac81304732e77?pvs=21)
|
||||
|
||||
## Copyright
|
||||
|
||||
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
|
||||
255
nomos/raw/nomosda-network.md
Normal file
@@ -0,0 +1,255 @@
|
||||
---
|
||||
title: NOMOS-DA-NETWORK
|
||||
name: NomosDA Network
|
||||
status: raw
|
||||
category:
|
||||
tags: network, data-availability, da-nodes, executors, sampling
|
||||
editor: Daniel Sanchez Quiros <danielsq@status.im>
|
||||
contributors:
|
||||
- Álvaro Castro-Castilla <alvaro@status.im>
|
||||
- Daniel Kashepava <danielkashepava@status.im>
|
||||
- Gusto Bacvinka <augustinas@status.im>
|
||||
- Filip Dimitrijevic <filip@status.im>
|
||||
---
|
||||
|
||||
## Introduction
|
||||
|
||||
NomosDA is the scalability solution protocol for data availability within the Nomos network.
|
||||
This document delineates the protocol's structure at the network level,
|
||||
identifies participants,
|
||||
and describes the interactions among its components.
|
||||
Please note that this document does not delve into the cryptographic aspects of the design.
|
||||
A detailed specification of the cryptographic operations is a work in progress.
|
||||
|
||||
## Objectives
|
||||
|
||||
NomosDA was created to ensure that data from Nomos zones is distributed, verifiable, immutable, and accessible.
|
||||
At the same time, it is optimised for the following properties:
|
||||
|
||||
- **Decentralization**: NomosDA’s data availability guarantees must be achieved with minimal trust assumptions
|
||||
and centralised actors. Therefore,
|
||||
permissioned DA schemes involving a Data Availability Committee (DAC) had to be avoided in the design.
|
||||
Schemes that require some nodes to download the entire blob data were also off the list
|
||||
due to the disproportionate role played by these “supernodes”.
|
||||
|
||||
- **Scalability**: NomosDA is intended to be a bandwidth-scalable protocol, ensuring that its functions are maintained as the Nomos network grows. Therefore, NomosDA was designed to minimise the amount of data sent to participants, reducing the communication bottleneck and allowing more parties to participate in the DA process.
|
||||
|
||||
To achieve the above properties, NomosDA splits up zone data and
|
||||
distributes it among network participants,
|
||||
with cryptographic properties used to verify the data’s integrity.
|
||||
A major feature of this design is that parties who wish to receive an assurance of data availability
|
||||
can do so very quickly and with minimal hardware requirements.
|
||||
However, this comes at the cost of additional complexity and resources required by more integral participants.
|
||||
|
||||
## Requirements
|
||||
|
||||
In order to ensure that the above objectives are met,
|
||||
the NomosDA network requires a group of participants
|
||||
that undertake a greater burden in terms of active involvement in the protocol.
|
||||
Recognising that not all node operators can do so,
|
||||
NomosDA assigns different roles to different kinds of participants,
|
||||
depending on their ability and willingness to contribute more computing power
|
||||
and bandwidth to the protocol.
|
||||
It was therefore necessary for NomosDA to be implemented as an opt-in Service Network.
|
||||
|
||||
Because the NomosDA network has an arbitrary number of participants,
|
||||
and the data is split into a fixed number of portions (see the [Encoding & Verification Specification](https://www.notion.so/NomosDA-Encoding-Verification-4d8ca269e96d4fdcb05abc70426c5e7c)),
|
||||
it was necessary to define exactly how each portion is assigned to a participant who will receive and verify it.
|
||||
This assignment algorithm must also be flexible enough to ensure smooth operation in a variety of scenarios,
|
||||
including where there are more or fewer participants than the number of portions.
|
||||
|
||||
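As a rough illustration of the flexibility requirement above, the sketch below spreads participants over a fixed number of subnetworks in round-robin order. It is not the NomosDA assignment algorithm, which is specified separately; the function name `assign_round_robin` and the use of plain strings as participant identifiers are assumptions made for this example only.

```python
from typing import List, Sequence, Set


def assign_round_robin(participants: Sequence[str], num_subnets: int) -> List[Set[str]]:
    """Illustrative only: spread participants over num_subnets subnetworks."""
    if not participants or num_subnets <= 0:
        raise ValueError("need at least one participant and one subnetwork")
    subnets: List[Set[str]] = [set() for _ in range(num_subnets)]
    # Enough slots to place every participant and to leave no subnetwork empty.
    total_slots = max(len(participants), num_subnets)
    for slot in range(total_slots):
        # Participants are reused cyclically when there are fewer of them than subnetworks.
        subnets[slot % num_subnets].add(participants[slot % len(participants)])
    return subnets


# More participants than subnetworks: each participant lands in exactly one subnetwork.
many = assign_round_robin([f"peer{i}" for i in range(10)], num_subnets=4)
# Fewer participants than subnetworks: each participant serves several subnetworks.
few = assign_round_robin(["peerA", "peerB"], num_subnets=4)
```

Both cases leave every subnetwork with at least one member, which is the property the requirement asks of any real assignment scheme.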
## Overview
|
||||
|
||||
### Network Participants
|
||||
|
||||
The NomosDA network includes three categories of participants:
|
||||
|
||||
- **Executors**: Tasked with the encoding and dispersal of data blobs.
|
||||
- **DA Nodes**: Receive and verify the encoded data,
|
||||
subsequently temporarily storing it for further network validation through sampling.
|
||||
- **Light Nodes**: Employ sampling to ascertain data availability.
|
||||
|
||||
### Network Distribution
|
||||
|
||||
The NomosDA network is segmented into `num_subnets` subnetworks.
|
||||
These subnetworks represent subsets of peers from the overarching network,
|
||||
each responsible for a distinct portion of the distributed encoded data.
|
||||
Peers in the network may engage in one or multiple subnetworks,
|
||||
contingent upon network size and participant count.
|
||||
|
||||
### Sub-protocols
|
||||
|
||||
The NomosDA protocol consists of the following sub-protocols:
|
||||
|
||||
- **Dispersal**: Describes how executors distribute encoded data blobs to subnetworks.
|
||||
[NomosDA Dispersal](https://www.notion.so/NomosDA-Dispersal-1818f96fb65c805ca257cb14798f24d4?pvs=21)
|
||||
- **Replication**: Defines how DA nodes distribute encoded data blobs within subnetworks.
|
||||
[NomosDA Subnetwork Replication](https://www.notion.so/NomosDA-Subnetwork-Replication-1818f96fb65c80119fa0e958a087cc2b?pvs=21)
|
||||
- **Sampling**: Used by sampling clients (e.g., light clients) to verify the availability of previously dispersed
|
||||
and replicated data.
|
||||
[NomosDA Sampling](https://www.notion.so/NomosDA-Sampling-1538f96fb65c8031a44cf7305d271779?pvs=21)
|
||||
- **Reconstruction**: Describes gathering and decoding dispersed data back into its original form.
|
||||
[NomosDA Reconstruction](https://www.notion.so/NomosDA-Reconstruction-1828f96fb65c80b2bbb9f4c5a0cf26a5?pvs=21)
|
||||
- **Indexing**: Tracks and exposes blob metadata on-chain.
|
||||
[NomosDA Indexing](https://www.notion.so/NomosDA-Indexing-1bb8f96fb65c8044b635da9df20c2411?pvs=21)
|
||||
|
||||
## Construction
|
||||
|
||||
### NomosDA Network Registration
|
||||
|
||||
Entities wishing to participate in NomosDA must declare their role via [SDP](https://www.notion.so/Final-Draft-Validator-Role-Protocol-17b8f96fb65c80c69c2ef55e22e29506) (Service Declaration Protocol).
|
||||
Once declared, they're accounted for in the subnetwork construction.
|
||||
|
||||
This enables participation in:
|
||||
|
||||
- Dispersal (as executor)
|
||||
- Replication & sampling (as DA node)
|
||||
- Sampling (as light node)
|
||||
|
||||
### Subnetwork Assignment
|
||||
|
||||
The NomosDA network comprises `num_subnets` subnetworks,
|
||||
which are virtual in nature.
|
||||
A subnetwork is a subset of peers grouped together so that nodes know which peers to connect to when executing the dispersal and replication sub-protocols.
|
||||
In each subnetwork, participants establish a fully connected overlay,
|
||||
ensuring all nodes maintain permanent connections for the lifetime of the SDP set
|
||||
with peers within the same subnetwork.
|
||||
Nodes consult the Data Availability SDP set to determine their connectivity requirements across subnetworks.
|
||||
|
||||
#### Assignment Algorithm
|
||||
|
||||
The concrete distribution algorithm is described in the following specification:
|
||||
[DA Subnetwork Assignation](https://www.notion.so/DA-Subnetwork-Assignation-217261aa09df80fc8bb9cf46092741ce)
|
||||
|
||||
## Executor Connections
|
||||
|
||||
Each executor maintains a connection with one peer per subnetwork,
|
||||
necessitating at least `num_subnets` stable and healthy connections.
|
||||
Executors are expected to allocate adequate resources to sustain these connections.
|
||||
An example algorithm for peer selection would be:
|
||||
|
||||
```python
from typing import Sequence, Set

# Placeholder alias for illustration; a real node would use its networking stack's peer ID type.
PeerId = str


def select_peers(
    subnetworks: Sequence[Set[PeerId]],
    filtered_subnetworks: Set[int],
    filtered_peers: Set[PeerId],
) -> Set[PeerId]:
    result = set()
    for i, subnetwork in enumerate(subnetworks):
        # Exclude peers the caller has filtered out.
        available_peers = subnetwork - filtered_peers
        if i not in filtered_subnetworks and available_peers:
            # Any available peer will do; take the first one the set yields.
            result.add(next(iter(available_peers)))
    return result
```
|
||||
|
||||
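Because an executor needs one healthy connection per subnetwork, it can be useful to check which subnetworks are still uncovered after peer selection. The following is a minimal sketch, assuming peer IDs are plain strings; `uncovered_subnetworks` is a hypothetical helper and not part of the specification.

```python
from typing import List, Sequence, Set


def uncovered_subnetworks(subnetworks: Sequence[Set[str]], connected: Set[str]) -> List[int]:
    """Return indices of subnetworks in which no member is currently connected."""
    return [i for i, members in enumerate(subnetworks) if not (members & connected)]


subnetworks = [{"a", "b"}, {"c"}, {"d", "e"}]
missing = uncovered_subnetworks(subnetworks, connected={"a", "e"})
```

An executor could retry peer selection for the returned indices only, rather than rebuilding every connection.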
## NomosDA Protocol Steps
|
||||
|
||||
### Dispersal
|
||||
|
||||
1. The NomosDA protocol is initiated by executors
|
||||
who perform data encoding as outlined in the [Encoding Specification](https://www.notion.so/NomosDA-Encoding-Verification-4d8ca269e96d4fdcb05abc70426c5e7c).
|
||||
2. Executors prepare and distribute each encoded data portion
|
||||
to its designated subnetwork (from `0` to `num_subnets - 1`).
|
||||
3. Executors might opt to perform sampling to confirm successful dispersal.
|
||||
4. Post-dispersal, executors publish the dispersed `blob_id` and metadata to the mempool. <!-- TODO: add link to dispersal document-->
|
||||
|
||||
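Step 2 above can be sketched as a simple routing plan. This is a hedged illustration, not the dispersal protocol itself: it assumes that the column at index `i` is designated to subnetwork `i`, that the executor already holds one connected peer per subnetwork, and that peer IDs are strings. `plan_dispersal` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple


def plan_dispersal(
    columns: Sequence[bytes],
    peer_for_subnet: Dict[int, str],
) -> List[Tuple[str, int, bytes]]:
    """Pair every encoded column with the peer that represents its subnetwork."""
    plan = []
    for subnet_id, column in enumerate(columns):
        # Assumption for this sketch: column i is designated to subnetwork i.
        if subnet_id not in peer_for_subnet:
            raise LookupError(f"no connected peer for subnetwork {subnet_id}")
        plan.append((peer_for_subnet[subnet_id], subnet_id, column))
    return plan


plan = plan_dispersal([b"col0", b"col1"], {0: "peerA", 1: "peerB"})
```

Failing loudly when a subnetwork has no connected peer reflects the requirement that executors keep at least `num_subnets` healthy connections.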
### Replication
|
||||
|
||||
DA nodes receive columns from dispersal or replication
|
||||
and validate the data encoding.
|
||||
Upon successful validation,
|
||||
they replicate the validated column to connected peers within their subnetwork.
|
||||
Replication occurs once per blob; subsequent validations of the same blob are discarded.
|
||||
|
||||
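The validate-then-replicate-once behaviour can be sketched as follows. This is an illustration under stated assumptions: the class name `ReplicationState` is hypothetical, the `validate` callback stands in for the real encoding verification, and keying the "already replicated" record on blob ID plus column index is a choice made for this example.

```python
from typing import Callable, Iterable, List, Set, Tuple


class ReplicationState:
    """Tracks which (blob_id, column_index) pairs this node has already replicated."""

    def __init__(self, validate: Callable[[bytes], bool]) -> None:
        self._validate = validate
        self._seen: Set[Tuple[bytes, int]] = set()

    def on_column(self, blob_id: bytes, column_index: int, column: bytes,
                  subnet_peers: Iterable[str]) -> List[str]:
        """Return the peers the column should be forwarded to (empty if dropped)."""
        key = (blob_id, column_index)
        if key in self._seen:
            return []  # already replicated once; discard the repeat
        if not self._validate(column):
            return []  # failed validation; never replicated
        self._seen.add(key)
        return list(subnet_peers)


# Stand-in validator for the example: any non-empty column counts as valid.
state = ReplicationState(validate=lambda column: len(column) > 0)
first = state.on_column(b"blob", 0, b"data", ["p1", "p2"])
second = state.on_column(b"blob", 0, b"data", ["p1", "p2"])
```

The first call returns both peers, while the repeated call returns an empty list, so a column circulating inside a fully connected subnetwork is forwarded by each node at most once.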
### Sampling
|
||||
|
||||
1. Sampling is [invoked based on the node's current role](https://www.notion.so/1538f96fb65c8031a44cf7305d271779?pvs=25#15e8f96fb65c8006b9d7f12ffdd9a159).
|
||||
2. The node selects `sample_size` random subnetworks
|
||||
and queries each for the availability of the corresponding column for the sampled blob. Sampling is deemed successful only if all queried subnetworks respond affirmatively.
|
||||
|
||||
- If `num_subnets` is 2048, `sample_size` is [20 as per the sampling research](https://www.notion.so/1708f96fb65c80a08c97d728cb8476c3?pvs=25#1708f96fb65c80bab6f9c6a946940078)
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
SamplingClient ->> DANode_1: Request
|
||||
DANode_1 -->> SamplingClient: Response
|
||||
SamplingClient ->>DANode_2: Request
|
||||
DANode_2 -->> SamplingClient: Response
|
||||
SamplingClient ->> DANode_n: Request
|
||||
DANode_n -->> SamplingClient: Response
|
||||
```
|
||||
|
||||
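The sampling rule can be sketched in a few lines. This is a minimal sketch, not the sampling protocol: `query` is a hypothetical callback that asks one subnetwork whether it holds its column for the blob, and choosing distinct subnetworks is an assumption. The second function computes the chance that every sample succeeds when only `available` of the `num_subnets` columns can actually be served, which gives a feel for why a small `sample_size` can be enough.

```python
import random
from typing import Callable, Optional


def sample_blob(
    blob_id: bytes,
    num_subnets: int,
    sample_size: int,
    query: Callable[[int, bytes], bool],
    rng: Optional[random.Random] = None,
) -> bool:
    """Query sample_size distinct random subnetworks; succeed only if all respond affirmatively."""
    rng = rng or random.Random()
    chosen = rng.sample(range(num_subnets), sample_size)
    return all(query(subnet_id, blob_id) for subnet_id in chosen)


def all_samples_pass_probability(num_subnets: int, available: int, sample_size: int) -> float:
    """Probability that every sampled column is among the `available` ones."""
    probability = 1.0
    for i in range(sample_size):
        # Sampling without replacement: one fewer candidate after each draw.
        probability *= max(available - i, 0) / (num_subnets - i)
    return probability


# Example threshold chosen for illustration only: half of 2048 columns retrievable.
p = all_samples_pass_probability(num_subnets=2048, available=1024, sample_size=20)
```

In that illustrative case every factor is at most one half, so `p` stays below `2 ** -20`, that is, under one in a million.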
### Network Schematics
|
||||
|
||||
The overall network and protocol interactions are represented by the following diagram:
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
subgraph Replication
|
||||
subgraph Subnetwork_N
|
||||
N10 -->|Replicate| N20
|
||||
N20 -->|Replicate| N30
|
||||
N30 -->|Replicate| N10
|
||||
end
|
||||
subgraph ...
|
||||
end
|
||||
subgraph Subnetwork_0
|
||||
N1 -->|Replicate| N2
|
||||
N2 -->|Replicate| N3
|
||||
N3 -->|Replicate| N1
|
||||
end
|
||||
end
|
||||
subgraph Sampling
|
||||
N9 -->|Sample 0| N2
|
||||
N9 -->|Sample S| N20
|
||||
end
|
||||
subgraph Dispersal
|
||||
Executor -->|Disperse| N1
|
||||
Executor -->|Disperse| N10
|
||||
end
|
||||
```
|
||||
|
||||
## Details
|
||||
|
||||
### Network specifics
|
||||
|
||||
The NomosDA network is engineered for connection efficiency.
|
||||
Executors manage numerous open connections,
|
||||
utilizing their resource capabilities.
|
||||
DA nodes, with their resource constraints,
|
||||
are designed to maximize connection reuse.
|
||||
|
||||
NomosDA uses [multiplexed](https://docs.libp2p.io/concepts/transports/quic/#quic-native-multiplexing) streams over [QUIC](https://docs.libp2p.io/concepts/transports/quic/) connections.
|
||||
For each sub-protocol, a stream protocol ID is defined to negotiate the protocol,
|
||||
triggering the specific protocol once established:
|
||||
|
||||
- Dispersal: /nomos/da/{version}/dispersal
|
||||
- Replication: /nomos/da/{version}/replication
|
||||
- Sampling: /nomos/da/{version}/sampling
|
||||
|
||||
Through these multiplexed streams,
|
||||
DA nodes can utilize the same connection for all sub-protocols.
|
||||
This, combined with virtual subnetworks (membership sets),
|
||||
ensures the overlay node distribution is scalable for networks of any size.
|
||||
|
||||
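For illustration, the stream protocol IDs listed above can be generated from a single template. The version string used below is a placeholder and is not defined by this document; `protocol_id` is a hypothetical helper.

```python
SUB_PROTOCOLS = ("dispersal", "replication", "sampling")


def protocol_id(sub_protocol: str, version: str) -> str:
    """Build the stream protocol ID for one NomosDA sub-protocol."""
    if sub_protocol not in SUB_PROTOCOLS:
        raise ValueError(f"unknown sub-protocol: {sub_protocol}")
    return f"/nomos/da/{version}/{sub_protocol}"


# "1.0.0" is a placeholder version, not a value taken from the specification.
ids = [protocol_id(name, "1.0.0") for name in SUB_PROTOCOLS]
```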
## References
|
||||
|
||||
- [Encoding Specification](https://www.notion.so/NomosDA-Encoding-Verification-4d8ca269e96d4fdcb05abc70426c5e7c)
|
||||
- [Encoding & Verification Specification](https://www.notion.so/NomosDA-Encoding-Verification-4d8ca269e96d4fdcb05abc70426c5e7c)
|
||||
- [NomosDA Dispersal](https://www.notion.so/NomosDA-Dispersal-1818f96fb65c805ca257cb14798f24d4?pvs=21)
|
||||
- [NomosDA Subnetwork Replication](https://www.notion.so/NomosDA-Subnetwork-Replication-1818f96fb65c80119fa0e958a087cc2b?pvs=21)
|
||||
- [DA Subnetwork Assignation](https://www.notion.so/DA-Subnetwork-Assignation-217261aa09df80fc8bb9cf46092741ce)
|
||||
- [NomosDA Sampling](https://www.notion.so/NomosDA-Sampling-1538f96fb65c8031a44cf7305d271779?pvs=21)
|
||||
- [NomosDA Reconstruction](https://www.notion.so/NomosDA-Reconstruction-1828f96fb65c80b2bbb9f4c5a0cf26a5?pvs=21)
|
||||
- [NomosDA Indexing](https://www.notion.so/NomosDA-Indexing-1bb8f96fb65c8044b635da9df20c2411?pvs=21)
|
||||
- [SDP](https://www.notion.so/Final-Draft-Validator-Role-Protocol-17b8f96fb65c80c69c2ef55e22e29506)
|
||||
- [invoked based on the node's current role](https://www.notion.so/1538f96fb65c8031a44cf7305d271779?pvs=25#15e8f96fb65c8006b9d7f12ffdd9a159)
|
||||
- [20 as per the sampling research](https://www.notion.so/1708f96fb65c80a08c97d728cb8476c3?pvs=25#1708f96fb65c80bab6f9c6a946940078)
|
||||
- [multiplexed](https://docs.libp2p.io/concepts/transports/quic/#quic-native-multiplexing)
|
||||
- [QUIC](https://docs.libp2p.io/concepts/transports/quic/)
|
||||
|
||||
## Copyright
|
||||
|
||||
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
|
||||
236
nomos/raw/p2p-hardware-requirements.md
Normal file
@@ -0,0 +1,236 @@
|
||||
---
|
||||
title: P2P-HARDWARE-REQUIREMENTS
|
||||
name: Nomos P2P Network Hardware Requirements Specification
|
||||
status: raw
|
||||
category: infrastructure
|
||||
tags: [hardware, requirements, nodes, validators, services]
|
||||
editor: Daniel Sanchez-Quiros <danielsq@status.im>
|
||||
contributors:
|
||||
- Filip Dimitrijevic <filip@status.im>
|
||||
---
|
||||
|
||||
## Abstract
|
||||
|
||||
This specification defines the hardware requirements for running various types of Nomos blockchain nodes. Hardware needs vary significantly based on the node's role, from lightweight verification nodes to high-performance Zone Executors. The requirements are designed to support diverse participation levels while ensuring network security and performance.
|
||||
|
||||
## Motivation
|
||||
|
||||
The Nomos network is designed to be inclusive and accessible across a wide range of hardware configurations. By defining clear hardware requirements for different node types, we enable:
|
||||
|
||||
1. **Inclusive Participation**: Allow users with limited resources to participate as Light Nodes
|
||||
2. **Scalable Infrastructure**: Support varying levels of network participation based on available resources
|
||||
3. **Performance Optimization**: Ensure adequate resources for computationally intensive operations
|
||||
4. **Network Security**: Maintain network integrity through properly resourced validator nodes
|
||||
5. **Service Quality**: Define requirements for optional services that enhance network functionality
|
||||
|
||||
**Important Notice**: These hardware requirements are preliminary and subject to revision based on implementation testing and real-world network performance data.
|
||||
|
||||
## Specification
|
||||
|
||||
### Node Types Overview
|
||||
|
||||
Hardware requirements vary based on the node's role and services:
|
||||
|
||||
- **Light Node**: Minimal verification with minimal resources
|
||||
- **Basic Bedrock Node**: Standard validation participation
|
||||
- **Service Nodes**: Enhanced capabilities for optional network services
|
||||
|
||||
### Light Node
|
||||
|
||||
Light Nodes provide network verification with minimal resource requirements, suitable for resource-constrained environments.
|
||||
|
||||
**Target Use Cases:**
|
||||
|
||||
- Mobile devices and smartphones
|
||||
- Single-board computers (Raspberry Pi, etc.)
|
||||
- IoT devices with network connectivity
|
||||
- Users with limited hardware resources
|
||||
|
||||
**Hardware Requirements:**
|
||||
|
||||
| Component | Specification |
|
||||
|-----------|---------------|
|
||||
| **CPU** | Low-power processor (smartphone/SBC capable) |
|
||||
| **Memory (RAM)** | 512 MB |
|
||||
| **Storage** | Minimal (few GB) |
|
||||
| **Network** | Reliable connection, 1 Mbps free bandwidth |
|
||||
|
||||
### Basic Bedrock Node (Validator)
|
||||
|
||||
Basic validators participate in Bedrock consensus using typical consumer hardware.
|
||||
|
||||
**Target Use Cases:**
|
||||
|
||||
- Individual validators on consumer hardware
|
||||
- Small-scale validation operations
|
||||
- Entry-level network participation
|
||||
|
||||
**Hardware Requirements:**
|
||||
|
||||
| Component | Specification |
|
||||
|-----------|---------------|
|
||||
| **CPU** | Modern multi-core processor (2 cores, 2 GHz) |
|
||||
| **Memory (RAM)** | 1 GB minimum |
|
||||
| **Storage** | SSD with 100+ GB free space, expandable |
|
||||
| **Network** | Reliable connection, 1 Mbps free bandwidth |
|
||||
|
||||
### Service-Specific Requirements
|
||||
|
||||
Nodes can optionally run additional Bedrock Services that require enhanced resources beyond basic validation.
|
||||
|
||||
#### Data Availability (DA) Service
|
||||
|
||||
DA Service nodes store and serve data shares for the network's data availability layer.
|
||||
|
||||
**Service Role:**
|
||||
|
||||
- Store blockchain data and blob data long-term
|
||||
- Serve data shares to requesting nodes
|
||||
- Maintain high availability for data retrieval
|
||||
|
||||
**Additional Requirements:**
|
||||
|
||||
| Component | Specification | Rationale |
|
||||
|-----------|---------------|-----------|
|
||||
| **CPU** | Same as Basic Bedrock Node | Standard processing needs |
|
||||
| **Memory (RAM)** | Same as Basic Bedrock Node | Standard memory needs |
|
||||
| **Storage** | **Fast SSD, 500+ GB free** | Long-term chain and blob storage |
|
||||
| **Network** | **High bandwidth (10+ Mbps)** | Concurrent data serving |
|
||||
| **Connectivity** | **Stable, accessible external IP** | Direct peer connections |
|
||||
|
||||
**Network Requirements:**
|
||||
|
||||
- Capacity to handle multiple concurrent connections
|
||||
- Stable external IP address for direct peer access
|
||||
- Low latency for efficient data serving
|
||||
|
||||
#### Blend Protocol Service
|
||||
|
||||
Blend Protocol nodes provide anonymous message routing capabilities.
|
||||
|
||||
**Service Role:**
|
||||
|
||||
- Route messages anonymously through the network
|
||||
- Provide timing obfuscation for privacy
|
||||
- Maintain multiple concurrent connections
|
||||
|
||||
**Additional Requirements:**
|
||||
|
||||
| Component | Specification | Rationale |
|
||||
|-----------|---------------|-----------|
|
||||
| **CPU** | Same as Basic Bedrock Node | Standard processing needs |
|
||||
| **Memory (RAM)** | Same as Basic Bedrock Node | Standard memory needs |
|
||||
| **Storage** | Same as Basic Bedrock Node | Standard storage needs |
|
||||
| **Network** | **Stable connection (10+ Mbps)** | Multiple concurrent connections |
|
||||
| **Connectivity** | **Stable, accessible external IP** | Direct peer connections |
|
||||
|
||||
**Network Requirements:**
|
||||
|
||||
- Low-latency connection for effective message blending
|
||||
- Stable connection for timing obfuscation
|
||||
- Capability to handle multiple simultaneous connections
|
||||
|
||||
#### Executor Network Service
|
||||
|
||||
Zone Executors perform the most computationally intensive work in the network.
|
||||
|
||||
**Service Role:**
|
||||
|
||||
- Execute Zone state transitions
|
||||
- Generate zero-knowledge proofs
|
||||
- Process complex computational workloads
|
||||
|
||||
**Critical Performance Note**: Zone Executors perform the heaviest computational work in the network. High-performance hardware is crucial for effective participation and may provide competitive advantages in execution markets.
|
||||
|
||||
**Hardware Requirements:**
|
||||
|
||||
| Component | Specification | Rationale |
|
||||
|-----------|---------------|-----------|
|
||||
| **CPU** | **Very high-performance multi-core processor** | Zone logic execution and ZK proving |
|
||||
| **Memory (RAM)** | **32+ GB strongly recommended** | Complex Zone execution requirements |
|
||||
| **Storage** | Same as Basic Bedrock Node | Standard storage needs |
|
||||
| **GPU** | **Highly recommended/often necessary** | Efficient ZK proof generation |
|
||||
| **Network** | **High bandwidth (10+ Mbps)** | Data dispersal and high connection load |
|
||||
|
||||
**GPU Requirements:**
|
||||
|
||||
- **NVIDIA**: CUDA-enabled GPU (RTX 3090 or equivalent recommended)
|
||||
- **Apple**: Metal-compatible Apple Silicon
|
||||
- **Performance Impact**: Strong GPU significantly reduces proving time
|
||||
|
||||
**Network Requirements:**
|
||||
|
||||
- Support for **2048+ direct UDP connections** to DA Nodes (for blob publishing)
|
||||
- High bandwidth for data dispersal operations
|
||||
- Stable connection for continuous operation
|
||||
|
||||
*Note: DA Nodes utilizing [libp2p](https://docs.libp2p.io/) connections need sufficient capacity to receive and serve data shares over many connections.*
|
||||
|
||||
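The numeric thresholds in the tables above can be collected into a small lookup and compared against a host. This is a sketch under stated assumptions: the names `Profile`, `PROFILES`, and `shortfalls` are hypothetical, qualitative entries such as "few GB" or "very high-performance" are left as `None` rather than guessed, and the executor's 32 GB figure is a strong recommendation rather than a hard minimum.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Profile:
    ram_gb: float
    bandwidth_mbps: float
    storage_gb: Optional[float] = None  # None: the tables give no numeric threshold
    cpu_cores: Optional[int] = None


# Numeric thresholds copied from the tables above; qualitative entries are left as None.
PROFILES: Dict[str, Profile] = {
    "light": Profile(ram_gb=0.5, bandwidth_mbps=1),
    "validator": Profile(ram_gb=1, bandwidth_mbps=1, storage_gb=100, cpu_cores=2),
    "da": Profile(ram_gb=1, bandwidth_mbps=10, storage_gb=500, cpu_cores=2),
    "blend": Profile(ram_gb=1, bandwidth_mbps=10, storage_gb=100, cpu_cores=2),
    "executor": Profile(ram_gb=32, bandwidth_mbps=10, storage_gb=100),
}


def shortfalls(role: str, host: Profile) -> List[str]:
    """List the fields in which `host` falls below the profile for `role`."""
    required = PROFILES[role]
    missing = []
    for name in ("ram_gb", "bandwidth_mbps", "storage_gb", "cpu_cores"):
        need, have = getattr(required, name), getattr(host, name)
        if need is not None and (have is None or have < need):
            missing.append(name)
    return missing


laptop = Profile(ram_gb=16, bandwidth_mbps=50, storage_gb=256, cpu_cores=8)
```

Since the requirements are described as preliminary, such a table would need to be kept in sync with later revisions of the specification.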
## Implementation Requirements
|
||||
|
||||
### Minimum Requirements
|
||||
|
||||
All Nomos nodes MUST meet:
|
||||
|
||||
1. **Basic connectivity** to the Nomos network via [libp2p](https://docs.libp2p.io/)
|
||||
2. **Adequate storage** for their designated role
|
||||
3. **Sufficient processing power** for their service level
|
||||
4. **Reliable network connection** with appropriate bandwidth for [QUIC](https://docs.libp2p.io/concepts/transports/quic/) transport
|
||||
|
||||
### Optional Enhancements
|
||||
|
||||
Node operators MAY implement:
|
||||
|
||||
- Hardware redundancy for critical services
|
||||
- Enhanced cooling for high-performance configurations
|
||||
- Dedicated network connections for service nodes utilizing [libp2p](https://docs.libp2p.io/) protocols
|
||||
- Backup power systems for continuous operation
|
||||
|
||||
### Resource Scaling
|
||||
|
||||
Requirements may vary based on:
|
||||
|
||||
- **Network Load**: Higher network activity increases resource demands
|
||||
- **Zone Complexity**: More complex Zones require additional computational resources
|
||||
- **Service Combinations**: Running multiple services simultaneously increases requirements
|
||||
- **Geographic Location**: Network latency affects optimal performance requirements
|
||||
|
||||
## Security Considerations
|
||||
|
||||
### Hardware Security
|
||||
|
||||
1. **Secure Storage**: Use encrypted storage for sensitive node data
|
||||
2. **Network Security**: Implement proper firewall configurations
|
||||
3. **Physical Security**: Secure physical access to node hardware
|
||||
4. **Backup Strategies**: Maintain secure backups of critical data
|
||||
|
||||
### Performance Security
|
||||
|
||||
1. **Resource Monitoring**: Monitor resource usage to detect anomalies
|
||||
2. **Redundancy**: Plan for hardware failures in critical services
|
||||
3. **Isolation**: Consider containerization or virtualization for service isolation
|
||||
4. **Update Management**: Maintain secure update procedures for hardware drivers
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Scalability
|
||||
|
||||
- **Light Nodes**: Minimal resource footprint, high scalability
|
||||
- **Validators**: Moderate resource usage, network-dependent scaling
|
||||
- **Service Nodes**: High resource usage, specialized scaling requirements
|
||||
|
||||
### Resource Efficiency
|
||||
|
||||
- **CPU Usage**: Optimized algorithms for different hardware tiers
|
||||
- **Memory Usage**: Efficient data structures for constrained environments
|
||||
- **Storage Usage**: Configurable retention policies and compression
|
||||
- **Network Usage**: Adaptive bandwidth utilization based on [libp2p](https://docs.libp2p.io/) capacity and [QUIC](https://docs.libp2p.io/concepts/transports/quic/) connection efficiency
|
||||
|
||||
## References
|
||||
|
||||
1. [libp2p protocol](https://docs.libp2p.io/)
|
||||
2. [QUIC protocol](https://docs.libp2p.io/concepts/transports/quic/)
|
||||
|
||||
## Copyright
|
||||
|
||||
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
|
||||
377
nomos/raw/p2p-nat-solution.md
Normal file
@@ -0,0 +1,377 @@
|
||||
---
|
||||
title: P2P-NAT-SOLUTION
|
||||
name: Nomos P2P Network NAT Solution Specification
|
||||
status: raw
|
||||
category: networking
|
||||
tags: [nat, traversal, autonat, upnp, pcp, nat-pmp]
|
||||
editor: Antonio Antonino <antonio@status.im>
|
||||
contributors:
|
||||
- Álvaro Castro-Castilla <alvaro@status.im>
|
||||
- Daniel Sanchez-Quiros <danielsq@status.im>
|
||||
- Petar Radovic <petar@status.im>
|
||||
- Gusto Bacvinka <augustinas@status.im>
|
||||
- Youngjoon Lee <youngjoon@status.im>
|
||||
- Filip Dimitrijevic <filip@status.im>
|
||||
---
|
||||
|
||||
## Abstract
|
||||
|
||||
This specification defines a comprehensive NAT (Network Address Translation) traversal solution for the Nomos P2P network. The solution enables nodes to automatically determine their NAT status and establish both outbound and inbound connections regardless of network configuration. The strategy combines [AutoNAT](https://github.com/libp2p/specs/blob/master/autonat/autonat-v2.md), dynamic port mapping protocols, and continuous verification to maximize public reachability while maintaining decentralized operation.
|
||||
|
||||
## Motivation
|
||||
|
||||
Network Address Translation presents a critical challenge for Nomos participants, particularly those operating on consumer hardware without technical expertise. The Nomos network requires a NAT traversal solution that:
|
||||
|
||||
1. **Automatic Operation**: Works out-of-the-box without user configuration
|
||||
2. **Inclusive Participation**: Enables nodes on consumer hardware to participate effectively
|
||||
3. **Decentralized Approach**: Leverages the existing Nomos P2P network rather than centralized services
|
||||
4. **Progressive Fallback**: Escalates through increasingly complex protocols as needed
|
||||
5. **Dynamic Adaptation**: Handles changing network environments and configurations
|
||||
|
||||
The solution must ensure that nodes can both establish outbound connections and accept inbound connections from other peers, maintaining network connectivity across diverse NAT configurations.
|
||||
|
||||
## Specification
|
||||
|
||||
### Terminology
|
||||
|
||||
- **Public Node**: A node that is publicly reachable via a public IP address or valid port mapping
|
||||
- **Private Node**: A node that is not publicly reachable due to NAT/firewall restrictions
|
||||
- **Dialing**: The process of establishing a connection using the [libp2p protocol](https://docs.libp2p.io/) stack
|
||||
- **NAT Status**: Whether a node is publicly reachable or hidden behind NAT
|
||||
|
||||
### Key Design Principles
|
||||
|
||||
#### Optional Configuration
|
||||
|
||||
The NAT traversal strategy must work out-of-the-box whenever possible. Users who do not want to engage in configuration should only need to install the node software package. However, users requiring full control must be able to configure every aspect of the strategy.
|
||||
|
||||
#### Decentralized Operation
|
||||
|
||||
The solution leverages the existing Nomos P2P network for coordination rather than relying on centralized third-party services. This maintains the decentralized nature of the network while providing necessary NAT traversal capabilities.
|
||||
|
||||
#### Progressive Fallback
|
||||
|
||||
The protocol begins with lightweight checks and escalates through more complex and resource-intensive protocols. Failure at any step moves the protocol to the next stage in the strategy, ensuring maximum compatibility across network configurations.
|
||||
|
||||
#### Dynamic Network Environment
|
||||
|
||||
Unless explicitly configured for static addresses, each node's public or private status is assumed to be dynamic. A node that was once publicly reachable can become unreachable, and vice versa, so continuous monitoring and adaptation are required.
|
||||
|
||||
### Node Discovery Considerations
|
||||
|
||||
The Nomos public network encourages participation from a large number of nodes, many deployed through simple installation procedures. Some nodes will not achieve Public status, but the discovery protocol must track these peers and allow other nodes to discover them. This prevents network partitioning and ensures Private nodes remain accessible to other participants.
|
||||
|
||||
### NAT Traversal Protocol
|
||||
|
||||
#### Protocol Requirements
|
||||
|
||||
**Each node MUST:**
|
||||
|
||||
- Run an [AutoNAT](https://github.com/libp2p/specs/blob/master/autonat/autonat-v2.md) client, except for nodes statically configured as Public
|
||||
- Use the [Identify protocol](https://github.com/libp2p/specs/blob/master/identify/README.md) to advertise support for:
  - `/nomos/autonat/2/dial-request` for main network
  - `/nomos-testnet/autonat/2/dial-request` for public testnet
  - `/nomos/autonat/2/dial-back` and `/nomos-testnet/autonat/2/dial-back` respectively
|
||||
|
||||
#### NAT State Machine
|
||||
|
||||
The NAT traversal process follows a multi-phase state machine:
|
||||
|
||||
```mermaid
|
||||
graph TD
|
||||
Start@{shape: circle, label: "Start"} -->|Preconfigured public IP or port mapping| StaticPublic[Statically configured as<br/>**Public**]
|
||||
subgraph Phase0 [Phase 0]
|
||||
Start -->|Default configuration| Boot
|
||||
end
|
||||
subgraph Phase1 [Phase 1]
|
||||
Boot[Bootstrap and discover AutoNAT servers]--> Inspect
|
||||
Inspect[Inspect own IP addresses]-->|At least 1 IP address in the public range| ConfirmPublic[AutoNAT]
|
||||
end
|
||||
subgraph Phase2 [Phase 2]
|
||||
Inspect -->|No IP addresses in the public range| MapPorts[Port Mapping Client<br/>UPnP/NAT-PMP/PCP]
|
||||
MapPorts -->|Successful port map| ConfirmMapPorts[AutoNAT]
|
||||
end
|
||||
ConfirmPublic -->|Node's IP address reachable by AutoNAT server| Public[**Public** Node]
|
||||
ConfirmPublic -->|Node's IP address not reachable by AutoNAT server or Timeout| MapPorts
|
||||
ConfirmMapPorts -->|Mapped IP address and port reachable by AutoNAT server| Public
|
||||
ConfirmMapPorts -->|Mapped IP address and port not reachable by AutoNAT server or Timeout| Private
|
||||
MapPorts -->|Failure or Timeout| Private[**Private** Node]
|
||||
subgraph Phase3 [Phase 3]
|
||||
Public -->Monitor
|
||||
Private --> Monitor
|
||||
end
|
||||
Monitor[Network Monitoring] -->|Restart| Inspect
|
||||
```
|
||||
|
||||
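One pass through the state machine above can be expressed as a small decision function. This is a sketch under stated assumptions, not the reference implementation: the three callbacks are hypothetical stand-ins for address inspection, an AutoNAT reachability check, and a port mapping attempt, and Phase 3 monitoring is represented only by the caller re-running the function.

```python
from enum import Enum
from typing import Callable


class NatStatus(Enum):
    PUBLIC = "public"
    PRIVATE = "private"


def determine_nat_status(
    statically_public: bool,
    has_public_address: Callable[[], bool],
    autonat_confirms: Callable[[], bool],
    map_ports: Callable[[], bool],
) -> NatStatus:
    """One pass through Phases 0-2; the caller re-runs it when monitoring detects a change."""
    if statically_public:
        return NatStatus.PUBLIC  # operator-configured, no probing needed
    # Phase 1: a public address that AutoNAT confirms makes the node Public.
    if has_public_address() and autonat_confirms():
        return NatStatus.PUBLIC
    # Phase 2: otherwise try a port mapping and confirm the mapped address.
    if map_ports() and autonat_confirms():
        return NatStatus.PUBLIC
    # Mapping failed, timed out, or was not confirmed.
    return NatStatus.PRIVATE
```

In a real node the AutoNAT check would be given the specific address under test (the node's own public address in Phase 1, the mapped address in Phase 2); the single callback here keeps the sketch short.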
### Phase Implementation
|
||||
|
||||
#### Phase 0: Bootstrapping and Identifying Public Nodes
|
||||
|
||||
If the node is statically configured by the operator to be Public, the procedure stops here.
|
||||
|
||||
The node utilizes bootstrapping and discovery mechanisms to find other Public nodes. The [Identify protocol](https://github.com/libp2p/specs/blob/master/identify/README.md) confirms which detected Public nodes support [AutoNAT v2](https://github.com/libp2p/specs/blob/master/autonat/autonat-v2.md).
|
||||
|
||||
#### Phase 1: NAT Detection
|
||||
|
||||
The node starts an [AutoNAT](https://github.com/libp2p/specs/blob/master/autonat/autonat-v2.md) client and inspects its own addresses. For each public IP address, the node verifies public reachability via [AutoNAT](https://github.com/libp2p/specs/blob/master/autonat/autonat-v2.md). If any public IP addresses are confirmed, the node assumes Public status and moves to Phase 3. Otherwise, it continues to Phase 2.
|
||||
|
||||
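The address inspection step can be approximated with Python's standard `ipaddress` module. Treating "public range" as "globally routable according to `is_global`" is an assumption of this sketch, and `public_addresses` is a hypothetical helper; each address it returns would still need to be confirmed through AutoNAT.

```python
import ipaddress
from typing import Iterable, List


def public_addresses(addresses: Iterable[str]) -> List[str]:
    """Keep only addresses in globally routable ranges."""
    return [a for a in addresses if ipaddress.ip_address(a).is_global]


candidates = public_addresses(["192.168.1.10", "10.0.0.5", "8.8.8.8", "127.0.0.1"])
```

Private, loopback, and similar reserved ranges are filtered out, so only the remaining candidates are worth an AutoNAT dial-back request.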
#### Phase 2: Automated Port Mapping
|
||||
|
||||
The node attempts to secure port mapping on the default gateway using:
|
||||
|
||||
- **[PCP](https://datatracker.ietf.org/doc/html/rfc6887)** (Port Control Protocol) - Most reliable
|
||||
- **[NAT-PMP](https://datatracker.ietf.org/doc/html/rfc6886)** (NAT Port Mapping Protocol) - Second most reliable
|
||||
- **[UPnP-IGD](https://datatracker.ietf.org/doc/html/rfc6970)** (Universal Plug and Play Internet Gateway Device) - Most widely deployed
|
||||
|
||||
**Port Mapping Algorithm:**
|
||||
|
||||
```python
def try_port_mapping():
    # Step 1: Get the local IPv4 address
    local_ip = get_local_ipv4_address()

    # Step 2: Get the default gateway IPv4 address
    gateway_ip = get_default_gateway_address()

    # Step 3: Abort if local or gateway IP could not be determined
    if not local_ip or not gateway_ip:
        return "Mapping failed: Unable to get local or gateway IPv4"

    # Step 4: Probe the gateway for protocol support
    supports_pcp = probe_pcp(gateway_ip)
    supports_nat_pmp = probe_nat_pmp(gateway_ip)
    supports_upnp = probe_upnp(gateway_ip)  # Optional for logging

    # Step 5-9: Try protocols in order of reliability
    # PCP (most reliable) -> NAT-PMP -> UPnP -> fallback attempts

    protocols = [
        (supports_pcp, try_pcp_mapping),
        (supports_nat_pmp, try_nat_pmp_mapping),
        (True, try_upnp_mapping),  # Always try UPnP
        (not supports_pcp, try_pcp_mapping),  # Fallback
        (not supports_nat_pmp, try_nat_pmp_mapping)  # Last resort
    ]

    for supported, mapping_func in protocols:
        if supported:
            mapping = mapping_func(local_ip, gateway_ip)
            if mapping:
                return mapping

    return "Mapping failed: No protocol succeeded"
```
|
||||
|
||||
If mapping succeeds, the node uses [AutoNAT](https://github.com/libp2p/specs/blob/master/autonat/autonat-v2.md) to confirm public reachability. Upon confirmation, the node assumes Public status. Otherwise, it assumes Private status.
|
||||
|
||||
**Port Mapping Sequence:**
|
||||
|
||||
```mermaid
sequenceDiagram
    box Node
        participant AutoNAT Client
        participant NAT State Machine
        participant Port Mapping Client
    end
    participant Router

    alt Mapping is successful
        Note left of AutoNAT Client: Phase 2
        Port Mapping Client ->> +Router: Requests new mapping
        Router ->> Port Mapping Client: Confirms new mapping
        Port Mapping Client ->> NAT State Machine: Mapping secured
        NAT State Machine ->> AutoNAT Client: Requests confirmation<br/>that mapped address<br/>is publicly reachable

        alt Node asserts Public status
            AutoNAT Client ->> NAT State Machine: Mapped address<br/>is publicly reachable
            Note left of AutoNAT Client: Phase 3<br/>Network Monitoring
        else Node asserts Private status
            AutoNAT Client ->> NAT State Machine: Mapped address<br/>is not publicly reachable
            Note left of AutoNAT Client: Phase 3<br/>Network Monitoring
        end
    else Mapping fails, node asserts Private status
        Note left of AutoNAT Client: Phase 2
        Port Mapping Client ->> Router: Requests new mapping
        Router ->> Port Mapping Client: Refuses new mapping or Timeout
        Port Mapping Client ->> NAT State Machine: Mapping failed
        Note left of AutoNAT Client: Phase 3<br/>Network Monitoring
    end
```
|
||||
|
||||
#### Phase 3: Network Monitoring
|
||||
|
||||
Unless statically configured as Public by the operator (see Phase 0), nodes must monitor their network status and restart from Phase 1 when changes are detected.
|
||||
|
||||
**Public Node Monitoring:**
|
||||
|
||||
A Public node must restart when:
|
||||
|
||||
- [AutoNAT](https://github.com/libp2p/specs/blob/master/autonat/autonat-v2.md) client no longer confirms public reachability
|
||||
- A previously successful port mapping is lost or refresh fails
|
||||
|
||||
**Private Node Monitoring:**
|
||||
|
||||
A Private node must restart when:
|
||||
|
||||
- It gains a new public IP address
|
||||
- Port mapping is likely to succeed (for example, after a gateway change or once sufficient time has passed since the last attempt)
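
The restart conditions for both node types can be expressed as a small lookup. The sketch below is illustrative only: the event names are hypothetical labels chosen for the conditions in the two lists, not identifiers defined by this specification.

```python
from enum import Enum
from typing import Iterable


class NatStatus(Enum):
    PUBLIC = "public"
    PRIVATE = "private"


# Hypothetical event names, one per restart condition listed for each status.
RESTART_TRIGGERS = {
    NatStatus.PUBLIC: {
        "reachability_lost",       # AutoNAT no longer confirms reachability
        "mapping_lost",            # a previously successful mapping is gone
        "mapping_refresh_failed",  # the periodic refresh was refused
    },
    NatStatus.PRIVATE: {
        "new_public_ip",           # the node gained a public IP address
        "gateway_changed",         # port mapping may now succeed
        "retry_interval_elapsed",  # enough time passed to try again
    },
}


def must_restart(status: NatStatus, events: Iterable[str]) -> bool:
    """Return True if any observed event requires restarting from Phase 1."""
    return not RESTART_TRIGGERS[status].isdisjoint(events)


print(must_restart(NatStatus.PUBLIC, ["mapping_lost"]))
print(must_restart(NatStatus.PRIVATE, ["mapping_lost"]))
```

Keeping the triggers in one table makes it easy to see that an event relevant to a Public node is ignored by a Private node and vice versa.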
|
||||
|
||||
**Network Monitoring Sequence:**
|
||||
|
||||
```mermaid
sequenceDiagram
    participant AutoNAT Server
    box Node
        participant AutoNAT Client
        participant NAT State Machine
        participant Port Mapping Client
    end
    participant Router

    Note left of AutoNAT Server: Phase 3<br/>Network Monitoring
    par Refresh mapping and monitor changes
        loop periodically refreshes mapping
            Port Mapping Client ->> Router: Requests refresh
            Router ->> Port Mapping Client: Confirms mapping refresh
        end
        break Mapping is lost, the node loses Public status
            Router ->> Port Mapping Client: Refresh failed or mapping dropped
            Port Mapping Client ->> NAT State Machine: Mapping lost
            NAT State Machine ->> NAT State Machine: Restart
        end
    and Monitor public reachability of mapped addresses
        loop periodically checks public reachability
            AutoNAT Client ->> AutoNAT Server: Requests dialback
            AutoNAT Server ->> AutoNAT Client: Dialback successful
        end
        break
            AutoNAT Server ->> AutoNAT Client: Dialback failed or Timeout
            AutoNAT Client ->> NAT State Machine: Public reachability lost
            NAT State Machine ->> NAT State Machine: Restart
        end
    end
    Note left of AutoNAT Server: Phase 1
```
|
||||
|
||||
### Public Node Responsibilities
|
||||
|
||||
**A Public node MUST:**
|
||||
|
||||
- Run an [AutoNAT](https://github.com/libp2p/specs/blob/master/autonat/autonat-v2.md) server
|
||||
- Listen on and advertise via [Identify protocol](https://github.com/libp2p/specs/blob/master/identify/README.md) its publicly reachable [multiaddresses](https://github.com/libp2p/specs/blob/master/addressing/README.md):
|
||||
|
||||
`/{public_peer_ip}/udp/{port}/quic-v1/p2p/{public_peer_id}`
|
||||
|
||||
- Periodically renew port mappings according to protocol recommendations
|
||||
- Maintain high availability for [AutoNAT](https://github.com/libp2p/specs/blob/master/autonat/autonat-v2.md) services
|
||||
|
||||
### Peer Dialing
|
||||
|
||||
Other peers can always dial a Public peer using its publicly reachable [multiaddresses](https://github.com/libp2p/specs/blob/master/addressing/README.md):
|
||||
|
||||
`/{public_peer_ip}/udp/{port}/quic-v1/p2p/{public_peer_id}`
|
||||
|
||||
## Implementation Requirements
|
||||
|
||||
### Mandatory Components
|
||||
|
||||
All Nomos nodes MUST implement:
|
||||
|
||||
1. **[AutoNAT](https://github.com/libp2p/specs/blob/master/autonat/autonat-v2.md) client** for NAT status detection
|
||||
2. **Port mapping clients** for [PCP](https://datatracker.ietf.org/doc/html/rfc6887), [NAT-PMP](https://datatracker.ietf.org/doc/html/rfc6886), and [UPnP-IGD](https://datatracker.ietf.org/doc/html/rfc6970)
|
||||
3. **[Identify protocol](https://github.com/libp2p/specs/blob/master/identify/README.md)** for capability advertisement
|
||||
4. **Network monitoring** for status change detection
|
||||
|
||||
### Optional Enhancements
|
||||
|
||||
Nodes MAY implement:
|
||||
|
||||
- Custom port mapping retry strategies
|
||||
- Enhanced network change detection
|
||||
- Advanced [AutoNAT](https://github.com/libp2p/specs/blob/master/autonat/autonat-v2.md) server load balancing
|
||||
- Backup connectivity mechanisms
|
||||
|
||||
### Configuration Parameters
|
||||
|
||||
#### [AutoNAT](https://github.com/libp2p/specs/blob/master/autonat/autonat-v2.md) Configuration
|
||||
|
||||
```yaml
autonat:
  client:
    dial_timeout: 15s
    max_peer_addresses: 16
    throttle_global_limit: 30
    throttle_peer_limit: 3
  server:
    dial_timeout: 30s
    max_peer_addresses: 16
    throttle_global_limit: 30
    throttle_peer_limit: 3
```
|
||||
|
||||
#### Port Mapping Configuration
|
||||
|
||||
```yaml
port_mapping:
  pcp:
    timeout: 30s
    lifetime: 7200s  # 2 hours
    retry_interval: 300s
  nat_pmp:
    timeout: 30s
    lifetime: 7200s
    retry_interval: 300s
  upnp:
    timeout: 30s
    lease_duration: 7200s
    retry_interval: 300s
```
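
The configuration above sets a mapping lifetime but does not say when a mapping should be refreshed. As an illustration only, a client might renew at half the granted lifetime, which is the renewal point RFC 6886 recommends for NAT-PMP. The function names and the `fraction` parameter below are hypothetical and not part of this specification.

```python
def parse_seconds(value: str) -> int:
    """Parse a duration written as '<integer>s', e.g. '7200s'."""
    if not value.endswith("s"):
        raise ValueError(f"expected a value like '30s', got {value!r}")
    return int(value[:-1])


def refresh_delay_seconds(lifetime_s: int, fraction: float = 0.5) -> float:
    """Seconds to wait after a successful mapping before refreshing it."""
    if lifetime_s <= 0:
        raise ValueError("lifetime must be positive")
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    return lifetime_s * fraction


lifetime = parse_seconds("7200s")
print(refresh_delay_seconds(lifetime))
```

With the two-hour lifetime shown above, this schedules the first refresh one hour after the mapping is granted, leaving a full hour to retry before the mapping expires.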
|
||||
|
||||
## Security Considerations
|
||||
|
||||
### NAT Traversal Security
|
||||
|
||||
1. **Port Mapping Validation**: Verify that requested port mappings are actually created
|
||||
2. **[AutoNAT](https://github.com/libp2p/specs/blob/master/autonat/autonat-v2.md) Server Trust**: Implement peer reputation for [AutoNAT](https://github.com/libp2p/specs/blob/master/autonat/autonat-v2.md) servers
|
||||
3. **Gateway Communication**: Secure communication with NAT devices
|
||||
4. **Address Validation**: Validate public addresses before advertisement
|
||||
|
||||
### Privacy Considerations
|
||||
|
||||
1. **IP Address Exposure**: Public nodes necessarily expose IP addresses
|
||||
2. **Traffic Analysis**: Monitor for patterns that could reveal node behavior
|
||||
3. **Gateway Information**: Minimize exposure of internal network topology
|
||||
|
||||
### Denial of Service Protection
|
||||
|
||||
1. **[AutoNAT](https://github.com/libp2p/specs/blob/master/autonat/autonat-v2.md) Rate Limiting**: Implement request throttling for [AutoNAT](https://github.com/libp2p/specs/blob/master/autonat/autonat-v2.md) services
|
||||
2. **Port Mapping Abuse**: Prevent excessive port mapping requests
|
||||
3. **Resource Exhaustion**: Limit concurrent NAT traversal attempts
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Scalability
|
||||
|
||||
- **[AutoNAT](https://github.com/libp2p/specs/blob/master/autonat/autonat-v2.md) Server Load**: Distributed across Public nodes
|
||||
- **Port Mapping Overhead**: Minimal ongoing resource usage
|
||||
- **Network Monitoring**: Efficient periodic checks
|
||||
|
||||
### Reliability
|
||||
|
||||
- **Fallback Mechanisms**: Multiple protocols ensure high success rates
|
||||
- **Continuous Monitoring**: Automatic recovery from connectivity loss
|
||||
- **Protocol Redundancy**: Multiple port mapping protocols increase reliability
|
||||
|
||||
## References
|
||||
|
||||
1. [Multiaddress spec](https://github.com/libp2p/specs/blob/master/addressing/README.md)
|
||||
2. [Identify protocol spec](https://github.com/libp2p/specs/blob/master/identify/README.md)
|
||||
3. [AutoNAT v2 protocol spec](https://github.com/libp2p/specs/blob/master/autonat/autonat-v2.md)
|
||||
4. [Circuit Relay v2 protocol spec](https://github.com/libp2p/specs/blob/master/relay/circuit-v2.md)
|
||||
5. [PCP - RFC 6887](https://datatracker.ietf.org/doc/html/rfc6887)
|
||||
6. [NAT-PMP - RFC 6886](https://datatracker.ietf.org/doc/html/rfc6886)
|
||||
7. [UPnP IGD - RFC 6970](https://datatracker.ietf.org/doc/html/rfc6970)
|
||||
|
||||
## Copyright
|
||||
|
||||
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
|
||||
185
nomos/raw/p2p-network-bootstrapping.md
Normal file
@@ -0,0 +1,185 @@
|
||||
---
|
||||
title: P2P-NETWORK-BOOTSTRAPPING
|
||||
name: Nomos P2P Network Bootstrapping Specification
|
||||
status: raw
|
||||
category: networking
|
||||
tags: [p2p, networking, bootstrapping, peer-discovery, libp2p]
|
||||
editor: Daniel Sanchez-Quiros <danielsq@status.im>
|
||||
contributors:
|
||||
- Álvaro Castro-Castilla <alvaro@status.im>
|
||||
- Petar Radovic <petar@status.im>
|
||||
- Gusto Bacvinka <augustinas@status.im>
|
||||
- Antonio Antonino <antonio@status.im>
|
||||
- Youngjoon Lee <youngjoon@status.im>
|
||||
- Filip Dimitrijevic <filip@status.im>
|
||||
---
|
||||
|
||||
## Introduction
|
||||
|
||||
Nomos network bootstrapping is the process by which a new node discovers peers and synchronizes with the existing decentralized network. It ensures that a node can:
|
||||
|
||||
1. **Discover Peers** – Find other active nodes in the network.
|
||||
2. **Establish Connections** – Securely connect to trusted peers.
|
||||
3. **Negotiate (libp2p) Protocols** – Ensure that other peers support the protocols the node requires.
|
||||
|
||||
## Overview
|
||||
|
||||
The Nomos P2P network bootstrapping strategy relies on a designated subset of **bootstrap nodes** to facilitate secure and efficient node onboarding. These nodes serve as the initial entry points for new network participants.
|
||||
|
||||
### Key Design Principles
|
||||
|
||||
#### Trusted Bootstrap Nodes
|
||||
|
||||
A curated set of publicly announced and highly available nodes ensures reliability during initial peer discovery. These nodes are configured with elevated connection limits to handle a high volume of incoming bootstrapping requests from new participants.
|
||||
|
||||
#### Node Configuration & Onboarding
|
||||
|
||||
New node operators must explicitly configure their instances with the addresses of bootstrap nodes. This configuration may be preloaded or dynamically fetched from a trusted source to minimize manual setup.
|
||||
|
||||
#### Network Integration
|
||||
|
||||
Upon initialization, the node establishes connections with the bootstrap nodes and begins participating in Nomos networking protocols. Through these connections, the node discovers additional peers, synchronizes with the network state, and engages in protocol-specific communication (e.g., consensus, block propagation).
|
||||
|
||||
### Security & Decentralization Considerations
|
||||
|
||||
**Trust Minimization**: While bootstrap nodes provide initial connectivity, the network rapidly transitions to decentralized peer discovery to prevent over-reliance on any single entity.
|
||||
|
||||
**Authenticated Announcements**: The identities and addresses of bootstrap nodes are publicly verifiable to mitigate impersonation attacks. From [libp2p documentation](https://docs.libp2p.io/concepts/transports/quic/#quic-in-libp2p):
|
||||
|
||||
> To authenticate each others' peer IDs, peers encode their peer ID into a self-signed certificate, which they sign using their host's private key.
|
||||
|
||||
**Dynamic Peer Management**: After bootstrapping, nodes continuously refine their peer lists to maintain a resilient and distributed network topology.
|
||||
|
||||
This approach ensures **rapid, secure, and scalable** network participation while preserving the decentralized ethos of the Nomos protocol.
|
||||
|
||||
## Protocol
|
||||
|
||||
### Protocol Overview
|
||||
|
||||
The bootstrapping protocol follows libp2p conventions for peer discovery and connection establishment. Implementation details are handled by the underlying libp2p stack with Nomos-specific configuration parameters.
|
||||
|
||||
### Bootstrapping Process
|
||||
|
||||
#### Step-by-Step bootstrapping process
|
||||
|
||||
1. **Node Initial Configuration**: New nodes load pre-configured bootstrap node addresses. Each address may be an `IP` or `DNS` address embedded in a compatible [libp2p PeerId multiaddress](https://docs.libp2p.io/concepts/fundamentals/peers/#peer-ids-in-multiaddrs). Node operators may choose to advertise more than one address; how they do so is out of the scope of this protocol. For example:
|
||||
|
||||
`/ip4/198.51.100.0/udp/4242/p2p/QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N` or
|
||||
|
||||
`/dns/foo.bar.net/udp/4242/p2p/QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N`
|
||||
|
||||
2. **Secure Connection**: The node establishes connections to the bootstrap nodes' announced addresses and verifies network identity and protocol compatibility.
|
||||
|
||||
3. **Peer Discovery**: The node requests and receives validated peer lists from the bootstrap nodes. Each entry includes connectivity details, as defined by the peer discovery protocol that engages after the initial connection.
|
||||
|
||||
4. **Network Integration**: The node iteratively connects to discovered peers, gradually building its set of peer connections.
|
||||
|
||||
5. **Protocol Engagement**: The node establishes the required protocol channels (gossip/consensus/sync) and begins participating in network operations.
|
||||
|
||||
6. **Ongoing Maintenance**: The node continuously evaluates and refreshes its peer connections. Ideally, it drops its connection to the bootstrap node itself. Bootstrap nodes may also choose to close the connection on their side to stay highly available for other nodes.
|
||||
|
||||
```mermaid
sequenceDiagram
    participant Nomos Network
    participant Node
    participant Bootstrap Node

    Node->>Node: Fetches bootstrapping addresses

    loop Interacts with bootstrap node
        Node->>+Bootstrap Node: Connects
        Bootstrap Node->>-Node: Sends discovered peers information
    end

    loop Connects to Network participants
        Node->>Nomos Network: Engages in connections
        Node->>Nomos Network: Negotiates protocols
    end

    loop Ongoing maintenance
        Node-->>Nomos Network: Evaluates peer connections
        alt Bootstrap connection no longer needed
            Node-->>Bootstrap Node: Disconnects
        else Bootstrap enforces disconnection
            Bootstrap Node-->>Node: Disconnects
        end
    end
```
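
To make the address format from step 1 concrete, the sketch below splits a textual bootstrap multiaddress into its components and extracts the peer ID. It is a deliberately simplified parser written for illustration and is not a replacement for a real multiaddr library: the set of value-less protocols is an assumption that covers only the forms used in this document, and the function names are hypothetical.

```python
from typing import List, Tuple

# Protocols that carry no value component (assumption of this sketch).
VALUELESS_PROTOCOLS = {"quic", "quic-v1"}


def split_multiaddr(addr: str) -> List[Tuple[str, str]]:
    """Split a textual multiaddress into (protocol, value) pairs."""
    if not addr.startswith("/"):
        raise ValueError("a multiaddress must start with '/'")
    parts = addr[1:].split("/")
    pairs = []
    i = 0
    while i < len(parts):
        proto = parts[i]
        if proto in VALUELESS_PROTOCOLS:
            pairs.append((proto, ""))
            i += 1
            continue
        if i + 1 >= len(parts):
            raise ValueError(f"missing value for protocol {proto!r}")
        pairs.append((proto, parts[i + 1]))
        i += 2
    return pairs


def peer_id_of(addr: str) -> str:
    """Return the value of the /p2p component of a multiaddress."""
    for proto, value in split_multiaddr(addr):
        if proto == "p2p":
            return value
    raise ValueError("multiaddress has no /p2p component")


bootstrap = "/dns/foo.bar.net/udp/4242/p2p/QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N"
print(split_multiaddr(bootstrap))
print(peer_id_of(bootstrap))
```

A node could use the extracted peer ID to check that the peer it reaches at the announced address is the bootstrap node it was configured with.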
|
||||
|
||||
## Implementation Details
|
||||
|
||||
The bootstrapping process for the Nomos p2p network uses the **QUIC** transport as specified in the Nomos network specification.
|
||||
|
||||
Bootstrapping is separate from the network's peer discovery protocol. It assumes that a discovery protocol engages as soon as the connection with the bootstrap node is established. The Nomos network currently uses `kademlia` as its first approach to peer discovery, so this assumption holds.
|
||||
|
||||
### Bootstrap Node Requirements
|
||||
|
||||
Bootstrap nodes MUST fulfill the following requirements:
|
||||
|
||||
- **High Availability**: Maintain uptime of 99.5% or higher
|
||||
- **Connection Capacity**: Support minimum 1000 concurrent connections
|
||||
- **Geographic Distribution**: Deploy across multiple regions
|
||||
- **Protocol Compatibility**: Support all required Nomos network protocols
|
||||
- **Security**: Implement proper authentication and rate limiting
|
||||
|
||||
### Network Configuration
|
||||
|
||||
Bootstrap node addresses are distributed through:
|
||||
|
||||
- **Hardcoded addresses** in node software releases
|
||||
- **DNS seeds** for dynamic address resolution
|
||||
- **Community-maintained lists** with cryptographic verification
|
||||
|
||||
## Security Considerations
|
||||
|
||||
### Trust Model
|
||||
|
||||
Bootstrap nodes operate under a **minimal trust model**:
|
||||
|
||||
- Nodes verify peer identities through cryptographic authentication
|
||||
- Bootstrap connections are temporary and replaced by organic peer discovery
|
||||
- No single bootstrap node can control network participation
|
||||
|
||||
### Attack Mitigation
|
||||
|
||||
**Sybil Attack Protection**: Bootstrap nodes implement connection limits and peer verification to prevent malicious flooding.
|
||||
|
||||
**Eclipse Attack Prevention**: Nodes connect to multiple bootstrap nodes and rapidly diversify their peer connections.
|
||||
|
||||
**Denial of Service Resistance**: Rate limiting and connection throttling protect bootstrap nodes from resource exhaustion attacks.
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Bootstrapping Metrics
|
||||
|
||||
- **Initial Connection Time**: Target < 30 seconds to first bootstrap node
|
||||
- **Peer Discovery Duration**: Discover minimum viable peer set within 2 minutes
|
||||
- **Network Integration**: Full protocol engagement within 5 minutes
|
||||
|
||||
### Resource Requirements
|
||||
|
||||
#### Bootstrap Nodes
|
||||
|
||||
- Memory: Minimum 4GB RAM
|
||||
- Bandwidth: 100 Mbps sustained
|
||||
- Storage: 50GB available space
|
||||
|
||||
#### Regular Nodes
|
||||
|
||||
- Memory: 512MB for bootstrapping process
|
||||
- Bandwidth: 10 Mbps during initial sync
|
||||
- Storage: Minimal requirements
|
||||
|
||||
## References
|
||||
|
||||
- P2P Network Specification (internal document)
|
||||
- [libp2p QUIC Transport](https://docs.libp2p.io/concepts/transports/quic/)
|
||||
- [libp2p Peer IDs and Addressing](https://docs.libp2p.io/concepts/fundamentals/peers/)
|
||||
- [Ethereum bootnodes](https://ethereum.org/en/developers/docs/nodes-and-clients/bootnodes/)
|
||||
- [Bitcoin peer discovery](https://developer.bitcoin.org/devguide/p2p_network.html#peer-discovery)
|
||||
- [Cardano nodes connectivity](https://docs.cardano.org/stake-pool-operators/node-connectivity)
|
||||
- [Cardano peer sharing](https://www.coincashew.com/coins/overview-ada/guide-how-to-build-a-haskell-stakepool-node/part-v-tips/implementing-peer-sharing)
|
||||
|
||||
## Copyright
|
||||
|
||||
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
|
||||
307
nomos/raw/p2p-network.md
Normal file
@@ -0,0 +1,307 @@
|
||||
---
|
||||
title: NOMOS-P2P-NETWORK
|
||||
name: Nomos P2P Network Specification
|
||||
status: draft
|
||||
category: networking
|
||||
tags: [p2p, networking, libp2p, kademlia, gossipsub, quic]
|
||||
editor: Daniel Sanchez-Quiros <danielsq@status.im>
|
||||
contributors:
|
||||
- Filip Dimitrijevic <filip@status.im>
|
||||
---
|
||||
|
||||
## Abstract
|
||||
|
||||
This specification defines the peer-to-peer (P2P) network layer for Nomos blockchain nodes. The network serves as the comprehensive communication infrastructure enabling transaction dissemination through mempool and block propagation. The specification leverages established libp2p protocols to ensure robust, scalable performance with low bandwidth requirements and minimal latency while maintaining accessibility for diverse hardware configurations and network environments.
|
||||
|
||||
## Motivation
|
||||
|
||||
The Nomos blockchain requires a reliable, scalable P2P network that can:
|
||||
|
||||
1. **Support diverse hardware**: From laptops to dedicated servers across various operating systems and geographic locations
|
||||
2. **Enable inclusive participation**: Allow non-technical users to operate nodes with minimal configuration
|
||||
3. **Maintain connectivity**: Ensure nodes remain reachable even with limited connectivity or behind NAT/routers
|
||||
4. **Scale efficiently**: Support large-scale networks (10k+ nodes) with eventual consistency
|
||||
5. **Provide low-latency communication**: Enable efficient transaction and block propagation
|
||||
|
||||
## Specification
|
||||
|
||||
### Network Architecture Overview
|
||||
|
||||
The Nomos P2P network addresses three critical challenges:
|
||||
|
||||
- **Peer Connectivity**: Mechanisms for peers to join and connect to the network
|
||||
- **Peer Discovery**: Enabling peers to locate and identify network participants
|
||||
- **Message Transmission**: Facilitating efficient message exchange across the network
|
||||
|
||||
### Transport Protocol
|
||||
|
||||
#### QUIC Protocol Transport
|
||||
|
||||
The Nomos network employs the **[QUIC protocol](https://docs.libp2p.io/concepts/transports/quic/)** as its primary transport, using the [libp2p](https://docs.libp2p.io/) implementation.
|
||||
|
||||
**Rationale for [QUIC protocol](https://docs.libp2p.io/concepts/transports/quic/):**
|
||||
|
||||
- Rapid connection establishment
|
||||
- Enhanced NAT traversal capabilities (UDP-based)
|
||||
- Built-in multiplexing simplifies configuration
|
||||
- Production-tested reliability
|
||||
|
||||
### Peer Discovery
|
||||
|
||||
#### Kademlia DHT
|
||||
|
||||
The network utilizes libp2p's Kademlia Distributed Hash Table (DHT) for peer discovery.
|
||||
|
||||
**Protocol Identifiers:**
|
||||
|
||||
- **Mainnet**: `/nomos/kad/1.0.0`
|
||||
- **Testnet**: `/nomos-testnet/kad/1.0.0`
|
||||
|
||||
**Features:**
|
||||
|
||||
- Proximity-based peer discovery heuristics
|
||||
- Distributed peer routing table
|
||||
- Resilient to network partitions
|
||||
- Automatic peer replacement
|
||||
|
||||
#### Identify Protocol
|
||||
|
||||
The Identify protocol complements Kademlia by enabling peer information exchange.
|
||||
|
||||
**Protocol Identifiers:**
|
||||
|
||||
- **Mainnet**: `/nomos/identify/1.0.0`
|
||||
- **Testnet**: `/nomos-testnet/identify/1.0.0`
|
||||
|
||||
**Capabilities:**
|
||||
|
||||
- Protocol support advertisement
|
||||
- Peer capability negotiation
|
||||
- Network interoperability enhancement
|
||||
|
||||
#### Future Considerations
|
||||
|
||||
The current Kademlia implementation is acknowledged as interim. Future improvements target:
|
||||
|
||||
- Lightweight design without full DHT overhead
|
||||
- Highly scalable eventual consistency
|
||||
- Support for 10k+ nodes with minimal resource usage
|
||||
|
||||
### NAT Traversal
|
||||
|
||||
The network implements comprehensive NAT traversal solutions to ensure connectivity across diverse network configurations.
|
||||
|
||||
**Objectives:**
|
||||
|
||||
- Configuration-free peer connections
|
||||
- Support for users with varying technical expertise
|
||||
- Enable nodes on standard consumer hardware
|
||||
|
||||
**Implementation:**
|
||||
|
||||
- Tailored solutions based on user network configuration
|
||||
- Automatic NAT type detection and adaptation
|
||||
- Fallback mechanisms for challenging network environments
|
||||
|
||||
*Note: Detailed NAT traversal specifications are maintained in a separate document.*
|
||||
|
||||
### Message Dissemination
|
||||
|
||||
#### Gossipsub Protocol
|
||||
|
||||
Nomos employs **gossipsub** for reliable message propagation across the network.
|
||||
|
||||
**Integration:**
|
||||
|
||||
- Seamless integration with Kademlia peer discovery
|
||||
- Automatic peer list updates
|
||||
- Efficient message routing and delivery
|
||||
|
||||
#### Topic Configuration
|
||||
|
||||
**Mempool Dissemination:**
|
||||
|
||||
- **Mainnet**: `/nomos/mempool/0.1.0`
|
||||
- **Testnet**: `/nomos-testnet/mempool/0.1.0`
|
||||
|
||||
**Block Propagation:**
|
||||
|
||||
- **Mainnet**: `/nomos/cryptarchia/0.1.0`
|
||||
- **Testnet**: `/nomos-testnet/cryptarchia/0.1.0`
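
The identifiers in this specification share one naming pattern: a network prefix (`nomos` or `nomos-testnet`), a protocol or topic name, and a version. The helper below is a hypothetical convenience for building them consistently; its name and parameters are not part of the specification.

```python
def protocol_id(name: str, version: str, testnet: bool = False) -> str:
    """Build a Nomos protocol or topic identifier such as '/nomos/kad/1.0.0'."""
    prefix = "nomos-testnet" if testnet else "nomos"
    return f"/{prefix}/{name}/{version}"


# Gossipsub topics listed above
print(protocol_id("mempool", "0.1.0"))
print(protocol_id("cryptarchia", "0.1.0", testnet=True))
# Peer discovery protocol identifier
print(protocol_id("kad", "1.0.0"))
```

Deriving the identifiers from a single function keeps mainnet and testnet strings from drifting apart when a version is bumped.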
|
||||
|
||||
#### Network Parameters
|
||||
|
||||
**Peering Degree:**
|
||||
|
||||
- **Minimum recommended**: 8 peers
|
||||
- **Rationale**: Ensures redundancy and efficient propagation
|
||||
- **Configurable**: Nodes may adjust based on resources and requirements
|
||||
|
||||
### Bootstrapping
|
||||
|
||||
#### Initial Network Entry
|
||||
|
||||
New nodes connect to the network through designated bootstrap nodes.
|
||||
|
||||
**Process:**
|
||||
|
||||
1. Connect to known bootstrap nodes
|
||||
2. Obtain initial peer list through Kademlia
|
||||
3. Establish gossipsub connections
|
||||
4. Begin participating in network protocols
|
||||
|
||||
**Bootstrap Node Requirements:**
|
||||
|
||||
- High availability and reliability
|
||||
- Geographic distribution
|
||||
- Version compatibility maintenance
|
||||
|
||||
### Message Encoding
|
||||
|
||||
All network messages follow the Nomos Wire Format specification for consistent encoding and decoding across implementations.
|
||||
|
||||
**Key Properties:**
|
||||
|
||||
- Deterministic serialization
|
||||
- Efficient binary encoding
|
||||
- Forward/backward compatibility support
|
||||
- Cross-platform consistency
|
||||
|
||||
*Note: Detailed wire format specifications are maintained in a separate document.*
|
||||
|
||||
## Implementation Requirements
|
||||
|
||||
### Mandatory Protocols
|
||||
|
||||
All Nomos nodes MUST implement:
|
||||
|
||||
1. **Kademlia DHT** for peer discovery
|
||||
2. **Identify protocol** for peer information exchange
|
||||
3. **Gossipsub** for message dissemination
|
||||
|
||||
### Optional Enhancements
|
||||
|
||||
Nodes MAY implement:
|
||||
|
||||
- Advanced NAT traversal techniques
|
||||
- Custom peering strategies
|
||||
- Enhanced message routing optimizations
|
||||
|
||||
### Network Versioning
|
||||
|
||||
Protocol versions follow semantic versioning:
|
||||
|
||||
- **Major version**: Breaking protocol changes
|
||||
- **Minor version**: Backward-compatible enhancements
|
||||
- **Patch version**: Bug fixes and optimizations
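
One way to read the versioning rule is that two identifiers with the same major version differ only by backward-compatible changes. The sketch below compares versions on that assumption. It only inspects version numbers; whether a node should accept a peer on this basis is not specified here, and the function names are hypothetical.

```python
from typing import Tuple


def parse_version(identifier: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from e.g. '/nomos/kad/1.0.0' or '1.0.0'."""
    version = identifier.rsplit("/", 1)[-1]
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def same_major(local: str, remote: str) -> bool:
    """Return True when both identifiers carry the same major version."""
    return parse_version(local)[0] == parse_version(remote)[0]


print(same_major("/nomos/kad/1.0.0", "/nomos/kad/1.2.3"))
print(same_major("/nomos/kad/1.0.0", "/nomos/kad/2.0.0"))
```

Note that the topic versions in this document are still `0.x`, for which semantic versioning makes no stability promise, so a stricter comparison may be appropriate there.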
|
||||
|
||||
## Configuration Parameters
|
||||
|
||||
### Implementation Note
|
||||
|
||||
**Current Status**: The Nomos P2P network implementation uses hardcoded libp2p protocol parameters for optimal performance and reliability. While the node configuration file (`config.yaml`) contains network-related settings, the core libp2p protocol parameters (Kademlia DHT, Identify, and Gossipsub) are embedded in the source code.
|
||||
|
||||
### Node Configuration
|
||||
|
||||
The following network parameters are configurable via `config.yaml`:
|
||||
|
||||
#### Network Backend Settings
|
||||
|
||||
```yaml
network:
  backend:
    host: 0.0.0.0
    port: 3000
    node_key: <node_private_key>
    initial_peers: []
```
|
||||
|
||||
#### Protocol-Specific Topics
|
||||
|
||||
**Mempool Dissemination:**
|
||||
|
||||
- **Mainnet**: `/nomos/mempool/0.1.0`
|
||||
- **Testnet**: `/nomos-testnet/mempool/0.1.0`
|
||||
|
||||
**Block Propagation:**
|
||||
|
||||
- **Mainnet**: `/nomos/cryptarchia/0.1.0`
|
||||
- **Testnet**: `/nomos-testnet/cryptarchia/0.1.0`
|
||||
|
||||
### Hardcoded Protocol Parameters
|
||||
|
||||
The following libp2p protocol parameters are currently hardcoded in the implementation:
|
||||
|
||||
#### Peer Discovery Parameters
|
||||
|
||||
- **Protocol identifiers** for Kademlia DHT and Identify protocols
|
||||
- **DHT routing table** configuration and query timeouts
|
||||
- **Peer discovery intervals** and connection management
|
||||
|
||||
#### Message Dissemination Parameters
|
||||
|
||||
- **Gossipsub mesh parameters** (peer degree, heartbeat intervals)
|
||||
- **Message validation** and caching settings
|
||||
- **Topic subscription** and fanout management
|
||||
|
||||
#### Rationale for Hardcoded Parameters
|
||||
|
||||
1. **Network Stability**: Prevents misconfigurations that could fragment the network
|
||||
2. **Performance Optimization**: Parameters are tuned for the target network size and latency requirements
|
||||
3. **Security**: Reduces attack surface by limiting configurable network parameters
|
||||
4. **Simplicity**: Eliminates need for operators to understand complex P2P tuning
|
||||
|
||||
## Security Considerations
|
||||
|
||||
### Network-Level Security
|
||||
|
||||
1. **Peer Authentication**: Utilize libp2p's built-in peer identity verification
|
||||
2. **Message Validation**: Implement application-layer message validation
|
||||
3. **Rate Limiting**: Protect against spam and DoS attacks
|
||||
4. **Blacklisting**: Mechanism for excluding malicious peers
|
||||
|
||||
### Privacy Considerations
|
||||
|
||||
1. **Traffic Analysis**: Gossipsub provides some resistance to traffic analysis
|
||||
2. **Metadata Leakage**: Minimize identifiable information in protocol messages
|
||||
3. **Connection Patterns**: Randomize connection timing and patterns
|
||||
|
||||
### Denial of Service Protection
|
||||
|
||||
1. **Resource Limits**: Impose limits on connections and message rates
|
||||
2. **Peer Scoring**: Implement reputation-based peer management
|
||||
3. **Circuit Breakers**: Automatic protection against resource exhaustion
|
||||
|
||||
### Node Configuration Example
|
||||
|
||||
The [Nomos Node Configuration](https://github.com/logos-co/nomos/blob/master/nodes/nomos-node/config.yaml) file is an example node configuration.
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Scalability
|
||||
|
||||
- **Target Network Size**: 10,000+ nodes
|
||||
- **Message Latency**: Sub-second for critical messages
|
||||
- **Bandwidth Efficiency**: Optimized for limited bandwidth environments
|
||||
|
||||
### Resource Requirements
|
||||
|
||||
- **Memory Usage**: Minimal DHT routing table overhead
|
||||
- **CPU Usage**: Efficient cryptographic operations
|
||||
- **Network Bandwidth**: Adaptive based on node role and capacity
|
||||
|
||||
## References
|
||||
|
||||
Original working document, from Nomos Notion: [P2P Network Specification](https://nomos-tech.notion.site/P2P-Network-Specification-206261aa09df81db8100d5f410e39d75).
|
||||
|
||||
1. [libp2p Specifications](https://docs.libp2p.io/)
|
||||
2. [QUIC Protocol Specification](https://docs.libp2p.io/concepts/transports/quic/)
|
||||
3. [Kademlia DHT](https://docs.libp2p.io/concepts/discovery-routing/kaddht/)
|
||||
4. [Gossipsub Protocol](https://github.com/libp2p/specs/tree/master/pubsub/gossipsub)
|
||||
5. [Identify Protocol](https://github.com/libp2p/specs/blob/master/identify/README.md)
|
||||
6. [Nomos Implementation](https://github.com/logos-co/nomos) - Reference implementation and source code
|
||||
7. [Nomos Node Configuration](https://github.com/logos-co/nomos/blob/master/nodes/nomos-node/config.yaml) - Example node configuration
|
||||
|
||||
## Copyright
|
||||
|
||||
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
|
||||
345
nomos/raw/sdp.md
Normal file
@@ -0,0 +1,345 @@
|
||||
---
|
||||
title: NOMOS-SDP
|
||||
name: Nomos Service Declaration Protocol Specification
|
||||
status: raw
|
||||
category:
|
||||
tags: participation, validators, declarations
|
||||
editor: Marcin Pawlowski <marcin@status.im>
|
||||
contributors:
|
||||
- Mehmet <mehmet@status.im>
|
||||
- Daniel Sanchez Quiros <danielsq@status.im>
|
||||
- Álvaro Castro-Castilla <alvaro@status.im>
|
||||
- Thomas Lavaur <thomaslavaur@status.im>
|
||||
- Filip Dimitrijevic <filip@status.im>
|
||||
- Gusto Bacvinka <augustinas@status.im>
|
||||
- David Rusu <davidrusu@status.im>
|
||||
---
|
||||
|
||||
## Introduction
|
||||
|
||||
This document defines a mechanism enabling validators to declare their participation in specific protocols that require a known and agreed-upon list of participants. Some examples of this are Data Availability and the Blend Network. We create a single repository of identifiers which is used to establish secure communication between validators and provide services. Before being admitted to the repository, the validator proves that it locked at least a minimum stake.
|
||||
|
||||
## Requirements
|
||||
|
||||
The requirements for the protocol are defined as follows:
|
||||
|
||||
- A declaration must be backed by a confirmation that the sender of the declaration owns a certain value of the stake.
|
||||
- A declaration is valid until it is withdrawn or is not used for a service-specific amount of time.
|
||||
|
||||
## Overview
|
||||
|
||||
The SDP enables nodes to declare their eligibility to serve a specific service in the system, and withdraw their declarations.
|
||||
|
||||
### Protocol Actions
|
||||
|
||||
The protocol defines the following actions:
|
||||
|
||||
- **Declare**: A node sends a declaration that confirms its willingness to provide a specific service, which is confirmed by locking a threshold of stake.
|
||||
- **Active**: A node marks that its participation in the protocol is active according to the service-specific activity logic. This action enables the protocol to monitor the node's activity. We utilize this as a non-intrusive differentiator of node activity. It is crucial to exclude inactive nodes from the set of active nodes, as it enhances the stability of services.
|
||||
- **Withdraw**: A node withdraws its declaration and stops providing a service.
|
||||
|
||||
The logic of the protocol is straightforward:
|
||||
|
||||
1. A node sends a declaration message for a specific service and proves it has a minimum stake.
|
||||
2. The declaration is registered on the ledger, and the node can commence its service according to the service-specific service logic.
|
||||
3. After a service-specific service-providing time, the node confirms its activity.
|
||||
4. The node must confirm its activity with a service-specific minimum frequency; otherwise, its declaration is inactive.
|
||||
5. After the service-specific locking period, the node can send a withdrawal message, and its declaration is removed from the ledger, which means that the node will no longer provide the service.
|
||||
|
||||
💡 Protocol messages are subject to finality: a message becomes part of the immutable ledger only after a delay, and that delay is defined by the consensus.
|
||||
|
||||
## Construction
|
||||
|
||||
In this section, we present the main constructions of the protocol. First, we start with data definitions. Second, we describe the protocol actions. Finally, we present part of the Bedrock Mantle design responsible for storing and processing SDP-related messages and data.
|
||||
|
||||
### Data
|
||||
|
||||
In this section, we discuss and define data types, messages, and their storage.
|
||||
|
||||
#### Service Types
|
||||
|
||||
We define the following services which can be used for service declaration:
|
||||
|
||||
- `BN`: for Blend Network service.
|
||||
- `DA`: for Data Availability service.
|
||||
|
||||
```python
from enum import Enum

class ServiceType(Enum):
    BN = "BN"  # Blend Network
    DA = "DA"  # Data Availability
```
|
||||
|
||||
A declaration can be generated for any of the services above. Any declaration that is not one of the above must be rejected. The number of services might grow in the future.
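The rejection rule above can be sketched as a small parsing helper. This is only an illustration: the function name `parse_service_type` is hypothetical and not part of the specification.

```python
from enum import Enum


class ServiceType(Enum):
    BN = "BN"  # Blend Network
    DA = "DA"  # Data Availability


def parse_service_type(raw: str) -> ServiceType:
    """Return the matching ServiceType, or raise ValueError for unknown values.

    Hypothetical helper: a declaration naming any other service is rejected.
    """
    try:
        return ServiceType(raw)
    except ValueError:
        raise ValueError(f"unknown service type: {raw!r}") from None
```

Because the check is driven by the enum, adding a future service only requires a new enum member.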
|
||||
|
||||
#### Minimum Stake
|
||||
|
||||
The minimum stake is a global value that defines the minimum stake a node must have to perform any service.
|
||||
|
||||
The `MinStake` is a structure that holds the value of the stake `stake_threshold` and the block number it was set at: `timestamp`.
|
||||
|
||||
```python
class MinStake:
    stake_threshold: StakeThreshold
    timestamp: BlockNumber
```
|
||||
|
||||
The `stake_thresholds` is a structure aggregating all defined `MinStake` values.
|
||||
|
||||
```python
|
||||
stake_thresholds: list[MinStake]
|
||||
```
|
||||
|
||||
For more information on how the minimum stake is calculated, please refer to the Nomos documentation.
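As a rough sketch of how `stake_thresholds` might be consulted, the helper below returns the value in force at a given block. It assumes that a `MinStake` applies from its `timestamp` until a later entry replaces it, and that the threshold is a plain integer; neither is stated in this document, and `min_stake_at` is a hypothetical name.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class MinStake:
    stake_threshold: int  # assumption: an integer amount of stake
    timestamp: int        # block number at which the value was set


def min_stake_at(stake_thresholds: list[MinStake], block: int) -> MinStake:
    """Return the latest MinStake whose timestamp is at or before `block`."""
    ordered = sorted(stake_thresholds, key=lambda m: m.timestamp)
    idx = bisect_right([m.timestamp for m in ordered], block)
    if idx == 0:
        raise LookupError("no minimum stake defined at this block")
    return ordered[idx - 1]
```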
|
||||
|
||||
#### Service Parameters
|
||||
|
||||
The service parameters structure defines the parameters set necessary for correctly handling interaction between the protocol and services. Each of the service types defined above must be mapped to a set of the following parameters:
|
||||
|
||||
- `session_length` defines the session length expressed as the number of blocks; the sessions are counted from block `timestamp`.
|
||||
- `lock_period` defines the minimum time (as a number of sessions) during which the declaration cannot be withdrawn; this time must include the period necessary for finalizing the declaration (which might be implicit) and the provision of a service for at least a single session; it can be expressed as the number of blocks by multiplying its value by the `session_length`.
|
||||
- `inactivity_period` defines the maximum time (as a number of sessions) during which an activation message must be sent; otherwise, the declaration is considered inactive; it can be expressed as the number of blocks by multiplying its value by the `session_length`.
|
||||
- `retention_period` defines the time (as a number of sessions) after which the declaration can be safely deleted by the Garbage Collection mechanism; it can be expressed as the number of blocks by multiplying its value by the `session_length`.
|
||||
- `timestamp` defines the block number at which the parameter was set.
|
||||
|
||||
```python
class ServiceParameters:
    session_length: NumberOfBlocks
    lock_period: NumberOfSessions
    inactivity_period: NumberOfSessions
    retention_period: NumberOfSessions
    timestamp: BlockNumber
```
|
||||
|
||||
The `parameters` is a structure aggregating all defined `ServiceParameters` values.
|
||||
|
||||
```python
|
||||
parameters: list[ServiceParameters]
|
||||
```
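The session arithmetic described above can be illustrated with a short sketch. Plain integers stand in for `NumberOfBlocks`, `NumberOfSessions`, and `BlockNumber`, and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceParameters:
    session_length: int     # blocks per session
    lock_period: int        # sessions
    inactivity_period: int  # sessions
    retention_period: int   # sessions
    timestamp: int          # block number at which the parameters were set


def sessions_to_blocks(params: ServiceParameters, sessions: int) -> int:
    """Express a period given in sessions as a number of blocks."""
    return sessions * params.session_length


def session_index(params: ServiceParameters, block: int) -> int:
    """Session number of `block`, counting sessions from block `timestamp`."""
    if block < params.timestamp:
        raise ValueError("block precedes the parameter set")
    return (block - params.timestamp) // params.session_length
```

For example, with `session_length = 10` and `lock_period = 2`, the lock period spans 20 blocks.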
|
||||
|
||||
#### Identifiers
|
||||
|
||||
We define the following set of identifiers which are used for service-specific cryptographic operations:
|
||||
|
||||
- `provider_id`: used to sign the SDP messages and to establish secure links between validators; it is `Ed25519PublicKey`.
|
||||
- `zk_id`: used for zero-knowledge operations by the validator that includes rewarding ([Zero Knowledge Signature Scheme (ZkSignature)](https://www.notion.so/Zero-Knowledge-Signature-Scheme-ZkSignature-21c261aa09df8119bfb2dc74a3430df6?pvs=21)).
|
||||
|
||||
#### Locators
|
||||
|
||||
A `Locator` is the address of a validator which is used to establish secure communication between validators. It follows the [multiaddr addressing scheme from libp2p](https://docs.libp2p.io/concepts/fundamentals/addressing/), but it must contain only the location part and must not contain the node identity (`peer_id`).
|
||||
|
||||
The `provider_id` must be used as the node identity. Therefore, the `Locator` must be completed by adding the `provider_id` at the end of it, which makes the `Locator` usable in the context of libp2p.
|
||||
|
||||
The length of the `Locator` is restricted to 329 characters.
|
||||
|
||||
The syntax of every `Locator` entry must be validated.
|
||||
|
||||
**A common format must be applied to every** `Locator` **to keep it unambiguous, so that deterministic ID generation works consistently.** At a minimum, all letters in the `Locator` must be lowercase, and every part of the address must be explicit (no implicit defaults).
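A minimal validation sketch for the rules above is shown below. It works on the textual form only and does not use a multiaddr library; the function name, the error messages, and the coarse peer-identity check (any `p2p` or `ipfs` component) are assumptions made for illustration, so a real implementation would need a proper multiaddr parser.

```python
MAX_LOCATOR_LENGTH = 329  # limit stated in the text above


def validate_locator(locator: str) -> None:
    """Raise ValueError if the locator breaks one of the stated rules."""
    if len(locator) > MAX_LOCATOR_LENGTH:
        raise ValueError("locator is longer than 329 characters")
    if locator != locator.lower():
        raise ValueError("locator must not contain upper case letters")
    if not locator.startswith("/"):
        raise ValueError("locator must start with '/'")
    parts = locator.split("/")[1:]
    if any(part == "" for part in parts):
        raise ValueError("locator has an empty component")
    # Coarse check (assumption): a peer identity is introduced by /p2p/ or /ipfs/.
    if "p2p" in parts or "ipfs" in parts:
        raise ValueError("locator must not carry a node identity")
```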
|
||||
|
||||
#### Declaration Message
|
||||
|
||||
The construction of the declaration message is as follows:
|
||||
|
||||
```python
class DeclarationMessage:
    service_type: ServiceType
    locators: list[Locator]
    provider_id: Ed25519PublicKey
    zk_id: ZkPublicKey
```
|
||||
|
||||
The `locators` list length must be limited to reduce the potential for abuse. Therefore, the length of the list cannot be longer than 8.
|
||||
|
||||
The message must be signed by the `provider_id` key to prove ownership of the key that is used for network-level authentication of the validator. The message is also signed by the `zk_id` key (by default all Mantle transactions are signed with `zk_id` key).
|
||||
|
||||
#### Declaration Storage
|
||||
|
||||
Only valid declaration messages can be stored on the ledger. We define the `DeclarationInfo` as follows:
|
||||
|
||||
```python
class DeclarationInfo:
    service: ServiceType
    provider_id: Ed25519PublicKey
    zk_id: ZkPublicKey
    locators: list[Locator]
    created: BlockNumber
    active: BlockNumber
    withdrawn: BlockNumber
    nonce: Nonce
```
|
||||
|
||||
Where:
|
||||
|
||||
- `service` defines the service type of the declaration;
|
||||
- `provider_id` is an `Ed25519PublicKey` used to sign the message by the validator;
|
||||
- `zk_id` is used for zero-knowledge operations by the validator that includes rewarding ([Zero Knowledge Signature Scheme (ZkSignature)](https://www.notion.so/Zero-Knowledge-Signature-Scheme-ZkSignature-21c261aa09df8119bfb2dc74a3430df6?pvs=21));
|
||||
- `locators` are a copy of the `locators` from the `DeclarationMessage`;
|
||||
- `created` refers to the block number of the block that contained the declaration;
|
||||
- `active` refers to the latest block number for which the active message was sent (it is set to `created` by default);
|
||||
- `withdrawn` refers to the block number for which the service declaration was withdrawn (it is set to 0 by default);
- `nonce` must be set to 0 for the declaration message and must increase monotonically with every message sent for the `declaration_id`.
|
||||
|
||||
We also define the `declaration_id` (of a `DeclarationId` type), the unique identifier of a `DeclarationInfo`, calculated as a hash of the concatenation of `service`, `provider_id`, `zk_id`, and `locators`. The hash function is `blake2b` with a 256-bit output.
|
||||
|
||||
```python
|
||||
declaration_id = Hash(service||provider_id||zk_id||locators)
|
||||
```
|
||||
|
||||
The `declaration_id` is not stored as part of the `DeclarationInfo` but it is used to index it.
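The identifier computation can be sketched with Python's standard `hashlib`. The byte encoding of each field is not defined in this document, so the encoding below (UTF-8 for the service name and locators, raw bytes for the keys, plain concatenation without length prefixes) is an assumption for illustration only.

```python
import hashlib


def declaration_id(service: str, provider_id: bytes, zk_id: bytes,
                   locators: list[str]) -> bytes:
    """blake2b with a 256-bit output over service || provider_id || zk_id || locators."""
    h = hashlib.blake2b(digest_size=32)
    h.update(service.encode("utf-8"))  # assumed encoding of the service type
    h.update(provider_id)
    h.update(zk_id)
    for locator in locators:           # assumed: locators hashed in list order
        h.update(locator.encode("utf-8"))
    return h.digest()
```

Since the identifier depends on the exact bytes of every locator, this is where the common `Locator` formatting matters: two spellings of the same address would yield different identifiers.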
|
||||
|
||||
All `DeclarationInfo` references are stored in the `declarations` and are indexed by `declaration_id`.
|
||||
|
||||
```python
|
||||
declarations: list[declaration_id]
|
||||
```
|
||||
|
||||
#### Active Message
|
||||
|
||||
The construction of the active message is as follows:
|
||||
|
||||
```python
class ActiveMessage:
    declaration_id: DeclarationId
    nonce: Nonce
    metadata: Metadata
```
|
||||
|
||||
where `metadata` is service-specific node activeness metadata.
|
||||
|
||||
The message must be signed by the `zk_id` key associated with the `declaration_id`.
|
||||
|
||||
The `nonce` must increase monotonically by every message sent for the `declaration_id`.
|
||||
|
||||
#### Withdraw Message
|
||||
|
||||
The construction of the withdraw message is as follows:
|
||||
|
||||
```python
class WithdrawMessage:
    declaration_id: DeclarationId
    nonce: Nonce
```
|
||||
|
||||
The message must be signed by the `zk_id` key associated with the `declaration_id`.
|
||||
|
||||
The `nonce` must increase monotonically by every message sent for the `declaration_id`.
|
||||
|
||||
#### Indexing
|
||||
|
||||
Every event must be correctly indexed to enable lighter synchronization of the changes. Therefore, we index every `declaration_id` according to `EventType`, `ServiceType`, and `Timestamp`, where `EventType = { "created", "active", "withdrawn" }` follows the type of the message.
|
||||
|
||||
```python
events = {
    event_type: {
        service_type: {
            timestamp: {
                declarations: list[declaration_id]
            }
        }
    }
}
```
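One possible in-memory shape for this index is sketched below with nested `defaultdict`s. For brevity, the sketch stores the list of identifiers directly under the timestamp instead of under a `declarations` key, and the helper names are hypothetical.

```python
from collections import defaultdict

EVENT_TYPES = ("created", "active", "withdrawn")

# events[event_type][service_type][timestamp] -> list of declaration ids
events = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))


def index_event(event_type: str, service_type: str, timestamp: int,
                decl_id: bytes) -> None:
    """Record that `decl_id` had an event of the given type at block `timestamp`."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    events[event_type][service_type][timestamp].append(decl_id)


def events_since(event_type: str, service_type: str, since: int) -> list[bytes]:
    """Declaration ids with events at block `since` or later, in block order."""
    by_block = events[event_type][service_type]
    return [d for block in sorted(by_block) if block >= since
            for d in by_block[block]]
```

A node that is catching up only needs the entries newer than the last block it processed, which is the lighter synchronization the index is meant to enable.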
|
||||
|
||||
### Protocol
|
||||
|
||||
#### Declare
|
||||
|
||||
The Declare action associates a validator with a service it wants to provide. It requires sending a valid `DeclarationMessage` (as defined in Declaration Message), which is then processed (as defined below) and stored (as defined in Declaration Storage).
|
||||
|
||||
The declaration message is considered valid when all of the following are met:
|
||||
|
||||
- The sender meets the stake requirements.
|
||||
- The `declaration_id` is unique.
|
||||
- The sender knows the secret behind the `provider_id` identifier.
|
||||
- The length of the `locators` list must not be longer than 8.
|
||||
- The `nonce` is increasing monotonically.
|
||||
|
||||
If all of the above conditions are fulfilled, then the message is stored on the ledger; otherwise, the message is discarded.
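The validity conditions can be sketched as a single check that reports every failed condition. Stake lookup and signature verification are outside the scope of this sketch and are passed in as precomputed values; the nonce condition is omitted because the declaration message itself carries no nonce in the definition above. All names are hypothetical.

```python
MAX_LOCATORS = 8  # limit stated in the Declaration Message section


def validate_declaration(decl_id: bytes, locators: list[str], stake: int,
                         min_stake: int, provider_sig_ok: bool,
                         known_ids: set[bytes]) -> list[str]:
    """Return the failed checks; an empty list means the message is valid."""
    errors = []
    if stake < min_stake:
        errors.append("stake below the minimum stake")
    if decl_id in known_ids:
        errors.append("declaration_id is not unique")
    if not provider_sig_ok:
        errors.append("provider_id signature is invalid")
    if len(locators) > MAX_LOCATORS:
        errors.append("more than 8 locators")
    return errors
```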
|
||||
|
||||
#### Active
|
||||
|
||||
The Active action enables marking the provider as actively providing a service. It requires sending a valid `ActiveMessage` (as defined in Active Message), which is relayed to the service-specific node activity logic (as indicated by the service type in Common SDP Structures).
|
||||
|
||||
The Active action updates the `active` value of the `DeclarationInfo`, which means that it also activates inactive (but not expired) providers.
|
||||
|
||||
The SDP active action logic is:
|
||||
|
||||
1. A node sends an `ActiveMessage` transaction.
2. The `ActiveMessage` is verified by the SDP logic:
   a. The `declaration_id` returns an existing `DeclarationInfo`.
   b. The transaction containing `ActiveMessage` is signed by the `zk_id`.
   c. The `withdrawn` from the `DeclarationInfo` is set to zero.
   d. The `nonce` is increasing monotonically.
3. If any of these conditions fails, discard the message and stop processing.
4. The message is processed by the service-specific activity logic alongside the `active` value indicating the period since the last active message was sent. The `active` value comes from the `DeclarationInfo`.
5. If the service-specific activity logic approves the node active message, then the `active` field of the `DeclarationInfo` is set to the current block height.
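The steps above can be sketched as follows. Signature verification is represented by a boolean, the service-specific activity logic by a callback that receives the number of blocks since the last active message, and the stored nonce is only advanced when the message is accepted; these are assumptions of the sketch, not requirements of the protocol.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DeclarationInfo:
    created: int
    active: int
    withdrawn: int = 0
    nonce: int = 0


def process_active(info: Optional[DeclarationInfo], msg_nonce: int,
                   signature_ok: bool, current_block: int,
                   service_accepts: Callable[[int], bool]) -> bool:
    """Apply an ActiveMessage; return True if `active` was updated."""
    if info is None or not signature_ok:              # checks 2a and 2b
        return False
    if info.withdrawn != 0 or msg_nonce <= info.nonce:  # checks 2c and 2d
        return False
    blocks_since_active = current_block - info.active
    if not service_accepts(blocks_since_active):      # step 4
        return False
    info.active = current_block                       # step 5
    info.nonce = msg_nonce
    return True
```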
|
||||
|
||||
#### Withdraw
|
||||
|
||||
The withdraw action enables a withdrawal of a service declaration. It requires sending a valid `WithdrawMessage` (as defined in Withdraw Message). The withdrawal cannot happen before the end of the locking period, which is counted from `created`. The lock period is stored as `lock_period` (a number of sessions) in the Service Parameters and is converted to blocks by multiplying it by `session_length`.
|
||||
|
||||
The logic of the withdraw action is:
|
||||
|
||||
1. A node sends a `WithdrawMessage` transaction.
2. The `WithdrawMessage` is verified by the SDP logic:
   a. The `declaration_id` returns an existing `DeclarationInfo`.
   b. The transaction containing `WithdrawMessage` is signed by the `zk_id`.
   c. The `withdrawn` from `DeclarationInfo` is set to zero.
   d. The `nonce` is increasing monotonically.
3. If any of the above is not correct, then discard the message and stop.
4. Set the `withdrawn` from the `DeclarationInfo` to the current block height.
5. Unlock the stake.
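A sketch of the withdraw logic, including the lock period rule stated at the start of this section, is given below. The exact boundary (withdrawal allowed from block `created + lock_period * session_length` onwards) is an assumption, signature verification is represented by a boolean, and unlocking the stake is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeclarationInfo:
    created: int
    active: int
    withdrawn: int = 0
    nonce: int = 0


def process_withdraw(info: Optional[DeclarationInfo], msg_nonce: int,
                     signature_ok: bool, current_block: int,
                     lock_period: int, session_length: int) -> bool:
    """Apply a WithdrawMessage; return True if the declaration was withdrawn."""
    if info is None or not signature_ok:
        return False
    if info.withdrawn != 0 or msg_nonce <= info.nonce:
        return False
    # Lock period is given in sessions; convert it to blocks.
    if current_block < info.created + lock_period * session_length:
        return False
    info.withdrawn = current_block
    info.nonce = msg_nonce
    return True  # the caller unlocks the stake
```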
|
||||
|
||||
#### Garbage Collection
|
||||
|
||||
The protocol requires a garbage collection mechanism that periodically removes unused `DeclarationInfo` entries.
|
||||
|
||||
The logic of garbage collection is:
|
||||
|
||||
For every `DeclarationInfo` in the `declarations` set, remove the entry if either:
|
||||
|
||||
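1. The entry has been withdrawn (`withdrawn` is non-zero) and is past the retention period: `withdrawn + retention_period < current_block_height`.
2. The entry is inactive beyond the inactivity and retention periods: `active + inactivity_period + retention_period < current_block_height`.

In both conditions, `inactivity_period` and `retention_period` are expressed in blocks, that is, multiplied by `session_length`.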
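The two removal conditions can be sketched as a predicate. The periods are passed already converted to blocks, and the withdrawal condition is only evaluated for entries that were actually withdrawn, since `withdrawn` is 0 by default; both points are interpretations made for this sketch.

```python
def should_collect(active: int, withdrawn: int, current_block: int,
                   inactivity_blocks: int, retention_blocks: int) -> bool:
    """Return True if a DeclarationInfo entry can be garbage collected."""
    expired_withdrawal = (withdrawn != 0
                          and withdrawn + retention_blocks < current_block)
    expired_inactive = (active + inactivity_blocks + retention_blocks
                        < current_block)
    return expired_withdrawal or expired_inactive
```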
|
||||
|
||||
#### Query
|
||||
|
||||
The protocol must enable querying the ledger in at least the following manner:
|
||||
|
||||
- `GetAllProviderId(timestamp)`, returns all `provider_id`s associated with the `timestamp`.
|
||||
- `GetAllProviderIdSince(timestamp)`, returns all `provider_id`s since the `timestamp`.
|
||||
- `GetAllDeclarationInfo(timestamp)`, returns all `DeclarationInfo` entries associated with the `timestamp`.
|
||||
- `GetAllDeclarationInfoSince(timestamp)`, returns all `DeclarationInfo` entries since the `timestamp`.
|
||||
- `GetDeclarationInfo(provider_id)`, returns the `DeclarationInfo` entry identified by the `provider_id`.
|
||||
- `GetDeclarationInfo(declaration_id)`, returns the `DeclarationInfo` entry identified by the `declaration_id`.
|
||||
- `GetAllServiceParameters(timestamp)`, returns all entries of the `ServiceParameters` store for the requested `timestamp`.
|
||||
- `GetAllServiceParametersSince(timestamp)`, returns all entries of the `ServiceParameters` store since the requested `timestamp`.
|
||||
- `GetServiceParameters(service_type, timestamp)`, returns the service parameter entry from the `ServiceParameters` store of a `service_type` for a specified `timestamp`.
|
||||
- `GetMinStake(timestamp)`, returns the `MinStake` structure at the requested `timestamp`.
|
||||
- `GetMinStakeSince(timestamp)`, returns a set of `MinStake` structures since the requested `timestamp`.
|
||||
|
||||
The query must return an error if the retention period for the declaration has passed and the requested information is not available.
|
||||
|
||||
The list of queries may be extended.
|
||||
|
||||
Every query must return information for a finalized state only.
|
||||
|
||||
### Mantle and ZK Proof
|
||||
|
||||
For more information about the Mantle and ZK proofs, please refer to [Mantle Specification](https://www.notion.so/Mantle-Specification-21c261aa09df810c8820fab1d78b53d9?pvs=21).
|
||||
|
||||
## Appendix
|
||||
|
||||
### Future Improvements
|
||||
|
||||
Refer to the [Mantle Specification](https://www.notion.so/Mantle-Specification-21c261aa09df810c8820fab1d78b53d9?pvs=21) for a list of potential improvements to the protocol.
|
||||
|
||||
## References
|
||||
|
||||
- Mantle and ZK Proof: [Mantle Specification](https://www.notion.so/Mantle-Specification-21c261aa09df810c8820fab1d78b53d9?pvs=21)
|
||||
- Ed25519 Digital Signatures: [RFC 8032](https://datatracker.ietf.org/doc/html/rfc8032)
|
||||
- BLAKE2b Cryptographic Hash: [RFC 7693](https://datatracker.ietf.org/doc/html/rfc7693)
|
||||
- libp2p Multiaddr: [Addressing Specification](https://docs.libp2p.io/concepts/fundamentals/addressing/)
|
||||
- Zero Knowledge Signatures: [ZkSignature Scheme](https://www.notion.so/Zero-Knowledge-Signature-Scheme-ZkSignature-21c261aa09df8119bfb2dc74a3430df6?pvs=21)
|
||||
|
||||
## Copyright
|
||||
|
||||
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
|
||||
@@ -40,7 +40,7 @@ Request For Comments specification process managed by the Vac service department
|
||||
|
||||
## License
|
||||
|
||||
Copyright (c) 2008-24 the Editor and Contributors.
|
||||
Copyright (c) 2008-26 the Editor and Contributors.
|
||||
|
||||
This Specification is free software;
|
||||
you can redistribute it and/or
|
||||
|
||||
252
vac/raw/consensus-hashgraphlike.md
Normal file
@@ -0,0 +1,252 @@
|
||||
---
|
||||
title: HASHGRAPHLIKE CONSENSUS
|
||||
name: Hashgraphlike Consensus Protocol
|
||||
status: raw
|
||||
category: Standards Track
|
||||
tags:
|
||||
editor: Ugur Sen [ugur@status.im](mailto:ugur@status.im)
|
||||
contributors: seemenkina [ekaterina@status.im](mailto:ekaterina@status.im)
|
||||
---
|
||||
## Abstract
|
||||
|
||||
This document specifies a scalable, decentralized, and Byzantine Fault Tolerant (BFT)
|
||||
consensus mechanism inspired by Hashgraph, designed for binary decision-making in P2P networks.
|
||||
|
||||
## Motivation
|
||||
|
||||
Consensus is one of the essential components of decentralization.
|
||||
In particular, in a decentralized group messaging application, consensus is used for
|
||||
binary decision-making to govern the group.
|
||||
Therefore, each user contributes to the decision-making process.
|
||||
Besides achieving decentralization, the consensus mechanism MUST be strong and scalable:

- Strong: under the assumption of at least `2/3` honest users in the network,
  each user MUST conclude the same decision.
- Scalable: message propagation in the network MUST occur within `O(log n)` rounds,
  where `n` is the total number of peers,
  in order to preserve the scalability of the messaging application.
|
||||
|
||||
## Format Specification
|
||||
|
||||
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”,
|
||||
“SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document
|
||||
are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
|
||||
|
||||
## Flow
|
||||
|
||||
Any user in the group initializes the consensus by creating a proposal.
|
||||
Next, the user broadcasts the proposal to the whole network.
|
||||
Upon receiving the proposal, each user validates it
and adds its vote (yes or no) together with its signature and timestamp.
|
||||
The user then sends the proposal and vote to a random peer in a P2P setup,
|
||||
or to a subscribed gossipsub channel if gossip-based messaging is used.
|
||||
Therefore, each user first validates the signature and then adds its new vote.
|
||||
Each message sent counts as a round.
After `log(n)` rounds, all users in the network have the others' votes,
provided that at least `2/3` of the users are honest, where an honest user is one that follows the protocol.
|
||||
|
||||
In general, the voting-based consensus consists of the following phases:
|
||||
|
||||
1. Initialization of voting
|
||||
2. Exchanging votes across the rounds
|
||||
3. Counting the votes
|
||||
|
||||
### Assumptions
|
||||
|
||||
- The users in the P2P network can discover the other nodes, or they are subscribed to the same gossipsub channel.
|
||||
- We MAY have non-reliable (silent) nodes.
|
||||
- Proposal owners MUST know the number of voters.
|
||||
|
||||
## 1. Initialization of voting
|
||||
|
||||
A user initializes the voting with the proposal payload which is
|
||||
implemented using [protocol buffers v3](https://protobuf.dev/) as follows:
|
||||
|
||||
```protobuf
syntax = "proto3";

package vac.voting;

message Proposal {
  string name = 10;                   // Proposal name
  string payload = 11;                // Proposal description
  uint32 proposal_id = 12;            // Unique identifier of the proposal
  bytes proposal_owner = 13;          // Public key of the creator
  repeated Vote votes = 14;           // Vote list in the proposal
  uint32 expected_voters_count = 15;  // Maximum number of distinct voters
  uint32 round = 16;                  // Number of Votes
  uint64 timestamp = 17;              // Creation time of proposal
  uint64 expiration_time = 18;        // Time interval during which the proposal is active
  bool liveness_criteria_yes = 19;    // How silent peers' votes are counted (true: counted as YES)
}

message Vote {
  uint32 vote_id = 20;       // Unique identifier of the vote
  bytes vote_owner = 21;     // Voter's public key
  uint32 proposal_id = 22;   // Linking votes and proposals
  int64 timestamp = 23;      // Time when the vote was cast
  bool vote = 24;            // Vote bool value (true/false)
  bytes parent_hash = 25;    // Hash of previous owner's Vote
  bytes received_hash = 26;  // Hash of previous received Vote
  bytes vote_hash = 27;      // Hash of all previously defined fields in Vote
  bytes signature = 28;      // Signature of vote_hash
}
```
|
||||
|
||||
To initiate a consensus for a proposal,
|
||||
a user MUST complete all the fields in the proposal, including attaching its `vote`
|
||||
and the `payload` that shows the purpose of the proposal.
|
||||
Notably, `parent_hash` and `received_hash` are empty strings because there is no previous or received hash.
|
||||
The initialization phase ends when the user who created the proposal sends it
to a random peer in the network or publishes it to the specific channel.
|
||||
|
||||
## 2. Exchanging votes across the peers
|
||||
|
||||
Once a peer receives the proposal message `P_1` over a 1-1 or gossipsub channel, it performs the following checks:

1. Check the signature of each vote in the proposal; in particular, for proposal `P_1`,
   verify the signature of `V_1`, where `V_1 = P_1.votes[0]`, using `V_1.signature` and `V_1.vote_owner`.
2. Do the `parent_hash` check: if there are repeated votes from the same sender,
   check that the hash of the former vote is equal to the `parent_hash` of the later vote.
3. Do the `received_hash` check: if there are multiple votes in a proposal, check that the hash of a vote is equal to the `received_hash` of the next one.
4. After successful verification of the signature and hashes, the receiving peer proceeds to generate `P_2` containing a new vote `V_2` as follows:

   4.1. Set its public key as `V_2.vote_owner`.

   4.2. Set `timestamp`.

   4.3. Set boolean `vote`.

   4.4. Set `V_2.parent_hash = 0` if the peer has no previous vote; otherwise, set it to the hash of the peer's previous vote.

   4.5. Set `V_2.received_hash = hash(P_1.votes[0])`.

   4.6. Set `proposal_id` for the `vote`.

   4.7. Calculate `vote_hash` as the hash of all previously defined fields in Vote:
   `V_2.vote_hash = hash(vote_id, vote_owner, proposal_id, timestamp, vote, parent_hash, received_hash)`

   4.8. Sign `vote_hash` with the private key corresponding to the public key in `vote_owner`, then set `V_2.signature`.

5. Create `P_2` by adding `V_2` as follows:

   5.1. Assign `P_2.name`, `P_2.proposal_id`, and `P_2.proposal_owner` to be identical to those in `P_1`.

   5.2. Add `V_2` to the `P_2.votes` list.

   5.3. Increase the round by one, namely `P_2.round = P_1.round + 1`.

   5.4. Verify that the proposal has not expired by checking that `current_time - P_2.timestamp < P_1.expiration_time`.
   If this does not hold, other peers ignore the message.
|
||||
|
||||
After the peer creates the proposal `P_2` with its vote `V_2`,
it sends `P_2` to a random peer in the network or
publishes it to the specific channel.
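The hash linking described in this section can be sketched as follows. The document does not fix a hash function or a field serialization, so SHA-256 over a `repr` of the fields is used purely as a stand-in; "the hash of a vote" is taken to mean its `vote_hash`, an empty byte string represents a missing parent or received hash, and signatures are left out. All of these are assumptions of the sketch.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Vote:
    vote_id: int
    vote_owner: bytes
    proposal_id: int
    timestamp: int
    vote: bool
    parent_hash: bytes = b""
    received_hash: bytes = b""
    vote_hash: bytes = b""


def compute_vote_hash(v: Vote) -> bytes:
    """Hash of all fields defined before vote_hash (stand-in serialization)."""
    fields = (v.vote_id, v.vote_owner.hex(), v.proposal_id, v.timestamp,
              v.vote, v.parent_hash.hex(), v.received_hash.hex())
    return hashlib.sha256(repr(fields).encode("utf-8")).digest()


def append_vote(votes: list[Vote], new: Vote) -> None:
    """Link `new` to the owner's previous vote and to the last received vote."""
    own = [v for v in votes if v.vote_owner == new.vote_owner]
    new.parent_hash = own[-1].vote_hash if own else b""
    new.received_hash = votes[-1].vote_hash if votes else b""
    new.vote_hash = compute_vote_hash(new)
    votes.append(new)


def chain_is_valid(votes: list[Vote]) -> bool:
    """Run the parent_hash and received_hash checks over a vote list."""
    last_by_owner: dict[bytes, bytes] = {}
    previous = b""
    for v in votes:
        if v.vote_hash != compute_vote_hash(v):
            return False
        if v.received_hash != previous:
            return False
        if v.parent_hash != last_by_owner.get(v.vote_owner, b""):
            return False
        last_by_owner[v.vote_owner] = v.vote_hash
        previous = v.vote_hash
    return True
```

Changing any earlier vote changes its `vote_hash`, which breaks the `received_hash` of the following vote, so tampering with history is detected by the chain check.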
|
||||
|
||||
## 3. Determining the result
|
||||
|
||||
Because consensus depends on meeting a quorum threshold,
|
||||
each peer MUST verify the accumulated votes to determine whether the necessary conditions have been satisfied.
|
||||
The voting result is set to YES if the majority of the votes from at least `2n/3` distinct peers are YES.
|
||||
|
||||
To verify this, the `findDistinctVoter` method processes the proposal by traversing its `votes` list to determine whether the number of unique voters reaches the required threshold.
|
||||
|
||||
If this method returns true, the peer proceeds with strong validation,
|
||||
which ensures that if any honest peer reaches a decision,
|
||||
no other honest peer can arrive at a conflicting result.
|
||||
|
||||
1. Check the `signature` of each vote as shown in [Section 2](#2-exchanging-votes-across-the-peers).

2. Check the `parent_hash` chain: if there are multiple votes from the same owner, namely `vote_i` and `vote_i+1` respectively,
   the parent hash of `vote_i+1` should be the hash of `vote_i`.

3. Check the `received_hash` chain: the received hash of each `vote_i+1` should be equal to the hash of `vote_i`.

4. Check the `timestamp` to guard against replay attacks.
   In particular, the `timestamp` MUST NOT be older than the determined threshold.
|
||||
|
||||
5. Check that the liveness criteria defined in the Liveness section are satisfied.
|
||||
|
||||
If a proposal is verified by all the checks,
|
||||
the `countVote` method counts each YES vote from the list of Votes.
|
||||
|
||||
## 4. Properties
|
||||
|
||||
The consensus mechanism satisfies liveness and security properties as follows:
|
||||
|
||||
### Liveness
|
||||
|
||||
Liveness refers to the ability of the protocol to eventually reach a decision when sufficient honest participation is present.
|
||||
In this protocol, if `n > 2` and more than `n/2` of the votes among at least `2n/3` distinct peers are YES,
|
||||
then the consensus result is defined as YES; otherwise, when `n ≤ 2`, unanimous agreement (100% YES votes) is required.
|
||||
|
||||
The peer calculates the result locally as shown in [Section 3](#3-determining-the-result).
|
||||
From the [hashgraph property](https://hedera.com/learning/hedera-hashgraph/what-is-hashgraph-consensus),
|
||||
if a node could calculate the result of a proposal,
|
||||
it implies that no peer can calculate the opposite of the result.
|
||||
Still, reliability issues can cause some situations where peers cannot receive enough messages,
|
||||
so they cannot calculate the consensus result.
|
||||
|
||||
Rounds are incremented when a peer adds and sends the new proposal.
|
||||
The number of rounds required to collect votes from `2n/3` distinct peers depends on the transport:
|
||||
|
||||
1. `2n/3` rounds in pure P2P networks
|
||||
2. `2` rounds in gossipsub
|
||||
|
||||
Since the message complexity is `O(1)` in the gossipsub channel,
the second round is used, in case the network has reliability issues,
for the peers that could not receive all the messages from the first round.
|
||||
|
||||
If an honest and online peer has received at least one vote but not enough to reach consensus,
|
||||
it MAY continue to propagate its own vote — and any votes it has received — to support message dissemination.
|
||||
This process can continue beyond the expected round count,
|
||||
as long as it remains within the expiration time defined in the proposal.
|
||||
The expiration time acts as a soft upper bound to ensure that consensus is either reached or aborted within a bounded timeframe.
|
||||
|
||||
#### Equality of votes
|
||||
|
||||
An equality of votes occurs when, after verifying at least `2n/3` distinct voters and
applying `liveness_criteria_yes`, the number of YES and NO votes is equal.
|
||||
|
||||
Handling ties is an application-level decision. The application MUST define a deterministic tie policy:
|
||||
|
||||
- RETRY: re-run the vote with a new `proposal_id`, optionally adjusting parameters.
- REJECT: abort the proposal and return the voting result as NO.

The chosen policy SHOULD be made consistent across all peers via the proposal's `payload` to ensure convergence on the same outcome.
|
||||
|
||||
### Silent Node Management
|
||||
|
||||
Silent nodes are the nodes that do not participate in the voting with either a YES or a NO vote.
There are two possible ways to count the votes of silent peers.
|
||||
|
||||
1. **Silent peers mean YES:**
Silent peers are counted as YES votes if the application prefers strong rejection, where only explicit NO votes count against the proposal.
2. **Silent peers mean NO:**
Silent peers are counted as NO votes if the application prefers strong acceptance, where only explicit YES votes count in favor of the proposal.
|
||||
|
||||
By default, `liveness_criteria_yes` is set to true, which means silent peers' votes are counted as YES.
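
The silent-peer rule and the tie policy can be combined into a single decision function. The sketch below is only one possible reading: it assumes that the result is a simple majority after silent peers have been counted, and the names `TiePolicy` and `decide`, as well as the `"PENDING"` and `"RETRY"` return values, are hypothetical and not defined by this RFC.

```python
from enum import Enum


class TiePolicy(Enum):
    RETRY = "retry"
    REJECT = "reject"


def decide(yes: int, no: int, expected_voters: int,
           liveness_criteria_yes: bool, tie_policy: TiePolicy) -> str:
    """Return "YES", "NO", "RETRY" or "PENDING" for one proposal."""
    voters = yes + no
    # Fewer than 2n/3 distinct voters: no result can be computed yet.
    if 3 * voters < 2 * expected_voters:
        return "PENDING"
    # Count every silent peer according to liveness_criteria_yes.
    silent = expected_voters - voters
    if liveness_criteria_yes:
        yes += silent
    else:
        no += silent
    # Assumption: simple majority after counting silent peers.
    if yes > no:
        return "YES"
    if no > yes:
        return "NO"
    # Equality of votes: apply the application-defined tie policy.
    return "RETRY" if tie_policy is TiePolicy.RETRY else "NO"
```

Because every input is either carried in the proposal or agreed in advance, all honest peers that see the same votes reach the same answer.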
|
||||
|
||||
### Security
|
||||
|
||||
This RFC uses cryptographic primitives to prevent the
|
||||
following malicious behaviours:
|
||||
|
||||
- Vote forgery attempt: creating unsigned invalid votes
|
||||
- Inconsistent voting: a malicious peer submits conflicting votes (e.g., YES to some peers and NO to others)
|
||||
in different stages of the protocol, violating vote consistency and attempting to undermine consensus.
|
||||
- Integrity breaking attempt: tampering with history by changing previous votes.
|
||||
- Replay attack: storing old votes to maliciously reuse them in a fresh vote.
|
||||
|
||||
## 5. Copyright
|
||||
|
||||
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/)
|
||||
|
||||
## 6. References
|
||||
|
||||
- [Hedera Hashgraph](https://hedera.com/learning/hedera-hashgraph/what-is-hashgraph-consensus)
|
||||
- [Gossip about gossip](https://docs.hedera.com/hedera/core-concepts/hashgraph-consensus-algorithms/gossip-about-gossip)
|
||||
- [Simple implementation of hashgraph consensus](https://github.com/conanwu777/hashgraph)
|
||||
436
vac/raw/eth-mls-offchain.md
Normal file
@@ -0,0 +1,436 @@
|
||||
---
|
||||
title: ETH-MLS-OFFCHAIN
|
||||
name: Secure channel setup using decentralized MLS and Ethereum accounts
|
||||
status: raw
|
||||
category: Standards Track
|
||||
tags:
|
||||
editor: Ugur Sen [ugur@status.im](mailto:ugur@status.im)
|
||||
contributors: seemenkina [ekaterina@status.im](mailto:ekaterina@status.im)
|
||||
|
||||
---
|
||||
|
||||
## Abstract
|
||||
|
||||
This document specifies an Ethereum-authenticated, scalable
and decentralized secure group messaging application that
integrates a Messaging Layer Security (MLS) backend.
|
||||
Decentralization means that each user is a node in the P2P network and
each user has a voice in any change to the group.
|
||||
This is achieved by integrating a consensus mechanism.
|
||||
Lastly, this RFC can also be referred to as de-MLS,
|
||||
decentralized MLS, to emphasize its deviation
|
||||
from the centralized trust assumptions of traditional MLS deployments.
|
||||
|
||||
## Motivation
|
||||
|
||||
Group messaging is a fundamental part of digital communication,
|
||||
yet most existing systems depend on centralized servers,
|
||||
which introduce risks around privacy, censorship, and unilateral control.
|
||||
In restrictive settings, servers can be blocked or surveilled;
|
||||
in more open environments, users still face opaque moderation policies,
|
||||
data collection, and exclusion from decision-making processes.
|
||||
To address this, we propose a decentralized, scalable peer-to-peer
|
||||
group messaging system where each participant runs a node, contributes
|
||||
to message propagation, and takes part in governance autonomously.
|
||||
Group membership changes are decided collectively through a lightweight
|
||||
partially synchronous, fault-tolerant consensus protocol without a centralized identity.
|
||||
This design enables truly democratic group communication and is well-suited
|
||||
for use cases like activist collectives, research collaborations, DAOs, support groups,
|
||||
and decentralized social platforms.
|
||||
|
||||
## Format Specification
|
||||
|
||||
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”,
|
||||
“SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document
|
||||
are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
|
||||
|
||||
### Assumptions
|
||||
|
||||
- The nodes in the P2P network can discover other nodes or will connect to other nodes when subscribing to the same gossipsub topic.
|
||||
- We MAY have non-reliable (silent) nodes.
|
||||
- We MUST have a consensus that is lightweight, scalable and finalized in a specific time.
|
||||
|
||||
## Roles
|
||||
|
||||
The three roles used in de-MLS are as follows:
|
||||
|
||||
- `node`: Nodes are participants in the network that are not currently members
|
||||
of any secure group messaging session but remain available as potential candidates for group membership.
|
||||
- `member`: Members are special nodes in the secure group messaging who
|
||||
obtain the current group key of the secure group messaging.
|
||||
Each node is assigned a unique identity represented as a 20-byte value named `member id`.
|
||||
- `steward`: Stewards are special and transparent members in the secure group
|
||||
messaging who organize the changes by releasing commit messages upon the voted proposals.
|
||||
There are two special subsets of stewards, the epoch steward and the backup steward,
|
||||
which are defined in the section de-MLS Objects.
|
||||
|
||||
## MLS Background
|
||||
|
||||
de-MLS is built on an MLS backend, so the MLS services and other MLS components
|
||||
are taken from the original [MLS specification](https://datatracker.ietf.org/doc/rfc9420/), with or without modifications.
|
||||
|
||||
### MLS Services
|
||||
|
||||
MLS operates with two services: the authentication service (AS) and the delivery service (DS).
The authentication service enables group members to authenticate the credentials presented by other group members.
|
||||
The delivery service routes MLS messages among the nodes or
|
||||
members in the protocol in the correct order and
|
||||
manages the `keyPackage` of each user, where a `keyPackage` is an object
that provides some public information about a user.
|
||||
|
||||
### MLS Objects
|
||||
|
||||
The following section presents the MLS objects and components that are used in this RFC:
|
||||
|
||||
`Epoch`: Time intervals that change the state that is defined by members,
|
||||
section 3.4 in [MLS RFC 9420](https://datatracker.ietf.org/doc/rfc9420/).
|
||||
|
||||
`MLS proposal message:` Members MUST receive the proposal message prior to the
|
||||
corresponding commit message that initiates a new epoch with key changes,
|
||||
in order to ensure the intended security properties, section 12.1 in [MLS RFC 9420](https://datatracker.ietf.org/doc/rfc9420/).
|
||||
Here, the add and remove proposals are used.
|
||||
|
||||
`Application message`: This message type is used for arbitrary encrypted communication between group members.
|
||||
This is restricted by [MLS RFC 9420](https://datatracker.ietf.org/doc/rfc9420/): if there is a pending proposal,
application messages should be halted.
|
||||
Note that, since MLS is based on servers, the delay between proposal and commit messages is very small.
|
||||
|
||||
`Commit message:` After members receive the proposals regarding group changes,
|
||||
the committer, who may be any member of the group, as specified in [MLS RFC 9420](https://datatracker.ietf.org/doc/rfc9420/),
|
||||
generates the necessary key material for the next epoch, including the appropriate welcome messages
|
||||
for new joiners and new entropy for removed members. In this RFC, committers MUST be stewards.
|
||||
|
||||
### de-MLS Objects
|
||||
|
||||
This section presents the de-MLS objects:
|
||||
|
||||
`Voting proposal`: Similar to MLS proposals, but processed only if approved through a voting process.
|
||||
They function as application messages in the MLS group,
|
||||
allowing the steward to collect them without halting the protocol.
|
||||
There are three types of `voting proposal` according to the type of consensus, as shown in the Consensus Types section:
the `commit proposal`, the `steward election proposal` and the `emergency criteria proposal`.
|
||||
|
||||
`Epoch steward`: The steward assigned to commit in `epoch E` according to the steward list.
|
||||
It holds the primary responsibility for creating the commit in that epoch.
|
||||
|
||||
`Backup steward`: The steward next in line after the `epoch steward` on the `steward list` in `epoch E`.
|
||||
Only becomes active if the `epoch steward` is malicious or fails,
|
||||
in which case it completes the commitment phase.
|
||||
If unused in `epoch E`, it automatically becomes the `epoch steward` in `epoch E+1`.
|
||||
|
||||
`Steward list`: It is an ordered list that contains the `member id`s of authorized stewards.
|
||||
Each steward in the list becomes primarily responsible for creating the commit message when its turn arrives,
|
||||
according to this order for each epoch.
|
||||
For example, suppose there are two stewards in the list, with `steward A` first and `steward B` last.
|
||||
`steward A` is responsible for creating the commit message for the first epoch.
Similarly, `steward B` is responsible for the last epoch.
|
||||
Since the `epoch steward` is the primary committer for an epoch,
|
||||
it holds the main responsibility for producing the commit.
|
||||
However, other stewards MAY also generate a commit within the same epoch to preserve liveness
|
||||
in case the epoch steward is inactive or slow.
|
||||
Duplicate commits are not re-applied and only the single valid commit for the epoch is accepted by the group,
|
||||
as described in the section Filtering proposals against the multiple committing.
|
||||
|
||||
Therefore, if a steward acts maliciously, the `backup steward` is charged with committing.
|
||||
Lastly, the size of the list is named `sn`, which also gives the epoch interval between steward list determinations.
|
||||
|
||||
## Flow
|
||||
|
||||
The general flow is as follows:
|
||||
|
||||
- A steward initializes a group just once, and then sends out Group Announcements (GA) periodically.
|
||||
- Meanwhile, each `node` creates and sends its `credential`, which includes a `keyPackage`.
- Each `member` creates `voting proposals` and sends them to the MLS group during `epoch E`.
|
||||
- Meanwhile, the `steward` collects finalized `voting proposals` from MLS group and converts them into
|
||||
`MLS proposals`, then sends them with the corresponding `commit messages`.
- Eventually, with the commit messages, all members start the next `epoch E+1`.
|
||||
|
||||
## Creating Voting Proposal
|
||||
|
||||
A `member` MAY initialize the voting with the proposal payload
|
||||
which is implemented using [protocol buffers v3](https://protobuf.dev/) as follows:
|
||||
|
||||
```protobuf
syntax = "proto3";

message Proposal {
  string name = 10;                 // Proposal name
  string payload = 11;              // Describes what is being voted on
  int32 proposal_id = 12;           // Unique identifier of the proposal
  bytes proposal_owner = 13;        // Public key of the creator
  repeated Vote votes = 14;         // Vote list in the proposal
  int32 expected_voters_count = 15; // Maximum number of distinct voters
  int32 round = 16;                 // Number of Votes
  int64 timestamp = 17;             // Creation time of proposal
  int64 expiration_time = 18;       // Time interval that the proposal is active
  bool liveness_criteria_yes = 19;  // Shows how silent peers' votes are counted
}
```
|
||||
|
||||
```protobuf
message Vote {
  int32 vote_id = 20;       // Unique identifier of the vote
  bytes vote_owner = 21;    // Voter's public key
  int64 timestamp = 22;     // Time when the vote was cast
  bool vote = 23;           // Vote bool value (true/false)
  bytes parent_hash = 24;   // Hash of previous owner's Vote
  bytes received_hash = 25; // Hash of previous received Vote
  bytes vote_hash = 26;     // Hash of all previously defined fields in Vote
  bytes signature = 27;     // Signature of vote_hash
}
```
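
To make the role of `vote_hash` more concrete, the following Python sketch recomputes the hash over every field that precedes it in `Vote` and uses it to detect tampering. The choice of SHA-256, the length-prefixed byte encoding, and the helper names `compute_vote_hash`, `seal` and `is_intact` are assumptions for illustration only; the messages above do not fix a hash function or a canonical encoding, and the signature over `vote_hash` is omitted here.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Vote:
    vote_id: int
    vote_owner: bytes
    timestamp: int
    vote: bool
    parent_hash: bytes
    received_hash: bytes
    vote_hash: bytes = b""


def _length_prefixed(data: bytes) -> bytes:
    # Prefix with a 4-byte length so adjacent fields cannot be confused.
    return len(data).to_bytes(4, "big") + data


def compute_vote_hash(v: Vote) -> bytes:
    """Hash all fields defined before vote_hash (illustrative encoding)."""
    h = hashlib.sha256()
    h.update(v.vote_id.to_bytes(4, "big", signed=True))
    h.update(_length_prefixed(v.vote_owner))
    h.update(v.timestamp.to_bytes(8, "big", signed=True))
    h.update(b"\x01" if v.vote else b"\x00")
    h.update(_length_prefixed(v.parent_hash))
    h.update(_length_prefixed(v.received_hash))
    return h.digest()


def seal(v: Vote) -> Vote:
    """Fill in vote_hash; a real peer would then sign this value."""
    v.vote_hash = compute_vote_hash(v)
    return v


def is_intact(v: Vote) -> bool:
    """True if no hashed field changed after the vote was sealed."""
    return v.vote_hash == compute_vote_hash(v)
```

Since `parent_hash` and `received_hash` are themselves covered by `vote_hash`, changing an earlier vote changes the hashes that later votes point to, which is what makes history tampering detectable.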
|
||||
|
||||
The voting proposal MAY include adding a `node` or removing a `member`.
|
||||
After the `member` creates the voting proposal,
|
||||
it is emitted to the network via the MLS `Application message` with a lightweight,
|
||||
epoch-based voting such as [hashgraphlike consensus](https://github.com/vacp2p/rfc-index/blob/consensus-hashgraph-like/vac/raw/consensus-hashgraphlike.md).
|
||||
This consensus result MUST be finalized within the epoch as YES or NO.
|
||||
|
||||
If the voting result is YES, the voting proposal will be converted into
an MLS proposal by the `steward`, followed by a commit message that starts the new epoch.
|
||||
|
||||
## Creating welcome message
|
||||
|
||||
When an `MLS proposal message` is created by the `steward`,
|
||||
a `commit message` SHOULD follow,
|
||||
as in section 12.4 of [MLS RFC 9420](https://datatracker.ietf.org/doc/rfc9420/), to the members.
|
||||
In order for the new `member` joining the group to synchronize with the current members
|
||||
who received the `commit message`,
|
||||
the `steward` sends a welcome message to the node as the new `member`,
|
||||
as in section 12.4.3.1 of [MLS RFC 9420](https://datatracker.ietf.org/doc/rfc9420/).
|
||||
|
||||
## Single steward
|
||||
|
||||
The naive way to create decentralized secure group messaging is to have a single transparent `steward`
who only applies the changes according to the result of the voting.
|
||||
|
||||
This is mostly similar to the general flow and is specified in the voting proposal and welcome message creation sections.
|
||||
|
||||
1. Each time, a single `steward` initializes a group with the group parameters
|
||||
as in section 8.1. Group Context in [MLS RFC 9420](https://datatracker.ietf.org/doc/rfc9420/).
|
||||
2. The `steward` creates a group announcement (GA) according to the previous step and
broadcasts it to the whole network periodically. The GA message is visible to all `nodes` in the network.
|
||||
3. Each `node` that wants to be a `member` needs to obtain this announcement and create a `credential`
that includes the `keyPackage` specified in [MLS RFC 9420](https://datatracker.ietf.org/doc/rfc9420/) section 10.
|
||||
4. The `node` sends the `KeyPackages` in plaintext with its signature with the current `steward` public key, which is
announced in the welcome topic. This step is crucial for security, ensuring that malicious nodes/stewards
|
||||
cannot use others' `KeyPackages`.
|
||||
It also provides flexibility for liveness in multi-steward settings,
|
||||
allowing more than one steward to obtain `KeyPackages` to commit.
|
||||
5. The `steward` aggregates all `KeyPackages` and utilizes them to provision group additions for new members,
|
||||
based on the outcome of the voting process.
|
||||
6. Any `member` can start to create `voting proposals` for adding or removing users,
|
||||
and present them to the voting in the MLS group as an application message.
|
||||
|
||||
However, unlimited use of `voting proposals` within the group may be misused by
|
||||
malicious or overly active members.
|
||||
Therefore, an application-level constraint can be introduced to limit the number
|
||||
or frequency of proposals initiated by each member to prevent spam or abuse.
|
||||
7. Meanwhile, the `steward` collects finalized `voting proposals` within epoch `E`,
|
||||
that have received affirmative votes from members via application messages.
|
||||
Otherwise, the `steward` discards proposals that did not receive a majority of "YES" votes.
|
||||
Since voting proposals are transmitted as application messages, omitting them does not affect
|
||||
the protocol’s correctness or consistency.
|
||||
8. The `steward` converts all approved `voting proposals` into
|
||||
corresponding `MLS proposals` and `commit message`, and
|
||||
transmits both in a single operation as in [MLS RFC 9420](https://datatracker.ietf.org/doc/rfc9420/) section 12.4,
|
||||
including welcome messages for the new members.
|
||||
Therefore, the `commit message` ends the previous epoch and creates a new one.
|
||||
9. The `members` apply the incoming `commit message` after checking the signatures and `voting proposals`,
and synchronize with the upcoming epoch.
|
||||
|
||||
## Multi stewards
|
||||
|
||||
Decentralization has already been achieved in the previous section.
|
||||
However, to improve availability and ensure censorship resistance,
|
||||
the single steward protocol is extended to a multi steward architecture.
|
||||
In this design, each epoch is coordinated by a designated steward,
|
||||
operating under the same protocol as the single steward model.
|
||||
Thus, the multi steward approach primarily defines how steward roles
|
||||
rotate across epochs while preserving the underlying structure and logic of the original protocol.
|
||||
Two variants of the multi steward design are introduced to address different system requirements.
|
||||
|
||||
### Consensus Types
|
||||
|
||||
Consensus is agnostic to its payload; therefore, it can be used for various purposes.
Note that each message for the consensus of proposals is an `application message`, as defined in the MLS Objects section.
|
||||
It is used in three ways as follows:
|
||||
|
||||
1. `Commit proposal`: It is the proposal instance that is specified in the Creating Voting Proposal section,
where `Proposal.payload` MUST show the commit request from `members`.
|
||||
Any member MAY create this proposal in any epoch and `epoch steward` MUST collect and commit YES voted proposals.
|
||||
This is the only proposal type common to both single steward and multi steward designs.
|
||||
2. `Steward election proposal`: This is the process that finalizes the `steward list`,
|
||||
which sets and orders the stewards responsible for creating commits over a predefined number of epochs in the range (`sn_min`,`sn_max`).
The validity of the chosen `steward list` ends when the last steward in the list (the one at the final index) completes its commit.
|
||||
At that point, a new `steward election proposal` MUST be initiated again by any member during the corresponding epoch.
|
||||
The `Proposal.payload` field MUST represent the ordered identities of the proposed stewards.
|
||||
Each steward election proposal MUST be verified and finalized through the consensus process
|
||||
so that members can identify which steward will be responsible in each epoch
|
||||
and detect any unauthorized steward commits.
|
||||
3. `Emergency criteria proposal`: If there is a malicious member or steward,
|
||||
this event MUST be voted on to finalize it.
|
||||
If this returns YES, the next epoch MUST include the removal of the member or steward.
|
||||
In a specific case where a steward is removed from the group, causing the total number of stewards to fall below `sn_min`,
|
||||
it is required to repeat the `steward election proposal`.
|
||||
`Proposal.payload` MUST consist of the evidence of the dishonesty as described in the Steward violation list,
|
||||
and the identifier of the malicious member or steward.
|
||||
This proposal can be created by any member in any epoch.
|
||||
|
||||
The order of consensus proposal messages is important to achieving a consistent result.
|
||||
Therefore, messages MUST be prioritized by type in the following order, from highest to lowest priority:
|
||||
|
||||
- `Emergency criteria proposal`
|
||||
|
||||
- `Steward election proposal`
|
||||
|
||||
- `Commit proposal`
|
||||
|
||||
This means that if a higher-priority consensus proposal is present in the network,
|
||||
lower-priority messages MUST be withheld from transmission until the higher-priority proposals have been finalized.
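
The withholding rule above can be sketched as a small filter over the proposals that are still waiting for finalization. The type labels, the `PRIORITY` table and the function name `sendable` are hypothetical names chosen for this example; only the ordering itself comes from the text.

```python
# Lower number means higher priority, following the order listed above.
PRIORITY = {
    "emergency_criteria": 0,
    "steward_election": 1,
    "commit": 2,
}


def sendable(pending: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the pending proposals that may be transmitted right now.

    `pending` holds (proposal_type, proposal_name) pairs that are not yet
    finalized. Only proposals of the highest-priority type present are
    returned; everything else is withheld.
    """
    if not pending:
        return []
    top = min(PRIORITY[kind] for kind, _ in pending)
    return [(kind, name) for kind, name in pending if PRIORITY[kind] == top]
```

Once the higher-priority proposals are finalized and removed from `pending`, calling `sendable` again releases the next priority level.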
|
||||
|
||||
### Steward list creation
|
||||
|
||||
The `steward list` consists of steward nominees who will become actual stewards if the `steward election proposal` is finalized with YES.
The list is arbitrarily chosen from the `members` and OPTIONALLY adjusted depending on the needs of the implementation.
|
||||
The `steward list` size, defined by the minimum `sn_min` and maximum `sn_max` bounds,
|
||||
is determined at the time of group creation.
|
||||
The `sn_min` requirement is applied only when the total number of members exceeds `sn_min`;
|
||||
if the number of available members falls below this threshold,
|
||||
the list size automatically adjusts to include all existing members.
|
||||
|
||||
The actual size of the list MAY vary within this range as `sn`, with the minimum value being at least 1.
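
The size rule can be written as a short check. This is a minimal sketch that assumes the bounds `sn_min` and `sn_max` are inclusive; the function names are hypothetical.

```python
def required_min_size(sn_min: int, member_count: int) -> int:
    """sn_min applies only while there are enough members to satisfy it."""
    return min(sn_min, member_count)


def is_valid_list_size(sn: int, sn_min: int, sn_max: int,
                       member_count: int) -> bool:
    """Check a proposed steward list size `sn` against the group limits."""
    # A list needs at least one steward and cannot exceed the member count.
    if sn < 1 or sn > member_count:
        return False
    # Assumption: sn_min and sn_max are inclusive bounds.
    return required_min_size(sn_min, member_count) <= sn <= sn_max
```

When the group has fewer members than `sn_min`, the two conditions together force `sn` to equal the member count, which matches the automatic adjustment described above.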
|
||||
|
||||
The index of each slot indicates the epoch, and the value at that index is a `member id`.
|
||||
The next-in-line steward for `epoch E` is named the `epoch steward` and has index E.
The subsequent steward in `epoch E` is named the `backup steward`.
|
||||
For example, assume the steward list is (S3, S2, S1). If in the previous epoch the roles were
|
||||
(`backup steward`: S2, `epoch steward`: S1), then in the next epoch they become
|
||||
(`backup steward`: S3, `epoch steward`: S2) by shifting.
|
||||
|
||||
If the `epoch steward` is honest, the `backup steward` is not involved in the process in `epoch E`,
|
||||
and the `backup steward` will be the `epoch steward` within the `epoch E+1`.
|
||||
|
||||
If the `epoch steward` is malicious, the `backup steward` is involved in the commitment phase in `epoch E`
|
||||
and the former steward becomes the `backup steward` in `epoch E`.
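
The shifting of roles can be illustrated with a short Python sketch. It assumes the stewards are given in turn order, so the example list (S3, S2, S1) read from right to left becomes `["S1", "S2", "S3"]`. The function names are hypothetical, and returning no backup for the final slot is an assumption, since the text does not say who backs up the last steward of a list.

```python
from typing import Optional


def roles_for_slot(turn_order: list[str],
                   slot: int) -> tuple[str, Optional[str]]:
    """Return (epoch steward, backup steward) for one slot of the list."""
    epoch_steward = turn_order[slot]
    has_next = slot + 1 < len(turn_order)
    # Assumption: the last slot of the list has no backup steward.
    backup_steward = turn_order[slot + 1] if has_next else None
    return epoch_steward, backup_steward


def committer(turn_order: list[str], slot: int,
              epoch_steward_failed: bool) -> Optional[str]:
    """The backup steward commits only if the epoch steward failed."""
    epoch_steward, backup_steward = roles_for_slot(turn_order, slot)
    return backup_steward if epoch_steward_failed else epoch_steward
```

With `["S1", "S2", "S3"]`, slot 0 yields S1 as epoch steward with S2 as backup, and slot 1 yields S2 with S3, which reproduces the shift in the example.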
|
||||
|
||||
Liveness criteria:
|
||||
|
||||
Once the active `steward list` has completed its assigned epochs,
|
||||
|
||||
members MUST proceed to elect the next set of stewards
|
||||
(which MAY include some or all of the previous members).
|
||||
This election is conducted through a type 2 consensus procedure, `steward election proposal`.
|
||||
|
||||
A `Steward election proposal` is considered valid only if the resulting `steward list`
|
||||
is produced through a deterministic process that ensures an unbiased distribution of steward assignments,
|
||||
since allowing bias could enable a malicious participant to manipulate the list
|
||||
and retain control within a favored group for multiple epochs.
|
||||
|
||||
The list MUST consist of at least `sn_min` members, including retained previous stewards,
|
||||
sorted according to the ascending value of `SHA256(epoch E || member id || group id)`,
|
||||
where `epoch E` is the epoch in which the election proposal is initiated,
|
||||
and `group id` is included so that the list is shuffled differently across groups.
|
||||
Any proposal with a list that does not adhere to this generation method MUST be rejected by all members.
|
||||
|
||||
We assume that there are no recurring entries in `SHA256(epoch E || member id || group id)`, since SHA256 outputs are, for practical purposes, unique
when there is no repetition in the `member id` values; this avoids ties when sorting.
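
The ordering rule can be sketched in Python as follows. The byte encoding of `epoch E` (8-byte big-endian here) and the plain concatenation of the three values are assumptions, because the text only gives the formula `SHA256(epoch E || member id || group id)`; the function names are hypothetical.

```python
import hashlib


def sort_key(epoch: int, member_id: bytes, group_id: bytes) -> bytes:
    # Assumption: the epoch is encoded as 8 bytes, big-endian.
    data = epoch.to_bytes(8, "big") + member_id + group_id
    return hashlib.sha256(data).digest()


def expected_steward_list(epoch: int, candidates: list[bytes],
                          group_id: bytes) -> list[bytes]:
    """Order candidate member ids by ascending SHA256 value."""
    return sorted(candidates, key=lambda m: sort_key(epoch, m, group_id))


def has_valid_order(proposed: list[bytes], epoch: int,
                    group_id: bytes) -> bool:
    """Reject lists with repeated ids or with a non-conforming order."""
    if len(set(proposed)) != len(proposed):
        return False
    return proposed == expected_steward_list(epoch, proposed, group_id)
```

Every member can recompute the order locally from the proposal alone, so a list that was reordered by its proposer fails the check without any extra communication.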
|
||||
|
||||
### Multi steward with big consensuses
|
||||
|
||||
In this model, all group modifications, such as adding or removing members,
|
||||
must be approved through consensus by all participants,
|
||||
including the steward assigned for `epoch E`.
|
||||
A configuration with multiple stewards operating under a shared consensus protocol offers
|
||||
increased decentralization and stronger protection against censorship.
|
||||
However, this benefit comes with reduced operational efficiency.
|
||||
The model is therefore best suited for small groups that value
|
||||
decentralization and censorship resistance more than performance.
|
||||
|
||||
To create a multi steward with a big consensus,
|
||||
the group is initialized with a single steward as specified as follows:
|
||||
|
||||
1. The steward initializes the group with the config file.
|
||||
This config file MUST contain (`sn_min`,`sn_max`) as the `steward list` size range.
|
||||
2. The steward adds the members in a centralized way until the number of members reaches `sn_min`.
|
||||
Then, members propose lists of size `sn` through a voting proposal,
decided by consensus among all members as described in item 2 of the Consensus Types section, subject to the check that
the size of the proposed list `sn` is in the interval (`sn_min`,`sn_max`).
|
||||
Note that if the total number of members is below `sn_min`,
|
||||
then the steward list size MUST be equal to the total member count.
|
||||
3. After the voting proposal results in a `steward list`,
group changes are ready to be committed as specified in the Single steward section,
with one difference: members also check that the committing steward is the `epoch steward` or the `backup steward`;
otherwise, anyone can create an `emergency criteria proposal`.
|
||||
4. If the `epoch steward` violates the changing process as mentioned in the section Steward violation list,
|
||||
one of the members MUST initialize the `emergency criteria proposal` to remove the malicious steward.
|
||||
Then `backup steward` fulfills the epoch by committing again correctly.
|
||||
|
||||
A large consensus group provides better decentralization, but it requires significant coordination,
|
||||
which may not be suitable for groups with more than 1000 members.
|
||||
|
||||
### Multi steward with small consensuses
|
||||
|
||||
The small consensus model offers improved efficiency with a trade-off in decentralization.
|
||||
In this design, group changes require consensus only among the stewards, rather than all members.
|
||||
Regular members participate by periodically selecting the stewards by `steward election proposal`
|
||||
but do not take part in commit decision by `commit proposal`.
|
||||
This structure enables faster coordination since consensus is achieved within a smaller group of stewards.
|
||||
It is particularly suitable for large user groups, where involving every member in each decision would be impractical.
|
||||
|
||||
The flow is similar to the big consensus model, including the `steward list` finalization by consensus among all members;
the only difference is that commit messages require a `commit proposal` only among the stewards.
|
||||
|
||||
## Filtering proposals against the multiple committing
|
||||
|
||||
Since stewards are allowed to produce a commit even when they are not the designated `epoch steward`,
|
||||
multiple commits may appear within the same epoch, often reflecting recurring versions of the same proposal.
|
||||
To ensure a consistent outcome, the valid commit for the epoch SHOULD be selected as the one derived
|
||||
from the longest proposal chain, ordered by the ascending value of each proposal as `SHA256(proposal)`.
|
||||
All other cases, such as invalid commits or commits based on proposals that were not approved through voting,
|
||||
can be easily detected and discarded by the members.
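
One way to read the selection rule is sketched below in Python. Proposals are treated as opaque serialized bytes, a commit is skipped if it covers any proposal that was not approved by voting, and the commit covering the most distinct proposals wins. The function names and the tie-breaker (first commit id in sorted order) are assumptions; the text does not specify how commits with equally long chains are ranked.

```python
import hashlib
from typing import Optional


def canonical_chain(proposals: list[bytes]) -> list[bytes]:
    """Drop duplicates and order proposals by ascending SHA256(proposal)."""
    return sorted(set(proposals), key=lambda p: hashlib.sha256(p).digest())


def select_commit(commits: dict[str, list[bytes]],
                  approved: set[bytes]) -> Optional[str]:
    """Pick the commit id derived from the longest approved proposal chain.

    `commits` maps a commit id to the serialized proposals it covers.
    """
    best_id, best_chain = None, []
    for commit_id in sorted(commits):
        proposals = commits[commit_id]
        # Discard commits that include proposals not approved by voting.
        if any(p not in approved for p in proposals):
            continue
        chain = canonical_chain(proposals)
        # Assumption: on equal length, the earlier commit id is kept.
        if len(chain) > len(best_chain):
            best_id, best_chain = commit_id, chain
    return best_id
```

Because the chain is ordered by hash rather than by arrival time, two stewards that saw the same approved proposals in a different order still derive the same chain.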
|
||||
|
||||
## Steward violation list
|
||||
|
||||
A steward’s activity is called a violation if the action is one or more of the following:
|
||||
|
||||
1. Broken commit: The steward releases a different commit message from the voted `commit proposal`.
|
||||
This activity is identified by the `members` since the [MLS RFC 9420](https://datatracker.ietf.org/doc/rfc9420/) provides the methods
|
||||
that members can use to identify the broken commit messages that are possible in a few situations,
|
||||
such as commit and proposal incompatibility. Specifically, the broken commit can arise as follows:
|
||||
   1. The commit belongs to an earlier epoch.
   2. The commit message does not match the latest epoch.
   3. The commit is not compatible with the previous epoch’s `MLS proposal`.
|
||||
2. Broken MLS proposal: The steward prepares a different `MLS proposal` for the corresponding `voting proposal`.
|
||||
This activity is identified by the `members` since both `MLS proposal` and `voting proposal` are visible
|
||||
and can be identified by checking that the hashes of `Proposal.payload` and `MLSProposal.payload` are the same, as in RFC 9420 section 12.1, Proposals.
|
||||
3. Censorship and inactivity: The situation where there is a voting proposal that is visible for every member,
|
||||
and the Steward does not provide an MLS proposal and commit.
|
||||
This activity is again identified by the `members` since `voting proposals` are visible to every member in the group,
|
||||
therefore each member can verify that there is no `MLS proposal` corresponding to `voting proposal`.
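
The payload comparison behind violations 2 and 3 can be sketched as a set difference over payload hashes. SHA-256 and the names `payload_hash` and `check_steward` are assumptions for this example, and the payloads are treated as opaque bytes.

```python
import hashlib


def payload_hash(payload: bytes) -> bytes:
    # Assumption: SHA-256 is used to compare payloads.
    return hashlib.sha256(payload).digest()


def check_steward(approved_payloads: list[bytes],
                  mls_payloads: list[bytes]) -> dict[str, set[bytes]]:
    """Compare approved voting proposals with the steward's MLS proposals.

    Returns the payload hashes that indicate a violation:
    - "censored": approved by voting, but no matching MLS proposal exists
    - "broken": an MLS proposal exists, but it matches no approved proposal
    """
    approved = {payload_hash(p) for p in approved_payloads}
    produced = {payload_hash(p) for p in mls_payloads}
    return {
        "censored": approved - produced,
        "broken": produced - approved,
    }
```

An honest steward yields two empty sets; any non-empty set gives a member concrete evidence to attach to an `emergency criteria proposal`.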
|
||||
|
||||
## Security Considerations
|
||||
|
||||
In this section, the security considerations are presented as de-MLS assurances.
|
||||
|
||||
1. Malicious steward: A steward can act maliciously,
|
||||
as in the Steward violation list section.
|
||||
Therefore, de-MLS enforces that any steward only follows the protocol under the consensus order
|
||||
and commits without emergency criteria application.
|
||||
2. Malicious Member: A member is only marked as malicious
|
||||
when the member acts by releasing a commit message.
|
||||
3. Steward list election bias: Although SHA256 is used together with two global variables
|
||||
to shuffle stewards in a deterministic and verifiable manner,
|
||||
this approach only minimizes election bias; it does not completely eliminate it.
|
||||
This design choice is intentional, in order to preserve the efficiency advantages provided by the MLS mechanism.
|
||||
|
||||
## Copyright
|
||||
|
||||
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/)
|
||||
|
||||
## References
|
||||
|
||||
- [MLS RFC 9420](https://datatracker.ietf.org/doc/rfc9420/)
|
||||
- [Hashgraphlike Consensus](https://github.com/vacp2p/rfc-index/blob/consensus-hashgraph-like/vac/raw/consensus-hashgraphlike.md)
|
||||
- [vacp2p/de-mls](https://github.com/vacp2p/de-mls)
|
||||
1268
vac/raw/logos-capability-discovery.md
Normal file
File diff suppressed because it is too large
1730
vac/raw/mix.md
File diff suppressed because it is too large
233
vac/raw/sds.md
@@ -55,6 +55,12 @@ but improves scalability by reducing direct interactions between participants.
|
||||
Each message has a globally unique, immutable ID (or hash).
|
||||
Messages can be requested from the high-availability caches or
|
||||
other participants using the corresponding message ID.
|
||||
* **Participant ID:**
|
||||
Each participant has a globally unique, immutable ID
|
||||
visible to other participants in the communication.
|
||||
* **Sender ID:**
|
||||
The **Participant ID** of the original sender of a message,
|
||||
often coupled with a **Message ID**.
|
||||
|
||||
## Wire protocol
|
||||
|
||||
@@ -72,20 +78,26 @@ syntax = "proto3";
|
||||
message HistoryEntry {
|
||||
string message_id = 1; // Unique identifier of the SDS message, as defined in `Message`
|
||||
optional bytes retrieval_hint = 2; // Optional information to help remote parties retrieve this SDS message; For example, A Waku deterministic message hash or routing payload hash
|
||||
|
||||
optional string sender_id = 3; // Participant ID of original message sender. Only populated if using optional SDS Repair extension
|
||||
}
|
||||
|
||||
message Message {
|
||||
// 1 Reserved for sender/participant id
|
||||
string sender_id = 1; // Participant ID of the message sender
|
||||
string message_id = 2; // Unique identifier of the message
|
||||
string channel_id = 3; // Identifier of the channel to which the message belongs
|
||||
optional int32 lamport_timestamp = 10; // Logical timestamp for causal ordering in channel
|
||||
optional uint64 lamport_timestamp = 10; // Logical timestamp for causal ordering in channel
|
||||
repeated HistoryEntry causal_history = 11; // List of preceding message IDs that this message causally depends on. Generally 2 or 3 message IDs are included.
|
||||
optional bytes bloom_filter = 12; // Bloom filter representing received message IDs in channel
|
||||
|
||||
repeated HistoryEntry repair_request = 13; // Capped list of history entries missing from sender's causal history. Only populated if using the optional SDS Repair extension.
|
||||
|
||||
optional bytes content = 20; // Actual content of the message
|
||||
}
|
||||
```
|
||||
|
||||
Each message MUST include its globally unique identifier in the `message_id` field,
|
||||
The sending participant MUST include its own globally unique identifier in the `sender_id` field.
|
||||
In addition, it MUST include a globally unique identifier for the message in the `message_id` field,
|
||||
likely based on a message hash.
|
||||
The `channel_id` field MUST be set to the identifier of the channel of group communication
|
||||
that is being synchronized.
|
||||
@@ -98,12 +110,22 @@ These fields MAY be left unset in the case of [ephemeral messages](#ephemeral-me
|
||||
The message `content` MAY be left empty for [periodic sync messages](#periodic-sync-message),
|
||||
otherwise it MUST contain the application-level content
|
||||
|
||||
> **_Note:_** Close readers may notice that,
|
||||
outside of filtering messages originating from the sender itself,
|
||||
the `sender_id` field is not used for much.
|
||||
Its importance is expected to increase once a p2p retrieval mechanism is added to SDS,
|
||||
as is planned for the protocol.
|
||||
|
||||
### Participant state
|
||||
|
||||
Each participant MUST maintain:
|
||||
|
||||
* A Lamport timestamp for each channel of communication,
|
||||
initialized to current epoch time in nanosecond resolution.
|
||||
initialized to current epoch time in millisecond resolution.
|
||||
The Lamport timestamp is increased as described in the [protocol steps](#protocol-steps)
|
||||
to maintain a logical ordering of events while staying close to the current epoch time.
|
||||
This allows the messages from new joiners to be correctly ordered with other recent messages,
|
||||
without these new participants first having to synchronize past messages to discover the current Lamport timestamp.
|
||||
* A bloom filter for received message IDs per channel.
|
||||
The bloom filter SHOULD be rolled over and
|
||||
recomputed once it reaches a predefined capacity of message IDs.
|
||||
@@ -136,8 +158,11 @@ the `lamport_timestamp`, `causal_history` and `bloom_filter` fields.
|
||||
|
||||
Before broadcasting a message:
|
||||
|
||||
* the participant MUST increase its local Lamport timestamp by `1` and
|
||||
include this in the `lamport_timestamp` field.
|
||||
* the participant MUST set its local Lamport timestamp
|
||||
to the maximum between the current value + `1`
|
||||
and the current epoch time in milliseconds.
|
||||
In other words the local Lamport timestamp is set to `max(timeNowInMs, current_lamport_timestamp + 1)`.
|
||||
* the participant MUST include the increased Lamport timestamp in the message's `lamport_timestamp` field.
|
||||
* the participant MUST determine the preceding few message IDs in the local history
|
||||
and include these in an ordered list in the `causal_history` field.
|
||||
The number of message IDs to include in the `causal_history` depends on the application.
|
||||
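The Lamport timestamp rule for outgoing messages, `max(timeNowInMs, current_lamport_timestamp + 1)`, can be sketched as follows. This is a minimal illustration, not a reference implementation; the function name and the optional `time_now_ms` parameter are hypothetical and exist only to make the example testable.

```python
import time


def next_lamport_timestamp(current_lamport_timestamp, time_now_ms=None):
    """Return the Lamport timestamp to attach to the next outgoing message.

    Hypothetical helper: the name and signature are not part of the spec.
    """
    if time_now_ms is None:
        # Current epoch time in milliseconds.
        time_now_ms = time.time_ns() // 1_000_000
    # Always move forward by at least 1, but jump to wall-clock time
    # when the local logical clock has fallen behind it.
    return max(time_now_ms, current_lamport_timestamp + 1)
```

When the wall clock is ahead of the local value, the timestamp jumps to the wall clock; otherwise it advances by exactly `1`, which keeps the ordering strictly increasing even if the local clock stalls or goes backwards.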
@@ -157,6 +182,8 @@ of unacknowledged outgoing messages.
|
||||
|
||||
Upon receiving a message,
|
||||
|
||||
* the participant SHOULD ignore the message if it has a `sender_id` matching its own.
|
||||
* the participant MAY deduplicate the message by comparing its `message_id` to previously received message IDs.
|
||||
* the participant MUST [review the ACK status](#review-ack-status) of messages
|
||||
in its unacknowledged outgoing buffer
|
||||
using the received message's causal history and bloom filter.
|
||||
@@ -240,7 +267,8 @@ participants SHOULD periodically send sync messages to maintain state.
|
||||
These sync messages:
|
||||
|
||||
* MUST be sent with empty content
|
||||
* MUST include an incremented Lamport timestamp
|
||||
* MUST include a Lamport timestamp increased to `max(timeNowInMs, current_lamport_timestamp + 1)`,
|
||||
where `timeNowInMs` is the current epoch time in milliseconds.
|
||||
* MUST include causal history and bloom filter according to regular message rules
|
||||
* MUST NOT be added to the unacknowledged outgoing buffer
|
||||
* MUST NOT be included in causal histories of subsequent messages
|
||||
@@ -271,6 +299,197 @@ Upon reception,
|
||||
ephemeral messages SHOULD be delivered immediately without buffering for causal dependencies
|
||||
or including in the local log.
|
||||
|
||||
### SDS Repair (SDS-R)
|
||||
|
||||
SDS Repair (SDS-R) is an optional extension module for SDS,
|
||||
allowing participants in a communication to collectively repair any gaps in causal history (missing messages)
|
||||
preferably over a limited time window.
|
||||
Since SDS-R acts as coordinated rebroadcasting of missing messages,
|
||||
which involves all participants of the communication,
|
||||
it is most appropriate in a limited use case for repairing relatively recent missed dependencies.
|
||||
It is not meant to replace mechanisms for long-term consistency,
|
||||
such as peer-to-peer syncing or the use of a high-availability centralised cache (Store node).
|
||||
|
||||
#### SDS-R message fields
|
||||
|
||||
SDS-R adds the following fields to SDS messages:
|
||||
|
||||
* `sender_id` in `HistoryEntry`:
|
||||
the original message sender's participant ID.
|
||||
This is used to determine the group of participants who will respond to a repair request.
|
||||
* `repair_request` in `Message`:
|
||||
a capped list of history entries missing for the message sender
|
||||
and for which it's requesting a repair.
|
||||
|
||||
#### SDS-R participant state
|
||||
|
||||
SDS-R adds the following to each participant state:
|
||||
|
||||
* Outgoing **repair request buffer**:
|
||||
a list of locally missing `HistoryEntry`s
|
||||
each mapped to a future request timestamp, `T_req`,
|
||||
after which this participant will request a repair if at that point the missing dependency has not been repaired yet.
|
||||
`T_req` is computed as a pseudorandom backoff from the timestamp when the dependency was detected missing.
|
||||
[Determining `T_req`](#determine-t_req) is described below.
|
||||
We RECOMMEND that the outgoing repair request buffer be chronologically ordered in ascending order of `T_req`.
|
||||
|
||||
* Incoming **repair request buffer**:
|
||||
a list of locally available `HistoryEntry`s
|
||||
that were requested for repair by a remote participant
|
||||
AND for which this participant might be an eligible responder,
|
||||
each mapped to a future response timestamp, `T_resp`,
|
||||
after which this participant will rebroadcast the corresponding requested `Message` if at that point no other participant had rebroadcast the `Message`.
|
||||
`T_resp` is computed as a pseudorandom backoff from the timestamp when the repair was first requested.
|
||||
[Determining `T_resp`](#determine-t_resp) is described below.
|
||||
We describe below how a participant can [determine if they're an eligible responder](#determine-response-group) for a specific repair request.
|
||||
|
||||
* Augmented local history log:
|
||||
for each message ID kept in the local log for which the participant could be a repair responder,
|
||||
the full SDS `Message` must be cached rather than just the message ID,
|
||||
in case this participant is called upon to rebroadcast the message.
|
||||
We describe below how a participant can [determine if they're an eligible responder](#determine-response-group) for a specific message.
|
||||
|
||||
**_Note:_** The required state can likely be significantly reduced in future by simply requiring that a responding participant should _reconstruct_ the original `Message` when rebroadcasting, rather than the simpler, but heavier,
|
||||
requirement of caching the entire received `Message` content in local history.
|
||||
|
||||
#### SDS-R global state
|
||||
|
||||
For a specific channel (that is, within a specific SDS-controlled communication)
|
||||
the following SDS-R configuration state SHOULD be common for all participants in the conversation:
|
||||
|
||||
* `T_min`: the _minimum_ time period to wait before a missing causal entry can be repaired.
|
||||
We RECOMMEND a value of at least 30 seconds.
|
||||
* `T_max`: the _maximum_ time period over which missing causal entries can be repaired.
|
||||
We RECOMMEND a value of between 120 and 600 seconds.
|
||||
|
||||
Furthermore, to avoid a broadcast storm with multiple participants responding to a repair request,
|
||||
participants in a single channel MAY be divided into discrete response groups.
|
||||
Participants will only respond to a repair request if they are in the response group for that request.
|
||||
The global `num_response_groups` variable configures the number of response groups for this communication.
|
||||
Its use is described below.
|
||||
A reasonable default value for `num_response_groups` is one response group for every `128` participants.
|
||||
In other words, if the (roughly) expected number of participants is expressed as `num_participants`, then
|
||||
`num_response_groups = num_participants div 128 + 1`.
|
||||
Consequently, if there are fewer than 128 participants in a communication,
|
||||
they will all belong to the same response group.
|
||||
|
||||
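As a small worked example of the default above, the following sketch evaluates `num_participants div 128 + 1` for a few group sizes. The function name is hypothetical; `div` is taken to mean integer (floor) division.

```python
def default_num_response_groups(num_participants):
    """Default from the text: one response group per 128 participants.

    Hypothetical helper name; `div` is read as integer division.
    """
    return num_participants // 128 + 1


# A few expected group sizes and the resulting number of response groups.
examples = {n: default_num_response_groups(n) for n in (10, 127, 128, 1000)}
```

With this reading, 127 participants still share a single response group, while 128 participants are split across two.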
We RECOMMEND that the global state variables `T_min`, `T_max` and `num_response_groups`
|
||||
be set _statically_ for a specific SDS-R application,
|
||||
based on expected number of group participants and volume of traffic.
|
||||
|
||||
**_Note:_** Future versions of this protocol will recommend dynamic global SDS-R variables,
|
||||
based on the current number of participants.
|
||||
|
||||
#### SDS-R send message
|
||||
|
||||
SDS-R adds the following steps when sending a message:
|
||||
|
||||
Before broadcasting a message,
|
||||
|
||||
* the participant SHOULD populate the `repair_request` field in the message
|
||||
with _eligible_ entries from the outgoing repair request buffer.
|
||||
An entry is eligible to be included in a `repair_request`
|
||||
if its corresponding request timestamp, `T_req`, has expired (in other words,
|
||||
`T_req <= current_time`).
|
||||
The maximum number of repair request entries to include is up to the application.
|
||||
We RECOMMEND that this quota be filled by the eligible entries from the outgoing repair request buffer with the lowest `T_req`.
|
||||
We RECOMMEND a maximum of 3 entries.
|
||||
If there are no eligible entries in the buffer,
|
||||
this optional field MUST be left unset.
|
||||
|
||||
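The selection of entries for the `repair_request` field described above might look like the sketch below. It assumes the outgoing repair request buffer is modelled as a dictionary from message ID to `T_req`; the function name and this data layout are illustrative assumptions, not part of the specification.

```python
def select_repair_requests(outgoing_buffer, current_time, max_entries=3):
    """Pick the message IDs to place in `repair_request`.

    outgoing_buffer: dict mapping message_id -> T_req (assumed layout).
    Returns an empty list when nothing is eligible, in which case the
    optional `repair_request` field is left unset.
    """
    # An entry is eligible once its request timestamp has expired.
    eligible = [
        (t_req, message_id)
        for message_id, t_req in outgoing_buffer.items()
        if t_req <= current_time
    ]
    # Fill the quota with the entries that have the lowest T_req.
    eligible.sort()
    return [message_id for _, message_id in eligible[:max_entries]]
```

The default of `3` mirrors the recommended maximum; an application with a different quota would pass its own `max_entries`.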
#### SDS-R receive message
|
||||
|
||||
On receiving a message,
|
||||
|
||||
* the participant MUST remove entries matching the received message ID from its _outgoing_ repair request buffer.
|
||||
This ensures that the participant does not request repairs for dependencies that have now been met.
|
||||
* the participant MUST remove entries matching the received message ID from its _incoming_ repair request buffer.
|
||||
This ensures that the participant does not respond to repair requests that another participant has already responded to.
|
||||
* the participant SHOULD add any unmet causal dependencies to its outgoing repair request buffer against a unique `T_req` timestamp for that entry.
|
||||
It MUST compute the `T_req` for each such HistoryEntry according to the steps outlined in [_Determine T_req_](#determine-t_req).
|
||||
* for each item in the `repair_request` field:
|
||||
* the participant MUST remove entries matching the repair message ID from its own outgoing repair request buffer.
|
||||
This limits the number of participants that will request a common missing dependency.
|
||||
* if the participant has the requested `Message` in its local history _and_ is an eligible responder for the repair request,
|
||||
it SHOULD add the request to its incoming repair request buffer against a unique `T_resp` timestamp for that entry.
|
||||
It MUST compute the `T_resp` for each such repair request according to the steps outlined in [_Determine T_resp_](#determine-t_resp).
|
||||
It MUST determine if it's an eligible responder for a repair request according to the steps outlined in [_Determine response group_](#determine-response-group).
|
||||
|
||||
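The receive-side steps above can be sketched roughly as follows. This is a simplified model under several assumptions: messages and history entries are plain dictionaries, both buffers are dictionaries keyed by message ID, and the `T_req`, `T_resp` and eligibility computations are passed in as callables. All names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RepairState:
    # message_id -> cached full Message (augmented local history log)
    local_history: dict = field(default_factory=dict)
    # message_id -> T_req (outgoing repair request buffer)
    outgoing: dict = field(default_factory=dict)
    # message_id -> T_resp (incoming repair request buffer)
    incoming: dict = field(default_factory=dict)


def on_receive(state, message, t_req_for, t_resp_for, is_eligible):
    """Apply the SDS-R buffer updates for one received message."""
    message_id = message["message_id"]
    # The dependency is now met, and someone has already (re)broadcast it.
    state.outgoing.pop(message_id, None)
    state.incoming.pop(message_id, None)

    # Schedule repair requests for unmet causal dependencies.
    for entry in message.get("causal_history", []):
        dep = entry["message_id"]
        if dep not in state.local_history and dep not in state.outgoing:
            state.outgoing[dep] = t_req_for(entry)

    for entry in message.get("repair_request", []):
        requested = entry["message_id"]
        # Another participant already asked, so do not ask again.
        state.outgoing.pop(requested, None)
        # Schedule a response only if we hold the message and are eligible.
        if (requested in state.local_history
                and is_eligible(entry)
                and requested not in state.incoming):
            state.incoming[requested] = t_resp_for(entry)
```

Delivery of the message itself and updates to the local history are left to the base SDS receive steps and are deliberately not modelled here.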
#### Determine T_req
|
||||
|
||||
A participant determines the repair request timestamp, `T_req`,
|
||||
for a missing `HistoryEntry` as follows:
|
||||
|
||||
```text
|
||||
T_req = current_time + hash(participant_id, message_id) % (T_max - T_min) + T_min
|
||||
```
|
||||
|
||||
where `current_time` is the current timestamp,
|
||||
`participant_id` is the participant's _own_ participant ID
|
||||
(not the `sender_id` in the missing `HistoryEntry`),
|
||||
`message_id` is the missing `HistoryEntry`'s message ID,
|
||||
and `T_min` and `T_max` are as set out in [SDS-R global state](#sds-r-global-state).
|
||||
|
||||
This allows `T_req` to be pseudorandomly and linearly distributed as a backoff of between `T_min` and `T_max` from current time.
|
||||
|
||||
> **_Note:_** placing `T_req` values on an exponential backoff curve will likely be more appropriate and is left for a future improvement.
|
||||
|
||||
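A minimal sketch of the `T_req` computation is shown below. The specification does not name a hash function, so SHA-256 over length-prefixed UTF-8 inputs is used purely as an assumption; `hash_to_int` and `compute_t_req` are hypothetical names, and all times are assumed to share one unit (seconds here).

```python
import hashlib


def hash_to_int(*parts):
    """Hash one or more strings to a non-negative integer (assumed SHA-256)."""
    h = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") differs from ("a", "bc").
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return int.from_bytes(h.digest(), "big")


def compute_t_req(current_time, participant_id, message_id, t_min, t_max):
    """T_req = current_time + hash(participant_id, message_id) % (T_max - T_min) + T_min."""
    backoff = hash_to_int(participant_id, message_id) % (t_max - t_min) + t_min
    return current_time + backoff
```

Because the hash input includes the participant's own ID, different participants missing the same message obtain different backoffs, each falling in the half-open interval `[T_min, T_max)` after `current_time`.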
#### Determine T_resp
|
||||
|
||||
A participant determines the repair response timestamp, `T_resp`,
|
||||
for a `HistoryEntry` that it could repair as follows:
|
||||
|
||||
```text
|
||||
distance = hash(participant_id) XOR hash(sender_id)
|
||||
T_resp = current_time + distance*hash(message_id) % T_max
|
||||
```
|
||||
|
||||
where `current_time` is the current timestamp,
|
||||
`participant_id` is the participant's _own_ (local) participant ID,
|
||||
`sender_id` is the requested `HistoryEntry` sender ID,
|
||||
`message_id` is the requested `HistoryEntry` message ID,
|
||||
and `T_max` is as set out in [SDS-R global state](#sds-r-global-state).
|
||||
|
||||
We first calculate the logical `distance` between the local `participant_id` and
|
||||
the original `sender_id`.
|
||||
If this participant is the original sender, the `distance` will be `0`.
|
||||
It should then be clear that the original sender will have a response backoff time of `0`,
|
||||
making it the most likely responder.
|
||||
The `T_resp` values for other eligible participants will be pseudorandomly and
|
||||
linearly distributed as a backoff of up to `T_max` from current time.
|
||||
|
||||
> **_Note:_** placing `T_resp` values on an exponential backoff curve will likely be more appropriate and
|
||||
is left for a future improvement.
|
||||
|
||||
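The `T_resp` computation can be sketched in the same spirit. As before, SHA-256 is an assumed stand-in for the unspecified hash function, the helper names are hypothetical, and the expression is read as `(distance * hash(message_id)) % T_max`.

```python
import hashlib


def hash_to_int(*parts):
    """Hash one or more strings to a non-negative integer (assumed SHA-256)."""
    h = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return int.from_bytes(h.digest(), "big")


def compute_t_resp(current_time, participant_id, sender_id, message_id, t_max):
    """T_resp = current_time + distance * hash(message_id) % T_max."""
    # XOR distance between the local participant and the original sender;
    # it is 0 when the local participant is the original sender.
    distance = hash_to_int(participant_id) ^ hash_to_int(sender_id)
    backoff = (distance * hash_to_int(message_id)) % t_max
    return current_time + backoff
```

For the original sender the distance is `0`, so the backoff is `0` and the response becomes due immediately; every other eligible participant gets a backoff somewhere in `[0, T_max)`.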
#### Determine response group
|
||||
|
||||
Given a message with `sender_id` and `message_id`,
|
||||
a participant with `participant_id` is in the response group for that message if
|
||||
|
||||
```text
|
||||
hash(participant_id, message_id) % num_response_groups == hash(sender_id, message_id) % num_response_groups
|
||||
```
|
||||
|
||||
where `num_response_groups` is as set out in [SDS-R global state](#sds-r-global-state).
|
||||
This ensures that a participant will always be in the response group for its own published messages.
|
||||
It also allows participants to determine immediately on first reception of a message or
|
||||
a history entry if they are in the associated response group.
|
||||
|
||||
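The response group check can be illustrated with the sketch below. SHA-256 is again only an assumed hash function, and both function names are hypothetical.

```python
import hashlib


def hash_to_int(*parts):
    """Hash one or more strings to a non-negative integer (assumed SHA-256)."""
    h = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return int.from_bytes(h.digest(), "big")


def in_response_group(participant_id, sender_id, message_id, num_response_groups):
    """True if the participant belongs to the response group for this message."""
    own_group = hash_to_int(participant_id, message_id) % num_response_groups
    sender_group = hash_to_int(sender_id, message_id) % num_response_groups
    return own_group == sender_group
```

When `participant_id` equals `sender_id` both sides of the comparison are identical, which is why a participant is always in the response group for its own messages. With `num_response_groups = 1` every participant is eligible.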
#### SDS-R incoming repair request buffer sweep
|
||||
|
||||
An SDS-R participant MUST periodically check whether any requests in the **incoming repair request buffer** are due for a response.
|
||||
For each item in the buffer,
|
||||
the participant SHOULD broadcast the corresponding `Message` from local history
|
||||
if its corresponding response timestamp, `T_resp`, has expired
|
||||
(in other words, `T_resp <= current_time`).
|
||||
|
||||
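A periodic sweep of the incoming repair request buffer might be sketched as follows. The buffer and local history are modelled as dictionaries, and `broadcast` is a caller-supplied callable; these, along with the function name, are assumptions. Removing an entry once it has been answered is also an assumption of this sketch, since the text above does not state it explicitly.

```python
def sweep_incoming_repair_requests(incoming, local_history, current_time, broadcast):
    """Rebroadcast every requested message whose T_resp has expired.

    incoming:      dict mapping message_id -> T_resp (assumed layout)
    local_history: dict mapping message_id -> cached full Message
    broadcast:     callable that sends one Message to the channel
    Returns the list of message IDs that were due.
    """
    due = [mid for mid, t_resp in incoming.items() if t_resp <= current_time]
    for message_id in due:
        # Assumption: drop the entry so the same request is answered once.
        del incoming[message_id]
        message = local_history.get(message_id)
        if message is not None:
            broadcast(message)
    return due
```

Entries whose `T_resp` lies in the future stay in the buffer and are removed only if another participant rebroadcasts the message first.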
#### SDS-R Periodic Sync Message
|
||||
|
||||
If the participant is due to send a periodic sync message,
|
||||
it SHOULD send the message according to [SDS-R send message](#sds-r-send-message)
|
||||
if there are any eligible items in the outgoing repair request buffer,
|
||||
regardless of whether other participants have also recently broadcast a Periodic Sync message.
|
||||
|
||||
## Copyright
|
||||
|
||||
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
|
||||
|
||||
405
vac/raw/zerokit-api.md
Normal file
@@ -0,0 +1,405 @@
|
||||
---
|
||||
title: Zerokit-API
|
||||
name: Zerokit API
|
||||
status: raw
|
||||
category: Standards Track
|
||||
tags: [zerokit, rln, api]
|
||||
editor:
|
||||
contributors:
|
||||
---
|
||||
|
||||
## Abstract
|
||||
|
||||
This document specifies the Zerokit API, an implementation of the RLN-V2 protocol.
|
||||
The specification covers the unified interface exposed through native Rust,
|
||||
C-compatible Foreign Function Interface (FFI) bindings,
|
||||
and WebAssembly (WASM) bindings.
|
||||
|
||||
## Motivation
|
||||
|
||||
The main goal of this RFC is to define the API contract,
|
||||
serialization formats,
|
||||
and architectural guidance for integrating the Zerokit library
|
||||
across all supported platforms.
|
||||
Zerokit is the reference implementation of the RLN-V2 protocol.
|
||||
|
||||
## Format Specification
|
||||
|
||||
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”,
|
||||
“SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document
|
||||
are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
|
||||
|
||||
### Important Note
|
||||
|
||||
All terms and parameters used remain the same as in [32/RLN-V1](../32/rln-v1.md).
|
||||
More details are available in the [technical overview](../32/rln-v1.md#technical-overview).
|
||||
|
||||
### Architecture Overview
|
||||
|
||||
Zerokit follows a layered architecture where
|
||||
the core RLN logic is implemented once in Rust and
|
||||
exposed through platform-specific bindings.
|
||||
The protocol layer handles zero-knowledge proof generation and verification,
|
||||
Merkle tree operations,
|
||||
and cryptographic primitives.
|
||||
This core is wrapped by three interface layers:
|
||||
native Rust for direct library integration,
|
||||
FFI for C-compatible bindings consumed by languages such as C and Nim,
|
||||
and WASM for browser and Node.js environments.
|
||||
All three interfaces maintain functional parity and
|
||||
share identical serialization formats for inputs and outputs.
|
||||
|
||||
```text
|
||||
┌─────────────────────────────────────────────────────┐
|
||||
│ Application Layer │
|
||||
└──────────┬───────────────┬───────────────┬──────────┘
|
||||
│ │ │
|
||||
┌──────▼───────┐ ┌─────▼─────┐ ┌───────▼─────┐
|
||||
│ FFI API │ │ WASM API │ │ Rust API │
|
||||
│ (C/Nim/..) │ │ (Browser) │ │ (Native) │
|
||||
└──────┬───────┘ └─────┬─────┘ └───────┬─────┘
|
||||
└───────────────┼───────────────┘
|
||||
│
|
||||
┌─────────▼─────────┐
|
||||
│ RLN Protocol │
|
||||
│ (Rust Core) │
|
||||
└───────────────────┘
|
||||
```
|
||||
|
||||
### Supported Features
|
||||
|
||||
Zerokit provides compile-time feature flags that
|
||||
control Merkle tree storage backends,
|
||||
operational modes,
|
||||
and parallelization.
|
||||
|
||||
#### Merkle Tree Backends
|
||||
|
||||
`fullmerkletree` allocates the complete tree structure in memory.
|
||||
This backend provides the fastest performance but consumes the most memory.
|
||||
|
||||
`optimalmerkletree` uses sparse HashMap storage that only allocates nodes as needed.
|
||||
This backend balances performance and memory efficiency.
|
||||
|
||||
`pmtree` persists the tree to disk using a sled database.
|
||||
This backend enables state durability across process restarts.
|
||||
|
||||
#### Operational Modes
|
||||
|
||||
`stateless` disables the internal Merkle tree.
|
||||
Applications MUST provide the Merkle root and
|
||||
membership proof externally when generating proofs.
|
||||
|
||||
When `stateless` is not enabled,
|
||||
the library operates in stateful mode and
|
||||
requires one of the Merkle tree backends.
|
||||
|
||||
#### Parallelization
|
||||
|
||||
`parallel` enables rayon-based parallel computation for
|
||||
proof generation and tree operations.
|
||||
|
||||
This flag SHOULD be enabled for end-user clients where
|
||||
fastest individual proof generation time is required.
|
||||
For server-side proof services handling multiple concurrent requests,
|
||||
this flag SHOULD be disabled and
|
||||
applications SHOULD use dedicated worker threads per proof instead.
|
||||
The worker thread approach provides significantly higher throughput for
|
||||
concurrent proof generation.
|
||||
|
||||
## The API
|
||||
|
||||
### Overview
|
||||
|
||||
The API exposes functional interfaces with strongly-typed parameters.
|
||||
All three platform bindings share the same function signatures,
|
||||
differing only in language-specific conventions.
|
||||
Function signatures documented below are from the Rust perspective.
|
||||
|
||||
- Rust: <https://github.com/vacp2p/zerokit/blob/master/rln/src/public.rs>
|
||||
- FFI: <https://github.com/vacp2p/zerokit/tree/master/rln/src/ffi>
|
||||
- WASM: <https://github.com/vacp2p/zerokit/tree/master/rln-wasm>
|
||||
|
||||
### Initialization
|
||||
|
||||
`RLN::new(tree_depth, tree_config)` creates a new RLN instance by loading circuit resources from the default folder.
|
||||
The `tree_config` parameter accepts multiple types via the `TreeConfigInput` trait: a JSON string,
|
||||
a direct config object (with pmtree feature), or an empty string for defaults.
|
||||
Not available in WASM. Not available when `stateless` feature is enabled.
|
||||
|
||||
`RLN::new()` creates a new stateless RLN instance by loading circuit resources from the default folder.
|
||||
Only available when `stateless` feature is enabled. Not available in WASM.
|
||||
|
||||
`RLN::new_with_params(tree_depth, zkey_data, graph_data, tree_config)` creates a new RLN instance
|
||||
with pre-loaded circuit parameters passed as byte vectors.
|
||||
The `tree_config` parameter accepts multiple types via the `TreeConfigInput` trait.
|
||||
Not available in WASM. Not available when `stateless` feature is enabled.
|
||||
|
||||
`RLN::new_with_params(zkey_data, graph_data)` creates a new stateless RLN instance with pre-loaded circuit parameters.
|
||||
Only available when `stateless` feature is enabled. Not available in WASM.
|
||||
|
||||
`RLN::new_with_params(zkey_data)` creates a new stateless RLN instance for WASM with pre-loaded zkey data.
|
||||
Graph data is not required as witness calculation is handled externally in WASM environments.
|
||||
Only available in WASM with `stateless` feature enabled.
|
||||
|
||||
### Key Generation
|
||||
|
||||
`keygen()` generates a random identity keypair returning `(identity_secret, id_commitment)`.
|
||||
|
||||
`seeded_keygen(seed)` generates a deterministic identity keypair
|
||||
from a seed returning `(identity_secret, id_commitment)`.
|
||||
|
||||
`extended_keygen()` generates a random extended identity keypair
|
||||
returning `(identity_trapdoor, identity_nullifier, identity_secret, id_commitment)`.
|
||||
|
||||
`extended_seeded_keygen(seed)` generates a deterministic extended identity keypair
|
||||
from a seed returning `(identity_trapdoor, identity_nullifier, identity_secret, id_commitment)`.
|
||||
|
||||
### Merkle Tree Management
|
||||
|
||||
All tree management functions are only available when
|
||||
`stateless` feature is NOT enabled.
|
||||
|
||||
`set_tree(tree_depth)` initializes the internal Merkle tree with the specified depth.
|
||||
Leaves are set to the default zero value.
|
||||
|
||||
`set_leaf(index, leaf)` sets a leaf value at the specified index.
|
||||
|
||||
`get_leaf(index)` returns the leaf value at the specified index.
|
||||
|
||||
`set_leaves_from(index, leaves)` sets multiple leaves starting from the specified index.
|
||||
Updates `next_index` to `max(next_index, index + n)`.
|
||||
If n leaves are passed, they will be set at positions `index`, `index+1`, ..., `index+n-1`.
|
||||
|
||||
`init_tree_with_leaves(leaves)` resets the tree state to default and initializes it
|
||||
with the provided leaves starting from index 0.
|
||||
This resets the internal `next_index` to 0 before setting the leaves.
|
||||
|
||||
`atomic_operation(index, leaves, indices)` atomically inserts leaves starting from index
|
||||
and removes leaves at the specified indices.
|
||||
Updates `next_index` to `max(next_index, index + n)` where n is the number of leaves inserted.
|
||||
|
||||
`set_next_leaf(leaf)` sets a leaf at the next available index and increments `next_index`.
|
||||
The leaf is set at the current `next_index` value, then `next_index` is incremented.
|
||||
|
||||
`delete_leaf(index)` sets the leaf at the specified index to the default zero value.
|
||||
Does not change the internal `next_index` value.
|
||||
|
||||
`leaves_set()` returns the number of leaves that have been set in the tree.
|
||||
|
||||
`get_root()` returns the current Merkle tree root.
|
||||
|
||||
`get_subtree_root(level, index)` returns the root of a subtree at the specified level and index.
|
||||
|
||||
`get_merkle_proof(index)` returns the Merkle proof for the leaf at the specified index as `(path_elements, identity_path_index)`.
|
||||
|
||||
`get_empty_leaves_indices()` returns indices of leaves set to zero up to the final leaf that was set.
|
||||
|
||||
`set_metadata(metadata)` stores arbitrary metadata in the RLN object for application use.
|
||||
This metadata is not used by the RLN module.
|
||||
|
||||
`get_metadata()` returns the metadata stored in the RLN object.
|
||||
|
||||
`flush()` closes the connection to the Merkle tree database.
|
||||
Should be called before dropping the RLN object when using persistent storage.
|
||||
|
||||
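The `next_index` bookkeeping described for `set_leaves_from`, `init_tree_with_leaves`, `set_next_leaf` and `delete_leaf` can be modelled with a toy leaf store. The sketch below is not the Zerokit implementation and computes no hashes; `ToyLeafStore` is a hypothetical class, and `0` stands in for the default zero leaf value.

```python
class ToyLeafStore:
    """Toy model of leaf storage and `next_index` only (no Merkle hashing)."""

    def __init__(self, depth):
        self.capacity = 1 << depth
        self.leaves = [0] * self.capacity  # 0 stands in for the default leaf
        self.next_index = 0

    def set_leaves_from(self, index, leaves):
        # Leaves land at index, index+1, ..., index+n-1.
        for offset, leaf in enumerate(leaves):
            self.leaves[index + offset] = leaf
        self.next_index = max(self.next_index, index + len(leaves))

    def init_tree_with_leaves(self, leaves):
        # Reset to the default state, then fill from index 0.
        self.leaves = [0] * self.capacity
        self.next_index = 0
        self.set_leaves_from(0, leaves)

    def set_next_leaf(self, leaf):
        # Write at the current next_index, then increment it.
        self.leaves[self.next_index] = leaf
        self.next_index += 1

    def delete_leaf(self, index):
        # Reset the leaf to the default value; next_index is unchanged.
        self.leaves[index] = 0
```

One consequence worth noting in this model: writing below the current `next_index` with `set_leaves_from` never moves `next_index` backwards, and deleting a leaf leaves a gap rather than freeing the slot for `set_next_leaf`.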
### Witness Construction
|
||||
|
||||
`RLNWitnessInput::new(identity_secret, user_message_limit, message_id, path_elements, identity_path_index, x, external_nullifier)` constructs
|
||||
a witness input for proof generation. Validates that `message_id < user_message_limit`.
|
||||
|
||||
### Witness Calculation
|
||||
|
||||
For native (non-WASM) environments, witness calculation is handled internally by the proof generation functions.
|
||||
The circuit witness is computed from the `RLNWitnessInput` and passed to the zero-knowledge proof system.
|
||||
|
||||
For WASM environments, witness calculation must be performed externally using a JavaScript witness calculator.
|
||||
The workflow is:
|
||||
|
||||
1. Create a `WasmRLNWitnessInput` with the required parameters
|
||||
2. Export to JSON format using `toBigIntJson()` method
|
||||
3. Pass the JSON to an external JavaScript witness calculator
|
||||
4. Use the calculated witness with `generate_rln_proof_with_witness`
|
||||
|
||||
The witness calculator computes all intermediate values required by the RLN circuit.
|
||||
|
||||
### Proof Generation
|
||||
|
||||
`generate_zk_proof(witness)` generates a Groth16 zkSNARK proof from a witness.
|
||||
Extract proof values separately using `proof_values_from_witness`.
|
||||
Not available in WASM.
|
||||
|
||||
`generate_rln_proof(witness)` generates a complete RLN proof returning both the zkSNARK proof and proof values as `(proof, proof_values)`.
|
||||
This combines proof generation and proof values extraction.
|
||||
Not available in WASM.
|
||||
|
||||
`generate_rln_proof_with_witness(calculated_witness, witness)` generates an RLN proof using
|
||||
a pre-calculated witness from an external witness calculator.
|
||||
The `calculated_witness` should be a `Vec<BigInt>` obtained from the external witness calculator.
|
||||
Returns `(proof, proof_values)`.
|
||||
This is the primary proof generation method for WASM where witness calculation is handled by JavaScript.
|
||||
|
||||
### Proof Verification
|
||||
|
||||
`verify_zk_proof(proof, proof_values)` verifies only the zkSNARK proof without root or signal validation.
|
||||
Returns `true` if the proof is valid.
|
||||
|
||||
`verify_rln_proof(proof, proof_values, x)` verifies the proof against the internal Merkle tree root and
|
||||
validates that `x` matches the proof signal.
|
||||
Returns an error if verification fails (invalid proof, invalid root, or invalid signal).
|
||||
Only available when `stateless` feature is NOT enabled.
|
||||
|
||||
`verify_with_roots(proof, proof_values, x, roots)` verifies the proof against a set of acceptable roots and
|
||||
validates the signal.
|
||||
If the roots slice is empty, root verification is skipped. Returns an error if verification fails.
|
||||
|
||||
### Slashing
|
||||
|
||||
`recover_id_secret(proof_values_1, proof_values_2)` recovers the identity secret from two proof values
|
||||
that share the same external nullifier.
|
||||
Used to detect and penalize rate limit violations.
|
||||
|
||||
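The algebra behind secret recovery can be illustrated with a short sketch. It assumes, following the RLN specifications this document refers to, that each proof reveals a share `(x, y)` on the line `y = a_0 + a_1 * x`, where `a_0` is the identity secret and `a_1` is fixed for both proofs. The modulus is assumed to be the BN254 scalar field order, and the function names are hypothetical; this is not the Zerokit implementation of `recover_id_secret`.

```python
# Assumed field modulus: order of the BN254 scalar field.
P = 21888242871839275222246405745257275088548364400416034343698204186575808495617


def make_share(a0, a1, x):
    """Evaluate the line y = a0 + a1 * x over the field."""
    return (a0 + a1 * x) % P


def recover_secret(x1, y1, x2, y2):
    """Recover a0 from two distinct points on the same line."""
    dx = (x1 - x2) % P
    if dx == 0:
        raise ValueError("shares must have distinct x values")
    # Slope a1 = (y1 - y2) / (x1 - x2), using a modular inverse.
    a1 = ((y1 - y2) % P) * pow(dx, -1, P) % P
    # Intercept a0 = y1 - a1 * x1.
    return (y1 - a1 * x1) % P
```

Two points determine a line, so a member who publishes two different signals on the same line reveals its intercept, which is the identity secret; a single share reveals nothing about it.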
### Hash Utilities
|
||||
|
||||
`poseidon_hash(inputs)` computes the Poseidon hash of the input field elements.
|
||||
|
||||
`hash_to_field_le(input)` hashes arbitrary bytes to a field element using little-endian byte order.
|
||||
|
||||
`hash_to_field_be(input)` hashes arbitrary bytes to a field element using big-endian byte order.
|
||||
|
||||
### Serialization Utilities
|
||||
|
||||
`rln_witness_to_bytes_le` / `rln_witness_to_bytes_be` serializes an RLN witness to bytes.
|
||||
|
||||
`bytes_le_to_rln_witness` / `bytes_be_to_rln_witness` deserializes bytes to an RLN witness.
|
||||
|
||||
`rln_proof_to_bytes_le` / `rln_proof_to_bytes_be` serializes an RLN proof to bytes.
|
||||
|
||||
`bytes_le_to_rln_proof` / `bytes_be_to_rln_proof` deserializes bytes to an RLN proof.
|
||||
|
||||
`rln_proof_values_to_bytes_le` / `rln_proof_values_to_bytes_be` serializes proof values to bytes.
|
||||
|
||||
`bytes_le_to_rln_proof_values` / `bytes_be_to_rln_proof_values` deserializes bytes to proof values.
|
||||
|
||||
`fr_to_bytes_le` / `fr_to_bytes_be` serializes a field element to 32 bytes.
|
||||
|
||||
`bytes_le_to_fr` / `bytes_be_to_fr` deserializes 32 bytes to a field element.
|
||||
|
||||
`vec_fr_to_bytes_le` / `vec_fr_to_bytes_be` serializes a vector of field elements to bytes.
|
||||
|
||||
`bytes_le_to_vec_fr` / `bytes_be_to_vec_fr` deserializes bytes to a vector of field elements.
|
||||
|
||||
### WASM-Specific Notes
|
||||
|
||||
WASM bindings wrap the Rust API with JavaScript-compatible types. Key differences:
|
||||
|
||||
- Field elements are wrapped as `WasmFr` with `fromBytesLE`, `fromBytesBE`, `toBytesLE`, `toBytesBE` methods.
|
||||
- Vectors of field elements use `VecWasmFr` with `push`, `get`, `length` methods.
|
||||
- Identity generation uses `Identity.generate()` and `Identity.generateSeeded(seed)` static methods.
|
||||
- Extended identity uses `ExtendedIdentity.generate()` and `ExtendedIdentity.generateSeeded(seed)`.
|
||||
- Witness input uses `WasmRLNWitnessInput` constructor and `toBigIntJson()` for witness calculator integration.
|
||||
- Proof generation requires external witness calculation via `generateRLNProofWithWitness(calculatedWitness, witness)`.
|
||||
- When `parallel` feature is enabled, call `initThreadPool()` to initialize the thread pool.
|
||||
|
||||
### FFI-Specific Notes
|
||||
|
||||
FFI bindings use C-compatible types with the `ffi_` prefix. Key differences:
|
||||
|
||||
- Field elements are wrapped as `CFr` with corresponding conversion functions.
|
||||
- Results use `CResult` or `CBoolResult` structs with `ok` and `err` fields.
|
||||
- Memory must be explicitly freed using `ffi_*_free` functions.
|
||||
- Vectors use `repr_c::Vec` with `ffi_vec_*` helper functions.
|
||||
- Configuration is passed via file path to a JSON configuration file.
|
||||
|
||||
## Usage Patterns
|
||||
|
||||
This section describes common deployment scenarios and
|
||||
the recommended API combinations for each.
|
||||
|
||||
### Stateful with Changing Root
|
||||
|
||||
Applies when membership changes over time, with members continuously joining and being slashed.
|
||||
|
||||
Applications MUST maintain a sliding window of recent roots externally.
|
||||
When members are added or removed via `set_leaf`, `delete_leaf`, or `atomic_operation`,
|
||||
capture the new root using `get_root` and append it to the history buffer.
|
||||
Verify incoming proofs using `verify_with_roots` with the root history buffer,
|
||||
accepting proofs valid against any recent root.
|
||||
|
||||
The window size depends on network propagation delays and epoch duration.
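
The root history buffer described above can be sketched as a bounded queue. This is a minimal illustration, not part of the Zerokit API: the `RootHistory` class and its method names are hypothetical, and roots are modelled as opaque 32-byte values.

```python
from collections import deque


class RootHistory:
    """Sliding window of the most recent Merkle roots (hypothetical helper)."""

    def __init__(self, window_size: int) -> None:
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        # A deque with maxlen drops the oldest root once the window is full.
        self._roots = deque(maxlen=window_size)

    def record(self, root: bytes) -> None:
        """Call after each tree update with the value returned by get_root."""
        # Skip consecutive duplicates so a no-op update does not evict a root.
        if not self._roots or self._roots[-1] != root:
            self._roots.append(root)

    def roots(self) -> list:
        """Roots to pass to verify_with_roots, oldest first."""
        return list(self._roots)


history = RootHistory(window_size=3)
for i in range(5):
    history.record(bytes([i]) * 32)  # stand-in for a real 32-byte root
```

After five updates with a window of three, only the three newest roots remain, so proofs generated against older roots would be rejected.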
|
||||
|
||||
### Stateful with Fixed Root
|
||||
|
||||
Applies when membership is established once and remains static during operation.
|
||||
|
||||
Initialize the tree using `init_tree_with_leaves` with the complete membership set.
|
||||
No root history is required.
|
||||
Verify proofs using `verify_rln_proof` which checks against the internal tree root directly.
|
||||
|
||||
### Stateless
|
||||
|
||||
Applies when membership state is managed externally,
|
||||
such as by a smart contract or relay network.
|
||||
|
||||
Enable the `stateless` feature flag.
|
||||
Obtain Merkle proofs and valid roots from the external source.
|
||||
Pass externally provided `path_elements` and `identity_path_index` to `RLNWitnessInput::new`.
|
||||
Verify using `verify_with_roots` with externally provided roots.
|
||||
|
||||
### WASM Browser Integration
|
||||
|
||||
WASM environments require external witness calculation.
|
||||
Use `WasmRLNWitnessInput::toBigIntJson()` to export the witness for
|
||||
JavaScript witness calculators,
|
||||
then pass the result to `generateRLNProofWithWitness`.
|
||||
|
||||
When `parallel` feature is enabled,
|
||||
call `initThreadPool()` before proof operations.
|
||||
This requires COOP/COEP headers for SharedArrayBuffer support.
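
As an illustration of the header requirement, the sketch below serves static files with the two cross-origin isolation headers set. It assumes a local development setup using Python's standard `http.server`; the handler name and port are arbitrary choices, and a production deployment would normally set the same headers in its web server or CDN configuration.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Header values that put a page into a cross-origin isolated state,
# which browsers require before exposing SharedArrayBuffer.
CROSS_ORIGIN_ISOLATION_HEADERS = {
    "Cross-Origin-Opener-Policy": "same-origin",
    "Cross-Origin-Embedder-Policy": "require-corp",
}


class IsolatedHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds COOP/COEP headers to every response."""

    def end_headers(self):
        for name, value in CROSS_ORIGIN_ISOLATION_HEADERS.items():
            self.send_header(name, value)
        super().end_headers()


def serve(port: int = 8080) -> None:
    """Serve the current directory on localhost until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), IsolatedHandler).serve_forever()
```

Calling `serve()` from the directory that contains the WASM bundle is enough for local testing.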
|
||||
|
||||
#### Epoch and Rate Limit Configuration
|
||||
|
||||
The external nullifier is computed as `poseidon_hash([epoch, rln_identifier])`.
|
||||
Each application SHOULD use a unique `rln_identifier` to
|
||||
prevent cross-application nullifier collisions.
|
||||
|
||||
The `user_message_limit` in the rate commitment determines messages allowed per epoch.
|
||||
The `message_id` must be less than `user_message_limit` and
|
||||
should increment with each message.
|
||||
Applications MUST persist the `message_id` counter to avoid violations after restarts.
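
One possible way to persist the counter is sketched below. It is an assumption-laden example rather than Zerokit functionality: the `MessageIdCounter` class, the JSON file layout, and the integer epoch are all hypothetical. The counter is written to disk before the id is handed out, so a crash can waste an id but cannot reuse one.

```python
import json
import os
import tempfile


class MessageIdCounter:
    """Persistent per-epoch message_id counter (hypothetical helper)."""

    def __init__(self, path: str, user_message_limit: int) -> None:
        self.path = path
        self.limit = user_message_limit
        self.state = {"epoch": None, "next_id": 0}
        if os.path.exists(path):
            with open(path) as f:
                self.state = json.load(f)

    def next_message_id(self, epoch: int) -> int:
        # A new epoch resets the counter.
        if self.state["epoch"] != epoch:
            self.state = {"epoch": epoch, "next_id": 0}
        message_id = self.state["next_id"]
        if message_id >= self.limit:
            raise RuntimeError("rate limit reached for this epoch")
        self.state["next_id"] = message_id + 1
        self._save()  # persist before the id is used in a proof
        return message_id

    def _save(self) -> None:
        # Write to a temporary file and rename it, so a crash mid-write
        # cannot leave a truncated state file behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)


path = os.path.join(tempfile.mkdtemp(), "rln_counter.json")
counter = MessageIdCounter(path, user_message_limit=2)
first = counter.next_message_id(epoch=100)

# Simulate a restart: a fresh instance continues from the stored value.
restarted = MessageIdCounter(path, user_message_limit=2)
second = restarted.next_message_id(epoch=100)
```

A third request in the same epoch raises an error instead of producing a proof that would exceed the limit.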
|
||||
|
||||
## Security/Privacy Considerations
|
||||
|
||||
The security of Zerokit depends on the correct implementation of the RLN-V2 protocol
|
||||
and the underlying zero-knowledge proof system.
|
||||
Applications MUST ensure that:
|
||||
|
||||
- Identity secrets are kept confidential and never transmitted or logged
|
||||
- The `message_id` counter is properly persisted to prevent accidental rate limit violations
|
||||
- External nullifiers are constructed correctly to prevent cross-application attacks
|
||||
- Merkle tree roots are validated when using stateless mode
|
||||
- Circuit parameters (zkey and graph data) are obtained from trusted sources
|
||||
|
||||
When using the `parallel` feature in WASM,
|
||||
applications MUST serve content with appropriate COOP/COEP headers to
|
||||
enable SharedArrayBuffer support securely.
|
||||
|
||||
The slashing mechanism exposes identity secrets when rate limits are violated.
|
||||
Applications SHOULD educate users about this risk and
|
||||
implement safeguards to prevent accidental violations.
|
||||
|
||||
## References
|
||||
|
||||
### Normative
|
||||
|
||||
- [32/RLN-V1](../32/rln-v1.md) - Rate Limit Nullifier V1 specification
|
||||
|
||||
### Informative
|
||||
|
||||
- [Zerokit GitHub Repository](https://github.com/vacp2p/zerokit) - Reference implementation
|
||||
- [RLN-V2 Specification](./rln-v2.md) - Rate Limit Nullifier V2 protocol
|
||||
- [Sled Database](https://sled.rs) - Embedded database used for persistent Merkle tree storage
|
||||
|
||||
## Copyright
|
||||
|
||||
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
|
||||
@@ -2,7 +2,7 @@
|
||||
slug: 21
|
||||
title: 21/WAKU2-FAULT-TOLERANT-STORE
|
||||
name: Waku v2 Fault-Tolerant Store
|
||||
status: draft
|
||||
status: deleted
|
||||
editor: Sanaz Taheri <sanaz@status.im>
|
||||
contributors:
|
||||
---
|
||||
189
waku/standards/core/31/enr.md
Normal file
@@ -0,0 +1,189 @@
|
||||
---
|
||||
slug: 31
|
||||
title: 31/WAKU2-ENR
|
||||
name: Waku v2 usage of ENR
|
||||
status: draft
|
||||
tags: [waku/core-protocol]
|
||||
editor: Franck Royer <franck@status.im>
|
||||
contributors:
|
||||
---
|
||||
|
||||
## Abstract
|
||||
|
||||
This specification describes the usage of the ENR (Ethereum Node Records)
|
||||
format for [10/WAKU2](../10/waku2.md) purposes.
|
||||
The ENR format is defined in [EIP-778](https://eips.ethereum.org/EIPS/eip-778) [[4]](#references).
|
||||
|
||||
This specification is an extension of EIP-778;
ENRs used in Waku MUST adhere to both EIP-778 and 31/WAKU2-ENR.
|
||||
|
||||
## Motivation
|
||||
|
||||
EIP-1459 with the usage of ENR has been implemented [[2]](#references) [[3]](#references) as a discovery protocol for Waku.
|
||||
|
||||
EIP-778 specifies a number of pre-defined keys.
|
||||
However, the usage of these keys alone does not allow for certain transport capabilities to be encoded,
|
||||
such as Websocket.
|
||||
Currently, Waku nodes running in a browser only support the WebSocket transport protocol.
Hence, new ENR keys need to be defined to allow for the encoding of transport protocols other than raw TCP.
|
||||
|
||||
### Usage of Multiaddr Format Rationale
|
||||
|
||||
One solution would be to define new keys such as `ws` to encode the websocket port of a node.
|
||||
However, we expect new transport protocols, such as QUIC, to be added over time.
Hence, this would only provide a short-term solution until another specification needs to be added.
|
||||
|
||||
Moreover, secure websocket involves SSL certificates.
|
||||
SSL certificates are only valid for a given domain and ip,
|
||||
so an ENR containing the following information:
|
||||
|
||||
- secure websocket port
|
||||
- ipv4 fqdn
|
||||
- ipv4 address
|
||||
- ipv6 address
|
||||
|
||||
would carry some ambiguity: is the certificate securing the websocket port valid for the ipv4 fqdn, the ipv4 address, or the ipv6 address?
|
||||
|
||||
The [10/WAKU2](../10/waku2.md) protocol family is built on the [libp2p](https://github.com/libp2p/specs) protocol stack.
|
||||
Hence, it uses [multiaddr](https://github.com/multiformats/multiaddr) to format network addresses.
|
||||
|
||||
Directly storing one or several multiaddresses in the ENR would fix the issues listed above:
|
||||
|
||||
- multiaddr is self-describing and supports addresses for any network protocol:
|
||||
No new specification would be needed to support encoding other transport protocols in an ENR.
|
||||
- multiaddr contains both the host and port information,
|
||||
allowing the ambiguity previously described to be resolved.
|
||||
|
||||
## Wire Format
|
||||
|
||||
### `multiaddrs` ENR key
|
||||
|
||||
We define a `multiaddrs` key.
|
||||
|
||||
- The value MUST be a list of binary encoded multiaddr prefixed by their size.
|
||||
- The size of the multiaddr MUST be encoded in a Big Endian unsigned 16-bit integer.
|
||||
- The size of the multiaddr MUST be encoded in 2 bytes.
|
||||
- The `secp256k1` value MUST be present on the record;
|
||||
`secp256k1` is defined in [EIP-778](https://eips.ethereum.org/EIPS/eip-778) and
|
||||
contains the compressed secp256k1 public key.
|
||||
- The node's peer id SHOULD be deduced from the `secp256k1` value.
|
||||
- The multiaddresses SHOULD NOT contain a peer id, except for circuit relay addresses.
- For raw TCP & UDP connection details,
|
||||
[EIP-778](https://eips.ethereum.org/EIPS/eip-778) pre-defined keys SHOULD be used;
|
||||
The keys `tcp`, `udp`, `ip` (and `tcp6`, `udp6`, `ip6` for IPv6)
|
||||
are enough to convey all necessary information;
|
||||
- To save space, `multiaddrs` key SHOULD only be used for connection details that cannot be represented using the [EIP-778](https://eips.ethereum.org/EIPS/eip-778) pre-defined keys.
|
||||
- The 300 bytes size limit as defined by [EIP-778](https://eips.ethereum.org/EIPS/eip-778) still applies.
In practice, it is possible to encode 3 multiaddresses in an ENR;
more or fewer can be encoded depending on the size of each multiaddress.
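
The size-prefixed layout of the `multiaddrs` value can be sketched as follows. This is a minimal illustration that treats each binary-encoded multiaddr as opaque bytes; the function names are hypothetical, and producing the binary multiaddr itself is left to a multiaddr library.

```python
import struct
from typing import List


def encode_multiaddrs(addrs: List[bytes]) -> bytes:
    """Concatenate binary multiaddrs, each prefixed by a 2-byte big-endian size."""
    out = bytearray()
    for addr in addrs:
        if len(addr) > 0xFFFF:
            raise ValueError("multiaddr does not fit a 16-bit size prefix")
        out += struct.pack(">H", len(addr))  # big-endian unsigned 16-bit
        out += addr
    return bytes(out)


def decode_multiaddrs(value: bytes) -> List[bytes]:
    """Split a `multiaddrs` value back into its binary multiaddrs."""
    addrs = []
    offset = 0
    while offset < len(value):
        if offset + 2 > len(value):
            raise ValueError("truncated size prefix")
        (size,) = struct.unpack_from(">H", value, offset)
        offset += 2
        if offset + size > len(value):
            raise ValueError("truncated multiaddr")
        addrs.append(value[offset:offset + size])
        offset += size
    return addrs


encoded = encode_multiaddrs([b"\x01\x02\x03", b"\xaa"])
```

The decoder rejects values whose last entry is cut short, which is a reasonable safeguard for data received from untrusted peers.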
|
||||
|
||||
### Usage
|
||||
|
||||
#### Many connection types
|
||||
|
||||
Alice is a Waku node operator. She runs a node that supports inbound connections for the following protocols:
|
||||
|
||||
- TCP 10101 on `1.2.3.4`
|
||||
- UDP 20202 on `1.2.3.4`
|
||||
- TCP 30303 on `1234:5600:101:1::142`
|
||||
- UDP 40404 on `1234:5600:101:1::142`
|
||||
- Secure Websocket on `wss://example.com:443/`
|
||||
- QUIC on `quic://quic.example.com:443/`
|
||||
- A circuit relay address `/ip4/1.2.3.4/tcp/55555/p2p/QmRelay/p2p-circuit/p2p/QmAlice`
|
||||
|
||||
Alice SHOULD structure the ENR for her node as follows:
|
||||
|
||||
| key          | value |
| ------------ | ----- |
| `tcp`        | `10101` |
| `udp`        | `20202` |
| `tcp6`       | `30303` |
| `udp6`       | `40404` |
| `ip`         | `1.2.3.4` |
| `ip6`        | `1234:5600:101:1::142` |
| `secp256k1`  | Alice's compressed secp256k1 public key, 33 bytes |
| `multiaddrs` | `len1 \| /dns4/example.com/tcp/443/wss \| len2 \| /dns4/quic.example.com/tcp/443/quic \| len3 \| /ip4/1.2.3.4/tcp/55555/p2p/QmRelay` |
|
||||
|
||||
Where `multiaddrs`:
|
||||
|
||||
- `|` is the concatenation operator,
|
||||
- `len1` is the length of `/dns4/example.com/tcp/443/wss` byte representation,
|
||||
- `len2` is the length of `/dns4/quic.example.com/tcp/443/quic` byte representation,
|
||||
- `len3` is the length of `/ip4/1.2.3.4/tcp/55555/p2p/QmRelay` byte representation.
|
||||
Notice that the `/p2p-circuit` component is not stored, but,
|
||||
since circuit relay addresses are the only ones containing a `p2p` component,
|
||||
it's safe to assume that any address containing this component is a circuit relay address.
|
||||
Decoding this type of multiaddresses would require appending the `/p2p-circuit` component.
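
The decoding rule above can be sketched on the textual form of a multiaddr. This is an illustrative assumption: a real implementation would work on the binary encoding through a multiaddr library, and the function name is hypothetical.

```python
def restore_circuit_relay(addr: str) -> str:
    """Append `/p2p-circuit` to a stored address that ends in `/p2p/<peer id>`."""
    parts = addr.rstrip("/").split("/")
    # A stored circuit relay address ends with the pair `p2p`, `<relay peer id>`.
    if len(parts) >= 2 and parts[-2] == "p2p":
        return addr.rstrip("/") + "/p2p-circuit"
    return addr


relay = restore_circuit_relay("/ip4/1.2.3.4/tcp/55555/p2p/QmRelay")
plain = restore_circuit_relay("/dns4/example.com/tcp/443/wss")
```

Addresses without a trailing `p2p` component, such as the secure websocket address, are returned unchanged.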
|
||||
|
||||
#### Raw TCP only
|
||||
|
||||
Bob is a node operator who runs a node that supports inbound connections for the following protocol:
|
||||
|
||||
- TCP 10101 on `1.2.3.4`
|
||||
|
||||
Bob SHOULD structure the ENR for his node as follows:
|
||||
|
||||
| key         | value                                           |
| ----------- | ----------------------------------------------- |
| `tcp`       | `10101`                                         |
| `ip`        | `1.2.3.4`                                       |
| `secp256k1` | Bob's compressed secp256k1 public key, 33 bytes |
|
||||
|
||||
As Bob's node's connection details can be represented with EIP-778's pre-defined keys only,
|
||||
the `multiaddrs` key is not needed.
|
||||
|
||||
### Limitations
|
||||
|
||||
Supported key type is `secp256k1` only.
|
||||
|
||||
Support for other elliptic curve key types, such as `ed25519`, MAY be added.
|
||||
|
||||
### `waku2` ENR key
|
||||
|
||||
We define a `waku2` field key:
|
||||
|
||||
- The value MUST be an 8-bit flag field,
|
||||
where bits set to `1` indicate `true` and
|
||||
bits set to `0` indicate `false` for the relevant flags.
|
||||
- The flag values already defined are set out below,
|
||||
with `bit 7` the most significant bit and `bit 0` the least significant bit.
|
||||
|
||||
| bit 7   | bit 6   | bit 5   | bit 4   | bit 3       | bit 2    | bit 1   | bit 0   |
| ------- | ------- | ------- | ------- | ----------- | -------- | ------- | ------- |
| `undef` | `undef` | `undef` | `sync`  | `lightpush` | `filter` | `store` | `relay` |
|
||||
|
||||
- In the scheme above, the flags `sync`, `lightpush`, `filter`, `store` and
|
||||
`relay` correspond to support for the protocols of the same name.
|
||||
If a protocol is not supported, the corresponding field MUST be set to `false`.
|
||||
Indicating positive support for any specific protocol is OPTIONAL,
|
||||
though it MAY be required by the relevant application or discovery process.
|
||||
- Flags marked as `undef` are not yet defined.
|
||||
These SHOULD be set to `false` by default.
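
The flag layout above can be illustrated with a small encoder and decoder. This is a sketch under the bit assignment shown in the table; the `WAKU2_FLAG_BITS` mapping and the function names are hypothetical.

```python
# Bit positions taken from the table above (bit 0 is the least significant).
WAKU2_FLAG_BITS = {"relay": 0, "store": 1, "filter": 2, "lightpush": 3, "sync": 4}


def encode_waku2(protocols) -> bytes:
    """Build the 1-byte `waku2` value from a collection of protocol names."""
    value = 0
    for name in protocols:
        value |= 1 << WAKU2_FLAG_BITS[name]
    # Undefined bits 5 to 7 are never set, so they stay `false`.
    return bytes([value])


def decode_waku2(field: bytes) -> set:
    """Return the protocol names whose flag is set; undefined bits are ignored."""
    if len(field) != 1:
        raise ValueError("waku2 value must be exactly one byte")
    return {name for name, bit in WAKU2_FLAG_BITS.items() if field[0] & (1 << bit)}


relay_and_store = encode_waku2({"relay", "store"})
light_services = encode_waku2(["lightpush", "sync"])
```

Ignoring the undefined bits on decode keeps the reader compatible with records that set flags defined by a later revision.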
|
||||
|
||||
### Key Usage
|
||||
|
||||
- A Waku node MAY choose to populate the `waku2` field for enhanced discovery capabilities,
|
||||
such as indicating supported protocols.
|
||||
Such a node MAY indicate support for any specific protocol by setting the corresponding flag to `true`.
|
||||
- Waku nodes that want to participate in [Node Discovery Protocol v5](https://github.com/vacp2p/rfc-index/blob/main/waku/standards/core/33/discv5.md) [[5]](#references), however,
|
||||
MUST implement the `waku2` key with at least one flag set to `true`.
|
||||
- Waku nodes that discovered other participants using Discovery v5,
|
||||
MUST filter out participant records that do not implement this field or
|
||||
do not have at least one flag set to `true`.
|
||||
- In addition, such nodes MAY choose to filter participants on specific flags
|
||||
(such as supported protocols),
|
||||
or further interpret the `waku2` field as required by the application.
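
The filtering rule for records discovered via Discovery v5 might look like the sketch below. The function name and the `required_mask` parameter are hypothetical, and counting only the five defined flags towards "at least one flag set" is an assumption, since the text does not say how undefined bits should be treated.

```python
from typing import Optional

# Mask covering the five flags defined so far (bits 0 to 4); an assumption
# about how "at least one flag set to true" treats undefined bits.
DEFINED_FLAGS_MASK = 0b0001_1111


def keep_discovered_record(waku2_value: Optional[bytes], required_mask: int = 0) -> bool:
    """Decide whether a discovered ENR passes the `waku2` filter.

    `waku2_value` is the raw value of the `waku2` key, or None if absent.
    `required_mask` optionally lists flags the application insists on.
    """
    if waku2_value is None or len(waku2_value) != 1:
        return False  # the record does not implement the field
    flags = waku2_value[0] & DEFINED_FLAGS_MASK
    if flags == 0:
        return False  # no defined flag is set
    return (flags & required_mask) == required_mask


needs_store = 0b0000_0010
```

With `required_mask` left at zero only the mandatory rule is applied; passing `needs_store` additionally drops records that do not advertise `store`.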
|
||||
|
||||
## Copyright
|
||||
|
||||
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
|
||||
|
||||
## References
|
||||
|
||||
- [1](../10/waku2.md)
|
||||
- [2](https://github.com/status-im/nim-waku/pull/690)
|
||||
- [3](https://github.com/vacp2p/rfc/issues/462#issuecomment-943869940)
|
||||
- [4](https://eips.ethereum.org/EIPS/eip-778)
|
||||
- [5](https://github.com/ethereum/devp2p/blob/master/discv5/discv5.md)
|
||||
@@ -3,9 +3,10 @@ slug: 66
|
||||
title: 66/WAKU2-METADATA
|
||||
name: Waku Metadata Protocol
|
||||
status: draft
|
||||
editor: Alvaro Revuelta <alrevuelta@status.im>
|
||||
editor: Franck Royer <franck@status.im>
|
||||
contributors:
|
||||
- Filip Dimitrijevic <filip@status.im>
|
||||
- Filip Dimitrijevic <filip@status.im>
|
||||
- Alvaro Revuelta <alrevuelta@status.im>
|
||||
---
|
||||
|
||||
## Abstract
|
||||
@@ -15,16 +16,19 @@ that can be associated with a [10/WAKU2](/waku/standards/core/10/waku2.md) node.
|
||||
|
||||
## Metadata Protocol
|
||||
|
||||
Waku specifies a req/resp protocol that provides information about the node's metadata.
|
||||
Such metadata is meant to be used by the node to decide if a peer is worth connecting
|
||||
or not.
|
||||
The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”,
|
||||
“NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
|
||||
|
||||
Waku specifies a req/resp protocol that provides information about the node's capabilities.
|
||||
Such metadata MAY be used by other peers for subsequent actions such as light protocol requests or disconnection.
|
||||
|
||||
The node that makes the request,
|
||||
includes its metadata so that the receiver is aware of it,
|
||||
without requiring an extra interaction.
|
||||
without requiring another round trip.
|
||||
The parameters are the following:
|
||||
|
||||
* `clusterId`: Unique identifier of the cluster that the node is running in.
|
||||
* `shards`: Shard indexes that the node is subscribed to.
|
||||
* `shards`: Shard indexes that the node is subscribed to via [`11/WAKU2-RELAY`](/waku/standards/core/11/relay.md).
|
||||
|
||||
***Protocol Identifier***
|
||||
|
||||
@@ -48,6 +52,51 @@ message WakuMetadataResponse {
|
||||
}
|
||||
```
|
||||
|
||||
## Implementation Suggestions
|
||||
|
||||
### Triggering Metadata Request
|
||||
|
||||
A node SHOULD send a metadata request upon first connection to a remote node.
|
||||
A node SHOULD use the remote node's libp2p peer id as identifier for this heuristic.
|
||||
|
||||
A node MAY send a metadata request upon reconnection to a remote peer.
|
||||
|
||||
A node SHOULD store the remote peer's metadata information for future reference.
|
||||
A node MAY implement a TTL regarding a remote peer's metadata, and refresh it upon expiry by initiating another metadata request.
|
||||
It is RECOMMENDED to set the TTL to 6 hours.
|
||||
|
||||
A node MAY trigger a metadata request after receiving an error response from a remote node
|
||||
stating they do not support a specific cluster or shard.
|
||||
For example, when using a request-response service such as [`19/WAKU2-LIGHTPUSH`](/waku/standards/core/19/lightpush.md).
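
The storage and TTL suggestions in this section can be sketched as a small cache keyed by libp2p peer id. This is only an illustration: the `PeerMetadataCache` class is hypothetical, and the injectable clock exists to make the expiry behaviour easy to test.

```python
import time

METADATA_TTL_SECONDS = 6 * 60 * 60  # the RECOMMENDED 6 hours


class PeerMetadataCache:
    """Remembers each peer's metadata and reports when it should be refreshed."""

    def __init__(self, ttl: float = METADATA_TTL_SECONDS, clock=time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # peer id -> (stored_at, cluster_id, shards)

    def store(self, peer_id: str, cluster_id: int, shards) -> None:
        self._entries[peer_id] = (self._clock(), cluster_id, frozenset(shards))

    def get(self, peer_id: str):
        """Return (cluster_id, shards), or None if unknown or expired."""
        entry = self._entries.get(peer_id)
        if entry is None:
            return None
        stored_at, cluster_id, shards = entry
        if self._clock() - stored_at >= self._ttl:
            return None
        return cluster_id, shards

    def needs_request(self, peer_id: str) -> bool:
        """True on first contact with a peer or once its entry has expired."""
        return self.get(peer_id) is None


now = [0.0]  # fake clock so the example does not wait six hours
cache = PeerMetadataCache(clock=lambda: now[0])
cache.store("peerA", cluster_id=1, shards=[0, 1])
fresh = cache.needs_request("peerA")
now[0] = METADATA_TTL_SECONDS
stale = cache.needs_request("peerA")
```

An error response about an unsupported cluster or shard could be handled by sending a new request and calling `store` again with the result.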
|
||||
|
||||
### Providing Cluster Id
|
||||
|
||||
A node MUST include its cluster id in its metadata payload.
|
||||
It is RECOMMENDED for a node to operate on a single cluster id.
|
||||
|
||||
### Providing Shard Information
|
||||
|
||||
* Nodes that mount [`11/WAKU2-RELAY`](/waku/standards/core/11/relay.md) MAY include the shards they are subscribed to in their metadata payload.
|
||||
* Shard-relevant services are message related services,
|
||||
such as [`13/WAKU2-STORE`](/waku/standards/core/13/store.md), [`12/WAKU2-FILTER`](/waku/standards/core/12/filter.md)
and [`19/WAKU2-LIGHTPUSH`](/waku/standards/core/19/lightpush.md),
but not [`34/WAKU2-PEER-EXCHANGE`](/waku/standards/core/34/peer-exchange.md).
|
||||
* Nodes that mount [`11/WAKU2-RELAY`](/waku/standards/core/11/relay.md) and a shard-relevant service SHOULD include the shards they are subscribed to in their metadata payload.
|
||||
* Nodes that do not mount [`11/WAKU2-RELAY`](/waku/standards/core/11/relay.md) SHOULD NOT include any shard information
|
||||
|
||||
### Using Cluster Id
|
||||
|
||||
When reading the cluster id of a remote peer, the local node MAY disconnect if its own cluster id differs from that of the remote peer.
|
||||
|
||||
### Using Shard Information
|
||||
|
||||
It is NOT RECOMMENDED to disconnect from a peer based on the fact that their shard information is different from the local node.
|
||||
|
||||
Ahead of doing a shard-relevant request,
|
||||
a node MAY use the previously received metadata shard information to select a peer that supports the targeted shard.
|
||||
|
||||
For non-shard-relevant requests, a node SHOULD NOT discriminate against a peer based on metadata shard information.
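
The two selection rules above can be sketched as a single helper. The function name and the shape of the `peers` mapping (peer id to the set of shards from its metadata) are assumptions made for the example.

```python
def candidate_peers(peers: dict, target_shard=None) -> list:
    """Return peer ids eligible for a request, in a stable order.

    `peers` maps a peer id to the set of shards it reported in its metadata.
    `target_shard` is None for requests that are not shard-relevant.
    """
    if target_shard is None:
        # Non-shard-relevant request: shard information is not considered.
        return sorted(peers)
    # Shard-relevant request: prefer peers that advertise the targeted shard.
    return sorted(p for p, shards in peers.items() if target_shard in shards)


known = {"peerA": {0, 1}, "peerB": {2}, "peerC": set()}
for_shard_2 = candidate_peers(known, target_shard=2)
for_any = candidate_peers(known)
```

Peers that reported no shards are excluded from shard-relevant requests in this sketch; an application could instead fall back to them when no advertised match exists.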
|
||||
|
||||
## Copyright
|
||||
|
||||
Copyright and related rights waived via
|
||||
|
||||