---
title: NOMOS-DATA-AVAILABILITY-PROTOCOL
name: Nomos Data Availability Protocol
status: raw
tags: nomos
editor:
contributors:
---
## Abstract
This specification describes the data availability protocol for the Nomos network.
Nomos provides several services for network states to create efficient ecosystems.
Data availability is an important problem that network states need to solve.
## Background
Nomos is a cluster of blockchains known as zones.
Zones are layer 2 blockchains that utilize Nomos to maintain sovereignty.
They are initialized by the Nomos network and can utilize Nomos services, but
provide resources on their own.
They can define their own state as they are sovereign networks.
Nomos provides tools at the global layer that allow zones to define arbitrary configurations.
Nomos has two global layers offering services to zones.
The base layer provides data availability guarantees to zones that choose to utilize it.
The second layer is the coordination layer, which enables state transition verification through zero-knowledge validity proofs.
The base layer allows users with resource-limited devices, also known as light clients,
the ability to obtain all block data and process it locally.
Light clients should be able to access blockchain data similarly to a full node.
To achieve this,
the Nomos data availability protocol provides guarantees that transaction data within Nomos zones is valid.
## Motivation and Goal
Decentralized blockchains require full nodes to verify network transactions by downloading all the data of the network.
This becomes a problem as the blockchain data grows: full nodes will need more resources to download and
store the data while maintaining a connection to the network.
Light nodes, on the other hand, do not download the entire network data because of their resource-limited nature.
This restricts the network from scaling, as the network is reliant on full nodes to process transactions,
and requires light nodes to rely on centralized parties.
A blockchain should allow light nodes to prove the validity of transaction data
without requiring light nodes to download all the blockchain data.
The data availability service on the Nomos base layer is used by zones for data availability guarantees.
This allows participants of a zone to access blockchain data in the event that nodes within a zone do not make the data available.
The service includes data encoding, verification, a data availability sampling mechanism,
and a data retrieval API to solve the data availability problem.
### Definitions
| Terminology | Description |
| --------------- | --------- |
| provider nodes | Nomos base layer nodes that store data chunks and commitments to provide data availability. |
| dispersal nodes | Nomos nodes that encode zone block data and send the resulting chunks to provider nodes. |
| Nomos Zone | A sovereign layer 2 blockchain initialized by the Nomos network. |
| light clients | Low-resource nodes that verify data availability without downloading all block data. |
## Specification
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
The data availability service of the Nomos base layer consists of different node roles: dispersal clients,
data availability sampling nodes, and data availability provider nodes.
All network participants perform data sampling and verification.
Nodes MAY decide to provide resources to the data availability service of the Nomos base layer,
join a zone as a dispersal node, be a light client, or
take on a combination of these roles.
Limited-resource roles,
like a dispersal client or data availability sampling node,
utilize Nomos zones to create or retrieve blockchain transactions.
A light client SHOULD NOT download large amounts of block data owned by zones:
- it MAY selectively validate zero-knowledge proofs from the [Nomos Coordination Layer](#TODO),
- it MAY verify data availability of the base layer for zones it prefers.
Data availability on the base layer is only a temporary guarantee.
The data can only be verified for a predetermined time, determined by the Nomos network.
The base layer MUST NOT provide long-term incentives or
allocate resources to repair missing data.
It is the responsibility of zones to make blockchain data available.
In the event that light clients cannot access data,
they MAY utilize the data availability service of the Nomos base layer.
### Base Layer Nodes
Base layer nodes offering data availability MUST NOT process or validate block data:
- they MUST store proof commitments,
- they MUST store data chunks of zone block data,
- they provide data availability for a limited amount of time.
The role of a provider node is to store polynomial commitments for Nomos zones.
Provider nodes MUST join a membership-based list using libp2p
to announce participation in a subnet,
which is a group of Nomos data availability provider nodes.
Nodes MUST register during a proof-of-validator stage where public keys are verified and
a node enters a subnet.
The RECOMMENDED number of provider nodes within a subnet is 4096.
Nodes registered within a subnet are connected with each other for data passing.
The list MUST be used by light nodes and
zones to select a node within a subnet to send data chunks to.
The data stored by provider nodes MUST NOT be interpreted or accessed,
except when sending data for [data availability sampling](#data-availability-sampling), or
block reconstruction by light clients.
#### Message Passing
Nodes that participate in a Nomos zone are considered to be Nomos base-layer nodes.
The Nomos base layer utilizes a libp2p publish/subscribe implementation to handle message passing between nodes in the network.
All base-layer nodes MUST be assigned to a data availability `pubsub-topic`.
Node configurations SHOULD define a `pubsub-topic` that is shared by all data availability nodes:
```rs
pubsub-topic = 'DA_TOPIC';
```
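To illustrate, here is a minimal sketch of nodes sharing the topic; the `Pubsub` class is a hypothetical in-process stand-in for a libp2p gossipsub binding and is not part of this specification:
```python
# Minimal sketch: all DA nodes subscribe to the shared topic.
# `Pubsub` is a hypothetical stand-in for a libp2p gossipsub implementation.
DA_TOPIC = "DA_TOPIC"

class Pubsub:
    def __init__(self):
        self.handlers = {}

    def subscribe(self, topic, handler):
        # Register a callback for messages published on `topic`.
        self.handlers.setdefault(topic, []).append(handler)

    def publish(self, topic, message):
        # Deliver `message` to every subscriber of `topic`.
        for handler in self.handlers.get(topic, []):
            handler(message)

def on_da_message(message):
    # A provider node would decode and verify a dispersed chunk here.
    print("received:", message)

pubsub = Pubsub()
pubsub.subscribe(DA_TOPIC, on_da_message)
pubsub.publish(DA_TOPIC, b"encoded column data")
```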
#### Sending Data
Zones are responsible for creating data chunks that need to be stored on the blockchain.
The data SHOULD be sent to provider nodes.
#### Encoding and Verification
The Nomos protocol allows nodes within a zone to encode data chunks using Reed-Solomon codes and KZG commitments.
Data is divided into finite field elements and organized into
a two-dimensional array, also known as a matrix,
where data is arranged into rows and columns.
For example, a matrix represented as $Data$ holds block data divided into chunks,
where each chunk is represented as ${ \Large c_{jk} }$:
$${ \Large Data = \begin{bmatrix} c_{11} & c_{12} & c_{13} & c_{...} & c_{1k} \cr c_{21} & c_{22} & c_{23} & c_{...} & c_{2k} \cr c_{31} & c_{32} & c_{33} & c_{...} & c_{3k} \cr c_{...} & c_{...} & c_{...} & c_{...} & c_{...} \cr c_{j1} & c_{j2} & c_{j3} & c_{...} & c_{jk} \end{bmatrix}}$$
Each row is a chunk of data and each column is considered a piece.
So there are ${ \Large k }$ data pieces, each containing ${ \Large j }$ data chunks.
- Each chunk SHOULD be limited in byte size.
For every row ${ \Large i }$,
there is a unique polynomial ${ \Large f_{i} }$ such that ${ \Large c_{ig} = f_{i}(w^{(g-1)}) }$,
for ${ \Large i = 1,\dots,j }$ and ${ \Large g = 1,\dots,k }$.
The KZG commitment value for each row polynomial is computed from that row's chunks:
$${ \Large f_{i} = \text{Interpolate}(c_{i1}, c_{i2}, c_{i3},\dots, c_{ik}) }$$ and ${ \Large r_{i} = com(f_{i}) }$.
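As a sketch of this step, the following splits a blob into a matrix of rows and commits to each row; the chunk size and the Blake2b-based `com()` placeholder are assumptions standing in for field elements and a real KZG commitment:
```python
# Minimal sketch of chunking a blob into a j x k matrix and committing to rows.
# CHUNK_SIZE and the placeholder com() are illustrative assumptions;
# a real implementation would use field elements and a KZG library.
import hashlib

CHUNK_SIZE = 31  # bytes per chunk, sized to fit a field element

def to_matrix(blob: bytes, k: int) -> list[list[bytes]]:
    # Split the blob into fixed-size chunks and arrange them into rows of k pieces.
    chunks = [blob[i:i + CHUNK_SIZE] for i in range(0, len(blob), CHUNK_SIZE)]
    while len(chunks) % k != 0:
        chunks.append(b"\x00" * CHUNK_SIZE)  # pad the last row
    return [chunks[r:r + k] for r in range(0, len(chunks), k)]

def com(row: list[bytes]) -> bytes:
    # Placeholder commitment: a real system would interpolate f_i and
    # produce the KZG commitment r_i = com(f_i).
    return hashlib.blake2b(b"".join(row)).digest()

matrix = to_matrix(b"example zone block data", k=4)
row_commitments = [com(row) for row in matrix]
```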
##### Reed-Solomon Encoding
The Nomos protocol REQUIRES data to be encoded using Reed-Solomon encoding after the data blob is divided into chunks,
placed into a matrix of rows and columns, and
KZG commitments are computed for each data piece.
Encoding allows zones to ensure the security and integrity of their blockchain data.
Using Reed-Solomon encoding, the matrix from the previous step is extended along its rows for redundancy.
Each row polynomial is evaluated at new points: ${ \Large c_{ig} = f_{i}(w^{(g-1)}) }$ for ${ \Large g = k+1, k+2, \dots, n }$.
The extended data can be represented as:
$${ \Large Extended Data = \begin{bmatrix} c_{11} & c_{12} & c_{...} & c_{1k} & c_{1(k+1)} & c_{1(k+2)} & c_{...} & c_{1(2k)} \cr c_{21} & c_{22} & c_{...} & c_{2k} & c_{...} & c_{...} & c_{...} & c_{...} \cr c_{31} & c_{32} & c_{...} & c_{3k} & c_{...} & c_{...} & c_{...} & c_{...} \cr c_{...} & c_{...} & c_{...} & c_{...} & c_{...} & c_{...} & c_{...} & c_{...} \cr c_{j1} & c_{j2} & c_{...} & c_{jk} & c_{j(k+1)} & c_{j(k+2)} & c_{...} & c_{j(2k)} \end{bmatrix}}$$
- The code has a rate of 1/2 (an expansion factor of 2), so ${ \Large n = 2k }$
- Calculate each extended row chunk and its proof: ${ \Large eval(f_{i}, w^{(g-1)}) \rightarrow c_{ig}, \pi_{c_{ig}} }$
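A minimal sketch of the row extension follows, assuming a toy prime field and an evaluation domain of consecutive integers rather than the roots of unity $w^{g}$ a production implementation would use:
```python
# Minimal sketch of rate-1/2 Reed-Solomon row extension over a prime field.
# PRIME and the 0..n-1 evaluation domain are illustrative assumptions.
PRIME = 2**31 - 1  # small Mersenne prime, for illustration only

def lagrange_eval(ys: list[int], x: int) -> int:
    # Evaluate at x the polynomial interpolating (0, ys[0]), (1, ys[1]), ...
    total = 0
    for i, y in enumerate(ys):
        num, den = 1, 1
        for m in range(len(ys)):
            if m != i:
                num = num * (x - m) % PRIME
                den = den * (i - m) % PRIME
        total = (total + y * num * pow(den, -1, PRIME)) % PRIME
    return total

def extend_row(row: list[int]) -> list[int]:
    # Extend k chunks to n = 2k by evaluating the row polynomial at new points.
    k = len(row)
    return row + [lagrange_eval(row, x) for x in range(k, 2 * k)]

extended = extend_row([5, 11, 17, 23])  # n = 8 chunks; any 4 recover the row
```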
##### Hash and Commitment Value of Columns
Next, a dispersal client calculates the commitment for the entries of each column using KZG commitments.
Assume ${ \Large j = 1,\dots,2k }$:
Each column contains ${ \Large \ell }$ data chunks, one per row.
Using Lagrange interpolation, we can calculate the unique polynomial defined by these chunks.
Let's denote this polynomial as $\theta_j$.
The commitment values for each column are calculated as follows:
${\Large \theta_j=\text{Interpolate}(data_1^j,data_2^j,\dots,data_\ell^j)}$
${ \Large C_j=com(\theta_j)}$
- In this protocol, we use an elliptic curve group,
thus the $C_j$ values are also elliptic curve points.
Let us represent the $x$-coordinate of $C_j$ as $C_j^x$ and the $y$-coordinate of $C_j$ as $C_j^y$.
Given $C_j^x$ and one bit of $C_j^y$, one can reconstruct $C_j$.
Therefore, there is no need to use both coordinates of $C_j$.
However, for the sake of simplicity in the representation, we use only the value $C_j$ for now.
- We also calculate the hash of the column data such that:
$H_j=Hash(01\,data_1^j\,||\,02\,data_2^j\,||\,\dots\,||\,0\ell\,data_\ell^j)$
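A sketch of the column hash computation; Blake2b is an assumption here (the attestation step later mandates Blake2), and chunk positions are encoded as two-byte prefixes mirroring the $01, 02, \dots$ prefixes above:
```python
# Minimal sketch of the column hash H_j: each chunk is prefixed with its
# row position before hashing. Blake2b is an assumed hash function.
import hashlib

def column_hash(column: list[bytes]) -> bytes:
    # H_j = Hash(01 || data_1^j || 02 || data_2^j || ...)
    h = hashlib.blake2b()
    for position, chunk in enumerate(column, start=1):
        h.update(position.to_bytes(2, "big"))
        h.update(chunk)
    return h.digest()

h_j = column_hash([b"chunk-row-1", b"chunk-row-2", b"chunk-row-3"])
```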
##### Aggregate Column Commitment
The position integrity of each column within the full data blob is provided by a new aggregate column commitment.
To link the columns to one another, we calculate a new commitment value.
The pairs $\{H_j, C_j\}$ can be considered entries of a new vector, assumed to be in evaluation form.
In this case, calculate a new polynomial $\Phi$ and vector commitment value $C_{agg}$ as follows:
$\Phi=\text{Interpolate}(H_1, C_1,H_2, C_2,\dots,H_n, C_n)$
$C_{agg}=com(\Phi)$
Also calculate the proof value $\pi_{H_j,C_j}$ for each column.
Data chunks are sent with the aggregate commitment, the list of row commitments for the entire data blob, and
the column commitment for the specific data chunk.
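A sketch of the aggregation step, with `interpolate()` and `com()` as placeholder callables standing in for a real KZG backend:
```python
# Minimal sketch of the aggregate column commitment C_agg.
# `interpolate` and `com` are placeholders for a real KZG backend;
# the vector interleaves the H_j and C_j values.
def aggregate_column_commitment(column_hashes, column_commitments, interpolate, com):
    entries = []
    for h_j, c_j in zip(column_hashes, column_commitments):
        entries.extend([h_j, c_j])  # (H_1, C_1, H_2, C_2, ..., H_n, C_n)
    phi = interpolate(entries)      # the polynomial Phi in evaluation form
    return com(phi)                 # C_agg = com(Phi)
```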
##### Dispersal
###### Verification Process
Once encoded,
the data is dispersed to different Nomos data availability provider nodes that have joined a subnet on the base layer.
It is RECOMMENDED that the dispersal client sends a column to 4096 provider nodes for better bandwidth optimization.
A dispersal client sends the following:
```python
class EncodedData:
    column_data: List[Chunk]
    extended_matrix: ChunkMatrix
    row_commitments: List[Commitment]
row_proofs: List[List[Proof]]
column_commitment: List[Commitment]
aggregated_column_commitment: Commitment
aggregated_column_proofs: List[Proof]
```
These values are represented as:
- `extended_matrix` : ${ \Large data_i^j }$
- `row_commitments` : ${ \Large \{r_1,r_2,\dots,r_{\ell}\} }$
- `row_proofs` : ${ \Large \{\pi^j_{r_1},\pi^j_{r_2}, \dots,\pi^j_{r_\ell}\} }$
- `column_data` : ${ \Large \{data_1^j,data_2^j,\dots,data_\ell^j\} }$
- `column_commitment` : ${ \Large C_{j} }$
- `aggregated_column_commitment` : ${ \Large C_{agg} }$
- `aggregated_column_proofs` : ${ \Large \pi_{H_j,C_j} }$
When a provider node receives data chunks from dispersal nodes,
the data chunks are stored in the provider node's memory.
The following steps SHOULD occur once data is received by a provider node
(the sketch after this list combines them):
1. Check the `aggregated_column_proofs` and verify the proofs.
The zone calculates the $eval$ value and sends it to $node_j$.
${ \Large eval(\Phi,w^{j-1})\to H_j, C_j }$, ${ \Large \pi_{H_j,C_j} }$
2. Calculate the `column_commitment` from the received data.
${ \Large \theta'_j=\text{Interpolate}(data_1^j,data_2^j,\dots,data_\ell^j) }$
This value SHOULD be equal to ${ \Large C_j }$ : ${ \Large C_j\stackrel{?}{=}com(\theta'_j) }$
3. Calculate the hash of the `column_data`:
${ \Large H_j=Hash(01\,data_1^j\,||\,02\,data_2^j\,||\,\dots\,||\,0\ell\,data_\ell^j)}$
Then verify the aggregated column proof against ${ \Large (H_j, C_j) }$:
${ \Large verify(C_{agg},(H_j,C_j),\pi_{H_j,C_j})\to true/false }$
4. For each `row_commitment`, verifies the proof of every chunk against its corresponding row commitment:
${ \Large verify(r_i, data_i^j, \pi_{r_i}^j)\to true/false }$
If all verification steps are true, this proves that the data has been encoded correctly.
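Putting the steps together, a minimal sketch of the checks a provider node runs on one received column, with `verify()`, `com()`, `interpolate()`, and `column_hash()` as placeholder helpers a real KZG backend would provide:
```python
# Minimal sketch of the provider-node verification flow for one column.
# All cryptographic helpers are placeholders; row_proofs holds this
# column's proof against each row commitment r_i.
def verify_column(column_data, column_commitment, row_commitments, row_proofs,
                  aggregated_commitment, aggregated_proof,
                  interpolate, com, verify, column_hash):
    # Steps 1 and 3: recompute H_j and check the aggregated column proof.
    h_j = column_hash(column_data)
    if not verify(aggregated_commitment, (h_j, column_commitment), aggregated_proof):
        return False
    # Step 2: recompute theta'_j and compare commitments.
    if com(interpolate(column_data)) != column_commitment:
        return False
    # Step 4: verify every chunk against its row commitment.
    for r_i, chunk, proof in zip(row_commitments, column_data, row_proofs):
        if not verify(r_i, chunk, proof):
            return False
    return True
```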
### VID Certificate
A verifiable information dispersal certificate, or VID certificate,
is a list of attestations from data availability nodes.
It is used to verify that the data chunks have been dispersed properly amongst nodes in the base layer.
The provider node signs an attestation that contains the hash value of the `row_commitments` and
of the `aggregated_column_commitment`.
Signatures are verified by dispersal clients and
valid signatures SHOULD be added to the VID certificate.
For every provider node $j$, assuming $sk_j$ is the private key, a signature is generated as follows:
${ \Large \sigma_j=Sign(sk_j, hash(C_{agg}, r_1,r_2,\dots,r_{\ell})) }$
The provider node sends the signed attestation back to the zone's dispersal clients, confirming the data has been received and
verified.
Once a dispersal client verifies that data chunks have been hashed and signed by the base layer,
the VID certificate SHOULD be created.
The attestation is created with the following values:
```rs
// Provider node SHOULD hash using the Blake2 algorithm
// blob_hash : hash of row_commitments and aggregated_column_commitment
fn send_attestation() {
    let attestation_hash = hash(blob_hash, da_node);
}
```
The VID certificate is then sent to the block builder to be accepted through consensus,
as described in [Cryptarchia](#).
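As an illustration of the attestation, a minimal sketch assuming Ed25519 signatures via the `cryptography` package and Blake2b hashing (the spec mandates Blake2 for hashing but does not fix a signature scheme here):
```python
# Minimal sketch: sigma_j = Sign(sk_j, hash(C_agg, r_1, ..., r_l)).
# Ed25519 is an assumed signature scheme, for illustration only.
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def attest(sk: Ed25519PrivateKey, c_agg: bytes, row_commitments: list[bytes]) -> bytes:
    h = hashlib.blake2b()
    h.update(c_agg)
    for r_i in row_commitments:
        h.update(r_i)
    # The signature is returned to the dispersal client for the VID certificate.
    return sk.sign(h.digest())

sk_j = Ed25519PrivateKey.generate()
sigma_j = attest(sk_j, b"c_agg", [b"r_1", b"r_2"])
```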
### Data Availability Sampling
Light nodes MAY choose to be data availability sampling nodes.
These nodes can participate in any other Nomos service while providing verification of the data dispersal service.
For example, a dispersal client can send data to be made available through the base layer and
decide to perform data availability sampling to gain greater assurance that the data is available.
This reduces the potential threat of malicious or
faulty nodes not replicating data in their subnets.
The following steps are REQUIRED for a data availability sampling node to verify data dispersal (a sketch follows the list):
1. Choose a random column value and row value from base layer provider nodes.
The light node requests openings of $C_t$ and $r_{t'}$.
2. Assuming provider node $node_t$, it calculates the $eval$ value for the `column_commitment`.
It also calculates the `row_commitment` value $r_{t'}$ and its proof,
then sends these values to the sampling node.
${ \Large eval(\Phi,w^{t-1})\to C_t,\pi_{C_t} }$
3. The sampling node verifies the `row_commitment` and `column_commitment` openings against the aggregate commitment:
${ \Large verify(C_{agg},C_t,\pi_{C_t}) \to true/false }$
4. If this proof is valid, the light node requests an opening of the column commitment.
$node_t$ calculates the $eval$ value and sends it to the light node to be verified.
${ \Large eval(\theta_t,w^{t'-1})\to data_{t'}^{t},\pi_{data_{t'}^{t}} }$
${ \Large verify(C_t, data_{t'}^t, \pi_{data_{t'}^t})\to true/false }$
If this verification passes, the data chunk has been encoded correctly.
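A minimal sketch of one sampling round, assuming a `request_opening` RPC to the provider node and a placeholder `verify()` helper; none of these names are fixed by this document:
```python
# Minimal sketch of one data availability sampling round.
# `request_opening` and `verify` are placeholder helpers.
import random

def sample_once(c_agg, num_columns, num_rows, request_opening, verify):
    t = random.randrange(num_columns)     # random column index
    t_prime = random.randrange(num_rows)  # random row index
    # node_t returns the column opening and the sampled chunk opening.
    c_t, proof_c_t, chunk, proof_chunk = request_opening(t, t_prime)
    # Check the column commitment C_t against the aggregate commitment C_agg.
    if not verify(c_agg, c_t, proof_c_t):
        return False
    # Check the sampled chunk against the column commitment C_t.
    return verify(c_t, chunk, proof_chunk)
```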
### Blockchain Data
The block data is stored by nodes within zones and can be retrieved using the [read API](#).
A block producer, which MAY also be a base-layer provider node,
MUST choose certificates to add to a new block from the base-layer mempool in the order they were received.
A block contains a list of VID certificates.
Once a new block for a zone is created,
it MUST be sent to the base layer to be persisted for a short period of time.
A zone MAY choose to use alternative methods to persist block data, like decentralized storage solutions.
A provider node verifies that the signatures within the block match data that is also stored in the node's memory.
If the node has the same data,
the block SHOULD be persisted.
If the node does not have the data,
the block SHOULD be skipped.
Light nodes are not REQUIRED to download all the blockchain data belonging to a zone.
To fulfill this requirement,
zone participants MAY utilize the data availability of the base layer to retrieve block data and
pay for this resource with the native token.
Other nodes within the zones are REQUIRED to download block data for all preferred zones.
After the block producer verifies the VID certificates,
the following data is included in the hash for the next block in a zone and stored on the blockchain:
- `CertificateID`: a hash of the VID certificate (including ${ \Large C_{agg} }$ and the signatures from DA nodes)
- `AppId`: the application identifier for the specific application (zone) of the data chunk
- `Index`: a number for the particular sequence or position of the data chunk within the context of its `AppId`
Block producers receive certificates from zones along with the metadata `AppId` and
`Index`.
The metadata values are also stored in the block.
### Data Availability Core API
Data availability nodes utilize `read` and `write` API functions.
The `read` function allows a node to query for information, and
the `write` function provides communication for multiple services.
Data chunks are encoded as described above in [Encoding and Verification](#) and
delivered using the message passing protocol described above in [Message Passing](#).
The API functions are detailed below:
```python
class Chunk:
    def __init__(self, data, app_id, index):
        self.data = data
        self.app_id = app_id
        self.index = index

class Metadata:
    def __init__(self, app_id, index):
        self.app_id = app_id
        self.index = index

class Certificate:
    def __init__(self, proof, chunks_info):
        self.proof = proof
        self.chunks_info = chunks_info

class Block:
    def __init__(self, certificates):
        self.certificates = certificates

def receive_chunk():
    # Receives from the network new chunks to be processed
    # Returns a tuple of (Chunk, Metadata)
    chunk = Chunk(data="chunk_data", app_id="app_id", index="index")
    metadata = Metadata(app_id="app_id", index="index")
    return chunk, metadata

def receive_block():
    # Reads from the blockchain the latest blocks added
    # Returns a Block
    certificate = Certificate(proof="proof", chunks_info="chunks_info")
    block = Block(certificates=[certificate])
    return block

def write_to_cache(chunk, metadata):
    # Logic to write the chunk {metadata.index} to cache
    pass

def write_to_storage(certificate):
    # Logic to write data to storage based on the certificate.proof
    pass

def da_node():
    while True:
        # Receiving chunk and metadata
        chunk, metadata = receive_chunk()
        write_to_cache(chunk, metadata)
        # Receiving a block
        block = receive_block()
        for certificate in block.certificates:
            write_to_storage(certificate)
```
- `receive_chunk` - Receives new chunks to be processed
- `receive_block` - Receives latest blocks added to the blockchain
- `write_to_cache` - Stores a newly received chunk in the cache
- `write_to_storage` - Used when a certificate for a zone's data is observed on the blockchain
### Security Considerations
## Copyright
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
## References