mirror of
https://github.com/vacp2p/rfc-index.git
synced 2026-01-09 23:58:02 -05:00
update erasure
Corrected spelling errors and improved clarity in the erasure coding documentation.
## Abstract

This specification describes the erasure coding technique used by Codex clients.
A Codex client will encode a dataset before it is stored on the network.

## Background

The Codex protocol uses storage proofs to verify whether a storage provider (SP) is storing a certain dataset.
Before a dataset can be retrieved from the network,
SPs must agree to store the dataset for a certain period of time.
While a storage request is active,
erasure coding helps ensure the dataset remains retrievable from the network.
This is achieved by chunking the dataset;
the chunks can then be restored during retrieval through erasure coding.
When data blocks are abandoned by storage providers,
the requester can still be assured of data retrievability.

The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and
“OPTIONAL” in this document are to be interpreted as described in [2119](https://www.ietf.org/rfc/rfc2119.txt).

A client SHOULD perform the erasure encoding locally before providing a dataset to the network.
During validation, nodes will conduct error correction and decoding based on the erasure coding technique known to the network.
Datasets using encodings not recognized by the network MAY be ignored during decoding and
validation by other nodes in the network.

The dataset SHOULD be split into data chunks represented by `k`, e.g. $(k_1, k_2, k_3, \ldots, k_{n})$.
Each chunk `k` MUST be encoded into `n` blocks using an erasure encoding technique such as the Reed-Solomon algorithm,
including a set of parity blocks, represented by `m`, that MUST be generated.
All node roles on the Codex network use the [Leopard Codec](https://github.com/catid/leopard).

Below is the encoding process:

1. Prepare the dataset for the marketplace using erasure encoding.
2. Derive a manifest CID from the root of the encoded blocks.
3. Perform error correction by validator nodes once the storage contract begins.
4. Decode the data back to the original dataset.

### Encoding

A client MAY prepare a dataset locally before making the request to the network.
The data chunks, `k`, MUST be the same size; if not,
the smaller chunk MAY be padded with empty data.

The data blocks are encoded based on the following parameters:

```js
struct encodingParms {
  ecK: int,          // Number of data blocks (K)
  ecM: int,          // Number of parity blocks (M)
  rounded: int,      // Dataset size rounded to a multiple of K
  steps: int,        // Number of encoding iterations (steps)
  blocksCount: int,  // Total blocks after encoding
  strategy: enum,    // Indexing strategy used
}
```

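One plausible way the derived fields relate to each other is sketched below. The formulas are an assumption for illustration, not taken from the Codex implementation: each encoding step processes one group of `K` blocks, the dataset is padded up to a whole number of such groups, and every step adds `M` parity blocks.

```python
# Hypothetical derivation of the encodingParms fields above; the exact
# formulas are an assumption, not confirmed by the Codex source.
import math

def encoding_params(dataset_blocks: int, ec_k: int, ec_m: int) -> dict:
    steps = math.ceil(dataset_blocks / ec_k)   # one encoding iteration per K-sized group
    rounded = steps * ec_k                     # dataset padded to a multiple of K
    blocks_count = rounded + steps * ec_m      # padded data blocks plus parity blocks
    return {
        "ecK": ec_k,
        "ecM": ec_m,
        "rounded": rounded,
        "steps": steps,
        "blocksCount": blocks_count,
    }
```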
After the erasure coding process,
a protected manifest SHOULD be generated for the dataset, which stores the CID of the root Merkle tree.
The content of the protected manifest is shown below; see [CODEX-MANIFEST](./manifest.md) for more information:

```js
syntax = "proto3";

message verifiable {
  string verifyRoot = 1;            // Root of verification tree with CID
  repeated string slot_roots = 2;   // List of individual slot roots with CID
  uint32 cellSize = 3;              // Size of verification cells
  string verifiableStrategy = 4;    // Strategy for verification
}

message ErasureInfo {
  optional uint32 ecK = 1;                            // number of encoded blocks
  optional uint32 ecM = 2;                            // number of parity blocks
  optional bytes originalTreeCid = 3;                 // cid of the original dataset
  optional uint32 originalDatasetSize = 4;            // size of the original dataset
  optional VerificationInformation verification = 5;  // verification information
}

message Manifest {
  optional bytes treeCid = 1;         // cid (root) of the tree
  optional uint32 blockSize = 2;      // size of a single block
  optional uint64 datasetSize = 3;    // size of the dataset
  optional MultiCodec codec = 4;      // Dataset codec
  optional MultiCodec hcodec = 5;     // Multihash codec
  optional CidVersion version = 6;    // Cid version
  optional ErasureInfo erasure = 7;   // erasure coding info
}
```

After the encoding process,
the dataset is ready to be stored on the network via the [CODEX-MARKETPLACE](./marketplace.md).
The Merkle tree root SHOULD be included in the manifest so other nodes are able to locate and
reconstruct a dataset from the erasure encoded blocks.

### Data Repair

Storage providers may have periods during a storage contract where they are not storing the data.
A validator node MAY store the `treeCid` from the `Manifest` to locate all the data blocks and
reconstruct the Merkle tree.
When a missing branch of the tree is not retrievable from an SP, data repair will be REQUIRED.
The validator will open a request for a new SP to reconstruct the Merkle tree and
store the missing data blocks.
The validator role is described in the [CODEX-MARKETPLACE](./marketplace.md) specification.

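A minimal sketch of how a stored tree root lets a validator detect that repair is needed. The sha256 binary pairing is an assumption for illustration; Codex's actual tree construction and CID hashing may differ.

```python
# Assumed sha256 binary Merkle tree; shows how a stored root (treeCid) lets a
# validator detect a missing or corrupted block and trigger repair.
import hashlib

def merkle_root(blocks: list[bytes]) -> bytes:
    """Compute a binary Merkle root over the block hashes."""
    level = [hashlib.sha256(b).digest() for b in blocks]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

def needs_repair(tree_cid: bytes, retrieved_blocks: list[bytes]) -> bool:
    """Repair is required when the recomputed root no longer matches treeCid."""
    return merkle_root(retrieved_blocks) != tree_cid
```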
During dataset retrieval, a node will use the `treeCid` to locate the data blocks.
The number of blocks retrieved by the node MUST be greater than `k`.
If fewer than `k` blocks are retrieved, the node MAY not be able to reconstruct the dataset.
The node SHOULD request missing data chunks from the network and
wait until the threshold is reached.

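The retrieval loop above can be sketched as follows. The `fetch_block` callback and block identifiers are hypothetical names for illustration, not Codex APIs.

```python
# Hypothetical retrieval loop: keep requesting blocks until at least k distinct
# blocks are held, then the dataset is decodable.

def retrieve_dataset(tree_cid, k, fetch_block, block_ids):
    """Fetch blocks until the decode threshold k is reached; None if unreachable."""
    held = {}
    for block_id in block_ids:
        block = fetch_block(tree_cid, block_id)  # may return None on a miss
        if block is not None:
            held[block_id] = block
        if len(held) >= k:                       # threshold reached: decodable
            return held
    return None                                  # fewer than k blocks: undecodable
```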
An adversarial storage provider can remove only the first element from more than half of the blocks,
and the slot data can no longer be recovered from the data that the host stores.
For example, with data blocks of size 1TB erasure coded into 256 data and parity shards,
an adversary could strategically remove 129 bytes, and
the data can no longer be fully recovered from the erasure-coded data that is present on the host.

The RECOMMENDED solution is to perform checks on entire shards to protect against adversarial erasure.
In the Merkle storage proofs, the entire shard SHOULD be hashed,
then that hash is checked against the Merkle proof.
Effectively, the block size for Merkle proofs should equal the shard size of the erasure coding interleaving.
Hashing large amounts of data will be expensive to perform in a SNARK, which is used to compress proofs in size in Codex.

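The 129-byte figure follows from byte-level interleaving, assuming the 256 shards split evenly into 128 data and 128 parity (an assumption; the text only says "data and parity shards"). Byte `i` of every shard belongs to the same codeword, so deleting the first byte of 129 shards erases 129 symbols from one codeword, one more than 128 parity symbols can repair:

```python
# Worked check of the adversarial-erasure arithmetic, assuming a 128-data /
# 128-parity split of the 256 shards (an illustrative assumption).

def codeword_recoverable(erased_symbols: int, parity_symbols: int) -> bool:
    """A Reed-Solomon codeword tolerates at most parity_symbols erasures."""
    return erased_symbols <= parity_symbols
```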
### Data Encryption

If data is not encrypted before entering the encoding process, nodes, including storage providers,
MAY be able to access the data.
This may lead to privacy concerns and the misuse of data.

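The ordering matters: it is the ciphertext, not the plaintext, that should be chunked and erasure-encoded. The sketch below shows only that ordering; the XOR keystream "cipher" is a toy stand-in and is NOT secure encryption.

```python
# Toy keystream cipher (sha256-based XOR) used ONLY to illustrate the
# encrypt-then-encode ordering; real clients should use a vetted cipher.
import hashlib

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR data with a sha256-derived keystream; applying twice decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))

# Pipeline order: ciphertext = toy_encrypt(plaintext, key), then the
# ciphertext is chunked and erasure-encoded before upload.
```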