From a60a2c45de49966c3550e0f5efe2fa4b081dd40f Mon Sep 17 00:00:00 2001 From: Jimmy Debe <91767824+jimstir@users.noreply.github.com> Date: Mon, 3 Feb 2025 03:52:35 -0500 Subject: [PATCH] 13/WAKU-STORE: Update (#124) Update 13/WAKU-STORE RFC to store v3 spec. Previous draft has been moved to `previous-versions` version number 00. --- .../core/13/previous-versions/00/store.md | 359 ++++++++++ waku/standards/core/13/store.md | 612 ++++++++++-------- 2 files changed, 707 insertions(+), 264 deletions(-) create mode 100644 waku/standards/core/13/previous-versions/00/store.md diff --git a/waku/standards/core/13/previous-versions/00/store.md b/waku/standards/core/13/previous-versions/00/store.md new file mode 100644 index 0000000..a80b91e --- /dev/null +++ b/waku/standards/core/13/previous-versions/00/store.md @@ -0,0 +1,359 @@ +--- +slug: 13 +title: 13/WAKU2-STORE +name: Waku v2 Store +status: draft +tags: waku-core +version: 00 +editor: Simon-Pierre Vivier +contributors: + - Dean Eigenmann + - Oskar Thorén + - Aaryamann Challani + - Sanaz Taheri + - Hanno Cornelius +--- + +## Abstract + +This specification explains the `13/WAKU2-STORE` protocol +which enables querying of messages received through the relay protocol and +stored by other nodes. +It also supports pagination for more efficient querying of historical messages. + +**Protocol identifier***: `/vac/waku/store/2.0.0-beta4` + +## Terminology + +The term PII, Personally Identifiable Information, +refers to any piece of data that can be used to uniquely identify a user. +For example, the signature verification key, and +the hash of one's static IP address are unique for each user and hence count as PII. + +## Design Requirements + +The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, +“SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and +“OPTIONAL” in this document are to be interpreted as described in [RFC2119](https://www.ietf.org/rfc/rfc2119.txt). 
+ +Nodes willing to provide the storage service using `13/WAKU2-STORE` protocol, +SHOULD provide a complete and full view of message history. +As such, they are required to be *highly available* and +specifically have a *high uptime* to consistently receive and store network messages. +The high uptime requirement makes sure that no message is missed out +hence a complete and intact view of the message history +is delivered to the querying nodes. +Nevertheless, in case storage provider nodes cannot afford high availability, +the querying nodes may retrieve the historical messages from multiple sources +to achieve a full and intact view of the past. + +The concept of `ephemeral` messages introduced in +[`14/WAKU2-MESSAGE`](../14/message.md) affects `13/WAKU2-STORE` as well. +Nodes running `13/WAKU2-STORE` SHOULD support `ephemeral` messages as specified in +[14/WAKU2-MESSAGE](../14/message.md). +Nodes running `13/WAKU2-STORE` SHOULD NOT store messages +with the `ephemeral` flag set to `true`. + +## Adversarial Model + +Any peer running the `13/WAKU2-STORE` protocol, i.e. +both the querying node and the queried node, are considered as an adversary. +Furthermore, +we currently consider the adversary as a passive entity +that attempts to collect information from other peers to conduct an attack but +it does so without violating protocol definitions and instructions. +As we evolve the protocol, +further adversarial models will be considered. +For example, under the passive adversarial model, +no malicious node hides or +lies about the history of messages +as it is against the description of the `13/WAKU2-STORE` protocol. + +The following are not considered as part of the adversarial model: + +- An adversary with a global view of all the peers and their connections. +- An adversary that can eavesdrop on communication links +between arbitrary pairs of peers (unless the adversary is one end of the communication). 
+In specific, the communication channels are assumed to be secure. + +## Wire Specification + +Peers communicate with each other using a request / response API. +The messages sent are Protobuf RPC messages which are implemented using +[protocol buffers v3](https://developers.google.com/protocol-buffers/). +The following are the specifications of the Protobuf messages. + +### Payloads + +```protobuf +syntax = "proto3"; + +message Index { + bytes digest = 1; + sint64 receiverTime = 2; + sint64 senderTime = 3; + string pubsubTopic = 4; +} + +message PagingInfo { + uint64 pageSize = 1; + Index cursor = 2; + enum Direction { + BACKWARD = 0; + FORWARD = 1; + } + Direction direction = 3; +} + +message ContentFilter { + string contentTopic = 1; +} + +message HistoryQuery { + // the first field is reserved for future use + string pubsubtopic = 2; + repeated ContentFilter contentFilters = 3; + PagingInfo pagingInfo = 4; +} + +message HistoryResponse { + // the first field is reserved for future use + repeated WakuMessage messages = 2; + PagingInfo pagingInfo = 3; + enum Error { + NONE = 0; + INVALID_CURSOR = 1; + } + Error error = 4; +} + +message HistoryRPC { + string request_id = 1; + HistoryQuery query = 2; + HistoryResponse response = 3; +} +``` + +#### Index + +To perform pagination, +each `WakuMessage` stored at a node running the `13/WAKU2-STORE` protocol +is associated with a unique `Index` that encapsulates the following parts. + +- `digest`: a sequence of bytes representing the SHA256 hash of a `WakuMessage`. +The hash is computed over the concatenation of `contentTopic` +and `payload` fields of a `WakuMessage` (see [14/WAKU2-MESSAGE](../14/message.md)). +- `receiverTime`: the UNIX time in nanoseconds +at which the `WakuMessage` is received by the receiving node. +- `senderTime`: the UNIX time in nanoseconds +at which the `WakuMessage` is generated by its sender. +- `pubsubTopic`: the pubsub topic on which the `WakuMessage` is received. 
+ +#### PagingInfo + +`PagingInfo` holds the information required for pagination. +It consists of the following components. + +- `pageSize`: A positive integer indicating the number of queried `WakuMessage`s +in a `HistoryQuery` +(or retrieved `WakuMessage`s in a `HistoryResponse`). +- `cursor`: holds the `Index` of a `WakuMessage`. +- `direction`: indicates the direction of paging +which can be either `FORWARD` or `BACKWARD`. + +#### ContentFilter + +`ContentFilter` carries the information required for filtering historical messages. + +- `contentTopic` represents the content topic of the queried historical `WakuMessage`. + This field maps to the `contentTopic` field of the [14/WAKU2-MESSAGE](../14/message.md). + +#### HistoryQuery + +RPC call to query historical messages. + +- The `pubsubTopic` field MUST indicate the pubsub topic +of the historical messages to be retrieved. +This field denotes the pubsub topic on which `WakuMessage`s are published. +This field maps to `topicIDs` field of `Message` in [`11/WAKU2-RELAY`](../11/relay.md). +Leaving this field empty means no filter on the pubsub topic +of message history is requested. +This field SHOULD be left empty in order to retrieve the historical `WakuMessage` +regardless of the pubsub topics on which they are published. +- The `contentFilters` field MUST indicate the list of content filters +based on which the historical messages are to be retrieved. +Leaving this field empty means no filter on the content topic +of message history is required. +This field SHOULD be left empty in order +to retrieve historical `WakuMessage` regardless of their content topics. +- `PagingInfo` holds the information required for pagination. +Its `pageSize` field indicates the number of `WakuMessage`s +to be included in the corresponding `HistoryResponse`. +It is RECOMMENDED that the queried node defines a maximum page size internally. 
+If the querying node leaves the `pageSize` unspecified, +or if the `pageSize` exceeds the maximum page size, +the queried node SHOULD auto-paginate the `HistoryResponse` +to no more than the configured maximum page size. +This allows mitigation of long response time for `HistoryQuery`. +In the forward pagination request, +the `messages` field of the `HistoryResponse` SHALL contain, at maximum, +the `pageSize` amount of `WakuMessage` whose `Index` +values are larger than the given `cursor` +(and vise versa for the backward pagination). +Note that the `cursor` of a `HistoryQuery` MAY be empty +(e.g., for the initial query), as such, and +depending on whether the `direction` is `BACKWARD` or +`FORWARD` the last or the first `pageSize` `WakuMessage` SHALL be returned, +respectively. + +#### Sorting Messages + +The queried node MUST sort the `WakuMessage` based on their `Index`, +where the `senderTime` constitutes the most significant part and +the `digest` comes next, and +then perform pagination on the sorted result. +As such, the retrieved page contains an ordered list of `WakuMessage` +from the oldest messages to the most recent one. +Alternatively, the `receiverTime` (instead of `senderTime`) +MAY be used to sort messages during the paging process. +However, it is RECOMMENDED the use of the `senderTime` +for sorting as it is invariant and +consistent across all the nodes. +This has the benefit of `cursor` reusability i.e., +a `cursor` obtained from one node can be consistently used +to query from another node. +However, this `cursor` reusability does not hold when the `receiverTime` is utilized +as the receiver time is affected by the network delay and +nodes' clock asynchrony. + +#### HistoryResponse + +RPC call to respond to a HistoryQuery call. + +- The `messages` field MUST contain the messages found, +these are [14/WAKU2-MESSAGE](../14/message.md) types. 
+- `PagingInfo` holds the paging information based +on which the querying node can resume its further history queries. +The `pageSize` indicates the number of returned Waku messages +(i.e., the number of messages included in the `messages` field of `HistoryResponse`). +The `direction` is the same direction as in the corresponding `HistoryQuery`. +In the forward pagination, the `cursor` holds the `Index` of the last message +in the `HistoryResponse` `messages` (and the first message in the backward paging). +Regardless of the paging direction, +the retrieved `messages` are always sorted in ascending order +based on their timestamp as explained in the [sorting messages](#sorting-messages)section, +that is, from the oldest to the most recent. +The requester SHALL embed the returned `cursor` inside its next `HistoryQuery` +to retrieve the next page of the [14/WAKU2-MESSAGE](../14/message.md). +The `cursor` obtained from one node SHOULD NOT be used in a request to another node +because the result may be different. +- The `error` field contains information about any error that has occurred +while processing the corresponding `HistoryQuery`. +`NONE` stands for no error. +This is also the default value. +`INVALID_CURSOR` means that the `cursor` field of `HistoryQuery` +does not match with the `Index` of any of the `WakuMessage` +persisted by the queried node. + +## Security Consideration + +The main security consideration to take into account +while using this protocol is that a querying node +have to reveal their content filters of interest to the queried node, +hence potentially compromising their privacy. + +## Future Work + +- **Anonymous query**: This feature guarantees that nodes +can anonymously query historical messages from other nodes i.e., +without disclosing the exact topics of [14/WAKU2-MESSAGE](../14/message.md) +they are interested in. 
+As such, no adversary in the `13/WAKU2-STORE` protocol +would be able to learn which peer is interested in which content filters i.e., +content topics of [14/WAKU2-MESSAGE](../14/message.md). +The current version of the `13/WAKU2-STORE` protocol does not provide anonymity +for historical queries, +as the querying node needs to directly connect to another node +in the `13/WAKU2-STORE` protocol and +explicitly disclose the content filters of its interest +to retrieve the corresponding messages. +However, one can consider preserving anonymity through one of the following ways: + - By hiding the source of the request i.e., anonymous communication. + That is the querying node shall hide all its PII in its history request + e.g., its IP address. + This can happen by the utilization of a proxy server or by using Tor. + Note that the current structure of historical requests + does not embody any piece of PII, otherwise, + such data fields must be treated carefully to achieve query anonymity. + + - By deploying secure 2-party computations in which the querying node + obtains the historical messages of a certain topic, + the queried node learns nothing about the query. + Examples of such 2PC protocols are secure one-way Private Set Intersections (PSI). + + + + +- **Robust and verifiable timestamps**: +Messages timestamp is a way to show that the message existed +prior to some point in time. +However, the lack of timestamp verifiability can create room for a range of attacks, +including injecting messages with invalid timestamps pointing to the far future. +To better understand the attack, +consider a store node whose current clock shows `2021-01-01 00:00:30` +(and assume all the other nodes have a synchronized clocks +-20seconds). +The store node already has a list of messages, + `(m1,2021-01-01 00:00:00), (m2,2021-01-01 00:00:01), ..., (m10:2021-01-01 00:00:20)`, +that are sorted based on their timestamp. 
+An attacker sends a message with an arbitrary large timestamp e.g., +10 hours ahead of the correct clock `(m',2021-01-01 10:00:30)`. +The store node places `m'` at the end of the list, + +```text +(m1,2021-01-01 00:00:00), (m2,2021-01-01 00:00:01), ..., (m10:2021-01-01 00:00:20),(m',2021-01-01 10:00:30). +``` + +Now another message arrives with a valid timestamp e.g., +`(m11, 2021-01-01 00:00:45)`. +However, since its timestamp precedes the malicious message `m'`, +it gets placed before `m'` in the list i.e., + +```text +(m1,2021-01-01 00:00:00), (m2,2021-01-01 00:00:01), ..., (m10:2021-01-01 00:00:20), (m11, 2021-01-01 00:00:45), (m',2021-01-01 10:00:30). +``` + +In fact, for the next 10 hours, +`m'` will always be considered as the most recent message and +served as the last message to the querying nodes irrespective +of how many other messages arrive afterward. + +A robust and verifiable timestamp allows the receiver of a message +to verify that a message has been generated prior to the claimed timestamp. +One solution is the use of [open timestamps](https://opentimestamps.org/) e.g., +block height in Blockchain-based timestamps. +That is, messages contain the most recent block height +perceived by their senders at the time of message generation. +This proves accuracy within a range of minutes (e.g., in Bitcoin blockchain) or +seconds (e.g., in Ethereum 2.0) from the time of origination. + +## Copyright + +Copyright and related rights waived via +[CC0](https://creativecommons.org/publicdomain/zero/1.0/). + +## References + +1. [14/WAKU2-MESSAGE](../14/message.md) +2. [protocol buffers v3](https://developers.google.com/protocol-buffers/) +3. [11/WAKU2-RELAY](../11/relay.md) +4. 
[Open timestamps](https://opentimestamps.org/) diff --git a/waku/standards/core/13/store.md b/waku/standards/core/13/store.md index f4441e7..4b588b9 100644 --- a/waku/standards/core/13/store.md +++ b/waku/standards/core/13/store.md @@ -1,311 +1,402 @@ --- slug: 13 title: 13/WAKU2-STORE -name: Waku v2 Store -status: draft +name: Waku Store Query tags: waku-core -editor: Simon-Pierre Vivier +version: 01 +editor: Hanno Cornelius contributors: - Dean Eigenmann - Oskar Thorén - Aaryamann Challani - Sanaz Taheri - - Hanno Cornelius --- +Previous version: [00](waku/standards/core/13/previous-versions/00/store.md) ## Abstract -This specification explains the `13/WAKU2-STORE` protocol -which enables querying of messages received through the relay protocol and -stored by other nodes. -It also supports pagination for more efficient querying of historical messages. +This specification explains the `WAKU2-STORE` protocol, +which enables querying of [14/WAKU2-MESSAGE](/waku/standards/core/14/message.md)s. -**Protocol identifier***: `/vac/waku/store/2.0.0-beta4` +**Protocol identifier***: `/vac/waku/store-query/3.0.0` -## Terminology +### Terminology The term PII, Personally Identifiable Information, refers to any piece of data that can be used to uniquely identify a user. For example, the signature verification key, and the hash of one's static IP address are unique for each user and hence count as PII. -## Design Requirements +## Wire Specification The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC2119](https://www.ietf.org/rfc/rfc2119.txt). -Nodes willing to provide the storage service using `13/WAKU2-STORE` protocol, -SHOULD provide a complete and full view of message history. -As such, they are required to be *highly available* and -specifically have a *high uptime* to consistently receive and store network messages. 
-The high uptime requirement makes sure that no message is missed out -hence a complete and intact view of the message history -is delivered to the querying nodes. -Nevertheless, in case storage provider nodes cannot afford high availability, -the querying nodes may retrieve the historical messages from multiple sources -to achieve a full and intact view of the past. +### Design Requirements -The concept of `ephemeral` messages introduced in -[`14/WAKU2-MESSAGE`](../14/message.md) affects `13/WAKU2-STORE` as well. -Nodes running `13/WAKU2-STORE` SHOULD support `ephemeral` messages as specified in -[14/WAKU2-MESSAGE](../14/message.md). -Nodes running `13/WAKU2-STORE` SHOULD NOT store messages -with the `ephemeral` flag set to `true`. - -## Adversarial Model - -Any peer running the `13/WAKU2-STORE` protocol, i.e. -both the querying node and the queried node, are considered as an adversary. -Furthermore, -we currently consider the adversary as a passive entity -that attempts to collect information from other peers to conduct an attack but -it does so without violating protocol definitions and instructions. -As we evolve the protocol, -further adversarial models will be considered. -For example, under the passive adversarial model, -no malicious node hides or -lies about the history of messages -as it is against the description of the `13/WAKU2-STORE` protocol. - -The following are not considered as part of the adversarial model: - -- An adversary with a global view of all the peers and their connections. -- An adversary that can eavesdrop on communication links -between arbitrary pairs of peers (unless the adversary is one end of the communication). -In specific, the communication channels are assumed to be secure. - -## Wire Specification - -Peers communicate with each other using a request / response API. -The messages sent are Protobuf RPC messages which are implemented using -[protocol buffers v3](https://developers.google.com/protocol-buffers/). 
-The following are the specifications of the Protobuf messages. +The concept of `ephemeral` messages introduced in [14/WAKU2-MESSAGE](/waku/standards/core/14/message.md) affects `WAKU2-STORE` as well. +Nodes running `WAKU2-STORE` SHOULD support `ephemeral` messages as specified in [14/WAKU2-MESSAGE](/waku/standards/core/14/message.md). +Nodes running `WAKU2-STORE` SHOULD NOT store messages with the `ephemeral` flag set to `true`. ### Payloads ```protobuf syntax = "proto3"; -message Index { - bytes digest = 1; - sint64 receiverTime = 2; - sint64 senderTime = 3; - string pubsubTopic = 4; +// Protocol identifier: /vac/waku/store-query/3.0.0 +package waku.store.v3; + +import "waku/message/v1/message.proto"; + +message WakuMessageKeyValue { + optional bytes message_hash = 1; // Globally unique key for a Waku Message + + // Full message content and associated pubsub_topic as value + optional waku.message.v1.WakuMessage message = 2; + optional string pubsub_topic = 3; } -message PagingInfo { - uint64 pageSize = 1; - Index cursor = 2; - enum Direction { - BACKWARD = 0; - FORWARD = 1; - } - Direction direction = 3; -} - -message ContentFilter { - string contentTopic = 1; -} - -message HistoryQuery { - // the first field is reserved for future use - string pubsubtopic = 2; - repeated ContentFilter contentFilters = 3; - PagingInfo pagingInfo = 4; -} - -message HistoryResponse { - // the first field is reserved for future use - repeated WakuMessage messages = 2; - PagingInfo pagingInfo = 3; - enum Error { - NONE = 0; - INVALID_CURSOR = 1; - } - Error error = 4; -} - -message HistoryRPC { +message StoreQueryRequest { string request_id = 1; - HistoryQuery query = 2; - HistoryResponse response = 3; + bool include_data = 2; // Response should include full message content + + // Filter criteria for content-filtered queries + optional string pubsub_topic = 10; + repeated string content_topics = 11; + optional sint64 time_start = 12; + optional sint64 time_end = 13; + + // List of 
key criteria for lookup queries + repeated bytes message_hashes = 20; // Message hashes (keys) to lookup + + // Pagination info. 50 Reserved + optional bytes pagination_cursor = 51; // Message hash (key) from where to start query (exclusive) + bool pagination_forward = 52; + optional uint64 pagination_limit = 53; +} + +message StoreQueryResponse { + string request_id = 1; + + optional uint32 status_code = 10; + optional string status_desc = 11; + + repeated WakuMessageKeyValue messages = 20; + + optional bytes pagination_cursor = 51; } ``` -#### Index +### General Store Query Concepts -To perform pagination, -each `WakuMessage` stored at a node running the `13/WAKU2-STORE` protocol -is associated with a unique `Index` that encapsulates the following parts. +#### Waku Message Key-Value Pairs -- `digest`: a sequence of bytes representing the SHA256 hash of a `WakuMessage`. -The hash is computed over the concatenation of `contentTopic` -and `payload` fields of a `WakuMessage` (see [14/WAKU2-MESSAGE](../14/message.md)). -- `receiverTime`: the UNIX time in nanoseconds -at which the `WakuMessage` is received by the receiving node. -- `senderTime`: the UNIX time in nanoseconds -at which the `WakuMessage` is generated by its sender. -- `pubsubTopic`: the pubsub topic on which the `WakuMessage` is received. +The store query protocol operates as a query protocol for a key-value store of historical messages, +with each entry having a [14/WAKU2-MESSAGE](/waku/standards/core/14/message.md) +and associated `pubsub_topic` as the value, +and [deterministic message hash](/waku/standards/core/14/message.md#deterministic-message-hashing) as the key. +The store can be queried to return either a set of keys or a set of key-value pairs. -#### PagingInfo +Within the store query protocol, +the [14/WAKU2-MESSAGE](/waku/standards/core/14/message.md) keys and +values MUST be represented in a `WakuMessageKeyValue` message. 
+This message MUST contain the deterministic `message_hash` as the key. +It MAY contain the full [14/WAKU2-MESSAGE](/waku/standards/core/14/message.md) and +associated pubsub topic as the value in the `message` and +`pubsub_topic` fields, depending on the use case as set out below. -`PagingInfo` holds the information required for pagination. -It consists of the following components. +If the message contains a value entry in addition to the key, +both the `message` and `pubsub_topic` fields MUST be populated. +The message MUST NOT have either `message` or `pubsub_topic` populated with the other unset. +Both fields MUST either be set or unset. -- `pageSize`: A positive integer indicating the number of queried `WakuMessage`s -in a `HistoryQuery` -(or retrieved `WakuMessage`s in a `HistoryResponse`). -- `cursor`: holds the `Index` of a `WakuMessage`. -- `direction`: indicates the direction of paging -which can be either `FORWARD` or `BACKWARD`. +#### Waku Message Store Eligibility -#### ContentFilter +In order for a message to be eligible for storage: -`ContentFilter` carries the information required for filtering historical messages. +- it MUST be a _valid_ [14/WAKU2-MESSAGE](/waku/standards/core/14/message.md). +- the `timestamp` field MUST be populated with the Unix epoch time, +at which the message was generated in nanoseconds. +If at the time of storage the `timestamp` deviates by more than 20 seconds +either into the past or the future when compared to the store node’s internal clock, +the store node MAY reject the message. +- the `ephemeral` field MUST be set to `false`. -- `contentTopic` represents the content topic of the queried historical `WakuMessage`. - This field maps to the `contentTopic` field of the [14/WAKU2-MESSAGE](../14/message.md). - -#### HistoryQuery +#### Waku message sorting -RPC call to query historical messages. 
+The key-value entries in the store MUST be time-sorted by the [14/WAKU2-MESSAGE](/waku/standards/core/14/message.md) `timestamp` attribute. +Where two or more key-value entries have identical `timestamp` values, +the entries MUST be further sorted by the natural order of their message hash keys. +Within the context of traversing over key-value entries in the store, +_"forward"_ indicates traversing the entries in ascending order, +whereas _"backward"_ indicates traversing the entries in descending order. -- The `pubsubTopic` field MUST indicate the pubsub topic -of the historical messages to be retrieved. -This field denotes the pubsub topic on which `WakuMessage`s are published. -This field maps to `topicIDs` field of `Message` in [`11/WAKU2-RELAY`](../11/relay.md). -Leaving this field empty means no filter on the pubsub topic -of message history is requested. -This field SHOULD be left empty in order to retrieve the historical `WakuMessage` -regardless of the pubsub topics on which they are published. -- The `contentFilters` field MUST indicate the list of content filters -based on which the historical messages are to be retrieved. -Leaving this field empty means no filter on the content topic -of message history is required. -This field SHOULD be left empty in order -to retrieve historical `WakuMessage` regardless of their content topics. -- `PagingInfo` holds the information required for pagination. -Its `pageSize` field indicates the number of `WakuMessage`s -to be included in the corresponding `HistoryResponse`. -It is RECOMMENDED that the queried node defines a maximum page size internally. -If the querying node leaves the `pageSize` unspecified, -or if the `pageSize` exceeds the maximum page size, -the queried node SHOULD auto-paginate the `HistoryResponse` -to no more than the configured maximum page size. -This allows mitigation of long response time for `HistoryQuery`. 
-In the forward pagination request, -the `messages` field of the `HistoryResponse` SHALL contain, at maximum, -the `pageSize` amount of `WakuMessage` whose `Index` -values are larger than the given `cursor` -(and vise versa for the backward pagination). -Note that the `cursor` of a `HistoryQuery` MAY be empty -(e.g., for the initial query), as such, and -depending on whether the `direction` is `BACKWARD` or -`FORWARD` the last or the first `pageSize` `WakuMessage` SHALL be returned, -respectively. +#### Pagination -#### Sorting Messages +If a large number of entries in the store service node match the query criteria provided in a `StoreQueryRequest`, +the client MAY make use of pagination +in a chain of store query request and response transactions +to retrieve the full response in smaller batches termed _"pages"_. +Pagination can be performed either in [a _forward_ or _backward_ direction](#waku-message-sorting). -The queried node MUST sort the `WakuMessage` based on their `Index`, -where the `senderTime` constitutes the most significant part and -the `digest` comes next, and -then perform pagination on the sorted result. -As such, the retrieved page contains an ordered list of `WakuMessage` -from the oldest messages to the most recent one. -Alternatively, the `receiverTime` (instead of `senderTime`) -MAY be used to sort messages during the paging process. -However, it is RECOMMENDED the use of the `senderTime` -for sorting as it is invariant and -consistent across all the nodes. -This has the benefit of `cursor` reusability i.e., -a `cursor` obtained from one node can be consistently used -to query from another node. -However, this `cursor` reusability does not hold when the `receiverTime` is utilized -as the receiver time is affected by the network delay and -nodes' clock asynchrony. +A store query client MAY indicate the maximum number of matching entries it wants in the `StoreQueryResponse`, +by setting the page size limit in the `pagination_limit` field. 
+Note that a store service node MAY enforce its own limit +if the `pagination_limit` is unset +or larger than the service node's internal page size limit. -#### HistoryResponse +A `StoreQueryResponse` with a populated `pagination_cursor` indicates that more stored entries match the query than included in the response. -RPC call to respond to a HistoryQuery call. +A `StoreQueryResponse` without a populated `pagination_cursor` indicates that +there are no more matching entries in the store. -- The `messages` field MUST contain the messages found, -these are [14/WAKU2-MESSAGE](../14/message.md) types. -- `PagingInfo` holds the paging information based -on which the querying node can resume its further history queries. -The `pageSize` indicates the number of returned Waku messages -(i.e., the number of messages included in the `messages` field of `HistoryResponse`). -The `direction` is the same direction as in the corresponding `HistoryQuery`. -In the forward pagination, the `cursor` holds the `Index` of the last message -in the `HistoryResponse` `messages` (and the first message in the backward paging). -Regardless of the paging direction, -the retrieved `messages` are always sorted in ascending order -based on their timestamp as explained in the [sorting messages](#sorting-messages)section, -that is, from the oldest to the most recent. -The requester SHALL embed the returned `cursor` inside its next `HistoryQuery` -to retrieve the next page of the [14/WAKU2-MESSAGE](../14/message.md). -The `cursor` obtained from one node SHOULD NOT be used in a request to another node -because the result may be different. -- The `error` field contains information about any error that has occurred -while processing the corresponding `HistoryQuery`. -`NONE` stands for no error. -This is also the default value. -`INVALID_CURSOR` means that the `cursor` field of `HistoryQuery` -does not match with the `Index` of any of the `WakuMessage` -persisted by the queried node. 
+The client MAY request the next page of entries from the store service node +by populating a subsequent `StoreQueryRequest` with the `pagination_cursor` +received in the `StoreQueryResponse`. +All other fields and query criteria MUST be the same as in the preceding `StoreQueryRequest`. -## Security Consideration +A `StoreQueryRequest` without a populated `pagination_cursor` indicates that +the client wants to retrieve the "first page" of the stored entries matching the query. -The main security consideration to take into account -while using this protocol is that a querying node -have to reveal their content filters of interest to the queried node, +### Store Query Request + +A client node MUST send all historical message queries within a `StoreQueryRequest` message. +This request MUST contain a `request_id`. +The `request_id` MUST be a uniquely generated string. + +If the store query client requires the store service node to include [14/WAKU2-MESSAGE](/waku/standards/core/14/message.md) values in the query response, +it MUST set `include_data` to `true`. +If the store query client requires the store service node to return only message hash keys in the query response, +it SHOULD set `include_data` to `false`. +By default, therefore, the store service node assumes `include_data` to be `false`. + +A store query client MAY include query filter criteria in the `StoreQueryRequest`. +There are two types of filter use cases: + +1. Content filtered queries and +2. Message hash lookup queries + +#### Content filtered queries + +A store query client MAY request the store service node to filter historical entries by a content filter. +Such a client MAY create a filter on content topic, on time range or on both. + +To filter on content topic, +the client MUST populate _both_ the `pubsub_topic` _and_ `content_topics` field. +The client MUST NOT populate either `pubsub_topic` or +`content_topics` and leave the other unset. +Both fields MUST either be set or unset. 
+A mixed content topic filter with just one of either `pubsub_topic` or +`content_topics` set SHOULD be regarded as an invalid request. + +To filter on time range, the client MUST set `time_start`, `time_end` or both. +Each `time_` field should contain a Unix epoch timestamp in nanoseconds. +An unset `time_start` SHOULD be interpreted as "from the oldest stored entry". +An unset `time_end` SHOULD be interpreted as "up to the youngest stored entry". + +If any of the content filter fields are set, +namely `pubsub_topic`, `content_topics`, `time_start`, or `time_end`, +the client MUST NOT set the `message_hashes` field. + +#### Message hash lookup queries + +A store query client MAY request the store service node to filter historical entries by one or +more matching message hash keys. +This type of query acts as a "lookup" against a message hash key or +set of keys already known to the client. + +In order to perform a lookup query, +the store query client MUST populate the `message_hashes` field with the list of message hash keys it wants to look up in the store service node. + +If the `message_hashes` field is set, +the client MUST NOT set any of the content filter fields, +namely `pubsub_topic`, `content_topics`, `time_start`, or `time_end`. + +#### Presence queries + +A presence query is a special type of lookup query that allows a client to check for the presence of one or +more messages in the store service node, +without retrieving the full contents (values) of the messages. +This can, for example, be used as part of a reliability mechanism, +whereby store query clients verify that previously published messages have been successfully stored. + +In order to perform a presence query, +the store query client MUST populate the `message_hashes` field in the `StoreQueryRequest` with the list of message hashes +for which it wants to verify presence in the store service node. +The `include_data` property MUST be set to `false`.
+The client SHOULD interpret every `message_hash` returned in the `messages` field of the `StoreQueryResponse` as present in the store. +The client SHOULD assume that all other message hashes included in the original `StoreQueryRequest` but +not in the `StoreQueryResponse` are not present in the store. + +#### Pagination info + +The store query client MAY include a message hash as `pagination_cursor`, +to indicate at which key-value entry a store service node SHOULD start the query. +The `pagination_cursor` is treated as exclusive +and the corresponding entry will not be included in subsequent store query responses. + +For forward queries, +only messages following (see [sorting](#waku-message-sorting)) the one indexed at `pagination_cursor` +will be returned. +For backward queries, +only messages preceding (see [sorting](#waku-message-sorting)) the one indexed at `pagination_cursor` +will be returned. + +If the store query client requires the store service node to perform a forward query, +it MUST set `pagination_forward` to `true`. +If the store query client requires the store service node to perform a backward query, +it SHOULD set `pagination_forward` to `false`. +By default, therefore, the store service node assumes pagination to be backward. + +A store query client MAY indicate the maximum number of matching entries it wants in the `StoreQueryResponse`, +by setting the page size limit in the `pagination_limit` field. +Note that a store service node MAY enforce its own limit +if the `pagination_limit` is unset +or larger than the service node's internal page size limit. + +See [pagination](#pagination) for more on how the pagination info is used in store transactions. + +### Store Query Response + +In response to any `StoreQueryRequest`, +a store service node SHOULD respond with a `StoreQueryResponse` with a `request_id` matching that of the request. +This response MUST contain a `status_code` indicating if the request was successful or not.
+Successful status codes are in the `2xx` range. +A client node SHOULD consider all other status codes as error codes and +assume that the requested operation has failed. +In addition, +the store service node MAY choose to provide a more detailed status description in the `status_desc` field. + +#### Filter matching + +For [content filtered queries](#content-filtered-queries), +an entry in the store service node matches the filter criteria in a `StoreQueryRequest` if all of the following conditions are met: + +- its `content_topic` is in the request `content_topics` set +and it was published on a matching `pubsub_topic` OR the request `content_topics` and +`pubsub_topic` fields are unset +- its `timestamp` is _larger than or equal to_ the request `time_start` OR the request `time_start` is unset +- its `timestamp` is _smaller_ than the request `time_end` OR the request `time_end` is unset + +Note that for content filtered queries, `time_start` is treated as _inclusive_ and +`time_end` is treated as _exclusive_. + +For [message hash lookup queries](#message-hash-lookup-queries), +an entry in the store service node matches the filter criteria if its `message_hash` is in the request `message_hashes` set. + +The store service node SHOULD respond with an error code and +discard the request if the store query request contains both content filter criteria +and message hashes. + +#### Populating response messages + +The store service node SHOULD populate the `messages` field in the response +only with entries matching the filter criteria provided in the corresponding request. +Regardless of whether the response is to a _forward_ or _backward_ query, +the `messages` field in the response MUST be ordered in a forward direction +according to the [message sorting rules](#waku-message-sorting). + +If the corresponding `StoreQueryRequest` has `include_data` set to `true`, +the service node SHOULD populate both the `message_hash` and +`message` for each entry in the response.
+In all other cases, +the store service node SHOULD populate only the `message_hash` field for each entry in the response. + +#### Paginating the response + +The response SHOULD NOT contain more `messages` than the `pagination_limit` provided in the corresponding `StoreQueryRequest`. +It is RECOMMENDED that the store node defines its own maximum page size internally. +If the `pagination_limit` in the request is unset, +or exceeds this internal maximum page size, +the store service node SHOULD ignore the `pagination_limit` field and +apply its own internal maximum page size. + +In response to a _forward_ `StoreQueryRequest`: + +- if the `pagination_cursor` is set, + the store service node SHOULD populate the `messages` field + with matching entries following the `pagination_cursor` (exclusive). +- if the `pagination_cursor` is unset, + the store service node SHOULD populate the `messages` field + with matching entries from the first entry in the store. +- if there are still more matching entries in the store + after the maximum page size is reached while populating the response, + the store service node SHOULD populate the `pagination_cursor` in the `StoreQueryResponse` + with the message hash key of the _last_ entry _included_ in the response. + +In response to a _backward_ `StoreQueryRequest`: + +- if the `pagination_cursor` is set, + the store service node SHOULD populate the `messages` field + with matching entries preceding the `pagination_cursor` (exclusive). +- if the `pagination_cursor` is unset, + the store service node SHOULD populate the `messages` field + with matching entries from the last entry in the store. +- if there are still more matching entries in the store + after the maximum page size is reached while populating the response, + the store service node SHOULD populate the `pagination_cursor` in the `StoreQueryResponse` + with the message hash key of the _first_ entry _included_ in the response. 
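
The pagination rules above can be illustrated with a minimal Python sketch of the service-node side. This is only an illustration under stated assumptions, not the reference implementation: the names `Entry`, `paginate` and `max_page_size` are hypothetical, the input is assumed to be the list of entries that already match the query filter, sorted in forward order, and treating a `pagination_limit` of `0` as unset and rejecting an unknown cursor are choices made here that the text above does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical stored entry: only the fields needed for paging."""
    message_hash: bytes
    timestamp: int  # Unix epoch timestamp in nanoseconds


def paginate(
    matching: List[Entry],
    cursor: Optional[bytes],
    forward: bool,
    limit: Optional[int],
    max_page_size: int = 100,  # assumed internal maximum of the service node
) -> Tuple[List[Entry], Optional[bytes]]:
    """Return (page, next_cursor) for one query.

    `matching` must already be filtered and sorted in forward order.
    The returned page is always in forward order, for both directions.
    """
    # Apply the node's own limit if the requested one is unset or too large.
    # Assumption: a limit of 0 is treated the same as unset.
    page_size = max_page_size if not limit or limit > max_page_size else limit

    if cursor is None:
        candidates = matching
    else:
        index = next(
            (i for i, e in enumerate(matching) if e.message_hash == cursor), None
        )
        if index is None:
            # Assumption: an unknown cursor is rejected as a bad request.
            raise ValueError("pagination_cursor does not match any entry")
        # The cursor is exclusive in both directions.
        candidates = matching[index + 1:] if forward else matching[:index]

    more = len(candidates) > page_size
    if forward:
        page = candidates[:page_size]
        # Cursor is the hash of the last entry included in the response.
        next_cursor = page[-1].message_hash if more else None
    else:
        page = candidates[-page_size:]
        # Cursor is the hash of the first entry included in the response.
        next_cursor = page[0].message_hash if more else None
    return page, next_cursor
```

A client would call this repeatedly, passing the returned cursor back with otherwise identical query criteria, until the cursor comes back as `None`.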
+ +### Security Consideration + +The main security consideration while using this protocol is that a querying node has to reveal its content filters of interest to the queried node, hence potentially compromising its privacy. -## Future Work +#### Adversarial Model -- **Anonymous query**: This feature guarantees that nodes -can anonymously query historical messages from other nodes i.e., -without disclosing the exact topics of [14/WAKU2-MESSAGE](../14/message.md) -they are interested in. -As such, no adversary in the `13/WAKU2-STORE` protocol -would be able to learn which peer is interested in which content filters i.e., -content topics of [14/WAKU2-MESSAGE](../14/message.md). -The current version of the `13/WAKU2-STORE` protocol does not provide anonymity -for historical queries, -as the querying node needs to directly connect to another node -in the `13/WAKU2-STORE` protocol and -explicitly disclose the content filters of its interest -to retrieve the corresponding messages. +Any peer running the `WAKU2-STORE` protocol, i.e. +both the querying node and the queried node, is considered an adversary. +Furthermore, +we currently consider the adversary to be a passive entity that attempts to collect information from other peers to conduct an attack but +does so without violating protocol definitions and instructions. +As we evolve the protocol, +further adversarial models will be considered. +For example, under the passive adversarial model, +no malicious node hides or +lies about the history of messages as it is against the description of the `WAKU2-STORE` protocol. + +The following are not considered as part of the adversarial model: + +- An adversary with a global view of all the peers and their connections. +- An adversary that can eavesdrop on communication links between arbitrary pairs of peers (unless the adversary is one end of the communication). +Specifically, the communication channels are assumed to be secure.
+ +### Future Work + +- **Anonymous query**: This feature guarantees that nodes can anonymously query historical messages from other nodes i.e., +without disclosing the exact topics of [14/WAKU2-MESSAGE](/waku/standards/core/14/message.md) they are interested in. +As such, no adversary in the `WAKU2-STORE` protocol would be able to learn which peer is interested in which content filters i.e., +content topics of [14/WAKU2-MESSAGE](/waku/standards/core/14/message.md). +The current version of the `WAKU2-STORE` protocol does not provide anonymity for historical queries, +as the querying node needs to directly connect to another node in the `WAKU2-STORE` protocol and +explicitly disclose the content filters of its interest to retrieve the corresponding messages. However, one can consider preserving anonymity through one of the following ways: - - By hiding the source of the request i.e., anonymous communication. - That is the querying node shall hide all its PII in its history request - e.g., its IP address. - This can happen by the utilization of a proxy server or by using Tor. - Note that the current structure of historical requests - does not embody any piece of PII, otherwise, - such data fields must be treated carefully to achieve query anonymity. - - - By deploying secure 2-party computations in which the querying node - obtains the historical messages of a certain topic, - the queried node learns nothing about the query. - Examples of such 2PC protocols are secure one-way Private Set Intersections (PSI). - - + +- By hiding the source of the request i.e., anonymous communication. +That is the querying node shall hide all its PII in its history request e.g., +its IP address. +This can happen by the utilization of a proxy server or by using Tor. +Note that the current structure of historical requests does not embody any piece of PII, otherwise, +such data fields must be treated carefully to achieve query anonymity. 
+ +- By deploying secure 2-party computations +in which the querying node obtains the historical messages of a certain topic, +the queried node learns nothing about the query. +Examples of such 2PC protocols are secure one-way Private Set Intersections (PSI). + +This attack is not included in the specs since it does not fit the +passive adversarial model (the attacker needs to deviate from the store protocol).--> -- **Robust and verifiable timestamps**: -Messages timestamp is a way to show that the message existed -prior to some point in time. +- **Robust and verifiable timestamps**: Messages timestamp is a way to show that +the message existed prior to some point in time. However, the lack of timestamp verifiability can create room for a range of attacks, including injecting messages with invalid timestamps pointing to the far future. To better understand the attack, @@ -317,31 +408,25 @@ that are sorted based on their timestamp. An attacker sends a message with an arbitrary large timestamp e.g., 10 hours ahead of the correct clock `(m',2021-01-01 10:00:30)`. The store node places `m'` at the end of the list, - -```text -(m1,2021-01-01 00:00:00), (m2,2021-01-01 00:00:01), ..., (m10:2021-01-01 00:00:20),(m',2021-01-01 10:00:30). -``` - +`(m1,2021-01-01 00:00:00), (m2,2021-01-01 00:00:01), ..., (m10:2021-01-01 00:00:20), +(m',2021-01-01 10:00:30)`. Now another message arrives with a valid timestamp e.g., `(m11, 2021-01-01 00:00:45)`. However, since its timestamp precedes the malicious message `m'`, it gets placed before `m'` in the list i.e., - -```text -(m1,2021-01-01 00:00:00), (m2,2021-01-01 00:00:01), ..., (m10:2021-01-01 00:00:20), (m11, 2021-01-01 00:00:45), (m',2021-01-01 10:00:30). -``` - +`(m1,2021-01-01 00:00:00), (m2,2021-01-01 00:00:01), ..., (m10:2021-01-01 00:00:20), +(m11, 2021-01-01 00:00:45), (m',2021-01-01 10:00:30)`. 
In fact, for the next 10 hours, `m'` will always be considered as the most recent message and -served as the last message to the querying nodes irrespective -of how many other messages arrive afterward. +served as the last message to the querying nodes irrespective of how many other +messages arrive afterward. -A robust and verifiable timestamp allows the receiver of a message -to verify that a message has been generated prior to the claimed timestamp. +A robust and verifiable timestamp allows the receiver of a message to verify that +a message has been generated prior to the claimed timestamp. One solution is the use of [open timestamps](https://opentimestamps.org/) e.g., block height in Blockchain-based timestamps. -That is, messages contain the most recent block height -perceived by their senders at the time of message generation. +That is, messages contain the most recent block height perceived by their senders +at the time of message generation. This proves accuracy within a range of minutes (e.g., in Bitcoin blockchain) or seconds (e.g., in Ethereum 2.0) from the time of origination. @@ -352,7 +437,6 @@ Copyright and related rights waived via ## References -1. [14/WAKU2-MESSAGE](../14/message.md) +1. [14/WAKU2-MESSAGE](/waku/standards/core/14/message.md) 2. [protocol buffers v3](https://developers.google.com/protocol-buffers/) -3. [11/WAKU2-RELAY](../11/relay.md) -4. [Open timestamps](https://opentimestamps.org/) +3. [Open timestamps](https://opentimestamps.org/)