diff --git a/vac/raw/sds.md b/vac/raw/sds.md index 2d909e5..727884e 100644 --- a/vac/raw/sds.md +++ b/vac/raw/sds.md @@ -58,6 +58,9 @@ other participants using the corresponding message ID. * **Participant ID:** Each participant has a globally unique, immutable ID visible to other participants in the communication. +* **Sender ID:** +The *Participant ID* of the original sender of a message, +often coupled with a *Message ID*. ## Wire protocol @@ -75,6 +78,8 @@ syntax = "proto3"; message HistoryEntry { string message_id = 1; // Unique identifier of the SDS message, as defined in `Message` optional bytes retrieval_hint = 2; // Optional information to help remote parties retrieve this SDS message; For example, A Waku deterministic message hash or routing payload hash + + optional string sender_id = 3; // Participant ID of original message sender. Only populated if using optional SDS Repair extension } message Message { @@ -84,6 +89,9 @@ message Message { optional uint64 lamport_timestamp = 10; // Logical timestamp for causal ordering in channel repeated HistoryEntry causal_history = 11; // List of preceding message IDs that this message causally depends on. Generally 2 or 3 message IDs are included. optional bytes bloom_filter = 12; // Bloom filter representing received message IDs in channel + + repeated HistoryEntry repair_request = 13; // Capped list of history entries missing from sender's causal history. Only populated if using the optional SDS Repair extension. + optional bytes content = 20; // Actual content of the message } ``` @@ -289,6 +297,181 @@ Upon reception, ephemeral messages SHOULD be delivered immediately without buffering for causal dependencies or including in the local log. +### SDS Repair (SDS-R) + +SDS Repair (SDS-R) is an optional extension module for SDS, +allowing participants in a communication to collectively repair any gaps in causal history (missing messages) +preferably over a limited time window. +Since SDS-R acts as coordinated rebroadcasting of missing messages, +which involves all participants of the communication, +it is most appropriate in a limited use case for repairing relatively recent missed dependencies. +It is not meant to replace mechanisms for long-term consistency, +such as peer-to-peer syncing or the use of a high-availability centralised cache (Store node). + +#### SDS-R message fields + +SDS-R adds the following fields to SDS messages: +* `sender_id` in `HistoryEntry`: +the original message sender's participant ID. +This is used to determine the group of participants who will respond to a repair request. +* `repair_request` in `Message`: +a capped list of history entries missing for the message sender +and for which it's requesting a repair. + +#### SDS-R participant state + +SDS-R adds the following to each participant state: + +* Outgoing **repair request buffer**: +a list of locally missing `HistoryEntry`s +each mapped to a future request timestamp, `T_req`, +after which this participant will request a repair if at that point the missing dependency has not been repaired yet. +`T_req` is computed as a pseudorandom backoff from the timestamp when the dependency was detected missing. +[Determining `T_req`](#determine-t_req) is described below. +We RECOMMEND that the outgoing repair request buffer be chronologically ordered in ascending order of `T_req`. +- Incoming **repair request buffer**: +a list of locally available `HistoryEntry`s +that were requested for repair by a remote participant +AND for which this participant might be an eligible responder, +each mapped to a future response timestamp, `T_resp`, +after which this participant will rebroadcast the corresponding requested `Message` if at that point no other participant had rebroadcast the `Message`. +`T_resp` is computed as a pseudorandom backoff from the timestamp when the repair was first requested. +[Determining `T_resp`](#determine-t_resp) is described below. +We describe below how a participant can [determine if they're an eligible responder](#determine-response-group) for a specific repair request. +- Augmented local history log: +for each message ID kept in the local log for which the participant could be a repair responder, +the full SDS `Message` must be cached rather than just the message ID, +in case this participant is called upon to rebroadcast the message. +We describe below how a participant can [determine if they're an eligible responder](#determine-response-group) for a specific message. + +**_Note:_** The required state can likely be significantly reduced in future by simply requiring that a responding participant should _reconstruct_ the original `Message` when rebroadcasting, rather than the simpler, but heavier, requirement of caching the entire received `Message` content in local history. + +#### SDS-R global state + +For a specific channel (that is, within a specific SDS-controlled communication) +the following SDS-R configuration state SHOULD be common for all participants in the conversation: + +* `T_min`: the _minimum_ time period to wait before a missing causal entry can be repaired. +We RECOMMEND a value of at least 30 seconds. +* `T_max`: the _maximum_ time period over which missing causal entries can be repaired. +We RECOMMEND a value of between 120 and 600 seconds. + +Furthermore, to avoid a broadcast storm with multiple participants responding to a repair request, +participants in a single channel MAY be divided into discrete response groups. +Participants will only respond to a repair request if they are in the response group for that request. +The global `num_response_groups` variable configures the number of response groups for this communication. +Its use is described below. +A reasonable default value for `num_response_groups` is one response group for every `128` participants. +In other words, if the (roughly) expected number of participants is expressed as `num_participants`, then +`num_response_groups = num_participants div 128 + 1`. +In other words, if there are fewer than 128 participants in a communication, +they will all belong to the same response group. + +We RECOMMEND that the global state variables `T_min`, `T_max` and `num_response_groups` be set _statically_ for a specific SDS-R application, +based on expected number of group participants and volume of traffic. + +**_Note:_** Future versions of this protocol will recommend dynamic global SDS-R variables, based on the current number of participants. + +#### SDS-R send message + +SDS-R adds the following steps when sending a message: + +Before broadcasting a message, +* the participant SHOULD populate the `repair_request` field in the message +with _eligible_ entries from the outgoing repair request buffer. +An entry is eligible to be included in a `repair_request` +if its corresponding request timestamp, `T_req`, has expired (in other words, `T_req <= current_time`). +The maximum number of repair request entries to include is up to the application. +We RECOMMEND that this quota be filled by the eligible entries from the outgoing repair request buffer with the lowest `T_req`. +We RECOMMEND a maximum of 3 entries. +If there are no eligible entries in the buffer, this optional field MUST be left unset. + +#### SDS-R receive message + +On receiving a message, +* the participant MUST remove entries matching the received message ID from its _outgoing_ repair request buffer. +This ensures that the participant does not request repairs for dependencies that have now been met. +* the participant MUST remove entries matching the received message ID from its _incoming_ repair request buffer. +This ensures that the participant does not respond to repair requests that another participant has already responded to. +* the participant SHOULD add any unmet causal dependencies to its outgoing repair request buffer against a unique `T_req` timestamp for that entry. +It MUST compute the `T_req` for each such HistoryEntry according to the steps outlined in [_Determine T_req_](#determine-t_req). +* for each item in the `repair_request` field: + - the participant MUST remove entries matching the repair message ID from its own outgoing repair request buffer. + This limits the number of participants that will request a common missing dependency. + - if the participant has the requested `Message` in its local history _and_ is an eligible responder for the repair request, + it SHOULD add the request to its incoming repair request buffer against a unique `T_resp` timestamp for that entry. + It MUST compute the `T_resp` for each such repair request according to the steps outlined in [_Determine T_resp_](#determine-t_resp). + It MUST determine if it's an eligible responder for a repair request according to the steps outlined in [_Determine response group_](#determine-response-group). + +#### Determine T_req + +A participant determines the repair request timestamp, `T_req`, +for a missing `HistoryEntry` as follows: + +``` +T_req = current_time + hash(participant_id, message_id) % (T_max - T_min) + T_min +``` + +where `current_time` is the current timestamp, +`participant_id` is the participant's _own_ participant ID (not the `sender_id` in the missing `HistoryEntry`), +`message_id` is the missing `HistoryEntry`'s message ID, +and `T_min` and `T_max` are as set out in [SDS-R global state](#sds-r-global-state). + +This allows `T_req` to be pseudorandomly and linearly distributed as a backoff of between `T_min` and `T_max` from current time. + +> **_Note:_** placing `T_req` values on an exponential backoff curve will likely be more appropriate and is left for a future improvement. + +#### Determine T_resp + +A participant determines the repair response timestamp, `T_resp`, +for a `HistoryEntry` that it could repair as follows: + +``` +distance = hash(participant_id) XOR hash(sender_id) +T_resp = current_time + distance*hash(message_id) % T_max +``` + +where `current_time` is the current timestamp, +`participant_id` is the participant's _own_ (local) participant ID, +`sender_id` is the requested `HistoryEntry` sender ID, +`message_id` is the requested `HistoryEntry` message ID, +and `T_max` is as set out in [SDS-R global state](#sds-r-global-state). + +We first calculate the logical `distance` between the local `participant_id` and the original `sender_id`. +If this participant is the original sender, the `distance` will be `0`. +It should then be clear that the original participant will have a response backoff time of `0`, making it the most likely responder. +The `T_resp` values for other eligible participants will be pseudorandomly and linearly distributed as a backoff of up to `T_max` from current time. + +> **_Note:_** placing `T_resp` values on an exponential backoff curve will likely be more appropriate and is left for a future improvement. + +#### Determine response group + +Given a message with `sender_id` and `message_id`, +a participant with `participant_id` is in the response group for that message if + +``` +hash(participant_id, message_id) % num_response_groups == hash(sender_id, message_id) % num_response_groups +``` + +where `num_response_groups` is as set out in [SDS-R global state](#sds-r-global-state). +This ensures that a participant will always be in the response group for its own published messages. +It also allows participants to determine immediately on first reception of a message or a history entry +if they are in the associated response group. + +#### SDS-R incoming repair request buffer sweep + +An SDS-R participant MUST periodically check if there are any incoming requests in the *incoming repair request buffer* that is due for a response. +For each item in the buffer, +the participant SHOULD broadcast the corresponding `Message` from local history +if its corresponding response timestamp, `T_resp`, has expired (in other words, `T_resp <= current_time`). + +#### SDS-R Periodic Sync Message + +If the participant is due to send a periodic sync message, +it SHOULD send the message according to [SDS-R send message](#sds-r-send-message) +if there are any eligible items in the outgoing repair request buffer, +regardless of whether other participants have also recently broadcast a Periodic Sync message. + ## Copyright Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).