# PubSub interface for libp2p

> Generalized publish/subscribe interface for libp2p.

| Lifecycle Stage | Maturity       | Status | Latest Revision |
|-----------------|----------------|--------|-----------------|
| 3A              | Recommendation | Active | r3, 2020-09-25  |

Authors: [@whyrusleeping], [@protolambda], [@raulk], [@vyzo].

Interest Group: [@yusefnapora], [@raulk], [@vyzo], [@Stebalien], [@jamesray1], [@vasco-santos]

[@whyrusleeping]: https://github.com/whyrusleeping
[@yusefnapora]: https://github.com/yusefnapora
[@raulk]: https://github.com/raulk
[@vyzo]: https://github.com/vyzo
[@Stebalien]: https://github.com/Stebalien
[@jamesray1]: https://github.com/jamesray1
[@vasco-santos]: https://github.com/vasco-santos
[@protolambda]: https://github.com/protolambda

See the [lifecycle document][lifecycle-spec] for context about the maturity level
and spec status.

[lifecycle-spec]: https://github.com/libp2p/specs/blob/master/00-framework-01-spec-lifecycle.md

## Table of Contents

- [PubSub interface for libp2p](#pubsub-interface-for-libp2p)
  - [Table of Contents](#table-of-contents)
  - [Overview](#overview)
  - [Implementations](#implementations)
  - [Stream management](#stream-management)
  - [The RPC](#the-rpc)
  - [The Message](#the-message)
  - [Message Identification](#message-identification)
  - [Message Signing](#message-signing)
    - [Signature Policy Options](#signature-policy-options)
  - [The Topic Descriptor](#the-topic-descriptor)
    - [AuthOpts](#authopts)
      - [AuthMode 'NONE'](#authmode-none)
      - [AuthMode 'KEY'](#authmode-key)
      - [AuthMode 'WOT'](#authmode-wot)
    - [EncOpts](#encopts)
      - [EncMode 'NONE'](#encmode-none)
      - [EncMode 'SHAREDKEY'](#encmode-sharedkey)
      - [EncMode 'WOT'](#encmode-wot)
  - [Topic Validation](#topic-validation)

## Overview

This is the specification for generalized pubsub over libp2p. Pubsub in libp2p
is currently still experimental and this specification is subject to change.
This document does not cover specific pubsub routing algorithms; it merely
describes the common wire format that implementations will use.

libp2p pubsub currently uses reliable ordered streams between peers. It assumes
that each peer is certain of the identity of each peer it is communicating
with. It does not assume that messages between peers are encrypted; however,
encryption is enabled by default on libp2p streams.

You can find the PubSub research and notes in the following repos:

- https://github.com/libp2p/research-pubsub
- https://github.com/libp2p/pubsub-notes

## Implementations
- FloodSub, simple flooding pubsub (2017)
  - [libp2p/go-libp2p-pubsub/floodsub.go](https://github.com/libp2p/go-libp2p-pubsub/blob/master/floodsub.go)
  - [libp2p/js-libp2p-floodsub](http://github.com/libp2p/js-libp2p-floodsub)
  - [libp2p/rust-libp2p/floodsub](https://github.com/libp2p/rust-libp2p/tree/master/protocols/floodsub)
  - [status-im/nim-libp2p/floodsub](https://github.com/status-im/nim-libp2p/blob/master/libp2p/protocols/pubsub/floodsub.nim)
- GossipSub, extensible baseline pubsub (2018)
  - [gossipsub](https://github.com/libp2p/specs/tree/master/pubsub/gossipsub#implementation-status)
  - [EpiSub](https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/episub.md), an epidemic broadcast tree router (defined 2018, not yet started as of Oct 2018)

## Stream management

Data should be exchanged between peers using two separately negotiated streams,
one inbound, one outbound. These streams are treated as unidirectional streams.
The outbound stream is used only to write data. The inbound stream is used only
to read data.

## The RPC

All communication between peers happens in the form of exchanging protobuf RPC
messages between participating peers.

The `RPC` protobuf is as follows:

```protobuf
syntax = "proto2";
message RPC {
  repeated SubOpts subscriptions = 1;
  repeated Message publish = 2;

  message SubOpts {
    optional bool subscribe = 1;
    optional string topicid = 2;
  }
}
```

This is a relatively simple message containing zero or more subscription
messages, and zero or more content messages. Each subscription message contains
a `topicid` string that specifies the topic, and a boolean signifying whether to
subscribe to or unsubscribe from the given topic. True signifies 'subscribe' and
false signifies 'unsubscribe'.

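As an illustration of the subscription semantics described above, the following is a minimal sketch of how a receiver might track which topics a remote peer is subscribed to. The `SubOpts` and `RPC` classes here are plain Python stand-ins for the decoded protobuf messages, and `apply_subscriptions` and the `peer_topics` table are hypothetical names chosen for this example; they are not part of the spec or of any libp2p implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SubOpts:
    subscribe: bool  # True signifies 'subscribe', False signifies 'unsubscribe'
    topicid: str


@dataclass
class RPC:
    # Stand-in for the decoded RPC protobuf; `publish` is left untyped here.
    subscriptions: List[SubOpts] = field(default_factory=list)
    publish: list = field(default_factory=list)


def apply_subscriptions(peer_topics: Dict[str, Set[str]], peer_id: str, rpc: RPC) -> None:
    """Update the (hypothetical) per-peer topic table from one received RPC."""
    topics = peer_topics.setdefault(peer_id, set())
    for sub in rpc.subscriptions:
        if sub.subscribe:
            topics.add(sub.topicid)
        else:
            # discard() tolerates an unsubscribe for a topic we never saw.
            topics.discard(sub.topicid)
```

Subscription entries are applied in the order they appear in the RPC, so a subscribe followed by an unsubscribe for the same topic leaves the peer unsubscribed.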
## The Message

The RPC message can contain zero or more messages of type `Message`. The `Message` protobuf looks like this:

```protobuf
syntax = "proto2";
message Message {
  optional string from = 1;
  optional bytes data = 2;
  optional bytes seqno = 3;
  required string topic = 4;
  optional bytes signature = 5;
  optional bytes key = 6;
}
```

The `optional` fields may be omitted, depending on the
[signature policy](#message-signing) and
[message ID function](#message-identification).

The `from` field (optional) denotes the author of the message. This is the peer
who initially authored the message, and NOT the peer who propagated it. Thus, as
the message is routed through a swarm of pubsubbing peers, the original
authorship is preserved.

The `seqno` field (optional) is a 64-bit big-endian unsigned integer. It is a
linearly increasing number that is unique among messages originating from a
given peer. No two messages on a pubsub topic from the same peer should have the
same `seqno` value; however, messages from different peers may have the same
sequence number. In other words, this number is not globally unique. It is used
in conjunction with `from` to derive a unique `message_id` (in the default
configuration).

Henceforth, we define the term **origin-stamped messaging** to refer to messages
whose `from` and `seqno` fields are populated.

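To make the `seqno` encoding concrete, here is a small sketch of packing a counter into the 8-byte big-endian form described above. The helper names (`encode_seqno`, `decode_seqno`, `SeqnoCounter`) and the choice to start the counter at zero are assumptions made for this example; the spec only requires that values increase and are unique per originating peer.

```python
import struct


def encode_seqno(n: int) -> bytes:
    """Encode a sequence number as a 64-bit big-endian unsigned integer."""
    return struct.pack(">Q", n)


def decode_seqno(raw: bytes) -> int:
    """Decode an 8-byte big-endian seqno back into an integer."""
    (n,) = struct.unpack(">Q", raw)
    return n


class SeqnoCounter:
    """Hypothetical per-peer counter that hands out increasing seqno values."""

    def __init__(self, start: int = 0) -> None:
        self._next = start

    def next(self) -> bytes:
        value = self._next
        self._next += 1
        return encode_seqno(value)
```

Because the encoding is fixed-width and big-endian, comparing two encoded values as byte strings gives the same ordering as comparing the integers.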
The `data` (optional) field is an opaque blob of data representing the payload.
It can contain any data that the publisher wants it to.

The `topic` field specifies a topic that this message is being
published to.

The `signature` and `key` fields (optional) are used for message signing, if
such feature is enabled, as explained below.

The size of a `Message` should be limited, for example to 1 MiB, but the limit
could also be configurable (for more information see
[issue 118](https://github.com/libp2p/specs/issues/118)). Messages over this
size should be rejected.
Note that for applications that store state such as messages, for example
blockchains, it is suggested to have some kind of storage
economics (see e.g.
[here](https://ethresear.ch/t/draft-position-paper-on-resource-pricing/2838),
[here](https://ethresear.ch/t/ethereum-state-rent-for-eth-1-x-pre-eip-document/4378)
and
[here](https://ethresear.ch/t/improving-the-ux-of-rent-with-a-sleeping-waking-mechanism/1480)).

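The size limit above could be enforced with a check along these lines. This is only a sketch: `MAX_MESSAGE_SIZE` and `within_size_limit` are hypothetical names, the 1 MiB default is the example figure from the text rather than a mandated value, and the check is assumed to run on the marshalled bytes of a single `Message`.

```python
MAX_MESSAGE_SIZE = 1 << 20  # 1 MiB; the example limit from the text, assumed configurable


def within_size_limit(marshalled: bytes, limit: int = MAX_MESSAGE_SIZE) -> bool:
    """Return True if the message may be accepted, False if it is over the limit."""
    return len(marshalled) <= limit
```

A message of exactly `limit` bytes is accepted here, since the text only calls for rejecting messages that are over the limit.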
## Message Identification

Pubsub requires that messages be uniquely identified via a message ID. This
enables a wide range of processes like de-duplication, tracking, scoring,
circuit-breaking, and others.

**The `message_id` is calculated from the `Message` struct.**

By default, **origin-stamping** is in force. This strategy relies on the string
concatenation of the `from` and `seqno` fields, to uniquely identify a message
based on the *author*.

Alternatively, a user-defined `message_id_fn` may be supplied, where
`message_id_fn(Message) => message_id`. Such a function could compute the hash
of the `data` field within the `Message`, and thus one could reify
**content-addressed messaging**.

If fabricated collisions are not a concern, or are difficult enough to produce
within the window in which the message is relevant, a `message_id` based on a
short digest of inputs may benefit performance.

> **[[ Margin note ]]:** There's a potential caveat with using hashes instead of
> seqnos: the peer won't be able to send identical messages (e.g. keepalives)
> within the timecache interval, as they will get treated as duplicates. This
> consequence may or may not be relevant to the application at hand.
> Reference: [#116](https://github.com/libp2p/specs/issues/116).

**Note that the availability of these fields on the `Message` object will depend
on the [signature policy](#signature-policy-options) configured for the topic.**

Whichever strategy is chosen, it is crucial that **all peers** participating in
a topic implement identical message ID calculation logic, or the topic will
malfunction.

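The two strategies described in this section can be sketched as follows. This is an illustration under stated assumptions, not the reference algorithm: both `from` and `seqno` are treated as raw bytes, SHA-256 is an arbitrary choice of hash, and the function names and the `size` parameter are hypothetical.

```python
import hashlib


def default_message_id(from_peer: bytes, seqno: bytes) -> bytes:
    """Origin-stamped ID: concatenation of the `from` and `seqno` fields."""
    return from_peer + seqno


def content_message_id(data: bytes, size: int = 32) -> bytes:
    """Content-addressed ID: a (possibly truncated) digest of the `data` field.

    SHA-256 is an assumption for this sketch; any hash agreed on by all
    peers in the topic would do. `size` < 32 yields a shorter digest.
    """
    return hashlib.sha256(data).digest()[:size]
```

With `content_message_id`, two messages carrying the same payload map to the same ID regardless of who published them, which is exactly the duplicate-suppression caveat raised in the margin note above.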
## Message Signing

Signature behavior is configured along two axes: signature creation, and
signature verification.

**Signature creation.** There are two configurations possible:

* `Sign`: when publishing a message, perform **origin-stamping** and produce a
  signature.
* `NoSign`: when publishing a message, do not perform **origin-stamping** and
  do not produce a signature.

For signing purposes, the `signature` and `key` fields are used:
- The `signature` field contains the signature.
- The `key` field contains the signing key when it cannot be inlined in
  the source peer ID (`from`). When present, it must match the peer ID.

The signature is computed over the marshalled message protobuf _excluding_ the
`signature` field itself.

This includes any fields that are not recognized, but still included in the
marshalled data.

The protobuf blob is prefixed by the string `libp2p-pubsub:` before signing.

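The construction of the signed payload can be sketched as below. The example assumes the caller has already marshalled the message without its `signature` field and supplies its own signing function; `signing_payload`, `sign_message` and the `sign` callback are hypothetical names, and no particular key type or signature scheme is implied.

```python
from typing import Callable

SIGN_PREFIX = b"libp2p-pubsub:"


def signing_payload(marshalled_without_signature: bytes) -> bytes:
    """Bytes that are actually signed: the prefix followed by the protobuf blob."""
    return SIGN_PREFIX + marshalled_without_signature


def sign_message(marshalled_without_signature: bytes,
                 sign: Callable[[bytes], bytes]) -> bytes:
    """Produce the value for the `signature` field using a caller-supplied signer."""
    return sign(signing_payload(marshalled_without_signature))
```

A verifier would rebuild the same payload from the received message (again excluding `signature`) and check the signature against the key derived from `from` or carried in `key`.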
> **[[ Margin note: ]]** Protobuf serialization is neither deterministic nor
> canonical, and the same data structure may result in different, valid
> serialised bytes across implementations, as well as other issues. In the near
> future, the signature creation and verification algorithm will be made
> deterministic.

**Signature verification.** There are two configurations possible:

* `Strict`: either always expect a signature, or never expect one.
* `Lax` (legacy, insecure, non-deterministic, to be deprecated): accept a signed
  message if the signature verification passes, or if it's unsigned.

When signature validation fails for a signed message, the implementation must
drop the message and omit propagation. Locally, it may treat this event in
whichever manner it wishes (e.g. logging, penalization, etc.).

#### Signature Policy Options

The usage of the `signature`, `key`, `from`, and `seqno` fields in `Message`
is configurable per topic.

> **[[ Implementation note ]]:** At the time of writing this section,
> go-libp2p-pubsub (reference implementation of this spec) allows for
> configuring the signature policy at the **global pubsub instance level**.
> This needs to be pushed down to topic-level configuration.
> Other implementations should support topic-level configuration, as this spec
> mandates.

The intersection of signing behaviours across the two axes (signature creation
and signature verification) gives rise to four signature policy options:

* `StrictSign`, `StrictNoSign`. Deterministic, usage encouraged.
* `LaxSign`, `LaxNoSign`. Non-deterministic, legacy, usage discouraged. Mostly
  for backwards compatibility. Will be deprecated. If the implementation decides
  to support these, their use should be discouraged through deprecation warnings.

**`StrictSign` option**

On the producing side:
- Build messages with the `signature`, `key` (`from` may be enough for
  certain inlineable public key types), `from` and `seqno` fields.

On the consuming side:
- Enforce the fields to be present, reject otherwise.
- Propagate only if the fields are valid and signature can be verified,
  reject otherwise.

**`StrictNoSign` option**

On the producing side:
- Build messages without the `signature`, `key`, `from` and `seqno` fields.
- The corresponding protobuf key-value pairs are absent from the marshalled
  message, not just empty.

On the consuming side:
- Enforce the fields to be absent, reject otherwise.
- Propagate only if the fields are absent, reject otherwise.
- A `message_id` function will not be able to use the above fields, and should
  instead rely on the `data` field. A commonplace strategy is to calculate
  a hash.

**`LaxSign` legacy option**

_Not required for backwards-compatibility. Considered insecure, nevertheless
defined for completeness._

Always sign, and verify incoming signatures, but accept unsigned messages.

On the producing side:
- Build messages with the `signature`, `key` (`from` may be enough), `from`
  and `seqno` fields.

On the consuming side:
- `signature` may be absent, in which case it is not verified.
- Verify `signature` if and only if it is present; reject the message if the
  `signature` is invalid.

**`LaxNoSign` option**

_Previous default for 'no signature verification' mode_.

Do not sign nor origin-stamp, but verify incoming signatures, and accept
unsigned messages.

On the producing side:
- Build messages without the `signature`, `key`, `from` and `seqno` fields.

On the consuming side:
- Accept and propagate messages with the above fields.
- Verify `signature` if and only if it is present; reject the message if the
  `signature` is invalid.

> **[[ Margin note: ]]** For content-addressed messaging, `StrictNoSign` is the
> most appropriate policy option, coupled with a user-defined `message_id_fn`,
> and a validator function to verify protocol-defined signatures.
>
> When publisher anonymity is being sought, `StrictNoSign` is also the most
> appropriate policy, as it refrains from outputting the `from` and `seqno`
> fields.

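The consuming-side rules for the four policy options can be condensed into a single acceptance check. The sketch below is one possible reading of the rules above, not the reference implementation: `Message` is a plain Python stand-in for the decoded protobuf (with `from_peer` replacing `from`, which is a Python keyword), absent fields are modelled as `None`, and the actual cryptographic check is delegated to a caller-supplied `verify` function.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Message:
    topic: str
    data: Optional[bytes] = None
    from_peer: Optional[bytes] = None  # the protobuf `from` field
    seqno: Optional[bytes] = None
    signature: Optional[bytes] = None
    key: Optional[bytes] = None


def accept(policy: str, msg: Message, verify: Callable[[Message], bool]) -> bool:
    """Decide whether a received message may be propagated under `policy`."""
    if policy == "StrictSign":
        # `key` may be absent when the public key is inlined in the peer ID,
        # so only `from`, `seqno` and `signature` are required here.
        if msg.from_peer is None or msg.seqno is None or msg.signature is None:
            return False
        return verify(msg)
    if policy == "StrictNoSign":
        # All four fields must be absent, not merely empty.
        fields = (msg.from_peer, msg.seqno, msg.signature, msg.key)
        return all(f is None for f in fields)
    if policy in ("LaxSign", "LaxNoSign"):
        # Unsigned messages pass; signed ones must verify.
        return msg.signature is None or verify(msg)
    raise ValueError(f"unknown signature policy: {policy}")
```

The two lax policies differ only on the producing side, which is why they share a branch here.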
## The Topic Descriptor

The topic descriptor message is used to define various options and parameters
of a topic. It currently specifies the topic's human readable name, its
authentication options, and its encryption options. The `AuthOpts` and `EncOpts`
of the topic descriptor message are not used in current implementations, but
may be used in the future. For clarity, this is noted as a comment in the file,
which may be removed once the fields are used.

The `TopicDescriptor` protobuf is as follows:

```protobuf
syntax = "proto2";
message TopicDescriptor {
  optional string name = 1;
  // AuthOpts and EncOpts are unused as of Oct 2018, but
  // are planned to be used in future.
  optional AuthOpts auth = 2;
  optional EncOpts enc = 3;

  message AuthOpts {
    optional AuthMode mode = 1;
    repeated bytes keys = 2;

    enum AuthMode {
      NONE = 0;
      KEY = 1;
      WOT = 2;
    }
  }

  message EncOpts {
    optional EncMode mode = 1;
    repeated bytes keyHashes = 2;

    enum EncMode {
      NONE = 0;
      SHAREDKEY = 1;
      WOT = 2;
    }
  }
}
```

The `name` field is a string used to identify or mark the topic. It can be
descriptive or random or anything that the creator chooses.

Note that instead of using `TopicDescriptor.name`, for privacy reasons the
`TopicDescriptor` struct may be hashed, and used as the topic ID. Another
option is to use a CID as a topic ID. While a consensus has not been reached,
for forwards and backwards compatibility, using an enum `TopicID` that allows
custom types in variants (i.e. `Name`, `hashedTopicDescriptor`, `CID`)
may be the most suitable option if it is available within an implementation's
language (otherwise it would be implementation defined).

The `auth` field specifies how authentication will work for this topic. Only
authenticated peers may publish to a given topic. See 'AuthOpts' below for
details.

The `enc` field specifies how messages published to this topic will be
encrypted. See 'EncOpts' below for details.

### AuthOpts

The `AuthOpts` message describes an authentication scheme. The `mode` field
specifies which scheme to use, and the `keys` field is an array of keys. The
meaning of the `keys` field is defined by the selected `AuthMode`.

There are currently three options defined for the `AuthMode` enum:

#### AuthMode 'NONE'
No authentication; anyone may publish to this topic.

#### AuthMode 'KEY'
Only peers whose peerIDs are listed in the `keys` array may publish to this
topic; messages from any other peer should be dropped.

#### AuthMode 'WOT'
Web Of Trust: any trusted peer may publish to the topic. A trusted peer is one
whose peerID is listed in the `keys` array, or any peer who is 'trusted' by
another trusted peer. The mechanism of signifying trust in another peer is yet
to be defined.

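Since `AuthOpts` is not used by current implementations, the following is purely a sketch of how the `NONE` and `KEY` modes described above might be checked. `may_publish` is a hypothetical helper, peer IDs are assumed to be compared as raw bytes against the `keys` array, and `WOT` is left unimplemented because its trust mechanism is not yet defined.

```python
from enum import IntEnum
from typing import Iterable


class AuthMode(IntEnum):
    # Values mirror the AuthMode enum in the TopicDescriptor protobuf.
    NONE = 0
    KEY = 1
    WOT = 2


def may_publish(mode: AuthMode, keys: Iterable[bytes], peer_id: bytes) -> bool:
    """Return True if `peer_id` is allowed to publish under the given mode."""
    if mode == AuthMode.NONE:
        return True  # anyone may publish
    if mode == AuthMode.KEY:
        return peer_id in set(keys)  # only listed peers may publish
    # WOT depends on a trust mechanism the spec has yet to define.
    raise NotImplementedError("WOT trust mechanism is not yet defined")
```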
### EncOpts

The `EncOpts` message describes an encryption scheme for messages in a given
topic. The `mode` field denotes which encryption scheme will be used, and the
`keyHashes` field specifies a set of hashes of keys whose purpose may be
defined by the selected mode.

There are currently three options defined for the `EncMode` enum:

#### EncMode 'NONE'
Messages are not encrypted; anyone can read them.

#### EncMode 'SHAREDKEY'
Messages are encrypted with a preshared key. The salted hash of the key used is
denoted in the `keyHashes` field of the `EncOpts` message. The mechanism for
sharing the keys and salts is undefined.

#### EncMode 'WOT'
Web Of Trust publishing. Messages are encrypted with some certificate or
certificate chain shared amongst trusted peers. (Spec writer's note: this is the
least clearly defined option and my description here may be wildly incorrect;
it needs checking.)

## Topic Validation

Implementations MUST support attaching _validators_ to topics.

_Validators_ have access to the `Message` and can apply any logic to determine its validity.
When propagating a message for a topic, implementations will invoke all validators attached
to that topic, and will continue propagation if and only if all validations pass.

In its simplest form, a _validator_ is a function with signature `(peer.ID, *Message) => bool`,
where the return value is `true` if validation passes, and `false` otherwise.

Local handling of failed validation is left up to the implementation (e.g. logging).

Implementations MAY allow dynamically adding and removing _validators_ at runtime.
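The validation rule above can be sketched as a small per-topic registry. This is a minimal illustration under stated assumptions: `TopicValidators` and its methods are hypothetical names, messages are modelled as plain dictionaries with `topic` and `data` keys, peer IDs are plain strings, and a topic with no validators attached is assumed to let every message through.

```python
from typing import Callable, Dict, List

# (peer id, message) -> True if the message is valid
Validator = Callable[[str, dict], bool]


class TopicValidators:
    """Hypothetical registry of validators keyed by topic."""

    def __init__(self) -> None:
        self._validators: Dict[str, List[Validator]] = {}

    def add(self, topic: str, validator: Validator) -> None:
        self._validators.setdefault(topic, []).append(validator)

    def remove(self, topic: str, validator: Validator) -> None:
        self._validators[topic].remove(validator)

    def validate(self, peer_id: str, msg: dict) -> bool:
        """Propagation may continue only if every attached validator passes."""
        attached = self._validators.get(msg["topic"], [])
        return all(v(peer_id, msg) for v in attached)
```

The `add` and `remove` methods correspond to the optional ability to change validators at runtime; an implementation without that feature would simply fix the list at topic creation.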