From 8d10f25e278723b69ea29364cb8f706ddec4c7d2 Mon Sep 17 00:00:00 2001 From: Yusef Napora Date: Mon, 21 Oct 2019 11:35:11 -0400 Subject: [PATCH] split into RFCs for signed envelope / addr records --- RFC/0002-signed-envelopes.md | 94 +++++++++++++ RFC/0003-address-records.md | 246 +++++++++++++++++++++++++++++++++++ 2 files changed, 340 insertions(+) create mode 100644 RFC/0002-signed-envelopes.md create mode 100644 RFC/0003-address-records.md diff --git a/RFC/0002-signed-envelopes.md b/RFC/0002-signed-envelopes.md new file mode 100644 index 0000000..7a1bfcb --- /dev/null +++ b/RFC/0002-signed-envelopes.md @@ -0,0 +1,94 @@ +# RFC 0002 - Signed Envelopes + +- Start Date: 2019-10-21 +- Related RFC: [0003 Address Records][addr-records-rfc] + +## Abstract + +This RFC proposes a "signed envelope" structure that contains an arbitray byte +string payload, a signature of the payload, and the public key that can be used +to verify the signature. + +This was spun out of an earlier draft of the [address records +RFC][addr-records-rfc], since it's generically useful. + +## Problem Statement + +Sometimes we'd like to store some data in a public location (e.g. a DHT, etc), +or make use of potentially untrustworthy intermediaries to relay information. It +would be nice to have an all-purpose data container that includes a signature of +the data, so we can verify that the data came from a specific peer and that it hasn't +been tampered with. + +## Wire Format + +Since we already have a [protobuf definition for public keys][peer-id-spec], we +can use protobuf for this as well and easily embed the key in the envelope: + + +```protobuf +message SignedEnvelope { + PublicKey publicKey = 1; // see peer id spec for definition + string purpose = 2; // arbitrary user-defined string for context + bytes cid = 3; // CIDv1 of contents + bytes contents = 4; // payload + bytes signature = 5; // signature of purpose + cid + contents +} +``` + +The `publicKey` field contains the public key whose secret counterpart was used +to sign the message. This MUST be consistent with the peer id of the signing +peer, as the recipient will derive the peer id of the signer from this key. + +The `purpose` field is an aribitrary string that can be used to give some hint +as to the contents. For example, if `contents` contains a serialized +`AddressState` record, `purpose` might contain the string `"AddressState"`. The +contents of the ``purpose`` field are signed alongside `contents` to prevent +tampering, and may be empty if desired. + +The `cid` field contains a version 1 [CID][cid] (content id) that corresponds to +the `content` field. It's used for retrieving messages from [local +storage](#local-storage-of-signed-envelopes), and the embedded multicodec also +gives a hint as to the data type of the `contents`. If the user does not specify +a multicodec when constructing the envelope, the default will be +[`raw`](https://github.com/multiformats/multicodec/blob/master/table.csv#L34) +for raw binary. + +## Signature Production / Verification + +When signing, a peer will prepare a buffer by concatenating the following: + +- The string `"libp2p-signed-envelope:"`, encoded as UTF-8 +- The `purpose` field, encoded as UTF-8 +- The `cid` field +- The `contents` field + +Then they will sign the buffer according to the rules in the [peer id +spec][peer-id-spec] and set the `signature` field accordingly. + +To verify, a peer will "inflate" the `publicKey` into a domain object that can +verify signatures, prepare a buffer as above and verify the `signature` field +against it. + +## Local Storage of Signed Envelopes + +Signed envelopes can be used for ephemeral data, but we may also want to persist +them for a while and / or make previously recieved envelopes accesible to +various libp2p modules. + +For example, if the envelope contains an [address record][addr-records-rfc], +those records might be used to populate a peer store with self-certified +records. Rather than requiring the peer store to persist the full envelope, we +could have a separate "envelope storage" service that keeps signed messages +around for future reference. + +The peer store can then just store the `cid` alongside a flag that indicates +that the address came from a trusted source. If we're using a persistent peer +store and the process restarts, we can look up the stored `cid` in the envelope +storage and verify the signature again. + +If we decide to build this, the storage service should have some kind of garbage +collection / TTL scheme to avoid unbounded growth. + +[addr-records-rfc]: ./0003-address-records.md +[peer-id-spec]: ../peer-ids/peer-ids.md diff --git a/RFC/0003-address-records.md b/RFC/0003-address-records.md new file mode 100644 index 0000000..feaf63f --- /dev/null +++ b/RFC/0003-address-records.md @@ -0,0 +1,246 @@ +# RFC 0003 - Address Records with Metadata + +- Start Date: 2019-10-04 +- Related Issues: + - [libp2p/issues/47](https://github.com/libp2p/libp2p/issues/47) + - [go-libp2p/issues/436](https://github.com/libp2p/go-libp2p/issues/436) + +## Abstract + +This RFC proposes a method for distributing address records, which contain a +peer's publicly reachable listen addresses, as well as some metadata that can +help other peers categorize addresses and prioritize thme when dialing. + +The record described here does not include a signature, but it is expected to +be serialized and wrapped in a [signed envelope][envelope-rfc], which will +prove the identity of the issuing peer. The dialer can then prioritize +self-certified addresses over addresses from an unknown origin. + +## Problem Statement + +All libp2p peers keep a "peer store" (called a peer book in some +implementations), which maps [peer ids][peer-id-spec] to a set of known +addresses for each peer. When the application layer wants to contact a peer, the +dialer will pull addresses from the peer store and try to initiate a connection +on one or more addresses. + +Addresses for a peer can come from a variety of sources. If we have already made +a connection to a peer, the libp2p [identify protocol][identify-spec] will +inform us of other addresses that they are listening on. We may also discover +their address by querying the DHT, checking a fixed "bootstrap list", or perhaps +through a pubsub message or an application-specific protocol. + +In the case of the identify protocol, we can be fairly certain that the +addresses originate from the peer we're speaking to, assuming that we're using a +secure, authenticated communication channel. However, more "ambient" discovery +methods such as DHT traversal and pubsub depend on potentially untrustworthy +third parties to relay address information. + +Even in the case of receiving addresses via the identify protocol, our +confidence that the address came directly from the peer is not actionable, because +the peer store does not track the origin of an address. Once added to the peer +store, all addresses are considered equally valid, regardless of their source. + +We would like to have a means of distributing _verifiable_ address records, +which we can prove originated from the addressed peer itself. We also need a way to +track the "provenance" of an address within libp2p's internal components such as +the peer store. Once those pieces are in place, we will also need a way to +prioritize addresses based on their authenticity, with the most strict strategy +being to only dial certified addresses. + +### Complications + +While producing a signed record is fairly trivial, there are a few aspects to +this problem that complicate things. + +1. Addresses are not static. A given peer may have several addresses at any given + time, and the set of addresses can change at arbitrary times. +2. Peers may not know their own addresses. It's often impossible to automatically + infer one's own public address, and peers may need to rely on third party + peers to inform them of their observed public addresses. +3. A peer may inadvertently or maliciously sign an address that they do not + control. In other words, a signature isn't a guarantee that a given address is + valid. +4. Some addresses may be ambiguous. For example, addresses on a private subnet + are valid within that subnet but are useless on the public internet. + +The first point implies that the address record should include some kind of +temporal component, so that newer records can replace older ones as the state +changes over time. This could be a timestamp and/or a simple sequence number +that each node increments whenever they publish a new record. + +The second and third points highlight the limits of certifying information that +is itself uncertain. While a signature can prove that the addresses originated +from the peer, it cannot prove that the addresses are correct or useful. Given +the asymmetric nature of real-world NATs, it's often the case that a peer is +_less likely_ to have correct information about its own address than an outside +observer, at least initially. + +This suggests that we should include some measure of "confidence" in our +records, so that peers can distribute addresses that they are not fully certain +are correct, while still asserting that they created the record. For example, +when requesting a dial-back via the [AutoNAT service][autonat], a peer could +send a "provisional" address record. When the AutoNAT peer confirms the address, +that address could be marked as confirmed and advertised in a new record. + +Regarding the fourth point about ambiguous addresses, it would also be desirable +for the address record to include a notion of "routability," which would +indicate how "accessible" the address is likely to be. This would allow us to +mark an address as "LAN-only," if we know that it is not mapped to a publicly +reachable address but would still like to distribute it to local peers. + +## Address Record Format + +Here's a protobuf that might work: + +```protobuf +// Routability indicates the "scope" of an address, meaning how visible +// or accessible it is. This allows us to distinguish between LAN and +// WAN addresses. +// +// Side Note: we could potentially have a GLOBAL_RELAY case, which would +// make it easy to prioritize non-relay addresses in the dialer. Bit of +// a mix of concerns though. +enum Routability { + // catch-all default / unknown scope + UNKNOWN = 1; + + // another process on the same machine + LOOPBACK = 2; + + // a local area network + LOCAL = 3; + + // public internet + GLOBAL = 4; + + // reserved for future use + INTERPLANETARY = 100; +} + + +// Confidence indicates how much we believe in the validity of the +// address. +enum Confidence { + // default, unknown confidence. we don't know one way or another + UNKNOWN = 1; + + // INVALID means we know that this address is invalid and should be deleted + INVALID = 2; + + // UNCONFIRMED means that we suspect this address is valid, but we haven't + // fully confirmed that we're reachable. + UNCONFIRMED = 3; + + // CONFIRMED means that we fully believe this address is valid. + // Each node / implementation can have their own criteria for confirmation. + CONFIRMED = 4; +} + +// AddressInfo is a multiaddr plus some metadata. +message AddressInfo { + bytes multiaddr = 1; + Routability routability = 2; + Confidence confidence = 3; +} + +// AddressState contains the listen addresses (and their metadata) +// for a peer at a particular point in time. +// +// Although this record contains a wall-clock `issuedAt` timestamp, +// there are no guarantees about node clocks being in sync or correct. +// As such, the `issuedAt` field should be considered informational, +// and `version` should be preferred when ordering records. +message AddressState { + // the peer id of the subject of the record. + bytes subjectPeer = 1; + + // `version` is an increment-only counter that can be used to + // order AddressState records chronologically. Newer records + // MUST have a higher `version` than older records, but there + // can be gaps between version numbers. + uint64 version = 2; + + // The `issuedAt` timestamp stores the creation time of this record in + // seconds from the unix epoch, according to the issuer's clock. There + // are no guarantees about clock sync or correctness. SHOULD NOT be used + // to order AddressState records; use `seqno` instead. + uint64 issuedAt = 3; + + // All current listen addresses and their metadata. + repeated AddressInfo addresses = 4; +} +``` + +The idea with the structure above is that you send some metadata along with your +addresses: your "routability", and your own confidence in the validity of the +address. This is wrapped in an `AddressInfo` struct along with the address +itself. + +Then you have a big list of `AddressInfo`s, which we put in an `AddressState`. +An `AddressState` identifies the `subject` of the record, + + +#### Example + +Here's an example. Alice has an address that she thinks is publicly reachable +but has not confirmed. She also has a LAN-local address that she knows is valid, +but not routable via the public internet: + +```javascript + { + subjectPeer: "QmAlice...", + version: 23456, + issuedAt: 1570215229, + + addresses: [ + { + addr: "/ip4/1.2.3.4/tcp/42/p2p/QmAlice", + routability: "GLOBAL", + confidence: "UNCONFIRMED" + }, + { + addr: "/ip4/10.0.1.2/tcp/42/p2p/QmAlice", + routability: "LOCAL", + confidence: "CONFIRMED" + } + ] + } +``` + +If Alice wants to publish her address to a public shared resource like a DHT, +she should omit `LOCAL` and other unreachable addresses, and peers should +likewise filter out `LOCAL` addresses from public sources. + +## Certification / Verification + +This structure can be contained in a [signed envelope][envelope-rfc], which lets +us issue "self-certified" address records that are signed by the `subjectPeer`. + +## Peer Store APIs + + + +## Dialing Strategies + + +## TODO + +Some things I'd like to cover but haven't got to or figured out yet: + +- how to store signed records + - should be separate from "working set" that's optimized for retrieval + - need to store unaltered bytes +- how to surface routability and confidence via peerstore APIs +- figure out if IPLD is the way to go here. If not, what serialization format, + etc. +- extend identify protocol to include signed records? +- how are addresses prioritized when dialing? + + +[identify-spec]: ../identify/README.md +[peer-id-spec]: ../peer-ids/peer-ids.md +[autonat]: https://github.com/libp2p/specs/issues/180 +[ipld]: https://ipld.io/ +[ipld-schema-schema]: https://github.com/ipld/specs/blob/master/schemas/schema-schema.ipldsch +[envelope-rfc]: ./0002-signed-envelopes.md