From 77e3b6689466ed894c6752ab9091ced38ede3e88 Mon Sep 17 00:00:00 2001 From: Yusef Napora Date: Fri, 4 Oct 2019 16:00:59 -0400 Subject: [PATCH] rfc: 1st draft for signed address records --- RFC/0002-signed-address-records.md | 242 +++++++++++++++++++++++++++++ 1 file changed, 242 insertions(+) create mode 100644 RFC/0002-signed-address-records.md diff --git a/RFC/0002-signed-address-records.md b/RFC/0002-signed-address-records.md new file mode 100644 index 0000000..d83ab6b --- /dev/null +++ b/RFC/0002-signed-address-records.md @@ -0,0 +1,242 @@ +# RFC 0002 - Signed Address Records + +- Start Date: 2019-10-04 +- Related Issues: + - [libp2p/issues/47](https://github.com/libp2p/libp2p/issues/47) + - [go-libp2p/issues/436](https://github.com/libp2p/go-libp2p/issues/436) + +## Abstract + +This RFC proposes a method for distributing _self-certified_ address records, +which contain a peer's publicly reachable listen addresses. The record also +includes a signature, which proves that the record was produced by the peer +itself and not tampered with in transit. + +## Problem Statement + +All libp2p peers keep a "peer store" (called a peer book in some +implementations), which maps [peer ids][peer-id-spec] to a set of known +addresses for each peer. When the application layer wants to contact a peer, the +dialer will pull addresses from the peer store and try to initiate a connection +on one or more addresses. + +Addresses for a peer can come from a variety of sources. If we have already made +a connection to a peer, the libp2p [identify protocol][identify-spec] will +inform us of other addresses that they are listening on. We may also discover +their address by querying the DHT, checking a fixed "bootstrap list", or perhaps +through a pubsub message or an application-specific protocol. + +In the case of the identify protocol, we can be fairly certain that the +addresses originate from the peer we're speaking to, assuming that we're using a +secure, authenticated communication channel. However, more "ambient" discovery +methods such as DHT traversal and pubsub depend on potentially untrustworthy +third parties to relay address information. + +Even in the case of receiving addresses via the identify protocol, our +confidence that the address came directly from the peer is not actionable, because +the peer store does not track the origin of an address. Once added to the peer +store, all addresses are considered equally valid, regardless of their source. + +We would like to have a means of distributing _verifiable_ address records, +which we can prove originated from the addressed peer itself. We also need a way to +track the "provenance" of an address within libp2p's internal components such as +the peer store. Once those pieces are in place, we will also need a way to +prioritize addresses based on their authenticity, with the most strict strategy +being to only dial certified addresses. + +### Complications + +While producing a signed record is fairly trivial, there are a few aspects to +this problem that complicate things. + +1. Addresses are not static. A given peer may have several addresses at any given + time, and the set of addresses can change at arbitrary times. +2. Peers may not know their own addresses. It's often impossible to automatically + infer one's own public address, and peers may need to rely on third party + peers to inform them of their observed public addresses. +3. A peer may inadvertently or maliciously sign an address that they do not + control. In other words, a signature isn't a guarantee that a given address is + valid. +4. Some addresses may be ambiguous. For example, addresses on a private subnet + are valid within that subnet but are useless on the public internet. + +The first point implies that the address record should include some kind of +temporal component, so that newer records can replace older ones as the state +changes over time. This could be a timestamp and/or a simple sequence number +that each node increments whenever they publish a new record. + +The second and third points highlight the limits of certifying information that +is itself uncertain. While a signature can prove that the addresses originated +from the peer, it cannot prove that the addresses are correct or useful. Given +the asymmetric nature of real-world NATs, it's often the case that a peer is +_less likely_ to have correct information about its own address than an outside +observer, at least initially. + +This suggests that we should include some measure of "confidence" in our +records, so that peers can distribute addresses that they are not fully certain +are correct, while still asserting that they created the record. For example, +when requesting a dial-back via the [AutoNAT service][autonat], a peer could +send a "provisional" address record. When the AutoNAT peer confirms the address, +that address could be marked as publicly-routable and advertised in a new record. + +Regarding the fourth point about ambiguous addresses, it would also be desirable +for the address record to include a notion of "routability," which would +indicate how "accessible" the address is likely to be. This would allow us to +mark an address as "LAN-only," if we know that it is not mapped to a publicly +reachable address but would still like to distribute it to local peers. + +## Address Record Format + +There are many potential data structures that we could use to store and transmit +address information. This section sketches out a possible design using +[IPLD][ipld], although we may end up adopting a different format. Everything in +this section is subject to change as part of the RFC process. + +These types are defined using IPLD's Schema notation, the best reference for +which I'm currently aware of is [its own schema definition][ipld-schema-schema]. + +```sh + +## How accessible we believe a given address to be. +## Maybe include params? We could potentially have a subnet mask for local addresses +type Routability enum { + | "GLOBAL" ## Available on the public internet + | "LOCAL" ## Available on a local network (probably in a private address range) + | "LOOPBACK" ## Available on a loopback address on the same machine + | "UNKNOWN" ## Catch all (may include in-memory transports, etc) +} + +## How confident we are in the validity of an address +type Confidence enum { + | "CONFIRMED" ## We have verified that we're reachable on this address + | "UNCONFIRMED" ## We suspect, but have not confirmed that we're reachable + | "INVALID" ## We know that this address is invalid and should be deleted + | "UNKNOWN" ## No assertions about validity one way or another +} + +## A tuple of an address, how "routable" (public / private, etc) the address is, +## and how confident we are in its validity. +type AddressInfo struct { + addr Bytes ## Binary multiaddr + routability Routability + confidence Confidence +} + +## A point-in-time snapshot of all addresses (plus their info) that we know +## about at the time we issued the record. +## +type AddressState struct { + ## The subject of this record. Who do these addresses belong to? + subject PeerRef + + ## When was this record constructed? + issuedAt Timestamp + + ## A list of all AddressInfo records that apply at the current moment. + addresses List { + valueType &AddressInfo + } +} + +## A signed envelope containing an `AddressState` struct, our +## public key, and a signature of the state (verifiable with public key). +type AddressEnvelope { + state AddressState + + # Public key of issuer. + pubkey Bytes + + # Signature of `state`. Can be verified with `pubkey`. + # Maybe it's better to sign a merkle link to `state` instead... + sig Bytes +} + +## Unix epoch timestamp, UTC timezone. TODO: what precision? +type Timestamp Int + +# binary multihash of public key +type PeerId Bytes + +## A peer id, plus a peer-specific version clock. +## Represents a peer _at a moment in time_, where time is loosely defined as +## unit-less quantity that's always increasing. Version +## numbers must increase monotonically but do not need to be strictly +## sequential. If you don't want to preserve state across restarts or coordinate +## a counter, you can use epoch timestamps as version numbers. +type PeerRef struct { + peer PeerId + version Int +} +``` + +The idea with the structure above is that you send some metadata along with your +addresses: your "routability", and your own confidence in the validity of the +address. This is wrapped in an `AddressInfo` struct along with the address +itself. + +Then you have a big list of `AddressInfo`s, which we put in an `AddressState`. +An `AddressState` identifies the `subject` of the record, who is also the +issuing peer. We could potentially split that out into a separate `subject` and +`issuer` field, which would let peers make statements about each other in +addition to making statements about themselves. That complicates things though, +and may not be worth it. + +The state and a signature of it are wrapped in an `AddressEnvelope`, along with +the public key that produced the signature. Recipients must validate that the +public key is consistent with the peer id of the `subject` and validate the sig. + +Here's an example. Alice has an address that she thinks is publicly reachable +but has not confirmed. She also has a LAN-local address that she knows is valid, +but not routable via the public internet: + +```javascript + { + + pubkey: "", + state: { + subject: { + peer: "QmAlice...", + version: 23456 + }, + issuedAt: 1570215229, + + addresses: [ + { + addr: "/ip4/1.2.3.4/tcp/42/p2p/QmAlice", + routability: "GLOBAL", + confidence: "UNCONFIRMED" + }, + { + addr: "/ip4/10.0.1.2/tcp/42/p2p/QmAlice", + routability: "LOCAL", + confidence: "CONFIRMED" + } + ] + }, + sig: "" + } +``` + +If Alice wants to publish her address to a public shared resource like a DHT, +she should omit `LOCAL` and other unreachable addresses, and peers should +likewise filter out `LOCAL` addresses from public sources. + +## TODO + +Some things I'd like to cover but haven't got to or figured out yet: + +- how to store signed records + - should be separate from "working set" that's optimized for retrieval + - need to store unaltered bytes +- how to surface routability and confidence via peerstore APIs +- figure out if IPLD is the way to go here. If not, what serialization format, + etc. +- extend identify protocol to include signed records? +- how are addresses prioritized when dialing? + + +[identify-spec]: ../identify/README.md +[peer-id-spec]: ../peer-ids/peer-ids.md +[autonat]: https://github.com/libp2p/specs/issues/180 +[ipld]: https://ipld.io/ +[ipld-schema-schema]: https://github.com/ipld/specs/blob/master/schemas/schema-schema.ipldsch