split into RFCs for signed envelope / addr records

2026-01-09 15:28:03 -05:00 · 2019-10-21 11:35:11 -04:00
parent 5351d94fe6
commit 8d10f25e27
2 changed files with 340 additions and 0 deletions
--- a/RFC/0002-signed-envelopes.md
+++ b/RFC/0002-signed-envelopes.md
@@ -0,0 +1,94 @@
 # RFC 0002 - Signed Envelopes
 - Start Date: 2019-10-21
 - Related RFC: [0003 Address Records][addr-records-rfc]
 ## Abstract
 This RFC proposes a "signed envelope" structure that contains an arbitray byte
 string payload, a signature of the payload, and the public key that can be used
 to verify the signature.
 This was spun out of an earlier draft of the [address records
 RFC][addr-records-rfc], since it's generically useful.
 ## Problem Statement
 Sometimes we'd like to store some data in a public location (e.g. a DHT, etc),
 or make use of potentially untrustworthy intermediaries to relay information. It
 would be nice to have an all-purpose data container that includes a signature of
 the data, so we can verify that the data came from a specific peer and that it hasn't
 been tampered with.
 ## Wire Format
 Since we already have a [protobuf definition for public keys][peer-id-spec], we
 can use protobuf for this as well and easily embed the key in the envelope:
 ```protobuf
 message SignedEnvelope {
  PublicKey publicKey = 1; // see peer id spec for definition
  string purpose = 2;      // arbitrary user-defined string for context
  bytes cid = 3;           // CIDv1 of contents
  bytes contents = 4;      // payload
  bytes signature = 5;     // signature of purpose + cid + contents
 }
 ```
 The `publicKey` field contains the public key whose secret counterpart was used
 to sign the message. This MUST be consistent with the peer id of the signing
 peer, as the recipient will derive the peer id of the signer from this key.
 The `purpose` field is an aribitrary string that can be used to give some hint
 as to the contents. For example, if `contents` contains a serialized
 `AddressState` record, `purpose` might contain the string `"AddressState"`. The
 contents of the ``purpose`` field are signed alongside `contents` to prevent
 tampering, and may be empty if desired.
 The `cid` field contains a version 1 [CID][cid] (content id) that corresponds to
 the `content` field. It's used for retrieving messages from [local
 storage](#local-storage-of-signed-envelopes), and the embedded multicodec also
 gives a hint as to the data type of the `contents`. If the user does not specify
 a multicodec when constructing the envelope, the default will be
 [`raw`](https://github.com/multiformats/multicodec/blob/master/table.csv#L34)
 for raw binary.
 ## Signature Production / Verification
 When signing, a peer will prepare a buffer by concatenating the following:
 - The string `"libp2p-signed-envelope:"`, encoded as UTF-8
 - The `purpose` field, encoded as UTF-8
 - The `cid` field
 - The `contents` field
 Then they will sign the buffer according to the rules in the [peer id
 spec][peer-id-spec] and set the `signature` field accordingly.
 To verify, a peer will "inflate" the `publicKey` into a domain object that can
 verify signatures, prepare a buffer as above and verify the `signature` field
 against it.
 ## Local Storage of Signed Envelopes
 Signed envelopes can be used for ephemeral data, but we may also want to persist
 them for a while and / or make previously recieved envelopes accesible to
 various libp2p modules.
 For example, if the envelope contains an [address record][addr-records-rfc],
 those records might be used to populate a peer store with self-certified
 records. Rather than requiring the peer store to persist the full envelope, we
 could have a separate "envelope storage" service that keeps signed messages
 around for future reference. 
 The peer store can then just store the `cid` alongside a flag that indicates
 that the address came from a trusted source. If we're using a persistent peer
 store and the process restarts, we can look up the stored `cid` in the envelope
 storage and verify the signature again.
 If we decide to build this, the storage service should have some kind of garbage
 collection / TTL scheme to avoid unbounded growth.
 [addr-records-rfc]: ./0003-address-records.md
 [peer-id-spec]: ../peer-ids/peer-ids.md
--- a/RFC/0003-address-records.md
+++ b/RFC/0003-address-records.md
@@ -0,0 +1,246 @@
 # RFC 0003 - Address Records with Metadata
 - Start Date: 2019-10-04
 - Related Issues:
  - [libp2p/issues/47](https://github.com/libp2p/libp2p/issues/47)
  - [go-libp2p/issues/436](https://github.com/libp2p/go-libp2p/issues/436)
 ## Abstract
 This RFC proposes a method for distributing address records, which contain a
 peer's publicly reachable listen addresses, as well as some metadata that can
 help other peers categorize addresses and prioritize thme when dialing.
 The record described here does not include a signature, but it is expected to
 be serialized and wrapped in a [signed envelope][envelope-rfc], which will
 prove the identity of the issuing peer. The dialer can then prioritize
 self-certified addresses over addresses from an unknown origin.
 ## Problem Statement
 All libp2p peers keep a "peer store" (called a peer book in some
 implementations), which maps [peer ids][peer-id-spec] to a set of known
 addresses for each peer. When the application layer wants to contact a peer, the
 dialer will pull addresses from the peer store and try to initiate a connection
 on one or more addresses.
 Addresses for a peer can come from a variety of sources. If we have already made
 a connection to a peer, the libp2p [identify protocol][identify-spec] will
 inform us of other addresses that they are listening on. We may also discover
 their address by querying the DHT, checking a fixed "bootstrap list", or perhaps
 through a pubsub message or an application-specific protocol.
 In the case of the identify protocol, we can be fairly certain that the
 addresses originate from the peer we're speaking to, assuming that we're using a
 secure, authenticated communication channel. However, more "ambient" discovery
 methods such as DHT traversal and pubsub depend on potentially untrustworthy
 third parties to relay address information.
 Even in the case of receiving addresses via the identify protocol, our
 confidence that the address came directly from the peer is not actionable, because
 the peer store does not track the origin of an address. Once added to the peer
 store, all addresses are considered equally valid, regardless of their source.
 We would like to have a means of distributing _verifiable_ address records,
 which we can prove originated from the addressed peer itself. We also need a way to
 track the "provenance" of an address within libp2p's internal components such as
 the peer store. Once those pieces are in place, we will also need a way to
 prioritize addresses based on their authenticity, with the most strict strategy
 being to only dial certified addresses.
 ### Complications
 While producing a signed record is fairly trivial, there are a few aspects to
 this problem that complicate things.
 1. Addresses are not static. A given peer may have several addresses at any given
   time, and the set of addresses can change at arbitrary times.
 2. Peers may not know their own addresses. It's often impossible to automatically
   infer one's own public address, and peers may need to rely on third party
   peers to inform them of their observed public addresses.
 3. A peer may inadvertently or maliciously sign an address that they do not
   control. In other words, a signature isn't a guarantee that a given address is
   valid.
 4. Some addresses may be ambiguous. For example, addresses on a private subnet
   are valid within that subnet but are useless on the public internet.
 The first point implies that the address record should include some kind of
 temporal component, so that newer records can replace older ones as the state
 changes over time. This could be a timestamp and/or a simple sequence number
 that each node increments whenever they publish a new record.
 The second and third points highlight the limits of certifying information that
 is itself uncertain. While a signature can prove that the addresses originated
 from the peer, it cannot prove that the addresses are correct or useful. Given
 the asymmetric nature of real-world NATs, it's often the case that a peer is
 _less likely_ to have correct information about its own address than an outside
 observer, at least initially.
 This suggests that we should include some measure of "confidence" in our
 records, so that peers can distribute addresses that they are not fully certain
 are correct, while still asserting that they created the record. For example,
 when requesting a dial-back via the [AutoNAT service][autonat], a peer could
 send a "provisional" address record. When the AutoNAT peer confirms the address,
 that address could be marked as confirmed and advertised in a new record.
 Regarding the fourth point about ambiguous addresses, it would also be desirable
 for the address record to include a notion of "routability," which would
 indicate how "accessible" the address is likely to be. This would allow us to
 mark an address as "LAN-only," if we know that it is not mapped to a publicly
 reachable address but would still like to distribute it to local peers.
 ## Address Record Format
 Here's a protobuf that might work:
 ```protobuf
 // Routability indicates the "scope" of an address, meaning how visible
 // or accessible it is. This allows us to distinguish between LAN and
 // WAN addresses.
 //
 // Side Note: we could potentially have a GLOBAL_RELAY case, which would
 // make it easy to prioritize non-relay addresses in the dialer. Bit of
 // a mix of concerns though.
 enum Routability {
  // catch-all default / unknown scope
  UNKNOWN = 1;
  // another process on the same machine
  LOOPBACK = 2;
  // a local area network
  LOCAL = 3;
  // public internet
  GLOBAL = 4;
  // reserved for future use
  INTERPLANETARY = 100;
 }
 // Confidence indicates how much we believe in the validity of the
 // address.
 enum Confidence {
  // default, unknown confidence. we don't know one way or another
  UNKNOWN = 1;
  // INVALID means we know that this address is invalid and should be deleted
  INVALID = 2;
  // UNCONFIRMED means that we suspect this address is valid, but we haven't
  // fully confirmed that we're reachable.
  UNCONFIRMED = 3;
  // CONFIRMED means that we fully believe this address is valid.
  // Each node / implementation can have their own criteria for confirmation.
  CONFIRMED = 4;
 }
 // AddressInfo is a multiaddr plus some metadata.
 message AddressInfo {
  bytes multiaddr = 1;
  Routability routability = 2;
  Confidence confidence = 3;
 }
 // AddressState contains the listen addresses (and their metadata) 
 // for a peer at a particular point in time.
 //
 // Although this record contains a wall-clock `issuedAt` timestamp,
 // there are no guarantees about node clocks being in sync or correct.
 // As such, the `issuedAt` field should be considered informational,
 // and `version` should be preferred when ordering records.
 message AddressState {
  // the peer id of the subject of the record.
  bytes subjectPeer = 1;
  // `version` is an increment-only counter that can be used to
  // order AddressState records chronologically. Newer records
  // MUST have a higher `version` than older records, but there
  // can be gaps between version numbers.
  uint64 version = 2;
  // The `issuedAt` timestamp stores the creation time of this record in
  // seconds from the unix epoch, according to the issuer's clock. There
  // are no guarantees about clock sync or correctness. SHOULD NOT be used
  // to order AddressState records; use `seqno` instead.
  uint64 issuedAt = 3;
  // All current listen addresses and their metadata.
  repeated AddressInfo addresses = 4;
 }
 ```
 The idea with the structure above is that you send some metadata along with your
 addresses: your "routability", and your own confidence in the validity of the
 address. This is wrapped in an `AddressInfo` struct along with the address
 itself.
 Then you have a big list of `AddressInfo`s, which we put in an `AddressState`.
 An `AddressState` identifies the `subject` of the record,
 #### Example
 Here's an example. Alice has an address that she thinks is publicly reachable
 but has not confirmed. She also has a LAN-local address that she knows is valid,
 but not routable via the public internet:
 ```javascript
  {
    subjectPeer: "QmAlice...",
    version: 23456,
    issuedAt: 1570215229,
    addresses: [
      {
        addr: "/ip4/1.2.3.4/tcp/42/p2p/QmAlice",
        routability: "GLOBAL",
        confidence: "UNCONFIRMED"
      },
      {
        addr: "/ip4/10.0.1.2/tcp/42/p2p/QmAlice",
        routability: "LOCAL",
        confidence: "CONFIRMED"
      }
    ]
  }
 ```
 If Alice wants to publish her address to a public shared resource like a DHT,
 she should omit `LOCAL` and other unreachable addresses, and peers should
 likewise filter out `LOCAL` addresses from public sources.
 ## Certification / Verification
 This structure can be contained in a [signed envelope][envelope-rfc], which lets
 us issue "self-certified" address records that are signed by the `subjectPeer`.
 ## Peer Store APIs
 ## Dialing Strategies
 ## TODO
 Some things I'd like to cover but haven't got to or figured out yet:
 - how to store signed records 
  - should be separate from "working set" that's optimized for retrieval
  - need to store unaltered bytes
 - how to surface routability and confidence via peerstore APIs
 - figure out if IPLD is the way to go here. If not, what serialization format,
  etc.
 - extend identify protocol to include signed records?
 - how are addresses prioritized when dialing?
 [identify-spec]: ../identify/README.md
 [peer-id-spec]: ../peer-ids/peer-ids.md
 [autonat]: https://github.com/libp2p/specs/issues/180
 [ipld]: https://ipld.io/
 [ipld-schema-schema]: https://github.com/ipld/specs/blob/master/schemas/schema-schema.ipldsch
 [envelope-rfc]: ./0002-signed-envelopes.md