trim scope & rename to "routing records"

This commit is contained in:
Yusef Napora
2019-11-01 17:20:12 -04:00
parent 107ddde284
commit cba046fd42
2 changed files with 237 additions and 267 deletions

@@ -1,267 +0,0 @@
# RFC 0003 - Address Records with Metadata
- Start Date: 2019-10-04
- Related Issues:
- [libp2p/issues/47](https://github.com/libp2p/libp2p/issues/47)
- [go-libp2p/issues/436](https://github.com/libp2p/go-libp2p/issues/436)
## Abstract
This RFC proposes a method for distributing address records, which contain a
peer's publicly reachable listen addresses, as well as some metadata that can
help other peers categorize addresses and prioritize them when dialing.
The record described here does not include a signature, but it is expected to
be serialized and wrapped in a [signed envelope][envelope-rfc], which will
prove the identity of the issuing peer. The dialer can then prioritize
self-certified addresses over addresses from an unknown origin.
## Problem Statement
All libp2p peers keep a "peer store" (called a peer book in some
implementations), which maps [peer ids][peer-id-spec] to a set of known
addresses for each peer. When the application layer wants to contact a peer, the
dialer will pull addresses from the peer store and try to initiate a connection
on one or more addresses.
Addresses for a peer can come from a variety of sources. If we have already made
a connection to a peer, the libp2p [identify protocol][identify-spec] will
inform us of other addresses that they are listening on. We may also discover
their address by querying the DHT, checking a fixed "bootstrap list", or perhaps
through a pubsub message or an application-specific protocol.
In the case of the identify protocol, we can be fairly certain that the
addresses originate from the peer we're speaking to, assuming that we're using a
secure, authenticated communication channel. However, more "ambient" discovery
methods such as DHT traversal and pubsub depend on potentially untrustworthy
third parties to relay address information.
Even in the case of receiving addresses via the identify protocol, our
confidence that the address came directly from the peer is not actionable, because
the peer store does not track the origin of an address. Once added to the peer
store, all addresses are considered equally valid, regardless of their source.
We would like to have a means of distributing _verifiable_ address records,
which we can prove originated from the addressed peer itself. We also need a way to
track the "provenance" of an address within libp2p's internal components such as
the peer store. Once those pieces are in place, we will also need a way to
prioritize addresses based on their authenticity, with the most strict strategy
being to only dial certified addresses.
### Complications
While producing a signed record is fairly trivial, there are a few aspects to
this problem that complicate things.
1. Addresses are not static. A given peer may have several addresses at any given
time, and the set of addresses can change at arbitrary times.
2. Peers may not know their own addresses. It's often impossible to automatically
infer one's own public address, and peers may need to rely on third party
peers to inform them of their observed public addresses.
3. A peer may inadvertently or maliciously sign an address that they do not
control. In other words, a signature isn't a guarantee that a given address is
valid.
4. Some addresses may be ambiguous. For example, addresses on a private subnet
are valid within that subnet but are useless on the public internet.
The first point implies that the address record should include some kind of
temporal component, so that newer records can replace older ones as the state
changes over time. This could be a timestamp and/or a simple sequence number
that each node increments whenever they publish a new record.
The second and third points highlight the limits of certifying information that
is itself uncertain. While a signature can prove that the addresses originated
from the peer, it cannot prove that the addresses are correct or useful. Given
the asymmetric nature of real-world NATs, it's often the case that a peer is
_less likely_ to have correct information about its own address than an outside
observer, at least initially.
This suggests that we should include some measure of "confidence" in our
records, so that peers can distribute addresses that they are not fully certain
are correct, while still asserting that they created the record. For example,
when requesting a dial-back via the [AutoNAT service][autonat], a peer could
send a "provisional" address record. When the AutoNAT peer confirms the address,
that address could be marked as confirmed and advertised in a new record.
Regarding the fourth point about ambiguous addresses, it would also be desirable
for the address record to include a notion of "routability," which would
indicate how "accessible" the address is likely to be. This would allow us to
mark an address as "LAN-only," if we know that it is not mapped to a publicly
reachable address but would still like to distribute it to local peers.
## Address Record Format
Here's a protobuf that might work:
```protobuf
// Routability indicates the "scope" of an address, meaning how visible
// or accessible it is. This allows us to distinguish between LAN and
// WAN addresses.
//
// Side Note: we could potentially have a GLOBAL_RELAY case, which would
// make it easy to prioritize non-relay addresses in the dialer. Bit of
// a mix of concerns though.
enum Routability {
// catch-all default / unknown scope
UNKNOWN = 1;
// another process on the same machine
LOOPBACK = 2;
// a local area network
LOCAL = 3;
// public internet
GLOBAL = 4;
// reserved for future use
INTERPLANETARY = 100;
}
// Confidence indicates how much we believe in the validity of the
// address.
enum Confidence {
// default, unknown confidence. we don't know one way or another
UNKNOWN = 1;
// INVALID means we know that this address is invalid and should be deleted
INVALID = 2;
// UNCONFIRMED means that we suspect this address is valid, but we haven't
// fully confirmed that we're reachable.
UNCONFIRMED = 3;
// CONFIRMED means that we fully believe this address is valid.
// Each node / implementation can have their own criteria for confirmation.
CONFIRMED = 4;
}
// AddressInfo is a multiaddr plus some metadata.
message AddressInfo {
bytes multiaddr = 1;
Routability routability = 2;
Confidence confidence = 3;
}
// AddressState contains the listen addresses (and their metadata)
// for a peer at a particular point in time.
//
// Although this record contains a wall-clock `issuedAt` timestamp,
// there are no guarantees about node clocks being in sync or correct.
// As such, the `issuedAt` field should be considered informational,
// and `version` should be preferred when ordering records.
message AddressState {
// the peer id of the subject of the record.
bytes subjectPeer = 1;
// `version` is an increment-only counter that can be used to
// order AddressState records chronologically. Newer records
// MUST have a higher `version` than older records, but there
// can be gaps between version numbers.
uint64 version = 2;
// The `issuedAt` timestamp stores the creation time of this record in
// seconds from the unix epoch, according to the issuer's clock. There
// are no guarantees about clock sync or correctness. SHOULD NOT be used
// to order AddressState records; use `version` instead.
uint64 issuedAt = 3;
// All current listen addresses and their metadata.
repeated AddressInfo addresses = 4;
}
```
The idea with the structure above is that you send some metadata along with your
addresses: your "routability", and your own confidence in the validity of the
address. This is wrapped in an `AddressInfo` struct along with the address
itself.
Then you have a big list of `AddressInfo`s, which we put in an `AddressState`.
An `AddressState` identifies the `subjectPeer`, which is the peer that the
record is about, to whom the addresses belong. It also includes a `version`
number, so that we can replace earlier `AddressState`s with newer ones, and a
timestamp for informational purposes.
#### Example
Here's an example. Alice has an address that she thinks is publicly reachable
but has not confirmed. She also has a LAN-local address that she knows is valid,
but not routable via the public internet:
```javascript
{
subjectPeer: "QmAlice...",
version: 23456,
issuedAt: 1570215229,
addresses: [
{
multiaddr: "/ip4/1.2.3.4/tcp/42/p2p/QmAlice",
routability: "GLOBAL",
confidence: "UNCONFIRMED"
},
{
multiaddr: "/ip4/10.0.1.2/tcp/42/p2p/QmAlice",
routability: "LOCAL",
confidence: "CONFIRMED"
}
]
}
```
If Alice wants to publish her address to a public shared resource like a DHT,
she should omit `LOCAL` and other unreachable addresses, and peers should
likewise filter out `LOCAL` addresses from public sources.
## Certification / Verification
This structure can be contained in a [signed envelope][envelope-rfc], which lets
us issue "self-certified" address records that are signed by the `subjectPeer`.
## Peer Store APIs
This section is a WIP, and I'd love input.
We need to figure out how to surface the address metadata in the peerstore APIs.
In go, extending the [`AddrInfo`
struct](https://github.com/libp2p/go-libp2p-core/blob/master/peer/addrinfo.go)
to include metadata seems like a decent place to start, and js likewise has
[js-peer-info](https://github.com/libp2p/js-peer-info) that could be extended.
When storing this metadata internally, we may want to make a distinction between
the remote peer's confidence in an address and our own confidence; we may decide
an address is invalid when the remote peer thinks otherwise. One idea is to have
our local confidence just be a numeric score (for easy sorting) that takes the
remote peer's confidence value as an input.
The go [AddrBook
interface](https://github.com/libp2p/go-libp2p-core/blob/master/peerstore/peerstore.go#L89)
would also need to be updated - it currently deals with "raw" multiaddrs, and
the only metadata exposed is a TTL for expiration. Changing this interface seems
like a fairly big refactor to me, especially with the implementation in another
repo. I'd love if some gophers could weigh in on a good way forward.
## Dialing Strategies
Once we're surfacing routability info alongside addresses, the dialer can decide
to optionally prioritize addresses it thinks are most likely to be reachable. We
can also add an option to only dial self-certified addresses, although that
likely won't be practical until self-certified addresses become commonplace.
## Changes to core libp2p protocols
How to publish these to the DHT? Are there backward compatibility issues with
older unsigned address records? Maybe we just publish these to a different key
prefix...
Should we update identify and mDNS discovery to use signed records?
[identify-spec]: ../identify/README.md
[peer-id-spec]: ../peer-ids/peer-ids.md
[autonat]: https://github.com/libp2p/specs/issues/180
[ipld]: https://ipld.io/
[ipld-schema-schema]: https://github.com/ipld/specs/blob/master/schemas/schema-schema.ipldsch
[envelope-rfc]: ./0002-signed-envelopes.md

237
RFC/0003-routing-records.md Normal file

@@ -0,0 +1,237 @@
# RFC 0003 - Peer Routing Records
- Start Date: 2019-10-04
- Related Issues:
- [libp2p/issues/47](https://github.com/libp2p/libp2p/issues/47)
- [go-libp2p/issues/436](https://github.com/libp2p/go-libp2p/issues/436)
## Abstract
This RFC proposes a method for distributing peer routing records, which contain
a peer's publicly reachable listen addresses, and may be extended in the future
to contain additional metadata relevant to routing. This serves a similar
purpose to [Ethereum Node Records][eip-778]. Like ENRs, libp2p routing
records should be extensible, so that we can add information relevant to as-yet
unknown use cases.
The record described here does not include a signature, but it is expected to
be serialized and wrapped in a [signed envelope][envelope-rfc], which will
prove the identity of the issuing peer. The dialer can then prioritize
self-certified addresses over addresses from an unknown origin.
## Problem Statement
All libp2p peers keep a "peer store", which maps [peer ids][peer-id-spec] to a
set of known addresses for each peer. When the application layer wants to
contact a peer, the dialer will pull addresses from the peer store and try to
initiate a connection on one or more addresses.
Addresses for a peer can come from a variety of sources. If we have already made
a connection to a peer, the libp2p [identify protocol][identify-spec] will
inform us of other addresses that they are listening on. We may also discover
their address by querying the DHT, checking a fixed "bootstrap list", or perhaps
through a pubsub message or an application-specific protocol.
In the case of the identify protocol, we can be fairly certain that the
addresses originate from the peer we're speaking to, assuming that we're using a
secure, authenticated communication channel. However, more "ambient" discovery
methods such as DHT traversal and pubsub depend on potentially untrustworthy
third parties to relay address information.
Even in the case of receiving addresses via the identify protocol, our
confidence that the address came directly from the peer is not actionable, because
the peer store does not track the origin of an address. Once added to the peer
store, all addresses are considered equally valid, regardless of their source.
We would like to have a means of distributing _verifiable_ address records,
which we can prove originated from the addressed peer itself. We also need a way to
track the "provenance" of an address within libp2p's internal components such as
the peer store. Once those pieces are in place, we will also need a way to
prioritize addresses based on their authenticity, with the most strict strategy
being to only dial certified addresses.
### Complications
While producing a signed record is fairly trivial, there are a few aspects to
this problem that complicate things.
1. Addresses are not static. A given peer may have several addresses at any given
time, and the set of addresses can change at arbitrary times.
2. Peers may not know their own addresses. It's often impossible to automatically
infer one's own public address, and peers may need to rely on third party
peers to inform them of their observed public addresses.
3. A peer may inadvertently or maliciously sign an address that they do not
control. In other words, a signature isn't a guarantee that a given address is
valid.
4. Some addresses may be ambiguous. For example, addresses on a private subnet
are valid within that subnet but are useless on the public internet.
The first point can be addressed by having records contain a sequence number
that increases monotonically when new records are issued, and by having newer
records replace older ones.
The other points, while worth thinking about, are out of scope for this RFC.
However, we can take care to make our records extensible so that we can add
additional metadata in the future. Some thoughts along these lines are in the
[Future Work section below](#future-work).
## Address Record Format
Here's a protobuf that might work:
```protobuf
// RoutingRecord contains the listen addresses for a peer at a particular point in time.
message RoutingRecord {
  // AddressInfo wraps a multiaddr. In the future, it may be extended to
  // contain additional metadata, such as "routability" (whether an address is
  // local or global, etc).
  message AddressInfo {
    bytes multiaddr = 1;
  }

  // the peer id of the subject of the record (who these addresses belong to).
  bytes subjectPeer = 1;

  // A monotonically increasing sequence number, used for record ordering.
  uint64 seq = 2;

  // All current listen addresses
  repeated AddressInfo addresses = 4;
}
```
The `AddressInfo` wrapper message is used instead of a bare multiaddr to allow
us to extend addresses with additional metadata [in the future](#future-work).
The `seq` field contains a sequence number that MUST increase monotonically as
new records are created. Newer records MUST have a higher `seq` value than older
records. To avoid persisting state across restarts, implementations MAY use unix
epoch time as the `seq` value; however, they MUST NOT attempt to interpret a
`seq` value from another peer as a valid timestamp.
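As a rough illustration of the `seq` rule above, the following sketch seeds the sequence number from unix epoch seconds but still guarantees a strictly increasing value within one process. The helper name `makeSeqGenerator` and the injectable clock are assumptions made for this example, not part of the RFC. It also assumes the clock does not move backwards across restarts; an implementation that cannot rely on that would need to persist the last value.

```javascript
// Hypothetical helper: returns a function that yields strictly increasing
// `seq` values seeded from unix epoch time in seconds. BigInt is used
// because `seq` is a uint64 on the wire.
function makeSeqGenerator(nowSeconds = () => Math.floor(Date.now() / 1000)) {
  let last = 0n;
  return function nextSeq() {
    const now = BigInt(nowSeconds());
    // Use the clock when it has advanced past the last value; otherwise bump
    // the previous value so that records issued within the same second (or
    // after a clock step backwards) still get a higher seq.
    last = now > last ? now : last + 1n;
    return last;
  };
}
```

Receivers only ever compare two `seq` values from the same peer; they never convert them back into a time.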
#### Example
```javascript
{
  subjectPeer: "QmAlice...",
  seq: 1570215229,
  addresses: [
    {
      multiaddr: "/ip4/1.2.3.4/tcp/42/p2p/QmAlice",
    },
    {
      multiaddr: "/ip4/10.0.1.2/tcp/42/p2p/QmAlice",
    }
  ]
}
```
## Certification / Verification
This structure can be contained in a [signed envelope][envelope-rfc], which lets
us issue "self-certified" address records that are signed by the `subjectPeer`.
To produce a "self-certified" address, a peer will construct a `RoutingRecord`
containing all of their publicly-reachable listen addresses. A peer SHOULD only
include addresses that it believes are routable via the public internet, ideally
having confirmed that this is the case via some external mechanism such as a
successful AutoNAT dial-back.
In some cases we may want to include localhost or LAN-local addresses; for
example, when testing the DHT using many processes on a single machine. To
support this, implementations may use a global runtime configuration flag or
environment variable to control whether local addresses will be included.
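One way the address selection described above could look is sketched below. It is deliberately simplified: it works on string multiaddrs, only inspects `/ip4/` addresses, and treats everything else as non-local. The function names and the `includeLocal` option (standing in for the runtime flag or environment variable) are assumptions for illustration. A real implementation would also want an external reachability check such as AutoNAT.

```javascript
// Returns true for /ip4/ multiaddrs in loopback, link-local, or RFC 1918
// private ranges. Simplification: non-ip4 multiaddrs are never "local" here.
function isLocalIp4(multiaddr) {
  const parts = multiaddr.split("/"); // ["", "ip4", "10.0.1.2", "tcp", ...]
  if (parts[1] !== "ip4") return false;
  const [a, b] = (parts[2] || "").split(".").map(Number);
  return (
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // 127.0.0.0/8 loopback
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) ||          // 192.168.0.0/16
    (a === 169 && b === 254)             // 169.254.0.0/16 link-local
  );
}

// Picks the listen addresses that should go into a routing record.
// `includeLocal` is a hypothetical stand-in for the configuration flag.
function selectRecordAddrs(listenAddrs, { includeLocal = false } = {}) {
  return includeLocal
    ? [...listenAddrs]
    : listenAddrs.filter((addr) => !isLocalIp4(addr));
}
```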
Once the `RoutingRecord` has been constructed, it should be serialized to a byte
string and wrapped in a [signed envelope][envelope-rfc]. The `publicKey` field
of the envelope MUST be consistent with the `subjectPeer` peer id for the record
to be considered valid.
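To make the serialization step more concrete, here is a minimal hand-rolled encoder for the `RoutingRecord` message defined earlier, following the standard protobuf wire format (varints and length-delimited fields). This is only a sketch: real implementations would normally use a protobuf library, and the function names are assumptions. The inputs are expected to be byte arrays; in practice `subjectPeer` would be the binary peer id and each address a binary-encoded multiaddr.

```javascript
// Encode a non-negative integer (Number or BigInt) as a base-128 varint.
function varint(value) {
  let v = BigInt(value);
  const out = [];
  while (v >= 0x80n) {
    out.push(Number(v & 0x7fn) | 0x80); // low 7 bits, continuation bit set
    v >>= 7n;
  }
  out.push(Number(v));
  return out;
}

// Encode a length-delimited field (wire type 2): tag, length, payload.
// Single-byte tags are enough here because all field numbers are below 16.
function lengthDelimited(fieldNumber, bytes) {
  return [(fieldNumber << 3) | 2, ...varint(bytes.length), ...bytes];
}

function encodeRoutingRecord({ subjectPeer, seq, addresses }) {
  const out = [
    ...lengthDelimited(1, subjectPeer), // bytes subjectPeer = 1
    (2 << 3) | 0, ...varint(seq),       // uint64 seq = 2 (varint wire type)
  ];
  for (const multiaddr of addresses) {
    const addressInfo = lengthDelimited(1, multiaddr); // AddressInfo.multiaddr = 1
    out.push(...lengthDelimited(4, addressInfo));      // addresses = 4
  }
  return Uint8Array.from(out);
}
```

The resulting byte string is what would be placed in the envelope and covered by the signature.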
### Signed Envelope Domain
Signed envelopes require a "domain separation" string that defines the "scope"
or purpose of a signature.
When wrapping a `RoutingRecord` in a signed envelope, the domain string MUST be
`libp2p-routing-record`.
### Signed Envelope Type Hint
Signed envelopes contain a "type hint" that indicates how to interpret the
contents of the envelope.
Ideally, we should define a new multicodec for routing records, so that we can
identify them in a few bytes. While we're still spec'ing and working on the
initial implementation, we can use the UTF-8 string `"/libp2p/routing-record"`
as the type hint value.
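The checks described in this section could be combined roughly as follows. The envelope field names `typeHint` and `contents`, and the injected helpers `verifySignature`, `peerIdFromPublicKey`, and `decodeRecord`, are assumptions for the sake of the sketch; the actual shapes are defined by the signed envelope RFC and by each implementation's crypto and peer id libraries.

```javascript
const ROUTING_RECORD_DOMAIN = "libp2p-routing-record";
const ROUTING_RECORD_TYPE_HINT = "/libp2p/routing-record";

// Hypothetical dependencies, injected so the sketch stays library-agnostic:
//   verifySignature(envelope, domain) -> boolean
//   peerIdFromPublicKey(publicKey)    -> peer id (string here)
//   decodeRecord(bytes)               -> { subjectPeer, seq, addresses }
function openRoutingEnvelope(envelope, deps) {
  const { verifySignature, peerIdFromPublicKey, decodeRecord } = deps;
  if (envelope.typeHint !== ROUTING_RECORD_TYPE_HINT) {
    throw new Error("unexpected type hint");
  }
  // The domain string scopes the signature to routing records.
  if (!verifySignature(envelope, ROUTING_RECORD_DOMAIN)) {
    throw new Error("invalid signature");
  }
  const record = decodeRecord(envelope.contents);
  // The signer must be the peer the record is about.
  if (record.subjectPeer !== peerIdFromPublicKey(envelope.publicKey)) {
    throw new Error("publicKey does not match subjectPeer");
  }
  return record;
}
```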
## Peer Store APIs
We will need to add a few methods to the peer store:
- `AddCertifiedAddrs(envelope) -> Maybe<Error>`
  - Add a self-certified address, wrapped in a signed envelope. This should
    validate the envelope signature & store the envelope for future reference.
    If any certified addresses already exist for the peer, only accept the new
    envelope if it has a greater `seq` value than existing envelopes.
- `CertifiedAddrs(peerId) -> Set<Multiaddr>`
  - return the set of self-certified addresses for the given peer id

And possibly:

- `IsCertified(peerId, multiaddr) -> Boolean`
  - has a particular address been self-certified by the given peer?
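The methods listed above might be sketched in memory like this. The class name `CertifiedAddrBook` and the injected `openEnvelope` function (assumed to validate an envelope and return its decoded record, or throw) are hypothetical; addresses are plain strings here for brevity, and `Maybe<Error>` is modelled as returning either an `Error` or `null`.

```javascript
class CertifiedAddrBook {
  // openEnvelope(envelope) -> { subjectPeer, seq, addresses } or throws.
  // It is a hypothetical validator supplied by the caller.
  constructor(openEnvelope) {
    this.openEnvelope = openEnvelope;
    this.entries = new Map(); // peerId -> { envelope, record }
  }

  addCertifiedAddrs(envelope) {
    let record;
    try {
      record = this.openEnvelope(envelope);
    } catch (err) {
      return err; // signature or consistency check failed
    }
    const existing = this.entries.get(record.subjectPeer);
    // Only accept records that are strictly newer than what we already hold.
    if (existing && record.seq <= existing.record.seq) {
      return new Error("stale record: seq is not greater than the stored one");
    }
    // Keep the envelope itself so it can be re-checked or relayed later.
    this.entries.set(record.subjectPeer, { envelope, record });
    return null;
  }

  certifiedAddrs(peerId) {
    const entry = this.entries.get(peerId);
    return new Set(entry ? entry.record.addresses : []);
  }

  isCertified(peerId, multiaddr) {
    return this.certifiedAddrs(peerId).has(multiaddr);
  }
}
```

Storing a newer record replaces the older one wholesale, so an address that is dropped from a later record stops being reported as certified.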
We'll also need a method that constructs a new `RoutingRecord` containing our
listen addresses and wraps it in a signed envelope. This may belong on the Host
instead of the peer store, since it needs access to the private signing key.
## Dialing Strategies
Once self-certified addresses are available via the peer store, we can update
the dialer to prefer using them when possible. Some systems may want to _only_
dial self-certified addresses, so we should include some configuration options
to control whether non-certified addresses are acceptable.
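A dialer preference along these lines could be as simple as the sketch below. The function name `orderDialAddrs` and the `certifiedOnly` option are illustrative assumptions, not a proposed API; `certified` is expected to be the set returned by the peer store for the target peer.

```javascript
// Orders candidate addresses so that self-certified ones are tried first.
// With `certifiedOnly` set, non-certified addresses are dropped entirely.
function orderDialAddrs(knownAddrs, certified, { certifiedOnly = false } = {}) {
  const preferred = knownAddrs.filter((addr) => certified.has(addr));
  if (certifiedOnly) return preferred;
  const rest = knownAddrs.filter((addr) => !certified.has(addr));
  return [...preferred, ...rest];
}
```

Relative order within each group is preserved, so any existing prioritization applied before this step still holds.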
## Changes to core libp2p protocols
How to publish these to the DHT? Are there backward compatibility issues with
older unsigned address records? Maybe we just publish these to a different key
prefix...
Should we update identify and mDNS discovery to use signed records?
## Future Work
Some things that were originally considered in this RFC were trimmed so that we
can focus on delivering a basic self-certified record, which is a pressing need.
This includes a notion of "routability", which could be used to communicate
whether a given address is global (reachable via the public internet),
LAN-local, etc. We may also want to include some kind of confidence score or
priority ranking, so that peers can communicate which addresses they would
prefer other peers to use.
To allow these fields to be added in the future, we wrap multiaddrs in the
`AddressInfo` message instead of having the `addresses` field be a list of "raw"
multiaddrs.
Another potentially useful extension would be a compact protocol table or bloom
filter that could be used to test whether a peer supports a given protocol
before interacting with them directly. This could be added as a new field in the
`RoutingRecord` message.
[identify-spec]: ../identify/README.md
[peer-id-spec]: ../peer-ids/peer-ids.md
[autonat]: https://github.com/libp2p/specs/issues/180
[ipld]: https://ipld.io/
[ipld-schema-schema]: https://github.com/ipld/specs/blob/master/schemas/schema-schema.ipldsch
[envelope-rfc]: ./0002-signed-envelopes.md
[eip-778]: https://eips.ethereum.org/EIPS/eip-778