specs/webrtc/README.md
2023-01-23 15:02:20 +01:00

WebRTC

Lifecycle Stage | Maturity | Status | Latest Revision
2A | Candidate Recommendation | Active | r0, 2022-10-14

Authors: @mxinden

Interest Group: @marten-seemann

Motivation

  1. No need for trusted TLS certificates. Enable browsers to connect to public server nodes without those server nodes providing a TLS certificate within the browser's trust chain. Note that we cannot do this today with our WebSocket transport, as the browser requires the remote to have a trusted TLS certificate. Nor can we establish a plain TCP or QUIC connection from within a browser. We can establish a WebTransport connection from the browser (see WebTransport specification).

Addressing

WebRTC multiaddresses are composed of an IP and a UDP address component, followed by /webrtc and a /certhash component holding a multihash of the certificate that the node uses, optionally followed by /p2p/<peer-id>.

Examples:

  • /ip4/192.0.2.0/udp/1234/webrtc/certhash/<hash>/p2p/<peer-id>
  • /ip6/fe80::1ff:fe23:4567:890a/udp/1234/webrtc/certhash/<hash>/p2p/<peer-id>

The TLS certificate fingerprint in /certhash is a multibase encoded multihash.

For compatibility, implementations MUST support hash algorithm sha-256 and base encoding base64url. Implementations MAY support other hash algorithms and base encodings, but they may then be unable to connect to all other nodes.
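To illustrate the encoding, the sketch below derives a /certhash value from a DER-encoded certificate using the two mandatory choices above: a sha-256 multihash (code 0x12, digest length 32) wrapped in multibase base64url, i.e. the prefix u followed by unpadded URL-safe base64. The function names and the certificate bytes used in testing are hypothetical; a real implementation would hash the certificate its DTLS stack actually presents.

```python
import base64
import hashlib

SHA2_256_CODE = 0x12  # multicodec code for sha2-256
SHA2_256_LEN = 32     # sha-256 digest length in bytes


def certhash(cert_der: bytes) -> str:
    """Multibase base64url encoding of the sha-256 multihash of a certificate."""
    digest = hashlib.sha256(cert_der).digest()
    # Multihash = <code varint><length varint><digest>; both varints fit in one byte.
    multihash = bytes([SHA2_256_CODE, SHA2_256_LEN]) + digest
    # Multibase 'u' = base64url (URL-safe alphabet) without padding.
    return "u" + base64.urlsafe_b64encode(multihash).decode("ascii").rstrip("=")


def certhash_digest(value: str) -> bytes:
    """Inverse of certhash(): return the raw sha-256 digest, rejecting anything else."""
    if not value.startswith("u"):
        raise ValueError("only multibase base64url ('u') is handled in this sketch")
    body = value[1:]
    multihash = base64.urlsafe_b64decode(body + "=" * (-len(body) % 4))
    if multihash[:2] != bytes([SHA2_256_CODE, SHA2_256_LEN]) or len(multihash) != 34:
        raise ValueError("only sha2-256 multihashes are handled in this sketch")
    return multihash[2:]


def webrtc_multiaddr(ip4: str, udp_port: int, cert_der: bytes) -> str:
    return f"/ip4/{ip4}/udp/{udp_port}/webrtc/certhash/{certhash(cert_der)}"
```

The output of certhash() is what takes the place of <hash> in the example multiaddresses above.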

Connection Establishment

Browser to public Server

Scenario: Browser A wants to connect to server node B where B is publicly reachable but B does not have a TLS certificate trusted by A.

  1. Server node B generates a TLS certificate, listens on a UDP port and advertises the corresponding multiaddress (see Addressing) through some external mechanism.

    Given that B is publicly reachable, B acts as an ICE Lite agent. It binds to a UDP port waiting for incoming STUN and SCTP packets and multiplexes based on source IP and source port.

  2. Browser A discovers server node B's multiaddr, containing B's IP, UDP port, TLS certificate fingerprint and optionally libp2p peer ID (e.g. /ip6/2001:db8::/udp/1234/webrtc/certhash/<hash>/p2p/<peer-id>), through some external mechanism.

  3. A instantiates an RTCPeerConnection. See RTCPeerConnection().

    A (i.e. the browser) SHOULD NOT reuse the same certificate across RTCPeerConnections. Because WebRTC uses DTLS 1.2, in which certificates are sent unencrypted, reusing the certificate would allow on-path observers to identify A across connections.

  4. A constructs B's SDP answer locally based on B's multiaddr.

    A generates a random string prefixed with "libp2p+webrtc+v1/". The prefix allows us to use the ufrag as an upgrade mechanism to roll out a new version of the libp2p WebRTC protocol on a live network. While a hack, this might prove very useful in the future. A sets the string as both the username (ufrag or username fragment) and the password in the SDP of the remote's answer.

    A MUST set the a=max-message-size:16384 SDP attribute. See Multiplexing for the rationale.

    Finally A sets the remote answer via RTCPeerConnection.setRemoteDescription().

  5. A creates a local offer via RTCPeerConnection.createOffer(). A sets the same username and password on the local offer as done in (4) on the remote answer.

    A MUST set the a=max-message-size:16384 SDP attribute. See Multiplexing for the rationale.

    Finally A sets the modified offer via RTCPeerConnection.setLocalDescription().

    Note that this process, often referred to as "SDP munging", is disallowed by the specification, but not enforced across the major browsers (Safari, Firefox, Chrome) due to use-cases in the wild. See also https://bugs.chromium.org/p/chromium/issues/detail?id=823036

  6. Once A sets the SDP offer and answer, it will start sending STUN requests to B. B reads the ufrag from the incoming STUN request's username field. B then infers A's SDP offer using the IP, port, and ufrag of the request as follows:

    1. B sets the ice-ufrag and ice-pwd equal to the value read from the username field.

    2. B sets an arbitrary sha-256 digest as the remote fingerprint as it does not verify fingerprints at this point.

    3. B sets the connection field (c) to the IP and port of the incoming request c=IN <ip> <port>.

    4. B sets the a=max-message-size:16384 SDP attribute. See Multiplexing for the rationale.

    B sets this offer as the remote description. B generates an answer and sets it as the local description.

    The ufrag in combination with the IP and port of A can be used by B to identify the connection, i.e. demultiplex incoming UDP datagrams per incoming connection.

    Note that this step requires B to allocate memory for each incoming STUN message from A. This could be leveraged for a DoS attack where A sends many STUN messages with different ufrags from different UDP source ports, forcing B to allocate a new peer connection for each. B SHOULD have a rate-limiting mechanism in place as a defense. See also https://datatracker.ietf.org/doc/html/rfc5389#section-16.1.2.

  7. A and B execute the DTLS handshake as part of the standard WebRTC connection establishment.

    At this point B does not know the TLS certificate fingerprint of A. Thus B cannot verify A's TLS certificate fingerprint during the DTLS handshake. Instead, B needs to disable certificate fingerprint verification (see e.g. Pion's disableCertificateFingerprintVerification option).

    On success of the DTLS handshake, the connection provides confidentiality and integrity but not authenticity. The latter is guaranteed through the subsequent Noise handshake. See Connection Security section.

  8. Messages on each RTCDataChannel are framed using the message framing mechanism described in Multiplexing.

  9. The remote is authenticated via an additional Noise handshake. See Connection Security section.
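Steps 4 and 5 amount to plain string manipulation, which the following minimal sketch illustrates (it is not the reference behaviour of any implementation). It generates a ufrag with the required prefix (the 32 hex characters of randomness are an assumption), builds the remote answer for B from its IP, port and certhash, and munges A's local offer. The exact set of SDP attributes in remote_answer is an assumption modeled on what existing implementations send; only the ufrag/password handling and a=max-message-size:16384 come from this specification. munge_offer assumes an offer with a single data-channel media section.

```python
import base64
import secrets

UFRAG_PREFIX = "libp2p+webrtc+v1/"


def random_ufrag() -> str:
    # Prefix from the spec; 32 random hex characters is an assumption of this sketch.
    return UFRAG_PREFIX + secrets.token_hex(16)


def sdp_fingerprint(certhash: str) -> str:
    """Convert a base64url sha2-256 certhash into an SDP a=fingerprint value."""
    body = certhash[1:]  # strip the multibase 'u' prefix
    multihash = base64.urlsafe_b64decode(body + "=" * (-len(body) % 4))
    if multihash[:2] != b"\x12\x20":
        raise ValueError("this sketch only handles sha2-256")
    return "sha-256 " + ":".join(f"{b:02X}" for b in multihash[2:])


def remote_answer(ip: str, port: int, certhash: str, ufrag: str) -> str:
    """B's answer as reconstructed by A (attribute set is an assumption)."""
    ip_version = "IP6" if ":" in ip else "IP4"
    lines = [
        "v=0",
        f"o=- 0 0 IN {ip_version} {ip}",
        "s=-",
        "t=0 0",
        "a=ice-lite",
        f"m=application {port} UDP/DTLS/SCTP webrtc-datachannel",
        f"c=IN {ip_version} {ip}",
        "a=mid:0",
        f"a=ice-ufrag:{ufrag}",  # ufrag and password are the same string
        f"a=ice-pwd:{ufrag}",
        f"a=fingerprint:{sdp_fingerprint(certhash)}",
        "a=setup:passive",
        "a=sctp-port:5000",
        "a=max-message-size:16384",
        f"a=candidate:1 1 UDP 1 {ip} {port} typ host",
        "a=end-of-candidates",
    ]
    return "\r\n".join(lines) + "\r\n"


def munge_offer(offer: str, ufrag: str) -> str:
    """Rewrite A's own offer: same ufrag/password, max-message-size 16384."""
    out = []
    for line in offer.splitlines():
        if line.startswith("a=ice-ufrag:"):
            line = f"a=ice-ufrag:{ufrag}"
        elif line.startswith("a=ice-pwd:"):
            line = f"a=ice-pwd:{ufrag}"
        elif line.startswith("a=max-message-size:"):
            continue  # replaced below
        out.append(line)
    # Assumes a single (data channel) media section, so appending is unambiguous.
    out.append("a=max-message-size:16384")
    return "\r\n".join(out) + "\r\n"
```

In a browser, the resulting strings would be passed as the sdp field to setRemoteDescription({type: "answer", sdp}) and setLocalDescription({type: "offer", sdp}) respectively.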

WebRTC can run both on UDP and TCP. libp2p WebRTC implementations MUST support UDP and MAY support TCP.
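Step 6 of the connection establishment, as seen from B, can be sketched as follows. Per RFC 8445, the STUN USERNAME attribute has the form <receiver ufrag>:<sender ufrag>; since A uses the same string for both, B can take the part before the colon. The sketch keys connection state by (source IP, source port, ufrag), builds the inferred offer with a placeholder sha-256 fingerprint, and caps the number of new connections per time window as a crude stand-in for the rate limiting recommended in that step. The class, its window handling and the SDP attribute set are hypothetical. In SDP syntax the port belongs on the m= line, so the sketch places A's port there and A's address on the c= line.

```python
import ipaddress
from typing import Dict, Optional, Tuple

UFRAG_PREFIX = "libp2p+webrtc+v1/"


def ufrag_from_stun_username(username: str) -> str:
    """Extract the ufrag from a STUN USERNAME of the form '<ufrag B>:<ufrag A>'."""
    ufrag, sep, _sender = username.partition(":")
    if not sep or not ufrag.startswith(UFRAG_PREFIX):
        # Unknown prefix: possibly a protocol version this node does not speak.
        raise ValueError(f"unsupported STUN username: {username!r}")
    return ufrag


def infer_remote_offer(src_ip: str, src_port: int, ufrag: str) -> str:
    """Build the SDP offer B assumes A sent (attribute set is an assumption)."""
    ip_version = "IP6" if ipaddress.ip_address(src_ip).version == 6 else "IP4"
    # Arbitrary sha-256 value: B does not verify A's fingerprint during DTLS.
    placeholder_fp = ":".join(["00"] * 32)
    lines = [
        "v=0",
        f"o=- 0 0 IN {ip_version} {src_ip}",
        "s=-",
        "t=0 0",
        f"m=application {src_port} UDP/DTLS/SCTP webrtc-datachannel",
        f"c=IN {ip_version} {src_ip}",
        "a=mid:0",
        f"a=ice-ufrag:{ufrag}",
        f"a=ice-pwd:{ufrag}",
        f"a=fingerprint:sha-256 {placeholder_fp}",
        "a=setup:actpass",
        "a=sctp-port:5000",
        "a=max-message-size:16384",
    ]
    return "\r\n".join(lines) + "\r\n"


class ConnectionTable:
    """Hypothetical demultiplexer keyed by (source IP, source port, ufrag)."""

    def __init__(self, max_new_per_window: int = 100) -> None:
        self.max_new_per_window = max_new_per_window
        self.new_in_window = 0
        self.offers: Dict[Tuple[str, int, str], str] = {}

    def start_new_window(self) -> None:
        # Call periodically (e.g. once per second) to refill the allowance.
        self.new_in_window = 0

    def on_stun_request(self, src_ip: str, src_port: int, username: str) -> Optional[str]:
        """Return the remote offer for this connection, or None if rate-limited."""
        key = (src_ip, src_port, ufrag_from_stun_username(username))
        if key in self.offers:
            return self.offers[key]
        if self.new_in_window >= self.max_new_per_window:
            return None  # drop instead of allocating state for yet another peer
        self.new_in_window += 1
        self.offers[key] = infer_remote_offer(src_ip, src_port, key[2])
        return self.offers[key]
```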

Multiplexing

The WebRTC browser APIs do not support half-closing of streams nor resets of the sending part of streams. RTCDataChannel.close() flushes the remaining messages and closes the local write and read side. After calling RTCDataChannel.close() one can no longer read from nor write to the channel. This lack of functionality is problematic, given that libp2p protocols running on top of transport protocols, like WebRTC, expect to be able to half-close or reset a stream. See Connection Establishment in libp2p.

To support half-closing and resets of streams, libp2p WebRTC uses message framing. Messages on an RTCDataChannel are embedded into the Protobuf message below and sent on the RTCDataChannel prefixed with the message length in bytes, encoded as an unsigned variable-length integer as defined by the multiformats unsigned-varint spec.

The framing is adapted from the QUIC RFC (RFC 9000). When in doubt about the semantics of these messages, consult the QUIC RFC.

syntax = "proto2";

package webrtc.pb;

message Message {
  enum Flag {
    // The sender will no longer send messages on the stream.
    FIN = 0;
    // The sender will no longer read messages on the stream. Incoming data is
    // being discarded on receipt.
    STOP_SENDING = 1;
    // The sender abruptly terminates the sending part of the stream. The
    // receiver MAY discard any data that it already received on that stream.
    RESET_STREAM = 2;
  }

  optional Flag flag=1;

  optional bytes message = 2;
}

Note that, in contrast to QUIC (see QUIC RFC - 3.5 Solicited State Transitions), a libp2p WebRTC endpoint receiving a STOP_SENDING frame SHOULD NOT send a RESET_STREAM frame in reply. In QUIC, the RESET_STREAM frame sent in reply to STOP_SENDING carries the final size of the stream, which is needed for accurate accounting of the number of bytes sent for connection-level flow control. The libp2p WebRTC message framing is not concerned with flow control and thus does not need a RESET_STREAM frame to be sent in reply to a STOP_SENDING frame.

Encoded messages, including their length prefix, MUST NOT exceed 16 KiB (16384 bytes) to support all major browsers. See "Understanding message size limits". Implementations MAY choose to send smaller messages, e.g. to reduce delays in sending flagged messages.
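A minimal sketch of this framing, assuming hand-rolled Protobuf encoding rather than a generated class: flag is field 1 (varint, tag byte 0x08), message is field 2 (length-delimited, tag byte 0x12), and the whole encoded Message is prefixed with its length as an unsigned varint. MAX_PAYLOAD reserves 7 bytes for the flag, the message tag and length, and the outer prefix, so a frame carrying a full payload plus a flag is exactly 16384 bytes. The helper names are hypothetical, and decode_frame only understands the two fields defined above.

```python
from typing import Optional, Tuple

FIN, STOP_SENDING, RESET_STREAM = 0, 1, 2
MAX_FRAME = 16384            # limit on an encoded frame including its length prefix
MAX_PAYLOAD = MAX_FRAME - 7  # worst-case overhead: flag + message tag/length + prefix


def encode_uvarint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used by multiformats and Protobuf."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, position after the varint)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def encode_frame(message: Optional[bytes] = None, flag: Optional[int] = None) -> bytes:
    body = bytearray()
    if flag is not None:
        body += b"\x08" + encode_uvarint(flag)  # field 1, wire type 0 (varint)
    if message is not None:
        body += b"\x12" + encode_uvarint(len(message)) + message  # field 2, wire type 2
    frame = encode_uvarint(len(body)) + bytes(body)
    if len(frame) > MAX_FRAME:
        raise ValueError("frame exceeds 16384 bytes; split the payload")
    return frame


def decode_frame(buf: bytes, pos: int = 0) -> Tuple[Optional[int], Optional[bytes], int]:
    """Return (flag, message, position after the frame)."""
    length, pos = decode_uvarint(buf, pos)
    end = pos + length
    flag: Optional[int] = None
    message: Optional[bytes] = None
    while pos < end:
        key, pos = decode_uvarint(buf, pos)
        if key == 0x08:
            flag, pos = decode_uvarint(buf, pos)
        elif key == 0x12:
            size, pos = decode_uvarint(buf, pos)
            message, pos = bytes(buf[pos:pos + size]), pos + size
        else:
            raise ValueError(f"unexpected field key {key:#x}")
    return flag, message, end
```

For example, encode_frame(b"hello") yields the 8 bytes 07 12 05 followed by "hello", and a half-close is the 3-byte frame encode_frame(flag=FIN).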

Ordering

Implementations MAY expose an unordered byte stream abstraction to the user by overriding the default value of ordered (true) with false when creating a new data channel via RTCPeerConnection.createDataChannel.

Head-of-line blocking

WebRTC data channels and the underlying SCTP are message-oriented, not stream-oriented (e.g. see RTCDataChannel.send() and RTCDataChannel.onmessage()). libp2p streams, on the other hand, are byte-oriented. Thus we risk head-of-line blocking: a message spanning several packets cannot be delivered until all of its packets have arrived.

Given that the browser does not give us access to the MTU of a given connection, we cannot make an informed decision on the optimal message size.

We follow the recommendation of QUIC, requiring "a minimum IP packet size of at least 1280 bytes". We assume an IPv4 minimum header size of 20 bytes, an IPv6 header size of 40 bytes, and a UDP header size of 8 bytes. An SCTP packet common header is 12 bytes long. An SCTP data chunk header is 16 bytes long.

  • IPv4: 1280 bytes - 20 bytes - 8 bytes - 12 bytes - 16 bytes = 1224 bytes
  • IPv6: 1280 bytes - 40 bytes - 8 bytes - 12 bytes - 16 bytes = 1204 bytes

Thus, for payloads that would suffer from head-of-line blocking, implementations SHOULD choose a message size equal to or below 1204 bytes. Or, in case the implementation can differentiate by IP version, equal to or below 1224 bytes on IPv4 and 1204 bytes on IPv6.
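The arithmetic above can be written out as a small helper, e.g. for picking a message size when the IP version may or may not be known. It uses exactly the header sizes listed in this section; like the calculation in the text, it does not account for DTLS record overhead, so treat the results as upper bounds under those stated assumptions. The names are illustrative.

```python
from typing import Optional

MIN_IP_PACKET = 1280        # QUIC's minimum IP packet size
IP_HEADER = {4: 20, 6: 40}  # minimum IPv4 header, fixed IPv6 header
UDP_HEADER = 8
SCTP_COMMON_HEADER = 12
SCTP_DATA_CHUNK_HEADER = 16


def max_message_size(ip_version: Optional[int] = None) -> int:
    """Largest message fitting one 1280-byte IP packet; None means the version is unknown."""
    versions = [ip_version] if ip_version is not None else [4, 6]
    # With an unknown version, assume the larger (IPv6) header to stay safe.
    worst_ip_header = max(IP_HEADER[v] for v in versions)
    return (MIN_IP_PACKET - worst_ip_header - UDP_HEADER
            - SCTP_COMMON_HEADER - SCTP_DATA_CHUNK_HEADER)
```

max_message_size(4) gives 1224, while both max_message_size(6) and max_message_size() give 1204.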

Long term we hope to be able to give better recommendations based on real-world experiments.

RTCDataChannel negotiation

RTCDataChannels are negotiated in-band by the WebRTC user agent (e.g. Firefox, Pion, ...). In other words, libp2p WebRTC implementations MUST NOT change the default value negotiated: false when creating an RTCDataChannel via RTCPeerConnection.createDataChannel.

The WebRTC user agent (i.e. not the application) decides on the RTCDataChannel ID based on the local node's connection role. For the interested reader, see RFC 8832 Protocol Overview. It is RECOMMENDED that user agents reuse IDs once their RTCDataChannel closes. IDs MAY be reused according to RFC 8831: "Streams are available for reuse after a reset has been performed", see RFC 8831 6.7 Closing a Data Channel. Up to 65535 (2^16 - 1) concurrent data channels can be open at any given time.

According to RFC 8832, an RTCDataChannel initiator "MAY start sending messages containing user data without waiting for the reception of the corresponding DATA_CHANNEL_ACK message", thus using negotiated: false does not imply an additional round trip for each new RTCDataChannel.
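The role-based ID selection that user agents perform can be illustrated with a small allocator. Per RFC 8832, the DTLS client uses even stream identifiers and the DTLS server odd ones; RFC 8831 reserves 65535, which is why IDs stop at 65534. Released IDs are handed out again lowest-first, and the caller is assumed to call release only after the SCTP stream reset has completed. This is a hypothetical illustration of the rule, not code a libp2p implementation needs, since the user agent makes this choice. The taken parameter lets the sketch skip IDs already in use, such as the negotiated channel with id 0 used for the Noise handshake.

```python
import heapq
from typing import Iterable, List, Set

MAX_STREAM_ID = 65534  # 65535 is reserved by RFC 8831


class StreamIdAllocator:
    def __init__(self, is_dtls_client: bool, taken: Iterable[int] = ()) -> None:
        self.next_id = 0 if is_dtls_client else 1  # even for client, odd for server
        self.taken: Set[int] = set(taken)
        self.free: List[int] = []                  # min-heap of released IDs

    def allocate(self) -> int:
        if self.free:
            sid = heapq.heappop(self.free)
        else:
            while self.next_id <= MAX_STREAM_ID and self.next_id in self.taken:
                self.next_id += 2
            if self.next_id > MAX_STREAM_ID:
                raise RuntimeError("no stream identifiers left")
            sid = self.next_id
            self.next_id += 2
        self.taken.add(sid)
        return sid

    def release(self, sid: int) -> None:
        """Only for IDs returned by allocate(), once their SCTP stream reset completed."""
        self.taken.discard(sid)
        heapq.heappush(self.free, sid)
```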

RTCDataChannel label

RTCPeerConnection.createDataChannel() requires passing a label for the to-be-created RTCDataChannel. When calling createDataChannel, implementations MUST pass an empty string. When receiving an RTCDataChannel via RTCPeerConnection.ondatachannel, implementations MUST NOT require label to be an empty string. This allows future versions of this specification to make use of the RTCDataChannel label property.

Connection Security

Note that the steps below use the message framing described in Multiplexing.

While WebRTC offers confidentiality and integrity via DTLS, one still needs to authenticate the remote peer by its libp2p identity.

After Connection Establishment:

  1. A and B open a WebRTC data channel with id: 0 and negotiated: true (pc.createDataChannel("", {negotiated: true, id: 0});).

  2. B starts a Noise XX handshake on the new channel. See noise-libp2p.

    A and B use the Noise Prologue mechanism. More specifically A and B set the Noise Prologue to <PREFIX><FINGERPRINT_A><FINGERPRINT_B> before starting the actual Noise handshake. <PREFIX> is the UTF-8 byte representation of the string libp2p-webrtc-noise:. <FINGERPRINT_A><FINGERPRINT_B> is the concatenation of the two TLS fingerprints of A (Noise handshake responder) and then B (Noise handshake initiator), in their multihash byte representation.

    On Chrome A can access its TLS certificate fingerprint directly via RTCCertificate#getFingerprints. Firefox does not allow A to do so. Browser compatibility can be found here. In practice, this is not an issue since the fingerprint is embedded in the local SDP string.

  3. On success of the authentication handshake, the data channel used for it is closed and the plain WebRTC connection is used with its multiplexing capabilities via data channels. See Multiplexing.

Note: WebRTC supports different hash functions to hash the TLS certificate (see https://datatracker.ietf.org/doc/html/rfc8122#section-5). The hash function used in WebRTC and the hash function used in the multiaddr /certhash component MUST be the same. On mismatch the final Noise handshake MUST fail.

A knows B's fingerprint hash algorithm through B's multiaddr. A MUST use the same hash algorithm to calculate the fingerprint of its (i.e. A's) TLS certificate. B assumes that A uses the same hash algorithm it discovered through B's multiaddr. For now, implementations MUST support sha-256. Future iterations of this specification may add support for other hash algorithms.

Implementations SHOULD set up all the necessary callbacks (e.g. ondatachannel) before starting the Noise handshake. This avoids scenarios where, for example, A initiates a stream before B has had a chance to set the ondatachannel callback, which would result in B ignoring all messages from A targeting that stream.

Implementations MAY open streams before completion of the Noise handshake. Applications MUST take special care about what application data they send, since at this point the peer is not yet authenticated. Similarly, the receiving side MAY accept streams before completion of the handshake.

Test vectors

Noise prologue

All of these test vectors represent hex-encoded bytes.

Both client and server use SHA-256

Here client is A and server is B.

client_fingerprint = "3e79af40d6059617a0d83b83a52ce73b0c1f37a72c6043ad2969e2351bdca870"
server_fingerprint = "30fc9f469c207419dfdd0aab5f27a86c973c94e40548db9375cca2e915973b99"

prologue = "6c69627032702d7765627274632d6e6f6973653a12203e79af40d6059617a0d83b83a52ce73b0c1f37a72c6043ad2969e2351bdca870122030fc9f469c207419dfdd0aab5f27a86c973c94e40548db9375cca2e915973b99"
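The prologue construction can be checked against this test vector with a few lines of code. The sketch assumes both fingerprints are raw sha-256 digests, as in the vector, and wraps each in its multihash form (code 0x12, length 0x20) before concatenating them after the prefix, A first and B second. The function names are illustrative.

```python
PROLOGUE_PREFIX = b"libp2p-webrtc-noise:"
SHA2_256 = 0x12  # multicodec code for sha2-256


def sha256_multihash(digest: bytes) -> bytes:
    if len(digest) != 32:
        raise ValueError("expected a 32-byte sha-256 digest")
    return bytes([SHA2_256, len(digest)]) + digest


def noise_prologue(fingerprint_a: bytes, fingerprint_b: bytes) -> bytes:
    """A is the Noise responder (client), B the Noise initiator (server)."""
    return PROLOGUE_PREFIX + sha256_multihash(fingerprint_a) + sha256_multihash(fingerprint_b)


client_fingerprint = bytes.fromhex("3e79af40d6059617a0d83b83a52ce73b0c1f37a72c6043ad2969e2351bdca870")
server_fingerprint = bytes.fromhex("30fc9f469c207419dfdd0aab5f27a86c973c94e40548db9375cca2e915973b99")
print(noise_prologue(client_fingerprint, server_fingerprint).hex())  # the prologue value above
```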

FAQ

  • Why exchange the TLS certificate fingerprint in the multiaddr? Why not base it on the libp2p public key?

    Browsers do not allow loading a custom certificate. One can only generate a certificate via RTCPeerConnection.generateCertificate().

  • Why not embed the peer ID in the TLS certificate, thus rendering the additional "peer certificate" exchange obsolete?

    Browsers do not allow editing the properties of the TLS certificate.

  • How about distributing the multiaddr in a signed peer record, thus rendering the additional "peer certificate" exchange obsolete?

    Signed peer records are not yet rolled out across the many libp2p protocols. Making the libp2p WebRTC protocol dependent on the former is not deemed worth it at this point in time. Later versions of the libp2p WebRTC protocol might adopt this optimization.

    Note that one can roll out a new version of the libp2p WebRTC protocol through a new multiaddr protocol, e.g. /webrtc-2.

  • Why exchange fingerprints in an additional authentication handshake on top of an established WebRTC connection? Why not only exchange signatures of one's TLS fingerprint, signed with one's libp2p private key, on the plain WebRTC connection?

    Once A and B have established a WebRTC connection, A sends signature_libp2p_a(fingerprint_a) to B and vice versa. While this has the benefit of only requiring two messages, thus one round trip, it is prone to a key compromise and replay attack. Say that E is able to obtain signature_libp2p_a(fingerprint_a) and somehow compromise A's TLS private key; E can now impersonate A without knowing A's libp2p private key.

    If one requires the signatures to contain both fingerprints, e.g. signature_libp2p_a(fingerprint_a, fingerprint_b), the above attack still works, just that E can only impersonate A when talking to B.

    Adding a cryptographic identifier of the unique connection (i.e. session) to the signature (signature_libp2p_a(fingerprint_a, fingerprint_b, connection_identifier)) would protect against this attack. To the best of our knowledge, the browser does not give us access to such an identifier.

  • Why use Protobuf for WebRTC message framing? Why not use our own, potentially smaller encoding scheme?

    The Protobuf framing adds an overhead of 5 bytes. The unsigned-varint prefix adds another 2 bytes. On a large message the overhead is negligible ((5 bytes + 2 bytes) / (16384 bytes - 7 bytes) ≈ 0.00043). On a small message, e.g. a multistream-select message with ~40 bytes, the overhead is high ((5 bytes + 2 bytes) / 40 bytes = 0.175) but likely irrelevant.

    Using Protobuf allows us to evolve the protocol in a backwards-compatible way going forward. Using Protobuf is also consistent with many other libp2p protocols. These benefits outweigh the drawback of additional overhead.

  • Can a browser know in advance the UDP port on which it listens for incoming connections? Does the browser reuse the UDP port across many WebRTC connections? If so, one could connect to any public node, with the remote telling the local node what port it is perceived on. Thus one could use libp2p's identify and AutoNAT protocols instead of relying on STUN.

    No, a browser uses a new UDP port for each RTCPeerConnection.

  • Why not load a remote node's certificate into one's browser trust-store and then connect e.g. via WebSocket?

    This would require a mechanism to discover remote nodes' certificates upfront. More importantly, this does not scale with the number of connections a typical peer-to-peer application establishes.

  • Why not use central TURN servers? Why rely on libp2p's Circuit Relay v2 instead?

    As a peer-to-peer networking library, libp2p should rely as little as possible on central infrastructure.

  • Can an attacker launch an amplification attack with the STUN endpoint of the server?

    We follow the reasoning of the QUIC protocol, namely requiring:

    an endpoint MUST limit the amount of data it sends to the unvalidated address to three times the amount of data received from that address.

    https://datatracker.ietf.org/doc/html/rfc9000#section-8

    STUN response messages satisfy this limit, as they are only slightly larger than the request messages. See also https://datatracker.ietf.org/doc/html/rfc5389#section-16.1.2.

  • Why does B start the Noise handshake and not A?

    Given that WebRTC uses DTLS 1.2, B is the one that can send data first.
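The three-times rule quoted in the amplification answer above can be enforced with simple per-address byte accounting. The sketch is hypothetical: it treats an address as validated once the caller says so (for instance after the ICE connectivity check succeeds, which is an assumption rather than something this specification prescribes) and otherwise refuses any send that would push the bytes sent to that address above three times the bytes received from it.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

Addr = Tuple[str, int]
AMPLIFICATION_FACTOR = 3  # RFC 9000, section 8


class AmplificationLimiter:
    def __init__(self) -> None:
        self.received: DefaultDict[Addr, int] = defaultdict(int)
        self.sent: DefaultDict[Addr, int] = defaultdict(int)
        self.validated: Set[Addr] = set()

    def on_receive(self, addr: Addr, nbytes: int) -> None:
        self.received[addr] += nbytes

    def mark_validated(self, addr: Addr) -> None:
        self.validated.add(addr)

    def try_send(self, addr: Addr, nbytes: int) -> bool:
        """Record and allow the send if it stays within the budget, else refuse it."""
        if addr not in self.validated:
            budget = AMPLIFICATION_FACTOR * self.received[addr]
            if self.sent[addr] + nbytes > budget:
                return False
        self.sent[addr] += nbytes
        return True
```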