From 2cb3e7b97125ea37c5fbcd9b8b7b99cf34c701d8 Mon Sep 17 00:00:00 2001
From: vyzo
Date: Mon, 7 Sep 2020 13:37:26 +0300
Subject: [PATCH] gossipsub v1.1: Validation queue protection with Random Early Drop (#292)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* gossipsub v1.1 validation queue protection with Random Early Drop

* compute GlobalDecayCoefficient for 2 minute decay

* tweak default parameter values

* add note about per topic delivery weights

* move RED to its own (draft) specification

* Update spec title

Co-authored-by: Raúl Kripalani

* editorial changes

Co-authored-by: Raúl Kripalani
---
 pubsub/gossipsub/red.md | 128 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 128 insertions(+)
 create mode 100644 pubsub/gossipsub/red.md

diff --git a/pubsub/gossipsub/red.md b/pubsub/gossipsub/red.md
new file mode 100644
index 0000000..1886dac
--- /dev/null
+++ b/pubsub/gossipsub/red.md
@@ -0,0 +1,128 @@
+# gossipsub v1.1: Functional Extension for Validation Queue Protection
+
+| Lifecycle Stage | Maturity | Status | Latest Revision |
+|-----------------|---------------------------|--------|-----------------|
+| 1A | Working Draft | Active | r1, 2020-09-05 |
+
+Authors: [@vyzo]
+
+Interest Group: [@yusefnapora], [@raulk], [@whyrusleeping], [@Stebalien], [@daviddias], [@protolambda], [@djrtwo], [@dryajov], [@mpetrunic], [@AgeManning], [@Nashatyrev], [@mhchia]
+
+[@whyrusleeping]: https://github.com/whyrusleeping
+[@yusefnapora]: https://github.com/yusefnapora
+[@raulk]: https://github.com/raulk
+[@vyzo]: https://github.com/vyzo
+[@Stebalien]: https://github.com/Stebalien
+[@daviddias]: https://github.com/daviddias
+[@protolambda]: https://github.com/protolambda
+[@djrtwo]: https://github.com/djrtwo
+[@dryajov]: https://github.com/dryajov
+[@mpetrunic]: https://github.com/mpetrunic
+[@AgeManning]: https://github.com/AgeManning
+[@Nashatyrev]: https://github.com/Nashatyrev
+[@mhchia]: https://github.com/mhchia
+
+See the [lifecycle document][lifecycle-spec] for context about maturity level and spec status.
+
+[lifecycle-spec]: https://github.com/libp2p/specs/blob/master/00-framework-01-spec-lifecycle.md
+
+---
+
+
+
+- [Overview](#overview)
+- [Validation Queue Protection](#validation-queue-protection)
+- [Random Early Drop Algorithm](#random-early-drop-algorithm)
+- [RED Parameters](#red-parameters)
+
+
+
+## Overview
+
+This document specifies an extension to [gossipsub v1.1](gossipsub-v1.1.md) intended to
+provide a circuit breaker so that routers can withstand concerted attacks targeting the
+validation queue with a flood of spam.
+This extension does not modify the protocol in any way and works in conjunction with the defensive
+mechanisms of gossipsub v1.1.
+
+## Validation Queue Protection
+
+An important aspect of gossipsub is the reliance on validators to signal acceptance of incoming
+messages from the application to the router. The validation is asynchronous, with a typical
+implementation strategy that uses a front-end queue and a limit on the number of ongoing validations.
+This creates a potential target for attacks, as an attacker can overload the queue by brute force,
+sending spam messages at a very high rate. The effect would be that legitimate messages get dropped
+by the validation front end, resulting in denial of service.
+
+In order to protect the system from this class of attacks, gossipsub v1.1 incorporates a circuit
+breaker that sits before the validation queue and can make informed decisions on whether to
+push a message into the validation queue. This defensive mechanism kicks in when the system detects
+an elevated rate of dropped messages, and makes decisions on whether to accept incoming messages for
+validation based on the statistical performance of peers from the origin IP address. The decision is
+probabilistic and implements a Random Early Drop (RED) strategy that drops messages with a probability
+that depends on the acceptance rates for messages from the origin IP. This strategy can neuter
+attacks on the validation queue, because messages are no longer dropped indiscriminately in a drop-tail
+fashion.
+
+## Random Early Drop Algorithm
+
+The algorithm has two aspects:
+- The decision on whether to trigger RED.
+- The decision on whether to drop a message from an origin IP address.
+
+In order to trigger RED, the circuit breaker maintains the following queue statistics:
+- a _decaying_ counter for the number of message validations.
+- a _decaying_ counter for the number of dropped messages.
+
+The decision on triggering RED is based on the ratio of dropped messages to validations.
+If the ratio exceeds an application-configured threshold, then the RED algorithm
+triggers and a decision on whether to accept the message for validation is made based on origin IP
+statistics. There is also a quiet period, such that if no messages have been dropped for a while, the
+circuit breaker turns back off.
+
+In order to make the actual RED decision, the circuit breaker maintains the following statistics per
+IP:
+- a _decaying_ counter for the number of accepted messages.
+- a _decaying_ counter for the number of duplicate messages, mixed with a weight `W_duplicate`.
+- a _decaying_ counter for the number of ignored messages, mixed with a weight `W_ignored`.
+- a _decaying_ counter for the number of rejected messages, mixed with a weight `W_rejected`.
+
+The router generates a uniformly random float `r` in `[0, 1)` and accepts the message if and only if
+```
+r < (1 + accepted) / (1 + accepted + W_duplicate * duplicate + W_ignored * ignored + W_rejected * rejected)
+```
+
+The number of accepted messages is biased by 1 so that a single negative event cannot sinkhole an IP.
+It also always gives a chance for a message to be accepted, albeit with sharply decreasing probability
+as negative events accumulate.
+
+All the counters decay linearly with an application-configured decay factor, so that the system adapts
+to varying network conditions.
+
+Also note that per-IP statistics are retained for a configured period of time after disconnection, so
+that an attacker cannot easily clear traces of misbehaviour by disconnecting.
+
+Finally, the circuit breaker should allow the application to configure per-topic accepted delivery
+weights, so that deliveries in priority topics can be given more weight.
+If a topic is not configured, then its delivery weight is 1.
+
+## RED Parameters
+
+The circuit breaker uses the following application-configured parameters:
+
+| Parameter | Purpose | Default |
+|-----------|---------|---------|
+| `ActivationThreshold` | dropped to validated message ratio threshold for triggering the circuit breaker | `0.33` |
+| `GlobalDecayCoefficient` | linear decay coefficient for global stats | computed such that the counter decays to 1% after 2 minutes |
+| `SourceDecayCoefficient` | linear decay coefficient for per IP stats | computed such that the counter decays to 1% after 1 hour |
+| `QuietInterval` | interval of no dropped message events before turning off the circuit breaker | 1 minute |
+| `W_duplicate` | counter mixin weight for duplicate messages | `0.125` |
+| `W_ignored` | counter mixin weight for ignored messages | `1.0` |
+| `W_rejected` | counter mixin weight for rejected messages | `16.0` |
+| `RetentionPeriod` | duration of stats retention after disconnection | 6 hours |
+
+With the default parameters, we are rapidly penalising rejections, mildly penalising ignored messages,
+and softly weighting duplicate messages because they occur normally for mesh peers.
+The result is that clearly misbehaving peers, whose messages lead to outright rejections, will make up
+a substantial part of the decision to break the circuit, while underperforming peers will also
+factor in, but with less force.
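
The following is a minimal Python sketch of the two decisions described in the spec text, intended only as an illustration and not as the reference implementation. The names `SourceStats`, `red_active`, `accept_probability`, and `should_accept` are hypothetical. Counter decay, the quiet period, stats retention, and per-topic delivery weights are deliberately left out, and the sketch assumes that the trigger compares the dropped counter against `ActivationThreshold` times the validation counter.

```python
import random
from dataclasses import dataclass

# Defaults copied from the RED Parameters table.
ACTIVATION_THRESHOLD = 0.33
W_DUPLICATE = 0.125
W_IGNORED = 1.0
W_REJECTED = 16.0


@dataclass
class SourceStats:
    """Per-IP counters (hypothetical container; decay is not modelled here)."""
    accepted: float = 0.0
    duplicate: float = 0.0
    ignored: float = 0.0
    rejected: float = 0.0


def red_active(validated: float, dropped: float) -> bool:
    """Assumed trigger: dropped / validated exceeds ActivationThreshold.

    Written as a product so that an empty validation counter does not
    cause a division by zero.
    """
    return dropped > ACTIVATION_THRESHOLD * validated


def accept_probability(stats: SourceStats) -> float:
    """Right-hand side of the acceptance inequality from the spec text."""
    penalty = (
        W_DUPLICATE * stats.duplicate
        + W_IGNORED * stats.ignored
        + W_REJECTED * stats.rejected
    )
    return (1 + stats.accepted) / (1 + stats.accepted + penalty)


def should_accept(stats: SourceStats, rng=random.random) -> bool:
    """Accept if and only if a uniform draw r in [0, 1) falls below the probability."""
    return rng() < accept_probability(stats)
```

Under the default weights, an IP with no history is always accepted (probability 1), an IP whose only event is one rejection is accepted with probability 1/17, and an IP with 10 accepted and 8 duplicate messages is accepted with probability 11/12. Passing `rng` as a parameter is a design choice made here so that the decision can be checked deterministically.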