diff --git a/_posts/2021-10-25-waku-v1-vs-waku-v2.md b/_posts/2021-10-25-waku-v1-vs-waku-v2.md
new file mode 100644
index 00000000..b7b2882b
--- /dev/null
+++ b/_posts/2021-10-25-waku-v1-vs-waku-v2.md
@@ -0,0 +1,217 @@
+---
+layout: post
+name: "Waku v1 vs Waku v2: Bandwidth Comparison"
+title: "Waku v1 vs Waku v2: Bandwidth Comparison"
+date: 2021-11-03 10:00:00 +0200
+author: hanno
+published: true
+permalink: /waku-v1-v2-bandwidth-comparison
+categories: research
+summary: A local comparison of bandwidth profiles showing significantly improved scalability in Waku v2 over Waku v1.
+image: /assets/img/waku1-vs-waku2/waku1-vs-waku2-overall-network-size.png
+discuss: https://forum.vac.dev/t/discussion-waku-v1-vs-waku-v2-bandwidth-comparison/110
+---
+
+## Background
+
+The [original plan](https://vac.dev/waku-v2-plan) for Waku v2 suggested theoretical improvements in resource usage over Waku v1,
+mainly as a result of the improved amplification factors provided by GossipSub.
+In turn, [Waku v1 proposed improvements](https://vac.dev/fixing-whisper-with-waku) over its predecessor, Whisper.
+
+Given that Waku v2 is aimed at resource-restricted environments,
+we are specifically interested in its scalability and resource usage characteristics.
+However, the theoretical performance improvements of Waku v2 over Waku v1
+have never been properly benchmarked and tested.
+
+Although we're working towards a full performance evaluation of Waku v2,
+this would require significant planning and resources
+if it were to simulate "real world" conditions faithfully and measure bandwidth and resource usage across different network connections,
+robustness against attacks/losses, message latencies, etc.
+(There already exists a fairly comprehensive [evaluation of GossipSub v1.1](https://research.protocol.ai/publications/gossipsub-v1.1-evaluation-report/vyzovitis2020.pdf),
+on which [`11/WAKU2-RELAY`](https://rfc.vac.dev/spec/11/) is based.)
+
+As a starting point,
+this post contains a limited and local comparison of the _bandwidth_ profile (only) between Waku v1 and Waku v2.
+It reuses and adapts existing network simulations for [Waku v1](https://github.com/status-im/nim-waku/blob/master/waku/v1/node/quicksim.nim) and [Waku v2](https://github.com/status-im/nim-waku/blob/master/waku/v2/node/quicksim2.nim)
+and compares bandwidth usage for similar message propagation scenarios.
+
+## Theoretical improvements in Waku v2
+
+Messages are propagated in Waku v1 using [flood routing](https://en.wikipedia.org/wiki/Flooding_(computer_networking)).
+This means that every peer will forward every new incoming message to all its connected peers (except the one it received the message from).
+This inevitably duplicates messages (the extent of which is termed the _amplification factor_),
+wasting bandwidth and resources.
+What's more, we expect this effect to worsen the larger the network becomes,
+as each _connection_ will receive a copy of each message,
+rather than a single copy per peer.
+
+Message routing in Waku v2 follows the `libp2p` _GossipSub_ protocol,
+which lowers amplification factors by only sending full message contents to a subset of connected peers.
+As a Waku v2 network grows, each peer will limit its number of full-message ("mesh") peerings:
+`libp2p` suggests a maximum of `12` such connections per peer.
+This allows much better scalability than a flood-routed network.
+From time to time, a Waku v2 peer will send metadata about the messages it has seen to other peers ("gossip" peers).
+
+See [this explainer](https://hackmd.io/@vac/main/%2FYYlZYBCURFyO_ZG1EiteWg#11WAKU2-RELAY-gossipsub) for a more detailed discussion.
+
+## Methodology
+
+The results below cover only scenarios that provide an interesting contrast between Waku v1 and Waku v2.
+For example, [star network topologies](https://en.wikipedia.org/wiki/Star_network) do not show a substantial difference between Waku v1 and Waku v2.
+This is because each peer relies on a single connection to the central node for every message,
+which barely requires any routing:
+each connection receives a copy of every message for both Waku v1 and Waku v2.
+Hybrid topologies similarly show a difference between Waku v1 and Waku v2 only for network segments with [mesh-like connections](https://en.wikipedia.org/wiki/Mesh_networking),
+where routing decisions need to be made.
+
+For this reason, the following approach applies to all iterations:
+1. Simulations are run **locally**.
+This limits the size of possible scenarios due to local resource constraints,
+but is a way to quickly get an approximate comparison.
+2. Nodes are treated as a **black box** for which we only measure bandwidth,
+using an external bandwidth monitoring tool.
+In other words, we do not consider differences in the size of the envelope (for v1) or the message (for v2).
+3. Messages are published at a rate of **50 new messages per second** to each network,
+except where explicitly stated otherwise.
+4. Each message propagated in the network carries **8 bytes** of random payload, which is **encrypted**.
+The same symmetric key cryptographic algorithm (with the same keys) is used in both Waku v1 and v2.
+5. Traffic in each network is **generated from 10 nodes** (randomly selected) and published in a round-robin fashion to **10 topics** (content topics for Waku v2).
+In practice, we found no significant difference in _average_ bandwidth usage when tweaking these two parameters (the number of traffic-generating nodes and the number of topics).
+6. Peers are connected in a decentralized **full mesh topology**,
+i.e. each peer is connected to every other peer in the network.
+Waku v1 is expected to flood all messages across all existing connections.
+Waku v2 gossipsub will GRAFT some of these connections for full-message peerings,
+with the rest being gossip-only peerings.
+7. After running each iteration, we **verify that messages propagated to all peers** (comparing the number of published messages to the metrics logged by each peer).
+
+For Waku v1, nodes are configured as "full" nodes (i.e. with a full bloom filter),
+while Waku v2 nodes are `relay` nodes, all subscribing and publishing to the same PubSub topic.
+
+## Network size comparison
+
+### Iteration 1: 10 nodes
+
+Let's start with a small network of only 10 nodes and see how Waku v1 bandwidth usage compares to that of Waku v2.
+At this small scale we don't expect to see improved bandwidth usage in Waku v2 over Waku v1,
+since all connections, for both Waku v1 and Waku v2, will be full-message connections.
+The number of connections is low enough that Waku v2 nodes will likely GRAFT all connections to full-message peerings,
+essentially flooding every message on every connection in a similar fashion to Waku v1.
+If our expectations are confirmed, it helps validate our methodology,
+showing that it gives more or less equivalent results between Waku v1 and Waku v2 networks.
+
+![](/assets/img/waku1-vs-waku2/waku1-vs-waku2-10-nodes.png)
+
+Sure enough, the figure shows that in this small-scale setup,
+Waku v1 actually has a lower per-peer bandwidth usage than Waku v2.
+One reason for this may be the larger overall proportion of control messages in a gossipsub-routed network such as Waku v2.
+These play a larger role when the total network traffic is comparatively low, as in this iteration.
+Also note that the average bandwidth remains more or less constant as long as the rate of published messages remains stable.
+
+### Iteration 2: 30 nodes
+
+Now, let's run the same scenario for a larger network of highly connected nodes, this time consisting of 30 nodes.
+At this point, the Waku v2 nodes will start pruning some connections to limit the number of full-message peerings (to a maximum of `12`),
+while the Waku v1 nodes will continue flooding messages to all connected peers.
+We therefore expect to see somewhat improved bandwidth usage in Waku v2 over Waku v1.
+
+![](/assets/img/waku1-vs-waku2/waku1-vs-waku2-30-nodes.png)
+
+Bandwidth usage in Waku v2 has increased only slightly from the smaller network of 10 nodes (hovering between 2000 and 3000 kbps).
+This is because there are only a few more full-message peerings than before.
+Compare this to the much higher increase in bandwidth usage for Waku v1, which now requires more than 4000 kbps on average.
+
+
+### Iteration 3: 50 nodes
+
+For an even larger network of 50 highly connected nodes,
+the divergence between Waku v1 and Waku v2 grows further.
+The following figure shows comparative average bandwidth usage for a throughput of 50 messages per second.
+
+![](/assets/img/waku1-vs-waku2/waku1-vs-waku2-50-nodes.png)
+
+Average bandwidth usage (for the same message rate) has remained roughly the same for Waku v2 as it was for 30 nodes,
+indicating that the number of full-message peerings per node has not increased.
+
+### Iteration 4: 85 nodes
+
+We already see a clear trend in the bandwidth comparisons above,
+so let's confirm by running the test once more for a network of 85 nodes.
+Due to local resource constraints, the effective throughput for Waku v1 falls below 50 messages per second,
+so the v1 results below have been normalized and are therefore approximate.
+The local Waku v2 simulation maintains the message throughput rate without any problems.
+
+![](/assets/img/waku1-vs-waku2/waku1-vs-waku2-85-nodes.png)
+
+### Iteration 5: 150 nodes
+
+Finally, we simulate message propagation in a network of 150 nodes.
+Due to local resource constraints, we run this simulation at a lower rate
+(35 messages per second)
+and for a shorter amount of time.
+
+![](/assets/img/waku1-vs-waku2/waku1-vs-waku2-150-nodes.png)
+
+Notice how the Waku v1 bandwidth usage is now more than 10 times higher than that of Waku v2.
+This is to be expected, as each Waku v1 node will try to flood each new message to 149 other peers,
+while the Waku v2 nodes limit their full-message peerings to no more than 12.
+
+### Discussion
+
+Let's summarize average bandwidth growth against network growth for a constant message propagation rate.
+Since we are particularly interested in how Waku v1 compares to Waku v2 in terms of bandwidth usage,
+the results are normalized to the Waku v2 average bandwidth usage for each network size.
+
+![](/assets/img/waku1-vs-waku2/waku1-vs-waku2-overall-network-size.png)
+
+Extrapolation is a dangerous game,
+but it seems safe to conclude that the divergence will only grow for even larger network topologies.
+Although control signalling contributes more towards overall bandwidth for Waku v2 networks,
+this effect becomes less noticeable for larger networks.
+For network segments with more than ~18 densely connected nodes,
+the advantage of using Waku v2 over Waku v1 becomes clear.
+
+## Network traffic comparison
+
+The analysis above holds the average message rate constant while network size grows.
+In reality, however, active users (and therefore message rates) are likely to grow in conjunction with the network.
+This will have an effect on bandwidth for both Waku v1 and Waku v2, though not in equal measure.
+Consider the impact of an increasing rate of messages in a network of constant size:
+
+![](/assets/img/waku1-vs-waku2/waku1-vs-waku2-overall-message-rate.png)
+
+The _rate_ of increase in bandwidth for Waku v2 is slower than that for Waku v1 for a corresponding increase in message propagation rate.
+In fact, for a network of 30 densely connected nodes,
+if the message propagation rate increases by 1 per second,
+Waku v1 requires almost 70 kbps more average bandwidth at each node.
+A similar traffic increase in Waku v2 requires on average 40 kbps more bandwidth per peer, just over half that of Waku v1.
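To make the arithmetic behind the last two sections concrete, here is a minimal Python sketch. It is a back-of-envelope model and not part of the simulation code: it assumes a fully connected network, counts only full-message connections per node (ignoring gossip and control traffic), and takes the cap of 12 mesh peerings and the approximate 70 kbps and 40 kbps per-message figures from this post. All function and constant names are hypothetical.

```python
# Back-of-envelope model of the fan-out and message-rate arithmetic in this post.
# Assumptions (not measured): full mesh topology, only full-message
# connections are counted, control and gossip traffic are ignored.

MESH_MAX = 12            # max full-message peerings per peer, as quoted in the post
V1_KBPS_PER_MSG = 70.0   # approx. extra kbps per node per +1 msg/s (30 nodes, Waku v1)
V2_KBPS_PER_MSG = 40.0   # approx. extra kbps per node per +1 msg/s (30 nodes, Waku v2)


def flood_fanout(n_nodes: int) -> int:
    """Connections a flood-routing node sends full messages on (all other peers)."""
    if n_nodes < 2:
        raise ValueError("need at least 2 nodes")
    return n_nodes - 1


def mesh_fanout(n_nodes: int, mesh_max: int = MESH_MAX) -> int:
    """Full-message connections per node when peerings are capped at mesh_max."""
    return min(flood_fanout(n_nodes), mesh_max)


def fanout_ratio(n_nodes: int, mesh_max: int = MESH_MAX) -> float:
    """How many times more full-message connections flooding uses than the mesh."""
    return flood_fanout(n_nodes) / mesh_fanout(n_nodes, mesh_max)


def extra_bandwidth_kbps(delta_msgs_per_s: float, kbps_per_msg: float) -> float:
    """Linear estimate of added per-node bandwidth for a message-rate increase."""
    return delta_msgs_per_s * kbps_per_msg


if __name__ == "__main__":
    # Network sizes used in the five iterations above.
    for n in (10, 30, 50, 85, 150):
        print(n, flood_fanout(n), mesh_fanout(n), round(fanout_ratio(n), 2))
    # Estimated extra per-node bandwidth for +10 msg/s in a 30-node network.
    print(extra_bandwidth_kbps(10, V1_KBPS_PER_MSG),
          extra_bandwidth_kbps(10, V2_KBPS_PER_MSG))
```

For 150 nodes this model gives a fan-out ratio of 149 / 12, roughly 12.4, which is in line with the "more than 10 times" difference measured above. For 10 nodes the ratio is 1, matching the expectation that both versions send full messages on every connection at that scale.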
+
+## Conclusions
+
+- **Waku v2 scales significantly better than Waku v1 in terms of average bandwidth usage**,
+especially for densely connected networks.
+- For example, in a network consisting of **150** or more densely connected nodes,
+Waku v1 uses more than **10x** the average bandwidth of Waku v2.
+- As the network continues to scale, both in absolute terms (number of nodes) and in network traffic (message rates), the disparity between Waku v2 and Waku v1 becomes even larger.
+
+## Future work
+
+Now that we've confirmed that Waku v2's bandwidth improvements over its predecessor match theory,
+we can proceed to a more in-depth characterisation of Waku v2's resource usage.
+Some questions that we want to answer include:
+- What proportion of Waku v2's bandwidth usage is used to propagate _payload_ versus bandwidth spent on _control_ messaging to maintain the mesh?
+- To what extent is message latency (time until a message is delivered to its destination) affected by network size and message rate?
+- How _reliable_ is message delivery in Waku v2 for different network sizes and message rates?
+- What are the resource usage profiles of other Waku v2 protocols (e.g. [`12/WAKU2-FILTER`](https://rfc.vac.dev/spec/12/) and [`19/WAKU2-LIGHTPUSH`](https://rfc.vac.dev/spec/19/))?
+
+Our aim is to get ever closer to a "real world" understanding of Waku v2's performance characteristics,
+identify and fix vulnerabilities,
+and continually improve the efficiency of our suite of protocols.
+
+## References
+
+- [Evaluation of GossipSub v1.1](https://research.protocol.ai/publications/gossipsub-v1.1-evaluation-report/vyzovitis2020.pdf)
+- [Fixing Whisper with Waku](https://vac.dev/fixing-whisper-with-waku)
+- [GossipSub vs flood routing](https://hackmd.io/@vac/main/%2FYYlZYBCURFyO_ZG1EiteWg#11WAKU2-RELAY-gossipsub)
+- [Network topologies: star](https://www.techopedia.com/definition/13335/star-topology#:~:text=Star%20topology%20is%20a%20network,known%20as%20a%20star%20network.)
+- [Network topologies: mesh](https://en.wikipedia.org/wiki/Mesh_networking)
+- [Waku v2 original plan](https://vac.dev/waku-v2-plan)
\ No newline at end of file
diff --git a/assets/img/waku1-vs-waku2/waku1-vs-waku2-10-nodes.png b/assets/img/waku1-vs-waku2/waku1-vs-waku2-10-nodes.png
new file mode 100644
index 00000000..d37302f9
Binary files /dev/null and b/assets/img/waku1-vs-waku2/waku1-vs-waku2-10-nodes.png differ
diff --git a/assets/img/waku1-vs-waku2/waku1-vs-waku2-150-nodes.png b/assets/img/waku1-vs-waku2/waku1-vs-waku2-150-nodes.png
new file mode 100644
index 00000000..b3d134db
Binary files /dev/null and b/assets/img/waku1-vs-waku2/waku1-vs-waku2-150-nodes.png differ
diff --git a/assets/img/waku1-vs-waku2/waku1-vs-waku2-30-nodes.png b/assets/img/waku1-vs-waku2/waku1-vs-waku2-30-nodes.png
new file mode 100644
index 00000000..3b6c7c9e
Binary files /dev/null and b/assets/img/waku1-vs-waku2/waku1-vs-waku2-30-nodes.png differ
diff --git a/assets/img/waku1-vs-waku2/waku1-vs-waku2-50-nodes.png b/assets/img/waku1-vs-waku2/waku1-vs-waku2-50-nodes.png
new file mode 100644
index 00000000..f00d9b3a
Binary files /dev/null and b/assets/img/waku1-vs-waku2/waku1-vs-waku2-50-nodes.png differ
diff --git a/assets/img/waku1-vs-waku2/waku1-vs-waku2-85-nodes.png b/assets/img/waku1-vs-waku2/waku1-vs-waku2-85-nodes.png
new file mode 100644
index 00000000..d05390a5
Binary files /dev/null and b/assets/img/waku1-vs-waku2/waku1-vs-waku2-85-nodes.png differ
diff --git a/assets/img/waku1-vs-waku2/waku1-vs-waku2-overall-message-rate.png b/assets/img/waku1-vs-waku2/waku1-vs-waku2-overall-message-rate.png
new file mode 100644
index 00000000..2e2f150c
Binary files /dev/null and b/assets/img/waku1-vs-waku2/waku1-vs-waku2-overall-message-rate.png differ
diff --git a/assets/img/waku1-vs-waku2/waku1-vs-waku2-overall-network-size.png b/assets/img/waku1-vs-waku2/waku1-vs-waku2-overall-network-size.png
new file mode 100644
index 00000000..94fe0cc5
Binary files /dev/null and b/assets/img/waku1-vs-waku2/waku1-vs-waku2-overall-network-size.png differ