From 8a95b0c93c8c9d81e511284fa4f12bdeb09dc94f Mon Sep 17 00:00:00 2001 From: ufarooqstatus <137903752+ufarooqstatus@users.noreply.github.com> Date: Mon, 6 Nov 2023 13:50:04 +0500 Subject: [PATCH] GossipSub Improvements blog post (#130) --- rlog/2023-09-27-gossipimprovements.mdx | 234 +++++++++++++++++++++++++ rlog/authors.yml | 5 + 2 files changed, 239 insertions(+) create mode 100644 rlog/2023-09-27-gossipimprovements.mdx diff --git a/rlog/2023-09-27-gossipimprovements.mdx b/rlog/2023-09-27-gossipimprovements.mdx new file mode 100644 index 00000000..cf773b86 --- /dev/null +++ b/rlog/2023-09-27-gossipimprovements.mdx @@ -0,0 +1,234 @@ +--- +title: 'GossipSub Improvements: Evolution of Overlay Design and Message Dissemination in Unstructured P2P Networks' +date: 2023-11-06 12:00:00 +authors: farooq +published: true +slug: GossipSub Improvements +categories: research + + +toc_min_heading_level: 2 +toc_max_heading_level: 5 +--- + +GossipSub Improvements: Evolution of Overlay Design and Message Dissemination in Unstructured P2P Networks + + + +## Motivitation +We have been recently working on analyzing and improving the performance of the GossipSub protocol for large messages, +as in the case of Ethereum Improvement Proposal [EIP-4844](https://eips.ethereum.org/EIPS/eip-4844). +This work led to a comprehensive study of unstructured P2P networks. +The intention was to identify the best practices that can serve as guidelines for performance improvement and scalability of P2P networks. + +## Introduction + +Nodes in an unstructured p2p network form self-organizing overlay(s) on top of the IP infrastructure to facilitate different services like information dissemination, +query propagation, file sharing, etc. The overlay(s) can be as optimal as a tree-like structure or as enforcing as a fully connected mesh. + +Due to peer autonomy and a trustless computing environment, some peers may deviate from the expected operation or even leave the network. +At the same time, the underlying IP layer is unreliable. + +Therefore, tree-like overlays are not best suited for reliable information propagation. +Moreover, tree-based solutions usually result in significantly higher message dissemination latency due to suboptimal branches. + +Flooding-based solutions, on the other hand, result in maximum resilience against adversaries and achieve minimal message dissemination latency because the message propagates through all (including the optimal) paths. +Redundant transmissions help maintain the integrity and security of the network in the presence of adversaries and high node failure but significantly increase network-wide bandwidth utilization, cramming the bottleneck links. + +An efficient alternative is to lower the number of redundant transmissions by D-regular broadcasting, where a peer will likely receive (or relay) a message from up to $D$ random peers. +Publishing through a D-regular overlay triggers approximately $N \times D$ transmissions. +Reducing $D$ reduces the redundant transmissions but compromises reachability and latency. +Sharing metadata through a K-regular overlay (where $K > D$) allows nodes to pull missing messages. + +GossipSub [[1](https://arxiv.org/pdf/2007.02754.pdf)] benefits from full-message (D-regular) and metadata-only (k-regular) overlays. +Alternatively, a metadata-only overlay can be used, requiring a pull-based operation that significantly minimizes bandwidth utilization at the cost of increased latency. + +Striking the right balance between parameters like $D, K$, pull-based operation, etc., can yield application-specific performance tuning, but scalability remains a problem. + +At the same time, many other aspects can significantly contribute to the network's performance and scalability. +One option is to realize peers' suitability and continuously changing capabilities while forming overlays. + +For instance, a low-bandwidth link near a publisher can significantly demean the entire network's performance. +Reshuffling of peering links according to the changing network conditions can lead to superior performance. + +Laying off additional responsibilities to more capable nodes (super nodes) can alleviate peer cramming, but it makes the network susceptible to adversaries/peer churn. +Grouping multiple super nodes to form virtual node(s) can solve this problem. + + +Similarly, flat (single-tier) overlays cannot address the routing needs in large (geographically dispersed) networks. + +Hierarchical (Multi-tier) overlays with different intra/inter-overlay routing solutions can better address these needs. +Moreover, using message aggregation schemes for grouping multiple messages can save bandwidth and provide better resilience against adversaries/peer churn. + + +This article's primary objective is to investigate the possible choices that can empower an unstructured P2P network to achieve superior performance for the broadest set of applications. +We look into different constraints imposed by application-specific needs (performance goals) and investigate various choices that can augment the network's performance. +We explore overlay designs/freshness, peer selection approaches, message-relaying mechanisms, and resilience against adversaries/peer churn. +We consider GossipSub a baseline protocol to explore various possibilities and decisively commit to the ones demonstrating superior performance. +We also discuss the current state and, where applicable, propose a strategic plan for embedding new features to the GossipSub protocol. + + +## GOAL1: Low Latency Operation +Different applications, like blockchain, streaming, etc., impose strict time bounds on network-wide message dissemination latency. +A message delivered after the imposed time bounds is considered as dropped. +An early message delivery in applications like live streaming can further enhance the viewing quality. + +The properties and nature of the overlay network topology significantly impact the performance of services and applications executed on top of them. +Studying and devising mechanisms for better overlay design and message dissemination is paramount to achieving superior performance. + +Interestingly, shortest-path message delivery trees have many limitations: + +1) Changing network dynamics requires a quicker and continuous readjustment of the multicast tree. +2) The presence of resource-constrained (bandwidth/compute, etc.) nodes in the overlay can result in congestion. +3) Node failure can result in partitions, making many segments unreachable. +4) Assuring a shortest-path tree-like structure requires a detailed view of the underlying (and continuously changing) network topology. + + +Solutions involve creating multiple random trees to add redundancy [[2](https://ieeexplore.ieee.org/abstract/document/6267905)]. +Alternatives involve building an overlay mesh and forwarding messages through the multicast delivery tree (eager push). + +Metadata is shared through the overlay links so that the nodes can ask for missing messages (lazy push or pull-based operation) through the overlay links. +New nodes are added from the overlay on node failure, but it requires non-faulty node selection. + +GossipSub uses eager push (through overlay mesh) and lazy push (through IWANT messages). + +The mesh degree $D_{Low} \leq D \leq D_{High}$ is crucial in deciding message dissemination latency. +A smaller value for $D$ results in higher latency due to increased rounds, whereas a higher $D$ reduces latency on the cost of increased bandwidth. +At the same time, keeping $D$ independent of the growing network size ($N$) may increase network-wide message dissemination latency. +Adjusting $D$ with $N$ maintains similar latency on the cost of increased workload for peers. +Authors in [[3](https://infoscience.epfl.ch/record/83478/files/EugGueKerMas04IEEEComp.pdf)] suggest only a logarithmic increase in $D$ to maintain a manageable workload for peers. +In [[4](https://inria.hal.science/tel-02375909/document)], it is reported that the average mesh degree should not exceed $D_{avg} = \ln(N) + C$ for an optimal operation, +where $C$ is a small constant. + +Moreover, quicker shuffling of peers results in better performance in the presence of resource-constrained nodes or node failure [[4](https://inria.hal.science/tel-02375909/document)]. + +## GOAL2: Considering Heterogeneity In Overlay Design +Random peering connections in P2P overlays represent a stochastic process. It is inherently difficult to precisely model the performance of such systems. +Most of the research on P2P networks provides simulation results assuming nodes with similar capabilities. +The aspect of dissimilar capabilities and resource-constrained nodes is less explored. + +It is discussed in GOAL1 that overlay mesh results in better performance if $D_{avg}$ does not exceed $\ln(N) + C$. +Enforcing all the nodes to have approximately $\ln(N) + C$ peers makes resource-rich nodes under-utilized, while resource-constrained nodes are overloaded. +At the same time, connecting high-bandwidth nodes through a low-bandwidth node undermines the network's performance. +Ideally, the workload on any node should not exceed its available resources. +A better solution involves a two-phased operation: + + +1. Every node computes its available bandwidth and selects a node degree $D$ proportional to its available bandwidth [[4](https://inria.hal.science/tel-02375909/document)]. + Different bandwidth estimation approaches are suggested in literature [[5](https://ieeexplore.ieee.org/abstract/document/1224454),[6](https://ieeexplore.ieee.org/abstract/document/1248658)]. + Simple bandwidth estimation approaches like variable packet size probing [[6](https://ieeexplore.ieee.org/abstract/document/1248658)] yield similar results with less complexity. + It is also worth mentioning that many nodes may want to allocate only a capped share of their bandwidth to the network. + Lowering $D$ according to the available bandwidth can still prove helpful. + Additionally, bandwidth preservation at the transport layer through approaches like µTP can be useful. + To further conform to the suggested mesh-degree average $D_{avg}$, every node tries achieving this average within its neighborhood, resulting in an overall similar $D_{avg}$. + +2. From the available local view, every node tries connecting peers with the lowest latency until $D$ connections are made. + We suggest referring to the peering solution discussed in GOAL5 to avoid network partitioning. + +The current GossipSub design considers homogeneous peers, and every node tries maintaining $D_{Low} \leq D \leq D_{High}$ connections. + + +## GOAL3: Bandwidth Optimization +Redundant message transmissions are essential for handling adversaries/node failure. However, these transmissions result in traffic bursts, cramming many overlay links. +This not only adds to the network-wide message dissemination latency but a significant share of the network's bandwidth is wasted on (usually) unnecessary transmissions. +It is essential to explore solutions that can minimize the number of redundant transmissions while assuring resilience against node failures. + +Many efforts have been made to minimize the impact of redundant transmissions. +These solutions include multicast delivery trees, metadata sharing to enable pull-based operation, in-network information caching, etc. [[7](https://dl.acm.org/doi/abs/10.1145/945445.945473),[8](https://link.springer.com/chapter/10.1007/11558989_12)]. +GossipSub employs a hybrid of eager push (message dissemination through the overlay) and lazy push (a pull-based operation by the nodes requiring information through IWANT messages). + +A better alternative to simple redundant transmission is to use message aggregation [[9](https://ieeexplore.ieee.org/abstract/document/8737576),[10](https://dl.acm.org/doi/abs/10.1145/1993636.1993676),[11](https://ieeexplore.ieee.org/abstract/document/4276446)] for the GossipSub protocol. +As a result, redundant message transmissions can serve as a critical advantage of the GossipSub protocol. +Suppose that we have three equal-length messages $x1, x2, x3$. Assuming an XOR coding function, +we know two trivial properties: $x1 \oplus x2 \oplus x2 = x1$ and $\vert x1 \vert = \vert x1 \oplus x2 \oplus x2 \vert$. + +This implies that instead of sending messages individually, we can encode and transmit composite message(s) to the network. +The receiver can reconstruct the original message from encoded segments. +As a result, fewer transmissions are sufficient for sending more messages to the network. + +However, sharing linear combinations of messages requires organizing messages in intervals, +and devising techniques to identify all messages belonging to each interval. +In addition, combining messages from different publishers requires more complex arrangements, +involving embedding publisher/message IDs, delayed forwarding (to accommodate more messages), and mechanisms to ensure the decoding of messages at all peers. +Careful application-specific need analysis can help decide the benefits against the added complexity. + +## GOAL4: Handling Large Messages +Many applications require transferring large messages for their successful operation. For instance, database/blockchain transactions [[12](https://eips.ethereum.org/EIPS/eip-4844)]. +This introduces two challenges: + +1) Redundant large message transmissions result in severe network congestion. +2) Message transmissions follow a store/forward process at all peers, which is inefficient in the case of large messages. + +The above-mentioned challenges result in a noticeable increase in message dissemination latency and bandwidth wastage. +Most of the work done for handling large messages involves curtailing redundant transmissions using multicast delivery trees, +reducing the number of fanout nodes, employing in-network message caching, pull-based operation, etc. + +Approaches like message aggregation also prove helpful in minimizing bandwidth wastage. + +Our recent work on GossipSub improvements (still a work in progress) suggests the following solutions to deal with large message transmissions: + +1. Using IDontWant message proposal [[13](https://github.com/libp2p/specs/pull/413)] and staggered sending. + + IDontWant message helps curtail redundant transmissions by letting other peers know we have already received the message. + Staggered sending enables relaying the message to a short subset of peers in each round. + We argue that simultaneously relaying a message to all peers hampers the effectiveness of the IDontWant message. + Therefore, using the IDontWant message with staggered sending can yield better results by allowing timely reception and processing of IDontWant messages. + +2. Message transmissions follow a store/forward process at all peers that is inefficient in the case of large messages. + We can parallelize message transmission by partitioning large messages into smaller fragments, letting intermediate peers relay these fragments as soon as they receive them. + + +## GOAL5: Scalability +P2P networks are inherently scalable because every incoming node brings in bandwidth and compute resources. +In other words, we can keep adding nodes to the network as long as every incoming node brings at-least $R \times D$ bandwidth, +where $R$ is average data arrival rate. +It is worth mentioning that network-wide message dissemination requires at-least $\lceil \log_D (N) \rceil$ hops. +Therefore, increasing network size increases message dissemination latency, assuming D is independent of the network size. + +Additionally, problems like peer churn, adversaries, heterogeneity, distributed operation, etc., significantly hamper the network's performance. +Most efforts for bringing scalability to the P2P systems have focused on curtailing redundant transmissions and flat overlay adjustments. +Hierarchical overlay designs, on the other hand, are less explored. + +Placing a logical structure in unstructured P2P systems can help scale P2P networks. + +One possible solution is to use a hierarchical overlay inspired by the approaches [[14](https://link.springer.com/article/10.1007/s12083-016-0460-5),[15](https://link.springer.com/chapter/10.1007/978-3-030-19223-5_16),[16](https://ieeexplore.ieee.org/abstract/document/9826458)]. +An abstract operation of such overlay design is provided below: + +1. Clustering nodes based on locality, assuming that such peers will have relatively lower intra-cluster latency and higher bandwidth. + For this purpose, every node tries connecting peers with the lowest latency until $D$ connections are made or the cluster limit is reached. + +2. A small subset of nodes having the highest bandwidth and compute resources is selected from each cluster. + These super nodes form a fully connected mesh and jointly act as a virtual node, + mitigating the problem of peer churn among super nodes. + +3. Virtual nodes form a fully connected mesh to construct a hierarchical overlay. + Each virtual node is essentially a collection of super nodes; + a link to any of the constituent super nodes represents a link to the virtual node. + +4. One possible idea is to use GossipSub for intra-cluster message dissemination and FloodSub for inter-cluster message dissemination. + +## Summary +Overlay acts as a virtual backbone for a P2P network. A flat overlay is more straightforward and allows effortless readjustment to application needs. +On the other hand, a hierarchical overlay can bring scalability at the cost of increased complexity. +Regardless of the overlay design, a continuous readjustment to appropriate peering links is essential for superior performance. +At the same time, bandwidth preservation (through message aggregation, caching at strategic locations, metadata sharing, pull-based operation, etc.) can help minimize latency. +However, problems like peer churn and in-network adversaries can be best alleviated through balanced redundant coverage, and frequent reshuffling of the peering links. + +# References + +* [1] D. Vyzovitis, Y. Napora, D. McCormick, D. Dias, and Y. Psaras, “Gossipsub: Attack-resilient message propagation in the filecoin and eth2. 0 networks,” arXiv preprint arXiv:2007.02754, 2020. Retrieved from https://arxiv.org/pdf/2007.02754.pdf +* [2] M. Matos, V. Schiavoni, P. Felber, R. Oliveira, and E. Riviere, “Brisa: Combining efficiency and reliability in epidemic data dissemination,” in 2012 IEEE 26th International Parallel and Distributed Processing Symposium. IEEE, 2012, pp. 983–994. Retrieved from https://ieeexplore.ieee.org/abstract/document/6267905 +* [3] P. T. Eugster, R. Guerraoui, A. M. Kermarrec, and L. Massouli, “Epidemic information dissemination in distributed systems,” IEEE Computer, vol. 37, no. 5, 2004. Retrieved from https://infoscience.epfl.ch/record/83478/files/EugGueKerMas04IEEEComp.pdf +* [4] D. Frey, “Epidemic protocols: From large scale to big data,” Ph.D. dissertation, Universite De Rennes 1, 2019. Retrieved from https://inria.hal.science/tel-02375909/document +* [5] M. Jain and C. Dovrolis, “End-to-end available bandwidth: measurement methodology, dynamics, and relation with tcp throughput,” IEEE/ACM Transactions on networking, vol. 11, no. 4, pp. 537–549, 2003. Retrieved from https://ieeexplore.ieee.org/abstract/document/1224454 +* [6] R. Prasad, C. Dovrolis, M. Murray, and K. Claffy, “Bandwidth estimation: metrics, measurement techniques, and tools,” IEEE network, vol. 17, no. 6, pp. 27–35, 2003. Retrieved from https://ieeexplore.ieee.org/abstract/document/1248658 +* [7] D. Kostic, A. Rodriguez, J. Albrecht, and A. Vahdat, “Bullet: High bandwidth data dissemination using an overlay mesh,” in Proceedings of the nineteenth ACM symposium on Operating systems principles, 2003, pp. 282–297. Retrieved from https://dl.acm.org/doi/abs/10.1145/945445.945473 +* [8] V. Pai, K. Kumar, K. Tamilmani, V. Sambamurthy, and A. E. Mohr, “Chainsaw: Eliminating trees from overlay multicast,” in Peer-to-Peer Systems IV: 4th International Workshop, IPTPS 2005, Ithaca, NY, USA, February 24-25, 2005. Revised Selected Papers 4. Springer, 2005, pp. 127–140. Retrieved from https://link.springer.com/chapter/10.1007/11558989_12 +* [9] Y.-D. Bromberg, Q. Dufour, and D. Frey, “Multisource rumor spreading with network coding,” in IEEE INFOCOM 2019-IEEE Conference on Computer Communications. IEEE, 2019, pp. 2359–2367. Retrieved from https://ieeexplore.ieee.org/abstract/document/8737576 +* [10] B. Haeupler, “Analyzing network coding gossip made easy,” in Proceedings of the forty-third annual ACM symposium on Theory of computing, 2011, pp. 293–302. Retrieved from https://dl.acm.org/doi/abs/10.1145/1993636.1993676 +* [11] S. Yu and Z. Li, “Massive data delivery in unstructured peer-to-peer networks with network coding,” in 6th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2007). IEEE, 2007, pp. 592–597. Retrieved from https://ieeexplore.ieee.org/abstract/document/4276446 +* [12] V. Buterin, D. Feist, D. Loerakker, G. Kadianakis, M. Garnett, M. Taiwo, and A. Dietrichs, “Eip-4844: Shard blob transactions scale data-availability of ethereum in a simple, forwards-compatible manner,” 2022. Retrieved from https://eips.ethereum.org/EIPS/eip-4844 +* [13] A. Manning, “Gossipsub extension for epidemic meshes (v1.2.0),” 2022. Retrieved from https://github.com/libp2p/specs/pull/413 +* [14] Z. Duan, C. Tian, M. Zhou, X. Wang, N. Zhang, H. Du, and L. Wang, “Two-layer hybrid peer-to-peer networks,” Peer-to-Peer Networking and Applications, vol. 10, pp. 1304–1322, 2017. Retrieved from https://link.springer.com/article/10.1007/s12083-016-0460-5 +* [15] W. Hao, J. Zeng, X. Dai, J. Xiao, Q. Hua, H. Chen, K.-C. Li, and H. Jin, “Blockp2p: Enabling fast blockchain broadcast with scalable peer-to-peer network topology,” in Green, Pervasive, and Cloud Computing: 14th International Conference, GPC 2019, Uberlandia, Brazil, May 26–28, 2019, Proceedings 14. Springer, 2019, pp. 223–237. Retrieved from https://link.springer.com/chapter/10.1007/978-3-030-19223-5_16 +* [16] H. Qiu, T. Ji, S. Zhao, X. Chen, J. Qi, H. Cui, and S. Wang, “A geography-based p2p overlay network for fast and robust blockchain systems,” IEEE Transactions on Services Computing, 2022. Retrieved from https://ieeexplore.ieee.org/abstract/document/9826458 diff --git a/rlog/authors.yml b/rlog/authors.yml index 90adcf12..204ece6f 100644 --- a/rlog/authors.yml +++ b/rlog/authors.yml @@ -46,3 +46,8 @@ sanaz: moudy: name: 'Moudy' github: 'moudyellaz' + +farooq: + name: 'Umar Farooq' + github: 'ufarooqstatus' +