NCT - Commitments (major draft) (#3)

This commit is contained in:
Benjamin Arntzen
2024-10-07 11:52:42 +02:00
committed by GitHub
parent 590275935a
commit c7b74c0be1
9 changed files with 1090 additions and 1 deletions


@@ -0,0 +1,152 @@
---
title: Codex Comparison
tags:
- "2024q4"
- "dst"
- "codex"
draft: false
description: "Measure Codex against systems like IPFS, BitTorrent etc and see how it compares. Primarily BitTorrent."
---
`vac:dst:codex:codex-comparison`
## Description
We will compare Codex to other systems like IPFS and BitTorrent
to see how it performs.
We will compare on things such as:
* Time to first byte
* Bandwidth usage
* Stability
* Reliability
Most importantly, we will run a head-to-head speed test
comparing download speeds of Codex against other systems.
This will allow us to understand where Codex needs improvement
and where it stands right now in terms of suitability for different use cases.
We will support the Conduit of Expertise narrative directly
by providing valuable insights to Codex
that allow them to understand how Codex performs
in comparison to common and popular systems in the "altruistic" space.
Specifically, we will:
* Accelerate Codex reaching competitiveness with BitTorrent or find out what is and isn't possible to do.
* Answer the simple question: "Is Codex faster than BitTorrent?"
and in doing so, allow that to be a yes one day 😀
* Test the reliability of Codex in automated and highly stressful benchmarks
that push its limits and reveal its shortcomings.
* Improve the RFC culture by allowing us to reuse the work we do here
to build future scenarios that can test complicated situations
and requirements in a repeatable way.
## Task List
### Matrices Deployments
* fully qualified name: `vac:dst:codex:codex-comparison:matrices-deployments`
* owner: Wings
* status: 50%
* start-date: 2024/10/01
* end-date: 2024/10/11
#### Description
Expand upon the current deployment work
that uses Kubernetes manifests
to deploy and measure complex simulations
by adopting ArgoCD or a similar deployment tool,
together with standardised Helm, Kustomize or plain manifests,
and devise a way to both script and control simulations
in a repeatable, easy way.
Build a system that can deploy and measure
a matrix of different scenarios and configurations.
It must allow multiple unrelated deployments,
such as nwaku and gowaku, to exist and interact
in the course of a single test.
#### Deliverables
* Example Helm charts or Kustomize for deploying Codex.
* Customisations to those Helm or Kustomize charts that allow tuning them to meet specific scenarios such as number of nodes, amount of data.
* Automated systems for running a matrix of tests and measuring them.
This will build on prior work by DST that benefits from this work as well (ArgoCD work).
### Control BitTorrent
* fully qualified name: `vac:dst:codex:codex-comparison:control-bittorrent`
* owner: Wings
* status: 0%
* start-date: 2024/10/10
* end-date: 2024/10/14
Pick a BitTorrent client that is Dockerizable and scriptable. Current main candidate is Deluge, maybe qBittorrent.
Find a sane way to control and script BitTorrent behaviour
such as distributing a torrent file to the set of peers
that will be tested and automating stopping, starting, and otherwise manipulating torrents
as a separate process from launching the initial client swarm. Flexibility and consistency are the goal.
Implement those controls and start using them to build towards the wider Commitment.
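As a sketch of what such scripting could look like, assuming qBittorrent ends up as the chosen client: its Web API endpoints are real, but the class name, the start-paused strategy and the host/credentials in the usage note are illustrative assumptions, not decisions made here.

```python
# Hedged sketch: controlling a qBittorrent container via its Web API,
# so that distributing a torrent and starting/stopping the swarm can be
# scripted separately from launching the clients. The class name and
# start-paused approach are assumptions for illustration.
import requests


class TorrentController:
    def __init__(self, base_url):
        self.base = base_url.rstrip("/")
        self.session = requests.Session()  # keeps the auth cookie

    def login(self, username, password):
        r = self.session.post(f"{self.base}/api/v2/auth/login",
                              data={"username": username, "password": password})
        r.raise_for_status()

    def add_torrent(self, torrent_bytes, paused=True):
        # Add paused so every peer can be released at the same moment.
        r = self.session.post(f"{self.base}/api/v2/torrents/add",
                              files={"torrents": torrent_bytes},
                              data={"paused": "true" if paused else "false"})
        r.raise_for_status()

    def resume_all(self):
        # "all" resumes every torrent at once - the swarm start signal.
        r = self.session.post(f"{self.base}/api/v2/torrents/resume",
                              data={"hashes": "all"})
        r.raise_for_status()
```

Usage would be one controller per pod, e.g. `TorrentController("http://peer-0:8080")`, with `login`, `add_torrent` on every peer, then `resume_all` to start the test window.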
#### Deliverables
* Selected BitTorrent client and explained reasons for choice.
* Built a Dockerised image if there isn't one already.
* Implemented this into a test scenario of some kind and proven that we can script a scenario.
* A report on what we learned from the process.
### k8sified Tracker
* fully qualified name: `vac:dst:codex:codex-comparison:k8sified-tracker`
* owner: Wings
* status: 0%
* start-date: 2024/10/15
* end-date: 2024/10/17
Make a BitTorrent tracker work within Kubernetes and able to be controlled by API calls.
Most likely it will simply involve adding auth to an existing Deluge or similar API and passing requests through it.
#### Deliverables
* BitTorrent trackers compared, best one selected, reasons for best choice recorded.
* Chosen tracker is dockerized.
* Chosen tracker is scriptable.
* Finished script and docker container can realistically be used in a test scenario.
### Build/Test Scenarios
* fully qualified name: `vac:dst:codex:codex-comparison:build-test-scenarios`
* owner: Wings
* status: 0%
* start-date: 2024/10/15
* end-date: 2024/12/31
Use the work done in Matrices Deployments and Control BitTorrent to build and test a set of scenarios that can be used to test Codex.
We will target these things to compare:
**Modes**: BitTorrent, Codex Erasure-Coded, Codex Non-Erasure-Coded
**Swarm Size**:
* total size: 2, 8, 16, 32
* seeders: 1, 2, 4, 8, 16
* file size: 100 MB, 1 GB, 5 GB
We will compare a matrix of file sizes, seeders, total size, and build a flexible test harness on top of Matrices Deployments and Control BitTorrent to run the tests.
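The matrix above could be expanded into concrete scenario configurations with a small generator; a sketch, where the dict layout and the rule skipping seeder-only swarms are illustrative assumptions, not a fixed schema:

```python
# Sketch: expand the comparison matrix (modes x swarm sizes x seeders
# x file sizes, values from the lists above) into one config per run.
from itertools import product

MODES = ["bittorrent", "codex-ec", "codex-no-ec"]
TOTAL_SIZES = [2, 8, 16, 32]
SEEDERS = [1, 2, 4, 8, 16]
FILE_SIZES_MB = [100, 1_000, 5_000]


def build_matrix():
    scenarios = []
    for mode, total, seeders, size in product(
            MODES, TOTAL_SIZES, SEEDERS, FILE_SIZES_MB):
        if seeders >= total:  # a seeder-only swarm has nothing to download
            continue
        scenarios.append({
            "mode": mode,
            "total_nodes": total,
            "seeders": seeders,
            "leechers": total - seeders,
            "file_size_mb": size,
        })
    return scenarios
```

Each resulting dict could then be fed to the Matrices Deployments harness as Helm/Kustomize value overrides.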
#### Deliverables
* A completely automated end to end test scenario that can be used to test Codex against BitTorrent.
* A report on the results of the tests and the conclusions we can draw from them.
* Hard numbers on what Codex is capable of and how these swarm sizes and other parameters affect performance, latency and other metrics.


@@ -0,0 +1,132 @@
---
title: Codex Scaling
tags:
- "2024q4"
- "dst"
- "codex"
draft: false
description: "Improve Codex's scaling abilities
and our understanding of these,
using scientific testing and experiments,
leading to better scaling.
Compare to other systems.
Support the testnet efforts by providing base capacity.
Measure speed, latency, other metrics.
Give hard numbers on Codex vs BitTorrent."
---
`vac:dst:codex:codex-scaling`
## Description
Use real world testing, theoretical analysis and simulation
to determine and improve Codex's scaling properties.
Find the limits of Codex's capabilities and measure them in different scenarios.
We will allow Codex to scale to support large scale use cases,
test how it behaves in large 100TB+ testnet deployments
and in various deployment setups,
and we will help make Codex more scalable in the first place.
We will support the Conduit of Expertise narrative directly
by providing valuable insights to Codex
and the ability to theorise, reason about,
test, measure and improve
the performance, stability and scalability of Codex.
These efforts will contribute in these ways to the Conduit of Expertise narrative:
* Accelerate adoption and development and productising of Codex
by providing support to the Codex team
in the form of real world testing
to improve their efficiency and effectiveness
in building a better product.
* Improve the RFC culture
by allowing for faster and easier development of RFCs
with the aid of rapidly accelerated insights
into how an RFC in development will perform
as it's being expanded and going through the draft process.
* Allow easier post-mortem analysis
of the success or relative performance of a given RFC -
does this change use more or less bandwidth?
Did it improve things?
Seeing the effects of changes at scale
allows for a greater ability to usefully wrap up work on
and conclude an RFC process
and document and absorb what we learned
in the process into further improvements.
Further, we will contribute both directly and indirectly
to the Premier Research destination narrative
by helping Codex build a stable base
on which other research and interesting use cases can be built.
## Task List
### Deploy Base Capacity
* fully qualified name: `vac:dst:codex:codex-scaling:deploy-base-capacity`
* owner: Wings
* status: 99%
* start-date: 2024/10/05
* end-date: 2024/10/31
#### Description
Deploy a large set of base capacity to the Codex testnet and keep it online, stable and prevented from losing data where possible.
It will consist of 50 nodes with 10 TB of data each.
#### Deliverables
* Helm chart adapted to Vaclab and used to deploy the nodes.
* 50 nodes running and adopted into the testnet.
* Downloads/uploads tested and working for at least 3 selected nodes.
* Ongoing monitoring (not a one time thing)
* 500TB of overall capacity provided to the network
### How Fast Is Codex?
* fully qualified name: `vac:dst:codex:codex-scaling:how-fast-is-codex`
* owner: Wings
* status: 0%
* start-date: 2024/10/18
* end-date: 2024/10/21
#### Description
Related to Codex Comparison,
we simply want to find out how fast Codex is at various tasks
under different kinds of stress and load.
We will use the Base Capacity.
We will test and compare the following:
* Upload speed (1 client)
* Download speed
* Time to first byte
* Time to 50%
* Time to 90%
* Time to 100%
We would also like to collect all data from the items in this matrix:
**Benchmark conditions**:
* total size: 2, 8, 16, 32
* seeders: 1, 2, 4, 8, 16
* file size: 100 MB, 1 GB, 5 GB
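The "time to first byte / 50% / 90% / 100%" measurements could be derived from periodic progress samples; a minimal sketch, where the `(elapsed_seconds, bytes_received)` sampling format is an assumption rather than something this document fixes:

```python
# Sketch: derive time-to-X% metrics from periodic download progress
# samples of the form (elapsed_seconds, bytes_received).
def time_to_fraction(samples, total_bytes, fraction):
    """Return the first sampled time at which `fraction` of the file arrived."""
    threshold = total_bytes * fraction
    for elapsed, received in samples:
        if received >= threshold:
            return elapsed
    return None  # the download never reached the threshold


# Illustrative 100 MB download sampled four times:
samples = [(0.0, 0), (0.5, 1_000_000), (2.0, 50_000_000), (9.0, 100_000_000)]
ttfb = time_to_fraction(samples, 100_000_000, 1e-9)  # first byte
t50 = time_to_fraction(samples, 100_000_000, 0.50)
t100 = time_to_fraction(samples, 100_000_000, 1.00)
```

The same function covers every row of the benchmark matrix, so results stay directly comparable between Codex and BitTorrent runs.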
#### Deliverables
- [ ] Reports from how each item in the matrix performed.
- [ ] A general writeup


@@ -0,0 +1,81 @@
---
title: Deployer Tool
tags:
- "2024q4"
- "dst"
- "ift"
draft: false
description: "Develop, test, demonstrate and graduate
a tool or method for reliably deploying,
measuring and scaling arbitrary sets of software
that needs testing and validation"
---
`vac:dst:ift:deployer-tool`
## Description
We will develop, test, demonstrate and graduate (productionise)
a tool or method for reliably deploying, measuring and scaling
arbitrary sets of software that needs testing and validation
- such as Waku, Codex, Nomos, etc.
The tool will be used to improve the developer experience of
deploying these systems at various scales,
including automation, metrics, and the ability to change
a running simulation as needed.
It should support arbitrary Helm and Kustomize charts,
allowing us to use well defined configurations
in the form of Kubernetes resources,
managed by modular bundles that can be swapped in and out as needed.
This will allow us to do all of our other work more easily,
allowing us to focus on providing value to the IFT ecosystem.
Through this, the Conduit of Expertise narrative
is supported - through increasing our efficiency,
capabilities and the reliability of repeating our experiments
and research, allowing us to provide better insights and data
to the teams we work with to allow them to make better decisions.
## Task List
### ArgoCD Or Similar
* fully qualified name: `vac:dst:ift:argocd-or-similar`
* owner: Wings
* status: 80%
* start-date: 2024/10/04
* end-date: 2024/12/31
#### Description
Get ArgoCD or a similar tool up and running.
Use it to demonstrate deploying an nwaku simulation from a Git repo
with a Helm chart or plain manifests in it.
#### Deliverables
* The demonstrated ability to run an nwaku simulation.
### Working Matrices
* fully qualified name: `vac:dst:ift:working-matrices`
* owner: Wings
* status: 0%
* start-date: 2024/10/04
* end-date: 2024/12/31
#### Description
Ensure that deployment matrices work once `ArgoCD Or Similar` is completed.
Test some basic deployments and record findings.
#### Deliverables
* A report on the findings of the tests and the current state of the deployment matrices.
* A deployment matrix tool or set of instructions/documentation.
* Deployments tested and working with a 3x3 matrix of different configurations.

content/dst/ift/vaclab.md

@@ -0,0 +1,189 @@
---
title: VacLab
tags:
- "2024q4"
- "dst"
- "ift"
draft: false
description: "Scale and apply the VacLab to IFT's needs.
Anticipate untapped use cases and needs from other teams.
Achieve 25% real world time usage."
---
`vac:dst:ift:vaclab`
## Description
The VacLab is a resource provided to Vac by Riff Labs Limited,
intended to help us perform detailed simulations
and deployments of distributed systems at scale
as well as the systems and dependencies that surround them.
With the VacLab reaching a maturity where it can comfortably be used
to advance IFT's research, development and testing efforts, and quality control,
we want to ensure it is being used to its full potential
and that teams understand the resource exists.
If they find a case where their work could be improved
by collaboration with the DST team through VacLab,
they should be comfortable reaching out to us,
and confident that we will be willing to try and help them,
based on our attitude and, more importantly, the track record
and results we will produce using these tools and our expertise.
The lab will be treated as an IaaS-style service at first,
with the raw underlying infrastructure being developed in partnership with Riff Labs
who handles the details of making that IaaS layer available to us and reliable.
As we progress through the maturity of the lab,
we will transition to supporting a more PaaS or even SaaS software model,
where as much as possible is accessible to the IFT ecosystem and teams
to use and benefit from without them needing to concern themselves with the details of the underlying infrastructure or be blocked by the need to build and manage their own.
We will move towards self service testing and deployment,
and by doing so unblock and accelerate the development, R&D and productionisation
of IFT's projects by providing a safe and reliable place to experiment and test.
It will continue to provide significant efficiency benefits in terms of cost vs output when compared to cloud providers
and even to traditional on-premises deployments of infrastructure,
using many independent and cheaper nodes
rather than larger more powerful vertically scaled machines,
building on second hand and used equipment,
and "patching around" the unreliability of individual hardware,
by ensuring everything is resilient and reliable even in the face of individual failures,
and in doing so continue to reduce and control the costs of testing our systems at scale.
Through the use of the VacLab
we will support the Conduit of Expertise narrative by:
* Providing a unique capability to the IFT ecosystem
that would not otherwise be available to them,
lowering the barrier to entry for teams needing research -
or services that require infrastructure -
by lowering the cost and removing the need for them to get it themselves
through cloud providers that offer less flexibility and direct control.
* Using our knowledge of what is possible to do with these resources,
based on who is already using them,
and applying that knowledge to intuit new use cases
that will unlock better collaboration between teams and the DST,
driving and accelerating development of IFT projects
such as Waku, Codex, Nomos and more.
* Accelerating initiatives by providing the means, capability and encouragement
to test every aspect of anything that can be tested in a simulation,
across every team and use case that is interested in doing so,
up to the limits of what the DST team can support.
We will also provide support for the Premier Research destination narrative by:
* Allowing public access to non-sensitive telemetry and metrics from non-sensitive systems such as Codex storage nodes, and potentially even probes that measure the state of networks such as The Waku Network and Status.
## Task List
### Status Page Known
* fully qualified name: `vac:dst:ift:vaclab:status-page-known`
* owner: Wings
* status: 80%
* start-date: 2024/12/01
* end-date: 2024/12/07
#### Description
A status page for the VacLab
that has wide acceptance and use
by anyone who wants to know the current status
of the VacLab and its availability.
#### Deliverables
* Status page is created and hosted on the lab
and made available to users.
* Status page reflects reality and is accepted by the users
as being a good fit for their needs.
* Status page sees widespread use among its users.
* Build an external probe and a fallback status page
that can be used in case everything else is down.
### Better Time Slicing
* fully qualified name: `vac:dst:ift:vaclab:better-time-slicing`
* owner: Wings
* status: 0%
* start-date: 2024/06/01
* end-date: 2024/12/31
#### Description
Do a better job of time slicing the lab.
#### Deliverables
* A report on the current state of time slicing in the lab.
* A plan for how to improve time slicing in the lab.
* A timeline for implementing the plan.
* Measurable improvements in usage of the lab,
aiming for an initial target of 25% of real world time
being used for useful workloads and tests.
Later repeats in the VacLab commitment will aim to improve this to 50%,
then 75%, then as far as possible
to the limits of the underlying infrastructure and our actual needs.
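The 25%/50%/75% targets imply an agreed way of measuring "real world time used"; one plausible sketch is to merge the recorded `(start, end)` windows of useful workloads and divide covered time by wall-clock time. The interval representation and function name are assumptions for illustration:

```python
# Sketch: lab utilisation as the fraction of a wall-clock period
# covered by (possibly overlapping) workload windows.
def utilisation(windows, period_start, period_end):
    # Clip each window to the measurement period, then sweep in order,
    # counting each second of coverage at most once.
    clipped = sorted((max(s, period_start), min(e, period_end))
                     for s, e in windows
                     if e > period_start and s < period_end)
    covered, cursor = 0.0, period_start
    for start, end in clipped:
        start = max(start, cursor)  # ignore already-counted overlap
        if end > start:
            covered += end - start
            cursor = end
    return covered / (period_end - period_start)
```

Fed with workload records from the scheduler, this gives a single number to track against the 25% target over each reporting period.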
### Train Lab Staff
<!-- technically sort of external
and will be done outside of normal DST cadence
but will be managed so as not to disrupt other works
-->
* fully qualified name: `vac:dst:ift:vaclab:train-lab-staff`
* owner: Wings
* status: 30%
* start-date: 2024/12/01
* end-date: 2024/12/31
#### Description
Fully dedicate all time outside of core DST deliverable work
to training Michaela, the VacLab (Riff Labs Perth) custodian,
in all aspects of not just managing the VacLab,
but providing support to DST's work that utilises it,
with the focus of improving both the reliability of the lab
and the quality of our systems testing service.
This will, for practical reasons, need to be done in person in Perth.
Will also be used to improve the reliability and capabilities
of the VacLab as a platform for IFT's research and development needs.
Must not impact other works outside of this task.
#### Deliverables
- [ ] Full automation for anything we know needs doing regularly
- [ ] Automated patching for security updates (Debian, Authentik, SeaweedFS)
- [ ] Secure key management and rotation automation (for SSH keys)
- [ ] Michaela fully comfortable operating the lab independently
- [ ] A report on what was learned in this process
and how we believe it improved VacLab support and operations
- [ ] Improvements to the lab that are documented, implemented and recorded.
### Automation Uplift
<!-- technically sort of external
and will be done outside of normal DST cadence
but will be managed so as not to disrupt other works
-->
* fully qualified name: `vac:dst:ift:vaclab:automation-uplift`
* owner: Wings
* status: 0%
* start-date: 2024/12/01
* end-date: 2024/12/31
#### Description
Significantly improve the automation and management of the VacLab,
freeing up resources for Wings to focus on other work.
#### Deliverables
- [ ] Full automation for anything we know needs doing regularly
- [ ] Automated patching for security updates (Debian, Authentik, SeaweedFS)
- [ ] Secure key management and rotation automation (for SSH keys)
- [ ] A report on what was learned in this process
and how we believe it improved VacLab support and operations
- What was automated? Why? What did that change?
- What remains manual and needs improving?
- [ ] Improvements to the lab that are documented, implemented and recorded.


@@ -0,0 +1,85 @@
---
title: Visualiser Tool
tags:
- "2024q4"
- "dst"
- "ift"
draft: false
description: "Develop tools or frameworks
suitable for visualising the state of arbitrary distributed systems.
This initial iteration must support Waku visualisation,
but future intention is to support any system
which is log compatible with the Visualiser Tools."
---
`vac:dst:ift:visualiser-tool`
## Description
Develop tools or frameworks
suitable for visualising the state of arbitrary distributed systems.
This initial iteration must support Waku visualisation,
but future intention is to support any system
which is log compatible with the Visualiser Tools.
We will demonstrate the usefulness and unique understanding
such a tool can give you about the way a p2p network behaves
under different conditions, and from its inception to its middle state and eventual end.
Through providing a way to visualise p2p network behaviour and message propagation,
we will help enable the Conduit of Expertise narrative to be supported
by giving the Waku team a way to intuitively understand
the actual way network propagation occurs,
and how it is affected by different factors.
It will also provide a way to test RFCs
that affect aspects of Waku
that are visible in a simulation
but hard to observe in the real world
or without significant time and mental investment in logs
that don't provide a visual way of analysing large scale behaviours.
## Task List
### debug-visualiser
* fully qualified name: `vac:dst:ift:visualiser-tool:debug-visualiser`
* owner: Alberto
* status: 60%
* start-date: 2024/06/01
* end-date: 2024/12/31
#### Description
The debug visualiser is designed
to allow for digging into the interactions,
packet flow, and behaviour,
of distributed systems, initially Waku.
It is intended to be "interesting and deep, not pretty or wide".
#### Deliverables
- [ ] https://github.com/vacp2p/dst-live-visualiser
### live-visualiser
* fully qualified name: `vac:dst:ift:visualiser-tool:live-visualiser`
* owner: Wings
* status: 99%
* start-date: 2024/06/01
* end-date: 2024/12/31
#### Description
The live visualiser is designed
to allow for digging into the interactions,
packet flow, and behaviour,
of distributed systems, initially Waku.
It is intended to be "pretty and wide" and in contrast to the debug visualiser
it runs in realtime along with the network
and shows you the network in a way that is easy to understand and interpret,
especially for those previously not familiar with peer to peer technologies or networks.
#### Deliverables
- [ ] https://github.com/vacp2p/dst-live-visualiser


@@ -9,12 +9,23 @@ tags:
---
### `ift`
* [ ] [[dst/ift/deployer-tool|deployer-tool]]
* [ ] [[dst/ift/visualiser-tool|visualiser-tool]]
* [ ] [[dst/ift/vaclab|vaclab]]
### `waku`
* [ ] [[dst/waku/waku-scaling|waku-scaling]]
### `codex`
* [ ] [[dst/codex/codex-scaling|codex-scaling]]
* [ ] [[dst/codex/codex-comparison|codex-comparison]]
<!--
### `nomos`
* [ ] [[dst/nomos/nomos-scaling|nomos-scaling]]
-->
### `vac`
* [ ] [[dst/vac/libp2p-evaluation|libp2p-evaluation]]


@@ -0,0 +1,79 @@
---
title: Nomos Scaling
tags:
- "2024q4"
- "dst"
- "nomos"
draft: false
description: "Help Nomos understand and improve
the properties of Nomos.
Improve privacy and security,
and improve scaling properties."
---
`vac:dst:nomos:nomos-scaling`
## Description
Use real world testing,
theoretical analysis
and simulation
to determine and improve Nomos's scaling properties.
Find the limits of Nomos's capabilities
and measure them in different scenarios.
We will measure the real world speeds and latency of Nomos's mixnet,
and what use cases it is therefore able to support.
We will support the Conduit of Expertise narrative directly
by providing valuable insights to Nomos
and the ability to theorise, reason about,
test, measure and improve
the performance, stability and scalability of Nomos.
These efforts will contribute in these ways to the Conduit of Expertise narrative:
* Help Nomos ship a more scalable mixnet,
unlocking capabilities across IFT's teams and ecosystem
and allowing for more use cases to be supported and understood.
This will also help spur on outside adoption and contributions.
* Improve the RFC culture
by allowing for faster and easier development of RFCs
with the aid of rapidly accelerated insights into how an RFC in development will perform as it's being expanded and going through the draft process.
* Allow easier post-mortem analysis of the success or relative performance of a given RFC -
does this change use more or less bandwidth?
Did it improve things?
Seeing the effects of changes at scale allows for a greater ability to usefully wrap up work on and conclude an RFC process and document and absorb what we learned in the process into further improvements.
## Task List
### Mixnet benchmarking
* fully qualified name: `vac:dst:nomos:nomos-benchmarking`
* owner: Alberto
* status: 0%
* start-date: <yyyy/mm/dd>
* end-date: <yyyy/mm/dd>
#### Description
Measure the speed and reliability of Nomos's mixnet, benchmarking it against other mixnets and a selection of real world use cases.
#### Deliverables
* Benchmarks done
* Report published with all relevant details
### RFC analysis (recurring)
* fully qualified name: `vac:dst:nomos:rfc-analysis`
* owner: Alberto
* status: 0%
* start-date: <yyyy/mm/dd>
* end-date: <yyyy/mm/dd>
#### Description
Analyse the performance of RFCs that have an expected effect on the network's performance and scaling properties, using the benchmarking tools and real world measurements.
#### Deliverables
* Analysis done
* Report published with all relevant details
* RFC's GitHub issue updated with links to the analysis and results


@@ -0,0 +1,79 @@
---
title: Libp2p Evaluation
tags:
- "2024q4"
- "dst"
- "vac"
draft: false
description: "Test libp2p on a regular basis
and look for regressions,
learn scaling properties and run scaling studies,
understand the limits of libp2p and its behaviour.
Deliver hard numbers and actionable insights.
Do this monthly, reliably, with strong documentation of findings."
---
`vac:dst:vac:libp2p-evaluation`
## Description
Test libp2p on a regular basis
and look for regressions,
learn scaling properties and run scaling studies,
understand the limits of libp2p and its behaviour.
We want to learn specific, actionable information
about libp2p's behaviour
and how it is evolving over time
with each new release
and with each thing we are specifically asked to check and test.
We will use a combination of real world testing,
theoretical analysis and simulation
to determine and measure the success,
side effects and other factors of libp2p and its evolution.
We will support the Conduit of Expertise narrative directly
by analysing and evaluating new libp2p releases and their features,
both with regards to features they have today
and with regards to how that compares to past behaviour.
We will:
* Enable improvements to libp2p
by allowing for repeatable, measurable
and real world insights into libp2p,
all the way from theory to practice and back.
* Reduce the risk of a libp2p regression
making it into a new release of our product.
Additionally, these efforts will contribute
to the Premier Research destination narrative by:
* Improving and strengthening our relationship with the libp2p team
and thus increasing the reach and influence of the IFT,
and improving the chances
that we successfully grow our ecosystem's products and collaborations
and especially those we want to work with externally.
## Task List
### Regression testing (recurring)
* fully qualified name: `vac:dst:vac:libp2p-evaluation:regression-testing`
* owner: Alberto
* status: N/A
* start-date: N/A
* end-date: N/A
#### Description
Run different scenarios
and collect evidence and data
of libp2p's behaviour.
Test for known regressions
that have occurred in the past
and ensure they don't happen again.
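The check for known regressions could be as simple as comparing the metrics of each new run against a stored baseline with a tolerance; a hedged sketch, where the metric names, the "lower is better" convention and the 10% tolerance are illustrative assumptions:

```python
# Sketch: flag metrics in a new libp2p run that have worsened past a
# relative tolerance versus a stored baseline. Assumes "lower is
# better" metrics (latency, bandwidth per message, ...).
def find_regressions(baseline, current, tolerance=0.10):
    """Return {metric: (baseline_value, current_value)} for regressions."""
    regressions = {}
    for name, base_value in baseline.items():
        value = current.get(name)
        if value is not None and value > base_value * (1 + tolerance):
            regressions[name] = (base_value, value)
    return regressions
```

Run monthly against each new release, a non-empty result would block the report from being filed as "no regressions" and point straight at the offending metrics.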
#### Deliverables
* Analysis done
* Report published with all relevant details
* RFC's GitHub issue updated
with links to the analysis and results.


@@ -0,0 +1,281 @@
---
title: Waku Scaling
tags:
- "2024q4"
- "dst"
- "waku"
draft: false
description: "Use real world testing,
theoretical analysis and simulation
to determine and improve Waku's scaling properties.
Find the limits of Waku's capabilities
and measure them in different scenarios.
Deliver hard numbers and actionable insights.
Confirm or reject our ideas."
---
`vac:dst:waku:waku-scaling`
## Description
Use real world testing,
theoretical analysis and simulation
to determine and improve Waku's scaling properties.
Find the limits of Waku's capabilities
and measure them in different scenarios.
Deliver hard numbers and actionable insights.
Confirm or reject our ideas.
Through this we will, among other things,
research and find the limits of Waku's capabilities
and measure them in different scenarios.
We will work with the Waku team to improve and measure Waku
and allow for deep examination of a wide range of networks
at sizes anywhere from small (< 500 nodes)
to midscale (500-5000 nodes)
to large (10,000+ nodes).
We will in some ways
provide a parallel to the Vac QA team's efforts -
while their focus is on individual low-level components
of Waku and other software within the IFT ecosystem,
ours will be on the real world behaviour of Waku as a whole system -
at different scales and with different configurations,
mesh structure and shape -
and how that maps to our theoretical work.
We will support the Conduit of Expertise narrative directly
by providing valuable insights to Waku
and the ability to theorise, reason about, test, measure and improve
the performance, stability and scalability of Waku.
These efforts will contribute in these ways to the Conduit of Expertise narrative:
* Accelerate improvements to Waku,
improving the developer community's experience and satisfaction
both inside and outside of IFT's ecosystem,
through allowing repeatable, measurable and real world insights into Waku,
all the way from theory to practice and back.
* Improve the RFC culture by allowing for faster and easier development of RFCs
with the aid of rapidly accelerated insights
into how an RFC in development
will perform as it's being expanded
and as it goes through the draft process.
* Allow easier post-mortem analysis
of the success or relative performance
of a given RFC -
does this change use more or less bandwidth?
Did it improve things?
Seeing the effects of changes at scale
allows for a greater ability
to usefully wrap up work on, and conclude, an RFC process
and document and absorb what we learned in the process
into further improvements.
## Task List
### High Scalability Waku Demonstration
* fully qualified name: `vac:dst:waku:waku-scaling:high-scalability-waku-demonstration`
* owner: Wings
* status: 95%
* start-date: 2024/03/01
* end-date: 2024/11/01
#### Description
Demonstrate a working, real world, large scale Waku network.
Measure its performance
and attempt to support the assertion
that Waku is a scalable solution
that can work in networks at sizes
that push the limits of what the theoretical work we did predicted is possible.
Specifically, we want to deploy a 10,000 node Waku network
and measure its performance in terms of message delivery,
bandwidth usage, and other metrics.
We want to deliver a report on what we learned,
what we tested and what we found.
The report should include analysis of the performance of Waku at extreme scale,
providing insights that allow people to see significant supporting evidence
that Waku can in fact scale to these sizes and perform reliably.
#### Deliverables
- [x] An infrastructure setup, whether on-prem or cloud,
that can support deployments of a 10,000 node Waku network.
- [x] https://github.com/vacp2p/10ksim - A working set of bundled and compatible Kubernetes manifests
that allow for up to a 10,000 node Waku network
to be reliably created and measured.
The manifests should be compatible with [[dst/ift/deployer-tool|deployer-tool]]
and flexible.
- [ ] A useful set of measurements taken with the monitoring system and tooling we have available.
- [ ] The monitoring system stays stable the entire time, providing useful information and metrics.
### Test Store Protocol At Scale
* fully qualified name: `vac:dst:waku:waku-scaling:test-store-protocol-at-scale`
* owner: Alberto
* status: 0%
* start-date: 2024/10/07
* end-date: 2024/10/11
#### Description
Test the Store protocol at scale.
#### Deliverables
- [ ] A report on the results of the test,
including analysis, data and metrics.
- [ ] A list of any issues encountered.
- [ ] Hard data and metrics from the simulation.
### High Churn Relay+Store Reliability
* fully qualified name: `vac:dst:waku:waku-scaling:high-churn-relay-store-reliability`
* owner: Alberto
* status: 0%
* start-date: 2024/09/01
* end-date: 2024/12/31
#### Description
Verify that when nodes go offline and come back online, they can retrieve missed messages from Store nodes.
#### Deliverables
- [ ] A report on the results of the test,
including analysis, data and metrics.
- [ ] A list of any issues encountered.
- [ ] Hard data and metrics from the simulation.
### Relay/DiscV5 Resources in Heterogeneous Clusters
* fully qualified name: `vac:dst:waku:waku-scaling:relay-discv5-resources-in-heterogenous-clusters`
* owner: Wings
* status: 0%
* start-date: 2024/09/01
* end-date: 2024/12/31
#### Description
Measure Relay bandwidth usage
and DiscV5 bandwidth usage
in heterogeneous clusters
involving different node implementations
such as nwaku and go-waku.
#### Deliverables
- [ ] A report on the results of each test, including analysis, data and metrics.
- [ ] A list of any issues encountered.
- [ ] Hard data and metrics from the simulation.
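Comparing bandwidth across implementations mostly means grouping per-node samples by implementation and summarising each group. A minimal sketch, assuming hypothetical per-node samples (node names and numbers are placeholders, not measurements):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-node bandwidth samples (KB/s) tagged by implementation.
samples = [
    ("nwaku-0", "nwaku", 410.0),
    ("nwaku-1", "nwaku", 395.5),
    ("gowaku-0", "go-waku", 452.3),
    ("gowaku-1", "go-waku", 448.9),
]

# Group samples by node implementation, then summarise each group.
by_impl = defaultdict(list)
for _node, impl, kbps in samples:
    by_impl[impl].append(kbps)

for impl, values in sorted(by_impl.items()):
    print(f"{impl}: mean {mean(values):.1f} KB/s over {len(values)} nodes")
```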
### Waku Shard Reliability vs Scale
* fully qualified name: `vac:dst:waku:waku-scaling:waku-shard-reliability-vs-scale`
* owner: Alberto
* status: 0%
* start-date: 2024/09/01
* end-date: 2024/12/31
#### Description
Test Waku shard behaviour and stability with various numbers of shards.
Define a test matrix and then run it.
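The test matrix itself is just the cross product of the axes we choose to vary. A minimal sketch, where the shard and node counts are illustrative assumptions rather than the agreed values:

```python
from itertools import product

# Illustrative axes for the shard test matrix; actual values would be
# chosen together with the Waku team.
shard_counts = [1, 2, 4, 8]
relay_node_counts = [100, 500, 1000]

# Every combination of shard count and network size becomes one deployment.
matrix = [
    {"shards": s, "nodes": n, "nodes_per_shard": n // s}
    for s, n in product(shard_counts, relay_node_counts)
]

for case in matrix:
    print(case)
print(f"{len(matrix)} deployments to run")
```

Emitting the matrix from code keeps the deployment script and the report in sync about exactly which cases were run.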
#### Deliverables
- [ ] Matrix/exact deployment script defined
- [ ] A report on the results of each test, including analysis, data and metrics.
- [ ] A list of any issues encountered.
- [ ] Hard data and metrics from the simulation.
### Filter and Lightpush Tests
* fully qualified name: `vac:dst:waku:waku-scaling:filter-lightpush-tests`
* owner: Alberto
* status: 0%
* start-date: 2024/10/18
* end-date: 2024/10/25
#### Description
Test the performance, reliability, and behaviour
of the Filter and Lightpush protocols at scale.
Confirm their stability and reliability at various scales.
Adjust the specific tests involved
in response to the Waku team's direction
and the discoveries we make during the course of this work.
#### Deliverables
- [ ] A report on the current reliability and performance of the protocols at scale.
- [ ] A list of any issues encountered, filed.
- [ ] Hard data and metrics from the simulation.
### Measure DiscV5 bandwidth with Waku discovery
* fully qualified name: `vac:dst:waku:waku-scaling:measure-discv5-bandwidth-with-waku-discovery`
* owner: Alberto
* status: 0%
* start-date: 2024/09/01
* end-date: 2024/12/31
#### Description
Measure the bandwidth usage of the Waku discovery protocol using the DiscV5 protocol.
#### Deliverables
- [ ] A report on what we learned.
- [ ] Hard data and metrics from the simulation.
- [ ] A documentation page with analysis, results, and notes.
### Partial PeX Experimental Analysis
* fully qualified name: `vac:dst:waku:waku-scaling:partial-pex-experimental-analysis`
* owner: Alberto
* status: 0%
* start-date: 2024/09/01
* end-date: 2024/12/31
#### Description
Produce and run an experimental test environment
where a partial subset of the nodes
use Waku's Peer Exchange protocol
to share information about other nodes in the network.
Measure the bandwidth usage of DiscV5 on those nodes that use PeX
and compare it to the DiscV5 bandwidth usage of nodes that do not.
Measure overall bandwidth usage and record conclusions as to the impact of PeX.
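The comparison described above reduces to an A/B summary over the two node groups. A minimal sketch with hypothetical DiscV5 bandwidth samples (the numbers are placeholders, not measurements):

```python
from statistics import mean

# Hypothetical per-node DiscV5 bandwidth samples (KB/s) for the two groups.
pex_nodes = [12.4, 11.9, 13.1, 12.7]       # nodes using Peer Exchange
no_pex_nodes = [18.2, 17.6, 19.0, 18.4]    # nodes relying on DiscV5 only

pex_mean = mean(pex_nodes)
no_pex_mean = mean(no_pex_nodes)
relative_diff = 1 - pex_mean / no_pex_mean

print(f"PeX group mean:      {pex_mean:.1f} KB/s")
print(f"no-PeX group mean:   {no_pex_mean:.1f} KB/s")
print(f"relative difference: {relative_diff:.1%}")
```

In the real analysis the same reduction would also be run over total (not just DiscV5) bandwidth to record the overall impact of PeX.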
#### Deliverables
- [ ] DiscV5 bandwidth comparison document/report - PeX vs no-PeX
- [ ] Overall bandwidth usage comparison document/report
- [ ] Recorded conclusions on the impact of PeX.
### Mixed Environment Analysis
* fully qualified name: `vac:dst:waku:waku-scaling:mixed-environment-analysis`
* owner: Alberto
* status: 0%
* start-date: 2024/09/01
* end-date: 2024/12/31
#### Description
Measure Relay resource usage with a mix of nodes
using resource-restricted device protocols and Peer Exchange,
meaning a small number of nwaku nodes serve the Store, Lightpush, and Filter protocols
while a large number of clients consume them.
For example: 6-10 service nodes, 200 relay nodes, and 1,000 light nodes.
This should include the impact of connection and node churn on reliability
for both relay and light clients.
#### Deliverables
- [ ] A report on the findings, measurements, and results.
- [ ] A list of any issues encountered.
- [ ] Analysis and actionable insights or conclusions.
<!-- Most recently blocked by metrics scaling issues, nearly through them -->