---
title: 'Zerokit optimizations: A performance journey'
date: 2025-07-25 18:30:00
authors: benph
published: true
slug: 2025-zerokit-perf
categories: research
toc_min_heading_level: 2
toc_max_heading_level: 5
---
[<u>Zerokit</u>](https://github.com/vacp2p/zerokit/) is a toolkit
providing powerful zero-knowledge utilities, including a
means to answer the question "How do you prevent spam when
every message is anonymous?". Its use of the Merkle hash
tree, combined with the Poseidon hasher, is key to the answer
we seek here, and to other questions that ask the improbable.
These technologies, however, can take a heavy toll on
resources if not used correctly. What follows is a window
into the efforts made to squeeze out optimizations and cull
redundant resource use. In a story where cripplingly slow
performance meets engineering talent, we arrive at a place
where Zerokit comes through fast and efficient, ready to
face the world.
{/* truncate */}
## Background
Our friends over at [<u>Waku</u>](https://free.technology/waku) are
particularly enthusiastic about anonymous spam prevention
technologies. They have been using the Rate Limiting
Nullifier ([<u>RLN</u>](https://crates.io/crates/rln)) tooling that
Zerokit provides to enforce a message-rate policy among
users—a crucial feature unless we want a community bombarded
with "totally legit, not scams" messages on repeat. However,
as is often the case with new technology, some problematic
delays began to surface. Node recalculations, a common
operation, were taking tens of seconds at the scales being
tested and deployed—even exceeding 40 seconds at times.
These delays accumulate, leading to total delays on the
order of three hours under certain conditions.
Naturally, we couldn't just let this sit. While we've
touched on the issue of spam prevention, it's important to
recognize that this technology is foundational, challenging
conventional wisdom on how things must be done.
Does the idea of "smart contracts without gas" catch your
attention? Don't hold your breath just yet: the really
interesting applications of this tech will be dead in the
water, unless we can meet the challenge put to us.
## The Challenge
The plan of attack that the team put together was twofold:
get rid of redundant operations and data taking up precious
resources, and make the remaining operations go
[<u>*Blazingly Fast™*.</u>](https://old.reddit.com/r/rust/comments/1avf1d8/blazingly_fast_memory_vulnerabilities_written_in/)
Introducing the star of the show for part 1: The main
point of having this tree is to generate proofs so that
peers can verify the claims being made. That doesn't require
the whole Merkle tree, just a single path from leaf to
root. The engineering work took us in a direction where
these paths were the primary context in which ZK proofs
operated, relegating the tree itself to an off-network
reference. This reduced the burden imposed on the network
significantly. The cost of updating the data on the tree is
similarly reduced, with the exception that the siblings along
each path are retained. This is called the **stateless**
approach.
Well, stateless in the context of proof generation and
verification. This is the critical context when it comes to
performance, and the stateless approach does a great job,
but these proofs have to come from *somewhere*. Each
participant still needs to maintain the Merkle tree in their
local environment. Without this tree, one cannot generate
proofs or verify the proofs provided to them. Fortunately,
one does not need to track the entire tree, only the subset
that is actually needed. With millions of participants
on-chain, this can make the difference needed to make
Zerokit-empowered technologies accessible to those running
Raspberry Pis. Combine this with the fact that the
heavy-lifting operations of proof generation/verification are
modular and separate, and each participant can optimise to
run things according to the strengths and requirements of
their native hardware, easing the way for each participant
to run their tree implementation at the speed of
mach-ludicrous.
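To make the stateless idea concrete, here is a minimal sketch
of path-based verification. It is illustrative only and does
not reflect Zerokit's actual API: the hash is a throwaway
stand-in for Poseidon and the types are invented for the
example. The point is that recomputing the root needs nothing
more than the leaf and one sibling per level:

```rust
// Toy illustration of path-based (stateless) Merkle verification.
// The hash function and types are placeholders, not Zerokit's API.

fn hash_pair(left: u64, right: u64) -> u64 {
    // Stand-in for a real hash such as Poseidon.
    left.wrapping_mul(31).wrapping_add(right).wrapping_mul(37)
}

/// A proof is the leaf value plus one sibling per tree level,
/// with a flag telling us whether the sibling sits on the left.
struct MerkleProof {
    leaf: u64,
    siblings: Vec<(u64, bool)>, // (sibling value, sibling_is_left)
}

/// Recompute the root from the path alone; no full tree required.
fn verify(proof: &MerkleProof, expected_root: u64) -> bool {
    let mut current = proof.leaf;
    for &(sibling, sibling_is_left) in &proof.siblings {
        current = if sibling_is_left {
            hash_pair(sibling, current)
        } else {
            hash_pair(current, sibling)
        };
    }
    current == expected_root
}

fn main() {
    // Build a tiny 4-leaf tree by hand to demonstrate.
    let leaves = [1u64, 2, 3, 4];
    let n01 = hash_pair(leaves[0], leaves[1]);
    let n23 = hash_pair(leaves[2], leaves[3]);
    let root = hash_pair(n01, n23);

    // Proof for the third leaf: its sibling sits on the right,
    // then the n01 subtree sits on the left.
    let proof = MerkleProof {
        leaf: leaves[2],
        siblings: vec![(leaves[3], false), (n01, true)],
    };
    assert!(verify(&proof, root));
}
```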
Fortunately, the core of our already existing implementation
was sane and well put together. Double-lucky for us, the
talents of newly minted VAC/ACZ team members Sylvain and Vinh were
readily available. Sylvain, with a solid background in the Rust
programming language, had already graduated from the most
challenging parts of its infamous learning curve. He quickly
got to work zeroing in on some subtle performance pathologies:
something as simple as using a mutable iterator to change
values directly, clever use of references to avoid copying
data, and other memory optimization techniques that can be
hidden from those who cannot "see the matrix" when working in
Rust led to very promising benchmarking results.
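As a toy illustration of the kind of change involved
(hypothetical code, not Zerokit's internals), mutating a
buffer in place through `iter_mut` sidesteps the allocation
and copy that collecting into a new `Vec` incurs:

```rust
// Hypothetical example: apply an update to every node value in a buffer.

// Allocates a brand-new Vec and copies every element.
fn update_by_copy(nodes: &[u64]) -> Vec<u64> {
    nodes.iter().map(|n| n.wrapping_add(1)).collect()
}

// Mutates the existing buffer in place; no allocation, no copy.
fn update_in_place(nodes: &mut [u64]) {
    for n in nodes.iter_mut() {
        *n = n.wrapping_add(1);
    }
}

fn main() {
    let mut nodes = vec![10u64, 20, 30];
    let copied = update_by_copy(&nodes);
    update_in_place(&mut nodes);
    // Both approaches produce the same values; only one allocates.
    assert_eq!(copied, nodes);
}
```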
Vinh, having recently graduated from his CS studies, was presented
with the challenge of parallelising computations. For those not
familiar with Rust, this might seem unreasonable, but thanks
to the `rayon` crate, and Rust's promise of "fearless concurrency"
afforded by its type and ownership system, this kind of refactor
becomes surprisingly easy, even for a talented individual at
the start of their career. Of particular note: these parallelisations
have been made available to the browser. Browser threads are
relatively new, and by diving into this bleeding-edge technology,
and making use of libraries that are still in early development
stages, _Blazingly Fast™_ is now available within the browser.
With all that in the bag, all these performance gains are
gift-wrapped in the use of browser-native WASM runtimes.
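For a flavour of what such a refactor looks like (again a
hypothetical example rather than Zerokit's code), `rayon` lets
a sequential iterator chain become a parallel one by swapping
`iter` for `par_iter`:

```rust
use rayon::prelude::*;

// Stand-in for an expensive per-leaf computation such as a Poseidon hash.
fn expensive_hash(x: u64) -> u64 {
    (0..1_000).fold(x, |acc, i| acc.wrapping_mul(31).wrapping_add(i))
}

fn main() {
    let leaves: Vec<u64> = (0..100_000).collect();

    // Sequential version.
    let hashed_seq: Vec<u64> = leaves.iter().map(|&l| expensive_hash(l)).collect();

    // Parallel version: rayon fans the same work out across all cores.
    let hashed_par: Vec<u64> = leaves.par_iter().map(|&l| expensive_hash(l)).collect();

    assert_eq!(hashed_seq, hashed_par);
}
```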
Well done, everyone!
## The importance of benchmarks
No performance project is complete without high quality
benchmark data. Writing a quick benchmark for tracking
improvements through development is one thing, but having a
system of telemetry that allows you to credibly assert
claims of superior performance is what completes the
project. With such credible claims in hand, these efforts
can bring about significant impact on the field at large.
The key word being **credible**. Credibility cannot depend
on “trust me bro” (obviously). The truth of these claims
must come out of the directed efforts of a multitude of
thought-disciplines. The engineer must have a solid model to
understand the nature of the system. The statistician sets
the quality standards of the data. The scientist must
diligently put relevant hypotheses to the test. The advocate
must see that the reports reach where they make the most
impact; the list goes on. Much of this is out of
scope for this article, and so I will treat you to
[<u>a link</u>](https://www.youtube.com/watch?v=qUN4Tln608Q&list=PLtoQeavghzr3nlXyJEXaTLU9Ca0DXWMnt).
Here's your chance to see a hardcore OS engineer at the top
of their chosen field speak on the subject of their passion.
All this is to say we are not the only team implementing
Merkle tree tech, which also includes the Poseidon hash
function it needs. In order to be a premier research
destination, a key aspect of why VAC exists, the fruits of our
labor are just the beginning. We must prove the merit of our
efforts through comparative benchmarks that satisfy the
skeptics and decision-makers.
Comparative benchmarks are among the most high-stakes
elements of performance-critical projects. Get it right, and
quality research output can become industry standard
technology. Get it wrong, and be ready to lose the trust the
field has in you as your precious R&D fades into obscurity.
For the time being, our comparative benchmarks have been
used internally to inform decision-makers. As benchmarks
become standardised, independently verified and executed,
this initial effort may be the first of many steps to a
community-wide ecosystem. A thunderdome of benchmarks,
leaving us with a single champion that cannot be denied, but
which technology will claim this mantle? May the bits be
_ever_ in your favor...
## Benchmarking with Rust's criterion crate
Rust, much like Nim, offers unadulterated, fine-grained, and
direct control over performance, but with Rust, this control
is even more immediate. With its sophisticated ownership
model, powerful type system, and comprehensive tooling, Rust
has earned an unrivaled reputation for enabling "fearless
concurrency," ease of refactoring, and providing tooling
that effectively "pair programs with you" to help avoid
common pitfalls, including those of the performance
variety.
The [<u>criterion</u>](https://crates.io/crates/criterion) crate is
considered the go-to library for micro-benchmarking within
the Rust ecosystem, and is generally regarded as an
invaluable tool for obtaining high-quality telemetry.
Through its ergonomic idioms and well-thought-out API,
writing high-quality benchmarks becomes straightforward once
you become familiar with its features. Criterion helps avoid
common traps such as inappropriate compiler optimizations,
improper performance sampling, and failing to prune
telemetry overhead. As is typical for the Rust ecosystem,
the documentation is thorough, leaving little to be desired,
and the examples are extensive, making the initial learning
process a pleasant experience.
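For a sense of how little ceremony is involved, a minimal
criterion benchmark might look like the sketch below; the
function under test is a placeholder, not one of Zerokit's
real benchmarks:

```rust
// benches/example_bench.rs — registered with `harness = false` in Cargo.toml.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Placeholder for the operation under test, e.g. a hash or a tree update.
fn operation_under_test(input: u64) -> u64 {
    (0..input % 1_000).fold(input, |acc, i| acc.wrapping_mul(31).wrapping_add(i))
}

fn bench_operation(c: &mut Criterion) {
    c.bench_function("operation_under_test", |b| {
        // `black_box` keeps the compiler from optimising the call away.
        b.iter(|| operation_under_test(black_box(42)))
    });
}

criterion_group!(benches, bench_operation);
criterion_main!(benches);
```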
Most importantly, it automatically generates tables and
graphs from this data, making the crucial task of analysis
straightforward and accessible. At this point, we are ready
to appreciate the results of our efforts.
## Promising results
When it comes to Merkle trees, we have two elements to
consider: the tree itself, and the hashing function that is
plugged into it. In the benchmarks we put together for the
benefit of internal processes, we put our implementation up
against a corresponding FOSS implementation. Scenarios were
developed to isolate key performance telemetry, obtain a
statistically usable sampling, and render the resulting data
into a human-readable form that can be read with a
reasonable degree of confidence: enjoy! The brief summary:
it appears that our in-house implementation consistently
outperforms others, and we've decided to continue committing
to the R&D of our in-house implementations. Congratulations
to the Zerokit team for this accomplishment.
Despite the promising results, these “micro-benchmarks” form
just some of the many pieces of whole-system performance
when it comes to product needs. How the system performs as a
whole is all that matters. This is promising on its own,
but watching the performance benefits being realized in the
wild is the true goal.
Which brings us back to what started all this: Waku came to
us with concerns about performance issues within Zerokit
limiting the scope and scale in which it can be used. The
engineering talent brought to bear on this issue has
successfully achieved the performance goals needed, and the
results of these efforts have demonstrated there is merit in
continuing our commitment to this project.
## Conclusion
We've covered a story that starts with crippling performance
bottlenecks in Waku, and ends on this high note: the
problematic performance scaling issues are no more, and in
the process of resolving this critical pain-point, we have
established internal benchmarks that allow us to confidently
state that what we are doing, we are doing well. These
accomplishments come down to a solid team effort. The open
communication coming in from Waku, the talented engineers
working together to bring their skills and contributions to
bear, the community developed tools and prior works that
allowed it all to happen, and those working quietly in the
background providing the leadership, resources, and
coordination needed to bring this all together. Three VAC/ACZ
engineers in particular call for specific mention:
[<u>Ekaterina</u>](https://github.com/seemenkina) for her role in
taking the lead in the R&D of the Zerokit ecosystem,
[<u>Sylvain</u>](https://github.com/sydhds) for his efforts in
squeezing out some impressive optimizations, and
[<u>Vinh</u>](https://github.com/vinhtc27) for unleashing the power of multiple threads, not
only natively, but also when running in the browser.
Perhaps you want to get involved! Maybe you have some ideas
about what the community needs for standard benchmarks.
Would you like to see another implementation added to the
thunderdome?
[<u>Raise an issue</u>](https://github.com/vacp2p/zerokit/issues/new),
or join us on [<u>our forum</u>](https://forum.vac.dev/). We look
forward to hearing your voice added to the conversation.
This is just one story, coming out of one relatively small
project from VAC research. The two driving directives of the
team are to be a conduit of expertise within IFT, and to be a
premier research destination within the domains we work in.
You might be independent of IFT with an interest in what we
do, an IFT core contributor, or anything in between: our
services are at your disposal. Join us on Discord to start
the conversation, email one of our team members, or maybe
you might hear a knock on your door, should something in
your field of work catch our interest.