This PR allows for free runtime data in addition to free compile-time data,
based on `initial_memory`.
- Replaces `cbor` with `bincode` in prover queries (bincode is more
efficient, and cbor crashed with `u128`)
- Allows the prover to pass a runtime initial memory; only possible with
continuations (from https://github.com/powdr-labs/powdr/pull/2251, which
was already merged into here, see the commit list)
- Changes the `powdr` lib, which already always uses continuations, to
always use this mechanism for prover data
- Provides a new stdin-stream-like function to read inputs in sequence,
like other zkVMs (see the sketch after this list).
- The function above is called `read_stdin`, which I'm not super happy
with; ideally it would just be called `read`, but the QueryCalldata
function is already called `read`. I think we can keep this as is and
change it later.
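A minimal guest-side sketch of how reading inputs in sequence might look, assuming `read_stdin` is exposed from `powdr_riscv_runtime::io` and deserializes one value per call (the module path and signature are assumptions, not confirmed by this PR):

```rust
// Guest-side sketch; module path and generic signature are assumptions.
use powdr_riscv_runtime::io::read_stdin;

fn main() {
    // Each call consumes the next value from the prover-provided input
    // stream, in the order the host wrote them (stdin-style, like other zkVMs).
    let n: u32 = read_stdin();
    let data: Vec<u64> = read_stdin();

    // ... guest computation using `n` and `data` ...
    let _ = (n, data);
}
```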
---------
Co-authored-by: Lucas Clemente Vella <lvella@powdrlabs.com>
This should speed up the test. If I remember correctly, with this change
compilation takes only 2-3s for Poseidon (instead of ~10 min). At runtime,
it is about as "fast" as runtime witgen.
### PR: Update Powdr's `stwo` Dependency and Align Toolchain
This PR updates Powdr's `stwo` dependency to the latest version, which
now uses the `nightly-2024-12-17` Rust toolchain. To ensure
compatibility, Powdr's toolchain has also been aligned with this new
nightly version.
As part of this update:
- Several modifications were made to address stricter rules and lints
introduced by the newer version of Clippy.
- System dependencies, including `uuid-dev` and `libgrpc++-dev`, were
added to resolve build and runtime issues brought about by updated
dependencies and toolchain requirements.
This PR puts together the pieces to run compile-time witgen for block
machines. There are still many cases where it doesn't work yet, in which
case it falls back to run-time solving. These cases should be fixed in
future PRs.
It also fixes two bugs:
- When multiplying two affine expressions, the case where one of them is
zero is now handled properly.
- `WitgenInference` now handles intermediate columns.
Note that this PR could slow down witgen by attempting to compile code
once per incoming connection and input/output combination in block
machines. I think this should be negligible though, and it means that
much of the new pipeline is already exercised in the tests and elsewhere.
# Benchmark results
I tested the code with different opt levels on a benchmark that computes
ca. $2^{16}$ Poseidon hashes.
## Baseline
```
== Witgen profile (393220 events)
93.0% ( 30.8s): Secondary machine 0: main_poseidon (BlockMachine)
4.1% ( 1.4s): witgen (outer code)
2.3% ( 750.8ms): Main machine (Dynamic)
0.6% ( 204.4ms): FixedLookup
0.0% ( 3.2µs): range constraint multiplicity witgen
---------------------------
==> Total: 33.109672458s
```
## JIT (opt level 1)
```
== Witgen profile (393222 events)
52.3% ( 7.7s): JIT-compilation
32.0% ( 4.7s): Secondary machine 0: main_poseidon (BlockMachine)
9.2% ( 1.3s): witgen (outer code)
5.1% ( 748.3ms): Main machine (Dynamic)
1.4% ( 213.5ms): FixedLookup
0.0% ( 417.0ns): range constraint multiplicity witgen
---------------------------
==> Total: 14.729149333s
```
## JIT (opt level 3)
```
== Witgen profile (393222 events)
94.6% ( 107.9s): JIT-compilation
3.4% ( 3.9s): Secondary machine 0: main_poseidon (BlockMachine)
1.1% ( 1.3s): witgen (outer code)
0.7% ( 746.5ms): Main machine (Dynamic)
0.2% ( 204.1ms): FixedLookup
0.0% ( 542.0ns): range constraint multiplicity witgen
---------------------------
==> Total: 114.036571291s
```
Turns out `git worktree` does not track the remote branch, so the
benchmarks are not there.
Revert to what we did before, and explicitly restore the benchmarks from
the remote `gh-pages` branch.
This PR:
- Makes explicit the notion that 0=stdin, 1=stdout, 2=stderr in the
QueryCallback's "FS"
- Exposes the outputs in Session
- Removes printing to stdout and stderr in the callback itself. This is
now the responsibility of the host if needed.
- Adds the Fibonacci test with stdout to CI using the write mechanism
The idea is that after this we should also expose the proof's publics
and build a stream mechanism for inputs and outputs.
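A rough host-side sketch of how this could be used; the `Session` method names below (`write`, `stdout`) are illustrative assumptions, not the exact API:

```rust
// Host-side sketch; method names are illustrative and may differ from the
// actual Session API.
use powdr::Session;

fn main() {
    // In the QueryCallback's "FS", 0 = stdin, 1 = stdout, 2 = stderr.
    let mut session = Session::builder()
        .guest_path("./guest")
        .out_path("powdr-target")
        .build()
        .write(0, &42u32); // hypothetical: provide the guest's input on fd 0

    session.run();

    // The callback no longer prints; the host decides what to do with the
    // guest's stdout (hypothetical accessor).
    let stdout_bytes: Vec<u8> = session.stdout();
    println!("guest wrote {} bytes to stdout", stdout_bytes.len());
}
```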
Related to [this PR](https://github.com/powdr-labs/powdr/pull/1898):
we need to switch to a nightly toolchain to integrate stwo.
I kept the RISC-V-related toolchain at "nightly-2024-08-01", as it is
handled separately in the workflow, so this is the smallest change that
makes stwo integratable for now.
Also fixes some clippy issues about the comment format in some files.
---------
Co-authored-by: chriseth <chris@ethereum.org>
Replaces https://github.com/powdr-labs/powdr/pull/1790
This PR:
- [X] implements the basic RISCV BB machine without precompiles
- [X] runs the instruction tests with BB as well
- [x] runs the Rust tests except for continuations and tests that
require keccak, poseidon, ec ops
- [x] Missing: inputs and outputs with 2 limbs. There is already a
failing test that covers that.
---------
Co-authored-by: Leo Alt <leo@ethereum.org>
Working on the assumption that there is a race condition between the
delete command and the push-new-cache command, which is why on some days
we have no cache.
Fixes a bug where circuits with publics can't be verified properly. This
was caused by a difference in the circuit when the verification key is
generated (without witnesses) vs. when the user requests a proof (with
witnesses).
---------
Co-authored-by: Leo Alt <leo@ethereum.org>
Builds on #1687. Fixes #1572.
With this PR, we are using dynamic VADCOP in the RISC-V zk-VM.
There were a few smaller fixes needed to make this work. In summary, the
changes are as follows:
- We set the degree of the main machine to `None`, and all fixed lookup
machines to the appropriate size. As a consequence, the CPU, all block
machines & memory have a dynamic size.
- As a consequence, I had to adjust some tests (set the size of all
machines, so they can still be run with monolithic provers) *and* was
able to remove the `Memory_<size>` machines 🎉
- With the main machine being of flexible size, the prover can choose how
long to run it. We run it for `1 << (MAX_DEGREE_LOG - 2)` steps and
compute the bootloader inputs accordingly. With this choice, we can
guarantee that the register memory (which can be up to 4x larger than
the main machine) does not run out of rows.
Note that while we do access `MAX_DEGREE_LOG` in a bunch of places now,
this will go away once #1667 is merged, which will allow us to configure
the degree range in ASM and for each machine individually.
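As a quick sanity check of the chunk-size choice (a sketch; `MAX_DEGREE_LOG` is assumed to be in scope as a constant):

```rust
// Chunk-length reasoning: register memory can grow up to 4x as fast as the
// main machine, so a chunk of 1 << (MAX_DEGREE_LOG - 2) rows keeps it within
// the maximum degree.
const MAX_DEGREE_LOG: u32 = 18; // illustrative value

fn main() {
    let chunk_len = 1u64 << (MAX_DEGREE_LOG - 2);
    assert!(4 * chunk_len <= 1u64 << MAX_DEGREE_LOG);
    println!("main machine runs for {chunk_len} steps per chunk");
}
```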
### Example:
```bash
export MAX_LOG_DEGREE=18
cargo run -r --bin powdr-rs compile riscv/tests/riscv_data/many_chunks -o output --continuations
cargo run -r --bin powdr-rs execute output/many_chunks.asm -o output --continuations -w
cargo run -r --features plonky3,halo2 prove output/many_chunks.asm -d output/chunk_0 --field gl --backend plonky3-composite
```
This leads to the following output:
```
== Proving machine: main (size 65536), stage 0
==> Proof stage computed in 1.918317417s
== Proving machine: main__rom (size 8192), stage 0
==> Proof stage computed in 45.847375ms
== Proving machine: main_binary (size 1024), stage 0
==> Proof stage computed in 27.718416ms
== Proving machine: main_bit2 (size 4), stage 0
==> Proof stage computed in 15.280667ms
== Proving machine: main_bit6 (size 64), stage 0
==> Proof stage computed in 17.449875ms
== Proving machine: main_bit7 (size 128), stage 0
==> Proof stage computed in 20.717834ms
== Proving machine: main_bootloader_inputs (size 262144), stage 0
==> Proof stage computed in 524.013375ms
== Proving machine: main_byte (size 256), stage 0
==> Proof stage computed in 17.280167ms
== Proving machine: main_byte2 (size 65536), stage 0
==> Proof stage computed in 164.709625ms
== Proving machine: main_byte_binary (size 262144), stage 0
==> Proof stage computed in 504.743917ms
== Proving machine: main_byte_compare (size 65536), stage 0
==> Proof stage computed in 169.881542ms
== Proving machine: main_byte_shift (size 65536), stage 0
==> Proof stage computed in 146.235916ms
== Proving machine: main_memory (size 32768), stage 0
==> Proof stage computed in 326.522167ms
== Proving machine: main_poseidon_gl (size 16384), stage 0
==> Proof stage computed in 1.324662625s
== Proving machine: main_regs (size 262144), stage 0
==> Proof stage computed in 2.009408667s
== Proving machine: main_shift (size 32), stage 0
==> Proof stage computed in 13.71825ms
== Proving machine: main_split_gl (size 16384), stage 0
==> Proof stage computed in 108.019334ms
Proof generation took 7.364567s
Proof size: 8432928 bytes
Writing output/chunk_0/many_chunks_proof.bin.
```
Note that `main_bootloader_inputs` still has the maximum size; we should
fix that in a follow-up PR!
We don't plan to actually support all the features of `std`, but this
will allow Powdr to build and execute `std` crates that limit themselves
to the features we do support (which, as of this PR, is reading and
writing to specific file descriptors). This only works via the ELF path.
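For illustration, a sketch of the kind of `std` guest this enables: plain standard-library I/O against the supported file descriptors (nothing powdr-specific is assumed here):

```rust
// A `std` guest that limits itself to the supported surface:
// reading from stdin (fd 0) and writing to stdout (fd 1) and stderr (fd 2).
use std::io::{self, Read, Write};

fn main() {
    let mut input = String::new();
    io::stdin().read_to_string(&mut input).expect("read stdin");

    writeln!(io::stdout(), "received {} bytes", input.len()).expect("write stdout");
    writeln!(io::stderr(), "done").expect("write stderr");
}
```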
This marks more tests as ignored because they are slow.
It also adds a list of "nightly tests", which are ignored tests that are
explicitly excluded in the regular PR runs (but executed in the nightly
run that runs all tests).
There are many advantages in using standard assemblers and linkers,
like:
- maturity;
- more complete support of the assembly language, which in turn allows
support for more high-level languages and compilers;
- link-time optimizations;
- no need to deal with language edge cases (our assembler and linker
deal with a number of Rust-specific details, but they would need even
more to support `std`).
But this comes with a cost: we need to lift references to text data
back into labels (because powdr operates at a higher abstraction
level), and to do that we need the ELF file to either be a PIE
(Position Independent Executable) or to still contain the linker
relocation tables (the `--emit-relocs` option of GNU and LLVM linkers).
---------
Co-authored-by: Leo Alt <leo@ethereum.org>
Co-authored-by: Leo <leo@powdrlabs.com>
Similar to #1193, but here I am just interested in having it work
end-to-end, at least for a few cases, so that everybody can try it and
build upon it.
---------
Co-authored-by: Leo <leo@powdrlabs.com>