This PR does a few things:
- unify `Input/DataIdentifier` into a single `Input` query that takes
`(channel, idx)` and returns a field element
- the `Output` query takes `(channel, fe)`
- change the related prover functions to reflect this
This interface is used by, but is not the same as, the input/output of the
RISC-V machine.
How to use these queries to input or output bytes or serialized data is
the job of the runtime implementation.
(First 4 commits of #1650)
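As a minimal Rust sketch of the unified query shape (the names and the `Fe` alias are illustrative, not the actual prover types):

```rust
// Hypothetical sketch of the unified query interface described above.
// `Fe` stands in for the field element type.
type Fe = u64;

enum Query {
    // Read: returns the field element at `idx` on `channel`.
    Input { channel: u32, idx: u32 },
    // Write: outputs the field element `fe` on `channel`.
    Output { channel: u32, fe: Fe },
}

// A runtime implementation decides how bytes or serialized data map onto
// these per-element queries.
fn describe(q: &Query) -> String {
    match q {
        Query::Input { channel, idx } => format!("input({channel}, {idx})"),
        Query::Output { channel, fe } => format!("output({channel}, {fe})"),
    }
}
```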
This PR prepares witness generation for scalar publics (#1756). Scalar
publics are similar to cells in the trace, but are global (i.e.,
independent of the row number).
With this PR, affine expressions use a new `AlgebraicVariable` enum,
that can be either a column reference (`&'a AlgebraicReference`, which
was used previously), or a reference to a public.
At the on-site, we introduced operators for all lookup-like constraints.
This means that `[x, y] in s $ [a, b];` is not parsed as a lookup
identity any more, but just as a regular expression with binary
operators. The whole concept of a lookup or identity as a rust type is
now only present after the condenser has run. Because of that, we can
remove a lot of code that was concerned with parsed identities.
Since prover functions were introduced, we can have both constraints and
prover functions at statement level. Because of that, I extended the
concept of "identity" (which we partly renamed to "constraint" already)
to "proof item". A proof item is either a constraint or a prover
function. Later on, we might also include fixed columns, challenges,
etc.
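As a hedged sketch, the statement-level items could be modeled like this (the payload types are placeholders):

```rust
// Sketch of the "proof item" concept described above; payloads are
// simplified to strings. Later, fixed columns, challenges, etc. might
// become further variants.
enum ProofItem {
    Constraint(String),     // a (polynomial) constraint
    ProverFunction(String), // a prover function
}

fn is_constraint(item: &ProofItem) -> bool {
    matches!(item, ProofItem::Constraint(_))
}
```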
Co-authored-by: chriseth <chriseth.github@gmail.com>
We currently hardcode the range of degrees that variable degree machines
are preprocessed for. Expose that in machines instead.
This changes pil namespaces to accept a min and max degree:
```
namespace main(123..456);
namespace main(5); // allowed for backward compatibility, translates to `5..5`
```
It adds two new builtins:
```
std::prover::min_degree
std::prover::max_degree
```
And sets the behavior of the `std::prover::degree` builtin to only
succeed if `min_degree` and `max_degree` are equal.
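A toy model of that behavior (assumed semantics, expressed as a plain Rust function rather than the actual builtin):

```rust
// Hypothetical model of `std::prover::degree` after this PR: it only
// succeeds when the degree range collapses to a single size.
fn degree(min_degree: u64, max_degree: u64) -> Option<u64> {
    (min_degree == max_degree).then_some(min_degree)
}
```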
Builds on #1687. Fixes #1572.
With this PR, we are using dynamic VADCOP in the RISC-V zk-VM.
There were a few smaller fixes needed to make this work. In summary, the
changes are as follows:
- We set the degree of the main machine to `None`, and all fixed lookup
machines to the appropriate size. As a consequence, the CPU, all block
machines & memory have a dynamic size.
- As a consequence, I had to adjust some tests (set the size of all
machines, so they can still be run with monolithic provers) *and* was
able to remove the `Memory_<size>` machines 🎉
- With the main machine being of flexible size, the prover can choose for
how long to run it. We run it for `1 << (MAX_DEGREE_LOG - 2)` steps and
compute the bootloader inputs accordingly. With this choice, we can
guarantee that the register memory (which can be up to 4x larger than
the main machine) does not run out of rows.
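A worked version of this row budget (the `MAX_DEGREE_LOG` value of 18 is just an assumption matching the example run):

```rust
// Example value; in this PR it is configured via an environment variable.
const MAX_DEGREE_LOG: u32 = 18;

fn main_machine_steps() -> u64 {
    // We run the main machine for 1 << (MAX_DEGREE_LOG - 2) steps...
    1u64 << (MAX_DEGREE_LOG - 2)
}

fn register_memory_fits() -> bool {
    // ...so that even a register memory 4x larger than the main machine
    // stays within 1 << MAX_DEGREE_LOG rows.
    4 * main_machine_steps() <= (1u64 << MAX_DEGREE_LOG)
}
```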
Note that while we do access `MAX_DEGREE_LOG` in a bunch of places now,
this will go away once #1667 is merged, which will allow us to configure
the degree range in ASM and for each machine individually.
### Example:
```bash
export MAX_LOG_DEGREE=18
cargo run -r --bin powdr-rs compile riscv/tests/riscv_data/many_chunks -o output --continuations
cargo run -r --bin powdr-rs execute output/many_chunks.asm -o output --continuations -w
cargo run -r --features plonky3,halo2 prove output/many_chunks.asm -d output/chunk_0 --field gl --backend plonky3-composite
```
This leads to the following output:
```
== Proving machine: main (size 65536), stage 0
==> Proof stage computed in 1.918317417s
== Proving machine: main__rom (size 8192), stage 0
==> Proof stage computed in 45.847375ms
== Proving machine: main_binary (size 1024), stage 0
==> Proof stage computed in 27.718416ms
== Proving machine: main_bit2 (size 4), stage 0
==> Proof stage computed in 15.280667ms
== Proving machine: main_bit6 (size 64), stage 0
==> Proof stage computed in 17.449875ms
== Proving machine: main_bit7 (size 128), stage 0
==> Proof stage computed in 20.717834ms
== Proving machine: main_bootloader_inputs (size 262144), stage 0
==> Proof stage computed in 524.013375ms
== Proving machine: main_byte (size 256), stage 0
==> Proof stage computed in 17.280167ms
== Proving machine: main_byte2 (size 65536), stage 0
==> Proof stage computed in 164.709625ms
== Proving machine: main_byte_binary (size 262144), stage 0
==> Proof stage computed in 504.743917ms
== Proving machine: main_byte_compare (size 65536), stage 0
==> Proof stage computed in 169.881542ms
== Proving machine: main_byte_shift (size 65536), stage 0
==> Proof stage computed in 146.235916ms
== Proving machine: main_memory (size 32768), stage 0
==> Proof stage computed in 326.522167ms
== Proving machine: main_poseidon_gl (size 16384), stage 0
==> Proof stage computed in 1.324662625s
== Proving machine: main_regs (size 262144), stage 0
==> Proof stage computed in 2.009408667s
== Proving machine: main_shift (size 32), stage 0
==> Proof stage computed in 13.71825ms
== Proving machine: main_split_gl (size 16384), stage 0
==> Proof stage computed in 108.019334ms
Proof generation took 7.364567s
Proof size: 8432928 bytes
Writing output/chunk_0/many_chunks_proof.bin.
```
Note that `main_bootloader_inputs` still has the maximum size; we should
fix that in a follow-up PR!
When making fixed lookup machines smaller in the RISC-V VM (#1683), I
came across the issue that range-constraint lookups (e.g. `[two_bits] in
[TWO_BITS]` where `TWO_BITS = [0, 1, 2, 3]`) were not recognized as
such if the fixed column was *just* the right size (in the above
example, `TWO_BITS = [0, 1, 2, 3, 0]` would have worked).
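One size-independent way to recognize such a column as a range constraint (a sketch; not necessarily the detection logic this PR implements):

```rust
// Recognize a fixed column as a contiguous range constraint 0..=max,
// regardless of how much padding it contains.
fn as_range(col: &[u64]) -> Option<(u64, u64)> {
    let max = *col.iter().max()?;
    let mut values: Vec<u64> = col.to_vec();
    values.sort_unstable();
    values.dedup();
    // Distinct non-negative values, all <= max, numbering max + 1:
    // by pigeonhole they are exactly {0, ..., max}.
    (values.len() as u64 == max + 1).then_some((0, max))
}
```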
This PR turns the `FixedLookup` into a "normal" machine, i.e., it
implements the `Machine` trait. This removes special handling of the
fixed lookup in various places.
Changes:
- `Machine::take_witness_col_values` now takes a reference to
`MutableState`, similar to `Machine::process_plookup`. With this,
machines can still call other machines while finalizing. This
is needed because some machines appear to call the fixed lookup when
finalizing.
- To handle this correctly, I changed the code such that:
- Machines are finalized in the order in which they appear in the
machines list (`FixedLookup` is the last machine)
- When finalizing, machines can access all *following* machines, but not
the ones before, as they are already finalized.
`FixedLookup` is still a weird machine: it is responsible for many
sets of fixed columns (i.e., several ASM machines) which might not even
have the same length. But that can be fixed in a separate PR.
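The finalization order can be sketched like this (hypothetical types; only the ordering rule from the list above is modeled):

```rust
// Machines are finalized front-to-back; while finalizing, a machine may
// only access the machines *after* it in the list, since the ones before
// are already finalized.
#[derive(Default, Clone)]
struct Machine {
    finalized: bool,
}

impl Machine {
    fn finalize(&mut self, later_machines: &mut [Machine]) {
        // It may still query the not-yet-finalized machines that follow
        // it, e.g. `FixedLookup`, which is the last machine in the list.
        assert!(later_machines.iter().all(|m| !m.finalized));
        self.finalized = true;
    }
}

fn finalize_all(machines: &mut [Machine]) {
    for i in 0..machines.len() {
        let (head, tail) = machines.split_at_mut(i + 1);
        head[i].finalize(tail);
    }
}
```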
Equivalent to #1623 but in Rust
Closes #1570
@georgwiese I wanted to add a test but everything I try either panics
(we sometimes assume the fixed columns to be available in a single size)
or runs forever (when I add fixed columns to a machine, I assume witgen
keeps increasing the degree and never stops? or it picks the largest
size and just takes time?). Any thoughts?
Fixes #1604
With this PR, we bypass machine detection during witness generation of
stages > 0. See [this
comment](https://github.com/powdr-labs/powdr/issues/1604#issuecomment-2257059636)
for the motivation.
This currently needs to be tested manually, as follows:
```
$ RUST_LOG=trace cargo run pil test_data/asm/block_to_block_with_bus.asm -o output -f --field bn254 --prove-with halo2-mock
...
===== Summary for row 7:
main.acc1 = 20713437912485111384541749944547180564950035591542371144095269313127123163196
main.acc2 = 5162472027861336027760332823162682203738251621730423286600997430635718406729
main.z = 3
main.res = 9
main_arith.acc1 = 1174804959354163837704655800710094523598328808873663199602934873448685332421
main_arith.acc2 = 16725770843977939194486072922094592884810112778685611057097206755940090088888
main_arith.acc1_next = 463668501342879563405020640323131794083013726819708055681247370540753473777
main_arith.acc2_next = 20043340305711842349747334022818855888193664738087810292789887394167185113571
main_arith.y = 1
main_arith.z = 1
main.dummy = 0
main.acc1_next = 0
main.acc2_next = 0
main_arith.x = 0
main_arith.sel[0] = 0
---------------------
...
```
Computing `main.acc1 + main_arith.acc1` and `main.acc2 +
main_arith.acc2` both yield
`21888242871839275222246405745257275088548364400416034343698204186575808495617`,
which is the BN254 scalar field prime! In other words, the partial
accumulators sum to 0.
---------
Co-authored-by: Leo <leo@powdrlabs.com>
Another step towards #1572
Builds on #1574
I modified witness generation as follows:
- Each machine keeps track of its current size; whenever a fixed column
value is read, it has to pass the requested size as well.
- If fixed columns are available in several sizes, witness generation
starts out by using the largest size, as before
- When finalizing a block machine, it "downsizes" the machine to the
smallest possible value
Doing this for other machine types (e.g. VM, memory, etc) should be done
in another PR.
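The downsizing step can be sketched as follows (assumed policy: shrink to the smallest allowed power-of-two size that still fits the rows actually used):

```rust
// "Downsize" a finalized block machine: pick the smallest power-of-two
// size covering the used rows, clamped to the allowed size range.
fn downsized(used_rows: u64, min_size: u64, max_size: u64) -> u64 {
    used_rows.next_power_of_two().clamp(min_size, max_size)
}
```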
In the `vm_to_block_dynamic_length.pil` example, witness generation now
picks the minimum size instead of the maximum size for `main_arith`:
```
$ cargo run pil test_data/pil/vm_to_block_dynamic_length.pil -o output -f --field bn254 --prove-with halo2-mock-composite
...
== Proving machine: main (size 256)
==> Machine proof of 256 rows (0 bytes) computed in 60.174583ms
size: 256
Machine: main__rom
== Proving machine: main__rom (size 256)
==> Machine proof of 256 rows (0 bytes) computed in 33.310292ms
size: 32
Machine: main_arith
== Proving machine: main_arith (size 32)
==> Machine proof of 32 rows (0 bytes) computed in 2.766541ms
```
Fixes #1496
Also, a step towards #1572
This PR implements the steps needed in `CompositeBackend` to implement
dynamic VADCOP.
In summary:
- If a machine's size (a.k.a. "degree") is set to `None`, fixed columns
are computed for all powers of two in some hard-coded range. This fixes
#1572. As a result, machines with a size set to `None` are available in
multiple sizes. If the size is explicitly set by the user, the machine
is only available in that one size.
- Note that the ASM linker still sets the size of machines without a
size. So, currently, this can only happen when coming from PIL directly.
- `CompositeBackend` instantiates a new backend for each machine *and
size*:
- The verification key contains a key for each machine and size.
- When proving, it uses the backend of whatever size the witness has.
The size chosen is also stored in the proof.
- When verifying, the verification key of the reported size is used.
- Witness generation currently chooses the largest available size. This
will change in a future PR.
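A hedged sketch of the per-(machine, size) dispatch (hypothetical types and field names, not the actual `CompositeBackend` internals):

```rust
use std::collections::BTreeMap;

// One backend per machine *and* size; the verification key mirrors this.
struct MachineBackend;

struct CompositeBackend {
    backends: BTreeMap<(String, u64), MachineBackend>,
}

impl CompositeBackend {
    // Proving uses the backend matching the witness's size; the chosen
    // size is recorded so verification can pick the matching key.
    fn prove(&self, machine: &str, witness_size: u64) -> Option<(u64, &MachineBackend)> {
        self.backends
            .get(&(machine.to_string(), witness_size))
            .map(|b| (witness_size, b))
    }
}
```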
This is an example:
```
$ cargo run pil test_data/pil/vm_to_block_dynamic_length.pil -o output -f --field bn254 --prove-with halo2-mock-composite
...
== Proving machine: main (size 256)
==> Machine proof of 256 rows (0 bytes) computed in 209.101166ms
== Proving machine: main__rom (size 256)
==> Machine proof of 256 rows (0 bytes) computed in 226.87175ms
== Proving machine: main_arith (size 1024)
==> Machine proof of 1024 rows (0 bytes) computed in 432.807583ms
```
This PR adds `number::VariablySizedColumns`, which can store several
sizes of the same column. Currently, we always just have one size, but
as part of #1496, we can relax that.
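The idea can be sketched as follows (the real type in the `number` crate differs in its details):

```rust
use std::collections::BTreeMap;

// One logical column, stored in several sizes keyed by size.
struct VariablySizedColumn<F> {
    by_size: BTreeMap<usize, Vec<F>>,
}

impl<F> VariablySizedColumn<F> {
    fn available_sizes(&self) -> Vec<usize> {
        self.by_size.keys().copied().collect()
    }

    // Currently there is always exactly one size; #1496 relaxes that.
    fn unique_size(&self) -> Option<usize> {
        (self.by_size.len() == 1).then(|| *self.by_size.keys().next().unwrap())
    }
}
```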
This PR splits from the main Trait implementation PR #1450 to simplify
the review process.
It includes only the parsing of the traits (not impls) and some
functionality necessary for the code to compile.
---------
Co-authored-by: chriseth <chris@ethereum.org>
Fixes#1494
- use cbor for witness and constant files (moving polygon serialisation
to the relevant backend)
- add `degree` field on `Symbol`, inherited from the namespace degree
- have each machine in witgen operate over its own degree
- fail in the backend if there are multiple degrees
Just like the `VmProcessor`, with this PR the `BlockProcessor` never
processes an identity that has already been completed.
This should be a slight performance optimization (when the "default"
sequence iterator is used), but more importantly it:
- Fixes#1385
- Allows us to remove error-prone code (see below)
- Helps with witgen for stateful machines, like those that access memory
Another step towards VadCoP (#1495)
With this PR, the `CompositeBackend` splits the given PIL into multiple
machines and creates a proof for each machine.
The rough algorithm is as follows:
1. The PIL is split into namespaces
2. Any lookups or permutations that reference multiple namespaces are
removed.
3. Any other constraints that reference multiple namespaces lead to the
namespaces involved being merged.
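Step 3 can be sketched with a toy union-find over namespace indices (names and data model are assumptions, just to illustrate the merging rule):

```rust
// Path-compressing find over a parent array of namespace indices.
fn find(parent: &mut Vec<usize>, x: usize) -> usize {
    let p = parent[x];
    if p != x {
        let root = find(parent, p);
        parent[x] = root;
        root
    } else {
        x
    }
}

// Cross-namespace lookups/permutations were already dropped in step 2, so
// each remaining cross-namespace constraint merges the namespaces it touches.
fn merge_namespaces(n: usize, cross_constraints: &[(usize, usize)]) -> Vec<usize> {
    let mut parent: Vec<usize> = (0..n).collect();
    for &(a, b) in cross_constraints {
        let ra = find(&mut parent, a);
        let rb = find(&mut parent, b);
        parent[ra] = rb;
    }
    // Namespaces sharing a representative end up in the same machine.
    (0..n).map(|x| find(&mut parent, x)).collect()
}
```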
This is an example:
```
$ RUST_LOG=debug cargo run pil test_data/asm/block_to_block.asm -o output -f --prove-with halo2-composite --field bn254
...
Skipping connecting identity: main.instr_add { main.x, main.y, main.z } in 1 { main_arith.x, main_arith.y, main_arith.z };
== Proving machine: main
PIL:
namespace main(8);
col fixed x(i) { i / 4 };
col fixed y(i) { i / 4 + 1 };
col witness z;
col witness res;
col fixed latch = [0, 0, 0, 1]*;
main.res' = (1 - main.latch) * (main.res + main.z);
(1 - main.instr_add) * (main.x + main.y - main.z) = 0;
col fixed instr_add = [0, 1]*;
Starting proof generation...
Generating PK for snark...
Generating proof...
Time taken: 149.459458ms
Proof generation done.
== Proving machine: main_arith
PIL:
namespace main_arith(8);
col witness x;
col witness y;
col witness z;
main_arith.z = main_arith.x + main_arith.y;
Starting proof generation...
Generating PK for snark...
Generating proof...
Time taken: 150.752125ms
Proof generation done.
Writing output/block_to_block_proof.bin.
```
This seems to work across the entire codebase and allows us to create
Halo2 proofs by machine!
For STARK backends, we typically expect that IDs (e.g. Polynomial IDs,
constraint IDs, ...) are re-assigned, which is not yet happening in this
implementation. As mentioned in the comment, the easiest way to fix that
would be to fix #1488 and re-parse the PIL file.
Some witness generation fixes needed to make #1508 work:
- When we check whether we already answered the query, we assumed that
the latch row is the last row; now, we use the actual latch row.
- That same check might access any row in the last block, so now we
never finalize the last block.
This PR builds on top of #1393.
It mainly modifies the grammar by changing the way SelectedExpressions
are declared, to allow blocks to be empty.
---------
Co-authored-by: chriseth <chris@ethereum.org>
Makes the permutation argument sound on the Goldilocks field by
evaluating polynomials on the extension field introduced in #1310.
I also used the new `Constr::Permutation` variant!
A few test cases (also tested in CI):
#### No extension field
`cargo run pil test_data/std/permutation_via_challenges.asm -o output -f
--field bn254 --prove-with halo2-mock`
This still works and produces the same output as before, thanks to the
PIL evaluator removing multiplications by 0 etc:
```
col witness stage(1) z;
(std::protocols::permutation::is_first * (main.z - 1)) = 0;
((((1 - main.first_four) * ((std::protocols::permutation::beta1 - ((std::protocols::permutation::alpha1 * main.b1) + main.b2)) - 1)) + 1) * main.z') = (((main.first_four * ((std::protocols::permutation::beta1 - ((std::protocols::permutation::alpha1 * main.a1) + main.a2)) - 1)) + 1) * main.z);
```
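Abstracting from the concrete columns, this is the usual permutation-accumulator update (my reading of the constraint above, not a formula from the PR): with a selector `s` and a compressed tuple `c` (e.g. `alpha1 * b1 + b2`), each side contributes a factor that is `beta1 - c` when selected and `1` otherwise:

```
(1 + s_b * ((beta1 - c_b) - 1)) * z' = (1 + s_a * ((beta1 - c_a) - 1)) * z
```

In the constraint above, `s_a = main.first_four`, `c_a = alpha1 * a1 + a2`, `s_b = 1 - main.first_four`, and `c_b = alpha1 * b1 + b2`.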
#### With extension field
`cargo run pil test_data/std/permutation_via_challenges_ext.asm -o
output -f --field bn254 --prove-with halo2-mock`
The constraints are significantly more complex but seem correct to me:
```
col witness stage(1) z1;
col witness stage(1) z2;
(std::protocols::permutation::is_first * (main.z1 - 1)) = 0;
(std::protocols::permutation::is_first * main.z2) = 0;
(((((1 - main.first_four) * ((std::protocols::permutation::beta1 - ((std::protocols::permutation::alpha1 * main.b1) + main.b2)) - 1)) + 1) * main.z1') + ((7 * ((1 - main.first_four) * (std::protocols::permutation::beta2 - (std::protocols::permutation::alpha2 * main.b1)))) * main.z2')) = ((((main.first_four * ((std::protocols::permutation::beta1 - ((std::protocols::permutation::alpha1 * main.a1) + main.a2)) - 1)) + 1) * main.z1) + ((7 * (main.first_four * (std::protocols::permutation::beta2 - (std::protocols::permutation::alpha2 * main.a1)))) * main.z2));
((((1 - main.first_four) * (std::protocols::permutation::beta2 - (std::protocols::permutation::alpha2 * main.b1))) * main.z1') + ((((1 - main.first_four) * ((std::protocols::permutation::beta1 - ((std::protocols::permutation::alpha1 * main.b1) + main.b2)) - 1)) + 1) * main.z2')) = (((main.first_four * (std::protocols::permutation::beta2 - (std::protocols::permutation::alpha2 * main.a1))) * main.z1) + (((main.first_four * ((std::protocols::permutation::beta1 - ((std::protocols::permutation::alpha1 * main.a1) + main.a2)) - 1)) + 1) * main.z2));
```
#### On Goldilocks
Running the first example on GL fails, because using the permutation
argument without the extension field would not be sound. The second
example works, but because we don't support challenges on GL yet, it
doesn't actually run the second-phase witness generation.
---------
Co-authored-by: chriseth <chris@ethereum.org>
*Cherry-picked b1a07bd9a7 from #1380, and
extended on it.*
Fixes#1382.
With this PR, a lookup like `selector { byte_lower + 256 * byte_upper }
in { <some other machine> }` works, even if the range constraints on
`byte_lower` and `byte_upper` are not "global". For example, they could
be implemented as `selector { byte_lower } in { BYTES }` (i.e.,
`byte_lower` is only range constrained when the machine call is active).
To make this work, I changed the `Machine::process_plookup` interface
like this:
```diff
fn process_plookup<'b, Q: QueryCallback<T>>(
&mut self,
mutable_state: &'b mut MutableState<'a, 'b, T, Q>,
identity_id: u64,
- args: &[AffineExpression<&'a AlgebraicReference, T>],
+ caller_rows: &'b RowPair<'b, 'a, T>,
) -> EvalResult<'a, T>;
```
The `RowPair` passed by the caller contains all range constraints known
at runtime. The LHS of the lookup (or permutation) is no longer
evaluated by the caller but by the callee. For this, the callee needs to
remember the identity associated with the `identity_id` (before this PR,
most machines just remembered the RHS, not the full identity). I don't
expect there to be any performance implications, because we only invoke
one machine (since #1154).
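A toy version of the motivating decomposition (plain Rust, just to illustrate the byte constraints involved; the real solver works on affine expressions):

```rust
// Solve `byte_lower + 256 * byte_upper = value` given that both parts
// carry a byte range constraint (0..=0xff) at runtime.
fn decompose(value: u32) -> Option<(u32, u32)> {
    let (lower, upper) = (value & 0xff, value >> 8);
    // Only solvable if `upper` itself satisfies the byte constraint.
    (upper <= 0xff).then_some((lower, upper))
}
```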
### Benchmark results
```
executor-benchmark/keccak
time: [14.609 s 14.645 s 14.678 s]
change: [-2.5984% -2.3127% -2.0090%] (p = 0.00 < 0.05)
Performance has improved.
executor-benchmark/many_chunks_chunk_0
time: [39.299 s 39.380 s 39.452 s]
change: [-3.9505% -3.6909% -3.4063%] (p = 0.00 < 0.05)
Performance has improved.
```
---------
Co-authored-by: Leo <leo@powdrlabs.com>