612 Commits

Author SHA1 Message Date
chriseth
65a2fdd6a4 Unify single step and block processor. (#2317)
Extracts the common code in single step processor and block processor
into a unified "processor".

---------

Co-authored-by: Georg Wiese <georgwiese@gmail.com>
2025-01-10 23:15:50 +00:00
chriseth
3d5bce0ab4 Iterator stuff (#2320) 2025-01-10 17:45:56 +00:00
Georg Wiese
1ee15050a9 Improve logging (#2321)
Cleans up the log a little bit, moving some logs to trace level.
Especially related to JIT, which fails quite frequently still.

For example, this exposes the issue fixed by #2322:
```
$ RUST_LOG=debug cargo run pil test_data/std/binary_large_test.asm -o output -f --linker-mode bus
...
[00:00:00 (ETA: 00:00:00)] ░░░░░░░░░░░░░░░░░░░░ 0% - Starting...                                                                                                                           Compiling JIT function for 
  Machine: Secondary machine 0: main_binary (BlockMachine)
  Identity: main::instr_and $ [0, main::X0, main::X1, main::X2] is main_binary::latch * main_binary::sel[0] $ [main_binary::operation_id, main_binary::A, main_binary::B, main_binary::C]
   Known args: 1110
=> Error generating JIT code: Code generation failed: Incomplete machine calls:
  Constr::PhantomBusInteraction(18446744069414584320 * (main_binary::latch * main_binary::sel[0]), [1, main_binary::operation_id, main_binary::A, main_binary::B, main_binary::C], main_binary::latch); (row -1)
  Constr::PhantomBusInteraction(18446744069414584320 * (main_binary::latch * main_binary::sel[0]), [1, main_binary::operation_id, main_binary::A, main_binary::B, main_binary::C], main_binary::latch); (row 0)
  Constr::PhantomBusInteraction(18446744069414584320 * (main_binary::latch * main_binary::sel[0]), [1, main_binary::operation_id, main_binary::A, main_binary::B, main_binary::C], main_binary::latch); (row 1)
  Constr::PhantomBusInteraction(18446744069414584320 * (main_binary::latch * main_binary::sel[0]), [1, main_binary::operation_id, main_binary::A, main_binary::B, main_binary::C], main_binary::latch); (row 2)
...
Compiling JIT function for 
  Machine: Secondary machine 0: main_binary (BlockMachine)
  Identity: main::instr_or $ [1, main::X0, main::X1, main::X2] is main_binary::latch * main_binary::sel[1] $ [main_binary::operation_id, main_binary::A, main_binary::B, main_binary::C]
   Known args: 1110
=> Error generating JIT code: Code generation failed: Incomplete machine calls:
  Constr::PhantomBusInteraction(18446744069414584320 * (main_binary::latch * main_binary::sel[0]), [1, main_binary::operation_id, main_binary::A, main_binary::B, main_binary::C], main_binary::latch); (row -1)
  Constr::PhantomBusInteraction(18446744069414584320 * (main_binary::latch * main_binary::sel[0]), [1, main_binary::operation_id, main_binary::A, main_binary::B, main_binary::C], main_binary::latch); (row 0)
  Constr::PhantomBusInteraction(18446744069414584320 * (main_binary::latch * main_binary::sel[0]), [1, main_binary::operation_id, main_binary::A, main_binary::B, main_binary::C], main_binary::latch); (row 1)
  Constr::PhantomBusInteraction(18446744069414584320 * (main_binary::latch * main_binary::sel[0]), [1, main_binary::operation_id, main_binary::A, main_binary::B, main_binary::C], main_binary::latch); (row 2)
...
Compiling JIT function for 
  Machine: Secondary machine 0: main_binary (BlockMachine)
  Identity: main::instr_xor $ [2, main::X0, main::X1, main::X2] is main_binary::latch * main_binary::sel[2] $ [main_binary::operation_id, main_binary::A, main_binary::B, main_binary::C]
   Known args: 1110
=> Error generating JIT code: Code generation failed: Incomplete machine calls:
  Constr::PhantomBusInteraction(18446744069414584320 * (main_binary::latch * main_binary::sel[0]), [1, main_binary::operation_id, main_binary::A, main_binary::B, main_binary::C], main_binary::latch); (row -1)
  Constr::PhantomBusInteraction(18446744069414584320 * (main_binary::latch * main_binary::sel[0]), [1, main_binary::operation_id, main_binary::A, main_binary::B, main_binary::C], main_binary::latch); (row 0)
  Constr::PhantomBusInteraction(18446744069414584320 * (main_binary::latch * main_binary::sel[0]), [1, main_binary::operation_id, main_binary::A, main_binary::B, main_binary::C], main_binary::latch); (row 1)
  Constr::PhantomBusInteraction(18446744069414584320 * (main_binary::latch * main_binary::sel[0]), [1, main_binary::operation_id, main_binary::A, main_binary::B, main_binary::C], main_binary::latch); (row 2)
...
Found loop with period 1 starting at row 100
101 of 128 rows are used in machine 'Main machine (Dynamic)'.
Looping failed. Trying to generate regularly again. (Use RUST_LOG=debug to see whether this happens more often.) 128 / 129
[00:00:00 (ETA: 00:00:00)] ████████████████████ 100% - Starting...                                                                                                                         Finalizing VM: Main machine (Dynamic)
Secondary machine 0: main_binary (BlockMachine): 0 / 18 blocks computed via JIT.
72 of 128 rows are used in machine 'Secondary machine 0: main_binary (BlockMachine)'.

 == Witgen profile (570 events)
   45.5% ( 975.3ms): Secondary machine 0: main_binary (BlockMachine)
   44.6% ( 956.3ms): FixedLookup
    6.1% ( 131.3ms): multiplicity witgen
    3.5% (  75.7ms): witgen (outer code)
    0.2% (   4.2ms): Main machine (Dynamic)
  ---------------------------
    ==> Total: 2.142724334s
```

---------

Co-authored-by: chriseth <chris@ethereum.org>
2025-01-10 14:21:22 +00:00
Georg Wiese
725fcebb48 Fully implement irregular block shapes (#2281)
With #2304, JITed block machine witgen functions can already access the
previous row in the last block. With this PR, functions that actually do
that are no longer filtered out.

Changes:
- The accessed cells are now checked properly (not just looking at
assignments)
- Instead of filtering out functions that access outside the bound
`0..block_size`, we *assert* that the accessed cells are in the bounds
`-1..block_size`. This is sufficient in practice (all the tests pass),
although in theory the block machine processor could generate generate
accesses to row `block_size` and even `block_size + 1` (via next
references). I'm not sure if that's worth handling though if it never
happens in practice, so I figured it's best to fail hard if it does
happen, so we know when we do need to handle it.

As a result, we now use the JIT path for 2 machines in the Keccak RISC-V
example:
```
Secondary machine 0: main_binary (BlockMachine): 50462 / 50462 blocks computed via JIT.
201848 of 262144 rows are used in machine 'Secondary machine 0: main_binary (BlockMachine)'.
40128 of 65536 rows are used in machine 'Secondary machine 1: main_memory (DoubleSortedWitnesses32)'.
Secondary machine 2: main_poseidon_gl (BlockMachine): 0 / 1 blocks computed via JIT.
31 of 32 rows are used in machine 'Secondary machine 2: main_poseidon_gl (BlockMachine)'.
361974 of 524288 rows are used in machine 'Secondary machine 4: main_regs (DoubleSortedWitnesses32)'.
Secondary machine 5: main_shift (BlockMachine): 11734 / 11734 blocks computed via JIT.
46936 of 65536 rows are used in machine 'Secondary machine 5: main_shift (BlockMachine)'.
Secondary machine 6: main_split_gl (BlockMachine): 0 / 7 blocks computed via JIT.
56 of 64 rows are used in machine 'Secondary machine 6: main_split_gl (BlockMachine)'.
```

---------

Co-authored-by: chriseth <chris@ethereum.org>
2025-01-09 16:50:34 +00:00
chriseth
40aa582bb5 Also use variables for known inputs in machine calls. (#2315)
This introduces variables not only for outputs but only for inputs. This
allows us to back-propagate range constraints we get from the
sub-machine call also to inputs.
2025-01-09 15:27:10 +00:00
Georg Wiese
9c6315e242 Assert that block machines are stackable (#2307)
Fixes a TODO that makes block machine fail hard if a block machine is
not stackable. This way, we'll know when this happens, instead of
silently resorting to run-time solving.
2025-01-09 15:00:20 +00:00
Georg Wiese
e328eb90da Do multiplicity witgen in separate stage (#2319)
While working on #2306, @Schaeff came across several bugs in
multiplicity witness generation. These were undetected, because we
ignored multiplicities in the mock prover, which will be fixed by #2310.
With this PR, #2310 will be green.

The issue was that counting multiplicities inside
`Machine::process_plookup()` fails if the caller actually discards the
result. This happens in a few places, for example during our loop
optimization in the "dynamic machine".

With this PR, we instead have a centralized
`MultiplicityColumnGenerator` that counts multiplicities after the fact,
by going over each lookup, evaluating the two selected tuples on all
rows, and counting how often each element in the LHS appears in the RHS.

To measure the runtime of this, I ran:
```sh
export TEST=keccak
export POWDR_JIT_OPT_LEVEL=0

cargo run -r --bin powdr-rs compile riscv/tests/riscv_data/$TEST -o output --max-degree-log 18
cargo run -r --features plonky3,halo2 pil output/$TEST.asm -o output -f --field gl --linker-mode bus
```

I get the following profile on the server:
```
 == Witgen profile (2554126 events)
   32.4% (    2.6s): Secondary machine 0: main_binary (BlockMachine)
   23.1% (    1.9s): Main machine (Dynamic)
   12.7% (    1.0s): Secondary machine 4: main_regs (DoubleSortedWitnesses32)
   10.0% ( 809.9ms): FixedLookup
    7.7% ( 621.1ms): Secondary machine 5: main_shift (BlockMachine)
    5.6% ( 454.6ms): Secondary machine 2: main_poseidon_gl (BlockMachine)
    3.8% ( 312.3ms): multiplicity witgen
    3.8% ( 308.2ms): witgen (outer code)
    0.6% (  45.3ms): Secondary machine 1: main_memory (DoubleSortedWitnesses32)
    0.4% (  33.4ms): Secondary machine 6: main_split_gl (BlockMachine)
    0.0% (   8.0µs): Secondary machine 3: main_publics (WriteOnceMemory)
  ---------------------------
    ==> Total: 8.114630092s
```

So the cost is ~4%. I'm sure it can be optimized further but I would
like to leave this to a future PR.
2025-01-09 14:50:48 +00:00
chriseth
707522d46a Simplify range constraints. (#2314)
Fixes #2313
2025-01-08 15:36:24 +00:00
Georg Wiese
379267dda6 Simplify FinalizableData (#2304)
A refactoring pulled out of #2281, with the following changes:
- `FinalizableData` no longer supports unfinalized rows before the
section of finalized rows. This simplifies `FinalizableData`.
- To handle wrapping, we might still night the "unfinalized" row. For
that purpose, I added `FinalizableData::get_in_progress_row`. Note that
using this means that any range constraints will be lost, but that's not
a problem in practice.
2025-01-08 11:17:39 +00:00
Georg Wiese
04de307993 Refactor ExpressionEvaluator (#2309)
This is a first step to generalize the expression evaluator. Currently
just merging the `TraceValues` and `GlobalValues` traits, as it seemed a
bit unnecessary to have two of them.

Eventually, I want to have a more generic `ExpressionWalker`, of which
`ExpressionEvaluator` would be a special case. That walker could also be
used to do things like:
- Degree computation
- check if the expression (or any referenced intermediate) has a next
reference

While still preserving the caching behavior, so that we don't run
exponentially long on these operations either!
2025-01-08 09:43:05 +00:00
chriseth
dc4c5cdc4a Single step with branching (#2274)
Infers single-step update code including branching.

TODO:
- [ ] perform constant propagation after branching (using the evaluator
in `only_concrete_known()`-setting)
- [ ] extend the "can process fully"-interface to allow answers like
"yes, I am authorized to process, but it will never succeed given these
range constraints". This way we can remove conflicting combinations of
e.g. instruction flags. Another way would be to directly return range
constraints on variables with the "can process" call, that way we could
save branching. Maybe it's also fine to just implement this for the
fixed machine. In any case, we should double-check that we do not create
another machine call for an identity we already solved.

---------

Co-authored-by: Georg Wiese <georgwiese@gmail.com>
2025-01-07 12:31:56 +00:00
Georg Wiese
4bb99beb6a JIT for block machines with non-rectangular shapes (#2275)
Depends on #2279

This PR implements JIT code generation for block machines with irregular
block shape, such as `std::machines::large_field::binary::Binary`. This
is achieved as follows:
- Instead of solving rows `0..block_size`, we run the solver for rows
`-1..(block_size + 1)`. This way, the solver is able to generate code
that writes to the previous row of the last block or the first row of
the next block.
- At the end, we check whether the generated code is actually
consistent: For example, if the code writes to the last row of the
previous block, it can't have a unknown value in the same cell of the
current block (unless it's known to be the same).

Note that the generated code is still not used in practice, because we
don't call the JIT with the right amount of context. I started fixing
this in #2281, but it is still WIP.

---------

Co-authored-by: chriseth <chris@ethereum.org>
2025-01-03 13:59:55 +00:00
chriseth
6d2249a2b4 Single step processor. (#2272)
Co-authored-by: Georg Wiese <georgwiese@gmail.com>
2025-01-03 11:30:01 +00:00
chriseth
7282d90a91 Remove test function. (#2299)
Removes `MutableState::get_machine` since we already have the machine
before we construct the MutableState.
2025-01-02 14:53:53 +00:00
Georg Wiese
816e8c742c Enable Bus in tests (#2289)
This PR uses `LinkerMode::Bus` in all tests which:
- Are on Goldilocks or BN254 (the bus is not implemented for smaller
fields), AND
- Are tested with the mock prover or Plonky3 (while Halo2 does support
the bus in principle, its runtime is exponential, because it inlines
intermediate polynomials)

This is true for most tests.

Doing so, I came across a few bugs. I added comments for them below.
2024-12-30 17:44:15 +00:00
Georg Wiese
aff92d092d Improve error reporting in BlockMachineProcessor (#2279)
Extracted from #2275.

This PR makes it easier to debug failing code generation in
`BlockMachineProcessor` by printing the code generated so far.

Example:
```
$ RUST_LOG=debug cargo run pil test_data/std/binary_large_test.asm -o output -f
...
Code generation failed for connection:
  main::instr_and $ [0, main::X0, main::X1, main::X2] is main_binary::latch * main_binary::sel[0] $ [main_binary::operation_id, main_binary::A, main_binary::B, main_binary::C]
Known arguments:
  main_binary::operation_id
  main_binary::A
  main_binary::B
Error:
  Unable to derive algorithm to compute output value "main_binary::C"
The following code was generated so far:
main_binary::sel[0][3] = 1;
main_binary::operation_id[3] = params[0];
main_binary::A[3] = params[1];
main_binary::B[3] = params[2];
main_binary::operation_id[2] = main_binary::operation_id[3];
main_binary::operation_id_next[2] = main_binary::operation_id[3];
main_binary::operation_id[1] = main_binary::operation_id[2];
main_binary::operation_id_next[1] = main_binary::operation_id[2];
main_binary::operation_id[0] = main_binary::operation_id[1];
main_binary::operation_id_next[0] = main_binary::operation_id[1];
...
```
2024-12-30 16:09:13 +00:00
Georg Wiese
1226ec078f Refactor tests of BlockMachineProcessor (#2276)
With this PR, we expect a "main" machine with a connecting identity to
the block machine, and run the "normal" machine extractor on the entire
PIL.

Besides reducing the amount of code, this gives us more objects, like
`MutableState`.
2024-12-30 11:52:46 +00:00
Georg Wiese
e0d8a0c69f Remove BitAnd<GoldilocksField> for GoldilocksField (#2278)
Field elements should only be ANDed with integers. I added `BitAnd<u64>
for GoldilocksField` in #2277, this removes the possibility to XOR two
field elements.
2024-12-30 11:46:03 +00:00
Georg Wiese
51561798dd Use ExpressionEvaluator more (#2284)
This PR move the `ExpressionEvaluator` to `executor-utils` and
generalizes it such that it can evaluate an `AlgebraicExpression<T>` to
any type, not just `T`. This makes it possible to use the evaluator in
the backends. I used it in Plonky3 and Stwo, which leads to significant
code deletion.

The main feature of `ExpressionEvaluator` is that it handles
intermediate polynomials by caching during evaluation. This is cheaper
than using `Analyzed::identities_with_inlined_intermediate_polynomials`,
which might build exponentially large expressions. The Plonky3
implementation already did the same; The stwo implementation still used
`identities_with_inlined_intermediate_polynomials()` and now handles
intermediates properly.
2024-12-29 20:15:22 +00:00
ShuangWu121
2e978667b2 Update stwo dependency to the latest version (#2264)
### PR: Update Powdr's `stwo` Dependency and Align Toolchain

This PR updates Powdr's `stwo` dependency to the latest version, which
now uses the `nightly-2024-12-17` Rust toolchain. To ensure
compatibility, Powdr's toolchain has also been aligned with this new
nightly version.

As part of this update:
- Several modifications were made to address stricter rules and lints
introduced by the newer version of Clippy.
- System dependencies, including `uuid-dev` and `libgrpc++-dev`, were
added to resolve build and runtime issues brought about by updated
dependencies and toolchain requirements.
2024-12-23 11:48:52 +00:00
Georg Wiese
6151081063 Fixes in Goldilocks field implementation (#2277)
Pulled out of #2276. Needed to make the code generated for the Binary
machine compile.
2024-12-20 21:08:29 +00:00
chriseth
c1b776312a use can_process (#2256)
Co-authored-by: Georg Wiese <georgwiese@gmail.com>
2024-12-20 13:31:41 +00:00
Georg Wiese
550a1e680a Log JIT statistics (#2273)
Currently, it is hard to know for which machines the WitJITGen worked.
With this PR, we can see it in the debug log:

```
$ POWDR_JIT_OPT_LEVEL=1 RUST_LOG=debug cargo run pil test_data/std/poseidon_gl_test.asm -o output -f
...
[00:00:10 (ETA: 00:00:00)] ████████████████████ 100% - Starting...                                          Finalizing VM: Main machine (Dynamic)
Secondary machine 0: main_poseidon (BlockMachine): 5 / 5 blocks computed via JIT.
155 of 256 rows are used in machine 'Secondary machine 0: main_poseidon (BlockMachine)'.

 == Witgen profile (532 events)
   96.9% (   10.4s): JIT-compilation
    2.5% ( 268.6ms): Secondary machine 0: main_poseidon (BlockMachine)
    0.5% (  56.3ms): witgen (outer code)
    0.1% (   8.7ms): Main machine (Dynamic)
    0.0% (   3.2ms): FixedLookup
    0.0% (   2.1µs): range constraint multiplicity witgen
  ---------------------------
    ==> Total: 10.75908525s

$ cargo run -r --features plonky3 --bin powdr-rs compile riscv/tests/riscv_data/keccak -o output --max-degree-log 18 --field gl --coprocessors arith
...
$ cargo run -r --features plonky3 pil output/keccak.asm -o output -f --field gl --prove-with mock --linker-mode bus
...
[00:00:15 (ETA: 00:00:00)] ████████████████████ 100% - 94652 rows/s, 12304k identities/s, 100% progress     Finalizing VM: Main machine (Dynamic)
Machine Secondary machine 0: main_arith (BlockMachine) is never used at runtime, so we remove it.
Secondary machine 1: main_binary (BlockMachine): 0 / 50462 blocks computed via JIT.
201848 of 262144 rows are used in machine 'Secondary machine 1: main_binary (BlockMachine)'.
40180 of 65536 rows are used in machine 'Secondary machine 2: main_memory (DoubleSortedWitnesses32)'.
Secondary machine 3: main_poseidon_gl (BlockMachine): 0 / 1 blocks computed via JIT.
31 of 32 rows are used in machine 'Secondary machine 3: main_poseidon_gl (BlockMachine)'.
362234 of 524288 rows are used in machine 'Secondary machine 5: main_regs (DoubleSortedWitnesses32)'.
Secondary machine 6: main_shift (BlockMachine): 0 / 11734 blocks computed via JIT.
46936 of 65536 rows are used in machine 'Secondary machine 6: main_shift (BlockMachine)'.
Secondary machine 7: main_split_gl (BlockMachine): 0 / 7 blocks computed via JIT.
56 of 64 rows are used in machine 'Secondary machine 7: main_split_gl (BlockMachine)'.

 == Witgen profile (2554854 events)
   48.7% (    8.0s): Secondary machine 1: main_binary (BlockMachine)
   18.4% (    3.0s): Main machine (Dynamic)
   12.4% (    2.0s): Secondary machine 6: main_shift (BlockMachine)
   10.5% (    1.7s): FixedLookup
    6.1% ( 999.6ms): Secondary machine 5: main_regs (DoubleSortedWitnesses32)
    2.5% ( 404.6ms): witgen (outer code)
    0.7% ( 114.5ms): Secondary machine 7: main_split_gl (BlockMachine)
    0.5% (  78.8ms): Secondary machine 3: main_poseidon_gl (BlockMachine)
    0.3% (  54.9ms): Secondary machine 2: main_memory (DoubleSortedWitnesses32)
    0.0% (   5.0ms): range constraint multiplicity witgen
    0.0% (   9.4µs): Secondary machine 4: main_publics (WriteOnceMemory)
  ---------------------------
    ==> Total: 16.446211875s
```
2024-12-20 12:20:20 +00:00
chriseth
6549bd2f49 Bisect range constraints. (#2270)
This adds a function to range constraints that splits a range constraint
into two disjoint range constraints of roughly the same size that equal
the initial range constraint when combined.

This function is planned to be used for branching in autowitjitgen.
2024-12-20 10:48:18 +00:00
chriseth
d5c8a1ef87 Not-concrete-is-unknown eval (#2261) 2024-12-19 16:38:20 +00:00
chriseth
a2a67a6264 Support machine calls. (#2241)
Depends on #2244
2024-12-19 16:35:19 +00:00
chriseth
309279ac8a Extract effect and sub-types into its own module. (#2266) 2024-12-19 14:27:25 +00:00
Georg Wiese
a737ed851f Call JIT (#2242)
This PR puts together the pieces to run compile-time witgen for block
machines. There are still many cases where it doesn't work yet, in which
case it falls back to run-time solving. These cases should be fixed in
future PRs.

It also fixes two bugs:
- When multiplying two affine expression, the case where one of them is
zero is now handled properly.
- `WitgenInference` now handles intermediate columns.

Note that this PR could slow down witgen by attempting to compile code
once per incoming connection and input / output combination, in block
machines. I think this should be negligible though and it gives us that
much of the new pipeline is already running in the tests and elsewhere.

# Benchmark results

I tested the code with different opt levels on a benchmark that computes
ca. $2^{16}$ Poseidon hashes.

## Baseline

```
 == Witgen profile (393220 events)
   93.0% (   30.8s): Secondary machine 0: main_poseidon (BlockMachine)
    4.1% (    1.4s): witgen (outer code)
    2.3% ( 750.8ms): Main machine (Dynamic)
    0.6% ( 204.4ms): FixedLookup
    0.0% (   3.2µs): range constraint multiplicity witgen
  ---------------------------
    ==> Total: 33.109672458s
```

## JIT (opt level 1)

```
 == Witgen profile (393222 events)
   52.3% (    7.7s): JIT-compilation
   32.0% (    4.7s): Secondary machine 0: main_poseidon (BlockMachine)
    9.2% (    1.3s): witgen (outer code)
    5.1% ( 748.3ms): Main machine (Dynamic)
    1.4% ( 213.5ms): FixedLookup
    0.0% ( 417.0ns): range constraint multiplicity witgen
  ---------------------------
    ==> Total: 14.729149333s
```

## JIT (opt level 3)

```
== Witgen profile (393222 events)
   94.6% (  107.9s): JIT-compilation
    3.4% (    3.9s): Secondary machine 0: main_poseidon (BlockMachine)
    1.1% (    1.3s): witgen (outer code)
    0.7% ( 746.5ms): Main machine (Dynamic)
    0.2% ( 204.1ms): FixedLookup
    0.0% ( 542.0ns): range constraint multiplicity witgen
  ---------------------------
    ==> Total: 114.036571291s
```
2024-12-18 20:27:49 +00:00
chriseth
89a01a4e8b "can process fully" (#2255)
Add a function to the Machine trait that can be used to determine if a
submachine call with certain known inputs and certain range constraints
can be fully processed by the called machine.

---------

Co-authored-by: Georg Wiese <georgwiese@gmail.com>
2024-12-18 19:33:37 +00:00
chriseth
1b58d1e18a Introduce concept of assignment. (#2244) 2024-12-18 19:21:48 +00:00
chriseth
18f6da0c47 Specialized code for goldilocks. (#2253)
This is mostly a reduced copy of the goldilocks implementation we
already have with the main difference that the division tries to perform
integer division first if it can be done without remainder.
2024-12-18 19:09:51 +00:00
chriseth
d98b7ebb58 Fix bit operations. (#2249)
This fixes a bug in witjitgen where the bit operations used for masking
in bit-decomposition would use field elements as types for the masks.
The problem is that we sometimes mask using masks outside the field and
the proper type for masks is FieldElement::Integer.
2024-12-18 13:56:29 +00:00
chriseth
bdfa4e1102 Use truncate instead of pop. (#2247) 2024-12-18 13:07:28 +00:00
chriseth
3889799528 Profile jit compilation separately. (#2248) 2024-12-18 13:06:27 +00:00
chriseth
a3d9f38682 Implement Neg for FieldElement. (#2252) 2024-12-18 12:44:18 +00:00
taikoon
1a9c99d232 chore: remove duplicate words (#2227) 2024-12-18 12:14:36 +00:00
Georg Wiese
e2131f501f Block machine processor (#2226)
This PR adds a basic block machine processor. Currently, we assume a
rectangular block shape and just iterate over all rows and identities of
the block until no more progress is made. This is sufficient to generate
code for Poseidon.
2024-12-17 13:31:03 +00:00
chriseth
a68a20c9a4 Effects to rust (#2229)
Formats a vector of "effects" coming from the witgen solver into rust
code, compiles and loads it.

Submachine calls and receiving arguments will be done in another PR.

This code assumes `known` to be a padded bit vector ( #2230 ).

---------

Co-authored-by: Georg Wiese <georgwiese@gmail.com>
2024-12-17 10:41:04 +00:00
Georg Wiese
d007b8f145 Prepare block machine processor (#2231)
A few preparations for #2226:
- Extracted a `test_utils` module
- Introduced a new `Variable` type, which can refer to either a cell or
a "parameter" (either input or output of a machine call). I think in the
future, we could have more variants (e.g. scalar publics). `Variable` is
now used instead of `Cell` in `WitgenInference`.
- `WitgenInference::process_identity` now also returns whether any
progress has been made.
- Renamed `lookup` -> `machine_call` when rendering `Effect`s
2024-12-16 13:53:10 +00:00
chriseth
e5f9392010 Use padded bitvec. (#2230) 2024-12-16 10:49:30 +00:00
chriseth
12aca0e136 Various preparatory changes. (#2228) 2024-12-12 17:56:54 +00:00
chriseth
e3c4c858f0 Witgen inference. (#2219)
This PR adds a component that can derive assignments and other code on
identities and multiple rows. It keeps track of which cells in the trace
are already known and which not. The way to access fixed rows is
abstracted because it does not have a concept of an absolute row. While
this might work for block machines with cyclic fixed columns, it does
not work in the general case.

What it does not do:
- have a sequence of which identities to consider on which rows
- a mechanism that determines when it is finished

---------

Co-authored-by: Georg Wiese <georgwiese@gmail.com>
2024-12-12 16:12:30 +00:00
chriseth
2aa7f8388c Prepare to call jit from block machine. (#2098)
This PR performs preliminary preparations in the block machine so that
it will be able to JIT-compile and evaluate lookups into this machine
given a certain combination of "known inputs".

---------

Co-authored-by: Georg Wiese <georgwiese@gmail.com>
2024-12-12 11:58:23 +00:00
chriseth
598352a23d Expressions and solving routines. (#2212)
This module is an equivalent of the existing affine_expression.rs, but
for compile-time execution on symbolic values instead of run-time
execution on concrete values.

Using the operators defined on that that type, you can build a
SymbolicAffineExpression from a polynomial identity and then use
`.solve()` to try to solve for one unknown variable. The result (instead
of a concrete assignment as in affine_expression.rs) is a
SymbolicExpression, i.e. a complex expression involving variables
(assumed to have a concrete value known at run time), constants and
certain operators on them.

The idea is that SymbolicAffineExpression is used on polynomial
identities in turn and solving for one cell after the other in the
trace. The resulting SymbolicExpression can be translated to rust or
pil.
2024-12-11 17:06:18 +00:00
Georg Wiese
0180542559 Refactorings around process_lookup_direct (#2209)
This PR refactors a few things:
- `process_lookup_direct` no longer has a default implementation.
Eventually, we want all machines to implement it, so I figured it would
be better to explicitly panic in each machine.
- Refactored the implementation of
`FixedLookupMachine::process_plookup`, pulling some stuff out into a new
`CallerData` struct. This is similar to what @chriseth has done on
[`call_jit_from_block`](https://github.com/powdr-labs/powdr/compare/main...call_jit_from_block),
see the comment below.
- As a first test, I implemented `process_lookup_direct` for the
"large"-field memory machine (and `process_plookup` by wrapping
`process_lookup_direct`)
2024-12-10 16:44:15 +00:00
Georg Wiese
e286da46f7 Bus: Remove acc_next + materialize folded tuple (#2201)
This PR:
- Removes the `acc_next` columns, which were only needed because of a
limitation of prover functions. The prover function that existed is now
removed entirely, because we use the hand written witgen anyway, see
#2191.
- Also this PR materializes the folded tuple. This lowers the degree of
the constraints if the tuples being sent have a degree > 1. It also
enables next references in the tuple being sent.

As a result, we can now generate Plonky3 proofs with a bus!

```bash
cargo run -r --features plonky3 --bin powdr-rs compile riscv/tests/riscv_data/keccak -o output --max-degree-log 18 --field gl
cargo run -r --features plonky3 pil output/keccak.asm -o output -f --field gl --prove-with plonky3 --linker-mode bus
```

The proof generation takes 8.32s (of which 394ms are spent on generating
the second-stage witness). This compares to 2.07s proof time without a
bus.
2024-12-09 18:12:33 +00:00
Thibaut Schaeffer
576bdd714d Disallow iteration over hash types (#2167)
Iterating over hash types introduces non-determinism.
Ban generally and require turning off the lint locally when it's fine to
do so.
This
[lint](https://rust-lang.github.io/rust-clippy/master/index.html#/iter_over_hash_type)
only applies to loops.
2024-12-09 15:47:54 +00:00
Georg Wiese
ee7cf29c8e Implement process_lookup_direct for KnownMachine & change interface slightly (#2206)
Extracted out of #2071
2024-12-09 13:58:02 +00:00
Georg Wiese
dbc53c76a6 Hand-written bus witness generation (#2191)
Builds on #2194 and #2183.

This PR gives us (relatively) fast witness generation for the bus, by
writing custom code instead of relying on the generic solver + prover
functions:
```
$ cargo run -r --features plonky3 --bin powdr-rs compile riscv/tests/riscv_data/keccak-o output --max-degree-log 18 --field gl
$ cargo run -r --features plonky3 pil output/$TEST.asm -o output -f --field gl --prove-with mock --linker-mode bus
...
Running main machine for 262144 rows
[00:00:05 (ETA: 00:00:05)] █████████░░░░░░░░░░░ 48% - 24283 rows/s, 3169k identities/s, 92% progress                                    
Found loop with period 1 starting at row 127900
[00:00:05 (ETA: 00:00:00)] ████████████████████ 100% - 151125 rows/s, 16170k identities/s, 100% progress                                
Witness generation took 5.748081s
Writing output/commits.bin.
Backend setup for mock...
Setup took 0.54769236s
Generating later-stage witnesses took 0.29s
Proof generation took 2.0383847s
```

On `main`, second-stage witgen for the main machine alone takes about 5
minutes.
2024-12-09 13:35:23 +00:00
Georg Wiese
8a3e33e07c Add a simpler ExpressionEvaluator (#2194)
This PR:
- Renames the current `executor::witgen::ExpressionEvaluator` to
`executor::witgen::evaluators::partial_expression_evaluator::PartialExpressionEvaluator`
- It is used when solving and evaluates to a
`AffineResult<AlgebraicVariable<'a>, T>`, which might still contain
unknown variables.
- Adds a new `ExpressionEvaluator` that simply evaluates to `T`
- Changes `MockBackend` to use the new `ExpressionEvaluator` (previously
wrapped what is now called the `PartialExpressionEvaluator`)

As a result, the code in `MockBackend` can be simplified. Also, I'm
building on this in #2191 for fast witness generation for the bus.
2024-12-09 11:33:42 +00:00