Introduces `PgoAdapter`, which wraps an `Adapter` and drives APC generation,
removing the match statements over the PGO method. Breaking change: the
client now picks a PGO implementation such as `pgo::CellPgo` and uses it like
`CellPgo::<MyAdapter>::new(pgo_data).generate_apcs(blocks)`. The client can
also implement `PgoAdapter<A: Adapter>` for its own PGO implementation,
roughly as in the sketch below.
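A rough sketch of the intended shape, with stand-in types (`BasicBlock`,
`Apc`, `PgoData`, and `MyAdapter` are placeholders, and the real trait
signatures may differ):

```rust
use std::marker::PhantomData;

// Stand-ins for the real powdr types; only the shape matters here.
struct BasicBlock;
struct Apc;
struct PgoData;

trait Adapter {}

// A PGO strategy knows how to turn basic blocks into APCs for a given adapter.
trait PgoAdapter<A: Adapter> {
    fn generate_apcs(&self, blocks: Vec<BasicBlock>) -> Vec<Apc>;
}

struct CellPgo<A: Adapter> {
    pgo_data: PgoData,
    _adapter: PhantomData<A>,
}

impl<A: Adapter> CellPgo<A> {
    fn new(pgo_data: PgoData) -> Self {
        Self { pgo_data, _adapter: PhantomData }
    }
}

impl<A: Adapter> PgoAdapter<A> for CellPgo<A> {
    fn generate_apcs(&self, blocks: Vec<BasicBlock>) -> Vec<Apc> {
        // rank the blocks using self.pgo_data and generate one APC per
        // selected block (elided)
        let _ = &self.pgo_data;
        blocks.into_iter().map(|_| Apc).collect()
    }
}

struct MyAdapter;
impl Adapter for MyAdapter {}

fn main() {
    let apcs = CellPgo::<MyAdapter>::new(PgoData).generate_apcs(vec![BasicBlock]);
    assert_eq!(apcs.len(), 1);
}
```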
After #3209, it is no longer necessary to list OpenVM's "bitwise lookup"
bus in the `BusType` enum, which lives in the VM-agnostic
`autoprecompiles` crate.
Removes the special introduction and removal of bus variables because
the solver can now handle this on its own through the `LinearizedSolver`
component. It also has the benefit that we do not need to run the
optimizer twice any more (once with bus interaction variables and once
without).
Previously, we would introduce a new variable type outside of the solver and
replace a bus interaction like `bus_interaction(1, x + y * z)` with
`bus_interaction(1, bus_var_1)` plus the algebraic constraint
`bus_var_1 = x + y * z`.
Now the `LinearizedSolver` is a component inside the solver that
replaces all bus interaction fields (that are not already constants or
simple variables) and also replaces all non-linear components:
`bus_interaction(1, x + y * z)`
becomes
```
bus_interaction(1, v1)
v1 = x + v2
v2 = y * z
```
The reason for also replacing the non-linear parts is that range constraints
are only stored for variables, not for arbitrary sub-expressions. In
addition, if the same non-linear sub-expression occurs multiple times, it is
now replaced directly by the same variable.
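For illustration, a minimal sketch of such a memoized replacement over a toy
expression type (this is not the actual `LinearizedSolver`, just the idea):

```rust
use std::collections::HashMap;

#[derive(Clone, PartialEq, Eq, Hash)]
enum Expr {
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

#[derive(Default)]
struct Linearizer {
    /// non-linear sub-expression -> variable that replaces it
    cache: HashMap<Expr, String>,
    /// defining constraints `var = expr` introduced along the way
    definitions: Vec<(String, Expr)>,
}

impl Linearizer {
    /// Returns a linear expression equivalent to `e`: every product is
    /// replaced by a fresh variable with a defining constraint, and repeated
    /// sub-expressions reuse the same variable.
    fn linearize(&mut self, e: &Expr) -> Expr {
        match e {
            Expr::Var(_) => e.clone(),
            Expr::Add(a, b) => {
                Expr::Add(Box::new(self.linearize(a)), Box::new(self.linearize(b)))
            }
            Expr::Mul(a, b) => {
                if let Some(v) = self.cache.get(e) {
                    return Expr::Var(v.clone());
                }
                // linearize the operands too, so nested products become chains
                let def = Expr::Mul(Box::new(self.linearize(a)), Box::new(self.linearize(b)));
                let v = format!("v{}", self.definitions.len() + 1);
                self.cache.insert(e.clone(), v.clone());
                self.definitions.push((v.clone(), def));
                Expr::Var(v)
            }
        }
    }
}

fn main() {
    // linearize the bus interaction field `x + y * z`
    let expr = Expr::Add(
        Box::new(Expr::Var("x".into())),
        Box::new(Expr::Mul(Box::new(Expr::Var("y".into())), Box::new(Expr::Var("z".into())))),
    );
    let mut lin = Linearizer::default();
    let _linear = lin.linearize(&expr);
    // `_linear` is `x + v1` and `lin.definitions` contains `v1 = y * z`; the
    // solver additionally replaces the whole field by another fresh variable.
    assert_eq!(lin.definitions.len(), 1);
}
```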
---------
Co-authored-by: Georg Wiese <georgwiese@gmail.com>
Summary: We do not need to add the range constraint on the output
because the output is uniquely determined by the formula and the range
constraints on the inputs.
Details:
This slightly improves the "low degree bus interaction optimizer". It
replaces a bus interaction by an algebraic constraint and simple range
constraints in the following way:
1. It exhaustively searches over which of the non-constant fields in the bus
interaction are the inputs and which single field is the output.
2. It determines the currently-known range constraints on the inputs and the
output.
3. Using exhaustive search, it chooses a candidate function and checks that,
for all input values allowed by the range constraints, the candidate computes
the same output as the bus interaction (i.e. the bus interaction allows
exactly one output compatible with the concrete input values); a sketch of
this check follows the list.
4. If such a candidate function is found, it replaces the bus interaction
with the candidate turned into an algebraic constraint plus simple range
constraints (the ones determined in the beginning).
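A minimal sketch of the check in step 3, assuming (for illustration only) a
single-input lookup given as a set of allowed `(input, output)` pairs and a
range constraint given as an interval; the real optimizer works on symbolic
bus interactions and candidate polynomials instead:

```rust
use std::collections::HashSet;

fn candidate_matches(
    allowed: &HashSet<(u64, u64)>,
    input_range: std::ops::RangeInclusive<u64>,
    candidate: impl Fn(u64) -> u64,
) -> bool {
    input_range.into_iter().all(|x| {
        // collect every output the bus interaction allows for this input ...
        let outputs: Vec<u64> = allowed
            .iter()
            .filter(|(i, _)| *i == x)
            .map(|(_, o)| *o)
            .collect();
        // ... there must be exactly one, and it must equal the candidate's output
        outputs.len() == 1 && outputs[0] == candidate(x)
    })
}

fn main() {
    // Example: xor with 0xff over bytes is implemented by the linear function 255 - a.
    let not_table: HashSet<(u64, u64)> = (0u64..=255).map(|a| (a, a ^ 0xff)).collect();
    assert!(candidate_matches(&not_table, 0..=255, |a| 255 - a));
}
```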
This PR changes the mechanism so that it no longer re-adds range constraints
for the output, only for the inputs. The reason is that the algebraic
constraint directly computes the output value for all inputs within the input
range constraints, so it already enforces the output's range constraint given
the range constraints on the inputs.
Previously, for successful tests, it would still update the expectation
file and print that it did so, even though the contents were the same.
Now the "update" message is only printed if the contents are actually
different.
This PR adds a new optimizer: If a stateless bus interaction (a.k.a.
lookup) implements a low-degree function, it is replaced by a polynomial
constraint (and range constraints).
Some examples include:
- `xor(a, 0xff)` (logical `not` on bytes): This is a linear function
(`255 - a`)
- `xor(a, b)`, where `a` and `b` are known to not have the same bits
set: This equals `a + b`
- `xor(a, b)`, where `a` and `b` are known to be bit-valued: This is
equal to `a + b - 2 * a * b`
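The two `xor(a, b)` identities are easy to sanity-check exhaustively (plain
Rust, independent of the optimizer):

```rust
fn main() {
    // xor(a, b) == a + b whenever a and b have no bit in common
    for a in 0u32..=255 {
        for b in 0u32..=255 {
            if a & b == 0 {
                assert_eq!(a ^ b, a + b);
            }
        }
    }
    // xor(a, b) == a + b - 2 * a * b for bit-valued a and b
    for a in 0i32..=1 {
        for b in 0i32..=1 {
            assert_eq!(a ^ b, a + b - 2 * a * b);
        }
    }
}
```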
Doing this replacement has several advantages:
- A bus interaction is replaced with a polynomial constraint (which is
usually much cheaper)
- Because the output is equal to a low-degree expression of the inputs
(often linear), the inliner can inline it, so we don't have to commit to
it
Note that in addition to the low-degree function, a lookup always *also*
enforces a range constraint. These range constraints are added to the system
using the API introduced with the range constraint optimizer (#3151). In many
cases, they are redundant and can be removed again by the range constraint
optimizer (e.g. the inputs might come from memory, which can be assumed to
store byte-constrained words).
---------
Co-authored-by: Leo <leo@powdrlabs.com>
Co-authored-by: chriseth <chris@ethereum.org>
Fixes a soundness bug @chriseth
[found](https://github.com/powdr-labs/powdr/pull/3151/files#r2267259864)
that was introduced in #3151: I thought we could e.g. check `expr` to be 6
bits by checking `4 * expr` to be a byte. That is incorrect: e.g. $4^{-1}$
would pass the check but should not.
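A concrete instance of the counterexample (a sketch over the BabyBear prime,
computing the inverse via Fermat's little theorem):

```rust
fn pow_mod(mut base: u128, mut exp: u128, modulus: u128) -> u128 {
    let mut acc = 1;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % modulus;
        }
        base = base * base % modulus;
        exp >>= 1;
    }
    acc
}

fn main() {
    let p: u128 = 2013265921; // BabyBear prime
    let inv4 = pow_mod(4, p - 2, p); // 4^{-1} in the field
    assert_eq!(4 * inv4 % p, 1); // 4 * expr reduces to 1, which passes the byte check ...
    assert!(inv4 > 63); // ... but expr itself is nowhere near a 6-bit value
}
```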
We used 5 for OpenVM's default constraint max degree bound for a long
time since that's what was in their SDK example, but they actually use
and suggest app log blowup = 1 for app proofs, which leads to max_degree
= 3. This PR changes the default to 3.
The basic PR is quite simple, see the first commit. A lot of the
`max_degree` propagation in the second commit is due to the
`get_air_metrics` function, which needs it to call an OpenVM-internal
function; the max degree is not available in any existing data structure at
that point.
I thought the easiest approach was just to add it to these functions, and I
had to change one trait for that. But that felt easier/less invasive than
changing associated types/constants of other traits. @Schaeff let me know if
this is fine or if you prefer another route.
We used the default tuple range checker sizes, but they should be read from
the config. For example, in the reth benchmarks, the sizes are actually
different, leading to potential soundness or completeness bugs.
Two reasons for this:
- When making changes to the optimizer, the diffs in the snapshots show
immediately which columns got removed
- When looking for more optimizations, it's also useful to go through
each column and ask "Why do we need this column?"
Turns `Solver` into a trait to abstract away the bus interaction handler
and also allow other implementations in the future.
It also makes the solver persistent across optimization loops so that we
do not have to derive the same range constraints all over again.
This is also a preparatory PR to make the solver more independent from
the constraint system: The constraint system inside the solver will have
way more redundancies in the future.
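A rough sketch of the resulting shape (all names here are illustrative, not
the actual API):

```rust
struct Variable;
struct Constraint;
struct RangeConstraint;

trait BusInteractionHandler {
    // derive new facts from bus interactions (elided)
}

// The optimizer only depends on this trait, so the bus interaction handler
// (and alternative solver implementations) stay behind it.
trait Solver {
    fn add_constraints(&mut self, constraints: Vec<Constraint>);
    fn range_constraint(&self, variable: &Variable) -> Option<RangeConstraint>;
}

struct DefaultSolver<H: BusInteractionHandler> {
    bus_interaction_handler: H,
    // kept across optimization loops, so range constraints derived earlier
    // do not have to be re-derived
    range_constraints: Vec<(Variable, RangeConstraint)>,
}

impl<H: BusInteractionHandler> Solver for DefaultSolver<H> {
    fn add_constraints(&mut self, _constraints: Vec<Constraint>) {
        // propagate using self.bus_interaction_handler and update
        // self.range_constraints (elided)
    }

    fn range_constraint(&self, _variable: &Variable) -> Option<RangeConstraint> {
        None
    }
}
```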
---------
Co-authored-by: Georg Wiese <georgwiese@gmail.com>
Removes algebraic constraints that are implied by other algebraic
constraints.
To do that, grouped expressions are turned into products of factors, with the
factors normalized as much as possible. Then we look for algebraic
constraints that are divisors of other algebraic constraints (i.e. whose
factors are a subset): if constraint `A` divides constraint `B`, then `A = 0`
already implies `B = 0`, so `B` can be removed (for example, `x * (x - 1)` is
implied by `x - 1`).
We used to need the Opcode of the APC in order to hard-code it in the PC
lookup. However, PC lookups were removed entirely a while ago, with the PC
hard-coded in the execution bridge receive instead.
This removes the need to store the opcode: it was only used as an ID of the
APC, and we can use the PC for that instead. This has the added benefit that
the APC is easier to relate to the original program.
The `BusType` enum in the `autoprecompiles` crate still listed the
OpenVM bus types. Now, the bus types that we actually rely on are
hard-coded (e.g. execution bridge, memory, etc.), but there is an `Other`
variant where the integration can add its own types:
92ae7469e8/autoprecompiles/src/bus_map.rs (L5-L19)
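For orientation, the rough shape is something like the following
(illustrative only; see the permalink above for the actual definition):

```rust
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum BusType {
    ExecutionBridge,
    Memory,
    // ... the other bus types the optimizer relies on directly ...
    /// Escape hatch for VM-specific buses, registered by the integration
    /// instead of being hard-coded here.
    Other(String),
}

fn main() {
    // e.g. the OpenVM integration registers its bitwise lookup bus this way
    let _bitwise = BusType::Other("bitwise lookup".to_string());
}
```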
---------
Co-authored-by: Thibaut Schaeffer <schaeffer.thibaut@gmail.com>
This PR generalizes the memory optimizer to not make any specific
assumptions about the structure of memory bus interactions. This is done
by turning `MemoryBusInteraction` into a trait, which is an associated
type in the `Adapter`. Currently, there is one concrete implementation,
`OpenVmMemoryBusInteraction`.
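Roughly the shape of the abstraction (hypothetical names; the real trait has
more methods and lives next to the `Adapter` trait):

```rust
// The memory optimizer only talks to this trait instead of assuming a fixed
// bus interaction layout.
trait MemoryBusInteraction {
    type Expr;

    /// The address touched by this interaction.
    fn address(&self) -> Self::Expr;
    /// Whether this interaction writes to (sends on) the memory bus.
    fn is_write(&self) -> bool;
}

trait Adapter {
    /// The concrete implementation, e.g. `OpenVmMemoryBusInteraction` for
    /// the OpenVM integration.
    type MemoryBusInteraction: MemoryBusInteraction;
}
```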
In order to get a feel for the optimizer, I think it is good to also
output the column count. It is rather difficult to get the column count
before optimization, but in the end we are only interested in the effect
of a new optimization step anyway, so I think this is fine.
---------
Co-authored-by: Georg Wiese <georgwiese@gmail.com>
Generalizations that make fewer assumptions about the structure of bus
interactions such as PC lookups. See comments below.
One change is that we now inline the initial PC (because we know it for
any given basic block), which leads to *all* PC lookups being removed,
slightly improving performance (removing at least one witness column and
one bus interaction).
---------
Co-authored-by: schaeff <thibaut@powdrlabs.com>
Suggested alternative to #3057: Instead of `start_idx`, we store the
`start_pc` to identify a basic block. Both should uniquely identify the
basic block, but the start PC is useful information on its own (whereas
the `start_idx` only makes sense to someone who knows the entire
program).
See also
https://github.com/powdr-labs/powdr/pull/3055#discussion_r2215738707.
This PR does two things:
- it partially removes the notion of opcode in the autoprecompiles crate,
and instead relies on user-provided functions to know whether an instruction
is allowed/branching
- it uses VmOpcode as much as possible in the openvm crate instead of
usize
After this PR, the only remaining notion of opcode in the apc crate is
in the `SymbolicInstructionStatement` where it is still a single field
element. @georgwiese shared ideas to remove that too, or at least shift
it to the client, see #3055
We currently export the APCs to disk in the openvm crate.
Now that the APCs are generic, we can export them in the APC crate
already.
The JSON file is still generated in openvm, since it contains some
openvm-specific data, such as the number of logup columns (which depends
on the STARK config, which the APC crate does not know about).
Just so we don't do this:
```rust
quadratic_symbolic_expression_to_algebraic(
    &algebraic_to_quadratic_symbolic_expression(
        &quadratic_symbolic_expression_to_algebraic(&expr),
    ),
)
```
😉
I think I'll focus more on PGO tomorrow, on APC vs. non-APC columns plus
some more tests, so we might as well merge this for now.
Conclusion for the shift chip: no further optimization detected. The
optimizer did a nice job removing compile-time flags and `imm`-value-related
markers.
In cell PGO:
- rank APCs by `cells_saved / column_count`
- select them in that order, as long as the total column count doesn't exceed
a maximum value, possibly skipping some (see the sketch below)
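A minimal sketch of that selection (illustrative names; the real
implementation lives in the PGO code):

```rust
struct ApcCandidate {
    cells_saved: u64,
    column_count: u64,
}

/// Greedily pick candidates with the best cells-saved-per-column ratio while
/// staying under the total column budget, skipping candidates that don't fit.
fn select_apcs(mut candidates: Vec<ApcCandidate>, max_total_columns: u64) -> Vec<ApcCandidate> {
    // sort by cells_saved / column_count, descending, without floating point
    candidates.sort_by(|a, b| {
        (b.cells_saved * a.column_count).cmp(&(a.cells_saved * b.column_count))
    });
    let mut total_columns = 0;
    let mut selected = Vec::new();
    for c in candidates {
        if total_columns + c.column_count <= max_total_columns {
            total_columns += c.column_count;
            selected.push(c);
        }
        // otherwise skip this candidate but keep considering the remaining ones
    }
    selected
}

fn main() {
    let selected = select_apcs(
        vec![
            ApcCandidate { cells_saved: 100, column_count: 10 },
            ApcCandidate { cells_saved: 90, column_count: 3 },
        ],
        12,
    );
    // the second candidate has the better ratio and fits; the first does not
    assert_eq!(selected.len(), 1);
    assert_eq!(selected[0].column_count, 3);
}
```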
---------
Co-authored-by: Steve Wang <qian.wang.wg24@wharton.upenn.edu>
Again, leaving comments here as I mark things or detect further
optimizations.
These are intended as single-instruction tests for the Branch Eq and Branch
Lt chips (6 instructions total), which should be the most popular chips after
ALU and LS.
Findings: after fully inspecting these branch-related instructions, I
conclude that no further optimizations are needed. These chips mostly revolve
around value comparison (lots of diff and inverse columns), which can only be
simplified further when the register values are known, and those are
typically not solved at compile time. Therefore almost all of the original
constraints and bus interactions stay in, except those that involve
compile-time opcode flags.