This simulates one approach we could go for when moving registers to
memory. The memory machine remains completely unchanged, but the step is
increased by more than 1 in each step of the main machine. This way,
from the point of view of memory, all the memory operations happen at
different time steps, which allows for:
- Reading from the same address twice
- Writing to the same address that we read from (which from the point of
view of memory should happen *after* the read)

The only downside I see with this approach is that this makes the
differences of time steps between memory accesses bigger: Before it was
at most the degree, now it is some small constant times the degree (in
this example 3). The way the memory machine is currently built, the
difference can be at most $2^{32} - 1$, so I think this is fine in
practice. E.g., for a degree $2^{30}$ machine we could do up to 4
parallel reads / writes.
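
The bound above can be checked with a small calculation. The following is a minimal sketch, not the actual machine: it assumes that the `slot`-th memory access in row `row` of the main machine gets the time step `row * accesses_per_row + slot`. The names `memory_step`, `max_step_diff` and `fits` are made up for illustration.

```rust
/// Largest step difference the memory machine supports (2^32 - 1, as stated above).
const MAX_STEP_DIFF: u64 = (1 << 32) - 1;

/// Hypothetical mapping: memory time step of the `slot`-th access in row `row`
/// of the main machine, when each row performs `accesses_per_row` accesses.
fn memory_step(row: u64, slot: u64, accesses_per_row: u64) -> u64 {
    assert!(slot < accesses_per_row);
    row * accesses_per_row + slot
}

/// Largest difference between any two memory time steps for a main machine
/// with `degree` rows (first access of row 0 vs. last access of the last row).
fn max_step_diff(degree: u64, accesses_per_row: u64) -> u64 {
    let first = memory_step(0, 0, accesses_per_row);
    let last = memory_step(degree - 1, accesses_per_row - 1, accesses_per_row);
    last - first
}

/// Whether the memory machine can still handle the worst-case difference.
fn fits(degree: u64, accesses_per_row: u64) -> bool {
    max_step_diff(degree, accesses_per_row) <= MAX_STEP_DIFF
}
```

Under this model, `fits(1 << 30, 4)` holds while `fits(1 << 30, 5)` does not, which matches the estimate of up to 4 accesses per row for a degree $2^{30}$ machine.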
*Cherry-picked b1a07bd9a7 from #1380 and
extended it.*
Fixes #1382.
With this PR, a lookup like `selector { byte_lower + 256 * byte_upper }
in { <some other machine> }` works, even if the range constraints on
`byte_lower` and `byte_upper` are not "global". For example, they could
be implemented as `selector { byte_lower } in { BYTES }` (i.e.,
`byte_lower` is only range constrained when the machine call is active).
To make this work, I changed the `Machine::process_plookup` interface
like this:
```diff
fn process_plookup<'b, Q: QueryCallback<T>>(
&mut self,
mutable_state: &'b mut MutableState<'a, 'b, T, Q>,
identity_id: u64,
- args: &[AffineExpression<&'a AlgebraicReference, T>],
+ caller_rows: &'b RowPair<'b, 'a, T>,
) -> EvalResult<'a, T>;
```
The `RowPair` passed by the caller contains all range constraints known
at runtime. The LHS of the lookup (or permutation) is no longer
evaluated by the caller but by the callee. For this, the callee needs to
remember the identity associated with the `identity_id` (before this PR,
most machines just remembered the RHS, not the full identity). I don't
expect there to be any performance implications, because we only invoke
one machine (since #1154).
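
To illustrate the idea (not the actual implementation), here is a toy model in which the callee stores the full identity per `identity_id` and derives an upper bound for the LHS from whatever the caller knows about its row at runtime. All types and names (`ToyCell`, `ToyIdentity`, `ToyCallee`, `lhs_upper_bound`) are hypothetical and much simpler than `RowPair` and `AffineExpression`.

```rust
use std::collections::HashMap;

/// Toy knowledge about one cell of the caller's row: either a concrete
/// value or a range constraint `0..=max` that is known at runtime.
#[derive(Clone, Copy)]
enum ToyCell {
    Known(u64),
    AtMost(u64),
}

/// Toy LHS of a lookup: a linear combination `sum(coefficient * column)`.
struct ToyIdentity {
    lhs: Vec<(u64, &'static str)>,
}

/// Toy callee that remembers the full identity for each `identity_id`.
struct ToyCallee {
    identities: HashMap<u64, ToyIdentity>,
}

impl ToyCallee {
    /// Upper bound of the LHS, evaluated by the callee on the caller's row.
    /// Returns `None` if the identity is unknown or a column has no information.
    fn lhs_upper_bound(
        &self,
        identity_id: u64,
        caller_row: &HashMap<&'static str, ToyCell>,
    ) -> Option<u64> {
        let identity = self.identities.get(&identity_id)?;
        identity
            .lhs
            .iter()
            .map(|(coeff, column)| {
                caller_row.get(column).map(|cell| match cell {
                    ToyCell::Known(v) | ToyCell::AtMost(v) => coeff * v,
                })
            })
            .sum()
    }
}
```

With `byte_lower` and `byte_upper` both constrained to at most 255 in the caller's row, the toy callee can conclude that `byte_lower + 256 * byte_upper` is at most 65535, even though no global range constraint exists.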
### Benchmark results
```
executor-benchmark/keccak
time: [14.609 s 14.645 s 14.678 s]
change: [-2.5984% -2.3127% -2.0090%] (p = 0.00 < 0.05)
Performance has improved.
executor-benchmark/many_chunks_chunk_0
time: [39.299 s 39.380 s 39.452 s]
change: [-3.9505% -3.6909% -3.4063%] (p = 0.00 < 0.05)
Performance has improved.
```
---------
Co-authored-by: Leo <leo@powdrlabs.com>
Cherry-picked ef6a72fcfa from #1380.
With this PR, we track whether a call to a machine led to some side
effect (e.g. added a block). In that case, the processed identity should
count as having led to some progress, even if no values were returned
to the calling machine. An example would be writing values to memory,
which does not return any values and hence does not change the state of
the caller.
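
A minimal sketch of the progress rule described above, using made-up types (`ToyCallResult`, `toy_memory_write`) rather than the real witgen data structures:

```rust
/// Toy result of processing one identity (all names are hypothetical).
#[derive(Debug, Default, PartialEq)]
struct ToyCallResult {
    /// Values returned to the calling machine, as (column, value) pairs.
    returned_values: Vec<(String, u64)>,
    /// Whether the callee changed its own state, e.g. by adding a block.
    side_effect: bool,
}

impl ToyCallResult {
    /// The identity made progress if it returned values or had a side effect.
    fn is_progress(&self) -> bool {
        !self.returned_values.is_empty() || self.side_effect
    }
}

/// Toy memory write: records the access and returns nothing to the caller.
fn toy_memory_write(memory: &mut Vec<(u64, u64)>, addr: u64, value: u64) -> ToyCallResult {
    memory.push((addr, value));
    ToyCallResult {
        returned_values: vec![],
        side_effect: true,
    }
}
```

In this model a memory write counts as progress although `returned_values` is empty, while a call that neither returns values nor has a side effect does not.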
Fixes an issue that @leonardoalt had on his `binary-mux2` branch.
There are two ways to have a block machine that is connected via a
permutation:
1. Use permutations `<sel> { ... } is (sub.sel * sub.LATCH) { ... }`.
This makes sure only rows where `sub.LATCH` is `1` can be selected. This
is what we do when we compile from ASM to PIL.
2. Use permutations `<sel> { ... } is sub.sel { ... }`, but also a
constraint `(1 - sub.LATCH) * sub.sel = 0`. This achieves something
similar.

The problem is that in the second case, detecting the block size is
harder, because the latch doesn't appear anywhere in the selector. So we
used to look into *all* fixed columns to detect the period. But this
includes fixed columns that might have a larger period (as is the case
for the multiplexer machine).
This PR simply removes support for the second approach. I think this is
fine in practice, as I don't see a disadvantage of the first approach
and when you come from ASM everything works as expected. I did need to
adjust `test_data/pil/block_lookup_or_permutation.pil`, which used the
second approach.
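
The following sketch shows why looking at all fixed columns can overestimate the block size. It assumes that "detecting the period" means finding the smallest `p` for which every considered column repeats with period `p`; the helper names and the example columns are made up and do not reflect the actual detection code.

```rust
/// Whether `column` repeats with period `p`, i.e. `column[i] == column[i % p]`.
fn is_periodic(column: &[u64], p: usize) -> bool {
    column.iter().enumerate().all(|(i, v)| *v == column[i % p])
}

/// Smallest period shared by all given columns, or `None` if there are no
/// columns or the first column is empty.
fn joint_period(columns: &[&[u64]]) -> Option<usize> {
    let len = columns.first()?.len();
    (1..=len).find(|&p| columns.iter().all(|c| is_periodic(c, p)))
}
```

For a latch `[0, 0, 0, 1, 0, 0, 0, 1]` the period is 4, but adding a fixed column such as `[0, 0, 0, 0, 1, 1, 1, 1]` raises the joint period to 8, so the detected block size would be twice the real one.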
This PR attempts to fix various issues around using challenges in hints, which
are blocking #1306:
1. Hints of later-phase witness columns are now removed in witgen, as
these columns don't need to be computed yet anyway and the hint might be
accessing a challenge that does not exist.
2. The query callback is now cloned for each phase of witness generation
(because otherwise it was only available in the first phase).
3. `SymbolicEvaluator` no longer panics when encountering challenges,
but returns an error. This evaluator is used to detect patterns in
identities, like `A' - A = 0`. This means that we can't detect patterns
in identities that involve challenges, but at least it doesn't panic.
4. `witgen::query_processor::Symbols` can now evaluate challenges.
5. `witgen::query_processor::Symbols` now also looks up intermediate
"polynomials" (which includes challenges). This is necessary because we
don't currently inline intermediate polynomials in hints (which we do
for identities).

I added a test that demonstrates that challenges can now be used in
hints.
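
Point 3 can be pictured with a toy pattern detector that reports an error instead of panicking when it meets a challenge. This is only a sketch under simplified assumptions: `ToyExpr` and `detect_unchanged_column` are hypothetical and far smaller than the real expression types and `SymbolicEvaluator`.

```rust
/// Toy expression type (hypothetical, for illustration only).
#[derive(Debug, PartialEq)]
enum ToyExpr {
    Number(i64),
    Column(&'static str),
    NextColumn(&'static str),
    Challenge(u64),
    Sub(Box<ToyExpr>, Box<ToyExpr>),
}

/// Whether the expression refers to a challenge anywhere.
fn contains_challenge(expr: &ToyExpr) -> bool {
    match expr {
        ToyExpr::Challenge(_) => true,
        ToyExpr::Sub(l, r) => contains_challenge(l) || contains_challenge(r),
        _ => false,
    }
}

/// Detects the pattern `A' - A` and returns the column name `A`.
/// Expressions involving challenges yield an error rather than a panic.
fn detect_unchanged_column(expr: &ToyExpr) -> Result<Option<&'static str>, String> {
    if contains_challenge(expr) {
        return Err("challenges are not supported in pattern detection".to_string());
    }
    match expr {
        ToyExpr::Sub(l, r) => match (l.as_ref(), r.as_ref()) {
            (ToyExpr::NextColumn(a), ToyExpr::Column(b)) if a == b => Ok(Some(*a)),
            _ => Ok(None),
        },
        _ => Ok(None),
    }
}
```

A caller can treat the error as "no pattern detected" and continue, which mirrors the trade-off described in point 3.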
This PR adds witness generation support for copy constraints: Whenever a
cell value is determined, this value is copied to all cells in its
equivalence class. This allows us to do witgen for arbitrary Plonkish
circuits (which would be detected as block machines) *as long as the
circuit is topologically sorted* (because otherwise, our row-by-row
solving strategy does not work).
Copy constraints are currently only supported in the language as
`connect` identities, as opposed to lists of cell pairs that belong to
the same equivalence class. Connecting this to the PIL input should be
part of another PR.
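
The propagation step can be sketched as follows. This toy model represents copy constraints directly as equivalence classes of cells; `ToyWitness` and its fields are hypothetical and unrelated to how `connect` identities are actually stored.

```rust
use std::collections::HashMap;

/// A cell is identified by (column name, row index).
type Cell = (&'static str, usize);

/// Toy witness with copy constraints given as equivalence classes of cells.
struct ToyWitness {
    values: HashMap<Cell, u64>,
    /// Maps each constrained cell to the index of its equivalence class.
    class_of: HashMap<Cell, usize>,
    classes: Vec<Vec<Cell>>,
}

impl ToyWitness {
    fn new(classes: Vec<Vec<Cell>>) -> Self {
        let class_of = classes
            .iter()
            .enumerate()
            .flat_map(|(id, cells)| cells.iter().map(move |cell| (*cell, id)))
            .collect();
        Self {
            values: HashMap::new(),
            class_of,
            classes,
        }
    }

    /// Sets a cell and copies the value to all cells in its equivalence class.
    fn set(&mut self, cell: Cell, value: u64) {
        self.values.insert(cell, value);
        if let Some(&id) = self.class_of.get(&cell) {
            for other in &self.classes[id] {
                self.values.insert(*other, value);
            }
        }
    }
}
```

Setting `("a", 0)` in a witness where `("a", 0)` and `("b", 3)` share a class also fixes `("b", 3)`, so a later row can already rely on that value when it is processed.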
Fixes #844
This PR adds a new machine to the STD: `WriteOnceMemory`. This can be
used in our RISC-V machine for bootloader inputs (#1203).
Most of the problems mentioned in the issue were fixed in the meantime or
had a simple workaround (like defining `let LATCH = 1`). The only
remaining issues were in the machine detection, which I fixed here.
I also refactored two existing tests.
With the recent changes by @pacheco, we can extract our [memory
machine](https://github.com/powdr-labs/powdr/blob/main/riscv/src/compiler.rs#L687-L841)
as a separate machine and add it to the standard library.
The result should be the same as calling the function linked above with
`with_bootloader=false`, except that the memory alignment stuff is not
inlined. For this reason, the machine is not yet used by the RISC-V
machine, but it could be after #1077 is implemented.
[This](eb320dca0c) shows the diff from
what we have in `compiler.rs`.