Files
powdr/test_data/std
Georg Wiese b6f41e242c Add PoseidonGLMemory machine (#1525)
Implements #1055 for the Poseidon machines. Pulled out of #1508.

Specifically, this PR adds a new `PoseidonGLMemory` machine which
receives 2 memory points and then:
- Reads 24 32-Bit words and packs them into 12 field elements
- Computes the Poseidon permutation (just like `PoseidonGL`)
- For each of the 4 output field elements, it:
- Invokes the `SplitGL` machine to get the canonical `u64`
representation
  - Writes the 8 32-Bit words to memory at the provided memory pointer

The read and write memory regions can even overlap! 🎉 

This should simplify our RISC-V machine, as the syscall already expects
two memory pointers. We can simply pass it to the machine directly.

I started doing that in #1533, but I think it makes sense to wait until
#1443 is merged.

To test:
```
cargo run -r pil test_data/std/poseidon_gl_test.asm -o output -f --export-csv --prove-with estark-starky
```

I recommend reviewing the diff between
`std/machines/hash/poseidon_gl.asm` and
`std/machines/hash/poseidon_gl_memory.asm`

### Discussion

The overhead of the memory read / write is quite high (18 extra witness
columns, see [this
comment](40bdca4368/std/machines/hash/poseidon_gl_memory.asm (L13-L23)),
mostly because we now need to have the input available in all rows
(which previously was only the case for the outputs). If we had offsets
other than 0 and 1, this could be avoided. Doing 24 parallel memory
reads in the first row would *not* help, because we'd have to add 24
witness columns (instead of 2 now) to store the result of the memory
operation.

A few more notes:
- With Vadcop, 18 extra witness columns in a secondary machine is *a
lot* better than introducing more registers (either "regular" registers
or assignment registers) in the main machine
- As mentioned
[here](40bdca4368/std/machines/hash/poseidon_gl_memory.asm (L111-L113)),
we could get rid of two permutations if either:
- We were able to express explicitly that we want to call at most one
operation in the current row, or
- We had an optimizer that would be smart enough to batch the memory
reads and writes.
- We could also have just 1 read or write at a time (instead of 2), but
we'd have to increase the block size from 31 to 32 and the
implementation would be more complicated.
- We could also store the full final state of the Poseidon permutation,
instead of just the first 4 elements. This would need 8 more witness
columns to make the entire output available in all rows. Then, one could
use the machine to implement a Poseidon sponge, instead of.
- Looking at the bootloader, maybe it makes sense to pass 3 input
pointers instead of 1: One for the first 4 elements, one for the next 4,
and one for the capacity (often just a constant). For example, when
computing a Merkle root, you'd pass pointers for the two children hashes
and a pointer to the capacity constant.
2024-07-08 15:20:39 +00:00
..