- fix a bug in which the wrong GPU may be queried for the max shared memory
- If multiple streams are running split through multiple GPUs,
operations happening on a stream in GPU i should query GPU i about its
max shared memory,
- also fixes wrong indexing at rust side.
- update pbs test parameters to match tfhe-rs' integer tests
- refactor mul_ggsw_glwe to make it easier to read
- fix the way we accumulate the external product result on multi-bit PBS