- The original code seemed to assume that the Barrett reduction would not
overflow if p <= 2^31, this is incorrect but rare
- The correctness constraint has a bound much smaller than 2^31, some
primes bigger than the derived threshold can still use the fast code
given a certain criterion is respected which corresponds to a "lucky" case
of the Barrett reduction, the new code now manages this
maths explained in https://blog.zksecurity.xyz/posts/barrett-tighter-bound/
and copiously in comments in the code
Those root-of-unity enable friendly twiddle generation with low hamming-weigth.
And thus, enable to replace some multiplication with simple shift.
Co-authored-by: Baptiste Roux <baptiste.roux@zama.ai>