* bitcast renderers
* fast llama load
* make it one kernel
* regression testing p1: re-enable test_dtype for all backends
fix GPU
* regression testing p2: fuzz all possible cases against numpy
remove hardcoded tests since the fuzzer covers them
* define ushort
* fix indent; flake8 probably needs to be re-enabled in CI to catch this
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>