* KERNELIZE makes softmax a single kernel
* single kernel works
* softmax works
* broken
* correct
* skip that test
* kernelize tests
* rename to fuse
* better reduce_push_add_ones code
* correct now
* cleanups
* oops
* return None if we can't push ones
* rename + docs
* atol fixes group
* flash attention broken test
* dsp simulator
* progress
* fix
* close on test tiny
* working
* less waste
* line savings
* Device DSP compiler
* mock DSP at the bottom
* DSP tests
* docker caching
* test update
* need load
* skip that test for CI DSP
* last touch
* ugh
* add tiny test for randomness
* Tensor._device_seeds is a Tuple
* no tuple, just a 2-element tensor
* no more longs
* fix tests, and maybe ocelot works now
* NV still doesn't work. clean up rules
* test + two more rules