* working I think
* where are my onnx scatter tests??
* forward_only for now
* try if nan hack fix NV
* looks like issue is different... CUDA WHY
* oops that was wrong. Try if this fixes CUDA
* simpler multiply
* actually finish this up tmrw morning :x
* fix tests?
* improve tests
* improve test and implementation
* fix ruff
* complete but lots of expected failure...
* reviewed tests
* add onnx tests
* is this a processing op?
* add return type to indicate that it's not in-place
* final cleanups
* use or and improve tests a little
* add masked_index_select
* call it masked_setitem instead
* try
* FIXED
---------
Co-authored-by: chenyu <chenyu@fastmail.com>