chenyu
cf8232ec6a
clean up more RANGEIFY flag (#12556)
2025-10-09 03:06:48 -04:00
George Hotz
4c9a930de2
rangeify attn tests (#12377)
2025-10-01 09:59:19 +08:00
qazal
109c63b904
update Tensor unit tests for RANGEIFY (#12359)
...
* update test_kernelize for RANGEIFY
* also kernelizes user contiguous
* skip that test
* tensor uop repr
* 4 kernels, still realizes a float
2025-09-30 11:17:21 +03:00
Nino Risteski
54be477152
rope cache optim for jit prune in llm.py (#11678)
...
* rope cache optim for jit prune
* rope test
* tests in test attention
* Revert "rope test"
This reverts commit 69ede543d0.
* lint
2025-08-28 08:31:29 -07:00
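The rope-cache commit above suggests precomputing the rotary-embedding tables once so a jitted decode loop can reuse them instead of rebuilding them every call. A minimal pure-Python sketch of such a cache; the function names are illustrative and not taken from llm.py:

```python
import math

def build_rope_cache(max_seq_len, head_dim, base=10000.0):
    # precompute cos/sin for every position once; a jitted decode step can
    # then index into these fixed buffers instead of recomputing them
    freqs = [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    cos = [[math.cos(p * f) for f in freqs] for p in range(max_seq_len)]
    sin = [[math.sin(p * f) for f in freqs] for p in range(max_seq_len)]
    return cos, sin

def apply_rope(x, pos, cos, sin):
    # x: head_dim floats for one token at position pos; rotate each pair
    out = []
    for i in range(len(x) // 2):
        a, b = x[2 * i], x[2 * i + 1]
        out += [a * cos[pos][i] - b * sin[pos][i],
                a * sin[pos][i] + b * cos[pos][i]]
    return out
```

At position 0 every rotation angle is zero, so `apply_rope` is the identity there; at any position it preserves the norm of each rotated pair.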
chenyu
90c3ed17c5
move cast to before softmax in attention (#9213)
...
* move cast to before softmax in attention
saved some memory because exp (which is used for backward) is done in half. training bert seems fine and can fit BS=78 now (up from 66)
* test
2025-02-24 17:24:59 -05:00
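The memory saving described in this commit comes from running softmax, and therefore the exp intermediate saved for backward, in half precision. A minimal numpy sketch of the before/after, not tinygrad code:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax computed in the dtype of x
    m = x - x.max(axis=axis, keepdims=True)
    e = np.exp(m)  # in half precision, this saved intermediate is half the size
    return e / e.sum(axis=axis, keepdims=True)

np.random.seed(0)
qk = np.random.randn(4, 8).astype(np.float32)

# before: softmax (and the exp kept for backward) in float32
attn_f32 = softmax(qk)

# after this commit: cast to half *before* softmax, so exp is stored in half
attn_f16 = softmax(qk.astype(np.float16))
```

The results stay close as long as the qk values are within float16 range, which is the same caveat the earlier upcast commit mentions.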
chenyu
ff3f2a9c1a
Revert "move attention upcast (#7830)" (#7903)
...
This reverts commit c07daf40e7.
2024-11-25 18:59:51 -05:00
chenyu
c07daf40e7
move attention upcast (#7830)
...
still upcast before softmax, but faster because the intermediate buffer can be stored in half (as long as qk stays within half range).
2024-11-22 17:10:51 -05:00
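This variant (later reverted above) keeps the upcast before softmax but stores the qk intermediate in half. A numpy sketch of the idea, assuming standard scaled dot-product attention; again illustrative, not the actual tinygrad change:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    m = x - x.max(axis=axis, keepdims=True)
    e = np.exp(m)
    return e / e.sum(axis=axis, keepdims=True)

np.random.seed(0)
q = np.random.randn(4, 8).astype(np.float16)
k = np.random.randn(4, 8).astype(np.float16)

# qk intermediate kept in half: half the buffer size of float32
qk_half = (q @ k.T) / np.float16(np.sqrt(q.shape[-1]))

# still upcast to float32 before softmax, as the commit message describes
attn = softmax(qk_half.astype(np.float32))
```

The trade-off is the one the revert implies: the half-precision buffer saves memory and bandwidth, but only while qk stays within float16 range.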