Commit Graph

4 Commits

Author SHA1 Message Date
Nino Risteski
54be477152 rope cache optim for jit prune in llm.py (#11678)
* rope cache optim for jit prune

* rope test

* tests in test attention

* Revert "rope test"

This reverts commit 69ede543d0.

* lint
2025-08-28 08:31:29 -07:00
chenyu
90c3ed17c5 move cast to before softmax in attention (#9213)
* move cast to before softmax in attention

saved some memory because exp (which is used for backward) are done in half. training bert seems fine and can fit BS=78 now (from 66)

* test
2025-02-24 17:24:59 -05:00
chenyu
ff3f2a9c1a Revert "move attention upcast (#7830)" (#7903)
This reverts commit c07daf40e7.
2024-11-25 18:59:51 -05:00
chenyu
c07daf40e7 move attention upcast (#7830)
still upcast before softmax, but faster because intermediate buffer can be stored in half (as long as qk is within half range).
2024-11-22 17:10:51 -05:00