tinygrad/test/unit
chenyu 90c3ed17c5 move cast to before softmax in attention (#9213)
* move cast to before softmax in attention

Saves some memory because exp (which is used for backward) is now computed in half precision. BERT training seems fine and can now fit BS=78 (up from 66).

* test
2025-02-24 17:24:59 -05:00
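This is a rough illustration of where the saving comes from. The exp inside softmax produces a tensor the same size as the attention scores, with shape (batch, heads, seq_len, seq_len). Because that tensor is used for backward, computing it in half precision halves the bytes it occupies. The pure-Python sketch below works through that arithmetic. The helper name `attention_exp_bytes` and the example dimensions are hypothetical and are not taken from tinygrad or the commit.

```python
# Sketch: bytes occupied by the softmax exp activation that is kept for backward,
# comparing float32 (cast after softmax) with float16 (cast before softmax).
# The helper and the example shape are HYPOTHETICAL, for illustration only.

BYTES_PER_ELEMENT = {"float32": 4, "float16": 2}

def attention_exp_bytes(batch: int, heads: int, seq_len: int, dtype: str) -> int:
    """Bytes for one (batch, heads, seq_len, seq_len) attention-score-shaped tensor."""
    elements = batch * heads * seq_len * seq_len
    return elements * BYTES_PER_ELEMENT[dtype]

if __name__ == "__main__":
    b, h, s = 1, 16, 512  # hypothetical per-sample dimensions for one attention layer
    fp32 = attention_exp_bytes(b, h, s, "float32")
    fp16 = attention_exp_bytes(b, h, s, "float16")
    print(f"float32: {fp32} bytes, float16: {fp16} bytes, saved: {fp32 - fp16} bytes")
```

This covers only one activation in one layer. How much larger a batch actually fits depends on everything else held in memory, so the BS=78 (from 66) figure reported above is not derived from this arithmetic.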