tinygrad/test
chenyu 90c3ed17c5 move cast to before softmax in attention (#9213)
* move cast to before softmax in attention

saved some memory because exp (whose output is kept for backward) is now computed in half. training bert seems fine and can now fit BS=78 (up from 66)

* test
2025-02-24 17:24:59 -05:00
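A rough, stdlib-only sketch of why the cast placement matters. It is not tinygrad's code: `to_half`, `softmax`, `exp_buffer_bytes` and the `cast_first` flag are hypothetical names. Casting the scores to half *before* softmax means the exp() output, which is held for the backward pass, is stored in 2-byte float16 instead of 4-byte float32. The float16 cast is simulated with `struct`'s `"e"` format. The attention shape (24 layers, 16 heads, sequence length 512) is an assumed BERT-large-like configuration, and the byte count covers only this one buffer, so it does not by itself account for the BS=66 to BS=78 change.

```python
import math
import struct

# struct's "e" format is IEEE 754 half precision, "f" is single precision.
FP16_BYTES = struct.calcsize("e")  # 2
FP32_BYTES = struct.calcsize("f")  # 4


def to_half(x: float) -> float:
    """Round a Python float to the nearest float16 value (simulates a cast to half)."""
    return struct.unpack("e", struct.pack("e", x))[0]


def softmax(scores, cast_first: bool):
    """Softmax over a list of attention scores (hypothetical helper).

    cast_first=True mimics casting to half *before* softmax, so the exp() output
    (the buffer kept for backward) is stored in float16. cast_first=False computes
    softmax at full precision and casts only the final result. Sums here run in
    Python floats, a simplification of real low-precision kernels.
    """
    if cast_first:
        scores = [to_half(s) for s in scores]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    if cast_first:
        exps = [to_half(e) for e in exps]  # exp output held in half
    total = sum(exps)
    return [to_half(e / total) for e in exps]


def exp_buffer_bytes(batch, heads, seq_len, layers, bytes_per_elem):
    """Bytes for one saved (batch, heads, seq_len, seq_len) exp tensor per layer."""
    return batch * heads * seq_len * seq_len * layers * bytes_per_elem


# Assumed BERT-large-like shape, for illustration only.
LAYERS, HEADS, SEQ = 24, 16, 512
per_sample_fp32 = exp_buffer_bytes(1, HEADS, SEQ, LAYERS, FP32_BYTES)
per_sample_fp16 = exp_buffer_bytes(1, HEADS, SEQ, LAYERS, FP16_BYTES)

print(per_sample_fp32 // 2**20, "MiB vs", per_sample_fp16 // 2**20, "MiB per sample")
# prints: 384 MiB vs 192 MiB per sample
print(softmax([1.0, 2.0, 3.0], cast_first=False))
print(softmax([1.0, 2.0, 3.0], cast_first=True))
```

Both orderings produce nearly identical probabilities on small scores, which is consistent with "training bert seems fine". The saved exp buffer simply shrinks by half.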