tinygrad/test
Jacky Lee ef5f648e2f Tensor.scaled_dot_product_attention to match torch, used in LLaMA, and tested (#1502)
* Implement scaled_dot_product_attention and test

* Support attn_mask

* Support is_causal too

* Use in llama

* Don't forget to reshape

* Set requires_grad=False for causal

* Remove staticmethod

* Remove extra spaces
2023-08-08 23:27:13 -07:00
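
The commit notes above mention `attn_mask` and `is_causal` support. As a rough illustration of what such a function computes, here is a minimal pure-Python sketch. It is not tinygrad's implementation: it assumes the semantics documented for torch's `scaled_dot_product_attention` (a boolean mask marks positions that may attend, a float mask is added to the scores), and it handles a single head on nested lists rather than tensors.

```python
import math

def scaled_dot_product_attention(q, k, v, attn_mask=None, is_causal=False):
    """Single-head sketch: q is (L, d), k is (S, d), v is (S, dv), all nested lists."""
    L, S, d = len(q), len(k), len(q[0])
    if is_causal:
        # Assumption: as in torch, is_causal and an explicit attn_mask are not combined.
        assert attn_mask is None
        # Lower-triangular boolean mask: position i may attend to positions j <= i.
        attn_mask = [[j <= i for j in range(S)] for i in range(L)]
    out = []
    for i in range(L):
        # Scaled dot product of query i with every key.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in range(S)]
        if attn_mask is not None:
            for j in range(S):
                m = attn_mask[i][j]
                if isinstance(m, bool):
                    if not m:
                        scores[j] = -math.inf  # masked-out position
                else:
                    scores[j] += m  # additive float mask
        # Numerically stable softmax over the key axis.
        mx = max(scores)
        exps = [math.exp(s - mx) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value rows.
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))])
    return out

q = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
causal_out = scaled_dot_product_attention(q, q, v, is_causal=True)
```

With `is_causal=True` the first query can only see the first key, so the first output row is the first row of `v`. A row whose positions are all masked out would produce NaN in this sketch.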