tinygrad/examples
Jacky Lee ef5f648e2f Tensor.scaled_dot_product_attention to match torch, used in LLaMA, and tested (#1502)
* Implement scaled_dot_product_attention and test

* Support attn_mask

* Support is_causal too

* Use in llama

* Don't forget to reshape

* Set requires_grad=False for causal

* Remove staticmethod

* Remove extra spaces
2023-08-08 23:27:13 -07:00
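
For readers unfamiliar with the operation named in this commit, the following is a minimal sketch of scaled dot-product attention with the `attn_mask` and `is_causal` options, following the semantics documented for torch. It is not tinygrad's implementation: it assumes a single head, plain 2-D Python lists instead of tensors, and a standalone function rather than a `Tensor` method.

```python
import math

def scaled_dot_product_attention(q, k, v, attn_mask=None, is_causal=False):
    """Illustrative single-head attention on 2-D lists (not tinygrad code).

    q is (L, d), k is (S, d), v is (S, dv).
    attn_mask, if given, is an (L, S) list of bools (True = may attend)
    or floats (added to the scores before softmax).
    is_causal masks out key positions j > i for query position i.
    """
    L, S, d = len(q), len(k), len(q[0])
    if is_causal:
        assert attn_mask is None, "pass either attn_mask or is_causal, not both"
        # Lower-triangular boolean mask: query i may attend to keys 0..i.
        attn_mask = [[j <= i for j in range(S)] for i in range(L)]

    out = []
    for i in range(L):
        # scores[j] = (q_i . k_j) / sqrt(d)
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in range(S)]
        if attn_mask is not None:
            for j in range(S):
                m = attn_mask[i][j]
                if isinstance(m, bool):
                    if not m:
                        scores[j] = float("-inf")  # masked out
                else:
                    scores[j] += m  # additive float mask
        # Numerically stable softmax over the key positions.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output row i is the attention-weighted sum of the value rows.
        out.append([sum(w * v[j][c] for j, w in enumerate(weights))
                    for c in range(len(v[0]))])
    return out

q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
causal = scaled_dot_product_attention(q, k, v, is_causal=True)
```

With `is_causal=True`, the first query position can only attend to the first key, so `causal[0]` equals the first row of `v`. Passing the equivalent lower-triangular boolean `attn_mask` gives the same result.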