mirror of https://github.com/tinygrad/tinygrad.git
* add llama attention test for multigpu
* test fails
* kv cache trying to shrink on the sharded axis
* mask None works for scaled dot product
* kv cache seems to be working but scaled dot product breaks
* scaled dot product works, but the last linear layer failed
* running into the reshape case where it could be wrong for multigpu
* making sure it was the reshape
* adding contiguous doesn't solve it
* need to shard more properly
* remove reshape test
* minor adjustment to scaled dot product attention test
* weights are sharded wrong
* continue fixing new weight sharding
* clean up
* fix attention when start_pos is 0
* remove print
* add TODOs for the best multigpu interface
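The commits walk through sharding llama attention across GPUs and checking that scaled dot product attention with `attn_mask=None` still works. A minimal sketch of that kind of check is below; the device names, shapes, and the choice to shard on the heads axis are illustrative assumptions, not the actual test from this PR.

```python
# sketch: sharded q/k/v run through scaled_dot_product_attention with attn_mask=None
from tinygrad import Tensor, Device

# two devices of whatever the default backend is, e.g. ("GPU:0", "GPU:1") -- assumption
GPUS = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))
BS, HEADS, SEQ, HEAD_DIM = 1, 8, 16, 64  # illustrative shapes

# shard q/k/v on the heads axis so each device holds a slice of the heads
q = Tensor.rand(BS, HEADS, SEQ, HEAD_DIM).shard(GPUS, axis=1)
k = Tensor.rand(BS, HEADS, SEQ, HEAD_DIM).shard(GPUS, axis=1)
v = Tensor.rand(BS, HEADS, SEQ, HEAD_DIM).shard(GPUS, axis=1)

# attn_mask=None is the case the commit log calls out as working
out = q.scaled_dot_product_attention(k, v, attn_mask=None)
print(out.shape)  # (1, 8, 16, 64), output stays sharded across GPUS on axis 1
```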