global -> group (#1007)

* global -> group

* allow None for local_size in custom function

* lil local

* comment on shape

* fix cuda

* smart local cast

* better local heuristic

* fix ptx, and work_dim cleanup

* fix metal

* fix ops test

* fix openpilot jit

* no more optlocal

* might fix metal tests

* try metal now

* see generated metal code

* test free removal. REVERT THIS

* mergable
George Hotz
2023-06-21 11:50:43 -07:00
committed by GitHub
parent aab9ee0fca
commit 18892242b0
17 changed files with 81 additions and 90 deletions

@@ -36,7 +36,7 @@ tinygrad can run [LLaMA](/docs/showcase.md#llama) and [Stable Diffusion](/docs/s
Try a matmul. See how, despite the style, it is fused into one kernel with the power of laziness.
```sh
-DEBUG=3 OPTLOCAL=1 python3 -c "from tinygrad.tensor import Tensor;
+DEBUG=3 python3 -c "from tinygrad.tensor import Tensor;
N = 1024; a, b = Tensor.rand(N, N), Tensor.rand(N, N);
c = (a.reshape(N, 1, N) * b.permute(1,0).reshape(1, N, N)).sum(axis=2);
print((c.numpy() - (a.numpy() @ b.numpy())).mean())"