mirror of
https://github.com/tinygrad/tinygrad.git
synced 2026-01-07 22:23:55 -05:00
global -> group (#1007)
* global -> group * allow None for local_size in custom function * lil local * comment on shape * fix cuda * smart local cast * better local heuristic * fix ptx, and work_dim cleanup * fix metal * fix ops test * fix openpilot jit * no more optlocal * might fix metal tests * try metal now * see generated metal code * test free removal. REVERT THIS * mergable
This commit is contained in:
@@ -36,7 +36,7 @@ tinygrad can run [LLaMA](/docs/showcase.md#llama) and [Stable Diffusion](/docs/s
|
||||
Try a matmul. See how, despite the style, it is fused into one kernel with the power of laziness.
|
||||
|
||||
```sh
|
||||
DEBUG=3 OPTLOCAL=1 python3 -c "from tinygrad.tensor import Tensor;
|
||||
DEBUG=3 python3 -c "from tinygrad.tensor import Tensor;
|
||||
N = 1024; a, b = Tensor.rand(N, N), Tensor.rand(N, N);
|
||||
c = (a.reshape(N, 1, N) * b.permute(1,0).reshape(1, N, N)).sum(axis=2);
|
||||
print((c.numpy() - (a.numpy() @ b.numpy())).mean())"
|
||||
|
||||
Reference in New Issue
Block a user