Commit Graph

13094 Commits

Author SHA1 Message Date
Denys Melnyk
1fdcb13bfb webgpu: fix weight lookup in export_model after compile_net key change (#15919)
* fix lookup site in export_model_webgpu after refactoring

webgpu (sd): fix export_model weight lookup after compile_net changes

* add regression test
2026-04-25 10:04:55 +03:00
Christopher Milan
8b2826ef16 nv: fix shader local memory for NAK (#15921) 2026-04-25 01:03:11 -04:00
Christopher Milan
57fbaa3d49 amd: fallback to llvm when comgr is not available (#15914) 2026-04-24 23:30:16 -04:00
wozeparrot
4b908b6e2c llama: fused ce loss (#15920) 2026-04-24 20:01:24 -07:00
nimlgen
d3378010ee schedule() -> schedule_linear() in tests (batch 1) (#15915)
* schedule_with_vars -> linear_with_vars in tests

* tests batch 1

* batch 2

* estimate_uop

* simpler

* rm
2026-04-24 23:40:53 +03:00
chenyu
b501ba3e42 nll_loss to mixin (#15918) 2026-04-24 15:50:31 -04:00
chenyu
2f9fdb4a37 scatter to mixin (#15917) 2026-04-24 15:37:37 -04:00
nimlgen
f2751955cb remove linear_to_schedule from tests (#15912)
* remove linear_to_schedule from tests

* x
2026-04-24 20:02:10 +03:00
nimlgen
56a9f1e3ff remove last jit_cache (#15911)
* remove last jit_cache

* linter
2026-04-24 19:44:52 +03:00
chenyu
03a7604f76 sort argsort topk allclose to mixin (#15910) 2026-04-24 10:20:46 -04:00
nimlgen
4010aa4044 jit: no jit_cache in graphrunner (#15907)
* jit: no jit_cache in graphrunner

* m
2026-04-24 16:34:26 +03:00
chenyu
7a1adfd2aa update Tensor.allclose to return Tensor (#15904)
matches jax
2026-04-24 08:27:17 -04:00
Eitan Turok
48d7ab2695 no uv.lock (#15893) 2026-04-24 20:07:07 +08:00
qazal
5eb641395a viz/cli: select kernel events in -s DEV (#15909)
* simple test

* pass
2026-04-24 21:03:34 +09:00
nimlgen
c0f77c2e1c hcq graph to linear (#15888)
* hcq

* f

* f

* linter
2026-04-24 12:42:49 +03:00
Christopher Milan
cbf4946ea6 usb: multiple gpus and better error messages (#15900) 2026-04-24 01:57:19 -04:00
wozeparrot
9d134a2848 llama: fix fakedata timing (#15905) 2026-04-23 21:37:03 -07:00
b1tg
aab50d1bca llm: dedup MLA cache_v (#15887) 2026-04-24 12:32:10 +08:00
qazal
f379b5a40a sqtt: match amd's TS_DELTA_SHORT offset (#15901) 2026-04-24 06:41:22 +03:00
chenyu
c24da99d56 avg_pool2d, max_pool2d to mixin (#15903)
* avg_pool2d, max_pool2d to mixin

* fix

* just dtype

* that
2026-04-23 23:36:17 -04:00
chenyu
08d9106c9f scatter_reduce and sparse_categorical_crossentropy to mixin (#15902)
also use `.ne` to fix `# type: ignore[comparison-overlap]`
2026-04-23 21:06:36 -04:00
chenyu
8cc2c69e21 fix isclose mixin (#15898)
use `.eq` instead of `==`
2026-04-23 20:40:43 -04:00
nimlgen
3072862e2c metal to linear (#15884)
* metal to linear

* x

* x

* fix
2026-04-23 23:32:22 +03:00
chenyu
782bc6aece broadcast in ElementwiseMixin.div [pr] (#15897) 2026-04-23 16:02:43 -04:00
qazal
7745e05a2f sqtt: update wave end packet names (#15896)
* sqtt: update wave end packet names

* update wavestart and emu
2026-04-24 04:21:22 +09:00
qazal
ee7644932b viz/cli: -t default number (#15894)
* viz/cli: accept one path argument

* -t default

* hm

* only the -t change
2026-04-24 04:13:16 +09:00
chenyu
11c197955b interpolate and cross_entropy to mixin (#15895) 2026-04-23 14:59:45 -04:00
chenyu
f0dbc68aa9 gather to mixin (#15891) 2026-04-23 14:00:57 -04:00
chenyu
87223f870e logcumsumexp, argmax, argmin, sequential to mixin (#15890) 2026-04-23 12:10:42 -04:00
nimlgen
5cf4ad2fb6 fix resolve param (#15889) 2026-04-23 17:41:44 +03:00
nimlgen
e4696185bd cleaner cuda graph (#15886) 2026-04-23 16:34:29 +03:00
wozeparrot
d3cbd781d9 llama: use fused norm mul quantize for w13 (#15878) 2026-04-22 21:27:41 -07:00
George Hotz
0c3260d5d9 rename VECTORIZE to STACK (#15880) 2026-04-23 10:43:42 +08:00
chenyu
7c9bc29e44 Tensor method raise if arg is on different device (#15879)
instead of implicit `to`. this matches torch
2026-04-22 22:20:22 -04:00
chenyu
1fc4b3788c cummax/cummin to mixin (#15877) 2026-04-22 21:25:39 -04:00
chenyu
684e95e1d4 UOp binary op broadcasts dtype (#15875)
* UOp binary op broadcasts dtype

matches Tensor

* fix

* fix?
2026-04-22 20:37:19 -04:00
Christopher Milan
b0dc95a390 AMX in arch, better docs (#15871) 2026-04-22 17:25:18 -04:00
nimlgen
e5891acab2 jit: precompile (#15848)
* x

* jit: precompile as sep step

* x

* s

* x

* x

* x

* ?

* ?

* x

* x

* viz

* f

* x

* u

* x

* x
2026-04-23 00:23:32 +03:00
chenyu
b9e2bc619e simplify bool.cast() != const (#15874) 2026-04-22 17:08:09 -04:00
nimlgen
2041945f4b cuda graph to linear (#15870)
* cuda graph to linear

* fix

* keep as old for now

* x

* x
2026-04-22 23:39:58 +03:00
chenyu
e9ebd03e86 update reduce_to_acc index dtype [pr] (#15873)
index arg should have weakint dtype
2026-04-22 16:25:50 -04:00
chenyu
3c8daa9a75 update test_where_removal (#15872)
don't use UOp.ufix for const_like, it will broadcast dtype soon
2026-04-22 14:56:37 -04:00
George Hotz
09ff3e1883 hotfix: add bytes back to llm 2026-04-23 00:46:27 +08:00
b1tg
af93a677ae llm: glm 4.5 air (#15771)
* llm: glm 4.5 air

* clean

* clean

* remove gguf_size
2026-04-22 22:47:37 +08:00
qazal
719a7bdac5 viz: respect optional estimates in kernel info (#15867)
* simple failing test

* unpack kernel info
2026-04-22 14:24:48 +03:00
George Hotz
2d7fa58e61 fix shapes to match vecless (#15866)
* fix shapes

* need to simplify shapes
2026-04-22 18:27:46 +08:00
qazal
de8f58899e move elf assembler to renderer (#15855)
* move elf assembler to renderer

* other
2026-04-22 19:00:36 +09:00
George Hotz
d4c344b7fd hotfix: keep VCONST exclude in viz 2026-04-22 15:54:24 +08:00
wozeparrot
87378331e8 llama: fused mul quantize fp8 (#15863) 2026-04-21 20:58:37 -07:00
George Hotz
0560fa7b0f add shape to range/special (#15862) 2026-04-22 11:15:02 +08:00