Denys Melnyk
1fdcb13bfb
webgpu: fix weight lookup in export_model after compile_net key change ( #15919 )
...
* fix lookup site in export_model_webgpu after refactoring
webgpu (sd): fix export_model weight lookup after compile_net changes
* add regression test
2026-04-25 10:04:55 +03:00
Christopher Milan
8b2826ef16
nv: fix shader local memory for NAK ( #15921 )
2026-04-25 01:03:11 -04:00
Christopher Milan
57fbaa3d49
amd: fallback to llvm when comgr is not available ( #15914 )
2026-04-24 23:30:16 -04:00
wozeparrot
4b908b6e2c
llama: fused ce loss ( #15920 )
2026-04-24 20:01:24 -07:00
nimlgen
d3378010ee
schedule() -> schedule_linear() in tests (batch 1) ( #15915 )
...
* schedule_with_vars -> linear_with_vars in tests
* tests batch 1
* batch 2
* estimate_uop
* simpler
* rm
2026-04-24 23:40:53 +03:00
chenyu
b501ba3e42
nll_loss to mixin ( #15918 )
2026-04-24 15:50:31 -04:00
chenyu
2f9fdb4a37
scatter to mixin ( #15917 )
2026-04-24 15:37:37 -04:00
nimlgen
f2751955cb
remove linear_to_schedule from tests ( #15912 )
...
* remove linear_to_schedule from tests
* x
2026-04-24 20:02:10 +03:00
nimlgen
56a9f1e3ff
remove last jit_cache ( #15911 )
...
* remove last jit_cache
* linter
2026-04-24 19:44:52 +03:00
chenyu
03a7604f76
sort argsort topk allclose to mixin ( #15910 )
2026-04-24 10:20:46 -04:00
nimlgen
4010aa4044
jit: no jit_cache in graphrunner ( #15907 )
...
* jit: no jit_cache in graphrunner
* m
2026-04-24 16:34:26 +03:00
chenyu
7a1adfd2aa
update Tensor.allclose to return Tensor ( #15904 )
...
matches jax
2026-04-24 08:27:17 -04:00
Eitan Turok
48d7ab2695
no uv.lock ( #15893 )
2026-04-24 20:07:07 +08:00
qazal
5eb641395a
viz/cli: select kernel events in -s DEV ( #15909 )
...
* simple test
* pass
2026-04-24 21:03:34 +09:00
nimlgen
c0f77c2e1c
hcq graph to linear ( #15888 )
...
* hcq
* f
* f
* linter
2026-04-24 12:42:49 +03:00
Christopher Milan
cbf4946ea6
usb: multiple gpus and better error messages ( #15900 )
2026-04-24 01:57:19 -04:00
wozeparrot
9d134a2848
llama: fix fakedata timing ( #15905 )
2026-04-23 21:37:03 -07:00
b1tg
aab50d1bca
llm: dedup MLA cache_v ( #15887 )
2026-04-24 12:32:10 +08:00
qazal
f379b5a40a
sqtt: match amd's TS_DELTA_SHORT offset ( #15901 )
2026-04-24 06:41:22 +03:00
chenyu
c24da99d56
avg_pool2d, max_pool2d to mixin ( #15903 )
...
* avg_pool2d, max_pool2d to mixin
* fix
* just dtype
* that
2026-04-23 23:36:17 -04:00
chenyu
08d9106c9f
scatter_reduce and sparse_categorical_crossentropy to mixin ( #15902 )
...
also use `.ne` to fix `# type: ignore[comparison-overlap]`
2026-04-23 21:06:36 -04:00
chenyu
8cc2c69e21
fix isclose mixin ( #15898 )
...
use `.eq` instead of `==`
2026-04-23 20:40:43 -04:00
nimlgen
3072862e2c
metal to linear ( #15884 )
...
* metal to linear
* x
* x
* fix
2026-04-23 23:32:22 +03:00
chenyu
782bc6aece
broadcast in ElementwiseMixin.div [pr] ( #15897 )
2026-04-23 16:02:43 -04:00
qazal
7745e05a2f
sqtt: update wave end packet names ( #15896 )
...
* sqtt: update wave end packet names
* update wavestart and emu
2026-04-24 04:21:22 +09:00
qazal
ee7644932b
viz/cli: -t default number ( #15894 )
...
* viz/cli: accept one path argument
* -t default
* hm
* only the -t change
2026-04-24 04:13:16 +09:00
chenyu
11c197955b
interpolate and cross_entropy to mixin ( #15895 )
2026-04-23 14:59:45 -04:00
chenyu
f0dbc68aa9
gather to mixin ( #15891 )
2026-04-23 14:00:57 -04:00
chenyu
87223f870e
logcumsumexp, argmax, argmin, sequential to mixin ( #15890 )
2026-04-23 12:10:42 -04:00
nimlgen
5cf4ad2fb6
fix resolve param ( #15889 )
2026-04-23 17:41:44 +03:00
nimlgen
e4696185bd
cleaner cuda graph ( #15886 )
2026-04-23 16:34:29 +03:00
wozeparrot
d3cbd781d9
llama: use fused norm mul quantize for w13 ( #15878 )
2026-04-22 21:27:41 -07:00
George Hotz
0c3260d5d9
rename VECTORIZE to STACK ( #15880 )
2026-04-23 10:43:42 +08:00
chenyu
7c9bc29e44
Tensor method raise if arg is on different device ( #15879 )
...
instead of implicit `to`. this matches torch
2026-04-22 22:20:22 -04:00
chenyu
1fc4b3788c
cummax/cummin to mixin ( #15877 )
2026-04-22 21:25:39 -04:00
chenyu
684e95e1d4
UOp binary op broadcasts dtype ( #15875 )
...
* UOp binary op broadcasts dtype
matches Tensor
* fix
* fix?
2026-04-22 20:37:19 -04:00
Christopher Milan
b0dc95a390
AMX in arch, better docs ( #15871 )
2026-04-22 17:25:18 -04:00
nimlgen
e5891acab2
jit: precompile ( #15848 )
...
* x
* jit: precompile as sep step
* x
* s
* x
* x
* x
* ?
* ?
* x
* x
* viz
* f
* x
* u
* x
* x
2026-04-23 00:23:32 +03:00
chenyu
b9e2bc619e
simplify bool.cast() != const ( #15874 )
2026-04-22 17:08:09 -04:00
nimlgen
2041945f4b
cuda graph to linear ( #15870 )
...
* cuda graph to linear
* fix
* keep as old for now
* x
* x
2026-04-22 23:39:58 +03:00
chenyu
e9ebd03e86
update reduce_to_acc index dtype [pr] ( #15873 )
...
index arg should have weakint dtype
2026-04-22 16:25:50 -04:00
chenyu
3c8daa9a75
update test_where_removal ( #15872 )
...
don't use UOp.ufix for const_like, it will broadcast dtype soon
2026-04-22 14:56:37 -04:00
George Hotz
09ff3e1883
hotfix: add bytes back to llm
2026-04-23 00:46:27 +08:00
b1tg
af93a677ae
llm: glm 4.5 air ( #15771 )
...
* llm: glm 4.5 air
* clean
* clean
* remove gguf_size
2026-04-22 22:47:37 +08:00
qazal
719a7bdac5
viz: respect optional estimates in kernel info ( #15867 )
...
* simple failing test
* unpack kernel info
2026-04-22 14:24:48 +03:00
George Hotz
2d7fa58e61
fix shapes to match vecless ( #15866 )
...
* fix shapes
* need to simplify shapes
2026-04-22 18:27:46 +08:00
qazal
de8f58899e
move elf assembler to renderer ( #15855 )
...
* move elf assembler to renderer
* other
2026-04-22 19:00:36 +09:00
George Hotz
d4c344b7fd
hotfix: keep VCONST exclude in viz
2026-04-22 15:54:24 +08:00
wozeparrot
87378331e8
llama: fused mul quantize fp8 ( #15863 )
2026-04-21 20:58:37 -07:00
George Hotz
0560fa7b0f
add shape to range/special ( #15862 )
2026-04-22 11:15:02 +08:00