Commit Graph

12781 Commits

nimlgen
0d6fc0f571 jit: graphing in uops (#15489)
* jit: graphing as rewrite rule
* f
* +metal,cuda
* x
* cl
* x
* x
* simpler
* f
* m
* x
* revert?
* revert2
* back
* back
* t
* x
* m
* x
* c
* x
* l
* x
* comment
* smaller
* rv
* x
* x
2026-03-27 19:09:02 +03:00
chenyu
30ebbe7f17 few more fold valid tests (#15509)
from the attempt to remove CORRECT_DIVMOD_FOLDING
2026-03-27 10:38:42 -04:00
Christopher Milan
9e0cc5c6ae create image buffers in late codegen (#15493) 2026-03-27 04:50:53 -04:00
chenyu
1198d6e908 move pow to mixin (#15507) 2026-03-27 03:16:40 -04:00
chenyu
323fcefd7d Revert "DEV is a ContextVar (#15505)" (#15506)
This reverts commit fdb30cba96.
2026-03-27 02:22:40 -04:00
Christopher Milan
fdb30cba96 DEV is a ContextVar (#15505) 2026-03-27 00:57:09 -04:00
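The "DEV is a ContextVar" change (since reverted) can be sketched with the stdlib `contextvars` module. This is a hypothetical illustration of the pattern, not tinygrad's actual code (tinygrad has its own ContextVar helper):

```python
from contextvars import ContextVar

# Hypothetical sketch: the selected device as a context variable with a
# default, so it can be overridden and restored within a scope.
DEV = ContextVar("DEV", default="CPU")

def current_device() -> str:
    return DEV.get()

token = DEV.set("METAL")   # temporarily override the device
assert current_device() == "METAL"
DEV.reset(token)           # restore the previous value
assert current_device() == "CPU"
```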
wozeparrot
a65e958be9 llama: new apply_grad (#15503) 2026-03-26 19:39:25 -07:00
Christopher Milan
67a50fb738 move where on load with casts (#15492) 2026-03-26 22:11:27 -04:00
qazal
586c49642f viz/cli: test in CI (#15501)
* viz cli work
* baseline test
* make cli test work without subprocess
* more checks
* check itrace
* s/return/return None
* change
* minimal
* colored
2026-03-27 06:47:15 +09:00
qazal
3f9f0fa846 viz: yield sqtt alt events (#15500)
* yield other
* less
* work
* less
2026-03-27 04:43:41 +09:00
qazal
237c25031f sqtt: construct OTHER_SIMD op types with for loop (#15495)
* other-lds from amd_copy_matmul
* more other
* other simd work
2026-03-26 23:07:18 +09:00
nimlgen
7193f90746 test view input in jit (#15497)
* will anything fail?
* add test
2026-03-26 16:59:47 +03:00
nimlgen
de24b3fe37 jit: pass init params straight to base (#15496)
* jit: pass init params straight to base
* linter
2026-03-26 16:59:10 +03:00
qazal
ec5b7a249e viz: refactor sqtt timeline builder (#15494)
* viz: refactor sqtt timeline builder
* barrier maps to waves
* clean up cli
2026-03-26 21:16:15 +09:00
Christopher Milan
313937ad6d fix IMAGE TestEnd2End.test_linear_mnist (#15488) 2026-03-26 04:12:47 -04:00
Christopher Milan
bc180a963c deprecate <dev>=1 in favor of DEV=<dev> (#15467)
* start work on target
* add test
* update actions to use DEV
* update docs
* update readmes
* tests need that too
* update example
* update tests (comments)
* fix that test
* ruff
* mypy
* oops
* remove getenvs
* don't add Target yet
* and the test
* lint
* and docs
* more stuff
* assert
* few more fixes
* test assert
2026-03-26 03:48:03 -04:00
chenyu
8426f820a1 Tensor.sub to mixin (#15486)
also _broadcasted now skips shape broadcasting when the operand has no shape
2026-03-25 23:20:56 -04:00
wozeparrot
1ca178f379 llama: stochastic rounding (#15456) 2026-03-25 18:16:31 -07:00
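Stochastic rounding, named in the llama commit above, is the general technique of rounding down or up at random so the result is unbiased in expectation. A minimal sketch of the idea (not the commit's implementation, which operates on tensors):

```python
import math
import random

# Round x down with probability 1 - frac(x) and up with probability
# frac(x); over many samples the mean converges to x itself.
def stochastic_round(x: float, rng: random.Random) -> int:
    lo = math.floor(x)
    return lo + (1 if rng.random() < (x - lo) else 0)

rng = random.Random(0)
samples = [stochastic_round(2.25, rng) for _ in range(10_000)]
print(sum(samples) / len(samples))  # close to 2.25 on average
```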
chenyu
7c8f992894 move EXPAND dtype cast back to gradient.py (#15481)
only a concern for gradient, not mixin
2026-03-25 19:25:26 -04:00
nimlgen
9d2d0774b4 remote: disk copies (#15482)
* remote: disk copies
* linter
* r
* nv
* x
2026-03-25 22:14:25 +03:00
qazal
7c2c8d3905 viz: small ux improvements (#15483)
* test
* better
* work
2026-03-26 03:18:25 +09:00
qazal
737d5f67f9 viz: compute canvas dims for auto zoom (#15474) 2026-03-26 00:05:23 +09:00
qazal
60bd546593 sqtt: add cycle count to rdna3 enums (#15473)
* update rdna3 sqtt enums to include cycle_count
* dispatch_to_exec
2026-03-25 23:19:54 +09:00
chenyu
142bf11926 logical_not to mixin [pr] (#15472)
also UPat.cast skips casts to the same dtype
2026-03-25 09:16:45 -04:00
George Hotz
25ff7146f2 add a status line to REMOTE with DEBUG=1 (#15471)
* python speedups of hot paths
* add a status line to REMOTE with DEBUG=1
* pc
* t
2026-03-25 20:54:56 +08:00
qazal
c973b508b8 viz/cli: pass ctrlc (#15470) 2026-03-25 21:13:28 +09:00
George Hotz
c1a7d90ccc python speedups of hot paths (#15469) 2026-03-25 20:02:42 +08:00
George Hotz
ae7090b13b print function timing with DEBUG=2 (#15468)
* add DEBUG=2 function timing
* remove those functions, they aren't useful
* fix spec
2026-03-25 19:07:32 +08:00
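The DEBUG=2 function-timing commit above can be illustrated with a decorator gated on an env var. This is a hedged sketch in the spirit of the change; tinygrad's actual mechanism may differ:

```python
import functools
import os
import time

# Timing is only printed when DEBUG reaches 2, matching the commit title.
DEBUG = int(os.getenv("DEBUG", "0"))

def timed(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        if DEBUG >= 2:  # report elapsed wall time in milliseconds
            print(f"{fn.__name__}: {(time.perf_counter() - start) * 1e3:.2f} ms")
        return result
    return wrapper

@timed
def work(n: int) -> int:
    return sum(range(n))

print(work(10))  # 45
```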
Christopher Milan
e7f389efda fix height=1 images on macos (#15460) 2026-03-25 05:59:56 -04:00
George Hotz
789628df2e hotfix: add USE_BOT flag to ASM24 USB 2026-03-25 15:00:08 +08:00
George Hotz
cd1a276f47 llm: support gguf path or url (#15464)
* llm: support gguf path or url
* one line
2026-03-25 14:43:19 +08:00
chenyu
713b322e70 add weakint to promo_lattice (#15463)
it sits between bool and the smallest int
2026-03-25 00:27:34 -04:00
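A promotion lattice like the one the weakint commit above extends can be sketched as a rank table: promoting two dtypes yields the higher-ranked one, and weakint ranks between bool and the smallest int. The surrounding entries and the `promote` rule here are illustrative assumptions, not tinygrad's actual lattice:

```python
# Hypothetical rank ordering; higher rank wins on promotion.
PROMO_RANK = {"bool": 0, "weakint": 1, "int8": 2, "int32": 3, "float32": 4}

def promote(a: str, b: str) -> str:
    return a if PROMO_RANK[a] >= PROMO_RANK[b] else b

print(promote("bool", "weakint"))  # weakint
print(promote("weakint", "int8"))  # int8
```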
chenyu
02878c5a2f move _broadcasted to OpMixin (#15461)
it needs both ElementwiseMixin and MovementMixin
2026-03-24 23:56:01 -04:00
chenyu
519ba22470 more Tensor._broadcasted cleanup (#15459)
prep moving to mixin
2026-03-24 22:55:45 -04:00
George Hotz
fe2690399b llm: support assistant prefill + refactor to TransformerConfig (#15457)
* llm: support assistant prefill
* refactor to ModelConfig
* TransformerConfig
* more
2026-03-25 10:50:48 +08:00
Christopher Milan
fd92aec094 cleanup unused image pitch code (#15458) 2026-03-24 22:47:16 -04:00
chenyu
f6ed4da268 Tensor.ufix (#15452)
* Tensor.ufix (prep moving _broadcasted to mixin)
* remove backward_cast
2026-03-24 22:34:43 -04:00
qazal
1b3d00d6ac viz/cli: remove --offset and --limit flags (#15439)
* work
* also no more no-color
* reorder
* update llama
* sqtt readme
* itertools
* rm that
* signals back
2026-03-25 09:52:27 +09:00
wozeparrot
da2031266a llama: correct 8b init (#15397) 2026-03-24 13:41:41 -07:00
qazal
652bab8aad viz: support nested track_rewrites (#15454)
* simple test
* stack active groups
2026-03-25 05:01:30 +09:00
qazal
41eb2cc41b viz: preserve zoom between re-renders (#15451) 2026-03-25 03:11:10 +09:00
Salman Chishti
84049fdc07 Upgrade GitHub Actions to latest versions (#15446)
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-03-24 10:28:49 -04:00
Salman Chishti
9567075e20 Upgrade GitHub Actions for Node 24 compatibility (#15445)
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-03-24 10:28:19 -04:00
chenyu
b7960841af support shape broadcast in UOp.alu (#15442)
I think it can be integrated more tightly, but Tensor now also does ufix from UOp and implicit dtype upcast
2026-03-24 10:14:57 -04:00
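Shape broadcast, as referenced in the commit above, conventionally follows the trailing-dimension rule: shapes are aligned from the right, and each pair of dims must match or contain a 1. A hedged sketch of that rule (not tinygrad's UOp.alu implementation):

```python
from itertools import zip_longest

def broadcast_shape(a: tuple[int, ...], b: tuple[int, ...]) -> tuple[int, ...]:
    out = []
    # walk both shapes from the right, padding the shorter one with 1s
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"cannot broadcast {a} with {b}")
        out.append(max(x, y))
    return tuple(reversed(out))

print(broadcast_shape((3, 1, 5), (4, 5)))  # (3, 4, 5)
```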
George Hotz
a33ac869aa llm server: temperature + test client (#15444)
* improvements to the llm server
* eval script
* eval llm
* better eval gets 58.71
* cleanups
* add temperature, but multinomial is absurdly slow
* claude is so smart
* lint
* remove slop
* no more stop
2026-03-24 21:07:15 +08:00
nimlgen
9db5d677c7 jit in viz (#15447) 2026-03-24 18:23:53 +08:00
Christopher Milan
2e4fbbcc9c ir3: fix texture mapping and benchmark (#15443) 2026-03-24 04:52:54 -04:00
Christopher Milan
d5320a9ddf QCOM cleanups (#15435) 2026-03-23 22:18:38 -04:00
George Hotz
85dee83f5d amd flash attention cleanups + emulator fixes (#15431)
* amd flash attention cleanups
* simpler
* params
* fix emulator bugs
* fix idiv bug
* remove that test
* more emu fixes
2026-03-24 10:10:46 +08:00
chenyu
018a9e2d3c remove match_dtype arg in Tensor._broadcasted (#15440)
reworked Tensor.where to not need it; also updated dtypes.from_py to use isinstance because of ConstFloat issues
2026-03-23 22:10:39 -04:00