12768 Commits

Author SHA1 Message Date
qazal
ec5b7a249e viz: refactor sqtt timeline builder (#15494)
* viz: refactor sqtt timeline builder

* barrier maps to waves

* clean up cli
2026-03-26 21:16:15 +09:00
Christopher Milan
313937ad6d fix IMAGE TestEnd2End.test_linear_mnist (#15488) 2026-03-26 04:12:47 -04:00
Christopher Milan
bc180a963c deprecate <dev>=1 in favor of DEV=<dev> (#15467)
* start work on target

* add test

* update actions to use DEV

* update docs

* update readmes

* tests need that too

* update example

* update tests (comments)

* fix that test

* ruff

* mypy

* oops

* remove getenvs

* don't add Target yet

* and the test

* lint

* and docs

* more stuff

* assert

* few more fixes

* test assert
2026-03-26 03:48:03 -04:00
chenyu
8426f820a1 Tensor.sub to mixin (#15486)
also _broadcasted skipped broadcasting shape if it does not have shape
2026-03-25 23:20:56 -04:00
wozeparrot
1ca178f379 llama: stochastic rounding (#15456) 2026-03-25 18:16:31 -07:00
chenyu
7c8f992894 move EXPAND dtype cast back to gradient.py (#15481)
only a concern for gradient, not mixin
2026-03-25 19:25:26 -04:00
nimlgen
9d2d0774b4 remote: disk copies (#15482)
* remote: disk copies

* lineter

* r

* nv

* x
2026-03-25 22:14:25 +03:00
qazal
7c2c8d3905 viz: small ux improvements (#15483)
* test

* better

* work
2026-03-26 03:18:25 +09:00
qazal
737d5f67f9 viz: compute canvas dims for auto zoom (#15474) 2026-03-26 00:05:23 +09:00
qazal
60bd546593 sqtt: add cycle count to rdna3 enums (#15473)
* update rdna3 sqtt enums to include cycle_count

* dispatch_to_exec
2026-03-25 23:19:54 +09:00
chenyu
142bf11926 logical_not to mixin [pr] (#15472)
also UPat.cast skips same dtype
2026-03-25 09:16:45 -04:00
George Hotz
25ff7146f2 add a status line to REMOTE with DEBUG=1 (#15471)
* python speedups of hot paths

* add a status line to REMOTE with DEBUG=1

* pc

* t
2026-03-25 20:54:56 +08:00
qazal
c973b508b8 viz/cli: pass ctrlc (#15470) 2026-03-25 21:13:28 +09:00
George Hotz
c1a7d90ccc python speedups of hot paths (#15469) 2026-03-25 20:02:42 +08:00
George Hotz
ae7090b13b print function timing with DEBUG=2 (#15468)
* add DEBUG=2 function timing

* remove those functions, they aren't useful

* fix spec
2026-03-25 19:07:32 +08:00
Christopher Milan
e7f389efda fix height=1 images on macos (#15460) 2026-03-25 05:59:56 -04:00
George Hotz
789628df2e hotfix: add USE_BOT flag to ASM24 USB 2026-03-25 15:00:08 +08:00
George Hotz
cd1a276f47 llm: support gguf path or url (#15464)
* llm: support gguf path or url

* one line
2026-03-25 14:43:19 +08:00
chenyu
713b322e70 add weakint to promo_lattice (#15463)
sits between bool and smallest int
2026-03-25 00:27:34 -04:00
chenyu
02878c5a2f move _broadcasted to OpMixin (#15461)
it needs both ElementwiseMixin and MovementMixin
2026-03-24 23:56:01 -04:00
chenyu
519ba22470 more Tensor._broadcasted cleanup (#15459)
prep moving to mixin
2026-03-24 22:55:45 -04:00
George Hotz
fe2690399b llm: support assistant prefill + refactor to TransformerConfig (#15457)
* llm: support assistant prefill

* refactor to ModelConfig

* TransformerConfig

* more
2026-03-25 10:50:48 +08:00
Christopher Milan
fd92aec094 cleanup unused image pitch code (#15458) 2026-03-24 22:47:16 -04:00
chenyu
f6ed4da268 Tensor.ufix (#15452)
* Tensor.ufix

prep moving _broadcasted to mixin

* remove backward_cast
2026-03-24 22:34:43 -04:00
qazal
1b3d00d6ac viz/cli: remove --offset and --limit flags (#15439)
* work

* also no more no-color

* reorder

* update llama

* sqtt readme

* itertools

* rm that

* signals back
2026-03-25 09:52:27 +09:00
wozeparrot
da2031266a llama: correct 8b init (#15397) 2026-03-24 13:41:41 -07:00
qazal
652bab8aad viz: support nested track_rewrites (#15454)
* simple test

* stack active groups
2026-03-25 05:01:30 +09:00
qazal
41eb2cc41b viz: preserve zoom between re renders (#15451) 2026-03-25 03:11:10 +09:00
Salman Chishti
84049fdc07 Upgrade GitHub Actions to latest versions (#15446)
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-03-24 10:28:49 -04:00
Salman Chishti
9567075e20 Upgrade GitHub Actions for Node 24 compatibility (#15445)
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-03-24 10:28:19 -04:00
chenyu
b7960841af support shape broadcast in UOp.alu (#15442)
i think it can integrate tighter, but now Tensor also does ufix from UOp and implicit dtype upcast
2026-03-24 10:14:57 -04:00
George Hotz
a33ac869aa llm server: temperature + test client (#15444)
* improvements to the llm server

* eval script

* eval llm

* better eval gets 58.71

* cleanups

* add temperature, but multinomial is absurdly slow

* claude is so smart

* lint

* remove slop

* no more stop
2026-03-24 21:07:15 +08:00
nimlgen
9db5d677c7 jit in viz (#15447) 2026-03-24 18:23:53 +08:00
Christopher Milan
2e4fbbcc9c ir3: fix texture mapping and benchmark (#15443) 2026-03-24 04:52:54 -04:00
Christopher Milan
d5320a9ddf QCOM cleanups (#15435) 2026-03-23 22:18:38 -04:00
George Hotz
85dee83f5d amd flash attention cleanups + emulator fixes (#15431)
* amd flash attention cleanups

* simpler

* params

* fix emulator bugs

* fix idiv bug

* remove that test

* more emu fixes
2026-03-24 10:10:46 +08:00
chenyu
018a9e2d3c remove match_dtype arg in Tensor._broadcasted (#15440)
reworked Tensor.where to not need it, also updated dtypes.from_py to use isinstance because ConstFloat issues
2026-03-23 22:10:39 -04:00
qazal
a590eded87 sqtt: rdna4 decoder work (#15434)
* sqtt: rdna4 decoder work

* diff cleanup

* more diff

* test

* work

* works

* TS_DELTA_SHORT
2026-03-24 03:49:32 +09:00
qazal
109472c37e sqtt: new s_barrier pickles, handle rdna4 barriers in emulator (#15437) 2026-03-24 03:25:28 +09:00
nimlgen
fa4cdb422e memplan on linears (#15422)
* memplan

* test

* x

* arenas

* correct

* set any size

* ugh

* make hevc happy

* x

* x

* held

* rm old

* del

* x

* fu

* f

* cl

* cl

* ok
2026-03-23 19:50:16 +08:00
nimlgen
2da008ae3b jit: rm replan (#15433) 2026-03-23 19:31:51 +08:00
qazal
c4c53418f8 sqtt: comment out flaky rocprof timestamp assert (#15432)
* comment out rocprof assert, add new assert

* better than > 0 assert

* string
2026-03-23 19:24:04 +09:00
chenyu
66a86f88a0 simpler Tensor._broadcasted inferred dtype (#15430) 2026-03-23 05:20:11 -04:00
Pham Nguyen Hung
c89576921d Updated the APIs of mnist_gan (#15429)
Co-authored-by: pnhung1703@gmail.com <Hung Pham>
2026-03-23 17:04:00 +08:00
George Hotz
c62dea6881 ai slop flash attention (it works) (#15401)
* ai slop flash attention (it works)

* speed up, 2 TFLOPS + 7 GB/s

* simpler

* simpler

* optimize

* faster

* warp shuffle

* sqtt: link dispatch to exec (#15396)

* sqtt packet linking infra

python

* javascript

* ~doubly linked list

* ui works

* work

* exec can also highlight the pc, coloring work

* more work

* rm sqtt/model.py, doesn't need to be upstreamed

* viz: no context enters in cli, update llama profile (#15404)

* removed unused named arg in rules [pr] (#15414)

* viz: sqtt printer in viz/cli.py (#15411)

* work

* sqtt timeline in CLI

* format all printers nicely

* s/Showed/Printed

* ansistrip

* sys.exit

* keep colors in list

* work from amd_copy_matmul

* has_more always gets returned

* linter

* don't print colors

* more colors

* wow this is so deep

* work

* minor details

* selected

* improve progress bar

* remove it

* 22, global_load_vaddr is so long

* remove *0 hack in sign, gradient materializes zeros for unconnected nodes (#15416)

Amp-Thread-ID: https://ampcode.com/threads/T-019d1612-6322-706b-a94d-a812400a55cb

Co-authored-by: Amp <amp@ampcode.com>

* works

* cnt=20

* revert that

* uop slice tests

* simpler

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: gg <ggordbegli@gmail.com>
Co-authored-by: Amp <amp@ampcode.com>
2026-03-23 16:15:10 +08:00
qazal
1568a5ed07 viz: show dispatch to exec delay in sidebar (#15428) 2026-03-23 16:59:59 +09:00
Christopher Milan
ddaeebb500 nir: add shift support (#15426) 2026-03-23 03:37:44 -04:00
nimlgen
c74fa9bbe1 fix jitbeam not triggered (#15424)
* um

* beam

* x

* f
2026-03-23 15:34:59 +08:00
qazal
fd3559103b viz/cli: better error message for empty itrace (#15425) 2026-03-23 15:50:20 +09:00
nimlgen
395aacd77d jit: prune on linear (#15423)
* jit: prune on linear

* x

* this is from the future
2026-03-23 14:10:34 +08:00