Commit Graph

12757 Commits

Author SHA1 Message Date
George Hotz
25ff7146f2 add a status line to REMOTE with DEBUG=1 (#15471)
* python speedups of hot paths

* add a status line to REMOTE with DEBUG=1

* pc

* t
2026-03-25 20:54:56 +08:00
qazal
c973b508b8 viz/cli: pass ctrlc (#15470) 2026-03-25 21:13:28 +09:00
George Hotz
c1a7d90ccc python speedups of hot paths (#15469) 2026-03-25 20:02:42 +08:00
George Hotz
ae7090b13b print function timing with DEBUG=2 (#15468)
* add DEBUG=2 function timing

* remove those functions, they aren't useful

* fix spec
2026-03-25 19:07:32 +08:00
Christopher Milan
e7f389efda fix height=1 images on macos (#15460) 2026-03-25 05:59:56 -04:00
George Hotz
789628df2e hotfix: add USE_BOT flag to ASM24 USB 2026-03-25 15:00:08 +08:00
George Hotz
cd1a276f47 llm: support gguf path or url (#15464)
* llm: support gguf path or url

* one line
2026-03-25 14:43:19 +08:00
chenyu
713b322e70 add weakint to promo_lattice (#15463)
sits between bool and smallest int
2026-03-25 00:27:34 -04:00
chenyu
02878c5a2f move _broadcasted to OpMixin (#15461)
it needs both ElementwiseMixin and MovementMixin
2026-03-24 23:56:01 -04:00
chenyu
519ba22470 more Tensor._broadcasted cleanup (#15459)
prep moving to mixin
2026-03-24 22:55:45 -04:00
George Hotz
fe2690399b llm: support assistant prefill + refactor to TransformerConfig (#15457)
* llm: support assistant prefill

* refactor to ModelConfig

* TransformerConfig

* more
2026-03-25 10:50:48 +08:00
Christopher Milan
fd92aec094 cleanup unused image pitch code (#15458) 2026-03-24 22:47:16 -04:00
chenyu
f6ed4da268 Tensor.ufix (#15452)
* Tensor.ufix

prep moving _broadcasted to mixin

* remove backward_cast
2026-03-24 22:34:43 -04:00
qazal
1b3d00d6ac viz/cli: remove --offset and --limit flags (#15439)
* work

* also no more no-color

* reorder

* update llama

* sqtt readme

* itertools

* rm that

* signals back
2026-03-25 09:52:27 +09:00
wozeparrot
da2031266a llama: correct 8b init (#15397) 2026-03-24 13:41:41 -07:00
qazal
652bab8aad viz: support nested track_rewrites (#15454)
* simple test

* stack active groups
2026-03-25 05:01:30 +09:00
qazal
41eb2cc41b viz: preserve zoom between re renders (#15451) 2026-03-25 03:11:10 +09:00
Salman Chishti
84049fdc07 Upgrade GitHub Actions to latest versions (#15446)
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-03-24 10:28:49 -04:00
Salman Chishti
9567075e20 Upgrade GitHub Actions for Node 24 compatibility (#15445)
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-03-24 10:28:19 -04:00
chenyu
b7960841af support shape broadcast in UOp.alu (#15442)
i think it can integrate tighter, but now Tensor also does ufix from UOp and implicit dtype upcast
2026-03-24 10:14:57 -04:00
George Hotz
a33ac869aa llm server: temperature + test client (#15444)
* improvements to the llm server

* eval script

* eval llm

* better eval gets 58.71

* cleanups

* add temperature, but multinomial is absurdly slow

* claude is so smart

* lint

* remove slop

* no more stop
2026-03-24 21:07:15 +08:00
nimlgen
9db5d677c7 jit in viz (#15447) 2026-03-24 18:23:53 +08:00
Christopher Milan
2e4fbbcc9c ir3: fix texture mapping and benchmark (#15443) 2026-03-24 04:52:54 -04:00
Christopher Milan
d5320a9ddf QCOM cleanups (#15435) 2026-03-23 22:18:38 -04:00
George Hotz
85dee83f5d amd flash attention cleanups + emulator fixes (#15431)
* amd flash attention cleanups

* simpler

* params

* fix emulator bugs

* fix idiv bug

* remove that test

* more emu fixes
2026-03-24 10:10:46 +08:00
chenyu
018a9e2d3c remove match_dtype arg in Tensor._broadcasted (#15440)
reworked Tensor.where to not need it, also updated dtypes.from_py to use isinstance because ConstFloat issues
2026-03-23 22:10:39 -04:00
qazal
a590eded87 sqtt: rdna4 decoder work (#15434)
* sqtt: rdna4 decoder work

* diff cleanup

* more diff

* test

* work

* works

* TS_DELTA_SHORT
2026-03-24 03:49:32 +09:00
qazal
109472c37e sqtt: new s_barrier pickles, handle rdna4 barriers in emulator (#15437) 2026-03-24 03:25:28 +09:00
nimlgen
fa4cdb422e memplan on linears (#15422)
* memplan

* test

* x

* arenas

* correct

* set any size

* ugh

* make hevc happy

* x

* x

* held

* rm old

* del

* x

* fu

* f

* cl

* cl

* ok
2026-03-23 19:50:16 +08:00
nimlgen
2da008ae3b jit: rm replan (#15433) 2026-03-23 19:31:51 +08:00
qazal
c4c53418f8 sqtt: comment out flaky rocprof timestamp assert (#15432)
* comment out rocprof assert, add new assert

* better than > 0 assert

* string
2026-03-23 19:24:04 +09:00
chenyu
66a86f88a0 simpler Tensor._broadcasted inferred dtype (#15430) 2026-03-23 05:20:11 -04:00
Pham Nguyen Hung
c89576921d Updated the APIs of mnist_gan (#15429)
Co-authored-by: pnhung1703@gmail.com <Hung Pham>
2026-03-23 17:04:00 +08:00
George Hotz
c62dea6881 ai slop flash attention (it works) (#15401)
* ai slop flash attention (it works)

* speed up, 2 TFLOPS + 7 GB/s

* simpler

* simpler

* optimize

* faster

* warp shuffle

* sqtt: link dispatch to exec (#15396)

* sqtt packet linking infra

python

* javascript

* ~doubly linked list

* ui works

* work

* exec can also highlight the pc, coloring work

* more work

* rm sqtt/model.py, doesn't need to be upstreamed

* viz: no context enters in cli, update llama profile (#15404)

* removed unused named arg in rules [pr] (#15414)

* viz: sqtt printer in viz/cli.py (#15411)

* work

* sqtt timeline in CLI

* format all printers nicely

* s/Showed/Printed

* ansistrip

* sys.exit

* keep colors in list

* work from amd_copy_matmul

* has_more always gets returned

* linter

* don't print colors

* more colors

* wow this is so deep

* work

* minor details

* selected

* improve progress bar

* remove it

* 22, global_load_vaddr is so long

* remove *0 hack in sign, gradient materializes zeros for unconnected nodes (#15416)

Amp-Thread-ID: https://ampcode.com/threads/T-019d1612-6322-706b-a94d-a812400a55cb

Co-authored-by: Amp <amp@ampcode.com>

* works

* cnt=20

* revert that

* uop slice tests

* simpler

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: gg <ggordbegli@gmail.com>
Co-authored-by: Amp <amp@ampcode.com>
2026-03-23 16:15:10 +08:00
qazal
1568a5ed07 viz: show dispatch to exec delay in sidebar (#15428) 2026-03-23 16:59:59 +09:00
Christopher Milan
ddaeebb500 nir: add shift support (#15426) 2026-03-23 03:37:44 -04:00
nimlgen
c74fa9bbe1 fix jitbeam not triggered (#15424)
* um

* beam

* x

* f
2026-03-23 15:34:59 +08:00
qazal
fd3559103b viz/cli: better error message for empty itrace (#15425) 2026-03-23 15:50:20 +09:00
nimlgen
395aacd77d jit: prune on linear (#15423)
* jit: prune on linear

* x

* this is from the future
2026-03-23 14:10:34 +08:00
chenyu
248cd9b39f make Tensor init the only caller of Tensor.from_uop (#15421)
* make Tensor init the only caller of Tensor.from_uop

prep broadcast cleanups

* type
2026-03-23 00:29:08 -04:00
chenyu
67dcc79fdd push Tensor(symbolic) logic to Tensor.from_uop (#15420) 2026-03-22 23:49:35 -04:00
gg
2087df814f remove *0 hack in sign, gradient materializes zeros for unconnected nodes (#15416)
Amp-Thread-ID: https://ampcode.com/threads/T-019d1612-6322-706b-a94d-a812400a55cb

Co-authored-by: Amp <amp@ampcode.com>
2026-03-22 12:49:26 -04:00
qazal
c7b18e6108 viz: sqtt printer in viz/cli.py (#15411)
* work

* sqtt timeline in CLI

* format all printers nicely

* s/Showed/Printed

* ansistrip

* sys.exit

* keep colors in list

* work from amd_copy_matmul

* has_more always gets returned

* linter

* don't print colors

* more colors

* wow this is so deep

* work

* minor details

* selected

* improve progress bar

* remove it

* 22, global_load_vaddr is so long
2026-03-23 00:17:05 +09:00
chenyu
bcc08307da removed unused named arg in rules [pr] (#15414) 2026-03-22 09:25:46 -04:00
qazal
2363bceb47 viz: no context enters in cli, update llama profile (#15404) 2026-03-22 05:47:02 +09:00
qazal
a9ceaf3c5f sqtt: link dispatch to exec (#15396)
* sqtt packet linking infra

python

* javascript

* ~doubly linked list

* ui works

* work

* exec can also highlight the pc, coloring work

* more work

* rm sqtt/model.py, doesn't need to be upstreamed
2026-03-21 23:48:58 +09:00
nimlgen
9656d97d97 jit: captures linears, not execitems (#15399)
* jit: captures linears, not execitems

* x

* um

* etsts

* mockcuda
2026-03-21 16:32:12 +08:00
George Hotz
c13d9d29ff add SHAPED_WMMA (#15400)
* add SHAPED_WMMA

* shaped wmma

* less bad
2026-03-21 16:16:03 +08:00
George Hotz
41a9b09683 minimal vec in amd_copy_matmul (#15398)
* minimal vec in amd_copy_matmul

* unified

* unify

* reshape/permute

* cleanups

* simpler

* move index

* cleanups

* more shared
2026-03-21 14:57:21 +08:00
qazal
30b3054fd5 whitespace cleanups in viz and sqtt.py (#15395) 2026-03-21 04:46:19 +09:00