qazal
ec5b7a249e
viz: refactor sqtt timeline builder (#15494)
* viz: refactor sqtt timeline builder
* barrier maps to waves
* clean up cli
2026-03-26 21:16:15 +09:00
Christopher Milan
313937ad6d
fix IMAGE TestEnd2End.test_linear_mnist (#15488)
2026-03-26 04:12:47 -04:00
Christopher Milan
bc180a963c
deprecate <dev>=1 in favor of DEV=<dev> (#15467)
* start work on target
* add test
* update actions to use DEV
* update docs
* update readmes
* tests need that too
* update example
* update tests (comments)
* fix that test
* ruff
* mypy
* oops
* remove getenvs
* don't add Target yet
* and the test
* lint
* and docs
* more stuff
* assert
* few more fixes
* test assert
2026-03-26 03:48:03 -04:00
chenyu
8426f820a1
Tensor.sub to mixin (#15486)
also _broadcasted skipped broadcasting shape if it does not have shape
2026-03-25 23:20:56 -04:00
wozeparrot
1ca178f379
llama: stochastic rounding (#15456)
2026-03-25 18:16:31 -07:00
chenyu
7c8f992894
move EXPAND dtype cast back to gradient.py (#15481)
only a concern for gradient, not mixin
2026-03-25 19:25:26 -04:00
nimlgen
9d2d0774b4
remote: disk copies (#15482)
* remote: disk copies
* linter
* r
* nv
* x
2026-03-25 22:14:25 +03:00
qazal
7c2c8d3905
viz: small ux improvements (#15483)
* test
* better
* work
2026-03-26 03:18:25 +09:00
qazal
737d5f67f9
viz: compute canvas dims for auto zoom (#15474)
2026-03-26 00:05:23 +09:00
qazal
60bd546593
sqtt: add cycle count to rdna3 enums (#15473)
* update rdna3 sqtt enums to include cycle_count
* dispatch_to_exec
2026-03-25 23:19:54 +09:00
chenyu
142bf11926
logical_not to mixin [pr] (#15472)
also UPat.cast skips same dtype
2026-03-25 09:16:45 -04:00
George Hotz
25ff7146f2
add a status line to REMOTE with DEBUG=1 (#15471)
* python speedups of hot paths
* add a status line to REMOTE with DEBUG=1
* pc
* t
2026-03-25 20:54:56 +08:00
qazal
c973b508b8
viz/cli: pass ctrlc (#15470)
2026-03-25 21:13:28 +09:00
George Hotz
c1a7d90ccc
python speedups of hot paths (#15469)
2026-03-25 20:02:42 +08:00
George Hotz
ae7090b13b
print function timing with DEBUG=2 (#15468)
* add DEBUG=2 function timing
* remove those functions, they aren't useful
* fix spec
2026-03-25 19:07:32 +08:00
Christopher Milan
e7f389efda
fix height=1 images on macos (#15460)
2026-03-25 05:59:56 -04:00
George Hotz
789628df2e
hotfix: add USE_BOT flag to ASM24 USB
2026-03-25 15:00:08 +08:00
George Hotz
cd1a276f47
llm: support gguf path or url (#15464)
* llm: support gguf path or url
* one line
2026-03-25 14:43:19 +08:00
chenyu
713b322e70
add weakint to promo_lattice (#15463)
sits between bool and smallest int
2026-03-25 00:27:34 -04:00
chenyu
02878c5a2f
move _broadcasted to OpMixin (#15461)
it needs both ElementwiseMixin and MovementMixin
2026-03-24 23:56:01 -04:00
chenyu
519ba22470
more Tensor._broadcasted cleanup (#15459)
prep moving to mixin
2026-03-24 22:55:45 -04:00
George Hotz
fe2690399b
llm: support assistant prefill + refactor to TransformerConfig (#15457)
* llm: support assistant prefill
* refactor to ModelConfig
* TransformerConfig
* more
2026-03-25 10:50:48 +08:00
Christopher Milan
fd92aec094
cleanup unused image pitch code (#15458)
2026-03-24 22:47:16 -04:00
chenyu
f6ed4da268
Tensor.ufix (#15452)
* Tensor.ufix
prep moving _broadcasted to mixin
* remove backward_cast
2026-03-24 22:34:43 -04:00
qazal
1b3d00d6ac
viz/cli: remove --offset and --limit flags (#15439)
* work
* also no more no-color
* reorder
* update llama
* sqtt readme
* itertools
* rm that
* signals back
2026-03-25 09:52:27 +09:00
wozeparrot
da2031266a
llama: correct 8b init (#15397)
2026-03-24 13:41:41 -07:00
qazal
652bab8aad
viz: support nested track_rewrites (#15454)
* simple test
* stack active groups
2026-03-25 05:01:30 +09:00
qazal
41eb2cc41b
viz: preserve zoom between re renders (#15451)
2026-03-25 03:11:10 +09:00
Salman Chishti
84049fdc07
Upgrade GitHub Actions to latest versions (#15446)
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-03-24 10:28:49 -04:00
Salman Chishti
9567075e20
Upgrade GitHub Actions for Node 24 compatibility (#15445)
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-03-24 10:28:19 -04:00
chenyu
b7960841af
support shape broadcast in UOp.alu (#15442)
i think it can integrate tighter, but now Tensor also does ufix from UOp and implicit dtype upcast
2026-03-24 10:14:57 -04:00
George Hotz
a33ac869aa
llm server: temperature + test client (#15444)
* improvements to the llm server
* eval script
* eval llm
* better eval gets 58.71
* cleanups
* add temperature, but multinomial is absurdly slow
* claude is so smart
* lint
* remove slop
* no more stop
2026-03-24 21:07:15 +08:00
nimlgen
9db5d677c7
jit in viz (#15447)
2026-03-24 18:23:53 +08:00
Christopher Milan
2e4fbbcc9c
ir3: fix texture mapping and benchmark (#15443)
2026-03-24 04:52:54 -04:00
Christopher Milan
d5320a9ddf
QCOM cleanups (#15435)
2026-03-23 22:18:38 -04:00
George Hotz
85dee83f5d
amd flash attention cleanups + emulator fixes (#15431)
* amd flash attention cleanups
* simpler
* params
* fix emulator bugs
* fix idiv bug
* remove that test
* more emu fixes
2026-03-24 10:10:46 +08:00
chenyu
018a9e2d3c
remove match_dtype arg in Tensor._broadcasted (#15440)
reworked Tensor.where to not need it, also updated dtypes.from_py to use isinstance because of ConstFloat issues
2026-03-23 22:10:39 -04:00
qazal
a590eded87
sqtt: rdna4 decoder work (#15434)
* sqtt: rdna4 decoder work
* diff cleanup
* more diff
* test
* work
* works
* TS_DELTA_SHORT
2026-03-24 03:49:32 +09:00
qazal
109472c37e
sqtt: new s_barrier pickles, handle rdna4 barriers in emulator (#15437)
2026-03-24 03:25:28 +09:00
nimlgen
fa4cdb422e
memplan on linears (#15422)
* memplan
* test
* x
* arenas
* correct
* set any size
* ugh
* make hevc happy
* x
* x
* held
* rm old
* del
* x
* fu
* f
* cl
* cl
* ok
2026-03-23 19:50:16 +08:00
nimlgen
2da008ae3b
jit: rm replan (#15433)
2026-03-23 19:31:51 +08:00
qazal
c4c53418f8
sqtt: comment out flaky rocprof timestamp assert (#15432)
* comment out rocprof assert, add new assert
* better than > 0 assert
* string
2026-03-23 19:24:04 +09:00
chenyu
66a86f88a0
simpler Tensor._broadcasted inferred dtype (#15430)
2026-03-23 05:20:11 -04:00
Pham Nguyen Hung
c89576921d
Updated the APIs of mnist_gan (#15429)
Co-authored-by: pnhung1703@gmail.com <Hung Pham>
2026-03-23 17:04:00 +08:00
George Hotz
c62dea6881
ai slop flash attention (it works) (#15401)
* ai slop flash attention (it works)
* speed up, 2 TFLOPS + 7 GB/s
* simpler
* simpler
* optimize
* faster
* warp shuffle
* sqtt: link dispatch to exec (#15396)
* sqtt packet linking infra
python
* javascript
* ~doubly linked list
* ui works
* work
* exec can also highlight the pc, coloring work
* more work
* rm sqtt/model.py, doesn't need to be upstreamed
* viz: no context enters in cli, update llama profile (#15404)
* removed unused named arg in rules [pr] (#15414)
* viz: sqtt printer in viz/cli.py (#15411)
* work
* sqtt timeline in CLI
* format all printers nicely
* s/Showed/Printed
* ansistrip
* sys.exit
* keep colors in list
* work from amd_copy_matmul
* has_more always gets returned
* linter
* don't print colors
* more colors
* wow this is so deep
* work
* minor details
* selected
* improve progress bar
* remove it
* 22, global_load_vaddr is so long
* remove *0 hack in sign, gradient materializes zeros for unconnected nodes (#15416)
Amp-Thread-ID: https://ampcode.com/threads/T-019d1612-6322-706b-a94d-a812400a55cb
Co-authored-by: Amp <amp@ampcode.com>
* works
* cnt=20
* revert that
* uop slice tests
* simpler
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: gg <ggordbegli@gmail.com>
Co-authored-by: Amp <amp@ampcode.com>
2026-03-23 16:15:10 +08:00
qazal
1568a5ed07
viz: show dispatch to exec delay in sidebar (#15428)
2026-03-23 16:59:59 +09:00
Christopher Milan
ddaeebb500
nir: add shift support (#15426)
2026-03-23 03:37:44 -04:00
nimlgen
c74fa9bbe1
fix jitbeam not triggered (#15424)
* um
* beam
* x
* f
2026-03-23 15:34:59 +08:00
qazal
fd3559103b
viz/cli: better error message for empty itrace (#15425)
2026-03-23 15:50:20 +09:00
nimlgen
395aacd77d
jit: prune on linear (#15423)
* jit: prune on linear
* x
* this is from the future
2026-03-23 14:10:34 +08:00