chenyu
bcc08307da
removed unused named arg in rules [pr] ( #15414 )
2026-03-22 09:25:46 -04:00
qazal
2363bceb47
viz: no context enters in cli, update llama profile ( #15404 )
2026-03-22 05:47:02 +09:00
qazal
a9ceaf3c5f
sqtt: link dispatch to exec ( #15396 )
...
* sqtt packet linking infra
python
* javascript
* ~doubly linked list
* ui works
* work
* exec can also highlight the pc, coloring work
* more work
* rm sqtt/model.py, doesn't need to be upstreamed
2026-03-21 23:48:58 +09:00
nimlgen
9656d97d97
jit: captures linears, not execitems ( #15399 )
...
* jit: captures linears, not execitems
* x
* um
* tests
* mockcuda
2026-03-21 16:32:12 +08:00
George Hotz
c13d9d29ff
add SHAPED_WMMA ( #15400 )
...
* add SHAPED_WMMA
* shaped wmma
* less bad
2026-03-21 16:16:03 +08:00
George Hotz
41a9b09683
minimal vec in amd_copy_matmul ( #15398 )
...
* minimal vec in amd_copy_matmul
* unified
* unify
* reshape/permute
* cleanups
* simpler
* move index
* cleanups
* more shared
2026-03-21 14:57:21 +08:00
qazal
30b3054fd5
whitespace cleanups in viz and sqtt.py ( #15395 )
2026-03-21 04:46:19 +09:00
qazal
71ccc69c52
FP8=1 llama works again, hipcc can run on macos ( #15394 )
...
* hipcc macos shim
* is_dtype_supported opens devices less
2026-03-20 23:43:15 +09:00
Christopher Milan
9470d5193a
deterministic decomp apply order ( #15393 )
2026-03-20 08:10:45 -04:00
Christopher Milan
376585b003
use should_emulate for target dtype in decomp ( #15392 )
2026-03-20 07:44:57 -04:00
Christopher Milan
a12d3951de
fix test_export_model imports ( #15389 )
2026-03-20 07:27:01 -04:00
George Hotz
1a2a203f48
add wmma support to amd_copy_matmul ( #15384 )
...
* add wmma support to amd_copy_matmul
* 15 TFLOPS and merged
* unify
* simpler
* simpler
* simpler
* cleanups
* TM/TN is the full regs
* comments
* WAVES_PER_SH + SQTT_EVENT
* Add WAVERDY support
* no split warp
* 3 range
2026-03-20 19:02:19 +08:00
Christopher Milan
1560b534a5
remove IMAGE=2 ( #15312 )
2026-03-20 06:26:52 -04:00
Christopher Milan
30d609432f
ci: only xcode-select for gpuocelot on macos ( #15387 )
2026-03-20 05:58:16 -04:00
chenyu
d1b4e37dfa
remove InvalidType branch in Tensor.__init__ ( #15386 )
...
it's handled by `elif isinstance(data, get_args(ConstType)):` already
2026-03-20 05:32:33 -04:00
chenyu
c491345766
pass device into Tensor._frompy ( #15385 )
...
* pass device into Tensor._frompy
with this, canonicalize_device is the only usage of Device in tensor.py
* export_model.py
2026-03-20 05:09:01 -04:00
George Hotz
3b75d8a7a2
fix double after bug in rangeify ( #15381 )
2026-03-20 14:53:46 +08:00
Christopher Milan
0c89340a1e
automatically emulate unsupported (tiny) floats [skip_process_replay] ( #15366 )
2026-03-20 02:31:44 -04:00
George Hotz
78ad089817
make precompile the default for llm ( #15376 )
...
* make precompile the default for llm
* works
* empty is okay for kvcache
* fix cache misses
* more tests
2026-03-20 14:08:55 +08:00
chenyu
459ef41ea0
don't exclude weakint in is_dtype_supported [pr] ( #15378 )
2026-03-20 02:08:29 -04:00
qazal
cf6a429aaa
mypy emulator pre-commit passing ( #15379 )
...
* fix dict stuff
* add type: ignores
* fix pcode to put uops not ints
2026-03-20 14:44:09 +09:00
wozeparrot
87c4ec1724
llama: use flat llama ( #15353 )
2026-03-19 22:12:38 -07:00
chenyu
da1700e16b
dtypes.index -> dtypes.weakint ( #15377 )
2026-03-20 01:08:46 -04:00
nimlgen
3b04e3ea28
no gmmu mappings with GMMU=0 ( #15369 )
...
* usb
* free
* simple gmmu=0
* x
* x
* vram
* init tests
* ppg
* x
2026-03-20 12:18:34 +08:00
ridoy majumdar
c1183b8872
remove dead code in pyrender ( #15115 )
...
* remove dead code in pyrender
* retrig CI
* retrig CI
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-03-19 23:59:56 -04:00
chenyu
bf33c5f796
remove gradient materialize_grads ( #15367 )
...
effectively default to True
and removed *0 hack in Tensor.copysign. now dy/dx=0 if y does not depend on x
remove
2026-03-19 23:36:03 -04:00
chenyu
45baf3ff3f
pin ci xcode version ( #15375 )
2026-03-19 23:13:16 -04:00
George Hotz
4091d37e8e
flat llama step work ( #15355 )
...
* flat llama step work
* fp8 support
* blacklisted matmul
* chestertons fence
2026-03-20 09:06:12 +08:00
qazal
176ad47d7d
cdna4 emulator testing ASM_GEMM in CI ( #15373 )
...
* cdna emulator work
* accvgprs
* cdna passes most tests
* ruff
* add cdna4 to tests
* cdna emu
* crash
* pass?
* work
* gen
* clean up wave_size access
* asm_gemm passes
* remove acc from dsl.py, emulator can keep its different reg file
it's purely an encoding here, the ASM_GEMM already encodes acc srcs with v[], this can
be cleaned up later, but not functionally required for emulator.
* split asm_gemm tests to ones fast on the emulator
* don't do that
* 124 stays null on rdna
* the segfault was because of hw regs, not this
* Revert "clean up wave_size access", it's explicitly tested
This reverts commit 1202ff5787.
* nullcopyout
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-03-20 05:51:30 +09:00
nimlgen
16daffc042
remote connection timeout ( #15370 )
2026-03-19 19:44:16 +08:00
Christopher Milan
68d7a6b7be
PYTHONREMU: fix vop3p literals ( #15372 )
2026-03-19 07:05:01 -04:00
George Hotz
70dad9d642
add PING to RemoteCmd ( #15371 )
...
* add PING to RemoteCmd
* cleanup
2026-03-19 18:57:40 +08:00
nimlgen
1c978aeedb
amd: fix aql remote ( #15368 )
2026-03-19 18:11:03 +08:00
qazal
337c684047
viz: cycle time relative to kernel start in sidebar ( #15352 )
2026-03-19 18:41:29 +09:00
chenyu
d81b03cff4
pad_to to mixin [pr] ( #15365 )
2026-03-19 05:02:01 -04:00
chenyu
1abb6297f6
more Tensor(UOp) cleanups ( #15364 )
...
* more Tensor(UOp) cleanups
* function too
2026-03-19 03:34:30 -04:00
nimlgen
cf50ca23c3
better oom msg ( #15362 )
...
* better oom msg
* s
2026-03-19 14:07:01 +08:00
nimlgen
1a53393512
remote in ci benchmark ( #15344 )
...
* remote in ci benchmark
* move to the end
* move
* ports
* own this
2026-03-19 13:49:09 +08:00
chenyu
92dfef8060
Tensor(uop) does not need explicit device ( #15361 )
2026-03-19 00:44:33 -04:00
nimlgen
f32c2e43a7
memory: use pfree ( #15360 )
2026-03-19 12:39:23 +08:00
nimlgen
86eec01f97
limit gl*lc ( #15359 )
2026-03-19 12:38:55 +08:00
chenyu
b39816e998
failed test case for Tensor(np, "bf16") ( #15358 )
2026-03-18 23:40:14 -04:00
chenyu
e407ee410c
cosmetic Tensor._do_reduction cleanups ( #15357 )
2026-03-18 22:27:50 -04:00
chenyu
6aebf95dac
move neg and invert to mixin ( #15356 )
2026-03-18 22:03:41 -04:00
wozeparrot
f6687d1ffc
feat: sd seed0 update ( #15354 )
2026-03-18 18:42:00 -07:00
wozeparrot
c45a606750
feat: no if in rand ( #15333 )
2026-03-18 15:09:51 -07:00
qazal
23e0431848
viz: switch sqtt sidebar to a simple asm list ( #15350 )
...
* work
* something like this
* Revert "something like this"
This reverts commit 6c45098d2b.
* less
* path includes
* scroll only jumps up and down
* it's only pc and line now
2026-03-19 01:40:25 +09:00
qazal
709fc52d7b
viz: fix auto zoom range in sqtt, include endpgm packet ( #15349 )
...
* viz: fix automatic zoom range in sqtt packets
* it's x+width
* include s_endpgm
* endpgm also doesn't have exec
2026-03-18 22:52:32 +09:00
nimlgen
d4836ddbb0
canonicalize device from tuple ( #15348 )
...
* will it fix ci?
* test
* um
2026-03-18 20:35:52 +08:00
George Hotz
5524916e39
llama compute gradients explicitly + 243 GB of RAM on MP=8 ( #15343 )
...
* llama compute gradients explicitly
* apply grads
* fix multi issue
* multi BUFFER_VIEW support
* simpler
* skip the flaky test
2026-03-18 19:54:40 +08:00