Commit Graph

12714 Commits

Author SHA1 Message Date
chenyu
bcc08307da removed unused named arg in rules [pr] (#15414) 2026-03-22 09:25:46 -04:00
qazal
2363bceb47 viz: no context enters in cli, update llama profile (#15404) 2026-03-22 05:47:02 +09:00
qazal
a9ceaf3c5f sqtt: link dispatch to exec (#15396)
* sqtt packet linking infra

python

* javascript

* ~doubly linked list

* ui works

* work

* exec can also highlight the pc, coloring work

* more work

* rm sqtt/model.py, doesn't need to be upstreamed
2026-03-21 23:48:58 +09:00
nimlgen
9656d97d97 jit: captures linears, not execitems (#15399)
* jit: captures linears, not execitems

* x

* um

* etsts

* mockcuda
2026-03-21 16:32:12 +08:00
George Hotz
c13d9d29ff add SHAPED_WMMA (#15400)
* add SHAPED_WMMA

* shaped wmma

* less bad
2026-03-21 16:16:03 +08:00
George Hotz
41a9b09683 minimal vec in amd_copy_matmul (#15398)
* minimal vec in amd_copy_matmul

* unified

* unify

* reshape/permute

* cleanups

* simpler

* move index

* cleanups

* more shared
2026-03-21 14:57:21 +08:00
qazal
30b3054fd5 whitespace cleanups in viz and sqtt.py (#15395) 2026-03-21 04:46:19 +09:00
qazal
71ccc69c52 FP8=1 llama works again, hipcc can run on macos (#15394)
* hipcc macos shim

* is_dtype_supported opens devices less
2026-03-20 23:43:15 +09:00
Christopher Milan
9470d5193a deterministic decomp apply order (#15393) 2026-03-20 08:10:45 -04:00
Christopher Milan
376585b003 use should_emulate for target dtype in decomp (#15392) 2026-03-20 07:44:57 -04:00
Christopher Milan
a12d3951de fix test_export_model imports (#15389) 2026-03-20 07:27:01 -04:00
George Hotz
1a2a203f48 add wmma support to amd_copy_matmul (#15384)
* add wmma support to amd_copy_matmul

* 15 TFLOPS and merged

* unify

* simpler

* simpler

* simpler

* cleanups

* TM/TN is the full regs

* comments

* WAVES_PER_SH + SQTT_EVENT

* Add WAVERDY support

* no split warp

* 3 range
2026-03-20 19:02:19 +08:00
Christopher Milan
1560b534a5 remove IMAGE=2 (#15312) 2026-03-20 06:26:52 -04:00
Christopher Milan
30d609432f ci: only xcode-select for gpuocelot on macos (#15387) 2026-03-20 05:58:16 -04:00
chenyu
d1b4e37dfa remove InvalidType branch in Tensor.__init__ (#15386)
it's handled by `elif isinstance(data, get_args(ConstType)):` already
2026-03-20 05:32:33 -04:00
chenyu
c491345766 pass device into Tensor._frompy (#15385)
* pass device into Tensor._frompy

with this, canonicalize_device is the only usage of Device in tensor.py

* export_model.py
2026-03-20 05:09:01 -04:00
George Hotz
3b75d8a7a2 fix double after bug in rangeify (#15381) 2026-03-20 14:53:46 +08:00
Christopher Milan
0c89340a1e automatically emulate unsupported (tiny) floats [skip_process_replay] (#15366) 2026-03-20 02:31:44 -04:00
George Hotz
78ad089817 make precompile the default for llm (#15376)
* make precompile the default for llm

* works

* empty is okay for kvcache

* fix cache misses

* more tests
2026-03-20 14:08:55 +08:00
chenyu
459ef41ea0 don't exclude weakint in is_dtype_supported [pr] (#15378) 2026-03-20 02:08:29 -04:00
qazal
cf6a429aaa mypy emulator pre-commit passing (#15379)
* fix dict stuff

* add type: ignores

* fix pcode to put uops not ints
2026-03-20 14:44:09 +09:00
wozeparrot
87c4ec1724 llama: use flat llama (#15353) 2026-03-19 22:12:38 -07:00
chenyu
da1700e16b dtypes.index -> dtypes.weakint (#15377) 2026-03-20 01:08:46 -04:00
nimlgen
3b04e3ea28 no gmmu mappings with GMMU=0 (#15369)
* usb

* free

* simple gmmu=0

* x

* x

* vram

* init tests

* ppg

* x
2026-03-20 12:18:34 +08:00
ridoy majumdar
c1183b8872 remove dead code in pyrender (#15115)
* remove dead code in pyrender

* retrig CI

* retrig CI

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-03-19 23:59:56 -04:00
chenyu
bf33c5f796 remove gradient materialize_grads (#15367)
effectively default to True

and removed *0 hack in Tensor.copysign. now dy/dx=0 if y does not depend on x

remove
2026-03-19 23:36:03 -04:00
chenyu
45baf3ff3f pin ci xcode version (#15375) 2026-03-19 23:13:16 -04:00
George Hotz
4091d37e8e flat llama step work (#15355)
* flat llama step work

* fp8 support

* blacklisted matmul

* chestertons fence
2026-03-20 09:06:12 +08:00
qazal
176ad47d7d cdna4 emulator testing ASM_GEMM in CI (#15373)
* cdna emulator work

* accvgprs

* cdna passes most tests

* ruff

* add cdna4 to tests

* cdna emu

* crash

* pass?

* work

* gen

* clean up wave_size access

* asm_gemm passes

* remove acc from dsl.py, emulator can keep its different reg file

it's purely an encoding here, the ASM_GEMM already encodes acc srcs with v[], this can
be cleaned up later, but not functionally required for emulator.

* split asm_gemm tests to ones fast on the emulator

* don't do that

* 124 stays null on rdna

* the segfault was because of hw regs, not this

* Revert "clean up wave_size access", it's explicitly tested

This reverts commit 1202ff5787.

* nullcopyout

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-03-20 05:51:30 +09:00
nimlgen
16daffc042 remote connection timeout (#15370) 2026-03-19 19:44:16 +08:00
Christopher Milan
68d7a6b7be PYTHONREMU: fix vop3p literals (#15372) 2026-03-19 07:05:01 -04:00
George Hotz
70dad9d642 add PING to RemoteCmd (#15371)
* add PING to RemoteCmd

* cleanup
2026-03-19 18:57:40 +08:00
nimlgen
1c978aeedb amd: fix aql remote (#15368) 2026-03-19 18:11:03 +08:00
qazal
337c684047 viz: cycle time relative to kernel start in sidebar (#15352) 2026-03-19 18:41:29 +09:00
chenyu
d81b03cff4 pad_to to mixin [pr] (#15365) 2026-03-19 05:02:01 -04:00
chenyu
1abb6297f6 more Tensor(UOp) cleanups (#15364)
* more Tensor(UOp) cleanups

* function too
2026-03-19 03:34:30 -04:00
nimlgen
cf50ca23c3 better oom msg (#15362)
* better oom msg

* s
2026-03-19 14:07:01 +08:00
nimlgen
1a53393512 remote in ci benchmark (#15344)
* remote in ci benchmark

* move to the end

* move

* ports

* own this
2026-03-19 13:49:09 +08:00
chenyu
92dfef8060 Tensor(uop) does not need explicit device (#15361) 2026-03-19 00:44:33 -04:00
nimlgen
f32c2e43a7 memory: use pfree (#15360) 2026-03-19 12:39:23 +08:00
nimlgen
86eec01f97 limit gl*lc (#15359) 2026-03-19 12:38:55 +08:00
chenyu
b39816e998 failed test case for Tensor(np, "bf16") (#15358) 2026-03-18 23:40:14 -04:00
chenyu
e407ee410c cosmetic Tensor._do_reduction cleanups (#15357) 2026-03-18 22:27:50 -04:00
chenyu
6aebf95dac move neg and invert to mixin (#15356) 2026-03-18 22:03:41 -04:00
wozeparrot
f6687d1ffc feat: sd seed0 update (#15354) 2026-03-18 18:42:00 -07:00
wozeparrot
c45a606750 feat: no if in rand (#15333) 2026-03-18 15:09:51 -07:00
qazal
23e0431848 viz: switch sqtt sidebar to a simple asm list (#15350)
* work

* something like this

* Revert "something like this"

This reverts commit 6c45098d2b.

* less

* path includes

* scroll only jumps up and down

* it's only pc and line now
2026-03-19 01:40:25 +09:00
qazal
709fc52d7b viz: fix auto zoom range in sqtt, include endpgm packet (#15349)
* viz: fix automatic zoom range in sqtt packets

* it's x+width

* include s_endpgm

* endpgm also doesn't have exec
2026-03-18 22:52:32 +09:00
nimlgen
d4836ddbb0 canonicalize device from tuple (#15348)
* will it ifx ci?

* test

* um
2026-03-18 20:35:52 +08:00
George Hotz
5524916e39 llama compute gradients explicitly + 243 GB of RAM on MP=8 (#15343)
* llama compute gradients explicitly

* apply grads

* fix multi issue

* multi BUFFER_VIEW support

* simpler

* skip the flaky test
2026-03-18 19:54:40 +08:00