wozeparrot
ef42334239
tk: load store cleanup ( #13290 )
2025-11-15 17:08:23 -08:00
qazal
7c110e1a57
viz: minor cleanups for sqtt ( #13275 )
...
* small prg cleanup
* test_timing
2025-11-15 01:08:56 +08:00
qazal
2ee701a009
roc: fix CEnum access ( #13270 )
...
* roc: add decoder to ci
* also add installer
* use CEnum syntax
* try 2
* add to setup
* revert ci change
* the other enum too
2025-11-14 21:41:24 +08:00
nimlgen
14eb48b13a
autogen: rename nv_gpu to nv_570 ( #13273 )
...
* autogen: rename nv_gpu to nv_570
* rename
2025-11-14 20:07:19 +08:00
Christopher Milan
09f3aae169
In-tree autogen: all C libraries ( #13220 )
...
* checkout files from autogen branch
* ioctl with payload
* fix am generations
* properly fix generations
This reverts commit b2a54f4f41.
* revert discovery.h
* support pragma pack(1)
* typo
* better getter
* typo
* NVCEC0_QMDV05_00_RELEASE[01]_ENABLE
* align support
* anon handling fix
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 18:57:44 -08:00
wozeparrot
777cbec5b3
tk: rename rt tile dims to base ( #13265 )
2025-11-13 18:43:02 -08:00
wozeparrot
7eb0d8e744
feat: mixins on tiles ( #13246 )
2025-11-13 16:52:52 -08:00
George Hotz
ba84d415fe
work from benchmarking tinybox red v2 ( #13264 )
...
* work from benchmarking tinybox red v2
* gpuburn
2025-11-13 16:38:40 -08:00
wozeparrot
547304c471
tk: group cleanup ( #13262 )
2025-11-13 14:19:51 -08:00
wozeparrot
4ada51618f
tk: don't flatten in clear ( #13249 )
2025-11-13 13:38:01 -08:00
George Hotz
faf68c03a8
more mi350x matmul work ( #13138 )
...
* more mi350x matmul work
* broken compute
2025-11-13 09:09:28 -08:00
alpharush
7e0aaadecd
feat: add repro command to summary ( #10930 )
2025-11-13 08:52:27 -08:00
nimlgen
f9b7586e08
roc: fix blob gc ( #13256 )
2025-11-13 23:38:35 +08:00
qazal
006dea4c3e
roc: only save instruction execs ( #13254 )
2025-11-13 21:28:40 +08:00
George Hotz
17aa3379e9
hotfix: improve self_tokenize
2025-11-13 00:18:57 -08:00
qazal
be2e24cb25
roc: requires sudo to install ( #13237 )
2025-11-12 16:59:22 -05:00
George Hotz
8f1f195b6d
hotfix: no hexdump for usbgpu patch.py
2025-11-12 12:05:37 -08:00
qazal
8b26cf2b3d
sqtt: update rcp timing test ( #13231 )
...
* sqtt: assert correct output in timing test
* found why
2025-11-13 02:01:54 +08:00
nimlgen
af17e07251
viz: sqtt touchups ( #13228 )
...
* viz: sqtt touchups
* revert
* matches
2025-11-12 22:40:37 +08:00
nimlgen
fcd8d0751a
test_timing for hip ( #13229 )
2025-11-12 20:28:58 +08:00
wozeparrot
371c1f2355
tk: move tiles to class ( #13224 )
2025-11-11 21:53:46 -08:00
wozeparrot
787f0070ed
feat: don't use output reg as local reduce reg ( #13203 )
2025-11-11 14:35:16 -08:00
George Hotz
0c978d45e6
stub attention ( #13196 )
...
* stub attention
* name the kernels
2025-11-10 13:48:38 -08:00
qazal
50934050bc
sqtt: append all wave execs ( #13190 )
2025-11-10 23:50:08 +08:00
qazal
38a24731a1
cleanup sqtt tooling ( #13188 )
...
* cleanup viz/serve.py
* use latest profile in rgptool.py
* unwrap nullable in roc.py, fix disasms typing
2025-11-10 20:52:57 +08:00
wozeparrot
6252831ceb
feat: initial tk library ( #13160 )
2025-11-09 22:54:29 -08:00
George Hotz
d7369de048
hotfix: update weekly commits table
2025-11-09 19:37:06 -08:00
nimlgen
614783693e
nv: remove hardcoded expansion_rom_off ( #13180 )
...
* nv: remove hardcoded expansion_rom_off
* to max size
2025-11-09 21:43:19 +08:00
nimlgen
10dc8335d2
tinygpu: fix teardown crash ( #13143 )
...
* tinygpu: fix crash
* um?
* double release
* restore
2025-11-07 19:52:54 +08:00
qazal
7e94369464
add helper for test_timing custom ops ( #13140 )
2025-11-07 17:13:55 +08:00
nimlgen
95620426d5
tinygpu: unmap dma when client closed ( #13129 )
...
* tinygpu: unmap dma when client closed
* syn
* tiny fixes
2025-11-07 16:08:43 +08:00
nimlgen
b9b68bf437
amd: add kern to sqtt event ( #13126 )
...
* amd: add kern to sqtt event
* fix
2025-11-06 22:02:02 +08:00
qazal
88245d6579
qol improvements to sqtt decoder and timing tests ( #13125 )
2025-11-06 20:51:30 +08:00
George Hotz
bcfe42937f
move permute/flip/shrink to mixins ( #13113 )
...
* move permute to mixins
* move more stuff
* two more
* fix local mypy
* fix tests
* fix shrink
2025-11-05 14:14:15 -08:00
George Hotz
2d4f01fda0
move mixins to mixin dir ( #13105 )
...
* move mixins to mixin dir
* math
2025-11-05 10:18:33 -08:00
nimlgen
eff80beeed
amd: props in device not sqtt ( #13106 )
...
* amd: props in device not sqtt
* fix
* f
* fix
* fix
2025-11-05 23:43:20 +08:00
qazal
8119d9f082
sqtt: decode each instruction exec ( #13093 )
...
* sqtt: decode each instruction exec
* start tests
* run_asm
* capture sqtt per kernel
* chaining vgprs
* test things
* inst_execs in viz
* can also configure l and g
* 1l + cleanup
* test_sleep
* test_wmma
* work
* test sleep with llvm builtin
2025-11-05 17:30:27 +08:00
nimlgen
eaf7cbc178
amd: flush sqtt after each kernel ( #13092 )
...
* amd: flush sqtt after each kernel
* merge for rgp
2025-11-04 22:12:48 +08:00
nimlgen
49191ada77
roc: install sqtt decoder ( #13091 )
...
* roc: install?
* msg
* 0.1.4
2025-11-04 18:56:01 +08:00
nimlgen
2e97eaa866
roc: no nullptr when no wave instructions ( #13087 )
2025-11-04 17:32:14 +08:00
wozeparrot
9c00c0688a
tk fa: use 16x64 tiles ( #13086 )
2025-11-03 18:25:38 -08:00
wozeparrot
4ed0f216b5
fix: make max_matmul run again ( #13085 )
2025-11-03 18:09:09 -08:00
qazal
6df34a5887
lint sqtt parser with mypy ( #13079 )
...
* llvm address table errs
* mypy likes annotated dicts
* unwrap nullable
2025-11-04 00:53:59 +08:00
nimlgen
dfde3f54d9
rocprof: use llvm disasm ( #13077 )
...
* rocprof: use llvm disasm
* rm
2025-11-03 23:58:58 +08:00
qazal
27d42fd575
sqtt decoder print behind DEBUG>=5 ( #13076 )
...
* sqtt decoder print behind DEBUG>=5
* gfx version stuff also behind 5
2025-11-03 23:20:03 +08:00
George Hotz
416b15cc59
improve uop matmul syntax ( #13074 )
...
* improve uop matmul syntax
* store takes const
* copy
* cleanups
* faster and simpler
* label them reduce
* better syntax
* touchup
2025-11-03 21:34:26 +08:00
qazal
1c0d4f1cd2
viz: counters loader ( #12987 )
...
* standalone custom loader
* first iteration on the ui
* work
* add center helper
* add edge offsets
* enumerate all edge types
* try dagre layout algorithm
* simpler spec
* bring back double edges
* more work on edge paths
* aesthetics
* custom edges also works
* dimmer inactive links
* cleanup
* cleanup
* split out the ncu layout
* this is just a k/v map now
* rm that
* more cleanup and comments
* do work
* also this work
* simpler start
* rm that
* sqtt work
* view sqtt
* sqtt
* --custom is just in profile
* wrap c call
* from tinygrad install
* eg. module not found
2025-11-03 19:42:36 +08:00
George Hotz
1e3d6e49a6
index slicing + allclose ( #13071 )
...
* continue work on slicing+allclose
* Revert "Revert "slicing + allclose""
This reverts commit 6c7a12f21c.
* fix tests + better syntax
* forgot an after
* slot is an integer
2025-11-03 13:01:48 +08:00
George Hotz
8cbef912d2
move reshape to MathTraits ( #13054 )
...
* move reshape to MathTraits
* confirm it works in amd_uop_matmul
2025-11-02 12:56:15 +08:00
George Hotz
267be7fc5e
fp16 acc
2025-11-02 12:53:04 +08:00