George Hotz
423b76a852
improve sqtt format parser (saturday coffee shop project) (#13419)
* improve sqtt format parser
* actually read the trash code ChatGPT wrote
* cleanups
* hand written parser
* quality
* more
* was missing first packet
* maybe
* filt
* fixups
* label the waves
* progress
2025-11-22 15:04:10 -08:00
qazal
c14033e10f
viz: faster startup time with SQTT=1 (#13337)
* roc.py cleanups
* direct append
* viz index cleanup
* simd row details
* add kernel arg
* late instructions decode
* more instruction decode to sep server request
* 200ms startup, 6 second to waves timeline
* sort units
* creating new http paths is easy now
* instructions unpacker
* min diff, use hyphens
* summary table
2025-11-22 22:02:30 +08:00
George Hotz
dabb02767f
set AMD profile mode with sudo on SQTT or PMC (#13403)
* require profile mode
* add mode setter
* cleanup
* not needed
* SQTT_LIMIT_SE
2025-11-20 23:19:11 -08:00
b1tg
91e289cb14
amd fp8 llvm (#13186)
* amd fp8 llvm support
* fix max
* clean
* add test_mi350.sh
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-20 12:35:57 -05:00
Roelof van Dijk
1058748440
torch backend: no aten.detach for torch 2.10 compat (#13381)
* this works, less cpp?
* simpler = better
* keep torch 2.9 working as well
2025-11-20 09:12:15 -08:00
qazal
9dbc550692
roc: map disassembly to prog name (#13384)
2025-11-20 23:47:19 +08:00
Roelof van Dijk
0dc2ff431d
fix: revive torch backend (#13280)
* fix: revive torch backend
* as_strided view vs copy
* Revert "as_strided view vs copy"
This reverts commit 82a61223f2.
* add extra tests (move inplace, add fusion tests)
* better fusion with inplace_op
* no optimizer hooks (break mnist training fusion)
* split off fusion tests in separate file, assert on resnet fusion
fix: remove comments
* cleanup, reduce diff
* reduce diff
* better fusion and identity checks
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-19 15:26:50 -08:00
wozeparrot
56b2540349
tk: keep extra tile data by replacing uop (#13370)
2025-11-19 15:11:43 -08:00
nimlgen
0c9fbf87e1
nvioctl: classes (#13346)
2025-11-19 16:14:15 +03:00
wozeparrot
be72b78dcb
tk: small fixes (#13345)
* fix: handle case where final uop isn't a tk wrapped one
* clean: remove after from mma
2025-11-19 00:58:50 -08:00
qazal
a647c9eca6
sqtt ui minor fixes (#13335)
* roc.py cleanups
* direct append
* viz index cleanup
* simd row details
2025-11-19 01:27:56 +08:00
nimlgen
331f70aa75
roc: ctrlc (#13255)
* roc: ctrl-c works
* rm
2025-11-18 19:29:28 +08:00
George Hotz
6d3385c284
print special ops in postrange (#13318)
* print special ops in postrange
* fix on OSX
2025-11-17 14:43:23 -08:00
George Hotz
98e9e73286
hotfix: amd_uop_matmul getenvs
2025-11-17 13:26:01 -08:00
qazal
e7e1935225
cleanup sqtt/test_timing (#13315)
2025-11-18 04:28:05 +08:00
wozeparrot
33773fda87
tk initial mi350 (#13289)
2025-11-17 11:46:32 -08:00
nimlgen
e2cee64050
Revert "hcq: add tag to exec events (#13311)" (#13314)
This reverts commit f63ded5817.
2025-11-17 22:15:31 +03:00
nimlgen
f63ded5817
hcq: add tag to exec events (#13311)
* hcq: add tag to exec events
* f
* fix
* fix
2025-11-17 16:59:30 +03:00
qazal
50a443f558
viz: add shader engine to wave exec payload (#13310)
* viz: show sqtt shader engine
* order it from smallest unit
* easier to config
2025-11-17 19:11:34 +08:00
George Hotz
55be95da15
cleanup sqtt raw parser (#13309)
* cleanup sqtt raw parser
* better names (don't merge yet)
* clean up amd
* a few more names
* one more filter
2025-11-16 13:11:51 -08:00
George Hotz
cabd4add48
more work parsing SQTT, separate VIZ/PROFILE (#13308)
* more work parsing SQTT
* more minimal runner
* sep VIZ/PROFILE
* parse print new
* improve parser
* more filter
* that
* split them
* lil cleanup
* skip flaky test
* AQL in mmapeak
2025-11-16 10:40:39 -08:00
qazal
13efdf8c31
test s_nop stall (#13307)
2025-11-17 00:59:39 +08:00
George Hotz
295600dc5a
saturday coffee shop work parsing the att format (#13295)
* saturday coffee shop work parsing the att format
* add examples
* parser
* classes of packets
* fully vibe coded parser
* vibing
* empty
* some vibe names
* vibes
* most of these are wrong
* more vibes
* better names
* parsing
* parse
* cleanup parser
* touchups
2025-11-16 08:25:51 -08:00
qazal
c70b06ec19
sqtt test_timing work (#13304)
* sqtt test_timing cleanups
* only the instruction
* v_mfma_f32_16x16x32_f16 16 cycles, only after second one though
2025-11-16 23:49:24 +08:00
wozeparrot
ef42334239
tk: load store cleanup (#13290)
2025-11-15 17:08:23 -08:00
qazal
7c110e1a57
viz: minor cleanups for sqtt (#13275)
* small prg cleanup
* test_timing
2025-11-15 01:08:56 +08:00
qazal
2ee701a009
roc: fix CEnum access (#13270)
* roc: add decoder to ci
* also add installer
* use CEnum syntax
* try 2
* add to setup
* revert ci change
* the other enum too
2025-11-14 21:41:24 +08:00
nimlgen
14eb48b13a
autogen: rename nv_gpu to nv_570 (#13273)
* autogen: rename nv_gpu to nv_570
* rename
2025-11-14 20:07:19 +08:00
Christopher Milan
09f3aae169
In-tree autogen: all C libraries (#13220)
* checkout files from autogen branch
* ioctl with payload
* fix am generations
* properly fix generations
This reverts commit b2a54f4f41.
* revert discovery.h
* support pragma pack(1)
* typo
* better getter
* typo
* NVCEC0_QMDV05_00_RELEASE[01]_ENABLE
* align support
* anon handling fix
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 18:57:44 -08:00
wozeparrot
777cbec5b3
tk: rename rt tile dims to base (#13265)
2025-11-13 18:43:02 -08:00
wozeparrot
7eb0d8e744
feat: mixins on tiles (#13246)
2025-11-13 16:52:52 -08:00
George Hotz
ba84d415fe
work from benchmarking tinybox red v2 (#13264)
* work from benchmarking tinybox red v2
* gpuburn
2025-11-13 16:38:40 -08:00
wozeparrot
547304c471
tk: group cleanup (#13262)
2025-11-13 14:19:51 -08:00
wozeparrot
4ada51618f
tk: don't flatten in clear (#13249)
2025-11-13 13:38:01 -08:00
George Hotz
faf68c03a8
more mi350x matmul work (#13138)
* more mi350x matmul work
* broken compute
2025-11-13 09:09:28 -08:00
alpharush
7e0aaadecd
feat: add repro command to summary (#10930)
2025-11-13 08:52:27 -08:00
nimlgen
f9b7586e08
roc: fix blob gc (#13256)
2025-11-13 23:38:35 +08:00
qazal
006dea4c3e
roc: only save instruction execs (#13254)
2025-11-13 21:28:40 +08:00
George Hotz
17aa3379e9
hotfix: improve self_tokenize
2025-11-13 00:18:57 -08:00
qazal
be2e24cb25
roc: requires sudo to install (#13237)
2025-11-12 16:59:22 -05:00
George Hotz
8f1f195b6d
hotfix: no hexdump for usbgpu patch.py
2025-11-12 12:05:37 -08:00
qazal
8b26cf2b3d
sqtt: update rcp timing test (#13231)
* sqtt: assert correct output in timing test
* found why
2025-11-13 02:01:54 +08:00
nimlgen
af17e07251
viz: sqtt touchups (#13228)
* viz: sqtt touchups
* revert
* matches
2025-11-12 22:40:37 +08:00
nimlgen
fcd8d0751a
test_timing for hip (#13229)
2025-11-12 20:28:58 +08:00
wozeparrot
371c1f2355
tk: move tiles to class (#13224)
2025-11-11 21:53:46 -08:00
wozeparrot
787f0070ed
feat: don't use output reg as local reduce reg (#13203)
2025-11-11 14:35:16 -08:00
George Hotz
0c978d45e6
stub attention (#13196)
* stub attention
* name the kernels
2025-11-10 13:48:38 -08:00
qazal
50934050bc
sqtt: append all wave execs (#13190)
2025-11-10 23:50:08 +08:00
qazal
38a24731a1
cleanup sqtt tooling (#13188)
* cleanup viz/serve.py
* use latest profile in rgptool.py
* unwrap nullable in roc.py, fix disasms typing
2025-11-10 20:52:57 +08:00
wozeparrot
6252831ceb
feat: initial tk library (#13160)
2025-11-09 22:54:29 -08:00