qazal
7622be761f
add new remu instructions from #13533 ( #13539 )
2025-12-03 06:29:20 +08:00
qazal
c65aa93081
refactor sqtt loader to enable PMC=1 SQTT=0 ( #13526 )
2025-12-02 22:50:38 +08:00
wozeparrot
1b7dbfb37f
tk: named kernels + per kernel range id ( #13522 )
2025-12-01 22:51:04 -08:00
qazal
a5ec3b24be
viz: start PMC in the counters view ( #13510 )
2025-12-02 00:01:57 +08:00
George Hotz
97b56e11e0
hotfix: 32 workgroups for radeon 8050s
2025-11-30 08:20:17 -08:00
George Hotz
bd4b9de7d2
use numpy in amd_uop_matmul for simpler tracing ( #13503 )
2025-11-30 08:04:38 -08:00
qazal
9023ca30ef
show number of waves in each SE/CU ( #13491 )
...
* show number of waves in each SE/CU
* update to test_ones
2025-11-30 22:29:16 +08:00
nimlgen
455dd88236
nv: minimal hevc ( #13502 )
...
* nv: minimal hevc
* validate
* not needed
* tralin
* var
* cpu
* fxi
* desc
* move
* cleanup
2025-11-30 16:46:55 +03:00
qazal
d457ee0ba4
viz: correctly handle multiple sqtt traces of the same prg ( #13460 )
2025-11-29 20:52:41 +08:00
wozeparrot
ffc31a23f4
tk mi350 ( #13288 )
2025-11-25 15:49:44 -08:00
qazal
5520f1fb0b
viz: per cu timeline ( #13451 )
...
* add cu_loc
* work
* WAVE -> W
2025-11-26 00:05:20 +08:00
wozeparrot
249553a119
tinyfs tweaks ( #13444 )
2025-11-24 18:07:32 -08:00
wozeparrot
f46bc31156
tk: start and step in range ( #13442 )
2025-11-24 15:43:24 -08:00
qazal
2a9bd12700
sqtt: add occupancy events to the timeline ( #13430 )
2025-11-24 22:28:05 +08:00
qazal
712c7a6448
sqtt loader cleanups from the occupancy branch ( #13431 )
...
* cleanup err handling
* from disasms
* s/wave_execs/wave_insts
2025-11-23 21:50:34 +08:00
George Hotz
9d7a17ee39
beautiful SQTT_PARSE=1 with color ( #13428 )
...
* beautiful SQTT_PARSE=1 with color
* linter
* linter 2
* a few more labels
* filter and or
* wave alloc
* a few more
2025-11-23 01:05:14 -08:00
George Hotz
da0aa57a3b
add cu parsing to attempt_sqtt_parse
2025-11-22 22:09:05 -08:00
qazal
320ed78803
can view wave timeline with SQTT_ITRACE_SE_MASK=0 ( #13427 )
2025-11-23 13:55:47 +08:00
George Hotz
5110409339
continue work on parse sqtt, enable with SQTT_PARSE ( #13425 )
...
* continue work on parse sqtt, enable with SQTT_PARSE
* fix timing
* delta is pre instruction
* hi8 values
* a few more
* a bit more
* let it crash if you enabled it
* figure out simd
* hide 0x11
2025-11-22 19:03:17 -08:00
George Hotz
92170d0ff1
lil op cleanup ( #13424 )
...
* track flag count and op count
* text
* more
* file count
* lil op cleanup
* cleanups
* move
2025-11-22 15:21:15 -08:00
George Hotz
423b76a852
improve sqtt format parser (saturday coffee shop project) ( #13419 )
...
* improve sqtt format parser
* actually read the trash code ChatGPT wrote
* cleanups
* hand written parser
* quality
* more
* was missing first packet
* maybe
* filt
* fixups
* label the waves
* progress
2025-11-22 15:04:10 -08:00
qazal
c14033e10f
viz: faster startup time with SQTT=1 ( #13337 )
...
* roc.py cleanups
* direct append
* viz index cleanup
* simd row details
* add kernel arg
* late instructions decode
* more instruction decode to sep server request
* 200ms startup, 6 second to waves timeline
* sort units
* creating new http paths is easy now
* instructions unpacker
* min diff, use hyphens
* summary table
2025-11-22 22:02:30 +08:00
George Hotz
dabb02767f
set AMD profile mode with sudo on SQTT or PMC ( #13403 )
...
* require profile mode
* add mode setter
* cleanup
* not needed
* SQTT_LIMIT_SE
2025-11-20 23:19:11 -08:00
b1tg
91e289cb14
amd fp8 llvm ( #13186 )
...
* amd fp8 llvm support
* fix max
* clean
* add test_mi350.sh
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-20 12:35:57 -05:00
Roelof van Dijk
1058748440
torch backend: no aten.detach for torch 2.10 compat ( #13381 )
...
* this works, less cpp?
* simpler = better
* keep torch 2.9 working as well
2025-11-20 09:12:15 -08:00
qazal
9dbc550692
roc: map disassembly to prog name ( #13384 )
2025-11-20 23:47:19 +08:00
Roelof van Dijk
0dc2ff431d
fix: revive torch backend ( #13280 )
...
* fix: revive torch backend
* as_strided view vs copy
* Revert "as_strided view vs copy"
This reverts commit 82a61223f2.
* add extra tests (move inplace, add fusion tests)
* better fusion with inplace_op
* no optimizer hooks (break mnist training fusion)
* split off fusion tests in separate file, assert on resnet fusion
fix: remove comments
* cleanup, reduce diff
* reduce diff
* better fusion and identity checks
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-19 15:26:50 -08:00
wozeparrot
56b2540349
tk: keep extra tile data by replacing uop ( #13370 )
2025-11-19 15:11:43 -08:00
nimlgen
0c9fbf87e1
nvioctl: classes ( #13346 )
2025-11-19 16:14:15 +03:00
wozeparrot
be72b78dcb
tk: small fixes ( #13345 )
...
* fix: handle case where final uop isn't a tk wrapped one
* clean: remove after from mma
2025-11-19 00:58:50 -08:00
qazal
a647c9eca6
sqtt ui minor fixes ( #13335 )
...
* roc.py cleanups
* direct append
* viz index cleanup
* simd row details
2025-11-19 01:27:56 +08:00
nimlgen
331f70aa75
roc: ctrlc ( #13255 )
...
* roc: ctrl-c works
* rm
2025-11-18 19:29:28 +08:00
George Hotz
6d3385c284
print special ops in postrange ( #13318 )
...
* print special ops in postrange
* fix on OSX
2025-11-17 14:43:23 -08:00
George Hotz
98e9e73286
hotfix: amd_uop_matmul getenvs
2025-11-17 13:26:01 -08:00
qazal
e7e1935225
cleanup sqtt/test_timing ( #13315 )
2025-11-18 04:28:05 +08:00
wozeparrot
33773fda87
tk initial mi350 ( #13289 )
2025-11-17 11:46:32 -08:00
nimlgen
e2cee64050
Revert "hcq: add tag to exec events ( #13311 )" ( #13314 )
...
This reverts commit f63ded5817.
2025-11-17 22:15:31 +03:00
nimlgen
f63ded5817
hcq: add tag to exec events ( #13311 )
...
* hcq: add tag to exec events
* f
* fix
* fix
2025-11-17 16:59:30 +03:00
qazal
50a443f558
viz: add shader engine to wave exec payload ( #13310 )
...
* viz: show sqtt shader engine
* order it from smallest unit
* easier to config
2025-11-17 19:11:34 +08:00
George Hotz
55be95da15
cleanup sqtt raw parser ( #13309 )
...
* cleanup sqtt raw parser
* better names (don't merge yet)
* clean up amd
* a few more names
* one more filter
2025-11-16 13:11:51 -08:00
George Hotz
cabd4add48
more work parsing SQTT, separate VIZ/PROFILE ( #13308 )
...
* more work parsing SQTT
* more minimal runner
* sep VIZ/PROFILE
* parse print new
* improve parser
* more filter
* that
* split them
* lil cleanup
* skip flaky test
* AQL in mmapeak
2025-11-16 10:40:39 -08:00
qazal
13efdf8c31
test s_nop stall ( #13307 )
2025-11-17 00:59:39 +08:00
George Hotz
295600dc5a
saturday coffee shop work parsing the att format ( #13295 )
...
* saturday coffee shop work parsing the att format
* add examples
* parser
* classes of packets
* fully vibe coded parser
* vibing
* empty
* some vibe names
* vibes
* most of these are wrong
* more vibes
* better names
* parsing
* parse
* cleanup parser
* touchups
2025-11-16 08:25:51 -08:00
qazal
c70b06ec19
sqtt test_timing work ( #13304 )
...
* sqtt test_timing cleanups
* only the instruction
* v_mfma_f32_16x16x32_f16 16 cycles, only after second one though
2025-11-16 23:49:24 +08:00
wozeparrot
ef42334239
tk: load store cleanup ( #13290 )
2025-11-15 17:08:23 -08:00
qazal
7c110e1a57
viz: minor cleanups for sqtt ( #13275 )
...
* small prg cleanup
* test_timing
2025-11-15 01:08:56 +08:00
qazal
2ee701a009
roc: fix CEnum access ( #13270 )
...
* roc: add decoder to ci
* also add installer
* use CEnum syntax
* try 2
* add to setup
* revert ci change
* the other enum too
2025-11-14 21:41:24 +08:00
nimlgen
14eb48b13a
autogen: rename nv_gpu to nv_570 ( #13273 )
...
* autogen: rename nv_gpu to nv_570
* rename
2025-11-14 20:07:19 +08:00
Christopher Milan
09f3aae169
In-tree autogen: all C libraries ( #13220 )
...
* checkout files from autogen branch
* ioctl with payload
* fix am generations
* properly fix generations
This reverts commit b2a54f4f41.
* revert discovery.h
* support pragma pack(1)
* typo
* better getter
* typo
* NVCEC0_QMDV05_00_RELEASE[01]_ENABLE
* align support
* anon handling fix
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 18:57:44 -08:00
wozeparrot
777cbec5b3
tk: rename rt tile dims to base ( #13265 )
2025-11-13 18:43:02 -08:00