qazal
d8f87ae2f2
SQTT packets to assembly mapper ( #14198 )
...
* disasm + compare to llvm
* start inst trace
* base tests pass
* work
* work
* all kernels
* qol
* refactor
* work
* work
* wave_focus
* simple
* work
* add a lot of asserts
* focus on wave0
* correct handling of IMMEDIATE_MASK
* work
* viz work
* use the metadata infra
* better
2026-01-18 16:32:13 +09:00
Christopher Milan
1eb110cd7d
fix memory corruption in NIR, reenable process replay ( #14204 )
2026-01-18 02:05:12 -05:00
George Hotz
a51e0a86db
assembly/amd: clean up disasm.py + add CDNA support ( #14200 )
...
* assembly/amd: clean up disasm.py
* cleanups
* add missing encodings
* decode is pretty
* cdna
* assert on failure
* cdna roundtrip
* cdna passing
* test
* lil cleanup
* variant cleanups
* cleanups
2026-01-18 14:48:44 +09:00
chenyu
4b18c92bc5
simpler Context.__enter__ [pr] ( #14201 )
2026-01-18 00:38:59 -05:00
qazal
feaa804158
skip lvp process replay in CI [pr] ( #14202 )
2026-01-18 13:25:04 +09:00
chenyu
b12a9fea80
runtime int call instead of cast(int) ( #14183 )
2026-01-17 20:34:45 -05:00
George Hotz
79c1559f69
amd asm can still be simpler ( #14199 )
...
* amd asm can still be simpler
* simpler
* V_LANE_ID
* simpler
* simpler
* compact vgpr
2026-01-17 18:40:10 +09:00
chenyu
5e6a72c33f
new Onnx Gather ( #14187 )
...
instead of assuming const indices, check if it showed as a const
2026-01-16 22:24:07 -05:00
George Hotz
9f7f2f0e0c
MAX_SQTT_PKTS
2026-01-17 12:05:36 +09:00
George Hotz
50554115ee
fix VALU_SALU / IMMED_MASK and improve amd_asm_matmul ( #14196 )
...
* fix VALU_SALU / IMMED_MASK and improve amd_asm_matmul
* immed
* wave override
* restore ALT
* advance sgprs correctly
* no helpers
* decrease to 192 VGPRs
2026-01-17 11:58:34 +09:00
chenyu
ab244c7f81
onnx Gather should not assume indices to be const ( #14185 )
...
* onnx Gather should not assume indices to be const
added a failed test case
* just list
2026-01-16 20:55:00 -05:00
wozeparrot
a879b54234
tk: fa jit fix ( #14170 )
2026-01-16 16:38:45 -08:00
qazal
a8ae9757dd
viz: put alts in the same row, LDS color ( #14194 )
...
* viz: put alts in the same row, coloring work
* assert if packets overlap
* lds color
2026-01-17 09:36:14 +09:00
qazal
5aa71f437b
viz: precise clock cycles in PKTS ( #14179 )
...
* viz: relative clock cycles in PKTS
* format clocks as xM yK 999 cycles
2026-01-17 09:08:13 +09:00
Christopher Milan
eafcd44d95
fix OSX image pitch ( #14193 )
2026-01-16 19:07:33 -05:00
Christopher Milan
3960e2758c
suppress_finalizing in hip ( #14189 )
2026-01-16 18:56:29 -05:00
qazal
9302ab003a
viz: show ALT/OTHER packets on second lane ( #14192 )
...
* viz: show dimmer ALT/OTHER packets
* remove todo comment
* work
* current vmem is gray
2026-01-17 08:55:24 +09:00
qazal
551454f476
viz: fix wave sort, show message if sqtt trace is empty ( #14190 )
...
* show message if sqtt trace is empty
* work
* fix wave sort
* back
2026-01-17 08:01:26 +09:00
George Hotz
8a2549d42b
improve amd_asm_matmul + minor VIZ PKTS improvements ( #14186 )
...
* improve amd_asm_matmul + minor VIZ PKTS improvements
* fix waitcnt issue
* cleanups
2026-01-17 06:56:59 +09:00
George Hotz
7d1d9d4568
assembly/amd: remove IMG instruction support and asm.py ( #14163 )
...
* assembly/amd: remove IMG instruction support
* remove asm.py
* op2dsl
2026-01-17 06:21:50 +09:00
chenyu
dc4ae7dd08
lower ASSERT_MIN_STEP_TIME for driving_policy to 3ms ( #14184 )
...
seems quite stable at 2.7ms now
2026-01-16 15:04:53 -05:00
chenyu
0a14e1fcd4
fix some type ignore ( #14182 )
2026-01-16 13:56:45 -05:00
chenyu
fc10470883
add UOp.__index__ ( #14181 )
...
Tensor slice is handled by __getitem__, so the index method is just for SupportsIndex
2026-01-16 12:28:33 -05:00
chenyu
6790165ef8
minor _apply_uop cleanup ( #14180 )
...
give fxn a return type and minor style change
2026-01-16 11:27:55 -05:00
nimlgen
e855ec8ee3
tbgpu: refactor dext to support user mappings ( #14177 )
2026-01-16 15:55:57 +03:00
qazal
bbc55962ee
viz: color SQTT INST Ops like UOps ( #14175 )
2026-01-16 21:24:43 +09:00
qazal
3751b29a3d
viz: skip OTHER_ SQTT packets ( #14178 )
2026-01-16 20:37:19 +09:00
qazal
7c1f1cb2bc
viz: fix INST packets coloring ( #14176 )
...
* viz: fix INST packets coloring
* work
2026-01-16 18:46:13 +09:00
qazal
1696991988
viz: add PKTS group to sqtt trace ( #14173 )
...
* viz: add PKTS group to sqtt trace
* soft_err for rdna4
* different itrace
2026-01-16 17:29:47 +09:00
Christopher Milan
a021b84604
autogen: fix enum ( #14171 )
2026-01-16 01:30:11 -05:00
qazal
fa5475307c
viz: collapse wave packets in one row, 1 clk per packet ( #14169 )
...
* per wave packets in one row
* work
* row_tuple
* cleaner
* one row and one lane per wave
* globals split into rows based on type
* barrier length
2026-01-16 13:52:07 +09:00
Christopher Milan
5abc262e22
fix dll.bind caching ( #14168 )
2026-01-15 20:25:42 -05:00
Christopher Milan
f9ca072b61
cuda compilers disassemble properly ( #14166 )
...
* cuda compilers disassemble properly
* this can use system
2026-01-15 19:02:40 -05:00
chenyu
14e9a71a41
move test_assign to unit ( #14165 )
...
scheduling these should not depend on device
2026-01-15 17:10:13 -05:00
nimlgen
a0dd9d2146
tbgpu: correct com.apple.developer.driverkit.transport.pci entitlements ( #14164 )
...
* tbgpu: correct com.apple.developer.driverkit.transport.pci entitlements
* format
2026-01-15 20:56:39 +03:00
qazal
32e1c267ee
viz: SQTT timeline with our decoder ( #14139 )
...
* viz: sqtt OCC/INST timeline in our decoder
* todo
* lint
* work
* cleaner
* profiling
* better timing
* keep the generic api
* more generic
* 80x -> 20x off the C decoder
* unusably slow
* rm filters
* work
* work
* other way to sort ops
* work
* first 10k
* 100K actually tells a story
* barrier INST packets get their own red color and row
* minor detail
* 50K
* soft_err
2026-01-15 20:45:16 +09:00
Christopher Milan
0cb024a5bb
remove ctypes.Structure ( #13651 )
2026-01-15 05:06:22 -05:00
George Hotz
255e0573b1
assembly/amd: clean up asm/disasm ( #14158 )
...
* assembly/amd: clean up asm/disasm
* update disasm
* revert dumb stuff
* update decode
* use fmt
2026-01-15 17:45:40 +09:00
qazal
164bc678a6
scheduler: sched_cache bugfix for different Tensor.custom_kernel schedules ( #14161 )
...
* simplest failing test
* min fix
* same function reuses the cache
* SPEC=2 never worked for custom_kernel
2026-01-15 14:59:14 +09:00
qazal
b46da603fe
codegen/custom_kernel: do not attach KernelInfo to user program ( #14160 )
2026-01-15 14:01:48 +09:00
George Hotz
fd60626ea1
assembly/amd: refactor to use op_bits/op_regs ( #14156 )
...
* assembly/amd: refactor to use op_bits/op_regs
* remove that skip
* remove another hack
* remove another hack
* precompute mask
* more reg, less hasattr
2026-01-15 11:20:21 +09:00
chenyu
add7da268f
multiple slice assign test ( #14157 )
...
GANing test cases
2026-01-14 21:08:03 -05:00
George Hotz
e9ce12028e
assembly/amd: amdxml cleanups, remove broken SDWA/DPP, merge in pdf.py ( #14154 )
...
* assembly/amd: amdxml cleanups, remove broken SDWA/DPP
* remove buf junk
* simplify
* simplify
* lil cleanup
* dead fixes
* strip non pcode extraction from pdf
* merge pdf.py into amdxml.py
* only amdxml
2026-01-15 09:23:19 +09:00
wozeparrot
7e5687f6a3
more fa multi fix ( #14152 )
2026-01-14 13:57:11 -08:00
chenyu
1381daac06
many more failed assign tests ( #14153 )
...
assign is quite broken
2026-01-14 16:20:28 -05:00
nimlgen
8c55ef4f01
amd: cleanup props ( #14145 )
...
* amd: cleanup props
* f
2026-01-14 20:27:41 +03:00
chenyu
899a56446e
failed assign test cases with write before read ( #14148 )
...
slice assign write before read fails now. this is why kv cache needs a realize
2026-01-14 10:30:50 -05:00
chenyu
986e865830
fix TINY_BACKEND=1 cumsum ( #14138 )
...
* fix TINY_BACKEND=1 cumsum
old hack was wrong, need to apply contiguous on the input
* test time
* test_linalg_svd is slow
2026-01-14 09:54:49 -05:00
qazal
434dbafab5
optional Estimates in KernelInfo ( #14147 )
...
* optional Estimates in KernelInfo
* custom asm test plumbing
* s_code_end
* estimates test
* vaddr arg in global_store
* kernel desc
* Ops.DEVICE name
2026-01-14 22:55:03 +09:00
qazal
76b577ee76
viz: only SIMD name in sqtt timeline rows ( #14146 )
2026-01-14 20:13:27 +09:00