11782 Commits

Author SHA1 Message Date
nimlgen
acb0045ba0 system: alloc_sysmem is part of interface (#14226) 2026-01-19 18:15:54 +03:00
qazal
ab426cb671 viz: simplify row line logic (#14227) 2026-01-20 00:00:28 +09:00
nimlgen
01653db4fd nv: GPPut is mmiointerface (#14225) 2026-01-19 17:36:26 +03:00
nimlgen
7cb7abeeb0 amd: fix scratch_wave64_lane_byte_size (#14223) 2026-01-19 15:21:39 +03:00
nimlgen
979ce211f7 amd: missing self in aql's exec (#14224) 2026-01-19 14:27:54 +03:00
George Hotz
31bcbed6bb AMD_DISABLE_SDMA for testing with -n12 (#14216) 2026-01-19 16:10:30 +09:00
qazal
578a4a50d3 viz: row lines in timeline (#14213)
* simple start, already works for memory graph

* add height to exec packets

* math.max, border-color

* borderline is in pixels

* row border color
2026-01-19 13:01:43 +09:00
Christopher Milan
161fee9a48 move is_dtype_supported logic to renderer (#14188)
* move is_dtype_supported logic to renderer

* fix CPU_COUNT

* mypy happy

* early import libclang too with llvm

* run with debug

* skip autogen tests if MTLCompiler or llvm is loaded

* run autogen tests separately in CI

* lint
2026-01-18 22:37:04 -05:00
qazal
7abe9b020f viz: add border colors to pkts timeline (#14211)
* viz: add border colors to pkts timeline

* 10
2026-01-19 11:37:46 +09:00
chenyu
67d9712ef6 jit copy aliased output if it's read later (#14210) 2026-01-18 18:48:59 -05:00
chenyu
97333b1954 jit footguns test case on assign with same buffer outputs (#14209)
related https://github.com/tinygrad/tinygrad/issues/13364
2026-01-18 16:01:09 -05:00
chenyu
e7c2df9113 improve consecutive Tensor indexing (#14208)
* improve consecutive Tensor indexing

instead of O(idx_counts*src_dims), it can just be O(idx_counts)

* test correctness
2026-01-18 15:14:33 -05:00
chenyu
c7b8f6496f remove dtypes.index_like and dtypes.fields [pr] (#14207)
barely used, so just use inline and DTYPES_DICT
2026-01-18 11:49:01 -05:00
qazal
e27a0002c5 viz: only keep the sqtt bytes for pkts (#14203)
* viz: only keep the sqtt bytes for pkts

* better option name

* work

* renames
2026-01-18 17:04:26 +09:00
qazal
d8f87ae2f2 SQTT packets to assembly mapper (#14198)
* disasm + compare to llvm

* start inst trace

* base tests pass

* work

* work

* all kernels

* qol

* refactor

* work

* work

* wave_focus

* simple

* work

* add a lot of asserts

* focus on wave0

* correct handling of IMMEDIATE_MASK

* work

* viz work

* use the metadata infra

* better
2026-01-18 16:32:13 +09:00
Christopher Milan
1eb110cd7d fix memory corruption in NIR, reenable process replay (#14204) 2026-01-18 02:05:12 -05:00
George Hotz
a51e0a86db assembly/amd: clean up disasm.py + add CDNA support (#14200)
* assembly/amd: clean up disasm.py

* cleanups

* add missing encodings

* decode is pretty

* cdna

* assert on failure

* cdna roundtrip

* cdna passing

* test

* lil cleanup

* variant cleanups

* cleanups
2026-01-18 14:48:44 +09:00
chenyu
4b18c92bc5 simpler Context.__enter__ [pr] (#14201) 2026-01-18 00:38:59 -05:00
qazal
feaa804158 skip lvp process replay in CI [pr] (#14202) 2026-01-18 13:25:04 +09:00
chenyu
b12a9fea80 runtime int call instead of cast(int) (#14183) 2026-01-17 20:34:45 -05:00
George Hotz
79c1559f69 amd asm can still be simpler (#14199)
* amd asm can still be simpler

* simpler

* V_LANE_ID

* simpler

* simpler

* compact vgpr
2026-01-17 18:40:10 +09:00
chenyu
5e6a72c33f new Onnx Gather (#14187)
instead of assuming const indices, check if it showed as a const
2026-01-16 22:24:07 -05:00
George Hotz
9f7f2f0e0c MAX_SQTT_PKTS 2026-01-17 12:05:36 +09:00
George Hotz
50554115ee fix VALU_SALU / IMMED_MASK and improve amd_asm_matmul (#14196)
* fix VALU_SALU / IMMED_MASK and improve amd_asm_matmul

* immed

* wave override

* restore ALT

* advance sgprs correctly

* no helpers

* decrease to 192 VGPRs
2026-01-17 11:58:34 +09:00
chenyu
ab244c7f81 onnx Gather should not assume indices to be const (#14185)
* onnx Gather should not assume indices to be const

added a failed test case

* just list
2026-01-16 20:55:00 -05:00
wozeparrot
a879b54234 tk: fa jit fix (#14170) 2026-01-16 16:38:45 -08:00
qazal
a8ae9757dd viz: put alts in the same row, LDS color (#14194)
* viz: put alts in the same row, coloring work

* assert if packets overlap

* lds color
2026-01-17 09:36:14 +09:00
qazal
5aa71f437b viz: precise clock cycles in PKTS (#14179)
* viz: relative clock cycles in PKTS

* format clocks as xM yK 999 cycles
2026-01-17 09:08:13 +09:00
Christopher Milan
eafcd44d95 fix OSX image pitch (#14193) 2026-01-16 19:07:33 -05:00
Christopher Milan
3960e2758c suppress_finalizing in hip (#14189) 2026-01-16 18:56:29 -05:00
qazal
9302ab003a viz: show ALT/OTHER packets on second lane (#14192)
* viz: show dimmer ALT/OTHER packets

* remove todo comment

* work

* current vmem is gray
2026-01-17 08:55:24 +09:00
qazal
551454f476 viz: fix wave sort, show message if sqtt trace is empty (#14190)
* show message if sqtt trace is empty

* work

* fix wave sort

* back
2026-01-17 08:01:26 +09:00
George Hotz
8a2549d42b improve amd_asm_matmul + minor VIZ PKTS improvements (#14186)
* improve amd_asm_matmul + minor VIZ PKTS improvements

* fix waitcnt issue

* cleanups
2026-01-17 06:56:59 +09:00
George Hotz
7d1d9d4568 assembly/amd: remove IMG instruction support and asm.py (#14163)
* assembly/amd: return IMG instruction supports

* remove asm.py

* op2dsl
2026-01-17 06:21:50 +09:00
chenyu
dc4ae7dd08 lower ASSERT_MIN_STEP_TIME for driving_policy to 3ms (#14184)
seems quite stable at 2.7ms now
2026-01-16 15:04:53 -05:00
chenyu
0a14e1fcd4 fix some type ignore (#14182) 2026-01-16 13:56:45 -05:00
chenyu
fc10470883 add UOp.__index__ (#14181)
Tensor slice is handled by __getitem__, so the index method is just for SupportsIndex
2026-01-16 12:28:33 -05:00
chenyu
6790165ef8 minor _apply_uop cleanup (#14180)
give fxn a return type and minor style change
2026-01-16 11:27:55 -05:00
nimlgen
e855ec8ee3 tbgpu: refactor dext to support user mappings (#14177) 2026-01-16 15:55:57 +03:00
qazal
bbc55962ee viz: color SQTT INST Ops like UOps (#14175) 2026-01-16 21:24:43 +09:00
qazal
3751b29a3d viz: skip OTHER_ SQTT packets (#14178) 2026-01-16 20:37:19 +09:00
qazal
7c1f1cb2bc viz: fix INST packets coloring (#14176)
* viz: fix INST packets coloring

* work
2026-01-16 18:46:13 +09:00
qazal
1696991988 viz: add PKTS group to sqtt trace (#14173)
* viz: add PKTS group to sqtt trace

* soft_err for rdna4

* different itrace
2026-01-16 17:29:47 +09:00
Christopher Milan
a021b84604 autogen: fix enum (#14171) 2026-01-16 01:30:11 -05:00
qazal
fa5475307c viz: collapse wave packets in one row, 1 clk per packet (#14169)
* per wave packets in one row

* work

* row_tuple

* cleaner

* one row and one lane per wave

* globals split into rows based on type

* barrier length
2026-01-16 13:52:07 +09:00
Christopher Milan
5abc262e22 fix dll.bind caching (#14168) 2026-01-15 20:25:42 -05:00
Christopher Milan
f9ca072b61 cuda compilers disassemble properly (#14166)
* cuda compilers disassemble properly

* this can use system
2026-01-15 19:02:40 -05:00
chenyu
14e9a71a41 move test_assign to unit (#14165)
scheduling these should not depend on device
2026-01-15 17:10:13 -05:00
nimlgen
a0dd9d2146 tbgpu: correct com.apple.developer.driverkit.transport.pci entitlements (#14164)
* tbgpu: correct com.apple.developer.driverkit.transport.pci entitlements

* format
2026-01-15 20:56:39 +03:00
qazal
32e1c267ee viz: SQTT timeline with our decoder (#14139)
* viz: sqtt OCC/INST timeline in our decoder

* todo

* lint

* work

* cleaner

* profiling

* better timing

* keep the generic api

* more generic

* 80x -> 20x off the C decoder

* unusably slow

* rm filters

* work

* work

* other way to sort ops

* work

* first 10k

* 100K actually tells a story

* barrier INST packets get their own red color and row

* minor detail

* 50K

* soft_err
2026-01-15 20:45:16 +09:00