Commit Graph

11743 Commits

Author SHA1 Message Date
qazal
bbc55962ee viz: color SQTT INST Ops like UOps (#14175) 2026-01-16 21:24:43 +09:00
qazal
3751b29a3d viz: skip OTHER_ SQTT packets (#14178) 2026-01-16 20:37:19 +09:00
qazal
7c1f1cb2bc viz: fix INST packets coloring (#14176)
* viz: fix INST packets coloring

* work
2026-01-16 18:46:13 +09:00
qazal
1696991988 viz: add PKTS group to sqtt trace (#14173)
* viz: add PKTS group to sqtt trace

* soft_err for rdna4

* different itrace
2026-01-16 17:29:47 +09:00
Christopher Milan
a021b84604 autogen: fix enum (#14171) 2026-01-16 01:30:11 -05:00
qazal
fa5475307c viz: collapse wave packets in one row, 1 clk per packet (#14169)
* per wave packets in one row

* work

* row_tuple

* cleaner

* one row and one lane per wave

* globals split into rows based on type

* barrier length
2026-01-16 13:52:07 +09:00
Christopher Milan
5abc262e22 fix dll.bind caching (#14168) 2026-01-15 20:25:42 -05:00
Christopher Milan
f9ca072b61 cuda compilers disassemble properly (#14166)
* cuda compilers disassemble properly

* this can use system
2026-01-15 19:02:40 -05:00
chenyu
14e9a71a41 move test_assign to unit (#14165)
scheduling these should not depend on device
2026-01-15 17:10:13 -05:00
nimlgen
a0dd9d2146 tbgpu: correct com.apple.developer.driverkit.transport.pci entitlements (#14164)
* tbgpu: correct com.apple.developer.driverkit.transport.pci entitlements

* format
2026-01-15 20:56:39 +03:00
qazal
32e1c267ee viz: SQTT timeline with our decoder (#14139)
* viz: sqtt OCC/INST timeline in our decoder

* todo

* lint

* work

* cleaner

* profiling

* better timing

* keep the generic api

* more generic

* 80x -> 20x off the C decoder

* unusably slow

* rm filters

* work

* work

* other way to sort ops

* work

* first 10k

* 100K actually tells a story

* barrier INST packets get their own red color and row

* minor detail

* 50K

* soft_err
2026-01-15 20:45:16 +09:00
Christopher Milan
0cb024a5bb remove ctypes.Structure (#13651) 2026-01-15 05:06:22 -05:00
George Hotz
255e0573b1 assembly/amd: clean up asm/disasm (#14158)
* assembly/amd: clean up asm/disasm

* update disasm

* revert dumb stuff

* update decode

* use fmt
2026-01-15 17:45:40 +09:00
qazal
164bc678a6 scheduler: sched_cache bugfix for different Tensor.custom_kernel schedules (#14161)
* simplest failing test

* min fix

* same function reuses the cache

* SPEC=2 never worked for custom_kernel
2026-01-15 14:59:14 +09:00
qazal
b46da603fe codegen/custom_kernel: do not attach KernelInfo to user program (#14160) 2026-01-15 14:01:48 +09:00
George Hotz
fd60626ea1 assembly/amd: refactor to use op_bits/op_regs (#14156)
* assembly/amd: refactor to use op_bits/op_regs

* remove that skip

* remove another hack

* remove another hack

* precompute mask

* more reg, less hasattr
2026-01-15 11:20:21 +09:00
chenyu
add7da268f multiple slice assign test (#14157)
GANing test cases
2026-01-14 21:08:03 -05:00
George Hotz
e9ce12028e assembly/amd: amdxml cleanups, remove broken SDWA/DPP, merge in pdf.py (#14154)
* assembly/amd: amdxml cleanups, remove broken SDWA/DPP

* remove buf junk

* simplify

* simplify

* lil cleanup

* dead fixes

* strip non pcode extraction from pdf

* merge pdf.py into amdxml.py

* only amdxml
2026-01-15 09:23:19 +09:00
wozeparrot
7e5687f6a3 more fa multi fix (#14152) 2026-01-14 13:57:11 -08:00
chenyu
1381daac06 many more failed assign tests (#14153)
assign is quite broken
2026-01-14 16:20:28 -05:00
nimlgen
8c55ef4f01 amd: cleanup props (#14145)
* amd: cleanup props

* f
2026-01-14 20:27:41 +03:00
chenyu
899a56446e failed assign test cases with write before read (#14148)
slice assign write before read fails now. this is why kv cache needs a realize
2026-01-14 10:30:50 -05:00
chenyu
986e865830 fix TINY_BACKEND=1 cumsum (#14138)
* fix TINY_BACKEND=1 cumsum

old hack was wrong, need to apply contiguous on the input

* test time

* test_linalg_svd is slow
2026-01-14 09:54:49 -05:00
qazal
434dbafab5 optional Estimates in KernelInfo (#14147)
* optional Estimates in KernelInfo

* custom asm test plumbing

* s_code_end

* estimates test

* vaddr arg in global_store

* kernel desc

* Ops.DEVICE name
2026-01-14 22:55:03 +09:00
qazal
76b577ee76 viz: only SIMD name in sqtt timeline rows (#14146) 2026-01-14 20:13:27 +09:00
George Hotz
e5500ae4ad add ALU stuff to default perf counters (#14135)
* add ALU stuff to default perf counters

* lds

* add alu utilization

* cleaner

* format as percent

* cleanest

* roc
2026-01-14 19:47:59 +09:00
nimlgen
86708ccac5 hip_ioctl: dump aql (#14142) 2026-01-14 13:15:10 +03:00
nimlgen
f9147422a3 ci: add setcap (#14143) 2026-01-14 13:15:01 +03:00
nimlgen
62c1a014a6 amd: rename to be consistent (#14141) 2026-01-14 11:41:04 +03:00
Christopher Milan
e0eea0d833 autogen: verify all files in CI (#14140)
* autogen: verify all files in CI

* dont delete libclang
2026-01-14 02:35:54 -05:00
chenyu
2a2c1eacf6 disable fast_idiv on metal (#14137)
there's a metal compiler bug which was the root cause that keccak needs a contiguous hack
2026-01-13 21:40:40 -05:00
wozeparrot
a92778aa0c tk: fa multi fix (#14134) 2026-01-13 17:22:15 -08:00
George Hotz
2ab18ea7e3 assembly/amd: use xml instead of pdf (#14118)
* assembly/amd: use xml instead of pdf

* use amdxml to generate info about op sizes

* fix many tests with invalid instructions

* fix info generation

* chad xml fixes many bugs

* rename to operands

* simplify

* amdxml

* bug fix
2026-01-14 10:03:37 +09:00
qazal
002ea39da7 assembly/amd: use Tensor.custom_kernel to run assembly (#14125)
* assembly/amd: use Tensor.custom_kernel to run assembly

* PRINT_ASM=1 is DEBUG=4
2026-01-14 08:29:25 +09:00
chenyu
fe00682502 clean up svd tests (#14133)
removed from test_ops and added to TestTorchBackend
2026-01-13 16:32:21 -05:00
chenyu
84b88a0a31 more doc of newly added functions (#14132) 2026-01-13 15:48:45 -05:00
chenyu
e610821c52 Tensor.cummin and Tensor.nonzero (#14131) 2026-01-13 15:09:56 -05:00
chenyu
176a934ddd Tensor.diagonal support offset and dims (#14130) 2026-01-13 14:49:06 -05:00
chenyu
2a217ba206 tinybackend isin and log10 (#14120)
can use tinygrad directly
2026-01-13 14:14:09 -05:00
qazal
79d00521f8 viz: fix cfg err when endpgm is in the middle of stream (#14128)
* kernel from beautiful_mnist

* minimal test

* correct way to do this

* rm that
2026-01-14 02:00:34 +09:00
qazal
7fe91e5db9 viz: cleanup cfg renderer (#14127)
* remove colorDomains from sqtt

* colors in js

* work
2026-01-14 01:10:42 +09:00
nimlgen
1364449cab system: early pci perm check (#14126)
* system: early pci perm check

* l
2026-01-13 17:45:05 +03:00
George Hotz
a28c8105a5 assembly/amd: 2% faster amd_uop_matmul + SQTT (#14122)
* assembly/amd: 2% faster amd_uop_matmul

* SQTT_TOKEN_EXCLUDE + SQTT_SIMD_SEL

* sqtt printer

* fix printer

* fast decode

* fast decoder

* test packet counts

* ugh it's not faster

* dead
2026-01-13 19:55:32 +09:00
qazal
6cd318e377 viz: add link to graph from sqtt (#14123) 2026-01-13 17:31:03 +09:00
qazal
fd10fd245a viz: cfg tokenizer fix and unit tests (#14121)
* output Ops.BINARY

* failing test for the cfg

* dsl renamed to offset and sz

* add better asserts

* move the note
2026-01-13 15:08:55 +09:00
chenyu
05fcb57696 also return index in Tensor.cummax (#14117)
* also return index in Tensor.cummax

* fix
2026-01-12 22:42:10 -05:00
wozeparrot
7c967399a4 tk: add failing test for fa multidevice (#14116) 2026-01-12 19:11:09 -08:00
George Hotz
330a0b686e assembly/amd: clean up dsl and make type verification strict (#14102)
* assembly/amd: start newdsl

* work

* newdsl upd

* Reg is p nice

* cleaner

* work

* getting clean

* all fields

* more BitFields

* redo the pdfs with dsl2 syntax

* no lit

* cleanups

* more defaults

* fix get and remove crap

* aliases

* ugly but kind of works

* NULL, not rawimm

* clean up defaults

* only dsl

* asm fixes

* lit fixup

* more lit

* cleanups

* olddsl

* single pcode dict

* emu sort of works

* trash test

* global is global

* types property

* reg mods

* fix a few tests

* remove monkey patch

* fixes

* less hacks in tests

* less hacks in tests

* 4 test failures

* hw tests all pass

* fix compare emulator

* fix some tests

* 3 more

* fix and shorten sqtt

* handwritten

* fix validation

* test corrections

* all types validate

* fix dsl2 tests

* fix bugs in disasm

* skips on cdna

* work

* repr with reg[]

* fix bitfield tests

* merge pcodes in dsl

* remove override

* disasm uses inst.types

* simpler
2026-01-13 08:52:16 +09:00
C T
a8c821f45e add Tensor.log10 with test\test_ops.py::TestOps::test_log10 (#14113) 2026-01-12 13:45:47 -05:00
chenyu
6b0a9f5ee6 don't strip sink in to_uops_list [pr] (#14111) v0.12.0 2026-01-12 11:19:03 -05:00