Christopher Milan
f9ca072b61
cuda compilers disassemble properly ( #14166 )
...
* cuda compilers disassemble properly
* this can use system
2026-01-15 19:02:40 -05:00
chenyu
14e9a71a41
move test_assign to unit ( #14165 )
...
scheduling these should not depend on device
2026-01-15 17:10:13 -05:00
nimlgen
a0dd9d2146
tbgpu: correct com.apple.developer.driverkit.transport.pci entitlements ( #14164 )
...
* tbgpu: correct com.apple.developer.driverkit.transport.pci entitlements
* format
2026-01-15 20:56:39 +03:00
qazal
32e1c267ee
viz: SQTT timeline with our decoder ( #14139 )
...
* viz: sqtt OCC/INST timeline in our decoder
* todo
* lint
* work
* cleaner
* profiling
* better timing
* keep the generic api
* more generic
* 80x -> 20x off the C decoder
* unusably slow
* rm filters
* work
* work
* other way to sort ops
* work
* first 10k
* 100K actually tells a story
* barrier INST packets get their own red color and row
* minor detail
* 50K
* soft_err
2026-01-15 20:45:16 +09:00
Christopher Milan
0cb024a5bb
remove ctypes.Structure ( #13651 )
2026-01-15 05:06:22 -05:00
George Hotz
255e0573b1
assembly/amd: clean up asm/disasm ( #14158 )
...
* assembly/amd: clean up asm/disasm
* update disasm
* revert dumb stuff
* update decode
* use fmt
2026-01-15 17:45:40 +09:00
qazal
164bc678a6
scheduler: sched_cache bugfix for different Tensor.custom_kernel schedules ( #14161 )
...
* simplest failing test
* min fix
* same function reuses the cache
* SPEC=2 never worked for custom_kernel
2026-01-15 14:59:14 +09:00
qazal
b46da603fe
codegen/custom_kernel: do not attach KernelInfo to user program ( #14160 )
2026-01-15 14:01:48 +09:00
George Hotz
fd60626ea1
assembly/amd: refactor to use op_bits/op_regs ( #14156 )
...
* assembly/amd: refactor to use op_bits/op_regs
* remove that skip
* remove another hack
* remove another hack
* precompute mask
* more reg, less hasattr
2026-01-15 11:20:21 +09:00
chenyu
add7da268f
multiple slice assign test ( #14157 )
...
GANing test cases
2026-01-14 21:08:03 -05:00
George Hotz
e9ce12028e
assembly/amd: amdxml cleanups, remove broken SDWA/DPP, merge in pdf.py ( #14154 )
...
* assembly/amd: amdxml cleanups, remove broken SDWA/DPP
* remove buf junk
* simplify
* simplify
* lil cleanup
* dead fixes
* strip non pcode extraction from pdf
* merge pdf.py into amdxml.py
* only amdxml
2026-01-15 09:23:19 +09:00
wozeparrot
7e5687f6a3
more fa multi fix ( #14152 )
2026-01-14 13:57:11 -08:00
chenyu
1381daac06
many more failed assign tests ( #14153 )
...
assign is quite broken
2026-01-14 16:20:28 -05:00
nimlgen
8c55ef4f01
amd: cleanup props ( #14145 )
...
* amd: cleanup props
* f
2026-01-14 20:27:41 +03:00
chenyu
899a56446e
failed assign test cases with write before read ( #14148 )
...
slice assign write before read fails now. this is why kv cache needs a realize
2026-01-14 10:30:50 -05:00
chenyu
986e865830
fix TINY_BACKEND=1 cumsum ( #14138 )
...
* fix TINY_BACKEND=1 cumsum
old hack was wrong, need to apply contiguous on the input
* test time
* test_linalg_svd is slow
2026-01-14 09:54:49 -05:00
qazal
434dbafab5
optional Estimates in KernelInfo ( #14147 )
...
* optional Estimates in KernelInfo
* custom asm test plumbing
* s_code_end
* estimates test
* vaddr arg in global_store
* kernel desc
* Ops.DEVICE name
2026-01-14 22:55:03 +09:00
qazal
76b577ee76
viz: only SIMD name in sqtt timeline rows ( #14146 )
2026-01-14 20:13:27 +09:00
George Hotz
e5500ae4ad
add ALU stuff to default perf counters ( #14135 )
...
* add ALU stuff to default perf counters
* lds
* add alu utilization
* cleaner
* format as percent
* cleanest
* roc
2026-01-14 19:47:59 +09:00
nimlgen
86708ccac5
hip_ioctl: dump aql ( #14142 )
2026-01-14 13:15:10 +03:00
nimlgen
f9147422a3
ci: add setcap ( #14143 )
2026-01-14 13:15:01 +03:00
nimlgen
62c1a014a6
amd: rename to be consistent ( #14141 )
2026-01-14 11:41:04 +03:00
Christopher Milan
e0eea0d833
autogen: verify all files in CI ( #14140 )
...
* autogen: verify all files in CI
* dont delete libclang
2026-01-14 02:35:54 -05:00
chenyu
2a2c1eacf6
disable fast_idiv on metal ( #14137 )
...
there's a metal compiler bug which was the root cause that keccak needs a contiguous hack
2026-01-13 21:40:40 -05:00
wozeparrot
a92778aa0c
tk: fa multi fix ( #14134 )
2026-01-13 17:22:15 -08:00
George Hotz
2ab18ea7e3
assembly/amd: use xml instead of pdf ( #14118 )
...
* assembly/amd: use xml instead of pdf
* use amdxml to generate info about op sizes
* fix many tests with invalid instructions
* fix info generation
* chad xml fixes many bugs
* rename to operands
* simplify
* amdxml
* bug fix
2026-01-14 10:03:37 +09:00
qazal
002ea39da7
assembly/amd: use Tensor.custom_kernel to run assembly ( #14125 )
...
* assembly/amd: use Tensor.custom_kernel to run assembly
* PRINT_ASM=1 is DEBUG=4
2026-01-14 08:29:25 +09:00
chenyu
fe00682502
clean up svd tests ( #14133 )
...
removed from test_ops and added to TestTorchBackend
2026-01-13 16:32:21 -05:00
chenyu
84b88a0a31
more doc of newly added functions ( #14132 )
2026-01-13 15:48:45 -05:00
chenyu
e610821c52
Tensor.cummin and Tensor.nonzero ( #14131 )
2026-01-13 15:09:56 -05:00
chenyu
176a934ddd
Tensor.diagonal support offset and dims ( #14130 )
2026-01-13 14:49:06 -05:00
chenyu
2a217ba206
tinybackend isin and log10 ( #14120 )
...
can use tinygrad directly
2026-01-13 14:14:09 -05:00
qazal
79d00521f8
viz: fix cfg err when endpgm is in the middle of stream ( #14128 )
...
* kernel from beautiful_mnist
* minimal test
* correct way to do this
* rm that
2026-01-14 02:00:34 +09:00
qazal
7fe91e5db9
viz: cleanup cfg renderer ( #14127 )
...
* remove colorDomains from sqtt
* colors in js
* work
2026-01-14 01:10:42 +09:00
nimlgen
1364449cab
system: early pci perm check ( #14126 )
...
* system: early pci perm check
* l
2026-01-13 17:45:05 +03:00
George Hotz
a28c8105a5
assembly/amd: 2% faster amd_uop_matmul + SQTT ( #14122 )
...
* assembly/amd: 2% faster amd_uop_matmul
* SQTT_TOKEN_EXCLUDE + SQTT_SIMD_SEL
* sqtt printer
* fix printer
* fast decode
* fast decoder
* test packet counts
* ugh it's not faster
* dead
2026-01-13 19:55:32 +09:00
qazal
6cd318e377
viz: add link to graph from sqtt ( #14123 )
2026-01-13 17:31:03 +09:00
qazal
fd10fd245a
viz: cfg tokenizer fix and unit tests ( #14121 )
...
* output Ops.BINARY
* failing test for the cfg
* dsl renamed to offset and sz
* add better asserts
* move the note
2026-01-13 15:08:55 +09:00
chenyu
05fcb57696
also return index in Tensor.cummax ( #14117 )
...
* also return index in Tensor.cummax
* fix
2026-01-12 22:42:10 -05:00
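Note on #14117: the title says cummax now returns the index along with the running maximum. A minimal pure-Python sketch of that idea follows. It is not the tinygrad implementation; the function name `cummax_with_indices` is hypothetical, and keeping the earliest index on ties is an assumption made here for illustration only.

```python
from typing import List, Sequence, Tuple

def cummax_with_indices(xs: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Running maximum of xs plus the position where each maximum was found.

    Hypothetical helper for illustration; not tinygrad's Tensor.cummax.
    """
    values: List[float] = []
    indices: List[int] = []
    best_i = -1  # index of the current maximum, -1 means "none seen yet"
    for i, x in enumerate(xs):
        # Assumption: strict '>' keeps the earliest index when values tie.
        if best_i < 0 or x > xs[best_i]:
            best_i = i
        values.append(xs[best_i])
        indices.append(best_i)
    return values, indices

vals, idxs = cummax_with_indices([1, 3, 2, 5, 4])
print(vals, idxs)
```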
wozeparrot
7c967399a4
tk: add failing test for fa multidevice ( #14116 )
2026-01-12 19:11:09 -08:00
George Hotz
330a0b686e
assembly/amd: clean up dsl and make type verification strict ( #14102 )
...
* assembly/amd: start newdsl
* work
* newdsl upd
* Reg is p nice
* cleaner
* work
* getting clean
* all fields
* more BitFields
* redo the pdfs with dsl2 syntax
* no lit
* cleanups
* more defaults
* fix get and remove crap
* aliases
* ugly but kind of works
* NULL, not rawimm
* clean up defaults
* only dsl
* asm fixes
* lit fixup
* more lit
* cleanups
* olddsl
* single pcode dict
* emu sort of works
* trash test
* global is global
* types property
* reg mods
* fix a few tests
* remove monkey patch
* fixes
* less hacks in tests
* less hacks in tests
* 4 test failures
* hw tests all pass
* fix compare emulator
* fix some tests
* 3 more
* fix and shorten sqtt
* handwritten
* fix validation
* test corrections
* all types validate
* fix dsl2 tests
* fix bugs in disasm
* skips on cdna
* work
* repr with reg[]
* fix bitfield tests
* merge pcodes in dsl
* remove override
* disasm uses inst.types
* simpler
2026-01-13 08:52:16 +09:00
C T
a8c821f45e
add Tensor.log10 with test\test_ops.py::TestOps::test_log10 ( #14113 )
2026-01-12 13:45:47 -05:00
chenyu
6b0a9f5ee6
don't strip sink in to_uops_list [pr] ( #14111 )
tag: v0.12.0
2026-01-12 11:19:03 -05:00
chenyu
cad7feec02
more onnx ops ( #14104 )
...
HannWindow, HammingWindow, BlackmanWindow, Hardmax, LpNormalization
2026-01-12 09:11:13 -05:00
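Note on #14104: of the ops listed, HannWindow is the simplest to show. The sketch below uses the standard Hann formula, w[n] = 0.5 - 0.5*cos(2*pi*n/N), in plain Python. It is not the code from this commit; the function name `hann_window`, the `periodic` flag as written here, and the size-1 symmetric result are assumptions for illustration.

```python
import math
from typing import List

def hann_window(size: int, periodic: bool = True) -> List[float]:
    """Hann window of the given length (illustrative sketch, not tinygrad code).

    periodic=True divides by size, periodic=False (symmetric) by size - 1.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    denom = size if periodic else size - 1
    if denom == 0:
        # Assumption: a one-sample symmetric window is [1.0].
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / denom) for n in range(size)]

print(hann_window(4))
print(hann_window(5, periodic=False))
```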
nimlgen
635ed2df9d
system: use pci.PCI_VENDOR_ID instead of const ( #14109 )
2026-01-12 15:24:09 +03:00
qazal
6c0f0e29ff
Revert "viz: loading... ( #14107 )" ( #14108 )
...
This reverts commit 9347757c2d .
2026-01-12 20:45:37 +09:00
nimlgen
9347757c2d
viz: loading... ( #14107 )
2026-01-12 13:24:24 +03:00
wozeparrot
3a92df66ea
feat: bump version to 0.12.0 ( #14105 )
2026-01-11 21:19:49 -08:00
chenyu
7c234a9c7c
wgsl cleanup [pr] ( #14103 )
...
refactor common pack functions
2026-01-11 21:23:45 -05:00
George Hotz
91bde927ef
assembly/amd: split asm.py into asm.py and disasm.py ( #14101 )
...
* split asm.py into asm.py and disasm.py
* split decoder
* move to pcode
* tests
2026-01-12 07:22:02 +09:00