qazal
76b577ee76
viz: only SIMD name in sqtt timeline rows ( #14146 )
2026-01-14 20:13:27 +09:00
George Hotz
e5500ae4ad
add ALU stuff to default perf counters ( #14135 )
...
* add ALU stuff to default perf counters
* lds
* add alu utilization
* cleaner
* format as percent
* cleanest
* roc
2026-01-14 19:47:59 +09:00
nimlgen
86708ccac5
hip_ioctl: dump aql ( #14142 )
2026-01-14 13:15:10 +03:00
nimlgen
f9147422a3
ci: add setcap ( #14143 )
2026-01-14 13:15:01 +03:00
nimlgen
62c1a014a6
amd: rename to be consistent ( #14141 )
2026-01-14 11:41:04 +03:00
Christopher Milan
e0eea0d833
autogen: verify all files in CI ( #14140 )
...
* autogen: verify all files in CI
* dont delete libclang
2026-01-14 02:35:54 -05:00
chenyu
2a2c1eacf6
disable fast_idiv on metal ( #14137 )
...
there's a Metal compiler bug, which was the root cause of keccak needing a contiguous hack
2026-01-13 21:40:40 -05:00
wozeparrot
a92778aa0c
tk: fa multi fix ( #14134 )
2026-01-13 17:22:15 -08:00
George Hotz
2ab18ea7e3
assembly/amd: use xml instead of pdf ( #14118 )
...
* assembly/amd: use xml instead of pdf
* use amdxml to generate info about op sizes
* fix many tests with invalid instructions
* fix info generation
* chad xml fixes many bugs
* rename to operands
* simplify
* amdxml
* bug fix
2026-01-14 10:03:37 +09:00
qazal
002ea39da7
assembly/amd: use Tensor.custom_kernel to run assembly ( #14125 )
...
* assembly/amd: use Tensor.custom_kernel to run assembly
* PRINT_ASM=1 is DEBUG=4
2026-01-14 08:29:25 +09:00
chenyu
fe00682502
clean up svd tests ( #14133 )
...
removed from test_ops and added to TestTorchBackend
2026-01-13 16:32:21 -05:00
chenyu
84b88a0a31
more doc of newly added functions ( #14132 )
2026-01-13 15:48:45 -05:00
chenyu
e610821c52
Tensor.cummin and Tensor.nonzero ( #14131 )
2026-01-13 15:09:56 -05:00
chenyu
176a934ddd
Tensor.diagonal support offset and dims ( #14130 )
2026-01-13 14:49:06 -05:00
chenyu
2a217ba206
tinybackend isin and log10 ( #14120 )
...
can use tinygrad directly
2026-01-13 14:14:09 -05:00
qazal
79d00521f8
viz: fix cfg err when endpgm is in the middle of stream ( #14128 )
...
* kernel from beautiful_mnist
* minimal test
* correct way to do this
* rm that
2026-01-14 02:00:34 +09:00
qazal
7fe91e5db9
viz: cleanup cfg renderer ( #14127 )
...
* remove colorDomains from sqtt
* colors in js
* work
2026-01-14 01:10:42 +09:00
nimlgen
1364449cab
system: early pci perm check ( #14126 )
...
* system: early pci perm check
* l
2026-01-13 17:45:05 +03:00
George Hotz
a28c8105a5
assembly/amd: 2% faster amd_uop_matmul + SQTT ( #14122 )
...
* assembly/amd: 2% faster amd_uop_matmul
* SQTT_TOKEN_EXCLUDE + SQTT_SIMD_SEL
* sqtt printer
* fix printer
* fast decode
* fast decoder
* test packet counts
* ugh it's not faster
* dead
2026-01-13 19:55:32 +09:00
qazal
6cd318e377
viz: add link to graph from sqtt ( #14123 )
2026-01-13 17:31:03 +09:00
qazal
fd10fd245a
viz: cfg tokenizer fix and unit tests ( #14121 )
...
* output Ops.BINARY
* failing test for the cfg
* dsl renamed to offset and sz
* add better asserts
* move the note
2026-01-13 15:08:55 +09:00
chenyu
05fcb57696
also return index in Tensor.cummax ( #14117 )
...
* also return index in Tensor.cummax
* fix
2026-01-12 22:42:10 -05:00
wozeparrot
7c967399a4
tk: add failing test for fa multidevice ( #14116 )
2026-01-12 19:11:09 -08:00
George Hotz
330a0b686e
assembly/amd: clean up dsl and make type verification strict ( #14102 )
...
* assembly/amd: start newdsl
* work
* newdsl upd
* Reg is p nice
* cleaner
* work
* getting clean
* all fields
* more BitFields
* redo the pdfs with dsl2 syntax
* no lit
* cleanups
* more defaults
* fix get and remove crap
* aliases
* ugly but kind of works
* NULL, not rawimm
* clean up defaults
* only dsl
* asm fixes
* lit fixup
* more lit
* cleanups
* olddsl
* single pcode dict
* emu sort of works
* trash test
* global is global
* types property
* reg mods
* fix a few tests
* remove monkey patch
* fixes
* less hacks in tests
* less hacks in tests
* 4 test failures
* hw tests all pass
* fix compare emulator
* fix some tests
* 3 more
* fix and shorten sqtt
* handwritten
* fix validation
* test corrections
* all types validate
* fix dsl2 tests
* fix bugs in disasm
* skips on cdna
* work
* repr with reg[]
* fix bitfield tests
* merge pcodes in dsl
* remove override
* disasm uses inst.types
* simpler
2026-01-13 08:52:16 +09:00
C T
a8c821f45e
add Tensor.log10 with test\test_ops.py::TestOps::test_log10 ( #14113 )
2026-01-12 13:45:47 -05:00
chenyu
6b0a9f5ee6
don't strip sink in to_uops_list [pr] ( #14111 )
v0.12.0
2026-01-12 11:19:03 -05:00
chenyu
cad7feec02
more onnx ops ( #14104 )
...
HannWindow, HammingWindow, BlackmanWindow, Hardmax, LpNormalization
2026-01-12 09:11:13 -05:00
nimlgen
635ed2df9d
system: use pci.PCI_VENDOR_ID instead of const ( #14109 )
2026-01-12 15:24:09 +03:00
qazal
6c0f0e29ff
Revert "viz: loading... ( #14107 )" ( #14108 )
...
This reverts commit 9347757c2d.
2026-01-12 20:45:37 +09:00
nimlgen
9347757c2d
viz: loading... ( #14107 )
2026-01-12 13:24:24 +03:00
wozeparrot
3a92df66ea
feat: bump version to 0.12.0 ( #14105 )
2026-01-11 21:19:49 -08:00
chenyu
7c234a9c7c
wgsl cleanup [pr] ( #14103 )
...
refactor common pack functions
2026-01-11 21:23:45 -05:00
George Hotz
91bde927ef
assembly/amd: split asm.py into asm.py and disasm.py ( #14101 )
...
* split asm.py into asm.py and disasm.py
* split decoder
* move to pcode
* tests
2026-01-12 07:22:02 +09:00
George Hotz
44135e2e84
assembly/amd: always use v_nop in test for rocprof-trace-decoder ( #14100 )
...
* assembly/amd: always use v_nop in test for rocprof-trace-decoder
* test touchups
2026-01-12 05:31:58 +09:00
George Hotz
8b1b15aec0
assembly/amd: SQTT support ( #14099 )
...
* assembly/amd: SQTT support
* simpler
* cmp wave
* instruction compare
* rocprof decode
* simpler
* no llvm
* no strcmp
2026-01-12 05:07:17 +09:00
nimlgen
8b5ff403fa
am: flag successful finalization ( #14097 )
...
* am: flag successful finalization
* import
2026-01-11 16:24:53 +03:00
qazal
d8aba24967
amd: use kernel descriptor struct in AMDProgram ( #14096 )
2026-01-11 18:25:16 +09:00
chenyu
9973a81356
add channels_last to QLinearGlobalAveragePool ( #14094 )
...
and other minor cleanups
2026-01-10 18:38:19 -05:00
chenyu
c5492f8f75
cstyle cleanup [pr] ( #14093 )
2026-01-10 09:44:50 -05:00
nimlgen
d5f954858d
viz: show precise timings ( #14092 )
2026-01-10 16:21:08 +03:00
nimlgen
3e2c05ee9f
hevc: decoder as iterator ( #14091 )
2026-01-10 14:57:56 +03:00
chenyu
35c9701df0
update outdated tests and comments ( #14090 )
2026-01-10 01:00:48 -05:00
chenyu
92246ea731
update tests, WEBGPU=1 pytest . passes ( #14089 )
...
* update tests, `WEBGPU=1 pytest .` passes
* minor update
2026-01-10 00:03:02 -05:00
chenyu
c34c6d9468
fix wgsl packed_store can drop valid ( #14088 )
...
* fix wgsl packed_store can drop valid
* fix
2026-01-09 15:22:06 -05:00
chenyu
eacccc5ace
more disk assign tests ( #14087 )
...
covers more edge cases
2026-01-09 14:14:52 -05:00
chenyu
ed295e74dc
don't skip gguf test if ggml is not installed ( #14086 )
...
* don't skip gguf test if ggml is not installed
should just let it fail
* fix
2026-01-09 12:05:58 -05:00
chenyu
cff33c8d78
add some disk assign tests ( #14085 )
2026-01-09 11:50:59 -05:00
chenyu
74fa3c7d09
decomp pow for LVP ( #14084 )
...
test failed due to undefined behavior, so use decomp instead
2026-01-09 10:50:28 -05:00
b1tg
0fbc551622
train bert with fp8 ( #13874 )
...
* fp8 train
* clean
* lint
* test fix from #13439
* skip first/last layer
* rm __init__, restore unroll <=32 check
* tests
* clean test, remove unused
* multi-gpu test, clean quantize_to_fp8
* remove bert contiguous
* run script
* test: better check
* run script search
* add seed in bert data shuffle
* move script to mi350x folder
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-09 09:21:59 -05:00
nimlgen
ba209d6305
am: utc_l1_enable on all sdma inst ( #14083 )
2026-01-09 17:17:05 +03:00