11719 Commits

Author SHA1 Message Date
qazal
76b577ee76 viz: only SIMD name in sqtt timeline rows (#14146) 2026-01-14 20:13:27 +09:00
George Hotz
e5500ae4ad add ALU stuff to default perf counters (#14135)
* add ALU stuff to default perf counters

* lds

* add alu utilization

* cleaner

* format as percent

* cleanest

* roc
2026-01-14 19:47:59 +09:00
nimlgen
86708ccac5 hip_ioctl: dump aql (#14142) 2026-01-14 13:15:10 +03:00
nimlgen
f9147422a3 ci: add setcap (#14143) 2026-01-14 13:15:01 +03:00
nimlgen
62c1a014a6 amd: rename to be consistent (#14141) 2026-01-14 11:41:04 +03:00
Christopher Milan
e0eea0d833 autogen: verify all files in CI (#14140)
* autogen: verify all files in CI

* dont delete libclang
2026-01-14 02:35:54 -05:00
chenyu
2a2c1eacf6 disable fast_idiv on metal (#14137)
there's a metal compiler bug, which was the root cause of keccak needing a contiguous hack
2026-01-13 21:40:40 -05:00
wozeparrot
a92778aa0c tk: fa multi fix (#14134) 2026-01-13 17:22:15 -08:00
George Hotz
2ab18ea7e3 assembly/amd: use xml instead of pdf (#14118)
* assembly/amd: use xml instead of pdf

* use amdxml to generate info about op sizes

* fix many tests with invalid instructions

* fix info generation

* chad xml fixes many bugs

* rename to operands

* simplify

* amdxml

* bug fix
2026-01-14 10:03:37 +09:00
qazal
002ea39da7 assembly/amd: use Tensor.custom_kernel to run assembly (#14125)
* assembly/amd: use Tensor.custom_kernel to run assembly

* PRINT_ASM=1 is DEBUG=4
2026-01-14 08:29:25 +09:00
chenyu
fe00682502 clean up svd tests (#14133)
removed from test_ops and added to TestTorchBackend
2026-01-13 16:32:21 -05:00
chenyu
84b88a0a31 more doc of newly added functions (#14132) 2026-01-13 15:48:45 -05:00
chenyu
e610821c52 Tensor.cummin and Tensor.nonzero (#14131) 2026-01-13 15:09:56 -05:00
chenyu
176a934ddd Tensor.diagonal support offset and dims (#14130) 2026-01-13 14:49:06 -05:00
chenyu
2a217ba206 tinybackend isin and log10 (#14120)
can use tinygrad directly
2026-01-13 14:14:09 -05:00
qazal
79d00521f8 viz: fix cfg err when endpgm is in the middle of stream (#14128)
* kernel from beautiful_mnist

* minimal test

* correct way to do this

* rm that
2026-01-14 02:00:34 +09:00
qazal
7fe91e5db9 viz: cleanup cfg renderer (#14127)
* remove colorDomains from sqtt

* colors in js

* work
2026-01-14 01:10:42 +09:00
nimlgen
1364449cab system: early pci perm check (#14126)
* system: early pci perm check

* l
2026-01-13 17:45:05 +03:00
George Hotz
a28c8105a5 assembly/amd: 2% faster amd_uop_matmul + SQTT (#14122)
* assembly/amd: 2% faster amd_uop_matmul

* SQTT_TOKEN_EXCLUDE + SQTT_SIMD_SEL

* sqtt printer

* fix printer

* fast decode

* fast decoder

* test packet counts

* ugh it's not faster

* dead
2026-01-13 19:55:32 +09:00
qazal
6cd318e377 viz: add link to graph from sqtt (#14123) 2026-01-13 17:31:03 +09:00
qazal
fd10fd245a viz: cfg tokenizer fix and unit tests (#14121)
* output Ops.BINARY

* failing test for the cfg

* dsl renamed to offset and sz

* add better asserts

* move the note
2026-01-13 15:08:55 +09:00
chenyu
05fcb57696 also return index in Tensor.cummax (#14117)
* also return index in Tensor.cummax

* fix
2026-01-12 22:42:10 -05:00
wozeparrot
7c967399a4 tk: add failing test for fa multidevice (#14116) 2026-01-12 19:11:09 -08:00
George Hotz
330a0b686e assembly/amd: clean up dsl and make type verification strict (#14102)
* assembly/amd: start newdsl

* work

* newdsl upd

* Reg is p nice

* cleaner

* work

* getting clean

* all fields

* more BitFields

* redo the pdfs with dsl2 syntax

* no lit

* cleanups

* more defaults

* fix get and remove crap

* aliases

* ugly but kind of works

* NULL, not rawimm

* clean up defaults

* only dsl

* asm fixes

* lit fixup

* more lit

* cleanups

* olddsl

* single pcode dict

* emu sort of works

* trash test

* global is global

* types property

* reg mods

* fix a few tests

* remove monkey patch

* fixes

* less hacks in tests

* less hacks in tests

* 4 test failures

* hw tests all pass

* fix compare emulator

* fix some tests

* 3 more

* fix and shorten sqtt

* handwritten

* fix validation

* test corrections

* all types validate

* fix dsl2 tests

* fix bugs in disasm

* skips on cdna

* work

* repr with reg[]

* fix bitfield tests

* merge pcodes in dsl

* remove override

* disasm uses inst.types

* simpler
2026-01-13 08:52:16 +09:00
C T
a8c821f45e add Tensor.log10 with test\test_ops.py::TestOps::test_log10 (#14113) 2026-01-12 13:45:47 -05:00
chenyu
6b0a9f5ee6 don't strip sink in to_uops_list [pr] (#14111) [v0.12.0] 2026-01-12 11:19:03 -05:00
chenyu
cad7feec02 more onnx ops (#14104)
HannWindow, HammingWindow, BlackmanWindow, Hardmax, LpNormalization
2026-01-12 09:11:13 -05:00
nimlgen
635ed2df9d system: use pci.PCI_VENDOR_ID instead of const (#14109) 2026-01-12 15:24:09 +03:00
qazal
6c0f0e29ff Revert "viz: loading... (#14107)" (#14108)
This reverts commit 9347757c2d.
2026-01-12 20:45:37 +09:00
nimlgen
9347757c2d viz: loading... (#14107) 2026-01-12 13:24:24 +03:00
wozeparrot
3a92df66ea feat: bump version to 0.12.0 (#14105) 2026-01-11 21:19:49 -08:00
chenyu
7c234a9c7c wgsl cleanup [pr] (#14103)
refactor common pack functions
2026-01-11 21:23:45 -05:00
George Hotz
91bde927ef assembly/amd: split asm.py into asm.py and disasm.py (#14101)
* split asm.py into asm.py and disasm.py

* split decoder

* move to pcode

* tests
2026-01-12 07:22:02 +09:00
George Hotz
44135e2e84 assembly/amd: always use v_nop in test for rocprof-trace-decoder (#14100)
* assembly/amd: always use v_nop in test for rocprof-trace-decoder

* test touchups
2026-01-12 05:31:58 +09:00
George Hotz
8b1b15aec0 assembly/amd: SQTT support (#14099)
* assembly/amd: SQTT support

* simpler

* cmp wave

* instruction compare

* rocprof decode

* simpler

* no llvm

* no strcmp
2026-01-12 05:07:17 +09:00
nimlgen
8b5ff403fa am: flag successful finalization (#14097)
* am: flag successful finalization

* import
2026-01-11 16:24:53 +03:00
qazal
d8aba24967 amd: use kernel descriptor struct in AMDProgram (#14096) 2026-01-11 18:25:16 +09:00
chenyu
9973a81356 add channels_last to QLinearGlobalAveragePool (#14094)
and other minor cleanups
2026-01-10 18:38:19 -05:00
chenyu
c5492f8f75 cstyle cleanup [pr] (#14093) 2026-01-10 09:44:50 -05:00
nimlgen
d5f954858d viz: show precise timings (#14092) 2026-01-10 16:21:08 +03:00
nimlgen
3e2c05ee9f hevc: decoder as iterator (#14091) 2026-01-10 14:57:56 +03:00
chenyu
35c9701df0 update outdated tests and comments (#14090) 2026-01-10 01:00:48 -05:00
chenyu
92246ea731 update tests, WEBGPU=1 pytest . passes (#14089)
* update tests, `WEBGPU=1 pytest .` passes

* minor update
2026-01-10 00:03:02 -05:00
chenyu
c34c6d9468 fix wgsl packed_store can drop valid (#14088)
* fix wgsl packed_store can drop valid

* fix
2026-01-09 15:22:06 -05:00
chenyu
eacccc5ace more disk assign tests (#14087)
covers more edge cases
2026-01-09 14:14:52 -05:00
chenyu
ed295e74dc don't skip gguf test if ggml is not installed (#14086)
* don't skip gguf test if ggml is not installed

should just let it fail

* fix
2026-01-09 12:05:58 -05:00
chenyu
cff33c8d78 add some disk assign tests (#14085) 2026-01-09 11:50:59 -05:00
chenyu
74fa3c7d09 decomp pow for LVP (#14084)
test failed due to undefined behavior, so use decomp instead
2026-01-09 10:50:28 -05:00
b1tg
0fbc551622 train bert with fp8 (#13874)
* fp8 train

* clean

* lint

* test fix from #13439

* skip first/last layer

* rm __init__, restore unroll <=32 check

* tests

* clean test, remove unused

* multi-gpu test, clean quantize_to_fp8

* remove bert contiguous

* run script

* test: better check

* run script search

* add seed in bert data shuffle

* move script to mi350x folder

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-09 09:21:59 -05:00
nimlgen
ba209d6305 am: utc_l1_enable on all sdma inst (#14083) 2026-01-09 17:17:05 +03:00