chenyu
6b0a9f5ee6
don't strip sink in to_uops_list [pr] (#14111)
v0.12.0
2026-01-12 11:19:03 -05:00
chenyu
cad7feec02
more onnx ops (#14104)
HannWindow, HammingWindow, BlackmanWindow, Hardmax, LpNormalization
2026-01-12 09:11:13 -05:00
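The window and normalization ops listed above are defined by the ONNX operator spec. The sketch below is a plain-Python illustration of those definitions as we understand them from the ONNX reference implementation. It assumes 1-D inputs (Hardmax and LpNormalization over the last axis only) and is not tinygrad's implementation; all function names are hypothetical.

```python
import math

def _window_denominator(size: int, periodic: bool) -> int:
    # ONNX window ops divide by N = size when periodic (the default), else by size - 1
    return size if periodic else size - 1

def hann_window(size: int, periodic: bool = True) -> list[float]:
    n1 = _window_denominator(size, periodic)
    # sin^2(pi*n/N), equivalently 0.5 - 0.5*cos(2*pi*n/N)
    return [math.sin(math.pi * n / n1) ** 2 for n in range(size)]

def hamming_window(size: int, periodic: bool = True) -> list[float]:
    n1 = _window_denominator(size, periodic)
    alpha = 25 / 46  # ONNX uses alpha = 25/46 and beta = 1 - alpha
    return [alpha - (1 - alpha) * math.cos(2 * math.pi * n / n1) for n in range(size)]

def blackman_window(size: int, periodic: bool = True) -> list[float]:
    n1 = _window_denominator(size, periodic)
    return [0.42 - 0.5 * math.cos(2 * math.pi * n / n1) + 0.08 * math.cos(4 * math.pi * n / n1)
            for n in range(size)]

def hardmax(xs: list[float]) -> list[float]:
    # one-hot vector marking the first position that holds the maximum
    i = xs.index(max(xs))
    return [1.0 if j == i else 0.0 for j in range(len(xs))]

def lp_normalize(xs: list[float], p: int = 2) -> list[float]:
    # divide every element by the Lp norm of the vector; assumes the norm is nonzero
    norm = sum(abs(x) ** p for x in xs) ** (1 / p)
    return [x / norm for x in xs]
```

For example, hann_window(4) is approximately [0.0, 0.5, 1.0, 0.5], while hann_window(5, periodic=False) gives the symmetric form that ends back at 0.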
nimlgen
635ed2df9d
system: use pci.PCI_VENDOR_ID instead of const (#14109)
2026-01-12 15:24:09 +03:00
qazal
6c0f0e29ff
Revert "viz: loading... (#14107)" (#14108)
This reverts commit 9347757c2d.
2026-01-12 20:45:37 +09:00
nimlgen
9347757c2d
viz: loading... (#14107)
2026-01-12 13:24:24 +03:00
wozeparrot
3a92df66ea
feat: bump version to 0.12.0 (#14105)
2026-01-11 21:19:49 -08:00
chenyu
7c234a9c7c
wgsl cleanup [pr] (#14103)
refactor common pack functions
2026-01-11 21:23:45 -05:00
George Hotz
91bde927ef
assembly/amd: split asm.py into asm.py and disasm.py (#14101)
* split asm.py into asm.py and disasm.py
* split decoder
* move to pcode
* tests
2026-01-12 07:22:02 +09:00
George Hotz
44135e2e84
assembly/amd: always use v_nop in test for rocprof-trace-decoder (#14100)
* assembly/amd: always use v_nop in test for rocprof-trace-decoder
* test touchups
2026-01-12 05:31:58 +09:00
George Hotz
8b1b15aec0
assembly/amd: SQTT support (#14099)
* assembly/amd: SQTT support
* simpler
* cmp wave
* instruction compare
* rocprof decode
* simpler
* no llvm
* no strcmp
2026-01-12 05:07:17 +09:00
nimlgen
8b5ff403fa
am: flag successful finalization (#14097)
* am: flag successful finalization
* import
2026-01-11 16:24:53 +03:00
qazal
d8aba24967
amd: use kernel descriptor struct in AMDProgram (#14096)
2026-01-11 18:25:16 +09:00
chenyu
9973a81356
add channels_last to QLinearGlobalAveragePool (#14094)
and other minor cleanups
2026-01-10 18:38:19 -05:00
chenyu
c5492f8f75
cstyle cleanup [pr] (#14093)
2026-01-10 09:44:50 -05:00
nimlgen
d5f954858d
viz: show precise timings (#14092)
2026-01-10 16:21:08 +03:00
nimlgen
3e2c05ee9f
hevc: decoder as iterator (#14091)
2026-01-10 14:57:56 +03:00
chenyu
35c9701df0
update outdated tests and comments (#14090)
2026-01-10 01:00:48 -05:00
chenyu
92246ea731
update tests, `WEBGPU=1 pytest .` passes (#14089)
* update tests, `WEBGPU=1 pytest .` passes
* minor update
2026-01-10 00:03:02 -05:00
chenyu
c34c6d9468
fix wgsl packed_store can drop valid (#14088)
* fix wgsl packed_store can drop valid
* fix
2026-01-09 15:22:06 -05:00
chenyu
eacccc5ace
more disk assign tests (#14087)
covers more edge cases
2026-01-09 14:14:52 -05:00
chenyu
ed295e74dc
don't skip gguf test if ggml is not installed (#14086)
* don't skip gguf test if ggml is not installed
should just let it fail
* fix
2026-01-09 12:05:58 -05:00
chenyu
cff33c8d78
add some disk assign tests (#14085)
2026-01-09 11:50:59 -05:00
chenyu
74fa3c7d09
decomp pow for LVP (#14084)
test failed due to undefined behavior, so use decomp instead
2026-01-09 10:50:28 -05:00
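The commit does not spell out the decomposition. A common way to lower pow to other primitives is the identity x**y = exp2(y * log2(|x|)), with zero and negative bases handled explicitly. The scalar sketch below illustrates that general technique in plain Python; it is an assumption, not tinygrad's code, and `pow_decomp` is a hypothetical name. It uses math.exp and math.log, which are mathematically equivalent to exp2/log2 (math.exp2 only exists in Python 3.11+).

```python
import math

def pow_decomp(base: float, exponent: float) -> float:
    # math.log(0) raises in Python, so a zero base is handled up front
    if base == 0:
        if exponent == 0:
            return 1.0  # 0 ** 0 == 1, matching Python's own convention
        return 0.0 if exponent > 0 else math.inf
    # core identity: |x| ** y == exp(y * ln|x|), equivalently exp2(y * log2|x|)
    magnitude = math.exp(exponent * math.log(abs(base)))
    if base > 0:
        return magnitude
    # a negative base only has a real result for integer exponents
    if not float(exponent).is_integer():
        return math.nan
    # odd integer exponents keep the sign of the negative base
    return -magnitude if int(exponent) % 2 else magnitude
```

Routing everything through exp and log trades a little precision for using only primitives that every backend is expected to support.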
b1tg
0fbc551622
train bert with fp8 (#13874)
* fp8 train
* clean
* lint
* test fix from #13439
* skip first/last layer
* rm __init__, restore unroll <=32 check
* tests
* clean test, remove unused
* multi-gpu test, clean quantize_to_fp8
* remove bert contiguous
* run script
* test: better check
* run script search
* add seed in bert data shuffle
* move script to mi350x folder
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-09 09:21:59 -05:00
nimlgen
ba209d6305
am: utc_l1_enable on all sdma inst (#14083)
2026-01-09 17:17:05 +03:00
nimlgen
6b308b89b7
viz: timeline time (#14080)
* viz: timeline time
* less lines
* cut
2026-01-09 16:43:45 +03:00
nimlgen
40f9fa2db4
autogen: new kfd (#14082)
2026-01-09 16:08:17 +03:00
qazal
2917ed1616
roc: propagate decoder errors to main thread (#14081)
* roc: propagate decoder errors to main thread
* types
* add cause
2026-01-09 21:10:45 +09:00
qazal
f3f4d9b387
viz: fix disasm node width (#14079)
2026-01-09 16:37:37 +09:00
anu
c70c112254
fix CUDA=1 disassembly (VIZ=1) by stripping null terminator (#14046)
* fix ptxas disassembly bug
* single '
* move fix to get_bytes
* move rstrip
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2026-01-09 15:19:59 +09:00
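Per the bullets, the fix ended up as an rstrip inside get_bytes, so the program bytes no longer carry a trailing NUL into the disassembler. A minimal sketch of that step, assuming the bytes arrive as a NUL-terminated C string; `get_program_bytes` is a hypothetical name, not the tinygrad function:

```python
def get_program_bytes(raw: bytes) -> bytes:
    # C APIs often hand back NUL-terminated strings; a trailing b"\x00" can trip up
    # tools that treat the bytes as text, so strip it before passing them on.
    # Note: rstrip removes every trailing NUL byte, not just the last one.
    return raw.rstrip(b"\x00")

ptx = get_program_bytes(b".version 7.8\n.target sm_80\n\x00")
```

Embedded NUL bytes in the middle of the buffer are left untouched; only the tail is trimmed.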
qazal
13e5d00d0e
viz: exclude comma in register highlight (#14078)
* viz: exclude comma in register highlight
* simplify
2026-01-09 15:10:30 +09:00
qazal
a071adffc0
viz: amdgpu disassembly register highlighting UI (#14059)
* viz: amdgpu disassembly register highlighting
* minor details
* details from IDA
* more details from IDA
* refactor token colors
* move tokenizer to python
* simplify
* minimal tokenizer for registers
* all the operand types
2026-01-09 11:27:09 +09:00
chenyu
b878f9d5a4
reuse Tensor init with const path [pr] (#14076)
2026-01-08 17:49:37 -05:00
chenyu
efcb32f6a9
unique const when requires_grad is set to True (#14075)
* unique const when requires_grad is set to True
* fix pyrender
2026-01-08 16:30:45 -05:00
chenyu
b34c637767
support bfloat16 for CL (#14073)
2026-01-08 14:14:29 -05:00
Garret Castro
16b652302e
skip bf16 test if not supported by device (#14070)
2026-01-08 13:37:24 -05:00
nimlgen
3f61a96d79
am: SetSoftMaxByFreq on gfx10+ (#14068)
2026-01-08 17:00:03 +03:00
George Hotz
e7b5d8a434
assembly/amd: more RDNA4 asm (#14062)
* rdna4 more
* asm
* fixes
* assembly/amd: handwritten wmma failing test
* passes
* wmma default hacks
* space
* 0 skips in rdna3/rdna4 disasm
* more RDNA4 tests
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2026-01-08 05:09:37 -08:00
nimlgen
e372c841ba
hevc: beam in decode (#14067)
* hevc: beam in decode
* fine
* g
2026-01-08 15:47:16 +03:00
nimlgen
1732a4ec4b
am: rework set_clocks (#14065)
2026-01-08 15:33:32 +03:00
nimlgen
f3aceaa08b
hevc: fast decoder (#14057)
2026-01-08 15:20:37 +03:00
qazal
309197bca5
assembly/amd: test_roundtrip for cdna/rdna4 (#14066)
2026-01-08 21:03:13 +09:00
qazal
15a056715d
fix amd assembly IDE tests on macbook (#14063)
2026-01-08 17:27:52 +09:00
wozeparrot
027b935269
tk: fix grouped load store (#14035)
2026-01-07 22:38:02 -08:00
George Hotz
2db04d0696
assembly/amd: start adding RDNA4 support (#14060)
* assembly/amd: start adding RDNA4 support
* rdna4 asm
2026-01-07 21:19:30 -08:00
George Hotz
cb500466c2
assembly/amd: amd_asm_matmul (#13989)
* amd_asm_matmul
* dsl transform
* asm roundtrip
* fixed
* less
* better
* more
* simpler
* simplify
* lil
* simpler
* compact
* work
* cleanups
* simplify
* simpler
* cleanup
* name the regs
* simp
* big simp
* big simp
* simp
* acc grid
* fast
* stuff
* fast
* simpler
* works
* save vgprs
* save vgprs
* Compact
* less VGPRs
* after
* SQTT support
* fastest
* faster
* lil faster
* tile regs
* faster
* readable
* one more
* simpler
* lil simpler
* NO_GLOBAL skips early globals
* stock kernel
* cleanups
* cleanups
* one b reg
* safe reg changes
* acc is compact now
* remove confusing stuff
* sregs
* lds cleanups
* vopd
2026-01-07 20:11:05 -08:00
chenyu
3caa1e2c98
fix cast HALF with PYTHON backend (#14058)
2026-01-07 16:52:05 -05:00
chenyu
5f1ede7f7e
clean up test_dtype (#14055)
use less lambda
2026-01-07 15:45:42 -05:00
nimlgen
5bd4593eda
hevc: cleaner decoder (#14056)
* hevc: cleaner decoder
* nn
2026-01-07 18:29:30 +03:00
b1tg
241f0402b4
add seed in bert data shuffle (#14054)
2026-01-07 10:02:05 -05:00
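Seeding the shuffle makes the BERT training data order reproducible across runs. A minimal sketch of the idea, assuming the loader shuffles a list of sample indices; `shuffled_indices` is a hypothetical helper, not the code from the commit:

```python
import random

def shuffled_indices(n: int, seed: int) -> list[int]:
    # a private Random instance keeps the order independent of the global RNG state
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    return order

order = shuffled_indices(10, seed=42)
```

Calling it again with the same seed returns the same permutation; deriving the seed from the epoch number would give a different but still reproducible order each epoch.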