11663 Commits

Author SHA1 Message Date
nimlgen
7cd8852f60 autogen: do no return tuples (#13629) 2025-12-09 20:08:13 +03:00
nimlgen
9e484b5b1c hcq: check size is None, do not read the whole size for 0s (#13628) 2025-12-09 19:37:44 +03:00
nimlgen
1329033b8c am: fix hot-queue restarts, only dequeue (#13627) 2025-12-09 19:37:21 +03:00
nimlgen
b07839493d proclogs with xccs (#13626) 2025-12-09 16:46:08 +03:00
qazal
2c333818f4 simplify UOp stringifier [pr] (#13618)
* simplify UOp stringifier [pr]

* fix tuple
2025-12-09 05:06:16 +08:00
chenyu
2471b49e45 minor bert / llama change from grad acc branch (#13622)
* minor bert / llama change from grad acc branch

* revert those
2025-12-08 16:04:14 -05:00
Christopher Milan
cb3d756547 NAK compile-only test (#13621) 2025-12-08 15:53:46 -05:00
Christopher Milan
a4c3d48aa9 compile-only test for IR3 actually works (#13619) 2025-12-08 15:07:49 -05:00
Christopher Milan
a17077d1d9 skip test_double_assign in CI LVP (#13620) 2025-12-08 14:54:02 -05:00
Christopher Milan
1c16b6e082 Mesa: freedreno (#12746)
* ir3 init

* got a program

* 1 + 1 works

* use isa_disasm instead of shader_disasm

* wip

* matmul works

* works on py3.14

* fix const loading

* skip QCOM failing tests

* cleanup

* args actually work

* add compile-only tests

* fix typo and install tinymesa

* IR3 NULL backend

* (float32) images work

* autogen fix

* fix compile only test

* typo

* mypy happy

* compile-only uses py3.14

* bump mesa

* unify qcom disassembler

* float16 works

* disasm shows in viz

* save a line

* add real del

* variable workgroup sizes

* simplify diff

* bump line count

* properly set wgsz

* regen mesa

* no preamble

* bump lines
2025-12-08 14:02:08 -05:00
Douglas Nyberg
947c6eefc3 add Swish op (#13541)
* add Swish ONNX operator

* add Swish regression test

* remove trailing whitespace

* upgrade ONNX to 1.20, add excludes for unimplemented ops

* upgrade ONNX to 1.19, add Swish op

* upgrade ONNX to 1.19, TensorFlow to 2.18, add Swish op

* exclude attention_3d and attention_4d_gqa tests

* exclude attention fp16 tests

* exclude all attention tests

* retrigger CI

* retrigger CI - worker crash
2025-12-08 12:41:18 -05:00
nimlgen
dd8a1a10d4 amd: tiny cleanups (#13616) 2025-12-08 13:15:56 +03:00
qazal
2b07336c82 viz server cleanups (#13615)
* depths start at 0

* rename the api path
2025-12-08 17:44:43 +08:00
wozeparrot
89c4206e22 fix: typing (#13614) 2025-12-07 20:10:30 -08:00
qazal
572dfd5506 add static amd program info to viz (#13594)
* llvm-readelf

* amd_readelf + soft_err

* cleanup

* multiple metadata

* max wgp size, may be less
2025-12-08 04:08:14 +08:00
qazal
73093314bd viz: support list of sidebar info (#13612) 2025-12-08 03:09:43 +08:00
chenyu
b981b6f89e remove old llama grad_acc (#13611)
* remove old llama grad_acc

* GRADIENT_ACC_STEPS=1
2025-12-07 13:03:47 -05:00
Christopher Milan
94d7646bdc fix anonymous struct fields (#13610) 2025-12-07 12:56:38 -05:00
nimlgen
dcd50baca4 amd/nv: cleanup (#13608) 2025-12-07 17:05:26 +03:00
nimlgen
ac5f1e115d autogen: repro for the bug (#13607)
* autogen: repro for the test

* mute
2025-12-07 15:51:03 +03:00
Christopher Milan
4eae4b0ce6 unify adreno autogen with mesa (#13604)
* unify adreno autogen with mesa

* gen pm4

* TestTiny::test_plus works

* add a6xx enums

* IMAGE=2 TestTiny::test_gemm works

* remove adreno from CI

* cleanup
2025-12-06 15:17:36 -05:00
kamilisjon
e20bc0b9b5 remove unused function parameter in beam search (#13602) 2025-12-06 11:40:47 -05:00
nimlgen
abafb96441 hcq: check all subbufs are free (#13599)
* hcq: check all subbufs are free

* fix

* Update ops_amd.py
2025-12-06 17:43:18 +03:00
nimlgen
f2b549d921 amd: refactor scratch calc (#13595)
* amd: refactor scratch calc

* fix
2025-12-06 16:41:35 +03:00
chenyu
4562f217e1 more bert updates (#13597)
prep split jit
also lower BS to 72
2025-12-06 08:32:43 -05:00
wozeparrot
93f1baca77 feat: tk fa in tensor (#13580) 2025-12-05 14:36:29 -08:00
chenyu
cb4c6324ef revert bert grad accumulation (#13596)
prep for the new split jit style
2025-12-05 17:30:08 -05:00
qazal
f20212e1ec refactor viz error handler (#13593) 2025-12-06 02:37:39 +08:00
Christopher Milan
dec2f50aee reenable process replay for lvp (#13592) 2025-12-05 12:36:35 -05:00
chenyu
0977206b1c Revert am (#13591)
* Revert "hotfix: amd: tmpring (#13589)"

This reverts commit 4d8b283b36.

* Revert "amd: use correct structs (#13583)"

This reverts commit d8b09eda57.
2025-12-05 11:03:12 -05:00
chenyu
ac1227575f IMAGE=1 driving_vision in benchmark (#13587) 2025-12-05 10:20:54 -05:00
nimlgen
4d8b283b36 hotfix: amd: tmpring (#13589)
* hotfix: amd: tmpring

* more
2025-12-05 18:19:05 +03:00
qazal
8c332219f9 viz: remove x86asm highlighter (#13586)
* viz: remove x86asm highlighter

* formatting
2025-12-05 21:05:50 +08:00
qazal
5d8726d8d2 viz: refactor to generic sidebar (#13584) 2025-12-05 20:09:41 +08:00
nimlgen
d8b09eda57 amd: use correct structs (#13583) 2025-12-05 14:46:38 +03:00
qazal
6d92e9ffbf hotfix: skip process replay on lvp (#13585) 2025-12-05 19:25:23 +08:00
Christopher Milan
8011b953c9 mesa: remove glsl type hack (#13578)
* mesa: remove glsl type hack

* lazy type access

* save a line

* fix windows?

* mypy happy
2025-12-04 21:18:56 -05:00
George Hotz
c5bd28e21d start work on schedule cache (#13529)
* start work on schedule cache

* local unique

* schedule cache works

* schedule cache cleanup

* fix tests

* preserve metadata

* oops, fix cache

* put that there

* fix spec

* always miss

* why is that broken?

* src[0].op

* fix process replay

* delete abstractions2

* reenable the actual schedule cache

* metadata is best effort

* fix JIT in examples/gradaccum_mnist.py

* full jit

* fixed and test is real
2025-12-04 17:24:49 -08:00
wozeparrot
62e2fc5108 tk: global load/store rv (#13577) 2025-12-04 17:23:48 -08:00
Christopher Milan
5cfe1698e8 autogen: strip function parameter qualifiers (#13576)
* autogen: strip function parameter qualifiers

* regen hip

* re-regen hip
2025-12-04 19:54:34 -05:00
qazal
f21c9dbf4b enable PMC with VIZ=2 (#13575) 2025-12-05 03:09:53 +08:00
qazal
d7caae5f61 viz: tabulate pmc (#13574)
* viz: tabulate pmc

* linter

* enable nesting

* pmc comes before waves
2025-12-05 03:08:39 +08:00
chenyu
42f6cf3a90 tighter test_real_world mem and kernel count bounds (#13573)
also check if actual usage is within 20% of set limit, the old limits are too big to be useful
2025-12-04 13:35:39 -05:00
chenyu
89f9e1dcd5 add SGD to beautiful_mnist (#13571) 2025-12-04 12:17:29 -05:00
qazal
512a8f3dd4 viz: start global memory PMC tests (#13569) 2025-12-05 00:40:27 +08:00
chenyu
7df56d3b99 Optimizer.device is a property (#13568) 2025-12-04 09:25:15 -05:00
nimlgen
db99a61fad qcom: support cpu mappings (#13565)
* test

* qcom: support cpu mappings

* clean

* msg
2025-12-04 14:50:46 +03:00
George Hotz
bd6a068ef7 move track_rewrites to outer schedule cache (#13556)
Co-authored-by: qazal <qazal.software@gmail.com>
2025-12-04 19:13:45 +08:00
qazal
3eae146139 faster process replay [pr] (#13564) 2025-12-04 18:52:07 +08:00
Rory Clear
6eab756578 fix and test loading num_batches_tracked (#13538)
* fix and test loading num_batches_tracked

* add failing reverse case

* try reshape state dict if mismatch

* reshape for () and (1,)

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-12-04 01:22:49 -08:00