nimlgen
7cd8852f60
autogen: do not return tuples (#13629)
2025-12-09 20:08:13 +03:00
nimlgen
9e484b5b1c
hcq: check size is None, do not read the whole size for 0s (#13628)
2025-12-09 19:37:44 +03:00
nimlgen
1329033b8c
am: fix hot-queue restarts, only dequeue (#13627)
2025-12-09 19:37:21 +03:00
nimlgen
b07839493d
proclogs with xccs (#13626)
2025-12-09 16:46:08 +03:00
qazal
2c333818f4
simplify UOp stringifier [pr] (#13618)
* simplify UOp stringifier [pr]
* fix tuple
2025-12-09 05:06:16 +08:00
chenyu
2471b49e45
minor bert / llama change from grad acc branch (#13622)
* minor bert / llama change from grad acc branch
* revert those
2025-12-08 16:04:14 -05:00
Christopher Milan
cb3d756547
NAK compile-only test (#13621)
2025-12-08 15:53:46 -05:00
Christopher Milan
a4c3d48aa9
compile-only test for IR3 actually works (#13619)
2025-12-08 15:07:49 -05:00
Christopher Milan
a17077d1d9
skip test_double_assign in CI LVP (#13620)
2025-12-08 14:54:02 -05:00
Christopher Milan
1c16b6e082
Mesa: freedreno (#12746)
* ir3 init
* got a program
* 1 + 1 works
* use isa_disasm instead of shader_disasm
* wip
* matmul works
* works on py3.14
* fix const loading
* skip QCOM failing tests
* cleanup
* args actually work
* add compile-only tests
* fix typo and install tinymesa
* IR3 NULL backend
* (float32) images work
* autogen fix
* fix compile only test
* typo
* mypy happy
* compile-only uses py3.14
* bump mesa
* unify qcom disassembler
* float16 works
* disasm shows in viz
* save a line
* add real del
* variable workgroup sizes
* simplify diff
* bump line count
* properly set wgsz
* regen mesa
* no preamble
* bump lines
2025-12-08 14:02:08 -05:00
Douglas Nyberg
947c6eefc3
add Swish op (#13541)
* add Swish ONNX operator
* add Swish regression test
* remove trailing whitespace
* upgrade ONNX to 1.20, add excludes for unimplemented ops
* upgrade ONNX to 1.19, add Swish op
* upgrade ONNX to 1.19, TensorFlow to 2.18, add Swish op
* exclude attention_3d and attention_4d_gqa tests
* exclude attention fp16 tests
* exclude all attention tests
* retrigger CI
* retrigger CI - worker crash
2025-12-08 12:41:18 -05:00
nimlgen
dd8a1a10d4
amd: tiny cleanups (#13616)
2025-12-08 13:15:56 +03:00
qazal
2b07336c82
viz server cleanups (#13615)
* depths start at 0
* rename the api path
2025-12-08 17:44:43 +08:00
wozeparrot
89c4206e22
fix: typing (#13614)
2025-12-07 20:10:30 -08:00
qazal
572dfd5506
add static amd program info to viz (#13594)
* llvm-readelf
* amd_readelf + soft_err
* cleanup
* multiple metadata
* max wgp size, may be less
2025-12-08 04:08:14 +08:00
qazal
73093314bd
viz: support list of sidebar info (#13612)
2025-12-08 03:09:43 +08:00
chenyu
b981b6f89e
remove old llama grad_acc (#13611)
* remove old llama grad_acc
* GRADIENT_ACC_STEPS=1
2025-12-07 13:03:47 -05:00
Christopher Milan
94d7646bdc
fix anonymous struct fields (#13610)
2025-12-07 12:56:38 -05:00
nimlgen
dcd50baca4
amd/nv: cleanup (#13608)
2025-12-07 17:05:26 +03:00
nimlgen
ac5f1e115d
autogen: repro for the bug (#13607)
* autogen: repro for the test
* mute
2025-12-07 15:51:03 +03:00
Christopher Milan
4eae4b0ce6
unify adreno autogen with mesa (#13604)
* unify adreno autogen with mesa
* gen pm4
* TestTiny::test_plus works
* add a6xx enums
* IMAGE=2 TestTiny::test_gemm works
* remove adreno from CI
* cleanup
2025-12-06 15:17:36 -05:00
kamilisjon
e20bc0b9b5
remove unused function parameter in beam search (#13602)
2025-12-06 11:40:47 -05:00
nimlgen
abafb96441
hcq: check all subbufs are free (#13599)
* hcq: check all subbufs are free
* fix
* Update ops_amd.py
2025-12-06 17:43:18 +03:00
nimlgen
f2b549d921
amd: refactor scratch calc (#13595)
* amd: refactor scratch calc
* fix
2025-12-06 16:41:35 +03:00
chenyu
4562f217e1
more bert updates (#13597)
prep split jit
also lower BS to 72
2025-12-06 08:32:43 -05:00
wozeparrot
93f1baca77
feat: tk fa in tensor (#13580)
2025-12-05 14:36:29 -08:00
chenyu
cb4c6324ef
revert bert grad accumulation (#13596)
prep for the new split jit style
2025-12-05 17:30:08 -05:00
qazal
f20212e1ec
refactor viz error handler (#13593)
2025-12-06 02:37:39 +08:00
Christopher Milan
dec2f50aee
reenable process replay for lvp (#13592)
2025-12-05 12:36:35 -05:00
chenyu
0977206b1c
Revert am (#13591)
* Revert "hotfix: amd: tmpring (#13589)"
This reverts commit 4d8b283b36.
* Revert "amd: use correct structs (#13583)"
This reverts commit d8b09eda57.
2025-12-05 11:03:12 -05:00
chenyu
ac1227575f
IMAGE=1 driving_vision in benchmark (#13587)
2025-12-05 10:20:54 -05:00
nimlgen
4d8b283b36
hotfix: amd: tmpring (#13589)
* hotfix: amd: tmpring
* more
2025-12-05 18:19:05 +03:00
qazal
8c332219f9
viz: remove x86asm highlighter (#13586)
* viz: remove x86asm highlighter
* formatting
2025-12-05 21:05:50 +08:00
qazal
5d8726d8d2
viz: refactor to generic sidebar (#13584)
2025-12-05 20:09:41 +08:00
nimlgen
d8b09eda57
amd: use correct structs (#13583)
2025-12-05 14:46:38 +03:00
qazal
6d92e9ffbf
hotfix: skip process replay on lvp (#13585)
2025-12-05 19:25:23 +08:00
Christopher Milan
8011b953c9
mesa: remove glsl type hack (#13578)
* mesa: remove glsl type hack
* lazy type access
* save a line
* fix windows?
* mypy happy
2025-12-04 21:18:56 -05:00
George Hotz
c5bd28e21d
start work on schedule cache (#13529)
* start work on schedule cache
* local unique
* schedule cache works
* schedule cache cleanup
* fix tests
* preserve metadata
* oops, fix cache
* put that there
* fix spec
* always miss
* why is that broken?
* src[0].op
* fix process replay
* delete abstractions2
* reenable the actual schedule cache
* metadata is best effort
* fix JIT in examples/gradaccum_mnist.py
* full jit
* fixed and test is real
2025-12-04 17:24:49 -08:00
wozeparrot
62e2fc5108
tk: global load/store rv (#13577)
2025-12-04 17:23:48 -08:00
Christopher Milan
5cfe1698e8
autogen: strip function parameter qualifiers (#13576)
* autogen: strip function parameter qualifiers
* regen hip
* re-regen hip
2025-12-04 19:54:34 -05:00
qazal
f21c9dbf4b
enable PMC with VIZ=2 (#13575)
2025-12-05 03:09:53 +08:00
qazal
d7caae5f61
viz: tabulate pmc (#13574)
* viz: tabulate pmc
* linter
* enable nesting
* pmc comes before waves
2025-12-05 03:08:39 +08:00
chenyu
42f6cf3a90
tighter test_real_world mem and kernel count bounds (#13573)
also check if actual usage is within 20% of set limit, the old limits are too big to be useful
2025-12-04 13:35:39 -05:00
chenyu
89f9e1dcd5
add SGD to beautiful_mnist (#13571)
2025-12-04 12:17:29 -05:00
qazal
512a8f3dd4
viz: start global memory PMC tests (#13569)
2025-12-05 00:40:27 +08:00
chenyu
7df56d3b99
Optimizer.device is a property (#13568)
2025-12-04 09:25:15 -05:00
nimlgen
db99a61fad
qcom: support cpu mappings (#13565)
* test
* qcom: support cpu mappings
* clean
* msg
2025-12-04 14:50:46 +03:00
George Hotz
bd6a068ef7
move track_rewrites to outer schedule cache (#13556)
Co-authored-by: qazal <qazal.software@gmail.com>
2025-12-04 19:13:45 +08:00
qazal
3eae146139
faster process replay [pr] (#13564)
2025-12-04 18:52:07 +08:00
Rory Clear
6eab756578
fix and test loading num_batches_tracked (#13538)
* fix and test loading num_batches_tracked
* add failing reverse case
* try reshape state dict if mismatch
* reshape for () and (1,)
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-12-04 01:22:49 -08:00