Commit Graph

11329 Commits

Author SHA1 Message Date
Christopher Milan
950d8de00e automatically inline anonymous (#13652) 2025-12-12 00:02:44 -05:00
chenyu
01e9ad0d52 clean up bert next_data (#13650)
train iter was designed to never stop for both real and fake data
2025-12-11 22:56:28 -05:00
Jakob Sachs
ab2220b834 Handle missing bfloat16 natives on CPU architectures (#13553)
* CPU: fix compiler-rt libcall by adding intermediate casts for bfloat16

* fix lint

* remove old manual bypass of bf16 for CPU tests, and add diversion conversion from bf16 to/from fp16

---------

Co-authored-by: Jakob Sachs <jakobs99@purelymail.com>
2025-12-11 15:38:43 -05:00
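
The entry above handles CPUs without native bfloat16 support by adding intermediate casts so the compiler-rt libcalls are avoided. A hedged user-level sketch of the same idea in tinygrad (assuming a CPU backend where bf16 arithmetic is unsupported): do the math in float32 and keep bfloat16 only as a storage type.

```python
from tinygrad import Tensor, dtypes

# Sketch only: route bf16 math through float32, mirroring the intermediate-cast fix above.
a = Tensor([1.0, 2.0, 3.0], dtype=dtypes.bfloat16)
b = Tensor([0.5, 0.25, 0.125], dtype=dtypes.bfloat16)
c = (a.cast(dtypes.float32) + b.cast(dtypes.float32)).cast(dtypes.bfloat16)
print(c.cast(dtypes.float32).numpy())  # cast back up to print; numpy has no bf16
```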
nimlgen
cbae33003d ci: add usb4 (#13643)
* ci: add usb4

* debug=3

* undef

* revert
2025-12-11 19:41:41 +03:00
chenyu
03600aef1e failed test case when init jit with empty inputs (#13641)
not related to bert grad acc, but still seems to be a bug
2025-12-10 22:03:06 -05:00
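
To make the failing scenario concrete, a minimal sketch of what "init jit with empty inputs" can look like (illustrative only; the actual test in the PR may differ): a TinyJit-wrapped function with no Tensor arguments, so the JIT is captured with an empty input list.

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def step() -> Tensor:
  # no Tensor arguments, so the jit is called with empty inputs
  return (Tensor.ones(16) * 2).contiguous().realize()

for _ in range(3):  # TinyJit captures on the first calls and replays afterwards
  out = step()
```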
nimlgen
51f3c9f615 am: use va_base as base (#13640) 2025-12-10 21:09:35 +03:00
chenyu
5034c6fb37 reenable FREE_INTERMEDIATE for bert (#13639)
* reenable FREE_INTERMEDIATE for bert

* comment
2025-12-10 12:08:09 -05:00
qazal
be6d538351 viz: add kernel walltime to pmc scoreboard (#13638)
* viz: add kernel walltime to pmc scoreboard

* fix typing

* tiny TracingKey refactor

* key on kernel name
2025-12-10 20:16:42 +08:00
qazal
1666c4aaab viz: fix counter names ordering (#13637) 2025-12-10 17:05:27 +08:00
qazal
c801bb7054 viz: show all kernel pmcs (#13635) 2025-12-10 07:16:02 +08:00
wozeparrot
4854a0c02c fix: getattr returns AttributeError not ImportError when missing (#13633) 2025-12-09 14:26:54 -08:00
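
The fix above follows the standard Python contract (PEP 562): a module-level __getattr__ should raise AttributeError for unknown names so that hasattr() and getattr() with a default keep working; raising ImportError breaks both. An illustrative sketch, with a made-up attribute name:

```python
# illustrative module-level __getattr__ (PEP 562); "heavy_thing" is not a real name in the repo
def __getattr__(name: str):
  if name == "heavy_thing":
    import json  # stand-in for a lazily imported dependency
    return json
  raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```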
chenyu
016a59cafa remove contiguous and use where in EmbeddingBert (#13632) 2025-12-09 15:49:21 -05:00
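
As an illustration of the general where-based technique referenced above (not the repo's exact EmbeddingBert code), an embedding lookup can be written as a comparison plus where followed by a matmul, avoiding an explicit gather:

```python
from tinygrad import Tensor

# illustrative where-based embedding; the actual EmbeddingBert implementation may differ
def embedding(idx: Tensor, weight: Tensor) -> Tensor:  # idx: (B, T) ints, weight: (V, E)
  vocab = weight.shape[0]
  onehot = (idx.unsqueeze(-1) == Tensor.arange(vocab)).where(1.0, 0.0)  # (B, T, V)
  return onehot @ weight  # (B, T, E)
```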
nimlgen
ddecba300f amd: use getattr for autogen (#13630)
* amd: use getattr for autogen

* fix
2025-12-09 20:36:26 +03:00
Nino Risteski
76d465dbc3 optim empty shard #13513 (#13598)
* optim empty shard

* remove tuple

* simplify

* lint

* lint2

* test

* remove original buffer unique id

* new rule

* reset shard

* update

* reset shard
2025-12-09 12:28:36 -05:00
ayanhan
47a170be2e test: enable cummax scalar IndexError test (#13625) 2025-12-09 12:25:56 -05:00
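
A minimal sketch of what the enabled test presumably exercises: a 0-d (scalar) Tensor has no axis to scan, so cummax along axis 0 raises IndexError.

```python
from tinygrad import Tensor

# presumably what the enabled test checks: a scalar has no axis 0 to cummax over
try:
  Tensor(3.0).cummax(0)
except IndexError as e:
  print("IndexError as expected:", e)
```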
Christopher Milan
9eae9dc3be regen smu_v13 with stdint (#13631)
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-12-09 12:20:01 -05:00
nimlgen
7cd8852f60 autogen: do not return tuples (#13629) 2025-12-09 20:08:13 +03:00
nimlgen
9e484b5b1c hcq: check size is None, do not read the whole size for 0s (#13628) 2025-12-09 19:37:44 +03:00
nimlgen
1329033b8c am: fix hot-queue restarts, only dequeue (#13627) 2025-12-09 19:37:21 +03:00
nimlgen
b07839493d proclogs with xccs (#13626) 2025-12-09 16:46:08 +03:00
qazal
2c333818f4 simplify UOp stringifier [pr] (#13618)
* simplify UOp stringifier [pr]

* fix tuple
2025-12-09 05:06:16 +08:00
chenyu
2471b49e45 minor bert / llama change from grad acc branch (#13622)
* minor bert / llama change from grad acc branch

* revert those
2025-12-08 16:04:14 -05:00
Christopher Milan
cb3d756547 NAK compile-only test (#13621) 2025-12-08 15:53:46 -05:00
Christopher Milan
a4c3d48aa9 compile-only test for IR3 actually works (#13619) 2025-12-08 15:07:49 -05:00
Christopher Milan
a17077d1d9 skip test_double_assign in CI LVP (#13620) 2025-12-08 14:54:02 -05:00
Christopher Milan
1c16b6e082 Mesa: freedreno (#12746)
* ir3 init

* got a program

* 1 + 1 works

* use isa_disasm instead of shader_disasm

* wip

* matmul works

* works on py3.14

* fix const loading

* skip QCOM failing tests

* cleanup

* args actually work

* add compile-only tests

* fix typo and install tinymesa

* IR3 NULL backend

* (float32) images work

* autogen fix

* fix compile only test

* typo

* mypy happy

* compile-only uses py3.14

* bump mesa

* unify qcom disassembler

* float16 works

* disasm shows in viz

* save a line

* add real del

* variable workgroup sizes

* simplify diff

* bump line count

* properly set wgsz

* regen mesa

* no preamble

* bump lines
2025-12-08 14:02:08 -05:00
Douglas Nyberg
947c6eefc3 add Swish op (#13541)
* add Swish ONNX operator

* add Swish regression test

* remove trailing whitespace

* upgrade ONNX to 1.20, add excludes for unimplemented ops

* upgrade ONNX to 1.19, add Swish op

* upgrade ONNX to 1.19, TensorFlow to 2.18, add Swish op

* exclude attention_3d and attention_4d_gqa tests

* exclude attention fp16 tests

* exclude all attention tests

* retrigger CI

* retrigger CI - worker crash
2025-12-08 12:41:18 -05:00
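
For reference, the Swish activation added above is swish(x) = x * sigmoid(alpha * x), with alpha defaulting to 1.0 (where it coincides with SiLU). A minimal tinygrad sketch; the repo's ONNX handler may differ in its details:

```python
from tinygrad import Tensor

# swish(x) = x * sigmoid(alpha * x); with alpha = 1.0 this is SiLU
def swish(x: Tensor, alpha: float = 1.0) -> Tensor:
  return x * (alpha * x).sigmoid()

print(swish(Tensor([-2.0, 0.0, 2.0])).numpy())
```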
nimlgen
dd8a1a10d4 amd: tiny cleanups (#13616) 2025-12-08 13:15:56 +03:00
qazal
2b07336c82 viz server cleanups (#13615)
* depths start at 0

* rename the api path
2025-12-08 17:44:43 +08:00
wozeparrot
89c4206e22 fix: typing (#13614) 2025-12-07 20:10:30 -08:00
qazal
572dfd5506 add static amd program info to viz (#13594)
* llvm-readelf

* amd_readelf + soft_err

* cleanup

* multiple metadata

* max wgp size, may be less
2025-12-08 04:08:14 +08:00
qazal
73093314bd viz: support list of sidebar info (#13612) 2025-12-08 03:09:43 +08:00
chenyu
b981b6f89e remove old llama grad_acc (#13611)
* remove old llama grad_acc

* GRADIENT_ACC_STEPS=1
2025-12-07 13:03:47 -05:00
Christopher Milan
94d7646bdc fix anonymous struct fields (#13610) 2025-12-07 12:56:38 -05:00
nimlgen
dcd50baca4 amd/nv: cleanup (#13608) 2025-12-07 17:05:26 +03:00
nimlgen
ac5f1e115d autogen: repro for the bug (#13607)
* autogen: repro for the test

* mute
2025-12-07 15:51:03 +03:00
Christopher Milan
4eae4b0ce6 unify adreno autogen with mesa (#13604)
* unify adreno autogen with mesa

* gen pm4

* TestTiny::test_plus works

* add a6xx enums

* IMAGE=2 TestTiny::test_gemm works

* remove adreno from CI

* cleanup
2025-12-06 15:17:36 -05:00
kamilisjon
e20bc0b9b5 remove unused function parameter in beam search (#13602) 2025-12-06 11:40:47 -05:00
nimlgen
abafb96441 hcq: check all subbufs are free (#13599)
* hcq: check all subbufs are free

* fix

* Update ops_amd.py
2025-12-06 17:43:18 +03:00
nimlgen
f2b549d921 amd: refactor scratch calc (#13595)
* amd: refactor scratch calc

* fix
2025-12-06 16:41:35 +03:00
chenyu
4562f217e1 more bert updates (#13597)
prep split jit
also lower BS to 72
2025-12-06 08:32:43 -05:00
wozeparrot
93f1baca77 feat: tk fa in tensor (#13580) 2025-12-05 14:36:29 -08:00
chenyu
cb4c6324ef revert bert grad accumulation (#13596)
prep for the new split jit style
2025-12-05 17:30:08 -05:00
qazal
f20212e1ec refactor viz error handler (#13593) 2025-12-06 02:37:39 +08:00
Christopher Milan
dec2f50aee reenable process replay for lvp (#13592) 2025-12-05 12:36:35 -05:00
chenyu
0977206b1c Revert am (#13591)
* Revert "hotfix: amd: tmpring (#13589)"

This reverts commit 4d8b283b36.

* Revert "amd: use correct structs (#13583)"

This reverts commit d8b09eda57.
2025-12-05 11:03:12 -05:00
chenyu
ac1227575f IMAGE=1 driving_vision in benchmark (#13587) 2025-12-05 10:20:54 -05:00
nimlgen
4d8b283b36 hotfix: amd: tmpring (#13589)
* hotfix: amd: tmpring

* more
2025-12-05 18:19:05 +03:00
qazal
8c332219f9 viz: remove x86asm highlighter (#13586)
* viz: remove x86asm highlighter

* formatting
2025-12-05 21:05:50 +08:00
qazal
5d8726d8d2 viz: refactor to generic sidebar (#13584) 2025-12-05 20:09:41 +08:00