nimlgen
b4796e2d32
amd: set queue prio to normal ( #13658 )
2025-12-12 18:25:41 +03:00
nimlgen
a1de7787bf
am: xcc/inst support ( #13657 )
2025-12-12 17:40:42 +03:00
George Hotz
f0fa9bcd98
openai api for llm ( #13648 )
...
* openai api for llm
* responds to simple request
* schedule cache needs to unbind
* stream works
* share stream code
* 20k
* one print
* cid
2025-12-12 08:25:33 -05:00
qazal
93ad1f7732
viz: readable pmc print, share unpacker with tests ( #13655 )
...
* viz: readable pmc print, share unpacker with tests
* sections
* static analyzer
* rm that
2025-12-12 19:29:59 +08:00
Christopher Milan
760e508c3a
autogen: no deep walk ( #13654 )
...
* no deep walk
* reset init
* delete walk
* remove print
* regen
* linkage spec
* cleanup
2025-12-12 01:04:35 -05:00
wozeparrot
8f60b8dd1e
fix: cast on transpose ( #13653 )
2025-12-11 21:03:49 -08:00
Christopher Milan
950d8de00e
automatically inline anonymous ( #13652 )
2025-12-12 00:02:44 -05:00
chenyu
01e9ad0d52
clean up bert next_data ( #13650 )
...
train iter was designed to never stop for both real and fake data
2025-12-11 22:56:28 -05:00
Jakob Sachs
ab2220b834
Handle missing bfloat16 natives on CPU architectures ( #13553 )
...
* CPU: fix compiler-rt libcall by adding intermediate casts for bfloat16
* fix lint
* remove old manual bypass of bf16 for CPU tests, and add diversion conversion from bf16 to/from fp16
---------
Co-authored-by: Jakob Sachs <jakobs99@purelymail.com>
2025-12-11 15:38:43 -05:00
nimlgen
cbae33003d
ci: add usb4 ( #13643 )
...
* ci: add usb4
* debug=3
* undef
* revert
2025-12-11 19:41:41 +03:00
chenyu
03600aef1e
failed test case when init jit with empty inputs ( #13641 )
...
not related to bert grad acc, but still seems to be a bug
2025-12-10 22:03:06 -05:00
nimlgen
51f3c9f615
am: use va_base as base ( #13640 )
2025-12-10 21:09:35 +03:00
chenyu
5034c6fb37
reenable FREE_INTERMEDIATE for bert ( #13639 )
...
* reenable FREE_INTERMEDIATE for bert
* comment
2025-12-10 12:08:09 -05:00
qazal
be6d538351
viz: add kernel walltime to pmc scoreboard ( #13638 )
...
* viz: add kernel walltime to pmc scoreboard
* fix typing
* tiny TracingKey refactor
* key on kernel name
2025-12-10 20:16:42 +08:00
qazal
1666c4aaab
viz: fix counter names ordering ( #13637 )
2025-12-10 17:05:27 +08:00
qazal
c801bb7054
viz: show all kernel pmcs ( #13635 )
2025-12-10 07:16:02 +08:00
wozeparrot
4854a0c02c
fix: getattr returns AttributeError not ImportError when missing ( #13633 )
2025-12-09 14:26:54 -08:00
chenyu
016a59cafa
remove contiguous and use where in EmbeddingBert ( #13632 )
2025-12-09 15:49:21 -05:00
nimlgen
ddecba300f
amd: use getattr for autogen ( #13630 )
...
* amd: use getattr for autogen
* fi
2025-12-09 20:36:26 +03:00
Nino Risteski
76d465dbc3
optim empty shard #13513 ( #13598 )
...
* optim empty shard
* remove tuple
* simplify
* lint
* lint2
* test
* remove original buffer unique id
* new rule
* reset shard
* update
* reset shard
2025-12-09 12:28:36 -05:00
ayanhan
47a170be2e
test: enable cummax scalar IndexError test ( #13625 )
2025-12-09 12:25:56 -05:00
Christopher Milan
9eae9dc3be
regen smu_v13 with stdint ( #13631 )
...
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-12-09 12:20:01 -05:00
nimlgen
7cd8852f60
autogen: do not return tuples ( #13629 )
2025-12-09 20:08:13 +03:00
nimlgen
9e484b5b1c
hcq: check size is None, do not read the whole size for 0s ( #13628 )
2025-12-09 19:37:44 +03:00
nimlgen
1329033b8c
am: fix hot-queue restarts, only dequeue ( #13627 )
2025-12-09 19:37:21 +03:00
nimlgen
b07839493d
proclogs with xccs ( #13626 )
2025-12-09 16:46:08 +03:00
qazal
2c333818f4
simplify UOp stringifier [pr] ( #13618 )
...
* simplify UOp stringifier [pr]
* fix tuple
2025-12-09 05:06:16 +08:00
chenyu
2471b49e45
minor bert / llama change from grad acc branch ( #13622 )
...
* minor bert / llama change from grad acc branch
* revert those
2025-12-08 16:04:14 -05:00
Christopher Milan
cb3d756547
NAK compile-only test ( #13621 )
2025-12-08 15:53:46 -05:00
Christopher Milan
a4c3d48aa9
compile-only test for IR3 actually works ( #13619 )
2025-12-08 15:07:49 -05:00
Christopher Milan
a17077d1d9
skip test_double_assign in CI LVP ( #13620 )
2025-12-08 14:54:02 -05:00
Christopher Milan
1c16b6e082
Mesa: freedreno ( #12746 )
...
* ir3 init
* got a program
* 1 + 1 works
* use isa_disasm instead of shader_disasm
* wip
* matmul works
* works on py3.14
* fix const loading
* skip QCOM failing tests
* cleanup
* args actually work
* add compile-only tests
* fix typo and install tinymesa
* IR3 NULL backend
* (float32) images work
* autogen fix
* fix compile only test
* typo
* mypy happy
* compile-only uses py3.14
* bump mesa
* unify qcom disassembler
* float16 works
* disasm shows in viz
* save a line
* add real del
* variable workgroup sizes
* simplify diff
* bump line count
* properly set wgsz
* regen mesa
* no preamble
* bump lines
2025-12-08 14:02:08 -05:00
Douglas Nyberg
947c6eefc3
add Swish op ( #13541 )
...
* add Swish ONNX operator
* add Swish regression test
* remove trailing whitespace
* upgrade ONNX to 1.20, add excludes for unimplemented ops
* upgrade ONNX to 1.19, add Swish op
* upgrade ONNX to 1.19, TensorFlow to 2.18, add Swish op
* exclude attention_3d and attention_4d_gqa tests
* exclude attention fp16 tests
* exclude all attention tests
* retrigger CI
* retrigger CI - worker crash
2025-12-08 12:41:18 -05:00
nimlgen
dd8a1a10d4
amd: tiny cleanups ( #13616 )
2025-12-08 13:15:56 +03:00
qazal
2b07336c82
viz server cleanups ( #13615 )
...
* depths start at 0
* rename the api path
2025-12-08 17:44:43 +08:00
wozeparrot
89c4206e22
fix: typing ( #13614 )
2025-12-07 20:10:30 -08:00
qazal
572dfd5506
add static amd program info to viz ( #13594 )
...
* llvm-readelf
* amd_readelf + soft_err
* cleanup
* multiple metadata
* max wgp size, may be less
2025-12-08 04:08:14 +08:00
qazal
73093314bd
viz: support list of sidebar info ( #13612 )
2025-12-08 03:09:43 +08:00
chenyu
b981b6f89e
remove old llama grad_acc ( #13611 )
...
* remove old llama grad_acc
* GRADIENT_ACC_STEPS=1
2025-12-07 13:03:47 -05:00
Christopher Milan
94d7646bdc
fix anonymous struct fields ( #13610 )
2025-12-07 12:56:38 -05:00
nimlgen
dcd50baca4
amd/nv: cleanup ( #13608 )
2025-12-07 17:05:26 +03:00
nimlgen
ac5f1e115d
autogen: repro for the bug ( #13607 )
...
* autogen: repro for the test
* mute
2025-12-07 15:51:03 +03:00
Christopher Milan
4eae4b0ce6
unify adreno autogen with mesa ( #13604 )
...
* unify adreno autogen with mesa
* gen pm4
* TestTiny::test_plus works
* add a6xx enums
* IMAGE=2 TestTiny::test_gemm works
* remove adreno from CI
* cleanup
2025-12-06 15:17:36 -05:00
kamilisjon
e20bc0b9b5
remove unused function parameter in beam search ( #13602 )
2025-12-06 11:40:47 -05:00
nimlgen
abafb96441
hcq: check all subbufs are free ( #13599 )
...
* hcq: check all subbufs are free
* fix
* Update ops_amd.py
2025-12-06 17:43:18 +03:00
nimlgen
f2b549d921
amd: refactor scratch calc ( #13595 )
...
* amd: refactor scratch calc
* fix
2025-12-06 16:41:35 +03:00
chenyu
4562f217e1
more bert updates ( #13597 )
...
prep split jit
also lower BS to 72
2025-12-06 08:32:43 -05:00
wozeparrot
93f1baca77
feat: tk fa in tensor ( #13580 )
2025-12-05 14:36:29 -08:00
chenyu
cb4c6324ef
revert bert grad accumulation ( #13596 )
...
prep for the new split jit style
2025-12-05 17:30:08 -05:00
qazal
f20212e1ec
refactor viz error handler ( #13593 )
2025-12-06 02:37:39 +08:00