11299 Commits

Author SHA1 Message Date
qazal
572dfd5506 add static amd program info to viz (#13594)
* llvm-readelf

* amd_readelf + soft_err

* cleanup

* multiple metadata

* max wgp size, may be less
2025-12-08 04:08:14 +08:00
qazal
73093314bd viz: support list of sidebar info (#13612) 2025-12-08 03:09:43 +08:00
chenyu
b981b6f89e remove old llama grad_acc (#13611)
* remove old llama grad_acc

* GRADIENT_ACC_STEPS=1
2025-12-07 13:03:47 -05:00
Christopher Milan
94d7646bdc fix anonymous struct fields (#13610) 2025-12-07 12:56:38 -05:00
nimlgen
dcd50baca4 amd/nv: cleanup (#13608) 2025-12-07 17:05:26 +03:00
nimlgen
ac5f1e115d autogen: repro for the bug (#13607)
* autogen: repro for the test

* mute
2025-12-07 15:51:03 +03:00
Christopher Milan
4eae4b0ce6 unify adreno autogen with mesa (#13604)
* unify adreno autogen with mesa

* gen pm4

* TestTiny::test_plus works

* add a6xx enums

* IMAGE=2 TestTiny::test_gemm works

* remove adreno from CI

* cleanup
2025-12-06 15:17:36 -05:00
kamilisjon
e20bc0b9b5 remove unused function parameter in beam search (#13602) 2025-12-06 11:40:47 -05:00
nimlgen
abafb96441 hcq: check all subbufs are free (#13599)
* hcq: check all subbufs are free

* fix

* Update ops_amd.py
2025-12-06 17:43:18 +03:00
nimlgen
f2b549d921 amd: refactor scratch calc (#13595)
* amd: refactor scratch calc

* fix
2025-12-06 16:41:35 +03:00
chenyu
4562f217e1 more bert updates (#13597)
prep split jit
also lower BS to 72
2025-12-06 08:32:43 -05:00
wozeparrot
93f1baca77 feat: tk fa in tensor (#13580) 2025-12-05 14:36:29 -08:00
chenyu
cb4c6324ef revert bert grad accumulation (#13596)
prep for the new split jit style
2025-12-05 17:30:08 -05:00
qazal
f20212e1ec refactor viz error handler (#13593) 2025-12-06 02:37:39 +08:00
Christopher Milan
dec2f50aee reenable process replay for lvp (#13592) 2025-12-05 12:36:35 -05:00
chenyu
0977206b1c Revert am (#13591)
* Revert "hotfix: amd: tmpring (#13589)"

This reverts commit 4d8b283b36.

* Revert "amd: use correct structs (#13583)"

This reverts commit d8b09eda57.
2025-12-05 11:03:12 -05:00
chenyu
ac1227575f IMAGE=1 driving_vision in benchmark (#13587) 2025-12-05 10:20:54 -05:00
nimlgen
4d8b283b36 hotfix: amd: tmpring (#13589)
* hotfix: amd: tmpring

* more
2025-12-05 18:19:05 +03:00
qazal
8c332219f9 viz: remove x86asm highlighter (#13586)
* viz: remove x86asm highlighter

* formatting
2025-12-05 21:05:50 +08:00
qazal
5d8726d8d2 viz: refactor to generic sidebar (#13584) 2025-12-05 20:09:41 +08:00
nimlgen
d8b09eda57 amd: use correct structs (#13583) 2025-12-05 14:46:38 +03:00
qazal
6d92e9ffbf hotfix: skip process replay on lvp (#13585) 2025-12-05 19:25:23 +08:00
Christopher Milan
8011b953c9 mesa: remove glsl type hack (#13578)
* mesa: remove glsl type hack

* lazy type access

* save a line

* fix windows?

* mypy happy
2025-12-04 21:18:56 -05:00
George Hotz
c5bd28e21d start work on schedule cache (#13529)
* start work on schedule cache

* local unique

* schedule cache works

* schedule cache cleanup

* fix tests

* preserve metadata

* oops, fix cache

* put that there

* fix spec

* always miss

* why is that broken?

* src[0].op

* fix process replay

* delete abstractions2

* reenable the actual schedule cache

* metadata is best effort

* fix JIT in examples/gradaccum_mnist.py

* full jit

* fixed and test is real
2025-12-04 17:24:49 -08:00
wozeparrot
62e2fc5108 tk: global load/store rv (#13577) 2025-12-04 17:23:48 -08:00
Christopher Milan
5cfe1698e8 autogen: strip function parameter qualifiers (#13576)
* autogen: strip function parameter qualifiers

* regen hip

* re-regen hip
2025-12-04 19:54:34 -05:00
qazal
f21c9dbf4b enable PMC with VIZ=2 (#13575) 2025-12-05 03:09:53 +08:00
qazal
d7caae5f61 viz: tabulate pmc (#13574)
* viz: tabulate pmc

* linter

* enable nesting

* pmc comes before waves
2025-12-05 03:08:39 +08:00
chenyu
42f6cf3a90 tighter test_real_world mem and kernel count bounds (#13573)
also check if actual usage is within 20% of set limit; the old limits are too big to be useful
2025-12-04 13:35:39 -05:00
chenyu
89f9e1dcd5 add SGD to beautiful_mnist (#13571) 2025-12-04 12:17:29 -05:00
qazal
512a8f3dd4 viz: start global memory PMC tests (#13569) 2025-12-05 00:40:27 +08:00
chenyu
7df56d3b99 Optimizer.device is a property (#13568) 2025-12-04 09:25:15 -05:00
nimlgen
db99a61fad qcom: support cpu mappings (#13565)
* test

* qcom: support cpu mappings

* clean

* msg
2025-12-04 14:50:46 +03:00
George Hotz
bd6a068ef7 move track_rewrites to outer schedule cache (#13556)
Co-authored-by: qazal <qazal.software@gmail.com>
2025-12-04 19:13:45 +08:00
qazal
3eae146139 faster process replay [pr] (#13564) 2025-12-04 18:52:07 +08:00
Rory Clear
6eab756578 fix and test loading num_batches_tracked (#13538)
* fix and test loading num_batches_tracked

* add failing reverse case

* try reshape state dict if mismatch

* reshape for () and (1,)

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-12-04 01:22:49 -08:00
nimlgen
877a7fdd61 jit: support encdec (#13563)
* jit: support encdec

* fix
2025-12-04 11:58:34 +03:00
Douglas Nyberg
a8a62bc08e add max/min reduction support to ScatterND (#13562) 2025-12-04 00:53:47 -08:00
ayanhan
edf929ec9d fix: add __delitem__ to Tensor with proper TypeError (#13561) 2025-12-04 00:53:08 -08:00
Douglas Nyberg
9411ecedc4 fix CUDA half-precision trunc() type mismatch (#13559) 2025-12-03 21:53:16 -05:00
ayanhan
92b40290c7 fix: add test_sum_int and remove outdated TODO in test_custom_kernel (#13560) 2025-12-03 21:51:58 -05:00
Christopher Milan
0a54434b15 mitigate ctypes c_bool bitfield bug (#13558)
* mitigate ctypes c_bool bitfield bug

* don't delete old test
2025-12-03 20:46:04 -05:00
George Hotz
96d16675fe update examples/gradaccum_mnist.py to use the JIT 2025-12-03 16:11:42 -08:00
George Hotz
24ca8eeaa7 small fixups from schedule_cache (#13557) 2025-12-03 15:41:16 -08:00
Douglas Nyberg
f5abd38132 remove tfa dependency: use keras.optimizers.Lamb and tf.raw_ops for LARS (#13555) 2025-12-03 17:48:27 -05:00
George Hotz
a4c4e48385 add LUNIQUE op (#13554) 2025-12-03 14:34:34 -08:00
George Hotz
a909cd4581 faster HEVC decode (#13552)
* faster HEVC decode

* bind to variables

* cleanups

* more cleanups
2025-12-03 11:33:05 -08:00
chenyu
22777a89ea minor test_uop_symbolic updates (#13551) 2025-12-03 13:17:44 -05:00
chenyu
a205f98ef4 tighter bound for MOD (#13550) 2025-12-03 11:24:29 -05:00
nimlgen
fcdb01abe7 hip: fix ioctl (#13548) 2025-12-03 16:40:43 +03:00