qazal
572dfd5506
add static amd program info to viz ( #13594 )
* llvm-readelf
* amd_readelf + soft_err
* cleanup
* multiple metadata
* max wgp size, may be less
2025-12-08 04:08:14 +08:00
qazal
73093314bd
viz: support list of sidebar info ( #13612 )
2025-12-08 03:09:43 +08:00
chenyu
b981b6f89e
remove old llama grad_acc ( #13611 )
* remove old llama grad_acc
* GRADIENT_ACC_STEPS=1
2025-12-07 13:03:47 -05:00
Christopher Milan
94d7646bdc
fix anonymous struct fields ( #13610 )
2025-12-07 12:56:38 -05:00
nimlgen
dcd50baca4
amd/nv: cleanup ( #13608 )
2025-12-07 17:05:26 +03:00
nimlgen
ac5f1e115d
autogen: repro for the bug ( #13607 )
* autogen: repro for the test
* mute
2025-12-07 15:51:03 +03:00
Christopher Milan
4eae4b0ce6
unify adreno autogen with mesa ( #13604 )
* unify adreno autogen with mesa
* gen pm4
* TestTiny::test_plus works
* add a6xx enums
* IMAGE=2 TestTiny::test_gemm works
* remove adreno from CI
* cleanup
2025-12-06 15:17:36 -05:00
kamilisjon
e20bc0b9b5
remove unused function parameter in beam search ( #13602 )
2025-12-06 11:40:47 -05:00
nimlgen
abafb96441
hcq: check all subbufs are free ( #13599 )
* hcq: check all subbufs are free
* fix
* Update ops_amd.py
2025-12-06 17:43:18 +03:00
nimlgen
f2b549d921
amd: refactor scratch calc ( #13595 )
* amd: refactor scratch calc
* fix
2025-12-06 16:41:35 +03:00
chenyu
4562f217e1
more bert updates ( #13597 )
prep split jit
also lower BS to 72
2025-12-06 08:32:43 -05:00
wozeparrot
93f1baca77
feat: tk fa in tensor ( #13580 )
2025-12-05 14:36:29 -08:00
chenyu
cb4c6324ef
revert bert grad accumulation ( #13596 )
prep for the new split jit style
2025-12-05 17:30:08 -05:00
qazal
f20212e1ec
refactor viz error handler ( #13593 )
2025-12-06 02:37:39 +08:00
Christopher Milan
dec2f50aee
reenable process replay for lvp ( #13592 )
2025-12-05 12:36:35 -05:00
chenyu
0977206b1c
Revert am ( #13591 )
* Revert "hotfix: amd: tmpring (#13589)"
This reverts commit 4d8b283b36.
* Revert "amd: use correct structs (#13583)"
This reverts commit d8b09eda57.
2025-12-05 11:03:12 -05:00
chenyu
ac1227575f
IMAGE=1 driving_vision in benchmark ( #13587 )
2025-12-05 10:20:54 -05:00
nimlgen
4d8b283b36
hotfix: amd: tmpring ( #13589 )
* hotfix: amd: tmpring
* more
2025-12-05 18:19:05 +03:00
qazal
8c332219f9
viz: remove x86asm highlighter ( #13586 )
* viz: remove x86asm highlighter
* formatting
2025-12-05 21:05:50 +08:00
qazal
5d8726d8d2
viz: refactor to generic sidebar ( #13584 )
2025-12-05 20:09:41 +08:00
nimlgen
d8b09eda57
amd: use correct structs ( #13583 )
2025-12-05 14:46:38 +03:00
qazal
6d92e9ffbf
hotfix: skip process replay on lvp ( #13585 )
2025-12-05 19:25:23 +08:00
Christopher Milan
8011b953c9
mesa: remove glsl type hack ( #13578 )
* mesa: remove glsl type hack
* lazy type access
* save a line
* fix windows?
* mypy happy
2025-12-04 21:18:56 -05:00
George Hotz
c5bd28e21d
start work on schedule cache ( #13529 )
* start work on schedule cache
* local unique
* schedule cache works
* schedule cache cleanup
* fix tests
* preserve metadata
* oops, fix cache
* put that there
* fix spec
* always miss
* why is that broken?
* src[0].op
* fix process replay
* delete abstractions2
* reenable the actual schedule cache
* metadata is best effort
* fix JIT in examples/gradaccum_mnist.py
* full jit
* fixed and test is real
2025-12-04 17:24:49 -08:00
wozeparrot
62e2fc5108
tk: global load/store rv ( #13577 )
2025-12-04 17:23:48 -08:00
Christopher Milan
5cfe1698e8
autogen: strip function parameter qualifiers ( #13576 )
* autogen: strip function parameter qualifiers
* regen hip
* re-regen hip
2025-12-04 19:54:34 -05:00
qazal
f21c9dbf4b
enable PMC with VIZ=2 ( #13575 )
2025-12-05 03:09:53 +08:00
qazal
d7caae5f61
viz: tabulate pmc ( #13574 )
* viz: tabulate pmc
* linter
* enable nesting
* pmc comes before waves
2025-12-05 03:08:39 +08:00
chenyu
42f6cf3a90
tighter test_real_world mem and kernel count bounds ( #13573 )
also check if actual usage is within 20% of set limit, the old limits are too big to be useful
2025-12-04 13:35:39 -05:00
chenyu
89f9e1dcd5
add SGD to beautiful_mnist ( #13571 )
2025-12-04 12:17:29 -05:00
qazal
512a8f3dd4
viz: start global memory PMC tests ( #13569 )
2025-12-05 00:40:27 +08:00
chenyu
7df56d3b99
Optimizer.device is a property ( #13568 )
2025-12-04 09:25:15 -05:00
nimlgen
db99a61fad
qcom: support cpu mappings ( #13565 )
* test
* qcom: support cpu mappings
* clean
* msg
2025-12-04 14:50:46 +03:00
George Hotz
bd6a068ef7
move track_rewrites to outer schedule cache ( #13556 )
Co-authored-by: qazal <qazal.software@gmail.com>
2025-12-04 19:13:45 +08:00
qazal
3eae146139
faster process replay [pr] ( #13564 )
2025-12-04 18:52:07 +08:00
Rory Clear
6eab756578
fix and test loading num_batches_tracked ( #13538 )
* fix and test loading num_batches_tracked
* add failing reverse case
* try reshape state dict if mismatch
* reshape for () and (1,)
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-12-04 01:22:49 -08:00
nimlgen
877a7fdd61
jit: support encdec ( #13563 )
* jit: support encdec
* fix
2025-12-04 11:58:34 +03:00
Douglas Nyberg
a8a62bc08e
add max/min reduction support to ScatterND ( #13562 )
2025-12-04 00:53:47 -08:00
ayanhan
edf929ec9d
fix: add __delitem__ to Tensor with proper TypeError ( #13561 )
2025-12-04 00:53:08 -08:00
Douglas Nyberg
9411ecedc4
fix CUDA half-precision trunc() type mismatch ( #13559 )
2025-12-03 21:53:16 -05:00
ayanhan
92b40290c7
fix: add test_sum_int and remove outdated TODO in test_custom_kernel ( #13560 )
2025-12-03 21:51:58 -05:00
Christopher Milan
0a54434b15
mitigate ctypes c_bool bitfield bug ( #13558 )
* mitigate ctypes c_bool bitfield bug
* don't delete old test
2025-12-03 20:46:04 -05:00
George Hotz
96d16675fe
update examples/gradaccum_mnist.py to use the JIT
2025-12-03 16:11:42 -08:00
George Hotz
24ca8eeaa7
small fixups from schedule_cache ( #13557 )
2025-12-03 15:41:16 -08:00
Douglas Nyberg
f5abd38132
remove tfa dependency: use keras.optimizers.Lamb and tf.raw_ops for LARS ( #13555 )
2025-12-03 17:48:27 -05:00
George Hotz
a4c4e48385
add LUNIQUE op ( #13554 )
2025-12-03 14:34:34 -08:00
George Hotz
a909cd4581
faster HEVC decode ( #13552 )
* faster HEVC decode
* bind to variables
* cleanups
* more cleanups
2025-12-03 11:33:05 -08:00
chenyu
22777a89ea
minor test_uop_symbolic updates ( #13551 )
2025-12-03 13:17:44 -05:00
chenyu
a205f98ef4
tighter bound for MOD ( #13550 )
2025-12-03 11:24:29 -05:00
nimlgen
fcdb01abe7
hip: fix ioctl ( #13548 )
2025-12-03 16:40:43 +03:00