qazal
8390de39e6
amd: static flag check for sqtt/pmc ( #13545 )
2025-12-03 18:36:15 +08:00
George Hotz
ddf3f2d0c4
rdna3 asm + zip_extract ( #13499 )
* rdna3 asm + zip_extract
* include sqtt
* fix end parsing
* disassembler working
* parsing fields
* instruction
* op
* more parsing
2025-12-02 22:56:01 -08:00
George Hotz
6bd355fa26
add needs_second_gpu decorator ( #13543 )
* add needs_second_gpu decorator
* more skips
* two more fixes
2025-12-02 19:08:23 -08:00
wozeparrot
0d55aec605
fix after end ( #13542 )
2025-12-02 18:42:58 -08:00
chenyu
8902781dc1
enable more benchmarks ( #13540 )
* enable more benchmarks
* disable some
* adjust ASSERT_MIN_STEP_TIME
* mac NOCLANG=1
2025-12-02 20:31:14 -05:00
George Hotz
055d5aeb7f
add external_test_process_count
2025-12-02 17:26:30 -08:00
chenyu
e8879f7e31
match torch clamp backward ( #13533 )
* match torch clamp backward
* fix PYTHON
2025-12-02 17:58:32 -05:00
qazal
7622be761f
add new remu instructions from #13533 ( #13539 )
2025-12-03 06:29:20 +08:00
wozeparrot
18640f57b2
feat: configurable timeout ( #13537 )
2025-12-02 13:35:35 -08:00
chenyu
21aac568fd
limit lift x*y out of reduce to int [pr] ( #13535 )
2025-12-02 16:11:45 -05:00
Roelof van Dijk
c158e3c988
add cifar gated uop_given_valid regression test ( #13536 )
2025-12-02 16:02:47 -05:00
Roelof van Dijk
e329baffa7
fix cifar while keeping openpilot fused ( #13528 )
* this works
* test now passes
2025-12-02 12:05:56 -08:00
nimlgen
0874ba8cc8
test_hevc: do not download the whole file ( #13531 )
* test_hevc: do not download the whole file
* fix
2025-12-02 21:31:28 +03:00
qazal
366badaa68
require renderer argument in get_program, removes device opening in process replay [pr] ( #13524 )
2025-12-03 02:05:31 +08:00
George Hotz
21184ae6b1
bump cache to 14 ( #13530 )
2025-12-02 08:02:19 -08:00
George Hotz
037edc151c
late gate for ALLOW_TF32 ( #13527 )
* remove ALLOW_TF32
* the right place to put that gate
2025-12-02 07:51:58 -08:00
Douglas Nyberg
6a7c58abf1
fix(onnx): unwrap list/tuple value in Pad op ( #13500 )
* fix(onnx): unwrap list/tuple value in Pad op
* add regression test for Pad list value
* remove trailing whitespace
* use _resolve_const for Pad constant_value
2025-12-02 07:47:20 -08:00
qazal
c65aa93081
refactor sqtt loader to enable PMC=1 SQTT=0 ( #13526 )
2025-12-02 22:50:38 +08:00
chenyu
60f7c6cce6
simpler drop_and_clauses [pr] ( #13525 )
2025-12-02 09:12:21 -05:00
nimlgen
77a76d1b13
device: respect compiler ContextVars ( #13523 )
* device: envvars for cc
* fix
* fix
* x
* um
* fix
* remote
* em
* cleanup
* typing
* fix
* debug
* lvp?
* ugh
* singl
* rm
* lol
* fix
* ?
* this?
* why?
* rev
* mod test
* l
2025-12-02 14:42:04 +03:00
wozeparrot
1b7dbfb37f
tk: named kernels + per kernel range id ( #13522 )
2025-12-01 22:51:04 -08:00
wozeparrot
8713ae6de9
fix: dead sdv2 download link ( #13521 )
2025-12-01 22:50:53 -08:00
George Hotz
44104b0b7f
mnist with grad acc + Adam on CPU ( #13520 )
* mnist with grad acc + Adam on CPU
* still broken, but closer
* works w/o jit
* this works without the jit
2025-12-01 18:27:32 -08:00
George Hotz
7307120311
shard to one device is to ( #13519 )
* shard to one device is to
* fst
2025-12-01 16:29:53 -08:00
chenyu
0b92fd30f5
simpler simplify_valid [pr] ( #13514 )
dedup instead of getting a True clause which is removed later
2025-12-01 17:36:33 -05:00
qazal
a5ec3b24be
viz: start PMC in the counters view ( #13510 )
2025-12-02 00:01:57 +08:00
nimlgen
759b41ab91
amd: fix rsrc_word3 on gfx9 ( #13509 )
2025-12-01 12:47:54 +03:00
chenyu
ebbd114885
simpler invalid alu [pr] ( #13508 )
2025-11-30 22:18:42 -05:00
George Hotz
ada6b92b2d
add a gate to rewrite if there's no rules [pr] ( #13506 )
2025-11-30 17:40:52 -08:00
George Hotz
97b56e11e0
hotfix: 32 workgroups for radeon 8050s
2025-11-30 08:20:17 -08:00
George Hotz
bd4b9de7d2
use numpy in amd_uop_matmul for simpler tracing ( #13503 )
2025-11-30 08:04:38 -08:00
qazal
9023ca30ef
show number of waves in each SE/CU ( #13491 )
* show number of waves in each SE/CU
* update to test_ones
2025-11-30 22:29:16 +08:00
nimlgen
455dd88236
nv: minimal hevc ( #13502 )
* nv: minimal hevc
* validate
* not needed
* tralin
* var
* cpu
* fxi
* desc
* move
* cleanup
2025-11-30 16:46:55 +03:00
George Hotz
fd373fea7a
fix a few tests [pr] ( #13498 )
2025-11-29 13:43:45 -08:00
George Hotz
29b11c8992
bug in device enumerate where we didn't put default back ( #13495 )
2025-11-29 13:00:55 -08:00
George Hotz
6a140f74fe
split out unique_const and cache const [pr] ( #13493 )
* split out unique_const
* add cache to const
* call const in unique_const
2025-11-29 10:44:28 -08:00
George Hotz
c38b7684dc
improve microbenchmarks ( #13492 )
* improve microbenchmarks
* bugfix + ubench
* lil
* no src in const method
2025-11-29 10:15:22 -08:00
qazal
941597db71
viz UI cleanups ( #13490 )
2025-11-29 22:07:00 +08:00
qazal
d457ee0ba4
viz: correctly handle multiple sqtt traces of the same prg ( #13460 )
2025-11-29 20:52:41 +08:00
George Hotz
6f4d7c0c70
directly create tensor in _apply_uop ( #13489 )
2025-11-28 19:51:06 -08:00
kamilisjon
3d76ef9ba8
Update tests ( #13479 )
2025-11-28 18:35:28 -08:00
nimlgen
192bf4e00a
amd,nv: remove unused env vars ( #13487 )
2025-11-28 23:12:53 +03:00
qazal
ae9c56134e
skip test_tk failing locally on macbook ( #13476 )
2025-11-29 01:15:37 +08:00
qazal
f33ccd31fd
viz: instruction deduping for SQTT inst waves ( #13482 )
2025-11-28 23:17:07 +08:00
Roelof van Dijk
eb543a91e8
perf: remove graph-in-graph from expand_index ( #13473 )
* remove graph-in-graph from devectorizer
* vectorize, not sink
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-27 11:32:16 -08:00
Roelof van Dijk
d3e125d05d
keyword changed (import reserved in python) ( #13477 )
2025-11-27 11:23:00 -08:00
qazal
72ef533d9c
tracing: use u32 for buffer args encoding ( #13472 )
2025-11-28 00:19:51 +08:00
George Hotz
18addc0a1d
process replay only get_program ( #13475 )
2025-11-27 08:18:18 -08:00
George Hotz
a8e005b095
enable process replay (non-checking) by default ( #13474 )
2025-11-27 07:28:44 -08:00
qazal
952a6a8b10
viz: add kernel buffers back to the sidebar ( #13471 )
2025-11-27 22:10:35 +08:00