11446 Commits

Author SHA1 Message Date
qazal
8390de39e6 amd: static flag check for sqtt/pmc (#13545) 2025-12-03 18:36:15 +08:00
George Hotz
ddf3f2d0c4 rdna3 asm + zip_extract (#13499)
* rdna3 asm + zip_extract

* include sqtt

* fix end parsing

* disassembler working

* parsing fields

* instruction

* op

* more parsing
2025-12-02 22:56:01 -08:00
George Hotz
6bd355fa26 add needs_second_gpu decorator (#13543)
* add needs_second_gpu decorator

* more skips

* two more fixes
2025-12-02 19:08:23 -08:00
wozeparrot
0d55aec605 fix after end (#13542) 2025-12-02 18:42:58 -08:00
chenyu
8902781dc1 enable more benchmarks (#13540)
* enable more benchmarks

* disable some

* adjust ASSERT_MIN_STEP_TIME

* mac NOCLANG=1
2025-12-02 20:31:14 -05:00
George Hotz
055d5aeb7f add external_test_process_count 2025-12-02 17:26:30 -08:00
chenyu
e8879f7e31 match torch clamp backward (#13533)
* match torch clamp backward

* fix PYTHON
2025-12-02 17:58:32 -05:00
qazal
7622be761f add new remu instructions from #13533 (#13539) 2025-12-03 06:29:20 +08:00
wozeparrot
18640f57b2 feat: configurable timeout (#13537) 2025-12-02 13:35:35 -08:00
chenyu
21aac568fd limit lift x*y out of reduce to int [pr] (#13535) 2025-12-02 16:11:45 -05:00
Roelof van Dijk
c158e3c988 add cifar gated uop_given_valid regression test (#13536) 2025-12-02 16:02:47 -05:00
Roelof van Dijk
e329baffa7 fix cifar while keeping openpilot fused (#13528)
* this works

* test now passes
2025-12-02 12:05:56 -08:00
nimlgen
0874ba8cc8 test_hevc: do not download the whole file (#13531)
* test_hevc: do not download the whole file

* fix
2025-12-02 21:31:28 +03:00
qazal
366badaa68 require renderer argument in get_program, removes device opening in process replay [pr] (#13524) 2025-12-03 02:05:31 +08:00
George Hotz
21184ae6b1 bump cache to 14 (#13530) 2025-12-02 08:02:19 -08:00
George Hotz
037edc151c late gate for ALLOW_TF32 (#13527)
* remove ALLOW_TF32

* the right place to put that gate
2025-12-02 07:51:58 -08:00
Douglas Nyberg
6a7c58abf1 fix(onnx): unwrap list/tuple value in Pad op (#13500)
* fix(onnx): unwrap list/tuple value in Pad op

* add regression test for Pad list value

* remove trailing whitespace

* use _resolve_const for Pad constant_value
2025-12-02 07:47:20 -08:00
qazal
c65aa93081 refactor sqtt loader to enable PMC=1 SQTT=0 (#13526) 2025-12-02 22:50:38 +08:00
chenyu
60f7c6cce6 simpler drop_and_clauses [pr] (#13525) 2025-12-02 09:12:21 -05:00
nimlgen
77a76d1b13 device: respect compiler ContextVars (#13523)
* device: envvars for cc

* fix

* fix

* x

* um

* fix

* remote

* em

* cleanup

* typing

* fix

* debug

* lvp?

* ugh

* singl

* rm

* lol

* fix

* ?

* this?

* why?

* rev

* mod test

* l
2025-12-02 14:42:04 +03:00
wozeparrot
1b7dbfb37f tk: named kernels + per kernel range id (#13522) 2025-12-01 22:51:04 -08:00
wozeparrot
8713ae6de9 fix: dead sdv2 download link (#13521) 2025-12-01 22:50:53 -08:00
George Hotz
44104b0b7f mnist with grad acc + Adam on CPU (#13520)
* mnist with grad acc + Adam on CPU

* still broken, but closer

* works w/o jit

* this works without the jit
2025-12-01 18:27:32 -08:00
George Hotz
7307120311 shard to one device is to (#13519)
* shard to one device is to

* fst
2025-12-01 16:29:53 -08:00
chenyu
0b92fd30f5 simpler simplify_valid [pr] (#13514)
dedup instead of getting a True clause which is removed later
2025-12-01 17:36:33 -05:00
qazal
a5ec3b24be viz: start PMC in the counters view (#13510) 2025-12-02 00:01:57 +08:00
nimlgen
759b41ab91 amd: fix rsrc_word3 on gfx9 (#13509) 2025-12-01 12:47:54 +03:00
chenyu
ebbd114885 simpler invalid alu [pr] (#13508) 2025-11-30 22:18:42 -05:00
George Hotz
ada6b92b2d add a gate to rewrite if there's no rules [pr] (#13506) 2025-11-30 17:40:52 -08:00
George Hotz
97b56e11e0 hotfix: 32 workgroups for radeon 8050s 2025-11-30 08:20:17 -08:00
George Hotz
bd4b9de7d2 use numpy in amd_uop_matmul for simpler tracing (#13503) 2025-11-30 08:04:38 -08:00
qazal
9023ca30ef show number of waves in each SE/CU (#13491)
* show number of waves in each SE/CU

* update to test_ones
2025-11-30 22:29:16 +08:00
nimlgen
455dd88236 nv: minimal hevc (#13502)
* nv: minimal hevc

* validate

* not needed

* tralin

* var

* cpu

* fxi

* desc

* move

* cleanup
2025-11-30 16:46:55 +03:00
George Hotz
fd373fea7a fix a few tests [pr] (#13498) 2025-11-29 13:43:45 -08:00
George Hotz
29b11c8992 bug in device enumerate where we didn't put default back (#13495) 2025-11-29 13:00:55 -08:00
George Hotz
6a140f74fe split out unique_const and cache const [pr] (#13493)
* split out unique_const

* add cache to const

* call const in unique_const
2025-11-29 10:44:28 -08:00
George Hotz
c38b7684dc improve microbenchmarks (#13492)
* improve microbenchmarks

* bugfix + ubench

* lil

* no src in const method
2025-11-29 10:15:22 -08:00
qazal
941597db71 viz UI cleanups (#13490) 2025-11-29 22:07:00 +08:00
qazal
d457ee0ba4 viz: correctly handle multiple sqtt traces of the same prg (#13460) 2025-11-29 20:52:41 +08:00
George Hotz
6f4d7c0c70 directly create tensor in _apply_uop (#13489) 2025-11-28 19:51:06 -08:00
kamilisjon
3d76ef9ba8 Update tests (#13479) 2025-11-28 18:35:28 -08:00
nimlgen
192bf4e00a amd,nv: remove unused env vars (#13487) 2025-11-28 23:12:53 +03:00
qazal
ae9c56134e skip test_tk failing locally on macbook (#13476) 2025-11-29 01:15:37 +08:00
qazal
f33ccd31fd viz: instruction deduping for SQTT inst waves (#13482) 2025-11-28 23:17:07 +08:00
Roelof van Dijk
eb543a91e8 perf: remove graph-in-graph from expand_index (#13473)
* remove graph-in-graph from devectorizer

* vectorize, not sink

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-27 11:32:16 -08:00
Roelof van Dijk
d3e125d05d keyword changed (import reserved in python) (#13477) 2025-11-27 11:23:00 -08:00
qazal
72ef533d9c tracing: use u32 for buffer args encoding (#13472) 2025-11-28 00:19:51 +08:00
George Hotz
18addc0a1d process replay only get_program (#13475) 2025-11-27 08:18:18 -08:00
George Hotz
a8e005b095 enable process replay (non-checking) by default (#13474) 2025-11-27 07:28:44 -08:00
qazal
952a6a8b10 viz: add kernel buffers back to the sidebar (#13471) 2025-11-27 22:10:35 +08:00