qazal
366badaa68
require renderer argument in get_program, removes device opening in process replay [pr] ( #13524 )
2025-12-03 02:05:31 +08:00
George Hotz
21184ae6b1
bump cache to 14 ( #13530 )
2025-12-02 08:02:19 -08:00
George Hotz
037edc151c
late gate for ALLOW_TF32 ( #13527 )
...
* remove ALLOW_TF32
* the right place to put that gate
2025-12-02 07:51:58 -08:00
Douglas Nyberg
6a7c58abf1
fix(onnx): unwrap list/tuple value in Pad op ( #13500 )
...
* fix(onnx): unwrap list/tuple value in Pad op
* add regression test for Pad list value
* remove trailing whitespace
* use _resolve_const for Pad constant_value
2025-12-02 07:47:20 -08:00
qazal
c65aa93081
refactor sqtt loader to enable PMC=1 SQTT=0 ( #13526 )
2025-12-02 22:50:38 +08:00
chenyu
60f7c6cce6
simpler drop_and_clauses [pr] ( #13525 )
2025-12-02 09:12:21 -05:00
nimlgen
77a76d1b13
device: respect compiler ContextVars ( #13523 )
...
* device: envvars for cc
* fix
* fix
* x
* um
* fix
* remote
* em
* cleanup
* typing
* fix
* debug
* lvp?
* ugh
* singl
* rm
* lol
* fix
* ?
* this?
* why?
* rev
* mod test
* l
2025-12-02 14:42:04 +03:00
wozeparrot
1b7dbfb37f
tk: named kernels + per kernel range id ( #13522 )
2025-12-01 22:51:04 -08:00
wozeparrot
8713ae6de9
fix: dead sdv2 download link ( #13521 )
2025-12-01 22:50:53 -08:00
George Hotz
44104b0b7f
mnist with grad acc + Adam on CPU ( #13520 )
...
* mnist with grad acc + Adam on CPU
* still broken, but closer
* works w/o jit
* this works without the jit
2025-12-01 18:27:32 -08:00
George Hotz
7307120311
shard to one device is to ( #13519 )
...
* shard to one device is to
* fst
2025-12-01 16:29:53 -08:00
chenyu
0b92fd30f5
simpler simplify_valid [pr] ( #13514 )
...
dedup instead of getting a True clause which is removed later
2025-12-01 17:36:33 -05:00
qazal
a5ec3b24be
viz: start PMC in the counters view ( #13510 )
2025-12-02 00:01:57 +08:00
nimlgen
759b41ab91
amd: fix rsrc_word3 on gfx9 ( #13509 )
2025-12-01 12:47:54 +03:00
chenyu
ebbd114885
simpler invalid alu [pr] ( #13508 )
2025-11-30 22:18:42 -05:00
George Hotz
ada6b92b2d
add a gate to rewrite if there's no rules [pr] ( #13506 )
2025-11-30 17:40:52 -08:00
George Hotz
97b56e11e0
hotfix: 32 workgroups for radeon 8050s
2025-11-30 08:20:17 -08:00
George Hotz
bd4b9de7d2
use numpy in amd_uop_matmul for simpler tracing ( #13503 )
2025-11-30 08:04:38 -08:00
qazal
9023ca30ef
show number of waves in each SE/CU ( #13491 )
...
* show number of waves in each SE/CU
* update to test_ones
2025-11-30 22:29:16 +08:00
nimlgen
455dd88236
nv: minimal hevc ( #13502 )
...
* nv: minimal hevc
* validate
* not needed
* tralin
* var
* cpu
* fxi
* desc
* move
* cleanup
2025-11-30 16:46:55 +03:00
George Hotz
fd373fea7a
fix a few tests [pr] ( #13498 )
2025-11-29 13:43:45 -08:00
George Hotz
29b11c8992
bug in device enumerate where we didn't put default back ( #13495 )
2025-11-29 13:00:55 -08:00
George Hotz
6a140f74fe
split out unique_const and cache const [pr] ( #13493 )
...
* split out unique_const
* add cache to const
* call const in unique_const
2025-11-29 10:44:28 -08:00
George Hotz
c38b7684dc
improve microbenchmarks ( #13492 )
...
* improve microbenchmarks
* bugfix + ubench
* lil
* no src in const method
2025-11-29 10:15:22 -08:00
qazal
941597db71
viz UI cleanups ( #13490 )
2025-11-29 22:07:00 +08:00
qazal
d457ee0ba4
viz: correctly handle multiple sqtt traces of the same prg ( #13460 )
2025-11-29 20:52:41 +08:00
George Hotz
6f4d7c0c70
directly create tensor in _apply_uop ( #13489 )
2025-11-28 19:51:06 -08:00
kamilisjon
3d76ef9ba8
Update tests ( #13479 )
2025-11-28 18:35:28 -08:00
nimlgen
192bf4e00a
amd,nv: remove unused env vars ( #13487 )
2025-11-28 23:12:53 +03:00
qazal
ae9c56134e
skip test_tk failing locally on macbook ( #13476 )
2025-11-29 01:15:37 +08:00
qazal
f33ccd31fd
viz: instruction deduping for SQTT inst waves ( #13482 )
2025-11-28 23:17:07 +08:00
Roelof van Dijk
eb543a91e8
perf: remove graph-in-graph from expand_index ( #13473 )
...
* remove graph-in-graph from devectorizer
* vectorize, not sink
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-27 11:32:16 -08:00
Roelof van Dijk
d3e125d05d
keyword changed (import reserved in python) ( #13477 )
2025-11-27 11:23:00 -08:00
qazal
72ef533d9c
tracing: use u32 for buffer args encoding ( #13472 )
2025-11-28 00:19:51 +08:00
George Hotz
18addc0a1d
process replay only get_program ( #13475 )
2025-11-27 08:18:18 -08:00
George Hotz
a8e005b095
enable process replay (non-checking) by default ( #13474 )
2025-11-27 07:28:44 -08:00
qazal
952a6a8b10
viz: add kernel buffers back to the sidebar ( #13471 )
2025-11-27 22:10:35 +08:00
Kirill R.
57869387f9
Update wording in mnist.md ( #13469 )
2025-11-27 05:59:49 -08:00
nimlgen
1d207eca3d
cuda: fix fmt in compiler ( #13470 )
2025-11-27 16:51:17 +03:00
qazal
2df8a3474e
viz: bring back flops and mem in sidebar ( #13467 )
2025-11-27 17:27:44 +08:00
George Hotz
05cd2279d0
add cache on reshape ( #13466 )
...
* remove cache on divmod, way less objects
* _apply_reshape
* reshape
* no gc on realize
* wow that cache is fast
2025-11-26 18:57:40 -08:00
George Hotz
f4123b66df
add DEBUG_GC ( #13465 )
...
* add DEBUG_GC
* fixup create_schedule_with_vars
* work
2025-11-26 17:44:44 -08:00
George Hotz
19228e8d37
test_graph is flaky
2025-11-26 16:37:42 -08:00
George Hotz
268b3eb392
factor scheduling into complete_create_schedule_with_vars ( #13464 )
2025-11-26 15:43:27 -08:00
George Hotz
e4cd649ff0
remove kernelize to prepare for refactors ( #13463 )
...
* remove kernelize to prepare for refactors
* less kernelize
* last test
2025-11-26 14:18:50 -08:00
qazal
b63e5a7568
viz: full range x axis scroll ( #13459 )
2025-11-26 21:28:07 +08:00
qazal
c12e218751
viz: double click on INST wave ( #13458 )
2025-11-26 21:12:40 +08:00
qazal
e9cb738c7a
viz: event sidebar cleanup ( #13457 )
2025-11-26 19:47:15 +08:00
qazal
2a3b665972
viz: initial zoom at first event ( #13456 )
...
* viz: initial zoom at first event
* sidebar work
2025-11-26 16:42:06 +08:00
Christopher Milan
b2af92c821
fix HCQGraph.__del__ bug when finalizing ( #13298 )
...
* fix _do_ioctl import
* fix circular import
* suppress_finalizing instead
2025-11-25 20:33:48 -08:00