George Hotz
ada6b92b2d
add a gate to rewrite if there's no rules [pr] ( #13506 )
2025-11-30 17:40:52 -08:00
George Hotz
97b56e11e0
hotfix: 32 workgroups for radeon 8050s
2025-11-30 08:20:17 -08:00
George Hotz
bd4b9de7d2
use numpy in amd_uop_matmul for simpler tracing ( #13503 )
2025-11-30 08:04:38 -08:00
qazal
9023ca30ef
show number of waves in each SE/CU ( #13491 )
* show number of waves in each SE/CU
* update to test_ones
2025-11-30 22:29:16 +08:00
nimlgen
455dd88236
nv: minimal hevc ( #13502 )
* nv: minimal hevc
* validate
* not needed
* tralin
* var
* cpu
* fxi
* desc
* move
* cleanup
2025-11-30 16:46:55 +03:00
George Hotz
fd373fea7a
fix a few tests [pr] ( #13498 )
2025-11-29 13:43:45 -08:00
George Hotz
29b11c8992
bug in device enumerate where we didn't put default back ( #13495 )
2025-11-29 13:00:55 -08:00
George Hotz
6a140f74fe
split out unique_const and cache const [pr] ( #13493 )
* split out unique_const
* add cache to const
* call const in unique_const
2025-11-29 10:44:28 -08:00
George Hotz
c38b7684dc
improve microbenchmarks ( #13492 )
* improve microbenchmarks
* bugfix + ubench
* lil
* no src in const method
2025-11-29 10:15:22 -08:00
qazal
941597db71
viz UI cleanups ( #13490 )
2025-11-29 22:07:00 +08:00
qazal
d457ee0ba4
viz: correctly handle multiple sqtt traces of the same prg ( #13460 )
2025-11-29 20:52:41 +08:00
George Hotz
6f4d7c0c70
directly create tensor in _apply_uop ( #13489 )
2025-11-28 19:51:06 -08:00
kamilisjon
3d76ef9ba8
Update tests ( #13479 )
2025-11-28 18:35:28 -08:00
nimlgen
192bf4e00a
amd,nv: remove unused env vars ( #13487 )
2025-11-28 23:12:53 +03:00
qazal
ae9c56134e
skip test_tk failing locally on macbook ( #13476 )
2025-11-29 01:15:37 +08:00
qazal
f33ccd31fd
viz: instruction deduping for SQTT inst waves ( #13482 )
2025-11-28 23:17:07 +08:00
Roelof van Dijk
eb543a91e8
perf: remove graph-in-graph from expand_index ( #13473 )
* remove graph-in-graph from devectorizer
* vectorize, not sink
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-27 11:32:16 -08:00
Roelof van Dijk
d3e125d05d
keyword changed (import reserved in python) ( #13477 )
2025-11-27 11:23:00 -08:00
qazal
72ef533d9c
tracing: use u32 for buffer args encoding ( #13472 )
2025-11-28 00:19:51 +08:00
George Hotz
18addc0a1d
process replay only get_program ( #13475 )
2025-11-27 08:18:18 -08:00
George Hotz
a8e005b095
enable process replay (non-checking) by default ( #13474 )
2025-11-27 07:28:44 -08:00
qazal
952a6a8b10
viz: add kernel buffers back to the sidebar ( #13471 )
2025-11-27 22:10:35 +08:00
Kirill R.
57869387f9
Update wording in mnist.md ( #13469 )
2025-11-27 05:59:49 -08:00
nimlgen
1d207eca3d
cuda: fix fmt in compiler ( #13470 )
2025-11-27 16:51:17 +03:00
qazal
2df8a3474e
viz: bring back flops and mem in sidebar ( #13467 )
2025-11-27 17:27:44 +08:00
George Hotz
05cd2279d0
add cache on reshape ( #13466 )
* remove cache on divmod, way less objects
* _apply_reshape
* reshape
* no gc on realize
* wow that cache is fast
2025-11-26 18:57:40 -08:00
George Hotz
f4123b66df
add DEBUG_GC ( #13465 )
* add DEBUG_GC
* fixup create_schedule_with_vars
* work
2025-11-26 17:44:44 -08:00
George Hotz
19228e8d37
test_graph is flaky
2025-11-26 16:37:42 -08:00
George Hotz
268b3eb392
factor scheduling into complete_create_schedule_with_vars ( #13464 )
2025-11-26 15:43:27 -08:00
George Hotz
e4cd649ff0
remove kernelize to prepare for refactors ( #13463 )
* remove kernelize to prepare for refactors
* less kernelize
* last test
2025-11-26 14:18:50 -08:00
qazal
b63e5a7568
viz: full range x axis scroll ( #13459 )
2025-11-26 21:28:07 +08:00
qazal
c12e218751
viz: double click on INST wave ( #13458 )
2025-11-26 21:12:40 +08:00
qazal
e9cb738c7a
viz: event sidebar cleanup ( #13457 )
2025-11-26 19:47:15 +08:00
qazal
2a3b665972
viz: initial zoom at first event ( #13456 )
* viz: initial zoom at first event
* sidebar work
2025-11-26 16:42:06 +08:00
Christopher Milan
b2af92c821
fix HCQGraph.__del__ bug when finalizing ( #13298 )
* fix _do_ioctl import
* fix circular import
* suppress_finalizing instead
2025-11-25 20:33:48 -08:00
qazal
8c1e2a42fd
viz: start work on profiler speed ( #13455 )
2025-11-26 07:54:04 +08:00
wozeparrot
ffc31a23f4
tk mi350 ( #13288 )
2025-11-25 15:49:44 -08:00
nimlgen
436ab6bfc7
nv: use opt mutliple vaspaces ( #13453 )
2025-11-25 23:10:21 +03:00
qazal
7238df7a94
viz: cleanup sort_fn ( #13454 )
2025-11-26 04:10:10 +08:00
qazal
5520f1fb0b
viz: per cu timeline ( #13451 )
* add cu_loc
* work
* WAVE -> W
2025-11-26 00:05:20 +08:00
qazal
4a9562e353
viz: draw markers on top ( #13449 )
* viz: draw markers on top
* create generic label drawer
* same text rendering infrastructure for markers
* minor details
* diff
2025-11-25 17:27:01 +08:00
George Hotz
5373fd2d66
add user device ( #13447 )
* add user device
* add device_sort_fn (#13448)
Co-authored-by: qazal <qazal.software@gmail.com>
* linter
* order by dname
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-11-25 15:25:45 +08:00
George Hotz
241e533451
toposort recursive_property is faster ( #13446 )
2025-11-24 22:29:15 -08:00
George Hotz
8e8fec408e
fix n^2 _apply_map_to_tensors [pr] ( #13443 )
* clean up slow rules
* fix rule
* non n^2 toposort
* topovisit
* state dict profile_marker
2025-11-24 18:59:16 -08:00
wozeparrot
249553a119
tinyfs tweaks ( #13444 )
2025-11-24 18:07:32 -08:00
wozeparrot
f46bc31156
tk: start and step in range ( #13442 )
2025-11-24 15:43:24 -08:00
George Hotz
cc5e6323ac
stable diffusion profiling ( #13441 )
* stable diffusion profiling
Signed-off-by: George Hotz <geohot@gmail.com>
* profile_marker
* profile per step
* fix slow Context
* profile that
---------
Signed-off-by: George Hotz <geohot@gmail.com>
2025-11-24 15:25:45 -08:00
nimlgen
18cfb54736
amd: a bit better se limiting ( #13440 )
* amd: a bit better se limiting
* SQTT_LIMIT_SE=0
2025-11-24 21:51:47 +03:00
C T
2d53029be3
Whisper less flaky tests ( #13435 )
* use less flaky metric for whisper long transcription
* multiline long transcription 3 reference
* fix reference transcript
see https://homepage.ntu.edu.tw/~karchung/miniconversations/MC.htm
sanitized for whisper
* try lower wer threshold
* add test for wer metric
* extract TRANSCRIPTION_3_ALT
* rename test
* rename
* add tests for high WER difference
* move tests
* sync metric
2025-11-24 09:50:49 -08:00
qazal
2a9bd12700
sqtt: add occupancy events to the timeline ( #13430 )
2025-11-24 22:28:05 +08:00