Commit Graph

11201 Commits

Author SHA1 Message Date
Roelof van Dijk
d3e125d05d keyword changed (import reserved in python) (#13477) 2025-11-27 11:23:00 -08:00
qazal
72ef533d9c tracing: use u32 for buffer args encoding (#13472) 2025-11-28 00:19:51 +08:00
George Hotz
18addc0a1d process replay only get_program (#13475) 2025-11-27 08:18:18 -08:00
George Hotz
a8e005b095 enable process replay (non-checking) by default (#13474) 2025-11-27 07:28:44 -08:00
qazal
952a6a8b10 viz: add kernel buffers back to the sidebar (#13471) 2025-11-27 22:10:35 +08:00
Kirill R.
57869387f9 Update wording in mnist.md (#13469) 2025-11-27 05:59:49 -08:00
nimlgen
1d207eca3d cuda: fix fmt in compiler (#13470) 2025-11-27 16:51:17 +03:00
qazal
2df8a3474e viz: bring back flops and mem in sidebar (#13467) 2025-11-27 17:27:44 +08:00
George Hotz
05cd2279d0 add cache on reshape (#13466)
* remove cache on divmod, way less objects

* _apply_reshape

* reshape

* no gc on realize

* wow that cache is fast
2025-11-26 18:57:40 -08:00
George Hotz
f4123b66df add DEBUG_GC (#13465)
* add DEBUG_GC

* fixup create_schedule_with_vars

* work
2025-11-26 17:44:44 -08:00
George Hotz
19228e8d37 test_graph is flaky 2025-11-26 16:37:42 -08:00
George Hotz
268b3eb392 factor scheduling into complete_create_schedule_with_vars (#13464) 2025-11-26 15:43:27 -08:00
George Hotz
e4cd649ff0 remove kernelize to prepare for refactors (#13463)
* remove kernelize to prepare for refactors

* less kernelize

* last test
2025-11-26 14:18:50 -08:00
qazal
b63e5a7568 viz: full range x axis scroll (#13459) 2025-11-26 21:28:07 +08:00
qazal
c12e218751 viz: double click on INST wave (#13458) 2025-11-26 21:12:40 +08:00
qazal
e9cb738c7a viz: event sidebar cleanup (#13457) 2025-11-26 19:47:15 +08:00
qazal
2a3b665972 viz: initial zoom at first event (#13456)
* viz: initial zoom at first event

* sidebar work
2025-11-26 16:42:06 +08:00
Christopher Milan
b2af92c821 fix HCQGraph.__del__ bug when finalizing (#13298)
* fix _do_ioctl import

* fix circular import

* suppress_finalizing instead
2025-11-25 20:33:48 -08:00
qazal
8c1e2a42fd viz: start work on profiler speed (#13455) 2025-11-26 07:54:04 +08:00
wozeparrot
ffc31a23f4 tk mi350 (#13288) 2025-11-25 15:49:44 -08:00
nimlgen
436ab6bfc7 nv: use opt mutliple vaspaces (#13453) 2025-11-25 23:10:21 +03:00
qazal
7238df7a94 viz: cleanup sort_fn (#13454) 2025-11-26 04:10:10 +08:00
qazal
5520f1fb0b viz: per cu timeline (#13451)
* add cu_loc

* work

* WAVE -> W
2025-11-26 00:05:20 +08:00
qazal
4a9562e353 viz: draw markers on top (#13449)
* viz: draw markers on top

* create generic label drawer

* same text rendering infrastructure for markers

* minor details

* diff
2025-11-25 17:27:01 +08:00
George Hotz
5373fd2d66 add user device (#13447)
* add user device

* add device_sort_fn (#13448)

Co-authored-by: qazal <qazal.software@gmail.com>

* linter

* order by dname

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-11-25 15:25:45 +08:00
George Hotz
241e533451 toposort recursive_property is faster (#13446) 2025-11-24 22:29:15 -08:00
George Hotz
8e8fec408e fix n^2 _apply_map_to_tensors [pr] (#13443)
* clean up slow rules

* fix rule

* non n^2 toposort

* topovisit

* state dict profile_marker
2025-11-24 18:59:16 -08:00
wozeparrot
249553a119 tinyfs tweaks (#13444) 2025-11-24 18:07:32 -08:00
wozeparrot
f46bc31156 tk: start and step in range (#13442) 2025-11-24 15:43:24 -08:00
George Hotz
cc5e6323ac stable diffusion profiling (#13441)
* stable diffusion profiling

Signed-off-by: George Hotz <geohot@gmail.com>

* profile_marker

* profile per step

* fix slow Context

* profile that

---------

Signed-off-by: George Hotz <geohot@gmail.com>
2025-11-24 15:25:45 -08:00
nimlgen
18cfb54736 amd: a bit better se limiting (#13440)
* amd: a bit better se limiting

* SQTT_LIMIT_SE=0
2025-11-24 21:51:47 +03:00
C T
2d53029be3 Whisper less flaky tests (#13435)
* use less flaky metric for whisper long transcription

* multiline long transcription 3 reference

* fix reference transcript

see https://homepage.ntu.edu.tw/~karchung/miniconversations/MC.htm
sanitized for whisper

* try lower wer threshold

* add test for wer metric

* extract TRANSCRIPTION_3_ALT

* rename test

* rename

* add tests for high WER difference

* move tests

* sync metric
2025-11-24 09:50:49 -08:00
qazal
2a9bd12700 sqtt: add occupancy events to the timeline (#13430) 2025-11-24 22:28:05 +08:00
Sieds Lykles
63a931ff76 Symbolic divisor fuzzer (#13433)
* render z3 range better

* working version

* rename

* add to workflow

* factor out variable_names

* smaller expressions

* smaller

* + back
2025-11-23 20:29:32 +01:00
nimlgen
677db34eba nv: cleanup map flags (#13434) 2025-11-23 19:54:52 +03:00
qazal
712c7a6448 sqtt loader cleanups from the occupancy branch (#13431)
* cleanup err handling

* from disasms

* s/wave_execs/wave_insts
2025-11-23 21:50:34 +08:00
George Hotz
9d7a17ee39 beautiful SQTT_PARSE=1 with color (#13428)
* beautiful SQTT_PARSE=1 with color

* linter

* linter 2

* a few more labels

* filter and or

* wave alloc

* a few more
2025-11-23 01:05:14 -08:00
qazal
474a631877 viz: align left offset for nested items (#13420) 2025-11-23 14:22:51 +08:00
George Hotz
da0aa57a3b add cu parsing to attempt_sqtt_parse 2025-11-22 22:09:05 -08:00
qazal
320ed78803 can view wave timeline with SQTT_ITRACE_SE_MASK=0 (#13427) 2025-11-23 13:55:47 +08:00
Pranil
c1838c71fc display service name typo (#13426)
its tinybox-display.service
2025-11-22 20:49:56 -08:00
George Hotz
5110409339 continue work on parse sqtt, enable with SQTT_PARSE (#13425)
* continue work on parse sqtt, enable with SQTT_PARSE

* fix timing

* delta is pre instruction

* hi8 values

* a few more

* a bit more

* let it crash if you enabled it

* figure out simd

* hide 0x11
2025-11-22 19:03:17 -08:00
George Hotz
92170d0ff1 lil op cleanup (#13424)
* track flag count and op count

* text

* more

* file count

* lil op cleanup

* cleanups

* move
2025-11-22 15:21:15 -08:00
George Hotz
423b76a852 improve sqtt format parser (saturday coffee shop project) (#13419)
* improve sqtt format parser

* actually read the trash code ChatGPT wrote

* cleanups

* hand written parser

* quality

* more

* was missing first packet

* maybe

* filt

* fixups

* label the waves

* progress
2025-11-22 15:04:10 -08:00
George Hotz
9d6cf3472e remove op/sentinel 2025-11-22 15:01:47 -08:00
Christopher Milan
310da2a201 remove hashFiles in setup-tinygrad (#13423)
* fix hashFiles in setup-tinygrad on macos

* remove hashFiles altogether
2025-11-22 17:47:10 -05:00
qazal
c14033e10f viz: faster startup time with SQTT=1 (#13337)
* roc.py cleanups

* direct append

* viz index cleanup

* simd row details

* add kernel arg

* late instructions decode

* more instruction decode to sep server request

* 200ms startup, 6 second to waves timeline

* sort units

* creating new http paths is easy now

* instructions unpacker

* min diff, use hyphens

* summary table
2025-11-22 22:02:30 +08:00
qazal
1655fdb6de viz: cleanup sqtt loader (#13417) 2025-11-22 20:10:23 +08:00
qazal
903eec3754 fix sz.py tinygrad import in ci (#13418) 2025-11-22 19:20:26 +08:00
nimlgen
3a42680e22 amd: pmc generic arch for gfx10+ (#13407) 2025-11-22 12:31:23 +03:00