11175 Commits

Author SHA1 Message Date
George Hotz
8e8fec408e fix n^2 _apply_map_to_tensors [pr] (#13443)
* clean up slow rules

* fix rule

* non n^2 toposort

* topovisit

* state dict profile_marker
2025-11-24 18:59:16 -08:00
wozeparrot
249553a119 tinyfs tweaks (#13444) 2025-11-24 18:07:32 -08:00
wozeparrot
f46bc31156 tk: start and step in range (#13442) 2025-11-24 15:43:24 -08:00
George Hotz
cc5e6323ac stable diffusion profiling (#13441)
* stable diffusion profiling

Signed-off-by: George Hotz <geohot@gmail.com>

* profile_marker

* profile per step

* fix slow Context

* profile that

---------

Signed-off-by: George Hotz <geohot@gmail.com>
2025-11-24 15:25:45 -08:00
nimlgen
18cfb54736 amd: a bit better se limiting (#13440)
* amd: a bit better se limiting

* SQTT_LIMIT_SE=0
2025-11-24 21:51:47 +03:00
C T
2d53029be3 Whisper less flaky tests (#13435)
* use less flaky metric for whisper long transcription

* multiline long transcription 3 reference

* fix reference transcript

see https://homepage.ntu.edu.tw/~karchung/miniconversations/MC.htm
sanitized for whisper

* try lower wer threshold

* add test for wer metric

* extract TRANSCRIPTION_3_ALT

* rename test

* rename

* add tests for high WER difference

* move tests

* sync metric
2025-11-24 09:50:49 -08:00
qazal
2a9bd12700 sqtt: add occupancy events to the timeline (#13430) 2025-11-24 22:28:05 +08:00
Sieds Lykles
63a931ff76 Symbolic divisor fuzzer (#13433)
* render z3 range better

* working version

* rename

* add to workflow

* factor out variable_names

* smaller expressions

* smaller

* + back
2025-11-23 20:29:32 +01:00
nimlgen
677db34eba nv: cleanup map flags (#13434) 2025-11-23 19:54:52 +03:00
qazal
712c7a6448 sqtt loader cleanups from the occupancy branch (#13431)
* cleanup err handling

* from disasms

* s/wave_execs/wave_insts
2025-11-23 21:50:34 +08:00
George Hotz
9d7a17ee39 beautiful SQTT_PARSE=1 with color (#13428)
* beautiful SQTT_PARSE=1 with color

* linter

* linter 2

* a few more labels

* filter and or

* wave alloc

* a few more
2025-11-23 01:05:14 -08:00
qazal
474a631877 viz: align left offset for nested items (#13420) 2025-11-23 14:22:51 +08:00
George Hotz
da0aa57a3b add cu parsing to attempt_sqtt_parse 2025-11-22 22:09:05 -08:00
qazal
320ed78803 can view wave timeline with SQTT_ITRACE_SE_MASK=0 (#13427) 2025-11-23 13:55:47 +08:00
Pranil
c1838c71fc display service name typo (#13426)
its tinybox-display.service
2025-11-22 20:49:56 -08:00
George Hotz
5110409339 continue work on parse sqtt, enable with SQTT_PARSE (#13425)
* continue work on parse sqtt, enable with SQTT_PARSE

* fix timing

* delta is pre instruction

* hi8 values

* a few more

* a bit more

* let it crash if you enabled it

* figure out simd

* hide 0x11
2025-11-22 19:03:17 -08:00
George Hotz
92170d0ff1 lil op cleanup (#13424)
* track flag count and op count

* text

* more

* file count

* lil op cleanup

* cleanups

* move
2025-11-22 15:21:15 -08:00
George Hotz
423b76a852 improve sqtt format parser (saturday coffee shop project) (#13419)
* improve sqtt format parser

* actually read the trash code ChatGPT wrote

* cleanups

* hand written parser

* quality

* more

* was missing first packet

* maybe

* filt

* fixups

* label the waves

* progress
2025-11-22 15:04:10 -08:00
George Hotz
9d6cf3472e remove op/sentinel 2025-11-22 15:01:47 -08:00
Christopher Milan
310da2a201 remove hashFiles in setup-tinygrad (#13423)
* fix hashFiles in setup-tinygrad on macos

* remove hashFiles altogether
2025-11-22 17:47:10 -05:00
qazal
c14033e10f viz: faster startup time with SQTT=1 (#13337)
* roc.py cleanups

* direct append

* viz index cleanup

* simd row details

* add kernel arg

* late instructions decode

* more instruction decode to sep server request

* 200ms startup, 6 second to waves timeline

* sort units

* creating new http paths is easy now

* instructions unpacker

* min diff, use hyphens

* summary table
2025-11-22 22:02:30 +08:00
qazal
1655fdb6de viz: cleanup sqtt loader (#13417) 2025-11-22 20:10:23 +08:00
qazal
903eec3754 fix sz.py tinygrad import in ci (#13418) 2025-11-22 19:20:26 +08:00
nimlgen
3a42680e22 amd: pmc generic arch for gfx10+ (#13407) 2025-11-22 12:31:23 +03:00
George Hotz
1f8b24a6b9 track flag count and op count (#13416)
* track flag count and op count

* text

* more

* file count
2025-11-21 22:46:33 -08:00
George Hotz
4c0f4226b9 delete the PRECAST op [p] (#13415)
* don't use PRECAST in cstyle renderer [p]

* fix in metal

* fix opencl

* __builtin_bit_cast

* precast is unused

* cuda is c99?

* lambda_union_bitcast

* helper function

* delete precast op
2025-11-21 21:47:14 -08:00
wozeparrot
1f648bb1ba feat: reenable mobilenetv2 dsp (#13320) 2025-11-21 15:21:49 -08:00
chenyu
054477a44f remove full_symbolic in simplify (#13413)
only flip one schedule in winograd backward, no functional difference
2025-11-21 15:04:00 -05:00
chenyu
cb29265f23 add test that shows the validhack regression with bad rewrite order (#13411) 2025-11-21 13:48:30 -05:00
qazal
fdfe83880b viz: unique sqtt wave names (#13410)
* viz: unique sqtt wave names

* better name for the shape

* it's a per program counter now

* table view, refactor to wave:insts dict
2025-11-22 02:43:31 +08:00
chenyu
a6c9b4ff6a fix symbolic comments [pr] (#13408) 2025-11-21 09:18:50 -05:00
Sieds Lykles
114bb94c55 Fix load collapse MAX to ADD (#13406)
* add Ops.ADD to pattern

* add test
2025-11-21 12:26:14 +01:00
qazal
87c248eafa small cleanups from viz memory usage fixes (#13405)
* shape link cleanups

* cleanup findRectAtPosition
2025-11-21 17:05:08 +08:00
qazal
0de1b24154 viz: SE : CU : SIMD : WAVE in sqtt timeline (#13404)
* wave id in device rows

* SE : CU : SIMD : WAVE

* automatic width

* better styling

* rm the blue

* sort
2025-11-21 15:42:29 +08:00
George Hotz
dabb02767f set AMD profile mode with sudo on SQTT or PMC (#13403)
* require profile mode

* add mode setter

* cleanup

* not needed

* SQTT_LIMIT_SE
2025-11-20 23:19:11 -08:00
George Hotz
e1051d00d7 multi like on full_like as well as rand_like (#13402)
* multi like on full_like as well as rand_like

* add test and fix bug

* mismatch, optim match

* one line
2025-11-20 20:46:48 -08:00
chenyu
fa3def2f12 call less simplify in simplify_valid_load [pr] (#13401) 2025-11-20 19:54:22 -05:00
qazal
895ec7417e viz: enable mapping function names to colors (#13400) 2025-11-21 06:43:02 +08:00
George Hotz
a74f6020d5 track apply map to tensors (#13399)
* track apply map to tensors

* sub
2025-11-20 14:24:55 -08:00
chenyu
647fde64e6 no sym in pm_reduce [pr] (#13398)
* no sym in pm_reduce [pr]

* fix that
2025-11-20 16:49:09 -05:00
qazal
1313250e0d viz: use system helper for llvm-mca (#13395) 2025-11-21 04:47:25 +08:00
Christopher Milan
de3593957f Revert "Revert "autogen: fix formatting on zero-argument function-like macros…" (#13388)
This reverts commit 0901a40685.
2025-11-20 15:36:13 -05:00
qazal
1220072328 viz: refactor to generic steps api (#13393) 2025-11-21 04:33:23 +08:00
George Hotz
26ccbf7040 debufferize with symbolic in one pm (#13392) 2025-11-20 11:47:03 -08:00
George Hotz
c46f608703 top down remove_bufferize (#13391)
* top down remove_bufferize

* removable if ALWAYS_CONTIGUOUS
2025-11-20 11:32:00 -08:00
Christopher Milan
4043489803 set curl -f in setup-tinygrad (#13389)
* set curl -f in setup-tinygrad

* test bad redirect

* Revert "test bad redirect"

This reverts commit ad945e7ffc.
2025-11-20 13:45:47 -05:00
chenyu
0251a8e628 parse_valid minor cleanup [pr] (#13385)
* stricter parse_valid [pr]

* not stricter

* no VCONST

* Revert "no VCONST"

This reverts commit 330dbdf4060562596febcbf970bda6051a35012f.
2025-11-20 13:15:06 -05:00
Christopher Milan
0901a40685 Revert "autogen: fix formatting on zero-argument function-like macros (#13386)" (#13387)
This reverts commit 58d85d4bab.
2025-11-20 12:45:35 -05:00
b1tg
91e289cb14 amd fp8 llvm (#13186)
* amd fp8 llvm support

* fix max

* clean

* add test_mi350.sh

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-20 12:35:57 -05:00
Roelof van Dijk
1058748440 torch backend: no aten.detach for torch 2.10 compat (#13381)
* this works, less cpp?

* simpler = better

* keep torch 2.9 working as well
2025-11-20 09:12:15 -08:00