George Hotz
8e8fec408e
fix n^2 _apply_map_to_tensors [pr] (#13443)
* clean up slow rules
* fix rule
* non n^2 toposort
* topovisit
* state dict profile_marker
2025-11-24 18:59:16 -08:00
wozeparrot
249553a119
tinyfs tweaks (#13444)
2025-11-24 18:07:32 -08:00
wozeparrot
f46bc31156
tk: start and step in range (#13442)
2025-11-24 15:43:24 -08:00
George Hotz
cc5e6323ac
stable diffusion profiling (#13441)
* stable diffusion profiling
Signed-off-by: George Hotz <geohot@gmail.com>
* profile_marker
* profile per step
* fix slow Context
* profile that
---------
Signed-off-by: George Hotz <geohot@gmail.com>
2025-11-24 15:25:45 -08:00
nimlgen
18cfb54736
amd: a bit better se limiting (#13440)
* amd: a bit better se limiting
* SQTT_LIMIT_SE=0
2025-11-24 21:51:47 +03:00
C T
2d53029be3
Whisper less flaky tests (#13435)
* use less flaky metric for whisper long transcription
* multiline long transcription 3 reference
* fix reference transcript
see https://homepage.ntu.edu.tw/~karchung/miniconversations/MC.htm
sanitized for whisper
* try lower wer threshold
* add test for wer metric
* extract TRANSCRIPTION_3_ALT
* rename test
* rename
* add tests for high WER difference
* move tests
* sync metric
2025-11-24 09:50:49 -08:00
qazal
2a9bd12700
sqtt: add occupancy events to the timeline (#13430)
2025-11-24 22:28:05 +08:00
Sieds Lykles
63a931ff76
Symbolic divisor fuzzer (#13433)
* render z3 range better
* working version
* rename
* add to workflow
* factor out variable_names
* smaller expressions
* smaller
* + back
2025-11-23 20:29:32 +01:00
nimlgen
677db34eba
nv: cleanup map flags (#13434)
2025-11-23 19:54:52 +03:00
qazal
712c7a6448
sqtt loader cleanups from the occupancy branch (#13431)
* cleanup err handling
* from disasms
* s/wave_execs/wave_insts
2025-11-23 21:50:34 +08:00
George Hotz
9d7a17ee39
beautiful SQTT_PARSE=1 with color (#13428)
* beautiful SQTT_PARSE=1 with color
* linter
* linter 2
* a few more labels
* filter and or
* wave alloc
* a few more
2025-11-23 01:05:14 -08:00
qazal
474a631877
viz: align left offset for nested items (#13420)
2025-11-23 14:22:51 +08:00
George Hotz
da0aa57a3b
add cu parsing to attempt_sqtt_parse
2025-11-22 22:09:05 -08:00
qazal
320ed78803
can view wave timeline with SQTT_ITRACE_SE_MASK=0 (#13427)
2025-11-23 13:55:47 +08:00
Pranil
c1838c71fc
display service name typo (#13426)
it's tinybox-display.service
2025-11-22 20:49:56 -08:00
George Hotz
5110409339
continue work on parse sqtt, enable with SQTT_PARSE (#13425)
* continue work on parse sqtt, enable with SQTT_PARSE
* fix timing
* delta is pre instruction
* hi8 values
* a few more
* a bit more
* let it crash if you enabled it
* figure out simd
* hide 0x11
2025-11-22 19:03:17 -08:00
George Hotz
92170d0ff1
lil op cleanup (#13424)
* track flag count and op count
* text
* more
* file count
* lil op cleanup
* cleanups
* move
2025-11-22 15:21:15 -08:00
George Hotz
423b76a852
improve sqtt format parser (saturday coffee shop project) (#13419)
* improve sqtt format parser
* actually read the trash code ChatGPT wrote
* cleanups
* hand written parser
* quality
* more
* was missing first packet
* maybe
* filt
* fixups
* label the waves
* progress
2025-11-22 15:04:10 -08:00
George Hotz
9d6cf3472e
remove op/sentinel
2025-11-22 15:01:47 -08:00
Christopher Milan
310da2a201
remove hashFiles in setup-tinygrad (#13423)
* fix hashFiles in setup-tinygrad on macos
* remove hashFiles altogether
2025-11-22 17:47:10 -05:00
qazal
c14033e10f
viz: faster startup time with SQTT=1 (#13337)
* roc.py cleanups
* direct append
* viz index cleanup
* simd row details
* add kernel arg
* late instructions decode
* more instruction decode to sep server request
* 200ms startup, 6 second to waves timeline
* sort units
* creating new http paths is easy now
* instructions unpacker
* min diff, use hyphens
* summary table
2025-11-22 22:02:30 +08:00
qazal
1655fdb6de
viz: cleanup sqtt loader (#13417)
2025-11-22 20:10:23 +08:00
qazal
903eec3754
fix sz.py tinygrad import in ci (#13418)
2025-11-22 19:20:26 +08:00
nimlgen
3a42680e22
amd: pmc generic arch for gfx10+ (#13407)
2025-11-22 12:31:23 +03:00
George Hotz
1f8b24a6b9
track flag count and op count (#13416)
* track flag count and op count
* text
* more
* file count
2025-11-21 22:46:33 -08:00
George Hotz
4c0f4226b9
delete the PRECAST op [p] (#13415)
* don't use PRECAST in cstyle renderer [p]
* fix in metal
* fix opencl
* __builtin_bit_cast
* precast is unused
* cuda is c99?
* lambda_union_bitcast
* helper function
* delete precast op
2025-11-21 21:47:14 -08:00
wozeparrot
1f648bb1ba
feat: reenable mobilenetv2 dsp (#13320)
2025-11-21 15:21:49 -08:00
chenyu
054477a44f
remove full_symbolic in simplify (#13413)
only flip one schedule in winograd backward, no functional difference
2025-11-21 15:04:00 -05:00
chenyu
cb29265f23
add test that shows the validhack regression with bad rewrite order (#13411)
2025-11-21 13:48:30 -05:00
qazal
fdfe83880b
viz: unique sqtt wave names (#13410)
* viz: unique sqtt wave names
* better name for the shape
* it's a per program counter now
* table view, refactor to wave:insts dict
2025-11-22 02:43:31 +08:00
chenyu
a6c9b4ff6a
fix symbolic comments [pr] (#13408)
2025-11-21 09:18:50 -05:00
Sieds Lykles
114bb94c55
Fix load collapse MAX to ADD (#13406)
* add Ops.ADD to pattern
* add test
2025-11-21 12:26:14 +01:00
qazal
87c248eafa
small cleanups from viz memory usage fixes (#13405)
* shape link cleanups
* cleanup findRectAtPosition
2025-11-21 17:05:08 +08:00
qazal
0de1b24154
viz: SE : CU : SIMD : WAVE in sqtt timeline (#13404)
* wave id in device rows
* SE : CU : SIMD : WAVE
* automatic width
* better styling
* rm the blue
* sort
2025-11-21 15:42:29 +08:00
George Hotz
dabb02767f
set AMD profile mode with sudo on SQTT or PMC (#13403)
* require profile mode
* add mode setter
* cleanup
* not needed
* SQTT_LIMIT_SE
2025-11-20 23:19:11 -08:00
George Hotz
e1051d00d7
multi like on full_like as well as rand_like (#13402)
* multi like on full_like as well as rand_like
* add test and fix bug
* mismatch, optim match
* one line
2025-11-20 20:46:48 -08:00
chenyu
fa3def2f12
call less simplify in simplify_valid_load [pr] (#13401)
2025-11-20 19:54:22 -05:00
qazal
895ec7417e
viz: enable mapping function names to colors (#13400)
2025-11-21 06:43:02 +08:00
George Hotz
a74f6020d5
track apply map to tensors (#13399)
* track apply map to tensors
* sub
2025-11-20 14:24:55 -08:00
chenyu
647fde64e6
no sym in pm_reduce [pr] (#13398)
* no sym in pm_reduce [pr]
* fix that
2025-11-20 16:49:09 -05:00
qazal
1313250e0d
viz: use system helper for llvm-mca (#13395)
2025-11-21 04:47:25 +08:00
Christopher Milan
de3593957f
Revert "Revert "autogen: fix formatting on zero-argument function-like macros…" (#13388)
This reverts commit 0901a40685.
2025-11-20 15:36:13 -05:00
qazal
1220072328
viz: refactor to generic steps api (#13393)
2025-11-21 04:33:23 +08:00
George Hotz
26ccbf7040
debufferize with symbolic in one pm (#13392)
2025-11-20 11:47:03 -08:00
George Hotz
c46f608703
top down remove_bufferize (#13391)
* top down remove_bufferize
* removable if ALWAYS_CONTIGUOUS
2025-11-20 11:32:00 -08:00
Christopher Milan
4043489803
set curl -f in setup-tinygrad (#13389)
* set curl -f in setup-tinygrad
* test bad redirect
* Revert "test bad redirect"
This reverts commit ad945e7ffc.
2025-11-20 13:45:47 -05:00
chenyu
0251a8e628
parse_valid minor cleanup [pr] (#13385)
* stricter parse_valid [pr]
* not stricter
* no VCONST
* Revert "no VCONST"
This reverts commit 330dbdf4060562596febcbf970bda6051a35012f.
2025-11-20 13:15:06 -05:00
Christopher Milan
0901a40685
Revert "autogen: fix formatting on zero-argument function-like macros (#13386)" (#13387)
This reverts commit 58d85d4bab.
2025-11-20 12:45:35 -05:00
b1tg
91e289cb14
amd fp8 llvm (#13186)
* amd fp8 llvm support
* fix max
* clean
* add test_mi350.sh
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-20 12:35:57 -05:00
Roelof van Dijk
1058748440
torch backend: no aten.detach for torch 2.10 compat (#13381)
* this works, less cpp?
* simpler = better
* keep torch 2.9 working as well
2025-11-20 09:12:15 -08:00