4842 Commits

Author SHA1 Message Date
George Hotz
18addc0a1d process replay only get_program (#13475) 2025-11-27 08:18:18 -08:00
George Hotz
a8e005b095 enable process replay (non-checking) by default (#13474) 2025-11-27 07:28:44 -08:00
George Hotz
05cd2279d0 add cache on reshape (#13466)
* remove cache on divmod, way less objects

* _apply_reshape

* reshape

* no gc on realize

* wow that cache is fast
2025-11-26 18:57:40 -08:00
George Hotz
19228e8d37 test_graph is flaky 2025-11-26 16:37:42 -08:00
George Hotz
e4cd649ff0 remove kernelize to prepare for refactors (#13463)
* remove kernelize to prepare for refactors

* less kernelize

* last test
2025-11-26 14:18:50 -08:00
wozeparrot
ffc31a23f4 tk mi350 (#13288) 2025-11-25 15:49:44 -08:00
qazal
7238df7a94 viz: cleanup sort_fn (#13454) 2025-11-26 04:10:10 +08:00
wozeparrot
249553a119 tinyfs tweaks (#13444) 2025-11-24 18:07:32 -08:00
C T
2d53029be3 Whisper less flaky tests (#13435)
* use less flaky metric for whisper long transcription

* multiline long transcription 3 reference

* fix reference transcript

see https://homepage.ntu.edu.tw/~karchung/miniconversations/MC.htm
sanitized for whisper

* try lower wer threshold

* add test for wer metric

* extract TRANSCRIPTION_3_ALT

* rename test

* rename

* add tests for high WER difference

* move tests

* sync metric
2025-11-24 09:50:49 -08:00
Sieds Lykles
63a931ff76 Symbolic divisor fuzzer (#13433)
* render z3 range better

* working version

* rename

* add to workflow

* factor out variable_names

* smaller expressions

* smaller

* + back
2025-11-23 20:29:32 +01:00
George Hotz
9d7a17ee39 beautiful SQTT_PARSE=1 with color (#13428)
* beautiful SQTT_PARSE=1 with color

* linter

* linter 2

* a few more labels

* filter and or

* wave alloc

* a few more
2025-11-23 01:05:14 -08:00
chenyu
cb29265f23 add test that shows the validhack regression with bad rewrite order (#13411) 2025-11-21 13:48:30 -05:00
Sieds Lykles
114bb94c55 Fix load collapse MAX to ADD (#13406)
* add Ops.ADD to pattern

* add test
2025-11-21 12:26:14 +01:00
George Hotz
e1051d00d7 multi like on full_like as well as rand_like (#13402)
* multi like on full_like as well as rand_like

* add test and fix bug

* mismatch, optim match

* one line
2025-11-20 20:46:48 -08:00
chenyu
647fde64e6 no sym in pm_reduce [pr] (#13398)
* no sym in pm_reduce [pr]

* fix that
2025-11-20 16:49:09 -05:00
chenyu
0251a8e628 parse_valid minor cleanup [pr] (#13385)
* stricter parse_valid [pr]

* not stricter

* no VCONST

* Revert "no VCONST"

This reverts commit 330dbdf4060562596febcbf970bda6051a35012f.
2025-11-20 13:15:06 -05:00
qazal
9dcd52287a add external_benchmark_pyrender (#13378)
* add external_benchmark_pyrender

* can ctrlc it

* cpu_profile exists
2025-11-20 17:38:28 +08:00
George Hotz
8919c994b7 Revert "AxisType.PLACEHOLDER in reshape to do less graph_rewrite (#13373)" (#13375)
This reverts commit ac7559e33d.
2025-11-19 19:34:30 -08:00
George Hotz
ac7559e33d AxisType.PLACEHOLDER in reshape to do less graph_rewrite (#13373)
* AxisType.PLACEHOLDER in reshape to do less graph_rewrite

* _apply_movement_op cache
2025-11-19 19:19:58 -08:00
George Hotz
ab7df42c78 bring back fold_divmod_general with bugfix and test [pr] (#13369)
* Revert "Revert "merge to fold_divmod_general [p] (#13359)""

This reverts commit 05ccc69248.

* Revert "Revert "actually merge to fold_divmod_general [pr] (#13363)""

This reverts commit 90e5752199.

* Revert "Revert "add cache to fold_divmod_general (#13365)""

This reverts commit 8e17bd6791.

* bring back fold_divmod_general with bugfix and test
2025-11-19 14:51:51 -08:00
George Hotz
986d113024 symbolic fuzz failure (#13367)
* symbolic fuzz failure

* skip flaky test
2025-11-19 14:21:08 -08:00
George Hotz
05ccc69248 Revert "merge to fold_divmod_general [p] (#13359)"
This reverts commit 7711bbac7f.
2025-11-19 14:18:09 -08:00
George Hotz
8e17bd6791 Revert "add cache to fold_divmod_general (#13365)"
This reverts commit b5309a5043.
2025-11-19 14:18:08 -08:00
George Hotz
b5309a5043 add cache to fold_divmod_general (#13365) 2025-11-19 13:49:18 -08:00
George Hotz
7711bbac7f merge to fold_divmod_general [p] (#13359)
* merge to fold_divmod_general [p]

* merge more

* merge more

* merge more
2025-11-19 11:37:45 -08:00
George Hotz
6fdbd03104 more divmod cleanup [p] (#13358)
* more divmod cleanup [p]

* lil cleanups, faster
2025-11-19 10:35:15 -08:00
George Hotz
957cf717e7 Python speed (#13355)
* skip process replay by default

* work on python speed

* fix names of rewrite rules

* fix that test
2025-11-19 09:03:00 -08:00
George Hotz
385618d45b skip process replay by default (#13353) 2025-11-19 08:25:34 -08:00
Christopher Milan
a438c277de autogen tests for 3.14 (#13343) 2025-11-18 22:16:59 -05:00
George Hotz
1afa3c0877 vmap on full model (#13340)
* vmap on full model

* vmap gemm

* reduce sums on end

* outer reduce

* only if there's ranges

* put those rules in symbolic

* ranges

* do opt later

* add zero range
2025-11-18 16:06:06 -08:00
George Hotz
9c59b3d19e vmap grad needs reduce_backward (#13336)
* vmap grad needs reduce_backward

* fuse and outer
2025-11-18 10:08:30 -08:00
George Hotz
06e39a88a9 outer vmap works (#13334)
* outer vmap works

* fuse works

* vmap outer works

* outer ranges work

* grad work

* should be good to merge
2025-11-18 09:27:48 -08:00
George Hotz
583560ab72 this is the right way to write vmap (#13328) 2025-11-17 20:20:52 -08:00
George Hotz
e4fead8a86 write scan in uops (#13321)
* write scan in uops

* ops range

* no need for variable

* meh, later

* shorter
2025-11-17 16:58:08 -08:00
George Hotz
6d3385c284 print special ops in postrange (#13318)
* print special ops in postrange

* fix on OSX
2025-11-17 14:43:23 -08:00
wozeparrot
33773fda87 tk initial mi350 (#13289) 2025-11-17 11:46:32 -08:00
nimlgen
9bb17c53ea amd: timer fix (#13267) 2025-11-17 13:59:03 +03:00
George Hotz
cabd4add48 more work parsing SQTT, separate VIZ/PROFILE (#13308)
* more work parsing SQTT

* more minimal runner

* sep VIZ/PROFILE

* parse print new

* improve parser

* more filter

* that

* split them

* lil cleanup

* skip flaky test

* AQL in mmapeak
2025-11-16 10:40:39 -08:00
George Hotz
295600dc5a saturday coffee shop work parsing the att format (#13295)
* saturday coffee shop work parsing the att format

* add examples

* parser

* classes of packets

* fully vibe coded parser

* vibing

* empty

* some vibe names

* vibes

* most of these are wrong

* more vibes

* better names

* parsing

* parse

* cleanup parser

* touchups
2025-11-16 08:25:51 -08:00
chenyu
8f0e747b3a Tensor._tri with arange (#13297) 2025-11-16 10:21:16 -05:00
chenyu
e8844853ed Tensor.eye with arange (#13287)
with rangify we can write these with arange
2025-11-15 12:32:27 -05:00
George Hotz
22c08b470c fold using outerworld range (#13286)
* scan using outerworld range

* almost

* sched

* simple range

* mypy

* woooo outer range

* spec passes

* print the numbers

* lol it runs

* real test
2025-11-14 20:43:41 -08:00
George Hotz
567066f51f tests for cast there and back (#13195)
* fix cast folding in llama

* dtypes that work everywhere

* Skip test_cast_there_and_back for backend casts

Skip test due to backend casting issues.
2025-11-14 16:56:09 -08:00
George Hotz
6c5fa349e1 add (unused) outer range (#13285) 2025-11-14 16:47:52 -08:00
George Hotz
e5351699bd openpilot warp (#13283)
* openpilot image warp test

* 0.4 ms on metal, 1 ms on CPU

* new inputs each time

* reshape
2025-11-14 13:55:32 -08:00
chenyu
888aaab151 test_tiny cleanup (#13276) 2025-11-14 11:11:32 -05:00
nimlgen
3e63831b98 nv: support 580+ drivers (#13269)
* nv: 580+ support

* start

* f

* fake

* fix
2025-11-14 21:44:16 +08:00
nimlgen
c80d459d99 autogen: fix packed args structs (#13274)
* autogen: fix packed args structs

* and test this
2025-11-14 20:24:06 +08:00
nimlgen
14eb48b13a autogen: rename nv_gpu to nv_570 (#13273)
* autogen: rename nv_gpu to nv_570

* rename
2025-11-14 20:07:19 +08:00
nimlgen
f72b1fbca4 nv: read numClasses (#13271)
* nv: read numClasses

* fix

* d
2025-11-14 19:43:25 +08:00