George Hotz
2c8ad1b419
_apply_movement_op cache
2025-11-19 16:10:00 -08:00
George Hotz
821f3771df
AxisType.PLACEHOLDER in reshape to do less graph_rewrite
2025-11-19 16:04:23 -08:00
Roelof van Dijk
0dc2ff431d
fix: revive torch backend (#13280)
* fix: revive torch backend
* as_strided view vs copy
* Revert "as_strided view vs copy"
This reverts commit 82a61223f2.
* add extra tests (move inplace, add fusion tests)
* better fusion with inplace_op
* no optimizer hooks (break mnist training fusion)
* split off fusion tests in separate file, assert on resnet fusion
fix: remove comments
* cleanup, reduce diff
* reduce diff
* better fusion and identity checks
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-19 15:26:50 -08:00
wozeparrot
56b2540349
tk: keep extra tile data by replacing uop (#13370)
2025-11-19 15:11:43 -08:00
George Hotz
ab7df42c78
bring back fold_divmod_general with bugfix and test [pr] (#13369)
* Revert "Revert "merge to fold_divmod_general [p] (#13359)""
This reverts commit 05ccc69248.
* Revert "Revert "actually merge to fold_divmod_general [pr] (#13363)""
This reverts commit 90e5752199.
* Revert "Revert "add cache to fold_divmod_general (#13365)""
This reverts commit 8e17bd6791.
* bring back fold_divmod_general with bugfix and test
2025-11-19 14:51:51 -08:00
George Hotz
986d113024
symbolic fuzz failure (#13367)
* symbolic fuzz failure
* skip flaky test
2025-11-19 14:21:08 -08:00
George Hotz
05ccc69248
Revert "merge to fold_divmod_general [p] (#13359)"
This reverts commit 7711bbac7f.
2025-11-19 14:18:09 -08:00
George Hotz
90e5752199
Revert "actually merge to fold_divmod_general [pr] (#13363)"
This reverts commit 3d82b83cec.
2025-11-19 14:18:08 -08:00
George Hotz
8e17bd6791
Revert "add cache to fold_divmod_general (#13365)"
This reverts commit b5309a5043.
2025-11-19 14:18:08 -08:00
George Hotz
b5309a5043
add cache to fold_divmod_general (#13365)
2025-11-19 13:49:18 -08:00
George Hotz
3d82b83cec
actually merge to fold_divmod_general [pr] (#13363)
* actually merge to fold_divmod_general [pr]
* one more merge
* Revert "one more merge"
This reverts commit aa79f6781c.
* avoid that case for speed
* faster and simpler
2025-11-19 13:17:56 -08:00
chenyu
a91f00925b
remove VECTORIZE and WMMA rules from sym [pr] (#13362)
2025-11-19 14:51:21 -05:00
George Hotz
7711bbac7f
merge to fold_divmod_general [p] (#13359)
* merge to fold_divmod_general [p]
* merge more
* merge more
* merge more
2025-11-19 11:37:45 -08:00
George Hotz
6fdbd03104
more divmod cleanup [p] (#13358)
* more divmod cleanup [p]
* lil cleanups, faster
2025-11-19 10:35:15 -08:00
George Hotz
bd88a72149
div and mod to its own file, try 2 [p] (#13357)
2025-11-19 10:10:06 -08:00
George Hotz
957cf717e7
Python speed (#13355)
* skip process replay by default
* work on python speed
* fix names of rewrite rules
* fix that test
2025-11-19 09:03:00 -08:00
chenyu
fc19ea76b5
clean up threefry rules (#13354)
2025-11-19 11:48:07 -05:00
George Hotz
385618d45b
skip process replay by default (#13353)
2025-11-19 08:25:34 -08:00
chenyu
fba4535289
remove hacks for threefry long removal when padded [pr] (#13352)
2025-11-19 11:11:39 -05:00
George Hotz
225eb1500f
generic range changes that work for str + int (#13350)
* generic range changes that work for str + int
* opt range counts up
2025-11-19 08:07:49 -08:00
chenyu
1a72ac16a6
move where same false branch rule to symbolic_simple [pr] (#13349)
2025-11-19 10:15:38 -05:00
chenyu
79055ddb8b
clean propagate_invalid more [pr] (#13347)
2025-11-19 09:47:50 -05:00
nimlgen
0c9fbf87e1
nvioctl: classes (#13346)
2025-11-19 16:14:15 +03:00
qazal
f2221130bb
viz: pick shape by event type (#13279)
2025-11-19 20:15:52 +08:00
wozeparrot
be72b78dcb
tk: small fixes (#13345)
* fix: handle case where final uop isn't a tk wrapped one
* clean: remove after from mma
2025-11-19 00:58:50 -08:00
wozeparrot
e4fbde5b3b
fix: extra options need to go on second step too (#13344)
2025-11-19 00:58:09 -08:00
George Hotz
1a332afa76
spec test on 3.14 (#12957)
2025-11-19 00:43:04 -08:00
Christopher Milan
a438c277de
autogen tests for 3.14 (#13343)
2025-11-18 22:16:59 -05:00
chenyu
722e7a16ed
remove rule in propagate_invalid [pr] (#13342)
2025-11-18 21:38:33 -05:00
George Hotz
1afa3c0877
vmap on full model (#13340)
* vmap on full model
* vmap gemm
* reduce sums on end
* outer reduce
* only if there's ranges
* put those rules in symbolic
* ranges
* do opt later
* add zero range
2025-11-18 16:06:06 -08:00
chenyu
46cb65e692
delete rules from sym [pr] (#13339)
2025-11-18 14:57:35 -05:00
George Hotz
9c59b3d19e
vmap grad needs reduce_backward (#13336)
* vmap grad needs reduce_backward
* fuse and outer
2025-11-18 10:08:30 -08:00
qazal
a647c9eca6
sqtt ui minor fixes (#13335)
* roc.py cleanups
* direct append
* viz index cleanup
* simd row details
2025-11-19 01:27:56 +08:00
George Hotz
06e39a88a9
outer vmap works (#13334)
* outer vmap works
* fuse works
* vmap outer works
* outer ranges work
* grad work
* should be good to merge
2025-11-18 09:27:48 -08:00
chenyu
805de27e07
no load substitute in uop_given_valid [pr] (#13333)
2025-11-18 11:47:58 -05:00
chenyu
05294bc648
fix some mypy cast [pr] (#13331)
2025-11-18 09:23:42 -05:00
qazal
5623e765c8
VIZ=2 enables SQTT (#13330)
2025-11-18 22:20:31 +08:00
nimlgen
331f70aa75
roc: ctrlc (#13255)
* roc: ctrl-c works
* rm
2025-11-18 19:29:28 +08:00
George Hotz
583560ab72
this is the right way to write vmap (#13328)
2025-11-17 20:20:52 -08:00
Christopher Milan
8e8e53c886
int8_t is c_byte (#13326)
2025-11-17 21:25:40 -05:00
George Hotz
e4fead8a86
write scan in uops (#13321)
* write scan in uops
* ops range
* no need for variable
* meh, later
* shorter
2025-11-17 16:58:08 -08:00
wozeparrot
8894a5409d
feat: hipcc compiler (#13319)
2025-11-17 15:13:32 -08:00
George Hotz
6d3385c284
print special ops in postrange (#13318)
* print special ops in postrange
* fix on OSX
2025-11-17 14:43:23 -08:00
chenyu
b637093be9
remove a few rules in pm_lower_index_dtype [pr] (#13317)
2025-11-17 17:04:56 -05:00
George Hotz
98e9e73286
hotfix: amd_uop_matmul getenvs
2025-11-17 13:26:01 -08:00
qazal
e7e1935225
cleanup sqtt/test_timing (#13315)
2025-11-18 04:28:05 +08:00
wozeparrot
33773fda87
tk initial mi350 (#13289)
2025-11-17 11:46:32 -08:00
nimlgen
e2cee64050
Revert "hcq: add tag to exec events (#13311)" (#13314)
This reverts commit f63ded5817.
2025-11-17 22:15:31 +03:00
chenyu
646372490c
move tiktoken import in llama3 (#13316)
only Tokenizer requires that
2025-11-17 14:09:37 -05:00
qazal
a37f221e44
viz: visualize waves in the timeline (#13292)
* viz: visualize waves in the timeline
* timeline in format
* per step
* rm that
2025-11-17 22:04:21 +08:00