Commit Graph

11118 Commits

Author SHA1 Message Date
George Hotz
2c8ad1b419 _apply_movement_op cache 2025-11-19 16:10:00 -08:00
George Hotz
821f3771df AxisType.PLACEHOLDER in reshape to do less graph_rewrite 2025-11-19 16:04:23 -08:00
Roelof van Dijk
0dc2ff431d fix: revive torch backend (#13280)
* fix: revive torch backend

* as_strided view vs copy

* Revert "as_strided view vs copy"

This reverts commit 82a61223f2.

* add extra tests (move inplace, add fusion tests)

* better fusion with inplace_op

* no optimizer hooks (break mnist training fusion)

* split off fusion tests in separate file, assert on resnet fusion

fix: remove comments

* cleanup, reduce diff

* reduce diff

* better fusion and identity checks

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-19 15:26:50 -08:00
wozeparrot
56b2540349 tk: keep extra tile data by replacing uop (#13370) 2025-11-19 15:11:43 -08:00
George Hotz
ab7df42c78 bring back fold_divmod_general with bugfix and test [pr] (#13369)
* Revert "Revert "merge to fold_divmod_general [p] (#13359)""

This reverts commit 05ccc69248.

* Revert "Revert "actually merge to fold_divmod_general [pr] (#13363)""

This reverts commit 90e5752199.

* Revert "Revert "add cache to fold_divmod_general (#13365)""

This reverts commit 8e17bd6791.

* bring back fold_divmod_general with bugfix and test
2025-11-19 14:51:51 -08:00
George Hotz
986d113024 symbolic fuzz failure (#13367)
* symbolic fuzz failure

* skip flaky test
2025-11-19 14:21:08 -08:00
George Hotz
05ccc69248 Revert "merge to fold_divmod_general [p] (#13359)"
This reverts commit 7711bbac7f.
2025-11-19 14:18:09 -08:00
George Hotz
90e5752199 Revert "actually merge to fold_divmod_general [pr] (#13363)"
This reverts commit 3d82b83cec.
2025-11-19 14:18:08 -08:00
George Hotz
8e17bd6791 Revert "add cache to fold_divmod_general (#13365)"
This reverts commit b5309a5043.
2025-11-19 14:18:08 -08:00
George Hotz
b5309a5043 add cache to fold_divmod_general (#13365) 2025-11-19 13:49:18 -08:00
George Hotz
3d82b83cec actually merge to fold_divmod_general [pr] (#13363)
* actually merge to fold_divmod_general [pr]

* one more merge

* Revert "one more merge"

This reverts commit aa79f6781c.

* avoid that case for speed

* faster and simpler
2025-11-19 13:17:56 -08:00
chenyu
a91f00925b remove VECTORIZE and WMMA rules from sym [pr] (#13362) 2025-11-19 14:51:21 -05:00
George Hotz
7711bbac7f merge to fold_divmod_general [p] (#13359)
* merge to fold_divmod_general [p]

* merge more

* merge more

* merge more
2025-11-19 11:37:45 -08:00
George Hotz
6fdbd03104 more divmod cleanup [p] (#13358)
* more divmod cleanup [p]

* lil cleanups, faster
2025-11-19 10:35:15 -08:00
George Hotz
bd88a72149 div and mod to its own file, try 2 [p] (#13357) 2025-11-19 10:10:06 -08:00
George Hotz
957cf717e7 Python speed (#13355)
* skip process replay by default

* work on python speed

* fix names of rewrite rules

* fix that test
2025-11-19 09:03:00 -08:00
chenyu
fc19ea76b5 clean up threefry rules (#13354) 2025-11-19 11:48:07 -05:00
George Hotz
385618d45b skip process replay by default (#13353) 2025-11-19 08:25:34 -08:00
chenyu
fba4535289 remove hacks for threefry long removal when padded [pr] (#13352) 2025-11-19 11:11:39 -05:00
George Hotz
225eb1500f generic range changes that work for str + int (#13350)
* generic range changes that work for str + int

* opt range counts up
2025-11-19 08:07:49 -08:00
chenyu
1a72ac16a6 move where same false branch rule to symbolic_simple [pr] (#13349) 2025-11-19 10:15:38 -05:00
chenyu
79055ddb8b clean propagate_invalid more [pr] (#13347) 2025-11-19 09:47:50 -05:00
nimlgen
0c9fbf87e1 nvioctl: classes (#13346) 2025-11-19 16:14:15 +03:00
qazal
f2221130bb viz: pick shape by event type (#13279) 2025-11-19 20:15:52 +08:00
wozeparrot
be72b78dcb tk: small fixes (#13345)
* fix: handle case where final uop isn't a tk wrapped one

* clean: remove after from mma
2025-11-19 00:58:50 -08:00
wozeparrot
e4fbde5b3b fix: extra options need to go on second step too (#13344) 2025-11-19 00:58:09 -08:00
George Hotz
1a332afa76 spec test on 3.14 (#12957) 2025-11-19 00:43:04 -08:00
Christopher Milan
a438c277de autogen tests for 3.14 (#13343) 2025-11-18 22:16:59 -05:00
chenyu
722e7a16ed remove rule in propagate_invalid [pr] (#13342) 2025-11-18 21:38:33 -05:00
George Hotz
1afa3c0877 vmap on full model (#13340)
* vmap on full model

* vmap gemm

* reduce sums on end

* outer reduce

* only if there's ranges

* put those rules in symbolic

* ranges

* do opt later

* add zero range
2025-11-18 16:06:06 -08:00
chenyu
46cb65e692 delete rules from sym [pr] (#13339) 2025-11-18 14:57:35 -05:00
George Hotz
9c59b3d19e vmap grad needs reduce_backward (#13336)
* vmap grad needs reduce_backward

* fuse and outer
2025-11-18 10:08:30 -08:00
qazal
a647c9eca6 sqtt ui minor fixes (#13335)
* roc.py cleanups

* direct append

* viz index cleanup

* simd row details
2025-11-19 01:27:56 +08:00
George Hotz
06e39a88a9 outer vmap works (#13334)
* outer vmap works

* fuse works

* vmap outer works

* outer ranges work

* grad work

* should be good to merge
2025-11-18 09:27:48 -08:00
chenyu
805de27e07 no load substitute in uop_given_valid [pr] (#13333) 2025-11-18 11:47:58 -05:00
chenyu
05294bc648 fix some mypy cast [pr] (#13331) 2025-11-18 09:23:42 -05:00
qazal
5623e765c8 VIZ=2 enables SQTT (#13330) 2025-11-18 22:20:31 +08:00
nimlgen
331f70aa75 roc: ctrlc (#13255)
* roc: ctrl-c works

* rm
2025-11-18 19:29:28 +08:00
George Hotz
583560ab72 this is the right way to write vmap (#13328) 2025-11-17 20:20:52 -08:00
Christopher Milan
8e8e53c886 int8_t is c_byte (#13326) 2025-11-17 21:25:40 -05:00
George Hotz
e4fead8a86 write scan in uops (#13321)
* write scan in uops

* ops range

* no need for variable

* meh, later

* shorter
2025-11-17 16:58:08 -08:00
wozeparrot
8894a5409d feat: hipcc compiler (#13319) 2025-11-17 15:13:32 -08:00
George Hotz
6d3385c284 print special ops in postrange (#13318)
* print special ops in postrange

* fix on OSX
2025-11-17 14:43:23 -08:00
chenyu
b637093be9 remove a few rules in pm_lower_index_dtype [pr] (#13317) 2025-11-17 17:04:56 -05:00
George Hotz
98e9e73286 hotfix: amd_uop_matmul getenvs 2025-11-17 13:26:01 -08:00
qazal
e7e1935225 cleanup sqtt/test_timing (#13315) 2025-11-18 04:28:05 +08:00
wozeparrot
33773fda87 tk initial mi350 (#13289) 2025-11-17 11:46:32 -08:00
nimlgen
e2cee64050 Revert "hcq: add tag to exec events (#13311)" (#13314)
This reverts commit f63ded5817.
2025-11-17 22:15:31 +03:00
chenyu
646372490c move tiktoken import in llama3 (#13316)
only Tokenizer requires that
2025-11-17 14:09:37 -05:00
qazal
a37f221e44 viz: visualize waves in the timeline (#13292)
* viz: visualize waves in the timeline

* timeline in format

* per step

* rm that
2025-11-17 22:04:21 +08:00