George Hotz
004af512e6
try all matches in the function (#7288)
2024-10-25 14:17:04 +08:00
George Hotz
bcf0537653
canonicalize the order prereqs (#7283)
* canonicalize the order
* don't change that yet
* that order isn't safe with uops
2024-10-25 11:37:51 +08:00
qazal
603d637105
split to fuse.py and schedule.py [pr] (#7284)
2024-10-25 06:17:24 +03:00
qazal
698457c5ce
big graph ScheduleContext [pr] (#7282)
2024-10-25 05:58:23 +03:00
qazal
b805711f86
move anything that isn't bfs [pr] (#7273)
2024-10-25 05:34:21 +03:00
George Hotz
6dc7d3c949
instant uop rules [pr] (#7263)
* instant uop rules [pr]
* real instant
* only instant folder
* better diff
* instant means instant
* Revert "instant means instant"
This reverts commit e58d9161bf.
2024-10-25 10:32:45 +08:00
chenyu
90f720d703
limit idiv by neg bound to only if s0 is non-negative [pr] (#7277)
also updated the tests for div by a negative const
2024-10-24 15:46:50 -04:00
chenyu
d4c94d0d32
disable llama 1 4gpu and 6gpu benchmark (#7276)
having llama3 4gpu and 6gpu should be good enough
2024-10-24 14:19:22 -04:00
chenyu
e6929f2402
RUN_PROCESS_REPLAY=0 on llama 70B and resnet training (#7272)
* RUN_PROCESS_REPLAY=0 on llama 70B and resnet training
also added a 15-minute total timeout; this cannot grow indefinitely
* add a few more
* a few more just for NV
2024-10-24 12:09:54 -04:00
chenyu
b777cfdcba
update test_max_simplify_and_cancel (#7270)
it's fixed and no longer dumb
2024-10-24 10:29:05 -04:00
qazal
d3953c6c55
split the preload path [pr] (#7271)
2024-10-24 17:28:03 +03:00
chenyu
acad11ea8e
minor cleanup to View add [pr] (#7247)
2024-10-24 09:18:47 -04:00
nimlgen
98f8d0ccf9
nv limit max local memory with envvar (#7265)
2024-10-24 16:01:50 +03:00
qazal
f20b651ee0
prescheduling refactor from big graph [pr] (#7268)
* prescheduling refactor from big graph [pr]
* finally replay
2024-10-24 14:55:07 +03:00
George Hotz
2e6ec43c49
hotfix: become the latest for process replay
2024-10-24 19:02:59 +08:00
George Hotz
9a3d498d9c
with commutative hack, uops can change. fix that (#7266)
* with commutative hack, uops can change. fix that
* simpler
2024-10-24 18:50:23 +08:00
qazal
d482d927a8
hotfix: nobody uses [run_process_replay] [pr] (#7264)
2024-10-24 13:37:29 +03:00
qazal
fa5dc7857a
assign toposort with big graph, bfs [pr] (#7242)
* assign toposort with big graph, bfs [pr]
* cycle
* merge 2
* filter bufs
* delete inputs
2024-10-24 13:09:01 +03:00
George Hotz
4d081eb560
double mod is single mod (#7262)
* double mod is single mod
* unused name
2024-10-24 18:02:51 +08:00
George Hotz
e4631a47f4
symbolic arange support (#7252)
* symbolic arange support WIP [pr]
* smin/smax from old try
* pad2d symbolic works
* real test
* sym arange
* symbolic arange test passes
* double mod is single mod
* lol that's not right
* more tests
* Update ops.py
2024-10-24 17:55:53 +08:00
George Hotz
23b26d40d5
small max rules + windows VIZ (#7261)
* a rule from smax
* that rule too
* fix VIZ on windows
2024-10-24 17:43:39 +08:00
George Hotz
415186da3c
Revert "some rules to simplify max (#7258)" (#7260)
This reverts commit b56fab54ea.
2024-10-24 17:15:52 +08:00
qazal
93934c2160
early assert cyclic read [pr] (#7259)
* early assert cyclic read [pr]
* misc
2024-10-24 11:51:12 +03:00
George Hotz
b56fab54ea
some rules to simplify max (#7258)
2024-10-24 16:27:21 +08:00
George Hotz
a7be9dfd71
leftover lru_cache on UPat [pr] (#7257)
* leftover lru_cache on UPat [pr]
* fix mypy
2024-10-24 16:11:24 +08:00
George Hotz
532b7b018c
add smin/smax (#7253)
* add smin/smax
* don't create var with var
* better test errors
* add failing test
* enable shape simplification
* fix tests
* Update view.py
* simpler and simplify
2024-10-24 16:10:49 +08:00
George Hotz
de7b9d7c42
improve pre-commit [pr] (#7256)
* improve pre-commit [pr]
* mypy passes on windows
2024-10-24 15:38:47 +08:00
George Hotz
b1a30677fe
add some tiny tests that should pass everywhere [pr] (#7254)
2024-10-24 14:38:46 +08:00
George Hotz
63048ad880
don't recreate COMMUTATIVE the other way (#7255)
* don't recreate COMMUTATIVE the other way
* add shl and add passing test
* fix tests and move assignment to __new__
* that can stay there
* happy mypy
2024-10-24 14:38:29 +08:00
George Hotz
1315b8909a
strict mode triggers on beam timeout [pr] (#7250)
2024-10-24 11:37:57 +08:00
George Hotz
9f32a6f496
Revert "move metal tc check to renderer [pr] (#7248)" (#7251)
This reverts commit 72ddcdb4d1.
2024-10-24 10:57:09 +08:00
George Hotz
72ddcdb4d1
move metal tc check to renderer [pr] (#7248)
2024-10-24 10:38:57 +08:00
chenyu
451c043552
narrow return type of bool, int, float on UOp [pr] (#7246)
2024-10-23 21:06:43 -04:00
chenyu
9f370cccb3
minor cleanups in apply_opt [pr] (#7243)
2024-10-23 18:21:00 -04:00
qazal
65bbafe3e2
bfs refactors from the big graph branch [pr] (#7235)
2024-10-23 23:24:31 +03:00
nimlgen
ea11382087
nv fix shared_memory_size (#7239)
2024-10-23 21:59:47 +03:00
qazal
ca7b2658b9
start with a fresh ScheduleItemContext in process_replay [pr] (#7236)
2024-10-23 18:01:50 +03:00
qazal
ca6c58527b
dfs append_bufs (#7224)
* dfs append_bufs
* fix test_linearizer
2024-10-23 17:14:51 +03:00
qazal
aeeb917b6e
mask out writable bufs in runtime access_resources (#7234)
2024-10-23 16:13:50 +03:00
qazal
d2b608233a
get outbufs by globals idxs [pr] (#7233)
2024-10-23 16:06:35 +03:00
qazal
9a2718b30b
proposal: add UOps.PRELOAD (#7220)
2024-10-23 10:23:52 +03:00
qazal
3ce1c69c9c
split to get_realizes [pr] (#7225)
2024-10-23 10:22:36 +03:00
chenyu
f890d1cbbd
remove PUSH_PERMUTES from external_test_opt (#7232)
remove old comments and update kernel count for test_convnext
2024-10-23 00:11:34 -04:00
chenyu
24e2442a89
minor tweak to real_strides [pr] (#7230)
only graph_rewrite once on idx (should be idempotent), and always rewrite valid. will co-rewrite idx and valid next
2024-10-22 22:05:57 -04:00
chenyu
169cc348fe
move valid related functions to ops.py [pr] (#7229)
2024-10-22 21:10:12 -04:00
chenyu
e90bbe6bbc
failed test cases for 3+ views shapetracker strides (#7226)
2024-10-22 18:49:13 -04:00
qazal
dae908299e
full_ast_rewrite api with ScheduleItemContext (#7223)
2024-10-22 23:17:05 +03:00
qazal
7e36e1d2bb
LAZYCACHE to context var [pr] (#7222)
2024-10-22 20:36:06 +03:00
qazal
2083ac0b4c
generic small graph sink -> ScheduleItem pattern matcher [pr] (#7221)
2024-10-22 20:20:26 +03:00
qazal
4916095124
compute ScheduleItem writable bufs [pr] (#7214)
* compute ScheduleItem writable bufs [pr]
* don't cache Buffer
2024-10-22 19:02:29 +03:00