10417 Commits

Author SHA1 Message Date
chenyu
97b116bb1d UOp mul div simplification (#5637)
* UOp mul div simplification

* != 0 is fine
2024-07-22 16:14:12 -04:00
nimlgen
ee633c1988 hcq move out synchronize to base class (#5634) 2024-07-22 20:36:04 +03:00
nimlgen
26fc4610a0 amd more accurate cache management (#5631)
* amd more accurate cache management

* fix amd

* add memory_barrier + copies tests

* transfer test as well

* linter
2024-07-22 19:07:01 +03:00
qazal
fe6f9b2048 more actionable verify_lazyop assert (#5635) 2024-07-23 00:06:11 +08:00
Vyacheslav Pachkov
edc58e6b6e hcq: remove duplicate allocation of kernel args by abstracting (#5633) 2024-07-22 18:29:41 +03:00
nimlgen
08a9c0ae5e hcq cache invalidation for beam (#5630)
* nv full cache invalidation

* the same command on amd

* linter

* fix amd

* nv no hardcoded consts

* beam default
2024-07-22 18:13:17 +03:00
qazal
c64e9591e3 replace gates in uopgraph [run_process_replay] (#5632)
* test

* replace gates in uopgraph [run_process_replay]

* rewrite all gates

* hmm, process replay passes?

* Revert "rewrite all gates"

This reverts commit 2425a443f3.

* yea that makes sense

* remove unused

* replace source should be up there
2024-07-22 15:56:55 +03:00
George Hotz
dc21e63bd2 test: put conv in one reduce (#4441)
* test: put conv in one reduce

* put reduce at the end

* more expand

* generic, and that expand was breaking things

* ratio

* don't undo the expand

* arg 1

* strides

* warning, for resnet

* warning removed

* disable cast

* handle cast

* op

* err, that's right

* fixup

* fix that

* a test to play with

* add double_reduces

* working up to final reshape

* fold the last reshape

* moved to schedule

* fix axis

* ci, need to bring arange back

* FUSE_CONV_BW maybe

* valid in 3.9

* test_expand_reduce_is_folded_on_different_axes

* add FUSE_CONV_BW=1

* test_fold_batchnorm_backward

* test_sgd_4convs_fuse

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-07-22 12:16:13 +03:00
George Hotz
386fb5e7f8 folding without UNMUL (#5628)
* folding without UNMUL

* fix failures, index_collapse

* import ReduceOps

* test_arange_4096 isn't folding
2024-07-21 20:14:44 -07:00
Vyacheslav Pachkov
583829ab44 helpers: remove duplicate data64 helpers in amd/nv (#5627) 2024-07-21 16:50:59 -07:00
George Hotz
6c6d74d922 parallel mcts (#5626)
* start work on parallel mcts

* compile was linearizing twice

* typing + more early stopping

* fix compiler error
2024-07-21 14:53:23 -07:00
chenyu
c56c9c7519 move ufix inside UOp [run_process_replay] (#5621)
dtype is always the dtype of the caller
2024-07-21 17:30:37 -04:00
George Hotz
ef179087a4 mcts exit condition wasn't right, also use it with BEAM>=100 (#5619)
* mcts exit condition wasn't right, also use it with BEAM>=100

* mcts touchups

* clean up sample
2024-07-21 10:16:47 -07:00
chenyu
a823759dc5 simpler pattern matcher rules [run_process_replay] (#5620) 2024-07-21 04:05:01 -04:00
George Hotz
0f67ef4674 mcts graph and dedup support (#5618)
* mcts graph and dedup support

* usable graph

* mcts colors

* C=4 seems better

* C=3 even better

* sample_tree

* backprop is external function

* late expand to match algo
2024-07-20 23:29:14 -07:00
George Hotz
7f5282b2f5 tests if the linearizer is generating dumb code (#5611)
* tests if the linearizer is generating dumb code

* push consts to the end

* sort adds

* sorted add and mul

* this better

* simple expand/contract

* no math contract/expand
2024-07-20 20:36:32 -07:00
chenyu
eddc5bcfd7 MCTS tweaks (#5616)
MCTS 500 is competitive with BEAM=8 on resnet on M1 Max.
- increment trial times even with compile errors and runtime errors.
- use best time of children as the node value.
2024-07-20 19:45:59 -07:00
George Hotz
b399ccd6ef BEAM bugfix, kernels dedup now (#5617)
* BEAM bugfix, kernels dedup now

* getenv is default
2024-07-20 19:43:50 -07:00
chenyu
92e7e65712 one more test case for symbolic mod mul (#5615) 2024-07-20 17:23:06 -04:00
chenyu
d71308ed68 copy mlperf 4.0 to mlperf 4.1 (#5614) 2024-07-20 16:12:00 -04:00
George Hotz
1113e47f96 print best in MCTS + light up the winner in hcopt 2024-07-20 09:39:36 -07:00
nimlgen
0de5812032 hcq move map to allocator (#5610)
* hcq move map to allocator

* fix
2024-07-20 19:02:45 +03:00
George Hotz
ac99ecd94e use statistics.median for timing (#5606) 2024-07-20 08:37:32 -07:00
qazal
3ab5fe4e1b test argmax multireduce failure (#5609) 2024-07-20 21:33:03 +08:00
qazal
a96b5e3abb small input_st reorder (#5608) 2024-07-20 20:50:34 +08:00
nimlgen
646bdc1c0e elf loader touchups (#5607)
* loadonly SHF_ALLOC sections

* revert this, just amd fix
2024-07-20 12:30:18 +03:00
nimlgen
7ca2c48b64 hcq simpler _gpu2cpu_time (#5605)
* hcq simpler _gpu2cpu_time

* rename
2024-07-20 11:10:25 +03:00
nimlgen
32b0c07d5a docs: fix synchronization example in hcq (#5604) 2024-07-20 10:52:06 +03:00
George Hotz
06e336bccb mcts search (#5598)
* mcts search

* mcts cleanups

* mcts cleanup

* random shuffle children order

* mcts in handcode_opt

* src and remove_node

* debug 3 to print ast

* print the type

* mcts in extra
2024-07-19 21:38:39 -07:00
chenyu
b991097d41 move UPat and PatternMatcher from uopgraph.py to uops.py (#5597)
* move UPat and PatternMatcher from uopgraph.py to uops.py

towards instant UOps rewrite on UOp.alu

[run_process_replay]

* fix imports
2024-07-19 19:28:24 -04:00
Tobias Fischer
72da3fe7e6 added clip vision model (#5595)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-19 18:35:51 -04:00
P4ssenger
a1af5a79ad remove obsolete code (#5596) 2024-07-19 18:12:03 -04:00
George Hotz
a02998472b fix no locals behavior (#5593) 2024-07-19 14:35:09 -07:00
George Hotz
2e617ca59e lowerer img index (#5592) 2024-07-19 14:22:02 -07:00
chenyu
3acd8559f4 doc: variable names in abstractions2.py (#5591) 2024-07-19 17:06:15 -04:00
chenyu
00c01f6f4d correct IDIV dtype check error msg (#5589)
`dtypes.is_int` is not the same as `dtype == dtypes.int`
2024-07-19 16:36:47 -04:00
nimlgen
b1782e3fef hcq refactor signal into class (#5575)
* hcq refactor signal into class

* fix amd

* amd do not use amd_signal_t

* cleanup

* signal setter

* fix linter

* docs

* more docs + types

* fix types
2024-07-19 23:23:05 +03:00
Francis Lata
2dc100c565 fix typo in runtime overview docs (#5588) 2024-07-19 22:00:15 +03:00
George Hotz
d0ab20a5e5 careful memory counting (with tests to specify behavior) (#5587) 2024-07-19 11:37:34 -07:00
chenyu
37dd233650 always reverse global dim (#5586)
* always reverse global dim

* one more test
2024-07-19 13:58:05 -04:00
George Hotz
10be05aae5 push contract through cast to fix test_float2_acc (try 2) (#5585)
* push contract through cast to fix test_float2_acc (try 2)

* contract push only on floats
2024-07-19 10:34:43 -07:00
George Hotz
51892c8fac Revert "push contract through cast to fix test_float2_acc (#5581)" (#5583)
This reverts commit ddda9420be.
2024-07-19 09:44:30 -07:00
George Hotz
6bade4d419 save the uops in their own file (#5582) 2024-07-19 09:30:37 -07:00
George Hotz
ddda9420be push contract through cast to fix test_float2_acc (#5581)
* push contract through cast to fix test_float2_acc

* no_vectorized_alu applies to cast too
2024-07-19 09:30:26 -07:00
chenyu
3f590c3b31 some limit_dims to limit global merging (#5489)
only supports merging dims in a way that does not surpass limit, no splitting yet
2024-07-19 12:17:46 -04:00
George Hotz
e04704faff put acc first again (#5580) 2024-07-19 08:55:19 -07:00
chenyu
fc5b9f8dc9 Kernel.required_optimizations and Kernel.hand_coded_optimizations returns self (#5576)
[run_process_replay]
2024-07-19 10:55:14 -04:00
qazal
da34e1f617 scheduler refactors from the fuse_index branch (#5579)
* make simple_pads a safe set

* use is for comparing base

* 1 should continue
2024-07-19 16:23:31 +03:00
qazal
ecf88bb775 move assign_targets assignment (#5578) 2024-07-19 20:29:50 +08:00
George Hotz
0ad87021e2 move acc to end (#5568)
* move acc to end

* confirmed pictures are the same

* relax that

* Update test_ops.py
2024-07-19 03:06:52 -07:00