chenyu
97b116bb1d
UOp mul div simplification ( #5637 )
...
* UOp mul div simplification
* != 0 is fine
2024-07-22 16:14:12 -04:00
nimlgen
ee633c1988
hcq move out synchronize to base class ( #5634 )
2024-07-22 20:36:04 +03:00
nimlgen
26fc4610a0
amd more accurate cache management ( #5631 )
...
* amd more accurate cache management
* fix amd
* add memory_barrier + copies tests
* transfer test as well
* linter
2024-07-22 19:07:01 +03:00
qazal
fe6f9b2048
more actionable verify_lazyop assert ( #5635 )
2024-07-23 00:06:11 +08:00
Vyacheslav Pachkov
edc58e6b6e
hcq: remove duplicate allocation of kernel args by abstracting ( #5633 )
2024-07-22 18:29:41 +03:00
nimlgen
08a9c0ae5e
hcq cache invalidation for beam ( #5630 )
...
* nv full cache invalidation
* the same command on amd
* linter
* fix amd
* nv no hardcoded consts
* beam default
2024-07-22 18:13:17 +03:00
qazal
c64e9591e3
replace gates in uopgraph [run_process_replay] ( #5632 )
...
* test
* replace gates in uopgraph [run_process_replay]
* rewrite all gates
* hmm, process replay passes?
* Revert "rewrite all gates"
This reverts commit 2425a443f3.
* yea that makes sense
* remove unused
* replace source should be up there
2024-07-22 15:56:55 +03:00
George Hotz
dc21e63bd2
test: put conv in one reduce ( #4441 )
...
* test: put conv in one reduce
* put reduce at the end
* more expand
* generic, and that expand was breaking things
* ratio
* don't undo the expand
* arg 1
* strides
* warning, for resnet
* warning removed
* disable cast
* handle cast
* op
* err, that's right
* fixup
* fix that
* a test to play with
* add double_reduces
* working up to final reshape
* fold the last reshape
* moved to schedule
* fix axis
* ci, need to bring arange back
* FUSE_CONV_BW maybe
* valid in 3.9
* test_expand_reduce_is_folded_on_different_axes
* add FUSE_CONV_BW=1
* test_fold_batchnorm_backward
* test_sgd_4convs_fuse
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-07-22 12:16:13 +03:00
George Hotz
386fb5e7f8
folding without UNMUL ( #5628 )
...
* folding without UNMUL
* fix failures, index_collapse
* import ReduceOps
* test_arange_4096 isn't folding
2024-07-21 20:14:44 -07:00
Vyacheslav Pachkov
583829ab44
helpers: remove duplicate data64 helpers in amd/nv ( #5627 )
2024-07-21 16:50:59 -07:00
George Hotz
6c6d74d922
parallel mcts ( #5626 )
...
* start work on parallel mcts
* compile was linearizing twice
* typing + more early stopping
* fix compiler error
2024-07-21 14:53:23 -07:00
chenyu
c56c9c7519
move ufix inside UOp [run_process_replay] ( #5621 )
...
dtype is always the dtype of the caller
2024-07-21 17:30:37 -04:00
George Hotz
ef179087a4
mcts exit condition wasn't right, also use it with BEAM>=100 ( #5619 )
...
* mcts exit condition wasn't right, also use it with BEAM>=100
* mcts touchups
* clean up sample
2024-07-21 10:16:47 -07:00
chenyu
a823759dc5
simpler pattern matcher rules [run_process_replay] ( #5620 )
2024-07-21 04:05:01 -04:00
George Hotz
0f67ef4674
mcts graph and dedup support ( #5618 )
...
* mcts graph and dedup support
* usable graph
* mcts colors
* C=4 seems better
* C=3 even better
* sample_tree
* backprop is external function
* late expand to match algo
2024-07-20 23:29:14 -07:00
George Hotz
7f5282b2f5
tests if the linearizer is generating dumb code ( #5611 )
...
* tests if the linearizer is generating dumb code
* push consts to the end
* sort adds
* sorted add and mul
* this better
* simple expand/contract
* no math contract/expand
2024-07-20 20:36:32 -07:00
chenyu
eddc5bcfd7
MCTS tweaks ( #5616 )
...
MCTS 500 is competitive with BEAM=8 on resnet on M1 Max.
- increment the trial count even on compile errors and runtime errors.
- use best time of children as the node value.
2024-07-20 19:45:59 -07:00
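The two MCTS tweaks listed in the commit above can be illustrated with a minimal sketch. This is not the code from the commit: the `Node` class, its fields, and `record_trial` are hypothetical names chosen for illustration, and the sketch assumes a lower kernel time is better.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    # Hypothetical search-tree node; not the structure used in the commit.
    children: List["Node"] = field(default_factory=list)
    trials: int = 0               # how many times this node was sampled
    best_time: float = math.inf   # best (lowest) time seen in this subtree

def record_trial(path: List[Node], measured: Optional[float]) -> None:
    """Update the nodes on a root-to-leaf path after one trial.

    `measured` is None when the candidate failed to compile or run.
    """
    # Tweak 1: count the trial even when the candidate failed.
    for node in path:
        node.trials += 1
    if measured is not None:
        path[-1].best_time = min(path[-1].best_time, measured)
    # Tweak 2: a node's value is the best time among its children.
    for node in reversed(path[:-1]):
        child_best = min((c.best_time for c in node.children), default=math.inf)
        node.best_time = min(node.best_time, child_best)

a, b = Node(), Node()
root = Node(children=[a, b])
record_trial([root, a], 2.0)    # successful run
record_trial([root, b], None)   # failed run still counts as a trial
record_trial([root, b], 1.5)    # better run under the other child
```

Counting failed trials keeps a branch that repeatedly fails from looking unexplored, which is one plausible reason for the first tweak; the commit message does not state the motivation.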
George Hotz
b399ccd6ef
BEAM bugfix, kernels dedup now ( #5617 )
...
* BEAM bugfix, kernels dedup now
* getenv is default
2024-07-20 19:43:50 -07:00
chenyu
92e7e65712
one more test case for symbolic mod mul ( #5615 )
2024-07-20 17:23:06 -04:00
chenyu
d71308ed68
copy mlperf 4.0 to mlperf 4.1 ( #5614 )
2024-07-20 16:12:00 -04:00
George Hotz
1113e47f96
print best in MCTS + light up the winner in hcopt
2024-07-20 09:39:36 -07:00
nimlgen
0de5812032
hcq move map to allocator ( #5610 )
...
* hcq move map to allocator
* fix
2024-07-20 19:02:45 +03:00
George Hotz
ac99ecd94e
use statistics.median for timing ( #5606 )
2024-07-20 08:37:32 -07:00
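A short sketch of median-based timing with the standard library follows. The helper name `time_median` and the sample values are illustrative assumptions; the commit title only says that `statistics.median` is used, not which statistic it replaced.

```python
import statistics
import time

def time_median(fn, runs=5):
    """Call fn `runs` times and return the median wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)

# The median is barely affected by a single slow outlier, unlike the mean.
samples = [1.0, 1.1, 0.9, 1.0, 50.0]
robust = statistics.median(samples)
naive = statistics.mean(samples)

elapsed = time_median(lambda: sum(range(1000)), runs=3)
```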
qazal
3ab5fe4e1b
test argmax multireduce failure ( #5609 )
2024-07-20 21:33:03 +08:00
qazal
a96b5e3abb
small input_st reorder ( #5608 )
2024-07-20 20:50:34 +08:00
nimlgen
646bdc1c0e
elf loader touchups ( #5607 )
...
* loadonly SHF_ALLOC sections
* revert this, just amd fix
2024-07-20 12:30:18 +03:00
nimlgen
7ca2c48b64
hcq simpler _gpu2cpu_time ( #5605 )
...
* hcq simpler _gpu2cpu_time
* rename
2024-07-20 11:10:25 +03:00
nimlgen
32b0c07d5a
docs: fix synchronization example in hcq ( #5604 )
2024-07-20 10:52:06 +03:00
George Hotz
06e336bccb
mcts search ( #5598 )
...
* mcts search
* mcts cleanups
* mcts cleanup
* random shuffle children order
* mcts in handcode_opt
* src and remove_node
* debug 3 to print ast
* print the type
* mcts in extra
2024-07-19 21:38:39 -07:00
chenyu
b991097d41
move UPat and PatternMatcher from uopgraph.py to uops.py ( #5597 )
...
* move UPat and PatternMatcher from uopgraph.py to uops.py
towards instant UOps rewrite on UOp.alu
[run_process_replay]
* fix imports
2024-07-19 19:28:24 -04:00
Tobias Fischer
72da3fe7e6
added clip vision model ( #5595 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-19 18:35:51 -04:00
P4ssenger
a1af5a79ad
remove obsolete code ( #5596 )
2024-07-19 18:12:03 -04:00
George Hotz
a02998472b
fix no locals behavior ( #5593 )
2024-07-19 14:35:09 -07:00
George Hotz
2e617ca59e
lowerer img index ( #5592 )
2024-07-19 14:22:02 -07:00
chenyu
3acd8559f4
doc: variable names in abstractions2.py ( #5591 )
2024-07-19 17:06:15 -04:00
chenyu
00c01f6f4d
correct IDIV dtype check error msg ( #5589 )
...
`dtypes.is_int` is not the same as `dtype == dtypes.int`
2024-07-19 16:36:47 -04:00
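The distinction in the commit above, a category check versus an equality check against one specific type, can be shown with stand-in types. The `DType` class and the three values below are hypothetical and do not import tinygrad; the sketch assumes `is_int` is a predicate over all integer dtypes while `dtypes.int` names a single dtype.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DType:
    # Hypothetical stand-in for a dtype record.
    name: str
    is_integer: bool

int32 = DType("int", True)             # plays the role of `dtypes.int`
uint8 = DType("unsigned char", True)
float32 = DType("float", False)

def is_int(dt: DType) -> bool:
    """Category check: true for every integer dtype."""
    return dt.is_integer

# uint8 is an integer dtype but is not equal to int32, so an error message
# written for one check would be misleading for the other.
category_check = is_int(uint8)
equality_check = uint8 == int32
```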
nimlgen
b1782e3fef
hcq refactor signal into class ( #5575 )
...
* hcq refactor signal into class
* fix amd
* amd do not use amd_signal_t
* cleanup
* signal setter
* fix linter
* docs
* more docs + types
* fix types
2024-07-19 23:23:05 +03:00
Francis Lata
2dc100c565
fix typo in runtime overview docs ( #5588 )
2024-07-19 22:00:15 +03:00
George Hotz
d0ab20a5e5
careful memory counting (with tests to specify behavior) ( #5587 )
2024-07-19 11:37:34 -07:00
chenyu
37dd233650
always reverse global dim ( #5586 )
...
* always reverse global dim
* one more test
2024-07-19 13:58:05 -04:00
George Hotz
10be05aae5
push contract through cast to fix test_float2_acc (try 2) ( #5585 )
...
* push contract through cast to fix test_float2_acc (try 2)
* contract push only on floats
2024-07-19 10:34:43 -07:00
George Hotz
51892c8fac
Revert "push contract through cast to fix test_float2_acc ( #5581 )" ( #5583 )
...
This reverts commit ddda9420be.
2024-07-19 09:44:30 -07:00
George Hotz
6bade4d419
save the uops in their own file ( #5582 )
2024-07-19 09:30:37 -07:00
George Hotz
ddda9420be
push contract through cast to fix test_float2_acc ( #5581 )
...
* push contract through cast to fix test_float2_acc
* no_vectorized_alu applies to cast too
2024-07-19 09:30:26 -07:00
chenyu
3f590c3b31
some limit_dims to limit global merging ( #5489 )
...
only supports merging dims in a way that does not surpass limit, no splitting yet
2024-07-19 12:17:46 -04:00
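One way to read "merging only, no splitting" is sketched below. This is an illustrative guess at the idea, not the algorithm from the commit: `merge_within_limits` is a hypothetical function that greedily merges adjacent dimensions while the product stays within the limit for that position, and gives up when only a split would help.

```python
from typing import List, Optional

def merge_within_limits(dims: List[int], limits: List[int]) -> Optional[List[int]]:
    """Merge adjacent dims until there are at most len(limits) of them.

    Returns None when no merge fits the limits (splitting would be required).
    """
    dims = list(dims)
    while len(dims) > len(limits):
        for i in range(min(len(dims) - 1, len(limits))):
            if dims[i] * dims[i + 1] <= limits[i]:
                dims[i:i + 2] = [dims[i] * dims[i + 1]]  # merge the pair
                break
        else:
            return None  # no adjacent pair can be merged within its limit
    # Every remaining dim must still respect its own limit.
    if any(d > m for d, m in zip(dims, limits)):
        return None
    return dims

merged = merge_within_limits([2, 3, 4, 5], [16, 16, 16])
shifted = merge_within_limits([100, 100, 2], [256, 256])
impossible = merge_within_limits([300, 2], [256])
```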
George Hotz
e04704faff
put acc first again ( #5580 )
2024-07-19 08:55:19 -07:00
chenyu
fc5b9f8dc9
Kernel.required_optimizations and Kernel.hand_coded_optimizations returns self ( #5576 )
...
[run_process_replay]
2024-07-19 10:55:14 -04:00
qazal
da34e1f617
scheduler refactors from the fuse_index branch ( #5579 )
...
* make simple_pads a safe set
* use is for comparing base
* 1 should continue
2024-07-19 16:23:31 +03:00
qazal
ecf88bb775
move assign_targets assignment ( #5578 )
2024-07-19 20:29:50 +08:00
George Hotz
0ad87021e2
move acc to end ( #5568 )
...
* move acc to end
* confirmed pictures are the same
* relax that
* Update test_ops.py
2024-07-19 03:06:52 -07:00