
9 Commits

Author SHA1 Message Date
George Hotz
dc21e63bd2 test: put conv in one reduce (#4441)
* test: put conv in one reduce

* put reduce at the end

* more expand

* generic, and that expand was breaking things

* ratio

* don't undo the expand

* arg 1

* strides

* warning, for resnet

* warning removed

* disable cast

* handle cast

* op

* err, that's right

* fixup

* fix that

* a test to play with

* add double_reduces

* working up to final reshape

* fold the last reshape

* moved to schedule

* fix axis

* ci, need to bring arange back

* FUSE_CONV_BW maybe

* valid in 3.9

* test_expand_reduce_is_folded_on_different_axes

* add FUSE_CONV_BW=1

* test_fold_batchnorm_backward

* test_sgd_4convs_fuse

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-07-22 12:16:13 +03:00
George Hotz
d1a7279605 indexing fold with casted bool (#5551)
* cast bool is where

* universal transform is wrong
2024-07-18 10:02:29 -07:00
qazal
0ad1672d5f fuse indexing (LazyOp creation) (#5506)
* bring FUSE_AS_ONE_KERNEL back

* operands need reshape?

* fused but arange didn't fold

* something deeply wrong

* yay, fused

* derive broadcasts

* s/input/reduce_input

* _fixup_ones proved a point

* this is what it takes

* down to 3 required reshapes:

1. output_shape
2. the second reduce merge dims
3. remove dims for above reshape

* start real reshapes

* resolve shape in the edges pre lazyop

* outputs are the same shape

* rewrite1: just the reduce

* more correct

* fuse_as_one_kernel

* closer

* this passes

* don't rerun info

* don't need these

* not needed
2024-07-18 14:09:17 +03:00
qazal
e22b377839 generalize FUSE_AS_ONE_KERNEL in the scheduler (#5397)
* test: use const

* hotfix: base

* asserts

* don't push through reshape

* cleanup

* don't need the cache

* test_reduceop_reshape_dont_push and test_index_fused are next
2024-07-12 10:23:16 +03:00
George Hotz
3a2b5a75d2 improve single kernel indexing (#5398)
* improve single kernel indexing

* metadata in graph (#5399)

* indexing is O(1)

* add failing test

* ugh, that all needs to be replaced with symbolic

* broken on ptx, it's fine

---------

Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-07-11 19:00:57 -07:00
George Hotz
c2da4454cd indexing getting better (#5389)
* indexing getting better [run_process_replay] [no_assert]

* fix test

* test_arange_2_reduce is a simpler test

* put that print back, NOOPT

* don't merge reduces (they could be different reduces)

* FUSE_AS_ONE_KERNEL

* fix tests

* fix test_var_multireduce

* w/e put that there

* fails on others too

* fix test, revert UNMUL change

* in case order matters

* one kernel indexing works

* one kernel indexing works (test other)
2024-07-11 16:41:51 -07:00
chenyu
f1bf916b8a apply NOOPT in test_arange complexity (#4774)
with hcopt, arange(2560) uses fewer ops than arange(256)
2024-05-29 23:12:35 -04:00
George-the-1st
0627e26140 Added missing unittest execution code (#4400)
same code as in every other test file, just missing from this one for some reason.
2024-05-02 22:34:30 -04:00
chenyu
a6ed2ae3c6 use old cumsum optimization for arange (#3813)
revert to old cumsum opt while phi simplification is disabled.

added a flops complexity test for this
2024-03-18 20:01:03 -04:00