* Fix openpilot kernel from 209 to 206
1. Use push_movement_ops conditions in _movement_op. Don't push
PAD or check if the ops are safe to be pushed with PAD
2. Don't push if all the op.buffers are realized
* change ALLOWED_KERNEL_COUNT to 206 for openpilot
* don't push through sourceless buffers
* change the tests to adjust kernel counts for new behaviour
* restore pushing of movement ops through childless buffer
* don't push EXPAND, causes OOM
* allow push of intermediate movement ops
* adding new test behaviour
* modifying external_test_opt for new behaviour
* restore old tests
* Reenable push of EXPAND and introduce new tests
I was wrong initially in thinking EXPAND can cause OOM, and hence I had
disabled it. Since it is 0 stride and doesn't allocate memory, it's fine
* Don't push EXPAND above LoadOps LB. This is causing OOM
* Push should be decided on movement root of bufs
To check if ast.op.buffers is sourceless/realized, go to the movement
root and then decide if pushing should be done or not
* refactor for readability
* use .base instead
* don't push expand, bad memory/compute consumption
* restrict push of reshape, seeing improvement
* push reshape if unary without further check
* disable PAD solves convnext kernel count increase
* reenable test_cache_binaryop_transpose
* small nit
* init compiled cache
* clang not compile to stdout
* use kwargs in compile
* remove some useless lines
* slimmer
* fix
* tabs
* retry
* remove decorator
* no race in hip
* smaller hip
* unused import
* unused pathlib
* path to str
* add test
* fix linter
* fewer lines?
* decorator is back
* update tests
* no hip version
* better comments
* a bit better test
* linter
* work without decorator
* linter happy
* simpler return type
* more tests
* better comment
* readable
* readable
* readable
* compile returns bytes
* no unused imports
* readable
* start work on auto opt
* lin failure
* not beating hcopt
* greedy
* timing is fast
* codegen.search
* greedy search in handcode_opt
* track running gflops
* clean up those files
* no failure
* testing with the test_ops pattern
* add assign test
* flake8 complaining about single line fn
* slice 2d and minor cleanup
* make assign_slice a one-liner
* we don't need to repeat the same lambda twice, default tinygrad_fxn to be np_fxn
* back assign fn for np array
* implement __setitem__ in tensor.py
* don't re-slice the ret tensor
* one liner assign
* drop the permute test
* Allow multi-input model export
* Add model export unit test
* Fix efficientnet compilation
* Only run model export test on JIT supported devices
* Skip export model test if not EXPORT_SUPPORTED_DEVICE
* simplify gpt2 example
* kernel_jitted_count and jit tests
* Revert "kernel_jitted_count and jit tests"
This reverts commit 31a3c26dd0.
* all_jitted test in test_real_world
* small changes
* expand in terms of substitute, directly expand g_idxs g_valid
* delete expand_ops
* don't compare using hash
* any instead of in
thanks gijskoning
Co-authored-by: Gijs Koning <gijs-koning@live.nl>
* support tc
* testing code
* no more create_rednode
* maxsize none in view/node
* oops
* undo
* typing
* oops
* oops
* lmao
* lmao
* add expand multi test
* Node.iter_idxs
* type
* type
* delete checks!
* clean up a little?
* expand_idx in symbolic
* un-golf
* play around with types >.>
* test_substitute and also remove an incorrect test?
* get rid of range
* Update symbolic.py
* split out view cache change
* split out flat components change
* reduce diff
* reduce diff
* add some float4 tests
* fix
---------
Co-authored-by: Gijs Koning <gijs-koning@live.nl>
* small lazy cleanups
* a few more
* cleanups
* no more realizing in the scheduler test
* a few more minor things
* that was just wrong
* fix graph. the graph test was completely useless
* make graph usable
* fix op graph
* lazy cleanups
* ast functions take in LazyOps
* op instead of self.op
* _base for mops
* fix contiguous
* start schedule
* test_schedule
* fix openpilot
* more tests
* bugfix and test skip
* work
* make sure things get freed
* fix zerosized tensors
* fix failing test
* fix ceil and friends
* fix openpilot
* disable training
* disable test collectives