* optimizer: add test for correctness of opts
Also added OptOps.UPCASTMID to constrain valid axes for opts with
group_for_reduce.
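A minimal sketch of the idea, with illustrative member and field names (only OptOps.UPCASTMID itself is from this change): giving the mid-level upcast its own action lets the optimizer reject it on kernels that don't group the reduce.
```python
from enum import Enum, auto

class OptOps(Enum):
  UPCAST = auto(); LOCAL = auto(); GROUP = auto(); UPCASTMID = auto()

def check_opt(kernel, op: OptOps, axis: int) -> None:
  # UPCASTMID targets the axis introduced by group_for_reduce, so it is
  # invalid elsewhere (`group_for_reduce` is a hypothetical kernel field)
  if op is OptOps.UPCASTMID:
    assert kernel.group_for_reduce, "UPCASTMID requires group_for_reduce"
    assert axis < len(kernel.group_for_reduce), "axis must be a grouped dim"
```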
* llvm: fix LinearizerOptions to correctly set has_shared=False
* optimizer: remove premature test scaffold for TC opts
* search: fix the action space
* uops graphing
* add_phi_node
* fewer phi nodes
* where graph uops should live
* naming
* move it to external
* fix triton yolo
* fix clang and preserve behavior
* add mops to graph, refactor IMAGE
* no reshape pushing
* add todo
* fix openpilot model alt
* pushing reshapes reduces kernels in new op
* IMAGE=2 is a first-class citizen now
* Revert "disable flaky triton test"
This reverts commit 1e15fdaee7.
* Update test.yml
* check if has_shared for matvec
* disable ocelot cache for triton
* disable ocelot cache
* disable ocelot cache
* pass shared to triton uops tests
* temporary debugs for CI crash
* Revert "temporary debugs for CI crash"
This reverts commit fee3ea96c8.
* Revert "triton isn't tested, and allows this refactor (#2007)"
This reverts commit dea8bb0938.
* add runtime_args to every renderer, move triton local size override to runtime args
* Add binary to args, correct type returned
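A hedged sketch of the shape of that change (the function name, dict keys, and toy source here are assumptions, not the real API): each renderer hands back runtime args alongside the source, so backend quirks like triton choosing its own local size ride along instead of being special-cased downstream.
```python
from typing import Any, Dict, List, Tuple

def render_triton(name: str, uops: List[Any]) -> Tuple[str, Dict[str, Any]]:
  src = f"# triton kernel {name} ({len(uops)} uops)"  # stand-in for real codegen
  # binary: whether compile output is a binary blob; local_size_override:
  # triton picks its own local size, so tell the runtime to ignore ours
  return src, {"binary": False, "local_size_override": [32, 1, 1]}

src, runtime_args = render_triton("r_add", uops=[])
```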
* update to new loops
* Update test.yml
* some cleanup
* move continue back
* more more more
* added to CI
* try
* try intentionally breaking some tests
* wtf
* del True for test
* yay tests broke, now pls no break
* try AGAIN
* gahy
* lol
* try
* move over constant
* moved over MORE
* move shrink over
* trailing lines
* try CUDA CI
* try again
* boom
* oops
* improved comments
* try: disable some flags and disable CUDA
* try breaking tests
* traceback has too much info, so add --tb=no
* revert forced CI failure
* add comments and del unused imports
* oooooooo using regular debug, try enabling tb
* intentionally break tests
* added tb back. Maybe not too verbose
* strip whitespace
* missed something
* Shape op int32 -> int64
* oops missed something
* add some types
* get rid of crazy one-liners in pad op
* actually test Split this time LOL
* strip that whitespace
* limit metal buffers
* look at the base, not the srcs
* Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)"
This reverts commit 924ecc4d6a.
* add a test for that
* create cache for q learning
* make linter happy
* global beam
* where it belongs
* bugfix
* ditch the kopt, use the beam
* faster lin and DEBUG=2 okay
* remove kopt, move search to features
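A minimal sketch of the beam search over linearizer actions, assuming hypothetical helpers `get_actions` (enumerate applicable opts), `apply_opt` (return a new linearizer), and `time_linearizer` (compile and time a candidate): keep the `width` fastest kernels at each depth instead of relying on hand-coded heuristics.
```python
def beam_search(lin, get_actions, apply_opt, time_linearizer, width=4, depth=3):
  beam = [(time_linearizer(lin), lin)]
  for _ in range(depth):
    candidates = []
    for _, parent in beam:
      for act in get_actions(parent):
        child = apply_opt(parent, act)
        candidates.append((time_linearizer(child), child))
    if not candidates: break
    # keep the `width` fastest kernels, including the current beam
    beam = sorted(beam + candidates, key=lambda x: x[0])[:width]
  return beam[0][1]  # fastest linearizer found
```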
* Fix openpilot kernel from 209 to 206
1. Use push_movement_ops conditions in _movement_op. Don't push PAD, or check whether the ops are safe to be pushed with PAD.
2. Don't push if all the op.buffers are realized.
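Roughly, the decision looks like this hedged sketch (the op names and the `realized` field are illustrative stand-ins for the real conditions in _movement_op):
```python
SAFE_TO_PUSH = {"RESHAPE", "PERMUTE", "SHRINK", "STRIDE"}  # PAD intentionally absent

def should_push(op_name: str, buffers) -> bool:
  if op_name not in SAFE_TO_PUSH: return False          # e.g. never push PAD
  if all(getattr(b, "realized", None) is not None for b in buffers):
    return False                                        # all inputs already realized
  return True
```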
* change ALLOWED_KERNEL_COUNT to 206 for openpilot
* don't push through sourceless buffers
* change the tests to adjust kernel counts for new behaviour
* restore pushing of movement ops through childless buffer
* don't push EXPAND, causes OOM
* allow push of intermediate movement ops
* adding new test behaviour
* modifying external_test_opt for new behaviour
* restore old tests
* Reenable push of EXPAND and introduce new tests
I was wrong initially in thinking EXPAND could cause OOM, so I had disabled it. Since it is 0-stride and doesn't allocate memory, it's cool.
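The 0-stride point is easy to see with numpy, whose broadcast_to behaves like EXPAND:
```python
import numpy as np

a = np.zeros((4, 1))
b = np.broadcast_to(a, (4, 1024))   # EXPAND-like op
print(b.strides)                    # (8, 0): the expanded axis has stride 0
print(b.base is a)                  # True: no new memory was allocated
```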
* Don't push EXPAND above LoadOps LB. This is causing OOM
* Push should be decided on movement root of bufs
To check if ast.op.buffers are sourceless/realized, go to the movement root and then decide whether pushing should be done.
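A hedged sketch of that walk (the op names and lazybuffer fields are loose stand-ins for the real ones):
```python
MOVEMENT_OPS = {"RESHAPE", "PERMUTE", "PAD", "SHRINK", "EXPAND", "STRIDE"}

def movement_root(buf):
  # follow single-source movement ops down to the underlying buffer
  while getattr(buf, "op", None) in MOVEMENT_OPS:
    buf = buf.src[0]
  return buf

def pushable(bufs) -> bool:
  roots = [movement_root(b) for b in bufs]
  # don't push when every root is already realized or has no source
  return not all(getattr(r, "realized", None) is not None or not getattr(r, "src", ())
                 for r in roots)
```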
* refactor for readability
* use .base instead
* don't push expand, bad memory/compute consumption
* restrict push of reshape, seeing improvement
* push reshape if unary without further check
* disabling PAD solves convnext kernel count increase
* reenable test_cache_binaryop_transpose
* small nit
* init compiled cache
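A minimal sketch of such a cache, assuming compilation is deterministic: key on a hash of the source (plus kwargs) and store the raw bytes. The cache location, key scheme, and decorator shape here are illustrative, not the real implementation.
```python
import hashlib, pathlib, functools

CACHE_DIR = pathlib.Path("/tmp/tinygrad_cc_cache")  # illustrative location

def diskcache_compile(fxn):
  @functools.wraps(fxn)
  def wrapper(src: str, **kwargs) -> bytes:
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(
      (fxn.__name__ + src + repr(sorted(kwargs.items()))).encode()).hexdigest()
    path = CACHE_DIR / key
    if path.exists(): return path.read_bytes()
    binary = fxn(src, **kwargs)   # compile returns bytes
    path.write_bytes(binary)
    return binary
  return wrapper
```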
* clang: don't compile to stdout
* use kwargs in compile
* remove some useless lines
* slimmer
* fix
* tabs
* retry
* remove decorator
* no race in hip
* smaller hip
* unused import
* unused pathlib
* path to str
* add test
* fix linter
* fewer lines?
* decorator is back
* update tests
* no hip version
* better comments
* a bit better test
* linter
* work w/o decorator
* linter happy
* simpler return type
* more tests
* better comment
* readable
* readable
* readable
* compile returns bytes
* no unused imports
* readable
* start work on auto opt
* lin failure
* not beating hcopt
* greedy
* timing is fast
* codegen.search
* greedy search in handcode_opt
* track running gflops
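A minimal sketch of the greedy loop, using the same hypothetical helpers as the beam-search sketch above plus an `op_estimate` FLOP count for the running-GFLOPS readout:
```python
def greedy_search(lin, get_actions, apply_opt, time_linearizer, op_estimate):
  best_time = time_linearizer(lin)
  while True:
    children = [apply_opt(lin, a) for a in get_actions(lin)]
    if not children: return lin
    timed = [(time_linearizer(c), c) for c in children]
    t, best_child = min(timed, key=lambda tc: tc[0])
    if t >= best_time: return lin   # greedy: stop at the first non-improvement
    best_time, lin = t, best_child
    print(f"{op_estimate / best_time * 1e-9:6.2f} GFLOPS")  # running gflops
```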
* clean up those files
* no failure
* testing with the test_ops pattern
* add assign test
* flake8 complaining about single line fn
* slice 2d and minor cleanup
* make assign_slice a one-liner
* we don't need to repeat the same lambda twice; default tinygrad_fxn to np_fxn
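i.e., roughly this pattern (helper name and signature simplified):
```python
def helper_test_op(shapes, np_fxn, tinygrad_fxn=None):
  # if no separate tinygrad lambda is given, reuse the numpy one
  tinygrad_fxn = tinygrad_fxn if tinygrad_fxn is not None else np_fxn
  # ... build inputs of `shapes`, run both fxns, compare outputs ...
```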
* back assign fn for np array
* implement __setitem__ in tensor.py
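A usage sketch of the new slice assignment (exact accepted value types depend on the implementation):
```python
from tinygrad.tensor import Tensor
import numpy as np

t = Tensor(np.zeros((4, 4), dtype=np.float32))
t[1:3, 1:3] = Tensor(np.ones((2, 2), dtype=np.float32))  # routes through __setitem__
print(t.numpy())  # center 2x2 block is now ones
```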
* don't re-slice the ret tensor
* one liner assign
* drop the permute test
* Allow multi-input model export
* Add model export unit test
* Fix efficientnet compilation
* Only run model export test on JIT supported devices
* Skip export model test if not EXPORT_SUPPORTED_DEVICE
* simplify gpt2 example
* kernel_jitted_count and jit tests
* Revert "kernel_jitted_count and jit tests"
This reverts commit 31a3c26dd0.
* all_jitted test in test_real_world