* delete the old rangeify path and all the children stuff
* remove the on_stack stuff and any retries
* don't use the p word
* Revert "remove the on_stack stuff and any retries"
This reverts commit 49a2b328b9.
* rtoposort is fast, can replace rangeify with this
* fast rangeify
* work
* fast rangeify works for mnist
* should work
* progress
* pad fix
* FAST
* tests passing
* don't delete those shape ops
* put in rangeify map
* ending ranges fix
* tests
* mstack/mselect no hacks
* move to indexing.py
* touch up tests + add comments
* disable failing test
* actually make the file readable
* failing
* error
* remove skipping cast in simplify_valid [pr]
unsupported statements are handled in uop_given_valid already. the test failed because (100%x) somehow got simplified
* better test
* add ordering
* fix some tests
* fix more tests
* shorten comment
* update test
* add rule and test
* add rule and test
* remove check
* use fold_divmod_congruence instead of simplify
* adjust tests
* shorten line
* new algo
* add test
* add function to un-nest the div
* add UOp.factor
* test UOp.factor
* uop_given_valid tries to factor simplex expression
* shorten line
* symbolic_flat is back
* change that back
* fix those new tests
* new rule for ordering
* factor multiple factors
* no symbolic_flat
* symbolic_flat to there
* move that back
* fix imports
* merge correctly
* linter happy
* add rule
* add a test
* cleanup
* revert that for now
* UOp.factor returns self instead of None
* try all_candidates
* remove or_else
* post index symbolic
* add test
* make this closer to the original
* increase mac hlb_cifar min step time
* add some ordering tests
* cleanup
* increase pytest timeout time
* check dtype
* remove check
* use fold_divmod_congruence instead of simplify
* adjust tests
* shorten line
* new algo
* add test
* cleanup
* update tests
* ALLOWED_GATED_READ_IMAGE from 16 -> 12
* only remove the call to simplify
* add option to simplify with factor_remainder
* Allowed readimage gates back to 16
* lowering invalid gate is part of lower_index_dtype
* update test
* remove import
* put that back
* reduce_collapse uses invalid
* fix that pattern to use invalid_pat
* valid creates the right dtype count
* separate rule for lowering invalid gate
* dont unvectorize Invalid gate
* image_fixup uses Invalid
* update tests
* cleanup
* update split_load_store
* add .scalar() there
* Slice to unbind symbolic
* use vmax for now
* assert shape in reshape is valid
* update test_symbolic_ops to use shrink instead of reshape
* remove infer_with_bound_values for now
* symbolic output doesnt have symbolic strides
* symbolic jit tests use shrink to unregister symbolic
* update test
* update more tests
* wrap vmax in int()
* only create a new st if the store is not an assign
* unwrap st
* comments
* merge index_dtype_3
* new lowering with Invalid idx
* remove that dtype from range
* finish merge
* annotate better
* indentation
* dont need that anymore
* always process replay for openpilot
* more uop_given_valid for idx
* valid past index_child
* fix bug preventing load getting an alt value
* add track_match_stats back in shapetracker and remove cache
* get_valid_idx -> get_valid and get_idx
* fix heuristics with new idx
* split line
* fix typo
* fix signature
* dont skip idx if stride is 0
the idx may still be invalid
* lower const with new valid
* delete to_indexed_uops
* update shapetracker test
* delete axis_is_masked
* add cache back
* move around comment
* fix get_valid bug
* move invalid fold to symbolic so its earlier
* cleanup
* update applying padto to new idx
* add unit tests
* cleanup
* fold line
* improve spec
* dont try to render Invalid as a float
* more consistent invalid index
* update some tests
* Fold index with true cond
* skip test
* vconst min max if Invalid in arg
* fix signature of UOp.const
* add test for min/max of Invalid CONST/VCONST
* add InvalidType to as_const signature
* is Invalid to isinstance
* Add InvalidType to ConstLike
* index gate is a where gate
* make that a metaclass
* fix heuristics for new idx
* mypy happy