* remove ExecItem and merge it with ScheduleItem
* less diff
* fix issues
* min diff
* don't change bufs in _lower
* min diff
* update
* revert
* fixes
* diff
* move view pushing to codegen, try 2
* fix up some linearizer tests
* fix test search
* fix test schedule
* delete that test
* fix test arange
* fix a few tests
* update tests
* push views
* ebs cleanup
* fix local/reg
* test and lint
* fix more tests
* test cleanups
* skipped that one
* bug in div range folding
* simpler
* oh, this is right for indexing, but the div mod folding needs to be fixed
* reenable
* Passing test_complexity_w_unroll2 (#10068)
* Passing
* remove non_folded_divs
* Add check for negative term in div folding
* Add test
* bump that limit
* fix casted
---------
Co-authored-by: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
* make beautiful indexing use a Variable
* stunning test
* better color
* training is broken
* fix tests
* fix variable indexing
* fix test
* no contiguous
* revert that
* revert that too
* indexing two bind
* skip for webgpu
* make not slow
* rename Opt amt to arg
* ignore_beam_cache for test_tiny
* move ignore_beam_cache to test_tiny
* move to separate pr
* revert space change
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* LazyBuffer = UOp
* try 4 at this diff
* skip optimization tests p1
* raise kernel count expectations
* BIND isn't the _only_ uop that can become a tensor
* fix test_ones_sum on symbolic
* bump openpilot, correctness first
* offset on assign is fine
* uop is immutable
* what if this was higher
* more optimization skips
* instant fold const copy
* test_multitensor shouldn't expect buffer for unrealized
* move copy folder to upats
* start BUFFER_VIEW
* kinda BUFFER_VIEW
* Revert "kinda BUFFER_VIEW"
This reverts commit 94b4fe3040.
* BUFFER_VIEW try 2
* linter and missed _device
* pylint
* keep Ops.CONTIGUOUS
* always BUFFER_VIEW disk
* test
* cpu isn't a real device
* buffer references after del
* add that back
* start bringing some of these back
* more test updates
* simpler simplify copy
* subbuffer everything
* this is fine with buffer view
* cleanup the diff in test/ 1
* copy is one thing
* diff pruning
* diff pruning 2
* oh bind unbinds way too early
* extra
* more diff pruning
* more const folding
* experiment with symbolic here
* Revert "experiment with symbolic here"
This reverts commit cb87d61f7a.
* Revert "more const folding"
This reverts commit 2a7d258a2b.
* Revert VALID early folding
This reverts commit 4074f52317.
* storing const is fine
* fix test_prefer_half_buffer
* iterate on test_real_world
* this fixes test_train_mnist memory, breaks everything else
* Revert "this fixes test_train_mnist memory, breaks everything else"
This reverts commit dccfcbe068.
* always expect buffer to exist here
* temp debug: something is mutating lazydata in compile3
* Revert "temp debug: something is mutating lazydata in compile3"
This reverts commit 71400f0d55.
* everything back to normal
* compile3
* compile3 test
* start captured jit work, that test passes
* finalized memory skip set
* linter err
* back to base here
* tiny metaop cleanup
* print tensor
* 4th time this unbind got me
* green pickle
* tensor_variable sanity
* cast sanity
* link from the reds
* COPY sanity + minor repr change
* you can exist
* enable test_winograd
* bye bye nbytes
* danger, uop is mutating
* real become
* delete those from uop init
* put it in buffer init
* buffer inits with so much stuff
* buffer pickle try 2
* toposort can't be a cached property
* fix test_schedule_gc_with_inputs
* remove all @unittest.skip(gc)
* Revert "remove all @unittest.skip(gc)"
This reverts commit 9d8d92dd85.
* reenable real world + test_schedule_gc
* test: RUN_PROCESS_REPLAY=0
* fix pickle jit
* test changes
* reenable test_lru_alloc and TestTrain
* fix imagedtype
* bring pr back
* reenable 3 gc tests
* test_schedule better diff
* disable SPLIT_REDUCEOP
* test_save_all_dtypes looks fixed
* fix metadata
* skip that one
* fix viz by not pickling buffers
* simple test for const folding
* bring split reduceop back
* add simplify_alu
* simplify_binop fixes a test
* fix cast folding
* disable that test
* that test looks fine
* changes from delete_lazy pruning p1
* cast folding and children base
* test: cast folding from pruning branch
* green test_sgd_4convs_fuse_conv_bw
* enable some indexing folding
* test_complex_backward is fixed
* prune more, 295 -> 233
* fix test_multi_const_folding_literal
* fix double copy
* early become test
* ooooops
* clean up ctx in all big_graph
* fix openpilot 208 kernels
* train_cifar is fine now
* fix CAST_BEFORE_VIEW
* even faker const
* back to 13
* mark expectedFailure
* fine don't create them
* test_multi_const_folding_tensor
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* Start from andredaprato:webgpu-clean
* Fix infs
* inf wgsl function is not needed
* Emulated ulong for threefry, more tests passing
* Randomness tests passing
* Update model export to support new changes in webgpu, efficientnet export works again
* Simplify shift emulation in wgsl
* Delete test file
* Fix bigger than u32 u32 literal
* Why was skip copies added here?
* Python3.12 for webgpu tests
* Fix model export syntax error
* Get test ops passing with some skips
* Fix lint
* Much simpler shift
* Run more tests
* Timestamp queries are not supported in CI, so skip search tests
* All fancy indexing passing
* r is ctx
* Run more dtype tests by using is_dtype_supported
* Cleanup ulong shift rendering
* UPat -> Pat, UOps -> Ops
* Pat -> UPat
* Refactor render_ushift if-else
* Pattern to avoid ulong mul
* Remove vals_dtype
* is_nan trick + rewrite, test_isnan passing
* Rewrite a * select(1, nan, gate) -> select(a, nan, gate)
* No arg, just op
* Support char, uchar, short, ushort
* Run test_index_mnist now that we have uint8
* Fix pylint
* Save 3 lines by using base Compiler
* No more long emulation
* Remove fixup_binops
* No more external_local_bufx wgsl specific cstyle modif, use base extra_pm
* Simpler, faster copyin/out
* Skip some new tests that use long
* Fix typo
* copyout touchup
* Save lines by using render_cast
* WebGL is not supported in core, delete it from is_dtype_supported
* More narrow test skips for some unary tests
* TernaryOps, UnaryOps -> Ops
* TinyGrad supports WebGPU
* StableDiffusion demo: f16tof32 gpu is a lib, update UI
* Packed load/store, no more scale_size, no core tinygrad changes
* Rename copyin, copyout
* Device -> dev
* Fix lint
* Pattern matcher rule for packed load/store
* Refactor
* Shorter packed load/store
* this should fix lint
* Fix mypy
* SD compile script working
* New SD webgpu UI
* New default prompt
* New SD weights
* Fix title when webgpu not available
* Run symbolic tests, simplify is_nan, use round_up
* Show step time on UI
* Bump minimum wgpu version to v0.19
* Fix latent
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>