* Switch to dawn, all tests passing locally
* Use dawn-python
* Skip failing test
* Skip midcast and fix timestamp on metal ci
* Autogen webgpu
* Try fetch dawn lib again
* /usr/lib
* Without lib prefix
* Test autogen diff
* Delete webgpu support, move everything to ops_webgpu
* mypy fix
* Simplify, refactor
* Line savings
* No ResultContainer
* Type annotation for result
* Some more simplifications
* Why was this explicit sync used at all?
* Refactor: delete functions that are only used once
* Create shader module inline
* Clear unit tests cache, maybe that solves it
* That wasn't it
* Try deleting cache to pass failing weight compare
* weights_only=False for pytorch 2.6 (see the sketch after this list)
* Simplify ctype array creation
* Remove nanosecond precision timestamps
* Simplify error handling
* Refactor, add back type annotations
* Deleted custom submit function, refactor
* read_buffer simplify
* Fix use after free, refactor
* Simplify supported_features
* Runtime docs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
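
A minimal sketch of the `weights_only=False for pytorch 2.6` fix above: PyTorch 2.6 flipped `torch.load`'s default to `weights_only=True`, which rejects checkpoints containing arbitrary pickled Python objects, so tests that load full checkpoints must opt out explicitly. The checkpoint path below is a placeholder.

```python
import torch

# PyTorch 2.6 defaults torch.load to weights_only=True; full checkpoints with
# pickled objects need the flag passed explicitly (only for trusted files).
state = torch.load("model.ckpt", weights_only=False)  # "model.ckpt" is a placeholder
```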
* MultiLazyBuffer is UOp [pr]
* this is new mlb
* this is the idea
* progress
* multitensor works
* more movement ops
* this
* MultiLazyBuffer is UOp
* cleanups
* multi axis
* fix more tests
* work
* not that
* add multi grad and move shard to ops
* mops not views
* no double contig
* sweet, all mt tests passing
* port old logic
* remove lbs
* fix realized
* whitespace
* assign tweak
* test_assign_kv_cache_multi passes
* fix is_realized
* fix JIT for multi
* just a few more lines i'll pay them back soon i swear please bro just a few more
* no split reduceop for multi
* remove cast before view
* greener
* indexing
* delete view instant rule
* that passes too
* openpilot too
* ack
* base on cast_before_view
* add it as a rewrite rule
* VIEW(DEVICE) is also fine
* test_shard_memory depends on forced_realize removal
* put that back, will go soon
* UOp representations change once we don't instantly fold things
* do not duplicate tests
---------
Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
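
A minimal sketch of the multi-device path these commits rework, assuming tinygrad's `Tensor.shard` API; the device names are illustrative.

```python
from tinygrad import Tensor

# shard splits a tensor across devices along an axis (device names illustrative)
t = Tensor.ones(256, 256).shard(("CPU:0", "CPU:1"), axis=0)
# with this PR, the multi-device state behind the tensor is a UOp,
# not a separate MultiLazyBuffer object
print(t.lazydata)
```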
* LazyBuffer = UOp
* try 4 at this diff
* skip optimization tests p1
* raise kernel count expectations
* BIND isn't the _only_ uop that can become a tensor
* fix test_ones_sum on symbolic
* bump openpilot, correctness first
* offset on assign is fine
* uop is immutable
* what if this was higher
* more optimization skips
* instant fold const copy
* test_multitensor shouldn't expect buffer for unrealized
* move copy folder to upats
* start BUFFER_VIEW
* kinda BUFFER_VIEW
* Revert "kinda BUFFER_VIEW"
This reverts commit 94b4fe3040.
* BUFFER_VIEW try 2
* linter and missed _device
* pylint
* keep Ops.CONTIGUOUS
* always BUFFER_VIEW disk
* test
* cpu isn't a real device
* buffer references after del
* add that back
* start bringing some of these back
* more test updates
* simpler simplify copy
* subbuffer everything
* this is fine with buffer view
* cleanup the diff in test/ 1
* copy is one thing
* diff pruning
* diff pruning 2
* oh bind unbinds way too early
* extra
* more diff pruning
* more const folding
* experiment with symbolic here
* Revert "experiment with symbolic here"
This reverts commit cb87d61f7a.
* Revert "more const folding"
This reverts commit 2a7d258a2b.
* Revert VALID early folding
This reverts commit 4074f52317.
* storing const is fine
* fix test_prefer_half_buffer
* iterate on test_real_world
* this fixes test_train_mnist memory, breaks everything else
* Revert "this fixes test_train_mnist memory, breaks everything else"
This reverts commit dccfcbe068.
* always expect buffer to exist here
* temp debug: something is mutating lazydata in compile3
* Revert "temp debug: something is mutating lazydata in compile3"
This reverts commit 71400f0d55.
* everything back to normal
* compile3
* compile3 test
* start captured jit work, that test passes
* finalized memory skip set
* linter err
* back to base here
* tiny metaop cleanup
* print tensor
* 4th time this unbind got me
* green pickle
* tensor_variable sanity
* cast sanity
* link from the reds
* COPY sanity + minor repr change
* you can exist
* enable test_winograd
* bye bye nbytes
* danger, uop is mutating
* real become
* delete those from uop init
* put it in buffer init
* buffer inits with so much stuff
* buffer pickle try 2
* toposort can't be a cached property
* fix test_schedule_gc_with_inputs
* remove all @unittest.skip(gc)
* Revert "remove all @unittest.skip(gc)"
This reverts commit 9d8d92dd85.
* reenable real world + test_schedule_gc
* test: RUN_PROCESS_REPLAY=0
* fix pickle jit
* test changes
* reenable test_lru_alloc and TestTrain
* fix imagedtype
* bring pr back
* reenable 3 gc tests
* test_schedule better diff
* disable SPLIT_REDUCEOP
* test_save_all_dtypes looks fixed
* fix metadata
* skip that one
* fix viz by not pickling buffers
* simple test for const folding
* bring split reduceop back
* add simplify_alu
* simplify_binop fixes a test
* fix cast folding
* disable that test
* that test looks fine
* changes from delete_lazy pruning p1
* cast folding and children base
* test: cast folding from pruning branch
* green test_sgd_4convs_fuse_conv_bw
* enable some indexing folding
* test_complex_backward is fixed
* prune more, 295 -> 233
* fix test_multi_const_folding_literal
* fix double copy
* early become test
* ooooops
* clean up ctx in all big_graph
* fix openpilot 208 kernels
* train_cifar is fine now
* fix CAST_BEFORE_VIEW
* ever faker const
* back to 13
* mark expectedFailure
* fine don't create them
* test_multi_const_folding_tensor
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
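
The net effect of the PR above, sketched assuming `Tensor.lazydata` remains the attribute holding the graph at this point:

```python
from tinygrad import Tensor

t = Tensor([1, 2, 3]) + 1
# the graph behind a Tensor is now built from UOps directly;
# there is no separate LazyBuffer class anymore
print(type(t.lazydata).__name__)  # "UOp" once LazyBuffer is gone
```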
In the single-device or copied-multi case, the device is applied, but in the sharded case the device is now silently ignored. Maybe, similar to rand, we just don't allow a tuple device in rand_like.
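
A sketch of the ambiguity described above; the `device` keyword on `rand_like` and the device names are assumptions for illustration.

```python
from tinygrad import Tensor

# single-device: an explicit device is honored
a = Tensor.rand(4, 4).rand_like(device="CPU:1")  # device kwarg assumed for illustration

# sharded: the passed device is silently ignored today; one option is to
# reject tuple devices here, mirroring Tensor.rand
t = Tensor.rand(4, 4).shard(("CPU:0", "CPU:1"), axis=0)
b = t.rand_like()
```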
* op arg alu [pr]
* more
* more passing
* fix more tests
* more tests passing
* fix single failing test
* so much cleaner
* noop so process replay doesn't trigger
* fix ptx
* try for canonical order
* cmp better
* disable bad tests
* flip const order
* fix test
* fix tests
* different fix for NOOP
* metaclass here
* fix tests
* narrower scope
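
A hypothetical illustration of the `try for canonical order` / `flip const order` commits above: for commutative ALU ops, normalizing the constant to a fixed side means the pattern matcher only has to handle one form. Every name below is illustrative, not tinygrad's.

```python
# toy model: a bare int stands in for a const operand, a str for anything else
COMMUTATIVE = {"ADD", "MUL", "MAX"}

def canonicalize(op: str, lhs, rhs):
  # flip so the constant lands on the right for commutative ops
  if op in COMMUTATIVE and isinstance(lhs, int) and not isinstance(rhs, int):
    lhs, rhs = rhs, lhs
  return op, lhs, rhs

assert canonicalize("ADD", 3, "x") == ("ADD", "x", 3)   # const flipped right
assert canonicalize("SUB", 3, "x") == ("SUB", 3, "x")   # non-commutative: untouched
```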