* MultiLazyBuffer is UOp [pr]
* this is new mlb
* this is the idea (sketched at the end of this list)
* progress
* multitensor works
* more movement ops
* this
* MultiLazyBuffer is UOp
* cleanups
* multi axis
* fix more tests
* work
* not that
* add multi grad and move shard to ops
* mops not views
* no double contig
* sweet, all mt tests passing
* port old logic
* remove lbs
* fix realized
* whitespace
* assign tweak
* test_assign_kv_cache_multi passes
* fix is_realized
* fix JIT for multi
* just a few more lines i'll pay them back soon i swear please bro just a few more
* no split reduceop for multi
* delete forced_realize
* put that back
* work
* remove forced_realize
* expectedFailures
* contiguous(buffer)
* multi
* expectedFailures
* cleaner create_subbuffer
* more comments
* remove that
* note
* realizes
* work
* one upat and image is back
* remove
* cleaner
* fix test_complex_backward for now
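Rough shape of the change, for posterity: instead of a dedicated MultiLazyBuffer class holding per-device lazy buffers, a sharded tensor is a single UOp whose sources are the per-device pieces, so movement ops (mops) and gradients flow through the normal op machinery instead of special-cased views. A toy sketch of the idea; `Multi`, `shard`, and `mop` are invented names for illustration, not tinygrad's API:

```python
# Toy model of "multi is just an op": the sharded tensor is a single node over
# per-device sources, and a movement op is the same op mapped over the shards.
# `Multi`, `shard`, and `mop` are invented for illustration only.
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class Multi:
  srcs: tuple          # stand-ins for the per-device buffers
  axis: int            # the axis the tensor is sharded on

def shard(x: np.ndarray, devices: int, axis: int) -> Multi:
  return Multi(tuple(np.array_split(x, devices, axis=axis)), axis)

def mop(m: Multi, f) -> Multi:
  # movement op on a non-shard axis: apply shard-wise, shard axis is unchanged
  return Multi(tuple(f(s) for s in m.srcs), m.axis)

x = np.arange(12).reshape(3, 4)
m = shard(x, devices=2, axis=1)
flipped = mop(m, lambda s: np.flip(s, axis=0))   # flip rows on each shard
assert (np.concatenate(flipped.srcs, axis=1) == np.flip(x, axis=0)).all()
```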
---------
Co-authored-by: George Hotz <geohot@gmail.com>
* Move define_acc down an unrolled add chain (re-association sketched after this list)
* Prevent possible infinite recursion
* Add test
* Fix typo in test
* Move mulacc_unrolled to devectorize + load_store_indexing pass
* Add test for mulacc_unrolled by itself
* undo formatter
* import from ops, not rewriter
* Add a const version
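For the record, the re-association in miniature: an unrolled reduce leaves the accumulator above one big add chain, acc + (x0*y0 + x1*y1 + x2*y2); moving the accumulator down the chain gives ((acc + x0*y0) + x1*y1) + x2*y2, so every add is acc-relative and can fuse into a multiply-accumulate. A standalone sketch on plain tuples (the real pass rewrites UOps, this is only the algebra):

```python
# Illustrative only: re-associate an unrolled add chain so each step becomes a
# candidate multiply-accumulate. Real code does this as a UOp graph rewrite.
def flatten_adds(e):
  # collect the leaves of a (possibly nested) ADD chain
  if isinstance(e, tuple) and e[0] == "ADD":
    return flatten_adds(e[1]) + flatten_adds(e[2])
  return [e]

def move_acc_down(acc, chain):
  out = acc
  for term in flatten_adds(chain):
    out = ("ADD", out, term)   # each step is now acc + xi*yi: a mulacc shape
  return out

mul = lambda a, b: ("MUL", a, b)
expr = ("ADD", ("ADD", mul("x0", "y0"), mul("x1", "y1")), mul("x2", "y2"))
print(move_acc_down("ACC", expr))
# ('ADD', ('ADD', ('ADD', 'ACC', ('MUL', 'x0', 'y0')), ('MUL', 'x1', 'y1')), ('MUL', 'x2', 'y2'))
```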
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* this
* clean up
* more cleanups and improved debug msg
* more correct training toggler
* remove manual training toggling
* change some variable names
* actually just add the training toggle for LIMIT envvar too
* more refinement
* __call__ and OnnxRunner (usage sketched below)
* fix half the pylint errors; the other half comes from importing onnx while this file is onnx.py, figure out later
* ahhhh found another mistake
* remove limit from __call__
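What usage looks like after the cleanup, roughly; the import path and the input name here are assumptions, not gospel:

```python
# Hedged sketch of the resulting interface: build the runner once, then call it.
import onnx
from tinygrad import Tensor
from extra.onnx import OnnxRunner   # assumed in-tree location

model = onnx.load("model.onnx")
runner = OnnxRunner(model)          # parses the graph, handles training toggling itself
out = runner({"input": Tensor.randn(1, 3, 224, 224)})   # __call__ runs the graph
```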
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* move softmax upcast to after subtracting max
the max can always be computed in the same dtype without any numerical loss, so subtracting it before upcasting is better when explicitly upcasting in softmax (see the sketch below)
* skipUnless half
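Concretely, the ordering this moves to, as a generic numpy sketch (not tinygrad's code): compute and subtract the max in the input dtype, then upcast only for the exp and sum.

```python
import numpy as np

def softmax_upcast_after_max(x: np.ndarray) -> np.ndarray:
  # the max is exact in the input dtype; subtracting it first keeps exp's argument <= 0
  m = x.max(axis=-1, keepdims=True)
  # upcast only after subtracting the max; exp and sum happen in float32
  e = np.exp((x - m).astype(np.float32))
  return e / e.sum(axis=-1, keepdims=True)

x = np.random.randn(4, 8).astype(np.float16)
assert np.allclose(softmax_upcast_after_max(x).sum(-1), 1.0)   # stable even from half
```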
* start
* progress
* fixes
* smth
* mini fixes
* fix2
* ugh, need this for now
* faster
* cleanups
* tiny linters
* make mypy happier
* test & free pts
* ops
* linter
* cleanup vm
* fix
* remove map_from
* tiny fixes
* add test to ci
* use full_shape to determine if index can potentially overflow
* update comment
* use shapetracker to check max index value
* wip
* lint
* handle mask
* upcast to int64 by the shapetracker is a no-op on WGSL
* fix comments
* Handle negative overflow, intermediate-value overflow, int64 support
handle negative overflow
handle symbolic
wip
handle intermediate values
wip
check if typemap support int64
lint
comment
* add invalid_dtype
lint
* Fix bug in checking mask overflow
wip
wip
* Add more tests, need to resolve partial upcast
test Valid_view_dup
test valid op overflow
refine test cases
clean up
cleanup
wip
refine tests
lint
* Upcast is handled by lower_load_store
upcast as graph_rewrite to backtrack
update test
wip
cleanup
wip
cleanup
do upcast in lower_load_store
lint
* cleanup
* do upcast within lower_load_store and mutate ctx
* do upcast in get_idx and view
revert
lint
* cleanup
* Upcast in vec, const
upcast to const
test case 3
upcast on vector
lint
* simplify idx with symbolic in case of fake overflow
test case4
test case 4
update test
* test case4 is only for metal
* try: upcast inside graph_rewrite instead of shapetracker
wip
* checking overflow can just be done directly on all views, with idxs (see the sketch at the end of this list)
* cleanup
* REMOVE hard coded uop test for idx upcast
* refactor
cleanup
refactor
* do actual casting when necessary, instead of rewriting all idx
hard code uop test
new upcast
* check dtype for int64 in webgpu
* cleanup
cleanup
* cleanup
* update tests
cleanup
comment
cleanup
cleanup
* comment
* comment
* update comment
update comment
* refactor
* typo
* keep the scope to only upcasting
* white space
* Revert "white space"
This reverts commit 314d7eb184.
* Revert "keep the scope to only upcasting"
This reverts commit 1ef701dd85.
* sym folding is not necessary
lint1
* fold symbolic
lint
* use symbolic simple when folding shapetracker idx
* full sym folding is required after all...
* Ops.CAST should retain the src min/max
* put rewrite to lowerer
wip
* start testing on higher level
wip
test higher level in test_tensor
* find Ops.STORE in list instead of recursively
* check dtype support when upcasting
* remove invalid_dtype
* lint
* fix int64 support checks in upcast
lint
* skipIf skipUnless
* revert fold to find test case
* Revert "revert fold to find test case"
This reverts commit 225bb6e801.
* test sym folding
* handle ptx
* wip
* wip
* delete hard coded uop test
* lint fixes
* wip
* fix checking for None
* lint
* handle ptx
* comment
* dtype for overflow()
* update skipIf skipUnless
* assert in wgsl renderer for int64
wip
* do folded_upcast in to_indexed_op, real_size uses views_to_indexed_ops
* assert in lowerer for dtype support
lint
* Revert "assert in lowerer for dtype support"
This reverts commit 8e9b1b79bf.
* assert dtype in kernel.py
* Revert "assert dtype in kernel.py"
This reverts commit e29b9a9893.
* wip
* assert in render
* remove old assert
* check dtype from renderer, assert in upcast
wip
* smaller arange for sym fold case
* linearize directly
* use expand directly
* lint
* lint
* rename
* no need to check dtype in device.py
* trigger pr
* remove dtype assert in upcast, make wgpu fail in render
* use DType for type hint instead of dtypes
* assert on KeyError in tests for webgpu backend int64
* use a tuple for src
* test real kernel run
wip
* lint error
* restore
* fix real_size
* update test example
* resolve merge stuff
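Where all of this lands, compressed: fold the index expression symbolically first (so unsimplified terms don't fake an overflow), bound each view's index by its min/max, and upcast to int64 only when the true range escapes int32; a CAST keeps the source's min/max, so the bound survives the upcast. A hedged sketch of just the bound check; the real logic lives in the lowerer and shapetracker:

```python
# Illustrative only: the per-view overflow bound. Not tinygrad's actual code.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def index_bounds(shape, strides, offset=0):
  # min/max of sum(idx_i * stride_i) + offset over 0 <= idx_i < shape_i
  lo = offset + sum((s - 1) * st for s, st in zip(shape, strides) if st < 0)
  hi = offset + sum((s - 1) * st for s, st in zip(shape, strides) if st > 0)
  return lo, hi

def needs_int64(views):
  # check every view; negative underflow counts as overflow too
  for shape, strides, offset in views:
    lo, hi = index_bounds(shape, strides, offset)
    if lo < INT32_MIN or hi > INT32_MAX:
      return True   # upcast idx to int64 (a CAST keeps these min/max bounds)
  return False

# a 3 * 2**30 element view overflows int32 even though each dim fits on its own
assert needs_int64([((3, 2**30), (2**30, 1), 0)])
# negative strides can underflow instead: the "negative overflow" case above
assert needs_int64([((2**31,), (-2,), 0)])
assert not needs_int64([((1024, 1024), (1024, 1), 0)])
```

On backends without int64 (the WGSL case), the renderer asserts instead of silently upcasting.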
---------
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>