* use full_shape to determine if index can potentially overflow
* update comment
* use shapetracker to check max index value
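A minimal sketch of the check these commits describe, using a made-up max_index helper rather than tinygrad's shapetracker API: the largest flat index a (shape, strides, offset) view can produce is known statically, so potential int32 overflow is detectable before codegen.

    INT32_MAX = 2**31 - 1

    def max_index(shape, strides, offset=0):
        # each dimension contributes (size-1)*stride at its highest coordinate
        return offset + sum((s - 1) * st for s, st in zip(shape, strides) if st > 0)

    def can_overflow(shape, strides, offset=0):
        return max_index(shape, strides, offset) > INT32_MAX

    assert not can_overflow((1024, 1024), (1024, 1))   # max index 2**20 - 1
    assert can_overflow((2**16, 2**16), (2**16, 1))    # max index 2**32 - 1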
* wip
* lint
* handle mask
* upcast to int64 by shapetracker is a noop on WGSL
* fix comments
* Handle negative overflow, overflow in intermediate values, and int64 support (see the sketch below)
handle negative overflow
handle symbolic
wip
handle intermediate values
wip
check if typemap support int64
lint
comment
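And the negative side, again with made-up helpers: negative strides can push the minimum flat index below int32's range, so both bounds of the view have to be checked before deciding on the int64 upcast.

    INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

    def index_bounds(shape, strides, offset=0):
        # negative strides pull the minimum down, positive strides push the maximum up
        lo = offset + sum((s - 1) * st for s, st in zip(shape, strides) if st < 0)
        hi = offset + sum((s - 1) * st for s, st in zip(shape, strides) if st > 0)
        return lo, hi

    def needs_int64(shape, strides, offset=0):
        lo, hi = index_bounds(shape, strides, offset)
        return lo < INT32_MIN or hi > INT32_MAX

    # a flipped view: stride -1 walks backwards from the offset
    assert index_bounds((2**31,), (-1,)) == (-(2**31 - 1), 0)
    assert needs_int64((2**31 + 2,), (-1,))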
* add invalid_dtype
lint
* Fix bug on checking mask overflow
wip
wip
* Add more tests, need to resolve partial upcast
test Valid_view_dup
test valid op overflow
refine test cases
clean up
cleanup
wip
refine tests
lint
* Upcast is handled by lower_load_store (sketched below)
upcast as graph_rewrite to backtrack
update test
wip
cleanup
wip
cleanup
do upcast in lower_load_store
lint
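A self-contained sketch of the rewrite-based upcast; the toy Node type below is illustrative and far simpler than tinygrad's UOp and graph_rewrite machinery. The point of doing it in the graph is that every intermediate value can be bounds-checked, not just the final index.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Node:
        op: str            # "const", "add", "mul", or "cast64"
        src: tuple = ()
        arg: int = 0

        def bounds(self):
            if self.op == "const": return (self.arg, self.arg)
            if self.op == "cast64": return self.src[0].bounds()
            (a0, a1), (b0, b1) = self.src[0].bounds(), self.src[1].bounds()
            vals = (a0 + b0, a1 + b1) if self.op == "add" else (a0*b0, a0*b1, a1*b0, a1*b1)
            return (min(vals), max(vals))

    def overflows(n: Node) -> bool:
        # unsafe if ANY intermediate can leave int32, not just the final result
        lo, hi = n.bounds()
        return lo < -2**31 or hi > 2**31 - 1 or any(overflows(s) for s in n.src)

    def upcast_if_needed(idx: Node) -> Node:
        return Node("cast64", (idx,)) if overflows(idx) else idx

    # an oversized intermediate is caught even when the final bounds look fine
    big = Node("mul", (Node("const", arg=2**20), Node("const", arg=65536)))
    assert overflows(big)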
* cleanup
* do upcast within lower_load_store and mutate ctx
* do upcast in get_idx and view
revert
lint
* cleanup
* Upcast in vec, const
upcast to const
test case 3
upcast on vector
lint
* simplify idx with symbolic in case of fake overflow (see the numbers below)
test case4
test case 4
update test
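Concrete numbers for the fake-overflow case: the unfolded form of an index like (i*65536)//65536 has an intermediate outside int32, but after symbolic folding it is just i, so no upcast is needed. Fold first, then check bounds.

    def fits_int32(v):
        return -2**31 <= v <= 2**31 - 1

    i_max = 2**20                                    # say i ranges over [0, 2**20]
    assert not fits_int32(i_max * 65536)             # unfolded intermediate overflows
    assert fits_int32((i_max * 65536) // 65536)      # folded form is just i: fine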
* test case4 is only for metal
* try: upcast inside graph_rewrite instead of shapetracker
wip
* checking overflow can just be done directly on all views, using their idxs
* cleanup
* REMOVE hard coded uop test for idx upcast
* refactor
cleanup
refactor
* do actual casting when necessary, instead of rewriting all idx
hard code uop test
new upcast
* check dtype for int64 in webgpu
* cleanup
cleanup
* cleanup
* update tests
cleanup
comment
cleanup
cleanup
* comment
* comment
* update comment
update comment
* refactor
* typo
* keep the scope to only upcasting
* white space
* Revert "white space"
This reverts commit 314d7eb184.
* Revert "keep the scope to only upcasting"
This reverts commit 1ef701dd85.
* sym folding is not necessary
lint1
* fold symbolic
lint
* use symbolic simple when folding shapetracker idx
* full sym folding is required after all...
* Ops.CAST should retain the src min max
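What retaining the src min/max buys, with a made-up Range type: a lossless upcast keeps the operand's tighter value range instead of widening to the destination dtype's full range, so downstream overflow checks stay precise.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Range:
        vmin: int
        vmax: int

    def cast_range(src: Range, dst: Range) -> Range:
        # lossless cast (e.g. int32 -> int64): the value cannot change,
        # so the source's tighter range is still valid
        if dst.vmin <= src.vmin and src.vmax <= dst.vmax:
            return src
        return dst                  # otherwise only the dtype range is safe

    int64_range = Range(-2**63, 2**63 - 1)
    assert cast_range(Range(0, 100), int64_range) == Range(0, 100)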
* put rewrite to lowerer
wip
* start testing on higher level
wip
test higher level in test_tensor
* find Ops.STORE in list instead of recursively
* check dtype support when upcasting
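A sketch of the support check; supported_dtypes is a hypothetical attribute, not the real renderer API. The upcast path has to bail out on backends like WGSL/WebGPU that have no int64, instead of emitting code the renderer cannot lower.

    class FakeWGSLRenderer:
        supported_dtypes = {"bool", "int32", "uint32", "float32"}   # no int64

    def pick_index_dtype(renderer, overflows_int32: bool) -> str:
        if not overflows_int32:
            return "int32"
        if "int64" not in renderer.supported_dtypes:
            raise RuntimeError("index overflows int32 but backend has no int64")
        return "int64"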
* remove invalid_dtype
* lint
* fix int64 support checks in upcast
lint
* skipif skipunless
* revert fold to find test case
* Revert "revert fold to find test case"
This reverts commit 225bb6e801.
* test sym folding
* handle ptx
* wip
* wip
* delete hard coded uop test
* lint fixes
* wip
* fix checking for None
* lint
* handle ptx
* comment
* dtype for overflow()
* update skipIf skipUnless
* assert in wgsl renderer for int64
wip
* do folded_upcast in to_indexed_op, real_size uses views_to_indexed_ops
* assert in lowerer for dtype support
lint
* Revert "assert in lowerer for dtype support"
This reverts commit 8e9b1b79bf.
* assert dtype in kernel.py
* Revert "assert dtype in kernel.py"
This reverts commit e29b9a9893.
* wip
* assert in render
* remove old assert
* check dtype from renderer, assert in upcast
wip
* smaller arange for sym fold case
* linearize directly
* use expand directly
* lint
* lint
* rename
* no need to check dtype in device.py
* trigger pr
* remove dtype assert in upcast, make wgpu fail in render
* use DType for type hint instead of dtypes
* assert on KeyError in tests for webgpu backend int64
* use a tuple for src
* test real kernel run
wip
* lint error
* restore
* fix real_size
* update test example
* resolve merge stuff
---------
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
* only use BUFFER_VIEW in disk [pr]
* delete can_view
* BUFFER_VIEW op on DISK
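Why BUFFER_VIEW suits DISK, as an illustrative stand-in class (names here are made up, not tinygrad's): a view is just an (offset, size) window into the parent buffer, and a disk-backed buffer serves it by seeking, with no copy.

    from dataclasses import dataclass

    @dataclass
    class DiskBuffer:
        path: str
        size: int
        offset: int = 0             # byte offset into the file

        def view(self, offset: int, size: int) -> "DiskBuffer":
            assert offset + size <= self.size, "view out of bounds"
            # zero-copy: same file, shifted offset
            return DiskBuffer(self.path, size, self.offset + offset)

    weights = DiskBuffer("model.bin", 1 << 20)
    layer0 = weights.view(0, 4096)
    assert (layer0.offset, layer0.size) == (0, 4096)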
* remove that allow_buffer_view=False
* notes
* bitcast is a low-level op too
* this passes on AMD and LLVM
* assert to prepare for grad uop [pr]
* fix test_nn
* fix most of test_tensor
* few more tests
* fix multi
* uniform gradient
* acc_dtype
* any for multi
* fix typing
* fix assert, CAST_BEFORE_VIEW is still the issue
* explicit test for CAST_BEFORE_VIEW
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
* remove cast before view
* greener
* indexing
* delete view instant rule
* that passes too
* openpilot too
* ack
* base on cast_before_view
* add it as a rewrite rule
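The flavor of rule involved, with the direction and legality condition assumed for illustration rather than taken from this PR: moving a CAST across a VIEW is only sound when the cast preserves itemsize, so the view's strides describe both sides, and phrasing it as a rewrite rule lets it fire during graph_rewrite instead of instantly at UOp construction.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class U:
        op: str                     # "CAST", "VIEW", "BUFFER"
        src: tuple = ()
        itemsize: int = 4
        arg: object = None          # shape/strides payload for VIEW

    def cast_across_view(u: U):
        # CAST(VIEW(x)) -> VIEW(CAST(x)), legal only when itemsize is unchanged;
        # return None when the rule does not apply, like a pattern-match miss
        if u.op == "CAST" and u.src and u.src[0].op == "VIEW":
            view, base = u.src[0], u.src[0].src[0]
            if u.itemsize == base.itemsize:
                return U("VIEW", (U("CAST", (base,), u.itemsize),), u.itemsize, view.arg)
        return None

    buf = U("BUFFER", itemsize=4)
    v = U("VIEW", (buf,), 4, ((4, 4), (4, 1)))
    c = U("CAST", (v,), 4)          # int32 -> float32: same itemsize
    assert cast_across_view(c).op == "VIEW"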
* VIEW(DEVICE) is also fine
* test_shard_memory depends on forced_realize removal
* put that back, will go soon
* UOp representations change once we don't instantly fold things
* do not duplicate tests
---------
Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
* is 67% considered fixed?
* move test up
* share function
* add qgemm too
* make sure qgemm comes out as int
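The qgemm property in plain numpy terms: an int8 by int8 matmul accumulated in int32 must stay integer rather than silently becoming float.

    import numpy as np

    a = np.random.randint(-128, 128, (4, 8), dtype=np.int8)
    b = np.random.randint(-128, 128, (8, 4), dtype=np.int8)
    c = a.astype(np.int32) @ b.astype(np.int32)   # accumulate in int32
    assert c.dtype == np.int32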
* actually that note is not right
* remove qgemm (I did it wrong) and add it later lol.
* start on test rewrite map [pr]
* chatgpt writes dumb tests
* comment out failing
* fix that test
* fix gc issue
* oh, frame 2
* remove uop mutability
* map is only the map
* simpler + more tests
* test tiny passes
* tests that need to pass
* parent test passes
* child test passes
* remove uop mutability [pr]
* test fixups
* most tests pass
* more tests pass
* lil test fixups
* them too
* fix test
* unneeded
* err, that
* fix test_hcq
* fix test failures
* fix that test
* tensor universe
* does this pass test
* Revert "does this pass test"
This reverts commit ed516b3169.
* Revert "tensor universe"
This reverts commit c21301852a.
* test_mutate_add passes
* this can pass
* Revert "Merge remote-tracking branch 'origin/no_uop_mutability' into test_rewrite_map"
This reverts commit 657822dcdc, reversing
changes made to 2a126c145b.
* Revert "test_mutate_add passes"
This reverts commit ab4fc4c78e.
* correct enough
* remove test_rewrite_map_schedule.py
* viz
* uops are immutable
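Both ideas in one toy sketch (the node type and rewrite below are illustrative, not tinygrad's): uops are frozen, so a rewrite builds new nodes and returns a map from every visited node to its replacement; the map is the only output, and everything else is derived from it.

    from dataclasses import dataclass

    @dataclass(frozen=True)         # immutability: rewrites build new nodes
    class N:
        op: str
        src: tuple = ()

    def toy_rewrite(n: N, rule, rmap: dict) -> N:
        if n in rmap: return rmap[n]
        new = N(n.op, tuple(toy_rewrite(s, rule, rmap) for s in n.src))
        new = rule(new) or new
        rmap[n] = new               # rmap records old -> new for every node
        return new

    # fold x+0: ADD(x, CONST0) -> x
    rule = lambda n: n.src[0] if n.op == "ADD" and n.src[1].op == "CONST0" else None
    root = N("ADD", (N("LOAD"), N("CONST0")))
    rmap: dict = {}
    out = toy_rewrite(root, rule, rmap)
    assert out == N("LOAD") and rmap[root] == out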
---------
Co-authored-by: qazal <qazal.software@gmail.com>
* use unravel in shape
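What unravel refers to, as standalone arithmetic: converting a flat index into per-dimension coordinates of a row-major shape, the same bookkeeping as numpy.unravel_index.

    def unravel(shape, idx):
        out = []
        for dim in reversed(shape):
            out.append(idx % dim)
            idx //= dim
        return tuple(reversed(out))

    assert unravel((3, 4, 5), 37) == (1, 3, 2)    # 1*20 + 3*5 + 2 == 37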
* make process replay work
* earlier View.minify()
* fix
* fix tests
* mypy
* get rid of early minify
* fix
* linter
* clean and add test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* delete is_constant from the scheduler
* add sink folding
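A guess at the simplest piece of sink folding, purely illustrative: a SINK gathers kernel outputs, so duplicate sources can be dropped without changing what is stored.

    def fold_sink(srcs: tuple) -> tuple:
        # storing the same output twice is redundant: keep first occurrence
        seen, out = set(), []
        for s in srcs:
            if s not in seen:
                seen.add(s)
                out.append(s)
        return tuple(out)

    assert fold_sink(("store_a", "store_b", "store_a")) == ("store_a", "store_b")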
* always give BUFFER uops Buffers [pr]
* spec for view, var (bind) and const
* add test_buffer_only_after_realize
* work
* 3 lines
* more work