* split pm_cleanups
* update test_schedule
* shrink when we remove bufferize
* don't do shrink if shape is empty
* update tests
* remove *1 from metadata
* deal with the noop bufferize
* only noop on cvar
* cleanup
* fix if
* rename
* nak works
* TestOps::test_add works
* testop has no crashes
* fix bool casts
* fix typo
* add disassemble
* RANGE and locals/regs
* simplify NAKCompiler
* disass cleanup
* cleanup nir codegen
* almost all tests passing
* cleanup notes in extra/
* old notes
* only import nak if NIR=1
* fix new SPECIAL syntax
* fix local/shared memory
* more tests passing
* add DEFINE_VAR support
* llvmpipe kinda works
* diskcache
* some mypy stuff
* lvp passing test_ops.py
* fix imports
* actually fix imports
* remove 'stdout'
* fix llvm import
* fix mypy issues
* nicer errors
* simpler test_dtype skips
* test lvp in CI
* fix github action syntax
* fix more actions typos
* switch to mesa 25.1.0
* diskcache_put
* better generation for lvp nir_options
* b64encode shader blobs
* Revert diskcache changes
This reverts commits 930fa3de8a and 8428c694b3.
* general cleanup
* better error messages
* fix llvm import
* fix windows tests
* link with libm and libgcc_s
* fix some errors
* don't check for 'float4'
* NIR uses pointer arithmetic
* use tinymesa
* bump tinymesa
* bump tinymesa again
* update lvp nir_options
* print nir shader with DEBUG
* simplify LVPCompiler
* more tests
* "gated" STORE
* NAK is cacheable
* more tests
* all tests pass locally for NAK
* test autogen in CI
* autogen deps
* more deps
* fix uop_gc
* fix macos
* mypy
* save 2 lines
* save two more lines
* save 1 line
* save 4 lines
* save more lines
* Revert "save more lines"
This reverts commit dd3a720c5a.
* save more lines
* fix LVP on windows
* refactor
* reorganize some code
* refactor lib_gpu
* move LVP check
* out of order loads
* remove support.mesa
* bump tinymesa version
* simplify LVP jit
* macos
* macos ci
* shell: bash
* testing
* more testing
* compute brew prefix
* stupid typo
* actually fix
* lib
* stdout on macos
* inline gallivm_compile_module
* Revert "inline gallivm_compile_module"
This reverts commit b65983b151.
* elf macos
* semicolon
* inherit from CPULLVMCompiler
* ruff
* disas test
* fix libm linking
* default is fine actually
* arm works
* add elf loader link test
* fix NAK beam
* pylint is too smart by half
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* rtoposort is fast, can replace rangeify with this
* fast rangeify
* work
* fast rangeify works for mnist
* should work
* progress
* pad fix
* FAST
* tests passing
* don't delete those shape ops
* put in rangeify map
* ending ranges fix
* tests
* mstack/mselect no hacks
* move to indexing.py
* touch up tests + add comments
* disable failing test
* actually make the file readable
* failing
* error
* add dtypes.index
* cast shape, stride and mask to dtypes.index in view.create
* move pm_lower_index_dtype to ops
* DEFINE_VAR is dtype.index by default
* merge var_val_using_str
* remove int from commutative
* fix test_rewrite_map
* change that to dtypes.index
* change some int to index
* shorten those
* remove old cast in renderer
* cleanup
* change that back
* add comment
* delete comment
* just delete those
* view doesn't have to cast anymore
* adjust comment
* use full_shape to determine if index can potentially overflow
* update comment
* use shapetracker to check max index value
* wip
* lint
* handle mask
* upcast to int64 by st is noop on WGSL
* fix comments
* Handle negative overflow, intermediaries overflow, int64 support
handle negative overflow
handle symbolic
wip
handle intermediate values
wip
check if typemap support int64
lint
comment
* add invalid_dtype
lint
* Fix bug on checking mask overflow
wip
wip
* Add more tests, need to resolve partial upcast
test Valid_view_dup
test valid op overflow
refine test cases
clean up
cleanup
wip
refine tests
lint
* Upcast is handled by lower_load_store
upcast as graph_rewrite to backtrack
update test
wip
cleanup
wip
cleanup
do upcast in lower_load_store
lint
* cleanup
* do upcast within lower_load_store and mutate ctx
* do upcast in get_idx and view
revert
lint
* cleanup
* Upcast in vec, const
upcast to const
test case 3
upcast on vector
lint
* simplify idx with symbolic in case of fake overflow
test case4
test case 4
update test
* test case4 is only for metal
* try: upcast inside graph_rewrite instead of shapetracker
wip
* checking overflow can just be done directly on all views, with idxs
* cleanup
* REMOVE hard coded uop test for idx upcast
* refactor
cleanup
refactor
* do actual casting when necessary, instead of rewriting all idx
hard code uop test
new upcast
* check dtype for int64 in webgpu
* cleanup
cleanup
* cleanup
* update tests
cleanup
comment
cleanup
cleanup
* comment
* comment
* update comment
update comment
* refactor
* typo
* keep the scope to only upcasting
* white space
* Revert "white space"
This reverts commit 314d7eb184.
* Revert "keep the scope to only upcasting"
This reverts commit 1ef701dd85.
* sym folding is not necessary
lint1
* fold symbolic
lint
* use symbolic simple when folding shapetracker idx
* full sym folding is required after all...
* Ops.CAST should retain the src min max
* put rewrite to lowerer
wip
* start testing on higher level
wip
test higher level in test_tensor
* find Ops.STORE in list instead of recursively
* check dtype support when upcasting
* remove invalid_dtype
* lint
* fix int64 support checks in upcast
lint
* skipIf skipUnless
* revert fold to find test case
* Revert "revert fold to find test case"
This reverts commit 225bb6e801.
* test sym folding
* handle ptx
* wip
* wip
* delete hard coded uop test
* lint fixes
* wip
* fix checking for None
* lint
* handle ptx
* comment
* dtype for overflow()
* update skipIf skipUnless
* assert in wgsl renderer for int64
wip
* do folded_upcast in to_indexed_op, real_size uses views_to_indexed_ops
* assert in lowerer for dtype support
lint
* Revert "assert in lowerer for dtype support"
This reverts commit 8e9b1b79bf.
* assert dtype in kernel.py
* Revert "assert dtype in kernel.py"
This reverts commit e29b9a9893.
* wip
* assert in render
* remove old assert
* check dtype from renderer, assert in upcast
wip
* smaller arange for sym fold case
* linearize directly
* use expand directly
* lint
* lint
* rename
* no need to check dtype in device.py
* trigger pr
* remove dtype assert in upcast, make wgpu fail in render
* use DType for type hint instead of dtypes
* assert on KeyError in tests for webgpu backend int64
* use a tuple for src
* test real kernel run
wip
* lint error
* restore
* fix real_size
* update test example
* resolve merge stuff
---------
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
* assert to prepare for grad uop [pr]
* fix test_nn
* fix most of test_tensor
* few more tests
* fix multi
* uniform gradient
* acc_dtype
* any for multi
* fix typing
* fix assert, CAST_BEFORE_VIEW is still the issue
* explicit test for CAST_BEFORE_VIEW
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>