* delete forced_realize
* put that back
* work
* remove forced_realize
* expectedFailures
* contiguous(buffer)
* multi
* expectedFailures
* cleaner create_subbuffer
* more comments
* remove that
* note
* realizes
* work
* one UPat and image is back
* remove
* cleaner
* fix test_complex_backward for now
---------
Co-authored-by: George Hotz <geohot@gmail.com>
* Move define_acc down an unrolled add chain
* Prevent possible infinite recursion
* Add test
* Fix typo in test
* Move mulacc_unrolled to devectorize + load_store_indexing pass
* Add test for mulacc_unrolled by itself
* undo formatter
* import from ops, not rewriter
* Add a const version
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* this
* clean up
* more clean ups and improve debug msg
* more correct training toggler
* remove manual training toggling
* change some variable names
* actually just add the training toggle for LIMIT envvar too
* more refinement
* __call__ and OnnxRunner
* fix half pylint, other half is importing from onnx while this file is onnx.py, figure out later
* ahhhh found another mistake
* remove limit from __call__
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* move softmax upcast to after subtracting max
the max can always be computed in the input dtype without any numerical loss, so when softmax explicitly upcasts, it is better to upcast after subtracting the max
* skipUnless half
* start
* progress
* fixes
* smth
* mini fixes
* fix2
* ugh, need this for now
* faster
* cleanups
* tiny linters
* make mypy happier
* test & free pts
* ops
* linter
* cleanup vm
* fix
* remove map_from
* tiny fixes
* add test to ci
* use full_shape to determine if index can potentially overflow
* update comment
* use shapetracker to check max index value
* wip
* lint
* handle mask
* upcast to int64 by st is noop on WGSL
* fix comments
* Handle negative overflow, intermediaries overflow, int64 support
handle negative overflow
handle symbolic
wip
handle intermediate values
wip
check if typemap supports int64
lint
comment
* add invalid_dtype
lint
* Fix bug on checking mask overflow
wip
wip
* Add more tests, need to resolve partial upcast
test Valid_view_dup
test valid op overflow
refine test cases
clean up
cleanup
wip
refine tests
lint
* Upcast is handled by lower_load_store
upcast as graph_rewrite to backtrack
update test
wip
cleanup
wip
cleanup
do upcast in lower_load_store
lint
* cleanup
* do upcast within lower_load_store and mutate ctx
* do upcast in get_idx and view
revert
lint
* cleanup
* Upcast in vec, const
upcast to const
test case 3
upcast on vector
lint
* simplify idx with symbolic in case of fake overflow
test case4
test case 4
update test
* test case4 is only for metal
* try: upcast inside graph_rewrite instead of shapetracker
wip
* checking overflow can just be done directly on all views, with idxs
* cleanup
* REMOVE hard coded uop test for idx upcast
* refactor
cleanup
refactor
* do actual casting when necessary, instead of rewriting all idx
hard code uop test
new upcast
* check dtype for int64 in webgpu
* cleanup
cleanup
* cleanup
* update tests
cleanup
comment
cleanup
cleanup
* comment
* comment
* update comment
update comment
* refactor
* typo
* keep the scope to only upcasting
* white space
* Revert "white space"
This reverts commit 314d7eb184.
* Revert "keep the scope to only upcasting"
This reverts commit 1ef701dd85.
* sym folding is not necessary
lint1
* fold symbolic
lint
* use symbolic simple when folding shapetracker idx
* full sym folding is required after all...
* Ops.CAST should retain the src min max
* put rewrite to lowerer
wip
* start testing on higher level
wip
test higher level in test_tensor
* find Ops.STORE in list instead of recursively
* check dtype support when upcasting
* remove invalid_dtype
* lint
* fix int64 support checks in upcast
lint
* skipif skipunless
* revert fold to find test case
* Revert "revert fold to find test case"
This reverts commit 225bb6e801.
* test sym folding
* handle ptx
* wip
* wip
* delete hard coded uop test
* lint fixes
* wip
* fix checking for None
* lint
* handle ptx
* comment
* dtype for overflow()
* update skipIf skipUnless
* assert in wgsl renderer for int64
wip
* do folded_upcast in to_indexed_op, real_size uses views_to_indexed_ops
* assert in lowerer for dtype support
lint
* Revert "assert in lowerer for dtype support"
This reverts commit 8e9b1b79bf.
* assert dtype in kernel.py
* Revert "assert dtype in kernel.py"
This reverts commit e29b9a9893.
* wip
* assert in render
* remove old assert
* check dtype from renderer, assert in upcast
wip
* smaller arange for sym fold case
* linearize directly
* use expand directly
* lint
* lint
* rename
* no need to check dtype in device.py
* trigger pr
* remove dtype assert in upcast, make wgpu fail in render
* use DType for type hint instead of dtypes
* assert on KeyError in tests for webgpu backend int64
* use a tuple for src
* test real kernel run
wip
* lint error
* restore
* fix real_size
* update test example
* resolve merge stuff
---------
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
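The index-overflow change above boils down to bounding every flat index a view can produce and widening the index dtype to int64 only when those bounds leave the int32 range (failing if the backend, e.g. WebGPU, cannot render int64). A minimal sketch, assuming a simplified strided view; `View`, `index_bounds` and `index_dtype` are hypothetical names, not tinygrad's ShapeTracker API, and the sketch ignores masks and intermediate values:

```python
from dataclasses import dataclass

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

@dataclass(frozen=True)
class View:
    """Hypothetical strided view: element (i0, i1, ...) lives at offset + sum(i_k * strides[k])."""
    shape: tuple[int, ...]
    strides: tuple[int, ...]
    offset: int = 0

def index_bounds(v: View) -> tuple[int, int]:
    """Smallest and largest flat index the view can produce (final value only)."""
    lo = hi = v.offset
    for size, stride in zip(v.shape, v.strides):
        span = stride * max(size - 1, 0)  # contribution of this axis at its last position
        if span >= 0:
            hi += span
        else:
            lo += span  # negative strides pull the minimum down (negative overflow)
    return lo, hi

def index_dtype(views: list[View], supports_int64: bool) -> str:
    """Keep int32 unless some view's index range escapes it."""
    if all(INT32_MIN <= b <= INT32_MAX for v in views for b in index_bounds(v)):
        return "int32"
    if not supports_int64:
        # hypothetical error path standing in for "make wgpu fail in render"
        raise RuntimeError("index needs int64, which this backend cannot render")
    return "int64"
```

For example, a contiguous 65536x65536 view reaches index 65536*65536 - 1, which exceeds `INT32_MAX`, so it gets int64 on backends that support it.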
The number after .so is the ABI version; it is always 1 for libgcc_s.
Most Linux systems set default library versions via symlinks that are
simply followed to reach the actual ELF; conda, however, uses linker
scripts, which ctypes doesn't follow (contents of libgcc_s.so below):
```
/* GNU ld script
Use the shared library, but some functions are only in
the static library. */
GROUP ( libgcc_s.so.1 -lgcc )
```
ctypes.util.find_library treats this file as the actual ELF, and
ctypes.CDLL then tries to load the text file as a shared library. The
result is:
```
File "/home/me/src/tinygrad/tinygrad/device.py", line 223, in CPUProgram
helper_handle = ctypes.CDLL(ctypes.util.find_library('System' if OSX else 'gcc_s'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/me/miniforge3/envs/tinygrad/lib/python3.12/ctypes/__init__.py", line 379, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/me/miniforge3/envs/tinygrad/lib/libgcc_s.so: invalid ELF header
```