* support same uidx in multiple shape positions
* rename var
* update comment
* add contiguous index check to global_store too
* update comment
* small change
* is this better?
* smh
* smaller change?
* get rid of more changes
* get rid of more changes
* is this even making anything better
* comment
* fix test
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* fix LtNode simplification when lhs and rhs contain same variables
`(Variable("a", 1, 5) < Variable("a", 1, 5))` should eval to `NumNode(0)`
* fix with less perf impact
* Fix numpy uint/int overflow
* lol
* Works
* Update
* Move overflow test to float64/float32
* One line
* Update
* One more
---------
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
* Cast correctly in python emulator
* Update test yml and fix lint
* make ruff pass
* mypy passes
---------
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
it's recommended to use __getnewargs__ to update the args of classes that use __new__ when unpickling.
It's preferred because it does not change the __new__ behavior.
* do not truncate float64 precision
* use l suffix to try avoid overload confusion
* long line, ruff bloats the function otherwise
* fmt
* remove long double suffix (l), it's sufficient to have the float32 (f) suffix to avoid function overload ambigouity; add test showcasing rtol=1e-12 precision increase, the test fails without the renderer changes
* use more reasonable test values, same as test_int_to_float_unary_func
* disable test for CUDACPU, does not support half and segfaults on some operations per dtypes_alu test
* disable test for HIP, renderer does not support f64 precision
* do not use noqa E501, break up condition
* remove cpu and torch backends
* don't copy to cpu
* use clang instead of cpu
* multitensor gathers on the first device
* clang is cpu + use default
* fixup
* bugfix
* fix OverflowError in UnaryOps.EXP2
* avoid accessing outputs for void uops
* skip execution for UOps.IF and UOps.ENDIF
* initialize bytearray to the correct size in UOps.DEFINE_LOCAL
* validate len of input that has .sz > 1
* remove comment in code
* reinitialize loop of already iterated
* validate first value in input to be a list for inputs with .sz > 1
* add python ops tests to CI
* skip long runtime tests for PYTHON backend
* respect dtype.sz arg in UOps.CONST, and remove incorrect validation in UOps.STORE
* use math.inf instead of float('int')
* handle 0 args to UnaryOPs.LOG2
* handle load op with default of .sz > 1
* initialize the loop correctly using UOps.LOOP arg
* remove unnecessary TODO comment
* remove newline
* select a subset of 22 ops tests to skip in CI when PYTHON=1
* handle gated UOps.LOAD referencing values that have .sz > 1
* Revert "select a subset of 22 ops tests to skip in CI when PYTHON=1"
This reverts commit 7674fee81d.
* skip tests in python backend CI command
* push fix lost in conflict resolve
* Revert "skip long runtime tests for PYTHON backend"
This reverts commit 5dd2a0376e.
* clear loop state after last iteration