* explicitly check that getitem indices contain at most one ellipsis
previous error with multiple `...`:
```
if index_type not in [None, int, slice, Tensor]: raise IndexError(f"{index_type=} not supported")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index_type=<class 'ellipsis'> not supported
```
this pr:
```
if len(ellipsis_idx) > 1: raise IndexError("an index can only have a single ellipsis ('...')")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: an index can only have a single ellipsis ('...')
```
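a quick repro of the new error path (a minimal sketch; assumes a recent tinygrad where `Tensor` is importable from the top-level package):
```
from tinygrad import Tensor

t = Tensor.arange(12).reshape(3, 4)
try:
  t[..., ...]  # two ellipses in one index
except IndexError as e:
  print(e)  # an index can only have a single ellipsis ('...')
```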
* oh we have that already
* test that
* test these
replaced all dtype.np with _to_np_dtype defined in tensor.py.
after this, the only numpy usages are (1) constructing a Tensor from an np.ndarray, (2) constructing the .numpy() output, (3) the numpy-backed random buffer
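a hypothetical sketch of the helper's role (the real `_to_np_dtype` in tensor.py keys on tinygrad DType objects, not strings; this just shows the numpy knowledge living in one place):
```
import numpy as np

# hypothetical name -> numpy mapping; the real helper maps tinygrad DTypes
_NP_DTYPES = {"float32": np.float32, "int32": np.int32, "bool": np.bool_}

def _to_np_dtype(name: str):
  # returns None for dtypes numpy can't represent
  return _NP_DTYPES.get(name)

print(_to_np_dtype("float32"))  # <class 'numpy.float32'>
```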
`where(self, 0)` incorrectly upcasted the output. `where(self, False)` is correct but looks unnatural, so a cast was added at the end; the pattern matcher can fold the cast into the where branches
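a sketch of the pattern described (the dtype promotion details here are illustrative, not exact tinygrad semantics):
```
from tinygrad import Tensor, dtypes

x = Tensor([1, -2, 3], dtype=dtypes.int8)
mask = x > 0
# mask.where(x, 0) could upcast: the Python int 0 promotes past int8.
# mask.where(x, False) keeps the dtype but reads unnaturally, so cast at
# the end instead; the pattern matcher can fold the cast into the branches.
y = mask.where(x, False).cast(x.dtype)
print(y.dtype)  # dtypes.int8
```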
* failed test case for getitem with leading Nones
torch matches numpy here, so tinygrad was incorrect.
another repro:
```
import numpy as np
import torch
from tinygrad import Tensor

t = np.arange(12).reshape((3, 4))
print(t[None, None, np.array([1, 2])])
t = torch.arange(12).reshape((3, 4))
print(t[None, None, torch.tensor([1, 2])].numpy())
t = Tensor.arange(12).reshape(3, 4)
print(t[None, None, Tensor([1, 2])].numpy())
```
* # noqa
default [low, high] changed from [-1.5, 1.5] to [-2, 2] (except for tan).
dropped several explicit atol values that were unnecessarily larger than the default 1e-6.
tested on mac, tinybox red / green
moved Sign logic to function.py; its backward always returns 0 to match torch.
rewrote abs as `self * self.sign()`, so its backward also matches torch.
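a quick check of the matching backward behavior (a sketch comparing against torch at 0, where both now give gradient 0):
```
import torch
from tinygrad import Tensor

tt = torch.tensor([-2.0, 0.0, 3.0], requires_grad=True)
tt.abs().sum().backward()
print(tt.grad)          # tensor([-1.,  0.,  1.])

x = Tensor([-2.0, 0.0, 3.0], requires_grad=True)
x.abs().sum().backward()
print(x.grad.numpy())   # [-1.  0.  1.], since abs = x * sign(x) and sign' = 0
```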
* mockgpu nv
* works
* comment that out
* fix merge
* setup gpuocelot
* install packages
* don't run all of them
* passes
* fix ci
* almost
* should pass
* linter
* linter 2
* try this?
* ugh, not supported
* ci
* remove ticket from description
* better descs
* handle reshape with remainder in _reshape_mask
* remove trailing whitespace
* use helper_test_op to generate tensors from shapes
* test in shapetracker too
* remove whitespace
* revert property name in other class tests
* write llm.c and add a few new methods to tensor
* training works
* add jit
* tests for new functions
* test tolist
* simple fix for onnx test failures (#4186)
* write llm.c and add a few new methods to tensor
* training works
* add jit
* tests for new functions
* bump line count to 7500
* simplest fix
* safenumpy tolist for now
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
---------
Co-authored-by: geohotstan <135171913+geohotstan@users.noreply.github.com>
* initial version
* heh gimme grrrreen
* version 2
* clean ups
* some test confusion
* fix onnx
* rename to _broadcast_tensors
* improved errors and test
* fixed?
* some test fixup
* version 3 lol
* comments
* cleaner
* add failure test for expand to 0 test
* 1 more assertRaises test
* make err msg better
* also rewrite the expand onnx op? :s
* Fix permutation of result indices in einsum.
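a minimal check of the fixed behavior (a sketch; the permuted `->ki` output subscripts exercise the result-index ordering):
```
import numpy as np
from tinygrad import Tensor

a = np.arange(6, dtype=np.float32).reshape(2, 3)
b = np.arange(12, dtype=np.float32).reshape(3, 4)
# output indices deliberately permuted relative to the input order
got = Tensor.einsum("ij,jk->ki", Tensor(a), Tensor(b)).numpy()
assert np.allclose(got, np.einsum("ij,jk->ki", a, b))
```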
* Delete stray line used for breaking tests
* Fix linter error by renaming twice-used variable
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* It works?
* Clamp correctly
* Refactor
* Make code better
* Undo some stuff
* First step to trying to make floats work
* Floats work in the Python op but not on Metal because int division differs

Python integer division is `//`, which rounds toward negative infinity, but C
integer division rounds toward 0, so there is an off-by-one division error
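the mismatch in two lines of Python (C's truncating division emulated with `math.trunc`):
```
import math

print(-7 // 2)             # -4: Python // floors toward negative infinity
print(math.trunc(-7 / 2))  # -3: C integer division truncates toward zero
# the results differ by one whenever the signs differ and there is a remainder
```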
* arange does cumsum with ints and then multiplies by step

This is so the loop optimization can remain int-only
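an illustrative sketch of the idea (not the exact tinygrad internals): build the integer ramp with cumsum, then scale and shift by the float step:
```
from tinygrad import Tensor, dtypes

start, stop, step = 0.0, 2.0, 0.5
n = int((stop - start) / step)
idx = Tensor.ones(n, dtype=dtypes.int32).cumsum() - 1  # int-only: 0..n-1
out = idx * step + start                               # floats enter only here
print(out.numpy())  # [0.  0.5 1.  1.5]
```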
* Undo a lot of symbolic changes
* Final check
* Cleanup
* There can be multiple phis
* Fix multiple phi op removal
* const sets dtype correctly
* Fix bugs
* Fix a couple bugs and add loop vars to resolve
* missed one
* Don't trim too many ops
* Fix symbolic test
* Use ones instead of full
* Delete test
* Lint passes
* max node error
* Small updates to loop logic
* Remove unnecessary changes
* We are getting somewhere
* Simple case
* Fix
* rm, prn
* Better
* If NumNode doesn't work then continue
* clamp is needed for arange(256)
* Move everything into the optim fn
* Replace correctly
* Order optimizations better
* Delete
* mypy
* Test for simplification
* Rename
* Fix test
* update test description
* Undo more
* Cleanup
* No replaced_ops map
* Fix lint
* AssertionError
* back again
* Reinstate assertion
* Return true and make diff not as big
* Bigger range for test
* Change cumsum impl
* fix bug
* make big cumsum work
* lint
* Undo cumsum 2-stage removal
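a sanity check that a large cumsum stays exact (a sketch; the size is an arbitrary assumption meant to hit the 2-stage path):
```
import numpy as np
from tinygrad import Tensor

n = 4096  # assumed large enough to trigger the 2-stage cumsum
out = Tensor.ones(n).cumsum().numpy()
assert np.allclose(out, np.arange(1, n + 1))  # cumsum of ones is 1..n
```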
* No while helper
* optional min/max clamping
* floats work
* rm giant arange test
* fix python cast None
* Check phi parents
* one phi allowed per where
* Fix one phi per where
* Rework iteration
* Delete assertions
* convert to int
* Try mul -1 instead of neg for hip..?
* Remove one phi per where requirements
* one accum only
* Lint
* should simplify a loop at a time
* Don't get rid of loop explicitly
* Need to iterate backwards
* lint
* unary neg
* Make optim work for onnx and sum_pad_collapse
* Better message
* filter alu ops correctly
* Fix the limiter
* lint and simplify
* Add it back
* off by one error
* test wheres and phis
* test max ops and non-if stuff
* <=
* cast_scalar
* Oops
* Change test
* Pass loop uops instead of a modified map
* Cut param transfer between linearizer and uops
* Fix issues
* Fix lint
* fix efficientnet python 3.8 invalid syntax
* distinct vars in seen_vars
* accurate var names
---------
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>