* cleanup noop prefixes in _pool
treat dim=None in expand as a noop (in addition to -1), so slice, reshape, and expand in _pool can share the same noop prefix
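
A minimal sketch of the noop-marker idea, in plain Python rather than tinygrad code. The helper name `resolve_expand_shape` is hypothetical and only illustrates how None and -1 can both mean "keep this dim", so one prefix of None entries can be reused across calls:

```python
from typing import Optional, Sequence, Tuple


def resolve_expand_shape(shape: Sequence[int],
                         new_shape: Sequence[Optional[int]]) -> Tuple[int, ...]:
    """Replace noop markers (None or -1) with the current size of that dim.

    Hypothetical helper for illustration; not the tinygrad implementation.
    """
    if len(shape) != len(new_shape):
        raise ValueError("shape and new_shape must have the same number of dims")
    return tuple(cur if new is None or new == -1 else new
                 for cur, new in zip(shape, new_shape))


# One shared prefix that leaves the leading (e.g. batch, channel) dims untouched.
noop_prefix = (None, None)
expanded = resolve_expand_shape((2, 3, 1, 5), noop_prefix + (4, 5))
print(expanded)
```

The same `noop_prefix` tuple could then be passed to slice-like and reshape-like helpers that also read None as "leave this dim alone", which is the sharing this commit describes.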
* nit
* something then reshape style
* that's repeat
- removed exact duplicate tests
- only kept one function if torch_fxn is the same as tinygrad_fxn
- used tensor method instead of class method style
- replaced unneeded `lambda f: f(x)` with just `f`
- re-enabled commented tests that work now
- removed some forward_only now that 0 shape tensors can backward
* add operator.lt and operator.eq to test_dtype_alu
those should pass now as we broadcast before passing to lt and eq.
also updated the test skipping criteria to reuse test_dtype.is_dtype_supported
* llvm lt nan is incorrect
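
For reference, a small plain-Python sketch of the IEEE 754 behaviour a backend is expected to match for lt and eq when NaN is involved. It says nothing about what the LLVM backend actually returned; the `expected_lt` and `expected_eq` names are illustrative only:

```python
import math


def expected_lt(a: float, b: float) -> bool:
    # Under IEEE 754, any ordered comparison involving NaN is False.
    return a < b


def expected_eq(a: float, b: float) -> bool:
    # NaN is not equal to anything, including itself.
    return a == b


for a, b in [(math.nan, 1.0), (1.0, math.nan), (math.nan, math.nan), (0.0, 1.0)]:
    print(a, b, expected_lt(a, b), expected_eq(a, b))
```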
* enable truediv too
* Revert "enable truediv too"
This reverts commit df703235fb.
* just that
* move reduce over 0 len axis logic to lazy.py
this fixed uneven shard reduce case if the uneven one has length 0
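
A rough sketch of why handling the 0-length case in one place helps the uneven shard case, written with plain Python lists instead of tinygrad buffers. It assumes a reduce over an empty axis yields the identity of the op (0 for sum, -inf for max); `reduce_shard` and `reduce_sharded` are hypothetical names:

```python
import math
from typing import List, Sequence

# Assumed identity elements for reducing over an empty axis.
IDENTITY = {"sum": 0.0, "max": -math.inf}


def reduce_shard(values: Sequence[float], op: str) -> float:
    """Reduce one shard; an empty shard returns the op's identity."""
    if len(values) == 0:
        return IDENTITY[op]
    return sum(values) if op == "sum" else max(values)


def reduce_sharded(shards: List[Sequence[float]], op: str) -> float:
    """Reduce each shard, then combine the partial results with the same op."""
    partials = [reduce_shard(s, op) for s in shards]
    return reduce_shard(partials, op)


# Uneven sharding where the second shard has length 0.
print(reduce_sharded([[1.0, 2.0, 3.0], []], "sum"))
print(reduce_sharded([[1.0, 2.0, 3.0], []], "max"))
```

Because the empty shard contributes the identity, the combined result is the same as reducing the unsharded data.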
* fix interpreted backends
* fix backwards for 0 shape tensors too
* init
* feat: add _to_const_val to getitem
* doc: changed docs
* docs: updated more docs
* merge: improved/fancy
* better error msg, minor cleanups
* feat: added index_put to test_indexing
* clean: test_indexing
* revert: gather changes lol
* refactor: use dict for tracking tensor indexing, also asserts for type
* oooooooooops
* ugh
* will revert this commit xD
* fix: removed asserts
* improvement: made in-line if statement clearer
* improved err message and improved slice_int tests
* fix: recover accidentally deleted line
* finishing touches
* reword some docs and del torch device tests in test_indexing
* del some redundant tests
* revert: gather asserts, do it in separate pr
* fix some data_ptr stuff
* done
* done done
* shard llama
* sharding works
* simpler
* simpler
* consume option
* disable that test
* save a line
---------
Co-authored-by: George Hotz <george@tinygrad.org>
* compile cache for several devices
* ops_gpu uses hash to not care about sql
* hip rdna test with device
* linter happy
* no device passed where possible
* arch is optional to compile_{hip|cuda}
* initial multitensor jit support and tests
* Added graphs to multitensor jit and updated tests
* update unbind api
* fix set device, add TinyJit to resnet
* update_stats includes device
---------
Co-authored-by: ramenguy99 <ramenguy99@gmail.com>