chenyu
4a65010de8
remove CUDACPU flag in tests [run_process_replay] ( #5902 )
no longer used
2024-08-04 16:06:38 -04:00
chenyu
b392b8edc3
increase atol and rtol test_gemm_fp16 ( #5866 )
* increase atol and rtol test_gemm_fp16
made it pass with NOOPT, which has larger accumulated error
* revert that
2024-08-01 19:09:58 -04:00
chenyu
defd89e8e0
unify negative shape creation to raise ValueError ( #5817 )
[run_process_replay]
2024-07-30 13:42:59 -04:00
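A minimal repro of the unified behavior (Tensor.empty is just one example entry point; any creation path given a negative dim should now raise):
```
from tinygrad import Tensor

# after this change, a negative dim in a new shape raises ValueError (illustrative)
try:
  Tensor.empty(2, -3)
except ValueError as e:
  print(e)
```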
P4ssenger
6742a4789a
Add check for negative dimension in view ( #5790 )
* add check for negative dimension in view
* add negative dim tests
* move check to tensor level
* fix error message
* move check to view create
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-30 13:26:27 -04:00
samm393
573e0f9a48
remove float division from idiv in python_alu ( #5777 )
* removes float division from idiv in python_alu
* add test
* cleaner logic
* pass clang unsigned literals correctly
* suffix ULL instead of U
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-29 12:14:12 -04:00
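A sketch of the idea: truncate-toward-zero integer division built from floor division on magnitudes, with no float ops involved. Illustrative only, not necessarily the exact python_alu code:
```
def idiv(a: int, b: int) -> int:
  # floor-divide the magnitudes, then restore the sign (truncates toward zero)
  q = abs(a) // abs(b)
  return -q if (a < 0) != (b < 0) else q

assert idiv(7, 2) == 3 and idiv(-7, 2) == -3 and idiv(7, -2) == -3
```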
George Hotz
053550c3f3
remove MERGE opt, cleanup wmma upcast ( #5669 )
* remove MERGE opt, cleanup wmma upcast
* upcast first
* fix broken vectorize folding rule
2024-07-23 20:43:42 -07:00
George Hotz
e3f00ac77d
Fix cuda tc emu test ( #5663 )
* fix acc folding for NV tensor cores
* fix correctness of reduce_before_expand
* fix test emulated CUDA tensor cores
* test_gemm_fp16 on some devices
2024-07-23 15:04:25 -07:00
George Hotz
386fb5e7f8
folding without UNMUL ( #5628 )
* folding without UNMUL
* fix failures, index_collapse
* import ReduceOps
* test_arange_4096 isn't folding
2024-07-21 20:14:44 -07:00
George Hotz
0ad87021e2
move acc to end ( #5568 )
* move acc to end
* confirmed pictures are the same
* relax that
* Update test_ops.py
2024-07-19 03:06:52 -07:00
chenyu
6e405b0a2b
add 0d tensor to trunc/floor/ceil/round tests ( #5512 )
the existing trunc test passes backward, but its backward is incorrect in general; added tests that would fail
2024-07-16 16:48:25 -04:00
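Roughly what the added 0-d coverage exercises:
```
from tinygrad import Tensor

t = Tensor(3.7)          # 0-d tensor, shape ()
print(t.trunc().item())  # 3.0
print(t.floor().item())  # 3.0
print(t.ceil().item())   # 4.0
print(t.round().item())  # 4.0
```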
Tobias Fischer
87a2ef2bc2
Add Interpolate Function ( #5482 )
* add interpolate function
* fixed linter issue
* reduced sizes in test
---------
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-07-16 09:44:01 -07:00
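A usage sketch; the signature here (a size tuple applied to the trailing dims plus a mode string, torch-style) is assumed, not quoted from the PR:
```
from tinygrad import Tensor

x = Tensor.rand(1, 3, 8, 8)
y = x.interpolate(size=(16, 16), mode="linear")  # upsample the trailing dims
print(y.shape)  # (1, 3, 16, 16)
```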
Tobias Fischer
e219103677
Add Pad to Pooling ( #5488 )
2024-07-14 21:50:20 -07:00
Tobias Fischer
5849130cbb
gather negative dim fix ( #5486 )
2024-07-14 20:20:53 -04:00
chenyu
00813a92a0
update Tensor.eye api to match torch ( #5433 )
* update Tensor.eye api to match torch
input is n for nrows and optional m for ncols
* space
* fix onnx
2024-07-12 20:25:12 -04:00
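The API described above, in use:
```
from tinygrad import Tensor

print(Tensor.eye(3).numpy())     # n only: 3x3 identity
print(Tensor.eye(2, 3).numpy())  # n rows, optional m cols, like torch.eye(n, m)
```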
chenyu
64986f949c
more transcend math tests in ci ( #5368 )
* more transcend math tests in ci
test large inputs to trig functions that hit a different reduction algorithm, and test TRANSCENDENTAL=2 for all backends
* no CUDACPU
* try that
2024-07-10 21:19:09 -04:00
chenyu
0f0940225a
fix Tensor.all and Tensor.any for PTX ( #5335 )
support boolean acc and boolean phi, and rewrite boolean max to uint8 max
2024-07-08 18:15:04 -04:00
chenyu
6856f915d6
Tensor.any and Tensor.all ( #5320 )
does not work in PTX yet due to how boolean tensors are handled
2024-07-07 14:36:00 -04:00
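Basic usage of the new reductions (axis/keepdim args assumed to follow the other reduce methods):
```
from tinygrad import Tensor

t = Tensor([[0, 1], [0, 0]])
print(t.any().item())         # True
print(t.all().item())         # False
print(t.any(axis=1).numpy())  # [True, False]
```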
chenyu
2029cb7047
support passing None to Tensor.clip ( #5319 )
pass None for no upper or lower bound
2024-07-07 13:04:22 -04:00
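The behavior in short:
```
from tinygrad import Tensor

t = Tensor([-2.0, 0.5, 3.0])
print(t.clip(0, None).numpy())  # lower bound only: [0.0, 0.5, 3.0]
print(t.clip(None, 1).numpy())  # upper bound only: [-2.0, 0.5, 1.0]
```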
chenyu
c1e330f302
Tensor.int and Tensor.bool ( #5317 )
2024-07-07 11:52:58 -04:00
George Hotz
e53b164e1a
small changes from lowerer ( #5266 )
2024-07-02 15:03:54 -07:00
George Hotz
3df47bc21e
OpenELM + repeat_interleave ( #5234 )
* start writing openelm
* progress...hit bug
* repeat_interleave support
* gqa
* add rotary embedding
* spp
* i think it runs correctly
* broken
* output is good now
* cleanups
* no io_uring on android
2024-06-30 15:18:39 -07:00
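The repeat_interleave part in isolation (the dim arg is assumed optional, flattening when omitted, as in torch):
```
from tinygrad import Tensor

t = Tensor([1, 2, 3])
print(t.repeat_interleave(2).numpy())  # [1, 1, 2, 2, 3, 3]
```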
hikettei
ad1ca7da64
[Feature] Added BinaryOps.AND/BinaryOps.OR ( #5223 )
* [Feature] Added BinaryOps.AND/BinaryOps.OR
* Add: __rand__, __ror__
2024-06-29 17:20:25 -07:00
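Elementwise usage on boolean tensors, via the tensor-level & and | operators this PR wires up:
```
from tinygrad import Tensor

a, b = Tensor([True, True, False]), Tensor([True, False, False])
print((a & b).numpy())  # [True, False, False]
print((a | b).numpy())  # [True, True, False]
```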
chenyu
ee0c6dfc15
build Tensor._tri with movements only ( #5110 )
* build Tensor._tri with movements only
doesn't need arange; saves a kernel in the attention mask
* simpler, more tests
2024-06-23 00:07:36 -04:00
chenyu
20fabd8a5b
update Tensor.triu and Tensor.tril ( #5109 )
renamed the arg to `diagonal` to match the torch API, and added documentation and examples
2024-06-22 21:59:50 -04:00
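The torch-matching arg in use:
```
from tinygrad import Tensor

t = Tensor.arange(9).reshape(3, 3)
print(t.triu(diagonal=1).numpy())   # keeps elements strictly above the main diagonal
print(t.tril(diagonal=-1).numpy())  # keeps elements strictly below the main diagonal
```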
George Hotz
9f875123b6
small changes from lowerer. [run_process_replay] [no_assert] ( #5102 )
2024-06-22 11:09:35 -07:00
chenyu
166a2b19b5
fix reduce axis of 0d tensors ( #5089 )
`x.sum(())` is fine, and `x.sum((1,))` should throw IndexError
2024-06-21 13:51:40 -04:00
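The two cases from the commit message:
```
from tinygrad import Tensor

x = Tensor(5.0)          # 0-d tensor
print(x.sum(()).item())  # 5.0: reducing over no axes is fine
try:
  x.sum((1,))            # a 0-d tensor has no axis 1
except IndexError as e:
  print(e)
```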
chenyu
36b4a492a1
explicitly check getitem indices can have at most one ellipsis ( #5087 )
* explicitly check getitem indices can have at most one ellipsis
previous error with multiple `...`:
```
if index_type not in [None, int, slice, Tensor]: raise IndexError(f"{index_type=} not supported")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index_type=<class 'ellipsis'> not supported
```
this pr:
```
if len(ellipsis_idx) > 1: raise IndexError("an index can only have a single ellipsis ('...')")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: an index can only have a single ellipsis ('...')
```
* oh we have that already
* test that
* test these
2024-06-21 12:33:18 -04:00
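A repro of the check:
```
from tinygrad import Tensor

t = Tensor.arange(12).reshape(3, 4)
print(t[..., 0].numpy())  # a single ellipsis is fine
try:
  t[..., ...]
except IndexError as e:
  print(e)  # an index can only have a single ellipsis ('...')
```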
chenyu
f6d6760f71
don't cast tuple to list before creating Tensor ( #5071 )
the Tensor constructor now supports creating from a tuple
2024-06-20 13:32:56 -04:00
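In short:
```
from tinygrad import Tensor

print(Tensor((1, 2, 3)).numpy())  # tuple accepted directly, no list cast needed
```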
chenyu
50700171ef
minor cleanup to reshape arg handling ( #5070 )
moved None handling to be with argfix, and only resolve -1 if there is a -1
2024-06-20 10:27:27 -04:00
chenyu
f4355d0f1b
check Tensor.permute input arg is a valid permutation ( #5069 )
also added support for negative axes
2024-06-20 10:01:28 -04:00
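Both behaviors together; the exact error type for an invalid permutation is assumed here:
```
from tinygrad import Tensor

t = Tensor.rand(2, 3, 4)
print(t.permute(2, 0, 1).shape)   # (4, 2, 3)
print(t.permute(-1, 0, 1).shape)  # negative axes resolve: (4, 2, 3)
try:
  t.permute(0, 0, 1)              # not a valid permutation
except RuntimeError as e:         # error type assumed
  print(e)
```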
chenyu
e8f39fcaaa
check arg to Tensor.flip can appear only once ( #5068 )
* check arg to Tensor.flip can appear only once
raise RuntimeError if there are multiple
* fix test
2024-06-20 09:33:42 -04:00
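The check in action:
```
from tinygrad import Tensor

t = Tensor.rand(2, 3)
print(t.flip((0, 1)).shape)  # each axis at most once: fine
try:
  t.flip((0, 0))             # repeated axis
except RuntimeError as e:
  print(e)
```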
chenyu
620fa6e5a2
check Tensor.reshape can have at most one -1 ( #5026 )
raise RuntimeError to match torch. on master it throws weird errors from shapetracker
2024-06-18 08:17:12 -04:00
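The new behavior:
```
from tinygrad import Tensor

t = Tensor.rand(2, 3, 4)
print(t.reshape(6, -1).shape)  # (6, 4): a single -1 is inferred
try:
  t.reshape(-1, -1)            # more than one -1 is ambiguous
except RuntimeError as e:
  print(e)
```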
chenyu
c0139b05d8
python_alu sin(inf) is nan ( #5020 )
* python_alu sin(inf) is nan
without special handling, it throws ValueError: math domain error
* skip CUDACPU
2024-06-17 19:47:30 -04:00
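Conceptually, the special handling looks like this (a sketch, not the literal python_alu code):
```
import math

def alu_sin(x: float) -> float:
  # math.sin(inf) raises "ValueError: math domain error"; map non-finite input to nan instead
  return math.sin(x) if math.isfinite(x) else math.nan

print(alu_sin(math.inf))  # nan
```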
Ray
1ad3b25461
fix einsum output str ( #4998 )
* fix einsum output str
* new line to satisfy linter
* removed redundant cast (satisfy linter)
2024-06-17 12:18:14 -04:00
chenyu
67e8df4969
remove numpy from dtype ( #4969 )
replaced all dtype.np with _to_np_dtype defined in tensor.py.
after this, the only numpy usages are (1) Tensor(np.ndarray), (2) construct .numpy() output, (3) numpy random buffer
2024-06-14 15:38:45 -04:00
geohotstan
90332eb529
Getitem pin None dimension ( #4960 )
* fix
* remove torch out of bounds test
* 1 more test case
2024-06-14 10:48:59 -04:00
chenyu
74586bc339
fix getitem with leading None ( #4943 )
i think all None handling can be unified, removing the calc_dim in advanced indexing
2024-06-13 11:23:40 -04:00
chenyu
fae08c4d48
fix Tensor.triu / Tensor.tril with boolean input ( #4941 )
`where(self, 0)` incorrectly upcasted the output. `where(self, False)` is correct but looks unnatural, so added a cast at the end. The pattern matcher can fold the cast into the where branches
2024-06-12 20:16:13 -04:00
chenyu
eb0f5b5660
failed test case for getitem with leading Nones ( #4936 )
* failed test case for getitem with leading Nones
torch matches numpy, so tinygrad is incorrect.
another repro
```
t = np.arange(12).reshape((3, 4))
print(t[None, None, np.array([1, 2])])
t = torch.arange(12).reshape((3, 4))
print(t[None, None, torch.tensor([1, 2])].numpy())
t = Tensor.arange(12).reshape(3, 4)
print(t[None, None, Tensor([1, 2])].numpy())
```
* # noqa
2024-06-12 16:19:42 -04:00
chenyu
1326f29e24
fix Tensor.gather shape checking criteria ( #4932 )
it's fine if `self.shape[d] >= index.shape[d]` for all `d != dim`, not for all `d`
2024-06-12 13:10:14 -04:00
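An example of the relaxed criterion: index may be longer than self along dim, as long as the other dims fit:
```
from tinygrad import Tensor

t = Tensor([[1, 2], [3, 4], [5, 6]])            # shape (3, 2)
idx = Tensor([[0, 1], [2, 0], [1, 1], [0, 2]])  # shape (4, 2): longer than t along dim=0
print(t.gather(0, idx).shape)                   # (4, 2); only d != dim must satisfy self.shape[d] >= index.shape[d]
```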
chenyu
798ea61377
widen test_ops [low, high] and more strict atol ( #4906 )
default [low, high] changed from [-1.5, 1.5] to [-2, 2] (except tan).
dropped several explicit atol values that were unnecessarily larger than the default 1e-6.
tested on mac, tinybox red / green
2024-06-10 20:47:09 -04:00
chenyu
c8cd637236
test case for Tensor.var reducing over size = 1 axis ( #4902 )
backward failed when correction >= n, the number of elements being reduced
2024-06-10 12:11:39 -04:00
chenyu
a70e8a80d7
test_ops test cmp with special floats ( #4826 )
prepare to fix nan; it did not work with ge and le before either
2024-06-04 12:10:21 -04:00
chenyu
3afc914617
CMPEQ -> CMPNE and make it safe to pad ( #4818 )
* CMPNE
* new dataset
2024-06-03 18:02:15 -04:00
chenyu
4921de1945
fix cumsum of 0-d tensor ( #4781 )
* fix cumsum of 0-d tensor
* _resolve_dim for all
2024-05-30 12:41:09 -04:00
chenyu
4cf0eadf8f
failed test case for ellipsis in einsum ( #4779 )
from #4156
2024-05-30 11:14:42 -04:00
chenyu
7e90026eb0
pow cleanup part 2 ( #4727 )
more cleanups and fix 0 ** 0
2024-05-25 07:17:40 -04:00
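The edge case in question:
```
from tinygrad import Tensor

print((Tensor([0.0]) ** 0).numpy())  # [1.0], matching python's 0.0 ** 0 == 1.0
```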
chenyu
31358cbea5
change Tensor.stack to method ( #4719 )
2024-05-24 17:04:19 -04:00
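The method form (the dim kwarg is assumed to carry over from the old classmethod):
```
from tinygrad import Tensor

a, b = Tensor([1, 2]), Tensor([3, 4])
print(a.stack(b, dim=0).shape)  # (2, 2)
```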
chenyu
47aba47f64
update Tensor.gather api ( #4692 )
* update Tensor.gather api
gather(self, dim, index) to match torch
* fix that
2024-05-22 21:54:06 -04:00
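The updated call order, matching torch.gather:
```
from tinygrad import Tensor

t = Tensor([[1, 2], [3, 4]])
idx = Tensor([[0, 0], [1, 0]])
print(t.gather(1, idx).numpy())  # [[1, 1], [4, 3]]
```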
George Hotz
07b350a8f4
new uops is an actual graph ( #4560 )
* new uops is an actual graph
* it's way slower
* simpler
* fix define acc
* render_loop unique
* ops test pass
* add pattern matcher back, there's bugs
* rewrite
* use priority queue
* recursive children
* fix tests
* fix tests with SINK
* fix abstractions
* fix assembly
* simpler
* link define_acc
* fix DEFINE_ACC placement
* type verify
* full cmp
* fix cmp
* ACCESS_ACC
* insert DEFINE_ACC
* fix PHI
* recursive rewrite
* fix many tests
* sum collapse
* more patterns
* correct change
* fold arange
* fix that lin test
* space
* big folding rule works
* close
* has more maxes, meh
* cached node replace
* set changed
* simplest folding yet
* works
* works
* DIV
* all tests pass
* del
* fuzz linearizer fails
* sum_collapse
* test depth 2 cf
* fix lin test 14
* fix clang depth
* disable that
* failure 14 is fixed
* fix ptx
* failure 27 is fixed
* fix llama
* run_cnt
* Revert "Optimize PTX gated loads index calculation (#4304)"
This reverts commit d97d5a7689.
* fix uops loop
* fix ptx bugs
* add barrier
* print
* mem_type in ptx direct
* bypass tests that fail in CI but pass locally
* ptx remove ptr_ar
* more ptx passing
* fix ptx tests
* assert compile support
* remove model inference benchmark from red
2024-05-17 18:00:18 -07:00