chenyu
2d7c28de6a
clean up dup lambdas in helper_test_exception ( #11325 )
2025-07-22 12:21:57 -04:00
chenyu
fb42c84365
merge TestRollEdgeCases into test_ops ( #11321 )
2025-07-22 10:55:57 -04:00
chenyu
1d8b3e9d1c
movementop only Tensor.roll ( #11317 )
...
* movementop only Tensor.roll
* fixed
2025-07-22 10:34:15 -04:00
chenyu
6e9506e6fd
Tensor.roll supports dims=None ( #11313 )
2025-07-21 17:29:23 -04:00
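With dims=None, roll follows the torch convention: flatten, roll the flat sequence, then reshape back. A minimal pure-Python sketch of that flat-roll semantics (function name is illustrative, not tinygrad code):

```python
def roll_flat(xs, shift):
    # torch-style roll of a flat sequence: elements pushed off the end
    # wrap around to the front; negative shift rolls the other way
    n = len(xs)
    if n == 0:
        return list(xs)
    shift %= n
    if shift == 0:
        return list(xs)
    return list(xs[-shift:]) + list(xs[:-shift])
```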
chenyu
d3a93185a6
clean up test_roll ( #11312 )
2025-07-21 16:00:50 -04:00
chenyu
341a686799
Tensor.diagonal ( #11122 )
...
only the main diagonal of 2-D tensors is implemented. with diagonal and qr, we can compute the determinant
2025-07-07 16:21:26 -04:00
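The determinant remark rests on a standard identity: for square A = QR with Q orthogonal, |det(A)| = |prod(diag(R))|. A pure-Python sketch using classical Gram-Schmidt (assumes a small, well-conditioned, full-rank matrix; not tinygrad's qr):

```python
def qr_abs_det(a):
    # |det(A)| via QR: factor A = Q R by Gram-Schmidt on the columns,
    # then take the product of the diagonal of R (here each ||v||, so >= 0)
    n = len(a)
    cols = [[a[i][j] for i in range(n)] for j in range(n)]  # column vectors
    q, det = [], 1.0
    for j in range(n):
        v = cols[j][:]
        for u in q:  # subtract projections onto earlier orthonormal columns
            c = sum(x * y for x, y in zip(u, cols[j]))
            v = [vi - c * ui for vi, ui in zip(v, u)]
        norm = sum(x * x for x in v) ** 0.5  # R[j][j]
        det *= norm
        q.append([x / norm for x in v])
    return det
```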
Nino Risteski
a1a146a499
adding enable_gqa in SDPA ( #11097 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-06 23:25:33 -07:00
chenyu
7468959f4b
Tensor.argsort ( #11112 )
2025-07-06 13:56:35 -04:00
kevvz
b7af9cf849
clean svd tests, set full_matrices false in torch backend ( #11113 )
...
* clean tests, set full_matrices false
* add more shape asserts
2025-07-06 13:55:49 -04:00
chenyu
ba88ec3ad0
pipe linalg svd to torch ( #11109 )
...
and found a bug in svd
2025-07-06 08:37:25 -04:00
chenyu
845a4d32bc
Tensor.diag ( #11108 )
...
also updated Tensor.eye to use it
2025-07-05 23:03:02 -04:00
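Rewriting eye on top of diag uses the identity eye(n) == diag([1]*n). A pure-Python sketch of that relationship (list-of-lists stand-ins, not Tensor code):

```python
def diag(vals):
    # square matrix with `vals` on the main diagonal, zeros elsewhere
    n = len(vals)
    return [[vals[i] if i == j else 0 for j in range(n)] for i in range(n)]

def eye(n):
    # identity matrix expressed through diag, mirroring the Tensor.eye update
    return diag([1] * n)
```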
ttomsa
4905af4ae0
remove invalid int div test ( #11106 )
...
* rm test
* also rm this
2025-07-05 18:57:55 -04:00
chenyu
a2f5a54458
move sparse_categorical_crossentropy to test_ops ( #11083 )
...
also flattened the tests
2025-07-03 21:40:54 -04:00
chenyu
678cabc6f2
use argfix in Tensor.stack ( #11077 )
...
works for multiple Tensor args or a single tuple/list of Tensors, but not a mix of the two
2025-07-03 12:15:11 -04:00
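argfix normalizes varargs so a function accepts either f(a, b, c) or f((a, b, c)) / f([a, b, c]). A plain-Python sketch of the idea, including the rejected mixed form the body mentions (illustrative, not tinygrad's helper verbatim):

```python
def argfix(*args):
    # accept either plain varargs or a single tuple/list; always return a tuple
    if args and isinstance(args[0], (tuple, list)):
        if len(args) != 1:
            raise ValueError("cannot mix a tuple/list with extra positional args")
        return tuple(args[0])
    return args
```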
Ahmed Harmouche
e992ed10dc
WebGPU on Windows ( #10890 )
...
* WebGPU on Windows
* Fix dawn-python install
* New test
* pydeps
* Minor fix
* Only install dawn-python on windows webgpu
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-02 08:38:45 -07:00
chenyu
126fcf4129
clean up AMD_LLVM in tests ( #11021 )
2025-06-28 22:45:47 -04:00
chenyu
49bba2f0a0
improve test_nll_loss ( #10986 )
...
build target and weight tensors outside so it tests backward too.
2025-06-26 02:46:55 -04:00
chenyu
0612acfc70
improve Tensor.cross_entropy ( #10985 )
...
separate the cases where Y is probabilities vs class indices, and check shapes for indices. also fix higher-dim cases
2025-06-26 01:39:48 -04:00
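The distinction the commit draws: cross-entropy targets can be class indices (one int per sample) or full probability distributions, and the two need different handling. A single-sample pure-Python sketch of both paths (not tinygrad's implementation):

```python
import math

def cross_entropy(pred_probs, y):
    # pred_probs: predicted per-class probabilities for one sample
    # y: either an int class index or a probability distribution over classes
    if isinstance(y, int):  # index target: pick out one log-probability
        return -math.log(pred_probs[y])
    # distribution target: expected negative log-likelihood
    return -sum(t * math.log(p) for t, p in zip(y, pred_probs) if t > 0)
```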
chenyu
18e264a449
Tensor.logsigmoid ( #10955 )
2025-06-24 11:16:14 -04:00
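logsigmoid(x) = log(1/(1+exp(-x))) overflows if evaluated naively for very negative x; the standard numerically stable form splits on the sign of x. A sketch of that math in plain Python (illustrative of the formula, not the tinygrad kernel):

```python
import math

def logsigmoid(x):
    # stable log(sigmoid(x)): never exponentiates a large positive number
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))
```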
chenyu
35504c938e
torch.clip(x,y) -> x.clip(y) in test_ops ( #10954 )
...
* torch.clip(x,y) -> x.clip(y) in test_ops
* test_binary_crossentropy_logits_pos_weights
2025-06-24 10:22:19 -04:00
Fang-Pen Lin
86d458533f
Add pos_weight for binary_crossentropy_logits ( #10855 )
...
* Add pos_weight for binary_crossentropy_logits
* Remove debug code
* Code style
* Code style
* Rename
2025-06-24 09:42:37 -04:00
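pos_weight scales only the positive-class term, matching torch's binary_cross_entropy_with_logits: loss = -(pos_weight · y · log σ(x) + (1-y) · log(1-σ(x))). A per-element pure-Python sketch (naive sigmoid, fine for moderate logits; not the actual implementation):

```python
import math

def bce_logits(x, y, pos_weight=1.0):
    # per-element binary cross-entropy on logit x against target y in [0, 1];
    # pos_weight rescales the positive (y) term only
    p = 1.0 / (1.0 + math.exp(-x))
    return -(pos_weight * y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
```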
chenyu
2d9c61e39e
test more dims in test_logsumexp and test_logcumsumexp ( #10907 )
...
refactoring squeeze and unsqueeze is easy to get wrong
2025-06-20 21:42:18 -04:00
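logsumexp is conventionally computed in the max-shifted form logsumexp(x) = m + log(Σ exp(x_i - m)) with m = max(x), so it stays finite for large inputs. A 1-D pure-Python sketch of that formula:

```python
import math

def logsumexp(xs):
    # stable log(sum(exp(x))): subtract the max before exponentiating
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))
```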
Nino Risteski
3771cc0f77
fix test logcumsumexp broken devectorize=0 ( #10880 )
...
* fix test logcumsumexp numerical
* lint
* Use dtypes.min instead of -1e4
2025-06-20 20:54:50 -04:00
George Hotz
a493eb396c
fix view add 0 ( #10840 )
2025-06-16 16:46:12 -07:00
chenyu
e5d5ae55f9
smaller inputs for test_sort and test_topk ( #10829 )
2025-06-16 00:21:15 -04:00
chenyu
7a6df0a161
remove .relu() call in several conv tests in test_ops ( #10807 )
...
* remove .relu() call in several conv tests in test_ops
testing negative parts doubles the effectiveness. kept the relu between two convs and in the tests that explicitly test relu
* relax tol
2025-06-13 17:10:16 -04:00
George Hotz
81b9c04574
move high level stuff to unit tests [pr] ( #10708 )
...
* move high level stuff to unit tests [pr]
* process replay on unit tests
* fix pr, less compute
* set omp num threads
* set 200MB buffer size limit
* delete junk
* fix tests
* faster
* move test_indexing to unit
* faster
2025-06-08 14:05:56 -07:00
George Hotz
8c76250d31
speed up a few tests ( #10692 )
2025-06-07 20:39:25 -07:00
ihar
74b849b5e1
remove unnecessary 'argfix' because 'view' is an alias to 'reshape'. all functionality must be inside 'reshape' ( #10677 )
...
* remove unnecessary 'argfix' because 'view' is an alias to 'reshape'. all functionality must be inside 'reshape'
* added the same set of unit tests for 'view' as for 'reshape' since 'view' is just an alias for 'reshape'
* improved tests for 'view' op
2025-06-07 22:15:31 -04:00
chenyu
ff1aad7b69
fix const float pow to int tensor ( #10655 )
...
was incorrectly cast to int
2025-06-05 19:15:12 -04:00
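The bug class here: a float constant raised to an int tensor was evaluated after casting the float to int, so e.g. 0.5 ** x behaved like 0 ** x. A plain-Python illustration of the correct promotion next to a sketch of the pre-fix behavior (names hypothetical):

```python
def float_pow_int(base, exps):
    # correct: keep the float base; float ** int promotes to float
    return [base ** e for e in exps]

def buggy_pow(base, exps):
    # sketch of the pre-fix behavior: the float const truncated to int first
    return [int(base) ** e for e in exps]
```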
geohotstan
602a145f8f
Add Tensor.unfold ( #10518 )
...
* yoinked 10272
* eitanturok's fixes
* hmmm should size be sint?
* add test
2025-05-26 11:15:44 -04:00
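Tensor.unfold follows torch's unfold(dim, size, step): sliding windows of length size taken every step elements along a dimension. A 1-D pure-Python sketch of the windowing (illustrative only):

```python
def unfold1d(xs, size, step):
    # every window of `size` consecutive elements, starting every `step` elements
    return [xs[i:i + size] for i in range(0, len(xs) - size + 1, step)]
```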
chenyu
7bfb20757c
fix tensor int floor div ( #10327 )
...
* fix tensor int floor div
* test_float_floordiv_scalar
2025-05-21 06:46:54 -04:00
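Floor division must round toward negative infinity (Python/torch semantics), while C-style integer division truncates toward zero; the two differ exactly when the operands have opposite signs and a nonzero remainder, which is the edge such fixes target. The contrast in plain Python:

```python
def floordiv(a, b):
    # Python's // already floors toward -inf
    return a // b

def truncdiv(a, b):
    # C-style division truncates toward zero instead
    return int(a / b)
```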
chenyu
145e51247a
split CAST and BITCAST in PYTHON [pr] ( #10123 )
...
CAST only needs truncate and does not require dtype fmt. added bfloat16 tests that can run locally
2025-04-30 23:27:35 -04:00
George Hotz
11113c9d07
reduce_unparented ( #10056 )
2025-04-26 09:48:16 -04:00
George Hotz
78caf55154
Revert "FP8 support on NVIDIA ( #8631 )"
...
This reverts commit 2c8e4ea865.
2025-04-09 12:27:41 +08:00
George Hotz
d1505137ad
Revert "move TestOpsFp8s skipTest ( #9797 )"
...
This reverts commit a3aaf92b21.
2025-04-09 12:27:40 +08:00
chenyu
a3aaf92b21
move TestOpsFp8s skipTest ( #9797 )
...
so get_available_devices is not called when running other tests
2025-04-08 22:44:07 -04:00
pkotzbach
2c8e4ea865
FP8 support on NVIDIA ( #8631 )
...
* squashed fp8 commits
* tensorcore start
* minor changes
* pre-commit
* pylint
* Delete fp8mul.cu
* clean
* small bugfix
* fix test_dtype
* fix test_dtype_alu
* add EMULATE_CUDA_SM89
* fix ci
* fix test_linearizer
* fix test_linearizer
* fix swizzle
* add debug to simple_matmul
* fixed swizzle
* python emulator
* refactor python emulator
* setup fix
* numpy setup
* ml_dtypes only in emulate_cuda_sm89
* fix pylint
* fix tests
* fix mypy
* fix mypy
* fix ruff
* done python emulator
* add acc type
* tests
* mypy
* clean code
* add cuda tensor core tests to CI
* minor fix
* clean test_dtype.py
* clean cstyle.py
* clean test_ops.py
* fix test
* fix test
* whitespaces
* pylint
* pylint
* amd?
* amd?
* amd
* reduce lines
* mockgpu remove
* fix
* ruff
* ruff
* fix mypy
* ruff
* test only for cuda
* fixed formatting
* small fixes
* small fix
* least_upper_dtype if fp8s not supported
* log and reciprocal are supported for fp8s
* ops python fixes
* dtypes.fp8s use
* e4m3 + e5m2 result dtype test
* truncate linter fix
---------
Co-authored-by: pkotzbach <pawkotz@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-08 21:54:04 -04:00
chenyu
3b8d923692
remove skip LLVM in test_div_int ( #9686 )
2025-04-02 04:15:00 -04:00
chenyu
0e34f9082e
helper functions for cstyle div mod [pr] ( #9673 )
2025-04-01 08:06:56 -04:00
Yvon Manzi
6652003839
Add cumprod to Tensor ( #9629 )
...
* probably how cumprod should look like
* update _cumalu to work with MUL
* shorter
* cumprod testing
* clean
* more cleanup
* add cumprod to torch backend.
* make it look like cumsum
* mypy fix
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:49:18 -04:00
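Per the bullets, cumprod generalizes the shared _cumalu helper from ADD to MUL; functionally it is just a running product. A pure-Python reference via itertools.accumulate (semantics only, not the tensor implementation):

```python
import itertools, operator

def cumprod(xs):
    # running product: out[i] = xs[0] * xs[1] * ... * xs[i]
    return list(itertools.accumulate(xs, operator.mul))
```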
b1tg
f90001e1a6
amd llvm render (no_comgr prereq) ( #9543 )
...
* amd llvm render
* skip test_div_rounding_mode
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-03-24 22:50:51 +08:00
geohotstan
309afa20b7
add Tensor.max_unpool2d ( #9518 )
...
* why does max_unpool2d feel slower than out.gradient ...
* slightly cleaner
* what happened to ruff
* need to think about this some more
* slightly faster now?
* clean up, 1 more failing edge case
* ok good
* working TINY_BACKEND
* nit doc wording
* retry CI
2025-03-22 12:11:33 -04:00
geohotstan
8c0d0a122c
Add return_indices to max_pool ( #9506 )
...
* wow argmax is so good
* 1 less line
* clean up and better variable names
* is this torch thing right...?
* add more tests
* slap a TODO on it
* clean ups
* prettier looking code and fix ceil mode test
* add return types and some docs
* ok that was a bad example since indices == value, just no example
2025-03-19 15:25:37 -04:00
chenyu
f8976dd2eb
enable more webgpu tests ( #9502 )
...
OSX has a larger buffer count limit, and it supports fp16 now
2025-03-18 23:03:54 -04:00
Anish Umale
5e58f4b65b
Tiny backend test_ops fix part 3 ( #9483 )
...
* extract straightforward things from https://github.com/tinygrad/tinygrad/pull/9302
* pass dtype and device for ones_like
2025-03-17 18:01:51 -04:00
TJ
9fcef4d009
add masked_select to tensor.py ( #9468 )
...
* add masked_select to tensor.py
* fix tests
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-03-17 16:05:36 -04:00
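masked_select keeps the elements where a boolean mask is true and returns a flat result, so the output length depends on the data. The semantics in plain Python (illustrative only):

```python
def masked_select(xs, mask):
    # flat list of elements of xs at positions where the mask is truthy
    return [x for x, m in zip(xs, mask) if m]
```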
geohotstan
53d6f1e1bb
Add bitonic cat sort ( #9422 )
...
* poc
* repeated values fail, sigh
* is this being timed out?
* fix up down names
* bitonic v2, does this run?
* bitonic v3, faster
* bitonic v3.1, faster
* bitonic v3.1.1, same speed unlucky
* support dim and indices
* bitonic v3.2, simpler code, TODO repeated indices
* bruv gimme green for once cmon
* cat (stack) implementation, slow but maybe one day when cat is fast meow
* revert to v3.2
* bitonic v4, who let the cats out edition
* clean up variable names
* figured out repeated indices :D
* ruff check --fix
* use sort for topk
* add Tensor.sort everywhere
* fix docs and add some types
* slightly better variable names
* am I doing torch inplace correctly?
* delegate sort to values_stable
* add a contig, faster first sort
* maybe don't test_inplace
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-17 12:01:23 -04:00
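A bitonic sorting network sorts a power-of-two-length sequence with a fixed, data-independent pattern of compare-and-swap stages, which is what makes it expressible as bulk tensor ops (the cat/stack "v4" above). A recursive pure-Python sketch of the classic network, not the PR's tensor formulation:

```python
def bitonic_sort(xs, ascending=True):
    # sorts a sequence whose length is a power of two
    n = len(xs)
    if n <= 1:
        return list(xs)
    half = n // 2
    # build a bitonic sequence: ascending first half, descending second half
    first = bitonic_sort(xs[:half], True)
    second = bitonic_sort(xs[half:], False)
    return bitonic_merge(first + second, ascending)

def bitonic_merge(xs, ascending):
    # merge a bitonic sequence into a fully sorted one
    n = len(xs)
    if n <= 1:
        return list(xs)
    half = n // 2
    xs = list(xs)
    for i in range(half):  # compare-and-swap across the two halves
        if (xs[i] > xs[i + half]) == ascending:
            xs[i], xs[i + half] = xs[i + half], xs[i]
    return bitonic_merge(xs[:half], ascending) + bitonic_merge(xs[half:], ascending)
```

Note the comparison pattern is fixed in advance, independent of the data, so every stage can run as one vectorized min/max over the whole array.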
George Hotz
e174c6c3bc
new devectorizer ( #9331 )
...
* new devectorizer
* lidx
* test linearizer passes
* fix images
* fix unfoldable image load
* delete unused
* improve fix_unfoldable_image_load
* working for image
* fixup types
* fixup transcendental
* cast_vec
* cleaner transcendental
* skip failing test
* err, flip that
* not devec
* sqrt
2025-03-11 18:47:56 +08:00
chenyu
01e8b60911
acc_dtype -> dtype ( #9402 )
...
matched numpy and torch
2025-03-10 16:05:30 -04:00