chenyu
4fe19eec72
Ops.TRUNC ( #11659 )
2025-08-13 18:40:48 -04:00
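The `Ops.TRUNC` op above is a rounding primitive; a minimal pure-Python sketch of truncation semantics (rounding toward zero, as opposed to floor's rounding toward negative infinity) — illustrative only, not tinygrad's implementation:

```python
import math

# Truncation rounds toward zero; floor rounds toward -inf.
# They agree for non-negative inputs and differ for negatives.
def trunc(x: float) -> float:
    return float(math.trunc(x))

print(trunc(2.7))   # 2.0
print(trunc(-2.7))  # -2.0 (floor would give -3.0)
```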
chenyu
0c97d6de1b
don't round pow output for int pow int ( #11625 )
...
also added atol=0 and big pows for the tests
2025-08-11 20:57:47 -04:00
chenyu
d623f6d850
support int Tensor pow to const non-negative int ( #11624 )
...
matches torch
2025-08-11 19:50:19 -04:00
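The two pow commits above hinge on integer pow being exact: an int base raised to a const non-negative int exponent can be computed by repeated squaring with no float rounding step. A sketch of that semantics (hypothetical helper, not tinygrad's kernel):

```python
# Exponentiation by squaring: exact integer result, so no rounding of the
# output is needed — the behavior the "don't round pow output" fix preserves.
def int_pow(base: int, exp: int) -> int:
    assert exp >= 0, "only non-negative exponents"
    result = 1
    while exp:
        if exp & 1:
            result *= base
        base *= base
        exp >>= 1
    return result

print(int_pow(3, 13))  # 1594323, exact even where a float round-trip loses precision
```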
chenyu
a67e0917c3
list indexing can normalize in python ( #11609 )
...
* list indexing can normalize in python
list index does not need to be normalized in tensor
* update those
2025-08-10 20:02:38 -04:00
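Normalizing a list index "in python" means resolving negatives before the tensor machinery ever sees them. A hypothetical sketch of that normalization (the helper name is illustrative, not from the codebase):

```python
# Map a possibly-negative Python index into [0, size), raising on out-of-range,
# so downstream tensor indexing only deals with non-negative indices.
def normalize_index(i: int, size: int) -> int:
    if not -size <= i < size:
        raise IndexError(f"index {i} out of range for size {size}")
    return i + size if i < 0 else i

print(normalize_index(-1, 4))  # 3
```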
chenyu
1181ec0cd2
few more tensor indexing test cases ( #11608 )
2025-08-10 18:56:42 -04:00
chenyu
dfb702ef33
fix sort for small dim ( #11601 )
...
* fix sort for small dim
* fixed test_sort_empty
2025-08-10 01:17:41 -04:00
chenyu
aa1a6f2132
support threshold in Tensor.softplus ( #11564 )
...
fix gradient for large input
2025-08-07 13:43:18 -04:00
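The threshold in softplus is a numerical-stability device: above it, softplus(x) is indistinguishable from x in floating point, and short-circuiting there keeps both the value and the gradient finite for large inputs. A scalar sketch of the formula, with the threshold default of 20 assumed from torch's convention:

```python
import math

# softplus(x) = log(1 + exp(x)); for x well above 0 this equals x to machine
# precision, and computing exp(x) directly would overflow for very large x.
def softplus(x: float, threshold: float = 20.0) -> float:
    return x if x > threshold else math.log1p(math.exp(x))

print(softplus(0.0))     # log(2)
print(softplus(1000.0))  # 1000.0 — naive exp(1000.0) would overflow
```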
chenyu
dbc7807c61
enable WEBGPU tests with buffer limit ( #11489 )
...
TestSample still fails?
2025-08-03 13:02:44 -07:00
chenyu
2d7c28de6a
clean up dup lambdas in helper_test_exception ( #11325 )
2025-07-22 12:21:57 -04:00
chenyu
fb42c84365
merge TestRollEdgeCases into test_ops ( #11321 )
2025-07-22 10:55:57 -04:00
chenyu
1d8b3e9d1c
movementop only Tensor.roll ( #11317 )
...
* movementop only Tensor.roll
* fixed
2025-07-22 10:34:15 -04:00
chenyu
6e9506e6fd
Tensor.roll supports dims=None ( #11313 )
2025-07-21 17:29:23 -04:00
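With dims=None, roll follows the torch convention: flatten, rotate the flat sequence by the shift, restore the shape. A flat-list illustration of that semantics (not the Tensor implementation):

```python
# Rotate a flat sequence right by `shift`, wrapping around — the dims=None case
# of roll, shown on a plain list.
def roll_flat(xs: list, shift: int) -> list:
    n = len(xs)
    shift %= n
    return xs[-shift:] + xs[:-shift] if shift else list(xs)

print(roll_flat([1, 2, 3, 4, 5], 2))  # [4, 5, 1, 2, 3]
```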
chenyu
d3a93185a6
clean up test_roll ( #11312 )
2025-07-21 16:00:50 -04:00
chenyu
341a686799
Tensor.diagonal ( #11122 )
...
only implemented main diagonal for 2-D tensors. with diagonal and qr, we can get determinant
2025-07-07 16:21:26 -04:00
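The implemented case — main diagonal of a 2-D tensor — has simple semantics, sketched here in pure Python:

```python
# Main diagonal of a (possibly non-square) 2-D matrix: elements m[i][i] up to
# the smaller dimension. Illustration of the semantics only.
def diagonal(m: list[list[float]]) -> list[float]:
    return [m[i][i] for i in range(min(len(m), len(m[0])))]

print(diagonal([[1, 2], [3, 4], [5, 6]]))  # [1, 4]
```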
Nino Risteski
a1a146a499
adding enable_gqa in SDPA ( #11097 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-06 23:25:33 -07:00
chenyu
7468959f4b
Tensor.argsort ( #11112 )
2025-07-06 13:56:35 -04:00
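argsort returns the permutation of indices that would sort the values. A one-line semantic sketch using Python's `sorted`, not tinygrad's device implementation:

```python
# Indices that sort xs ascending: position of the smallest value first.
def argsort(xs: list) -> list[int]:
    return sorted(range(len(xs)), key=xs.__getitem__)

print(argsort([30, 10, 20]))  # [1, 2, 0]
```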
kevvz
b7af9cf849
clean svd tests, set full_matrices false in torch backend ( #11113 )
...
* clean tests, set full_matrices false
* add more shape asserts
2025-07-06 13:55:49 -04:00
chenyu
ba88ec3ad0
pipe linalg svd to torch ( #11109 )
...
and found a bug in svd
2025-07-06 08:37:25 -04:00
chenyu
845a4d32bc
Tensor.diag ( #11108 )
...
also updated Tensor.eye to use it
2025-07-05 23:03:02 -04:00
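The eye-via-diag refactor falls out of the semantics: diag places a vector on the main diagonal of an otherwise-zero square matrix, so eye(n) is just diag of a ones vector. A pure-Python sketch:

```python
# diag: vector -> square matrix with v on the main diagonal, zeros elsewhere.
def diag(v: list[float]) -> list[list[float]]:
    n = len(v)
    return [[v[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

# eye expressed through diag, mirroring the refactor the commit describes.
def eye(n: int) -> list[list[float]]:
    return diag([1.0] * n)

print(eye(2))  # [[1.0, 0.0], [0.0, 1.0]]
```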
ttomsa
4905af4ae0
remove invalid int div test ( #11106 )
...
* rm test
* also rm this
2025-07-05 18:57:55 -04:00
chenyu
a2f5a54458
move sparse_categorical_crossentropy to test_ops ( #11083 )
...
also flattened the tests
2025-07-03 21:40:54 -04:00
chenyu
678cabc6f2
use argfix in Tensor.stack ( #11077 )
...
works for multiple Tensor args or a single tuple/list of Tensors, but not a mix of the two
2025-07-03 12:15:11 -04:00
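The argfix convention the commit relies on can be sketched as follows — accept either varargs or a single tuple/list, and reject a mix of the two (hypothetical simplified version, not tinygrad's actual argfix):

```python
# Normalize call styles f(a, b, c) and f([a, b, c]) to one tuple; a mixed call
# like f([a, b], c) is ambiguous and rejected.
def argfix(*args):
    if len(args) == 1 and isinstance(args[0], (tuple, list)):
        return tuple(args[0])
    assert not any(isinstance(a, (tuple, list)) for a in args), "mixed args not supported"
    return args

print(argfix(1, 2, 3))    # (1, 2, 3)
print(argfix([1, 2, 3]))  # (1, 2, 3)
```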
Ahmed Harmouche
e992ed10dc
WebGPU on Windows ( #10890 )
...
* WebGPU on Windows
* Fix dawn-python install
* New test
* pydeps
* Minor fix
* Only install dawn-python on windows webgpu
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-02 08:38:45 -07:00
chenyu
126fcf4129
clean up AMD_LLVM in tests ( #11021 )
2025-06-28 22:45:47 -04:00
chenyu
49bba2f0a0
improve test_nll_loss ( #10986 )
...
build target and weight tensors outside so it tests backward too.
2025-06-26 02:46:55 -04:00
chenyu
0612acfc70
improve Tensor.cross_entropy ( #10985 )
...
separate when Y is prob vs indices and check shapes for indices. also fix higher dim cases
2025-06-26 01:39:48 -04:00
chenyu
18e264a449
Tensor.logsigmoid ( #10955 )
2025-06-24 11:16:14 -04:00
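logsigmoid has a standard numerically stable form, log(sigmoid(x)) = min(x, 0) - log1p(exp(-|x|)), which avoids overflow at both extremes. A scalar sketch of that formula (not the Tensor method's actual lowering):

```python
import math

# Stable log-sigmoid: never exponentiates a large positive argument.
def logsigmoid(x: float) -> float:
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))

print(logsigmoid(0.0))      # -log(2)
print(logsigmoid(-1000.0))  # -1000.0, no overflow
```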
chenyu
35504c938e
torch.clip(x,y) -> x.clip(y) in test_ops ( #10954 )
...
* torch.clip(x,y) -> x.clip(y) in test_ops
* test_binary_crossentropy_logits_pos_weights
2025-06-24 10:22:19 -04:00
Fang-Pen Lin
86d458533f
Add pos_weight for binary_crossentropy_logits ( #10855 )
...
* Add pos_weight for binary_crossentropy_logits
* Remove debug code
* Code style
* Code style
* Rename
2025-06-24 09:42:37 -04:00
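Assuming torch's pos_weight convention for BCE-with-logits (the positive term scaled by pos_weight), the per-element loss is -(pos_weight * y * log σ(x) + (1 - y) * log(1 - σ(x))). A scalar sketch of that formula; the naive sigmoid here is for clarity, not numerical robustness:

```python
import math

# Weighted binary cross-entropy on a logit x against target y in {0, 1}.
def bce_logits(x: float, y: float, pos_weight: float = 1.0) -> float:
    p = 1.0 / (1.0 + math.exp(-x))  # sigmoid
    return -(pos_weight * y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

print(bce_logits(0.0, 1.0, pos_weight=2.0))  # 2*log(2)
```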
chenyu
2d9c61e39e
test more dims in test_logsumexp and test_logcumsumexp ( #10907 )
...
refactoring squeeze and unsqueeze is easy to get wrong
2025-06-20 21:42:18 -04:00
Nino Risteski
3771cc0f77
fix test logcumsumexp broken with devectorize=0 ( #10880 )
...
* fix test logcumsumexp numerical
* lint
* Use dtypes.min instead of -1e4
2025-06-20 20:54:50 -04:00
George Hotz
a493eb396c
fix view add 0 ( #10840 )
2025-06-16 16:46:12 -07:00
chenyu
e5d5ae55f9
smaller inputs for test_sort and test_topk ( #10829 )
2025-06-16 00:21:15 -04:00
chenyu
7a6df0a161
remove .relu() call in several conv tests in test_ops ( #10807 )
...
* remove .relu() call in several conv tests in test_ops
testing negative parts doubles the effectiveness. keep the relu between two convs and in the tests that explicitly test relu
* relax tol
2025-06-13 17:10:16 -04:00
George Hotz
81b9c04574
move high level stuff to unit tests [pr] ( #10708 )
...
* move high level stuff to unit tests [pr]
* process replay on unit tests
* fix pr, less compute
* set omp num threads
* set 200MB buffer size limit
* delete junk
* fix tests
* faster
* move test_indexing to unit
* faster
2025-06-08 14:05:56 -07:00
George Hotz
8c76250d31
speed up a few tests ( #10692 )
2025-06-07 20:39:25 -07:00
ihar
74b849b5e1
remove unnecessary 'argfix' because 'view' is an alias for 'reshape'. all functionality must be inside 'reshape' ( #10677 )
...
* remove unnecessary 'argfix' because 'view' is an alias for 'reshape'. all functionality must be inside 'reshape'
* added the same set of unit tests for 'view' as for 'reshape' since 'view' is just an alias for 'reshape'
* improved tests for 'view' op
2025-06-07 22:15:31 -04:00
chenyu
ff1aad7b69
fix const float pow to int tensor ( #10655 )
...
was incorrectly cast to int
2025-06-05 19:15:12 -04:00
geohotstan
602a145f8f
Add Tensor.unfold ( #10518 )
...
* yoinked 10272
* eitanturok's fixes
* hmmm should size be sint?
* add test
2025-05-26 11:15:44 -04:00
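unfold slides a window of a given size over a dimension at a given step and yields the windows. A flat-list illustration of the semantics (parameter names `size` and `step` assumed from the torch-style convention):

```python
# All windows of length `size` starting every `step` elements.
def unfold(xs: list, size: int, step: int) -> list[list]:
    return [xs[i:i + size] for i in range(0, len(xs) - size + 1, step)]

print(unfold([1, 2, 3, 4, 5], size=2, step=1))  # [[1, 2], [2, 3], [3, 4], [4, 5]]
```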
chenyu
7bfb20757c
fix tensor int floor div ( #10327 )
...
* fix tensor int floor div
* test_float_floordiv_scalar
2025-05-21 06:46:54 -04:00
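The subtlety behind int floor-div fixes: floor division rounds toward negative infinity (the Python/torch convention), while C-style integer division truncates toward zero, and the two disagree exactly when the operands have opposite signs and a nonzero remainder. A semantics sketch:

```python
# Floor division built from a truncated quotient: correct the truncation down by
# one when signs differ and the division is inexact.
def floordiv(a: int, b: int) -> int:
    q = int(a / b)  # truncates toward zero
    return q - 1 if (a % b != 0) and ((a < 0) != (b < 0)) else q

print(floordiv(-7, 2))  # -4 (truncation alone would give -3)
```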
chenyu
145e51247a
split CAST and BITCAST in PYTHON [pr] ( #10123 )
...
CAST only needs truncate and does not require dtype fmt. added bfloat16 tests that can run locally
2025-04-30 23:27:35 -04:00
George Hotz
11113c9d07
reduce_unparented ( #10056 )
2025-04-26 09:48:16 -04:00
George Hotz
78caf55154
Revert "FP8 support on NVIDIA ( #8631 )"
...
This reverts commit 2c8e4ea865.
2025-04-09 12:27:41 +08:00
George Hotz
d1505137ad
Revert "move TestOpsFp8s skipTest ( #9797 )"
...
This reverts commit a3aaf92b21.
2025-04-09 12:27:40 +08:00
chenyu
a3aaf92b21
move TestOpsFp8s skipTest ( #9797 )
...
so get_available_devices is not called when running other tests
2025-04-08 22:44:07 -04:00
pkotzbach
2c8e4ea865
FP8 support on NVIDIA ( #8631 )
...
* squashed fp8 commits
* tensorcore start
* minor changes
* pre-commit
* pylint
* Delete fp8mul.cu
* clean
* small bugfix
* fix test_dtype
* fix test_dtype_alu
* add EMULATE_CUDA_SM89
* fix ci
* fix test_linearizer
* fix test_linearizer
* fix swizzle
* add debug to simple_matmul
* fixed swizzle
* python emulator
* refactor python emulator
* setup fix
* numpy setup
* ml_dtypes only in emulate_cuda_sm89
* fix pylint
* fix tests
* fix mypy
* fix mypy
* fix ruff
* done python emulator
* add acc type
* tests
* mypy
* clean code
* add cuda tensor core tests to CI
* minor fix
* clean test_dtype.py
* clean cstyle.py
* clean test_ops.py
* fix test
* fix test
* whitespaces
* pylint
* pylint
* amd?
* amd?
* amd
* reduce lines
* mockgpu remove
* fix
* ruff
* ruff
* fix mypy
* ruff
* test only for cuda
* fixed formatting
* small fixes
* small fix
* least_upper_dtype if fp8s not supported
* log and reciprocal are supported for fp8s
* ops python fixes
* dtypes.fp8s use
* e4m3 + e5m2 result dtype test
* truncate linter fix
---------
Co-authored-by: pkotzbach <pawkotz@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-08 21:54:04 -04:00
chenyu
3b8d923692
remove skip LLVM in test_div_int ( #9686 )
2025-04-02 04:15:00 -04:00
chenyu
0e34f9082e
helper functions for cstyle div mod [pr] ( #9673 )
2025-04-01 08:06:56 -04:00
Yvon Manzi
6652003839
Add cumprod to Tensor ( #9629 )
...
* probably how cumprod should look
* update _cumalu to work with MUL
* shorter
* cumprod testing
* clean
* more cleanup
* add cumprod to torch backend.
* make it look like cumsum
* mypy fix
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:49:18 -04:00
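The "_cumalu with MUL" reuse described above reflects that cumprod is just cumsum with multiplication as the accumulator. The same idea in stdlib Python:

```python
import itertools
import operator

# Running product: each output element is the product of the prefix ending there.
xs = [1, 2, 3, 4]
print(list(itertools.accumulate(xs, operator.mul)))  # [1, 2, 6, 24]
```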
b1tg
f90001e1a6
amd llvm render (no_comgr prereq) ( #9543 )
...
* amd llvm render
* skip test_div_rounding_mode
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-03-24 22:50:51 +08:00