chenyu
2d7c28de6a
clean up dup lambdas in helper_test_exception ( #11325 )
2025-07-22 12:21:57 -04:00
chenyu
fb42c84365
merge TestRollEdgeCases into test_ops ( #11321 )
2025-07-22 10:55:57 -04:00
chenyu
1d8b3e9d1c
movementop only Tensor.roll ( #11317 )
...
* movementop only Tensor.roll
* fixed
2025-07-22 10:34:15 -04:00
chenyu
6e9506e6fd
Tensor.roll supports dims=None ( #11313 )
2025-07-21 17:29:23 -04:00
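With dims=None, roll follows the torch convention: flatten, roll the flat sequence, then reshape back. A minimal pure-Python sketch of that flat-roll semantics (function name is illustrative, not tinygrad code):

```python
def roll_flat(xs, shift):
    # torch-style roll of a flat sequence: elements pushed off the end
    # wrap around to the front; negative shift rolls the other way
    n = len(xs)
    if n == 0:
        return list(xs)
    shift %= n
    if shift == 0:
        return list(xs)
    return list(xs[-shift:]) + list(xs[:-shift])
```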
chenyu
d3a93185a6
clean up test_roll ( #11312 )
2025-07-21 16:00:50 -04:00
chenyu
341a686799
Tensor.diagonal ( #11122 )
...
only the main diagonal of 2-D tensors is implemented. with diagonal and qr, we can compute the determinant
2025-07-07 16:21:26 -04:00
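The determinant remark rests on a standard identity: for square A = QR with Q orthogonal, |det(A)| = |prod(diag(R))|. A pure-Python sketch using classical Gram-Schmidt (assumes a small, well-conditioned, full-rank matrix; not tinygrad's qr):

```python
def qr_abs_det(a):
    # |det(A)| via QR: factor A = Q R by Gram-Schmidt on the columns,
    # then take the product of the diagonal of R (here each ||v||, so >= 0)
    n = len(a)
    cols = [[a[i][j] for i in range(n)] for j in range(n)]  # column vectors
    q, det = [], 1.0
    for j in range(n):
        v = cols[j][:]
        for u in q:  # subtract projections onto earlier orthonormal columns
            c = sum(x * y for x, y in zip(u, cols[j]))
            v = [vi - c * ui for vi, ui in zip(v, u)]
        norm = sum(x * x for x in v) ** 0.5  # R[j][j]
        det *= norm
        q.append([x / norm for x in v])
    return det
```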
Nino Risteski
a1a146a499
adding enable_gqa in SDPA ( #11097 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-06 23:25:33 -07:00
chenyu
7468959f4b
Tensor.argsort ( #11112 )
2025-07-06 13:56:35 -04:00
kevvz
b7af9cf849
clean svd tests, set full_matrices false in torch backend ( #11113 )
...
* clean tests, set full_matrices false
* add more shape asserts
2025-07-06 13:55:49 -04:00
chenyu
ba88ec3ad0
pipe linalg svd to torch ( #11109 )
...
and found a bug in svd
2025-07-06 08:37:25 -04:00
chenyu
845a4d32bc
Tensor.diag ( #11108 )
...
also updated Tensor.eye to use it
2025-07-05 23:03:02 -04:00
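Rewriting eye on top of diag uses the identity eye(n) == diag([1]*n). A pure-Python sketch of that relationship (list-of-lists stand-ins, not Tensor code):

```python
def diag(vals):
    # square matrix with `vals` on the main diagonal, zeros elsewhere
    n = len(vals)
    return [[vals[i] if i == j else 0 for j in range(n)] for i in range(n)]

def eye(n):
    # identity matrix expressed through diag, mirroring the Tensor.eye update
    return diag([1] * n)
```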
ttomsa
4905af4ae0
remove invalid int div test ( #11106 )
...
* rm test
* also rm this
2025-07-05 18:57:55 -04:00
chenyu
a2f5a54458
move sparse_categorical_crossentropy to test_ops ( #11083 )
...
also flattened the tests
2025-07-03 21:40:54 -04:00
chenyu
678cabc6f2
use argfix in Tensor.stack ( #11077 )
...
works for multiple Tensor args or a single tuple/list of Tensors, but not a mix of the two
2025-07-03 12:15:11 -04:00
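argfix normalizes varargs so a function accepts either f(a, b, c) or f((a, b, c)) / f([a, b, c]). A plain-Python sketch of the idea, including the rejected mixed form the body mentions (illustrative, not tinygrad's helper verbatim):

```python
def argfix(*args):
    # accept either plain varargs or a single tuple/list; always return a tuple
    if args and isinstance(args[0], (tuple, list)):
        if len(args) != 1:
            raise ValueError("cannot mix a tuple/list with extra positional args")
        return tuple(args[0])
    return args
```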
Ahmed Harmouche
e992ed10dc
WebGPU on Windows ( #10890 )
...
* WebGPU on Windows
* Fix dawn-python install
* New test
* pydeps
* Minor fix
* Only install dawn-python on windows webgpu
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-02 08:38:45 -07:00
chenyu
126fcf4129
clean up AMD_LLVM in tests ( #11021 )
2025-06-28 22:45:47 -04:00
chenyu
49bba2f0a0
improve test_nll_loss ( #10986 )
...
build target and weight tensors outside so it tests backward too.
2025-06-26 02:46:55 -04:00
chenyu
0612acfc70
improve Tensor.cross_entropy ( #10985 )
...
separate the cases where Y is probabilities vs class indices, and check shapes for indices. also fix higher-dim cases
2025-06-26 01:39:48 -04:00
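The distinction the commit draws: cross-entropy targets can be class indices (one int per sample) or full probability distributions, and the two need different handling. A single-sample pure-Python sketch of both paths (not tinygrad's implementation):

```python
import math

def cross_entropy(pred_probs, y):
    # pred_probs: predicted per-class probabilities for one sample
    # y: either an int class index or a probability distribution over classes
    if isinstance(y, int):  # index target: pick out one log-probability
        return -math.log(pred_probs[y])
    # distribution target: expected negative log-likelihood
    return -sum(t * math.log(p) for t, p in zip(y, pred_probs) if t > 0)
```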
chenyu
18e264a449
Tensor.logsigmoid ( #10955 )
2025-06-24 11:16:14 -04:00
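logsigmoid(x) = log(1/(1+exp(-x))) overflows if evaluated naively for very negative x; the standard numerically stable form splits on the sign of x. A sketch of that math in plain Python (illustrative of the formula, not the tinygrad kernel):

```python
import math

def logsigmoid(x):
    # stable log(sigmoid(x)): never exponentiates a large positive number
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))
```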
chenyu
35504c938e
torch.clip(x,y) -> x.clip(y) in test_ops ( #10954 )
...
* torch.clip(x,y) -> x.clip(y) in test_ops
* test_binary_crossentropy_logits_pos_weights
2025-06-24 10:22:19 -04:00
Fang-Pen Lin
86d458533f
Add pos_weight for binary_crossentropy_logits ( #10855 )
...
* Add pos_weight for binary_crossentropy_logits
* Remove debug code
* Code style
* Code style
* Rename
2025-06-24 09:42:37 -04:00
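pos_weight scales only the positive-class term, matching torch's binary_cross_entropy_with_logits: loss = -(pos_weight · y · log σ(x) + (1-y) · log(1-σ(x))). A per-element pure-Python sketch (naive sigmoid, fine for moderate logits; not the actual implementation):

```python
import math

def bce_logits(x, y, pos_weight=1.0):
    # per-element binary cross-entropy on logit x against target y in [0, 1];
    # pos_weight rescales the positive (y) term only
    p = 1.0 / (1.0 + math.exp(-x))
    return -(pos_weight * y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
```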
chenyu
2d9c61e39e
test more dims in test_logsumexp and test_logcumsumexp ( #10907 )
...
refactoring squeeze and unsqueeze is easy to get wrong
2025-06-20 21:42:18 -04:00
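logsumexp is conventionally computed in the max-shifted form logsumexp(x) = m + log(Σ exp(x_i - m)) with m = max(x), so it stays finite for large inputs. A 1-D pure-Python sketch of that formula:

```python
import math

def logsumexp(xs):
    # stable log(sum(exp(x))): subtract the max before exponentiating
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))
```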
Nino Risteski
3771cc0f77
fix test logcumsumexp broken devectorize=0 ( #10880 )
...
* fix test logcumsumexp numerical
* lint
* Use dtypes.min instead of -1e4
2025-06-20 20:54:50 -04:00
George Hotz
a493eb396c
fix view add 0 ( #10840 )
2025-06-16 16:46:12 -07:00
chenyu
e5d5ae55f9
smaller inputs for test_sort and test_topk ( #10829 )
2025-06-16 00:21:15 -04:00
chenyu
7a6df0a161
remove .relu() call in several conv tests in test_ops ( #10807 )
...
* remove .relu() call in several conv tests in test_ops
testing negative parts doubles the effectiveness. kept the relu between two convs and in the tests that explicitly test relu
* relax tol
2025-06-13 17:10:16 -04:00
George Hotz
81b9c04574
move high level stuff to unit tests [pr] ( #10708 )
...
* move high level stuff to unit tests [pr]
* process replay on unit tests
* fix pr, less compute
* set omp num threads
* set 200MB buffer size limit
* delete junk
* fix tests
* faster
* move test_indexing to unit
* faster
2025-06-08 14:05:56 -07:00
George Hotz
8c76250d31
speed up a few tests ( #10692 )
2025-06-07 20:39:25 -07:00
ihar
74b849b5e1
remove unnecessary 'argfix' because 'view' is an alias to 'reshape'. all functionality must be inside 'reshape' ( #10677 )
...
* remove unnecessary 'argfix' because 'view' is an alias to 'reshape'. all functionality must be inside 'reshape'
* added the same set of unit tests for 'view' as for 'reshape' since 'view' is just an alias for 'reshape'
* improved tests for 'view' op
2025-06-07 22:15:31 -04:00
chenyu
ff1aad7b69
fix const float pow to int tensor ( #10655 )
...
was incorrectly cast to int
2025-06-05 19:15:12 -04:00
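The bug class here: a float constant raised to an int tensor was evaluated after casting the float to int, so e.g. 0.5 ** x behaved like 0 ** x. A plain-Python illustration of the correct promotion next to a sketch of the pre-fix behavior (names hypothetical):

```python
def float_pow_int(base, exps):
    # correct: keep the float base; float ** int promotes to float
    return [base ** e for e in exps]

def buggy_pow(base, exps):
    # sketch of the pre-fix behavior: the float const truncated to int first
    return [int(base) ** e for e in exps]
```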
geohotstan
602a145f8f
Add Tensor.unfold ( #10518 )
...
* yoinked 10272
* eitanturok's fixes
* hmmm should size be sint?
* add test
2025-05-26 11:15:44 -04:00
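Tensor.unfold follows torch's unfold(dim, size, step): sliding windows of length size taken every step elements along a dimension. A 1-D pure-Python sketch of the windowing (illustrative only):

```python
def unfold1d(xs, size, step):
    # every window of `size` consecutive elements, starting every `step` elements
    return [xs[i:i + size] for i in range(0, len(xs) - size + 1, step)]
```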
chenyu
7bfb20757c
fix tensor int floor div ( #10327 )
...
* fix tensor int floor div
* test_float_floordiv_scalar
2025-05-21 06:46:54 -04:00
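Floor division must round toward negative infinity (Python/torch semantics), while C-style integer division truncates toward zero; the two differ exactly when the operands have opposite signs and a nonzero remainder, which is the edge such fixes target. The contrast in plain Python:

```python
def floordiv(a, b):
    # Python's // already floors toward -inf
    return a // b

def truncdiv(a, b):
    # C-style division truncates toward zero instead
    return int(a / b)
```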
chenyu
145e51247a
split CAST and BITCAST in PYTHON [pr] ( #10123 )
...
CAST only needs truncate and does not require dtype fmt. added bfloat16 tests that can run locally
2025-04-30 23:27:35 -04:00
George Hotz
11113c9d07
reduce_unparented ( #10056 )
2025-04-26 09:48:16 -04:00
George Hotz
78caf55154
Revert "FP8 support on NVIDIA ( #8631 )"
...
This reverts commit 2c8e4ea865.
2025-04-09 12:27:41 +08:00
George Hotz
d1505137ad
Revert "move TestOpsFp8s skipTest ( #9797 )"
...
This reverts commit a3aaf92b21.
2025-04-09 12:27:40 +08:00
chenyu
a3aaf92b21
move TestOpsFp8s skipTest ( #9797 )
...
so get_available_devices is not called when running other tests
2025-04-08 22:44:07 -04:00
pkotzbach
2c8e4ea865
FP8 support on NVIDIA ( #8631 )
...
* squashed fp8 commits
* tensorcore start
* minor changes
* pre-commit
* pylint
* Delete fp8mul.cu
* clean
* small bugfix
* fix test_dtype
* fix test_dtype_alu
* add EMULATE_CUDA_SM89
* fix ci
* fix test_linearizer
* fix test_linearizer
* fix swizzle
* add debug to simple_matmul
* fixed swizzle
* python emulator
* refactor python emulator
* setup fix
* numpy setup
* ml_dtypes only in emulate_cuda_sm89
* fix pylint
* fix tests
* fix mypy
* fix mypy
* fix ruff
* done python emulator
* add acc type
* tests
* mypy
* clean code
* add cuda tensor core tests to CI
* minor fix
* clean test_dtype.py
* clean cstyle.py
* clean test_ops.py
* fix test
* fix test
* whitespaces
* pylint
* pylint
* amd?
* amd?
* amd
* reduce lines
* mockgpu remove
* fix
* ruff
* ruff
* fix mypy
* ruff
* test only for cuda
* fixed formatting
* small fixes
* small fix
* least_upper_dtype if fp8s not supported
* log and reciprocal are supported for fp8s
* ops python fixes
* dtypes.fp8s use
* e4m3 + e5m2 result dtype test
* truncate linter fix
---------
Co-authored-by: pkotzbach <pawkotz@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-08 21:54:04 -04:00
chenyu
3b8d923692
remove skip LLVM in test_div_int ( #9686 )
2025-04-02 04:15:00 -04:00
chenyu
0e34f9082e
helper functions for cstyle div mod [pr] ( #9673 )
2025-04-01 08:06:56 -04:00
Yvon Manzi
6652003839
Add cumprod to Tensor ( #9629 )
...
* probably how cumprod should look like
* update _cumalu to work with MUL
* shorter
* cumprod testing
* clean
* more cleanup
* add cumprod to torch backend.
* make it look like cumsum
* mypy fix
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:49:18 -04:00
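Per the bullets, cumprod generalizes the shared _cumalu helper from ADD to MUL; functionally it is just a running product. A pure-Python reference via itertools.accumulate (semantics only, not the tensor implementation):

```python
import itertools, operator

def cumprod(xs):
    # running product: out[i] = xs[0] * xs[1] * ... * xs[i]
    return list(itertools.accumulate(xs, operator.mul))
```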
b1tg
f90001e1a6
amd llvm render (no_comgr prereq) ( #9543 )
...
* amd llvm render
* skip test_div_rounding_mode
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-03-24 22:50:51 +08:00
geohotstan
309afa20b7
add Tensor.max_unpool2d ( #9518 )
...
* why does max_unpool2d feel slower than out.gradient ...
* slightly cleaner
* what happened to ruff
* need to think about this some more
* slightly faster now?
* clean up, 1 more failing edge case
* ok good
* working TINY_BACKEND
* nit doc wording
* retry CI
2025-03-22 12:11:33 -04:00
geohotstan
8c0d0a122c
Add return_indices to max_pool ( #9506 )
...
* wow argmax is so good
* 1 less line
* clean up and better variable names
* is this torch thing right...?
* add more tests
* slap a TODO on it
* clean ups
* prettier looking code and fix ceil mode test
* add return types and some docs
* ok that was a bad example since indices == value, just no example
2025-03-19 15:25:37 -04:00
chenyu
f8976dd2eb
enable more webgpu tests ( #9502 )
...
OSX has a larger buffer count limit, and it supports fp16 now
2025-03-18 23:03:54 -04:00
Anish Umale
5e58f4b65b
Tiny backend test_ops fix part 3 ( #9483 )
...
* extract straightforward things from https://github.com/tinygrad/tinygrad/pull/9302
* pass dtype and device for ones_like
2025-03-17 18:01:51 -04:00
TJ
9fcef4d009
add masked_select to tensor.py ( #9468 )
...
* add masked_select to tensor.py
* fix tests
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-03-17 16:05:36 -04:00
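masked_select keeps the elements where a boolean mask is true and returns a flat result, so the output length depends on the data. The semantics in plain Python (illustrative only):

```python
def masked_select(xs, mask):
    # flat list of elements of xs at positions where the mask is truthy
    return [x for x, m in zip(xs, mask) if m]
```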
geohotstan
53d6f1e1bb
Add bitonic cat sort ( #9422 )
...
* poc
* repeated values fail, sigh
* is this being timed out?
* fix up down names
* bitonic v2, does this run?
* bitonic v3, faster
* bitonic v3.1, faster
* bitonic v3.1.1, same speed unlucky
* support dim and indices
* bitonic v3.2, simpler code, TODO repeated indices
* bruv gimme green for once cmon
* cat (stack) implementation, slow but maybe one day when cat is fast meow
* revert to v3.2
* bitonic v4, who let the cats out edition
* clean up variable names
* figured out repeated indices :D
* ruff check --fix
* use sort for topk
* add Tensor.sort everywhere
* fix docs and add some types
* slightly better variable names
* am I doing torch inplace correctly?
* delegate sort to values_stable
* add a contig, faster first sort
* maybe don't test_inplace
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-17 12:01:23 -04:00
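A bitonic sorting network sorts a power-of-two-length sequence with a fixed, data-independent pattern of compare-and-swap stages, which is what makes it expressible as bulk tensor ops (the cat/stack "v4" above). A recursive pure-Python sketch of the classic network, not the PR's tensor formulation:

```python
def bitonic_sort(xs, ascending=True):
    # sorts a sequence whose length is a power of two
    n = len(xs)
    if n <= 1:
        return list(xs)
    half = n // 2
    # build a bitonic sequence: ascending first half, descending second half
    first = bitonic_sort(xs[:half], True)
    second = bitonic_sort(xs[half:], False)
    return bitonic_merge(first + second, ascending)

def bitonic_merge(xs, ascending):
    # merge a bitonic sequence into a fully sorted one
    n = len(xs)
    if n <= 1:
        return list(xs)
    half = n // 2
    xs = list(xs)
    for i in range(half):  # compare-and-swap across the two halves
        if (xs[i] > xs[i + half]) == ascending:
            xs[i], xs[i + half] = xs[i + half], xs[i]
    return bitonic_merge(xs[:half], ascending) + bitonic_merge(xs[half:], ascending)
```

Note the comparison pattern is fixed in advance, independent of the data, so every stage can run as one vectorized min/max over the whole array.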
George Hotz
e174c6c3bc
new devectorizer ( #9331 )
...
* new devectorizer
* lidx
* test linearizer passes
* fix images
* fix unfoldable image load
* delete unused
* improve fix_unfoldable_image_load
* working for image
* fixup types
* fixup transcendental
* cast_vec
* cleaner transcendental
* skip failing test
* err, flip that
* not devec
* sqrt
2025-03-11 18:47:56 +08:00
chenyu
01e8b60911
acc_dtype -> dtype ( #9402 )
...
matched numpy and torch
2025-03-10 16:05:30 -04:00