Commit Graph

1363 Commits

Park Jun
c3ad7b2a84 create randperm and support pytorch backend (#10019) 2025-04-24 07:29:02 -04:00
Matthew Daiter
b545338e59 isin_Tensor_out added (#10018) 2025-04-24 07:26:51 -04:00
nimlgen
1c5e353249 am: use mmio iface (#10012)
* am: use mmio iface

* linters

* fixes

* fixes + cleanups

* mute

* mypy

* style
2025-04-24 00:27:04 +03:00
Francis Lata
defa1e77f6 get the proper dataset count (#9962) 2025-04-21 12:11:37 -04:00
Francis Lata
d7e247f329 RetinaNet INITMLPERF support (#9950)
* fixes to make fake data work

* fix eval beam

* fix merge issue
2025-04-21 10:32:05 -04:00
akhuntsaria
2d423e6737 fix assertion message for supported device in export_model (#9957) 2025-04-21 09:23:44 -04:00
qazal
e20ef7196a Tensor.kernelize (#9845)
* add kernelize

* remove that

* kernelize returns self

* update abstractions2.py

* kernelize in test_schedule

* temp: assert BUFFER_VIEW's existence

* ASSIGN must have a buffer or subbuffer target

* assert and shrink

* fix

* padded setitem

* var

* toposort once

* extra

* base_buffer

* end with BUFFER_VIEW

* setitem for disk

* test_setitem_becomes_subbuffer

* mul slice test

* torch backend fix 1

* non-deterministic

* keep subbuffer
2025-04-20 20:53:49 +08:00
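
A minimal usage sketch based only on the commit notes above (`kernelize` groups the lazy graph into kernels without realizing them, and returns `self` so it chains):

```python
from tinygrad import Tensor

a, b = Tensor([1.0, 2.0]), Tensor([3.0, 4.0])
out = (a + b).kernelize()  # group into kernels without running anything yet
print(out.numpy())         # -> [4. 6.]; kernelize returned self, so calls chain
```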
chenyu
6c30948df6 hand_coded_optimizations returns list[Opt] [pr] (#9938)
new api looks like `k.apply_opts(hand_coded_optimizations(k))`
2025-04-19 20:26:59 -04:00
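
The refactor pattern in miniature, as a toy stand-in rather than tinygrad's real classes: the heuristic becomes a pure function returning a list of opts, which the caller applies explicitly:

```python
from dataclasses import dataclass, field

@dataclass
class Opt:  # toy stand-in for tinygrad's Opt
  op: str
  amt: int

@dataclass
class Kernel:  # toy stand-in; apply_opts mirrors the new API
  applied: list[Opt] = field(default_factory=list)
  def apply_opts(self, opts: list[Opt]): self.applied += opts

def hand_coded_optimizations(k: Kernel) -> list[Opt]:
  # pure: inspects k and returns opts, mutating nothing
  return [Opt("UPCAST", 4)]

k = Kernel()
k.apply_opts(hand_coded_optimizations(k))  # the new api from the commit message
print(k.applied)  # [Opt(op='UPCAST', amt=4)]
```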
chenyu
720f20865b remove required_optimizations (#9848) 2025-04-19 16:51:16 -04:00
qazal
16dfe0a902 upstream remu (#9921) 2025-04-18 01:57:36 +03:00
chenyu
f5256e0020 Kernel.apply_opts [pr] (#9917)
* Kernel.apply_opts [pr]

updated all `for opt in` loops. also updated a few test_linearizer tests to not implicitly depend on hand_coded_optimizations

* not you yet
2025-04-17 08:00:56 -04:00
Xingyu
047c8fd70d Add amax support to Tensor operations in Torch Backend (#9905)
* Add amax support to Tensor operations
- Implemented amax function in backend.py for tensor max operations.
- Added unit tests for amax in test.py to ensure correct functionality.

* Fix formatting in amax output function
- Adjusted spacing in the amax output lambda function in backend.py
- Improved code readability for better maintenance
2025-04-16 10:35:50 +01:00
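
For reference on the semantics being mapped, `aten::amax` is a pure max reduction over the given dims, which lines up with tinygrad's `Tensor.max` (usage sketch, not the backend.py registration code):

```python
from tinygrad import Tensor

t = Tensor([[1.0, 5.0], [3.0, 2.0]])
print(t.max(axis=1).numpy())                     # amax over dim 1 -> [5. 3.]
print(t.max(axis=(0, 1), keepdim=True).numpy())  # amax over all dims -> [[5.]]
```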
geohotstan
4e8f25109a Revert "ONNX add output shape validation (#9720)" (#9904)
This reverts commit ac713e04db.
2025-04-16 03:15:56 -04:00
nimlgen
7c466c24f7 am_smi: refactor to support arches (#9864)
* am_smi: refactor to support arches

* shorter
2025-04-12 20:37:01 +03:00
chenyu
8c6299bced move hand_coded_optimizations to heuristic.py [pr] (#9844)
* move hand_coded_optimizations to heuristic.py [pr]

also folded all long lines

* make a copy and rename self -> k

* fix test
2025-04-10 23:40:16 -04:00
Francis Lata
eb2e59db42 RetinaNet model type annotations and loss functions (#9822)
* add type annotations and loss functions for training

* combine sum of multiple dims inside loss functions
2025-04-10 00:31:37 -04:00
Francis Lata
7bb36d71b2 remove openimages iterate (#9820) 2025-04-09 22:54:12 -04:00
chenyu
c5db5b83b9 add SHOULD_USE_TC=1 check to simple_matmul (#9802)
* add SHOULD_USE_TC=1 check to simple_matmul

also zero-centered the random input and updated atol for tf32

* ATOL=2e-2 for HALF
2025-04-09 02:24:42 -04:00
George Hotz
78caf55154 Revert "FP8 support on NVIDIA (#8631)"
This reverts commit 2c8e4ea865.
2025-04-09 12:27:41 +08:00
George Hotz
14928fecff Revert "fix TF32 tensor core dropped in tc_sm89 (#9798)"
This reverts commit 7c9a96824f.
2025-04-09 12:27:39 +08:00
chenyu
7c9a96824f fix TF32 tensor core dropped in tc_sm89 (#9798)
also add `SHOULD_USE_TC=1` to verify TC is applied in simple_matmul
2025-04-08 23:20:50 -04:00
pkotzbach
2c8e4ea865 FP8 support on NVIDIA (#8631)
* squashed fp8 commits

* tensorcore start

* minor changes

* pre-commit

* pylint

* Delete fp8mul.cu

* clean

* small bugfix

* fix test_dtype

* fix test_dtype_alu

* add EMULATE_CUDA_SM89

* fix ci

* fix test_linearizer

* fix test_linearizer

* fix swizzle

* add debug to simple_matmul

* fixed swizzle

* python emulator

* refactor python emulator

* setup fix

* numpy setup

* ml_dtypes only in emulate_cuda_sm89

* fix pylint

* fix tests

* fix mypy

* fix mypy

* fix ruff

* done python emulator

* add acc type

* tests

* mypy

* clean code

* add cuda tensor core tests to CI

* minor fix

* clean test_dtype.py

* clean cstyle.py

* clean test_ops.py

* fix test

* fix test

* whitespaces

* pylint

* pylint

* amd?

* amd?

* amd

* reduce lines

* mockgpu remove

* fix

* ruff

* ruff

* fix mypy

* ruff

* test only for cuda

* fixed formatting

* small fixes

* small fix

* least_upper_dtype if fp8s not supported

* log and reciprocal are supported for fp8s

* ops python fixes

* dtypes.fp8s use

* e4m3 + e5m2 result dtype test

* truncate linter fix

---------

Co-authored-by: pkotzbach <pawkotz@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-08 21:54:04 -04:00
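
The python emulator above leans on the `ml_dtypes` package for the fp8 formats; a small standalone sketch of the two formats involved (plain ml_dtypes usage, not tinygrad code — e4m3 trades range for mantissa precision, e5m2 the reverse):

```python
import numpy as np
import ml_dtypes

# e4m3: 4 exponent / 3 mantissa bits (max finite 448); e5m2: 5 / 2 (max finite 57344)
x = np.array([0.1, 1.5, 448.0], dtype=np.float32)
print(x.astype(ml_dtypes.float8_e4m3fn))  # rounds to nearest representable e4m3
print(x.astype(ml_dtypes.float8_e5m2))    # wider exponent range, coarser mantissa
```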
Francis Lata
f8fe15e64e move BoxCoder to mlperf helpers (#9773) 2025-04-07 20:27:06 -04:00
Francis Lata
71b8890dd6 use validation dataloader inside retinanet eval (#9747) 2025-04-05 16:46:55 -04:00
geohotstan
ac713e04db ONNX add output shape validation (#9720)
* add output shape validation and remove support for sequence_type

* nit better err msg

* add sequence_type back

* improve err msg

* Revert "improve err msg"

This reverts commit dc9eaea4bb.

* Revert "add sequence_type back"

This reverts commit 288170b2d9.

* do explicit shape equality

* small nit
2025-04-03 05:44:53 -04:00
chenyu
7dadbf3697 insert float() in bert acc (#9726)
sum of bool uses default_float for the accumulator by default, so without an explicit float() it can overflow with a large BS and default_float=HALF.

fixed clsf_accuracy being inf in mi300x bert
2025-04-03 05:44:09 -04:00
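
A numpy illustration of the failure mode (half-precision accumulation in general, not the bert code itself):

```python
import numpy as np

# float16 spacing at 2048 is 2.0, so repeated +1 stalls: the running count
# silently stops growing once the accumulator hits 2048
acc = np.float16(0)
for _ in range(3000):
  acc += np.float16(1)
print(acc)  # 2048.0, not 3000.0 -- hence casting with float() before the sum
```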
George Hotz
5c7b549eab use functools.cache instead of lru_cache(None) [pr] (#9714)
* use functools.cache instead of lru_cache(None) [pr]

* more cache
2025-04-03 11:47:13 +08:00
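
The two spellings are equivalent; `functools.cache` (Python 3.9+) is just a shorter alias for `lru_cache(maxsize=None)`:

```python
import functools

@functools.cache  # same behavior as @functools.lru_cache(maxsize=None)
def fib(n: int) -> int:
  return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # 354224848179261915075, each subproblem computed once
```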
geohotstan
e1d7e47cca fix ONNX IsInf unintended dtype promotion (#9711)
* add IsInf

* add corresponding test

* that float16 is kinda silly
2025-04-02 22:46:15 -04:00
George Hotz
f72a87fd0e add proper support for Ops.IGNORE to remove store masks (#9692)
* add proper support for Ops.IGNORE to remove store masks

* remove useless NHWC

* revert that
2025-04-02 16:38:01 +08:00
George Hotz
6f812d3f2f fixes from the dsp branch + 12500 lines (#9683)
* fixes from the dsp branch

* more changes

* those are gep pushing
2025-04-02 13:07:17 +08:00
qazal
eee0dcc37a merge viz back into one file (#9672)
* merge viz back into one file

* work

* rename lib to js directory

* fix diff

* less indenting

* memory graph is back

* viz_sz.py
2025-04-01 19:52:02 +08:00
nimlgen
3e2f42c2e8 autogen: remove am headers from extra (#9666) 2025-04-01 14:45:30 +07:00
Anish Umale
a1ee4d587f Fix test_ops for tiny backend (#9302)
* fix some tests in test_ops for torch backend (171 failing)

* fix more tests (135 failures)

* fix tests (126 failing)

* handle transposed convs (109 tests failing)

* fix slice

* fix lshift & rshift and more tests (87 tests failing)

* revert accidental change

* remove unnecessary changes (82 failures)

* fix backward for avg_pool2d (78 failures)

* fix backward for avg_pool2d (78 failures)

* fix replication backpass

* fix reflection pad back pass (71 failures)

* cummax with indices, aten.mv and move out methods (67 failures)

* extract avg_pool2d and avg_pool3d to separate functions (62 failures)

* revert changes for cat_out

* rewrite avg_pool and pad without repetition

* remove duplicates from decomps

* slice rewrite and add slice_backward (59 failures)

* add dtype fixup from https://github.com/tinygrad/tinygrad/pull/9297

* fix linter error and remove Tensor.pad (48 failures)

* add select_backward and index_put (40 failures)

* fix some more tests (36 failures)

* fix more tests (12 failures)

* some cleanups and fix couple more tests (10 failures)

* cleaner way to write upsample

* some more upsample cleanups

* use lambda for upsample

* add autowrapper for upsample forward

* cumsum and max_dim without aten functions

* revert _log_softmax

* fix more tests (1 failure)

* make linter happy

* move import to appropriate func

* make linter happy

* add codes for noqa

* some more refactors

* remove comment

* remove dependency on aten function for conv backward

* some more refactors

* add returns

* revert a change from merge

* some cleanups

* remove whitespace

* remove ruff change

* revert upsample

* add masked_fill_.Tensor and scatter.src_out

* add todo

* fix test_biased_conv2d

* fix test_var_one_in_axis & test_std_one_in_axis but break test_biased_conv2d :(

* revert torch_debug

* revert torch_debug

* skip test_gather_failure for the tiny backend

* make padding registration more concise

* add nonzero

* remove scatter_add since we already have the out

* fix scatter

* remove some repetition

* make upsample backward registrations more concise

* remove select.int

* use Tensor.cumsum

* realize conv2d outputs before backward to fix test_biased_conv2d

* add a todo for realize (1 failure)

* add new_empty and new_empty_strided

* make test_pad_circular_mode forward only and remove redundant stuff

* fix linter errors

* remove expect failure

* just tb

* slice is a view_op

* contiguous only when lazydata.is_realized

* fix backward for test_pad_circular_mode

* revert torch.nn.functional.pad override

* add transpose.int and make constant_pad_nd contiguous

* slice_backwards has no kwargs

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-31 21:13:09 -04:00
Priyank Patel
e2d9322d21 torch backend: partial fix for strided related test fails (#9642)
* partial fix for strided related test fails

* cleanup

* fix lint
2025-03-31 05:45:18 -04:00
George Hotz
49b1c46d16 good changes from the dsp branch (#9638) 2025-03-31 13:02:53 +08:00
Yvon Manzi
6652003839 Add cumprod to Tensor (#9629)
* probably what cumprod should look like

* update _cumalu to work with MUL

* shorter

* cumprod testing

* clean

* more cleanup

* add cumprod to torch backend.

* make it look like cumsum

* mypy fix

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:49:18 -04:00
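
A usage sketch, assuming `cumprod` mirrors `cumsum`'s signature as the commit notes suggest:

```python
from tinygrad import Tensor

t = Tensor([1.0, 2.0, 3.0, 4.0])
print(t.cumsum(axis=0).numpy())   # -> [ 1.  3.  6. 10.]
print(t.cumprod(axis=0).numpy())  # -> [ 1.  2.  6. 24.], same _cumalu path with MUL
```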
geohotstan
d52e91db7b ONNX ops clean ups (#9622)
* combine work from remove numpy and onnx ops tests

* clippy

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:39:22 -04:00
uuuvn
2a4247b8c2 RDNA 3.5 support (#9627) 2025-03-31 01:15:20 +08:00
geohotstan
a08b07b4da Bump onnx==1.17.0 (#9618)
* bump

* remove resize tf_crop_and_resize

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 03:21:51 -04:00
nimlgen
54e1e59b44 am: rdna 4 support (#9621)
* hm

* fix

* return this

* fine

* g

* ruff

* fix
2025-03-29 23:16:27 +07:00
uuuvn
5908b89f71 MI300X support (WIP) (#9585) 2025-03-29 19:46:42 +08:00
uuuvn
dd9aae02c3 Refactor ops_amd.py (MI300X prereq) (#9428) 2025-03-29 00:17:20 +07:00
George Hotz
1e6e75e39a little changes from dsp branch (#9582)
* little changes from dsp branch

* not that one

* need the where

* Revert "need the where"

This reverts commit 140f89c878.
2025-03-26 20:01:21 +08:00
Andrey
7b865ed03d use tuple in isinstance for type checking (#9583) 2025-03-26 19:36:48 +08:00
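
The pattern, for reference (plain Python): one `isinstance` with a tuple of types replaces a chain of `or`-ed checks:

```python
def is_number(x) -> bool:
  # instead of isinstance(x, int) or isinstance(x, float) or ...
  return isinstance(x, (int, float, complex))

assert is_number(3) and is_number(2.5) and not is_number("3")
```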
nimlgen
4cf2b68ca8 am_smi: fix init for newer versions (#9559) 2025-03-25 23:48:05 +07:00
Priyank Patel
4f5e03bd60 better fix inplace detach (#9557) 2025-03-24 22:50:28 +08:00
George Hotz
74d98eafb8 add onnx frontend stub [pr] (#9558) 2025-03-24 12:24:34 +08:00
geohotstan
309afa20b7 add Tensor.max_unpool2d (#9518)
* why does max_unpool2d feel slower than out.gradient ...

* slightly cleaner

* what happened to ruff

* need to think about this some more

* slightly faster now?

* clean up, 1 more failing edge case

* ok good

* working TINY_BACKEND

* nit doc wording

* retry CI
2025-03-22 12:11:33 -04:00
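
A toy numpy sketch of the operation's semantics (not tinygrad's implementation): max-unpooling scatters each pooled max back to the flat index it came from, leaving zeros elsewhere:

```python
import numpy as np

def max_unpool1d(pooled: np.ndarray, indices: np.ndarray, out_len: int) -> np.ndarray:
  out = np.zeros(out_len, dtype=pooled.dtype)
  out[indices] = pooled  # scatter the maxes back to their source positions
  return out

x = np.array([1.0, 3.0, 2.0, 8.0])
# max_pool with window 2 keeps [3., 8.] from flat indices [1, 3]:
print(max_unpool1d(np.array([3.0, 8.0]), np.array([1, 3]), len(x)))  # [0. 3. 0. 8.]
```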
Francis Lata
eb95825eea RetinaNet dataloader (#9442)
* retinanet dataloader

* remove batch_size from generate_anchors

* refactor kits19 dataset tests

* add tests for dataloader

* fix testing setup and cleanups

* remove unused import
2025-03-21 13:36:41 -04:00
George Hotz
8e555c586c switch quantization to unsigned/unsigned + add Ops.REDUCE (#9527)
* switch quantization to unsigned/unsigned + add Ops.REDUCE

* tests

* nhwc + replay pkl
2025-03-21 17:02:37 +08:00