tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-20 20:38:03 -05:00

Author	SHA1	Message	Date
pkotzbach	2c8e4ea865	FP8 support on NVIDIA (#8631 ) * squashed fp8 commits * tensorcore start * minor changes * pre-commit * pylint * Delete fp8mul.cu * clean * small bugfix * fix test_dtype * fix test_dtype_alu * add EMULATE_CUDA_SM89 * fix ci * fix test_linearizer * fix test_linearizer * fix swizzle * add debug to simple_matmul * fixed swizzle * python emulator * refactor python emulator * setup fix * numpy setup * ml_dtypes only in emulate_cuda_sm89 * fix pylint * fix tests * fix mypy * fix mypy * fix ruff * done python emulator * add acc type * tests * mypy * clean code * add cuda tensor core tests to CI * minor fix * clean test_dtype.py * clean cstyle.py * clean test_ops.py * fix test * fix test * whitespaces * pylint * pylint * amd? * amd? * amd * reduce lines * mockgpu remove * fix * ruff * ruff * fix mypy * ruff * test only for cuda * fixed formatting * small fixes * small fix * least_upper_dtype if fp8s not supported * log and reciprocal are supported for fp8s * ops python fixes * dtypes.fp8s use * e4m3 + e5m2 result dtype test * truncate linter fix --------- Co-authored-by: pkotzbach <pawkotz@gmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-04-08 21:54:04 -04:00
Francis Lata	f8fe15e64e	move BoxCoder to mlperf helpers (#9773 )	2025-04-07 20:27:06 -04:00
Francis Lata	71b8890dd6	use validation dataloader inside retinanet eval (#9747 )	2025-04-05 16:46:55 -04:00
geohotstan	ac713e04db	ONNX add output shape validation (#9720 ) * add output shape validation and remove support for sequence_type * nit better err msg * add sequence_type back * improve err msg * Revert "improve err msg" This reverts commit `dc9eaea4bb`. * Revert "add sequence_type back" This reverts commit `288170b2d9`. * do explicit shape equality * small nit	2025-04-03 05:44:53 -04:00
chenyu	7dadbf3697	insert float() in bert acc (#9726 ) sum of bool by default uses default_float for acc. So without float, it might overflow with a large BS and default_float=HALF. fixed clsf_accuracy to not be inf in mi300x bert	2025-04-03 05:44:09 -04:00
George Hotz	5c7b549eab	use functools.cache instead of lru_cache(None) [pr] (#9714 ) * use functools.cache instead of lru_cache(None) [pr] * more cache	2025-04-03 11:47:13 +08:00
geohotstan	e1d7e47cca	fix ONNX IsInf unintended dtype promotion (#9711 ) * add IsInf * add corresponding test * that float16 is kinda silly	2025-04-02 22:46:15 -04:00
George Hotz	f72a87fd0e	add proper support for Ops.IGNORE to remove store masks (#9692 ) * add proper support for Ops.IGNORE to remove store masks * remove useless NHWC * revert that	2025-04-02 16:38:01 +08:00
George Hotz	6f812d3f2f	fixes from the dsp branch + 12500 lines (#9683 ) * fixes from the dsp branch * more changes * those are gep pushing	2025-04-02 13:07:17 +08:00
qazal	eee0dcc37a	merge viz back into one file (#9672 ) * merge viz back into one file * work * rename lib to js directory * fix diff * less indenting * memory graph is back * viz_sz.py	2025-04-01 19:52:02 +08:00
nimlgen	3e2f42c2e8	autogen: remove am headers from extra (#9666 )	2025-04-01 14:45:30 +07:00
Anish Umale	a1ee4d587f	Fix test_ops for tiny backend (#9302 ) * fix some tests in test_ops for torch backend(171 failing) * fix more tests (135 failures) * fix tests (126 failing) * handle transposed convs (109 tests failing) * fix slice * fix lshift & rshift and more tests (87 tests failing) * revert accidental change * remove unnecessary changes (82 failures) * fix backward for avg_pool2d (78 failures) * fix backward for avg_pool2d (78 failures) * fix replication backpass * fix reflection pad back pass (71 failures) * cummax with indicies, aten.mv and move out methods (67 failures) * extract avg_pool2d and avg_pool3d to separate functions (62 failures) * revert changes for cat_out * rewrite avg_pool and pad without repetition * remove duplicates from decomps * slice rewrite and add slice_backward (59 failures) * add dtype fixup from https://github.com/tinygrad/tinygrad/pull/9297 * fix linter error and remove Tensor.pad (48 failures) * add select_backward and index_put (40 failures) * fix some more tests (36 failures) * fix more tests (12 failures) * some cleanups and fix couple more tests (10 failures) * cleaner way to write upsample * some more upsample cleanups * use lambda for upsample * add autowrapper for upsample forward * cumsum and max_dim without aten functions * revert _log_softmax * fix more tests (1 failure) * make linter happy * move import to appropriate func * make linter happy * add codes for noqa * some more refactors * remove comment * remove dependency on aten function for conv backward * some more refactors * add returns * revert a change from merge * some cleanups * remove whitespace * remove ruff change * revert upsample * add masked_fill_.Tensor and scatter.src_out * add todo * fix test_biased_conv2d * fix test_var_one_in_axis & test_std_one_in_axis but break test_biased_conv2d :( * revert torch_debug * revert torch_debug * skip test_gather_failure for the tiny backend * make padding registration more consise * add nonzero * remove scatter_add since we already have the out * fix scatter * remove some repetition * make upsample backward registrations more concise * remove select.int * use Tensor.cumsum * realize conv2d outputs before backward to fix test_biased_conv2d * add a todo for realize(1 failure) * add new_empty and new_empty_strided * make test_pad_circular_mode forward only and remove redundant stuff * fix linter errors * remove expect failure * just tb * slice is a view_op * contiguous only when lazydata.is_realized * fix backward for test_pad_circular_mode * revert torch.nn.functional.pad override * add transpose.int and make constant_pad_nd contiguous * slice_backwards has no kwargs --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-31 21:13:09 -04:00
Priyank Patel	e2d9322d21	torch backend: partial fix for strided related test fails (#9642 ) * partial fix for strided related test fails * cleanup * fix lint	2025-03-31 05:45:18 -04:00
George Hotz	49b1c46d16	good changes from the dsp branch (#9638 )	2025-03-31 13:02:53 +08:00
Yvon Manzi	6652003839	Add cumprod to Tensor (#9629 ) * probably how cumprod should look like * update _cumalu to work with MUL * shorter * cumprod testing * clean * more cleanup * add cumprod to torch backend. * make it look like cumsum * mypy fix --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-30 21:49:18 -04:00
geohotstan	d52e91db7b	ONNX ops clean ups (#9622 ) * combine work from remove numpy and onnx ops tests * clippy --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-30 21:39:22 -04:00
uuuvn	2a4247b8c2	RDNA 3.5 support (#9627 )	2025-03-31 01:15:20 +08:00
geohotstan	a08b07b4da	Bump onnx==1.17.0 (#9618 ) * bump * remove resize tf_crop_and_resize --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-30 03:21:51 -04:00
nimlgen	54e1e59b44	am: rdna 4 support (#9621 ) * hm * fix * return this * fine * g * ruff * fix	2025-03-29 23:16:27 +07:00
uuuvn	5908b89f71	MI300X support (WIP) (#9585 )	2025-03-29 19:46:42 +08:00
uuuvn	dd9aae02c3	Refactor ops_amd.py (MI300X prereq) (#9428 )	2025-03-29 00:17:20 +07:00
George Hotz	1e6e75e39a	little changes from dsp branch (#9582 ) * little changes from dsp branch * not that one * need the where * Revert "need the where" This reverts commit `140f89c878`.	2025-03-26 20:01:21 +08:00
Andrey	7b865ed03d	use tuple in isinstance for type checking (#9583 )	2025-03-26 19:36:48 +08:00
nimlgen	4cf2b68ca8	am_smi: fix init for newer versions (#9559 )	2025-03-25 23:48:05 +07:00
Priyank Patel	4f5e03bd60	better fix inplace detach (#9557 )	2025-03-24 22:50:28 +08:00
George Hotz	74d98eafb8	add onnx frontend stub [pr] (#9558 )	2025-03-24 12:24:34 +08:00
geohotstan	309afa20b7	add Tensor.max_unpool2d (#9518 ) * why does max_unpool2d feel slower than out.gradient ... * slightly cleaner * what happened to ruff * need to think about this some more * slightly faster now? * clean up, 1 more failing edge case * ok good * working TINY_BACKEND * nit doc wording * retry CI	2025-03-22 12:11:33 -04:00
Francis Lata	eb95825eea	RetinaNet dataloader (#9442 ) * retinanet dataloader * remove batch_size from generate_anchors * refactor kits19 dataset tests * add tests for dataloader * fix testing setup and cleanups * remove unused import	2025-03-21 13:36:41 -04:00
George Hotz	8e555c586c	switch quantization to unsigned/unsigned + add Ops.REDUCE (#9527 ) * switch quantization to unsigned/unsigned + add Ops.REDUCE * tests * nhwc + replay pkl	2025-03-21 17:02:37 +08:00
George Hotz	68053d0510	dsp stuff / sniff ioctls from snpe (#9490 ) * sniff ioctls from snpe * dump input buffers * snpe logs from dsp * NHWC support * knum 3 * this run? * revert those --------- Co-authored-by: Comma Device <device@comma.ai>	2025-03-20 10:38:23 +08:00
geohotstan	8c0d0a122c	Add return_indices to max_pool (#9506 ) * wow argmax is so good * 1 less line * clean up and better variable names * is this torch thing right...? * add more tests * slap a TODO on it * clean ups * prettier looking code and fix ceil mode test * add return types and some docs * ok that was a bad example since indices == value, just no example	2025-03-19 15:25:37 -04:00
Francis Lam	1e5d9ad8f7	extra/gemm/max_matmul: start of custom kernels for GEMM (#6926 ) * extra/gemm/max_matmul: start of custom kernels for GEMM * add an unoptimized FP16/FP16 MMA example * add slow 3-stage fp16 acc example * add correct 3-stage pipeline with unswizzled/flat smem input (slow) * add acc fp16 example with 3 stages and swizzle (no bank conflicts) * add max version of NV fp16_fp16_fp16 * fix up comments and removed unused code in max variations * add start of no_xor example * fix to account for UOps to Ops	2025-03-19 15:04:57 +08:00
b1tg	a95b489a55	nanoGPT train works with tiny torch backend (#9283 ) * train_shakespeare_char.py works * move aten.where.self_out to tiny_backend_out * fix memory leak * corealize in the backward_hook * Update backend.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-03-19 11:51:02 +08:00
George Hotz	117b7a16ef	VALIDATE_WITH_CPU [pr] (#9488 ) * VALIDATE_WITH_CPU [pr] * fix test	2025-03-18 15:15:04 +08:00
nimlgen	a82c9332d3	am: rename soc21 to soc (#9482 )	2025-03-18 08:54:26 +08:00
Anish Umale	5e58f4b65b	Tiny backend test_ops fix part 3 (#9483 ) * extract straightforward things from https://github.com/tinygrad/tinygrad/pull/9302 * pass dtype and device for ones_like	2025-03-17 18:01:51 -04:00
TJ	9fcef4d009	add masked_select to tensor.py (#9468 ) * add masked_select to tensor.py * fix tests --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-03-17 16:05:36 -04:00
geohotstan	53d6f1e1bb	Add bitonic cat sort (#9422 ) * poc * repeated values fail, sigh * is this being timed out? * fix up down names * bitonic v2, does this run? * bitonic v3, faster * bitonic v3.1, faster * bitonic v3.1.1, same speed unlucky * support dim and indices * bitonic v3.2, simpler code, TODO repeated indices * bruv gimme green for once cmon * cat (stack) implementation, slow but maybe one day when cat is fast meow * revert to v3.2 * bitonic v4, who let the cats out edition * clean up variable names * figured out repeated indices :D * ruff check --fix * use sort for topk * add Tensor.sort everywhere * fix docs and add some types * slightly better variable names * am I doing torch inplace correctly? * delegate sort to values_stable * add a contig, faster first sort * maybe don't test_inplace --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-17 12:01:23 -04:00
George Hotz	824c5f41ac	dsp work try 3 (#9475 ) * dsp work try 3 * padding	2025-03-17 16:42:12 +08:00
George Hotz	52ae9af4dd	Fast DSP for MobileNetV2 (try 2) (#9467 ) * Fast DSP for MobileNetV2 (try 2) * enable fast path on uchar * fix tests	2025-03-17 15:10:36 +08:00
George Hotz	09e7708b49	minimum change for rdna4 [pr] (#9455 )	2025-03-16 13:39:24 +08:00
George Hotz	cb7a7f69c7	quantization preprocessor from DSP, should be universal (#9437 ) * quantization preprocessor from DSP, should be universal * touchups * fix tests	2025-03-15 07:49:37 +08:00
chenyu	0e591baf43	redo simple_matmul change (#9450 ) numpy does not support bfloat16	2025-03-14 17:53:52 -04:00
chenyu	b0f63d3c04	Revert "`simple_matmul.py` uses np to generate random (#9438 )" (#9449 ) This reverts commit `14018050c1`.	2025-03-14 17:14:22 -04:00
Ignacio Sica	14018050c1	`simple_matmul.py` uses np to generate random (#9438 ) * np generates randoms * hotfix: use generator for int dtype * float32 as default dtype for float generator * use np.float32 instead of stirng * add dtype= to integers generator * change import _to_np_dtype source	2025-03-14 17:36:50 -03:00
geohotstan	0bed9b6cd2	benchmark huggingface onnx models (#8493 ) * add ability to ORT=1 * test_vs_ort * useless f * actually have benchmark take in modelproto for more flexibility in huggingface stuff * ok runs * good * oops fix benchmark_onnx __main__ * 224 as default * add ORT=1 option to huggingface_onnx * use Tensor to get_input * add abilty to do single onnx model testing * better names * merge properly... * copy in onnx_helpers * better * decent script * need to add debug tool first * new limit usage * why did narrowing_error come back.. * pretty decent * revert validate change * more ops bug fixes * revert unnecessary changes * fix InstanceNorm too * remove op from O4 * minimize diff * address old feedback * unsure of this, just revert * remove that assert * working attention * to_python_const Attention * cant init from np constant so just do this * final * fix bug in attention * attention clean ups * add hard TODOs and REPOPATH and TRUNCATE envvar * fix input_ids default value * final * fix scatter * cleaner _prepare_quantize * use new attention and tempfile for huggingface script * more stats * update * remove outdated code * big refactor to something usable by CI * booooooom * clean up * update to using yaml as env var input * add dry run * try * valid pad * use argparser and fix gather bug * ignore all yaml * tiny bit more polish * woah ignoring all yaml was not right * typo * decouple huggingface_onnx_run debug run with huggingface_onnx_download * bug fix for downloading single model * WOOOO ok much better * oops argparse 'required' is an invalid argument for positionals * oops argparse 'required' is an invalid argument for positionals * add assert * fix types --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-12 20:13:12 -04:00
Priyank Patel	4714c4f9ad	torch backend multigpu - add devices and tests (#9414 ) * add multi-device support and tests * simplify	2025-03-12 11:33:11 +08:00
uuuvn	e85001b6ee	SQTT profiling (#9278 ) * sqtt * docs * multi-device * ProfileSQTTEvent * exec update * 256mb default * don't let people hang their gpus * bitfields from autogen * asic info from mesa * more bitfields from autogen * SQTT_ITRACE_SE_MASK --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-03-11 13:19:56 +08:00
Priyank Patel	beed00eabe	fix torch backend memory leak (#9395 ) * fix leak, realize everything on torch optim step * only realize a subset --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-03-11 10:48:20 +08:00
chenyu	01e8b60911	acc_dtype -> dtype (#9402 ) matched numpy and torch	2025-03-10 16:05:30 -04:00
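
Note on 7dadbf3697 (insert float() in bert acc): the commit message says a sum of bools accumulates in default_float, so with default_float=HALF and a large batch size the count can overflow. A minimal sketch of that effect follows, assuming only that the final count has to fit in IEEE 754 half precision. It does not use tinygrad; `to_half` is a hypothetical helper built on the standard `struct` module, and the counts are made-up illustration values, not figures from the commit.

```python
import math
import struct


def to_half(x: float) -> float:
    """Round x to IEEE 754 half precision; out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the half-precision range,
        # so model the hardware behaviour (overflow to infinity) by hand.
        return math.copysign(math.inf, x)


# Hypothetical numbers, chosen only for illustration.
num_correct = 70_000  # number of True entries in a large batch
num_total = 80_000

acc_in_half = to_half(float(num_correct))  # count stored in a half accumulator
acc_in_float = float(num_correct)          # count stored in a wider float

print(acc_in_half / num_total)   # inf
print(acc_in_float / num_total)  # 0.875
```

The largest finite half-precision value is 65504, so any count above that range cannot be held in a half accumulator, which is consistent with the inf accuracy the commit message reports.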

1242 Commits