Commit Graph

243 Commits

George Hotz
ac02e7347d ptx timing vs cuda timing (#3659) 2024-03-08 10:17:49 -08:00
chenyu
e25879d50e don't get new var_val for the same ast in fuzz_linearizer (#3657)
fixed result comparison for kernels with variables
2024-03-08 09:49:24 -05:00
chenyu
1130c73844 add FUZZ_NTH to fuzz_linearizer (#3656)
* add FUZZ_NTH to fuzz_linearizer

also update tests in test_linearizer_failures to not just run on METAL

* update failures for HIP/HSA

* test_failure_21 LLVM PADTO
2024-03-08 09:16:49 -05:00
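
A minimal Python sketch of how a FUZZ_NTH-style selector can work, assuming the fuzzer iterates a list of kernel ASTs; the names here are illustrative, not tinygrad's actual fuzz_linearizer code:

    import os

    # when FUZZ_NTH is set, run only the Nth kernel (useful to reproduce
    # a single failure out of a long fuzzing session)
    FUZZ_NTH = os.getenv("FUZZ_NTH")

    def fuzz_all(asts, fuzz_one):
      for i, ast in enumerate(asts):
        if FUZZ_NTH is not None and i != int(FUZZ_NTH):
          continue  # skip everything except the requested kernel
        fuzz_one(ast)
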
David Hou
9f66dcf718 PolynomialDecayWithWarmup + tests (#3649)
* working PolynomialDecayWithWarmup + tests.......

add lars_util.py, oops

* keep lars_util.py as intact as possible, simplify our interface

* whitespace

* clean up

* clean up

* asserts

* test polylr for full resnet training run

* add comment

* rename

* fix do_optim

* don't cast lr

* info

* calculate from train_files

* skip it
2024-03-07 18:53:36 -05:00
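
The schedule above combines a linear warmup with polynomial decay. A hedged sketch of the usual formula (as used in MLPerf-style ResNet training); parameter names and defaults are illustrative, not tinygrad's PolynomialDecayWithWarmup API:

    # learning rate at a given step: linear warmup to base_lr, then
    # polynomial decay from base_lr down to end_lr over decay_steps
    def poly_lr(step, base_lr, end_lr=0.0, warmup_steps=1000, decay_steps=10000, power=2.0):
      if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps           # linear warmup
      frac = min((step - warmup_steps) / decay_steps, 1.0)   # decay progress in [0, 1]
      return (base_lr - end_lr) * (1 - frac) ** power + end_lr
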
chenyu
57df8e8d82 update fuzz_linearizer (#3648)
included non-reduce kernels and kernels with variables. green msg when everything passes.
it's possible that creating rawbufs fails due to a memory error; included that in the failure cases.
2024-03-07 18:41:22 -05:00
chenyu
906cc3a69b cleanup tests Device[Device.DEFAULT] is always Compiled (#3645) 2024-03-07 11:15:42 -05:00
qazal
bdd62c7fd8 make the bf16 include dynamic (#3642)
* dynamic prefix

* add common ones above

these are common dtypes

aesthetics

* regression test

fuzz it

test

* run in CI

* use .append

* faster
2024-03-07 10:31:35 -05:00
David Hou
0afaf70d57 lars optimizer + tests (#3631)
* lars optimizer + tests

* fix skip list!

* use id to compare in skip list

* go back to using set

* Tensor(bool) * Tensor(bool) is and

* don't lint external/mlperf_resnet

* whitespace

* add external_test_optim to opencl tests

* give mlperf task a name

* mlperf under onnx

* remove track_gnorm

* contiguous instead of realize

* assert momentum and weight decay positive

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-06 18:11:01 -05:00
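
LARS scales each layer's update by a trust ratio derived from the norms of that layer's weights and gradients. An illustrative single-tensor step (after You et al., 2017), assuming numpy; this is a sketch of the algorithm, not tinygrad's Lars class:

    import numpy as np

    def lars_step(w, g, v, lr=1e-3, momentum=0.9, weight_decay=1e-4, trust_coeff=1e-3):
      g = g + weight_decay * w                    # fold weight decay into the gradient
      w_norm, g_norm = np.linalg.norm(w), np.linalg.norm(g)
      # layer-wise trust ratio: shrink the step for layers where ||g|| >> ||w||
      trust = trust_coeff * w_norm / g_norm if w_norm > 0 and g_norm > 0 else 1.0
      v = momentum * v + trust * lr * g           # momentum buffer with scaled step
      return w - v, v
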
George Hotz
8500265561 this mem fault still happening (#3620)
* this mem fault still happening

* smaller

* that print doesn't work

* overflows test

* hip doesn't uses_ptr_arithmetic

* only with locals

* test overflow new name

* it's not ptr arith

* simpler

* simple repro

* old compiler

* simpler

* put that back
2024-03-05 10:39:32 -08:00
George Hotz
f500be1313 out of bounds access caused by launch bounds (#3615)
* lin overflow

* remove launch bounds

* remove launch bounds infra

* oops, fix bufs type
2024-03-05 06:34:00 -08:00
Francis Lam
162dfb07d9 fuzz_linearizer: fix uops and add to test.yml (#3588) 2024-03-02 15:03:42 -08:00
George Hotz
83530a585f add quick external data select test 2024-03-02 05:38:32 -08:00
chenyu
d89e3c4e08 enable METAL tests now runner is M1 and no fast-math (#3523) 2024-02-28 14:14:23 -05:00
Francis Lam
11da65bccd test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option (#3455)
* test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option

this allows us to limit the size of the kernel and reduce running
times by avoiding ones that take a long time

* fix spacing and re-order to put parameters together
2024-02-27 07:34:59 -05:00
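
A hedged sketch of how a FUZZ_MAX_SIZE-style filter can be applied, assuming kernel size is measured as the product of the output shape; the real option's exact metric may differ:

    import os

    FUZZ_MAX_SIZE = int(os.getenv("FUZZ_MAX_SIZE", "0"))  # 0 means no limit

    def small_enough(shape):
      size = 1
      for dim in shape: size *= dim
      return FUZZ_MAX_SIZE == 0 or size <= FUZZ_MAX_SIZE
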
chenyu
30f26279c5 add back "CPU" in test_onnx_backend supports_device (#3426)
the onnx tests were all skipped.
2024-02-16 00:49:30 -05:00
xarkes
28a8b72024 Remove Interpreted device & remaining CPU/TORCH ref (#3423)
* Remove Interpreted device & remaining CPU/TORCH ref

* Oops

* supports_device was useful

* Fix doc wording

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-02-16 00:30:21 -05:00
George Hotz
b1c0d8c99d remove cpu and torch backends (#3399)
* remove cpu and torch backends

* don't copy to cpu

* use clang instead of cpu

* multitensor gathers on the first device

* clang is cpu + use default

* fixup

* bugfix
2024-02-15 16:55:39 +01:00
George Hotz
a40df14fef ops_ext to replace cpu import (#3409)
* ops_ext to replace cpu import

* don't allow zero copy with as buffer

* memoryview(bytearray

* reenable test

* fix jit issue
2024-02-15 13:03:42 +01:00
George Hotz
6356474d6d Revert "ops_ext to replace cpu import (#3406)" (#3408)
This reverts commit 91eb93f85a.
2024-02-15 12:16:10 +01:00
George Hotz
91eb93f85a ops_ext to replace cpu import (#3406)
* ops_ext to replace cpu import

* don't allow zero copy with as buffer

* memoryview(bytearray

* reenable test
2024-02-15 12:14:58 +01:00
chenyu
078a2603d5 set metal fast math default to 0 (disabled) (#3370)
* set metal fast math default to 0 (disabled)

It's a correctness fix because we use inf and nan. Let's see how slow it is

* skip failed onnx tests

* tmp DISABLE_COMPILER_CACHE=1 in metal benchmark

* Revert "tmp DISABLE_COMPILER_CACHE=1 in metal benchmark"

This reverts commit 22267df380.
2024-02-14 11:42:33 +01:00
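
Why fast math is a correctness issue when inf and nan are used as real values: fast-math modes let the compiler assume algebraic identities that fail for these, e.g. x*0 == 0 or x == x. A small Python illustration of the identities in question:

    import math

    x = float("inf")
    print(x * 0)                          # nan, not 0
    print(float("nan") == float("nan"))   # False; fast math may assume x == x
    print(math.isinf(x + 1))              # True; inf must propagate, not be folded away
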
George Hotz
2e60012bcf move create schedule and delete old API (#3377)
* move create schedule and delete old API

* fix test multitensor
2024-02-12 18:10:45 +01:00
George Hotz
41efaa848c move graph.py and jit.py into features (#3376)
* move graph.py into features

* move jit into features

* fix quickstart
2024-02-12 17:34:34 +01:00
chenyu
f798b60338 add METAL_FAST_MATH env var to disable metal fast math (#3369)
* env var METAL_FAST_MATH to disable fastmath for metal

use this to test the impact of fast math. you might need to disable the compiler cache with DISABLE_COMPILER_CACHE

* failed onnx test with fast math

METAL_FAST_MATH=0 DISABLE_COMPILER_CACHE=1 NOOPT=1 python -m pytest -n=auto test/external/external_test_onnx_backend.py -k test_MaxPool3d_stride_padding_cpu
2024-02-11 04:26:09 -05:00
chenyu
c151131d1b update onnx tests that no longer fail on CI (#3353)
was debugging fast math and it turned out these pass on CI now. more likely a bug in CI
2024-02-08 21:19:00 -05:00
Francis Lam
2266152b28 linearizer: added FUZZ_BEAM to fuzz_linearizer and additional tests (#3340)
Fixed test_tensor_core_opts to test all the TCs.

Added commented out failing tests in test_color_shapes_with_local.
2024-02-08 16:12:58 +01:00
chenyu
30a3288c4a touchup canonicalize empty mask (#3308)
empty list -> None. also added env SEED for fuzz_shapetracker_math
2024-02-03 21:05:10 -05:00
chenyu
7816c3b692 onnx update for trilu and argmax (#3283)
* support 0 in shape for tril and triu

* select_last_index for ArgMax and ArgMin

* pass **kwargs
2024-01-30 18:39:16 -05:00
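
ONNX's select_last_index attribute changes tie-breaking for ArgMax/ArgMin: when the extreme value is tied, the last index along the axis is returned instead of the first. A numpy sketch of the semantics (illustrative, not the onnx_ops implementation):

    import numpy as np

    def argmax_last(x, axis=0):
      # argmax over the flipped axis finds the last occurrence of the max
      idx = np.argmax(np.flip(x, axis=axis), axis=axis)
      return x.shape[axis] - 1 - idx
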
George Hotz
247a8a2a6c add canonicalization to View.create (#3280)
* Reapply "take merge views from corsix branch" (#3278)

This reverts commit d298916232.

* reintroduce merge views

* update second any

* isinstance -> not

* 25% less same but unequal
2024-01-30 10:26:48 -08:00
George Hotz
d8f6280ffb hotfix: add CHECK_NEQ to fuzz_shapetracker_math 2024-01-30 10:07:54 -08:00
George Hotz
c4d870db0d fix jit realize issue (#3258) 2024-01-26 18:27:35 -08:00
chenyu
bc92c4cc32 onnx Einsum, CumSum, DepthToSpace, SpaceToDepth (#3252)
* onnx Einsum, CumSum, DepthToSpace, SpaceToDepth

Einsum inner product and `...` are not supported

* --durations=20
2024-01-26 10:47:53 -05:00
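
DepthToSpace and SpaceToDepth are pure data-movement ops, so they are typically expressed as a reshape plus a transpose. An illustrative numpy version of DepthToSpace in DCR mode (a sketch of the semantics, not the onnx_ops code):

    import numpy as np

    def depth_to_space(x, bs):
      # x: (N, C, H, W) with C divisible by bs*bs
      n, c, h, w = x.shape
      x = x.reshape(n, bs, bs, c // (bs * bs), h, w)
      x = x.transpose(0, 3, 4, 1, 5, 2)          # interleave blocks into spatial dims
      return x.reshape(n, c // (bs * bs), h * bs, w * bs)
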
chenyu
e45ffdb6cf cleanup onnx (#3249)
* add onnx test_reduce_log_sum_exp

* more reuse

* more

* stuff

* good CenterCropPad

* imports

* good ArrayFeatureExtractor

* pretty good Pad

* stuff

* stuff

* onnx.py

* Atan

* pass int8 test

* dtype related

* fastmath stuff

* Resize linear

* fix CI

* move back
2024-01-25 20:39:59 -05:00
nimlgen
f87ecbb0f3 fuzzer validates outputs + (partially) oob accesses (#3178)
* fuzzer validates outputs + (partially) oob accesses

* +random

* oob check only for compiled

* type cmp fixes

* fix zeroing

* no prints

* add seed
2024-01-19 13:34:51 -05:00
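
One common way to catch (some) out-of-bounds writes, as a hedged sketch: surround each buffer with sentinel "redzones" and verify them after the kernel runs. The fuzzer's actual mechanism may differ:

    import numpy as np

    REDZONE, PAD = 0xDEADBEEF, 64

    def alloc_with_redzone(n_elems):
      buf = np.full(n_elems + 2 * PAD, REDZONE, dtype=np.uint32)
      return buf, buf[PAD:PAD + n_elems]   # full allocation, usable view

    def check_redzone(buf):
      # any write that strayed past the usable view corrupts a sentinel
      assert (buf[:PAD] == REDZONE).all() and (buf[-PAD:] == REDZONE).all(), "OOB write detected"
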
chenyu
1b508e0f71 fix fuzz_linearizer toCPU to as_buffer (#3158) 2024-01-17 13:18:46 -05:00
chenyu
537fb8b0b8 separate try except blocks in onnx2torch in model benchmark (#3126)
exceptions can be raised from either model conversion or an individual backend failure. openpilot on torch mps works, but does not work with torch cpu.
separate the exception blocks so that the benchmark can include torch mps for openpilot.
2024-01-15 00:39:33 -05:00
George Hotz
ac3f246c11 cached size (#3060)
* cached size

* simplify simplify

* 0 doesn't have base

* fix test

* cleaner cache

* hmm, metal is flaky on this...might be real(ish) but useless as test

* short circuit reshape/expand properly

* better reshape bypass
2024-01-09 16:37:37 -08:00
George Hotz
2c6f2e899d No extra vars call (#3054)
* remove unused reciprocal

* comment

* remove unneeded call to vars

* free speedup
2024-01-09 09:52:58 -08:00
chenyu
19298e7a3f Device._buffers -> Device._devices (#3052)
backend devices used to be called buffers
2024-01-08 21:30:38 -05:00
George Hotz
2a2d3233d2 add test that the compiler isn't used (#3025)
* add test that the compiler isn't used

* one print_tree

* improve speed with st size cache

* switch to gpt-2
2024-01-05 17:24:01 -08:00
geohotstan
57817028bb removed redundant dtype hacks in onnx_ops (#2939)
* updated most dtype hacks in onnx_ops

* temporarily revert dequantizelinear change

* I think this is right...

* MORE FIXES WOOOO NEW DTYPE IS AWESOME

* ok

* oops missed a print

* half -> float32 for CI

* is npdtype

* some more

* fix if ordering

* more clean ups

* final cleanups

* casting to half not allowed

* k nvm

* revert ArgMax change

* only GPU

* llvm begone

* teeny tiny change

* fix: attempt to add cast tests

* try this

* fix dequantizelinear

* revert some stuff

* tests pass pls

* less lines in onnx_tests

* oops missed string tensor tests

* clean up

* try: revert default behavior changes

* fix: disabled Cast and Castlike tests

* docs: small changes

* fix: fixed isNaN op and enabled associated tests

* fix: forgot about float16

* done

* update disabled test

* gah missed another float16

* disable rest of failing tests

* rm extra line

* try...

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-01-04 01:45:24 -05:00
chenyu
58d3d5030b vars_from_ast -> LazyOp.vars (#2965) 2024-01-01 18:12:38 -05:00
George Hotz
a280cfe169 move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
George Hotz
c81ce9643d move globalcounters to ops (#2960)
* move globalcounters to ops

* missed a few

* sick of that failing
2024-01-01 14:21:02 -08:00
George Hotz
56f44bd10e move the compiler cache to be global (#2957)
* move the compiler cache to be global

* remove non robust test

* remove dead code
2024-01-01 10:59:56 -08:00
chenyu
1fb815e77e hotfix fix coder. RMSNorm cannot have float16 input (#2932)
* hotfix fix coder. RMSNorm cannot have float16 input

* update real world test due to new kernels

* more type casts
2023-12-25 02:28:11 -05:00
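
The reason RMSNorm breaks on float16 input: squaring the activations overflows float16's ~65504 max, turning the mean into inf. A hedged numpy sketch of the usual fix, upcasting inside the norm (illustrative, not the exact hotfix):

    import numpy as np

    def rms_norm(x, weight, eps=1e-6):
      x32 = x.astype(np.float32)                 # avoid float16 overflow in x*x
      rms = np.sqrt((x32 * x32).mean(axis=-1, keepdims=True) + eps)
      return (x32 / rms * weight).astype(x.dtype)
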
chenyu
50927defad s/lazydata.realized/lazydata.base.realized/g (#2914)
* s/lazydata.realized/lazydata.base.realized/g

* not that
2023-12-22 14:45:13 -05:00
chenyu
50cfb1fb3a update onnx model links (#2908)
updated in https://github.com/onnx/models/pull/644
2023-12-22 00:19:41 -05:00
chenyu
1bbeb3fe2f remove the different rtol / atol for openpilot CUDA in benchmark (#2907)
not sure what the issue was but seems to be fixed on master
2023-12-21 22:23:39 -05:00
chenyu
5bf43c9634 reenable one onnx test failed due to dtype (#2902) 2023-12-21 15:50:02 -05:00