* add a failing test for LR scheduler when using multigpu
* fix calculation order and avoid an unnecessary tensor being created for a float
* min_lr is no longer a tensor
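A minimal sketch of the idea in the LR scheduler commits above (the class and attribute names are illustrative, not the actual extra/lr_scheduler code): keep `min_lr` as a plain Python float so no extra tensor has to be created or sharded across GPUs, and only write the final value back into the optimizer's existing `lr` tensor.

```python
# hypothetical sketch, not tinygrad's real scheduler
class ReduceLROnPlateauSketch:
  def __init__(self, optim, factor=0.1, min_lr=1e-6):
    self.optim, self.factor = optim, factor
    self.min_lr = min_lr  # stays a plain float, never wrapped in a Tensor

  def reduce_lr(self):
    # do the math in Python floats, then write once into the optimizer's lr tensor
    new_lr = max(self.optim.lr.numpy().item() * self.factor, self.min_lr)
    self.optim.lr.assign(self.optim.lr * 0 + new_lr)
```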
we previously only upcast uint and int, and half was accumulating in half.
change to accumulating in float for precision, but cast the result back to half to match the torch/jax output dtype
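A numpy illustration of the accumulation change described above (not the tinygrad kernel code): summing float16 inputs with a float16 accumulator loses precision badly, while accumulating in float32 and casting the result back to float16 keeps the torch/jax output dtype with a far more accurate value.

```python
import numpy as np

x = np.full(100_000, 0.1, dtype=np.float16)

acc_half = np.float16(0)
for v in x: acc_half += v            # half accumulator: stalls once 0.1 falls below half an ulp

acc_float = np.float16(x.sum(dtype=np.float32))  # float accumulator, result cast back to half

print(acc_half, acc_float)           # acc_half is far below the true sum; acc_float is ~10000
```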
* updated most dtype hacks in onnx_ops
* temporarily revert dequantizelinear change
* I think this is right...
* MORE FIXES WOOOO NEW DTYPE IS AWESOME
* ok
* oops missed a print
* half -> float32 for CI
* is npdtype
* some more
* fix if ordering
* more clean ups
* final cleanups
* casting to half not allowed
* k nvm
* revert ArgMax change
* only GPU
* llvm begone
* teeny tiny change
* fix: attempt to add cast tests
* try this
* fix dequantizelinear
* revert some stuff
* tests pass pls
* less lines in onnx_tests
* oops missed string tensor tests
* clean up
* try: revert default behavior changes
* fix: disabled Cast and CastLike tests
* docs: small changes
* fix: fixed IsNaN op and enabled associated tests
* fix: forgot about float16
* done
* update disabled test
* gah missed another float16
* disable rest of failing tests
* rm extra line
* try...
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* remove AndNode.__floordiv__
AndNode produces a Node whose min/max is bounded by [0, 1], so `//` on top of that is almost always 0.
we don't really use it either
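A plain-Python sketch of that reasoning (not the symbolic Node classes themselves): a value bounded by [0, 1] floordiv'd by any divisor of 2 or more is always 0, so `__floordiv__` on an AndNode adds nothing.

```python
# the only values an AndNode can take are 0 and 1
for and_value in (0, 1):
  for divisor in range(2, 10):
    assert and_value // divisor == 0  # always 0 for divisors >= 2
```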
* keep the test
* simple multitensor API
* test multitensor
* mt work
* new api
* copies
* all but data parallel
* allreduce there
* works, but axis sharded
* fix all mt tests
* features/multi
* work
* backprop
* fix tests
* tests passing
* mt progress
* cleanups
* less lines
* tensor cleanup
* save more lines
* mypy passes
* fix tests
* skip for cuda too
* bump download cache
* add Tensor.split (#2677)
* fix mypy errors
* add list support for Tensor.split
* fix ruff comments
* match tensor.split api
* simplify split and test_split
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
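A usage sketch of the `Tensor.split` added in the commits above, assuming it matches torch's `tensor.split` semantics as those commits state: an int gives equal-sized chunks along the chosen dimension, and a list gives chunks of exactly those sizes.

```python
from tinygrad.tensor import Tensor

t = Tensor.arange(10)
a, b = t.split(5)                 # two chunks of 5
x, y, z = t.split([3, 3, 4])      # chunks of sizes 3, 3, 4
print(a.shape, b.shape)           # (5,) (5,)
print(x.shape, y.shape, z.shape)  # (3,) (3,) (4,)
```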
the correct condition is that PADTO cannot be applied to a reduce axis, not that Reduce.MAX is in ops.
even for Reduce.SUM it's possible that the reduce axis had a div before it, so the padded 0 becomes inf and summing over it is incorrect.
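A numpy illustration of that reasoning (not the PADTO optimization itself): padding a reduce axis with zeros is harmless for a plain sum, but if a div sits between the pad and the reduce, the padded zeros become inf and the sum is wrong.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])
padded = np.pad(x, (0, 1))                  # pad the reduce axis to length 4 with a 0

print(padded.sum(), x.sum())                # 7.0 7.0 -> padding alone is fine
with np.errstate(divide="ignore"):
  print((1 / padded).sum(), (1 / x).sum())  # inf vs 1.75 -> a div before the reduce breaks it
```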
* return bool
* add tests to the type spec
* fix multinomial
* fix tril
* fix round
* fix NegativeLogLikelihoodLoss
* rm debug
* webgpu
* more webgpu
* bitwise or for adding two bools
* onnx ops dont need to cast anymore
* Revert "bitwise or for adding two bools"
This reverts commit b413babffa.
* workaround for metal neg
* just the tests in the type spec
* test dtypes of return values of cumsum, argmax/min, multinomial
cumsum behaves like sum, and functions that return an index return dtypes.default_int
* because webgpu is different
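A hedged sketch of the dtype behaviour those tests describe, assuming the current `from tinygrad import dtypes` import path: cumsum promotes like sum, and ops that return indices or samples return `dtypes.default_int`.

```python
from tinygrad.tensor import Tensor
from tinygrad import dtypes

print(Tensor([True, False, True]).cumsum().dtype)                     # promoted like sum, not bool
print(Tensor([1.0, 3.0, 2.0]).argmax().dtype == dtypes.default_int)   # True
print(Tensor([0.1, 0.9]).multinomial(1).dtype == dtypes.default_int)  # True
```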
* ww/Fixed Tensor.randint() to accept shape tuples
* ww/Wrote a test to cover this typo
* ww/Updated Tensor random functions to optionally take a shape tuple or unpacked *shape to be more consistent
* ww/no lint no worries
* ww/Made peace with linter
* ww/Added a new line; can't reduce line length without reducing readability
* ww/reverted to using .mul
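A usage sketch of the consistency change in the ww/ commits above, assuming `randint`'s torch-like `low`/`high` keywords: the random constructors accept either unpacked dims or a single shape tuple.

```python
from tinygrad.tensor import Tensor

a = Tensor.randint(2, 3, low=0, high=10)    # unpacked dims
b = Tensor.randint((2, 3), low=0, high=10)  # shape tuple, same result shape
print(a.shape, b.shape)                     # (2, 3) (2, 3)
```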
* space removal in formula and a single test to cover it
* space in torch einsum as well
* replace spaces in the formula variable to support stripping all spaces
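A sketch of what the space-handling commits above enable: an einsum formula written with spaces behaves the same as its compact form (internally this amounts to a `formula.replace(" ", "")` before parsing).

```python
from tinygrad.tensor import Tensor

a, b = Tensor.ones(2, 3), Tensor.ones(3, 4)
spaced  = Tensor.einsum("i j , j k -> i k", a, b)  # spaces in the formula
compact = Tensor.einsum("ij,jk->ik", a, b)         # compact form, same result
print(spaced.shape, compact.shape)                 # (2, 4) (2, 4)
```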
* better support for platform dependent flags
* osx test support
* removed unused import and made line length <150
* changed osx ci shm
* lstrip in case SharedMemory._name is passed
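A generic Python illustration of that last fix (not tinygrad's disk/shm code): the private `SharedMemory._name` keeps a leading `/` on POSIX platforms, so a name taken from there is normalized with `lstrip("/")` before being reused as a key.

```python
from multiprocessing import shared_memory

shm = shared_memory.SharedMemory(create=True, size=16, name="tiny_demo")
try:
  raw = shm._name          # may be "/tiny_demo" depending on the platform
  key = raw.lstrip("/")    # "tiny_demo" either way
  print(raw, "->", key)
finally:
  shm.close(); shm.unlink()
```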