tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-14 08:35:17 -05:00

Author	SHA1	Message	Date
chenyu	7c80b78be9	cleanup gpt2 build function (#3018)	2024-01-04 23:14:53 -05:00
chenyu	55e52abeba	minor cleanup of matvec in hand_coded_optimizations (#3015) remove noop isinstance check and fix long lines	2024-01-04 19:43:49 -05:00
chenyu	f88506e630	move gpt2/llama sampling inside the model call (#3013) * move gpt2/llama sampling inside the model call * argmax uses one more kernel	2024-01-04 17:01:50 -05:00
George Hotz	c2a044ed83	disk_read_speed example	2024-01-04 13:59:43 -08:00
Yixiang Gao	8a63f26a0f	make LR scheduler work with multigpu (#3011) * add a failing test for LR scheduler when using multigpu * fix calculation order and unnecessary tensor created for float * min_lr is no longer tensor	2024-01-04 12:10:56 -08:00
chenyu	8524493748	minor gpt2 cleanup (#3012)	2024-01-04 13:53:18 -05:00
chenyu	2b6670d2ea	separate entry for HALF hlb_cifar10 in benchmark (#3010)	2024-01-04 13:24:10 -05:00
chenyu	5337211058	llvm CMPEQ	2024-01-04 13:12:22 -05:00
chenyu	b8c30eb358	no midcast MULACC for llvm	2024-01-04 13:12:22 -05:00
chenyu	91665ef143	rewrite MUL CAST SUM to CAST MULACC	2024-01-04 13:12:22 -05:00
chenyu	ab7dfd637b	use float for acc dtype for half tensor sum we previously only upcast uint and int, and half was using half for acc. change to acc in float for precision. but cast the result back to half to match torch/jax output dtype	2024-01-04 13:12:22 -05:00
chenyu	6fa285b943	touchup onnx xor and not (#3008)	2024-01-04 02:02:42 -05:00
geohotstan	57817028bb	removed redundant dtype hacks in onnx_ops (#2939) * updated most dtype hacks in onnx_ops * temporarily revert dequantizelinear change * I think this is right... * MORE FIXES WOOOO NEW DTYPE IS AWESOME * ok * oops missed a print * half -> float32 for CI * is npdtype * some more * fix if ordering * more clean ups * final cleanups * casting to half not allowed * k nvm * revert ArgMax change * only GPU * llvm begone * teeny tiny change * fix: attempt to add cast tests * try this * fix dequantizelinear * revert some stuff * tests pass pls * less lines in onnx_tests * oops missed string tensor tests * clean up * try: revert default behavior changes * fix: disabled Cast and Castlike tests * docs: small changes * fix: fixed isNaN op and enabled associated tests * fix: forgot about float16 * done * update disabled test * gah missed another float16 * disable rest of failing tests * rm extra line * try... --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-01-04 01:45:24 -05:00
chenyu	9f39165188	correct (dtype, device) in test_dtype.is_dtype_supported (#3007) corrected dtypes for TORCH and float64 support	2024-01-04 00:25:37 -05:00
chenyu	ae112c9dbe	fix some long lines in tests (#3006) * fix some long lines in tests * better	2024-01-03 23:53:33 -05:00
George Hotz	7e191fbb86	hotfix: don't jitcache with 1 kernel. improvements to hip sniffer	2024-01-03 19:17:08 -08:00
George Hotz	bcc1aa21ac	make disk simpler (#3002) * make disk simpler * upd ops_disk * works on osx too * revert ops_hip	2024-01-03 17:46:21 -08:00
George Hotz	9699c8c90b	don't alloc for InterpretedASTRunner (#2999)	2024-01-03 17:05:53 -08:00
chenyu	bca0b95ee3	bump shapetracker simplify message to DEBUG >= 5 (#2998)	2024-01-03 20:00:36 -05:00
chenyu	74a30431b4	replace `d[a] if a in d else b` with `d.get(a, b)` (#2997)	2024-01-03 18:10:25 -05:00
chenyu	74cc6fd3c2	remove AndNode.__floordiv__ special case (#2996) * remove AndNode.__floordiv__ AndNode produces a Node that min/max is bounded by [0, 1] so `//` on top of that is almost always 0. we don't really use that either * keep the test	2024-01-03 17:44:55 -05:00
George Hotz	a0c7cb2564	hotfix: create weights dir in local tg checkout	2024-01-03 14:14:33 -08:00
George Hotz	fc36a7d669	tinygrad weights	2024-01-03 14:09:28 -08:00
chenyu	1ac4d27869	remove VariableOrNum from Node.substitute arg (#2995) having NumNode in var_vals does not change the substitute output	2024-01-03 17:02:25 -05:00
George Hotz	65dc3700b7	hip device is default on supported platforms (#2993)	2024-01-03 13:42:13 -08:00
George Hotz	77c98a1543	hotfix: remove weights directory	2024-01-03 13:40:39 -08:00
George Hotz	0be0f2f745	remove stable diffusion test on tinymac	2024-01-03 13:18:24 -08:00
George Hotz	a354ec9dad	Revert "hotfix: HIP is the default device on HIP platforms" This reverts commit `b748b569f5`.	2024-01-03 13:16:54 -08:00
George Hotz	b748b569f5	hotfix: HIP is the default device on HIP platforms	2024-01-03 13:13:52 -08:00
George Hotz	753a7ecc05	Hip driver (#2992) * start hip driver * fix hip llama * make HIP default if we can * don't change those	2024-01-03 12:53:47 -08:00
George Hotz	f290ca3924	hotfix: save lines in graph	2024-01-03 12:03:42 -08:00
Yixiang Gao	bc4b6e758b	Merge pull request #2981 from g1y5x3/cifar_fp16 adjust div factor to avoid underflow for cifar in fp16	2024-01-03 11:15:42 -08:00
George Hotz	d7d5a487ad	hotfix: all device canonicalize should be done in Tensor	2024-01-03 10:48:04 -08:00
Yixiang Gao	ea3bc2f509	remove wino benchmark for now	2024-01-03 10:46:43 -08:00
Yixiang Gao	5663dd46b6	Merge branch 'master' of github.com:tinygrad/tinygrad into cifar_fp16	2024-01-03 10:11:46 -08:00
chenyu	81b97cd2c6	canonicalize device in LazyBuffer constructor (#2991) fixed the multitensor +1 then sum bug	2024-01-03 12:55:25 -05:00
chenyu	db525cf8c2	multitensor failed test case with +1 then sum on DEVICE:0 (#2990)	2024-01-03 12:17:11 -05:00
Yixiang Gao	7f1802cd50	update benchmark	2024-01-03 09:09:34 -08:00
George Hotz	5dbaaa7061	hotfix: make multitensor shard contiguous	2024-01-03 08:48:30 -08:00
chenyu	590268fa03	out_tokens -> grouped in linearizer (#2989) no more token now	2024-01-03 11:45:28 -05:00
Yixiang Gao	8e1fd6ae9d	test works	2024-01-03 07:22:01 -08:00
Yixiang Gao	4f89f8b73a	make sure the old hyp breaks the test	2024-01-03 07:13:54 -08:00
Yixiang Gao	84eb6dd32a	skip GPU cause opencl on intel can't compile half	2024-01-03 07:07:21 -08:00
Yixiang Gao	73879b50ad	only need to check the min_lr for the nan bug	2024-01-03 07:00:50 -08:00
Yixiang Gao	99f8740c60	running half in CI CPU is slow	2024-01-02 18:44:35 -08:00
Yixiang Gao	781690fd99	how long it takes on CI CPU without the lr scheduler	2024-01-02 18:33:48 -08:00
Yixiang Gao	dd00bcb9c0	fix whitespace	2024-01-02 18:16:33 -08:00
Yixiang Gao	841487cad9	add half test with using hyp from benchmarks	2024-01-02 18:14:30 -08:00
George Hotz	f494b9d463	simple multitensor API (#2903) * simple multitensor API * test multitensor * mt work * new api * copies * all but data parallel * allreduce there * works, but axis sharded * fix all mt tests * features/multi * work * backprop * fix tests * tests passing * mt progress * cleanups * less lines * tensor cleanup * save more lines * mypy passes * fix tests * skip for cuda too * bump download cache	2024-01-02 17:49:44 -08:00
George Hotz	5522ba234b	simplify image functions (#2987) * simplify image functions * line in tensor	2024-01-02 17:35:08 -08:00

4147 Commits