tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-10 15:38:29 -05:00

Author	SHA1	Message	Date
chenyu	aa76d566c2	cleanup mamba (#4004) make it read nicer and cleanup some movement methods and math simplification. 790m, 1.4b, 2.8b model does not really run. sampling is not implemented. jit is incorrect. some deadcode / wrong code path and copied from torch stuff stuff.	2024-03-30 02:50:13 -04:00
George Hotz	f35f9d32f2	rename mlops to function (#4003)	2024-03-29 21:49:00 -07:00
chenyu	c71627fee6	move GlobalCounter to helpers (#4002) break circular import between ops and buffer	2024-03-30 00:30:30 -04:00
George Hotz	9eef44521b	ScheduleItem uses Buffer (#3995) * schedule Buffer * update * update tests * master * works * remove LoadOps.WAIT * fix compile2 * bad test * rename and note	2024-03-29 20:50:27 -07:00
George Hotz	1bd4f01da2	size instead of st.size (#4001)	2024-03-29 19:59:02 -07:00
George Hotz	8f1e34a2a0	early src delete (#3996) * early src delete * fix bad test * fix test_linearizer	2024-03-29 19:46:07 -07:00
Szymon Ożóg	31c8ba8b84	Move transformations to PatternMatcher + clean up existing patterns (#3997)	2024-03-29 19:42:39 -07:00
George Hotz	f916aadaea	external that test	2024-03-29 19:35:50 -07:00
George Hotz	c42ed8e99c	don't reschedule	2024-03-29 19:17:37 -07:00
chenyu	ecf38f498e	beam search resnet eval too in BENCHMARK (#4000)	2024-03-29 21:07:23 -04:00
chenyu	b43e470f80	always use f32 for rand source of randn (#3998) * always use f32 for source of randn fixed bfloat16 randn to not have inf. don't really care about float64. threefry is float32 based too * HSA is broken	2024-03-29 17:04:34 -04:00
chenyu	6b6461122e	test case Tensor.randn should be finite (#3994) * test case Tensor.randn should be finite there's a hack to fix float16, need a generic solution that works with bf16 and threefry * skip not supported * bfloat16 local is wrong * skip RHIP	2024-03-29 14:51:02 -04:00
chenyu	d9ff636cf5	use is to compare with enum (#3993) * use is to compare with enum currently it's mixed between `==` and `is`, moved all to `is` * more	2024-03-29 13:02:56 -04:00
Akshit Talwar	0affbbf81c	update amx gemm (#3991)	2024-03-29 11:45:03 -04:00
chenyu	4abb8245a6	rhs_order in einsum is argsort twice (#3990) * rhs_order in einsum is argsort twice * comment	2024-03-29 11:42:04 -04:00
chenyu	7bc560ec49	remove outdated bf16 comments in test_dtype (#3987)	2024-03-29 00:56:18 -04:00
uuuvn	8a40d7d423	Shape changing bitcast and assert bitcast in disk (#3973) * Shape changing bitcast * only support it on disk * basic test * more tests * RuntimeError instead of assert * create unique temp files * move tests that use disk to test_disk_tensor * linter * remove assert on error messages * that's RuntimeError now --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-28 21:49:10 -07:00
chenyu	793ab0512e	use ctypes to truncate float64 and float32 in uops (#3986) this fixed the softmax.argmax bug for ops_python as the float is truncated to float32	2024-03-28 23:56:50 -04:00
chenyu	101a0c683d	use ctyles for uops truncate (#3985)	2024-03-28 23:31:34 -04:00
George Hotz	1bf0a7a2d1	move assign logic into lazy.py (#3984) * move assign logic into lazy.py * don't check the buffer	2024-03-28 20:26:38 -07:00
chenyu	3fee689ded	fix ops_python for test_uops (#3982)	2024-03-28 22:48:55 -04:00
George Hotz	9a6ac2a50a	create the buffer with the LazyBuffer (#3977) * create the buffer with the LazyBuffer * fixes * hack underlying buffer when we change dtype * we only care about allocated buffers * asserts	2024-03-28 19:31:28 -07:00
chenyu	c4c243f79d	update test_uops _equal to use assert_allclose (#3981) it handles nan	2024-03-28 22:14:45 -04:00
reddyn12	9b5e15db6e	Mamba Implementation (#3456 ) * first commit * state back to orig * mamba comparisions * rm file * rename file * use Tensor.einsum and mke default model 370M * Cleaned code and made a comparision test * Simplyfy pull request. Only has 1 mamba implementation now. * Update prompt * rm whitespaces * last space * remove Einops dependency * rm unused code * add tests * rm print statement * rm imports * skip CLANG * Update skipIf description * skip model test in CI and add CLANG fix * rm Device import * don't be stupid * Fix conv assign When the prompt is too short, the logic for conv_state assign messes up. This can be fixed when padding the tokenized array to min length of 4. I padded using the empty string token, but idk if proper practice is to use the PAD token * fix p1 * temp * fix jit import --------- Co-authored-by: schlimeszn <schlimeszn@gmail.com> Co-authored-by: reddyn <nikidsniper@gmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-28 17:49:12 -07:00
George Hotz	d085837179	hotfix: that mem_used was in the wrong place	2024-03-28 17:09:04 -07:00
chenyu	1fa0351acb	fix DEFINE_ACC invalid_value to have same type as localtype (#3980)	2024-03-28 19:21:17 -04:00
chenyu	b47f6cebb2	LinearizerOptions -> CompilerOptions (#3978)	2024-03-28 17:50:23 -04:00
qazal	2bfb1d3e39	dynamic assign idx (#3975)	2024-03-28 13:59:32 -07:00
George Hotz	2cfcb5623a	hotfix: d was removed from buffer	2024-03-28 13:39:02 -07:00
George Hotz	42b9d999ea	Buffer isn't always allocated (#3974) * buffer alloc * allocate * missing allocates * last one	2024-03-28 13:33:47 -07:00
George Hotz	9c03fe3e5d	hotfix: ShapeTracker no longer has import cycle	2024-03-28 10:34:23 -07:00
chenyu	bfcaa2f70e	assert `__setitem__` if used other than disk (#3972) * assert `__setitem__` if used other than disk * that is not implemented	2024-03-28 12:16:38 -04:00
David Hou	4b95350c41	fp16 resnet (without expand backwards sum in float, doesn't work) (#3816 ) * fp16 resnet * cast running mean and var back to default float * extra cast * check symbolic no overflow * add linearizer failure * loss scaler after grad contig * oops * i think this works * don't loss scale fp32 * remove overflow test case * remove symbolic bounds check * loss scaler should be float * temporarily disable padto cuz bug shruggie * make running stats in batchnorm float32? * calculate lars stuff in fp32? * oops * remove most changes * move loss scaler out of optimizer * no more FP16 var * oops --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-28 01:25:37 -04:00
George Hotz	607b4a7d70	remove buffer read, save lines (#3969)	2024-03-27 22:02:47 -07:00
chenyu	80116be9a5	for loop to generate hip math functions for different floats (#3967) * for loop to generate hip math functions for different floats * slightly nicer	2024-03-27 23:24:29 -04:00
qazal	03d129baa8	inputs -> membufs (#3964)	2024-03-27 17:34:39 -07:00
Francis Lam	16a1d43f6f	llama: prevent device initialization outside of __main__ (#3966) * llama: prevent device initialization outside of __main__ causes HSA resources leakages in child compile processes * llama: fix loading with multiple devices	2024-03-27 19:19:38 -04:00
Francis Lam	7c5729a3bd	wmma: refactor to remove wmma_func and create TC funcs as needed (#3945) * wmma: refactor to remove wmma_func and create TC funcs as needed * test_linearizer: disable bf16 CUDA during emulation testing * cstyle: clean up creation of CUDA vec dtypes * extra/gemm: add option to accumulate to bfloat16 * cleanups * benchmark: add CUDA bfloat16 matmul * more cleanups	2024-03-27 16:43:09 -04:00
chenyu	88b24df40a	touchup remove `float()` in cstyle render_const for float64 (#3962)	2024-03-27 16:08:28 -04:00
qazal	27af37f2ad	misc: remove unused env vars (#3963) * remove unused env vars * delete CPU	2024-03-27 16:08:15 -04:00
George Hotz	60639cccac	hotfix: RuntimeError for assign	2024-03-27 11:18:48 -07:00
qazal	9fb573d73c	DAG cycle asserts (#3955) * assert cycles * these are cycle errors * flip to positive	2024-03-27 11:09:59 -07:00
geohotstan	bd3a7d068c	correct device for validation test in model benchmark CI (#3960) * fix tests * add clang back for only metal * change the name to reflect CLANG being ran * add back cuda	2024-03-27 13:40:06 -04:00
George Hotz	eec2b00edc	change kernel name if it's multioutput (#3958)	2024-03-27 08:42:57 -07:00
George Hotz	d1c957a471	copy back to clang (#3951) * copy back to clang * force the copy for CLANG device	2024-03-27 08:13:01 -07:00
P4ssenger	332c82893a	Remove redundant check on device (#3957) * call self.nbytes * device is canonicalized, therefore, it cannot be None	2024-03-27 07:54:33 -07:00
chenyu	6c7df1445b	enforce UOps.CONST arg has python type based on dtype (#3952) added an assert in uops, remove the cast in renderer	2024-03-27 01:41:38 -04:00
George Hotz	91f3326c0b	hotfix: increase recursion limit	2024-03-26 21:26:54 -07:00
George Hotz	68ca4d4276	split to schedule.py (#3949) * split to schedule.py * split	2024-03-26 21:02:46 -07:00
George Hotz	da07f31fd4	hotfix: remove bf16 test entirely	2024-03-26 20:50:27 -07:00

4009 Commits