George Hotz
09f2952dc3
reintroduce merge views in update benchmark (#3279)
* Reapply "take merge views from corsix branch" (#3278)
This reverts commit d298916232.
* reintroduce merge views
2024-01-30 09:47:20 -08:00
George Hotz
d298916232
Revert "take merge views from corsix branch" ( #3278 )
2024-01-30 09:34:28 -08:00
George Hotz
b57a16aa89
take merge views from corsix branch (#3273)
* take merge views from corsix branch
* better DEBUG
* max views
* remove view.py change
* Revert "remove view.py change"
This reverts commit f3025f4f39.
* only allow filter on non-symbolic
* oops, correct fix
* comment to explain
2024-01-30 09:25:16 -08:00
George Hotz
6a4a5dc79d
fix pad 0 size (#3277)
* fix pad 0 size
* put in view, not pad
* test was wrong
2024-01-30 08:58:10 -08:00
chenyu
b0a755288f
cifar EVAL_BS set default value to BS (#3274)
less compile time for eval due to the cache. 500 was also a slow, uneven batch size for 6 GPUs. eval time 5.9s -> 3.4s
2024-01-29 17:37:12 -05:00
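A minimal sketch of the defaulting pattern this commit describes, assuming tinygrad's getenv helper; the variable names and the 512 default are illustrative, not the exact training-script code:

```python
# Hypothetical sketch: default EVAL_BS to BS so eval reuses cached kernels.
from tinygrad.helpers import getenv  # tinygrad's env-var helper

BS = getenv("BS", 512)           # assumed training batch size for illustration
EVAL_BS = getenv("EVAL_BS", BS)  # same batch size as training -> compile cache hits
```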
Francis Lam
861d5ac224
wmma: fix the upcasts after WMMA to be hcopt ordering invariant (#3250)
will correctly handle any permutation of optops after the TC one
2024-01-29 11:51:57 -08:00
chenyu
af4ca85594
MultiLazyBuffer.reshape new_axis without real_strides (#3272)
similar to contraction, but this one is for finding the mapped single axis
2024-01-28 23:53:52 -05:00
chenyu
34c7621556
HIP=1 NOCLANG=1 for tinybox external_model_benchmark (#3270)
used HIP instead of GPU and disabled the slow CLANG backend
2024-01-28 22:05:26 -05:00
George Hotz
085dc87bed
winograd should be 4 kernels (#3268)
2024-01-28 09:21:26 -08:00
George Hotz
f48b6aca77
long running beam pool (#3267)
2024-01-28 08:06:03 -08:00
George Hotz
9e17378b60
Fix metal tests (#3266)
* small fixes for tests on mac
* remove device from TensorCore
2024-01-27 18:09:42 -08:00
Francis Lata
86748f4a8c
fix bbox format to be a list (#3265)
2024-01-27 17:54:19 -08:00
George Hotz
67a78615e5
uoptimizer (#3262)
* uoptimizer
* uops
* self.uoptimize
2024-01-27 10:26:04 -08:00
Hristo Georgiev
3ae811af21
tests for Tensor init data dtype and resulting dtype (#3247)
Co-authored-by: Hristo Georgiev <6043312+hristog@users.noreply.github.com>
2024-01-27 00:13:42 -08:00
George Hotz
3c728d1082
compiler support (#3260)
* compiler support
* revert that
* fix tests
2024-01-26 23:36:40 -08:00
Francis Lam
4273aabe31
extra/gemm: add a simple_conv.py along with correctness check (#3236)
* extra/gemm: add a simple_conv.py along with correctness check
The goal is to easily test tensor core triggering situations
* test: add tests for acc_dtype handling and fixed typing
2024-01-26 19:06:57 -08:00
George Hotz
0aad8d238b
rebuild ocelot (#3259)
* rebuild
* strip trailing whitespace
2024-01-26 18:46:36 -08:00
George Hotz
473935125a
use comgr to compile (#3248)
* use comgr to compile
* fast
* bfloat16
* move comgr to its own file
* cleaner style
* comgr in new place
* comgr free + dtype cleanup
2024-01-26 18:27:49 -08:00
George Hotz
c4d870db0d
fix jit realize issue (#3258)
2024-01-26 18:27:35 -08:00
chenyu
4197ef17c4
const cleanup with dtype.Scalar (#3257)
moved Scalar to dtype.py. assert in _broadcasted when y is a Scalar, and fix some tests
2024-01-26 21:16:22 -05:00
George Hotz
03a6bc59c1
move autogen to runtime/autogen (#3254)
2024-01-26 12:44:19 -08:00
George Hotz
a3869ffd46
move gpuctypes in tree (#3253)
* move gpuctypes in tree
* fix mypy
* regex exclude
* autogen sh
* mypy exclude
* does that fix it
* fix mypy
* add hip confirm
* verify all autogens
* build clang2py
* opencl headers
* gpu on 22.04
2024-01-26 12:25:03 -08:00
chenyu
bc92c4cc32
onnx Einsum, CumSum, DepthToSpace, SpaceToDepth (#3252)
* onnx Einsum, CumSum, DepthToSpace, SpaceToDepth
Einsum inner product and `...` are not supported
* --durations=20
2024-01-26 10:47:53 -05:00
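An illustrative use of Tensor.einsum, which the new ONNX Einsum op presumably lowers to; per the commit note above, inner products and `...` are the unsupported cases:

```python
from tinygrad import Tensor

# matmul spelled as an einsum; this form is within the supported subset
x, y = Tensor.rand(2, 3), Tensor.rand(3, 4)
out = Tensor.einsum("ij,jk->ik", x, y)
print(out.shape)  # (2, 4)
# not supported per the commit: inner product ("i,i->") and formulas using "..."
```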
chenyu
e45ffdb6cf
cleanup onnx (#3249)
* add onnx test_reduce_log_sum_exp
* more reuse
* more
* stuff
* good CenterCropPad
* imports
* good ArrayFeatureExtractor
* pretty good Pad
* stuff
* stuff
* onnx.py
* Atan
* pass int8 test
* dtype related
* fastmath stuff
* Resize linear
* fix CI
* move back
2024-01-25 20:39:59 -05:00
Ahmed Harmouche
168b1f879c
Fix hip_matmul gemm in extra (#3241)
2024-01-25 16:03:04 -08:00
George Hotz
7feeb118e6
hip launch speed (#3246)
* faster HIP kernel launch
* args
* expand compile_hip
2024-01-25 15:13:55 -08:00
George Hotz
cb372b053f
add device speed test (#3244)
2024-01-25 12:01:22 -08:00
geohotstan
d0e116c6d6
fix maximum/where Scalar casting (#3194)
* init
* test: added dtype tests for maximum
* fix: separate maximum const and maximum tensors
* fix: del useless line
* fix: some dtypes
* CODE GOLF: we golfing at mar-a-lago golf club tonight boyyyys
* fix: add lil helper function
* fix: some test refactoring
* done
* sike: not done yet lol
* wtf I missed an assert, am I drunk
* yeah idk
* fix: line save from redundant check
* revert: line save
* fix: simplify test_broadcast cuz I'm stumped
* change some test name
* fix: bool max bool works
* test: add a maximum bool test
* test: make sure minimum also works with bool
* fix: something like this? :s
* fix: maybe this?
* fix: how about this? tighter check
* fix: this.
* revert: nvm mul(0.5) and div(2) has the same kernel for backward
* fix: .is_floating_point() xD
* revert: maximum and minimum and add cast
* fix: cover negative const case in test
* fix: use eq because I don't understand clang :D
* WHOOOOPS
2024-01-25 12:26:04 -05:00
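A hedged illustration of the behavior this PR's tests pin down: taking maximum against a Python scalar should not silently change the tensor's dtype. The expected values are an assumption from the PR title, not quoted from its tests:

```python
from tinygrad import Tensor, dtypes

t = Tensor([1, 2, 3], dtype=dtypes.int32)
out = t.maximum(2)  # scalar const path, the case being fixed
print(out.dtype, out.numpy())  # expected: int32 [2 2 3], no float upcast
```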
geohotstan
3628bea910
fix: big round even rounder round (#3242)
* fix: big round even rounder round
* fix: variable name lol
* feat: 1 less potential cast
* consistent naming (I'm just spamming commits now)
* LOL MISSED ONNX ANOTHER COMMIT
* test: fix test_ops and remove _round
* test: tensor methods oops
2024-01-25 12:24:15 -05:00
chenyu
da5e27968c
failed test cases for Tensor.round (#3240)
it should round to even
2024-01-25 02:12:50 -05:00
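For context, "round to even" (banker's rounding) means halves go to the nearest even integer, as in numpy and torch; a minimal sketch of the expected behavior:

```python
from tinygrad import Tensor

# halves round to the nearest even integer, not away from zero
print(Tensor([0.5, 1.5, 2.5, -0.5]).round().numpy())  # expected: [ 0.  2.  2. -0.]
```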
geohotstan
b0b5eba535
fix _round in onnx_ops to look more like new Tensor.round (#3239)
* fix: _round in onnxops
* fix: minor things
* fix: no more n
* fix: smol
* fix: smoller
2024-01-25 01:18:58 -05:00
George Hotz
aa0d1b6330
hotfix: don't use noqa: E702 that's just dumb
2024-01-24 20:01:00 -08:00
George Hotz
b92945c98d
hotfix: DEBUG >= 2 for kernels
2024-01-24 23:55:17 +00:00
George Hotz
a8fbb03438
minor hip cleanups (#3237)
2024-01-24 15:13:38 -08:00
nimlgen
3205fd8481
fix cuda device var rewrite (#3233)
2024-01-24 16:57:49 -05:00
George Hotz
ed8a32722a
hip mutex signal (#3234)
* hip mutex
* hip mutex 2
* sync
2024-01-24 13:23:09 -08:00
George Hotz
47f9887ce4
hip events work (#3229)
* hip events work
* event
2024-01-24 11:49:53 -08:00
George Hotz
de7a3a56ff
save lines in llvm (#3231)
* save lines in llvm
* no implied cast in load
* no cast in gate
2024-01-24 11:40:53 -08:00
George Hotz
83d614295e
reduce lines (#3230)
2024-01-24 10:35:59 -08:00
chenyu
afeadbedc9
touch up Tensor.round and Tensor.neg (#3228)
2024-01-24 12:29:37 -05:00
Obada Khalili
0e103b4aa0
implement Tensor.round (#3225)
2024-01-24 11:49:17 -05:00
geohotstan
842053873d
fix neg logical_not inconsistencies (#3222)
* try
* test: add logical_not tests
* gah im retarded, but this doesn't match types for const()
* fix: can't we just do this?
* big change: I don't actually know what I'm doing
* WOOO IM JUST CHANGING EVERYTHING WOW probably gon revert later
* BYE BYE noqa: E501
* fix: less lines and add test
* fix: rm 2 redundant tests
* fix: eq with False so we don't unintentionally implicitly upcast, but it's bool anyways so w/e
2024-01-24 11:48:40 -05:00
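A sketch of the consistency being fixed, with the intended behavior inferred from the PR title: logical_not stays bool, and neg preserves the input dtype:

```python
from tinygrad import Tensor

b = Tensor([True, False])
print(b.logical_not().numpy())        # expected: [False  True], still bool
print((-Tensor([1, -2, 0])).numpy())  # expected: [-1  2  0], dtype unchanged
```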
George Hotz
e2e4632aea
LoadOps SYNC (#3223)
* LoadOps SYNC and WAIT
* no wait, only sync
* DEBUG >= 1
* track cross device
2024-01-23 21:59:18 -08:00
chenyu
2f4b3ab1c0
shard and to should preserve requires_grad (#3224)
dtypes are inferred from the underlying lazydata; requires_grad needs to be passed explicitly
2024-01-24 00:15:10 -05:00
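A minimal check of the property this commit adds; the device names are placeholders, and the snippet only runs on a machine that exposes those devices:

```python
from tinygrad import Tensor

t = Tensor.rand(8, 8, requires_grad=True)
s = t.shard(("GPU:0", "GPU:1"), axis=0)    # hypothetical two-GPU setup
assert s.requires_grad == t.requires_grad  # preserved now, was dropped before
```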
George Hotz
23b084e70a
add device name to device, all are constructed (#3221)
2024-01-23 20:34:56 -08:00
George Hotz
91a1b2bd7a
the runner does the build (#3220)
2024-01-23 18:45:43 -08:00
chenyu
9e5409be6c
cifar move GlobalCounters.reset() before shard (#3217)
* cifar move GlobalCounters.reset() before shard
also shard the mini batch in place
* don't eval with DISABLE_BACKWARD
2024-01-23 16:07:43 -05:00
Francis Lam
595d05a250
test: fix test_linearizer to use the correct tc_dims (#3218)
also re-enable test_tensor_core_opts
2024-01-23 16:07:31 -05:00
chenyu
3c179cc27c
cifar only shuffle data at epoch start (#3216)
saves 1ms of CPU time per batch. also only shuffles the training set
2024-01-23 14:41:22 -05:00
George Hotz
4a07ea355d
buffer options should work (#3211)
* buffer options should work
* minor
* fix dtype
2024-01-22 19:23:55 -08:00