Commit Graph

3519 Commits

Author SHA1 Message Date
David Hou
aebaab011f faster wino compile by catting consts across data expand dim (#3293)
* PoC faster wino compile by catting consts across data expand dim

* fix fusions

* faster + golf it

* noqa 501

* implicit broadcast

* Revert "implicit broadcast"

This reverts commit 5915a9083d045ec1e6be84dcb492333325d48666.

* shorter

* shorter

* oops

* 216 upcasts is probably fine

* wino kernel count test

* test winograd number of sts

* specify device for apply_matrix mat elements
2024-02-02 03:47:45 -05:00
David Hou
cf6f478901 limit group_for_reduce bufs to 32kb (#3299)
hipcc crashes for buffers that are too large
2024-02-02 03:13:12 -05:00
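The 32 KB cap above matches common per-workgroup GPU local-memory (LDS) limits; a hypothetical sketch of such a size guard (names and the exact constant are illustrative, not tinygrad's implementation):

```python
MAX_LOCAL_BUF_BYTES = 32 * 1024  # illustrative cap, mirroring the commit's 32kb limit

def fits_local_memory(shape: tuple, itemsize: int) -> bool:
    # compute total bytes for a buffer of this shape/element size
    # and reject anything over the local-memory cap
    n = 1
    for d in shape:
        n *= d
    return n * itemsize <= MAX_LOCAL_BUF_BYTES
```

A guard like this would let the scheduler fall back to a plain reduce instead of handing hipcc a group buffer it cannot compile.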
chenyu
b564660637 type annotation for Compiler.cachekey and minor cleanup (#3298) 2024-02-01 21:31:21 -05:00
Felix Wu
021eea3a52 fix UnboundLocalError when running Compiler with DISABLE_COMPILER_CACHE (#3296) 2024-02-01 21:12:33 -05:00
chenyu
a5bf4afc1a update ruff.toml for v0.2.0 (#3297)
select -> lint.select.

also added rule names for fully specified ones
2024-02-01 20:50:20 -05:00
chenyu
9196b11dfb test_ops sinh/cosh/asinh/acosh/atanh (#3294)
some have numerical issues at large inputs, similar to sigmoid
2024-02-01 03:10:11 -05:00
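The "numerical issues at large inputs" are the usual exp() overflow problem; a minimal plain-Python sketch (not tinygrad's kernel code) of the standard stable-sigmoid trick the commit alludes to:

```python
import math

def sigmoid_naive(x: float) -> float:
    # overflows in exp() for large negative x
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_stable(x: float) -> float:
    # branch on sign so exp() only ever sees a non-positive argument,
    # which can underflow to 0.0 but never overflow
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)
```

`sigmoid_naive(-1000.0)` raises OverflowError, while the stable form quietly underflows to 0.0; hyperbolic functions built from exp() need the same kind of care.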
Francis Lam
927f2dd24d wmma: add HIP FP16 to FP16 tensor core (#3287)
* wmma: add HIP FP16 to FP16 tensor core

* test: fix test_tensor_core to use separate tolerances for half
2024-01-31 23:00:51 -05:00
chenyu
18e854cdbf shrink MLB on sharded axis (#3255)
* shrink MLB on sharded axis

use a onehot structure to store the real partition. the goal is unsynced batchnorm2d that can be run on multi-GPU for training.

draft version in https://github.com/chenyuxyz/tinygrad/pull/109

* SYNCBN flag

* test unclean shrinks

* UnsyncedBatchNorm reuses BatchNorm

* more robust pad arg check

* better types

* more tests!

* 6 gpus in benchmark

* disable slow GPUS=6 benchmark
2024-01-31 21:48:25 -05:00
chenyu
a3652e6ddc minor cleanups to test_ops (#3290)
- removed noop a=0
- fixed integer div test
- added test for both python expression and Tensor method call
- reordered for consistency and added some spaces
2024-01-31 19:01:25 -05:00
chenyu
77251336d5 fix handcode_resnet50_opt.py (#3289)
linearizer_opts has moved. also update the logging to print after total_tm update
2024-01-31 19:01:08 -05:00
chenyu
9b8c1a0408 Tensor.batchnorm works for more than 2d and is reused in onnx (#3284) 2024-01-30 19:02:45 -05:00
chenyu
7816c3b692 onnx update for trilu and argmax (#3283)
* support 0 in shape for tril and triu

* select_last_index for ArgMax and ArgMin

* pass **kwargs
2024-01-30 18:39:16 -05:00
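ONNX's `select_last_index` attribute asks ArgMax/ArgMin to return the last index of a tied extreme rather than the first; a minimal plain-Python sketch of the usual reverse-and-remap trick (illustrative, not the onnx_ops implementation):

```python
def argmax_select_last(xs: list) -> int:
    # list.index() finds the FIRST occurrence, so search the
    # reversed list and map the position back to the original order
    best = max(xs)
    return len(xs) - 1 - xs[::-1].index(best)
```

For `[1, 3, 3, 2]` this picks index 2, where a plain first-match argmax would pick 1.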
qazal
5b46b0ff3d Simple RDNA3 emulator (#2974)
* mockhip->hipcpu

* allocate buffers

* launch a kernel

read_asm api

* run remu in CI

* remu 0.0.2, real test ops

* simple driver

* 0.0.3, all test_ops

* run the latest emulator

* 9 minutes is way too long, drop backprop in CI

* bring back the backward pass

* Revert "bring back the backward pass"

This reverts commit 3781e1bc56.

* Print slowest tests

* emulated device directly in ops_hip

* fix ruff, override mypy for specific rules

* test in the same code path

- hip backend env variables

- install packages and verify autogen

- run certain tests

- remove the other hip tests path

- verify Device.DEFAULT

* remove the emulated hip in extra

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-01-30 10:39:28 -08:00
George Hotz
247a8a2a6c add canonicalization to View.create (#3280)
* Reapply "take merge views from corsix branch" (#3278)

This reverts commit d298916232.

* reintroduce merge views

* update second any

* isinstance -> not

* 25% less same but unequal
2024-01-30 10:26:48 -08:00
George Hotz
d8f6280ffb hotfix: add CHECK_NEQ to fuzz_shapetracker_math 2024-01-30 10:07:54 -08:00
George Hotz
09f2952dc3 reintroduce merge views in update benchmark (#3279)
* Reapply "take merge views from corsix branch" (#3278)

This reverts commit d298916232.

* reintroduce merge views
2024-01-30 09:47:20 -08:00
George Hotz
d298916232 Revert "take merge views from corsix branch" (#3278) 2024-01-30 09:34:28 -08:00
George Hotz
b57a16aa89 take merge views from corsix branch (#3273)
* take merge views from corsix branch

* better DEBUG

* max views

* remove view.py change

* Revert "remove view.py change"

This reverts commit f3025f4f39.

* only allow filter on non symbolic

* oops, correct fix

* comment to explain
2024-01-30 09:25:16 -08:00
George Hotz
6a4a5dc79d fix pad 0 size (#3277)
* fix pad 0 size

* put in view, not pad

* test was wrong
2024-01-30 08:58:10 -08:00
chenyu
b0a755288f cifar EVAL_BS set default value to BS (#3274)
less compile time for eval due to cache. 500 was also a slow, uneven number for 6 GPUs. eval time 5.9s -> 3.4s
2024-01-29 17:37:12 -05:00
Francis Lam
861d5ac224 wmma: fix the upcasts after WMMA to be hcopt ordering invariant (#3250)
will correctly handle any permutation of optops after the TC one
2024-01-29 11:51:57 -08:00
chenyu
af4ca85594 MultiLazyBuffer.reshape new_axis without real_strides (#3272)
similar to contraction, but this one is for finding the mapped single axis
2024-01-28 23:53:52 -05:00
chenyu
34c7621556 HIP=1 NOCLANG=1 for tinybox external_model_benchmark (#3270)
used HIP instead of GPU and disabled slow CLANG
2024-01-28 22:05:26 -05:00
George Hotz
085dc87bed winograd should be 4 kernels (#3268) 2024-01-28 09:21:26 -08:00
George Hotz
f48b6aca77 long running beam pool (#3267) 2024-01-28 08:06:03 -08:00
George Hotz
9e17378b60 Fix metal tests (#3266)
* small fixes for tests on mac

* remove device from TensorCore
2024-01-27 18:09:42 -08:00
Francis Lata
86748f4a8c fix bbox format to be a list (#3265) 2024-01-27 17:54:19 -08:00
George Hotz
67a78615e5 uoptimizer (#3262)
* uoptimizer

* uops

* self.uoptimize
2024-01-27 10:26:04 -08:00
Hristo Georgiev
3ae811af21 tests for Tensor init data dtype and resulting dtype (#3247)
Co-authored-by: Hristo Georgiev <6043312+hristog@users.noreply.github.com>
2024-01-27 00:13:42 -08:00
George Hotz
3c728d1082 compiler support (#3260)
* compiler support

* revert that

* fix tests
2024-01-26 23:36:40 -08:00
Francis Lam
4273aabe31 extra/gemm: add a simple_conv.py along with correctness check (#3236)
* extra/gemm: add a simple_conv.py along with correctness check

The goal is to easily test tensor core triggering situations

* test: add tests for acc_dtype handling and fixed typing
2024-01-26 19:06:57 -08:00
George Hotz
0aad8d238b rebuild ocelot (#3259)
* rebuild

* strip trailing whitespace
2024-01-26 18:46:36 -08:00
George Hotz
473935125a use comgr to compile (#3248)
* use comgr to compile

* fast

* bfloat16

* move comgr to its own file

* cleaner style

* comgr in new place

* comgr free + dtype cleanup
2024-01-26 18:27:49 -08:00
George Hotz
c4d870db0d fix jit realize issue (#3258) 2024-01-26 18:27:35 -08:00
chenyu
4197ef17c4 const cleanup with dtype.Scalar (#3257)
moved Scalar to dtype.py. assert in _broadcasted when y is a Scalar and
fix some tests
2024-01-26 21:16:22 -05:00
George Hotz
03a6bc59c1 move autogen to runtime/autogen (#3254) 2024-01-26 12:44:19 -08:00
George Hotz
a3869ffd46 move gpuctypes in tree (#3253)
* move gpuctypes in tree

* fix mypy

* regex exclude

* autogen sh

* mypy exclude

* does that fix it

* fix mypy

* add hip confirm

* verify all autogens

* build clang2py

* opencl headers

* gpu on 22.04
2024-01-26 12:25:03 -08:00
chenyu
bc92c4cc32 onnx Einsum, CumSum, DepthToSpace, SpaceToDepth (#3252)
* onnx Einsum, CumSum, DepthToSpace, SpaceToDepth

Einsum inner product and `...` are not supported

* --durations=20
2024-01-26 10:47:53 -05:00
chenyu
e45ffdb6cf cleanup onnx (#3249)
* add onnx test_reduce_log_sum_exp

* more reuse

* more

* stuff

* good CenterCropPad

* imports

* good ArrayFeatureExtractor

* pretty good Pad

* stuff

* stuff

* onnx.py

* Atan

* pass int8 test

* dtype related

* fastmath stuff

* Resize linear

* fix CI

* move back
2024-01-25 20:39:59 -05:00
Ahmed Harmouche
168b1f879c Fix hip_matmul gemm in extra (#3241) 2024-01-25 16:03:04 -08:00
George Hotz
7feeb118e6 hip launch speed (#3246)
* faster HIP kernel launch

* args

* expand compile_hip
2024-01-25 15:13:55 -08:00
George Hotz
cb372b053f add device speed test (#3244) 2024-01-25 12:01:22 -08:00
geohotstan
d0e116c6d6 fix maximum/where Scalar casting (#3194)
* init

* test: added dtype tests for maximum

* fix: separate maximum const and maximum tensors

* fix: del useless line

* fix: some dtypes

* CODE GOLF: we golfing at mar-a-lago golf club tonight boyyyys

* fix: add lil helper function

* fix: some test refactoring

* done

* sike: not done yet lol

* wtf I missed an assert, am I drunk

* yeah idk

* fix: line save from redundant check

* revert: line save

* fix: simplify test_broadcast cuz I'm stumped

* change some test name

* fix: bool max bool works

* test: add a maximum bool test

* test: make sure minimum also works with bool

* fix: something like this? :s

* fix: maybe this?

* fix: how about this? tighter check

* fix: this.

* revert: nvm mul(0.5) and div(2) has the same kernel for backward

* fix: .is_floating_point() xD

* revert: maximum and minimum and add cast

* fix: cover negative const case in test

* fix: use eq because I don't understand clang :D

* WHOOOOPS
2024-01-25 12:26:04 -05:00
geohotstan
3628bea910 fix: big round even rounder round (#3242)
* fix: big round even rounder round

* fix: variable name lol

* feat: 1 less potential cast

* consistent naming (I'm just spamming commits now)

* LOL MISSED ONNX ANOTHER COMMIT

* test: fix test_ops and remove _round

* test: tensor methods oops
2024-01-25 12:24:15 -05:00
chenyu
da5e27968c failed test cases for Tensor.round (#3240)
it should round to even
2024-01-25 02:12:50 -05:00
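"Round to even" here means IEEE-style banker's rounding: ties at .5 go to the even neighbor, which is what numpy and ONNX expect. A minimal plain-Python sketch (illustrative, not Tensor.round's kernel code):

```python
import math

def round_half_to_even(x: float) -> float:
    # ties-to-even ("banker's rounding"): 0.5 -> 0, 1.5 -> 2, 2.5 -> 2
    f = math.floor(x)
    frac = x - f
    if frac > 0.5:
        return f + 1.0
    if frac < 0.5:
        return float(f)
    # exact tie: choose the even neighbor
    return float(f) if f % 2 == 0 else f + 1.0
```

Python's built-in round() already behaves this way, which is why naive floor(x + 0.5) implementations fail the test cases this commit adds.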
geohotstan
b0b5eba535 fix _round in onnx_ops to look more like new Tensor.round (#3239)
* fix: _round in onnxops

* fix: minor things

* fix: no more n

* fix: smol

* fix: smoller
2024-01-25 01:18:58 -05:00
George Hotz
aa0d1b6330 hotfix: don't use noqa: E702 that's just dumb 2024-01-24 20:01:00 -08:00
George Hotz
b92945c98d hotfix: DEBUG >= 2 for kernels 2024-01-24 23:55:17 +00:00
George Hotz
a8fbb03438 minor hip cleanups (#3237) 2024-01-24 15:13:38 -08:00
nimlgen
3205fd8481 fix cuda device var rewrite (#3233) 2024-01-24 16:57:49 -05:00