tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-07 03:00:26 -04:00

Author	SHA1	Message	Date
David Hou	aebaab011f	faster wino compile by catting consts across data expand dim (#3293) * PoC faster wino compile by catting consts across data expand dim * fix fusions * faster + golf it * noqa 501 * implicit broadcast * Revert "implicit broadcast" This reverts commit 5915a9083d045ec1e6be84dcb492333325d48666. * shorter * shorter * oops * 216 upcasts is probably fine * wino kernel count test * test winograd number of sts * specify device for apply_matrix mat elements	2024-02-02 03:47:45 -05:00
Felix Wu	021eea3a52	fix UnboundLocalError when running Compiler with DISABLE_COMPILER_CACHE (#3296)	2024-02-01 21:12:33 -05:00
chenyu	9196b11dfb	test_ops sinh/cosh/asinh/acosh/atanh (#3294) some have numerical issues at large input similar to sigmoid	2024-02-01 03:10:11 -05:00
Francis Lam	927f2dd24d	wmma: add HIP FP16 to FP16 tensor core (#3287) * wmma: add HIP FP16 to FP16 tensor core * test: fix test_tensor_core to use separate tolerances for half	2024-01-31 23:00:51 -05:00
chenyu	18e854cdbf	shrink MLB on sharded axis (#3255) * shrink MLB on sharded axis use onehot structure to store the real partition. goal is unsynced batchnorm2d that can be run on multigpu for training. draft version in https://github.com/chenyuxyz/tinygrad/pull/109 * SYNCBN flag * test unclean shrinks * UnsyncedBatchNorm reuses BatchNorm * more robust pad arg check * better types * more tests! * 6 gpus in benchmark * disable slow GPUS=6 benchmark	2024-01-31 21:48:25 -05:00
chenyu	a3652e6ddc	minor cleanups to test_ops (#3290) - removed noop a=0 - fixed integer div test - added test for both python expression and Tensor method call - reordered for consistency and added some spaces	2024-01-31 19:01:25 -05:00
chenyu	7816c3b692	onnx update for trilu and argmax (#3283) * support 0 in shape for tril and triu * select_last_index for ArgMax and ArgMin * pass **kwargs	2024-01-30 18:39:16 -05:00
qazal	5b46b0ff3d	Simple RDNA3 emulator (#2974) * mockhip->hipcpu * allocate buffers * launch a kernel read_asm api * run remu in CI * remu 0.0.2, real test ops * simple driver * 0.0.3, all test_ops * run the latest emulator * 9 minutes is way too long, drop backprop in CI * bring back the backward pass * Revert "bring back the backward pass" This reverts commit `3781e1bc56`. * Print slowest tests * emulated device directly in ops_hip * fix ruff, override mypy for specific rules * test in the same code path - hip backend env variables - install packages and verify autogen - run certain tests - remove the other hip tests path - verify Device.DEFAULT * remove the emulated hip in extra --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-01-30 10:39:28 -08:00
George Hotz	247a8a2a6c	add canonicalization to View.create (#3280) * Reapply "take merge views from corsix branch" (#3278) This reverts commit `d298916232`. * reintroduce merge views * update second any * isinstance -> not * 25% less same but unequal	2024-01-30 10:26:48 -08:00
George Hotz	d8f6280ffb	hotfix: add CHECK_NEQ to fuzz_shapetracker_math	2024-01-30 10:07:54 -08:00
George Hotz	09f2952dc3	reintroduce merge views in update benchmark (#3279) * Reapply "take merge views from corsix branch" (#3278) This reverts commit `d298916232`. * reintroduce merge views	2024-01-30 09:47:20 -08:00
George Hotz	d298916232	Revert "take merge views from corsix branch" (#3278)	2024-01-30 09:34:28 -08:00
George Hotz	b57a16aa89	take merge views from corsix branch (#3273) * take merge views from corsix branch * better DEBUG * max views * remove view.py change * Revert "remove view.py change" This reverts commit `f3025f4f39`. * only allow filter on non symbolic * oops, correct fix * comment to explain	2024-01-30 09:25:16 -08:00
George Hotz	6a4a5dc79d	fix pad 0 size (#3277) * fix pad 0 size * put in view, not pad * test was wrong	2024-01-30 08:58:10 -08:00
Francis Lam	861d5ac224	wmma: fix the upcasts after WMMA to be hcopt ordering invariant (#3250) will correctly handle and permutation of optops after the TC one	2024-01-29 11:51:57 -08:00
George Hotz	085dc87bed	winograd should be 4 kernels (#3268)	2024-01-28 09:21:26 -08:00
George Hotz	9e17378b60	Fix metal tests (#3266) * small fixes for tests on mac * remove device from TensorCore	2024-01-27 18:09:42 -08:00
Hristo Georgiev	3ae811af21	tests for Tensor init data dtype and resulting dtype (#3247) Co-authored-by: Hristo Georgiev <6043312+hristog@users.noreply.github.com>	2024-01-27 00:13:42 -08:00
George Hotz	3c728d1082	compiler support (#3260) * compiler support * revert that * fix tests	2024-01-26 23:36:40 -08:00
Francis Lam	4273aabe31	extra/gemm: add a simple_conv.py along with correctness check (#3236) * extra/gemm: add a simple_conv.py along with correctness check The goal is to easily test tensor core triggering situations * test: add tests for acc_dtype handling and fixed typing	2024-01-26 19:06:57 -08:00
George Hotz	473935125a	use comgr to compile (#3248) * use comgr to compile * fast * bfloat16 * move comgr to it's own file * cleaner style * comgr in new place * comgr free + dtype cleanup	2024-01-26 18:27:49 -08:00
George Hotz	c4d870db0d	fix jit realize issue (#3258)	2024-01-26 18:27:35 -08:00
chenyu	4197ef17c4	const cleanup with dtype.Scalar (#3257) moved Scalar to dtype.py. assert in _broadcasted when y is a Scalar and fix some tests	2024-01-26 21:16:22 -05:00
chenyu	bc92c4cc32	onnx Einsum, CumSum, DepthToSpace, SpaceToDepth (#3252) * onnx Einsum, CumSum, DepthToSpace, SpaceToDepth Einsum inner product and `...` are not supported * --durations=20	2024-01-26 10:47:53 -05:00
chenyu	e45ffdb6cf	cleanup onnx (#3249) * add onnx test_reduce_log_sum_exp * more reuse * more * stuff * good CenterCropPad * imports * good ArrayFeatureExtractor * pretty good Pad * stuff * stuff * onnx.py * Atan * pass int8 test * dtype related * fastmath stuff * Resize linear * fix CI * move back	2024-01-25 20:39:59 -05:00
George Hotz	7feeb118e6	hip launch speed (#3246) * faster HIP kernel launch * args * expand compile_hip	2024-01-25 15:13:55 -08:00
George Hotz	cb372b053f	add device speed test (#3244)	2024-01-25 12:01:22 -08:00
geohotstan	d0e116c6d6	fix maximum/where Scalar casting (#3194 ) * init * test: added dtype tests for maximum * fix: seperate maximum const and maximum tensors * fix: del useless line * fix: some dtypes * CODE GOLF: we golfing at mar-a-lago golf club tonight boyyyys * fix: add lil helper function * fix: some test refactoring * done * sike: not done yet lol * wtf I missed an assert, am I drunk * yeah idk * fix: line save from redundant check * revert: line save * fix: simplify test_broadcast cuz I'm stumped * change some test name * fix: bool max bool works * test: add a maximum bool test * test: make sure minimum also works with bool * fix: something like this? :s * fix: maybe this? * fix: how about this? tighter check * fix: this. * revert: nvm mul(0.5) and div(2) has the same kernel for backward * fix: .is_floating_point() xD * revert: maximum and minimum and add cast * fix: cover negative const case in test * fix: use eq because I don't understand clang :D * WHOOOOPS	2024-01-25 12:26:04 -05:00
geohotstan	3628bea910	fix: big round even rounder round (#3242) * fix: big round even rounder round * fix: variable name lol * feat: 1 less potential cast * consistant naming (im just spaming commits now) * LOL MISSED ONNX ANOTHER COMMIT * test: fix test_ops and remove _round * test: tensor methods oops	2024-01-25 12:24:15 -05:00
chenyu	da5e27968c	failed test cases for Tensor.round (#3240) it should round to even	2024-01-25 02:12:50 -05:00
George Hotz	47f9887ce4	hip events work (#3229) * hip events work * event	2024-01-24 11:49:53 -08:00
chenyu	afeadbedc9	touch up Tensor.round and Tensor.neg (#3228)	2024-01-24 12:29:37 -05:00
Obada Khalili	0e103b4aa0	implement Tensor.round (#3225)	2024-01-24 11:49:17 -05:00
geohotstan	842053873d	fix neg logical_not inconsistencies (#3222 ) * try * test: add logical_not tests * gah im retarded, but this doesn't match types for const() * fix: can't we jsut do this? * big change: I don't actually know what I'm doing * WOOO IM JUST CHANGING EVERYTHING WOW probably gon revert later * BYE BYE noqa: E501 * fix: less lines and add test * fix: rm 2 redundant tests * fix: eq with False so we don't unintentionally implicit upcast, but it's bool anyways so w/e	2024-01-24 11:48:40 -05:00
George Hotz	e2e4632aea	LoadOps SYNC (#3223) * LoadOps SYNC and WAIT * no wait, only sync * DEBUG >= 1 * track cross device	2024-01-23 21:59:18 -08:00
chenyu	2f4b3ab1c0	shard and to should preserve requires_grad (#3224) dtypes are inferred from underlying lazydata, requires_grad needs to be passed explicitly	2024-01-24 00:15:10 -05:00
George Hotz	91a1b2bd7a	the runner does the build (#3220)	2024-01-23 18:45:43 -08:00
Francis Lam	595d05a250	test: fix test_linearizer to use the correct tc_dims (#3218) also re-enable the test_tensor_core_opts	2024-01-23 16:07:31 -05:00
David Hou	3378625773	name upcast variables (#3200) * name upcast variables * typing * unused	2024-01-22 11:37:28 -05:00
chenyu	e6c71f1b26	fix device of Tensor.arange inside Tensor.one_hot (#3199) it should have the same device as self	2024-01-21 21:03:50 -05:00
uuuvn	640e5c36ad	Fix metal tests broken by `3f56d1a` (#3196) * Remove from binary_operations before copying binary_operations into integer_binary_operations * Also remove lt and eq if running on METAL	2024-01-21 11:53:25 -05:00
chenyu	b9d27636aa	cleanup test_ops.py (#3192 ) - removed exact duplicated tests - only kept one function if torch_fxn is the same as tinygrad_fxn - used tensor method instead of class method style - replaced unneeded `lamdba f: f(x)` with just `f` - re-enabled commented tests that work now - removed some forward_only now 0 shape tensor can backward	2024-01-20 20:08:56 -05:00
chenyu	3f56d1a5e8	add operator.lt and operator.eq to test_dtype_alu (#3191) * add operator.lt and operator.eq to test_dtype_alu those should pass now as we have broadcasted before passing to lt and eq. also updated the test skipping criteria to reuse test_dtype.is_dtype_supported * llvm lt nan is incorrect * enable truediv too * Revert "enable truediv too" This reverts commit `df703235fb`. * just that	2024-01-20 14:54:02 -05:00
chenyu	c4b5661146	fuzz length for multitensor reduce test case (#3190) so that the uneven case is not just with 0 length and can have other positve values	2024-01-20 00:44:38 -05:00
chenyu	fdb1c2b1d9	move reduce over 0 len axis logic to lazy.py (#3188) * move reduce over 0 len axis logic to lazy.py this fixed uneven shard reduce case if the uneven one has length 0 * fix interpreted backends * fix backwards for 0 shape tensors too	2024-01-20 00:13:03 -05:00
George Hotz	254a7372fe	buffer copy refactor (#3187)	2024-01-19 20:21:24 -08:00
chenyu	cb4cfc078a	parameterize multitensor tests for reduce (#3181) uneven shards reduce is incorrect now	2024-01-19 14:03:01 -05:00
nimlgen	5097d5b808	fix padto when with late reduce (#3180) * fix padto test * no long comment	2024-01-19 14:01:44 -05:00
nimlgen	f87ecbb0f3	fuzzer validates outputs + (partially) oob accesses (#3178) * fuzzer validates outputs + (partially) oob accesses * +random * oob check only for compiled * type cmp fixes * fix zeroing * no prints * add seed	2024-01-19 13:34:51 -05:00
chenyu	b2571d586c	hypothesis.st -> hypothesis.strat (#3179) leave `st` for shapetracker	2024-01-19 11:55:26 -05:00

1348 Commits