tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-04 19:55:18 -05:00

Author	SHA1	Message	Date
chenyu	da5e27968c	failed test cases for Tensor.round (#3240 ) it should round to even	2024-01-25 02:12:50 -05:00
geohotstan	b0b5eba535	fix _round in onnx_ops to look more like new Tensor.round (#3239 ) * fix: _round in onnxops * fix: minor things * fix: no more n * fix: smol * fix: smoller	2024-01-25 01:18:58 -05:00
George Hotz	aa0d1b6330	hotfix: don't use noqa: E702 that's just dumb	2024-01-24 20:01:00 -08:00
George Hotz	b92945c98d	hotfix: DEBUG >= 2 for kernels	2024-01-24 23:55:17 +00:00
George Hotz	a8fbb03438	minor hip cleanups (#3237 )	2024-01-24 15:13:38 -08:00
nimlgen	3205fd8481	fix cuda device var rewrite (#3233 )	2024-01-24 16:57:49 -05:00
George Hotz	ed8a32722a	hip mutex signal (#3234 ) * hip mutex * hip mutex 2 * sync	2024-01-24 13:23:09 -08:00
George Hotz	47f9887ce4	hip events work (#3229 ) * hip events work * event	2024-01-24 11:49:53 -08:00
George Hotz	de7a3a56ff	save lines in llvm (#3231 ) * save lines in llvm * no implied cast in load * no cast in gate	2024-01-24 11:40:53 -08:00
George Hotz	83d614295e	reduce lines (#3230 )	2024-01-24 10:35:59 -08:00
chenyu	afeadbedc9	touch up Tensor.round and Tensor.neg (#3228 )	2024-01-24 12:29:37 -05:00
Obada Khalili	0e103b4aa0	implement Tensor.round (#3225 )	2024-01-24 11:49:17 -05:00
geohotstan	842053873d	fix neg logical_not inconsistencies (#3222 ) * try * test: add logical_not tests * gah im retarded, but this doesn't match types for const() * fix: can't we jsut do this? * big change: I don't actually know what I'm doing * WOOO IM JUST CHANGING EVERYTHING WOW probably gon revert later * BYE BYE noqa: E501 * fix: less lines and add test * fix: rm 2 redundant tests * fix: eq with False so we don't unintentionally implicit upcast, but it's bool anyways so w/e	2024-01-24 11:48:40 -05:00
George Hotz	e2e4632aea	LoadOps SYNC (#3223 ) * LoadOps SYNC and WAIT * no wait, only sync * DEBUG >= 1 * track cross device	2024-01-23 21:59:18 -08:00
chenyu	2f4b3ab1c0	shard and to should preserve requires_grad (#3224 ) dtypes are inferred from underlying lazydata, requires_grad needs to be passed explicitly	2024-01-24 00:15:10 -05:00
George Hotz	23b084e70a	add device name to device, all are constructed (#3221 )	2024-01-23 20:34:56 -08:00
George Hotz	91a1b2bd7a	the runner does the build (#3220 )	2024-01-23 18:45:43 -08:00
chenyu	9e5409be6c	cifar move GlobalCounters.reset() before shard (#3217 ) * cifar move GlobalCounters.reset() before shard also shard mini batch inplace * don't eval with DISABLE_BACKWARD	2024-01-23 16:07:43 -05:00
Francis Lam	595d05a250	test: fix test_linearizer to use the correct tc_dims (#3218 ) also re-enable the test_tensor_core_opts	2024-01-23 16:07:31 -05:00
chenyu	3c179cc27c	cifar only shuffle data at epoch start (#3216 ) save 1ms CPU time per batch. also only shuffle training set	2024-01-23 14:41:22 -05:00
George Hotz	4a07ea355d	buffer options should work (#3211 ) * buffer options should work * minor * fix dtype	2024-01-22 19:23:55 -08:00
George Hotz	a06f34ae42	remove dead lines from cstyle (#3212 ) * remove dead lines from cstyle * external_local_bufs is dead * more lines * minor cleanup	2024-01-22 18:59:19 -08:00
chenyu	8465938d29	minor hlb_cifar cleanups (#3208 ) mostly cosmetic. LATEBEAM=4 single 7900xtx 59.2 seconds	2024-01-22 12:38:39 -05:00
David Hou	3378625773	name upcast variables (#3200 ) * name upcast variables * typing * unused	2024-01-22 11:37:28 -05:00
chenyu	827b7a3c64	cleanup pad_reflect and make_square_mask in hlb_cifar (#3206 ) removed some complicated looking stuff. no wall time difference	2024-01-22 11:30:46 -05:00
chenyu	99884f4c98	cifar flags for RANDOM_CROP, RANDOM_FLIP, and CUTMIX (#3204 ) experimenting with different setups, also would like to jit the data augmentation next	2024-01-22 01:12:51 -05:00
chenyu	53afec2841	add HALF to handcode_resnet50_opt.py (#3202 ) use this to study tensor cores on HIP	2024-01-21 23:03:59 -05:00
chenyu	836883fedc	comment out cutmix in hlb_cifar (#3201 ) it's no-op with multi gpu and less STEPS. also the patch was selected from the whole dataset, not from the same batch	2024-01-21 22:24:53 -05:00
chenyu	e6c71f1b26	fix device of Tensor.arange inside Tensor.one_hot (#3199 ) it should have the same device as self	2024-01-21 21:03:50 -05:00
chenyu	f7d1c42239	cleanup noop prefixes in _pool (#3198 ) * cleanup noop prefixes in _pool make expand dim=None as noop (in addition to -1). then slice, reshape, expand in _pool can share the same noop prefix * nit * something then reshape style * that's repeat	2024-01-21 20:03:32 -05:00
uuuvn	640e5c36ad	Fix metal tests broken by `3f56d1a` (#3196 ) * Remove from binary_operations before copying binary_operations into integer_binary_operations * Also remove lt and eq if running on METAL	2024-01-21 11:53:25 -05:00
chenyu	b9d27636aa	cleanup test_ops.py (#3192 ) - removed exact duplicated tests - only kept one function if torch_fxn is the same as tinygrad_fxn - used tensor method instead of class method style - replaced unneeded `lamdba f: f(x)` with just `f` - re-enabled commented tests that work now - removed some forward_only now 0 shape tensor can backward	2024-01-20 20:08:56 -05:00
chenyu	3f56d1a5e8	add operator.lt and operator.eq to test_dtype_alu (#3191 ) * add operator.lt and operator.eq to test_dtype_alu those should pass now as we have broadcasted before passing to lt and eq. also updated the test skipping criteria to reuse test_dtype.is_dtype_supported * llvm lt nan is incorrect * enable truediv too * Revert "enable truediv too" This reverts commit `df703235fb`. * just that	2024-01-20 14:54:02 -05:00
chenyu	c4b5661146	fuzz length for multitensor reduce test case (#3190 ) so that the uneven case is not just with 0 length and can have other positve values	2024-01-20 00:44:38 -05:00
chenyu	fdb1c2b1d9	move reduce over 0 len axis logic to lazy.py (#3188 ) * move reduce over 0 len axis logic to lazy.py this fixed uneven shard reduce case if the uneven one has length 0 * fix interpreted backends * fix backwards for 0 shape tensors too	2024-01-20 00:13:03 -05:00
chenyu	485332935e	ring copy example (#3185 ) * ring copy example * use ones for init	2024-01-19 23:34:30 -05:00
George Hotz	254a7372fe	buffer copy refactor (#3187 )	2024-01-19 20:21:24 -08:00
chenyu	fb4bd2a57d	reenable padto to search action (#3183 )	2024-01-19 14:17:53 -05:00
chenyu	cb4cfc078a	parameterize multitensor tests for reduce (#3181 ) uneven shards reduce is incorrect now	2024-01-19 14:03:01 -05:00
nimlgen	5097d5b808	fix padto when with late reduce (#3180 ) * fix padto test * no long comment	2024-01-19 14:01:44 -05:00
George Hotz	729a01bf3e	complex PRs will not be merged	2024-01-19 10:58:47 -08:00
nimlgen	f87ecbb0f3	fuzzer validates outputs + (partially) oob accesses (#3178 ) * fuzzer validates outputs + (partially) oob accesses * +random * oob check only for compiled * type cmp fixes * fix zeroing * no prints * add seed	2024-01-19 13:34:51 -05:00
chenyu	b2571d586c	hypothesis.st -> hypothesis.strat (#3179 ) leave `st` for shapetracker	2024-01-19 11:55:26 -05:00
chenyu	c4faedebf3	add test cases for negative entry max allreduce (#3177 )	2024-01-18 22:26:51 -05:00
chenyu	ab1b7c4d09	fix allreduce for max (#3175 ) * test cases to show allreduce for max is incorrect * oh fixed	2024-01-18 20:25:35 -05:00
George Hotz	c51c90bcd4	more sync in transfer (#3174 )	2024-01-18 17:17:03 -08:00
chenyu	28dcbf0e00	test case sharded batchnorm has different ast on devices (#3172 )	2024-01-18 18:12:15 -05:00
chenyu	a60d50487d	disable padto, seems to have bug in gpt2 (#3173 )	2024-01-18 18:09:30 -05:00
George Hotz	c80884884e	event driven hip (#3160 ) * event driven hip * simpler, src makes copy * pass mypy	2024-01-18 14:35:18 -08:00
George Hotz	d2aab65958	remove unused expr node (#3170 ) * remove unused expr node * still works * simple expr_idxs * fixup typing	2024-01-18 14:18:43 -08:00

3475 Commits