Commit Graph

7979 Commits

Author SHA1 Message Date
George Hotz
e3c2579537 flip stride to match canonical 2022-06-26 19:19:53 -07:00
George Hotz
53ab09de79 remove the SLICE on conv dw 2022-06-26 19:09:36 -07:00
George Hotz
149581b0b2 Cdx without SLICE 2022-06-26 18:51:53 -07:00
George Hotz
a04813ffe3 1 line less in cpu, fix torch tests 2022-06-26 18:11:53 -07:00
George Hotz
dffde3de5a support both asymmetric and negative padding 2022-06-26 17:59:25 -07:00
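The padding commit above can be illustrated in isolation. A minimal, hypothetical NumPy sketch (`pad2d` is not the repo's API, just an illustration): asymmetric padding adds a different amount on each side, and a negative amount is treated as a crop (a slice) before padding.

```python
import numpy as np

# Hedged sketch: pads = ((top, bottom), (left, right)); a negative
# entry crops that side instead of padding it.
def pad2d(x, pads):
    slices, np_pads = [], []
    for (lo, hi), n in zip(pads, x.shape):
        slices.append(slice(max(-lo, 0), n - max(-hi, 0)))  # crop negatives
        np_pads.append((max(lo, 0), max(hi, 0)))            # pad positives
    return np.pad(x[tuple(slices)], np_pads)

a = np.ones((2, 2))
assert pad2d(a, ((1, 0), (0, 2))).shape == (3, 4)   # asymmetric pad
assert pad2d(a, ((-1, 0), (0, 0))).shape == (1, 2)  # negative = crop
```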
George Hotz
49c954b389 comments 2022-06-26 17:20:25 -07:00
George Hotz
8c483fbdc9 maxpool lazy fix 2022-06-26 17:07:03 -07:00
George Hotz
f607f18006 fix backward 2022-06-25 00:00:53 -07:00
George Hotz
ec30f0402f improve benchmark_train_efficientnet 2022-06-24 23:46:38 -07:00
George Hotz
3a147137ee CL_DEVICE option 2022-06-24 23:22:10 -07:00
George Hotz
d748353ce5 err, okay, a bit more off 2022-06-24 22:44:57 -07:00
George Hotz
bdde95f16e CACHE_LAZYBUFFERS options + benchmark. only a couple x from torch 2022-06-24 22:33:53 -07:00
George Hotz
6847eaf5b6 comments 2022-06-22 09:37:50 -07:00
George Hotz
1d4fb3527e cleanups to Tensor class 2022-06-22 09:33:30 -07:00
George Hotz
3e13e3330a UNSAFE_FLOAT4 env 2022-06-22 08:20:29 -07:00
George Hotz
73415e20ab this fixes 2 of the conv recomputes...but it's ugh 2022-06-22 08:18:12 -07:00
George Hotz
b2d5df6049 3 convs are being recomputed 2022-06-22 07:54:52 -07:00
George Hotz
ba2defcdef elif False 2022-06-21 23:54:09 -07:00
George Hotz
9cb0522574 noargs 2022-06-21 23:48:58 -07:00
George Hotz
1074dfbb71 unstrided 2022-06-21 23:42:21 -07:00
George Hotz
9ae01290ba pass in shorts 2022-06-21 23:33:23 -07:00
George Hotz
18d74c01b1 float4 opt 2022-06-21 21:27:51 -07:00
George Hotz
ff3d5fe962 debugging while we compile 2022-06-21 21:12:04 -07:00
George Hotz
b12985b013 openpilot compiler 2022-06-21 20:31:18 -07:00
George Hotz
98a730dd00 benchmark on different inputs 2022-06-21 20:20:58 -07:00
George Hotz
9d06a86f7f CL class, debugging 2022-06-21 20:16:29 -07:00
George Hotz
0b820f7966 FOLD_CONSTANTS_INTO_KERNELS and shapetracker OOB tweak 2022-06-21 19:47:15 -07:00
George Hotz
83d50e2687 move to extra.onnx 2022-06-21 19:43:44 -07:00
George Hotz
1ebc2b5545 lazy opencl works 2022-06-21 19:41:08 -07:00
George Hotz
c833886bf5 improved shapetracker 2022-06-21 19:17:25 -07:00
George Hotz
c53c91f949 opencl tests passed (#347) 2022-06-21 18:57:09 -07:00
George Hotz
8fbe2e4aed No ctx in llops (#345)
* remove ctx from gpu ops

* ctx for the others

* this is okay

* mlops are not static. fix lazy

* cl is property, _processing_op is class method

* kernel_name

* contiguous_op
2022-06-21 10:07:49 -07:00
George Hotz
159a2d1a80 Simple Lazy (#340)
* simple lazy

* simple

* fix graph and make realize simpler

* SHUFFLE_MOVEMENT_OPS already works

* MERGE_MOVEMENT_OPS and REMOVE_MOVEMENT_NOPS

* it works, but it's slow

* constant inlining

* cache misses are the reason for loss

* fix non determinism

* cleanup, a few tests fail

* profile

* cache lazyop

* cleanups

* create namedtuple once

* bunch of caches

* it's not deleting

* nograd

* caching allocator

* reduce_op

* fromCPU if you want fromCPU

* complain

* nvidia fix

* realized on Tensor

* numpy is very slow

* no loads in second run

* caching in View

* 10ms speedups on batman

* remove old profiler

* bunch of refactors

* contiguous on view

* elementwise_op_compile for conv

* support ewop after processing op

* this still works

* conv folding works

* all we do is conv conv conv no matter what

* all args to the conv

* still works

* unify conv and ewop

* ops_gpu cleanup

* move around ops_gpu

* remove caching allocator

* remove unused

* find_conv shorten

* gpu refactors

* simpler gpu

* and that

* cmp is fast

* 18ms on mac

* it's a lot of lines, but it's faster

* minor

* tests pass

* LoadOps.CONTIGUOUS

* remove dups

* torch converter doesn't support slice

* move lazy out for merge

* LoadOps are only for lazy
2022-06-20 22:45:11 -07:00
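The bullet list of PR #340 above walks through building a lazy op graph with caching ("simple lazy", "cache lazyop"). A toy sketch of that idea, assuming nothing about tinygrad's actual classes (`LazyBuffer` here is a stand-in, not the repo's implementation): ops only record a graph, and `realize()` computes each node at most once.

```python
# Hypothetical sketch of "simple lazy": ops build a graph instead of
# executing, and realize() walks it once, caching node results so
# shared subtrees are not recomputed.
class LazyBuffer:
    def __init__(self, op, srcs=(), value=None):
        self.op, self.srcs, self.value = op, srcs, value
        self._realized = None          # cache of the computed result

    @staticmethod
    def const(v):
        return LazyBuffer("CONST", value=v)

    def __add__(self, other):
        return LazyBuffer("ADD", (self, other))

    def __mul__(self, other):
        return LazyBuffer("MUL", (self, other))

    def realize(self):
        if self._realized is None:     # compute each node at most once
            if self.op == "CONST":
                self._realized = self.value
            else:
                a, b = (s.realize() for s in self.srcs)
                self._realized = a + b if self.op == "ADD" else a * b
        return self._realized

x = LazyBuffer.const(3)
y = (x + x) * x        # nothing computed yet, just a graph
assert y.realize() == 18
```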
George Hotz
a3538e225a Simple Lazy Pieces (#343)
* simple lazy

* simple

* fix graph and make realize simpler

* SHUFFLE_MOVEMENT_OPS already works

* MERGE_MOVEMENT_OPS and REMOVE_MOVEMENT_NOPS

* it works, but it's slow

* constant inlining

* cache misses are the reason for loss

* fix non determinism

* cleanup, a few tests fail

* profile

* cache lazyop

* cleanups

* create namedtuple once

* bunch of caches

* it's not deleting

* nograd

* caching allocator

* reduce_op

* fromCPU if you want fromCPU

* complain

* nvidia fix

* realized on Tensor

* numpy is very slow

* no loads in second run

* caching in View

* 10ms speedups on batman

* remove old profiler

* bunch of refactors

* contiguous on view

* elementwise_op_compile for conv

* support ewop after processing op

* this still works

* conv folding works

* all we do is conv conv conv no matter what

* all args to the conv

* still works

* unify conv and ewop

* ops_gpu cleanup

* move around ops_gpu

* remove caching allocator

* remove unused

* find_conv shorten

* gpu refactors

* simpler gpu

* mergable without this

* ops torch
2022-06-20 20:28:10 -07:00
George Hotz
2ee85812f7 intel opencl (#342)
* intel opencl

* run clinfo

* that fix it?

* meh

* think it's the same

* basekit fix

* it wasn't basekit

* more minimal

* no clinfo
2022-06-20 19:25:55 -07:00
George Hotz
3e7416163d batch from lazy branch (#341) 2022-06-20 17:42:35 -07:00
George Hotz
a7131b6a46 Non contig (#339)
* contiguous_view

* non contig reduce too

* conv fast

* maybe faster valid

* improve test_onnx

* improve params

* elementwise_op

* draw non contig

* improve contiguous
2022-06-19 22:40:48 -07:00
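The `contiguous_view` work in PR #339 above revolves around the standard contiguity test. A hedged sketch of that check (helper names are illustrative, not the repo's): a view is contiguous when its strides are exactly the row-major strides implied by its shape, so it can be read as one flat buffer.

```python
# Canonical row-major strides: innermost dimension has stride 1,
# each outer stride is the product of the inner dimensions.
def rowmajor_strides(shape):
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def is_contiguous(shape, strides):
    return list(strides) == rowmajor_strides(shape)

assert is_contiguous((2, 3, 4), [12, 4, 1])   # plain view
assert not is_contiguous((3, 2), [1, 3])      # transposed view
```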
George Hotz
d05e7c291a contiguous_view (#336)
* contiguous_view

* non contig reduce too

* conv fast

* maybe faster valid

* improve test_onnx

* improve params

* elementwise_op

* draw non contig
2022-06-19 20:37:28 -07:00
George Hotz
fb72ea3fbd gpu uses shapetracker (fix tests) (#335)
* shapetracker

* movement_op

* hmm, that's why repr failed
2022-06-19 17:32:07 -07:00
George Hotz
ce2e20b768 fix test 2022-06-19 17:07:09 -07:00
George Hotz
f5f21ecb86 gpu buffer is shapetracker 2022-06-19 17:02:24 -07:00
George Hotz
6b652dafb2 touchups 2022-06-19 16:57:14 -07:00
George Hotz
e364849b3b stuff from lazy 2022-06-19 09:57:16 -07:00
Tim Lügger
2069fef292 unnecessary assign add in cpu processing_op (#334)
We can replace += with = since we only change tmp once.
Now np.empty() can replace np.zeros() which might be slightly faster.
This saves a few milliseconds, best case ~60ms.

(However, most of the time in ops_cpu.processing_op() seems to be spent on np.reshape())
2022-06-19 07:41:40 -07:00
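The before/after described in #334 can be sketched directly. A hypothetical minimal version (function names are illustrative): when a temporary is written exactly once, the zero-initialized buffer plus `+=` can become an uninitialized buffer plus a plain assignment, skipping the zero-fill.

```python
import numpy as np

# Before: pay for the zero-fill, then read-modify-write on top of it.
def tmp_before(x):
    tmp = np.zeros(x.shape)
    tmp += x
    return tmp

# After: np.empty skips the zero-fill; one plain write is enough
# because tmp is only ever changed once.
def tmp_after(x):
    tmp = np.empty(x.shape)
    tmp[:] = x
    return tmp

arr = np.arange(6.0).reshape(2, 3)
assert np.array_equal(tmp_before(arr), tmp_after(arr))
```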
George Hotz
8d08e41c21 print time in test 2022-06-19 00:59:09 -07:00
George Hotz
395eb60f46 less lines, and oddly faster 2022-06-18 21:48:42 -07:00
George Hotz
aa164d901e remove ctx from buffers (#333) 2022-06-18 17:27:10 -07:00
George Hotz
77f5cef8a6 First batch from lazy branch (#332)
* test and helpers from lazy

* lazy pt2
2022-06-18 17:26:59 -07:00
George Hotz
3faf8353ca remove out_shape from processing_op 2022-06-16 17:07:57 -07:00
George Hotz
a11deb5150 shapetracker check for noop 2022-06-16 16:29:18 -07:00