* remove ctx from gpu ops
* ctx for the others
* this is okay
* mlops are not static. fix lazy
* cl is property, _processing_op is class method
* kernel_name
* contiguous_op
* simple lazy
* simple
* fix graph and make realize simpler
* SHUFFLE_MOVEMENT_OPS already works
* MERGE_MOVEMENT_OPS and REMOVE_MOVEMENT_NOPS (see the sketch after this list)
* it works, but it's slow
* constant inlining
* cache misses are the reason for loss
* fix non determinism
* cleanup, a few tests fail
* profile
* cache lazyop
* cleanups
* create namedtuple once
* bunch of caches
* it's not deleting
* nograd
* caching allocator
* reduce_op
* fromCPU if you want fromCPU
* complain
* nvidia fix
* realized on Tensor
* numpy is very slow
* no loads in second run
* caching in View
* 10ms speedups on batman
* remove old profiler
* bunch of refactors
* contiguous on view
* elementwise_op_compile for conv
* support ewop after processing op
* this still works
* conv folding works
* all we do is conv conv conv no matter what
* all args to the conv
* still works
* unify conv and ewop
* ops_gpu cleanup
* move around ops_gpu
* remove caching allocator
* remove unused
* find_conv shorten
* gpu refactors
* simpler gpu
* and that
* cmp is fast
* 18ms on mac
* it's a lot of lines, but it's faster
* minor
* tests pass
* LoadOps.CONTIGUOUS
* remove dups
* torch converter doesn't support slice
* move lazy out for merge
* LoadOps are only for lazy
* mergeable without this
* ops torch
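The movement-op rewrites named above (SHUFFLE_MOVEMENT_OPS, MERGE_MOVEMENT_OPS, REMOVE_MOVEMENT_NOPS) are graph rewrites on the lazy buffer DAG. A minimal sketch of the idea; the LazyBuffer class and op names here are illustrative stand-ins, not tinygrad's actual internals:

```python
class LazyBuffer:
    # Toy lazy node: an op applied to a source buffer, evaluated later.
    def __init__(self, shape, op=None, src=None, arg=None):
        self.shape, self.op, self.src, self.arg = shape, op, src, arg

def reshape(x, new_shape):
    # REMOVE_MOVEMENT_NOPS: a reshape to the same shape is dropped entirely.
    if new_shape == x.shape:
        return x
    # MERGE_MOVEMENT_OPS: reshape(reshape(y)) collapses into one reshape of y.
    if x.op == "RESHAPE":
        return LazyBuffer(new_shape, "RESHAPE", x.src, new_shape)
    # SHUFFLE_MOVEMENT_OPS: push the reshape through an elementwise op so the
    # elementwise chain stays adjacent (and fusable) in the graph.
    if x.op == "RELU":
        return LazyBuffer(new_shape, "RELU", reshape(x.src, new_shape))
    return LazyBuffer(new_shape, "RESHAPE", x, new_shape)

base = LazyBuffer((2, 3, 4))
a = reshape(reshape(base, (6, 4)), (2, 12))  # one RESHAPE node, not two
assert a.op == "RESHAPE" and a.src is base
assert reshape(base, (2, 3, 4)) is base      # the no-op never enters the graph
```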
We can replace += with = since we only change tmp once.
Now np.empty() can replace np.zeros(), which might be slightly faster.
This saves a few milliseconds, ~60ms in the best case.
(However, most of the time in ops_cpu.processing_op() seems to be spent on np.reshape().)
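A hedged sketch of that change; the shape and variable names are stand-ins for the real ops_cpu code:

```python
import numpy as np

shape = (64, 128)
contribution = np.random.rand(*shape).astype(np.float32)  # stand-in value

# Before: the buffer had to start zeroed because it was accumulated into.
tmp = np.zeros(shape, dtype=np.float32)
tmp += contribution

# After: tmp is only written once, so plain assignment is equivalent and the
# zero-fill can be skipped by allocating uninitialized memory instead.
tmp = np.empty(shape, dtype=np.float32)
tmp[:] = contribution
```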
* accelerated opencl
* it's running, it's just wrong
* bugfix
* model is correct in opencl
* lazy image convert
* add padding support to convolution
* that stuff was all upstreamed
* remove HEAD
* oops
* test_simple_conv2d_4 passes, add dilation support
* put logic in ops_opencl
* fix crash
* hmm, stride seems okay
* padding for batched inputs
* just an issue now with cout%4
* op model still passes
* fix startPackedInputChannel
* pre and post processing ops for graph
* don't break other llops
* shapetrackering
* reshapes are free
* lazy movement ops (see the View sketch below)
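The shapetracker commits at the end are what make reshapes free: movement ops only rewrite shape/stride metadata, never the data. A toy sketch of the idea, assuming a contiguous row-major buffer (the real ShapeTracker also handles permutes, pads, and stacked views):

```python
from dataclasses import dataclass
from math import prod

def strides_for(shape):
    # Row-major strides: the last axis is contiguous.
    out, acc = [], 1
    for s in reversed(shape):
        out.append(acc)
        acc *= s
    return tuple(reversed(out))

@dataclass(frozen=True)
class View:
    shape: tuple
    strides: tuple

    def reshape(self, new_shape):
        # Only the metadata changes; the underlying buffer is never copied.
        assert prod(new_shape) == prod(self.shape)
        return View(new_shape, strides_for(new_shape))

    def offset(self, idxs):
        # Map an n-d index to a flat offset into the unchanged buffer.
        return sum(i * st for i, st in zip(idxs, self.strides))

v = View((2, 3, 4), strides_for((2, 3, 4)))
w = v.reshape((6, 4))  # no data movement, just new shape and strides
assert v.offset((1, 2, 3)) == w.offset((5, 3)) == 23
```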