Commit Graph

68 Commits

Author · SHA1 · Message · Date
George Hotz
bdfdbc8f8d broken amfi patch 2022-08-13 10:41:25 +02:00
George Hotz
262efe5784 update readme 2022-08-09 11:08:52 +02:00
George Hotz
6267a3c8c2 notes 2022-08-09 00:42:14 +02:00
George Hotz
f4ff130947 docs 2022-08-09 00:06:24 +02:00
George Hotz
01de17eeb8 amfi note 2022-08-08 13:17:36 +02:00
George Hotz
136706169d fix ane on new mac os x 2022-08-06 19:10:22 +00:00
George Hotz
f300caa486 notes 2022-08-06 15:21:26 +00:00
George Hotz
94d526f8fc fix op estimate 2022-08-06 14:15:50 +00:00
George Hotz
f2847cb710 remove useless init, add ops counter 2022-08-06 14:05:25 +00:00
George Hotz
5d45c6e516 Fold reduce (#362)
* folding reduce

* fold through movementops

* fixup shapes

* was too aggressive

* i knew we needed that

* don't recompute reduce

* working

* fix openpilot compile

* prunegraph openpilot

* types and reduce_shape

* refactor

* cleanups

* neater

* 1009

* 1004

* clean up reduce for 998
2022-07-19 09:24:02 -07:00
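
For context on what the "Fold reduce" merge above is after: a reduce can consume the output of the elementwise ops feeding it as each value is produced, so the intermediate buffer never has to be materialized. A minimal sketch of that fusion follows; the function names are hypothetical and this is not tinygrad's kernel code.

```python
# Illustrative sketch of reduce folding; names are hypothetical, not tinygrad code.

def unfused_dot(xs, ys):
    # two passes: materialize the elementwise product, then reduce it
    prods = [x * y for x, y in zip(xs, ys)]   # intermediate buffer
    return sum(prods)

def fused_dot(xs, ys):
    # one pass: the reduce consumes each product as it is produced,
    # so no intermediate buffer exists
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc

assert unfused_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]) == fused_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```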
George Hotz
5e96ed523a fix opencl bug, no training on opencl 2022-07-17 12:55:26 -07:00
George Hotz
608e2431f7 test opencl, commit to removing the crap conv code from GPU 2022-07-17 11:55:37 -07:00
George Hotz
3c4565fa21 SLICE -> PAD,SHRINK 2022-07-17 11:33:59 -07:00
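
The SLICE -> PAD,SHRINK commit above replaces one movement op that allowed out-of-range bounds with two simpler ones. A hedged 1-D NumPy sketch of that decomposition (illustrative only, not tinygrad code): an arbitrary slice becomes a zero PAD followed by an in-bounds SHRINK.

```python
# Illustrative sketch: a general slice with out-of-range bounds expressed as
# PAD (zero-extend) followed by SHRINK (in-bounds crop). Not tinygrad code.
import numpy as np

def slice_as_pad_shrink(x, start, end):
    # 1-D case: start/end may run past either end of x
    pad_before, pad_after = max(0, -start), max(0, end - len(x))
    padded = np.pad(x, (pad_before, pad_after))           # PAD
    return padded[start + pad_before:end + pad_before]    # SHRINK

x = np.array([1.0, 2.0, 3.0])
print(slice_as_pad_shrink(x, -1, 4))  # [0. 1. 2. 3. 0.]
```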
George Hotz
bcf422dfdd Device2 (#358)
* option for matmul

* fixups

* fast like a nascar

* running

* thneed runner

* no buffer id makes no backing buffer

* move constant folding to the top

* runs on mac

* folded biases

* was v slow

* maybe just that

* elu touchup

* speed and float32

Co-authored-by: Comma Device <device@comma.ai>
2022-07-16 07:26:19 -07:00
George Hotz
817b64f5e5 A conv is a reduce op (#356)
* universal strided conv

* more correct

* hmm, CPU works

* cleaner cl code output

* make noconv a flag

* cleanup __getitem__

* refactor broadcasting

* put that back

* unneeded reshape in getitem

* fix strided for torch
2022-07-10 19:58:50 -07:00
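
"A conv is a reduce op" above points at expressing convolution as a strided window view of the input, an elementwise multiply with the kernel, and a reduce over the window axes. A hedged NumPy sketch of the idea (illustrative only, not the tinygrad implementation):

```python
# Illustrative sketch: 2D convolution as strided view + multiply + reduce.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_as_reduce(x, w):
    # x: (H, W) input, w: (kh, kw) kernel
    windows = sliding_window_view(x, w.shape)    # (H-kh+1, W-kw+1, kh, kw), no copy
    return (windows * w).sum(axis=(-2, -1))      # multiply + reduce over kernel dims

x = np.arange(16, dtype=np.float32).reshape(4, 4)
w = np.ones((2, 2), dtype=np.float32)
out = conv2d_as_reduce(x, w)
print(out.shape)  # (3, 3)
```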
George Hotz
68959be05d precompute weights for opencl 2022-07-08 10:56:48 -07:00
George Hotz
d8e7f1f8bc opencl type ignore 2022-07-08 10:33:55 -07:00
George Hotz
ae335b6d3e opencl works, but tons of kernels 2022-07-08 10:22:04 -07:00
George Hotz
5b66d1bb0b begin fixing up opencl 2022-07-08 10:20:14 -07:00
George Hotz
e822aae9ec reorg opts, nicer graph 2022-07-02 22:29:09 -07:00
George Hotz
7276f8d6bf improve constant folding, detach before moving tensor 2022-07-02 15:29:40 -07:00
George Hotz
07b438aa8b move that to resolve time 2022-07-02 14:26:13 -07:00
George Hotz
dbf4aa09db assert and tuple 2022-06-27 09:19:54 -07:00
George Hotz
37a6c0ef59 create with new ShapeTracker 2022-06-27 09:07:45 -07:00
George Hotz
e55a9833fb a little more readable 2022-06-27 08:54:04 -07:00
George Hotz
3a414d7f50 cleanup, add flops tracking 2022-06-26 22:43:39 -07:00
George Hotz
a699f7cb0b debug cleanups 2022-06-26 21:58:44 -07:00
George Hotz
15a16b98e6 remove get_root 2022-06-26 21:18:02 -07:00
George Hotz
e3c2579537 flip stride to match canonical 2022-06-26 19:19:53 -07:00
George Hotz
49c954b389 comments 2022-06-26 17:20:25 -07:00
George Hotz
8c483fbdc9 maxpool lazy fix 2022-06-26 17:07:03 -07:00
George Hotz
bdde95f16e CACHE_LAZYBUFFERS options + benchmark. only a couple x from torch 2022-06-24 22:33:53 -07:00
George Hotz
3e13e3330a UNSAFE_FLOAT4 env 2022-06-22 08:20:29 -07:00
George Hotz
73415e20ab this fixes 2 of the conv recomputes...but it's ugh 2022-06-22 08:18:12 -07:00
George Hotz
b2d5df6049 3 convs are being recomputed 2022-06-22 07:54:52 -07:00
George Hotz
ba2defcdef elif False 2022-06-21 23:54:09 -07:00
George Hotz
9cb0522574 noargs 2022-06-21 23:48:58 -07:00
George Hotz
1074dfbb71 unstrided 2022-06-21 23:42:21 -07:00
George Hotz
9ae01290ba pass in shorts 2022-06-21 23:33:23 -07:00
George Hotz
18d74c01b1 float4 opt 2022-06-21 21:27:51 -07:00
George Hotz
ff3d5fe962 debugging while we compile 2022-06-21 21:12:04 -07:00
George Hotz
9d06a86f7f CL class, debugging 2022-06-21 20:16:29 -07:00
George Hotz
0b820f7966 FOLD_CONSTANTS_INTO_KERNELS and shapetracker OOB tweak 2022-06-21 19:47:15 -07:00
George Hotz
1ebc2b5545 lazy opencl works 2022-06-21 19:41:08 -07:00
George Hotz
c53c91f949 opencl tests passed (#347) 2022-06-21 18:57:09 -07:00
George Hotz
8fbe2e4aed No ctx in llops (#345)
* remove ctx from gpu ops

* ctx for the others

* this is okay

* mlops are not static. fix lazy

* cl is property, _processing_op is class method

* kernel_name

* contiguous_op
2022-06-21 10:07:49 -07:00
George Hotz
159a2d1a80 Simple Lazy (#340)
* simple lazy

* simple

* fix graph and make realize simpler

* SHUFFLE_MOVEMENT_OPS already works

* MERGE_MOVEMENT_OPS and REMOVE_MOVEMENT_NOPS

* it works, but it's slow

* constant inlining

* cache misses are the reason for loss

* fix non determinism

* cleanup, a few tests fail

* profile

* cache lazyop

* cleanups

* create namedtuple once

* bunch of caches

* it's not deleting

* nograd

* caching allocator

* reduce_op

* fromCPU if you want fromCPU

* complain

* nvidia fix

* realized on Tensor

* numpy is very slow

* no loads in second run

* caching in View

* 10ms speedups on batman

* remove old profiler

* bunch of refactors

* contiguous on view

* elementwise_op_compile for conv

* support ewop after processing op

* this still works

* conv folding works

* all we do is conv conv conv no matter what

* all args to the conv

* still works

* unify conv and ewop

* ops_gpu cleanup

* move around ops_gpu

* remove caching allocator

* remove unused

* find_conv shorten

* gpu refactors

* simpler gpu

* and that

* cmp is fast

* 18ms on mac

* it's a lot of lines, but it's faster

* minor

* tests pass

* LoadOps.CONTIGUOUS

* remove dups

* torch converter doesn't support slice

* move lazy out for merge

* LoadOps are only for lazy
2022-06-20 22:45:11 -07:00
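
The Simple Lazy merge above is where ops stop executing eagerly: each tensor records a node in an op graph, movement ops can be merged or dropped before anything runs, and computation only happens at realize. A minimal sketch of that pattern follows; the class and method names are illustrative, not tinygrad's actual API, and the only rewrite shown is a MERGE_MOVEMENT_OPS-style collapse of back-to-back reshapes.

```python
# Illustrative sketch of lazy evaluation with one movement-op rewrite.
# Names are illustrative; this is not tinygrad's implementation.
import numpy as np

class LazyBuffer:
    def __init__(self, op, srcs=(), arg=None):
        self.op, self.srcs, self.arg = op, srcs, arg
        self.realized = None

    def reshape(self, shape):
        # MERGE_MOVEMENT_OPS-style rewrite: RESHAPE(RESHAPE(x)) -> RESHAPE(x)
        if self.op == "RESHAPE":
            return LazyBuffer("RESHAPE", self.srcs, shape)
        return LazyBuffer("RESHAPE", (self,), shape)

    def realize(self):
        # nothing runs until realize walks the recorded graph
        if self.realized is None:
            if self.op == "LOAD":
                self.realized = self.arg
            elif self.op == "RESHAPE":
                self.realized = self.srcs[0].realize().reshape(self.arg)
        return self.realized

x = LazyBuffer("LOAD", arg=np.arange(6.0))
y = x.reshape((2, 3)).reshape((3, 2))        # the two reshapes collapse into one node
assert y.op == "RESHAPE" and y.srcs == (x,)  # a single movement node over the load
print(y.realize())                           # [[0. 1.] [2. 3.] [4. 5.]]
```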
George Hotz
77f5cef8a6 First batch from lazy branch (#332)
* test and helpers from lazy

* lazy pt2
2022-06-18 17:26:59 -07:00
George Hotz
52505faaf4 minor 2022-06-16 15:53:45 -07:00
George Hotz
d5b3e18540 Accelerate with CL (#325)
* accelerated opencl

* it's running, it's just wrong

* bugfix

* model is correct in opencl

* lazy image convert

* add padding support to convolution

* that stuff was all upstreamed

* remove HEAD

* oops

* test_simple_conv2d_4 passes, add dilation support

* put logic in ops_opencl

* fix crash

* hmm, stride seems okay

* padding for batched inputs

* just an issue now with cout%4

* op model still passes

* fix startPackedInputChannel

* pre and post processing ops for graph

* don't break other llops

* shapetrackering

* reshapes are free

* lazy movement ops
2022-06-16 15:40:52 -07:00