tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-07 03:00:26 -04:00

Author	SHA1	Message	Date
George Hotz	e822aae9ec	reorg opts, nicer graph	2022-07-02 22:29:09 -07:00
George Hotz	7276f8d6bf	improve constant folding, detach before moving tensor	2022-07-02 15:29:40 -07:00
George Hotz	07b438aa8b	move that to resolve time	2022-07-02 14:26:13 -07:00
George Hotz	dbf4aa09db	assert and tuple	2022-06-27 09:19:54 -07:00
George Hotz	37a6c0ef59	create with new ShapeTracker	2022-06-27 09:07:45 -07:00
George Hotz	e55a9833fb	a little more readable	2022-06-27 08:54:04 -07:00
George Hotz	3a414d7f50	cleanup, add flops tracking	2022-06-26 22:43:39 -07:00
George Hotz	a699f7cb0b	debug cleanups	2022-06-26 21:58:44 -07:00
George Hotz	15a16b98e6	remove get_root	2022-06-26 21:18:02 -07:00
George Hotz	49c954b389	comments	2022-06-26 17:20:25 -07:00
George Hotz	8c483fbdc9	maxpool lazy fix	2022-06-26 17:07:03 -07:00
George Hotz	bdde95f16e	CACHE_LAZYBUFFERS options + benchmark. only a couple x from torch	2022-06-24 22:33:53 -07:00
George Hotz	b2d5df6049	3 convs are being recomputed	2022-06-22 07:54:52 -07:00
George Hotz	9d06a86f7f	CL class, debugging	2022-06-21 20:16:29 -07:00
George Hotz	0b820f7966	FOLD_CONSTANTS_INTO_KERNELS and shapetracker OOB tweak	2022-06-21 19:47:15 -07:00
George Hotz	1ebc2b5545	lazy opencl works	2022-06-21 19:41:08 -07:00
George Hotz	c53c91f949	opencl tests passed (#347)	2022-06-21 18:57:09 -07:00
George Hotz	8fbe2e4aed	No ctx in llops (#345) * remove ctx from gpu ops * ctx for the others * this is okay * mlops are not static. fix lazy * cl is property, _processing_op is class method * kernel_name * contiguous_op	2022-06-21 10:07:49 -07:00
George Hotz	159a2d1a80	Simple Lazy (#340) * simple lazy * simple * fix graph and make realize simpler * SHUFFLE_MOVEMENT_OPS already works * MERGE_MOVEMENT_OPS and REMOVE_MOVEMENT_NOPS * it works, but it's slow * constant inlining * cache misses are the reason for loss * fix non determinism * cleanup, a few tests fail * profile * cache lazyop * cleanups * create namedtuple once * bunch of caches * it's not deleting * nograd * caching allocator * reduce_op * fromCPU if you want fromCPU * complain * nvidia fix * realized on Tensor * numpy is very slow * no loads in second run * caching in View * 10ms speedups on batman * remove old profiler * bunch of refactors * contiguous on view * elementwise_op_compile for conv * support ewop after processing op * this still works * conv folding works * all we do is conv conv conv no matter what * all args to the conv * still works * unify conv and ewop * ops_gpu cleanup * move around ops_gpu * remove caching allocator * remove unused * find_conv shorten * gpu refactors * simpler gpu * and that * cmp is fast * 18ms on mac * it's a lot of lines, but it's faster * minor * tests pass * LoadOps.CONTIGUOUS * remove dups * torch converter doesn't support slice * move lazy out for merge * LoadOps are only for lazy	2022-06-20 22:45:11 -07:00

19 Commits