Commit Graph

21 Commits

Author SHA1 Message Date
YassineYousfi 2f0f91ba3d support float16 onnx weights (#384) 2022-09-15 09:12:18 -04:00
George Hotz 18fde22dac fix that soon 2022-07-20 09:07:09 -07:00
George Hotz 44848ee5dc prints show we can precompute from the outside 2022-07-08 10:59:20 -07:00
George Hotz 001cfe83a2 local 2022-07-07 10:05:26 -07:00
George Hotz 2720ef49ca extra and test and tuple 2022-07-07 10:01:33 -07:00
George Hotz 81b73f97a3 Optimization (#355)
* constant folding into kernels

* that opt worth it?

* fix mypy

* ast one kernel

* save 2 lines in conv kernel

* debug print kernel count

* cl debugging

* early realize inputs

* refactor Device
2022-07-04 08:58:57 -07:00
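The "constant folding into kernels" bullet in the commit above refers to a standard technique: all-constant subtrees of the op graph are evaluated at build time so only the variable part reaches the generated kernel. A minimal sketch, using hypothetical names rather than tinygrad's actual code:

```python
# Illustrative sketch of constant folding over a tiny expression graph
# (hypothetical Const/Var/BinOp types, not tinygrad's internals).
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class BinOp:
    op: str          # "add" or "mul"
    left: "Node"
    right: "Node"

Node = Union[Const, Var, BinOp]

def fold(node: Node) -> Node:
    """Replace every all-constant subtree with a single Const leaf."""
    if isinstance(node, (Const, Var)):
        return node
    left, right = fold(node.left), fold(node.right)
    if isinstance(left, Const) and isinstance(right, Const):
        return Const(left.value + right.value if node.op == "add"
                     else left.value * right.value)
    return BinOp(node.op, left, right)

# x * (2 + 3): the constant subtree collapses to Const(5.0), leaving a
# single multiply for the kernel to execute.
expr = BinOp("mul", Var("x"), BinOp("add", Const(2.0), Const(3.0)))
print(fold(expr))
```

The payoff mirrors the commit's "save 2 lines in conv kernel" bullet: folded constants become immediates in the kernel instead of loads.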
George Hotz e6e43e820e should fix tests 2022-07-03 16:06:11 -07:00
George Hotz 98a730dd00 benchmark on different inputs 2022-06-21 20:20:58 -07:00
George Hotz 83d50e2687 move to extra.onnx 2022-06-21 19:43:44 -07:00
George Hotz 159a2d1a80 Simple Lazy (#340)
* simple lazy

* simple

* fix graph and make realize simpler

* SHUFFLE_MOVEMENT_OPS already works

* MERGE_MOVEMENT_OPS and REMOVE_MOVEMENT_NOPS

* it works, but it's slow

* constant inlining

* cache misses are the reason for loss

* fix non determinism

* cleanup, a few tests fail

* profile

* cache lazyop

* cleanups

* create namedtuple once

* bunch of caches

* it's not deleting

* nograd

* caching allocator

* reduce_op

* fromCPU if you want fromCPU

* complain

* nvidia fix

* realized on Tensor

* numpy is very slow

* no loads in second run

* caching in View

* 10ms speedups on batman

* remove old profiler

* bunch of refactors

* contiguous on view

* elementwise_op_compile for conv

* support ewop after processing op

* this still works

* conv folding works

* all we do is conv conv conv no matter what

* all args to the conv

* still works

* unify conv and ewop

* ops_gpu cleanup

* move around ops_gpu

* remove caching allocator

* remove unused

* find_conv shorten

* gpu refactors

* simpler gpu

* and that

* cmp is fast

* 18ms on mac

* it's a lot of lines, but it's faster

* minor

* tests pass

* LoadOps.CONTIGUOUS

* remove dups

* torch converter doesn't support slice

* move lazy out for merge

* LoadOps are only for lazy
2022-06-20 22:45:11 -07:00
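The Simple Lazy commit above builds on one core idea: ops only record a graph, and no computation happens until a buffer is realized. A toy sketch of that pattern, with hypothetical names that are not tinygrad's API:

```python
# Illustrative lazy-evaluation sketch: each op returns a LazyBuffer node;
# work is deferred until realize() walks the graph (names are hypothetical).
class LazyBuffer:
    def __init__(self, op, srcs=(), data=None):
        self.op, self.srcs, self.data = op, srcs, data
        self.realized = None   # cache so shared subgraphs compute once

    def realize(self):
        if self.realized is None:
            if self.op == "load":
                self.realized = self.data
            else:
                a, b = (s.realize() for s in self.srcs)
                if self.op == "add":
                    self.realized = [x + y for x, y in zip(a, b)]
                elif self.op == "mul":
                    self.realized = [x * y for x, y in zip(a, b)]
        return self.realized

def load(data): return LazyBuffer("load", data=data)
def add(a, b):  return LazyBuffer("add", (a, b))
def mul(a, b):  return LazyBuffer("mul", (a, b))

# Building the graph is instant; arithmetic runs only at realize().
out = mul(add(load([1, 2]), load([3, 4])), load([10, 10]))
print(out.realize())  # [40, 60]
```

Deferring execution like this is what lets passes such as SHUFFLE_MOVEMENT_OPS, MERGE_MOVEMENT_OPS, and REMOVE_MOVEMENT_NOPS (named in the commit bullets) rewrite the graph before any kernel runs.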
George Hotz a3538e225a Simple Lazy Pieces (#343)
* simple lazy

* simple

* fix graph and make realize simpler

* SHUFFLE_MOVEMENT_OPS already works

* MERGE_MOVEMENT_OPS and REMOVE_MOVEMENT_NOPS

* it works, but it's slow

* constant inlining

* cache misses are the reason for loss

* fix non determinism

* cleanup, a few tests fail

* profile

* cache lazyop

* cleanups

* create namedtuple once

* bunch of caches

* it's not deleting

* nograd

* caching allocator

* reduce_op

* fromCPU if you want fromCPU

* complain

* nvidia fix

* realized on Tensor

* numpy is very slow

* no loads in second run

* caching in View

* 10ms speedups on batman

* remove old profiler

* bunch of refactors

* contiguous on view

* elementwise_op_compile for conv

* support ewop after processing op

* this still works

* conv folding works

* all we do is conv conv conv no matter what

* all args to the conv

* still works

* unify conv and ewop

* ops_gpu cleanup

* move around ops_gpu

* remove caching allocator

* remove unused

* find_conv shorten

* gpu refactors

* simpler gpu

* mergable without this

* ops torch
2022-06-20 20:28:10 -07:00
George Hotz d05e7c291a contiguous_view (#336)
* contiguous_view

* non contig reduce too

* conv fast

* maybe faster valid

* improve test_onnx

* improve params

* elementwise_op

* draw non contig
2022-06-19 20:37:28 -07:00
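A contiguous_view step of the kind named in the commit above gathers a strided, non-contiguous view into a fresh row-major buffer so later elementwise kernels can index it linearly. A minimal 2D sketch (illustrative function, not tinygrad's implementation):

```python
# Illustrative sketch: copy a 2D strided view into a contiguous list
# (hypothetical helper, not tinygrad's actual contiguous_view).
def make_contiguous(buf, shape, strides, offset=0):
    """Gather view elements in row-major order of `shape`."""
    out = []
    for i in range(shape[0]):
        for j in range(shape[1]):
            out.append(buf[offset + i * strides[0] + j * strides[1]])
    return out

# A transpose of a 2x3 row-major buffer is the view shape (3, 2) with
# strides (1, 3); gathering it yields a contiguous transposed copy.
buf = [0, 1, 2, 3, 4, 5]
print(make_contiguous(buf, (3, 2), (1, 3)))  # [0, 3, 1, 4, 2, 5]
```

The "non contig reduce too" bullet extends the same idea to reductions, which otherwise assume linear indexing.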
George Hotz 8d08e41c21 print time in test 2022-06-19 00:59:09 -07:00
George Hotz 77f5cef8a6 First batch from lazy branch (#332)
* test and helpers from lazy

* lazy pt2
2022-06-18 17:26:59 -07:00
George Hotz d747a4b9e2 add padding to conv2d function, other minor things 2022-06-11 22:29:42 -07:00
George Hotz 9ebd472375 move ops to ops.py 2022-06-11 15:58:56 -07:00
George Hotz b5b68e75ff simpler onnx 2022-06-11 15:35:45 -07:00
George Hotz 2305a5347b test_onnx works with enet also 2022-06-11 14:30:26 -07:00
George Hotz 6fdb276886 flip batchnorm function order 2022-06-11 13:20:41 -07:00
George Hotz 85d17a2acd running resnet onnx 2022-06-11 13:17:15 -07:00
George Hotz db5a632e8c multicat + test onnx is generic onnx 2022-06-11 11:50:47 -07:00