* option for matmul
* fixups
* fast like a nascar
* running
* thneed runner
* no buffer id makes no backing buffer
* move constant folding to the top
* runs on mac
* folded biases
* was v slow
* maybe just that
* elu touchup
* speed and float32
Co-authored-by: Comma Device <device@comma.ai>
* simple lazy
* simple
* fix graph and make realize simpler
* SHUFFLE_MOVEMENT_OPS already works
* MERGE_MOVEMENT_OPS and REMOVE_MOVEMENT_NOPS
* it works, but it's slow
* constant inlining
* cache misses are the reason for loss
* fix non determinism
* cleanup, a few tests fail
* profile
* cache lazyop
* cleanups
* create namedtuple once
* bunch of caches
* it's not deleting
* nograd
* caching allocator
* reduce_op
* fromCPU if you want fromCPU
* complain
* nvidia fix
* realized on Tensor
* numpy is very slow
* no loads in second run
* caching in View
* 10ms speedups on batman
* remove old profiler
* bunch of refactors
* contiguous on view
* elementwise_op_compile for conv
* support ewop after processing op
* this still works
* conv folding works
* all we do is conv conv conv no matter what
* all args to the conv
* still works
* unify conv and ewop
* ops_gpu cleanup
* move around ops_gpu
* remove caching allocator
* remove unused
* find_conv shorten
* gpu refactors
* simpler gpu
* and that
* cmp is fast
* 18ms on mac
* it's a lot of lines, but it's faster
* minor
* tests pass
* LoadOps.CONTIGUOUS
* remove dups
* torch converter doesn't support slice
* move lazy out for merge
* LoadOps are only for lazy
* mergeable without this
* ops torch
* accelerated opencl
* it's running, it's just wrong
* bugfix
* model is correct in opencl
* lazy image convert
* add padding support to convolution
* that stuff was all upstreamed
* remove HEAD
* oops
* test_simple_conv2d_4 passes, add dilation support
* put logic in ops_opencl
* fix crash
* hmm, stride seems okay
* padding for batched inputs
* just an issue now with cout%4
* op model still passes
* fix startPackedInputChannel
* pre and post processing ops for graph
* don't break other llops
* shapetrackering
* reshapes are free
* lazy movement ops
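The lazy-evaluation idea these commits circle around (ops build a graph instead of executing, and movement nops get merged or removed before anything is realized) can be sketched roughly as below. This is a hypothetical toy, not the repo's actual `LazyBuffer`; the class and op names here are illustrative only.

```python
# Toy sketch of lazy movement ops (hypothetical, not the real classes):
# nothing computes until realize(), and reshape-of-reshape collapses to
# one node while a reshape to the same shape is dropped entirely.
import numpy as np

class LazyBuffer:
    def __init__(self, op, src=(), arg=None, data=None):
        self.op, self.src, self.arg, self.data = op, src, arg, data

    def reshape(self, shape):
        # remove movement nops: reshaping to the shape we already have
        if self.op == "RESHAPE" and self.arg == shape:
            return self
        # merge movement ops: RESHAPE(RESHAPE(x)) -> RESHAPE(x)
        if self.op == "RESHAPE":
            return LazyBuffer("RESHAPE", self.src, shape)
        return LazyBuffer("RESHAPE", (self,), shape)

    def realize(self):
        # walk the graph only when a concrete result is demanded
        if self.op == "LOAD":
            return self.data
        if self.op == "RESHAPE":
            return self.src[0].realize().reshape(self.arg)

def fromCPU(x):
    # loads are lazy too: just wrap the numpy array in a graph node
    return LazyBuffer("LOAD", data=x)
```

Under this sketch, `fromCPU(np.arange(6)).reshape((2, 3)).reshape((3, 2))` builds a single RESHAPE node over the load rather than two, and only `realize()` touches numpy.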