Commit Graph

14 Commits

Author SHA1 Message Date
George Hotz  d2273d2cc4  s/contiguous_op/contiguous  2022-11-11 00:07:05 -08:00
George Hotz  b8c94a67c9  Simple chonker (#431)
* chonker will make llvm fast

* work

* better speed tests, we will make them fast

* with the cache add is the same speed

* relu and neg are fast

* fix sum speed

* maximum maxnum?

* hack for gemm opt

* gemm very slow

* zeros like

* test_permute

* shapetracker returns self

* fix shapetracker factorization

* err, int strides

* permutes are faster now in tinygrad than pytorch

* support -1 in expand

* gemm unrolled

* improve final test case

* WIP GEMM

* why isn't GEMM fast?

* revert cache dim

* ffp contract works on clang, not llvm?

* ignore llvm ir

* this makes fma work at least, but no faster

* USE_4x4

* 63 GFLOPS

* 87 GFLOPS

* that wasn't matmul, 44 GFLOPS now

* 82 GFLOPS permuted

* this permute too

* a little speed for the convs

* 45 GFLOPS

* speed tests pass again

* clean up prints

* fix FMA WHAT A WASTE OF TIME

* colors

* moar fair

* GPU

* useless on chonker

* cleanups

* improve factorized shapetracker

* better threshold

* label conv

* work

* ops test pass again

* hot load the index

* run the last view, no need to create

* ZeroView needs a repr for the key to work

* fix segfault on out of bounds

* one more test

* start amx, and llvm.initialize_native_asmparser

* amx works

* nice AMX class

* nicer AMX class

* refactor get_idxs

* amx working

* is slower...

* useless flip

* cache

* SZ_X

* AMX_SZ_X/Y work alone

* Contiguous mlop

* test gemm packed

* PREPARE in packed

* use_amx factor

* prefetch isn't faster

* loop

* same 3ms

* 2.24 ms

* allow double on store in TG

* amx reduce is the same speed as non amx reduce

* include memory bandwidth

* clean up shapetracker

* flip returns stride

* prepare for upstream

* Update ops_llvm.py (#426)

* permutes are yellow and green now

* faster conv

* llvm cleanups

* Show optimised IR under debug 4 (#428)

* ASTKernel class

* Make tinygrad work with older python version (#427)

* Make tinygrad work with older python version

* Use partialmethod instead of partial

* simple chonker is chonking

* remove junk from test speed vs torch

* fix linker and types

* AMX is only here now

* add LLVM tests, it's a valid backend now

* oops, run llvm test

* contiguous_op

* fix loadops compare

* dedup reduceops

Co-authored-by: calledit <1573053+calledit@users.noreply.github.com>
2022-11-10 23:17:09 -08:00
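
Several bullets in the #431 log above quote GEMM throughput (63, 87, 44, 82, 45 GFLOPS), and one adds "include memory bandwidth". For reference, here is a minimal sketch of how such figures are typically computed. Assumptions: a numpy matmul stands in for the kernel, FLOPs are counted as 2*N**3, and bandwidth as three NxN float32 arrays touched once; this is illustrative, not tinygrad's actual speed test.

    import time
    import numpy as np

    def bench_gemm(N=1024, iters=10):
        A = np.random.randn(N, N).astype(np.float32)
        B = np.random.randn(N, N).astype(np.float32)
        A @ B  # warm-up
        st = time.perf_counter()
        for _ in range(iters):
            C = A @ B
        dt = (time.perf_counter() - st) / iters
        flops = 2 * N**3        # one multiply + one add per inner-product step
        mem = 3 * N * N * 4     # A, B, C each touched once, float32 bytes
        print(f"{flops/dt*1e-9:.1f} GFLOPS  {mem/dt*1e-9:.1f} GB/s  {dt*1e3:.2f} ms")

    bench_gemm()

The bandwidth term is a naive lower bound; a tiled kernel re-reads blocks, so real memory traffic is higher.
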
George Hotz  8c849e637c  that was in there twice, DEBUG>=4 to see loop opt  2022-10-30 15:31:39 -07:00
George Hotz  cfdf803b52  fix llvm vectorization by adding analysis passes from the target machine  2022-10-30 15:28:36 -07:00
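
The two commits above are llvmlite plumbing: the loop vectorizer only fires once the module pass manager has the target machine's analysis passes, and DEBUG>=4 dumps the optimized IR. A minimal sketch of that pattern, assuming the llvmlite legacy pass-manager API of that era; the module is a trivial stand-in, not tinygrad's generated IR:

    import os
    import llvmlite.binding as llvm

    llvm.initialize()
    llvm.initialize_native_target()
    llvm.initialize_native_asmprinter()

    tm = llvm.Target.from_default_triple().create_target_machine(opt=3)

    pmb = llvm.create_pass_manager_builder()
    pmb.opt_level = 3
    pmb.loop_vectorize = True
    pmb.slp_vectorize = True

    pm = llvm.create_module_pass_manager()
    tm.add_analysis_passes(pm)  # the fix: gives the vectorizer target info (cost model etc.)
    pmb.populate(pm)

    mod = llvm.parse_assembly("define void @nop() {\n  ret void\n}")  # stand-in module
    pm.run(mod)
    if int(os.getenv("DEBUG", "0")) >= 4:
        print(mod)  # optimized IR, per the DEBUG>=4 commit above
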
George Hotz  2f602a92ff  separate STRIDED and EXPAND  2022-10-30 13:23:58 -07:00
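
Context for the EXPAND movement op (and the "support -1 in expand" bullet in #431): expand broadcasts size-1 dimensions, and -1 conventionally means keep the existing size, as in torch. A sketch of the shape resolution with a hypothetical helper, not tinygrad's code:

    def resolve_expand(old_shape, new_shape):
        # -1 keeps the old size; otherwise the old size must be 1 or already equal
        assert len(old_shape) == len(new_shape)
        out = tuple(o if n == -1 else n for o, n in zip(old_shape, new_shape))
        assert all(o == 1 or o == n for o, n in zip(old_shape, out)), "can only expand size-1 dims"
        return out

    assert resolve_expand((3, 1, 5), (-1, 4, -1)) == (3, 4, 5)
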
George Hotz  4b6097f81d  more amx notes  2022-10-29 14:04:10 -07:00
George Hotz  fdb43fe553  gemm is 1.7 TFLOPS on a single M1 core  2022-10-29 13:42:33 -07:00
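
(For scale: a square N=1024 matmul is 2*1024**3, roughly 2.15 GFLOP, so at 1.7 TFLOPS it completes in about 1.26 ms.)
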
George Hotz  52bfbc31be  vectorization  2022-10-29 12:47:52 -07:00
George Hotz  e473d35f90  llvm doesn't vectorize  2022-10-29 11:59:48 -07:00
George Hotz  86eb06eb76  accurate flop estimation  2022-10-28 19:13:20 -07:00
George Hotz  dd543fbc7a  MovementOps is unused  2022-10-28 18:26:08 -07:00
George Hotz  71b336503f  no RESHAPEs in the AST  2022-10-28 18:25:30 -07:00
George Hotz  b65b70812a  Exec AST (#404)
* working exec ast

* exec_ast is staticmethod

* GenericExecAST

* fold that sometimes

* ExplicitExecAST

* exec_ast for GPU

* gpu working

* get_lazyop_shape

* now gpubuffer is ExplicitExecAST

* dedup

* add a type

* RESHAPE in opencl code

* fix linter

* that too for linter

* cleanups

* remove dead code

* GenericShape is less lines

* add ALLOWED_KERNEL_COUNT to tests

* fix mypy

* that's gotta be recursive

* fix opencl shape processing

* remove unneeded lambda
2022-10-28 08:27:03 -07:00
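
The #404 bullets describe a recursive executor for the op AST ("that's gotta be recursive", GenericExecAST). A minimal illustrative version of the idea, with hypothetical names rather than tinygrad's actual classes:

    import operator
    from dataclasses import dataclass
    from typing import Any, Tuple

    @dataclass(frozen=True)
    class LazyOp:
        op: str               # e.g. "ADD", "MUL"
        src: Tuple[Any, ...]  # children: nested LazyOps or concrete values

    OPS = {"ADD": operator.add, "MUL": operator.mul}

    def exec_ast(ast):
        # evaluate children first, then apply this node's op
        srcs = [exec_ast(s) if isinstance(s, LazyOp) else s for s in ast.src]
        return OPS[ast.op](*srcs)

    assert exec_ast(LazyOp("MUL", (LazyOp("ADD", (2, 3)), 4))) == 20  # (2 + 3) * 4
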
George Hotz  6a15fd3844  LLVM Backend take 2 (#403)
* take 2 llvm

* get_lazybuffers -> get_buffers

* llvm tests pass

* fix type issues and refactor LLVM
2022-10-26 20:32:31 -07:00