tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-23 05:48:08 -05:00

Author	SHA1	Message	Date
George Hotz	a6de94b444	test partial sum	2023-01-22 21:28:40 -08:00
George Hotz	708215d06b	Typing (#468 ) * we typing * types look good in theory * most tests pass * gpu tests pass * TEST_AST * delete comments * i must have written that bug so many times * bugfix * don't merge the small ones * add f to constants * commits from reduce * don't GCD the mod nodes * broken and a hack IMAGE=3 * group for reduce * fix linter + mypy * move out test ast * insource TENSOR_TYPE_TO_NP_TYPE * does this fix it? * move imports out	2023-01-21 09:09:22 -08:00
George Hotz	b29614592a	first conv/second conv	2023-01-19 13:26:11 -08:00
George Hotz	3d697577b2	print_ast	2023-01-19 13:22:03 -08:00
George Hotz	0881d504c1	move shapetracker (#466 ) * move shapetracker * shapetracker test * move ast * move a few things * fix print kernel * fix test * symbolic fixups	2023-01-19 09:56:31 -08:00
George Hotz	2b47ee401f	Symbolic for indexes (#464 ) * indexer * works * all use indexer * boolean in the indexer too * symbolic is a better name than indexer * better symbolic API * min and max * symbolic tests * work * more tests * fix demodder * __str__ in the superclass * NumNode * awesome that works * still works * fix up parens * fix zeroviews * dead lines * expr_node * works * still works * refactor to not use __new__ methods * ugh something went wrong a while ago * this fixes it * mod and div at the end * test * symbolic * working * one linter issue fixed * other division * more simplifys * works * validhacks * VALIDHACKS passes thneed * no str replace stuff * inline indexes * NATIVE_EXPLOG and factoring * factor both ways * cl indexing * split on mod, not just full * onnxlimit * fix output shape * op_estimate is a function of the program * no ones in the index * four_float4 * ALLOW_4FLOAT4 * test passes * compute then store * loads first * bugfix * better, but doesn't match * select xb in smart way * new test and bugfix * no change to lazy * Node fixes linter * fix opencl with op_estimate * fix mypy * revert valid * remove unused	2023-01-19 07:21:30 -08:00
George Hotz	9245f4650a	indexer changes for master	2023-01-18 18:02:02 -08:00
George Hotz	287699c32c	simplify ones after axis splitting	2023-01-14 10:51:43 -08:00
George Hotz	49c6e6d472	Latest attempt to add image (#462 ) * add image * load + store + boring stuff: * image tests pass * thneed print GFLOPS * op conv test * more debugging * hack for multiview image * shapetracker creates less views * disable image tests * working better * ugh, lkey not key * print in DEBUG, and allow views * works * simple padding conv2d * use index for image * that was bad code * debug print * fix types * less lines * save lines	2023-01-12 17:36:30 -08:00
George Hotz	fff1f046b0	Simple version of the new GPU backend (#458 ) * newgpu * more to delete * hmm, tests pass with constant folding * fix lint/type * fix constant folding * comment and rerun tests * lazy touchups * fix graph_batchnorm test * smaller transformer to fix OOM * Revert "smaller transformer to fix OOM" This reverts commit `a44ef8edc2`. * no func cache * introspect * touchups * CLASTKernel * ugh, it was lru_cache * codegen * spacing * old gpu still in opencl * typing fix	2023-01-10 19:16:02 -08:00
George Hotz	bfd4f4e35c	testdocker	2023-01-09 12:41:52 -08:00
George Hotz	4885fce56e	shapetracker from newgpu (#456 ) * shapetracker from newgpu * touchup ops * test * testst * thneed deletes unused inputs * test * bugfix	2023-01-09 12:40:01 -08:00
cloud11665	4fb97b8de0	don't fail when termcolor is not installed (#436 )	2022-11-14 16:45:06 -08:00
George Hotz	5e07d4669d	the speedy chonker is going to replace the old chonker (#432 ) * bringing back reshape and permute * done with E701 * 4x4 works in generic way * max and sum not vectorizing... * special case single float * support comparing to MPS * improve matmul speed, consider generic principles * GlobalCounter * fix op tracking * faster * comment that out for now * err, it needs that * fix minor issues * fix global_mem	2022-11-11 18:34:24 -08:00
George Hotz	b8c94a67c9	Simple chonker (#431 ) * chonker will make llvm fast * work * better speed tests, we will make them fast * with the cache add is the same speed * relu and neg are fast * fix sum speed * maximum maxnum? * hack for gemm opt * gemm very slow * zeros like * test_permute * shapetracker returns self * fix shapetracker factorization * err, int strides * permutes are faster now in tinygrad than pytorch * support -1 in expand * gemm unrolled * improve final test case * WIP GEMM * why isn't GEMM fast? * revert cache dim * ffp contract works on clang, not llvm? * ignore llvm ir * this makes fma work at least, but no faster * USE_4x4 * 63 GFLOPS * 87 GFLOPS * that wasn't matmul, 44 GFLOPS now * 82 GFLOPS permuted * this permute too * a little speed for the convs * 45 GFLOPS * speed tests pass again * clean up prints * fix FMA WHAT A WASTE OF TIME * colors * moar fair * GPU * useless on chonker * cleanups * improve factorized shapetracker * better threshold * label conv * work * ops test pass again * hot load the index * run the last view, no need to create * ZeroView needs a repr for the key to work * fix segfault on out of bounds * one more test * start amx, and llvm.initialize_native_asmparser * amx works * nice AMX class * nicer AMX class * refactor get_idxs * amx working * is slower... * useless flip * cache * SZ_X * AMX_SZ_X/Y work alone * Contiguous mlop * test gemm packed * PREPARE in packed * use_amx factor * prefetch isn't faster * loop * same 3ms * 2.24 ms * allow double on store in TG * amx reduce is the same speed as non amx reduce * include memory bandwidth * clean up shapetracker * flip returns stride * prepare for upstream * Update ops_llvm.py (#426) * permutes are yellow and green now * faster conv * llvm cleanups * Show optimised IR under debug 4 (#428) * ASTKernel class * Make tinygrad work with older python version (#427) * Make tinygrad work with older python version * Use partialmethod instead of partial * smiple chonker is chonking * remove junk from test speed vs torch * fix linker and types * AMX is only here now * add LLVM tests, it's a valid backend now * oops, run llvm test * contiguous_op * fix loadops compare * dedup reduceops Co-authored-by: calledit <1573053+calledit@users.noreply.github.com>	2022-11-10 23:17:09 -08:00
George Hotz	1271f19a2b	factorizing shapetracker from chonker	2022-11-09 16:36:38 -08:00
George Hotz	9781b4c3af	rename test functions to helper_	2022-11-07 21:27:56 -08:00
George Hotz	9884be2ad5	ugh, that too	2022-11-07 21:21:35 -08:00
George Hotz	537a9eb414	fix termcolor import	2022-11-07 21:19:08 -08:00
George Hotz	2cc1d970c6	updates from the chonker branch	2022-11-07 21:12:08 -08:00
George Hotz	db2da22a04	stop blowing up floats	2022-10-30 16:47:16 -07:00
George Hotz	8afc643bb1	fix bug in ops test, it was cheating somehow	2022-10-30 16:43:24 -07:00
George Hotz	2f602a92ff	seperate STRIDED and EXPAND	2022-10-30 13:23:58 -07:00
George Hotz	544cb0a069	oops, remove while(1)	2022-10-29 14:05:13 -07:00
George Hotz	fdb43fe553	gemm is 1.7 TFLOPS on a single M1 core	2022-10-29 13:42:33 -07:00
George Hotz	52bfbc31be	vectorization	2022-10-29 12:47:52 -07:00
George Hotz	e473d35f90	llvm doesn't vectorize	2022-10-29 11:59:48 -07:00
George Hotz	7909786dbf	one more opt test	2022-10-28 18:37:53 -07:00
George Hotz	294ab9e2f8	more test opt	2022-10-28 18:04:12 -07:00
George Hotz	f885ceb695	test speed w/o bias	2022-10-28 11:22:15 -07:00
George Hotz	df31dde174	hasattr and DeviceBuffer type fixups	2022-10-28 09:05:45 -07:00
George Hotz	b65b70812a	Exec AST (#404 ) * working exec ast * exec_ast is staticmethod * GenericExecAST * fold that sometimes * ExplicitExecAST * exec_ast for GPU * gpu working * get_lazyop_shape * now gpubuffer is ExplicitExecAST * dedup * add a type * RESHAPE in opencl code * fix linter * that too for linter * cleanups * remove dead code * GenericShape is less lines * add ALLOWED_KERNEL_COUNT to tests * fix mypy * that's gotta be recursive * fix opencl shape processing * remove unneeded lambda	2022-10-28 08:27:03 -07:00
George Hotz	6a15fd3844	LLVM Backend take 2 (#403 ) * take 2 llvm * get_lazybuffers -> get_buffers * llvm tests pass * fix type issues and refactor LLVM	2022-10-26 20:32:31 -07:00
George Hotz	10921a60c4	more imports from llvm branch	2022-10-26 18:02:36 -07:00
Drew Hintz	a4ad1d774a	enable tests in test_ops.py that are disabled but now work. (#396 ) remove custom tolerances that don't appear to be needed.	2022-10-13 09:58:53 -07:00
George Hotz	793edf8900	touchup	2022-10-10 16:13:34 -07:00
George Hotz	d54a45b50d	measure speed vs torch	2022-10-10 16:06:00 -07:00
George Hotz	b7f748c15a	Fix GPU 2*31 virtual size limit (#392 ) in progress * big conv test works * that's unneeded * fix opencl with reduce * rewrite contiguous_view_constant_fold * clean up mids in loop code * subidx * print cl kernel before run * no reduce, no loop * Revert "no reduce, no loop" This reverts commit `92777e40e9`.	2022-10-05 00:55:20 -04:00
George Hotz	392e57aea7	ugh, why did that fail	2022-10-01 13:38:43 -04:00
George Hotz	7a61dc7ee9	test_sd_big_conv	2022-10-01 13:26:05 -04:00
Ollin Boer Bohan	3b1767e013	Fix OpenCL Metal texture issues (#378 ) * Fix OpenCL Metal texture issues Tile CL images when needed, to fit into the 16384 max Metal image size; gets me to ~4.8s/iteration for SD on M1 Pro with OPENCL=1 FLOAT16=1. * Minor cleanup * Fix mish in CI, or no-op? * Is mish being framed? * It would help if any of this reproduced locally * ??? * OPT is reverted; use original mish * Cleanup post-review * Fix some shape usage * Tiler tests, shouldn't oom or overflow either * Can't CL if there's no CL? * Run tiler tests even if GPU=1 * relu6 segfault binary chop; revert test * relu6 segfault binary chop; revert accel * relu6 segfault binary chop; revert . (???) * end relu6 segfault binary chop; repo's haunted	2022-09-29 01:21:54 -04:00
George Hotz	e737513c52	external_test_opt	2022-09-28 23:29:41 -04:00
George Hotz	650c011646	notrain test	2022-09-28 23:27:20 -04:00
George Hotz	af87d692e4	should this be 10?	2022-09-28 23:25:52 -04:00
George Hotz	0fd459b24e	ugh, global state	2022-09-28 23:10:49 -04:00
George Hotz	fa4eff9cc1	Device.GPU isn't definied	2022-09-28 23:00:15 -04:00
George Hotz	0b6537a572	fix tests	2022-09-28 22:57:58 -04:00
George Hotz	726cca78cd	fix bn folding issue, add new test	2022-09-28 22:52:18 -04:00
George Hotz	60df954377	Fix weight init: this work? (#391 ) * this work? * glorot uniform * requies_grad broke * propagate the None correctly * so this weight init works * ahh, i think it's this * can't beat this * glorot is best for ae * remove comments	2022-09-25 16:46:33 -04:00
George Hotz	271446e3eb	set requires_grad to None (#387 ) * set requires_grad to None * some things need gradients * hmm, why was get_parameters filtering	2022-09-21 11:16:02 -04:00

4505 Commits