tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-10 07:28:15 -05:00

Author	SHA1	Message	Date
cloud11665	264b1e5f48	cache gpuocelot build in cuda CI (#1032 )	2023-06-22 17:42:12 -07:00
cloud11665	2407690d82	add cuda on cpu tests (#1020 )	2023-06-22 14:15:50 -07:00
George Hotz	18892242b0	global -> group (#1007 ) * global -> group * allow None for local_size in custom function * lil local * comment on shape * fix cuda * smart local cast * better local heuristic * fix ptx, and work_dim cleanup * fix metal * fix ops test * fix openpilot jit * no more optlocal * might fix metal tests * try metal now * see generated metal code * test free removal. REVERT THIS * mergable	2023-06-21 11:50:43 -07:00
Diogo	57d3aa76a5	Windows & Ubuntu CLANG CI support (#1011 ) * matrix strategy * push env to GITHUB_ENV * use printf instead of echo * use temp helper function for cross os paths * use path join * switched to using temp helper function * skip test on windows due to memory limit * small fix * removed semi * touchups * clean up * seperate tests * test changes to test_utils on windows * small refactor * more cleanups * undo helpers change * only skip if in CI and WINDOWS	2023-06-19 09:33:24 -07:00
George Hotz	0d4c4f4e9e	metal ci attempt (#1010 ) * metal ci attempt * skip failing ops tests * skip in the ops test * no dtype test	2023-06-19 09:23:55 -07:00
Diogo	6b1280f01c	fixes to Onnx ops LayerNormalization/Prelu and added OptionalHasElement/OptionalGetElement (#956 ) * prelu and where casting * typing for safe_numpy * optional * get rid of tracing in ci * cleanup and resolved layernorm issues * removed debug print	2023-06-08 16:09:19 -07:00
kposborne2	00360da05b	Update broken `docs/abstractions.py` for changed ops, and add to CI (#930 ) * fix and add to ci * still have those * ocd * update other doc	2023-06-04 19:21:20 -07:00
George Hotz	a3feee29c5	make tests faster + add onnx (#815 ) * search one dir, disable slow * onnx tests * fast rnnt test	2023-05-27 08:53:32 -07:00
George Hotz	faf80418b7	pyopencl by default since GPU is default (#802 )	2023-05-25 17:48:18 -07:00
George Hotz	03b38864db	fix batchnorm at training (#753 ) * e2e testing * min failure * no affine on bn, still fails * why did i think i could detach that? * allow more kernels for bn * some test issue i don't understand	2023-04-19 08:01:04 -07:00
George Hotz	dbc99c243b	why did that test break?	2023-04-18 17:08:38 -07:00
George Hotz	b12b60af20	fix binop, other tests failure (#723 ) * fix binop, other tests failure * that was a bad idea * better layernorm * inference kernel count tests * new style reshape pushing * fixup replacement * 199 kernels is okay. fix flops * push reshape through unaryops only * GRAPH=2 draws the phantom ops * found resnet issue * non working test * mul is cheaper than div * OPT inflation * SHUFFLE_PAD_OPS in OPT=2	2023-03-22 18:15:07 -07:00
George Hotz	f5467cfedc	Devicebufferless (#708 ) * runs one metal kernel * conv2d works * ops tests are passing * const folding * all ops work * pre commit always passes * torch works * working still * fix graph test * tests passing * image almost works * image conv works * most images * fix custom * fix assignment * fix compile enet * clean up comments * fix realize return value * include shapetracker in LB repr * copy should make a copy * reenable method cache * fix lna * dtypes in graph * forward only for IMAGE=2 * simple realize * getting close * fixup new api, it's good except the kernel count * back to 197 kernels * tests should pass * go to a real float * no type_on_cpu * fix the docs * put shapetracker back in it's proper place	2023-03-18 14:40:23 -07:00
Cyril Roumégous	3f08613a2a	apply flake8 E203 rule (#684 )	2023-03-11 11:35:16 -08:00
George Hotz	1826ff6b89	dtypes nice and clean (#673 ) * add dtype class * dtypes * buffers are lazy * dtype is tracked by lazybuffer and GenericShape * fix types in llvm * llvm store * dtype tests * fix tests maybe * fix flop counter * fix CI * CI fix and check format * fix dtype and dtype check * fix custom test * fix test graph	2023-03-10 16:56:07 -08:00
George Hotz	5dc227dba6	fix bug in ENABLE_METHOD_CACHE and enable for llvm	2023-03-06 07:43:40 -08:00
George Hotz	50012f679b	move get_contraction to shapetracker	2023-03-06 06:42:57 -08:00
George Hotz	7a1d96fd76	No negative (#632 ) * behavior is correct without VALIDHACKS * simple div and mod * fix tests * no negative variables * alt form is correct * still correct * bug in mulnode * at least validhacks works now * cleanups * test validhacks, and to_image_idx * cache compare key * tests and __neg__	2023-03-03 16:48:14 -08:00
George Hotz	999b44c274	fix external test + speed	2023-03-03 06:46:16 -08:00
George Hotz	459488bba2	fix linter (#630 ) * fix linter * no imports okay * explicit bases * disable in pylintrc	2023-03-02 20:06:20 -08:00
George Hotz	bfcec234a2	Refactor ASTs (#622 ) * ugh worst branch name * compiler refactor continues * scc -> cloc * buf -> _buf * finish _buf, and program -> runtime * gpu is still working, clang isn't * clang in new style * ops_metal * something broke it * improve metal * clean up tons of cl crap * hack fix sync * cleaner gpu * gpu metal clang * cleanups * minor refactor * GPUCodegen * fix up LLVM * blind CUDA refactor * codegen / runtime * keep ops naming * linter passes * woah, llvm was allocing 4x what it needed to * bugfixes * fix openpilot compiler * fix compile_efficientnet * method cache should fix tests * deal with duped functions	2023-03-01 18:57:29 -08:00
George Hotz	3c8da6bd03	add typing	2023-02-28 10:54:46 -08:00
George Hotz	d584bae5c0	fine, openpilot can have 197 kernels	2023-02-27 11:48:36 -08:00
George Hotz	c9252d38b2	mypy cache breaks if you sometimes check untyped defs, no checking tests for now	2023-02-27 09:57:33 -08:00
George Hotz	e74779f19d	typing fixup	2023-02-27 09:52:04 -08:00
George Hotz	edc8fbfff2	woah, why isn't OPT=2	2023-02-27 08:03:31 -08:00
George Hotz	f4ee7d2cad	back to 196 kernels	2023-02-25 18:25:34 -08:00
George Hotz	6e98a172a0	fix broken contiguous	2023-02-25 17:41:49 -08:00
George Hotz	a44e8e4385	discard children on mop shuffle, 200 -> 196 kernels	2023-02-25 10:51:07 -08:00
George Hotz	758515dcc0	conv2d is an hlop (#589 ) * conv2d is an hlop * shorter conv * KOPT=-1 * alt imp * MULACC * smarter mulacc * pop conv * 7x7 -> 5x5 * didn't fix, that's not going to work * this is faster and matches old behavior * oh, non lazy just won't work with mulacc * mulacc in torch * bool types were creeping in * optimizer is actually better with hlop conv * fix pushing permutes issue * refactor einsum_mulacc * fix up readme * update readme * _image_conv2d * fix bias addition location * pushing permutes gets back to 200 kernels * conv cleanup * disable hlop conv * don't hide that in helpers	2023-02-23 17:52:31 -08:00
George Hotz	628ce067a1	add tests to mypy	2023-02-22 07:07:38 -08:00
George Hotz	714bf4b108	clang backend (#572 ) * start clang backend * mostly working * no group for reduce w clang * it compiles * compiles * a11y * minor fixups * formatting * add a test * rename test	2023-02-20 18:18:18 -08:00
James Roberts	0d405fd5bc	Parallelize CI tests (#535 )	2023-02-06 15:27:44 -06:00
George Hotz	90529d3750	tests are 20% faster (#529 ) * pytorch CPU * no cache, it's slower * pytorch cpu for real * remove double onnx	2023-02-06 09:56:14 -06:00
George Hotz	6eb0e6a650	shuffle deps: always tqdm, make linting category	2023-02-06 09:27:01 -06:00
George Hotz	1d80639646	make linter test install testing deps	2023-02-06 09:21:48 -06:00
George Hotz	60bb64811c	merge mypy into linters, no useless package update	2023-02-06 09:14:00 -06:00
Martin Loretz	97f0a82be7	Cache pip packages in github actions (#522 ) * Cache pip dependencies in github actions * Add setup.py as cache-dependency-path * Test caching * Test caching * Upgrade setup python action * Test caching * Remove setup.py from cache-dependency-path * Don't remove cache-dependency-path * Don't cache linter package's * Test caching * Test caching * Test caching * Upgrade actions/checkout to v3	2023-02-03 20:04:20 -08:00
George Hotz	e313c8af20	update openpilot tests from OPENCL to GPU	2023-01-24 14:05:20 -08:00
George Hotz	49c6e6d472	Latest attempt to add image (#462 ) * add image * load + store + boring stuff: * image tests pass * thneed print GFLOPS * op conv test * more debugging * hack for multiview image * shapetracker creates less views * disable image tests * working better * ugh, lkey not key * print in DEBUG, and allow views * works * simple padding conv2d * use index for image * that was bad code * debug print * fix types * less lines * save lines	2023-01-12 17:36:30 -08:00
George Hotz	27211103ae	docker: no -it	2023-01-09 12:49:59 -08:00
George Hotz	d6e86a29a8	docker: forgot to checkout code	2023-01-09 12:48:03 -08:00
George Hotz	73ce9a771e	that fix it	2023-01-09 12:46:33 -08:00
George Hotz	bfd4f4e35c	testdocker	2023-01-09 12:41:52 -08:00
George Hotz	5e07d4669d	the speedy chonker is going to replace the old chonker (#432 ) * bringing back reshape and permute * done with E701 * 4x4 works in generic way * max and sum not vectorizing... * special case single float * support comparing to MPS * improve matmul speed, consider generic principles * GlobalCounter * fix op tracking * faster * comment that out for now * err, it needs that * fix minor issues * fix global_mem	2022-11-11 18:34:24 -08:00
George Hotz	b8c94a67c9	Simple chonker (#431 ) * chonker will make llvm fast * work * better speed tests, we will make them fast * with the cache add is the same speed * relu and neg are fast * fix sum speed * maximum maxnum? * hack for gemm opt * gemm very slow * zeros like * test_permute * shapetracker returns self * fix shapetracker factorization * err, int strides * permutes are faster now in tinygrad than pytorch * support -1 in expand * gemm unrolled * improve final test case * WIP GEMM * why isn't GEMM fast? * revert cache dim * ffp contract works on clang, not llvm? * ignore llvm ir * this makes fma work at least, but no faster * USE_4x4 * 63 GFLOPS * 87 GFLOPS * that wasn't matmul, 44 GFLOPS now * 82 GFLOPS permuted * this permute too * a little speed for the convs * 45 GFLOPS * speed tests pass again * clean up prints * fix FMA WHAT A WASTE OF TIME * colors * moar fair * GPU * useless on chonker * cleanups * improve factorized shapetracker * better threshold * label conv * work * ops test pass again * hot load the index * run the last view, no need to create * ZeroView needs a repr for the key to work * fix segfault on out of bounds * one more test * start amx, and llvm.initialize_native_asmparser * amx works * nice AMX class * nicer AMX class * refactor get_idxs * amx working * is slower... * useless flip * cache * SZ_X * AMX_SZ_X/Y work alone * Contiguous mlop * test gemm packed * PREPARE in packed * use_amx factor * prefetch isn't faster * loop * same 3ms * 2.24 ms * allow double on store in TG * amx reduce is the same speed as non amx reduce * include memory bandwidth * clean up shapetracker * flip returns stride * prepare for upstream * Update ops_llvm.py (#426) * permutes are yellow and green now * faster conv * llvm cleanups * Show optimised IR under debug 4 (#428) * ASTKernel class * Make tinygrad work with older python version (#427) * Make tinygrad work with older python version * Use partialmethod instead of partial * smiple chonker is chonking * remove junk from test speed vs torch * fix linker and types * AMX is only here now * add LLVM tests, it's a valid backend now * oops, run llvm test * contiguous_op * fix loadops compare * dedup reduceops Co-authored-by: calledit <1573053+calledit@users.noreply.github.com>	2022-11-10 23:17:09 -08:00
George Hotz	d02f8f9bc0	can we lose the lines with E701 still there?	2022-10-28 08:36:03 -07:00
George Hotz	ef62db3186	cleanups, remove E701	2022-10-28 08:28:56 -07:00
George Hotz	b65b70812a	Exec AST (#404 ) * working exec ast * exec_ast is staticmethod * GenericExecAST * fold that sometimes * ExplicitExecAST * exec_ast for GPU * gpu working * get_lazyop_shape * now gpubuffer is ExplicitExecAST * dedup * add a type * RESHAPE in opencl code * fix linter * that too for linter * cleanups * remove dead code * GenericShape is less lines * add ALLOWED_KERNEL_COUNT to tests * fix mypy * that's gotta be recursive * fix opencl shape processing * remove unneeded lambda	2022-10-28 08:27:03 -07:00
George Hotz	3b9b7eda48	remove run_thneed dead code	2022-10-20 17:24:18 -07:00

1 2 3

104 Commits