tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-03 19:25:06 -05:00

Author	SHA1	Message	Date
chenyu	a753c8e071	examples of new GPT2 and JIT change (#2261) * var_vals are global * working with global ish * better * fix export model * fix tests * better kv cache * does it run? * use where for kvmask * fix excessive var_vals * fix import * how does multigpu use this? * llama kinda work * faster and simpler * cleanup * fix conversation mode * test cleanups * fix one more test * test cleanup --------- Co-authored-by: George Hotz <geohot@gmail.com>	2023-11-10 15:07:02 -05:00
chenyu	680cbfdba4	less broken limit_dims_to_max (#2214)	2023-11-04 08:38:06 -07:00
George Hotz	7103b716c4	merge kernel and optimizer (#2200) * merge kernel and optimizer * linearize is reentrant * move global/local size * clean up linearizer copy * remove unneeded lin copies * stop linearizing twice * oops, that should be None	2023-11-01 15:20:01 -07:00
George Hotz	194e4ad6f8	Revert "optimizer: simplify GROUP and LOCAL to have one of each (#2162)" (#2182) This reverts commit `8cf0bb9351`.	2023-10-30 10:22:26 -07:00
Francis Lam	8cf0bb9351	optimizer: simplify GROUP and LOCAL to have one of each (#2162) * optimizer: simplify GROUP and LOCAL to have one of each Now that tensor cores only use LASTLOCAL, we can simplify to use only that op everywhere. The only use of GROUP is in matvec hand-coded opts and it doesn't make a performance difference so switching to use only the top behavior. Also adds additional asserts to prevent tensor core dims from being altered which causes bad kernels to be generated. * search: remove duplicated actions	2023-10-27 11:37:44 -10:00
Francis Lam	bf3490cdf9	wmma: refactor tensor cores using existing local dims (#2097) * wmma: refactor tensor cores using existing local dims * optimizer: fix bad rebase and break after one late local --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-10-25 13:10:46 -04:00
Francis Lam	ace6b2a151	optimizer: add test for correctness of opts (#2124) * optimizer: add test for correctness of opts Also added OptOps.UPCASTMID to constrain valid axes for opts with group_for_reduce. * llvm: fix LinearizerOptions to correctly not has_shared * optimizer: remove premature test scaffold for TC opts * search: fix the action space	2023-10-22 08:02:22 -07:00
David Hou	95e17ff0d4	fix wino mask upcast calculation (#2057) * fix wino mask upcast calculation * add tests for wino upcast hcopt * add info to note * real world wino hcopt test * wino backward test * whitespace	2023-10-18 16:54:48 -07:00
George Hotz	90c777d815	remove apply_auto_opt (#2063)	2023-10-13 07:44:14 -07:00
Francis Lam	81c7d750db	test: fix test_linearizer.test_tensor_core test (#2036) must use apply_tensor_core instead of hand_coded_optimizations	2023-10-10 14:48:28 -07:00
George Hotz	121f7aa8c5	Schedule item (#2012) * ScheduleItem * put var_vals in the schedule * fix tests, wow that proliferated quickly * not ready to be in the schedule	2023-10-07 08:59:25 -07:00
nimlgen	2ea1dd3e87	no process() in Linearizer (#1966) * no process() in Linearizer * more process() clean up	2023-10-04 07:18:42 -07:00
George Hotz	f64d5b3ba8	move to realize.py (#1961) * move to realize.py * run_schedule moved	2023-10-03 07:25:40 -07:00
George Hotz	d48a90859c	use the opts from the default device (#1954)	2023-10-02 03:13:46 -07:00
David Hou	d4671cd8e3	use schedule in more places in linearizer tests (#1946) * pass current linearizer opts to Linearizer in TestFloat4 * use schedule instead of exec_ast hook	2023-10-02 02:22:56 -07:00
David Hou	8e9db88474	expand after expr_idxs in Linearizer.global_load (#1818) * small changes * expand in terms of substitute, directly expand g_idxs g_valid * delete expand_ops * don't compare using hash * any instead of in thanks gijskoning Co-authored-by: Gijs Koning <gijs-koning@live.nl> * support tc * testing code * no more create_rednode * maxsize none in view/node * oops * undo * typing * oops * oops * lmao * lmao * add expand multi test * Node.iter_idxs * type * type * delete checks! * clean up a little? * expand_idx in symbolic * un-golf * play around with types >.> * test_substitute and also remove an incorrect test? * get rid of range * Update symbolic.py * split out view cache change * split out flat components change * reduce diff * reduce diff * add some float4 tests * fix --------- Co-authored-by: Gijs Koning <gijs-koning@live.nl>	2023-09-29 10:33:34 -07:00
Francis Lam	f445e056ed	wmma: add test and tensor core shape (#1925)	2023-09-28 18:04:28 -07:00
George Hotz	c907efbf4a	reorder a few things (#1915) * reorder a few things * huh, that has to be there * move apply shapetracker * BufferOps * only for type checking	2023-09-25 10:17:21 +08:00
George Hotz	7ff7aacdb4	LazyOp out of Linearizer (#1908) * loadop buffer on cpu * works for GPU * sort of working * has bugs * gpu tests pass * fix some tests * fix tensor cores * fix test linearizer * fix symbolic * fix has_variable_shape * non symbolic size * disable weird test * simple cache fix * fix custom function * fix kopt * cleanups * a bit broken on the assign * contig check * only buffer * need that order * idx * dedup buffers * hmm, bugfix * fix tensor cores * opts device	2023-09-24 14:30:53 +08:00
George Hotz	97dc813329	Revert "All LazyOps in the Linearizer (#1905)" (#1907) This reverts commit `a5820390db`.	2023-09-24 11:51:22 +08:00
George Hotz	a5820390db	All LazyOps in the Linearizer (#1905) * loadop buffer on cpu * works for GPU * sort of working * has bugs * gpu tests pass * fix some tests * fix tensor cores * fix test linearizer * fix symbolic * fix has_variable_shape * non symbolic size * disable weird test * simple cache fix * fix custom function * fix kopt * cleanups * a bit broken on the assign * contig check * only buffer * need that order * idx	2023-09-24 11:50:00 +08:00
nimlgen	31fca43706	kopt works with local+grouped reduce and tests (#1824)	2023-09-09 13:22:09 -07:00
George Hotz	ed194a1d3b	zero fold (#1748) * add constant fold * err, it's just zero folding * self store fold + caching * prints and more folds * simpler winograd kernels * remove childless uops	2023-09-03 13:48:11 -07:00
nimlgen	1c0449e190	add cache collector (#1595) * init cache collector * add test_cache_collector.py * switch GlobalCounters.cache to CacheCollector * init jit models test * jitted SD * add debug msg to print loaded bufs count * moved cache collctor to jit * clearer SD * no double device import	2023-08-28 19:59:55 -07:00
George Hotz	a6d842af7a	move device to ops (#1646) * move device to ops * mlops types * 2 lines	2023-08-23 08:30:17 -07:00
David Hou	4fbce972d7	CSE at uop level (#1483) * uop-level cse * add test * don't cache reduce alu ops * types * rename variable * fix * delete lines	2023-08-19 23:40:40 -07:00
David Hou	92754e177c	cache buffer loads across multiple bufs (#1482) * cache loads across buffers (since they may share rawbufs) * typing * add test * fix test * small changes to test * fix test * one big cache * whitespace * golf a line? * invalid is RawBuffer(0)[0], valid 1.	2023-08-19 09:09:58 -07:00
David Hou	56ee97b37f	dedup kernel args v2 (#1272) * new version * fix abstractions * try remove test * Revert "try remove test" This reverts commit `2fc18a9f8e`. * assert_allclose * minimize the test * minimize the test * minimize the test * minimize the test * Revert "minimize the test" This reverts commit `e0c0929596`. * Revert "minimize the test" This reverts commit `88240551b1`. * Revert "minimize the test" This reverts commit `78328a7ce2`. * Revert "minimize the test" This reverts commit `989523fded`. * skip test inside body * oops * oops	2023-07-18 20:03:42 -07:00

28 Commits