* var_vals are global
* working with global-ish
* better
* fix export model
* fix tests
* better kv cache
* does it run?
* use where for kvmask
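A hedged NumPy sketch of what "use where for kvmask" suggests: select scores through a boolean mask instead of building the mask arithmetically (the shapes, names, and causal mask here are illustrative, not tinygrad's actual attention code):
```python
import numpy as np

seqlen = 4
scores = np.random.randn(1, seqlen, seqlen).astype(np.float32)
# causal kv mask: a position may only attend to itself and earlier positions
kvmask = np.tril(np.ones((seqlen, seqlen), dtype=bool))

# where keeps the score when the mask is true, else -inf so softmax zeroes it
masked = np.where(kvmask, scores, -np.inf)
probs = np.exp(masked - masked.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)
```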
* fix excessive var_vals
* fix import
* how does multigpu use this?
* llama kinda works
* faster and simpler
* cleanup
* fix conversation mode
* test cleanups
* fix one more test
* test cleanup
---------
Co-authored-by: George Hotz <geohot@gmail.com>
* limit metal buffers
* look at the base, not the srcs
* Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)"
This reverts commit 924ecc4d6a.
* add a test for that
* Fix openpilot kernel count from 209 to 206
1. Use the push_movement_ops conditions in _movement_op. Don't push
PAD, and don't check whether ops are safe to push together with PAD
2. Don't push if all of the op.buffers are realized (sketched below)
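A minimal sketch of the two push rules above, assuming hypothetical names (PUSH_MOVEMENT_OPS, op_name, realized) rather than tinygrad's real API:
```python
PUSH_MOVEMENT_OPS = {"RESHAPE", "PERMUTE", "EXPAND"}  # PAD deliberately excluded

def should_push(op_name, buffers):
  # rule 1: only allow-listed movement ops get pushed; leaving PAD out
  # entirely removes the need for a separate "safe with PAD" check
  if op_name not in PUSH_MOVEMENT_OPS: return False
  # rule 2: if every source buffer is already realized, pushing buys nothing
  if all(b.realized is not None for b in buffers): return False
  return True
```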
* change ALLOWED_KERNEL_COUNT to 206 for openpilot
* don't push through sourceless buffers
* change the tests to adjust kernel counts for new behaviour
* restore pushing of movement ops through childless buffer
* don't push EXPAND, causes OOM
* allow push of intermediate movement ops
* add tests for the new behaviour
* modify external_test_opt for the new behaviour
* restore old tests
* Reenable push of EXPAND and introduce new tests
I was initially wrong in thinking EXPAND could cause OOM, so I had
disabled it. Since it is 0-stride and doesn't allocate memory, it's fine (see the sketch below)
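The 0-stride claim is easy to check in NumPy, whose broadcast_to is the analogue of EXPAND:
```python
import numpy as np

x = np.ones((4, 1), dtype=np.float32)  # 16 bytes of real storage
y = np.broadcast_to(x, (4, 1024))      # EXPAND-style view
print(y.strides)                       # (4, 0): the expanded axis has stride 0
print(np.shares_memory(x, y))          # True: no new memory was allocated
```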
* Don't push EXPAND above a LoadOps LB. This was causing OOM
* Push should be decided at the movement root of the bufs
To check whether ast.op.buffers is sourceless or realized, go to the
movement root first and then decide whether pushing should be done (sketched below)
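A sketch of the "movement root" walk, assuming illustrative fields (op_name, src) in place of tinygrad's real LazyBuffer internals:
```python
MOVEMENT_OPS = {"RESHAPE", "PERMUTE", "EXPAND", "PAD", "SHRINK", "STRIDE"}

def movement_root(buf):
  # walk through pure movement ops until we hit a buffer that owns data;
  # sourceless/realized checks should then happen on the root, not on buf
  while getattr(buf, "op_name", None) in MOVEMENT_OPS:
    buf = buf.src[0]
  return buf
```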
* refactor for readability
* use .base instead
* don't push expand, bad memory/compute consumption
* restrict push of reshape, seeing improvement
* push reshape if unary without further check
* disabling PAD solves the convnext kernel count increase
* reenable test_cache_binaryop_transpose
* small nit
* lazy cleanups
* ast functions take in LazyOps
* op instead of self.op
* _base for mops
* fix contiguous
* start schedule
* test_schedule
* fix openpilot
* more tests
* bugfix and test skip
* work
* make sure things get freed
* fix zerosized tensors
* fix failing test
* fix ceil and friends
* fix openpilot
* disable training
* disable test collectives
* flake8: Ignore frequent violations, correct infrequent ones
* Ignore some rules in test
* Reorder test ignores
* Lint test + main
* EOF indent
* Include all E71,E72 errors
* Test the failing case in CI
* Revert "Test the failing case in CI"
This reverts commit 110add0a70.
* Push to test!
This reverts commit f317532779.
* ok back to passing
This reverts commit ba5052685f.
* Prove that CI fails when formatting is incorrect.
* Fix formatting
* Remove duplicate E117 rule
* Use flake8 config for precommit
---------
Co-authored-by: waifairer <waifairer@gmail.com>
* Add an additional kernel when reducing multiple dimensions at once (sketched below)
* Faster for smaller inputs
* Whitespace and naming
* Cleaner, guard for Metal only, and max 1 split rather than N
* Draft of different approach
* One additional kernel call for this test (as expected)
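The split-reduce idea from these commits, sketched in NumPy with hypothetical sizes: one big reduction becomes two smaller ones, at the cost of one extra kernel launch:
```python
import numpy as np

x = np.random.randn(1 << 20).astype(np.float32)
split = 1024
partial = x.reshape(split, -1).sum(axis=1)    # kernel 1: per-chunk sums
total = partial.sum()                         # kernel 2: combine the partials
assert np.allclose(total, x.sum(), atol=1.0)  # allow float32 rounding drift
```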
* fold expands that precede a reduce if the reduction is on the same axis as the expansion (see the sketch below)
* add deterministic test for SIMPLIFY_SUM_RESHAPE_EXPAND_SUM optimization
* add a test case to make sure we don't fold reduce-expand-reduce on different axes
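The fold itself is just arithmetic: summing an axis that was only created by an expand multiplies by the expansion factor, so the expand-reduce pair collapses to a single mul (NumPy check):
```python
import numpy as np

x = np.random.randn(8, 1).astype(np.float32)
n = 16
naive = np.broadcast_to(x, (8, n)).sum(axis=1)  # expand then reduce same axis
folded = x[:, 0] * n                            # the folded form: one multiply
assert np.allclose(naive, folded)
```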
* feat: promote Embedding to nn
* fix: fix failing test
* feat: add test with jit
* feat: rewrite embedding to no longer need stacked for loops (see the sketch below)
* clean+fix: don't know how that happened
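A sketch of the loop-free embedding formulation: compare the indices against an arange to build a one-hot tensor, then select rows with a single matmul (NumPy here; tinygrad's exact nn.Embedding may differ in details):
```python
import numpy as np

vocab_size, embed_dim = 10, 4
weight = np.random.randn(vocab_size, embed_dim).astype(np.float32)
idx = np.array([[1, 3, 3, 7]])  # (batch, seqlen)

one_hot = (idx[..., None] == np.arange(vocab_size)).astype(np.float32)
out = one_hot @ weight          # (batch, seqlen, embed_dim), no python loops
assert np.allclose(out, weight[idx])
```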
* e2e testing
* min failure
* no affine on bn, still fails
* why did i think i could detach that?
* allow more kernels for bn
* some test issue i don't understand
* make maximum split grad
* added test for maximum split grad when equal
* minor expr simplification
* (2-eq)/2 only once
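A sketch of the tie-splitting gradient these commits describe, with the (2-eq)/2 factor computed once and shared (the actual tinygrad Function may be written differently):
```python
import numpy as np

def maximum_backward(x, y, grad):
  eq = (x == y).astype(grad.dtype)
  half_on_tie = (2.0 - eq) / 2.0        # 1 where unequal, 0.5 on ties
  grad_x = grad * (x >= y) * half_on_tie
  grad_y = grad * (y >= x) * half_on_tie
  return grad_x, grad_y

gx, gy = maximum_backward(np.array([1., 2., 3.]), np.array([2., 2., 2.]), np.ones(3))
print(gx, gy)  # [0. 0.5 1.] [1. 0.5 0.]
```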
* update test because one more sum output child stays
* fix binop and other test failures
* that was a bad idea
* better layernorm
* inference kernel count tests
* new style reshape pushing
* fixup replacement
* 199 kernels is okay. fix flops
* push reshape through unaryops only
* GRAPH=2 draws the phantom ops
* found resnet issue
* non working test
* mul is cheaper than div
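The rewrite behind "mul is cheaper than div", sketched: fold the reciprocal of a constant once so the kernel multiplies instead of dividing (illustrative, not the actual rewrite code):
```python
c = 3.0
recip = 1.0 / c                   # computed once at kernel-build time
def scale(x): return x * recip    # emitted instead of x / c
assert abs(scale(6.0) - 2.0) < 1e-12
```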
* OPT inflation
* SHUFFLE_PAD_OPS in OPT=2
* runs one metal kernel
* conv2d works
* ops tests are passing
* const folding
* all ops work
* pre commit always passes
* torch works
* working still
* fix graph test
* tests passing
* image almost works
* image conv works
* most images
* fix custom
* fix assignment
* fix compile enet
* clean up comments
* fix realize return value
* include shapetracker in LB repr
* copy should make a copy
* reenable method cache
* fix lna
* dtypes in graph
* forward only for IMAGE=2
* simple realize
* getting close
* fixup new api, it's good except the kernel count
* back to 197 kernels
* tests should pass
* go to a real float
* no type_on_cpu
* fix the docs
* put shapetracker back in its proper place