tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-25 23:08:06 -05:00

Author	SHA1	Message	Date
George Hotz	4124cf1df5	cleanup tensor cores, expose exclude local upcast (#2064 ) * expose exclude_local_upcast * convert apply tensor cores to ops * update comment * put LOCAL back to what it was, BEAM is better than way	2023-10-14 09:21:03 -07:00
David Hou	6e4a12ab68	HIPProgram __del__ (#2058 ) * HIP: only load modules when run, add __del__ * only add del	2023-10-13 17:15:46 -07:00
mmmkkaaayy	91168a28c4	whisper: make file transcription work, add basic CI test (#2042 )	2023-10-13 17:13:35 -07:00
George Hotz	924ecc4d6a	Revert "openpilot kernel fix from 209 to 207 (#2006 )" (#2065 ) This reverts commit `63869c62fc`.	2023-10-13 12:01:55 -07:00
Amrit Sahu	63869c62fc	openpilot kernel fix from 209 to 207 (#2006 ) * Fix openpilot kernel from 209 to 206 1. Use push_movement_ops conditions in _movement_op. Don't push PAD or check if the ops are safe to be pushed with PAD 2. Don't push if all the op.buffers are realized * change ALLOWED_KERNEL_COUNT to 206 for openpilot * don't push through sourceless buffers * change the tests to adjust kernel counts for new behaviour * restore pushing of movement ops through childless buffer * don't push EXPAND, causes OOM * allow push of intermediate movement ops * adding new test behaviour * modifying external_test_opt for new behaviour * restore old tests * Reenable push of EXPAND and introduce new tests I was wrong intially thinking EXPAND can cause OOM and hence I had disabled it. Since it is 0 stride and doesn't allocate memory its cool * Don't push EXPAND above LoadOps LB. This is causing OOM * Push should be decided on movement root of bufs To check if ast.op.buffers is sourceless/ realized go the the movement root and then decide if pushing should be done or not * refactor for readability * use .base instead * don't push expand, bad memory/compute consumption * restrict push of reshape, seeing improvement * push reshape if unary without further check * disable PAD solves convnext kernel count increase * reenable test_cache_binaryop_transpose * small nit	2023-10-13 11:59:15 -07:00
George Hotz	90c777d815	remove apply_auto_opt (#2063 )	2023-10-13 07:44:14 -07:00
nimlgen	bd42fa0b73	kernel cache (#2035 ) * init compiled cache * clang not compile to stdout * use kwrags in compile * remove some useless lines * slimmer * fix * tabs * retry * remove decorator * no race in hip * smaller hip * unused import * unused pathlib * path to str * add test * fix linter * less lines? * decorator is back * update tests * no hip version * better comments * a bit better test * linter * work wo decorator * linter happy * simpler return type * more tests * better comment * readable * readable * readable * compile returns bytes * no ununsed imports * readable	2023-10-13 06:32:01 -07:00
George Hotz	6f1810af2d	with unroll, the action space goes from 161 -> 127 (#2060 ) * with unroll, the action space goes from 161 -> 127 * more reliable instrumentation * beam search is so op * beam bugfix	2023-10-12 20:52:23 -07:00
Umut Zengin	6b7ac5c431	ModNode __mod__ rule (#2039 ) * Implement mod rule * mypy * feat: New test added	2023-10-12 11:30:10 -07:00
Yixiang Gao	3187962476	CIFAR HALF mode (#2041 ) * load weights in fp16 * add dtype option in nn * fix test * no need for dtype in nn * add option to load weights in FP16, but NaN * change loss scaler * cast to float32 for norm layer * add a todo for the forward pass padding * fix transform	2023-10-12 10:19:51 -07:00
George Hotz	c5edb3c374	train value net, improve API, add BCE (#2047 ) * api cleanups, BCE losses * valuenet * fixup examples * learning okay * add valuenet runner * net improvements * net improvements * 40% win rate	2023-10-12 07:56:38 -07:00
George Hotz	0ba629c7b9	add world dataset (#2045 )	2023-10-11 15:54:30 -07:00
George Hotz	0c3b6f13a8	Latest opt (#2044 ) * split out actions * rl algorithm	2023-10-11 15:46:14 -07:00
geohotstan	8d6cecb25c	Torch eq fix (#1562 ) * init * Revert "init" This reverts commit `682bf2073a`. * kids dont do drugs * one way to fix * resolve merge conflict * no more or * clean up	2023-10-11 12:57:11 -07:00
George Hotz	41bfeb2c1e	start work on auto opt (#2034 ) * start work on auto opt * lin failure * not beating hcopt * greedy * timing is fast * codegen.search * greedy search in handcode_opt * track running gflops * clean up those files * no failure	2023-10-11 12:54:53 -07:00
chenyu	1c980517c5	s/var_vals_from_ast/vars_from_ast (#2038 )	2023-10-10 20:21:55 -07:00
Francis Lam	81c7d750db	test: fix test_linearizer.test_tensor_core test (#2036 ) must use apply_tensor_core instead of hand_coded_optimizations	2023-10-10 14:48:28 -07:00
chenyu	e2b83f1b42	Variable.bind newer (#2017 ) * Variable.bind attempt 2 * ShapeTracker.unbind * fix llama * fix types * test case * View.vars cleanup * include mask in symbolic source * mask can be sint * st.unbind in bufferops * assert ast contain free Variable only * cleanup * conservative unbinding reduce op arg * move reduceop unbind * fix llama JIT arg behavior	2023-10-10 10:03:01 -07:00
qazal	71d93ffd79	Refactor GPU and Metal langauges in their own separate renderers (#2033 ) * Refactor GPU and Metal langauges in their own separate renderers * remove CStyleLanguage imports * move renderers too	2023-10-10 07:46:41 -07:00
George Hotz	f139060103	Rewrite hand coded opt with action space (#2030 ) * tests passing * hand coded opt with new abstractions * simpler opts * split out tensor cores	2023-10-10 07:38:38 -07:00
Ahmed Harmouche	e27fedfc7b	Fix stable diffusion output error on WebGPU (#2032 ) * Fix stable diffusion on WebGPU * Remove hack, numpy cast only on webgpu * No-copy numpy cast	2023-10-10 06:40:51 -07:00
qazal	e40f141203	Refactor and add more unit tests for disktensors (#2022 ) * testing with the test_ops pattern * add assign test * flake8 complaining about single line fn * slice 2d and minor cleanup * make assign_slice a one-liner * we dont need to repeat the same lambda twice, default tinygrad_fxn to be np_fxn * back assign fn for np array * implement __setitem__ in tensor.py * dont re-slice the ret tesnsor * one liner assign * drop the permute test	2023-10-09 18:46:29 -07:00
chenyu	45f0891a8f	use "<" instead of "<=" in codegen for loop (#2027 )	2023-10-09 17:26:36 -07:00
chenyu	25555c836f	llama default to JIT only if device supports JIT (#2028 )	2023-10-09 17:26:02 -07:00
George Hotz	16ca8410f8	op logger + replay (#2021 ) * logops * fix dtype printing * needs inf * ops dataset * minor improvements * 12k kernels * opt can compile * graph flops	2023-10-08 15:10:18 -07:00
calledit	46f354b49f	Fix comment to describe code (#2023 )	2023-10-08 14:28:14 -07:00
qazal	0e2e041faf	CI for using tinygrad as an external pkg (#2019 ) * create workflow * unify with test.yml	2023-10-08 10:50:48 -07:00
George Hotz	8db92bd060	fix tvm gemm example	2023-10-08 05:57:41 -07:00
mmmkkaaayy	af6e2f31ca	whisper: cast model output token to int32 (#2013 ) Co-authored-by: mmmkkaaayy <mmmkkaaayy@users.noreply.github.com>	2023-10-08 05:56:22 -07:00
Luca Sciarpa	e93e240a6c	adapting test/external/external_osx_profiling.py to the new code base (#2002 ) * adapting external osx profiling * fixing dtype * fixing buffer size	2023-10-08 05:55:00 -07:00
wozeparrot	c4e8ea73bd	feat: add tinygrad.features to setup.py (#2016 )	2023-10-07 21:55:50 -07:00
Francis Lam	dece9958f8	wmma: clean up to make WMMA arg order consistent (#2014 ) also add cache defeat to extra/gemm/simple_matmul.py	2023-10-07 17:45:40 -07:00
George Hotz	cea4cbfc7a	move image+kopt to features (#2015 ) * move image+kopt to features * fix tests * debug prints (unrelated)	2023-10-07 15:41:08 -07:00
George Hotz	44ed94ef5c	use the device abstraction in handcode_resnet50_opt	2023-10-07 13:22:20 -07:00
George Hotz	6ee9cae44f	don't extract CIFAR every time / use the cache	2023-10-07 12:33:50 -07:00
nimlgen	d07ac379f9	add var_vals to kopt with symbolic (#2008 ) * add var_vals to kopt with symbolic again * no copies	2023-10-07 09:34:21 -07:00
George Hotz	121f7aa8c5	Schedule item (#2012 ) * ScheduleItem * put var_vals in the schedule * fix tests, wow that proliferated quickly * not ready to be in the schedule	2023-10-07 08:59:25 -07:00
George Hotz	f1f64bc88d	remove val_vars from the linearizer (#2009 ) * remove val_vars from the linearizer * no need to store var vals	2023-10-07 07:47:28 -07:00
George Hotz	dea8bb0938	triton isn't tested, and allows this refactor (#2007 ) * triton isn't tested * cuda buffer	2023-10-07 07:29:59 -07:00
George Hotz	23de1db727	strip whitespace	2023-10-07 06:06:27 -07:00
Roelof van Dijk	26fcc8dff6	fix: remove runtime imports (#1982 ) fix: import what is used probably monkeypatched fix: import revert selective import	2023-10-07 05:23:08 -07:00
George Hotz	f54959e5cd	move print tree into graph (#2003 ) * move print tree into graph * add winograd profiling test * change pre-commit to run ruff first	2023-10-07 04:39:21 -07:00
Ahmed Harmouche	2114dc13d1	Allow multi-input model export (#1995 ) * Allow multi-input model export * Add model export unit test * Fix efficientnet compilation * Only run model export test on JIT supported devices * Skip export model test if not EXPORT_SUPPORTED_DEVICE	2023-10-07 04:13:34 -07:00
George Hotz	ffa33d743a	good changes from openpilot_compile2 (#2000 ) * good changed from openpilot_compile2 * float32 image type was wrong * cleaner way to write that + a test	2023-10-06 13:33:24 -07:00
chenyu	05be57f57f	Fix llama with empty prompt (#1997 ) * fix llama with one token prompt * llama is all_jitted	2023-10-06 06:48:07 -07:00
George Hotz	7a68060422	Revert "allow local + grouped reduce in hand_coded (#1996 )" (#1998 ) This reverts commit `219a1f7063`.	2023-10-06 06:43:28 -07:00
nimlgen	219a1f7063	allow local + grouped reduce in hand_coded (#1996 ) * allow local + grouped reduce in hand_coded * allowed loop size based on global_dims * fix const * fix const one more time * better divisor * a bit fix * can take 2, why not * fix linter * better comments * start with 2 * not always pick group reduce * fix images * better images * better	2023-10-06 06:11:28 -07:00
George Hotz	fa9945dac0	remove stale tests	2023-10-06 02:14:56 -07:00
Vidhan Bhatt	94b21c41a7	ci: use `mypy.ini` (#1993 )	2023-10-06 01:45:28 -07:00
George Hotz	e43d8977f8	Revert "chore: add `py.typed` marker. (#1991 )" (#1994 ) This reverts commit `6d581e8911`.	2023-10-06 01:44:34 -07:00


2616 Commits