tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-21 04:47:56 -05:00

Author	SHA1	Message	Date
geohotstan	5ed630204b	Add ONNX to CI for other backends (#2069 ) * some cleanup * move continue back * more more more * added to CI * try * try intentionally break some tests * wtf * del True for test * yay tests broke, now pls no break * try AGAIN * gahy * lol * try * move over constant * moved over MORE * move shrink over * trailing lines * try CUDA CI * try again * boom * oops * improved comments * try: disable some flags and disable CUDA * try breaking tests * traceback has too much info so add --tb=no * revert forced CI failure * add comments and del unused imports * oooooooo using regular debug try enable tb * intentionally break tests * added tb back. Maybe not too verbose * strip whitespcae * missed something * Shape op int32 -> int64 * oops missed something * add some types * get rid of crazy 1 liners in pad op * actually test Split this time LOL * strip that whitespace	2023-10-17 09:33:54 -07:00
George Hotz	5a4a62ecae	Disable logging in early compile2 and lower kernel counts (#2090 ) * Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)" This reverts commit `924ecc4d6a`. * gate behind OPT >= 4 * disable_logging in schedule * simple * from master * more images * revert that * 206 kernels	2023-10-16 20:15:24 -07:00
George Hotz	d0aaf7d83b	Revert "Revert "Revert "openpilot kernel fix from 209 to 207 (#2006 )" (#2065 )"" This reverts commit `f22a7cf656`.	2023-10-16 17:47:00 -07:00
George Hotz	5e24dc5a95	limit metal buffers and revert the 207 fix (try 2) (#2088 ) * limit metal buffers * look at the base, not the srcs * Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)" This reverts commit `924ecc4d6a`. * add a test for that	2023-10-16 14:52:16 -07:00
George Hotz	e8fcd2f3db	Revert "limit metal buffers and revert the 207 fix (#2087 )" This reverts commit `2fb10f6a19`.	2023-10-16 14:32:22 -07:00
George Hotz	2fb10f6a19	limit metal buffers and revert the 207 fix (#2087 ) * limit metal buffers * Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)" This reverts commit `924ecc4d6a`.	2023-10-16 14:26:32 -07:00
George Hotz	c36d306606	KOPT is over, BEAM is upstream (#2071 ) * create cache for q learning * make linter happy * global beam * where it belongs * bugfix * ditch the kopt, use the beam * faster lin and DEBUG=2 okay * remove kopt, move search to features	2023-10-16 09:46:03 -07:00
mmmkkaaayy	91168a28c4	whisper: make file transcription work, add basic CI test (#2042 )	2023-10-13 17:13:35 -07:00
George Hotz	924ecc4d6a	Revert "openpilot kernel fix from 209 to 207 (#2006 )" (#2065 ) This reverts commit `63869c62fc`.	2023-10-13 12:01:55 -07:00
Amrit Sahu	63869c62fc	openpilot kernel fix from 209 to 207 (#2006 ) * Fix openpilot kernel from 209 to 206 1. Use push_movement_ops conditions in _movement_op. Don't push PAD or check if the ops are safe to be pushed with PAD 2. Don't push if all the op.buffers are realized * change ALLOWED_KERNEL_COUNT to 206 for openpilot * don't push through sourceless buffers * change the tests to adjust kernel counts for new behaviour * restore pushing of movement ops through childless buffer * don't push EXPAND, causes OOM * allow push of intermediate movement ops * adding new test behaviour * modifying external_test_opt for new behaviour * restore old tests * Reenable push of EXPAND and introduce new tests I was wrong intially thinking EXPAND can cause OOM and hence I had disabled it. Since it is 0 stride and doesn't allocate memory its cool * Don't push EXPAND above LoadOps LB. This is causing OOM * Push should be decided on movement root of bufs To check if ast.op.buffers is sourceless/ realized go the the movement root and then decide if pushing should be done or not * refactor for readability * use .base instead * don't push expand, bad memory/compute consumption * restrict push of reshape, seeing improvement * push reshape if unary without further check * disable PAD solves convnext kernel count increase * reenable test_cache_binaryop_transpose * small nit	2023-10-13 11:59:15 -07:00
qazal	0e2e041faf	CI for using tinygrad as an external pkg (#2019 ) * create workflow * unify with test.yml	2023-10-08 10:50:48 -07:00
Vidhan Bhatt	94b21c41a7	ci: use `mypy.ini` (#1993 )	2023-10-06 01:45:28 -07:00
George Hotz	2d0c1037b1	Fix up latest openpilot model (#1976 ) * fix gemv triggering for gemm * fixup_openpilot * external test issues	2023-10-05 05:24:28 -07:00
Ahmed Harmouche	fb4d830a2a	Fix cast error in render_load in wgsl (#1956 ) * Fix cast error in wgsl * User render_cast intead of introducing new method * Make it shorter * Add back webgpu tests: efficientnet and dtypes	2023-10-04 02:29:14 -07:00
George Hotz	6a79d4044a	unrealized consts everywhere (#1963 ) * unrealized consts everywhere * don't import device from lazy * Device isn't in Lazy * same issue * disable jit random	2023-10-04 01:48:10 -07:00
George Hotz	6a4ec4776e	fix CI (#1953 ) * this work * unauth * update in all places	2023-10-02 02:58:58 -07:00
Francis Lam	f445e056ed	wmma: add test and tensor core shape (#1925 )	2023-09-28 18:04:28 -07:00
Yixiang Gao	10f0dc0c85	keep only one comment from git action bot (#1936 )	2023-09-28 20:24:53 -04:00
wozeparrot	70671d9625	fix test_collectives (#1934 ) * fix: fix test_collectives.py * feat: reenable test_collectives	2023-09-28 11:02:22 -07:00
George Hotz	adab724caa	schedule2, keep the tests working with small changes (#1932 ) * lazy cleanups * ast functions take in LazyOps * op instead of self.op * _base for mops * fix contiguous * start schedule * test_schedule * fix openpilot * more tests * bugfix and test skip * work * make sure things get freed * fix zerosized tensors * fix failing test * fix ceil and friends * fix openpilot * disable training * disable test collectives	2023-09-28 09:14:43 -07:00
George Hotz	1e15fdaee7	disable flaky triton test	2023-09-23 14:59:36 +08:00
Szymon Ożóg	58296c079d	Make Triton work again (#1547 ) * Move ops_triton to runtime and remove errors from deprecated code * Remove deprecated AST Kernel * Remove deprecated buffer * Add TritonProgram * Triton Buffer * Use RawCUDABuffer * triton_compile * Added new parameter * pass _buf to program * remove deprecated include * Added triton tests * Deprecated includes removed * remove double print * Disable float4 support * Disable float4 support * variable load fix * Track local size * Add pycuda to triton dependencies * Merge test.yml * install cuda packages for testing * merge double package install * remove emulated from triton tests * upscale local index to power of 2 and add masking * cuda envs * Add TernaryOps * ConstOp loading * proper function name * remove deprecated variables * get global program from name * const ops match local shape * Enable test_nn * remove deprecated import * fix linter error * Add wait logic * Add local size override * accumulate local shapes instead of using max shape * Merge triton tests into global tests * fix envs in testing * Old testing routine * split file into renderer and program * remove print and starting whitespace * pretty ptx print on debug 5 * linter errors * ignore triton saturation tests * ignore test example * remove pytorch cpu extra index * Add triton to existing testing routine * use triton tests * disable cuda backend in triton tests * use cudacpu in tests * print used device * Print device default * Remove print * ensure we are running triton backend * update variable signatures * update dtypes for load * infinity render fixed * limit global size * negative infinity now properly rendered * split chain with parentheses for and node * Add option to disable shared memory, disable for triton * missing import * Properly index and mask conditional load * use mask only if not loading a block pointer * nan support * fix symbolic tests to include chain split * proper masking for stores * Implemented bool dtype * Add 
mod * fix loads for variables with valid range * merge triton with cuda runtime * merge from master * run triton tests with cuda * Correct target when running from triton * conftest with triton compiler config * use triton nightly * verbose tests for triton * capture stdout * fix function depth when exiting multiple loops * add render valid function for readabilty * fix mask for local loops * add _arg_int32 datatype * fix dims for conditional loads * enable non float stores * correct variable dtypes * fix type for arg_int32 * remove junk * Added get max function for range based var.max * remove deprecated code * Fix triton ptxas path * Fix testing for CI * clamp local size by max local size instead of always running max * Disable matmul test in triton cpu * rerun tests * Disable broken test in triton cpu * whitespace removed * rerun tests again * Disable TestSymbolicOps for triton * update to new uops * linter fix * ignore test/extra * linting fix * Update tinygrad/renderer/triton.py Co-authored-by: Gijs Koning <gijs-koning@live.nl> * remove deprecated line * quotes type fix * linter * Remove unnecesary lines * UnaryOps.NEG * dont define constants * Linting fix * Disable tests that are broken in ocelot * remove trailing whitespace * reduce line count * linting fix * update to new uast * New looping style * Update to new uast * make AST runner work with triton * linting fix * set renderer var for testing * disable local for ocelot * reenable all tests for ocelot * Pass shared to cuda * Don't group if the backend doesn't support shared mem * use working gpuocelot branch * enable all tests * enable local for ocelot * cleanup * Update test.yml * update cache key * reenable test symbolic and extra * Update test.yml * Revert "Update test.yml" (rerun tests) This reverts commit `98c0630ee5`. * Revert "fix symbolic tests to include chain split" This reverts commit `22a9a4c9cd`. * Revert "split chain with parentheses for and node" This reverts commit `7499a7004e`. 
* use global size from linearizer * rename newvar to dtype to match other renderers * join program start lines * simplify code that adds axis to local dims * assign r[u] in ssa * We no longer need to replace target in src * we no longer need to cast indices to int by hand * Update triton.py(rerun tests) * Update triton.py(rerun tests) * Update triton.py(rerun tests) --------- Co-authored-by: Gijs Koning <gijs-koning@live.nl> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-09-23 14:17:12 +08:00
Umut Zengin	3987280daf	Fix VALIDHACKS for Images and make it default (#1832 ) * valid hacks * valid hacks * valid hacks * new method * new method * handtune * is gate load breaking? * lint ruff less junk new approach? maybe this? * Make it more clear * Make it more clear * Will deal with the linter later * hack for linter * subs the idx but dont touch the valid * Updated the mod rules * lint hack * I believe bug fix lets see * Mod Node left * revert * Maybe this wont break? * revert * implemented "handtuned garbage" * revert and use VALIDHACKS * Lets see the CI * still broken? * currently its jungle * maybe this jungle ? * This works for everything somehow * Added test for symbolic * lint * final touch * This still works * lint * midway clean * less garbage * lint * final form * Slow but working way * lint and other stuff * lint * mypy * Make sure CI test Openpilot valid checks * test if CI break * Convert back * refactor * refactor * Managed to reduce openpilot time from 30 secs to 5 secs * Refactor * Substitute a node with variable * flake8 * Comment and refactor * More comprehensive mod * refactor * bug fix * More shave off * remove not sure part	2023-09-23 07:34:43 +08:00
Yixiang Gao	84ab47a90a	add branch up-to-date check (#1879 )	2023-09-20 12:41:51 -04:00
Yixiang Gao	18ec5a9e09	add comment bot to CI (#1873 )	2023-09-16 12:22:06 -04:00
wozeparrot	c870764940	Revert "add line changes diff bot to CI (#1863 )" (#1870 )	2023-09-15 16:56:42 -04:00
Yixiang Gao	789c84a7a3	add line changes diff bot to CI (#1863 )	2023-09-15 16:29:58 -04:00
chenyu	29ac8293d7	run gpt2 in CI (#1866 )	2023-09-15 04:37:02 +08:00
chenyu	9e9ea20784	Fix view, CI cpu test with python 3.8 (#1845 )	2023-09-10 22:37:58 -04:00
George Hotz	0e3e2bac13	amd wino: upload results	2023-09-09 13:57:14 -07:00
George Hotz	6f95c5f284	winograd speed test for AMD (#1826 )	2023-09-09 13:56:33 -07:00
George Hotz	0f2bd10d00	add winograd CIFAR to mac tests (#1825 ) * add winograd CIFAR to mac tests * symlink already done	2023-09-09 13:45:24 -07:00
Pavol Rusnak	52a92bf95d	use class Foo: instead of class Foo(): (#1797 ) * use class Foo: instead of class Foo(): * add ruff linter, copy settings from .flake8 to ruff.toml	2023-09-06 12:20:25 -07:00
George Hotz	fb1cc6bf4b	llama jit is default, print tok/sec (#1774 ) * llama jit is default, print tok/sec * jit not default in CI	2023-09-05 10:12:16 -07:00
nimlgen	f863c12610	test kopt correctness (#1756 ) * test kopt correctness * bump BUDGET to 20 * kopt hooks as setUp/tearDown	2023-09-04 10:55:00 -07:00
George Hotz	56abe04e4b	disable assembly (#1755 )	2023-09-04 09:41:20 -07:00
chenyu	b8fde6bb0f	Test KOPT in CI (#1744 ) * test kopt in ci * getenv takes dtype from default	2023-09-03 14:37:20 -07:00
George Hotz	89cd380bfc	add nvidia CI (#1737 ) * add nvidia * speed(nvidia)	2023-09-01 22:02:30 -07:00
George Hotz	fdd7f282cb	Reenable tensor cores for self-hosted Mac CI (#1717 ) * debug 5 matmul * allow tensor cores in CI * tensor cores on arm64 * put debug back	2023-08-30 07:53:04 -07:00
wozeparrot	2f768e386d	stable diffusion benchmark artifact (#1714 )	2023-08-29 21:08:40 -04:00
George Hotz	0ea22bf249	remove DEBUG=1 from stable diffusion AMD since jit cache is fixed	2023-08-29 12:46:12 -07:00
George Hotz	ab9b9ff3e2	pipefail benchmark (#1709 ) (#1710 ) * feat: specify shell * feat: specify shell for mac Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2023-08-29 08:15:02 -07:00
George Hotz	aa7c98722b	sd timing (#1706 )	2023-08-28 20:22:57 -07:00
George Hotz	f5f8b09c13	allow manual release (#1704 )	2023-08-28 17:54:25 -07:00
George Hotz	715047a1e4	fix release publish (#1703 )	2023-08-28 17:48:00 -07:00
chenyu	b5d700adae	update openpilot supercombo.onnx to 0.9.4 (#1681 ) * update openpilot supercombo.onnx to 0.9.4 * update tests for the new model * comment out comma models from external_model_benchmark	2023-08-26 19:16:08 -04:00
Roelof van Dijk	89b529c07f	[ready] ci: add py38 to linters (#1674 ) * ci: add py38 to linters * fix: run linters only on py38 --------- Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-26 09:34:15 -04:00
George Hotz	a6d842af7a	move device to ops (#1646 ) * move device to ops * mlops types * 2 lines	2023-08-23 08:30:17 -07:00
Roelof van Dijk	1900acda09	[READY] ci: setup venv cache (#1475 ) * ci: cache installed packages * ci: trigger jobs * ci: fix hashfiles argument --------- Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-20 18:43:16 -07:00
George Hotz	012ee7d162	not worth the speed (#1584 ) * not worth the speed * no slots * uops comments * bump to python 3.11 for speed * add critical slots back	2023-08-20 10:24:58 -07:00

1003 Commits