tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-21 12:58:00 -05:00

Author	SHA1	Message	Date
Pavol Rusnak	52a92bf95d	use class Foo: instead of class Foo(): (#1797 ) * use class Foo: instead of class Foo(): * add ruff linter, copy settings from .flake8 to ruff.toml	2023-09-06 12:20:25 -07:00
George Hotz	fb1cc6bf4b	llama jit is default, print tok/sec (#1774 ) * llama jit is default, print tok/sec * jit not default in CI	2023-09-05 10:12:16 -07:00
nimlgen	f863c12610	test kopt correctness (#1756 ) * test kopt correctness * bump BUDGET to 20 * kopt hooks as setUp/tearDown	2023-09-04 10:55:00 -07:00
George Hotz	56abe04e4b	disable assembly (#1755 )	2023-09-04 09:41:20 -07:00
chenyu	b8fde6bb0f	Test KOPT in CI (#1744 ) * test kopt in ci * getenv takes dtype from default	2023-09-03 14:37:20 -07:00
George Hotz	89cd380bfc	add nvidia CI (#1737 ) * add nvidia * speed(nvidia)	2023-09-01 22:02:30 -07:00
George Hotz	fdd7f282cb	Reenable tensor cores for self-hosted Mac CI (#1717 ) * debug 5 matmul * allow tensor cores in CI * tensor cores on arm64 * put debug back	2023-08-30 07:53:04 -07:00
wozeparrot	2f768e386d	stable diffusion benchmark artifact (#1714 )	2023-08-29 21:08:40 -04:00
George Hotz	0ea22bf249	remove DEBUG=1 from stable diffusion AMD since jit cache is fixed	2023-08-29 12:46:12 -07:00
George Hotz	ab9b9ff3e2	pipefail benchmark (#1709 ) (#1710 ) * feat: specify shell * feat: specify shell for mac Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2023-08-29 08:15:02 -07:00
George Hotz	aa7c98722b	sd timing (#1706 )	2023-08-28 20:22:57 -07:00
George Hotz	f5f8b09c13	allow manual release (#1704 )	2023-08-28 17:54:25 -07:00
George Hotz	715047a1e4	fix release publish (#1703 )	2023-08-28 17:48:00 -07:00
chenyu	b5d700adae	update openpilot supercombo.onnx to 0.9.4 (#1681 ) * update openpilot supercombo.onnx to 0.9.4 * update tests for the new model * comment out comma models from external_model_benchmark	2023-08-26 19:16:08 -04:00
Roelof van Dijk	89b529c07f	[ready] ci: add py38 to linters (#1674 ) * ci: add py38 to linters * fix: run linters only on py38 --------- Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-26 09:34:15 -04:00
George Hotz	a6d842af7a	move device to ops (#1646 ) * move device to ops * mlops types * 2 lines	2023-08-23 08:30:17 -07:00
Roelof van Dijk	1900acda09	[READY] ci: setup venv cache (#1475 ) * ci: cache installed packages * ci: trigger jobs * ci: fix hashfiles argument --------- Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-20 18:43:16 -07:00
George Hotz	012ee7d162	not worth the speed (#1584 ) * not worth the speed * no slots * uops comments * bump to python 3.11 for speed * add critical slots back	2023-08-20 10:24:58 -07:00
George Hotz	ad7d26c393	fix __launch_bounds__ and benchmark TC MATMUL (#1575 ) * fix * benchmark matmul	2023-08-19 10:54:39 -07:00
chenyu	ae39cf84ab	Symbolic Shape JIT main PR (#1353 ) * Symbolic Shape JIT update tests 2 variables symbolic ops, adding more tests test passing cleanup * more test cases * single flag * review update * jit attention one piece * realize * symbolic_jit test for cuda * old artifact * works with cuda gpu but failed ci * CUDACPU	2023-08-18 14:39:55 -07:00
Roelof van Dijk	84e6693915	fix: apt-get to apt, no recommends, clean up (#1571 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-18 13:48:59 -07:00
Ethan Sorrell	cb62911f6b	PTX Reintegration and Passing Tests (#1512 ) * move assembly, assembly_ptx * successful but broken rendering of ptx asm * clear ins before render asm * slightly less broken :') * we needed thread syncs * fix float16 loading, rounding modifiers and other casting stuff, passing casts_from_half * Fix runtime_args for gpuocelot * our casts were flipped on both ends * more casting * add ternary where op * dealing with storing/loading bool * add test for casting to bool from negative * Fix args.valid on ConstOp * add to CI, TODO: fix runtime_args for test_uops * fix placement of runtime_args to work with lazy.Device * undo ci changes so I can push * fix lints * start cleanup and fix things we broke fixing lints * add checks for PTX specifc asm instructions * revert added test -- doesn't pass on llvm * skip tests for underflow,overflow * another fix for how we're setting runtime args * Less broken cleanup * add to CI * add more env variables for ci test * fix ci to install pycuda for ptx * ci: copy cuda test command * cleanup * assert to make sure we're actually running ptx in ci * remove test assert * move is_ptx arg * move assembly, assembly_ptx back to extras * fix imports * initial merge fixes * clear registers, fix UOps.LOAD with invalid value * draft merge fixes * remove prints * quick lint and merge fixes * cleanup * remove PTXProgram wrapper * final cleanup * temp change for ci rerun * ci rerun * rollback ISA version	2023-08-16 16:20:20 -07:00
chenyu	11dd9b1741	symbolic codegen and exec (#1552 ) * symbolic codegen and exec * fix and add test * no sketchy * merge_dicts type * dtypes._arg_int32	2023-08-16 14:43:41 -07:00
wozeparrot	074c467020	hotfix for broken ci (#1559 )	2023-08-16 13:52:03 -04:00
Steven Anderson	93a36c3659	Arm (#1421 ) * testing new memops * better debugging * testing padded conv * branching with load * refactoring a bit * first try * fixing bugs * fixing some * eq * eq2 * do not use x's * working * fixing imm * getting things working * refactor * pow not working * working except one * refactor: one store mem * refactor: global load * refactor: imm * refactor: cleaning * fixing big offsets * refactor with ci * try ci * typo * another typo * ubuntu default * forgot git * do i need git? * missing packages * adding python-dev * with cache? * buildx action * buildx name issue? * maybe now? * python3 * newline warning * maybe now * i actually need this * ci should work now * improved caching * fixing cache * maybe now it will cache * this * testing cache * trying again * load * missing platform * caching gha * testing cache * full testing * typo * now? * why * adding checkout back * bad formatting * fixing convention issues * supporting python * adding CI flag * testing all * better comments * adding debugging * takes 12x longer * does it output progress now? * ignore models for speed * fixing merge * excluding conv_transpose2d * only 2 test cuz is to slow * another approach * let's see * faster duh * my bad * T_T * typo * sup * with output? * comment test * comment test * comment test * :? * no comment * with cache * back to normal * testing that ci works * back to passing * trying again * does it create another entry * does it create another entry? * build local * hey * Revert "excluding conv_transpose2d" This reverts commit `cc7348de03`. * does it cache if done before? * does it cache? 
* done * adding test ops * bad formatting * no need for this * working static mem * sum 1d * add ndim * better reg import * fix stack * back to np * working except for softmax * 5 failing * no pogress * remove keystone * remove keystone * testops passing * cleanups * more cleanup * typo * ci * ci2 * cond import * ci3 * ci4 * ci4 * ci5 * ci5 * ci6 * aligment * test all * correct test * err read_unmapped * passing test * ignore for speed * ignore for speed * ci7 * cleanup * remove docker * fixing merge * fixing bugs * add skipload for const ops * comments * First merge to master: Renderer * fix emulation * passing all tests arm64 * cleaning * fix handcoded binary * cleaning * fix errs * fix runtime arg binary * clean git diff * fix and clean * fixing metal test * cleaning * fix metal test * ci ~8 min * fix pylint and clang * cache the files in ops_clang --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-08-14 19:29:30 -07:00
wozeparrot	29d5801387	distributed collectives (#1519 ) * feat: world * feat: tests * feat: no more backwards * feat: recv into * feat: whoops * feat: test in ci * feat: some debug logging * feat: workflow naming * feat: need to set pythonpath * feat: just send to same device * feat: allreduce * feat: test * feat: need contiguous * feat: test in ci * feat: exit with correct code * feat: don't need that * feat: opencl wait_for just doesn't work * feat: synchronize on out * feat: try? * feat: try again? * feat: add extra realizes * feat: print * feat: seed * feat: tol * feat: test ones and zeros * feat: remove print * feat: are you just flaky * feat: seperate scatter and gather? * feat: just try synchronizing * feat: remove print again * feat: bring back difference * feat: no sync * feat: revert that * feat: back to wait_for * fix: typo	2023-08-11 10:22:07 -07:00
wozeparrot	7e7c9001e9	distributed world (#1481 ) * feat: world * feat: tests * feat: no more backwards * feat: recv into * feat: whoops * feat: test in ci * feat: some debug logging * feat: workflow naming * feat: need to set pythonpath * feat: just send to same device	2023-08-10 10:00:51 -07:00
George Hotz	e3c6c0c6db	add GPT2 example (#1511 ) (#1514 ) * add gpt2 to examples * some cleanup * fixes * argparse + scaled_dot_product_attention * add timing * add to benchmark Co-authored-by: YassineYousfi <yassine.y10@gmail.com>	2023-08-10 09:09:47 -07:00
wozeparrot	351684395c	dont run on fork (#1510 )	2023-08-09 13:06:45 -04:00
wozeparrot	88e2e0c8a3	Revert "don't try to run benchmark on forks" (#1508 )	2023-08-09 12:59:49 -04:00
wozeparrot	65b65b760b	don't try to run benchmark on forks (#1507 )	2023-08-09 12:59:19 -04:00
Roelof van Dijk	aa83a9e910	ci: fix gpuocelot build cache (#1474 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-08 14:00:04 -07:00
Roelof van Dijk	e2cf0f322e	[READY] ci: missing n=auto (#1486 ) * ci: missing n=auto * fix: add to commented test --------- Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-08 07:37:24 -07:00
George Hotz	5fdd248617	don't download cifar (#1472 )	2023-08-06 21:38:59 -07:00
George Hotz	d78fb8f4ed	add stable diffusion and llama (#1471 ) * add stable diffusion and llama * pretty in CI * was CI not true * that * CI=true, wtf * pythonpath * debug=1 * oops, wrong place * uops test broken for wgpu * wgpu tests flaky	2023-08-06 21:31:51 -07:00
Diogo	d7d1011f1e	Add WEBGPU tests to CI (#1463 ) * webgpu tests * assert device is webgpu * missed env set * exclude failing ci tests * ignore test file * changed acc for adam test	2023-08-06 10:32:01 -07:00
George Hotz	486a9dbfd9	speed v torch (#1464 ) * speed v torch * always print * change print * torch speed tee * all exposed	2023-08-06 09:32:33 -07:00
George Hotz	2ab282bfec	run on update_benchmark too (#1460 ) * run on update_benchmark too * amd inference test * name it better * add 10 CIFAR training steps	2023-08-06 08:58:37 -07:00
George Hotz	943b227cb1	only on push to master	2023-08-06 00:10:07 -07:00
George Hotz	2274e3e757	Fix benchmark (#1454 ) * do benchmarking * system * artifact * go * name artifact * only on push	2023-08-05 23:44:36 -07:00
George Hotz	bf21aec81f	do benchmarking (#1451 ) * do benchmarking * system * artifact * go * name artifact	2023-08-05 23:35:01 -07:00
George Hotz	67781fcf5d	fix fail fast in CI	2023-08-05 10:24:24 -07:00
wozeparrot	ab9e4a2e93	Make cuda CI a bit more consistent (#1403 ) * feat: use fast-apt-mirror * feat: use in more places	2023-08-02 07:38:22 -07:00
Diogo	4dc8595069	simple exporting models (#1344 ) * unified exporting * json exporting * ignore more * simplified buffer export * added dtypes * added assert * swift example * fix tests * linter * remove whitespace * fixed tests * remove swift example * remove unintended changes * allow callable models to be used * whitespace * more readable json export * name change * whitespace * whitespace	2023-08-01 09:35:48 -07:00
George Hotz	f27df835a6	delete dead stuff (#1382 ) * delete bpe from repo * remove yolo examples * Revert "remove yolo examples" This reverts commit `cd1f49d466`. * no windows	2023-07-31 11:17:49 -07:00
George Hotz	37fa7e96fb	Revert "update editorconfig, enforce via CI (#1343 )" (#1380 ) This reverts commit `da2efecbe2`.	2023-07-31 10:35:50 -07:00
Pavol Rusnak	da2efecbe2	update editorconfig, enforce via CI (#1343 ) * update editorconfig to set unix-style newlines and trim whitespace * add editorconfig github action to the CI * fix whitespace	2023-07-30 18:44:30 -07:00
chenyu	ab80ea0d38	use ubuntu for clang ci test (#1368 )	2023-07-28 20:51:25 -04:00
waifairer	d89fb729e5	flake8 (#1323 ) * flake8: Ignore frequent violations, correct infrequent ones * Ignore some rules in test * Reorder test ignores * Lint test + main * EOF indent * Include all E71,E72 errors * Test the failing case in CI * Revert "Test the failing case in CI" This reverts commit `110add0a70`. * Push to test! This reverts commit `f317532779`. * ok back to passing This reverts commit `ba5052685f`. * Prove that CI fails when formatting is incorrect. * Fix formatting * Remove duplicitous E117 rule * Use flake8 config for precommit --------- Co-authored-by: waifairer <waifairer@gmail.com>	2023-07-24 11:19:58 -04:00
cheeetoo	a0965ee198	CI < 5 minutes (#1252 ) * models matrix * fix typo and install gpu deps * install llvm deps if needed * fix * testops with cuda * remove pip cache since not work * cuda env * install cuda deps * maybe it will work now * i can't read * all tests in matrix * trim down more * opencl stuff in matrix * opencl pip cache * test split * change cuda test exclusion * test * fix cuda maybe * add models * add more n=auto * third thing * fix bug * cache pip more * change name * update tests * try again cause why not * balance * try again... * try apt cache for cuda * try on gpu: * try cuda again * update packages step * replace libz-dev with zlib1g-dev * only cache cuda * why error * fix gpuocelot bug * apt cache err * apt cache to slow? * opt and image in single runner * add a couple n=autos * remove test matrix * try cuda apt cache again * libz-dev -> zlib1g-dev * remove -s since not supported by xdist * the cache takes too long and doesn't work * combine webgpu and metal tests * combine imagenet to c and cpu tests * torch tests with linters * torch back by itself * small windows clang test with torch tests * fix a goofy windows bug * im dumb * bro * clang with linters * fix pylint error * linter not work on windows * try with clang again * clang and imagenet? * install deps * fix * fix quote * clang by itself (windows too slow) * env vars for imagenet * cache pip for metal and webgpu tests * try torch with metal and webgpu * doesn't work, too long * remove -v * try -n=logical * don't use logical * revert accidental thing * remove some prints unless CI * fix print unless CI * ignore speed tests for slow tests * clang windows in matrix (ubuntu being tested in imagenet->c test) * try manual pip cache * fix windows pip cache path * all manual pip cache * fix pip cache dir for macos * print_ci function in helpers * CI as variable, no print_ci * missed one * cuda tests with docker image * remove setup-python action for cuda * python->python3? 
* remove -s -v * try fix pip cache * maybe fix * try to fix pip cache * is this the path? * maybe cache pip * try again * create wheels dir * ? * cuda pip deps in dockerfile * disable pip cache for clang * image from ghcr instead of docker hub * why is clang like this * fast deps * try use different caches * remove the fast thing * try with lighter image * remove setup python for cuda * small docker and cuda fast deps * ignore a few more tests * cool docker thing (maybe) * oops * quotes * fix docker command * fix bug * ignore train efficientnet test * remove dockerfile (docker stuff takes too long) * remove docker stuff and normal cuda * oops * ignore the tests for cuda * does this work * ignore test_train on slow backends * add space * llvm ignore same tests as cuda * nvm * ignore lr scheduler tests * get some stats * fix ignore bug * remove extra ' * remove and * ignore test for llvm * change ignored tests and durationon all backends * fix * and -> or * ignore some more cuda tests * finally? * does this fix it * remove durations=0 * add some more tests to llvm * make last pytest more readable * fix * don't train efficientnet on cpu * try w/out pip cache * pip cache seems to be generally better * pytest file markers * try apt fast for cuda * use quick install for apt-fast * apt-fast not worth * apt-get to apt * fix typo * suppress warnings * register markers * disable debug on fuzz tests * change marker names * apt update and apt install in one command * update marker names in test.yml * webgpu pytest marker	2023-07-23 13:00:56 -07:00

1021 Commits