tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-26 07:18:40 -05:00

Author	SHA1	Message	Date
nimlgen	f863c12610	test kopt correctness (#1756 ) * test kopt correctness * bump BUDGET to 20 * kopt hooks as setUp/tearDown	2023-09-04 10:55:00 -07:00
chenyu	b8fde6bb0f	Test KOPT in CI (#1744 ) * test kopt in ci * getenv takes dtype from default	2023-09-03 14:37:20 -07:00
chenyu	a2745819f6	faster gpt2 jit path and gpt2 in test_real_world (#1738 )	2023-09-02 08:39:12 -07:00
nimlgen	8844a0a822	llvm jitted (#1652 )	2023-08-28 20:22:44 -07:00
chenyu	b5d700adae	update openpilot supercombo.onnx to 0.9.4 (#1681 ) * update openpilot supercombo.onnx to 0.9.4 * update tests for the new model * comment out comma models from external_model_benchmark	2023-08-26 19:16:08 -04:00
George Hotz	a6d842af7a	move device to ops (#1646 ) * move device to ops * mlops types * 2 lines	2023-08-23 08:30:17 -07:00
George Hotz	718ced296c	move state to nn/state (#1619 )	2023-08-22 07:36:24 -07:00
Yixiang Gao	8d6662a741	.cpu().numpy() -> .numpy() (#1594 ) * .cpu().numpy() -> .numpy() * restore ops_torch * restore test_speed_v_torch	2023-08-21 09:53:29 -07:00
George Hotz	739f327d2d	Shorter (#1582 ) * deleting lines * remove insert dims * if statement is never hit * bug fixes	2023-08-20 08:12:16 -07:00
chenyu	ae39cf84ab	Symbolic Shape JIT main PR (#1353 ) * Symbolic Shape JIT update tests 2 variables symbolic ops, adding more tests test passing cleanup * more test cases * single flag * review update * jit attention one piece * realize * symbolic_jit test for cuda * old artifact * works with cuda gpu but failed ci * CUDACPU	2023-08-18 14:39:55 -07:00
JaSpa99	d3d58a37e5	Bert: use Tensor.scaled_dot_product_attention (#1528 ) * use scaled attn from Tensor * add a test for bert * linter * no more tokenizer * without loading weights * remove prints * tribute to linter lords * smaller input and less runs * small bert	2023-08-12 08:46:04 -07:00
Yixiang Gao	7c2ea85bb0	Raise memory limit for CIFAR test (#1499 )	2023-08-08 19:40:56 -04:00
Yixiang Gao	6480a1a180	CIFAR 94.03% (#1340 ) * add disk_tensor * fix jit * new baseline before whitening * whitening through torch * whiting done currently at 91.65% * 91.99% * clean up mixup and 92.3% * clean up 92.30% * 92.49% before searching for new hyper-parameters * fix CI * fix white space * add whitening init in test * refactor, update hyperpara, 92.72% * converting whiting to tinygrad operation * update CI kernels count for CIFAR * add pad reflect * add random crop 92.53% * update hyperpara 93% * 93.15% on docker container, need to refactor the assignment for hyper param * print out weights and bias to be separated * bias/non-bias params separated * fix whitespace * clean up * refactor hyper-param with dict * refactor lr schedular params * fix whitespace * fix cross entropy loss * fix whitespace * move opt hyp to hyp dict * minor fixup * adjust model, loss scaling * 92.74% while using half of compute as before * update hyp for cutmix * random shuffle during batches * clean up * updating the model * update ConvGroup * disable gradients for batchnorm layer weights * whitespace * 93.92% * clean up * finally 94%git add .! * rewrite whitening to remove dependency on torch * whitespace * remove dependency on torch, 93.91% * back to 94.03% * clean up * update test_real_world	2023-08-08 15:13:24 -07:00
nimlgen	1ba8ae62a1	Match Torch speed for sum reduction (#1387 ) Co-authored-by: Alexander Edwards <alex@alexedw.com>	2023-08-05 22:27:33 -07:00
Pavol Rusnak	cd60b8561c	Add LLaMA-2 support (#1284 ) Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2023-07-24 17:12:02 -04:00
cheeetoo	a0965ee198	CI < 5 minutes (#1252 ) * models matrix * fix typo and install gpu deps * install llvm deps if needed * fix * testops with cuda * remove pip cache since not work * cuda env * install cuda deps * maybe it will work now * i can't read * all tests in matrix * trim down more * opencl stuff in matrix * opencl pip cache * test split * change cuda test exclusion * test * fix cuda maybe * add models * add more n=auto * third thing * fix bug * cache pip more * change name * update tests * try again cause why not * balance * try again... * try apt cache for cuda * try on gpu: * try cuda again * update packages step * replace libz-dev with zlib1g-dev * only cache cuda * why error * fix gpuocelot bug * apt cache err * apt cache to slow? * opt and image in single runner * add a couple n=autos * remove test matrix * try cuda apt cache again * libz-dev -> zlib1g-dev * remove -s since not supported by xdist * the cache takes too long and doesn't work * combine webgpu and metal tests * combine imagenet to c and cpu tests * torch tests with linters * torch back by itself * small windows clang test with torch tests * fix a goofy windows bug * im dumb * bro * clang with linters * fix pylint error * linter not work on windows * try with clang again * clang and imagenet? * install deps * fix * fix quote * clang by itself (windows too slow) * env vars for imagenet * cache pip for metal and webgpu tests * try torch with metal and webgpu * doesn't work, too long * remove -v * try -n=logical * don't use logical * revert accidental thing * remove some prints unless CI * fix print unless CI * ignore speed tests for slow tests * clang windows in matrix (ubuntu being tested in imagenet->c test) * try manual pip cache * fix windows pip cache path * all manual pip cache * fix pip cache dir for macos * print_ci function in helpers * CI as variable, no print_ci * missed one * cuda tests with docker image * remove setup-python action for cuda * python->python3? * remove -s -v * try fix pip cache * maybe fix * try to fix pip cache * is this the path? * maybe cache pip * try again * create wheels dir * ? * cuda pip deps in dockerfile * disable pip cache for clang * image from ghcr instead of docker hub * why is clang like this * fast deps * try use different caches * remove the fast thing * try with lighter image * remove setup python for cuda * small docker and cuda fast deps * ignore a few more tests * cool docker thing (maybe) * oops * quotes * fix docker command * fix bug * ignore train efficientnet test * remove dockerfile (docker stuff takes too long) * remove docker stuff and normal cuda * oops * ignore the tests for cuda * does this work * ignore test_train on slow backends * add space * llvm ignore same tests as cuda * nvm * ignore lr scheduler tests * get some stats * fix ignore bug * remove extra ' * remove and * ignore test for llvm * change ignored tests and durationon all backends * fix * and -> or * ignore some more cuda tests * finally? * does this fix it * remove durations=0 * add some more tests to llvm * make last pytest more readable * fix * don't train efficientnet on cpu * try w/out pip cache * pip cache seems to be generally better * pytest file markers * try apt fast for cuda * use quick install for apt-fast * apt-fast not worth * apt-get to apt * fix typo * suppress warnings * register markers * disable debug on fuzz tests * change marker names * apt update and apt install in one command * update marker names in test.yml * webgpu pytest marker	2023-07-23 13:00:56 -07:00
George Hotz	47f9d82722	test_conv: relax to 0.93	2023-07-23 12:57:29 -07:00
George Hotz	f45013f0a3	stable diffusion: remove realizes we don't need	2023-07-20 19:53:07 -07:00
George Hotz	50a399ffa3	real world test: relax memory	2023-07-20 14:06:22 -07:00
George Hotz	17830e25da	real world tests (#1297 ) * real world test * touchup * sync device	2023-07-20 10:50:22 -07:00
George Hotz	f7b0320d8b	add cifar training regression test (#1287 ) * add cifar training regression test * clean up print	2023-07-19 14:17:09 -07:00
Diogo	a9a1df785f	Webgpu support (#1077 ) * initial commit * 81 passing * 105 passing tests * 148 passing * CI tests * install dep on ci * try opencl pkgs * try using vulkan * down to only 6 failing * refactor * cleaning up * another test skipped due to buffer limit * linter * segfault * indent fix * another segfault found * small touchups * Fix max and maxpool tests * Add constant folding * Add javascript export script * better asserts in codegen * manual upcasting * reverted token type change * skip safetensor test due to unsupported type * FIx efficientnet and all other model tests * Remove np copy * fixed indent and missing import * manually destroy the buffer * revert back to length * linter errors * removed extra val * skip broken tests * skipping more tests * Make the page pretty * Save model weights as safetensor * Fix imagenet to c test * Fix second imagenet to c bug * Async and paralel kernel compilation * workgroup support * reversed local size * fixed non local bug * correct local groups * ci experiment * removed typo * Fix define local by using shared memory * Refactor * try running on mac * match metal tests * add more workers * scope down tests * trying windows runner * fixed windows env * see how many it can do * merged master * refactor * missed refactor * increase test suite coverage * missing import * whitespace in test_efficientnet.py * getting there * fixed reset * fixed bufs * switched to cstyle * cleanup * min/max rename * one more linter issue * fixed demo * linter * testing ci chrome * add unsafe webgpu arg * add build step * remove WEBGPU from cmd line * use module * try forcing directx * trying forced metal backend * temp disable conv2d for CI * disable conv_trasnpose2d --------- Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-07-12 12:52:06 -07:00
terafo	aa60feda48	Fix naming conflict with huggingface datasets (#1161 ) * Rename in files * Move files * Moved to extra/datasets as suggested * Changes to files * Fixed stupid mistake --------- Co-authored-by: terafo <terafo@protonmail.com>	2023-07-07 10:43:44 -07:00
Reza Rezvan	8ae9a054ae	Refactor nn.optim (#1091 ) * Refactor: nn.optim.py * Refactor: nn.optim.py; Fix all tests * Refactor: Replace all optim.get_parameters() * Refactor: Revert list comp. * Refactor: Replace optim.get_state_dict * Refactor: Change quickstart.md	2023-07-02 15:07:30 -07:00
Diogo	57d3aa76a5	Windows & Ubuntu CLANG CI support (#1011 ) * matrix strategy * push env to GITHUB_ENV * use printf instead of echo * use temp helper function for cross os paths * use path join * switched to using temp helper function * skip test on windows due to memory limit * small fix * removed semi * touchups * clean up * seperate tests * test changes to test_utils on windows * small refactor * more cleanups * undo helpers change * only skip if in CI and WINDOWS	2023-06-19 09:33:24 -07:00
Joqsan	ef129bcb85	Zero dim Tensor support (#777 ) * add and reorganize test_slice_* tests * refactor Tensor.__getitem__() * preliminary tests for 1) 0D tensors and 2) varargs for Tensor.zeros and Tensor.ones * always compare shapes of the numpy arrays obtained from tinygrad and torch tensors * add more tests for 0D support * remove test_tensor.test_slicing(). All slicing tests at test/test_ops.py * add zero-dim support * make test_end2end.py consistent with 0dim support * add test for tensor with zero in shape * don't simplify ones if shape is () * skip tests that need zero-size tensor support. - zero-size tensor support not related to 0dim tensors. * add tests for __getitem__() supporting strides >= 1 * refactor __getitem__: support for strides >= 1 * minor refactors and add comments to __getitem__ * add tests for slices with negative steps * add support for slices with negative strides	2023-06-01 11:32:02 -07:00
George Hotz	a3feee29c5	make tests faster + add onnx (#815 ) * search one dir, disable slow * onnx tests * fast rnnt test	2023-05-27 08:53:32 -07:00
wozeparrot	01ae45a43c	Add mlperf RNN-T model (#782 ) * feat: initial rnn-t * feat: working with BS>1 * feat: add lstm test * feat: test passing hidden * clean: cleanup * feat: specify start * feat: way faster lstm & model * fix: default batch size * feat: optimization * fix: fix metrics * fix: fix feature splicing * feat: cleaner stacktime * clean: remove unused import * clean: remove extra prints * fix: fix tests and happy llvm * feat: have the librispeech dataset in its own dir * clean: unused variable * feat: no longer need numpy for the embedding + slightly more memory efficient lstm * fix: forgot to remove something that broke tests * feat: use relative paths * feat: even faster * feat: remove pointless transposes in StackTime * fix: correct forward * feat: switch to soundfile for loading and fix some leaks * feat: add comment about initial dataset setup * feat: jit more things * feat: default batch size back to 1 larger than 1 is broken again :( and even in the reference implementation it gives worse results	2023-05-25 00:41:21 -07:00
Jacky Lee	c552f6f92b	Inference test: add tests for ResNet50 (#773 ) * Add ResNet inference test and cannon * Test with ResNet50 * test_car works with resnet fix	2023-05-13 21:18:15 -07:00
George Hotz	03b38864db	fix batchnorm at training (#753 ) * e2e testing * min failure * no affine on bn, still fails * why did i think i could detach that? * allow more kernels for bn * some test issue i don't understand	2023-04-19 08:01:04 -07:00
George Hotz	30b795874a	remove RMSprop, nobody uses it anymore	2023-03-20 12:31:34 -07:00
George Hotz	623fb1ef28	do test_conv_with_bn test	2023-03-19 23:53:56 -07:00
George Hotz	305b9f2d21	multistep optim tests passing	2023-03-11 17:49:53 -08:00
George Hotz	b14d31d6db	ConvNeXt + extras (#657 ) * simple convnext implementation * shorter function names * need to realize the random functions now * creating an optimizer realizes all params * assign contiguous * fix lazy lazy * why was i doing that...add convnext to tests * LazyNumpyArray * enable assert + comment * no two tiny	2023-03-06 22:10:56 -08:00
George Hotz	7dbcc26582	fix up external tests	2023-03-06 06:52:28 -08:00
George Hotz	9bd2cdee08	skip broken bn training test for speed	2023-03-03 06:52:11 -08:00
George Hotz	999b44c274	fix external test + speed	2023-03-03 06:46:16 -08:00
George Hotz	8919ca8163	test cleanups	2023-03-03 06:36:06 -08:00

38 Commits