tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-26 07:18:40 -05:00

Author	SHA1	Message	Date
George Hotz	a6d842af7a	move device to ops (#1646 ) * move device to ops * mlops types * 2 lines	2023-08-23 08:30:17 -07:00
nimlgen	a65ae1198b	do replace div->mul for non-floats (#1644 )	2023-08-23 07:34:31 -07:00
George Hotz	db8344ab83	add noalias to llvm (#1622 )	2023-08-22 09:26:01 -07:00
George Hotz	c64c47a6ae	test arange simple	2023-08-21 20:16:17 -07:00
Umut Zengin	35bf21276f	Argmax/Argmin Feature (#1576 ) * implemented argmax and argmin * lint * lint * match torch behaviour * format * removed flip	2023-08-20 18:46:46 -07:00
geohotstan	a293c18d34	Gather bugfix (#1561 )	2023-08-16 19:53:14 -04:00
geohotstan	8763037f0e	Fancy indexing is fancy wow and gather thing (#1399 )	2023-08-16 18:35:49 -04:00
nimlgen	b6937acb7e	fix casting behavior for interpreted buffers (#1525 )	2023-08-13 19:21:37 -07:00
George Hotz	38fe84d92b	cleanup mlops (#1521 ) * cleanup mlops * that line belongs there	2023-08-10 19:53:28 -07:00
geohotstan	07b79f210f	llvmir support for bool <-> float casting (#1492 )	2023-08-09 13:12:52 -04:00
Jacky Lee	ef5f648e2f	Tensor.scaled_dot_product_attention to match torch, used in LLaMA, and tested (#1502 ) * Implement scaled_dot_product_attention and test * Support attn_mask * Support is_causal too * Use in llama * Don't forget to reshape * Set requires_grad=False for causal * Remove staticmethod * Remove extra spaces	2023-08-08 23:27:13 -07:00
George Hotz	d24f936501	just cmplt (#1493 ) * just cmplt * fix maximum * don't save, there's no backward * ugh, no slot either * eq is a scam	2023-08-08 13:58:10 -07:00
nimlgen	932dad1a2b	fix cast bool->float in llvmir (#1480 ) Closes #1479	2023-08-07 21:30:51 -07:00
Diogo	d7d1011f1e	Add WEBGPU tests to CI (#1463 ) * webgpu tests * assert device is webgpu * missed env set * exclude failing ci tests * ignore test file * changed acc for adam test	2023-08-06 10:32:01 -07:00
Francesco Castelli	579f4615a0	Add assert for wrong matmul/dot shapes (#1438 )	2023-08-04 18:16:56 -04:00
Umut Zengin	52db7d7435	inf, -inf support for pad (#1436 )	2023-08-04 15:05:25 -04:00
Umut Zengin	8889821547	Const pad support to pad2d and slice (#1392 ) * slice to pad2d migrate * Gain line * Mypy happy * Mypy happy * Revert * whitespace	2023-08-02 08:58:52 -07:00
Diogo	ba5e3818a0	Limit dims based on max size (#1390 ) * working * whitespace * changed defaults to None * linter * last linter error	2023-07-31 19:18:19 -07:00
Umut Zengin	0de5f20970	Re-open constant pad support to Tensor.pad (#1388 ) * Added const padding support to .pad * Linter	2023-07-31 17:08:57 -07:00
wozeparrot	32d1afa4b5	feat: correct case when base is 0 (#1360 )	2023-07-27 13:53:38 -04:00
wozeparrot	c22e77abfd	Match torch on fractional negative base pow (#1352 ) * feat: match torch on fractional negative base pow * feat: tests for trunc	2023-07-26 19:14:54 -07:00
Umut Zengin	d4ebadf2da	Small Tensor.cat optimization and reformating (#1347 )	2023-07-26 18:01:12 -04:00
geohotstan	4056f97187	Gather (#1329 )	2023-07-25 15:05:41 -04:00
waifairer	d89fb729e5	flake8 (#1323 ) * flake8: Ignore frequent violations, correct infrequent ones * Ignore some rules in test * Reorder test ignores * Lint test + main * EOF indent * Include all E71,E72 errors * Test the failing case in CI * Revert "Test the failing case in CI" This reverts commit `110add0a70`. * Push to test! This reverts commit `f317532779`. * ok back to passing This reverts commit `ba5052685f`. * Prove that CI fails when formatting is incorrect. * Fix formatting * Remove duplicitous E117 rule * Use flake8 config for precommit --------- Co-authored-by: waifairer <waifairer@gmail.com>	2023-07-24 11:19:58 -04:00
George Hotz	086382b64e	Revert "Fix max nan (#1298 )" (#1334 ) This reverts commit `50774470b2`.	2023-07-23 20:41:28 -07:00
uncommonSensor	50774470b2	Fix max nan (#1298 ) * Fix max nan * Adds nan check option to max function * Calls to max can pass in "ignore_nan=True" argument * Added max nan CI tests * Fix max nan * Adds nan check option to max function * Calls to max can pass in "ignore_nan=True" argument * Added max nan CI tests * Turned off due to the need for granularity	2023-07-23 19:39:44 -07:00
cheeetoo	a0965ee198	CI < 5 minutes (#1252 ) * models matrix * fix typo and install gpu deps * install llvm deps if needed * fix * testops with cuda * remove pip cache since not work * cuda env * install cuda deps * maybe it will work now * i can't read * all tests in matrix * trim down more * opencl stuff in matrix * opencl pip cache * test split * change cuda test exclusion * test * fix cuda maybe * add models * add more n=auto * third thing * fix bug * cache pip more * change name * update tests * try again cause why not * balance * try again... * try apt cache for cuda * try on gpu: * try cuda again * update packages step * replace libz-dev with zlib1g-dev * only cache cuda * why error * fix gpuocelot bug * apt cache err * apt cache to slow? * opt and image in single runner * add a couple n=autos * remove test matrix * try cuda apt cache again * libz-dev -> zlib1g-dev * remove -s since not supported by xdist * the cache takes too long and doesn't work * combine webgpu and metal tests * combine imagenet to c and cpu tests * torch tests with linters * torch back by itself * small windows clang test with torch tests * fix a goofy windows bug * im dumb * bro * clang with linters * fix pylint error * linter not work on windows * try with clang again * clang and imagenet? * install deps * fix * fix quote * clang by itself (windows too slow) * env vars for imagenet * cache pip for metal and webgpu tests * try torch with metal and webgpu * doesn't work, too long * remove -v * try -n=logical * don't use logical * revert accidental thing * remove some prints unless CI * fix print unless CI * ignore speed tests for slow tests * clang windows in matrix (ubuntu being tested in imagenet->c test) * try manual pip cache * fix windows pip cache path * all manual pip cache * fix pip cache dir for macos * print_ci function in helpers * CI as variable, no print_ci * missed one * cuda tests with docker image * remove setup-python action for cuda * python->python3? 
* remove -s -v * try fix pip cache * maybe fix * try to fix pip cache * is this the path? * maybe cache pip * try again * create wheels dir * ? * cuda pip deps in dockerfile * disable pip cache for clang * image from ghcr instead of docker hub * why is clang like this * fast deps * try use different caches * remove the fast thing * try with lighter image * remove setup python for cuda * small docker and cuda fast deps * ignore a few more tests * cool docker thing (maybe) * oops * quotes * fix docker command * fix bug * ignore train efficientnet test * remove dockerfile (docker stuff takes too long) * remove docker stuff and normal cuda * oops * ignore the tests for cuda * does this work * ignore test_train on slow backends * add space * llvm ignore same tests as cuda * nvm * ignore lr scheduler tests * get some stats * fix ignore bug * remove extra ' * remove and * ignore test for llvm * change ignored tests and durationon all backends * fix * and -> or * ignore some more cuda tests * finally? * does this fix it * remove durations=0 * add some more tests to llvm * make last pytest more readable * fix * don't train efficientnet on cpu * try w/out pip cache * pip cache seems to be generally better * pytest file markers * try apt fast for cuda * use quick install for apt-fast * apt-fast not worth * apt-get to apt * fix typo * suppress warnings * register markers * disable debug on fuzz tests * change marker names * apt update and apt install in one command * update marker names in test.yml * webgpu pytest marker	2023-07-23 13:00:56 -07:00
madt2709	d2c1e8409a	Update arange to be (start, stop, step) (#1308 )	2023-07-21 00:27:23 -04:00
Umut Zengin	74e63fe4ee	Added test_chunk and fixed (#1283 )	2023-07-19 22:21:26 -04:00
Umut Zengin	fde9f0e60d	Slice migrated in Eye op (#1281 ) * Migrated from slice to pad and shrink, made cleaner * Changed repeat with reshape and expand	2023-07-19 09:08:38 -07:00
Umut Zengin	fa0265b173	Fix: AssertionError Transpose/Permute when WHERE Op in LB (#1266 )	2023-07-18 16:09:19 -04:00
Stan	ed472bffea	Fix: negative axis in `tensor.cumsum` (#1261 )	2023-07-17 16:16:38 -07:00
Adrian Kretz	5a8ad57163	Add WHERE ternary (or trinary?) op (#1196 ) * Rename FusedOps to TernaryOps * Support ternary broadcast * Add where llop and mlop * Make where op work in cstyle codegen * Don't skip test_inf_where * Add backward path to where op * Use bool in cstyle codegen * Add LLVM where op * Add numpy where op * Add torch where op * Simplify where mlop * Update documentation * Forgot a rename * Merged relevant changes from PR #1195 onto PR #1196 * Add test to cover changes to linearizer.ast_parse for WHERE op Without this METAL will try to use ternary op on float4 and fail * Make where op work in wgsl backend * Allow ternary ops to be merged * Make mypy happy --------- Co-authored-by: Francis Lam <flam@alum.mit.edu>	2023-07-16 00:31:55 -07:00
Stan	264d467f2b	Added `tensor.squeeze` and support for testing exceptions (#1241 ) * WIP: `tensor.squeeze` function * Added `test_except` param to `helper_test_op` to avoid false positives * Extracted new method `helper_test_exception` for testing exceptions * Made `squeeze` not throw IndexError when ndim == 0 and dim <= 0 to match PyTorch	2023-07-15 00:33:24 -07:00
Diogo	a9a1df785f	Webgpu support (#1077 ) * initial commit * 81 passing * 105 passing tests * 148 passing * CI tests * install dep on ci * try opencl pkgs * try using vulkan * down to only 6 failing * refactor * cleaning up * another test skipped due to buffer limit * linter * segfault * indent fix * another segfault found * small touchups * Fix max and maxpool tests * Add constant folding * Add javascript export script * better asserts in codegen * manual upcasting * reverted token type change * skip safetensor test due to unsupported type * FIx efficientnet and all other model tests * Remove np copy * fixed indent and missing import * manually destroy the buffer * revert back to length * linter errors * removed extra val * skip broken tests * skipping more tests * Make the page pretty * Save model weights as safetensor * Fix imagenet to c test * Fix second imagenet to c bug * Async and paralel kernel compilation * workgroup support * reversed local size * fixed non local bug * correct local groups * ci experiment * removed typo * Fix define local by using shared memory * Refactor * try running on mac * match metal tests * add more workers * scope down tests * trying windows runner * fixed windows env * see how many it can do * merged master * refactor * missed refactor * increase test suite coverage * missing import * whitespace in test_efficientnet.py * getting there * fixed reset * fixed bufs * switched to cstyle * cleanup * min/max rename * one more linter issue * fixed demo * linter * testing ci chrome * add unsafe webgpu arg * add build step * remove WEBGPU from cmd line * use module * try forcing directx * trying forced metal backend * temp disable conv2d for CI * disable conv_trasnpose2d --------- Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-07-12 12:52:06 -07:00
madt2709	bb316a42af	Fix pow to work with negative tensors (#1191 )	2023-07-09 17:33:04 -07:00
George Hotz	43385c7dbf	remove contiguous on full (#1212 )	2023-07-09 17:31:15 -07:00
George Hotz	67e34b356a	good stuff from tensor cores branch (#1199 )	2023-07-08 16:58:26 -07:00
George Hotz	7151382364	Refactor load/store before tensor cores (#1193 ) * minor cleanups * render_const * now that's a nice refactor * clean up vload/vstore * clean up render_load * debugs there * dumb * err, this? * const float4 * what's failing * bugfix * statement includes semicolon * bugfix	2023-07-08 15:54:58 -07:00
Eli Frigo	801564f31b	Remove POW llop and add SQRT llop (#1104 ) * fixed division by zero for fast operations * made et closer to 0 * replace POW llop with SQRT * updated mlops to swap SQRT and POW llops * updated hlops to swap POW and SQRT * added sqrt llop to cpu runtime * added sqrt llop to cstyle codegen * added POW llop to llvm ir codegen * added SQRT llop to torch runtime * moved pow from mlops to hlops * found a better way to do reverse pow * fixed indentation * added SQRT llop to triton * update docs to match new llops * removed POW operator from assembly codegen * added sqrt and rsqrt to pow hlop * rewrote pow function in tensor.py * Adjust tolerance * Adjust for adamw * Reduce for Adam too * removed accidental leftover code * removed all of accidental code * added rsqrt test * removed pow from mlops again it was added back when resolving merge conflicts --------- Co-authored-by: Jacky Lee <jla524@sfu.ca>	2023-07-05 18:07:58 -07:00
George Hotz	793a670187	from tensor cores + lb touchup (#1127 )	2023-07-04 15:45:20 -07:00
George Hotz	c709dec8b5	gelu: weird test was broken for metal	2023-07-04 00:43:54 -07:00
George Hotz	daf8e1942f	sigmoid: test large positive also and add note	2023-07-04 00:18:31 -07:00
Kunwar Raj Singh	9e6067378f	Broken Sigmoid backward: Add test and mlop for Sigmoid (#1113 ) * Add failing sigmoid test * update more tests * add mlop for sigmoid * add back test * math.log(math.e) = 1 * remove divides --------- Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>	2023-07-04 00:14:22 -07:00
geohotstan	575f75f613	hello (#1084 )	2023-07-01 01:29:35 -07:00
Jacky Lee	754e54ebb9	Fix Tensor ceil and floor for whole numbers (#1071 ) * Works on non-special numbers * Test different cases	2023-06-27 23:22:17 -07:00
George Hotz	d16c16ec28	new upcast works (#1066 ) * new upcast works * float4 try * fix unaligned float4 * disallow unaligned access * upcast dim * maybe good now * fix gpu half * vstore_half4 * fix deep image bugs * improve symbolic to fix issues * fix symbolic * cl test * this maybe * gcd of 1 is 1 * real fix for old python * improve fuzzer	2023-06-27 19:34:53 -07:00
George Hotz	3e33befc1d	realize hotspots (#1059 ) * realize hotspots * no str check * minor changes * make this an assert * faster and more readable * nicer self.buffers * tests for weak op + LAZYCACHE=0	2023-06-26 18:31:18 -07:00
Kunwar Raj Singh	5d3310ce56	MaskRCNN Inference (#884 ) * MaskRCNN weights loading * backbone maybe works * backbone works, but resnet body atol 1e-3 * RPN Call, but veryy wrong output * fixed topk * RPN maybe works, not sure about nms * Fix cursed modules * add back editorconfig * Full call, wrong output * Full call works * fix mask * use NMS from retinanet * Removing extra funcs * refactor * readable * Add example to run model * remove filter * Fix split, batched inference is worse * Fix image sizes * Matching reference * merge master * add filter on top detections * cuda backend fixed * add model eval and spec * convert images to rgb * fix eval * simplify examples code * remove extra code * meshgrid using tinygrad * removing numpy * roi align, floor, ceil * remove numpy from level_mapper * remove numpy from pooler * Revert "Merge branch 'master' of github.com:kunwar31/tinygrad into mrcnn-inference" This reverts commit `4b95a3cb49`, reversing changes made to `98f2b1fa2e`. * roi align gather * fix master merge * revert to old floor, ceil as ints present in domain * use log2 op * fix indexes * weird bug with ints and gpu * weird bug with ints and gpu * refactors, add env var for gather * floor with contiguous, where * refactor topk, sort * remove staticmethod * refactor stride * remove log2 mlop * realize -> contiguous * refactor forward * remove num_classes, stride_in_1x1 from state * refactor forward * refactoring * flake8 * removing numpy in anchor gen, use numpy for gather, nonzero, optimize topk * keep using tinygrad for smaller gathers * fix empty tensors * comms * move from tensor.py * resnet test passing * add coco dataset back * fix spaces * add test for log2 * no need to create Tensors * no need to create Tensors --------- Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>	2023-06-25 15:37:51 -07:00
Francesco Castelli	6ff720103e	Reduce tensor dot line count and fixed 1d tensor dot (#1045 ) * fixed tensor.dot * no 1d dot for image=1 * shorter lines * add 3d dot tests	2023-06-25 10:32:45 -07:00

252 Commits