tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-27 15:58:10 -05:00

Author	SHA1	Message	Date
Yixiang Gao	4d54afb6df	sparse cat cross entropy (#1597 ) * add sparse cat cross entropy * minor fix * add log_softmax into loss function * add test * update docs * fix training loss * add device	2023-08-21 14:14:54 -07:00
George Hotz	2e60920317	Revert "sparse cat cross entropy (#1591 )" (#1596 ) This reverts commit `f0ee850e98`.	2023-08-21 10:04:26 -07:00
Yixiang Gao	f0ee850e98	sparse cat cross entropy (#1591 ) * add sparse cat cross entropy * minor fix * add log_softmax into loss function * add test * update docs	2023-08-21 09:56:41 -07:00
Yixiang Gao	8d6662a741	.cpu().numpy() -> .numpy() (#1594 ) * .cpu().numpy() -> .numpy() * restore ops_torch * restore test_speed_v_torch	2023-08-21 09:53:29 -07:00
Diogo	ba5e3818a0	Limit dims based on max size (#1390 ) * working * whitespace * changed defaults to None * linter * last linter error	2023-07-31 19:18:19 -07:00
cheeetoo	a0965ee198	CI < 5 minutes (#1252 ) * models matrix * fix typo and install gpu deps * install llvm deps if needed * fix * testops with cuda * remove pip cache since not work * cuda env * install cuda deps * maybe it will work now * i can't read * all tests in matrix * trim down more * opencl stuff in matrix * opencl pip cache * test split * change cuda test exclusion * test * fix cuda maybe * add models * add more n=auto * third thing * fix bug * cache pip more * change name * update tests * try again cause why not * balance * try again... * try apt cache for cuda * try on gpu: * try cuda again * update packages step * replace libz-dev with zlib1g-dev * only cache cuda * why error * fix gpuocelot bug * apt cache err * apt cache to slow? * opt and image in single runner * add a couple n=autos * remove test matrix * try cuda apt cache again * libz-dev -> zlib1g-dev * remove -s since not supported by xdist * the cache takes too long and doesn't work * combine webgpu and metal tests * combine imagenet to c and cpu tests * torch tests with linters * torch back by itself * small windows clang test with torch tests * fix a goofy windows bug * im dumb * bro * clang with linters * fix pylint error * linter not work on windows * try with clang again * clang and imagenet? * install deps * fix * fix quote * clang by itself (windows too slow) * env vars for imagenet * cache pip for metal and webgpu tests * try torch with metal and webgpu * doesn't work, too long * remove -v * try -n=logical * don't use logical * revert accidental thing * remove some prints unless CI * fix print unless CI * ignore speed tests for slow tests * clang windows in matrix (ubuntu being tested in imagenet->c test) * try manual pip cache * fix windows pip cache path * all manual pip cache * fix pip cache dir for macos * print_ci function in helpers * CI as variable, no print_ci * missed one * cuda tests with docker image * remove setup-python action for cuda * python->python3? * remove -s -v * try fix pip cache * maybe fix * try to fix pip cache * is this the path? * maybe cache pip * try again * create wheels dir * ? * cuda pip deps in dockerfile * disable pip cache for clang * image from ghcr instead of docker hub * why is clang like this * fast deps * try use different caches * remove the fast thing * try with lighter image * remove setup python for cuda * small docker and cuda fast deps * ignore a few more tests * cool docker thing (maybe) * oops * quotes * fix docker command * fix bug * ignore train efficientnet test * remove dockerfile (docker stuff takes too long) * remove docker stuff and normal cuda * oops * ignore the tests for cuda * does this work * ignore test_train on slow backends * add space * llvm ignore same tests as cuda * nvm * ignore lr scheduler tests * get some stats * fix ignore bug * remove extra ' * remove and * ignore test for llvm * change ignored tests and durationon all backends * fix * and -> or * ignore some more cuda tests * finally? * does this fix it * remove durations=0 * add some more tests to llvm * make last pytest more readable * fix * don't train efficientnet on cpu * try w/out pip cache * pip cache seems to be generally better * pytest file markers * try apt fast for cuda * use quick install for apt-fast * apt-fast not worth * apt-get to apt * fix typo * suppress warnings * register markers * disable debug on fuzz tests * change marker names * apt update and apt install in one command * update marker names in test.yml * webgpu pytest marker	2023-07-23 13:00:56 -07:00
Stan	872e2198fe	Added `nn.ConvTranspose1d` (#1243 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-07-15 00:42:42 -07:00
Stan	a8f3b3f4ed	Added test for nn.Conv1d (#1242 )	2023-07-15 00:30:50 -07:00
Diogo	a9a1df785f	Webgpu support (#1077 ) * initial commit * 81 passing * 105 passing tests * 148 passing * CI tests * install dep on ci * try opencl pkgs * try using vulkan * down to only 6 failing * refactor * cleaning up * another test skipped due to buffer limit * linter * segfault * indent fix * another segfault found * small touchups * Fix max and maxpool tests * Add constant folding * Add javascript export script * better asserts in codegen * manual upcasting * reverted token type change * skip safetensor test due to unsupported type * FIx efficientnet and all other model tests * Remove np copy * fixed indent and missing import * manually destroy the buffer * revert back to length * linter errors * removed extra val * skip broken tests * skipping more tests * Make the page pretty * Save model weights as safetensor * Fix imagenet to c test * Fix second imagenet to c bug * Async and paralel kernel compilation * workgroup support * reversed local size * fixed non local bug * correct local groups * ci experiment * removed typo * Fix define local by using shared memory * Refactor * try running on mac * match metal tests * add more workers * scope down tests * trying windows runner * fixed windows env * see how many it can do * merged master * refactor * missed refactor * increase test suite coverage * missing import * whitespace in test_efficientnet.py * getting there * fixed reset * fixed bufs * switched to cstyle * cleanup * min/max rename * one more linter issue * fixed demo * linter * testing ci chrome * add unsafe webgpu arg * add build step * remove WEBGPU from cmd line * use module * try forcing directx * trying forced metal backend * temp disable conv2d for CI * disable conv_trasnpose2d --------- Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-07-12 12:52:06 -07:00
Diogo	57d3aa76a5	Windows & Ubuntu CLANG CI support (#1011 ) * matrix strategy * push env to GITHUB_ENV * use printf instead of echo * use temp helper function for cross os paths * use path join * switched to using temp helper function * skip test on windows due to memory limit * small fix * removed semi * touchups * clean up * seperate tests * test changes to test_utils on windows * small refactor * more cleanups * undo helpers change * only skip if in CI and WINDOWS	2023-06-19 09:33:24 -07:00
Diogo	0dab8edc97	support Int64 type in cstyle gen (#860 ) * added metal int64 and some simple tests * removed bool return type def * typo in test * also missing in clang and gpu runtimes * switched order for opencl * increased atol and removed new line in kernel prefix	2023-05-30 16:04:46 -07:00
Jacky Lee	5d212864b5	Add MLPerf UNet3D model (#775 ) * Add ResNet inference test and cannon * Test with ResNet50 * test_car works with resnet fix * Add KiTS19 dataset * KiTS19: Implement iterate * No batch load for this dataset * Save results on iterate * Implement dice score * Add data prep and eval functions * Resolve shape issue * Conversion works but wrong values * Segfaults when load_from_pretrained is called * Fix segfault and assign properly * Final result generated, though very slow * Store and load final result to save time * Fix typo in finalize * Score computes * More bug fixes, dice score is very low * Working broken code * Assign output values to result * Getting a much higher score now * Fix dataset preprocessing * Mean DICE score of 88.5 * Ugh, typo * Attempt to reimplement model * Rename layers * Tiny model works, kinda * Accuracy? gone * Implement InstanceNorm and match torch * Test instance norm 2d and 3d * Combined input block with downsample block * Tiny model works, support strided convtranspose * Commands to download dataset * Clean up a bit * unet3d_v2 -> unet3d * Remove duplicated code * Oops, put tests back	2023-05-28 20:38:19 -07:00
kposborne2	2163a1b049	Add shrink step to fix strided conv_transpose2d, and add to nn (#823 ) * implement conv transpose 2d * don't inherit, remove old assert --------- Co-authored-by: Kyle <kposborne@gmail.com>	2023-05-28 07:52:45 -07:00
symlon	04284414db	Batchnorm2d fixed running_var (#807 ) * BatchNorm2d match pytorch * removed comment * Batchnorm2d test multiple sizes	2023-05-26 12:32:49 -07:00
wozeparrot	0dc333cfab	Promote Embedding to `nn` (#798 ) * feat: promote Embedding to nn * fix: fix failing test * feat: add test with jit * feat: rewrite embedding to no longer need stacked for loops * clean+fix: don't know how that happened	2023-05-25 18:39:45 -07:00
George Hotz	d6f4219952	LayerNorm2d for 2 lines	2023-03-20 16:58:43 -07:00
George Hotz	8c475ea86a	relax atol, merge_view	2023-03-03 07:48:44 -08:00
George Hotz	9d539b8ebb	more intuitive output shape from _pool	2023-02-28 11:41:48 -08:00
George Hotz	758515dcc0	conv2d is an hlop (#589 ) * conv2d is an hlop * shorter conv * KOPT=-1 * alt imp * MULACC * smarter mulacc * pop conv * 7x7 -> 5x5 * didn't fix, that's not going to work * this is faster and matches old behavior * oh, non lazy just won't work with mulacc * mulacc in torch * bool types were creeping in * optimizer is actually better with hlop conv * fix pushing permutes issue * refactor einsum_mulacc * fix up readme * update readme * _image_conv2d * fix bias addition location * pushing permutes gets back to 200 kernels * conv cleanup * disable hlop conv * don't hide that in helpers	2023-02-23 17:52:31 -08:00
George Hotz	628ce067a1	add tests to mypy	2023-02-22 07:07:38 -08:00
Kirill	7944cfdadc	Remove Tensor.data (#565 )	2023-02-18 16:36:12 -08:00
Jacky Lee	e172f0087a	BatchNorm2D -> BatchNorm2d (#558 ) * BatchNorm2D -> BatchNorm2d * Fix typo	2023-02-16 12:31:49 -08:00
Jacky Lee	486f023e81	Rename Normalize and move to nn (#513 ) * Rename Normalize and move to nn * Match PyTorch for dim>1	2023-02-01 11:55:03 -08:00
George Hotz	487685919b	Revert "Rename Normalize and move to nn (#415 )" (#474 ) This reverts commit `d768acb6a9`.	2023-01-25 07:50:04 -08:00
Jacky Lee	d768acb6a9	Rename Normalize and move to nn (#415 ) * Rename Normalize and move to nn * Fix comparison to None error * Add test for GroupNorm * Rename test case * Flip parameters to match PyTorch * Increase error tolerance * Fix elementwise_affine on channels * Match arguments with PyTorch * Initialize weight and bias only when affine is true * Is this it? * A bit cleaner * Handle case where weight or bias is None	2023-01-25 07:47:59 -08:00
George Hotz	392e57aea7	ugh, why did that fail	2022-10-01 13:38:43 -04:00
George Hotz	ecc1a0470d	add Linear to tinygrad.nn	2022-09-07 07:40:48 -07:00
George Hotz	cff297ef9d	w/e, that's a later prob	2022-07-17 12:32:50 -07:00
George Hotz	bf299802f8	fixup tests	2022-07-17 12:11:53 -07:00
George Hotz	8ba3d1f803	fix bn test, affine is True	2022-01-15 19:52:15 -08:00
George Hotz	bd21304e3c	linear takes in weight and bias	2021-11-30 00:38:47 -05:00
George Hotz	dca076dbf1	remove dumb nn ops	2021-11-29 18:05:31 -05:00
George Hotz	ce3d198bb7	less lines and fix default device	2021-11-27 11:18:49 -05:00
Guglielmo Camporese	2b7589db64	Added ResNet-{18, 34, 50, 101, 152} (#271 ) * added resnets * fix minor * fix minor * resnet in models * added resnet test * added resnet train test * added linear, conv2d nn tests * fix minor in extra/training * resnet in models * fix minor * fix tolerance for linear in nn test * fix eval, this causes cpu and gpu UT failing * revert transformer test * fix minor for CPU test * improved model get_params for sequential layer * fix minor for params counting * commented broken ops tests * improved train for resnet	2021-06-21 09:37:24 -07:00
Skosh	78aa147b39	[WIP] YOLO working on tinygrad! (#245 ) * Some progress on yolov3 * Removed some debugging comments… Also, the forward pass eats all RAM for some reason * forward pass almost runs * forward pass runs almost * forward pass runs, now we gotta load the weights * loading weights works * fetches config and weights * everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but its kind of terrible, and not how things should be done * some changes * fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly * Something is wrong with the forward pass, Conv2d tests added * forward pass almost outputs correct values, gotta fix one more thign * yolo works * some final changes * reverting changes * removed dataloader * fixed some indentation * comment out failing test, somehow it fails CI even though it passes on my computer… * fixed wrong probabilities * added webcam option to YOLO, now just need to add bounding boxes and speed it up * some progress towards adding bounding boxes * trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage * Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrads output on the classic dog image * removed some debugging print statements * updated result image * something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds…	2021-04-25 18:06:52 -07:00
Liam	ebd72ff437	Test split (#231 ) * Split tests Split tests into "Test CPU" and "Test GPU". Add test flag "TEST_DEVICES" which is a comma separated list of devices: CPU,GPU,ANE * Run tests based on provided TEST_DEVICES flag By default will run all "CPU,GPU,ANE" * fix bad quote * Revert changes and use GPU=1 This is done through setting the default Tensor Device to Device.CPU of GPU=1 is set. Run GPU tests: GPU=1 pytest -s -v	2021-01-01 09:19:03 -05:00
George Hotz	2f1b2c0a3b	add transpose, start on transformer	2020-12-27 16:59:12 -05:00
Liam	bcf1518309	All devices are equal! (#196 ) * Update all devices to be tested ANE, CPU and OCL all now support all tests. However tests are not currently passing on GPU and I cannot test on CPU. Failing GPU test are not an issue caused by this update. Tests have not been passing due to a missing "six" required installation. OpenCL Tests have not been run since commit: `1a1c63a08b` devices have 3 types and are handle by a new DeviceTypes enum. (The goal is to revert to Tensor.<type>, but this current setup allows for keyword argument defaults: `device=DeviceType.CPU`) All references to Tensor.GPU/CPU/ANE as been converted to the corresponding `DeviceTypes` enum. Refactor of the conversion code to allow for any device to any device conversion. * Add six dependency in requirements.txt * Resolve failure to run tests Move six into gpu required installs. Remove six from standard installation. * Remove repeated data conversion * Refactor method names Also reduce code with .to and .to_ * Dynamic device handlers * Refactor DeviceTypes -> Device * Add mem copy profiling back * test_backward_pass_diamond_model passing * Resolve Sum issue on GPU * Revert batchnorm2d tests * Update README with upadated API * ANE testing with * Last minute line gains	2020-12-15 23:44:08 -08:00
George Hotz	ffb96b2d0b	batchnorm by marcelbischoff	2020-12-09 03:23:04 -08:00
George Hotz	17659f7dd7	gpu speedup, tests work on M1	2020-12-06 09:05:49 -08:00
George Hotz	c14473f87d	unit test for batchnorm2d	2020-10-30 08:19:58 -07:00

41 Commits