tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-29 03:00:14 -04:00

Author	SHA1	Message	Date
Cole Sutyak	2d4e182294	change fetch to allow for local file selection (#1309 )	2023-07-23 15:00:16 -04:00
waifairer	7cac5ea16c	[GH-1305] Refactor test_dtypes.py to be cleaner (#1306 ) Co-authored-by: waifairer <waifairer@gmail.com>	2023-07-21 18:18:02 -04:00
Maxim Zakharov	48c4df1263	fix: prevent infinite "loading..." state (#1319 ) * demo somewhy doesn't work on my device and throws error "Error: GPUPipelineError: [Invalid ShaderModule] is invalid" inside setupNet func * because of that, JS halts the execution of the rest of the code below and on the screen we see "loading..." forever * added try catch here to communicate about the error in a proper way	2023-07-21 14:01:53 -07:00
Jacob Pradels	b112edd2c3	Add pylint trailing whitespace rule (#1314 )	2023-07-21 13:37:55 -04:00
George Hotz	bfbb8d3d0f	fix ones, BS=2 stable diffusion, caching optimizer (#1312 ) * fix ones, BS=2 stable diffusion * caching optimizer * print search time * minor bug fix	2023-07-21 09:55:49 -07:00
George Hotz	9746f6d094	move hand coded optimizer (#1310 ) * move hand coded optimizer * llvm can optimize * fix llvm * save linearizer	2023-07-21 07:53:12 -07:00
madt2709	d2c1e8409a	Update arange to be (start, stop, step) (#1308 )	2023-07-21 00:27:23 -04:00
George Hotz	f45013f0a3	stable diffusion: remove realizes we don't need	2023-07-20 19:53:07 -07:00
George Hotz	b58dd015e3	stable diffusion: remove import numpy as np	2023-07-20 19:35:44 -07:00
George Hotz	35bc46289c	stable diffusion: use new tinygrad primitives	2023-07-20 19:25:49 -07:00
Francis Lam	78a7a15753	Fix WSGL to render NaN and prevent shader compile error (#1268 )	2023-07-20 18:00:33 -07:00
Stan	0a3d4f8103	Implementation of VITS TTS model (#1188 ) * [WIP]: implementation of VITS TTS model * Implemented VITS model, moved all code to examples/vits.py * Added support for vctk model, auto download, and cleanups * Invoke tensor.realize() before measuring inference time * Added support for mmts-tts model, extracted TextMapper class, cleanups * Removed IPY dep, added argument parser, cleanups * Tiny fixes to wav writing * Simplified the code in a few places, set diff log level for some prints * Some refactoring, added support for uma_trilingual model (anime girls) * Fixed bug where embeddings are loaded with same backing tensor, oops * Added emotional embed support, added cjks + voistock models - voistock is multilingual model with over 2k anime characters - cjks is multilingual model with 24 speakers both are kinda bad for english though :c * Removed `Tensor.Training=False` (not needed and wrong oop) * Changed default model and speaker to vctk with speaker 6 * Ported rational_quadratic_spline fun to fully use tinygrad ops, no numpy * Removed accidentally pushed test/spline.py * Some slight refactors * Replaced masked_fill with tensor.where * Added y_length estimating, plus installation instructions, plus some cleanups * Fix overestimation log message. * Changed default value of `--estimate_max_y_length` to False This is only useful for larger inputs. * Removed printing of the phonemes * Changed default value of `--text_to_synthesize`	2023-07-20 17:37:14 -07:00
George Hotz	d963024a13	optimizer small fix: return if there's nothing to optimize	2023-07-20 16:57:30 -07:00
George Hotz	9dffc9ba23	Use nevergrad to optimize kernels (try 2) (#1301 ) * nevergrad try 2 * touchups * no ones * opt fixup * cleanups * touchup * make new optimizer file	2023-07-20 16:46:45 -07:00
Diogo	8562b5a04f	fixes error when trying to convert float4 -> half4 (#1300 )	2023-07-20 14:20:05 -07:00
George Hotz	50a399ffa3	real world test: relax memory	2023-07-20 14:06:22 -07:00
George Hotz	17830e25da	real world tests (#1297 ) * real world test * touchup * sync device	2023-07-20 10:50:22 -07:00
George Hotz	ca77d6cd72	bfloat16 in LLVM (enough for llama 2) (#1293 ) * add bf16 support to LLVM * bf16 read works	2023-07-19 20:18:32 -07:00
Umut Zengin	74e63fe4ee	Added test_chunk and fixed (#1283 )	2023-07-19 22:21:26 -04:00
George Hotz	3f2497160c	strip whitespace	2023-07-19 19:01:53 -07:00
George Hotz	65fe72f10b	Cleanup loadop (#1291 ) * cleanup loadop * llvm fix * fix llvm dtype * fix clang	2023-07-19 18:59:47 -07:00
Alexander Schlögl	e3f717f614	fix CUDAProgram __init__ with DEBUG>=6 on Linux (#1288 ) * fix CUDAProgram __init__ with DEBUG>=6 on Linux Replace path generated in f-string by os.path.join * import os instead of os.path.join * move import up	2023-07-19 14:36:58 -07:00
George Hotz	f7b0320d8b	add cifar training regression test (#1287 ) * add cifar training regression test * clean up print	2023-07-19 14:17:09 -07:00
George Hotz	45ecae1ab3	Revert "Match Torch speed for sum reduction on M1 (#1187 )" (#1286 ) This reverts commit `59af9b81c5`.	2023-07-19 13:39:16 -07:00
chenyu	120ae74008	Enable JIT test for size 1 tensor (#1285 )	2023-07-19 11:06:40 -07:00
chenyu	940b6fd21a	Revert "Fix constant folding for Tensor([3]) (#1227 )" (#1274 ) This reverts commit `ab645317c9`.	2023-07-19 10:51:06 -07:00
chenyu	0aed3f73da	More JIT test cases (#1280 ) * More JIT test cases * test against jit_cache directly * remove unused	2023-07-19 10:45:43 -07:00
Francis Lam	3db57d3118	Fix llama.py to load and concatenate 13B, 30B, and 65B models (#1275 )	2023-07-19 13:22:33 -04:00
George Hotz	d6637623e3	torch test touchup	2023-07-19 09:37:23 -07:00
Alexander Edwards	59af9b81c5	Match Torch speed for sum reduction on M1 (#1187 ) * Add additional kernel when reducing multiple dimensions at once. * Faster for smaller inputs * Whitespace and naming * Cleaner, guard for Metal only, and max 1 split rather than N * Draft of different approach * One additional kernel call for this test (as expected)	2023-07-19 09:18:58 -07:00
Umut Zengin	fde9f0e60d	Slice migrated in Eye op (#1281 ) * Migrated from slice to pad and shrink, made cleaner * Changed repeat with reshape and expand	2023-07-19 09:08:38 -07:00
chenyu	a5f5330d91	Add Fuzz Test symbolic / shapetracker to CI. (#1278 ) * Fuzz test symbolic and shapetracker This reverts commit d5773ddebff54c1ff608838076f0b4ff126b8aa8. * mess again * no tail * test shapetracker too * Revert mess and enable all tests * removed leftover	2023-07-19 09:05:45 -07:00
David Hou	56ee97b37f	dedup kernel args v2 (#1272 ) * new version * fix abstractions * try remove test * Revert "try remove test" This reverts commit `2fc18a9f8e`. * assert_allclose * minimize the test * minimize the test * minimize the test * minimize the test * Revert "minimize the test" This reverts commit `e0c0929596`. * Revert "minimize the test" This reverts commit `88240551b1`. * Revert "minimize the test" This reverts commit `78328a7ce2`. * Revert "minimize the test" This reverts commit `989523fded`. * skip test inside body * oops * oops	2023-07-18 20:03:42 -07:00
wozeparrot	37cc33269a	cl fixes for multigpu (#1276 ) * feat: opencl fixes for multigpu usage * clean: who needs this import anyways	2023-07-18 19:59:30 -07:00
Umut Zengin	fa0265b173	Fix: AssertionError Transpose/Permute when WHERE Op in LB (#1266 )	2023-07-18 16:09:19 -04:00
chenyu	c96bf395df	Enable JIT tests for supported devices, skip METAL and WEBGPU (#1265 ) * Enable JIT test * really test metal * Skip some device	2023-07-18 11:40:37 -07:00
Umut Zengin	f8c539989e	Re-open create cumsum speed test (#1255 ) * Reduced tensor size in testing * Update formatting test_speed_v_torch.py	2023-07-17 18:59:36 -07:00
George Hotz	ab3d281a6e	Refactor MemOps (#1256 ) * metal tests pass locally * define global * refactor DEFINE_GLOBAL * move assembly out. it isn't tested * fix llvm	2023-07-17 16:36:33 -07:00
Stan	ed472bffea	Fix: negative axis in `tensor.cumsum` (#1261 )	2023-07-17 16:16:38 -07:00
Oddity	64d39188ad	Assembly ptx target current arch (#1250 ) * updated .target to use the current arch version * undid docstring	2023-07-17 08:45:43 -07:00
Adrian Kretz	5a8ad57163	Add WHERE ternary (or trinary?) op (#1196 ) * Rename FusedOps to TernaryOps * Support ternary broadcast * Add where llop and mlop * Make where op work in cstyle codegen * Don't skip test_inf_where * Add backward path to where op * Use bool in cstyle codegen * Add LLVM where op * Add numpy where op * Add torch where op * Simplify where mlop * Update documentation * Forgot a rename * Merged relevant changes from PR #1195 onto PR #1196 * Add test to cover changes to linearizer.ast_parse for WHERE op Without this METAL will try to use ternary op on float4 and fail * Make where op work in wgsl backend * Allow ternary ops to be merged * Make mypy happy --------- Co-authored-by: Francis Lam <flam@alum.mit.edu>	2023-07-16 00:31:55 -07:00
Stan	91f797cd52	Moved mkdir in `utils.download_file` to diff line (#1249 ) * Moved mkdir to diff line .mkdir does not return the actual directory being created. * use walrus operator to simplify	2023-07-16 00:30:46 -07:00
Yixiang Gao	a8f2c16f8e	add contiguous (#1246 )	2023-07-15 08:36:34 -07:00
Stan	872e2198fe	Added `nn.ConvTranspose1d` (#1243 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-07-15 00:42:42 -07:00
Oddity	7399f6dad7	display sass for both cuda code and ptx (#1240 ) * skip nvcc compile target cubin when using PTX * actually we should generate sass for both ptx and cuda code * Fixed formatting, should print the error anyway * ensure subprocess.run throws exception * fixed linting errors and checked before commit this time	2023-07-15 00:36:04 -07:00
Stan	264d467f2b	Added `tensor.squeeze` and support for testing exceptions (#1241 ) * WIP: `tensor.squeeze` function * Added `test_except` param to `helper_test_op` to avoid false positives * Extracted new method `helper_test_exception` for testing exceptions * Made `squeeze` not throw IndexError when ndim == 0 and dim <= 0 to match PyTorch	2023-07-15 00:33:24 -07:00
Stan	a8f3b3f4ed	Added test for nn.Conv1d (#1242 )	2023-07-15 00:30:50 -07:00
David Hou	9c135c9450	add sqrt to ptx (#1236 )	2023-07-13 07:26:11 -07:00
chenyu	32be39554c	Simplify symbolic.SumNode.__floordiv__ logic (#1220 )	2023-07-12 12:54:12 -07:00
Diogo	a9a1df785f	Webgpu support (#1077 ) * initial commit * 81 passing * 105 passing tests * 148 passing * CI tests * install dep on ci * try opencl pkgs * try using vulkan * down to only 6 failing * refactor * cleaning up * another test skipped due to buffer limit * linter * segfault * indent fix * another segfault found * small touchups * Fix max and maxpool tests * Add constant folding * Add javascript export script * better asserts in codegen * manual upcasting * reverted token type change * skip safetensor test due to unsupported type * FIx efficientnet and all other model tests * Remove np copy * fixed indent and missing import * manually destroy the buffer * revert back to length * linter errors * removed extra val * skip broken tests * skipping more tests * Make the page pretty * Save model weights as safetensor * Fix imagenet to c test * Fix second imagenet to c bug * Async and paralel kernel compilation * workgroup support * reversed local size * fixed non local bug * correct local groups * ci experiment * removed typo * Fix define local by using shared memory * Refactor * try running on mac * match metal tests * add more workers * scope down tests * trying windows runner * fixed windows env * see how many it can do * merged master * refactor * missed refactor * increase test suite coverage * missing import * whitespace in test_efficientnet.py * getting there * fixed reset * fixed bufs * switched to cstyle * cleanup * min/max rename * one more linter issue * fixed demo * linter * testing ci chrome * add unsafe webgpu arg * add build step * remove WEBGPU from cmd line * use module * try forcing directx * trying forced metal backend * temp disable conv2d for CI * disable conv_trasnpose2d --------- Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-07-12 12:52:06 -07:00


2159 Commits