Commit Graph

4433 Commits

Author SHA1 Message Date
chenyu
3971259832 fix test_real_world llama (#2335) 2023-11-16 19:50:08 -05:00
Friedrich Carl Eichenroth
75676ab8e1 Profiling-helper (#2321)
* change profiler

* remove unused imports

* remove unused imports

* change lazybuffer references

* remove unused line

* remove unused import

* remove unused stuff

* add types

* typing

* typing

* typing

* trigger actions

* -1 loc

* fixup

* trigger actions

* revert lazy typing changes

* WIP profiler helper

* replace old start & stop profiler

* fixup

* linting

* Update llama.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 14:15:56 -08:00
mmmkkaaayy
8235da11dd whisper: support batch inference, add librispeech WER test (#2074)
* whisper: support batch inference, add librispeech WER test, add kv caching and JIT

* remove JIT_SUPPORTED_DEVICE

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 13:50:08 -08:00
George Hotz
3baaf298d6 two stage cumsum in tensor.py (#2331)
* two stage cumsum in tensor.py

* 2 more kernels for llama cumsum

* gpt-2 and llama use fast multinomial
2023-11-16 12:09:53 -08:00
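
A minimal cumsum usage sketch for context: #2331 only changes the internal kernel split (two stages instead of one), so the user-facing call below is assumed unchanged.

```python
from tinygrad.tensor import Tensor

x = Tensor([1.0, 2.0, 3.0, 4.0])
print(x.cumsum(axis=0).numpy())   # [ 1.  3.  6. 10.]
```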
chenyu
27f4c26312 fix getitem slice when end < start (#2329) 2023-11-16 11:20:27 -05:00
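
A hedged illustration of the edge case fixed by #2329, assuming numpy-style slicing semantics: a slice whose end precedes its start should yield an empty dimension rather than an error.

```python
from tinygrad.tensor import Tensor

t = Tensor.arange(5)
print(t[3:1].shape)   # (0,) — empty slice instead of an error
```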
chenyu
a98511561c fuzz_linearizer same api for interpreted and compiled (#2320) 2023-11-15 17:40:22 -05:00
Marcello Fuschi
b8d460d203 Add Tensor.multinomial (#2295)
* add Tensor.multinomial only with replacement

* add support for 2D input in Tensor.multinomial

* fix multinomial output shape

* allow passing replacement=False to Tensor.multinomial when num_samples=1

* improve tests for Tensor.multinomial

* fix edge case in Tensor.multinomial

* Tensor.multinomial no more staticmethod
2023-11-15 11:38:39 -08:00
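
A usage sketch of Tensor.multinomial as described in #2295 (2D input of row-wise probabilities, replacement flag, replacement=False only for a single sample); exact behavior should be checked against the version in use.

```python
from tinygrad.tensor import Tensor

probs = Tensor([[0.1, 0.2, 0.7],
                [0.5, 0.25, 0.25]])                       # one distribution per row
samples = probs.multinomial(num_samples=4, replacement=True)
print(samples.shape)   # (2, 4): four sampled column indices per row
single = probs.multinomial()                              # replacement=False is only allowed when num_samples=1
print(single.shape)    # (2, 1)
```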
George Hotz
70a65c201e JIT support in Interpreted (#2314)
* factor that out

* jit is supported everywhere

* fix some tests

* there's no jit supported device, the jit is everywhere

* fix test uops
2023-11-15 11:13:38 -08:00
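
A minimal TinyJit sketch; per #2314 the same decorator is intended to work on Interpreted backends (CPU, TORCH) as well as compiled ones. The tinygrad.jit import path and the convention of returning realized tensors are assumptions based on the tree at this point.

```python
from tinygrad.tensor import Tensor
from tinygrad.jit import TinyJit

@TinyJit
def step(x: Tensor) -> Tensor:
  return (x * 2 + 1).realize()       # JIT-captured functions return realized tensors

for _ in range(3):                   # early calls record the kernels, later calls replay them
  out = step(Tensor.rand(4, 4))
print(out.shape)                     # (4, 4)
```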
chenyu
9a20bc08d6 Tensor(None) is Tensor([]) (#2316) 2023-11-15 13:49:18 -05:00
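
As a hedged illustration of #2316, both spellings below should now produce the same empty tensor.

```python
from tinygrad.tensor import Tensor

print(Tensor(None).shape, Tensor([]).shape)   # both (0,) per the #2316 title
```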
chenyu
f1f863c953 allow 0-dim array to broadcast into zero shape tensor (#2315)
* allow 0-dim array to broadcast into zero shape tensor

* not in
2023-11-15 13:12:21 -05:00
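
A sketch of the broadcast rule from #2315, assuming numpy-like semantics: a scalar (0-dim) value combined with a zero-length tensor yields another zero-length tensor instead of erroring.

```python
from tinygrad.tensor import Tensor

empty = Tensor.ones(0)        # shape (0,), allowed after zero-in-shape support (#2303)
print((empty + 1).shape)      # (0,): the scalar broadcasts into the empty shape
```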
George Hotz
4da2ddea6e Interpreted cleanups (#2312)
* move the compiler out of ops

* don't return realized

* var_vals filter, fix custom

* typing
2023-11-15 09:02:23 -08:00
chenyu
123a0b86b2 support zero in shape (#2303)
* zero in shape start

* no assert for that

* if output size is 0, return without exec

* tweak

* strides

* reduce over non-zero

* shrink and expand

* fix import

* test_elementwise where

* cannot reshape from size 0 to size 1

* compiled backend reduce over 0

* zeros for numpy

* reduce over 0 and keepdim resulted in 1

* reduce empty set default values

* compare with same input

* pad test case

* cat test case

* torch does not support that?
2023-11-15 11:57:48 -05:00
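
A few hedged examples of the zero-size tensor behavior #2303 enables (reductions over an empty axis fall back to the op's identity; exact semantics may differ by backend).

```python
from tinygrad.tensor import Tensor

z = Tensor.zeros(0, 3)
print(z.shape)                                     # (0, 3)
print(z.sum(axis=0).numpy())                       # [0. 0. 0.] — sum over the empty axis gives its identity
print(z.cat(Tensor.zeros(2, 3), dim=0).shape)      # (2, 3) — concatenating an empty tensor is a no-op
```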
geohotstan
3c5a51fb3a aaaaaaa finally (#2310) 2023-11-15 07:12:38 -08:00
kormann
cff8375aa2 make self referential AST fast too (#2278)
* cleanup

* linter

* linter

* linter

* rm .buffers

* linter

* linter

* huh?

* cleanup

* typo

* min diff

* property

* rev

* linter

* no matel hack

* minimal properties

* line

* checkout master

* copy_to_device

* idk

* revert

* type

* type

* faast

* speed test

* cleanup test

* softer test

* monotonic

* harder test

* clean code

* cleanup
2023-11-15 07:12:07 -08:00
chenyu
175cdbe815 fix pad None with value (#2308) 2023-11-14 23:57:05 -05:00
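
A sketch of the case #2308 addresses: Tensor.pad with a None (no padding) entry for one axis combined with a non-default fill value. The per-axis (before, after)/None argument form is an assumption based on the pad signature of this period.

```python
from tinygrad.tensor import Tensor

x = Tensor.ones(2, 2)
y = x.pad((None, (1, 1)), value=-1.0)   # axis 0 untouched, axis 1 padded by 1 on each side with -1.0
print(y.shape)                          # (2, 4)
```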
chenyu
fac8633ba8 explicit opts for test_linearizer_failures (#2299)
* explicit opts for test_linearizer_failures

* typo

* update the invalid check
2023-11-14 11:52:38 -05:00
George Hotz
0cbf6c1811 move things, clean up extra (#2292)
* move things

* idk why pylint needs that now

* delete unused
2023-11-13 20:18:40 -08:00
George Hotz
b1f7f29525 metal indirect command buffers (#2285)
* metal indirect command buffers

* sub 1ms gpt

* metal batch exec is good

* remove whitespace

* input_replace

* fix ci

* useResources

* very simple cacheallocator

* update_stats

* fix CI

* minor

* remove that from jit
2023-11-13 17:58:26 -08:00
chenyu
d86ea188dd support symbolic shape in Interpreted (#2289)
* support symbolic shape in Interpreted

* simpler

* no InterpretedFlopCounter

* tragic NumNode

* regex is hard
2023-11-13 20:13:18 -05:00
nimlgen
960535dfb8 get_linearizer_actions does not return illegal actions (#2287)
* fix some linearizer failures

* linter happy

* no new test class
2023-11-13 11:48:54 -05:00
rodfer
53c5baa8b6 add dilation to avg_pool2d (#2270)
* add dilation to avg_pool2d

* avg_pool_fix

* avg_pool_fix

* woo

* oops

* force it correct

---------

Co-authored-by: rodfer0x80 <rodfer0x80@proton.me>
Co-authored-by: zibokapi <zibokapi@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-13 08:47:56 -08:00
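
A hedged example of the dilation argument #2270 adds to avg_pool2d, with the output size following the usual effective-kernel formula dilation*(k-1)+1.

```python
from tinygrad.tensor import Tensor

x = Tensor.rand(1, 1, 8, 8)
y = x.avg_pool2d(kernel_size=(3, 3), stride=1, dilation=2)
print(y.shape)   # (1, 1, 4, 4): effective kernel 2*(3-1)+1 = 5, so 8-5+1 = 4 per spatial dim
```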
chenyu
a72b370066 llama take int and convert to Variable internally (#2284) 2023-11-12 17:11:37 -05:00
chenyu
f5a62a1b42 fix some tests related to JitItem (#2279) 2023-11-11 23:00:35 -05:00
George Hotz
78623ba204 two simple tests 2023-11-10 16:16:06 -08:00
George Hotz
6ceea02e65 hotfix of onnx 2023-11-10 15:40:30 -08:00
geohotstan
b853e9bb8c Onnx 1.15.0 gogogo (#2217)
* lol

* lol

* add GELULULULUL

* onnx 1.50

* fuk torch bool neg

* exclude regex tests

* exclude dequantizelinear for now

* is sunny in philly

* damn it affinegrid

* fixed auto_pad VALID

* skip 0 shape tests

* add temporary cast in Reduces

* tests should pass now

* added comments and cleanup

* try moving dequantizelinear to onnx.py

* fixed dequantizedlinear?

* cleanup

* try?

* float16 segfaults LLVM CI..???

* cleanup comments

* pin to 1.50.0

* remove use of -np.inf cuz numpy is kill

* 1.50? lol I'm actually retarded

* thx for review, muhbad

* moved Gelu higher up
2023-11-10 15:36:48 -08:00
George Hotz
85d26ddc36 uops loop removal (#2262)
* remove the loop

* cleanups

* tests failing still

* global_loop_ctx wasn't needed

* replace_op is cleaner

* minor opt

* cast opt was wrong

* uop_num

* uop num was dumb

* tuplize_uops

* torch tests

* fix test_uops
2023-11-10 15:24:47 -08:00
chenyu
a753c8e071 examples of new GPT2 and JIT change (#2261)
* var_vals are global

* working with global ish

* better

* fix export model

* fix tests

* better kv cache

* does it run?

* use where for kvmask

* fix excessive var_vals

* fix import

* how does multigpu use this?

* llama kinda work

* faster and simpler

* cleanup

* fix conversation mode

* test cleanups

* fix one more test

* test cleanup

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-11-10 15:07:02 -05:00
qazal
b6aaf12df7 Internal cast 2 with more tests (#2257)
* Change linearizer to parse CAST

* Oneliner renders for cstyle and triton

* LLVM cast and ALU implementation

* pylint fixes

* cast in gep

* remove printbufs

* use cast for post-load ops

* get rid of parse_cast

* partially supported vectorized dtypes for initial dev

* render phi as the dtype

* Revert "partially supported vectorized dtypes for initial dev"

This reverts commit 1bf1a818a3.

* Revert "render phi as the dtype"

This reverts commit d08cb270b4.

* reenable triton tests

* no vstore_half if dtype is already half

* upcast max
2023-11-10 10:42:39 -08:00
chenyu
75f6e9ab54 one more fuzz linearizer failed example (#2260) 2023-11-10 09:17:37 -05:00
George Hotz
330484c072 Revert "Internal casting support (#2046)" (#2256)
This reverts commit 7e1d08b2ae.
2023-11-09 21:27:13 -08:00
qazal
7e1d08b2ae Internal casting support (#2046)
* Change linearizer to parse CAST

* Oneliner renders for cstyle and triton

* LLVM cast and ALU implementation

* pylint fixes

* cast in gep

* remove printbufs

* use cast for post-load ops

* get rid of parse_cast

* partially supported vectorized dtypes for initial dev

* render phi as the dtype

* Revert "partially supported vectorized dtypes for initial dev"

This reverts commit 1bf1a818a3.

* Revert "render phi as the dtype"

This reverts commit d08cb270b4.

* reenable triton tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-09 21:02:32 -08:00
vish-pr
6051f0ce82 For cuda get current free space from device, and retry alloc failures (#2197)
* For cuda get current free space from device, and retry alloc failures

* type ignore for mypy

* add init to get free mem in cuda

* Move retry logic in common lib.

Fix typo in override _get_cur_free_space

* linter error fix in test file

* Not catch all, as it will catch KeyboardInterrupt

* fix unintended line changes
2023-11-09 15:53:50 -08:00
qazal
2465d5d267 fix ops tests in test_dtype (#2237)
* fix test ops

* decompose the err from test_ops

* skipTest skips the entire test, we dont want that

* handle cases with the same priority

* add int16 to torch map
2023-11-09 15:17:43 -08:00
George Hotz
80bf0b8586 proper wmma (#2245)
* proper wmma

* hip cast

* bugfixes

* bugfix

* that bug is fixed

---------

Co-authored-by: George Hotz <george@tinygrad.org>
2023-11-09 15:15:18 -08:00
chenyu
10d642e174 fuzz linearizer transformation (#2188)
* fuzz linearizer transformation

* no standard normal for fp16

* work

* Interpreted start

* CPU and TORCH work

* fix MemBuffer with same idx

* id for failed kernels

* no image and variable for Interpreted

* symbolic shape

* IMAGE only for GPU

* Interpreted almost all good

* cleanup

* fix bufs_from_lin

* zero size

* some failed examples

* just Exception

* just test not pass
2023-11-09 08:03:27 -08:00
George Hotz
38b7f5a7fd less phi, proper phi (#2241)
* less phi, proper phi

* disable flaky whisper test
2023-11-08 16:13:43 -08:00
wozeparrot
4c44d1344b feat: remove cache_id (#2236) 2023-11-08 08:09:21 -08:00
George Hotz
c0a033f01d remove real_offset (#2234)
* remove real_offset

* pass in numnode

* remove that real_offset

* sample only for variable
2023-11-07 17:30:53 -08:00
nimlgen
ae5d1407ee Fix mmaped in jit (#2225)
* fix reuse for mmaped buffers in jit

* comment
2023-11-06 14:54:21 -08:00
George Hotz
2f7aab3d13 move optimize_local_size (#2221)
* move optimize_local_size

* interpret_ast
2023-11-05 21:00:52 -08:00
George Hotz
c60c3b467a clean up symlinking in benchmark (#2219)
* clean up symlinking

* make torch deterministic
2023-11-05 16:46:05 -08:00
George Hotz
baeb77a403 Make the JIT simple (no batch exec, no cache collector) (#2215)
* remove batch exec

* simple cachecollector

* remove cache collector test

* less lr
2023-11-05 16:23:43 -08:00
chenyu
719a97b337 fix IMAGE=2 failed with NOOPT=1 (#2209)
* IMAGE=2 failed with NOOPT=1

* fix it
2023-11-05 13:16:37 -08:00
chenyu
680cbfdba4 less broken limit_dims_to_max (#2214) 2023-11-04 08:38:06 -07:00
chenyu
f582ec56d5 Replace (getenv("CI", "") != "") with helpers.CI (#2213) 2023-11-03 15:20:44 -07:00
George Hotz
f17bc16f46 simple runtime args (#2211)
* simple runtime args

* fix some tests

* fix abstractions and triton

* fix search
2023-11-03 12:31:29 -07:00
George Hotz
03cf0afa4f move all to compile api (#2203)
* move metal+clang to compile api

* all to the new style

* remove binary arg

* fix triton

* fixup tests

* fix clang

* diskcache is generic

* __wrapped__

* compile_gpu

* fix thneed

* keep the src in the ASTRunner

* lib

* move compile_gpu

* compile_gpu in device

* put compiler in astrunner

* test reverts

* triton compiler

* ugh, that too
2023-11-01 23:01:32 -07:00
George Hotz
7103b716c4 merge kernel and optimizer (#2200)
* merge kernel and optimizer

* linearize is reentrant

* move global/local size

* clean up linearizer copy

* remove unneeded lin copies

* stop linearizing twice

* oops, that should be None
2023-11-01 15:20:01 -07:00
George Hotz
8ba7ced7f9 extract const if it's const (#2193)
* extract const if it's const

* fix if statement

* fast math issue

* fix graphing and casting

* disable flaky copyout test
2023-10-31 18:52:35 -07:00