* metal indirect command buffers
* sub-1ms gpt
* metal batch exec is good
* remove whitespace
* input_replace
* fix ci
* useResources
* very simple cacheallocator
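Roughly the idea, sketched with hypothetical alloc_fn/free_fn hooks standing in for the real device allocator:
```python
from collections import defaultdict

class CacheAllocator:
  def __init__(self, alloc_fn, free_fn):
    self.alloc_fn, self.free_fn = alloc_fn, free_fn
    self.cache = defaultdict(list)  # size -> free buffers of that size
  def alloc(self, size:int):
    # reuse a previously freed buffer of the same size when possible
    return self.cache[size].pop() if self.cache[size] else self.alloc_fn(size)
  def free(self, buf, size:int):
    # keep the buffer around instead of returning it to the device
    self.cache[size].append(buf)
```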
* update_stats
* fix CI
* minor
* remove that from jit
* var_vals are global
* working with global-ish
* better
* fix export model
* fix tests
* better kv cache
* does it run?
* use where for kvmask
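The numpy analogue of the trick (names illustrative): select through the mask instead of adding a -inf tensor:
```python
import numpy as np

def masked_scores(scores: np.ndarray, kvmask: np.ndarray) -> np.ndarray:
  # keep scores at valid kv cache slots, -inf elsewhere so softmax zeroes them
  return np.where(kvmask, scores, -np.inf)
```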
* fix excessive var_vals
* fix import
* how does multigpu use this?
* llama kinda works
* faster and simpler
* cleanup
* fix conversation mode
* test cleanups
* fix one more test
* test cleanup
---------
Co-authored-by: George Hotz <geohot@gmail.com>
* Change linearizer to parse CAST
* Oneliner renders for cstyle and triton
* LLVM cast and ALU implementation
* pylint fixes
* cast in gep
* remove printbufs
* use cast for post-load ops
* get rid of parse_cast
* partially supported vectorized dtypes for initial dev
* render phi as the dtype
* Revert "partially supported vectorized dtypes for initial dev"
This reverts commit 1bf1a818a3.
* Revert "render phi as the dtype"
This reverts commit d08cb270b4.
* reenable triton tests
* no vstore_half if dtype is already half
* upcast max
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* For cuda, get current free space from the device, and retry alloc failures
* type ignore for mypy
* add init to get free mem in cuda
* Move retry logic into common lib.
Fix typo in override _get_cur_free_space
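The shape of the shared retry path, sketched with a hypothetical _do_alloc hook next to the real _get_cur_free_space override:
```python
class RawBufferRetry:
  def _do_alloc(self, size:int): raise NotImplementedError          # device alloc
  def _get_cur_free_space(self) -> int: raise NotImplementedError   # device query
  def alloc(self, size:int):
    try:
      return self._do_alloc(size)
    except MemoryError:
      # ask the device what is actually free before giving up, then retry once
      if size <= self._get_cur_free_space(): return self._do_alloc(size)
      raise
```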
* linter error fix in test file
* Don't catch all exceptions, as that would also catch KeyboardInterrupt
* fix unintended line changes
* fix test ops
* decompose the err from test_ops
* skipTest skips the entire test; we don't want that
* handle cases with the same priority
* add int16 to torch map
* fuzz linearizer transformation
* no standard normal for fp16
* work
* Interpreted start
* CPU and TORCH work
* fix MemBuffer with same idx
* id for failed kernels
* no image and variable for Interpreted
* symbolic shape
* IMAGE only for GPU
* Interpreted almost all good
* cleanup
* fix bufs_from_lin
* zero size
* some failed examples
* just Exception
* just tests that don't pass
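The core loop, sketched with generic callables rather than the real Linearizer actions:
```python
import random
from typing import Any, Callable, List

def fuzz_transform(make_input: Callable[[], Any], transforms: List[Callable[[Any], Any]],
                   run: Callable[[Any], Any], n_tries:int=10) -> List[int]:
  baseline = run(make_input())
  failed = []
  for i in range(n_tries):
    x = make_input()
    # apply a random chain of transformations, then compare to the baseline run
    for t in random.sample(transforms, k=random.randint(1, len(transforms))):
      x = t(x)
    if run(x) != baseline: failed.append(i)
  return failed
```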
* move metal+clang to compile api
* all to the new style
* remove binary arg
* fix triton
* fixup tests
* fix clang
* diskcache is generic
* __wrapped__
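A sketch of a generic sqlite-backed decorator; functools.wraps is what supplies __wrapped__, and the table layout/path here are illustrative:
```python
import functools, hashlib, pickle, sqlite3

_db = sqlite3.connect("/tmp/cache.db")
_db.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, val BLOB)")

def diskcache(fn):
  @functools.wraps(fn)  # sets __wrapped__ so the undecorated fn stays reachable
  def wrapper(*args, **kwargs):
    key = hashlib.sha256(pickle.dumps((fn.__name__, args, kwargs))).hexdigest()
    row = _db.execute("SELECT val FROM cache WHERE key=?", (key,)).fetchone()
    if row is not None: return pickle.loads(row[0])
    val = fn(*args, **kwargs)
    _db.execute("INSERT INTO cache VALUES (?,?)", (key, pickle.dumps(val)))
    _db.commit()
    return val
  return wrapper
```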
* compile_gpu
* fix thneed
* keep the src in the ASTRunner
* lib
* move compile_gpu
* compile_gpu in device
* put compiler in astrunner
* test reverts
* triton compiler
* ugh, that too
* merge kernel and optimizer
* linearize is reentrant
* move global/local size
* clean up linearizer copy
* remove unneeded lin copies
* stop linearizing twice
* oops, that should be None
* refactor unit tests for dtypes
* add missing dtypes in llvmir.py and lib.py
* skip torch tests
* webgpu
* cleaner skips
* fix llvm bool casting issue using compare
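The fix in llvmlite terms: compare against zero instead of truncating to i1, so any nonzero value maps to true:
```python
from llvmlite import ir

def cast_to_bool(builder: ir.IRBuilder, val: ir.Value) -> ir.Value:
  # a plain trunc to i1 keeps only the low bit; icmp ne 0 is the correct bool cast
  return builder.icmp_unsigned("!=", val, ir.Constant(val.type, 0))
```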
* llvm 100% passing
* llvm segfault
* TEMP decrease timeout mins to 11
debug
* add bf16 to setup
* skip half tests in cuda cpu
* check for CUDACPU instead
* add int16 to triton dtypes
* u16 for triton
* remove debug - diff is still hard to read
* derive from base class TestDType
* enhance test_upcast and downcast by running on every possible combination
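i.e. a loop over every (src, dst) pair, shown here in plain numpy:
```python
import itertools
import numpy as np

DTYPES = [np.float32, np.float16, np.int32, np.int16, np.int8, np.uint8]

def test_all_cast_pairs():
  for src, dst in itertools.product(DTYPES, DTYPES):
    a = np.array([1, 2, 3], dtype=src)
    out = a.astype(dst)
    assert out.dtype == dst
    # small integer values survive every upcast/downcast exactly
    np.testing.assert_array_equal(out.astype(np.float64), a.astype(np.float64))
```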
* dummy commit to rerun the flaky test
* skip the correct tests for CUDA
* bf16 should be skipped in the common TestDType cases
* re-enable bf16
* more consistent structure
* tiny changes to is_dtype_supported 1
* tiny changes 2
add reason
* fuzz
* fuzzer p2
* run fp32 twice
* remove duplicate fp32 run
* clang: use stdbool
* skip triton on bool casts
* merge and resolve conflicts
* Enable Multi-Output Export
* Add test
* Update examples and lint
* fix padding
* test ops
* dummy commit to rerun test
* revert cuda lint
* Enforce tuple/list of tensors
* subscripted generics
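isinstance can't take subscripted generics (Tuple[Tensor, ...] raises TypeError at check time), so the enforcement has to be against the bare types:
```python
def validate_outputs(out):
  # isinstance(out, Tuple[Tensor, ...]) -> TypeError: subscripted generics cannot
  # be used with class and instance checks, so check tuple/list directly
  if not isinstance(out, (tuple, list)): out = (out,)
  return tuple(out)
```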
* put back webgpu test
* Re-enable WebGPU Efficientnet test
* optimizer: simplify GROUP and LOCAL to have one of each
Now that tensor cores only use LASTLOCAL, we can simplify to use
only that op everywhere.
The only use of GROUP is in the matvec hand-coded opts, and it doesn't
make a performance difference, so switch to using only the top
behavior.
Also adds asserts to prevent tensor core dims from being altered,
which would cause bad kernels to be generated.
* search: remove duplicated actions
* stable diffusion < 324ms
* revert swap action
* fix tests due to more sum splitting
* REDUCEOP_SPLIT_THRESHOLD env var
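Read like any other tinygrad env knob; the default shown here is illustrative:
```python
import os
REDUCEOP_SPLIT_THRESHOLD = int(os.getenv("REDUCEOP_SPLIT_THRESHOLD", "32768"))

def should_split_reduce(reduce_size:int) -> bool:
  # only split a sum into two kernels once it is large enough to pay off
  return reduce_size >= REDUCEOP_SPLIT_THRESHOLD
```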
* added from unaligned np test (#2134)
* align cpu buffer before copy into cl buffer (#2135)
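The usual over-allocate-and-offset trick, sketched in numpy with an assumed 64-byte alignment:
```python
import numpy as np

def aligned_empty(shape, dtype=np.float32, align=64):
  nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
  raw = np.empty(nbytes + align, dtype=np.uint8)
  ofs = (-raw.ctypes.data) % align  # bytes to the next aligned boundary
  return raw[ofs:ofs + nbytes].view(dtype).reshape(shape)
```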
* remove shelve from handcode_resnet50_opt.py (#2139)
* Add dictionary keys to reduce db size (#2131)
* work
* ignore beam cache
* dictionary keys are generic
* minor db cleanups
* fix baseline and extract dataset
* fix training
* log likelihood
* more lin to feats
* sts
* training policynet
* net sort of works
* dedup
* refactor, stupid new actions
* fix uops deduping
* BEAM_ESTIMATE
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: imaolo <56898718+imaolo@users.noreply.github.com>
* wmma: refactor tensor cores using existing local dims
* optimizer: fix bad rebase and break after one late local
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* feat: move to hip
* feat: special path for RawBufferTransfer
* feat: initial rawbuffertransfer
* feat: hip ipc
* feat: working hip ipc
* feat: need the base device without args
* feat: close mem handle
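Underneath it's three HIP calls; a ctypes sketch with error checking elided (the real wrapper is cleaner):
```python
import ctypes

class hipIpcMemHandle_t(ctypes.Structure):
  _fields_ = [("reserved", ctypes.c_char * 64)]  # opaque 64-byte handle

hip = ctypes.CDLL("libamdhip64.so")

def export_ptr(dev_ptr) -> bytes:
  handle = hipIpcMemHandle_t()
  hip.hipIpcGetMemHandle(ctypes.byref(handle), dev_ptr)
  return bytes(handle)  # ship these 64 bytes to the other process

def import_ptr(raw: bytes) -> ctypes.c_void_p:
  ptr = ctypes.c_void_p()
  hip.hipIpcOpenMemHandle(ctypes.byref(ptr), hipIpcMemHandle_t.from_buffer_copy(raw),
                          1)  # 1 = hipIpcMemLazyEnablePeerAccess
  return ptr

def close_ptr(ptr): hip.hipIpcCloseMemHandle(ptr)  # on teardown
```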
* feat: modified test
* feat: more multihip stuff
* clean: cleanup
* feat: cleaner
* feat: don't crash
* feat: test more
* clean: way cleaner hip wrapper
* feat: barrier
* feat: barrier
* feat: this breaks stuff
* feat: we can use empty here
* feat: maybe fix tests
* feat: maybe fix tests again?
* fix: probably fix tests
* feat: no waiting here
* feat: wait here
* feat: much larger test
* feat: need to sync here
* feat: make this async
* feat: no waiting!
* feat: cut here
* feat: sync copy
* feat: random imports
* feat: much cleaner world
* feat: restore this
* feat: restore this
* clean: cleanup
* feat: set this