tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-15 01:48:23 -05:00

Author	SHA1	Message	Date
George Hotz	984f09ac74	flip Ops.COPY order [pr] (#10120 )	2025-04-30 16:50:18 -04:00
George Hotz	c3ff308abb	range has only one src now [pr] (#10100 ) * range has only one op now * fix z3 checker * ci fix * needs shell * try pip ensure update * that ensurepip is useless * upgrade pip before cache * windows happy?	2025-04-29 10:31:05 -04:00
qazal	cbf7347cd6	display viz rewrites with tabbing if they are subrewrites (#10097 ) * display viz rewrites with tabbing if they are subrewrites * update viz api	2025-04-29 17:57:21 +08:00
Sieds Lykles	dbb7aee02e	Split constant in div with negative x (#10088 ) * add rule * change test * lower complexity limit * remove offset in fold_unrolled_divs * remove import * add one more condition	2025-04-28 16:24:14 -04:00
George Hotz	690dac79b5	don't modify the ranges on reduce rewrite (#10062 ) * bug in div range folding * simpler * oh, this is right for indexing, but the div mod folding needs to be fixed * reenable * Passing test_complexity_w_unroll2 (#10068) * Passing * remove non_folded_divs * Add check for negative tern in div folding * Add test * bump that limit * fix casted --------- Co-authored-by: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>	2025-04-28 12:01:19 -04:00
chenyu	4c1ce1a299	don't simplify if div folding resulted in negative numerator (#10064 ) * don't simplify if div folding resulted in negative numerator * test	2025-04-26 17:01:18 -04:00
George Hotz	2ed3acd767	toposort is a function [pr] (#10004 )	2025-04-23 16:25:03 +01:00
George Hotz	d1f6701eb7	hotfix: lower amd threshold + improve block reorder test	2025-04-22 20:44:29 +01:00
George Hotz	c1539b0319	putting add first orders loads as expected (#9991 )	2025-04-22 20:12:05 +01:00
George Hotz	feee6986c9	faster block reorder (#9990 ) * faster block reorder [pr] * that shouldn't change order * key just in sorted * ind	2025-04-22 19:18:57 +01:00
chenyu	9e5e371999	make DISABLE_COMPILER_CACHE a ContextVar [pr] (#9983 )	2025-04-22 10:32:54 -04:00
George Hotz	c519b553db	non recursive toposort is 2x+ faster (#9979 ) * non recursive toposort is 2x+ faster * don't change the order	2025-04-22 13:59:38 +01:00
George Hotz	f5dc70c624	microbenchmarks + micro speed ups (#9972 ) * microbenchmarks * forgot the ubenchs * clean up type verify	2025-04-22 11:30:46 +01:00
qazal	9a9aba4cd5	setitem tests (some failing) from kernelize (#9940 )	2025-04-20 18:47:55 +08:00
George Hotz	8919370c76	hotfix: fix test_save_all_dtypes on METAL	2025-04-18 08:42:31 +01:00
Eitan Turok	2c7c205bc5	Fix dtype comparisons in vectorized transcendental + tests (#9794 ) * init test * cleanup * init * update * fix * fix python runtime for vectorized code * awesome helper * update * update * cleanup * more cleaning * cleanup more * fix tests * more cleaning * cleanup more * fix * even cleaner * failing tests is sad * cleanup * better name * make tests pass * remove vec from python runtime * remove vec from eval_uop * remove expected failues * better name	2025-04-16 08:06:12 -04:00
George Hotz	44e4934167	fast pattern matcher [pr] (#9737 ) * FastPatternMatcher * works without that * fix test pickle * strict len * compile match function * dynamic compile * fast * faster * compile * track * a lot faster * clean up * dup or * faster and simpler * fast match doesn't support store * plane * minor refactor * real speed * don't imply return None * upat * fix test * heard you wanted more speed * no generator * split cf * early fixup * fxn fixup * reconstruct_function * Revert "reconstruct_function" This reverts commit `37dac010ab`. * simpler stuff * too big * upat compile error * cleanups * don't cache that * cleanups * 10 -> 15	2025-04-14 15:24:41 +01:00
chenyu	e0ec8be37d	use CPU for test_schedule_ring (#9843 ) * use CPU for test_schedule_ring * why pre-commit is good	2025-04-10 23:20:53 -04:00
qazal	16956b79de	canonicalize Device.DEFAULT (#9835 )	2025-04-10 23:02:11 +08:00
George Hotz	f666dd14eb	fix get reduce contraction with test (#9834 )	2025-04-10 22:24:21 +08:00
George Hotz	53f0b2aad7	fix infinite loop in flash attention (#9827 ) * fix infinite loop in flash attention * get_contraction_with_reduce * skip that test * SINGLE_KERNEL_SOFTMAX + fix multi * default IGNORE_OOB * print change	2025-04-10 20:06:44 +08:00
qazal	498a2bf738	add err handling tests to viz + cleanups (#9825 ) * cleanup * add err handling tests to viz + cleanups * lint	2025-04-10 14:05:05 +08:00
qazal	3bd992dc95	multi stage graph_rewrite_map (#9803 ) * multistage graph_rewrite_map * s/merge_map/input_map * build up kernel_map from the tensor_map	2025-04-09 15:59:45 +08:00
Eitan Turok	bb7922b95f	Vectorize Transcendental Regression Tests (#9753 ) * init test * cleanup	2025-04-08 01:27:39 +08:00
chenyu	407ca54382	symbolic fold double where (#9436 ) * symbolic fold double where a.where(b.where(c, d), d) -> (a & b).where(c, d). a pattern in optimizer * test case	2025-04-05 05:12:17 -04:00
Sieds Lykles	9c2fc695b5	cond.logical_not().where(a,b) -> cond.where(b,a) (#9741 ) * Add rule for negation in where, simplifies arange patterns * 0 becomes 0.0 again * Only if cond is bool * ne is never None * Add a test --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-04-04 19:13:32 -04:00
George Hotz	8b5a523743	fix minimum length in pattern matcher (#9736 )	2025-04-04 14:57:01 +08:00
George Hotz	cac8bcf8b5	use Ops.REDUCE (#9721 ) * decrease bert python time [pr] * order copies * Revert "order copies" This reverts commit `3f62c8693b`. * rewrite count * Ops.REDUCE * acc first in the add chain * Fix tensor core acc * arange patterns look good * fix multireduce gate * reduce rewrite rule * bump that to 15 minutes * multiwmma isn't fusing * gep through wmma is gep pushing * bump that timeout too, it's all env setup * add failing test	2025-04-04 10:14:34 +08:00
chenyu	c20f112e9f	example test use z3 to verify valid simplification (#9684 )	2025-04-02 01:05:52 -04:00
chenyu	c672716b38	improve vmin/vmax for IDIV (#9678 )	2025-04-01 23:16:01 -04:00
chenyu	8dd88ad476	don't div_and_mod_folding for negative numerator with remainder (#9674 ) can be wrong in C div since it truncates towards zero	2025-04-01 16:26:23 -04:00
chenyu	0e34f9082e	helper functions for cstyle div mod [pr] (#9673 )	2025-04-01 08:06:56 -04:00
chenyu	5358b0904b	update uop_given_valid if a node becomes const (#9604 ) * update uop_given_valid if a node becomes const * cleanup	2025-03-27 14:57:46 -04:00
qazal	bf94924d5a	fix viz with nested graph_rewrite (#9595 )	2025-03-27 13:14:28 +08:00
qazal	e5ff7b23d7	refactor to @track_matches + add failing test_nested_rewrite (#9592 ) * test_nested_rewrite * refactor to track_matches * positional arg	2025-03-27 11:11:56 +08:00
George Hotz	3c5161b4cb	add validation of the bounds of Ops.INDEX (#9503 ) * add validation of the bounds of Ops.INDEX * do mask properly * more validation * correct * fix gated * add CAST support to vmin/vmax * fix ptx and image * ptx no diff * upat.index also stays --------- Co-authored-by: qazal <qazal.software@gmail.com>	2025-03-20 12:15:55 +08:00
qazal	0b20f91ce7	remove move_mask from the devectorizer (#9511 ) * remove move_mask from the devectorizer * add (wrong) ptx * reason * enable index addition in PTX, we won't have the INDEX anyways * space	2025-03-20 11:53:12 +08:00
chenyu	189f62d44f	add rounding to tqdm unit scale (#9507 ) fixed `AssertionError: ' 1.00/10.0 1000it/s]' != ' 1.00/10.0 1.00kit/s]'`	2025-03-19 12:08:46 -04:00
hooved	136cf7b8b1	hotfix: load >2 GiB from disk on macOS (#9361 ) * enable loading >2 GiB buffer from disk on macOS * handle None case raised by mypy * add test * revert fix to repro bug in CI * tell CI to run a unit test for macOS * reapply fix	2025-03-07 14:51:58 +08:00
George Hotz	2cc4cb74f0	reorder binops (#9328 ) * reorder binops * test improvements + fix string tests * ugh, okay this	2025-03-03 14:58:18 +08:00
qazal	e162aa862d	is_realized only if buffer is allocated (#9253 ) * is_realized only if the buffer is allocated * fix the image check too * assert test_lil_model after ExecItems run	2025-02-26 08:58:08 +01:00
Sieds Lykles	9c4d9d9f10	Acc first (#9232 ) * put acc in front of the add chain * handle the other case * Make loop collapse more generic * Remove mulacc_unrolled * Actually remove it --------- Co-authored-by: George Hotz <geohot@gmail.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-25 22:10:15 -05:00
chenyu	90c3ed17c5	move cast to before softmax in attention (#9213 ) * move cast to before softmax in attention saved some memory because exp (which is used for backward) are done in half. training bert seems fine and can fit BS=78 now (from 66) * test	2025-02-24 17:24:59 -05:00
qazal	14aa2395d0	allow VIEW(BUFFER) in Tensor UOps [pr] (#9210 ) * allow VIEW(BUFFER) in Tensor UOps [pr] * still reshapes * update becomes_map tests * bring copy folder to the scheduler * lint * only sgd left * optimizer assign * 13 kernels * rename to test_reorder_expand + assert VIEW	2025-02-24 13:06:15 +01:00
qazal	d12efc95d4	support custom name function in viz [pr] (#9219 ) * support custom name function in viz [pr] * title case * assert name count in test_track_rewrites_name_fxn	2025-02-24 03:03:25 +02:00
chenyu	2e7c2780a9	CLANG -> CPU (#9189 )	2025-02-20 18:03:09 -05:00
chenyu	3e22747799	run unit test on windows ci (#9187 ) * factor out testing_minimal in setup.py [pr] * testing_unit + windows	2025-02-20 14:40:41 -05:00
chenyu	287de4ecc6	use torch in test_gradient (#9186 ) used torch.autograd.grad, but not sure if it can be a template like jax	2025-02-20 12:26:11 -05:00
George Hotz	df3b320f46	rewriter -> devectorizer [pr] (#9147 )	2025-02-18 12:42:08 +08:00
Ali Ladjevardi	35e9c4657b	Use proper units when printing beam time (#9103 ) * use proper units when printing beam time * refactor DEBUG=2	2025-02-17 23:41:38 +08:00
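
Two of the commits listed above (#9436 and #9741) state `where` rewrite rules in their messages: `a.where(b.where(c, d), d) -> (a & b).where(c, d)` and `cond.logical_not().where(a,b) -> cond.where(b,a)`. The sketch below is not tinygrad code. It assumes a scalar stand-in named `where` (a hypothetical helper, not a tinygrad API) and brute-forces small inputs to confirm that both sides of each rule agree.

```python
from itertools import product

def where(cond, x, y):
    """Hypothetical scalar stand-in for cond.where(x, y): x if cond holds, else y."""
    return x if cond else y

def check_where_rules():
    """Exhaustively compare both sides of the two rewrite rules on small inputs."""
    for a, b in product([False, True], repeat=2):
        for c, d in product([0, 1, 2], repeat=2):
            # a.where(b.where(c, d), d) -> (a & b).where(c, d)
            assert where(a, where(b, c, d), d) == where(a and b, c, d)
            # cond.logical_not().where(a, b) -> cond.where(b, a)
            assert where(not a, c, d) == where(a, d, c)
    return True

check_where_rules()
```

The first rule only holds because both false branches are the same value `d`; with two different fallbacks the nested form could not be collapsed this way.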

952 Commits
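
Several commits in this range (#9673, #9674, #10064) concern div/mod folding with negative numerators, and #9674 notes that the fold "can be wrong in C div since it truncates towards zero". As a rough illustration, assuming hypothetical helpers `c_div` and `c_mod` (these are not the tinygrad helpers), the snippet below contrasts Python's floor division with C-style truncating division and shows a fold that is valid under one but not the other.

```python
def c_div(x, y):
    """C-style integer division: the quotient is truncated towards zero."""
    q = abs(x) // abs(y)
    return -q if (x < 0) != (y < 0) else q

def c_mod(x, y):
    """C-style remainder, defined so that x == c_div(x, y) * y + c_mod(x, y)."""
    return x - c_div(x, y) * y

# Python floors, C truncates: the two differ for a negative numerator with a remainder.
floor_q = -7 // 2        # floor division
trunc_q = c_div(-7, 2)   # truncating division

# Folding (4*x + 3) / 4 to x is valid under floor division ...
x = -1
floor_fold_ok = (4 * x + 3) // 4 == x
# ... but not under truncating division, because the numerator is negative.
trunc_fold_ok = c_div(4 * x + 3, 4) == x
```

Here `floor_q` and `trunc_q` differ by one, and `trunc_fold_ok` is false, which is the kind of case the commits above guard against.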