tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-26 15:28:10 -05:00

Author	SHA1	Message	Date
qazal	aa2e7b11f8	more const folding infra from the delete_lazy branch [pr] (#7976 ) * more const folding infra from the delete_lazy branch [pr] * sink base * limit	2024-12-01 23:20:30 +08:00
ignaciosica	509c4a573f	increase tolerance on test (#7972 )	2024-11-30 11:50:10 -05:00
qazal	ca20f281df	late folding size 0 ops (#7940 ) * fold st size=0 * fold 0 here * ops folding * update realize	2024-12-01 00:40:02 +08:00
chenyu	c068e8c242	fetch cleanup (#7970 ) reordered a bit to minimize the stuff in the with blocks test manually with TestFetch and `DISABLE_HTTP_CACHE=1` on some examples	2024-11-30 11:00:33 -05:00
qazal	bb8e319680	unset TRACK_MATCH_STATS while initing beam buffers [pr] (#7971 )	2024-11-30 23:56:58 +08:00
qazal	d0735d6489	swizzle store [pr] (#7964 ) * swizzle store [pr] * assign extra swizzle * now arg is optional * extra	2024-11-30 21:32:50 +08:00
qazal	6f17eedaea	schedule sink folding try 2 [pr] (#7968 )	2024-11-30 20:46:26 +08:00
qazal	293e0f8a8e	make ASSIGN arg optional [pr] (#7966 )	2024-11-30 19:40:33 +08:00
qazal	5615e92df8	const folding tests [pr] (#7967 )	2024-11-30 19:27:30 +08:00
qazal	8780818d04	Revert "schedule sink folding with graph_rewrite [pr] (#7963 )" (#7965 ) This reverts commit `4529c5d0da`.	2024-11-30 19:02:06 +08:00
qazal	4529c5d0da	schedule sink folding with graph_rewrite [pr] (#7963 ) * schedule sink folding with graph_rewrite [pr] * x is reserved, use u * match lazy const folding	2024-11-30 18:30:41 +08:00
nimlgen	10f431b96d	hcq replace update with sint (#7899 ) * try sym hcq * start with amd * move to nv * nv works * cache and qcom * fixes * signals * fix nv * qcom fixes * linter * linter * cache + typings * fixes * tiny fixes * linter * linter * lntr * ugh * comments	2024-11-29 20:08:13 +03:00
chenyu	aa51f3c14e	update kernel counts in test_real_world (#7960 ) the test was useless because it was looking at the jit graph counts. wrap with JIT=2 for now. if it's stable we could consider making kernel count strict, which helps change like #7940	2024-11-29 11:14:54 -05:00
nimlgen	d3660ccc51	prereqs for hcq updates removal (#7959 ) * hcq signals touch ups * hcq compiled has device id * helpers * prreq hcq api * oops	2024-11-29 18:20:07 +03:00
geohotstan	e1a85c262c	no type-tracker getitem refactor (#6917 ) * newest newer than new refactor of getitem * hmmm * hmmmmmmmmmmmmmmmmm * bro. * ??? * small improvements * cleaner, but why u gotta do this to me mypy * fix, but still dunno about mypy * even better * try again? Passes locally * use match * fix mypy * better * broooooo check this out * fix mypy * bug fix * fixed * polish	2024-11-29 10:18:02 -05:00
Sieds Lykles	d267a2d9eb	Div mod recombine test for issue (#7957 ) * Add test for failing div_mod recombine * Add test case when there is gcd in div/mod	2024-11-29 08:47:50 -05:00
qazal	e54ff0d3af	conceptual uop st cleanup [pr] (#7956 ) * conceptual uop st cleanup [pr] * unwrap is fine here, better than arg	2024-11-29 19:35:46 +08:00
Ahmed Harmouche	2d11765295	Fix WebGPU atomic store (#7954 )	2024-11-29 19:31:25 +08:00
nimlgen	309dcb1044	hcq signal add sleep (#7955 ) * hcqsignal sleep * fixes * typing * time ms is int	2024-11-29 14:04:45 +03:00
qazal	30f0e95fbd	don't lru_cache is_scheduled [pr] (#7953 )	2024-11-29 17:03:55 +08:00
qazal	f044271898	big graph do_realize cleanup and renames [pr] (#7952 ) * scheduler do_realize cleanup and renames [pr] * big graph is the better name * more language * append_kernel -> append_realize	2024-11-29 14:58:45 +08:00
ignaciosica	6e47dc8921	true tc swizzle [pr] (#7951 ) * true tc swizzle * cleanup * fix linter	2024-11-29 14:39:46 +08:00
geohotstan	765096fe7d	fix Tensor._pool edge case (#7581 ) * split into another branch * polish * try this * Revert "try this" This reverts commit `84f711b13e`. * try * Revert "try" This reverts commit `89c7a7649b`. * idk anymore * it is what it is --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-11-28 23:17:13 -05:00
chenyu	70f052d2b8	flip IF and RANGE order (#7947 ) this is the rest of #7919 prereqs for new block lin	2024-11-28 13:35:30 -05:00
chenyu	bb23469f93	lower conv threshold on red (#7948 )	2024-11-28 13:31:06 -05:00
chenyu	e243e709a7	BLOCK ops in Ops (#7945 ) did this break conv speed?	2024-11-28 12:44:22 -05:00
qazal	f39e9b4288	match lazy movement ops in uop [pr] (#7944 )	2024-11-28 23:03:43 +08:00
chenyu	f54508549f	don't search conv weight init in speed_v_theoretical (#7943 )	2024-11-28 10:03:18 -05:00
chenyu	3c8c98253a	BEAM_DEBUG=1 in speed_v_theoretical (#7942 ) * DEBUG=3 in speed_v_theoretical * BEAM_DEBUG=1	2024-11-28 08:30:55 -05:00
qazal	aa7e16744e	allow sinking childless consts and fold them [pr] (#7941 )	2024-11-28 20:23:37 +08:00
qazal	3ab67d45b2	init changes from the global buffer branch [pr] (#7939 )	2024-11-28 19:38:58 +08:00
nimlgen	81d415be03	amd pkt3 refactor (#7923 ) * amd pkt3 refactor * replace this * linter * fix * cmt * fast * simpler * linter * smth * missing	2024-11-28 11:06:37 +03:00
qazal	e3fe7023b0	move all VIEW -> LOAD rules to big graph rewrite [pr] (#7936 ) * move all VIEW -> LOAD rules to big graph rewrite [pr] * comments	2024-11-28 14:02:29 +08:00
qazal	e2eccdab43	swizzle upat consistency + assert it's base [pr] (#7935 )	2024-11-28 13:35:55 +08:00
George Hotz	c5c3b05b5a	block lin: only the test changes (#7933 )	2024-11-28 13:19:00 +08:00
George Hotz	32dbab945c	Revert "add block uops and modify tests (#7931 )" (#7932 ) This reverts commit `6f4519ff45`.	2024-11-28 13:15:41 +08:00
George Hotz	6f4519ff45	add block uops and modify tests (#7931 )	2024-11-28 13:11:18 +08:00
chenyu	336a9b6bf3	remove dtype from llama precompute_freqs_cis (#7930 ) do the cast based on input in first forward call instead	2024-11-27 22:28:40 -05:00
chenyu	3e2430f822	use tqdm tqdm in mlperf training (#7929 ) issue in benchmark dashboard logging, revert back to tqdm tqdm for now	2024-11-27 21:57:05 -05:00
Sieds Lykles	864758423e	Don't take const in gcd and change the "nothing_changed" condition (#7926 ) * Don't take const in gcd and change the "nothing_changed" condition Biggest difference is probably actually that I forgot to check if gcd changed if nothing else changed The TODO was fixed by not using the const in the gcd, and then taking it out * Fix more tests	2024-11-27 18:07:36 -05:00
chenyu	988d64900b	add TODO case to test_mod_congruence (#7925 ) same alu count but better bounds	2024-11-27 15:23:21 -05:00
chenyu	57262c8e34	update Tensor.scatter doc examples (#7924 ) same example from torch, i think it's much more useful	2024-11-27 11:42:36 -05:00
geohotstan	cea5853cfa	add Tensor.scatter (#7737 ) * working I think * where are my onnx scatter tests?? * forward_only for now * try if nan hack fix NV * looks like issue is different... CUDA WHY * oops that was wrong. Try if this fixes CUDA * simpler multiply * actually finish this up tmrw morning :x * fix tests? * improve tests * improve test and implementation * fix ruff * complete but lots of expected failure... * reviewed tests * add onnx tests * is this a processing op? * add return type to indicate that it's not in-place * final cleanups * use or and improve tests a little * add masked_index_select * call it masked_setitem instead * try * FIXED --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-11-27 10:52:04 -05:00
JaSpa99	38f34ca0cb	prepare mypy==1.13.0: legacy cast (#7866 ) * use helper to narrow literal type * narrow with asserts instead of cast * remove parantheses * tensor.item() calls tensor.data() * no copy * proper indexing	2024-11-27 10:33:35 -05:00
geohotstan	753f07e193	add circular pad mode to Tensor.pad (#7918 ) * start * send it * no more neg circular pads * quick fix onnx too --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-11-27 10:30:51 -05:00
chenyu	a58e289d77	Revert "prereqs for new block lin so PR works (#7919 )" (#7921 ) This reverts commit `c53261b541`.	2024-11-27 08:41:09 -05:00
George Hotz	c53261b541	prereqs for new block lin so PR works (#7919 )	2024-11-27 15:07:54 +08:00
chenyu	a6171cbe71	add stable diffusion v2 to mac benchmark (#7917 ) this caught #7902	2024-11-26 22:09:43 -05:00
Sieds Lykles	d318867776	Factoring gcd out of mod (#7916 ) * Factoring gcd out of mod Curious if this will be faster/better * Update bounds on test	2024-11-26 21:17:22 -05:00
nimlgen	84f96e48a1	hcq signal tiny refactor (#7913 ) * hcq signal tiny refactor * no mv * fix * fix2 * fix3	2024-11-26 21:48:38 +03:00

7003 Commits