tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-24 14:28:09 -05:00

Author	SHA1	Message	Date
qazal	5441127417	assert const folding return shape matches [pr] (#8006 )	2024-12-03 19:31:06 +08:00
George Hotz	dddfb494d7	don't mutate the uop/lazybuffer, just the Buffer [pr] (#8000 ) * don't mutate the uop/lazybuffer, just the Buffer [pr] * fix red test * try different fix * that * that's the right fix * test for fixed behavior * bump to 3.12	2024-12-03 19:03:51 +08:00
George Hotz	b8bf5b2787	minor uop speedups [pr] (#8002 ) * minor uop cleaner [pr] * free uop creation speed by removing WeakValueDictionary * a lil faster * disable that test * lines * and it doesn't print non hit patterns	2024-12-03 17:04:48 +08:00
George Hotz	0905f87b68	hotfix: print only kernel time	2024-12-03 14:25:08 +08:00
chenyu	c7bc75e634	alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1) (#7900 ) * alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1) only do if at least one branch is const, so total alu won't increase * tests and interesting TODO cases	2024-12-02 17:19:27 -05:00
chenyu	b91fa24387	script to run regressed sd conv on metal (#7995 ) * script to run regressed sd conv on metal this and other similar `conv2d + add` kernels contributed to most of the speed regression * # ruff: noqa: E501	2024-12-02 15:34:27 -05:00
geohotstan	0a2e10be1d	add SELU to Tensor (#7993 ) * add selu * more clean ups	2024-12-02 10:04:01 -05:00
qazal	bb606e5bcf	process replayable ops.py changes from delete_lazy [pr] (#7994 ) * process replayable ops.py changes from delete_lazy [pr] * hotfix: seed tiny_jit	2024-12-02 19:38:31 +08:00
George Hotz	0c7477b108	no bool in range [pr] (#7988 ) * no bool in range [pr] * fix llvm * add arg to range spec * fix broken test * forgot this one * hotfix: test_tiny jit is a real test	2024-12-02 19:05:16 +08:00
Ahmed Harmouche	1ea0925744	Support packed types in smem in webgpu	2024-12-02 10:13:25 +01:00
George Hotz	275951b730	clean up a few parents -> toposort [pr] (#7984 ) * clean up a few parents -> toposort [pr] * rename to old_parents + sched tests * a few more * that one * second to last * final	2024-12-02 15:59:31 +08:00
George Hotz	f17af70d17	replace all sparents with toposort (#7983 )	2024-12-02 15:00:30 +08:00
qazal	b797aee720	uop global buf number tracking try 2 [pr] (#7912 ) * uop buffer init small refactor [pr] * add early * this way it doesn't need late * buffer_num * itertools.count * count from 0 * down to 380	2024-12-02 14:45:17 +08:00
George Hotz	cbcc1c20eb	second try at block linearize (#7892 ) * second try at block linearize * weeee, works for lil matmul * it's so beautiful * test tiny passes * fix bugs * combine matching BLOCKENDS * wrapping * test lin failures passes * those failures were fake * flip sort order * fix ptx tests * deal with store better * dumb ptx fix * expect less * reduce lines * reduce lines * less lines and cleaner * no defaultdict * tighter * simpler block_parent_count	2024-12-02 13:43:09 +08:00
mesozoic-egg	90e2b2d577	Remove gated store, put rewrite to uopgraph [pr] (#7975 ) * update test for gated store * put gated store rewrite to uopgraph, rm from ptx * update test update test update test * remove gated st rewrite in llvm * lint --------- Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-02 12:33:16 +08:00
George Hotz	d53cd92364	fix tests for delete lazy [pr] (#7980 )	2024-12-02 12:00:48 +08:00
George Hotz	6c1efb9a72	hotfix: amd gemv was flaky	2024-12-02 11:08:24 +08:00
ignaciosica	509c4a573f	increase tolerance on test (#7972 )	2024-11-30 11:50:10 -05:00
qazal	6f17eedaea	schedule sink folding try 2 [pr] (#7968 )	2024-11-30 20:46:26 +08:00
qazal	5615e92df8	const folding tests [pr] (#7967 )	2024-11-30 19:27:30 +08:00
qazal	8780818d04	Revert "schedule sink folding with graph_rewrite [pr] (#7963 )" (#7965 ) This reverts commit `4529c5d0da`.	2024-11-30 19:02:06 +08:00
qazal	4529c5d0da	schedule sink folding with graph_rewrite [pr] (#7963 ) * schedule sink folding with graph_rewrite [pr] * x is reserved, use u * match lazy const folding	2024-11-30 18:30:41 +08:00
nimlgen	10f431b96d	hcq replace update with sint (#7899 ) * try sym hcq * start with amd * move to nv * nv works * cache and qcom * fixes * signals * fix nv * qcom fixes * linter * linter * cache + typings * fixes * tiny fixes * linter * linter * lntr * ugh * comments	2024-11-29 20:08:13 +03:00
chenyu	aa51f3c14e	update kernel counts in test_real_world (#7960 ) the test was useless because it was looking at the jit graph counts. wrap with JIT=2 for now. if it's stable we could consider making kernel count strict, which helps change like #7940	2024-11-29 11:14:54 -05:00
geohotstan	e1a85c262c	no type-tracker getitem refactor (#6917 ) * newest newer than new refactor of getitem * hmmm * hmmmmmmmmmmmmmmmmm * bro. * ??? * small improvements * cleaner, but why u gotta do this to me mypy * fix, but still dunno about mypy * even better * try again? Passes locally * use match * fix mypy * better * broooooo check this out * fix mypy * bug fix * fixed * polish	2024-11-29 10:18:02 -05:00
Sieds Lykles	d267a2d9eb	Div mod recombine test for issue (#7957 ) * Add test for failing div_mod recombine * Add test case when there is gcd in div/mod	2024-11-29 08:47:50 -05:00
Ahmed Harmouche	2d11765295	Fix WebGPU atomic store (#7954 )	2024-11-29 19:31:25 +08:00
geohotstan	765096fe7d	fix Tensor._pool edge case (#7581 ) * split into another branch * polish * try this * Revert "try this" This reverts commit `84f711b13e`. * try * Revert "try" This reverts commit `89c7a7649b`. * idk anymore * it is what it is --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-11-28 23:17:13 -05:00
chenyu	bb23469f93	lower conv threshold on red (#7948 )	2024-11-28 13:31:06 -05:00
qazal	f39e9b4288	match lazy movement ops in uop [pr] (#7944 )	2024-11-28 23:03:43 +08:00
chenyu	f54508549f	don't search conv weight init in speed_v_theoretical (#7943 )	2024-11-28 10:03:18 -05:00
qazal	aa7e16744e	allow sinking childless consts and fold them [pr] (#7941 )	2024-11-28 20:23:37 +08:00
George Hotz	c5c3b05b5a	block lin: only the test changes (#7933 )	2024-11-28 13:19:00 +08:00
George Hotz	32dbab945c	Revert "add block uops and modify tests (#7931 )" (#7932 ) This reverts commit `6f4519ff45`.	2024-11-28 13:15:41 +08:00
George Hotz	6f4519ff45	add block uops and modify tests (#7931 )	2024-11-28 13:11:18 +08:00
Sieds Lykles	864758423e	Don't take const in gcd and change the "nothing_changed" condition (#7926 ) * Don't take const in gcd and change the "nothing_changed" condition Biggest difference is probably actually that I forgot to check if gcd changed if nothing else changed The TODO was fixed by not using the const in the gcd, and then taking it out * Fix more tests	2024-11-27 18:07:36 -05:00
chenyu	988d64900b	add TODO case to test_mod_congruence (#7925 ) same alu count but better bounds	2024-11-27 15:23:21 -05:00
geohotstan	cea5853cfa	add Tensor.scatter (#7737 ) * working I think * where are my onnx scatter tests?? * forward_only for now * try if nan hack fix NV * looks like issue is different... CUDA WHY * oops that was wrong. Try if this fixes CUDA * simpler multiply * actually finish this up tmrw morning :x * fix tests? * improve tests * improve test and implementation * fix ruff * complete but lots of expected failure... * reviewed tests * add onnx tests * is this a processing op? * add return type to indicate that it's not in-place * final cleanups * use or and improve tests a little * add masked_index_select * call it masked_setitem instead * try * FIXED --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-11-27 10:52:04 -05:00
geohotstan	753f07e193	add circular pad mode to Tensor.pad (#7918 ) * start * send it * no more neg circular pads * quick fix onnx too --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-11-27 10:30:51 -05:00
chenyu	a58e289d77	Revert "prereqs for new block lin so PR works (#7919 )" (#7921 ) This reverts commit `c53261b541`.	2024-11-27 08:41:09 -05:00
George Hotz	c53261b541	prereqs for new block lin so PR works (#7919 )	2024-11-27 15:07:54 +08:00
Sieds Lykles	d318867776	Factoring gcd out of mod (#7916 ) * Factoring gcd out of mod Curious if this will be faster/better * Update bounds on test	2024-11-26 21:17:22 -05:00
qazal	ea57c52b99	base uop is always contiguous (#7907 ) * base is always contiguous * add test_late_fusion_post_permute_simpler * Revert "swizzle tc [pr] (#7633)" This reverts commit `f02462c5cb`. * Revert "Revert "swizzle tc [pr] (#7633)"" This reverts commit `a26b577d86`. * yay * minimal diff	2024-11-26 20:13:29 +08:00
George Hotz	4e5bf9dc7a	test assignment in jit (#7906 ) * test assignment in jit * don't waste lines * skip broken test in webgpu	2024-11-26 17:37:00 +08:00
Ahmed Harmouche	10618aba98	Bring back WebGPU (#7063 ) * Start from andredaprato:webgpu-clean * Fix infs * inf wgsl function is not needed * Emulated ulong for threefry, more tests passing * Randomness tests passing * Update model export to support new changes in webgpu, efficientnet export works again * Simplify shift emulation in wgsl * Delete test file * Fix bigger than u32 u32 literal * Why was skip copies added here? * Python3.12 for webgpu tests * Fix model export syntax error * Get test ops passing with some skips * Fix lint * Much simpler shift * Run more tests * Timestamp queries are not supported in CI, so skip search tests * All fancy indexing passing * r is ctx * Run more dtype tests by using is_dtype_supported * Cleanup ulong shift rendering * UPat -> Pat, UOps -> Ops * Pat -> UPat * Refactor render_ushift if-else * Pattern to avoid ulong mul * Remove vals_dtype * is_nan trick + rewrite, test_isnan passing * Rewrite a * select(1, nan, gate) -> select(a, nan, gate) * No arg, just op * Support char, uchar, short, ushort * Run test_index_mnis now that we have uint8 * Fix pyling * Save 3 lines by using base Compiler * No more long emulation * Remove fixup_binops * No more external_local_bufx wgsl specific cstyle modif, use base extra_pm * Simpler, faster copyin/out * Skip some new tests that use long * Fix typo * copyout touchup * Save lines by using render_cast * WebGL is not supported in core, delete it from is_dtype_supported * More narrow test skips for some unary tests * TernaryOps, UnaryOps -> Ops * TinyGrad supports WebGPU * StableDiffusion demo: f16tof32 gpu is a lib, update UI * Packed load/store, no more scale_size, no core tinygrad changes * Rename copyin, copyout * Device -> dev * Fix lint * Pattern matcher rule for packed load/store * Refactor * Shorter packed load/store * this should fix lint * Fix mypy * SD compile script working * New SD webgpu UI * New default prompt * New SD weights * Fix title when webgpu not available * Run symbolic tests, simplify is_nan, use round_up * Show step time on UI * Bump minimum wgpu version to v0.19 * Fix latent --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-11-26 12:26:40 +08:00
chenyu	ff3f2a9c1a	Revert "move attention upcast (#7830 )" (#7903 ) This reverts commit `c07daf40e7`.	2024-11-25 18:59:51 -05:00
chenyu	a49ca0c2ff	clean up fully_flatten [pr] (#7885 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-11-25 06:53:18 -05:00
qazal	e823de3828	viz with bottom_up=True (#7894 ) * add failing test * single pass it * linter	2024-11-25 17:56:48 +08:00
George Hotz	9d0038bccb	small changes from block linearizer [pr] (#7888 ) * small changes from block linearizer [pr] * fix test_gc	2024-11-25 15:27:04 +08:00
Sieds Lykles	a49a7c4784	Improved mod folding (#7887 ) * Remove uneccessary if statement In all paths where something_changed was set to True, remainder is appended so the list can't be empty * Working version of improved mod folding * Fix offset calculation Passing fuzz_symbolic.py to 130_000 so far Added an extra test * Cleaner offset calculation	2024-11-24 22:21:34 -05:00

4618 Commits