Commit Graph

4433 Commits

Sieds Lykles
864758423e Don't take const in gcd and change the "nothing_changed" condition (#7926)
* Don't take const in gcd and change the "nothing_changed" condition

The biggest difference is probably that I forgot to check whether the gcd changed when nothing else changed.
The TODO was fixed by not using the const in the gcd, and then taking it out.

* Fix more tests
2024-11-27 18:07:36 -05:00
chenyu
988d64900b add TODO case to test_mod_congruence (#7925)
same alu count but better bounds
2024-11-27 15:23:21 -05:00
geohotstan
cea5853cfa add Tensor.scatter (#7737)
* working I think

* where are my onnx scatter tests??

* forward_only for now

* try if nan hack fix NV

* looks like issue is different... CUDA WHY

* oops that was wrong. Try if this fixes CUDA

* simpler multiply

* actually finish this up tmrw morning :x

* fix tests?

* improve tests

* improve test and implementation

* fix ruff

* complete but lots of expected failure...

* reviewed tests

* add onnx tests

* is this a processing op?

* add return type to indicate that it's not in-place

* final cleanups

* use or and improve tests a little

* add masked_index_select

* call it masked_setitem instead

* try

* FIXED

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-27 10:52:04 -05:00
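
A minimal usage sketch of the new Tensor.scatter. The semantics are assumed to mirror torch.Tensor.scatter; the out-of-place return is what the "add return type to indicate that it's not in-place" bullet points at.

    from tinygrad import Tensor

    # for dim=0 the torch-style rule is: out[index[i][j]][j] = src[i][j]
    t = Tensor.zeros(3, 3)
    index = Tensor([[0, 1, 2]])
    src = Tensor([[1.0, 2.0, 3.0]])
    out = t.scatter(0, index, src)  # returns a new tensor, t is untouched
    print(out.numpy())
    # [[1. 0. 0.]
    #  [0. 2. 0.]
    #  [0. 0. 3.]]
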
geohotstan
753f07e193 add circular pad mode to Tensor.pad (#7918)
* start

* send it

* no more neg circular pads

* quick fix onnx too

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-27 10:30:51 -05:00
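
A quick sketch of the new mode, assuming the keyword follows torch.nn.functional.pad's "circular" semantics; per the "no more neg circular pads" bullet, negative pad amounts are not supported with this mode.

    from tinygrad import Tensor

    x = Tensor([1, 2, 3])
    # circular padding wraps values around instead of filling with a constant
    print(x.pad((1, 1), mode="circular").numpy())  # [3 1 2 3 1]
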
chenyu
a58e289d77 Revert "prereqs for new block lin so PR works (#7919)" (#7921)
This reverts commit c53261b541.
2024-11-27 08:41:09 -05:00
George Hotz
c53261b541 prereqs for new block lin so PR works (#7919) 2024-11-27 15:07:54 +08:00
Sieds Lykles
d318867776 Factoring gcd out of mod (#7916)
* Factoring gcd out of mod

Curious if this will be faster/better

* Update bounds on test
2024-11-26 21:17:22 -05:00
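
The rewrite being described: when every term of a sum shares a gcd with the modulus, the gcd can be pulled out before folding the mod. A numeric spot-check of that identity (illustrative only, not the symbolic-rewrite code):

    # gcd(6, 9, 12) == 3, so (6*x + 9) % 12 folds to 3 * ((2*x + 3) % 4)
    for x in range(1000):
        assert (6 * x + 9) % 12 == 3 * ((2 * x + 3) % 4)
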
qazal
ea57c52b99 base uop is always contiguous (#7907)
* base is always contiguous

* add test_late_fusion_post_permute_simpler

* Revert "swizzle tc [pr] (#7633)"

This reverts commit f02462c5cb.

* Revert "Revert "swizzle tc [pr] (#7633)""

This reverts commit a26b577d86.

* yay

* minimal diff
2024-11-26 20:13:29 +08:00
George Hotz
4e5bf9dc7a test assignment in jit (#7906)
* test assignment in jit

* don't waste lines

* skip broken test in webgpu
2024-11-26 17:37:00 +08:00
Ahmed Harmouche
10618aba98 Bring back WebGPU (#7063)
* Start from andredaprato:webgpu-clean

* Fix infs

* inf wgsl function is not needed

* Emulated ulong for threefry, more tests passing

* Randomness tests passing

* Update model export to support new changes in webgpu, efficientnet export works again

* Simplify shift emulation in wgsl

* Delete test file

* Fix bigger-than-u32 u32 literal

* Why was skip copies added here?

* Python 3.12 for webgpu tests

* Fix model export syntax error

* Get test ops passing with some skips

* Fix lint

* Much simpler shift

* Run more tests

* Timestamp queries are not supported in CI, so skip search tests

* All fancy indexing passing

* r is ctx

* Run more dtype tests by using is_dtype_supported

* Cleanup ulong shift rendering

* UPat -> Pat, UOps -> Ops

* Pat -> UPat

* Refactor render_ushift if-else

* Pattern to avoid ulong mul

* Remove vals_dtype

* is_nan trick + rewrite, test_isnan passing

* Rewrite a * select(1, nan, gate) -> select(a, nan, gate)

* No arg, just op

* Support char, uchar, short, ushort

* Run test_index_mnist now that we have uint8

* Fix pylint

* Save 3 lines by using base Compiler

* No more long emulation

* Remove fixup_binops

* No more external_local_bufx wgsl specific cstyle modif, use base extra_pm

* Simpler, faster copyin/out

* Skip some new tests that use long

* Fix typo

* copyout touchup

* Save lines by using render_cast

* WebGL is not supported in core, delete it from is_dtype_supported

* More narrow test skips for some unary tests

* TernaryOps, UnaryOps -> Ops

* TinyGrad supports WebGPU

* StableDiffusion demo: f16tof32 gpu is a lib, update UI

* Packed load/store, no more scale_size, no core tinygrad changes

* Rename copyin, copyout

* Device -> dev

* Fix lint

* Pattern matcher rule for packed load/store

* Refactor

* Shorter packed load/store

* this should fix lint

* Fix mypy

* SD compile script working

* New SD webgpu UI

* New default prompt

* New SD weights

* Fix title when webgpu not available

* Run symbolic tests, simplify is_nan, use round_up

* Show step time on UI

* Bump minimum wgpu version to v0.19

* Fix latent

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-11-26 12:26:40 +08:00
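
The "packed load/store" bullets are about sub-32-bit dtypes (char, uchar, short, ushort) on a backend whose storage buffers are word-addressed: base WGSL has no 8-bit storage type, so four uint8 values share one u32 word. A Python model of the shift-and-mask arithmetic (the in-tree version is WGSL generated by pattern-matcher rules, and a real GPU store also has to avoid write races within a word):

    def load_u8(words: list[int], idx: int) -> int:
        # pick the byte out of the u32 word that holds element idx
        return (words[idx // 4] >> (8 * (idx % 4))) & 0xFF

    def store_u8(words: list[int], idx: int, val: int) -> None:
        shift = 8 * (idx % 4)
        # clear the target byte, then OR in the new value
        words[idx // 4] = (words[idx // 4] & ~(0xFF << shift)) | ((val & 0xFF) << shift)

    words = [0]
    store_u8(words, 2, 0xAB)
    assert load_u8(words, 2) == 0xAB
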
chenyu
ff3f2a9c1a Revert "move attention upcast (#7830)" (#7903)
This reverts commit c07daf40e7.
2024-11-25 18:59:51 -05:00
chenyu
a49ca0c2ff clean up fully_flatten [pr] (#7885)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-11-25 06:53:18 -05:00
qazal
e823de3828 viz with bottom_up=True (#7894)
* add failing test

* single pass it

* linter
2024-11-25 17:56:48 +08:00
George Hotz
9d0038bccb small changes from block linearizer [pr] (#7888)
* small changes from block linearizer [pr]

* fix test_gc
2024-11-25 15:27:04 +08:00
Sieds Lykles
a49a7c4784 Improved mod folding (#7887)
* Remove unnecessary if statement

In all paths where something_changed was set to True, remainder is
appended, so the list can't be empty

* Working version of improved mod folding

* Fix offset calculation

Passing fuzz_symbolic.py to 130_000 so far
Added an extra test

* Cleaner offset calculation
2024-11-24 22:21:34 -05:00
leopf
5d92efb121 [BUGFIX] Tensor([]).data() (#7884)
* added test, fix

* fix only for (0,) shape

* Revert "fix only for (0,) shape"

* test_data_empty_multi_dim
2024-11-24 16:42:57 -05:00
geohotstan
ad9df26fba add test for inconsistent behavior in float to int casting (#7870)
* found teeny bug

* no healthcheck

* change function name
2024-11-24 07:31:34 -05:00
qazal
5aee78a0a6 fix uop swizzle on BUFFER, new tests (#7875)
* fix uop swizzle on BUFFER, new tests

* can have view of view
2024-11-24 17:11:09 +08:00
George Hotz
8c3d3181dd bottom up rewrite fixes substitute [pr] (#7862)
* single pass rewrite fixes substitute [pr]

* caching for single_pass_rewrite

* allow multiple rewrites

* a simple test

* bottom_up_rewrite is fully flexible
2024-11-23 20:53:37 +08:00
George Hotz
144e9f00df viz is local, new test, and new quantize [pr] (#7859)
* viz is local, new test, and new quantize [pr]

* fix mime types

* remove font

* after index
2024-11-23 14:27:10 +08:00
chenyu
c07daf40e7 move attention upcast (#7830)
still upcasts before softmax, but faster because the intermediate buffer can be stored in half (as long as qk is within half range).
2024-11-22 17:10:51 -05:00
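
A sketch of the layout this describes (shapes illustrative, not the actual attention code): the large qk intermediate stays in half, and the cast to float happens only around the softmax.

    from tinygrad import Tensor, dtypes

    q, k, v = [Tensor.randn(1, 8, 64, 32, dtype=dtypes.half) for _ in range(3)]
    qk = q @ k.transpose(-2, -1) * (1.0 / 32**0.5)  # big intermediate, stored in half
    attn = qk.cast(dtypes.float).softmax(-1).cast(dtypes.half)  # upcast only for softmax
    print((attn @ v).shape)  # (1, 8, 64, 32)
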
chenyu
5c5b1b994c less flaky benchmarks (#7855)
JIT=2 for metal cifar with HALF, and lower tflops for nv test_gemm_4096. failures in https://github.com/tinygrad/tinygrad/actions/runs/11980239535/job/33404098428?pr=7830
2024-11-22 16:39:39 -05:00
chenyu
3b26e51fce Tensor.cummax (#7854)
generalizes the existing cumsum to take Ops.MAX in addition to Ops.ADD
2024-11-22 15:55:02 -05:00
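
Usage sketch, assuming cummax takes an axis argument like the cumsum it generalizes:

    from tinygrad import Tensor

    t = Tensor([1, 3, 2, 5, 4])
    print(t.cumsum(0).numpy())  # [ 1  4  6 11 15]
    print(t.cummax(0).numpy())  # running maximum: [1 3 3 5 5]
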
chenyu
40d7535eeb clean up DTYPES_DICT [pr] (#7845) 2024-11-22 10:01:34 -05:00
qazal
9828277c03 view doesn't have buffer, fix the tests [pr] (#7841)
* view doesn't have buffer, fix the tests [pr]

* need assigns
2024-11-22 20:41:55 +08:00
chenyu
69e382216d fix wino conv output dtype for half inputs (#7829) 2024-11-21 12:13:54 -05:00
geohotstan
cf1ec90ad4 add inverse trig functions to Tensor (#7805)
* implement inverse trig functions

* guess we should still test nans?

* magnitude as variable name :D

* reorder onnx_ops ops

* approximation -> x for consistency

* address feedback

* simpler acos

* improvement?

* actually just have asin depend on atan

* actually this is nicer

* remove a comment

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-21 09:13:36 -05:00
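
The "actually just have asin depend on atan" bullet points at the standard identity asin(x) = atan(x / sqrt(1 - x^2)), with acos(x) = pi/2 - asin(x) on top of it. A spot-check of the identity, plus the new Tensor methods:

    import math
    from tinygrad import Tensor

    # the identity asin is derived from, checked in plain Python
    assert abs(math.atan(0.5 / math.sqrt(1 - 0.5**2)) - math.asin(0.5)) < 1e-12

    x = Tensor([-0.9, 0.0, 0.5])
    print(x.asin().numpy(), x.acos().numpy(), x.atan().numpy())
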
qazal
5399ff6d06 add UOp.const_with_shape [pr] (#7825)
* add UOp.const_with_shape [pr]

* lines
2024-11-21 21:13:23 +08:00
qazal
e378aeb94e assert view degrade to const tests post scheduler graph_rewrite [pr] (#7822)
* assert view degrade to const tests post scheduler graph_rewrite [pr]

* low pri, probably tricky, todo
2024-11-21 19:00:41 +08:00
qazal
75c082b883 move CONST/BIND -> VALID to matchers (#7818)
* delete special const

* move CONST/BIND -> VALID to matchers

* unittests

* fix FUSE_ARANGE=1

* split into two upats

* the right way to access view
2024-11-21 16:07:01 +08:00
George Hotz
e9ae2ccd09 _prg to match _buf [pr] (#7816) 2024-11-21 12:44:48 +08:00
George Hotz
c5d458ce02 BufferSpec and ProgramSpec [pr] (#7814)
* BufferSpec and ProgramSpec [pr]

* delete preallocate, it's unused

* Revert "delete preallocate, it's unused"

This reverts commit dcfcfaccde.
2024-11-21 12:18:05 +08:00
George Hotz
9df5a62c5e unify to HWQueue [pr] (#7812)
* unify to HWCommandQueue [pr]

* all is HWQueue
2024-11-21 10:33:08 +08:00
chenyu
11cea00090 lower vs_theoretical conv tflops threshold for nv (#7811)
less flaky
2024-11-20 20:03:49 -05:00
ignaciosica
fc3154a7b3 metal bf16 tc support [pr] (#7408)
* add bf16 tc for metal

* hotfix: spacing

* fix tolerance and skip metal bf16 in ci

* hotfix: check for dtype_out

* hotfix: add check for tc.dtype_out is bf16 back

* hotfix: add parens
2024-11-20 14:39:08 -05:00
geohotstan
66a069ee25 add replicate mode to Tensor.pad (#7802)
* base implementation

* add tests

* actually remove the assertionerror test

* good
2024-11-20 08:39:58 -05:00
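
A sketch of the new mode, assuming torch-style "replicate" semantics where the edge value is repeated outward:

    from tinygrad import Tensor

    x = Tensor([1, 2, 3])
    print(x.pad((2, 2), mode="replicate").numpy())  # [1 1 1 2 3 3 3]
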
George Hotz
eb0bb7dc0b final dname to device [pr] (#7806)
* final dname to device [pr]

* oops, fix nv
2024-11-20 20:20:28 +08:00
George Hotz
bc977fec53 dname -> device [pr] (#7804)
* dname -> device [pr]

* a few more

* only one left
2024-11-20 17:57:14 +08:00
ttomsa
9adeb1041c fix advanced setitem with 1 in shape (#7797)
* fix advanced setitem with 1 in shape

* linter
2024-11-19 20:04:59 -05:00
ttomsa
170ece6605 fix advanced setitem overlap with 0 (#7793)
* fix advanced setitem overlap with 0

* fix comment
2024-11-19 16:03:55 -05:00
Gaétan Lepage
159c0bf25e test_kernel_cache_in_action: fix test (#7792) 2024-11-19 13:34:56 -05:00
Eitan Turok
56017c52a0 Raise error when model architecture does not match state dict (#7772)
* init

* style

* style

* style

* fix test
2024-11-20 00:11:54 +08:00
George Hotz
d71fe7faa5 rename allocator methods to not conflict [pr] (#7788)
* rename allocator methods to not conflict [pr]

* forgot those

* transfer + offset
2024-11-20 00:10:29 +08:00
geohotstan
aeaf574a05 add failure test for setitem bug (#7786)
* add failure test

* rename

* improve tests

* improve tests and no need numpy
2024-11-19 08:54:21 -05:00
qazal
1e31b5ba6b hotfix: ctx doesn't impact process replay [pr] (#7785) 2024-11-19 20:17:01 +08:00
chenyu
26200574dc load_state_dict test cases when model and data shard differently (#7774)
current behavior is weird... when the model is sharded and the state_dict is not, load shards the state_dict and the model's shard axis does not change.
but if the model and the state_dict are sharded differently, the model's shard axis becomes the state_dict's axis after load.

it should either always use the model's shard axis or always use the state_dict's shard axis
2024-11-18 16:08:24 -05:00
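
A sketch of the ambiguous case the commit describes (device names and shapes are illustrative; this reproduces the inputs, it is not a fix):

    from tinygrad import Tensor, Device
    from tinygrad.nn.state import load_state_dict

    devices = (f"{Device.DEFAULT}:0", f"{Device.DEFAULT}:1")

    class Model:
        def __init__(self):
            self.w = Tensor.ones(4, 4).shard(devices, axis=0)  # model sharded on axis 0

    m = Model()
    sd = {"w": Tensor.zeros(4, 4).shard(devices, axis=1)}  # state_dict sharded on axis 1
    load_state_dict(m, sd)
    # per the commit: m.w ends up sharded on axis 1, the state_dict's axis
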
Francis Lata
a1c1b9547f Context manager support for tqdm (#7770)
* add context manager support

* add test case for context manager usage
2024-11-18 14:12:03 -05:00
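
Minimal usage of the new context-manager form, assuming tinygrad's tqdm clone (tinygrad.helpers.tqdm) keeps the familiar total/desc/update interface:

    from tinygrad.helpers import tqdm

    with tqdm(total=100, desc="work") as pbar:  # __exit__ finishes the bar
        for _ in range(100):
            pbar.update(1)
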
geohotstan
8100109c9d Add replicate mode to Tensor.pad (#7608)
* base implementation

* add tests

* actually remove the assertionerror test

* actually only have reflect for this pr

* change the 4 if-else one liner

* maybe use a lambda

* fix

* maybe a lil cleaner

* fix tests

* complete

* small change

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-18 10:55:38 -05:00
chenyu
66d7d5af50 fix Tensor(MultiLazyBuffer) with different dtype should fail (#7757)
similar to Tensor(LazyBuffer) as we don't cast implicitly
2024-11-17 21:05:45 -05:00
chenyu
df817297b6 fix passing acc_dtype="" to Tensor.prod should fail (#7750)
similar to sum
2024-11-17 11:38:13 -05:00