* why does max_unpool2d feel slower than out.gradient ...
* slightly cleaner
* what happened to ruff
* need to think about this some more
* slightly faster now?
* clean up, 1 more failing edge case
* ok good
* working TINY_BACKEND
* nit doc wording
* retry CI
* wow argmax is so good
* 1 less line
* clean up and better variable names
* is this torch thing right...?
* add more tests
* slap a TODO on it
* clean ups
* prettier looking code and fix ceil mode test
* add return types and some docs
* ok that was a bad example since indices == value, just no example
* poc
* repeated values fail, sigh
* is this being timed out?
* fix up/down names
* bitonic v2, does this run?
* bitonic v3, faster
* bitonic v3.1, faster
* bitonic v3.1.1, same speed unlucky
* support dim and indices
* bitonic v3.2, simpler code, TODO repeated indices
* bruv gimme green for once cmon
* cat (stack) implementation, slow but maybe one day when cat is fast meow
* revert to v3.2
* bitonic v4, who let the cats out edition
* clean up variable names
* figured out repeated indices :D
* ruff check --fix
* use sort for topk
* add Tensor.sort everywhere
* fix docs and add some types
* slightly better variable names
* am I doing torch inplace correctly?
* delegate sort to values_stable
* add a contig, faster first sort
* maybe don't test_inplace
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
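The topk/sort commits above land a bitonic-sort based `Tensor.sort` and route `topk` through it. A minimal usage sketch, assuming the final signatures mirror PyTorch's (`sort(dim, descending)` and `topk(k, dim, largest)`, both returning `(values, indices)`):

```python
from tinygrad import Tensor

t = Tensor([[3., 1., 2.], [6., 5., 4.]])

# bitonic-sort backed sort along a dim, returning values and indices
values, indices = t.sort(dim=-1, descending=True)
print(values.tolist())   # [[3.0, 2.0, 1.0], [6.0, 5.0, 4.0]]
print(indices.tolist())  # [[0, 2, 1], [0, 1, 2]]

# "use sort for topk": sort first, then take the first k entries
top_vals, top_idx = t.topk(2, dim=-1, largest=True)
print(top_vals.tolist())  # [[3.0, 2.0], [6.0, 5.0]]
```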
* terrible but somewhat working impl
* linux behaves differently than macos?
* slightly better impl
* small clean up; haven't figured this out yet
* better
* torch has different behavior on linux and macos for duplicated values
* add sum docs
* fix test
* add torch return_type test
* add an exception test
* wrap_fxn instead, and move op lower in order
* better repeated values test
* rerun ci
* different way to write torch backend
* both backends
* more work
* simpler code
* more work
* test both
* imply unwrap/wrap
* FORWARD_ONLY=1 TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_add works
* ready to start making test_ops work in torch backend
* backward pass, TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_add works
* FORWARD_ONLY=1 TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_simple_conv2d works
* matmul backward is broken with as_strided
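TINY_BACKEND routes the PyTorch test suite through tinygrad; the "imply unwrap/wrap" commits describe a dispatch layer that unwraps incoming wrapper tensors, runs the tinygrad op, and wraps the result. A conceptual sketch of that pattern only (not the repo's actual backend code; `Wrapped` and `dispatch` are made-up names):

```python
# Conceptual sketch of the wrap/unwrap dispatch idea, not the real backend.
from tinygrad import Tensor

class Wrapped:  # hypothetical stand-in for the torch-side wrapper tensor
  def __init__(self, t: Tensor): self.t = t

def unwrap(x): return x.t if isinstance(x, Wrapped) else x
def wrap(x): return Wrapped(x) if isinstance(x, Tensor) else x

def dispatch(fn, *args, **kwargs):
  # every registered op goes through one helper, so the unwrap/wrap is implied
  return wrap(fn(*[unwrap(a) for a in args], **{k: unwrap(v) for k, v in kwargs.items()}))

out = dispatch(Tensor.add, Wrapped(Tensor([1., 2.])), Wrapped(Tensor([3., 4.])))
print(out.t.tolist())  # [4.0, 6.0]
```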
* add `Tensor.isclose()`
* support `equal_nan`
so as to match PyTorch's behavior
* update unit tests
* remove some tests temporarily
* re-enable one test
* re-enable other test
* try to fix failing tests during CI
* save one line of code
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
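`Tensor.isclose` follows PyTorch's definition: elementwise `|a - b| <= atol + rtol * |b|`, with `equal_nan=True` making NaNs compare equal to NaNs. A small sketch, assuming torch-style defaults (`rtol=1e-05`, `atol=1e-08`):

```python
from tinygrad import Tensor

a = Tensor([1.0, float("nan"), 2.0])
b = Tensor([1.0 + 1e-9, float("nan"), 2.1])

print(a.isclose(b).tolist())                  # [True, False, False]
print(a.isclose(b, equal_nan=True).tolist())  # [True, True, False]
```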
* Make logcumsumexp numerically stable
* Refactor
* Refactor for special case ndim=0
* Refactor
* Use the correct device for mask
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
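For reference, the standard way to make `logcumsumexp` numerically stable is to shift by the max along the reduced axis before exponentiating, then shift back after the log; a sketch of that identity (the actual implementation additionally special-cases `ndim=0` and uses a mask, per the commits):

```python
from tinygrad import Tensor

def logcumsumexp_ref(x: Tensor, axis: int = 0) -> Tensor:
  # logcumsumexp(x) = m + log(cumsum(exp(x - m))), with m = max(x) along the axis,
  # so exp() only ever sees values <= 0 and cannot overflow
  m = x.max(axis=axis, keepdim=True)
  return m + (x - m).exp().cumsum(axis=axis).log()

x = Tensor([1000.0, 1000.0, 1000.0])
print(x.exp().cumsum().log().numpy())  # naive form overflows to inf
print(logcumsumexp_ref(x).numpy())     # ~[1000.0, 1000.693, 1001.099]
print(x.logcumsumexp().numpy())        # the built-in should now agree
```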
* pytorch scatter -> scatter_reduce
* WIP scatter_reduce implementation
* _pre_scatter return type hint
* split out src, mask to satisfy linter
* Add src cast back in
* dict of lambdas instead of ifs
* sum and prod reduction ops with include_self
* add reduce arg error message
* add amax and amin reduction ops
* Fix include_self for higher dims
* Simplify
* Simplify amax and amin too
* Pull include_self logic out into _inv_mask function
* reduce arg cannot be None for scatter_reduce
* Fix self-mask issue
* Add mean reduce op
* Add tests
* any() not needed here
* remove comment
* End support for Tensor src with reduce arg in tinygrad scatter
* Process index, dim inside actual functions
* Add scatter_reduce to onnx
* Add excluded onnx ScatterElements reduction tests back in
* Save 2 lines on the mask helpers
* Update docs
* Add include_self=False tests
* cleanup
* Remove unneeded helper function
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
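`Tensor.scatter_reduce` follows `torch.Tensor.scatter_reduce_`: values from `src` are combined into `self` at positions given by `index` along `dim`, with `reduce` in `{"sum", "prod", "mean", "amax", "amin"}` and an `include_self` flag. A usage sketch, assuming the torch-matching signature:

```python
from tinygrad import Tensor

self_t = Tensor([1.0, 2.0, 3.0, 4.0])
src    = Tensor([10.0, 20.0, 30.0])
index  = Tensor([0, 0, 2])

# include_self=True (default): existing values take part in the reduction
print(self_t.scatter_reduce(0, index, src, reduce="sum").tolist())
# [31.0, 2.0, 33.0, 4.0]   (1+10+20, untouched, 3+30, untouched)

# include_self=False: only src values are reduced at scattered positions
print(self_t.scatter_reduce(0, index, src, reduce="amax", include_self=False).tolist())
# [20.0, 2.0, 30.0, 4.0]
```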
* Switch to dawn, all tests passing locally
* Use dawn-python
* Skip failing test
* Skip midcast and fix timestamp on metal ci
* Autogen webgpu
* Try fetch dawn lib again
* /usr/lib
* Without lib prefix
* Test autogen diff
* Delete webgpu support, move everything to ops_webgpu
* mypy fix
* Simplify, refactor
* Line savings
* No ResultContainer
* Type annotation for result
* Some more simplifications
* Why was this explicit sync used at all?
* Refactor: delete functions that are only used once
* Create shader module inline
* Clear unit tests cache, maybe that solves it
* That wasn't it
* Try deleting cache to pass failing weight compare
* weights_only=False for pytorch 2.6
* Simplify ctype array creation
* Remove nanosecond precision timestamps
* Simplify error handling
* Refactor, add back type annotations
* Deleted custom submit function, refactor
* read_buffer simplify
* Fix use after free, refactor
* Simplify supported_features
* Runtime docs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
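From the user side, the dawn-backed runtime is exercised by selecting the WEBGPU device (e.g. `WEBGPU=1` in the environment, or explicitly):

```python
from tinygrad import Tensor

# runs through ops_webgpu, now backed by dawn / dawn-python
t = Tensor([1.0, 2.0, 3.0], device="WEBGPU")
print((t * 2).tolist())  # [2.0, 4.0, 6.0]
```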
* more conditions for shift rewrite mul/idiv
* make ptx test uint so the new condition is true
* delete idiv test
* rewrite to 0 is wrong for idiv, as denominator is cast to 0 before division
* mul/div by 2**(large count) is unsupported anyway
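The rewrite being gated is `x * 2**k -> x << k` and `x // 2**k -> x >> k`. The identity holds for unsigned (non-negative) values, but truncating integer division of negatives does not match an arithmetic right shift, and shifting by a count at or beyond the bit width is unsupported anyway, hence the extra conditions. A quick illustration:

```python
# holds for non-negative integers
for x in [0, 1, 7, 12345]:
  for k in [1, 3, 10]:
    assert x * 2**k == x << k
    assert x // 2**k == x >> k

# C-style (truncating) division of a negative number disagrees with a shift:
# -7 / 4 truncates to -1, while -7 >> 2 floors to -2
print(int(-7 / 4), -7 >> 2)  # -1 -2
```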
* implemented in tensor
* apply onnx tests to asymmetrical pads
* better onnx op ordering
* correct ceil_mode asymmetrical
* fix onnx_ops comments
* a few more TODOs and fix some stupidity
* fix some typing
* fix test
* mypy still a little messed up
* refactor out pad struct transformation
* add simple docs for now
* add whatever tests possible
* add tests for _resolve_pool_pads
* better err msg
* whoops didn't mean to include this
* retry CI
* enable asymmetric pads onnx tests
* better docs
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
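With `_resolve_pool_pads`, pooling ops accept asymmetric padding directly at the tensor level. A usage sketch, assuming the per-side tuple follows torch-style `F.pad` ordering (left, right, top, bottom):

```python
from tinygrad import Tensor

x = Tensor.arange(16).reshape(1, 1, 4, 4).float()

# symmetric padding: one value per spatial dim
print(x.max_pool2d(kernel_size=2, padding=1).shape)  # (1, 1, 3, 3)

# asymmetric padding: a 2*dims tuple gives per-side pads
# (assumed ordering: left, right, top, bottom)
print(x.max_pool2d(kernel_size=2, padding=(0, 1, 0, 1)).shape)  # (1, 1, 2, 2)
```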
* _padding2d -> _resolve_pool_pads
* rephrase err msg
* even better error msg
* check asymmetric first so people don't hit error twice
* test against torch