tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-21 04:47:56 -05:00

Author	SHA1	Message	Date
chenyu	8dfa0024f0	raise in scatter if self and src have different dtype [pr] (#9109 ) raise RuntimeError that matches torch instead of an implicit cast	2025-02-15 11:21:34 -05:00
Marcello Fuschi	8824f7e9df	Make logcumsumexp numerically stable (#9050 ) * Make logcumsumexp numerically stable * Refactor * Refactor for special case ndim=0 * Refactor * Use the correct device for mask --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-14 19:25:17 -05:00
chenyu	73af42aeab	fix pow backward when base is 0 (#9075 )	2025-02-13 21:06:01 -05:00
chenyu	5ef48bbe0a	swap order in rsqrt (#9069 ) fixed backward for 0	2025-02-13 16:51:21 -05:00
chenyu	e02e3b94c3	remove SQRT hack in llvm (#9067 ) replaced with xpow 0.5 in transcendental. fixed sqrt(0) backward	2025-02-13 15:42:34 -05:00
chenyu	2573d0621a	Tensor.scatter_reduce touchup [pr] (#9060 )	2025-02-13 10:01:14 -05:00
Josh Moore	1f9d2442b9	Add `Tensor.scatter_reduce` (#8947 ) * pytorch scatter -> scatter_reduce * WIP scatter_reduce implementation * _pre_scatter return type hint * split out src, mask to satisfy linter * Add src cast back in * dict of lambdas instead of ifs * sum and prod reduction ops with include_self * add reduce arg error message * add amax and amin reduction ops * Fix include_self for higher dims * Simplify * Simplify amax and amin too * Pull include_self logic out into _inv_mask function * reduce arg cannot be None for scatter_reduce * Fix self-mask issue * Add mean reduce op * Add tests * any() not needed here * remove comment * End support for Tensor src with reduce arg in tinygrad scatter * Process index, dim inside actual functions * Add scatter_reduce to onnx * Add excluded onnx ScatterElements reduction tests back in * Save 2 lines on the mask helpers * Update docs * Add include_self=False tests * cleanup * Remove unneeded helper function --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-13 09:08:54 -05:00
Josh Moore	0c97c10814	TestOps: silence pytorch std()/var() degrees of freedom warnings (#9034 )	2025-02-12 14:49:18 +08:00
chenyu	2845f8797a	failed test cases for rsqrt at 0 and similar ones (#9035 ) * failed test cases for rsqrt at 0 and similar ones related to 0*inf * this failed	2025-02-11 17:50:16 -05:00
chenyu	586e48d696	a few more backward tests now pass (#9010 )	2025-02-10 12:46:21 -05:00
chenyu	25fa5e4d5f	enable backward tests in test_std_one_in_axis [pr] (#9007 ) still one correction=0 case is broken Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-02-10 10:44:05 -05:00
Ahmed Harmouche	133cacadde	Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646 ) * Switch to dawn, all tests passing locally * Use dawn-python * Skip failing test * Skip midcast and fix timestamp on metal ci * Autogen webgpu * Try fetch dawn lib again * /usr/lib * Without lib prefix * Test autogen diff * Delete webgpu support, move everything to ops_webgpu * mypy fix * Simplify, refactor * Line savings * No ResultContainer * Type annotation for result * Some more simplifications * Why was this explicit sync used at all? * Refactor: delete functions that are only used once * Create shader module inline * Clear unit tests cache, maybe that solves it * That wasn't it * Try deleting cache to pass failing weight compare * weights_only=False for pytorch 2.6 * Simplify ctype array creation * Remove nanosecond precision timestamps * Simplify error handling * Refactor, add back type annotations * Deleted custom submit function, refactor * read_buffer simplify * Fix use after free, refactor * Simplify supported_features * Runtime docs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-07 15:16:59 +08:00
Bhavya Gada	3b67712892	[bounty] Fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple (#8937 ) * fix LLVM=1 NO_DEVECTORIZE=1 python3 test/test_ops.py TestOps.test_strided_conv2d_simple * remove expectedFailure --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-07 10:07:54 +08:00
George Hotz	f54242849d	failing test for the devectorize [pr] (#8940 ) * failing test for the devectorize [pr] * add DEVECTORIZE to method_cache	2025-02-07 09:44:54 +08:00
chenyu	189bfa164e	enable backward test for pow(neg const ** x) (#8912 ) backward works now. 0**x still does not work because it's a special case fixed in transcendental	2025-02-05 15:35:21 -05:00
eliotgolding	bb5ded85cc	Don't rewrite idiv to rshift when numerator is negative (#8885 ) * more conditions for shift rewrite mul/idiv * make ptx test uint so the new condition is true * delete idiv test * rewrite to 0 is wrong for idiv, as denominator is cast to 0 before division * mul/div by 2**(large count) is unsupported anyway	2025-02-05 07:47:33 +08:00
chenyu	73ee2d74c0	raise RuntimeError for int base pow (#8852 ) current implementation is not precise and blocking other simplification change	2025-02-01 12:11:57 -05:00
Sieds Lykles	78c0455c7a	Better stable sigmoid (#8806 ) Uses `1/(x*x) -> 1/x * 1/x` together with `x/(1+x) -> 1-1/(1+x)` to rewrite sigmoid instead of `x/((x+1)(x+1)) -> 1/(x+1)*(1-1/(x+1))` Co-authored-by: chenyu <chenyu@fastmail.com>	2025-01-29 16:08:53 -05:00
George Hotz	b4bf6a7dea	switch backward to use gradient [pr] (#8235 ) * switch backward to use gradient [pr] * set device correctly, dedup * why does that fail? * add noop cast * simple backward * fix beautiful_mnist * touchups * set in compute_gradient * uop_count * uop_count was wrong * collections * no note * skip that test * update sched kernel counts * train mnist is 65 * fix metadata and gc * fixes * materialize_grads * no pathlib stuff * add contiguous_backward, fix bugs * add some realize * fix multi	2025-01-26 09:12:16 +09:00
chenyu	2d0842386d	fix parse_valid for float uop (#8681 ) x < c -> x <= c-1 only works for int	2025-01-19 18:15:49 -05:00
chenyu	5842ee56c6	raise if attn_mask is set when is_causal=True in sdpa [pr] (#8675 ) matches torch, also fixed incorrect usage in tests	2025-01-19 12:55:04 -05:00
geohotstan	9229867fec	Support asymmetrical pads for all pooling functions (#8109 ) * implemented in tensor * apply onnx tests to asymmetrical pads * better onnx op ordering * correct ceil_mode asymmetrical * fix onnx_ops comments * a few more TODOs and fix some stupidity * fix some typing * fix test * mypy still a little messed up * refactor out pad struct transformation * add simple docs for now * add whatever tests possible * add tests for _resolve_pool_pads * better err msg * whoops didn't mean to include this * retry CI * enable asymmetric pads onnx tests * better docs --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-01-05 16:01:08 -05:00
geohotstan	3dfc8e1706	Share a _resolve_pool_pads function for pool ops in Tensor (#8485 ) * _padding2d -> _resolve_pool_pads * rephrase err msg * even better error msg * check asymmetric first so people don't hit error twice * test against torch	2025-01-03 23:54:11 -05:00
chenyu	f3fdec940d	Tensor.mod (#8458 ) it's a python style mod. possibly can be cleaner with a floor div. relaxed the vmin for MOD slightly for cstyle negatives mod, it's more correct and might fix other bugs	2024-12-31 11:31:42 -05:00
chenyu	de3705168e	update idiv doc and test cases (#8398 ) test more cases when either numerator or denominator is negative and has remainder or not	2024-12-24 17:03:18 -05:00
chenyu	2c93f27652	remove explicit np.array and np.int32 in test_div_int [pr] (#8395 ) vals default loads as int32 now in test_ops	2024-12-24 13:09:30 -05:00
geohotstan	78cb47dfc5	docs and tests clean ups (#8383 )	2024-12-23 11:12:13 -05:00
chenyu	a556adf028	add test for Tensor silu and swish (#8381 ) was only tested in onnx, added to test_ops for completeness	2024-12-22 21:08:59 -05:00
geohotstan	423d823c50	add GatherND and ScatterND to onnx ops (#8241 ) * implemented * this implementation is now correct * this is fine I guess * better variable names * finally correct gathernd * add a note * eh just leave it at this for now * teeny adjustment	2024-12-19 00:35:04 -05:00
chenyu	4c1733440d	failed test case for stable sigmoid (#8245 ) it should also work if implemented differently	2024-12-14 15:19:41 -05:00
chenyu	3eb952f537	fix some sigmoid extreme (#8238 ) * fix some sigmoid extreme quite brittle... the problem is it has 3 terms and mul might have bad order * test_tanh_extreme * just sigmoid gradient	2024-12-14 14:37:06 -05:00
George Hotz	e2f87ecf36	start work on new gradient (#7838 ) * start work on new gradient * more correct * working tests * more tests * work * add (failing) gradient test * add view and reduce gradient * test_add works, many failing test_ops * add max and reduce max * add max and reduce max * 129 failing * 108 failed * better view drawing * 101 failed * i got 99 failures * 94 failures * it's tons of terrible code, but only 50 tests fail * only 19 failures * same 19 but shorter * minimal doesn't matter * shorter * lil simpler * simpler * simpler * simpler * 13 test failures * nine tests fail * all ops tests pass * add contiguous gradient + fix sched tests * faster by removing toposort calls * missed one * add jax to testing	2024-12-13 16:45:53 -08:00
chenyu	c4be1529cf	update test for Tensor.softplus (#8150 ) test beta and extreme inputs. to pass big input, it needs to support `threshold`, which needs fix on backward that we punt until new gradient api	2024-12-10 17:48:02 -05:00
chenyu	286fec115e	fix Tensor.minimum for int (#8145 ) use invert instead of just neg. consolidate min, argmin, and minimum also update maximum to not apply the mid point for int	2024-12-10 13:34:41 -05:00
chenyu	917deb88a4	make //0 return 0 in python_alu (#8131 ) on master it raises because it cannot truncate inf to int, which crashes valid expression like `(t > 0).where(1//t, t)`.	2024-12-09 19:32:06 -05:00
chenyu	358287959b	fix pow of int to negative const int (#8129 ) it should return in int	2024-12-09 17:20:18 -05:00
chenyu	12f7d284e0	failed test case for int pow (#8128 ) also updated test_ops so that non-float compares with `assert_equal`. removed `test_multinomial` which is tested better in test_randomness	2024-12-09 16:15:09 -05:00
qazal	80de06c8b9	scheduler ops_folding from delete_lazy (#8124 ) * scheduler diff from delete_lazy * test_std_mean * late fold copy of CONST * clang const is fine	2024-12-10 00:36:01 +08:00
chenyu	ccf54c2375	fix argmax/min on int32 min (#8118 )	2024-12-09 02:29:23 -05:00
chenyu	c814de2dd4	fix bitwise_not for signed int (#8117 ) -1 is correct because 2**32-1 is not within int32 range, so in some case clang casts the whole thing into uint32	2024-12-09 02:02:51 -05:00
qazal	69e48da961	set NOOPT in test_avg_pool3d_failure (#8112 ) * set NOOPT=0 in test_avg_pool3d_failure * noopt should still pass	2024-12-08 10:48:29 -05:00
geohotstan	f8294b3bda	add avg pool 3d failure test (#8105 ) * add test * try simplify test case * add TODO comment	2024-12-07 16:34:38 -05:00
chenyu	2d321646b8	default tensors to int32 in test_ops (#8097 ) torch defaults to int64 but we care more about int32 anyway. remove skipped tests due to int64 not supported	2024-12-06 20:33:36 -05:00
chenyu	d000c08f04	fix return type of Tensor.pow (#8091 ) int to power of int should return int etc, it hints that we would like to have Ops.POW	2024-12-06 13:38:29 -05:00
geohotstan	0b7c44677d	Fix uint8 cast underflow (#6305 ) * hacky fix for cast * only float to uint8 * limit to float -> uint8 * touchup alu cast test * improve tests and support more float to unsigned casts * del one repeated test * del 1 more repeated test * try removing expected failure test * hmmm try 1 more * skip tests for flakiness * uint64 super flaky * clean up * grammar * just match numpy * why is CI numpy different from local numpy * increase verbosity * try * try2 * try3 * try4 * yeah idk * new direction * try again * just don't support uint32 and uint64 * done? * oops * comment * documentation * it is what it is --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-12-06 10:25:03 -05:00
geohotstan	a684d72e55	add ceil_mode for avg_pool and max_pool (#7579 ) * wip pool * check CI for remove alternative implementation * Revert "check CI for remove alternative implementation" This reverts commit `7b1bb900e5`. * fix test * tests tests tests * slap a resolve on it * fix comment * a little simpler pool * check CI for removal again * Revert "check CI for removal again" This reverts commit `be798b7857`. * small * update * some ez tests * english * clean up code * fix ruff * how did I +25 lines? * small clean ups * moar clean ups * try test_avgpool2d_failure2 in CI * final clean up * exclude bug fix * avg underscore pool * no more edge case stuff * add better comments for explanation * add test cases for decreasing end padding * address feedback * improve test coverage * tiny more polish as we wait for lines :D * more readable code ordering * add to documentation * oops * set to False instead --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-12-06 08:34:14 -05:00
Ahmed Harmouche	13eedd373b	Run WebGPU tests on ubuntu (#8033 )	2024-12-04 12:42:04 +01:00
Ahmed Harmouche	db330a3110	Remove WebGL (#8012 )	2024-12-03 16:02:53 +01:00
geohotstan	0a2e10be1d	add SELU to Tensor (#7993 ) * add selu * more clean ups	2024-12-02 10:04:01 -05:00
geohotstan	765096fe7d	fix Tensor._pool edge case (#7581 ) * split into another branch * polish * try this * Revert "try this" This reverts commit `84f711b13e`. * try * Revert "try" This reverts commit `89c7a7649b`. * idk anymore * it is what it is --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-11-28 23:17:13 -05:00


623 Commits
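
Note on commit bb5ded85cc ("Don't rewrite idiv to rshift when numerator is negative"): the reason such a rewrite is unsafe can be sketched in plain Python. This assumes the integer division in question truncates toward zero, as in C; the helper names `trunc_idiv` and `shift_idiv` are hypothetical and are not tinygrad functions.

```python
def trunc_idiv(a: int, b: int) -> int:
    """Integer division that truncates toward zero (C-style). b must be nonzero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def shift_idiv(a: int, k: int) -> int:
    """Candidate rewrite of a / 2**k as an arithmetic right shift (rounds toward -inf)."""
    return a >> k


# Compare both forms for division by 2 (a shift by 1).
for a in (7, -7, 8, -8):
    print(a, trunc_idiv(a, 2), shift_idiv(a, 1))
```

The two forms agree for non-negative numerators and for negative numerators that divide evenly, but differ when a negative numerator leaves a remainder, because the shift rounds toward negative infinity while truncating division rounds toward zero.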