tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-09 15:08:02 -05:00

Author	SHA1	Message	Date
qazal	1839e8c9b3	place masks in INDEX for TestGatedStoreRewrite [pr] (#9512 )	2025-03-20 09:46:53 +08:00
geohotstan	8c0d0a122c	Add return_indices to max_pool (#9506 ) * wow argmax is so good * 1 less line * clean up and better variable names * is this torch thing right...? * add more tests * slap a TODO on it * clean ups * prettier looking code and fix ceil mode test * add return types and some docs * ok that was a bad example since indices == value, just no example	2025-03-19 15:25:37 -04:00
chenyu	189f62d44f	add rounding to tqdm unit scale (#9507 ) fixed `AssertionError: ' 1.00/10.0 1000it/s]' != ' 1.00/10.0 1.00kit/s]'`	2025-03-19 12:08:46 -04:00
b1tg	2c87a22cf2	fix prg size calculation when there are adjacent mapped ranges (#9498 ) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-03-19 11:55:03 +08:00
chenyu	f8976dd2eb	enable more webgpu tests (#9502 ) OSX has larger buffer number limit, and it supports fp16 now	2025-03-18 23:03:54 -04:00
qazal	ae688e4103	simple failing test for scheduling parallel reduce [pr] (#9501 ) * simple failing test for scheduling parallel reduce [pr] * atol	2025-03-19 10:52:13 +08:00
George Hotz	117b7a16ef	VALIDATE_WITH_CPU [pr] (#9488 ) * VALIDATE_WITH_CPU [pr] * fix test	2025-03-18 15:15:04 +08:00
qazal	935cd01f56	simple failing test for graph_rewrite children [pr] (#9489 ) * simple failing test for graph_rewrite children [pr] * lint * update too	2025-03-18 13:07:21 +08:00
Anish Umale	5e58f4b65b	Tiny backend test_ops fix part 3 (#9483 ) * extract straightforward things from https://github.com/tinygrad/tinygrad/pull/9302 * pass dtype and device for ones_like	2025-03-17 18:01:51 -04:00
TJ	9fcef4d009	add masked_select to tensor.py (#9468 ) * add masked_select to tensor.py * fix tests --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-03-17 16:05:36 -04:00
chenyu	4f8eac59ea	failed test case for threefry (#9469 ) * failed test case for threefry not sure if it's always like this, but increment before _threefry_random_bits is incorrect. the counts should start with random numbers generated so far. use jax to generate 20 + 20 + 10 random numbers, the first 20 + 20 matches and the last 10 are different. just moving increment after _threefry_random_bits matches the number but jit test fails * workaround * why is this different? * revert those * and that	2025-03-17 14:52:10 -04:00
geohotstan	53d6f1e1bb	Add bitonic cat sort (#9422 ) * poc * repeated values fail, sigh * is this being timed out? * fix up down names * bitonic v2, does this run? * bitonic v3, faster * bitonic v3.1, faster * bitonic v3.1.1, same speed unlucky * support dim and indices * bitonic v3.2, simpler code, TODO repeated indices * bruv gimme green for once cmon * cat (stack) implementation, slow but maybe one day when cat is fast meow * revert to v3.2 * bitonic v4, who let the cats out edition * clean up variable names * figured out repeated indices :D * ruff check --fix * use sort for topk * add Tensor.sort everywhere * fix docs and add some types * slightly better variable names * am I doing torch inplace correctly? * delegate sort to values_stable * add a contig, faster first sort * maybe don't test_inplace --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-17 12:01:23 -04:00
qazal	e03c0aacf2	more explicit DONT_PUSH_VIEWS [pr] (#9479 ) * more explicit DONT_PUSH_VIEWS [pr] * update tests to not handcode ast * lint * test_recursive_swizzle and test_simple_store_reshape	2025-03-17 20:43:21 +08:00
qazal	3b00a778ba	fix view_left for unsafe pad ops [pr] (#9478 )	2025-03-17 19:02:02 +08:00
qazal	813f713edc	merge_views for buffer ops + create valids last (#9472 ) * merge_views for buffer ops + create valids last * view.arg * pass	2025-03-17 17:15:44 +08:00
qazal	bd1f71c1e2	simple failing test for extra ops in VALID [pr] (#9474 ) * simple failing test for extra valids [pr] * this has DEBUG=4	2025-03-17 17:02:40 +08:00
qazal	e26caf4c3a	hotfix: skip test_mean_half_precision_underflow on amd ci (#9476 ) The global size is very large (781250 gidx) and the emulated version takes more than 1 minute to execute the kernel.	2025-03-17 16:47:48 +08:00
George Hotz	15ee742afa	add get_children_map to uop (#9470 ) * add get_children_map to uop * update_children * fix new children	2025-03-17 14:36:13 +08:00
George Hotz	cb7a7f69c7	quantization preprocessor from DSP, should be universal (#9437 ) * quantization preprocessor from DSP, should be universal * touchups * fix tests	2025-03-15 07:49:37 +08:00
chenyu	99b0287e4e	add GROUP and GROUPTOP to test_arange (#9432 ) it does not grow quadratically, but it's not 0 ops now	2025-03-13 11:28:38 -04:00
qazal	90ffa9bd45	swizzle without buffer ops try 2 [pr] (#9427 ) * add DONT_PUSH_VIEWS to matchers * swizzle without buffer ops try 2 [pr] * swizzle reduceop * simple failing test * fix failing test * s/on/for	2025-03-13 10:00:40 +01:00
chenyu	22fc0a2e36	bert sum acc in half (#9412 ) also BS=96	2025-03-11 23:03:15 -04:00
George Hotz	e174c6c3bc	new devectorizer (#9331 ) * new devectorizer * lidx * test linearizer passes * fix images * fix unfoldable image load * delete unused * improve fix_unfoldable_image_load * working for image * fixup types * fixup transcendental * cast_vec * cleaner transcendental * skip failing test * err, flip that * not devec * sqrt	2025-03-11 18:47:56 +08:00
George Hotz	2780e2027e	devectorize prereqs [pr] (#9404 )	2025-03-11 12:33:29 +08:00
chenyu	01e8b60911	acc_dtype -> dtype (#9402 ) matched numpy and torch	2025-03-10 16:05:30 -04:00
qazal	59dfb234eb	replace hardcoded ast with tensors in TestSwizzle [pr] (#9401 )	2025-03-10 19:33:57 +01:00
geohotstan	1d64c12f2b	add Topk to tensor (#9343 ) * terrible but somewhat working impl * linux behaves differently than macos? * slightly better impl * small clean up; haven't figured this out yet * better * torch has different behavior on linux and macos for duplicated values * add sum docs * fix test * add torch return_type test * add an exception test * wrap_fxn instead, and move op lower in order * better repeated values test * rerun ci	2025-03-09 20:01:42 -04:00
qazal	a1f41fadf6	test_schedule cleanups + add DONT_GROUP_REDUCES [pr] (#9392 ) * test_schedule cleanups + add DONT_GROUP_REDUCES [pr] * replace with test_swizzle_reduceop * delete duplicate tests * test_allow_push_permutes * one kernel tests	2025-03-09 15:01:08 +01:00
qazal	286b480f82	do not replace assign with the offset buffer [pr] (#9387 )	2025-03-08 11:57:44 +01:00
qazal	0d2762c010	prep refactor for adding buffer ops last [pr] (#9383 ) * prep refactor for adding buffer ops last [pr] * freeze buffers * add swizzle_reduceop * shape for reduceop_view_right * simpler elementwise_view_right * add shapetracker to const * only const * from process replay	2025-03-08 08:00:14 +01:00
nimlgen	243078dda9	am: optimize tlb usage (#9049 ) * am: optimize tlb usage * fixes * comments * tiny	2025-03-07 19:37:29 +03:00
geohotstan	088d86691b	fix onnx gather and onnx auto_pad VALID mode (#9375 ) * fix gather and auto_pad * long -> int64	2025-03-07 10:27:23 -05:00
hooved	136cf7b8b1	hotfix: load >2 GiB from disk on macOS (#9361 ) * enable loading >2 GiB buffer from disk on macOS * handle None case raised by mypy * add test * revert fix to repro bug in CI * tell CI to run a unit test for macOS * reapply fix	2025-03-07 14:51:58 +08:00
nimlgen	9bd13de44c	lower test_gemv_4096_16384 to 750 for red (#9367 )	2025-03-05 22:44:48 +03:00
uuuvn	b75f307234	amd: autogen ip bases (#9360 )	2025-03-05 22:30:38 +03:00
chenyu	2cb2fce8d9	lower test_gemm_8192 amd_tflops to 65 (#9364 )	2025-03-05 14:06:11 -05:00
nimlgen	14c88abf27	add some options to allreduce bench (#9348 )	2025-03-04 23:46:36 +03:00
Anish Umale	bafa40fe12	Tiny backend test_ops fix part1 (#9338 ) * extract name methods from https://github.com/tinygrad/tinygrad/pull/9302 * t.grad.numpy() -> t.grad.cpu().numpy() * revert TORCH_DEBUG change * revert dtype change in aten.sum	2025-03-03 12:36:51 -05:00
George Hotz	0d4ba7dd87	import tinygrad.frontend.torch (#9337 ) * import tinygrad.frontend.torch * type ignore	2025-03-04 00:15:29 +08:00
qazal	23084fd850	merge merge_views and remove_movement_ops [pr] (#9333 ) * merge merge_views and remove_movement_ops [pr] * fix that assert	2025-03-03 12:38:59 +01:00
George Hotz	ece0a0f305	use empty for test instead of rand (#9332 )	2025-03-03 16:19:06 +08:00
George Hotz	2cc4cb74f0	reorder binops (#9328 ) * reorder binops * test improvements + fix string tests * ugh, okay this	2025-03-03 14:58:18 +08:00
chenyu	146eb73790	fix Tensor.view with a tuple arg (#9330 )	2025-03-02 23:35:23 -05:00
chenyu	ba4b8c2c23	Tensor.copysign (#9329 )	2025-03-02 21:33:49 -05:00
nimlgen	8cae00833c	flaky test in ci (#9321 )	2025-03-02 16:27:22 +03:00
Ali Ladjevardi	00028e87bb	Failing test for not realizing intermediate expand in multi-GPU (#9320 )	2025-03-02 12:54:48 +01:00
George Hotz	ba97fd0b9c	hotfix: add test/external/external_benchmark_disk_raw	2025-03-02 02:32:15 +00:00
chenyu	cc2bbb0bf1	Tensor.isfinite (#9316 )	2025-03-01 19:58:56 -05:00
geohotstan	d9ec05cea6	Test Onnx quantization behavior (#9301 ) * add DynamicDequantizeLinear and corresponding tests * wow qlinearops are round away from zero * this passes locally... * again * try * try separate test * round to even again * also add QLinearMul --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-01 19:21:58 -05:00
chenyu	fe0f860209	update test_ops for tensors from torch (#9308 ) a few detach().numpy() -> detach().cpu().numpy()	2025-02-28 15:57:25 -05:00


3484 Commits