Commit Graph

4667 Commits

Author SHA1 Message Date
geohotstan
d52e91db7b ONNX ops clean ups (#9622)
* combine work from remove numpy and onnx ops tests

* clippy

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:39:22 -04:00
geohotstan
a08b07b4da Bump onnx==1.17.0 (#9618)
* bump

* remove resize tf_crop_and_resize

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 03:21:51 -04:00
nimlgen
54e1e59b44 am: rdna 4 support (#9621)
* hm

* fix

* return this

* fine

* g

* ruff

* fix
2025-03-29 23:16:27 +07:00
nimlgen
118bd1cbed hotfix: amd imports (#9620) 2025-03-29 20:19:53 +07:00
uuuvn
dd9aae02c3 Refactor ops_amd.py (MI300X prereq) (#9428) 2025-03-29 00:17:20 +07:00
nimlgen
fa0ebbd237 jit: optimize before pickle (#9611)
* jit: optimize before pickle

* optimize weights

* fix

* mypy

* mypy2
2025-03-28 19:06:09 +07:00
Andrew Furey
50dee4a7b3 add test for checking const gradients (#9598) 2025-03-27 15:17:37 -04:00
chenyu
5358b0904b update uop_given_valid if a node becomes const (#9604)
* update uop_given_valid if a node becomes const

* cleanup
2025-03-27 14:57:46 -04:00
qazal
bf94924d5a fix viz with nested graph_rewrite (#9595) 2025-03-27 13:14:28 +08:00
qazal
e5ff7b23d7 refactor to @track_matches + add failing test_nested_rewrite (#9592)
* test_nested_rewrite

* refactor to track_matches

* positional arg
2025-03-27 11:11:56 +08:00
nimlgen
dc9da1d917 memplan into one buffer (#9526)
* new memplanner

* new should works

* fix

* VALIDATE_MEMORY_PLANNER

* hm?

* ugh

* fix alignment

* fix2

* rm

* tiny fixes

* test

* comments and fixes

* fix2

* liiiinetr

* t

* fix
2025-03-27 01:46:50 +07:00
nimlgen
e88a640ca5 fix _access_resources for offset buffers (#9580)
* fix _access_resources for offset buffers

* test
2025-03-26 18:42:43 +07:00
George Hotz
9115ce8860 linearizer fixups from DSP branch (#9581) 2025-03-26 18:28:15 +08:00
nimlgen
ccbcdca473 add memplanner tests (#9577) 2025-03-26 10:59:39 +07:00
chenyu
cddd750d68 add a failed test case for jit/nojit rand [pr] (#9574)
Currently, adding jit produces different rand values.
2025-03-25 13:32:44 -04:00
qazal
52301fe68e move Buffer refcount increment out of schedule.py (#9564)
* move Buffer refcount increment out of schedule.py

* add TestGC.test_assign_refcount

* refcount refers to Ops.BUFFER UOps
2025-03-25 12:08:27 +08:00
chenyu
6427272bf6 minor update to rand [pr] (#9566) 2025-03-24 18:49:50 -04:00
qazal
d7c754ce49 failing test for UOp buffer ref count (#9563)
* failing test for UOp buffer ref count

* lint
2025-03-25 00:10:48 +08:00
b1tg
f90001e1a6 amd llvm render (no_comgr prereq) (#9543)
* amd llvm render

* skip test_div_rounding_mode

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-03-24 22:50:51 +08:00
George Hotz
74d98eafb8 add onnx frontend stub [pr] (#9558) 2025-03-24 12:24:34 +08:00
chenyu
ba41076e94 update embedding test to not use dtypes.long [pr] (#9556) 2025-03-23 21:33:38 -04:00
nimlgen
d5667419af am: move out pte creation logic (#9548)
* am: move out pte creation logic

* emu

* ops
2025-03-23 18:29:10 +07:00
geohotstan
309afa20b7 add Tensor.max_unpool2d (#9518)
* why does max_unpool2d feel slower than out.gradient ...

* slightly cleaner

* what happened to ruff

* need to think about this some more

* slightly faster now?

* clean up, 1 more failing edge case

* ok good

* working TINY_BACKEND

* nit doc wording

* retry CI
2025-03-22 12:11:33 -04:00
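The commit above adds Tensor.max_unpool2d. A minimal hedged sketch of the semantics (assumed to match torch's MaxUnpool family, shown in 1-D with plain lists rather than tinygrad's actual implementation): values are scattered back to the flat positions recorded by max_pool's indices, with zeros everywhere else.

```python
def max_unpool1d(vals: list, idxs: list, out_len: int) -> list:
    # Partial inverse of max_pool with return_indices: scatter each pooled
    # value back to the flat position its index recorded; non-maximal
    # positions become zero.
    out = [0] * out_len
    for v, i in zip(vals, idxs):
        out[i] = v
    return out

# max_unpool1d([3, 5, 6], [1, 3, 5], 6) -> [0, 3, 0, 5, 0, 6]
```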
quortus
bdd44d4255 Fix DSP transcendentals (#9542) 2025-03-22 11:08:18 +08:00
chenyu
c33679c47b increase size in test_multinomial_counterexample (#9540)
should be less flaky
2025-03-21 17:46:52 -04:00
Francis Lata
1a1087e3a0 cleanups on losses and dataset tests (#9538) 2025-03-21 17:03:18 -04:00
Francis Lata
8cbe4009fc RetinaNet losses (#9536)
* add sigmoid_focal_loss and l1_loss

* update ref implementation comment
2025-03-21 15:52:54 -04:00
Francis Lata
e6389184c5 update comment for retinanet dataloader implementations (#9534)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-21 15:07:45 -04:00
Francis Lata
eb95825eea RetinaNet dataloader (#9442)
* retinanet dataloader

* remove batch_size from generate_anchors

* refactor kits19 dataset tests

* add tests for dataloader

* fix testing setup and cleanups

* remove unused import
2025-03-21 13:36:41 -04:00
b1tg
58206fa8a9 add amd llvm compiler (#9519)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-21 23:13:27 +08:00
George Hotz
8e555c586c switch quantization to unsigned/unsigned + add Ops.REDUCE (#9527)
* switch quantization to unsigned/unsigned + add Ops.REDUCE

* tests

* nhwc + replay pkl
2025-03-21 17:02:37 +08:00
George Hotz
3c5161b4cb add validation of the bounds of Ops.INDEX (#9503)
* add validation of the bounds of Ops.INDEX

* do mask properly

* more validation

* correct

* fix gated

* add CAST support to vmin/vmax

* fix ptx and image

* ptx no diff

* upat.index also stays

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-03-20 12:15:55 +08:00
qazal
0b20f91ce7 remove move_mask from the devectorizer (#9511)
* remove move_mask from the devectorizer

* add (wrong) ptx

* reason

* enable index addition in PTX, we won't have the INDEX anyways

* space
2025-03-20 11:53:12 +08:00
qazal
1839e8c9b3 place masks in INDEX for TestGatedStoreRewrite [pr] (#9512) 2025-03-20 09:46:53 +08:00
geohotstan
8c0d0a122c Add return_indices to max_pool (#9506)
* wow argmax is so good

* 1 less line

* clean up and better variable names

* is this torch thing right...?

* add more tests

* slap a TODO on it

* clean ups

* prettier looking code and fix ceil mode test

* add return types and some docs

* ok that was a bad example since indices == value, just no example
2025-03-19 15:25:37 -04:00
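The first bullet hints that return_indices falls out of argmax. A hypothetical sketch (1-D, no padding or dilation, not tinygrad's code): the index of each window's maximum is an argmax over the window, shifted back into flat input coordinates.

```python
def max_pool1d_with_indices(xs: list, kernel: int, stride: int):
    # For each window, record both the max value and its flat index in the
    # input. Ties resolve to the first maximum.
    vals, idxs = [], []
    for start in range(0, len(xs) - kernel + 1, stride):
        window = xs[start:start + kernel]
        arg = max(range(kernel), key=lambda i: window[i])  # argmax in-window
        vals.append(window[arg])
        idxs.append(start + arg)  # shift back to input coordinates
    return vals, idxs
```

The recorded indices are exactly what max_unpool needs to invert the pooling.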
chenyu
189f62d44f add rounding to tqdm unit scale (#9507)
fixed `AssertionError: ' 1.00/10.0  1000it/s]' != ' 1.00/10.0  1.00kit/s]'`
2025-03-19 12:08:46 -04:00
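The failure above is a rounding/boundary interaction in SI unit scaling: a rate just under 1000 rounds up to the displayed "1000it/s" instead of rolling over to "1.00kit/s". A toy sketch of the idea (hypothetical helper, not tqdm's or tinygrad's actual code): round to the displayed precision *before* testing the unit boundary.

```python
def si_scale(value: float) -> str:
    # Pick an SI suffix so the displayed mantissa stays below 1000.
    # Testing the boundary on the *rounded* value prevents e.g. 999.996
    # from printing as "1000.00" instead of "1.00k".
    for unit in ("", "k", "M", "G"):
        if round(value, 2) < 1000:
            return f"{value:.2f}{unit}"
        value /= 1000
    return f"{value:.2f}T"

# si_scale(999.996) -> "1.00k"; si_scale(1234) -> "1.23k"
```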
b1tg
2c87a22cf2 fix prg size calculation when there are adjacent mapped ranges (#9498)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-03-19 11:55:03 +08:00
chenyu
f8976dd2eb enable more webgpu tests (#9502)
OSX has a larger buffer count limit, and it supports fp16 now.
2025-03-18 23:03:54 -04:00
qazal
ae688e4103 simple failing test for scheduling parallel reduce [pr] (#9501)
* simple failing test for scheduling parallel reduce [pr]

* atol
2025-03-19 10:52:13 +08:00
George Hotz
117b7a16ef VALIDATE_WITH_CPU [pr] (#9488)
* VALIDATE_WITH_CPU [pr]

* fix test
2025-03-18 15:15:04 +08:00
qazal
935cd01f56 simple failing test for graph_rewrite children [pr] (#9489)
* simple failing test for graph_rewrite children [pr]

* lint

* update too
2025-03-18 13:07:21 +08:00
Anish Umale
5e58f4b65b Tiny backend test_ops fix part 3 (#9483)
* extract straightforward things from https://github.com/tinygrad/tinygrad/pull/9302

* pass dtype and device for ones_like
2025-03-17 18:01:51 -04:00
TJ
9fcef4d009 add masked_select to tensor.py (#9468)
* add masked_select to tensor.py

* fix tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-03-17 16:05:36 -04:00
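A minimal sketch of masked_select semantics (pure Python, no broadcasting; the real op follows torch's): keep the elements where the boolean mask is true, always returning a 1-D result.

```python
def masked_select(xs: list, mask: list) -> list:
    # Output length depends on the mask's contents, which is what makes
    # this op awkward for lazy tensor graphs with static shapes.
    assert len(xs) == len(mask)
    return [x for x, m in zip(xs, mask) if m]

# masked_select([1, 2, 3, 4], [True, False, True, False]) -> [1, 3]
```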
chenyu
4f8eac59ea failed test case for threefry (#9469)
* failed test case for threefry

Not sure if it's always like this, but incrementing before _threefry_random_bits is incorrect: the counts should start at the number of random values generated so far.

Used jax to generate 20 + 20 + 10 random numbers; the first 20 + 20 match and the last 10 are different. Just moving the increment after _threefry_random_bits matches the numbers, but the jit test fails.

* workaround

* why is this different?

* revert those

* and that
2025-03-17 14:52:10 -04:00
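The counter-ordering bug described above can be illustrated with a toy counter-based generator (sha256 stands in for threefry; none of this is tinygrad's code): each output is a pure function of its counter, so split draws agree with one big draw only if each call starts its counter at the number of values already produced and advances it *after* generating the bits.

```python
import hashlib

def counter_random(counter: int, n: int) -> tuple:
    # Toy counter-based PRNG: value i of this call is a pure function of
    # (counter + i). Returns the values plus the advanced counter, which
    # the next call must resume from.
    out = [int.from_bytes(hashlib.sha256(str(counter + i).encode()).digest()[:4], "big")
           for i in range(n)]
    return out, counter + n

# Drawing 20 + 20 + 10 values must match one draw of 50, element-wise.
split, c = [], 0
for n in (20, 20, 10):
    vals, c = counter_random(c, n)
    split += vals
whole, _ = counter_random(0, 50)
assert split == whole
```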
geohotstan
53d6f1e1bb Add bitonic cat sort (#9422)
* poc

* repeated values fail, sigh

* is this being timed out?

* fix up down names

* bitonic v2, does this run?

* bitonic v3, faster

* bitonic v3.1, faster

* bitonic v3.1.1, same speed unlucky

* support dim and indices

* bitonic v3.2, simpler code, TODO repeated indices

* bruv gimme green for once cmon

* cat (stack) implementation, slow but maybe one day when cat is fast meow

* revert to v3.2

* bitonic v4, who let the cats out edition

* clean up variable names

* figured out repeated indices :D

* ruff check --fix

* use sort for topk

* add Tensor.sort everywhere

* fix docs and add some types

* slightly better variable names

* am I doing torch inplace correctly?

* delegate sort to values_stable

* add a contig, faster first sort

* maybe don't test_inplace

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-17 12:01:23 -04:00
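The sort above is built on Batcher's bitonic network. A minimal scalar sketch of that algorithm (power-of-two length; tinygrad's version expresses the same stages as bulk tensor ops): the compare-exchange pattern is fixed ahead of time and independent of the data, which is why it maps well onto shuffles plus elementwise min/max.

```python
def bitonic_sort(xs: list) -> list:
    # Data-oblivious bitonic sorting network, ascending, len(xs) a power of 2.
    xs, n = list(xs), len(xs)
    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:         # compare-exchange distance within this stage
            for i in range(n):
                l = i ^ j
                if l > i:
                    up = (i & k) == 0          # merge direction for this block
                    if (xs[i] > xs[l]) == up:  # compare-exchange
                        xs[i], xs[l] = xs[l], xs[i]
            j //= 2
        k *= 2
    return xs
```

Because every stage touches all n elements with the same pattern, the whole sort is O(n log² n) compare-exchanges but embarrassingly parallel.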
qazal
e03c0aacf2 more explicit DONT_PUSH_VIEWS [pr] (#9479)
* more explicit DONT_PUSH_VIEWS [pr]

* update tests to not handcode ast

* lint

* test_recursive_swizzle and test_simple_store_reshape
2025-03-17 20:43:21 +08:00
qazal
3b00a778ba fix view_left for unsafe pad ops [pr] (#9478) 2025-03-17 19:02:02 +08:00
qazal
813f713edc merge_views for buffer ops + create valids last (#9472)
* merge_views for buffer ops + create valids last

* view.arg

* pass
2025-03-17 17:15:44 +08:00
qazal
bd1f71c1e2 simple failing test for extra ops in VALID [pr] (#9474)
* simple failing test for extra valids [pr]

* this has DEBUG=4
2025-03-17 17:02:40 +08:00
qazal
e26caf4c3a hotfix: skip test_mean_half_precision_underflow on amd ci (#9476)
The global size is very large (781250 gidx) and the emulated version takes more than 1 minute to execute the kernel.
2025-03-17 16:47:48 +08:00