George Hotz
1714fc3ba4
start work on speed [pr] (#9707)
* fix get_location
* fix get_location try 2
* clean up split_load_store [pr]
* SHR fixup [pr]
2025-04-03 10:39:01 +08:00
George Hotz
0f1ffc2050
hotfix: cat tests 2048 instead of 256
2025-04-03 10:37:56 +08:00
Ignacio Sica
2d6d8b7355
add bf16 mfma support (#9695)
* add bf16 mfma support
* skip tc if emulated_amd and dtypes is bf16
* hotfix
2025-04-02 21:44:49 +08:00
chenyu
3b8d923692
remove skip LLVM in test_div_int (#9686)
2025-04-02 04:15:00 -04:00
George Hotz
e78e8722dc
Revert "LDS noop and spec (#9669)" (#9691)
This reverts commit 870b545ace.
Co-authored-by: Ignacio Sica <mignacio.sica@gmail.com>
2025-04-02 15:31:32 +08:00
chenyu
c20f112e9f
example test use z3 to verify valid simplification (#9684)
2025-04-02 01:05:52 -04:00
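The z3 commit above checks that an index simplification is only applied where the guarding "valid" condition holds. As a solver-free stand-in for the same property (brute force instead of z3; all names here are hypothetical, not tinygrad's API), one can exhaustively verify that the simplified expression matches the original on exactly the valid region:

```python
# Brute-force stand-in for the z3 validity check: a rewrite like
# (x * 4 + y) // 4 -> x is only sound where the valid predicate
# (here: 0 <= y < 4) holds, so equality is checked on that domain only.
def check_valid_simplification(original, simplified, valid, domain):
    return all(original(*p) == simplified(*p) for p in domain if valid(*p))

domain = [(x, y) for x in range(-8, 8) for y in range(-8, 8)]
ok = check_valid_simplification(
    lambda x, y: (x * 4 + y) // 4,   # original index expression
    lambda x, y: x,                  # simplified form
    lambda x, y: 0 <= y < 4,         # valid mask
    domain)
print(ok)   # True: the rewrite holds everywhere the mask is true
```

Dropping the mask (`valid = lambda x, y: True`) makes the check fail, e.g. at `y = 4`, which is the point of tying simplification to validity.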
chenyu
bca0c85193
skip CI CPU test_data_parallel_resnet_train_step (#9685)
flaky
2025-04-02 01:04:54 -04:00
qazal
bb94f13e58
add RECORD_TRACEBACKS=1 option to process replay (#9679)
* add RECORD_TRACEBACKS=1 option to process replay
* stack
2025-04-02 11:58:27 +08:00
chenyu
c672716b38
improve vmin/vmax for IDIV (#9678)
2025-04-01 23:16:01 -04:00
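The vmin/vmax commit above tightens the value range tracked for integer division. The core idea can be sketched in a few lines (a hypothetical helper, not tinygrad's actual code): floor division by a positive constant is monotonic, so the bounds of the input divide directly.

```python
# Min/max bound propagation through floor division by a positive
# constant: since x // c is monotonic in x for c > 0, the interval
# endpoints map straight through.
def idiv_bounds(vmin, vmax, c):
    assert c > 0
    return vmin // c, vmax // c

lo, hi = idiv_bounds(-7, 10, 4)
print(lo, hi)   # -2 2, i.e. x in [-7, 10] implies x // 4 in [-2, 2]
```

An exhaustive check over the interval confirms the bounds are tight.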
chenyu
8dd88ad476
don't div_and_mod_folding for negative numerator with remainder (#9674)
can be wrong in C div since it truncates towards zero
2025-04-01 16:26:23 -04:00
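The commit above disables a folding rule because C's `/` truncates toward zero while folding arithmetic assumes floor semantics. Emulating C division in Python (a small sketch) makes the discrepancy visible for a negative numerator with a nonzero remainder:

```python
# C-style division truncates toward zero; Python's // floors.
# The two agree when the remainder is zero or the signs match,
# and differ by one otherwise, so div/mod folding that mixes the
# two conventions is unsound for negative numerators.
def c_div(a, b):
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

print(c_div(-7, 2), -7 // 2)   # -3 -4: truncation vs floor
```

For `-7 / 2`, C yields `-3` (toward zero) while floor division yields `-4`, which is exactly the case the commit excludes from folding.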
chenyu
0e34f9082e
helper functions for cstyle div mod [pr] (#9673)
2025-04-01 08:06:56 -04:00
Ignacio Sica
870b545ace
LDS noop and spec (#9669)
* init lds noop and lds_0 spec
* refactor lds helper test
* fix typo
* test all lds at the same time
* change comment
* comment
* start test_lds_full
* test_lds_tc
* add tc spec
2025-04-01 18:44:55 +08:00
b1tg
d9af4cfc1b
AMD_LLVM: tensor cores support (#9613)
* tensor cores support
* test tensor cores codegen
* use rewrite rules
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-04-01 09:56:27 +08:00
Ignacio Sica
1444069c09
Uppercase K for dimension and lowercase k for kernel in linearizer tc helper test (#9649)
2025-03-31 19:05:36 +08:00
Ignacio Sica
baa67fd124
Uppercase N and M (standalone syntax change) (#9647)
2025-03-31 18:45:30 +08:00
Yvon Manzi
6652003839
Add cumprod to Tensor (#9629)
* probably how cumprod should look like
* update _cumalu to work with MUL
* shorter
* cumprod testing
* clean
* more cleanup
* add cumprod to torch backend.
* make it look like cumsum
* mypy fix
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:49:18 -04:00
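The cumprod commit above is the running-product counterpart of cumsum. In plain Python the semantics can be sketched with `itertools.accumulate` (illustrative only, not tinygrad's implementation):

```python
from itertools import accumulate
from operator import mul

# Running product along a sequence: out[i] = x[0] * x[1] * ... * x[i],
# mirroring how cumsum accumulates with addition.
def cumprod(xs):
    return list(accumulate(xs, mul))

print(cumprod([1, 2, 3, 4]))   # [1, 2, 6, 24]
```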
geohotstan
d52e91db7b
ONNX ops clean ups (#9622)
* combine work from remove numpy and onnx ops tests
* clippy
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:39:22 -04:00
geohotstan
a08b07b4da
Bump onnx==1.17.0 (#9618)
* bump
* remove resize tf_crop_and_resize
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 03:21:51 -04:00
nimlgen
54e1e59b44
am: rdna 4 support (#9621)
* hm
* fix
* return this
* fine
* g
* ruff
* fix
2025-03-29 23:16:27 +07:00
nimlgen
118bd1cbed
hotfix: amd imports (#9620)
2025-03-29 20:19:53 +07:00
uuuvn
dd9aae02c3
Refactor ops_amd.py (MI300X prereq) (#9428)
2025-03-29 00:17:20 +07:00
nimlgen
fa0ebbd237
jit: optimize before pickle (#9611)
* jit: optimize before pickle
* optimize weights
* fix
* mypy
* mypy2
2025-03-28 19:06:09 +07:00
Andrew Furey
50dee4a7b3
add test for checking const gradients (#9598)
2025-03-27 15:17:37 -04:00
chenyu
5358b0904b
update uop_given_valid if a node becomes const (#9604)
* update uop_given_valid if a node becomes const
* cleanup
2025-03-27 14:57:46 -04:00
qazal
bf94924d5a
fix viz with nested graph_rewrite (#9595)
2025-03-27 13:14:28 +08:00
qazal
e5ff7b23d7
refactor to @track_matches + add failing test_nested_rewrite (#9592)
* test_nested_rewrite
* refactor to track_matches
* positional arg
2025-03-27 11:11:56 +08:00
nimlgen
dc9da1d917
memplan into one buffer (#9526)
* new memplanner
* new should works
* fix
* VALIDATE_MEMORY_PLANNER
* hm?
* ugh
* fix alignment
* fix2
* rm
* tiny fixes
* test
* comments and fixes
* fix2
* liiiinetr
* t
* fix
2025-03-27 01:46:50 +07:00
nimlgen
e88a640ca5
fix _access_resources for offset buffers (#9580)
* fix _access_resources for offset buffers
* test
2025-03-26 18:42:43 +07:00
George Hotz
9115ce8860
linearizer fixups from DSP branch (#9581)
2025-03-26 18:28:15 +08:00
nimlgen
ccbcdca473
add memplanner tests (#9577)
2025-03-26 10:59:39 +07:00
chenyu
cddd750d68
add a failed test case for jit/nojit rand [pr] (#9574)
currently adding jit produced different rand values
2025-03-25 13:32:44 -04:00
qazal
52301fe68e
move Buffer refcount increment out of schedule.py (#9564)
* move Buffer refcount increment out of schedule.py
* add TestGC.test_assign_refcount
* refcount refers to Ops.BUFFER UOps
2025-03-25 12:08:27 +08:00
chenyu
6427272bf6
minor update to rand [pr] (#9566)
2025-03-24 18:49:50 -04:00
qazal
d7c754ce49
failing test for UOp buffer ref count (#9563)
* failing test for UOp buffer ref count
* lint
2025-03-25 00:10:48 +08:00
b1tg
f90001e1a6
amd llvm render (no_comgr prereq) (#9543)
* amd llvm render
* skip test_div_rounding_mode
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-03-24 22:50:51 +08:00
George Hotz
74d98eafb8
add onnx frontend stub [pr] (#9558)
2025-03-24 12:24:34 +08:00
chenyu
ba41076e94
update embedding test to not use dtypes.long [pr] (#9556)
2025-03-23 21:33:38 -04:00
nimlgen
d5667419af
am: move out pte creation logic (#9548)
* am: move out pte creation logic
* emu
* ops
2025-03-23 18:29:10 +07:00
geohotstan
309afa20b7
add Tensor.max_unpool2d (#9518)
* why does max_unpool2d feel slower than out.gradient ...
* slightly cleaner
* what happened to ruff
* need to think about this some more
* slightly faster now?
* clean up, 1 more failing edge case
* ok good
* working TINY_BACKEND
* nit doc wording
* retry CI
2025-03-22 12:11:33 -04:00
quortus
bdd44d4255
Fix DSP transcendentals (#9542)
2025-03-22 11:08:18 +08:00
chenyu
c33679c47b
increase size in test_multinomial_counterexample (#9540)
should be less flaky
2025-03-21 17:46:52 -04:00
Francis Lata
1a1087e3a0
cleanups on losses and dataset tests (#9538)
2025-03-21 17:03:18 -04:00
Francis Lata
8cbe4009fc
RetinaNet losses (#9536)
* add sigmoid_focal_loss and l1_loss
* update ref implementation comment
2025-03-21 15:52:54 -04:00
Francis Lata
e6389184c5
update comment for retinanet dataloader implementations (#9534)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-21 15:07:45 -04:00
Francis Lata
eb95825eea
RetinaNet dataloader (#9442)
* retinanet dataloader
* remove batch_size from generate_anchors
* refactor kits19 dataset tests
* add tests for dataloader
* fix testing setup and cleanups
* remove unused import
2025-03-21 13:36:41 -04:00
b1tg
58206fa8a9
add amd llvm compiler (#9519)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-21 23:13:27 +08:00
George Hotz
8e555c586c
switch quantization to unsigned/unsigned + add Ops.REDUCE (#9527)
* switch quantization to unsigned/unsigned + add Ops.REDUCE
* tests
* nhwc + replay pkl
2025-03-21 17:02:37 +08:00
George Hotz
3c5161b4cb
add validation of the bounds of Ops.INDEX (#9503)
* add validation of the bounds of Ops.INDEX
* do mask properly
* more validation
* correct
* fix gated
* add CAST support to vmin/vmax
* fix ptx and image
* ptx no diff
* upat.index also stays
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-03-20 12:15:55 +08:00
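The Ops.INDEX bounds-validation commit above ties into the vmin/vmax tracking elsewhere in this log: once the range of an index expression is known, validating an unmasked access reduces to an interval check. A toy version (a hypothetical helper, not tinygrad's actual code):

```python
# With vmin/vmax known for an index expression, an unmasked access is
# safe iff every value the index can take lands inside the buffer.
def validate_index(vmin, vmax, size):
    return 0 <= vmin and vmax < size

print(validate_index(0, 255, 256))   # True: whole range is in-bounds
print(validate_index(-1, 255, 256))  # False: the index can underflow
```

Indices that fail this check would need a mask (the "do mask properly" bullet) rather than a bare load or store.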
qazal
0b20f91ce7
remove move_mask from the devectorizer (#9511)
* remove move_mask from the devectorizer
* add (wrong) ptx
* reason
* enable index addition in PTX, we won't have the INDEX anyways
* space
2025-03-20 11:53:12 +08:00
qazal
1839e8c9b3
place masks in INDEX for TestGatedStoreRewrite [pr] (#9512)
2025-03-20 09:46:53 +08:00