Commit Graph

870 Commits

nimlgen
1c5e353249 am: use mmio iface (#10012)
* am: use mmio iface

* linters

* fixes

* fixes + cleanups

* mute

* mypy

* style
2025-04-24 00:27:04 +03:00
George Hotz
2ed3acd767 toposort is a function [pr] (#10004) 2025-04-23 16:25:03 +01:00
George Hotz
71ecc7fa1a use a pattern matcher for upcast [pr] (#10000) 2025-04-23 14:24:23 +01:00
George Hotz
cc1087d2ec move simplify into views_to_indexed_uops (#9999)
* move simplify into views_to_indexed_uops

* cache that
2025-04-23 13:50:27 +01:00
George Hotz
d1f6701eb7 hotfix: lower amd threshold + improve block reorder test 2025-04-22 20:44:29 +01:00
qazal
1d90be2cff match kernelize API in process replay (#9948) 2025-04-21 05:23:41 +08:00
chenyu
6c30948df6 hand_coded_optimizations returns list[Opt] [pr] (#9938)
the new API looks like `k.apply_opts(hand_coded_optimizations(k))`
2025-04-19 20:26:59 -04:00
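A hedged sketch of the decoupled flow this commit describes, built around the quoted call `k.apply_opts(hand_coded_optimizations(k))`; the import paths are assumptions (heuristic.py per #9844 below) and may not match the tree at this commit:

```python
# sketch only: module paths are assumed, not verified against this commit
from tinygrad.codegen.kernel import Kernel                       # assumed path
from tinygrad.codegen.heuristic import hand_coded_optimizations  # assumed path (see #9844)

def optimize(k: Kernel) -> Kernel:
  opts = hand_coded_optimizations(k)  # now pure: returns list[Opt], mutates nothing
  k.apply_opts(opts)                  # the caller applies them explicitly
  return k
```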
chenyu
720f20865b remove required_optimizations (#9848) 2025-04-19 16:51:16 -04:00
qazal
b58decac0c fix diamond assigns before mapping tensor UOps to assigns (#9855)
* keep tensor_map until diamond assign fixup

* ctx
2025-04-18 14:17:43 +03:00
George Hotz
aa98aff4cd don't use ops name, just keep sink (#9922)
* don't use ops name, just keep sink

* fix test

* endif sink
2025-04-18 08:59:18 +01:00
chenyu
f5256e0020 Kernel.apply_opts [pr] (#9917)
* Kernel.apply_opts [pr]

updated all `for opt in`. also updated a few test_linearizer tests to not implicitly depend on hand_coded_optimizations

* not you yet
2025-04-17 08:00:56 -04:00
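The `for opt in` pattern being replaced, as a hedged before/after sketch (the old loop body is inferred from the message, not copied from the diff):

```python
# before: call sites looped over opts themselves
for opt in opts:
  k.apply_opt(opt)

# after: one entry point does the loop
k.apply_opts(opts)
```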
geohotstan
4e8f25109a Revert "ONNX add output shape validation (#9720)" (#9904)
This reverts commit ac713e04db.
2025-04-16 03:15:56 -04:00
nimlgen
83ae83d871 compare amd and am to cpu as well (#9896) 2025-04-15 13:32:18 +03:00
nimlgen
23a95dd84d script to compare amd and am kerns (#9889)
* script to compare amd and am kerns

* tool

* is it used???
2025-04-15 00:11:22 +03:00
qazal
e201bc3e93 process replay kernel asts in toposort order [pr] (#9869)
* process replay kernel asts in toposort order [pr]

* use HEAD replay
2025-04-13 17:20:34 +08:00
Alexey Zaytsev
7dda6aae7d Skip CLOUD in external_test_example (#9857)
Closes #9814
2025-04-12 10:17:44 +08:00
chenyu
8c6299bced move hand_coded_optimizations to heuristic.py [pr] (#9844)
* move hand_coded_optimizations to heuristic.py [pr]

also folded all long lines

* make a copy and rename self -> k

* fix test
2025-04-10 23:40:16 -04:00
qazal
fbc6aa53d4 script for local process_replay + fix viz name [pr] (#9837) 2025-04-11 00:39:18 +08:00
qazal
16afe04f45 move process replay to grouper (#9830)
* simpler

* sched
2025-04-10 18:27:42 +08:00
chenyu
c462162db8 update benchmark bert scripts with BS and ACC_DTYPE (#9826)
BS=16, ACC_DTYPE=half for tinybox; BS=128, ACC_DTYPE=float for mi300x
2025-04-10 02:06:02 -04:00
George Hotz
fefee5d3ab single kernel softmax (#9776)
* real single kernel softmax

* cleanup

* fix blockend insertion

* add to bert test
2025-04-08 12:35:48 +08:00
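For context on why this needed work: softmax is three data-dependent passes, two of them reductions, which a scheduler naturally splits into separate kernels; this commit fuses them into one. Below is just the textbook definition, not tinygrad's implementation:

```python
import math

def softmax(xs: list[float]) -> list[float]:
  m = max(xs)                          # pass 1: row max (numerical stability)
  es = [math.exp(x - m) for x in xs]   # pass 2: shifted exponentials...
  s = sum(es)                          # ...and their sum, a second reduction
  return [e / s for e in es]           # pass 3: normalize
```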
George Hotz
db22094d35 hotfix: update softmax fusion test 2025-04-08 11:23:19 +08:00
Sieds Lykles
07d1aefaf4 fast idiv (#9755)
* fast idiv with tests and fuzzer

* Add todo comment

* Add env variable to toggle fast_idiv

* Move env check

* Add fuzz fast_idiv to ci

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-07 08:32:24 -04:00
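"Fast idiv" conventionally means replacing division by a compile-time constant with a multiply and a shift. A minimal self-contained sketch of that technique follows; it illustrates the idea the commit's fuzzer checks, not tinygrad's actual rewrite:

```python
def fast_idiv(x: int, d: int, bits: int = 32) -> int:
  # for 0 <= x < 2**bits and constant d > 0, a shift s large enough that
  # m = ceil(2**s / d) makes (x * m) >> s equal x // d for every such x
  s = bits + d.bit_length()
  m = (1 << s) // d + 1  # ceil(2**s / d), rounded up even when d divides 2**s
  return (x * m) >> s

# exhaustive spot check, in the spirit of the commit's fuzzer
assert all(fast_idiv(x, 7) == x // 7 for x in range(1 << 16))
```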
chenyu
b190d85ad7 benchmark script bert softmax (#9759) 2025-04-07 00:31:18 -04:00
chenyu
43e4565148 weighted linear in external_benchmark_bert_matmuls (#9757)
include the linear to get qkv, and permute so that strides match the real run
2025-04-06 23:35:42 -04:00
chenyu
8a585dc5c1 benchmark script for matmuls in bert (#9752)
2 main matmuls in the bert layers; getting these to be fast makes bert fast
2025-04-06 19:34:25 +08:00
George Hotz
926b0bcc57 cache folded upcast [pr] (#9733) 2025-04-04 11:23:19 +08:00
geohotstan
ac713e04db ONNX add output shape validation (#9720)
* add output shape validation and remove support for sequence_type

* nit better err msg

* add sequence_type back

* improve err msg

* Revert "improve err msg"

This reverts commit dc9eaea4bb.

* Revert "add sequence_type back"

This reverts commit 288170b2d9.

* do explicit shape equality

* small nit
2025-04-03 05:44:53 -04:00
George Hotz
49dafe6d43 add gc tests [pr] (#9718)
* add gc tests [pr]

* del

* more gc tests

* add NullGraph
2025-04-03 14:08:32 +08:00
geohotstan
e1d7e47cca fix ONNX IsInf unintended dtype promotion (#9711)
* add IsInf

* add corresponding test

* that float16 is kinda silly
2025-04-02 22:46:15 -04:00
qazal
bb94f13e58 add RECORD_TRACEBACKS=1 option to process replay (#9679)
* add RECORD_TRACEBACKS=1 option to process replay

* stack
2025-04-02 11:58:27 +08:00
chenyu
c672716b38 improve vmin/vmax for IDIV (#9678) 2025-04-01 23:16:01 -04:00
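vmin/vmax are the integer bounds tinygrad tracks per UOp for simplification. For floor division by a positive constant the bounds follow from monotonicity; a hedged sketch of that reasoning (not the actual code):

```python
def idiv_bounds(vmin: int, vmax: int, c: int) -> tuple[int, int]:
  # x // c is monotone non-decreasing in x when c > 0, so the bounds of
  # x // c over vmin <= x <= vmax are just the floor-divided endpoints
  assert c > 0 and vmin <= vmax
  return (vmin // c, vmax // c)
```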
geohotstan
d52e91db7b ONNX ops clean ups (#9622)
* combine work from remove numpy and onnx ops tests

* clippy

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:39:22 -04:00
geohotstan
a08b07b4da Bump onnx==1.17.0 (#9618)
* bump

* remove resize tf_crop_and_resize

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 03:21:51 -04:00
nimlgen
54e1e59b44 am: rdna 4 support (#9621)
* hm

* fix

* return this

* fine

* g

* ruff

* fix
2025-03-29 23:16:27 +07:00
nimlgen
118bd1cbed hotfix: amd imports (#9620) 2025-03-29 20:19:53 +07:00
George Hotz
9115ce8860 linearizer fixups from DSP branch (#9581) 2025-03-26 18:28:15 +08:00
George Hotz
74d98eafb8 add onnx frontend stub [pr] (#9558) 2025-03-24 12:24:34 +08:00
nimlgen
d5667419af am: move out pte creation logic (#9548)
* am: move out pte creation logic

* emu

* ops
2025-03-23 18:29:10 +07:00
geohotstan
309afa20b7 add Tensor.max_unpool2d (#9518)
* why does max_unpool2d feel slower than out.gradient ...

* slightly cleaner

* what happened to ruff

* need to think about this some more

* slightly faster now?

* clean up, 1 more failing edge case

* ok good

* working TINY_BACKEND

* nit doc wording

* retry CI
2025-03-22 12:11:33 -04:00
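A usage sketch, hedged: the torch-style pairing below (pool with indices, unpool to scatter back) matches how max_unpool2d usually works, but the exact tinygrad keyword names here are assumptions:

```python
from tinygrad import Tensor

x = Tensor.arange(16).float().reshape(1, 1, 4, 4)
# assumed torch-style flags: pooling returns argmax indices, unpooling
# scatters the pooled values back to those positions, zeros elsewhere
pooled, idx = x.max_pool2d(kernel_size=2, return_indices=True)
restored = pooled.max_unpool2d(idx, kernel_size=2)
```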
Francis Lata
1a1087e3a0 cleanups on losses and dataset tests (#9538) 2025-03-21 17:03:18 -04:00
Francis Lata
8cbe4009fc RetinaNet losses (#9536)
* add sigmoid_focal_loss and l1_loss

* update ref implementation comment
2025-03-21 15:52:54 -04:00
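For reference, sigmoid focal loss is the standard RetinaNet classification loss from Lin et al. 2017: FL = -alpha_t * (1 - p_t)**gamma * log(p_t). A minimal scalar sketch of that definition (tinygrad's version operates on Tensors):

```python
import math

def sigmoid_focal_loss(logit: float, target: float, alpha: float = 0.25, gamma: float = 2.0) -> float:
  # down-weights easy examples via the (1 - p_t)**gamma modulating factor
  p = 1.0 / (1.0 + math.exp(-logit))                         # sigmoid
  p_t = p * target + (1.0 - p) * (1.0 - target)              # prob of the true class
  alpha_t = alpha * target + (1.0 - alpha) * (1.0 - target)
  return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```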
Francis Lata
e6389184c5 update comment for retinanet dataloader implementations (#9534)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-21 15:07:45 -04:00
Francis Lata
eb95825eea RetinaNet dataloader (#9442)
* retinanet dataloader

* remove batch_size from generate_anchors

* refactor kits19 dataset tests

* add tests for dataloader

* fix testing setup and cleanups

* remove unused import
2025-03-21 13:36:41 -04:00
geohotstan
1d64c12f2b add Topk to tensor (#9343)
* terrible but somewhat working impl

* linux behaves differently than macos?

* slightly better impl

* small clean up; haven't figured this out yet

* better

* torch has different behavior on linux and macos for duplicated values

* add sum docs

* fix test

* add torch return_type test

* add an exception test

* wrap_fxn instead, and move op lower in order

* better repeated values test

* rerun ci
2025-03-09 20:01:42 -04:00
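A hedged usage sketch: the "torch return_type test" above suggests a torch-like `(values, indices)` result, but the parameter names are assumptions:

```python
from tinygrad import Tensor

t = Tensor([1.0, 5.0, 3.0, 2.0])
values, indices = t.topk(2)               # assumed: k largest along the last axis
print(values.tolist(), indices.tolist())  # expected: [5.0, 3.0] [1, 2]
```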
nimlgen
243078dda9 am: optimize tlb usage (#9049)
* am: optimize tlb usage

* fixes

* comments

* tiny
2025-03-07 19:37:29 +03:00
geohotstan
088d86691b fix onnx gather and onnx auto_pad VALID mode (#9375)
* fix gather and auto_pad

* long -> int64
2025-03-07 10:27:23 -05:00
nimlgen
9bd13de44c lower test_gemv_4096_16384 to 750 for red (#9367) 2025-03-05 22:44:48 +03:00
chenyu
2cb2fce8d9 lower test_gemm_8192 amd_tflops to 65 (#9364) 2025-03-05 14:06:11 -05:00
nimlgen
14c88abf27 add some options to allreduce bench (#9348) 2025-03-04 23:46:36 +03:00