tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-14 17:38:06 -05:00

Author	SHA1	Message	Date
qazal	f4ec57baff	new schedule linearizer enqueues KERNEL UOps [pr] (#9993 ) * new schedule linearizer enqueues kernels [pr] * no defaultdict * diff * minor	2025-04-23 05:17:58 +08:00
George Hotz	d1f6701eb7	hotfix: lower amd threshold + improve block reorder test	2025-04-22 20:44:29 +01:00
nimlgen	db51133537	rename HWInterface -> FileIOInterface (#9989 ) * rename HWInterface -> FileIOInterface * ugh	2025-04-22 22:18:57 +03:00
George Hotz	c1539b0319	putting add first orders loads as expected (#9991 )	2025-04-22 20:12:05 +01:00
nimlgen	bd580d8ea4	hcq: use mmio interface in nv (#9986 ) * hcq: start mmio interface * allow double cast * revert * faster? * simpler, not needed more now * dd * types * fix	2025-04-22 21:58:12 +03:00
George Hotz	feee6986c9	faster block reorder (#9990 ) * faster block reorder [pr] * that shouldn't change order * key just in sorted * ind	2025-04-22 19:18:57 +01:00
qazal	6cb2d18c03	refactor schedule linearize to defaultdict [pr] (#9984 ) * refactor schedule linearize to defaultdict [pr] * skip that * don't need .get	2025-04-23 00:00:23 +08:00
chenyu	9e5e371999	make DISABLE_COMPILER_CACHE a ContextVar [pr] (#9983 )	2025-04-22 10:32:54 -04:00
qazal	bbc324f5dc	remove CAST_AFTER_EXPAND (#9980 )	2025-04-22 21:06:11 +08:00
George Hotz	c519b553db	non recursive toposort is 2x+ faster (#9979 ) * non recursive toposort is 2x+ faster * don't change the order	2025-04-22 13:59:38 +01:00
qazal	7b55846e08	prep STORE UOp creation for multi output [pr] (#9975 ) * prep STORE UOp creation for multi output [pr] * test_multioutput_ast	2025-04-22 19:34:52 +08:00
George Hotz	e358e0a0c6	move metadata set to tensor [pr] (#9976 ) * move metadata set to tensor [pr] * only track that in tensor.py	2025-04-22 12:30:35 +01:00
George Hotz	f5dc70c624	microbenchmarks + micro speed ups (#9972 ) * microbenchmarks * forgot the ubenchs * clean up type verify	2025-04-22 11:30:46 +01:00
qazal	1cf4e24ca5	fix kernelize usage with pm_gradient (#9953 ) * fix kernelize usage with pm_gradient * remove that	2025-04-22 17:26:05 +08:00
qazal	36ed3c3253	fix kernelize with VIEW children (#9961 )	2025-04-21 23:38:46 +08:00
qazal	e8910540f6	Kernelize can be called multiple times on a Tensor (#9949 ) * Kernelize can be called multiple times on a Tensor * add (failing) test_kernelize_bw	2025-04-21 06:28:47 +08:00
qazal	1d90be2cff	match kernelize API in process replay (#9948 )	2025-04-21 05:23:41 +08:00
qazal	e20ef7196a	Tensor.kernelize (#9845 ) * add kernelize * remove that * kernelize returns self * update abstractions2.py * kernelize in test_schedule * temp: assert BUFFER_VIEW's existence * ASSIGN must have a buffer or subbuffer target * assert and shrink * fix * padded setitem * var * toposort once * extra * base_buffer * end with BUFFER_VIEW * setitem for disk * test_setitem_becomes_subbuffer * mul slice test * torch backend fix 1 * non-deterministic * keep subbuffer	2025-04-20 20:53:49 +08:00
qazal	dd16087f62	fold double ASSIGN to same target (#9941 )	2025-04-20 19:06:38 +08:00
qazal	9a9aba4cd5	setitem tests (some failing) from kernelize (#9940 )	2025-04-20 18:47:55 +08:00
chenyu	6c30948df6	hand_coded_optimizations returns list[Opt] [pr] (#9938 ) new api looks like `k.apply_opts(hand_coded_optimizations(k))`	2025-04-19 20:26:59 -04:00
chenyu	720f20865b	remove required_optimizations (#9848 )	2025-04-19 16:51:16 -04:00
Ignacio Sica	023b1c28a2	`test_tensor_cores_padded` refactor (#9724 ) * set pad t 3 for amd padded tc test * change pad for amd regardless CI * test tc padded uops and correctness separately * add test_tensor_cores_padded_uops test to ci * remove redundant chack for amd device * cleanup	2025-04-18 17:05:54 -03:00
qazal	b58decac0c	fix diamond assigns before mapping tensors UOps to assigns (#9855 ) * keep tensor_map until diamond assign fixup * ctx	2025-04-18 14:17:43 +03:00
George Hotz	aa98aff4cd	don't use ops name, just keep sink (#9922 ) * don't use ops name, just keep sink * fix test * endif sink	2025-04-18 08:59:18 +01:00
George Hotz	8919370c76	hotfix: fix test_save_all_dtypes on METAL	2025-04-18 08:42:31 +01:00
qazal	16dfe0a902	upstream remu (#9921 )	2025-04-18 01:57:36 +03:00
chenyu	f5256e0020	Kernel.apply_opts [pr] (#9917 ) * Kernel.apply_opts [pr] updated all `for opt in`. also updated a few test_liinearizer tests to not implcitly depend on hand_coded_optimization * not you yet	2025-04-17 08:00:56 -04:00
Eitan Turok	2c7c205bc5	Fix dtype comparisons in vectorized transcendental + tests (#9794 ) * init test * cleanup * init * update * fix * fix python runtime for vectorized code * awesome helper * update * update * cleanup * more cleaning * cleanup more * fix tests * more cleaning * cleanup more * fix * even cleaner * failing tests is sad * cleanup * better name * make tests pass * remove vec from python runtime * remove vec from eval_uop * remove expected failues * better name	2025-04-16 08:06:12 -04:00
geohotstan	4e8f25109a	Revert "ONNX add output shape validation (#9720 )" (#9904 ) This reverts commit `ac713e04db`.	2025-04-16 03:15:56 -04:00
pkotzbach	5849c43382	FP8s part 1 (#9887 ) * fp8s part 1 * prettier * fixes * fixes * remove stuff that should be in next pr * revert * add creation --------- Co-authored-by: pkotzbach <pawkotz@gmail.com>	2025-04-15 11:20:02 -04:00
nimlgen	83ae83d871	compare amd and am to cpu as well (#9896 )	2025-04-15 13:32:18 +03:00
nimlgen	23a95dd84d	script to compare amd and am kerns (#9889 ) * script to compare amd and am kerns * tool * is it used???	2025-04-15 00:11:22 +03:00
chenyu	ce454793e6	support specifying dtype for Tensor.linear (#9886 )	2025-04-14 13:55:11 -04:00
George Hotz	44e4934167	fast pattern matcher [pr] (#9737 ) * FastPatternMatcher * works without that * fix test pickle * strict len * compile match function * dynamic compile * fast * faster * compile * track * a lot faster * clean up * dup or * faster and simpler * fast match doesn't support store * plane * minor refactor * real speed * don't imply return None * upat * fix test * heard you wanted more speed * no generator * split cf * early fixup * fxn fixup * reconstruct_function * Revert "reconstruct_function" This reverts commit `37dac010ab`. * simpler stuff * too big * upat compile error * cleanups * don't cache that * cleanups * 10 -> 15	2025-04-14 15:24:41 +01:00
qazal	e201bc3e93	process replay kernel asts in toposort order [pr] (#9869 ) * process replay kernel asts in toposort order [pr] * use HEAD replay	2025-04-13 17:20:34 +08:00
Alexey Zaytsev	7dda6aae7d	Skip CLOUD in external_test_example (#9857 ) Closes #9814	2025-04-12 10:17:44 +08:00
George Hotz	dd52951dd0	fix single kernel softmax with cast (#9842 ) * fix single kernel softmax with cast * tolerate none * 3e-4 * skip on dtype	2025-04-11 12:12:02 +08:00
chenyu	8c6299bced	move hand_coded_optimizations to heuristic.py [pr] (#9844 ) * move hand_coded_optimizations to heuristic.py [pr] also folded all long lines * make a copy and rename self -> k * fix test	2025-04-10 23:40:16 -04:00
chenyu	e0ec8be37d	use CPU for test_schedule_ring (#9843 ) * use CPU for test_schedule_ring * why pre-commit is good	2025-04-10 23:20:53 -04:00
qazal	fbc6aa53d4	script for local process_replay + fix viz name [pr] (#9837 )	2025-04-11 00:39:18 +08:00
qazal	16956b79de	canonicalize Device.DEFAULT (#9835 )	2025-04-10 23:02:11 +08:00
George Hotz	f666dd14eb	fix get reduce contraction with test (#9834 )	2025-04-10 22:24:21 +08:00
chenyu	7fa5f29582	add test_embedding to test_softmax_fusion (#9832 )	2025-04-10 08:25:34 -04:00
George Hotz	53f0b2aad7	fix infinite loop in flash attention (#9827 ) * fix infinite loop in flash attention * get_contraction_with_reduce * skip that test * SINGLE_KERNEL_SOFTMAX + fix multi * default IGNORE_OOB * print change	2025-04-10 20:06:44 +08:00
qazal	16afe04f45	move process replay to grouper (#9830 ) * simpler * sched	2025-04-10 18:27:42 +08:00
chenyu	c8f47c1d07	not_support_multi_device helper (#9831 ) unify the test helper to skip ci device that does not support multi	2025-04-10 05:25:29 -04:00
chenyu	c462162db8	update benchmark bert scripts with BS and ACC_DTYPE (#9826 ) BS=16, ACC_DTYPE=half for tinybox, BS=128, ACC_DTYPE=float for mi300x	2025-04-10 02:06:02 -04:00
qazal	498a2bf738	add err handling tests to viz + cleanups (#9825 ) * cleanup * add err handling tests to viz + cleanups * lint	2025-04-10 14:05:05 +08:00
George Hotz	fce432d2e3	Ops.FUSE makes softmax a single kernel (#9808 ) * KERNELIZE makes softmax a single kernel * single kernel works * softmax works * broken * correct * skip that test * kernelize tests * rename to fuse * better reduce_push_add_ones code * correct now * cleanups * oops * return None if we can't push ones * rename + docs * atol fixes group * flash attention broken test	2025-04-09 22:56:28 +08:00

4667 Commits