tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-23 05:48:08 -05:00

Author	SHA1	Message	Date
George Hotz	44e4934167	fast pattern matcher [pr] (#9737) * FastPatternMatcher * works without that * fix test pickle * strict len * compile match function * dynamic compile * fast * faster * compile * track * a lot faster * clean up * dup or * faster and simpler * fast match doesn't support store * plane * minor refactor * real speed * don't imply return None * upat * fix test * heard you wanted more speed * no generator * split cf * early fixup * fxn fixup * reconstruct_function * Revert "reconstruct_function" This reverts commit `37dac010ab`. * simpler stuff * too big * upat compile error * cleanups * don't cache that * cleanups * 10 -> 15	2025-04-14 15:24:41 +01:00
qazal	e201bc3e93	process replay kernel asts in toposort order [pr] (#9869) * process replay kernel asts in toposort order [pr] * use HEAD replay	2025-04-13 17:20:34 +08:00
Alexey Zaytsev	7dda6aae7d	Skip CLOUD in external_test_example (#9857) Closes #9814	2025-04-12 10:17:44 +08:00
George Hotz	dd52951dd0	fix single kernel softmax with cast (#9842) * fix single kernel softmax with cast * tolerate none * 3e-4 * skip on dtype	2025-04-11 12:12:02 +08:00
chenyu	8c6299bced	move hand_coded_optimizations to heuristic.py [pr] (#9844) * move hand_coded_optimizations to heuristic.py [pr] also folded all long lines * make a copy and rename self -> k * fix test	2025-04-10 23:40:16 -04:00
chenyu	e0ec8be37d	use CPU for test_schedule_ring (#9843) * use CPU for test_schedule_ring * why pre-commit is good	2025-04-10 23:20:53 -04:00
qazal	fbc6aa53d4	script for local process_replay + fix viz name [pr] (#9837)	2025-04-11 00:39:18 +08:00
qazal	16956b79de	canonicalize Device.DEFAULT (#9835)	2025-04-10 23:02:11 +08:00
George Hotz	f666dd14eb	fix get reduce contraction with test (#9834)	2025-04-10 22:24:21 +08:00
chenyu	7fa5f29582	add test_embedding to test_softmax_fusion (#9832)	2025-04-10 08:25:34 -04:00
George Hotz	53f0b2aad7	fix infinite loop in flash attention (#9827) * fix infinite loop in flash attention * get_contraction_with_reduce * skip that test * SINGLE_KERNEL_SOFTMAX + fix multi * default IGNORE_OOB * print change	2025-04-10 20:06:44 +08:00
qazal	16afe04f45	move process replay to grouper (#9830) * simpler * sched	2025-04-10 18:27:42 +08:00
chenyu	c8f47c1d07	not_support_multi_device helper (#9831) unify the test helper to skip ci device that does not support multi	2025-04-10 05:25:29 -04:00
chenyu	c462162db8	update benchmark bert scripts with BS and ACC_DTYPE (#9826) BS=16, ACC_DTYPE=half for tinybox, BS=128, ACC_DTYPE=float for mi300x	2025-04-10 02:06:02 -04:00
qazal	498a2bf738	add err handling tests to viz + cleanups (#9825) * cleanup * add err handling tests to viz + cleanups * lint	2025-04-10 14:05:05 +08:00
George Hotz	fce432d2e3	Ops.FUSE makes softmax a single kernel (#9808) * KERNELIZE makes softmax a single kernel * single kernel works * softmax works * broken * correct * skip that test * kernelize tests * rename to fuse * better reduce_push_add_ones code * correct now * cleanups * oops * return None if we can't push ones * rename + docs * atol fixes group * flash attention broken test	2025-04-09 22:56:28 +08:00
qazal	3bd992dc95	multi stage graph_rewrite_map (#9803) * multistage graph_rewrite_map * s/merge_map/input_map * build up kernel_map from the tensor_map	2025-04-09 15:59:45 +08:00
George Hotz	78caf55154	Revert "FP8 support on NVIDIA (#8631)" This reverts commit `2c8e4ea865`.	2025-04-09 12:27:41 +08:00
George Hotz	d1505137ad	Revert "move TestOpsFp8s skipTest (#9797)" This reverts commit `a3aaf92b21`.	2025-04-09 12:27:40 +08:00
chenyu	a3aaf92b21	move TestOpsFp8s skipTest (#9797) so get_available_devices is not called when running other tests	2025-04-08 22:44:07 -04:00
pkotzbach	2c8e4ea865	FP8 support on NVIDIA (#8631) * squashed fp8 commits * tensorcore start * minor changes * pre-commit * pylint * Delete fp8mul.cu * clean * small bugfix * fix test_dtype * fix test_dtype_alu * add EMULATE_CUDA_SM89 * fix ci * fix test_linearizer * fix test_linearizer * fix swizzle * add debug to simple_matmul * fixed swizzle * python emulator * refactor python emulator * setup fix * numpy setup * ml_dtypes only in emulate_cuda_sm89 * fix pylint * fix tests * fix mypy * fix mypy * fix ruff * done python emulator * add acc type * tests * mypy * clean code * add cuda tensor core tests to CI * minor fix * clean test_dtype.py * clean cstyle.py * clean test_ops.py * fix test * fix test * whitespaces * pylint * pylint * amd? * amd? * amd * reduce lines * mockgpu remove * fix * ruff * ruff * fix mypy * ruff * test only for cuda * fixed formatting * small fixes * small fix * least_upper_dtype if fp8s not supported * log and reciprocal are supported for fp8s * ops python fixes * dtypes.fp8s use * e4m3 + e5m2 result dtype test * truncate linter fix --------- Co-authored-by: pkotzbach <pawkotz@gmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-04-08 21:54:04 -04:00
qazal	f13e9cf2d9	move view_left to grouper.py + tiny reorders [pr] (#9780) * move view_left to grouper.py [pr] * reorder grouper * test_schedule	2025-04-08 15:39:28 +08:00
chenyu	7a28133b37	failed test for single softmax backward (#9778) getting RecursionError with DONT_GROUP_REDUCES=1	2025-04-08 02:36:32 -04:00
George Hotz	fefee5d3ab	single kernel softmax (#9776) * real single kernel softmax * cleanup * fix blockend insertion * add to bert test	2025-04-08 12:35:48 +08:00
qazal	9963bb51e0	grouper tests cleanups [pr] (#9777) * grouper tests cleanups [pr] * viz * tuple * whitespace	2025-04-08 12:33:11 +08:00
George Hotz	db22094d35	hotfix: update softmax fusion test	2025-04-08 11:23:19 +08:00
Eitan Turok	bb7922b95f	Vectorize Transcendental Regression Tests (#9753) * init test * cleanup	2025-04-08 01:27:39 +08:00
Sieds Lykles	07d1aefaf4	fast idiv (#9755) * fast idiv with tests and fuzzer * Add todo comment * Add env variable to toggle fast_idiv * Move env check * Add fuzz fast_idiv to ci --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-04-07 08:32:24 -04:00
nimlgen	fa888ee077	minor test cleanups (#9770) * fix test_graph on max * pcie5	2025-04-07 15:29:12 +03:00
qazal	891322fd51	split into grouper.py (#9768) * split into grouper.py * update tests * reorder	2025-04-07 18:40:59 +08:00
qazal	8ddb1357c0	fix UPat.location after pickle (#9763) * fix UPat.location after pickle [pr] * named upat test	2025-04-07 15:16:42 +08:00
chenyu	b190d85ad7	benchmark script bert softmax (#9759)	2025-04-07 00:31:18 -04:00
Ignacio Sica	58785181a8	AMD `bf16xf32` TC (#9717) * dont test bf16 for emulated amd tc * skip bf16 tc test in ci * skip bf16 for AMD in test_tensor_cores_codegen * add simple bf16 gemm test to benchmark	2025-04-07 11:41:04 +08:00
chenyu	43e4565148	weighted linear in external_benchmark_bert_matmuls (#9757) include the linear to get qkv, and permute so that stride matches with the real run	2025-04-06 23:35:42 -04:00
chenyu	8a585dc5c1	benchmark script for matmuls in bert (#9752) 2 main matmuls in the bert layers. getting these to be fast makes bert fast	2025-04-06 19:34:25 +08:00
nimlgen	5f7c79676f	jit: prune independent copies (#9749) * jit: prune independent copies * linter * check kernel cnt	2025-04-05 20:50:28 +03:00
nimlgen	c2573b247c	jit: rename optimize_weights -> replan_buffers_memory_layout (#9751)	2025-04-05 20:35:15 +03:00
chenyu	407ca54382	symbolic fold double where (#9436) * symbolic fold double where a.where(b.where(c, d), d) -> (a & b).where(c, d). a pattern in optimizer * test case	2025-04-05 05:12:17 -04:00
Sieds Lykles	9c2fc695b5	cond.logical_not().where(a,b) -> cond.where(b,a) (#9741) * Add rule for negation in where, simplifies arange patterns * 0 becomes 0.0 again * Only if cond is bool * ne is never None * Add a test --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-04-04 19:13:32 -04:00
chenyu	fe998798fb	linearizer failure test for OUT OF BOUNDS ACCESS (#9738)	2025-04-04 03:48:43 -04:00
George Hotz	8b5a523743	fix minimum length in pattern matcher (#9736)	2025-04-04 14:57:01 +08:00
George Hotz	926b0bcc57	cache folded upcast [pr] (#9733)	2025-04-04 11:23:19 +08:00
George Hotz	cac8bcf8b5	use Ops.REDUCE (#9721) * decrease bert python time [pr] * order copies * Revert "order copies" This reverts commit `3f62c8693b`. * rewrite count * Ops.REDUCE * acc first in the add chain * Fix tensor core acc * arange patterns look good * fix multireduce gate * reduce rewrite rule * bump that to 15 minutes * multiwmma isn't fusing * gep through wmma is gep pushing * bump that timeout too, it's all env setup * add failing test	2025-04-04 10:14:34 +08:00
nimlgen	949459fdd6	jit: fix deallocate on unallocated buffers in free_intermediates (#9699)	2025-04-03 18:32:51 +03:00
geohotstan	ac713e04db	ONNX add output shape validation (#9720) * add output shape validation and remove support for sequence_type * nit better err msg * add sequence_type back * improve err msg * Revert "improve err msg" This reverts commit `dc9eaea4bb`. * Revert "add sequence_type back" This reverts commit `288170b2d9`. * do explicit shape equality * small nit	2025-04-03 05:44:53 -04:00
chenyu	79145e3d40	cleanup truncate_bf16 [pr] (#9725) use torch bfloat16 for groundtruth in test. also a TODO for discrepancy	2025-04-03 05:43:49 -04:00
Ignacio Sica	bc2d86195e	increase test tolerance (#9719)	2025-04-03 15:24:09 +08:00
George Hotz	49dafe6d43	add gc tests [pr] (#9718) * add gc tests [pr] * del * more gc tests * add NullGraph	2025-04-03 14:08:32 +08:00
Ignacio Sica	bc91fffc5d	fix gated store with index in python backend (#9703) * add default gate in index * assert store * add TestRendererFailures - move test_gated_store_with_alu to new TestRenderFailures class for tests that fail on multiple renderers - add test_renderer_failures.py run on python CI * add test for gated index in 2d * test TestRenderFailures	2025-04-03 12:48:28 +08:00
geohotstan	e1d7e47cca	fix ONNX IsInf unintended dtype promotion (#9711) * add IsInf * add corresponding test * that float16 is kinda silly	2025-04-02 22:46:15 -04:00


4433 Commits