tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-25 14:58:46 -05:00

Author	SHA1	Message	Date
George Hotz	b4eb876d5a	kernel.py no longer permutes reduce axis [pr] (#10968 ) * kernel.py no longer permutes reduce axis [pr] * delete tests that handcode uops * regen of sops is broken... * put import back * just remove that * disable those tests	2025-06-26 17:44:58 -07:00
Ignacio Sica	579194f523	remove some linearize calls from tests 2 [pr] (#10992 ) * refactor count_float4 to take uops as input instead of kernel * remove some calls to linearize in test_linearizer * remove some more calls * remove one more call	2025-06-26 18:22:27 -03:00
Ignacio Sica	21f1c4cc09	remove some linearize calls from tests [pr] (#10978 ) * remove some linearize calls from tests speed_compare_cuda_ptx test_uop_spec test_linearizer test_uops test_winograd * more clear assert message	2025-06-25 12:37:17 -07:00
Ignacio Sica	98d2cde293	revert tc_group feature (#10971 )	2025-06-24 20:58:13 -07:00
George Hotz	8a65720528	hotfix: disable test_tensor_core_opts_group test on real metal	2025-06-24 15:21:33 -07:00
George Hotz	8743ca40e2	force reduce to be in axis order (#10837 ) * force reduce to be in axis order * disable rule causing loop * disable that rule * no ra there * only move non reduce * fix tests	2025-06-24 13:00:16 -07:00
Ignacio Sica	956a8391a5	minor cleanup on test_tensor_core_opts tests (#10924 ) * minor cleanup on test_tensor_core_opts tests Tests now notify when skipped. Before, they silently skipped if the backend didn't have half precision and accumulation. Also cleaned up atol and rtol setup * refactor test_tensor_core_opts_group --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-06-23 16:30:21 -07:00
Ignacio Sica	b8d09a1dae	tc with group/grouptop (#10903 )	2025-06-23 09:58:41 -07:00
George Hotz	92678e59ee	move kernel to opt (#10899 )	2025-06-20 15:22:28 -07:00
George Hotz	32e9949052	rename lazydata to uop (#10698 )	2025-06-08 08:42:22 -07:00
qazal	5b59728c75	refactor LOAD(DEFINE_GLOBAL, VIEW) in kernels to LOAD(VIEW(DEFINE_GLOBAL)) (#10541 ) * changes to core tinygrad * fixups pt1 TC=3 docs/abstractions2.py IMAGE=2 test_quantize_dsp test_schedule * more tests * green now * images stay images	2025-05-30 14:27:58 +03:00
qazal	bbf05110a2	use kernelize in TestLinearizer.test_indexing_multireduce [pr] (#10571 )	2025-05-30 11:27:09 +03:00
qazal	9169dcfb49	do not create kernels with more inputs than the backend allows (#10510 ) * work * no itertools + top down pass * clean viz * python can do that * webgpu * gbarrier of gbarrier is gbarrier * device can be tuple * bug in toposort * failing test for gated toposort * contiguous of gbarrier is gbarrier * check for binops * Revert "check for binops" This reverts commit `53e3cdf720`. * viz + match on gbarrier, self exists by default * alt * green now * cleanup	2025-05-26 18:02:03 +03:00
George Hotz	411392dfb7	move files into uop dir (#10399 ) * move files into uop dir [pr] * tinygrad.uop is a thing * fix uop docs, no pr * fix viz	2025-05-18 11:38:28 -07:00
Ignacio Sica	8f79492c75	fix test_tensor_cores_codegen for ptx renderer (#10119 )	2025-05-01 21:52:36 -03:00
Ignacio Sica	bf5fb97498	fix `AMD_LLVM` bf16 tc for `gfx1100` (#10102 ) * fix amd_llvm bf16 tc * cleanup pattern	2025-04-30 20:06:38 -03:00
Ignacio Sica	bda116d773	fix `use_tensor_cores` propagation (#10048 ) * propagate use_tensor_cores * add use_tensor_core to arg in test and search * bugfix * get TC val from ContextVar in search * revert minor space change * add tc emulation test to ci and benchmark * revert * revert whitespace change * remove test for ptx * add comment and remove llvm test run	2025-04-28 19:30:50 -03:00
George Hotz	4c242b0483	hotfix: tests all pass on metal local	2025-04-28 12:09:00 -04:00
qazal	d13c100981	don't sort dims in verify_sink_dims [pr] (#10059 ) * don't sort dims in verify_sink_dims [pr] * 1 can exist with n * put process_replay warn last * assert shape is the same * bring that back	2025-04-26 23:24:30 +08:00
Ignacio Sica	76a86735c0	hotfix `amd` bf16 is supported case (#10039 ) * hotfix amd and amd_llvm * bf16 not supported in ci * hotfix amd_llvm is not a device * remove default * dont gate on ci and amd_llvm * minor cleanup * skip bf16 tc test for amd_llvm	2025-04-24 21:29:27 -03:00
Ignacio Sica	b4f823acbe	fix helper_tc_allclose (#9606 ) * fix helper_tc_allclose * cleanup * hotfix * cleanup * cleanup * check real buffer and add cast for bf16 * cleanup * fix padded for ops_python * avoid assert on amd emulated tc * swap dimensions * revert, should have nothing to do with padded * revert fix, should not go in this pr * remove skip	2025-04-24 18:36:40 -03:00
Ignacio Sica	51ca19d061	set `test_tensor_cores_padded_amd` to expectedFailure (#10036 ) * init * add expected failure to correctly track progress * hotfix * skip for amd_llvm as well * add skip * add pr number * move comment to amd test * change reason	2025-04-24 17:11:40 -03:00
Ignacio Sica	373ca59b7f	use is_dtype_supported to check dtype support in tc tests (#10035 )	2025-04-24 14:59:14 -03:00
George Hotz	2ed3acd767	toposort is a function [pr] (#10004 )	2025-04-23 16:25:03 +01:00
chenyu	6c30948df6	hand_coded_optimizations returns list[Opt] [pr] (#9938 ) new api looks like `k.apply_opts(hand_coded_optimizations(k))`	2025-04-19 20:26:59 -04:00
Ignacio Sica	023b1c28a2	`test_tensor_cores_padded` refactor (#9724 ) * set pad to 3 for amd padded tc test * change pad for amd regardless CI * test tc padded uops and correctness separately * add test_tensor_cores_padded_uops test to ci * remove redundant check for amd device * cleanup	2025-04-18 17:05:54 -03:00
George Hotz	aa98aff4cd	don't use ops name, just keep sink (#9922 ) * don't use ops name, just keep sink * fix test * endif sink	2025-04-18 08:59:18 +01:00
chenyu	f5256e0020	Kernel.apply_opts [pr] (#9917 ) * Kernel.apply_opts [pr] updated all `for opt in`. also updated a few test_linearizer tests to not implicitly depend on hand_coded_optimization * not you yet	2025-04-17 08:00:56 -04:00
chenyu	8c6299bced	move hand_coded_optimizations to heuristic.py [pr] (#9844 ) * move hand_coded_optimizations to heuristic.py [pr] also folded all long lines * make a copy and rename self -> k * fix test	2025-04-10 23:40:16 -04:00
George Hotz	78caf55154	Revert "FP8 support on NVIDIA (#8631 )" This reverts commit `2c8e4ea865`.	2025-04-09 12:27:41 +08:00
pkotzbach	2c8e4ea865	FP8 support on NVIDIA (#8631 ) * squashed fp8 commits * tensorcore start * minor changes * pre-commit * pylint * Delete fp8mul.cu * clean * small bugfix * fix test_dtype * fix test_dtype_alu * add EMULATE_CUDA_SM89 * fix ci * fix test_linearizer * fix test_linearizer * fix swizzle * add debug to simple_matmul * fixed swizzle * python emulator * refactor python emulator * setup fix * numpy setup * ml_dtypes only in emulate_cuda_sm89 * fix pylint * fix tests * fix mypy * fix mypy * fix ruff * done python emulator * add acc type * tests * mypy * clean code * add cuda tensor core tests to CI * minor fix * clean test_dtype.py * clean cstyle.py * clean test_ops.py * fix test * fix test * whitespaces * pylint * pylint * amd? * amd? * amd * reduce lines * mockgpu remove * fix * ruff * ruff * fix mypy * ruff * test only for cuda * fixed formatting * small fixes * small fix * least_upper_dtype if fp8s not supported * log and reciprocal are supported for fp8s * ops python fixes * dtypes.fp8s use * e4m3 + e5m2 result dtype test * truncate linter fix --------- Co-authored-by: pkotzbach <pawkotz@gmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-04-08 21:54:04 -04:00
Ignacio Sica	58785181a8	AMD `bf16xf32` TC (#9717 ) * dont test bf16 for emulated amd tc * skip bf16 tc test in ci * skip bf16 for AMD in test_tensor_cores_codegen * add simple bf16 gemm test to benchmark	2025-04-07 11:41:04 +08:00
George Hotz	cac8bcf8b5	use Ops.REDUCE (#9721 ) * decrease bert python time [pr] * order copies * Revert "order copies" This reverts commit `3f62c8693b`. * rewrite count * Ops.REDUCE * acc first in the add chain * Fix tensor core acc * arange patterns look good * fix multireduce gate * reduce rewrite rule * bump that to 15 minutes * multiwmma isn't fusing * gep through wmma is gep pushing * bump that timeout too, it's all env setup * add failing test	2025-04-04 10:14:34 +08:00
Ignacio Sica	2d6d8b7355	add bf16 mfma support (#9695 ) * add bf16 mfma support * skip tc if emulated_amd and dtypes is bf16 * hotfix	2025-04-02 21:44:49 +08:00
George Hotz	e78e8722dc	Revert "LDS noop and spec (#9669 )" (#9691 ) This reverts commit `870b545ace`. Co-authored-by: Ignacio Sica <mignacio.sica@gmail.com>	2025-04-02 15:31:32 +08:00
Ignacio Sica	870b545ace	LDS noop and spec (#9669 ) * init lds noop and lds_0 spec * refactor lds helper test * fix typo * test all lds at the same time * change comment * comment * start test_lds_full * test_lds_tc * add tc spec	2025-04-01 18:44:55 +08:00
b1tg	d9af4cfc1b	AMD_LLVM: tensor cores support (#9613 ) * tensor cores support * test tensor cores codegen * use rewrite rules --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-04-01 09:56:27 +08:00
Ignacio Sica	1444069c09	Uppercase K for dimension and lowercase k for kernel in linearizer tc helper test (#9649 )	2025-03-31 19:05:36 +08:00
Ignacio Sica	baa67fd124	Uppercase N and M (standalone syntax change) (#9647 )	2025-03-31 18:45:30 +08:00
chenyu	f8976dd2eb	enable more webgpu tests (#9502 ) OSX has larger buffer number limit, and it supports fp16 now	2025-03-18 23:03:54 -04:00
George Hotz	117b7a16ef	VALIDATE_WITH_CPU [pr] (#9488 ) * VALIDATE_WITH_CPU [pr] * fix test	2025-03-18 15:15:04 +08:00
chenyu	01e8b60911	acc_dtype -> dtype (#9402 ) matched numpy and torch	2025-03-10 16:05:30 -04:00
George Hotz	ece0a0f305	use empty for test instead of rand (#9332 )	2025-03-03 16:19:06 +08:00
George Hotz	2cc4cb74f0	reorder binops (#9328 ) * reorder binops * test improvements + fix string tests * ugh, okay this	2025-03-03 14:58:18 +08:00
qazal	2eab8021fb	remove inputs+outputs attributes from ScheduleItem [pr] (#9192 ) * remove inputs/outputs from ScheduleItem * fix test_linearizer * fix test_conv_shapetracker * fix test_schedule + lint * test_image_dtype + multitensor + search	2025-02-21 13:48:11 +01:00
chenyu	2e7c2780a9	CLANG -> CPU (#9189 )	2025-02-20 18:03:09 -05:00
George Hotz	a4dab3ec3f	add name uop (#9149 ) * add name uop, TODO: refactor renderer to use * renderer uses name uop * fix tests * render * ptx	2025-02-18 15:26:58 +08:00
Ahmed Harmouche	59fe45f947	Solve get_grouped_dims does not split issue (#9085 ) * Solve dims too large errors on webgpu * Simplify divisor find * Test square root divisor * Fix lint * Refactor into group_dims and split_dims * Refactor * Fix lint * Add back max check in _group_dims * Prefer grouping over split --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-16 19:57:29 -05:00
chenyu	f53b819648	UOps. -> Ops. [pr] (#9044 ) updated the comments and doc except extra	2025-02-12 12:53:23 -05:00
Ignacio Sica	aaed315fee	add AMX support to LLVM (#8957 ) * init amx support for llvm * revert elf changes * fix attributes for AMX asm calls * add comments * add llvm amx job to benchmarks * cleanup * cleanup * hotfix: improve comments * comment for aux buffers * hotfix: * move amx_tc to ClangRenderer * merge master * refactor * add docs * add corsix docs reference --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-12 16:01:18 +08:00


412 Commits