Commit Graph

380 Commits

George Hotz
affd83961c small changes from define_reg (#11327)
* small changes from define_reg

* fix webgpu
2025-07-22 11:11:48 -07:00
George Hotz
3b674df34b generic changes from define_reg_2 (#11315)
* generic changes from define_reg_2

* fix for ptx

* ugh, that one
2025-07-21 15:14:06 -07:00
chenyu
54924f9969 type remove Union and Optional [pr] (#11283)
use `|` for consistency
2025-07-19 14:05:52 -04:00
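
A minimal illustration of the PEP 604 spelling this commit standardizes on (example functions are ours, not from the repo):

```python
from typing import Optional, Union

# before: typing module aliases
def f_old(x: Union[int, float], y: Optional[str] = None) -> Union[int, float]: ...

# after: the `|` operator, equivalent and consistent across the codebase
def f_new(x: int | float, y: str | None = None) -> int | float: ...
```
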
chenyu
ec3efd2919 move upcast before reduce (#11250)
* move upcast before reduce

upcast goes to end of global+local+upcast

* r_196_32_4_24_8
2025-07-18 14:42:15 -04:00
chenyu
522dc72f08 remove Kernel.local_dims [pr] (#11268)
* remove Kernel.local_dims [pr]

also not needed

* fix test_matvec
2025-07-16 17:46:19 -04:00
chenyu
c8e5c4d7c3 insert_before -> insert_at [pr] (#11257)
more precise
2025-07-15 17:44:34 -04:00
chenyu
b6662096cb remove more first_reduce [pr] (#11239) 2025-07-14 19:13:44 -04:00
chenyu
eb8e17ef59 remove most of the first_upcast [pr] (#11238) 2025-07-14 16:54:24 -04:00
chenyu
674dc28505 remove Kernel.full_unupcasted_shape [pr] (#11215)
decomp to shape_len and first_upcast to get the last upcast-able dim
2025-07-13 13:56:23 -04:00
chenyu
2b48b961be fix a few broken AMX tests (#11204) 2025-07-12 21:42:38 -04:00
chenyu
a0438012af remove Kernel.get_program [pr] (#11203) 2025-07-12 20:50:29 -04:00
chenyu
6283d50224 DEPRECATED_linearize -> to_program [pr] (#11198) 2025-07-12 13:46:20 -04:00
George Hotz
2893feb9f6 cleanups for kernel.py (#11143)
* cleanups for kernel.py

* fixups
2025-07-08 18:10:25 -07:00
George Hotz
359bed74f8 axis type tracking [pr] (#11137)
* axis type tracking [pr]

* keep update_info

* keep legacy colors

* update tests to apply_opt
2025-07-08 14:16:25 -07:00
George Hotz
0597735f28 remove TC=3 not porting this (#11045) 2025-06-30 15:12:49 -07:00
chenyu
126fcf4129 clean up AMD_LLVM in tests (#11021) 2025-06-28 22:45:47 -04:00
George Hotz
be53ef4f0a rename DEFINE_ACC -> DEFINE_REG (#11006)
* rename DEFINE_ACC -> DEFINE_REG

* add CMPEQ to groupops
2025-06-27 11:09:25 -07:00
George Hotz
5a1911b7c4 apply the global dims late (#11002)
* apply the global dims late [pr]

* late gpudims

* tests passing

* remove the random local_dims inc

* simpler
2025-06-27 09:54:34 -07:00
George Hotz
b4eb876d5a kernel.py no longer permutes reduce axis [pr] (#10968)
* kernel.py no longer permutes reduce axis [pr]

* delete tests that handcode uops

* regen of sops is broken...

* put import back

* just remove that

* disable those tests
2025-06-26 17:44:58 -07:00
Ignacio Sica
579194f523 remove some linearize calls from tests 2 [pr] (#10992)
* refactor count_float4 to take uops as input instead of kernel

* remove some calls to linearize in test_linearizer

* remove some more calls

* remove one more call
2025-06-26 18:22:27 -03:00
Ignacio Sica
21f1c4cc09 remove some linearize calls from tests [pr] (#10978)
* remove some linearize calls from tests

speed_compare_cuda_ptx
test_uop_spec
test_linearizer
test_uops
test_winograd

* clearer assert message
2025-06-25 12:37:17 -07:00
Ignacio Sica
98d2cde293 revert tc_group feature (#10971) 2025-06-24 20:58:13 -07:00
George Hotz
8a65720528 hotfix: disable test_tensor_core_opts_group test on real metal 2025-06-24 15:21:33 -07:00
George Hotz
8743ca40e2 force reduce to be in axis order (#10837)
* force reduce to be in axis order

* disable rule causing loop

* disable that rule

* no ra there

* only move non reduce

* fix tests
2025-06-24 13:00:16 -07:00
Ignacio Sica
956a8391a5 minor cleanup on test_tensor_core_opts tests (#10924)
* minor cleanup on test_tensor_core_opts tests

Tests now notify when skipped
Before, they silently skipped if the backend didn't have half precision and accumulation.
Also cleaned up atol and rtol setup

* refactor test_tensor_core_opts_group

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-06-23 16:30:21 -07:00
Ignacio Sica
b8d09a1dae tc with group/grouptop (#10903) 2025-06-23 09:58:41 -07:00
George Hotz
92678e59ee move kernel to opt (#10899) 2025-06-20 15:22:28 -07:00
George Hotz
32e9949052 rename lazydata to uop (#10698) 2025-06-08 08:42:22 -07:00
qazal
5b59728c75 refactor LOAD(DEFINE_GLOBAL, VIEW) in kernels to LOAD(VIEW(DEFINE_GLOBAL)) (#10541)
* changes to core tinygrad

* fixups pt1

TC=3
docs/abstractions2.py
IMAGE=2
test_quantize_dsp
test_schedule

* more tests

* green now

* images stay images
2025-05-30 14:27:58 +03:00
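
The commit title carries the whole structural change; a toy sketch of the two trees using a stand-in dataclass (tinygrad's real UOp has more fields):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UOp:  # structural stand-in only, not tinygrad's UOp
    op: str
    src: tuple["UOp", ...] = ()

g = UOp("DEFINE_GLOBAL")
# before: the VIEW is a sibling source of the buffer on the LOAD
before = UOp("LOAD", (g, UOp("VIEW")))
# after: the VIEW wraps the buffer, so the LOAD has a single viewed source
after = UOp("LOAD", (UOp("VIEW", (g,)),))
```
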
qazal
bbf05110a2 use kernelize in TestLinearizer.test_indexing_multireduce [pr] (#10571) 2025-05-30 11:27:09 +03:00
qazal
9169dcfb49 do not create kernels with more inputs than the backend allows (#10510)
* work

* no itertools + top down pass

* clean viz

* python can do that

* webgpu

* gbarrier of gbarrier is gbarrier

* device can be tuple

* bug in toposort

* failing test for gated toposort

* contiguous of gbarrier is gbarrier

* check for binops

* Revert "check for binops"

This reverts commit 53e3cdf720.

* viz + match on gbarrier, self exists by default

* alt

* green now

* cleanup
2025-05-26 18:02:03 +03:00
George Hotz
411392dfb7 move files into uop dir (#10399)
* move files into uop dir [pr]

* tinygrad.uop is a thing

* fix uop docs, no pr

* fix viz
2025-05-18 11:38:28 -07:00
Ignacio Sica
8f79492c75 fix test_tensor_cores_codegen for ptx renderer (#10119) 2025-05-01 21:52:36 -03:00
Ignacio Sica
bf5fb97498 fix AMD_LLVM bf16 tc for gfx1100 (#10102)
* fix amd_llvm bf16 tc

* cleanup pattern
2025-04-30 20:06:38 -03:00
Ignacio Sica
bda116d773 fix use_tensor_cores propagation (#10048)
* propagate use_tensor_cores

* add use_tensor_core to arg in test and search

* bugfix

* get TC val from ContextVar in search

* revert minor space change

* add tc emulation test to ci and benchmark

* revert

* revert whitespace change

* remove test for ptx

* add comment and remove llvm test run
2025-04-28 19:30:50 -03:00
George Hotz
4c242b0483 hotfix: tests all pass on metal local 2025-04-28 12:09:00 -04:00
qazal
d13c100981 don't sort dims in verify_sink_dims [pr] (#10059)
* don't sort dims in verify_sink_dims [pr]

* 1 can exist with n

* put process_replay warn last

* assert shape is the same

* bring that back
2025-04-26 23:24:30 +08:00
Ignacio Sica
76a86735c0 hotfix amd bf16 is supported case (#10039)
* hotfix amd and amd_llvm

* bf16 not supported in ci

* hotfix amd_llvm is not a device

* remove default

* dont gate on ci and amd_llvm

* minor cleanup

* skip bf16 tc test for amd_llvm
2025-04-24 21:29:27 -03:00
Ignacio Sica
b4f823acbe fix helper_tc_allclose (#9606)
* fix helper_tc_allclose

* cleanup

* hotfix

* cleanup

* cleanup

* check real buffer and add cast for bf16

* cleanup

* fix padded for ops_python

* avoid assert on amd emulated tc

* swap dimensions

* revert, should have nothing to do with padded

* revert fix, should not go in this pr

* remove skip
2025-04-24 18:36:40 -03:00
Ignacio Sica
51ca19d061 set test_tensor_cores_padded_amd to expectedFailure (#10036)
* init

* add expected failure to correctly track progress

* hotfix

* skip for amd_llvm as well

* add skip

* add pr number

* move comment to amd test

* change reason
2025-04-24 17:11:40 -03:00
Ignacio Sica
373ca59b7f use is_dtype_supported to check dtype support in tc tests (#10035) 2025-04-24 14:59:14 -03:00
George Hotz
2ed3acd767 toposort is a function [pr] (#10004) 2025-04-23 16:25:03 +01:00
chenyu
6c30948df6 hand_coded_optimizations returns list[Opt] [pr] (#9938)
new api looks like `k.apply_opts(hand_coded_optimizations(k))`
2025-04-19 20:26:59 -04:00
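
A runnable mock of the shape of that api, with toy Opt and Kernel classes standing in for tinygrad's: the heuristic becomes a pure function that returns opts, and the kernel applies them.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Opt:  # toy stand-in for tinygrad's Opt
    op: str
    axis: int

@dataclass
class Kernel:  # toy stand-in; the real apply_opts validates each Opt
    applied: list[Opt] = field(default_factory=list)
    def apply_opts(self, opts: list[Opt]) -> None:
        self.applied.extend(opts)

def hand_coded_optimizations(k: Kernel) -> list[Opt]:
    # a real heuristic inspects k's shape; this just returns a fixed list
    return [Opt("UPCAST", 0)]

k = Kernel()
k.apply_opts(hand_coded_optimizations(k))  # the new api from the commit
```
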
Ignacio Sica
023b1c28a2 test_tensor_cores_padded refactor (#9724)
* set pad to 3 for amd padded tc test

* change pad for amd regardless of CI

* test tc padded uops and correctness separately

* add test_tensor_cores_padded_uops test to ci

* remove redundant check for amd device

* cleanup
2025-04-18 17:05:54 -03:00
George Hotz
aa98aff4cd don't use ops name, just keep sink (#9922)
* don't use ops name, just keep sink

* fix test

* endif sink
2025-04-18 08:59:18 +01:00
chenyu
f5256e0020 Kernel.apply_opts [pr] (#9917)
* Kernel.apply_opts [pr]

updated all `for opt in`. also updated a few test_linearizer tests to not implicitly depend on hand_coded_optimizations

* not you yet
2025-04-17 08:00:56 -04:00
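
The call-site change this commit made, sketched with a minimal stub (the new method subsumes the old per-opt loop):

```python
class K:  # stub kernel, illustration only
    def apply_opt(self, opt): print("applied", opt)
    def apply_opts(self, opts):
        for opt in opts: self.apply_opt(opt)

k, opts = K(), ["UPCAST(0)", "LOCAL(1)"]
for opt in opts: k.apply_opt(opt)  # old call-site pattern
k.apply_opts(opts)                 # new: one call replaces the loop
```
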
chenyu
8c6299bced move hand_coded_optimizations to heuristic.py [pr] (#9844)
* move hand_coded_optimizations to heuristic.py [pr]

also folded all long lines

* make a copy and rename self -> k

* fix test
2025-04-10 23:40:16 -04:00
George Hotz
78caf55154 Revert "FP8 support on NVIDIA (#8631)"
This reverts commit 2c8e4ea865.
2025-04-09 12:27:41 +08:00
pkotzbach
2c8e4ea865 FP8 support on NVIDIA (#8631)
* squashed fp8 commits

* tensorcore start

* minor changes

* pre-commit

* pylint

* Delete fp8mul.cu

* clean

* small bugfix

* fix test_dtype

* fix test_dtype_alu

* add EMULATE_CUDA_SM89

* fix ci

* fix test_linearizer

* fix test_linearizer

* fix swizzle

* add debug to simple_matmul

* fixed swizzle

* python emulator

* refactor python emulator

* setup fix

* numpy setup

* ml_dtypes only in emulate_cuda_sm89

* fix pylint

* fix tests

* fix mypy

* fix mypy

* fix ruff

* done python emulator

* add acc type

* tests

* mypy

* clean code

* add cuda tensor core tests to CI

* minor fix

* clean test_dtype.py

* clean cstyle.py

* clean test_ops.py

* fix test

* fix test

* whitespaces

* pylint

* pylint

* amd?

* amd?

* amd

* reduce lines

* mockgpu remove

* fix

* ruff

* ruff

* fix mypy

* ruff

* test only for cuda

* fixed formatting

* small fixes

* small fix

* least_upper_dtype if fp8s not supported

* log and reciprocal are supported for fp8s

* ops python fixes

* dtypes.fp8s use

* e4m3 + e5m2 result dtype test

* truncate linter fix

---------

Co-authored-by: pkotzbach <pawkotz@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-08 21:54:04 -04:00
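
The Python emulator path leans on ml_dtypes for the two FP8 formats mentioned above; a short illustration of their trade-off (our example, not tinygrad code):

```python
import numpy as np
import ml_dtypes  # pip install ml_dtypes

# e4m3: more mantissa bits (precision), max finite value 448
x = np.array([0.1, 1.5, 448.0], dtype=ml_dtypes.float8_e4m3fn)
# e5m2: more exponent bits (range), max finite value 57344
y = np.array([0.1, 1.5, 57344.0], dtype=ml_dtypes.float8_e5m2)

print(x.astype(np.float32))  # values rounded to nearest representable fp8
print(y.astype(np.float32))
```
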
Ignacio Sica
58785181a8 AMD bf16xf32 TC (#9717)
* dont test bf16 for emulated amd tc

* skip bf16 tc test in ci

* skip bf16 for AMD in test_tensor_cores_codegen

* add simple bf16 gemm test to benchmark
2025-04-07 11:41:04 +08:00