* work
* no itertools + top down pass
* clean viz
* python can do that
* webgpu
* gbarrier of gbarrier is gbarrier
* device can be tuple
* bug in toposort
* failing test for gated toposort
* contiguous of gbarrier is gbarrier
* check for binops
* Revert "check for binops"
This reverts commit 53e3cdf720.
* viz + match on gbarrier, self exists by default
* alt
* green now
* cleanup
* propagate use_tensor_cores
* add use_tensor_core to arg in test and search
* bugfix
* get TC val from ContextVar in search
* revert minor space change
* add tc emulation test to ci and benchmark
* revert
* revert whitespace change
* remove test for ptx
* add comment and remove llvm test run
* hotfix amd and amd_llvm
* bf16 not supported in ci
* hotfix amd_llvm is not a device
* remove default
* dont gate on ci and amd_llvm
* minor cleanup
* skip bf16 tc test for amd_llvm
* fix helper_tc_allclose
* cleanup
* hotfix
* cleanup
* cleanup
* check real buffer and add cast for bf16
* cleanup
* fix padded for ops_python
* avoid assert on amd emulated tc
* swap dimensions
* revert, should have nothing to do with padded
* revert fix, should not go in this pr
* remove skip
* init
* add expected failure to correctly track progress
* hotfix
* skip for amd_llvm as well
* add skip
* add pr number
* move comment to amd test
* change reason
* set pad to 3 for amd padded tc test
* change pad for amd regardless of CI
* test tc padded uops and correctness separately
* add test_tensor_cores_padded_uops test to ci
* remove redundant check for amd device
* cleanup
* Kernel.apply_opts [pr]
updated all `for opt in`. also updated a few test_linearizer tests to not implicitly depend on hand_coded_optimization
* not you yet
* dont test bf16 for emulated amd tc
* skip bf16 tc test in ci
* skip bf16 for AMD in test_tensor_cores_codegen
* add simple bf16 gemm test to benchmark
* init lds noop and lds_0 spec
* refactor lds helper test
* fix typo
* test all lds at the same time
* change comment
* comment
* start test_lds_full
* test_lds_tc
* add tc spec
* Solve dims too large errors on webgpu
* Simplify divisor find
* Test square root divisor
* Fix lint
* Refactor into group_dims and split_dims
* Refactor
* Fix lint
* Add back max check in _group_dims
* Prefer grouping over split
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* add some docs about speed [pr]
* better torch gemm
* enable locals on llvm/clang
* disable locals for beam speed on LLVM/CLANG
* 0x20 alignment in llvm allows ymm use
* Switch to dawn, all tests passing locally
* Use dawn-python
* Skip failing test
* Skip midcast and fix timestamp on metal ci
* Autogen webgpu
* Try fetch dawn lib again
* /usr/lib
* Without lib prefix
* Test autogen diff
* Delete webgpu support, move everything to ops_webgpu
* mypy fix
* Simplify, refactor
* Line savings
* No ResultContainer
* Type annotation for result
* Some more simplifications
* Why was this explicit sync used at all?
* Refactor: delete functions that are only used once
* Create shader module inline
* Clear unit tests cache, maybe that solves it
* That wasn't it
* Try deleting cache to pass failing weight compare
* weights_only=False for pytorch 2.6
* Simplify ctype array creation
* Remove nanosecond precision timestamps
* Simplify error handling
* Refactor, add back type annotations
* Deleted custom submit function, refactor
* read_buffer simplify
* Fix use after free, refactor
* Simplify supported_features
* Runtime docs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* new lines that exist in codegen/ops
* update tests
* update sops.gz (13071 -> 13070 asts)
* fix viz too
* remove that TODO
* diff pruning
* mask assert + device
* work
* diff pruning
* re: fix viz too
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* rename Opt amt to arg
* ignore_beam_cache for test_tiny
* move ignore_beam_cache to test_tiny
* move to separate pr
* revert space change
---------
Co-authored-by: chenyu <chenyu@fastmail.com>