chenyu
8c6299bced
move hand_coded_optimizations to heuristic.py [pr] (#9844)
* move hand_coded_optimizations to heuristic.py [pr]
also folded all long lines
* make a copy and rename self -> k
* fix test
2025-04-10 23:40:16 -04:00
George Hotz
78caf55154
Revert "FP8 support on NVIDIA ( #8631 )"
...
This reverts commit 2c8e4ea865 .
2025-04-09 12:27:41 +08:00
pkotzbach
2c8e4ea865
FP8 support on NVIDIA (#8631)
* squashed fp8 commits
* tensorcore start
* minor changes
* pre-commit
* pylint
* Delete fp8mul.cu
* clean
* small bugfix
* fix test_dtype
* fix test_dtype_alu
* add EMULATE_CUDA_SM89
* fix ci
* fix test_linearizer
* fix test_linearizer
* fix swizzle
* add debug to simple_matmul
* fixed swizzle
* python emulator
* refactor python emulator
* setup fix
* numpy setup
* ml_dtypes only in emulate_cuda_sm89
* fix pylint
* fix tests
* fix mypy
* fix mypy
* fix ruff
* done python emulator
* add acc type
* tests
* mypy
* clean code
* add cuda tensor core tests to CI
* minor fix
* clean test_dtype.py
* clean cstyle.py
* clean test_ops.py
* fix test
* fix test
* whitespaces
* pylint
* pylint
* amd?
* amd?
* amd
* reduce lines
* mockgpu remove
* fix
* ruff
* ruff
* fix mypy
* ruff
* test only for cuda
* fixed formatting
* small fixes
* small fix
* least_upper_dtype if fp8s not supported
* log and reciprocal are supported for fp8s
* ops python fixes
* dtypes.fp8s use
* e4m3 + e5m2 result dtype test
* truncate linter fix
---------
Co-authored-by: pkotzbach <pawkotz@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-08 21:54:04 -04:00
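A quick sketch of the two FP8 formats this PR wires up, using the ml_dtypes package named in the EMULATE_CUDA_SM89 bullets; the sample values are illustrative only:

```python
import numpy as np
from ml_dtypes import float8_e4m3fn, float8_e5m2  # the e4m3/e5m2 pair NVIDIA supports

x = np.array([0.1, 1.5, 300.0], dtype=np.float32)
# e4m3fn: 4 exponent bits, 3 mantissa bits, max finite value 448 (more precision, less range)
print(x.astype(float8_e4m3fn).astype(np.float32))
# e5m2: 5 exponent bits, 2 mantissa bits, max finite value 57344 (more range, less precision)
print(x.astype(float8_e5m2).astype(np.float32))
```

The "e4m3 + e5m2 result dtype test" bullet above is about how the two formats promote when mixed in a single op.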
Ignacio Sica
58785181a8
AMD bf16xf32 TC (#9717)
* don't test bf16 for emulated amd tc
* skip bf16 tc test in ci
* skip bf16 for AMD in test_tensor_cores_codegen
* add simple bf16 gemm test to benchmark
2025-04-07 11:41:04 +08:00
George Hotz
cac8bcf8b5
use Ops.REDUCE (#9721)
* decrease bert python time [pr]
* order copies
* Revert "order copies"
This reverts commit 3f62c8693b.
* rewrite count
* Ops.REDUCE
* acc first in the add chain
* Fix tensor core acc
* arange patterns look good
* fix multireduce gate
* reduce rewrite rule
* bump that to 15 minutes
* multiwmma isn't fusing
* gep through wmma is gep pushing
* bump that timeout too, it's all env setup
* add failing test
2025-04-04 10:14:34 +08:00
Ignacio Sica
2d6d8b7355
add bf16 mfma support (#9695)
* add bf16 mfma support
* skip tc if emulated_amd and dtypes is bf16
* hotfix
2025-04-02 21:44:49 +08:00
George Hotz
e78e8722dc
Revert "LDS noop and spec ( #9669 )" ( #9691 )
...
This reverts commit 870b545ace .
Co-authored-by: Ignacio Sica <mignacio.sica@gmail.com >
2025-04-02 15:31:32 +08:00
Ignacio Sica
870b545ace
LDS noop and spec (#9669)
* init lds noop and lds_0 spec
* refactor lds helper test
* fix typo
* test all lds at the same time
* change comment
* comment
* start test_lds_full
* test_lds_tc
* add tc spec
2025-04-01 18:44:55 +08:00
b1tg
d9af4cfc1b
AMD_LLVM: tensor cores support (#9613)
* tensor cores support
* test tensor cores codegen
* use rewrite rules
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-04-01 09:56:27 +08:00
Ignacio Sica
1444069c09
Uppercase K for dimension and lowercase k for kernel in linearizer tc helper test (#9649)
2025-03-31 19:05:36 +08:00
Ignacio Sica
baa67fd124
Uppercase N and M (standalone syntax change) (#9647)
2025-03-31 18:45:30 +08:00
chenyu
f8976dd2eb
enable more webgpu tests (#9502)
OSX has a larger limit on the number of buffers, and it now supports fp16
2025-03-18 23:03:54 -04:00
George Hotz
117b7a16ef
VALIDATE_WITH_CPU [pr] (#9488)
* VALIDATE_WITH_CPU [pr]
* fix test
2025-03-18 15:15:04 +08:00
chenyu
01e8b60911
acc_dtype -> dtype (#9402)
matched numpy and torch
2025-03-10 16:05:30 -04:00
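The rename lines tinygrad's reduction keyword up with numpy and torch; a minimal sketch of the shared spelling (the tinygrad line assumes the post-rename API):

```python
import numpy as np

a = np.arange(10, dtype=np.float16)
print(a.sum(dtype=np.float32))  # numpy spells the accumulator dtype "dtype": 45.0
# torch matches: torch.arange(10.).sum(dtype=torch.float32)
# tinygrad after this PR: Tensor.arange(10).sum(dtype=dtypes.float32), previously acc_dtype=...
```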
George Hotz
ece0a0f305
use empty for test instead of rand (#9332)
2025-03-03 16:19:06 +08:00
George Hotz
2cc4cb74f0
reorder binops (#9328)
* reorder binops
* test improvements + fix string tests
* ugh, okay this
2025-03-03 14:58:18 +08:00
qazal
2eab8021fb
remove inputs+outputs attributes from ScheduleItem [pr] (#9192)
* remove inputs/outputs from ScheduleItem
* fix test_linearizer
* fix test_conv_shapetracker
* fix test_schedule + lint
* test_image_dtype + multitensor + search
2025-02-21 13:48:11 +01:00
chenyu
2e7c2780a9
CLANG -> CPU (#9189)
2025-02-20 18:03:09 -05:00
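For code that pinned the old device name, the rename is mechanical; a minimal sketch (values are arbitrary):

```python
from tinygrad import Tensor

t = Tensor([1.0, 2.0, 3.0], device="CPU")  # was device="CLANG" before this PR
print((t + t).tolist())                    # [2.0, 4.0, 6.0]
```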
George Hotz
a4dab3ec3f
add name uop (#9149)
* add name uop, TODO: refactor renderer to use
* renderer uses name uop
* fix tests
* render
* ptx
2025-02-18 15:26:58 +08:00
Ahmed Harmouche
59fe45f947
Solve get_grouped_dims does not split issue (#9085)
* Solve dims too large errors on webgpu
* Simplify divisor find
* Test square root divisor
* Fix lint
* Refactor into group_dims and split_dims
* Refactor
* Fix lint
* Add back max check in _group_dims
* Prefer grouping over split
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-16 19:57:29 -05:00
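The "square root divisor" bullet is the core idea: when one launch dimension exceeds the backend's per-dimension cap (as on webgpu), split it into two balanced factors, preferring to group dims and falling back to a split. A minimal sketch of the divisor search; split_dim and its constants are hypothetical, not tinygrad's actual helper:

```python
def split_dim(n: int, limit: int) -> tuple[int, int]:
  # scan downward from sqrt(n) so the two factors stay as balanced as possible
  for a in range(int(n ** 0.5), 0, -1):
    if n % a == 0 and a <= limit and n // a <= limit:
      return a, n // a
  raise ValueError(f"no divisor pair of {n} fits under {limit}")

print(split_dim(131072, 65535))  # (256, 512): both factors fit a 65535 per-dim cap
```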
chenyu
f53b819648
UOps. -> Ops. [pr] (#9044)
updated the comments and docs except extra
2025-02-12 12:53:23 -05:00
Ignacio Sica
aaed315fee
add AMX support to LLVM (#8957)
* init amx support for llvm
* revert elf changes
* fix attributes for AMX asm calls
* add comments
* add llvm amx job to benchmarks
* cleanup
* cleanup
* hotfix: improve comments
* comment for aux buffers
* hotfix:
* move amx_tc to ClangRenderer
* merge master
* refactor
* add docs
* add corsix docs reference
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-12 16:01:18 +08:00
George Hotz
a3c78d47b3
speed docs + upgrades [pr] (#8964)
* add some docs about speed [pr]
* better torch gemm
* enable locals on llvm/clang
* disable locals for beam speed on LLVM/CLANG
* 0x20 alignment in llvm allows ymm use
2025-02-08 17:28:52 +08:00
George Hotz
c2b4c43edb
handle stride 0 reduce (#8068)
* handle stride 0 reduce [pr]
* more test fixups
* a few more
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-07 15:40:58 +01:00
Ahmed Harmouche
133cacadde
Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646)
* Switch to dawn, all tests passing locally
* Use dawn-python
* Skip failing test
* Skip midcast and fix timestamp on metal ci
* Autogen webgpu
* Try fetch dawn lib again
* /usr/lib
* Without lib prefix
* Test autogen diff
* Delete webgpu support, move everything to ops_webgpu
* mypy fix
* Simplify, refactor
* Line savings
* No ResultContainer
* Type annotation for result
* Some more simplifications
* Why was this explicit sync used at all?
* Refactor: delete functions that are only used once
* Create shader module inline
* Clear unit tests cache, maybe that solves it
* That wasn't it
* Try deleting cache to pass failing weight compare
* weights_only=False for pytorch 2.6
* Simplify ctype array creation
* Remove nanosecond precision timestamps
* Simplify error handling
* Refactor, add back type annotations
* Deleted custom submit function, refactor
* read_buffer simplify
* Fix use after free, refactor
* Simplify supported_features
* Runtime docs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 15:16:59 +08:00
chenyu
a092b6395d
Tuple -> tuple, List -> list [pr] (#8936)
2025-02-06 14:21:19 -05:00
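The mechanical change this PR applies across the codebase, shown on a toy signature: the typing-module aliases give way to builtin generics (Python 3.9+):

```python
from typing import Tuple, List  # old spelling, dropped by this PR

def pair_old(xs: List[int]) -> Tuple[int, int]: return xs[0], xs[-1]
def pair_new(xs: list[int]) -> tuple[int, int]: return xs[0], xs[-1]  # builtin generics
```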
Ignacio Sica
15f94ac964
TC_SEARCH_OVER_SHAPE to search multiple TC shapes (#8793)
* squash search over shape
* refactor assert
* init benchmark
* cleaner get_kernel_actions
* cleaner get_kernel_actions
* add comment
2025-02-05 11:03:46 -05:00
Ignacio Sica
260df1a17f
tc_select noop (#8801)
* tc_select noop
* revert changes in test
2025-01-29 13:53:23 -05:00
qazal
ba17786068
do not construct unmasked VALID (#8759)
* new lines that exist in codegen/ops
* update tests
* update sops.gz (13071 -> 13070 asts)
* fix viz too
* remove that TODO
* diff pruning
* mask assert + device
* work
* diff pruning
* re: fix viz too
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-28 20:51:21 +02:00
Ignacio Sica
b240f12593
[TIP-9] rename Opt's amt to arg 2 (#8770)
* rename Opt amt to arg
* ignore_beam_cache for test_tiny
* move ignore_beam_cache to test_tiny
* move to separate pr
* revert space change
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-27 14:19:04 -05:00
George Hotz
3ed146a5ff
Revert "rename Opt amt to arg ( #8767 )" ( #8769 )
...
This reverts commit bf041659a5 .
2025-01-27 23:46:37 +09:00
Ignacio Sica
bf041659a5
rename Opt amt to arg (#8767)
2025-01-27 23:36:47 +09:00
George Hotz
b4bf6a7dea
switch backward to use gradient [pr] (#8235)
* switch backward to use gradient [pr]
* set device correctly, dedup
* why does that fail?
* add noop cast
* simple backward
* fix beautiful_mnist
* touchups
* set in compute_gradient
* uop_count
* uop_count was wrong
* collections
* no note
* skip that test
* update sched kernel counts
* train mnist is 65
* fix metadata and gc
* fixes
* materialize_grads
* no pathlib stuff
* add contiguous_backward, fix bugs
* add some realize
* fix multi
2025-01-26 09:12:16 +09:00
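The user-facing call is unchanged; backward is now layered on a graph-level gradient computation (compute_gradient in the bullets above). A minimal sketch, with the functional spelling assumed from this PR's naming:

```python
from tinygrad import Tensor

w = Tensor([2.0], requires_grad=True)
loss = (w * w).sum()
loss.backward()          # classic API: populates w.grad
print(w.grad.tolist())   # [4.0], since d(w^2)/dw = 2w
# the functional path this PR switches to, roughly: (dw,) = loss.gradient(w)
```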
George Hotz
46a8c5e1e5
delete forced_realize (#8615)
* delete forced_realize
* put that back
* expectedFailures
* cleaner create_subbuffer
* more comments
---------
Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-20 09:40:36 -08:00
qazal
d957a4f108
add tests for div buffer collapsing in the scheduler [pr] (#8671)
* add tests for mul/div buffer collapsing in the scheduler [pr]
* lint
* merge with test_linearizer's version of this
* 4*3
2025-01-18 14:15:29 -05:00
ignaciosica
d2234e308a
tf32 tc for nv and ptx (#8635)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-17 17:43:57 -08:00
qazal
ae2229d727
assert kernel buffer limit at compile time [pr] (#8595)
* remove the BUF_LIMIT assert
* skip the base one
2025-01-13 16:32:07 -05:00
qazal
586e730d32
use UOp.st for kernel reduce axes (#8499)
* use UOp.st for kernel reduce axes [pr]
* do not return dict
2025-01-13 06:24:11 -05:00
qazal
866dfa1f23
create_schedule([x.lazydata]) -> x.schedule() in tests (#8449)
2024-12-31 03:15:52 +08:00
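The shape of the test migration, sketched; the schedule contents depend on the graph, but the call change is one line:

```python
from tinygrad import Tensor

x = (Tensor.ones(4, 4) + 1).contiguous()
# old: sched = create_schedule([x.lazydata])
sched = x.schedule()     # new: a method on Tensor
print(len(sched))        # at least one kernel for the add
```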
George Hotz
29c14f1cbf
hotfix: update tests for no uop mut
2024-12-30 10:05:37 -05:00
ignaciosica
ba0c844a83
special tol when f16 and bf16 are tc input dtypes (#8183)
2024-12-21 11:32:26 -05:00
George Hotz
bd9c015b09
tests from grad uop path [pr] (#8313)
2024-12-18 09:25:05 -08:00
Ahmed Harmouche
a73e3677d0
Test linearizer on webgpu (#8159)
* Test linearizer on wgpu
* Skip tests due to exceeded dims
2024-12-11 17:03:26 +01:00
qazal
6be388be86
failing test for const folding breaking indexing [pr] (#8103)
2024-12-07 19:55:02 +08:00
George Hotz
0c7477b108
no bool in range [pr] (#7988)
* no bool in range [pr]
* fix llvm
* add arg to range spec
* fix broken test
* forgot this one
* hotfix: test_tiny jit is a real test
2024-12-02 19:05:16 +08:00
George Hotz
f17af70d17
replace all sparents with toposort (#7983)
2024-12-02 15:00:30 +08:00
George Hotz
c5c3b05b5a
block lin: only the test changes (#7933)
2024-11-28 13:19:00 +08:00
George Hotz
32dbab945c
Revert "add block uops and modify tests ( #7931 )" ( #7932 )
...
This reverts commit 6f4519ff45 .
2024-11-28 13:15:41 +08:00
George Hotz
6f4519ff45
add block uops and modify tests (#7931)
2024-11-28 13:11:18 +08:00
chenyu
a58e289d77
Revert "prereqs for new block lin so PR works ( #7919 )" ( #7921 )
...
This reverts commit c53261b541 .
2024-11-27 08:41:09 -05:00