tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-13 08:58:05 -05:00

Author	SHA1	Message	Date
Sieds Lykles	e75be6eafc	[bounty] [pr] index validation with z3 (#9981 ) * index validation with z3 * Change comment * toposort -> toposort() --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-04-24 08:06:08 -04:00
Ignacio Sica	023b1c28a2	`test_tensor_cores_padded` refactor (#9724 ) * set pad t 3 for amd padded tc test * change pad for amd regardless CI * test tc padded uops and correctness separately * add test_tensor_cores_padded_uops test to ci * remove redundant chack for amd device * cleanup	2025-04-18 17:05:54 -03:00
qazal	16dfe0a902	upstream remu (#9921 )	2025-04-18 01:57:36 +03:00
George Hotz	44e4934167	fast pattern matcher [pr] (#9737 ) * FastPatternMatcher * works without that * fix test pickle * strict len * compile match function * dynamic compile * fast * faster * compile * track * a lot faster * clean up * dup or * faster and simpler * fast match doesn't support store * plane * minor refactor * real speed * don't imply return None * upat * fix test * heard you wanted more speed * no generator * split cf * early fixup * fxn fixup * reconstruct_function * Revert "reconstruct_function" This reverts commit `37dac010ab`. * simpler stuff * too big * upat compile error * cleanups * don't cache that * cleanups * 10 -> 15	2025-04-14 15:24:41 +01:00
George Hotz	355739fc94	switch to universal match [pr] (#9879 ) * switch to universal match [pr] * 10 -> 15	2025-04-14 09:15:37 +01:00
chenyu	6896197978	relax ATOL for TC half tests more (#9847 )	2025-04-11 03:20:22 -04:00
George Hotz	f666dd14eb	fix get reduce contraction with test (#9834 )	2025-04-10 22:24:21 +08:00
chenyu	566e389585	more relaxed ATOL for HALF=1 simple_matmul test (#9823 ) it's a function of N so only updated in the test command	2025-04-10 00:46:16 -04:00
chenyu	06a928b341	higher ATOL for half input TC test (#9821 ) flaky	2025-04-09 23:57:25 -04:00
uuuvn	3ee317ffed	Fix kfd autogen and verify it in ci (#9818 ) Had to autogen newer uapi headers for #9746 (dmabuf export ioctl missing), submitting just the fix without updating to newer headers as they are only needed for infiniband stuff	2025-04-10 09:53:42 +08:00
chenyu	c5db5b83b9	add SHOULD_USE_TC=1 check to simple_matmul (#9802 ) * add SHOULD_USE_TC=1 check to simple_matmul also zero centered the random input and update atol for tf32 * ATOL=2e-2 for HALF	2025-04-09 02:24:42 -04:00
George Hotz	78caf55154	Revert "FP8 support on NVIDIA (#8631 )" This reverts commit `2c8e4ea865`.	2025-04-09 12:27:41 +08:00
George Hotz	14928fecff	Revert "fix TF32 tensor core dropped in tc_sm89 (#9798 )" This reverts commit `7c9a96824f`.	2025-04-09 12:27:39 +08:00
chenyu	7c9a96824f	fix TF32 tensor core dropped in tc_sm89 (#9798 ) also add `SHOULD_USE_TC=1` to verify TC is applied in simple_matmul	2025-04-08 23:20:50 -04:00
pkotzbach	2c8e4ea865	FP8 support on NVIDIA (#8631 ) * squashed fp8 commits * tensorcore start * minor changes * pre-commit * pylint * Delete fp8mul.cu * clean * small bugfix * fix test_dtype * fix test_dtype_alu * add EMULATE_CUDA_SM89 * fix ci * fix test_linearizer * fix test_linearizer * fix swizzle * add debug to simple_matmul * fixed swizzle * python emulator * refactor python emulator * setup fix * numpy setup * ml_dtypes only in emulate_cuda_sm89 * fix pylint * fix tests * fix mypy * fix mypy * fix ruff * done python emulator * add acc type * tests * mypy * clean code * add cuda tensor core tests to CI * minor fix * clean test_dtype.py * clean cstyle.py * clean test_ops.py * fix test * fix test * whitespaces * pylint * pylint * amd? * amd? * amd * reduce lines * mockgpu remove * fix * ruff * ruff * fix mypy * ruff * test only for cuda * fixed formatting * small fixes * small fix * least_upper_dtype if fp8s not supported * log and reciprocal are supported for fp8s * ops python fixes * dtypes.fp8s use * e4m3 + e5m2 result dtype test * truncate linter fix --------- Co-authored-by: pkotzbach <pawkotz@gmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-04-08 21:54:04 -04:00
Sieds Lykles	07d1aefaf4	fast idiv (#9755 ) * fast idiv with tests and fuzzer * Add todo comment * Add env variable to toggle fast_idiv * Move env check * Add fuzz fast_idiv to ci --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-04-07 08:32:24 -04:00
Ignacio Sica	58785181a8	AMD `bf16xf32` TC (#9717 ) * dont test bf16 for emulated amd tc * skip bf16 tc test in ci * skip bf16 for AMD in test_tensor_cores_codegen * add simple bf16 gemm test to benchmark	2025-04-07 11:41:04 +08:00
George Hotz	cac8bcf8b5	use Ops.REDUCE (#9721 ) * decrease bert python time [pr] * order copies * Revert "order copies" This reverts commit `3f62c8693b`. * rewrite count * Ops.REDUCE * acc first in the add chain * Fix tensor core acc * arange patterns look good * fix multireduce gate * reduce rewrite rule * bump that to 15 minutes * multiwmma isn't fusing * gep through wmma is gep pushing * bump that timeout too, it's all env setup * add failing test	2025-04-04 10:14:34 +08:00
chenyu	1d25844d44	Revert "disable CI red llama 3 4 gpu beam (#9690 )" (#9709 ) This reverts commit `6a5eacba8b`.	2025-04-03 02:34:39 -04:00
George Hotz	49dafe6d43	add gc tests [pr] (#9718 ) * add gc tests [pr] * del * more gc tests * add NullGraph	2025-04-03 14:08:32 +08:00
Ignacio Sica	bc91fffc5d	fix gated store with index in python backend (#9703 ) * add default gate in index * assert store * add TestRendererFailures - move test_gated_store_with_alu to new TestRenderFailures class for tests that fail on multiple renderers - add test_renderer_failures.py run on python CI * add test for gated index in 2d * test TestRenderFailures	2025-04-03 12:48:28 +08:00
chenyu	bc3bfcbad4	update install gpuocelot (#9693 ) `-DCMAKE_POLICY_VERSION_MINIMUM=3.5`	2025-04-02 04:10:34 -04:00
chenyu	6a5eacba8b	disable CI red llama 3 4 gpu beam (#9690 ) device hangs and ci would fail	2025-04-02 03:19:09 -04:00
George Hotz	6f812d3f2f	fixes from the dsp branch + 12500 lines (#9683 ) * fixes from the dsp branch * more changes * those are gep pushing	2025-04-02 13:07:17 +08:00
Anish Umale	a1ee4d587f	Fix test_ops for tiny backend (#9302 ) * fix some tests in test_ops for torch backend(171 failing) * fix more tests (135 failures) * fix tests (126 failing) * handle transposed convs (109 tests failing) * fix slice * fix lshift & rshift and more tests (87 tests failing) * revert accidental change * remove unnecessary changes (82 failures) * fix backward for avg_pool2d (78 failures) * fix backward for avg_pool2d (78 failures) * fix replication backpass * fix reflection pad back pass (71 failures) * cummax with indicies, aten.mv and move out methods (67 failures) * extract avg_pool2d and avg_pool3d to separate functions (62 failures) * revert changes for cat_out * rewrite avg_pool and pad without repetition * remove duplicates from decomps * slice rewrite and add slice_backward (59 failures) * add dtype fixup from https://github.com/tinygrad/tinygrad/pull/9297 * fix linter error and remove Tensor.pad (48 failures) * add select_backward and index_put (40 failures) * fix some more tests (36 failures) * fix more tests (12 failures) * some cleanups and fix couple more tests (10 failures) * cleaner way to write upsample * some more upsample cleanups * use lambda for upsample * add autowrapper for upsample forward * cumsum and max_dim without aten functions * revert _log_softmax * fix more tests (1 failure) * make linter happy * move import to appropriate func * make linter happy * add codes for noqa * some more refactors * remove comment * remove dependency on aten function for conv backward * some more refactors * add returns * revert a change from merge * some cleanups * remove whitespace * remove ruff change * revert upsample * add masked_fill_.Tensor and scatter.src_out * add todo * fix test_biased_conv2d * fix test_var_one_in_axis & test_std_one_in_axis but break test_biased_conv2d :( * revert torch_debug * revert torch_debug * skip test_gather_failure for the tiny backend * make padding registration more consise * add nonzero * remove scatter_add since we already have the out * fix scatter * remove some repetition * make upsample backward registrations more concise * remove select.int * use Tensor.cumsum * realize conv2d outputs before backward to fix test_biased_conv2d * add a todo for realize(1 failure) * add new_empty and new_empty_strided * make test_pad_circular_mode forward only and remove redundant stuff * fix linter errors * remove expect failure * just tb * slice is a view_op * contiguous only when lazydata.is_realized * fix backward for test_pad_circular_mode * revert torch.nn.functional.pad override * add transpose.int and make constant_pad_nd contiguous * slice_backwards has no kwargs --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-31 21:13:09 -04:00
chenyu	60eb0c4ed7	exclude slow tests on PYTHON (#9634 )	2025-03-30 22:55:05 -04:00
geohotstan	a08b07b4da	Bump onnx==1.17.0 (#9618 ) * bump * remove resize tf_crop_and_resize --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-30 03:21:51 -04:00
b1tg	f90001e1a6	amd llvm render (no_comgr prereq) (#9543 ) * amd llvm render * skip test_div_rounding_mode --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-03-24 22:50:51 +08:00
Ahmed Harmouche	7ce7fe0574	Refactor webgpu_dawn lib finding (#9547 ) * Refactor webgpu_dawn lib finding * Fix ruff	2025-03-23 08:23:29 -04:00
quortus	bdd44d4255	Fix DSP transcendentals (#9542 )	2025-03-22 11:08:18 +08:00
chenyu	ee3d313b34	Revert "update ruff to 0.11.2 (#9531 )" (#9535 ) This reverts commit `d8d65e2747`.	2025-03-21 14:52:25 -04:00
b1tg	58206fa8a9	add amd llvm compiler (#9519 ) Co-authored-by: b1tg <b1tg@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-21 23:13:27 +08:00
chenyu	d8d65e2747	update ruff to 0.11.2 (#9531 ) 0.11.2 fixed the false alert from 0.11.1. also pinned the version in setup for now to prevent broken CI from ruff upgrade	2025-03-21 10:32:59 -04:00
chenyu	b9fab9b914	pin ruff to 0.11.0 in CI (#9520 ) 0.11.1 had a bug https://github.com/astral-sh/ruff/issues/16874 that breaks ci	2025-03-20 13:12:50 -04:00
b1tg	1d71436e6a	use libllvm19 in ci (#9494 ) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-03-19 11:53:32 +08:00
Ignacio Sica	5c56cac0a0	MI300 mfma support (#9417 ) * add f16/f32 mfma support for MI300 - add 16x16 mfma shape support for f16 with f32 acc - add ops_python mfma emulation - add arch to AMDRenderer * minor cleanup * minor cleanup * add mfma emulation task to ci * add back todo * hotfix: comment * add tc=3 job to ci	2025-03-18 14:33:30 -03:00
George Hotz	cb7a7f69c7	quantization preprocessor from DSP, should be universal (#9437 ) * quantization preprocessor from DSP, should be universal * touchups * fix tests	2025-03-15 07:49:37 +08:00
qazal	4df2b6347d	hotfix: bump tinybox red training CI timeout to 30 minutes (#9426 )	2025-03-13 09:31:44 +01:00
George Hotz	931436204c	hotfix: 12000 lines, for AMD stuff	2025-03-13 10:48:14 +08:00
Priyank Patel	4714c4f9ad	torch backend multigpu - add devices and tests (#9414 ) * add multi-device support and tests * simplify	2025-03-12 11:33:11 +08:00
uuuvn	e85001b6ee	SQTT profiling (#9278 ) * sqtt * docs * multi-device * ProfileSQTTEvent * exec update * 256mb default * don't let people hang their gpus * bitfields from autogen * asic info from mesa * more bitfields from autogen * SQTT_ITRACE_SE_MASK --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-03-11 13:19:56 +08:00
Priyank Patel	796c3bbb23	torch: support in-place operations on views (#9371 ) * add torch inplace tests * first set of tests passing * wrap all inplace funcs, add more tests * fixes and wrap more functions * fix all uint8 tests to avoid slow tests * fix the one test * another test, another fix * and one more, works for ddp now * something on contiguous, cleanup --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-03-10 23:29:00 +08:00
hooved	136cf7b8b1	hotfix: load >2 GiB from disk on macOS (#9361 ) * enable loading >2 GiB buffer from disk on macOS * handle None case raised by mypy * add test * revert fix to repro bug in CI * tell CI to run a unit test for macOS * reapply fix	2025-03-07 14:51:58 +08:00
uuuvn	c6d76770e4	Increase timeout on macos tests (#9362 ) Process replay timeouts: https://github.com/tinygrad/tinygrad/actions/runs/13682213444/job/38257133289?pr=9360	2025-03-05 13:04:16 -05:00
nimlgen	cd9d74f7ea	use am in training benchmarks (#9357 ) * am in training benchmarks * fix * not needed anymore	2025-03-05 19:13:47 +03:00
George Hotz	7576a1da23	hotfix: line count to 11500, lines for SQTT and AMDLLVM	2025-03-05 09:21:18 +08:00
chenyu	e301f21f63	CI ubuntu-20.04 -> ubuntu-22.04 (#9345 ) 20.04 is removed now	2025-03-04 11:39:12 -05:00
chenyu	019417743c	ruff torch backend (#9341 )	2025-03-03 15:15:23 -05:00
chenyu	40619a4bbc	separate workflow for TINY_BACKEND=1 mnist (#9339 ) * separate workflow for TINY_BACKEND=1 mnist * rebalance	2025-03-03 13:05:24 -05:00
Eitan Turok	d657d5f754	[Bounty] Vectorize Transcendental (#9058 ) * init * cast everythig right * more casting * install pillow in test * quick tests * simplify * quick tests * delete test * tests * fix import error * add vec to ldexp3k * vec for bitcast * some helper tests * high level tests * clean tests * change tolerance so cuda passes * ruff passes * remove tests for transcendental helpers * ruff passes * make exponent in power vectorized * fix pow test * add newline * add vec dtype to ilogb2k * comment + clean up * ruff --------- Co-authored-by: chenyu <chenyu@fastmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-28 15:47:25 +08:00

... 6 7 8 9 10 ...

1107 Commits