Commit Graph

10633 Commits

Author SHA1 Message Date
Ignacio Sica
bc91fffc5d fix gated store with index in python backend (#9703)
* add default gate in index

* assert store

* add TestRendererFailures

- move test_gated_store_with_alu to new TestRendererFailures class for
tests that fail on multiple renderers
- add test_renderer_failures.py, run on Python CI

* add test for gated index in 2d

* test TestRendererFailures
2025-04-03 12:48:28 +08:00
qazal
f2bd65ccfc delete Ops.EMPTY and Tensor._metaop (#9715)
* delete Ops.EMPTY and Tensor._metaop [pr]

* test_creation

* arg=

* abstractions2
2025-04-03 12:29:02 +08:00
George Hotz
5c7b549eab use functools.cache instead of lru_cache(None) [pr] (#9714)
* use functools.cache instead of lru_cache(None) [pr]

* more cache
2025-04-03 11:47:13 +08:00
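For context on this change: in the standard library, `functools.cache` (added in Python 3.9) is exactly an unbounded `lru_cache`, so the swap is behavior-preserving and slightly shorter. A minimal illustration:

```python
import functools

# functools.cache is equivalent to functools.lru_cache(maxsize=None):
# an unbounded memoizing cache with no LRU eviction bookkeeping.
@functools.cache
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025, linear time thanks to memoization
```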
qazal
bbd13191f4 cleanup tensor BIND + remove outdated comments in tensor.py [pr] (#9712)
* cleanup tensor BIND + remove outdated comments in tensor.py [pr]

* from_blob whitespace

* assert
2025-04-03 11:21:53 +08:00
geohotstan
e1d7e47cca fix ONNX IsInf unintended dtype promotion (#9711)
* add IsInf

* add corresponding test

* that float16 is kinda silly
2025-04-02 22:46:15 -04:00
qazal
11ae254dc5 construct BUFFER UOps directly when device is known [pr] (#9710)
* construct BUFFER UOps directly when device is known [pr]

* diff
2025-04-03 10:41:44 +08:00
George Hotz
1714fc3ba4 start work on speed [pr] (#9707)
* fix get_location

* fix get_location try 2

* clean up split_load_store [pr]

* SHR fixup [pr]
2025-04-03 10:39:01 +08:00
George Hotz
0f1ffc2050 hotfix: cat tests 2048 instead of 256 2025-04-03 10:37:56 +08:00
uuuvn
5bd485c027 Fix double SDMA_OP_FENCE (#9705)
Introduced in #9585, probably when I incorrectly resolved a merge conflict
while rebasing an old, mi300x-only branch. Seems to be the source of the
multi-GPU BEAM llama hangs.
2025-04-03 09:43:37 +08:00
chenyu
a6fec2f5ae dev_run for bert on mi300x (#9706) 2025-04-02 21:12:55 -04:00
nimlgen
d96b4983ac amd: support rdna4 in runtime again (#9702) 2025-04-03 01:19:23 +07:00
Ignacio Sica
2d6d8b7355 add bf16 mfma support (#9695)
* add bf16 mfma support

* skip tc if emulated_amd and dtypes is bf16

* hotfix
2025-04-02 21:44:49 +08:00
nimlgen
a6733f519f dsp: make relro sections contiguous (#9701) 2025-04-02 18:02:16 +07:00
George Hotz
ea5caefef0 gep should look at count, not vcount (#9698)
* gep should look at count, not vcount

* gep in order is a rule

* min change

* gep on void
2025-04-02 18:10:57 +08:00
George Hotz
f72a87fd0e add proper support for Ops.IGNORE to remove store masks (#9692)
* add proper support for Ops.IGNORE to remove store masks

* remove useless NHWC

* revert that
2025-04-02 16:38:01 +08:00
chenyu
3b8d923692 remove skip LLVM in test_div_int (#9686) 2025-04-02 04:15:00 -04:00
chenyu
bc3bfcbad4 update install gpuocelot (#9693)
`-DCMAKE_POLICY_VERSION_MINIMUM=3.5`
2025-04-02 04:10:34 -04:00
George Hotz
e78e8722dc Revert "LDS noop and spec (#9669)" (#9691)
This reverts commit 870b545ace.

Co-authored-by: Ignacio Sica <mignacio.sica@gmail.com>
2025-04-02 15:31:32 +08:00
George Hotz
4514fd91c1 more stuff from DSP (#9689)
* more good stuff from dsp branch

* test pkl imagenet
2025-04-02 15:27:48 +08:00
chenyu
6a5eacba8b disable CI red llama 3 4 gpu beam (#9690)
device hangs and CI would fail
2025-04-02 03:19:09 -04:00
Ignacio Sica
876a8be97a Debug env var breakdown (#9663)
* add debug level breakdown

* hotfix

* Update env_vars.md
2025-04-02 14:34:07 +08:00
George Hotz
6f812d3f2f fixes from the dsp branch + 12500 lines (#9683)
* fixes from the dsp branch

* more changes

* those are gep pushing
2025-04-02 13:07:17 +08:00
chenyu
c20f112e9f example test use z3 to verify valid simplification (#9684) 2025-04-02 01:05:52 -04:00
chenyu
bca0c85193 skip CI CPU test_data_parallel_resnet_train_step (#9685)
flaky
2025-04-02 01:04:54 -04:00
qazal
bb94f13e58 add RECORD_TRACEBACKS=1 option to process replay (#9679)
* add RECORD_TRACEBACKS=1 option to process replay

* stack
2025-04-02 11:58:27 +08:00
chenyu
3acc1b928a minor div_and_mod_folding cleanup [pr] (#9681)
it's not wrong, since the dtype is never used, but `x.const_like` is more readable
2025-04-01 23:51:36 -04:00
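To illustrate the readability point, here is a hedged sketch of the `const_like` pattern (the class below is illustrative, not tinygrad's actual `UOp`):

```python
# Sketch: const_like takes the dtype from the receiver, so the reader never
# has to wonder whether an explicitly passed dtype was load-bearing.
class Node:
    def __init__(self, dtype: str): self.dtype = dtype
    @staticmethod
    def const(dtype: str, value): return (dtype, value)
    def const_like(self, value):
        # same constant, dtype inherited from self
        return Node.const(self.dtype, value)

x = Node("int32")
assert x.const_like(0) == Node.const(x.dtype, 0)  # equivalent, clearer intent
```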
chenyu
c672716b38 improve vmin/vmax for IDIV (#9678) 2025-04-01 23:16:01 -04:00
chenyu
8dd88ad476 don't div_and_mod_folding for negative numerator with remainder (#9674)
can be wrong in C div since it truncates towards zero
2025-04-01 16:26:23 -04:00
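The pitfall is standard: C integer division truncates toward zero, while the floor division the folding assumes rounds toward negative infinity, and the two disagree exactly when the numerator is negative with a nonzero remainder. A quick demonstration:

```python
# C's `/` truncates toward zero; Python's `//` floors toward negative infinity.
# They agree for non-negative numerators and differ when x < 0 with a remainder.
c = 2
for x in (7, -7):
    c_style = int(x / c)  # trunc, like C: 7/2 -> 3, -7/2 -> -3
    floored = x // c      # floor:         7/2 -> 3, -7/2 -> -4
    print(x, c_style, floored)
# 7 3 3
# -7 -3 -4
```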
chenyu
0e34f9082e helper functions for cstyle div mod [pr] (#9673) 2025-04-01 08:06:56 -04:00
qazal
eee0dcc37a merge viz back into one file (#9672)
* merge viz back into one file

* work

* rename lib to js directory

* fix diff

* less indenting

* memory graph is back

* viz_sz.py
2025-04-01 19:52:02 +08:00
Ignacio Sica
870b545ace LDS noop and spec (#9669)
* init lds noop and lds_0 spec

* refactor lds helper test

* fix typo

* test all lds at the same time

* change comment

* comment

* start test_lds_full

* test_lds_tc

* add tc spec
2025-04-01 18:44:55 +08:00
uuuvn
609a006242 AMDComputeQueue.wreg (#9628)
* AMDComputeQueue.wreg

Used to be part of #9428. I think it's much more readable than repeating
the ~same PM4 things over and over again, especially with separate .encode

* fix indentation
2025-04-01 17:01:33 +07:00
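The design idea, sketched below with purely illustrative names and packet words (an assumption, not the actual tinygrad PM4 encoding): fold the repeated "build a write-register packet" boilerplate into one helper so each call site reads as a plain register assignment.

```python
# Hypothetical sketch of the wreg pattern; the header word and register
# offsets are placeholders, not real PM4 values.
class ComputeQueueSketch:
    def __init__(self): self.cmds: list[int] = []
    def wreg(self, reg: int, *values: int) -> "ComputeQueueSketch":
        # one helper emits header, register offset, and payload, instead of
        # repeating the same encode sequence at every call site
        self.cmds += [0xC0DE0000 | len(values), reg, *values]
        return self  # chainable

q = ComputeQueueSketch()
q.wreg(0x100, 1, 2).wreg(0x104, 3)
print(q.cmds)  # two packets, each: header word, register offset, payload
```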
qazal
fa373e15a3 hotfix: NULL=1 Buffer does not have _buf (#9661) 2025-04-01 17:43:55 +08:00
nimlgen
3e2f42c2e8 autogen: remove am headers from extra (#9666) 2025-04-01 14:45:30 +07:00
Ignacio Sica
cfad139189 bump assembly debug to 7 (#9662) 2025-04-01 11:51:33 +08:00
Ignacio Sica
ac533e89a2 remove duplicated ast print (#9660) 2025-04-01 10:29:24 +08:00
Ignacio Sica
846ef84cda move uops print to debug >= 6 (#9659) 2025-04-01 10:29:09 +08:00
b1tg
d9af4cfc1b AMD_LLVM: tensor cores support (#9613)
* tensor cores support

* test tensor cores codegen

* use rewrite rules

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-04-01 09:56:27 +08:00
qazal
1658eb4e63 always fit fresh viz graph into view [pr] (#9657) 2025-04-01 09:34:26 +08:00
Anish Umale
a1ee4d587f Fix test_ops for tiny backend (#9302)
* fix some tests in test_ops for torch backend (171 failing)

* fix more tests (135 failures)

* fix tests (126 failing)

* handle transposed convs (109 tests failing)

* fix slice

* fix lshift & rshift and more tests (87 tests failing)

* revert accidental change

* remove unnecessary changes (82 failures)

* fix backward for avg_pool2d (78 failures)

* fix backward for avg_pool2d (78 failures)

* fix replication backpass

* fix reflection pad back pass (71 failures)

* cummax with indices, aten.mv and move out methods (67 failures)

* extract avg_pool2d and avg_pool3d to separate functions (62 failures)

* revert changes for cat_out

* rewrite avg_pool and pad without repetition

* remove duplicates from decomps

* slice rewrite and add slice_backward (59 failures)

* add dtype fixup from https://github.com/tinygrad/tinygrad/pull/9297

* fix linter error and remove Tensor.pad (48 failures)

* add select_backward and index_put (40 failures)

* fix some more tests (36 failures)

* fix more tests (12 failures)

* some cleanups and fix couple more tests (10 failures)

* cleaner way to write upsample

* some more upsample cleanups

* use lambda for upsample

* add autowrapper for upsample forward

* cumsum and max_dim without aten functions

* revert _log_softmax

* fix more tests (1 failure)

* make linter happy

* move import to appropriate func

* make linter happy

* add codes for noqa

* some more refactors

* remove comment

* remove dependency on aten function for conv backward

* some more refactors

* add returns

* revert a change from merge

* some cleanups

* remove whitespace

* remove ruff change

* revert upsample

* add masked_fill_.Tensor and scatter.src_out

* add todo

* fix test_biased_conv2d

* fix test_var_one_in_axis & test_std_one_in_axis but break test_biased_conv2d :(

* revert torch_debug

* revert torch_debug

* skip test_gather_failure for the tiny backend

* make padding registration more concise

* add nonzero

* remove scatter_add since we already have the out

* fix scatter

* remove some repetition

* make upsample backward registrations more concise

* remove select.int

* use Tensor.cumsum

* realize conv2d outputs before backward to fix test_biased_conv2d

* add a todo for realize (1 failure)

* add new_empty and new_empty_strided

* make test_pad_circular_mode forward only and remove redundant stuff

* fix linter errors

* remove expect failure

* just tb

* slice is a view_op

* contiguous only when lazydata.is_realized

* fix backward for test_pad_circular_mode

* revert torch.nn.functional.pad override

* add transpose.int and make constant_pad_nd contiguous

* slice_backwards has no kwargs

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-31 21:13:09 -04:00
qazal
a0b4465412 bring GroupOp.Meta back (#9656) 2025-04-01 01:02:29 +08:00
Ignacio Sica
f277f407f2 remove smem_prefix_for_cast for amd (#9651) 2025-03-31 23:03:35 +08:00
chenyu
f7cb2e8da3 bert dev_beam for mi300x box (#9648)
* bert dev_beam for mi300x box

* terminate BENCHMARK properly
2025-03-31 08:35:51 -04:00
qazal
5171b098e5 merge_double_reduce without asserts [pr] (#9650) 2025-03-31 19:17:05 +08:00
Ignacio Sica
1444069c09 Uppercase K for dimension and lowercase k for kernel in linearizer tc helper test (#9649) 2025-03-31 19:05:36 +08:00
Ignacio Sica
baa67fd124 Uppercase N and M (standalone syntax change) (#9647) 2025-03-31 18:45:30 +08:00
chenyu
aca0f1befb print idx when OUT OF BOUNDS ACCESS (#9646)
in some cases (if there's a where in idx) the vmin/vmax might not be tight
2025-03-31 06:12:44 -04:00
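Why the static range can be loose: interval analysis of a where typically takes the union of the two branches' ranges and drops their correlation with the condition, so the derived vmin/vmax can be wider than any value the index actually takes. A hedged sketch (illustrative, not tinygrad's actual analysis):

```python
# Bound of where(cond, a, b) as the union of the branch ranges: sound but
# conservative, since it ignores how cond constrains each branch.
def where_bounds(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    return (min(a[0], b[0]), max(a[1], b[1]))

# idx = where(i < 4, i, 0) with i in [0, 7]: every concrete idx lies in [0, 3],
# but the union of branch ranges is [0, 7], so printing the concrete idx on an
# out-of-bounds trap is more informative than the loose static bound.
print(where_bounds((0, 7), (0, 0)))  # (0, 7), not the tight (0, 3)
```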
Priyank Patel
e2d9322d21 torch backend: partial fix for strided related test fails (#9642)
* partial fix for strided related test fails

* cleanup

* fix lint
2025-03-31 05:45:18 -04:00
qazal
76c1b1edf6 viz kernel list cleanup (#9643) 2025-03-31 15:53:39 +08:00
George Hotz
e4c545b396 linearizer fix from dsp branch (#9641)
* linearizer fix from dsp branch

* revert that
2025-03-31 14:26:39 +08:00