Christopher Milan
9f4b7bed25
add pickled jit regression test ( #15774 )
2026-04-16 16:59:09 -04:00
qazal
12c653a743
remove opts arg in get_program, everything uses opts_to_apply [pr] ( #15767 )
...
* check Ops.BEAM in process replay
* remove opts from the get_program api
* lint
* simplify
* cleanup
2026-04-16 22:42:43 +03:00
George Hotz
d1cce7a476
put the ranges on store instead of after ( #15759 )
...
* put the ranges on store instead of after
* better assert
* fix stuff
* comment out slow rules I don't understand
* simpler rule
* closer
* return false for store
* fix loop
* only a few schedule failures remain
* remove stores to self
* all tests pass locally
* remove junk
* regression test and fix
* better test, bump broken torch count
* bugfix with regression test
* new fusion is better
2026-04-16 19:06:40 +08:00
chenyu
218d6b8988
delete old UOp.size [pr] ( #15756 )
2026-04-15 23:21:00 -04:00
chenyu
8bd4fead26
UOp.size -> prod(max_shape) ( #15755 )
...
and more test updates
2026-04-15 22:41:30 -04:00
chenyu
10c262ced8
update tests that use UOp.size ( #15753 )
2026-04-15 21:58:27 -04:00
nimlgen
164495678c
test_graph to use uops ( #15746 )
...
* test_graph to use uops
* x
* n
2026-04-15 21:59:41 +03:00
George Hotz
1ae6528bb6
move schedule into schedule ( #15736 )
...
* move schedule into schedule
* callify to root
* sched docs
2026-04-15 11:03:25 +08:00
chenyu
3394d18066
size*itemsize -> nbytes ( #15729 )
...
and some UOp.size removal to prep for the size-to-mixin change
2026-04-14 16:27:54 -04:00
chenyu
e706f408cb
suppress test warnings from numpy ( #15688 )
2026-04-11 22:33:20 -04:00
chenyu
8e7fcc8ca3
remove _include_initial in _cumalu ( #15674 )
...
handle negative pad in caller
2026-04-10 08:33:30 -04:00
chenyu
4cf2759fc8
fix merge_reduce_ends ( #15659 )
...
* fix merge_reduce_ends
the same range with different nesting should not merge, e.g. applying cumsum twice should not merge
* skip that
2026-04-08 17:20:01 -04:00
qazal
39a029ec55
remove ASM_GEMM context var ( #15645 )
2026-04-08 18:02:40 +09:00
wozeparrot
70dbd35023
llama: move custom_kernel into flat_llama ( #15643 )
2026-04-08 00:19:14 -07:00
chenyu
01b49c8647
support int operand for shifts ( #15618 )
...
matches torch/jax; also adds a symbolic rule to remove the mask
2026-04-06 12:32:12 -04:00
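A minimal usage sketch of the int-operand shifts described in #15618 above; this is an illustration only, assuming Tensor's << / >> operators accept a plain Python int on unsigned-integer tensors, with values chosen for the example:

    from tinygrad import Tensor, dtypes

    x = Tensor([1, 2, 4], dtype=dtypes.uint32)
    print((x << 3).tolist())  # elementwise left shift by a Python int -> [8, 16, 32]
    print((x >> 1).tolist())  # elementwise right shift by a Python int -> [0, 1, 2]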
Andrew Cappelli
e39cfe685a
validate lr, momentum, weight_decay in optimizers ( #15576 )
2026-04-06 06:37:34 +08:00
wozeparrot
7e54992bf6
fp8 llama ( #15588 )
...
Co-authored-by: qazal <qazal.software@gmail.com>
2026-04-04 18:24:57 -07:00
Christopher Milan
645d45d968
DEV has arch ( #15577 )
...
Co-authored-by: Comma Device <device@comma.ai>
2026-04-03 19:17:19 -04:00
chenyu
8fdef2d3e4
mean/std/var to mixin ( #15593 )
2026-04-03 10:42:41 -04:00
Christopher Milan
0ed8d9271d
Renderers accept Target or nothing ( #15590 )
2026-04-03 01:09:41 -04:00
qazal
fefb0ebc2a
gemm/asm: fp8 cleanups ( #15580 )
...
* normal gemm here
* s/dtypes.fp8e4m3/FP8_DTYPE
* gemm_bw
* device UOp stays NULL
2026-04-02 19:02:38 +09:00
chenyu
1aa04eab08
simple CreationMixin ( #15567 )
...
start with full_like, zeros_like, ones_like
2026-04-01 23:00:56 -04:00
Christopher Milan
0d6fbc2355
remove flaky and redundant image test ( #15574 )
2026-04-01 16:33:13 -04:00
chenyu
fc5b94b902
fix UOp.where(const, const) ( #15560 )
...
* fix UOp.where(const, const)
* fix
2026-04-01 05:28:49 -04:00
Christopher Milan
acf239e4d2
specify renderer in DEV, <dev>_<ren>=1 is deprecated ( #15551 )
2026-03-31 18:35:14 -04:00
qazal
8feb8edc68
gemm/asm: add fp8 support to cdna asm_gemm ( #15542 )
...
* work
* hmm, mixins
* rhs_transposed
* also fix the dtype
* check for hipcc
* Exception
* select dev
* default
2026-03-31 19:32:54 +09:00
qazal
f88e255cea
gemm/asm: split and parameterize dtype in llama gemm tests ( #15408 )
...
* gemm/asm: more tests for emulator, parameterize llama gemm tests
* bf16 atol
2026-03-31 17:12:44 +09:00
chenyu
f0eaac4235
reduce mixin ( #15523 )
2026-03-30 05:23:58 -04:00
nimlgen
0d6fc0f571
jit: graphing in uops ( #15489 )
...
* jit: graphing as rewrite rule
* f
* +metal,cuda
* x
* cl
* x
* x
* simpler
* f
* m
* x
* revert?
* revert2
* back
* back
* t
* x
* m
* x
* c
* x
* l
* x
* comment
* smaller
* rv
* x
* x
2026-03-27 19:09:02 +03:00
nimlgen
7193f90746
test view input in jit ( #15497 )
...
* will anything fail?
* add test
2026-03-26 16:59:47 +03:00
Christopher Milan
bc180a963c
deprecate <dev>=1 in favor of DEV=<dev> ( #15467 )
...
* start work on target
* add test
* update actions to use DEV
* update docs
* update readmes
* tests need that too
* update example
* update tests (comments)
* fix that test
* ruff
* mypy
* oops
* remove getenvs
* don't add Target yet
* and the test
* lint
* and docs
* more stuff
* assert
* few more fixes
* test assert
2026-03-26 03:48:03 -04:00
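A minimal sketch of the device selection style this deprecation (#15467 above) points to; the device name "CPU" and setting the variable before import are assumptions for illustration, not taken from the commit:

    import os
    # new style: DEV=<dev> selects the backend (the old <dev>=1 form, e.g. CPU=1, is deprecated)
    os.environ["DEV"] = "CPU"  # assumed device name; set before tinygrad is imported

    from tinygrad import Tensor
    print(Tensor([1.0, 2.0, 3.0]).sum().item())  # runs on the device selected via DEV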
nimlgen
2da008ae3b
jit: rm replan ( #15433 )
2026-03-23 19:31:51 +08:00
nimlgen
c74fa9bbe1
fix jitbeam not triggered ( #15424 )
...
* um
* beam
* x
* f
2026-03-23 15:34:59 +08:00
nimlgen
9656d97d97
jit: captures linears, not execitems ( #15399 )
...
* jit: captures linears, not execitems
* x
* um
* tests
* mockcuda
2026-03-21 16:32:12 +08:00
Christopher Milan
1560b534a5
remove IMAGE=2 ( #15312 )
2026-03-20 06:26:52 -04:00
Christopher Milan
0c89340a1e
automatically emulate unsupported (tiny) floats [skip_process_replay] ( #15366 )
2026-03-20 02:31:44 -04:00
chenyu
da1700e16b
dtypes.index -> dtypes.weakint ( #15377 )
2026-03-20 01:08:46 -04:00
qazal
176ad47d7d
cdna4 emulator testing ASM_GEMM in CI ( #15373 )
...
* cdna emulator work
* accvgprs
* cdna passes most tests
* ruff
* add cdna4 to tests
* cdna emu
* crash
* pass?
* work
* gen
* clean up wave_size access
* asm_gemm passes
* remove acc from dsl.py, emulator can keep its different reg file
it's purely an encoding detail here; the ASM_GEMM already encodes acc srcs with v[]. This can
be cleaned up later, but it's not functionally required for the emulator.
* split asm_gemm tests to ones fast on the emulator
* don't do that
* 124 stays null on rdna
* the segfault was because of hw regs, not this
* Revert "clean up wave_size access", it's explicitly tested
This reverts commit 1202ff5787.
* nullcopyout
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-03-20 05:51:30 +09:00
chenyu
b39816e998
failed test case for Tensor(np, "bf16") ( #15358 )
2026-03-18 23:40:14 -04:00
wozeparrot
c45a606750
feat: no if in rand ( #15333 )
2026-03-18 15:09:51 -07:00
George Hotz
5524916e39
llama compute gradients explicitly + 243 GB of RAM on MP=8 ( #15343 )
...
* llama compute gradients explicitly
* apply grads
* fix multi issue
* multi BUFFER_VIEW support
* simpler
* skip the flaky test
2026-03-18 19:54:40 +08:00
chenyu
761ce8c0d3
fix Invalid combine rules ( #15345 )
...
* fix Invalid combine rules
wrong conditions broke setitem into invalids
* fix
2026-03-18 04:58:02 -04:00
chenyu
fceb21c315
Tensor(uop) uses device from uop ( #15340 )
2026-03-18 02:56:06 -04:00
George Hotz
6109117af1
anonymous buffers are Invalid ( #15336 )
...
* anonymous buffers are Invalid
* unique_const
* work
* remove invalid writes
* test_anonymous_buffers_in_function
2026-03-18 14:52:56 +08:00
chenyu
ac7a348d06
dtypes.as_const -> DType.const ( #15337 )
...
does not need to be a staticmethod
2026-03-18 00:48:41 -04:00
wozeparrot
b45edeb965
fix: rand supports large tensors ( #15329 )
2026-03-17 15:45:41 -07:00
wozeparrot
674c760974
embedded bwd vocab shard ( #15001 )
...
* fix: remove more multi from call
* feat: embedding bwd vocab sharding
* clean: unused import
* clean: don't actually need this pattern
2026-03-16 19:37:16 -07:00
qazal
33bd33e783
sqtt: add CDNA ops enum, show in viz ( #15140 )
2026-03-17 09:38:42 +09:00
qazal
5cd1daa3bc
cdna asm_gemm in one file, remove old rdna3 asm ( #15281 )
2026-03-16 04:32:30 +09:00
chenyu
842c978df3
remove staticmethod dtypes.max/min ( #15227 )
...
always use x.dtype.max/min
2026-03-11 23:11:24 -04:00
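A minimal sketch of the "always use x.dtype.max/min" pattern from #15227 above; it assumes max/min are exposed on the DType instance after this change, and the bound values shown are just the usual int8 limits:

    from tinygrad import Tensor, dtypes

    x = Tensor([1, 2, 3], dtype=dtypes.int8)
    # query numeric bounds off the dtype instance instead of the old dtypes.max/min staticmethods
    print(x.dtype.max, x.dtype.min)  # 127 -128 for int8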