Commit Graph

138 Commits

Author SHA1 Message Date
Christopher Milan
19e96497ee interface in DEV (#15620) 2026-04-06 19:59:28 -04:00
qazal
8ba58304f7 viz: reenable tests (#15626) 2026-04-07 07:52:44 +09:00
chenyu
2f7d085450 shared _normalize_indices for getitem (#15625)
* shared _normalize_indices for getitem

* list
2026-04-06 17:45:36 -04:00
chenyu
01b49c8647 support int operand for shifts (#15618)
matches torch/jax; also adds a symbolic rule to remove the mask
2026-04-06 12:32:12 -04:00
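
A hedged sketch of what this enables (assuming tinygrad's Tensor supports `<<`/`>>` with a plain int on either side, as in torch/jax; values are illustrative):

```python
from tinygrad import Tensor

# illustrative only: shifting a Tensor by a plain Python int, and an int by a
# Tensor, should both work and match torch/jax semantics after this change
t = Tensor([1, 2, 4])
print((t << 1).tolist())  # expected: [2, 4, 8]
print((1 << t).tolist())  # expected: [2, 4, 16] (int operand on the left)
```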
Christopher Milan
645d45d968 DEV has arch (#15577)
Co-authored-by: Comma Device <device@comma.ai>
2026-04-03 19:17:19 -04:00
Christopher Milan
0ed8d9271d Renderers accept Target or nothing (#15590) 2026-04-03 01:09:41 -04:00
chenyu
1aa04eab08 simple CreationMixin (#15567)
start with full_like, zeros_like, ones_like
2026-04-01 23:00:56 -04:00
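
An illustrative sketch of the mixin pattern the commit describes (hypothetical class and method bodies, not the actual tinygrad code): the `*_like` constructors derive shape and fill value from `self`, so one implementation can be shared.

```python
# Hypothetical sketch of a CreationMixin, not the tinygrad implementation:
# *_like helpers reuse the host class's own `full` constructor and `shape`.
class CreationMixin:
  def full_like(self, fill_value): return type(self).full(self.shape, fill_value)
  def zeros_like(self): return self.full_like(0)
  def ones_like(self): return self.full_like(1)

class Toy(CreationMixin):
  def __init__(self, shape, value=0): self.shape, self.value = shape, value
  @classmethod
  def full(cls, shape, fill_value): return cls(shape, fill_value)

assert Toy((2, 3)).ones_like().value == 1  # shape and fill derived from self
```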
Christopher Milan
6c67bd4c14 better error message when invalid renderer is specified (#15573) 2026-04-01 17:12:55 -04:00
b1tg
20497f2840 fold BIND to CONST when min==max (#15568) 2026-04-01 11:19:04 -04:00
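
A plain-Python illustration of the folding rule (hypothetical tuple encoding, not tinygrad's UOp types): a bound symbol whose range collapses to a single value is just that constant.

```python
def fold_bind(vmin: int, vmax: int):
  # when the variable's range collapses to one value, it is a constant
  return ("CONST", vmin) if vmin == vmax else ("BIND", vmin, vmax)

assert fold_bind(5, 5) == ("CONST", 5)
assert fold_bind(0, 7) == ("BIND", 0, 7)
```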
chenyu
f5c0794df2 fix Tensor.const_like (#15565)
it used to always return a 0-d tensor; now it returns an expanded Tensor based on self.shape, matching UOp
2026-04-01 08:35:19 -04:00
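
A hedged usage sketch (assuming `Tensor.const_like` is public API; the expected shape follows the commit message):

```python
from tinygrad import Tensor

t = Tensor([[1.0, 2.0], [3.0, 4.0]])
c = t.const_like(0.5)
# per the commit: previously 0-d, now expanded to self.shape like UOp.const_like
print(c.shape)  # expected: (2, 2)
```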
Christopher Milan
acf239e4d2 specify renderer in DEV, <dev>_<ren>=1 is deprecated (#15551) 2026-03-31 18:35:14 -04:00
chenyu
4ac2552642 improve ReduceMixin.all (#15544)
use prod instead of min, since `mul` is lowered to `and` directly
2026-03-31 07:54:27 -04:00
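
Why `prod` is a valid (and cheaper-to-lower) reduction for `.all()` on booleans, checked exhaustively in plain Python: over values in {0, 1}, the product is 1 exactly when every element is, same as the minimum.

```python
from itertools import product as cartesian
from math import prod

# exhaustive check over all 4-element boolean tuples
for xs in cartesian([0, 1], repeat=4):
  assert bool(prod(xs)) == all(xs) == bool(min(xs))
```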
chenyu
89ec22131a tests to show double negation in min is not cancelled (#15543) 2026-03-31 06:59:13 -04:00
qazal
467c0af8aa viz: skip flaky server tests (#15538) 2026-03-31 17:20:30 +09:00
Christopher Milan
adbfd82d1d DEV is ContextVar, setting Device.DEFAULT is deprecated (#15508) 2026-03-30 17:10:49 -04:00
chenyu
c0753ab62f XOR simplification rules (#15512)
x^-1 has good vmin/vmax, and x^y^y is x
2026-03-27 23:23:27 -04:00
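
Both identities check out on plain integers: `x ^ -1` is bitwise NOT (so its value range is easy to bound), and XOR-ing with the same value twice cancels.

```python
for x in range(-8, 9):
  assert (x ^ -1) == ~x      # x^-1 is NOT, giving tight vmin/vmax
  for y in range(-8, 9):
    assert (x ^ y ^ y) == x  # double XOR with y cancels
```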
nimlgen
0d6fc0f571 jit: graphing in uops (#15489)
* jit: graphing as rewrite rule

* +metal,cuda

* simpler

* comment

* smaller

* (assorted single-letter WIP fixup commits squashed on merge)
2026-03-27 19:09:02 +03:00
chenyu
30ebbe7f17 few more fold valid tests (#15509)
from the attempt to remove CORRECT_DIVMOD_FOLDING
2026-03-27 10:38:42 -04:00
chenyu
323fcefd7d Revert "DEV is a ContextVar (#15505)" (#15506)
This reverts commit fdb30cba96.
2026-03-27 02:22:40 -04:00
Christopher Milan
fdb30cba96 DEV is a ContextVar (#15505) 2026-03-27 00:57:09 -04:00
Christopher Milan
67a50fb738 move where on load with casts (#15492) 2026-03-26 22:11:27 -04:00
Christopher Milan
bc180a963c deprecate <dev>=1 in favor of DEV=<dev> (#15467)
* start work on target

* add test

* update actions to use DEV

* update docs

* update readmes

* tests need that too

* update example

* update tests (comments)

* fix that test

* ruff

* mypy

* oops

* remove getenvs

* don't add Target yet

* and the test

* lint

* and docs

* more stuff

* assert

* few more fixes

* test assert
2026-03-26 03:48:03 -04:00
chenyu
7c8f992894 move EXPAND dtype cast back to gradient.py (#15481)
only a concern for gradient, not the mixin
2026-03-25 19:25:26 -04:00
chenyu
713b322e70 add weakint to promo_lattice (#15463)
it sits between bool and the smallest int
2026-03-25 00:27:34 -04:00
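
A hypothetical mini-lattice (not tinygrad's real promotion table) showing where weakint sits: it promotes from bool and into the smallest ints.

```python
# hypothetical lattice edges, illustrative only
promo = {"bool": ("weakint",), "weakint": ("int8", "uint8"),
         "int8": ("int16",), "uint8": ("uint16",)}

def promotes_to(a: str, b: str) -> bool:
  # walk upward through the lattice from a, looking for b
  return a == b or any(promotes_to(n, b) for n in promo.get(a, ()))

assert promotes_to("bool", "weakint") and promotes_to("weakint", "int8")
assert not promotes_to("int8", "weakint")
```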
George Hotz
fe2690399b llm: support assistant prefill + refactor to TransformerConfig (#15457)
* llm: support assistant prefill

* refactor to ModelConfig

* TransformerConfig

* more
2026-03-25 10:50:48 +08:00
qazal
652bab8aad viz: support nested track_rewrites (#15454)
* simple test

* stack active groups
2026-03-25 05:01:30 +09:00
chenyu
b7960841af support shape broadcast in UOp.alu (#15442)
I think it can be integrated more tightly, but for now Tensor also does ufix from UOp and implicit dtype upcast
2026-03-24 10:14:57 -04:00
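
The standard right-aligned broadcast rule the commit refers to, as a standalone sketch (plain Python, not tinygrad internals): trailing dims must match, or one of them must be 1.

```python
from itertools import zip_longest

def broadcast_shape(a: tuple, b: tuple) -> tuple:
  out = []
  # align shapes from the right, padding the shorter one with 1s
  for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
    if x != y and 1 not in (x, y): raise ValueError(f"cannot broadcast {a} and {b}")
    out.append(max(x, y))
  return tuple(reversed(out))

assert broadcast_shape((3, 1), (4,)) == (3, 4)
assert broadcast_shape((), (2, 2)) == (2, 2)
```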
George Hotz
a33ac869aa llm server: temperature + test client (#15444)
* improvements to the llm server

* eval script

* eval llm

* better eval gets 58.71

* cleanups

* add temperature, but multinomial is absurdly slow

* claude is so smart

* lint

* remove slop

* no more stop
2026-03-24 21:07:15 +08:00
George Hotz
85dee83f5d amd flash attention cleanups + emulator fixes (#15431)
* amd flash attention cleanups

* simpler

* params

* fix emulator bugs

* fix idiv bug

* remove that test

* more emu fixes
2026-03-24 10:10:46 +08:00
nimlgen
fa4cdb422e memplan on linears (#15422)
* memplan

* test

* arenas

* correct

* set any size

* make hevc happy

* held

* rm old

* del

* (assorted single-letter WIP fixup commits squashed on merge)
2026-03-23 19:50:16 +08:00
George Hotz
c62dea6881 ai slop flash attention (it works) (#15401)
* ai slop flash attention (it works)

* speed up, 2 TFLOPS + 7 GB/s

* simpler

* simpler

* optimize

* faster

* warp shuffle

* sqtt: link dispatch to exec (#15396)

* sqtt packet linking infra

python

* javascript

* ~doubly linked list

* ui works

* work

* exec can also highlight the pc, coloring work

* more work

* rm sqtt/model.py, doesn't need to be upstreamed

* viz: no context enters in cli, update llama profile (#15404)

* removed unused named arg in rules [pr] (#15414)

* viz: sqtt printer in viz/cli.py (#15411)

* work

* sqtt timeline in CLI

* format all printers nicely

* s/Showed/Printed

* ansistrip

* sys.exit

* keep colors in list

* work from amd_copy_matmul

* has_more always gets returned

* linter

* don't print colors

* more colors

* wow this is so deep

* work

* minor details

* selected

* improve progress bar

* remove it

* 22, global_load_vaddr is so long

* remove *0 hack in sign, gradient materializes zeros for unconnected nodes (#15416)

Amp-Thread-ID: https://ampcode.com/threads/T-019d1612-6322-706b-a94d-a812400a55cb

Co-authored-by: Amp <amp@ampcode.com>

* works

* cnt=20

* revert that

* uop slice tests

* simpler

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: gg <ggordbegli@gmail.com>
Co-authored-by: Amp <amp@ampcode.com>
2026-03-23 16:15:10 +08:00
nimlgen
9656d97d97 jit: captures linears, not execitems (#15399)
* jit: captures linears, not execitems

* x

* um

* tests

* mockcuda
2026-03-21 16:32:12 +08:00
Christopher Milan
1560b534a5 remove IMAGE=2 (#15312) 2026-03-20 06:26:52 -04:00
Christopher Milan
0c89340a1e automatically emulate unsupported (tiny) floats [skip_process_replay] (#15366) 2026-03-20 02:31:44 -04:00
chenyu
da1700e16b dtypes.index -> dtypes.weakint (#15377) 2026-03-20 01:08:46 -04:00
nimlgen
86eec01f97 limit gl*lc (#15359) 2026-03-19 12:38:55 +08:00
nimlgen
d4836ddbb0 canonicalize device from tuple (#15348)
* will it fix CI?

* test

* um
2026-03-18 20:35:52 +08:00
George Hotz
5524916e39 llama compute gradients explicitly + 243 GB of RAM on MP=8 (#15343)
* llama compute gradients explicitly

* apply grads

* fix multi issue

* multi BUFFER_VIEW support

* simpler

* skip the flaky test
2026-03-18 19:54:40 +08:00
nimlgen
f853371c83 fix compilers autoselect (#15346) 2026-03-18 18:19:53 +08:00
chenyu
94926d00d8 fix rand > uint32.max (#15330)
need to keep low and high as 1D tensors.
`PYTHONPATH=. LLAMA3_SIZE=405B python3 examples/mlperf/models/flat_llama.py` works now
2026-03-17 22:00:01 -04:00
qazal
00817cf65e viz: all tests can run on the NULL device (#15328)
* remove that

* move to test_viz

* get_cfg

* do not use os.environ

* hm

* it's always on NULL

* import renderer

* no import *
2026-03-18 04:14:20 +09:00
chenyu
14eb8170e4 skip TestRunAsModule if libclang is loaded (#15323)
reverses the TestAutogen skip rule; otherwise `NULL=1 python -m pytest test/null/test_autogen.py test/null/test_device.py` crashes for me
2026-03-17 06:02:53 -04:00
Christopher Milan
9047249a7c m.where(x.pad_to(m.shape), Invalid) ranges shrink (#15275) 2026-03-14 07:26:36 -04:00
Christopher Milan
dabdc986df shrink guarded ranges, try 2 (#15272) 2026-03-14 04:24:05 -04:00
Christopher Milan
7cf4b16c91 Revert "shrink guarded ranges" (#15271) 2026-03-14 03:44:38 -04:00
Christopher Milan
d9951e2f8e shrink guarded ranges (#15263) 2026-03-14 03:38:48 -04:00
chenyu
90b7f4341d failed two level divmod recombine case (#15233) 2026-03-12 04:04:36 -04:00
chenyu
842c978df3 remove staticmethod dtypes.max/min (#15227)
always use x.dtype.max/min
2026-03-11 23:11:24 -04:00
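
A hedged usage sketch of the replacement spelling (assuming DType instances expose `.max`/`.min` properties, per the commit message):

```python
from tinygrad import Tensor, dtypes

t = Tensor([1, 2, 3], dtype=dtypes.int8)
# old (removed): dtypes.max(t.dtype); new: read it off the dtype instance
print(t.dtype.max, t.dtype.min)  # expected: 127 -128
```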
chenyu
fce87f19a8 better fold_add_divmod_recombine (#15214) 2026-03-10 23:24:22 -04:00
chenyu
df8deec949 test for nest_by_factor selection (#15213) 2026-03-10 22:41:31 -04:00