Commit Graph

5347 Commits

Author SHA1 Message Date
Christopher Milan
fdb30cba96 DEV is a ContextVar (#15505) 2026-03-27 00:57:09 -04:00
Christopher Milan
67a50fb738 move where on load with casts (#15492) 2026-03-26 22:11:27 -04:00
qazal
586c49642f viz/cli: test in CI (#15501)
* viz cli work

* baseline test

* make cli test work without subprocess

* more checks

* check itrace

* s/return/return None

* change

* minimal

* colored
2026-03-27 06:47:15 +09:00
qazal
3f9f0fa846 viz: yield sqtt alt events (#15500)
* yield other

* less

* work

* less
2026-03-27 04:43:41 +09:00
nimlgen
7193f90746 test view input in jit (#15497)
* will anything fail?

* add test
2026-03-26 16:59:47 +03:00
Christopher Milan
bc180a963c deprecate <dev>=1 in favor of DEV=<dev> (#15467)
* start work on target

* add test

* update actions to use DEV

* update docs

* update readmes

* tests need that too

* update example

* update tests (comments)

* fix that test

* ruff

* mypy

* oops

* remove getenvs

* don't add Target yet

* and the test

* lint

* and docs

* more stuff

* assert

* few more fixes

* test assert
2026-03-26 03:48:03 -04:00
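The `DEV=<dev>` scheme this commit deprecates `<dev>=1` in favor of (see also the "DEV is a ContextVar" commit at the top of this log) amounts to one environment-backed context variable selecting the device. The sketch below is hypothetical and self-contained — tinygrad's real `ContextVar` lives in `tinygrad.helpers` and differs in detail:

```python
import os

class ContextVar:
    """Minimal sketch of an environment-backed context variable.

    Hypothetical stand-in, NOT tinygrad's implementation: it only illustrates
    the DEV=<dev> pattern the commit above describes.
    """
    def __init__(self, key: str, default: str):
        # read the value once from the environment, falling back to a default
        self.key = key
        self.value = os.environ.get(key, default)

# e.g. `DEV=AMD python3 script.py` selects the AMD backend instead of `AMD=1`
DEV = ContextVar("DEV", "CPU")
```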
chenyu
7c8f992894 move EXPAND dtype cast back to gradient.py (#15481)
only a concern for gradient, not mixin
2026-03-25 19:25:26 -04:00
qazal
60bd546593 sqtt: add cycle count to rdna3 enums (#15473)
* update rdna3 sqtt enums to include cycle_count

* dispatch_to_exec
2026-03-25 23:19:54 +09:00
chenyu
713b322e70 add weakint to promo_lattice (#15463)
sits between bool and smallest int
2026-03-25 00:27:34 -04:00
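A promotion lattice of the kind this commit extends can be modeled as a directed graph where each dtype lists the dtypes it promotes directly to, with `weakint` slotted between `bool` and the smallest integer. The names and edges below are illustrative, not tinygrad's actual `promo_lattice`:

```python
# Hypothetical dtype promotion lattice: each key promotes directly to the
# dtypes in its list. "weakint" sits between bool and the smallest int,
# as described in the commit above.
promo_lattice = {
    "bool": ["weakint"],
    "weakint": ["int8", "uint8"],
    "int8": ["int16"], "uint8": ["int16"],
    "int16": ["int32"], "int32": ["int64"], "int64": [],
}

def promotes_to(src: str, dst: str) -> bool:
    """True if src can be promoted (transitively) to dst in the lattice."""
    if src == dst: return True
    return any(promotes_to(nxt, dst) for nxt in promo_lattice[src])
```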
George Hotz
fe2690399b llm: support assistant prefill + refactor to TransformerConfig (#15457)
* llm: support assistant prefill

* refactor to ModelConfig

* TransformerConfig

* more
2026-03-25 10:50:48 +08:00
qazal
1b3d00d6ac viz/cli: remove --offset and --limit flags (#15439)
* work

* also no more no-color

* reorder

* update llama

* sqtt readme

* itertools

* rm that

* signals back
2026-03-25 09:52:27 +09:00
qazal
652bab8aad viz: support nested track_rewrites (#15454)
* simple test

* stack active groups
2026-03-25 05:01:30 +09:00
chenyu
b7960841af support shape broadcast in UOp.alu (#15442)
i think it can integrate tighter, but now Tensor also does ufix from UOp and implicit dtype upcast
2026-03-24 10:14:57 -04:00
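Shape broadcasting of the kind this commit adds to `UOp.alu` conventionally follows the NumPy rule: align shapes from the right, and each dimension pair must match or contain a 1. A self-contained sketch of that rule (not tinygrad's code):

```python
import itertools

def broadcast_shape(*shapes):
    """NumPy-style broadcast of shapes: align right; dims must match or be 1.

    Illustrative sketch of the broadcasting rule, not tinygrad's implementation.
    """
    out = []
    # walk dimensions right-to-left, padding shorter shapes with 1s
    for dims in itertools.zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        sizes = set(dims) - {1}
        if len(sizes) > 1: raise ValueError(f"cannot broadcast {shapes}")
        out.append(sizes.pop() if sizes else 1)
    return tuple(reversed(out))
```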
George Hotz
a33ac869aa llm server: temperature + test client (#15444)
* improvements to the llm server

* eval script

* eval llm

* better eval gets 58.71

* cleanups

* add temperature, but multinomial is absurdly slow

* claude is so smart

* lint

* remove slop

* no more stop
2026-03-24 21:07:15 +08:00
George Hotz
85dee83f5d amd flash attention cleanups + emulator fixes (#15431)
* amd flash attention cleanups

* simpler

* params

* fix emulator bugs

* fix idiv bug

* remove that test

* more emu fixes
2026-03-24 10:10:46 +08:00
qazal
a590eded87 sqtt: rdna4 decoder work (#15434)
* sqtt: rdna4 decoder work

* diff cleanup

* more diff

* test

* work

* works

* TS_DELTA_SHORT
2026-03-24 03:49:32 +09:00
qazal
109472c37e sqtt: new s_barrier pickles, handle rdna4 barriers in emulator (#15437) 2026-03-24 03:25:28 +09:00
nimlgen
fa4cdb422e memplan on linears (#15422)
* memplan

* test

* x

* arenas

* correct

* set any size

* ugh

* make hevc happy

* x

* x

* held

* rm old

* del

* x

* fu

* f

* cl

* cl

* ok
2026-03-23 19:50:16 +08:00
nimlgen
2da008ae3b jit: rm replan (#15433) 2026-03-23 19:31:51 +08:00
qazal
c4c53418f8 sqtt: comment out flaky rocprof timestamp assert (#15432)
* comment out rocprof assert, add new assert

* better than > 0 assert

* string
2026-03-23 19:24:04 +09:00
George Hotz
c62dea6881 ai slop flash attention (it works) (#15401)
* ai slop flash attention (it works)

* speed up, 2 TFLOPS + 7 GB/s

* simpler

* simpler

* optimize

* faster

* warp shuffle

* sqtt: link dispatch to exec (#15396)

* sqtt packet linking infra

python

* javascript

* ~doubly linked list

* ui works

* work

* exec can also highlight the pc, coloring work

* more work

* rm sqtt/model.py, doesn't need to be upstreamed

* viz: no context enters in cli, update llama profile (#15404)

* removed unused named arg in rules [pr] (#15414)

* viz: sqtt printer in viz/cli.py (#15411)

* work

* sqtt timeline in CLI

* format all printers nicely

* s/Showed/Printed

* ansistrip

* sys.exit

* keep colors in list

* work from amd_copy_matmul

* has_more always gets returned

* linter

* don't print colors

* more colors

* wow this is so deep

* work

* minor details

* selected

* improve progress bar

* remove it

* 22, global_load_vaddr is so long

* remove *0 hack in sign, gradient materializes zeros for unconnected nodes (#15416)

Amp-Thread-ID: https://ampcode.com/threads/T-019d1612-6322-706b-a94d-a812400a55cb

Co-authored-by: Amp <amp@ampcode.com>

* works

* cnt=20

* revert that

* uop slice tests

* simpler

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: gg <ggordbegli@gmail.com>
Co-authored-by: Amp <amp@ampcode.com>
2026-03-23 16:15:10 +08:00
nimlgen
c74fa9bbe1 fix jitbeam not triggered (#15424)
* um

* beam

* x

* f
2026-03-23 15:34:59 +08:00
qazal
c7b18e6108 viz: sqtt printer in viz/cli.py (#15411)
* work

* sqtt timeline in CLI

* format all printers nicely

* s/Showed/Printed

* ansistrip

* sys.exit

* keep colors in list

* work from amd_copy_matmul

* has_more always gets returned

* linter

* don't print colors

* more colors

* wow this is so deep

* work

* minor details

* selected

* improve progress bar

* remove it

* 22, global_load_vaddr is so long
2026-03-23 00:17:05 +09:00
nimlgen
9656d97d97 jit: captures linears, not execitems (#15399)
* jit: captures linears, not execitems

* x

* um

* tests

* mockcuda
2026-03-21 16:32:12 +08:00
Christopher Milan
a12d3951de fix test_export_model imports (#15389) 2026-03-20 07:27:01 -04:00
Christopher Milan
1560b534a5 remove IMAGE=2 (#15312) 2026-03-20 06:26:52 -04:00
chenyu
c491345766 pass device into Tensor._frompy (#15385)
* pass device into Tensor._frompy

with this, canonicalize_device is the only usage of Device in tensor.py

* export_model.py
2026-03-20 05:09:01 -04:00
George Hotz
3b75d8a7a2 fix double after bug in rangeify (#15381) 2026-03-20 14:53:46 +08:00
Christopher Milan
0c89340a1e automatically emulate unsupported (tiny) floats [skip_process_replay] (#15366) 2026-03-20 02:31:44 -04:00
qazal
cf6a429aaa mypy emulator pre-commit passing (#15379)
* fix dict stuff

* add type: ignores

* fix pcode to put uops not ints
2026-03-20 14:44:09 +09:00
chenyu
da1700e16b dtypes.index -> dtypes.weakint (#15377) 2026-03-20 01:08:46 -04:00
chenyu
bf33c5f796 remove gradient materialize_grads (#15367)
effectively default to True

and removed *0 hack in Tensor.copysign. now dy/dx=0 if y does not depend on x

remove
2026-03-19 23:36:03 -04:00
qazal
176ad47d7d cdna4 emulator testing ASM_GEMM in CI (#15373)
* cdna emulator work

* accvgprs

* cdna passes most tests

* ruff

* add cdna4 to tests

* cdna emu

* crash

* pass?

* work

* gen

* clean up wave_size access

* asm_gemm passes

* remove acc from dsl.py, emulator can keep its different reg file

it's purely an encoding here, the ASM_GEMM already encodes acc srcs with v[], this can
be cleaned up later, but not functionally required for emulator.

* split asm_gemm tests to ones fast on the emulator

* don't do that

* 124 stays null on rdna

* the segfault was because of hw regs, not this

* Revert "clean up wave_size access", it's explicitly tested

This reverts commit 1202ff5787.

* nullcopyout

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-03-20 05:51:30 +09:00
Christopher Milan
68d7a6b7be PYTHONREMU: fix vop3p literals (#15372) 2026-03-19 07:05:01 -04:00
nimlgen
86eec01f97 limit gl*lc (#15359) 2026-03-19 12:38:55 +08:00
chenyu
b39816e998 failed test case for Tensor(np, "bf16") (#15358) 2026-03-18 23:40:14 -04:00
wozeparrot
c45a606750 feat: no if in rand (#15333) 2026-03-18 15:09:51 -07:00
qazal
709fc52d7b viz: fix auto zoom range in sqtt, include endpgm packet (#15349)
* viz: fix automatic zoom range in sqtt packets

* it's x+width

* include s_endpgm

* endpgm also doesn't have exec
2026-03-18 22:52:32 +09:00
nimlgen
d4836ddbb0 canonicalize device from tuple (#15348)
* will it fix CI?

* test

* um
2026-03-18 20:35:52 +08:00
George Hotz
5524916e39 llama compute gradients explicitly + 243 GB of RAM on MP=8 (#15343)
* llama compute gradients explicitly

* apply grads

* fix multi issue

* multi BUFFER_VIEW support

* simpler

* skip the flaky test
2026-03-18 19:54:40 +08:00
nimlgen
f853371c83 fix compilers autoselect (#15346) 2026-03-18 18:19:53 +08:00
chenyu
761ce8c0d3 fix Invalid combine rules (#15345)
* fix Invalid combine rules

wrong conditions broke setitem into invalids

* fix
2026-03-18 04:58:02 -04:00
chenyu
fceb21c315 Tensor(uop) uses device from uop (#15340) 2026-03-18 02:56:06 -04:00
George Hotz
6109117af1 anonymous buffers are Invalid (#15336)
* anonymous buffers are Invalid

* unique_const

* work

* remove invalid writes

* test_anonymous_buffers_in_function
2026-03-18 14:52:56 +08:00
nimlgen
d720d50e12 memory: traverse all valid ranges only (#15338)
* memory: traverse all valid ranges only

* x
2026-03-18 14:03:39 +08:00
chenyu
ac7a348d06 dtypes.as_const -> DType.const (#15337)
does not need to be a staticmethod
2026-03-18 00:48:41 -04:00
Christopher Milan
864d3917d5 add openpilot onnx parser test (#15334) 2026-03-18 00:12:02 -04:00
chenyu
94926d00d8 fix rand > uint32.max (#15330)
need to keep low and high as 1D tensor.
`PYTHONPATH=. LLAMA3_SIZE=405B python3 examples/mlperf/models/flat_llama.py` works now
2026-03-17 22:00:01 -04:00
wozeparrot
b45edeb965 fix: rand supports large tensors (#15329) 2026-03-17 15:45:41 -07:00
qazal
00817cf65e viz: all tests can run on the NULL device (#15328)
* remove that

* move to test_viz

* get_cfg

* do not use os.environ

* hm

* it's always on NULL

* import renderer

* no import *
2026-03-18 04:14:20 +09:00