tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-29 03:00:14 -04:00

Author	SHA1	Message	Date
chenyu	bcc08307da	removed unused named arg in rules [pr] (#15414)	2026-03-22 09:25:46 -04:00
qazal	2363bceb47	viz: no context enters in cli, update llama profile (#15404)	2026-03-22 05:47:02 +09:00
qazal	a9ceaf3c5f	sqtt: link dispatch to exec (#15396) * sqtt packet linking infra python * javascript * ~doubly linked list * ui works * work * exec can also highlight the pc, coloring work * more work * rm sqtt/model.py, doesn't need to be upstreamed	2026-03-21 23:48:58 +09:00
nimlgen	9656d97d97	jit: captures linears, not execitems (#15399) * jit: captures linears, not execitems * x * um * etsts * mockcuda	2026-03-21 16:32:12 +08:00
George Hotz	c13d9d29ff	add SHAPED_WMMA (#15400) * add SHAPED_WMMA * shaped wmma * less bad	2026-03-21 16:16:03 +08:00
George Hotz	41a9b09683	minimal vec in amd_copy_matmul (#15398) * minimal vec in amd_copy_matmul * unified * unify * reshape/permute * cleanups * simpler * move index * cleanups * more shared	2026-03-21 14:57:21 +08:00
qazal	30b3054fd5	whitespace cleanups in viz and sqtt.py (#15395)	2026-03-21 04:46:19 +09:00
qazal	71ccc69c52	FP8=1 llama works again, hipcc can run on macos (#15394) * hipcc macos shim * is_dtype_supported opens devices less	2026-03-20 23:43:15 +09:00
Christopher Milan	9470d5193a	deterministic decomp apply order (#15393)	2026-03-20 08:10:45 -04:00
Christopher Milan	376585b003	use should_emulate for target dtype in decomp (#15392)	2026-03-20 07:44:57 -04:00
Christopher Milan	a12d3951de	fix test_export_model imports (#15389)	2026-03-20 07:27:01 -04:00
George Hotz	1a2a203f48	add wmma support to amd_copy_matmul (#15384) * add wmma support to amd_copy_matmul * 15 TFLOPS and merged * unify * simpler * simpler * simpler * cleanups * TM/TN is the full regs * comments * WAVES_PER_SH + SQTT_EVENT * Add WAVERDY support * no split warp * 3 range	2026-03-20 19:02:19 +08:00
Christopher Milan	1560b534a5	remove IMAGE=2 (#15312)	2026-03-20 06:26:52 -04:00
Christopher Milan	30d609432f	ci: only xcode-select for gpuocelot on macos (#15387)	2026-03-20 05:58:16 -04:00
chenyu	d1b4e37dfa	remove InvalidType branch in Tensor.__init__ (#15386) it's handled by `elif isinstance(data, get_args(ConstType)):` already	2026-03-20 05:32:33 -04:00
chenyu	c491345766	pass device into Tensor._frompy (#15385) * pass device into Tensor._frompy with this, canonicalize_device is the only usage of Device in tensor.py * export_model.py	2026-03-20 05:09:01 -04:00
George Hotz	3b75d8a7a2	fix double after bug in rangeify (#15381)	2026-03-20 14:53:46 +08:00
Christopher Milan	0c89340a1e	automatically emulate unsupported (tiny) floats [skip_process_replay] (#15366)	2026-03-20 02:31:44 -04:00
George Hotz	78ad089817	make precompile the default for llm (#15376) * make precompile the default for llm * works * empty is okay for kvcache * fix cache misses * more tests	2026-03-20 14:08:55 +08:00
chenyu	459ef41ea0	don't exclude weakint in is_dtype_supported [pr] (#15378)	2026-03-20 02:08:29 -04:00
qazal	cf6a429aaa	mypy emulator pre-commit passing (#15379) * fix dict stuff * add type: ignores * fix pcode to put uops not ints	2026-03-20 14:44:09 +09:00
wozeparrot	87c4ec1724	llama: use flat llama (#15353)	2026-03-19 22:12:38 -07:00
chenyu	da1700e16b	dtypes.index -> dtypes.weakint (#15377)	2026-03-20 01:08:46 -04:00
nimlgen	3b04e3ea28	no gmmu mappings with GMMU=0 (#15369) * usb * free * simple gmmu=0 * x * x * vram * init tests * ppg * x	2026-03-20 12:18:34 +08:00
ridoy majumdar	c1183b8872	remove dead code in pyrender (#15115) * remove dead code in pyrender * retrig CI * retrig CI --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2026-03-19 23:59:56 -04:00
chenyu	bf33c5f796	remove gradient materialize_grads (#15367) effectively default to True and removed *0 hack in Tensor.copysign. now dy/dx=0 if y does not depend on x remove	2026-03-19 23:36:03 -04:00
chenyu	45baf3ff3f	pin ci xcode version (#15375)	2026-03-19 23:13:16 -04:00
George Hotz	4091d37e8e	flat llama step work (#15355) * flat llama step work * fp8 support * blacklisted matmul * chestertons fence	2026-03-20 09:06:12 +08:00
qazal	176ad47d7d	cdna4 emulator testing ASM_GEMM in CI (#15373) * cdna emulator work * accvgprs * cdna passes most tests * ruff * add cdna4 to tests * cdna emu * crash * pass? * work * gen * clean up wave_size access * asm_gemm passes * remove acc from dsl.py, emulator can keep its different reg file it's purely an encoding here, the ASM_GEMM already encodes acc srcs with v[], this can be cleaned up later, but not functionally required for emulator. * split asm_gemm tests to ones fast on the emulator * don't do that * 124 stays null on rdna * the segfault was because of hw regs, not this * Revert "clean up wave_size access", it's explicitly tested This reverts commit `1202ff5787`. * nullcopyout --------- Co-authored-by: George Hotz <geohot@gmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2026-03-20 05:51:30 +09:00
nimlgen	16daffc042	remote connection timeout (#15370)	2026-03-19 19:44:16 +08:00
Christopher Milan	68d7a6b7be	PYTHONREMU: fix vop3p literals (#15372)	2026-03-19 07:05:01 -04:00
George Hotz	70dad9d642	add PING to RemoteCmd (#15371) * add PING to RemoteCmd * cleanup	2026-03-19 18:57:40 +08:00
nimlgen	1c978aeedb	amd: fix aql remote (#15368)	2026-03-19 18:11:03 +08:00
qazal	337c684047	viz: cycle time relative to kernel start in sidebar (#15352)	2026-03-19 18:41:29 +09:00
chenyu	d81b03cff4	pad_to to mixin [pr] (#15365)	2026-03-19 05:02:01 -04:00
chenyu	1abb6297f6	more Tensor(UOp) cleanups (#15364) * more Tensor(UOp) cleanups * function too	2026-03-19 03:34:30 -04:00
nimlgen	cf50ca23c3	better oom msg (#15362) * better oom msg * s	2026-03-19 14:07:01 +08:00
nimlgen	1a53393512	remote in ci benchmark (#15344) * remote in ci benchmark * move to the end * move * ports * own this	2026-03-19 13:49:09 +08:00
chenyu	92dfef8060	Tensor(uop) does not need explicit device (#15361)	2026-03-19 00:44:33 -04:00
nimlgen	f32c2e43a7	memory: use pfree (#15360)	2026-03-19 12:39:23 +08:00
nimlgen	86eec01f97	limit gl*lc (#15359)	2026-03-19 12:38:55 +08:00
chenyu	b39816e998	failed test case for Tensor(np, "bf16") (#15358)	2026-03-18 23:40:14 -04:00
chenyu	e407ee410c	cosmetic Tensor._do_reduction cleanups (#15357)	2026-03-18 22:27:50 -04:00
chenyu	6aebf95dac	move neg and invert to mixin (#15356)	2026-03-18 22:03:41 -04:00
wozeparrot	f6687d1ffc	feat: sd seed0 update (#15354)	2026-03-18 18:42:00 -07:00
wozeparrot	c45a606750	feat: no if in rand (#15333)	2026-03-18 15:09:51 -07:00
qazal	23e0431848	viz: switch sqtt sidebar to a simple asm list (#15350) * work * something like this * Revert "something like this" This reverts commit `6c45098d2b`. * less * path includes * scroll only jumps up and down * it's only pc and line now	2026-03-19 01:40:25 +09:00
qazal	709fc52d7b	viz: fix auto zoom range in sqtt, include endpgm packet (#15349) * viz: fix automatic zoom range in sqtt packets * it's x+width * include s_endpgm * endpgm also doesn't have exec	2026-03-18 22:52:32 +09:00
nimlgen	d4836ddbb0	canonicalize device from tuple (#15348) * will it ifx ci? * test * um	2026-03-18 20:35:52 +08:00
George Hotz	5524916e39	llama compute gradients explicitly + 243 GB of RAM on MP=8 (#15343) * llama compute gradients explicitly * apply grads * fix multi issue * multi BUFFER_VIEW support * simpler * skip the flaky test	2026-03-18 19:54:40 +08:00

12714 Commits