wozeparrot
be23772d43
llama3 fixes part2 (#15150)
2026-03-04 23:43:50 -08:00
wozeparrot
0c769289eb
llama3: more scripts (#15107)
2026-03-04 22:18:03 -08:00
George Hotz
fb43b415f9
fix symbolic shape call + chunked prefill (#15149)
* fix precompile for symbolic shape
* chunked prefill
* cleaner
* test that
2026-03-05 14:02:26 +08:00
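The chunked-prefill commit above splits a long prompt into fixed-size pieces so one compiled kernel (with a symbolic length) can serve any prompt. A minimal sketch of the chunking idea only, with a made-up chunk size and no tinygrad APIs:

```python
# Chunked prefill sketch: walk the prompt in fixed-size chunks so the
# same kernel shape can be reused; the last chunk may be shorter.
def chunked_prefill(tokens, chunk=4):  # chunk=4 is a hypothetical size
    """Yield (start_position, chunk_of_tokens) pairs covering the prompt."""
    for start in range(0, len(tokens), chunk):
        yield start, tokens[start:start + chunk]

chunks = list(chunked_prefill(list(range(10))))
assert [start for start, _ in chunks] == [0, 4, 8]
assert [len(c) for _, c in chunks] == [4, 4, 2]
```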
George Hotz
8a82b26522
llm: print the prefill cache size (#15146)
* print the llm prefill cache size
* mock that too
2026-03-05 12:13:28 +08:00
chenyu
b5370fd52d
use copy_multi in alu_multi [pr] (#15143)
* use copy_multi in alu_multi [pr]
* copy to anything
2026-03-04 22:53:00 -05:00
George Hotz
72a9ed6e23
fix render depth bug + add warmup to serve + no realize default (#15144)
* fix render depth bug + add warmup to serve
* make realize not the default
2026-03-05 11:21:16 +08:00
George Hotz
ac1847cbf7
fully symbolic llm (#15097)
* work
* llm symbolic (almost)
* work
* revert that
* llm sym
* works
* cleanups
* cache tokens with the kv cache
* cleanups
* cleanups
2026-03-05 10:22:11 +08:00
qazal
33a1970045
sqtt: simplify inst mapping, validate JUMP processing in CI (#15139)
* jump cleanup
* assert there's a JUMP
* new example for JUMP
* regenerate examples
* rdna4 work
* new packets
* work
* less for branch handling
* less verbose
* fix err message
2026-03-05 09:53:12 +09:00
chenyu
04da527a7a
minor div_and_mod_symbolic cleanups (#15138)
2026-03-04 19:05:44 -05:00
chenyu
106d18b792
use UOp methods in allreduce.py [pr] (#15137)
except the one line with Ops.BUFFER and Ops.NOOP, not sure what that's for
2026-03-04 17:15:33 -05:00
chenyu
34594bcaaf
Revert "bug in metal: offset is stored as uint32, overflow (#15129)" (#15136)
This reverts commit 9c58db16fa.
2026-03-04 16:54:42 -05:00
Roelof van Dijk
9c58db16fa
bug in metal: offset is stored as uint32, overflow (#15129)
* metal uint32 icb offset overflow
* fix: diff
* supports_exec_item
* GraphRunner.supports_exec_item
* tests
* fix: can't import on non-metal
2026-03-04 22:52:12 +03:00
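The metal commit above (later reverted in #15136) is about a byte offset being stored in a 32-bit field: any offset at or past 2^32 silently wraps. The wraparound itself is easy to demonstrate:

```python
# If a 64-bit byte offset is stored into a uint32 field, the high bits
# are silently dropped, so the GPU would read from the wrong address.
offset = 2**32 + 4096            # needs more than 32 bits
stored = offset & 0xFFFFFFFF     # what survives in a uint32 field
assert stored == 4096            # high bits gone
assert stored != offset          # the stored offset no longer matches
```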
chenyu
4cce283790
relax test_tqdm_perf (#15134)
2026-03-04 12:58:47 -05:00
chenyu
fae400d300
update assign tests to also test the expected behavior (#15132)
2026-03-04 11:34:43 -05:00
chenyu
1f96cc2b51
update non-contiguous buffer error message [pr] (#15131)
* update non-contiguous buffer error message [pr]
also cleaned up the tests
* order
2026-03-04 11:13:26 -05:00
nimlgen
563d5c3211
more graph tests (#15130)
2026-03-04 19:01:12 +03:00
nimlgen
cdc48da9cd
hevc: assert and speed (#15122)
* hevc: assert and speed
* simpler
2026-03-04 19:01:02 +03:00
wozeparrot
4e9b85ecfd
fa: pull inputs out of call (#15127)
2026-03-04 03:15:49 -08:00
George Hotz
47faa2d7b4
hotfix: llm kv cache uses clone instead of realize to avoid many realize
2026-03-04 19:07:03 +08:00
George Hotz
8ebd24637b
fix fa forward building with clang 22 (#15124)
* fix fa forward building with clang 22
* fix: override rocm path
---------
Co-authored-by: Woze Parrot <wozeparrot@gmail.com>
2026-03-04 02:32:25 -08:00
Christopher Milan
592f9bf6c6
set OPENPILOT_HACKS=1 to enable replace assign (#15123)
2026-03-04 05:26:04 -05:00
wozeparrot
df23057984
fa: change bwd grid dim + unshuffle using mops (#15068)
2026-03-04 01:23:40 -08:00
Christopher Milan
5623cea7b1
move openpilot contiguous hacks to schedule (#15120)
2026-03-04 03:04:06 -05:00
wozeparrot
759c7fc81c
failing test for allreduce memory usage (#15106)
2026-03-03 23:38:38 -08:00
George Hotz
5ecfe549e7
allreduce is a function with LATE_ALLREDUCE=1 (#15119)
* allreduce as a function
* allreduce function
* support allreduce function
* LATE_ALLREDUCE
2026-03-04 15:17:58 +08:00
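The LATE_ALLREDUCE commit treats allreduce as a function. Semantically, allreduce leaves every device holding the elementwise sum of all devices' buffers; here is a reference (single-process) sketch of that contract, not of tinygrad's implementation:

```python
# Allreduce contract: each device contributes one buffer and every
# device ends up with the elementwise sum across all of them.
# Real implementations (ring, tree) chunk and pipeline device-to-device
# copies instead of summing in one place.
def allreduce_sum(buffers):
    total = [sum(vals) for vals in zip(*buffers)]
    return [list(total) for _ in buffers]   # one summed copy per device

out = allreduce_sum([[1, 2], [3, 4], [5, 6]])
assert out == [[9, 12], [9, 12], [9, 12]]
```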
Christopher Milan
e7e70a3c95
simplify idx before counting backward_slice (#15117)
2026-03-03 23:53:50 -05:00
George Hotz
2d72a4a90c
fix copying padded const (#15116)
* fix const padding cpu
* remove comment
2026-03-04 10:39:45 +08:00
chenyu
b5ebb4d06d
contiguous_view_offset returns only offset [pr] (#15113)
size is always input.size
2026-03-03 15:23:39 -05:00
nimlgen
abd830b260
am: setup_rinf returns only doorbell (#15112)
2026-03-03 19:27:41 +03:00
nimlgen
4b42bb54aa
am: reset sdma to start from 0 (#15109)
2026-03-03 18:14:46 +03:00
George Hotz
01ddb4c267
add precompile to call (#15099)
* add precompile to call
* put get back
* something
* after structure
* alt
* keep it call
* resolve call
* resolve linear call
* precompile works with llm
* revert rangeify
* color for debugging
* getenv PRECOMPILE
* clean up deco pattern
* fully recursive sink scheduling
* revert llama
* fix SPEC=2
2026-03-03 22:32:42 +08:00
qazal
c7f908b788
sqtt: fix rdna4 structs (#15111)
* work
* DEBUG=2
2026-03-03 23:32:14 +09:00
qazal
8dd691761d
sqtt: remove old files (#15108)
2026-03-03 22:43:24 +09:00
Christopher Milan
de043226ba
benchmark comma usbgpu driving_vision step and load time (#15103)
Co-authored-by: Comma Device <device@comma.ai>
2026-03-03 06:08:03 -05:00
Christopher Milan
5f6b610da1
FLOAT16 logic for IMAGE==1 goes back to image_conv2d (#15105)
2026-03-03 05:37:57 -05:00
wozeparrot
529318259c
fix: fix null tests to actually use null device (#15104)
2026-03-03 02:05:47 -08:00
George Hotz
7d025089e3
no after removal (#15102)
* no after removal
* we are using walk
* null schedule test
* pytest deps
* Revert "pytest deps"
This reverts commit 5e1c5304ec.
* Revert "null schedule test"
This reverts commit 02da66053e.
* clean null tests
2026-03-03 17:50:31 +08:00
wozeparrot
92c16810ac
feat: per device mem_used (#15100)
2026-03-03 01:31:28 -08:00
qazal
e3a0598d0b
viz: the whole pc should be in view (#15101)
2026-03-03 17:17:53 +09:00
b1tg
a9ea36de79
assembly/amd: v_cmp_lg_f32 is ordered not-equal (#14982)
2026-03-03 15:37:48 +08:00
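For the v_cmp_lg_f32 fix above: in AMD's ISA, LG ("less than or greater than") is an *ordered* not-equal, so it is false whenever either operand is NaN, whereas the unordered not-equal form is true in that case. The distinction, modeled in plain Python:

```python
import math
nan = float("nan")

def cmp_lg(a, b):
    # ordered not-equal: false if either operand is NaN
    return not (math.isnan(a) or math.isnan(b)) and a != b

def cmp_neq_unordered(a, b):
    # unordered not-equal: true if either operand is NaN
    return math.isnan(a) or math.isnan(b) or a != b

assert cmp_lg(1.0, 2.0) and not cmp_lg(1.0, 1.0)
assert not cmp_lg(nan, 1.0)            # ordered compare rejects NaN
assert cmp_neq_unordered(nan, 1.0)     # unordered compare accepts it
```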
wozeparrot
c35de9bd68
asm_gemm: support more sharding (#15002)
2026-03-02 23:16:37 -08:00
wozeparrot
824ba4386a
llama3 dp fix (#15098)
2026-03-02 22:43:07 -08:00
chenyu
5dcf29b1a0
use clone in test_swap_slices (#15096)
2026-03-02 22:05:12 -05:00
Christopher Milan
c70e8af068
move IMAGE FLOAT16 logic to allocations (#15095)
* FLOAT16 logic in allocations
* cleanup
* separate that
* only apply when IMAGE == 1
* test passing now
* create image buffers earlier
2026-03-02 22:00:05 -05:00
George Hotz
d483e4153a
buffer view is like buffer (#15082)
* buffer view is like buffer
* fix
* swap_reshape_shrink
* contiguous on gguf, fix overlap
* revert that
* _device_supports_view
* this
* fix that test
* 0 buffers
* that test was wrong
* this
* check correct size
* contig BUFFER_VIEW
* this
* fix tests
* buffer view tests
* om
* fix torch
* no MOCKGPU
* skip
2026-03-03 09:52:33 +08:00
qazal
62ee976c1b
gemm/asm: cleanup repeated patterns to helper functions (#15094)
2026-03-03 08:14:47 +09:00
qazal
848f5cea96
viz: sqtt instruction packet trace (#15065)
2026-03-03 07:55:04 +09:00
chenyu
14d1c5fdfd
assign fusion tests on detach and contiguous_backward (#15092)
2026-03-02 15:21:51 -05:00
nimlgen
dfa180413d
tbgpu: sign nv (#15087)
2026-03-02 22:58:30 +03:00
chenyu
71f228f80f
test exact kernel count in torch_backend/test_kernel_fusion (#15091)
2026-03-02 14:26:32 -05:00