Commit Graph

1139 Commits

Author | SHA1 | Message | Date
George Hotz
35e3983840 Add Q5_0, Q5_1, and bfloat16 GGUF types (#15644) 2026-04-08 17:16:19 +08:00
Christopher Milan
acf239e4d2 specify renderer in DEV, <dev>_<ren>=1 is deprecated (#15551) 2026-03-31 18:35:14 -04:00
nimlgen
5181c8e23a llm: fix nan in kvcache (#15552) 2026-04-01 00:38:45 +03:00
b1tg
a63392a565 llm: pairwise ranking topk for MoE expert selection (#15499) 2026-03-31 12:46:39 +08:00
nimlgen
0d6fc0f571 jit: graphing in uops (#15489)
* jit: graphing as rewrite rule

* f

* +metal,cuda

* x

* cl

* x

* x

* simpler

* f

* m

* x

* revert?

* revert2

* back

* back

* t

* x

* m

* x

* c

* x

* l

* x

* comment

* smaller

* rv

* x

* x
2026-03-27 19:09:02 +03:00
Christopher Milan
bc180a963c deprecate <dev>=1 in favor of DEV=<dev> (#15467)
* start work on target

* add test

* update actions to use DEV

* update docs

* update readmes

* tests need that too

* update example

* update tests (comments)

* fix that test

* ruff

* mypy

* oops

* remove getenvs

* don't add Target yet

* and the test

* lint

* and docs

* more stuff

* assert

* few more fixes

* test assert
2026-03-26 03:48:03 -04:00
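The migration this commit (and #15551 above) describes can be illustrated as follows — the device names are examples of the pattern from the PR title, not an exhaustive list:

```shell
# Illustration of the flag migration (sketch, not the full set of devices).
# Deprecated: enable a backend with a per-device flag, e.g.  METAL=1 python train.py
# Preferred:  name the device in a single DEV variable:      DEV=METAL python train.py
DEV=METAL
echo "device=$DEV"
```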
George Hotz
fe2690399b llm: support assistant prefill + refactor to TransformerConfig (#15457)
* llm: support assistant prefill

* refactor to ModelConfig

* TransformerConfig

* more
2026-03-25 10:50:48 +08:00
George Hotz
a33ac869aa llm server: temperature + test client (#15444)
* improvements to the llm server

* eval script

* eval llm

* better eval gets 58.71

* cleanups

* add temperature, but multinomial is absurdly slow

* claude is so smart

* lint

* remove slop

* no more stop
2026-03-24 21:07:15 +08:00
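For context on the temperature change above: temperature sampling scales logits before the softmax, and the final multinomial draw is the step the commit notes as slow. A minimal pure-Python sketch (not tinygrad's implementation):

```python
import math, random

def sample(logits, temperature=1.0):
    """Temperature sampling sketch. temperature -> 0 approaches greedy
    argmax; higher values flatten the distribution."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # multinomial draw over the categorical distribution
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

With a very low temperature the draw becomes effectively deterministic, matching the greedy path.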
chenyu
c491345766 pass device into Tensor._frompy (#15385)
* pass device into Tensor._frompy

with this, canonicalize_device is the only usage of Device in tensor.py

* export_model.py
2026-03-20 05:09:01 -04:00
George Hotz
3b75d8a7a2 fix double after bug in rangeify (#15381) 2026-03-20 14:53:46 +08:00
Christopher Milan
0c89340a1e automatically emulate unsupported (tiny) floats [skip_process_replay] (#15366) 2026-03-20 02:31:44 -04:00
chenyu
bf33c5f796 remove gradient materialize_grads (#15367)
effectively default to True

and removed *0 hack in Tensor.copysign. now dy/dx=0 if y does not depend on x

remove
2026-03-19 23:36:03 -04:00
chenyu
fceb21c315 Tensor(uop) uses device from uop (#15340) 2026-03-18 02:56:06 -04:00
George Hotz
9d95321be3 set allow_implicit=False by default (#15319)
* set allow_implicit=False by default

* modernize beautiful mnist
2026-03-17 17:14:38 +08:00
George Hotz
584ec75aa2 precompile backward (#15311)
* add precompile backward support

* cleanups

* fix

* compact grad

* split v not split

* simpler

* no NOOPT
2026-03-17 15:28:40 +08:00
b1tg
856a839efc llm: fix qwen3 moe topk renormalization (#15201) 2026-03-17 12:57:33 +08:00
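The renormalization this fix concerns can be sketched generically: after the router picks the top-k expert scores, they are rescaled so the selected weights sum to 1. This is an illustrative sketch of the idea, not the qwen3/tinygrad code:

```python
def moe_topk_renorm(scores, k):
    """MoE top-k expert selection with renormalization (sketch).
    Returns (expert_index, weight) pairs whose weights sum to 1."""
    idx = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in idx)
    return [(i, scores[i] / total) for i in idx]
```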
George Hotz
3ff03be413 call always has tuple (#15297)
* call always has tuple

* fix pre-commit and simplify

* update

* fix

* move that assert

* tuple

* fix multi

* cleanups

* fix merge
2026-03-17 10:58:46 +08:00
wozeparrot
674c760974 embedded bwd vocab shard (#15001)
* fix: remove more multi from call

* feat: embedding bwd vocab sharding

* clean: unused import

* clean: don't actually need this pattern
2026-03-16 19:37:16 -07:00
chenyu
02afb45f29 remove UOp.assign [pr] (#15300)
* remove UOp.assign [pr]

it's all store and after, UOp is immutable

* fix test
2026-03-16 21:45:41 -04:00
chenyu
3e2b7803e6 view assign replaces at buffer identity (#15298)
matches what functions capture
2026-03-16 19:58:38 -04:00
George Hotz
476276f4b4 support grads on tuples (#15287)
* support grads on tuples

* simpler

* grad_fxn works

* cleanups

* unused
2026-03-16 17:39:34 +08:00
George Hotz
08662bc4ab add TUPLE/GETTUPLE, simple tests pass (#15286)
* simple tuple stuff passes

* resolved
2026-03-16 15:06:02 +08:00
chenyu
cd14e8e64b allocations contiguous is store+after (#15280) 2026-03-15 11:58:40 -04:00
Sieds Lykles
4b59083d7c assign into empty works (#15256) 2026-03-13 10:24:29 -04:00
chenyu
018c01508d test case for call precompile multi (#15254) 2026-03-13 06:28:43 -04:00
b1tg
18dc77ccab add fp8 fnuz dtypes with PYTHON backend support (#14945)
* add fp8 fnuz dtypes with PYTHON backend support

* rm emu related change

* clarify fp8 fnuz zero handling

* Revert "rm emu related change"

This reverts commit efa4763c22.

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-03-11 22:30:18 -04:00
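On the fnuz zero handling the commit clarifies: FNUZ ("finite, no unsigned zero") fp8 formats have no infinities and no negative zero — the all-sign bit pattern 0x80 encodes NaN instead of -0, and the exponent bias is one higher than the OCP variant (8 for e4m3fnuz). A decoding sketch for illustration, not the PYTHON-backend code:

```python
def decode_e4m3fnuz(b):
    """Decode an 8-bit e4m3fnuz value (illustrative sketch).
    0x80 (sign=1, all other bits 0) is NaN, not -0."""
    if b == 0x80:
        return float("nan")
    s = -1.0 if b & 0x80 else 1.0
    e = (b >> 3) & 0xF          # 4 exponent bits
    m = b & 0x7                 # 3 mantissa bits
    bias = 8                    # fnuz bias; OCP e4m3 uses 7
    if e == 0:                  # subnormal
        return s * (m / 8) * 2.0 ** (1 - bias)
    return s * (1 + m / 8) * 2.0 ** (e - bias)
```

The higher bias also shifts the range: the largest finite e4m3fnuz value is 240 rather than OCP e4m3's 448.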
George Hotz
4f3f55328b do not patch on invalid tensor tests (#15226)
* do not patch on invalid tensor tests

* cleanup
2026-03-12 09:35:20 +08:00
Christopher Milan
2fb8a7f60f fix test_invalid_tensor when before values are nan (#15215) 2026-03-10 23:51:19 -04:00
Christopher Milan
ffaafd391a Invalid in Tensor (#15154) 2026-03-10 02:49:54 -04:00
chenyu
a53187eef7 fix TestPartialAssignToSharedBuffer (#15202)
bufferize_to_store issue with assign
2026-03-09 23:14:23 -04:00
b1tg
891a73befc llm: fix chunked prefill (#15182)
* llm: fix chunked prefill

* less lines

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2026-03-07 22:08:31 +08:00
Ananta Ranganathan
5bdad8ee41 update mxfp4 tests to use the same patterns as the others (#15177)
* update mxfp4 tests to use the same patterns as the others

* fix typo in test call not sure how it committed
2026-03-06 13:21:40 -05:00
Ananta Ranganathan
5c50035e0d avoid using arithmetic for mxfp4 (#15172)
* avoid using arithmetic for mxfp4

* update tests to use assert equal

* no longer todo
2026-03-06 11:17:56 -05:00
Roelof van Dijk
059c6326c0 metal uint32 icb offset overflow (#15156)
* metal uint32 icb offset overflow

fix: diff

supports_exec_item

GraphRunner.supports_exec_item

tests

fix: can't import on non-metal

stricter

* also test the non-metal buffer case

* imports on non-mac
2026-03-06 00:54:39 +03:00
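The bug class here (and in the reverted #15129 below) is easy to demonstrate generically: a byte offset stored in a uint32 field silently wraps once it exceeds 4 GiB. A minimal sketch of the wrap, not the Metal graph code:

```python
U32_MAX = 0xFFFFFFFF

def as_u32(x):
    """Emulate storing a value in a uint32 field: bits above 32 are lost."""
    return x & U32_MAX

def fits_u32(offset):
    # a guard like this lets a runner fall back to a safer path when an
    # offset can't be encoded (hypothetical helper, not tinygrad's API)
    return 0 <= offset <= U32_MAX
```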
Ananta Ranganathan
8ef656324e FIXED TEST Q5_K GGUF dequant (#15147)
* q5_k gguf support as separate pr

* fix the problematic gemv test for q5_k

* add assert to make sure the gemv test cant fail with warning instead of error
2026-03-05 16:32:36 +08:00
George Hotz
e97922a57c LLM speedup with two jits, prefill/rollout (#15153)
* START_TIME

* print cleanup

* fix tests
2026-03-05 16:21:09 +08:00
George Hotz
fb43b415f9 fix symbolic shape call + chunked prefill (#15149)
* fix precompile for symbolic shape

* chunked prefill

* cleaner

* test that
2026-03-05 14:02:26 +08:00
George Hotz
ac1847cbf7 fully symbolic llm (#15097)
* work

* llm symbolic (almost)

* work

* revert that

* llm sym

* works

* cleanups

* cache tokens with the kv cache

* cleanups

* cleanups
2026-03-05 10:22:11 +08:00
chenyu
34594bcaaf Revert "bug in metal: offset is stored as uint32, overflow (#15129)" (#15136)
This reverts commit 9c58db16fa.
2026-03-04 16:54:42 -05:00
Roelof van Dijk
9c58db16fa bug in metal: offset is stored as uint32, overflow (#15129)
* metal uint32 icb offset overflow

* fix: diff

* supports_exec_item

* GraphRunner.supports_exec_item

* tests

* fix: can't import on non-metal
2026-03-04 22:52:12 +03:00
chenyu
fae400d300 update assign tests to also test the expected behavior (#15132) 2026-03-04 11:34:43 -05:00
chenyu
1f96cc2b51 update non-contiguous buffer error message [pr] (#15131)
* update non-contiguous buffer error message [pr]

also cleaned up the tests

* order
2026-03-04 11:13:26 -05:00
George Hotz
01ddb4c267 add precompile to call (#15099)
* add precompile to call

* put get back

* something

* after structure

* alt

* keep it call

* resolve call

* resolve linear call

* precompile works with llm

* revert rangeify

* color for debugging

* getenv PRECOMPILE

* clean up deco pattern

* fully recursive sink scheduling

* revert llama

* fix SPEC=2
2026-03-03 22:32:42 +08:00
chenyu
5dcf29b1a0 use clone in test_swap_slices (#15096) 2026-03-02 22:05:12 -05:00
George Hotz
d483e4153a buffer view is like buffer (#15082)
* buffer view is like buffer

* fix

* swap_reshape_shrink

* contiguous on gguf, fix overlap

* revert that

* _device_supports_view

* this

* fix that test

* 0 buffers

* that test was wrong

* this

* check correct size

* contig BUFFER_VIEW

* this

* fix tests

* buffer view tests

* om

* fix torch

* no MOCKGPU

* skip
2026-03-03 09:52:33 +08:00
chenyu
14d1c5fdfd assign fusion tests on detach and contiguous_backward (#15092) 2026-03-02 15:21:51 -05:00
chenyu
103ea16ec0 add contiguous back to svd (#15074)
can cause infinite loop
2026-02-28 16:49:26 -05:00
George Hotz
bb84e389cf functions for llama trainer (#15045)
* functions for llama trainer

* function there

* axis match

* fix multi

* lil cleaner

* there's a bug with HK_FLASH_ATTENTION

* training functions

* for commit
2026-02-28 12:15:18 +08:00
chenyu
5fd06f4f02 differentiable setitem (#15054)
* differentiable setitem

go through the where path for bw

* no return
2026-02-27 17:25:15 -05:00
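The "where path" mentioned in the commit body refers to expressing setitem as a select: out = where(mask, value, x). In that form the backward is just the where() backward, so dL/dx is zero on written positions and passes through elsewhere. A list-based sketch of the forward rewrite (illustrative, not the tensor code):

```python
def setitem_via_where(x, idx, v):
    """Rewrite x[idx] = v as an elementwise where(mask, v, x).
    mask marks written slots; gradients w.r.t. x vanish exactly there."""
    mask = [i in idx for i in range(len(x))]
    return [v if m else xi for m, xi in zip(mask, x)]
```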
chenyu
c9f6d8751b don't remove_bufferize for Invalid (#15053)
* don't remove_bufferize for Invalid

* replaced
2026-02-27 15:16:09 -05:00