tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-09 15:08:02 -05:00

Author	SHA1	Message	Date
wozeparrot	99e667bdcd	tk fa bwd (#13480)	2025-12-17 23:56:37 -08:00
George Hotz	aeb7516c8a	tests passing on tinybox h3 (#13742)	2025-12-17 19:04:34 -04:00
chenyu	7cd7593c5d	add script to train bert on mi350x (#13743) adapted from mi300 config	2025-12-17 16:54:04 -05:00
George Hotz	22f3e7f995	better precommit coverage and faster (#13740) * improve pre-commit hook speed and coverage * remove a few * lose that	2025-12-17 13:25:55 -04:00
George Hotz	bc78cf1197	filter warnings for nicer test output (#13739)	2025-12-17 13:25:27 -04:00
George Hotz	b013244c38	fix local tests for AMD_LLVM (#13738) * fix local tests for AMD_LLVM * fix linters * skip that for now * fix segfault	2025-12-17 12:23:46 -04:00
nimlgen	7081014c73	am_smi: mi300 (#13737) * am_smi: mi300 * smi * remo	2025-12-17 17:56:01 +03:00
George Hotz	3dbde178c1	mark slow tests as slow instead of as CI (#13736) * mark slow tests as slow instead of as CI * CI shouldn't have different behavior * more skips / CI * slow	2025-12-17 10:29:57 -04:00
George Hotz	9015a22523	make tests faster (#13734)	2025-12-17 09:39:44 -04:00
nimlgen	3eecb4f123	am: mi350 support (#13733)	2025-12-17 14:57:21 +03:00
wozeparrot	5151a341b3	tk: small changes from fa bwd (#13732)	2025-12-16 22:44:36 -08:00
chenyu	fda73c8180	support LAMB param offload (#13730) also added Tensor.shard_like	2025-12-16 19:56:30 -05:00
George Hotz	cf0c28d5ae	all tests pass on strix halo (#13728)	2025-12-16 19:35:50 -04:00
Christopher Milan	af1d938a50	DLL: search wsl lib folder (#13727)	2025-12-16 18:27:09 -05:00
George Hotz	0fb645cc4c	move some methods to mixins (#13725) * move some methods to mixins * a few more * math trunc	2025-12-16 19:20:04 -04:00
Christopher Milan	c6ba016da6	fix cuda check (#13726)	2025-12-16 18:00:09 -05:00
George Hotz	ee45669d14	pre extract afters + sched cleanups (#13720) * pre extract afters + sched cleanups * claude.md lesson * tests for schedule cache * Revert "tests for schedule cache" This reverts commit `fb3f2e800a`.	2025-12-16 16:14:30 -04:00
George Hotz	4b741e893f	remove REMOTE=1 (#13722) * remove REMOTE=1 * leave ibverbs	2025-12-16 15:58:10 -04:00
George Hotz	4d8d821f56	create schedule before the cache (#13717) * create schedule before the cache * move create_schedule * simpler * simpler * simpler	2025-12-16 14:15:31 -04:00
George Hotz	bfe374c7f5	support symbolic shapes in split/chunk when split dim is concrete (#13718 ) * support symbolic shapes in split/chunk when split dim is concrete Previously split() and chunk() required all dimensions to be concrete. Now they only require the dimension being split to be concrete, allowing them to work with tensors that have symbolic shapes in other dimensions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * update CLAUDE.md: add pre-commit and no-amend rules 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix dim resolution order in split/chunk Ensure dim_sz is retrieved after dim is resolved, not before. The previous one-liner evaluated self.shape[dim] with the original unresolved dim value. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-16 13:55:06 -04:00
chenyu	e428fbfab6	verify dtype of llama model params (#13719)	2025-12-16 12:32:02 -05:00
George Hotz	e5a66ace80	multi custom kernel support (#13716) * multi custom kernel support * custom kernel xfrom * works * no SPEC=2 on ck * panic * touchups	2025-12-16 11:36:30 -04:00
nimlgen	5778722979	am: restore queues (#13714) * am: restore queues * l * cmnt	2025-12-16 15:21:42 +03:00
chenyu	041e9a41c9	add contiguous in BertIntermediate (#13713) faster step with a lot less recomputation	2025-12-15 22:37:36 -05:00
George Hotz	7589c897b2	split usbgpu tests into their own benchmark [pr] (#13711)	2025-12-15 21:42:40 -04:00
qazal	6bafd90248	remove unused process replay input [pr] (#13712)	2025-12-16 09:29:35 +08:00
George Hotz	321ab943b2	qwen model is working (#13690 ) * qwen model is mostly working * add Q4_K quantization support to GGUF parser, add qwen3:1.7b model - Add Q4_K (type 12) dequantization in nn/state.py - Add qwen3:1.7b model using Q4_K_M quantization (smaller than Q8_0) - Make bos_token_id optional for models like Qwen3 that don't have it - Fix line length issues and add preset parameter to SimpleTokenizer 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * smaller diff * test dequant * half split * better * simple tok * mock token * polish * better * fix * replace --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-15 18:00:34 -04:00
George Hotz	d43e4c7553	llm args + lil html page (#13710) * update llm args * lil html page * lil * line size * qol	2025-12-15 17:09:31 -04:00
George Hotz	ee4a7ee12f	rope half-split (#13706) * rope half * nicer * this * rearrange	2025-12-15 15:31:11 -04:00
Christopher Milan	2359e88f0c	wrap cdll redo (#13705) * wrap CDLL with custom findlib * lint * regen * fix * mypy * hardcode libc on macos * fix frameworks * fix webgpu win * remove supports * regen metal * regen libclang * regen * simpler * regen * regen * find nvrtc * fix * regen * fix * typo * regen * split * rsplit one * typo * try load DLL * string error	2025-12-15 13:15:02 -05:00
wozeparrot	5d509499b2	tk: kernel finish groups stores (#13704)	2025-12-15 09:16:17 -08:00
George Hotz	54a22aa298	add test for jit footguns (#13701) * add test for jit footguns * shorter * notes	2025-12-15 10:47:44 -05:00
George Hotz	fd49bb512d	download cache by job (#13703)	2025-12-15 10:47:17 -05:00
George Hotz	a657a4e0f4	add Q4_K GGUF quantization support (#13700) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-15 10:17:56 -05:00
nimlgen	615dcab767	am: minimal mi300 boot (#13679) * nbio7_9 * psp * gmc * gfx * sdma * ih * linter * linter * minor * finish * add missing * do not allow warm boot for now	2025-12-15 15:55:03 +03:00
qazal	72e006cd59	fast VIZ=2 startup (#13682)	2025-12-15 19:16:43 +08:00
qazal	50d34428bd	fix viz endstream (#13687)	2025-12-15 16:54:18 +08:00
wozeparrot	7ef7ce2856	tk reg local store (#13689)	2025-12-14 23:07:30 -08:00
George Hotz	572ca80046	fast tinygrad.apps.llm (#13685) * llm: add --benchmark support * fix speed * debug logging * fix test attention	2025-12-14 21:05:21 -05:00
chenyu	6cad622f59	don't FREE_INTERMEDIATE in bert (#13684) hangs green hcq consistently after an hour of training	2025-12-14 14:27:42 -05:00
chenyu	871ab8415f	some onnx cleanups (#13683)	2025-12-14 13:58:54 -05:00
nimlgen	75832ce4f6	am: psp with no autoload (#13681)	2025-12-14 20:20:09 +03:00
nimlgen	8bcb1038e4	am: nbio 7.9.0 (#13680)	2025-12-14 18:35:29 +03:00
George Hotz	013240938b	llm: add --benchmark support (#13678)	2025-12-14 08:35:05 -05:00
Robbe Derks	cddbdaf5e1	usbgpu: patch: auto-detect controller PID/VID (#13645) * auto-detect controller * fix lint? * needs '' * just try	2025-12-14 00:54:51 -05:00
George Hotz	d7fb5d9b62	speedups: early return from simplify (#13665) * early return from simplify * pm_rewrite * more speed * remove again * early return from simplify * ugh	2025-12-14 00:51:28 -05:00
George Hotz	bcbf832399	add chrism	2025-12-14 00:45:57 -05:00
chenyu	ed962786d6	use assign in Tensor.backward (#13674) preserve the grad object so that jit works	2025-12-13 22:43:06 -05:00
chenyu	721a379c41	Revert "autogen: use wrapped CDLL with custom findlib (#13666)" (#13675) This reverts commit `f6cc3b13b9`.	2025-12-13 22:42:41 -05:00
nimlgen	6402dcf940	am: xccs queue creation (#13672)	2025-12-13 18:37:09 +03:00

11401 Commits