tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-07 22:23:55 -05:00

Author	SHA1	Message	Date
chenyu	0a98fd38b3	fix tests that failed locally on mac (#13872) keccak output was silently broken without contiguous	2025-12-29 11:23:38 -05:00
Clément Verrier	0e409ff5ce	fix indentation in UOp pretty_print for repeated references (#13857) * fix correct indentation in UOp pretty_print for repeated references When a UOp was referenced multiple times, the walrus operator notation (e.g., x0:=) was correctly used for the first occurrence, but subsequent references had misaligned indentation due to an extra space character. Fix indentation misalignment in pretty_print() when UOps are referenced multiple times. * add simple unit tests for UOp repr --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-12-29 10:46:16 -05:00
George Hotz	25ef866e89	write python emulator from RDNA3 psuedocode in pdf (#13841) * write python emulator from RDNA3 psuedocode in pdf * emu2 * more emu * working * more psueod * progress * cleanups * delete junk * delete stale files * just emu * work * emu compare * bemu * cleanups and more failures * revert bench emu * fix emu cmp * four tests fail * bugfixes * dsl * ext * refactor * dsl * div scale fix * test_emu * fix emu tests * pcode * test pcode * top imports * fix test_emu to use run_asm * emu tests on real hardware * more tests * more emu tests * more * work * work * bug fix * bugfixes * fix fp16 gemm * all ops tests pass in emulator * fix llvm tests * fix a few more tests * fix mockgpu timeout	2025-12-29 07:39:53 -05:00
nimlgen	c6769badc2	mockgpu: async support (#13868) * mockgpu: async support * cpu	2025-12-29 13:18:37 +03:00
chenyu	784b919f7f	Revert "optim empty shard #13513 (#13598)" (#13855) * Revert "optim empty shard #13513 (#13598)" This reverts commit `76d465dbc3`. * test_arange_shrink * update test	2025-12-27 21:10:23 -05:00
anu	9b4de8abc7	fix beam in python 3.14+ (#13836) * fix beam search on python 3.14 * add PickleableCount class to helpers * change name, add test, add step * tidy count init	2025-12-27 16:24:22 -05:00
Clément Verrier	ae013beab8	handle empty VECTORIZE in UOp.render() (#13847) `UOp.render()` crashed with `IndexError: tuple index out of range` when the UOp graph contained a `VECTORIZE` with empty `src=()`. This occurs when reshaping to scalar shape `()`, e.g., `Tensor.ones(4).sum()`. The bug was in the renderer's VECTORIZE pattern: `all_same(())` returns `True` (vacuous truth), causing the code to access `x.src[0]` on an empty tuple. - Fix `IndexError` when calling `UOp.render()` on graphs containing empty `VECTORIZE` nodes. - Add test for empty `VECTORIZE` rendering.	2025-12-27 10:09:39 -05:00
qazal	a2da61d096	use new style amd compiler in viz (#13848) * working version, handcode gfx1100 arch * get target from device properties * lib in cfg test program spec	2025-12-27 23:59:30 +09:00
qazal	f6de9095a0	switch asm tests to dsl (#13840) * switch asm tests to dsl * labeled basic blocks also work * indenting for basic blocks * allow define from star import	2025-12-27 02:15:16 +09:00
George Hotz	9d94b8c6b2	python asm dsl in extra + python REMU (#13436) * having fun with python asm dsl * rdna3 * meh * all in rdna3 * work * more work * work * integration * tests * simpler * simpler * asm * better * simpler * progress * emu * simpler * emu * tests * types * vopd * cleaups * work * memory ranges * add tracing * refactors * run_asm exit * more readable * compare to remu * test gemm * bug + stale * more tests * refactor * tests fix * more ins * more instructions * refactor * faster * match case * match case * simpler * work * tests * run_asm * work * bug fixes * more emu * alu/emu * refactor * no pipeline emu yet * alu direct * fix * bugfixes + new test * fix exceptions in emulators * update gen.py * pylint * no pdf * improve bench_emu * speedups * cleanups * more tests	2025-12-25 13:04:14 -05:00
chenyu	54af29dbdb	trange can just be a function (#13827)	2025-12-24 23:57:10 -05:00
qazal	a1c1684b91	set .amdhsa_kernarg_size in asm test (#13826)	2025-12-25 13:08:14 +09:00
George Hotz	43c6e973d8	add optional compiler in Renderer (#13817) * add optional compiler in Renderer [pr] * fix * late init * remove precompiled * cleanup	2025-12-23 17:58:46 -05:00
nimlgen	90b217896f	am: xgmi p2p (#13811) * system: use addr space * am: xgmi * fix * ugh	2025-12-23 20:11:38 +03:00
George Hotz	6439a515be	test fixups / speedups / var_vals refactor (#13812) * no PYTHONPATH + llm server port 0 * llm tok speedup * refactor var_vals	2025-12-23 12:05:59 -05:00
George Hotz	8dcba2e2cc	no full_rewrite [pr] (#13809) * no full_rewrite [pr] * fix * fix docs	2025-12-22 23:20:01 -05:00
George Hotz	2af2b4da5d	Revert "rewrites for renderer and compiler (#13646)" (#13806) This reverts commit `339dadf056`.	2025-12-22 19:21:33 -05:00
George Hotz	339dadf056	rewrites for renderer and compiler (#13646) * rewrites for renderer and compiler * full_rewrite_to_program * fix pre-commit * compiler passed into get_program * no pkl compiler * lib on program spec * fix spec * fix test * no device * compiler_device * nm * fix nir * fix * simplest * fix tests * revert	2025-12-22 18:58:43 -05:00
chenyu	7f1d41c9f9	delete files that import ShapeTracker (#13805)	2025-12-22 15:54:18 -05:00
qazal	389f01c7f4	viz: amdgpu assembly basic block graph (#13755)	2025-12-22 23:17:16 +08:00
George Hotz	df0f9d6860	add olmoe support to llm (#13792) * add olmoe support to llm * cleanups * simpler * clean * fix mypy * lil * remove dumb assert	2025-12-22 10:41:35 -04:00
chenyu	5cb827f7bf	clean up can_lossless_cast and add missing pairs [p] (#13793)	2025-12-21 12:18:33 -05:00
George Hotz	75a6a03664	add qwen3 moe support to tinygrad.apps.llm (#13775) * qwen moe works * simple moe * one test * integration	2025-12-21 12:36:02 -04:00
qazal	dc660c9fc0	remove stale / untested viz related files (#13785)	2025-12-21 16:42:48 +08:00
George Hotz	59c02dd87f	does this fix the dtype test? (#13779) * does this fix the dtype test? * simpler	2025-12-20 17:31:46 -04:00
chenyu	733ef0452c	update test_uop_resolve (#13777) plain @unittest.expectedFailure is too broad	2025-12-20 12:40:59 -05:00
George Hotz	45c459848d	remove more stale stuff (#13765) * remove more stale stuff * remove disassemblers/adreno * stale	2025-12-19 17:14:56 -04:00
George Hotz	744af193f0	remove ScheduleItem and merge it with ExecItem (#13759) * remove ExecItem and merge it with ScheduleItem * less diff * fix issues * min diff * don't change bufs in _lower * min diff * update * revert * fixes * diff	2025-12-19 17:04:24 -04:00
Christopher Milan	97103831c5	Revert "remove image from BufferSpec (#13636)" (#13761) This reverts commit `2571a1eb47`.	2025-12-19 13:54:36 -05:00
Christopher Milan	2571a1eb47	remove image from BufferSpec (#13636) * remove image from BufferSpec * cl tiny_gemm (64) works * mypy * padding * openpilot CL * reshape properly * remove extra qcom checks * pad output * mypy * update compile test * move undo * TestImageCopy valid images * TestImageRealization valid images * TestImageDType valid images * cleanups * test_renderer_failures * ruff * mypy * simplify ops_qcom * bump step time	2025-12-19 13:41:20 -05:00
chenyu	185a000882	gradient of COPY (#13760)	2025-12-19 13:33:59 -05:00
George Hotz	fa40df972f	fix tests for NV (#13744) * small fix * min diff * bfloat16 out	2025-12-18 13:20:21 -04:00
wozeparrot	99e667bdcd	tk fa bwd (#13480)	2025-12-17 23:56:37 -08:00
George Hotz	aeb7516c8a	tests passing on tinybox h3 (#13742)	2025-12-17 19:04:34 -04:00
George Hotz	b013244c38	fix local tests for AMD_LLVM (#13738) * fix local tests for AMD_LLVM * fix linters * skip that for now * fix segfault	2025-12-17 12:23:46 -04:00
George Hotz	3dbde178c1	mark slow tests as slow instead of as CI (#13736) * mark slow tests as slow instead of as CI * CI shouldn't have different behavior * more skips / CI * slow	2025-12-17 10:29:57 -04:00
George Hotz	9015a22523	make tests faster (#13734)	2025-12-17 09:39:44 -04:00
chenyu	fda73c8180	support LAMB param offload (#13730) also added Tensor.shard_like	2025-12-16 19:56:30 -05:00
George Hotz	cf0c28d5ae	all tests pass on strix halo (#13728)	2025-12-16 19:35:50 -04:00
George Hotz	4b741e893f	remove REMOTE=1 (#13722) * remove REMOTE=1 * leave ibverbs	2025-12-16 15:58:10 -04:00
George Hotz	bfe374c7f5	support symbolic shapes in split/chunk when split dim is concrete (#13718) * support symbolic shapes in split/chunk when split dim is concrete Previously split() and chunk() required all dimensions to be concrete. Now they only require the dimension being split to be concrete, allowing them to work with tensors that have symbolic shapes in other dimensions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * update CLAUDE.md: add pre-commit and no-amend rules 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix dim resolution order in split/chunk Ensure dim_sz is retrieved after dim is resolved, not before. The previous one-liner evaluated self.shape[dim] with the original unresolved dim value. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-16 13:55:06 -04:00
George Hotz	e5a66ace80	multi custom kernel support (#13716) * multi custom kernel support * custom kernel xfrom * works * no SPEC=2 on ck * panic * touchups	2025-12-16 11:36:30 -04:00
George Hotz	321ab943b2	qwen model is working (#13690) * qwen model is mostly working * add Q4_K quantization support to GGUF parser, add qwen3:1.7b model - Add Q4_K (type 12) dequantization in nn/state.py - Add qwen3:1.7b model using Q4_K_M quantization (smaller than Q8_0) - Make bos_token_id optional for models like Qwen3 that don't have it - Fix line length issues and add preset parameter to SimpleTokenizer 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * smaller diff * test dequant * half split * better * simple tok * mock token * polish * better * fix * replace --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-15 18:00:34 -04:00
wozeparrot	5d509499b2	tk: kernel finish groups stores (#13704)	2025-12-15 09:16:17 -08:00
George Hotz	54a22aa298	add test for jit footguns (#13701) * add test for jit footguns * shorter * notes	2025-12-15 10:47:44 -05:00
George Hotz	a657a4e0f4	add Q4_K GGUF quantization support (#13700) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-15 10:17:56 -05:00
wozeparrot	7ef7ce2856	tk reg local store (#13689)	2025-12-14 23:07:30 -08:00
George Hotz	572ca80046	fast tinygrad.apps.llm (#13685) * llm: add --benchmark support * fix speed * debug logging * fix test attention	2025-12-14 21:05:21 -05:00
chenyu	ed962786d6	use assign in Tensor.backward (#13674) preserve the grad object so that jit works	2025-12-13 22:43:06 -05:00
George Hotz	55845f7de7	schedule: cache unbinds for consistent cache keys (#13664) * schedule: cache unbinds for consistent cache keys strip BIND values before computing cache key so different bound values (e.g. KV cache positions) hit the same schedule cache entry. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * spec: allow single-src BIND for schedule cache key normalization 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: add lessons learned to CLAUDE.md 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * more claude.md --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 17:27:42 -05:00
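
Note on ae013beab8 (empty `VECTORIZE`): the crash described in that commit message comes down to vacuous truth, since a "do all elements match the first?" check passes on an empty tuple and the indexing that follows then fails. Below is a minimal sketch of that failure mode and a guard. The `all_same` here is a stand-in written for illustration, and `first_if_uniform_unsafe` / `first_if_uniform` are hypothetical names; none of this is tinygrad's renderer code.

```python
def all_same(items):
    # Stand-in helper (assumption): True when every element equals the first.
    # For an empty tuple the generator yields nothing, so all() returns True
    # without ever evaluating items[0].
    return all(x == items[0] for x in items)


def first_if_uniform_unsafe(src):
    # Hypothetical illustration of the bug pattern: the check passes for (),
    # and src[0] then raises IndexError.
    if all_same(src):
        return src[0]
    return None


def first_if_uniform(src):
    # Guarded variant: require at least one element before indexing.
    if len(src) > 0 and all_same(src):
        return src[0]
    return None
```

Calling `first_if_uniform_unsafe(())` raises `IndexError: tuple index out of range`, which matches the error text quoted in the commit message, while the guarded variant returns `None` for the empty tuple.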

4838 Commits
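
Note on bfe374c7f5 (split/chunk with symbolic shapes): the commit message describes two ideas, namely that only the dimension being split has to be concrete, and that the size along `dim` has to be read after `dim` is resolved. Below is a rough sketch of both, using plain tuples in which a string stands in for a symbolic dimension. The function `split_sizes` and the string-as-symbol convention are assumptions made for illustration, not tinygrad's implementation.

```python
def split_sizes(shape, size, dim=0):
    # Resolve a negative dim first, and only then look up the size along it.
    dim = dim + len(shape) if dim < 0 else dim
    dim_sz = shape[dim]
    # Only the split dimension must be a concrete int; the other entries
    # may be symbolic (represented here by strings).
    if not isinstance(dim_sz, int):
        raise ValueError(f"dimension {dim} is symbolic and cannot be split")
    full, rem = divmod(dim_sz, size)
    # Full-sized pieces followed by one smaller remainder piece, if any.
    return [size] * full + ([rem] if rem else [])


shape = ("seq_len", 10)               # first dim symbolic, last dim concrete
sizes = split_sizes(shape, 4, dim=-1)  # [4, 4, 2]
```

Splitting along `dim=0` in this example raises `ValueError`, because that dimension is the symbolic one.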