tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-09 15:08:02 -05:00

Author	SHA1	Message	Date
George Hotz	6439a515be	test fixups / speedups / var_vals refactor (#13812 ) * no PYTHONPATH + llm server port 0 * llm tok speedup * refactor var_vals	2025-12-23 12:05:59 -05:00
George Hotz	8dcba2e2cc	no full_rewrite [pr] (#13809 ) * no full_rewrite [pr] * fix * fix docs	2025-12-22 23:20:01 -05:00
George Hotz	2af2b4da5d	Revert "rewrites for renderer and compiler (#13646 )" (#13806 ) This reverts commit `339dadf056`.	2025-12-22 19:21:33 -05:00
George Hotz	339dadf056	rewrites for renderer and compiler (#13646 ) * rewrites for renderer and compiler * full_rewrite_to_program * fix pre-commit * compiler passed into get_program * no pkl compiler * lib on program spec * fix spec * fix test * no device * compiler_device * nm * fix nir * fix * simplest * fix tests * revert	2025-12-22 18:58:43 -05:00
chenyu	7f1d41c9f9	delete files that import ShapeTracker (#13805 )	2025-12-22 15:54:18 -05:00
qazal	389f01c7f4	viz: amdgpu assembly basic block graph (#13755 )	2025-12-22 23:17:16 +08:00
George Hotz	df0f9d6860	add olmoe support to llm (#13792 ) * add olmoe support to llm * cleanups * simpler * clean * fix mypy * lil * remove dumb assert	2025-12-22 10:41:35 -04:00
chenyu	5cb827f7bf	clean up can_lossless_cast and add missing pairs [p] (#13793 )	2025-12-21 12:18:33 -05:00
George Hotz	75a6a03664	add qwen3 moe support to tinygrad.apps.llm (#13775 ) * qwen moe works * simple moe * one test * integration	2025-12-21 12:36:02 -04:00
qazal	dc660c9fc0	remove stale / untested viz related files (#13785 )	2025-12-21 16:42:48 +08:00
George Hotz	59c02dd87f	does this fix the dtype test? (#13779 ) * does this fix the dtype test? * simpler	2025-12-20 17:31:46 -04:00
chenyu	733ef0452c	update test_uop_resolve (#13777 ) plain @unittest.expectedFailure is too broad	2025-12-20 12:40:59 -05:00
George Hotz	45c459848d	remove more stale stuff (#13765 ) * remove more stale stuff * remove disassemblers/adreno * stale	2025-12-19 17:14:56 -04:00
George Hotz	744af193f0	remove ScheduleItem and merge it with ExecItem (#13759 ) * remove ExecItem and merge it with ScheduleItem * less diff * fix issues * min diff * don't change bufs in _lower * min diff * update * revert * fixes * diff	2025-12-19 17:04:24 -04:00
Christopher Milan	97103831c5	Revert "remove image from BufferSpec (#13636 )" (#13761 ) This reverts commit `2571a1eb47`.	2025-12-19 13:54:36 -05:00
Christopher Milan	2571a1eb47	remove image from BufferSpec (#13636 ) * remove image from BufferSpec * cl tiny_gemm (64) works * mypy * padding * openpilot CL * reshape properly * remove extra qcom checks * pad output * mypy * update compile test * move undo * TestImageCopy valid images * TestImageRealization valid images * TestImageDType valid images * cleanups * test_renderer_failures * ruff * mypy * simplify ops_qcom * bump step time	2025-12-19 13:41:20 -05:00
chenyu	185a000882	gradient of COPY (#13760 )	2025-12-19 13:33:59 -05:00
George Hotz	fa40df972f	fix tests for NV (#13744 ) * small fix * min diff * bfloat16 out	2025-12-18 13:20:21 -04:00
wozeparrot	99e667bdcd	tk fa bwd (#13480 )	2025-12-17 23:56:37 -08:00
George Hotz	aeb7516c8a	tests passing on tinybox h3 (#13742 )	2025-12-17 19:04:34 -04:00
George Hotz	b013244c38	fix local tests for AMD_LLVM (#13738 ) * fix local tests for AMD_LLVM * fix linters * skip that for now * fix segfault	2025-12-17 12:23:46 -04:00
George Hotz	3dbde178c1	mark slow tests as slow instead of as CI (#13736 ) * mark slow tests as slow instead of as CI * CI shouldn't have different behavior * more skips / CI * slow	2025-12-17 10:29:57 -04:00
George Hotz	9015a22523	make tests faster (#13734 )	2025-12-17 09:39:44 -04:00
chenyu	fda73c8180	support LAMB param offload (#13730 ) also added Tensor.shard_like	2025-12-16 19:56:30 -05:00
George Hotz	cf0c28d5ae	all tests pass on strix halo (#13728 )	2025-12-16 19:35:50 -04:00
George Hotz	4b741e893f	remove REMOTE=1 (#13722 ) * remove REMOTE=1 * leave ibverbs	2025-12-16 15:58:10 -04:00
George Hotz	bfe374c7f5	support symbolic shapes in split/chunk when split dim is concrete (#13718 ) * support symbolic shapes in split/chunk when split dim is concrete Previously split() and chunk() required all dimensions to be concrete. Now they only require the dimension being split to be concrete, allowing them to work with tensors that have symbolic shapes in other dimensions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * update CLAUDE.md: add pre-commit and no-amend rules 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix dim resolution order in split/chunk Ensure dim_sz is retrieved after dim is resolved, not before. The previous one-liner evaluated self.shape[dim] with the original unresolved dim value. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-16 13:55:06 -04:00
George Hotz	e5a66ace80	multi custom kernel support (#13716 ) * multi custom kernel support * custom kernel xfrom * works * no SPEC=2 on ck * panic * touchups	2025-12-16 11:36:30 -04:00
George Hotz	321ab943b2	qwen model is working (#13690 ) * qwen model is mostly working * add Q4_K quantization support to GGUF parser, add qwen3:1.7b model - Add Q4_K (type 12) dequantization in nn/state.py - Add qwen3:1.7b model using Q4_K_M quantization (smaller than Q8_0) - Make bos_token_id optional for models like Qwen3 that don't have it - Fix line length issues and add preset parameter to SimpleTokenizer 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * smaller diff * test dequant * half split * better * simple tok * mock token * polish * better * fix * replace --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-15 18:00:34 -04:00
wozeparrot	5d509499b2	tk: kernel finish groups stores (#13704 )	2025-12-15 09:16:17 -08:00
George Hotz	54a22aa298	add test for jit footguns (#13701 ) * add test for jit footguns * shorter * notes	2025-12-15 10:47:44 -05:00
George Hotz	a657a4e0f4	add Q4_K GGUF quantization support (#13700 ) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-15 10:17:56 -05:00
wozeparrot	7ef7ce2856	tk reg local store (#13689 )	2025-12-14 23:07:30 -08:00
George Hotz	572ca80046	fast tinygrad.apps.llm (#13685 ) * llm: add --benchmark support * fix speed * debug logging * fix test attention	2025-12-14 21:05:21 -05:00
chenyu	ed962786d6	use assign in Tensor.backward (#13674 ) preserve the grad object so that jit works	2025-12-13 22:43:06 -05:00
George Hotz	55845f7de7	schedule: cache unbinds for consistent cache keys (#13664 ) * schedule: cache unbinds for consistent cache keys strip BIND values before computing cache key so different bound values (e.g. KV cache positions) hit the same schedule cache entry. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * spec: allow single-src BIND for schedule cache key normalization 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: add lessons learned to CLAUDE.md 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * more claude.md --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 17:27:42 -05:00
George Hotz	8c87a0bf8d	Revert "schedule: cache unbinds for consistent cache keys (#13662 )" This reverts commit `af86cae10c`.	2025-12-12 16:49:50 -05:00
George Hotz	af86cae10c	schedule: cache unbinds for consistent cache keys (#13662 ) * schedule: cache unbinds for consistent cache keys different bound variable values (e.g. kv cache positions) now produce the same schedule cache key by unbinding BIND(DEFINE_VAR, CONST) before computing the cache key and rebinding after lookup. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * schedule: cache unbinds for consistent cache keys When scheduling, BIND(DEFINE_VAR, CONST) nodes are now unbound to tagged DEFINE_VARs before computing the cache key. This ensures that the same computation with different bound values (e.g., different KV cache positions in LLM) gets the same cache key and reuses the cached schedule. The fix: - pm_pre_sched_cache: replaces BIND with tagged DEFINE_VAR - pm_post_sched_cache: restores tagged DEFINE_VAR back to original BIND - pm_remove_rangeify_tags: excludes DEFINE_VAR to preserve tags through rangeify - var_vals extracted from BINDs before cache key computation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * schedule: fix BIND handling and add CLAUDE.md - Handle BIND to RANGE in create_schedule (not matched by CONST pattern) - Assert all BINDs on same variable have same value - Add CLAUDE.md codebase guide 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 16:40:10 -05:00
George Hotz	316da9f7ff	llm: add created/model fields, non-streaming support, and tests (#13660 ) * llm: add created/model fields, non-streaming support, and tests - Add `created` timestamp and `model` fields to response (required by OpenAI spec) - Add non-streaming mode support for /v1/chat/completions - Add `send_data` helper to HTTPRequestHandler for responses with Content-Length - Refactor viz/serve.py to use send_data - Add integration tests using real OpenAI client 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * add openai to testing * toml * Remove 'openai' from dependencies Removed 'openai' from the dependencies list. * bump cache --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 14:50:36 -05:00
nimlgen	e36385e570	am: support xgmi systems (#13659 ) * am: support xgmi systems * fake_am	2025-12-12 18:55:45 +03:00
Jakob Sachs	ab2220b834	Handle missing bfloat16 natives on CPU architectures (#13553 ) * CPU: fix compiler-rt libcall by adding intermediate casts for bfloat16 * fix lint * remove old manual bypass of bf16 for CPU tests, and add diversion conversion from bf16 to/from fp16 --------- Co-authored-by: Jakob Sachs <jakobs99@purelymail.com>	2025-12-11 15:38:43 -05:00
chenyu	03600aef1e	failed test case when init jit with empty inputs (#13641 ) not related to bert grad acc, but still seems to be a bug	2025-12-10 22:03:06 -05:00
Nino Risteski	76d465dbc3	optim empty shard #13513 (#13598 ) * optim empty shard * remove tuple * simplify * lint * lint2 * test * remove original buffer unique id * new rule * reset shard * update * reset shard	2025-12-09 12:28:36 -05:00
ayanhan	47a170be2e	test: enable cummax scalar IndexError test (#13625 )	2025-12-09 12:25:56 -05:00
Christopher Milan	a17077d1d9	skip test_double_assign in CI LVP (#13620 )	2025-12-08 14:54:02 -05:00
Christopher Milan	1c16b6e082	Mesa: freedreno (#12746 ) * ir3 init * got a program * 1 + 1 works * use isa_disasm instead of shader_disasm * wip * matmul works * works on py3.14 * fix const loading * skip QCOM failing tests * cleanup * args actually work * add compile-only tests * fix typo and install tinymesa * IR3 NULL backend * (float32) images work * autogen fix * fix compile only test * typo * mypy happy * compile-only uses py3.14 * bump mesa * unify qcom disassembler * float16 works * disasm shows in viz * save a line * add real del * variable workgroup sizes * simplify diff * bump line count * properly set wgsz * regen mesa * no preamble * bump lines	2025-12-08 14:02:08 -05:00
Douglas Nyberg	947c6eefc3	add Swish op (#13541 ) * add Swish ONNX operator * add Swish regression test * remove trailing whitespace * upgrade ONNX to 1.20, add excludes for unimplemented ops * upgrade ONNX to 1.19, add Swish op * upgrade ONNX to 1.19, TensorFlow to 2.18, add Swish op * exclude attention_3d and attention_4d_gqa tests * exclude attention fp16 tests * exclude all attention tests * retrigger CI * retrigger CI - worker crash	2025-12-08 12:41:18 -05:00
Christopher Milan	94d7646bdc	fix anonymous struct fields (#13610 )	2025-12-07 12:56:38 -05:00
nimlgen	ac5f1e115d	autogen: repro for the bug (#13607 ) * autogen: repro for the test * mute	2025-12-07 15:51:03 +03:00
wozeparrot	93f1baca77	feat: tk fa in tensor (#13580 )	2025-12-05 14:36:29 -08:00

4774 Commits