tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-08 22:48:25 -05:00

Author	SHA1	Message	Date
wozeparrot	7ef7ce2856	tk reg local store (#13689 )	2025-12-14 23:07:30 -08:00
George Hotz	572ca80046	fast tinygrad.apps.llm (#13685 ) * llm: add --benchmark support * fix speed * debug logging * fix test attention	2025-12-14 21:05:21 -05:00
chenyu	ed962786d6	use assign in Tensor.backward (#13674 ) preserve the grad object so that jit works	2025-12-13 22:43:06 -05:00
George Hotz	55845f7de7	schedule: cache unbinds for consistent cache keys (#13664 ) * schedule: cache unbinds for consistent cache keys strip BIND values before computing cache key so different bound values (e.g. KV cache positions) hit the same schedule cache entry. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * spec: allow single-src BIND for schedule cache key normalization * docs: add lessons learned to CLAUDE.md * more claude.md --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 17:27:42 -05:00
George Hotz	8c87a0bf8d	Revert "schedule: cache unbinds for consistent cache keys (#13662 )" This reverts commit `af86cae10c`.	2025-12-12 16:49:50 -05:00
George Hotz	af86cae10c	schedule: cache unbinds for consistent cache keys (#13662 ) * schedule: cache unbinds for consistent cache keys different bound variable values (e.g. kv cache positions) now produce the same schedule cache key by unbinding BIND(DEFINE_VAR, CONST) before computing the cache key and rebinding after lookup. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * schedule: cache unbinds for consistent cache keys When scheduling, BIND(DEFINE_VAR, CONST) nodes are now unbound to tagged DEFINE_VARs before computing the cache key. This ensures that the same computation with different bound values (e.g., different KV cache positions in LLM) gets the same cache key and reuses the cached schedule. The fix: - pm_pre_sched_cache: replaces BIND with tagged DEFINE_VAR - pm_post_sched_cache: restores tagged DEFINE_VAR back to original BIND - pm_remove_rangeify_tags: excludes DEFINE_VAR to preserve tags through rangeify - var_vals extracted from BINDs before cache key computation * schedule: fix BIND handling and add CLAUDE.md - Handle BIND to RANGE in create_schedule (not matched by CONST pattern) - Assert all BINDs on same variable have same value - Add CLAUDE.md codebase guide --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 16:40:10 -05:00
George Hotz	316da9f7ff	llm: add created/model fields, non-streaming support, and tests (#13660 ) * llm: add created/model fields, non-streaming support, and tests - Add `created` timestamp and `model` fields to response (required by OpenAI spec) - Add non-streaming mode support for /v1/chat/completions - Add `send_data` helper to HTTPRequestHandler for responses with Content-Length - Refactor viz/serve.py to use send_data - Add integration tests using real OpenAI client 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * add openai to testing * toml * Remove 'openai' from dependencies Removed 'openai' from the dependencies list. * bump cache --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 14:50:36 -05:00
nimlgen	e36385e570	am: support xgmi systems (#13659 ) * am: support xgmi systems * fake_am	2025-12-12 18:55:45 +03:00
Jakob Sachs	ab2220b834	Handle missing bfloat16 natives on CPU architectures (#13553 ) * CPU: fix compiler-rt libcall by adding intermediate casts for bfloat16 * fix lint * remove old manual bypass of bf16 for CPU tests, and add diversion conversion from bf16 to/from fp16 --------- Co-authored-by: Jakob Sachs <jakobs99@purelymail.com>	2025-12-11 15:38:43 -05:00
chenyu	03600aef1e	failed test case when init jit with empty inputs (#13641 ) not related to bert grad acc, but still seems to be a bug	2025-12-10 22:03:06 -05:00
Nino Risteski	76d465dbc3	optim empty shard #13513 (#13598 ) * optim empty shard * remove tuple * simplify * lint * lint2 * test * remove original buffer unique id * new rule * reset shard * update * reset shard	2025-12-09 12:28:36 -05:00
ayanhan	47a170be2e	test: enable cummax scalar IndexError test (#13625 )	2025-12-09 12:25:56 -05:00
Christopher Milan	a17077d1d9	skip test_double_assign in CI LVP (#13620 )	2025-12-08 14:54:02 -05:00
Christopher Milan	1c16b6e082	Mesa: freedreno (#12746 ) * ir3 init * got a program * 1 + 1 works * use isa_disasm instead of shader_disasm * wip * matmul works * works on py3.14 * fix const loading * skip QCOM failing tests * cleanup * args actually work * add compile-only tests * fix typo and install tinymesa * IR3 NULL backend * (float32) images work * autogen fix * fix compile only test * typo * mypy happy * compile-only uses py3.14 * bump mesa * unify qcom disassembler * float16 works * disasm shows in viz * save a line * add real del * variable workgroup sizes * simplify diff * bump line count * properly set wgsz * regen mesa * no preamble * bump lines	2025-12-08 14:02:08 -05:00
Douglas Nyberg	947c6eefc3	add Swish op (#13541 ) * add Swish ONNX operator * add Swish regression test * remove trailing whitespace * upgrade ONNX to 1.20, add excludes for unimplemented ops * upgrade ONNX to 1.19, add Swish op * upgrade ONNX to 1.19, TensorFlow to 2.18, add Swish op * exclude attention_3d and attention_4d_gqa tests * exclude attention fp16 tests * exclude all attention tests * retrigger CI * retrigger CI - worker crash	2025-12-08 12:41:18 -05:00
Christopher Milan	94d7646bdc	fix anonymous struct fields (#13610 )	2025-12-07 12:56:38 -05:00
nimlgen	ac5f1e115d	autogen: repro for the bug (#13607 ) * autogen: repro for the test * mute	2025-12-07 15:51:03 +03:00
wozeparrot	93f1baca77	feat: tk fa in tensor (#13580 )	2025-12-05 14:36:29 -08:00
George Hotz	c5bd28e21d	start work on schedule cache (#13529 ) * start work on schedule cache * local unique * schedule cache works * schedule cache cleanup * fix tests * preserve metadata * oops, fix cache * put that there * fix spec * always miss * why is that broken? * src[0].op * fix process replay * delete abstractions2 * reenable the actual schedule cache * metadata is best effort * fix JIT in examples/gradaccum_mnist.py * full jit * fixed and test is real	2025-12-04 17:24:49 -08:00
chenyu	42f6cf3a90	tighter test_real_world mem and kernel count bounds (#13573 ) also check if actual usage is within 20% of set limit, the old limits are too big to be useful	2025-12-04 13:35:39 -05:00
chenyu	89f9e1dcd5	add SGD to beautiful_mnist (#13571 )	2025-12-04 12:17:29 -05:00
Rory Clear	6eab756578	fix and test loading num_batches_tracked (#13538 ) * fix and test loading num_batches_tracked * add failing reverse case * try reshape state dict if mismatch * reshape for () and (1,) --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-12-04 01:22:49 -08:00
Douglas Nyberg	a8a62bc08e	add max/min reduction support to ScatterND (#13562 )	2025-12-04 00:53:47 -08:00
ayanhan	edf929ec9d	fix: add __delitem__ to Tensor with proper TypeError (#13561 )	2025-12-04 00:53:08 -08:00
ayanhan	92b40290c7	fix: add test_sum_int and remove outdated TODO in test_custom_kernel (#13560 )	2025-12-03 21:51:58 -05:00
Christopher Milan	0a54434b15	mitigate ctypes c_bool bitfield bug (#13558 ) * mitigate ctypes c_bool bitfield bug * don't delete old test	2025-12-03 20:46:04 -05:00
George Hotz	24ca8eeaa7	small fixups from schedule_cache (#13557 )	2025-12-03 15:41:16 -08:00
Douglas Nyberg	f5abd38132	remove tfa dependency: use keras.optimizers.Lamb and tf.raw_ops for LARS (#13555 )	2025-12-03 17:48:27 -05:00
George Hotz	a4c4e48385	add LUNIQUE op (#13554 )	2025-12-03 14:34:34 -08:00
chenyu	22777a89ea	minor test_uop_symbolic updates (#13551 )	2025-12-03 13:17:44 -05:00
chenyu	a205f98ef4	tighter bound for MOD (#13550 )	2025-12-03 11:24:29 -05:00
nimlgen	549f3287a8	fix caching for fetch (#13544 )	2025-12-03 14:34:14 +03:00
George Hotz	6bd355fa26	add needs_second_gpu decorator (#13543 ) * add needs_second_gpu decorator * more skips * two more fixes	2025-12-02 19:08:23 -08:00
wozeparrot	0d55aec605	fix after end (#13542 )	2025-12-02 18:42:58 -08:00
George Hotz	055d5aeb7f	add external_test_process_count	2025-12-02 17:26:30 -08:00
chenyu	e8879f7e31	match torch clamp backward (#13533 ) * match torch clamp backward * fix PYTHON	2025-12-02 17:58:32 -05:00
Roelof van Dijk	c158e3c988	add cifar gated uop_given_valid regression test (#13536 )	2025-12-02 16:02:47 -05:00
Roelof van Dijk	e329baffa7	fix cifar while keeping openpilot fused (#13528 ) * this works * test now passes	2025-12-02 12:05:56 -08:00
nimlgen	0874ba8cc8	test_hevc: do not download the whole file (#13531 ) * test_hevc: do not download the whole file * fix	2025-12-02 21:31:28 +03:00
qazal	366badaa68	require renderer argument in get_program, removes device opening in process replay [pr] (#13524 )	2025-12-03 02:05:31 +08:00
Douglas Nyberg	6a7c58abf1	fix(onnx): unwrap list/tuple value in Pad op (#13500 ) * fix(onnx): unwrap list/tuple value in Pad op * add regression test for Pad list value * remove trailing whitespace * use _resolve_const for Pad constant_value	2025-12-02 07:47:20 -08:00
nimlgen	77a76d1b13	device: respect compiler ContextVars (#13523 ) * device: envvars for cc * fix * fix * x * um * fix * remote * em * cleanup * typing * fix * debug * lvp? * ugh * singl * rm * lol * fix * ? * this? * why? * rev * mod test * l	2025-12-02 14:42:04 +03:00
wozeparrot	1b7dbfb37f	tk: named kernels + per kernel range id (#13522 )	2025-12-01 22:51:04 -08:00
nimlgen	455dd88236	nv: minimal hevc (#13502 ) * nv: minimal hevc * validate * not needed * tralin * var * cpu * fxi * desc * move * cleanup	2025-11-30 16:46:55 +03:00
George Hotz	fd373fea7a	fix a few tests [pr] (#13498 )	2025-11-29 13:43:45 -08:00
George Hotz	6a140f74fe	split out unique_const and cache const [pr] (#13493 ) * split out unique_const * add cache to const * call const in unique_const	2025-11-29 10:44:28 -08:00
George Hotz	c38b7684dc	improve microbenchmarks (#13492 ) * improve microbenchmarks * bugfix + ubench * lil * no src in const method	2025-11-29 10:15:22 -08:00
kamilisjon	3d76ef9ba8	Update tests (#13479 )	2025-11-28 18:35:28 -08:00
qazal	ae9c56134e	skip test_tk failing locally on macbook (#13476 )	2025-11-29 01:15:37 +08:00
qazal	72ef533d9c	tracing: use u32 for buffer args encoding (#13472 )	2025-11-28 00:19:51 +08:00

4842 Commits