tinygrad

Mirror of https://github.com/tinygrad/tinygrad.git, synced 2026-01-07 22:23:55 -05:00

Author	SHA1	Message	Date
qazal	019e71f8ca	lds bank count tests from pmc counters (#13667) * lds bank count tests from pmc counters * these tests run on the RDNA3 card too * rename duration to cycles, other rename comment * add SQ_LDS_IDX_ACTIVE to gfx9 defaults	2025-12-13 17:39:32 +08:00
qazal	a6dfd8a672	viz server cleanups (#13668) * viz server cleanups * comment	2025-12-13 17:27:53 +08:00
Christopher Milan	f6cc3b13b9	autogen: use wrapped CDLL with custom findlib (#13666) * wrap CDLL with custom findlib * lint * regen * fix * mypy * hardcode libc on macos * fix frameworks * fix webgpu win * remove supports * regen metal * regen libclang * regen * simpler * regen * regen * find nvrtc * fix * regen * fix * typo * regen * split * rsplit one * typo	2025-12-13 01:31:30 -05:00
George Hotz	55845f7de7	schedule: cache unbinds for consistent cache keys (#13664) * schedule: cache unbinds for consistent cache keys strip BIND values before computing cache key so different bound values (e.g. KV cache positions) hit the same schedule cache entry. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * spec: allow single-src BIND for schedule cache key normalization 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: add lessons learned to CLAUDE.md 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * more claude.md --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 17:27:42 -05:00
George Hotz	27845353a0	add CLAUDE.md	2025-12-12 16:50:11 -05:00
George Hotz	8c87a0bf8d	Revert "schedule: cache unbinds for consistent cache keys (#13662)" This reverts commit `af86cae10c`.	2025-12-12 16:49:50 -05:00
George Hotz	443b7fea80	Revert "add notes about jit to claude.md" This reverts commit `429f82e6a9`.	2025-12-12 16:49:48 -05:00
George Hotz	429f82e6a9	add notes about jit to claude.md	2025-12-12 16:48:23 -05:00
George Hotz	af86cae10c	schedule: cache unbinds for consistent cache keys (#13662) * schedule: cache unbinds for consistent cache keys different bound variable values (e.g. kv cache positions) now produce the same schedule cache key by unbinding BIND(DEFINE_VAR, CONST) before computing the cache key and rebinding after lookup. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * schedule: cache unbinds for consistent cache keys When scheduling, BIND(DEFINE_VAR, CONST) nodes are now unbound to tagged DEFINE_VARs before computing the cache key. This ensures that the same computation with different bound values (e.g., different KV cache positions in LLM) gets the same cache key and reuses the cached schedule. The fix: - pm_pre_sched_cache: replaces BIND with tagged DEFINE_VAR - pm_post_sched_cache: restores tagged DEFINE_VAR back to original BIND - pm_remove_rangeify_tags: excludes DEFINE_VAR to preserve tags through rangeify - var_vals extracted from BINDs before cache key computation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * schedule: fix BIND handling and add CLAUDE.md - Handle BIND to RANGE in create_schedule (not matched by CONST pattern) - Assert all BINDs on same variable have same value - Add CLAUDE.md codebase guide 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 16:40:10 -05:00
chenyu	fcaed1e1dd	don't use empty in bert fake data (#13661) somehow jit does not count empty as input	2025-12-12 15:59:50 -05:00
George Hotz	316da9f7ff	llm: add created/model fields, non-streaming support, and tests (#13660) * llm: add created/model fields, non-streaming support, and tests - Add `created` timestamp and `model` fields to response (required by OpenAI spec) - Add non-streaming mode support for /v1/chat/completions - Add `send_data` helper to HTTPRequestHandler for responses with Content-Length - Refactor viz/serve.py to use send_data - Add integration tests using real OpenAI client 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * add openai to testing * toml * Remove 'openai' from dependencies Removed 'openai' from the dependencies list. * bump cache --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 14:50:36 -05:00
George Hotz	9604773e45	add model choosing support to llm (#13656)	2025-12-12 11:22:11 -05:00
nimlgen	e36385e570	am: support xgmi systems (#13659) * am: support xgmi systems * fake_am	2025-12-12 18:55:45 +03:00
nimlgen	b4796e2d32	amd: set queue prio to normal (#13658)	2025-12-12 18:25:41 +03:00
nimlgen	a1de7787bf	am: xcc/inst support (#13657)	2025-12-12 17:40:42 +03:00
George Hotz	f0fa9bcd98	openai api for llm (#13648) * openai api for llm * responds to simple request * schedule cache needs to unbind * stream works * share stream code * 20k * one print * cid	2025-12-12 08:25:33 -05:00
qazal	93ad1f7732	viz: readable pmc print, share unpacker with tests (#13655) * viz: readable pmc print, share unpacker with tests * sections * static analyzer * rm that	2025-12-12 19:29:59 +08:00
Christopher Milan	760e508c3a	autogen: no deep walk (#13654) * no deep walk * reset init * delete walk * remove print * regen * linkage spec * cleanup	2025-12-12 01:04:35 -05:00
wozeparrot	8f60b8dd1e	fix: cast on transpose (#13653)	2025-12-11 21:03:49 -08:00
Christopher Milan	950d8de00e	automatically inline anonymous (#13652)	2025-12-12 00:02:44 -05:00
chenyu	01e9ad0d52	clean up bert next_data (#13650) train iter was designed to never stop for both real and fake data	2025-12-11 22:56:28 -05:00
Jakob Sachs	ab2220b834	Handle missing bfloat16 natives on CPU architectures (#13553) * CPU: fix compiler-rt libcall by adding intermediate casts for bfloat16 * fix lint * remove old manual bypass of bf16 for CPU tests, and add diversion converstion from bf16 to/from fp16 --------- Co-authored-by: Jakob Sachs <jakobs99@purelymail.com>	2025-12-11 15:38:43 -05:00
nimlgen	cbae33003d	ci: add usb4 (#13643) * ci: add usb4 * debug=3 * undef * revert	2025-12-11 19:41:41 +03:00
chenyu	03600aef1e	failed test case when init jit with empty inputs (#13641) not related to bert grad acc, but still seems to be a bug	2025-12-10 22:03:06 -05:00
nimlgen	51f3c9f615	am: use va_base as base (#13640)	2025-12-10 21:09:35 +03:00
chenyu	5034c6fb37	reenable FREE_INTERMEDIATE for bert (#13639) * reenable FREE_INTERMEDIATE for bert * comment	2025-12-10 12:08:09 -05:00
qazal	be6d538351	viz: add kernel walltime to pmc scoreboard (#13638) * viz: add kernel walltime to pmc scoreboard * fix typing * tiny TracingKey refactor * key on kernel name	2025-12-10 20:16:42 +08:00
qazal	1666c4aaab	viz: fix counter names ordering (#13637)	2025-12-10 17:05:27 +08:00
qazal	c801bb7054	viz: show all kernel pmcs (#13635)	2025-12-10 07:16:02 +08:00
wozeparrot	4854a0c02c	fix: getattr returns AttributeError not ImportError when missing (#13633)	2025-12-09 14:26:54 -08:00
chenyu	016a59cafa	remove contiguous and use where in EmbeddingBert (#13632)	2025-12-09 15:49:21 -05:00
nimlgen	ddecba300f	amd: use getattr for autogen (#13630) * amd: use getattr for autogen * fi	2025-12-09 20:36:26 +03:00
Nino Risteski	76d465dbc3	optim empty shard #13513 (#13598) * optim empty shard * remove tuple * simplify * lint * lint2 * test * remove original buffer unique id * new rule * reset shard * update * reset shard	2025-12-09 12:28:36 -05:00
ayanhan	47a170be2e	test: enable cummax scalar IndexError test (#13625)	2025-12-09 12:25:56 -05:00
Christopher Milan	9eae9dc3be	regen smu_v13 with stdint (#13631) Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2025-12-09 12:20:01 -05:00
nimlgen	7cd8852f60	autogen: do no return tuples (#13629)	2025-12-09 20:08:13 +03:00
nimlgen	9e484b5b1c	hcq: check size is None, do not read the whole size for 0s (#13628)	2025-12-09 19:37:44 +03:00
nimlgen	1329033b8c	am: fix hot-queue restarts, only dequeue (#13627)	2025-12-09 19:37:21 +03:00
nimlgen	b07839493d	proclogs with xccs (#13626)	2025-12-09 16:46:08 +03:00
qazal	2c333818f4	simplify UOp stringifier [pr] (#13618) * simplify UOp stringifier [pr] * fix tuple	2025-12-09 05:06:16 +08:00
chenyu	2471b49e45	minor bert / llama change from grad acc branch (#13622) * minor bert / llama change from grad acc branch * revert those	2025-12-08 16:04:14 -05:00
Christopher Milan	cb3d756547	NAK compile-only test (#13621)	2025-12-08 15:53:46 -05:00
Christopher Milan	a4c3d48aa9	compile-only test for IR3 actually works (#13619)	2025-12-08 15:07:49 -05:00
Christopher Milan	a17077d1d9	skip test_double_assign in CI LVP (#13620)	2025-12-08 14:54:02 -05:00
Christopher Milan	1c16b6e082	Mesa: freedreno (#12746) * ir3 init * got a program * 1 + 1 works * use isa_disasm instead of shader_disasm * wip * matmul works * works on py3.14 * fix const loading * skip QCOM failing tests * cleanup * args actually work * add compile-only tests * fix typo and install tinymesa * IR3 NULL backend * (float32) images work * autogen fix * fix compile only test * typo * mypy happy * compile-only uses py3.14 * bump mesa * unify qcom disassembler * float16 works * disasm shows in viz * save a line * add real del * variable workgroup sizes * simplify diff * bump line count * properly set wgsz * regen mesa * no preamble * bump lines	2025-12-08 14:02:08 -05:00
Douglas Nyberg	947c6eefc3	add Swish op (#13541) * add Swish ONNX operator * add Swish regression test * remove trailing whitespace * upgrade ONNX to 1.20, add excludes for unimplemented ops * upgrade ONNX to 1.19, add Swish op * upgrade ONNX to 1.19, TensorFlow to 2.18, add Swish op * exclude attention_3d and attention_4d_gqa tests * exclude attention fp16 tests * exclude all attention tests * retrigger CI * retrigger CI - worker crash	2025-12-08 12:41:18 -05:00
nimlgen	dd8a1a10d4	amd: tiny cleanups (#13616)	2025-12-08 13:15:56 +03:00
qazal	2b07336c82	viz server cleanups (#13615) * depths start at 0 * rename the api path	2025-12-08 17:44:43 +08:00
wozeparrot	89c4206e22	fix: typing (#13614)	2025-12-07 20:10:30 -08:00
qazal	572dfd5506	add static amd program info to viz (#13594) * llvm-readelf * amd_readelf + soft_err * cleanup * multiple metadata * max wgp size, may be less	2025-12-08 04:08:14 +08:00

11648 Commits