tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-09 15:08:02 -05:00

Author	SHA1	Message	Date
chenyu	e428fbfab6	verify dtype of llama model params (#13719)	2025-12-16 12:32:02 -05:00
George Hotz	e5a66ace80	multi custom kernel support (#13716) * multi custom kernel support * custom kernel xfrom * works * no SPEC=2 on ck * panic * touchups	2025-12-16 11:36:30 -04:00
nimlgen	5778722979	am: restore queues (#13714) * am: restore queues * l * cmnt	2025-12-16 15:21:42 +03:00
chenyu	041e9a41c9	add contiguous in BertIntermediate (#13713) faster step with a lot less recomputation	2025-12-15 22:37:36 -05:00
George Hotz	7589c897b2	split usbgpu tests into their own benchmark [pr] (#13711)	2025-12-15 21:42:40 -04:00
qazal	6bafd90248	remove unused process replay input [pr] (#13712)	2025-12-16 09:29:35 +08:00
George Hotz	321ab943b2	qwen model is working (#13690) * qwen model is mostly working * add Q4_K quantization support to GGUF parser, add qwen3:1.7b model - Add Q4_K (type 12) dequantization in nn/state.py - Add qwen3:1.7b model using Q4_K_M quantization (smaller than Q8_0) - Make bos_token_id optional for models like Qwen3 that don't have it - Fix line length issues and add preset parameter to SimpleTokenizer 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * smaller diff * test dequant * half split * better * simple tok * mock token * polish * better * fix * replace --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-15 18:00:34 -04:00
George Hotz	d43e4c7553	llm args + lil html page (#13710) * update llm args * lil html page * lil * line size * qol	2025-12-15 17:09:31 -04:00
George Hotz	ee4a7ee12f	rope half-split (#13706) * rope half * nicer * this * rearrange	2025-12-15 15:31:11 -04:00
Christopher Milan	2359e88f0c	wrap cdll redo (#13705) * wrap CDLL with custom findlib * lint * regen * fix * mypy * hardcode libc on macos * fix frameworks * fix webgpu win * remove supports * regen metal * regen libclang * regen * simpler * regen * regen * find nvrtc * fix * regen * fix * typo * regen * split * rsplit one * typo * try load DLL * string error	2025-12-15 13:15:02 -05:00
wozeparrot	5d509499b2	tk: kernel finish groups stores (#13704)	2025-12-15 09:16:17 -08:00
George Hotz	54a22aa298	add test for jit footguns (#13701) * add test for jit footguns * shorter * notes	2025-12-15 10:47:44 -05:00
George Hotz	fd49bb512d	download cache by job (#13703)	2025-12-15 10:47:17 -05:00
George Hotz	a657a4e0f4	add Q4_K GGUF quantization support (#13700) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-15 10:17:56 -05:00
nimlgen	615dcab767	am: minimal mi300 boot (#13679) * nbio7_9 * psp * gmc * gfx * sdma * ih * linter * linter * minor * finish * add missing * do not allow warm boot for now	2025-12-15 15:55:03 +03:00
qazal	72e006cd59	fast VIZ=2 startup (#13682)	2025-12-15 19:16:43 +08:00
qazal	50d34428bd	fix viz endstream (#13687)	2025-12-15 16:54:18 +08:00
wozeparrot	7ef7ce2856	tk reg local store (#13689)	2025-12-14 23:07:30 -08:00
George Hotz	572ca80046	fast tinygrad.apps.llm (#13685) * llm: add --benchmark support * fix speed * debug logging * fix test attention	2025-12-14 21:05:21 -05:00
chenyu	6cad622f59	don't FREE_INTERMEDIATE in bert (#13684) hangs green hcq consistently after an hour of training	2025-12-14 14:27:42 -05:00
chenyu	871ab8415f	some onnx cleanups (#13683)	2025-12-14 13:58:54 -05:00
nimlgen	75832ce4f6	am: psp with no autoload (#13681)	2025-12-14 20:20:09 +03:00
nimlgen	8bcb1038e4	am: nbio 7.9.0 (#13680)	2025-12-14 18:35:29 +03:00
George Hotz	013240938b	llm: add --benchmark support (#13678)	2025-12-14 08:35:05 -05:00
Robbe Derks	cddbdaf5e1	usbgpu: patch: auto-detect controller PID/VID (#13645) * auto-detect controller * fix lint? * needs '' * just try	2025-12-14 00:54:51 -05:00
George Hotz	d7fb5d9b62	speedups: early return from simplify (#13665) * early return from simplify * pm_rewrite * more speed * remove again * early return from simplify * ugh	2025-12-14 00:51:28 -05:00
George Hotz	bcbf832399	add chrism	2025-12-14 00:45:57 -05:00
chenyu	ed962786d6	use assign in Tensor.backward (#13674) preserve the grad object so that jit works	2025-12-13 22:43:06 -05:00
chenyu	721a379c41	Revert "autogen: use wrapped CDLL with custom findlib (#13666)" (#13675) This reverts commit `f6cc3b13b9`.	2025-12-13 22:42:41 -05:00
nimlgen	6402dcf940	am: xccs queue creation (#13672)	2025-12-13 18:37:09 +03:00
nimlgen	8430ee7d5f	am: stop hqd only when active (#13670) * am: stop hqd only when active * this better	2025-12-13 17:41:44 +03:00
nimlgen	a49ba241bb	am: use fb_base/fb_end as mc aperture (#13671)	2025-12-13 17:29:03 +03:00
nimlgen	0b15c573ca	amd: xccs in PCIIface (#13669)	2025-12-13 17:22:11 +03:00
qazal	019e71f8ca	lds bank count tests from pmc counters (#13667) * lds bank count tests from pmc counters * these tests run on the RDNA3 card too * rename duration to cycles, other rename comment * add SQ_LDS_IDX_ACTIVE to gfx9 defaults	2025-12-13 17:39:32 +08:00
qazal	a6dfd8a672	viz server cleanups (#13668) * viz server cleanups * comment	2025-12-13 17:27:53 +08:00
Christopher Milan	f6cc3b13b9	autogen: use wrapped CDLL with custom findlib (#13666) * wrap CDLL with custom findlib * lint * regen * fix * mypy * hardcode libc on macos * fix frameworks * fix webgpu win * remove supports * regen metal * regen libclang * regen * simpler * regen * regen * find nvrtc * fix * regen * fix * typo * regen * split * rsplit one * typo	2025-12-13 01:31:30 -05:00
George Hotz	55845f7de7	schedule: cache unbinds for consistent cache keys (#13664) * schedule: cache unbinds for consistent cache keys strip BIND values before computing cache key so different bound values (e.g. KV cache positions) hit the same schedule cache entry. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * spec: allow single-src BIND for schedule cache key normalization 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: add lessons learned to CLAUDE.md 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * more claude.md --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 17:27:42 -05:00
George Hotz	27845353a0	add CLAUDE.md	2025-12-12 16:50:11 -05:00
George Hotz	8c87a0bf8d	Revert "schedule: cache unbinds for consistent cache keys (#13662)" This reverts commit `af86cae10c`.	2025-12-12 16:49:50 -05:00
George Hotz	443b7fea80	Revert "add notes about jit to claude.md" This reverts commit `429f82e6a9`.	2025-12-12 16:49:48 -05:00
George Hotz	429f82e6a9	add notes about jit to claude.md	2025-12-12 16:48:23 -05:00
George Hotz	af86cae10c	schedule: cache unbinds for consistent cache keys (#13662) * schedule: cache unbinds for consistent cache keys different bound variable values (e.g. kv cache positions) now produce the same schedule cache key by unbinding BIND(DEFINE_VAR, CONST) before computing the cache key and rebinding after lookup. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * schedule: cache unbinds for consistent cache keys When scheduling, BIND(DEFINE_VAR, CONST) nodes are now unbound to tagged DEFINE_VARs before computing the cache key. This ensures that the same computation with different bound values (e.g., different KV cache positions in LLM) gets the same cache key and reuses the cached schedule. The fix: - pm_pre_sched_cache: replaces BIND with tagged DEFINE_VAR - pm_post_sched_cache: restores tagged DEFINE_VAR back to original BIND - pm_remove_rangeify_tags: excludes DEFINE_VAR to preserve tags through rangeify - var_vals extracted from BINDs before cache key computation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * schedule: fix BIND handling and add CLAUDE.md - Handle BIND to RANGE in create_schedule (not matched by CONST pattern) - Assert all BINDs on same variable have same value - Add CLAUDE.md codebase guide 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 16:40:10 -05:00
chenyu	fcaed1e1dd	don't use empty in bert fake data (#13661) somehow jit does not count empty as input	2025-12-12 15:59:50 -05:00
George Hotz	316da9f7ff	llm: add created/model fields, non-streaming support, and tests (#13660) * llm: add created/model fields, non-streaming support, and tests - Add `created` timestamp and `model` fields to response (required by OpenAI spec) - Add non-streaming mode support for /v1/chat/completions - Add `send_data` helper to HTTPRequestHandler for responses with Content-Length - Refactor viz/serve.py to use send_data - Add integration tests using real OpenAI client 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * add openai to testing * toml * Remove 'openai' from dependencies Removed 'openai' from the dependencies list. * bump cache --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 14:50:36 -05:00
George Hotz	9604773e45	add model choosing support to llm (#13656)	2025-12-12 11:22:11 -05:00
nimlgen	e36385e570	am: support xgmi systems (#13659) * am: support xgmi systems * fake_am	2025-12-12 18:55:45 +03:00
nimlgen	b4796e2d32	amd: set queue prio to normal (#13658)	2025-12-12 18:25:41 +03:00
nimlgen	a1de7787bf	am: xcc/inst support (#13657)	2025-12-12 17:40:42 +03:00
George Hotz	f0fa9bcd98	openai api for llm (#13648) * openai api for llm * responds to simple request * schedule cache needs to unbind * stream works * share stream code * 20k * one print * cid	2025-12-12 08:25:33 -05:00
qazal	93ad1f7732	viz: readable pmc print, share unpacker with tests (#13655) * viz: readable pmc print, share unpacker with tests * sections * static analyzer * rm that	2025-12-12 19:29:59 +08:00
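
The schedule-cache commits above (`af86cae10c`, `55845f7de7`) describe stripping BIND values before computing the cache key, so that graphs differing only in a bound value (e.g. a KV cache position) share one cached schedule, with the values collected separately into var_vals. A minimal sketch of that idea follows. It is a toy illustration, not tinygrad's code: the tuple-based nodes, `unbind`, `get_schedule`, and `schedule_cache` are all hypothetical names invented here, and the real implementation works on UOps through `pm_pre_sched_cache` / `pm_post_sched_cache`.

```python
# Toy nodes (hypothetical, not tinygrad UOps):
#   ("BIND", var_name, value), ("DEFINE_VAR", var_name), ("OP", op_name, *children)

def unbind(node, var_vals):
    """Return node with every BIND replaced by a value-free DEFINE_VAR.

    Bound values are collected into var_vals as a side effect.
    """
    if node[0] == "BIND":
        _, name, value = node
        # All BINDs on the same variable must carry the same value.
        assert var_vals.setdefault(name, value) == value, f"conflicting values for {name}"
        return ("DEFINE_VAR", name)
    if node[0] == "OP":
        return ("OP", node[1]) + tuple(unbind(child, var_vals) for child in node[2:])
    return node

# Maps a value-free graph (the cache key) to a stand-in "schedule".
schedule_cache = {}

def get_schedule(graph):
    """Look up a schedule by the unbound graph; return (schedule, var_vals, cache_hit)."""
    var_vals = {}
    key = unbind(graph, var_vals)
    hit = key in schedule_cache
    if not hit:
        schedule_cache[key] = f"schedule#{len(schedule_cache)}"  # placeholder for real scheduling work
    return schedule_cache[key], var_vals, hit

# Same computation, different bound position: the second call reuses the first schedule.
g1 = ("OP", "add", ("BIND", "start_pos", 3), ("OP", "load"))
g2 = ("OP", "add", ("BIND", "start_pos", 7), ("OP", "load"))
s1, v1, hit1 = get_schedule(g1)
s2, v2, hit2 = get_schedule(g2)
```

In this sketch the bound values travel alongside the cached schedule in `var_vals` rather than inside the key, which is what lets the second lookup hit.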

11431 Commits