Commit Graph

11431 Commits

Author SHA1 Message Date
chenyu
e428fbfab6 verify dtype of llama model params (#13719) 2025-12-16 12:32:02 -05:00
George Hotz
e5a66ace80 multi custom kernel support (#13716)
* multi custom kernel support

* custom kernel xfrom

* works

* no SPEC=2 on ck

* panic

* touchups
2025-12-16 11:36:30 -04:00
nimlgen
5778722979 am: restore queues (#13714)
* am: restore queues

* l

* cmnt
2025-12-16 15:21:42 +03:00
chenyu
041e9a41c9 add contiguous in BertIntermediate (#13713)
faster step with a lot less recomputation
2025-12-15 22:37:36 -05:00
George Hotz
7589c897b2 split usbgpu tests into their own benchmark [pr] (#13711) 2025-12-15 21:42:40 -04:00
qazal
6bafd90248 remove unused process replay input [pr] (#13712) 2025-12-16 09:29:35 +08:00
George Hotz
321ab943b2 qwen model is working (#13690)
* qwen model is mostly working

* add Q4_K quantization support to GGUF parser, add qwen3:1.7b model

- Add Q4_K (type 12) dequantization in nn/state.py
- Add qwen3:1.7b model using Q4_K_M quantization (smaller than Q8_0)
- Make bos_token_id optional for models like Qwen3 that don't have it
- Fix line length issues and add preset parameter to SimpleTokenizer

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* smaller diff

* test dequant

* half split

* better

* simple tok

* mock token

* polish

* better

* fix

* replace

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 18:00:34 -04:00
George Hotz
d43e4c7553 llm args + lil html page (#13710)
* update llm args

* lil html page

* lil

* line size

* qol
2025-12-15 17:09:31 -04:00
George Hotz
ee4a7ee12f rope half-split (#13706)
* rope half

* nicer

* this

* rearrange
2025-12-15 15:31:11 -04:00
Christopher Milan
2359e88f0c wrap cdll redo (#13705)
* wrap CDLL with custom findlib

* lint

* regen

* fix

* mypy

* hardcode libc on macos

* fix frameworks

* fix webgpu win

* remove supports

* regen metal

* regen libclang

* regen

* simpler

* regen

* regen

* find nvrtc

* fix

* regen

* fix

* typo

* regen

* split

* rsplit one

* typo

* try load DLL

* string error
2025-12-15 13:15:02 -05:00
wozeparrot
5d509499b2 tk: kernel finish groups stores (#13704) 2025-12-15 09:16:17 -08:00
George Hotz
54a22aa298 add test for jit footguns (#13701)
* add test for jit footguns

* shorter

* notes
2025-12-15 10:47:44 -05:00
George Hotz
fd49bb512d download cache by job (#13703) 2025-12-15 10:47:17 -05:00
George Hotz
a657a4e0f4 add Q4_K GGUF quantization support (#13700)
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 10:17:56 -05:00
nimlgen
615dcab767 am: minimal mi300 boot (#13679)
* nbio7_9

* psp

* gmc

* gfx

* sdma

* ih

* linter

* linter

* minor

* finish

* add missing

* do not allow warm boot for now
2025-12-15 15:55:03 +03:00
qazal
72e006cd59 fast VIZ=2 startup (#13682) 2025-12-15 19:16:43 +08:00
qazal
50d34428bd fix viz endstream (#13687) 2025-12-15 16:54:18 +08:00
wozeparrot
7ef7ce2856 tk reg local store (#13689) 2025-12-14 23:07:30 -08:00
George Hotz
572ca80046 fast tinygrad.apps.llm (#13685)
* llm: add --benchmark support

* fix speed

* debug logging

* fix test attention
2025-12-14 21:05:21 -05:00
chenyu
6cad622f59 don't FREE_INTERMEDIATE in bert (#13684)
hangs green hcq consistently after an hour of training
2025-12-14 14:27:42 -05:00
chenyu
871ab8415f some onnx cleanups (#13683) 2025-12-14 13:58:54 -05:00
nimlgen
75832ce4f6 am: psp with no autoload (#13681) 2025-12-14 20:20:09 +03:00
nimlgen
8bcb1038e4 am: nbio 7.9.0 (#13680) 2025-12-14 18:35:29 +03:00
George Hotz
013240938b llm: add --benchmark support (#13678) 2025-12-14 08:35:05 -05:00
Robbe Derks
cddbdaf5e1 usbgpu: patch: auto-detect controller PID/VID (#13645)
* auto-detect controller

* fix lint?

* needs ''

* just try
2025-12-14 00:54:51 -05:00
George Hotz
d7fb5d9b62 speedups: early return from simplify (#13665)
* early return from simplify

* pm_rewrite

* more speed

* remove again

* early return from simplify

* ugh
2025-12-14 00:51:28 -05:00
George Hotz
bcbf832399 add chrism 2025-12-14 00:45:57 -05:00
chenyu
ed962786d6 use assign in Tensor.backward (#13674)
preserve the grad object so that jit works
2025-12-13 22:43:06 -05:00
chenyu
721a379c41 Revert "autogen: use wrapped CDLL with custom findlib (#13666)" (#13675)
This reverts commit f6cc3b13b9.
2025-12-13 22:42:41 -05:00
nimlgen
6402dcf940 am: xccs queue creation (#13672) 2025-12-13 18:37:09 +03:00
nimlgen
8430ee7d5f am: stop hqd only when active (#13670)
* am: stop hqd only when active

* this better
2025-12-13 17:41:44 +03:00
nimlgen
a49ba241bb am: use fb_base/fb_end as mc aperture (#13671) 2025-12-13 17:29:03 +03:00
nimlgen
0b15c573ca amd: xccs in PCIIface (#13669) 2025-12-13 17:22:11 +03:00
qazal
019e71f8ca lds bank count tests from pmc counters (#13667)
* lds bank count tests from pmc counters

* these tests run on the RDNA3 card too

* rename duration to cycles, other rename comment

* add SQ_LDS_IDX_ACTIVE to gfx9 defaults
2025-12-13 17:39:32 +08:00
qazal
a6dfd8a672 viz server cleanups (#13668)
* viz server cleanups

* comment
2025-12-13 17:27:53 +08:00
Christopher Milan
f6cc3b13b9 autogen: use wrapped CDLL with custom findlib (#13666)
* wrap CDLL with custom findlib

* lint

* regen

* fix

* mypy

* hardcode libc on macos

* fix frameworks

* fix webgpu win

* remove supports

* regen metal

* regen libclang

* regen

* simpler

* regen

* regen

* find nvrtc

* fix

* regen

* fix

* typo

* regen

* split

* rsplit one

* typo
2025-12-13 01:31:30 -05:00
George Hotz
55845f7de7 schedule: cache unbinds for consistent cache keys (#13664)
* schedule: cache unbinds for consistent cache keys

strip BIND values before computing cache key so different bound values
(e.g. KV cache positions) hit the same schedule cache entry.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* spec: allow single-src BIND for schedule cache key normalization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add lessons learned to CLAUDE.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* more claude.md

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 17:27:42 -05:00
George Hotz
27845353a0 add CLAUDE.md 2025-12-12 16:50:11 -05:00
George Hotz
8c87a0bf8d Revert "schedule: cache unbinds for consistent cache keys (#13662)"
This reverts commit af86cae10c.
2025-12-12 16:49:50 -05:00
George Hotz
443b7fea80 Revert "add notes about jit to claude.md"
This reverts commit 429f82e6a9.
2025-12-12 16:49:48 -05:00
George Hotz
429f82e6a9 add notes about jit to claude.md 2025-12-12 16:48:23 -05:00
George Hotz
af86cae10c schedule: cache unbinds for consistent cache keys (#13662)
* schedule: cache unbinds for consistent cache keys

different bound variable values (e.g. kv cache positions) now produce
the same schedule cache key by unbinding BIND(DEFINE_VAR, CONST) before
computing the cache key and rebinding after lookup.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* schedule: cache unbinds for consistent cache keys

When scheduling, BIND(DEFINE_VAR, CONST) nodes are now unbound to
tagged DEFINE_VARs before computing the cache key. This ensures that
the same computation with different bound values (e.g., different
KV cache positions in LLM) gets the same cache key and reuses the
cached schedule.

The fix:
- pm_pre_sched_cache: replaces BIND with tagged DEFINE_VAR
- pm_post_sched_cache: restores tagged DEFINE_VAR back to original BIND
- pm_remove_rangeify_tags: excludes DEFINE_VAR to preserve tags through rangeify
- var_vals extracted from BINDs before cache key computation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* schedule: fix BIND handling and add CLAUDE.md

- Handle BIND to RANGE in create_schedule (not matched by CONST pattern)
- Assert all BINDs on same variable have same value
- Add CLAUDE.md codebase guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 16:40:10 -05:00
chenyu
fcaed1e1dd don't use empty in bert fake data (#13661)
somehow jit does not count empty as input
2025-12-12 15:59:50 -05:00
George Hotz
316da9f7ff llm: add created/model fields, non-streaming support, and tests (#13660)
* llm: add created/model fields, non-streaming support, and tests

- Add `created` timestamp and `model` fields to response (required by OpenAI spec)
- Add non-streaming mode support for /v1/chat/completions
- Add `send_data` helper to HTTPRequestHandler for responses with Content-Length
- Refactor viz/serve.py to use send_data
- Add integration tests using real OpenAI client

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* add openai to testing

* toml

* Remove 'openai' from dependencies

Removed 'openai' from the dependencies list.

* bump cache

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 14:50:36 -05:00
George Hotz
9604773e45 add model choosing support to llm (#13656) 2025-12-12 11:22:11 -05:00
nimlgen
e36385e570 am: support xgmi systems (#13659)
* am: support xgmi systems

* fake_am
2025-12-12 18:55:45 +03:00
nimlgen
b4796e2d32 amd: set queue prio to normal (#13658) 2025-12-12 18:25:41 +03:00
nimlgen
a1de7787bf am: xcc/inst support (#13657) 2025-12-12 17:40:42 +03:00
George Hotz
f0fa9bcd98 openai api for llm (#13648)
* openai api for llm

* responds to simple request

* schedule cache needs to unbind

* stream works

* share stream code

* 20k

* one print

* cid
2025-12-12 08:25:33 -05:00
qazal
93ad1f7732 viz: readable pmc print, share unpacker with tests (#13655)
* viz: readable pmc print, share unpacker with tests

* sections

* static analyzer

* rm that
2025-12-12 19:29:59 +08:00