Commit Graph

949 Commits

Author SHA1 Message Date
George Hotz
aeb7516c8a tests passing on tinybox h3 (#13742) 2025-12-17 19:04:34 -04:00
George Hotz
b013244c38 fix local tests for AMD_LLVM (#13738)
* fix local tests for AMD_LLVM

* fix linters

* skip that for now

* fix segfault
2025-12-17 12:23:46 -04:00
George Hotz
3dbde178c1 mark slow tests as slow instead of as CI (#13736)
* mark slow tests as slow instead of as CI

* CI shouldn't have different behavior

* more skips / CI

* slow
2025-12-17 10:29:57 -04:00
George Hotz
9015a22523 make tests faster (#13734) 2025-12-17 09:39:44 -04:00
George Hotz
cf0c28d5ae all tests pass on strix halo (#13728) 2025-12-16 19:35:50 -04:00
George Hotz
321ab943b2 qwen model is working (#13690)
* qwen model is mostly working

* add Q4_K quantization support to GGUF parser, add qwen3:1.7b model

- Add Q4_K (type 12) dequantization in nn/state.py
- Add qwen3:1.7b model using Q4_K_M quantization (smaller than Q8_0)
- Make bos_token_id optional for models like Qwen3 that don't have it
- Fix line length issues and add preset parameter to SimpleTokenizer

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* smaller diff

* test dequant

* half split

* better

* simple tok

* mock token

* polish

* better

* fix

* replace

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 18:00:34 -04:00
George Hotz
a657a4e0f4 add Q4_K GGUF quantization support (#13700)
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 10:17:56 -05:00
George Hotz
572ca80046 fast tinygrad.apps.llm (#13685)
* llm: add --benchmark support

* fix speed

* debug logging

* fix test attention
2025-12-14 21:05:21 -05:00
chenyu
ed962786d6 use assign in Tensor.backward (#13674)
preserve the grad object so that jit works
2025-12-13 22:43:06 -05:00
George Hotz
55845f7de7 schedule: cache unbinds for consistent cache keys (#13664)
* schedule: cache unbinds for consistent cache keys

strip BIND values before computing cache key so different bound values
(e.g. KV cache positions) hit the same schedule cache entry.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* spec: allow single-src BIND for schedule cache key normalization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add lessons learned to CLAUDE.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* more claude.md

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 17:27:42 -05:00
George Hotz
8c87a0bf8d Revert "schedule: cache unbinds for consistent cache keys (#13662)"
This reverts commit af86cae10c.
2025-12-12 16:49:50 -05:00
George Hotz
af86cae10c schedule: cache unbinds for consistent cache keys (#13662)
* schedule: cache unbinds for consistent cache keys

different bound variable values (e.g. kv cache positions) now produce
the same schedule cache key by unbinding BIND(DEFINE_VAR, CONST) before
computing the cache key and rebinding after lookup.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* schedule: cache unbinds for consistent cache keys

When scheduling, BIND(DEFINE_VAR, CONST) nodes are now unbound to
tagged DEFINE_VARs before computing the cache key. This ensures that
the same computation with different bound values (e.g., different
KV cache positions in LLM) gets the same cache key and reuses the
cached schedule.

The fix:
- pm_pre_sched_cache: replaces BIND with tagged DEFINE_VAR
- pm_post_sched_cache: restores tagged DEFINE_VAR back to original BIND
- pm_remove_rangeify_tags: excludes DEFINE_VAR to preserve tags through rangeify
- var_vals extracted from BINDs before cache key computation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* schedule: fix BIND handling and add CLAUDE.md

- Handle BIND to RANGE in create_schedule (not matched by CONST pattern)
- Assert all BINDs on same variable have same value
- Add CLAUDE.md codebase guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 16:40:10 -05:00
George Hotz
316da9f7ff llm: add created/model fields, non-streaming support, and tests (#13660)
* llm: add created/model fields, non-streaming support, and tests

- Add `created` timestamp and `model` fields to response (required by OpenAI spec)
- Add non-streaming mode support for /v1/chat/completions
- Add `send_data` helper to HTTPRequestHandler for responses with Content-Length
- Refactor viz/serve.py to use send_data
- Add integration tests using real OpenAI client

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* add openai to testing

* toml

* Remove 'openai' from dependencies

Removed 'openai' from the dependencies list.

* bump cache

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 14:50:36 -05:00
Christopher Milan
94d7646bdc fix anonymous struct fields (#13610) 2025-12-07 12:56:38 -05:00
nimlgen
ac5f1e115d autogen: repro for the bug (#13607)
* autogen: repro for the test

* mute
2025-12-07 15:51:03 +03:00
George Hotz
c5bd28e21d start work on schedule cache (#13529)
* start work on schedule cache

* local unique

* schedule cache works

* schedule cache cleanup

* fix tests

* preserve metadata

* oops, fix cache

* put that there

* fix spec

* always miss

* why is that broken?

* src[0].op

* fix process replay

* delete abstractions2

* reenable the actual schedule cache

* metadata is best effort

* fix JIT in examples/gradaccum_mnist.py

* full jit

* fixed and test is real
2025-12-04 17:24:49 -08:00
ayanhan
edf929ec9d fix: add __delitem__ to Tensor with proper TypeError (#13561) 2025-12-04 00:53:08 -08:00
Christopher Milan
0a54434b15 mitigate ctypes c_bool bitfield bug (#13558)
* mitigate ctypes c_bool bitfield bug

* don't delete old test
2025-12-03 20:46:04 -05:00
chenyu
22777a89ea minor test_uop_symbolic updates (#13551) 2025-12-03 13:17:44 -05:00
chenyu
a205f98ef4 tighter bound for MOD (#13550) 2025-12-03 11:24:29 -05:00
nimlgen
549f3287a8 fix caching for fetch (#13544) 2025-12-03 14:34:14 +03:00
George Hotz
6bd355fa26 add needs_second_gpu decorator (#13543)
* add needs_second_gpu decorator

* more skips

* two more fixes
2025-12-02 19:08:23 -08:00
Roelof van Dijk
c158e3c988 add cifar gated uop_given_valid regression test (#13536) 2025-12-02 16:02:47 -05:00
nimlgen
77a76d1b13 device: respect compiler ContextVars (#13523)
* device: envvars for cc

* fix

* fix

* x

* um

* fix

* remote

* em

* cleanup

* typing

* fix

* debug

* lvp?

* ugh

* singl

* rm

* lol

* fix

* ?

* this?

* why?

* rev

* mod test

* l
2025-12-02 14:42:04 +03:00
George Hotz
c38b7684dc improve microbenchmarks (#13492)
* improve microbenchmarks

* bugfix + ubench

* lil

* no src in const method
2025-11-29 10:15:22 -08:00
qazal
72ef533d9c tracing: use u32 for buffer args encoding (#13472) 2025-11-28 00:19:51 +08:00
George Hotz
e4cd649ff0 remove kernelize to prepare for refactors (#13463)
* remove kernelize to prepare for refactors

* less kernelize

* last test
2025-11-26 14:18:50 -08:00
qazal
7238df7a94 viz: cleanup sort_fn (#13454) 2025-11-26 04:10:10 +08:00
wozeparrot
249553a119 tinyfs tweaks (#13444) 2025-11-24 18:07:32 -08:00
chenyu
cb29265f23 add test that shows the validhack regression with bad rewrite order (#13411) 2025-11-21 13:48:30 -05:00
chenyu
0251a8e628 parse_valid minor cleanup [pr] (#13385)
* stricter parse_valid [pr]

* not stricter

* no VCONST

* Revert "no VCONST"

This reverts commit 330dbdf4060562596febcbf970bda6051a35012f.
2025-11-20 13:15:06 -05:00
George Hotz
986d113024 symbolic fuzz failure (#13367)
* symbolic fuzz failure

* skip flaky test
2025-11-19 14:21:08 -08:00
George Hotz
05ccc69248 Revert "merge to fold_divmod_general [p] (#13359)"
This reverts commit 7711bbac7f.
2025-11-19 14:18:09 -08:00
George Hotz
7711bbac7f merge to fold_divmod_general [p] (#13359)
* merge to fold_divmod_general [p]

* merge more

* merge more

* merge more
2025-11-19 11:37:45 -08:00
George Hotz
957cf717e7 Python speed (#13355)
* skip process replay by default

* work on python speed

* fix names of rewrite rules

* fix that test
2025-11-19 09:03:00 -08:00
Christopher Milan
a438c277de autogen tests for 3.14 (#13343) 2025-11-18 22:16:59 -05:00
George Hotz
cabd4add48 more work parsing SQTT, separate VIZ/PROFILE (#13308)
* more work parsing SQTT

* more minimal runner

* sep VIZ/PROFILE

* parse print new

* improve parser

* more filter

* that

* split them

* lil cleanup

* skip flaky test

* AQL in mmapeak
2025-11-16 10:40:39 -08:00
nimlgen
c80d459d99 autogen: fix packed args structs (#13274)
* autogen: fix packed args structs

* and test this
2025-11-14 20:24:06 +08:00
George Hotz
bcdfc109b5 hotfix: disable flaky test 2025-11-13 06:19:28 -08:00
George Hotz
ab9fa964d8 DISABLE_COMPILER_CACHE -> CCACHE (#13234)
* DISABLE_COMPILER_CACHE -> CCACHE

* Fix cachekey assignment in Compiler constructor
2025-11-12 15:07:09 -08:00
qazal
7a6853fa40 viz: show python callstack in the first graph (#13218) 2025-11-12 20:52:28 +08:00
Christopher Milan
41a098a82d In-tree autogen: libc.py (#13217)
* checkout changes from autogen branch

* parents

* pylint happy

* move sys to system in helpers.py

* typo

* typo
2025-11-11 19:13:48 -08:00
qazal
bc55bc4849 cleanup test_viz profiler tests (#13221) 2025-11-12 03:46:48 +08:00
nimlgen
b8e48effcb device: no compilers message with reasons (#13146)
* device: no compilers message with reasons

* typings

* mypy
2025-11-07 23:01:45 +08:00
chenyu
bb8cf948f2 variation of (x%c)+(x//c)*c = x (#13135)
when x is in the form of y//b, the idiv term might have combined
2025-11-06 18:53:28 -05:00
George Hotz
bcfe42937f move permute/flip/shrink to mixins (#13113)
* move permute to mixins

* move more stuff

* two more

* fix local mypy

* fix tests

* fix shrink
2025-11-05 14:14:15 -08:00
Sieds Lykles
3dc593c536 add strip_params to pyrender (#13021)
* add strip_params to pyrender

* update that one too

* strip_parens fix

* cleaner

* add test

* add some more tests

* cleaner strip_parens
2025-10-31 14:15:56 +01:00
Sieds Lykles
4c8362128b New symbolic renderer + strip parens (#13017)
* new uop renderer

* better tester

* strip parens

* update tests

* split method check_uop_against_string

* use ctx.update instead of add_rendered method

* strip parens based on precedence

* update test

* new symbolic renderer

* add comment
2025-10-30 16:41:32 +01:00
George Hotz
2da02f1ae1 add loads at the end (#12988)
* add loads at the end

* simpler

* late load

* tests passing

* fix matvec

* spec test passes

* fix where on load

* fix abs2

* fix more tests
2025-10-30 10:42:19 +08:00
Sieds Lykles
70bce62c67 dont collapse possibly empty symbolic range (#12994)
* dont collapse a symbolic range based on min/max

* refactor z3 renderer

* include sink explicitely instead of dtypes.void

* use dtype.scalar()
2025-10-29 12:17:09 +01:00