* qwen3.5

* faster

* or

* rm zero hack

* less float

* T=1

* clean

* clean

* 4b

* rope_dim

* Revert "jit: captures linears, not execitems (#15399)"

This reverts commit 9656d97d97.

* DeltaNetBlock

* pairwise_topk

* clean

* Reapply "jit: captures linears, not execitems (#15399)"

This reverts commit cf3deff53d.

* clean topk, _swiglu

* common

* FFNBlock

* clean

* half

* no mix

* qwen3.5 test

* fix ssm cache invalidation

* TransformerConfig

* SSMConfig

* clean

* reset_state

* llm: reuse server conversation tokens to avoid BPE roundtrip cache miss

* import error

* prefill

* none check

* put it back

* clean pairwise_topk

* symbolic: fold BIND(CONST, CONST) to CONST

* clean

* simpler pm

* _cached_msg_count

* stream decoder; ssm checkpoints

* rm checkpoint

* attn_output_gate

* conflict, attn_output_gate

* clean, less has_ssm, assert

* chunked prefill

* _reset_cache

* _reusable_prefix_len

* revert loop

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
This commit is contained in:
b1tg
2026-04-13 15:35:24 +08:00
committed by GitHub
parent 2ada38f777
commit 2b5ba0095d
2 changed files with 100 additions and 10 deletions

View File

@@ -508,6 +508,8 @@ jobs:
run: echo "What's a male chicken called? Answer with only one word." | MAX_BUFFER_SIZE=0 python3 -m tinygrad.apps.llm --model llama3.2:1b | tee /dev/stderr | grep -i rooster
- name: Test 1B LLM (llama q4)
run: echo "What's a male chicken called? Answer with only one word." | MAX_BUFFER_SIZE=0 python3 -m tinygrad.apps.llm --model llama3.2:1b-q4 | tee /dev/stderr | grep -i rooster
- name: Test 1B LLM (qwen3.5)
run: echo "What's a male chicken called? Answer with only one word." | MAX_BUFFER_SIZE=0 python3 -m tinygrad.apps.llm --model qwen3.5:0.8b | tee /dev/stderr | grep -i rooster
- name: Test 1B LLM (qwen)
# NOTE: qwen is dumb and only knows about female chickens
run: echo "What's a female chicken called? Answer with only one word." | MAX_BUFFER_SIZE=0 python3 -m tinygrad.apps.llm --model qwen3:0.6b | tee /dev/stderr | grep -i hen