Commit Graph

12959 Commits

chenyu
7cbfa1896a comment out unused arm, triton in toml (#15741)
fixed `PYTHONPATH=. uv run tinygrad/apps/llm.py`
2026-04-15 10:05:19 -04:00
Christopher Milan
1c36878008 DEV: suggest alternatives (#15732) 2026-04-14 23:42:32 -04:00
George Hotz
1ae6528bb6 move schedule into schedule (#15736)
* move schedule into schedule

* callify to root

* sched docs
2026-04-15 11:03:25 +08:00
wozeparrot
3721c60bef llama: bs 16 (#15737) 2026-04-14 19:52:03 -07:00
wozeparrot
480ad264a4 llama: per device amax (#15735) 2026-04-14 19:01:17 -07:00
Christopher Milan
adc96cd724 qcom: synchronize for copyin (#15731)
fixes: #15698
2026-04-14 18:31:15 -04:00
chenyu
3394d18066 size*itemsize -> nbytes (#15729)
and some UOp.size removal to prep for size to mixin change
2026-04-14 16:27:54 -04:00
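The rename above encodes a standard identity (byte count = element count × element size). A minimal sketch using numpy, assuming tinygrad's buffers follow the same convention:

```python
import numpy as np

# nbytes is the product of element count and per-element size,
# so storing/naming it directly avoids recomputing size * itemsize.
buf = np.zeros((4, 8), dtype=np.float32)
assert buf.nbytes == buf.size * buf.itemsize  # 32 elements * 4 bytes = 128
```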
nimlgen
e9ecc990ea amd: add r9700 devid (#15721) 2026-04-14 20:15:00 +03:00
George Hotz
2450c8cba8 rename to callify + fix mypy (#15727)
* rename to callify + fix mypy

* update test
2026-04-14 23:43:19 +08:00
chenyu
528faa18ec update env_vars.md (#15722)
remove HCQ_VISIBLE_DEVICES, IMAGE=2 and old DEBUG=3 stuff
2026-04-14 09:13:35 -04:00
George Hotz
359b1582d6 amd: EMU DPP support (#15719)
* EMU DPP support from GPT 5.4

* cleanups

* simple

* nope

* fix
2026-04-14 14:58:41 +08:00
wozeparrot
2b8d303f75 allreduce in precast dtype (#15689) 2026-04-13 20:24:12 -07:00
George Hotz
5683126844 llm: support for tekken tokenizer (#15720) 2026-04-14 10:52:07 +08:00
chenyu
70883a6950 cat the stack to mixin (#15715) 2026-04-13 18:44:39 -04:00
qazal
355e2729d3 viz: keep program UOp in data (#15714)
* refactor program uop access

* c.name
2026-04-14 07:04:16 +09:00
qazal
905b8adc97 viz: cli and server cleanups (#15713)
* update get_profile arg[0]

* uop_to_json arg[0]

* data is standalone in cli
2026-04-14 06:42:29 +09:00
Christopher Milan
d83707ec29 autogen: explicit types (#15679) 2026-04-13 16:54:39 -04:00
chenyu
ac41f15fc1 cumsum to mixin (#15712)
built on top of getitem
2026-04-13 15:06:08 -04:00
nimlgen
eac481b67f mlx: fix ctypes (#15711)
* mlx: fix ctypes

* x
2026-04-13 20:43:56 +03:00
nimlgen
b370f5c5ac hcq: call free for unmap (#15710) 2026-04-13 20:30:21 +03:00
chenyu
931d6cc62a basic getitem to mixin (#15697)
* basic getitem to mixin

* cleanup

* fix

* cleanup
2026-04-13 13:04:36 -04:00
George Hotz
7610bdc59e block multistore, it's not supported (#15708) 2026-04-13 20:57:59 +08:00
George Hotz
84d64b5835 hotfix: abstractions4 works in mock except asm 2026-04-13 20:57:00 +08:00
George Hotz
16f50a40a5 remove REMU from tree (#15706)
* no more compare emulators

* remove remu from tree
2026-04-13 20:43:08 +08:00
qazal
ac027055ef viz: no global state (#15705)
* start viz data

* get_full_rewrites also moves

* update ref_map

* work

* update consumers

* cleaner cli

* linter

* cleanup tests

* back

* better

* sqtt tests
2026-04-13 21:35:20 +09:00
George Hotz
4c1fb18a09 Revert "Revert "Tests for GatedDeltaNetBlock + fix multi after assign issue (…" (#15703)
This reverts commit 0cec42db71.
2026-04-13 19:09:38 +08:00
George Hotz
0cec42db71 Revert "Tests for GatedDeltaNetBlock + fix multi after assign issue (#15700)" (#15702)
This reverts commit 6f5d756282.
2026-04-13 19:06:44 +08:00
George Hotz
6f5d756282 Tests for GatedDeltaNetBlock + fix multi after assign issue (#15700)
* broken after/assign test

* test for GatedDeltaNet

* better comments

* fix issue 1 with multi kernel

* fix 2

* fix

* linter

* public api + cleanup
2026-04-13 18:43:23 +08:00
b1tg
2b5ba0095d qwen3.5 (#15210)
* qwen3.5

* faster

* or

* rm zero hack

* less float

* T=1

* clean

* clean

* 4b

* rope_dim

* Revert "jit: captures linears, not execitems (#15399)"

This reverts commit 9656d97d97.

* DeltaNetBlock

* pairwise_topk

* clean

* Reapply "jit: captures linears, not execitems (#15399)"

This reverts commit cf3deff53d.

* clean topk, _swiglu

* common

* FFNBlock

* clean

* half

* no mix

* qwen3.5 test

* fix ssm cache invalidation

* TransformerConfig

* SSMConfig

* clean

* reset_state

* llm: reuse server conversation tokens to avoid BPE roundtrip cache miss

* import error

* prefill

* none check

* put it back

* clean pairwise_topk

* symbolic: fold BIND(CONST, CONST) to CONST

* clean

* simpler pm

* _cached_msg_count

* stream decoder; ssm checkpoints

* rm checkpoint

* attn_output_gate

* conflict, attn_output_gate

* clean, less has_ssm, assert

* chunked prefill

* _reset_cache

* _reusable_prefix_len

* revert loop

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-04-13 15:35:24 +08:00
qazal
2ada38f777 viz: execv after all producers complete (#15696) 2026-04-13 08:15:47 +09:00
chenyu
f7ff480fa6 start mixin getitem tests (#15695)
goal is to make Tensor[idx].uop equal to Tensor.uop[idx]
2026-04-12 18:54:33 -04:00
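The stated goal — `Tensor[idx].uop` equal to `Tensor.uop[idx]` — is a commutativity invariant between indexing and the `.uop` accessor. A hypothetical toy sketch of that invariant (class names are illustrative, not tinygrad's real classes):

```python
# Toy stand-ins: indexing the wrapper then taking .uop should give
# the same result as taking .uop then indexing.
class UOpLike:
    def __init__(self, data): self.data = data
    def __getitem__(self, idx): return UOpLike(self.data[idx])

class TensorLike:
    def __init__(self, uop): self.uop = uop
    def __getitem__(self, idx): return TensorLike(self.uop[idx])

t = TensorLike(UOpLike([1, 2, 3]))
assert t[1].uop.data == t.uop[1].data  # both paths agree
```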
chenyu
77385ccb37 more trivial stuff to mixin (#15693) 2026-04-12 15:17:16 -04:00
chenyu
ff1de5ae13 normalize logsumexp contiguous_backward to mixin (#15692)
* normalize logsumexp contiguous_backward to mixin

* more
2026-04-12 13:13:00 -04:00
chenyu
0254cfe642 move usum and uprod to mixin (#15690)
and used it to clean up ops and tensor
2026-04-12 11:42:24 -04:00
nimlgen
e9b2e156b4 add jitbeam to tinygpu docs (#15691) 2026-04-12 18:20:26 +03:00
chenyu
e706f408cb suppress test warnings from numpy (#15688) 2026-04-11 22:33:20 -04:00
nimlgen
938cba4fdf amd: a bit faster usb, skip interrupts on sync (#15686) 2026-04-11 17:26:36 +03:00
qazal
054d78e6ff fix llama profile.sh NULL source (#15685) 2026-04-11 22:56:05 +09:00
Graham Robbins
4ca844e96b add Q1_0 gguf type (#15683)
* add Q1_0

* better description

* fix trailing whitespace

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-04-11 18:17:24 +08:00
George Hotz
5156a04cf5 add support for AM_POWER_LIMIT (#15684)
* add support for AM_POWER_LIMIT

* level None
2026-04-11 17:14:54 +08:00
wozeparrot
457508d5a0 llama: save more 2 (#15681) 2026-04-11 01:03:36 -07:00
George Hotz
29238b772f AMD USB: support for 0xF3 power toggle 2026-04-11 13:04:38 +08:00
George Hotz
b5a9465b13 llm: add support for moonlight (deepseek MLA) (#15466)
* add gguf Q5_0

* it works

* rebase

* simpler test

* class

* less diff

* dicts

* normal names

* simplify

* this

* simpler

* work

* work
2026-04-11 10:32:48 +08:00
wozeparrot
590464c8d8 llama: only support wqkv path + cleanups (#15680)
* llama: only support wqkv path + cleanups

* llama: missing transpose
2026-04-11 07:39:27 +08:00
nimlgen
aa012d6f08 usb: faster custom (#15678)
* usb: _f0_out_buf for e4 cmd as well

* custom speed

* fast
2026-04-10 23:00:31 +03:00
nimlgen
58646f9569 usb fast copyout (#15677)
* usb

* fix usb
2026-04-10 21:04:49 +03:00
qazal
0d5cdc9600 viz: split draw loop (#15676)
* split draw loop

* one draw

* no functions

* inline all highlights

* cleanup
2026-04-10 23:25:50 +09:00
chenyu
e1334d3852 move canonicalize_device to device.py (#15675) 2026-04-10 09:43:56 -04:00
chenyu
8e7fcc8ca3 remove _include_initial in _cumalu (#15674)
handle negative pad in caller
2026-04-10 08:33:30 -04:00
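Assuming `_include_initial` distinguished inclusive from exclusive scans (the usual meaning of that flag), the caller-side padding the commit mentions can be sketched generically in numpy:

```python
import numpy as np

x = np.array([1, 2, 3, 4])
inclusive = np.cumsum(x)                           # [1, 3, 6, 10]
# Exclusive scan via the caller: shift the inclusive result right,
# seeding with 0, instead of a special flag inside the scan itself.
exclusive = np.concatenate(([0], inclusive[:-1]))  # [0, 1, 3, 6]
assert (exclusive + x == inclusive).all()
```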
George Hotz
9092f2a8c0 llm: add shared_expert and rope_dim support from qwen35 (#15673)
* llm: add shared_expert and rope_dim support from qwen35

* refactor into FFNBlock and TransformerBlock

* norms where they belong
2026-04-10 19:18:27 +08:00