George Hotz
|
5683126844
|
llm: support for tekken tokenizer (#15720)
|
2026-04-14 10:52:07 +08:00 |
|
chenyu
|
70883a6950
|
cat the stack to mixin (#15715)
|
2026-04-13 18:44:39 -04:00 |
|
qazal
|
355e2729d3
|
viz: keep program UOp in data (#15714)
* refactor program uop access
* c.name
|
2026-04-14 07:04:16 +09:00 |
|
qazal
|
905b8adc97
|
viz: cli and server cleanups (#15713)
* update get_profile arg[0]
* uop_to_json arg[0]
* data is standalone in cli
|
2026-04-14 06:42:29 +09:00 |
|
Christopher Milan
|
d83707ec29
|
autogen: explicit types (#15679)
|
2026-04-13 16:54:39 -04:00 |
|
chenyu
|
ac41f15fc1
|
cumsum to mixin (#15712)
built on top of getitem
|
2026-04-13 15:06:08 -04:00 |
|
nimlgen
|
eac481b67f
|
mlx: fix ctypes (#15711)
* mlx: fix ctypes
* x
|
2026-04-13 20:43:56 +03:00 |
|
nimlgen
|
b370f5c5ac
|
hcq: call free for unmap (#15710)
|
2026-04-13 20:30:21 +03:00 |
|
chenyu
|
931d6cc62a
|
basic getitem to mixin (#15697)
* basic getitem to mixin
* cleanup
* fix
* cleanup
|
2026-04-13 13:04:36 -04:00 |
|
George Hotz
|
7610bdc59e
|
block multistore, it's not supported (#15708)
|
2026-04-13 20:57:59 +08:00 |
|
George Hotz
|
84d64b5835
|
hotfix: abstractions4 works in mock except asm
|
2026-04-13 20:57:00 +08:00 |
|
George Hotz
|
16f50a40a5
|
remove REMU from tree (#15706)
* no more compare emulators
* remove remu from tree
|
2026-04-13 20:43:08 +08:00 |
|
qazal
|
ac027055ef
|
viz: no global state (#15705)
* start viz data
* get_full_rewrites also moves
* update ref_map
* work
* update consumers
* cleaner cli
* linter
* cleanup tests
* back
* better
* sqtt tests
|
2026-04-13 21:35:20 +09:00 |
|
George Hotz
|
4c1fb18a09
|
Revert "Revert "Tests for GatedDeltaNetBlock + fix multi after assign issue (…" (#15703)
This reverts commit 0cec42db71.
|
2026-04-13 19:09:38 +08:00 |
|
George Hotz
|
0cec42db71
|
Revert "Tests for GatedDeltaNetBlock + fix multi after assign issue (#15700)" (#15702)
This reverts commit 6f5d756282.
|
2026-04-13 19:06:44 +08:00 |
|
George Hotz
|
6f5d756282
|
Tests for GatedDeltaNetBlock + fix multi after assign issue (#15700)
* broken after/assign test
* test for GatedDeltaNet
* better comments
* fix issue 1 with multi kernel
* fix 2
* fix
* linter
* public api + cleanup
|
2026-04-13 18:43:23 +08:00 |
|
b1tg
|
2b5ba0095d
|
qwen3.5 (#15210)
* qwen3.5
* faster
* or
* rm zero hack
* less float
* T=1
* clean
* clean
* 4b
* rope_dim
* Revert "jit: captures linears, not execitems (#15399)"
This reverts commit 9656d97d97.
* DeltaNetBlock
* pairwise_topk
* clean
* Reapply "jit: captures linears, not execitems (#15399)"
This reverts commit cf3deff53d.
* clean topk, _swiglu
* common
* FFNBlock
* clean
* half
* no mix
* qwen3.5 test
* fix ssm cache invalidation
* TransformerConfig
* SSMConfig
* clean
* reset_state
* llm: reuse server conversation tokens to avoid BPE roundtrip cache miss
* import error
* prefill
* none check
* put it back
* clean pairwise_topk
* symbolic: fold BIND(CONST, CONST) to CONST
* clean
* simpler pm
* _cached_msg_count
* stream decoder; ssm checkpoints
* rm checkpoint
* attn_output_gate
* conflict, attn_output_gate
* clean, less has_ssm, assert
* chunked prefill
* _reset_cache
* _reusable_prefix_len
* revert loop
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
|
2026-04-13 15:35:24 +08:00 |
|
qazal
|
2ada38f777
|
viz: execv after all producers complete (#15696)
|
2026-04-13 08:15:47 +09:00 |
|
chenyu
|
f7ff480fa6
|
start mixin getitem tests (#15695)
goal is to make Tensor[idx].uop equal to Tensor.uop[idx]
|
2026-04-12 18:54:33 -04:00 |
|
chenyu
|
77385ccb37
|
more trivial stuff to mixin (#15693)
|
2026-04-12 15:17:16 -04:00 |
|
chenyu
|
ff1de5ae13
|
normalize logsumexp contiguous_backward to mixin (#15692)
* normalize logsumexp contiguous_backward to mixin
* more
|
2026-04-12 13:13:00 -04:00 |
|
chenyu
|
0254cfe642
|
move usum and uprod to mixin (#15690)
and used it to clean up ops and tensor
|
2026-04-12 11:42:24 -04:00 |
|
nimlgen
|
e9b2e156b4
|
add jitbeam to tinygpu docs (#15691)
|
2026-04-12 18:20:26 +03:00 |
|
chenyu
|
e706f408cb
|
suppress test warnings from numpy (#15688)
|
2026-04-11 22:33:20 -04:00 |
|
nimlgen
|
938cba4fdf
|
amd: a bit faster usb, skip interrupts on sync (#15686)
|
2026-04-11 17:26:36 +03:00 |
|
qazal
|
054d78e6ff
|
fix llama profile.sh NULL source (#15685)
|
2026-04-11 22:56:05 +09:00 |
|
Graham Robbins
|
4ca844e96b
|
add Q1_0 gguf type (#15683)
* add Q1_0
* better description
* fix trailing whitespace
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
|
2026-04-11 18:17:24 +08:00 |
|
George Hotz
|
5156a04cf5
|
add support for AM_POWER_LIMIT (#15684)
* add support for AM_POWER_LIMIT
* level None
|
2026-04-11 17:14:54 +08:00 |
|
wozeparrot
|
457508d5a0
|
llama: save more 2 (#15681)
|
2026-04-11 01:03:36 -07:00 |
|
George Hotz
|
29238b772f
|
AMD USB: support for 0xF3 power toggle
|
2026-04-11 13:04:38 +08:00 |
|
George Hotz
|
b5a9465b13
|
llm: add support for moonlight (deepseek MLA) (#15466)
* add gguf Q5_0
* it works
* rebase
* simpler test
* class
* less diff
* dicts
* normal names
* simplify
* this
* simpler
* work
* work
|
2026-04-11 10:32:48 +08:00 |
|
wozeparrot
|
590464c8d8
|
llama: only support wqkv path + cleanups (#15680)
* llama: only support wqkv path + cleanups
* llama: missing transpose
|
2026-04-11 07:39:27 +08:00 |
|
nimlgen
|
aa012d6f08
|
usb: faster custom (#15678)
* usb: _f0_out_buf for e4 cmd as well
* custom speed
* fast
|
2026-04-10 23:00:31 +03:00 |
|
nimlgen
|
58646f9569
|
usb fast copyout (#15677)
* usb
* fix usb
|
2026-04-10 21:04:49 +03:00 |
|
qazal
|
0d5cdc9600
|
viz: split draw loop (#15676)
* split draw loop
* one draw
* no functions
* inline all highlights
* cleanup
|
2026-04-10 23:25:50 +09:00 |
|
chenyu
|
e1334d3852
|
move canonicalize_device to device.py (#15675)
|
2026-04-10 09:43:56 -04:00 |
|
chenyu
|
8e7fcc8ca3
|
remove _include_initial in _cumalu (#15674)
handle negative pad in caller
|
2026-04-10 08:33:30 -04:00 |
|
George Hotz
|
9092f2a8c0
|
llm: add shared_expert and rope_dim support from qwen35 (#15673)
* llm: add shared_expert and rope_dim support from qwen35
* refactor into FFNBlock and TransformerBlock
* norms where they belong
|
2026-04-10 19:18:27 +08:00 |
|
b1tg
|
9ab1415937
|
llm: fix streaming UTF-8 decode (#15653)
|
2026-04-10 17:01:02 +08:00 |
|
wozeparrot
|
55bcd7cc9e
|
llama amax outside (#15670)
|
2026-04-09 23:08:03 -07:00 |
|
George Hotz
|
16f3448b26
|
Add HIP to abstractions4 (#15672)
* cleanup formatting
* add HIP option
* pass in correct
|
2026-04-10 14:05:52 +08:00 |
|
George Hotz
|
ed2a72bb23
|
work on abstractions4 (#15671)
* work on abstractions4
* works
* offset
* assembly works
* RAND
* cleanup
* work
|
2026-04-10 13:25:11 +08:00 |
|
Christopher Milan
|
dbc23e8a1b
|
move HCQ_VISIBLE_DEVICES into DEV (#15668)
|
2026-04-09 22:01:35 -04:00 |
|
George Hotz
|
fa02105546
|
hotfix: pin amd isa xml version
|
2026-04-10 06:47:00 +08:00 |
|
nimlgen
|
057dc173ab
|
beam uop (#15660)
* beam as uop
* x
|
2026-04-09 19:13:03 +03:00 |
|
nimlgen
|
0ff30b003d
|
am: reset queues from spi (#15664)
* am: reset queues from spi
* move
|
2026-04-09 18:25:50 +03:00 |
|
George Hotz
|
48a7627b04
|
add RDNA4 support to copy WMMA (#15663)
* add RDNA4 support to copy WMMA
* simpler
* simpler
* comment
* assert
|
2026-04-09 22:48:20 +08:00 |
|
chenyu
|
6837881b06
|
remove same_shape_noop [pr] (#15662)
no longer used
|
2026-04-09 09:50:26 -04:00 |
|
Christopher Milan
|
d08c76d9cb
|
c.Struct cleanup (#15640)
|
2026-04-08 20:07:16 -04:00 |
|
qazal
|
742b3894d7
|
viz/cli: add pmc printer (#15651)
* viz/cli: add pmc printer
* cli work
* s
* linter
* pack workgroups
* add : to wgp
* counter name
|
2026-04-09 08:50:54 +09:00 |
|