chenyu
|
f7ff480fa6
|
start mixin getitem tests (#15695)
goal is to make Tensor[idx].uop equal to Tensor.uop[idx]
|
2026-04-12 18:54:33 -04:00 |
|
chenyu
|
e706f408cb
|
suppress test warnings from numpy (#15688)
|
2026-04-11 22:33:20 -04:00 |
|
Graham Robbins
|
4ca844e96b
|
add Q1_0 gguf type (#15683)
* add Q1_0
* better description
* fix trailing whitespace
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
|
2026-04-11 18:17:24 +08:00 |
|
wozeparrot
|
457508d5a0
|
llama: save more 2 (#15681)
|
2026-04-11 01:03:36 -07:00 |
|
George Hotz
|
b5a9465b13
|
llm: add support for moonlight (deepseek MLA) (#15466)
* add gguf Q5_0
* it works
* rebase
* simpler test
* class
* less diff
* dicts
* normal names
* simplify
* this
* simpler
* work
* work
|
2026-04-11 10:32:48 +08:00 |
|
chenyu
|
8e7fcc8ca3
|
remove _include_initial in _cumalu (#15674)
handle negative pad in caller
|
2026-04-10 08:33:30 -04:00 |
|
George Hotz
|
9092f2a8c0
|
llm: add shared_expert and rope_dim support from qwen35 (#15673)
* llm: add shared_expert and rope_dim support from qwen35
* refactor into FFNBlock and TransformerBlock
* norms where they belong
|
2026-04-10 19:18:27 +08:00 |
|
b1tg
|
9ab1415937
|
llm: fix streaming UTF-8 decode (#15653)
|
2026-04-10 17:01:02 +08:00 |
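A multi-byte UTF-8 character can be split across two streamed chunks, so decoding each chunk on its own with `bytes.decode` either raises or emits replacement characters. Below is a minimal sketch of the general technique using the standard library's incremental decoder. It is not necessarily how #15653 implemented the fix, and `stream_decode` is a hypothetical helper.

```python
import codecs
from typing import Iterable, Iterator

def stream_decode(chunks: Iterable[bytes]) -> Iterator[str]:
    # The incremental decoder holds back a trailing partial multi-byte
    # sequence until the rest of its bytes arrive in a later chunk.
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # final=True flushes the buffer and raises if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail

data = "héllo 🙂".encode("utf-8")
# These boundaries split both "é" and the emoji across chunks.
chunks = [data[:2], data[2:8], data[8:]]
print("".join(stream_decode(chunks)))  # héllo 🙂
```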
|
Christopher Milan
|
dbc23e8a1b
|
move HCQ_VISIBLE_DEVICES into DEV (#15668)
|
2026-04-09 22:01:35 -04:00 |
|
Christopher Milan
|
d08c76d9cb
|
c.Struct cleanup (#15640)
|
2026-04-08 20:07:16 -04:00 |
|
chenyu
|
4cf2759fc8
|
fix merge_reduce_ends (#15659)
* fix merge_reduce_ends
same range with different nesting should not merge, like cumsum twice should not merge
* skip that
|
2026-04-08 17:20:01 -04:00 |
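The "cumsum twice" case can be seen on plain lists: a cumulative sum applied twice is a different computation from a single cumulative sum, even though both scans run over the same index range. This pure-Python illustration only shows why the nesting matters. It does not model tinygrad's UOp ranges or the `merge_reduce_ends` rewrite itself.

```python
from itertools import accumulate

def cumsum(xs):
    # Each output element is a reduction over the prefix xs[0..i].
    return list(accumulate(xs))

x = [1, 1, 1, 1]
once = cumsum(x)           # one scan over the range
twice = cumsum(cumsum(x))  # the outer scan over the same range consumes the inner scan's output
print(once, twice)  # [1, 2, 3, 4] [1, 3, 6, 10]
```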
|
qazal
|
71c83cc3f6
|
viz: put OTHER_ on the wave row (#15650)
* viz: put OTHER_ on the wave row
* update tests
* cleanup cli
|
2026-04-08 23:13:44 +09:00 |
|
qazal
|
3ac16b3bea
|
viz: add wmma row, update exec duration logic (#15646)
* viz: split wmma to its own row, fix duration logic
* regs
* decrease number of loops, add pickle
* assert overlaps
|
2026-04-08 20:24:23 +09:00 |
|
George Hotz
|
35e3983840
|
Add Q5_0, Q5_1, and bfloat16 GGUF types (#15644)
|
2026-04-08 17:16:19 +08:00 |
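For reference, here is a sketch of dequantizing one Q5_0 block and one bfloat16 value. It assumes ggml's reference block layout: 32 elements stored as an fp16 scale `d`, 4 bytes `qh` of packed fifth bits, and 16 bytes `qs` of packed low nibbles, with each value computed as `(q - 16) * d`. The sketch was written from that layout and is not taken from tinygrad's GGUF loader. The helper names are hypothetical.

```python
import struct

QK5_0 = 32                     # elements per Q5_0 block
Q5_0_BLOCK_BYTES = 2 + 4 + 16  # fp16 d + uint32 qh + 16 bytes of nibbles

def dequant_q5_0_block(block: bytes) -> list[float]:
    assert len(block) == Q5_0_BLOCK_BYTES
    d = struct.unpack_from("<e", block, 0)[0]   # fp16 scale
    qh = struct.unpack_from("<I", block, 2)[0]  # 32 packed fifth bits
    qs = block[6:]
    out = [0.0] * QK5_0
    for j in range(QK5_0 // 2):
        hi0 = ((qh >> j) & 1) << 4         # fifth bit for element j
        hi1 = ((qh >> (j + 16)) & 1) << 4  # fifth bit for element j + 16
        x0 = ((qs[j] & 0x0F) | hi0) - 16   # low nibble -> element j
        x1 = ((qs[j] >> 4) | hi1) - 16     # high nibble -> element j + 16
        out[j] = x0 * d
        out[j + QK5_0 // 2] = x1 * d
    return out

def bf16_to_float(raw: int) -> float:
    # bfloat16 is the upper 16 bits of an IEEE-754 float32.
    return struct.unpack("<f", struct.pack("<I", raw << 16))[0]

block = struct.pack("<eI", 0.5, 0) + bytes([0x10] * 16)
vals = dequant_q5_0_block(block)
print(vals[0], vals[16])      # -8.0 -7.5
print(bf16_to_float(0x3F80))  # 1.0
```

Under the same ggml layout, Q5_1 adds an fp16 offset `m` after `d`, drops the `- 16`, and computes `q * d + m`.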
|
qazal
|
39a029ec55
|
remove ASM_GEMM context var (#15645)
|
2026-04-08 18:02:40 +09:00 |
|
qazal
|
dc6a51e44d
|
viz: add # of bytes to sdma (#15639)
* viz: add # of bytes to sdma
* update test_viz
|
2026-04-08 17:43:37 +09:00 |
|
wozeparrot
|
70dbd35023
|
llama: move custom_kernel into flat_llama (#15643)
|
2026-04-08 00:19:14 -07:00 |
|
George Hotz
|
f930579b7a
|
llm: change the default port to 8000 so you can remember it (match vLLM)
|
2026-04-08 11:25:38 +08:00 |
|
George Hotz
|
2b01ca59dd
|
USB driver for custom ASM firmware (#15597)
* USB driver for custom ASM firmware
* timeout
* fix mypy
* pcie mem read
* flip in f/w
* one tx
* little endian
* autodetect custom
* mock bypass
* lint
* clean
|
2026-04-07 13:45:41 +08:00 |
|
Christopher Milan
|
19e96497ee
|
interface in DEV (#15620)
|
2026-04-06 19:59:28 -04:00 |
|
qazal
|
8ba58304f7
|
viz: reenable tests (#15626)
|
2026-04-07 07:52:44 +09:00 |
|
chenyu
|
2f7d085450
|
shared _normalize_indices for getitem (#15625)
* shared _normalize_indices for getitem
* list
|
2026-04-06 17:45:36 -04:00 |
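A getitem implementation usually puts every index into a canonical form before it builds the view. Below is a minimal sketch of that normalization against a known shape, using a hypothetical `normalize_indices` helper. It is not tinygrad's `_normalize_indices`, whose signature is not shown here. Negative ints and list entries are bounds-checked and wrapped, slices are resolved with the built-in `slice.indices`, and any dimensions left unindexed get a full slice.

```python
def _wrap(i: int, size: int) -> int:
    # Bounds-check an integer index and map negatives into [0, size).
    if not -size <= i < size:
        raise IndexError(f"index {i} out of bounds for size {size}")
    return i % size

def normalize_indices(indices, shape):
    # Hypothetical helper: canonicalize a getitem index against `shape`.
    if not isinstance(indices, tuple):
        indices = (indices,)
    if len(indices) > len(shape):
        raise IndexError(f"too many indices: {len(indices)} for {len(shape)} dims")
    # Dimensions that were not indexed get a full slice.
    indices = indices + (slice(None),) * (len(shape) - len(indices))
    out = []
    for idx, size in zip(indices, shape):
        if isinstance(idx, int):
            out.append(_wrap(idx, size))
        elif isinstance(idx, slice):
            # slice.indices resolves None and negative bounds for this size.
            out.append(slice(*idx.indices(size)))
        elif isinstance(idx, list):
            out.append([_wrap(i, size) for i in idx])
        else:
            raise TypeError(f"unsupported index type {type(idx).__name__}")
    return tuple(out)

print(normalize_indices((-1, slice(None, -1)), (4, 5, 6)))
# (3, slice(0, 4, 1), slice(0, 6, 1))
```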
|
chenyu
|
a444be172d
|
lower fuzz_symbolic_symbolic_div timeout (#15619)
mitigate timeout crash due to high total time
|
2026-04-06 12:58:29 -04:00 |
|
chenyu
|
01b49c8647
|
support int operand for shifts (#15618)
matches torch/jax, also symbolic rule to remove mask
|
2026-04-06 12:32:12 -04:00 |
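For integers, a shift by a plain int amount is the same as multiplying or floor-dividing by a power of two. That equivalence is one reason a constant shift is easy to reason about symbolically. This pure-Python check of the identity covers only non-negative values. It says nothing about how tinygrad lowers shifts or which mask the new symbolic rule removes.

```python
def shl(x: int, k: int) -> int:
    # A left shift by a constant int is multiplication by 2**k.
    return x * (1 << k)

def shr(x: int, k: int) -> int:
    # For non-negative x, a right shift is floor division by 2**k.
    assert x >= 0, "this sketch only covers non-negative values"
    return x // (1 << k)

for x in range(256):
    for k in range(8):
        assert shl(x, k) == x << k
        assert shr(x, k) == x >> k
print(shl(5, 3), shr(40, 3))  # 40 5
```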
|
Valtteri Valo
|
86c4431d74
|
add gpu_family detection to Metal, target MSL 4.0 on macOS 26+ (#15079)
use supportsFamily API to detect GPU generation instead of parsing
ICB debug description strings. also adds metal4.0 compiler target.
|
2026-04-06 06:51:38 +08:00 |
|
Andrew Cappelli
|
e39cfe685a
|
validate lr, momentum, weight_decay in optimizers (#15576)
|
2026-04-06 06:37:34 +08:00 |
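This kind of validation typically rejects out-of-range hyperparameters when the optimizer is constructed, so a bad value fails immediately instead of surfacing later inside a training step. Below is a minimal sketch with a hypothetical `validate_hparams` helper, written in the style of PyTorch's optimizer checks. The exact checks and messages in tinygrad's optimizers may differ.

```python
def validate_hparams(lr: float, momentum: float = 0.0, weight_decay: float = 0.0) -> None:
    # Hypothetical checks: reject negative values for each hyperparameter.
    if lr < 0.0:
        raise ValueError(f"Invalid learning rate: {lr}")
    if momentum < 0.0:
        raise ValueError(f"Invalid momentum value: {momentum}")
    if weight_decay < 0.0:
        raise ValueError(f"Invalid weight_decay value: {weight_decay}")

validate_hparams(lr=1e-3, momentum=0.9, weight_decay=1e-4)  # passes silently
try:
    validate_hparams(lr=-0.1)
except ValueError as e:
    print(e)  # Invalid learning rate: -0.1
```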
|
nimlgen
|
e3986a6b74
|
mlx: init runtime (#15612)
* mlx: init
* x
* swap
|
2026-04-05 22:52:29 +03:00 |
|
nimlgen
|
5e134aa087
|
hcq: add write/poll_bit commands (#15610)
* hcq: add write/poll_bit commands
* x
|
2026-04-05 18:09:44 +03:00 |
|
nimlgen
|
604cdbf2f7
|
am: large allocs aligned to 2mb to use 2mb pages (#15609)
|
2026-04-05 18:01:31 +03:00 |
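Aligning a large allocation to 2 MiB boundaries lets it be mapped with 2 MiB pages instead of many 4 KiB ones. Below is a minimal sketch of the rounding arithmetic. The `LARGE_ALLOC_THRESHOLD` cutoff and the helper names are hypothetical and are not the values or functions used in the am driver.

```python
PAGE_4K = 4 << 10
PAGE_2M = 2 << 20
LARGE_ALLOC_THRESHOLD = PAGE_2M  # hypothetical cutoff for a "large" allocation

def round_up(x: int, align: int) -> int:
    # `align` must be a power of two.
    return (x + align - 1) & ~(align - 1)

def alloc_size_and_page(size: int) -> tuple[int, int]:
    # Large allocations are padded to a whole number of 2 MiB pages.
    page = PAGE_2M if size >= LARGE_ALLOC_THRESHOLD else PAGE_4K
    return round_up(size, page), page

print(alloc_size_and_page(5 << 20))  # (6291456, 2097152): three 2 MiB pages
print(alloc_size_and_page(10000))    # (12288, 4096): three 4 KiB pages
```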
|
qazal
|
b2d5b29f45
|
assembly/amd: validate dsl keyword args (#15608)
* assembly/amd: validate dsl keyword args
* hm, this should use the SOP2 s_waits
* use the sop2 s_waits
|
2026-04-05 23:00:24 +09:00 |
|
wozeparrot
|
7e54992bf6
|
fp8 llama (#15588)
Co-authored-by: qazal <qazal.software@gmail.com>
|
2026-04-04 18:24:57 -07:00 |
|
qazal
|
4d36366717
|
assembly/amd: match rdna4 hw gidx init in emulator (#15604)
* simple rdna4 copy kernel with hw fault
* the trivial fix: use ttmp instead of s
* now copy kernel fails in mockgpu
* rm crashing kernel
|
2026-04-05 02:28:18 +09:00 |
|
Christopher Milan
|
645d45d968
|
DEV has arch (#15577)
Co-authored-by: Comma Device <device@comma.ai>
|
2026-04-03 19:17:19 -04:00 |
|
nimlgen
|
902edc3781
|
hcq: hcqbuf in copy (#15595)
|
2026-04-03 22:47:36 +03:00 |
|
chenyu
|
8fdef2d3e4
|
mean/std/var to mixin (#15593)
|
2026-04-03 10:42:41 -04:00 |
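For reference, these are the relationships such a mixin builds on. `var` is the sum of squared deviations from the mean divided by `N - correction`, where `correction=1` gives the unbiased estimator. `std` is the square root of `var`. This pure-Python sketch assumes a torch-style `correction` argument and is not tinygrad's reduction code.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def var(xs, correction=1):
    # Sum of squared deviations from the mean, divided by (N - correction).
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - correction)

def std(xs, correction=1):
    return math.sqrt(var(xs, correction))

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(xs), var(xs, correction=0), std(xs, correction=0))  # 5.0 4.0 2.0
```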
|
Christopher Milan
|
0ed8d9271d
|
Renderers accept Target or nothing (#15590)
|
2026-04-03 01:09:41 -04:00 |
|
qazal
|
fefb0ebc2a
|
gemm/asm: fp8 cleanups (#15580)
* normal gemm here
* s/dtypes.fp8e4m3/FP8_DTYPE
* gemm_bw
* device UOp stays NULL
|
2026-04-02 19:02:38 +09:00 |
|
chenyu
|
1aa04eab08
|
simple CreationMixin (#15567)
start with full_like, zeros_like, ones_like
|
2026-04-01 23:00:56 -04:00 |
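A creation mixin keeps the `*_like` constructors in one place. `zeros_like` and `ones_like` are just `full_like` with a fixed fill value, so each class only has to supply the primitive. Below is a minimal sketch of that pattern with a hypothetical `full` hook and a toy list-backed class. It does not reproduce the signatures of tinygrad's actual `CreationMixin`.

```python
class CreationMixin:
    # Subclasses provide a `shape` attribute and a classmethod `full(shape, value)`.
    def full_like(self, value):
        return type(self).full(self.shape, value)

    def zeros_like(self):
        return self.full_like(0)

    def ones_like(self):
        return self.full_like(1)

class ToyArray(CreationMixin):
    def __init__(self, data, shape):
        self.data, self.shape = data, shape

    @classmethod
    def full(cls, shape, value):
        n = 1
        for s in shape:
            n *= s
        return cls([value] * n, shape)

a = ToyArray([1, 2, 3, 4, 5, 6], (2, 3))
print(a.zeros_like().data, a.ones_like().shape)  # [0, 0, 0, 0, 0, 0] (2, 3)
```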
|
Christopher Milan
|
6c67bd4c14
|
better error message when invalid renderer is specified (#15573)
|
2026-04-01 17:12:55 -04:00 |
|
Christopher Milan
|
0d6fbc2355
|
remove flaky and redundant image test (#15574)
|
2026-04-01 16:33:13 -04:00 |
|
b1tg
|
20497f2840
|
fold BIND to CONST when min==max (#15568)
|
2026-04-01 11:19:04 -04:00 |
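When a symbolic variable's bounds collapse to a single value (min == max), any binding of that variable can only be that value. The bound variable can therefore be replaced by a plain constant. Below is a minimal sketch with hypothetical `Var`/`Bind`/`Const` classes. It only illustrates the rewrite and does not use tinygrad's UOp API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str
    vmin: int
    vmax: int

@dataclass(frozen=True)
class Bind:
    var: Var
    val: int

@dataclass(frozen=True)
class Const:
    val: int

def fold_bind(node):
    # Only a variable whose range is a single point can be folded safely.
    if isinstance(node, Bind) and node.var.vmin == node.var.vmax:
        return Const(node.var.vmin)
    return node

print(fold_bind(Bind(Var("n", 4, 4), 4)))  # Const(val=4)
print(fold_bind(Bind(Var("n", 1, 8), 4)))  # the Bind is returned unchanged
```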
|
chenyu
|
f5c0794df2
|
fix Tensor.const_like (#15565)
used to always return a 0-d tensor, now returns an expanded Tensor based on self.shape and matches UOp
|
2026-04-01 08:35:19 -04:00 |
|
chenyu
|
fc5b94b902
|
fix UOp.where(const, const) (#15560)
* fix UOp.where(const, const)
* fix
|
2026-04-01 05:28:49 -04:00 |
|
Christopher Milan
|
acf239e4d2
|
specify renderer in DEV, <dev>_<ren>=1 is deprecated (#15551)
|
2026-03-31 18:35:14 -04:00 |
|
nimlgen
|
5181c8e23a
|
llm: fix nan in kvcache (#15552)
|
2026-04-01 00:38:45 +03:00 |
|
qazal
|
a15345a53e
|
viz/cli: improve --help message (#15546)
* viz/cli: improve --help message
* not the default
* more work
* -s
* respect colored
|
2026-03-31 22:31:33 +09:00 |
|
chenyu
|
4ac2552642
|
improve ReduceMixin.all (#15544)
use prod instead of min since `mul` lowered to `and` directly
|
2026-03-31 07:54:27 -04:00 |
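On booleans, both `min` and `prod` compute a logical AND, so `all` can be written as a product. The commit prefers the product because a boolean `mul` lowers directly to `and`. The following is a pure-Python check of the equivalence, not tinygrad code.

```python
import math
from itertools import product

def all_via_min(xs):
    return bool(min(xs, default=True))

def all_via_prod(xs):
    # For 0/1 values multiplication is logical AND; the empty product is 1 (True).
    return bool(math.prod(xs))

for n in range(5):
    for bits in product([False, True], repeat=n):
        assert all_via_prod(bits) == all_via_min(bits) == all(bits)
print(all_via_prod([True, True, False]))  # False
```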
|
chenyu
|
89ec22131a
|
tests to show double negation in min is not cancelled (#15543)
|
2026-03-31 06:59:13 -04:00 |
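These tests assume that `min` is expressed through `max` as min(x) = -max(-x). Applying that identity to an input that is already negated produces a double negation, -(-x), which a simplifier would ideally cancel. Below is a pure-Python check of the identity. The double negation here is ordinary arithmetic, not a tinygrad graph.

```python
def min_via_max(xs):
    # min(x) == -max(-x)
    return -max(-x for x in xs)

xs = [3, -7, 2, 5]
assert min_via_max(xs) == min(xs) == -7

# The min of a negated input: the inner -(-x) is the double negation that could cancel.
neg = [-x for x in xs]
assert min_via_max(neg) == -max(-(-x) for x in xs) == -max(xs) == min(neg)
print(min_via_max(neg))  # -5
```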
|
qazal
|
8feb8edc68
|
gemm/asm: add fp8 support to cdna asm_gemm (#15542)
* work
* hmm, mixins
* rhs_transposed
* also fix the dtype
* check for hipcc
* Exception
* select dev
* default
|
2026-03-31 19:32:54 +09:00 |
|
qazal
|
467c0af8aa
|
viz: skip flaky server tests (#15538)
|
2026-03-31 17:20:30 +09:00 |
|