12897 Commits

Author SHA1 Message Date
chenyu
4cf2759fc8 fix merge_reduce_ends (#15659)
* fix merge_reduce_ends

same range with different nesting should not merge, like cumsum twice should not merge

* skip that
2026-04-08 17:20:01 -04:00
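The reason "cumsum twice should not merge" can be seen with plain Python arithmetic. This is only an illustration of the numbers involved, not tinygrad's merge_reduce_ends rewrite: two running sums nested over the same range produce a different result than a single running sum. That single sum is what you would get if the nesting were ignored and the two reductions collapsed into one.

```python
from itertools import accumulate

def cumsum(xs):
    # running (prefix) sum: out[i] = xs[0] + ... + xs[i]
    return list(accumulate(xs))

x = [1, 2, 3, 4]
nested = cumsum(cumsum(x))  # two reductions over the same range, one nested inside the other
collapsed = cumsum(x)       # what a single, merged reduction would compute

print(nested)     # [1, 4, 10, 20]
print(collapsed)  # [1, 3, 6, 10]
```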
chenyu
cb681da840 move UOp.pad to mixin (#15657)
the same arg works for Tensor.pad
2026-04-08 13:15:19 -04:00
nimlgen
28b14b0e38 mlx: remove to_be, use helpers (#15655) 2026-04-08 20:07:28 +03:00
nimlgen
1b44cb2ac6 split update stat from execitem (#15654) 2026-04-08 20:07:12 +03:00
qazal
71c83cc3f6 viz: put OTHER_ on the wave row (#15650)
* viz: put OTHER_ on the wave row

* update tests

* cleanup cli
2026-04-08 23:13:44 +09:00
chenyu
839d37b7bc update median_step_time in model_train.py (#15649)
BENCHMARK=5 used to pick the 4th largest, not the middle one
2026-04-08 09:53:59 -04:00
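With BENCHMARK=5 there are five timed steps. The median is the element at index len // 2 of the sorted timings, which is the 3rd of 5. The 4th largest is index 1 in ascending order instead. Here is a minimal sketch of the difference. `median_of` is a hypothetical helper, not the code in model_train.py, and the timings are made up.

```python
def median_of(step_times):
    # hypothetical helper: middle element of the sorted timings
    # (for an even count this returns the upper of the two middle values)
    ordered = sorted(step_times)
    return ordered[len(ordered) // 2]

times = [0.52, 0.48, 0.50, 0.61, 0.49]  # made-up timings for five steps

print(median_of(times))               # 0.5  (the middle value)
print(sorted(times, reverse=True)[3])  # 0.49 (the 4th largest)
```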
chenyu
dae9dea903 clean up tensor random functions (#15648)
* clean up tensor random functions

* revert that
2026-04-08 09:44:37 -04:00
George Hotz
1ebeb52e59 RDNA4 asm gemm (#15427)
* sqtt: rdna4 decoder work

* diff cleanup

* more diff

* test

* 125

* r4

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2026-04-08 21:26:44 +08:00
nimlgen
b1e52ba0c2 the slowest line in hcq graph (#15635)
* the slowest line in hcq graph

* x
2026-04-08 15:53:52 +03:00
qazal
3ac16b3bea viz: add wmma row, update exec duration logic (#15646)
* viz: split wmma to its own row, fix duration logic

* regs

* decrease number of loops, add pickle

* assert overlaps
2026-04-08 20:24:23 +09:00
George Hotz
35e3983840 Add Q5_0, Q5_1, and bfloat16 GGUF types (#15644) 2026-04-08 17:16:19 +08:00
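bfloat16 is the upper 16 bits of an IEEE-754 float32: the sign, 8 exponent bits and 7 mantissa bits. Decoding a raw bfloat16 value therefore only requires shifting it into the high half of a float32. The following standard-library sketch is not tinygrad's GGUF loader. It assumes little-endian tensor data.

```python
import struct

def bf16_to_float(raw: bytes) -> list[float]:
    # each bfloat16 occupies 2 bytes; read them as little-endian uint16
    n = len(raw) // 2
    bits = struct.unpack(f"<{n}H", raw[: 2 * n])
    # shift the 16 bits into the high half of a float32 and reinterpret the bit pattern
    return [struct.unpack("<f", struct.pack("<I", b << 16))[0] for b in bits]

# 0x3F80 -> 1.0, 0xC000 -> -2.0, 0x3FC0 -> 1.5
data = struct.pack("<3H", 0x3F80, 0xC000, 0x3FC0)
print(bf16_to_float(data))  # [1.0, -2.0, 1.5]
```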
qazal
39a029ec55 remove ASM_GEMM context var (#15645) 2026-04-08 18:02:40 +09:00
qazal
dc6a51e44d viz: add # of bytes to sdma (#15639)
* viz: add # of bytes to sdma

* update test_viz
2026-04-08 17:43:37 +09:00
wozeparrot
70dbd35023 llama: move custom_kernel into flat_llama (#15643) 2026-04-08 00:19:14 -07:00
Christopher Milan
bcf6931a4f fix: comma 4 does not have pcie (#15642) 2026-04-07 23:57:03 -04:00
George Hotz
f930579b7a llm: change the default port to 8000 so you can remember it (match vLLM) 2026-04-08 11:25:38 +08:00
b1tg
bf3763526a llm: buffer SSE chunks to fix parse errors from split reads (#15641) 2026-04-08 10:26:23 +08:00
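Server-sent events are terminated by a blank line, but a single network read can end in the middle of an event. Parsing each read on its own then fails. The sketch below shows the buffering idea: accumulate text and parse only complete events. `SSEBuffer` is a hypothetical class, not the apps.llm code, and it assumes "\n" line endings.

```python
class SSEBuffer:
    """Hypothetical helper: buffers raw text and returns only complete SSE events' data."""

    def __init__(self) -> None:
        self.buf = ""

    def feed(self, chunk: str) -> list[str]:
        self.buf += chunk
        events = []
        # an event ends with a blank line; a trailing partial event stays buffered
        while "\n\n" in self.buf:
            event, self.buf = self.buf.split("\n\n", 1)
            data = []
            for line in event.split("\n"):
                if line.startswith("data:"):
                    value = line[5:]
                    # the SSE format drops a single leading space after the colon
                    data.append(value[1:] if value.startswith(" ") else value)
            if data:
                events.append("\n".join(data))
        return events

sse = SSEBuffer()
print(sse.feed('data: {"tok": "Hel'))        # [] (incomplete event kept in the buffer)
print(sse.feed('lo"}\n\ndata: [DONE]\n\n'))  # ['{"tok": "Hello"}', '[DONE]']
```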
qazal
a508b8fd2a viz: delete redundant things (#15637)
* delete that

* remove

* delete graph config
2026-04-08 07:18:04 +09:00
chenyu
9c6e925b56 move lerp to mixin (#15634)
last function of math function section
2026-04-07 15:13:00 -04:00
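Several commits in this range move methods (pad, lerp, activations, shifts) into a mixin. Presumably the goal is that Tensor and UOp share one definition. Below is a minimal sketch of the pattern. It assumes the mixin needs only arithmetic operators from the host class. The class names are hypothetical. lerp follows the usual definition, self + (end - self) * weight, as in torch.lerp.

```python
class MathMixin:
    # relies only on +, - and * being defined by the class it is mixed into
    def lerp(self, end, weight):
        return self + (end - self) * weight

class Scalar(MathMixin):
    # hypothetical host class standing in for Tensor or UOp
    def __init__(self, v):
        self.v = v

    def _val(self, o):
        return o.v if isinstance(o, Scalar) else o

    def __add__(self, o):
        return Scalar(self.v + self._val(o))

    def __sub__(self, o):
        return Scalar(self.v - self._val(o))

    def __mul__(self, o):
        return Scalar(self.v * self._val(o))

print(Scalar(2.0).lerp(Scalar(10.0), 0.25).v)  # 4.0
```

With this pattern, lerp is written once and any class that supplies the operators inherits it.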
qazal
890286e8d6 update llama profile.sh (#15633)
* update llama profile.sh

* BENCHMARK 5
2026-04-08 03:18:45 +09:00
nimlgen
b78b384d58 mlx: graph (#15621)
* Dx

* Dx

* simpler

* mypy

* x

* f

* Dx

* x

* c

* x
2026-04-07 19:43:51 +03:00
qazal
d29f0ef721 viz: speed up profiler first render (#15632)
* viz: speed up profiler first render

* better comment
2026-04-07 23:07:09 +09:00
George Hotz
d3de63d998 improvements to apps.llm (#15631) 2026-04-07 20:34:05 +08:00
George Hotz
2b01ca59dd USB driver for custom ASM firmware (#15597)
* USB driver for custom ASM firmware

* timeout

* fix mypy

* pcie mem read

* flip in f/w

* one tx

* little endian

* autodetect custom

* mock bypass

* lint

* clean
2026-04-07 13:45:41 +08:00
wozeparrot
810d7c00cd llama: unify scripts (#15628) 2026-04-06 20:28:08 -07:00
Christopher Milan
19e96497ee interface in DEV (#15620) 2026-04-06 19:59:28 -04:00
qazal
8ba58304f7 viz: reenable tests (#15626) 2026-04-07 07:52:44 +09:00
chenyu
2f7d085450 shared _normalize_indices for getitem (#15625)
* shared _normalize_indices for getitem

* list
2026-04-06 17:45:36 -04:00
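The commit does not show `_normalize_indices` itself. The sketch below shows what index normalization for getitem commonly involves: negative indices wrap around by the dimension size, and out-of-range indices are rejected. `normalize_index` is a hypothetical helper, not tinygrad's function.

```python
def normalize_index(i: int, size: int) -> int:
    # hypothetical helper: map a possibly negative index into [0, size)
    if not -size <= i < size:
        raise IndexError(f"index {i} is out of bounds for dimension of size {size}")
    return i + size if i < 0 else i

print([normalize_index(i, 5) for i in (0, 4, -1, -5)])  # [0, 4, 4, 0]
```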
chenyu
66ec188d50 more activations to mixin (#15624) 2026-04-06 15:41:41 -04:00
chenyu
1483f7e71c support shift by Tensor (#15623)
* support shift by Tensor

* use mixin
2026-04-06 15:14:57 -04:00
chenyu
6e30a5f5ea update shifts in torch backend (#15622) 2026-04-06 14:08:33 -04:00
chenyu
a444be172d lower fuzz_symbolic_symbolic_div timeout (#15619)
mitigate timeout crash due to high total time
2026-04-06 12:58:29 -04:00
chenyu
01b49c8647 support int operand for shifts (#15618)
matches torch/jax, also symbolic rule to remove mask
2026-04-06 12:32:12 -04:00
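For integers, a left shift by n equals multiplication by 2**n. An arithmetic right shift equals floor division by 2**n. This is why a shift by an int operand can be expressed with ordinary arithmetic. The plain-Python check below does not reflect tinygrad's lowering, and it ignores fixed-width overflow, which a real dtype would wrap.

```python
def lshift(x: int, n: int) -> int:
    # x << n == x * 2**n for n >= 0
    return x * 2**n

def rshift(x: int, n: int) -> int:
    # Python's >> is an arithmetic shift, i.e. floor division by 2**n, also for negative x
    return x // 2**n

for x in (13, -13, 0, 255):
    for n in range(5):
        assert lshift(x, n) == x << n
        assert rshift(x, n) == x >> n

print(lshift(13, 3), rshift(-13, 2))  # 104 -4
```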
nimlgen
e2700475cf mlx: cleaner (#15617)
* mlx: cleaner

* x
2026-04-06 17:49:47 +03:00
Valtteri Valo
86c4431d74 add gpu_family detection to Metal, target MSL 4.0 on macOS 26+ (#15079)
use supportsFamily API to detect GPU generation instead of parsing
ICB debug description strings. also adds metal4.0 compiler target.
2026-04-06 06:51:38 +08:00
13Perrius
ff0c941548 remove redundant iteration and toposort in _deepwalk (#15532) 2026-04-06 06:38:45 +08:00
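One way to see why a separate toposort can be redundant: a post-order depth-first walk emits every node after everything it points to, so its output is already in topological order. Here is a generic iterative sketch over a plain dict graph. It is hypothetical, not tinygrad's `_deepwalk`, and it assumes the graph is acyclic.

```python
def deepwalk(root, children):
    # iterative post-order DFS over a DAG: a node is appended only after all of its
    # children, so the returned list is topologically ordered (dependencies first)
    order, visited = [], set()
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            order.append(node)
            continue
        if node in visited:
            continue
        visited.add(node)
        stack.append((node, True))
        for child in children.get(node, ()):
            if child not in visited:
                stack.append((child, False))
    return order

graph = {"out": ["add"], "add": ["a", "b"], "b": ["a"]}
print(deepwalk("out", graph))  # ['a', 'b', 'add', 'out']
```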
Andrew Cappelli
e39cfe685a validate lr, momentum, weight_decay in optimizers (#15576) 2026-04-06 06:37:34 +08:00
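Below is a minimal sketch of this kind of hyperparameter check, modeled loosely on the checks PyTorch optimizers perform. The function name, the plain-float arguments and the accepted ranges are assumptions, not tinygrad's code.

```python
def validate_hparams(lr: float, momentum: float = 0.0, weight_decay: float = 0.0) -> None:
    # hypothetical check: reject values that would silently misbehave during training
    if lr < 0.0:
        raise ValueError(f"invalid learning rate: {lr}")
    if momentum < 0.0:
        raise ValueError(f"invalid momentum value: {momentum}")
    if weight_decay < 0.0:
        raise ValueError(f"invalid weight_decay value: {weight_decay}")

validate_hparams(lr=1e-3, momentum=0.9)  # passes silently
try:
    validate_hparams(lr=-0.1)
except ValueError as e:
    print(e)  # invalid learning rate: -0.1
```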
nimlgen
6a334ceb27 hotfix: fix bert (#15613) 2026-04-05 23:41:21 +03:00
nimlgen
e3986a6b74 mlx: init runtime (#15612)
* mlx: init

* x

* swap
2026-04-05 22:52:29 +03:00
nimlgen
e0988dbae5 hcq: support non for signal_t and compute_t (#15611)
* hcq: support non for signal_t and compute_t

* revert

* x
2026-04-05 18:56:47 +03:00
nimlgen
5e134aa087 hcq: add write/poll_bit commands (#15610)
* hcq: add write/poll_bit commands

* x
2026-04-05 18:09:44 +03:00
nimlgen
604cdbf2f7 am: large allocs aligned to 2mb to use 2mb pages (#15609) 2026-04-05 18:01:31 +03:00
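Aligning to 2 MB is a power-of-two alignment, so rounding an address or size up can be done with a mask. The sketch below assumes a hypothetical rule: allocations of at least 2 MiB get 2 MiB alignment and smaller ones keep 4 KiB. The commit does not state the actual threshold or fallback.

```python
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024

def round_up(value: int, align: int) -> int:
    # round value up to a multiple of align (align must be a power of two)
    return (value + align - 1) & ~(align - 1)

def alloc_alignment(size: int) -> int:
    # hypothetical policy: large allocations are aligned so they can be backed by 2 MiB pages
    return PAGE_2M if size >= PAGE_2M else PAGE_4K

va = round_up(0x1234_5000, alloc_alignment(8 * 1024 * 1024))
print(hex(va))  # 0x12400000
```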
qazal
b2d5b29f45 assembly/amd: validate dsl keyword args (#15608)
* assembly/amd: validate dsl keyword args

* hm, this should use the SOP2 s_waits

* use the sop2 s_waits
2026-04-05 23:00:24 +09:00
qazal
056fcd7758 viz: web work from rdna4 gemm (#15607)
* add rdna4 barrier

* fix realtime
2026-04-05 19:14:16 +09:00
wozeparrot
7e54992bf6 fp8 llama (#15588)
Co-authored-by: qazal <qazal.software@gmail.com>
2026-04-04 18:24:57 -07:00
qazal
4d36366717 assembly/amd: match rdna4 hw gidx init in emulator (#15604)
* simple rdna4 copy kernel with hw fault

* the trivial fix: use ttmp instead of s

* now copy kernel fails in mockgpu

* rm crashing kernel
2026-04-05 02:28:18 +09:00
chenyu
2ba5a6ddc8 remove detach in selu (#15602)
UOp does not have detach. this does not change behavior
2026-04-04 11:04:29 -04:00
qazal
f7aed180e4 viz/cli: add Other row in profiler (#15600) 2026-04-04 22:40:53 +09:00
Christopher Milan
74ecf6d3e6 opaque structs are also c.Struct (#15596) 2026-04-03 19:40:43 -04:00
Christopher Milan
645d45d968 DEV has arch (#15577)
Co-authored-by: Comma Device <device@comma.ai>
2026-04-03 19:17:19 -04:00