Christopher Milan
|
bcf6931a4f
|
fix: comma 4 does not have pcie (#15642)
|
2026-04-07 23:57:03 -04:00 |
|
George Hotz
|
f930579b7a
|
llm: change the default port to 8000 so you can remember it (match vLLM)
|
2026-04-08 11:25:38 +08:00 |
|
b1tg
|
bf3763526a
|
llm: buffer SSE chunks to fix parse errors from split reads (#15641)
|
2026-04-08 10:26:23 +08:00 |
|
qazal
|
a508b8fd2a
|
viz: delete redundant things (#15637)
* delete that
* remove
* delete graph config
|
2026-04-08 07:18:04 +09:00 |
|
chenyu
|
9c6e925b56
|
move lerp to mixin (#15634)
last function of math function section
|
2026-04-07 15:13:00 -04:00 |
|
qazal
|
890286e8d6
|
update llama profile.sh (#15633)
* update llama profile.sh
* BENCHMARK 5
|
2026-04-08 03:18:45 +09:00 |
|
nimlgen
|
b78b384d58
|
mlx: graph (#15621)
* Dx
* Dx
* simpler
* mypy
* x
* f
* Dx
* x
* c
* x
|
2026-04-07 19:43:51 +03:00 |
|
qazal
|
d29f0ef721
|
viz: speed up profiler first render (#15632)
* viz: speed up profiler first render
* better comment
|
2026-04-07 23:07:09 +09:00 |
|
George Hotz
|
d3de63d998
|
improvements to apps.llm (#15631)
|
2026-04-07 20:34:05 +08:00 |
|
George Hotz
|
2b01ca59dd
|
USB driver for custom ASM firmware (#15597)
* USB driver for custom ASM firmware
* timeout
* fix mypy
* pcie mem read
* flip in f/w
* one tx
* little endian
* autodetect custom
* mock bypass
* lint
* clean
|
2026-04-07 13:45:41 +08:00 |
|
wozeparrot
|
810d7c00cd
|
llama: unify scripts (#15628)
|
2026-04-06 20:28:08 -07:00 |
|
Christopher Milan
|
19e96497ee
|
interface in DEV (#15620)
|
2026-04-06 19:59:28 -04:00 |
|
qazal
|
8ba58304f7
|
viz: reenable tests (#15626)
|
2026-04-07 07:52:44 +09:00 |
|
chenyu
|
2f7d085450
|
shared _normalize_indices for getitem (#15625)
* shared _normalize_indices for getitem
* list
|
2026-04-06 17:45:36 -04:00 |
|
chenyu
|
66ec188d50
|
more activations to mixin (#15624)
|
2026-04-06 15:41:41 -04:00 |
|
chenyu
|
1483f7e71c
|
support shift by Tensor (#15623)
* support shift by Tensor
* use mixin
|
2026-04-06 15:14:57 -04:00 |
|
chenyu
|
6e30a5f5ea
|
update shifts in torch backend (#15622)
|
2026-04-06 14:08:33 -04:00 |
|
chenyu
|
a444be172d
|
lower fuzz_symbolic_symbolic_div timeout (#15619)
mitigate timeout crash due to high total time
|
2026-04-06 12:58:29 -04:00 |
|
chenyu
|
01b49c8647
|
support int operand for shifts (#15618)
matches torch/jax, also symbolic rule to remove mask
|
2026-04-06 12:32:12 -04:00 |
|
nimlgen
|
e2700475cf
|
mlx: cleaner (#15617)
* mlx: cleaner
* x
|
2026-04-06 17:49:47 +03:00 |
|
Valtteri Valo
|
86c4431d74
|
add gpu_family detection to Metal, target MSL 4.0 on macOS 26+ (#15079)
use supportsFamily API to detect GPU generation instead of parsing
ICB debug description strings. also adds metal4.0 compiler target.
|
2026-04-06 06:51:38 +08:00 |
|
13Perrius
|
ff0c941548
|
remove redundant iteration and toposort in _deepwalk (#15532)
|
2026-04-06 06:38:45 +08:00 |
|
Andrew Cappelli
|
e39cfe685a
|
validate lr, momentum, weight_decay in optimizers (#15576)
|
2026-04-06 06:37:34 +08:00 |
|
nimlgen
|
6a334ceb27
|
hotfix: fix bert (#15613)
|
2026-04-05 23:41:21 +03:00 |
|
nimlgen
|
e3986a6b74
|
mlx: init runtime (#15612)
* mlx: init
* x
* swap
|
2026-04-05 22:52:29 +03:00 |
|
nimlgen
|
e0988dbae5
|
hcq: support None for signal_t and compute_t (#15611)
* hcq: support None for signal_t and compute_t
* revert
* x
|
2026-04-05 18:56:47 +03:00 |
|
nimlgen
|
5e134aa087
|
hcq: add write/poll_bit commands (#15610)
* hcq: add write/poll_bit commands
* x
|
2026-04-05 18:09:44 +03:00 |
|
nimlgen
|
604cdbf2f7
|
am: large allocs aligned to 2mb to use 2mb pages (#15609)
|
2026-04-05 18:01:31 +03:00 |
|
qazal
|
b2d5b29f45
|
assembly/amd: validate dsl keyword args (#15608)
* assembly/amd: validate dsl keyword args
* hm, this should use the SOP2 s_waits
* use the sop2 s_waits
|
2026-04-05 23:00:24 +09:00 |
|
qazal
|
056fcd7758
|
viz: web work from rdna4 gemm (#15607)
* add rdna4 barrier
* fix realtime
|
2026-04-05 19:14:16 +09:00 |
|
wozeparrot
|
7e54992bf6
|
fp8 llama (#15588)
Co-authored-by: qazal <qazal.software@gmail.com>
|
2026-04-04 18:24:57 -07:00 |
|
qazal
|
4d36366717
|
assembly/amd: match rdna4 hw gidx init in emulator (#15604)
* simple rdna4 copy kernel with hw fault
* the trivial fix: use ttmp instead of s
* now copy kernel fails in mockgpu
* rm crashing kernel
|
2026-04-05 02:28:18 +09:00 |
|
chenyu
|
2ba5a6ddc8
|
remove detach in selu (#15602)
UOp does not have detach. this does not change behavior
|
2026-04-04 11:04:29 -04:00 |
|
qazal
|
f7aed180e4
|
viz/cli: add Other row in profiler (#15600)
|
2026-04-04 22:40:53 +09:00 |
|
Christopher Milan
|
74ecf6d3e6
|
opaque structs are also c.Struct (#15596)
|
2026-04-03 19:40:43 -04:00 |
|
Christopher Milan
|
645d45d968
|
DEV has arch (#15577)
Co-authored-by: Comma Device <device@comma.ai>
|
2026-04-03 19:17:19 -04:00 |
|
nimlgen
|
902edc3781
|
hcq: hcqbuf in copy (#15595)
|
2026-04-03 22:47:36 +03:00 |
|
nimlgen
|
2c4271209e
|
hcq: peer groups for remote (#15594)
* hcq: set real peer group
* x
* x
* x
|
2026-04-03 19:03:07 +03:00 |
|
chenyu
|
8fdef2d3e4
|
mean/std/var to mixin (#15593)
|
2026-04-03 10:42:41 -04:00 |
|
qazal
|
9920b42b5e
|
hotfix: renderer.target.arch in disasm (#15592)
|
2026-04-03 22:23:51 +09:00 |
|
nimlgen
|
237084b276
|
remote: support several hosts (#15585)
* remote: support several hosts
* f
|
2026-04-03 11:22:15 +03:00 |
|
Christopher Milan
|
0ed8d9271d
|
Renderers accept Target or nothing (#15590)
|
2026-04-03 01:09:41 -04:00 |
|
wozeparrot
|
3a26920141
|
feat: framework ci (#15589)
|
2026-04-02 22:03:51 -07:00 |
|
Christopher Milan
|
736fea8412
|
select_first_inited cleanup and better errors (#15587)
|
2026-04-02 19:27:58 -04:00 |
|
Christopher Milan
|
8c50da800d
|
[pr] cleanup unused ctx's in codegen (#15586)
|
2026-04-02 19:06:58 -04:00 |
|
nimlgen
|
694dc5a717
|
install script in benchmark (#15584)
|
2026-04-02 18:15:58 +03:00 |
|
nimlgen
|
046c3f1240
|
mlx: add loopback with send/recv (#15583)
|
2026-04-02 18:15:46 +03:00 |
|
chenyu
|
c64226e97c
|
fix CreationMixin doc (#15582)
|
2026-04-02 09:46:28 -04:00 |
|
qazal
|
fefb0ebc2a
|
gemm/asm: fp8 cleanups (#15580)
* normal gemm here
* s/dtypes.fp8e4m3/FP8_DTYPE
* gemm_bw
* device UOp stays NULL
|
2026-04-02 19:02:38 +09:00 |
|
chenyu
|
61bc91aa8c
|
Tensor cumalu cleanups (#15579)
* Tensor cumalu cleanups
* happy
|
2026-04-02 05:23:22 -04:00 |
|