George Hotz
|
fa02105546
|
hotfix: pin amd isa xml version
|
2026-04-10 06:47:00 +08:00 |
|
nimlgen
|
057dc173ab
|
beam uop (#15660)
* beam as uop
* x
|
2026-04-09 19:13:03 +03:00 |
|
nimlgen
|
0ff30b003d
|
am: reset queues from spi (#15664)
* am: reset queues from spi
* move
|
2026-04-09 18:25:50 +03:00 |
|
George Hotz
|
48a7627b04
|
add RDNA4 support to copy WMMA (#15663)
* add RDNA4 support to copy WMMA
* simpler
* simpler
* comment
* assert
|
2026-04-09 22:48:20 +08:00 |
|
chenyu
|
6837881b06
|
remove same_shape_noop [pr] (#15662)
no longer used
|
2026-04-09 09:50:26 -04:00 |
|
Christopher Milan
|
d08c76d9cb
|
c.Struct cleanup (#15640)
|
2026-04-08 20:07:16 -04:00 |
|
qazal
|
742b3894d7
|
viz/cli: add pmc printer (#15651)
* viz/cli: add pmc printer
* cli work
* s
* linter
* pack workgroups
* add : to wgp
* counter name
|
2026-04-09 08:50:54 +09:00 |
|
chenyu
|
4cf2759fc8
|
fix merge_reduce_ends (#15659)
* fix merge_reduce_ends
the same range with different nesting should not merge, e.g. cumsum applied twice should not merge
* skip that
|
2026-04-08 17:20:01 -04:00 |
|
chenyu
|
cb681da840
|
move UOp.pad to mixin (#15657)
the same arg works for Tensor.pad
|
2026-04-08 13:15:19 -04:00 |
|
nimlgen
|
28b14b0e38
|
mlx: remove to_be, use helpers (#15655)
|
2026-04-08 20:07:28 +03:00 |
|
nimlgen
|
1b44cb2ac6
|
split update stat from execitem (#15654)
|
2026-04-08 20:07:12 +03:00 |
|
qazal
|
71c83cc3f6
|
viz: put OTHER_ on the wave row (#15650)
* viz: put OTHER_ on the wave row
* update tests
* cleanup cli
|
2026-04-08 23:13:44 +09:00 |
|
chenyu
|
839d37b7bc
|
update median_step_time in model_train.py (#15649)
BENCHMARK=5 used to pick the 4th largest, not the middle one
|
2026-04-08 09:53:59 -04:00 |
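The median fix in 839d37b7bc comes down to indexing. With five timings, the median is the third value after sorting, while the 4th largest is the second smallest. Below is a minimal sketch of a median step-time helper. `median_step_time` is a hypothetical name used for illustration and is not the code in model_train.py.

```python
def median_step_time(step_times: list[float]) -> float:
    """Hypothetical helper: the middle of the measured step times (not model_train.py's code)."""
    if not step_times:
        raise ValueError("need at least one step time")
    ordered = sorted(step_times)
    n, mid = len(ordered), len(ordered) // 2
    # odd count: the single middle element; even count: mean of the two middle elements
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

# BENCHMARK=5 gives five timings, so the median is the 3rd smallest (index 2 after sorting)
times = [0.31, 0.29, 0.35, 0.30, 0.33]
print(median_step_time(times))   # 0.31
print(sorted(times)[-4])         # 0.3, the 4th largest value
```

The standard library's `statistics.median` computes the same value and could replace the helper.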
|
chenyu
|
dae9dea903
|
clean up tensor random functions (#15648)
* clean up tensor random functions
* revert that
|
2026-04-08 09:44:37 -04:00 |
|
George Hotz
|
1ebeb52e59
|
RDNA4 asm gemm (#15427)
* sqtt: rdna4 decoder work
* diff cleanup
* more diff
* test
* 125
* r4
---------
Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
|
2026-04-08 21:26:44 +08:00 |
|
nimlgen
|
b1e52ba0c2
|
the slowest line in hcq graph (#15635)
* the slowest line in hcq graph
* x
|
2026-04-08 15:53:52 +03:00 |
|
qazal
|
3ac16b3bea
|
viz: add wmma row, update exec duration logic (#15646)
* viz: split wmma to its own row, fix duration logic
* regs
* decrease number of loops, add pickle
* assert overlaps
|
2026-04-08 20:24:23 +09:00 |
|
George Hotz
|
35e3983840
|
Add Q5_0, Q5_1, and bfloat16 GGUF types (#15644)
|
2026-04-08 17:16:19 +08:00 |
|
qazal
|
39a029ec55
|
remove ASM_GEMM context var (#15645)
|
2026-04-08 18:02:40 +09:00 |
|
qazal
|
dc6a51e44d
|
viz: add # of bytes to sdma (#15639)
* viz: add # of bytes to sdma
* update test_viz
|
2026-04-08 17:43:37 +09:00 |
|
wozeparrot
|
70dbd35023
|
llama: move custom_kernel into flat_llama (#15643)
|
2026-04-08 00:19:14 -07:00 |
|
Christopher Milan
|
bcf6931a4f
|
fix: comma 4 does not have pcie (#15642)
|
2026-04-07 23:57:03 -04:00 |
|
George Hotz
|
f930579b7a
|
llm: change the default port to 8000 so you can remember it (match vLLM)
|
2026-04-08 11:25:38 +08:00 |
|
b1tg
|
bf3763526a
|
llm: buffer SSE chunks to fix parse errors from split reads (#15641)
|
2026-04-08 10:26:23 +08:00 |
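Split reads break naive Server-Sent Events parsing because a single network read can end partway through an event. The usual remedy is to keep a buffer and parse only the events whose terminating blank line has arrived. The sketch below makes several assumptions. `SSEBuffer` is a hypothetical class and not the apps.llm implementation. It also assumes LF line endings and reads only `data:` fields.

```python
import json

class SSEBuffer:
    """Hypothetical helper: turns arbitrarily split text chunks into complete SSE payloads.

    Assumes LF line endings and only looks at "data:" fields.
    """

    def __init__(self) -> None:
        self._buf = ""

    def feed(self, chunk: str) -> list[str]:
        self._buf += chunk
        events: list[str] = []
        # an event is complete only once its terminating blank line has arrived
        while "\n\n" in self._buf:
            raw, self._buf = self._buf.split("\n\n", 1)
            data = []
            for line in raw.split("\n"):
                if line.startswith("data:"):
                    value = line[5:]
                    # the SSE format strips a single leading space after the colon
                    data.append(value[1:] if value.startswith(" ") else value)
            if data:
                events.append("\n".join(data))
        # any partial event stays in self._buf until a later chunk completes it
        return events

buf = SSEBuffer()
payloads = []
# one JSON event split across two reads, followed by a terminator event
for chunk in ['data: {"tok', 'en": "hi"}\n\n', "data: [DONE]\n\n"]:
    payloads.extend(buf.feed(chunk))
print(payloads)                  # ['{"token": "hi"}', '[DONE]']
print(json.loads(payloads[0]))   # {'token': 'hi'}
```

If each read were parsed on its own, `json.loads` would fail on the first fragment. The buffer defers that fragment until its remainder arrives.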
|
qazal
|
a508b8fd2a
|
viz: delete redundant things (#15637)
* delete that
* remove
* delete graph config
|
2026-04-08 07:18:04 +09:00 |
|
chenyu
|
9c6e925b56
|
move lerp to mixin (#15634)
last function in the math functions section
|
2026-04-07 15:13:00 -04:00 |
|
qazal
|
890286e8d6
|
update llama profile.sh (#15633)
* update llama profile.sh
* BENCHMARK 5
|
2026-04-08 03:18:45 +09:00 |
|
nimlgen
|
b78b384d58
|
mlx: graph (#15621)
* Dx
* Dx
* simpler
* mypy
* x
* f
* Dx
* x
* c
* x
|
2026-04-07 19:43:51 +03:00 |
|
qazal
|
d29f0ef721
|
viz: speed up profiler first render (#15632)
* viz: speed up profiler first render
* better comment
|
2026-04-07 23:07:09 +09:00 |
|
George Hotz
|
d3de63d998
|
improvements to apps.llm (#15631)
|
2026-04-07 20:34:05 +08:00 |
|
George Hotz
|
2b01ca59dd
|
USB driver for custom ASM firmware (#15597)
* USB driver for custom ASM firmware
* timeout
* fix mypy
* pcie mem read
* flip in f/w
* one tx
* little endian
* autodetect custom
* mock bypass
* lint
* clean
|
2026-04-07 13:45:41 +08:00 |
|
wozeparrot
|
810d7c00cd
|
llama: unify scripts (#15628)
|
2026-04-06 20:28:08 -07:00 |
|
Christopher Milan
|
19e96497ee
|
interface in DEV (#15620)
|
2026-04-06 19:59:28 -04:00 |
|
qazal
|
8ba58304f7
|
viz: reenable tests (#15626)
|
2026-04-07 07:52:44 +09:00 |
|
chenyu
|
2f7d085450
|
shared _normalize_indices for getitem (#15625)
* shared _normalize_indices for getitem
* list
|
2026-04-06 17:45:36 -04:00 |
|
chenyu
|
66ec188d50
|
more activations to mixin (#15624)
|
2026-04-06 15:41:41 -04:00 |
|
chenyu
|
1483f7e71c
|
support shift by Tensor (#15623)
* support shift by Tensor
* use mixin
|
2026-04-06 15:14:57 -04:00 |
|
chenyu
|
6e30a5f5ea
|
update shifts in torch backend (#15622)
|
2026-04-06 14:08:33 -04:00 |
|
chenyu
|
a444be172d
|
lower fuzz_symbolic_symbolic_div timeout (#15619)
mitigate timeout crash due to high total time
|
2026-04-06 12:58:29 -04:00 |
|
chenyu
|
01b49c8647
|
support int operand for shifts (#15618)
matches torch/jax; also adds a symbolic rule to remove the mask
|
2026-04-06 12:32:12 -04:00 |
|
nimlgen
|
e2700475cf
|
mlx: cleaner (#15617)
* mlx: cleaner
* x
|
2026-04-06 17:49:47 +03:00 |
|
Valtteri Valo
|
86c4431d74
|
add gpu_family detection to Metal, target MSL 4.0 on macOS 26+ (#15079)
use supportsFamily API to detect GPU generation instead of parsing
ICB debug description strings. also adds metal4.0 compiler target.
|
2026-04-06 06:51:38 +08:00 |
|
13Perrius
|
ff0c941548
|
remove redundant iteration and toposort in _deepwalk (#15532)
|
2026-04-06 06:38:45 +08:00 |
|
Andrew Cappelli
|
e39cfe685a
|
validate lr, momentum, weight_decay in optimizers (#15576)
|
2026-04-06 06:37:34 +08:00 |
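Checking hyperparameters when an optimizer is constructed lets a bad value fail immediately instead of partway through training. The sketch below shows this fail-fast pattern for plain float arguments. `validate_optim_args` is a hypothetical helper. The non-negativity rule follows the checks in PyTorch's optimizer constructors, and the finiteness check is an extra assumption. Neither is a transcription of tinygrad's actual checks.

```python
import math

def validate_optim_args(lr: float, momentum: float = 0.0, weight_decay: float = 0.0) -> None:
    """Hypothetical constructor-time check for plain float hyperparameters."""
    for name, value in (("lr", lr), ("momentum", momentum), ("weight_decay", weight_decay)):
        # NaN or inf would silently corrupt every parameter after the first step
        if not math.isfinite(value):
            raise ValueError(f"invalid {name}: {value}")
        # negative values would make the update step in the wrong direction
        if value < 0:
            raise ValueError(f"invalid {name}: {value} (must be >= 0)")

validate_optim_args(lr=1e-3, momentum=0.9)   # passes silently
try:
    validate_optim_args(lr=-0.01)
except ValueError as e:
    print(e)                                 # invalid lr: -0.01 (must be >= 0)
```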
|
nimlgen
|
6a334ceb27
|
hotfix: fix bert (#15613)
|
2026-04-05 23:41:21 +03:00 |
|
nimlgen
|
e3986a6b74
|
mlx: init runtime (#15612)
* mlx: init
* x
* swap
|
2026-04-05 22:52:29 +03:00 |
|
nimlgen
|
e0988dbae5
|
hcq: support None for signal_t and compute_t (#15611)
* hcq: support None for signal_t and compute_t
* revert
* x
|
2026-04-05 18:56:47 +03:00 |
|
nimlgen
|
5e134aa087
|
hcq: add write/poll_bit commands (#15610)
* hcq: add write/poll_bit commands
* x
|
2026-04-05 18:09:44 +03:00 |
|
nimlgen
|
604cdbf2f7
|
am: large allocs aligned to 2mb to use 2mb pages (#15609)
|
2026-04-05 18:01:31 +03:00 |
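To map memory with 2 MiB pages, both the virtual address and the length must be 2 MiB aligned. The benefit is many fewer page-table entries and TLB misses than with 4 KiB pages. The cost is that each allocation is padded up to the next 2 MiB boundary. The sketch below shows the alignment arithmetic. `place_alloc`, the bump-allocator model, and the "at least 2 MiB counts as large" threshold are all illustrative assumptions and do not describe the AM driver's allocator.

```python
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024

def round_up(x: int, align: int) -> int:
    """Round x up to a multiple of align; align must be a power of two."""
    assert align > 0 and align & (align - 1) == 0
    return (x + align - 1) & ~(align - 1)

def place_alloc(next_free_va: int, size: int) -> tuple[int, int, int]:
    """Return (va, padded_size, page_size) for a hypothetical bump-style allocator.

    Hypothetical policy: an allocation of at least 2 MiB counts as "large".
    """
    page = PAGE_2M if size >= PAGE_2M else PAGE_4K
    # both the start address and the length must be page-aligned to map with that page size
    return round_up(next_free_va, page), round_up(size, page), page

va, padded, page = place_alloc(0x1000, 5 * 1024 * 1024)
print(hex(va), padded // PAGE_2M, page)   # 0x200000 3 2097152
print(place_alloc(0x1000, 10_000))        # (4096, 12288, 4096)
```

In this example, a 5 MiB request starting just past 0x1000 is moved to the next 2 MiB boundary and covered by three 2 MiB pages. A small request continues to use 4 KiB pages.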
|
qazal
|
b2d5b29f45
|
assembly/amd: validate dsl keyword args (#15608)
* assembly/amd: validate dsl keyword args
* hm, this should use the SOP2 s_waits
* use the sop2 s_waits
|
2026-04-05 23:00:24 +09:00 |
|