chenyu
da1700e16b
dtypes.index -> dtypes.weakint ( #15377 )
2026-03-20 01:08:46 -04:00
nimlgen
3b04e3ea28
no gmmu mappings with GMMU=0 ( #15369 )
...
* usb
* free
* simple gmmu=0
* x
* x
* vram
* init tests
* ppg
* x
2026-03-20 12:18:34 +08:00
ridoy majumdar
c1183b8872
remove dead code in pyrender ( #15115 )
...
* remove dead code in pyrender
* retrig CI
* retrig CI
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-03-19 23:59:56 -04:00
chenyu
bf33c5f796
remove gradient materialize_grads ( #15367 )
...
effectively default to True
and removed the *0 hack in Tensor.copysign; now dy/dx=0 if y does not depend on x
remove
2026-03-19 23:36:03 -04:00
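The commit body above describes a gradient convention: with materialize_grads effectively defaulting to True, dy/dx is a materialized zero (not None) when y does not depend on x. A minimal plain-Python sketch of that convention, not tinygrad's actual code — the `gradient` helper and `y_deps` map are hypothetical:

```python
def gradient(y_deps: dict, x: str) -> float:
    """Return dy/dx given a map of the inputs y actually depends on.

    Hypothetical helper: with grads always materialized, an input that y
    does not depend on gets an explicit 0.0 instead of None.
    """
    return y_deps.get(x, 0.0)

# y = 3*a depends only on 'a'
deps = {"a": 3.0}
print(gradient(deps, "a"))  # 3.0
print(gradient(deps, "b"))  # 0.0: y does not depend on b, so the grad is a zero
```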
chenyu
45baf3ff3f
pin ci xcode version ( #15375 )
2026-03-19 23:13:16 -04:00
George Hotz
4091d37e8e
flat llama step work ( #15355 )
...
* flat llama step work
* fp8 support
* blacklisted matmul
* chestertons fence
2026-03-20 09:06:12 +08:00
qazal
176ad47d7d
cdna4 emulator testing ASM_GEMM in CI ( #15373 )
...
* cdna emulator work
* accvgprs
* cdna passes most tests
* ruff
* add cdna4 to tests
* cdna emu
* crash
* pass?
* work
* gen
* clean up wave_size access
* asm_gemm passes
* remove acc from dsl.py, emulator can keep its different reg file
it's purely an encoding here, the ASM_GEMM already encodes acc srcs with v[], this can
be cleaned up later, but not functionally required for emulator.
* split asm_gemm tests to ones fast on the emulator
* don't do that
* 124 stays null on rdna
* the segfault was because of hw regs, not this
* Revert "clean up wave_size access", it's explicitly tested
This reverts commit 1202ff5787.
* nullcopyout
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-03-20 05:51:30 +09:00
nimlgen
16daffc042
remote connection timeout ( #15370 )
2026-03-19 19:44:16 +08:00
Christopher Milan
68d7a6b7be
PYTHONREMU: fix vop3p literals ( #15372 )
2026-03-19 07:05:01 -04:00
George Hotz
70dad9d642
add PING to RemoteCmd ( #15371 )
...
* add PING to RemoteCmd
* cleanup
2026-03-19 18:57:40 +08:00
nimlgen
1c978aeedb
amd: fix aql remote ( #15368 )
2026-03-19 18:11:03 +08:00
qazal
337c684047
viz: cycle time relative to kernel start in sidebar ( #15352 )
2026-03-19 18:41:29 +09:00
chenyu
d81b03cff4
pad_to to mixin [pr] ( #15365 )
2026-03-19 05:02:01 -04:00
chenyu
1abb6297f6
more Tensor(UOp) cleanups ( #15364 )
...
* more Tensor(UOp) cleanups
* function too
2026-03-19 03:34:30 -04:00
nimlgen
cf50ca23c3
better oom msg ( #15362 )
...
* better oom msg
* s
2026-03-19 14:07:01 +08:00
nimlgen
1a53393512
remote in ci benchmark ( #15344 )
...
* remote in ci benchmark
* move to the end
* move
* ports
* own this
2026-03-19 13:49:09 +08:00
chenyu
92dfef8060
Tensor(uop) does not need explicit device ( #15361 )
2026-03-19 00:44:33 -04:00
nimlgen
f32c2e43a7
memory: use pfree ( #15360 )
2026-03-19 12:39:23 +08:00
nimlgen
86eec01f97
limit gl*lc ( #15359 )
2026-03-19 12:38:55 +08:00
chenyu
b39816e998
failed test case for Tensor(np, "bf16") ( #15358 )
2026-03-18 23:40:14 -04:00
chenyu
e407ee410c
cosmetic Tensor._do_reduction cleanups ( #15357 )
2026-03-18 22:27:50 -04:00
chenyu
6aebf95dac
move neg and invert to mixin ( #15356 )
2026-03-18 22:03:41 -04:00
wozeparrot
f6687d1ffc
feat: sd seed0 update ( #15354 )
2026-03-18 18:42:00 -07:00
wozeparrot
c45a606750
feat: no if in rand ( #15333 )
2026-03-18 15:09:51 -07:00
qazal
23e0431848
viz: switch sqtt sidebar to a simple asm list ( #15350 )
...
* work
* something like this
* Revert "something like this"
This reverts commit 6c45098d2b.
* less
* path includes
* scroll only jumps up and down
* it's only pc and line now
2026-03-19 01:40:25 +09:00
qazal
709fc52d7b
viz: fix auto zoom range in sqtt, include endpgm packet ( #15349 )
...
* viz: fix automatic zoom range in sqtt packets
* it's x+width
* include s_endpgm
* endpgm also doesn't have exec
2026-03-18 22:52:32 +09:00
nimlgen
d4836ddbb0
canonicalize device from tuple ( #15348 )
...
* will it fix ci?
* test
* um
2026-03-18 20:35:52 +08:00
George Hotz
5524916e39
llama compute gradients explicitly + 243 GB of RAM on MP=8 ( #15343 )
...
* llama compute gradients explicitly
* apply grads
* fix multi issue
* multi BUFFER_VIEW support
* simpler
* skip the flaky test
2026-03-18 19:54:40 +08:00
nimlgen
ff004d2114
remote: fix mmio ( #15347 )
2026-03-18 18:20:39 +08:00
nimlgen
f853371c83
fix compilers autoselect ( #15346 )
2026-03-18 18:19:53 +08:00
chenyu
761ce8c0d3
fix Invalid combine rules ( #15345 )
...
* fix Invalid combine rules
wrong conditions broke setitem into invalids
* fix
2026-03-18 04:58:02 -04:00
nimlgen
c0499ca3e8
nv: use mmio iface ( #15342 )
...
* nv: use mmio iface
* nv: use mmio iface
* revert
* f
2026-03-18 16:53:09 +08:00
Christopher Milan
499ad9a356
benchmark openpilot 0.11.0 ( #15341 )
2026-03-18 03:28:43 -04:00
George Hotz
6e196195d8
add test for flat llama ( #15327 )
...
* add test for flat llama
* simpler
* back to split w1/w3
* env
* still too much ram
* invalid
2026-03-18 15:16:33 +08:00
chenyu
fceb21c315
Tensor(uop) uses device from uop ( #15340 )
2026-03-18 02:56:06 -04:00
George Hotz
6109117af1
anonymous buffers are Invalid ( #15336 )
...
* anonymous buffers are Invalid
* unique_const
* work
* remove invalid writes
* test_anonymous_buffers_in_function
2026-03-18 14:52:56 +08:00
chenyu
e644e1cb6a
less Tensor(...).uop indirection in Tensor.__init__ ( #15339 )
2026-03-18 02:17:38 -04:00
nimlgen
0315faf938
remote bench ( #15331 )
2026-03-18 14:03:51 +08:00
nimlgen
d720d50e12
memory: traverse all valid ranges only ( #15338 )
...
* memory: traverse all valid ranges only
* x
2026-03-18 14:03:39 +08:00
chenyu
ac7a348d06
dtypes.as_const -> DType.const ( #15337 )
...
does not need to be a staticmethod
2026-03-18 00:48:41 -04:00
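The rename above (dtypes.as_const -> DType.const, "does not need to be a staticmethod") is the common staticmethod-to-instance-method refactor: once the function lives on the dtype object itself, the dtype no longer has to be passed in. A generic sketch with hypothetical names and a made-up conversion rule, not tinygrad's actual DType class:

```python
class DType:
    def __init__(self, name: str):
        self.name = name

    # Instance method instead of a staticmethod: `self` is the dtype,
    # so callers write dt.const(x) rather than dtypes.as_const(x, dt).
    def const(self, x):
        # Hypothetical conversion: integer dtypes truncate, others go to float.
        return int(x) if self.name.startswith("int") else float(x)

print(DType("int32").const(3.7))    # 3
print(DType("float32").const(3))    # 3.0
```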
Christopher Milan
864d3917d5
add openpilot onnx parser test ( #15334 )
2026-03-18 00:12:02 -04:00
Christopher Milan
0222bfdf69
Revert "don't use intermediate dict in onnx parse" ( #15332 )
2026-03-17 23:46:30 -04:00
chenyu
94926d00d8
fix rand > uint32.max ( #15330 )
...
need to keep low and high as 1D tensor.
`PYTHONPATH=. LLAMA3_SIZE=405B python3 examples/mlperf/models/flat_llama.py` works now
2026-03-17 22:00:01 -04:00
wozeparrot
b45edeb965
fix: rand supports large tensors ( #15329 )
2026-03-17 15:45:41 -07:00
qazal
00817cf65e
viz: all tests can run on the NULL device ( #15328 )
...
* remove that
* move to test_viz
* get_cfg
* do not use os.environ
* hm
* it's always on NULL
* import renderer
* no import *
2026-03-18 04:14:20 +09:00
George Hotz
2605840ee2
flat llama ( #15324 )
...
* FlatTransformer
* works
* pass in buffer views
* print stuff
* print
* bugfixes
2026-03-17 19:39:55 +08:00
nimlgen
0a641ce17d
system: remote ( #15318 )
...
* system: remote
* listen
* print
* fix
* minor
2026-03-17 19:25:37 +08:00
Christopher Milan
69eefdca20
images with height=1 have less strict width rules ( #15325 )
2026-03-17 07:07:22 -04:00
chenyu
14eb8170e4
skip TestRunAsModule if libclang is loaded ( #15323 )
...
reverse rule of TestAutogen skip, otherwise `NULL=1 python -m pytest test/null/test_autogen.py test/null/test_device.py` crashes for me
2026-03-17 06:02:53 -04:00
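The skip rule described above (the reverse of TestAutogen's: skip TestRunAsModule when libclang IS loaded) could be sketched like this. A hypothetical illustration, not the repo's actual test code — `libclang_loaded` and its module check are assumptions:

```python
import sys
import unittest

def libclang_loaded() -> bool:
    # Assumed check: treat libclang as loaded if its Python bindings
    # have already been imported into this process.
    return any(m in sys.modules for m in ("clang", "clang.cindex"))

class TestRunAsModule(unittest.TestCase):
    def test_run_as_module(self):
        # Reverse of the TestAutogen skip rule: skip when libclang IS loaded,
        # since re-running as a module would crash in that case.
        if libclang_loaded():
            self.skipTest("libclang already loaded in this process")
        # ... actual run-as-module assertions would go here ...
```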
qazal
e7c26b6319
viz: rename to Start Cycle for the sqtt graph ( #15320 )
2026-03-17 18:53:06 +09:00