chenyu
34594bcaaf
Revert "bug in metal: offset is stored as uint32, overflow ( #15129 )" ( #15136 )
This reverts commit 9c58db16fa.
2026-03-04 16:54:42 -05:00
Roelof van Dijk
9c58db16fa
bug in metal: offset is stored as uint32, overflow ( #15129 )
* metal uint32 icb offset overflow
* fix: diff
* supports_exec_item
* GraphRunner.supports_exec_item
* tests
* fix: can't import on non-metal
2026-03-04 22:52:12 +03:00
chenyu
4cce283790
relax test_tqdm_perf ( #15134 )
2026-03-04 12:58:47 -05:00
chenyu
fae400d300
update assign tests to also test the expected behavior ( #15132 )
2026-03-04 11:34:43 -05:00
chenyu
1f96cc2b51
update non-contiguous buffer error message [pr] ( #15131 )
* update non-contiguous buffer error message [pr]
also cleaned up the tests
* order
2026-03-04 11:13:26 -05:00
nimlgen
563d5c3211
more graph tests ( #15130 )
2026-03-04 19:01:12 +03:00
nimlgen
cdc48da9cd
hevc: assert and speed ( #15122 )
* hevc: assert and speed
* simpler
2026-03-04 19:01:02 +03:00
wozeparrot
4e9b85ecfd
fa: pull inputs out of call ( #15127 )
2026-03-04 03:15:49 -08:00
George Hotz
47faa2d7b4
hotfix: llm kv cache uses clone instead of realize to avoid many realize
2026-03-04 19:07:03 +08:00
George Hotz
8ebd24637b
fix fa forward building with clang 22 ( #15124 )
* fix fa forward building with clang 22
* fix: override rocm path
---------
Co-authored-by: Woze Parrot <wozeparrot@gmail.com>
2026-03-04 02:32:25 -08:00
Christopher Milan
592f9bf6c6
set OPENPILOT_HACKS=1 to enable replace assign ( #15123 )
2026-03-04 05:26:04 -05:00
wozeparrot
df23057984
fa: change bwd grid dim + unshuffle using mops ( #15068 )
2026-03-04 01:23:40 -08:00
Christopher Milan
5623cea7b1
move openpilot contiguous hacks to schedule ( #15120 )
2026-03-04 03:04:06 -05:00
wozeparrot
759c7fc81c
failing test for allreduce memory usage ( #15106 )
2026-03-03 23:38:38 -08:00
George Hotz
5ecfe549e7
allreduce is a function with LATE_ALLREDUCE=1 ( #15119 )
* allreduce as a function
* allreduce function
* support allreduce function
* LATE_ALLREDUCE
2026-03-04 15:17:58 +08:00
Christopher Milan
e7e70a3c95
simplify idx before counting backward_slice ( #15117 )
2026-03-03 23:53:50 -05:00
George Hotz
2d72a4a90c
fix copying padded const ( #15116 )
* fix const padding cpu
* remove comment
2026-03-04 10:39:45 +08:00
chenyu
b5ebb4d06d
contiguous_view_offset returns only offset [pr] ( #15113 )
size is always input.size
2026-03-03 15:23:39 -05:00
nimlgen
abd830b260
am: setup_rinf returns only doorbell ( #15112 )
2026-03-03 19:27:41 +03:00
nimlgen
4b42bb54aa
am: reset sdma to start from 0 ( #15109 )
2026-03-03 18:14:46 +03:00
George Hotz
01ddb4c267
add precompile to call ( #15099 )
* add precompile to call
* put get back
* something
* after structure
* alt
* keep it call
* resolve call
* resolve linear call
* precompile works with llm
* revert rangeify
* color for debugging
* getenv PRECOMPILE
* clean up deco pattern
* fully recursive sink scheduling
* revert llama
* fix SPEC=2
2026-03-03 22:32:42 +08:00
qazal
c7f908b788
sqtt: fix rdna4 structs ( #15111 )
* work
* DEBUG=2
2026-03-03 23:32:14 +09:00
qazal
8dd691761d
sqtt: remove old files ( #15108 )
2026-03-03 22:43:24 +09:00
Christopher Milan
de043226ba
benchmark comma usbgpu driving_vision step and load time ( #15103 )
Co-authored-by: Comma Device <device@comma.ai>
2026-03-03 06:08:03 -05:00
Christopher Milan
5f6b610da1
FLOAT16 logic for IMAGE==1 goes back to image_conv2d ( #15105 )
2026-03-03 05:37:57 -05:00
wozeparrot
529318259c
fix: fix null tests to actually use null device ( #15104 )
2026-03-03 02:05:47 -08:00
George Hotz
7d025089e3
no after removal ( #15102 )
* no after removal
* we are using walk
* null schedule test
* pytest deps
* Revert "pytest deps"
This reverts commit 5e1c5304ec.
* Revert "null schedule test"
This reverts commit 02da66053e.
* clean null tests
2026-03-03 17:50:31 +08:00
wozeparrot
92c16810ac
feat: per device mem_used ( #15100 )
2026-03-03 01:31:28 -08:00
qazal
e3a0598d0b
viz: the whole pc should be in view ( #15101 )
2026-03-03 17:17:53 +09:00
b1tg
a9ea36de79
assembly/amd: v_cmp_lg_f32 is ordered not-equal ( #14982 )
2026-03-03 15:37:48 +08:00
wozeparrot
c35de9bd68
asm_gemm: support more sharding ( #15002 )
2026-03-02 23:16:37 -08:00
wozeparrot
824ba4386a
llama3 dp fix ( #15098 )
2026-03-02 22:43:07 -08:00
chenyu
5dcf29b1a0
use clone in test_swap_slices ( #15096 )
2026-03-02 22:05:12 -05:00
Christopher Milan
c70e8af068
move IMAGE FLOAT16 logic to allocations ( #15095 )
* FLOAT16 logic in allocations
* cleanup
* separate that
* only apply when IMAGE == 1
* test passing now
* create image buffers earlier
2026-03-02 22:00:05 -05:00
George Hotz
d483e4153a
buffer view is like buffer ( #15082 )
* buffer view is like buffer
* fix
* swap_reshape_shrink
* contiguous on gguf, fix overlap
* revert that
* _device_supports_view
* this
* fix that test
* 0 buffers
* that test was wrong
* this
* check correct size
* contig BUFFER_VIEW
* this
* fix tests
* buffer view tests
* om
* fix torch
* no MOCKGPU
* skip
2026-03-03 09:52:33 +08:00
qazal
62ee976c1b
gemm/asm: cleanup repeated patterns to helper functions ( #15094 )
2026-03-03 08:14:47 +09:00
qazal
848f5cea96
viz: sqtt instruction packet trace ( #15065 )
2026-03-03 07:55:04 +09:00
chenyu
14d1c5fdfd
assign fusion tests on detach and contiguous_backward ( #15092 )
2026-03-02 15:21:51 -05:00
nimlgen
dfa180413d
tbgpu: sign nv ( #15087 )
2026-03-02 22:58:30 +03:00
chenyu
71f228f80f
test exact kernel count in torch_backend/test_kernel_fusion ( #15091 )
2026-03-02 14:26:32 -05:00
chenyu
f80b1033c5
simpler Tensor.all ( #15089 )
same generated kernel
2026-03-02 11:08:55 -05:00
chenyu
4008f7d4e8
move Tensor.one_hot +1 to python ( #15088 )
2026-03-02 10:56:41 -05:00
nimlgen
dafbe9733a
am: cleanup ( #15086 )
2026-03-02 17:06:21 +03:00
qazal
f7aeff6061
viz: cli.py cleanups, do not require PYTHONPATH ( #15085 )
* cleanup the print
* sys.exit
* equal check
* cleanup unpacker
* cli doesn't need PYTHONPATH
* no semicolons
* %s/PYTHONPATH=. //g
2026-03-02 19:24:38 +09:00
George Hotz
5ff278446c
add contiguous_view_offset ( #15084 )
* add contiguous_view_offset
* no int
2026-03-02 18:05:04 +08:00
Christopher Milan
977c270774
IMAGE=1 kernel count failing tests ( #15083 )
2026-03-02 04:35:26 -05:00
George Hotz
3539693555
Support triu variable on diagonal + SDPA symbolic ( #15081 )
* triu variable
* fails
* dumbbb
* no commutative in reshape
* real fix
* revert that
* sdpa symbolic tests
2026-03-02 12:19:48 +08:00
wozeparrot
a4f6365929
llama3: fstep takes grads ( #15069 )
2026-03-01 20:05:07 -08:00
Nick
8e8e9f6ff6
assert removal for _tri() + tests ( #15073 )
* assert removal for _tri() and tests
* removed import
* tests triu/tril like in prefill
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-03-02 10:34:28 +08:00
nimlgen
ccbbca05ef
beam: add dev_timeout for am ( #15063 )
* beam: add dev_timeout for am
* all covered
* fk
* x
* fuzz
* reset
* f
2026-03-01 16:57:29 +03:00