Commit Graph

12493 Commits

nimlgen
cdc48da9cd hevc: assert and speed (#15122)
* hevc: assert and speed

* simpler
2026-03-04 19:01:02 +03:00
wozeparrot
4e9b85ecfd fa: pull inputs out of call (#15127) 2026-03-04 03:15:49 -08:00
George Hotz
47faa2d7b4 hotfix: llm kv cache uses clone instead of realize to avoid many realize 2026-03-04 19:07:03 +08:00
George Hotz
8ebd24637b fix fa forward building with clang 22 (#15124)
* fix fa forward building with clang 22

* fix: override rocm path

---------

Co-authored-by: Woze Parrot <wozeparrot@gmail.com>
2026-03-04 02:32:25 -08:00
Christopher Milan
592f9bf6c6 set OPENPILOT_HACKS=1 to enable replace assign (#15123) 2026-03-04 05:26:04 -05:00
wozeparrot
df23057984 fa: change bwd grid dim + unshuffle using mops (#15068) 2026-03-04 01:23:40 -08:00
Christopher Milan
5623cea7b1 move openpilot contiguous hacks to schedule (#15120) 2026-03-04 03:04:06 -05:00
wozeparrot
759c7fc81c failing test for allreduce memory usage (#15106) 2026-03-03 23:38:38 -08:00
George Hotz
5ecfe549e7 allreduce is a function with LATE_ALLREDUCE=1 (#15119)
* allreduce as a function

* allreduce function

* support allreduce function

* LATE_ALLREDUCE
2026-03-04 15:17:58 +08:00
Christopher Milan
e7e70a3c95 simplify idx before counting backward_slice (#15117) 2026-03-03 23:53:50 -05:00
George Hotz
2d72a4a90c fix copying padded const (#15116)
* fix const padding cpu

* remove comment
2026-03-04 10:39:45 +08:00
chenyu
b5ebb4d06d contiguous_view_offset returns only offset [pr] (#15113)
size is always input.size
2026-03-03 15:23:39 -05:00
nimlgen
abd830b260 am: setup_rinf returns only doorbell (#15112) 2026-03-03 19:27:41 +03:00
nimlgen
4b42bb54aa am: reset sdma to start from 0 (#15109) 2026-03-03 18:14:46 +03:00
George Hotz
01ddb4c267 add precompile to call (#15099)
* add precompile to call

* put get back

* something

* after structure

* alt

* keep it call

* resolve call

* resolve linear call

* precompile works with llm

* revert rangeify

* color for debugging

* getenv PRECOMPILE

* clean up deco pattern

* fully recursive sink scheduling

* revert llama

* fix SPEC=2
2026-03-03 22:32:42 +08:00
qazal
c7f908b788 sqtt: fix rdna4 structs (#15111)
* work

* DEBUG=2
2026-03-03 23:32:14 +09:00
qazal
8dd691761d sqtt: remove old files (#15108) 2026-03-03 22:43:24 +09:00
Christopher Milan
de043226ba benchmark comma usbgpu driving_vision step and load time (#15103)
Co-authored-by: Comma Device <device@comma.ai>
2026-03-03 06:08:03 -05:00
Christopher Milan
5f6b610da1 FLOAT16 logic for IMAGE==1 goes back to image_conv2d (#15105) 2026-03-03 05:37:57 -05:00
wozeparrot
529318259c fix: fix null tests to actually use null device (#15104) 2026-03-03 02:05:47 -08:00
George Hotz
7d025089e3 no after removal (#15102)
* no after removal

* we are using walk

* null schedule test

* pytest deps

* Revert "pytest deps"

This reverts commit 5e1c5304ec.

* Revert "null schedule test"

This reverts commit 02da66053e.

* clean null tests
2026-03-03 17:50:31 +08:00
wozeparrot
92c16810ac feat: per device mem_used (#15100) 2026-03-03 01:31:28 -08:00
qazal
e3a0598d0b viz: the whole pc should be in view (#15101) 2026-03-03 17:17:53 +09:00
b1tg
a9ea36de79 assembly/amd: v_cmp_lg_f32 is ordered not-equal (#14982) 2026-03-03 15:37:48 +08:00
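The fix above hinges on floating-point compare semantics: in AMD ISA terms, the "lg" (less-or-greater) compare is the *ordered* not-equal, which is false whenever either operand is NaN, unlike an unordered not-equal, which is true in that case. A small Python model of the predicate (an illustration of the semantics, not the assembler's actual code):

```python
import math

def v_cmp_lg_f32(a: float, b: float) -> bool:
    # "lg" = less-than OR greater-than: the ORDERED not-equal.
    # It is true only when neither operand is NaN and the values differ;
    # an unordered not-equal would instead be true whenever either is NaN.
    return not (math.isnan(a) or math.isnan(b)) and a != b
```
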
wozeparrot
c35de9bd68 asm_gemm: support more sharding (#15002) 2026-03-02 23:16:37 -08:00
wozeparrot
824ba4386a llama3 dp fix (#15098) 2026-03-02 22:43:07 -08:00
chenyu
5dcf29b1a0 use clone in test_swap_slices (#15096) 2026-03-02 22:05:12 -05:00
Christopher Milan
c70e8af068 move IMAGE FLOAT16 logic to allocations (#15095)
* FLOAT16 logic in allocations

* cleanup

* separate that

* only apply when IMAGE == 1

* test passing now

* create image buffers earlier
2026-03-02 22:00:05 -05:00
George Hotz
d483e4153a buffer view is like buffer (#15082)
* buffer view is like buffer

* fix

* swap_reshape_shrink

* contiguous on gguf, fix overlap

* revert that

* _device_supports_view

* this

* fix that test

* 0 buffers

* that test was wrong

* this

* check correct size

* contig BUFFER_VIEW

* this

* fix tests

* buffer view tests

* om

* fix torch

* no MOCKGPU

* skip
2026-03-03 09:52:33 +08:00
qazal
62ee976c1b gemm/asm: cleanup repeated patterns to helper functions (#15094) 2026-03-03 08:14:47 +09:00
qazal
848f5cea96 viz: sqtt instruction packet trace (#15065) 2026-03-03 07:55:04 +09:00
chenyu
14d1c5fdfd assign fusion tests on detach and contiguous_backward (#15092) 2026-03-02 15:21:51 -05:00
nimlgen
dfa180413d tbgpu: sign nv (#15087) 2026-03-02 22:58:30 +03:00
chenyu
71f228f80f test exact kernel count in torch_backend/test_kernel_fusion (#15091) 2026-03-02 14:26:32 -05:00
chenyu
f80b1033c5 simpler Tensor.all (#15089)
same generated kernel
2026-03-02 11:08:55 -05:00
chenyu
4008f7d4e8 move Tensor.one_hot +1 to python (#15088) 2026-03-02 10:56:41 -05:00
nimlgen
dafbe9733a am: cleanup (#15086) 2026-03-02 17:06:21 +03:00
qazal
f7aeff6061 viz: cli.py cleanups, do not require PYTHONPATH (#15085)
* cleanup the print

* sys.exit

* equal check

* cleanup unpacker

* cli doesn't need PYTHONPATH

* no semicolons

* %s/PYTHONPATH=. //g
2026-03-02 19:24:38 +09:00
George Hotz
5ff278446c add contiguous_view_offset (#15084)
* add contiguous_view_offset

* no int
2026-03-02 18:05:04 +08:00
Christopher Milan
977c270774 IMAGE=1 kernel count failing tests (#15083) 2026-03-02 04:35:26 -05:00
George Hotz
3539693555 Support triu variable on diagonal + SDPA symbolic (#15081)
* triu variable

* fails

* dumbbb

* no commutative in reshape

* real fix

* revert that

* sdpa symbolic tests
2026-03-02 12:19:48 +08:00
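The triu change above is about masking with a variable diagonal offset. As a plain-Python illustration (hypothetical sketch, not tinygrad's kernel code), element (i, j) of an upper-triangular mask is kept when `j - i >= diagonal`, matching the usual `triu(k=...)` convention:

```python
def triu(matrix, diagonal=0):
    # Upper-triangular select: keep element (i, j) when j - i >= diagonal,
    # zero it otherwise. A runtime-variable `diagonal` is the case the
    # commit above adds support for.
    return [[v if j - i >= diagonal else 0 for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]
```

With `diagonal=1` the main diagonal itself is zeroed, which is the strict-upper-triangle mask used for causal attention.
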
wozeparrot
a4f6365929 llama3: fstep takes grads (#15069) 2026-03-01 20:05:07 -08:00
Nick
8e8e9f6ff6 assert removal for _tri() + tests (#15073)
* assert removal for _tri() and tests

* removed import

* tests triu/tril like in prefill

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-03-02 10:34:28 +08:00
nimlgen
ccbbca05ef beam: add dev_timeout for am (#15063)
* beam: add dev_timeout for am

* all covered

* fk

* x

* fuzz

* reset

* f
2026-03-01 16:57:29 +03:00
chenyu
8cb4368967 delete unused END NOOP rule [pr] (#15077) 2026-03-01 00:09:05 -05:00
chenyu
efce99adc9 skip isComposing key press in llm.py (#15076)
for the CJK input user
2026-02-28 20:31:53 -05:00
chenyu
103ea16ec0 add contiguous back to svd (#15074)
can cause infinite loop
2026-02-28 16:49:26 -05:00
chenyu
fe0fa8333b Revert "improve Tensor.sort indices (#15070)" (#15072)
This reverts commit e3003631f2.
2026-02-28 14:40:30 -05:00
chenyu
e3003631f2 improve Tensor.sort indices (#15070)
* improve Tensor.sort indices

instead of N^2 match at the end, have an arange to start and go through the same N(logN)^2 path

* contiguous
2026-02-28 14:16:16 -05:00
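The commit body outlines the technique: rather than matching sorted values back to their original positions in O(N^2) at the end, start with an arange of indices and carry it through the sort itself. A minimal plain-Python sketch of that idea (illustrative only; the change itself was reverted in a later commit):

```python
def sort_with_indices(xs):
    # Pair each value with its original position (an "arange" of indices),
    # sort the pairs, then split values and indices back out.
    # Carrying the index through the sort avoids an O(N^2) post-hoc
    # match of sorted values against their original positions.
    pairs = sorted(zip(xs, range(len(xs))))
    values = [v for v, _ in pairs]
    indices = [i for _, i in pairs]
    return values, indices
```
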
wozeparrot
cfc5cf65ad llama3: vocab padding fix + jit copies on fakedata (#15067) 2026-02-28 08:44:55 -08:00