Commit Graph

12509 Commits

wozeparrot
be23772d43 llama3 fixes part2 (#15150) 2026-03-04 23:43:50 -08:00
wozeparrot
0c769289eb llama3: more scripts (#15107) 2026-03-04 22:18:03 -08:00
George Hotz
fb43b415f9 fix symbolic shape call + chunked prefill (#15149)
* fix precompile for symbolic shape

* chunked prefill

* cleaner

* test that
2026-03-05 14:02:26 +08:00
George Hotz
8a82b26522 llm: print the prefill cache size (#15146)
* print the llm prefill cache size

* mock that too
2026-03-05 12:13:28 +08:00
chenyu
b5370fd52d use copy_multi in alu_multi [pr] (#15143)
* use copy_multi in alu_multi [pr]

* copy to anything
2026-03-04 22:53:00 -05:00
George Hotz
72a9ed6e23 fix render depth bug + add warmup to serve + no realize default (#15144)
* fix render depth bug + add warmup to serve

* make realize not the default
2026-03-05 11:21:16 +08:00
George Hotz
ac1847cbf7 fully symbolic llm (#15097)
* work

* llm symbolic (almost)

* work

* revert that

* llm sym

* works

* cleanups

* cache tokens with the kv cache

* cleanups

* cleanups
2026-03-05 10:22:11 +08:00
qazal
33a1970045 sqtt: simplify inst mapping, validate JUMP processing in CI (#15139)
* jump cleanup

* assert there's a JUMP

* new example for JUMP

* regenerate examples

* rdna4 work

* new packets

* work

* less for branch handling

* less verbose

* fix err message
2026-03-05 09:53:12 +09:00
chenyu
04da527a7a minor div_and_mod_symbolic cleanups (#15138) 2026-03-04 19:05:44 -05:00
chenyu
106d18b792 use UOp methods in allreduce.py [pr] (#15137)
except the one line with Ops.BUFFER and Ops.NOOP, not sure what that's for
2026-03-04 17:15:33 -05:00
chenyu
34594bcaaf Revert "bug in metal: offset is stored as uint32, overflow (#15129)" (#15136)
This reverts commit 9c58db16fa.
2026-03-04 16:54:42 -05:00
Roelof van Dijk
9c58db16fa bug in metal: offset is stored as uint32, overflow (#15129)
* metal uint32 icb offset overflow

* fix: diff

* supports_exec_item

* GraphRunner.supports_exec_item

* tests

* fix: can't import on non-metal
2026-03-04 22:52:12 +03:00
chenyu
4cce283790 relax test_tqdm_perf (#15134) 2026-03-04 12:58:47 -05:00
chenyu
fae400d300 update assign tests to also test the expected behavior (#15132) 2026-03-04 11:34:43 -05:00
chenyu
1f96cc2b51 update non-contiguous buffer error message [pr] (#15131)
* update non-contiguous buffer error message [pr]

also cleaned up the tests

* order
2026-03-04 11:13:26 -05:00
nimlgen
563d5c3211 more graph tests (#15130) 2026-03-04 19:01:12 +03:00
nimlgen
cdc48da9cd hevc: assert and speed (#15122)
* hevc: assert and speed

* simpler
2026-03-04 19:01:02 +03:00
wozeparrot
4e9b85ecfd fa: pull inputs out of call (#15127) 2026-03-04 03:15:49 -08:00
George Hotz
47faa2d7b4 hotfix: llm kv cache uses clone instead of realize to avoid many realize 2026-03-04 19:07:03 +08:00
George Hotz
8ebd24637b fix fa forward building with clang 22 (#15124)
* fix fa forward building with clang 22

* fix: override rocm path

---------

Co-authored-by: Woze Parrot <wozeparrot@gmail.com>
2026-03-04 02:32:25 -08:00
Christopher Milan
592f9bf6c6 set OPENPILOT_HACKS=1 to enable replace assign (#15123) 2026-03-04 05:26:04 -05:00
wozeparrot
df23057984 fa: change bwd grid dim + unshuffle using mops (#15068) 2026-03-04 01:23:40 -08:00
Christopher Milan
5623cea7b1 move openpilot contiguous hacks to schedule (#15120) 2026-03-04 03:04:06 -05:00
wozeparrot
759c7fc81c failing test for allreduce memory usage (#15106) 2026-03-03 23:38:38 -08:00
George Hotz
5ecfe549e7 allreduce is a function with LATE_ALLREDUCE=1 (#15119)
* allreduce as a function

* allreduce function

* support allreduce function

* LATE_ALLREDUCE
2026-03-04 15:17:58 +08:00
Christopher Milan
e7e70a3c95 simplify idx before counting backward_slice (#15117) 2026-03-03 23:53:50 -05:00
George Hotz
2d72a4a90c fix copying padded const (#15116)
* fix const padding cpu

* remove comment
2026-03-04 10:39:45 +08:00
chenyu
b5ebb4d06d contiguous_view_offset returns only offset [pr] (#15113)
size is always input.size
2026-03-03 15:23:39 -05:00
nimlgen
abd830b260 am: setup_rinf returns only doorbell (#15112) 2026-03-03 19:27:41 +03:00
nimlgen
4b42bb54aa am: reset sdma to start from 0 (#15109) 2026-03-03 18:14:46 +03:00
George Hotz
01ddb4c267 add precompile to call (#15099)
* add precompile to call

* put get back

* something

* after structure

* alt

* keep it call

* resolve call

* resolve linear call

* precompile works with llm

* revert rangeify

* color for debugging

* getenv PRECOMPILE

* clean up deco pattern

* fully recursive sink scheduling

* revert llama

* fix SPEC=2
2026-03-03 22:32:42 +08:00
qazal
c7f908b788 sqtt: fix rdna4 structs (#15111)
* work

* DEBUG=2
2026-03-03 23:32:14 +09:00
qazal
8dd691761d sqtt: remove old files (#15108) 2026-03-03 22:43:24 +09:00
Christopher Milan
de043226ba benchmark comma usbgpu driving_vision step and load time (#15103)
Co-authored-by: Comma Device <device@comma.ai>
2026-03-03 06:08:03 -05:00
Christopher Milan
5f6b610da1 FLOAT16 logic for IMAGE==1 goes back to image_conv2d (#15105) 2026-03-03 05:37:57 -05:00
wozeparrot
529318259c fix: fix null tests to actually use null device (#15104) 2026-03-03 02:05:47 -08:00
George Hotz
7d025089e3 no after removal (#15102)
* no after removal

* we are using walk

* null schedule test

* pytest deps

* Revert "pytest deps"

This reverts commit 5e1c5304ec.

* Revert "null schedule test"

This reverts commit 02da66053e.

* clean null tests
2026-03-03 17:50:31 +08:00
wozeparrot
92c16810ac feat: per device mem_used (#15100) 2026-03-03 01:31:28 -08:00
qazal
e3a0598d0b viz: the whole pc should be in view (#15101) 2026-03-03 17:17:53 +09:00
b1tg
a9ea36de79 assembly/amd: v_cmp_lg_f32 is ordered not-equal (#14982) 2026-03-03 15:37:48 +08:00
wozeparrot
c35de9bd68 asm_gemm: support more sharding (#15002) 2026-03-02 23:16:37 -08:00
wozeparrot
824ba4386a llama3 dp fix (#15098) 2026-03-02 22:43:07 -08:00
chenyu
5dcf29b1a0 use clone in test_swap_slices (#15096) 2026-03-02 22:05:12 -05:00
Christopher Milan
c70e8af068 move IMAGE FLOAT16 logic to allocations (#15095)
* FLOAT16 logic in allocations

* cleanup

* separate that

* only apply when IMAGE == 1

* test passing now

* create image buffers earlier
2026-03-02 22:00:05 -05:00
George Hotz
d483e4153a buffer view is like buffer (#15082)
* buffer view is like buffer

* fix

* swap_reshape_shrink

* contiguous on gguf, fix overlap

* revert that

* _device_supports_view

* this

* fix that test

* 0 buffers

* that test was wrong

* this

* check correct size

* contig BUFFER_VIEW

* this

* fix tests

* buffer view tests

* om

* fix torch

* no MOCKGPU

* skip
2026-03-03 09:52:33 +08:00
qazal
62ee976c1b gemm/asm: cleanup repeated patterns to helper functions (#15094) 2026-03-03 08:14:47 +09:00
qazal
848f5cea96 viz: sqtt instruction packet trace (#15065) 2026-03-03 07:55:04 +09:00
chenyu
14d1c5fdfd assign fusion tests on detach and contiguous_backward (#15092) 2026-03-02 15:21:51 -05:00
nimlgen
dfa180413d tbgpu: sign nv (#15087) 2026-03-02 22:58:30 +03:00
chenyu
71f228f80f test exact kernel count in torch_backend/test_kernel_fusion (#15091) 2026-03-02 14:26:32 -05:00