11106 Commits

Author SHA1 Message Date
George Hotz
617614beb7 add mi350x support to mmapeak (#12784) 2025-10-19 16:11:07 +08:00
qazal
c8ef4b60f6 viz: share match tracing and TINY device profiler (#12783)
* set a default name for the traces

* set profile_matches + renames

* profile_matches test

* traces 4 steps total
2025-10-19 14:30:07 +08:00
chenyu
350a4754a9 Update openpilot models (#12780)
* Update openpilot models

* Update slower model

* fix that

---------

Co-authored-by: Bruce Wayne <harald.the.engineer@gmail.com>
2025-10-18 20:32:35 -04:00
chenyu
30ff84d050 update test_conv2d_ceildiv_edge_case (#12779) 2025-10-18 16:43:32 -04:00
nimlgen
442218266d qcom: fix profiler (#12778)
* qcom: fix profiler

* this way
2025-10-19 01:27:59 +08:00
Harald Schäfer
addc54b96c Simplify openpilot compile3.py (#12748)
* Simpler compile3

* tests

* remove default args

* onnx file is still fp16

* self-test FP16 too

* allow test disable

* absurd tolerance

* Just do latest

* Try simplest

* use later models

* kernel count not relevant if speed is good

* dead improts

* Revert "dead improts"

This reverts commit f68c2cd15d.

* Revert "kernel count not relevant if speed is good"

This reverts commit 0955ca4ee0.

* add back kernal count check on latest model
2025-10-18 10:12:22 -04:00
nimlgen
037f6e8fa0 qcom: ioctl for 7xx (#12777) 2025-10-18 20:33:14 +08:00
wozeparrot
82f10cfe2e feat: assert on bufferview math (#12772) 2025-10-17 14:20:08 -07:00
chenyu
fcdf4ab37e remove a contiguous in LARS (#12770) 2025-10-17 17:07:30 -04:00
nimlgen
910d698b78 system: cleanup page sizes (#12771)
* system: cleanup page sizes

* ooops
2025-10-18 02:06:42 +08:00
George Hotz
062a6d68d7 test flash attention backward (#12762)
* test flash attention backward

* TODO: fix pcontig

* end ranges

* render colors

* very big

* multiout at every level

* reset ending ranges

* fix tests

* ugh
2025-10-17 23:15:59 +08:00
George Hotz
33025b99f6 small changes from fa backward (#12769) 2025-10-17 22:41:18 +08:00
chenyu
e0d0d4372d fix shape of m and v in onnx Adam with FUSE_OPTIM (#12768)
value is still slightly off but that's not onnx specific
2025-10-17 10:32:41 -04:00
qazal
bd662bea67 viz: light up program runs (#12764)
* basics work

* fix the color

* light up program events

* swap a with p

* better
2025-10-17 19:33:18 +08:00
George Hotz
c9a3464f76 those decimals never mattered (#12760)
* those decimals never mattered

* this

* improve debug

* real substitute fixes pcontig

* locals are different buffers
2025-10-17 17:16:24 +08:00
qazal
0160f034d6 viz: show display name for copy runners (#12761)
* viz: show display name for copy runners

* more u32
2025-10-17 16:59:51 +08:00
qazal
253d32b065 viz: add metadata to buffer user list (#12758)
* simple failing test

* encodings

* test passing

* key is deduped
2025-10-17 16:28:54 +08:00
George Hotz
935a60db72 bring back partial contig and flash attention (#12756)
* bring back partial contig and flash attention

* why not 2

* work

* that

* fix pcontig
2025-10-17 16:19:05 +08:00
Sieds Lykles
f6bc620169 UOp.prod and UOp.sum methods (#12755) 2025-10-17 10:02:01 +02:00
Sieds Lykles
d1bb5c0426 slightly flatter symbolic (#12757) 2025-10-17 09:58:45 +02:00
qazal
5417e4b099 viz helper cleanups (#12754) 2025-10-17 15:20:24 +08:00
qazal
3196a7aae3 viz: pre reqs for lighting up programs (#12753) 2025-10-17 15:03:21 +08:00
qazal
dfb8f9fc9e viz: annotate buffer mutability in the memory graph (#12750) 2025-10-17 11:53:02 +08:00
Sieds Lykles
79c2f1ae26 remove reduce_rangless and replace with reduce_unparented (#12749) 2025-10-17 04:46:05 +02:00
chenyu
9561803cb0 fix assert in test_schedule (#12745)
* fix assert in test_schedule

updated kernel counts and some old tests

* fix
2025-10-16 15:39:50 -04:00
chenyu
285534ce64 delete DONT_REALIZE_EXPAND and DONT_GROUP_REDUCES (#12744)
does nothing now
2025-10-16 14:11:33 -04:00
chenyu
98239f1156 few shapetracker cleanups (#12741) 2025-10-16 12:43:27 -04:00
chenyu
53478c741d relax ASSERT_MIN_STEP_TIME for space lab policy (#12742) 2025-10-16 11:40:36 -04:00
geohotstan
5d209ee7ec onnx helper intermediate node output validation (#12740)
* start

* update comments

* good

* add comments and better printing

* done
2025-10-16 11:17:47 -04:00
Christopher Milan
bce2bc0465 Revert "use RTLD_GLOBAL on macos" (#12738)
This reverts commit 89fe3e574d.
2025-10-16 10:07:21 -04:00
chenyu
f34f26bca0 fix gpt2 with benchmark (#12736)
`CPU=1 python3 examples/gpt2.py --benchmark 128` works now
2025-10-16 09:55:20 -04:00
Sieds Lykles
55db1b0e0e reduce where that is cut from two sides (#12733)
* better rule

* correct pattern

* shorten line
2025-10-16 15:25:15 +02:00
nimlgen
cf9baeea61 Revert "nv: check if jitlink is avail (#12731)" (#12735)
This reverts commit a069a45d14.
2025-10-16 20:41:49 +08:00
George Hotz
8be7844b2e use apply uop for assign to fix assign metadata (#12732)
* use apply uop for assign

* fix metadata for assign

* fix backward metadata

* those aren't real tests
2025-10-16 20:34:12 +08:00
nimlgen
3aa2277b8f nv: usb4 (#12696)
* hackish

* prog

* match

* l

* simpler

* refactor

* not osx

* apple things

* tiny changes

* fix mask

* match fix

* nn
2025-10-16 20:11:19 +08:00
nimlgen
a069a45d14 nv: check if jitlink is avail (#12731) 2025-10-16 19:58:50 +08:00
George Hotz
a498ec9c18 cleanup names of postrange + fast FUSE_OPTIM (#12730)
* cleanup names of postrange

* make FUSE_OPTIM not slow

* delete junk in def r
2025-10-16 19:38:31 +08:00
Sieds Lykles
8f740e07ff no broadcasting/vectors in reduce collapse (#12729) 2025-10-16 13:22:57 +02:00
qazal
533f18b22c viz: add trace data for inflight buffers (#12728)
* viz: add trace data for inflight buffers

* add test_inflight_buf

* temp stores the keys

* update tests / use Tensor.ones
2025-10-16 19:15:03 +08:00
George Hotz
af4479c169 faster stable diffusion load (#12725)
* faster stable diffusion load

* failing tests
2025-10-16 18:31:59 +08:00
nimlgen
e7c057d5dc system: alloc_sysmem return view (#12724)
* system: alloc_sysmem return view

* e
2025-10-16 17:55:01 +08:00
nimlgen
b86a33a312 ptx: support bw (#12722) 2025-10-16 15:38:08 +08:00
nimlgen
b8cd66c7a2 nv: support all gb20x and small bar (#12721) 2025-10-16 15:37:54 +08:00
George Hotz
1d1e1d9d88 delete the ShapeTracker (#12720)
* delete the ShapeTracker

* fix tests

* fix more

* fix gc test
2025-10-16 15:36:22 +08:00
George Hotz
592e86f6f5 remove UOp.st (#12716)
* remove UOp.st

* fix tests

* torch backend disable
2025-10-16 14:44:09 +08:00
wozeparrot
cc2dfe22f5 tinyfs: fetch file utility (#12719) 2025-10-15 23:38:56 -07:00
nimlgen
3ed543f956 system: reorder funcs + barrier on macos (#12714) 2025-10-16 14:38:01 +08:00
qazal
b77bdbbc62 viz: count unpickle in server startup time (#12715)
* viz: count unpickle in server startup time

* type checking
2025-10-16 13:07:46 +08:00
George Hotz
7c19db00f1 remove st from jit/split_reduceop (#12713)
* remove st from jit

* fix by merging reshapes

* no st usage in rangeify

* hmm, stop early works

* fix speed regressions
2025-10-16 12:50:58 +08:00
qazal
069177c1be trace buffer producer and consumers (#12639)
* trace buffer producer and consumers

* work

* generic colored util

* fix batched

* basic clicking works

* generic javascript that works for producer and consumers

* keep focused shape

* idle time

* timings for producer and consumers dedup

* from sd test

* tiny cleanups

* timeline

* work

* up to here

* assert

* list it

* work
2025-10-16 11:11:31 +08:00