Commit Graph

10674 Commits

Author SHA1 Message Date
George Hotz
890897553d this work? 2025-10-20 18:35:45 +08:00
George Hotz
ec97cec952 this is better 2025-10-20 18:07:34 +08:00
George Hotz
154b6d5901 after in sym, axis_letters in range 2025-10-20 17:39:23 +08:00
George Hotz
b8a9cce783 replace NOOP with AFTER in reg init (#12804)
* after op

* fix tests

* replace NOOP with AFTER in reg init

* closer

* or_after there

* fix device

* fix all renderers

* better spec for after
2025-10-20 15:34:32 +08:00
qazal
12fd2c9c7b explicitly set ignore_indexing for schedule only (#12803) 2025-10-20 13:11:57 +08:00
qazal
734c99f722 viz: show indexing rewrites during run_rangeify (#12802)
* viz: show indexing rewrites during run_rangeify

* sinking index
2025-10-20 12:37:03 +08:00
George Hotz
2e9082e0bc after op (#12801)
* after op

* fix tests
2025-10-20 12:27:56 +08:00
qazal
339e6edb7d viz: ui prereqs for hierarchical rewrites (#12799) 2025-10-20 12:15:15 +08:00
wozeparrot
357dac8425 feat: allow tuple indexing on uops (#12797) 2025-10-19 19:11:05 -07:00
George Hotz
ba593f7b98 don't render index (#12796)
* don't render index

* update to ignore_indexing

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-10-20 09:48:36 +08:00
George Hotz
cad3ada909 tinygpu: build with SIP off works 2025-10-20 09:11:09 +08:00
nimlgen
9cd35deae7 amd: fix alignment + pointers for aql over usb (#12793) 2025-10-19 23:55:57 +08:00
nimlgen
59784a5972 amd: ensure ts is written (#12794) 2025-10-19 23:55:49 +08:00
chenyu
63a23dfe80 test step 0 in TestTrainingOnnxOps (#12790)
and tighter rtol
2025-10-19 09:15:49 -04:00
chenyu
e8158afd4b update test_qlinear_add_round_half_to_even (#12789)
this does not pass locally
2025-10-19 08:47:27 -04:00
Sieds Lykles
1df9c7d7e7 reduce_collapse uses symbolic_flat (#12766)
* sym->symbolic_flat

* cast invalid drops invalid
2025-10-19 12:27:47 +02:00
Sieds Lykles
fd6ef4801c rangeify uses symbolic_flat (#12786)
* symbolic_simple -> symbolic_flat

* remove expected failures
2025-10-19 12:27:14 +02:00
George Hotz
89e7f2fa00 mmapeak: gfx1103 support 2025-10-19 16:57:28 +08:00
George Hotz
617614beb7 add mi350x support to mmapeak (#12784) 2025-10-19 16:11:07 +08:00
qazal
c8ef4b60f6 viz: share match tracing and TINY device profiler (#12783)
* set a default name for the traces

* set profile_matches + renames

* profile_matches test

* traces 4 steps total
2025-10-19 14:30:07 +08:00
chenyu
350a4754a9 Update openpilot models (#12780)
* Update openpilot models

* Update slower model

* fix that

---------

Co-authored-by: Bruce Wayne <harald.the.engineer@gmail.com>
2025-10-18 20:32:35 -04:00
chenyu
30ff84d050 update test_conv2d_ceildiv_edge_case (#12779) 2025-10-18 16:43:32 -04:00
nimlgen
442218266d qcom: fix profiler (#12778)
* qcom: fix profiler

* this way
2025-10-19 01:27:59 +08:00
Harald Schäfer
addc54b96c Simplify openpilot compile3.py (#12748)
* Simpler compile3

* tests

* remove default args

* onnx file is still fp16

* self-test FP16 too

* allow test disable

* absurd tolerance

* Just do latest

* Try simplest

* use later models

* kernel count not relevant if speed is good

* dead improts

* Revert "dead improts"

This reverts commit f68c2cd15d.

* Revert "kernel count not relevant if speed is good"

This reverts commit 0955ca4ee0.

* add back kernel count check on latest model
2025-10-18 10:12:22 -04:00
nimlgen
037f6e8fa0 qcom: ioctl for 7xx (#12777) 2025-10-18 20:33:14 +08:00
wozeparrot
82f10cfe2e feat: assert on bufferview math (#12772) 2025-10-17 14:20:08 -07:00
chenyu
fcdf4ab37e remove a contiguous in LARS (#12770) 2025-10-17 17:07:30 -04:00
nimlgen
910d698b78 system: cleanup page sizes (#12771)
* system: cleanup page sizes

* ooops
2025-10-18 02:06:42 +08:00
George Hotz
062a6d68d7 test flash attention backward (#12762)
* test flash attention backward

* TODO: fix pcontig

* end ranges

* render colors

* very big

* multiout at every level

* reset ending ranges

* fix tests

* ugh
2025-10-17 23:15:59 +08:00
George Hotz
33025b99f6 small changes from fa backward (#12769) 2025-10-17 22:41:18 +08:00
chenyu
e0d0d4372d fix shape of m and v in onnx Adam with FUSE_OPTIM (#12768)
value is still slightly off but that's not onnx specific
2025-10-17 10:32:41 -04:00
qazal
bd662bea67 viz: light up program runs (#12764)
* basics work

* fix the color

* light up program events

* swap a with p

* better
2025-10-17 19:33:18 +08:00
George Hotz
c9a3464f76 those decimals never mattered (#12760)
* those decimals never mattered

* this

* improve debug

* real substitute fixes pcontig

* locals are different buffers
2025-10-17 17:16:24 +08:00
qazal
0160f034d6 viz: show display name for copy runners (#12761)
* viz: show display name for copy runners

* more u32
2025-10-17 16:59:51 +08:00
qazal
253d32b065 viz: add metadata to buffer user list (#12758)
* simple failing test

* encodings

* test passing

* key is deduped
2025-10-17 16:28:54 +08:00
George Hotz
935a60db72 bring back partial contig and flash attention (#12756)
* bring back partial contig and flash attention

* why not 2

* work

* that

* fix pcontig
2025-10-17 16:19:05 +08:00
Sieds Lykles
f6bc620169 UOp.prod and UOp.sum methods (#12755) 2025-10-17 10:02:01 +02:00
Sieds Lykles
d1bb5c0426 slightly flatter symbolic (#12757) 2025-10-17 09:58:45 +02:00
qazal
5417e4b099 viz helper cleanups (#12754) 2025-10-17 15:20:24 +08:00
qazal
3196a7aae3 viz: pre reqs for lighting up programs (#12753) 2025-10-17 15:03:21 +08:00
qazal
dfb8f9fc9e viz: annotate buffer mutability in the memory graph (#12750) 2025-10-17 11:53:02 +08:00
Sieds Lykles
79c2f1ae26 remove reduce_rangless and replace with reduce_unparented (#12749) 2025-10-17 04:46:05 +02:00
chenyu
9561803cb0 fix assert in test_schedule (#12745)
* fix assert in test_schedule

updated kernel counts and some old tests

* fix
2025-10-16 15:39:50 -04:00
chenyu
285534ce64 delete DONT_REALIZE_EXPAND and DONT_GROUP_REDUCES (#12744)
does nothing now
2025-10-16 14:11:33 -04:00
chenyu
98239f1156 few shapetracker cleanups (#12741) 2025-10-16 12:43:27 -04:00
chenyu
53478c741d relax ASSERT_MIN_STEP_TIME for space lab policy (#12742) 2025-10-16 11:40:36 -04:00
geohotstan
5d209ee7ec onnx helper intermediate node output validation (#12740)
* start

* update comments

* good

* add comments and better printing

* done
2025-10-16 11:17:47 -04:00
Christopher Milan
bce2bc0465 Revert "use RTLD_GLOBAL on macos" (#12738)
This reverts commit 89fe3e574d.
2025-10-16 10:07:21 -04:00
chenyu
f34f26bca0 fix gpt2 with benchmark (#12736)
`CPU=1 python3 examples/gpt2.py --benchmark 128` works now
2025-10-16 09:55:20 -04:00
Sieds Lykles
55db1b0e0e reduce where that is cut from two sides (#12733)
* better rule

* correct pattern

* shorten line
2025-10-16 15:25:15 +02:00