George Hotz
dc77b3318b
move files that pass with NULL=1 to test/null (#14508)
...
* move files that pass with NULL=1 to test/null
* fix windows
* cpu 0
* bugfix + durations
2026-02-03 13:52:36 +08:00
George Hotz
85c7b23160
add pytest -nauto to benchmark for mac (#14458)
...
* add pytest -nauto to benchmark
* 3 minute timeout
* 3 min
* setup env
* comment
* fresh db
* in the pyenv
2026-02-03 12:26:09 +08:00
nimlgen
544928766d
hcq_smi: kill mac pids (#14398)
2026-01-28 15:00:28 +03:00
George Hotz
52b989c6c8
don't place consts early + fixes from anthropic challenge (#14286)
...
* don't place consts early
* add anthropic challenge
* with ref
* do we still have to devectorize bools?
* tests pass
* just WHERE
* fine, revert that
* fine, revert
* only index
* z3 validator doesn't support vectorized
* Revert "z3 validator doesn't support vectorized"
This reverts commit 1b7930ecb3.
* z3 not for vec
* no spec
* VLIWRenderer
* loop unrolling
* better comments
* cleanups
* skip cast
* renderer
* cleanups
* prints
* no hack
* hacks
* bump to 11
* reg warning
* lil clean
* cleaner renderer
2026-01-23 10:48:39 +09:00
chenyu
dc4ae7dd08
lower ASSERT_MIN_STEP_TIME for driving_policy to 3ms (#14184)
...
seems quite stable at 2.7ms now
2026-01-16 15:04:53 -05:00
nimlgen
f9147422a3
ci: add setcap (#14143)
2026-01-14 13:15:01 +03:00
nimlgen
e372c841ba
hevc: beam in decode (#14067)
...
* hevc: beam in decode
* fine
* g
2026-01-08 15:47:16 +03:00
Christopher Milan
61dc70f1a8
add driving_vision IMAGE=1 benchmark (#13979)
2026-01-02 13:58:27 -05:00
chenyu
ce84a23142
remove tee in benchmark (#13954)
2026-01-01 10:55:36 -05:00
chenyu
f5090192c8
reorder AMD tensor core benchmark test (#13860)
...
* reorder AMD tensor core benchmark test
* disable that
2025-12-28 12:29:51 -05:00
George Hotz
4702da41d5
hotfix: mkdir for extra/disassemblers
2025-12-19 17:18:37 -04:00
Christopher Milan
97103831c5
Revert "remove image from BufferSpec (#13636)" (#13761)
...
This reverts commit 2571a1eb47.
2025-12-19 13:54:36 -05:00
Christopher Milan
2571a1eb47
remove image from BufferSpec (#13636)
...
* remove image from BufferSpec
* cl tiny_gemm (64) works
* mypy
* padding
* openpilot CL
* reshape properly
* remove extra qcom checks
* pad output
* mypy
* update compile test
* move undo
* TestImageCopy valid images
* TestImageRealization valid images
* TestImageDType valid images
* cleanups
* test_renderer_failures
* ruff
* mypy
* simplify ops_qcom
* bump step time
2025-12-19 13:41:20 -05:00
George Hotz
4b741e893f
remove REMOTE=1 (#13722)
...
* remove REMOTE=1
* leave ibverbs
2025-12-16 15:58:10 -04:00
George Hotz
7589c897b2
split usbgpu tests into their own benchmark [pr] (#13711)
2025-12-15 21:42:40 -04:00
qazal
6bafd90248
remove unused process replay input [pr] (#13712)
2025-12-16 09:29:35 +08:00
nimlgen
cbae33003d
ci: add usb4 (#13643)
...
* ci: add usb4
* debug=3
* undef
* revert
2025-12-11 19:41:41 +03:00
chenyu
2471b49e45
minor bert / llama change from grad acc branch (#13622)
...
* minor bert / llama change from grad acc branch
* revert those
2025-12-08 16:04:14 -05:00
chenyu
ac1227575f
IMAGE=1 driving_vision in benchmark (#13587)
2025-12-05 10:20:54 -05:00
chenyu
8902781dc1
enable more benchmarks (#13540)
...
* enable more benchmarks
* disable some
* adjust ASSERT_MIN_STEP_TIME
* mac NOCLANG=1
2025-12-02 20:31:14 -05:00
nimlgen
455dd88236
nv: minimal hevc (#13502)
...
* nv: minimal hevc
* validate
* not needed
* tralin
* var
* cpu
* fxi
* desc
* move
* cleanup
2025-11-30 16:46:55 +03:00
wozeparrot
1f648bb1ba
feat: reenable mobilenetv2 dsp (#13320)
2025-11-21 15:21:49 -08:00
chenyu
6372c95094
disable benchmark MobileNetV2 on DSP (#13305)
...
failed on tinyc2
2025-11-16 09:42:52 -05:00
Harald Schäfer
3af231904e
openpilot compile tests: assert pre-rangify speeds (#12775)
...
* assert pre-rangify speeds
* typo
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 09:39:06 -08:00
chenyu
3f939f3d3c
update pm_simplify_valid (#13241)
...
* update pm_simplify_valid
fixed openpilot conv regression
* IMAGE training is broken
2025-11-12 19:40:02 -05:00
George Hotz
ab9fa964d8
DISABLE_COMPILER_CACHE -> CCACHE (#13234)
...
* DISABLE_COMPILER_CACHE -> CCACHE
* Fix cachekey assignment in Compiler constructor
2025-11-12 15:07:09 -08:00
chenyu
23b90945c3
add a benchmark for openpilot vision with DEBUG=2 (#13219)
...
see per kernel speed, also disable the jobs for 0.9.9
2025-11-11 14:41:52 -05:00
chenyu
6c48c87e51
improved ASSERT_MIN_STEP_TIME (#13182)
...
* improved ASSERT_MIN_STEP_TIME
getting close, current time +1ms then round up
* relax
2025-11-09 16:41:12 -05:00
George Hotz
42b34cf83d
bottom up linearizer (#13133)
...
* bottom up linearizer
* late stores
* more complete
* remove broken heuristic
* upcast size
* opt
* more conservative
* it needs that
* disable opencl half on QCOM
* fix
* make that a real test
* cpu test okay
* ptx skip
* end is after the range
2025-11-06 15:30:32 -08:00
chenyu
54141e9cb9
DISABLE_COMPILER_CACHE=1 in speed_v_theoretical (#13096)
2025-11-04 11:28:18 -05:00
George Hotz
5eb87ab131
hotfix: bump cifar time to 350
2025-10-30 17:29:20 +08:00
b1tg
bb307b9e81
fix fp8 vectorization (#12977)
...
* fix fp8 vectorization
* add fp8 tc to benchmark
2025-10-28 13:55:30 -04:00
b1tg
45e2f916a3
add quantize fp8 in llama3 (#12893)
...
* add quantize fp8 in llama3
* don't truncate fp8 alu result
* cast to float32 before matmul
* --model weights/LLaMA-3/8B-SF-DPO/
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-27 10:22:57 -04:00
wozeparrot
6e00dec95d
feat: pin openpilot 0.10.1 models (#12878)
2025-10-22 14:57:54 -07:00
chenyu
f0831c8c30
add 0.10.0 to comma benchmark (#12875)
...
* add 0.10.0 to comma benchmark
disabled the 0.10.1 ones which are pinned to master. it does not work because benchmark uses the cached old version
* that's pinned
2025-10-22 15:18:21 -04:00
George Hotz
726988fa4b
late ifs try 2 (#12865)
...
* late ifs try 2
* fix image
* fix that test
* panic
* ptx fixups
* preserve toposort
* those pass locally
* Revert "those pass locally"
This reverts commit 063409f828.
* no ls
* make that explicit
2025-10-22 18:49:27 +08:00
chenyu
6d86e962c7
update ASSERT_MIN_STEP_TIME (#12857)
...
0.10.1 driving_policy is good now, still need driving_vision and dmonitoring to be fast
2025-10-21 22:46:07 -04:00
wozeparrot
62e7b8b870
feat: just use compile3 (#12849)
2025-10-21 07:56:50 -07:00
wozeparrot
990e8b97ee
feat: log openpilot 0.10.1 times (#12816)
2025-10-20 18:30:34 -07:00
chenyu
350a4754a9
Update openpilot models (#12780)
...
* Update openpilot models
* Update slower model
* fix that
---------
Co-authored-by: Bruce Wayne <harald.the.engineer@gmail.com>
2025-10-18 20:32:35 -04:00
Harald Schäfer
addc54b96c
Simplify openpilot compile3.py (#12748)
...
* Simpler compile3
* tests
* remove default args
* onnx file is still fp16
* self-test FP16 too
* allow test disable
* absurd tolerance
* Just do latest
* Try simplest
* use later models
* kernel count not relevant if speed is good
* dead improts
* Revert "dead improts"
This reverts commit f68c2cd15d.
* Revert "kernel count not relevant if speed is good"
This reverts commit 0955ca4ee0.
* add back kernal count check on latest model
2025-10-18 10:12:22 -04:00
chenyu
285534ce64
delete DONT_REALIZE_EXPAND and DONT_GROUP_REDUCES (#12744)
...
does nothing now
2025-10-16 14:11:33 -04:00
chenyu
53478c741d
relax ASSERT_MIN_STEP_TIME for space lab policy (#12742)
2025-10-16 11:40:36 -04:00
chenyu
b8cf35fb77
print macOS version in CI (#12705)
2025-10-15 15:05:33 -04:00
chenyu
89df6f611d
reenable sdxl mac benchmark (#12680)
...
also updated faster sd step times
2025-10-14 17:36:17 -04:00
Sieds Lykles
e625c27598
update min step times openpilot (#12600)
2025-10-10 11:24:27 +02:00
chenyu
be05028419
move ASSERT_MIN_STEP_TIME to compile3 (#12535)
...
threshold is current time +20%
2025-10-08 22:16:59 -04:00
chenyu
5986d656a2
tighter ASSERT_MIN_STEP_TIME (#12531)
...
set to about 1.2x of actual time now
2025-10-08 21:22:54 -04:00
George Hotz
3b0b3a2e64
fast RANGEIFY (#12504)
...
* rtoposort is fast, can replace rangeify with this
* fast rangeify
* work
* fast rangeify works for mnist
* should work
* progress
* pad fix
* FAST
* tests passing
* don't delete those shape ops
* put in rangeify map
* ending ranges fix
* tests
* mstack/mselect no hacks
* move to indexing.py
* touch up tests + add comments
* disable failing test
* actually make the file readable
* failing
* error
2025-10-08 19:38:06 +08:00
chenyu
eb3bc277b3
remove ASSERT_MIN_STEP_TIME in external_benchmark_openpilot (#12495)
...
should add for compile3 and compile3 only
2025-10-07 22:13:42 -04:00