chenyu
4e5a9132e7
JIT_BATCH_SIZE=0 in compile3 (#13245)
fixed some enqueue time
2025-11-12 23:12:45 -05:00
chenyu
41e45c20ff
minor stuff reading the printed code [pr] (#13177)
2025-11-09 00:58:51 -05:00
chenyu
834067d91f
move onnx import in compile3 (#13172)
only used in test_vs_onnx
2025-11-08 09:44:34 -08:00
Harald Schäfer
587ccc0e5c
compile3: make selftests opt-in (#12851)
2025-10-21 11:32:27 -07:00
wozeparrot
990e8b97ee
feat: log openpilot 0.10.1 times (#12816)
2025-10-20 18:30:34 -07:00
Harald Schäfer
addc54b96c
Simplify openpilot compile3.py (#12748)
* Simpler compile3
* tests
* remove default args
* onnx file is still fp16
* self-test FP16 too
* allow test disable
* absurd tolerance
* Just do latest
* Try simplest
* use later models
* kernel count not relevant if speed is good
* dead improts
* Revert "dead improts"
This reverts commit f68c2cd15d.
* Revert "kernel count not relevant if speed is good"
This reverts commit 0955ca4ee0.
* add back kernel count check on latest model
2025-10-18 10:12:22 -04:00
George Hotz
612e3d6143
replace mop arg with vectorized index (#12695)
* replace mop arg with vectorized index
* tests passing
* better viz
* no compile4
2025-10-15 20:50:06 +08:00
nimlgen
658c566e22
vars in gated_read_image_count (#12486)
* vars in gated_read_image_count
* nc
2025-10-09 14:54:15 +08:00
chenyu
be05028419
move ASSERT_MIN_STEP_TIME to compile3 (#12535)
threshold is current time +20%
2025-10-08 22:16:59 -04:00
qazal
7e0b14243e
delete grouper and kernelize (#12517)
* delete grouper and kernelize
* +sys.setrecursionlimit
2025-10-08 12:27:26 +03:00
George Hotz
0f25b4b289
move frontend dir to nn [pr] (#12470)
2025-10-07 10:42:22 +08:00
qazal
1af05dae77
fix rangeify in compile4.py (#12467)
* fix rangeify in compile4.py
* fix type_verify
2025-10-06 13:37:46 +03:00
chenyu
0e266f376c
ops_gpu -> ops_cl (#12103)
2025-09-10 15:15:48 -04:00
George Hotz
842184a1ab
rename kernelize to schedule, try 2 (#11305)
2025-07-21 11:18:36 -07:00
chenyu
85ddd72038
simpler grouptop in hcopt (#11219)
* simpler grouptop in hcopt
keep the only perf relevant conditions and the rest is handled by try except
* update openpilot read image count
2025-07-13 16:06:09 -04:00
geohotstan
5ce278b245
OnnxRunner file as input (#10789)
* file path as input and have parse be in OnnxRunner.__init__
* modelproto_to_onnxrunner -> modelproto_to_runner
* whoops, fix import
* oh flakiness again, is it because it's getting gc-ed?
* small changes
* CI flaky so just move compile4 fix in
* copy typing of onnx_load
* actually can just import onnx_load instead of onnx.load
* fix external_benchmark_openpilot
* fix onnx_runner test to use onnx_helper
* rerun CI
* try run_modelproto
* spam CI a few times
* revert run_modelproto since that's flaky also
* no external onnx_load usage except onnx.py
* cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why?
* model_benchmark 193s -> 80s, add OnnxRunner.to()...
* minimize diff and clean up
* device can be None, weird but eh
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-12 14:27:46 -04:00
geohotstan
50936b4a18
ONNX real float16 (#10694)
* squash commits
* temp fix for const tensor
* actually realizing float16 can only happen in raw_data
* .float -> cast(float) to rerun CI
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-26 14:05:12 -04:00
George Hotz
b41e0563a3
move stuff to kernelize folder (#10902)
* move stuff to kernelize folder
* oops, forgot that
2025-06-20 16:10:20 -07:00
George Hotz
cba6e15937
split grouper and kernelize [pr] (#10854)
2025-06-17 17:54:20 -07:00
chenyu
7d5c769c6b
fix compile4 (#10797)
2025-06-12 22:28:56 -04:00
b1tg
24d328e313
onnx parser (#10435)
* onnx parser
* fix compile, lint
* onnx.load -> onnx_load
* compatible with ModelProto
* fix test external_test_onnx_ops.py
* fix tests
* fix signed int
* reduce to 261 lines
* fix TypeProto.Optional
* debug for _parse_message, add TypeProto.Sequence, cleanup
* onnx_load from Tensor
* remove BufferedReader
* 174 lines and reduce tensor copy
* cleanup
* use onnx_load in external_model_benchmark.py
* fix qcom test
* [onnx] parser support external data
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-09 12:44:28 -04:00
George Hotz
32e9949052
rename lazydata to uop (#10698)
2025-06-08 08:42:22 -07:00
George Hotz
0d39bb5de1
rename to get_kernelize_map (#10465)
2025-05-22 11:44:44 -07:00
George Hotz
577a0b4cfa
openpilot compile4 (wip) (#10407)
* openpilot compile4
* add copies
* remove junk
2025-05-22 10:47:34 -07:00
George Hotz
74d98eafb8
add onnx frontend stub [pr] (#9558)
2025-03-24 12:24:34 +08:00
ZwX1616
c977781b3c
no numpy change if no NPY (#9281)
* skip np change check if no NPY
* use any
2025-02-28 09:32:35 +08:00
George Hotz
8b16c65bca
add compile3 benchmark [pr] (#8929)
2025-02-06 22:49:31 +08:00
geohotstan
dd82b4c913
make onnx runner a class (#8647)
* this
* clean up
* more clean ups and improve debug msg
* more correct training toggler
* remove manual training toggling
* change some variable names
* actually just add the training toggle for LIMIT envvar too
* more refinement
* __call__ and OnnxRunner
* fix half pylint, other half is importing from onnx while this file is onnx.py, figure out later
* ahhhh found another mistake
* remove limit from __call__
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-20 10:11:05 -08:00
Harald Schäfer
7059459648
Openpilot compile: fix for openpilot use (#8338)
* compile3 changes
* merge conflict
* merge conflict
* give dm npy for now
* Revert "give dm npy for now"
This reverts commit bfd980da7d2c2bab5b073127442c361922032ba1.
* updates
* Always float32 floats
* Update compile3.py
* Update compile3.py
---------
Co-authored-by: ZwX1616 <zwx1616@gmail.com>
2024-12-19 19:43:15 -05:00
chenyu
26e049ab40
add ALLOWED_READ_IMAGE=2131 to openpilot (#8166)
added as exact number check now as it's not clear if more/less than allowed is any better
2024-12-11 12:14:17 -08:00
George Hotz
f83d715f41
move checks into compile3, delete compile2 [pr] (#8127)
* move checks into compile3 [pr]
* test_vs_onnx
* test v torch works
* float16 won't compile on compile3
* actually delete compile2
2024-12-09 14:21:42 -08:00
George Hotz
00ac0db9d4
np tensors have the memory from numpy in compile3 [pr] (#8098)
2024-12-07 14:01:51 +08:00
George Hotz
22feb3a2f1
move copy into the JIT for openpilot compile3 (#7937)
* move copy into the JIT, test fails
* ahh, prune was the issue
2024-12-07 13:26:26 +08:00
George Hotz
fbb4099b3c
add test for compile3 [pr] (#7783)
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-19 19:26:51 +08:00
Harald Schäfer
e7cbc29f48
openpilot benchmark: add cast from numpy to benchmark (#7593)
* openpilot benchmark: add cast from numpy to benchmark
* whitespace
* comment
2024-11-08 19:31:00 +08:00
George Hotz
c8bf09b7d4
s/UOps/Ops (#7500)
* s/UOps/Ops [pr]
* fix
2024-11-03 11:26:10 +08:00
George Hotz
72a9ac27e9
support image dtype in cloud [pr] (#7482)
* support image dtype in cloud [pr]
* remove outdated osx hack
* unused imports
2024-11-02 23:54:27 +08:00
George Hotz
26df50cf43
move memory_planner to memory.py [pr] (#7079)
2024-10-16 10:04:35 +08:00
George Hotz
5c9f76e274
hotfix: openpilot compile3 compare to i==1
2024-10-12 09:44:24 +08:00
George Hotz
f45d178a55
hotfix: support JIT_BATCH_SIZE=0, make that the default
2024-09-25 10:36:04 +08:00
George Hotz
b9e6d42a1f
Revert "gated native math in OpenCL (#6683)" (#6691)
This reverts commit 2fe3eeed17.
2024-09-24 08:48:10 +08:00
George Hotz
2fe3eeed17
gated native math in OpenCL (#6683)
* gated native math
* Update cstyle.py
2024-09-23 19:22:13 +08:00
chenyu
b14c1bc417
UOps.RANGE is_increasing (#6615)
* UOps.RANGE is_increasing
283 -> 47 valids
* test
2024-09-20 03:14:52 -04:00
George Hotz
d02bb270b7
add copyin copyout for image on GPU [run_process_replay] (#6580)
* add copyin copyout for image on GPU [run_process_replay]
* add timing
* enqueue vs total run
* it's failing but that's fine
2024-09-18 16:06:20 +08:00
George Hotz
d4b662c318
new openpilot compile (#6573)
* new openpilot compile
* note, copyout doesn't work for images
2024-09-18 14:22:50 +08:00
chenyu
798be6bb74
add gated read_image count in openpilot compile2 (#6546)
530 to go
2024-09-16 21:17:00 -04:00
qazal
28c75bf2a6
merge uops with ops (#6111)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-08-16 18:17:57 -04:00
qazal
c23d44c779
AST is UOp (#6030)
* most of the work from the uops2 branch
* schedule
* realize
* kernel
* lowerer
* search
* green
* merge uops with ops
* Revert "merge uops with ops"
This reverts commit 1408a59f12.
* fix benchmark
* remove extra dedup
2024-08-16 22:09:00 +03:00
George Hotz
e077bc7baf
move memory planner to realize (#5937)
2024-08-06 10:41:29 -07:00
George Hotz
fa7e734b49
MetaOps.KERNEL (#5543)
2024-07-17 19:41:23 -07:00