Christopher Milan
4043489803
set curl -f in setup-tinygrad (#13389)
* set curl -f in setup-tinygrad
* test bad redirect
* Revert "test bad redirect"
This reverts commit ad945e7ffc.
2025-11-20 13:45:47 -05:00
Christopher Milan
0901a40685
Revert "autogen: fix formatting on zero-argument function-like macros (#13386)" (#13387)
This reverts commit 58d85d4bab.
2025-11-20 12:45:35 -05:00
Christopher Milan
58d85d4bab
autogen: fix formatting on zero-argument function-like macros (#13386)
* fix formatting on zero-argument function-like macros
* autogen tests should run
* ugh
2025-11-20 12:11:04 -05:00
Roelof van Dijk
0dc2ff431d
fix: revive torch backend (#13280)
* fix: revive torch backend
* as_strided view vs copy
* Revert "as_strided view vs copy"
This reverts commit 82a61223f2.
* add extra tests (move inplace, add fusion tests)
* better fusion with inplace_op
* no optimizer hooks (break mnist training fusion)
* split off fusion tests in separate file, assert on resnet fusion
fix: remove comments
* cleanup, reduce diff
* reduce diff
* better fusion and identity checks
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-19 15:26:50 -08:00
George Hotz
1a332afa76
spec test on 3.14 (#12957)
2025-11-19 00:43:04 -08:00
chenyu
6372c95094
disable benchmark MobileNetV2 on DSP (#13305)
failed on tinyc2
2025-11-16 09:42:52 -05:00
Christopher Milan
5b823af696
Remove (pypi) clang dep for autogen (#13284)
* no more clang
* regen comgr_3
* ci doesn't need pypi clang
* fix objc
* REGEN for libclang
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-15 09:05:11 -08:00
George Hotz
df53c62a9f
bump line count
2025-11-15 08:16:20 -08:00
Christopher Milan
d1bb08c5a1
In-tree autogen: objective c (#13223)
* checkout changes from autogen branch
* move assert
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-14 14:08:42 -08:00
nimlgen
14eb48b13a
autogen: rename nv_gpu to nv_570 (#13273)
* autogen: rename nv_gpu to nv_570
* rename
2025-11-14 20:07:19 +08:00
George Hotz
44d84228ff
move comgr_3 logic back to the old place (#13266)
* move comgr_3 logic back to the old place
* explicit
2025-11-13 20:05:54 -08:00
Christopher Milan
09f3aae169
In-tree autogen: all C libraries (#13220)
* checkout files from autogen branch
* ioctl with payload
* fix am generations
* properly fix generations
This reverts commit b2a54f4f41.
* revert discovery.h
* support pragma pack(1)
* typo
* better getter
* typo
* NVCEC0_QMDV05_00_RELEASE[01]_ENABLE
* align support
* anon handling fix
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 18:57:44 -08:00
Harald Schäfer
3af231904e
openpilot compile tests: assert pre-rangify speeds (#12775)
* assert pre-rangify speeds
* typo
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 09:39:06 -08:00
George Hotz
263b724143
one cache and bump it (#13258)
2025-11-13 07:33:31 -08:00
chenyu
3f939f3d3c
update pm_simplify_valid (#13241)
* update pm_simplify_valid
fixed openpilot conv regression
* IMAGE training is broken
2025-11-12 19:40:02 -05:00
George Hotz
ab9fa964d8
DISABLE_COMPILER_CACHE -> CCACHE (#13234)
* DISABLE_COMPILER_CACHE -> CCACHE
* Fix cachekey assignment in Compiler constructor
2025-11-12 15:07:09 -08:00
Christopher Milan
41a098a82d
In-tree autogen: libc.py (#13217)
* checkout changes from autogen branch
* parents
* pylint happy
* move sys to system in helpers.py
* typo
* typo
2025-11-11 19:13:48 -08:00
chenyu
23b90945c3
add a benchmark for openpilot vision with DEBUG=2 (#13219)
see per kernel speed, also disable the jobs for 0.9.9
2025-11-11 14:41:52 -05:00
Gaétan Lepage
6fd7ce3832
migrate to pyproject.toml (#13189)
* migrate to pyproject.toml
* move mypy config to pyproject.toml
2025-11-11 09:09:27 -08:00
chenyu
60e55d9a2d
line count 18500 (#13191)
2025-11-10 13:52:13 -05:00
chenyu
6c48c87e51
improved ASSERT_MIN_STEP_TIME (#13182)
* improved ASSERT_MIN_STEP_TIME
getting close, current time +1ms then round up
* relax
2025-11-09 16:41:12 -05:00
chenyu
e1d46de8f8
update GROUPTOP heuristic more (#13178)
reverts #13176
2025-11-09 02:31:12 -05:00
chenyu
8e868dced8
only GROUPTOP one reduce kernel (#13176)
* only GROUPTOP one reduce kernel
* ALLOWED_GATED_READ_IMAGE=148
2025-11-08 22:38:44 -05:00
George Hotz
42b34cf83d
bottom up linearizer (#13133)
* bottom up linearizer
* late stores
* more complete
* remove broken heuristic
* upcast size
* opt
* more conservative
* it needs that
* disable opencl half on QCOM
* fix
* make that a real test
* cpu test okay
* ptx skip
* end is after the range
2025-11-06 15:30:32 -08:00
chenyu
54141e9cb9
DISABLE_COMPILER_CACHE=1 in speed_v_theoretical (#13096)
2025-11-04 11:28:18 -05:00
chenyu
ddf01fdb15
revert mlperf.yml setting (#13080)
2025-11-03 15:24:13 -05:00
chenyu
a317d6e625
extra/amdpci/setup_python_cap.sh (#13070)
2025-11-02 19:19:36 -05:00
chenyu
ad501ce50a
mlperf cron install tqdm (#13069)
one more...
2025-11-02 18:09:27 -05:00
chenyu
2c8d619147
mlperf cron install influxdb3-python (#13068)
2025-11-02 17:55:40 -05:00
chenyu
4c22f089fc
mlperf cron install tensorflow try 2 (#13067)
2025-11-02 17:11:01 -05:00
chenyu
c58cf91850
mlperf cron install tensorflow (#13066)
2025-11-02 16:48:05 -05:00
chenyu
74db65cf72
update mlperf bert LOGMLPERF (#13065)
2025-11-02 15:26:37 -05:00
chenyu
b18293de96
train bert in mlperf cron (#13064)
...
more relevant now
2025-11-02 15:04:02 -05:00
George Hotz
036ee9f84c
Self type + mixins (#13056)
* use Self type
* mixin
* fix later
2025-11-02 13:30:01 +08:00
George Hotz
65a0a31475
AMD mi350x matmul from stream (#13040)
* works
* working mfma
* 120 TFLOPS
* regs
* 192 TFLOPS
* try pipelining
* something
* notes
* contract
* linter to 3.11
* that was a bug
2025-11-01 17:55:19 +08:00
nimlgen
f6786c1bfd
autogen: py314 (#13038)
* autogen: py314
* bump py?
2025-11-01 04:02:19 +08:00
George Hotz
5eb87ab131
hotfix: bump cifar time to 350
2025-10-30 17:29:20 +08:00
nimlgen
4b001ec723
amd: pmc in mockgpu (#13000)
* amd: pmc in mockgpu
* fix
* do not open in ci
2025-10-30 01:52:02 +08:00
b1tg
bb307b9e81
fix fp8 vectorization (#12977)
* fix fp8 vectorization
* add fp8 tc to benchmark
2025-10-28 13:55:30 -04:00
George Hotz
5e01cc299b
zero len ranges fail (#12974)
* zero len ranges fail
* fix Python backend
* fix llvm
* fix ptx
* yolo fix nir
* this works...
* always store...
* always store...
* Revert "always store..."
This reverts commit 0816cf344d.
2025-10-28 22:49:55 +08:00
George Hotz
e936aa7974
cleanups from if range branch (#12973)
2025-10-28 20:58:47 +08:00
George Hotz
2832954bcb
test with IGNORE_OOB=0 (#12960)
2025-10-28 10:32:19 +08:00
George Hotz
7784cec48e
pytest-split on spec (#12959)
2025-10-28 10:09:01 +08:00
b1tg
45e2f916a3
add quantize fp8 in llama3 (#12893)
* add quantize fp8 in llama3
* don't truncate fp8 alu result
* cast to float32 before matmul
* --model weights/LLaMA-3/8B-SF-DPO/
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-27 10:22:57 -04:00
George Hotz
25c2da1579
check SPEC=2 in CI (#12945)
* check SPEC=2 in CI
* split SPEC=2
* fast enough
2025-10-27 21:53:57 +08:00
George Hotz
8a941d95a4
SPEC=2 is full spec, SPEC=1 is default (#12910)
* SPEC=1 passes all tests
* just use SPEC, not __debug__
2025-10-25 11:10:43 +08:00
chenyu
4b7329001d
clean up test_avg_pool3d (#12905)
2025-10-24 14:31:36 -04:00
chenyu
154b4f9f40
test FUSE_OPTIM=1 test/test_optim.py (#12895)
2025-10-23 15:54:27 -04:00
wozeparrot
6e00dec95d
feat: pin openpilot 0.10.1 models (#12878)
2025-10-22 14:57:54 -07:00
chenyu
f0831c8c30
add 0.10.0 to comma benchmark (#12875)
* add 0.10.0 to comma benchmark
disabled the 0.10.1 ones which are pinned to master. it does not work because benchmark uses the cached old version
* that's pinned
2025-10-22 15:18:21 -04:00