11015 Commits

Author SHA1 Message Date
George Hotz
17aa3379e9 hotfix: improve self_tokenize 2025-11-13 00:18:57 -08:00
chenyu
4e5a9132e7 JIT_BATCH_SIZE=0 in compile3 (#13245)
fixed some enqueue time
2025-11-12 23:12:45 -05:00
wozeparrot
759557f633 feat: move tk tests to testextra (#13242) 2025-11-12 17:06:53 -08:00
chenyu
3f939f3d3c update pm_simplify_valid (#13241)
* update pm_simplify_valid

fixed openpilot conv regression

* IMAGE training is broken
2025-11-12 19:40:02 -05:00
chenyu
f9851a852f minor update to uop_given_valid [pr] (#13243)
split from #13241
2025-11-12 19:03:18 -05:00
qazal
fe2876a6d8 hotfix: second GB/s in viz (#13240) 2025-11-13 07:14:27 +08:00
George Hotz
a23dea202b actually make AMD_LLVM not default (#13238) 2025-11-12 15:07:23 -08:00
George Hotz
ab9fa964d8 DISABLE_COMPILER_CACHE -> CCACHE (#13234)
* DISABLE_COMPILER_CACHE -> CCACHE

* Fix cachekey assignment in Compiler constructor
2025-11-12 15:07:09 -08:00
qazal
be2e24cb25 roc: requires sudo to install (#13237) 2025-11-12 16:59:22 -05:00
George Hotz
8f1f195b6d hotfix: no hexdump for usbgpu patch.py 2025-11-12 12:05:37 -08:00
nimlgen
9a53fcbde4 amd: sqtt on rdna3.5 (#13233) 2025-11-13 03:30:42 +08:00
George Hotz
13f10a31dc AMD_LLVM default off (#13232) 2025-11-12 11:06:33 -08:00
qazal
8b26cf2b3d sqtt: update rcp timing test (#13231)
* sqtt: assert correct output in timing test

* found why
2025-11-13 02:01:54 +08:00
Jan Akhremchik
bc8e537423 Add NONZERO op to onnx backend (#13211) 2025-11-12 08:55:51 -08:00
nimlgen
af17e07251 viz: sqtt touchups (#13228)
* viz: sqtt touchups

* revert

* matches
2025-11-12 22:40:37 +08:00
qazal
7a6853fa40 viz: show python callstack in the first graph (#13218) 2025-11-12 20:52:28 +08:00
nimlgen
82eb63d3ad qcom: auto switch idle timer when profiling (#13230)
* qcom: auto switch idle timer when profiling

* fi
2025-11-12 20:31:24 +08:00
nimlgen
fcd8d0751a test_timing for hip (#13229) 2025-11-12 20:28:58 +08:00
qazal
74b9d33acb viz: direct link to program source (#13227) 2025-11-12 16:27:13 +08:00
wozeparrot
371c1f2355 tk: move tiles to class (#13224) 2025-11-11 21:53:46 -08:00
Christopher Milan
41a098a82d In-tree autogen: libc.py (#13217)
* checkout changes from autogen branch

* parents

* pylint happy

* move sys to system in helpers.py

* typo

* typo
2025-11-11 19:13:48 -08:00
wozeparrot
222bb12ddf tk softmax (#13205) 2025-11-11 15:13:16 -08:00
wozeparrot
787f0070ed feat: don't use output reg as local reduce reg (#13203) 2025-11-11 14:35:16 -08:00
chenyu
ece1415def clean up image_dot and image_conv2d (#13222)
* clean up image_dot and image_conv2d

* those are fine

* interesting
2025-11-11 15:53:03 -05:00
nimlgen
2f0ea29b34 qcom: 48bit timestamps (#13214)
* qcom: 48bit timestamps

* f

* lol

* fix
2025-11-12 04:14:33 +08:00
qazal
bc55bc4849 cleanup test_viz profiler tests (#13221) 2025-11-12 03:46:48 +08:00
chenyu
23b90945c3 add a benchmark for openpilot vision with DEBUG=2 (#13219)
see per kernel speed, also disable the jobs for 0.9.9
2025-11-11 14:41:52 -05:00
George Hotz
c2075f3613 gc disable during big rewrites (#13215)
* gc disable during big rewrites

* cleaner with helper
2025-11-11 10:30:47 -08:00
Roelof van Dijk
e59313da08 migrate pytest and ruff (#13216) 2025-11-11 13:27:51 -05:00
Gaétan Lepage
6fd7ce3832 migrate to pyproject.toml (#13189)
* migrate to pyproject.toml

* move mypy config to pyproject.toml
2025-11-11 09:09:27 -08:00
qazal
8002921a04 viz: improve the program run tooltip (#13212)
* add tflops to tooltip format

* show if the run was batched
2025-11-12 00:56:03 +08:00
qazal
f91e366a17 viz: display the graph layout recursion error (#13194)
* viz: display the graph layout recursion error

* share styles

* +min-width

* same thing

* inline the append
2025-11-11 15:25:12 +08:00
wozeparrot
73497af4c0 clean: use np for allclose (#13204) 2025-11-10 23:02:43 -08:00
George Hotz
a6360fd94d store can have shape (#13202)
* store can have shape

* _shape
2025-11-10 22:16:47 -08:00
b1tg
f3692b7406 clean up hip renderer (#13063)
* clean up hip renderer

* ocml

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-11 00:44:24 -05:00
chenyu
22b8579234 one last regressed dm kernel (#13201) 2025-11-10 23:30:52 -05:00
chenyu
58b7e4fab3 GROUPTOP heuristic on more axes (#13206)
fixed dm speed
2025-11-10 23:30:37 -05:00
chenyu
829cdafccc update openpilot slow conv uop ast (#13197)
the two remaining slow ones
2025-11-10 17:03:20 -05:00
George Hotz
0c978d45e6 stub attention (#13196)
* stub attention

* name the kernels
2025-11-10 13:48:38 -08:00
chenyu
58c30fc7ce minor image_conv2d cleanup (#13193) 2025-11-10 16:05:40 -05:00
chenyu
60e55d9a2d line count 18500 (#13191) 2025-11-10 13:52:13 -05:00
nimlgen
09a59c2203 qcom: support new chip versioning (#13185)
* qcom: support new chip versioning

* ops

* nit

* fix

* f
2025-11-10 23:57:29 +08:00
qazal
50934050bc sqtt: append all wave execs (#13190) 2025-11-10 23:50:08 +08:00
qazal
38a24731a1 cleanup sqtt tooling (#13188)
* cleanup viz/serve.py

* use latest profile in rgptool.py

* unwrap nullable in roc.py, fix disasms typing
2025-11-10 20:52:57 +08:00
qazal
845a24dcc6 viz: group sqtt waves by program (#13187)
* viz: group sqtt waves by program

* color the names
2025-11-10 19:25:23 +08:00
George Hotz
fd6803000e mutmut cfg (#13184)
* mutmut cfg

* coveragerc
2025-11-09 23:29:29 -08:00
wozeparrot
6252831ceb feat: initial tk library (#13160) 2025-11-09 22:54:29 -08:00
George Hotz
925231aec1 repeat does less reshape for 1s (#13183) 2025-11-09 19:43:02 -08:00
George Hotz
d7369de048 hotfix: update weekly commits table 2025-11-09 19:37:06 -08:00
chenyu
6c48c87e51 improved ASSERT_MIN_STEP_TIME (#13182)
* improved ASSERT_MIN_STEP_TIME

getting close, current time +1ms then round up

* relax
2025-11-09 16:41:12 -05:00