wozeparrot
222bb12ddf
tk softmax (#13205)
2025-11-11 15:13:16 -08:00
wozeparrot
787f0070ed
feat: don't use output reg as local reduce reg (#13203)
2025-11-11 14:35:16 -08:00
chenyu
ece1415def
clean up image_dot and image_conv2d (#13222)
* clean up image_dot and image_conv2d
* those are fine
* interesting
2025-11-11 15:53:03 -05:00
nimlgen
2f0ea29b34
qcom: 48bit timestamps (#13214)
* qcom: 48bit timestamps
* f
* lol
* fix
2025-11-12 04:14:33 +08:00
qazal
bc55bc4849
cleanup test_viz profiler tests (#13221)
2025-11-12 03:46:48 +08:00
chenyu
23b90945c3
add a benchmark for openpilot vision with DEBUG=2 (#13219)
see per kernel speed, also disable the jobs for 0.9.9
2025-11-11 14:41:52 -05:00
George Hotz
c2075f3613
gc disable during big rewrites (#13215)
* gc disable during big rewrites
* cleaner with helper
2025-11-11 10:30:47 -08:00
Roelof van Dijk
e59313da08
migrate pytest and ruff (#13216)
2025-11-11 13:27:51 -05:00
Gaétan Lepage
6fd7ce3832
migrate to pyproject.toml (#13189)
* migrate to pyproject.toml
* move mypy config to pyproject.toml
2025-11-11 09:09:27 -08:00
qazal
8002921a04
viz: improve the program run tooltip (#13212)
* add tflops to tooltip format
* show if the run was batched
2025-11-12 00:56:03 +08:00
qazal
f91e366a17
viz: display the graph layout recursion error (#13194)
* viz: display the graph layout recursion error
* share styles
* +min-width
* same thing
* inline the append
2025-11-11 15:25:12 +08:00
wozeparrot
73497af4c0
clean: use np for allclose (#13204)
2025-11-10 23:02:43 -08:00
George Hotz
a6360fd94d
store can have shape (#13202)
* store can have shape
* _shape
2025-11-10 22:16:47 -08:00
b1tg
f3692b7406
clean up hip renderer (#13063)
* clean up hip renderer
* ocml
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-11 00:44:24 -05:00
chenyu
22b8579234
one last regressed dm kernel (#13201)
2025-11-10 23:30:52 -05:00
chenyu
58b7e4fab3
GROUPTOP heuristic on more axes (#13206)
fixed dm speed
2025-11-10 23:30:37 -05:00
chenyu
829cdafccc
update openpilot slow conv uop ast (#13197)
the two remaining slow ones
2025-11-10 17:03:20 -05:00
George Hotz
0c978d45e6
stub attention (#13196)
* stub attention
* name the kernels
2025-11-10 13:48:38 -08:00
chenyu
58c30fc7ce
minor image_conv2d cleanup (#13193)
2025-11-10 16:05:40 -05:00
chenyu
60e55d9a2d
line count 18500 (#13191)
2025-11-10 13:52:13 -05:00
nimlgen
09a59c2203
qcom: support new chip versioning (#13185)
* qcom: support new chip versioning
* ops
* nit
* fix
* f
2025-11-10 23:57:29 +08:00
qazal
50934050bc
sqtt: append all wave execs (#13190)
2025-11-10 23:50:08 +08:00
qazal
38a24731a1
cleanup sqtt tooling (#13188)
* cleanup viz/serve.py
* use latest profile in rgptool.py
* unwrap nullable in roc.py, fix disasms typing
2025-11-10 20:52:57 +08:00
qazal
845a24dcc6
viz: group sqtt waves by program (#13187)
* viz: group sqtt waves by program
* color the names
2025-11-10 19:25:23 +08:00
George Hotz
fd6803000e
mutmut cfg (#13184)
* mutmut cfg
* coveragerc
2025-11-09 23:29:29 -08:00
wozeparrot
6252831ceb
feat: initial tk library (#13160)
2025-11-09 22:54:29 -08:00
George Hotz
925231aec1
repeat does less reshape for 1s (#13183)
2025-11-09 19:43:02 -08:00
George Hotz
d7369de048
hotfix: update weekly commits table
2025-11-09 19:37:06 -08:00
chenyu
6c48c87e51
improved ASSERT_MIN_STEP_TIME (#13182)
* improved ASSERT_MIN_STEP_TIME
getting close, current time +1ms then round up
* relax
2025-11-09 16:41:12 -05:00
nimlgen
17715688c7
system: validate vendor for APLPCIIfaceBase (#13181)
2025-11-10 02:49:21 +08:00
nimlgen
614783693e
nv: remove hardcoded expansion_rom_off (#13180)
* nv: remove hardcoded expansion_rom_off
* to max size
2025-11-09 21:43:19 +08:00
chenyu
e1d46de8f8
update GROUPTOP heuristic more (#13178)
reverts #13176
2025-11-09 02:31:12 -05:00
chenyu
41e45c20ff
minor stuff reading the printed code [pr] (#13177)
2025-11-09 00:58:51 -05:00
chenyu
8e868dced8
only GROUPTOP one reduce kernel (#13176)
* only GROUPTOP one reduce kernel
* ALLOWED_GATED_READ_IMAGE=148
2025-11-08 22:38:44 -05:00
chenyu
834067d91f
move onnx import in compile3 (#13172)
only used in test_vs_onnx
2025-11-08 09:44:34 -08:00
nimlgen
7f3240dbfe
nv: cleanup alloc (#13170)
* nv: cleanup alloc
* okay okay
2025-11-09 00:14:46 +08:00
qazal
7250fc0354
viz: double click on kernel run goes to codegen (#13147)
2025-11-08 23:40:50 +08:00
qazal
8a7fa9e7b4
sqtt: show total cycles of kernel in viz (#13169)
2025-11-08 21:00:40 +08:00
chenyu
2ba8b4946f
external_benchmark_op_cat.py (#13168)
* external_benchmark_op_cat.py
cat kernel that's 1ms on master and 50us with no GROUP and with NOLOCALS
* fix
2025-11-08 01:54:10 -05:00
chenyu
a62496cb3d
clean up get_grouped_dims [pr] (#13159)
2025-11-08 01:53:54 -05:00
wozeparrot
eb0192b0bb
feat: print ranges that aren't ended (#13167)
2025-11-07 22:01:29 -08:00
George Hotz
b41541bc44
bounty: Remove Tensor._pool alternative implementation and verify kernels remain the same (#13164)
2025-11-07 16:59:48 -08:00
George Hotz
ffb9e8396f
fix indexing bug with convs
* minimal difference for ONE_POOL=1
* fix indexing bug
* improve indexing debugger
* more debugger improvements
* always for reshape
2025-11-07 16:45:19 -08:00
chenyu
6a509da7f3
Scheduler.reduceops helper [pr] (#13162)
2025-11-07 18:59:46 -05:00
George Hotz
2413311289
make _pool simpler (#13161)
* make _pool simpler
* just syntax
* more correct and smaller
* try this now
* Revert "try this now"
This reverts commit 607cdc2164.
* ONE_POOL
2025-11-07 15:58:44 -08:00
George Hotz
70054cdb14
move backward cast to broadcasted, expand to mixins (#13156)
* shrink_to mixin
* move backward cast into _broadcasted
* expand to movement mixin
* move a few more
* fix spec issue
2025-11-07 15:07:47 -08:00
George Hotz
f2519ea0ba
shrink_to mixin (#13155)
2025-11-07 11:46:24 -08:00
C T
0f9d7f650d
whisper: fix oob, explicit dtype (#13144)
* fix dtype depending on numpy version
numpy v2 np.array returns int64 which Tensor passed through for the
first decode call, swallowing the <|notimestamps|> token and corrupting
the sequence
* fix whisper OOB
global limit on whisper's context length
* enforce whisper max_tokens_to_sample (match openai)
local limit on max tokens decoded
2025-11-07 12:55:01 -05:00
Ahmed Harmouche
3ecff3a8da
Fix dim splitting bug for len(dim) == len(limited) case (#13142)
* Fix gpudims bug on webgpu
* Fix split dim bug
* Remove webgpu_bug from examples
* Add test for shape correctness
* Fix 3D indexing
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-07 12:31:06 -05:00
nimlgen
b8e48effcb
device: no compilers message with reasons (#13146)
* device: no compilers message with reasons
* typings
* mypy
2025-11-07 23:01:45 +08:00