chenyu
51398edf9c
fix indirect import ( #13958 )
also deleted old external tests
2026-01-01 14:22:45 -05:00
nimlgen
25440f0f72
all2all ( #13902 )
* all2all
* um
* fix
* x
* um
* simpler
* mypy
* fix
* t
* comments
2025-12-31 16:38:32 +03:00
George Hotz
43c6e973d8
add optional compiler in Renderer ( #13817 )
* add optional compiler in Renderer [pr]
* fix
* late init
* remove precompiled
* cleanup
2025-12-23 17:58:46 -05:00
nimlgen
90b217896f
am: xgmi p2p ( #13811 )
* system: use addr space
* am: xgmi
* fix
* ugh
2025-12-23 20:11:38 +03:00
George Hotz
8dcba2e2cc
no full_rewrite [pr] ( #13809 )
* no full_rewrite [pr]
* fix
* fix docs
2025-12-22 23:20:01 -05:00
chenyu
7f1d41c9f9
delete files that import ShapeTracker ( #13805 )
2025-12-22 15:54:18 -05:00
George Hotz
45c459848d
remove more stale stuff ( #13765 )
* remove more stale stuff
* remove disassemblers/adreno
* stale
2025-12-19 17:14:56 -04:00
George Hotz
744af193f0
remove ScheduleItem and merge it with ExecItem ( #13759 )
* remove ExecItem and merge it with ScheduleItem
* less diff
* fix issues
* min diff
* don't change bufs in _lower
* min diff
* update
* revert
* fixes
* diff
2025-12-19 17:04:24 -04:00
George Hotz
3dbde178c1
mark slow tests as slow instead of as CI ( #13736 )
* mark slow tests as slow instead of as CI
* CI shouldn't have different behavior
* more skips / CI
* slow
2025-12-17 10:29:57 -04:00
George Hotz
4b741e893f
remove REMOTE=1 ( #13722 )
* remove REMOTE=1
* leave ibverbs
2025-12-16 15:58:10 -04:00
nimlgen
e36385e570
am: support xgmi systems ( #13659 )
* am: support xgmi systems
* fake_am
2025-12-12 18:55:45 +03:00
Douglas Nyberg
947c6eefc3
add Swish op ( #13541 )
* add Swish ONNX operator
* add Swish regression test
* remove trailing whitespace
* upgrade ONNX to 1.20, add excludes for unimplemented ops
* upgrade ONNX to 1.19, add Swish op
* upgrade ONNX to 1.19, TensorFlow to 2.18, add Swish op
* exclude attention_3d and attention_4d_gqa tests
* exclude attention fp16 tests
* exclude all attention tests
* retrigger CI
* retrigger CI - worker crash
2025-12-08 12:41:18 -05:00
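
For reference, the Swish activation added above is Swish(x) = x * sigmoid(x) (also known as SiLU). A minimal sketch of the op on a tinygrad Tensor; the helper name and test values are illustrative, not the actual ONNX frontend wiring:

from tinygrad import Tensor

def swish(x: Tensor) -> Tensor:
  # Swish(x) = x * sigmoid(x)
  return x * x.sigmoid()

print(swish(Tensor([-2.0, 0.0, 2.0])).numpy())  # approx [-0.2384, 0.0, 1.7616]
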
George Hotz
c5bd28e21d
start work on schedule cache ( #13529 )
* start work on schedule cache
* local unique
* schedule cache works
* schedule cache cleanup
* fix tests
* preserve metadata
* oops, fix cache
* put that there
* fix spec
* always miss
* why is that broken?
* src[0].op
* fix process replay
* delete abstractions2
* reenable the actual schedule cache
* metadata is best effort
* fix JIT in examples/gradaccum_mnist.py
* full jit
* fixed and test is real
2025-12-04 17:24:49 -08:00
Douglas Nyberg
a8a62bc08e
add max/min reduction support to ScatterND ( #13562 )
2025-12-04 00:53:47 -08:00
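
For context, ONNX ScatterND with reduction="max" or "min" combines each update with the existing element instead of overwriting it. A rough NumPy sketch of those semantics, for scalar updates only (not the backend implementation itself):

import numpy as np

def scatter_nd(data, indices, updates, reduction="none"):
  # start from a copy of data, then fold each update into the indexed element
  out = data.copy()
  for idx, upd in zip(indices, updates):
    idx = tuple(idx)
    if reduction == "max": out[idx] = np.maximum(out[idx], upd)
    elif reduction == "min": out[idx] = np.minimum(out[idx], upd)
    else: out[idx] = upd
  return out

print(scatter_nd(np.array([1., 2., 3., 4.]), np.array([[1], [3]]), np.array([5., 0.]), reduction="max"))
# -> [1. 5. 3. 4.]
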
Douglas Nyberg
f5abd38132
remove tfa dependency: use keras.optimizers.Lamb and tf.raw_ops for LARS ( #13555 )
2025-12-03 17:48:27 -05:00
George Hotz
055d5aeb7f
add external_test_process_count
2025-12-02 17:26:30 -08:00
qazal
366badaa68
require renderer argument in get_program, removes device opening in process replay [pr] ( #13524 )
2025-12-03 02:05:31 +08:00
George Hotz
6a140f74fe
split out unique_const and cache const [pr] ( #13493 )
* split out unique_const
* add cache to const
* call const in unique_const
2025-11-29 10:44:28 -08:00
George Hotz
18addc0a1d
process replay only get_program ( #13475 )
2025-11-27 08:18:18 -08:00
George Hotz
a8e005b095
enable process replay (non-checking) by default ( #13474 )
2025-11-27 07:28:44 -08:00
George Hotz
05cd2279d0
add cache on reshape ( #13466 )
* remove cache on divmod, way less objects
* _apply_reshape
* reshape
* no gc on realize
* wow that cache is fast
2025-11-26 18:57:40 -08:00
Sieds Lykles
63a931ff76
Symbolic divisor fuzzer ( #13433 )
* render z3 range better
* working version
* rename
* add to workflow
* factor out variable_names
* smaller expressions
* smaller
* + back
2025-11-23 20:29:32 +01:00
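
The fuzzer above cross-checks symbolic div/mod rewrites against z3. A tiny standalone example of that style of check, with an identity and bounds made up for illustration:

from z3 import Ints, Solver, Not, unsat

x, r = Ints("x r")
s = Solver()
s.add(x >= 0, r >= 0, r < 4)
# claim: (4*x + r) // 4 == x whenever x >= 0 and 0 <= r < 4
s.add(Not((4 * x + r) / 4 == x))
assert s.check() == unsat  # unsat: z3 found no counterexample
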
qazal
9dcd52287a
add external_benchmark_pyrender ( #13378 )
* add external_benchmark_pyrender
* can ctrlc it
* cpu_profile exists
2025-11-20 17:38:28 +08:00
George Hotz
8919c994b7
Revert "AxisType.PLACEHOLDER in reshape to do less graph_rewrite ( #13373 )" ( #13375 )
This reverts commit ac7559e33d.
2025-11-19 19:34:30 -08:00
George Hotz
ac7559e33d
AxisType.PLACEHOLDER in reshape to do less graph_rewrite ( #13373 )
* AxisType.PLACEHOLDER in reshape to do less graph_rewrite
* _apply_movement_op cache
2025-11-19 19:19:58 -08:00
George Hotz
ab7df42c78
bring back fold_divmod_general with bugfix and test [pr] ( #13369 )
* Revert "Revert "merge to fold_divmod_general [p] (#13359 )""
This reverts commit 05ccc69248.
* Revert "Revert "actually merge to fold_divmod_general [pr] (#13363 )""
This reverts commit 90e5752199.
* Revert "Revert "add cache to fold_divmod_general (#13365 )""
This reverts commit 8e17bd6791.
* bring back fold_divmod_general with bugfix and test
2025-11-19 14:51:51 -08:00
George Hotz
8e17bd6791
Revert "add cache to fold_divmod_general ( #13365 )"
This reverts commit b5309a5043.
2025-11-19 14:18:08 -08:00
George Hotz
b5309a5043
add cache to fold_divmod_general ( #13365 )
2025-11-19 13:49:18 -08:00
George Hotz
6fdbd03104
more divmod cleanup [p] ( #13358 )
* more divmod cleanup [p]
* lil cleanups, faster
2025-11-19 10:35:15 -08:00
George Hotz
385618d45b
skip process replay by default ( #13353 )
2025-11-19 08:25:34 -08:00
George Hotz
6d3385c284
print special ops in postrange ( #13318 )
* print special ops in postrange
* fix on OSX
2025-11-17 14:43:23 -08:00
George Hotz
e5351699bd
openpilot warp ( #13283 )
* openpilot image warp test
* 0.4 ms on metal, 1 ms on CPU
* new inputs each time
* reshape
2025-11-14 13:55:32 -08:00
wozeparrot
759557f633
feat: move tk tests to testextra ( #13242 )
2025-11-12 17:06:53 -08:00
Jan Akhremchik
bc8e537423
Add NONZERO op to onnx backend ( #13211 )
2025-11-12 08:55:51 -08:00
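
ONNX NonZero returns the indices of the nonzero elements as an int64 tensor of shape (rank, count), i.e. np.nonzero stacked row-wise:

import numpy as np

x = np.array([[1, 0], [0, 3]])
print(np.array(np.nonzero(x)))
# [[0 1]
#  [0 1]]   nonzero elements sit at (0, 0) and (1, 1)
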
wozeparrot
371c1f2355
tk: move tiles to class ( #13224 )
2025-11-11 21:53:46 -08:00
wozeparrot
222bb12ddf
tk softmax ( #13205 )
2025-11-11 15:13:16 -08:00
wozeparrot
73497af4c0
clean: use np for allclose ( #13204 )
2025-11-10 23:02:43 -08:00
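
np.allclose(a, b) checks |a - b| <= atol + rtol * |b| elementwise, with defaults rtol=1e-05 and atol=1e-08:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
print(np.allclose(a, a + 1e-9))  # True, within default tolerances
print(np.allclose(a, a + 1e-3))  # False
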
chenyu
22b8579234
one last regressed dm kernel ( #13201 )
2025-11-10 23:30:52 -05:00
chenyu
829cdafccc
update openpilot slow conv uop ast ( #13197 )
the two remaining slow ones
2025-11-10 17:03:20 -05:00
wozeparrot
6252831ceb
feat: initial tk library ( #13160 )
2025-11-09 22:54:29 -08:00
chenyu
2ba8b4946f
external_benchmark_op_cat.py ( #13168 )
* external_benchmark_op_cat.py
cat kernel that's 1ms on master and 50us with no GROUP and with NOLOCALS
* fix
2025-11-08 01:54:10 -05:00
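
A minimal sketch of a Tensor.cat microbenchmark in tinygrad; the shapes and the Timing helper here are assumptions, not the contents of external_benchmark_op_cat.py:

from tinygrad import Tensor
from tinygrad.helpers import Timing

# realize the inputs first so only the cat kernel is timed
a, b = Tensor.rand(1024, 1024).realize(), Tensor.rand(1024, 1024).realize()
with Timing("cat: "):
  a.cat(b, dim=0).realize()
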
nimlgen
dafdb4bfb1
test hcq open with pytest ( #13124 )
* test hcq open with pytest
* fix
2025-11-06 20:09:51 +08:00
nimlgen
05e2ff4d87
system: fix flock on pcidevs ( #13123 )
* system: fix locking of hcq devices
* rename and fullrun
* force ok
* fix
* fix
2025-11-06 19:02:13 +08:00
chenyu
18d4ecc1f3
lower nv test_gemm_4096 target ( #13107 )
2025-11-05 11:05:16 -05:00
chenyu
54141e9cb9
DISABLE_COMPILER_CACHE=1 in speed_v_theoretical ( #13096 )
2025-11-04 11:28:18 -05:00
chenyu
f6430a0559
add script for one slow openpilot conv ( #12953 )
* add script for one slow openpilot conv
* fix ruff
2025-10-30 18:08:41 -04:00
George Hotz
f5a3b33d33
add fun with nhwc convs
2025-10-28 17:12:22 +08:00
Sieds Lykles
e22c5e7e73
process_replay uses opts argument for KernelInfo.opts_to_apply ( #12946 )
* opts_to_apply is opts
* skip beamed kernels
* simpler change
* fix the tensor cores tests for process replay
* use opts
2025-10-28 09:00:28 +01:00
chenyu
a79832b01f
control_flow.py -> linearizer.py [pr] ( #12948 )
2025-10-27 12:38:13 -04:00
Sieds Lykles
e1f8c82938
Onnx Layer/Group/RMS/Batch-Norm ReduceL2 fp32 intermediates for fp16 ( #12109 )
* match onnx spec
* use least_upper_dtype
* promote the square
* just cast before the square
2025-10-24 12:26:11 +02:00
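
The point of the change above is that squaring fp16 values can overflow (fp16 max is about 65504), so the square is computed in fp32 and only the result is cast back. A NumPy illustration of the failure mode, not the actual onnx code:

import numpy as np

x = np.full(8, 300.0, dtype=np.float16)
naive = np.sqrt(np.mean(x * x))  # 300**2 = 90000 overflows fp16 -> inf
stable = np.sqrt(np.mean(x.astype(np.float32) ** 2)).astype(np.float16)
print(naive, stable)  # inf vs 300.0
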