Commit Graph

929 Commits

Author / SHA1 / Message / Date
nimlgen
25440f0f72 all2all (#13902)
* all2all

* um

* fix

* x

* um

* simler

* mypy

* fix

* t

* cmnts
2025-12-31 16:38:32 +03:00
George Hotz
43c6e973d8 add optional compiler in Renderer (#13817)
* add optional compiler in Renderer [pr]

* fix

* late init

* remove precompiled

* cleanup
2025-12-23 17:58:46 -05:00
nimlgen
90b217896f am: xgmi p2p (#13811)
* system: use addr space

* am: xgmi

* fix

* ugh
2025-12-23 20:11:38 +03:00
George Hotz
8dcba2e2cc no full_rewrite [pr] (#13809)
* no full_rewrite [pr]

* fix

* fix docs
2025-12-22 23:20:01 -05:00
chenyu
7f1d41c9f9 delete files that import ShapeTracker (#13805) 2025-12-22 15:54:18 -05:00
George Hotz
45c459848d remove more stale stuff (#13765)
* remove more stale stuff

* remove disassemblers/adreno

* stale
2025-12-19 17:14:56 -04:00
George Hotz
744af193f0 remove ScheduleItem and merge it with ExecItem (#13759)
* remove ExecItem and merge it with ScheduleItem

* less diff

* fix issues

* min diff

* don't change bufs in _lower

* min diff

* update

* revert

* fixes

* diff
2025-12-19 17:04:24 -04:00
George Hotz
3dbde178c1 mark slow tests as slow instead of as CI (#13736)
* mark slow tests as slow instead of as CI

* CI shouldn't have different behavior

* more skips / CI

* slow
2025-12-17 10:29:57 -04:00
George Hotz
4b741e893f remove REMOTE=1 (#13722)
* remove REMOTE=1

* leave ibverbs
2025-12-16 15:58:10 -04:00
nimlgen
e36385e570 am: support xgmi systems (#13659)
* am: support xgmi systems

* fake_am
2025-12-12 18:55:45 +03:00
Douglas Nyberg
947c6eefc3 add Swish op (#13541)
* add Swish ONNX operator

* add Swish regression test

* remove trailing whitespace

* upgrade ONNX to 1.20, add excludes for unimplemented ops

* upgrade ONNX to 1.19, add Swish op

* upgrade ONNX to 1.19, TensorFlow to 2.18, add Swish op

* exclude attention_3d and attention_4d_gqa tests

* exclude attention fp16 tests

* exclude all attention tests

* retrigger CI

* retrigger CI - worker crash
2025-12-08 12:41:18 -05:00
George Hotz
c5bd28e21d start work on schedule cache (#13529)
* start work on schedule cache

* local unique

* schedule cache works

* schedule cache cleanup

* fix tests

* preserve metadata

* oops, fix cache

* put that there

* fix spec

* always miss

* why is that broken?

* src[0].op

* fix process replay

* delete abstractions2

* reenable the actual schedule cache

* metadata is best effort

* fix JIT in examples/gradaccum_mnist.py

* full jit

* fixed and test is real
2025-12-04 17:24:49 -08:00
Douglas Nyberg
a8a62bc08e add max/min reduction support to ScatterND (#13562) 2025-12-04 00:53:47 -08:00
Douglas Nyberg
f5abd38132 remove tfa dependency: use keras.optimizers.Lamb and tf.raw_ops for LARS (#13555) 2025-12-03 17:48:27 -05:00
George Hotz
055d5aeb7f add external_test_process_count 2025-12-02 17:26:30 -08:00
qazal
366badaa68 require renderer argument in get_program, removes device opening in process replay [pr] (#13524) 2025-12-03 02:05:31 +08:00
George Hotz
6a140f74fe split out unique_const and cache const [pr] (#13493)
* split out unique_const

* add cache to const

* call const in unique_const
2025-11-29 10:44:28 -08:00
George Hotz
18addc0a1d process replay only get_program (#13475) 2025-11-27 08:18:18 -08:00
George Hotz
a8e005b095 enable process replay (non-checking) by default (#13474) 2025-11-27 07:28:44 -08:00
George Hotz
05cd2279d0 add cache on reshape (#13466)
* remove cache on divmod, way less objects

* _apply_reshape

* reshape

* no gc on realize

* wow that cache is fast
2025-11-26 18:57:40 -08:00
Sieds Lykles
63a931ff76 Symbolic divisor fuzzer (#13433)
* render z3 range better

* working version

* rename

* add to workflow

* factor out variable_names

* smaller expressions

* smaller

* + back
2025-11-23 20:29:32 +01:00
qazal
9dcd52287a add external_benchmark_pyrender (#13378)
* add external_benchmark_pyrender

* can ctrlc it

* cpu_profile exists
2025-11-20 17:38:28 +08:00
George Hotz
8919c994b7 Revert "AxisType.PLACEHOLDER in reshape to do less graph_rewrite (#13373)" (#13375)
This reverts commit ac7559e33d.
2025-11-19 19:34:30 -08:00
George Hotz
ac7559e33d AxisType.PLACEHOLDER in reshape to do less graph_rewrite (#13373)
* AxisType.PLACEHOLDER in reshape to do less graph_rewrite

* _apply_movement_op cache
2025-11-19 19:19:58 -08:00
George Hotz
ab7df42c78 bring back fold_divmod_general with bugfix and test [pr] (#13369)
* Revert "Revert "merge to fold_divmod_general [p] (#13359)""

This reverts commit 05ccc69248.

* Revert "Revert "actually merge to fold_divmod_general [pr] (#13363)""

This reverts commit 90e5752199.

* Revert "Revert "add cache to fold_divmod_general (#13365)""

This reverts commit 8e17bd6791.

* bring back fold_divmod_general with bugfix and test
2025-11-19 14:51:51 -08:00
George Hotz
8e17bd6791 Revert "add cache to fold_divmod_general (#13365)"
This reverts commit b5309a5043.
2025-11-19 14:18:08 -08:00
George Hotz
b5309a5043 add cache to fold_divmod_general (#13365) 2025-11-19 13:49:18 -08:00
George Hotz
6fdbd03104 more divmod cleanup [p] (#13358)
* more divmod cleanup [p]

* lil cleanups, faster
2025-11-19 10:35:15 -08:00
George Hotz
385618d45b skip process replay by default (#13353) 2025-11-19 08:25:34 -08:00
George Hotz
6d3385c284 print special ops in postrange (#13318)
* print special ops in postrange

* fix on OSX
2025-11-17 14:43:23 -08:00
George Hotz
e5351699bd openpilot warp (#13283)
* openpilot image warp test

* 0.4 ms on metal, 1 ms on CPU

* new inputs each time

* reshape
2025-11-14 13:55:32 -08:00
wozeparrot
759557f633 feat: move tk tests to testextra (#13242) 2025-11-12 17:06:53 -08:00
Jan Akhremchik
bc8e537423 Add NONZERO op to onnx backend (#13211) 2025-11-12 08:55:51 -08:00
wozeparrot
371c1f2355 tk: move tiles to class (#13224) 2025-11-11 21:53:46 -08:00
wozeparrot
222bb12ddf tk softmax (#13205) 2025-11-11 15:13:16 -08:00
wozeparrot
73497af4c0 clean: use np for allclose (#13204) 2025-11-10 23:02:43 -08:00
chenyu
22b8579234 one last regressed dm kernel (#13201) 2025-11-10 23:30:52 -05:00
chenyu
829cdafccc update openpilot slow conv uop ast (#13197)
the two remaining slow ones
2025-11-10 17:03:20 -05:00
wozeparrot
6252831ceb feat: initial tk library (#13160) 2025-11-09 22:54:29 -08:00
chenyu
2ba8b4946f external_benchmark_op_cat.py (#13168)
* external_benchmark_op_cat.py

cat kernel that's 1ms on master and 50us with no GROUP and with NOLOCALS

* fix
2025-11-08 01:54:10 -05:00
nimlgen
dafdb4bfb1 test hcq open with pytest (#13124)
* test hcq open with pytest

* fi
2025-11-06 20:09:51 +08:00
nimlgen
05e2ff4d87 system: fix flock on pcidevs (#13123)
* system: fix locking of hcq devices

* rename and fullrun

* force ok

* fix

* fix
2025-11-06 19:02:13 +08:00
chenyu
18d4ecc1f3 lower nv test_gemm_4096 target (#13107) 2025-11-05 11:05:16 -05:00
chenyu
54141e9cb9 DISABLE_COMPILER_CACHE=1 in speed_v_theoretical (#13096) 2025-11-04 11:28:18 -05:00
chenyu
f6430a0559 add script for one slow openpilot conv (#12953)
* add script for one slow openpilot conv

* fix ruff
2025-10-30 18:08:41 -04:00
George Hotz
f5a3b33d33 add fun with nhwc convs 2025-10-28 17:12:22 +08:00
Sieds Lykles
e22c5e7e73 process_replay uses opts argument for KernelInfo.opts_to_apply (#12946)
* opts_to_apply is opts

* skip beamed kernels

* simpler change

* fix the tensor cores tests for process replay

* use opts
2025-10-28 09:00:28 +01:00
chenyu
a79832b01f control_flow.py -> linearizer.py [pr] (#12948) 2025-10-27 12:38:13 -04:00
Sieds Lykles
e1f8c82938 Onnx Layer/Group/RMS/Batch-Norm ReduceL2 fp32 intermediates for fp16 (#12109)
* match onnx spec

* use least_upper_dtype

* promote the square

* just cast before the square
2025-10-24 12:26:11 +02:00
George Hotz
7762b3558b clean up the spec (#12868)
* tighten up the spec

* move validate into a different file

* that moved to validate

* after(barr)
2025-10-22 19:50:42 +08:00