openpilot kernel fix from 209 to 207 (#2006)

* Fix openpilot kernel from 209 to 206 1. Use push_movement_ops conditions in _movement_op. Don't push PAD or check if the ops are safe to be pushed with PAD 2. Don't push if all the op.buffers are realized * change ALLOWED_KERNEL_COUNT to 206 for openpilot * don't push through sourceless buffers * change the tests to adjust kernel counts for new behaviour * restore pushing of movement ops through childless buffer * don't push EXPAND, causes OOM * allow push of intermediate movement ops * adding new test behaviour * modifying external_test_opt for new behaviour * restore old tests * Reenable push of EXPAND and introduce new tests I was wrong intially thinking EXPAND can cause OOM and hence I had disabled it. Since it is 0 stride and doesn't allocate memory its cool * Don't push EXPAND above LoadOps LB. This is causing OOM * Push should be decided on movement root of bufs To check if ast.op.buffers is sourceless/ realized go the the movement root and then decide if pushing should be done or not * refactor for readability * use .base instead * don't push expand, bad memory/compute consumption * restrict push of reshape, seeing improvement * push reshape if unary without further check * disable PAD solves convnext kernel count increase * reenable test_cache_binaryop_transpose * small nit
2026-01-09 15:08:02 -05:00 · 2023-10-14 00:29:15 +05:30
parent 90c777d815
commit 63869c62fc
3 changed files with 11 additions and 7 deletions
--- a/test/external/external_test_opt.py
+++ b/test/external/external_test_opt.py
@@ -64,7 +64,7 @@ class TestInferenceMinKernels(unittest.TestCase):
    for p in get_parameters(model): p.assign(np.zeros(p.shape, dtype=p.dtype.np))
    img = Tensor.randn(1, 3, 224, 224)
    # TODO: this seems very high
-    with CLCache(116):
+    with CLCache(115):
      model.forward(img).realize()

  def test_resnet(self):
@@ -78,7 +78,7 @@ class TestInferenceMinKernels(unittest.TestCase):
    model = ViT(embed_dim=192, num_heads=3)
    for p in get_parameters(model): p.assign(np.zeros(p.shape, dtype=p.dtype.np))
    img = Tensor.randn(1, 3, 224, 224)
-    with CLCache(223): # NOTE: this is way too high
+    with CLCache(222): # NOTE: this is way too high
      out = model.forward(img)
      assert len(CacheCollector.cache) == 0, "ViT prerealized?"
      out.realize()
@@ -88,7 +88,7 @@ class TestInferenceMinKernels(unittest.TestCase):
    args_tiny = {"dim": 512, "multiple_of": 256, "n_heads": 8, "n_layers": 4, "norm_eps": 1e-05, "vocab_size": 1000}
    model = Transformer(**args_tiny)
    for p in get_parameters(model): p.assign(np.zeros(p.shape, dtype=p.dtype.np))
-    with CLCache(94):
+    with CLCache(85):
      model(Tensor([[1,2,3,4]]), 0).realize()

@unittest.skipUnless(Device.DEFAULT == "GPU", "Not Implemented")