conv2d is an hlop (#589)

* conv2d is an hlop
* shorter conv
* KOPT=-1
* alt imp
* MULACC
* smarter mulacc
* pop conv
* 7x7 -> 5x5
* didn't fix, that's not going to work
* this is faster and matches old behavior
* oh, non lazy just won't work with mulacc
* mulacc in torch
* bool types were creeping in
* optimizer is actually better with hlop conv
* fix pushing permutes issue
* refactor einsum_mulacc
* fix up readme
* update readme
* _image_conv2d
* fix bias addition location
* pushing permutes gets back to 200 kernels
* conv cleanup
* disable hlop conv
* don't hide that in helpers
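The commit title says conv2d becomes an hlop (high-level op), roughly meaning it is composed from more primitive operations instead of being its own kernel. A minimal pure-Python sketch of that idea, assuming a single channel, stride 1, and no padding: a movement step gathers the sliding windows, and a multiply-accumulate ("mulacc") reduction does all the arithmetic. The helpers `windows`, `mulacc`, and `conv2d` are hypothetical illustrations, not the project's actual implementation.

```python
from typing import List

Matrix = List[List[float]]


def windows(x: Matrix, kh: int, kw: int) -> List[List[List[float]]]:
    """Movement step: gather every kh x kw sliding window (stride 1, no padding),
    each flattened to a vector, arranged as [out_h][out_w][kh*kw]."""
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    return [[[x[i + di][j + dj] for di in range(kh) for dj in range(kw)]
             for j in range(out_w)]
            for i in range(out_h)]


def mulacc(a: List[float], b: List[float]) -> float:
    """Multiply elementwise, then sum-reduce: the single op a conv boils down to."""
    return sum(p * q for p, q in zip(a, b))


def conv2d(x: Matrix, w: Matrix) -> Matrix:
    """Cross-correlation (what DL frameworks call conv2d), built only
    from windows() and mulacc(); no dedicated convolution loop."""
    kh, kw = len(w), len(w[0])
    flat_w = [v for row in w for v in row]  # flatten the kernel to match each window
    return [[mulacc(win, flat_w) for win in row] for row in windows(x, kh, kw)]


if __name__ == "__main__":
    x = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
    w = [[1, 2],
         [3, 4]]
    print(conv2d(x, w))  # [[37, 47], [67, 77]]
```

Because the windowing is pure data movement, an optimizer is free to reorder or fuse it with the reduction, which is the general motivation for expressing conv as a composition rather than a monolithic op.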
2026-01-09 15:08:02 -05:00 · 2023-02-23 17:52:31 -08:00
parent 8835df7a5c
commit 758515dcc0
13 changed files with 177 additions and 55 deletions
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -56,9 +56,7 @@ jobs:
    - name: Install Dependencies
      run: pip install -e '.[testing]' --extra-index-url https://download.pytorch.org/whl/cpu
    - name: Run Pytest
-      run: LAZY=0 python -m pytest -s -v -n=auto
-    - name: Run Pytest (lazy)
-      run: LAZY=1 python -m pytest -s -v -n=auto
+      run: python -m pytest -s -v -n=auto

  testimagenet:
    name: ImageNet to C Compile Test
@@ -110,9 +108,7 @@ jobs:
    - name: Install Dependencies
      run: pip install -e '.[testing]' --extra-index-url https://download.pytorch.org/whl/cpu
    - name: Run Pytest
-      run: LAZY=0 TORCH=1 python -m pytest -s -v -n=auto
-    - name: Run Pytest (lazy)
-      run: LAZY=1 TORCH=1 python -m pytest -s -v -n=auto
+      run: TORCH=1 python -m pytest -s -v -n=auto

  testgpu:
    name: GPU Tests