CI < 5 minutes (#1252)

* models matrix
* fix typo and install gpu deps
* install llvm deps if needed
* fix
* testops with cuda
* remove pip cache since not work
* cuda env
* install cuda deps
* maybe it will work now
* i can't read
* all tests in matrix
* trim down more
* opencl stuff in matrix
* opencl pip cache
* test split
* change cuda test exclusion
* test
* fix cuda maybe
* add models
* add more n=auto
* third thing
* fix bug
* cache pip more
* change name
* update tests
* try again cause why not
* balance
* try again...
* try apt cache for cuda
* try on gpu:
* try cuda again
* update packages step
* replace libz-dev with zlib1g-dev
* only cache cuda
* why error
* fix gpuocelot bug
* apt cache err
* apt cache to slow?
* opt and image in single runner
* add a couple n=autos
* remove test matrix
* try cuda apt cache again
* libz-dev -> zlib1g-dev
* remove -s since not supported by xdist
* the cache takes too long and doesn't work
* combine webgpu and metal tests
* combine imagenet to c and cpu tests
* torch tests with linters
* torch back by itself
* small windows clang test with torch tests
* fix a goofy windows bug
* im dumb
* bro
* clang with linters
* fix pylint error
* linter not work on windows
* try with clang again
* clang and imagenet?
* install deps
* fix
* fix quote
* clang by itself (windows too slow)
* env vars for imagenet
* cache pip for metal and webgpu tests
* try torch with metal and webgpu
* doesn't work, too long
* remove -v
* try -n=logical
* don't use logical
* revert accidental thing
* remove some prints unless CI
* fix print unless CI
* ignore speed tests for slow tests
* clang windows in matrix (ubuntu being tested in imagenet->c test)
* try manual pip cache
* fix windows pip cache path
* all manual pip cache
* fix pip cache dir for macos
* print_ci function in helpers
* CI as variable, no print_ci
* missed one
* cuda tests with docker image
* remove setup-python action for cuda
* python->python3?
* remove -s -v
* try fix pip cache
* maybe fix
* try to fix pip cache
* is this the path?
* maybe cache pip
* try again
* create wheels dir
* ?
* cuda pip deps in dockerfile
* disable pip cache for clang
* image from ghcr instead of docker hub
* why is clang like this
* fast deps
* try use different caches
* remove the fast thing
* try with lighter image
* remove setup python for cuda
* small docker and cuda fast deps
* ignore a few more tests
* cool docker thing (maybe)
* oops
* quotes
* fix docker command
* fix bug
* ignore train efficientnet test
* remove dockerfile (docker stuff takes too long)
* remove docker stuff and normal cuda
* oops
* ignore the tests for cuda
* does this work
* ignore test_train on slow backends
* add space
* llvm ignore same tests as cuda
* nvm
* ignore lr scheduler tests
* get some stats
* fix ignore bug
* remove extra '
* remove and
* ignore test for llvm
* change ignored tests and durationon all backends
* fix
* and -> or
* ignore some more cuda tests
* finally?
* does this fix it
* remove durations=0
* add some more tests to llvm
* make last pytest more readable
* fix
* don't train efficientnet on cpu
* try w/out pip cache
* pip cache seems to be generally better
* pytest file markers
* try apt fast for cuda
* use quick install for apt-fast
* apt-fast not worth
* apt-get to apt
* fix typo
* suppress warnings
* register markers
* disable debug on fuzz tests
* change marker names
* apt update and apt install in one command
* update marker names in test.yml
* webgpu pytest marker
2026-01-23 05:48:08 -05:00 · 2023-07-23 15:00:56 -05:00
parent 47f9d82722
commit a0965ee198
23 changed files with 237 additions and 226 deletions
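
The commit message mentions "pytest file markers" and "register markers", and the workflow selects tests with `-m 'webgpu'` and `-m 'not exclude_<backend>'`. The registration code itself is not part of the hunks shown here, so the following is only a minimal sketch of how such markers could be registered. The file name `conftest.py`, the `MARKERS` dict, and the marker descriptions are assumptions; the marker names are inferred from the `-m` flags in test.yml.

```python
# conftest.py (hypothetical location; the real registration file is not shown in this diff)

# Marker names inferred from the `-m` expressions in test.yml.
# The descriptions are illustrative assumptions.
MARKERS = {
    "webgpu": "tests selected by the WebGPU step (-m 'webgpu')",
    "exclude_llvm": "tests deselected on the LLVM backend",
    "exclude_clang": "tests deselected on the CLANG backend",
    "exclude_gpu": "tests deselected on the GPU (OpenCL) backend",
    "exclude_cuda": "tests deselected on the (emulated) CUDA backend",
}


def pytest_configure(config):
    """pytest hook: register custom markers so `-m` expressions do not raise unknown-marker warnings."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

With markers registered this way, a whole test file could opt in through a module-level `pytestmark = pytest.mark.exclude_cuda`, which is one plausible reading of "pytest file markers" in the commit message.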
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -18,6 +18,11 @@ jobs:
      uses: actions/setup-python@v4
      with:
        python-version: 3.8
+    - name: Cache pip
+      uses: actions/cache@v3
+      with:
+        path: ~/.cache/pip
+        key: linting
    - name: Install dependencies
      run: pip install -e '.[linting,testing]' --extra-index-url https://download.pytorch.org/whl/cpu
    - name: Repo line count
@@ -31,12 +36,12 @@ jobs:
    - name: Run mypy
      run: mypy tinygrad/ --ignore-missing-imports --check-untyped-defs --explicit-package-bases --warn-unreachable
    - name: Install SLOCCount
-      run: sudo apt-get install sloccount
+      run: sudo apt install sloccount
    - name: Check <5000 lines
      run: sloccount tinygrad test examples extra; if [ $(sloccount tinygrad | sed -n 's/.*Total Physical Source Lines of Code (SLOC)[ ]*= \([^ ]*\).*/\1/p' | tr -d ',') -gt 5000 ]; then exit 1; fi

-  testcpu:
-    name: CPU Tests
+  testcpuimagenet:
+    name: CPU and ImageNet to C Tests
    runs-on: ubuntu-latest
    timeout-minutes: 20

@@ -47,6 +52,11 @@ jobs:
      uses: actions/setup-python@v4
      with:
        python-version: 3.8
+    - name: Cache pip
+      uses: actions/cache@v3
+      with:
+        path: ~/.cache/pip
+        key: testing
    - name: Install Dependencies
      run: pip install -e '.[testing]' --extra-index-url https://download.pytorch.org/whl/cpu
    - name: Test Docs
@@ -54,49 +64,11 @@ jobs:
    - name: Test Quickstart
      run: awk '/```python/{flag=1;next}/```/{flag=0}flag' docs/quickstart.md > quickstart.py && PYTHONPATH=. python3 quickstart.py
    - name: Run Pytest
-      run: python -m pytest -s -v -n=auto test/
+      run: python -m pytest -n=auto test/ -k "not (test_efficientnet and models/test_train.py)"
    - name: Fuzz Test symbolic
-      run: DEBUG=1 python test/external/fuzz_symbolic.py
+      run: python test/external/fuzz_symbolic.py
    - name: Fuzz Test shapetracker
-      run: PYTHONPATH="." DEBUG=1 python test/external/fuzz_shapetracker.py
-  
-  testwebgpu:
-    name: WebGPU Tests
-    runs-on: macos-13
-
-    steps:
-    - name: Checkout Code
-      uses: actions/checkout@v3
-    - name: Set up Python 3.8
-      uses: actions/setup-python@v4
-      with:
-        python-version: 3.8
-    - name: Install Dependencies
-      run: pip install -e '.[testing,webgpu]' --extra-index-url https://download.pytorch.org/whl/cpu
-    # - name: Set Env
-    #   run: printf "WEBGPU=1\nWGPU_BACKEND_TYPE=D3D12\n" >> $GITHUB_ENV
-    - name: Run Pytest
-      run: WEBGPU=1 WGPU_BACKEND_TYPE=Metal python -m pytest -s -v -n=auto test/test_ops.py test/test_speed_v_torch.py test/test_nn.py test/test_jit.py test/test_randomness.py test/test_tensor.py test/test_assign.py test/test_conv.py test/test_nn.py test/test_custom_function.py test/test_conv_shapetracker.py
-    - name: Build WEBGPU Efficientnet
-      run: WEBGPU=1 WGPU_BACKEND_TYPE=Metal python -m examples.webgpu.compile_webgpu
-    # - name: Install Puppeteer
-    #   run: npm install puppeteer
-    # - name: Run Efficientnet
-    #   run: node test/test_webgpu.js
-  testimagenet:
-    name: ImageNet to C Compile Test
-    runs-on: ubuntu-latest
-    timeout-minutes: 20
-
-    steps:
-    - name: Checkout Code
-      uses: actions/checkout@v3
-    - name: Set up Python 3.8
-      uses: actions/setup-python@v4
-      with:
-        python-version: 3.8
-    - name: Install Dependencies
-      run: pip install -e .
+      run: PYTHONPATH="." python test/external/fuzz_shapetracker.py
    - name: Compile EfficientNet to C
      run: PYTHONPATH="." CLANG=1 python3 examples/compile_efficientnet.py > recognize.c
    - name: Compile C to native
@@ -104,44 +76,6 @@ jobs:
    - name: Test EfficientNet
      run: curl https://media.istockphoto.com/photos/hen-picture-id831791190 | ./recognize | grep hen

-  testllvm:
-    name: LLVM Tests
-    runs-on: ubuntu-latest
-    timeout-minutes: 20
-
-    steps:
-    - name: Checkout Code
-      uses: actions/checkout@v3
-    - name: Set up Python 3.8
-      uses: actions/setup-python@v4
-      with:
-        python-version: 3.8
-    - name: Install Dependencies
-      run: pip install -e '.[llvm,testing]' --extra-index-url https://download.pytorch.org/whl/cpu
-    - name: Run Pytest
-      run: ENABLE_METHOD_CACHE=1 LLVM=1 python -m pytest -s -v -n=auto test/
-
-  testclang:
-    strategy:
-      matrix:
-        os: [ubuntu-latest, windows-latest]
-    runs-on: ${{ matrix.os }}
-    name: CLANG Tests ${{ matrix.os }} (w method cache)
-
-    steps:
-    - name: Checkout Code
-      uses: actions/checkout@v3
-    - name: Set up Python 3.8
-      uses: actions/setup-python@v4
-      with:
-        python-version: 3.8
-    - name: Install Dependencies
-      run: pip install -e '.[testing]' --extra-index-url https://download.pytorch.org/whl/cpu
-    - name: Set env
-      run: printf "CI=1\nCLANG=1\nENABLE_METHOD_CACHE=1" >> $GITHUB_ENV
-    - name: Run Pytest
-      run: python -m pytest -s -v -n=auto test/
-
  testtorch:
    name: Torch Tests
    runs-on: ubuntu-latest
@@ -154,79 +88,72 @@ jobs:
      uses: actions/setup-python@v4
      with:
        python-version: 3.8
+    - name: Cache pip
+      uses: actions/cache@v3
+      with:
+        path: ~/.cache/pip
+        key: testing
    - name: Install Dependencies
      run: pip install -e '.[testing]' --extra-index-url https://download.pytorch.org/whl/cpu
    - name: Run Pytest
-      run: TORCH=1 python -m pytest -s -v -n=auto test/
+      run: TORCH=1 python -m pytest -n=auto test/
    - name: Run ONNX
-      run: TORCH=1 python -m pytest test/external/external_test_onnx_backend.py --tb=no --disable-warnings || true
-
-  testgpu:
-    name: GPU Tests
-    runs-on: ubuntu-20.04
-    timeout-minutes: 20
-
-    steps:
-    - name: Checkout Code
-      uses: actions/checkout@v3
-    - name: Update packages
-      run: |
-        wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
-        echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
-        sudo apt-get update
-    - name: Install OpenCL
-      #run: sudo apt-get install -y pocl-opencl-icd
-      run: sudo apt-get install -y intel-oneapi-runtime-compilers intel-oneapi-runtime-opencl
-    - name: Set up Python 3.8
-      uses: actions/setup-python@v4
-      with:
-        python-version: 3.8
-    - name: Install Dependencies
-      run: pip install -e '.[testing]' --extra-index-url https://download.pytorch.org/whl/cpu
-    - name: Run Optimizer Test (OPT 2 and 3)
-      run: |
-        PYTHONPATH="." OPT=2 GPU=1 python test/external/external_test_opt.py
-        PYTHONPATH="." OPT=3 GPU=1 python test/external/external_test_opt.py
-    - name: Run Pytest (default)
-      run: GPU=1 python -m pytest -s -v -n=auto test/
+      run: TORCH=1 python -m pytest -n=auto test/external/external_test_onnx_backend.py --tb=no --disable-warnings || true

  testopencl:
-    name: openpilot (OpenCL) Test
+    strategy:
+      matrix:
+        task: [optimage, openpilot]
+    name: ${{ matrix.task=='optimage'&&'GPU OPT and IMAGE Tests'||'openpilot (OpenCL) Tests'}}
    runs-on: ubuntu-20.04
    timeout-minutes: 20

    steps:
-    - name: Checkout Code
-      uses: actions/checkout@v3
-    - name: Update packages
-      run: |
-        wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
-        echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
-        sudo apt-get update
-    - name: Install OpenCL
-      #run: sudo apt-get install -y pocl-opencl-icd
-      run: sudo apt-get install -y intel-oneapi-runtime-compilers intel-oneapi-runtime-opencl
-    - name: Set up Python 3.8
-      uses: actions/setup-python@v4
-      with:
-        python-version: 3.8
-    - name: Install Dependencies
-      run: pip install -e '.[testing]' --extra-index-url https://download.pytorch.org/whl/cpu
-    - name: Test openpilot model compile and size
-      run: |
-        DEBUG=2 ALLOWED_KERNEL_COUNT=199 FLOAT16=1 DEBUGCL=1 GPU=1 IMAGE=2 python3 openpilot/compile.py
-        python3 -c 'import os; assert os.path.getsize("/tmp/output.thneed") < 100_000_000'
-    - name: Test GPU IMAGE ops
-      run: |
-        GPU=1 IMAGE=1 python3 test/test_ops.py
-        FORWARD_ONLY=1 GPU=1 IMAGE=2 python3 test/test_ops.py
-    - name: Test openpilot model correctness (float32)
-      run: DEBUGCL=1 GPU=1 IMAGE=2 python3 openpilot/compile.py
-    - name: Test tensor core ops
-      run: GPU=1 TC=2 python3 test/test_ops.py
+      - name: Checkout Code
+        uses: actions/checkout@v3
+      - name: Update packages
+        run: |
+          wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
+          echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
+          sudo apt update
+      - name: Install OpenCL
+        #run: sudo apt-get install -y pocl-opencl-icd
+        run: sudo apt install -y intel-oneapi-runtime-compilers intel-oneapi-runtime-opencl
+      - name: Set up Python 3.8
+        uses: actions/setup-python@v4
+        with:
+          python-version: 3.8
+      - name: Cache pip
+        uses: actions/cache@v3
+        with:
+          path: ~/.cache/pip
+          key: testing
+      - name: Install Dependencies
+        run: pip install -e '.[testing]' --extra-index-url https://download.pytorch.org/whl/cpu
+      - if: ${{ matrix.task == 'optimage' }}
+        name: Run Optimizer Test (OPT 2 and 3)
+        run: |
+          PYTHONPATH="." OPT=2 GPU=1 python -m pytest -n=auto test/external/external_test_opt.py
+          PYTHONPATH="." OPT=3 GPU=1 python -m pytest -n=auto test/external/external_test_opt.py
+      - if: ${{ matrix.task == 'optimage'}}
+        name: Test GPU IMAGE ops
+        run: |
+          GPU=1 IMAGE=1 python3 -m pytest -n=auto test/test_ops.py
+          FORWARD_ONLY=1 GPU=1 IMAGE=2 python3 -m pytest -n=auto test/test_ops.py
+      - if: ${{ matrix.task == 'openpilot' }}
+        name: Test openpilot model compile and size
+        run: |
+          DEBUG=2 ALLOWED_KERNEL_COUNT=199 FLOAT16=1 DEBUGCL=1 GPU=1 IMAGE=2 python3 openpilot/compile.py
+          python3 -c 'import os; assert os.path.getsize("/tmp/output.thneed") < 100_000_000'
+      - if: ${{ matrix.task == 'openpilot' }}
+        name: Test openpilot model correctness (float32)
+        run: DEBUGCL=1 GPU=1 IMAGE=2 python3 openpilot/compile.py
+      - if: ${{ matrix.task == 'openpilot' }}
+        name: Test tensor core ops
+        run: GPU=1 TC=2 python3 -m pytest -n=auto test/test_ops.py

-  testmetal:
-    name: Metal Tests
+  testmetalwebgpu:
+    name: Metal and WebGPU Tests
    runs-on: macos-13
    timeout-minutes: 20

@@ -237,19 +164,27 @@ jobs:
      uses: actions/setup-python@v4
      with:
        python-version: 3.11
+    - name: Cache pip
+      uses: actions/cache@v3
+      with:
+        path: ~/Library/Caches/pip
+        key: metalwebgpu
    - name: Install Dependencies
-      run: pip install -e '.[metal,testing]'
+      run: pip install -e '.[metal,webgpu,testing]' --extra-index-url https://download.pytorch.org/whl/cpu
    - name: Test LLaMA compile speed
      run: PYTHONPATH="." METAL=1 python3 test/external/external_test_speed_llama.py
    #- name: Run dtype test
    #  run: DEBUG=4 METAL=1 python -m pytest test/test_dtype.py
    # dtype test has issues on test_half_to_int8
-    - name: Run ops test
+    - name: Run metal ops test
      run: DEBUG=2 METAL=1 python -m pytest test/test_ops.py
    - name: Run JIT test
      run: DEBUG=2 METAL=1 python -m pytest test/test_jit.py
    # TODO: why not testing the whole test/?
-
+    - name: Run webgpu pytest
+      run: WEBGPU=1 WGPU_BACKEND_TYPE=Metal python -m pytest -n=auto -m 'webgpu'
+    - name: Build WEBGPU Efficientnet
+      run: WEBGPU=1 WGPU_BACKEND_TYPE=Metal python -m examples.webgpu.compile_webgpu

  testdocker:
    name: Docker Test
@@ -264,58 +199,73 @@ jobs:
    - name: Test Docker
      run: docker run --rm tinygrad /usr/bin/env python3 -c "from tinygrad.tensor import Tensor; print(Tensor.eye(3).numpy())"

+  tests:
+    strategy:
+      matrix:
+        backend: [llvm, clang, gpu, cuda]

-  testcuda:
-    name: (emulated) cuda test
-    runs-on: ubuntu-22.04
+    name: Tests on (${{ matrix.backend }})
+    runs-on: ${{ matrix.backend == 'gpu'  && 'ubuntu-20.04' || matrix.backend=='clang'&&'windows-latest'|| 'ubuntu-latest' }}
    timeout-minutes: 20

    steps:
-    - name: Checkout Code
-      uses: actions/checkout@v3
-    - name: Update packages
-      run: |
-        export DEBIAN_FRONTEND=noninteractive
-        sudo apt-get update -y
-    - name: Install packages
-      run: sudo apt-get install -y --no-install-recommends git g++ cmake ninja-build llvm-15-dev libz-dev libglew-dev flex bison libfl-dev libboost-thread-dev libboost-filesystem-dev nvidia-cuda-toolkit-gcc
-    - name: Cache gpuocelot
-      id: cache-build
-      uses: actions/cache@v3
-      env:
-        cache-name: cache-gpuocelot-build
-      with:
-        path: ${{ github.workspace }}/gpuocelot/ocelot/
-        key: ubuntu22.04-gpuocelot-19626fc00b6ee321638c3111074269c69050e091
-        restore-keys: |
-          ubuntu22.04-gpuocelot-19626fc00b6ee321638c3111074269c69050e091
-    - if: ${{ steps.cache-build.outputs.cache-hit != 'true' }}
-      name: Clone gpuocelot
-      uses: actions/checkout@v3
-      with:
-        repository: gpuocelot/gpuocelot
-        ref: 19626fc00b6ee321638c3111074269c69050e091
-        path: ${{ github.workspace }}/gpuocelot
-        submodules: true
-    - if: ${{ steps.cache-build.outputs.cache-hit != 'true' }}
-      name: Compile gpuocelot
-      run: |
-        cd ${{ github.workspace }}/gpuocelot/ocelot
-        mkdir build
-        cd build
-        cmake .. -Wno-dev -G Ninja -DOCELOT_BUILD_TOOLS=OFF
-        ninja
-    - name: Install gpuocelot
-      run: |
-        cd ${{ github.workspace }}/gpuocelot/ocelot/build
-        sudo ninja install
-    - name: Set up Python 3.8
-      uses: actions/setup-python@v4
-      with:
-        python-version: 3.8
-        cache: 'pip'
-        cache-dependency-path: setup.py
-    - name: Install tinygrad dependencies
-      run: pip install -e '.[testing, cuda]' --extra-index-url https://download.pytorch.org/whl/cpu
-    - name: Run pytest
-      run: FORWARD_ONLY=1 JIT=1 OPT=2 CUDA=1 CUDACPU=1 python -m pytest -s -v -n=auto test --ignore=test/external --ignore=test/models --ignore=test/test_speed_v_torch.py --ignore=test/test_specific_conv.py --ignore=test/test_net_speed.py --ignore=test/test_nn.py -k "not half"
+      - name: Checkout Code
+        uses: actions/checkout@v3
+      - name: Set up Python 3.8
+        uses: actions/setup-python@v4
+        with:
+          python-version: 3.8
+      - name: Cache pip
+        uses: actions/cache@v3
+        with:
+          path: ${{ matrix.backend=='clang'&&'~\AppData\Local\pip\cache'||'~/.cache/pip' }}
+          key: ${{ matrix.backend }}
+      - name: Set env
+        run: printf "${{ matrix.backend == 'llvm' && 'ENABLE_METHOD_CACHE=1\nLLVM=1' || matrix.backend == 'clang' && 'CLANG=1\nENABLE_METHOD_CACHE=1' || matrix.backend == 'gpu' && 'GPU=1' || matrix.backend == 'cuda' && 'FORWARD_ONLY=1\nJIT=1\nOPT=2\nCUDA=1\nCUDACPU=1\n'}}" >> $GITHUB_ENV
+      - name: Install packages (gpu)
+        if: matrix.backend == 'gpu'
+        run: |
+          wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
+          echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
+          sudo apt update && \
+          sudo apt install -y intel-oneapi-runtime-compilers intel-oneapi-runtime-opencl 
+      - name: Install packages (cuda)
+        if: matrix.backend == 'cuda'
+        run: |
+          export DEBIAN_FRONTEND=noninteractive
+          sudo apt update -y && \
+          sudo apt install -y --no-install-recommends git g++ cmake ninja-build llvm-15-dev zlib1g-dev libglew-dev flex bison libfl-dev libboost-thread-dev libboost-filesystem-dev nvidia-cuda-toolkit-gcc
+      - name: Cache gpuocelot
+        if: matrix.backend == 'cuda'
+        id: cache-build
+        uses: actions/cache@v3
+        env:
+          cache-name: cache-gpuocelot-build
+        with:
+          path: ${{ github.workspace }}/gpuocelot/ocelot/
+          key: ubuntu22.04-gpuocelot-19626fc00b6ee321638c3111074269c69050e091
+          restore-keys: |
+            ubuntu22.04-gpuocelot-19626fc00b6ee321638c3111074269c69050e091
+      - name: Clone/compile gpuocelot
+        if: matrix.backend == 'cuda' && steps.cache-build.outputs.cache-hit != 'true'
+        run: |
+          git clone --recurse-submodules https://github.com/gpuocelot/gpuocelot.git ${{ github.workspace }}/gpuocelot
+          cd ${{ github.workspace }}/gpuocelot/ocelot
+          git checkout 19626fc00b6ee321638c3111074269c69050e091
+          mkdir build
+          cd build
+          cmake .. -Wno-dev -G Ninja -DOCELOT_BUILD_TOOLS=OFF
+          ninja
+      - name: Install gpuocelot
+        if: matrix.backend == 'cuda'
+        run: |
+          cd ${{ github.workspace }}/gpuocelot/ocelot/build
+          sudo ninja install
+      - name: Install dependencies
+        run: pip install -e '.[testing${{matrix.backend=='llvm'&&',llvm'||matrix.backend=='cuda'&&',cuda'||''}}]' --extra-index-url https://download.pytorch.org/whl/cpu
+      - name: Run pytest (not cuda)
+        if: matrix.backend!='cuda'
+        run: python -m pytest -n=auto test/ -k '${{matrix.backend=='llvm'&&'not (test_nn.py and test_conv_transpose2d)'||'test'}}' -m 'not exclude_${{matrix.backend}}'
+      - name: Run pytest (cuda)
+        if: matrix.backend=='cuda'
+        run: python -m pytest -n=auto test/ -k 'not (half or test_efficientnet_safetensors) and not (test_conv2d and test_tensor.py)' -m 'not exclude_cuda' --ignore=test/external --ignore=test/models
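
The new `tests` job leans on the `cond && value || fallback` idiom inside `${{ }}` expressions to pick a runner and the pip extras per backend. As a rough illustration, the same short-circuit logic can be mirrored with Python's `and`/`or`, which also return one of their operands. The function names `pick_runner` and `pip_extras` are hypothetical and exist only for this sketch.

```python
BACKENDS = ["llvm", "clang", "gpu", "cuda"]  # the matrix from the `tests` job


def pick_runner(backend: str) -> str:
    # Mirrors the runs-on expression:
    #   backend == 'gpu' && 'ubuntu-20.04' || backend == 'clang' && 'windows-latest' || 'ubuntu-latest'
    return (
        (backend == "gpu" and "ubuntu-20.04")
        or (backend == "clang" and "windows-latest")
        or "ubuntu-latest"
    )


def pip_extras(backend: str) -> str:
    # Mirrors the install step: '.[testing' + (',llvm' | ',cuda' | '') + ']'
    suffix = (backend == "llvm" and ",llvm") or (backend == "cuda" and ",cuda") or ""
    return f".[testing{suffix}]"


if __name__ == "__main__":
    for backend in BACKENDS:
        print(f"{backend:5s} runs-on={pick_runner(backend):14s} extras={pip_extras(backend)}")
```

One caveat of the idiom, in both languages: if the value after `&&` is itself falsy (for example an empty string), evaluation falls through to the next `||` branch, so the fallback must be the only branch allowed to be empty.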