use Ops.REDUCE (#9721)

* decrease bert python time [pr]

* order copies

* Revert "order copies"

This reverts commit 3f62c8693b.

* rewrite count

* Ops.REDUCE

* acc first in the add chain

* Fix tensor core acc

* arange patterns look good

* fix multireduce gate

* reduce rewrite rule

* bump that to 15 minutes

* multiwmma isn't fusing

* gep through wmma is gep pushing

* bump that timeout too, it's all env setup

* add failing test
George Hotz
2025-04-04 10:14:34 +08:00
committed by GitHub
parent 949459fdd6
commit cac8bcf8b5
11 changed files with 115 additions and 43 deletions

@@ -149,7 +149,7 @@ jobs:
   torchbackend:
     name: Torch Backend Tests
     runs-on: ubuntu-latest
-    timeout-minutes: 10
+    timeout-minutes: 15
     steps:
     - name: Checkout Code
       uses: actions/checkout@v4
@@ -186,7 +186,7 @@ jobs:
   torchbackendmore:
     name: Torch Backend Tests More
     runs-on: ubuntu-latest
-    timeout-minutes: 10
+    timeout-minutes: 15
     steps:
     - name: Checkout Code
       uses: actions/checkout@v4