[BACKEND] Improve decision of MMA dimension on H100 (#2373)

When there is a chain of MMA ops, we want to pick the same shape for all of them to avoid conversions. This change improves the detection of such chains by following them through for loops, and it fixes a crash in the bw attention tutorial. At some point we might want to change this logic and convert the format instead, to allow a more efficient MMA.
2026-04-27 03:01:52 -04:00 · 2023-09-22 15:21:56 -07:00
parent 1724604bd9
commit 840e7e7b53
4 changed files with 106 additions and 14 deletions
--- a/include/triton/Dialect/TritonGPU/Transforms/Utility.h
+++ b/include/triton/Dialect/TritonGPU/Transforms/Utility.h
@@ -141,6 +141,16 @@ Value linearize(OpBuilder &b, Location loc, ArrayRef<Value> multiDim,
 Value linearize(OpBuilder &b, Location loc, ArrayRef<Value> multiDim,
                ArrayRef<unsigned> shape);

+// Implement backward and forward slice that will go through scf blocks when
+// yield or scf results are in the slice.
+// Note that like existing forward and backward slice this may add operations to
+// the slice that are not actually dependent on the root because when a region
+// is added to the slice in the forward slice all the operations of the region
+// are added. We could implement a more accurate slice method by tracking value
+// usage across scf regions.
+void getBackwardSliceSCFAware(Operation *, SetVector<Operation *> *slices);
+void getForwardSliceSCFAware(Value root, SetVector<Operation *> *slices);
+
 } // namespace mlir

 #endif // TRITON_DIALECT_TRITONGPU_TRANSFORMS_UTILITY_H_
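
The header comment above describes a backward slice that keeps going when it reaches an scf result. As a rough illustration only, the sketch below models that idea on a hypothetical toy IR. The `Op` struct, `backwardSlice`, `backwardSliceLoopAware`, and `demoSliceSizes` are all made up for this example and are not the MLIR or Triton API. It assumes that a loop op records the terminator of its body, and that reaching a loop op should continue the walk from that terminator. It does not model the over-approximation mentioned in the comment, where every operation of a region is added.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR, not MLIR: an op lists the ops that define its operands.
struct Op {
  std::string name;
  std::vector<Op *> operands; // defining ops of the operands
  Op *yield;                  // loop ops only: terminator of the loop body
};

// Plain backward slice: transitively collect the ops defining the operands.
// It stops at a loop op and never looks inside the loop body.
void backwardSlice(Op *root, std::set<Op *> &slice) {
  for (Op *def : root->operands)
    if (slice.insert(def).second)
      backwardSlice(def, slice);
}

// Loop-aware variant (an assumption about the intent, not the real code):
// when a loop op enters the slice, also continue from its yield so that
// ops inside the loop body are reached.
void backwardSliceLoopAware(Op *root, std::set<Op *> &slice) {
  std::vector<Op *> worklist{root};
  while (!worklist.empty()) {
    Op *cur = worklist.back();
    worklist.pop_back();
    for (Op *def : cur->operands) {
      if (!slice.insert(def).second)
        continue; // already in the slice
      worklist.push_back(def);
      if (def->yield && slice.insert(def->yield).second)
        worklist.push_back(def->yield);
    }
  }
}

// Builds "dot_after(for(init) { yield dot_in_loop(load) })" and returns the
// sizes of the plain and the loop-aware backward slices of dot_after.
std::pair<std::size_t, std::size_t> demoSliceSizes() {
  Op init{"init", {}, nullptr};
  Op load{"load", {}, nullptr};
  Op dotInLoop{"dot_in_loop", {&load}, nullptr};
  Op yield{"yield", {&dotInLoop}, nullptr};
  Op loop{"for", {&init}, &yield};
  Op dotAfter{"dot_after", {&loop}, nullptr};

  std::set<Op *> plain, aware;
  backwardSlice(&dotAfter, plain);
  backwardSliceLoopAware(&dotAfter, aware);
  assert(plain.count(&dotInLoop) == 0); // the plain walk misses the inner dot
  assert(aware.count(&dotInLoop) == 1); // the loop-aware walk finds it
  return {plain.size(), aware.size()};
}
```

In this toy setup the plain walk sees only the loop and its init value, while the loop-aware walk also reaches the dot inside the loop body, which is the kind of chained MMA op the commit message wants to detect.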