[BACKEND] Support MMA V3 with register operand (#2375)

MMA V3 supports taking operand A from registers. This helps chained matmul operations such as those in attention. This change adds an optimization that uses this mode when it is beneficial, along with the lowering for it.
2026-04-05 03:01:17 -04:00 · 2023-09-25 10:43:54 -07:00
parent 8ae2ae4f40
commit 6bc1d9e1be
13 changed files with 186 additions and 78 deletions
--- a/lib/Analysis/Allocation.cpp
+++ b/lib/Analysis/Allocation.cpp
@@ -96,6 +96,8 @@ SmallVector<unsigned>
 getScratchConfigForCvtLayout(triton::gpu::ConvertLayoutOp op, unsigned &inVec,
                             unsigned &outVec) {
  auto repShape = getRepShapeForCvtLayout(op);
+  if (repShape.empty())
+    return repShape;

  auto srcTy = op.getSrc().getType().cast<RankedTensorType>();
  auto dstTy = op.getResult().getType().cast<RankedTensorType>();