[BACKEND] Support MMA V3 with register operand (#2375)

MMA V3 supports taking operand A from registers. This helps chained
matmul operations such as those in attention.
Add an optimization that uses this mode when it is beneficial, and add
the lowering for it.
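
To illustrate why a register operand helps chained matmuls, here is a rough CPU analogy in plain C++. It is not Triton or GPU code: the names `chainedMatmul`, `MatA`, `MatB`, `MatC`, and `MatOut` and the fixed sizes are hypothetical, and the local array `acc` only stands in for registers. The point is that the result of the first product is consumed directly as the left operand (operand A) of the second, without a round trip through a staging buffer.

```cpp
#include <array>
#include <cstddef>

// Illustrative CPU analogy only; sizes and names are assumptions.
constexpr std::size_t M = 2, K = 3, N = 2, P = 2;

using MatA = std::array<std::array<float, K>, M>;
using MatB = std::array<std::array<float, N>, K>;
using MatC = std::array<std::array<float, P>, N>;
using MatOut = std::array<std::array<float, P>, M>;

// Computes (a * b) * c one output row at a time.
MatOut chainedMatmul(const MatA &a, const MatB &b, const MatC &c) {
  MatOut out{};
  for (std::size_t i = 0; i < M; ++i) {
    // Row i of the first product, kept in a local array
    // (a stand-in for keeping the intermediate in registers).
    std::array<float, N> acc{};
    for (std::size_t k = 0; k < K; ++k)
      for (std::size_t j = 0; j < N; ++j)
        acc[j] += a[i][k] * b[k][j];

    // The intermediate row is used directly as the left operand
    // of the second product; it is never copied to another buffer.
    for (std::size_t j = 0; j < N; ++j)
      for (std::size_t p = 0; p < P; ++p)
        out[i][p] += acc[j] * c[j][p];
  }
  return out;
}
```

In an attention-style kernel the first product would correspond to the score matrix and the second to its multiplication with the values; whether the register path is actually faster on a given shape is what the added optimization is meant to decide.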
This commit is contained in:
Thomas Raoux
2023-09-25 10:43:54 -07:00
committed by GitHub
parent 8ae2ae4f40
commit 6bc1d9e1be
13 changed files with 186 additions and 78 deletions

@@ -96,6 +96,8 @@ SmallVector<unsigned>
 getScratchConfigForCvtLayout(triton::gpu::ConvertLayoutOp op, unsigned &inVec,
                              unsigned &outVec) {
   auto repShape = getRepShapeForCvtLayout(op);
+  if (repShape.empty())
+    return repShape;
   auto srcTy = op.getSrc().getType().cast<RankedTensorType>();
   auto dstTy = op.getResult().getType().cast<RankedTensorType>();