mirror of
https://github.com/ROCm/ROCm.git
synced 2026-04-05 03:01:17 -04:00
[Backend] Refactor sharedToDotOperandMFMA lowering (#439)
* Remove unnecessary xor computations for k-major swizzled tensors
* Support mfma16 and mfma4 in the fast path
* Choose warpsPerCTA according to nonKDim
* Set maxPhase=4 for mfma4
* Fix tests
  For now, we do not disable swizzling for k-major tensors
* Remove fastPathComputeOffsetsTy1
* Enable k-major + disabled swizzling in the normal path
@@ -157,6 +157,9 @@ compared to 1*64 when the hasLeadingOffset is false.
      // maxPhase is set to SIMDWidth / perPhase
      int vecSize = ((typeWidthInBit == 16) ? 64 : 32) / typeWidthInBit;
      int maxPhase = SIMDWidth / perPhase;
      // TODO (zhanglx): figure out better parameters for mfma4
      if (mfmaEnc.getNonKDim() == 4)
        maxPhase = 4;

      return get(context, vecSize, perPhase, maxPhase, order, CTALayout);
    } else {
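
The parameter arithmetic in the hunk above can be tried in isolation. The following is a minimal sketch, not the project's implementation: `SwizzleParams` and `computeSwizzleParams` are hypothetical names, and `simdWidth`, `perPhase`, and `nonKDim` are taken as plain inputs because the excerpt does not show where they are computed.

```cpp
#include <cassert>

// Hypothetical container for the three swizzle parameters that the hunk
// forwards to get(...). Not a type from the original code base.
struct SwizzleParams {
  int vecSize;
  int perPhase;
  int maxPhase;
};

// Reproduces the arithmetic shown in the hunk. simdWidth, perPhase and
// nonKDim are assumed inputs; the excerpt does not show how they are derived.
SwizzleParams computeSwizzleParams(int typeWidthInBit, int simdWidth,
                                   int perPhase, int nonKDim) {
  assert(typeWidthInBit > 0 && perPhase > 0);

  // 16-bit element types use a 64-bit vector, all other widths a 32-bit one.
  int vecSize = ((typeWidthInBit == 16) ? 64 : 32) / typeWidthInBit;

  // Default: maxPhase is SIMDWidth / perPhase.
  int maxPhase = simdWidth / perPhase;

  // Override added by this commit for mfma4 (nonKDim == 4).
  if (nonKDim == 4)
    maxPhase = 4;

  return {vecSize, perPhase, maxPhase};
}
```

With the illustrative inputs `computeSwizzleParams(16, 16, 2, 32)` the sketch yields `vecSize = 4` and `maxPhase = 8`; switching `nonKDim` to 4 leaves `vecSize` unchanged and forces `maxPhase` to 4 regardless of `perPhase`.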