Files
ROCm/include
Lixun Zhang 2e217c5a5c [Backend] Refactor sharedToDotOperandMFMA lowering (#439)
* Remove unnecessary xor computations for k-major swizzled tensors

* Support mfma16 and mfma4 in the fast path

* Choose warpsPerCTA according to nonKDim

* Set maxPhase=4 for mfma4

* Fix tests

For now, we do not disable swizzling for k-major tensors

* Remove fastPathComputeOffsetsTy1

* Enable k-major + disabled swizzling in the normal path
2024-01-12 12:50:18 -06:00
..