[MFMA] Introduce dot operand loading fast path (#269)

* [MFMA] Introduce dot operand loading fast path

This PR introduces a fast path for generating the code that loads MFMA
dot operands from LDS.

The fast path is used when the operand is not swizzled and is not a slice
of a larger LDS object (i.e. it is not a slice of a tensor).
This is the case for the current FA and GEMM kernels compiled with
num_stages=1, i.e. with software pipelining disabled.
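
The condition above can be sketched as a small predicate. This is only an illustration of the rule stated in this message, not the code added by the PR: `SharedOperand`, its fields, and `can_use_fast_path` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedOperand:
    """Hypothetical description of a dot operand that lives in LDS."""
    swizzled: bool  # True if the LDS layout of the operand is swizzled
    is_slice: bool  # True if the operand is a slice of a larger LDS object


def can_use_fast_path(op: SharedOperand) -> bool:
    # Both conditions from the commit message must hold:
    # the operand is not swizzled and is not a slice of a bigger tensor.
    return not op.swizzled and not op.is_slice


# A whole, unswizzled operand qualifies; a swizzled or sliced one does not.
whole = SharedOperand(swizzled=False, is_slice=False)
sliced = SharedOperand(swizzled=False, is_slice=True)
print(can_use_fast_path(whole), can_use_fast_path(sliced))
```

Under this reading, any operand that fails the check would go through the existing general loading path.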

* cleanup swizzle info
Alexander Efimov
2023-07-27 20:46:50 +02:00
committed by GitHub
parent 2fbffe2784
commit 0073bb98f4
3 changed files with 390 additions and 77 deletions

@@ -85,9 +85,7 @@ def optimize_ttgir(mod, num_stages, arch):
     pm.add_tritongpu_accelerate_matmul_pass(80)
     pm.add_tritongpu_remove_layout_conversions_pass()
     pm.add_tritongpu_optimize_dot_operands_pass()
-    # TODO enable this pass for AMD GPU when it is ready
-    if not is_hip():
-        pm.add_tritongpu_pipeline_pass(num_stages)
+    pm.add_tritongpu_pipeline_pass(num_stages)
     pm.add_tritongpu_prefetch_pass()
     pm.add_tritongpu_optimize_dot_operands_pass()
     pm.add_tritongpu_remove_layout_conversions_pass()