mirror of
https://github.com/ROCm/ROCm.git
synced 2026-04-05 03:01:17 -04:00
* [WIP][FA OPTIMIZATION] Optimize chain dot This commit optimizes chain dot operation by keeping results of the first dot operation in registers. * [FA OPTIMIZATION] Enable lowering pipeline for keeping result of chain dot in registers * Move operand swapping in ttgir -> llir lowering phase * Refactor emitMfmaOffsetForCTA function to be more readable * Fix accidental change in 06-fused-attention.py * Address review comments * Fix rebase errors
21 KiB
21 KiB