mirror of
https://github.com/ROCm/ROCm.git
synced 2026-04-05 03:01:17 -04:00
In current implementation, warpsPerCTA is always set to [numWarps, 1] for 2 tt.dot fusion scenario. But, it is not optimal for cases such that tt.dot doesn't have enough parallelism on row dimension but on column dimension.