Files
ROCm/python
Weixing Zhang 34b89a1173 [OPTIMIZER] Tweak warpsPerCTA based on the shape of MMA output (#2581)
In current implementation, warpsPerCTA is always set to [numWarps, 1]
for 2 tt.dot fusion scenario. But, it is not optimal for cases such that
tt.dot doesn't have enough parallelism on row dimension but on column
dimension.
2023-11-03 16:40:03 -04:00
..