[BACKEND] Fix reductions when number of unique elements is smaller than the layout (#1913)

Fix the calculation of the number of threads within a warp that hold unique
data. The calculation needs to take the number of elements per thread into
account. Also change the layout test to use an integer sum so that it catches
bugs in the handling of unique data, since a max reduction may hide this kind
of problem.
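
To illustrate why the test moves from max to an integer sum, here is a minimal sketch in plain Python. It is not Triton code: `lane_values`, `warp_reduce`, and both lane-count formulas are hypothetical and only model a layout that is larger than the tensor, where extra lanes hold replicated elements.

```python
import math
import operator
from functools import reduce


def lane_values(data, lane, elems_per_thread):
    """Elements held by one lane; indices wrap, so extra lanes replicate data."""
    start = lane * elems_per_thread
    return [data[(start + i) % len(data)] for i in range(elems_per_thread)]


def warp_reduce(data, num_lanes, elems_per_thread, combine):
    """Combine every element held by the first `num_lanes` lanes."""
    held = [
        value
        for lane in range(num_lanes)
        for value in lane_values(data, lane, elems_per_thread)
    ]
    return reduce(combine, held)


data = [3, 1, 4, 1, 5, 9, 2, 6]
threads_per_warp = 32
elems_per_thread = 2

# Assumed buggy count: ignores that each lane already holds two elements.
wrong_lanes = min(threads_per_warp, len(data))
# Assumed fixed count: divide by the elements per thread first.
right_lanes = min(threads_per_warp, math.ceil(len(data) / elems_per_thread))

max_wrong = warp_reduce(data, wrong_lanes, elems_per_thread, max)
sum_wrong = warp_reduce(data, wrong_lanes, elems_per_thread, operator.add)
sum_right = warp_reduce(data, right_lanes, elems_per_thread, operator.add)

print(max_wrong == max(data))   # True: max is unaffected by duplicates
print(sum_wrong == sum(data))   # False: every element was counted twice
print(sum_right == sum(data))   # True
```

Max is idempotent, so reducing over replicated lanes still yields the right answer; a sum counts each duplicate again, which is what makes it a stricter check.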
Thomas
2023-07-07 19:48:13 -07:00
committed by GitHub
parent 778ed64a66
commit bd900e0a6f
5 changed files with 36 additions and 38 deletions


@@ -918,7 +918,8 @@ unsigned ModuleAxisInfoAnalysis::getPtrContiguity(Value ptr) {
 auto order = triton::gpu::getOrder(layout);
 unsigned align = getPtrAlignment(ptr);
-auto uniqueContigPerThread = triton::gpu::getUniqueContigPerThread(tensorTy);
+auto uniqueContigPerThread =
+    triton::gpu::getUniqueContigPerThread(layout, tensorTy.getShape());
 assert(order[0] < uniqueContigPerThread.size() &&
        "Unxpected uniqueContigPerThread size");
 unsigned contiguity = uniqueContigPerThread[order[0]];