[BACKEND] Fix reductions when number of unique element is smaller than layout (#1913)

Fix the calculation of the number of threads within a warp that hold unique data; it needs to take the number of elements per thread into account. Also change the layout test to use an integer sum so that it catches bugs with unique data, since a max reduction may hide this kind of problem.
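
To illustrate the calculation described above, here is a minimal standalone sketch. It is not Triton's implementation: the helper name `uniqueThreadsPerWarp`, its parameters, and the use of ceiling division are assumptions made for illustration. The idea is that, along each dimension, the number of threads holding unique data is capped by how many threads are needed to cover the tensor once each thread owns `sizePerThread` elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Triton): for each dimension, return how
// many threads of a warp hold unique data for a tensor of the given shape.
// Assumption: a thread owns sizePerThread[d] elements along dimension d, so
// ceil(shape[d] / sizePerThread[d]) threads are enough to cover it; any
// further threads in the warp only hold replicated data.
std::vector<unsigned>
uniqueThreadsPerWarp(const std::vector<int64_t> &shape,
                     const std::vector<unsigned> &sizePerThread,
                     const std::vector<unsigned> &threadsPerWarp) {
  assert(shape.size() == sizePerThread.size() &&
         shape.size() == threadsPerWarp.size() && "rank mismatch");
  std::vector<unsigned> unique(shape.size());
  for (std::size_t d = 0; d < shape.size(); ++d) {
    // Threads needed to cover dimension d (ceiling division).
    int64_t needed = (shape[d] + sizePerThread[d] - 1) / sizePerThread[d];
    // Never more than the layout provides along this dimension.
    unique[d] = static_cast<unsigned>(
        std::min<int64_t>(threadsPerWarp[d], needed));
  }
  return unique;
}
```

For a 16-element tensor with 4 elements per thread and 32 threads per warp, this sketch reports 4 unique threads; ignoring the elements per thread would give 16 instead. This is also why a sum is the better test reduction: summing a replicated copy of the data changes the result, whereas taking the max over replicated data leaves it unchanged.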
2026-04-05 03:01:17 -04:00 · 2023-07-07 19:48:13 -07:00
parent 778ed64a66
commit bd900e0a6f
5 changed files with 36 additions and 38 deletions
--- a/lib/Analysis/AxisInfo.cpp
+++ b/lib/Analysis/AxisInfo.cpp
@@ -918,7 +918,8 @@ unsigned ModuleAxisInfoAnalysis::getPtrContiguity(Value ptr) {
  auto order = triton::gpu::getOrder(layout);
  unsigned align = getPtrAlignment(ptr);

-  auto uniqueContigPerThread = triton::gpu::getUniqueContigPerThread(tensorTy);
+  auto uniqueContigPerThread =
+      triton::gpu::getUniqueContigPerThread(layout, tensorTy.getShape());
  assert(order[0] < uniqueContigPerThread.size() &&
         "Unxpected uniqueContigPerThread size");
  unsigned contiguity = uniqueContigPerThread[order[0]];