mirror of
https://github.com/ROCm/ROCm.git
synced 2026-04-05 03:01:17 -04:00
[BACKEND] Fix scan issues on repetitive warps and improve perf when there's a single warp on the axis (#2330)
1. On the axis, use `getAxisNumWarpsWithUniqueData` instead of the raw number of warps, so that warps handling the same piece of data do not communicate with each other.
2. When there is a single warp on the axis, use warp intrinsics for communication and skip shared memory.

A follow-up PR is needed for code cleanup.
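To illustrate the single-warp case, here is a minimal host-side sketch of a scan that relies only on lane-to-lane exchanges. It is not the code from this commit: `shuffleUp` and `warpInclusiveScan` are hypothetical names, the "warp" is simulated as a `std::vector<int>`, and addition stands in for the scan's combine operation. On a GPU the exchange step would presumably map to a warp shuffle intrinsic, which is why no shared-memory buffer is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side stand-in for a warp shuffle-up (hypothetical helper): lane i reads
// the value held by lane i - delta; lanes with i < delta keep their own value.
std::vector<int> shuffleUp(const std::vector<int> &lanes, std::size_t delta) {
  assert(delta > 0 && "shuffle distance must be positive");
  std::vector<int> out(lanes);
  for (std::size_t i = delta; i < lanes.size(); ++i)
    out[i] = lanes[i - delta];
  return out;
}

// Inclusive prefix sum across the lanes of one simulated warp. Each round
// doubles the shuffle distance, so a warp of N lanes needs about log2(N)
// exchange rounds and no shared buffer.
std::vector<int> warpInclusiveScan(std::vector<int> lanes) {
  for (std::size_t delta = 1; delta < lanes.size(); delta *= 2) {
    std::vector<int> shifted = shuffleUp(lanes, delta);
    // Only lanes that actually received a neighbour's value accumulate it.
    for (std::size_t i = delta; i < lanes.size(); ++i)
      lanes[i] += shifted[i];
  }
  return lanes;
}
```

Calling `warpInclusiveScan` on 32 lanes that each hold 1 yields the running counts 1 through 32. When several warps with distinct data share the axis, a step like this would only produce per-warp partial results, and the partial results would still have to be combined across warps.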
@@ -88,6 +88,8 @@ public:
   unsigned getNonAxisNumThreadsPerCTA();
   // Return the number of warps per CTA along axis dim.
   unsigned getAxisNumWarps();
+  // Return the number of warps per CTA along axis dim with unique data.
+  unsigned getAxisNumWarpsWithUniqueData();
   // Return the number of threads per warp along axis dim.
   unsigned getAxisNumThreadsPerWarp();
   // Return the number of blocks along axis dim.
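As a rough sketch of how the two accessors above can differ, the following assumes a simplified description of the layout along the scan axis. The `AxisLayout` struct, its fields, and the free functions are hypothetical and are not the helper's actual implementation. The assumption is that once the warps cover the whole axis, any additional warps hold replicated data and should not take part in cross-warp communication.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified layout description along the scan axis only.
struct AxisLayout {
  unsigned axisSize;       // number of tensor elements along the axis
  unsigned elemsPerThread; // elements owned by one thread along the axis
  unsigned threadsPerWarp; // threads of one warp laid out along the axis
  unsigned warps;          // raw number of warps per CTA along the axis
};

// Raw warp count along the axis, including warps that replicate data.
unsigned axisNumWarps(const AxisLayout &l) { return l.warps; }

// Warps along the axis that hold distinct data (assumed rule): no more warps
// than are needed to cover the axis once, and never fewer than one.
unsigned axisNumWarpsWithUniqueData(const AxisLayout &l) {
  unsigned elemsPerWarp = l.elemsPerThread * l.threadsPerWarp;
  assert(elemsPerWarp > 0 && "a warp must cover at least one element");
  unsigned warpsNeeded = (l.axisSize + elemsPerWarp - 1) / elemsPerWarp;
  return std::max(1u, std::min(l.warps, warpsNeeded));
}

// With a single unique warp on the axis, the scan can stay inside the warp.
bool canSkipSharedMemory(const AxisLayout &l) {
  return axisNumWarpsWithUniqueData(l) == 1;
}
```

For example, with 32 elements on the axis, one element per thread, 32 threads per warp, and 4 warps, the raw count is 4 but only 1 warp holds unique data, so the single-warp path applies under this assumed rule.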