ROCm/python/test/unit/hopper
Beal Wang 7e5cd95bf2 [OPTIMIZER] Fix Warp Specialized kernel launch failure (#2146)
For a warp-specialized persistent kernel, the instruction sequences for
the warp groups are:
```
// warp group 0
for wave in 0..num_waves:
    idx = wave * num_inner_loop_steps;
    for k_tile_idx in 0..num_k_tiles:
        mbarrier.wait EB[idx];
        W0;
        mbarrier.arrive FB[idx];
        idx++;
```
```
// warp group 1
for wave in 0..num_waves:
    idx = wave * num_inner_loop_steps;
    for k_tile_idx in 0..num_k_tiles:
        mbarrier.wait FB[idx];
        R0;
        mbarrier.arrive EB[idx];
        idx++;
```
This forms a sequence of morally strong relations W0 -> R0 ->
W1 -> R1 in causality order.
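The handshake above can be sketched on the host as a rough analogy. This is a minimal sketch, not the kernel's implementation: Python semaphores stand in for the mbarriers, a single buffer slot is assumed, and the empty barrier is assumed to start in the "free" state. The names `run_handshake`, `empty`, and `full` are hypothetical.

```python
import threading

def run_handshake(num_steps):
    """Simulate the two warp groups with one buffer slot (assumed)."""
    empty = threading.Semaphore(1)  # stands in for EB; assumed to start free
    full = threading.Semaphore(0)   # stands in for FB; nothing written yet
    log = []

    def warp_group_0():
        for i in range(num_steps):
            empty.acquire()        # mbarrier.wait EB
            log.append(f"W{i}")    # write into the buffer
            full.release()         # mbarrier.arrive FB

    def warp_group_1():
        for i in range(num_steps):
            full.acquire()         # mbarrier.wait FB
            log.append(f"R{i}")    # read from the buffer
            empty.release()        # mbarrier.arrive EB

    threads = [threading.Thread(target=warp_group_0),
               threading.Thread(target=warp_group_1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log

print(run_handshake(2))
```

Under these assumptions each write waits for the previous read to finish, and each read waits for its write, so the log alternates between writes and reads.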
However, if the GEMM K dimension is smaller than the K tile shape, the
num_inner_loop_steps of the persistent kernel is 0. The buffer id and
mbarrier id are then always 0. This may produce the order
W0 -> W1 -> R0 -> R1, which contradicts atomicity:
"If a read R precedes an overlapping write W in causality order, then R
cannot read from W."
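The index collapse can be illustrated by replaying the index arithmetic from the pseudocode. This is a sketch under assumptions: the helper `barrier_indices` is hypothetical, and the degenerate case is assumed to still run one K tile per wave while num_inner_loop_steps evaluates to 0, which the commit message implies but does not state.

```python
def barrier_indices(num_waves, num_inner_loop_steps, num_k_tiles):
    """Return the buffer/mbarrier index used at each (wave, k_tile) step."""
    indices = []
    for wave in range(num_waves):
        idx = wave * num_inner_loop_steps  # same formula as the pseudocode
        for _ in range(num_k_tiles):
            indices.append(idx)
            idx += 1
    return indices

# Regular case: every step gets its own index.
print(barrier_indices(num_waves=2, num_inner_loop_steps=2, num_k_tiles=2))

# Degenerate case (assumed: one K tile per wave, zero inner loop steps):
# every wave starts from index 0 again.
print(barrier_indices(num_waves=3, num_inner_loop_steps=0, num_k_tiles=1))
```

In the degenerate case every wave reuses index 0, so successive waves share one buffer and one barrier pair.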
2023-08-21 14:46:57 +08:00