mirror of
https://github.com/ROCm/ROCm.git
synced 2026-04-05 03:01:17 -04:00
Replace inline assembly in commonShflSync with intrinsics (#418)
Inline assembly does not take the surrounding instructions into account and in general cannot avoid data hazards. Replacing the inline asm with intrinsics solves this problem.

This particular code behaved incorrectly in one of the mfma dot tests.

Code generated with inline assembly:
```
v_mfma_f32_4x4x4f16 v[4:7], v[4:5], v[6:7], 0
ds_swizzle_b32 v3, v4, offset:swizzle(SWAP:4)
```
Correct code generated with intrinsics:
```
v_mfma_f32_4x4x4f16 v[4:7], v[4:5], v[6:7], 0
s_nop 4
ds_swizzle_b32 v3, v4, offset:swizzle(SWAP:4)
```
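The difference between the two listings can be illustrated with a toy model. This is only a sketch, not the actual LLVM AMDGPU backend logic: the `Instr` class, the `long_latency` flag, and `insert_hazard_nops` are hypothetical names invented for this example, and the `s_nop 4` padding is copied from the listing above rather than derived from the ISA rules. The assumption is that a hazard pass can only pad a dependency it can see, and that an inline asm string exposes no register operands to it.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical instruction record used only for this illustration."""
    text: str                        # assembly text as it would be printed
    defs: frozenset = frozenset()    # registers written
    uses: frozenset = frozenset()    # registers read
    opaque: bool = False             # True for inline asm: operands are invisible
    long_latency: bool = False       # True for the mfma-like producer in this model


def insert_hazard_nops(instrs, nop="s_nop 4"):
    """Insert `nop` between a long-latency producer and a visible consumer.

    Assumption: only the immediately preceding instruction is checked, and an
    opaque (inline asm) consumer is never padded because its uses are unknown.
    """
    out = []
    prev = None
    for ins in instrs:
        visible_dependency = (
            prev is not None
            and prev.long_latency
            and not ins.opaque
            and bool(prev.defs & ins.uses)
        )
        if visible_dependency:
            out.append(nop)
        out.append(ins.text)
        prev = ins
    return out


mfma = Instr(
    "v_mfma_f32_4x4x4f16 v[4:7], v[4:5], v[6:7], 0",
    defs=frozenset({"v4", "v5", "v6", "v7"}),
    uses=frozenset({"v4", "v5", "v6", "v7"}),
    long_latency=True,
)
swizzle_text = "ds_swizzle_b32 v3, v4, offset:swizzle(SWAP:4)"

# Inline asm: the model sees only a string, so no defs/uses are recorded.
swizzle_asm = Instr(swizzle_text, opaque=True)
# Intrinsic: the operands are modelled, so the read of v4 is visible.
swizzle_intrinsic = Instr(
    swizzle_text, defs=frozenset({"v3"}), uses=frozenset({"v4"})
)

with_asm = insert_hazard_nops([mfma, swizzle_asm])
with_intrinsic = insert_hazard_nops([mfma, swizzle_intrinsic])

print("\n".join(with_asm))
print("---")
print("\n".join(with_intrinsic))
```

In this model the inline asm variant is emitted back to back with the mfma, while the intrinsic variant gets the padding instruction in between, mirroring the two listings in the commit message.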
@@ -1961,11 +1961,11 @@ module attributes {"triton_gpu.num-ctas" = 1 : i32, "triton_gpu.num-warps" = 1 :
 // PTX: nvvm.shfl.sync bfly
 // PTX: nvvm.barrier0

-// GCN-COUNT-4: ds_swizzle_b32
+// GCN-COUNT-4: rocdl.ds_swizzle %{{.*}} : (i32, i32) -> i32
 // GCN: llvm.store
 // GCN: rocdl.barrier
 // GCN: llvm.load
-// GCN-COUNT-2: ds_swizzle_b32
+// GCN-COUNT-2: rocdl.ds_swizzle %{{.*}} : (i32, i32) -> i32
 // GCN: llvm.store
 // GCN: rocdl.barrier
 // GCN: llvm.load