ROCm/test
Alexander Efimov a944811b6d Replace inline assembly in commonShflSync with intrinsics (#418)
Inline assembly is opaque to the compiler: it does not take the surrounding
instructions into account and, in general, cannot avoid data hazards.
Replacing the inline asm with intrinsics solves this problem.
This code behaved incorrectly in one of the MFMA dot tests:

Code generated with inline assembly, where `ds_swizzle_b32` reads `v4` immediately after `v_mfma_f32_4x4x4f16` writes it:

```
  v_mfma_f32_4x4x4f16 v[4:7], v[4:5], v[6:7], 0
  ds_swizzle_b32 v3, v4, offset:swizzle(SWAP:4)
```

Correct code generated with intrinsics, where the compiler inserts `s_nop 4` between the two instructions:

```
  v_mfma_f32_4x4x4f16 v[4:7], v[4:5], v[6:7], 0
  s_nop 4
  ds_swizzle_b32 v3, v4, offset:swizzle(SWAP:4)
```
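Both listings use `ds_swizzle_b32` with `swizzle(SWAP:4)`. As an illustration only, the C++ sketch below models the bit-mask mode of this instruction as described in AMD's ISA documentation: within a group of 32 lanes, lane `i` reads from lane `((i & and_mask) | or_mask) ^ xor_mask`. Expanding SWAP:4 to `and_mask = 0x1F`, `or_mask = 0`, `xor_mask = 4` is an assumption about how the assembler encodes that pattern; `swizzleOffset` and `dsSwizzle` are hypothetical helpers, and the model ignores inactive lanes, quad-permute mode and the hazard itself.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified, hypothetical model of ds_swizzle_b32 in bit-mask mode
// (offset bit 15 == 0). Inactive lanes and quad-permute mode are not modelled.
constexpr std::size_t kGroup = 32;

// Pack the three 5-bit masks into the 16-bit offset field:
// and_mask = offset[4:0], or_mask = offset[9:5], xor_mask = offset[14:10].
constexpr std::uint16_t swizzleOffset(unsigned andMask, unsigned orMask,
                                      unsigned xorMask) {
  return static_cast<std::uint16_t>((andMask & 0x1Fu) |
                                    ((orMask & 0x1Fu) << 5) |
                                    ((xorMask & 0x1Fu) << 10));
}

// Each lane i of a 32-lane group receives the value held by lane
// j = ((i & and_mask) | or_mask) ^ xor_mask.
std::array<std::uint32_t, kGroup> dsSwizzle(
    const std::array<std::uint32_t, kGroup>& in, std::uint16_t offset) {
  const std::size_t andMask = offset & 0x1Fu;
  const std::size_t orMask = (offset >> 5) & 0x1Fu;
  const std::size_t xorMask = (offset >> 10) & 0x1Fu;
  std::array<std::uint32_t, kGroup> out{};
  for (std::size_t i = 0; i < kGroup; ++i) {
    const std::size_t j = ((i & andMask) | orMask) ^ xorMask;
    assert(j < kGroup);  // all masks are 5 bits wide, so j stays in range
    out[i] = in[j];
  }
  return out;
}
```

For example, filling the lanes with `100 + i` and applying `dsSwizzle(lanes, swizzleOffset(0x1F, 0, 4))` (offset `0x101F` under this assumed encoding) gives lane 0 the value 104 and lane 4 the value 100: lanes exchange values in blocks of four. Because every lane reads another lane's `v4`, a swizzle issued before the MFMA result is ready spreads stale data across lanes.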
2023-12-11 16:41:39 +01:00