ROCm/scripts/amd
Alexander Efimov 605a90c58e [MFMA] Support tile size 4x4 version 1 (#413)
This PR enables the 4x4 tile size in MFMA-based dot operations.

The supported tiled dot is (4x64) x (64x4) -> (4x4) in the MFMA layout.
However, the actual dot operation must produce at least 64 output elements. This is a limitation of other layouts that appear during result processing (for example, the blocked layout cannot handle tensors smaller than the wave size).

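For example, the following dots are supported: (4x64) x (64x16) -> (4x16), (16x64) x (64x4) -> (16x4), and (8x64) x (64x8) -> (8x8), each producing 64 output elements.
The following dots are not supported: (4x128) x (128x4) -> (4x4) and (4x64) x (64x8) -> (4x8), which produce only 16 and 32 output elements, respectively.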
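The shape rule above can be sketched as a small check. This is a hypothetical helper, not part of Triton or ROCm: it assumes a wavefront size of 64 and assumes M and N must be multiples of the 4x4 tile; only the "at least 64 output elements" limit comes from the description itself. K is not checked because no separate rule for it is stated.

```python
# Hypothetical helper (not a Triton/ROCm API): checks whether a dot of shape
# (M x K) x (K x N) -> (M x N) meets the limits described for MFMA 4x4 v1.

WAVE_SIZE = 64  # assumed wavefront size (64 lanes)
MFMA_TILE = 4   # 4x4 output tile of the MFMA instruction


def is_supported_4x4_dot(m: int, k: int, n: int) -> bool:
    """Return True if (m x k) x (k x n) -> (m x n) fits the stated limits."""
    # Assumption: M and N must be multiples of the 4x4 output tile.
    if m % MFMA_TILE != 0 or n % MFMA_TILE != 0:
        return False
    # Stated limitation: layouts used during result processing (e.g. the
    # blocked layout) cannot handle tensors smaller than the wave size,
    # so the result needs at least WAVE_SIZE elements.
    # K is intentionally not checked: no rule for it is given.
    return m * n >= WAVE_SIZE


if __name__ == "__main__":
    # Shapes taken from the examples in the description.
    for m, k, n in [(4, 64, 16), (16, 64, 4), (8, 64, 8), (4, 128, 4), (4, 64, 8)]:
        print((m, k, n), is_supported_4x4_dot(m, k, n))
    # prints True for the first three shapes and False for the last two
```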

This is a first version of the dot operation using MFMA 4x4 instructions, with redundancy and reductions.
2023-12-12 18:23:55 +01:00