ROCm/lib at 6ac9d51ff00803281253c7299c48fe0bb11dde33 - ROCm

mirror of https://github.com/ROCm/ROCm.git synced 2026-04-05 03:01:17 -04:00

Files

Thomas Raoux 6ac9d51ff0 [OPTIMIZATION] Enable pipelining for bwd flash attention (#2590)

This allows pipelining when a load is used by multiple dots in a loop.

Relax the condition to pipeline dot operands for the mma v3 case. This
improves performance for the bwd pass from 260TF to 275TF. However, this
exposes a performance problem due to the wgmma pipelining, as ptxas will
now fall back to serial wgmma. A follow-up PR will fix a bug in how we
emit wgmma_wait during pipelining and will bring performance to 335TF.

2023-11-03 11:46:51 -07:00

Analysis

[BACKEND] Handle AtomicCASOp in GPU IR conversion (#2514)

2023-10-25 15:20:07 -04:00

Conversion

[BACKEND] Pipeliner refactoring (#2565)

2023-11-02 09:56:39 -07:00

Dialect

[OPTIMIZATION] Enable pipelining for bwd flash attention (#2590)

2023-11-03 11:46:51 -07:00

Target

upgrade llvm to b1115f8c (NFC) (#2403)

2023-10-16 16:38:49 -07:00

CMakeLists.txt

[BACKEND] Remove HopperHelpers.c and replace with inline PTX and LLVM codegen (#2047)

2023-08-10 15:52:37 -07:00