diff --git a/docs/how-to/rocm-for-ai/inference-optimization/model-acceleration-libraries.rst b/docs/how-to/rocm-for-ai/inference-optimization/model-acceleration-libraries.rst
index 4ea3aa5f7..bdc6aadb6 100755
--- a/docs/how-to/rocm-for-ai/inference-optimization/model-acceleration-libraries.rst
+++ b/docs/how-to/rocm-for-ai/inference-optimization/model-acceleration-libraries.rst
@@ -41,16 +41,16 @@ Installing Flash Attention 2
 
 `Flash Attention `_ supports two backend implementations on AMD GPUs.
 
-* `Composable Kernel (CK) `_ - the default backend
-* `OpenAI Triton `_ - an alternative backend
+* `Composable Kernel (CK) `__ - the default backend
+* `OpenAI Triton `__ - an alternative backend
 
-You can switch between these backends using the environment variable FLASH_ATTENTION_TRITON_AMD_ENABLE:
+You can switch between these backends using the environment variable ``FLASH_ATTENTION_TRITON_AMD_ENABLE``:
 
-FLASH_ATTENTION_TRITON_AMD_ENABLE="FALSE"
-→ Use Composable Kernel (CK) backend (FlashAttention 2)
+``FLASH_ATTENTION_TRITON_AMD_ENABLE="FALSE"``
+→ Use Composable Kernel (CK) backend (Flash Attention 2)
 
-FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE"
-→ Use OpenAI Triton backend (FlashAttention 2)
+``FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE"``
+→ Use OpenAI Triton backend (Flash Attention 2)
 
 To install Flash Attention 2, use the following commands:
 
@@ -74,7 +74,7 @@ For detailed installation instructions, see `Flash Attention
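
The hunk above only changes how the backend switch is documented; functionally, ``FLASH_ATTENTION_TRITON_AMD_ENABLE`` must be set in the environment before ``flash_attn`` is imported. The following sketch (not part of the diff) illustrates selecting the Triton backend at runtime. It assumes the standard ``flash_attn_func`` API from the flash-attn package and a ROCm build of PyTorch; the tensor shapes and the ``causal=True`` flag are illustrative only.

.. code-block:: python

   import os

   # Select the backend before flash_attn is imported:
   # "TRUE"  -> OpenAI Triton backend
   # "FALSE" -> Composable Kernel (CK) backend (the default)
   os.environ["FLASH_ATTENTION_TRITON_AMD_ENABLE"] = "TRUE"

   import torch
   from flash_attn import flash_attn_func

   # Dummy query/key/value tensors in the layout flash-attn expects:
   # (batch, seqlen, num_heads, head_dim), half precision, on the GPU.
   # PyTorch on ROCm exposes HIP devices under the "cuda" device name.
   q = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device="cuda")
   k = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device="cuda")
   v = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device="cuda")

   out = flash_attn_func(q, k, v, causal=True)
   print(out.shape)  # torch.Size([2, 1024, 16, 64])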