Add JAX compatibility doc (#4234)

* Add JAX compatibility (cherry picked from commit 99215ab6b4cf6a1209d6c5fc781b5855251dcba5)
* WIP (cherry picked from commit 54564a85d340b4149ed80a33377cf54c1eb48713)
* Fix docker table (cherry picked from commit 8115a905764c869b390de2561e5f1356ec7e9743)
* WIP (cherry picked from commit 45076e1fd20fd2c43f7a0ab6d8d5d246c498d801)
* add minor formatting (cherry picked from commit c75706841092006c26766611b0407b79a13c7345)
* PR feedbacks (cherry picked from commit 236b5daae4251c26cd697c6e20d5982771b05754)
* fix inconsistent formatting (cherry picked from commit 0c6a2e3627f9e6159e3f400ab18769904c18097e)
* Rename file (cherry picked from commit f17239aa8a9fa1ecdf8dab08c0348dc9216c5311)
* jax_triton supported (cherry picked from commit fa56d697fbaa44c0c480df71dc236be8584291c0)
* WIP (cherry picked from commit e8f0c5741fe96bb1e3272365906334d911a9a849)
* WIP (cherry picked from commit 8ee4f3c62da8e11eea591340dc7c9fc1be8b7035)
* WIP (cherry picked from commit 58c6bf441054fe3a21ba2d86808279e90de847b7)
* WIP (cherry picked from commit 368ddf6925215a9bfd75a43c7c33def12238f81d)
* update .wordlist.txt (cherry picked from commit 78ac332c8d6eba93e2b3e57440da3f60054bbadb)
* update .wordlist.txt (cherry picked from commit 8d9492399f4b73b0c3c5359684d5b7faa328ba0f)
* Fix typos (cherry picked from commit 394dede13b6de087237832fe3c693c11da7d733b)
* update jax note (cherry picked from commit ceacc713c4295f8bbd20fc622579de9053b73337)
* Update docs/compatibility/ml-compatibility/jax-compatibility.rst (cherry picked from commit b0613e914a2ba639fddea62eb495f97beaa8ba49)
* Update docs/compatibility/ml-compatibility/jax-compatibility.rst (cherry picked from commit 8aac4344b6fd4120a3b8a31878f5316df99f3f99)
* Add back hipGraph support (cherry picked from commit 028ddb3535073e0cd668c24614a0a73a491b5948)
* WIP (cherry picked from commit 2e0ff9c5e3f88ceea6b0ca770bb4edb52ce08a47)
* WIP (cherry picked from commit 186802585de5b7d58f9ac2a7947a83c037df1617)
* add blurb about docker icon (cherry picked from commit aef650d4072578f75e7549151613f390f6545ce1)
* update pytorch-compatibility path in conf.py
* words

---------

Co-authored-by: Mátyás Aradi <matyas@streamhpc.com>
Co-authored-by: Istvan Kiss <neon60@gmail.com>
(cherry picked from commit ff1393142b)
2026-01-10 07:08:08 -05:00 · 2025-01-07 09:57:19 -05:00
parent 01fd243fb8
commit d1ca7ebd66
6 changed files with 706 additions and 26 deletions
--- a/.wordlist.txt
+++ b/.wordlist.txt
@@ -26,6 +26,7 @@ ASm
 ATI
 AddressSanitizer
 AlexNet
+Andrej
 Arb
 Autocast
 BARs
@@ -187,15 +188,17 @@ Interop
 Intersphinx
 Intra
 Ioffe
+JAX's
 Jinja
 JSON
 Jupyter
 KFD
 KFDTest
-KiB
 KMD
 KV
 KVM
+Karpathy's
+KiB
 Keras
 Khronos
 LAPACK
@@ -288,6 +291,7 @@ OpenVX
 OpenXLA
 Oversubscription
 PagedAttention
+Pallas
 PCC
 PCI
 PCIe
@@ -662,6 +666,7 @@ mutex
 mvffr
 namespace
 namespaces
+nanoGPT
 num
 numref
 ocl
@@ -673,7 +678,9 @@ optimizers
 os
 oversubscription
 pageable
+pallas
 parallelization
+parallelizing
 parameterization
 passthrough
 perfcounter
@@ -761,6 +768,7 @@ runtimes
 sL
 scalability
 scalable
+scipy
 seealso
 sendmsg
 seqs
--- a/docs/compatibility/compatibility-matrix.rst
+++ b/docs/compatibility/compatibility-matrix.rst
@@ -47,9 +47,9 @@ compatibility and system requirements.
      ,gfx908,gfx908,gfx908
      ,,,
      FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix:,,
-      :doc:`PyTorch <../compatibility/pytorch-compatibility>`,"2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13"
+      :doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>`,"2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13"
      :doc:`TensorFlow <rocm-install-on-linux:install/3rd-party/tensorflow-install>`,"2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.16.1, 2.15.1, 2.14.1"
-      :doc:`JAX <rocm-install-on-linux:install/3rd-party/jax-install>`,0.4.35,0.4.35,0.4.26
+      :doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>`,0.4.35,0.4.35,0.4.26
      `ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.17.3,1.17.3,1.17.3
      ,,,
      THIRD PARTY COMMS,.. _thirdpartycomms-support-compatibility-matrix:,,
--- a/docs/compatibility/ml-compatibility/jax-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/jax-compatibility.rst
@@ -0,0 +1,664 @@
+.. meta::
+   :description: JAX compatibility
+   :keywords: GPU, JAX compatibility
+
+*******************************************************************************
+JAX compatibility
+*******************************************************************************
+
+JAX provides a NumPy-like API, which combines automatic differentiation and the
+Accelerated Linear Algebra (XLA) compiler to achieve high-performance machine
+learning at scale.
+
+JAX uses composable transformations of Python and NumPy through just-in-time (JIT) compilation,
+automatic vectorization, and parallelization. To learn about JAX, including profiling and
+optimizations, see the official `JAX documentation
+<https://jax.readthedocs.io/en/latest/notebooks/quickstart.html>`_.
+
+ROCm support for JAX is upstreamed, and users can build the official source code with ROCm
+support:
+
+- ROCm JAX release:
+
+  - Offers AMD-validated and community :ref:`Docker images <jax-docker-compat>` with ROCm and JAX pre-installed.
+
+  - ROCm JAX repository: `<https://github.com/ROCm/jax>`__
+
+  - See the :doc:`ROCm JAX installation guide <rocm-install-on-linux:install/3rd-party/jax-install>`
+    to get started.
+
+- Official JAX release:
+
+  - Official JAX repository: `<https://github.com/jax-ml/jax>`__
+
+  - See the `AMD GPU (Linux) installation section
+    <https://jax.readthedocs.io/en/latest/installation.html#amd-gpu-linux>`_ in the JAX
+    documentation.
+
+.. note::
+
+   AMD releases official `ROCm JAX Docker images <https://hub.docker.com/r/rocm/jax>`_
+   quarterly alongside new ROCm releases. These images undergo full AMD testing.
+   `Community ROCm JAX Docker images <https://hub.docker.com/r/rocm/jax-community>`_
+   follow upstream JAX releases and use the latest available ROCm version.
+
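A quick way to confirm that an installation from either route is using the GPU is to ask JAX which backend and devices it found. This is a minimal sketch that relies only on the public ``jax.default_backend`` and ``jax.devices`` calls. The printed values depend on the machine, and on a host without a working ROCm setup JAX is expected to fall back to ``cpu``.

```python
import jax

# Backend that JAX selected by default for this process.
backend = jax.default_backend()

# All devices visible to that backend.
devices = jax.devices()

print("JAX version:", jax.__version__)
print("Default backend:", backend)
for d in devices:
    # device_kind is a human-readable device name string.
    print(f"  id={d.id} platform={d.platform} kind={d.device_kind}")
```

If the backend reported here is ``cpu`` inside one of the ROCm Docker images, the container was most likely started without access to the GPU devices.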
+.. _jax-docker-compat:
+
+Docker image compatibility
+================================================================================
+
+.. |docker-icon| raw:: html
+
+   <i class="fab fa-docker"></i>
+
+AMD validates and publishes ready-made `JAX <https://hub.docker.com/r/rocm/jax/>`_
+images with ROCm backends on Docker Hub. The following Docker image tags and
+associated inventories are validated for
+`ROCm 6.3.1 <https://repo.radeon.com/rocm/apt/6.3.1/>`_. Click |docker-icon|
+to see the image on Docker Hub.
+
+.. list-table:: JAX Docker image components
+    :header-rows: 1
+
+    * - Docker image
+      - JAX
+      - Linux
+      - Python
+    * - .. raw:: html
+
+           <a href="https://hub.docker.com/layers/rocm/jax/rocm6.3.1-jax0.4.31-py3.12/images/sha256-085a0cd5207110922f1fca684933a9359c66d42db6c5aba4760ed5214fdabde0"><i class="fab fa-docker fa-lg"></i> rocm/jax</a>
+
+      - `0.4.31 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.31>`_
+      - Ubuntu 24.04
+      - `3.12.7 <https://www.python.org/downloads/release/python-3127/>`_
+    * - .. raw:: html
+
+           <a href="https://hub.docker.com/layers/rocm/jax/rocm6.3.1-jax0.4.31-py3.10/images/sha256-f88eddad8f47856d8640b694da4da347ffc1750d7363175ab7dc872e82b43324"><i class="fab fa-docker fa-lg"></i> rocm/jax</a>
+
+      - `0.4.31 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.31>`_
+      - Ubuntu 22.04
+      - `3.10.14 <https://www.python.org/downloads/release/python-31014/>`_
+
+AMD publishes community `JAX <https://hub.docker.com/r/rocm/jax-community>`_
+images with ROCm backends on Docker Hub. The following Docker image tags and
+associated inventories are tested for `ROCm 6.2.4 <https://repo.radeon.com/rocm/apt/6.2.4/>`_.
+
+.. list-table:: JAX community Docker image components
+    :header-rows: 1
+
+    * - Docker image
+      - JAX
+      - Linux
+      - Python
+    * - .. raw:: html
+
+           <a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.2.4-jax0.4.35-py3.12.7/images/sha256-a6032d89c07573b84c44e42c637bf9752b1b7cd2a222d39344e603d8f4c63beb?context=explore"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>
+
+      - `0.4.35 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.35>`_
+      - Ubuntu 22.04
+      - `3.12.7 <https://www.python.org/downloads/release/python-3127/>`_
+    * - .. raw:: html
+
+           <a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.2.4-jax0.4.35-py3.11.10/images/sha256-d462f7e445545fba2f3b92234a21beaa52fe6c5f550faabcfdcd1bf53486d991?context=explore"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>
+
+      - `0.4.35 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.35>`_
+      - Ubuntu 22.04
+      - `3.11.10 <https://www.python.org/downloads/release/python-31110/>`_
+    * - .. raw:: html
+
+           <a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.2.4-jax0.4.35-py3.10.15/images/sha256-6f2d4d0f529378d9572f0e8cfdcbc101d1e1d335bd626bb3336fff87814e9d60?context=explore"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>
+
+      - `0.4.35 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.35>`_
+      - Ubuntu 22.04
+      - `3.10.15 <https://www.python.org/downloads/release/python-31015/>`_
+
+Critical ROCm libraries for JAX
+================================================================================
+
+The functionality of JAX with ROCm is determined by its underlying library
+dependencies. These critical ROCm components affect the capabilities,
+performance, and feature set available to developers.
+
+.. list-table::
+    :header-rows: 1
+
+    * - ROCm library
+      - Version
+      - Purpose
+      - Used in
+    * - `hipBLAS <https://github.com/ROCm/hipBLAS>`_
+      - 2.3.0
+      - Provides GPU-accelerated Basic Linear Algebra Subprograms (BLAS) for
+        matrix and vector operations.
+      - Matrix multiplication in ``jax.numpy.matmul``, ``jax.lax.dot``, and
+        ``jax.lax.dot_general``; vector and matrix products in
+        ``jax.numpy.dot``; and batched matrix multiplications expressed with
+        ``jax.numpy.einsum``.
+    * - `hipBLASLt <https://github.com/ROCm/hipBLASLt>`_
+      - 0.10.0
+      - hipBLASLt is an extension of hipBLAS, providing additional
+        features like epilogues fused into the matrix multiplication kernel or
+        use of integer tensor cores.
+      - Matrix multiplication in ``jax.numpy.matmul`` or ``jax.lax.dot``. The
+        XLA compiler uses hipBLASLt for optimized matrix operations,
+        mixed-precision support, and hardware-specific optimizations.
+    * - `hipCUB <https://github.com/ROCm/hipCUB>`_
+      - 3.3.0
+      - Provides a C++ template library for parallel algorithms for reduction,
+        scan, sort and select.
+      - Reduction functions (``jax.numpy.sum``, ``jax.numpy.mean``, 
+        ``jax.numpy.prod``, ``jax.numpy.max`` and ``jax.numpy.min``), prefix sum
+        (``jax.numpy.cumsum``, ``jax.numpy.cumprod``) and sorting
+        (``jax.numpy.sort``, ``jax.numpy.argsort``).
+    * - `hipFFT <https://github.com/ROCm/hipFFT>`_
+      - 1.0.17
+      - Provides GPU-accelerated Fast Fourier Transform (FFT) operations.
+      - Used in functions like ``jax.numpy.fft``.
+    * - `hipRAND <https://github.com/ROCm/hipRAND>`_
+      - 2.11.0
+      - Provides fast random number generation for GPUs.
+      - Random number generation in ``jax.random.uniform``,
+        ``jax.random.normal``, ``jax.random.randint``, and
+        ``jax.random.split``.
+    * - `hipSOLVER <https://github.com/ROCm/hipSOLVER>`_
+      - 2.3.0
+      - Provides GPU-accelerated solvers for linear systems, eigenvalues, and
+        singular value decompositions (SVD).
+      - Solving linear systems (``jax.numpy.linalg.solve``), matrix
+        factorizations, SVD (``jax.numpy.linalg.svd``) and eigenvalue problems 
+        (``jax.numpy.linalg.eig``).
+    * - `hipSPARSE <https://github.com/ROCm/hipSPARSE>`_
+      - 3.1.2
+      - Accelerates operations on sparse matrices, such as sparse matrix-vector
+        or matrix-matrix products.
+      - Sparse matrix multiplication (``jax.numpy.matmul``), sparse
+        matrix-vector and matrix-matrix products
+        (``jax.experimental.sparse.dot``), sparse linear system solvers and
+        sparse data handling.
+    * - `hipSPARSELt <https://github.com/ROCm/hipSPARSELt>`_
+      - 0.2.2
+      - Accelerates operations on sparse matrices, such as sparse matrix-vector
+        or matrix-matrix products.
+      - Sparse matrix multiplication (``jax.numpy.matmul``), sparse
+        matrix-vector and matrix-matrix products
+        (``jax.experimental.sparse.dot``) and sparse linear system solvers.
+    * - `MIOpen <https://github.com/ROCm/MIOpen>`_
+      - 3.3.0
+      - Optimized for deep learning primitives such as convolutions, pooling,
+        normalization, and activation functions.
+      - Speeds up convolutional neural networks (CNNs), recurrent neural
+        networks (RNNs), and other layers. Used in operations like
+        ``jax.lax.conv`` and ``jax.nn.relu``, and in normalization layers.
+    * - `RCCL <https://github.com/ROCm/rccl>`_
+      - 2.21.5
+      - Optimized for multi-GPU communication for operations like all-reduce,
+        broadcast, and scatter.
+      - Distributes computations across multiple GPUs with ``pmap`` and
+        ``jax.distributed``. XLA automatically uses RCCL when executing
+        operations across multiple GPUs on AMD hardware.
+    * - `rocThrust <https://github.com/ROCm/rocThrust>`_
+      - 3.3.0
+      - Provides a C++ template library for parallel algorithms like sorting,
+        reduction, and scanning.
+      - Reduction operations like ``jax.numpy.sum``, scan operations like
+        ``jax.numpy.cumsum``, and the parallel reductions that ``jax.pmap``
+        performs in distributed training can use rocThrust.
+
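The operations named in the table can be exercised with a few lines of JAX. The sketch below is illustrative only: the array shapes and variable names are assumptions, and which ROCm library XLA dispatches each call to is not something this code can verify.

```python
import jax
import jax.numpy as jnp

# Two independent keys so the matrix and the vector are not correlated.
key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
a = jax.random.normal(key_a, (4, 4))
b = jax.random.normal(key_b, (4,))

product = jnp.matmul(a, a.T)   # dense matrix multiplication
total = jnp.sum(a)             # reduction
running = jnp.cumsum(b)        # prefix sum
ordered = jnp.sort(b)          # sort
spectrum = jnp.fft.fft(b)      # fast Fourier transform

# a @ a.T is positive semi-definite; adding a multiple of the identity
# keeps the linear system well conditioned before solving it.
spd = product + 4.0 * jnp.eye(4)
x = jnp.linalg.solve(spd, b)
residual = jnp.max(jnp.abs(spd @ x - b))
```

The same script runs unchanged on CPU, which makes it a convenient check that results agree between backends.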
+Supported and unsupported features
+===============================================================================
+
+The following table maps GPU-accelerated JAX modules to their supported
+ROCm and JAX versions.
+
+.. list-table::
+    :header-rows: 1
+
+    * - Module
+      - Description
+      - Since JAX
+      - Since ROCm
+    * - ``jax.numpy``
+      - Implements the NumPy API, using the primitives in ``jax.lax``.
+      - 0.1.56
+      - 5.0.0
+    * - ``jax.scipy``
+      - Provides GPU-accelerated and differentiable implementations of many
+        functions from the SciPy library, leveraging JAX's transformations
+        (e.g., ``grad``, ``jit``, ``vmap``).
+      - 0.1.56
+      - 5.0.0
+    * - ``jax.lax``
+      - A library of primitive operations that underpins libraries such as
+        ``jax.numpy``. Transformation rules, such as Jacobian-vector product
+        (JVP) and batching rules, are typically defined as transformations on
+        ``jax.lax`` primitives.
+      - 0.1.57
+      - 5.0.0
+    * - ``jax.random``
+      - Provides a number of routines for deterministic generation of sequences
+        of pseudorandom numbers.
+      - 0.1.58
+      - 5.0.0
+    * - ``jax.sharding``
+      - Defines how arrays are partitioned and distributed across multiple
+        devices.
+      - 0.3.20
+      - 5.1.0
+    * - ``jax.dlpack``
+      - For exchanging tensor data between JAX and other libraries that support the
+        DLPack standard.
+      - 0.1.57
+      - 5.0.0
+    * - ``jax.distributed``
+      - Enables the scaling of computations across multiple devices on a single
+        machine or across multiple machines.
+      - 0.1.74
+      - 5.0.0
+    * - ``jax.dtypes``
+      - Provides utilities for working with and managing data types in JAX
+        arrays and computations.
+      - 0.1.66
+      - 5.0.0 
+    * - ``jax.image``
+      - Contains image manipulation functions like resize, scale and translation.
+      - 0.1.57
+      - 5.0.0
+    * - ``jax.nn``
+      - Contains common functions for neural network libraries.
+      - 0.1.56
+      - 5.0.0
+    * - ``jax.ops``
+      - Computes the minimum, maximum, sum or product within segments of an
+        array.
+      - 0.1.57
+      - 5.0.0
+    * - ``jax.profiler``
+      - Contains JAX’s tracing and time profiling features.
+      - 0.1.57
+      - 5.0.0
+    * - ``jax.stages``
+      - Contains interfaces to stages of the compiled execution process.
+      - 0.3.4
+      - 5.0.0
+    * - ``jax.tree``
+      - Provides utilities for working with tree-like container data structures.
+      - 0.4.26
+      - 5.6.0
+    * - ``jax.tree_util``
+      - Provides utilities for working with nested data structures, or
+        ``pytrees``.
+      - 0.1.65
+      - 5.0.0
+    * - ``jax.typing``
+      - Provides JAX-specific static type annotations.
+      - 0.3.18
+      - 5.1.0
+    * - ``jax.extend``
+      - Provides access to JAX internal machinery. The ``jax.extend`` module
+        defines a library view of some of JAX’s internal components.
+      - 0.4.15
+      - 5.5.0
+    * - ``jax.example_libraries``
+      - Serves as a collection of example code and libraries that demonstrate
+        various capabilities of JAX.
+      - 0.1.74
+      - 5.0.0
+    * - ``jax.experimental``
+      - Namespace for experimental features and APIs that are in development or
+        are not yet fully stable for production use.
+      - 0.1.56
+      - 5.0.0
+    * - ``jax.lib``
+      - Set of internal tools and types for bridging between JAX’s Python
+        frontend and its XLA backend.
+      - 0.4.6
+      - 5.3.0
+    * - ``jax_triton``
+      - Library that integrates the Triton deep learning compiler with JAX.
+      - jax_triton 0.2.0 
+      - 6.2.4
+
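Several of the modules listed above are normally used together. As a rough sketch, the example below combines ``jax.numpy``, ``jax.random``, and the ``grad``, ``jit``, and ``vmap`` transformations to fit a small linear model. The model, data sizes, learning rate, and step count are all assumptions chosen for illustration.

```python
import jax
import jax.numpy as jnp


def loss(w, x, y):
    # Mean squared error of a linear model.
    pred = x @ w
    return jnp.mean((pred - y) ** 2)


# Synthetic, noise-free data generated from known weights.
key = jax.random.PRNGKey(42)
x = jax.random.normal(key, (64, 3))
w_true = jnp.array([1.0, -2.0, 0.5])
y = x @ w_true

# Gradient with respect to the first argument (w), compiled by XLA.
grad_fn = jax.jit(jax.grad(loss))

# Plain gradient descent.
w = jnp.zeros(3)
for _ in range(200):
    w = w - 0.1 * grad_fn(w, x, y)

# vmap maps the per-row prediction over the batch without a Python loop.
per_example = jax.vmap(lambda row: row @ w)(x)
```

Nothing in the script is ROCm-specific: JAX places the arrays on the default device, so the same code runs on an AMD GPU or on CPU.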
+jax.scipy module
+-------------------------------------------------------------------------------
+
+A SciPy-like API for scientific computing.
+
+.. list-table::
+    :header-rows: 1
+
+    * - Module
+      - Since JAX
+      - Since ROCm
+    * - ``jax.scipy.cluster``
+      - 0.3.11
+      - 5.1.0
+    * - ``jax.scipy.fft``
+      - 0.1.71
+      - 5.0.0
+    * - ``jax.scipy.integrate``
+      - 0.4.15
+      - 5.5.0
+    * - ``jax.scipy.interpolate``
+      - 0.1.76
+      - 5.0.0
+    * - ``jax.scipy.linalg``
+      - 0.1.56
+      - 5.0.0
+    * - ``jax.scipy.ndimage``
+      - 0.1.56
+      - 5.0.0
+    * - ``jax.scipy.optimize``
+      - 0.1.57
+      - 5.0.0
+    * - ``jax.scipy.signal``
+      - 0.1.56
+      - 5.0.0
+    * - ``jax.scipy.spatial.transform``
+      - 0.4.12
+      - 5.4.0
+    * - ``jax.scipy.sparse.linalg``
+      - 0.1.56
+      - 5.0.0
+    * - ``jax.scipy.special``
+      - 0.1.56
+      - 5.0.0
+    * - ``jax.scipy.stats``
+      - 0.1.56
+      - 5.0.0
+
+jax.scipy.stats module
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. list-table::
+   :header-rows: 1
+
+   * - Module
+     - Since JAX
+     - Since ROCm
+   * - ``jax.scipy.stats.bernoulli``
+     - 0.1.56
+     - 5.0.0
+   * - ``jax.scipy.stats.beta``
+     - 0.1.56
+     - 5.0.0
+   * - ``jax.scipy.stats.betabinom``
+     - 0.1.61
+     - 5.0.0
+   * - ``jax.scipy.stats.binom``
+     - 0.4.14
+     - 5.4.0
+   * - ``jax.scipy.stats.cauchy``
+     - 0.1.56
+     - 5.0.0
+   * - ``jax.scipy.stats.chi2``
+     - 0.1.61
+     - 5.0.0
+   * - ``jax.scipy.stats.dirichlet``
+     - 0.1.56
+     - 5.0.0
+   * - ``jax.scipy.stats.expon``
+     - 0.1.56
+     - 5.0.0
+   * - ``jax.scipy.stats.gamma``
+     - 0.1.56
+     - 5.0.0
+   * - ``jax.scipy.stats.gennorm``
+     - 0.3.15
+     - 5.2.0
+   * - ``jax.scipy.stats.geom``
+     - 0.1.56
+     - 5.0.0
+   * - ``jax.scipy.stats.laplace``
+     - 0.1.56
+     - 5.0.0
+   * - ``jax.scipy.stats.logistic``
+     - 0.1.56
+     - 5.0.0
+   * - ``jax.scipy.stats.multinomial``
+     - 0.3.18
+     - 5.1.0
+   * - ``jax.scipy.stats.multivariate_normal``
+     - 0.1.56
+     - 5.0.0
+   * - ``jax.scipy.stats.nbinom``
+     - 0.1.72
+     - 5.0.0
+   * - ``jax.scipy.stats.norm``
+     - 0.1.56
+     - 5.0.0
+   * - ``jax.scipy.stats.pareto``
+     - 0.1.56
+     - 5.0.0
+   * - ``jax.scipy.stats.poisson``
+     - 0.1.56
+     - 5.0.0
+   * - ``jax.scipy.stats.t``
+     - 0.1.56
+     - 5.0.0
+   * - ``jax.scipy.stats.truncnorm``
+     - 0.4.0
+     - 5.3.0
+   * - ``jax.scipy.stats.uniform``
+     - 0.1.56
+     - 5.0.0
+   * - ``jax.scipy.stats.vonmises``
+     - 0.4.2
+     - 5.3.0
+   * - ``jax.scipy.stats.wrapcauchy``
+     - 0.4.20
+     - 5.6.0
+
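The distributions in ``jax.scipy.stats`` are ordinary JAX functions, so they compose with the transformations mentioned earlier. A minimal sketch, assuming a standard normal distribution and an arbitrary grid of evaluation points:

```python
import jax
import jax.numpy as jnp
from jax.scipy.stats import norm

# Evaluation points chosen only for illustration.
xs = jnp.linspace(-2.0, 2.0, 5)

# Log-density of the standard normal (loc=0, scale=1 by default).
logp = norm.logpdf(xs)

# Derivative of the log-density with respect to x, computed by autodiff
# for a scalar input and mapped over xs with vmap. Analytically it is -x.
score = jax.vmap(jax.grad(norm.logpdf))(xs)
```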
+jax.extend module
+-------------------------------------------------------------------------------
+
+Modules for JAX extensions.
+
+.. list-table::
+    :header-rows: 1
+
+    * - Module
+      - Since JAX
+      - Since ROCm
+    * - ``jax.extend.ffi``
+      - 0.4.30
+      - 6.0.0
+    * - ``jax.extend.linear_util``
+      - 0.4.17
+      - 5.6.0
+    * - ``jax.extend.mlir``
+      - 0.4.26
+      - 5.6.0
+    * - ``jax.extend.random``
+      - 0.4.15
+      - 5.5.0
+
+jax.experimental module
+-------------------------------------------------------------------------------
+
+Experimental modules and APIs.
+
+.. list-table::
+    :header-rows: 1
+
+    * - Module
+      - Since JAX
+      - Since ROCm
+    * - ``jax.experimental.checkify``
+      - 0.1.75
+      - 5.0.0
+    * - ``jax.experimental.compilation_cache.compilation_cache``
+      - 0.1.68
+      - 5.0.0
+    * - ``jax.experimental.custom_partitioning``
+      - 0.4.0
+      - 5.3.0
+    * - ``jax.experimental.jet``
+      - 0.1.56
+      - 5.0.0
+    * - ``jax.experimental.key_reuse``
+      - 0.4.26
+      - 5.6.0
+    * - ``jax.experimental.mesh_utils``
+      - 0.1.76
+      - 5.0.0
+    * - ``jax.experimental.multihost_utils``
+      - 0.3.2
+      - 5.0.0
+    * - ``jax.experimental.pallas``
+      - 0.4.15
+      - 5.5.0
+    * - ``jax.experimental.pjit``
+      - 0.1.61
+      - 5.0.0
+    * - ``jax.experimental.serialize_executable``
+      - 0.4.0
+      - 5.3.0
+    * - ``jax.experimental.shard_map``
+      - 0.4.3
+      - 5.3.0
+    * - ``jax.experimental.sparse``
+      - 0.1.75
+      - 5.0.0
+
+.. list-table::
+    :header-rows: 1
+
+    * - API
+      - Since JAX
+      - Since ROCm
+    * - ``jax.experimental.enable_x64``
+      - 0.1.60
+      - 5.0.0
+    * - ``jax.experimental.disable_x64``
+      - 0.1.60
+      - 5.0.0
+
+jax.experimental.pallas module
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Module for Pallas, a JAX extension for custom kernels.
+
+.. list-table::
+    :header-rows: 1
+
+    * - Module
+      - Since JAX
+      - Since ROCm
+    * - ``jax.experimental.pallas.mosaic_gpu``
+      - 0.4.31
+      - 6.1.3
+    * - ``jax.experimental.pallas.tpu``
+      - 0.4.15
+      - 5.5.0
+    * - ``jax.experimental.pallas.triton``
+      - 0.4.32
+      - 6.1.3
+
+jax.experimental.sparse module
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Experimental support for sparse matrix operations.
+
+.. list-table::
+    :header-rows: 1
+
+    * - Module
+      - Since JAX
+      - Since ROCm
+    * - ``jax.experimental.sparse.linalg``
+      - 0.3.15
+      - 5.2.0
+    * - ``jax.experimental.sparse.sparsify``
+      - 0.3.25
+      - ❌
+
+.. list-table::
+    :header-rows: 1
+
+    * - ``sparse`` data structure API
+      - Since JAX
+      - Since ROCm
+    * - ``jax.experimental.sparse.BCOO``
+      - 0.1.72
+      - 5.0.0
+    * - ``jax.experimental.sparse.BCSR``
+      - 0.3.20
+      - 5.1.0
+    * - ``jax.experimental.sparse.CSR``
+      - 0.1.75
+      - 5.0.0
+    * - ``jax.experimental.sparse.NM``
+      - 0.4.27
+      - 5.6.0
+    * - ``jax.experimental.sparse.COO``
+      - 0.1.75
+      - 5.0.0
+
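As an illustration of the sparse data structures listed above, the following sketch builds a ``BCOO`` matrix from a small dense array and multiplies it by a vector. The matrix values are made up for the example, and ``jax.experimental.sparse`` is an experimental API whose behavior can change between JAX releases.

```python
import jax.numpy as jnp
from jax.experimental import sparse

# Small dense matrix with three nonzero entries.
dense = jnp.array([[0.0, 1.0, 0.0],
                   [2.0, 0.0, 3.0],
                   [0.0, 0.0, 0.0]])
vec = jnp.array([1.0, 2.0, 3.0])

# Convert to the batched-COO sparse format.
mat = sparse.BCOO.fromdense(dense)

# Sparse matrix times dense vector.
result = mat @ vec

# Round trip back to a dense array.
roundtrip = mat.todense()
```

Comparing ``result`` with ``dense @ vec`` is a simple way to check a sparse code path against its dense equivalent.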
+Unsupported JAX features
+------------------------
+
+The following are GPU-accelerated JAX features not currently supported by
+ROCm.
+
+.. list-table::
+    :header-rows: 1
+
+    * - Feature
+      - Description
+      - Since JAX
+    * - Mixed Precision with TF32
+      - Mixed precision with TF32 is used for matrix multiplications,
+        convolutions, and other linear algebra operations, particularly in
+        deep learning workloads like CNNs and transformers.
+      - 0.2.25
+    * - RNN support
+      - Currently only LSTM with double bias is supported with float32 input
+        and weight.
+      - 0.3.25
+    * - XLA int4 support
+      - 4-bit integer (int4) precision in the XLA compiler.
+      - 0.4.0
+    * - ``jax.experimental.sparse.sparsify``
+      - Converts a dense matrix to a sparse matrix representation.
+      - Experimental
+
+Use cases and recommendations
+================================================================================
+
+* The `nanoGPT in JAX <https://rocm.blogs.amd.com/artificial-intelligence/nanoGPT-JAX/README.html>`_
+  blog explores the implementation and training of a Generative Pre-trained
+  Transformer (GPT) model in JAX, inspired by Andrej Karpathy’s PyTorch-based
+  nanoGPT. By comparing how essential GPT components, such as self-attention
+  mechanisms and optimizers, are realized in PyTorch and JAX, the blog also
+  highlights JAX’s unique features.
+
+* The `Optimize GPT Training: Enabling Mixed Precision Training in JAX using
+  ROCm on AMD GPUs <https://rocm.blogs.amd.com/artificial-intelligence/jax-mixed-precision/README.html>`_
+  blog post provides a comprehensive guide on enhancing the training efficiency
+  of GPT models by implementing mixed precision techniques in JAX, specifically
+  tailored for AMD GPUs utilizing the ROCm platform.
+
+* The `Supercharging JAX with Triton Kernels on AMD GPUs <https://rocm.blogs.amd.com/artificial-intelligence/jax-triton/README.html>`_
+  blog demonstrates how to develop a custom fused dropout-activation kernel for
+  matrices using Triton, integrate it with JAX, and benchmark its performance
+  using ROCm.
+
+* The `Distributed fine-tuning with JAX on AMD GPUs <https://rocm.blogs.amd.com/artificial-intelligence/distributed-sft-jax/README.html>`_
+  blog outlines the process of fine-tuning a Bidirectional Encoder Representations
+  from Transformers (BERT)-based large language model (LLM) using JAX for a text
+  classification task. The blog post discusses techniques for parallelizing the
+  fine-tuning across multiple AMD GPUs and assesses the model's performance on a
+  holdout dataset. The fine-tuning uses a BERT-base-cased transformer model
+  and the General Language Understanding Evaluation (GLUE) benchmark dataset
+  on a multi-GPU setup.
+
+* The `MI300X workload optimization guide <https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/workload.html>`_
+  provides detailed guidance on optimizing workloads for the AMD Instinct MI300X
+  accelerator using ROCm. The page is aimed at helping users achieve optimal
+  performance for deep learning and other high-performance computing tasks on
+  the MI300X GPU.
+
+For more use cases and recommendations, see `ROCm JAX blog posts <https://rocm.blogs.amd.com/blog/tag/jax.html>`_.
+
--- a/docs/compatibility/ml-compatibility/pytorch-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/pytorch-compatibility.rst
@@ -11,8 +11,9 @@ deep learning. PyTorch on ROCm provides mixed-precision and large-scale training
 using `MIOpen <https://github.com/ROCm/MIOpen>`_ and
 `RCCL <https://github.com/ROCm/rccl>`_ libraries.

-ROCm support for PyTorch is upstreamed into the official PyTorch repository. Due to independent
-compatibility considerations, this results in two distinct release cycles for PyTorch on ROCm:
+ROCm support for PyTorch is upstreamed into the official PyTorch repository. Due
+to independent compatibility considerations, this results in two distinct
+release cycles for PyTorch on ROCm:

 - ROCm PyTorch release:

@@ -22,7 +23,7 @@ compatibility considerations, this results in two distinct release cycles for Py
  - Offers :ref:`Docker images <pytorch-docker-compat>` with ROCm and PyTorch
    pre-installed.

-  - ROCm PyTorch repository: `<https://github.com/rocm/pytorch>`__
+  - ROCm PyTorch repository: `<https://github.com/ROCm/pytorch>`__

  - See the :doc:`ROCm PyTorch installation guide <rocm-install-on-linux:install/3rd-party/pytorch-install>` to get started.

@@ -47,9 +48,14 @@ the stable release of ROCm to maintain consistency.
 Docker image compatibility
 ================================================================================

+.. |docker-icon| raw:: html
+
+   <i class="fab fa-docker"></i>
+
 AMD validates and publishes ready-made `PyTorch <https://hub.docker.com/r/rocm/pytorch>`_
 images with ROCm backends on Docker Hub. The following Docker image tags and
 associated inventories are validated for `ROCm 6.3.0 <https://repo.radeon.com/rocm/apt/6.3/>`_.
+Click |docker-icon| to see the image on Docker Hub.

 .. list-table:: PyTorch Docker image components
    :header-rows: 1
@@ -190,7 +196,7 @@ associated inventories are validated for `ROCm 6.3.0 <https://repo.radeon.com/ro
 Critical ROCm libraries for PyTorch
 ================================================================================

-The functionality of PyTorch with ROCm is shaped by its underlying library
+The functionality of PyTorch with ROCm is determined by its underlying library
 dependencies. These critical ROCm components affect the capabilities,
 performance, and feature set available to developers.

@@ -269,7 +275,7 @@ performance, and feature set available to developers.
        ``torch.nn.Conv2d``, ``torch.nn.ReLU``, and ``torch.nn.LSTM``.
    * - `MIGraphX <https://github.com/ROCm/AMDMIGraphX>`_
      - 2.11.0
-      - Add graph-level optimizations, ONNX models and mixed precision support
+      - Adds graph-level optimizations, ONNX models and mixed precision support
        and enable Ahead-of-Time (AOT) Compilation.
      - Speeds up inference models and executes ONNX models for
        compatibility with other frameworks.
@@ -295,19 +301,19 @@ performance, and feature set available to developers.
        Handles communication in multi-GPU setups.
    * - `rocDecode <https://github.com/ROCm/rocDecode>`_
      - 0.8.0
-      - Provide hardware-accelerated data decoding capabilities, particularly
+      - Provides hardware-accelerated data decoding capabilities, particularly
        for image, video, and other dataset formats.
      - Can be integrated in ``torch.utils.data``, ``torchvision.transforms``
        and ``torch.distributed``.
    * - `rocJPEG <https://github.com/ROCm/rocJPEG>`_
      - 0.6.0
-      - Provide hardware-accelerated JPEG image decoding and encoding.
+      - Provides hardware-accelerated JPEG image decoding and encoding.
      - GPU accelerated ``torchvision.io.decode_jpeg`` and
        ``torchvision.io.encode_jpeg`` and can be integrated in
        ``torch.utils.data`` and ``torchvision``.
    * - `RPP <https://github.com/ROCm/RPP>`_
      - 1.9.1
-      - Speed up data augmentation, transformation, and other preprocessing step.
+      - Speeds up data augmentation, transformation, and other preprocessing steps.
      - Easy to integrate into PyTorch's ``torch.utils.data`` and
        ``torchvision`` data load workloads.
    * - `rocThrust <https://github.com/ROCm/rocThrust>`_
@@ -472,13 +478,13 @@ leveraging ROCm and CUDA as the underlying frameworks.
      - 0.4.0
      - 3.8
    * - Tensor operations on GPU
-      - Perform tensor operations such as addition and matrix multiplications on
+      - Performs tensor operations such as addition and matrix multiplications on
        the GPU.
      - 0.4.0
      - 3.8
    * - Streams and events
      - Streams allow overlapping computation and communication for optimized
-        performance, events enable synchronization.
+        performance. Events enable synchronization.
      - 1.6.0
      - 3.8
    * - Memory management
@@ -488,13 +494,13 @@ leveraging ROCm and CUDA as the underlying frameworks.
      - 0.3.0
      - 1.9.2
    * - Running process lists of memory management
-      - Return a human-readable printout of the running processes and their GPU
-        memory use for a given device with functions like 
+      - Returns a human-readable printout of the running processes and their GPU
+        memory use for a given device with functions like
        ``torch.cuda.memory_stats()`` and ``torch.cuda.memory_summary()``.
      - 1.8.0
      - 4.0
    * - Communication collectives
-      - A set of APIs that enable efficient communication between multiple GPUs,
+      - Set of APIs that enable efficient communication between multiple GPUs,
        allowing for distributed computing and data parallelism.
      - 1.9.0
      - 5.0
@@ -657,14 +663,14 @@ of computational resources and scalability for large-scale tasks.
      - Since PyTorch
      - Since ROCm
    * - TensorPipe
-      - TensorPipe is a point-to-point communication library integrated into
+      - A point-to-point communication library integrated into
        PyTorch for distributed training. It is designed to handle tensor data
        transfers efficiently between different processes or devices, including
        those on separate machines.
      - 1.8
      - 5.4
    * - Gloo
-      - Gloo is designed for multi-machine and multi-GPU setups, enabling
+      - Designed for multi-machine and multi-GPU setups, enabling
        efficient communication and synchronization between processes. Gloo is
        one of the default backends for PyTorch's Distributed Data Parallel
        (DDP) and RPC frameworks, alongside other backends like NCCL and MPI.
@@ -716,11 +722,11 @@ The following ``torchaudio`` features are GPU-accelerated.
      - Since torchaudio version
      - Since ROCm
    * - ``torchaudio.transforms.Spectrogram``
-      - Generate spectrogram of an input waveform using STFT.
+      - Generates a spectrogram of an input waveform using STFT.
      - 0.6.0
      - 4.5
    * - ``torchaudio.transforms.MelSpectrogram``
-      - Generate the mel-scale spectrogram of raw audio signals.
+      - Generates the mel-scale spectrogram of raw audio signals.
      - 0.9.0
      - 4.5
    * - ``torchaudio.transforms.MFCC``
@@ -728,7 +734,7 @@ The following ``torchaudio`` features are GPU-accelerated.
      - 0.9.0
      - 4.5
    * - ``torchaudio.transforms.Resample``
-      - Resample a signal from one frequency to another
+      - Resamples a signal from one frequency to another.
      - 0.9.0
      - 4.5

@@ -766,7 +772,7 @@ The following ``torchvision`` features are GPU-accelerated.
      - 0.1.6
      - 2.x
    * - ``torchvision.io``
-      - Video decoding and frame extraction using GPU acceleration with NVIDIA’s
+      - Enables video decoding and frame extraction using GPU acceleration with NVIDIA’s
        NVDEC and nvJPEG (rocJPEG) on CUDA-enabled GPUs.
      - 0.4.0
      - 6.3
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -39,7 +39,8 @@ all_article_info_author = ""
 # pages with specific settings
 article_pages = [
    {"file": "about/release-notes", "os": ["linux", "windows"], "date": "2024-12-20"},
-    {"file": "compatibility/pytorch-compatibility", "os": ["linux"]},
+    {"file": "compatibility/ml-compatibility/pytorch-compatibility", "os": ["linux"]},
+    {"file": "compatibility/ml-compatibility/jax-compatibility", "os": ["linux"]},
    {"file": "how-to/deep-learning-rocm", "os": ["linux"]},
    {"file": "how-to/rocm-for-ai/index", "os": ["linux"]},
    {"file": "how-to/rocm-for-ai/install", "os": ["linux"]},
--- a/docs/how-to/deep-learning-rocm.rst
+++ b/docs/how-to/deep-learning-rocm.rst
@@ -14,9 +14,10 @@ frameworks to ensure that framework-specific optimizations take advantage of AMD
 The following guides provide information on compatibility and supported
 features for these ROCm-enabled deep learning frameworks.

-* :doc:`PyTorch compatibility <../compatibility/pytorch-compatibility>`
-.. * :doc:`TensorFlow compatibility <../compatibility/tensorflow-compatibility>`
-.. * :doc:`JAX compatibility <../compatibility/jax-compatibility>`
+* :doc:`PyTorch compatibility <../compatibility/ml-compatibility/pytorch-compatibility>`
+* :doc:`JAX compatibility <../compatibility/ml-compatibility/jax-compatibility>`
+
+.. * :doc:`TensorFlow compatibility <../compatibility/ml-compatibility/tensorflow-compatibility>`

 This chart steps through typical installation workflows for installing deep learning frameworks for ROCm.