From 73244f70f54ce508935409bb8159d9baf9006225 Mon Sep 17 00:00:00 2001
From: Istvan Kiss
Date: Thu, 27 Nov 2025 15:15:25 +0100
Subject: [PATCH] JAX key features and enhancements

Co-authored-by: Pratik Basyal
---
 .wordlist.txt                                 |  5 ++++
 .../ml-compatibility/jax-compatibility.rst | 27 +++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/.wordlist.txt b/.wordlist.txt
index 8760b6c34..e65036c2a 100644
--- a/.wordlist.txt
+++ b/.wordlist.txt
@@ -36,6 +36,7 @@ Andrej
 Arb
 Autocast
 autograd
+Backported
 BARs
 BatchNorm
 BLAS
@@ -201,9 +202,11 @@ GenAI
 GenZ
 GitHub
 Gitpod
+hardcoded
 HBM
 HCA
 HGX
+HLO
 HIPCC
 hipDataType
 HIPExtension
@@ -327,6 +330,7 @@ MoEs
 Mooncake
 Mpops
 Multicore
+multihost
 Multithreaded
 MXFP
 MyEnvironment
@@ -1017,6 +1021,7 @@ uncacheable
 uncorrectable
 underoptimized
 unhandled
+unfused
 uninstallation
 unmapped
 unsqueeze
diff --git a/docs/compatibility/ml-compatibility/jax-compatibility.rst b/docs/compatibility/ml-compatibility/jax-compatibility.rst
index 57940b32a..4083b9e57 100644
--- a/docs/compatibility/ml-compatibility/jax-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/jax-compatibility.rst
@@ -269,6 +269,33 @@ For a complete and up-to-date list of JAX public modules (for example, ``jax.num
 JAX API modules are maintained by the JAX project and is subject to change.
 Refer to the official Jax documentation for the most up-to-date information.
 
+Key features and enhancements for ROCm 7.1
+===============================================================================
+
+- Enabled compilation of multihost HLO runner Python bindings.
+
+  - Backported multihost HLO runner bindings and some related changes to
+    :code:`FunctionalHloRunner`.
+
+  - Added :code:`requirements_lock_3_12` to enable building for Python 3.12.
+
+- Removed hardcoded NHWC convolution layout for ``fp16`` precision to address the performance drops for ``fp16`` precision on gfx12xx GPUs.
+
+
+- ROCprofiler-SDK integration:
+
+  - Integrated ROCprofiler-SDK (v3) to XLA to improve profiling of GPU events,
+    support both time-based and step-based profiling.
+
+  - Added unit tests for :code:`rocm_collector` and :code:`rocm_tracer`.
+
+- Added Triton unsupported conversion from ``f8E4M3FNUZ`` to ``fp16`` with
+  rounding mode.
+
+- Introduced :code:`CudnnFusedConvDecomposer` to revert fused convolutions
+  when :code:`ConvAlgorithmPicker` fails to find a fused algorithm, and removed
+  unfused fallback paths from :code:`RocmFusedConvRunner`.
+
 Key features and enhancements for ROCm 7.0
 ===============================================================================