640 RN known issues batch 4 (#365)

* ROCProfiler deprecation notice updated

* RHEL 9.6 support removed and 9.5 EOS rejected

* Feedback to KV cache highlight added

* Wrong entry of ROCprofiler-SDK removed

* Additional known issues added

* GA Release date updated

* Consolidated changelog sync
Pratik Basyal
2025-04-10 09:05:34 -04:00
committed by GitHub
parent 588204f800
commit 1e231b4b28
3 changed files with 28 additions and 4 deletions


@@ -263,16 +263,24 @@ and in-depth descriptions.
- Perl package installation is not required, and users will need to install this themselves if they want to.
- Support for ROCm Object tooling has moved into `llvm-objdump` provided by package `rocm-llvm`.
* SDMA retainer logic is removed from engine selection for runtime buffer copy operations.
#### Optimized
* `hipGraphLaunch` parallelism is improved for complex data-parallel graphs.
* Command scheduling now uses round-robin queue selection. For multi-stream execution, the HSA queue held by the null stream lock is freed and no longer occupies the queue ID after the kernel in the stream has finished.
* The HIP runtime no longer frees the bitcode object before code generation. Instead, it adds a cache that allows compiled code objects to be reused rather than recompiled, which improves performance on multi-GPU systems.
* The runtime now uses a unified copy approach:
- Unpinned `H2D` copies are no longer blocking for sizes up to 1 MB.
- Kernel copy path is enabled for unpinned `H2D`/`D2H` methods.
- The environment variable `GPU_FORCE_BLIT_COPY_SIZE` defaults to `16`, which limits the kernel copy path to sizes below 16 KB. Larger copies are handled by the `SDMA` engine.
- Blit code is refactored, and ASAN instrumentation is cleaned up.
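
The size-based split described for `GPU_FORCE_BLIT_COPY_SIZE` can be pictured with a small model. This is only an illustrative sketch, not the HIP runtime's implementation: the helper names `blit_copy_limit_bytes` and `choose_copy_path` are hypothetical, the value is assumed to be expressed in KB, and the handling of a copy of exactly the limit size is an assumption based on the "less than 16 KB" wording.

```python
import os

KIB = 1024  # bytes per KB, as used for the threshold in this sketch


def blit_copy_limit_bytes(env=None):
    """Return the kernel-copy size limit in bytes (hypothetical helper).

    Reads GPU_FORCE_BLIT_COPY_SIZE, assumed to be in KB, and falls back
    to the documented default of 16 when the variable is unset.
    """
    env = os.environ if env is None else env
    return int(env.get("GPU_FORCE_BLIT_COPY_SIZE", "16")) * KIB


def choose_copy_path(size_bytes, env=None):
    """Model the documented split (assumption: strict less-than).

    Copies smaller than the limit take the kernel copy path; everything
    else is attributed to the SDMA engine.
    """
    return "kernel" if size_bytes < blit_copy_limit_bytes(env) else "sdma"
```

For example, with the default setting `choose_copy_path(8 * 1024, {})` selects the kernel path, while a 32 KB copy selects `SDMA` unless the variable is raised, for instance to `64`.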
#### Resolved issues
* Out-of-memory error on Microsoft Windows. When the user calls `hipMalloc` with a size larger than the available device memory, the HIP runtime now handles the request in the API implementation by allocating the available device memory plus system memory (shared virtual memory).
* Error of dependency on libgcc-s1 during rocm-dev install on Debian Buster. HIP runtime now uses libgcc1 for this distribution.
* Error of dependency on `libgcc-s1` during rocm-dev install on Debian Buster. HIP runtime now uses `libgcc1` for this distribution.
* Stack corruption during kernel execution. HIP runtime now adds a maximum stack size limit based on the GPU device feature.
#### Upcoming changes