mirror of
https://github.com/ROCm/ROCm.git
synced 2026-01-09 22:58:17 -05:00
640 RN known issues batch 4 (#365)
* ROCProfiler deprecation notice updated
* RHEL 9.6 support removed and 9.5 EOS rejected
* Feedback to KV cache highlight added
* Wrong entry of ROCprofiler-SDK removed
* Additional known issues added
* GA Release date updated
* Consolidated changelog sync
10
CHANGELOG.md
@@ -263,16 +263,24 @@ and in-depth descriptions.
- Perl package installation is not required; users who want Perl must install it themselves.
- Support for ROCm Object tooling has moved into `llvm-objdump`, provided by the `rocm-llvm` package.

* SDMA retainer logic is removed from engine selection for runtime buffer copy operations.

#### Optimized

* `hipGraphLaunch` parallelism is improved for complex data-parallel graphs.
* Command scheduling now uses round-robin queue selection. For multi-stream execution, the HSA queue from the null stream lock is freed and no longer occupies the queue ID after the kernel in the stream has finished.
* The HIP runtime does not free the bitcode object before code generation. It adds a cache that allows compiled code objects to be reused instead of recompiled, which improves performance on multi-GPU systems.
* The runtime now uses a unified copy approach:

    - Unpinned `H2D` copies of up to 1 MB are no longer blocking.
    - The kernel copy path is enabled for unpinned `H2D`/`D2H` methods.
    - The environment variable `GPU_FORCE_BLIT_COPY_SIZE` defaults to `16`, which limits the kernel copy path to sizes below 16 KB; larger copies are handled by the `SDMA` engine.
    - Blit code is refactored, and ASAN instrumentation is cleaned up.
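
The size threshold described in the unified copy approach can be pictured with a small toy model. This is only a sketch under stated assumptions, not the HIP runtime's implementation: `CopyPath`, `choose_copy_path`, and `kDefaultBlitCopySizeKB` are hypothetical names invented for illustration, and sending a copy of exactly 16 KB to SDMA is an assumption, since the changelog only states "less than 16 KB" and "larger than that".

```cpp
#include <cstddef>

// Toy model only. These names are hypothetical and are NOT HIP runtime APIs.
enum class CopyPath { KernelCopy, Sdma };

// Default stated in the changelog: GPU_FORCE_BLIT_COPY_SIZE = 16 (in KB).
constexpr std::size_t kDefaultBlitCopySizeKB = 16;

// Picks a copy path for an unpinned H2D/D2H transfer of `bytes` bytes.
// Copies strictly below the threshold take the kernel copy path.
// Assumption: a copy of exactly the threshold size is routed to SDMA.
constexpr CopyPath choose_copy_path(
    std::size_t bytes,
    std::size_t threshold_kb = kDefaultBlitCopySizeKB) {
    return bytes < threshold_kb * 1024 ? CopyPath::KernelCopy
                                       : CopyPath::Sdma;
}

// Compile-time usage examples with the default 16 KB threshold.
static_assert(choose_copy_path(4 * 1024) == CopyPath::KernelCopy,
              "a 4 KB copy is below the threshold");
static_assert(choose_copy_path(64 * 1024) == CopyPath::Sdma,
              "a 64 KB copy is above the threshold");
```

In this model, passing a different `threshold_kb` stands in for changing the environment variable; how the real runtime reads and applies that variable is not covered here.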
#### Resolved issues

* Out-of-memory error on Microsoft Windows. When `hipMalloc` is called with a size larger than the available device memory, the HIP runtime now handles this in the API implementation by allocating the available device memory plus system memory (shared virtual memory).
* Dependency error on `libgcc-s1` during `rocm-dev` installation on Debian Buster. The HIP runtime now uses `libgcc1` for this distribution.
* Stack corruption during kernel execution. The HIP runtime now adds a maximum stack size limit based on the GPU device feature.
#### Upcoming changes