Remove duplicate entry for Tensile

2026-01-09 14:48:06 -05:00 · 2024-04-16 13:27:04 -06:00
parent 2ea7ac694e
commit 6d7daee9af
1 changed file with 0 additions and 52 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -60,7 +60,6 @@ environments where legacy `DT_RPATH` is the preferred form of linking (instead o
 do **not** recommend trying to install both sets of packages.
 ```

-
 ### Library changes in ROCm 6.1.0

 | Library | Version |
@@ -319,53 +318,6 @@ rocWMMA 1.4.0 for ROCm 6.1.0
 * Built all test in large code model
 * Removed inefficient branching in layout loop unrolling

-#### Tensile
-
-Tensile 4.40.0 for ROCm 6.1.0
-
-##### Additions
-
-* New `DisableKernelPieces` values to invalidate local read, local write, and global read
-* Added Stream-K kernel generation, including two-tile Stream-K algorithm by setting `StreamK=3`
-* New feature to allow testing Stream-K grid multipliers
-* Added debug output to check occupancy for Stream-K
-* Added reject condition for FractionalLoad + DepthU!=power of 2
-* New `TENSILE_DB` debugging value to dump the common kernel parameters
-* Added predicate for APU libs
-* New parameter (`ClusterLocalRead`) to turn on/off wider local read opt for `TileMajorLDS`
-* New parameter (`ExtraLatencyForLR`) to add extra interval between local read and wait
-* New logic to check LDS size with auto LdsPad(=1) and change LdsPad to 0 if LDS overflows
-* Added initialization type and general batched options to the `rocblas-bench` input creator script
-
-##### Optimizations
-
-* Enabled `MFMA` + `LocalSplitU=4` for `MT16x16`
-* Enabled (`DirectToVgpr` + `MI4x4`) and supported skinny MacroTile
-* Optimized postGSU kernel: separate postGSU kernels for different GSU values, loop unroll for GSU
-  loop, wider global load depending on array size, and parallel reduction depending on array size
-* Auto LdsPad calculation for `TileMajorLds` + `MI16x16`
-* Auto LdsPad calculation for `UnrollMajorLds` + `MI16x16` + `VectorWidth`
-
-##### Changes
-
-* Cleared `hipErrorNotFound` error since it is an expected part of the search
-* Modified hipCC search path for Linux
-* Changed PCI ID from 32-bit to 64-bit for ROCm SMI HW monitor
-* Changed `LdsBlockSizePerPad` to `LdsBlockSizePerPadA`, B to specify LBSPP separately
-* Changed the default value of `LdsPadA`, B, `LdsBlockSizePerPadA`, B from 0 to -1
-* Updated test cases according to parameter changes for LdsPad, LBSPP and ClusterLocalRead
-* Replaced `std::regex` with `fnmatch()/PathMatchSpec` as a workaround to `std::regex` stack overflow
-  known bug
-
-##### Fixes
-
-* hipCC compile append flag `parallel-jobs=4`
-* Race condition in Stream-K that appeared with large grids and small sizes
-* Mismatch issue with `LdsPad` + `LdsBlockSizePerPad!=0` and `TailLoop`
-* Mismatch issue with `LdsPad` + `LdsBlockSizePerPad!=0` and `SplitLds`
-* Incorrect reject condition check for `DirectToLds` + `LdsBlockSizePerPad=-1` case
-* Small fix for `LdsPad` optimization (`LdsElement` calculation)  
-
 #### hipBLAS

 hipBLAS 2.1.0 for ROCm 6.1.0
@@ -405,7 +357,6 @@ hipTensor 1.2.0 for ROCm 6.1.0
 
 * Fixed bug in contraction calculation with data type F32

-
 #### hipBLASLt

 hipBLASLt 0.7.0 for ROCm 6.1.0
@@ -882,7 +833,6 @@ Tensile 4.40.0 for ROCm 6.1.0
 ##### Known issue

 * In a future release, the ROCm Validation Suite P2P Benchmark and Qualification Tool (PBQT) tests will be optimized to meet the target bandwidth requirements for MI300X.
-  

 #### MI200 SR-IOV 

@@ -890,7 +840,6 @@ Tensile 4.40.0 for ROCm 6.1.0

 * Multimedia applications may encounter compilation errors in the MI200 Single Root Input/Output Virtualization (SR-IOV) environment. This is because MI200 SR-IOV does not currently support multimedia applications. 

-
 ### AMD MI300A RAS

 #### Fixed defect
@@ -901,7 +850,6 @@ Tensile 4.40.0 for ROCm 6.1.0

  This issue is resolved in the ROCm 6.1 release, and users will no longer encounter the GFX correctable error (CE) and uncorrectable error (UE) failures.

-
 ## ROCm 6.0.2

 The ROCm 6.0.2 point release consists of minor bug fixes to improve the stability of MI300 GPU