# MI100 high-performance computing and tuning guide
## System settings
This chapter reviews the system settings that are required to configure a
system for AMD Instinct™ MI100 accelerators, as well as settings that can
improve GPU performance. Configure the host according to the high-performance
computing tuning guides for AMD EPYC™ 7002 Series or EPYC™ 7003 Series
processors, depending on the processor generation of the system.
In addition to the BIOS settings listed below
({ref}`mi100-bios-settings`), the following settings must also be applied via
the command line (see {ref}`mi100-os-settings`):
* Core C states
* AMD-IOPM-UTIL (on AMD EPYC™ 7002 series processors)
* IOMMU (if needed)
(mi100-bios-settings)=
### System BIOS settings
For maximum MI100 GPU performance on systems with AMD EPYC™ 7002 series
processors (codename "Rome") and AMI System BIOS, the following configuration of
System BIOS settings has been validated. These settings must be used for the
qualification process and should be set as default values for the system BIOS.
For System BIOS providers other than AMI, analogous settings can be applied.
For systems with Intel processors, some of the settings listed in the
following table may not apply or may not be available.
```{list-table} Recommended settings for the system BIOS in a GIGABYTE platform.
:header-rows: 1
:name: mi100-bios

*
  - BIOS Setting Location
  - Parameter
  - Value
  - Comments
*
  - Advanced / PCI Subsystem Settings
  - Above 4G Decoding
  - Enabled
  - GPU Large BAR Support
*
  - AMD CBS / CPU Common Options
  - Global C-state Control
  - Auto
  - Global C-States
*
  - AMD CBS / CPU Common Options
  - CCD/Core/Thread Enablement
  - Accept
  - Global C-States
*
  - AMD CBS / CPU Common Options / Performance
  - SMT Control
  - Disable
  - Global C-States
*
  - AMD CBS / DF Common Options / Memory Addressing
  - NUMA nodes per socket
  - NPS 1,2,4
  - NUMA Nodes (NPS)
*
  - AMD CBS / DF Common Options / Memory Addressing
  - Memory interleaving
  - Auto
  - NUMA Nodes (NPS)
*
  - AMD CBS / DF Common Options / Link
  - 4-link xGMI max speed
  - 18 Gbps
  - Set AMD CPU xGMI speed to highest rate supported
*
  - AMD CBS / DF Common Options / Link
  - 3-link xGMI max speed
  - 18 Gbps
  - Set AMD CPU xGMI speed to highest rate supported
*
  - AMD CBS / NBIO Common Options
  - IOMMU
  - Disable
  -
*
  - AMD CBS / NBIO Common Options
  - PCIe Ten Bit Tag Support
  - Enable
  -
*
  - AMD CBS / NBIO Common Options
  - Preferred IO
  - Manual
  -
*
  - AMD CBS / NBIO Common Options
  - Preferred IO Bus
  - "Use lspci to find pci device id"
  -
*
  - AMD CBS / NBIO Common Options
  - Enhanced Preferred IO Mode
  - Enable
  -
*
  - AMD CBS / NBIO Common Options / SMU Common Options
  - Determinism Control
  - Manual
  -
*
  - AMD CBS / NBIO Common Options / SMU Common Options
  - Determinism Slider
  - Power
  -
*
  - AMD CBS / NBIO Common Options / SMU Common Options
  - cTDP Control
  - Manual
  -
*
  - AMD CBS / NBIO Common Options / SMU Common Options
  - cTDP
  - 240
  -
*
  - AMD CBS / NBIO Common Options / SMU Common Options
  - Package Power Limit Control
  - Manual
  -
*
  - AMD CBS / NBIO Common Options / SMU Common Options
  - Package Power Limit
  - 240
  -
*
  - AMD CBS / NBIO Common Options / SMU Common Options
  - xGMI Link Width Control
  - Manual
  -
*
  - AMD CBS / NBIO Common Options / SMU Common Options
  - xGMI Force Link Width
  - 2
  -
*
  - AMD CBS / NBIO Common Options / SMU Common Options
  - xGMI Force Link Width Control
  - Force
  -
*
  - AMD CBS / NBIO Common Options / SMU Common Options
  - APBDIS
  - 1
  -
*
  - AMD CBS / NBIO Common Options / SMU Common Options
  - DF C-states
  - Auto
  -
*
  - AMD CBS / NBIO Common Options / SMU Common Options
  - Fixed SOC P-state
  - P0
  -
*
  - AMD CBS / UMC Common Options / DDR4 Common Options
  - Enforce POR
  - Accept
  -
*
  - AMD CBS / UMC Common Options / DDR4 Common Options / Enforce POR
  - Overclock
  - Enabled
  -
*
  - AMD CBS / UMC Common Options / DDR4 Common Options / Enforce POR
  - Memory Clock Speed
  - 1600 MHz
  - Set to max Memory Speed, if using 3200 MHz DIMMs
*
  - AMD CBS / UMC Common Options / DDR4 Common Options / DRAM Controller Configuration / DRAM Power Options
  - Power Down Enable
  - Disabled
  - RAM Power Down
*
  - AMD CBS / Security
  - TSME
  - Disabled
  - Memory Encryption
```
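The `Preferred IO Bus` entry in the table asks for a value found with `lspci`. The following is a minimal sketch of how that lookup might be done, not a validated procedure. It assumes the accelerator reports the AMD/ATI PCI vendor ID `1002`, and that the bus portion of the PCI address is the value of interest; confirm the expected format against the platform's BIOS documentation. `pci_bus_numbers` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `lspci` output on stdin and print the unique PCI
# bus number of each listed device. Handles addresses both without a PCI
# domain ("c1:00.0") and with one ("0000:c1:00.0").
pci_bus_numbers() {
    awk '{ n = split($1, part, ":"); if (n >= 2) print part[n - 1] }' | sort -u
}

# On a real system, list the devices with vendor ID 1002 and extract their
# bus numbers. The list may include devices other than the accelerators.
#   lspci -d 1002: | pci_bus_numbers
```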
#### NBIO link clock frequency
The NBIOs (4x per AMD EPYC™ processor) are the serializers/deserializers (also
known as "SerDes") that convert and prepare the I/O signals for the processor's
128 external I/O interface lanes (32 per NBIO).
LCLK (short for link clock frequency) controls the link speed of the internal
bus that connects the NBIO silicon with the data fabric. All data between the
processor and its PCIe lanes flows to the data fabric at a rate determined by
the LCLK frequency setting. The link clock frequency of the NBIO components
needs to be forced to the maximum frequency for optimal PCIe performance.
For AMD EPYC™ 7002 series processors, this setting cannot be modified via
configuration options in the server BIOS alone. Instead, the AMD-IOPM-UTIL (see
the "AMD-IOPM-UTIL" section under {ref}`mi100-os-settings`) must be run at
every server boot to disable Dynamic Power Management for all PCIe Root
Complexes and NBIOs within the system and to lock the logic into the highest
performance operational mode.
For AMD EPYC™ 7003 series processors, configuring all NBIOs to be in "Enhanced
Preferred I/O" mode is sufficient to enable highest link clock frequency for the
NBIO components.
#### Memory configuration
For the memory addressing modes, especially the number of NUMA nodes per
socket/processor (NPS), follow the guidance of the high-performance computing
tuning guides for AMD EPYC™ 7002 Series and AMD EPYC™ 7003 Series processors
to obtain the optimal configuration for host-side computation.

If the system is set to one NUMA domain per socket/processor (NPS1),
bidirectional copy bandwidth between host memory and GPU memory may be
higher (up to about 16% more) than with four NUMA domains per
socket/processor (NPS4). For memory-bandwidth-sensitive applications using
MPI, NPS4 is recommended. For applications that are not optimized for NUMA
locality, NPS1 is the recommended setting.
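To confirm which NPS mode is in effect after changing the BIOS setting, the number of NUMA nodes the kernel reports can be counted. The sketch below assumes the standard Linux sysfs layout under `/sys/devices/system/node`; `count_numa_nodes` is a hypothetical helper name, not part of any AMD tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count NUMA nodes by counting the nodeN directories in
# the sysfs node directory (default: /sys/devices/system/node).
count_numa_nodes() {
    local node_root="${1:-/sys/devices/system/node}"
    local count=0
    local dir
    for dir in "$node_root"/node[0-9]*; do
        # Skip the unexpanded glob pattern when no node directories exist.
        [ -d "$dir" ] && count=$((count + 1))
    done
    echo "$count"
}

# On a dual-socket system, NPS1 would be expected to report 2 nodes and
# NPS4 would be expected to report 8:
#   count_numa_nodes
```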
(mi100-os-settings)=
### Operating system settings
#### CPU core states - C-states
There are several core states (C-states) that an AMD EPYC CPU can idle within:
* C0: active. This is the active state while running an application.
* C1: idle
* C2: idle and power gated. This is a deeper sleep state and will have a
greater latency when moving back to the C0 state, compared to when the
CPU is coming out of C1.
Disabling C2 is important when running with a high-performance, low-latency
network. To disable power gating on all cores, run the following on Linux
systems:
```shell
cpupower idle-set -d 2
```
Note that the `cpupower` tool must be installed, as it is not part of the base
packages of most Linux® distributions. The package needed varies with the
respective Linux distribution.
::::{tab-set}
:::{tab-item} Ubuntu
:sync: ubuntu
```shell
sudo apt install linux-tools-common
```
:::
:::{tab-item} Red Hat Enterprise Linux
:sync: RHEL
```shell
sudo yum install cpupowerutils
```
:::
:::{tab-item} SUSE Linux Enterprise Server
:sync: SLES
```shell
sudo zypper install cpupower
```
:::
::::
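After running `cpupower idle-set -d 2`, it can be useful to check that the state is disabled on every core. The following sketch reads the per-CPU `disable` files that the Linux cpuidle subsystem exposes in sysfs. `cpus_with_state_enabled` is a hypothetical helper name, and the sketch assumes that state index 2 in sysfs corresponds to C2 on the system in question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every CPU whose idle state N is still enabled.
# Each cpuidle state has a sysfs "disable" file that reads 1 when the state
# is disabled and 0 when it is enabled.
cpus_with_state_enabled() {
    local state="$1"
    local cpu_root="${2:-/sys/devices/system/cpu}"
    local file cpu
    for file in "$cpu_root"/cpu[0-9]*/cpuidle/state"$state"/disable; do
        [ -r "$file" ] || continue
        if [ "$(cat "$file")" != "1" ]; then
            cpu="${file#"$cpu_root"/}"   # e.g. cpu7/cpuidle/state2/disable
            echo "${cpu%%/*}"            # e.g. cpu7
        fi
    done
}

# With C2 disabled on all cores, this is expected to print nothing:
#   cpus_with_state_enabled 2
```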
#### AMD-IOPM-UTIL
This section applies to AMD EPYC™ 7002 processors and describes how to
optimize advanced Dynamic Power Management (DPM) in the I/O logic (see NBIO
description above) for performance. Certain I/O workloads may benefit from
disabling this power management. This utility disables DPM for all PCIe root
complexes in the system and locks the logic into the highest performance
operational mode.
Disabling I/O DPM will reduce the latency and/or improve the throughput of
low-bandwidth messages for PCIe InfiniBand NICs and GPUs. Other workloads
with low-bandwidth, bursty PCIe I/O characteristics may benefit as well if
multiple such PCIe devices are installed in the system.
The actions of the utility do not persist across reboots. There is no need to
change any existing firmware settings when using this utility. The "Preferred
I/O" and "Enhanced Preferred I/O" settings should remain enabled.
```{tip}
The recommended method to use the utility is either to create a system
start-up script, for example, a one-shot `systemd` service unit, or run the
utility when starting up a job scheduler on the system. The installer
packages (see
[Power Management Utility](https://developer.amd.com/iopm-utility/)) will
create and enable a `systemd` service unit for you. This service unit is
configured to run in one-shot mode. This means that even when the service
unit runs as expected, the status of the service unit will show inactive.
This is the expected behavior when the utility runs normally. If the service
unit shows failed, the utility did not run as expected. The output in either
case can be shown with the `systemctl status` command.
Stopping the service unit has no effect since the utility does not leave
anything running. To undo the effects of the utility, disable the service
unit with the `systemctl disable` command and reboot the system.
The utility does not have any command-line options, and it must be run with
super-user permissions.
```
#### Systems with 256 CPU threads - IOMMU configuration
For systems that have 256 logical CPU cores or more (e.g., 64-core AMD EPYC™
7763 in a dual-socket configuration and SMT enabled), setting the input-output
memory management unit (IOMMU) configuration to "disabled" can limit the number
of available logical cores to 255. The reason is that the Linux® kernel disables
x2APIC in this case and falls back to the Advanced Programmable Interrupt
Controller (APIC), which can enumerate a maximum of 255 (logical) cores.
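As a worked example of the limit described above, the logical core count of the dual-socket configuration mentioned in the text can be computed and compared against 255. This is an illustrative sketch only; `logical_cpu_count` and `exceeds_apic_limit` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: logical CPUs = sockets x cores per socket x threads
# per core.
logical_cpu_count() {
    echo $(( $1 * $2 * $3 ))
}

# Hypothetical helper: succeed when the count exceeds the 255 logical cores
# that can be enumerated after the fallback to APIC.
exceeds_apic_limit() {
    [ "$1" -gt 255 ]
}

total="$(logical_cpu_count 2 64 2)"   # dual-socket, 64 cores each, SMT enabled
if exceeds_apic_limit "$total"; then
    echo "$total logical CPUs: more than APIC can enumerate"
else
    echo "$total logical CPUs: within the 255 limit"
fi

# Compare against the number of logical CPUs the running kernel sees:
#   nproc --all
```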
If SMT is enabled by setting "CCD/Core/Thread Enablement > SMT Control" to
"enable", the following steps can be applied to the system to enable all
(logical) cores of the system:
* In the server BIOS, set IOMMU to "Enabled".
* When configuring the Grub boot loader, add the following arguments for the
  Linux kernel: `amd_iommu=on iommu=pt`
* Update Grub to use the modified configuration:

  ```shell
  sudo grub2-mkconfig -o /boot/grub2/grub.cfg
  ```

* Reboot the system.
* Verify IOMMU passthrough mode by inspecting the kernel log via `dmesg`:

  ```none
  [...]
  [ 0.000000] Kernel command line: [...] amd_iommu=on iommu=pt
  [...]
  ```
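The Grub and verification steps above can be sketched as two small shell helpers. This is an illustration under stated assumptions, not a supported procedure: it assumes GNU `sed`, a GRUB defaults file (commonly `/etc/default/grub`) that contains a double-quoted `GRUB_CMDLINE_LINUX="..."` line, and a kernel that exposes its command line in `/proc/cmdline`. The names `add_iommu_args` and `cmdline_has_iommu_pt` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the IOMMU arguments to GRUB_CMDLINE_LINUX in a
# GRUB defaults file, unless iommu=pt is already present on that line.
# Assumes GNU sed and a double-quoted GRUB_CMDLINE_LINUX="..." line.
add_iommu_args() {
    local grub_file="$1"
    if grep -q '^GRUB_CMDLINE_LINUX=.*iommu=pt' "$grub_file"; then
        return 0
    fi
    sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 amd_iommu=on iommu=pt"/' "$grub_file"
}

# Hypothetical helper: succeed when the kernel command line contains both
# arguments (default file: /proc/cmdline).
cmdline_has_iommu_pt() {
    local cmdline_file="${1:-/proc/cmdline}"
    grep -qw 'amd_iommu=on' "$cmdline_file" && grep -qw 'iommu=pt' "$cmdline_file"
}

# Possible usage on a real system (run as root, back up the file first):
#   add_iommu_args /etc/default/grub
#   grub2-mkconfig -o /boot/grub2/grub.cfg
#   reboot
# After the reboot:
#   cmdline_has_iommu_pt && echo "IOMMU passthrough arguments are active"
```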
Once the system is properly configured, ROCm software can be
installed.
## System management
For a complete guide on how to install/manage/uninstall ROCm on Linux, refer to
{doc}`Quick-start (Linux)