Merge pull request #654 from SwRaw/swraw/amd-smi-doc

replace rocm-smi reference with amd-smi
This commit is contained in:
Swati Rawat
2026-01-05 19:02:24 +05:30
committed by GitHub
4 changed files with 75 additions and 76 deletions

View File

@@ -44,7 +44,7 @@ Setting up the base implementation environment
.. code-block:: shell
rocm-smi --showproductname
amd-smi static --board
#. Check that your GPUs are available to PyTorch.
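A minimal check from a Python shell might look like this (any of the standard ``torch.cuda`` queries will do):

.. code-block:: python

   import torch

   # ROCm GPUs are exposed through the CUDA device API in PyTorch.
   print(torch.cuda.is_available())      # True when at least one GPU is usable
   print(torch.cuda.device_count())      # Number of visible GPUs
   print(torch.cuda.get_device_name(0))  # Name of the first GPU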
@@ -65,8 +65,8 @@ Setting up the base implementation environment
.. tip::
During training and inference, you can check the memory usage by running the ``rocm-smi`` command in your terminal.
This tool helps you see shows which GPUs are involved.
During training and inference, you can check the memory usage by running the ``amd-smi`` command in your terminal.
This tool helps you see which GPUs are involved.
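If you prefer to check from inside the training process rather than a separate terminal, a small sketch using PyTorch's own memory query:

.. code-block:: python

   import torch

   # Report free and total VRAM on each visible GPU, as seen by the driver.
   for i in range(torch.cuda.device_count()):
       free, total = torch.cuda.mem_get_info(i)
       print(f"GPU {i}: {(total - free) / 2**30:.1f} GiB used of {total / 2**30:.1f} GiB")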
.. _fine-tuning-llms-multi-gpu-hugging-face-accelerate:
@@ -91,10 +91,10 @@ Now, it's important to adjust how you load the model. Add the ``device_map`` par
...
base_model_name = "meta-llama/Llama-2-7b-chat-hf"
# Load base model to GPU memory
base_model = AutoModelForCausalLM.from_pretrained(
base_model_name,
device_map = "auto",
trust_remote_code = True)
...
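Once loaded this way, Accelerate records where each module landed; printing the map is a quick way to confirm that all GPUs are in use (``hf_device_map`` is populated whenever ``device_map`` is passed):

.. code-block:: python

   # Show the module-to-device placement chosen by Accelerate.
   print(base_model.hf_device_map)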
@@ -130,7 +130,7 @@ After loading the model in this way, the model is fully ready to use the resourc
torchtune for fine-tuning and inference
=============================================
`torchtune <https://pytorch.org/torchtune/main/>`_ is a PyTorch-native library for easy single and multi-GPU
model fine-tuning and inference with LLMs.
#. Install torchtune using pip.
@@ -139,7 +139,7 @@ model fine-tuning and inference with LLMs.
# Install torchtune with PyTorch release 2.2.2+
pip install torchtune
# To confirm that the package is installed correctly
tune --help
@@ -148,12 +148,12 @@ model fine-tuning and inference with LLMs.
.. code-block:: shell
usage: tune [-h] {download,ls,cp,run,validate} ...
Welcome to the TorchTune CLI!
options:
-h, --help show this help message and exit
subcommands:
{download,ls,cp,run,validate}
@@ -194,11 +194,11 @@ model fine-tuning and inference with LLMs.
apply_lora_to_output: False
lora_rank: 8
lora_alpha: 16
tokenizer:
_component_: torchtune.models.llama2.llama2_tokenizer
path: /tmp/Llama-2-7b-hf/tokenizer.model
# Dataset and sampler
dataset:
_component_: torchtune.datasets.alpaca_cleaned_dataset

View File

@@ -44,20 +44,19 @@ Setting up the base implementation environment
.. code-block:: shell
rocm-smi --showproductname
amd-smi static --board
Your output should look like this:
.. code-block:: shell
============================ ROCm System Management Interface ============================
====================================== Product Info ======================================
GPU[0] : Card Series: AMD Instinct MI300X OAM
GPU[0] : Card model: 0x74a1
GPU[0] : Card vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0] : Card SKU: MI3SRIOV
==========================================================================================
================================== End of ROCm SMI Log ===================================
GPU: 0
BOARD:
MODEL_NUMBER: 102-G39203-0B
PRODUCT_SERIAL: PCB079220-1150
FRU_ID: 113-AMDG392030B04-100-300000097H
PRODUCT_NAME: AMD Instinct MI325 OAM
MANUFACTURER_NAME: AMD
#. Check that your GPUs are available to PyTorch.
@@ -94,13 +93,13 @@ Setting up the base implementation environment
pip install -r requirements-dev.txt
cmake -DBNB_ROCM_ARCH="gfx942" -DCOMPUTE_BACKEND=hip -S .
python setup.py install
# To leverage the SFTTrainer in TRL for model fine-tuning.
pip install trl
# To leverage PEFT for efficiently adapting pre-trained language models.
pip install peft
# Install the other dependencies.
pip install transformers datasets huggingface-hub scipy
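A quick, optional sanity check that the installed packages resolve correctly might look like this (the versions printed will differ on your system):

.. code-block:: python

   import bitsandbytes
   import peft
   import transformers
   import trl

   # Print the installed versions of the key fine-tuning dependencies.
   for pkg in (bitsandbytes, peft, transformers, trl):
       print(pkg.__name__, pkg.__version__)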
@@ -132,7 +131,7 @@ Download the base model and fine-tuning dataset
.. note::
You can also use the `NousResearch Llama-2-7b-chat-hf <https://huggingface.co/NousResearch/Llama-2-7b-chat-hf>`_
as a substitute. It has the same model weights as the original.
#. Run the following code to load the base model and tokenizer.
@@ -141,14 +140,14 @@ Download the base model and fine-tuning dataset
# Base model and tokenizer names.
base_model_name = "meta-llama/Llama-2-7b-chat-hf"
# Load base model to GPU memory.
device = "cuda:0"
base_model = AutoModelForCausalLM.from_pretrained(base_model_name, trust_remote_code = True).to(device)
# Load tokenizer.
tokenizer = AutoTokenizer.from_pretrained(
base_model_name,
trust_remote_code = True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
@@ -162,10 +161,10 @@ Download the base model and fine-tuning dataset
# Dataset for fine-tuning.
training_dataset_name = "mlabonne/guanaco-llama2-1k"
training_dataset = load_dataset(training_dataset_name, split = "train")
# Check the data.
print(training_dataset)
# Record 11 of the dataset is a QA sample in English.
print(training_dataset[11])
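Each record in this dataset exposes a single ``text`` field, which is the same field the trainer consumes below; a short way to peek at one formatted prompt (the slice length is arbitrary):

.. code-block:: python

   # Print the beginning of one formatted training sample.
   print(training_dataset[11]["text"][:300])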
@@ -252,8 +251,8 @@ Compare the number of trainable parameters and training time under the two diffe
dataset_text_field = "text",
tokenizer = tokenizer,
args = training_arguments
)
# Run the trainer.
sft_trainer.train()
@@ -286,7 +285,7 @@ Compare the number of trainable parameters and training time under the two diffe
if param.requires_grad:
trainable_params += param.numel()
print(f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param:.2f}")
sft_trainer.peft_config = None
print_trainable_parameters(sft_trainer.model)
@@ -309,8 +308,8 @@ Compare the number of trainable parameters and training time under the two diffe
dataset_text_field = "text",
tokenizer = tokenizer,
args = training_arguments
)
# Training.
trainer_full.train()
@@ -349,7 +348,7 @@ store, and load.
# PEFT adapter name.
adapter_name = "llama-2-7b-enhanced-adapter"
# Save PEFT adapter.
sft_trainer.model.save_pretrained(adapter_name)
@@ -359,21 +358,21 @@ store, and load.
# Access adapter directory.
cd llama-2-7b-enhanced-adapter
# List all adapter files.
README.md adapter_config.json adapter_model.safetensors
.. tab-item:: Saving a fully fine-tuned model
:sync: without
If you're not using LoRA and PEFT, and therefore no PEFT LoRA configuration was used for training, use the following code
to save your fine-tuned model to your system.
.. code-block:: python
# Fully fine-tuned model name.
new_model_name = "llama-2-7b-enhanced"
# Save the fully fine-tuned model.
full_trainer.model.save_pretrained(new_model_name)
@@ -383,7 +382,7 @@ store, and load.
# Access new model directory.
cd llama-2-7b-enhanced
# List all model files.
config.json model-00002-of-00006.safetensors model-00005-of-00006.safetensors
generation_config.json model-00003-of-00006.safetensors model-00006-of-00006.safetensors
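The saved directory can be reloaded later like any local checkpoint; a minimal sketch, using the directory name saved above:

.. code-block:: python

   from transformers import AutoModelForCausalLM

   # Reload the fully fine-tuned weights from the local directory.
   model = AutoModelForCausalLM.from_pretrained("llama-2-7b-enhanced")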
@@ -412,26 +411,26 @@ Let's look at achieving model inference using these types of models.
.. tab-item:: Inference using PEFT adapters
To use PEFT adapters like a normal transformer model, you can run the generation by loading a base model along with PEFT
adapters as follows.
.. code-block:: python
from peft import PeftModel
from transformers import AutoModelForCausalLM
# Set the path of the model or its name on the Hugging Face Hub
base_model_name = "meta-llama/Llama-2-7b-chat-hf"
# Set the path of the adapter
adapter_name = "Llama-2-7b-enhanced-adpater"
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(base_model_name)
# Adapt the base model with the adapter
new_model = PeftModel.from_pretrained(base_model, adapter_name)
# Then, run generation the same way as with a normal model, as outlined in 2.1
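As a sketch of that generation step (the prompt and token budget are illustrative, not part of the original guide):

.. code-block:: python

   from transformers import AutoTokenizer

   tokenizer = AutoTokenizer.from_pretrained(base_model_name)

   # Encode a prompt and generate with the adapter-augmented model.
   inputs = tokenizer("What is a large language model?", return_tensors="pt")
   outputs = new_model.generate(**inputs, max_new_tokens=100)
   print(tokenizer.decode(outputs[0], skip_special_tokens=True))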
The PEFT library provides a ``merge_and_unload`` method, which merges the adapter layers into the base model. This is
@@ -439,13 +438,13 @@ Let's look at achieving model inference using these types of models.
.. code-block:: python
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(base_model_name)
# Adapt the base model with the adapter
new_model = PeftModel.from_pretrained(base_model, adapter_name)
# Merge adapter
model = new_model.merge_and_unload()
# Save the merged model to local storage
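A sketch of that save step, with an illustrative directory name:

.. code-block:: python

   # Write the merged weights to a local directory (name is illustrative).
   model.save_pretrained("llama-2-7b-merged")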
@@ -461,25 +460,25 @@ Let's look at achieving model inference using these types of models.
# Import relevant class for loading model and tokenizer
from transformers import AutoTokenizer, AutoModelForCausalLM
# Set the pre-trained model name on the Hugging Face Hub
model_name = "meta-llama/Llama-2-7b-chat-hf"
# Set device type
device = "cuda:0"
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Input prompt encoding
query = "What is a large language model?"
inputs = tokenizer.encode(query, return_tensors="pt").to(device)
# Token generation
outputs = model.generate(inputs)
# Outputs decoding
print(tokenizer.decode(outputs[0]))
In addition, pipelines from Transformers offer simple APIs to use pre-trained models for different tasks, including
@@ -490,14 +489,14 @@ Let's look at achieving model inference using these types of models.
# Import relevant class for loading model and tokenizer
from transformers import pipeline
# Set the path of your model or its name on the Hugging Face Hub
model_name_or_path = "meta-llama/Llama-2-7b-chat-hf"
# Set pipeline
# A non-negative device value runs the model on the associated CUDA device ID
pipe = pipeline("text-generation", model=model_name_or_path, device=0)
# Token generation
print(pipe("What is a large language model?")[0]["generated_text"])

View File

@@ -31,16 +31,16 @@ in the Instinct documentation for more information.
Hardware verification with ROCm
-------------------------------
Use the command ``rocm-smi --setperfdeterminism 1900`` to set the max clock speed up to 1900 MHz
Use the command ``amd-smi set --perf-determinism 1900`` to set the max clock speed up to 1900 MHz
instead of the default 2100 MHz. This can reduce the chance of a PCC event lowering the attainable
GPU clocks. This setting will not be required for new IFWI releases with the production PRC feature.
You can restore this setting to its default value with the ``rocm-smi -r`` command.
You can restore this setting to its default value with the ``amd-smi reset --clocks`` command.
Run the command:
.. code-block:: shell
rocm-smi --setperfdeterminism 1900
amd-smi set --perf-determinism 1900
See `Hardware verification with ROCm <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html#hardware-verification-with-rocm>`_
in the Instinct documentation for more information.

View File

@@ -108,16 +108,16 @@ for more information.
Hardware verification with ROCm
-------------------------------
Use the command ``rocm-smi --setperfdeterminism 1900`` to set the max clock speed up to 1900 MHz
Use the command ``amd-smi set --perf-determinism 1900`` to set the max clock speed up to 1900 MHz
instead of the default 2100 MHz. This can reduce the chance of a PCC event lowering the attainable
GPU clocks. This setting will not be required for new IFWI releases with the production PRC feature.
You can restore this setting to its default value with the ``rocm-smi -r`` command.
You can restore this setting to its default value with the ``amd-smi reset --clocks`` command.
Run the command:
.. code-block:: shell
rocm-smi --setperfdeterminism 1900
amd-smi set --perf-determinism 1900
See `Hardware verification with ROCm <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html#hardware-verification-with-rocm>`_ for more information.
@@ -248,7 +248,7 @@ Download the Docker image and required packages
Checking out this specific commit is recommended for a stable and reproducible environment.
.. code-block:: shell
git checkout bb93ccbfeae6363c67b361a97a27c74ab86e7e92
Prepare training datasets