AAC ABI ACE ACEs ACS AccVGPR AccVGPRs ALU AMD AMDGPU AMDGPUs AMDMIGraphX AMI AOCC AOMP APBDIS APIC APIs APU ASIC ASICs ASan ASAN ASm ATI AddressSanitizer AlexNet Arb Autocast BARs BLAS BMC Blit Blockwise Bluefield Bootloader CCD CDNA CHTML CIFAR CLI CLion CMake CMakeLists CMakePackage CP CPC CPF CPP CPU CPUs Cron CSC CSE CSV CSn CTest CTests CU CUDA CUs CXX Cavium CentOS ChatGPT CoRR Codespaces Commitizen CommonMark Concretized Conda ConnectX CuPy Dashboarding DDR DF DGEMM DIMM DKMS DL DMA DNN DNNL DPM DRI DW DWORD Dask DataFrame DataLoader DataParallel DeepSpeed Dependabot Deprecations DevCap Dockerfile Doxygen ELMo ENDPGM EPYC ESXi EoS FBGEMM FFT FFTs FFmpeg FHS FMA FP FX Filesystem FindDb Flang Fortran Fuyu GALB GCC GCD GCDs GCN GDB GDDR GDR GDS GEMM GEMMs GFortran GiB GIM GL GLXT GMI GPG GPR GPT GPU GPU's GPUs GRBM GenAI GenZ GitHub Gitpod HBM HCA HGX HIPCC HIPExtension HIPIFY HPC HPCG HPE HPL HSA HW HWE HWS Haswell Higgs Hyperparameters ICV IDE IDEs IFWI IMDb IOMMU IOP IOPM IOV IRQ ISA ISV ISVs ITL ImageNet InfiniBand Inlines IntelliSense Interop Intersphinx Intra Ioffe Jinja JSON Jupyter KFD KFDTest KiB KV KVM Keras Khronos LAPACK LCLK LDS LLM LLMs LLVM LM LSAN LSan LTS LoRA MEM MERCHANTABILITY MFMA MiB MIGraphX MIOpen MIOpenGEMM MIVisionX MLM MMA MMIO MMIOH MNIST MPI MSVC MVAPICH MVFFR Makefile Makefiles Matplotlib Matrox Megatrends Megatron Mellanox Mellanox's Meta's Miniconda MirroredStrategy Mixtral Multicore Multithreaded MyEnvironment MyST NBIO NBIOs NIC NICs NLI NLP NPKit NPS NSP NUMA NVCC NVIDIA NVPTX NaN Nano Navi Noncoherently NousResearch's NumPy OAM OAMs OCP OEM OFED OMM OMP OMPI OMPT OMPX ONNX OSS OSU OpenCL OpenCV OpenFabrics OpenGL OpenMP OpenMPI OpenSSL OpenVX OpenXLA Oversubscription PCC PCI PCIe PEFT PIL PILImage POR PRNG PRs PaLM Pageable PeerDirect PerfDb Perfetto PipelineParallel PnP PowerEdge PowerShell PyPi PyTorch Qcycles Qwen RAII RAS RCCL RDC RDMA RDNA README RHEL RNN RNNs ROC ROCProfiler ROCTracer ROCclr ROCdbgapi ROCgdb ROCk ROCm ROCmCC ROCmSoftwarePlatform ROCmValidationSuite ROCprofiler ROCr RST RW Radeon RelWithDebInfo Req Rickle RoCE Ryzen SALU SBIOS SCA SDK SDMA SDPA SDRAM SENDMSG SGPR SGPRs SHA SIGQUIT SIMD SIMDs SKU SKUs SLES SMEM SMI SMT SPI SQs SRAM SRAMECC SVD SWE SerDes ShareGPT Shlens Skylake Softmax Spack SplitK Supermicro Szegedy TCA TCC TCI TCIU TCP TCR TF TFLOPS TP TPU TPUs TSME Tagram TensileLite TensorBoard TensorFlow TensorParallel ToC TorchAudio TorchMIGraphX TorchScript TorchServe TorchVision TransferBench TrapStatus UAC UC UCC UCX UE UIF UMC USM UTCL UTIL Uncached Unittests Unhandled VALU VBIOS VGPR VGPRs VM VMEM VMWare VRAM VSIX VSkipped Vanhoucke Vulkan WGP WGPs WX WikiText Wojna Workgroups Writebacks XCD XCDs XGBoost XGBoost's XGMI XT XTX Xeon Xilinx Xnack Xteam YAML YML YModel ZeRO ZenDNN accuracies activations addr alloc allocatable allocator allocators amdgpu api atmi atomics autogenerated avx awk backend backends benchmarking bfloat bilinear bitcode bitsandbytes blit bootloader boson bosons br buildable bursty bzip cacheable cd centos centric changelog chiplet cmake cmd coalescable codename collater comgr completers composable concretization config conformant constructible convolutional convolves copyable cpp csn cuBLAS cuFFT cuLIB cuRAND cuSOLVER cuSPARSE customizations cTDP dataset datasets dataspace datatype datatypes dbgapi de deallocation denoise denoised denoises denormalize dequantization dequantizes deserializers detections dev devicelibs devsel dimensionality disambiguates distro el embeddings enablement encodings endpgm enqueue env epilog etcetera ethernet exascale executables ffmpeg filesystem fortran fp gRPC galb gcc gdb gfortran gfx githooks github globals gnupg grayscale gzip heterogenous hipBLAS hipBLASLt hipBLASLt's hipCUB hipFFT hipLIB hipRAND hipSOLVER hipSPARSE hipSPARSELt hipTensor hipamd hipblas hipcub hipfft hipfort hipify hipsolver hipsparse hlist hotspotting hpc hpp hsa hsakmt hyperparameter iDRAC ib_core inband incrementing inductor inferencing inflight init initializer inlining installable interop interprocedural intra invariants invocating ipo jax kdb kfd latencies libfabric libjpeg libs linearized linter linux llvm localscratch logits lossy macOS matchers microarchitecture migraphx miopen miopengemm mivisionx mjx mkdir mlirmiopen mtypes mutex mvffr namespace namespaces numref ocl opencl opencv openmp openssl optimizers os oversubscription pageable parallelization parameterization passthrough perfcounter performant perl pragma pre prebuild prebuilt precompiled preconditioner preconfigured prefetch prefetchable prefill prefills preloaded preprocess preprocessed preprocessing preprocessor prequantized prerequisites profiler profilers protobuf pseudorandom py quantile quantizer quasirandom queueing rccl rdc rdma reStructuredText redirections refactorization reformats repo repos representativeness req resampling rescaling reusability roadmap roc rocAL rocALUTION rocBLAS rocDecode rocFFT rocLIB rocMLIR rocPRIM rocPyDecode rocRAND rocSOLVER rocSPARSE rocThrust rocWMMA rocalution rocblas rocclr rocfft rocm rocminfo rocprim rocprof rocprofiler rocr rocrand rocsolver rocsparse rocthrust roctracer rst runtime runtimes sL scalability scalable sendmsg serializers shader sharding sigmoid sm smi softmax spack src stochastically strided subcommand subdirectory subexpression subfolder subfolders submodule submodules supercomputing symlink symlinks td tensorfloat th tokenization tokenize tokenized tokenizer tokenizes toolchain toolchains toolset toolsets torchvision tqdm tracebacks txt uarch uncached uncorrectable unhandled uninstallation unsqueeze unstacking unswitching untrusted untuned upvote USM UTCL UTIL utils vL variational vdi vectorizable vectorization vectorize vectorized vectorizer vectorizes vjxb voxel walkthrough walkthroughs watchpoints wavefront wavefronts whitespace whitespaces workgroup workgroups writeback writebacks wrreq wzo xargs xGMI xz yaml ysvmadyb zypper