AAC ABI ACE ACEs ACS AccVGPR AccVGPRs AITER ALU AllReduce AMD AMDGPU AMDGPUs AMDMIGraphX AMI AOCC AOMP AOT AOTriton APBDIS APIC APIs APU APUs ASIC ASICs ASan ASAN ASm Async ATI atomicRMW AddressSanitizer AlexNet Andrej Arb Autocast autograd BARs BatchNorm BLAS BMC BabelStream Blit Blockwise Bluefield Bootloader Broadcom CAS CCD CDNA CGUI CHTML CIFAR CLI CLion CMake CMakeLists CMakePackage CP CPC CPF CPP CPU CPUs Cron CSC CSDATA CSE CSV CSn CTest CTests CU CUDA CUs CXX CX Cavium CentOS ChatGPT Cholesky CoRR Codespaces Commitizen CommonMark Concretized Conda ConnectX CountOnes CuPy customizable da Dashboarding Dataloading dataflows DBRX DDR DF DGEMM DGL DGLGraph dGPU dGPUs DIMM DKMS DL DMA DOMContentLoaded DNN DNNL DPM DRI DW DWORD Dask DataFrame DataLoader DataParallel Debian decompositions DeepSeek DeepSpeed Dependabot Deprecations DevCap DirectX Disaggregated disaggregated Dockerfile Dockerized Doxygen dropless ELMo ENDPGM EPYC ESXi EP EoS etcd equalto fas FBGEMM FiLM FIFOs FFT FFTs FFmpeg FHS FIXME FMA FP FX Filesystem FindDb Flang FlashAttention FlashInfer’s FlashInfer FluxBenchmark Fortran Fuyu GALB GAT GATNE GCC GCD GCDs GCN GCNN GDB GDDR GDR GDS GEMM GEMMs GFLOPS GFortran GFXIP GGUF Gemma GiB GIM GL Glibc GLXT Gloo GMI GNN GNNs GPG GPR GPT GPU GPU's GPUDirect GPUs GraphBolt GraphSage GRBM GRE GenAI GenZ GitHub Gitpod HBM HCA HGX HIPCC hipDataType HIPExtension HIPIFY HIPification hipification HIPify HPC HPCG HPE HPL HSA HW HWE HWS Haswell Higgs href Hyperparameters HybridEngine Huggingface Hunyuan HunyuanVideo IB ICD ICT ICV IDE IDEs IFWI IMDb IncDec instrSize interpolators IOMMU IOP IOPS IOPM IOV IRQ ISA ISV ISVs ITL ImageNet InfiniBand Inlines IntelliSense Interop Intersphinx Intra Ioffe JAX's JAXLIB Jinja js JSON Jupyter KFD KFDTest KMD KV KVM Karpathy's KiB Kineto Keras Khronos LAPACK LCLK LDS LLM LLMs LLVM LM logsumexp LRU LSAN LSan LTS LSTMs LteAll LanguageCrossEntropy LoRA MECO MEM MERCHANTABILITY MFMA MiB MIGraphX MIOpen MIOpenGEMM MIOpen's MIVisionX MLM MMA MMIO MMIOH MMU MNIST MPI MPT MSVC mul MVAPICH MVFFR Makefile Makefiles Matplotlib Matrox MaxText MBT Megablocks Megatrends Megatron Mellanox Mellanox's Meta's Miniconda MirroredStrategy Mixtral MLA MosaicML MoEs Mooncake Mpops Multicore Multithreaded mx MXFP MyEnvironment MyST NANOO NBIO NBIOs NCCL NCF NFS NIC NICs NLI NLP NN NPKit NPS NSP NUMA NVCC NVIDIA NVPTX NaN Nano Navi Noncoherently NoReturn NousResearch's NumPy OAM OAMs OCP OEM OFED OMM OMP OMPI OOM OMPT OMPX ONNX OSS OSU OpenCL OpenCV OpenFabrics OpenGL OpenMP OpenMPI OpenSSL OpenVX OpenXLA Optim Oversubscription PagedAttention Pallas PCC PCI PCIe PEFT perf PEQT PIL PILImage PJRT POR PRNG PRs PaLM Pageable PeerDirect PerfDb Perfetto PipelineParallel PnP PowerEdge PowerShell Pretrained Pretraining Primus Profiler's PyPi Pytest PyTorch QPS Qcycles Qwen RAII RAS RCCL RDC RDC's RDMA RDNA README Recomputation RHEL RMW RNN RNNs ROC ROCProfiler ROCT ROCTx ROCTracer ROCclr ROCdbgapi ROCgdb ROCk ROCm ROCmCC ROCmSoftwarePlatform ROCmValidationSuite ROCprofiler ROCr RPP RST RW Radeon RelWithDebInfo Req Rickle RoCE Runfile Ryzen SALU SBIOS SCA ScaledGEMM SDK SDMA SDPA SDRAM SENDMSG SGLang SGPR SGPRs SHA SHARK's SIGQUIT SIMD SIMDs SKU SKUs SLES SLURM Slurm SMEM SMFMA SMI SMT SPI SQs SRAM SRAMECC SVD SWE SerDes ShareGPT Shlens simd Skylake Softmax Spack SplitK Supermicro Szegedy TagRAM TCA TCC TCCs TCI TCIU TCP TCR TVM THREADGROUPS threadgroups TensorRT TensorFloat TF TFLOPS TP TPS TPU TPUs TSME Tagram TensileLite TensorBoard TensorFlow TensorParallel TheRock ToC TorchAudio torchaudio TorchElastic TorchMIGraphX torchrec TorchScript TorchServe torchserve torchtext TorchVision TransferBench TrapStatus UAC UC UCC UCX ud UE UIF UMC USM UTCL UTIL UltraChat Uncached Unittests Unhandled unwindowed VALU VBIOS VCN verl's VGPR VGPRs VM VMEM VMWare VRAM VSIX VSkipped Vanhoucke Vulkan WGP WGPs WR WX WikiText Wojna Workgroups Writebacks xcc XCD XCDs XGBoost XGBoost's XGMI XT XTX Xeon Xilinx Xnack Xteam YAML YAMLs YML YModel ZeRO ZenDNN accuracies activations addr addEventListener ade ai alloc allocatable allocator allocators amdgpu api aten atmi atomics autogenerated autotune avx awk az backend backends bb benchmarked benchmarking bfloat bilinear bitcode bitsandbytes bitwise Bitwise blit bootloader boson bosons br BrainFloat btn buildable bursty bzip cacheable carveout cd centos centric changelog checkpointing chiplet classList cmake cmd coalescable codename collater comgr compat completers composable concretization config configs conformant const constructible convolutional convolves copyable cpp csn cuBLAS cuda cuDNN cudnn cuFFT cuLIB cuRAND cuSOLVER cuSPARSE customizations cTDP dataset datasets dataspace datatemplate datatype datatypes dbgapi de deallocation debuggability debian deepseek denoise denoised denoises denormalize dequantization dequantized dequantizes deserializers detections dev devicelibs devsel dgl dimensionality disambiguates distro distros dkms dtype eb el embeddings enablement encodings endfor endif endpgm enqueue env epilog etcetera ethernet exascale executables ffmpeg filesystem forEach fortran fp framebuffer gRPC galb gcc gdb gemm getAttribute gfortran gfx githooks github globals gnupg gpu grayscale gx gzip heterogenous hipBLAS hipBLASLt hipBLASLt's hipblaslt hipCUB hipFFT hipFORT hipLIB hipRAND hipSOLVER hipSPARSE hipSPARSELt hipTensor hipamd hipblas hipcc hipcub hipfft hipfort hipify hipsolver hipsparse hlist hostname hotspotting hpc hpp hsa hsakmt hyperparameter hyperparameters iDRAC ib_core inband incrementing inductor inferencing inflight init initializer inlining installable interop interprocedural intra intrinsics invariants invocating ipo jax json kdb kfd kv lang latencies len libfabric libjpeg libs linalg linearized linter linux llm llvm lm localscratch logits lossy macOS matchers maxtext megablocks megatron microarchitecture migraphx migratable miopen miopengemm mivisionx mixtral mjx mkdir mlirmiopen mtypes mutex mvffr namespace namespaces nanoGPT NCS NOP NVLink num numref ocl opencl opencv openmp openssl optimizers ol os oversubscription pageable pallas parallelization parallelizing param params parameterization passthrough pe perfcounter performant piecewise perl pragma pre prebuild prebuilt precompiled preconditioner preconfigured preemptible prefetch prefetchable prefill prefills preloaded preprocess preprocessed preprocessing preprocessor prequantized prerequisites pretrain pretraining primus profiler profilers protobuf pseudorandom px py pytorch recommender recommenders quantile quantizer quasirandom querySelector querySelectorAll queueing qwen radeon rc rccl rdc rdma reStructuredText redirections refactorization reformats reinforcememt repo repos representativeness req resampling rescaling reusability rhel rl RLHF roadmap roc rocAL rocALUTION rocBLAS rocDecode rocFFT rocHPCG rocJPEG rocLIB rocMLIR rocPRIM rocPyDecode rocRAND rocSOLVER rocSPARSE rocThrust rocWMMA rocalution rocblas rocclr rocfft rocm rocminfo rocprim rocprof rocprofv rocprofiler rocr rocrand rocsolver rocsparse rocthrust roctracer rst runtime runtimes ResNet sL scalability scalable scipy seealso selectattr selectedTag sendmsg seqs serializers setAttribute sglang shader sharding sigmoid sles sm smi softmax spack spmm src stanford stochastically strided subcommand subdirectory subexpression subfolder subfolders submatrix submodule submodules subnet supercomputing symlink symlinks sys tabindex targetContainer td tensorfloat tf th tokenization tokenize tokenized tokenizer tokenizes toolchain toolchains topk toolset toolsets torchtitan torchvision tp tqdm tracebacks txt TopK uarch uncached uncacheable uncorrectable underoptimized unhandled uninstallation unmapped unsqueeze unstacking unswitching untrusted untuned upvote USM UTCL UTIL utils UX vL variational vdi vectorizable vectorization vectorize vectorized vectorizer vectorizes verl virtualize virtualized vjxb vllm voxel walkthrough walkthroughs watchpoints wavefront wavefronts whitespace whitespaces workgroup workgroups writeback writebacks wrreq wzo xargs xdit xDiT xGMI xPacked xz yaml ysvmadyb zypper