* Add gemm tuning script v3
* Introduce --jobs to control the number of files to generate
* Switch to the trans convention used by Tensile
* Rerun rocprof if it crashes
* Update README
* Remove peak perf and efficiency
This PR enables a 4x4 tile size in MFMA-based dot operations.
The supported tiled dot is (4x64) x (64x4) -> (4x4) in the MFMA layout.
However, the actual dot operation must produce at least 64 output elements; this is a limitation of the other layouts that appear during result processing (i.e. the blocked layout cannot handle tensors smaller than the wavefront size).
For example, the following dots are supported: (4x64) x (64x16) -> (4x16), (16x64) x (64x4) -> (16x4), and (8x64) x (64x8) -> (8x8).
The following dots are not supported: (4x128) x (128x4) -> (4x4) and (4x64) x (64x8) -> (4x8).
This is a first version of dot using mfma 4x4 instructions; it still involves redundant computation and reductions.
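As a rough illustration, here is a minimal sketch encoding the shape rules above. The helper name and the exact divisibility rules are assumptions inferred from the examples, not the actual compiler check:

```python
def mfma_4x4_dot_supported(m: int, n: int, k: int) -> bool:
    """Hypothetical shape check for the MFMA 4x4 tiled dot described above."""
    # The base tile is (4x64) x (64x4) -> (4x4), so K is consumed in chunks of 64.
    if k % 64 != 0:
        return False
    # Result-processing layouts need at least a wavefront (64) of output elements.
    if m * n < 64:
        return False
    # Output dimensions must tile evenly into 4x4 blocks (assumption).
    return m % 4 == 0 and n % 4 == 0

# Shapes from the examples above:
assert mfma_4x4_dot_supported(4, 16, 64)      # (4x64) x (64x16) -> (4x16)
assert mfma_4x4_dot_supported(16, 4, 64)      # (16x64) x (64x4) -> (16x4)
assert mfma_4x4_dot_supported(8, 8, 64)       # (8x64) x (64x8)  -> (8x8)
assert not mfma_4x4_dot_supported(4, 4, 128)  # only 16 output elements
assert not mfma_4x4_dot_supported(4, 8, 64)   # only 32 output elements
```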
This PR adds:
- a verbose tuning mode: prints the standard output of compilation and tuning calls
- collection of information about failed compilations
- correctness-check output printed as a word
- dimensions in the generated scripts separated with "-"
- a gpu_ids option to select particular GPUs
* Add waves_per_eu in the tuning space
* Do not allocate tensors on the device during the kernel compilation step
* Add a breakdown of elapsed time
* Parallelize the post-processing step
* Parallelize the profiling step with --ngpus
* Better timing info printout
* Restructure ROCM Library Search
Currently there are a handful of ROCm-dependent files required for
triton to run: the linker (ld.lld), the include files, and multiple hip/hsa
shared objects.
This change provides three locations to search for these files, always
in the same order:
1. third_party/rocm. This location is within the python/triton directory
   and is carried over when triton is built. If all necessary files
   are in this location, there is no need to have ROCm installed on
   the system at all.
2. The $ROCM_PATH environment variable. If this is set, it overrides
   all other locations for finding the necessary ROCm files.
3. /opt/rocm. The default location for ROCm installations. Finding one
   here tells triton that ROCm is installed in this environment.
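A minimal sketch of that lookup, following the order listed above (the function name and structure are illustrative assumptions, not the actual triton code):

```python
import os

def find_rocm_dir(triton_package_dir: str) -> str:
    """Return the first ROCm directory found, per the order above (sketch)."""
    # 1. The copy bundled with the triton package (python/triton/third_party/rocm).
    bundled = os.path.join(triton_package_dir, "third_party", "rocm")
    if os.path.isdir(bundled):
        return bundled
    # 2. $ROCM_PATH; per the description it overrides other locations when set,
    #    so a real implementation may consult it before the bundled copy.
    rocm_path = os.environ.get("ROCM_PATH")
    if rocm_path and os.path.isdir(rocm_path):
        return rocm_path
    # 3. /opt/rocm, the default ROCm install location.
    if os.path.isdir("/opt/rocm"):
        return "/opt/rocm"
    raise RuntimeError(
        "ROCm files not found; run scripts/amd/setup_rocm_libs.sh or install ROCm")
```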
To make option 1 practical, a new script, scripts/amd/setup_rocm_libs.sh,
has been added to the repo. Executing this script downloads all necessary
ROCm files from their respective packages on repo.radeon.com and installs
them in third_party/rocm, allowing triton to run without installing the
full ROCm stack. setup_rocm_libs.sh honors the env var ROCM_VERSION if a
user wishes to install a ROCm version other than the default (currently 5.4.2).
When triton wheels are built to support PyTorch, option 1 will be used to
stay in sync with PyTorch's approach of bringing along any libraries needed
and not requiring ROCm to be installed.
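For example: `ROCM_VERSION=5.4.2 bash scripts/amd/setup_rocm_libs.sh` (the ROCM_VERSION variable and the script path come from this PR; the exact invocation form is illustrative).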
(cherry picked from commit e6aea90fb3e8218cb562e5d990719112d8282702)
* Fix default rocm path
Running into `fatal error: hip/hip_runtime.h: No such file or directory` with the latest wheel due to an incorrect directory for the ROCm libs
(cherry picked from commit 292bae625b113eb65c66cfe4442da7a6456c988a)
* setup_rocm_libs.sh manylinux refactor
(cherry picked from commit f995f314ada4606cb78dc6233cd9c8effc356191)
* Set setup_rocm_libs.sh to be executable
(cherry picked from commit 05d67b9418cacda0d356c2102d7c1a887948b013)
* Revert to using numbered .so files to fix upstream
(cherry picked from commit 34f8189eae57a23cc15b4b4f032fe25757e0db8e)
* Remove drm script
---------
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
* [GEMM] Tuning script v2
* Extend tuning space to include BLOCK_SIZE = 256
Check LDS usage in a smarter way
* Added README
* Add git branch and commit to the default tuning result filename
* Refine the gemm tuning scripts to reduce the tuning space and get better perf numbers
* Add code to support tuning over the full tuning space
* Add a function to get the best tuning config
* Refine the matmul tutorial example to print out the best tuning config for each input
* Add even_k to the gemm kernel heuristics for better performance (see the sketch after this list)
* Address review comments
* Stop adding multiple architectures to the ISA header
* Add a mask for GPU memory loads in the gemm tuning script 'scripts/amd/gemm/matmul.py'
* Move the scripts to a better place 'scripts/amd/gemm/'
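For reference, an even_k heuristic like the one flagged above typically looks like the following sketch. It uses Triton's @triton.heuristics decorator to derive a constexpr EVEN_K flag from the runtime K, so the compiler can drop the K-dimension bounds checks when K divides evenly by BLOCK_K. This is a simplified illustration that assumes M and N are multiples of the block sizes; it is not the exact kernel from this PR:

```python
import triton
import triton.language as tl

# EVEN_K is computed from the runtime arguments at launch time and passed to
# the kernel as a compile-time constant, specializing the generated code.
@triton.heuristics({"EVEN_K": lambda args: args["K"] % args["BLOCK_K"] == 0})
@triton.jit
def matmul_kernel(a_ptr, b_ptr, c_ptr, M, N, K,
                  stride_am, stride_ak, stride_bk, stride_bn,
                  stride_cm, stride_cn,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                  BLOCK_K: tl.constexpr, EVEN_K: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    a_ptrs = a_ptr + offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak
    b_ptrs = b_ptr + offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for k in range(0, tl.cdiv(K, BLOCK_K)):
        if EVEN_K:
            # Fast path: K % BLOCK_K == 0, no mask needed on the K dimension.
            a = tl.load(a_ptrs)
            b = tl.load(b_ptrs)
        else:
            k_remaining = K - k * BLOCK_K
            a = tl.load(a_ptrs, mask=offs_k[None, :] < k_remaining, other=0.0)
            b = tl.load(b_ptrs, mask=offs_k[:, None] < k_remaining, other=0.0)
        acc += tl.dot(a, b)
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk
    c_ptrs = c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn
    tl.store(c_ptrs, acc, mask=(offs_m[:, None] < M) & (offs_n[None, :] < N))
```

Because EVEN_K is a constexpr, the non-taken branch is eliminated at compile time, which is where the performance gain comes from.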
This is a combination of 7 commits.
use PyTorch nightly with root
reproduce with a PyTorch unit test
hardcode isROCM to true
set is_cuda to False
ignore cc arg
clean up
match triton-mlir branch
This is a combination of 6 commits.
use local bitcode
This is a combination of 3 commits.
add bitcode to repo
update test
change bitcode path
move bitcode
update path
update scripts
update test
fix path issue