45 Commits

Author SHA1 Message Date
Lixun Zhang
8351f49fc7 [Tuning] Gemm tuning v3 (#457)
* Add gemm tuning script v3

* Introduce --jobs to control the number of files to generate

* Switch to trans convention used by Tensile

* Rerun rocprof if it crashes

* update README

* Remove peak perf and efficiency
2024-01-17 10:09:34 -06:00
jayfurmanek
d2f8bc1740 remove git modules for tree sitter (#465) 2024-01-16 15:44:05 -06:00
Lixun Zhang
b5ed97873c Added a script to print occupancy info (#450) 2024-01-10 14:01:13 -06:00
Lixun Zhang
ce9dacec72 Skip BLOCK_SIZE that is too large compared to M/N (#449) 2024-01-09 13:41:09 -06:00
Jack Taylor
1e2fd0dd1a Update hip_backend to use libhsa-runtime for arch info, (#411)
brings in path changes for pytorch triton wheels

Co-authored-by: jayfurmanek <Jason.Furmanek@amd.com>
2023-12-21 15:40:57 +00:00
jayfurmanek
a42ac260aa Merge branch 'triton-mlir' into ifu-231117 2023-12-12 14:24:11 -06:00
Alexander Efimov
605a90c58e [MFMA] Support tile size 4x4 version 1 (#413)
This PR enables 4x4 tile size in MFMA based dot operations.

The supported tiled dot is (4x64) x (64x4) -> (4x4) in MFMA layout.
However, the actual dot operation must have at least 64 output elements; this is a limitation of other layouts that appear during result processing (e.g. the blocked layout cannot handle tensors smaller than the wave size).

For example, the following dots are supported: (4x64) x (64x16) -> (4x16), (16x64) x (64x4) -> (16x4), and (8x64) x (64x8) -> (8x8).
The following dots are not supported: (4x128) x (128x4) -> (4x4) and (4x64) x (64x8) -> (4x8).

This is the first version of dot using MFMA 4x4 instructions, with redundancy and reductions.
2023-12-12 18:23:55 +01:00
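The shape rule described in #413 can be expressed as a small predicate. This is a hypothetical helper (`mfma4x4_dot_supported` is not a Triton API). It assumes a wave size of 64 and takes the "at least 64 output elements" limit from the commit message; it additionally assumes that M and N must be multiples of 4 and K a multiple of 64, based on the (4x64) x (64x4) tile, which the commit does not state explicitly.

```python
WAVE_SIZE = 64  # assumed AMD wavefront size


def mfma4x4_dot_supported(m: int, n: int, k: int, wave_size: int = WAVE_SIZE) -> bool:
    """Hypothetical check for an (m x k) x (k x n) dot lowered to 4x4 MFMA tiles."""
    # Assumption: shapes must tile evenly into (4 x 64) x (64 x 4) pieces.
    if m % 4 or n % 4 or k % 64:
        return False
    # The result needs at least one element per lane of a wave, because
    # layouts used during result processing (e.g. blocked) cannot hold fewer.
    return m * n >= wave_size


if __name__ == "__main__":
    # The supported and unsupported examples listed in the commit message.
    for shape in [(4, 16, 64), (16, 4, 64), (8, 8, 64), (4, 4, 128), (4, 8, 64)]:
        print(shape, mfma4x4_dot_supported(*shape))
```

Note that the (4x128) x (128x4) case fails on the output-size check (16 elements), not on K.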
Alexander Efimov
2be6ec771e [GEMM] [Tuning] Make tuning script more verbose (#420)
This PR adds:
- verbose tuning mode: printing std output of compilation and tuning calls
- collecting information about failed compilations
- print correctness check output with word
- split dimensions in generated scripts with "-"
- gpu_ids option to set particular gpus
2023-12-10 22:04:00 -06:00
Alexander Efimov
e19b5fd6bc [GEMM] Add script to run one tuning config (#419)
The script runs one given config for debug purposes.
2023-12-07 18:12:03 -06:00
Michael Melesse
64a0924381 ROCM IFU: remove ref to test_elementwise 2023-12-07 13:31:59 -06:00
Lixun Zhang
670ae8054d Add a cute tool to plot blocked, dotOperand, and mfma layout (#407)
* Add commands to plot blocked, dotOperand, and mfma layout

* Add commands to plot LDS layout and wmma instruction layout
2023-11-29 09:35:33 -06:00
Lixun Zhang
d4eda83b33 Benchmark FA on 2 GCDs (#393) 2023-11-08 12:42:54 -06:00
Lixun Zhang
1af893d8a2 [FRONTEND] Add input dtypes to autotuning key (#2534) (#374)
* [FRONTEND] Add input dtypes to autotuning key (#2534)

* Fix conflict in 06-fused-attention

* Fix get_best_config in FA-transV.py

* Fix leftover get_best_config()

---------

Co-authored-by: Adnan Akhundov <adnan.akhundov@gmail.com>
2023-11-07 19:36:57 -06:00
Lixun Zhang
f963c04034 Use the same heuristics for mfma type as PR#352 (#366) 2023-10-18 20:32:44 -05:00
Lixun Zhang
1de859df32 [GEMM] [Tuning] Add waves_per_eu to gemm tuning (#362)
* Add waves_per_eu in the tuning space

* Do not allocate tensor on device during kernel compilation step

* Add a breakdown of elapsed time

* Parallelize the post-processing step

* Parallelize the profile step with --ngpus

* Better timing info printout
2023-10-16 13:50:03 -05:00
Jack Taylor
47563240f8 PyTorch triton branch synchronisation (#354)
* Restructure ROCM Library Search
Currently there are a handful of ROCM-dependent files which are required for
Triton to run: the linker (ld.lld), the include files, and multiple HIP/HSA
shared objects.

This change provides three locations in which to search for these files,
checked in the same order for every file:

1. third_party/rocm.  This location is within the python/triton directory
   and is carried over when Triton is built.  If all necessary files
   are in this location, there is no need to have ROCM installed on the
   system at all.

2. The $ROCM_PATH environment variable.  If it is set, it overrides
   all other locations when searching for the necessary ROCM files.

3. /opt/rocm.  The default location for ROCm installations.  Finding ROCM
   here tells Triton that ROCM is installed in this environment.

To ease step 1, a new script, scripts/amd/setup_rocm_libs.sh,
has been added to the repo.  Executing this script downloads all necessary
ROCM files from their respective packages on repo.radeon.com
and installs them in third_party/rocm, allowing Triton to run without installing
the full ROCM stack.  setup_rocm_libs.sh reads the ROCM_VERSION environment variable
if a user wishes to install a ROCM version other than the default (currently 5.4.2).

When Triton wheels are built to support PyTorch, method 1 will be used to stay in
sync with PyTorch's approach of bringing along any libraries needed and not
requiring ROCM to be installed.

(cherry picked from commit e6aea90fb3e8218cb562e5d990719112d8282702)

* Fix default rocm path

Running into `fatal error: hip/hip_runtime.h: No such file or directory` with the latest wheel due to an incorrect directory for ROCm libs

(cherry picked from commit 292bae625b113eb65c66cfe4442da7a6456c988a)

* setup_rocm_libs.sh manylinux refactor

(cherry picked from commit f995f314ada4606cb78dc6233cd9c8effc356191)

* Set setup_rocm_libs.sh to be executable

(cherry picked from commit 05d67b9418cacda0d356c2102d7c1a887948b013)

* Revert to using numbered .so files to fix upstream

(cherry picked from commit 34f8189eae57a23cc15b4b4f032fe25757e0db8e)

* Remove drm script

---------

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2023-10-11 15:30:39 +01:00
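The library search described in #354 can be sketched as follows. This is a minimal illustration, not the actual hip_backend code: the function name `find_rocm_dir` and the `REQUIRED_FILES` list are hypothetical, and it follows the message's statement that a set $ROCM_PATH overrides every other location, then tries the bundled third_party/rocm copy, then /opt/rocm.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence

# Hypothetical subset of files a usable ROCm root must contain.
REQUIRED_FILES = (
    "llvm/bin/ld.lld",
    "include/hip/hip_runtime.h",
)


def has_required_files(root: Path, required: Sequence[str]) -> bool:
    """Return True if every required relative path exists under root."""
    return all((root / rel).exists() for rel in required)


def find_rocm_dir(
    bundled_dir: Path,
    env: Mapping[str, str] = os.environ,
    default_dir: Path = Path("/opt/rocm"),
    required: Sequence[str] = REQUIRED_FILES,
) -> Optional[Path]:
    """Pick a ROCm root: $ROCM_PATH, then the bundled copy, then /opt/rocm."""
    # $ROCM_PATH, when set, overrides every other location.
    rocm_path = env.get("ROCM_PATH")
    if rocm_path:
        return Path(rocm_path)
    # A complete bundled copy means ROCm need not be installed system-wide.
    if has_required_files(bundled_dir, required):
        return bundled_dir
    # Fall back to the default system-wide installation, if present.
    if default_dir.is_dir():
        return default_dir
    return None
```

Passing `env` and `default_dir` explicitly keeps the lookup testable without touching the real environment or /opt/rocm.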
Lixun Zhang
515525d068 [GEMM] Tuning script v2 (#350)
* [GEMM] Tuning script v2

* Extend tuning space to include BLOCK_SIZE = 256

Check LDS in a smarter way

* Added README

* Add git branch and commit to the default tuning result filename
2023-10-10 20:49:49 -05:00
Jason Furmanek
e5d7bb4fae Initial commit to resolve merge conflicts
rename tl.float8e4 to tl.float8e4nv to align with upstream

ROCM IFU: Fix python arch issues

ROCM IFU: Fix kernel launcher

ROCM IFU: Fix merge conflicts

fix debug build

Set correct threadsPerCTA
2023-10-03 04:04:26 +00:00
Lixun Zhang
8d99331c89 Combine split_k and non split_k kernels in GEMM tuning API (#344) 2023-09-28 12:37:22 -05:00
Shucai Xiao
10795d8fd3 Fixed a bug related to split_k and prune unnecessary tuning space (#332)
* refine the tuning script by adding prune_configs; also fixed a bug in generating tuning configs

* fixed a bug in returning the empty config
2023-09-21 23:47:14 -05:00
Shucai Xiao
fb3f2d6feb refine gemm tuning scripts (#309)
* refine the gemm tuning scripts to reduce tuning space and better perf numbers

* added code to support tuning in full tuning space

* add a function to get best tuning config

* refine the matmul tutorial example to print out best tuning config for each input

* added even_k to gemm kernel heuristic for better performance

* address review comments
2023-09-07 08:09:11 -05:00
Shucai Xiao
1c86e3238a remove adding multiple architectures to isa head and add gemm tuning scripts (#261)
* Remove adding multiple architectures to isa head

* Add mask for gpu memory load in scripts for tuning gemm 'script/amd/gemm/matmul.py'

* Move the scripts to a better place 'scripts/amd/gemm/'
2023-07-18 14:21:16 -05:00
Michael Melesse
275fead8e3 fix lit test 2023-05-12 15:37:08 -05:00
Michael Melesse
9cc141b12d assume ROCM device
This is a combination of 7 commits.

use pyt nightly with root

repro with pytorch unit test

hardcode isROCM to true

set is_cuda to False

ignore cc arg

clean up

match triton-mlir branch
2023-05-04 16:46:59 -05:00
Michael Melesse
fdd2af8b38 fix workflow
This is a combination of 6 commits.

change github actions

install git

remove pre-commit

back to old install

use -e

clean up
2023-05-01 12:49:29 -05:00
Michael Melesse
13facab95f fix lit tests
This is a combination of 3 commits.

fix build and test errors

fix lit test error

fix lit tests
2023-05-01 12:48:20 -05:00
Michael Melesse
d211cd7750 skip bad test 2023-04-17 13:12:34 -05:00
Michael Melesse
705d47d0dd fix lit test issues
This is a combination of 6 commits.

install lit

fix lit test

fix lit test

fix aot lit issues

fix final lit tests

add lit tests
2023-04-17 11:46:37 -05:00
Michael Melesse
f50116208f match masked load 2023-04-11 15:20:08 -05:00
Rahul Batra
c7ac25dc60 fix shift op 2023-04-10 15:05:45 -05:00
Rahul Batra
3d71a6a034 fix issues 2023-04-07 14:40:59 -05:00
Rohit Santhanam
dadc09623b Replace hard coded ROCM paths with ROCM_PATH env var. 2023-03-06 03:20:38 +00:00
Michael Melesse
2077c0723b local ROCM bitcode files
This is a combination of 6 commits.

use local bitcode

This is a combination of 3 commits.

add bit code to repo

update test

change bit code path

move bit code

update path

update scripts

update test

fix path issue
2023-02-17 14:10:34 -06:00
Daniil Fukalov
6b4687db34 [ROCM][scripts] Add script to build debug LLVM installation. 2023-01-13 00:41:57 +01:00
Michael Melesse
bcccbf7787 update test script 2022-12-24 10:25:50 -06:00
Michael Melesse
28bec3dc41 update test 2022-12-24 07:53:28 -06:00
Michael Melesse
3f8b402f8a update script 2022-12-22 22:06:32 -06:00
Michael Melesse
9ff2f8b653 enable kernel launching 2022-12-22 21:59:47 -06:00
Michael Melesse
46357a92f2 label kernels correctly 2022-12-22 21:24:34 -06:00
Michael Melesse
34f95bc7d9 update scripts 2022-12-22 18:47:42 -06:00
Michael Melesse
8b1fb798e6 show segfaults 2022-12-22 16:29:49 -06:00
Michael Melesse
f06fdff372 add prints in c code 2022-12-22 16:20:46 -06:00
Michael Melesse
814a59a3d6 attempt launch 2022-12-22 08:29:20 -06:00
Michael Melesse
edd0df94dc compiles 2022-12-21 13:48:56 -06:00
Michael Melesse
5e055a5165 add scripts 2022-12-21 13:13:24 -06:00