github/ROCm - ROCm - AtHeartEngineering

mirror of https://github.com/ROCm/ROCm.git synced 2026-02-21 03:00:39 -05:00

Author	SHA1	Message	Date
Ognjen	171a67e837	Add scheduling pass	2024-01-25 18:07:45 +00:00
Jack Taylor	1e2fd0dd1a	Update hip_backend to use libhsa-runtime for arch info, (#411 ) brings in path changes for pytorch triton wheels Co-authored-by: jayfurmanek <Jason.Furmanek@amd.com>	2023-12-21 15:40:57 +00:00
Jason Furmanek	484852876e	Resolve merge conflicts; AMD adjustments for new LLVM version	2023-11-09 19:00:49 +00:00
Jason Furmanek	977d5aa267	Merge commit '721897fcc4f942aa97d2e9ba3787a5e213758177' into ifu-231108 Conflicts: bin/triton-translate.cpp lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp python/triton/compiler/compiler.py python/triton/runtime/jit.py python/tutorials/06-fused-attention.py test/Conversion/tritongpu_to_llvm.mlir	2023-11-08 18:51:23 +00:00
Jason Furmanek	3a6dc5ad8d	resolve some merge conflicts fix more conflits Resolve merge conflicts Some more build and conflict fixes Resolve conflicts for 06-fused-attension.py resolve merge conflicts for the tutorial group gemm example Fixes for some LIT tests resolve remaining conflicts in tests Fix empty kernel set capability 0	2023-11-06 23:13:10 +00:00
Jason Furmanek	33151a860f	Merge commit 'ac9fa68d18c777e421bd3f6fb1ddcfd60b6fda33' into ifu-rebase-again Conflicts: .gitignore .gitmodules README.md bin/triton-translate.cpp include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td include/triton/Target/AMDGCN/AMDGCNTranslation.h include/triton/Target/HSACO/HSACOTranslation.h lib/Analysis/Allocation.cpp lib/Analysis/Utility.cpp lib/Conversion/TritonGPUToLLVM/CMakeLists.txt lib/Conversion/TritonGPUToLLVM/ConvertLayoutOpToLLVM.cpp lib/Conversion/TritonGPUToLLVM/ReduceOpToLLVM.cpp lib/Conversion/TritonGPUToLLVM/ScanOpToLLVM.cpp lib/Conversion/TritonGPUToLLVM/Utility.cpp lib/Conversion/TritonGPUToLLVM/Utility.h lib/Dialect/TritonGPU/IR/Dialect.cpp lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp lib/Target/HSACO/CMakeLists.txt lib/Target/HSACO/HSACOTranslation.cpp lib/Target/LLVMIR/LLVMIRTranslation.cpp python/src/triton.cc python/test/unit/language/test_core.py python/test/unit/operators/test_flash_attention.py python/triton/compiler/compiler.py python/triton/compiler/make_launcher.py python/triton/language/semantic.py python/triton/runtime/jit.py python/tutorials/06-fused-attention.py python/tutorials/11-grouped-gemm.py test/Conversion/tritongpu_to_llvm.mlir	2023-11-06 23:10:10 +00:00
Michael Melesse	1fd9b40f2f	Works as StandAlone and Backend and also Perf is Good This is a combination of 4 commits. Works as StandAlone and Backend Works as StandAlone and Backend This is a combination of 13 commits. Works StandAlone and as Backend This is a combination of 7 commits. backend set default dir with flag move bitcode to backend dir copy backend save empty test work in backendmode enable backend mode when copying to upstream clean up fix failure minimize diff add skip function fix bug with corrupted dwarf exp match num_wraps fix multi threaded test issue move bitcode file out of lib move backend to python/triton/third_party/hip move libhsa backend works again restart ci clean upstream location first before copy match scripts fix new error memoize backend stuff fix bug	2023-10-26 14:27:18 -05:00
Michael Melesse	09ba348f87	[ROCM] Core Functionality for AMD (#1983 ) * this pr adds a third party backend for triton that works on AMD * this expose a lot of the work that has been done in our [fork](https://github.com/ROCmSoftwarePlatform/triton) * most unit tests on `test_core.py` pass * it skips some unit tests for various reasons * we plan to follow up with more prs improving Functionality and Performance in the future --------- Co-authored-by: Philippe Tillet <phil@openai.com>	2023-10-26 08:36:49 -05:00
Mehdi Amini	721897fcc4	upgrade llvm to `b1115f8c` (NFC) (#2403 ) Co-authored-by: Thomas Raoux <thomas.raoux@openai.com> Co-authored-by: Keren Zhou <kerenzhou@openai.com> Co-authored-by: Phil Tillet <phil@openai.com>	2023-10-16 16:38:49 -07:00
Alexander Efimov	4d539d7dae	Add licenses to AMD related files (#351 )	2023-10-16 15:18:01 -05:00
Stewart Hall	29828fe491	[FRONTEND] add option to disable fp mul/add fusion (#2495 ) By default, ptxas will enable fusion of mul/add to fma instructions. The backend was also being configured unconditionally to enable this on conversion from LLVM IR to PTX. This commit adds an option which can be used to disable the FP fusion behavior in both locations.	2023-10-14 12:23:30 -07:00
Philippe Tillet	3b6ec763d5	Revert "[BACKEND] Disable BreakPhiStruct pass (#2458 )" (#2498 ) This reverts commit `b1bc9b20a0`.	2023-10-14 10:40:49 -07:00
Jack Taylor	47563240f8	PyTorch triton branch synchronisation (#354 ) * Restructure ROCM Library Search Currently there are a handful of ROCM dependant files which are required for triton to run. The linker(ld.lld), the include files, and multiple hip/hsa shared objects. This change will provide three search areas to find these files. All in the same order. 1. third_party/rocm. This location is within the python/triton directory and is carried over when triton is built. IF all necessary files are in this location there will be no need to have ROCM installed at all on the system. 2. $ROCM_PATH environmental variable. If this exists it will override all other locations to find ROCM necessary files 3. /opt/rocm. The default location for ROCm installations. Finding one here will notify triton that ROCM is installed in this environment To ease with step 3. A new script scripts/amd/setup_rocm_libs.sh has been added to the repo. Executing this script will cause all necessary ROCM files to be downloaded from their respective packages on repo.radeon.com and installed in third_party/rocm. Allowing for triton to run without installing the full ROCM stack. setup_rocm_libs.sh takes a env_var ROCM_VERSION if a user wishes to install a ROCM version other than the default (currently 5.4.2) When triton whls are built to support Pytorch, method 3 will be used to stay in sync with PyTorch's approach of bringing along any libraries needed and not requiring ROCM to be installed. (cherry picked from commit e6aea90fb3e8218cb562e5d990719112d8282702) * Fix default rocm path Running into `fatal error: hip/hip_runtime.h: No such file or directory` with latest wheel due to incorrect directory for ROCm libs (cherry picked from commit 292bae625b113eb65c66cfe4442da7a6456c988a) * setup_rocm_libs.sh manylinux refactor (cherry picked from commit f995f314ada4606cb78dc6233cd9c8effc356191) * Set setup_rocm_libs.sh to be executable (cherry picked from commit 05d67b9418cacda0d356c2102d7c1a887948b013) * Revert to using numbered so files to fix upstream (cherry picked from commit 34f8189eae57a23cc15b4b4f032fe25757e0db8e) * Remove drm script --------- Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2023-10-11 15:30:39 +01:00
Thomas Raoux	b1bc9b20a0	[BACKEND] Disable BreakPhiStruct pass (#2458 ) This is causing functional failures in pytorch workload. Disabling it until I figure out the problem.	2023-10-06 17:59:53 -07:00
oplavsic	e801638b40	Add waves_per_eu as kernel parameter (#319 ) * Add waves_per_eu as kernel parameter * Fix failing tests * Add default value for waves_per_eu for ttgir_to_llir function * Remove aot.py	2023-10-06 12:08:34 -05:00
Christian Sigg	5458014282	[BACKEND] Lower to PTX with `trap-unreachable` (#2429 ) We've seen cases where the entire kernel is poisoned due to division-by-zero, resulting in a single `unreachable` instruction at the LLIR level. Emit this instruction as `trap` (instead of dropping it) so that the kernel doesn't run successfully without writing any outputs.	2023-10-03 21:05:10 -07:00
Thomas Raoux	020f43d5a3	[NFC] Minor clean ups found during LLVM upgrade (#2433 ) Pull some of the changes required for LLVM upgrade to make the upgrade simpler.	2023-10-03 08:22:46 -07:00
Jason Furmanek	e5d7bb4fae	Initial commit to resolve merge conflicts rename tl.float8e4 to tl.float8e4nv to align with upstream ROCM IFU: Fix python arch issues ROCM IFU: Fix kernel launcher ROCM IFU: Fix merge conflicts fix debug build Set correct threadsPerCTA	2023-10-03 04:04:26 +00:00
Jason Furmanek	74fd8e9754	Merge commit '36fc54b6f28168d3644808bfe299f1ba06a36272' into ifu230908-2 Conflicts: .gitignore bin/triton-translate.cpp include/triton/Conversion/TritonGPUToLLVM/TritonGPUToLLVMPass.h include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td include/triton/Dialect/TritonGPU/IR/TritonGPUDialect.td lib/Analysis/Utility.cpp lib/Conversion/TritonGPUToLLVM/ConvertLayoutOpToLLVM/SharedToDotOperandMMAv2.cpp lib/Conversion/TritonGPUToLLVM/DotOpToLLVM.cpp lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp lib/Conversion/TritonGPUToLLVM/ReduceOpToLLVM.cpp lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVM.cpp lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVMBase.h lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVMPass.cpp lib/Conversion/TritonGPUToLLVM/Utility.h lib/Dialect/Triton/Transforms/RewriteTensorPointer.cpp lib/Dialect/TritonGPU/IR/Dialect.cpp lib/Dialect/TritonGPU/Transforms/AccelerateMatmul.cpp lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp lib/Target/LLVMIR/LLVMIRTranslation.cpp python/src/triton.cc python/test/unit/runtime/test_subproc.py python/triton/compiler/compiler.py python/triton/compiler/make_launcher.py python/triton/language/semantic.py python/triton/runtime/jit.py python/tutorials/06-fused-attention.py test/Conversion/triton_to_tritongpu.mlir test/Conversion/tritongpu_to_llvm.mlir test/TritonGPU/coalesce.mlir unittest/Conversion/TritonGPUToLLVM/CMakeLists.txt	2023-10-02 18:01:04 +00:00
Philippe Tillet	c4bc3fd92f	[BACKEND] Fix-up memory leak (#2365 )	2023-09-21 13:46:30 -07:00
Thomas Raoux	bb949d1141	[BACKEND] Move struct optimization down the LLVM pipeline (#2312 ) Move the optimization to remove phi of struct later in the optimization pipeline to avoid interfering with CFG optimization.	2023-09-16 12:28:53 -07:00
Thomas Raoux	d3956a21f3	[BACKEND] Add LLVM pre-processing pass to break struct types (#2285 ) Add infrastructure to be able to add and test custom LLVM passes in the backend. This will allow use to apply some low level optimizations and cleanup on LLVM IR. Add a first pass that breaks up phi of struct created by lowering to LLVM. Those can often pessimise the optimizer as it would block optimizations going through phi nodes.	2023-09-13 10:03:29 -07:00
Christian Sigg	f6828e1a6f	[Backend] Make `ConvertTritonGPUToLLVMPass`'s `tmaMetadata` a member (#2271 ) .. instead of an option. This partially addresses https://github.com/openai/triton/issues/2265 to no longer crash when printing a pass pipeline in textual form. It is not a proper solution for the fact that pass results should be stored in the IR and not in a pointer argument.	2023-09-11 07:16:54 -07:00
Corbin Robeck	007bea9994	Add bitcode writer to AMDGCN hsaco output	2023-09-01 04:02:29 +00:00
Jason Furmanek	df5c263a19	Fix merge conflicts	2023-09-01 04:01:32 +00:00
Jason Furmanek	3eaeb89d18	Merge commit '5df904233c11a65bd131ead7268f84cca7804275' into ifu230810-2 Conflicts: include/triton/Dialect/Triton/Transforms/Passes.h include/triton/Dialect/TritonGPU/IR/Dialect.h include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td lib/Analysis/Allocation.cpp lib/Analysis/Utility.cpp lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp lib/Conversion/TritonGPUToLLVM/ReduceOpToLLVM.cpp lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVM.cpp lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVMPass.cpp lib/Dialect/Triton/Transforms/RewriteTensorPointer.cpp lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp lib/Dialect/TritonGPU/Transforms/ReorderInstructions.cpp lib/Target/LLVMIR/LLVMIRTranslation.cpp python/src/triton.cc python/triton/compiler/compiler.py python/triton/ops/flash_attention.py python/triton/runtime/autotuner.py python/triton/runtime/jit.py python/triton/tools/aot.py python/tutorials/06-fused-attention.py test/Conversion/tritongpu_to_llvm.mlir test/Target/tritongpu_to_llvmir.mlir test/Target/tritongpu_to_llvmir_noinline.mlir	2023-09-01 03:25:33 +00:00
Michael Melesse	c6d33dcebf	[ROCM] Core Functionality for AMD (#1983 ) * this pr adds a third party backend for triton that works on AMD * this expose a lot of the work that has been done in our [fork](https://github.com/ROCmSoftwarePlatform/triton) * most unit tests on `test_core.py` pass * it skips some unit tests for various reasons * we plan to follow up with more prs improving Functionality and Performance in the future --------- Co-authored-by: Philippe Tillet <phil@openai.com>	2023-08-31 14:02:00 -07:00
Thomas	ad3e363a44	[BACKEND] Remove dead code related to old libhopper_helpers.bc (#2145 )	2023-08-20 22:56:20 -07:00
Whitney Tsang	100cabd0e4	[FRONTEND] use enum instead of bool to select target (#2118 ) Before this PR, the determination of `TritonGPUToLLVMIRPass` to generate NVVM-compatible LLVM or ROCDL-compatible LLVM is controlled by a boolean `isROCM`. This method is hard to scale. This PR changes it to use an enum instead, where new target can be added easily when needed. --------- Signed-off-by: Tsang, Whitney <whitney.tsang@intel.com> Co-authored-by: Philippe Tillet <phil@openai.com>	2023-08-17 18:37:09 -07:00
Whitney Tsang	129e7dfc6f	[TritonGPUToLLVM] Correct the usage of option passing (#2104 ) For example, when given `--convert-triton-gpu-to-llvm="is-rocm=true"`, `ConvertTritonGPUToLLVMPass` should generate ROCM-compatible LLVM. Before this PR, transformation options passed in command line are not respected.	2023-08-16 00:56:01 +00:00
Zahi Moudallal	4d373aa103	[BACKEND] Remove HopperHelpers.c and replace with inline ptx and LLVM codegen (#2047 )	2023-08-10 15:52:37 -07:00
Peter Hawkins	3be74fa92d	Include only necessary MLIR conversion passes, rather than all of them. (#2068 ) No functional changes intended, and it might slightly speed up the build. This allows a downstream Bazel build of Triton to avoid building a number of dialects and passes that Triton doesn't need.	2023-08-09 08:30:42 -07:00
goostavz	f1512bded1	Initial code merge of Hopper support (#2036 ) The initial code merge of Nvidia Hopper features support. Please be aware that the code merge is not finished yet and the trouble-shooting is still ongoing. The new hardware features (GMMA, TMA, STMATRIX etc.) and automatic warp-specialization are experimental for now and turned off by default. It is recommended for a trial when version 3.0 is released. The work is contributed by: ben-zhang-609, bealwang, donproc, qliu93, jsh20, allatit23, LyricZhao, ivanyinwz, goostavz & yangjunpro from Nvidia, in cooperation with: ptillet, Jokeren, ThomasRaoux & zahimoud from OpenAI. Co-authored-by: Goostav Zhu <gzhu@nvidia.com>	2023-08-07 09:53:04 +08:00
oplavsic	138844568d	Use optimal number of VGPRs (#281 ) * Use optimal number of VGPRs * Fix tritongpu_to_hsaco test	2023-08-04 10:46:53 -05:00
Thomas	e6216047b8	[BACKEND] Upgrade the max PTX version allowed to 8.2 (#1982 )	2023-07-23 19:56:01 -07:00
Keren Zhou	cc5a7ed52f	[FRONTEND][BACKEND] Materialize line info for triton kernels (#1902 ) `export TRITON_DISABLE_LINE_INFO=1` to disable the feature.	2023-07-07 16:03:44 -04:00
Ingo Müller	a5fb71eed8	[CMAKE] Add link dependency to dl for TritonLLVMIR. (#1857 ) That library makes use of the dladdr function, so it eventually needs to be linked with -ldl, which may not be done automatically. This commit adds a link dependency to `${CMAKE_DL_LIBS}`, which is CMake's way of specifying that library in a portable way.	2023-06-29 10:31:45 -04:00
Jason Furmanek	28d9754b2a	Merge remote-tracking branch 'oai/main' into ifu230601 Conflicts: python/test/unit/language/assert_helper.py test/Conversion/tritongpu_to_llvm.mlir	2023-06-01 20:53:33 +00:00
Mehdi Amini	83245259a6	[OPTIMIZER][BACKEND] switch the TritonGPU dialect to use MLIR Properties (NFC) (#1696 ) Also try to switch APIs access to the new upstream APIs that separate explicitly the access to "discardable" and "inherent" attributes (the latter being stored in properties now). Generic accessors like `getAttr()` `setAttr()` `setAttrs()` are much more expensive and to be avoided.	2023-05-20 01:36:48 +00:00
Jason Furmanek	78c60742fc	IFU 230517 Resolve merge conflicts	2023-05-17 17:36:44 +00:00
Jason Furmanek	4c4e42e524	Merge remote-tracking branch 'openai/main' into IFU-230517 Conflicts: lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVMPass.cpp lib/Target/LLVMIR/LLVMIRTranslation.cpp python/test/unit/language/assert_helper.py python/triton/third_party/cuda/bin/ptxas test/Conversion/tritongpu_to_llvm.mlir It looks like you may be committing a merge. If this is not correct, please remove the file .git/MERGE_HEAD and try again.	2023-05-17 15:03:42 +00:00
cloudhan	323843cde8	[BUILD] stop depending on dlfcn-win32 by implementing `dladdr` natively with WIN32 API (#1674 ) Co-authored-by: Philippe Tillet <phil@openai.com>	2023-05-16 07:19:36 +00:00
Ingo Müller	b2a757d000	[BUILD] Add missing CMake link-time dependencies. (#1654 )	2023-05-11 19:17:44 -07:00
Michael Melesse	9cc141b12d	assume ROCM device This is a combination of 7 commits. use pyt nightly with root repro with pytorch unit test hardcode isROCM to true set is_cuda to False ignore cc arg clean up match triton-mlir branch	2023-05-04 16:46:59 -05:00
Michael Melesse	2784b804d9	Merge remote-tracking branch 'upstream/main' into ifu_4_26_2023	2023-04-26 12:04:21 -05:00
Daniil Fukalov	3a4343968b	Merge branch 'triton-mlir' into dfukalov/work	2023-04-17 21:08:58 +02:00
Michael Melesse	705d47d0dd	fix lit test issues This is a combination of 6 commits. install lit fix lit test fix lit test fix aot lit issues fix final lit tests add lit tests	2023-04-17 11:46:37 -05:00
Philippe Tillet	e5c7d2a83c	[FRONTEND] cleaned up language; added frontend function for `globaltimer` special register (#1525 )	2023-04-14 15:29:27 -07:00
dfukalov	d76b343aac	[ROCM] Use `llvm::dbgs()` and `llvm::errs()` streams instead of `std::cout`.	2023-04-14 16:56:11 +02:00
Michael Melesse	3603483fc0	clean up previous platform functions	2023-04-13 13:20:08 -05:00

1 2 3

125 Commits