github/ROCm - ROCm - AtHeartEngineering

mirror of https://github.com/ROCm/ROCm.git synced 2026-04-05 03:01:17 -04:00

Author	SHA1	Message	Date
Jason Furmanek	977d5aa267	Merge commit '721897fcc4f942aa97d2e9ba3787a5e213758177' into ifu-231108 Conflicts: bin/triton-translate.cpp lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp python/triton/compiler/compiler.py python/triton/runtime/jit.py python/tutorials/06-fused-attention.py test/Conversion/tritongpu_to_llvm.mlir	2023-11-08 18:51:23 +00:00
Jason Furmanek	39e8901d7a	ROCM IFU: Resolve merge conflicts in RemoveLayoutConversions.cpp fix merge error fix dot fix make_range additional fix	2023-11-07 04:29:38 +00:00
Jason Furmanek	c3132eeda8	ROCM IFU: Third-party fixes: preload ROCDL Additional 3rd-party fix Remove redundant mfma_supported defines	2023-11-07 03:29:16 +00:00
Jason Furmanek	3a6dc5ad8d	resolve some merge conflicts fix more conflits Resolve merge conflicts Some more build and conflict fixes Resolve conflicts for 06-fused-attension.py resolve merge conflicts for the tutorial group gemm example Fixes for some LIT tests resolve remaining conflicts in tests Fix empty kernel set capability 0	2023-11-06 23:13:10 +00:00
Jason Furmanek	33151a860f	Merge commit 'ac9fa68d18c777e421bd3f6fb1ddcfd60b6fda33' into ifu-rebase-again Conflicts: .gitignore .gitmodules README.md bin/triton-translate.cpp include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td include/triton/Target/AMDGCN/AMDGCNTranslation.h include/triton/Target/HSACO/HSACOTranslation.h lib/Analysis/Allocation.cpp lib/Analysis/Utility.cpp lib/Conversion/TritonGPUToLLVM/CMakeLists.txt lib/Conversion/TritonGPUToLLVM/ConvertLayoutOpToLLVM.cpp lib/Conversion/TritonGPUToLLVM/ReduceOpToLLVM.cpp lib/Conversion/TritonGPUToLLVM/ScanOpToLLVM.cpp lib/Conversion/TritonGPUToLLVM/Utility.cpp lib/Conversion/TritonGPUToLLVM/Utility.h lib/Dialect/TritonGPU/IR/Dialect.cpp lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp lib/Target/HSACO/CMakeLists.txt lib/Target/HSACO/HSACOTranslation.cpp lib/Target/LLVMIR/LLVMIRTranslation.cpp python/src/triton.cc python/test/unit/language/test_core.py python/test/unit/operators/test_flash_attention.py python/triton/compiler/compiler.py python/triton/compiler/make_launcher.py python/triton/language/semantic.py python/triton/runtime/jit.py python/tutorials/06-fused-attention.py python/tutorials/11-grouped-gemm.py test/Conversion/tritongpu_to_llvm.mlir	2023-11-06 23:10:10 +00:00
Shucai Xiao	79bebc4ffe	fp8 type support (#357 ) * add two fp8 data types `tl.float8e4b8` and `tl.float8e5b16` to triton. * add SW type conversion between `tl.float8e4b8/tl.float8e5b16` and `fp16` * change flashattention to support fp8 in q/k.	2023-11-02 15:51:23 -05:00
Michael Melesse	1fd9b40f2f	Works as StandAlone and Backend and also Perf is Good This is a combination of 4 commits. Works as StandAlone and Backend Works as StandAlone and Backend This is a combination of 13 commits. Works StandAlone and as Backend This is a combination of 7 commits. backend set default dir with flag move bitcode to backend dir copy backend save empty test work in backendmode enable backend mode when copying to upstream clean up fix failure minimize diff add skip function fix bug with corrupted dwarf exp match num_wraps fix multi threaded test issue move bitcode file out of lib move backend to python/triton/third_party/hip move libhsa backend works again restart ci clean upstream location first before copy match scripts fix new error memoize backend stuff fix bug	2023-10-26 14:27:18 -05:00
Michael Melesse	09ba348f87	[ROCM] Core Functionality for AMD (#1983 ) * this pr adds a third party backend for triton that works on AMD * this expose a lot of the work that has been done in our [fork](https://github.com/ROCmSoftwarePlatform/triton) * most unit tests on `test_core.py` pass * it skips some unit tests for various reasons * we plan to follow up with more prs improving Functionality and Performance in the future --------- Co-authored-by: Philippe Tillet <phil@openai.com>	2023-10-26 08:36:49 -05:00
Alexander Efimov	20f316b19a	[MFMA] Switch between MFMA types (#352 ) This PR introduces matrix_instr_nonkdim flag to switch between MFMA 16 and MFMA 32.	2023-10-18 16:57:34 +02:00
Zahi Moudallal	726bdb984f	[FRONTEND][BACKEND] Fix constexpr assignment ; revert #2430 (#2496 ) Without this change, a constexpr assignment (ie. `A = B & C`, where `B` and `C` are both constexpr) is getting assigned to a triton tensor, which becomes an issue when `A` is used as the condition of an If statement. Note: I had to add `not isinstance(node.value, ast.Constant)` to the condition because if we are assigning `x = 0` then the assigned value is also a constexpr, but in this case we do want to assign a triton tensor to `x` so that we can do `x.to(tl.int64)` for example, which cannot be done on a constexpr. --------- Co-authored-by: Philippe Tillet <phil@openai.com>	2023-10-16 12:35:19 -07:00
Stewart Hall	29828fe491	[FRONTEND] add option to disable fp mul/add fusion (#2495 ) By default, ptxas will enable fusion of mul/add to fma instructions. The backend was also being configured unconditionally to enable this on conversion from LLVM IR to PTX. This commit adds an option which can be used to disable the FP fusion behavior in both locations.	2023-10-14 12:23:30 -07:00
Jack Taylor	5d44c60d17	enforce cc=None on PyTorch ROCm (#296 ) * enforce cc=None on ROCm * Comment * Update approach to ignore integer cc values Co-authored-by: Alexander Efimov <efimov.alexander@gmail.com> --------- Co-authored-by: Alexander Efimov <efimov.alexander@gmail.com>	2023-10-12 10:17:26 +01:00
Jack Taylor	47563240f8	PyTorch triton branch synchronisation (#354 ) * Restructure ROCM Library Search Currently there are a handful of ROCM dependant files which are required for triton to run. The linker(ld.lld), the include files, and multiple hip/hsa shared objects. This change will provide three search areas to find these files. All in the same order. 1. third_party/rocm. This location is within the python/triton directory and is carried over when triton is built. IF all necessary files are in this location there will be no need to have ROCM installed at all on the system. 2. $ROCM_PATH environmental variable. If this exists it will override all other locations to find ROCM necessary files 3. /opt/rocm. The default location for ROCm installations. Finding one here will notify triton that ROCM is installed in this environment To ease with step 3. A new script scripts/amd/setup_rocm_libs.sh has been added to the repo. Executing this script will cause all necessary ROCM files to be downloaded from their respective packages on repo.radeon.com and installed in third_party/rocm. Allowing for triton to run without installing the full ROCM stack. setup_rocm_libs.sh takes a env_var ROCM_VERSION if a user wishes to install a ROCM version other than the default (currently 5.4.2) When triton whls are built to support Pytorch, method 3 will be used to stay in sync with PyTorch's approach of bringing along any libraries needed and not requiring ROCM to be installed. (cherry picked from commit e6aea90fb3e8218cb562e5d990719112d8282702) * Fix default rocm path Running into `fatal error: hip/hip_runtime.h: No such file or directory` with latest wheel due to incorrect directory for ROCm libs (cherry picked from commit 292bae625b113eb65c66cfe4442da7a6456c988a) * setup_rocm_libs.sh manylinux refactor (cherry picked from commit f995f314ada4606cb78dc6233cd9c8effc356191) * Set setup_rocm_libs.sh to be executable (cherry picked from commit 05d67b9418cacda0d356c2102d7c1a887948b013) * Revert to using numbered so files to fix upstream (cherry picked from commit 34f8189eae57a23cc15b4b4f032fe25757e0db8e) * Remove drm script --------- Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2023-10-11 15:30:39 +01:00
Beal Wang	5812d970a8	[HOPPER][OPTIMIZER] remove divOp and remOp from gemm math loop (#2402 ) This is just for Warp Specialization kernels on Hopper. Replace DivOp and RemOp with SelectOp and AndOp/XorOp.	2023-10-09 14:42:06 +08:00
Philippe Tillet	424e67e727	[FRONTEND] improved while loop error messages (#2463 )	2023-10-06 18:37:52 -07:00
Keren Zhou	a42d517021	[FRONTEND] Better warning on nested jit functions (#2453 )	2023-10-06 14:22:51 -07:00
oplavsic	e801638b40	Add waves_per_eu as kernel parameter (#319 ) * Add waves_per_eu as kernel parameter * Fix failing tests * Add default value for waves_per_eu for ttgir_to_llir function * Remove aot.py	2023-10-06 12:08:34 -05:00
Michael Melesse	31fe8aadc5	ROCM IFU: Fix minimize_alloc ROCM IFU: Small fixes	2023-10-03 05:34:44 +00:00
Jason Furmanek	e5d7bb4fae	Initial commit to resolve merge conflicts rename tl.float8e4 to tl.float8e4nv to align with upstream ROCM IFU: Fix python arch issues ROCM IFU: Fix kernel launcher ROCM IFU: Fix merge conflicts fix debug build Set correct threadsPerCTA	2023-10-03 04:04:26 +00:00
Jason Furmanek	74fd8e9754	Merge commit '36fc54b6f28168d3644808bfe299f1ba06a36272' into ifu230908-2 Conflicts: .gitignore bin/triton-translate.cpp include/triton/Conversion/TritonGPUToLLVM/TritonGPUToLLVMPass.h include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td include/triton/Dialect/TritonGPU/IR/TritonGPUDialect.td lib/Analysis/Utility.cpp lib/Conversion/TritonGPUToLLVM/ConvertLayoutOpToLLVM/SharedToDotOperandMMAv2.cpp lib/Conversion/TritonGPUToLLVM/DotOpToLLVM.cpp lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp lib/Conversion/TritonGPUToLLVM/ReduceOpToLLVM.cpp lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVM.cpp lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVMBase.h lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVMPass.cpp lib/Conversion/TritonGPUToLLVM/Utility.h lib/Dialect/Triton/Transforms/RewriteTensorPointer.cpp lib/Dialect/TritonGPU/IR/Dialect.cpp lib/Dialect/TritonGPU/Transforms/AccelerateMatmul.cpp lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp lib/Target/LLVMIR/LLVMIRTranslation.cpp python/src/triton.cc python/test/unit/runtime/test_subproc.py python/triton/compiler/compiler.py python/triton/compiler/make_launcher.py python/triton/language/semantic.py python/triton/runtime/jit.py python/tutorials/06-fused-attention.py test/Conversion/triton_to_tritongpu.mlir test/Conversion/tritongpu_to_llvm.mlir test/TritonGPU/coalesce.mlir unittest/Conversion/TritonGPUToLLVM/CMakeLists.txt	2023-10-02 18:01:04 +00:00
Philippe Tillet	7432fff4be	[FRONTEND] add limited introspection capabilities in `tl.extra.cuda` ; rename `arch` into `target` (#2385 )	2023-09-25 23:58:25 -07:00
Shantanu	8e75e392ae	[FRONTEND] Fix Python error handling in launch (#2334 ) This was regressed by #2185 because we didn't realise CUDA_CHECK macro could do Python calls (similar to what led to #2225). I think the PyErr_Occurred got removed in that PR because there was missing error handling before the call to _launch, so it looked like it was just in the wrong place. It looks like there are also potentially a couple places in cuda.c that can return with error set, e.g. getDeviceProperties, memAlloc, memcpyHtoD, memFree, tensorMapEncodeTiled etc, but those are all pre-existing and not affected by recent changes.	2023-09-19 00:12:00 -07:00
Stonepia	68e1bd162c	[FRONTEND] fix xpu stages logic (#2305 )	2023-09-17 09:19:14 -07:00
Zahi Moudallal	db5c793f82	[FRONTEND] Add sass to asm dict with lazy evaluation (#2309 )	2023-09-15 15:31:43 -07:00
Keren Zhou	08c1658957	[FRONTEND] Accommodate new triton IR format (#2294 ) - Support memory space for pointers (e.g., `!tt.ptr<f32, 1>`). - Support parsing function attribute, though not used yet.	2023-09-14 09:03:23 -07:00
Zahi Moudallal	36087a108f	[FRONTEND] Added SASS to asm dict (#2280 )	2023-09-13 21:21:01 +00:00
Thomas Raoux	b63e8f87fc	[FRONTEND] Override prototype (#2214 ) Low tech but very useful way to override kernels on the fly. This can be use for debugging functionality or performance problems this lets user dump modify and feed back IR into the jit compiler.	2023-09-13 10:05:47 -07:00
Alexander Efimov	a06072f8ff	Fix dangling gpu_has_mfma use (#325 ) * Fix dangling gpu_has_mfma use This PR replaces gpu_has_mfma use with gpu_matrix_core_version * add basic test	2023-09-11 12:31:48 -05:00
Alexander Efimov	6691de65db	[MFMA] Support BFloat16 on MI100 (#295 ) * [MFMA] Support BFloat16 on MI100 This PR makes use of mfma_f32_32x32x4bf16 instruction, available on MI100. * fix tests, fix mfma encoding comment, fix switch between mfma versions. * replace kDim from mfma layout with kWidth from dotOp layout * rebase fix * fix mfma to dot op shortcut for bfloat16 * fix review comments	2023-09-08 15:08:34 -05:00
Keren Zhou	10f59d8ce0	[RUNTIME] Get the correct end idx for regular arguments of GPU kernels (#2262 ) Previously, if there were any specializations of "1" or "constexpr" mixed with unspecialized arguments in arbitrary order, we might have encountered errors due to passing incorrect arguments. This was because the length of the signature did not indicate the maximum index of regular arguments. https://github.com/openai/triton/issues/2229 @shunting314 @amjames More specifically for cases like: ``` kernel( b: tl.tensor, a: tl.constexpr, c: tl.int = 1, d, e: tl.constexpr, ... ) ```	2023-09-07 23:31:07 -07:00
SJW	491eb9ddfe	[MLIR] Added tritongpu-stream-pipeline pass (#305 ) * [MLIR] Added tritongpu-stream-pipeline pass - Prologue: Hoist the pipelinable load operations and shared memory store for the ramp up stage - Pipelined Loop: Assemble the loop body minus last iteration - Prefetch next tile from global into regs (while computing from previous) - Non-load loop body - Store next tile into shared mem - Epilogue: Peeled non-load loop body for last iteration * * updated comment	2023-09-07 15:24:59 -05:00
Shantanu	a4df60e20a	[FRONTEND] Fix GIL handling in error conditions (#2225 ) The use of the opaque GIL state APIs should mean that the PyErr_SetString is now safe, regardless of whether the caller has the GIL or not.	2023-09-01 13:30:42 -07:00
Jason Furmanek	df5c263a19	Fix merge conflicts	2023-09-01 04:01:32 +00:00
Jason Furmanek	3eaeb89d18	Merge commit '5df904233c11a65bd131ead7268f84cca7804275' into ifu230810-2 Conflicts: include/triton/Dialect/Triton/Transforms/Passes.h include/triton/Dialect/TritonGPU/IR/Dialect.h include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td lib/Analysis/Allocation.cpp lib/Analysis/Utility.cpp lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp lib/Conversion/TritonGPUToLLVM/ReduceOpToLLVM.cpp lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVM.cpp lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVMPass.cpp lib/Dialect/Triton/Transforms/RewriteTensorPointer.cpp lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp lib/Dialect/TritonGPU/Transforms/ReorderInstructions.cpp lib/Target/LLVMIR/LLVMIRTranslation.cpp python/src/triton.cc python/triton/compiler/compiler.py python/triton/ops/flash_attention.py python/triton/runtime/autotuner.py python/triton/runtime/jit.py python/triton/tools/aot.py python/tutorials/06-fused-attention.py test/Conversion/tritongpu_to_llvm.mlir test/Target/tritongpu_to_llvmir.mlir test/Target/tritongpu_to_llvmir_noinline.mlir	2023-09-01 03:25:33 +00:00
Michael Melesse	c6d33dcebf	[ROCM] Core Functionality for AMD (#1983 ) * this pr adds a third party backend for triton that works on AMD * this expose a lot of the work that has been done in our [fork](https://github.com/ROCmSoftwarePlatform/triton) * most unit tests on `test_core.py` pass * it skips some unit tests for various reasons * we plan to follow up with more prs improving Functionality and Performance in the future --------- Co-authored-by: Philippe Tillet <phil@openai.com>	2023-08-31 14:02:00 -07:00
Greg Brockman	a9b8c8c37d	[FRONTEND] drop GIL for launch, and set value=false upon pointer error (#2185 )	2023-08-26 17:07:57 +00:00
Greg Brockman	64d8df4c69	[FRONTEND] handle errors from launch_enter_hook (#2178 )	2023-08-24 20:32:01 -07:00
jayfurmanek	ff7e707f87	Enable usage of block pointer semantics for AMD gpus (#301 ) * Enable usage of block pointer semantics for AMD gpus This commit enables usage of block pointer semantics by enabling rewrite_tensor_pointer_pass that rewrites block pointer loads/stores to legacy loads/stores. * Update FA fwd in tutorial to use the block pointers * use 90 compute capability for amd gpus in python/triton/compiler/compiler.py Co-authored-by: Alexander Efimov <efimov.alexander@gmail.com> --------- Co-authored-by: Ognjen Plavsic <ognjen.plavsic@dxc.com> Co-authored-by: Lixun Zhang <lixun.zhang@amd.com> Co-authored-by: Aleksandr Efimov <130555951+alefimov-amd@users.noreply.github.com> Co-authored-by: Alexander Efimov <efimov.alexander@gmail.com>	2023-08-24 13:05:12 -05:00
chengjunlu	6cb67185f8	[FRONTEND]To use proper default num_warps and num_stages based on the device backend in JITFucntion (#2130 ) The default values used by JITFunction for num_warps and num_stages are coupled with Nvidia GPU architecture. We should use the proper default values based on the device backend for the kernel to be compiled to. 1. Add two functions to return the default num_warps and num_stages for the specific device backend. 2. JITFunction uses the proper default num_warps and num_stages based on the specific device backend. Co-authored-by: Wang Weihan <eikan.wang@intel.com>	2023-08-24 21:58:18 +08:00
Philippe Tillet	ea8416164f	[FRONTEND] name mangling fixup (#2148 )	2023-08-21 12:11:52 -07:00
Keren Zhou	584d5c263f	[FRONTEND] Disable IfExp on dynamic conditions (#2100 ) `if _unwrap_if_constexpr(cond)` then enters `node.body` is wrong when cond is a tensor since we cannot statically evaluate a dynamic tensor's value. The right way to solve the problem is probably: 1. visit the ast of IfExp (do not build IRs) 2. get the type of the last statement 3. initialize the return value and assign it to livein 4. call visit_If	2023-08-20 12:58:10 -07:00
Zahi Moudallal	23dd11d471	[BACKEND] Solidify f8e4m3 (#2105 ) Co-authored-by: Philippe Tillet <phil@openai.com>	2023-08-18 19:12:09 -07:00
Whitney Tsang	100cabd0e4	[FRONTEND] use enum instead of bool to select target (#2118 ) Before this PR, the determination of `TritonGPUToLLVMIRPass` to generate NVVM-compatible LLVM or ROCDL-compatible LLVM is controlled by a boolean `isROCM`. This method is hard to scale. This PR changes it to use an enum instead, where new target can be added easily when needed. --------- Signed-off-by: Tsang, Whitney <whitney.tsang@intel.com> Co-authored-by: Philippe Tillet <phil@openai.com>	2023-08-17 18:37:09 -07:00
Lixun Zhang	87e45cb011	Set vecSize and maxPhase more generically	2023-08-16 08:30:32 -05:00
Lixun Zhang	7156fcb0ef	Set `vecSize` = 4 and `maxPhase` = BLOCK_K/4	2023-08-16 08:30:32 -05:00
Izzy Putterman	fc667d1f8f	[FRONTEND] fix new absolute imports (#2072 ) Co-authored-by: Philippe Tillet <phil@openai.com>	2023-08-13 14:23:36 +00:00
Thomas	98372f46d3	[FRONTEND] Remove extra calls to _get_config causing runtime overhead (#2094 )	2023-08-13 06:51:26 -07:00
Zahi Moudallal	a01c116f76	[FRONTEND/BACKEND] Revived Float8E4B15x4 (#2090 )	2023-08-11 17:49:52 -07:00
Keren Zhou	382e8fb1fa	[RUNTIME] Make apis compatible with cuda 11 drivers (#2081 ) https://github.com/openai/triton/issues/2042	2023-08-11 17:46:56 -07:00
Zahi Moudallal	b62b6d6a71	[FRONTEND] Remove cache key from metadata (#2082 )	2023-08-11 08:58:00 -07:00

1 2 3

118 Commits