Shucai Xiao
79bebc4ffe
fp8 type support ( #357 )
...
* add two fp8 data types `tl.float8e4b8` and `tl.float8e5b16` to triton.
* add SW type conversion between `tl.float8e4b8/tl.float8e5b16` and `fp16`
* change flashattention to support fp8 in q/k.
2023-11-02 15:51:23 -05:00
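The names of the two new types encode their exponent width and bias: e4 with bias 8 and e5 with bias 16. A rough numeric sketch of decoding such bit patterns follows; special-value handling (NaN/inf) is simplified here and is an assumption, not taken from the commit.

```python
def decode_fp8(byte, exp_bits, bias):
    """Decode a generic sign/exponent/mantissa fp8 bit pattern to a float.
    Special values (NaN/inf) are ignored for simplicity."""
    man_bits = 7 - exp_bits
    sign = -1.0 if byte >> 7 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    if exp == 0:  # subnormal: no implicit leading 1
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

# tl.float8e4b8: 4 exponent bits, bias 8; tl.float8e5b16: 5 exponent bits, bias 16
decode_e4b8 = lambda b: decode_fp8(b, 4, 8)
decode_e5b16 = lambda b: decode_fp8(b, 5, 16)
```

In both formats the bit pattern 0x40 decodes to 1.0, since the exponent field holds exactly the bias.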
Shucai Xiao
2729ae6c6f
use different int8 mfma instructions on different GPUs. ( #368 )
...
* add support for choosing different int8 instructions
* rename an instruction
Co-authored-by: Aleksandr Efimov <efimov.alexander@gmail.com >
2023-10-25 19:12:21 -05:00
Alexander Efimov
5a86b46bb1
[MFMA] FP8 and BF8 support ( #355 )
...
* [MFMA] FP8 and BF8 support
This PR adds support for fp8 and bf8 in the AccelerateMatmul pass and
introduces generation of float8 mfma instructions in the ttg to llvm conversion.
* add tests
* fix tests
* review fix: fix variable naming and dot operand promotion.
* review comments fixes
---------
Co-authored-by: Shucai Xiao <shucai.xiao@amd.com >
2023-10-25 13:27:10 -05:00
oplavsic
715a589ce3
[FA fwd D=128] Reduce LDS usage in epilogue ( #340 )
...
* rebase onto improve_fwd_fa
* Fixed a leftover from rebase
* rebase onto improve_fa_fwd
* Reduce tuning space
* Disable bwd with D=128
* Add test for d=128
* Fix an issue with get_best_config when there is only one config
* Added better configs for d=128
* Fix typos
---------
Co-authored-by: Lixun Zhang <lixun.zhang@amd.com >
2023-10-25 12:10:34 -05:00
jayfurmanek
e74bdb1581
Always promote to int32 in commonShflSync ( #369 )
2023-10-23 12:27:11 -05:00
Alexander Efimov
20f316b19a
[MFMA] Switch between MFMA types ( #352 )
...
This PR introduces the matrix_instr_nonkdim flag to switch
between MFMA 16 and MFMA 32.
2023-10-18 16:57:34 +02:00
Alexander Efimov
4d539d7dae
Add licenses to AMD related files ( #351 )
2023-10-16 15:18:01 -05:00
Alexander Efimov
7e34c244c2
[Triton] Mfma16 support ( #251 )
...
* [MFMA] Support mfma with M/N size 16
This PR adds code emitting MFMA instructions with size 16.
* add control over mfma type with MFMA_TYPE=16 env var
2023-10-09 13:59:54 -05:00
Aleksandr Efimov
e6f75d05e3
fix sum_reduction lit test in Conversion/tritongpu_to_llvm.mlir testsuite
2023-10-03 16:13:13 +00:00
Aleksandr Efimov
90a15e449e
ROCM IFU: Fix tritongpu_to_llvm lit test
2023-10-03 04:31:03 +00:00
wenchenvincent
42a5bf9c7c
ROCM IFU: Enabled conversion between fp8e4m3b15x4 and fp16. Refactored conversion between fp8e4m3nv and fp16. ( #335 )
2023-10-03 04:30:01 +00:00
Aleksandr Efimov
634f66a090
ROCM IFU: Fix emitOffsetForMfmaLayout function
2023-10-03 04:29:54 +00:00
Aleksandr Efimov
78faa65dbd
ROCM IFU: Fix of dot operand type promotion
...
ROCM IFU: Fix formatting
2023-10-03 04:29:29 +00:00
Jason Furmanek
e5d7bb4fae
Initial commit to resolve merge conflicts
...
rename tl.float8e4 to tl.float8e4nv to align with upstream
ROCM IFU: Fix python arch issues
ROCM IFU: Fix kernel launcher
ROCM IFU: Fix merge conflicts
fix debug build
Set correct threadsPerCTA
2023-10-03 04:04:26 +00:00
Jason Furmanek
74fd8e9754
Merge commit '36fc54b6f28168d3644808bfe299f1ba06a36272' into ifu230908-2
...
Conflicts:
.gitignore
bin/triton-translate.cpp
include/triton/Conversion/TritonGPUToLLVM/TritonGPUToLLVMPass.h
include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td
include/triton/Dialect/TritonGPU/IR/TritonGPUDialect.td
lib/Analysis/Utility.cpp
lib/Conversion/TritonGPUToLLVM/ConvertLayoutOpToLLVM/SharedToDotOperandMMAv2.cpp
lib/Conversion/TritonGPUToLLVM/DotOpToLLVM.cpp
lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp
lib/Conversion/TritonGPUToLLVM/ReduceOpToLLVM.cpp
lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVM.cpp
lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVMBase.h
lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVMPass.cpp
lib/Conversion/TritonGPUToLLVM/Utility.h
lib/Dialect/Triton/Transforms/RewriteTensorPointer.cpp
lib/Dialect/TritonGPU/IR/Dialect.cpp
lib/Dialect/TritonGPU/Transforms/AccelerateMatmul.cpp
lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp
lib/Target/LLVMIR/LLVMIRTranslation.cpp
python/src/triton.cc
python/test/unit/runtime/test_subproc.py
python/triton/compiler/compiler.py
python/triton/compiler/make_launcher.py
python/triton/language/semantic.py
python/triton/runtime/jit.py
python/tutorials/06-fused-attention.py
test/Conversion/triton_to_tritongpu.mlir
test/Conversion/tritongpu_to_llvm.mlir
test/TritonGPU/coalesce.mlir
unittest/Conversion/TritonGPUToLLVM/CMakeLists.txt
2023-10-02 18:01:04 +00:00
Shucai Xiao
6e82aa8dbc
support gemm fp8/fp16 mixed input ( #333 )
...
* changes to support fp8/fp16 mixed inputs
* add unit test for fp8/fp16 mixed input for gemm
2023-09-27 08:00:31 -05:00
Aleksandr Efimov
d80cd2d374
[MFMA] Change kWidth parameter semantics
...
This PR changes kWidth semantics from "elements per instruction" to
"elements per thread per instruction" along the k axis.
2023-09-25 10:56:44 -05:00
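A hypothetical numeric illustration of the semantics change above (the instruction shape and thread layout here are made-up example values, not taken from the PR):

```python
# Illustration of the kWidth semantics change. The K extent and the
# number of threads laid out along k are assumed example values.
def kwidth_old(k_per_instr):
    # old semantics: elements per instruction along k
    return k_per_instr

def kwidth_new(k_per_instr, threads_along_k):
    # new semantics: elements per thread per instruction along k
    return k_per_instr // threads_along_k
```

For an instruction covering 8 elements along k with 2 threads along k, the reported kWidth changes from 8 to 4.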
Lixun Zhang
23465f3416
Add assert for shuffleType and default value for laneid
2023-09-12 10:16:44 -05:00
Lixun Zhang
1c653dc438
Move shfl_up impl into commonShflSync
2023-09-12 10:16:44 -05:00
Lixun Zhang
74ea0c87de
Generalize warpSize
...
- We have to change the API of shflUpSync to pass laneId to the ROCm
implementation of shfl_up
- We also distinguish laneIdAxis from laneId
2023-09-12 10:16:44 -05:00
Lixun Zhang
ea397b49aa
Fix the issue when CTA coverage is larger than the tile
2023-09-12 10:16:44 -05:00
Lixun Zhang
ed20089bc8
Add shfl_up implementation for AMD backend
...
copied from f58b93693b/include/hc.hpp (L2879-L2885)
2023-09-12 10:16:44 -05:00
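For reference, the warp shfl_up semantics that the copied implementation provides on AMD hardware can be modeled in pure Python. This is a behavioral sketch of the shuffle semantics, not the HIP code itself:

```python
def shfl_up(values, delta, width):
    """Model of warp shfl_up: lane i reads the value from lane i - delta
    within its width-sized segment; lanes whose source index would cross
    the segment boundary keep their own value."""
    out = []
    for i, v in enumerate(values):
        lane = i % width  # lane index within the segment
        out.append(values[i - delta] if lane >= delta else v)
    return out
```

For example, shifting an 8-lane segment up by 2 leaves the first two lanes untouched and moves every other value two lanes higher.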
Alexander Efimov
6691de65db
[MFMA] Support BFloat16 on MI100 ( #295 )
...
* [MFMA] Support BFloat16 on MI100
This PR makes use of the mfma_f32_32x32x4bf16 instruction, available on MI100.
* fix tests, fix mfma encoding comment, fix switch between mfma versions.
* replace kDim from mfma layout with kWidth from dotOp layout
* rebase fix
* fix mfma to dot op shortcut for bfloat16
* fix review comments
2023-09-08 15:08:34 -05:00
Wen Chen
076a04d5eb
[ROCM] Optimized int8 to bf16 conversion by not reusing FpToFpOpConversion::convertFp32ToBf16.
...
Changed the lit test rules for vectorized int8 to bf16 conversion on
ROCm as ROCm has a different implementation.
2023-09-07 17:26:43 +00:00
Wen Chen
ffc230ebfe
[ROCM] Fixed implementation of fp32 to bf16 conversion on ROCm.
2023-09-06 18:10:54 -05:00
Wen Chen
2d3e38e182
[ROCM] Added ROCm support for int8 to bfloat16 conversion.
2023-09-06 18:10:54 -05:00
Wen Chen
59a40d3f72
[ROCM] Added ROCm support for the conversions of following data types:
...
[float8e4m3, float8e4m3b15, float8e5m2] <-> [float16, bfloat16]
2023-09-06 18:10:54 -05:00
Aleksandr Efimov
751edfb3b9
[BACKEND] Fix fma mixed-precision
...
This is a partial cherry-pick of https://github.com/openai/triton/pull/2184
Dropped code unrelated to the dot fix.
2023-09-05 21:16:50 +00:00
Aleksandr Efimov
591681d36e
Revert "[Dot] Fix FMA fp16xfp16 dot ( #315 )"
...
This reverts commit 11752a6993.
2023-09-05 21:12:56 +00:00
Alexander Efimov
11752a6993
[Dot] Fix FMA fp16xfp16 dot ( #315 )
...
Disable reordering of FMA dot arguments for AMD GPUs.
2023-09-05 20:52:08 +00:00
Keren Zhou
c0f418bcdd
[BACKEND] Fix BF16 dot operand type mismatch ( #2162 )
...
https://github.com/openai/triton/issues/2156
2023-09-05 20:46:31 +00:00
Aleksandr Efimov
2f7ead6f3b
Fix subprocess tests for IFU
...
This PR changes the printf ttgir -> llvm conversion:
an unknown location is assigned to the global constant holding the format string.
This fixes a problem in the test_subprocess.py tests,
which failed while constructing a file location for format string constants.
2023-09-05 20:46:04 +00:00
Jason Furmanek
df5c263a19
Fix merge conflicts
2023-09-01 04:01:32 +00:00
Jason Furmanek
3eaeb89d18
Merge commit '5df904233c11a65bd131ead7268f84cca7804275' into ifu230810-2
...
Conflicts:
include/triton/Dialect/Triton/Transforms/Passes.h
include/triton/Dialect/TritonGPU/IR/Dialect.h
include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td
lib/Analysis/Allocation.cpp
lib/Analysis/Utility.cpp
lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp
lib/Conversion/TritonGPUToLLVM/ReduceOpToLLVM.cpp
lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVM.cpp
lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVMPass.cpp
lib/Dialect/Triton/Transforms/RewriteTensorPointer.cpp
lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp
lib/Dialect/TritonGPU/Transforms/ReorderInstructions.cpp
lib/Target/LLVMIR/LLVMIRTranslation.cpp
python/src/triton.cc
python/triton/compiler/compiler.py
python/triton/ops/flash_attention.py
python/triton/runtime/autotuner.py
python/triton/runtime/jit.py
python/triton/tools/aot.py
python/tutorials/06-fused-attention.py
test/Conversion/tritongpu_to_llvm.mlir
test/Target/tritongpu_to_llvmir.mlir
test/Target/tritongpu_to_llvmir_noinline.mlir
2023-09-01 03:25:33 +00:00
Philippe Tillet
36fc54b6f2
[BACKEND] cleaner V100 tensor core packing ( #2222 )
2023-08-31 14:00:51 -07:00
Philippe Tillet
c4b27d04e3
[BACKEND] Added float8e4b15 codegen for SM < 80 ( #2216 )
2023-08-30 21:57:49 -07:00
Philippe Tillet
ec51552fff
[BACKEND] Lift restriction for float8e4b15 to only support row-col layout ( #2212 )
2023-08-30 14:06:31 -07:00
Keren Zhou
6e4932cda8
[BACKEND] Fix fma mixed-precision ( #2184 )
...
and expose the allow_tf32 argument to the matmul op
@shunting314
2023-08-26 09:49:58 -07:00
Keren Zhou
f6cdcf1d77
[BACKEND] Fix BF16 dot operand type mismatch ( #2162 )
...
https://github.com/openai/triton/issues/2156
2023-08-24 20:32:33 -07:00
Zahi Moudallal
5282ed890d
[CI] Add back pre-commit to nvidia CI job ( #2159 )
2023-08-23 01:11:03 +00:00
ivanyinwz
ec801ce18e
[BACKEND] Optimize performance for f16 epilogue with TMA store ( #2135 )
...
1. Optimize the conversion and packing for 2xf32 -> 2xf16.
2. Split the TMA store block into multiple slices of size 64x64.
3. Distribute the TMA store to all the warps.
4. Fix some naming issues.
2023-08-21 12:44:11 -07:00
jayfurmanek
fa429316d4
Merge pull request #268 from ROCmSoftwarePlatform/improve_reduce_for_fa
...
[CHERRY-PICKED FROM UPSTREAM][BACKEND] no longer uses shared mem or barriers for single-warp reductions (openai#1915)
2023-08-21 13:29:11 -05:00
Alexander Zinoviev
d5188fa230
[BACKEND] enable transpose for float16 on sm75 ( #2139 )
...
Replace the Turing lowering for the dot operation, which followed the
Volta version, with one that follows the Ampere version.
Update the code generator to produce two m16n8k8 MMAs for Turing instead
of the one m16n8k16 MMA we have for Ampere.
2023-08-18 22:20:17 -07:00
Zahi Moudallal
23dd11d471
[BACKEND] Solidify f8e4m3 ( #2105 )
...
Co-authored-by: Philippe Tillet <phil@openai.com >
2023-08-18 19:12:09 -07:00
Thomas
23ef2615d2
[BACKEND] Merge TT_ElementwisePureExtern and TT_ElementwiseImpureExtern ( #2137 )
...
Use getEffect instead to tell passes whether the op has side effects or
not. This doesn't change functionality otherwise.
2023-08-18 20:56:10 +00:00
Thomas
bf351b9ba2
[FRONTEND][BACKEND] Add support for elementwise inline assembly ( #2136 )
...
Add a new operation to implement packed inline assembly for
elementwise operations. This way inline assembly can be used to control
elementwise operations. It also allows packing elements in order to
manually vectorize operations.
2023-08-18 12:57:52 -07:00
Alexander Efimov
01b0108c94
[MFMA] [FA] Keep bf16 results of FA dot operations in registers ( #298 )
...
This PR enables an optimization that keeps bf16 values in registers between dot operations.
2023-08-18 07:33:00 -05:00
Alexander Efimov
9ab335196f
[MFMA] More optimal offset computation ( #286 )
...
This PR replaces expensive operations with simpler ones:
mul/div are replaced with select and compare.
This is a minor change; it decreases the number of required registers
by one when dot operand loading is a bottleneck.
2023-08-18 07:32:38 -05:00
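The commit body does not show the exact rewrite, but a common instance of this kind of strengthening (an illustrative assumption, not code from the PR) replaces a modulo by a single compare-and-select when the operand is known to be bounded by twice the modulus:

```python
def wrap_offset(x, n):
    """Equivalent to x % n, but valid only for 0 <= x < 2*n:
    one compare plus one select instead of an integer division."""
    return x - n if x >= n else x
```

On GPUs an integer divide or modulo expands to many instructions, while a compare-and-select is one or two, which is why such range-restricted rewrites save registers and cycles.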
Alexander Efimov
23979098c8
[MFMA] MI200 bfloat16 support ( #294 )
...
This PR enables bfloat16 support in the MFMA dot on MI200,
using the mfma_f32_32x32x8bf16_1k instruction.
2023-08-18 07:28:18 -05:00
Whitney Tsang
100cabd0e4
[FRONTEND] use enum instead of bool to select target ( #2118 )
...
Before this PR, whether `TritonGPUToLLVMIRPass` generates
NVVM-compatible LLVM or ROCDL-compatible LLVM was controlled by a boolean
`isROCM`. This approach is hard to scale.
This PR changes it to use an enum instead, so that new targets can be added
easily when needed.
---------
Signed-off-by: Tsang, Whitney <whitney.tsang@intel.com >
Co-authored-by: Philippe Tillet <phil@openai.com >
2023-08-17 18:37:09 -07:00