Commit Graph

417 Commits

Author SHA1 Message Date
Jason Furmanek
12005a82f2 Initial commit to resolve merge conflicts 2023-06-30 19:53:53 +00:00
Thomas
2e3182bab7 [BACKEND] Support scan on dimensions other than the fastest moving one (#1863)
This relaxes the restriction in the scan lowering to support layouts where
we scan along a dimension which isn't the fastest moving one. This is
done by relaxing how we access elements during scanning, allowing
elements to be strided.
2023-06-30 12:40:48 -07:00
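The strided-access idea above can be sketched in plain Python (a hypothetical model of the indexing, not the actual LLVM lowering):

```python
def strided_cumsum(data, size, stride, base=0):
    """In-place inclusive scan over `size` elements of a flat buffer,
    starting at `base` and stepping by `stride` - models scanning a
    dimension that is not the fastest moving (contiguous) one."""
    acc = 0
    for i in range(size):
        idx = base + i * stride  # strided, not contiguous, element access
        acc += data[idx]
        data[idx] = acc
    return data

# Scan the first "column" of a flattened 2x4 row-major matrix
# (stride 4 = row length), touching elements 1 and 5.
m = [1, 2, 3, 4,
     5, 6, 7, 8]
strided_cumsum(m, size=2, stride=4, base=0)
```

With `stride=1` this degenerates to an ordinary scan over the fastest moving dimension.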
Oleg Shyshkov
66ed53d19d [FRONTEND] Support mixed-precision inputs in triton.ops.matmul. (#1754)
Support only combinations of float32 with float16 or bfloat16 for now.
Shouldn't change anything for cases when input types match.

That's a follow-up to the comment in my other PR:
https://github.com/openai/triton/pull/1746#issuecomment-1579630016.

---------

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-06-30 09:22:27 -07:00
Thomas
7a8a2da8ef [BACKEND] Enable lowering of f16 constant matmul (#1870)
Since the type expected for mma encoding is i32, when lowering an f16 splat
we need to pack f16 constants into an i32 value. This allows re-enabling
the constant matmul unit test.
2023-06-30 07:00:25 -04:00
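The packing described above can be illustrated in pure Python using the `struct` module's half-precision `'e'` format (an illustrative sketch, not the compiler's code):

```python
import struct

def pack_two_f16(lo: float, hi: float) -> int:
    """Pack two half-precision floats into one 32-bit integer:
    `lo` occupies the low 16 bits, `hi` the high 16 bits."""
    return int.from_bytes(struct.pack("<ee", lo, hi), "little")

# f16 1.0 has bit pattern 0x3C00, so a splat of 1.0 packs to 0x3C003C00.
packed = pack_two_f16(1.0, 1.0)
```

A splat constant just repeats the same 16-bit pattern in both halves of the i32.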
Philippe Tillet
f77015967d Revert "[FRONTEND][BACKEND] improved fp8 specs (#1841)" (#1865)
This reverts commit d4c941177e.
2023-06-29 21:07:01 -04:00
Jason Furmanek
2b38ab4b6c Merge remote-tracking branch 'oai/main' into ifu230620
Conflicts:
	include/triton/Conversion/TritonToTritonGPU/Passes.td
	include/triton/Dialect/TritonGPU/IR/TritonGPUDialect.td
	lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp
	python/test/unit/language/assert_helper.py
	python/triton/compiler/compiler.py
	python/triton/runtime/jit.py
	python/triton/tools/aot.py
	test/Conversion/triton_to_tritongpu.mlir
	test/Conversion/tritongpu_to_llvm.mlir
2023-06-29 21:47:27 +00:00
Thomas
3be060849a [FEATURE] Add associative_scan support (#1858)
Implement associative_scan in the frontend and implement lowering to
LLVM for the blocked layout where the scan happens on the fastest moving
dimension. This will later be generalized to support more layouts.
2023-06-29 14:37:51 -07:00
Xinya Zhang
75b86da598 Add configurable wavefront size support for Navi/MI.
[To squash] Configurable warp size in test_core_amd.py::test_convert2d

Note: test_core_amd.py::test_convert2d unit tests have been changed
because some of the old layouts exceed the shared memory limit (64KiB)
2023-06-28 22:25:14 -05:00
Thomas
e5d7411a69 [BACKEND] Add .wt store cache modifier (#1831) 2023-06-28 17:40:30 +00:00
Keren Zhou
d2de3f37f0 [BACKEND] Reduce code cleanup and bug fix for the fast path (#1816)
https://github.com/openai/triton/issues/1715
2023-06-27 17:27:24 -07:00
Zahi Moudallal
2dcbf4783e [BACKEND] Use getOrder for mma layout warps order instead of the hardcoded col-major order (#1825) 2023-06-27 10:56:09 -07:00
Philippe Tillet
d4c941177e [FRONTEND][BACKEND] improved fp8 specs (#1841)
Clearly differentiate between standard fp8e4 (which we'll stop
supporting on SM <= 89 because conversions are too expensive if we want
to handle the single NaN and clipping properly) and a software-optimized
fp8e4b15 format.
2023-06-26 16:19:03 -07:00
Wang Weihan
a3c39d8fbe [TEST] Add device parameter for ut (#1817)
Triton has supported different codegen backends for different devices,
so enabling the unit test cases to support different devices also makes
sense. Otherwise, the third-party backend might have to intrusively
change the Triton test cases.
2023-06-25 15:38:59 +08:00
Thomas
3d1cd89b54 [BACKEND] Add store cache modifiers (#1826)
Plumb through store cache modifiers.
2023-06-23 09:29:10 -07:00
Zahi Moudallal
6ad8cd52e7 [CI] Added IR reference-check github workflow (#1755) 2023-06-22 18:00:40 -07:00
Zahi Moudallal
ca4f242c9b [TEST] Added matmul config for testing (#1758) 2023-06-22 13:31:37 -07:00
Philippe Tillet
0d6cd0307a [FRONTEND] add tie_break_left option to arg-reductions (#1813) 2023-06-21 19:35:52 -07:00
Philippe Tillet
4c0e3d907e [TOOLS] improved ahead-of-time compiler (#1805)
This is a revival of @gaxler's initial ahead-of-time compiler proposal.
Code was simplified and some constraints were relaxed (i.e., we now
execute the entire file provided vs just the kernel AST) to promote
maintainability. A basic unit test was added, though it does not test
specialization right now.

co-authored by: Gregory Axler, thexler <g.axler@gmail.com>
2023-06-21 01:02:58 -07:00
Keren Zhou
1851c8ca99 [FRONTEND] Fix binary compare op on constexprs (#1801)
Example:

```
if static_a == 0 and static_b == 1:
    ...
```

The return value of `static_a == 0` should be `constexpr(True)` rather than
`True`; otherwise the bool object (True/False) doesn't have the
`logical_and` method.
2023-06-18 20:27:56 -07:00
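The fix above can be sketched with a minimal stand-in for triton's `constexpr` wrapper (method bodies here are illustrative, simplified from the real frontend class):

```python
class constexpr:
    """Minimal sketch of a compile-time constant wrapper."""
    def __init__(self, value):
        self.value = value.value if isinstance(value, constexpr) else value

    def __eq__(self, other):
        # Key point of the fix: comparison returns a constexpr,
        # not a plain Python bool.
        other = other.value if isinstance(other, constexpr) else other
        return constexpr(self.value == other)

    def logical_and(self, other):
        return constexpr(bool(self.value) and bool(other.value))

static_a, static_b = constexpr(0), constexpr(1)
# `static_a == 0` stays a constexpr, so chaining logical_and works.
cond = (static_a == 0).logical_and(static_b == 1)
```

Had `__eq__` returned a bare `True`, the subsequent `.logical_and(...)` call would raise `AttributeError`.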
oplavsic
64d7b521cf [MFMA] Enabled fused attention forward pass. (#226)
* [MFMA] Activated Fused Attention Forward Pass

Patch contains following changes:
1) make_range operator now works with MFMA layout.
2) Reduce operation is forced to run in block layout:
   inputs converted to block layouts, outputs returned to MFMA layout

* Use a simple module walk instead of the pattern rewriter.

* Remove the pattern rewriter header.

* Enable basic reduce algorithm for MFMA layout

* Add TODO comment for fused attention backward pass

* Fix bug in fast codegen algorithm for reduce op

* Fix input type bug

* Increase block size to 128 since out of memory issue is not seen on MI210

* Fix block_size error

* Add mfma support in DecomposeDotOperand pattern.
2023-06-16 15:39:08 -05:00
Christopher Hesse
981e98a213 [FRONTEND] update assert_helper.py (#1789) 2023-06-15 16:24:30 -07:00
Philippe Tillet
9a2580de13 [CI] Added H100 node (#1779) 2023-06-15 14:21:47 -07:00
Philippe Tillet
b24dc19741 [FRONTEND] cleaned up symbol names (#1782) 2023-06-14 18:55:32 -07:00
Zahi Moudallal
ac15d00ef4 [TEST] Added f8xf16 tests (#1771) 2023-06-12 16:14:17 -07:00
Wang Weihan
b27a91a113 [FRONTEND] Enable triton to support register thirdparty backend at runtime (#1643)
This PR intends to provide a mechanism to support a third-party backend
at runtime to generate the backend-specific code.

The mechanism provides a common class to abstract the third-party
backend logic and two essential functions to register and get the
third-party backend at runtime.

- `BaseBackend`: A common class to abstract the third-party backend
logic
- `register_backend`: Register a third-party backend with a given device
type
- `get_backend`: Get the third-party backend with a given device type

Generally, a third-party backend must inherit from `BaseBackend` and
implement all the member functions according to the backend
characteristics. As long as the backend implementation is ready, the
third-party backend can invoke `register_backend` to register it under a
given device. During the kernel compilation and execution, the mechanism
will get the registered backend to generate the kernel and launcher code
for a given device.

This PR added a dummy backend to simulate a third-party backend and
demonstrate the usage.

-
[test_device_backend.py](https://github.com/openai/triton/pull/1643/files#diff-bbe4d50624f2d11bf17c878a1ed4d422918c124c182cf9357b993240c385bea1):
To define a third-party backend and register the backend
-
[ExtensionBackend](https://github.com/openai/triton/pull/1643/files#diff-bbe4d50624f2d11bf17c878a1ed4d422918c124c182cf9357b993240c385bea1R123):
Inherit from the `BaseBackend` and implement some specific logic like
[filter out some compile
stages](https://github.com/openai/triton/pull/1643/files#diff-bbe4d50624f2d11bf17c878a1ed4d422918c124c182cf9357b993240c385bea1R129-R135)
- [Register the `ExtensionBackend` for
`CPU`](https://github.com/openai/triton/pull/1643/files#diff-bbe4d50624f2d11bf17c878a1ed4d422918c124c182cf9357b993240c385bea1R279)
  
-
[extension_backend.c](https://github.com/openai/triton/pull/1643/files#diff-169c1d08b3a0a7b343cfa3258fbc32b47e0f6c46305a112652fa1bdaaec89d29):
To provide the utility function to load kernel binary and get the
backend properties.
2023-06-09 09:09:59 -07:00
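The registration mechanism described above can be sketched as a small registry; the names `BaseBackend`, `register_backend`, `get_backend`, and `ExtensionBackend` come from the PR, but the bodies below are illustrative only:

```python
class BaseBackend:
    """Abstract interface a third-party backend implements (sketch)."""
    def __init__(self, device_type):
        self.device_type = device_type

    def compile(self, src):
        raise NotImplementedError

# Registry mapping a device type to its backend class.
_backends = {}

def register_backend(device_type, backend_cls):
    _backends[device_type] = backend_cls

def get_backend(device_type):
    cls = _backends.get(device_type)
    return cls(device_type) if cls is not None else None

class ExtensionBackend(BaseBackend):
    """Dummy third-party backend, like the one the PR adds for tests."""
    def compile(self, src):
        return f"compiled[{src}]"

register_backend("cpu", ExtensionBackend)
backend = get_backend("cpu")
```

During compilation, the runtime would look up the registered backend for the kernel's device type and delegate code generation to it.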
jayfurmanek
29f93b147b Merge pull request #229 from ROCmSoftwarePlatform/ifu230601
IFU 230601
2023-06-09 07:55:32 -05:00
Keren Zhou
4fbadf6f6f [BACKEND] Fix tl.cat when the number of threads > the size of a tensor (#1751)
`tl.cat(tensor<64>, tensor<64>) -> tensor<128>`: because it concatenates
elements into a single thread, if the number of threads is 128, each thread
should own at least 2 elements.
With this PR, we also disable remat of the cat op in some cases.
2023-06-07 15:42:38 -07:00
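The counting argument above can be checked with a short sketch (my own reading of the constraint, not code from the PR):

```python
def min_elems_per_thread(in_size_a, in_size_b, num_threads):
    """Lower bound on elements per thread for a cat result.
    The output (size a+b) needs at least ceil((a+b)/num_threads)
    elements per thread, and concatenation itself places elements
    from both inputs into one thread, so the bound is at least 2."""
    out_size = in_size_a + in_size_b
    per_thread = -(-out_size // num_threads)  # ceiling division
    return max(per_thread, 2)

# 64 + 64 elements over 128 threads: naive layout gives 1 per thread,
# but cat requires 2.
bound = min_elems_per_thread(64, 64, 128)
```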
Aleksandr Efimov
0a12031c75 [Triton] Fix MFMA dot operand loading
This PR fixes computation of indexes of MFMA dot operands and gives variables more informative names.
2023-06-07 21:30:52 +02:00
Philippe Tillet
c52a91231a [FRONTEND][BACKEND] Add acquire/release semantics for atomics (#1739) 2023-06-05 19:09:13 -07:00
Jason Furmanek
0497f95982 [ROCM] Fix assert helper 2023-06-05 21:42:44 +00:00
Philippe Tillet
6c1992cb38 [FRONTEND] min/max now accept return_indices argument (#1731)
Not just syntactic sugar for a successive max + argmax; it also avoids
computing the max twice.
2023-06-02 22:01:02 -07:00
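The "avoids computing the max twice" point amounts to fusing max and argmax into one pass, which can be sketched in plain Python (an illustrative model, not triton's reduction codegen):

```python
def max_with_index(values):
    """Return (max, argmax) in a single pass over the data, rather
    than one pass for the max and a second scan for its index."""
    best_val, best_idx = values[0], 0
    for i, v in enumerate(values[1:], start=1):
        if v > best_val:  # keeps the first occurrence on ties
            best_val, best_idx = v, i
    return best_val, best_idx
```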
jayfurmanek
153ed472b8 Merge branch 'triton-mlir' into ifu230601 2023-06-01 16:18:25 -05:00
Daniil Fukalov
6be1dce41c [ROCM] Fix transposed operands processing in dot operation with MFMA. (#227)
- Applied to `loadA()` the same fix as 2c88ed6aab9ace22ccde1f0e443a1579727ee501.
- Minor cleanup of `mfmaLayout.getWarpsPerCTA()` usage.

Partially fixes ROCmSoftwarePlatform/frameworks-internal#4545
2023-06-01 23:08:58 +02:00
Jason Furmanek
56c55e7451 Initial commit to resolve merge conflicts 2023-06-01 20:58:37 +00:00
Jason Furmanek
28d9754b2a Merge remote-tracking branch 'oai/main' into ifu230601
Conflicts:
	python/test/unit/language/assert_helper.py
	test/Conversion/tritongpu_to_llvm.mlir
2023-06-01 20:53:33 +00:00
Keren Zhou
1e171bf270 [BACKEND] Pipeline pass rewrite part 1: functionality fixes (#1716)
Support the following three cases:
1. Operands of `load` depend on induction variables before `load`s.
2. Mixed use of induction variables and offset to update the `ptr`.
3. Cross iteration (>1) dependency values.
2023-06-01 12:07:43 -07:00
Mehdi Amini
440fd1bf20 [TESTS] Increase the payload of the globaltimer kernel to reduce chances of flakiness (#1726)
If the kernel is too small, on a very fast GPU we may get 0 because the
resolution of the timer is too coarse.

Fixes #1725
2023-06-01 02:53:07 -07:00
Mehdi Amini
b0c893cdc5 [FRONTEND][BACKEND] Hardened get_program_id axis by making it an enum attribute (#1721)
Also catch out-of-bounds indices at construction and throw a proper error
in the frontend.
Finally, let's make the IR a bit prettier:

  %0 = tt.get_program_id {axis = 0 : i32} : i32

becomes:

  %0 = tt.get_program_id x : i32

Fixes #1718
2023-05-31 22:49:46 -07:00
Mehdi Amini
19c65d6007 [FRONTEND] fix checks for valid slice and avoid hitting an obscure exception. (#1720)
When comparing to the expected slices, using the `==` operator
dispatches to the components of the slice. If the user writes `a[10:20]`,
these are `triton.constexpr` instances, whose `__eq__` operator
is implemented as `return constexpr(self.value == other.value)`. At
this point the access to `.value` on the provided `None` yields an
exception that isn't very friendly to the user.

I am not sure if the implementation of `constexpr` should be hardened
instead?

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-05-31 16:37:19 +00:00
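The failure mode and the safer check can be sketched as follows (the `constexpr` stand-in and `is_full_slice` helper are hypothetical illustrations, not the PR's actual code):

```python
class constexpr:
    """Minimal stand-in for triton.language.constexpr."""
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # Crashes when `other` is a plain None: None has no .value.
        return constexpr(self.value == other.value)

def is_full_slice(sl):
    """Validate slice bounds with `is None` instead of `==`, so a
    plain None never reaches constexpr.__eq__."""
    return sl.start is None and sl.stop is None and sl.step is None

full = is_full_slice(slice(None, None, None))      # a[:] style slice
bounded = is_full_slice(slice(constexpr(10), constexpr(20)))  # a[10:20]
```

Identity checks against `None` short-circuit before any `__eq__` dispatch, which is what makes the error message friendly again.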
Andrey Shukshov
fee5950893 [MFMA] Implementation of MFMA DotOp pipeline (#180)
* [MFMA] Implementation of MFMA DotOp pipeline

* Added MFMA test_dot unit tests

* Added missing ifdefs

* Update offline tests

* Removing duplicate parts

* fix build after rebase

* remove redundant stuff

* simplify MMAv3.cpp

* move reps function into operand attr description,
remove coreMatrixType type from layout conversion,
refactored type conversion

* remove duplication of MFMA instruction shape computation

* move all MFMA instruction shape details into layout attribute

* fix formatting

* reenable matmul acceleration

* fix dot operator type conversion

* add offline test for dotop

* add missing ifdef wrappers

* run clang format on changes

* review and rebase fix

* add switch for MFMA instructions

* change check precision for float16 test

* disable redundant check for allowTF32

* - skip unsupported block size in matmul autotuner
- support transposed inputs of dot

* reenable matmul acceleration

* Add first part to FMA for dot operation on HW without MFMA support.

* Fix offline tests.

* Fix lit tests

* refactor mmav3 to mfma

* fix rebase issues

* fix detection of mfma support and wrong assert

* remove unnecessary macros

* Add documentation for MFMA layout.

* fix line size computation for B argument

* Fix getElemsPerThread() and getSizePerThread() functions for MFMA layout.

---------

Co-authored-by: Alexander Efimov <efimov.alexander@gmail.com>
Co-authored-by: dfukalov <1671137+dfukalov@users.noreply.github.com>
Co-authored-by: weihan13 <weihan13@amd.com>
Co-authored-by: Ognjen Plavsic <ognjen.plavsic@dxc.com>
2023-05-30 16:10:28 -05:00
Philippe Tillet
4e2f57add5 [FRONTEND] Added default axis=None for reduction, which reduces across all the axes. (#1712) 2023-05-28 16:13:21 -07:00
Philippe Tillet
420e4acecc [TEST] Added flash attention tests for D_HEAD in {16, 32, 128}. (#1709) 2023-05-27 22:48:22 -07:00
Keren Zhou
0341953466 [FRONTEND] Correct the debug syntax (#1705)
- If `TRITON_DEBUG=True`, all triton functions will be compiled in the
debug mode.
- Otherwise, a triton function `f`'s debug flag is either `True`,
`False` or `None` (default).
    - If `True`, `f` is compiled in the debug mode.
    - If `False`, `f` is compiled in the normal mode.
- If `None`, `f` is compiled based on its caller's debug flag. The root
(kernel) function's debug flag can also be set through the `compile`
function.

cc @ngimel , @Chillee
2023-05-24 23:24:29 -07:00
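The resolution rules listed above can be condensed into one function (a sketch of my reading of the commit message; `effective_debug` is a hypothetical helper, not triton's API):

```python
import os

def effective_debug(fn_debug, caller_debug, env=os.environ):
    """Resolve a function's debug mode:
    - TRITON_DEBUG set to a truthy value forces debug for everything;
    - otherwise the function's own True/False flag wins;
    - None inherits the caller's flag (False at the root by default)."""
    if env.get("TRITON_DEBUG", "").lower() in ("1", "true"):
        return True
    if fn_debug is not None:
        return fn_debug
    return bool(caller_debug)
```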
Philippe Tillet
b5ba639bae [FRONTEND] fixed issue for fp64 literals and added tests (#1698)
fixes #1686
2023-05-20 18:36:28 -07:00
Keren Zhou
fb30d84069 [FRONTEND] Refactor contains_return_op into an independent AST (#1694)
https://github.com/openai/triton/issues/1690
2023-05-20 11:18:40 -07:00
Zahi Moudallal
34817ecc95 [BACKEND] Added support to convert shared to distributed layouts (#1682) 2023-05-17 17:20:29 -07:00
Jason Furmanek
78c60742fc IFU 230517 Resolve merge conflicts 2023-05-17 17:36:44 +00:00
Jason Furmanek
4c4e42e524 Merge remote-tracking branch 'openai/main' into IFU-230517
Conflicts:
	lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVMPass.cpp
	lib/Target/LLVMIR/LLVMIRTranslation.cpp
	python/test/unit/language/assert_helper.py
	python/triton/third_party/cuda/bin/ptxas
	test/Conversion/tritongpu_to_llvm.mlir

2023-05-17 15:03:42 +00:00
Keren Zhou
3baab48eaf [FRONTEND] Differentiate between bool and int in the frontend (#1678)
`bool` is a subclass of `int`, so `isinstance(bool_var, int) == True`,
and a `bool` constant will be converted to an `int` constant.

In triton specifically, if a bool var is treated as an integer, it
prevents us from using the `logical_and` operator, which requires both
operands to have the same bit length.

> Cannot bitcast data-type of size 32 to data-type of size 1

By differentiating int and bool, the syntax becomes closer to native
Python. We can now use `if bool_var and condition` to
check truthiness, and `if bool_var is True` to check identity.
2023-05-16 18:24:16 +00:00
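The subclass pitfall above is easy to demonstrate: since `bool` is a subclass of `int`, any type dispatch must test `bool` first (the `type_name` helper and type strings are illustrative, not triton's actual mapping):

```python
def type_name(value):
    """Classify a Python constant; bool must be checked before int,
    because isinstance(True, int) is True."""
    if isinstance(value, bool):
        return "int1"   # a 1-bit boolean type
    if isinstance(value, int):
        return "int32"
    return "other"
```

With the checks in the opposite order, `True` would be classified as a 32-bit integer, triggering the bitcast error quoted above.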
Daniil Fukalov
7acc1cb707 [ROCM] Implement device_assert functionality. (#207)
Triton first prints the assert message to the stderr stream with the same
(refactored) helper function as `device_print` and then ends the thread
execution.

Note: the s_endpgm instruction is used, since s_trap (generated from LLVM::Trap or LLVM::DebugTrap) has some issues on different HW.

Also restored a fix in `python/triton/compiler/compiler.py` lost after one
of the IFUs.
2023-05-15 16:16:14 +02:00