Clearly differentiate between standard fp8e4 (which we'll stop
supporting on SM <= 89, because conversions are too expensive if we want
to handle the single NaN and clipping properly) and a software-optimized
fp8e4b15 format.
Triton supports different codegen backends for different devices, so
enabling the unit test cases to run on different devices also makes
sense. Otherwise, third-party backends might have to intrusively change
the Triton test cases.
This is a revival of @gaxler's initial ahead-of-time compiler proposal.
The code was simplified and some constraints were relaxed (i.e., we now
execute the entire provided file rather than just the kernel AST) to
promote maintainability. A basic unit test was added, though it does not
test specialization right now.
Co-authored-by: Gregory Axler, thexler <g.axler@gmail.com>
Example:
```
if static_a == 0 and static_b == 1:
    ...
```
The return value of `static_a == 0` should be `constexpr(True)` rather
than a plain `True`; otherwise the bool object (`True`/`False`) does not
have the `logical_and` method.
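For illustration, here is a minimal sketch (not Triton's actual implementation) of a `constexpr` whose comparisons stay in the `constexpr` domain so that chained calls like `logical_and` keep working:
```python
class constexpr:
    """Simplified stand-in for triton.language.constexpr."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        other = other.value if isinstance(other, constexpr) else other
        # Wrap the result instead of returning a plain bool so the
        # frontend can still call .logical_and() on it.
        return constexpr(self.value == other)

    def logical_and(self, other):
        other = other.value if isinstance(other, constexpr) else other
        return constexpr(bool(self.value) and bool(other))

# (constexpr(0) == 0) yields constexpr(True) rather than True, so
# (static_a == 0).logical_and(static_b == 1) is well-defined.
```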
* [MFMA] Activated Fused Attention Forward Pass
The patch contains the following changes:
1) The make_range operator now works with the MFMA layout.
2) The reduce operation is forced to run in the blocked layout:
inputs are converted to blocked layouts, outputs are converted back to the MFMA layout
* Use simple module walk instead of pattern rewriter.
* Remove pattern rewriter header.
* Enable basic reduce algorithm for MFMA layout
* Add TODO comment for fused attention backward pass
* Fix bug in fast codegen algorithm for reduce op
* Fix input type bug
* Increase block size to 128 since the out-of-memory issue is not seen on MI210
* Fix block_size error
* Add mfma support in DecomposeDotOperand pattern.
`tl.cat(tensor<64>, tensor<64>) -> tensor<128>`: because `cat` concatenates
elements within a single thread, if the number of threads is 128, each
thread should own at least 2 elements.
With this PR, we also disable rematerialization of the `cat` op in some cases.
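As a rough illustration (a hypothetical kernel; the exact `tl.cat` signature may differ across Triton versions), the constraint looks like this in practice:
```python
import triton
import triton.language as tl

@triton.jit
def cat_kernel(out_ptr, BLOCK: tl.constexpr):
    # Two 64-element ranges concatenated into one 128-element tensor.
    # Because cat joins elements within a single thread, the result layout
    # must give every thread at least 2 elements (one from each input),
    # which constrains how 128 elements map onto 128 threads.
    a = tl.arange(0, 64)
    b = tl.arange(64, 128)
    c = tl.cat(a, b, can_reorder=True)
    # Element order may be implementation-defined when can_reorder=True.
    tl.store(out_ptr + tl.arange(0, BLOCK), c)
```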
- Applied to `loadA()` the same fix as 2c88ed6aab9ace22ccde1f0e443a1579727ee501.
- Minor cleanup of `mfmaLayout.getWarpsPerCTA()` usage.
Partially fixes ROCmSoftwarePlatform/frameworks-internal#4545
Support the following three cases:
1. Operands of `load` depend on induction variables defined before the `load`s.
2. Mixed use of induction variables and offsets to update the `ptr`.
3. Cross-iteration (>1) dependencies on values.
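A minimal sketch (hypothetical kernel, not taken from the test suite) of the kind of loop these cases cover, where the load address depends on values defined before the load and updated inside the loop:
```python
import triton
import triton.language as tl

@triton.jit
def pipelined_sum(in_ptr, out_ptr, N, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    acc = tl.zeros([BLOCK], dtype=tl.float32)
    ptr = in_ptr + offs
    for i in range(0, N):
        # Case 1: the load operand depends on `ptr`, defined before the load
        # and updated each trip; case 2: `ptr` is advanced by a plain offset
        # while the induction variable `i` remains available for addressing.
        x = tl.load(ptr)
        # `acc` is carried across iterations; case 3 extends this to
        # dependencies spanning more than one iteration.
        acc += x
        ptr += BLOCK
    tl.store(out_ptr + offs, acc)
```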
Also catch out-of-bounds indices at construction and throw a proper error
in the frontend.
Finally, let's make the IR a bit prettier:
```
%0 = tt.get_program_id {axis = 0 : i32} : i32
```
becomes:
```
%0 = tt.get_program_id x : i32
```
Fixes #1718
When comparing against the expected slices, the `==` operator dispatches
to the components of the slice. If the user writes `a[10:20]`, those
components are `triton.constexpr` instances, and their `__eq__` operator
is implemented as `return constexpr(self.value == other.value)`. At this
point, accessing `.value` on the provided `None` raises an exception that
isn't very friendly to the user.
I am not sure whether the implementation of `constexpr` should be
hardened instead?
Co-authored-by: Philippe Tillet <phil@openai.com>
* [MFMA] Implementation of MFMA DotOp pipeline
* Added MFMA test_dot unit tests
* Added missing ifdefs
* Update offline tests
* Removing duplicate parts
* fix build after rebase
* remove redundant stuff
* simplify MMAv3.cpp
* move reps function into operand attr description,
remove coreMatrixType type from layout conversion,
refactored type conversion
* remove duplication of mfma instruction shape computation
* move all MFMA instruction shape details into layout attribute
* fix formatting
* reenable matmul acceleration
* fix dot operator type conversion
* add offline test for dotop
* add missing ifdef wrappers
* run clang format on changes
* review and rebase fix
* add switch for MFMA instructions
* change check precision for float16 test
* disable redundant check for allowTF32
* - skip unsupported block size in matmul autotuner
  - support transposed inputs of dot
* reenable matmul acceleration
* Add the first part of FMA support for the dot operation on HW without MFMA support.
* Fix offline tests.
* Fix lit tests
* refactor mmav3 to mfma
* fix rebase issues
* fix detection of mfma support and wrong assert
* remove unnecessary macros
* Add documentation for MFMA layout.
* fix line size computation for B argument
* Fix getElemsPerThread() and getSizePerThread() functions for MFMA layout.
---------
Co-authored-by: Alexander Efimov <efimov.alexander@gmail.com>
Co-authored-by: dfukalov <1671137+dfukalov@users.noreply.github.com>
Co-authored-by: weihan13 <weihan13@amd.com>
Co-authored-by: Ognjen Plavsic <ognjen.plavsic@dxc.com>
- If `TRITON_DEBUG=True`, all Triton functions will be compiled in debug
mode.
- Otherwise, a Triton function `f`'s debug flag is either `True`,
`False`, or `None` (default).
- If `True`, `f` is compiled in debug mode.
- If `False`, `f` is compiled in normal mode.
- If `None`, `f` is compiled based on its caller's debug flag. The root
(kernel) function's debug flag can also be set through the `compile`
function.
cc @ngimel, @Chillee
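A minimal sketch (illustrative only; `resolve_debug` is not an actual Triton helper) of how the effective flag could be resolved under these rules:
```python
import os

def resolve_debug(func_debug, caller_debug):
    # TRITON_DEBUG=True forces debug mode for every function.
    if os.environ.get("TRITON_DEBUG") == "True":
        return True
    # An explicit True/False on the function itself wins next.
    if func_debug is not None:
        return func_debug
    # None: inherit from the caller (or from compile() for the root kernel).
    return bool(caller_debug)
```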
Conflicts:
lib/Conversion/TritonGPUToLLVM/TritonGPUToLLVMPass.cpp
lib/Target/LLVMIR/LLVMIRTranslation.cpp
python/test/unit/language/assert_helper.py
python/triton/third_party/cuda/bin/ptxas
test/Conversion/tritongpu_to_llvm.mlir
`bool` is a subclass of `int`, so `isinstance(bool_var, int) == True`,
and a `bool` constant will be converted to an `int` constant.
In Triton specifically, if a bool var is treated as an integer, it
prevents us from using the `logical_and` operator, which requires both
operands to have the same bit width.
> Cannot bitcast data-type of size 32 to data-type of size 1
Differentiating `int` and `bool` allows us to make the syntax closer to
native Python. We can now use `if bool_var and condition` to check
truthiness, and `if bool_var is True` to check identity.
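A minimal sketch (illustrative names, not Triton's actual frontend code) of why the `bool` check has to come before the `int` check:
```python
def constant_type(value):
    # bool is a subclass of int, so it must be tested first; otherwise
    # True/False would be mapped to a 32-bit integer and later operations
    # that expect an i1 (like logical_and) would require an illegal bitcast.
    if isinstance(value, bool):
        return "int1"
    if isinstance(value, int):
        return "int32"
    raise TypeError(f"unsupported constant: {value!r}")

assert constant_type(True) == "int1"
assert constant_type(3) == "int32"
```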
Triton first prints the assert message to the stderr stream with the same
(refactored) helper function as `device_print` and then ends the thread
execution.
Note: the s_endpgm instruction is used, since s_trap (generated from LLVM::Trap or LLVM::DebugTrap) has some issues on different HW.
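For reference, a typical device-side assert in a kernel looks roughly like this (hypothetical example):
```python
import triton
import triton.language as tl

@triton.jit
def checked_copy(in_ptr, out_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    # Prints the message to stderr and then ends the thread if the condition fails.
    tl.device_assert(offs < n, "index out of bounds")
    tl.store(out_ptr + offs, tl.load(in_ptr + offs))
```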
Also brought back a fix in `python/triton/compiler/compiler.py` that was
lost after one of the IFUs.
```
at 10:18:def val_multiplier_noinline(val, i):
return val * i
^
Function val_multiplier_noinline is marked noinline, but was called with non-scalar argument val:fp32[constexpr[128]]
```
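A hypothetical reproducer for the error above, assuming `noinline=True` is passed to `triton.jit`: the callee is marked noinline but receives a tensor rather than a scalar.
```python
import triton
import triton.language as tl

@triton.jit(noinline=True)
def val_multiplier_noinline(val, i):
    return val * i

@triton.jit
def kernel(out_ptr, BLOCK: tl.constexpr):
    val = tl.full([BLOCK], 1.0, tl.float32)
    # val is fp32[BLOCK], i.e. non-scalar, so the frontend reports the
    # error shown above instead of failing further down the pipeline.
    out = val_multiplier_noinline(val, 2)
    tl.store(out_ptr + tl.arange(0, BLOCK), out)
```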
Following up on #1603, I am adding to the repository a new file meant to
contain functional regression tests.
Let me know if another folder would be a more appropriate place for
these tests.
Co-authored-by: Philippe Tillet <phil@openai.com>
Re-enabled the reduce test after fixing the %cst stride in the ttgir and
modifying the sweep parameters to make sure the shape per CTA is less
than or equal to the tensor shape.
- Case 1: Return after static control flow is taken. Peel off
instructions after the first `return` for each basic block.
```python
if static_condition:
    tl.store(...)
    return
return
```
- Case 2: Return exists in both `if` and `else` branches of an inlined
`JITFunction`
```python
def foo():
    if dynamic_condition:
        return a
    else:
        return b
```
- Case 3: Return exists in a `JITFunction` from another module
```python
import module
if cond:
    a = module.func()
```
- Case 4: A chain of calls through undefined local variables
```python
import module
if cond:
    a = x
a = a.to(tl.int32).to(tl.int32)
```
- Case 5: Call a function `func` that doesn't return any variables. `func` is
first recognized as an `Expr` instead of a `Call`.
```python
if cond:
    foo()
else:
    bar()
```
- Case 6: Call a `noinline` function. We don't need to check whether the
function contains any return op.
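For completeness, a sketch of Case 6 (hypothetical functions, assuming `noinline=True` on `triton.jit`):
```python
@triton.jit(noinline=True)
def helper(ptr):
    tl.store(ptr, 1)
    return

@triton.jit
def caller(ptr):
    if tl.program_id(0) == 0:
        # helper is noinline, so its body is not scanned for return ops here.
        helper(ptr)
```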
A simple mechanism to run Triton kernels on PyTorch for debugging purposes
(upstreamed from Kernl).
Todo:
- random grid iteration
- support of atomic ops
- more unit tests
- cover new APIs?
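A hypothetical usage sketch, assuming the debugger is exposed via an `interpret=True` flag on `triton.jit` (the actual entry point in this change may differ):
```python
import torch
import triton
import triton.language as tl

@triton.jit(interpret=True)  # assumed flag: emulate the kernel with PyTorch ops
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

x = torch.randn(1024, device="cuda")
y = torch.randn(1024, device="cuda")
out = torch.empty_like(x)
add_kernel[(4,)](x, y, out, x.numel(), BLOCK=256)
```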