Commit Graph

1258 Commits

Author SHA1 Message Date
Alexander Efimov
d830164699 [Dot] Support slicing in dot operands (#241)
* [Dot] Support slicing in dot operands

This PR enables support for sliced tensors as dot operands.
One consequence is that we can re-enable the prefetch and software pipeline optimizations.

* refactor to prepare for upcoming changes adding swizzling support

* comment fixes for code review
2023-06-26 12:55:34 -05:00
Izzy Putterman
3c400e7818 [FRONTEND] switch absolute imports to relative v2 (#1833) 2023-06-26 04:13:12 +00:00
Wang Weihan
a3c39d8fbe [TEST] Add device parameter for ut (#1817)
Triton has supported different codegen backends for different devices,
so enabling the unit test cases to support different devices also makes
sense. Otherwise, the third-party backend might have to intrusively
change the Triton test cases.
2023-06-25 15:38:59 +08:00
Thomas
3d1cd89b54 [BACKEND] Add store cache modifiers (#1826)
Plumb through store cache modifiers.
2023-06-23 09:29:10 -07:00
Zahi Moudallal
6ad8cd52e7 [CI] Added IR reference-check github workflow (#1755) 2023-06-22 18:00:40 -07:00
Wang Weihan
4d3a92f1b8 [BUILD] Make sure always build_ext first (#1819)
The third-party backend might install its python package into the
`triton/third_party` python package during the build process. But
`build_py` could be executed before `build_ext`, in which case `build_py`
would only copy the `packages` defined in `setup.py`, without the
third-party packages, since the third-party backend (built by
`build_ext`) would not have been built yet. Therefore, this PR refines
the build order slightly to ensure `build_ext` always runs before
`build_py`.
2023-06-22 13:32:03 -07:00
Zahi Moudallal
ca4f242c9b [TEST] Added matmul config for testing (#1758) 2023-06-22 13:31:37 -07:00
Goran Flegar
8d566e4196 [FRONTEND] Fix missing attribute access in DependenciesFinder (#1820)
It seems that patch #1773 introduced a bug, since the `lhs` object
doesn't necessarily have a `__name__` attribute.

I'm hitting this if I modify the matmul tutorial
(gflegar/triton@442b00f4d):

```
File "/home/gflegar/triton/python/triton/runtime/jit.py", line 74, in visit_Attribute
  if lhs is None or lhs.__name__ == "triton":
AttributeError: 'Tensor' object has no attribute '__name__'
```

I think the idea of that patch was to remove the need to import triton
by replacing `lhs is triton` with `lhs.__name__ == "triton"`. This patch
should have the same behavior as the original code, but without failing
if `lhs` doesn't have a `__name__` attribute.
2023-06-22 13:30:25 -07:00
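The guarded attribute check this fix describes can be sketched in plain Python. This is a hedged illustration, not the actual `jit.py` code: the visitor context is omitted, and `Tensor` is a minimal stand-in class.

```python
import types


def is_triton_module(lhs):
    # getattr with a default avoids AttributeError when `lhs` has no
    # __name__ attribute (e.g. a Tensor instance), unlike `lhs.__name__`.
    return lhs is not None and getattr(lhs, "__name__", None) == "triton"


class Tensor:
    pass  # instances of this stand-in class have no __name__ attribute


print(is_triton_module(Tensor()))                    # False, no AttributeError
print(is_triton_module(types.ModuleType("triton")))  # True
print(is_triton_module(None))                        # False
```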
Izzy Putterman
5686c51cdb [FRONTEND] allow pre-hook in autotuner configs to access config kwargs (#1814)
This is a very quick change that allows the configs' pre-hooks to see
the values in the config itself. This is useful if we'd like to allocate
an intermediate tensor whose shape depends on the tile size.
2023-06-22 05:40:48 -07:00
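A minimal sketch of the behavior described above: before invoking a config's pre-hook, the autotuner merges the config's own kwargs (e.g. tile sizes) into the argument dict the hook receives. All names here are illustrative, not the actual autotuner internals.

```python
def run_pre_hook(config_kwargs, kernel_args, pre_hook):
    # Merge config kwargs into the kernel args so the hook can see BLOCK_M etc.
    full = dict(kernel_args)
    full.update(config_kwargs)
    pre_hook(full)


# Example: a pre-hook that records the tile shape it would use to allocate
# an intermediate tensor.
scratch_shapes = []
run_pre_hook(
    {"BLOCK_M": 64, "BLOCK_N": 32},
    {"n_elements": 4096},
    lambda args: scratch_shapes.append((args["BLOCK_M"], args["BLOCK_N"])),
)
print(scratch_shapes)  # [(64, 32)]
```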
Philippe Tillet
0d6cd0307a [FRONTEND] add tie_break_left option to arg-reductions (#1813) 2023-06-21 19:35:52 -07:00
Philippe Tillet
4c0e3d907e [TOOLS] improved ahead-of-time compiler (#1805)
This is a revival of @gaxler initial ahead-of-time compiler proposal.
Code was simplified and some constraints were relaxed (i.e., we now
execute the entire file provided vs just the kernel AST) to promote
maintainability. A basic unit test was added, though it does not test
specialization right now.

co-authored by: Gregory Axler, thexler <g.axler@gmail.com>
2023-06-21 01:02:58 -07:00
Christopher Hesse
767dd5b5c1 [DOCS] update debug docs (#1790)
Haven't tested the docs (they don't seem to compile on my machine when I
use the makefile)
2023-06-20 20:45:17 -07:00
Thomas
4be1c94b1f Enable IR dumping for the ttir to ttgir phase. (#1804)
ttir_to_ttgir was missing enable_debug, so it could not dump IR.
2023-06-20 17:17:18 +00:00
ZhenLei Xu
326061efd4 Remove Duplicate Definition of 'asm' Variable in triton/compiler/compiler.py (#1803) 2023-06-19 22:28:28 -07:00
Xinya Zhang
a14b1e2cc4 Add Navi3x bitcode files. (#239)
Ideally Triton should be able to locate bitcode files under /opt/rocm/ (or
$ROCM_HOME).

Co-authored-by: Shucai Xiao <shucai.xiao@amd.com>
2023-06-19 10:30:47 -05:00
Keren Zhou
1851c8ca99 [FRONTEND] Fix binary compare op on constexprs (#1801)
Example:

```
if static_a == 0 and static_b == 1:
    ...
```

The return value of `static_a == 0` should be `constexpr(True)` rather
than `True`; otherwise, the plain bool object (True/False) doesn't have
the `logical_and` method.
2023-06-18 20:27:56 -07:00
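The failure mode can be sketched with a minimal stand-in class (not Triton's actual `constexpr`): chained boolean logic calls methods like `logical_and` that a bare Python bool does not have, so `__eq__` must keep wrapping its result in a `constexpr`.

```python
class constexpr:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        rhs = other.value if isinstance(other, constexpr) else other
        return constexpr(self.value == rhs)  # wrap, don't return a bare bool

    def logical_and(self, other):
        rhs = other.value if isinstance(other, constexpr) else other
        return constexpr(bool(self.value) and bool(rhs))


static_a, static_b = constexpr(0), constexpr(1)
# If `static_a == 0` returned a plain True, this chain would raise
# AttributeError: 'bool' object has no attribute 'logical_and'.
cond = (static_a == 0).logical_and(static_b == 1)
print(cond.value)  # True
```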
oplavsic
64d7b521cf [MFMA] Enabled fused attention forward pass. (#226)
* [MFMA] Activated Fused Attention Forward Pass

Patch contains following changes:
1) make_range operator now works with MFMA layout.
2) Reduce operation is forced to run in block layout:
   inputs converted to block layouts, outputs returned to MFMA layout

* Use a simple module walk instead of the pattern rewriter.

* Remove pattern rewriter header.

* Enable basic reduce algorithm for MFMA layout

* Add TODO comment for fused attention backward pass

* Fix bug in fast codegen algorithm for reduce op

* Fix input type bug

* Increase block size to 128 since out of memory issue is not seen on MI210

* Fix block_size error

* Add mfma support in DecomposeDotOperand pattern.
2023-06-16 15:39:08 -05:00
Christopher Hesse
981e98a213 [FRONTEND] update assert_helper.py (#1789) 2023-06-15 16:24:30 -07:00
Philippe Tillet
9a2580de13 [CI] Added H100 node (#1779) 2023-06-15 14:21:47 -07:00
Philippe Tillet
b24dc19741 [FRONTEND] cleaned up symbol names (#1782) 2023-06-14 18:55:32 -07:00
Izzy Putterman
71e21f5797 [FRONTEND] switch absolute imports to relative imports in Triton (#1773) 2023-06-14 23:59:24 +00:00
Philippe Tillet
754306b5d7 [FRONTEND] TRITON_PTXAS_PATH can now take in options (#1778) 2023-06-14 14:03:03 -07:00
Zahi Moudallal
ac15d00ef4 [TEST] Added f8xf16 tests (#1771) 2023-06-12 16:14:17 -07:00
Wang Weihan
b27a91a113 [FRONTEND] Enable triton to support register thirdparty backend at runtime (#1643)
This PR intends to provide a mechanism to support a third-party backend
at runtime to generate the backend-specific code.

The mechanism provided a common class to abstract the third-party
backend logic and two essential functions to register and get the
third-party backend at runtime.

- `BaseBackend`: A common class to abstract the third-party backend
logic
- `register_backend`: Register a third-party backend with a given device
type
- `get_backend`: Get the third-party backend with a given device type

Generally, a third-party backend must inherit from `BaseBackend` and
implement all the member functions according to the backend
characteristics. As long as the backend implementation is ready, the
third-party backend can invoke `register_backend` to register it under a
given device. During the kernel compilation and execution, the mechanism
will get the registered backend to generate the kernel and launcher code
for a given device.

This PR added a dummy backend to simulate a third-party backend and
demonstrate the usage.

-
[test_device_backend.py](https://github.com/openai/triton/pull/1643/files#diff-bbe4d50624f2d11bf17c878a1ed4d422918c124c182cf9357b993240c385bea1):
To define a third-party backend and register the backend
-
[ExtensionBackend](https://github.com/openai/triton/pull/1643/files#diff-bbe4d50624f2d11bf17c878a1ed4d422918c124c182cf9357b993240c385bea1R123):
Inherit from the `BaseBackend` and implement some specific logic like
[filter out some compile
stages](https://github.com/openai/triton/pull/1643/files#diff-bbe4d50624f2d11bf17c878a1ed4d422918c124c182cf9357b993240c385bea1R129-R135)
- [Register the `ExtensionBackend` for
`CPU`](https://github.com/openai/triton/pull/1643/files#diff-bbe4d50624f2d11bf17c878a1ed4d422918c124c182cf9357b993240c385bea1R279)
  
-
[extension_backend.c](https://github.com/openai/triton/pull/1643/files#diff-169c1d08b3a0a7b343cfa3258fbc32b47e0f6c46305a112652fa1bdaaec89d29):
To provide the utility function to load kernel binary and get the
backend properties.
2023-06-09 09:09:59 -07:00
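The registration flow described above can be sketched in plain Python. `BaseBackend`, `register_backend`, and `get_backend` are the names given in the PR description, but the signatures, registry internals, and the `compile` hook here are assumptions for illustration, not Triton's actual API.

```python
_backends = {}


class BaseBackend:
    """Common abstraction a third-party backend inherits from (sketch)."""

    def __init__(self, device_type):
        self.device_type = device_type

    def compile(self, src):
        # Backend-specific codegen hook; subclasses must implement it.
        raise NotImplementedError


def register_backend(device_type, backend_cls):
    # Associate a backend class with a device type at runtime.
    _backends[device_type] = backend_cls


def get_backend(device_type):
    cls = _backends.get(device_type)
    return cls(device_type) if cls is not None else None


class ExtensionBackend(BaseBackend):
    def compile(self, src):
        return f"compiled[{src}] for {self.device_type}"


register_backend("cpu", ExtensionBackend)
backend = get_backend("cpu")
print(backend.compile("kernel"))  # compiled[kernel] for cpu
```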
jayfurmanek
29f93b147b Merge pull request #229 from ROCmSoftwarePlatform/ifu230601
IFU 230601
2023-06-09 07:55:32 -05:00
jayfurmanek
49ef240047 get Arch Info using HSA (#233) 2023-06-08 14:30:46 -05:00
Sophia Wisdom
2fb56dcf52 [FRONTEND] improve tl.dot error messages (#1757)
This makes it easier to figure out what's going on when you get these
error messages.
2023-06-07 19:39:16 -07:00
Keren Zhou
4fbadf6f6f [BACKEND] Fix tl.cat when the number of threads > the size of a tensor (#1751)
`tl.cat(tensor<64>, tensor<64>) -> tensor<128>`: because `cat` concatenates
elements within a single thread, if the number of threads is 128, each
thread should own at least 2 elements.
With this PR, we also disable remat of the cat op in some cases.
2023-06-07 15:42:38 -07:00
Aleksandr Efimov
0a12031c75 [Triton] Fix MFMA dot operand loading
This PR fixes computation of indexes of MFMA dot operands and gives variables more informative names.
2023-06-07 21:30:52 +02:00
Zahi Moudallal
1db570f45c [FRONTEND] Added arch to the json (#1747) 2023-06-06 17:06:03 -07:00
Edward Z. Yang
f294a18864 [FRONTEND] force quantile tensors to be float; prevents accidents (#1741)
In particular, sometimes this was failing with:

```
RuntimeError: quantile() input tensor must be either float or double dtype
```

Fixes https://github.com/pytorch/pytorch/issues/103054

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

---------

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
2023-06-05 20:55:40 -07:00
Keren Zhou
1eedef3596 [FRONTEND] Fix next_power_of_2 for large integers (#1740)
test case: `next_power_of_2(2**33 + 1)`
2023-06-05 20:29:05 -07:00
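A plain-Python sketch of a `next_power_of_2` that stays exact for arbitrarily large integers by using `int.bit_length` instead of floating-point `log2`, which can misround above 2**53. This is an assumed implementation shown to illustrate the test case, not necessarily Triton's.

```python
def next_power_of_2(n: int) -> int:
    # bit_length is exact integer arithmetic, so no float rounding errors.
    return 1 << (n - 1).bit_length() if n > 1 else 1


print(next_power_of_2(2**33 + 1) == 2**34)  # True, the case from this fix
print(next_power_of_2(5))                   # 8
print(next_power_of_2(1))                   # 1
```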
Philippe Tillet
c52a91231a [FRONTEND][BACKEND] Add acquire/release semantics for atomics (#1739) 2023-06-05 19:09:13 -07:00
Jason Furmanek
0497f95982 [ROCM] Fix assert helper 2023-06-05 21:42:44 +00:00
Philippe Tillet
9c8d7c18b3 [FRONTEND] simpler/faster argmax code (#1736)
@peterbell10 @Jokeren I believe that the current argmax reduction code
is overly complicated. Am I missing something here?
2023-06-04 21:30:55 -07:00
Philippe Tillet
6c1992cb38 [FRONTEND] min/max now accept return_indices argument (#1731)
Not just syntactic sugar for successive max + argmax: it also avoids
computing the max twice.
2023-06-02 22:01:02 -07:00
chengjunlu
45ba9af6ed [BACKEND] Add a configurable parameter for the number of threads per warp (#1719)
Add a configurable parameter for the number of threads per warp for
other GPUs, such as Intel GPUs.

It defaults to 32, leaving the code logic on CUDA/AMD GPUs unchanged.

Note: the Intel GPU GenX ISA is explicit SIMD and can support a varying
number of thread lanes per HW execution unit.
2023-06-02 16:55:06 -07:00
Sharad Vikram
acf1ede5bf [DEBUGGER] Use from __future__ import annotations to avoid importing torch from annotations (#1722)
Co-authored-by: Philippe Tillet <phil@openai.com>
2023-06-01 14:57:17 -07:00
jayfurmanek
153ed472b8 Merge branch 'triton-mlir' into ifu230601 2023-06-01 16:18:25 -05:00
Daniil Fukalov
6be1dce41c [ROCM] Fix transposed operands processing in dot operation with MFMA. (#227)
- Applied to `loadA()` the same fix as 2c88ed6aab9ace22ccde1f0e443a1579727ee501.
- Minor cleanup of `mfmaLayout.getWarpsPerCTA()` usage.

Partially fixes ROCmSoftwarePlatform/frameworks-internal#4545
2023-06-01 23:08:58 +02:00
Jason Furmanek
56c55e7451 Initial commit to resolve merge conflicts 2023-06-01 20:58:37 +00:00
Jason Furmanek
28d9754b2a Merge remote-tracking branch 'oai/main' into ifu230601
Conflicts:
	python/test/unit/language/assert_helper.py
	test/Conversion/tritongpu_to_llvm.mlir
2023-06-01 20:53:33 +00:00
Keren Zhou
1e171bf270 [BACKEND] Pipeline pass rewrite part 1: functionality fixes (#1716)
Support the following three cases:
1. Operands of `load` depend on induction variables before `load`s.
2. Mixed use of induction variables and offset to update the `ptr`.
3. Cross iteration (>1) dependency values.
2023-06-01 12:07:43 -07:00
Mehdi Amini
440fd1bf20 [TESTS] Increase the payload of the globaltimer kernel to reduce chances of flakiness (#1726)
If the kernel is too small, on a very fast GPU we may get 0 because the
resolution of the timer seems too coarse.

Fixes #1725
2023-06-01 02:53:07 -07:00
Mehdi Amini
b0c893cdc5 [FRONTEND][BACKEND] Hardened get_program_id axis by making it an enum attribute (#1721)
Also catch out-of-bounds indices at construction and throw a proper error
in the frontend.
Finally, let's make the IR a bit prettier:

  %0 = tt.get_program_id {axis = 0 : i32} : i32

becomes:

  %0 = tt.get_program_id x : i32

Fixes #1718
2023-05-31 22:49:46 -07:00
Mehdi Amini
19c65d6007 [FRONTEND] fix checks for valid slice and avoid hitting an obscure exception. (#1720)
When comparing to the expected slice, using the `==` operator will
dispatch to the components of the slice. If the user writes `a[10:20]`,
these are `triton.constexpr` instances, whose `__eq__` operator is
implemented as `return constexpr(self.value == other.value)`. At this
point the access to `.value` on the provided `None` raises an exception
that isn't very friendly to the user.

I am not sure if the implementation of `constexpr` should be hardened
instead?

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-05-31 16:37:19 +00:00
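The failure mode and one possible guard can be sketched as follows; `constexpr` here is a minimal stand-in for `triton.constexpr`, and `is_full_slice` is a hypothetical helper, not the actual patch.

```python
class constexpr:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # Unguarded: `other.value` raises AttributeError when other is None.
        return constexpr(self.value == other.value)


def is_full_slice(sl):
    # Explicit `is None` checks never dispatch to constexpr.__eq__,
    # avoiding the obscure AttributeError.
    return sl.start is None and sl.stop is None and sl.step is None


s = slice(constexpr(10), constexpr(20))  # what `a[10:20]` produces
print(is_full_slice(s))                        # False, no exception
print(is_full_slice(slice(None, None, None)))  # True
```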
Eugene Zhulenev
327d362cca [FRONTEND] add a method to serialize Triton module to MLIR bytecode (#1711)
In addition to printing the Triton module to a string, this adds an
option to print it in MLIR bytecode format (https://mlir.llvm.org/docs/BytecodeFormat/).
2023-05-31 08:49:11 -07:00
Andrey Shukshov
fee5950893 [MFMA] Implementation of MFMA DotOp pipeline (#180)
* [MFMA] Implementation of MFMA DotOp pipeline

* Added MFMA test_dot unit tests

* Added missing ifdefs

* Update offline tests

* Removing duplicate parts

* fix build after rebase

* remove redundant stuff

* simplify MMAv3.cpp

* move reps function into operand attr description,
remove coreMatrixType type from layout conversion,
refactored type conversion

* remove duplication of mfma instruction shape computation

* move all MFMA instruction shape details into layout attribute

* fix formatting

* reenable matmul acceleration

* fix dot operator type conversion

* add offline test for dotop

* add missing ifdef wrappers

* run clang format on changes

* review and rebase fix

* add switch for MFMA instructions

* change check precision for float16 test

* disable redundant check for allowTF32

* - skip unsupported block size in matmul autotuner
- support transposed inputs of dot

* reenable matmul acceleration

* Add first part to FMA for dot operation on HW without MFMA support.

* Fix offline tests.

* Fix lit tests

* refactor mmav3 to mfma

* fix rebase issues

* fix detection of mfma support and wrong assert

* remove unnecessary macros

* Add documentation for MFMA layout.

* fix line size computation for B argument

* Fix getElemsPerThread() and getSizePerThread() functions for MFMA layout.

---------

Co-authored-by: Alexander Efimov <efimov.alexander@gmail.com>
Co-authored-by: dfukalov <1671137+dfukalov@users.noreply.github.com>
Co-authored-by: weihan13 <weihan13@amd.com>
Co-authored-by: Ognjen Plavsic <ognjen.plavsic@dxc.com>
2023-05-30 16:10:28 -05:00
Philippe Tillet
4e2f57add5 [FRONTEND] Added default axis=None for reduction, which reduces across all the axes. (#1712) 2023-05-28 16:13:21 -07:00
Philippe Tillet
420e4acecc [TEST] Added flash attention tests for D_HEAD in {16, 32, 128}. (#1709) 2023-05-27 22:48:22 -07:00