github/ROCm - ROCm - AtHeartEngineering

mirror of https://github.com/ROCm/ROCm.git synced 2026-04-05 03:01:17 -04:00

Author	SHA1	Message	Date
Philippe Tillet	7c7b769e37	[SETUP] Fixed dependencies (#1389 )	2023-03-22 16:15:35 -07:00
Keren Zhou	2ba77a9212	[OPTIMIZER] Fix a typo in SimplifyReduceCvt (#1385 )	2023-03-21 22:45:58 -07:00
xndcn	65d8d802d5	[BACKEND] Fix wrong conversion from float8e4m3 <> bfloat16 (#1384 ) The exponent compensation should be 0x3c00 (120) instead of 0x3800 (112).	2023-03-21 18:58:13 -07:00
Phil Tillet	08f705d193	[DOCS] Better contributing guidelines	2023-03-21 16:21:43 -07:00
Philippe Tillet	c34ceca741	[BACKEND] Remove DotOpHelpers (i.e., decouple ConvertLayoutOpToLLVM and DotOpToLLVM) (#1383 ) One long-standing issue in the backend has been the apparent complexity of the tensor core codegen. This complexity mostly stems from the existence of the `DotOpHelpers` utilities, which have become over time a catch-all for all things related to MmaEncoding and DotOperandEncoding. The purpose of this PR is to decouple what should be decoupled, as a first step towards cleaning our tensor core codegen. Other, more local PRs will follow.	2023-03-21 15:24:28 -07:00
mcskatkat	9ae78d21f1	[FRONTEND] `CompilationError._format_message` issue + tidying (#1362 ) - fixed `CompilationError._format_message` fails when `error_message` is a `constexpr` - factored out `_is_constexpr()` checks and `_unwrap_if_constexpr()` idioms - Added `UnsupportedLanguageConstruct` exception, replaced some python builtin exceptions raised in such cases. - Some hardening in `.visit_If()` - cleaner exception handling in `build_triton_ir()`	2023-03-21 19:52:18 +00:00
Keren Zhou	c1dd6df9ce	[FRONTEND] Fix negative induction variable (#1382 )	2023-03-21 08:38:16 -07:00
xndcn	84ffefc368	[BACKEND] Fix wrong conversion from float8e4m3 <> float16 (#1375 ) After offset shifting, the exponent compensation should not be forgotten. Also adds back some comments from `legacy_backend`.	2023-03-20 21:45:25 -07:00
Keren Zhou	e281bd9fe9	[OPTIMIZER] Ensure the conversion of blockArgument is placed at the beginning of the block (#1379 ) Co-authored-by: Philippe Tillet <phil@openai.com>	2023-03-20 21:19:26 -04:00
Keren Zhou	23fc647a3e	[OPTIMIZER] Fix optimizer hanging caused by SimplifyReduceCvt (#1377 ) https://github.com/openai/triton/issues/1328 Match the convert_layout operation in SimplifyReduceCvt (convert_layout->reduce). This way we don't miss higher priority rewrite patterns like RematerializeBackward and SimplifyConversion. We also need to set SimplifyConversion's benefit = 4, RematerializeBackward's benefit = 3, and RematerializeForward's benefit = 2.	2023-03-20 16:20:19 -07:00
Philippe Tillet	29d01ba5f3	[OPTIMIZER] We shouldn't try to rematerialize view/cat forward since output layout can't be deduced automatically (#1378 )	2023-03-20 14:26:50 -07:00
Keren Zhou	78d5900467	[OPTIMIZER] Improve pipeline to handle general indirect access to matrices (#1291 ) Differentiate between immediate and non-immediate block arguments. If we have a load that immediately depends on a block argument in the current iteration, it is an immediate dependency. Otherwise, it is a non-immediate dependency, which means the load depends on a block argument in the previous iterations. For example: ``` scf.for (%arg0, %arg1, %arg2) { %0 = load %arg0 <--- immediate dep, this address is initialized at numStages-2 %1 = load %arg1 %2 = add %1, %arg2 %3 = load %2 <--- non-immediate dep, %arg1 must be an up-to-date value } ``` The above code pattern is commonly seen in cases where we have indirect memory accesses using a lookup table, such as PyTorch's `bsr_dense_bmm`. This PR improves `bsr_dense_bmm` by about 20% on the unit test cases.	2023-03-20 14:39:47 -04:00
Philippe Tillet	fe9dc4b58e	[OPTIMIZER] Restored ViewOp/CatOp passthrough in simulateBackwardRematerialization (#1376 )	2023-03-20 11:02:54 -07:00
Phil Tillet	e650d3708b	[FRONTEND] `dot` now uses `tl.float32` by default for `out_dtype`.	2023-03-19 21:58:46 -07:00
Philippe Tillet	b4decbe155	[BACKEND] Now using `call_once` to initialize LLVM target (#1373 )	2023-03-19 21:23:39 -07:00
Fei Hu	6366c5a254	[FRONTEND][BACKEND] Add support for FP16 output for tl.dot (#1258 ) --------- Co-authored-by: Fei Hu <fhu@microsoft.com>	2023-03-19 19:52:14 -07:00
Philippe Tillet	e4b2d1bc3d	[FRONTEND][BACKEND] no longer using indices for loops (#1370 )	2023-03-19 14:57:50 -07:00
Philippe Tillet	28e05c9799	[OPTIMIZER] Canonicalize `convert_layout(cat: #layout1) -> #layout2` as `cat: #layout2` (#1369 ) We can do that because `cat` reorders elements anyways	2023-03-19 14:16:55 -07:00
Philippe Tillet	39139258c8	[FRONTEND][BACKEND] tl.mathlib -> tl.math; internally reverted to mathlib -> libdevice (#1368 )	2023-03-19 02:14:57 -07:00
rsanthanam-amd	c575911a01	[FRONTEND] Change libdevice to mathlib and fix abs (#1361 ) Co-authored-by: Phil Tillet <phil@openai.com>	2023-03-19 01:34:16 -07:00
Philippe Tillet	02caa8a652	[OPTIMIZER] Better handling of control flow in Triton -> TritonGPU conversion (#1367 )	2023-03-18 23:00:19 -07:00
Philippe Tillet	2f035c0611	[FRONTEND] Fix contains_return_op when analyzing functions in another module (#1365 )	2023-03-18 15:02:45 -07:00
Edward Z. Yang	6d61a5ca23	[FRONTEND] Don't use HOME envvar to get HOME (#1364 ) Fixes https://github.com/pytorch/pytorch/issues/97076	2023-03-18 10:39:58 -07:00
peterbell10	c9740f0870	[OPTIMIZER] Add canonicalize/fold for ExpandDimsOp, ViewOp and BroadcastOp (#1354 ) These eliminate no-op reshapes, and simplify some combinations of view ops into a single view. e.g. viewing a splat becomes a single splat.	2023-03-16 21:13:58 -07:00
Horace He	1d2871d0d1	[RUNTIME] Fix memory leak in (#1358 ) Fixes a bug that causes Triton to leak 32 bytes on every kernel invocation. Also solves https://github.com/pytorch/pytorch/issues/96937	2023-03-16 17:52:06 -07:00
mcskatkat	611a2dc9bf	[FRONTEND] `CodeGenerator`: enhanced (#1355 ) Contents of this change to `CodeGenerator`: - addressed mutable default value in constructor (GitHub #1353) - structured and faster name lookup (replaces `.get_value`) - added informative error messages in some places - tidy mechanism for "static" (compile time) functions replaces inline `if ... elif ...` chain in `.visit_Call` - more robust `static_assert` and `static_print` - more informative `CompilationError` display (saves scrolling up through long tracebacks) - dedicated `CompileTimeAssertionFailure` exception for `static_assert` can be specially treated upstream by `Autotuner` to skip configurations that violate constraints (as for `OutOfResources`) --------- Co-authored-by: Philippe Tillet <phil@openai.com>	2023-03-16 17:00:43 -07:00
Berke Kocaoğlu	ba91f39dbf	[DOC] Fix syntax errors, typos, formatting; increase consistency (#1357 ) This PR; - Fixes syntax errors like `.type values: dict[str, Callable[[list[Any]], Any]]` to `:type values: dict[str, Callable[[list[Any]], Any]]`, - Fixes typos, - Fixes formatting like `k ++` to ` k++`, - Increases consistency (e.g. by transforming the minority `cd dir/` to the majority `cd dir`).	2023-03-16 15:32:02 -07:00
Phil Tillet	d00bc5af67	[README] Now saying we won't accept PRs that fix simple typos in our documentation	2023-03-16 12:43:47 -07:00
mcskatkat	53e8e04d6e	[FRONTEND] fix constexpr by annotation (#1352 ) Fixed unjustified `TypeError` raised when arg is (strangely) annotated with a non-type	2023-03-16 11:10:19 -07:00
mcskatkat	f5d22d5995	[FRONTEND] support f-strings in compiler with `constexpr` conversion (#1349 ) This addition allows explanatory messages upon assertion failures: ```python @triton.jit def my_single_block_kernel( matrix_extent: tl.constexpr, block_size: tl.constexpr, # must be >= extent (single block) matrix: Tensor, ... ): tl.static_assert(matrix_extent <= block_size, f"`matrix_extent` should not be more than the block size ({block_size}), but is {matrix_extent}") ``` Yielding, when called incorrectly: ``` AssertionError: `matrix_extent` should not be more than the block size (32), but is 57 ```	2023-03-16 08:02:10 +00:00
Da Yan	9d5505d043	[OPTIMIZER] Infer the alignment info of loops' induction variables (#1350 ) Before this PR, loops' induction variables' (IV) alignment info is lost. For example: ``` for n in range(0, K, BLOCK): x = base + n ^-- Triton doesn't know n is always a multiple of BLOCK ``` This PR fixes this. --------- Co-authored-by: Philippe Tillet <phil@openai.com>	2023-03-16 00:39:08 -07:00
Shintaro Iwasaki	4b774ee4d0	[OPS/BLOCKSPARSE] remove unnecessary mask (#1351 ) This PR applies a minor patch that removes unnecessary masks in `_dsd_kernel()`. ### Details `offs_bn` is defined as follows and not updated after that. ```py offs_bn = pid_m * TILE_N + tl.arange(0, TILE_N) offs_bn = tl.max_contiguous(tl.multiple_of(offs_bn % DS0, TILE_N), TILE_N) ``` Because `offs_bn = offs_bn % DS0`, this mask is always `True`. ```py b = tl.load(pb, mask=offs_bn[None, :] < DS0) ``` This PR removes this mask (as well as explicit `mask=True`).	2023-03-15 19:06:38 -07:00
mcskatkat	c175473bbf	[FRONTEND] In `JITFunction`: infer constexpr arg only if annotated as such (#1345 ) Fixed `JITFunction.__init__` to mark args as constexpr only when the annotation is actually `tl.constexpr`, rather than treating any annotated arg as constexpr.	2023-03-15 16:39:45 -07:00
Stonepia	109b5e2729	[BUILD] Fix the build bug when the user uses a system LLVM package by setting `LLVM_SYSPATH` (#1336 ) When the user sets `LLVM_SYSPATH` to use a custom-built LLVM, the build throws an error because there is no version.txt under the custom build. This PR skips the version check if `LLVM_SYSPATH` is set. --------- Co-authored-by: Philippe Tillet <phil@openai.com>	2023-03-15 13:28:19 -07:00
Philippe Tillet	56b23f433d	[TEST] Temporarily disable `test_dot` mode that fails because of ptxas/nvptx (#1344 )	2023-03-15 01:17:48 -07:00
peterbell10	01b177afe7	[FRONTEND] Mangle signed and unsigned integer types differently (#1340 ) This is cherry-picked from #1305 If you call a `JITFunction` twice in the same kernel, first with `int32` then with `uint32`, the second call will treat the unsigned value as signed. This passes through MLIR without error because MLIR uses the same types for both, but different operation calls will be generated so you may silently get the wrong result.	2023-03-14 22:29:18 -07:00
Philippe Tillet	ad81447ad0	[FRONTEND] Marking `int1` (`bool`) type as unsigned (#1343 )	2023-03-14 22:05:13 -07:00
Philippe Tillet	082828af47	[OPTIMIZER] Fixed up divisibility analysis in div operation (#1341 )	2023-03-14 18:17:05 -07:00
Keren Zhou	da0b0bfde6	[BACKEND] Still run llvm-opt but set optLevel to 0 to avoid the `abs(float)` bug (#1339 ) https://github.com/openai/triton/issues/1337	2023-03-14 12:38:57 -07:00
Philippe Tillet	6a8634e2a7	[BACKEND] No longer running LLVM-IR optimizations after codegen. (#1338 ) This triggered some outrageous bugs. See #1337.	2023-03-13 22:50:15 -07:00
Philippe Tillet	dde34904d0	[TESTING] triton.testing.allclose now uses torch.allclose (#1333 )	2023-03-13 17:48:32 -07:00
Philippe Tillet	6539395337	[OPTIMIZER] CatOp is now marked as not having invertible layout (#1332 )	2023-03-13 15:42:48 -07:00
Edward Z. Yang	01b8cfe9ff	[BUILD] Mash stdc++fs into more targets (#1329 ) I observed that when compiling with gcc8, stdc++fs linker flag isn't passed to enough targets. I couldn't figure out the correct target to add the linker flag to, so I'm just mashing it everywhere. Signed-off-by: Edward Z. Yang <ezyang@meta.com>	2023-03-13 15:02:53 -07:00
Nikita Shulga	663074460d	[VERSION] Update `triton/__init__.py` (#1327 ) Followup after `c7581c9a91`	2023-03-13 10:38:38 -07:00
Philippe Tillet	9b7c65a3a9	[BACKEND][OPTIMIZER] Refactor MMAv1 codegen (#1322 ) - Significant simplification of the optimizer pipeline. The right MMA version is now set directly after the coalescing pass. DotOperand layouts no longer hold state for the `isRow` argument and instead query it from their parent - Moved a bunch of things from TritonGPUToLLVM/DotOpHelpers to TritonGPUAttrDefs. All MMAv1 state is now queried from attributes. - Logic for getElemsPerThread is no longer duplicated in TypeConverter	2023-03-12 19:54:38 -07:00
Christian Sigg	64fc0e23ce	[BACKEND] Fix `triton-convert-arith-to-index`. (#1310 ) The dialect of created ops needs to be part of dependent dialects.	2023-03-12 19:43:41 -07:00
Yu Guo	ef55ccfed0	[TESTING] fix get_max_simd_tflops (#1318 ) `_triton.runtime.num_sm`, `_triton.runtime.clock_rate`, `_triton.runtime.cc` no longer seem to exist. Use the corresponding methods from `get_max_tensorcore_tflops` in the same file.	2023-03-11 10:07:25 -08:00
Philippe Tillet	5a786cf778	[FRONTEND] Fixed `contains_return_op` behavior (#1317 )	2023-03-10 23:58:28 -08:00
Philippe Tillet	3fe3adbcde	[FRONTEND][BACKEND] Add support for float8e5m2 type (#1314 )	2023-03-10 19:14:47 -08:00
Luo Yihang	9626c8e944	[DOC] Fix typos in comments (#1311 ) Fixed several typos in `python/triton/runtime/autotuner.py`	2023-03-10 09:33:24 -08:00
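
Note on 65d8d802d5 (float8e4m3 <> bfloat16): the constants in that message can be checked with plain arithmetic. The sketch below is not the Triton backend code. It assumes an exponent bias of 7 for float8e4m3 (which is what 120 = 127 - 7 implies), handles only normal inputs, and uses hypothetical helper names. The compensation is the bias difference shifted into bfloat16's exponent field, which sits above 7 mantissa bits.

```python
import struct

FP8_E4M3_BIAS = 7        # assumption: implied by the commit's 120 = 127 - 7
BF16_BIAS = 127          # bfloat16 shares float32's exponent bias
BF16_MANTISSA_BITS = 7

# Bias difference, moved into the bfloat16 exponent field.
EXP_COMPENSATE = (BF16_BIAS - FP8_E4M3_BIAS) << BF16_MANTISSA_BITS


def fp8e4m3_to_bf16_bits(x: int) -> int:
    """Widen a *normal* fp8e4m3 byte to bfloat16 bits.

    Zero, subnormal, and NaN inputs are deliberately not handled here.
    """
    sign = (x & 0x80) << 8          # sign bit moves from bit 7 to bit 15
    exp_man = (x & 0x7F) << 4       # align the 3-bit mantissa with 7 bits
    return sign | (exp_man + EXP_COMPENSATE)


def bf16_bits_to_float(bits: int) -> float:
    """Decode bfloat16 bits by placing them in the top half of a float32."""
    return struct.unpack(">f", (bits << 16).to_bytes(4, "big"))[0]


if __name__ == "__main__":
    print(hex(EXP_COMPENSATE))                              # 120 << 7
    print(bf16_bits_to_float(fp8e4m3_to_bf16_bits(0x38)))   # fp8 encoding of 1.0
```

With these definitions `EXP_COMPENSATE` evaluates to 0x3c00, while 112 << 7 gives the 0x3800 value the commit replaces.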


791 Commits
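
Note on 9d5505d043 (alignment info of loops' induction variables): the property described in that message can be illustrated in plain Python. This is only a sketch of the arithmetic, not Triton's analysis, and the function name is hypothetical. Every value of an induction variable has the form `lower + i * step`, so `gcd(lower, step)` divides all of them.

```python
import math


def induction_variable_divisibility(lower: int, step: int) -> int:
    """Return a divisor shared by every value lower + i * step (i >= 0).

    Hypothetical helper for illustration; gcd(lower, step) divides both
    terms of lower + i * step, hence it divides every loop value.
    """
    return math.gcd(lower, step)


if __name__ == "__main__":
    K, BLOCK = 256, 32
    d = induction_variable_divisibility(0, BLOCK)
    # Every n produced by the loop from the commit message is a multiple of d.
    print(d, all(n % d == 0 for n in range(0, K, BLOCK)))
```

For the loop in the commit message, `range(0, K, BLOCK)`, the lower bound is 0, so the shared divisor is `BLOCK` itself.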