github/ROCm - ROCm - AtHeartEngineering

mirror of https://github.com/ROCm/ROCm.git synced 2026-04-05 03:01:17 -04:00

Author	SHA1	Message	Date
zahimoud	fd34b20fba	[BACKEND] Fixed bug in reduce; add tests	2023-04-11 18:09:18 -07:00
Philippe Tillet	e0d6f5f4f5	[BUILD] updated LLVM binaries (#1504 ) Co-authored-by: Christian Sigg <csigg@google.com>	2023-04-11 00:14:00 -07:00
Keren Zhou	6d0ed41307	[BACKEND] Replace Func Dialect with custom triton ops (func, call, return) (#1502 ) MLIR current only supports a custom inlining interface per dialect, so we cannot change the inlining decision of `func.func`. https://discourse.llvm.org/t/avoid-inlining-some-functions-using-the-func-dialect/69830/3 Could revert it back once they've designed a better inliner interface. Inlining attributes will be implemented in the next PR since this PR is already huge.	2023-04-10 21:08:40 -07:00
Philippe Tillet	640f3c3921	[OPTIMIZER] Tweaked layout removal conversion heuristics (#1501 ) Loads are now consider cheap to rematerialize when there are more threads than elements in the tensor	2023-04-10 15:19:08 -07:00
Keren Zhou	032509384a	[ANALYSIS] Fine-tune comments for shared memory allocation (#1492 ) And add a new test to check multiple color cases which have never be tested before	2023-04-10 09:00:36 -07:00
Philippe Tillet	adc760dac1	[OPTIMIZER] enable loop pipelining using pointer increments from vector look-up tables (#1490 )	2023-04-10 08:59:42 -07:00
Philippe Tillet	b86425a28e	[TEST] made `lut_bmm` pipeline test more concise and specific (#1488 )	2023-04-08 19:17:35 -07:00
long.chen	f7ad8ae022	[Refine] remove const ref of mlir::Attribute (#1486 ) https://mlir.llvm.org/docs/DefiningDialects/AttributesAndTypes/ https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#f16-for-in-parameters-pass-cheaply-copied-types-by-value-and-others-by-reference-to-const ``` The C++ Attribute and Type classes in MLIR (like Ops, and many other things) are value-typed. This means that instances of Attribute or Type are passed around by-value, as opposed to by-pointer or by-reference. The Attribute and Type classes act as wrappers around internal storage objects that are uniqued within an instance of an MLIRContext. ```	2023-04-08 10:38:59 -07:00
Philippe Tillet	47e73aadda	[BACKEND] Revert inline PTX for conversions supported by LLVM (#1474 ) No longer needed now that we initialize all registers. Motivation for reverting this workaround now that we can is that it introduced performance regressions	2023-04-04 17:52:26 -07:00
Christian Sigg	01a93185a1	[BACKEND][OPTIMIZER] Switch from llvm::Optional to std::optional. (#1416 )	2023-04-04 09:06:28 -07:00
Philippe Tillet	053af4e9f8	[FRONTEND] Refactor file hierarchy (#1464 ) The purpose of this PR is to remove some circular dependencies and separate concerns better in the frontend. It's still not perfect -- `triton.compile` still includes a few runtime architecture-specific component, but at least much better than before. This PR still assumes that AMD only supports empty kernels right now. Other PRs will follow to make the frontend supports multiple devices in a more modular way.	2023-04-02 12:07:08 -07:00
Keren Zhou	0855cacdd8	[BACKEND] Fix small matmul dot (#1463 ) https://github.com/openai/triton/issues/1449 In theory, we might be able to support even 8x8 dot if we also wrap around `cOff`.	2023-04-02 02:05:05 +00:00
Keren Zhou	801bb9d3b5	[ANALYSIS] Fix divisibility calculation for addptr (#1453 )	2023-03-31 17:57:31 -07:00
Keren Zhou	28ea484dab	[BACKEND] Clean up type inference functions (#1451 ) And remove duplicate function definition.	2023-03-30 23:07:32 -07:00
Keren Zhou	43eed392df	[BACKEND] Fix tl.exp for fp16 (#1440 ) https://github.com/openai/triton/issues/1438 https://github.com/openai/triton/issues/1360	2023-03-29 16:34:23 -07:00
Keren Zhou	ee593fca0b	[BACKEND] Fix int8 dot (#1435 )	2023-03-28 20:18:17 -07:00
Keren Zhou	3342cc1c0c	[OPTIMIZER] Do not create yield if yieldValues is empty (#1437 ) https://github.com/openai/triton/issues/1432	2023-03-28 19:33:52 -07:00
Keren Zhou	adc4d25276	[BACKEND] A general interface for initializing destination operands in load/store operations (#1427 )	2023-03-27 22:13:01 -07:00
Chenggang Zhao	72b071253e	[FRONTEND] Support block pointer semantics (#1392 ) This PR introduces a new semantics: block pointer, which makes users easier & faster to load a block from a parent tensor. Below is a detailed API change by an example: ``` # Make a block pointer, which points to a block in the parent shape # `base`: the parent tensor # `shape`: the shape of the parent tensor # `strides`: the strides of the parent tensor # `offsets`: the offsets of the block in the parent tensor # `order`: the order of the data arrangement in memory # Below is an example loading a 2D column-major matrix block_ptr = tl.make_block_ptr(base=ptr, shape=(M, N), strides=(stride_m, stride_n), offsets=(0, 0), block_shape=(BLOCK_M, BLOCK_N), order=(1, 0)) # Advance the offsets; note that the striding information is already saved in `block_ptr` # `base`: the block pointer to be advanced # `offsets`: the offsets for each dimension block_ptr = tl.advance(base=block_ptr, offsets=(BLOCK_M, -BLOCK_N)) block_ptr = tl.advance(base=block_ptr, offsets=(-BLOCK_M, BLOCK_N)) # Load from a block pointer, the output type is the dereferenced type of `block_ptr`, e.g. ptr<tensor<32x32xf32>> -> tensor<32x32xf32> # `ptr`: the block pointer to be loaded # `boundary_check`: a tuple of dimensions to check the boundary # `padding`: padding strategy for elements out of bound val = tl.load(ptr=block_ptr, boundary_check=(0, 1), padding="zero") # Store by a block pointer, in which the pointer and the value tensor should have the same shape # `ptr`: the block pointer to be stored # `boundary_check`: a tuple of dimensions to check the boundary (no-write if out of bound) tl.store(ptr=block_ptr, value=val, boundary_check=(0, 1)) ``` --------- Co-authored-by: Philippe Tillet <phil@openai.com>	2023-03-27 16:46:49 -07:00
Xuehai Pan	5b36cb48ad	[CI][TEST] update `pre-commit` hooks and use `pre-commit` for style tests in CI (#1409 ) Ref issue: - #1408 Changes: - Add `.editorconfig` - Add `pre-commit-hooks`: ```yaml - repo: https://github.com/pre-commit/pre-commit-hooks rev: v4.4.0 hooks: - id: check-symlinks - id: destroyed-symlinks - id: trailing-whitespace - id: end-of-file-fixer - id: check-yaml - id: check-toml - id: check-ast - id: check-added-large-files - id: check-merge-conflict - id: check-executables-have-shebangs - id: check-shebang-scripts-are-executable - id: detect-private-key - id: debug-statements ``` - Add `flake8` to `pre-commit` config and add `.flake8` file - Use `pre-commit` for style tests in CI - Run `pre-commit` and fix existing violations: - fix trailing spaces - fix end-of-files - fix mod file mode with `chmod -x` - run `autopep8` on existing code - fix `flake8` violations	2023-03-25 14:52:16 -07:00
peterbell10	6063fccd0b	[FRONTEND][BACKEND] Lower `tl.abs` to `math::Abs{I,F}Op` (#1401 ) This generates identical PTX for floating point, but for integer types the resulting PTX is much better. For example `tl.abs` for int16 currently generates ```mlir cvt.s32.s16 %r1, %rs2; neg.s16 %rs4, %rs2; setp.lt.s32 %p4, %r1, 0; selp.b16 %rs3, %rs4, %rs2, %p4; ``` After, it becomes a single `abs.s16` instruction. This also improves LLVM's ability to optimize floats. e.g. `abs(t) * abs(t)` is optimized to `t * t` now which didn't happen before. --------- Co-authored-by: Keren Zhou <kerenzhou@openai.com>	2023-03-24 21:58:24 -07:00
Michael Melesse	a9c87245b4	[ROCM] Enable ROCM Backend #1 : Empty Kernel (#1312 ) This PR is a first in a series of PRs to import the changes that we have made to enable ROCM on [our fork](https://github.com/ROCmSoftwarePlatform/triton) of triton. The PR contains the major changes to the python frontend and enough changes to the c++ backend to allow compilation and running of the empty kernel. We use the ROCM ci added a few weeks ago to verify things. --------- Co-authored-by: Ronan Keryell <ronan@keryell.fr>	2023-03-24 17:18:27 -07:00
Keren Zhou	b7762bee2c	[TEST] Cleanup SCF dialect in tests (#1402 )	2023-03-24 09:21:40 -07:00
Philippe Tillet	fc7c0b0e43	[FRONTEND] Removed torch dependency and cleaned up testing (#1394 ) `assert triton.testing.allclose` -> `torch.testing.assert_allclose` `triton.testing.assert_almost_equal` -> `torch.testing.assert_allclose`	2023-03-23 22:37:21 -07:00
xndcn	ff1d0377e0	[BACKEND] Fix wrong conversion from float8e5m2 <> bfloat16 (#1391 ) exponent compensate should be 0x3800(112) instead of 0x3000(96) also add a mantissa bit for float16 conversion to round to nearest float8e5m2 Co-authored-by: Philippe Tillet <phil@openai.com>	2023-03-24 04:42:08 +00:00
Keren Zhou	c9f47d9094	[BACKEND] Init values before load to avoid ptxas issues (#1396 )	2023-03-23 17:24:03 -07:00
Keren Zhou	2ba77a9212	[OPTIMIZER] Fix a typo in SimplifyReduceCvt (#1385 )	2023-03-21 22:45:58 -07:00
xndcn	65d8d802d5	[BACKEND] Fix wrong conversion from float8e4m3 <> bfloat16 (#1384 ) exponent compensate should be 0x3c00(120) instead of 0x3800(112)	2023-03-21 18:58:13 -07:00
Philippe Tillet	c34ceca741	[BACKEND] Remove DotOpHelpers (i.e., decouple ConvertLayoutOpToLLVM and DotOpToLLVM) (#1383 ) One long-standing issue in the backend has been the apparent complexity of the tensor core codegen. This complexity mostly stems from the existence of the DotOpHelpers` utilities, which have become over time a catch-all for all things related to MmaEncoding and DotOperandEncoding. The purpose of this PR is to decouple what should be decoupled, as a first step towards cleaning our tensor core codegen. Other, more more local PRs will follow.	2023-03-21 15:24:28 -07:00
xndcn	84ffefc368	[BACKEND] Fix wrong conversion from float8e4m3 <> float16 (#1375 ) after offset shifting, exponent compensate should not be forgotten also add back some comments from `legacy_backend`	2023-03-20 21:45:25 -07:00
Keren Zhou	e281bd9fe9	[OPTIMIZER] Ensure the conversion of blockArgument is placed at the beginning of the block (#1379 ) Co-authored-by: Philippe Tillet <phil@openai.com>	2023-03-20 21:19:26 -04:00
Keren Zhou	23fc647a3e	[OPTIMIZER] Fixe optimizer hanging caused by SimplifyReduceCvt (#1377 ) https://github.com/openai/triton/issues/1328 Match the convert_layout operation in SimplifyReduceCvt (convert_layout->reduce). This way we don't miss higher priority rewrite patterns like RematerializeBackward and SimplifyConversion. We also need to set SimplifyConversion's benefit = 4, RematerializeBackward's benefit = 3, and RematerializeForward's benefit = 2.	2023-03-20 16:20:19 -07:00
Philippe Tillet	29d01ba5f3	[OPTIMIZER] We shouldn't try to rematerialize view/cat forward since output layout can't be deduced automatically (#1378 )	2023-03-20 14:26:50 -07:00
Keren Zhou	78d5900467	[OPTIMIZER] Improve pipeline to handle general indirect access to matrices (#1291 ) Differentiate between immediate and non-immediate block arguments. If we have a load that immediately depends on a block argument in the current iteration, it is an immediate dependency. Otherwise, it is a non-immediate dependency, which means the load depends on a block argument in the previous iterations. For example: ``` scf.for (%arg0, %arg1, %arg2) { %0 = load %arg0 <--- immediate dep, this address is initialized at numStages-2 %1 = load %arg1 %2 = add %1, %arg2 %3 = load %2 <--- non-immediate dep, %arg1 must be an update-to-date value } ``` The above code pattern is commonly seen in cases where we have indirect memory accesses using a lookup table, such as PyTorch's `bsr_dense_bmm`. This PR improves `bsr_dense_bmm` for about ~20% on the unit test cases.	2023-03-20 14:39:47 -04:00
Philippe Tillet	fe9dc4b58e	[OPTIMIZER] Restored ViewOp/CatOp passthrough in simulateBackwardRematerialization (#1376 )	2023-03-20 11:02:54 -07:00
Philippe Tillet	b4decbe155	[BACKEND] Now using `call_once` to initialize LLVM target (#1373 )	2023-03-19 21:23:39 -07:00
Fei Hu	6366c5a254	[FRONTEND][BACKEND] Add support for FP16 output for tl.dot (#1258 ) --------- Co-authored-by: Fei Hu <fhu@microsoft.com>	2023-03-19 19:52:14 -07:00
Philippe Tillet	e4b2d1bc3d	[FRONTEND][BACKEND] no longer using indices for loops (#1370 )	2023-03-19 14:57:50 -07:00
Philippe Tillet	28e05c9799	[OPTIMIZER] Canonicalize `convert_layout(cat: #layout1) -> #layout2` as `cat: #layout2` (#1369 ) We can do that because `cat` reorders elements anyways	2023-03-19 14:16:55 -07:00
Philippe Tillet	39139258c8	[FRONTEND][BACKEND] tl.mathlib -> tl.math; internally reverted to mathlib -> libdevice (#1368 )	2023-03-19 02:14:57 -07:00
rsanthanam-amd	c575911a01	[FRONTEND] Change libdevice to mathlib and fix abs (#1361 ) Co-authored-by: Phil Tillet <phil@openai.com>	2023-03-19 01:34:16 -07:00
Philippe Tillet	02caa8a652	[OPTIMIZER] Better handling of control flow in Triton -> TritonGPU conversion (#1367 )	2023-03-18 23:00:19 -07:00
peterbell10	c9740f0870	[OPTIMIZER] Add canonicalize/fold for ExpandDimsOp, ViewOp and BroadcastOp (#1354 ) These eliminate no-op reshapes, and simplify some combinations of view ops into a single view. e.g. viewing a splat becomes a single splat.	2023-03-16 21:13:58 -07:00
Berke Kocaoğlu	ba91f39dbf	[DOC] Fix syntax errors, typos, formatting; increase consistency (#1357 ) This PR; - Fixes syntax errors like `.type values: dict[str, Callable[[list[Any]], Any]]` to `:type values: dict[str, Callable[[list[Any]], Any]]`, - Fixes typos, - Fixes formatting like `k ++` to ` k++`, - Increases consistency (e.g. by transforming the minority `cd dir/` to the majority `cd dir`).	2023-03-16 15:32:02 -07:00
Da Yan	9d5505d043	[OPTIMIZER] Infer the alignment info of loops' induction variables (#1350 ) Before this PR, loops' induction variables' (IV) alignment info is lost. For example: ``` for n in range(0, K, BLOCK): x = base + n ^-- Triton doesn't know n is always a multiple of BLOCK ``` This PR fixes this. --------- Co-authored-by: Philippe Tillet <phil@openai.com>	2023-03-16 00:39:08 -07:00
Philippe Tillet	082828af47	[OPTIMIZER] Fixed up divisibility analysis in div operation (#1341 )	2023-03-14 18:17:05 -07:00
Keren Zhou	da0b0bfde6	[BACKEND] Still run llvm-opt but set optLevel to 0 to avoid the `abs(float)` bug (#1339 ) https://github.com/openai/triton/issues/1337	2023-03-14 12:38:57 -07:00
Philippe Tillet	6a8634e2a7	[BACKEND] No longer running LLVM-IR optimizations after codegen. (#1338 ) This triggered some outrageous bugs. See #1337.	2023-03-13 22:50:15 -07:00
Philippe Tillet	6539395337	[OPTIMIZER] CatOp is now marked as not having invertible layout (#1332 )	2023-03-13 15:42:48 -07:00
Philippe Tillet	9b7c65a3a9	[BACKEND][OPTIMIZER] Refactor MMAv1 codegen (#1322 ) - Significant simplification of the optimizer pipeline. Right mma version is now set directly after the coalescing pass. DotOperand layout no longer hold a state to `isRow` argument, and instead query it from their parent - Moved a bunch of things from TritonGPUToLLVM/DotOpHelpers to TritonGPUAttrDefs. All MMAv1 state is now queried from attributes. - logic for getELemsPerThread is no longer duplicated in TypeConverter	2023-03-12 19:54:38 -07:00

1 2 3 4 5 ...

328 Commits