github/ROCm - ROCm - AtHeartEngineering

mirror of https://github.com/ROCm/ROCm.git synced 2026-02-21 03:00:39 -05:00

Author	SHA1	Message	Date
Justin Lebar	71a8544ce7	Improve docs for atomic and load/store operations. (#2437) - Move atomic_cas and atomic_xchg to "atomic ops" section of documentation. - Don't talk about the `cmp` operand for operations which don't have it. - Document the `sem` operand. - :code:`foo` and ``foo`` don't work inside a :type: annotation, apparently. (They are rendered literally, instead of being treated as a formatting command.) Get rid of them. - Format the bulleted lists in the load/store operations as intended.	2023-10-04 04:17:42 +00:00
goostavz	f1512bded1	Initial code merge of Hopper support (#2036) The initial code merge of Nvidia Hopper features support. Please be aware that the code merge is not finished yet and the trouble-shooting is still ongoing. The new hardware features (GMMA, TMA, STMATRIX etc.) and automatic warp-specialization are experimental for now and turned off by default. It is recommended for a trial when version 3.0 is released. The work is contributed by: ben-zhang-609, bealwang, donproc, qliu93, jsh20, allatit23, LyricZhao, ivanyinwz, goostavz & yangjunpro from Nvidia, in cooperation with: ptillet, Jokeren, ThomasRaoux & zahimoud from OpenAI. Co-authored-by: Goostav Zhu <gzhu@nvidia.com>	2023-08-07 09:53:04 +08:00
David Berard	7202c6cff0	[FRONTEND] expose `tl.max_constancy` hint (#1951) Similar to `tl.multiple_of` and `tl.max_contiguous`, `tl.max_constancy` will expose a compiler hint indicating that all the values are equal in a block of a certain size. --------- Co-authored-by: Philippe Tillet <phil@openai.com>	2023-07-17 18:30:25 +00:00
Thomas	ae0ee5248f	[FRONTEND] Add cumprod scan op (#1894) Add and test cumprod. This also allows testing a case of accumulation where 0 is not the identity element. Also add documentation for scan functions.	2023-07-05 10:09:06 -07:00
Keren Zhou	74dbb2fc0a	[DOCS] Add missing ops and corresponding comments (#1699)	2023-05-21 12:18:48 -07:00
peterbell10	deb2c71fb4	[FRONTEND] Add `tl.expand_dims` (#1614) This exposes `semantic.expand_dims` in the public API and builds upon it with support for expanding multiple dimensions at once. e.g. ```python tl.expand_dims(tl.arange(0, N), (0, -1)) # shape = [1, N, 1] ``` Compared to indexing with `None`, this API is useful because the dimensions can be constexpr values rather than hard-coded into the source. As a basic example ```python @triton.jit def max_keepdim(value, dim): res = tl.max(value, dim) return tl.expand_dims(res, dim) ```	2023-05-04 09:46:24 -07:00
peterbell10	0d76c4ca95	[FRONTEND] Rename `tl.reduction` -> `tl.reduce` and improve testing (#1521) `tl.reduction` is currently tested indirectly through the existing reduction operators, but it's good to have a direct test for the function itself. --------- Co-authored-by: Philippe Tillet <phil@openai.com>	2023-04-14 14:35:31 -07:00
peterbell10	6063fccd0b	[FRONTEND][BACKEND] Lower `tl.abs` to `math::Abs{I,F}Op` (#1401) This generates identical PTX for floating point, but for integer types the resulting PTX is much better. For example `tl.abs` for int16 currently generates ```mlir cvt.s32.s16 %r1, %rs2; neg.s16 %rs4, %rs2; setp.lt.s32 %p4, %r1, 0; selp.b16 %rs3, %rs4, %rs2, %p4; ``` After, it becomes a single `abs.s16` instruction. This also improves LLVM's ability to optimize floats. e.g. `abs(t) * abs(t)` is optimized to `t * t` now which didn't happen before. --------- Co-authored-by: Keren Zhou <kerenzhou@openai.com>	2023-03-24 21:58:24 -07:00
Berke Kocaoğlu	ba91f39dbf	[DOC] Fix syntax errors, typos, formatting; increase consistency (#1357) This PR; - Fixes syntax errors like `.type values: dict[str, Callable[[list[Any]], Any]]` to `:type values: dict[str, Callable[[list[Any]], Any]]`, - Fixes typos, - Fixes formatting like `k ++` to ` k++`, - Increases consistency (e.g. by transforming the minority `cd dir/` to the majority `cd dir`).	2023-03-16 15:32:02 -07:00
Philippe Tillet	20100a7254	Merge `triton-mlir` branch - Complete rewrite of the backend from scratch (#1004) This PR merges the `triton-mlir` branch, in which we have been quietly rewriting the Triton backend from scratch to increase maintainability, stability and ultimately performance. Changes to the runtime are minimal, and this new version aims to remain backward-compatible with the previous commit. The legacy backend is now officially deprecated, but can still be accessed via the `legacy-backend` tag. Co-authored-by: Keren Zhou <kerenzhou@openai.com> Co-authored-by: Yan Chunwei <yanchunwei@outlook.com> Co-authored-by: goostavz <109190422+goostavz@users.noreply.github.com> Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com> Co-authored-by: Yan Da <dyanab@connect.ust.hk> Co-authored-by: Jun Yang <yangjunpro@gmail.com> Co-authored-by: Ian Bearman <ianb@microsoft.com> Co-authored-by: Jason Ansel <jansel@jansel.net> Co-authored-by: Qingyi Liu <qingyil@nvidia.com> Co-authored-by: ben-zhang-609 <110140741+ben-zhang-609@users.noreply.github.com> Co-authored-by: Chenggang Zhao <lyricz@yeah.net> Co-authored-by: ben-zhang-609 <benzh609@gmail.com> Co-authored-by: dongdongl <dongdongl@nvidia.com>	2022-12-21 01:30:50 -08:00
Keren Zhou	d345ddf837	[DOCS] Separate atomic cas from other atomic operations since operands are very different (#559)	2022-06-22 17:51:17 -07:00
Szymon Sidor	8bedcce9be	[LANG] Added seeded random number generation - philox (#261)	2021-09-02 22:02:40 -07:00
Philippe Tillet	f26a48a3b4	[DOCS] Various improvements (#224) - Added docstr for autotune, Config, heuristics - Added docstr for atomics - Hiding internal _builder argument used for built-in language primitives - Re-factor docstr to use common templates between similar functions.	2021-08-18 11:15:53 -07:00
Philippe Tillet	c45c2e9684	[DOCS] Added docs for cos/sin/sqrt (#204)	2021-08-14 10:34:07 -07:00
Philippe Tillet	bfc0a7587d	[PYTHON] Renamed triton.core -> triton.language (#92)	2021-07-27 12:38:49 -07:00

15 Commits