Commit Graph

167 Commits

Author SHA1 Message Date
Rohit Santhanam
cd9ae1cd36 Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-02232023 2023-02-23 21:41:54 +00:00
rsanthanam-amd
e7f84448bf Merge pull request #127 from dfukalov/dfukalov/work-3
[ROCM] Enable float16 and int8 types for FMA based `dot` implementation.
2023-02-22 16:39:04 -06:00
Daniil Fukalov
2d678efb89 [ROCM] Enable float16 and int8 types for FMA based dot implementation.
By default, Triton generates MLIR with an f32 result for the tt.dot operation
on f16-typed operands, so we have "tt.dot(f16,f16,f32)->f32" types in the
.ttgir. But the LLVM FMA instruction requires the same type for all three
operands, so the first two operands are implicitly cast f16->f32 as
"unrealized_conversion_cast struct{f16,f16,...}->struct{f32,f32}".
This change fixes incorrect generation of that implicit cast.
For int8-typed operands, the result operand is also cast after the dot is performed.

As a next step to improve the FMA-based dot operation, target-specific FMA
intrinsics for f16 and int8 (e.g. fma(f16,f16,f16)->f16) could be used,
perhaps as an option.
2023-02-22 22:36:20 +01:00
Eric Wang
320ae18093 [FRONTEND] Add error messages for arange (#1218)
Fix issue https://github.com/openai/triton/issues/244

Check that `end` is greater than `start`.
Check that the range fits in `int32`.
Check that the number of elements is less than or equal to
`TRITON_MAX_TENSOR_NUMEL = 131072`.
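A minimal sketch of the three checks described above (the function name `validate_arange` is illustrative, not Triton's actual frontend code):

```python
TRITON_MAX_TENSOR_NUMEL = 131072

def validate_arange(start: int, end: int) -> None:
    # end must be strictly greater than start
    if end <= start:
        raise ValueError("arange: end must be greater than start")
    # both bounds must fit in int32
    if not (-2**31 <= start < 2**31 and -2**31 <= end < 2**31):
        raise ValueError("arange: start and end must fit in int32")
    # the element count may not exceed the maximum tensor size
    if end - start > TRITON_MAX_TENSOR_NUMEL:
        raise ValueError("arange: too many elements")
```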

---------

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-02-22 00:37:28 +00:00
Yu Guo
19228d88bc [FRONTEND][BACKEND] add env variable TRITON_LIBDEVICE_PATH (#1166)
We may compile kernels on remote machines that do not have a local
libdevice.10.bc.
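A sketch of the lookup this variable enables; the fallback path here is illustrative, not Triton's actual default:

```python
import os

def libdevice_path(default="/usr/local/cuda/nvvm/libdevice/libdevice.10.bc"):
    # prefer an explicit override so remote build machines without a
    # local CUDA install can point at a copied libdevice.10.bc
    return os.environ.get("TRITON_LIBDEVICE_PATH", default)
```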

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-02-21 20:15:12 +00:00
Michaël Benesty
940f394a35 [Frontend] fix crash on cast when dest is constexpr (#1222)
This pull request addresses a crash that occurs when casting to a
tl.constexpr type in the frontend.

More info and repro code available in:
https://github.com/openai/triton/issues/1221
2023-02-20 10:50:33 -08:00
Rohit Santhanam
841784d1e3 Merge remote-tracking branch 'upstream/main' into upgrade_triton_mlir_rocm_to_llvm_head 2023-02-18 09:25:20 +00:00
Philippe Tillet
4d067f5120 [FRONTEND] Now emit an error for tl.reshape, instead of silently calling tl.view (#1212) 2023-02-17 20:21:20 -08:00
Eric Wang
30db959dae [FRONTEND] Add error messages for load/store (#1179)
Fix issue https://github.com/openai/triton/issues/633
2023-02-13 10:52:50 -05:00
Rohit Santhanam
a2416e0901 Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-02112023 2023-02-11 14:48:19 +00:00
Philippe Tillet
2aba985daa [OPTIMIZER] Improved layout simplifications heuristics (#1168) 2023-02-09 20:17:25 -08:00
fdrocha
972b761390 [FRONTEND] For __rshift__ operator, use arithmetic right shift if dtype is a signed int. (#1153) 2023-02-06 10:26:17 +00:00
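The distinction this commit draws, sketched on 32-bit patterns (helper names are illustrative):

```python
def arithmetic_shr32(x: int, n: int) -> int:
    # sign-extending shift, as for signed ints; Python's >> on
    # ints is already arithmetic
    return x >> n

def logical_shr32(x: int, n: int) -> int:
    # zero-filling shift on the 32-bit two's-complement pattern,
    # as for unsigned ints
    return (x & 0xFFFFFFFF) >> n
```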
Keren Zhou
ce47f94e59 [FRONTEND] Check if the data types of *A* and *B* in the dot op have the same data type (#1155) 2023-02-06 01:58:07 -08:00
Philippe Tillet
43798ab27e [BUILD] Restored wheels workflow (#1146)
- Dependent CUDA files (ptxas, cuda.h, libdevice.bc.10) are now packaged in
`triton/third_party/cuda`. `ptxas` is downloaded from conda repo at
install time.
- Can now be built with an old glibc (such as the one used by manylinux2014)
2023-02-03 16:22:10 -08:00
Rohit Santhanam
8cb6ab5b1a Merge remote-tracking branch 'upstream/main' into triton_mlir_IFU_02022023 2023-02-02 22:54:53 +00:00
George Karpenkov
a9d1935e79 [FRONTEND] Fix error message when atomics are not supported for a given dtype (#1134)
Otherwise, the construction of the exception crashes during string
concatenation.
2023-02-02 02:49:34 +00:00
Philippe Tillet
8fea1fb478 [FRONTEND] Adding static range (#1130)
Included: Revert "[BACKEND] Replace `mlir::topologicalSort` with a
custom implementation (#1113)"
2023-01-31 18:04:19 -08:00
Philippe Tillet
c4b9d699d2 [FRONTEND][BACKEND] Fixed many bugs (#1122)
- **Temporarily commented out an assertion in `MemBar.cpp`. We need to fix
this! But for now the following patches will unblock a number of
users.**
- Fixed frontend codegen issue for If / For / While. Emit an error when
replaced values' types mismatch.
- Added "top level" codepath for if statements, which allows users to
write patterns to exit early from kernels (e.g., `if cond1: if cond2:
return else: ...`). Added associated codegen in TritonToTritonGPUPass
- Added basic control flow tests
- Pipeline pass is no longer activated when memory accesses can't be
vectorized
- Added missing magic methods to `constexpr`
- Fixed issue in random.py: bitcast some values to uint when they need
to be.
- Added support for `Not`
- Fixed nondeterministic compilation issue
2023-01-30 23:22:36 -08:00
Yan Chunwei
94b419c327 [FRONTEND] some tiny fix (#1120) 2023-01-30 19:39:38 -08:00
Nishant Sikarwar
653c8dc124 [FRONTEND] Replaced range with enumerate calls (#1110)
Using range(len(...)) is not Pythonic. Python does not have index-based
loops; instead, it uses collection iterators. The built-in enumerate adds a
counter to an iterable, so you can access the counter and the value at the
same time. It is therefore recommended to replace range(len(...)) with
enumerate(...).
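The pattern, as a minimal sketch:

```python
items = ["a", "b", "c"]

# index-based loop (discouraged)
pairs_range = [(i, items[i]) for i in range(len(items))]

# idiomatic equivalent using the built-in enumerate
pairs_enum = list(enumerate(items))
```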

For example:

5bcf60a5c0/python/triton/language/extern.py (L68)

f62d556fff/python/triton/language/extern.py (L68)
Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
2023-01-30 15:22:11 -08:00
Michael Melesse
a9f955f862 Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-2023-30-1 2023-01-30 14:04:01 -06:00
Nishant Sikarwar
e5dbe35cc1 [FRONTEND] removed unnecessary comprehension (#1085) 2023-01-30 19:42:14 +00:00
Rohit Santhanam
2d0ee0fa0f Merge remote-tracking branch 'upstream/main' into triton-mlir-IFU-01232023 2023-01-24 03:59:17 +00:00
Daniil Fukalov
e6983feb91 [ROCM] Implement next part of atomics.
- Fixed the scalar atomic_rmw implementation of fmin/fmax for f32.
- Fixed tensor atomic_rmw.
- Added an atomic_cas implementation.

TODO: fix atomic_rmw for f16, and implement fmin/fmax for f32 with
native instructions (inline asm in the case of LLVM 14) instead of the
workaround used for NV.
2023-01-23 14:01:08 +01:00
Keren Zhou
c59fb4acca [FRONTEND] Fix libdevice elementwise compute for constexpr (#1082) 2023-01-22 07:11:44 +00:00
Nishant Sikarwar
7687f85ca4 [FRONTEND] decorating static methods with @staticmethod (#1069) 2023-01-17 14:35:06 -08:00
Keren Zhou
3f47e9aa0e [BACKEND] Fix unrealized conversion for fp32 dot (#1051) 2023-01-17 21:55:44 +00:00
Nishant Sikarwar
4a74d6eae9 [FRONTEND] replaced chains comparison operator with in (#1059) 2023-01-15 20:14:35 +00:00
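One plausible reading of this change, as a sketch (function and values are hypothetical, not the actual code touched by the commit):

```python
def is_arg_reduce(op: str) -> bool:
    # before: op == "argmin" or op == "argmax"
    # after: membership test with `in`
    return op in ("argmin", "argmax")
```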
Rohit Santhanam
ce8adb92bd Merge remote-tracking branch 'upstream/master' into triton-mlir-IFU-01142023 2023-01-14 19:19:58 +00:00
Philippe Tillet
dc7ecf4535 [FRONTEND] Fix output datatype of reduce (#1045) 2023-01-10 15:04:54 -08:00
Keren Zhou
733301ff31 [Backend] Rewrite code for linking external library to expose more inlining opportunities (#1037)
- Also makes the code cleaner.
- Marks the code that needs to be fixed in `semantic.py`.
2023-01-08 13:44:29 -08:00
Keren Zhou
4023149ee3 [Frontend] Convert constexpr to value for store and load ops (#1030)
Fixing problem 2 in https://github.com/openai/triton/issues/1017

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-01-05 14:40:16 -05:00
Sophia Wisdom
411bacb2a8 [FRONTEND] Add logical operations on constexprs (#1033) 2023-01-04 18:06:32 -08:00
Sharad Vikram
4a3eb0fb9f [FRONTEND] Fix argmin/max output type (#1012)
Currently Triton returns tensors with the input types rather than i32
when doing reduce argmax/argmin.
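The intended contract, sketched in plain Python (helper name illustrative): the reduction returns an integer index (an i32 in Triton), not a value of the input dtype.

```python
def argmin_index(values):
    # returns the position of the smallest element as an int,
    # regardless of the element dtype
    return min(range(len(values)), key=values.__getitem__)
```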
2023-01-04 15:13:47 +00:00
Keren Zhou
18c47161ae [Frontend] Fix import for libdevice (#1028)
This is a hotfix for issue 1 in
https://github.com/openai/triton/issues/1017
2023-01-04 15:13:30 +00:00
Sharad Vikram
d32c538066 [FRONTEND] Export broadcast and broadcast_to in triton.language (#1007) 2023-01-04 15:10:30 +00:00
Keren Zhou
c9e7385255 [FRONTEND] Fix 3d indexing (#1006) 2023-01-04 14:59:31 +00:00
Sharad Vikram
bc73bbb12c [FRONTEND] Fix argmin/max output type (#1012)
Currently Triton returns tensors with the input types rather than i32
when doing reduce argmax/argmin.
2023-01-03 23:12:16 -08:00
Keren Zhou
8460ea3df1 [Frontend] Fix import for libdevice (#1028)
This is a hotfix for issue 1 in
https://github.com/openai/triton/issues/1017
2023-01-03 15:48:05 -08:00
Sharad Vikram
925d3d7f98 [FRONTEND] Export broadcast and broadcast_to in triton.language (#1007) 2022-12-22 01:57:33 +00:00
Keren Zhou
b5aafb0dab [FRONTEND] Fix 3d indexing (#1006) 2022-12-21 12:52:32 -08:00
Michael Melesse
41578a63d2 Merge remote-tracking branch 'upstream/triton-mlir' into triton-mlir-IFU 2022-12-21 12:53:03 -06:00
Philippe Tillet
20100a7254 Merge triton-mlir branch - Complete rewrite of the backend from scratch (#1004)
This PR merges the `triton-mlir` branch, in which we have been quietly
rewriting the Triton backend from scratch to increase maintainability,
stability and ultimately performance. Changes to the runtime are
minimal, and this new version aims to remain backward-compatible with
the previous commit. The legacy backend is now officially deprecated,
but can still be accessed via the `legacy-backend` tag.

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>
Co-authored-by: goostavz <109190422+goostavz@users.noreply.github.com>
Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com>
Co-authored-by: Yan Da <dyanab@connect.ust.hk>
Co-authored-by: Jun Yang <yangjunpro@gmail.com>
Co-authored-by: Ian Bearman <ianb@microsoft.com>
Co-authored-by: Jason Ansel <jansel@jansel.net>
Co-authored-by: Qingyi Liu <qingyil@nvidia.com>
Co-authored-by: ben-zhang-609 <110140741+ben-zhang-609@users.noreply.github.com>
Co-authored-by: Chenggang Zhao <lyricz@yeah.net>
Co-authored-by: ben-zhang-609 <benzh609@gmail.com>
Co-authored-by: dongdongl <dongdongl@nvidia.com>
2022-12-21 01:30:50 -08:00
Keren Zhou
50a5128448 [Triton-MLIR][BACKEND] Support bfloat16 and clean up some test code (#998) 2022-12-20 22:26:51 -08:00
Philippe Tillet
899bb0a0e7 [FORMAT] Run clang-format, autopep8 and isort (#1000) 2022-12-20 17:47:34 -08:00
Philippe Tillet
e759d8ef61 [FRONTEND] % now has same semantics as torch on floats (#999) 2022-12-20 15:37:19 -08:00
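For reference, the two candidate semantics for `%` on floats: Python-style remainder (which torch's float `%` follows, with the sign of the divisor) versus C-style fmod (sign of the dividend):

```python
import math

# Python/torch-style remainder: result takes the sign of the divisor
py_mod = -7.0 % 3.0            # 2.0

# C-style fmod: result takes the sign of the dividend
c_fmod = math.fmod(-7.0, 3.0)  # -1.0
```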
dfukalov
dfa19a6fe4 [Triton-MLIR][ROCM] Addressed comments.
Added example script to build cuda2gcn.bc.
2022-12-20 00:19:45 +01:00
Daniil Fukalov
c2a6028f26 [Triton-MLIR][ROCM] Added cuda2gcn wrapper for devicelib.
There is a cuda2gcn wrapper library in ROCm-Device-Libs' sources:
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/tree/amd-stg-open/cuda2gcn

I've slightly fixed it (I'm going to create appropriate PRs in its repo),
and then built it from source (the default ROCm-Device-Libs build
doesn't create it).

It contains mappings and/or implementations of the __nv* library functions.
This step allowed removing the `if`s in libdevice.py, since they are no
longer needed there.

Also added some tests (that started to pass) to test_core_amd.py.
2022-12-20 00:18:50 +01:00
Philippe Tillet
9f27468377 [TESTS][FRONTEND][BACKEND] Merge master and triton-mlir tests (#979)
Also fix a bunch of bugs in float32 / tf32

Co-authored-by: Jokeren <kerenzhou@openai.com>
2022-12-15 19:28:50 -08:00
Philippe Tillet
e5cfa0f633 [FRONTEND] Added a few assertions in semantic.dot (#977) 2022-12-12 00:07:14 -08:00