* Cleaned up pipeline pass. Now works when there are element-wise ops
between the load and the dot
* Made `splat` compatible with variables that have DotOperandLayout
* Moved rematerialization utils to a separate Transforms/Utility.cpp file.
* Frontend:
- `int` kernel arguments are always signed
- Loop induction variable is now determined by integer promotion on
lb/ub/step
* Optimizer:
- Added new ExtractSliceOp that enforces 32-bit offsets
* Backend:
- Use 64-bit indices when lowering functions and control flow
- Removed `idx_val` macro and replaced it with `i32_val`
- Cleaned up comments
- Added a new ArithToIndex pass to make sure operations on indices are
done with the `index` dialect, which gets converted to LLVM separately
using a 64-bit target
Feels very wrong, and probably not the right way to do this. But
otherwise `scf.if` doesn't get initialized since the merge to llvm-head.
Suggestions are welcome 😅
This is to solve https://github.com/openai/triton/issues/1236
This commit hides the symbols of the shared libraries linked into
`libtriton.so`, so that when other objects link against `libtriton.so`,
there are no symbol conflicts.
The function calculates the swizzled address to **store** (not load), so
we should use `outOrder` instead of `inOrder`. Current tests do not
cover this case, but at NVIDIA we have an `sm_90`-related case that
could trigger it. Already discussed in the Slack channel with @Jokeren.
Per issue https://github.com/openai/triton/issues/1228. I believe we are
potentially exposed when a Triton executor (PyTorch, for example) links
in two or more `triton_.so` shared objects and each has a stub for
`_launch`.
This fix ensures the `_launch` function is tied locally to the calling
`__triton_launcher` and can't be misused by another library.
Python 3.10 changes where packages are installed by default, causing
problems on Ubuntu, where packages end up under `/local`. See
[this](https://lists.debian.org/debian-python/2022/03/msg00039.html) and
[this](https://bugs.launchpad.net/ubuntu/+source/python3.10/+bug/1967920).
Triton seems to break when using 3.10 because it looks for the headers
under `/local`, but they are not there: they are at
`/usr/include/python3.X`, not `/usr/local/include/python3.X`.
Not 100% sure what's going on here since it's deep in python / pip, but
I think this should fix it. Otherwise, you have to hack around it in
dockerfiles, e.g. `ENV DEB_PYTHON_INSTALL_LAYOUT=deb`, which breaks
things with the pip release that just went out.
---------
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
Fix issue https://github.com/openai/triton/issues/244
Check that `end` is greater than `start`.
Check that the range fits in `int32`.
Check that the number of elements is less than or equal to
`TRITON_MAX_TENSOR_NUMEL = 131072`.
---------
Co-authored-by: Philippe Tillet <phil@openai.com>
This pull request addresses a crash that occurs when casting to a
`tl.constexpr` type in the frontend.
More info and repro code available in:
https://github.com/openai/triton/issues/1221
Make CMake happier; it doesn't like multiple `target_link_libraries`
definitions for the same target.
Use `find_package` for dlfcn-win32 on Windows instead.
Set `LLVM_SYS_PATH` on Windows for the Python setup.
Debug build is almost working; an AlwaysCreate error is still thrown.