github/ROCm - ROCm - AtHeartEngineering

mirror of https://github.com/ROCm/ROCm.git synced 2026-04-05 03:01:17 -04:00

Author	SHA1	Message	Date
Yu Guo	19228d88bc	[FRONTEND][BACKEND] add env variable TRITON_LIBDEVICE_PATH (#1166 ) we may compile kernels on remote machines which do not have local libdevice.10.bc. Co-authored-by: Philippe Tillet <phil@openai.com>	2023-02-21 20:15:12 +00:00
Philippe Tillet	cdd59eae68	[CI] Added A100 runner; tentative merge queues support (#1224 )	2023-02-21 01:37:56 -08:00
Michaël Benesty	940f394a35	[Frontend] fix crash on cast when dest is constexpr (#1222 ) This pull request addresses a crash that occurs when casting to a tl.constexpr type in the frontend. More info and repro code available in: https://github.com/openai/triton/issues/1221	2023-02-20 10:50:33 -08:00
Christian Sigg	17795a34ac	[NFC] Remove null character (#1220 )	2023-02-20 08:50:28 +00:00
BillSchumacher	6b44d31ae4	[BUILD] windows and cmake compatibility. (#1214 ) Make cmake happier, it doesn't like multiple target_link_library definitions for the same name. Use find_package instead on Windows for dlfcn-win32. Set LLVM_SYS_PATH on Windows for python setup. Debug build almost working, AlwaysCreate error thrown still.	2023-02-19 09:51:50 +00:00
Arun A. Kumar	35d1c062b8	[FRONTEND] fix AutoTuner error when OutOfResources (#1208 ) Minor bug: AutoTuner currently throws the following error when certain configs go OutOfResources (e.g. the matmul example when testing on GPUs with less shared memory).	2023-02-18 07:29:33 +00:00
Philippe Tillet	4d067f5120	[FRONTEND] Now emit an error for `tl.reshape`, instead of silently calling `tl.view` (#1212 )	2023-02-17 20:21:20 -08:00
Christian Sigg	9ef4b5d773	Rebase to LLVM-head. (#1200 ) Rebase to `37b7a60cd7`	2023-02-17 13:16:11 -08:00
Philippe Tillet	969331aedd	[BUILD] fixed setup.py on older glibc (#1206 )	2023-02-16 19:43:18 -08:00
Philippe Tillet	8a4117a0f4	[FRONTEND] launcher module is now renamed from `launcher` to `__triton_launcher` (#1201 ) creating dynamically a module named `launcher` may conflict with other modules named the same in the user's environment.	2023-02-16 17:28:51 -08:00
Christian Sigg	fc7a8e3581	Rebase Triton to LLVM-15. (#1070 ) This PR rebases Triton from LLVM-14 to LLVM-15. Most changes are mechanical, except for the analysis framework changes.	2023-02-16 06:40:53 -08:00
Horace He	f21e76affe	[TUTORIALS] changed for loop to iterate by 1 in matmuls (#1198 ) For the new MLIR backend, this appears to increase matmul perf significantly in many cases.	2023-02-16 03:44:42 +00:00
Philippe Tillet	9c330a411c	[FRONTEND] fixed pinned memory exception behavior (#1197 ) no longer raise exception when the pointer is on "cpu" but also accessible from within kernels (e.g., pinned memory)	2023-02-15 17:40:45 -08:00
Philippe Tillet	48c4efa23b	[FRONTEND] Now using symbol-dce in `optimize_triton_ir` (#1196 ) This will remove unused private functions after we've inlined everything. That's important because TritonToTritonGPU doesn't know how to lower tensor arguments.	2023-02-15 15:00:57 -08:00
Philippe Tillet	e3941f9d09	[OPTIMIZER][BACKEND] Cleaned up Volta codegen (#1185 )	2023-02-14 22:39:35 -08:00
Philippe Tillet	8bca84ce3d	[OPTIMIZER] Bugfix in Combine.cpp ; Added `trans` support in Pipeline.cpp (#1174 )	2023-02-14 13:36:44 -08:00
Keren Zhou	6413c7b9de	[BACKEND] Calculate correct warp ids for small matrices (#1180 ) Fixing https://github.com/openai/triton/issues/1162 Add tests 16x16x16	2023-02-14 05:28:03 +00:00
Eric Wang	30db959dae	[FRONTEND] Add error messages for load/store (#1179 ) Fix issue https://github.com/openai/triton/issues/633	2023-02-13 10:52:50 -05:00
Yen-Chen Lin	1ea08be168	[TUTORIALS] Add description for 05-layer-norm.py (#1178 ) - Add text description and equations for the tutorial. - Improve the code readability by changing variable names to align them with the equation. The actual code logic is not changed. This is a follow-up of #510. Let me know if a preview HTML is helpful for the review, I can add a link to that too.	2023-02-13 08:47:35 +00:00
Philippe Tillet	2aba985daa	[OPTIMIZER] Improved layout simplifications heuristics (#1168 )	2023-02-09 20:17:25 -08:00
Yu Guo	6173dd174f	[FRONTEND] Check TRITON_PTXAS_PATH is a valid file (#1165 )	2023-02-09 17:17:35 +00:00
Daniil Fukalov	3af678d097	[TEST] Fix typo. (#1164 ) The line is a duplicate of line 1097, which looks like a typo.	2023-02-09 08:26:21 -08:00
Nikita Shulga	ebbd9a5df3	[BUILD] remove unused global var (#1161 ) `package_data` is no longer referenced from anywhere. Use `third_party/*/` wildcard to package contents of subfolders	2023-02-08 05:23:05 +00:00
Stonepia	a13ddf08e2	[FRONTEND] Fix bug when the `_SYSPATH` is set. (#1156 )	2023-02-06 18:02:42 +00:00
fdrocha	972b761390	[FRONTEND] For __rshift__ operator, use arithmetic right shift if dtype is a signed int. (#1153 )	2023-02-06 10:26:17 +00:00
Keren Zhou	ce47f94e59	[FRONTEND] Check if the data types of A and B in the dot op have the same data type (#1155 )	2023-02-06 01:58:07 -08:00
Emil Masoumi	dff43abbb9	[Build] Prevent excessive hyphens from causing build errors. (#1151 ) Prevents excessive hyphens from causing build errors on non-Windows machines.	2023-02-04 00:22:57 -08:00
Philippe Tillet	8a4ca2c61a	[CI][TEST][FRONTEND] Various small fixes (#1150 ) - cancels CI runs in progress when a PR is updated - atomics tests now use small int values that can be represented exactly - replaced some old-style formatting by some f-string	2023-02-03 18:12:34 -08:00
Philippe Tillet	43798ab27e	[BUILD] Restored wheels workflow (#1146 ) - Dependent CUDA files (ptxas, cuda.h, libdevice.10.bc) are now packaged in `triton/third_party/cuda`. `ptxas` is downloaded from conda repo at install time. - Can now be built with old glibc (such as the one used by manylinux2014)	2023-02-03 16:22:10 -08:00
Nishant Sikarwar	f9e26deb05	[FRONTEND] using literal syntax to create the data structure (#1119 ) The literal syntax can give minor performance bumps compared to function calls to create dict, list and tuple. The name `dict` must be looked up in the global scope in case it has been rebound. The same goes for the other two types, list() and tuple(). Signed-off-by: nishantsikarwar <nsikarwar@ch.iitr.ac.in> Co-authored-by: Philippe Tillet <phil@openai.com>	2023-02-03 13:59:13 -08:00
Chenggang Zhao	f86843f815	Change `libdevice.bc` Path in Core Tests (#1141 ) Only test `libdevice.bc` shipped with triton	2023-02-02 20:01:12 -08:00
George Karpenkov	a9d1935e79	[FRONTEND] Fix error message when atomics are not supported for a given dtype (#1134 ) Otherwise, the construction of the exception crashes during string concatenation.	2023-02-02 02:49:34 +00:00
Philippe Tillet	ccd17d6bf9	[TESTS] Added test for flash-attention (#1138 )	2023-02-01 11:26:29 -08:00
George Karpenkov	9c3f55cbee	[BUILD] Allow multi-threading during compilation (#1133 ) Currently, multi-threading is only allowed during PTX->cubin compilation, but not for LLVM->PTX or TTIR->LLVM conversion.	2023-02-01 09:40:25 -08:00
Keren Zhou	5dd8ce3745	[BACKEND] Fix topological sort and add new test cases (#1132 ) Previous https://github.com/openai/triton/pull/1113 forgot to consider that a node may have multiple parents; visiting the instruction before any of its parents violates the semantics of topological sort. The fixed implementation exhaustively adds all operations into a candidate subgraph and moves an operation to the "ready" queue once all of its operands have been visited.	2023-01-31 23:41:20 -08:00
Philippe Tillet	8fea1fb478	[FRONTEND] Adding static range (#1130 ) Included: Revert "[BACKEND] Replace `mlir::topologicalSort` with a custom implementation (#1113)"	2023-01-31 18:04:19 -08:00
rsanthanam-amd	be3da96919	[FRONTEND] Fix restoration of llir IR from cache to give a string. (#1127 ) Since the llir IR is a string when it is first generated, it should also be a string when we fetch it from the cache.	2023-01-31 18:35:10 +00:00
Philippe Tillet	c4b9d699d2	[FRONTEND][BACKEND] Fixed many bugs (#1122 ) - temporarily commenting assertion in `MemBar.cpp`. We need to fix this! but for now the following patches will unblock a number of users. - Fixed frontend codegen issue for If / For / While. Emit an error when replaced values' type mismatch. - Added "top level" codepath for if statements, which allows users to write patterns to exit early from kernels (e.g., `if cond1: if cond2: return else: ...`). Added associated codegen in TritonToTritonGPUPass - Added basic control flow tests - Pipeline pass is no longer activated when memory accesses can't be vectorized - Added missing magic methods to `constexpr` - Fixed issue in random.py: bitcast some values to uint when they need to be. - Added support for `Not` - Fixed nondeterministic compilation issue	2023-01-30 23:22:36 -08:00
goostavz	3e8d83b7cc	Minor fix to support sm_90 (#1125 ) This fix enables the support on sm_90 (otherwise it will crash). Logs like > 'sm_90' is not a recognized processor for this target (ignoring processor) could be ignored and should be eliminated with the update of llvm nvptx backend.	2023-01-31 14:08:02 +08:00
Yan Chunwei	94b419c327	[FRONTEND] some tiny fix (#1120 )	2023-01-30 19:39:38 -08:00
Nishant Sikarwar	653c8dc124	[FRONTEND] Replaced range with enumerate calls (#1110 ) Using range(len(...)) is not pythonic. Python does not have index-based loops. Instead, it uses collection iterators. Python has a built-in method enumerate which adds a counter to an iterable. Using this, you can access the counter and the value from the iterable at the same time. It is therefore recommended to replace range(len(...)) with enumerate(...). for ex `5bcf60a5c0/python/triton/language/extern.py (L68)` `f62d556fff/python/triton/language/extern.py (L68)` Signed-off-by: GitHub <noreply@github.com> Co-authored-by: Keren Zhou <kerenzhou@openai.com>	2023-01-30 15:22:11 -08:00
Nishant Sikarwar	e5dbe35cc1	[FRONTEND] removed unnecessary comprehension (#1085 )	2023-01-30 19:42:14 +00:00
Nikita Shulga	e9446c7ce3	[BUILD] Add ability to bundle CUDA dependencies (#1100 )	2023-01-27 09:55:49 -08:00
Nikita Shulga	d3e753b5c0	[RUNTIME] Raise runtime error if C compiler is not found (#982 ) Makes error reported in https://github.com/pytorch/pytorch/issues/90377 a bit easier to understand	2023-01-26 00:08:25 +00:00
Edward Z. Yang	cf0ae2ed76	[BUILD] Still build even if lit is not installed on user's system (#1095 ) Otherwise it fails with `File "setup.py", line 147, in build_extension "-DLLVM_EXTERNAL_LIT=" + lit_dir, TypeError: can only concatenate str (not "NoneType") to str` Signed-off-by: Edward Z. Yang <ezyang@meta.com>	2023-01-25 12:55:59 -08:00
Keren Zhou	c59fb4acca	[FRONTEND] Fix libdevice elementwise compute for constexpr (#1082 )	2023-01-22 07:11:44 +00:00
Yan Chunwei	88498d104a	[BACKEND] DotOp enable ld.v4 in MMAv1 (#1020 ) The existing convert distributed to distributed layouts logic is based on processing each MMA-block, which requires each MMA-block to share exactly the same fixed pattern (such as the one described in the [NV PTX doc](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-fragment-mma-16816-float)). For MMAv1, things are different: the MMA-block has variant patterns for different shapes and data layouts. This requires all the cell coordinates in DotOp output to be computed.	2023-01-19 09:42:33 -08:00
Philippe Tillet	408d1d7e87	[OPTIMIZER] Improved flash attention forward pass performance (#1075 ) - Fixed typo in instruction reordering pass - Minor additional optimizations for shared memory allocator - Optimized flash attention tutorial forward pass kernel	2023-01-19 06:46:01 +00:00
Void Main	b2c522a451	[BACKEND] Remove duplicate def for create_get_program_id (#1013 ) The same function is redefined in lines [645-650](https://github.com/openai/triton/blob/master/python/src/triton.cc#L645-L650) and [1174-1179](https://github.com/openai/triton/blob/master/python/src/triton.cc#L1174-L1179), compared these 2 definitions, looks like we should remove the code in lines 645-650. Co-authored-by: Keren Zhou <kerenzhou@openai.com>	2023-01-18 16:25:10 -05:00
Philippe Tillet	660f2e8cce	[OPTIMIZER] pipeline and prefetch pass now use a more ptxas-friendly schedule (#1065 )	2023-01-17 15:21:19 -08:00
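
The message of commit 5dd8ce3745 above describes its fix in words: an operation becomes "ready" only once all of its operands have been visited. A minimal Python sketch of that general idea (a Kahn-style topological sort) might look like the following. It is not the Triton implementation, which operates on MLIR operations in C++; the function name `topological_sort` and the `operands` mapping are hypothetical, and every operand is assumed to appear as a key in the mapping.

```python
from collections import deque


def topological_sort(operands):
    """Order ops so that each one comes after all of its operands.

    `operands` (hypothetical input format) maps each op to the list of ops
    it consumes. Every operand is assumed to also be a key of the mapping.
    """
    # Deduplicate operands while keeping their order, so an op that uses the
    # same parent twice still waits for it only once.
    unique = {op: list(dict.fromkeys(parents)) for op, parents in operands.items()}

    # Number of operands of each op that have not been visited yet.
    pending = {op: len(parents) for op, parents in unique.items()}

    # Reverse edges: for each op, the ops that consume its result.
    users = {op: [] for op in unique}
    for op, parents in unique.items():
        for parent in parents:
            users[parent].append(op)

    # Ops without operands are ready immediately.
    ready = deque(op for op, count in pending.items() if count == 0)
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for user in users[op]:
            pending[user] -= 1
            # A user becomes ready only after ALL of its operands were
            # visited, which is what matters for nodes with several parents.
            if pending[user] == 0:
                ready.append(user)

    if len(order) != len(unique):
        raise ValueError("graph contains a cycle")
    return order


# "d" has two parents, so it must come after both "b" and "c".
graph = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(topological_sort(graph))
```

Counting unvisited operands per node is what prevents the bug the commit message mentions: a node with several parents is not emitted right after the first parent, only after the last one.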

528 Commits