Commit Graph

528 Commits

Author SHA1 Message Date
Yu Guo
19228d88bc [FRONTEND][BACKEND] add env variable TRITON_LIBDEVICE_PATH (#1166)
We may compile kernels on remote machines that do not have a local
libdevice.10.bc.

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-02-21 20:15:12 +00:00
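A minimal sketch of how such an environment override might be consulted. Only the `TRITON_LIBDEVICE_PATH` variable name comes from the commit; the function name and default path below are hypothetical:

```python
import os

def find_libdevice(default_path="/usr/local/cuda/nvvm/libdevice/libdevice.10.bc"):
    # Prefer an explicit override so remote machines without a local
    # libdevice.10.bc can point at a copy stored elsewhere.
    override = os.environ.get("TRITON_LIBDEVICE_PATH")
    return override if override else default_path
```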
Philippe Tillet
cdd59eae68 [CI] Added A100 runner; tentative merge queues support (#1224) 2023-02-21 01:37:56 -08:00
Michaël Benesty
940f394a35 [Frontend] fix crash on cast when dest is constexpr (#1222)
This pull request addresses a crash that occurs when casting to a
tl.constexpr type in the frontend.

More info and repro code available in:
https://github.com/openai/triton/issues/1221
2023-02-20 10:50:33 -08:00
Christian Sigg
17795a34ac [NFC] Remove null character (#1220) 2023-02-20 08:50:28 +00:00
BillSchumacher
6b44d31ae4 [BUILD] windows and cmake compatibility. (#1214)
Make CMake happier: it doesn't like multiple target_link_libraries
definitions for the same name.

Use find_package on Windows instead for dlfcn-win32.
Set LLVM_SYS_PATH on Windows for the Python setup.

The debug build is almost working; an AlwaysCreate error is still thrown.
2023-02-19 09:51:50 +00:00
Arun A. Kumar
35d1c062b8 [FRONTEND] fix AutoTuner error when OutOfResources (#1208)
Minor bug: the AutoTuner currently throws an error when certain
configs run OutOfResources (e.g., the matmul example when tested on GPUs
with less shared memory).
2023-02-18 07:29:33 +00:00
Philippe Tillet
4d067f5120 [FRONTEND] Now emit an error for tl.reshape, instead of silently calling tl.view (#1212) 2023-02-17 20:21:20 -08:00
Christian Sigg
9ef4b5d773 Rebase to LLVM-head. (#1200)
Rebase to
37b7a60cd7
2023-02-17 13:16:11 -08:00
Philippe Tillet
969331aedd [BUILD] fixed setup.py on older glibc (#1206) 2023-02-16 19:43:18 -08:00
Philippe Tillet
8a4117a0f4 [FRONTEND] launcher module is now renamed from launcher to __triton_launcher (#1201)
Dynamically creating a module named `launcher` may conflict with other
modules of the same name in the user's environment.
2023-02-16 17:28:51 -08:00
Christian Sigg
fc7a8e3581 Rebase Triton to LLVM-15. (#1070)
This PR rebases Triton from LLVM-14 to LLVM-15. Most changes are
mechanical, except for the analysis framework changes.
2023-02-16 06:40:53 -08:00
Horace He
f21e76affe [TUTORIALS] changed for loop to iterate by 1 in matmuls (#1198)
For the new MLIR backend, this appears to increase matmul perf
significantly in many cases.
2023-02-16 03:44:42 +00:00
Philippe Tillet
9c330a411c [FRONTEND] fixed pinned memory exception behavior (#1197)
No longer raises an exception when the pointer is on "cpu" but is also
accessible from within kernels (e.g., pinned memory).
2023-02-15 17:40:45 -08:00
Philippe Tillet
48c4efa23b [FRONTEND] Now using symbol-dce in optimize_triton_ir (#1196)
This removes unused private functions after everything has been
inlined. That's important because TritonToTritonGPU doesn't know how
to lower tensor arguments.
2023-02-15 15:00:57 -08:00
Philippe Tillet
e3941f9d09 [OPTIMIZER][BACKEND] Cleaned up Volta codegen (#1185) 2023-02-14 22:39:35 -08:00
Philippe Tillet
8bca84ce3d [OPTIMIZER] Bugfix in Combine.cpp ; Added trans support in Pipeline.cpp (#1174) 2023-02-14 13:36:44 -08:00
Keren Zhou
6413c7b9de [BACKEND] Calculate correct warp ids for small matrices (#1180)
Fixing https://github.com/openai/triton/issues/1162

Added 16x16x16 tests.
2023-02-14 05:28:03 +00:00
Eric Wang
30db959dae [FRONTEND] Add error messages for load/store (#1179)
Fix issue https://github.com/openai/triton/issues/633
2023-02-13 10:52:50 -05:00
Yen-Chen Lin
1ea08be168 [TUTORIALS] Add description for 05-layer-norm.py (#1178)
- Add text description and equations for the tutorial. 
- Improve the code readability by changing variable names to align them
with the equation. The actual code logic is not changed.

This is a follow-up of #510. Let me know if an HTML preview would be helpful
for the review; I can add a link to that too.
2023-02-13 08:47:35 +00:00
Philippe Tillet
2aba985daa [OPTIMIZER] Improved layout simplifications heuristics (#1168) 2023-02-09 20:17:25 -08:00
Yu Guo
6173dd174f [FRONTEND] Check TRITON_PTXAS_PATH is a valid file (#1165) 2023-02-09 17:17:35 +00:00
Daniil Fukalov
3af678d097 [TEST] Fix typo. (#1164)
The line is a duplicate of line 1097; it seems to be a typo.
2023-02-09 08:26:21 -08:00
Nikita Shulga
ebbd9a5df3 [BUILD] remove unused global var (#1161)
`package_data` is no longer referenced from anywhere.

Use the `third_party/**/*` wildcard to package the contents of subfolders.
2023-02-08 05:23:05 +00:00
Stonepia
a13ddf08e2 [FRONTEND] Fix bug when the _SYSPATH is set. (#1156) 2023-02-06 18:02:42 +00:00
fdrocha
972b761390 [FRONTEND] For __rshift__ operator, use arithmetic right shift if dtype is a signed int. (#1153) 2023-02-06 10:26:17 +00:00
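The distinction matters because an arithmetic shift sign-extends while a logical shift zero-fills. A standalone illustration of the semantics on 32-bit values (generic Python, not the Triton frontend code; the helper names are invented for this sketch):

```python
def logical_rshift(x, n, bits=32):
    # Zero-filling shift: reinterpret x as an unsigned value, then shift.
    return (x & ((1 << bits) - 1)) >> n

def arithmetic_rshift(x, n, bits=32):
    # Sign-extending shift: Python's >> on a signed int already does this.
    return x >> n

# For a negative signed value the two disagree:
#   arithmetic_rshift(-8, 1) == -4
#   logical_rshift(-8, 1)    == 2147483644  (0x7FFFFFFC)
```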
Keren Zhou
ce47f94e59 [FRONTEND] Check if the data types of *A* and *B* in the dot op have the same data type (#1155) 2023-02-06 01:58:07 -08:00
Emil Masoumi
dff43abbb9 [Build] Prevent excessive hyphens from causing build errors. (#1151)
Prevents excessive hyphens from causing build errors on non-Windows
machines.
2023-02-04 00:22:57 -08:00
Philippe Tillet
8a4ca2c61a [CI][TEST][FRONTEND] Various small fixes (#1150)
- cancels CI runs in progress when a PR is updated
- atomics tests now use small int values that can be represented exactly
- replaced some old-style formatting by some f-string
2023-02-03 18:12:34 -08:00
Philippe Tillet
43798ab27e [BUILD] Restored wheels workflow (#1146)
- Dependent CUDA files (ptxas, cuda.h, libdevice.10.bc) are now packaged in
`triton/third_party/cuda`. `ptxas` is downloaded from the conda repo at
install time.
- Can now be built with old glibc (as that used by manylinux2014)
2023-02-03 16:22:10 -08:00
Nishant Sikarwar
f9e26deb05 [FRONTEND] using literal syntax to create the data structure (#1119)
Literal syntax can give minor performance bumps compared to function
calls when creating a dict, list, or tuple: the name dict must be looked
up in the global scope in case it has been rebound, and the same goes for
list() and tuple().

Signed-off-by: nishantsikarwar <nsikarwar@ch.iitr.ac.in>
Co-authored-by: Philippe Tillet <phil@openai.com>
2023-02-03 13:59:13 -08:00
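The difference is visible in the bytecode: a literal compiles to a single build opcode, while a call like `dict()` compiles to a global name lookup plus a call (a small illustrative sketch, not Triton code):

```python
import dis

# Disassemble both spellings of "create an empty dict".
literal_ops = [i.opname for i in dis.get_instructions("{}")]
call_ops = [i.opname for i in dis.get_instructions("dict()")]

# The literal is one BUILD_MAP; dict() needs LOAD_NAME plus a call opcode,
# which is why the literal can be marginally faster.
```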
Chenggang Zhao
f86843f815 Change libdevice.bc Path in Core Tests (#1141)
Only test `libdevice.bc` shipped with triton
2023-02-02 20:01:12 -08:00
George Karpenkov
a9d1935e79 [FRONTEND] Fix error message when atomics are not supported for a given dtype (#1134)
Otherwise, the construction of the exception crashes during string
concatenation.
2023-02-02 02:49:34 +00:00
Philippe Tillet
ccd17d6bf9 [TESTS] Added test for flash-attention (#1138) 2023-02-01 11:26:29 -08:00
George Karpenkov
9c3f55cbee [BUILD] Allow multi-threading during compilation (#1133)
Currently, multi-threading is only allowed during PTX->cubin
compilation, but not for LLVM->PTX or TTIR->LLVM conversion.
2023-02-01 09:40:25 -08:00
Keren Zhou
5dd8ce3745 [BACKEND] Fix topological sort and add new test cases (#1132)
The previous https://github.com/openai/triton/pull/1113 forgot to consider
that a node may have multiple parents; visiting an instruction before
all of its parents violates the semantics of topological sort.

The fixed implementation exhaustively adds all operations to a
candidate subgraph and moves an operation to the "ready" queue once all
of its operands have been visited.
2023-01-31 23:41:20 -08:00
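The scheme the commit describes, a ready queue fed by operand counting (Kahn's algorithm), can be sketched in plain Python. The names here are illustrative, not the repo's C++ implementation:

```python
from collections import deque

def topo_sort(ops, operands):
    """Order ops so every op appears after all of its operands.

    operands maps each op to the ops it depends on; an op may have
    multiple parents, so we count unvisited operands explicitly instead
    of marking an op done after its first parent is seen.
    """
    remaining = {op: len(operands.get(op, ())) for op in ops}
    users = {op: [] for op in ops}
    for op in ops:
        for dep in operands.get(op, ()):
            users[dep].append(op)
    ready = deque(op for op in ops if remaining[op] == 0)
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for user in users[op]:
            remaining[user] -= 1
            if remaining[user] == 0:  # all operands now visited
                ready.append(user)
    return order
```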
Philippe Tillet
8fea1fb478 [FRONTEND] Adding static range (#1130)
Included: Revert "[BACKEND] Replace `mlir::topologicalSort` with a
custom implementation (#1113)"
2023-01-31 18:04:19 -08:00
rsanthanam-amd
be3da96919 [FRONTEND] Fix restoration of llir IR from cache to give a string. (#1127)
Since the llir IR is a string when it is first generated, it should also
be a string when we fetch it from the cache.
2023-01-31 18:35:10 +00:00
Philippe Tillet
c4b9d699d2 [FRONTEND][BACKEND] Fixed many bugs (#1122)
- **temporarily commenting assertion in `MemBar.cpp`. We need to fix
this! but for now the following patches will unblock a number of
users.**
- Fixed frontend codegen issue for If / For / While. Emit an error when
replaced values' types mismatch.
- Added "top level" codepath for if statements, which allows users to
write patterns to exit early from kernels (e.g., `if cond1: if cond2:
return else: ...`). Added associated codegen in TritonToTritonGPUPass
- Added basic control flow tests
- Pipeline pass is no longer activated when memory accesses can't be
vectorized
- Added missing magic methods to `constexpr`
- Fixed issue in random.py: bitcast some values to uint when they need
to be.
- Added support for `Not`
- Fixed nondeterministic compilation issue
2023-01-30 23:22:36 -08:00
goostavz
3e8d83b7cc Minor fix to support sm_90 (#1125)
This fix enables support for sm_90 (otherwise it will crash).

Logs like
> 'sm_90' is not a recognized processor for this target (ignoring
processor)

can be ignored and should be eliminated once the LLVM NVPTX backend
is updated.
2023-01-31 14:08:02 +08:00
Yan Chunwei
94b419c327 [FRONTEND] some tiny fix (#1120) 2023-01-30 19:39:38 -08:00
Nishant Sikarwar
653c8dc124 [FRONTEND] Replaced range with enumerate calls (#1110)
Using range(len(...)) is not pythonic.
Python does not have index-based loops. Instead, it uses collection
iterators. Python has a built-in function enumerate which adds a counter
to an iterable. Using this, you can access the counter and the value
from the iterable at the same time. It is therefore recommended to
replace range(len(...)) with enumerate(...).

For example:

5bcf60a5c0/python/triton/language/extern.py (L68)
f62d556fff/python/triton/language/extern.py (L68)

Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
2023-01-30 15:22:11 -08:00
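A minimal before/after of the pattern being replaced (generic Python, not the patched Triton code):

```python
items = ["ptx", "cubin", "llir"]

# Index-based, unpythonic: a manual counter plus repeated indexing.
indexed = [(i, items[i]) for i in range(len(items))]

# Idiomatic: enumerate yields (counter, value) pairs directly.
enumerated = list(enumerate(items))
```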
Nishant Sikarwar
e5dbe35cc1 [FRONTEND] removed unnecessary comprehension (#1085) 2023-01-30 19:42:14 +00:00
Nikita Shulga
e9446c7ce3 [BUILD] Add ability to bundle CUDA dependencies (#1100) 2023-01-27 09:55:49 -08:00
Nikita Shulga
d3e753b5c0 [RUNTIME] Raise runtime error if C compiler is not found (#982)
Makes error reported in https://github.com/pytorch/pytorch/issues/90377
a bit easier to understand
2023-01-26 00:08:25 +00:00
Edward Z. Yang
cf0ae2ed76 [BUILD] Still build even if lit is not installed on user's system (#1095)
Otherwise it fails with

```
File "setup.py", line 147, in build_extension
    "-DLLVM_EXTERNAL_LIT=" + lit_dir,
TypeError: can only concatenate str (not "NoneType") to str
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
2023-01-25 12:55:59 -08:00
Keren Zhou
c59fb4acca [FRONTEND] Fix libdevice elementwise compute for constexpr (#1082) 2023-01-22 07:11:44 +00:00
Yan Chunwei
88498d104a [BACKEND] DotOp enable ld.v4 in MMAv1 (#1020)
The existing convert-distributed-to-distributed layout logic is based
on processing each MMA block; this requires every MMA block to share
exactly the same fixed pattern (such as the one described in the [NV PTX
doc](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-fragment-mma-16816-float)).

For MMAv1, however, things are different: the MMA block has varying
patterns for different shapes and data layouts, as below

<img width="200" alt="image"
src="https://user-images.githubusercontent.com/328693/213354941-731d7856-ad24-4f48-be0e-3cf41532cfa4.png">

This requires all the cell coordinates in the DotOp output to be computed.
2023-01-19 09:42:33 -08:00
Philippe Tillet
408d1d7e87 [OPTIMIZER] Improved flash attention forward pass performance (#1075)
- Fixed typo in instruction reordering pass
- Minor additional optimizations for shared memory allocator
- Optimized flash attention tutorial forward pass kernel
2023-01-19 06:46:01 +00:00
Void Main
b2c522a451 [BACKEND] Remove duplicate def for create_get_program_id (#1013)
The same function is redefined in lines
[645-650](https://github.com/openai/triton/blob/master/python/src/triton.cc#L645-L650)
and
[1174-1179](https://github.com/openai/triton/blob/master/python/src/triton.cc#L1174-L1179).
Comparing these two definitions, it looks like we should remove the code
in lines 645-650.

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
2023-01-18 16:25:10 -05:00
Philippe Tillet
660f2e8cce [OPTIMIZER] pipeline and prefetch pass now use a more ptxas-friendly schedule (#1065) 2023-01-17 15:21:19 -08:00