Commit Graph

855 Commits

Author SHA1 Message Date
zahimoud
fd34b20fba [BACKEND] Fixed bug in reduce; add tests 2023-04-11 18:09:18 -07:00
Phil Tillet
3e22e18295 [TESTING] do_bench now returns min time by default.
This is likely to be more stable in general for benchmarks that have an L2
hit rate comparable to what is encountered in practice
2023-04-11 17:18:01 -07:00
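The idea behind the change above can be sketched in plain Python (a hedged illustration, not Triton's actual `do_bench` implementation): report the minimum of several timed runs, since the minimum filters out one-off slowdowns from cold caches or OS jitter.

```python
# Minimal sketch of min-time benchmarking; `bench_min` is a hypothetical
# helper, not Triton's do_bench.
import time

def bench_min(fn, warmup=5, rep=20):
    for _ in range(warmup):
        fn()  # warm up caches before measuring
    times = []
    for _ in range(rep):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    # the minimum is a stable estimate of best-case latency
    return min(times)
```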
Keren Zhou
272f23457a [DOCS] Restore the documentation workflow (#1503)
Not sure if it works at this moment, but at least we can restore the
workflow first.
2023-04-11 13:36:15 -07:00
Philippe Tillet
0fedf6b79a [TESTS] disable launch latency test (#1510) 2023-04-11 10:31:16 -07:00
Philippe Tillet
e0d6f5f4f5 [BUILD] updated LLVM binaries (#1504)
Co-authored-by: Christian Sigg <csigg@google.com>
2023-04-11 00:14:00 -07:00
Keren Zhou
6d0ed41307 [BACKEND] Replace Func Dialect with custom triton ops (func, call, return) (#1502)
MLIR currently only supports a custom inlining interface per dialect, so
we cannot change the inlining decision of `func.func`.


https://discourse.llvm.org/t/avoid-inlining-some-functions-using-the-func-dialect/69830/3

We could revert this once they've designed a better inliner interface.

Inlining attributes will be implemented in the next PR since this PR is
already huge.
2023-04-10 21:08:40 -07:00
Philippe Tillet
640f3c3921 [OPTIMIZER] Tweaked layout removal conversion heuristics (#1501)
Loads are now considered cheap to rematerialize when there are more
threads than elements in the tensor.
2023-04-10 15:19:08 -07:00
peterbell10
2c06f875e4 [TESTS] Add triton version of mlir-reduce (#1498)
[`mlir-reduce`](https://mlir.llvm.org/docs/Tools/mlir-reduce/) is a tool
for reducing the complexity of bug reproducers written in MLIR. Similar to
`triton-opt`, `triton` needs to have its own version with the dialects
registered properly for it to work.
2023-04-10 13:31:11 -07:00
petterreinholdtsen
8c55276c90 [DOCS] include link to github project in README (#1494)
This makes it easier for those working from tarball releases or clones to
know where the upstream project is located.
2023-04-10 09:29:59 -07:00
Keren Zhou
032509384a [ANALYSIS] Fine-tune comments for shared memory allocation (#1492)
And add a new test to check multiple-color cases, which have never been
tested before.
2023-04-10 09:00:36 -07:00
Zilin Zhu
19e424eb98 [ops/blocksparse] Fix grid shape for large lm (#1491)
When the language model grows really large, axis 1 of the original
grid shape (`c.shape[1]`, corresponding to the number of nonzero elements
in the layout) will be larger than 65536, exceeding the CUDA limit and
resulting in `[CUDA]: invalid argument`.

This PR moves axis 1 of the original grid to axis 0, as the limit
for axis 0 is 2^31 - 1.

Thank you for your time on reviewing this PR :)
2023-04-10 09:00:12 -07:00
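The grid-shape fix above can be sketched as follows (a hypothetical illustration; `make_grid` is not the actual function in the PR). CUDA caps grid axes 1 and 2 at 65535 blocks, while axis 0 allows up to 2^31 - 1, so the large dimension belongs on axis 0.

```python
MAX_GRID_Y = 65535  # CUDA limit for grid axes 1 and 2

def make_grid(num_nonzero_blocks, batch):
    # Old layout put the large count on axis 1: (batch, num_nonzero_blocks),
    # which triggers "[CUDA]: invalid argument" once it exceeds 65535.
    # New layout moves the large dimension to axis 0, limited at 2**31 - 1.
    return (num_nonzero_blocks, batch)
```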
Philippe Tillet
adc760dac1 [OPTIMIZER] enable loop pipelining using pointer increments from vector look-up tables (#1490) 2023-04-10 08:59:42 -07:00
who who who
fd0516fb90 [DOCS] Fixed typo (#1489) 2023-04-09 16:06:34 -07:00
Philippe Tillet
b86425a28e [TEST] made lut_bmm pipeline test more concise and specific (#1488) 2023-04-08 19:17:35 -07:00
long.chen
f7ad8ae022 [Refine] remove const ref of mlir::Attribute (#1486)
https://mlir.llvm.org/docs/DefiningDialects/AttributesAndTypes/

https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#f16-for-in-parameters-pass-cheaply-copied-types-by-value-and-others-by-reference-to-const
```
The C++ Attribute and Type classes in MLIR (like Ops, and many other things) are value-typed. 
This means that instances of Attribute or Type are passed around by-value, 
as opposed to by-pointer or by-reference. 
The Attribute and Type classes act as wrappers around internal storage objects that are uniqued within an instance of an MLIRContext.
```
2023-04-08 10:38:59 -07:00
mcskatkat
82ec1a89ea [FRONTEND] code_generator.py TODOs fixed & removed (#1484)
Handled TODOs that were waiting for the circular import issue to be
resolved
2023-04-07 22:05:46 -07:00
Ian O'Connell
bc0b007e4b [FRONTEND] Allow cache manager to be overridden, and tweak apis to easier work with remote caches (#1478)
The changes here come with a few separate bits:

- Allow replacing the cache manager with an ENV variable to make it
pluggable
- Make the `make_path` API private, since it leaks some internal bits
of the cache and allows file access. Use a get operation instead.
- The `compile` operation produces several small files as part of a single
compile pipeline, which can perform poorly with remote caches. Also, some
operations like `_triton.get_shared_memory_size` only work when everything
is cached or nothing is; they segfault otherwise (e.g., when some key files
are missing). Grouping these files as a single entity avoids that.
2023-04-07 13:38:28 -07:00
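The pluggable-cache idea above can be sketched like this (a hedged illustration with hypothetical names and env-var format, not Triton's actual API): an environment variable names a cache-manager class by dotted path, which is imported and instantiated per kernel key; unset, a default file-style backend is used.

```python
# Hypothetical sketch of an env-var-selected cache manager.
import importlib
import os

class FileCacheManager:
    """Default backend; a remote cache would provide the same interface."""
    def __init__(self, key):
        self.key = key
        self._store = {}
    def get(self, name):
        # get operation replaces direct file-path access
        return self._store.get(name)
    def put(self, data, name):
        self._store[name] = data
        return name

def get_cache_manager(key, env=None):
    env = os.environ if env is None else env
    path = env.get("TRITON_CACHE_MANAGER")  # assumed "module:Class" format
    if path is None:
        return FileCacheManager(key)
    module_name, cls_name = path.rsplit(":", 1)
    cls = getattr(importlib.import_module(module_name), cls_name)
    return cls(key)
```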
Keren Zhou
6743e42eb5 [FRONTEND] Data type specification for math functions (#1485) 2023-04-07 10:26:19 -07:00
Keren Zhou
7f3f58f332 [FRONTEND] Fix broadcast semantics (#1480)
https://github.com/openai/triton/pull/1183

---------

Co-authored-by: Yen-Chen Lin <yenchenlin1994@gmail.com>
2023-04-06 10:40:40 -07:00
Philippe Tillet
8cbf9b40a4 [TESTING] Minor fixes (#1479) 2023-04-06 00:48:33 -07:00
Phil Tillet
4c1d001ae4 [TESTING] Now using numpy instead of pytorch in triton.assert_close
More memory-efficient than pytorch
2023-04-04 23:57:12 -07:00
Eta
577cafff0a [BUILD] Add missing subpackages to build (#1475)
The `triton/compiler`, `triton/runtime/driver`, and `triton/third_party`
subpackages were missing from the distribution built with the old
`setup.py` after #1464, causing an immediate error upon importing Triton
with a non-editable installation. This change adds the missing Python
subpackages and moves `triton/third_party` inclusion to `MANIFEST.in`,
where it will automatically be included in wheels due to the existing
`include_package_data` setup flag.
2023-04-04 22:41:08 -07:00
Phil Tillet
0e11f1e167 [TESTING] Added triton.allclose wrapper around torch.testing.allclose.
This adds a convenience layer to test linear algebra kernels and their perf.
2023-04-04 21:53:36 -07:00
Philippe Tillet
47e73aadda [BACKEND] Revert inline PTX for conversions supported by LLVM (#1474)
No longer needed now that we initialize all registers. The motivation for
reverting this workaround, now that we can, is that it introduced
performance regressions.
2023-04-04 17:52:26 -07:00
Keren Zhou
00a9143bb4 [FRONTEND] Expose Autotuner to users (#1473)
The Autotuner is a handy utility. By allowing external access to the
Autotuner, users can override some functions (e.g., `run`) to
load/store the best configurations, initialize tensors based on
configuration values, and change the benchmarking standard (e.g., based on
bytes instead of time).
2023-04-04 17:12:00 -07:00
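The kind of customization this enables might look like the following (a simplified, hypothetical sketch; the class and method signatures are not Triton's real Autotuner API): subclass and override `run` to persist the winning configuration across processes.

```python
# Hypothetical minimal autotuner and a subclass that persists results.
import json

class Autotuner:
    def __init__(self, configs):
        self.configs = configs
    def run(self, benchmark):
        # pick the config with the lowest benchmarked cost
        return min(self.configs, key=benchmark)

class PersistentAutotuner(Autotuner):
    def __init__(self, configs, path):
        super().__init__(configs)
        self.path = path
    def run(self, benchmark):
        best = super().run(benchmark)
        with open(self.path, "w") as f:
            json.dump(best, f)  # store the winning config for reuse
        return best
```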
Christian Sigg
01a93185a1 [BACKEND][OPTIMIZER] Switch from llvm::Optional to std::optional. (#1416) 2023-04-04 09:06:28 -07:00
Philippe Tillet
053af4e9f8 [FRONTEND] Refactor file hierarchy (#1464)
The purpose of this PR is to remove some circular dependencies and
separate concerns better in the frontend. It's still not perfect --
`triton.compile` still includes a few runtime architecture-specific
components, but it's much better than before.

This PR still assumes that AMD only supports empty kernels right now.
Other PRs will follow to make the frontend support multiple devices in
a more modular way.
2023-04-02 12:07:08 -07:00
Keren Zhou
0855cacdd8 [BACKEND] Fix small matmul dot (#1463)
https://github.com/openai/triton/issues/1449

In theory, we might be able to support even 8x8 dot if we also wrap
around `cOff`.
2023-04-02 02:05:05 +00:00
Keren Zhou
801bb9d3b5 [ANALYSIS] Fix divisibility calculation for addptr (#1453) 2023-03-31 17:57:31 -07:00
Keren Zhou
859952a0aa [FRONTEND] Include the debug field when computing the kernel hash (#1458)
Co-authored-by: Philippe Tillet <phil@openai.com>
2023-04-01 00:52:51 +00:00
Da Yan
bf158bf01f [FRONTEND] kwargs as autotune key (#1457) 2023-03-31 17:09:14 -07:00
Kern Handa
2c0417da96 [DOCS] fixed typo triton.testing.allclose -> torch.allclose in MatMul tutorial (#1460) 2023-03-31 17:06:46 -07:00
Keren Zhou
cc4aa1ebbc [FRONTEND] Fix if-exp parsing for size-1 tensors (#1455) 2023-03-31 15:05:52 -07:00
Phil Tillet
966e5d955b [TEST] Increase `LATENCY_THRESHOLD_US` 2023-03-31 11:38:18 -07:00
Francisco Massa
c1b057eee9 [FRONTEND] Add option to specify number of compilation threads during Triton compilation (#1450)
On some machines, the amount of available RAM might not be enough to
compile Triton with `2 * num_cpus` parallelism. For example, CircleCI's
`large` instance can't handle Triton compilation as is due to
insufficient memory.

Instead, I propose to take PyTorch's approach where we can define a
[`MAX_JOBS` env
var](0e4ddc2b40/tools/setup_helpers/cmake.py (L366-L368))
that lets the user reduce (or increase) the
parallelism during compilation.

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-03-31 11:34:18 -07:00
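The `MAX_JOBS` approach described above amounts to something like this sketch (a hedged illustration of the pattern, not the PR's actual helper): default to `2 * cpu_count` workers, but let an environment variable cap it on memory-constrained machines.

```python
# Hypothetical helper: resolve build parallelism from MAX_JOBS.
import os

def build_parallelism(env=None):
    env = os.environ if env is None else env
    max_jobs = env.get("MAX_JOBS")
    if max_jobs is not None:
        return int(max_jobs)  # user-specified cap (or raise)
    return 2 * (os.cpu_count() or 1)  # default parallelism
```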
Philippe Tillet
123afdf423 [DOCS] fixed typo assert_almost_equal -> assert_allclose in tutorials (#1456) 2023-03-31 11:27:18 -07:00
Keren Zhou
28ea484dab [BACKEND] Clean up type inference functions (#1451)
And remove duplicate function definition.
2023-03-30 23:07:32 -07:00
mcskatkat
109bfca5c0 [FRONTEND] CodeGenerator.statically_implemented_functions: fixed incorrect hacky initialization (#1444)
This fixes the problem indicated in #1443
2023-03-30 00:26:00 -07:00
Chenggang Zhao
1bead327fd [TUTORIALS] Add the missing tutorial: libdevice functions (#1430)
While merging `triton-mlir`, it seems that the libdevice tutorial was
missed. This PR adds it back and updates it to the current `tl.math`
interface.

Also found a bug in `test_core.py`: the `extern_libs` argument should still
pass `libdevice`, otherwise my added test fails. The legacy code didn't
fail because `lib_path` was None and got ignored.

---------

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
Co-authored-by: Philippe Tillet <phil@openai.com>
2023-03-29 19:00:17 -07:00
zahimoud
3fe2901bfc [FRONTEND] Typehint improvement (#1442)
Fixed a bug with typehint checking, refactored the typehint code for
specializations, and added typehint checking for sig_keys.
2023-03-29 18:12:40 -07:00
Keren Zhou
43eed392df [BACKEND] Fix tl.exp for fp16 (#1440)
https://github.com/openai/triton/issues/1438
https://github.com/openai/triton/issues/1360
2023-03-29 16:34:23 -07:00
Sophia Wisdom
f53bb6a1bc [FRONTEND] More friendly error message when non-Triton function is called from Triton function (#1429) 2023-03-28 22:38:03 -07:00
zahimoud
73b124155b [FRONTEND] Added typehints support to speedup triton kernel launch (#1431)
One of the possible optimizations for kernel launch overhead: we avoid
having to run `hasattr` and `isinstance` for each argument by adding
typehints to the kernel definition. Also added a regression unit test to
make sure the launch overhead stays within an expected range.
2023-03-28 22:37:34 -07:00
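The optimization above can be illustrated roughly as follows (hypothetical names; this is not Triton's launcher code): inspect the kernel's annotations once at definition time and cache a per-argument handler, so launches skip the per-argument `isinstance` checks.

```python
# Hypothetical sketch: precompute argument handlers from typehints.
import inspect

def make_arg_packers(fn):
    packers = []
    for param in inspect.signature(fn).parameters.values():
        if param.annotation is int:
            packers.append(int)    # type known statically: no runtime check
        elif param.annotation is float:
            packers.append(float)
        else:
            # unannotated args keep a runtime fallback
            packers.append(lambda v: v)
    return packers

def kernel(x: int, y: float, z):
    pass

PACKERS = make_arg_packers(kernel)  # computed once, reused per launch

def launch(*args):
    return [pack(a) for pack, a in zip(PACKERS, args)]
```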
Keren Zhou
ee593fca0b [BACKEND] Fix int8 dot (#1435) 2023-03-28 20:18:17 -07:00
Keren Zhou
3342cc1c0c [OPTIMIZER] Do not create yield if yieldValues is empty (#1437)
https://github.com/openai/triton/issues/1432
2023-03-28 19:33:52 -07:00
Philippe Tillet
4bfbb8718a [FRONTEND] Added NoneType as an accepted condition type (#1436) 2023-03-28 18:35:12 -07:00
Michael Melesse
5293288e77 [ROCM] Enable ROCM Backend #1.5: Address Remaining Comments from #1312 (#1434)
This PR addresses the remaining issues from #1312. It does the following:
* uses LLVM string join
* adds a comment to the GCNBuilder class

---------

Co-authored-by: Rahul Batra <rahbatra@amd.com>
2023-03-28 17:23:57 -07:00
Philippe Tillet
888cbad0e5 [FRONTEND] parse_mlir_module now properly initializes LLVMDialect (#1433) 2023-03-28 15:25:31 -07:00
Keren Zhou
adc4d25276 [BACKEND] A general interface for initializing destination operands in load/store operations (#1427) 2023-03-27 22:13:01 -07:00
Philippe Tillet
fe76b12354 [BUILD] Back to cmake >= 3.18 (#1428) 2023-03-27 16:47:34 -07:00