This exposes `semantic.expand_dims` in the public API and builds upon it
with support for expanding multiple dimensions at once. e.g.
```python
tl.expand_dims(tl.arange(0, N), (0, -1)) # shape = [1, N, 1]
```
Compared to indexing with `None`, this API is useful because the
dimensions can be constexpr values rather than hard-coded into the
source. As a basic example
```python
@triton.jit
def max_keepdim(value, dim):
res = tl.max(value, dim)
return tl.expand_dims(res, dim)
```
`tl.reduction` is currently tested indirectly through the existing
reduction operators, but it's good to have a direct test for the
function itself.
---------
Co-authored-by: Philippe Tillet <phil@openai.com>
This generates identical PTX for floating point, but for integer types
the resulting PTX is much better. For example `tl.abs` for int16
currently generates
```mlir
cvt.s32.s16 %r1, %rs2;
neg.s16 %rs4, %rs2;
setp.lt.s32 %p4, %r1, 0;
selp.b16 %rs3, %rs4, %rs2, %p4;
```
After, it becomes a single `abs.s16` instruction.
This also improves LLVM's ability to optimize floats. e.g. `abs(t) *
abs(t)` is optimized to `t * t` now which didn't happen before.
---------
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
This PR;
- Fixes syntax errors like `.type values: dict[str,
Callable[[list[Any]], Any]]` to `:type values: dict[str,
Callable[[list[Any]], Any]]`,
- Fixes typos,
- Fixes formatting like `k ++` to ` k++`,
- Increases consistency (e.g. by transforming the minority `cd dir/` to
the majority `cd dir`).
- Add text description and equations for the tutorial.
- Improve the code readability by changing variable names to align them
with the equation. The actual code logic is not changed.
This is a follow-up of #510. Let me know if a preview HTML is helpful
for the review, I can add a link to that too.
- Fix meta-parameter usage on tutorials.
- Install tutorial dependencies on CI.
- Switch from `requirements-test.txt` to `extras_require` for test dependencies, and also use it for tutorial dependencies.
- Make some performance tests deterministic.
- Added docstr for autotune, Config, heuristics
- Added docstr for atomics
- Hiding internal _builder argument used for built-in language primitives
- Re-factor docstr to use common templates between similar functions.
This PR implements a major overhaul of the frontend for Triton, and replaces Triton-C by a pure Python API in which kernels are defined as @triton.jit decorated functions. The documentation and tutorials have also been updated to accommodate these changes.
See documentations for more information on the new API
The solution proposed in #77 can create namespace conflicts when triton and triton-nightly have both been pip installed. Therefore, this PR is moving nightly releases to pre-releases in the main triton index.
Recently there has been more and more report about installation issues:
- Installing Triton before upgrading pytorch can create some issues because Triton uses some torch headers
- llvm-10-dev not available on some platform; llvm-11-dev not available on e.g. Ubuntu.
absence of nightly builds
This PR should fix all these issues. Some CMake tricks are used to download and install llvm at build time. Triton Python bindings were modified to remove dependence on pytorch ops. Midnight CI job added to generate binary wheels for all Triton version and update them on pypi's new triton-nightly project.
This PR will also make it very easy to use LLVM forks in the future for whatever needs we have.
* Simplified `triton.kernel` API to achieve lower latency:
> .data_ptr() must now be passed as kernel argument. No more implicit
conversion from torch.tensor
> compilation options are now constant attributes, i.e., opt.d('VAR')
becomes opt.VAR
> torch.device must now be passed explicitly to triton.kernel (no
longer inferred from torch.tensor arguments)
* C++ tests moved to `python/tests/`
* C++ tutorial created in `tutorials/`
* Python tutorial created in python/tutorials/
* Version changed to 1.0alpha
* No longer copying C++ headers into the Python package
* added python/triton/ops/ package for pre-written Triton ops