* Cleaned up pipeline pass. Now works when there are element-wise ops
between the load and the dot
* Made `splat` compatible with variables that have DotOperandLayout
* Moved rematerialization utils to a separate Transforms/Utility.cpp file.
* Frontend:
- `int` kernel arguments are always signed
- Loop induction variable is now determined by integer promotion on
lb/ub/step
* Optimizer:
- Added new ExtractSliceOp that enforces 32-bit offsets
* Backend:
- Use 64-bit indices when lowering functions and control flow
- Removed `idx_val` macro and replaced it with `i32_val`
- Cleaned up comments
- Added a new ArithToIndex pass to make sure operations on indices are
done with the `index` dialect, which gets converted to LLVM separately
using a 64-bit target
Feels very wrong, and probably not the right way to do this. But
otherwise `scf.if` doesn't get initialized since the merge to llvm-head.
Suggestions are welcome 😅
This is to solve https://github.com/openai/triton/issues/1236
This commit hides the symbols of the shared libraries linked into
`libtriton.so`, so that when other objects link against `libtriton.so`,
there are no symbol conflicts.
The function calculates the swizzled address to **store** (not load), so
we should use `outOrder` instead of `inOrder`. Current tests do not
cover this case, but at NVIDIA we have an `sm_90`-related case that
could trigger it. Already discussed in the Slack channel with @Jokeren.
Per issue https://github.com/openai/triton/issues/1228. I believe we are
potentially exposed when a Triton executor (PyTorch, for example) links
in two or more `triton_.so` shared objects and each has a stub for
`_launch`.
This fix ensures the `_launch` function is tied locally to the calling
`__triton_launcher` and can't be misused by another library.
Python 3.10 changes where packages are installed by default, causing
problems on Ubuntu, where packages end up under `/local`. See
[this](https://lists.debian.org/debian-python/2022/03/msg00039.html) and
[this](https://bugs.launchpad.net/ubuntu/+source/python3.10/+bug/1967920).
Triton seems to break when using 3.10 because it looks for the headers
under `/local`, but they are not there: they are at
`/usr/include/python3.X`, not `/usr/local/include/python3.X`.
Not 100% sure what's going on here since it's deep in python / pip, but
I think this should fix it. Otherwise, you have to hack around it in
dockerfiles, e.g. `ENV DEB_PYTHON_INSTALL_LAYOUT=deb`, which breaks
things with the pip release that just went out.
---------
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
Fix issue https://github.com/openai/triton/issues/244
Check that `end` is greater than `start`.
Check that the range fits in `int32`.
Check that the number of elements is less than or equal to
`TRITON_MAX_TENSOR_NUMEL = 131072`.
---------
Co-authored-by: Philippe Tillet <phil@openai.com>
This pull request addresses a crash that occurs when casting to a
`tl.constexpr` type in the frontend.
More info and repro code available in:
https://github.com/openai/triton/issues/1221
Make CMake happier; it doesn't like multiple `target_link_libraries`
definitions for the same target.
Use `find_package` for dlfcn-win32 on Windows instead.
Set `LLVM_SYS_PATH` on Windows for the Python setup.
Debug build is almost working; an AlwaysCreate error is still thrown.