- Significant simplification of the optimizer pipeline. The right MMA
version is now set directly after the coalescing pass. DotOperand
layouts no longer hold an `isRow` state and instead query it from
their parent.
- Moved several helpers from TritonGPUToLLVM/DotOpHelpers to
TritonGPUAttrDefs. All MMAv1 state is now queried from attributes.
- Logic for `getElemsPerThread` is no longer duplicated in the TypeConverter.
`_triton.runtime.num_sm`, `_triton.runtime.clock_rate`, and
`_triton.runtime.cc` no longer seem to exist; use the corresponding
methods from `get_max_tensorcore_tflops` in the same file instead.
* Cleaned up the pipeline pass. It now works when there are
elementwise ops between the load and the dot (see the sketch below).
* Made `splat` compatible with variables that have a DotOperandLayout.
* Moved rematerialization utils to a separate Transforms/Utility.cpp file.
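
To illustrate the pipelining fix above, here is a minimal, hypothetical
Triton kernel (illustrative names; it computes only the first output tile
and assumes K is a multiple of BLOCK_K) where an elementwise multiply sits
between the loads and the dot inside the K-loop, a pattern the pipeliner
previously could not handle:

```python
import triton
import triton.language as tl

@triton.jit
def matmul_scaled(a_ptr, b_ptr, c_ptr, scale, M, N, K,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                  BLOCK_K: tl.constexpr):
    # Single-tile sketch: offsets for the first BLOCK_M x BLOCK_N tile.
    offs_m = tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    acc = tl.zeros([BLOCK_M, BLOCK_N], dtype=tl.float32)
    for k in range(0, K, BLOCK_K):
        a = tl.load(a_ptr + offs_m[:, None] * K + (k + offs_k)[None, :])
        b = tl.load(b_ptr + (k + offs_k)[:, None] * N + offs_n[None, :])
        a = a * scale  # elementwise op between the load and the dot
        acc += tl.dot(a, b)
    tl.store(c_ptr + offs_m[:, None] * N + offs_n[None, :], acc)
```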
* Frontend:
- `int` kernel arguments are always signed
- The loop induction variable is now determined by integer promotion
on lb/ub/step
* Optimizer:
- Added a new ExtractSliceOp that enforces 32-bit offsets
* Backend:
- Use 64-bit indices when lowering functions and control flow
- Removed `idx_val` macro and replaced it with `i32_val`
- Cleaned up comments
- Added a new ArithToIndex pass to ensure that operations on indices
are done with the `index` dialect, which is converted to LLVM
separately using a 64-bit target
Per issue https://github.com/openai/triton/issues/1228, I believe we
are potentially exposed when a Triton executor (PyTorch, for example)
links in two or more `triton_.so` shared objects and each has a stub
for `_launch`.
This fix ensures the `_launch` function is bound locally to the calling
`__triton_launcher` and cannot be misused by another library.
Python 3.10 changes where packages are installed by default, causing
problems on Ubuntu with paths under `/usr/local`. See
[this](https://lists.debian.org/debian-python/2022/03/msg00039.html) and
[this](https://bugs.launchpad.net/ubuntu/+source/python3.10/+bug/1967920).
Triton seems to break under 3.10 because it looks for the Python
headers, but they are not under `/usr/local`; e.g. they are at
`/usr/include/python3.X` rather than `/usr/local/include/python3.X`.
Not 100% sure what's going on here since it's deep in python/pip, but I
think this should fix it. Otherwise, you have to hack around it in
Dockerfiles, e.g. `ENV DEB_PYTHON_INSTALL_LAYOUT=deb`, which breaks
things with the latest pip release.
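
One way to check where a given interpreter actually expects its C headers
(purely a diagnostic sketch, not part of this change):

```python
import sysconfig

# Directories where this interpreter expects Python.h to live; on
# Debian/Ubuntu Python 3.10 these point at /usr/include/python3.10
# rather than /usr/local/include/python3.10.
print(sysconfig.get_path("include"))
print(sysconfig.get_path("platinclude"))
```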
---------
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
Fix issue https://github.com/openai/triton/issues/244:
- Check that `end` is greater than `start`.
- Check that the range fits in `int32`.
- Check that the number of elements is less than or equal to
`TRITON_MAX_TENSOR_NUMEL = 131072`.
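
A minimal sketch of the three checks (the helper name is hypothetical; the
real validation lives in the Triton frontend and may differ in wording and
placement):

```python
TRITON_MAX_TENSOR_NUMEL = 131072

def check_range(start: int, end: int) -> None:
    # 1. end must be strictly greater than start.
    if end <= start:
        raise ValueError(f"end ({end}) must be greater than start ({start})")
    # 2. Both bounds must fit in a signed 32-bit integer.
    if start < -2**31 or end > 2**31 - 1:
        raise ValueError("range must fit in int32")
    # 3. The element count is capped at TRITON_MAX_TENSOR_NUMEL.
    if end - start > TRITON_MAX_TENSOR_NUMEL:
        raise ValueError(
            f"range produces more than {TRITON_MAX_TENSOR_NUMEL} elements")
```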
---------
Co-authored-by: Philippe Tillet <phil@openai.com>
This pull request addresses a crash that occurs when casting to a
`tl.constexpr` type in the frontend.
More info and repro code are available in
https://github.com/openai/triton/issues/1221
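
For context, the crashing pattern was of roughly the following shape (a
hypothetical minimal sketch; see the linked issue for the actual repro):

```python
import triton
import triton.language as tl

@triton.jit
def cast_kernel(x_ptr, y_ptr, DTYPE: tl.constexpr):
    x = tl.load(x_ptr)
    # Casting to a dtype held in a tl.constexpr used to crash the frontend.
    y = x.to(DTYPE)
    tl.store(y_ptr, y)
```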
Make CMake happier; it doesn't like multiple `target_link_libraries`
definitions for the same name.
Use `find_package` instead on Windows for dlfcn-win32.
Set LLVM_SYS_PATH on Windows for the Python setup.
Debug build almost working; an AlwaysCreate error is still thrown.
Minor bug: the AutoTuner currently throws the following error when
certain configs go `OutOfResources` (e.g. the matmul example when
testing on GPUs with less shared memory).