- Note `wheel` as a build-time dependency.
- Add tips for getting a faster build.
- Add instructions for running tests.
- Add flag to build with ccache.
(Thanks to @ThomasRaoux for most of these instructions!)
The current setup.py could not clean the build directory because the default
build directory has been changed in `CMakeBuild`. This PR makes the clean
step remove that build directory.
- These minor fixes are not specific to interface changes from LLVM main
or the official llvm-17 branch and can be applied on the triton main branch.
- https://github.com/darkbuck/triton/tree/darkbuck/main/llvm-main-branch
has extra changes to build against an LLVM main branch build, which lets me
work on other backends on the main branch only. That's a hobby effort
and just FYR.
The initial code merge of Nvidia Hopper feature support. Please be
aware that the code merge is not finished yet and troubleshooting
is still ongoing. The new hardware features (GMMA, TMA, STMATRIX, etc.)
and automatic warp specialization are experimental for now and turned
off by default. Trying them out is recommended once version 3.0 is
released.
The work is contributed by:
ben-zhang-609, bealwang, donproc, qliu93, jsh20, allatit23, LyricZhao,
ivanyinwz, goostavz & yangjunpro
from Nvidia, in cooperation with:
ptillet, Jokeren, ThomasRaoux & zahimoud
from OpenAI.
Co-authored-by: Goostav Zhu <gzhu@nvidia.com>
A third-party backend might install its Python package into the
`triton/third_party` Python package during the build process. But
`build_py` could be executed before `build_ext`, in which case `build_py`
would only copy the `packages` defined in `setup.py`, without the
third-party packages, because the third-party backend (whose build is
triggered by `build_ext`) has not been built yet. Therefore, this PR
refines the build order to ensure `build_ext` always runs before
`build_py`.
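
A minimal sketch of this ordering with setuptools, assuming a custom `build_py` subclass is registered through `cmdclass`. The class name `BuildPyAfterExt` is hypothetical, and the PR's actual implementation may differ:

```python
from setuptools.command.build_py import build_py


class BuildPyAfterExt(build_py):
    """Hypothetical build_py command that forces build_ext to run first."""

    def run(self):
        # Build the extensions first so that third-party backends can place
        # their Python packages before any package files are copied.
        self.run_command("build_ext")
        super().run()
```

It would be wired up with something like `setup(cmdclass={"build_py": BuildPyAfterExt})`. Since `run_command` executes a given command at most once per build, a later explicit `build_ext` step becomes a no-op.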
This depends on a [pending LLVM
release](https://github.com/ptillet/triton-llvm-releases/pull/10).
* Implement setCalleeFromCallable in CallOp.
* Cast type to ShapedType for various getters.
* Improve TritonDialect::materializeConstant due to breaking change in
constructor of arith::ConstantOp.
* Add OpaqueProperties argument in inferReturnTypes.
Co-authored-by: Philippe Tillet <phil@openai.com>
Simple mechanism to run Triton kernels on PyTorch for debugging purposes
(upstreamed from Kernl).
Todo:
- random grid iteration
- support of atomic ops
- more unit tests
- cover new APIs?
Fixes #1545
`build_temp` is a temporary directory which `distutils` used to keep in
the `./build` directory, but when `pyproject.toml` is present `pip` now
puts it in `/tmp` and removes it at the end of the build.
Instead, this creates a new permanent directory like
`python/build/cmake.linux_x86_64-cpython-3.8` (the old name but with
cmake instead of temp).
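
As a rough illustration, such a directory could be derived from the platform and interpreter version as below. This is a sketch under assumptions: the helper name `get_cmake_dir` and the `base_dir` parameter are hypothetical, and the exact spelling of the platform tag depends on what `sysconfig` reports:

```python
import sys
import sysconfig
from pathlib import Path


def get_cmake_dir(base_dir):
    """Return a persistent, per-platform CMake build directory (hypothetical helper)."""
    plat_name = sysconfig.get_platform()
    python_version = sysconfig.get_python_version()
    # Same shape as the old build_temp name, but with "cmake" instead of "temp".
    dir_name = f"cmake.{plat_name}-{sys.implementation.name}-{python_version}"
    cmake_dir = Path(base_dir) / "python" / "build" / dir_name
    # Create it up front; unlike pip's temporary build_temp, it is never removed.
    cmake_dir.mkdir(parents=True, exist_ok=True)
    return cmake_dir
```

Because the directory lives under the source tree and not under `/tmp`, CMake's cache can be reused across builds.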
While I was looking at the verbose pip output, I also noticed a bunch of
warnings like
```
Python recognizes 'triton/runtime.backends' as an importable package,
but it is not listed in the `packages` configuration of setuptools.
'triton/runtime.backends' has been automatically added to the distribution only
because it may contain data files, but this behavior is likely to change
in future versions of setuptools (and therefore is considered deprecated).
```
So I've also added these to the packages list.
---------
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
This reduces build time with an assertions-enabled LLVM and
dramatically speeds up Triton's build with a "debug" LLVM.
Co-authored-by: Philippe Tillet <phil@openai.com>
The `triton/compiler`, `triton/runtime/driver`, and `triton/third_party`
subpackages were missing from the distribution built with the old
`setup.py` after #1464, causing an immediate error upon importing Triton
with a non-editable installation. This change adds the missing Python
subpackages and moves `triton/third_party` inclusion to `MANIFEST.in`,
where it will automatically be included in wheels due to the existing
`include_package_data` setup flag.
The purpose of this PR is to remove some circular dependencies and
separate concerns better in the frontend. It's still not perfect --
`triton.compile` still includes a few runtime architecture-specific
components, but it is at least much better than before.
This PR still assumes that AMD only supports empty kernels right now.
Other PRs will follow to make the frontend support multiple devices in
a more modular way.
On some machines, the amount of available RAM might not be enough to
compile Triton with `2 * num_cpus` parallelism. For example, CircleCI's
`large` instance can't handle Triton compilation as is due to
insufficient memory.
Instead, I propose to take PyTorch's approach where we can define a
[`MAX_JOBS` env
var](0e4ddc2b40/tools/setup_helpers/cmake.py (L366-L368))
that gives the user the possibility to reduce (or increase) the
parallelism during compilation.
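
A minimal sketch of how the job count might be resolved, assuming the `2 * num_cpus` default described above is kept when `MAX_JOBS` is unset. The helper name `get_max_jobs` and its `environ` parameter are hypothetical:

```python
import os


def get_max_jobs(environ=None):
    """Return the build parallelism, honoring MAX_JOBS when it is set."""
    env = os.environ if environ is None else environ
    value = env.get("MAX_JOBS")
    if value:
        # An explicit user override, e.g. MAX_JOBS=4 on a memory-constrained CI box.
        return int(value)
    # Fallback: the previous default of two jobs per CPU.
    return 2 * (os.cpu_count() or 1)
```

Setting `MAX_JOBS=4` in the environment before starting the build would then cap compilation at four parallel jobs.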
Co-authored-by: Philippe Tillet <phil@openai.com>
When the user sets `LLVM_SYSPATH` to use a custom-built LLVM, the build
throws an error because there is no version.txt under the custom build
directory.
This PR skips the version check if `LLVM_SYSPATH` is set.
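
The intended behavior can be sketched roughly as follows. The function name `llvm_is_usable` and its parameters are hypothetical, and the real check in `setup.py` may be structured differently:

```python
import os
from pathlib import Path


def llvm_is_usable(llvm_dir, expected_version, environ=None):
    """Return True if the LLVM in llvm_dir may be used (hypothetical helper)."""
    env = os.environ if environ is None else environ
    if env.get("LLVM_SYSPATH"):
        # A user-provided LLVM build has no version.txt, so trust it as-is.
        return True
    version_file = Path(llvm_dir) / "version.txt"
    # For a downloaded LLVM, require a version.txt matching the expected version.
    return (
        version_file.is_file()
        and version_file.read_text().strip() == expected_version
    )
```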
---------
Co-authored-by: Philippe Tillet <phil@openai.com>
Make CMake happier: it doesn't like multiple target_link_libraries
definitions for the same name.
Use find_package instead on Windows for dlfcn-win32.
Set LLVM_SYS_PATH on Windows for python setup.
Debug build is almost working; an AlwaysCreate error is still thrown.