Philippe Tillet
269ebc12e5
[PYTHON][TESTS][DOC] Various improvements to the API and code quality:
* Simplified `triton.kernel` API to achieve lower latency:
> .data_ptr() must now be passed as a kernel argument; torch.tensor is no longer implicitly converted
> compilation options are now constant attributes, i.e., opt.d('VAR') becomes opt.VAR
> torch.device must now be passed explicitly to triton.kernel (no longer inferred from torch.tensor arguments)
* C++ tests moved to `python/tests/`
* C++ tutorial created in `tutorials/`
* Python tutorial created in `python/tutorials/`
* Version changed to 1.0alpha
* No longer copying C++ headers into the Python package
* Added `python/triton/ops/` package for pre-written Triton ops
2021-07-27 12:38:48 -07:00
Philippe Tillet
083bbd1e8d
[GENERAL] Merged v1.0alpha into master. Added features are:
- A100 support via mma.16816
- Thread swizzling for conflict-free shared memory accesses without
padding
- Complete overhaul of the LLVM code generation in
codegen/selection/generator.cc to remove overengineering
- Added debugging capabilities in the Python binding
- Compilation error for kernels that spill
2021-07-27 12:38:48 -07:00
Philippe Tillet
c0bc7ed8b0
[PYTHON] Added TRITON_DEBUG_MODE which reallocates input tensors outside of the pytorch memory pool to spot out-of-bounds accesses more easily
2021-07-27 12:38:48 -07:00
Philippe Tillet
a77c925dfd
[DRIVER] Improved performance of Host driver code
2021-07-27 12:38:48 -07:00
Philippe Tillet
8f8d36c7a4
[GENERAL] Various bugfixes
2021-07-27 12:38:48 -07:00
Philippe Tillet
8f3ee53f24
[PYTHON] Added option to show PTX source code in Python
2021-07-27 12:38:48 -07:00
Philippe Tillet
f152150e7d
[LANG] Added log intrinsic
2021-07-27 12:38:48 -07:00
Philippe Tillet
02a6e81b88
[PYTHON] Cleaning C++ bindings
2021-07-27 12:38:48 -07:00
Philippe Tillet
049ab989b5
[GENERAL] Various improvements:
* Sparse einsum in triton.ops.einsum
* Hacky support for fixed-tile-size atomic-add
* Various bugfixes in parser
2021-07-27 12:38:48 -07:00
Philippe Tillet
840308ab5d
[CODEGEN] More work on the CPU backend
2021-07-27 12:38:48 -07:00
Philippe Tillet
acff1b5e05
[RUNTIME] Lower-level interface for executing functions
2021-07-27 12:38:48 -07:00
Philippe Tillet
ba9955ae39
[CODEGEN][ANALYSIS] Fixed issue in layout inference
2021-07-27 12:38:48 -07:00
Philippe Tillet
c33d6d15f5
[TRITON][PYTHON] Reverted to distutils
2021-07-27 12:38:48 -07:00
Philippe Tillet
955b027103
[TRITON][KERNEL] Fixed issue for concurrent compilation of torch extensions
2021-07-27 12:38:48 -07:00
Philippe Tillet
f35b9100e2
[PYTHON] Restored compatibility with powerpc
2021-07-27 12:38:48 -07:00
Philippe Tillet
1426b103e9
[PYTHON] Removed -std=gnu++11 in extra_cflags
2021-07-27 12:38:48 -07:00
Philippe Tillet
04a9ea060b
[GENERAL] Added compatibility with pytorch 1.2.0 and powerpc
2021-07-27 12:38:48 -07:00
Philippe Tillet
609ef3a24d
[CORE] Fixed bug for Multi-GPU
2021-07-27 12:38:48 -07:00
Philippe Tillet
5bb977173f
[PYTHON][EINSUM] Re-established auto-tuning
2021-07-27 12:38:48 -07:00
Philippe Tillet
4ae0e28b32
[PYTHON][KERNEL] Added thread-safety when caching custom torch op
2021-07-27 12:38:48 -07:00
Philippe Tillet
94e8ee7f01
[PYTHON][KERNEL] Better handling of case where cache directory already exists
2021-07-27 12:38:48 -07:00
Philippe Tillet
3304629de9
[CORE] Fixed several issues that arose in the development of the torch-blocksparse package:
* Now using warp shuffle in reductions when possible
* Various bugfixes in layout inference
* Added INFINITY, exponential and select
* Better error messages for unimplemented constructs
2021-07-27 12:38:48 -07:00
Philippe Tillet
268894a5ce
[PYTHON] Merged blocksparse branch:
* Example for blocksparse matrix multiplication
* Simplified Triton kernel API
* Revived auto-tuning in einsum
2021-07-27 12:38:48 -07:00
Philippe Tillet
ea37ba5d35
[PYTHON][OPS] Fixed typo in einsum
2021-07-27 12:38:48 -07:00
Philippe Tillet
926acc2e28
[TRITON][NN][CONV] Renamed input -> x to avoid shadowing the built-in function
2021-07-27 12:38:48 -07:00
Philippe Tillet
420e36a038
[PYTHON][NN][CONV] Fixed typo in dx computation
2021-07-27 12:38:48 -07:00
Philippe Tillet
ecb0d81b2d
[PYTHON] Added missing files for nn submodule
2021-07-27 12:38:48 -07:00
Philippe Tillet
3d769b57e2
[PYTHON] Better packaging
2021-07-27 12:38:48 -07:00
Philippe Tillet
dfb844bf41
[GENERAL] Improved caching mechanism:
* Now computing hash in libtriton
* Now only compiling a single pytorch hook per function signature
2021-07-27 12:38:48 -07:00
Philippe Tillet
30f77e9ec5
[PYTHON][OPS][EINSUM] Now throwing error for automatic differentiation of extended einsum
2021-07-27 12:38:48 -07:00
Philippe Tillet
4e50ef4076
[PYTHON][OP][EINSUM] Simplified API
2021-07-27 12:38:48 -07:00
Philippe Tillet
26fd884d96
[PYTHON][OPS][EINSUM] Added support for inner tensor strides
2021-07-27 12:38:48 -07:00
Philippe Tillet
4181f9f2af
[CODEGEN][TRANSFORM][PEEPHOLE] Fixed bug in *1 multiplication
2021-07-27 12:38:48 -07:00
Philippe Tillet
3816f2f259
[PYTHON][EINSUM] Now handling reduction sizes that are not a multiple of TK
2021-07-27 12:38:48 -07:00
Philippe Tillet
fa4ec7ea65
[PYTHON][OPS][EINSUM] Added support for masked accumulator
2021-07-27 12:38:48 -07:00
Philippe Tillet
404dd18333
[PYTHON][CORE] Deprecating Tensorflow support
2021-07-27 12:38:48 -07:00
Philippe Tillet
558422c18a
[PYTHON][EXAMPLES] Changed shape of einsum examples
2021-07-27 12:38:48 -07:00
Philippe Tillet
6d7cf35123
History prior to this date belonged to the now-deprecated ISAAC project and was deleted to save space
2021-07-27 12:38:38 -07:00