This exposes `semantic.expand_dims` in the public API and builds upon it
with support for expanding multiple dimensions at once. e.g.
```python
tl.expand_dims(tl.arange(0, N), (0, -1)) # shape = [1, N, 1]
```
Compared to indexing with `None`, this API is useful because the
dimensions can be constexpr values rather than hard-coded into the
source. As a basic example
```python
@triton.jit
def max_keepdim(value, dim):
res = tl.max(value, dim)
return tl.expand_dims(res, dim)
```
`tl.reduction` is currently tested indirectly through the existing
reduction operators, but it's good to have a direct test for the
function itself.
---------
Co-authored-by: Philippe Tillet <phil@openai.com>
This generates identical PTX for floating point, but for integer types
the resulting PTX is much better. For example `tl.abs` for int16
currently generates
```mlir
cvt.s32.s16 %r1, %rs2;
neg.s16 %rs4, %rs2;
setp.lt.s32 %p4, %r1, 0;
selp.b16 %rs3, %rs4, %rs2, %p4;
```
After, it becomes a single `abs.s16` instruction.
This also improves LLVM's ability to optimize floats. e.g. `abs(t) *
abs(t)` is optimized to `t * t` now which didn't happen before.
---------
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
This PR;
- Fixes syntax errors like `.type values: dict[str,
Callable[[list[Any]], Any]]` to `:type values: dict[str,
Callable[[list[Any]], Any]]`,
- Fixes typos,
- Fixes formatting like `k ++` to ` k++`,
- Increases consistency (e.g. by transforming the minority `cd dir/` to
the majority `cd dir`).
- Added docstr for autotune, Config, heuristics
- Added docstr for atomics
- Hiding internal _builder argument used for built-in language primitives
- Re-factor docstr to use common templates between similar functions.