Tutorials
=========

Below is a gallery of tutorials for writing various basic operations with Triton. Read them in order, starting with the simplest one.

To install the dependencies for the tutorials:

.. code-block:: bash

    cd triton
    pip install -e './python[tutorials]'
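
After installing, it can help to confirm that the packages the tutorials rely on are importable before running them. The following is a minimal sketch, not part of the Triton repository: the helper ``check_packages`` is hypothetical, and the package list is an assumption about what the ``tutorials`` extra pulls in, so adjust it to match your checkout.

.. code-block:: python

    import importlib.util

    # Assumed package list (not taken from this README); edit as needed.
    ASSUMED_PACKAGES = ("triton", "torch", "matplotlib", "pandas", "tabulate")


    def check_packages(names=ASSUMED_PACKAGES):
        """Map each top-level package name to True if it can be found."""
        return {name: importlib.util.find_spec(name) is not None for name in names}


    if __name__ == "__main__":
        # Print one status line per package.
        for name, found in check_packages().items():
            print(f"{name}: {'ok' if found else 'MISSING'}")

A package reported as ``MISSING`` usually means the install step above did not complete in the active Python environment.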