Tutorials
=========
Below is a gallery of tutorials for writing various basic operations with Triton. We recommend reading them in order, starting with the simplest one.

To install the dependencies for the tutorials:

.. code-block:: bash

    cd triton
    pip install -e './python[tutorials]'