Mirror of https://github.com/tinygrad/tinygrad.git (synced 2026-01-10 15:38:29 -05:00)
Make Triton work again (#1547)
* Move ops_triton to runtime and remove errors from deprecated code
* Remove deprecated AST Kernel
* Remove deprecated buffer
* Add TritonProgram
* Triton Buffer
* Use RawCUDABuffer
* triton_compile
* Added new parameter
* pass _buf to program
* remove deprecated include
* Added triton tests
* Deprecated includes removed
* remove double print
* Disable float4 support
* Disable float4 support
* variable load fix
* Track local size
* Add pycuda to triton dependencies
* Merge test.yml
* install cuda packages for testing
* merge double package install
* remove emulated from triton tests
* upscale local index to power of 2 and add masking
* cuda envs
* Add TernaryOps
* ConstOp loading
* proper function name
* remove deprecated variables
* get global program from name
* const ops match local shape
* Enable test_nn
* remove deprecated import
* fix linter error
* Add wait logic
* Add local size override
* accumulate local shapes instead of using max shape
* Merge triton tests into global tests
* fix envs in testing
* Old testing routine
* split file into renderer and program
* remove print and starting whitespace
* pretty ptx print on debug 5
* linter errors
* ignore triton saturation tests
* ignore test example
* remove pytorch cpu extra index
* Add triton to existing testing routine
* use triton tests
* disable cuda backend in triton tests
* use cudacpu in tests
* print used device
* Print device default
* Remove print
* ensure we are running triton backend
* update variable signatures
* update dtypes for load
* infinity render fixed
* limit global size
* negative infinity now properly rendered
* split chain with parentheses for and node
* Add option to disable shared memory, disable for triton
* missing import
* Properly index and mask conditional load
* use mask only if not loading a block pointer
* nan support
* fix symbolic tests to include chain split
* proper masking for stores
* Implemented bool dtype
* Add mod
* fix loads for variables with valid range
* merge triton with cuda runtime
* merge from master
* run triton tests with cuda
* Correct target when running from triton
* conftest with triton compiler config
* use triton nightly
* verbose tests for triton
* capture stdout
* fix function depth when exiting multiple loops
* add render valid function for readability
* fix mask for local loops
* add _arg_int32 datatype
* fix dims for conditional loads
* enable non float stores
* correct variable dtypes
* fix type for arg_int32
* remove junk
* Added get max function for range based var.max
* remove deprecated code
* Fix triton ptxas path
* Fix testing for CI
* clamp local size by max local size instead of always running max
* Disable matmul test in triton cpu
* rerun tests
* Disable broken test in triton cpu
* whitespace removed
* rerun tests again
* Disable TestSymbolicOps for triton
* update to new uops
* linter fix
* ignore test/extra
* linting fix
* Update tinygrad/renderer/triton.py (Co-authored-by: Gijs Koning <gijs-koning@live.nl>)
* remove deprecated line
* quotes type fix
* linter
* Remove unnecessary lines
* UnaryOps.NEG
* don't define constants
* Linting fix
* Disable tests that are broken in ocelot
* remove trailing whitespace
* reduce line count
* linting fix
* update to new uast
* New looping style
* Update to new uast
* make AST runner work with triton
* linting fix
* set renderer var for testing
* disable local for ocelot
* reenable all tests for ocelot
* Pass shared to cuda
* Don't group if the backend doesn't support shared mem
* use working gpuocelot branch
* enable all tests
* enable local for ocelot
* cleanup
* Update test.yml
* update cache key
* reenable test symbolic and extra
* Update test.yml
* Revert "Update test.yml" (rerun tests). This reverts commit 98c0630ee5.
* Revert "fix symbolic tests to include chain split". This reverts commit 22a9a4c9cd.
* Revert "split chain with parentheses for and node". This reverts commit 7499a7004e.
* use global size from linearizer
* rename newvar to dtype to match other renderers
* join program start lines
* simplify code that adds axis to local dims
* assign r[u] in ssa
* We no longer need to replace target in src
* we no longer need to cast indices to int by hand
* Update triton.py (rerun tests)
* Update triton.py (rerun tests)
* Update triton.py (rerun tests)

---------

Co-authored-by: Gijs Koning <gijs-koning@live.nl>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
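To try the new backend outside CI, a minimal local sketch (the extras, index URLs, env values, and pytest flags are all copied from the workflow diff below; it assumes either a real CUDA setup or the gpuocelot CUDACPU emulator built in that workflow):

  # install the test dependencies plus the triton extra, pulling Triton from the nightly index
  pip install -e '.[testing,triton]' --extra-index-url https://download.pytorch.org/whl/cpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/
  # mirror the CI environment: forward-only tests through the Triton renderer on the CUDA runtime
  FORWARD_ONLY=1 JIT=1 OPT=2 CUDA=1 CUDACPU=1 TRITON=1 TRITON_PTXAS_PATH=/usr/bin/ptxas \
    python -m pytest -n=auto test/ -k 'not (half or test_efficientnet_safetensors) and not (test_conv2d and test_tensor.py)' -m 'not exclude_cuda' --ignore=test/external --ignore=test/models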
.github/workflows/test.yml (vendored, 26 changed lines)
@@ -210,7 +210,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        backend: [llvm, clang, gpu, cuda] #, ptx]
+        backend: [llvm, clang, gpu, cuda, triton] #, ptx]
 
     name: Tests on (${{ matrix.backend }})
     runs-on: ${{ matrix.backend == 'gpu' && 'ubuntu-20.04' || 'ubuntu-latest' }}
@@ -229,7 +229,7 @@ jobs:
         path: ${{ env.Python3_ROOT_DIR }}/lib/python3.11/site-packages
         key: ${{ matrix.backend }}-packages-${{ hashFiles('*/setup.py') }}
     - name: Set env
-      run: printf "${{ matrix.backend == 'llvm' && 'ENABLE_METHOD_CACHE=1\nLLVM=1' || matrix.backend == 'clang' && 'CLANG=1\nENABLED_METHOD_CACHE=1' || matrix.backend == 'gpu' && 'GPU=1' || matrix.backend == 'cuda' && 'FORWARD_ONLY=1\nJIT=1\nOPT=2\nCUDA=1\nCUDACPU=1\n' || matrix.backend == 'PTX' && 'FORWARD_ONLY=1\nJIT=1\nOPT=2\nCUDA=1\nCUDACPU=1\nPTX=1' }}" >> $GITHUB_ENV
+      run: printf "${{ matrix.backend == 'llvm' && 'ENABLE_METHOD_CACHE=1\nLLVM=1' || matrix.backend == 'clang' && 'CLANG=1\nENABLED_METHOD_CACHE=1' || matrix.backend == 'gpu' && 'GPU=1' || matrix.backend == 'cuda' && 'FORWARD_ONLY=1\nJIT=1\nOPT=2\nCUDA=1\nCUDACPU=1\n' || matrix.backend == 'PTX' && 'FORWARD_ONLY=1\nJIT=1\nOPT=2\nCUDA=1\nCUDACPU=1\nPTX=1' || matrix.backend == 'triton' && 'FORWARD_ONLY=1\nJIT=1\nOPT=2\nCUDA=1\nCUDACPU=1\nTRITON=1\nTRITON_PTXAS_PATH=/usr/bin/ptxas' }}" >> $GITHUB_ENV
     - name: Find faster apt mirror
       # uses: vegardit/fast-apt-mirror.sh@v1
     # - name: Install packages (gpu)
@@ -240,40 +240,39 @@ jobs:
         sudo apt update -y
         sudo apt install -y --no-install-recommends intel-oneapi-runtime-compilers intel-oneapi-runtime-opencl
     - name: Install packages (cuda)
-      if: matrix.backend == 'cuda' || matrix.backend == 'ptx'
+      if: matrix.backend == 'cuda' || matrix.backend == 'ptx' || matrix.backend == 'triton'
       run: |
         sudo apt update -y
         sudo apt install -y --no-install-recommends git g++ cmake ninja-build llvm-15-dev zlib1g-dev libglew-dev flex bison libfl-dev libboost-thread-dev libboost-filesystem-dev nvidia-cuda-toolkit-gcc
     - name: Cache gpuocelot
-      if: matrix.backend == 'cuda' || matrix.backend == 'ptx'
+      if: matrix.backend == 'cuda' || matrix.backend == 'ptx' || matrix.backend == 'triton'
       id: cache-build
       uses: actions/cache@v3
       env:
         cache-name: cache-gpuocelot-build
       with:
         path: ${{ github.workspace }}/gpuocelot/ocelot
-        key: ubuntu22.04-gpuocelot-19626fc00b6ee321638c3111074269c69050e091
+        key: ubuntu22.04-gpuocelot-szymonozog-tinygrad
     - name: Clone/compile gpuocelot
-      if: (matrix.backend == 'cuda' || matrix.backend == 'ptx') && steps.cache-build.outputs.cache-hit != 'true'
+      if: (matrix.backend == 'cuda' || matrix.backend == 'ptx' || matrix.backend == 'triton') && steps.cache-build.outputs.cache-hit != 'true'
       run: |
-        git clone --recurse-submodules https://github.com/gpuocelot/gpuocelot.git ${{ github.workspace }}/gpuocelot
+        git clone --recurse-submodules --single-branch --branch tinygrad https://github.com/SzymonOzog/gpuocelot.git ${{ github.workspace }}/gpuocelot
         cd ${{ github.workspace }}/gpuocelot/ocelot
-        git checkout 19626fc00b6ee321638c3111074269c69050e091
         mkdir build
         cd build
         cmake .. -Wno-dev -G Ninja -DOCELOT_BUILD_TOOLS=OFF
         ninja
     - name: Install gpuocelot
-      if: matrix.backend == 'cuda' || matrix.backend == 'ptx'
+      if: matrix.backend == 'cuda' || matrix.backend == 'ptx' || matrix.backend == 'triton'
       run: |
         cd ${{ github.workspace }}/gpuocelot/ocelot/build
         sudo ninja install
     - name: Install dependencies
-      run: pip install -e '.[testing${{matrix.backend=='llvm'&&',llvm'||matrix.backend=='cuda'&&',cuda'||matrix.backend=='ptx'&&',cuda'||''}}]' --extra-index-url https://download.pytorch.org/whl/cpu
+      run: pip install -e '.[testing${{matrix.backend=='llvm'&&',llvm'||matrix.backend=='cuda'&&',cuda'||matrix.backend=='ptx'&&',cuda'||matrix.backend=='triton'&&',triton'||''}}]' --extra-index-url https://download.pytorch.org/whl/cpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/
     - name: Check Device.DEFAULT
-      run: python -c "from tinygrad.ops import Device; assert Device.DEFAULT in ['LLVM','CLANG','CUDA','GPU'], Device.DEFAULT"
+      run: python -c "from tinygrad.lazy import Device; assert Device.DEFAULT in ['LLVM','CLANG','CUDA','GPU'], Device.DEFAULT"
     - name: Run pytest (not cuda)
-      if: matrix.backend!='cuda' && matrix.backend!='ptx'
+      if: matrix.backend!='cuda' && matrix.backend!='ptx' && matrix.backend!='triton'
       run: python -m pytest -n=auto test/ -k '${{matrix.backend=='llvm'&&'not (test_nn.py and test_conv_transpose2d)'||'test'}}' -m 'not exclude_${{matrix.backend}}'
     - name: Run pytest (cuda)
       if: matrix.backend=='cuda'
@@ -281,6 +280,9 @@ jobs:
     - name: Run pytest (ptx)
       if: matrix.backend=='ptx'
       run: python -m pytest -n=auto test/ -k 'not (half or test_efficientnet_safetensors) and not (test_conv2d and test_tensor.py)' -m 'not exclude_cuda' --ignore=test/external --ignore=test/models
+    - name: Run pytest (triton)
+      if: matrix.backend=='triton'
+      run: python -m pytest -n=auto test/ -k 'not (half or test_efficientnet_safetensors) and not (test_conv2d and test_tensor.py)' -m 'not exclude_cuda' --ignore=test/external --ignore=test/models
 
   testunicorn:
     name: ARM64 unicorn Test
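The "Set env" step above works because GitHub Actions re-exports any KEY=VALUE lines appended to the $GITHUB_ENV file as environment variables for every later step in the job. A minimal sketch of what the triton branch of that printf expression expands to (values copied from the hunk above):

  # each line appended to $GITHUB_ENV becomes an env var for subsequent steps
  printf 'FORWARD_ONLY=1\nJIT=1\nOPT=2\nCUDA=1\nCUDACPU=1\nTRITON=1\nTRITON_PTXAS_PATH=/usr/bin/ptxas' >> $GITHUB_ENV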
||||