nv driver (#4044)

* start
* fix err 93
* gpu
* ioctl mappings
* alloc like cuda
* semaphores
* wait for semaphores value
* start ops_nv
* very simple kernels work
* init several gpus
* qmd dumper
* dirty, but most of kernels work
* always all test_ops
* progress, more tests, stable
* test_ops passes, gpt2 works but with big fifo, wrap of fifo doesn't work, i think it's something coherency related
* need better sync
* fix sync
* alloc2
* all tests pass!
* cleanup 1
* cleanup
* multigpu, simple transfer
* fix sync
* correct init
* nv_gpu autogen + sync bug fix
* clean extra/nv_gpu_driver
* p2p
* clean up
* remove old gen
* small fixes
* cleanup
* cleanup 2
* small fixes
* bigger queue size
* cleanups
* wait
* fixed signals for devs
* fix hang + parallel beam
* small fixes
* detect when local memory is big in kernel
* correct assert
* small fixes
* correct tls size est
* one va space
* less lines
* shorter
* save 2 lines
* save some lines
* remove type ignores

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-01-09 15:08:02 -05:00 · 2024-04-22 18:50:20 +03:00
parent 77a3780005
commit e6227bdb15
10 changed files with 11067 additions and 3430 deletions
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -458,8 +458,11 @@ jobs:
        if: matrix.backend == 'cuda'
        run: |
          cp tinygrad/runtime/autogen/cuda.py /tmp/cuda.py.bak
+          cp tinygrad/runtime/autogen/nv_gpu.py /tmp/nv_gpu.py.bak
          ./autogen_stubs.sh cuda
+          ./autogen_stubs.sh nv
          diff /tmp/cuda.py.bak tinygrad/runtime/autogen/cuda.py
+          diff /tmp/nv_gpu.py.bak tinygrad/runtime/autogen/nv_gpu.py
      - name: Verify HIP autogen
        if: matrix.backend == 'hip'
        run: |