* start

* fix err 93

* gpu

* ioctl mappings

* alloc like cuda

* semaphores

* wait for semaphore value

* start ops_nv

* very simple kernels work

* init several gpus

* qmd dumper

* dirty, but most of kernels work

* always all test_ops

* progress, more tests, stable

* test_ops passes, gpt2 works

but with big fifo, wrap of fifo doesn't work, i think it's something coherency related

* need better sync

* fix sync

* alloc2

* all tests pass!

* cleanup 1

* cleanup

* multigpu, simple transfer

* fix sync

* correct init

* nv_gpu autogen + sync bug fix

* clean extra/nv_gpu_driver

* p2p

* clean up

* remove old gen

* small fixes

* cleanup

* cleanup 2

* small fixes

* bigger queue size

* cleanups

* wait

* fixed signals for devs

* fix hang + parallel beam

* small fixes

* detect when local memory is big in kernel

* correct assert

* small fixes

* correct tls size est

* one va space

* less lines

* shorter

* save 2 lines

* save some lines

* remove type ignores

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
This commit is contained in:
nimlgen
2024-04-22 18:50:20 +03:00
committed by GitHub
parent 77a3780005
commit e6227bdb15
10 changed files with 11067 additions and 3430 deletions

@@ -458,8 +458,11 @@ jobs:
       if: matrix.backend == 'cuda'
       run: |
         cp tinygrad/runtime/autogen/cuda.py /tmp/cuda.py.bak
+        cp tinygrad/runtime/autogen/nv_gpu.py /tmp/nv_gpu.py.bak
         ./autogen_stubs.sh cuda
+        ./autogen_stubs.sh nv
         diff /tmp/cuda.py.bak tinygrad/runtime/autogen/cuda.py
+        diff /tmp/nv_gpu.py.bak tinygrad/runtime/autogen/nv_gpu.py
     - name: Verify HIP autogen
       if: matrix.backend == 'hip'
       run: |