Devicebufferless (#708)

* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in its proper place
George Hotz
2023-03-18 14:40:23 -07:00
committed by GitHub
parent 26a3888ab8
commit f5467cfedc
37 changed files with 471 additions and 446 deletions

@@ -79,7 +79,7 @@ jobs:
run: curl https://media.istockphoto.com/photos/hen-picture-id831791190 | ./recognize | grep hen
testllvm:
-name: LLVM Tests
+name: LLVM Tests (w method cache)
runs-on: ubuntu-latest
steps:
@@ -160,7 +160,9 @@ jobs:
- name: Install Dependencies
run: pip install -e '.[gpu,testing]' --extra-index-url https://download.pytorch.org/whl/cpu
- name: Test GPU IMAGE ops
-run: GPU=1 IMAGE=2 python3 test/test_ops.py
+run: |
+GPU=1 IMAGE=1 python3 test/test_ops.py
+FORWARD_ONLY=1 GPU=1 IMAGE=2 python3 test/test_ops.py
- name: Test openpilot model
run: |
ALLOWED_KERNEL_COUNT=197 FLOAT16=1 VALIDHACKS=1 DEBUGCL=1 GPU=1 IMAGE=2 python3 openpilot/compile.py