1068 Commits

Author SHA1 Message Date
Diogo
a9a1df785f Webgpu support (#1077)
* initial commit

* 81 passing

* 105 passing tests

* 148 passing

* CI tests

* install dep on ci

* try opencl pkgs

* try using vulkan

* down to only 6 failing

* refactor

* cleaning up

* another test skipped due to buffer limit

* linter

* segfault

* indent fix

* another segfault found

* small touchups

* Fix max and maxpool tests

* Add constant folding

* Add javascript export script

* better asserts in codegen

* manual upcasting

* reverted token type change

* skip safetensor test due to unsupported type

* Fix efficientnet and all other model tests

* Remove np copy

* fixed indent and missing import

* manually destroy the buffer

* revert back to length

* linter errors

* removed extra val

* skip broken tests

* skipping more tests

* Make the page pretty

* Save model weights as safetensor

* Fix imagenet to c test

* Fix second imagenet to c bug

* Async and parallel kernel compilation

* workgroup support

* reversed local size

* fixed non local bug

* correct local groups

* ci experiment

* removed typo

* Fix define local by using shared memory

* Refactor

* try running on mac

* match metal tests

* add more workers

* scope down tests

* trying windows runner

* fixed windows env

* see how many it can do

* merged master

* refactor

* missed refactor

* increase test suite coverage

* missing import

* whitespace in test_efficientnet.py

* getting there

* fixed reset

* fixed bufs

* switched to cstyle

* cleanup

* min/max rename

* one more linter issue

* fixed demo

* linter

* testing ci chrome

* add unsafe webgpu arg

* add build step

* remove WEBGPU from cmd line

* use module

* try forcing directx

* trying forced metal backend

* temp disable conv2d for CI

* disable conv_transpose2d

---------

Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-07-12 12:52:06 -07:00
Roelof van Dijk
d0e21a7398 ci: don't install recommended packages for GPU (#1215)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-07-11 15:38:49 -07:00
George Hotz
beb4d3ab01 Tensor Cores 2: Local Buffers Edition (#1057)
* local buffers

* work

* works

* invert_strides

* work

* non tc

* fix shapetracker bug

* stride priority

* touchups

* gate tensor cores

* tensor core conv

* cleanups

* bug fixes

* fix metal_matmul

* fast tensor cores

* more speed

* buffer selection bug fix

* fix CI maybe

* ugh, CI is set to true, not 1

* tc allowed

* add_gl_dimension

* split out padding conv tests

* does padding add fail

* test_padded_conv2d_1x1

* skip metal ci stuff

* more strict on yellow

* float2

* strip parens

* fix float2

* touch up

* dtype

* strip parens

* no alias

* bugfix

* cast float2 and test tensor core ops

* oops, don't hardcode 4
2023-07-09 09:06:00 -07:00
George Hotz
7151382364 Refactor load/store before tensor cores (#1193)
* minor cleanups

* render_const

* now that's a nice refactor

* clean up vload/vstore

* clean up render_load

* debugs there

* dumb

* err, this?

* const float4

* what's failing

* bugfix

* statement includes semicolon

* bugfix
2023-07-08 15:54:58 -07:00
George Hotz
d9c1d81e99 Revert "feat: cancel previous workflow runs on new commits (#1184)" (#1194)
This reverts commit d66a0c285d.
2023-07-08 11:26:13 -07:00
George Hotz
52600d532e add 20 minute timeout 2023-07-07 23:02:28 -07:00
wozeparrot
d66a0c285d feat: cancel previous workflow runs on new commits (#1184) 2023-07-07 22:55:35 -07:00
foreign-sub
574cbda979 Quickstart (#1015)
* fix quickstart md

* add quickstart to ci
2023-06-29 13:26:58 -07:00
George Hotz
d16c16ec28 new upcast works (#1066)
* new upcast works

* float4 try

* fix unaligned float4

* disallow unaligned access

* upcast dim

* maybe good now

* fix gpu half

* vstore_half4

* fix deep image bugs

* improve symbolic to fix issues

* fix symbolic

* cl test

* this maybe

* gcd of 1 is 1

* real fix for old python

* improve fuzzer
2023-06-27 19:34:53 -07:00
George Hotz
70c07dfea5 5k line max (#1064) 2023-06-27 10:53:18 -07:00
George Hotz
0f281e7b18 touchups 2023-06-25 15:24:26 -07:00
George Hotz
c8fbdeb48e test speed llama (#1046)
* test speed llama

* oops, put it back

* uses the real device codegen

* just do it on the mac

* pp

* is faster?

* Revert "is faster?"

This reverts commit 42db542010.

* disable docker again for less load on CI
2023-06-25 15:22:56 -07:00
Jacky Lee
5d16cc283f Docker fix (#1039)
* Docker test

* Remove extra installs

* Don't run full test

* No need for testing dependencies
2023-06-25 10:38:58 -07:00
cloud11665
264b1e5f48 cache gpuocelot build in cuda CI (#1032) 2023-06-22 17:42:12 -07:00
cloud11665
2407690d82 add cuda on cpu tests (#1020) 2023-06-22 14:15:50 -07:00
George Hotz
18892242b0 global -> group (#1007)
* global -> group

* allow None for local_size in custom function

* lil local

* comment on shape

* fix cuda

* smart local cast

* better local heuristic

* fix ptx, and work_dim cleanup

* fix metal

* fix ops test

* fix openpilot jit

* no more optlocal

* might fix metal tests

* try metal now

* see generated metal code

* test free removal. REVERT THIS

* mergable
2023-06-21 11:50:43 -07:00
Diogo
57d3aa76a5 Windows & Ubuntu CLANG CI support (#1011)
* matrix strategy

* push env to GITHUB_ENV

* use printf instead of echo

* use temp helper function for cross os paths

* use path join

* switched to using temp helper function

* skip test on windows due to memory limit

* small fix

* removed semi

* touchups

* clean up

* separate tests

* test changes to test_utils on windows

* small refactor

* more cleanups

* undo helpers change

* only skip if in CI and WINDOWS
2023-06-19 09:33:24 -07:00
George Hotz
0d4c4f4e9e metal ci attempt (#1010)
* metal ci attempt

* skip failing ops tests

* skip in the ops test

* no dtype test
2023-06-19 09:23:55 -07:00
Diogo
6b1280f01c fixes to Onnx ops LayerNormalization/Prelu and added OptionalHasElement/OptionalGetElement (#956)
* prelu and where casting

* typing for safe_numpy

* optional

* get rid of tracing in ci

* cleanup and resolved layernorm issues

* removed debug print
2023-06-08 16:09:19 -07:00
kposborne2
00360da05b Update broken docs/abstractions.py for changed ops, and add to CI (#930)
* fix and add to ci

* still have those

* ocd

* update other doc
2023-06-04 19:21:20 -07:00
George Hotz
a3feee29c5 make tests faster + add onnx (#815)
* search one dir, disable slow

* onnx tests

* fast rnnt test
2023-05-27 08:53:32 -07:00
George Hotz
faf80418b7 pyopencl by default since GPU is default (#802) 2023-05-25 17:48:18 -07:00
George Hotz
03b38864db fix batchnorm at training (#753)
* e2e testing

* min failure

* no affine on bn, still fails

* why did i think i could detach that?

* allow more kernels for bn

* some test issue i don't understand
2023-04-19 08:01:04 -07:00
George Hotz
dbc99c243b why did that test break? 2023-04-18 17:08:38 -07:00
George Hotz
b12b60af20 fix binop, other tests failure (#723)
* fix binop, other tests failure

* that was a bad idea

* better layernorm

* inference kernel count tests

* new style reshape pushing

* fixup replacement

* 199 kernels is okay. fix flops

* push reshape through unaryops only

* GRAPH=2 draws the phantom ops

* found resnet issue

* non working test

* mul is cheaper than div

* OPT inflation

* SHUFFLE_PAD_OPS in OPT=2
2023-03-22 18:15:07 -07:00
George Hotz
f5467cfedc Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in its proper place
2023-03-18 14:40:23 -07:00
Cyril Roumégous
3f08613a2a apply flake8 E203 rule (#684) 2023-03-11 11:35:16 -08:00
George Hotz
1826ff6b89 dtypes nice and clean (#673)
* add dtype class

* dtypes

* buffers are lazy

* dtype is tracked by lazybuffer and GenericShape

* fix types in llvm

* llvm store

* dtype tests

* fix tests maybe

* fix flop counter

* fix CI

* CI fix and check format

* fix dtype and dtype check

* fix custom test

* fix test graph
2023-03-10 16:56:07 -08:00
George Hotz
5dc227dba6 fix bug in ENABLE_METHOD_CACHE and enable for llvm 2023-03-06 07:43:40 -08:00
George Hotz
50012f679b move get_contraction to shapetracker 2023-03-06 06:42:57 -08:00
George Hotz
7a1d96fd76 No negative (#632)
* behavior is correct without VALIDHACKS

* simple div and mod

* fix tests

* no negative variables

* alt form is correct

* still correct

* bug in mulnode

* at least validhacks works now

* cleanups

* test validhacks, and to_image_idx

* cache compare key

* tests and __neg__
2023-03-03 16:48:14 -08:00
George Hotz
999b44c274 fix external test + speed 2023-03-03 06:46:16 -08:00
George Hotz
459488bba2 fix linter (#630)
* fix linter

* no imports okay

* explicit bases

* disable in pylintrc
2023-03-02 20:06:20 -08:00
George Hotz
bfcec234a2 Refactor ASTs (#622)
* ugh worst branch name

* compiler refactor continues

* scc -> cloc

* buf -> _buf

* finish _buf, and program -> runtime

* gpu is still working, clang isn't

* clang in new style

* ops_metal

* something broke it

* improve metal

* clean up tons of cl crap

* hack fix sync

* cleaner gpu

* gpu metal clang

* cleanups

* minor refactor

* GPUCodegen

* fix up LLVM

* blind CUDA refactor

* codegen / runtime

* keep ops naming

* linter passes

* woah, llvm was allocing 4x what it needed to

* bugfixes

* fix openpilot compiler

* fix compile_efficientnet

* method cache should fix tests

* deal with duped functions
2023-03-01 18:57:29 -08:00
George Hotz
3c8da6bd03 add typing 2023-02-28 10:54:46 -08:00
George Hotz
d584bae5c0 fine, openpilot can have 197 kernels 2023-02-27 11:48:36 -08:00
George Hotz
c9252d38b2 mypy cache breaks if you sometimes check untyped defs, no checking tests for now 2023-02-27 09:57:33 -08:00
George Hotz
e74779f19d typing fixup 2023-02-27 09:52:04 -08:00
George Hotz
edc8fbfff2 woah, why isn't OPT=2 2023-02-27 08:03:31 -08:00
George Hotz
f4ee7d2cad back to 196 kernels 2023-02-25 18:25:34 -08:00
George Hotz
6e98a172a0 fix broken contiguous 2023-02-25 17:41:49 -08:00
George Hotz
a44e8e4385 discard children on mop shuffle, 200 -> 196 kernels 2023-02-25 10:51:07 -08:00
George Hotz
758515dcc0 conv2d is an hlop (#589)
* conv2d is an hlop

* shorter conv

* KOPT=-1

* alt imp

* MULACC

* smarter mulacc

* pop conv

* 7x7 -> 5x5

* didn't fix, that's not going to work

* this is faster and matches old behavior

* oh, non lazy just won't work with mulacc

* mulacc in torch

* bool types were creeping in

* optimizer is actually better with hlop conv

* fix pushing permutes issue

* refactor einsum_mulacc

* fix up readme

* update readme

* _image_conv2d

* fix bias addition location

* pushing permutes gets back to 200 kernels

* conv cleanup

* disable hlop conv

* don't hide that in helpers
2023-02-23 17:52:31 -08:00
George Hotz
628ce067a1 add tests to mypy 2023-02-22 07:07:38 -08:00
George Hotz
714bf4b108 clang backend (#572)
* start clang backend

* mostly working

* no group for reduce w clang

* it compiles

* compiles

* a11y

* minor fixups

* formatting

* add a test

* rename test
2023-02-20 18:18:18 -08:00
James Roberts
0d405fd5bc Parallelize CI tests (#535) 2023-02-06 15:27:44 -06:00
George Hotz
90529d3750 tests are 20% faster (#529)
* pytorch CPU

* no cache, it's slower

* pytorch cpu for real

* remove double onnx
2023-02-06 09:56:14 -06:00
George Hotz
6eb0e6a650 shuffle deps: always tqdm, make linting category 2023-02-06 09:27:01 -06:00
George Hotz
1d80639646 make linter test install testing deps 2023-02-06 09:21:48 -06:00
George Hotz
60bb64811c merge mypy into linters, no useless package update 2023-02-06 09:14:00 -06:00