Commit Graph

104 Commits

Author SHA1 Message Date
cloud11665
264b1e5f48 cache gpuocelot build in cuda CI (#1032) 2023-06-22 17:42:12 -07:00
cloud11665
2407690d82 add cuda on cpu tests (#1020) 2023-06-22 14:15:50 -07:00
George Hotz
18892242b0 global -> group (#1007)
* global -> group

* allow None for local_size in custom function

* lil local

* comment on shape

* fix cuda

* smart local cast

* better local heuristic

* fix ptx, and work_dim cleanup

* fix metal

* fix ops test

* fix openpilot jit

* no more optlocal

* might fix metal tests

* try metal now

* see generated metal code

* test free removal. REVERT THIS

* mergable
2023-06-21 11:50:43 -07:00
Diogo
57d3aa76a5 Windows & Ubuntu CLANG CI support (#1011)
* matrix strategy

* push env to GITHUB_ENV

* use printf instead of echo

* use temp helper function for cross os paths

* use path join

* switched to using temp helper function

* skip test on windows due to memory limit

* small fix

* removed semi

* touchups

* clean up

* seperate tests

* test changes to test_utils on windows

* small refactor

* more cleanups

* undo helpers change

* only skip if in CI and WINDOWS
2023-06-19 09:33:24 -07:00
George Hotz
0d4c4f4e9e metal ci attempt (#1010)
* metal ci attempt

* skip failing ops tests

* skip in the ops test

* no dtype test
2023-06-19 09:23:55 -07:00
Diogo
6b1280f01c fixes to Onnx ops LayerNormalization/Prelu and added OptionalHasElement/OptionalGetElement (#956)
* prelu and where casting

* typing for safe_numpy

* optional

* get rid of tracing in ci

* cleanup and resolved layernorm issues

* removed debug print
2023-06-08 16:09:19 -07:00
kposborne2
00360da05b Update broken docs/abstractions.py for changed ops, and add to CI (#930)
* fix and add to ci

* still have those

* ocd

* update other doc
2023-06-04 19:21:20 -07:00
George Hotz
a3feee29c5 make tests faster + add onnx (#815)
* search one dir, disable slow

* onnx tests

* fast rnnt test
2023-05-27 08:53:32 -07:00
George Hotz
faf80418b7 pyopencl by default since GPU is default (#802) 2023-05-25 17:48:18 -07:00
George Hotz
03b38864db fix batchnorm at training (#753)
* e2e testing

* min failure

* no affine on bn, still fails

* why did i think i could detach that?

* allow more kernels for bn

* some test issue i don't understand
2023-04-19 08:01:04 -07:00
George Hotz
dbc99c243b why did that test break? 2023-04-18 17:08:38 -07:00
George Hotz
b12b60af20 fix binop, other tests failure (#723)
* fix binop, other tests failure

* that was a bad idea

* better layernorm

* inference kernel count tests

* new style reshape pushing

* fixup replacement

* 199 kernels is okay. fix flops

* push reshape through unaryops only

* GRAPH=2 draws the phantom ops

* found resnet issue

* non working test

* mul is cheaper than div

* OPT inflation

* SHUFFLE_PAD_OPS in OPT=2
2023-03-22 18:15:07 -07:00
George Hotz
f5467cfedc Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in it's proper place
2023-03-18 14:40:23 -07:00
Cyril Roumégous
3f08613a2a apply flake8 E203 rule (#684) 2023-03-11 11:35:16 -08:00
George Hotz
1826ff6b89 dtypes nice and clean (#673)
* add dtype class

* dtypes

* buffers are lazy

* dtype is tracked by lazybuffer and GenericShape

* fix types in llvm

* llvm store

* dtype tests

* fix tests maybe

* fix flop counter

* fix CI

* CI fix and check format

* fix dtype and dtype check

* fix custom test

* fix test graph
2023-03-10 16:56:07 -08:00
George Hotz
5dc227dba6 fix bug in ENABLE_METHOD_CACHE and enable for llvm 2023-03-06 07:43:40 -08:00
George Hotz
50012f679b move get_contraction to shapetracker 2023-03-06 06:42:57 -08:00
George Hotz
7a1d96fd76 No negative (#632)
* behavior is correct without VALIDHACKS

* simple div and mod

* fix tests

* no negative variables

* alt form is correct

* still correct

* bug in mulnode

* at least validhacks works now

* cleanups

* test validhacks, and to_image_idx

* cache compare key

* tests and __neg__
2023-03-03 16:48:14 -08:00
George Hotz
999b44c274 fix external test + speed 2023-03-03 06:46:16 -08:00
George Hotz
459488bba2 fix linter (#630)
* fix linter

* no imports okay

* explicit bases

* disable in pylintrc
2023-03-02 20:06:20 -08:00
George Hotz
bfcec234a2 Refactor ASTs (#622)
* ugh worst branch name

* compiler refactor continues

* scc -> cloc

* buf -> _buf

* finish _buf, and program -> runtime

* gpu is still working, clang isn't

* clang in new style

* ops_metal

* something broke it

* improve metal

* clean up tons of cl crap

* hack fix sync

* cleaner gpu

* gpu metal clang

* cleanups

* minor refactor

* GPUCodegen

* fix up LLVM

* blind CUDA refactor

* codegen / runtime

* keep ops naming

* linter passes

* woah, llvm was allocing 4x what it needed to

* bugfixes

* fix openpilot compiler

* fix compile_efficientnet

* method cache should fix tests

* deal with duped functions
2023-03-01 18:57:29 -08:00
George Hotz
3c8da6bd03 add typing 2023-02-28 10:54:46 -08:00
George Hotz
d584bae5c0 fine, openpilot can have 197 kernels 2023-02-27 11:48:36 -08:00
George Hotz
c9252d38b2 mypy cache breaks if you sometimes check untyped defs, no checking tests for now 2023-02-27 09:57:33 -08:00
George Hotz
e74779f19d typing fixup 2023-02-27 09:52:04 -08:00
George Hotz
edc8fbfff2 woah, why isn't OPT=2 2023-02-27 08:03:31 -08:00
George Hotz
f4ee7d2cad back to 196 kernels 2023-02-25 18:25:34 -08:00
George Hotz
6e98a172a0 fix broken contiguous 2023-02-25 17:41:49 -08:00
George Hotz
a44e8e4385 discard children on mop shuffle, 200 -> 196 kernels 2023-02-25 10:51:07 -08:00
George Hotz
758515dcc0 conv2d is an hlop (#589)
* conv2d is an hlop

* shorter conv

* KOPT=-1

* alt imp

* MULACC

* smarter mulacc

* pop conv

* 7x7 -> 5x5

* didn't fix, that's not going to work

* this is faster and matches old behavior

* oh, non lazy just won't work with mulacc

* mulacc in torch

* bool types were creeping in

* optimizer is actually better with hlop conv

* fix pushing permutes issue

* refactor einsum_mulacc

* fix up readme

* update readme

* _image_conv2d

* fix bias addition location

* pushing permutes gets back to 200 kernels

* conv cleanup

* disable hlop conv

* don't hide that in helpers
2023-02-23 17:52:31 -08:00
George Hotz
628ce067a1 add tests to mypy 2023-02-22 07:07:38 -08:00
George Hotz
714bf4b108 clang backend (#572)
* start clang backend

* mostly working

* no group for reduce w clang

* it compiles

* compiles

* a11y

* minor fixups

* formatting

* add a test

* rename test
2023-02-20 18:18:18 -08:00
James Roberts
0d405fd5bc Parallelize CI tests (#535) 2023-02-06 15:27:44 -06:00
George Hotz
90529d3750 tests are 20% faster (#529)
* pytorch CPU

* no cache, it's slower

* pytorch cpu for real

* remove double onnx
2023-02-06 09:56:14 -06:00
George Hotz
6eb0e6a650 shuffle deps: always tqdm, make linting category 2023-02-06 09:27:01 -06:00
George Hotz
1d80639646 make linter test install testing deps 2023-02-06 09:21:48 -06:00
George Hotz
60bb64811c merge mypy into linters, no useless package update 2023-02-06 09:14:00 -06:00
Martin Loretz
97f0a82be7 Cache pip packages in github actions (#522)
* Cache pip dependencies in github actions

* Add setup.py as cache-dependency-path

* Test caching

* Test caching

* Upgrade setup python action

* Test caching

* Remove setup.py from cache-dependency-path

* Don't remove cache-dependency-path

* Don't cache linter package's

* Test caching

* Test caching

* Test caching

* Upgrade actions/checkout to v3
2023-02-03 20:04:20 -08:00
George Hotz
e313c8af20 update openpilot tests from OPENCL to GPU 2023-01-24 14:05:20 -08:00
George Hotz
49c6e6d472 Latest attempt to add image (#462)
* add image

* load + store + boring stuff:

* image tests pass

* thneed print GFLOPS

* op conv test

* more debugging

* hack for multiview image

* shapetracker creates less views

* disable image tests

* working better

* ugh, lkey not key

* print in DEBUG, and allow views

* works

* simple padding conv2d

* use index for image

* that was bad code

* debug print

* fix types

* less lines

* save lines
2023-01-12 17:36:30 -08:00
George Hotz
27211103ae docker: no -it 2023-01-09 12:49:59 -08:00
George Hotz
d6e86a29a8 docker: forgot to checkout code 2023-01-09 12:48:03 -08:00
George Hotz
73ce9a771e that fix it 2023-01-09 12:46:33 -08:00
George Hotz
bfd4f4e35c testdocker 2023-01-09 12:41:52 -08:00
George Hotz
5e07d4669d the speedy chonker is going to replace the old chonker (#432)
* bringing back reshape and permute

* done with E701

* 4x4 works in generic way

* max and sum not vectorizing...

* special case single float

* support comparing to MPS

* improve matmul speed, consider generic principles

* GlobalCounter

* fix op tracking

* faster

* comment that out for now

* err, it needs that

* fix minor issues

* fix global_mem
2022-11-11 18:34:24 -08:00
George Hotz
b8c94a67c9 Simple chonker (#431)
* chonker will make llvm fast

* work

* better speed tests, we will make them fast

* with the cache add is the same speed

* relu and neg are fast

* fix sum speed

* maximum maxnum?

* hack for gemm opt

* gemm very slow

* zeros like

* test_permute

* shapetracker returns self

* fix shapetracker factorization

* err, int strides

* permutes are faster now in tinygrad than pytorch

* support -1 in expand

* gemm unrolled

* improve final test case

* WIP GEMM

* why isn't GEMM fast?

* revert cache dim

* ffp contract works on clang, not llvm?

* ignore llvm ir

* this makes fma work at least, but no faster

* USE_4x4

* 63 GFLOPS

* 87 GFLOPS

* that wasn't matmul, 44 GFLOPS now

* 82 GFLOPS permuted

* this permute too

* a little speed for the convs

* 45 GFLOPS

* speed tests pass again

* clean up prints

* fix FMA WHAT A WASTE OF TIME

* colors

* moar fair

* GPU

* useless on chonker

* cleanups

* improve factorized shapetracker

* better threshold

* label conv

* work

* ops test pass again

* hot load the index

* run the last view, no need to create

* ZeroView needs a repr for the key to work

* fix segfault on out of bounds

* one more test

* start amx, and llvm.initialize_native_asmparser

* amx works

* nice AMX class

* nicer AMX class

* refactor get_idxs

* amx working

* is slower...

* useless flip

* cache

* SZ_X

* AMX_SZ_X/Y work alone

* Contiguous mlop

* test gemm packed

* PREPARE in packed

* use_amx factor

* prefetch isn't faster

* loop

* same 3ms

* 2.24 ms

* allow double on store in TG

* amx reduce is the same speed as non amx reduce

* include memory bandwidth

* clean up shapetracker

* flip returns stride

* prepare for upstream

* Update ops_llvm.py (#426)

* permutes are yellow and green now

* faster conv

* llvm cleanups

* Show optimised IR under debug 4 (#428)

* ASTKernel class

* Make tinygrad work with older python version (#427)

* Make tinygrad work with older python version

* Use partialmethod instead of partial

* smiple chonker is chonking

* remove junk from test speed vs torch

* fix linker and types

* AMX is only here now

* add LLVM tests, it's a valid backend now

* oops, run llvm test

* contiguous_op

* fix loadops compare

* dedup reduceops

Co-authored-by: calledit <1573053+calledit@users.noreply.github.com>
2022-11-10 23:17:09 -08:00
George Hotz
d02f8f9bc0 can we lose the lines with E701 still there? 2022-10-28 08:36:03 -07:00
George Hotz
ef62db3186 cleanups, remove E701 2022-10-28 08:28:56 -07:00
George Hotz
b65b70812a Exec AST (#404)
* working exec ast

* exec_ast is staticmethod

* GenericExecAST

* fold that sometimes

* ExplicitExecAST

* exec_ast for GPU

* gpu working

* get_lazyop_shape

* now gpubuffer is ExplicitExecAST

* dedup

* add a type

* RESHAPE in opencl code

* fix linter

* that too for linter

* cleanups

* remove dead code

* GenericShape is less lines

* add ALLOWED_KERNEL_COUNT to tests

* fix mypy

* that's gotta be recursive

* fix opencl shape processing

* remove unneeded lambda
2022-10-28 08:27:03 -07:00
George Hotz
3b9b7eda48 remove run_thneed dead code 2022-10-20 17:24:18 -07:00