Commit Graph

458 Commits

George Hotz
8ff2e13550 From teeny (#2426)
* changes from teenygrad work

* support not supporting ImageDType/PtrDType

* fixups from teeny
2023-11-24 12:50:56 -08:00
nimlgen
e68aebfff9 bring hip graph back (#2385)
* bring hip graph back

* share with metal

* fix linter

* remove hasattrs

* Update ops_hip.py

* hip wrapper does not use _buf

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-24 07:53:44 -08:00
George Hotz
12023b6824 onnx ops cleanup (#2413)
* onnx ops cleanup

* revert those
2023-11-23 18:39:49 -08:00
George Hotz
095e2ced61 add name support to fetch (#2407)
* add name support

* use fetch in gpt2

* remove requests from main lib, networkx also optional

* umm, keep that assert

* updates to fetch

* i love the walrus so much

* stop bundling mnist with tinygrad

* err, https

* download cache names

* add DOWNLOAD_CACHE_VERSION

* need env.

* ugh, wrong path

* replace get_child
2023-11-23 14:16:17 -08:00
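
The fetch work above bundles two patterns worth seeing side by side: a `name` override for the download cache and the walrus operator in the read loop. A minimal sketch under those assumptions; the real `tinygrad.helpers.fetch` handles details like `DOWNLOAD_CACHE_VERSION` differently:

```python
import hashlib, tempfile, urllib.request
from pathlib import Path

def fetch(url: str, name: str | None = None) -> Path:
  # cache key: the explicit name if given, otherwise a hash of the URL
  fp = Path(tempfile.gettempdir()) / "downloads" / (name or hashlib.md5(url.encode()).hexdigest())
  if not fp.is_file():
    fp.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as r, open(fp, "wb") as f:
      while chunk := r.read(16384):  # the walrus keeps read-and-test in one expression
        f.write(chunk)
  return fp
```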
George Hotz
0505c5ea50 remove force_wait, refactor to graph (#2405)
* remove force_wait

* refactor

* get rid of stupid ASTRunner

* fix del in diskbuffer

* BufferOps.FROM_UNDERLYING

* put offset in the rawbuffer

* fix bugs

* use exec
2023-11-23 12:46:07 -08:00
George Hotz
4f8f0ac139 minor cleanups, remove dead files (#2398)
* minor cleanups, remove dead files

* s.name

* use disk

* pytest passes on mac
2023-11-23 09:01:50 -08:00
George Hotz
66c75f30c6 remove triton (#2396) 2023-11-23 07:40:59 -08:00
chenyu
8798d120bb autopad shapetracker for BEAM (#2375)
* autopad shapetracker for BEAM

* OptOps.PADTO

* skip that test for now

* correct padding reduce axis

* just 32

* avoid more than double the FLOPs

* cleanups

* test case

* no support for triton and llvm yet

* typos

* symbolic shape would not work

* cannot PADTO with MAX kernel

* advance db version

* no breaking change - don't advance db version

* is triton just python?

* Revert "is triton just python?"

This reverts commit 17e776c25587615e33a3634c2fb0bb8591ce65d4.

* Revert "Revert "is triton just python?""

This reverts commit 6c434c01e1c4b0ea0431ec18632cd859fb3cf260.

* support llvm

* is it really passing in CI only?

* update tests

* oh triton test passed

* simpler

* revert that, with a test

* check if st are the same

* Revert "check if st are the same"

This reverts commit d2a5eac110a5da1af82a2728c883779ef69c3cad.

* update the db version

* rebase artifact
2023-11-22 21:05:25 -05:00
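
The PADTO idea above: round a reduce axis up to a multiple of 32 so BEAM has more tiling options, skip the pad if it would more than double the FLOPs, and never pad a MAX kernel, since padded zeros are an identity for SUM but change a MAX over negative values. An illustrative sketch, not the actual OptOps.PADTO:

```python
def padto(shape: tuple[int, ...], axis: int, mult: int = 32) -> tuple[int, ...]:
  padded = -(-shape[axis] // mult) * mult           # round up to a multiple of mult
  if padded > 2 * shape[axis]: return shape         # avoid more than double the FLOPs
  return shape[:axis] + (padded,) + shape[axis+1:]

assert padto((100, 3), 0) == (128, 3)
assert padto((100, 3), 1) == (100, 3)               # 3 -> 32 far exceeds 2x, skip
```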
qazal
0eda545946 dtypes.float.vec(sz) (#2386)
* replace all _dtypen with dtype.vec(n)

fix: print works

* conceptual refactor of cstyle render_load logic

* linearizer GEP is explicit that its dtype is the scalar version of localtype

* vectorized global_store and load don't need a conditional
2023-11-22 17:43:14 -08:00
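
The refactor above replaces per-width constants like `_float4` with a `vec(n)` constructor on the scalar dtype. A self-contained sketch of that API shape, not tinygrad's real DType:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DType:
  name: str
  itemsize: int
  count: int = 1
  def vec(self, sz: int) -> "DType":
    # a vectorized dtype is sz lanes of the scalar type
    assert self.count == 1, "can't vectorize a vector dtype"
    return DType(f"{self.name}{sz}", self.itemsize * sz, sz)

float32 = DType("float", 4)
float4 = float32.vec(4)  # replaces the old ad-hoc _float4 constant
print(float4)            # DType(name='float4', itemsize=16, count=4)
```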
George Hotz
cbb8486779 ResNet training changes (update benchmark) (#2390)
* default arg for chunk

* bring back to_

* good changes

* new set

* unused hash

* fix optim

* new torch loader

* fix test lr scheduler
2023-11-22 17:41:12 -08:00
wozeparrot
abbcc7aefa missed cleanup from cache_id removal (#2376) 2023-11-21 01:03:43 -05:00
George Hotz
a0890f4e6c move fetch to helpers (#2363)
* switch datasets to new fetch

* add test_helpers

* fix convnext and delete old torch load
2023-11-19 12:29:51 -08:00
chenyu
d7d078c7f9 Node.vars() returns a set and properly dedup (#2356)
* dedup RedNode.vars()

* vars returns a set

* fix more vars

* unused import

* update to_movement_ops

* comment
2023-11-18 17:44:52 -05:00
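
Returning a set from `vars()` makes the dedup structural: a symbolic tree that mentions the same Variable twice collapses to one entry. A minimal sketch with hypothetical Node classes:

```python
class Node:
  def vars(self) -> set:
    return set()

class Variable(Node):
  def __init__(self, name: str): self.name = name
  def vars(self): return {self}

class RedNode(Node):  # a reduction over child nodes
  def __init__(self, nodes: list): self.nodes = nodes
  def vars(self): return set().union(*[n.vars() for n in self.nodes])

i = Variable("i")
assert RedNode([i, i]).vars() == {i}  # dedup falls out of the set type
```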
George Hotz
40246d35bc ops_shm removed (#2351)
* ops_shm removed

* buf.cast

* err, forgot those
2023-11-18 11:41:58 -08:00
George Hotz
c7b38b324b A beautiful MNIST training example (#2272)
* beautiful mnist

* beautiful mnist example

* from tinygrad import Tensor

* more beautiful

* the jit is super core tinygrad

* globalcounters reset on jit run

* symlinks and exclude

* beautiful_cartpole

* evaluate is its own function

* no symlinks

* more beautiful

* jit reset for double speed

* type hinting for JIT

* beautiful_mnist gets 98%

* beautiful_mnist < 4s with BEAM=2

* better cartpole

* use actor critic

* zero_grad got lost

* delete double relu

* stable cartpole with PPO

* beautiful_cartpole is more beautiful

* REPLAY_BUFFER

* beautiful stuff typechecks

* None support in shape

* hp tuning
2023-11-17 19:42:43 -08:00
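
A condensed sketch in the spirit of examples/beautiful_mnist.py; the layer sizes here are arbitrary and the import paths have moved between tinygrad versions:

```python
from tinygrad import Tensor
from tinygrad.nn import Linear
from tinygrad.nn.optim import Adam
from tinygrad.nn.state import get_parameters

class Model:
  def __init__(self): self.l1, self.l2 = Linear(784, 128), Linear(128, 10)
  def __call__(self, x: Tensor) -> Tensor: return self.l2(self.l1(x).relu())

Tensor.training = True  # enable grads for the optimizer
model = Model()
opt = Adam(get_parameters(model))

def step(x: Tensor, y: Tensor) -> Tensor:
  opt.zero_grad()       # "zero_grad got lost" above: it matters every step
  loss = model(x).sparse_categorical_crossentropy(y)
  loss.backward()
  opt.step()
  return loss
```

Per the bullets above, wrapping the step in the JIT (and resetting it between hyperparameter runs) is what the example leans on for its sub-4s BEAM=2 time.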
chenyu
d2c0035c73 add back as_strided, move rebuilt mops to extra (#2344)
* add back as_strided, move rebuilt mops to extra

* negative stride for ops_cpu

* Revert "negative stride for ops_cpu"

This reverts commit a13b6815ac.

* skip that

* style
2023-11-17 14:34:30 -05:00
George Hotz
652d2de256 wow how did i think that was okay (#2339) 2023-11-16 21:21:11 -08:00
chenyu
822d6e6f18 Simpler mops verify (#2325)
* rewrite the to_movement_ops check using symbolic

* tweak
2023-11-15 21:47:18 -05:00
forcefieldsovereign
b64738e1d6 Remove AS_STRIDED from shapetracker (#2216)
* very close

* remove comment

* negative strides working

* almost everything passes

* calculate offset with list comprehension

* some cleanup

* got disk load working

* review suggestions

* fix after merge

* overlap working

* did it

* clean

* fixed disk load

* lint

* mypy

* removed as_strided

* trying without simplify

* added back simplify

* make sure expanding to smaller shape

* cleanup

* removed comment

* removed env file

* trying whisper test again

* onnx test sqlite issue

* working on test

* finished test

* eliminate unnecessary shrink-then-pad

* don't shrink buffer

* added strides check

* added to ci under linters

* switch issue

* allow symbolic stride

* removed .env

* isinstance

* adjust strides for double expand

* cleanup

* needed to add type hint for mypy

* set pythonpath
2023-11-15 15:50:17 -05:00
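
One detail behind "negative strides working" and "calculate offset with list comprehension" above: a view that flips an axis starts reading at that axis's last element, so the base offset absorbs |stride| * (len - 1) per flipped axis. Illustrative only:

```python
def base_offset(shape: tuple[int, ...], strides: tuple[int, ...]) -> int:
  # sum of the distances to the last element along every negative-stride axis
  return sum(-st * (sh - 1) for sh, st in zip(shape, strides) if st < 0)

assert base_offset((4, 5), (5, 1)) == 0    # contiguous view, no flip
assert base_offset((4, 5), (-5, 1)) == 15  # rows reversed: start at row 3
```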
geohotstan
3c5a51fb3a aaaaaaa finally (#2310) 2023-11-15 07:12:38 -08:00
George Hotz
4f7b1ac0d2 cleanups before interpreted jit (#2306)
* jit mnist

* InterpretedFlopCounter doesn't rely on Interpreted

* allocator for cpu and torch

* types for exec_ast

* fix type issues

* fix onnx, remove print

* always self.from_underlying
2023-11-14 21:44:25 -08:00
nimlgen
4e0d47533e beam works with var vals (#2296)
* beam works with var vals

* test passes now

* better comment

* linter happy
2023-11-14 13:03:19 -05:00
George Hotz
0cbf6c1811 move things, clean up extra (#2292)
* move things

* idk why pylint needs that now

* delete unused
2023-11-13 20:18:40 -08:00
George Hotz
b1f7f29525 metal indirect command buffers (#2285)
* metal indirect command buffers

* sub 1ms gpt

* metal batch exec is good

* remove whitespace

* input_replace

* fix ci

* useResources

* very simple cacheallocator

* update_stats

* fix CI

* minor

* remove that from jit
2023-11-13 17:58:26 -08:00
rodfer
53c5baa8b6 add dilation to avg_pool2d (#2270)
* add dilation to avg_pool2d

* avg_pool_fix

* avg_pool_fix

* woo

* oops

* force it correct

---------

Co-authored-by: rodfer0x80 <rodfer0x80@proton.me>
Co-authored-by: zibokapi <zibokapi@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-13 08:47:56 -08:00
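
Usage sketch for the new `dilation` argument (exact defaults may differ by version): with dilation=2 each 2x2 window samples taps one pixel apart, so the effective window is 3x3 and an 8x8 input at stride 1 yields 6x6.

```python
from tinygrad import Tensor

x = Tensor.rand(1, 1, 8, 8)
y = x.avg_pool2d(kernel_size=(2, 2), stride=1, dilation=2)
print(y.shape)  # (1, 1, 6, 6): effective window is 3x3 with holes
```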
valar
123ea051e6 refactor/ci: delete many # type: ignore (#2281)
* refactor/ci: delete many `# type: ignore`

* replace `axis.__class__ is int` with `isinstance(axis, int)` to make mypy happy
* add `--warn-unused-ignores` to mypy flag

refs #2240

* ci: move `--warn-unused-ignores` flag to mypy config

refs #2240
2023-11-12 11:04:20 -08:00
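
The substitution is small but load-bearing for typing: `isinstance(axis, int)` narrows the type so mypy no longer needs the ignore, while `axis.__class__ is int` does not. For example:

```python
def normalize_axis(axis: int | tuple[int, ...]) -> tuple[int, ...]:
  # isinstance() narrows axis to int in this branch; axis.__class__ is int would not
  return (axis,) if isinstance(axis, int) else axis
```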
geohotstan
b853e9bb8c Onnx 1.15.0 gogogo (#2217)
* lol

* lol

* add GELULULULUL

* onnx 1.50

* fuk torch bool neg

* exclude regex tests

* exclude dequantizelinear for now

* is sunny in philly

* damn it affinegrid

* fixed auto_pad VALID

* skip 0 shape tests

* add temporary cast in Reduces

* tests should pass now

* added comments and cleanup

* try moving dequantizelinear to onnx.py

* fixed dequantizelinear?

* cleanup

* try?

* float16 segfaults LLVM CI..???

* cleanup comments

* pin to 1.50.0

* remove use of -np.inf cuz numpy is kill

* 1.50? lol I'm actually retarded

* thx for review, muhbad

* moved Gelu higher up
2023-11-10 15:36:48 -08:00
chenyu
a753c8e071 examples of new GPT2 and JIT change (#2261)
* var_vals are global

* working with global ish

* better

* fix export model

* fix tests

* better kv cache

* does it run?

* use where for kvmask

* fix excessive var_vals

* fix import

* how does multigpu use this?

* llama kinda work

* faster and simpler

* cleanup

* fix conversation mode

* test cleanups

* fix one more test

* test cleanup

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-11-10 15:07:02 -05:00
George Hotz
80bf0b8586 proper wmma (#2245)
* proper wmma

* hip cast

* bugfixes

* bugfix

* that bug is fixed

---------

Co-authored-by: George Hotz <george@tinygrad.org>
2023-11-09 15:15:18 -08:00
wozeparrot
4c44d1344b feat: remove cache_id (#2236) 2023-11-08 08:09:21 -08:00
Rory Clear
553688f12a update metal matmul and matvec for compile api (#2238) 2023-11-08 08:08:35 -08:00
George Hotz
2f7aab3d13 move optimize_local_size (#2221)
* move optimize_local_size

* interpret_ast
2023-11-05 21:00:52 -08:00
chenyu
f582ec56d5 Replace (getenv("CI", "") != "") with helpers.CI (#2213) 2023-11-03 15:20:44 -07:00
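
The pattern behind this one-liner: evaluate the environment check once at import time and reference the module-level constant everywhere. A sketch matching the commit's naming; the real helpers definition may differ:

```python
import os

CI = os.getenv("CI", "") != ""  # computed once, imported everywhere

if CI:
  print("running under CI")
```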
George Hotz
f17bc16f46 simple runtime args (#2211)
* simple runtime args

* fix some tests

* fix abstractions and triton

* fix search
2023-11-03 12:31:29 -07:00
George Hotz
ddbc6eecaf some refactors in the realization (#2206)
* some refactors

* delete old kernel search
2023-11-02 19:51:28 -07:00
George Hotz
03cf0afa4f move all to compile api (#2203)
* move metal+clang to compile api

* all to the new style

* remove binary arg

* fix triton

* fixup tests

* fix clang

* diskcache is generic

* __wrapped__

* compile_gpu

* fix thneed

* keep the src in the ASTRunner

* lib

* move compile_gpu

* compile_gpu in device

* put compiler in astrunner

* test reverts

* triton compiler

* ugh, that too
2023-11-01 23:01:32 -07:00
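
Two bullets above, "diskcache is generic" and "__wrapped__", suggest the shape of the compile cache: a decorator keyed on the function name plus its arguments, backed by sqlite, with functools.wraps preserving `__wrapped__` for introspection. A hedged sketch, not the actual helpers.diskcache:

```python
import functools, hashlib, pickle, sqlite3, tempfile
from pathlib import Path

def diskcache(fn):
  db = sqlite3.connect(str(Path(tempfile.gettempdir()) / "compile_cache.db"))
  db.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, val BLOB)")
  @functools.wraps(fn)  # wraps sets __wrapped__, as the bullet above hints
  def wrapper(*args):
    key = hashlib.sha256(pickle.dumps((fn.__name__, args))).hexdigest()
    if row := db.execute("SELECT val FROM cache WHERE key=?", (key,)).fetchone():
      return pickle.loads(row[0])
    val = fn(*args)
    db.execute("INSERT INTO cache VALUES (?, ?)", (key, pickle.dumps(val)))
    db.commit()
    return val
  return wrapper

@diskcache
def compile_gpu(src: str) -> bytes:  # hypothetical compiler entry point
  return src.encode()                # stand-in for a real compile step
```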
George Hotz
8932816816 remove arm64, caching for cuda (#2201)
* remove arm64, caching for cuda

* caching in llvm

* switch cache_compiled to new cache

* fix clang

* caching for metal

* fix pylint

* cleanups

* perf_counter and binary
2023-11-01 18:44:00 -07:00
George Hotz
7103b716c4 merge kernel and optimizer (#2200)
* merge kernel and optimizer

* linearize is reentrant

* move global/local size

* clean up linearizer copy

* remove unneeded lin copies

* stop linearizing twice

* oops, that should be None
2023-11-01 15:20:01 -07:00
George Hotz
33bb650e94 use mad in opencl (#2198)
Co-authored-by: Comma Device <device@comma.ai>
2023-11-01 10:40:08 -07:00
Comma Device
2e9982fe2d fastvits example that's 10% faster 2023-10-31 21:48:23 -07:00
George Hotz
8ba7ced7f9 extract const if it's const (#2193)
* extract const if it's const

* fix if statement

* fast math issue

* fix graphing and casting

* disable flaky copyout test
2023-10-31 18:52:35 -07:00
George Hotz
5aaa8a0cc1 fix shape 2023-10-31 11:36:19 -07:00
George Hotz
a27c9f9de5 openpilot compile2 (#2189)
* try compile2

* pass to thneed

* fix tanh onnx
2023-10-31 11:08:58 -07:00
forcefieldsovereign
f294bdd681 fixed imports (#2185) 2023-10-30 22:07:17 -07:00
Akshay Kashyap
018bd29e37 Enable Multi-Output Export (#2179)
* Enable Multi-Output Export

* Add test

* Update examples and lint

* fix padding

* test ops

* dummy commit to rerun test

* revert cuda lint

* Enforce tuple/list of tensors

* subscripted generics

* put back webgpu test

* Re-enable WebGPU Efficientnet test
2023-10-30 18:42:26 -07:00
chenyu
6c58bf3e9c in time_linearizer, allocate a scratch buffer if output buffer is also input (#2152)
* in time_linearizer, allocate a scratch buffer if output buffer is also input

* move scratch buffer creation outside search
2023-10-28 07:17:41 -10:00
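
The aliasing problem this fixes: if the output rawbuffer is also one of the inputs, repeatedly timing the kernel feeds it its own stale output and can skew the measurement. A hedged sketch of the guard, with a hypothetical RawBuffer stand-in:

```python
from dataclasses import dataclass

@dataclass
class RawBuffer:  # stand-in for a real device buffer
  size: int
  dtype: str

def with_scratch_output(rawbufs: list[RawBuffer]) -> list[RawBuffer]:
  out, ins = rawbufs[0], rawbufs[1:]
  if any(b is out for b in ins):          # output aliases an input
    out = RawBuffer(out.size, out.dtype)  # time against a scratch output instead
  return [out, *ins]

buf = RawBuffer(1024, "float32")
timed = with_scratch_output([buf, buf])   # e.g. an out += in style kernel
assert timed[0] is not timed[1]
```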
George Hotz
e0201922e3 Q network for pruning BEAM / uops deduping / BEAM_ESTIMATE (#2142)
* stable diffusion < 324ms

* revert swap action

* fix tests due to more sum splitting

* REDUCEOP_SPLIT_THRESHOLD env var

* added from unaligned np test (#2134)

* align cpu buffer before copy into cl buffer (#2135)

* remove shelve from handcode_resnet50_opt.py (#2139)

* Add dictionary keys to reduce db size (#2131)

* work

* ignore beam cache

* dictionary keys are generic

* minor db cleanups

* fix baseline and extract dataset

* fix training

* log likelihood

* more lin to feats

* sts

* training policynet

* net sort of works

* dedup

* refactor, stupid new actions

* fix uops deduping

* BEAM_ESTIMATE

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: imaolo <56898718+imaolo@users.noreply.github.com>
2023-10-27 10:53:06 -10:00
chenyu
0ca0e9ee5e exclude ast with variables from beam search (#2140)
* exclude ast with variables from beam search

* test that

* add to CI
2023-10-25 16:35:29 -04:00
wozeparrot
c29653605e hip multigpu training (#1878)
* feat: move to hip

* feat: special path for RawBufferTransfer

* feat: initial rawbuffertransfer

* feat: hip ipc

* feat: working hip ipc

* feat: need to base device without args

* feat: close mem handle

* feat: modified test

* feat: more multihip stuff

* clean: cleanup

* feat: cleaner

* feat: don't crash

* feat: test more

* clean: way cleaner hip wrapper

* feat: barrier

* feat: barrier

* feat: this breaks stuff

* feat: we can use empty here

* feat: maybe fix tests

* feat: maybe fix tests again?

* fix: probably fix tests

* feat: no waiting here

* feat: wait here

* feat: much larger test

* feat: need to sync here

* feat: make this async

* feat: no waiting!

* feat: cut here

* feat: sync copy

* feat: random imports

* feat: much cleaner world

* feat: restore this

* feat: restore this

* clean: cleanup

* feat: set this
2023-10-24 17:35:53 -04:00
nimlgen
2e89fd264f Refactor hipgraph (#2141)
* refactor hip graph

* linter happy

* happy linter
2023-10-24 15:45:56 -04:00