Commit Graph

952 Commits

Author SHA1 Message Date
chenyu
3e0c2d256f symbolic shapetracker (#1506)
* symbolic shapetracker

* no need

* keep only symbolic and clean up

* explicit // and % Node support

* NumNode * Node
2023-08-12 12:22:58 -07:00
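
The bullets above concern tinygrad's symbolic engine (explicit // and % Node support, and NumNode * Node). A minimal hedged sketch of what those operations look like from user code, assuming the tinygrad.shape.symbolic module as it existed around this commit; exact simplified forms vary by version:

    # illustration only, not the commit's own test code
    from tinygrad.shape.symbolic import Variable, NumNode

    i = Variable("i", 0, 127)   # a symbolic index in [0, 127]
    print((i * 4) // 4)         # explicit // on a Node; expected to simplify back to i
    print((i * 4) % 4)          # explicit % on a Node; expected to simplify to 0
    print(NumNode(3) * i)       # NumNode * Node, per the last bullet above
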
George Hotz
bd7f4b1249 move renamer to linearizer (#1442)
* move renamer to linearizer

* uops converter

* Delete test_uops.py
2023-08-05 08:53:25 -07:00
wozeparrot
801bed4f66 Add ops_shm (#1413)
* feat: add ops_shm

* clean: extra newline

* feat: add test

* feat: ci doesn't like that

* feat: ci still doesn't like that

* feat: skip big test on ci

* feat: testing

* feat: big

* feat: testing again

* feat: reskip test
2023-08-03 17:40:52 -07:00
chenyu
34f348643b Support constant expand to symbolic shape (#1411) 2023-08-02 21:21:22 -07:00
chenyu
18d0a93f09 LazyBuffer.get_variable_buffers() (#1391)
* LazyBuffer.get_variable_buffers()

* remove left_only, add ProdNode

* no vars for OpNode.b

* do not change symbolic vars, remove ProdNode
2023-08-02 09:01:35 -07:00
Diogo
4dc8595069 simple exporting models (#1344)
* unified exporting

* json exporting

* ignore more

* simplified buffer export

* added dtypes

* added assert

* swift example

* fix tests

* linter

* remove whitespace

* fixed tests

* remove swift example

* remove unintended changes

* allow callable models to be used

* whitespace

* more readable json export

* name change

* whitespace

* whitespace
2023-08-01 09:35:48 -07:00
chenyu
f5ef445cb6 trim space (#1381) 2023-07-31 10:37:57 -07:00
S-Lykles
c2b82ea8ac fix to_shape_strides (#1374)
* add tests for expr_node and expr_idxs

* simplify condition and add missing optimization
2023-07-30 18:42:46 -07:00
chenyu
1fdf560fb1 simplify get_contraction (#1373) 2023-07-30 18:35:22 -07:00
S-Lykles
a32c677601 Fix off by one error in View.expr_node (#1363)
* Fix off_by_one error in View.expr_node

* Add test for expr_node

* Remove whitespace before :

* test no arguments and properly test idx=None
2023-07-29 08:10:37 -07:00
cheeetoo
a0965ee198 CI < 5 minutes (#1252)
* models matrix

* fix typo and install gpu deps

* install llvm deps if needed

* fix

* testops with cuda

* remove pip cache since not work

* cuda env

* install cuda deps

* maybe it will work now

* i can't read

* all tests in matrix

* trim down more

* opencl stuff in matrix

* opencl pip cache

* test split

* change cuda test exclusion

* test

* fix cuda maybe

* add models

* add more n=auto

* third thing

* fix bug

* cache pip more

* change name

* update tests

* try again cause why not

* balance

* try again...

* try apt cache for cuda

* try on gpu:

* try cuda again

* update packages step

* replace libz-dev with zlib1g-dev

* only cache cuda

* why error

* fix gpuocelot bug

* apt cache err

* apt cache too slow?

* opt and image in single runner

* add a couple n=autos

* remove test matrix

* try cuda apt cache again

* libz-dev -> zlib1g-dev

* remove -s since not supported by xdist

* the cache takes too long and doesn't work

* combine webgpu and metal tests

* combine imagenet to c and cpu tests

* torch tests with linters

* torch back by itself

* small windows clang test with torch tests

* fix a goofy windows bug

* im dumb

* bro

* clang with linters

* fix pylint error

* linter not work on windows

* try with clang again

* clang and imagenet?

* install deps

* fix

* fix quote

* clang by itself (windows too slow)

* env vars for imagenet

* cache pip for metal and webgpu tests

* try torch with metal and webgpu

* doesn't work, too long

* remove -v

* try -n=logical

* don't use logical

* revert accidental thing

* remove some prints unless CI

* fix print unless CI

* ignore speed tests for slow tests

* clang windows in matrix (ubuntu being tested in imagenet->c test)

* try manual pip cache

* fix windows pip cache path

* all manual pip cache

* fix pip cache dir for macos

* print_ci function in helpers

* CI as variable, no print_ci

* missed one

* cuda tests with docker image

* remove setup-python action for cuda

* python->python3?

* remove -s -v

* try fix pip cache

* maybe fix

* try to fix pip cache

* is this the path?

* maybe cache pip

* try again

* create wheels dir

* ?

* cuda pip deps in dockerfile

* disable pip cache for clang

* image from ghcr instead of docker hub

* why is clang like this

* fast deps

* try use different caches

* remove the fast thing

* try with lighter image

* remove setup python for cuda

* small docker and cuda fast deps

* ignore a few more tests

* cool docker thing (maybe)

* oops

* quotes

* fix docker command

* fix bug

* ignore train efficientnet test

* remove dockerfile (docker stuff takes too long)

* remove docker stuff and normal cuda

* oops

* ignore the tests for cuda

* does this work

* ignore test_train on slow backends

* add space

* llvm ignore same tests as cuda

* nvm

* ignore lr scheduler tests

* get some stats

* fix ignore bug

* remove extra '

* remove and

* ignore test for llvm

* change ignored tests and duration on all backends

* fix

* and -> or

* ignore some more cuda tests

* finally?

* does this fix it

* remove durations=0

* add some more tests to llvm

* make last pytest more readable

* fix

* don't train efficientnet on cpu

* try w/out pip cache

* pip cache seems to be generally better

* pytest file markers

* try apt fast for cuda

* use quick install for apt-fast

* apt-fast not worth

* apt-get to apt

* fix typo

* suppress warnings

* register markers

* disable debug on fuzz tests

* change marker names

* apt update and apt install in one command

* update marker names in test.yml

* webgpu pytest marker
2023-07-23 13:00:56 -07:00
chenyu
a5f5330d91 Add Fuzz Test symbolic / shapetracker to CI. (#1278)
* Fuzz test symbolic and shapetracker

This reverts commit d5773ddebff54c1ff608838076f0b4ff126b8aa8.

* mess again

* no tail

* test shapetracker too

* Revert mess and enable all tests

* removed leftover
2023-07-19 09:05:45 -07:00
chenyu
32be39554c Simplify symbolic.SumNode.__floordiv__ logic (#1220) 2023-07-12 12:54:12 -07:00
Diogo
a9a1df785f Webgpu support (#1077)
* initial commit

* 81 passing

* 105 passing tests

* 148 passing

* CI tests

* install dep on ci

* try opencl pkgs

* try using vulkan

* down to only 6 failing

* refactor

* cleaning up

* another test skipped due to buffer limit

* linter

* segfault

* indent fix

* another segfault found

* small touchups

* Fix max and maxpool tests

* Add constant folding

* Add javascript export script

* better asserts in codegen

* manual upcasting

* reverted token type change

* skip safetensor test due to unsupported type

* Fix efficientnet and all other model tests

* Remove np copy

* fixed indent and missing import

* manually destroy the buffer

* revert back to length

* linter errors

* removed extra val

* skip broken tests

* skipping more tests

* Make the page pretty

* Save model weights as safetensor

* Fix imagenet to c test

* Fix second imagenet to c bug

* Async and parallel kernel compilation

* workgroup support

* reversed local size

* fixed non local bug

* correct local groups

* ci experiment

* removed typo

* Fix define local by using shared memory

* Refactor

* try running on mac

* match metal tests

* add more workers

* scope down tests

* trying windows runner

* fixed windows env

* see how many it can do

* merged master

* refactor

* missed refactor

* increase test suite coverage

* missing import

* whitespace in test_efficientnet.py

* getting there

* fixed reset

* fixed bufs

* switched to cstyle

* cleanup

* min/max rename

* one more linter issue

* fixed demo

* linter

* testing ci chrome

* add unsafe webgpu arg

* add build step

* remove WEBGPU from cmd line

* use module

* try forcing directx

* trying forced metal backend

* temp disable conv2d for CI

* disable conv_transpose2d

---------

Co-authored-by: 0x4d - Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-07-12 12:52:06 -07:00
George Hotz
7151382364 Refactor load/store before tensor cores (#1193)
* minor cleanups

* render_const

* now that's a nice refactor

* clean up vload/vstore

* clean up render_load

* debugs there

* dumb

* err, this?

* const float4

* what's failing

* bugfix

* statement includes semicolon

* bugfix
2023-07-08 15:54:58 -07:00
Stan
9b6e57eccd helpers.py: improved test coverage + exception handling (#1165)
* Fixes + improved test coverage for helpers.py

- added exception handling in `proc`; without it, if an exception was thrown, the thread would hang
- made `_early_exec_process` catch any Exception; previously, if an exception was thrown before the process was started, it would hang the thread

* Made `_early_exec_process` catch any Exception

Otherwise, if an exception was thrown before the process was started, it would hang the thread, for example a type error in an argument passed to `subprocess.check_output`.

* Fixed `from tinygrad.helpers import Timing` import

oops, for some reason my IDE cleaned that import from extra/helpers.

* Fixed import in llama.py

Another one that I skipped by accident, my bad

* Extracted a class for tests of early exec

* Normalize line endings, Windows uses \r\n

* Made `cross_process` not a daemon
2023-07-07 10:26:05 -07:00
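
The entry above explains why the child side of `cross_process` must catch every exception: if the target raises before reporting a result, the parent blocks forever on the queue. A minimal sketch of that pattern with hypothetical names, not the actual extra/helpers.py code:

    import multiprocessing
    import subprocess

    def _worker(q, cmd):
      # catch any Exception: if the target raises before producing a result
      # (e.g. a type error in the arguments to subprocess.check_output),
      # still put something on the queue so the parent never hangs
      try:
        q.put(("ok", subprocess.check_output(cmd)))
      except Exception as e:
        q.put(("err", repr(e)))

    def cross_process(cmd):
      q = multiprocessing.Queue()
      p = multiprocessing.Process(target=_worker, args=(q, cmd), daemon=False)  # not a daemon, per the last bullet
      p.start()
      status, payload = q.get()   # always receives a value, success or failure
      p.join()
      if status == "err": raise RuntimeError(payload)
      return payload

    if __name__ == "__main__":
      print(cross_process(["echo", "hello"]))
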
George Hotz
793a670187 from tensor cores + lb touchup (#1127) 2023-07-04 15:45:20 -07:00
George Hotz
d16c16ec28 new upcast works (#1066)
* new upcast works

* float4 try

* fix unaligned float4

* disallow unaligned access

* upcast dim

* maybe good now

* fix gpu half

* vstore_half4

* fix deep image bugs

* improve symbolic to fix issues

* fix symbolic

* cl test

* this maybe

* gcd of 1 is 1

* real fix for old python

* improve fuzzer
2023-06-27 19:34:53 -07:00
George Hotz
3e33befc1d realize hotspots (#1059)
* realize hotspots

* no str check

* minor changes

* make this an assert

* faster and more readable

* nicer self.buffers

* tests for weak op + LAZYCACHE=0
2023-06-26 18:31:18 -07:00
George Hotz
c8fbdeb48e test speed llama (#1046)
* test speed llama

* oops, put it back

* uses the real device codegen

* just do it on the mac

* pp

* is faster?

* Revert "is faster?"

This reverts commit 42db542010.

* disable docker again for less load on CI
2023-06-25 15:22:56 -07:00
Diogo
57d3aa76a5 Windows & Ubuntu CLANG CI support (#1011)
* matrix strategy

* push env to GITHUB_ENV

* use printf instead of echo

* use temp helper function for cross os paths

* use path join

* switched to using temp helper function

* skip test on windows due to memory limit

* small fix

* removed semi

* touchups

* clean up

* separate tests

* test changes to test_utils on windows

* small refactor

* more cleanups

* undo helpers change

* only skip if in CI and WINDOWS
2023-06-19 09:33:24 -07:00
George Hotz
5428b5d774 good changes from tensor_cores branch (#1005)
* good changes from tensor_cores branch

* touchups

* real_strides fixup

* refactor merge_views
2023-06-18 20:28:06 -07:00
George Hotz
c62c64f0b7 remove GeNode (#965) 2023-06-09 21:48:56 -07:00
George Hotz
791530045d Refactor LoadOps (#910)
* test

* work

* upd test

* loadops

* cleanups

* real ones

* remove LazyNumpyArray

* fix assign test

* remove range

* np.require

* llama uses arange kernels

* no caching consts

* fix enet

* torch load support

* tests cleanup

* fix shufflenet

* fix image

* fix torch_load test
2023-06-03 09:40:43 -07:00
George Hotz
d58586bb17 safetensors! (#903)
* safetensors test

* safe_save

* load back with real safetensors

* bugfix in device name. add simple torch_load

* it works for llama, but it's slower...

* mmap

* no intermediate

* load mmaped

* readinto speed

* not ready yet

* revert that
2023-06-02 13:41:09 -07:00
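
The safetensors work above is a save/load round trip for model weights; a minimal usage sketch, assuming the helper names under tinygrad.nn.state in current tinygrad, which may not match their location at the time of this commit:

    from tinygrad.nn import Linear
    from tinygrad.nn.state import get_state_dict, load_state_dict, safe_save, safe_load

    model = Linear(4, 2)
    safe_save(get_state_dict(model), "model.safetensors")      # write weights in the safetensors format

    restored = Linear(4, 2)
    load_state_dict(restored, safe_load("model.safetensors"))  # loading is mmap-backed, per the bullets above
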
George Hotz
59f9bcd4a4 Disktensors! (#819)
* make empty a real thing

* start ops_disk

* disk tensor works

* interpreted cleanup

* slice write to disk

* preprocess imagenet

* fix custom function
2023-05-28 15:40:37 -07:00
Rayan Hatout
8b2c2d6896 Optimizations in symbolic.py (#796)
* optimizations in symbolic.py

* fix infinite recursion when expanding sums

* add test case to make sure NumNodes are hoisted up in cases where MulNodes cancel each other out
2023-05-26 12:59:53 -07:00
Sasha Krassovsky
b258af117a Fix PytestCollectionWarning when running tests (#791) 2023-05-24 23:17:57 -07:00
George Hotz
81aa3e546b exclude GPU on tiny (#766) 2023-05-05 10:07:23 -07:00
George Hotz
8b7ecd63bb Remove Zeroview (#748)
* no zeroview start

* closer

* stride mask

* st tests pass, delete ZeroView

* byebye zv

* close to working

* not contiguous with mask

* subtract, don't add

* mask on view

* ugh, that shouldn't have been in there

* shape merge

* bugfixes

* fuzzer + 4 fuzzer failures

* fuzzer for symbolic

* more fuzzing and nothing

* that fuzzer doesn't hit either

* fixes padding...ugh

* no more offsets

* working

* rewrite load and store

* all checks

* fix idxs

* progress

* bugfix

* float4_axis

* works

* cleanups

* complex valids_okay
2023-04-17 08:21:46 -07:00
George Hotz
b12b60af20 fix binop, other tests failure (#723)
* fix binop, other tests failure

* that was a bad idea

* better layernorm

* inference kernel count tests

* new style reshape pushing

* fixup replacement

* 199 kernels is okay. fix flops

* push reshape through unaryops only

* GRAPH=2 draws the phantom ops

* found resnet issue

* non working test

* mul is cheaper than div

* OPT inflation

* SHUFFLE_PAD_OPS in OPT=2
2023-03-22 18:15:07 -07:00
George Hotz
5495c7d64e linearizer! (#714)
* linearizer outputs something

* working ish

* cstyle codegen

* clang mostly works

* fix load valid

* fix numberless loop

* fancy gen

* working

* fix enet compiler

* cleanups

* float4 upcasting

* less lines

* supports_float4

* constant folding

* mulacc

* internet tests flaky in CI

* 90% image support

* fix image generic

* bugs exposed with shapetracker and single view

* new llvm

* use vload, remove OLD

* that's really poorly done

* ending up being more lines
2023-03-19 23:43:49 -07:00
George Hotz
f5467cfedc Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in its proper place
2023-03-18 14:40:23 -07:00
George Hotz
54f499b623 Move rawbuffer (#697)
* move GlobalCounters to helpers

* that's not part of the public api

* move InterpretedBuffer

* remove fromCPU from devicebuffer
2023-03-13 22:30:36 -07:00
George Hotz
c594a0a835 fix flip bug, add new unit tests 2023-03-12 23:55:31 -07:00
George Hotz
a4abcf0969 improve test_example 2023-03-12 22:59:40 -07:00
George Hotz
5577634cf3 tests in pre commit 2023-03-12 22:42:26 -07:00
George Hotz
ce1564b05e fix shapetracker test 2023-03-12 22:33:25 -07:00
George Hotz
01f39b19dc move to shapetracker.py 2023-03-11 07:50:07 -08:00
George Hotz
0b03216cc3 losing lines (#678)
* losing lines

* FLIP -> STRIDE

* shapetracker refactor
2023-03-10 21:57:05 -08:00
George Hotz
1826ff6b89 dtypes nice and clean (#673)
* add dtype class

* dtypes

* buffers are lazy

* dtype is tracked by lazybuffer and GenericShape

* fix types in llvm

* llvm store

* dtype tests

* fix tests maybe

* fix flop counter

* fix CI

* CI fix and check format

* fix dtype and dtype check

* fix custom test

* fix test graph
2023-03-10 16:56:07 -08:00
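
The dtypes commit above makes the dtype a property tracked by every LazyBuffer; a small hedged example of the user-facing effect, using today's tinygrad import paths rather than those of this commit:

    from tinygrad import Tensor, dtypes

    a = Tensor([1, 2, 3], dtype=dtypes.int32)
    b = a.cast(dtypes.float16)
    print(a.dtype, b.dtype)   # the dtype is carried on the (lazy) buffer through the cast
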
George Hotz
dbbaa0bdd7 int32, and refactor pad/shrink 2023-03-09 12:57:17 -08:00
George Hotz
fb5ee9260f add pad tests to shapetracker 2023-03-09 12:51:18 -08:00
George Hotz
e0244baf60 3 letters for graph op 2023-03-07 19:20:48 -08:00
Alex Wang
d885d2d0f5 Allow 1s for contraction detection (#663)
* Allow 1s for contraction check

* More test cases for 1s
2023-03-07 17:31:28 -08:00
George Hotz
50012f679b move get_contraction to shapetracker 2023-03-06 06:42:57 -08:00
Alex Wang
64ecbd91b5 Refactor contraction and add integration test cases for push permute (#650)
* Refactor contraction and add unit tests

* Fix typo; Fix TestConv.test_elu failure due to some ones in old_shape

* Add push permute test cases

* Fix mypy type annotation check error

* Add contraction unit test; Reshape to higher dimension is not contraction
2023-03-06 06:36:55 -08:00
George Hotz
16b03f3c3b wow, can't believe that was broken (#642)
* wow, can't believe that was broken

* remove namedtuple comment
2023-03-04 22:28:28 -08:00
George Hotz
7a1d96fd76 No negative (#632)
* behavior is correct without VALIDHACKS

* simple div and mod

* fix tests

* no negative variables

* alt form is correct

* still correct

* bug in mulnode

* at least validhacks works now

* cleanups

* test validhacks, and to_image_idx

* cache compare key

* tests and __neg__
2023-03-03 16:48:14 -08:00
George Hotz
8c475ea86a relax atol, merge_view 2023-03-03 07:48:44 -08:00