Commit Graph

183 Commits

Author SHA1 Message Date
George Hotz
c8fbdeb48e test speed llama (#1046)
* test speed llama

* oops, put it back

* uses the real device codegen

* just do it on the mac

* pp

* is faster?

* Revert "is faster?"

This reverts commit 42db542010.

* disable docker again for less load on CI
2023-06-25 15:22:56 -07:00
Diogo
57d3aa76a5 Windows & Ubuntu CLANG CI support (#1011)
* matrix strategy

* push env to GITHUB_ENV

* use printf instead of echo

* use temp helper function for cross os paths

* use path join

* switched to using temp helper function

* skip test on windows due to memory limit

* small fix

* removed semi

* touchups

* clean up

* seperate tests

* test changes to test_utils on windows

* small refactor

* more cleanups

* undo helpers change

* only skip if in CI and WINDOWS
2023-06-19 09:33:24 -07:00
George Hotz
5428b5d774 good changes from tensor_cores branch (#1005)
* good changes from tensor_cores branch

* touchups

* real_strides fixup

* refactor merge_views
2023-06-18 20:28:06 -07:00
George Hotz
c62c64f0b7 remove GeNode (#965) 2023-06-09 21:48:56 -07:00
George Hotz
791530045d Refactor LoadOps (#910)
* test

* work

* upd test

* loadops

* cleanups

* real ones

* remove LazyNumpyArray

* fix assign test

* remove range

* np.require

* llama uses arange kernels

* no caching consts

* fix enet

* torch load support

* tests cleanup

* fix shufflenet

* fix image

* fix torch_load test
2023-06-03 09:40:43 -07:00
George Hotz
d58586bb17 safetensors! (#903)
* safetensors test

* safe_save

* load back with real safetensors

* bugfix in device name. add simple torch_load

* it works for llama, but it's slower...

* mmap

* no intermediate

* load mmaped

* readinto speed

* not ready yet

* revert that
2023-06-02 13:41:09 -07:00
George Hotz
59f9bcd4a4 Disktensors! (#819)
* make empty a real thing

* start ops_disk

* disk tensor works

* interpreted cleanup

* slice write to disk

* preprocess imagenet

* fix custom function
2023-05-28 15:40:37 -07:00
Rayan Hatout
8b2c2d6896 Optimizations in symbolic.py (#796)
* optimizations in symbolic.py

* fix infinite recursion when expanding sums

* add test case to make sure NumNodes are hoisted up in cases where MulNodes cancel eachother out
2023-05-26 12:59:53 -07:00
Sasha Krassovsky
b258af117a Fix PytestCollectionWarning when running tests (#791) 2023-05-24 23:17:57 -07:00
George Hotz
81aa3e546b exclude GPU on tiny (#766) 2023-05-05 10:07:23 -07:00
George Hotz
8b7ecd63bb Remove Zeroview (#748)
* no zeroview start

* closer

* stride mask

* st tests pass, delete ZeroView

* byebye zv

* close to working

* not contiguous with mask

* subtract, don't add

* mask on view

* ugh, that shouldn't have been in there

* shape merge

* bugfixes

* fuzzer + 4 fuzzer failures

* fuzzer for symbolic

* more fuzzing and nothing

* that fuzzer doesn't hit either

* fixes padding...ugh

* no more offsets

* working

* rewrite load and store

* all checks

* fix idxs

* progress

* bugfix

* float4_axis

* works

* cleanups

* complex valids_okay
2023-04-17 08:21:46 -07:00
George Hotz
b12b60af20 fix binop, other tests failure (#723)
* fix binop, other tests failure

* that was a bad idea

* better layernorm

* inference kernel count tests

* new style reshape pushing

* fixup replacement

* 199 kernels is okay. fix flops

* push reshape through unaryops only

* GRAPH=2 draws the phantom ops

* found resnet issue

* non working test

* mul is cheaper than div

* OPT inflation

* SHUFFLE_PAD_OPS in OPT=2
2023-03-22 18:15:07 -07:00
George Hotz
5495c7d64e linearizer! (#714)
* linearizer outputs something

* working ish

* cstyle codegen

* clang mostly works

* fix load valid

* fix numberless loop

* fancy gen

* working

* fix enet compiler

* cleanups

* float4 upcasting

* less lines

* supports_float4

* constant folding

* mulacc

* internet tests flaky in CI

* 90% image support

* fix image generic

* bugs exposed with shapetracker and single view

* new llvm

* use vload, remove OLD

* that's really poorly done

* ending up being more lines
2023-03-19 23:43:49 -07:00
George Hotz
f5467cfedc Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in it's proper place
2023-03-18 14:40:23 -07:00
George Hotz
54f499b623 Move rawbuffer (#697)
* move GlobalCounters to helpers

* that's not part of the public api

* move InterpretedBuffer

* remove fromCPU from devicebuffer
2023-03-13 22:30:36 -07:00
George Hotz
c594a0a835 fix flip bug, add new unit tests 2023-03-12 23:55:31 -07:00
George Hotz
a4abcf0969 improve test_example 2023-03-12 22:59:40 -07:00
George Hotz
5577634cf3 tests in pre commit 2023-03-12 22:42:26 -07:00
George Hotz
ce1564b05e fix shapetracker test 2023-03-12 22:33:25 -07:00
George Hotz
01f39b19dc move to shapetracker.py 2023-03-11 07:50:07 -08:00
George Hotz
0b03216cc3 losing lines (#678)
* losing lines

* FLIP -> STRIDE

* shapetracker refactor
2023-03-10 21:57:05 -08:00
George Hotz
1826ff6b89 dtypes nice and clean (#673)
* add dtype class

* dtypes

* buffers are lazy

* dtype is tracked by lazybuffer and GenericShape

* fix types in llvm

* llvm store

* dtype tests

* fix tests maybe

* fix flop counter

* fix CI

* CI fix and check format

* fix dtype and dtype check

* fix custom test

* fix test graph
2023-03-10 16:56:07 -08:00
George Hotz
dbbaa0bdd7 int32, and refactor pad/shrink 2023-03-09 12:57:17 -08:00
George Hotz
fb5ee9260f add pad tests to shapetracker 2023-03-09 12:51:18 -08:00
George Hotz
e0244baf60 3 letters for graph op 2023-03-07 19:20:48 -08:00
Alex Wang
d885d2d0f5 Allow 1s for contraction detection (#663)
* Allow 1s for contraction check

* More test cases for 1s
2023-03-07 17:31:28 -08:00
George Hotz
50012f679b move get_contraction to shapetracker 2023-03-06 06:42:57 -08:00
Alex Wang
64ecbd91b5 Refactor contraction and add integration test cases for push permute (#650)
* Refactor contraction and add unit tests

* Fix typo; Fix TestConv.test_elu failure due to some ones in old_shape

* Add push permute test cases

* Fix mypy type annotation check error

* Add contraction unit test; Reshape to higher dimension is not contraction
2023-03-06 06:36:55 -08:00
George Hotz
16b03f3c3b wow, can't believe that was broken (#642)
* wow, can't believe that was broken

* remove namedtuple comment
2023-03-04 22:28:28 -08:00
George Hotz
7a1d96fd76 No negative (#632)
* behavior is correct without VALIDHACKS

* simple div and mod

* fix tests

* no negative variables

* alt form is correct

* still correct

* bug in mulnode

* at least validhacks works now

* cleanups

* test validhacks, and to_image_idx

* cache compare key

* tests and __neg__
2023-03-03 16:48:14 -08:00
George Hotz
8c475ea86a relax atol, merge_view 2023-03-03 07:48:44 -08:00
George Hotz
b9ce20c374 openpilot test wasn't running, factor out image idx 2023-03-03 07:41:53 -08:00
George Hotz
8919ca8163 test cleanups 2023-03-03 06:36:06 -08:00