Commit Graph

34 Commits

George Hotz
26014a0fa1 add convtranspose (#809)
* add convtranspose

* onnx convtranspose
2023-05-26 12:35:03 -07:00
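A transposed convolution scatters each input element across the output, acting as the adjoint of a strided convolution. A minimal 1D sketch in plain Python (illustrative only, not the PR's implementation; the function name and signature are assumptions):

```python
def conv_transpose1d(x, w, stride=1):
    # transposed conv scatters: out[i*stride + j] += x[i] * w[j]
    out = [0.0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            out[i * stride + j] += xi * wj
    return out
```

The output length formula `(len(x) - 1) * stride + len(w)` is the standard one for an unpadded transposed convolution.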
wozeparrot
0dc333cfab Promote Embedding to nn (#798)
* feat: promote Embedding to nn

* fix: fix failing test

* feat: add test with jit

* feat: rewrite embedding to no longer need stacked for loops

* clean+fix: don't know how that happened
2023-05-25 18:39:45 -07:00
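One bullet above rewrites embedding "to no longer need stacked for loops"; a common loop-free formulation is a one-hot matmul, where each index is expanded to a one-hot row and multiplied against the weight matrix. A plain-Python sketch of that idea (illustrative, not the PR's code):

```python
def embedding(ids, weight):
    # one-hot matmul formulation: out[i] = onehot(ids[i]) @ weight
    vocab, dim = len(weight), len(weight[0])
    return [[sum((1.0 if j == ids[i] else 0.0) * weight[j][k] for j in range(vocab))
             for k in range(dim)]
            for i in range(len(ids))]
```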
Diogo
c19ef0fcce Add sin/cos/tan (#794)
* added sin/cos/tan

* fix lint

* added onnx ops support
2023-05-25 09:04:56 -07:00
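A backend that provides only `sin` as a primitive can derive the other two trig ops from identities. One possible derivation (an assumption for illustration, not necessarily how the PR implements it):

```python
import math

def cos(x):
    # derive cos from sin via the identity cos(x) = sin(pi/2 - x)
    return math.sin(math.pi / 2 - x)

def tan(x):
    # derive tan as sin/cos
    return math.sin(x) / cos(x)
```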
George Hotz
f2a964f447 nocopy (#764) 2023-05-05 09:32:06 -07:00
George Hotz
f28df9900f multidevice works (#763)
* basic multigpu working

* better multigpu test

* upper

* touchups

* cl sync
2023-05-04 01:04:58 -07:00
George Hotz
7ecf4dff68 multi cl_queue (#762)
* multi cl_queue

* only platforms 1

* gpus first, then cpus

* put device on underlying buffer

* cl_queue array
2023-05-03 12:15:28 -07:00
George Hotz
03b38864db fix batchnorm at training (#753)
* e2e testing

* min failure

* no affine on bn, still fails

* why did i think i could detach that?

* allow more kernels for bn

* some test issue i don't understand
2023-04-19 08:01:04 -07:00
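In training mode, batchnorm must normalize with the current batch's own statistics rather than the running averages used at inference. A minimal sketch of the training-mode computation (plain Python, without the affine scale/shift; illustrative only):

```python
def batchnorm_train(x, eps=1e-5):
    # training mode: normalize with the batch's own mean and variance
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]
```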
George Hotz
8b7ecd63bb Remove Zeroview (#748)
* no zeroview start

* closer

* stride mask

* st tests pass, delete ZeroView

* byebye zv

* close to working

* not contiguous with mask

* subtract, don't add

* mask on view

* ugh, that shouldn't have been in there

* shape merge

* bugfixes

* fuzzer + 4 fuzzer failures

* fuzzer for symbolic

* more fuzzing and nothing

* that fuzzer doesn't hit either

* fixes padding...ugh

* no more offsets

* working

* rewrite load and store

* all checks

* fix idxs

* progress

* bugfix

* float4_axis

* works

* cleanups

* complex valids_okay
2023-04-17 08:21:46 -07:00
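The bullets above ("stride mask", "mask on view") suggest ZeroView's job, returning zero for out-of-bounds reads introduced by padding, was folded into a mask carried on the view itself. A toy sketch of that mechanism (assumed semantics for illustration; not the actual ShapeTracker code):

```python
def masked_load(buf, idx, mask_range):
    # a mask on the view replaces ZeroView: indices outside
    # [lo, hi) read as zero instead of touching the buffer
    lo, hi = mask_range
    return buf[idx] if lo <= idx < hi else 0.0
```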
George Hotz
17e37157b6 fix backward convs (#746)
* fix backward convs

* no pushing in reduce

* late cout

* test_fold_4convs_sgd
2023-04-14 10:42:11 -07:00
George Hotz
f7f416d6f4 back to 6 for test_fold_conv_sgd 2023-04-14 07:34:00 -07:00
worldwalker2000
552a048a33 make maximum split the grad like torch when equal (#738)
* make maximum split grad

* added test for maximum split grad when equal

* minor expr simplification

* (2-eq)/2 only once

* update test bc one more sum output child stays
2023-04-14 00:17:46 -07:00
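The change above makes `maximum`'s backward pass match torch: the incoming gradient goes to the larger input, and is split evenly between the two when they are equal. A plain-Python sketch of that rule (illustrative, not the PR's code):

```python
def maximum_grad(x, y, grad_out):
    # route grad to the larger input; at ties, split it evenly (torch-style)
    gx, gy = [], []
    for xi, yi, g in zip(x, y, grad_out):
        if xi > yi:
            gx.append(g); gy.append(0.0)
        elif yi > xi:
            gx.append(0.0); gy.append(g)
        else:
            gx.append(g / 2); gy.append(g / 2)
    return gx, gy
```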
George Hotz
20894991ed good changes from the M1 Tensor Core project (#730)
* good changes

* working except llvm

* llvm types

* nice acc

* archprobe

* lang.float4

* use self.acc for late acc

* fix store bug
2023-03-29 05:11:02 +04:00
George Hotz
1cb5b2d015 test_enet_se 2023-03-24 10:04:30 -07:00
George Hotz
e88b9bfe1e print gflops avg with DEBUG=2 2023-03-23 16:07:08 -07:00
George Hotz
b12b60af20 fix binop, other tests failure (#723)
* fix binop, other tests failure

* that was a bad idea

* better layernorm

* inference kernel count tests

* new style reshape pushing

* fixup replacement

* 199 kernels is okay. fix flops

* push reshape through unaryops only

* GRAPH=2 draws the phantom ops

* found resnet issue

* non working test

* mul is cheaper than div

* OPT inflation

* SHUFFLE_PAD_OPS in OPT=2
2023-03-22 18:15:07 -07:00
George Hotz
902906f909 Fix constant folding (#713)
* fix

* codegen

* contiguous is real

* no bufs_to_delete

* don't assign rawconst

* remove neg and not

* need exec to fix custom function jit
2023-03-18 17:52:46 -07:00
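Constant folding evaluates an op at graph-build time when all of its operands are compile-time constants, so no kernel is emitted for it. A toy sketch of the idea (names and node representation are assumptions, not tinygrad's actual IR):

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul, "sub": operator.sub}

def fold(op, a, b):
    # evaluate now if both operands are constants;
    # otherwise keep the node symbolic as an (op, a, b) tuple
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return OPS[op](a, b)
    return (op, a, b)
```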
George Hotz
f5467cfedc Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in its proper place
2023-03-18 14:40:23 -07:00
George Hotz
37cf6fc4c0 err, external_test_opt.py broke...fusing will have to wait. correctness over speed 2023-03-11 17:54:47 -08:00
George Hotz
61071f881a fix bug, and add unit test to catch failure 2023-03-11 16:57:25 -08:00
George Hotz
3ec457248c failing llama test 2023-03-11 16:28:10 -08:00
George Hotz
01f39b19dc move to shapetracker.py 2023-03-11 07:50:07 -08:00
George Hotz
d7cb8e3e56 multithreaded fake_torch_load_zipped 2023-03-10 19:16:27 -08:00
George Hotz
00641aa45d add challenge tests 2023-03-07 19:39:04 -08:00
George Hotz
4eb880550f enable contract test 2023-03-07 17:32:28 -08:00
George Hotz
b561256a0e allow all reduces (#661)
* allow all reduces

* push permute tests

* explicit permute reshape push

* contractw1s
2023-03-07 15:36:01 -08:00
George Hotz
7dbcc26582 fix up external tests 2023-03-06 06:52:28 -08:00
Alex Wang
64ecbd91b5 Refactor contraction and add integration test cases for push permute (#650)
* Refactor contraction and add unit tests

* Fix typo; Fix TestConv.test_elu failure due to some ones in old_shape

* Add push permute test cases

* Fix mypy type annotation check error

* Add contraction unit test; Reshape to higher dimension is not contraction
2023-03-06 06:36:55 -08:00
George Hotz
b1ba78ac38 move applegpu disassembler 2023-03-05 11:21:12 -08:00
George Hotz
16b03f3c3b wow, can't believe that was broken (#642)
* wow, can't believe that was broken

* remove namedtuple comment
2023-03-04 22:28:28 -08:00
George Hotz
4a607f7d65 more ext gpu tests 2023-03-04 21:00:08 -08:00
George Hotz
69198a73d2 test_1x1_24_6 2023-03-04 20:37:46 -08:00
George Hotz
b02a392d69 Improve local (#635)
* local is improving

* local is finding bugs

* new local should work
2023-03-04 09:30:49 -08:00
George Hotz
528cb3b3b9 fix ast test 2023-03-04 07:49:25 -08:00
George Hotz
8919ca8163 test cleanups 2023-03-03 06:36:06 -08:00