Commit Graph

638 Commits

kposborne2
ae83e9844c add output_padding to transposed conv (#875) 2023-06-01 00:03:22 -07:00
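
A minimal sketch of what the output_padding argument from this commit might look like in use; the exact conv_transpose2d signature and the weight layout (C_in, C_out, kH, kW) are assumptions drawn from the commit title, not checked against the source at this revision.

    from tinygrad.tensor import Tensor

    x = Tensor.randn(1, 4, 8, 8)    # N, C_in, H, W
    w = Tensor.randn(4, 2, 3, 3)    # assumed (C_in, C_out, kH, kW) layout for a transposed conv
    # output_padding adds extra size to one side of the output, which disambiguates
    # the output shape when stride > 1
    y = x.conv_transpose2d(w, stride=2, padding=1, output_padding=1)
    print(y.shape)  # expected (1, 2, 16, 16)
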
Tom Edwards
115903a15c Add unbiased std and corresponding tests (#881)
* add unbiased std and corresponding tests

* replaced unbiased with correction + tests
2023-05-31 16:32:36 -07:00
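
A small sketch of the correction keyword this commit introduces in place of unbiased; the signature shown (Tensor.std with a correction argument, following torch's convention) is an assumption based on the commit bullets.

    from tinygrad.tensor import Tensor

    t = Tensor([1.0, 2.0, 3.0, 4.0])
    # correction=1 divides by N-1 (Bessel's correction), correction=0 divides by N
    print(t.std(correction=1).numpy())  # ~1.291
    print(t.std(correction=0).numpy())  # ~1.118
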
Ubaidullah Khan
502e33652f add Tensor.full and Tensor.full_like and reuse them (#852)
* add Tensor.ones_like()

* add full_like and full and reuse in zeros,ones

* add tests for full and full_like
2023-05-29 17:48:09 -07:00
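
For context, a brief usage sketch of the constructors this commit describes; the exact call signatures of Tensor.full, Tensor.full_like, and Tensor.ones_like are assumed from the commit text.

    from tinygrad.tensor import Tensor

    a = Tensor.full((2, 3), 7.0)   # 2x3 tensor filled with 7.0
    b = Tensor.full_like(a, 0.5)   # same shape as a, filled with 0.5
    c = Tensor.ones_like(a)        # same shape as a, filled with 1.0
    print(a.numpy(), b.numpy(), c.numpy(), sep="\n")
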
Marcello Fuschi
6ea5df19b2 Fix conv_transpose2d asymmetric padding (#840) 2023-05-29 07:57:06 -07:00
George Hotz
59f9bcd4a4 Disktensors! (#819)
* make empty a real thing

* start ops_disk

* disk tensor works

* interpreted cleanup

* slice write to disk

* preprocess imagenet

* fix custom function
2023-05-28 15:40:37 -07:00
Marcello Fuschi
6d49925a26 Add max_pool2d dilation (#833) 2023-05-28 15:16:48 -07:00
Kirill R
081b3ab639 Tensor.where method (#830) 2023-05-28 10:20:33 -07:00
Kirill R
0c0c7380af Add Tensor.where (#826)
* Add Tensor.where

* fix linter

* fix mypy
2023-05-28 08:04:56 -07:00
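
A brief sketch of the ternary select added by the two Tensor.where commits above; the condition-tensor-as-receiver form shown here is an assumption drawn from the commit titles.

    from tinygrad.tensor import Tensor

    x = Tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
    # elements come from the first argument where the condition holds, otherwise from the second
    relu_like = (x > 0).where(x, 0)
    print(relu_like.numpy())  # [0. 0. 0. 1. 2.]
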
kposborne2
2163a1b049 Add shrink step to fix strided conv_transpose2d, and add to nn (#823)
* implement conv transpose 2d

* don't inherit, remove old assert

---------

Co-authored-by: Kyle <kposborne@gmail.com>
2023-05-28 07:52:45 -07:00
George Hotz
26014a0fa1 add convtranspose (#809)
* add convtranspose

* onnx convtranspose
2023-05-26 12:35:03 -07:00
Aneesh Durg
6d4a728f62 Don't collapse dimensions during batched matmul (FIX #799) (#800)
* Don't collapse dimensions during batched matmul (FIX #799)

* Avoid reshaping tensor to the same shape

* Skip batched matrix multiply when IMAGE is set
2023-05-26 11:15:34 -07:00
Diogo
c19ef0fcce Add sin/cos/tan (#794)
* added sin/cos/tan

* fix lint

* added onnx ops support
2023-05-25 09:04:56 -07:00
Rabia Eda Yılmaz
e5b4b36cba add std to tensor.py (#767)
* add std

* delete comment

* edit: one liner std, add: test

* adjust

* fix: shape mismatch

* set unbiased to False

* added unbiased option

* fix unbiased option in test and clean code

* better

* generalize axis

* holly coffee molly

* generalize axes without unbiased opt.

* hopefully done

* complete unbiased true for axes

* Update test_ops.py

* fixed

* std completed without bessels correction

* fix comment

* ups
2023-05-13 12:20:44 -07:00
George Hotz
810f03dafa conv3d + unet3d (#772)
* conv3d, needs test

* test passes, padding wrong on unet

* unet3d

* no conv3d on images
2023-05-12 13:54:07 -07:00
Joqsan
0b9d4126d0 Add Tensor.stack() and Tensor.repeat() (...trying to make einops work with tinygrad) (#758)
* add stack() and repeat() methods

* make stack a static method
2023-05-01 09:37:46 -07:00
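
A short sketch of the two methods this commit mentions; Tensor.stack as a static method taking a list and Tensor.repeat taking per-axis repeat counts are assumptions based on the commit messages.

    from tinygrad.tensor import Tensor

    a = Tensor([[1.0, 2.0]])
    b = Tensor([[3.0, 4.0]])
    s = Tensor.stack([a, b], dim=0)  # new leading axis -> shape (2, 1, 2)
    r = a.repeat((2, 3))             # tile 2x along dim 0 and 3x along dim 1 -> shape (2, 6)
    print(s.shape, r.shape)
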
George Hotz
3d15769a8f 50 TFLOPS cuda matmul 2023-04-19 14:38:24 -07:00
George Hotz
133521e730 relu UnaryOp is back 2023-04-14 07:12:53 -07:00
worldwalker2000
552a048a33 make maximum split the grad like torch when equal (#738)
* make maximum split grad

* added test for maximum split grad when equal

* minor expr simplification

* (2-eq)/2 only once

* update test bc one more sum output child stays
2023-04-14 00:17:46 -07:00
Andre Slavescu
39d6e1525f Added activation ops + tests (#729)
* activation ops

* type hints + more testing

* formatting correction + parameter testing

* fixes to shape testing

* hardtanh to use clip + removed type hints

* assign val fix
2023-03-28 13:17:53 +04:00
George Hotz
b12b60af20 fix binop, other tests failure (#723)
* fix binop, other tests failure

* that was a bad idea

* better layernorm

* inference kernel count tests

* new style reshape pushing

* fixup replacement

* 199 kernels is okay. fix flops

* push reshape through unaryops only

* GRAPH=2 draws the phantom ops

* found resnet issue

* non working test

* mul is cheaper than div

* OPT inflation

* SHUFFLE_PAD_OPS in OPT=2
2023-03-22 18:15:07 -07:00
George Hotz
5495c7d64e linearizer! (#714)
* linearizer outputs something

* working ish

* cstyle codegen

* clang mostly works

* fix load valid

* fix numberless loop

* fancy gen

* working

* fix enet compiler

* cleanups

* float4 upcasting

* less lines

* supports_float4

* constant folding

* mulacc

* internet tests flaky in CI

* 90% image support

* fix image generic

* bugs exposed with shapetracker and single view

* new llvm

* use vload, remove OLD

* that's really poorly done

* ending up being more lines
2023-03-19 23:43:49 -07:00
George Hotz
902906f909 Fix constant folding (#713)
* fix

* codegen

* contiguous is real

* no bufs_to_delete

* don't assign rawconst

* remove neg and not

* need exec to fix custom function jit
2023-03-18 17:52:46 -07:00
George Hotz
f5467cfedc Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in it's proper place
2023-03-18 14:40:23 -07:00
George Hotz
c594a0a835 fix flip bug, add new unit tests 2023-03-12 23:55:31 -07:00
George Hotz
61071f881a fix bug, and add unit test to catch failure 2023-03-11 16:57:25 -08:00
Diogo
784afc6c6f Eq magic function support (#683)
* add eq magic func

* changed from eq to __eq__

* ignore type for linter

* mypy doesn't like descriptions :(

2023-03-11 10:31:46 -08:00
George Hotz
0b03216cc3 losing lines (#678)
* losing lines

* FLIP -> STRIDE

* shapetracker refactor
2023-03-10 21:57:05 -08:00
George Hotz
1a039306d2 good changes from llama branch (#671)
* good changes from llama

* transpose behavior changed
2023-03-09 20:51:22 -08:00
George Hotz
e0244baf60 3 letters for graph op 2023-03-07 19:20:48 -08:00
George Hotz
382f346523 clean up opt (#649)
* clean up opt

* don't let global kernels get too small

* 8192 -> 1024

* disable local shape for clang

* fix can_merge

* unroll the 5x5 depthwise convs in op

* load float4 check
2023-03-05 20:49:36 -08:00
George Hotz
0335cb86b9 refactor comparison. there's a bug in the method cache 2023-03-02 10:10:16 -08:00
George Hotz
11f257ddf9 clean up cmp tests 2023-03-02 09:35:16 -08:00
Diogo
52204a7b88 adding comparison operators (#616)
* Less, LessOrEqual, Greater, GreaterOrEqual, Equal

* lint fix

* using built in functions

* overriding __eq__ breaks things

* backwards pass for less - forward only tests

* one other spot

* removing backwards for comparison ops to match pytorch

* raise runtime error

* more tests for comparison ops

* fixed the lineup

* added number upcast tests
2023-03-02 08:10:44 -08:00
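
A tiny illustration of the elementwise comparisons this commit adds; the operator set is taken from the first bullet, and the forward-only behavior is noted in the commit itself.

    from tinygrad.tensor import Tensor

    a = Tensor([1.0, 2.0, 3.0])
    b = Tensor([3.0, 2.0, 1.0])
    # elementwise comparisons; per this commit they are forward-only (no backward pass), matching pytorch
    print((a < b).numpy(), (a <= b).numpy(), (a > b).numpy(), (a >= b).numpy())
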
George Hotz
e9e71fbfc4 remove mlop (#619)
* remove mlop

* lil simpler
2023-02-28 17:58:24 -08:00
George Hotz
4c4d88aad4 fix the last bug, and make HLOP the default 2023-02-28 17:04:28 -08:00
George Hotz
686a74de92 fast zeros and ones 2023-02-27 06:46:26 -08:00
George Hotz
8b96522e1d instant identity removal 2023-02-25 09:46:04 -08:00
George Hotz
2c5e13a513 Reluless (#600)
* replace relu for maximum

* fix for other backend

* clean up RELU and GT0

* tests for maximum

* had to clean that up

* why reverse a maximum?
2023-02-25 01:21:16 -08:00
George Hotz
f8f026e8bb oversized expand for HLOP convs 2023-02-24 21:48:47 -08:00
George Hotz
2e56a4793e rename log_softmax, support dim, fix onnx Softmax 2023-02-24 10:11:24 -08:00
George Hotz
758515dcc0 conv2d is an hlop (#589)
* conv2d is an hlop

* shorter conv

* KOPT=-1

* alt imp

* MULACC

* smarter mulacc

* pop conv

* 7x7 -> 5x5

* didn't fix, that's not going to work

* this is faster and matches old behavior

* oh, non lazy just won't work with mulacc

* mulacc in torch

* bool types were creeping in

* optimizer is actually better with hlop conv

* fix pushing permutes issue

* refactor einsum_mulacc

* fix up readme

* update readme

* _image_conv2d

* fix bias addition location

* pushing permutes gets back to 200 kernels

* conv cleanup

* disable hlop conv

* don't hide that in helpers
2023-02-23 17:52:31 -08:00
George Hotz
fd6082dcef support all _pool2d. conv will eventually be an hlop 2023-02-23 08:19:47 -08:00
George Hotz
c8d89eb20e avg/max pool strides 2023-02-22 18:00:48 -08:00
Connor Henderson
9670bf1fd1 Add unsqueeze (#574)
* Add unsqueeze

* remove UNSQUEEZE from llops part of readme

* make it an hlop
2023-02-20 20:14:59 -08:00
Jacky Lee
ad4f6aa2cf Add test for quick_gelu (#526)
* Add test for quick_gelu

* Bump PyTorch version for approximate
2023-02-03 20:01:39 -08:00
George Hotz
cd97b036cc A Triton backend for tinygrad (#470)
* triton can add

* print stuff from triton

* write out file

* ops triton working

* reduce ops

* sort of works

* Triton bugfixes & implementation of remaining ops (#490)

* padding

* support pow, max, relu, gt0

* allocate return buffer

* Fix reduce

* Add tests for power op

* Fix triton illegal memory accesses and memory leak (#512)

* Fix mypy issue

* Add triton to setup.py

* Replace torch with pycuda

* Use one cuda stream for data transfer and kernels

* Remove triton submodule

* Fix memory leak by using weakrefs for caching

* Fix memory access by adding valid as mask for load

* Fix invalid kernel launches by flattening the grid (#515)

---------

Co-authored-by: Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
2023-02-01 11:53:57 -08:00
Jacky Lee
799b3f185a Refactor getenv into helpers (#508)
* Refactor getenv into helpers

* Remove unused os

* Fix default value

* Fix more defaults for CI

* Fix bracket

* Revert changes to openpilot/compile.py

* Use getenv from helpers when possible
2023-01-31 15:09:09 -08:00
George Hotz
de2c419fd4 make_pair and first attempt at hlb_cifar10 2023-01-30 11:07:23 -08:00
George Hotz
bd8a5c2ced Simple CUDA Runtime (#480)
* factor out opencl runtime

* don't use CL outside the runtime

* cuda runtime adds

* final_dimension

* tests pass with CUDA backend

* more cuda

* cuda simpler

* retain old functionality

* linter and typing

* move globalcounters out of runtimes

* oops, GlobalCounters in cuda

* MAX_OUTPUT_SHAPE=3 is fine for CUDA
2023-01-27 16:26:24 -08:00
George Hotz
9245f4650a indexer changes for master 2023-01-18 18:02:02 -08:00