4667 Commits

Author SHA1 Message Date
George Hotz
758515dcc0 conv2d is an hlop (#589)
* conv2d is an hlop

* shorter conv

* KOPT=-1

* alt imp

* MULACC

* smarter mulacc

* pop conv

* 7x7 -> 5x5

* didn't fix, that's not going to work

* this is faster and matches old behavior

* oh, non lazy just won't work with mulacc

* mulacc in torch

* bool types were creeping in

* optimizer is actually better with hlop conv

* fix pushing permutes issue

* refactor einsum_mulacc

* fix up readme

* update readme

* _image_conv2d

* fix bias addition location

* pushing permutes gets back to 200 kernels

* conv cleanup

* disable hlop conv

* don't hide that in helpers
2023-02-23 17:52:31 -08:00
George Hotz
ab3a2ae9a2 fix test_resnet in onnx now that maxpool works 2023-02-23 08:41:47 -08:00
George Hotz
fd6082dcef support all _pool2d. conv will eventually be an hlop 2023-02-23 08:19:47 -08:00
Mischa Untaga
5190784cbb Fix Tensor random functions determinism with same seed (#580)
* fix Tensor random functions determinism with same seed

* long lived rng

* TIL ClassVar typing
2023-02-22 19:08:43 -08:00
George Hotz
c8d89eb20e avg/max pool strides 2023-02-22 18:00:48 -08:00
George Hotz
628ce067a1 add tests to mypy 2023-02-22 07:07:38 -08:00
George Hotz
104c3c5e73 oops, forgot that debug 2023-02-22 06:58:27 -08:00
Connor Henderson
9670bf1fd1 Add unsqueeze (#574)
* Add unsqueeze

* remove UNSQUEEZE from llops part of readme

* make it an hlop
2023-02-20 20:14:59 -08:00
George Hotz
60008e55cd sick of that failing 2023-02-19 13:05:37 -08:00
Martin Loretz
7e9a5e3f31 Refactor graph (#560)
* Refactor graph

* Add graph tests

* Use CPUBuffer for graph tests

* Remove the use of GlobalCounters
2023-02-19 10:41:30 -08:00
Kirill
7944cfdadc Remove Tensor.data (#565) 2023-02-18 16:36:12 -08:00
Jacky Lee
9fd41632c6 Import get_parameters from tinygrad.nn (#559)
* get_parameter is in optim

* Update all imports for get_parameters

* Clean up

* use optim.get_parameters
2023-02-17 15:22:26 -08:00
George Hotz
fae7654924 fix sync issue 2023-02-17 12:42:45 -08:00
George Hotz
5e6265be6e metal timing, fix speed test 2023-02-17 12:31:54 -08:00
George Hotz
121bd03cbd metal globalcounters 2023-02-17 12:02:54 -08:00
Jacky Lee
e172f0087a BatchNorm2D -> BatchNorm2d (#558)
* BatchNorm2D -> BatchNorm2d

* Fix typo
2023-02-16 12:31:49 -08:00
George Hotz
20a03d5017 woah, don't sync torch if it's not torch 2023-02-12 07:48:56 -08:00
George Hotz
de71c13934 test speed v torch uses jit 2023-02-12 07:43:17 -08:00
George Hotz
446442dbb3 fix tests symbolic 2023-02-11 15:16:47 -08:00
George Hotz
7a7046f264 sum_combine_num 2023-02-11 14:48:31 -08:00
George Hotz
7d33f2d659 CL.CACHE is over, GlobalCounters.cache is it 2023-02-11 12:00:14 -08:00
George Hotz
0a2035e015 oops, GPU isn't defined 2023-02-11 10:10:02 -08:00
George Hotz
3421d4af10 the jit has a test 2023-02-11 10:04:03 -08:00
George Hotz
b9f02671d3 oops, broke torch speed test 2023-02-10 16:13:53 -06:00
Jacky Lee
5c51ae8dbf Show where tinygrad is faster in speed test vs torch (#549)
* show where tinygrad is faster

* don't change text color
2023-02-10 14:01:07 -06:00
George Hotz
c3cf17c6d0 Symbolic render (#550)
* render symbolic

* valid

* fix shapetracker tests

* render_python is the default

* expr is gone

* remove legacy behavior
2023-02-10 13:22:26 -06:00
Lucas Keller
56a06280c5 Testing/utils (#548)
* New unittest for utils.py

Unit test fetch in basic ways. Would have tested more fetches, but
downloading stuff for tests is annoying and mocking is more
dependencies.

* Remove unused imports
2023-02-10 12:08:20 -06:00
George Hotz
5de850f6d5 assign buffer reuse (#547)
* assign buffer reuse works

* fix assign for torch and cpu

* allow assign from numpy

* fix llvm output_buffer

* add some assign tests

* fix assignment test

* test should fail without lazy

* env var to disable assign
2023-02-09 11:53:02 -06:00
George Hotz
473bbd3e35 fix graphs 2023-02-09 09:40:46 -06:00
George Hotz
3d63934995 refactor to keep cl in the runtime (#545)
* refactor to keep cl in the runtime

* fix thneed, rename cl to _cl

* bugfix + _cuda

* fix tests

* thneed more correct
2023-02-08 16:46:09 -06:00
Mitchell Goff
ae4f0aeb5f NumPy-like semantics for Tensor.__getitem__ (#506)
* Rewrote Tensor.__getitem__ to fix negative indices and add support for np.newaxis/None

* Fixed pad2d

* mypy doesn't know about mlops methods

* normal python behavior for out-of-bounds slicing

* type: ignore

* inlined idxfix

* added comment for __getitem__

* Better comments, better tests, and fixed bug in np.newaxis
2023-02-08 08:59:46 -06:00
George Hotz
aebe75d9a2 remove val expansion (#539)
* remove val expansion

* types for all shapetracker functions:

* more typing

* add all the parens to the test

* more types

* fix tests

* very minor speedup
2023-02-07 15:14:05 -06:00
Jared Z
7604b17fbf TestZeroViewShapeTracker fix test (#481)
* TestZeroViewST test

* updated to align with st naming conventions in file

* Update test_shapetracker.py
2023-02-07 06:17:55 -06:00
George Hotz
c073271f20 more symbolic correctness 2023-02-07 00:03:14 -06:00
George Hotz
e961fd3a04 more symbolic test, ModNode is wrong 2023-02-06 23:43:21 -06:00
George Hotz
8cfeb118d6 symbolic new test 2023-02-06 23:27:26 -06:00
George Hotz
c3d81bba2a test_train: Adam -> SGD 2023-02-06 08:55:41 -06:00
Jacky Lee
ad4f6aa2cf Add test for quick_gelu (#526)
* Add test for quick_gelu

* Bump PyTorch version for approximate
2023-02-03 20:01:39 -08:00
Jacky Lee
486f023e81 Rename Normalize and move to nn (#513)
* Rename Normalize and move to nn

* Match PyTorch for dim>1
2023-02-01 11:55:03 -08:00
George Hotz
cd97b036cc A Triton backend for tinygrad (#470)
* triton can add

* print stuff from triton

* write out file

* ops triton working

* reduce ops

* sort of works

* Triton bugfixes & implementation of remaining ops (#490)

* padding

* support pow, max, relu, gt0

* allocate return buffer

* Fix reduce

* Add tests for power op

* Fix triton illegal memory accesses and memory leak (#512)

* Fix mypy issue

* Add triton to setup.py

* Replace torch with pycuda

* Use one cuda stream for data transfer and kernels

* Remove triton submodule

* Fix memory leak by using weakrefs for caching

* Fix memory access by adding valid as mask for load

* Fix invalid kernel launches by flattening the grid (#515)

---------

Co-authored-by: Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
2023-02-01 11:53:57 -08:00
Jacky Lee
799b3f185a Refactor getenv into helpers (#508)
* Refactor getenv into helpers

* Remove unused os

* Fix default value

* Fix more defaults for CI

* Fix bracket

* Revert changes to openpilot/compile.py

* Use getenv from helpers when possible
2023-01-31 15:09:09 -08:00
Jacky Lee
491e78d203 Add symbolic tests for correctness (#494)
* [WIP] Add symbolic tests for correctness

* Fix typo

* Fix expected value for test_and_fold

* Add more tests for symbolic

* It is indeed right

* Clean up

* Check all strings

* Put TODO back
2023-01-30 18:40:16 -08:00
George Hotz
7457f0d755 KOPT=2 2023-01-30 13:28:06 -08:00
George Hotz
cccfea4b25 factor out KOPT code 2023-01-30 13:13:55 -08:00
George Hotz
de2c419fd4 make_pair and first attempt at hlb_cifar10 2023-01-30 11:07:23 -08:00
George Hotz
2db272c7f7 Kernel Optimizer (#489)
* kernel optimizer

* 10x faster, but wrong. not good deal

* move test -> extra

* print x speedup

* clcache

* fix clcache + DEBUG

* GFLOPS estimate

* i==3
2023-01-29 17:15:00 -08:00
George Hotz
ebdec2b72f fix optimizer 2023-01-29 00:23:06 -08:00
George Hotz
b0df4d99a0 os x profiling: this ratio is exact i believe 2023-01-28 19:02:51 -08:00
George Hotz
2f194aadad loop unrolling upcast 2023-01-28 14:51:24 -08:00
George Hotz
381f3e92da fix prints, add third conv 2023-01-28 14:10:27 -08:00