Commit Graph

1363 Commits

Author SHA1 Message Date
George Hotz
4d232c7c95 optional networkx + DEBUGCL=2 2023-02-20 09:50:46 -08:00
George Hotz
bbfec2fde7 8.46 TFLOPS 2023-02-19 13:21:25 -08:00
George Hotz
1ba847963d reshape and retain metal_matmul 2023-02-19 13:07:23 -08:00
Kirill
7944cfdadc Remove Tensor.data (#565) 2023-02-18 16:36:12 -08:00
Jacky Lee
9fd41632c6 Import get_parameters from tinygrad.nn (#559)
* get_parameter is in optim

* Update all imports for get_parameters

* Clean up

* use optim.get_paramters
2023-02-17 15:22:26 -08:00
George Hotz
82c257e8f5 more kernel search 2023-02-12 10:34:56 -08:00
George Hotz
de71c13934 test speed v torch uses jit 2023-02-12 07:43:17 -08:00
George Hotz
ba3bf5bdf7 cifar stops learning 2023-02-11 17:21:42 -08:00
George Hotz
40f3949742 fancier KOPT 2023-02-11 16:40:25 -08:00
George Hotz
446442dbb3 fix tests symbolic 2023-02-11 15:16:47 -08:00
George Hotz
20a351a3c6 hand optim CONVW 2023-02-11 14:41:08 -08:00
George Hotz
031edd01e6 switch openpilot compile to TinyJit 2023-02-11 09:51:44 -08:00
George Hotz
608fd730d3 put the JIT in extra 2023-02-11 00:35:18 -06:00
George Hotz
fed95119dc CL.mem_used -> GlobalCounters.mem_used 2023-02-10 23:13:29 -06:00
Kirill
27154db99a Downloads weights in examples/stable_diffusion.py (#537)
* Downloads weights in examples/stable_diffusion.py

* use download_file_if_not_exists in fetch

* make consistent with previous NOCACHE behavior
2023-02-10 14:37:04 -06:00
George Hotz
5ed3622965 add dump to kernel_search 2023-02-10 12:13:30 -06:00
George Hotz
d9555bc478 that turned out to be dumb 2023-02-08 16:52:29 -06:00
George Hotz
3d63934995 refactor to keep cl in the runtime (#545)
* refactor to keep cl in the runtime

* fix thneed, rename cl to _cl

* bugfix + _cuda

* fix tests

* thneed more correct
2023-02-08 16:46:09 -06:00
George Hotz
2844482a60 Mypy fun (#541)
* mypy fun

* things are just faster

* running fast

* mypy is fast

* compile.sh

* no gpu hack

* refactor ops_cpu and ops_torch to not subclass

* make weak buffer work

* tensor works

* fix test failing

* cpu/torch cleanups

* no or operator on dict in python 3.8

* that was junk

* fix warnings

* comment and touchup
2023-02-08 09:56:51 -06:00
George Hotz
185d2e3678 fix map_buffer and add some __slots__ 2023-02-07 15:32:48 -06:00
George Hotz
d93563f39f fix KOPT 2023-02-07 06:56:33 -06:00
George Hotz
f7291f6ca3 fixes big KOPT, breaks opencl (#505)
* fixes big KOPT, breaks opencl

* fix optimizer

* KernelCache

* oops, broke batchnorm

* hack to fix it

* fix llvm, less hacky gpu

* disable the cache

* cache just breaks things
2023-02-05 10:46:17 -08:00
George Hotz
cd97b036cc A Triton backend for tinygrad (#470)
* triton can add

* print stuff from triton

* write out file

* ops triton working

* reduce ops

* sort of works

* Triton bugfixes & implementation of remaining ops (#490)

* padding

* support pow, max, relu, gt0

* allocate return buffer

* Fix reduce

* Add tests for power op

* Fix triton illegal memory accesses and memory leak (#512)

* Fix mypy issue

* Add triton to setup.py

* Replace torch with pycuda

* Use one cuda stream for data transfer and kernels

* Remove triton submodule

* Fix memory leak by using weakrefs for caching

* Fix memory access by adding valid as mask for load

* Fix invalid kernel launches by flattening the grid (#515)

---------

Co-authored-by: Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
2023-02-01 11:53:57 -08:00
Jacky Lee
799b3f185a Refactor getenv into helpers (#508)
* Refactor getenv into helpers

* Remove unused os

* Fix default value

* Fix more defaults for CI

* Fix bracket

* Revert changes to openpilot/compile.py

* Use getenv from helpers when possible
2023-01-31 15:09:09 -08:00
George Hotz
60ccddb58b reenable SWAP 2023-01-30 17:32:02 -08:00
George Hotz
aea55eb196 found failing upcast 2023-01-30 16:12:56 -08:00
George Hotz
b67f997864 tests pass w/o float4 2023-01-30 15:40:49 -08:00
George Hotz
c6f570a2e6 improve progress bar 2023-01-30 14:50:28 -08:00
George Hotz
7118602c97 goat progress bar 2023-01-30 14:37:26 -08:00
George Hotz
cccfea4b25 factor out KOPT code 2023-01-30 13:13:55 -08:00
George Hotz
de2c419fd4 make_pair and first attempt at hlb_cifar10 2023-01-30 11:07:23 -08:00
AllentDan
7b6b1f32b1 [Fix] fix typo: test_mnist -> datasets (#492)
* test_mnist -> datasets

* fix mnist_gan
2023-01-29 21:30:47 -08:00
George Hotz
2db272c7f7 Kernel Optimizer (#489)
* kernel optimizer

* 10x faster, but wrong. not good deal

* move test -> extra

* print x speedup

* clcache

* fix clcache + DEBUG

* GFLOPS estimate

* i==3
2023-01-29 17:15:00 -08:00
George Hotz
bb0cdc2442 111.51x speedup for reduce 2023-01-29 03:06:00 -08:00
George Hotz
45c0aa6e2d search with SHIFT, REDUCE 2023-01-29 02:42:20 -08:00
George Hotz
87879cf4b6 improve search more 2023-01-29 02:08:57 -08:00
George Hotz
f6bbd43cb8 improve search 2023-01-29 01:33:47 -08:00
George Hotz
ebdec2b72f fix optimizer 2023-01-29 00:23:06 -08:00
George Hotz
a9cabce791 oops, broke mem estimates 2023-01-28 20:21:31 -08:00
George Hotz
a500e79bd1 don't OPTWG on OS X, it's way slower 2023-01-28 20:02:33 -08:00
George Hotz
b0df4d99a0 os x profiling: this ratio is exact i believe 2023-01-28 19:02:51 -08:00
George Hotz
ae810eb558 minor cleanups 2023-01-28 08:59:15 -08:00
George Hotz
6d5e1a8029 GEMM kernel search 2023-01-27 10:08:57 -08:00
Comma Device
f08e740957 factor out hand coded opt 2023-01-26 14:54:06 -06:00
George Hotz
5e8a36a18b real op kernel 2023-01-26 09:51:32 -08:00
George Hotz
e0600f537a op kernel in kernel search 2023-01-26 09:47:01 -08:00
George Hotz
aafc29484a cleanups 2023-01-25 12:37:10 -08:00
George Hotz
919e943867 decent search 2023-01-25 12:20:53 -08:00
George Hotz
7f3da91f8b kernel_search 2023-01-25 12:05:09 -08:00
George Hotz
e37424424f first little attempt at search 2023-01-25 11:49:29 -08:00