10417 Commits

Author SHA1 Message Date
James Roberts
db0a9b0a2d Refactor CL.time_sum into GlobalCounters (#519) 2023-02-01 20:13:56 -08:00
Martin Loretz
45e847d284 Update triton to work in master (#517)
* Update triton to work in master

* Move mem_estimate out of runner
2023-02-01 12:58:14 -08:00
George Hotz
5e37f084db stable diffusion: clean up constant folding 2023-02-01 12:53:16 -08:00
George Hotz
175c38d1b3 triton: it already was GT0 2023-02-01 12:00:33 -08:00
Jacky Lee
486f023e81 Rename Normalize and move to nn (#513)
* Rename Normalize and move to nn

* Match PyTorch for dim>1
2023-02-01 11:55:03 -08:00
George Hotz
cd97b036cc A Triton backend for tinygrad (#470)
* triton can add

* print stuff from triton

* write out file

* ops triton working

* reduce ops

* sort of works

* Triton bugfixes & implementation of remaining ops (#490)

* padding

* support pow, max, relu, gt0

* allocate return buffer

* Fix reduce

* Add tests for power op

* Fix triton illegal memory accesses and memory leak (#512)

* Fix mypy issue

* Add triton to setup.py

* Replace torch with pycuda

* Use one cuda stream for data transfer and kernels

* Remove triton submodule

* Fix memory leak by using weakrefs for caching

* Fix memory access by adding valid as mask for load

* Fix invalid kernel launches by flattening the grid (#515)

---------

Co-authored-by: Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
2023-02-01 11:53:57 -08:00
George Hotz
4e24002bbe no generic exceptions 2023-02-01 11:14:37 -08:00
Jacky Lee
54c68defc7 Replace SIGN with GT0 (#511)
* Replace sign with gt0

* Replace sign with gt0

* GT0 works on GPU

* Fix brackets

---------

Co-authored-by: Tom Finet <tom.codeninja@gmail.com>
2023-02-01 11:01:39 -08:00
Jacky Lee
799b3f185a Refactor getenv into helpers (#508)
* Refactor getenv into helpers

* Remove unused os

* Fix default value

* Fix more defaults for CI

* Fix bracket

* Revert changes to openpilot/compile.py

* Use getenv from helpers when possible
2023-01-31 15:09:09 -08:00
George Hotz
d91b6711ea oops, broke BN 2023-01-31 08:18:48 -08:00
George Hotz
21f2af08d5 getenv + graphing 2023-01-30 19:15:03 -08:00
Jacky Lee
491e78d203 Add symbolic tests for correctness (#494)
* [WIP] Add symbolic tests for correctness

* Fix typo

* Fix expected value for test_and_fold

* Add more tests for symbolic

* It is indeed right

* Clean up

* Check all strings

* Put TODO back
2023-01-30 18:40:16 -08:00
George Hotz
60ccddb58b reenable SWAP 2023-01-30 17:32:02 -08:00
George Hotz
c1a769b68b fix bug in gpu copy out 2023-01-30 16:51:28 -08:00
George Hotz
e87410c531 fix multiple accumulators 2023-01-30 16:22:26 -08:00
George Hotz
aea55eb196 found failing upcast 2023-01-30 16:12:56 -08:00
George Hotz
b67f997864 tests pass w/o float4 2023-01-30 15:40:49 -08:00
George Hotz
c6f570a2e6 improve progress bar 2023-01-30 14:50:28 -08:00
Kevin Gilpin
4685c9c095 Big changes (#498)
Use make_pair
2023-01-30 14:42:22 -08:00
George Hotz
7118602c97 goat progress bar 2023-01-30 14:37:26 -08:00
George Hotz
7ee0d99c70 CLCACHE 2023-01-30 14:02:06 -08:00
George Hotz
7457f0d755 KOPT=2 2023-01-30 13:28:06 -08:00
George Hotz
cccfea4b25 factor out KOPT code 2023-01-30 13:13:55 -08:00
George Hotz
de2c419fd4 make_pair and first attempt at hlb_cifar10 2023-01-30 11:07:23 -08:00
AllentDan
7b6b1f32b1 [Fix] fix typo: test_mnist -> datasets (#492)
* test_mnist -> datasets

* fix mnist_gan
2023-01-29 21:30:47 -08:00
George Hotz
2db272c7f7 Kernel Optimizer (#489)
* kernel optimizer

* 10x faster, but wrong. not good deal

* move test -> extra

* print x speedup

* clcache

* fix clcache + DEBUG

* GFLOPS estimate

* i==3
2023-01-29 17:15:00 -08:00
Martin Loretz
43abbd3d00 Use force_create to allocate return buffer (#491) 2023-01-29 17:13:10 -08:00
George Hotz
bb0cdc2442 111.51x speedup for reduce 2023-01-29 03:06:00 -08:00
George Hotz
45c0aa6e2d search with SHIFT, REDUCE 2023-01-29 02:42:20 -08:00
George Hotz
87879cf4b6 improve search more 2023-01-29 02:08:57 -08:00
George Hotz
f6bbd43cb8 improve search 2023-01-29 01:33:47 -08:00
George Hotz
ebdec2b72f fix optimizer 2023-01-29 00:23:06 -08:00
George Hotz
a9cabce791 oops, broke mem estimates 2023-01-28 20:21:31 -08:00
George Hotz
a500e79bd1 don't OPTWG on OS X, it's way slower 2023-01-28 20:02:33 -08:00
George Hotz
b0df4d99a0 os x profiling: this ratio is exact i believe 2023-01-28 19:02:51 -08:00
George Hotz
c0963b723e should fix tests 2023-01-28 15:13:03 -08:00
George Hotz
b134a4f3d1 don't upcast already upcasted 2023-01-28 14:58:28 -08:00
George Hotz
2f194aadad loop unrolling upcast 2023-01-28 14:51:24 -08:00
George Hotz
381f3e92da fix prints, add third conv 2023-01-28 14:10:27 -08:00
George Hotz
92001a06e1 openpilot/go.sh 2023-01-28 13:57:43 -08:00
George Hotz
aea29f8a6e fix CUDA reduce 2023-01-28 13:38:58 -08:00
George Hotz
0f34c24aeb move expr_idxs to shapetracker 2023-01-28 12:25:05 -08:00
George Hotz
f2e81f7208 line reduction and cleanups 2023-01-28 12:17:40 -08:00
George Hotz
03dd1201dc local buffer implied 2023-01-28 12:06:28 -08:00
George Hotz
b3e4e678e8 Use ShapeTracker for tracking shapes in kernels (#485)
* local is a normal buffer

* remove extra shapes and strides

* fix opt

* fix llvm
2023-01-28 11:56:32 -08:00
George Hotz
259c48f235 discord image is invite link 2023-01-28 11:42:11 -08:00
George Hotz
d748000ada tinygrad discord 2023-01-28 11:36:15 -08:00
George Hotz
ae810eb558 minor cleanups 2023-01-28 08:59:15 -08:00
George Hotz
713318745d padding size in get_conv_args 2023-01-28 08:47:18 -08:00
George Hotz
299d1cdc9c lil cleanup of load ldr 2023-01-28 00:31:57 -08:00