George Hotz
4d232c7c95
optional networkx + DEBUGCL=2
2023-02-20 09:50:46 -08:00
George Hotz
bbfec2fde7
8.46 TFLOPS
2023-02-19 13:21:25 -08:00
George Hotz
1ba847963d
reshape and retain metal_matmul
2023-02-19 13:07:23 -08:00
Kirill
7944cfdadc
Remove Tensor.data (#565)
2023-02-18 16:36:12 -08:00
Jacky Lee
9fd41632c6
Import get_parameters from tinygrad.nn (#559)
* get_parameter is in optim
* Update all imports for get_parameters
* Clean up
* use optim.get_parameters
2023-02-17 15:22:26 -08:00
George Hotz
82c257e8f5
more kernel search
2023-02-12 10:34:56 -08:00
George Hotz
de71c13934
test speed v torch uses jit
2023-02-12 07:43:17 -08:00
George Hotz
ba3bf5bdf7
cifar stops learning
2023-02-11 17:21:42 -08:00
George Hotz
40f3949742
fancier KOPT
2023-02-11 16:40:25 -08:00
George Hotz
446442dbb3
fix tests symbolic
2023-02-11 15:16:47 -08:00
George Hotz
20a351a3c6
hand optim CONVW
2023-02-11 14:41:08 -08:00
George Hotz
031edd01e6
switch openpilot compile to TinyJit
2023-02-11 09:51:44 -08:00
George Hotz
608fd730d3
put the JIT in extra
2023-02-11 00:35:18 -06:00
George Hotz
fed95119dc
CL.mem_used -> GlobalCounters.mem_used
2023-02-10 23:13:29 -06:00
Kirill
27154db99a
Downloads weights in examples/stable_diffusion.py (#537)
* Downloads weights in examples/stable_diffusion.py
* use download_file_if_not_exists in fetch
* make consistent with previous NOCACHE behavior
2023-02-10 14:37:04 -06:00
George Hotz
5ed3622965
add dump to kernel_search
2023-02-10 12:13:30 -06:00
George Hotz
d9555bc478
that turned out to be dumb
2023-02-08 16:52:29 -06:00
George Hotz
3d63934995
refactor to keep cl in the runtime (#545)
* refactor to keep cl in the runtime
* fix thneed, rename cl to _cl
* bugfix + _cuda
* fix tests
* thneed more correct
2023-02-08 16:46:09 -06:00
George Hotz
2844482a60
Mypy fun (#541)
* mypy fun
* things are just faster
* running fast
* mypy is fast
* compile.sh
* no gpu hack
* refactor ops_cpu and ops_torch to not subclass
* make weak buffer work
* tensor works
* fix test failing
* cpu/torch cleanups
* no or operator on dict in python 3.8
* that was junk
* fix warnings
* comment and touchup
2023-02-08 09:56:51 -06:00
George Hotz
185d2e3678
fix map_buffer and add some __slots__
2023-02-07 15:32:48 -06:00
George Hotz
d93563f39f
fix KOPT
2023-02-07 06:56:33 -06:00
George Hotz
f7291f6ca3
fixes big KOPT, breaks opencl (#505)
* fixes big KOPT, breaks opencl
* fix optimizer
* KernelCache
* oops, broke batchnorm
* hack to fix it
* fix llvm, less hacky gpu
* disable the cache
* cache just breaks things
2023-02-05 10:46:17 -08:00
George Hotz
cd97b036cc
A Triton backend for tinygrad (#470)
* triton can add
* print stuff from triton
* write out file
* ops triton working
* reduce ops
* sort of works
* Triton bugfixes & implementation of remaining ops (#490)
* padding
* support pow, max, relu, gt0
* allocate return buffer
* Fix reduce
* Add tests for power op
* Fix triton illegal memory accesses and memory leak (#512)
* Fix mypy issue
* Add triton to setup.py
* Replace torch with pycuda
* Use one cuda stream for data transfer and kernels
* Remove triton submodule
* Fix memory leak by using weakrefs for caching
* Fix memory access by adding valid as mask for load
* Fix invalid kernel launches by flattening the grid (#515)
---------
Co-authored-by: Martin Loretz <20306567+martinloretzzz@users.noreply.github.com>
2023-02-01 11:53:57 -08:00
Jacky Lee
799b3f185a
Refactor getenv into helpers (#508)
* Refactor getenv into helpers
* Remove unused os
* Fix default value
* Fix more defaults for CI
* Fix bracket
* Revert changes to openpilot/compile.py
* Use getenv from helpers when possible
2023-01-31 15:09:09 -08:00
George Hotz
60ccddb58b
reenable SWAP
2023-01-30 17:32:02 -08:00
George Hotz
aea55eb196
found failing upcast
2023-01-30 16:12:56 -08:00
George Hotz
b67f997864
tests pass w/o float4
2023-01-30 15:40:49 -08:00
George Hotz
c6f570a2e6
improve progress bar
2023-01-30 14:50:28 -08:00
George Hotz
7118602c97
goat progress bar
2023-01-30 14:37:26 -08:00
George Hotz
cccfea4b25
factor out KOPT code
2023-01-30 13:13:55 -08:00
George Hotz
de2c419fd4
make_pair and first attempt at hlb_cifar10
2023-01-30 11:07:23 -08:00
AllentDan
7b6b1f32b1
[Fix] fix typo: test_mnist -> datasets (#492)
* test_mnist -> datasets
* fix mnist_gan
2023-01-29 21:30:47 -08:00
George Hotz
2db272c7f7
Kernel Optimizer (#489)
* kernel optimizer
* 10x faster, but wrong. not good deal
* move test -> extra
* print x speedup
* clcache
* fix clcache + DEBUG
* GFLOPS estimate
* i==3
2023-01-29 17:15:00 -08:00
George Hotz
bb0cdc2442
111.51x speedup for reduce
2023-01-29 03:06:00 -08:00
George Hotz
45c0aa6e2d
search with SHIFT, REDUCE
2023-01-29 02:42:20 -08:00
George Hotz
87879cf4b6
improve search more
2023-01-29 02:08:57 -08:00
George Hotz
f6bbd43cb8
improve search
2023-01-29 01:33:47 -08:00
George Hotz
ebdec2b72f
fix optimizer
2023-01-29 00:23:06 -08:00
George Hotz
a9cabce791
oops, broke mem estimates
2023-01-28 20:21:31 -08:00
George Hotz
a500e79bd1
don't OPTWG on OS X, it's way slower
2023-01-28 20:02:33 -08:00
George Hotz
b0df4d99a0
os x profiling: this ratio is exact i believe
2023-01-28 19:02:51 -08:00
George Hotz
ae810eb558
minor cleanups
2023-01-28 08:59:15 -08:00
George Hotz
6d5e1a8029
GEMM kernel search
2023-01-27 10:08:57 -08:00
Comma Device
f08e740957
factor out hand coded opt
2023-01-26 14:54:06 -06:00
George Hotz
5e8a36a18b
real op kernel
2023-01-26 09:51:32 -08:00
George Hotz
e0600f537a
op kernel in kernel search
2023-01-26 09:47:01 -08:00
George Hotz
aafc29484a
cleanups
2023-01-25 12:37:10 -08:00
George Hotz
919e943867
decent search
2023-01-25 12:20:53 -08:00
George Hotz
7f3da91f8b
kernel_search
2023-01-25 12:05:09 -08:00
George Hotz
e37424424f
first little attempt at search
2023-01-25 11:49:29 -08:00