Commit Graph

9457 Commits

Author SHA1 Message Date
George Hotz
68959be05d precompute weights for opencl 2022-07-08 10:56:48 -07:00
George Hotz
d8e7f1f8bc opencl type ignore 2022-07-08 10:33:55 -07:00
George Hotz
ae335b6d3e opencl works, but tons of kernels 2022-07-08 10:22:04 -07:00
George Hotz
5b66d1bb0b begin fixing up opencl 2022-07-08 10:20:14 -07:00
George Hotz
7e17f2ae8d fix mypy, add TODOs 2022-07-08 09:57:22 -07:00
George Hotz
8557ed88df use ast engine for merged reduceop 2022-07-08 09:37:40 -07:00
George Hotz
3656a5615a MERGE_ELEMENTWISE_INTO_REDUCE 2022-07-08 09:32:28 -07:00
George Hotz
ca9532ce29 less lines, and typing found a bug 2022-07-08 08:57:12 -07:00
George Hotz
2035b89e54 wooo 1k lines 2022-07-08 08:44:57 -07:00
George Hotz
2a8c1071d9 cleanups 2022-07-08 08:36:31 -07:00
George Hotz
e6733286df unify conv and reduce 2022-07-08 08:27:30 -07:00
George Hotz
9c34b3eef3 tighten up gpu kernels 2022-07-08 07:59:04 -07:00
George Hotz
563bf2d8e8 force input/weight to be contiguous (uncached) 2022-07-08 07:40:30 -07:00
George Hotz
1cf805a56a fix no MERGE_MOVEMENT_OPS bug 2022-07-08 07:27:53 -07:00
George Hotz
c0ef998b48 remove finished todo 2022-07-07 11:39:25 -07:00
George Hotz
715e335c60 fix types 2022-07-07 11:36:09 -07:00
George Hotz
9ee8426c51 much better cache 2022-07-07 11:32:00 -07:00
George Hotz
eb6696c3a5 only childless elementwise ops get merged 2022-07-07 11:13:25 -07:00
George Hotz
04e7e4104c track graph children and make lazycache use weak references 2022-07-07 11:01:18 -07:00
George Hotz
001cfe83a2 local 2022-07-07 10:05:26 -07:00
George Hotz
2720ef49ca extra and test and tuple 2022-07-07 10:01:33 -07:00
George Hotz
059fe94700 junk import 2022-07-06 21:47:38 -07:00
George Hotz
a61a4d09ad merge conv and binary op 2022-07-06 08:27:26 -07:00
George Hotz
6e0015095f LBCACHE 2022-07-04 16:05:19 -07:00
George Hotz
7a5acd3ace cache 2022-07-04 16:04:48 -07:00
George Hotz
d5d9cffe7c training param for batchnorm 2022-07-04 13:28:03 -07:00
George Hotz
21c78b9316 can be v slow 2022-07-04 13:23:34 -07:00
George Hotz
46bce4156f CL profiling 2022-07-04 13:22:12 -07:00
George Hotz
34f43ea10e LAZY and CLCACHE are defaults 2022-07-04 13:09:15 -07:00
George Hotz
425b0dcd58 sorry linecount, CLCACHE 2022-07-04 12:52:04 -07:00
George Hotz
b7afd83267 track cl mem used 2022-07-04 12:19:00 -07:00
George Hotz
5ef62c33a1 SHUFFLE_MOVEMENT_OPS is OPT=3 2022-07-04 09:55:30 -07:00
George Hotz
d5de8452c6 dashed loadops 2022-07-04 09:50:56 -07:00
George Hotz
e74adcce5c refactoring 2022-07-04 09:25:19 -07:00
George Hotz
0bdb021880 separate realize functions for different ops 2022-07-04 09:07:22 -07:00
George Hotz
81b73f97a3 Optiimzation (#355) 2022-07-04 08:58:57 -07:00
* constant folding into kernels
* that opt worth it?
* fix mypy
* ast one kernel
* save 2 lines in conv kernel
* debug print kernel count
* cl debugging
* early realize inputs
* refactor Device
George Hotz
df7976248b be lazy with the gpubuffer copies for host for constant folding 2022-07-03 23:04:14 -07:00
George Hotz
4d4ea47ca7 one more line 2022-07-03 17:28:42 -07:00
George Hotz
02cd8510cb cleanups 2022-07-03 17:23:20 -07:00
George Hotz
d89542640a hmm, typechecker isn't checking everything 2022-07-03 17:12:51 -07:00
George Hotz
6b0aa2a902 sorry about the line count, this is a good optimization 2022-07-03 17:11:13 -07:00
George Hotz
748618530b tests will run at okay speed now? 2022-07-03 16:41:52 -07:00
George Hotz
c3d13893f9 add SHUFFLE_MOVEMENT_OPS, exactly 1000 lines 2022-07-03 16:30:42 -07:00
George Hotz
e6e43e820e should fix tests 2022-07-03 16:06:11 -07:00
George Hotz
71a812fbf2 elementwise_ops 2022-07-03 15:29:38 -07:00
George Hotz
d7aad46758 test lazy also, make TestMNIST faster 2022-07-03 15:19:19 -07:00
Nicklas Boman
64d986bc8b add mypy to ci testing (#353) 2022-07-03 15:11:35 -07:00
George Hotz
57ebce8d67 first LazyBuffer optimizations 2022-07-03 15:09:16 -07:00
George Hotz
a1a20891ef more types 2022-07-03 14:03:34 -07:00
George Hotz
99b287ed87 typechecks 2022-07-03 13:54:30 -07:00