George Hotz
e6733286df
unify conv and reduce
2022-07-08 08:27:30 -07:00
George Hotz
9c34b3eef3
tighten up gpu kernels
2022-07-08 07:59:04 -07:00
George Hotz
563bf2d8e8
force input/weight to be contiguous (uncached)
2022-07-08 07:40:30 -07:00
George Hotz
1cf805a56a
fix no MERGE_MOVEMENT_OPS bug
2022-07-08 07:27:53 -07:00
George Hotz
c0ef998b48
remove finished todo
2022-07-07 11:39:25 -07:00
George Hotz
715e335c60
fix types
2022-07-07 11:36:09 -07:00
George Hotz
9ee8426c51
much better cache
2022-07-07 11:32:00 -07:00
George Hotz
eb6696c3a5
only childless elementwise ops get merged
2022-07-07 11:13:25 -07:00
George Hotz
04e7e4104c
track graph children and make lazycache use weak references
2022-07-07 11:01:18 -07:00
George Hotz
001cfe83a2
local
2022-07-07 10:05:26 -07:00
George Hotz
2720ef49ca
extra and test and tuple
2022-07-07 10:01:33 -07:00
George Hotz
059fe94700
junk import
2022-07-06 21:47:38 -07:00
George Hotz
a61a4d09ad
merge conv and binary op
2022-07-06 08:27:26 -07:00
George Hotz
6e0015095f
LBCACHE
2022-07-04 16:05:19 -07:00
George Hotz
7a5acd3ace
cache
2022-07-04 16:04:48 -07:00
George Hotz
d5d9cffe7c
training param for batchnorm
2022-07-04 13:28:03 -07:00
George Hotz
21c78b9316
can be v slow
2022-07-04 13:23:34 -07:00
George Hotz
46bce4156f
CL profiling
2022-07-04 13:22:12 -07:00
George Hotz
34f43ea10e
LAZY and CLCACHE are defaults
2022-07-04 13:09:15 -07:00
George Hotz
425b0dcd58
sorry linecount, CLCACHE
2022-07-04 12:52:04 -07:00
George Hotz
b7afd83267
track cl mem used
2022-07-04 12:19:00 -07:00
George Hotz
5ef62c33a1
SHUFFLE_MOVEMENT_OPS is OPT=3
2022-07-04 09:55:30 -07:00
George Hotz
d5de8452c6
dashed loadops
2022-07-04 09:50:56 -07:00
George Hotz
e74adcce5c
refactoring
2022-07-04 09:25:19 -07:00
George Hotz
0bdb021880
separate realize functions for different ops
2022-07-04 09:07:22 -07:00
George Hotz
81b73f97a3
Optimization ( #355 )
* constant folding into kernels
* that opt worth it?
* fix mypy
* ast one kernel
* save 2 lines in conv kernel
* debug print kernel count
* cl debugging
* early realize inputs
* refactor Device
2022-07-04 08:58:57 -07:00
George Hotz
df7976248b
be lazy with the gpubuffer copies for host for constant folding
2022-07-03 23:04:14 -07:00
George Hotz
4d4ea47ca7
one more line
2022-07-03 17:28:42 -07:00
George Hotz
02cd8510cb
cleanups
2022-07-03 17:23:20 -07:00
George Hotz
d89542640a
hmm, typechecker isn't checking everything
2022-07-03 17:12:51 -07:00
George Hotz
6b0aa2a902
sorry about the line count, this is a good optimization
2022-07-03 17:11:13 -07:00
George Hotz
748618530b
tests will run at okay speed now?
2022-07-03 16:41:52 -07:00
George Hotz
c3d13893f9
add SHUFFLE_MOVEMENT_OPS, exactly 1000 lines
2022-07-03 16:30:42 -07:00
George Hotz
e6e43e820e
should fix tests
2022-07-03 16:06:11 -07:00
George Hotz
71a812fbf2
elementwise_ops
2022-07-03 15:29:38 -07:00
George Hotz
d7aad46758
test lazy also, make TestMNIST faster
2022-07-03 15:19:19 -07:00
Nicklas Boman
64d986bc8b
add mypy to ci testing ( #353 )
2022-07-03 15:11:35 -07:00
George Hotz
57ebce8d67
first LazyBuffer optimizations
2022-07-03 15:09:16 -07:00
George Hotz
a1a20891ef
more types
2022-07-03 14:03:34 -07:00
George Hotz
99b287ed87
typechecks
2022-07-03 13:54:30 -07:00
George Hotz
cdf2be74f9
add neg
2022-07-03 13:04:58 -07:00
George Hotz
72a9ff7011
remove numpy usage
2022-07-03 12:58:51 -07:00
George Hotz
745e36fda5
mlops cleanup
2022-07-03 12:41:05 -07:00
George Hotz
93c378dffc
add test for slice_one
2022-07-03 12:14:20 -07:00
George Hotz
d10dd175f4
fix len 0 shapes in getitem
2022-07-03 12:12:02 -07:00
George Hotz
1b1c82fac7
print underlying buffer if it's realized
2022-07-03 11:52:58 -07:00
George Hotz
df16b455a7
make lazy the default ( #352 )
* make lazy the default
* always float32
* while the lazy framework should be default, laziness itself shouldn't be (for now)
* bugfixes
* remove the need for the ops class
* fxn_for_op
* hmm, my contiguous asserts went away
* move small shape thing
* refactor reduce
* remove the weird unused new functions
* only that install works
* thats broken
* unused imports, should be good if it passes
2022-07-03 11:40:27 -07:00
George Hotz
bbfdd28a6d
flops counter was dumb
2022-07-03 07:41:52 -07:00
George Hotz
c7a580daa9
flip div order
2022-07-02 23:37:22 -07:00
George Hotz
0d82cfd587
huh, torch 1.12 broke it. remove unused requirements.txt and pin torch 1.11
2022-07-02 23:07:59 -07:00