George Hotz
78795e3507
reduce line count by simplifying DeviceBuffer
2023-02-09 12:52:14 -06:00
George Hotz
5de850f6d5
assign buffer reuse ( #547 )
* assign buffer reuse works
* fix assign for torch and cpu
* allow assign from numpy
* fix llvm output_buffer
* add some assign tests
* fix assignment test
* test should fail without lazy
* env var to disable assign
2023-02-09 11:53:02 -06:00
George Hotz
473bbd3e35
fix graphs
2023-02-09 09:40:46 -06:00
George Hotz
16a7edc775
move base_fxn_for_op to ops_cpu
2023-02-08 18:23:48 -06:00
George Hotz
c642f5e72b
less lines for torch
2023-02-08 18:15:59 -06:00
George Hotz
58a03eb693
generic processing op
2023-02-08 18:09:17 -06:00
George Hotz
4c2faa4140
functools.partial keeps mypy compiler working
2023-02-08 18:04:32 -06:00
George Hotz
cfd13c083b
refactor GenericShape for a big line reduction
2023-02-08 18:01:08 -06:00
George Hotz
c656513591
GPURunner class will replace CL cache eventually
2023-02-08 17:31:36 -06:00
George Hotz
a5a55ac19e
GlobalCounters cache + assign in optim
2023-02-08 17:10:55 -06:00
George Hotz
d9555bc478
that turned out to be dumb
2023-02-08 16:52:29 -06:00
George Hotz
3d63934995
refactor to keep cl in the runtime ( #545 )
* refactor to keep cl in the runtime
* fix thneed, rename cl to _cl
* bugfix + _cuda
* fix tests
* thneed more correct
2023-02-08 16:46:09 -06:00
George Hotz
8c8a5a77dd
refactor llvm into runtime and ops
2023-02-08 16:28:32 -06:00
George Hotz
45ce4de6f3
improve typing
2023-02-08 12:48:21 -06:00
George Hotz
2e1bdc889a
write out all the functions, no auto binding ( #543 )
* write out all the functions, no auto binding
* cleanups, more types
* Slice is for internal calls only
* improve typing
* ugh, put slice back
2023-02-08 12:41:39 -06:00
George Hotz
d854337f0d
nn/optim.py compiles now
2023-02-08 11:25:18 -06:00
George Hotz
1029deccb1
refactor ops_cpu and ops_torch to not share code
2023-02-08 11:11:42 -06:00
George Hotz
ee18420c13
dyn add of math ops
2023-02-08 10:04:30 -06:00
George Hotz
2844482a60
Mypy fun ( #541 )
* mypy fun
* things are just faster
* running fast
* mypy is fast
* compile.sh
* no gpu hack
* refactor ops_cpu and ops_torch to not subclass
* make weak buffer work
* tensor works
* fix test failing
* cpu/torch cleanups
* no or operator on dict in python 3.8
* that was junk
* fix warnings
* comment and touchup
2023-02-08 09:56:51 -06:00
George Hotz
996e0a10b7
update cpu and torch to hold buffers ( #542 )
* update cpu and torch to hold buffers
* save lines, and probably faster
2023-02-08 09:40:45 -06:00
Mitchell Goff
ae4f0aeb5f
NumPy-like semantics for Tensor.__getitem__ ( #506 )
* Rewrote Tensor.__getitem__ to fix negative indices and add support for np.newaxis/None
* Fixed pad2d
* mypy doesn't know about mlops methods
* normal python behavior for out-of-bounds slicing
* type: ignore
* inlined idxfix
* added comment for __getitem__
* Better comments, better tests, and fixed bug in np.newaxis
2023-02-08 08:59:46 -06:00
George Hotz
0ac3286af0
factor out Device
2023-02-07 16:08:20 -06:00
George Hotz
2aeebd70a6
mypy will compile the shapetracker, no speed up
2023-02-07 15:43:44 -06:00
George Hotz
185d2e3678
fix map_buffer and add some __slots__
2023-02-07 15:32:48 -06:00
George Hotz
aebe75d9a2
remove val expansion ( #539 )
* remove val expansion
* types for all shapetracker functions:
* more typing
* add all the parens to the test
* more types
* fix tests
* very minor speedup
2023-02-07 15:14:05 -06:00
George Hotz
001cc96e25
Lazy refactor ( #538 )
* refactor lazy to return ASTs
* a lil cleaner
* oops, compare ids
* gate on GRAPH
* cleanups
* less calls to log_op
* simpler
* realize_buffers -> map_buffers
* even simpler
* think in asts
* a lil cleaner
* NOOP means contiguous
2023-02-07 11:53:21 -06:00
George Hotz
02d8cb0959
lazy cleanup
2023-02-07 07:39:53 -06:00
George Hotz
d93563f39f
fix KOPT
2023-02-07 06:56:33 -06:00
Jared Z
7604b17fbf
TestZeroViewShapeTracker fix test ( #481 )
* TestZeroViewST test
* updated to align with st naming conventions in file
* Update test_shapetracker.py
2023-02-07 06:17:55 -06:00
George Hotz
c073271f20
more symbolic correctness
2023-02-07 00:03:14 -06:00
George Hotz
e961fd3a04
more symbolic test, ModNode is wrong
2023-02-06 23:43:21 -06:00
George Hotz
8cfeb118d6
symbolic new test
2023-02-06 23:27:26 -06:00
George Hotz
7c5a5ecdac
even simpler symbolic
2023-02-06 22:47:00 -06:00
George Hotz
8b05de1841
symbolic cleanups
2023-02-06 22:12:11 -06:00
George Hotz
2a924e2b77
fix sz.sh for llvm
2023-02-06 15:36:05 -06:00
James Roberts
0d405fd5bc
Parallelize CI tests ( #535 )
2023-02-06 15:27:44 -06:00
Andrey
4977d6f225
using tuples in isinstance ( #534 )
2023-02-06 14:40:26 -06:00
timmermansjoy
d56c57b112
adding more robust install method ( #532 )
2023-02-06 13:12:05 -06:00
George Hotz
fd3807c479
delete cherry and old cuda accel, promote llvm
2023-02-06 10:02:41 -06:00
George Hotz
90529d3750
tests are 20% faster ( #529 )
* pytorch CPU
* no cache, it's slower
* pytorch cpu for real
* remove double onnx
2023-02-06 09:56:14 -06:00
George Hotz
039de1b332
oops, pytest is for testing
2023-02-06 09:30:12 -06:00
George Hotz
6eb0e6a650
shuffle deps: always tqdm, make linting category
2023-02-06 09:27:01 -06:00
George Hotz
1d80639646
make linter test install testing deps
2023-02-06 09:21:48 -06:00
George Hotz
60bb64811c
merge mypy into linters, no useless package update
2023-02-06 09:14:00 -06:00
George Hotz
c3d81bba2a
test_train: Adam -> SGD
2023-02-06 08:55:41 -06:00
George Hotz
36c26a57b1
make slow LLVM opt optional
2023-02-05 20:24:12 -06:00
George Hotz
f7291f6ca3
fixes big KOPT, breaks opencl ( #505 )
* fixes big KOPT, breaks opencl
* fix optimizer
* KernelCache
* oops, broke batchnorm
* hack to fix it
* fix llvm, less hacky gpu
* disable the cache
* cache just breaks things
2023-02-05 10:46:17 -08:00
Martin Loretz
97f0a82be7
Cache pip packages in github actions ( #522 )
* Cache pip dependencies in github actions
* Add setup.py as cache-dependency-path
* Test caching
* Test caching
* Upgrade setup python action
* Test caching
* Remove setup.py from cache-dependency-path
* Don't remove cache-dependency-path
* Don't cache linter package's
* Test caching
* Test caching
* Test caching
* Upgrade actions/checkout to v3
2023-02-03 20:04:20 -08:00
Martin Loretz
4ad67b4bbc
Refactor triton buffer to use CLBuffer of cuda runtime ( #524 )
* Refactor triton buffer to use CLBuffer of runtime
* Fix opencl GT0
2023-02-03 20:02:41 -08:00
Jacky Lee
ad4f6aa2cf
Add test for quick_gelu ( #526 )
* Add test for quick_gelu
* Bump PyTorch version for approximate
2023-02-03 20:01:39 -08:00