wozeparrot
37cc33269a
cl fixes for multigpu (#1276)
...
* feat: opencl fixes for multigpu usage
* clean: who needs this import anyways
2023-07-18 19:59:30 -07:00
Rob Grossman
c8ddc34368
include missing queue in thneed load (#1095)
2023-07-02 12:33:59 -07:00
George Hotz
7ecf4dff68
multi cl_queue (#762)
...
* multi cl_queue
* only platforms 1
* gpus first, then cpus
* put device on underlying buffer
* cl_queue array
2023-05-03 12:15:28 -07:00
George Hotz
1826ff6b89
dtypes nice and clean (#673)
...
* add dtype class
* dtypes
* buffers are lazy
* dtype is tracked by lazybuffer and GenericShape
* fix types in llvm
* llvm store
* dtype tests
* fix tests maybe
* fix flop counter
* fix CI
* CI fix and check format
* fix dtype and dtype check
* fix custom test
* fix test graph
2023-03-10 16:56:07 -08:00
George Hotz
d8dda2af3a
openpilot fixups
2023-03-06 14:14:44 -08:00
George Hotz
a77d792aff
Codegen gpu cleanups (#640)
...
* cleanups
* fixups
* handle pre upcasted global buffers
* early is just required
* delete junk from hand coded opt
* implicit upcast_in_mid_reduce
* speedup
* fix exec w validhacks
* reorder opt
* only need to check the output for that
* return total runtime from kernels if debugging
2023-03-04 15:31:51 -08:00
George Hotz
c53efb3635
optimize for CL (#633)
...
* required opt
* simplify
* works
* shift_to_last
* required is fine
* print shape in colored
* better shape
* args was wrong
* debugs
* fix empty shape
* colored shape printer
2023-03-03 22:00:09 -08:00
George Hotz
d062cc82b8
put restrict back
2023-03-01 21:34:45 -08:00
George Hotz
bfcec234a2
Refactor ASTs (#622)
...
* ugh worst branch name
* compiler refactor continues
* scc -> cloc
* buf -> _buf
* finish _buf, and program -> runtime
* gpu is still working, clang isn't
* clang in new style
* ops_metal
* something broke it
* improve metal
* clean up tons of cl crap
* hack fix sync
* cleaner gpu
* gpu metal clang
* cleanups
* minor refactor
* GPUCodegen
* fix up LLVM
* blind CUDA refactor
* codegen / runtime
* keep ops naming
* linter passes
* woah, llvm was allocing 4x what it needed to
* bugfixes
* fix openpilot compiler
* fix compile_efficientnet
* method cache should fix tests
* deal with duped functions
2023-03-01 18:57:29 -08:00
George Hotz
4d232c7c95
optional networkx + DEBUGCL=2
2023-02-20 09:50:46 -08:00
George Hotz
d9555bc478
that turned out to be dumb
2023-02-08 16:52:29 -06:00
George Hotz
3d63934995
refactor to keep cl in the runtime (#545)
...
* refactor to keep cl in the runtime
* fix thneed, rename cl to _cl
* bugfix + _cuda
* fix tests
* thneed more correct
2023-02-08 16:46:09 -06:00
Jacky Lee
799b3f185a
Refactor getenv into helpers (#508)
...
* Refactor getenv into helpers
* Remove unused os
* Fix default value
* Fix more defaults for CI
* Fix bracket
* Revert changes to openpilot/compile.py
* Use getenv from helpers when possible
2023-01-31 15:09:09 -08:00
George Hotz
a500e79bd1
don't OPTWG on OS X, it's way slower
2023-01-28 20:02:33 -08:00
George Hotz
b0df4d99a0
os x profiling: this ratio is exact i believe
2023-01-28 19:02:51 -08:00
George Hotz
ae810eb558
minor cleanups
2023-01-28 08:59:15 -08:00
Comma Device
9e2af0a972
too far with the OPTWG
2023-01-24 13:14:59 -06:00
Comma Device
3590848b93
a little more local workgroup options
2023-01-24 12:50:27 -06:00
Comma Device
4b74752c42
fix hotspots by improving the workgroup optimizer
2023-01-24 12:46:28 -06:00
George Hotz
fd760a390a
fix incremental time
2023-01-24 10:19:04 -08:00
George Hotz
a949de873b
reduce 2.0 (#469)
...
* reduce 2.0
* works
* hacks
* DEBUG=3 for shapes
* fix types
* 0s weren't being folded
* cleaner
* last_reduce is no longer needed
* comments and cleanup
2023-01-23 15:11:13 -08:00
George Hotz
f1196984e6
harmless to intertwine the math and the stores
2023-01-21 09:31:56 -08:00
George Hotz
0881d504c1
move shapetracker (#466)
...
* move shapetracker
* shapetracker test
* move ast
* move a few things
* fix print kernel
* fix test
* symbolic fixups
2023-01-19 09:56:31 -08:00
George Hotz
9245f4650a
indexer changes for master
2023-01-18 18:02:02 -08:00
George Hotz
49c6e6d472
Latest attempt to add image (#462)
...
* add image
* load + store + boring stuff:
* image tests pass
* thneed print GFLOPS
* op conv test
* more debugging
* hack for multiview image
* shapetracker creates less views
* disable image tests
* working better
* ugh, lkey not key
* print in DEBUG, and allow views
* works
* simple padding conv2d
* use index for image
* that was bad code
* debug print
* fix types
* less lines
* save lines
2023-01-12 17:36:30 -08:00
George Hotz
281b0db773
three from image
2023-01-12 12:26:58 -08:00
George Hotz
9ff6c532eb
Prereqs for IMAGE=1 (#461)
...
* contig
* move ast, debug prog
* add Token
* cleanup reduce
* exec_ast
2023-01-11 20:18:42 -08:00
George Hotz
4885fce56e
shapetracker from newgpu (#456)
...
* shapetracker from newgpu
* touchup ops
* test
* testst
* thneed deletes unused inputs
* test
* bugfix
2023-01-09 12:40:01 -08:00
George Hotz
8e22d5ee67
replace networkx with defaultdict
2022-10-20 19:36:43 -07:00
George Hotz
63f9c55156
really dumb bug
2022-10-20 17:07:47 -07:00
George Hotz
1bec4651b3
fix nonstatic weights
2022-10-20 17:04:14 -07:00
George Hotz
50c95c7d9a
add assert to catch issue in attention
2022-10-20 15:13:00 -07:00
George Hotz
26c78ccf7d
remove useless buffer
2022-10-20 14:07:28 -07:00
George Hotz
a18c1f3178
zero out the inputs
2022-10-20 13:46:52 -07:00
George Hotz
c400ee0beb
refactoring thneed (#400)
...
* refactoring thneed
* continue
* minor update
* looks like it's working
* big refactor
* confirm thneed got the right output
* code is there but it's broken
* works now
* always OPTWG, input -> dat
* fix type issue
2022-10-20 12:35:59 -07:00