28 Commits

Author SHA1 Message Date
George Hotz
ea13504f35 fix METAL_XCODE 2023-02-19 20:02:12 -08:00
George Hotz
1ba847963d reshape and retain metal_matmul 2023-02-19 13:07:23 -08:00
George Hotz
0b3f686530 Good fast triton (#567)
* runtime fixups

* uints and printbufs

* uints don't work
2023-02-19 12:21:55 -08:00
Diogo
a508c2b429 small tweaks to the metal runtime (#562)
* small tweaks to the metal runtime

* create buffer straight from numpy

* reverted back due to bug when adding 1+1

* removed comments
2023-02-19 11:25:13 -08:00
George Hotz
5e6265be6e metal timing, fix speed test 2023-02-17 12:31:54 -08:00
George Hotz
121bd03cbd metal globalcounters 2023-02-17 12:02:54 -08:00
George Hotz
67d1df80ba gid is array, metal works 2023-02-17 11:54:50 -08:00
George Hotz
f9af0322e7 metal can add 2023-02-17 11:45:33 -08:00
George Hotz
89499b303d oops, bad else. why didn't linter catch 2023-02-11 12:02:09 -08:00
George Hotz
7d33f2d659 CL.CACHE is over, GlobalCounters.cache is it 2023-02-11 12:00:14 -08:00
George Hotz
6f9b103878 fix opencl types 2023-02-10 23:18:39 -06:00
George Hotz
fed95119dc CL.mem_used -> GlobalCounters.mem_used 2023-02-10 23:13:29 -06:00
George Hotz
77988e3236 fix str() line count bug in scc 2023-02-10 22:53:30 -06:00
George Hotz
a4cb161bd4 log_kernel 2023-02-10 21:51:53 -06:00
George Hotz
473bbd3e35 fix graphs 2023-02-09 09:40:46 -06:00
George Hotz
d9555bc478 that turned out to be dumb 2023-02-08 16:52:29 -06:00
George Hotz
3d63934995 refactor to keep cl in the runtime (#545)
* refactor to keep cl in the runtime

* fix thneed, rename cl to _cl

* bugfix + _cuda

* fix tests

* thneed more correct
2023-02-08 16:46:09 -06:00
George Hotz
8c8a5a77dd refactor llvm into runtime and ops 2023-02-08 16:28:32 -06:00
George Hotz
2844482a60 Mypy fun (#541)
* mypy fun

* things are just faster

* running fast

* mypy is fast

* compile.sh

* no gpu hack

* refactor ops_cpu and ops_torch to not subclass

* make weak buffer work

* tensor works

* fix test failing

* cpu/torch cleanups

* no or operator on dict in python 3.8

* that was junk

* fix warnings

* comment and touchup
2023-02-08 09:56:51 -06:00
George Hotz
aebe75d9a2 remove val expansion (#539)
* remove val expansion

* types for all shapetracker functions:

* more typing

* add all the parens to the test

* more types

* fix tests

* very minor speedup
2023-02-07 15:14:05 -06:00
Martin Loretz
4ad67b4bbc Refactor triton buffer to use CLBuffer of cuda runtime (#524)
* Refactor triton buffer to use CLBuffer of runtime

* Fix opencl GT0
2023-02-03 20:02:41 -08:00
James Roberts
db0a9b0a2d Refactor CL.time_sum into GlobalCounters (#519) 2023-02-01 20:13:56 -08:00
Jacky Lee
799b3f185a Refactor getenv into helpers (#508)
* Refactor getenv into helpers

* Remove unused os

* Fix default value

* Fix more defaults for CI

* Fix bracket

* Revert changes to openpilot/compile.py

* Use getenv from helpers when possible
2023-01-31 15:09:09 -08:00
George Hotz
cccfea4b25 factor out KOPT code 2023-01-30 13:13:55 -08:00
George Hotz
2db272c7f7 Kernel Optimizer (#489)
* kernel optimizer

* 10x faster, but wrong. not good deal

* move test -> extra

* print x speedup

* clcache

* fix clcache + DEBUG

* GFLOPS estimate

* i==3
2023-01-29 17:15:00 -08:00
George Hotz
b0df4d99a0 os x profiling: this ratio is exact i believe 2023-01-28 19:02:51 -08:00
George Hotz
aea29f8a6e fix CUDA reduce 2023-01-28 13:38:58 -08:00
George Hotz
bd8a5c2ced Simple CUDA Runtime (#480)
* factor out opencl runtime

* don't use CL outside the runtime

* cuda runtime adds

* final_dimension

* tests pass with CUDA backend

* more cuda

* cuda simpler

* retain old functionality

* linter and typing

* move globalcounters out of runtimes

* oops, GlobalCounters in cuda

* MAX_OUTPUT_SHAPE=3 is fine for CUDA
2023-01-27 16:26:24 -08:00