George Hotz
381f3e92da
fix prints, add third conv
2023-01-28 14:10:27 -08:00
George Hotz
92001a06e1
openpilot/go.sh
2023-01-28 13:57:43 -08:00
George Hotz
aea29f8a6e
fix CUDA reduce
2023-01-28 13:38:58 -08:00
George Hotz
0f34c24aeb
move expr_idxs to shapetracker
2023-01-28 12:25:05 -08:00
George Hotz
f2e81f7208
line reduction and cleanups
2023-01-28 12:17:40 -08:00
George Hotz
03dd1201dc
local buffer implied
2023-01-28 12:06:28 -08:00
George Hotz
b3e4e678e8
Use ShapeTracker for tracking shapes in kernels (#485)
* local is a normal buffer
* remove extra shapes and strides
* fix opt
* fix llvm
2023-01-28 11:56:32 -08:00
George Hotz
259c48f235
discord image is invite link
2023-01-28 11:42:11 -08:00
George Hotz
d748000ada
tinygrad discord
2023-01-28 11:36:15 -08:00
George Hotz
ae810eb558
minor cleanups
2023-01-28 08:59:15 -08:00
George Hotz
713318745d
padding size in get_conv_args
2023-01-28 08:47:18 -08:00
George Hotz
299d1cdc9c
lil cleanup of load ldr
2023-01-28 00:31:57 -08:00
George Hotz
2b5bc5d4a1
factor out image_idx
2023-01-28 00:22:54 -08:00
George Hotz
bd8a5c2ced
Simple CUDA Runtime (#480)
* factor out opencl runtime
* don't use CL outside the runtime
* cuda runtime adds
* final_dimension
* tests pass with CUDA backend
* more cuda
* cuda simpler
* retain old functionality
* linter and typing
* move globalcounters out of runtimes
* oops, GlobalCounters in cuda
* MAX_OUTPUT_SHAPE=3 is fine for CUDA
2023-01-27 16:26:24 -08:00
George Hotz
6d5e1a8029
GEMM kernel search
2023-01-27 10:08:57 -08:00
George Hotz
123993156d
refactor group_for_reduce a little
2023-01-27 08:51:23 -08:00
George Hotz
82e58108e3
add flake8 to precommit
2023-01-26 22:31:45 -08:00
George Hotz
f4b571039b
fix shape types
2023-01-26 22:29:20 -08:00
Jacky Lee
026ba78526
Add commit hooks (#478)
* Add pre-commit hook
* We need ret
* Fix some type definitions
2023-01-26 22:24:31 -08:00
George Hotz
c07bc39941
fix mypy, plz add commit hooks
2023-01-26 14:25:42 -08:00
Comma Device
f08e740957
factor out hand coded opt
2023-01-26 14:54:06 -06:00
George Hotz
5e8a36a18b
real op kernel
2023-01-26 09:51:32 -08:00
George Hotz
e0600f537a
op kernel in kernel search
2023-01-26 09:47:01 -08:00
George Hotz
60acb2641f
ugh, don't use os
2023-01-25 19:41:21 -08:00
George Hotz
b1dec64815
new types and fixup ShapeTracker type mismatches
2023-01-25 19:39:36 -08:00
George Hotz
1b624a5051
DeviceBuffer has abstract methods
2023-01-25 19:16:23 -08:00
George Hotz
faab6461dd
that lambda is required
2023-01-25 18:46:56 -08:00
George Hotz
44e96c58b4
touch up pytorch speed tests
2023-01-25 18:11:26 -08:00
George Hotz
8db345d846
functools.partialmethod -> lambda fixes Python 3.11
2023-01-25 18:08:38 -08:00
calledit
a0af1045bf
Some new tests (#440)
* Make test run
* Added new tests: sub pow constant_sub
* Fix indentation
* Added one to many lines
* Fix indentation
* Update test_cl_tiler.py
* Delete test_cl_tiler.py
2023-01-25 15:40:19 -08:00
George Hotz
aafc29484a
cleanups
2023-01-25 12:37:10 -08:00
George Hotz
919e943867
decent search
2023-01-25 12:20:53 -08:00
George Hotz
7f3da91f8b
kernel_search
2023-01-25 12:05:09 -08:00
George Hotz
e37424424f
first little attempt at search
2023-01-25 11:49:29 -08:00
George Hotz
c15e9c3c7a
comment where future perf should go
2023-01-25 11:13:57 -08:00
George Hotz
ee1f6ab3ca
flip output shape extra dimension indexing for speed
2023-01-25 11:00:37 -08:00
George Hotz
335a261a2e
test for slow kernel
2023-01-25 10:25:22 -08:00
George Hotz
0d594ccc51
mps option in torch (note: it's broken)
2023-01-25 10:10:39 -08:00
George Hotz
66da3bc3c0
reset the benchmark timer
2023-01-25 09:20:34 -08:00
George Hotz
f5be4043ac
fix OSX CL kernel timing
2023-01-25 08:37:18 -08:00
George Hotz
f6fc2a0d98
huh, this prevents an extra kernel
2023-01-25 07:53:35 -08:00
George Hotz
487685919b
Revert "Rename Normalize and move to nn (#415)" (#474)
This reverts commit d768acb6a9.
2023-01-25 07:50:04 -08:00
Jacky Lee
d768acb6a9
Rename Normalize and move to nn (#415)
* Rename Normalize and move to nn
* Fix comparison to None error
* Add test for GroupNorm
* Rename test case
* Flip parameters to match PyTorch
* Increase error tolerance
* Fix elementwise_affine on channels
* Match arguments with PyTorch
* Initialize weight and bias only when affine is true
* Is this it?
* A bit cleaner
* Handle case where weight or bias is None
2023-01-25 07:47:59 -08:00
George Hotz
baf64c14ac
cleanups, simple padding in the processing op
2023-01-25 07:37:52 -08:00
George Hotz
3acf62d489
cleanups for IMAGE=2 conv
2023-01-25 07:18:34 -08:00
George Hotz
6d7658db12
delete opencl <celebration>
2023-01-24 14:18:35 -08:00
George Hotz
e313c8af20
update openpilot tests from OPENCL to GPU
2023-01-24 14:05:20 -08:00
George Hotz
2e1d47b166
there's a bug in scc for empty string
2023-01-24 12:06:06 -08:00
George Hotz
e9c293361b
fix typo
2023-01-24 12:03:58 -08:00
Comma Device
9e2af0a972
too far with the OPTWG
2023-01-24 13:14:59 -06:00