1317 Commits

Author SHA1 Message Date
George Hotz
2b5bc5d4a1 factor out image_idx 2023-01-28 00:22:54 -08:00
George Hotz
bd8a5c2ced Simple CUDA Runtime (#480)
* factor out opencl runtime

* don't use CL outside the runtime

* cuda runtime adds

* final_dimension

* tests pass with CUDA backend

* more cuda

* cuda simpler

* retain old functionality

* linter and typing

* move globalcounters out of runtimes

* oops, GlobalCounters in cuda

* MAX_OUTPUT_SHAPE=3 is fine for CUDA
2023-01-27 16:26:24 -08:00
George Hotz
6d5e1a8029 GEMM kernel search 2023-01-27 10:08:57 -08:00
George Hotz
123993156d refactor group_for_reduce a little 2023-01-27 08:51:23 -08:00
George Hotz
82e58108e3 add flake8 to precommit 2023-01-26 22:31:45 -08:00
George Hotz
f4b571039b fix shape types 2023-01-26 22:29:20 -08:00
Jacky Lee
026ba78526 Add commit hooks (#478)
* Add pre-commit hook

* We need ret

* Fix some type definitions
2023-01-26 22:24:31 -08:00
George Hotz
c07bc39941 fix mypy, plz add commit hooks 2023-01-26 14:25:42 -08:00
Comma Device
f08e740957 factor out hand coded opt 2023-01-26 14:54:06 -06:00
George Hotz
5e8a36a18b real op kernel 2023-01-26 09:51:32 -08:00
George Hotz
e0600f537a op kernel in kernel search 2023-01-26 09:47:01 -08:00
George Hotz
60acb2641f ugh, don't use os 2023-01-25 19:41:21 -08:00
George Hotz
b1dec64815 new types and fixup ShapeTracker type mismatches 2023-01-25 19:39:36 -08:00
George Hotz
1b624a5051 DeviceBuffer has abstract methods 2023-01-25 19:16:23 -08:00
George Hotz
faab6461dd that lambda is required 2023-01-25 18:46:56 -08:00
George Hotz
44e96c58b4 touch up pytorch speed tests 2023-01-25 18:11:26 -08:00
George Hotz
8db345d846 functools.partialmethod -> lambda fixes Python 3.11 2023-01-25 18:08:38 -08:00
calledit
a0af1045bf Some new tests (#440)
* Make test run

* Added new tests: sub pow constant_sub

* Fix indentation

* Added one to many lines

* Fix indentation

* Update test_cl_tiler.py

* Delete test_cl_tiler.py
2023-01-25 15:40:19 -08:00
George Hotz
aafc29484a cleanups 2023-01-25 12:37:10 -08:00
George Hotz
919e943867 decent search 2023-01-25 12:20:53 -08:00
George Hotz
7f3da91f8b kernel_search 2023-01-25 12:05:09 -08:00
George Hotz
e37424424f first little attempt at search 2023-01-25 11:49:29 -08:00
George Hotz
c15e9c3c7a comment where future perf should go 2023-01-25 11:13:57 -08:00
George Hotz
ee1f6ab3ca flip output shape extra dimension indexing for speed 2023-01-25 11:00:37 -08:00
George Hotz
335a261a2e test for slow kernel 2023-01-25 10:25:22 -08:00
George Hotz
0d594ccc51 mps option in torch (note: it's broken) 2023-01-25 10:10:39 -08:00
George Hotz
66da3bc3c0 reset the benchmark timer 2023-01-25 09:20:34 -08:00
George Hotz
f5be4043ac fix OSX CL kernel timing 2023-01-25 08:37:18 -08:00
George Hotz
f6fc2a0d98 huh, this prevents an extra kernel 2023-01-25 07:53:35 -08:00
George Hotz
487685919b Revert "Rename Normalize and move to nn (#415)" (#474)
This reverts commit d768acb6a9.
2023-01-25 07:50:04 -08:00
Jacky Lee
d768acb6a9 Rename Normalize and move to nn (#415)
* Rename Normalize and move to nn

* Fix comparison to None error

* Add test for GroupNorm

* Rename test case

* Flip parameters to match PyTorch

* Increase error tolerance

* Fix elementwise_affine on channels

* Match arguments with PyTorch

* Initialize weight and bias only when affine is true

* Is this it?

* A bit cleaner

* Handle case where weight or bias is None
2023-01-25 07:47:59 -08:00
George Hotz
baf64c14ac cleanups, simple padding in the processing op 2023-01-25 07:37:52 -08:00
George Hotz
3acf62d489 cleanups for IMAGE=2 conv 2023-01-25 07:18:34 -08:00
George Hotz
6d7658db12 delete opencl <celebration> 2023-01-24 14:18:35 -08:00
George Hotz
e313c8af20 update openpilot tests from OPENCL to GPU 2023-01-24 14:05:20 -08:00
George Hotz
2e1d47b166 there's a bug in scc for empty string 2023-01-24 12:06:06 -08:00
George Hotz
e9c293361b fix typo 2023-01-24 12:03:58 -08:00
Comma Device
9e2af0a972 too far with the OPTWG 2023-01-24 13:14:59 -06:00
Comma Device
3590848b93 a little more local workgroup options 2023-01-24 12:50:27 -06:00
Comma Device
4b74752c42 fix hotspots by improving the workgroup optimizer 2023-01-24 12:46:28 -06:00
George Hotz
fd760a390a fix incremental time 2023-01-24 10:19:04 -08:00
George Hotz
7a369b856b nope, no default NATIVE_EXPLOG 2023-01-24 10:01:52 -08:00
George Hotz
78fedc13d1 native_explog is default 2023-01-24 08:09:43 -08:00
George Hotz
5d350d4883 the ast test is actually a test now 2023-01-24 07:53:24 -08:00
George Hotz
7a159b9b04 tinygrad got big...make it tiny again 2023-01-23 21:33:56 -08:00
George Hotz
6286ace4f1 does this work yet (#471) 2023-01-23 20:36:17 -08:00
George Hotz
c22554f44a floats for nvidia 2023-01-23 16:36:10 -08:00
George Hotz
6fe9edf30f torch cuda is very fast 2023-01-23 16:24:46 -08:00
George Hotz
a949de873b reduce 2.0 (#469)
* reduce 2.0

* works

* hacks

* DEBUG=3 for shapes

* fix types

* 0s weren't being folded

* cleaner

* last_reduce is no longer needed

* comments and cleanup
2023-01-23 15:11:13 -08:00
George Hotz
a6de94b444 test partial sum 2023-01-22 21:28:40 -08:00