George Hotz
21f2af08d5
getenv + graphing
2023-01-30 19:15:03 -08:00
Jacky Lee
491e78d203
Add symbolic tests for correctness (#494)
* [WIP] Add symbolic tests for correctness
* Fix typo
* Fix expected value for test_and_fold
* Add more tests for symbolic
* It is indeed right
* Clean up
* Check all strings
* Put TODO back
2023-01-30 18:40:16 -08:00
George Hotz
60ccddb58b
reenable SWAP
2023-01-30 17:32:02 -08:00
George Hotz
c1a769b68b
fix bug in gpu copy out
2023-01-30 16:51:28 -08:00
George Hotz
e87410c531
fix multiple accumulators
2023-01-30 16:22:26 -08:00
George Hotz
aea55eb196
found failing upcast
2023-01-30 16:12:56 -08:00
George Hotz
b67f997864
tests pass w/o float4
2023-01-30 15:40:49 -08:00
George Hotz
c6f570a2e6
improve progress bar
2023-01-30 14:50:28 -08:00
Kevin Gilpin
4685c9c095
Big changes (#498)
Use make_pair
2023-01-30 14:42:22 -08:00
George Hotz
7118602c97
goat progress bar
2023-01-30 14:37:26 -08:00
George Hotz
7ee0d99c70
CLCACHE
2023-01-30 14:02:06 -08:00
George Hotz
7457f0d755
KOPT=2
2023-01-30 13:28:06 -08:00
George Hotz
cccfea4b25
factor out KOPT code
2023-01-30 13:13:55 -08:00
George Hotz
de2c419fd4
make_pair and first attempt at hlb_cifar10
2023-01-30 11:07:23 -08:00
AllentDan
7b6b1f32b1
[Fix] fix typo: test_mnist -> datasets (#492)
* test_mnist -> datasets
* fix mnist_gan
2023-01-29 21:30:47 -08:00
George Hotz
2db272c7f7
Kernel Optimizer (#489)
* kernel optimizer
* 10x faster, but wrong. not good deal
* move test -> extra
* print x speedup
* clcache
* fix clcache + DEBUG
* GFLOPS estimate
* i==3
2023-01-29 17:15:00 -08:00
Martin Loretz
43abbd3d00
Use force_create to allocate return buffer (#491)
2023-01-29 17:13:10 -08:00
George Hotz
bb0cdc2442
111.51x speedup for reduce
2023-01-29 03:06:00 -08:00
George Hotz
45c0aa6e2d
search with SHIFT, REDUCE
2023-01-29 02:42:20 -08:00
George Hotz
87879cf4b6
improve search more
2023-01-29 02:08:57 -08:00
George Hotz
f6bbd43cb8
improve search
2023-01-29 01:33:47 -08:00
George Hotz
ebdec2b72f
fix optimizer
2023-01-29 00:23:06 -08:00
George Hotz
a9cabce791
oops, broke mem estimates
2023-01-28 20:21:31 -08:00
George Hotz
a500e79bd1
don't OPTWG on OS X, it's way slower
2023-01-28 20:02:33 -08:00
George Hotz
b0df4d99a0
os x profiling: this ratio is exact i believe
2023-01-28 19:02:51 -08:00
George Hotz
c0963b723e
should fix tests
2023-01-28 15:13:03 -08:00
George Hotz
b134a4f3d1
don't upcast already upcasted
2023-01-28 14:58:28 -08:00
George Hotz
2f194aadad
loop unrolling upcast
2023-01-28 14:51:24 -08:00
George Hotz
381f3e92da
fix prints, add third conv
2023-01-28 14:10:27 -08:00
George Hotz
92001a06e1
openpilot/go.sh
2023-01-28 13:57:43 -08:00
George Hotz
aea29f8a6e
fix CUDA reduce
2023-01-28 13:38:58 -08:00
George Hotz
0f34c24aeb
move expr_idxs to shapetracker
2023-01-28 12:25:05 -08:00
George Hotz
f2e81f7208
line reduction and cleanups
2023-01-28 12:17:40 -08:00
George Hotz
03dd1201dc
local buffer implied
2023-01-28 12:06:28 -08:00
George Hotz
b3e4e678e8
Use ShapeTracker for tracking shapes in kernels (#485)
* local is a normal buffer
* remove extra shapes and strides
* fix opt
* fix llvm
2023-01-28 11:56:32 -08:00
George Hotz
259c48f235
discord image is invite link
2023-01-28 11:42:11 -08:00
George Hotz
d748000ada
tinygrad discord
2023-01-28 11:36:15 -08:00
George Hotz
ae810eb558
minor cleanups
2023-01-28 08:59:15 -08:00
George Hotz
713318745d
padding size in get_conv_args
2023-01-28 08:47:18 -08:00
George Hotz
299d1cdc9c
lil cleanup of load ldr
2023-01-28 00:31:57 -08:00
George Hotz
2b5bc5d4a1
factor out image_idx
2023-01-28 00:22:54 -08:00
George Hotz
bd8a5c2ced
Simple CUDA Runtime (#480)
* factor out opencl runtime
* don't use CL outside the runtime
* cuda runtime adds
* final_dimension
* tests pass with CUDA backend
* more cuda
* cuda simpler
* retain old functionality
* linter and typing
* move globalcounters out of runtimes
* oops, GlobalCounters in cuda
* MAX_OUTPUT_SHAPE=3 is fine for CUDA
2023-01-27 16:26:24 -08:00
George Hotz
6d5e1a8029
GEMM kernel search
2023-01-27 10:08:57 -08:00
George Hotz
123993156d
refactor group_for_reduce a little
2023-01-27 08:51:23 -08:00
George Hotz
82e58108e3
add flake8 to precommit
2023-01-26 22:31:45 -08:00
George Hotz
f4b571039b
fix shape types
2023-01-26 22:29:20 -08:00
Jacky Lee
026ba78526
Add commit hooks (#478)
* Add pre-commit hook
* We need ret
* Fix some type definitions
2023-01-26 22:24:31 -08:00
George Hotz
c07bc39941
fix mypy, plz add commit hooks
2023-01-26 14:25:42 -08:00
Comma Device
f08e740957
factor out hand coded opt
2023-01-26 14:54:06 -06:00
George Hotz
5e8a36a18b
real op kernel
2023-01-26 09:51:32 -08:00