George Hotz
5dc227dba6
fix bug in ENABLE_METHOD_CACHE and enable for llvm
2023-03-06 07:43:40 -08:00
George Hotz
8c5dea8d72
fix CUDA float4 issues
2023-03-06 07:16:38 -08:00
George Hotz
7dbcc26582
fix up external tests
2023-03-06 06:52:28 -08:00
George Hotz
50012f679b
move get_contraction to shapetracker
2023-03-06 06:42:57 -08:00
Alex Wang
64ecbd91b5
Refactor contraction and add integration test cases for push permute (#650)
* Refactor contraction and add unit tests
* Fix typo; Fix TestConv.test_elu failure due to some ones in old_shape
* Add push permute test cases
* Fix mypy type annotation check error
* Add contraction unit test; Reshape to higher dimension is not contraction
2023-03-06 06:36:55 -08:00
Peter McDevitt
cb5be9697c
One less line in consume_flops (#651)
* less lines
* using walrus
* using original way
2023-03-05 23:34:45 -08:00
George Hotz
f028accae3
print upcasted colors
2023-03-05 22:12:42 -08:00
George Hotz
b2d4b2c06e
refactor imports and EOL whitespace
2023-03-05 20:52:03 -08:00
George Hotz
382f346523
clean up opt (#649)
* clean up opt
* don't let global kernels get too small
* 8192 -> 1024
* disable local shape for clang
* fix can_merge
* unroll the 5x5 depthwise convs in op
* load float4 check
2023-03-05 20:49:36 -08:00
George Hotz
7930c6ab5c
CLImage backing bug + test_vec_mul
2023-03-05 16:32:05 -08:00
George Hotz
8de24e3b05
accumulator can be a float4 (#647)
* remove reduceopop
* not float4 yet
* float4 acc works
* group_float4 on store
2023-03-05 15:44:41 -08:00
Cyril Roumégous
c10131ddf5
reduce number of lines (#645)
2023-03-05 15:42:32 -08:00
George Hotz
7989f79820
using image from mad branch saves 1ms on op model
2023-03-05 14:38:42 -08:00
George Hotz
7940ad258e
fix dropout test
2023-03-05 12:24:04 -08:00
George Hotz
3072e098c0
local workgroup optimizer
2023-03-05 12:08:12 -08:00
George Hotz
b1ba78ac38
move applegpu disassembler
2023-03-05 11:21:12 -08:00
George Hotz
e8de3f5736
Revert "less lines (#643)" (#644)
This reverts commit 30f2238994.
2023-03-05 08:41:11 -08:00
Peter McDevitt
30f2238994
less lines (#643)
2023-03-05 08:37:14 -08:00
Comma Device
3da56ab41d
adreno disassembler
2023-03-05 10:32:03 -06:00
George Hotz
16b03f3c3b
wow, can't believe that was broken (#642)
* wow, can't believe that was broken
* remove namedtuple comment
2023-03-04 22:28:28 -08:00
George Hotz
4a607f7d65
more ext gpu tests
2023-03-04 21:00:08 -08:00
George Hotz
69198a73d2
test_1x1_24_6
2023-03-04 20:37:46 -08:00
George Hotz
f281f707bd
better function names
2023-03-04 18:27:37 -08:00
George Hotz
a77d792aff
Codegen gpu cleanups (#640)
* cleanups
* fixups
* handle pre upcasted global buffers
* early is just required
* delete junk from hand coded opt
* implicit upcast_in_mid_reduce
* speedup
* fix exec w validhacks
* reorder opt
* only need to check the output for that
* return total runtime from kernels if debugging
2023-03-04 15:31:51 -08:00
Patrick Geneva
10d40d3cf2
Expand the inline loop to prevent stack overflow from _deepwalk (#638)
* Expand the inline loop to prevent stack overflow
* Explicitly loop
2023-03-04 15:14:17 -08:00
George Hotz
b02a392d69
Improve local (#635)
* local is improving
* local is finding bugs
* new local should work
2023-03-04 09:30:49 -08:00
Patrick Geneva
117111825c
Fix windows file permission error (#634)
2023-03-04 09:23:55 -08:00
George Hotz
528cb3b3b9
fix ast test
2023-03-04 07:49:25 -08:00
George Hotz
8bc9277587
G.nodes isn't always valid
2023-03-04 07:24:43 -08:00
George Hotz
85f69b5489
metal needs the Cocoa
2023-03-03 23:22:15 -08:00
George Hotz
28a6ada4ce
line reduction in metal
2023-03-03 23:14:40 -08:00
George Hotz
893f136fe0
lines from helpers
2023-03-03 23:07:46 -08:00
George Hotz
81cda2b672
zero out s == 1 strides
2023-03-03 22:57:02 -08:00
George Hotz
aef336c079
merge_views is very powerful
2023-03-03 22:53:59 -08:00
George Hotz
b5b4edf59b
comments
2023-03-03 22:39:31 -08:00
George Hotz
cfb050e2d1
simple modrange, thanks Jacky
2023-03-03 22:37:04 -08:00
George Hotz
3dab721f9f
lazy cleanup
2023-03-03 22:01:03 -08:00
George Hotz
c53efb3635
optimize for CL (#633)
* required opt
* simplify
* works
* shift_to_last
* required is fine
* print shape in colored
* better shape
* args was wrong
* debugs
* fix empty shape
* colored shape printer
2023-03-03 22:00:09 -08:00
Pankaj Doharey
9d97d97b26
Opens image in default viewer after saving. (#612)
2023-03-03 17:28:49 -08:00
George Hotz
1a84976d4d
fix thneed gflops
2023-03-03 16:52:59 -08:00
George Hotz
7a1d96fd76
No negative (#632)
* behavior is correct without VALIDHACKS
* simple div and mod
* fix tests
* no negative variables
* alt form is correct
* still correct
* bug in mulnode
* at least validhacks works now
* cleanups
* test validhacks, and to_image_idx
* cache compare key
* tests and __neg__
2023-03-03 16:48:14 -08:00
George Hotz
8c475ea86a
relax atol, merge_view
2023-03-03 07:48:44 -08:00
George Hotz
b9ce20c374
openpilot test wasn't running, factor out image idx
2023-03-03 07:41:53 -08:00
George Hotz
9bd2cdee08
skip broken bn training test for speed
2023-03-03 06:52:11 -08:00
George Hotz
999b44c274
fix external test + speed
2023-03-03 06:46:16 -08:00
George Hotz
8919ca8163
test cleanups
2023-03-03 06:36:06 -08:00
George Hotz
459488bba2
fix linter (#630)
* fix linter
* no imports okay
* explicit bases
* disable in pylintrc
2023-03-02 20:06:20 -08:00
George Hotz
3915c89fb6
symbolic improvements (#629)
* fixups
* shorter diff
* wow, okay removing that had side effects
* more numeric tests
* MIN MAX tests
2023-03-02 19:50:38 -08:00
George Hotz
b842cdf11f
support wait in cuda
2023-03-02 10:39:26 -08:00
George Hotz
dc88ad3342
fix ops print bug
2023-03-02 10:33:03 -08:00