1673 Commits

Author SHA1 Message Date
George Hotz
5dc227dba6 fix bug in ENABLE_METHOD_CACHE and enable for llvm 2023-03-06 07:43:40 -08:00
George Hotz
8c5dea8d72 fix CUDA float4 issues 2023-03-06 07:16:38 -08:00
George Hotz
7dbcc26582 fix up external tests 2023-03-06 06:52:28 -08:00
George Hotz
50012f679b move get_contraction to shapetracker 2023-03-06 06:42:57 -08:00
Alex Wang
64ecbd91b5 Refactor contraction and add integration test cases for push permute (#650) 2023-03-06 06:36:55 -08:00
* Refactor contraction and add unit tests

* Fix typo; Fix TestConv.test_elu failure due to some ones in old_shape

* Add push permute test cases

* Fix mypy type annotation check error

* Add contraction unit test; Reshape to higher dimension is not contraction
Peter McDevitt
cb5be9697c One less line in consume_flops (#651) 2023-03-05 23:34:45 -08:00
* less lines

* using walrus

* using original way
George Hotz
f028accae3 print upcasted colors 2023-03-05 22:12:42 -08:00
George Hotz
b2d4b2c06e refactor imports and EOL whitespace 2023-03-05 20:52:03 -08:00
George Hotz
382f346523 clean up opt (#649) 2023-03-05 20:49:36 -08:00
* clean up opt

* don't let global kernels get too small

* 8192 -> 1024

* disable local shape for clang

* fix can_merge

* unroll the 5x5 depthwise convs in op

* load float4 check
George Hotz
7930c6ab5c CLImage backing bug + test_vec_mul 2023-03-05 16:32:05 -08:00
George Hotz
8de24e3b05 accumulator can be a float4 (#647) 2023-03-05 15:44:41 -08:00
* remove reduceopop

* not float4 yet

* float4 acc works

* group_float4 on store
Cyril Roumégous
c10131ddf5 reduce number of lines (#645) 2023-03-05 15:42:32 -08:00
George Hotz
7989f79820 using image from mad branch saves 1ms on op model 2023-03-05 14:38:42 -08:00
George Hotz
7940ad258e fix dropout test 2023-03-05 12:24:04 -08:00
George Hotz
3072e098c0 local workgroup optimizer 2023-03-05 12:08:12 -08:00
George Hotz
b1ba78ac38 move applegpu disassembler 2023-03-05 11:21:12 -08:00
George Hotz
e8de3f5736 Revert "less lines (#643)" (#644) 2023-03-05 08:41:11 -08:00
This reverts commit 30f2238994.
Peter McDevitt
30f2238994 less lines (#643) 2023-03-05 08:37:14 -08:00
Comma Device
3da56ab41d adreno disassembler 2023-03-05 10:32:03 -06:00
George Hotz
16b03f3c3b wow, can't believe that was broken (#642) 2023-03-04 22:28:28 -08:00
* wow, can't believe that was broken

* remove namedtuple comment
George Hotz
4a607f7d65 more ext gpu tests 2023-03-04 21:00:08 -08:00
George Hotz
69198a73d2 test_1x1_24_6 2023-03-04 20:37:46 -08:00
George Hotz
f281f707bd better function names 2023-03-04 18:27:37 -08:00
George Hotz
a77d792aff Codegen gpu cleanups (#640) 2023-03-04 15:31:51 -08:00
* cleanups

* fixups

* handle pre upcasted global buffers

* early is just required

* delete junk from hand coded opt

* implicit upcast_in_mid_reduce

* speedup

* fix exec w validhacks

* reorder opt

* only need to check the output for that

* return total runtime from kernels if debugging
Patrick Geneva
10d40d3cf2 Expand the inline loop to prevent stack overflow from _deepwalk (#638) 2023-03-04 15:14:17 -08:00
* Expand the inline loop to prevent stack overflow

* Explicitly loop
George Hotz
b02a392d69 Improve local (#635) 2023-03-04 09:30:49 -08:00
* local is improving

* local is finding bugs

* new local should work
Patrick Geneva
117111825c Fix windows file permission error (#634) 2023-03-04 09:23:55 -08:00
George Hotz
528cb3b3b9 fix ast test 2023-03-04 07:49:25 -08:00
George Hotz
8bc9277587 G.nodes isn't always valid 2023-03-04 07:24:43 -08:00
George Hotz
85f69b5489 metal needs the Cocoa 2023-03-03 23:22:15 -08:00
George Hotz
28a6ada4ce line reduction in metal 2023-03-03 23:14:40 -08:00
George Hotz
893f136fe0 lines from helpers 2023-03-03 23:07:46 -08:00
George Hotz
81cda2b672 zero out s == 1 strides 2023-03-03 22:57:02 -08:00
George Hotz
aef336c079 merge_views is very powerful 2023-03-03 22:53:59 -08:00
George Hotz
b5b4edf59b comments 2023-03-03 22:39:31 -08:00
George Hotz
cfb050e2d1 simple modrange, thanks Jacky 2023-03-03 22:37:04 -08:00
George Hotz
3dab721f9f lazy cleanup 2023-03-03 22:01:03 -08:00
George Hotz
c53efb3635 optimize for CL (#633) 2023-03-03 22:00:09 -08:00
* required opt

* simplify

* works

* shift_to_last

* required is fine

* print shape in colored

* better shape

* args was wrong

* debugs

* fix empty shape

* colored shape printer
Pankaj Doharey
9d97d97b26 Opens image in default viewer after saving. (#612) 2023-03-03 17:28:49 -08:00
George Hotz
1a84976d4d fix thneed gflops 2023-03-03 16:52:59 -08:00
George Hotz
7a1d96fd76 No negative (#632) 2023-03-03 16:48:14 -08:00
* behavior is correct without VALIDHACKS

* simple div and mod

* fix tests

* no negative variables

* alt form is correct

* still correct

* bug in mulnode

* at least validhacks works now

* cleanups

* test validhacks, and to_image_idx

* cache compare key

* tests and __neg__
George Hotz
8c475ea86a relax atol, merge_view 2023-03-03 07:48:44 -08:00
George Hotz
b9ce20c374 openpilot test wasn't running, factor out image idx 2023-03-03 07:41:53 -08:00
George Hotz
9bd2cdee08 skip broken bn training test for speed 2023-03-03 06:52:11 -08:00
George Hotz
999b44c274 fix external test + speed 2023-03-03 06:46:16 -08:00
George Hotz
8919ca8163 test cleanups 2023-03-03 06:36:06 -08:00
George Hotz
459488bba2 fix linter (#630) 2023-03-02 20:06:20 -08:00
* fix linter

* no imports okay

* explicit bases

* disable in pylintrc
George Hotz
3915c89fb6 symbolic improvements (#629) 2023-03-02 19:50:38 -08:00
* fixups

* shorter diff

* wow, okay removing that had side effects

* more numeric tests

* MIN MAX tests
George Hotz
b842cdf11f support wait in cuda 2023-03-02 10:39:26 -08:00
George Hotz
dc88ad3342 fix ops print bug 2023-03-02 10:33:03 -08:00