Commit Graph

47 Commits

Author SHA1 Message Date
George Hotz
d6b404ac11 No dtype alloc (#2570)
* fix all allocs

* improve docs

* ugh fix fake alloc
2023-12-02 13:29:40 -08:00
George Hotz
5068e99d18 refactor to remove extra kernel params (#2563)
* refactor to have compiled kernel

* bugfixes

* docs/beautiful.py

* revert that

* fix tests
2023-12-02 00:32:25 -08:00
George Hotz
6733425095 lower schedule (#2559)
* lower schedule

* remove RAND, and don't put load in the JIT yet

* better fix for that test
2023-12-01 19:17:46 -08:00
chenyu
7fec966b5e bye bye NOOP (#2534)
* bye bye NOOP

* SIN

* NEG
2023-11-30 23:10:35 -08:00
George Hotz
2c363b5f0b new style device (#2530)
* cpu tests pass

* torch works

* works

* metal works

* fix ops_disk

* metal jit works

* fix openpilot

* llvm and clang work

* fix webgpu

* docs are really broken

* LRU works on metal

* delete comment

* revert name to ._buf. LRU only on Compiled

* changes

* allocator

* allocator, getting closer

* lru alloc

* LRUAllocator

* all pass

* metal

* cuda

* test examples

* linearizer

* test fixes

* fix custom + clean realize

* fix hip

* skip tests

* fix tests

* fix size=0

* fix MOCKHIP

* fix thneed

* copy better

* simple

* old style metal copy

* fix thneed

* np reshape

* give cuda a device
2023-11-30 17:07:16 -08:00
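
The LRUAllocator introduced in #2530 keeps freed device buffers around, bucketed by size, so the next allocation of the same size skips the driver. A minimal sketch of that caching pattern, with illustrative names rather than tinygrad's actual class:

```python
# Sketch of size-bucketed buffer reuse, assuming a backend object exposing
# raw_alloc/raw_free. Illustrative only, not tinygrad's actual LRUAllocator.
from collections import defaultdict, deque

class LRUAllocator:
  def __init__(self, backend):
    self.backend = backend                # the real driver-level allocator
    self.free_lists = defaultdict(deque)  # size -> buffers freed at that size

  def alloc(self, size: int):
    if self.free_lists[size]:
      return self.free_lists[size].pop()  # reuse a cached buffer, skip the driver
    return self.backend.raw_alloc(size)

  def free(self, buf, size: int):
    self.free_lists[size].append(buf)     # don't release; keep it for the next alloc

  def trim(self):
    # under memory pressure, actually hand the cached buffers back to the driver
    for bufs in self.free_lists.values():
      while bufs: self.backend.raw_free(bufs.popleft())
```

Reuse matters because raw driver allocations (Metal, CUDA) are slow relative to a dictionary lookup, which is why the commit restricts the LRU path to Compiled devices.
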
George Hotz
9e07824542 move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
chenyu
a753c8e071 examples of new GPT2 and JIT change (#2261)
* var_vals are global

* working with global ish

* better

* fix export model

* fix tests

* better kv cache

* does it run?

* use where for kvmask

* fix excessive var_vals

* fix import

* how does multigpu use this?

* llama kinda work

* faster and simpler

* cleanup

* fix conversation mode

* test cleanups

* fix one more test

* test cleanup

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-11-10 15:07:02 -05:00
George Hotz
0c9b4ab885 no to_underlying (#2222)
* no to_underlying

* context is no longer used

* no more optimizing

* update docs
2023-11-05 21:34:20 -08:00
George Hotz
f17bc16f46 simple runtime args (#2211)
* simple runtime args

* fix some tests

* fix abstractions and triton

* fix search
2023-11-03 12:31:29 -07:00
George Hotz
03cf0afa4f move all to compile api (#2203)
* move metal+clang to compile api

* all to the new style

* remove binary arg

* fix triton

* fixup tests

* fix clang

* diskcache is generic

* __wrapped__

* compile_gpu

* fix thneed

* keep the src in the ASTRunner

* lib

* move compile_gpu

* compile_gpu in device

* put compiler in astrunner

* test reverts

* triton compiler

* ugh, that too
2023-11-01 23:01:32 -07:00
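
#2203 funnels every backend through a compile function and makes the on-disk compile cache generic. A hedged sketch of the idea, keying cached binaries on a hash of the kernel source (compile_clang and the cache path are assumptions for the sketch, not tinygrad's actual API):

```python
# Illustrative generic disk cache for compile functions, keyed on a hash of the
# kernel source. Names and paths here are assumptions, not tinygrad's diskcache.
import functools, hashlib, pathlib

CACHE_DIR = pathlib.Path.home() / ".cache" / "kernel_binaries"

def diskcache(compile_fn):
  @functools.wraps(compile_fn)
  def wrapper(src: str) -> bytes:
    key = hashlib.sha256((compile_fn.__name__ + src).encode()).hexdigest()
    path = CACHE_DIR / key
    if path.exists(): return path.read_bytes()   # later runs: no compiler invoked
    binary = compile_fn(src)
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path.write_bytes(binary)
    return binary
  return wrapper

@diskcache
def compile_clang(src: str) -> bytes:
  ...  # invoke the compiler here and return the shared-object bytes
```

functools.wraps is also what exposes __wrapped__, matching the bullet above: callers can reach the uncached compile function when they need a fresh build.
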
George Hotz
121f7aa8c5 Schedule item (#2012)
* ScheduleItem

* put var_vals in the schedule

* fix tests, wow that proliferated quickly

* not ready to be in the schedule
2023-10-07 08:59:25 -07:00
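
#2012 turns each pending computation into a ScheduleItem, carrying its var_vals alongside the AST. Roughly, such a record looks like this (field names are an assumption, not the exact tinygrad definition):

```python
# Roughly what a schedule entry needs to carry; field names are illustrative.
from dataclasses import dataclass
from typing import Any, Dict, Tuple

@dataclass(frozen=True)
class ScheduleItem:
  ast: Any                   # the LazyOp tree to lower into a kernel
  out: Any                   # the LazyBuffer this item realizes
  inputs: Tuple[Any, ...]    # already-realized buffers the kernel reads
  var_vals: Dict[str, int]   # bound values for symbolic shape variables
```

The scheduler emits these in dependency order; a runner then lowers each ast and executes it against out and inputs.
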
George Hotz
de5d603ec1 corealize + remove realize from lazybuffer (#1968)
* corealize + remove realize from lazybuffer

* fix multigpu

* fix graph
2023-10-04 10:59:31 -07:00
nimlgen
2ea1dd3e87 no process() in Linearizer (#1966)
* no process() in Linearizer

* more process() clean up
2023-10-04 07:18:42 -07:00
George Hotz
0945848b5f schedule the loadops like everything else (#1964)
* schedule the loadops like everything else

* unify loadops with other things we schedule

* delete all the ops

* fix symbolic jit
2023-10-04 02:36:04 -07:00
George Hotz
adab724caa schedule2, keep the tests working with small changes (#1932)
* lazy cleanups

* ast functions take in LazyOps

* op instead of self.op

* _base for mops

* fix contiguous

* start schedule

* test_schedule

* fix openpilot

* more tests

* bugfix and test skip

* work

* make sure things get freed

* fix zerosized tensors

* fix failing test

* fix ceil and friends

* fix openpilot

* disable training

* disable test collectives
2023-09-28 09:14:43 -07:00
George Hotz
c907efbf4a reorder a few things (#1915)
* reorder a few things

* huh, that has to be there

* move apply shapetracker

* BufferOps

* only for type checking
2023-09-25 10:17:21 +08:00
George Hotz
20059dc55b Make ShapeTracker Immutable (#1909)
* ugh

* ops test pass

* fix shapetracker tests

* sym shapetracker

* shapetracker is a tuple of views now

* from_shape

* fix has variable shape

* key isn't needed

* post init assert
2023-09-24 21:09:03 +08:00
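
After #1909 a ShapeTracker is a value, not a mutable object: a tuple of views, where every movement op returns a new tracker. A rough sketch of the pattern (simplified; tinygrad's real View also carries a mask, and merging views is the hard part):

```python
# Sketch of an immutable ShapeTracker as a tuple of views. Simplified and
# illustrative, not the actual tinygrad definition.
from dataclasses import dataclass
from typing import Tuple

def strides_for(shape: Tuple[int, ...]) -> Tuple[int, ...]:
  out, acc = [], 1
  for s in reversed(shape): out.insert(0, acc); acc *= s  # C-contiguous strides
  return tuple(out)

@dataclass(frozen=True)
class View:
  shape: Tuple[int, ...]
  strides: Tuple[int, ...]
  offset: int = 0

@dataclass(frozen=True)
class ShapeTracker:
  views: Tuple[View, ...]  # a tuple: hashable, safely shared, never mutated

  @staticmethod
  def from_shape(shape: Tuple[int, ...]) -> "ShapeTracker":
    return ShapeTracker((View(shape, strides_for(shape)),))

  def permute(self, order: Tuple[int, ...]) -> "ShapeTracker":
    v = self.views[-1]
    nv = View(tuple(v.shape[i] for i in order), tuple(v.strides[i] for i in order), v.offset)
    return ShapeTracker(self.views[:-1] + (nv,))  # movement ops return a new tracker
```
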
George Hotz
7ff7aacdb4 LazyOp out of Linearizer (#1908)
* loadop buffer on cpu

* works for GPU

* sort of working

* has bugs

* gpu tests pass

* fix some tests

* fix tensor cores

* fix test linearizer

* fix symbolic

* fix has_variable_shape

* non symbolic size

* disable weird test

* simple cache fix

* fix custom function

* fix kopt

* cleanups

* a bit broken on the assign

* contig check

* only buffer

* need that order

* idx

* dedup buffers

* hmm, bugfix

* fix tensor cores

* opts device
2023-09-24 14:30:53 +08:00
George Hotz
97dc813329 Revert "All LazyOps in the Linearizer (#1905)" (#1907)
This reverts commit a5820390db.
2023-09-24 11:51:22 +08:00
George Hotz
a5820390db All LazyOps in the Linearizer (#1905)
* loadop buffer on cpu

* works for GPU

* sort of working

* has bugs

* gpu tests pass

* fix some tests

* fix tensor cores

* fix test linearizer

* fix symbolic

* fix has_variable_shape

* non symbolic size

* disable weird test

* simple cache fix

* fix custom function

* fix kopt

* cleanups

* a bit broken on the assign

* contig check

* only buffer

* need that order

* idx
2023-09-24 11:50:00 +08:00
crankygrumpster
c8025c319c Remove Token from abstractions.py (#1741)
* Remove Token from abstractions.py, update output string

* add dtype
2023-09-02 21:56:11 -07:00
George Hotz
453e437598 move stuff in the linearizer (#1726)
* move stuff in linearizer

* move stuff in linearizer

* minor

* fix opts import
2023-08-31 14:42:09 -07:00
nimlgen
1c0449e190 add cache collector (#1595)
* init cache collector

* add test_cache_collector.py

* switch GlobalCounters.cache to CacheCollector

* init jit models test

* jitted SD

* add debug msg to print loaded bufs count

* moved cache collector to jit

* clearer SD

* no double device import
2023-08-28 19:59:55 -07:00
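
The CacheCollector from #1595 is what lets the JIT capture work: every kernel launch is reported to it, and whatever was recorded between start() and finish() can be replayed verbatim. An illustrative sketch (not the actual tinygrad class):

```python
# Illustrative capture-and-replay collector; names are assumptions.
class CacheCollector:
  def __init__(self): self.cache = None      # None means "not capturing"

  def start(self): self.cache = []

  def add(self, prg, args):
    if self.cache is not None: self.cache.append((prg, args))

  def finish(self):
    captured, self.cache = self.cache, None
    return captured                           # list of (program, args) to replay
```

Runtimes call add() on every kernel launch; the JIT brackets a traced call between start() and finish(), then on later calls replays the captured (program, args) pairs directly, skipping the lazy graph entirely.
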
George Hotz
a6d842af7a move device to ops (#1646)
* move device to ops

* mlops types

* 2 lines
2023-08-23 08:30:17 -07:00
chenyu
ae39cf84ab Symbolic Shape JIT main PR (#1353)
* Symbolic Shape JIT

update tests

2 variables symbolic ops, adding more tests

test passing

cleanup

* more test cases

* single flag

* review update

* jit attention one piece

* realize

* symbolic_jit test for cuda

* old artifact

* works with cuda gpu but failed ci

* CUDACPU
2023-08-18 14:39:55 -07:00
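
The point of the symbolic shape JIT in #1353 is that a kernel compiled once can serve every value of a varying dimension (such as sequence length), because that dimension is a runtime variable rather than a baked-in constant. A toy model of the caching behavior, with nothing tinygrad-specific in it:

```python
# Toy model of a symbolic-shape JIT: "compile" once per static signature, and
# pass the symbolic dim (seq_len) in as a runtime value. Not tinygrad's JIT.
from typing import Callable, Dict, Tuple

_compiled: Dict[Tuple[int, ...], Callable] = {}

def expensive_compile(static_shape: Tuple[int, ...]) -> Callable:
  # stands in for codegen plus the compiler; seq_len is a parameter, not a constant
  def kernel(xs, seq_len: int):
    return [sum(row[:seq_len]) for row in xs]
  return kernel

def jit_run(xs, static_shape: Tuple[int, ...], var_vals: Dict[str, int]):
  if static_shape not in _compiled:          # compile only on the first call
    _compiled[static_shape] = expensive_compile(static_shape)
  return _compiled[static_shape](xs, var_vals["seq_len"])

xs = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(jit_run(xs, (2, 4), {"seq_len": 2}))   # [3, 11]
print(jit_run(xs, (2, 4), {"seq_len": 3}))   # [6, 18], same kernel, no recompile
```
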
George Hotz
d24f936501 just cmplt (#1493)
* just cmplt

* fix maximum

* don't save, there's no backward

* ugh, no slot either

* eq is a scam
2023-08-08 13:58:10 -07:00
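
#1493 reduces the comparison llops to a single CMPLT; equality and maximum are derived from it, which is also why "eq is a scam" and why there is no backward to save (a less-than mask is piecewise constant, so its gradient is zero wherever it is defined). The derivations, in scalar Python with 1.0/0.0 standing in for boolean tensors:

```python
def cmplt(x, y): return 1.0 if x < y else 0.0  # the single comparison primitive

def eq(x, y): return (1.0 - cmplt(x, y)) * (1.0 - cmplt(y, x))  # neither side less

def maximum(x, y):
  lt = cmplt(x, y)
  return lt * y + (1.0 - lt) * x  # mask-select the larger operand

assert eq(2.0, 2.0) == 1.0 and eq(2.0, 3.0) == 0.0
assert maximum(2.0, 3.0) == 3.0 and maximum(4.0, 1.0) == 4.0
```
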
George Hotz
7b8d06c9f1 test uops (#1444)
* test uops

* tests should pass

* improve uops

* precision
2023-08-05 12:35:56 -07:00
George Hotz
84c430355e fix backends for new style (#1443)
* fix backends for new style

* fix method cache

* fix fakeless

* llvm blacklist

* fix kernel optimizer
2023-08-05 11:07:04 -07:00
chenyu
940b6fd21a Revert "Fix constant folding for Tensor([3]) (#1227)" (#1274)
This reverts commit ab645317c9.
2023-07-19 10:51:06 -07:00
David Hou
56ee97b37f dedup kernel args v2 (#1272)
* new version

* fix abstractions

* try remove test

* Revert "try remove test"

This reverts commit 2fc18a9f8e.

* assert_allclose

* minimize the test

* minimize the test

* minimize the test

* minimize the test

* Revert "minimize the test"

This reverts commit e0c0929596.

* Revert "minimize the test"

This reverts commit 88240551b1.

* Revert "minimize the test"

This reverts commit 78328a7ce2.

* Revert "minimize the test"

This reverts commit 989523fded.

* skip test inside body

* oops

* oops
2023-07-18 20:03:42 -07:00
Adrian Kretz
5a8ad57163 Add WHERE ternary (or trinary?) op (#1196)
* Rename FusedOps to TernaryOps

* Support ternary broadcast

* Add where llop and mlop

* Make where op work in cstyle codegen

* Don't skip test_inf_where

* Add backward path to where op

* Use bool in cstyle codegen

* Add LLVM where op

* Add numpy where op

* Add torch where op

* Simplify where mlop

* Update documentation

* Forgot a rename

* Merged relevant changes from PR #1195 onto PR #1196

* Add test to cover changes to linearizer.ast_parse for WHERE op

Without this, METAL will try to use a ternary op on float4 and fail

* Make where op work in wgsl backend

* Allow ternary ops to be merged

* Make mypy happy

---------

Co-authored-by: Francis Lam <flam@alum.mit.edu>
2023-07-16 00:31:55 -07:00
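
WHERE from #1196 is a masked select, and its backward path routes the upstream gradient to whichever branch each element took. A small numpy sketch of both directions (using 1.0/0.0 masks; not the exact tinygrad mlop):

```python
import numpy as np

def where_forward(cond, x, y):
  return cond * x + (1 - cond) * y         # take x where the mask is 1, else y

def where_backward(cond, grad_out):
  grad_x = cond * grad_out                 # gradient flows to the taken branch
  grad_y = (1 - cond) * grad_out
  return grad_x, grad_y                    # the condition itself gets no gradient

cond = np.array([1.0, 0.0, 1.0])
x, y = np.array([10., 20., 30.]), np.array([1., 2., 3.])
print(where_forward(cond, x, y))           # [10.  2. 30.]
print(where_backward(cond, np.ones(3)))    # (array([1., 0., 1.]), array([0., 1., 0.]))
```
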
chenyu
ab645317c9 Fix constant folding for Tensor([3]) (#1227)
* Fix constant folding for Tensor([3])

* Remove duplicated prod import

* load in the same device

* better numpy

* add constant fold shape test cases

* improve tests
2023-07-11 14:01:32 -07:00
George Hotz
2952b8e7a8 Fix up abstractions.py to include the Linearizer (#1177)
* fix up docs

* remove pow, add sqrt
2023-07-07 18:33:51 -07:00
tricky-labyrinth
fd98f6cffa Small fix to abstractions.py so it runs on Windows without throwing an AttributeError (#1109)
Co-authored-by: Tricky Labyrinth <trickylabyrinth@gmail.com>
2023-07-03 13:44:49 -07:00
ernie
4d703be6d7 fix typo (#1065) 2023-06-27 10:56:54 -07:00
George Hotz
df40a9c238 EXP+LOG -> EXP2+LOG2 (#954)
* EXP+LOG -> EXP2+LOG2

* update docs
2023-06-08 10:57:31 -07:00
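
The EXP2/LOG2 switch in #954 works because base-e and base-2 differ only by a constant factor: exp(x) = 2^(x * log2(e)) and log(x) = log2(x) * ln(2), so backends only need the base-2 primitives. A quick numerical check of the identities:

```python
import math

def exp(x): return 2.0 ** (x * math.log2(math.e))  # e**x == 2**(x * log2(e))
def log(x): return math.log2(x) * math.log(2.0)    # ln(x) == log2(x) * ln(2)

assert abs(exp(1.5) - math.exp(1.5)) < 1e-12
assert abs(log(7.0) - math.log(7.0)) < 1e-12
```
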
kposborne2
00360da05b Update broken docs/abstractions.py for changed ops, and add to CI (#930)
* fix and add to ci

* still have those

* ocd

* update other doc
2023-06-04 19:21:20 -07:00
George Hotz
791530045d Refactor LoadOps (#910)
* test

* work

* upd test

* loadops

* cleanups

* real ones

* remove LazyNumpyArray

* fix assign test

* remove range

* np.require

* llama uses arange kernels

* no caching consts

* fix enet

* torch load support

* tests cleanup

* fix shufflenet

* fix image

* fix torch_load test
2023-06-03 09:40:43 -07:00
George Hotz
f5467cfedc Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in it's proper place
2023-03-18 14:40:23 -07:00
George Hotz
3a8af99adb i understand ClassVar now 2023-03-15 09:00:25 -07:00
Pasan Perera
df48753692 fixed the import error for latest changes in master (#705) 2023-03-15 08:59:42 -07:00
George Hotz
54f499b623 Move rawbuffer (#697)
* move GlobalCounters to helpers

* that's not part of the public api

* move InterpretedBuffer

* remove fromCPU from devicebuffer
2023-03-13 22:30:36 -07:00
George Hotz
cbc5a7222a symbolic is now a 6/10 due to the infinite loop. do better. 2023-03-13 00:07:59 -07:00
George Hotz
c594a0a835 fix flip bug, add new unit tests 2023-03-12 23:55:31 -07:00
George Hotz
ce1564b05e fix shapetracker test 2023-03-12 22:33:25 -07:00
George Hotz
153cce0f7e tutorial 2023-03-12 22:31:46 -07:00
George Hotz
8d16ebaea7 we have docs: 2023-03-12 19:05:44 -07:00