129 Commits

Author SHA1 Message Date
chenyu
10d642e174 fuzz linearizer transformation (#2188)
* fuzz linearizer transformation

* no standard normal for fp16

* work

* Interpreted start

* CPU and TORCH work

* fix MemBuffer with same idx

* id for failed kernels

* no image and variable for Interpreted

* symbolic shape

* IMAGE only for GPU

* Interpreted almost all good

* cleanup

* fix bufs_from_lin

* zero size

* some failed examples

* just Exception

* just test not pass
2023-11-09 08:03:27 -08:00
wozeparrot
4c44d1344b feat: remove cache_id (#2236) 2023-11-08 08:09:21 -08:00
George Hotz
c0a033f01d remove real_offset (#2234)
* remove real_offset

* pass in numnode

* remove that real_offset

* sample only for variable
2023-11-07 17:30:53 -08:00
George Hotz
2f7aab3d13 move optimize_local_size (#2221)
* move optimize_local_size

* interpret_ast
2023-11-05 21:00:52 -08:00
chenyu
f582ec56d5 Replace (getenv("CI", "") != "") with helpers.CI (#2213) 2023-11-03 15:20:44 -07:00
George Hotz
f17bc16f46 simple runtime args (#2211)
* simple runtime args

* fix some tests

* fix abstractions and triton

* fix search
2023-11-03 12:31:29 -07:00
George Hotz
03cf0afa4f move all to compile api (#2203)
* move metal+clang to compile api

* all to the new style

* remove binary arg

* fix triton

* fixup tests

* fix clang

* diskcache is generic

* __wrapped__

* compile_gpu

* fix thneed

* keep the src in the ASTRunner

* lib

* move compile_gpu

* compile_gpu in device

* put compiler in astrunner

* test reverts

* triton compiler

* ugh, that too
2023-11-01 23:01:32 -07:00
George Hotz
8ba7ced7f9 extract const if it's const (#2193)
* extract const if it's const

* fix if statement

* fast math issue

* fix graphing and casting

* disable flaky copyout test
2023-10-31 18:52:35 -07:00
qazal
e2428b63a6 external (#2191) 2023-10-31 13:57:24 -07:00
Roelof van Dijk
36ab04ae35 perf: lazyop as dataclass (#1603)
* perf: lazyop as dataclass

fix: linter

fix: restore eq

* use builtin methods, buffers to property to allow freezing

* fix: reduce diff

* fix: can't freeze due to KOPT tests, mypy

* fix: explicit hash

* can freeze if tests are fixed

* fix: typo

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-25 17:54:30 -04:00
wozeparrot
c29653605e hip multigpu training (#1878)
* feat: move to hip

* feat: special path for RawBufferTransfer

* feat: initial rawbuffertransfer

* feat: hip ipc

* feat: working hip ipc

* feat: need to base device without args

* feat: close mem handle

* feat: modified test

* feat: more multihip stuff

* clean: cleanup

* feat: cleaner

* feat: don't crash

* feat: test more

* clean: way cleaner hip wrapper

* feat: barrier

* feat: barrier

* feat: this breaks stuff

* feat: we can use empty here

* feat: maybe fix tests

* feat: maybe fix tests again?

* fix: probably fix tests

* feat: no waiting here

* feat: wait here

* feat: much larger test

* feat: need to sync here

* feat: make this async

* feat: no waiting!

* feat: cut here

* feat: sync copy

* feat: random imports

* feat: much cleaner world

* feat: restore this

* feat: restore this

* clean: cleanup

* feat: set this
2023-10-24 17:35:53 -04:00
George Hotz
cb508e6923 uops graphing + phi (#2120)
* uops graphing

* add_phi_node

* less phi nodes

* where graph uops should live

* naming

* move it to external

* fix triton yolo

* fix clang and preserve behavior
2023-10-19 22:26:28 -07:00
geohotstan
5ed630204b Add ONNX to CI for other backends (#2069)
* some cleanup

* move continue back

* more more more

* added to CI

* try

* try intentionally break some tests

* wtf

* del True for test

* yay tests broke, now pls no break

* try AGAIN

* gahy

* lol

* try

* move over constant

* moved over MORE

* move shrink over

* trailing lines

* try CUDA CI

* try again

* boom

* oops

* improved comments

* try: disable some flags and disable CUDA

* try breaking tests

* traceback has too much info so add --tb=no

* revert forced CI failure

* add comments and del unused imports

* oooooooo using regular debug try enable tb

* intentionally break tests

* added tb back. Maybe not too verbose

* strip whitespace

* missed something

* Shape op int32 -> int64

* oops missed something

* add some types

* get rid of crazy 1 liners in pad op

* actually test Split this time LOL

* strip that whitespace
2023-10-17 09:33:54 -07:00
George Hotz
5a4a62ecae Disable logging in early compile2 and lower kernel counts (#2090)
* Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)"

This reverts commit 924ecc4d6a.

* gate behind OPT >= 4

* disable_logging in schedule

* simple

* from master

* more images

* revert that

* 206 kernels
2023-10-16 20:15:24 -07:00
George Hotz
d0aaf7d83b Revert "Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)""
This reverts commit f22a7cf656.
2023-10-16 17:47:00 -07:00
George Hotz
5e24dc5a95 limit metal buffers and revert the 207 fix (try 2) (#2088)
* limit metal buffers

* look at the base, not the srcs

* Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)"

This reverts commit 924ecc4d6a.

* add a test for that
2023-10-16 14:52:16 -07:00
George Hotz
e8fcd2f3db Revert "limit metal buffers and revert the 207 fix (#2087)"
This reverts commit 2fb10f6a19.
2023-10-16 14:32:22 -07:00
George Hotz
2fb10f6a19 limit metal buffers and revert the 207 fix (#2087)
* limit metal buffers

* Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)"

This reverts commit 924ecc4d6a.
2023-10-16 14:26:32 -07:00
George Hotz
c36d306606 KOPT is over, BEAM is upstream (#2071)
* create cache for q learning

* make linter happy

* global beam

* where it belongs

* bugfix

* ditch the kopt, use the beam

* faster lin and DEBUG=2 okay

* remove kopt, move search to features
2023-10-16 09:46:03 -07:00
George Hotz
924ecc4d6a Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)
This reverts commit 63869c62fc.
2023-10-13 12:01:55 -07:00
Amrit Sahu
63869c62fc openpilot kernel fix from 209 to 207 (#2006)
* Fix openpilot kernel from 209 to 206

1. Use push_movement_ops conditions in _movement_op. Don't push
PAD or check if the ops are safe to be pushed with PAD

2. Don't push if all the op.buffers are realized

* change ALLOWED_KERNEL_COUNT to 206 for openpilot

* don't push through sourceless buffers

* change the tests to adjust kernel counts for new behaviour

* restore pushing of movement ops through childless buffer

* don't push EXPAND, causes OOM

* allow push of intermediate movement ops

* adding new test behaviour

* modifying external_test_opt for new behaviour

* restore old tests

* Reenable push of EXPAND and introduce new tests

I was wrong initially thinking EXPAND can cause OOM and hence I had
disabled it. Since it is 0 stride and doesn't allocate memory it's cool

* Don't push EXPAND above LoadOps LB. This is causing OOM

* Push should be decided on movement root of bufs

To check if ast.op.buffers is sourceless/realized go to the movement
root and then decide if pushing should be done or not

* refactor for readability

* use .base instead

* don't push expand, bad memory/compute consumption

* restrict push of reshape, seeing improvement

* push reshape if unary without further check

* disable PAD solves convnext kernel count increase

* reenable test_cache_binaryop_transpose

* small nit
2023-10-13 11:59:15 -07:00
geohotstan
8d6cecb25c Torch eq fix (#1562)
* init

* Revert "init"

This reverts commit 682bf2073a.

* kids dont do drugs

* one way to fix

* resolve merge conflict

* no more or

* clean up
2023-10-11 12:57:11 -07:00
Luca Sciarpa
e93e240a6c adapting test/external/external_osx_profiling.py to the new code base (#2002)
* adapting external osx profiling

* fixing dtype

* fixing buffer size
2023-10-08 05:55:00 -07:00
George Hotz
fa9945dac0 remove stale tests 2023-10-06 02:14:56 -07:00
George Hotz
2d0c1037b1 Fix up latest openpilot model (#1976)
* fix gemv triggering for gemm

* fixup_openpilot

* external test issues
2023-10-05 05:24:28 -07:00
George Hotz
3d5127038c don't create linearizer if we are in the method cache (#1969)
* don't create linearizer if we are in the method cache

* remove unchecked properties

* that key isn't used

* fix default type is sticky
2023-10-04 12:42:58 -07:00
Yixiang Gao
094d3d71be with Tensor.train() (#1935)
* add with.train

* remove the rest TODOs

* fix pyflake

* fix pyflake error

* fix mypy
2023-09-28 18:02:31 -07:00
wozeparrot
70671d9625 fix test_collectives (#1934)
* fix: fix test_collectives.py

* feat: reenable test_collectives
2023-09-28 11:02:22 -07:00
George Hotz
adab724caa schedule2, keep the tests working with small changes (#1932)
* lazy cleanups

* ast functions take in LazyOps

* op instead of self.op

* _base for mops

* fix contiguous

* start schedule

* test_schedule

* fix openpilot

* more tests

* bugfix and test skip

* work

* make sure things get freed

* fix zerosized tensors

* fix failing test

* fix ceil and friends

* fix openpilot

* disable training

* disable test collectives
2023-09-28 09:14:43 -07:00
George Hotz
6d9065ed1c Minor cleanups (#1911)
* cleanups

* remove that simplify
2023-09-24 21:32:50 +08:00
George Hotz
78576915de Add needed contiguous to DiskBuffer. SHM support on OSX (#1891)
* add some contiguous

* remove second contig

* Revert "remove second contig"

This reverts commit fc164f7dca1ad75b1e466e4e45a05eca58b7e0e0.

* shm on osx

* can repro bug

* don't contig zeros and ones
2023-09-22 09:16:42 +08:00
qazal
d0e752003d fixes (#1893) 2023-09-22 07:20:27 +08:00
nimlgen
9450e41f70 no import when Python is shutting down (#1875) 2023-09-20 12:47:02 -04:00
Pavol Rusnak
52a92bf95d use class Foo: instead of class Foo(): (#1797)
* use class Foo: instead of class Foo():

* add ruff linter, copy settings from .flake8 to ruff.toml
2023-09-06 12:20:25 -07:00
geohotstan
1bbf26d7fd fix try except not catching fxn() in benchmark (#1783)
* have function raise notimplementederror

* more lines

* revert back to 2 lines :D

* aahhhhhhhh shoooot im stupid

* keep it minimal?
2023-09-06 07:36:43 -07:00
Pavol Rusnak
a50a7ef6f2 revert typo in external_multi_gpu.py (#1777)
introduced by fb1cc6bf4b
2023-09-05 20:46:28 -07:00
George Hotz
89a8a02697 disable openpilot model in model benchmark 2023-09-05 13:32:30 -07:00
geohotstan
9af5645ba3 onnx full passing (#1076)
* 1

* 83 failed

* learning how git works

* lol idk

* zero shape aaaa

* space lol

* aaa

* test check

* haha

* fixed gather

* 73 failing

* 71 failing

* 68 failing

* added some debug

* fking resize

* lol

* 62 failing

* 58 failing fucking did nearest resize hell yeah

* clean up

* 56 failing

* janitor duty

* lol

* 53 failing

* hi mom

* 50 failing

* added linear interp, but coord_trans is wrong

* did lin interpolation woohoo

* 43 failing

* 40 failing

* temporary Gather fix

* 39 failing

* fixed slice onnxver<10

* 37 failing

* 35 failing

* excluded tests that use float64

* 32 failing with hacks

* added _batchnorm() for 3D 5D batchnorm, 29 failing

* changed ALLOWED_KERNEL_COUNT from 199 to 207

* added improved Gather op, reverted ALLOWED_KERNEL_COUNT commit

* support Round op

* added storage_order/indices maxpool, 27 failing

* support maxunpool, 25 failures

* support Gradient, 23 failures

* merged new where

* added Adam

* cleanups

* added Momentum and Nesterov Momentum

* added Adagrad

* support sequence_type, 20 failing

* ugh git

* I give up on cubic interp :D, 9 failing

* sexy 1 liner gather, much improved, wow

* polished gather to make it shine bright like a diamond

* clean 1 liner for gather

* improved readability of gather

* uhh

* clean up

* more clean up

* WHITEspace

* implemented SoftmaxCrossEntropyLoss op

* added comments and cleaned up if statements

* update

* thank based wozeparrot for pow and new GatherElements

* CPU and TORCH all pass | cast float64 -> float32 for all fromCPU()

* _nearest_gather() failing on yolo

* reverted ops_cpu change and added assert in Resize

* added comments for resize for multiple channels

* oops

* merge

* test

* switched np.pad to Tensor.pad for constant padding

* gah

* gah2

* sexy reflect pad with movementops -> add

* delete commented out lines

* edge mode pad sexy as well

* trying out model_benchmark

* revert gitignore change lol

* init

* Revert "init"

This reverts commit 682bf2073a.

* wrote cast workaround for CPU, CPU and TORCH all pass

* wrote cast workaround for CPU, CPU and TORCH all pass

* skipped tests w/ 0 shape for METAL and GPU

* excluded tests for CLANG, CPU, TORCH, CLANG pass

* fixed hacky ConvTranspose

* gotta figure out autopad

* UOps.STORE support cast bool -> float

* small fix for fast gather

* reverted 0 shape skipped tests

* oops missed a file

* added comment

* fixed slice op hack

* First commit to pr

* More trig ops

* More trig ops

* format

* isinf support

* More ops

* changed onnx_ops to use our new gather :D

* Det op bug fix

* rebase

* fixed some tests

* det broken and slow

* fixed compress to use new gather

* implemented argmax argmin

* support variable types in type_proto

* support Upsample and Identity sequence

* we support float64 now and tinygrad support automatic broadcasting

* added EyeLike op

* resize does support multiple channels now actually

* yolov8 onnx runs successfully

* added batch size 1

* oops

* finally fixed type_proto I think

* fixed some llvm bugs

* del whitespaces

* added ZenginU Format PR

* test

* oops

* added float64 exclude tests back

* more skipped tests

* try

* ok openpilot pass

* flake8 pass

* woooooohooo

* revert external_model_benchmark changes

* perf tested gather

* removed promote types from ops_cpu

* numerical errors from 1681 is fixed

---------

Co-authored-by: ZenginU <umutzengin00@gmail.com>
2023-09-05 13:23:32 -07:00
George Hotz
fb1cc6bf4b llama jit is default, print tok/sec (#1774)
* llama jit is default, print tok/sec

* jit not default in CI
2023-09-05 10:12:16 -07:00
Yixiang Gao
66a6bbd029 codellama (#1702)
* add codellama with pre-downloaded weights

* add rope_theta, fix param

* fix test

* add 7B-Python

* add 7B-Instruct

* replace single quotes with double

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-09-02 08:45:12 -07:00
JaSpa99
024dd690fa Reactivate commavq/gpt2m benchmark (#1731)
* get commavq/gpt2m from huggingface

* increase tols
2023-09-01 06:45:08 -07:00
Karan Handa
a8aa13dc91 [ready] Replacing os with pathlib (#1708)
* replace os.path with pathlib

* safe convert dirnames to pathlib

* replace all os.path.join

* fix cuda error

* change main chunk

* Reviewer fixes

* fix vgg

* Fixed everything

* Final fixes

* ensure consistency

* Change all parent.parent... to parents
2023-08-30 10:41:08 -07:00
nimlgen
1c0449e190 add cache collector (#1595)
* init cache collector

* add test_cache_collector.py

* switch GlobalCounters.cache to CacheCollector

* init jit models test

* jitted SD

* add debug msg to print loaded bufs count

* moved cache collector to jit

* clearer SD

* no double device import
2023-08-28 19:59:55 -07:00
chenyu
b5d700adae update openpilot supercombo.onnx to 0.9.4 (#1681)
* update openpilot supercombo.onnx to 0.9.4

* update tests for the new model

* comment out comma models from external_model_benchmark
2023-08-26 19:16:08 -04:00
George Hotz
a6d842af7a move device to ops (#1646)
* move device to ops

* mlops types

* 2 lines
2023-08-23 08:30:17 -07:00
George Hotz
643cbdfd50 make embedding and GPT-2 fast (#1631)
* make embedding fast

* jit more, variable shape support

* print mem bw
2023-08-22 15:14:38 -07:00
George Hotz
718ced296c move state to nn/state (#1619) 2023-08-22 07:36:24 -07:00
Yixiang Gao
8d6662a741 .cpu().numpy() -> .numpy() (#1594)
* .cpu().numpy() -> .numpy()

* restore ops_torch

* restore test_speed_v_torch
2023-08-21 09:53:29 -07:00
chenyu
ae39cf84ab Symbolic Shape JIT main PR (#1353)
* Symbolic Shape JIT

update tests

2 variables symbolic ops, adding more tests

test passing

cleanup

* more test cases

* single flag

* review update

* jit attention one piece

* realize

* symbolic_jit test for cuda

* old artifact

* works with cuda gpu but failed ci

* CUDACPU
2023-08-18 14:39:55 -07:00
nimlgen
bd111411bf init allocator for compiled backends (#1467)
* init allocator for compiled backends

* Update ops_webgpu.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-08-17 10:33:32 -07:00