1003 Commits

Author SHA1 Message Date
geohotstan
5ed630204b Add ONNX to CI for other backends (#2069)
* some cleanup

* move continue back

* more more more

* added to CI

* try

* try intentionally break some tests

* wtf

* del True for test

* yay tests broke, now pls no break

* try AGAIN

* gahy

* lol

* try

* move over constant

* moved over MORE

* move shrink over

* trailing lines

* try CUDA CI

* try again

* boom

* oops

* improved comments

* try: disable some flags and disable CUDA

* try breaking tests

* traceback has too much info so add --tb=no

* revert forced CI failure

* add comments and del unused imports

* oooooooo using regular debug try enable tb

* intentionally break tests

* added tb back. Maybe not too verbose

* strip whitespace

* missed something

* Shape op int32 -> int64

* oops missed something

* add some types

* get rid of crazy 1 liners in pad op

* actually test Split this time LOL

* strip that whitespace
2023-10-17 09:33:54 -07:00
George Hotz
5a4a62ecae Disable logging in early compile2 and lower kernel counts (#2090)
* Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)"

This reverts commit 924ecc4d6a.

* gate behind OPT >= 4

* disable_logging in schedule

* simple

* from master

* more images

* revert that

* 206 kernels
2023-10-16 20:15:24 -07:00
George Hotz
d0aaf7d83b Revert "Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)""
This reverts commit f22a7cf656.
2023-10-16 17:47:00 -07:00
George Hotz
5e24dc5a95 limit metal buffers and revert the 207 fix (try 2) (#2088)
* limit metal buffers

* look at the base, not the srcs

* Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)"

This reverts commit 924ecc4d6a.

* add a test for that
2023-10-16 14:52:16 -07:00
George Hotz
e8fcd2f3db Revert "limit metal buffers and revert the 207 fix (#2087)"
This reverts commit 2fb10f6a19.
2023-10-16 14:32:22 -07:00
George Hotz
2fb10f6a19 limit metal buffers and revert the 207 fix (#2087)
* limit metal buffers

* Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)"

This reverts commit 924ecc4d6a.
2023-10-16 14:26:32 -07:00
George Hotz
c36d306606 KOPT is over, BEAM is upstream (#2071)
* create cache for q learning

* make linter happy

* global beam

* where it belongs

* bugfix

* ditch the kopt, use the beam

* faster lin and DEBUG=2 okay

* remove kopt, move search to features
2023-10-16 09:46:03 -07:00
mmmkkaaayy
91168a28c4 whisper: make file transcription work, add basic CI test (#2042) 2023-10-13 17:13:35 -07:00
George Hotz
924ecc4d6a Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)
This reverts commit 63869c62fc.
2023-10-13 12:01:55 -07:00
Amrit Sahu
63869c62fc openpilot kernel fix from 209 to 207 (#2006)
* Fix openpilot kernel from 209 to 206

1. Use push_movement_ops conditions in _movement_op. Don't push
PAD or check if the ops are safe to be pushed with PAD

2. Don't push if all the op.buffers are realized

* change ALLOWED_KERNEL_COUNT to 206 for openpilot

* don't push through sourceless buffers

* change the tests to adjust kernel counts for new behaviour

* restore pushing of movement ops through childless buffer

* don't push EXPAND, causes OOM

* allow push of intermediate movement ops

* adding new test behaviour

* modifying external_test_opt for new behaviour

* restore old tests

* Reenable push of EXPAND and introduce new tests

I was wrong initially thinking EXPAND can cause OOM and hence I had
disabled it. Since it is 0 stride and doesn't allocate memory, it's cool

* Don't push EXPAND above LoadOps LB. This is causing OOM

* Push should be decided on movement root of bufs

To check if ast.op.buffers is sourceless/realized, go to the movement
root and then decide whether pushing should be done

* refactor for readability

* use .base instead

* don't push expand, bad memory/compute consumption

* restrict push of reshape, seeing improvement

* push reshape if unary without further check

* disable PAD solves convnext kernel count increase

* reenable test_cache_binaryop_transpose

* small nit
2023-10-13 11:59:15 -07:00
qazal
0e2e041faf CI for using tinygrad as an external pkg (#2019)
* create workflow

* unify with test.yml
2023-10-08 10:50:48 -07:00
Vidhan Bhatt
94b21c41a7 ci: use mypy.ini (#1993) 2023-10-06 01:45:28 -07:00
George Hotz
2d0c1037b1 Fix up latest openpilot model (#1976)
* fix gemv triggering for gemm

* fixup_openpilot

* external test issues
2023-10-05 05:24:28 -07:00
Ahmed Harmouche
fb4d830a2a Fix cast error in render_load in wgsl (#1956)
* Fix cast error in wgsl

* Use render_cast instead of introducing new method

* Make it shorter

* Add back webgpu tests: efficientnet and dtypes
2023-10-04 02:29:14 -07:00
George Hotz
6a79d4044a unrealized consts everywhere (#1963)
* unrealized consts everywhere

* don't import device from lazy

* Device isn't in Lazy

* same issue

* disable jit random
2023-10-04 01:48:10 -07:00
George Hotz
6a4ec4776e fix CI (#1953)
* this work

* unauth

* update in all places
2023-10-02 02:58:58 -07:00
Francis Lam
f445e056ed wmma: add test and tensor core shape (#1925) 2023-09-28 18:04:28 -07:00
Yixiang Gao
10f0dc0c85 keep only one comment from git action bot (#1936) 2023-09-28 20:24:53 -04:00
wozeparrot
70671d9625 fix test_collectives (#1934)
* fix: fix test_collectives.py

* feat: reenable test_collectives
2023-09-28 11:02:22 -07:00
George Hotz
adab724caa schedule2, keep the tests working with small changes (#1932)
* lazy cleanups

* ast functions take in LazyOps

* op instead of self.op

* _base for mops

* fix contiguous

* start schedule

* test_schedule

* fix openpilot

* more tests

* bugfix and test skip

* work

* make sure things get freed

* fix zerosized tensors

* fix failing test

* fix ceil and friends

* fix openpilot

* disable training

* disable test collectives
2023-09-28 09:14:43 -07:00
George Hotz
1e15fdaee7 disable flaky triton test 2023-09-23 14:59:36 +08:00
Szymon Ożóg
58296c079d Make Triton work again (#1547)
* Move ops_triton to runtime and remove errors from deprecated code

* Remove deprecated AST Kernel

* Remove deprecated buffer

* Add TritonProgram

* Triton Buffer

* Use RawCUDABuffer

* triton_compile

* Added new parameter

* pass _buf to program

* remove deprecated include

* Added triton tests

* Deprecated includes removed

* remove double print

* Disable float4 support

* Disable float4 support

* variable load fix

* Track local size

* Add pycuda to triton dependencies

* Merge test.yml

* install cuda packages for testing

* merge double package install

* remove emulated from triton tests

* upscale local index to power of 2 and add masking

* cuda envs

* Add TernaryOps

* ConstOp loading

* proper function name

* remove deprecated variables

* get global program from name

* const ops match local shape

* Enable test_nn

* remove deprecated import

* fix linter error

* Add wait logic

* Add local size override

* accumulate local shapes instead of using max shape

* Merge triton tests into global tests

* fix envs in testing

* Old testing routine

* split file into renderer and program

* remove print and starting whitespace

* pretty ptx print on debug 5

* linter errors

* ignore triton saturation tests

* ignore test example

* remove pytorch cpu extra index

* Add triton to existing testing routine

* use triton tests

* disable cuda backend in triton tests

* use cudacpu in tests

* print used device

* Print device default

* Remove print

* ensure we are running triton backend

* update variable signatures

* update dtypes for load

* infinity render fixed

* limit global size

* negative infinity now properly rendered

* split chain with parentheses for and node

* Add option to disable shared memory, disable for triton

* missing import

* Properly index and mask conditional load

* use mask only if not loading a block pointer

* nan support

* fix symbolic tests to include chain split

* proper masking for stores

* Implemented bool dtype

* Add mod

* fix loads for variables with valid range

* merge triton with cuda runtime

* merge from master

* run triton tests with cuda

* Correct target when running from triton

* conftest with triton compiler config

* use triton nightly

* verbose tests for triton

* capture stdout

* fix function depth when exiting multiple loops

* add render valid function for readability

* fix mask for local loops

* add _arg_int32 datatype

* fix dims for conditional loads

* enable non float stores

* correct variable dtypes

* fix type for arg_int32

* remove junk

* Added get max function for range based var.max

* remove deprecated code

* Fix triton ptxas path

* Fix testing for CI

* clamp local size by max local size instead of always running max

* Disable matmul test in triton cpu

* rerun tests

* Disable broken test in triton cpu

* whitespace removed

* rerun tests again

* Disable TestSymbolicOps for triton

* update to new uops

* linter fix

* ignore test/extra

* linting fix

* Update tinygrad/renderer/triton.py

Co-authored-by: Gijs Koning <gijs-koning@live.nl>

* remove deprecated line

* quotes type fix

* linter

* Remove unnecessary lines

* UnaryOps.NEG

* dont define constants

* Linting fix

* Disable tests that are broken in ocelot

* remove trailing whitespace

* reduce line count

* linting fix

* update to new uast

* New looping style

* Update to new uast

* make AST runner work with triton

* linting fix

* set renderer var for testing

* disable local for ocelot

* reenable all tests for ocelot

* Pass shared to cuda

* Don't group if the backend doesn't support shared mem

* use working gpuocelot branch

* enable all tests

* enable local for ocelot

* cleanup

* Update test.yml

* update cache key

* reenable test symbolic and extra

* Update test.yml

* Revert "Update test.yml" (rerun tests)

This reverts commit 98c0630ee5.

* Revert "fix symbolic tests to include chain split"

This reverts commit 22a9a4c9cd.

* Revert "split chain with parentheses for and node"

This reverts commit 7499a7004e.

* use global size from linearizer

* rename newvar to dtype to match other renderers

* join program start lines

* simplify code that adds axis to local dims

* assign r[u] in ssa

* We no longer need to replace target in src

* we no longer need to cast indices to int by hand

* Update triton.py(rerun tests)

* Update triton.py(rerun tests)

* Update triton.py(rerun tests)

---------

Co-authored-by: Gijs Koning <gijs-koning@live.nl>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-09-23 14:17:12 +08:00
Umut Zengin
3987280daf Fix VALIDHACKS for Images and make it default (#1832)
* valid hacks

* valid hacks

* valid hacks

* new method

* new method

* handtune

* is gate load breaking?

* lint

ruff

less junk

new approach?

maybe this?

* Make it more clear

* Make it more clear

* Will deal with the linter later

* hack for linter

* subs the idx but dont touch the valid

* Updated the mod rules

* lint hack

* I believe bug fix lets see

* Mod Node left

* revert

* Maybe this wont break?

* revert

* implemented "handtuned garbage"

* revert and use VALIDHACKS

* Lets see the CI

* still broken?

* currently it's jungle

* maybe this jungle ?

* This works for everything somehow

* Added test for symbolic

* lint

* final touch

* This still works

* lint

* midway clean

* less garbage

* lint

* final form

* Slow but working way

* lint and other stuff

* lint

* mypy

* Make sure CI test Openpilot valid checks

* test if CI break

* Convert back

* refactor

* refactor

* Managed to reduce openpilot time from 30 secs to 5 secs

* Refactor

* Substitute a node with variable

* flake8

* Comment and refactor

* More comprehensive mod

* refactor

* bug fix

* More shave off

* remove not sure part
2023-09-23 07:34:43 +08:00
Yixiang Gao
84ab47a90a add branch up-to-date check (#1879) 2023-09-20 12:41:51 -04:00
Yixiang Gao
18ec5a9e09 add comment bot to CI (#1873) 2023-09-16 12:22:06 -04:00
wozeparrot
c870764940 Revert "add line changes diff bot to CI (#1863)" (#1870) 2023-09-15 16:56:42 -04:00
Yixiang Gao
789c84a7a3 add line changes diff bot to CI (#1863) 2023-09-15 16:29:58 -04:00
chenyu
29ac8293d7 run gpt2 in CI (#1866) 2023-09-15 04:37:02 +08:00
chenyu
9e9ea20784 Fix view, CI cpu test with python 3.8 (#1845) 2023-09-10 22:37:58 -04:00
George Hotz
0e3e2bac13 amd wino: upload results 2023-09-09 13:57:14 -07:00
George Hotz
6f95c5f284 winograd speed test for AMD (#1826) 2023-09-09 13:56:33 -07:00
George Hotz
0f2bd10d00 add winograd CIFAR to mac tests (#1825)
* add winograd CIFAR to mac tests

* symlink already done
2023-09-09 13:45:24 -07:00
Pavol Rusnak
52a92bf95d use class Foo: instead of class Foo(): (#1797)
* use class Foo: instead of class Foo():

* add ruff linter, copy settings from .flake8 to ruff.toml
2023-09-06 12:20:25 -07:00
George Hotz
fb1cc6bf4b llama jit is default, print tok/sec (#1774)
* llama jit is default, print tok/sec

* jit not default in CI
2023-09-05 10:12:16 -07:00
nimlgen
f863c12610 test kopt correctness (#1756)
* test kopt correctness

* bump BUDGET to 20

* kopt hooks as setUp/tearDown
2023-09-04 10:55:00 -07:00
George Hotz
56abe04e4b disable assembly (#1755) 2023-09-04 09:41:20 -07:00
chenyu
b8fde6bb0f Test KOPT in CI (#1744)
* test kopt in ci

* getenv takes dtype from default
2023-09-03 14:37:20 -07:00
George Hotz
89cd380bfc add nvidia CI (#1737)
* add nvidia

* speed(nvidia)
2023-09-01 22:02:30 -07:00
George Hotz
fdd7f282cb Reenable tensor cores for self-hosted Mac CI (#1717)
* debug 5 matmul

* allow tensor cores in CI

* tensor cores on arm64

* put debug back
2023-08-30 07:53:04 -07:00
wozeparrot
2f768e386d stable diffusion benchmark artifact (#1714) 2023-08-29 21:08:40 -04:00
George Hotz
0ea22bf249 remove DEBUG=1 from stable diffusion AMD since jit cache is fixed 2023-08-29 12:46:12 -07:00
George Hotz
ab9b9ff3e2 pipefail benchmark (#1709) (#1710)
* feat: specify shell

* feat: specify shell for mac

Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2023-08-29 08:15:02 -07:00
George Hotz
aa7c98722b sd timing (#1706) 2023-08-28 20:22:57 -07:00
George Hotz
f5f8b09c13 allow manual release (#1704) 2023-08-28 17:54:25 -07:00
George Hotz
715047a1e4 fix release publish (#1703) 2023-08-28 17:48:00 -07:00
chenyu
b5d700adae update openpilot supercombo.onnx to 0.9.4 (#1681)
* update openpilot supercombo.onnx to 0.9.4

* update tests for the new model

* comment out comma models from external_model_benchmark
2023-08-26 19:16:08 -04:00
Roelof van Dijk
89b529c07f [ready] ci: add py38 to linters (#1674)
* ci: add py38 to linters

* fix: run linters only on py38

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-26 09:34:15 -04:00
George Hotz
a6d842af7a move device to ops (#1646)
* move device to ops

* mlops types

* 2 lines
2023-08-23 08:30:17 -07:00
Roelof van Dijk
1900acda09 [READY] ci: setup venv cache (#1475)
* ci: cache installed packages

* ci: trigger jobs

* ci: fix hashfiles argument

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-20 18:43:16 -07:00
George Hotz
012ee7d162 not worth the speed (#1584)
* not worth the speed

* no slots

* uops comments

* bump to python 3.11 for speed

* add critical slots back
2023-08-20 10:24:58 -07:00