Commit Graph

2480 Commits

Author SHA1 Message Date
wozeparrot
c870764940 Revert "add line changes diff bot to CI (#1863)" (#1870) 2023-09-15 16:56:42 -04:00
Yixiang Gao
789c84a7a3 add line changes diff bot to CI (#1863) 2023-09-15 16:29:58 -04:00
chenyu
29ac8293d7 run gpt2 in CI (#1866) 2023-09-15 04:37:02 +08:00
chenyu
1b46de1a3e fix type of helpers.prod, add test cases (#1859) 2023-09-14 05:16:55 +08:00
chenyu
e67306ba04 symbolic shape type with TypeGuard (#1852) 2023-09-13 05:27:22 +08:00
Roelof van Dijk
c91b44f7bf refactor: move size to view (#1848)
* refactor: move size to view

* fix: pylint

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-11 07:16:04 -07:00
chenyu
9e9ea20784 Fix view, CI cpu test with python 3.8 (#1845) 2023-09-10 22:37:58 -04:00
chenyu
3ec301c2d7 apply view.py patch (#1844) 2023-09-10 17:32:15 -07:00
Yixiang Gao
a32951a001 add test_tensor_copy (#1840)
* add test_tensor_copy

* fix whitespace

* add value check
2023-09-10 16:01:58 -07:00
Roelof van Dijk
1bc52c60df fix: minor tweaks to view (#1842)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-10 15:55:57 -07:00
George Hotz
47e602f717 view: do not trade complexity for speed (#1839)
* view: do not trade complexity for speed

* staticmethods

* view create
2023-09-10 11:29:53 -07:00
chenyu
c0bc4cfbaf DivNode.b is int (#1833) 2023-09-10 09:04:29 -07:00
nimlgen
13790b1e20 cast types in render_load (#1837) 2023-09-10 07:58:13 -07:00
David Hou
e74a6ca7e4 expand in terms of substitute (#1827) 2023-09-09 14:43:00 -07:00
George Hotz
0e3e2bac13 amd wino: upload results 2023-09-09 13:57:14 -07:00
George Hotz
6f95c5f284 winograd speed test for AMD (#1826) 2023-09-09 13:56:33 -07:00
George Hotz
0f2bd10d00 add winograd CIFAR to mac tests (#1825)
* add winograd CIFAR to mac tests

* symlink already done
2023-09-09 13:45:24 -07:00
nimlgen
31fca43706 kopt works with local+grouped reduce and tests (#1824) 2023-09-09 13:22:09 -07:00
chenyu
9da40c8448 move Node.__lt__ SumNode special case to SumNode (#1823) 2023-09-09 13:20:38 -07:00
Francis Lam
651205fa5c linearizer: support local and group_for_reduce dimensions together (#1821)
also minor changes to test_speed_v_torch.py and size of UOps.SPECIAL
2023-09-08 12:39:27 -07:00
segf00lt
9e8c1dbf34 patch to remove hack from stable_diffusion.py (#1814)
* patch to remove hack from stable_diffusion.py

* sorry linter

* realize after assign?

* float16 broken in llvmlite, use float64 for now

* int32

* idiot forgot to change test array dtype
2023-09-08 09:26:50 -07:00
chenyu
ebcda8a714 Move var_vals from ShapeTracker to LazyBuffer (#1819) 2023-09-08 09:25:10 -07:00
kormann
7ac65a93b4 utils.printtree (#1816)
* utils.printtree

* linter compliance

* rename to print_tree
2023-09-07 23:08:57 -07:00
George Hotz
4613c9e77c add tvm example, formatting (#1813)
* add tvm example

* no realize
2023-09-07 11:50:41 -07:00
nimlgen
5b15a972b5 no functions with same names in test/ (#1811) 2023-09-07 11:27:31 -07:00
George Hotz
722823dee1 stable diffusion: force fp16 free 2023-09-06 15:11:05 -07:00
chenyu
928cb1a64a AndNode.substitute short circuit (#1800)
* AndNode substitute short circuit

* Node.__bool__ is faster than Node.__eq__
2023-09-06 14:58:49 -07:00
nimlgen
a78a1fa499 fix jit buffer reuse when freed (#1802)
* fix jit buffer reuse when freed

* Forbid output_buffer reuse
2023-09-06 14:41:57 -07:00
Yixiang Gao
22cf15e9d0 convert function into tinygrad (#1803) 2023-09-06 14:41:26 -07:00
Pavol Rusnak
52a92bf95d use class Foo: instead of class Foo(): (#1797)
* use class Foo: instead of class Foo():

* add ruff linter, copy settings from .flake8 to ruff.toml
2023-09-06 12:20:25 -07:00
badcc
fd25792c8b Ensure freqs as type float32 in freqs_cis (#1798) 2023-09-06 10:24:15 -07:00
chenyu
35072877ef sym_infer is noop for int input (#1795) 2023-09-06 09:17:20 -07:00
George Hotz
f67638b27a delete broken DDPG example 2023-09-06 08:01:12 -07:00
George Hotz
78a43ad2c7 add uop fixup (#1793) 2023-09-06 07:55:22 -07:00
geohotstan
1bbf26d7fd fix try except not catching fxn() in benchmark (#1783)
* have function raise NotImplementedError

* more lines

* revert back to 2 lines :D

* aahhhhhhhh shoooot im stupid

* keep it minimal?
2023-09-06 07:36:43 -07:00
chenyu
09e78a9d07 Node does not need to subclass ABC (#1792)
* Node does not need to subclass ABC

* class Node:
2023-09-06 07:35:45 -07:00
badcc
ee9ac20752 Use correct dtype in Tensor when data is an ndarray (#1785)
* use correct dtype in Tensor when data is an ndarray

* attempt 2

* add assert to be consistent

* Add test case for ndarray

* Add test case for list

* remove whitespace
2023-09-06 07:35:32 -07:00
nimlgen
130cd55942 fix gpu compilation of const GEP (#1788) 2023-09-06 07:34:46 -07:00
George Hotz
e10a9692ec Revert "fix attn_mask None issue" (#1787)
* Revert "fix attn_mask None issue (#1786)"

This reverts commit bd06d88c73.

* Update tensor.py
2023-09-05 21:18:55 -07:00
David Hou
343b256deb PoC fast winograd compile (#1771)
* proof of concept for variable replace global load

* small hacks to make faster

* clean up a little?

* linter

* allow substituting with an expression

* clean up a little

* fix everything

* try to fix bug?

* type annotation

* typing

* typing
2023-09-05 21:14:40 -07:00
Pavol Rusnak
a50a7ef6f2 revert typo in external_multi_gpu.py (#1777)
introduced by fb1cc6bf4b
2023-09-05 20:46:28 -07:00
George Hotz
bd06d88c73 fix attn_mask None issue (#1786) 2023-09-05 20:45:54 -07:00
Francis Lam
0379b64ac4 add seed option to stable_diffusion (#1784)
useful for testing correctness of model runs
2023-09-05 19:45:15 -07:00
George Hotz
6100d7425f add 2 to locals, uops debug 5 (#1782) 2023-09-05 19:44:43 -07:00
Roelof van Dijk
2a11669e1d perf: faster and more readable merge_dicts (#1775)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-05 14:42:19 -07:00
George Hotz
89a8a02697 disable openpilot model in model benchmark 2023-09-05 13:32:30 -07:00
geohotstan
9af5645ba3 onnx full passing (#1076)
* 1

* 83 failed

* learning how git works

* lol idk

* zero shape aaaa

* space lol

* aaa

* test check

* haha

* fixed gather

* 73 failing

* 71 failing

* 68 failing

* added some debug

* fking resize

* lol

* 62 failing

* 58 failing, fucking did nearest resize hell yeah

* clean up

* 56 failing

* janitor duty

* lol

* 53 failing

* hi mom

* 50 failing

* added linear interp, but coord_trans is wrong

* did lin interpolation woohoo

* 43 failing

* 40 failing

* temporary Gather fix

* 39 failing

* fixed slice onnxver<10

* 37 failing

* 35 failing

* excluded tests that use float64

* 32 failing with hacks

* added _batchnorm() for 3D/5D batchnorm, 29 failing

* changed ALLOWED_KERNEL_COUNT from 199 to 207

* added improved Gather op, reverted ALLOWED_KERNEL_COUNT commit

* support Round op

* added storage_order/indices maxpool, 27 failing

* support maxunpool, 25 failures

* support Gradient, 23 failures

* merged new where

* added Adam

* cleanups

* added Momentum and Nesterov Momentum

* added Adagrad

* support sequence_type, 20 failing

* ugh git

* I give up on cubic interp :D, 9 failing

* sexy 1 liner gather, much improved, wow

* polished gather to make it shine bright like a diamond

* clean 1 liner for gather

* improved readability of gather

* uhh

* clean up

* more clean up

* WHITEspace

* implemented SoftmaxCrossEntropyLoss op

* added comments and cleaned up if statements

* update

* thank based wozeparrot for pow and new GatherElements

* CPU and TORCH all pass | cast float64 -> float32 for all fromCPU()

* _nearest_gather() failing on yolo

* reverted ops_cpu change and added assert in Resize

* added comments for resize for multiple channels

* oops

* merge

* test

* switched np.pad to Tensor.pad for constant padding

* gah

* gah2

* sexy reflect pad with movementops -> add

* delete commented out lines

* edge mode pad sexy as well

* trying out model_benchmark

* revert gitignore change lol

* init

* Revert "init"

This reverts commit 682bf2073a.

* wrote cast workaround for CPU, CPU and TORCH all pass

* wrote cast workaround for CPU, CPU and TORCH all pass

* skipped tests w/ 0 shape for METAL and GPU

* excluded tests for CLANG, CPU, TORCH, CLANG pass

* fixed hacky ConvTranspose

* gotta figure out autopad

* UOps.STORE support cast bool -> float

* small fix for fast gather

* reverted 0 shape skipped tests

* oops missed a file

* added comment

* fixed slice op hack

* First commit to pr

* More trig ops

* More trig ops

* format

* isinf support

* More ops

* changed onnx_ops to use our new gather :D

* Det op bug fix

* rebase

* fixed some tests

* det broken and slow

* fixed compress to use new gather

* implemented argmax argmin

* support variable types in type_proto

* support Upsample and Identity sequence

* we support float64 now and tinygrad supports automatic broadcasting

* added EyeLike op

* resize does support multiple channels now actually

* yolov8 onnx runs successfully

* added batch size 1

* oops

* finally fixed type_proto I think

* fixed some llvm bugs

* del whitespaces

* added ZenginU Format PR

* test

* oops

* added float64 exclude tests back

* more skipped tests

* try

* ok openpilot pass

* flake8 pass

* woooooohooo

* revert external_model_benchmark changes

* perf tested gather

* removed promote types from ops_cpu

* numerical errors from 1681 are fixed

---------

Co-authored-by: ZenginU <umutzengin00@gmail.com>
2023-09-05 13:23:32 -07:00
George Hotz
fb1cc6bf4b llama jit is default, print tok/sec (#1774)
* llama jit is default, print tok/sec

* jit not default in CI
2023-09-05 10:12:16 -07:00
Roelof van Dijk
f6e6a1a4d7 perf: avoid cast, restore isinstance (#1772)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-05 09:07:04 -04:00
geohotstan
671101e6b8 Metal stuff pip install by default when on Darwin (#1770)
* added to setup

* split lines for Darwin stuff
2023-09-04 21:59:54 -07:00