Commit Graph

10417 Commits

Author SHA1 Message Date
David Hou
e74a6ca7e4 expand in terms of substitute (#1827) 2023-09-09 14:43:00 -07:00
George Hotz
0e3e2bac13 amd wino: upload results 2023-09-09 13:57:14 -07:00
George Hotz
6f95c5f284 winograd speed test for AMD (#1826) 2023-09-09 13:56:33 -07:00
George Hotz
0f2bd10d00 add winograd CIFAR to mac tests (#1825)
* add winograd CIFAR to mac tests

* symlink already done
2023-09-09 13:45:24 -07:00
nimlgen
31fca43706 kopt works with local+grouped reduce and tests (#1824) 2023-09-09 13:22:09 -07:00
chenyu
9da40c8448 move Node.__lt__ SumNode special case to SumNode (#1823) 2023-09-09 13:20:38 -07:00
Francis Lam
651205fa5c linearizer: support local and group_for_reduce dimensions together (#1821)
also minor changes to test_speed_v_torch.py and size of UOps.SPECIAL
2023-09-08 12:39:27 -07:00
segf00lt
9e8c1dbf34 patch to remove hack from stable_diffusion.py (#1814)
* patch to remove hack from stable_diffusion.py

* sorry linter

* realize after assign?

* float16 broken in llvmlite use float64 for now

* int32

* idiot forgot to change test array dtype
2023-09-08 09:26:50 -07:00
chenyu
ebcda8a714 Move var_vals from ShapeTracker to LazyBuffer (#1819) 2023-09-08 09:25:10 -07:00
kormann
7ac65a93b4 utils.printtree (#1816)
* utils.printtree

* linter compliance

* rename to print_tree
2023-09-07 23:08:57 -07:00
George Hotz
4613c9e77c add tvm example, formatting (#1813)
* add tvm example

* no realize
2023-09-07 11:50:41 -07:00
nimlgen
5b15a972b5 no functions with same names in test/ (#1811) 2023-09-07 11:27:31 -07:00
George Hotz
722823dee1 stable diffusion: force fp16 free 2023-09-06 15:11:05 -07:00
chenyu
928cb1a64a AndNode.substitute short circuit (#1800)
* AndNode substitute short circuit

* Node.__bool__ is faster than Node.__eq__
2023-09-06 14:58:49 -07:00
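The short-circuit this commit describes can be illustrated with a minimal sketch (the `Num`, `Var`, and `And` classes below are stand-ins for illustration, not tinygrad's actual `Node` hierarchy): once any conjunct substitutes to a falsy node, the whole conjunction is known to be false and the remaining substitutions can be skipped. The second bullet's point shows up in the `if not sub` check, which uses `__bool__` rather than an equality comparison.

```python
# Minimal sketch of an AndNode-style substitute() with short-circuiting.
# These classes are illustrative stand-ins, not tinygrad's real symbolic nodes.

class Num:
    """A constant node; falsy when its value is 0."""
    def __init__(self, b): self.b = b
    def substitute(self, var_vals): return self
    def __bool__(self): return self.b != 0   # cheaper than building an __eq__ comparison

class Var:
    """A variable node; substitution replaces it with a constant."""
    def __init__(self, name): self.name = name
    def substitute(self, var_vals): return Num(var_vals[self.name])

class And:
    """A conjunction of nodes."""
    def __init__(self, nodes): self.nodes = nodes
    def substitute(self, var_vals):
        subed = []
        for node in self.nodes:
            sub = node.substitute(var_vals)
            if not sub: return Num(0)   # short circuit: one false conjunct decides the result
            subed.append(sub)
        return And(subed)
```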
nimlgen
a78a1fa499 fix jit buffer reuse when freed (#1802)
* fix jit buffer reuse when freed

* Forbid output_buffer reuse
2023-09-06 14:41:57 -07:00
Yixiang Gao
22cf15e9d0 convert function into tinygrad (#1803) 2023-09-06 14:41:26 -07:00
Pavol Rusnak
52a92bf95d use class Foo: instead of class Foo(): (#1797)
* use class Foo: instead of class Foo():

* add ruff linter, copy settings from .flake8 to ruff.toml
2023-09-06 12:20:25 -07:00
badcc
fd25792c8b Ensure freqs as type float32 in freqs_cis (#1798) 2023-09-06 10:24:15 -07:00
chenyu
35072877ef sym_infer is noop for int input (#1795) 2023-09-06 09:17:20 -07:00
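The behavior in this commit title can be sketched as follows (`Variable` here is a hypothetical stand-in for tinygrad's symbolic node, used only to make the sketch runnable): when the input is already a plain `int`, there is nothing to infer and the value passes through untouched.

```python
class Variable:
    """Hypothetical stand-in for a symbolic node, for illustration only."""
    def __init__(self, name): self.name = name
    def substitute(self, var_vals): return var_vals[self.name]

def sym_infer(n, var_vals):
    # the commit's point: an int input is a no-op fast path, no substitution needed
    if isinstance(n, int): return n
    return n.substitute(var_vals)
```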
George Hotz
f67638b27a delete broken DDPG example 2023-09-06 08:01:12 -07:00
George Hotz
78a43ad2c7 add uop fixup (#1793) 2023-09-06 07:55:22 -07:00
geohotstan
1bbf26d7fd fix try except not catching fxn() in benchmark (#1783)
* have function raise notimplementederror

* more lines

* revert back to 2 lines :D

* aahhhhhhhh shoooot im stupid

* keep it minimal?
2023-09-06 07:36:43 -07:00
chenyu
09e78a9d07 Node does not need to subclass ABC (#1792)
* Node does not need to subclass ABC

* class Node:
2023-09-06 07:35:45 -07:00
badcc
ee9ac20752 Use correct dtype in Tensor when data is an ndarray (#1785)
* use correct dtype in Tensor when data is an ndarray

* attempt 2

* add assert to be consistent

* Add test case for ndarray

* Add test case for list

* remove whitespace
2023-09-06 07:35:32 -07:00
nimlgen
130cd55942 fix gpu compilation of const GEP (#1788) 2023-09-06 07:34:46 -07:00
George Hotz
e10a9692ec Revert "fix attn_mask None issue" (#1787)
* Revert "fix attn_mask None issue (#1786)"

This reverts commit bd06d88c73.

* Update tensor.py
2023-09-05 21:18:55 -07:00
David Hou
343b256deb PoC fast winograd compile (#1771)
* proof of concept for variable replace global load

* small hacks to make faster

* clean up a little?

* linter

* allow substituting with an expression

* clean up a little

* fix everything

* try to fix bug?

* type annotation

* typing

* typing
2023-09-05 21:14:40 -07:00
Pavol Rusnak
a50a7ef6f2 revert typo in external_multi_gpu.py (#1777)
introduced by fb1cc6bf4b
2023-09-05 20:46:28 -07:00
George Hotz
bd06d88c73 fix attn_mask None issue (#1786) 2023-09-05 20:45:54 -07:00
Francis Lam
0379b64ac4 add seed option to stable_diffusion (#1784)
useful for testing correctness of model runs
2023-09-05 19:45:15 -07:00
George Hotz
6100d7425f add 2 to locals, uops debug 5 (#1782) 2023-09-05 19:44:43 -07:00
Roelof van Dijk
2a11669e1d perf: faster and more readable merge_dicts (#1775)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-05 14:42:19 -07:00
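A `merge_dicts` along the lines this commit title suggests might look like the sketch below (an assumption inferred from the title, not the verbatim implementation): merge several dicts in one pass while asserting that no key maps to two different values.

```python
def merge_dicts(ds):
    """Merge an iterable of dicts, asserting no key has conflicting values."""
    kvs = set((k, v) for d in ds for k, v in d.items())
    # if two dicts disagree on a key, the key appears twice in kvs
    assert len(kvs) == len(set(k for k, _ in kvs)), "cannot merge, conflicting values"
    return dict(kvs)
```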
George Hotz
89a8a02697 disable openpilot model in model benchmark 2023-09-05 13:32:30 -07:00
geohotstan
9af5645ba3 onnx full passing (#1076)
* 1

* 83 failed

* learning how git works

* lol idk

* zero shape aaaa

* space lol

* aaa

* test check

* haha

* fixed gather

* 73 failing

* 71 failing

* 68 failing

* added some debug

* fking resize

* lol

* 62 failing

* 58 failing fucking did nearest resize hell yeah

* clean up

* 56 failing

* janitor duty

* lol

* 53 failing

* hi mom

* 50 failing

* added linear interp, but coord_trans is wrong

* did lin interpolation woohoo

* 43 failing

* 40 failing

* temporary Gather fix

* 39 failing

* fixed slice onnxver<10

* 37 failing

* 35 failing

* excluded tests that use float64

* 32 failing with hacks

* added _batchnorm() for 3D 5D batchnorm, 29 failing

* changed ALLOWED_KERNEL_COUNT from 199 to 207

* added improved Gather op, reverted ALLOWED_KERNEL_COUNT commit

* support Round op

* added storage_order/indices maxpool, 27 failing

* support maxunpool, 25 failures

* support Gradient, 23 failures

* merged new where

* added Adam

* cleanups

* added Momentum and Nesterov Momentum

* added Adagrad

* support sequence_type, 20 failing

* ugh git

* I give up on cubic interp :D, 9 failing

* sexy 1 liner gather, much improved, wow

* polished gather to make it shine bright like a diamond

* clean 1 liner for gather

* improved readability of gather

* uhh

* clean up

* more clean up

* WHITEspace

* implemented SoftmaxCrossEntropyLoss op

* added comments and cleaned up if statements

* update

* thank based wozeparrot for pow and new GatherElements

* CPU and TORCH all pass | cast float64 -> float32 for all fromCPU()

* _nearest_gather() failing on yolo

* reverted ops_cpu change and added assert in Resize

* added comments for resize for multiple channels

* oops

* merge

* test

* switched np.pad to Tensor.pad for constant padding

* gah

* gah2

* sexy reflect pad with movementops -> add

* delete commented out lines

* edge mode pad sexy as well

* trying out model_benchmark

* revert gitignore change lol

* init

* Revert "init"

This reverts commit 682bf2073a.

* wrote cast workaround for CPU, CPU and TORCH all pass

* wrote cast workaround for CPU, CPU and TORCH all pass

* skipped tests w/ 0 shape for METAL and GPU

* excluded tests for CLANG, CPU, TORCH, CLANG pass

* fixed hacky ConvTranspose

* gotta figure out autopad

* UOps.STORE support cast bool -> float

* small fix for fast gather

* reverted 0 shape skipped tests

* oops missed a file

* added comment

* fixed slice op hack

* First commit to pr

* More trig ops

* More trig ops

* format

* isinf support

* More ops

* changed onnx_ops to use our new gather :D

* Det op bug fix

* rebase

* fixed some tests

* det broken and slow

* fixed compress to use new gather

* implemented argmax argmin

* support variable types in type_proto

* support Upsample and Identity sequence

* we support float64 now and tinygrad support automatic broadcasting

* added EyeLike op

* resize does support multiple channels now actually

* yolov8 onnx runs successfully

* added batch size 1

* oops

* finally fixed type_proto I think

* fixed some llvm bugs

* del whitespaces

* added ZenginU Format PR

* test

* oops

* added float64 exclude tests back

* more skipped tests

* try

* ok openpilot pass

* flake8 pass

* woooooohooo

* revert external_model_benchmark changes

* perf tested gather

* removed promote types from ops_cpu

* numerical errors from 1681 is fixed

---------

Co-authored-by: ZenginU <umutzengin00@gmail.com>
2023-09-05 13:23:32 -07:00
George Hotz
fb1cc6bf4b llama jit is default, print tok/sec (#1774)
* llama jit is default, print tok/sec

* jit not default in CI
2023-09-05 10:12:16 -07:00
Roelof van Dijk
f6e6a1a4d7 perf: avoid cast, restore isinstance (#1772)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-05 09:07:04 -04:00
geohotstan
671101e6b8 Metal stuff pip install on default when on Darwin (#1770)
* added to setup

* split lines for Darwin stuff
2023-09-04 21:59:54 -07:00
George Hotz
10305bfc0a tuples only (#1769) 2023-09-04 16:35:11 -07:00
George Hotz
63c46e0287 Parens and gls (#1768)
* more paren stripping

* remove global and local size from renderers

* complex strip parens

* extra helpers + minor webgpu fix

* fix test uops

* one more parens test
2023-09-04 16:09:01 -07:00
Adrian Kretz
3473c9e88d Metal conv tensor cores (#1696)
* Benchmark 5x5 conv kernel which is optimized

* Use Metal tensor cores in 2d convs
2023-09-04 15:14:46 -07:00
George Hotz
b32ed8e6e9 removing loop (#1764)
* removing loop

* fix llvm

* remove unused

* strip parens

* with side effects

* define global has side effects
2023-09-04 14:47:46 -07:00
tomtom-95
7344f7c2d1 KeyError fixed. (#1763) 2023-09-04 15:36:16 -04:00
Roelof van Dijk
fd8e14c07a fix: unused function (#1759)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-04 11:39:50 -07:00
Roelof van Dijk
c826854e48 fix: remove unused function (#1760)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-04 11:39:34 -07:00
Roelof van Dijk
2aaecc1ce4 fix: remove unused function (#1761)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-04 11:39:27 -07:00
nimlgen
f863c12610 test kopt correctness (#1756)
* test kopt correctness

* bump BUDGET to 20

* kopt hooks as setUp/tearDown
2023-09-04 10:55:00 -07:00
George Hotz
c6d5d45a2b Remove MemOp (#1750)
* start removing memop

* locals

* support both stores

* might be correct

* remove parens on shape ish

* fix metal ops

* render load and render store

* fix image

* maybe fix asm

* fix test uops

* revert asm

* remove memop itself
2023-09-04 09:58:33 -07:00
George Hotz
56abe04e4b disable assembly (#1755) 2023-09-04 09:41:20 -07:00
chenyu
b8fde6bb0f Test KOPT in CI (#1744)
* test kopt in ci

* getenv takes dtype from default
2023-09-03 14:37:20 -07:00
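The second bullet, `getenv takes dtype from default`, suggests an environment-variable helper that casts the raw string to the type of its default argument; a minimal sketch (hypothetical, inferred from the commit message rather than copied from the repo):

```python
import functools
import os

@functools.lru_cache(maxsize=None)   # cached: the same flag is often read repeatedly
def getenv(key, default=0):
    # the raw environment string is cast to the type of the default,
    # so an int default yields an int and a str default yields a str
    return type(default)(os.environ.get(key, default))
```

With this shape, `getenv("KOPT", 0)` returns an `int` while `getenv("LOG", "")` returns a `str`, with no explicit dtype argument needed.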
George Hotz
ed194a1d3b zero fold (#1748)
* add constant fold

* err, it's just zero folding

* self store fold + caching

* prints and more folds

* simpler winograd kernels

* remove childless uops
2023-09-03 13:48:11 -07:00