2388 Commits

Author SHA1 Message Date
Max Hahn
f9cb31fdc2 added visitor pattern (#1669)
* added visitor pattern

* pylint bug workaround

* added tests, made abstract OpNode inherit from ABC

* fixed assert

* fix check of abstract classes in negative test

* remove assert False
2023-08-30 09:03:44 -07:00
George Hotz
fdd7f282cb Reenable tensor cores for self-hosted Mac CI (#1717)
* debug 5 matmul

* allow tensor cores in CI

* tensor cores on arm64

* put debug back
2023-08-30 07:53:04 -07:00
chenyu
ac183568be llama JIT python runtime speedup (#1633)
* no JIT call in TransformerBlock

* idea

* move 2 reshapes to jitted function

shrink inside jitted too, 6.3ms

remove back reshapes, 5.5ms

isinstance -> __class__ 4.99ms

* think

revert ops_gpu.py

revert symbolic.py too

PYOPENCL_COMPILER_OUTPUT=1

* cleanup

* fix cache shape for conversational model

only reshape if start_pos > 0

* small cleanup

* include var_vals.keys() to st.key

* add comments

* llama small update

* everything jitted again, similar structure to gpt2

* fix typing

* add TODO for in place update cache
2023-08-30 07:51:05 -07:00
Umut Zengin
1682e9a38a Fix: Stable Diffusion index (#1713) 2023-08-30 00:21:10 -04:00
wozeparrot
2f768e386d stable diffusion benchmark artifact (#1714) 2023-08-29 21:08:40 -04:00
George Hotz
0ea22bf249 remove DEBUG=1 from stable diffusion AMD since jit cache is fixed 2023-08-29 12:46:12 -07:00
George Hotz
ab9b9ff3e2 pipefail benchmark (#1709) (#1710)
* feat: specify shell

* feat: specify shell for mac

Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2023-08-29 08:15:02 -07:00
George Hotz
aa7c98722b sd timing (#1706) 2023-08-28 20:22:57 -07:00
nimlgen
8844a0a822 llvm jitted (#1652) 2023-08-28 20:22:44 -07:00
nimlgen
1c0449e190 add cache collector (#1595)
* init cache collector

* add test_cache_collector.py

* switch GlobalCounters.cache to CacheCollector

* init jit models test

* jitted SD

* add debug msg to print loaded bufs count

* moved cache collector to jit

* clearer SD

* no double device import
2023-08-28 19:59:55 -07:00
George Hotz
f5f8b09c13 allow manual release (#1704) 2023-08-28 17:54:25 -07:00
George Hotz
715047a1e4 fix release publish (#1703) 2023-08-28 17:48:00 -07:00
Olivier Chafik
ee6d8de2dc Llama: load models in HuggingFace format (incl. indexed, safetensors) (#1583) 2023-08-28 15:11:40 -04:00
qazal
3515ba4f23 add dtypes test (#1682) 2023-08-28 08:12:15 -07:00
Roelof van Dijk
50f669e43b [ready] perf: simpler Tensor init (#1679)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 22:18:03 -04:00
Roelof van Dijk
b66f54e379 perf: avoid reshaping if not necessary (#1683)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 14:17:04 -04:00
Roelof van Dijk
328cf2e86a perf: remove cast and revert back to isinstance (#1694)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 14:15:52 -04:00
wozeparrot
8b354b3f73 feat: version bump! (#1687) v0.7.0 2023-08-27 12:38:58 -04:00
Roelof van Dijk
abaa605f71 [ready] perf: start enumerate at 1 instead of checking all i (#1691)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 12:00:32 -04:00
Roelof van Dijk
2730ed657f perf: faster lazyop eq (#1693)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 11:17:02 -04:00
Roelof van Dijk
6ca509a485 perf: constant in while in for in busy func (#1688)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 11:13:16 -04:00
Roelof van Dijk
b89d81330f fix: restore old behaviour (#1689)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 10:45:53 -04:00
chenyu
66fbf4800b fix symbolic_ops tests with Tensor.training=True (#1686) 2023-08-26 23:19:56 -04:00
Roelof van Dijk
6c5dc9c153 [ready] perf: faster lazyop init (#1673)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-26 22:59:10 -04:00
wozeparrot
f61d0657d1 document new envvars (#1676)
* feat: document some new envvars

* feat: actually put values

* feat: no more cifar torch

* feat: no fakedata
2023-08-26 20:17:02 -04:00
Yixiang Gao
9d93a82354 remove FAKEDATA (#1685) 2023-08-26 20:15:54 -04:00
chenyu
b5d700adae update openpilot supercombo.onnx to 0.9.4 (#1681)
* update openpilot supercombo.onnx to 0.9.4

* update tests for the new model

* comment out comma models from external_model_benchmark
2023-08-26 19:16:08 -04:00
Roelof van Dijk
89b529c07f [ready] ci: add py38 to linters (#1674)
* ci: add py38 to linters

* fix: run linters only on py38

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-26 09:34:15 -04:00
Jordan Wright
25be7f745d Tensor.uniform with dtype=int bug fix (#1593) 2023-08-26 01:59:53 -04:00
Roelof van Dijk
f702a8f497 [ready] avoid in-function graph imports in lazy.py (#1666)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-25 13:56:28 -04:00
Roelof van Dijk
02e64da678 refactor: tuples can be concatenated with + (#1671)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-25 12:37:13 -04:00
Yixiang Gao
173850f599 fix CIFAR jit (#1657)
* update mask function

* kept 94 with the new fetcher

clean up batch fetcher

* 94.04% without cutmix

* 94.04% with cutmix

* move batch fetcher to avoid fetching additional batch last STEP
2023-08-24 16:14:40 -07:00
chenyu
f00325e77d ops_metal newCommandQueueWithMaxCommandBufferCount_(1024) (#1664) 2023-08-24 15:42:00 -07:00
DavidFarago
1ba8f0dca3 Quickstart: Upgrade section "Training" to new code (#1663)
Co-authored-by: Dave Farago <dfarago@innoopract.com>
2023-08-24 17:12:16 -04:00
DavidFarago
29adae84eb Quickstart: Use tensors to compute train accuracy (#1662)
Co-authored-by: Dave Farago <dfarago@innoopract.com>
2023-08-24 17:09:12 -04:00
George Hotz
d37d092c14 split linearizer into 3 files (#1654) 2023-08-23 14:58:47 -07:00
George Hotz
1b8c40234f Uast start (#1650)
* work

* more tests

* more tests 2

* don't break it
2023-08-23 12:00:06 -07:00
geohotstan
484708da87 #1615 fix (#1616) 2023-08-23 14:51:05 -04:00
Pavol Rusnak
b57c374164 add accelerator links to readme (#1649) 2023-08-23 14:47:55 -04:00
George Hotz
82623697a8 Move asm renderer (#1648)
* teeny changes

* teeny updates

* move to renderer
2023-08-23 10:06:43 -07:00
George Hotz
a89363574d teeny changes (#1647)
* teeny changes

* teeny updates
2023-08-23 09:53:39 -07:00
George Hotz
a6d842af7a move device to ops (#1646)
* move device to ops

* mlops types

* 2 lines
2023-08-23 08:30:17 -07:00
nimlgen
a65ae1198b do replace div->mul for non-floats (#1644) 2023-08-23 07:34:31 -07:00
George Hotz
da694d4241 move that image import 2023-08-22 21:30:55 -07:00
George Hotz
41e83be3dd simple where broadcast (#1643) 2023-08-22 21:24:49 -07:00
George Hotz
c831218139 Optional: Reduce line count and simplify the LazyBuffer interface (#1642)
* less lines in lazybuffer, def e

* custom function

* cast

* reorder functions

* lb type
2023-08-22 21:01:10 -07:00
George Hotz
d25046e66a matvec tests (#1634)
* matvec tests

* f16

* f16 is broken
2023-08-22 17:33:58 -07:00
George Hotz
643cbdfd50 make embedding and GPT-2 fast (#1631)
* make embedding fast

* jit more, variable shape support

* print mem bw
2023-08-22 15:14:38 -07:00
Niklas D
a7752ad65d Fix link to state.py in quickstart (#1632) 2023-08-22 17:39:30 -04:00
c143
c9c40bb16f Import whole math module in tensor.py (#1628) 2023-08-22 17:07:46 -04:00