Max Hahn
f9cb31fdc2
added visitor pattern (#1669)
* added visitor pattern
* pylint bug workaround
* added tests, made abstract OpNode inherit from ABC
* fixed assert
* fix check of abstract classes in negative test
* remove assert False
2023-08-30 09:03:44 -07:00
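The visitor-pattern commit can be illustrated with a minimal, generic sketch. Only the class name `OpNode` comes from the commit message; `ConstNode`, `AddNode`, `Visitor`, `Evaluator`, and `Printer` are hypothetical and do not reflect the actual tinygrad code. The sketch also shows why making `OpNode` inherit from `ABC` matters: the abstract base cannot be instantiated, which is the kind of property a negative test can check.

```python
from abc import ABC, abstractmethod

class Visitor(ABC):
    # Hypothetical visitor interface: one method per concrete node type.
    @abstractmethod
    def visit_const(self, node: "ConstNode"): ...
    @abstractmethod
    def visit_add(self, node: "AddNode"): ...

class OpNode(ABC):
    # Abstract base: subclasses must implement accept(), so OpNode() raises TypeError.
    @abstractmethod
    def accept(self, visitor: Visitor): ...

class ConstNode(OpNode):
    def __init__(self, value: int):
        self.value = value
    def accept(self, visitor: Visitor):
        return visitor.visit_const(self)

class AddNode(OpNode):
    def __init__(self, a: OpNode, b: OpNode):
        self.a, self.b = a, b
    def accept(self, visitor: Visitor):
        return visitor.visit_add(self)

class Evaluator(Visitor):
    # Computes the numeric value of the tree.
    def visit_const(self, node): return node.value
    def visit_add(self, node): return node.a.accept(self) + node.b.accept(self)

class Printer(Visitor):
    # Renders the tree as a parenthesized expression string.
    def visit_const(self, node): return str(node.value)
    def visit_add(self, node): return f"({node.a.accept(self)} + {node.b.accept(self)})"

tree = AddNode(ConstNode(2), AddNode(ConstNode(3), ConstNode(4)))
print(tree.accept(Evaluator()))  # 9
print(tree.accept(Printer()))    # (2 + (3 + 4))
```

New operations over the tree become new `Visitor` subclasses, so the node classes do not change when a pass is added.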
George Hotz
fdd7f282cb
Reenable tensor cores for self-hosted Mac CI (#1717)
* debug 5 matmul
* allow tensor cores in CI
* tensor cores on arm64
* put debug back
2023-08-30 07:53:04 -07:00
chenyu
ac183568be
llama JIT python runtime speedup (#1633)
* no JIT call in TransformerBlock
* idea
* move 2 reshapes to jitted function
shrink inside jitted too, 6.3ms
remove back reshapes, 5.5ms
isinstance -> __class__ 4.99ms
* think
revert ops_gpu.py
revert symbolic.py too
PYOPENCL_COMPILER_OUTPUT=1
* cleanup
* fix cache shape for conversational model
only reshape if start_pos > 0
* small cleanup
* include var_vals.keys() to st.key
* add comments
* llama small update
* everything jitted again, similar structure to gpt2
* fix typing
* add TODO for in place update cache
2023-08-30 07:51:05 -07:00
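The llama commit notes a speedup from replacing `isinstance` with a `__class__` check. The actual call sites are not shown in the log, so the sketch below uses hypothetical classes `Base` and `Derived` only to show the semantic difference: `x.__class__ is T` matches the exact type, while `isinstance` also accepts subclasses. The swap is therefore only safe where subclasses cannot occur.

```python
import timeit

class Base: pass
class Derived(Base): pass

def check_isinstance(x) -> bool:
    # Accepts Base and any subclass of Base.
    return isinstance(x, Base)

def check_exact_class(x) -> bool:
    # Accepts only objects whose class is exactly Base; subclasses are rejected.
    return x.__class__ is Base

b, d = Base(), Derived()
print(check_isinstance(b), check_exact_class(b))  # True True
print(check_isinstance(d), check_exact_class(d))  # True False

# Relative cost depends on Python version and machine; measure rather than assume.
print("isinstance:", timeit.timeit(lambda: check_isinstance(b), number=100_000))
print("__class__ :", timeit.timeit(lambda: check_exact_class(b), number=100_000))
```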
Umut Zengin
1682e9a38a
Fix: Stable Diffusion index (#1713)
2023-08-30 00:21:10 -04:00
wozeparrot
2f768e386d
stable diffusion benchmark artifact (#1714)
2023-08-29 21:08:40 -04:00
George Hotz
0ea22bf249
remove DEBUG=1 from stable diffusion AMD since jit cache is fixed
2023-08-29 12:46:12 -07:00
George Hotz
ab9b9ff3e2
pipefail benchmark (#1709) (#1710)
* feat: specify shell
* feat: specify shell for mac
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2023-08-29 08:15:02 -07:00
George Hotz
aa7c98722b
sd timing (#1706)
2023-08-28 20:22:57 -07:00
nimlgen
8844a0a822
llvm jitted (#1652)
2023-08-28 20:22:44 -07:00
nimlgen
1c0449e190
add cache collector (#1595)
* init cache collector
* add test_cache_collector.py
* switch GlobalCounters.cache to CacheCollector
* init jit models test
* jitted SD
* add debug msg to print loaded bufs count
* moved cache collector to jit
* clearer SD
* no double device import
2023-08-28 19:59:55 -07:00
George Hotz
f5f8b09c13
allow manual release (#1704)
2023-08-28 17:54:25 -07:00
George Hotz
715047a1e4
fix release publish (#1703)
2023-08-28 17:48:00 -07:00
Olivier Chafik
ee6d8de2dc
Llama: load models in HuggingFace format (incl. indexed, safetensors) (#1583)
2023-08-28 15:11:40 -04:00
qazal
3515ba4f23
add dtypes test (#1682)
2023-08-28 08:12:15 -07:00
Roelof van Dijk
50f669e43b
[ready] perf: simpler Tensor init (#1679)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 22:18:03 -04:00
Roelof van Dijk
b66f54e379
perf: avoid reshaping if not necessary (#1683)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 14:17:04 -04:00
Roelof van Dijk
328cf2e86a
perf: remove cast and revert back to isinstance (#1694)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 14:15:52 -04:00
wozeparrot
8b354b3f73
feat: version bump! (#1687)
v0.7.0
2023-08-27 12:38:58 -04:00
Roelof van Dijk
abaa605f71
[ready] perf: start enumerate at 1 instead of checking all i (#1691)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 12:00:32 -04:00
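One plausible reading of "start enumerate at 1 instead of checking all i" is sketched below; the log does not show the changed code, so both helper functions are hypothetical. Instead of testing `i == 0` on every iteration to skip the first element, the loop skips it up front and passes `start=1` to `enumerate` so the indices still line up with the original sequence.

```python
from itertools import islice

def pairs_with_check(xs):
    # Tests i on every iteration just to skip the first element.
    out = []
    for i, x in enumerate(xs):
        if i == 0:
            continue
        out.append((i, x))
    return out

def pairs_from_one(xs):
    # Skips the first element once; start=1 keeps indices aligned with xs.
    return [(i, x) for i, x in enumerate(islice(xs, 1, None), start=1)]

data = ["a", "b", "c", "d"]
print(pairs_from_one(data))  # [(1, 'b'), (2, 'c'), (3, 'd')]
```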
Roelof van Dijk
2730ed657f
perf: faster lazyop eq (#1693)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 11:17:02 -04:00
Roelof van Dijk
6ca509a485
perf: constant in while in for in busy func (#1688)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 11:13:16 -04:00
Roelof van Dijk
b89d81330f
fix: restore old behaviour (#1689)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 10:45:53 -04:00
chenyu
66fbf4800b
fix symbolic_ops tests with Tensor.training=True (#1686)
2023-08-26 23:19:56 -04:00
Roelof van Dijk
6c5dc9c153
[ready] perf: faster lazyop init (#1673)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-26 22:59:10 -04:00
wozeparrot
f61d0657d1
document new envvars (#1676)
* feat: document some new envvars
* feat: actually put values
* feat: no more cifar torch
* feat: no fakedata
2023-08-26 20:17:02 -04:00
Yixiang Gao
9d93a82354
remove FAKEDATA (#1685)
2023-08-26 20:15:54 -04:00
chenyu
b5d700adae
update openpilot supercombo.onnx to 0.9.4 (#1681)
* update openpilot supercombo.onnx to 0.9.4
* update tests for the new model
* comment out comma models from external_model_benchmark
2023-08-26 19:16:08 -04:00
Roelof van Dijk
89b529c07f
[ready] ci: add py38 to linters (#1674)
* ci: add py38 to linters
* fix: run linters only on py38
---------
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-26 09:34:15 -04:00
Jordan Wright
25be7f745d
Tensor.uniform with dtype=int bug fix (#1593)
2023-08-26 01:59:53 -04:00
Roelof van Dijk
f702a8f497
[ready] avoid in-function graph imports in lazy.py (#1666)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-25 13:56:28 -04:00
Roelof van Dijk
02e64da678
refactor: tuples can be concatenated with + (#1671)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-25 12:37:13 -04:00
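The tuple refactor relies on `+` being defined directly on tuples. The code it replaced is not in the log; the list round-trip below is a hypothetical "before" shown only for contrast, and the shape values are illustrative.

```python
def concat_via_list(a: tuple, b: tuple) -> tuple:
    # Builds intermediate lists before producing the final tuple.
    return tuple(list(a) + list(b))

def concat_direct(a: tuple, b: tuple) -> tuple:
    # Tuples support + directly and return a new tuple.
    return a + b

shape, extra = (2, 3), (4,)
print(concat_direct(shape, extra))  # (2, 3, 4)
```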
Yixiang Gao
173850f599
fix CIFAR jit (#1657)
* update mask function
* kept 94 with the new fetcher
clean up batch fetcher
* 94.04% without cutmix
* 94.04% with cutmix
* move batch fetcher to avoid fetching additional batch last STEP
2023-08-24 16:14:40 -07:00
chenyu
f00325e77d
ops_metal newCommandQueueWithMaxCommandBufferCount_(1024) (#1664)
2023-08-24 15:42:00 -07:00
DavidFarago
1ba8f0dca3
Quickstart: Upgrade section "Training" to new code (#1663)
Co-authored-by: Dave Farago <dfarago@innoopract.com>
2023-08-24 17:12:16 -04:00
DavidFarago
29adae84eb
Quickstart: Use tensors to compute train accuracy (#1662)
Co-authored-by: Dave Farago <dfarago@innoopract.com>
2023-08-24 17:09:12 -04:00
George Hotz
d37d092c14
split linearizer into 3 files (#1654)
2023-08-23 14:58:47 -07:00
George Hotz
1b8c40234f
Uast start (#1650)
* work
* more tests
* more tests 2
* don't break it
2023-08-23 12:00:06 -07:00
geohotstan
484708da87
#1615 fix (#1616)
2023-08-23 14:51:05 -04:00
Pavol Rusnak
b57c374164
add accelerator links to readme (#1649)
2023-08-23 14:47:55 -04:00
George Hotz
82623697a8
Move asm renderer (#1648)
* teeny changes
* teeny updates
* move to renderer
2023-08-23 10:06:43 -07:00
George Hotz
a89363574d
teeny changes (#1647)
* teeny changes
* teeny updates
2023-08-23 09:53:39 -07:00
George Hotz
a6d842af7a
move device to ops (#1646)
* move device to ops
* mlops types
* 2 lines
2023-08-23 08:30:17 -07:00
nimlgen
a65ae1198b
do replace div->mul for non-floats (#1644)
2023-08-23 07:34:31 -07:00
George Hotz
da694d4241
move that image import
2023-08-22 21:30:55 -07:00
George Hotz
41e83be3dd
simple where broadcast (#1643)
2023-08-22 21:24:49 -07:00
George Hotz
c831218139
Optional: Reduce line count and simplify the LazyBuffer interface (#1642)
* less lines in lazybuffer, def e
* custom function
* cast
* reorder functions
* lb type
2023-08-22 21:01:10 -07:00
George Hotz
d25046e66a
matvec tests (#1634)
* matvec tests
* f16
* f16 is broken
2023-08-22 17:33:58 -07:00
George Hotz
643cbdfd50
make embedding and GPT-2 fast (#1631)
* make embedding fast
* jit more, variable shape support
* print mem bw
2023-08-22 15:14:38 -07:00
Niklas D
a7752ad65d
Fix link to state.py in quickstart (#1632)
2023-08-22 17:39:30 -04:00
c143
c9c40bb16f
Import whole math module in tensor.py (#1628)
2023-08-22 17:07:46 -04:00