10417 Commits

Author SHA1 Message Date
George Hotz
e17b1af160 UnaryOps.NEG (#1749) 2023-09-03 12:44:26 -07:00
George Hotz
9f1a54acee pretty kernel in cstyle (#1746)
* pretty kernel in cstyle

* fix mem estimate

* that made it slower

* Revert "that made it slower"

This reverts commit faa4cd0187.
2023-09-03 10:21:02 -07:00
George Hotz
e910e0e62c folding mul by 0 (#1743)
* why doesn't this work

* zero mlop

* explicit fold in winograd
2023-09-03 09:04:12 -07:00
David Hou
3151d91f6e 3x3 winograd convs (#1675)
* winograd

* simplify local groups code

* comment

* respects self.opts.has_local

* always simplify ones

* make mypy happy

* move reshape, WINO flag

* wino flag, simple forward backward test for wino

* extra wino test

* merge oops

* comments

* axis_needs_valid -> axis_is_masked

* don't delete needs_valid (it's unused though)

* make linter happy

* make linter happy

* smaller test

* change number

* make wino tests very small
2023-09-03 07:29:43 -07:00
crankygrumpster
c8025c319c Remove Token from abstractions.py (#1741)
* Remove Token from abstractions.py, update output string

* add dtype
2023-09-02 21:56:11 -07:00
geohotstan
e36148b1ce Make __getitem__ TINYer (#1661) 2023-09-02 23:01:01 -04:00
Roelof van Dijk
60590cf8b5 perf: create buffer only when needed (#1684)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-02 17:43:29 -07:00
Yixiang Gao
66a6bbd029 codellama (#1702)
* add codellama with pre-downloaded weights

* add rope_theta, fix param

* fix test

* add 7B-Python

* add 7B-Instruct

* replace single quotes with double

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-09-02 08:45:12 -07:00
chenyu
a2745819f6 faster gpt2 jit path and gpt2 in test_real_world (#1738) 2023-09-02 08:39:12 -07:00
George Hotz
89cd380bfc add nvidia CI (#1737)
* add nvidia

* speed(nvidia)
2023-09-01 22:02:30 -07:00
George Hotz
91258aa67f render const (#1736)
* render const

* remove constop

* fix llvm and webgpu

* disable consts in llvm again

* assembly special

* fix const rendering

* fix arm64

* imms are int

* fix ptx

* fix arm64
2023-09-01 19:01:43 -07:00
nimlgen
a96e54d8bb search for grouped reduces (#1732) 2023-09-01 14:21:10 -07:00
George Hotz
cd844ec4b2 remove Token class (#1723)
* no fusion

* no float4 grouping

* mulacc fusion is fine. remove uop_alu

* fully remove get_grouped_maybe_float4

* removed that test

* that's not float4 anymore

* disable failing arm64

* metal ops pass tokenless

* fix wmma

* update test_uops with new style

* fix gep

* fix float4 store

* fix float4 store more

* cuda tests pass

* disable broadcast pow

* fix ptx

* reenable arm64

* bring cse back

* don't cache the acc

* fix ptx bug
2023-09-01 12:53:07 -07:00
George Hotz
458eb89463 minor changes from prerender (#1734) 2023-09-01 10:04:47 -07:00
chenyu
f964b9e5ee visitor pattern for sym_infer and unit tests (#1733)
* visitor pattern for sym_infer and unit tests

* comments
2023-09-01 09:47:45 -07:00
wozeparrot
bf05534c6e hip multidevice (#1728)
* feat: hip multidevice support + p2p

* feat: default device
2023-09-01 06:46:13 -07:00
JaSpa99
024dd690fa Reactivate commavq/gpt2m benchmark (#1731)
* get commavq/gpt2m from huggingface

* increase tols
2023-09-01 06:45:08 -07:00
George Hotz
7780eb3c5a minor dimensions (#1730) 2023-09-01 06:42:00 -07:00
George Hotz
5c403d43b9 New >3 indexing (#1729)
* move reindexing into linearizer

* get_grouped_dims

* don't limit for clang
2023-08-31 21:24:15 -07:00
George Hotz
e3a062ad17 real matvec test 2023-08-31 17:27:25 -07:00
George Hotz
453e437598 move stuff in the linearizer (#1726)
* move stuff in linearizer

* move stuff in linearizer

* minor

* fix opts import
2023-08-31 14:42:09 -07:00
George Hotz
c18a497dde minor global dim cleanup (#1724) 2023-08-31 12:23:39 -07:00
geohotstan
94b1257f5e Changed DEVICE to Device.DEFAULT in deep_determinist_policy_gradient (#1715)
* added device in optim and deep

* oops forgot to del print code

* use Device.DEFAULT instead

* removed device
2023-08-31 07:08:51 -07:00
nimlgen
b5cf274da3 remove memory peak for quantized llama (#1720) 2023-08-30 16:32:30 -04:00
chenyu
e4eb5d55c7 critical realize for unjitted llama (#1718) 2023-08-30 14:52:32 -04:00
George Hotz
cd7ceed914 gpt2: print total instead of sync time 2023-08-30 10:59:42 -07:00
Roelof van Dijk
62536d6000 perf: use enumerate where possible (#1692)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-30 10:41:51 -07:00
Karan Handa
a8aa13dc91 [ready] Replacing os with pathlib (#1708)
* replace os.path with pathlib

* safe convert dirnames to pathlib

* replace all os.path.join

* fix cuda error

* change main chunk

* Reviewer fixes

* fix vgg

* Fixed everything

* Final fixes

* ensure consistency

* Change all parent.parent... to parents
2023-08-30 10:41:08 -07:00
nimlgen
355b02dc3f allow zerosized tensors (#1659)
* allow zerosized tensors

* works with numpy
2023-08-30 10:39:24 -07:00
Max Hahn
f9cb31fdc2 added visitor pattern (#1669)
* added visitor pattern

* pylint bug workaround

* added tests, made abstract OpNode inherit from ABC

* fixed assert

* fix check of abstract classes in negative test

* remove assert False
2023-08-30 09:03:44 -07:00
George Hotz
fdd7f282cb Reenable tensor cores for self-hosted Mac CI (#1717)
* debug 5 matmul

* allow tensor cores in CI

* tensor cores on arm64

* put debug back
2023-08-30 07:53:04 -07:00
chenyu
ac183568be llama JIT python runtime speedup (#1633)
* no JIT call in TransformerBlock

* idea

* move 2 reshapes to jitted function

shrink inside jitted too, 6.3ms

remove back reshapes, 5.5ms

isinstance -> __class__ 4.99ms

* think

revert ops_gpu.py

revert symbolic.py too

PYOPENCL_COMPILER_OUTPUT=1

* cleanup

* fix cache shape for conversational model

only reshape if start_pos > 0

* small cleanup

* include var_vals.keys() to st.key

* add comments

* llama small update

* everything jitted again, similar structure to gpt2

* fix typing

* add TODO for in place update cache
2023-08-30 07:51:05 -07:00
Umut Zengin
1682e9a38a Fix: Stable Diffusion index (#1713) 2023-08-30 00:21:10 -04:00
wozeparrot
2f768e386d stable diffusion benchmark artifact (#1714) 2023-08-29 21:08:40 -04:00
George Hotz
0ea22bf249 remove DEBUG=1 from stable diffusion AMD since jit cache is fixed 2023-08-29 12:46:12 -07:00
George Hotz
ab9b9ff3e2 pipefail benchmark (#1709) (#1710)
* feat: specify shell

* feat: specify shell for mac

Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2023-08-29 08:15:02 -07:00
George Hotz
aa7c98722b sd timing (#1706) 2023-08-28 20:22:57 -07:00
nimlgen
8844a0a822 llvm jitted (#1652) 2023-08-28 20:22:44 -07:00
nimlgen
1c0449e190 add cache collector (#1595)
* init cache collector

* add test_cache_collector.py

* switch GlobalCounters.cache to CacheCollector

* init jit models test

* jitted SD

* add debug msg to print loaded bufs count

* moved cache collector to jit

* clearer SD

* no double device import
2023-08-28 19:59:55 -07:00
George Hotz
f5f8b09c13 allow manual release (#1704) 2023-08-28 17:54:25 -07:00
George Hotz
715047a1e4 fix release publish (#1703) 2023-08-28 17:48:00 -07:00
Olivier Chafik
ee6d8de2dc Llama: load models in HuggingFace format (incl. indexed, safetensors) (#1583) 2023-08-28 15:11:40 -04:00
qazal
3515ba4f23 add dtypes test (#1682) 2023-08-28 08:12:15 -07:00
Roelof van Dijk
50f669e43b [ready] perf: simpler Tensor init (#1679)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 22:18:03 -04:00
Roelof van Dijk
b66f54e379 perf: avoid reshaping if not necessary (#1683)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 14:17:04 -04:00
Roelof van Dijk
328cf2e86a perf: remove cast and revert back to isinstance (#1694)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 14:15:52 -04:00
wozeparrot
8b354b3f73 feat: version bump! (#1687) v0.7.0 2023-08-27 12:38:58 -04:00
Roelof van Dijk
abaa605f71 [ready] perf: start enumerate at 1 instead of checking all i (#1691)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 12:00:32 -04:00
Roelof van Dijk
2730ed657f perf: faster lazyop eq (#1693)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 11:17:02 -04:00
Roelof van Dijk
6ca509a485 perf: constant in while in for in busy func (#1688)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-27 11:13:16 -04:00