Commit Graph

4667 Commits

George Hotz
fb1cc6bf4b llama jit is default, print tok/sec (#1774)
* llama jit is default, print tok/sec

* jit not default in CI
2023-09-05 10:12:16 -07:00
George Hotz
63c46e0287 Parens and gls (#1768)
* more paren stripping

* remove global and local size from renderers

* complex strip parens

* extra helpers + minor webgpu fix

* fix test uops

* one more parens test
2023-09-04 16:09:01 -07:00
Adrian Kretz
3473c9e88d Metal conv tensor cores (#1696)
* Benchmark the 5x5 conv kernel that gets optimized

* Use Metal tensor cores in 2d convs
2023-09-04 15:14:46 -07:00
tomtom-95
7344f7c2d1 KeyError fixed. (#1763) 2023-09-04 15:36:16 -04:00
nimlgen
f863c12610 test kopt correctness (#1756)
* test kopt correctness

* bump BUDGET to 20

* kopt hooks as setUp/tearDown
2023-09-04 10:55:00 -07:00
George Hotz
c6d5d45a2b Remove MemOp (#1750)
* start removing memop

* locals

* support both stores

* might be correct

* remove parens on shape ish

* fix metal ops

* render load and render store

* fix image

* maybe fix asm

* fix test uops

* revert asm

* remove memop itself
2023-09-04 09:58:33 -07:00
chenyu
b8fde6bb0f Test KOPT in CI (#1744)
* test kopt in ci

* getenv takes dtype from default
2023-09-03 14:37:20 -07:00
George Hotz
ed194a1d3b zero fold (#1748)
* add constant fold

* err, it's just zero folding

* self store fold + caching

* prints and more folds

* simpler winograd kernels

* remove childless uops
2023-09-03 13:48:11 -07:00
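
A sketch of what zero folding does, on a hypothetical micro-IR (the `UOp` tuple and rule list here are assumptions, not tinygrad's classes): ops whose result is known without computing get rewritten away, and anything left without consumers can then be dropped, per the "remove childless uops" bullet.

```python
from typing import NamedTuple, Optional, Tuple

class UOp(NamedTuple):  # hypothetical micro-IR node, not tinygrad's class
    op: str
    src: Tuple["UOp", ...] = ()
    arg: Optional[float] = None

CONST0 = UOp("CONST", arg=0.0)

def zero_fold(u: UOp) -> UOp:
    src = tuple(zero_fold(s) for s in u.src)
    # x * 0 -> 0 and x + 0 -> x: the result is known without computing
    if u.op == "MUL" and CONST0 in src: return CONST0
    if u.op == "ADD" and CONST0 in src:
        return next((s for s in src if s != CONST0), CONST0)
    return UOp(u.op, src, u.arg)
```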
George Hotz
e17b1af160 UnaryOps.NEG (#1749) 2023-09-03 12:44:26 -07:00
David Hou
3151d91f6e 3x3 winograd convs (#1675)
* winograd

* simplify local groups code

* comment

* respects self.opts.has_local

* always simplify ones

* make mypy happy

* move reshape, WINO flag

* wino flag, simple forward backward test for wino

* extra wino test

* merge oops

* comments

* axis_needs_valid -> axis_is_masked

* don't delete needs_valid (it's unused though)

* make linter happy

* make linter happy

* smaller test

* change number

* make wino tests very small
2023-09-03 07:29:43 -07:00
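
Winograd convolution trades multiplications for additions via fixed input, filter, and output transforms. Below is a 1D F(2,3) sketch using the standard Lavin-Gray matrices; the PR's 2D kernels nest these transforms, so this shows only the arithmetic idea, not the repo's code.

```python
import numpy as np

# standard F(2,3) transform matrices from Lavin & Gray (2016)
Bt = np.array([[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]], dtype=float)
G  = np.array([[1, 0, 0], [.5, .5, .5], [.5, -.5, .5], [0, 0, 1]], dtype=float)
At = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], dtype=float)

d = np.random.randn(4)   # input tile
g = np.random.randn(3)   # 3-wide filter
y = At @ ((G @ g) * (Bt @ d))             # 4 multiplies instead of 6
ref = np.array([d[0:3] @ g, d[1:4] @ g])  # direct sliding-window correlation
assert np.allclose(y, ref)
```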
geohotstan
e36148b1ce Make __getitem__ TINYer (#1661) 2023-09-02 23:01:01 -04:00
Yixiang Gao
66a6bbd029 codellama (#1702)
* add codellama with pre-downloaded weights

* add rope_theta, fix param

* fix test

* add 7B-Python

* add 7B-Instruct

* replace single quotes with double

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-09-02 08:45:12 -07:00
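
`rope_theta` is the base of the rotary-embedding frequencies; CodeLlama raises it from LLaMA's 10000 to 1e6 so positions stay distinguishable over longer contexts. A minimal sketch of the frequency computation (function name and dimensions are illustrative, not the repo's code):

```python
import numpy as np

def rope_freqs(head_dim: int, theta: float = 10000.0) -> np.ndarray:
    # one frequency per pair of channels; a larger theta means slower
    # rotation, stretching the usable context window
    return 1.0 / (theta ** (np.arange(0, head_dim, 2) / head_dim))

llama_freqs = rope_freqs(128)                 # base LLaMA
codellama_freqs = rope_freqs(128, theta=1e6)  # CodeLlama's rope_theta
```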
chenyu
a2745819f6 faster gpt2 jit path and gpt2 in test_real_world (#1738) 2023-09-02 08:39:12 -07:00
George Hotz
91258aa67f render const (#1736)
* render const

* remove constop

* fix llvm and webgpu

* disable consts in llvm again

* assembly special

* fix const rendering

* fix arm64

* imms are int

* fix ptx

* fix arm64
2023-09-01 19:01:43 -07:00
George Hotz
cd844ec4b2 remove Token class (#1723)
* no fusion

* no float4 grouping

* mulacc fusion is fine. remove uop_alu

* fully remove get_grouped_maybe_float4

* removed that test

* that's not float4 anymore

* disable failing arm64

* metal ops pass tokenless

* fix wmma

* update test_uops with new style

* fix gep

* fix float4 store

* fix float4 store more

* cuda tests pass

* disable broadcast pow

* fix ptx

* reenable arm64

* bring cse back

* don't cache the acc

* fix ptx bug
2023-09-01 12:53:07 -07:00
George Hotz
458eb89463 minor changes from prerender (#1734) 2023-09-01 10:04:47 -07:00
chenyu
f964b9e5ee visitor pattern for sym_infer and unit tests (#1733)
* visitor pattern for sym_infer and unit tests

* comments
2023-09-01 09:47:45 -07:00
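
A minimal visitor-pattern sketch of what `sym_infer` does, evaluating a symbolic shape expression once its variables have concrete values; the node classes here are a stand-in hierarchy, not tinygrad's actual symbolic nodes.

```python
class Node: pass
class NumNode(Node):
    def __init__(self, b): self.b = b
class Variable(Node):
    def __init__(self, name): self.name = name
class AddNode(Node):
    def __init__(self, a, b): self.a, self.b = a, b

def sym_infer(n: Node, var_vals: dict) -> int:
    # dispatch table instead of isinstance chains: one handler per node type
    return _handlers[type(n)](n, var_vals)

_handlers = {
    NumNode:  lambda n, v: n.b,
    Variable: lambda n, v: v[n.name],
    AddNode:  lambda n, v: sym_infer(n.a, v) + sym_infer(n.b, v),
}

x = Variable("x")
assert sym_infer(AddNode(x, NumNode(2)), {"x": 3}) == 5
```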
JaSpa99
024dd690fa Reactivate commavq/gpt2m benchmark (#1731)
* get commavq/gpt2m from huggingface

* increase tols
2023-09-01 06:45:08 -07:00
George Hotz
5c403d43b9 New >3 indexing (#1729)
* move reindexing into linearizer

* get_grouped_dims

* don't limit for clang
2023-08-31 21:24:15 -07:00
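
`get_grouped_dims` exists because GPU launch grids are at most 3D, so extra output dimensions have to be merged and then recovered inside the kernel with div/mod. A rough sketch of the collapsing step (assumed behavior, not the exact implementation):

```python
from typing import List

def get_grouped_dims(dims: List[int], max_dims: int = 3) -> List[int]:
    # merge trailing dims until we fit; the kernel recovers the original
    # indices from the fused dim with // and %
    dims = list(dims)
    while len(dims) > max_dims:
        dims[-2:] = [dims[-2] * dims[-1]]
    return dims

assert get_grouped_dims([2, 3, 4, 5]) == [2, 3, 20]
```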
George Hotz
e3a062ad17 real matvec test 2023-08-31 17:27:25 -07:00
Karan Handa
a8aa13dc91 [ready] Replacing os with pathlib (#1708)
* replace os.path with pathlib

* safe convert dirnames to pathlib

* replace all os.path.join

* fix cuda error

* change main chunk

* Reviewer fixes

* fix vgg

* Fixed everything

* Final fixes

* ensure consistency

* Change all parent.parent... to parents
2023-08-30 10:41:08 -07:00
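
The mechanical translation this PR applies everywhere, sketched on an illustrative path; `.parents[n]` is what replaces the `parent.parent...` chains mentioned in the last bullet.

```python
import os
from pathlib import Path

# before: string paths assembled with os.path
old = os.path.join(os.path.dirname(__file__), "weights", "model.bin")

# after: the same path as a Path object; "/" joins components and
# .parents[0] is the dirname
new = Path(__file__).parents[0] / "weights" / "model.bin"

assert str(new) == old
```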
nimlgen
355b02dc3f allow zerosized tensors (#1659)
* allow zerosized tensors

* works with numpy
2023-08-30 10:39:24 -07:00
Max Hahn
f9cb31fdc2 added visitor pattern (#1669)
* added visitor pattern

* pylint bug workaround

* added tests, made abstract OpNode inherit from ABC

* fixed assert

* fix check of abstract classes in negative test

* remove assert False
2023-08-30 09:03:44 -07:00
chenyu
ac183568be llama JIT python runtime speedup (#1633)
* no JIT call in TransformerBlock

* idea

* move 2 reshapes to jitted function

shrink inside jitted too, 6.3ms

remove back reshapes, 5.5ms

isinstance -> __class__ 4.99ms

* think

revert ops_gpu.py

revert symbolic.py too

PYOPENCL_COMPILER_OUTPUT=1

* cleanup

* fix cache shape for conversational model

only reshape if start_pos > 0

* small cleanup

* include var_vals.keys() to st.key

* add comments

* llama small update

* everything jitted again, similar structure to gpt2

* fix typing

* add TODO for in place update cache
2023-08-30 07:51:05 -07:00
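
Among the listed micro-optimizations, `isinstance -> __class__` swaps a subclass-aware check for a plain identity check on a hot path. A rough way to measure the difference (the millisecond figures in the bullets are the commit's own measurements; this benchmark is only illustrative):

```python
import timeit

setup = "x = 3"
print(timeit.timeit("isinstance(x, int)", setup=setup))  # subclass-aware
print(timeit.timeit("x.__class__ is int", setup=setup))  # identity only
# note: the identity check returns False for subclasses of int,
# so it is only safe where no subclasses are expected
```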
nimlgen
8844a0a822 llvm jitted (#1652) 2023-08-28 20:22:44 -07:00
nimlgen
1c0449e190 add cache collector (#1595)
* init cache collector

* add test_cache_collector.py

* switch GlobalCounters.cache to CacheCollector

* init jit models test

* jitted SD

* add debug msg to print loaded bufs count

* moved cache collector to jit

* clearer SD

* no double device import
2023-08-28 19:59:55 -07:00
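
The collector's contract during JIT capture: record every kernel launch so later calls can replay the list without re-running Python. A minimal sketch of that idea (method names are assumptions, not tinygrad's exact API):

```python
from typing import Any, List, Optional, Tuple

class CacheCollector:
    def __init__(self):
        self.cache: Optional[List[Tuple[Any, List[Any]]]] = None
    def start(self):
        # begin capturing launches
        self.cache = []
    def add(self, prg, rawbufs):
        # called from the runtime once per kernel launch
        if self.cache is not None: self.cache.append((prg, rawbufs))
    def finish(self):
        # stop capturing and hand back the recorded launch list
        out, self.cache = self.cache, None
        return out
```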
qazal
3515ba4f23 add dtypes test (#1682) 2023-08-28 08:12:15 -07:00
chenyu
66fbf4800b fix symbolic_ops tests with Tensor.training=True (#1686) 2023-08-26 23:19:56 -04:00
chenyu
b5d700adae update openpilot supercombo.onnx to 0.9.4 (#1681)
* update openpilot supercombo.onnx to 0.9.4

* update tests for the new model

* comment out comma models from external_model_benchmark
2023-08-26 19:16:08 -04:00
Jordan Wright
25be7f745d Tensor.uniform with dtype=int bug fix (#1593) 2023-08-26 01:59:53 -04:00
George Hotz
1b8c40234f Uast start (#1650)
* work

* more tests

* more tests 2

* don't break it
2023-08-23 12:00:06 -07:00
George Hotz
a6d842af7a move device to ops (#1646)
* move device to ops

* mlops types

* 2 lines
2023-08-23 08:30:17 -07:00
nimlgen
a65ae1198b do replace div->mul for non-floats (#1644) 2023-08-23 07:34:31 -07:00
George Hotz
c831218139 Optional: Reduce line count and simplify the LazyBuffer interface (#1642)
* less lines in lazybuffer, def e

* custom function

* cast

* reorder functions

* lb type
2023-08-22 21:01:10 -07:00
George Hotz
d25046e66a matvec tests (#1634)
* matvec tests

* f16

* f16 is broken
2023-08-22 17:33:58 -07:00
George Hotz
643cbdfd50 make embedding and GPT-2 fast (#1631)
* make embedding fast

* jit more, variable shape support

* print mem bw
2023-08-22 15:14:38 -07:00
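
One way to make embedding fast in a framework without a gather op is to express the lookup as a compare plus matmul, which then JITs like any other kernel; a numpy sketch of the equivalence (an assumption about the approach, not a quote of the code):

```python
import numpy as np

def embedding(idx: np.ndarray, weight: np.ndarray) -> np.ndarray:
    # lookup as one-hot matmul: no gather needed, and the whole thing
    # is expressible in ops the compiler already handles
    onehot = (idx[..., None] == np.arange(weight.shape[0])).astype(weight.dtype)
    return onehot @ weight

w = np.random.randn(1000, 64).astype(np.float32)  # small illustrative table
tok = np.array([[42, 7]])                         # illustrative token ids
assert embedding(tok, w).shape == (1, 2, 64)
```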
George Hotz
db8344ab83 add noalias to llvm (#1622) 2023-08-22 09:26:01 -07:00
chenyu
89e13f2f04 support symbols in shrink (#1611) 2023-08-22 09:08:21 -07:00
George Hotz
718ced296c move state to nn/state (#1619) 2023-08-22 07:36:24 -07:00
George Hotz
86a32ffb1a lt sum (#1617) 2023-08-21 21:19:16 -07:00
George Hotz
c64c47a6ae test arange simple 2023-08-21 20:16:17 -07:00
Yixiang Gao
4f02491cd4 add cpu if torch tensor (#1609) 2023-08-21 16:57:59 -07:00
Yixiang Gao
4d54afb6df sparse cat cross entropy (#1597)
* add sparse cat cross entropy

* minor fix

* add log_softmax into loss function

* add test

* update docs

* fix training loss

* add device
2023-08-21 14:14:54 -07:00
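
The loss added here takes integer class labels directly instead of one-hot targets, with `log_softmax` folded in for numerical stability (the third bullet). A numpy sketch of the math, not the Tensor-method implementation:

```python
import numpy as np

def sparse_categorical_crossentropy(logits: np.ndarray, labels: np.ndarray) -> float:
    # log_softmax inside the loss: subtract the max for stability,
    # then pick out the log-probability of each true class
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]])
labels = np.array([0, 1])
print(sparse_categorical_crossentropy(logits, labels))
```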
George Hotz
2e60920317 Revert "sparse cat cross entropy (#1591)" (#1596)
This reverts commit f0ee850e98.
2023-08-21 10:04:26 -07:00
Yixiang Gao
f0ee850e98 sparse cat cross entropy (#1591)
* add sparse cat cross entropy

* minor fix

* add log_softmax into loss function

* add test

* update docs
2023-08-21 09:56:41 -07:00
Yixiang Gao
8d6662a741 .cpu().numpy() -> .numpy() (#1594)
* .cpu().numpy() -> .numpy()

* restore ops_torch

* restore test_speed_v_torch
2023-08-21 09:53:29 -07:00
Umut Zengin
35bf21276f Argmax/Argmin Feature (#1576)
* implemented argmax and argmin

* lint

* lint

* match torch behaviour

* format

* removed flip
2023-08-20 18:46:46 -07:00
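
Matching torch behaviour means returning the first index of the maximum, and "removed flip" suggests doing so without reversing the tensor. One way to build argmax from ops a tensor library already has (compare, multiply, max-reduce), sketched in numpy as an illustration of the trick:

```python
import numpy as np

def argmax(x: np.ndarray, axis: int) -> np.ndarray:
    # mask the maxima, weight them by reversed indices, and max-reduce:
    # the largest weight corresponds to the FIRST maximum, like torch
    n = x.shape[axis]
    shape = [1] * x.ndim; shape[axis] = n
    rev_idx = np.arange(n - 1, -1, -1).reshape(shape)
    mask = x == x.max(axis=axis, keepdims=True)
    return n - 1 - (mask * rev_idx).max(axis=axis)

x = np.array([[1, 3, 3], [2, 0, 2]])
assert (argmax(x, 1) == np.array([1, 0])).all()  # first max wins
```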
George Hotz
012ee7d162 not worth the speed (#1584)
* not worth the speed

* no slots

* uops comments

* bump to python 3.11 for speed

* add critical slots back
2023-08-20 10:24:58 -07:00
George Hotz
739f327d2d Shorter (#1582)
* deleting lines

* remove insert dims

* if statement is never hit

* bug fixes
2023-08-20 08:12:16 -07:00
David Hou
4fbce972d7 CSE at uop level (#1483)
* uop-level cse

* add test

* don't cache reduce alu ops

* types

* rename variable

* fix

* delete lines
2023-08-19 23:40:40 -07:00
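
Common-subexpression elimination at the uop level deduplicates structurally identical instructions as they are emitted. A minimal sketch keyed on (op, dtype, sources, arg); the "don't cache reduce alu ops" bullet corresponds to the skip set, and every name here is an assumption rather than the merged code.

```python
from typing import Any, Dict, Tuple

class UOpEmitter:
    NO_CSE = {"REDUCE"}  # ops whose results must not be shared

    def __init__(self):
        self.seen: Dict[Tuple, Tuple] = {}
        self.uops: list = []

    def emit(self, op: str, dtype: str, src: Tuple, arg: Any = None) -> Tuple:
        key = (op, dtype, src, arg)
        if op not in self.NO_CSE and key in self.seen:
            return self.seen[key]  # reuse the earlier identical uop
        uop = (op, dtype, src, arg)
        self.uops.append(uop)
        self.seen[key] = uop
        return uop
```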