George Hotz
fb1cc6bf4b
llama jit is default, print tok/sec (#1774)
* llama jit is default, print tok/sec
* jit not default in CI
2023-09-05 10:12:16 -07:00
George Hotz
63c46e0287
Parens and gls (#1768)
* more paren stripping
* remove global and local size from renderers
* complex strip parens
* extra helpers + minor webgpu fix
* fix test uops
* one more parens test
2023-09-04 16:09:01 -07:00
Adrian Kretz
3473c9e88d
Metal conv tensor cores (#1696)
* Benchmark 5x5 conv kernel which is optimized
* Use Metal tensor cores in 2d convs
2023-09-04 15:14:46 -07:00
tomtom-95
7344f7c2d1
KeyError fixed. (#1763)
2023-09-04 15:36:16 -04:00
nimlgen
f863c12610
test kopt correctness (#1756)
* test kopt correctness
* bump BUDGET to 20
* kopt hooks as setUp/tearDown
2023-09-04 10:55:00 -07:00
George Hotz
c6d5d45a2b
Remove MemOp (#1750)
* start removing memop
* locals
* support both stores
* might be correct
* remove parens on shape ish
* fix metal ops
* render load and render store
* fix image
* maybe fix asm
* fix test uops
* revert asm
* remove memop itself
2023-09-04 09:58:33 -07:00
chenyu
b8fde6bb0f
Test KOPT in CI (#1744)
* test kopt in ci
* getenv takes dtype from default
2023-09-03 14:37:20 -07:00
George Hotz
ed194a1d3b
zero fold (#1748)
* add constant fold
* err, it's just zero folding
* self store fold + caching
* prints and more folds
* simpler winograd kernels
* remove childless uops
2023-09-03 13:48:11 -07:00
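A toy sketch of the zero folding this PR adds (illustrative, not tinygrad's code): an ALU op with a zero operand can be folded away at graph-build time, and uops left childless after folding can then be dropped entirely.

```python
def zero_fold(op: str, srcs: tuple):
    # x * 0 -> 0: the multiply never has to be emitted
    if op == "MUL" and 0 in srcs: return 0
    # x + 0 -> x: hand back the surviving operand (or 0 if both were zero)
    if op == "ADD" and 0 in srcs: return max(srcs, key=lambda s: s != 0)
    return None  # no fold applies; the caller emits the op as usual

assert zero_fold("MUL", (42, 0)) == 0
assert zero_fold("ADD", (0, 7)) == 7
```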
George Hotz
e17b1af160
UnaryOps.NEG (#1749)
2023-09-03 12:44:26 -07:00
David Hou
3151d91f6e
3x3 winograd convs (#1675)
* winograd
* simplify local groups code
* comment
* respects self.opts.has_local
* always simplify ones
* make mypy happy
* move reshape, WINO flag
* wino flag, simple forward backward test for wino
* extra wino test
* merge oops
* comments
* axis_needs_valid -> axis_is_masked
* don't delete needs_valid (it's unused though)
* make linter happy
* make linter happy
* smaller test
* change number
* make wino tests very small
2023-09-03 07:29:43 -07:00
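This PR gates the kernels behind the WINO flag. The construction underneath is the standard Winograd F(2,3) transform; here is a reference check of the 1D building block (a sketch, not the PR's kernels): two outputs of a 3-tap convolution from 4 multiplies instead of the naive 6. Nesting it in 2D gives F(2x2,3x3), 16 multiplies against 36.

```python
import numpy as np

def winograd_f23(d, g):
    # F(2,3): 4 multiplies for 2 outputs of a 1D 3-tap convolution
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d, g = np.random.randn(4), np.random.randn(3)
naive = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                  d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f23(d, g), naive)
```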
geohotstan
e36148b1ce
Make __getitem__ TINYer (#1661)
2023-09-02 23:01:01 -04:00
Yixiang Gao
66a6bbd029
codellama (#1702)
* add codellama with pre-downloaded weights
* add rope_theta, fix param
* fix test
* add 7B-Python
* add 7B-Instruct
* replace single quotes with double
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-09-02 08:45:12 -07:00
chenyu
a2745819f6
faster gpt2 jit path and gpt2 in test_real_world (#1738)
2023-09-02 08:39:12 -07:00
George Hotz
91258aa67f
render const (#1736)
* render const
* remove constop
* fix llvm and webgpu
* disable consts in llvm again
* assembly special
* fix const rendering
* fix arm64
* imms are int
* fix ptx
* fix arm64
2023-09-01 19:01:43 -07:00
George Hotz
cd844ec4b2
remove Token class (#1723)
* no fusion
* no float4 grouping
* mulacc fusion is fine. remove uop_alu
* fully remove get_grouped_maybe_float4
* removed that test
* that's not float4 anymore
* disable failing arm64
* metal ops pass tokenless
* fix wmma
* update test_uops with new style
* fix gep
* fix float4 store
* fix float4 store more
* cuda tests pass
* disable broadcast pow
* fix ptx
* reenable arm64
* bring cse back
* don't cache the acc
* fix ptx bug
2023-09-01 12:53:07 -07:00
George Hotz
458eb89463
minor changes from prerender (#1734)
2023-09-01 10:04:47 -07:00
chenyu
f964b9e5ee
visitor pattern for sym_infer and unit tests (#1733)
* visitor pattern for sym_infer and unit tests
* comments
2023-09-01 09:47:45 -07:00
JaSpa99
024dd690fa
Reactivate commavq/gpt2m benchmark (#1731)
* get commavq/gpt2m from huggingface
* increase tols
2023-09-01 06:45:08 -07:00
George Hotz
5c403d43b9
New >3 indexing (#1729)
* move reindexing into linearizer
* get_grouped_dims
* don't limit for clang
2023-08-31 21:24:15 -07:00
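Backends like Metal and CUDA launch at most 3D grids, so when a kernel has more than three output dims the extras get folded together and recovered in-kernel with div/mod. A toy of the grouping step (the group_dims helper below is hypothetical, not the real get_grouped_dims):

```python
def group_dims(dims: list[int], limit: int = 3) -> list[int]:
    # greedily fold leading dims until at most `limit` remain; the kernel
    # splits a grouped index i back out as (i // inner, i % inner)
    dims = list(dims)
    while len(dims) > limit:
        dims[:2] = [dims[0] * dims[1]]
    return dims

assert group_dims([2, 3, 4, 5]) == [6, 4, 5]
```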
George Hotz
e3a062ad17
real matvec test
2023-08-31 17:27:25 -07:00
Karan Handa
a8aa13dc91
[ready] Replacing os with pathlib (#1708)
* replace os.path with pathlib
* safe convert dirnames to pathlib
* replace all os.path.join
* fix cuda error
* change main chunk
* Reviewer fixes
* fix vgg
* Fixed everything
* Final fixes
* ensure consistency
* Change all parent.parent... to parents
2023-08-30 10:41:08 -07:00
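The flavor of the conversion, with illustrative paths rather than actual repo paths:

```python
import os
from pathlib import Path

# before: string surgery through os.path
ckpt = os.path.join(os.path.dirname(__file__), "weights", "model.bin")

# after: the pathlib equivalents this PR swaps in
ckpt = Path(__file__).parent / "weights" / "model.bin"
two_up = Path(__file__).parents[1]  # "parent.parent" collapsed to parents[1]
```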
nimlgen
355b02dc3f
allow zerosized tensors (#1659)
* allow zerosized tensors
* works with numpy
2023-08-30 10:39:24 -07:00
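A minimal sketch of what this enables, assuming the Tensor constructor accepts numpy arrays and .numpy() round-trips as elsewhere in the repo:

```python
import numpy as np
from tinygrad.tensor import Tensor

t = Tensor(np.empty((0, 3), dtype=np.float32))  # zero-sized first dimension
assert t.shape == (0, 3)
assert t.numpy().shape == (0, 3)  # "works with numpy": shape survives the round trip
```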
Max Hahn
f9cb31fdc2
added visitor pattern (#1669)
* added visitor pattern
* pylint bug workaround
* added tests, made abstract OpNode inherit from ABC
* fixed assert
* fix check of abstract classes in negative test
* remove assert False
2023-08-30 09:03:44 -07:00
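The commit mentions an abstract OpNode inheriting from ABC; the generic shape of that pattern looks like this (everything besides the OpNode name is hypothetical):

```python
from abc import ABC, abstractmethod

class OpNode(ABC):
    @abstractmethod
    def accept(self, visitor): ...

class Const(OpNode):
    def __init__(self, val): self.val = val
    def accept(self, visitor): return visitor.visit_const(self)

class Add(OpNode):
    def __init__(self, a, b): self.a, self.b = a, b
    def accept(self, visitor): return visitor.visit_add(self)

class Evaluator:  # one visitor; a renderer or printer would be another
    def visit_const(self, node): return node.val
    def visit_add(self, node): return node.a.accept(self) + node.b.accept(self)

assert Add(Const(2), Const(3)).accept(Evaluator()) == 5
```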
chenyu
ac183568be
llama JIT python runtime speedup (#1633)
* no JIT call in TransformerBlock
* idea
* move 2 reshapes to jitted function
  shrink inside jitted too, 6.3ms
  remove back reshapes, 5.5ms
  isinstance -> __class__ 4.99ms
* think
  revert ops_gpu.py
  revert symbolic.py too
  PYOPENCL_COMPILER_OUTPUT=1
* cleanup
* fix cache shape for conversational model
  only reshape if start_pos > 0
* small cleanup
* include var_vals.keys() to st.key
* add comments
* llama small update
* everything jitted again, similar structure to gpt2
* fix typing
* add TODO for in place update cache
2023-08-30 07:51:05 -07:00
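The "isinstance -> __class__" line in the body above refers to a CPython micro-optimization: an exact-type check skips the subclass walk isinstance does. A quick way to compare (the win is interpreter- and version-dependent):

```python
import timeit

class Node: pass
n = Node()

# isinstance() also matches subclasses; `is` on __class__ is an exact-type
# check, which can be cheaper in a per-token dispatch loop
print(timeit.timeit(lambda: isinstance(n, Node)))
print(timeit.timeit(lambda: n.__class__ is Node))
```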
nimlgen
8844a0a822
llvm jitted (#1652)
2023-08-28 20:22:44 -07:00
nimlgen
1c0449e190
add cache collector (#1595)
* init cache collector
* add test_cache_collector.py
* switch GlobalCounters.cache to CacheCollector
* init jit models test
* jitted SD
* add debug msg to print loaded bufs count
* moved cache collector to jit
* clearer SD
* no double device import
2023-08-28 19:59:55 -07:00
qazal
3515ba4f23
add dtypes test (#1682)
2023-08-28 08:12:15 -07:00
chenyu
66fbf4800b
fix symbolic_ops tests with Tensor.training=True (#1686)
2023-08-26 23:19:56 -04:00
chenyu
b5d700adae
update openpilot supercombo.onnx to 0.9.4 (#1681)
* update openpilot supercombo.onnx to 0.9.4
* update tests for the new model
* comment out comma models from external_model_benchmark
2023-08-26 19:16:08 -04:00
Jordan Wright
25be7f745d
Tensor.uniform with dtype=int bug fix (#1593)
2023-08-26 01:59:53 -04:00
George Hotz
1b8c40234f
Uast start (#1650)
* work
* more tests
* more tests 2
* don't break it
2023-08-23 12:00:06 -07:00
George Hotz
a6d842af7a
move device to ops (#1646)
* move device to ops
* mlops types
* 2 lines
2023-08-23 08:30:17 -07:00
nimlgen
a65ae1198b
do replace div->mul for non-floats (#1644)
2023-08-23 07:34:31 -07:00
George Hotz
c831218139
Optional: Reduce line count and simplify the LazyBuffer interface (#1642)
* less lines in lazybuffer, def e
* custom function
* cast
* reorder functions
* lb type
2023-08-22 21:01:10 -07:00
George Hotz
d25046e66a
matvec tests (#1634)
* matvec tests
* f16
* f16 is broken
2023-08-22 17:33:58 -07:00
George Hotz
643cbdfd50
make embedding and GPT-2 fast (#1631)
* make embedding fast
* jit more, variable shape support
* print mem bw
2023-08-22 15:14:38 -07:00
George Hotz
db8344ab83
add noalias to llvm (#1622)
2023-08-22 09:26:01 -07:00
chenyu
89e13f2f04
support symbols in shrink (#1611)
2023-08-22 09:08:21 -07:00
George Hotz
718ced296c
move state to nn/state (#1619)
2023-08-22 07:36:24 -07:00
George Hotz
86a32ffb1a
lt sum (#1617)
2023-08-21 21:19:16 -07:00
George Hotz
c64c47a6ae
test arange simple
2023-08-21 20:16:17 -07:00
Yixiang Gao
4f02491cd4
add cpu if torch tensor (#1609)
2023-08-21 16:57:59 -07:00
Yixiang Gao
4d54afb6df
sparse cat cross entropy (#1597)
* add sparse cat cross entropy
* minor fix
* add log_softmax into loss function
* add test
* update docs
* fix training loss
* add device
2023-08-21 14:14:54 -07:00
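A numpy reference for what the op computes, with log_softmax folded into the loss as the bullets describe (a sketch, not the PR's code):

```python
import numpy as np

def sparse_categorical_crossentropy(logits: np.ndarray, labels: np.ndarray) -> float:
    # stable log-softmax: subtracting the row max keeps exp() from overflowing
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # "sparse": labels are class indices, not one-hot rows
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, 0.1], [0.2, 3.0, 0.3]])
print(sparse_categorical_crossentropy(logits, np.array([0, 1])))  # both rows correct -> small loss
```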
George Hotz
2e60920317
Revert "sparse cat cross entropy ( #1591 )" ( #1596 )
...
This reverts commit f0ee850e98 .
2023-08-21 10:04:26 -07:00
Yixiang Gao
f0ee850e98
sparse cat cross entropy (#1591)
* add sparse cat cross entropy
* minor fix
* add log_softmax into loss function
* add test
* update docs
2023-08-21 09:56:41 -07:00
Yixiang Gao
8d6662a741
.cpu().numpy() -> .numpy() (#1594)
* .cpu().numpy() -> .numpy()
* restore ops_torch
* restore test_speed_v_torch
2023-08-21 09:53:29 -07:00
Umut Zengin
35bf21276f
Argmax/Argmin Feature (#1576)
* implemented argmax and argmin
* lint
* lint
* match torch behaviour
* format
* removed flip
2023-08-20 18:46:46 -07:00
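One way to get argmax out of the reductions a tensor frontend already has, with no flip (a sketch of the technique, not the PR's exact code): weight positions in reverse so ties resolve to the first maximal index, as in numpy.

```python
import numpy as np

def argmax_from_reductions(x: np.ndarray, axis: int) -> np.ndarray:
    n = x.shape[axis]
    shape = [1] * x.ndim
    shape[axis] = n
    rev = np.arange(n - 1, -1, -1).reshape(shape)  # n-1 ... 0 along `axis`
    mask = x == x.max(axis=axis, keepdims=True)    # 1 wherever a max sits
    return n - 1 - (mask * rev).max(axis=axis)     # first max gets the largest weight

x = np.array([[3, 5, 5], [9, 1, 9]])
assert (argmax_from_reductions(x, 1) == np.argmax(x, 1)).all()  # -> [1, 0]
```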
George Hotz
012ee7d162
not worth the speed (#1584)
* not worth the speed
* no slots
* uops comments
* bump to python 3.11 for speed
* add critical slots back
2023-08-20 10:24:58 -07:00
George Hotz
739f327d2d
Shorter (#1582)
* deleting lines
* remove insert dims
* if statement is never hit
* bug fixes
2023-08-20 08:12:16 -07:00
David Hou
4fbce972d7
CSE at uop level (#1483)
* uop-level cse
* add test
* don't cache reduce alu ops
* types
* rename variable
* fix
* delete lines
2023-08-19 23:40:40 -07:00
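The shape of uop-level CSE (a toy, not the PR's code): key each candidate on its identity and reuse the cached uop when the same key recurs, skipping anything that must stay distinct, mirroring "don't cache reduce alu ops".

```python
cache: dict = {}

def add_uop(op, dtype, srcs: tuple, args, cacheable=True):
    # identical (op, dtype, srcs, args) means an identical value: reuse it
    key = (op, dtype, srcs, args)
    if cacheable and key in cache:
        return cache[key]
    uop = (op, dtype, srcs, args)  # stand-in for a real UOp object
    if cacheable:
        cache[key] = uop
    return uop

a = add_uop("ALU", "float", ("x", "y"), "ADD")
b = add_uop("ALU", "float", ("x", "y"), "ADD")
assert a is b  # the second request got the cached uop, not a new one
```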