George Hotz
1b8c40234f
Uast start (#1650)
* work
* more tests
* more tests 2
* don't break it
2023-08-23 12:00:06 -07:00
geohotstan
484708da87
#1615 fix (#1616)
2023-08-23 14:51:05 -04:00
Pavol Rusnak
b57c374164
add accelerator links to readme (#1649)
2023-08-23 14:47:55 -04:00
George Hotz
82623697a8
Move asm renderer (#1648)
* teeny changes
* teeny updates
* move to renderer
2023-08-23 10:06:43 -07:00
George Hotz
a89363574d
teeny changes (#1647)
* teeny changes
* teeny updates
2023-08-23 09:53:39 -07:00
George Hotz
a6d842af7a
move device to ops (#1646)
* move device to ops
* mlops types
* 2 lines
2023-08-23 08:30:17 -07:00
nimlgen
a65ae1198b
do replace div->mul for non-floats (#1644)
2023-08-23 07:34:31 -07:00
George Hotz
da694d4241
move that image import
2023-08-22 21:30:55 -07:00
George Hotz
41e83be3dd
simple where broadcast (#1643)
2023-08-22 21:24:49 -07:00
George Hotz
c831218139
Optional: Reduce line count and simplify the LazyBuffer interface (#1642)
* less lines in lazybuffer, def e
* custom function
* cast
* reorder functions
* lb type
2023-08-22 21:01:10 -07:00
George Hotz
d25046e66a
matvec tests (#1634)
* matvec tests
* f16
* f16 is broken
2023-08-22 17:33:58 -07:00
George Hotz
643cbdfd50
make embedding and GPT-2 fast (#1631)
* make embedding fast
* jit more, variable shape support
* print mem bw
2023-08-22 15:14:38 -07:00
Niklas D
a7752ad65d
Fix link to state.py in quickstart (#1632)
2023-08-22 17:39:30 -04:00
c143
c9c40bb16f
Import whole math module in tensor.py (#1628)
2023-08-22 17:07:46 -04:00
Roelof van Dijk
6fcfa50b35
[ready] perf: no noop cast just to make mypy happy (#1626)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-22 17:07:22 -04:00
Roelof van Dijk
f04a6d7882
perf: faster partition (#1625)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-22 11:56:41 -07:00
George Hotz
d3c401ba3c
llama quantize: scale uses mul, not div
2023-08-22 11:48:56 -07:00
George Hotz
696e4d20a1
fix KOPT=2 with variable shape
2023-08-22 11:34:34 -07:00
George Hotz
de1fcc418f
no more toCPU path (#1624)
2023-08-22 11:07:26 -07:00
George Hotz
463dece63e
auto arg dtypes (#1623)
2023-08-22 10:22:40 -07:00
George Hotz
db8344ab83
add noalias to llvm (#1622)
2023-08-22 09:26:01 -07:00
chenyu
89e13f2f04
support symbols in shrink (#1611)
2023-08-22 09:08:21 -07:00
George Hotz
718ced296c
move state to nn/state (#1619)
2023-08-22 07:36:24 -07:00
Umut Zengin
1e93fd5449
Readability for unreadable functions (#1610)
* cleaned
* typing
* typing
* if format
* if format
* mypy
* update argmax
* argmax more readable
* More stable def pad
* lint
2023-08-22 07:09:08 -07:00
George Hotz
86a32ffb1a
lt sum (#1617)
2023-08-21 21:19:16 -07:00
George Hotz
c64c47a6ae
test arange simple
2023-08-21 20:16:17 -07:00
George Hotz
4f459841bc
Symbolic JIT for GPT2 (#1613)
* not fast yet
* simpler
* symbolic jit
* fp16 GOPS and GB
2023-08-21 19:44:57 -07:00
Yixiang Gao
4f02491cd4
add cpu if torch tensor (#1609)
2023-08-21 16:57:59 -07:00
Umut Zengin
f720682beb
np.argmax to Tensor.argmax (#1608)
* to tensor argmax
* removed keepdim
* training update
2023-08-21 15:22:29 -07:00
George Hotz
4ea00bad38
track down llama bug
2023-08-21 15:14:21 -07:00
Roelof van Dijk
b02f77b354
perf: faster broadcasted (#1601)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-21 14:21:46 -07:00
Yixiang Gao
4d54afb6df
sparse cat cross entropy (#1597)
* add sparse cat cross entropy
* minor fix
* add log_softmax into loss function
* add test
* update docs
* fix training loss
* add device
2023-08-21 14:14:54 -07:00
Roelof van Dijk
109100656f
refactor: no len if it is not needed (#1598)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-21 14:06:32 -07:00
Roelof van Dijk
2c8f8ac611
perf: no ret needed (#1604)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-21 14:05:13 -07:00
Roelof van Dijk
750714c386
perf: namedtuples are hashable, don't need a key (#1607)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-21 14:01:10 -07:00
George Hotz
aaa6fdf347
this was unused code (#1600)
2023-08-21 12:02:58 -07:00
Roelof van Dijk
8e8724d3a8
perf: if argument order (mops) (#1599)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-21 11:20:39 -07:00
George Hotz
2e60920317
Revert "sparse cat cross entropy (#1591)" (#1596)
This reverts commit f0ee850e98.
2023-08-21 10:04:26 -07:00
Yixiang Gao
f0ee850e98
sparse cat cross entropy (#1591)
* add sparse cat cross entropy
* minor fix
* add log_softmax into loss function
* add test
* update docs
2023-08-21 09:56:41 -07:00
Yixiang Gao
8d6662a741
.cpu().numpy() -> .numpy() (#1594)
* .cpu().numpy() -> .numpy()
* restore ops_torch
* restore test_speed_v_torch
2023-08-21 09:53:29 -07:00
Umut Zengin
35bf21276f
Argmax/Argmin Feature (#1576)
* implemented argmax and argmin
* lint
* lint
* match torch behaviour
* format
* removed flip
2023-08-20 18:46:46 -07:00
Roelof van Dijk
1900acda09
[READY] ci: setup venv cache (#1475)
* ci: cache installed packages
* ci: trigger jobs
* ci: fix hashfiles argument
---------
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-20 18:43:16 -07:00
Umut Zengin
3fc7e984f0
__getitem__ refactoring (#1586)
* dene
* dene
* form
* form
* form
* form
* lint
* small change
* preserve old
* revert to explicit reshape
2023-08-20 18:42:30 -07:00
George Hotz
d627349af0
teeny changes (#1589)
* teeny changes
* import order
2023-08-20 13:38:38 -07:00
George Hotz
012ee7d162
not worth the speed (#1584)
* not worth the speed
* no slots
* uops comments
* bump to python 3.11 for speed
* add critical slots back
2023-08-20 10:24:58 -07:00
George Hotz
739f327d2d
Shorter (#1582)
* deleting lines
* remove insert dims
* if statement is never hit
* bug fixes
2023-08-20 08:12:16 -07:00
David Hou
4fbce972d7
CSE at uop level (#1483)
* uop-level cse
* add test
* don't cache reduce alu ops
* types
* rename variable
* fix
* delete lines
2023-08-19 23:40:40 -07:00
George Hotz
b9feb1b743
fp16 support in stable diffusion
2023-08-20 05:37:21 +00:00
George Hotz
ad7d26c393
fix __launch_bounds__ and benchmark TC MATMUL (#1575)
* fix
* benchmark matmul
2023-08-19 10:54:39 -07:00
David Hou
92754e177c
cache buffer loads across multiple bufs (#1482)
* cache loads across buffers (since they may share rawbufs)
* typing
* add test
* fix test
* small changes to test
* fix test
* one big cache
* whitespace
* golf a line?
* invalid is RawBuffer(0)[0], valid 1.
2023-08-19 09:09:58 -07:00