2355 Commits

Author SHA1 Message Date
DavidFarago
1ba8f0dca3 Quickstart: Upgrade section "Training" to new code (#1663)
Co-authored-by: Dave Farago <dfarago@innoopract.com>
2023-08-24 17:12:16 -04:00
DavidFarago
29adae84eb Quickstart: Use tensors to compute train accuracy (#1662)
Co-authored-by: Dave Farago <dfarago@innoopract.com>
2023-08-24 17:09:12 -04:00
George Hotz
d37d092c14 split linearizer into 3 files (#1654) 2023-08-23 14:58:47 -07:00
George Hotz
1b8c40234f Uast start (#1650)
* work

* more tests

* more tests 2

* don't break it
2023-08-23 12:00:06 -07:00
geohotstan
484708da87 #1615 fix (#1616) 2023-08-23 14:51:05 -04:00
Pavol Rusnak
b57c374164 add accelerator links to readme (#1649) 2023-08-23 14:47:55 -04:00
George Hotz
82623697a8 Move asm renderer (#1648)
* teeny changes

* teeny updates

* move to renderer
2023-08-23 10:06:43 -07:00
George Hotz
a89363574d teeny changes (#1647)
* teeny changes

* teeny updates
2023-08-23 09:53:39 -07:00
George Hotz
a6d842af7a move device to ops (#1646)
* move device to ops

* mlops types

* 2 lines
2023-08-23 08:30:17 -07:00
nimlgen
a65ae1198b do replace div->mul for non-floats (#1644) 2023-08-23 07:34:31 -07:00
George Hotz
da694d4241 move that image import 2023-08-22 21:30:55 -07:00
George Hotz
41e83be3dd simple where broadcast (#1643) 2023-08-22 21:24:49 -07:00
George Hotz
c831218139 Optional: Reduce line count and simplify the LazyBuffer interface (#1642)
* less lines in lazybuffer, def e

* custom function

* cast

* reorder functions

* lb type
2023-08-22 21:01:10 -07:00
George Hotz
d25046e66a matvec tests (#1634)
* matvec tests

* f16

* f16 is broken
2023-08-22 17:33:58 -07:00
George Hotz
643cbdfd50 make embedding and GPT-2 fast (#1631)
* make embedding fast

* jit more, variable shape support

* print mem bw
2023-08-22 15:14:38 -07:00
Niklas D
a7752ad65d Fix link to state.py in quickstart (#1632) 2023-08-22 17:39:30 -04:00
c143
c9c40bb16f Import whole math module in tensor.py (#1628) 2023-08-22 17:07:46 -04:00
Roelof van Dijk
6fcfa50b35 [ready] perf: no noop cast just to make mypy happy (#1626)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-22 17:07:22 -04:00
Roelof van Dijk
f04a6d7882 perf: faster partition (#1625)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-22 11:56:41 -07:00
George Hotz
d3c401ba3c llama quantize: scale uses mul, not div 2023-08-22 11:48:56 -07:00
George Hotz
696e4d20a1 fix KOPT=2 with variable shape 2023-08-22 11:34:34 -07:00
George Hotz
de1fcc418f no more toCPU path (#1624) 2023-08-22 11:07:26 -07:00
George Hotz
463dece63e auto arg dtypes (#1623) 2023-08-22 10:22:40 -07:00
George Hotz
db8344ab83 add noalias to llvm (#1622) 2023-08-22 09:26:01 -07:00
chenyu
89e13f2f04 support symbols in shrink (#1611) 2023-08-22 09:08:21 -07:00
George Hotz
718ced296c move state to nn/state (#1619) 2023-08-22 07:36:24 -07:00
Umut Zengin
1e93fd5449 Readability for unreadable functions (#1610)
* cleaned

* typing

* typing

* if format

* if format

* mypy

* update argmax

* argmax more readable

* More stable def pad

* lint
2023-08-22 07:09:08 -07:00
George Hotz
86a32ffb1a lt sum (#1617) 2023-08-21 21:19:16 -07:00
George Hotz
c64c47a6ae test arange simple 2023-08-21 20:16:17 -07:00
George Hotz
4f459841bc Symbolic JIT for GPT2 (#1613)
* not fast yet

* simpler

* symbolic jit

* fp16 GOPS and GB
2023-08-21 19:44:57 -07:00
Yixiang Gao
4f02491cd4 add cpu if torch tensor (#1609) 2023-08-21 16:57:59 -07:00
Umut Zengin
f720682beb np.argmax to Tensor.argmax (#1608)
* to tensor argmax

* removed keepdim

* training update
2023-08-21 15:22:29 -07:00
George Hotz
4ea00bad38 track down llama bug 2023-08-21 15:14:21 -07:00
Roelof van Dijk
b02f77b354 perf: faster broadcasted (#1601)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-21 14:21:46 -07:00
Yixiang Gao
4d54afb6df sparse cat cross entropy (#1597)
* add sparse cat cross entropy

* minor fix

* add log_softmax into loss function

* add test

* update docs

* fix training loss

* add device
2023-08-21 14:14:54 -07:00
Roelof van Dijk
109100656f refactor: no len if it is not needed (#1598)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-21 14:06:32 -07:00
Roelof van Dijk
2c8f8ac611 perf: no ret needed (#1604)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-21 14:05:13 -07:00
Roelof van Dijk
750714c386 perf: namedtuples are hashable, don't need a key (#1607)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-21 14:01:10 -07:00
George Hotz
aaa6fdf347 this was unused code (#1600) 2023-08-21 12:02:58 -07:00
Roelof van Dijk
8e8724d3a8 perf: if argument order (mops) (#1599)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-21 11:20:39 -07:00
George Hotz
2e60920317 Revert "sparse cat cross entropy (#1591)" (#1596)
This reverts commit f0ee850e98.
2023-08-21 10:04:26 -07:00
Yixiang Gao
f0ee850e98 sparse cat cross entropy (#1591)
* add sparse cat cross entropy

* minor fix

* add log_softmax into loss function

* add test

* update docs
2023-08-21 09:56:41 -07:00
Yixiang Gao
8d6662a741 .cpu().numpy() -> .numpy() (#1594)
* .cpu().numpy() -> .numpy()

* restore ops_torch

* restore test_speed_v_torch
2023-08-21 09:53:29 -07:00
Umut Zengin
35bf21276f Argmax/Argmin Feature (#1576)
* implemented argmax and argmin

* lint

* lint

* match torch behaviour

* format

* removed flip
2023-08-20 18:46:46 -07:00
Roelof van Dijk
1900acda09 [READY] ci: setup venv cache (#1475)
* ci: cache installed packages

* ci: trigger jobs

* ci: fix hashfiles argument

---------

Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-20 18:43:16 -07:00
Umut Zengin
3fc7e984f0 __getitem__ refactoring (#1586)
* try

* try

* form

* form

* form

* form

* lint

* small change

* preserve old

* revert to explicit reshape
2023-08-20 18:42:30 -07:00
George Hotz
d627349af0 teeny changes (#1589)
* teeny changes

* import order
2023-08-20 13:38:38 -07:00
George Hotz
012ee7d162 not worth the speed (#1584)
* not worth the speed

* no slots

* uops comments

* bump to python 3.11 for speed

* add critical slots back
2023-08-20 10:24:58 -07:00
George Hotz
739f327d2d Shorter (#1582)
* deleting lines

* remove insert dims

* if statement is never hit

* bug fixes
2023-08-20 08:12:16 -07:00
David Hou
4fbce972d7 CSE at uop level (#1483)
* uop-level cse

* add test

* don't cache reduce alu ops

* types

* rename variable

* fix

* delete lines
2023-08-19 23:40:40 -07:00