Commit Graph

84 Commits

Author SHA1 Message Date
George Hotz
c80884884e event driven hip (#3160)
* event driven hip

* simpler, src makes copy

* pass mypy
2024-01-18 14:35:18 -08:00
George Hotz
a72b1b6d65 sharding for llama (#3151)
* shard llama

* sharding works

* simpler

* simpler

* consume option

* disable that test

* save a line

---------

Co-authored-by: George Hotz <george@tinygrad.org>
2024-01-16 19:28:00 -08:00
George Hotz
655c6f61d3 St real size (#3046)
* track the size in the lazybuffer

* shapetracker real size

* lint
2024-01-08 14:44:53 -08:00
George Hotz
c003be7309 Revert "track size in shapetracker" (#3043)
* Revert "track size in shapetracker (#3026)"

This reverts commit a8ba1ac08f.

* st.size
2024-01-08 13:13:39 -08:00
George Hotz
ebb81e8f11 hotfix: st.size() -> st.size in llama 2024-01-05 20:18:52 -08:00
chenyu
f88506e630 move gpt2/llama sampling inside the model call (#3013)
* move gpt2/llama sampling inside the model call

* argmax uses one more kernel
2024-01-04 17:01:50 -05:00
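For context on #3013: doing the sampling inside the model's __call__ means only the chosen token index has to leave the device each step, rather than the full logits row. A minimal NumPy sketch of the idea (toy model and names, not the actual gpt2/llama code):

```python
import numpy as np

class ToyLM:
    """Toy language model: __call__ returns the sampled token, not the logits."""
    def __init__(self, vocab_size: int = 16, dim: int = 8, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.emb = rng.normal(size=(vocab_size, dim)).astype(np.float32)
        self.out = rng.normal(size=(dim, vocab_size)).astype(np.float32)

    def __call__(self, tokens: list[int]) -> int:
        h = self.emb[tokens].mean(axis=0)   # stand-in for the transformer stack
        logits = h @ self.out               # (vocab_size,)
        return int(np.argmax(logits))       # greedy sampling happens *inside* the call

model = ToyLM()
tokens = [1, 2, 3]
for _ in range(5):
    tokens.append(model(tokens))            # the caller only ever sees token ids
print(tokens)
```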
George Hotz
a280cfe169 move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
George Hotz
c81ce9643d move globalcounters to ops (#2960)
* move globalcounters to ops

* missed a few

* sick of that failing
2024-01-01 14:21:02 -08:00
George Hotz
0fd44259cd bf16 fix + cleanups from mixtral (#2698)
* bf16 fix + cleanups from mixtral

* generic bf16 cast
2023-12-10 16:31:52 -08:00
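On the "generic bf16 cast" above: bfloat16 is the top 16 bits of a float32, so a backend without native bf16 support can decode it with integer widening and a shift. A NumPy illustration of that trick (not necessarily the exact code path #2698 takes):

```python
import numpy as np

def bf16_to_float32(raw: np.ndarray) -> np.ndarray:
    """Interpret a uint16 array holding bfloat16 bits as float32.

    bfloat16 is just the upper 16 bits of an IEEE float32, so widening to
    uint32 and shifting left by 16 reconstructs the float32 bit pattern.
    """
    assert raw.dtype == np.uint16
    return (raw.astype(np.uint32) << 16).view(np.float32)

# round-trip check: take float32 values, truncate to bf16, decode again
x = np.array([1.0, -2.5, 3.14159], dtype=np.float32)
bf16_bits = (x.view(np.uint32) >> 16).astype(np.uint16)
print(bf16_to_float32(bf16_bits))  # close to x, with roughly 3 significant digits
```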
chenyu
fae5394845 validate llama output (#2681)
* validate llama output

* does not work with quantize
2023-12-08 16:42:01 -05:00
chenyu
539b00a645 move llama getenv("JIT") from models to examples (#2671)
Transformer class has a jit param so we should use that in the caller
2023-12-07 12:43:22 -05:00
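The point of #2671 is where configuration lives: the Transformer already exposes a jit flag, so the example script reads the environment and passes it in, and the model class never touches the environment itself. A hedged sketch of that split (the getenv helper and constructor arguments here are simplified stand-ins):

```python
import os

def getenv(key: str, default: int = 0) -> int:
    # stand-in for tinygrad's getenv helper: env vars read as ints
    return int(os.environ.get(key, default))

class Transformer:
    def __init__(self, dim: int, n_layers: int, jit: bool = True):
        # the model just takes a flag; it does not consult os.environ
        self.jit = jit

# the example script (the caller) is the one that reads the JIT env var
model = Transformer(dim=4096, n_layers=32, jit=bool(getenv("JIT", 1)))
print(model.jit)
```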
Oleg Rybalko
5e87083783 Whisper + LLAMA + VITS (#2332)
* feat: working voice 2 text using whisper

* feat: added llama generation

* feat: vits init

* feat: more accurate voice conversion

* feat: support for tts and working pipeline for the first pass

* fix: linter checks

* refactored vits initialization and inference, added mmts-tts support

* fixed process sync and now we can have an infinite conversation

* reuse output stream to remove overhead of creating a new one each time

* added pre-prompt configuration with yaml files

* adjusted code to merge PR which changed whisper

* optimized whisper, now it's blazing fast and also reduced number of lines

* added better debug printing

* use jitted encode function for whisper, added timings and removed response delim to save speed on generating those tokens

* fixed hf convert and now it's working with tinyllama

* added tinyllama config

* refactored code and made it work with all llama models

* prettier order

* prettier order

* fixed suffix for tinyllama and refactored convert_from_hf

* added missing parameters

* fixed stream release and added missing params

* jitted dp and encoder

* jitted flow forward

* removed re-init of espeak on each call to save up time

* jitted generator forward for blazing fast tts

* added contextmanager for displaying a chat log

* removed whitespace for pylint

* updated code to support latest fetch func

* wait for llama eos token and pass params from cli to llama

* listen for not fixed amount of time

* refactored code a bit

* removed thresholding and now the output streams directly to whisper

* tokenize llama output for vits batch size to work and stream each sentence to a speaker

* changed speaker

* whisper is now printing on the same line

* don't trigger llama on whisper output in parens

* added tinyllama chat model

* adjusted code to work with tinyllama chat model

* removed unused cli arg

* autofetch tokenizer and tinyllama model. add 3 chat tokens to the tokenizer

* fixed issue with long sentences by chunking them

* support for multiline llama output

* prettified log output

* adjusted sentence length

* remove quote from response to avoid funny tts

* fixed prompts

* added missing parameter
2023-12-02 15:03:46 -08:00
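The commit above wires three models into one speech-to-speech loop (whisper for speech-to-text, llama for the reply, vits for text-to-speech). A toy schematic of that loop, with stubs standing in for the real models and none of the streaming, JIT, or chat-log details:

```python
def listen() -> str:
    return "hello there"                 # whisper: mic audio → text

def respond(prompt: str) -> str:
    return "Hi! How can I help?"         # llama: prompt → reply, stops at eos token

def speak(text: str) -> None:
    print(f"[tts] {text}")               # vits: text → audio on the output stream

def one_turn() -> None:
    user = listen()
    if user.startswith("("):             # skip whisper output in parens, e.g. "(music)"
        return
    reply = respond(user)
    for sentence in filter(None, reply.split(". ")):   # chunk long replies for vits
        speak(sentence)

one_turn()
```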
Davi Silva
ddeec24fa8 Cleanup & fix llama.py (#2524)
* docs, cleanup crap

* comma AI

* fix 70B

* this is why lexical scope exists
2023-11-30 16:00:17 -05:00
George Hotz
9e07824542 move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
George Hotz
7170a9a057 coder.py can write and run code (#2439)
* wip mistral

* coder

* touchups

* cleanups

* mistral cleanups

* clean up cache create

* download the weights, fix tests

* fix llama loading

* global fixup

* clean up all

* move llama model

* cleanups

* Revert "cleanups"

This reverts commit a71c5d59eb.

* fine, leave it
2023-11-25 12:27:54 -08:00
Davi Silva
df41a57e09 Fix: missing n_kv_heads for smaller models from huggingface (#2438)
* fix: missing n_kv_heads for smaller models from huggingface

* a lil golfing
2023-11-25 10:29:04 -08:00
George Hotz
5bb720a777 Cocoa is no longer used 2023-11-23 14:31:21 -08:00
George Hotz
2dec86970a hotfix: default remains gen 1 llama 2023-11-21 14:43:02 -08:00
Oleg Rybalko
7220f5c9fc fixed hf convert and now it's working with tinyllama (#2374)
* fixed hf convert and now it's working with tinyllama

* added tinyllama config

* refactored code and made it work with all llama models

* prettier order

* prettier order

* fixed suffix for tinyllama and refactored convert_from_hf

* dynamically update help if MODEL_PARAMS changes and default size is the 1st
2023-11-21 14:36:52 -08:00
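Converting a HuggingFace llama checkpoint mostly means renaming weight keys into the layout the native Transformer expects (the real convert_from_hf also permutes the q/k projections and handles sharded files). An illustrative, heavily abbreviated rename table and helper, not the exact mapping from #2374:

```python
import re

HF_TO_NATIVE = {
    "model.embed_tokens.weight": "tok_embeddings.weight",
    "model.norm.weight": "norm.weight",
    "lm_head.weight": "output.weight",
    r"model\.layers\.(\d+)\.self_attn\.q_proj\.weight": r"layers.\1.attention.wq.weight",
    r"model\.layers\.(\d+)\.self_attn\.k_proj\.weight": r"layers.\1.attention.wk.weight",
    r"model\.layers\.(\d+)\.mlp\.gate_proj\.weight": r"layers.\1.feed_forward.w1.weight",
    # ... the remaining per-layer projections follow the same pattern
}

def rename(hf_key: str) -> str:
    for pattern, target in HF_TO_NATIVE.items():
        new_key, n = re.subn(rf"^{pattern}$", target, hf_key)
        if n:
            return new_key
    raise KeyError(hf_key)

print(rename("model.layers.3.self_attn.q_proj.weight"))  # layers.3.attention.wq.weight
```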
Friedrich Carl Eichenroth
75676ab8e1 Profiling-helper (#2321)
* change profiler

* remove unused imports

* remove unused imports

* change lazybuffer references

* remove unused line

* remove unused import

* remove unused stuff

* add types

* typing

* typing

* typing

* trigger actions

* -1 loc

* fixup

* trigger actions

* revert lazy typing changes

* WIP profiler helper

* replace old start & stop profiler

* fixup

* linting

* Update llama.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 14:15:56 -08:00
George Hotz
3baaf298d6 two stage cumsum in tensor.py (#2331)
* two stage cumsum in tensor.py

* 2 more kernels for llama cumsum

* gpt-2 and llama use fast multinomial
2023-11-16 12:09:53 -08:00
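A two-stage cumsum splits the sequence into fixed-size blocks, scans inside each block, then adds the running total of the earlier blocks, which keeps each reduction short; a multinomial sampler can then be built on top of the resulting CDF. The NumPy sketch below shows the algorithm only, not tinygrad's kernels from #2331:

```python
import numpy as np

def two_stage_cumsum(x: np.ndarray, block: int = 4) -> np.ndarray:
    """Cumulative sum done in two passes over fixed-size blocks.

    Stage 1: cumsum inside each block.
    Stage 2: add the prefix total of all previous blocks to each block.
    """
    n = len(x)
    pad = (-n) % block
    xp = np.concatenate([x, np.zeros(pad, dtype=x.dtype)]).reshape(-1, block)
    within = xp.cumsum(axis=1)                                  # stage 1
    offsets = np.concatenate([[0], within[:-1, -1].cumsum()])   # stage 2: block totals
    return (within + offsets[:, None]).reshape(-1)[:n]

x = np.arange(1, 11, dtype=np.int64)
assert np.array_equal(two_stage_cumsum(x), x.cumsum())
print(two_stage_cumsum(x))
```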
George Hotz
70a65c201e JIT support in Interpreted (#2314)
* factor that out

* jit is supported everywhere

* fix some tests

* there's no jit supported device, the jit is everywhere

* fix test uops
2023-11-15 11:13:38 -08:00
George Hotz
01f8781c26 fix CI (#2300)
* might work

* might work 2

* might work 3

* sneak that in to llama too

* pin them all
2023-11-14 11:02:59 -08:00
chenyu
a72b370066 llama take int and convert to Variable internally (#2284) 2023-11-12 17:11:37 -05:00
chenyu
453f48ce02 pad None means (0,0) (#2273) 2023-11-11 09:50:26 -08:00
chenyu
880e693207 fix llama n_kv_heads in kvcache (#2267)
* fix llama n_kv_heads in kvcache

* trigger ci
2023-11-10 21:44:39 -05:00
chenyu
a753c8e071 examples of new GPT2 and JIT change (#2261)
* var_vals are global

* working with global ish

* better

* fix export model

* fix tests

* better kv cache

* does it run?

* use where for kvmask

* fix excessive var_vals

* fix import

* how does multigpu use this?

* llama kinda work

* faster and simpler

* cleanup

* fix conversation mode

* test cleanups

* fix one more test

* test cleanup

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-11-10 15:07:02 -05:00
George Hotz
2f7aab3d13 move optimize_local_size (#2221)
* move optimize_local_size

* interpret_ast
2023-11-05 21:00:52 -08:00
chenyu
8548b20b23 fix codellama params and repeat_kv (#2181) 2023-10-30 10:16:26 -07:00
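Background for the repeat_kv fix: grouped-query attention uses fewer key/value heads than query heads, so each kv head is repeated to cover its group before the attention matmul. An illustrative NumPy re-implementation with the usual (bs, seqlen, n_kv_heads, head_dim) layout, not the exact function touched by #2181:

```python
import numpy as np

def repeat_kv(x: np.ndarray, n_rep: int) -> np.ndarray:
    """Repeat each key/value head n_rep times so kv heads line up with query heads."""
    if n_rep == 1:
        return x
    bs, seqlen, n_kv_heads, head_dim = x.shape
    x = x[:, :, :, None, :]                                    # add a repeat axis
    x = np.broadcast_to(x, (bs, seqlen, n_kv_heads, n_rep, head_dim))
    return x.reshape(bs, seqlen, n_kv_heads * n_rep, head_dim)

kv = np.arange(2 * 3 * 2 * 4, dtype=np.float32).reshape(2, 3, 2, 4)
print(repeat_kv(kv, n_rep=4).shape)  # (2, 3, 8, 4)
```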
will
bc0829b677 Fix llama json loading (#2160) 2023-10-27 10:21:56 -10:00
nimlgen
e21bf776c8 fix debug=1 llama/gpt2 timings (#2143) 2023-10-24 15:45:00 -04:00
chenyu
e2b83f1b42 Variable.bind newer (#2017)
* Variable.bind attempt 2

* ShapeTracker.unbind

* fix llama

* fix types

* test case

* View.vars cleanup

* include mask in symbolic source

* mask can be sint

* st.unbind in bufferops

* assert ast contain free Variable only

* cleanup

* conservative unbinding reduce op arg

* move reduceop unbind

* fix llama JIT arg behavior
2023-10-10 10:03:01 -07:00
chenyu
25555c836f llama default to JIT only if device supports JIT (#2028) 2023-10-09 17:26:02 -07:00
chenyu
05be57f57f Fix llama with empty prompt (#1997)
* fix llama with one token prompt

* llama is all_jitted
2023-10-06 06:48:07 -07:00
chenyu
da2b3e55f4 simpler llama - don't shrink twice (#1981) 2023-10-05 14:31:46 -07:00
chenyu
ebcda8a714 Move var_vals from ShapeTracker to LazyBuffer (#1819) 2023-09-08 09:25:10 -07:00
Yixiang Gao
22cf15e9d0 convert function into tinygrad (#1803) 2023-09-06 14:41:26 -07:00
badcc
fd25792c8b Ensure freqs as type float32 in freqs_cis (#1798) 2023-09-06 10:24:15 -07:00
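freqs_cis is the standard llama rotary-embedding precompute: per-position, per-frequency angles turned into cos/sin pairs. The sketch below keeps the frequencies in float32 explicitly, which is the behaviour #1798 ensures (a NumPy stand-in, not the tinygrad code):

```python
import numpy as np

def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0) -> np.ndarray:
    """llama-style rotary frequency table, with freqs kept in float32."""
    freqs = 1.0 / (theta ** (np.arange(0, dim, 2, dtype=np.float32)[: dim // 2] / dim))
    freqs = freqs.astype(np.float32)                 # explicit, to make the intent obvious
    t = np.arange(end, dtype=np.float32)
    angles = np.outer(t, freqs)                      # (end, dim // 2)
    return np.exp(1j * angles).astype(np.complex64)  # cos + i*sin per position/frequency

table = precompute_freqs_cis(dim=8, end=4)
print(table.dtype, table.shape)  # complex64 (4, 4)
```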
George Hotz
fb1cc6bf4b llama jit is default, print tok/sec (#1774)
* llama jit is default, print tok/sec

* jit not default in CI
2023-09-05 10:12:16 -07:00
Yixiang Gao
66a6bbd029 codellama (#1702)
* add codellama with pre-downloaded weights

* add rope_theta, fix param

* fix test

* add 7B-Python

* add 7B-Instruct

* replace single quotes with double
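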

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-09-02 08:45:12 -07:00
nimlgen
b5cf274da3 remove memory peak for quantized llama (#1720) 2023-08-30 16:32:30 -04:00
chenyu
e4eb5d55c7 critical realize for unjitted llama (#1718) 2023-08-30 14:52:32 -04:00
Karan Handa
a8aa13dc91 [ready] Replacing os with pathlib (#1708)
* replace os.path with pathlib

* safe convert dirnames to pathlib

* replace all os.path.join

* fix cuda error

* change main chunk

* Reviewer fixes

* fix vgg

* Fixed everything

* Final fixes

* ensure consistency

* Change all parent.parent... to parents
2023-08-30 10:41:08 -07:00
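The pathlib migration replaces string joins with Path objects; a small before/after in the spirit of #1708 (the weights path here is just an example):

```python
from pathlib import Path

# before: os.path style
# import os
# weights = os.path.join(os.path.dirname(os.path.abspath(__file__)), "weights", "llama-7b")

# after: pathlib style, including parents[...] instead of chained .parent.parent
weights = Path(__file__).resolve().parent / "weights" / "llama-7b"
ckpt = weights / "model.bin"
print(ckpt, ckpt.suffix, ckpt.parents[1])
```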
chenyu
ac183568be llama JIT python runtime speedup (#1633)
* no JIT call in TransformerBlock

* idea

* move 2 reshapes to jitted function

shrink inside jitted too, 6.3ms

remove back reshapes, 5.5ms

isinstance -> __class__ 4.99ms

* think

revert ops_gpu.py

revert symbolic.py too

PYOPENCL_COMPILER_OUTPUT=1

* cleanup

* fix cache shape for conversational model

only reshape if start_pos > 0

* small cleanup

* include var_vals.keys() to st.key

* add comments

* llama small update

* everything jitted again, similar structure to gpt2

* fix typing

* add TODO for in place update cache
2023-08-30 07:51:05 -07:00
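One of the micro-optimizations listed above is swapping isinstance for a direct __class__ check in hot Python dispatch code; the exact-type comparison skips the MRO walk (and, unlike isinstance, does not accept subclasses). A small self-contained timing sketch, not the actual tinygrad call site:

```python
import timeit

class Node: pass
class Add(Node): pass

x = Add()

# isinstance accepts subclasses; "x.__class__ is Add" is an exact-type check
# and is slightly cheaper per call, which adds up in a per-token Python loop.
print(isinstance(x, Add), x.__class__ is Add)
print(timeit.timeit("isinstance(x, Add)", globals=globals(), number=1_000_000))
print(timeit.timeit("x.__class__ is Add", globals=globals(), number=1_000_000))
```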
Olivier Chafik
ee6d8de2dc Llama: load models in HuggingFace format (incl. indexed, safetensors) (#1583) 2023-08-28 15:11:40 -04:00
George Hotz
a6d842af7a move device to ops (#1646)
* move device to ops

* mlops types

* 2 lines
2023-08-23 08:30:17 -07:00
George Hotz
d3c401ba3c llama quantize: scale uses mul, not div 2023-08-22 11:48:56 -07:00
chenyu
89e13f2f04 support symbols in shrink (#1611) 2023-08-22 09:08:21 -07:00
George Hotz
718ced296c move state to nn/state (#1619) 2023-08-22 07:36:24 -07:00
Umut Zengin
f720682beb np.argmax to Tensor.argmax (#1608)
* to tensor argmax

* removed keepdim

* training update
2023-08-21 15:22:29 -07:00
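On #1608: instead of copying the whole logits tensor back to NumPy and calling np.argmax, the argmax runs as a tinygrad kernel and only the index is read back. A minimal sketch, assuming the current Tensor.argmax API:

```python
import numpy as np
from tinygrad.tensor import Tensor

logits = Tensor([[0.1, 2.3, -1.0, 0.7]])

# before: pull the whole logits row back to numpy just to pick a token
tok_np = int(np.argmax(logits.numpy(), axis=-1)[0])

# after: argmax runs on-device and only the index comes back
tok = int(logits.argmax(axis=-1).numpy()[0])

assert tok == tok_np == 1
```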