George Hotz
5d28a202b5
make tinychat local (#7871)
2024-11-24 14:45:48 +08:00
chenyu
22d5def113
download llama3 70B (#7868)
use "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF".
```
PYTHONPATH=. JITBEAM=2 python3 examples/llama3.py --download_model --size 70B --quantize int8 --benchmark
```
on M4 Max, the model loads in about 40 sec, and generation logs:
```
enqueue in 165.15 ms
total 328.54 ms, 3.04 tok/s, 247.46 GB/s, param 221.20 GB/s
enqueue in 5.31 ms
total 168.48 ms, 5.94 tok/s, 482.54 GB/s, param 431.34 GB/s
enqueue in 5.32 ms
total 168.77 ms, 5.93 tok/s, 481.71 GB/s, param 430.60 GB/s
enqueue in 5.69 ms
total 169.51 ms, 5.90 tok/s, 479.61 GB/s, param 428.72 GB/s
enqueue in 5.41 ms
total 168.60 ms, 5.93 tok/s, 482.20 GB/s, param 431.04 GB/s
enqueue in 5.18 ms
total 168.98 ms, 5.92 tok/s, 481.12 GB/s, param 430.08 GB/s
enqueue in 5.43 ms
total 168.82 ms, 5.92 tok/s, 481.59 GB/s, param 430.49 GB/s
enqueue in 5.27 ms
total 168.94 ms, 5.92 tok/s, 481.23 GB/s, param 430.17 GB/s
```
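As a rough sanity check of these numbers (an editorial back-of-envelope, not part of the commit): at steady state each generated token streams the weights once, so the logged param bandwidth divided by tokens/sec should land near the model's weight size.
```
# back-of-envelope check of the log above; numbers copied from the steady-state lines
tok_per_s = 5.94          # tokens per second
param_gb_per_s = 431.34   # "param" bandwidth on the same line

gb_per_token = param_gb_per_s / tok_per_s
print(f"~{gb_per_token:.1f} GB of weights read per token")
# ~72.6 GB -- consistent with ~70B parameters quantized to int8 (one byte per weight)
```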
2024-11-23 12:18:31 -05:00
George Hotz
144e9f00df
viz is local, new test, and new quantize [pr] (#7859)
* viz is local, new test, and new quantize [pr]
* fix mime types
* remove font
* after index
2024-11-23 14:27:10 +08:00
George Hotz
3989bd2682
idiv + reciprocal [pr] (#7354)
* idiv + reciprocal
* remove upcast from div
* fix docs
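A minimal sketch of the distinction the title points at (plain Python, not tinygrad's actual codegen): float division can be lowered to a reciprocal plus a multiply, while integer division stays a true idiv with no upcast to float.
```
def fdiv(x: float, y: float) -> float:
    # float division lowered as reciprocal + multiply
    return x * (1.0 / y)

def idiv(x: int, y: int) -> int:
    # integer division stays integral, no round-trip through float
    # (Python's // floors; hardware idiv typically truncates toward zero)
    return x // y
```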
2024-10-29 15:54:19 +08:00
chenyu
4a03e00aa1
fix llama3 download_model assert (#7320)
the assert raised a false positive when neither download_model nor model was provided
2024-10-27 11:20:24 -04:00
eliotgolding
e920f1d663
Llama 3.2 1B load from GGUF (#7295)
* gguf 1b-instruct
* not needed
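For context, GGUF files begin with a small fixed header; a minimal reader for it (per the public GGUF spec, v2/v3 layout; an illustration, not tinygrad's loader):
```
import struct

def read_gguf_header(path: str) -> tuple[int, int, int]:
    # GGUF v2/v3: 4-byte magic, uint32 version, uint64 tensor count, uint64 metadata-kv count
    with open(path, "rb") as f:
        assert f.read(4) == b"GGUF", "not a GGUF file"
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
    return version, n_tensors, n_kv
```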
2024-10-27 09:29:02 +08:00
wozeparrot
f932116e05
feat: small things from default_threefry (#6708)
2024-09-24 17:00:47 +08:00
wozeparrot
d269bc95fa
faster tinychat (#5993)
2024-08-08 19:16:26 -07:00
wozeparrot
eebb1b9922
feat: temperature 0 llama3 benchmark (#5806)
2024-07-30 12:05:36 -07:00
wozeparrot
639af3f823
llama3 temperature flag (#5803)
2024-07-29 16:33:51 -07:00
wozeparrot
fa873df9c1
bring tinychat more in line with tinyos' version (#5358)
2024-07-10 13:13:52 -07:00
nimlgen
21b225ac45
llama3 download works (#5160)
2024-06-26 22:45:13 +03:00
wozeparrot
c91b3c4079
shard llama3 on 0 sometimes (#5157)
2024-06-26 11:50:57 -07:00
chenyu
dade7677cf
validate llama3 output only with model "LLaMA-3/8B-SF-DPO" (#5138)
2024-06-24 20:58:25 -04:00
chenyu
8080298739
s/tinytqdm/tqdm (#5103)
except in unit tests, where the real tqdm is imported
2024-06-22 14:18:26 -04:00
chenyu
e468601226
update llama attention casting (#5096)
* update llama attention casting
updated the intermediate ("middle") cast in scaled_dot_product_attention and removed the hard-coded half dtype from llama attention; see the sketch below.
* fix that
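The idea, as a minimal sketch (assuming a tinygrad-style Tensor API; not the commit's actual diff): run the softmax in float32 for stability, then cast back to the inputs' dtype instead of a hard-coded half.
```
import math
from tinygrad import Tensor

def attention(q: Tensor, k: Tensor, v: Tensor) -> Tensor:
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(q.shape[-1])
    # "middle cast": softmax in float32, then back to the input dtype (not hard-coded half)
    weights = scores.float().softmax(-1).cast(q.dtype)
    return weights @ v
```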
2024-06-22 10:57:17 -04:00
wozeparrot
acb715c64c
fix: llama3 special tokens (#5045)
2024-06-18 17:08:44 -07:00
chenyu
a3ed4176c8
use tinytqdm in active tests and examples (#5038)
* use tinytqdm in active tests and examples
stress test this before 0.9.1
* no set_description
2024-06-18 16:01:19 -04:00
wozeparrot
ce1ed374c9
more tinychat fixes (#4971)
2024-06-15 16:29:39 -07:00
wozeparrot
8209cd3c55
easier llama3 + fetch subdir (#4938)
2024-06-14 13:47:27 -07:00
wozeparrot
3d13c23bfa
llama3 --download_model (#4922)
2024-06-11 22:59:59 -07:00
wozeparrot
6c24eda522
feat: tinychat (#4869)
2024-06-08 12:05:45 -07:00
wozeparrot
ed0a740fe4
greater chat api endpoint compat (#4792)
2024-05-30 22:47:31 -07:00
chenyu
7624ad3ddd
add --timing and --profile to llama3 example (#4767)
2024-05-28 16:24:44 -04:00
chenyu
31358cbea5
change Tensor.stack to method (#4719)
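That is, the staticmethod-style call becomes a method on the first tensor; a small usage sketch (exact signature assumed from the title):
```
from tinygrad import Tensor

a, b = Tensor([1, 2]), Tensor([3, 4])
c = a.stack(b)    # method form: stacks b onto a along a new leading dim
print(c.shape)    # (2, 2)
```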
2024-05-24 17:04:19 -04:00
chenyu
5e3fbbb33e
llama3 example add manual seed and log seed (#4667)
2024-05-20 19:09:57 -04:00
chenyu
ae861325ce
update llama sample for mac 32 input buffer limit (#4662)
set the default sampling params in the function call to 0, and top_k in llama3 to 25.
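For reference, top_k=25 restricts sampling to the 25 most likely tokens; a generic sketch of top-k sampling (an illustration, not the example's exact code):
```
import numpy as np

def sample_top_k(logits: np.ndarray, k: int = 25, temperature: float = 1.0) -> int:
    # keep the k highest logits, softmax over them, then sample
    top = np.argpartition(-logits, k)[:k]
    probs = np.exp((logits[top] - logits[top].max()) / temperature)
    probs /= probs.sum()
    return int(np.random.choice(top, p=probs))
```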
2024-05-20 17:23:39 -04:00
wozeparrot
b144d4b460
new llama3 example (#4576)
2024-05-19 22:42:23 -07:00