George Hotz
5d28a202b5
make tinychat local (#7871)
2024-11-24 14:45:48 +08:00
chenyu
22d5def113
download llama3 70B (#7868)
use "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF".
```
PYTHONPATH=. JITBEAM=2 python3 examples/llama3.py --download_model --size 70B --quantize int8 --benchmark
```
on M4 Max, the model loads in about 40 sec, and generation logs:
```
enqueue in 165.15 ms
total 328.54 ms, 3.04 tok/s, 247.46 GB/s, param 221.20 GB/s
enqueue in 5.31 ms
total 168.48 ms, 5.94 tok/s, 482.54 GB/s, param 431.34 GB/s
enqueue in 5.32 ms
total 168.77 ms, 5.93 tok/s, 481.71 GB/s, param 430.60 GB/s
enqueue in 5.69 ms
total 169.51 ms, 5.90 tok/s, 479.61 GB/s, param 428.72 GB/s
enqueue in 5.41 ms
total 168.60 ms, 5.93 tok/s, 482.20 GB/s, param 431.04 GB/s
enqueue in 5.18 ms
total 168.98 ms, 5.92 tok/s, 481.12 GB/s, param 430.08 GB/s
enqueue in 5.43 ms
total 168.82 ms, 5.92 tok/s, 481.59 GB/s, param 430.49 GB/s
enqueue in 5.27 ms
total 168.94 ms, 5.92 tok/s, 481.23 GB/s, param 430.17 GB/s
```
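As a rough sanity check of these numbers (an editorial back-of-envelope, not part of the commit): at steady state each generated token streams the weights once, so the logged param bandwidth divided by tokens/sec should land near the model's weight size.
```
# back-of-envelope check of the log above; numbers copied from the steady-state lines
tok_per_s = 5.94          # tokens per second
param_gb_per_s = 431.34   # "param" bandwidth on the same line

gb_per_token = param_gb_per_s / tok_per_s
print(f"~{gb_per_token:.1f} GB of weights read per token")
# ~72.6 GB -- consistent with ~70B parameters quantized to int8 (one byte per weight)
```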
2024-11-23 12:18:31 -05:00
George Hotz
144e9f00df
viz is local, new test, and new quantize [pr] (#7859)
* viz is local, new test, and new quantize [pr]
* fix mime types
* remove font
* after index
2024-11-23 14:27:10 +08:00
George Hotz
3989bd2682
idiv + reciprocal [pr] (#7354)
* idiv + reciprocal
* remove upcast from div
* fix docs
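A minimal sketch of the distinction the title points at (plain Python, not tinygrad's actual codegen): float division can be lowered to a reciprocal plus a multiply, while integer division stays a true idiv with no upcast to float.
```
def fdiv(x: float, y: float) -> float:
    # float division lowered as reciprocal + multiply
    return x * (1.0 / y)

def idiv(x: int, y: int) -> int:
    # integer division stays integral, no round-trip through float
    # (Python's // floors; hardware idiv typically truncates toward zero)
    return x // y
```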
2024-10-29 15:54:19 +08:00
chenyu
4a03e00aa1
fix llama3 download_model assert (#7320)
the assert raised a false positive when neither download_model nor model was provided
2024-10-27 11:20:24 -04:00
eliotgolding
e920f1d663
Llama 3.2 1B load from GGUF (#7295)
* gguf 1b-instruct
* not needed
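For context, GGUF files begin with a small fixed header; a minimal reader for it (per the public GGUF spec, v2/v3 layout; an illustration, not tinygrad's loader):
```
import struct

def read_gguf_header(path: str) -> tuple[int, int, int]:
    # GGUF v2/v3: 4-byte magic, uint32 version, uint64 tensor count, uint64 metadata-kv count
    with open(path, "rb") as f:
        assert f.read(4) == b"GGUF", "not a GGUF file"
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
    return version, n_tensors, n_kv
```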
2024-10-27 09:29:02 +08:00
wozeparrot
f932116e05
feat: small things from default_threefry (#6708)
2024-09-24 17:00:47 +08:00
wozeparrot
d269bc95fa
faster tinychat (#5993)
2024-08-08 19:16:26 -07:00
wozeparrot
eebb1b9922
feat: temperature 0 llama3 benchmark (#5806)
2024-07-30 12:05:36 -07:00
wozeparrot
639af3f823
llama3 temperature flag (#5803)
2024-07-29 16:33:51 -07:00
wozeparrot
fa873df9c1
bring tinychat more in line with tinyos' version (#5358)
2024-07-10 13:13:52 -07:00
nimlgen
21b225ac45
llama3 download works (#5160)
2024-06-26 22:45:13 +03:00
wozeparrot
c91b3c4079
shard llama3 on 0 sometimes (#5157)
2024-06-26 11:50:57 -07:00
chenyu
dade7677cf
validate llama3 output only with model "LLaMA-3/8B-SF-DPO" (#5138)
2024-06-24 20:58:25 -04:00
chenyu
8080298739
s/tinytqdm/tqdm (#5103)
except in unit tests, where the real tqdm is imported
2024-06-22 14:18:26 -04:00
chenyu
e468601226
update llama attention casting (#5096)
* update llama attention casting
updated the intermediate ("middle") cast in scaled_dot_product_attention and removed the hard-coded half dtype from llama attention; see the sketch below.
* fix that
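The idea, as a minimal sketch (assuming a tinygrad-style Tensor API; not the commit's actual diff): run the softmax in float32 for stability, then cast back to the inputs' dtype instead of a hard-coded half.
```
import math
from tinygrad import Tensor

def attention(q: Tensor, k: Tensor, v: Tensor) -> Tensor:
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(q.shape[-1])
    # "middle cast": softmax in float32, then back to the input dtype (not hard-coded half)
    weights = scores.float().softmax(-1).cast(q.dtype)
    return weights @ v
```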
2024-06-22 10:57:17 -04:00
wozeparrot
acb715c64c
fix: llama3 special tokens (#5045)
2024-06-18 17:08:44 -07:00
chenyu
a3ed4176c8
use tinytqdm in active tests and examples (#5038)
* use tinytqdm in active tests and examples
stress test this before 0.9.1
* no set_description
2024-06-18 16:01:19 -04:00
wozeparrot
ce1ed374c9
more tinychat fixes (#4971)
2024-06-15 16:29:39 -07:00
wozeparrot
8209cd3c55
easier llama3 + fetch subdir (#4938)
2024-06-14 13:47:27 -07:00
wozeparrot
3d13c23bfa
llama3 --download_model (#4922)
2024-06-11 22:59:59 -07:00
wozeparrot
6c24eda522
feat: tinychat (#4869)
2024-06-08 12:05:45 -07:00
wozeparrot
ed0a740fe4
greater chat api endpoint compat (#4792)
2024-05-30 22:47:31 -07:00
chenyu
7624ad3ddd
add --timing and --profile to llama3 example (#4767)
2024-05-28 16:24:44 -04:00
chenyu
31358cbea5
change Tensor.stack to method (#4719)
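That is, the staticmethod-style call becomes a method on the first tensor; a small usage sketch (exact signature assumed from the title):
```
from tinygrad import Tensor

a, b = Tensor([1, 2]), Tensor([3, 4])
c = a.stack(b)    # method form: stacks b onto a along a new leading dim
print(c.shape)    # (2, 2)
```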
2024-05-24 17:04:19 -04:00
chenyu
5e3fbbb33e
llama3 example add manual seed and log seed (#4667)
2024-05-20 19:09:57 -04:00
chenyu
ae861325ce
update llama sample for mac 32 input buffer limit (#4662)
set the default sampling params in the function call to 0, and top_k in llama3 to 25.
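For reference, top_k=25 restricts sampling to the 25 most likely tokens; a generic sketch of top-k sampling (an illustration, not the example's exact code):
```
import numpy as np

def sample_top_k(logits: np.ndarray, k: int = 25, temperature: float = 1.0) -> int:
    # keep the k highest logits, softmax over them, then sample
    top = np.argpartition(-logits, k)[:k]
    probs = np.exp((logits[top] - logits[top].max()) / temperature)
    probs /= probs.sum()
    return int(np.random.choice(top, p=probs))
```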
2024-05-20 17:23:39 -04:00
wozeparrot
b144d4b460
new llama3 example (#4576)
2024-05-19 22:42:23 -07:00