tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-03 11:14:56 -05:00

Author	SHA1	Message	Date
George Hotz	c80884884e	event driven hip (#3160) * event driven hip * simpler, src makes copy * pass mypy	2024-01-18 14:35:18 -08:00
George Hotz	a72b1b6d65	sharding for llama (#3151) * shard llama * sharding works * simpler * simpler * consume option * disable that test * save a line --------- Co-authored-by: George Hotz <george@tinygrad.org>	2024-01-16 19:28:00 -08:00
George Hotz	655c6f61d3	St real size (#3046) * track the size in the lazybuffer * shapetracker real size * lint	2024-01-08 14:44:53 -08:00
George Hotz	c003be7309	Revert "track size in shapetracker" (#3043) * Revert "track size in shapetracker (#3026)" This reverts commit `a8ba1ac08f`. * st.size	2024-01-08 13:13:39 -08:00
George Hotz	ebb81e8f11	hotfix: st.size() -> st.size in llama	2024-01-05 20:18:52 -08:00
chenyu	f88506e630	move gpt2/llama sampling inside the model call (#3013) * move gpt2/llama sampling inside the model call * argmax uses one more kernel	2024-01-04 17:01:50 -05:00
George Hotz	a280cfe169	move dtypes to dtype.py (#2964) * move dtypes to dtype.py * fix urllib	2024-01-01 14:58:48 -08:00
George Hotz	c81ce9643d	move globalcounters to ops (#2960) * move globalcounters to ops * missed a few * sick of that failing	2024-01-01 14:21:02 -08:00
George Hotz	0fd44259cd	bf16 fix + cleanups from mixtral (#2698) * bf16 fix + cleanups from mixtral * generic bf16 cast	2023-12-10 16:31:52 -08:00
chenyu	fae5394845	validate llama output (#2681) * validate llama output * does not work with quantize	2023-12-08 16:42:01 -05:00
chenyu	539b00a645	move llama getenv("JIT") from models to examples (#2671) Transformer class has a jit param so we should use that in the caller	2023-12-07 12:43:22 -05:00
Oleg Rybalko	5e87083783	Whisper + LLAMA + VITS (#2332) * feat: working voice 2 text using whisper * feat: added llama generation * feat: vits init * feat: more accurate voice conversion * feat: support for tts and working pipeline for the first pass * fix: linter checks * refactored vits initialization and inference, added mmts-tts support * fixed process sync and now we can have an infinite conversation * reuse output stream to remove overhead of creating a new one each time * added pre-prompt configuration with yaml files * adjusted code to merge PR which changed whisper * optimized whisper, now it's blazing fast and also reduced number of lines * added better debug printing * use jitted encode function for whisper, added timings and removed response delim to save speed on generating those tokens * fixed hf convert and now it's working with tinyllama * added tinyllama config * refactored code and made it work with all llama models * prettier order * prettier order * fixed suffix for tinyllama and refactored convert_from_hf * added missing parameters * fixed stream release and added missing params * jitted dp and encoder * jitted flow forward * removed re-init of espeak on each call to save up time * jitted generator forward for blazing fast tts * added contextmanager for displaying a chat log * removed whitespace for pylint * updated code to support latest fetch func * wait for llama eos token and pass params from cli to llama * listen for not fixed amount of time * refactored code a bit * removed thresholding and now the output streams directly to whisper * tokenize llama output for vits batch size to work and stream each sentence to a speaker * changed speaker * whisper is now printing on the same line * don't trigger llama on whisper output in parens * added tinyllama chat model * adjusted code to work with tinyllama chat model * removed unused cli arg * autofetch tokenizer and tinyllama model. add 3 chat tokens to the tokenizer * fixed issue with long sentences by chunking them * support for multiline llama output * prettified log output * adjusted sentence length * remove quote from response to avoid funny tts * fixed prompts * added missing parameter	2023-12-02 15:03:46 -08:00
Davi Silva	ddeec24fa8	Cleanup & fix llama.py (#2524) * docs, cleanup crap * comma AI * fix 70B * this is why lexical scope exists	2023-11-30 16:00:17 -05:00
George Hotz	9e07824542	move device to device.py (#2466) * move device to device.py * pylint test --disable R,C,W,E --enable E0611 * fix tests	2023-11-27 11:34:37 -08:00
George Hotz	7170a9a057	coder.py can write and run code (#2439) * wip mistral * coder * touchups * cleanups * mistral cleanups * clean up cache create * download the weights, fix tests * fix llama loading * global fixup * clean up all * move llama model * cleanups * Revert "cleanups" This reverts commit `a71c5d59eb`. * fine, leave it	2023-11-25 12:27:54 -08:00
Davi Silva	df41a57e09	Fix: missing n_kv_heads for smaller models from huggingface (#2438) * fix: missing n_kv_heads for smaller models from huggingface * a lil golfing	2023-11-25 10:29:04 -08:00
George Hotz	5bb720a777	Cocoa is no longer used	2023-11-23 14:31:21 -08:00
George Hotz	2dec86970a	hotfix: default remains gen 1 llama	2023-11-21 14:43:02 -08:00
Oleg Rybalko	7220f5c9fc	fixed hf convert and now it's working with tinyllama (#2374) * fixed hf convert and now it's working with tinyllama * added tinyllama config * refactored code and made it work with all llama models * prettier order * prettier order * fixed suffix for tinyllama and refactored convert_from_hf * dynamically update help if MODEL_PARAMS changes and default size is the 1st	2023-11-21 14:36:52 -08:00
Friedrich Carl Eichenroth	75676ab8e1	Profiling-helper (#2321) * change profiler * remove unused imports * remove unused imports * change lazybuffer references * remove unused line * remove unused import * remove unused stuff * add types * typing * typing * typing * trigger actions * -1 loc * fixup * trigger actions * revert lazy typing changes * WIP profiler helper * replace old start & stop profiler * fixup * linting * Update llama.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-11-16 14:15:56 -08:00
George Hotz	3baaf298d6	two stage cumsum in tensor.py (#2331) * two stage cumsum in tensor.py * 2 more kernels for llama cumsum * gpt-2 and llama use fast multinomial	2023-11-16 12:09:53 -08:00
George Hotz	70a65c201e	JIT support in Interpreted (#2314) * factor that out * jit is supported everywhere * fix some tests * there's no jit supported device, the jit is everywhere * fix test uops	2023-11-15 11:13:38 -08:00
George Hotz	01f8781c26	fix CI (#2300) * might work * might work 2 * might work 3 * sneak that in to llama too * pin them all	2023-11-14 11:02:59 -08:00
chenyu	a72b370066	llama take int and convert to Variable internally (#2284)	2023-11-12 17:11:37 -05:00
chenyu	453f48ce02	pad None means (0,0) (#2273)	2023-11-11 09:50:26 -08:00
chenyu	880e693207	fix llama n_kv_heads in kvcache (#2267) * fix llama n_kv_heads in kvcache * trigger ci	2023-11-10 21:44:39 -05:00
chenyu	a753c8e071	examples of new GPT2 and JIT change (#2261) * var_vals are global * working with global ish * better * fix export model * fix tests * better kv cache * does it run? * use where for kvmask * fix excessive var_vals * fix import * how does multigpu use this? * llama kinda work * faster and simpler * cleanup * fix conversation mode * test cleanups * fix one more test * test cleanup --------- Co-authored-by: George Hotz <geohot@gmail.com>	2023-11-10 15:07:02 -05:00
George Hotz	2f7aab3d13	move optimize_local_size (#2221) * move optimize_local_size * interpret_ast	2023-11-05 21:00:52 -08:00
chenyu	8548b20b23	fix codellama params and repeat_kv (#2181)	2023-10-30 10:16:26 -07:00
will	bc0829b677	Fix llama json loading (#2160)	2023-10-27 10:21:56 -10:00
nimlgen	e21bf776c8	fix debug=1 llama/gpt2 timings (#2143)	2023-10-24 15:45:00 -04:00
chenyu	e2b83f1b42	Variable.bind newer (#2017) * Variable.bind attempt 2 * ShapeTracker.unbind * fix llama * fix types * test case * View.vars cleanup * include mask in symbolic source * mask can be sint * st.unbind in bufferops * assert ast contain free Variable only * cleanup * conservative unbinding reduce op arg * move reduceop unbind * fix llama JIT arg behavior	2023-10-10 10:03:01 -07:00
chenyu	25555c836f	llama default to JIT only if device supports JIT (#2028)	2023-10-09 17:26:02 -07:00
chenyu	05be57f57f	Fix llama with empty prompt (#1997) * fix llama with one token prompt * llama is all_jitted	2023-10-06 06:48:07 -07:00
chenyu	da2b3e55f4	simpler llama - don't shrink twice (#1981)	2023-10-05 14:31:46 -07:00
chenyu	ebcda8a714	Move var_vals from ShapeTracker to LazyBuffer (#1819)	2023-09-08 09:25:10 -07:00
Yixiang Gao	22cf15e9d0	convert function into tinygrad (#1803)	2023-09-06 14:41:26 -07:00
badcc	fd25792c8b	Ensure freqs as type float32 in freqs_cis (#1798)	2023-09-06 10:24:15 -07:00
George Hotz	fb1cc6bf4b	llama jit is default, print tok/sec (#1774) * llama jit is default, print tok/sec * jit not default in CI	2023-09-05 10:12:16 -07:00
Yixiang Gao	66a6bbd029	codellama (#1702) * add codellama with pre-downloaded weights * add rope_theta, fix param * fix test * add 7B-Python * add 7B-Instruct * replace single quotes with doulbe --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-09-02 08:45:12 -07:00
nimlgen	b5cf274da3	remove memory peak for quantized llama (#1720)	2023-08-30 16:32:30 -04:00
chenyu	e4eb5d55c7	critical realize for unjitted llama (#1718)	2023-08-30 14:52:32 -04:00
Karan Handa	a8aa13dc91	[ready] Replacing os with pathlib (#1708) * replace os.path with pathlib * safe convert dirnames to pathlib * replace all os.path.join * fix cuda error * change main chunk * Reviewer fixes * fix vgg * Fixed everything * Final fixes * ensure consistency * Change all parent.parent... to parents	2023-08-30 10:41:08 -07:00
chenyu	ac183568be	llama JIT python runtime speedup (#1633) * no JIT call in TransformerBlock * idea * move 2 reshapes to jitted function shrink inside jitted too, 6.3ms remove back reshapes, 5.5ms isinstance -> __class__ 4.99ms * think revert ops_gpu.py revert symbolic.py too PYOPENCL_COMPILER_OUTPUT=1 * cleanup * fix cache shape for conversational model only reshape if start_pos > 0 * small cleanup * include var_vals.keys() to st.key * add comments * llama small update * everything jitted again, similar structure to gpt2 * fix typing * add TODO for in place update cache	2023-08-30 07:51:05 -07:00
Olivier Chafik	ee6d8de2dc	Llama: load models in HuggingFace format (incl. indexed, safetensors) (#1583)	2023-08-28 15:11:40 -04:00
George Hotz	a6d842af7a	move device to ops (#1646) * move device to ops * mlops types * 2 lines	2023-08-23 08:30:17 -07:00
George Hotz	d3c401ba3c	llama quantize: scale uses mul, not div	2023-08-22 11:48:56 -07:00
chenyu	89e13f2f04	support symbols in shrink (#1611)	2023-08-22 09:08:21 -07:00
George Hotz	718ced296c	move state to nn/state (#1619)	2023-08-22 07:36:24 -07:00
Umut Zengin	f720682beb	np.argmax to Tensor.argmax (#1608) * to tensor argmax * removed keepdim * training update	2023-08-21 15:22:29 -07:00

84 Commits