This makes the program wait for the Tracy profiler to connect before
exiting and flush profiling data after each token.
I don't know how to select the Tracy iree-runtime variant
programmatically -- instead, print an error and exit.
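As a rough illustration (not the actual patch), a startup check along these lines works, assuming the Tracy-instrumented Python runtime is selected via the `IREE_PY_RUNTIME` environment variable as described in IREE's profiling docs:
```python
import os
import sys

# The Tracy runtime variant must be chosen before `iree.runtime` is first
# imported, so all we can do programmatically is detect the mismatch and exit.
if os.environ.get("IREE_PY_RUNTIME") != "tracy":
    sys.exit(
        "error: profiling requested but the Tracy runtime is not active; "
        "re-run with IREE_PY_RUNTIME=tracy"
    )
```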
- Move statistics out of the main loop
- Add 'end-to-end' numbers
- Switch the main display unit from s to ms
- Start measuring time at 0
The new print format looks like this:
```
Number of iterations: 5
Num tokens: 1 (prompt), 512 (generated), 513 (total)
Prefill: avg. 0.01 ms (stdev 0.00), avg. 97.99 tokens/s
Decode: avg. 4840.44 ms (stdev 28.80), avg. 97.99 tokens/s
Decode end-2-end: avg. 85.78 tokens/s (w/o prompt), avg. 95.98 (w/ prompt)
```
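For reference, a minimal sketch of how such averages could be computed with the standard library (the variable names and timing values are placeholders, not the actual implementation):
```python
from statistics import mean, stdev

# Placeholder per-iteration timings, collected outside the main loop.
prefill_ms = [0.011, 0.009, 0.010, 0.012, 0.010]
decode_ms = [4811.2, 4825.7, 4840.1, 4851.9, 4873.3]
num_generated = 512

print(f"Number of iterations: {len(decode_ms)}")
print(f"Prefill: avg. {mean(prefill_ms):.2f} ms (stdev {stdev(prefill_ms):.2f})")
tok_per_s = num_generated / (mean(decode_ms) / 1000.0)
print(
    f"Decode: avg. {mean(decode_ms):.2f} ms "
    f"(stdev {stdev(decode_ms):.2f}), avg. {tok_per_s:.2f} tokens/s"
)
```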
Add a new flag `-Xiree_compile` to forward extra compiler arguments to
`iree-compile`. This flag can be set multiple times to pass more than
one extra argument.
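A minimal sketch of how such a repeatable flag can be declared with argparse (the destination name is an assumption; the example arguments are standard MLIR options):
```python
import argparse

parser = argparse.ArgumentParser()
# `action="append"` collects one extra iree-compile argument per occurrence.
parser.add_argument(
    "-Xiree_compile",
    action="append",
    default=[],
    dest="extra_compile_args",
    metavar="ARG",
)

args = parser.parse_args(
    ["-Xiree_compile=--mlir-print-ir-after-all", "-Xiree_compile=--mlir-timing"]
)
print(args.extra_compile_args)  # forwarded verbatim to iree-compile
```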
* Switch most compile flows to use ireec.compile_file (see the sketch after this list).
* Re-add the input type to the compile_str path.
* Check if mlir_module exists before checking whether it's a path or a Python object.
* Fix some save_dir cases.
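A rough sketch of the dispatch these bullets describe, assuming the iree.compiler Python API (compile_file / compile_str accepting target_backends, input_type, and extra_args keywords); the helper name and defaults are illustrative:
```python
import os
from iree.compiler import compile_file, compile_str

def compile_module(mlir_module, target_backends, input_type="auto", extra_args=()):
    if mlir_module is None:
        raise ValueError("no MLIR module provided")
    # File paths go through compile_file; in-memory modules go through
    # compile_str, which needs the input type passed explicitly again.
    if isinstance(mlir_module, (str, os.PathLike)) and os.path.isfile(mlir_module):
        return compile_file(
            str(mlir_module),
            target_backends=target_backends,
            input_type=input_type,
            extra_args=list(extra_args),
        )
    return compile_str(
        mlir_module,
        target_backends=target_backends,
        input_type=input_type,
        extra_args=list(extra_args),
    )
```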
Print averaged results at the end of all iterations. Increase the
default number of iterations to 5.
Example:
```
Number of iterations: 5
Prefill: avg. 0.03 s, stddev 0.00
Decode: avg. 43.34 tokens/s, stdev 0.13
```
Also remove the -2 in the number of generated tokens -- I did not find
any evidence we need it.
Add flags to enable a non-interactive mode for microbenchmarking Llama
models. In this mode, the system and user prompts are specified with CLI
flags, and the number of generated tokens and iterations is fixed.
Also move the stats below the response and trim any whitespace around
the response.
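A sketch of what the flag surface could look like (all flag names and the run_llama stub are hypothetical, not the actual CLI):
```python
import argparse

def run_llama(system_prompt, prompt, num_tokens, num_iterations):
    # Stub standing in for the real model invocation.
    return "  ...generated text...  ", "Decode: avg. 43.34 tokens/s, stdev 0.13"

parser = argparse.ArgumentParser()
parser.add_argument("--benchmark", action="store_true")
parser.add_argument("--system_prompt", default="")
parser.add_argument("--prompt", default="Hello")
parser.add_argument("--num_tokens", type=int, default=512)
parser.add_argument("--num_iterations", type=int, default=5)
args = parser.parse_args()

if args.benchmark:
    response, stats = run_llama(
        args.system_prompt, args.prompt, args.num_tokens, args.num_iterations
    )
    print(response.strip())  # response first, surrounding whitespace trimmed
    print(stats)             # stats below the response
```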
Update vmfb naming for Vulkan devices in order to resolve naming
conflicts in the presence of multiple Vulkan devices.
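For illustration, a naming helper along these lines avoids the collision (the function and filename format are hypothetical):
```python
def vulkan_vmfb_name(model_name, device_name, device_index):
    # Include both the device name and its enumeration index so two
    # identical GPUs no longer map to the same cached .vmfb file.
    safe = device_name.lower().replace(" ", "_")
    return f"{model_name}_vulkan_{safe}_{device_index}.vmfb"

print(vulkan_vmfb_name("llama2_7b", "AMD Radeon RX 7900 XTX", 0))
# -> llama2_7b_vulkan_amd_radeon_rx_7900_xtx_0.vmfb
```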
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
-- This commit fixes the wrong Vulkan device being selected at
runtime.
-- It also adds a couple of IREE compilation flags to target a specific
Vulkan device (see the sketch below).
-- It also changes the Vulkan device listing to better match the
lowering control flow.
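A sketch of the two sides of this, assuming IREE's Vulkan backend flags (the target triple shown is just an example) and the device-URI form accepted by the Python runtime:
```python
import iree.runtime as ireert

# Compile-time: pin the Vulkan target architecture.
vulkan_flags = [
    "--iree-hal-target-backends=vulkan-spirv",
    "--iree-vulkan-target-triple=rdna2-unknown-linux",  # example triple
]

# Run-time: select the GPU explicitly by index instead of relying on the
# driver's default enumeration order.
device = ireert.get_device("vulkan://0")
```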
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
The past key values are only used within the models themselves and can
be kept on device. For Vulkan int4, this gives 44 tok/s (for the first
prompt) and settles at around 26 tok/s on a 7900 XTX.
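A minimal sketch of the idea using iree.runtime device arrays (the cache shape and the forward call are placeholders):
```python
import numpy as np
import iree.runtime as ireert

device = ireert.get_device("vulkan://0")

# Upload the initial (empty) cache once; the DeviceArray results returned
# by the module can be fed straight back in without a host round-trip.
past_kv = ireert.asdevicearray(device, np.zeros((1, 32, 0, 128), np.float16))

# Inside the decode loop (placeholder call):
#   logits, past_kv = module.forward(token, past_kv)  # past_kv stays on device
```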
* WIP: MSVC ROCm support for SHARK Studio
* Make get_iree_rocm_args platform-agnostic (see the sketch after this list).
* Update stable_args.py
* Update ROCm arg handling in SD utils
* Guard quantization imports.
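A rough sketch of what platform-agnostic ROCm argument handling could look like (the target-chip flag follows IREE's ROCm backend of the time; the default arch and the Windows branch are placeholders):
```python
import platform

def get_iree_rocm_args(arch="gfx1100"):
    # Same compile flags on every OS; only genuinely platform-specific
    # handling (e.g. HIP SDK paths on Windows/MSVC) is branched.
    args = [f"--iree-rocm-target-chip={arch}"]
    if platform.system() == "Windows":
        pass  # Windows-specific setup would go here.
    return args
```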
Co-authored-by: jam https://github.com/jammm
* [Llama2] Add a fix for Llama2 13B downloading/crashing
-- This commit fixes Llama2 13B downloading/crashing due to the wrong
.mlir file.
-- It also adds support for downloading the vmfb from shark_tank in the CLI.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
* [llama2] Add a spec file to run the Llama/Vicuna CLI exe
-- This commit adds a spec file to run the Llama/Vicuna CLI exe.
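For context, and assuming this refers to a PyInstaller spec used to build the CLI executable, such a spec file has roughly this shape (a minimal sketch; the entry-point and executable names are placeholders, not the actual spec contents):
```python
# llama2_cli.spec -- evaluated by PyInstaller, which injects Analysis/PYZ/EXE.
a = Analysis(["llama2_cli.py"], datas=[], hiddenimports=[])
pyz = PYZ(a.pure)
exe = EXE(
    pyz,
    a.scripts,
    a.binaries,
    a.datas,
    name="llama2_cli",
    console=True,
)
```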
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
---------
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>