AMD-SHARK-Studio

mirror of https://github.com/nod-ai/AMD-SHARK-Studio.git synced 2026-04-03 03:00:17 -04:00

Author	SHA1	Message	Date
Vivek Khandelwal	3cc643b2de	Add support for StableLM-3B model (#2019 ) * Add support for StableLM-3B model * Add support for Quantized StableLM-3B model * Update stablelm_pipeline.py	2023-12-12 22:39:50 +05:30
Phaneesh Barwaria	bf70e80d20	vulkan device id fix (#2028 )	2023-12-08 19:00:26 -06:00
Ean Garvey	3322b7264f	(vicuna.py) Move enable_tracy_tracing outside of BenchmarkRunInfo (#2011 )	2023-12-06 14:57:32 -06:00
Elias Joseph	1a723645fb	finalized fixes for sharded llama2	2023-12-06 15:35:29 +05:30
Eliasj42	dfdd3b1f78	improved sharded performance and fixed issue with lmhead on rocm (#2008 ) * improved sharded performance and fixed issue with lmhead on rocm * mmap shards + disable sharing of device arrays across devices * fix device_idx for non-layer vmfbs * fix time calc for sharded --------- Co-authored-by: Elias Joseph <elias@nod-labs.com> Co-authored-by: PhaneeshB <b.phaneesh@gmail.com>	2023-12-05 11:53:44 -08:00
Eliasj42	9c50edc664	fixed functionality of sharded vicuna/llama2 (#1982 ) Co-authored-by: Elias Joseph <elias@nod-labs.com>	2023-12-04 09:11:52 -08:00
Vivek Khandelwal	396a054856	Fix Sharded Falcon-180b	2023-11-30 21:51:57 +05:30
Vivek Khandelwal	5c66948d4f	Fix unsharded Falcon pipeline	2023-11-30 21:51:57 +05:30
Vivek Khandelwal	666e601dd9	Remove sharding support for non-180B falcon variants	2023-11-27 13:45:13 +05:30
Vivek Khandelwal	ca58908e5b	Add Falcon-GPTQ Support for 2-way sharding	2023-11-27 13:45:13 +05:30
Jakub Kuderski	1f5b39f56e	[vicuna.py] Add option to enable tracing (#1993 ) This makes the program wait for the tracy profiler to connect before exiting and flush profiling data after each token. I don't know how to select the tracy iree-runtime variant programmatically -- instead, print an error and exit.	2023-11-24 12:25:03 -08:00
Jakub Kuderski	2da31c4109	[vicuna.py] Rework benchmark statistics calculation (#1992 ) - Move statistics out of the main loop - Add 'end-to-end' numbers - Switch the main display unit from s to ms - Start measuring time at 0 The new print format looks like this: ``` Number of iterations: 5 Num tokens: 1 (prompt), 512 (generated), 513 (total) Prefill: avg. 0.01 ms (stdev 0.00), avg. 97.99 tokens/s Decode: avg. 4840.44 ms (stdev 28.80), avg. 97.99 tokens/s Decode end-2-end: avg. 85.78 tokens/s (w/o prompt), avg. 95.98 (w/ prompt) ```	2023-11-23 12:04:03 -05:00
jinchen62	dd37c26d36	Update brevitas quant api (#1975 )	2023-11-15 10:04:07 -08:00
PhaneeshB	54bff4611d	fix cli rocm device selection	2023-11-13 23:35:55 +05:30
PhaneeshB	11510d5111	add intra rocm vmfb differentiator	2023-11-13 23:35:55 +05:30
PhaneeshB	392bade0bf	enable non default rocm device selection for webui	2023-11-13 23:35:55 +05:30
dependabot[bot]	df20cf9c8a	Bump langchain in /apps/language_models/langchain (#1968 ) Bumps [langchain](https://github.com/langchain-ai/langchain) from 0.0.325 to 0.0.329. - [Release notes](https://github.com/langchain-ai/langchain/releases) - [Commits](https://github.com/langchain-ai/langchain/compare/v0.0.325...v0.0.329) --- updated-dependencies: - dependency-name: langchain dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-11-12 19:46:00 -08:00
dependabot[bot]	f41ad87ef6	Bump langchain in /apps/language_models/langchain (#1926 ) Bumps [langchain](https://github.com/langchain-ai/langchain) from 0.0.202 to 0.0.325. - [Release notes](https://github.com/langchain-ai/langchain/releases) - [Commits](https://github.com/langchain-ai/langchain/compare/v0.0.202...v0.0.325) --- updated-dependencies: - dependency-name: langchain dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-11-09 11:03:47 -06:00
dependabot[bot]	d811524a00	Bump pypdf from 3.12.2 to 3.17.0 in /apps/language_models/langchain (#1929 ) Bumps [pypdf](https://github.com/py-pdf/pypdf) from 3.12.2 to 3.17.0. - [Release notes](https://github.com/py-pdf/pypdf/releases) - [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md) - [Commits](https://github.com/py-pdf/pypdf/compare/3.12.2...3.17.0) --- updated-dependencies: - dependency-name: pypdf dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-11-09 11:02:43 -06:00
PhaneeshB	ab0e870c43	fix vicuna cli vulkan	2023-11-09 22:27:13 +05:30
Jakub Kuderski	488a172292	[vicuna.py] Allow passing extra arguments to iree-compile (#1935 ) Add a new flag `-Xiree_compile` to forward extra compiler arguments to `iree-compile`. This flag can be set multiple times to pass more than one extra argument.	2023-11-06 12:12:34 -05:00
Vivek Khandelwal	92b694db4d	Add support for Falcon-40b-GPTQ	2023-11-06 19:49:19 +05:30
Vivek Khandelwal	322874f7f9	Fix issue in Falcon-GPTQ	2023-11-03 11:48:36 +05:30
Vivek Khandelwal	71846344a2	Add sharded Falcon-GPTQ support This commit adds the support for sharded Falcon-7b-GPTQ and Falcon-180B-GPTQ. This commit also adds the support for 4-way sharding of the Falcon model for the device ROCM. Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>	2023-11-01 12:11:44 +05:30
Vivek Khandelwal	ea920f2955	Add sharded Falcon support	2023-10-26 21:53:25 +05:30
Vivek Khandelwal	205e57683a	Modify Falcon-180b-GPTQ sharded pipeline	2023-10-17 20:26:01 +05:30
Vivek Khandelwal	2866d665ee	Fix Sharded Falcon-180b-GPTQ Pipeline	2023-10-17 20:26:01 +05:30
Vivek Khandelwal	202ffff67b	Add support for sharded Falcon model	2023-10-13 22:05:10 +05:30
Vivek Khandelwal	b83d32fafe	Fix Falcon GPTQ Pipeline	2023-10-11 20:09:32 +05:30
Vivek Khandelwal	0a618e1863	Add support for Falcon GPTQ	2023-10-11 10:47:48 +05:30
Gaurav Shukla	6e409bfb77	fix else if syntax error Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>	2023-10-10 06:23:56 +05:30
Ean Garvey	66f6e79d68	Split CPU/GPU definitions conditionally outside of torch contexts. (#1879 )	2023-10-09 16:46:41 -07:00
Ean Garvey	3b825579a7	(LLaMa-2) Point to int4 + f32 acc .mlir for cpu (#1878 ) - fixes some issues with non-system prompt invocation Co-authored-by: Gaurav Shukla <gauravshukla789@gmail.com>	2023-10-09 14:37:35 -05:00
Ean Garvey	caf6cc5d8f	Switch most compile flows to use ireec.compile_file. (#1863 ) * Switch most compile flows to use ireec.compile_file. * re-add input type to compile_str path. * Check if mlir_module exists before checking if it's a path or pyobject. * Fix some save_dir cases	2023-10-06 23:04:43 -05:00
Ean Garvey	8614a18474	Remove tf dependencies from importer path. (#1874 ) * Remove tf dependencies from import path. * Fix formatting.	2023-10-06 12:27:12 -07:00
Jakub Kuderski	86c1c0c215	Add aggregate statistics to microbenchmark (#1871 ) Print averaged results at the end of all iterations. Increase the default number of iterations to 5. Example: ``` Number of iterations: 5 Prefill: avg. 0.03 s, stddev 0.00 Decode: avg. 43.34 tokens/s, stdev 0.13 ``` Also remove the -2 in the number of generated tokens -- I did not find any evidence we need it.	2023-10-06 10:03:07 -07:00
Daniel Garvey	8bb364bcb8	enforce fp32 accumulates for cpu (#1873 )	2023-10-06 11:34:49 -05:00
Daniel Garvey	7abddd01ec	argmax inside model + brevitas pin (#1872 )	2023-10-05 20:15:21 -07:00
Abhishek Varma	2a451fa0c7	[Llama2] Add a standalone utility for dynamic and combining IRs -- This script adds a standalone utility for converting Llama IRs to dynamic and combining them as well. Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>	2023-10-05 20:01:06 +05:30
Jakub Kuderski	9c4610b9da	Add microbenchmark mode to vicuna CLI (#1864 ) Add flags to enable a non-interactive mode for microbenchmarking llama models. In this mode, the system and user prompts are specified with CLI flags, and the number of generated tokens and iterations is fixed. Also move the stats below the response and trim any leading or trailing whitespace from the response.	2023-10-05 00:12:08 -04:00
Gaurav Shukla	7cc9b3f8e8	[llama cli] Fix llama cli Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>	2023-10-03 20:39:53 +05:30
Vivek Khandelwal	8dd7850c69	Add Falcon-GPTQ support	2023-10-02 16:39:57 +05:30
Gaurav Shukla	e930ba85b4	[os] Remove os dependency from vmfb naming (#1854 ) Also fixes a small ui issue for chatbot. Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>	2023-09-29 12:38:17 -05:00
Gaurav Shukla	cd732e7a38	[chatbot] split execution time to prefill and decode Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>	2023-09-29 13:18:03 +05:30
Gaurav Shukla	82f833e87d	[vulkan] Update vmfb naming Update vmfb naming for vulkan devices in order to resolve naming conflicts in the presence of multiple vulkan devices. Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>	2023-09-28 14:52:11 +05:30
Vivek Khandelwal	c9d6870105	Modify falcon pipeline for 180b support	2023-09-28 12:39:35 +05:30
Abhishek Varma	9a0efffcca	[Llama2] Fix wrong Vulkan device ID + Add Vulkan compile flags -- This commit fixes the wrong Vulkan device being selected during runtime. -- It also adds a couple of IREE compilation flags to target a specific Vulkan device. -- It also changes the Vulkan device listing to be more in tune with lowering control flow. Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>	2023-09-22 22:24:18 +05:30
Quinn Dawkins	ded74d09cd	[vicuna.py] Keep past key values on device (#1836 ) The past key values are only used within the models themselves and can be kept on device. For vulkan int4, this gives 44 tok/s (for the first prompt) and settles at around 26 tok/s on 7900xtx.	2023-09-19 18:17:41 -04:00
PhaneeshB	b817bb8455	add roles for llama2	2023-09-12 10:59:28 +05:30
Abhishek Varma	c854208d49	[Llama2] Prefetch llama2 tokenizer configs (#1824 ) -- This commit prefetches llama2 tokenizer configs from shark_tank. Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>	2023-09-08 11:29:54 -07:00

178 Commits