PhaneeshB
e5ed167f03
mmap shards + disable sharing of device arrays across devices
2023-12-06 01:20:47 +05:30
Elias Joseph
051ba5de63
improved sharded performance and fixed issue with lmhead on rocm
2023-12-06 01:20:47 +05:30
Vivek Khandelwal
396a054856
Fix Sharded Falcon-180b
2023-11-30 21:51:57 +05:30
Vivek Khandelwal
5c66948d4f
Fix unsharded Falcon pipeline
2023-11-30 21:51:57 +05:30
Vivek Khandelwal
666e601dd9
Remove sharding support for non-180B falcon variants
2023-11-27 13:45:13 +05:30
Vivek Khandelwal
ca58908e5b
Add Falcon-GPTQ Support for 2-way sharding
2023-11-27 13:45:13 +05:30
jinchen62
dd37c26d36
Update brevitas quant api (#1975)
2023-11-15 10:04:07 -08:00
Vivek Khandelwal
92b694db4d
Add support for Falcon-40b-GPTQ
2023-11-06 19:49:19 +05:30
Vivek Khandelwal
322874f7f9
Fix issue in Falcon-GPTQ
2023-11-03 11:48:36 +05:30
Vivek Khandelwal
71846344a2
Add sharded Falcon-GPTQ support
This commit adds support for sharded Falcon-7b-GPTQ and
Falcon-180B-GPTQ. It also adds support for 4-way sharding
of the Falcon model on ROCm devices.
Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
2023-11-01 12:11:44 +05:30
Vivek Khandelwal
ea920f2955
Add sharded Falcon support
2023-10-26 21:53:25 +05:30
Vivek Khandelwal
205e57683a
Modify Falcon-180b-GPTQ sharded pipeline
2023-10-17 20:26:01 +05:30
Vivek Khandelwal
2866d665ee
Fix Sharded Falcon-180b-GPTQ Pipeline
2023-10-17 20:26:01 +05:30
Vivek Khandelwal
202ffff67b
Add support for sharded Falcon model
2023-10-13 22:05:10 +05:30
Vivek Khandelwal
b83d32fafe
Fix Falcon GPTQ Pipeline
2023-10-11 20:09:32 +05:30
Vivek Khandelwal
0a618e1863
Add support for Falcon GPTQ
2023-10-11 10:47:48 +05:30
Ean Garvey
66f6e79d68
Split CPU/GPU definitions conditionally outside of torch contexts. (#1879)
2023-10-09 16:46:41 -07:00
Ean Garvey
caf6cc5d8f
Switch most compile flows to use ireec.compile_file. (#1863)
* Switch most compile flows to use ireec.compile_file.
* re-add input type to compile_str path.
* Check if mlir_module exists before checking if it's a path or pyobject.
* Fix some save_dir cases
2023-10-06 23:04:43 -05:00
Ean Garvey
8614a18474
Remove tf dependencies from importer path. (#1874)
* Remove tf dependencies from import path.
* Fix formatting.
2023-10-06 12:27:12 -07:00
Daniel Garvey
8bb364bcb8
enforce fp32 accumulates for cpu (#1873)
2023-10-06 11:34:49 -05:00
Daniel Garvey
7abddd01ec
argmax inside model + brevitas pin (#1872)
2023-10-05 20:15:21 -07:00
Vivek Khandelwal
8dd7850c69
Add Falcon-GPTQ support
2023-10-02 16:39:57 +05:30
Vivek Khandelwal
c9d6870105
Modify falcon pipeline for 180b support
2023-09-28 12:39:35 +05:30
Daniel Garvey
671881cf87
Llama2 70b (#1783)
* llama2 70b IR gen
* fix IR sec llama2 + debug
* llama2 70b
---------
Co-authored-by: PhaneeshB <b.phaneesh@gmail.com>
2023-08-25 23:04:28 -07:00
Ean Garvey
9c8cbaf498
Add support for ROCM (Windows) in Studio + compile utils (#1770)
* WIP: MSVC ROCM support for SHARK Studio
* Make get_iree_rocm_args platform-agnostic.
* Update stable_args.py
* Update rocm arg handling in SD utils
* Guard quantization imports.
Co-authored-by: jam https://github.com/jammm
2023-08-25 20:56:05 -07:00
jinchen62
51f90a4d56
Update conversion passes for brevitas quant op (#1795)
2023-08-25 17:28:07 -05:00
Ean Garvey
9697981004
Pipe through a debug option to iree compile utils. (#1796)
* Update compile_utils.py
* Pipe through a flag to toggle debug options in compile utils.
* Update SharkLLMBase.py
2023-08-25 07:11:11 -07:00
Abhishek Varma
db990826d3
Add Llama2 13B int4 fp16 support (#1784)
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
2023-08-23 10:00:32 -07:00
Vivek Khandelwal
05889a8fe1
Add LLaMa2-int4-fp16 support (#1782)
2023-08-22 07:45:50 -07:00
jinchen62
8738571d1e
Adapt the change of brevitas custom op name (#1772)
2023-08-17 14:24:43 -07:00
Eliasj42
ed484b8253
added functionality for int8 vicuna and 4 shards (#1712)
combined vicuna_4_shards.py and vicuna.py to reduce code duplication
Co-authored-by: Elias Joseph <elias@nod-labs.com>
2023-08-04 14:05:05 -05:00
Gaurav Shukla
bd30044c0b
[Shard] Add sharding generation in shark studio
Signed-Off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-08-04 21:51:14 +05:30
Abhishek Varma
47f8a79c75
[MiniGPT4] Add MiniGPT4 to SHARK (#1554)
* [MiniGPT4] Add MiniGPT4 to SHARK
-- This is the first installment of MiniGPT4 in SHARK.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
* Add int8 support for MiniGPT4
-- This commit adds int8 support for MiniGPT4.
Signed-off-by: Abhishek Varma <abhishek@nod-lab.com>
* Update .spec for MiniGPT4's config files
* black format MiniGPT4
---------
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Signed-off-by: Abhishek Varma <abhishek@nod-lab.com>
2023-07-25 09:42:27 -07:00
Vivek Khandelwal
03c4d9e171
Add support for Llama-2-70b for web and cli, and for hf_auth_token
2023-07-20 14:57:48 +05:30
jinchen62
3662224c04
Update brevitas requirement (#1677)
also clean up useless args
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-07-19 22:03:32 -07:00
Vivek Khandelwal
4be80f7158
Add support for the Llama-2 model
2023-07-19 20:57:08 +05:30
jinchen62
47ec7275e6
Fix brevitas quantize argument (#1633)
2023-07-07 11:30:31 -07:00
jinchen62
bc6fee1a0c
Add int4/int8 vicuna (#1598)
2023-07-05 07:01:51 -07:00
Eliasj42
4015793f84
changed method of compiling vicuna to remove first and second vicuna (#1611)
Co-authored-by: Elias Joseph <elias@nod-labs.com>
Co-authored-by: powderluv <powderluv@users.noreply.github.com>
2023-07-03 12:12:43 -07:00
jinchen62
534de05791
Update precision check for vicuna (#1610)
2023-06-29 16:16:33 -05:00
Daniel Garvey
5779e8c039
int4/int8 vicuna download support (#1609)
* set task_topology_max_group to cpu_count
by default. Can be overridden with a flag of the same name
* add download for int4/int8 mlir
2023-06-29 13:35:51 -07:00
Gaurav Shukla
1d6a1f9f8a
[vicuna] Add tokens streaming(step=3) (#1600)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-06-27 08:59:27 -07:00
powderluv
726d73d6ba
Revert "[vicuna] Add streaming of tokens (#1587)" (#1588)
This reverts commit 4d55e51d46.
2023-06-23 10:29:00 -07:00
Gaurav Shukla
4d55e51d46
[vicuna] Add streaming of tokens (#1587)
Signed-off-by: Gaurav Shukla <gaurav@nod-labs.com>
2023-06-23 08:20:46 -07:00
jinchen62
4002da7161
Add int4/int8 options to chatbot webui (#1586)
2023-06-23 07:18:34 -07:00
Eliasj42
8822b9acd7
added ability to use config file to shard vicuna (#1565)
Co-authored-by: Elias Joseph <elias@nod-labs.com>
2023-06-22 17:40:35 -05:00
Daniel Garvey
0ca3b9fce3
fix some mmap and vicuna bugs (#1576)
2023-06-22 17:39:55 -05:00
Daniel Garvey
a202bb466a
fp16 fixes for webui (#1571)
2023-06-21 20:24:02 -07:00
Phaneesh Barwaria
88cc2423cc
Enable Vicuna fp16 cpu (#1562)
* fix second vic mlir gen
* fp16 mlir/vmfb download from shark_tank
2023-06-20 13:43:21 -05:00
Vivek Khandelwal
855435ee24
Fix for the user input for Falcon pipeline
2023-06-20 18:09:32 +05:30