From d835f4863284f79dab76e314ca2fe7dcf2e17375 Mon Sep 17 00:00:00 2001 From: Chi Wang Date: Thu, 31 Aug 2023 15:37:45 +0000 Subject: [PATCH] doc update --- website/docs/Contribute.md | 4 +- website/docs/Getting-Started.md | 1 + website/docs/Research.md | 13 ++++- ...ltiagent_conversation.md => agent_chat.md} | 0 website/docs/Use-Cases/enhanced_inference.md | 52 ++++++++++--------- .../docs/Use-Cases/utilities4applications.md | 15 ------ 6 files changed, 42 insertions(+), 43 deletions(-) rename website/docs/Use-Cases/{multiagent_conversation.md => agent_chat.md} (100%) delete mode 100644 website/docs/Use-Cases/utilities4applications.md diff --git a/website/docs/Contribute.md b/website/docs/Contribute.md index 1ccd8fa4f..0bd548581 100644 --- a/website/docs/Contribute.md +++ b/website/docs/Contribute.md @@ -62,10 +62,10 @@ There is currently no formal reviewer solicitation process. Current reviewers id ```bash git clone https://github.com/microsoft/autogen.git -pip install -e pyautogen[notebook] +pip install -e autogen ``` -In case the `pip install` command fails, try escaping the brackets such as `pip install -e pyautogen\[notebook\]`. + ### Docker diff --git a/website/docs/Getting-Started.md b/website/docs/Getting-Started.md index 36ba35c52..ccfd7ef97 100644 --- a/website/docs/Getting-Started.md +++ b/website/docs/Getting-Started.md @@ -12,6 +12,7 @@ AutoGen is a framework that enables development of LLM applications using multip * It supports diverse conversation patterns for complex workflows. With customizable and conversable agents, developers can use AutoGen to build a wide range of conversation patterns concerning conversation autonomy, the number of agents, and agent conversation topology. * It provides a collection of working systems with different complexities. These systems span a wide range of applications from various domains and complexities. They demonstrate how AutoGen can easily support different conversation patterns. 
+* AutoGen provides a drop-in replacement of `openai.Completion` or `openai.ChatCompletion` as an enhanced inference API. It allows easy performance tuning, utilities like API unification & caching, and advanced usage patterns, such as error handling, multi-config inference, context programming etc. AutoGen is powered by collaborative [research studies](/docs/Research) from Microsoft, Penn State University, and University of Washington. diff --git a/website/docs/Research.md b/website/docs/Research.md index ebcfffa01..b71148c14 100644 --- a/website/docs/Research.md +++ b/website/docs/Research.md @@ -1,6 +1,6 @@ # Research -For technical details, please check our technical report. +For technical details, please check our technical report and research publications. * [AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework](https://arxiv.org/abs/2308.08155) Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang and Chi Wang. ArXiv 2023. @@ -14,3 +14,14 @@ For technical details, please check our technical report. primaryClass={cs.AI} } ``` + +* [Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference](https://arxiv.org/abs/2303.04673). Chi Wang, Susan Xueqing Liu, Ahmed H. Awadallah. ArXiv preprint arXiv:2303.04673 (2023). + +```bibtex +@inproceedings{wang2023EcoOptiGen, + title={Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference}, + author={Chi Wang and Susan Xueqing Liu and Ahmed H. 
Awadallah}, + year={2023}, + booktitle={ArXiv preprint arXiv:2303.04673}, +} +``` diff --git a/website/docs/Use-Cases/multiagent_conversation.md b/website/docs/Use-Cases/agent_chat.md similarity index 100% rename from website/docs/Use-Cases/multiagent_conversation.md rename to website/docs/Use-Cases/agent_chat.md diff --git a/website/docs/Use-Cases/enhanced_inference.md b/website/docs/Use-Cases/enhanced_inference.md index 7b9da127c..7e8d04aa3 100644 --- a/website/docs/Use-Cases/enhanced_inference.md +++ b/website/docs/Use-Cases/enhanced_inference.md @@ -1,15 +1,15 @@ -## Enhanced Inference +# Enhanced Inference -One can use [`autogen.Completion.create`](/docs/reference/autogen/oai/completion#create) to perform inference. +[`autogen.Completion`](/docs/reference/oai/completion) is a drop-in replacement of `openai.Completion` and `openai.ChatCompletion` as an enhanced inference API. There are a number of benefits of using `autogen` to perform inference: performance tuning, API unification, caching, error handling, multi-config inference, result filtering, templating and so on. -### Tune Inference Parameters +## Tune Inference Parameters *Links to notebook examples:* * [Optimize for Code Generation](https://github.com/microsoft/autogen/blob/main/notebook/autogen_openai_completion.ipynb) * [Optimize for Math](https://github.com/microsoft/autogen/blob/main/notebook/autogen_chatgpt_gpt4.ipynb) -#### Choices to optimize +### Choices to optimize The cost of using foundation models for text generation is typically measured in terms of the number of tokens in the input and output combined. From the perspective of an application builder using foundation models, the use case is to maximize the utility of the generated text under an inference budget constraint (e.g., measured by the average dollar cost needed to solve a coding problem). 
This can be achieved by optimizing the hyperparameters of the inference, which can significantly affect both the utility and the cost of the generated text. @@ -40,11 +40,11 @@ With AutoGen, the tuning can be performed with the following information: 1. Search space. 1. Budgets: inference and optimization respectively. -#### Validation data +### Validation data Collect a diverse set of instances. They can be stored in an iterable of dicts. For example, each instance dict can contain "problem" as a key and the description str of a math problem as the value; and "solution" as a key and the solution str as the value. -#### Evaluation function +### Evaluation function The evaluation function should take a list of responses, and other keyword arguments corresponding to the keys in each validation data instance as input, and output a dict of metrics. For example, @@ -56,13 +56,13 @@ def eval_math_responses(responses: List[str], solution: str, **args) -> Dict: return {"success": is_equivalent(answer, solution)} ``` -[`autogen.code_utils`](/docs/reference/autogen/code_utils) and [`autogen.math_utils`](/docs/reference/autogen/math_utils) offer some example evaluation functions for code generation and math problem solving. +[`autogen.code_utils`](/docs/reference/code_utils) and [`autogen.math_utils`](/docs/reference/math_utils) offer some example evaluation functions for code generation and math problem solving. -#### Metric to optimize +### Metric to optimize The metric to optimize is usually an aggregated metric over all the tuning data instances. For example, users can specify "success" as the metric and "max" as the optimization mode. By default, the aggregation function takes the average. Users can provide a customized aggregation function if needed. -#### Search space +### Search space Users can specify the (optional) search range for each hyperparameter. @@ -77,15 +77,15 @@ And `{problem}` will be replaced by the "problem" field of each data instance. 
Please don't provide both. By default, each configuration will choose either a temperature or a top_p in [0, 1] uniformly. 1. presence_penalty, frequency_penalty. They can be constants or specified by `flaml.tune.uniform` etc. Not tuned by default. -#### Budgets +### Budgets One can specify an inference budget and an optimization budget. The inference budget refers to the average inference cost per data instance. The optimization budget refers to the total budget allowed in the tuning process. Both are measured by dollars and follow the price per 1000 tokens. -#### Perform tuning +### Perform tuning -Now, you can use [`autogen.Completion.tune`](/docs/reference/autogen/oai/completion#tune) for tuning. For example, +Now, you can use [`autogen.Completion.tune`](/docs/reference/oai/completion#tune) for tuning. For example, ```python import autogen @@ -106,22 +106,22 @@ The returned `config` contains the optimized configuration and `analysis` contai The tuned config can be used to perform inference. -### API unification +## API unification `autogen.Completion.create` is compatible with both `openai.Completion.create` and `openai.ChatCompletion.create`, and both OpenAI API and Azure OpenAI API. So models such as "text-davinci-003", "gpt-3.5-turbo" and "gpt-4" can share a common API. When chat models are used and `prompt` is given as the input to `autogen.Completion.create`, the prompt will be automatically converted into `messages` to fit the chat completion API requirement. One advantage is that one can experiment with both chat and non-chat models for the same prompt in a unified API. -For local LLMs, one can spin up an endpoint using a package like [simple_ai_server](https://github.com/lhenault/simpleAI) and [FastChat](https://github.com/lm-sys/FastChat), and then use the same API to send a request. See [here](/blog/2023/07/14/Local-LLMs) for examples on how to make inference with local LLMs. 
+For local LLMs, one can spin up an endpoint using a package like [FastChat](https://github.com/lm-sys/FastChat), and then use the same API to send a request. See [here](/blog/2023/07/14/Local-LLMs) for examples on how to make inference with local LLMs. When working only with chat-based models, `autogen.ChatCompletion` can be used. It also does automatic conversion from prompt to messages, if prompt is provided instead of messages. -### Caching +## Caching -API call results are cached locally and reused when the same request is issued. This is useful when repeating or continuing experiments for reproducibility and cost saving. It still allows controlled randomness by setting the "seed", using [`set_cache`](/docs/reference/autogen/oai/completion#set_cache) or specifying in `create()`. +API call results are cached locally and reused when the same request is issued. This is useful when repeating or continuing experiments for reproducibility and cost saving. It still allows controlled randomness by setting the "seed", using [`set_cache`](/docs/reference/oai/completion#set_cache) or specifying in `create()`. -### Error handling +## Error handling -#### Runtime error +### Runtime error It is easy to hit errors when calling OpenAI APIs, due to connection, rate limit, or timeout. Some of the errors are transient. `autogen.Completion.create` deals with the transient errors and retries automatically. Initial request timeout, retry timeout and retry time interval can be configured via `request_timeout`, `retry_timeout` and `autogen.Completion.retry_time`. @@ -158,7 +158,9 @@ response = autogen.Completion.create( It will try querying Azure OpenAI gpt-4, OpenAI gpt-3.5-turbo, and a locally hosted llama-7B one by one, ignoring AuthenticationError, RateLimitError and Timeout, until a valid result is returned. This can speed up the development process where the rate limit is a bottleneck. An error will be raised if the last choice fails. 
So make sure the last choice in the list has the best availability. -#### Logic error +For convenience, we provide a number of utility functions to load config lists, such as [`config_list_from_json`](/docs/reference/oai/openai_utils#config_list_from_json). + +### Logic error Another type of error is that the returned response does not satisfy a requirement. For example, if the response is required to be a valid json string, one would like to filter the responses that are not. This can be achieved by providing a list of configurations and a filter function. For example, @@ -183,7 +185,7 @@ The example above will try to use text-ada-001, gpt-3.5-turbo, and text-davinci- *Advanced use case: Check this [blogpost](/blog/2023/05/18/GPT-adaptive-humaneval) to see how to improve GPT-4's coding performance from 68% to 90% while reducing the inference cost.* -### Templating +## Templating If the provided prompt or message is a template, it will be automatically materialized with a given context. For example, @@ -244,7 +246,7 @@ context.append( response = autogen.ChatCompletion.create(context, messages=messages, **config) ``` -### Logging (Experimental) +## Logging (Experimental) When debugging or diagnosing an LLM-based system, it is often convenient to log the API calls and analyze them. `autogen.Completion` and `autogen.ChatCompletion` offer an easy way to collect the API call histories. For example, to log the chat histories, simply run: ```python @@ -363,8 +365,8 @@ Set `compact=False` in `start_logging()` to switch. It can be seen that the individual API call history contains redundant information of the conversation. For a long conversation, the degree of redundancy is high. The compact history is more efficient and the individual API call history contains more details. -### Other Utilities +## Other Utilities -- a [`cost`](/docs/reference/autogen/oai/completion#cost) function to calculate the cost of an API call. 
-- a [`test`](/docs/reference/autogen/oai/completion#test) function to conveniently evaluate the configuration over test data. -- an [`extract_text_or_function_call`](/docs/reference/autogen/oai/completion#extract_text_or_function_call) function to extract the text or function call from a completion or chat response. +- a [`cost`](/docs/reference/oai/completion#cost) function to calculate the cost of an API call. +- a [`test`](/docs/reference/oai/completion#test) function to conveniently evaluate the configuration over test data. +- an [`extract_text_or_function_call`](/docs/reference/oai/completion#extract_text_or_function_call) function to extract the text or function call from a completion or chat response. diff --git a/website/docs/Use-Cases/utilities4applications.md b/website/docs/Use-Cases/utilities4applications.md deleted file mode 100644 index 692d73e15..000000000 --- a/website/docs/Use-Cases/utilities4applications.md +++ /dev/null @@ -1,15 +0,0 @@ -## Utilities for Applications -AutoGen provides a drop-in replacement of `openai.Completion` or `openai.ChatCompletion` as an enhanced inference API. It allows easy performance tuning, utilities like API unification & caching, and advanced usage patterns, such as error handling, multi-config inference, context programming etc. - -### Code - -[`autogen.code_utils`](/docs/reference/autogen/code_utils) offers code-related utilities, such as: -- a [`improve_code`](/docs/reference/autogen/code_utils#improve_code) function to improve code for a given objective. -- a [`generate_assertions`](/docs/reference/autogen/code_utils#generate_assertions) function to generate assertion statements from function signature and docstr. -- a [`implement`](/docs/reference/autogen/code_utils#implement) function to implement a function from a definition. 
-- a [`eval_function_completions`](/docs/reference/autogen/code_utils#eval_function_completions) function to evaluate the success of a function completion task, or select a response from a list of responses using generated assertions. - -### Math - -[`autogen.math_utils`](/docs/reference/autogen/math_utils) offers utilities for math problems, such as: -- a [eval_math_responses](/docs/reference/autogen/math_utils#eval_math_responses) function to select a response using voting, and check if the final answer is correct if the canonical solution is provided.
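The default aggregation described under "Metric to optimize" above can be sketched in plain Python. This is an illustrative sketch only: `aggregate_metrics` is a hypothetical helper name, not a function exposed by `autogen`.

```python
# Sketch of the default metric aggregation described under "Metric to
# optimize": each tuning data instance yields a dict of metrics, and by
# default each metric is aggregated by averaging over all instances.
# `aggregate_metrics` is a hypothetical name used only for illustration.
def aggregate_metrics(per_instance_metrics):
    keys = per_instance_metrics[0].keys()
    n = len(per_instance_metrics)
    return {k: sum(m[k] for m in per_instance_metrics) / n for k in keys}

# Three instances with a binary "success" metric average to 2/3.
averaged = aggregate_metrics([{"success": 1}, {"success": 0}, {"success": 1}])
```

With "success" as the metric and "max" as the optimization mode, the tuner would then prefer configurations that maximize this averaged value.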
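The filtering idea described under "Logic error" above can be sketched as a standalone function. The name `valid_json_filter` and the exact signature below are assumptions for this sketch; the documented mechanism is passing a filter function together with a list of configurations to `autogen.Completion.create`, which moves on to the next configuration when the filter rejects the responses.

```python
import json

# Illustrative sketch of a filter in the style of the "Logic error" section:
# given the list of response strings produced under one configuration, accept
# the result only if at least one response is a valid JSON string. The
# function name and signature are assumptions, not the documented API.
def valid_json_filter(responses, **_):
    for text in responses:
        try:
            json.loads(text)
            return True  # at least one response satisfies the requirement
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            pass
    return False

print(valid_json_filter(['{"name": "autogen"}', "not json"]))  # True
print(valid_json_filter(["not json", "also not json"]))        # False
```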
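The substitution performed by the "Templating" feature above can likewise be sketched in plain Python. `materialize` is a hypothetical stand-in for what `autogen.Completion.create` does internally when given a template and a context; the internal mechanics are an assumption of this sketch.

```python
# Sketch of prompt templating as described in the "Templating" section:
# a template prompt is materialized by substituting fields from a context
# dict before the request is sent. `materialize` is a hypothetical helper
# for illustration only.
def materialize(template, context):
    return template.format(**context)

prompt = "Solve the following math problem: {problem}"
context = {"problem": "What is 37 * 3?"}
print(materialize(prompt, context))
# Solve the following math problem: What is 37 * 3?
```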