mirror of
https://github.com/microsoft/autogen.git
synced 2026-04-20 03:02:16 -04:00
Restore TeachableAgent tests (#761)

* Update chat_with_teachable_agent.py to v2.
* Update agentchat_teachability.ipynb to v2.
* Add test of teachability accuracy.
* Update installation instructions.
* Add to contrib tests.
* pre-commit fixes
* Apply reviewer suggestions to test workflows.
.github/workflows/contrib-openai.yml (vendored): 38 additions
```diff
@@ -138,3 +138,41 @@ jobs:
         with:
           file: ./coverage.xml
           flags: unittests
+  TeachableAgent:
+    strategy:
+      matrix:
+        os: [ubuntu-latest]
+        python-version: ["3.11"]
+    runs-on: ${{ matrix.os }}
+    environment: openai1
+    steps:
+      # checkout to pr branch
+      - name: Checkout
+        uses: actions/checkout@v3
+        with:
+          ref: ${{ github.event.pull_request.head.sha }}
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install packages and dependencies
+        run: |
+          docker --version
+          python -m pip install --upgrade pip wheel
+          pip install -e .[teachable]
+          python -c "import autogen"
+          pip install coverage
+      - name: Coverage
+        env:
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+          AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
+          AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
+          OAI_CONFIG_LIST: ${{ secrets.OAI_CONFIG_LIST }}
+        run: |
+          coverage run -a -m pytest test/agentchat/contrib/test_teachable_agent.py
+          coverage xml
+      - name: Upload coverage to Codecov
+        uses: codecov/codecov-action@v3
+        with:
+          file: ./coverage.xml
+          flags: unittests
```
.github/workflows/contrib-tests.yml (vendored): 26 changes
```diff
@@ -109,6 +109,30 @@ jobs:
           pip install -e .
           pip uninstall -y openai
       - name: Test GPTAssistantAgent
         if: matrix.python-version != '3.10'
         run: |
           pytest test/agentchat/contrib/test_gpt_assistant.py
+
+  TeachableAgent:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      fail-fast: false
+      matrix:
+        os: [ubuntu-latest, macos-latest, windows-2019]
+        python-version: ["3.8", "3.9", "3.10", "3.11"]
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install packages and dependencies for all tests
+        run: |
+          python -m pip install --upgrade pip wheel
+          pip install pytest
+      - name: Install packages and dependencies for TeachableAgent
+        run: |
+          pip install -e .[teachable]
+          pip uninstall -y openai
+      - name: Test TeachableAgent
+        run: |
+          pytest test/agentchat/contrib/test_teachable_agent.py
```
notebook/agentchat_teachability.ipynb

```diff
@@ -21,7 +21,7 @@
 "\n",
 "In making decisions about memo storage and retrieval, `TeachableAgent` calls an instance of `TextAnalyzerAgent` to analyze pieces of text in several different ways. This adds extra LLM calls involving a relatively small number of tokens. These calls can add a few seconds to the time a user waits for a response.\n",
 "\n",
-"This notebook demonstrates how `TeachableAgent` can learn facts, preferences, and skills from users. To chat with `TeachableAgent` yourself, run [chat_with_teachable_agent.py](../test/agentchat/chat_with_teachable_agent.py).\n",
+"This notebook demonstrates how `TeachableAgent` can learn facts, preferences, and skills from users. To chat with `TeachableAgent` yourself, run [chat_with_teachable_agent.py](../test/agentchat/contrib/chat_with_teachable_agent.py).\n",
 "\n",
 "## Requirements\n",
 "\n",
@@ -38,7 +38,7 @@
 "outputs": [],
 "source": [
 "%%capture --no-stderr\n",
-"# %pip install \"pyautogen[teachable]"
+"# %pip install \"pyautogen[teachable]\""
 ]
 },
 {
@@ -142,9 +142,9 @@
 "from autogen import UserProxyAgent\n",
 "\n",
 "llm_config = {\n",
-"    \"timeout\": 60,\n",
 "    \"config_list\": config_list,\n",
-"    \"use_cache\": True,  # Use False to explore LLM non-determinism.\n",
+"    \"timeout\": 60,\n",
+"    \"cache_seed\": None,  # Use an int to seed the response cache. Use None to disable caching.\n",
 "}\n",
 "\n",
 "teach_config={\n",
```
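The hunk above swaps the deprecated `use_cache` flag for `cache_seed`. As a plain-Python sketch of the resulting config (the `config_list` entry here is an invented placeholder, not a real endpoint or key):

```python
# Plain-Python sketch of the updated notebook config. The api_key value is a
# placeholder; real endpoints come from config_list_from_json.
config_list = [{"model": "gpt-4", "api_key": "PLACEHOLDER"}]

llm_config = {
    "config_list": config_list,
    "timeout": 60,  # v0.2 name for the former request_timeout
    "cache_seed": None,  # replaces use_cache; an int seeds the cache, None disables it
}

print(sorted(llm_config))  # → ['cache_seed', 'config_list', 'timeout']
```

Setting `cache_seed` to an integer makes repeated runs reuse cached LLM responses, while `None` exposes LLM non-determinism, matching the comment in the diff.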
```diff
@@ -157,6 +157,7 @@
 "try:\n",
 "    from termcolor import colored\n",
 "except ImportError:\n",
+"\n",
 "    def colored(x, *args, **kwargs):\n",
 "        return x\n",
 "\n",
@@ -170,8 +171,7 @@
 "    human_input_mode=\"NEVER\",\n",
 "    is_termination_msg=lambda x: True if \"TERMINATE\" in x.get(\"content\") else False,\n",
 "    max_consecutive_auto_reply=0,\n",
-")\n",
-"\n"
+")"
 ]
 },
 {
@@ -781,7 +781,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.17"
+"version": "3.10.12"
 }
 },
 "nbformat": 4,
```
@@ -1,6 +1,12 @@
|
||||
from autogen import UserProxyAgent, config_list_from_json
|
||||
from autogen.agentchat.contrib.teachable_agent import TeachableAgent
|
||||
|
||||
import os
|
||||
import sys
|
||||
|
||||
sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
|
||||
from test_assistant_agent import OAI_CONFIG_LIST, KEY_LOC # noqa: E402
|
||||
|
||||
|
||||
try:
|
||||
from termcolor import colored
|
||||
@@ -12,10 +18,13 @@ except ImportError:
|
||||
|
||||
verbosity = 0 # 0 for basic info, 1 to add memory operations, 2 for analyzer messages, 3 for memo lists.
|
||||
recall_threshold = 1.5 # Higher numbers allow more (but less relevant) memos to be recalled.
|
||||
use_cache = False # If True, cached LLM calls will be skipped and responses pulled from cache. False exposes LLM non-determinism.
|
||||
cache_seed = None # Use an int to seed the response cache. Use None to disable caching.
|
||||
|
||||
# Specify the model to use. GPT-3.5 is less reliable than GPT-4 at learning from user input.
|
||||
# filter_dict = {"model": ["gpt-4-0613"]}
|
||||
# filter_dict = {"model": ["gpt-3.5-turbo-0613"]}
|
||||
filter_dict = {"model": ["gpt-4"]}
|
||||
# filter_dict = {"model": ["gpt-35-turbo-16k", "gpt-3.5-turbo-16k"]}
|
||||
|
||||
|
||||
def create_teachable_agent(reset_db=False):
|
||||
@@ -23,10 +32,10 @@ def create_teachable_agent(reset_db=False):
|
||||
# Load LLM inference endpoints from an env variable or a file
|
||||
# See https://microsoft.github.io/autogen/docs/FAQ#set-your-api-endpoints
|
||||
# and OAI_CONFIG_LIST_sample
|
||||
config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST", filter_dict=filter_dict)
|
||||
config_list = config_list_from_json(env_or_file=OAI_CONFIG_LIST, filter_dict=filter_dict, file_location=KEY_LOC)
|
||||
teachable_agent = TeachableAgent(
|
||||
name="teachableagent",
|
||||
llm_config={"config_list": config_list, "timeout": 120, "use_cache": use_cache},
|
||||
llm_config={"config_list": config_list, "timeout": 120, "cache_seed": cache_seed},
|
||||
teach_config={
|
||||
"verbosity": verbosity,
|
||||
"reset_db": reset_db,
|
||||
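Both scripts narrow the endpoint list with `filter_dict`. As a simplified standalone illustration of that filtering behavior (this is not the library's actual implementation, and the sample configs are invented):

```python
# Simplified illustration (not autogen's actual code) of how a filter_dict
# like {"model": ["gpt-4"]} narrows a list of endpoint configs: a config is
# kept only if every filtered field takes one of the allowed values.
def filter_config(config_list, filter_dict):
    return [
        cfg
        for cfg in config_list
        if all(cfg.get(key) in allowed for key, allowed in filter_dict.items())
    ]

configs = [{"model": "gpt-4"}, {"model": "gpt-3.5-turbo"}]
print(filter_config(configs, {"model": ["gpt-4"]}))  # → [{'model': 'gpt-4'}]
```

This is why the commented-out `filter_dict` lines in the script act as a one-line model switch: uncommenting a different line changes which endpoint configs survive the filter.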
@@ -1,3 +1,11 @@
|
||||
import pytest
|
||||
import os
|
||||
import sys
|
||||
from autogen import ConversableAgent, config_list_from_json
|
||||
|
||||
sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
|
||||
from test_assistant_agent import OAI_CONFIG_LIST, KEY_LOC # noqa: E402
|
||||
|
||||
try:
|
||||
from openai import OpenAI
|
||||
from autogen.agentchat.contrib.teachable_agent import TeachableAgent
|
||||
@@ -6,11 +14,6 @@ except ImportError:
|
||||
else:
|
||||
skip = False
|
||||
|
||||
import pytest
|
||||
import sys
|
||||
from autogen import ConversableAgent, config_list_from_json
|
||||
from test_assistant_agent import OAI_CONFIG_LIST, KEY_LOC
|
||||
|
||||
try:
|
||||
from termcolor import colored
|
||||
except ImportError:
|
||||
@@ -25,8 +28,7 @@ skill_verbosity = 3 # 0 for basic info, 1 to add memory operations, 2 for analy
|
||||
|
||||
assert_on_error = False # GPT-4 nearly always succeeds on these unit tests, but GPT-3.5 is a bit less reliable.
|
||||
recall_threshold = 1.5 # Higher numbers allow more (but less relevant) memos to be recalled.
|
||||
cache_seed = None
|
||||
# If int, cached LLM calls will be skipped and responses pulled from cache. None exposes LLM non-determinism.
|
||||
cache_seed = None # Use an int to seed the response cache. Use None to disable caching.
|
||||
|
||||
# Specify the model to use by uncommenting one of the following lines.
|
||||
# filter_dict={"model": ["gpt-4-0613"]}
|
||||
@@ -139,10 +141,10 @@ def use_task_advice_pair_phrasing():
|
||||
|
||||
|
||||
@pytest.mark.skipif(
|
||||
skip or not sys.version.startswith("3.11"),
|
||||
reason="do not run if dependency is not installed or py!=3.11",
|
||||
skip,
|
||||
reason="do not run if dependency is not installed",
|
||||
)
|
||||
def test_all():
|
||||
def test_teachability_code_paths():
|
||||
"""Runs this file's unit tests."""
|
||||
total_num_errors, total_num_tests = 0, 0
|
||||
|
||||
@@ -169,6 +171,49 @@ def test_all():
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.skipif(
|
||||
skip,
|
||||
reason="do not run if dependency is not installed",
|
||||
)
|
||||
def test_teachability_accuracy():
|
||||
"""A very cheap and fast test of teachability accuracy."""
|
||||
print(colored("\nTEST TEACHABILITY ACCURACY", "light_cyan"))
|
||||
|
||||
num_trials = 10 # The expected probability of failure is about 0.3 on each trial.
|
||||
for trial in range(num_trials):
|
||||
teachable_agent = create_teachable_agent(
|
||||
reset_db=True, verbosity=0
|
||||
) # For a clean test, clear the agent's memory.
|
||||
user = ConversableAgent("user", max_consecutive_auto_reply=0, llm_config=False, human_input_mode="NEVER")
|
||||
|
||||
# Prepopulate memory with a few arbitrary memos, just to make retrieval less trivial.
|
||||
teachable_agent.prepopulate_db()
|
||||
|
||||
# Tell the teachable agent something it wouldn't already know.
|
||||
user.initiate_chat(recipient=teachable_agent, message="My favorite color is teal.")
|
||||
|
||||
# Let the teachable agent remember things that should be learned from this chat.
|
||||
teachable_agent.learn_from_user_feedback()
|
||||
|
||||
# Now start a new chat to clear the context, and ask the teachable agent about the new information.
|
||||
print(colored("\nSTARTING A NEW CHAT WITH EMPTY CONTEXT", "light_cyan"))
|
||||
user.initiate_chat(recipient=teachable_agent, message="What's my favorite color?")
|
||||
num_errors = check_agent_response(teachable_agent, user, "teal")
|
||||
|
||||
print(colored(f"\nTRIAL {trial + 1} OF {num_trials} FINISHED", "light_cyan"))
|
||||
|
||||
# Wrap up.
|
||||
teachable_agent.close_db()
|
||||
|
||||
# Exit on the first success.
|
||||
if num_errors == 0:
|
||||
return
|
||||
|
||||
# All trials failed.
|
||||
assert False, "test_teachability_accuracy() failed on all {} trials.".format(num_trials)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
"""Runs this file's unit tests from the command line."""
|
||||
test_all()
|
||||
test_teachability_code_paths()
|
||||
test_teachability_accuracy()
|
||||
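The new accuracy test retries and exits on the first success, so its overall failure probability is small. A quick check of the arithmetic, assuming the roughly 0.3 per-trial failure rate cited in the test's own comment:

```python
# With an assumed per-trial failure probability of 0.3 (from the comment in
# test_teachability_accuracy), the test itself fails only if all 10
# independent trials fail.
p_fail_trial = 0.3
num_trials = 10
p_all_fail = p_fail_trial ** num_trials
print(f"{p_all_fail:.2e}")  # → 5.90e-06
```

So even with a flaky per-trial success rate, the retry loop keeps this a cheap but dependable CI check.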
```diff
@@ -23,11 +23,11 @@ In order to make effective decisions about memo storage and retrieval, `Teachabl
 
 AutoGen contains three code examples that use `TeachableAgent`.
 
-1. Run [chat_with_teachable_agent.py](https://github.com/microsoft/autogen/blob/main/test/agentchat/chat_with_teachable_agent.py) to converse with `TeachableAgent`.
+1. Run [chat_with_teachable_agent.py](https://github.com/microsoft/autogen/blob/main/test/agentchat/contrib/chat_with_teachable_agent.py) to converse with `TeachableAgent`.
 
 2. Use the Jupyter notebook [agentchat_teachability.ipynb](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_teachability.ipynb) to step through examples discussed below.
 
-3. Run [test_teachable_agent.py](https://github.com/microsoft/autogen/blob/main/test/agentchat/test_teachable_agent.py) for quick unit testing of `TeachableAgent`.
+3. Run [test_teachable_agent.py](https://github.com/microsoft/autogen/blob/main/test/agentchat/contrib/test_teachable_agent.py) for quick unit testing of `TeachableAgent`.
 
 
 ## Basic Usage of TeachableAgent
```
````diff
@@ -55,7 +55,7 @@ openai v1 is a total rewrite of the library with many breaking changes. For exam
 Therefore, some changes are required for users of `pyautogen<0.2`.
 
 - `api_base` -> `base_url`, `request_timeout` -> `timeout` in `llm_config` and `config_list`. `max_retry_period` and `retry_wait_time` are deprecated. `max_retries` can be set for each client.
-- MathChat, TeachableAgent are unsupported until they are tested in future release.
+- MathChat is unsupported until it is tested in future release.
 - `autogen.Completion` and `autogen.ChatCompletion` are deprecated. The essential functionalities are moved to `autogen.OpenAIWrapper`:
 ```python
 from autogen import OpenAIWrapper
@@ -118,6 +118,17 @@ Example notebooks:
 [Automated Code Generation and Question Answering with Qdrant based Retrieval Augmented Agents](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_qdrant_RetrieveChat.ipynb)
 
 
+- #### TeachableAgent
+
+  To use TeachableAgent, please install AutoGen with the [teachable] option.
+  ```bash
+  pip install "pyautogen[teachable]"
+  ```
+
+  Example notebook: [Chatting with TeachableAgent](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_teachability.ipynb)
+
+
+
 - #### Large Multimodal Model (LMM) Agents
 
   We offered Multimodal Conversable Agent and LLaVA Agent. Please install with the [lmm] option to use it.
````