docs: Update user guide notebooks to enhance clarity and add structured output (#5224)

Resolves #5043
This commit is contained in:
Eric Zhu
2025-01-27 13:57:29 -08:00
committed by GitHub
parent 6359b6a7be
commit 2ceb9dcffe
3 changed files with 280 additions and 126 deletions

View File

@@ -35,9 +35,15 @@
"from autogen_agentchat.messages import TextMessage\n",
"from autogen_agentchat.ui import Console\n",
"from autogen_core import CancellationToken\n",
"from autogen_ext.models.openai import OpenAIChatCompletionClient\n",
"\n",
"\n",
"from autogen_ext.models.openai import OpenAIChatCompletionClient"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Define a tool that searches the web for information.\n",
"async def web_search(query: str) -> str:\n",
" \"\"\"Find information on the web\"\"\"\n",
@@ -321,6 +327,82 @@
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Structured Output\n",
"\n",
    "Structured output allows models to return structured JSON text that conforms to a\n",
    "pre-defined schema provided by the application. Unlike JSON mode, the schema can be\n",
    "provided as a [Pydantic BaseModel](https://docs.pydantic.dev/latest/concepts/models/)\n",
    "class, which can also be used to validate the output.\n",
"\n",
"```{note}\n",
    "Structured output is only available when both the model and the model client\n",
    "support it.\n",
"Currently, the {py:class}`~autogen_ext.models.openai.OpenAIChatCompletionClient`\n",
"and {py:class}`~autogen_ext.models.openai.AzureOpenAIChatCompletionClient`\n",
"support structured output.\n",
"```\n",
"\n",
"Structured output is also useful for incorporating Chain-of-Thought\n",
"reasoning in the agent's responses.\n",
"See the example below for how to use structured output with the assistant agent."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"---------- user ----------\n",
"I am happy.\n",
"---------- assistant ----------\n",
"{\"thoughts\":\"The user explicitly states that they are happy.\",\"response\":\"happy\"}\n"
]
},
{
"data": {
"text/plain": [
"TaskResult(messages=[TextMessage(source='user', models_usage=None, content='I am happy.', type='TextMessage'), TextMessage(source='assistant', models_usage=RequestUsage(prompt_tokens=89, completion_tokens=18), content='{\"thoughts\":\"The user explicitly states that they are happy.\",\"response\":\"happy\"}', type='TextMessage')], stop_reason=None)"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from typing import Literal\n",
"\n",
"from pydantic import BaseModel\n",
"\n",
"\n",
"# The response format for the agent as a Pydantic base model.\n",
"class AgentResponse(BaseModel):\n",
" thoughts: str\n",
" response: Literal[\"happy\", \"sad\", \"neutral\"]\n",
"\n",
"\n",
"# Create an agent that uses the OpenAI GPT-4o model with the custom response format.\n",
"model_client = OpenAIChatCompletionClient(\n",
" model=\"gpt-4o\",\n",
" response_format=AgentResponse, # type: ignore\n",
")\n",
"agent = AssistantAgent(\n",
" \"assistant\",\n",
" model_client=model_client,\n",
" system_message=\"Categorize the input as happy, sad, or neutral following the JSON format.\",\n",
")\n",
"\n",
"await Console(agent.run_stream(task=\"I am happy.\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@@ -6,7 +6,10 @@
"source": [
"# Models\n",
"\n",
"In many cases, agents need access to LLM model services such as OpenAI, Azure OpenAI, or local models. Since there are many different providers with different APIs, `autogen-core` implements a protocol for [model clients](../../core-user-guide/components/model-clients.ipynb) and `autogen-ext` implements a set of model clients for popular model services. AgentChat can use these model clients to interact with model services. \n",
"In many cases, agents need access to LLM model services such as OpenAI, Azure OpenAI, or local models. Since there are many different providers with different APIs, `autogen-core` implements a protocol for model clients and `autogen-ext` implements a set of model clients for popular model services. AgentChat can use these model clients to interact with model services. \n",
"\n",
"This section provides a quick overview of available model clients.\n",
"For more details on how to use them directly, please refer to [Model Clients](../../core-user-guide/components/model-clients.ipynb) in the Core API documentation.\n",
"\n",
"```{note}\n",
"See {py:class}`~autogen_ext.models.cache.ChatCompletionCache` for a caching wrapper to use with the following clients.\n",

View File

@@ -14,17 +14,19 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Built-in Model Clients\n",
"\n",
"Currently there are three built-in model clients:\n",
"* {py:class}`~autogen_ext.models.OpenAIChatCompletionClient`\n",
"* {py:class}`~autogen_ext.models.AzureOpenAIChatCompletionClient`\n",
"* {py:class}`~autogen_ext.models.AzureAIChatCompletionClient`\n",
"* {py:class}`~autogen_ext.models.openai.OpenAIChatCompletionClient`\n",
"* {py:class}`~autogen_ext.models.openai.AzureOpenAIChatCompletionClient`\n",
"* {py:class}`~autogen_ext.models.azure.AzureAIChatCompletionClient`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## OpenAI\n",
"\n",
"\n",
"### OpenAI\n",
"\n",
"To use the {py:class}`~autogen_ext.models.OpenAIChatCompletionClient`, you need to install the `openai` extra."
"To use the {py:class}`~autogen_ext.models.openai.OpenAIChatCompletionClient`, you need to install the `openai` extra."
]
},
{
@@ -68,7 +70,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can call the {py:meth}`~autogen_ext.models.OpenAIChatCompletionClient.create` method to create a\n",
"You can call the {py:meth}`~autogen_ext.models.openai.BaseOpenAIChatCompletionClient.create` method to create a\n",
"chat completion request, and await for an {py:class}`~autogen_core.models.CreateResult` object in return."
]
},
@@ -118,12 +120,142 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### OpenAI-Compatible API\n",
"## Azure OpenAI\n",
"\n",
"You can use the {py:class}`~autogen_ext.models.OpenAIChatCompletionClient` to interact with OpenAI-compatible APIs such as Ollama and Gemini (beta).\n",
"To use the {py:class}`~autogen_ext.models.openai.AzureOpenAIChatCompletionClient`, you need to provide\n",
    "the deployment ID, Azure Cognitive Services endpoint, API version, and model capabilities.\n",
"For authentication, you can either provide an API key or an Azure Active Directory (AAD) token credential."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"# pip install \"autogen-ext[openai,azure]\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following code snippet shows how to use AAD authentication.\n",
"The identity used must be assigned the [**Cognitive Services OpenAI User**](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/role-based-access-control#cognitive-services-openai-user) role."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from autogen_ext.models.openai import AzureOpenAIChatCompletionClient\n",
"from azure.identity import DefaultAzureCredential, get_bearer_token_provider\n",
"\n",
"#### Ollama\n",
"# Create the token provider\n",
"token_provider = get_bearer_token_provider(DefaultAzureCredential(), \"https://cognitiveservices.azure.com/.default\")\n",
"\n",
"az_model_client = AzureOpenAIChatCompletionClient(\n",
" azure_deployment=\"{your-azure-deployment}\",\n",
" model=\"{model-name, such as gpt-4o}\",\n",
" api_version=\"2024-06-01\",\n",
" azure_endpoint=\"https://{your-custom-endpoint}.openai.azure.com/\",\n",
" azure_ad_token_provider=token_provider, # Optional if you choose key-based authentication.\n",
" # api_key=\"sk-...\", # For key-based authentication.\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{note}\n",
    "See [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/managed-identity#chat-completions) for how to use the Azure client directly and for more information.\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Azure AI Foundry\n",
"\n",
"[Azure AI Foundry](https://learn.microsoft.com/en-us/azure/ai-studio/) (previously known as Azure AI Studio) offers models hosted on Azure.\n",
"To use those models, you use the {py:class}`~autogen_ext.models.azure.AzureAIChatCompletionClient`.\n",
"\n",
"You need to install the `azure` extra to use this client."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"# pip install \"autogen-ext[openai,azure]\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is an example of using this client with the Phi-4 model from [GitHub Marketplace](https://github.com/marketplace/models)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"finish_reason='stop' content='The capital of France is Paris.' usage=RequestUsage(prompt_tokens=14, completion_tokens=8) cached=False logprobs=None\n"
]
}
],
"source": [
"import os\n",
"\n",
"from autogen_core.models import UserMessage\n",
"from autogen_ext.models.azure import AzureAIChatCompletionClient\n",
"from azure.core.credentials import AzureKeyCredential\n",
"\n",
"client = AzureAIChatCompletionClient(\n",
" model=\"Phi-4\",\n",
" endpoint=\"https://models.inference.ai.azure.com\",\n",
" # To authenticate with the model you will need to generate a personal access token (PAT) in your GitHub settings.\n",
" # Create your PAT token by following instructions here: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens\n",
" credential=AzureKeyCredential(os.environ[\"GITHUB_TOKEN\"]),\n",
" model_info={\n",
" \"json_output\": False,\n",
" \"function_calling\": False,\n",
" \"vision\": False,\n",
" \"family\": \"unknown\",\n",
" },\n",
")\n",
"\n",
"result = await client.create([UserMessage(content=\"What is the capital of France?\", source=\"user\")])\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ollama\n",
"\n",
"You can use the {py:class}`~autogen_ext.models.openai.OpenAIChatCompletionClient` to interact with OpenAI-compatible APIs such as Ollama and Gemini (beta).\n",
    "The example below shows how to use a local model running on an [Ollama](https://ollama.com) server."
]
},
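
The Ollama code cell itself is elided from this diff. A minimal configuration sketch for reference — the model name, endpoint, and `model_info` values below are assumptions for illustration, not part of this commit:

```python
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Point the OpenAI-compatible client at a local Ollama server.
# Assumes Ollama is serving its OpenAI-compatible API on the default port
# and that the named model has already been pulled.
ollama_client = OpenAIChatCompletionClient(
    model="llama3.2",  # Any model available in the local Ollama instance.
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint.
    api_key="placeholder",  # Ollama ignores the key, but the client requires one.
    model_info={
        "json_output": False,
        "function_calling": True,
        "vision": False,
        "family": "unknown",
    },
)
```

Since the model is not an OpenAI model, `model_info` must be supplied so the client knows which capabilities to assume.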
@@ -164,9 +296,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Gemini (beta)\n",
"## Gemini (beta)\n",
"\n",
"The below example shows how to use the Gemini model."
    "The example below shows how to use the Gemini model via the {py:class}`~autogen_ext.models.openai.OpenAIChatCompletionClient`."
]
},
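
The Gemini code cell is also elided here. A configuration sketch under stated assumptions — the specific model name is illustrative, and a Google AI Studio API key is required:

```python
from autogen_ext.models.openai import OpenAIChatCompletionClient

# The client recognizes Gemini model names and applies suitable defaults;
# the model name here is an assumption for illustration.
gemini_client = OpenAIChatCompletionClient(
    model="gemini-1.5-flash-8b",
    api_key="GEMINI_API_KEY",  # Replace with a real key, e.g. from an environment variable.
)
```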
{
@@ -208,9 +340,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Streaming Response\n",
"## Streaming Response\n",
"\n",
"You can use the {py:meth}`~autogen_ext.models.OpenAIChatCompletionClient.create_streaming` method to create a\n",
"You can use the {py:meth}`~autogen_ext.models.openai.BaseOpenAIChatCompletionClient.create_stream` method to create a\n",
"chat completion request with streaming response."
]
},
@@ -366,133 +498,70 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Azure OpenAI\n",
"## Structured Output\n",
"\n",
"To use the {py:class}`~autogen_ext.models.AzureOpenAIChatCompletionClient`, you need to provide\n",
"the deployment id, Azure Cognitive Services endpoint, api version, and model capabilities.\n",
"For authentication, you can either provide an API key or an Azure Active Directory (AAD) token credential."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"# pip install \"autogen-ext[openai,azure]\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following code snippet shows how to use AAD authentication.\n",
"The identity used must be assigned the [**Cognitive Services OpenAI User**](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/role-based-access-control#cognitive-services-openai-user) role."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from autogen_ext.models.openai import AzureOpenAIChatCompletionClient\n",
"from azure.identity import DefaultAzureCredential, get_bearer_token_provider\n",
"Structured output can be enabled by setting the `response_format` field in\n",
"{py:class}`~autogen_ext.models.openai.OpenAIChatCompletionClient` and {py:class}`~autogen_ext.models.openai.AzureOpenAIChatCompletionClient` to\n",
    "a [Pydantic BaseModel](https://docs.pydantic.dev/latest/concepts/models/) class.\n",
"\n",
"# Create the token provider\n",
"token_provider = get_bearer_token_provider(DefaultAzureCredential(), \"https://cognitiveservices.azure.com/.default\")\n",
"\n",
"az_model_client = AzureOpenAIChatCompletionClient(\n",
" azure_deployment=\"{your-azure-deployment}\",\n",
" model=\"{model-name, such as gpt-4o}\",\n",
" api_version=\"2024-06-01\",\n",
" azure_endpoint=\"https://{your-custom-endpoint}.openai.azure.com/\",\n",
" azure_ad_token_provider=token_provider, # Optional if you choose key-based authentication.\n",
" # api_key=\"sk-...\", # For key-based authentication.\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{note}\n",
"See [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/managed-identity#chat-completions) for how to use the Azure client directly or for more info.\n",
    "Structured output is only available when both the model and the model client\n",
    "support it.\n",
"Currently, the {py:class}`~autogen_ext.models.openai.OpenAIChatCompletionClient`\n",
"and {py:class}`~autogen_ext.models.openai.AzureOpenAIChatCompletionClient`\n",
"support structured output.\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Azure AI Foundry\n",
"\n",
"[Azure AI Foundry](https://learn.microsoft.com/en-us/azure/ai-studio/) (previously known as Azure AI Studio) offers models hosted on Azure.\n",
"To use those models, you use the {py:class}`~autogen_ext.models.azure.AzureAIChatCompletionClient`.\n",
"\n",
"You need to install the `azure` extra to use this client."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"# pip install \"autogen-ext[openai,azure]\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is an example of using this client with the Phi-4 model from [GitHub Marketplace](https://github.com/marketplace/models)."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"finish_reason='stop' content='The capital of France is Paris.' usage=RequestUsage(prompt_tokens=14, completion_tokens=8) cached=False logprobs=None\n"
"I'm glad to hear that you're feeling happy! It's such a great emotion that can brighten your whole day. Is there anything in particular that's bringing you joy today? 😊\n",
"happy\n"
]
}
],
"source": [
"import os\n",
"from typing import Literal\n",
"\n",
"from autogen_core.models import UserMessage\n",
"from autogen_ext.models.azure import AzureAIChatCompletionClient\n",
"from azure.core.credentials import AzureKeyCredential\n",
"from pydantic import BaseModel\n",
"\n",
"client = AzureAIChatCompletionClient(\n",
" model=\"Phi-4\",\n",
" endpoint=\"https://models.inference.ai.azure.com\",\n",
" # To authenticate with the model you will need to generate a personal access token (PAT) in your GitHub settings.\n",
" # Create your PAT token by following instructions here: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens\n",
" credential=AzureKeyCredential(os.environ[\"GITHUB_TOKEN\"]),\n",
" model_info={\n",
" \"json_output\": False,\n",
" \"function_calling\": False,\n",
" \"vision\": False,\n",
" \"family\": \"unknown\",\n",
" },\n",
"\n",
"# The response format for the agent as a Pydantic base model.\n",
"class AgentResponse(BaseModel):\n",
" thoughts: str\n",
" response: Literal[\"happy\", \"sad\", \"neutral\"]\n",
"\n",
"\n",
"# Create an agent that uses the OpenAI GPT-4o model with the custom response format.\n",
"model_client = OpenAIChatCompletionClient(\n",
" model=\"gpt-4o\",\n",
" response_format=AgentResponse, # type: ignore\n",
")\n",
"\n",
"result = await client.create([UserMessage(content=\"What is the capital of France?\", source=\"user\")])\n",
"print(result)"
"# Send a message list to the model and await the response.\n",
"messages = [\n",
" UserMessage(content=\"I am happy.\", source=\"user\"),\n",
"]\n",
"response = await model_client.create(messages=messages)\n",
"assert isinstance(response.content, str)\n",
"parsed_response = AgentResponse.model_validate_json(response.content)\n",
"print(parsed_response.thoughts)\n",
"print(parsed_response.response)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "You can also use the `extra_create_args` parameter in the {py:meth}`~autogen_ext.models.openai.BaseOpenAIChatCompletionClient.create` method\n",
    "to set the `response_format` field, so that structured output can be configured for each request."
]
},
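
A sketch of that per-request usage — `model_client` and `messages` are assumed to be set up as in the earlier cells, and the validation step below runs on a hard-coded sample string for illustration:

```python
from typing import Literal

from pydantic import BaseModel


class AgentResponse(BaseModel):
    thoughts: str
    response: Literal["happy", "sad", "neutral"]


# Per-request structured output (assumes `model_client` and `messages` from
# earlier cells; shown commented out since it needs a live client):
# response = await model_client.create(
#     messages=messages,
#     extra_create_args={"response_format": AgentResponse},
# )
# The returned content is a JSON string conforming to the schema, so it can
# be validated with the same Pydantic model. A hard-coded sample:
raw = '{"thoughts": "The user sounds pleased.", "response": "happy"}'
parsed = AgentResponse.model_validate_json(raw)
print(parsed.response)  # → happy
```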
{