mirror of
https://github.com/Significant-Gravitas/AutoGPT.git
synced 2026-04-08 03:00:28 -04:00
Merge remote-tracking branch 'origin/feat/agent-generation-dry-run-loop' into combined-preview-test
@@ -114,6 +114,21 @@ After building the file, reference it with `@@agptfile:` in other tools:

- When spawning sub-agents for research, ensure each has a distinct
  non-overlapping scope to avoid redundant searches.

### Tool Discovery Priority

When the user asks to interact with a service or API, follow this order:

1. **find_block first** — Search platform blocks with `find_block`. The platform has hundreds of built-in blocks (Google Sheets, Docs, Calendar, Gmail, Slack, GitHub, etc.) that work without extra setup.

2. **run_mcp_tool** — If no matching block exists, check if a hosted MCP server is available for the service. Only use known MCP server URLs from the registry.

3. **SendAuthenticatedWebRequestBlock** — If no block or MCP server exists, use `SendAuthenticatedWebRequestBlock` with existing host-scoped credentials. Check available credentials via `connect_integration`.

4. **Manual API call** — As a last resort, guide the user to set up credentials and use `SendAuthenticatedWebRequestBlock` with direct API calls.

**Never skip step 1.** Built-in blocks are more reliable, tested, and user-friendly than MCP or raw API calls.
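The four-step priority above is a simple fallback chain. A minimal sketch of that ordering — note that `pick_tool` and its lookup dicts are hypothetical stand-ins for illustration, not real platform APIs:

```python
# Hypothetical sketch of the Tool Discovery Priority fallback chain.
# The three dicts stand in for find_block results, the hosted MCP server
# registry, and connect_integration's host-scoped credentials.
def pick_tool(service, blocks, mcp_servers, host_credentials):
    if service in blocks:
        return ("block", blocks[service])  # 1. find_block first
    if service in mcp_servers:
        return ("run_mcp_tool", mcp_servers[service])  # 2. hosted MCP server
    if service in host_credentials:
        # 3. existing host-scoped credentials
        return ("SendAuthenticatedWebRequestBlock", host_credentials[service])
    return ("manual_api_call", None)  # 4. last resort: guide the user
```

The key property is that each branch is tried only when every earlier branch has no match, which is why step 1 can never be skipped.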
### Sub-agent tasks

- When using the Task tool, NEVER set `run_in_background` to true.
  All tasks must run in the foreground.
@@ -46,6 +46,12 @@ Steps:

   or fix manually based on the error descriptions. Iterate until valid.
7. **Save**: Call `create_agent` (new) or `edit_agent` (existing) with
   the final `agent_json`
8. **Dry-run**: ALWAYS call `run_agent` with `dry_run=True` and
   `wait_for_result=120` to verify the agent works end-to-end.
9. **Inspect & fix**: Check the dry-run output for errors. If issues are
   found, call `edit_agent` to fix and dry-run again. Repeat until the
   simulation passes or the problems are clearly unfixable.
   See "REQUIRED: Dry-Run Verification Loop" section below for details.

### Agent JSON Structure
@@ -239,19 +245,51 @@ call in a loop until the task is complete:

Regular blocks work exactly like sub-agents as tools — wire each input
field from `source_name: "tools"` on the Orchestrator side.

### Testing with Dry Run
### REQUIRED: Dry-Run Verification Loop (create -> dry-run -> fix)

After saving an agent, suggest a dry run to validate wiring without consuming
real API calls, credentials, or credits:
After creating or editing an agent, you MUST dry-run it before telling the
user the agent is ready. NEVER skip this step.

1. **Run**: Call `run_agent` or `run_block` with `dry_run=True` and provide
   sample inputs. This executes the graph with mock outputs, verifying that
   links resolve correctly and required inputs are satisfied.
2. **Check results**: Call `view_agent_output` with `show_execution_details=True`
   to inspect the full node-by-node execution trace. This shows what each node
   received as input and produced as output, making it easy to spot wiring issues.
3. **Iterate**: If the dry run reveals wiring issues or missing inputs, fix
   the agent JSON and re-save before suggesting a real execution.

#### Step-by-step workflow

1. **Create/Edit**: Call `create_agent` or `edit_agent` to save the agent.
2. **Dry-run**: Call `run_agent` with `dry_run=True`, `wait_for_result=120`,
   and realistic sample inputs that exercise every path in the agent. This
   simulates execution using an LLM for each block — no real API calls,
   credentials, or credits are consumed.
3. **Inspect output**: Examine the dry-run result for problems. If
   `wait_for_result` returns only a summary, call
   `view_agent_output(execution_id=..., show_execution_details=True)` to
   see the full node-by-node execution trace. Look for:
   - **Errors / failed nodes** — a node raised an exception or returned an
     error status. Common causes: wrong `source_name`/`sink_name` in links,
     missing `input_default` values, or referencing a nonexistent block output.
   - **Null / empty outputs** — data did not flow through a link. Verify that
     `source_name` and `sink_name` match the block schemas exactly (case-
     sensitive, including nested `_#_` notation).
   - **Nodes that never executed** — the node was not reached. Likely a
     missing or broken link from an upstream node.
   - **Unexpected values** — data arrived but in the wrong type or
     structure. Check type compatibility between linked ports.
4. **Fix**: If any issues are found, call `edit_agent` with the corrected
   agent JSON, then go back to step 2.
5. **Repeat**: Continue the dry-run -> fix cycle until the simulation passes
   or the problems are clearly unfixable. If you stop making progress,
   report the remaining issues to the user and ask for guidance.
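The five steps above amount to a bounded retry loop. A minimal sketch, assuming hypothetical callables in place of the real tools (the actual loop is driven by the copilot LLM, not by Python code):

```python
# Minimal sketch of the create -> dry-run -> fix cycle. run_agent, inspect,
# and edit_agent are hypothetical stand-ins for the real tool calls.
def dry_run_until_passing(agent_json, run_agent, inspect, edit_agent, max_rounds=5):
    issues = []
    for _ in range(max_rounds):
        result = run_agent(agent_json, dry_run=True, wait_for_result=120)
        issues = inspect(result)  # failed nodes, null outputs, wrong types...
        if not issues:
            return agent_json, []  # simulation passed
        agent_json = edit_agent(agent_json, issues)  # fix, then re-run
    return agent_json, issues  # stopped making progress: report to the user
```

The `max_rounds` cap mirrors the guide's escape hatch: when progress stalls, stop looping and surface the remaining issues to the user.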
#### Good vs bad dry-run output

**Good output** (agent is ready):
- All nodes executed successfully (no errors in the execution trace)
- Data flows through every link with non-null, correctly-typed values
- The final `AgentOutputBlock` contains a meaningful result
- Status is `COMPLETED`

**Bad output** (needs fixing):
- Status is `FAILED` — check the error message for the failing node
- An output node received `null` — trace back to find the broken link
- A node received data in the wrong format (e.g. string where list expected)
- Nodes downstream of a failing node were skipped entirely
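The good/bad criteria above can be folded into a single check. The result dict shape here (`status`, `nodes`, `error`, `output` keys) is an assumption for illustration, not the platform's actual output schema:

```python
# Sketch: decide whether a dry-run result looks "good" per the checklist above.
# The dict keys are assumed for illustration, not guaranteed by the platform.
def dry_run_looks_good(result):
    if result.get("status") != "COMPLETED":
        return False  # FAILED status: check the failing node's error message
    nodes = result.get("nodes", [])
    if any(n.get("error") for n in nodes):
        return False  # a node raised or returned an error status
    # every executed node must have produced a non-null output
    return all(n.get("output") is not None for n in nodes)
```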
**Special block behaviour in dry-run mode:**
- **OrchestratorBlock** and **AgentExecutorBlock** execute for real so the
@@ -28,13 +28,12 @@ Each result includes a `remotes` array with the exact server URL to use.

### Important: Check blocks first

Before using `run_mcp_tool`, always check if the platform already has blocks for the service
using `find_block`. The platform has hundreds of built-in blocks (Google Sheets, Google Docs,
Google Calendar, Gmail, etc.) that work without MCP setup.
Always follow the **Tool Discovery Priority** described in the tool notes:
call `find_block` before resorting to `run_mcp_tool`.

Only use `run_mcp_tool` when:
- The service is in the known hosted MCP servers list above, OR
- You searched `find_block` first and found no matching blocks
- You searched `find_block` first and found no matching blocks, AND
- The service is in the known hosted MCP servers list above or found via the registry API

**Never guess or construct MCP server URLs.** Only use URLs from the known servers list above
or from the `remotes[].url` field in MCP registry search results.
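The revised gating rule — no matching block AND a URL from one of the two allowed sources — can be sketched as follows. The helper name and dict arguments are illustrative stand-ins, not real platform functions:

```python
# Sketch of the run_mcp_tool gating rule described above. "known_servers"
# and "registry_remotes" are hypothetical stand-ins for the hosted-servers
# list and the remotes[].url values from MCP registry search results.
def mcp_url_if_allowed(service, matching_blocks, known_servers, registry_remotes):
    if matching_blocks:
        return None  # a built-in block exists: use it instead
    # Never guess or construct URLs: only these two sources are allowed.
    return known_servers.get(service) or registry_remotes.get(service)
```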
@@ -42,7 +42,10 @@ class GetAgentBuildingGuideTool(BaseTool):

    @property
    def description(self) -> str:
        return "Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage). Call before generating agent JSON."
        return (
            "Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage, "
            "and the create->dry-run->fix iterative workflow). Call before generating agent JSON."
        )

    @property
    def parameters(self) -> dict[str, Any]:
@@ -153,7 +153,11 @@ class RunAgentTool(BaseTool):

            },
            "dry_run": {
                "type": "boolean",
                "description": "Execute in preview mode.",
                "description": (
                    "When true, simulates execution using an LLM for each block "
                    "— no real API calls, credentials, or credits. "
                    "See agent_generation_guide for the full workflow."
                ),
            },
        },
        "required": ["dry_run"],
0	autogpt_platform/backend/test/copilot/__init__.py	Normal file
394	autogpt_platform/backend/test/copilot/dry_run_loop_test.py	Normal file
@@ -0,0 +1,394 @@
"""Prompt regression tests AND functional tests for the dry-run verification loop.

NOTE: This file lives in test/copilot/ rather than being colocated with a
single source module because it is a cross-cutting test spanning multiple
modules: prompting.py, service.py, agent_generation_guide.md, and run_agent.py.

These tests verify that the create -> dry-run -> fix iterative workflow is
properly communicated through tool descriptions, the prompting supplement,
and the agent building guide.

After deduplication, the full dry-run workflow lives in the
agent_generation_guide.md only. The system prompt and individual tool
descriptions no longer repeat it — they keep a minimal footprint.

**Intentionally brittle**: the assertions check for specific substrings so
that accidental removal or rewording of key instructions is caught. If you
deliberately reword a prompt, update the corresponding assertion here.

--- Functional tests (added separately) ---

The dry-run loop is primarily a *prompt/guide* feature — the copilot reads
the guide and follows its instructions. There are no standalone Python
functions that implement "loop until passing" logic; the loop is driven by
the LLM. However, several pieces of real Python infrastructure make the
loop possible:

1. The ``run_agent`` and ``run_block`` OpenAI tool schemas expose a
   ``dry_run`` boolean parameter that the LLM must be able to set.
2. The ``RunAgentInput`` Pydantic model validates ``dry_run`` as a required
   bool, so the executor can branch on it.
3. The ``_check_prerequisites`` method in ``RunAgentTool`` bypasses
   credential and missing-input gates when ``dry_run=True``.
4. The guide documents the workflow steps in a specific order that the LLM
   must follow: create/edit -> dry-run -> inspect -> fix -> repeat.

The functional test classes below exercise items 1-4 directly.
"""
import re
from pathlib import Path
from typing import Any, cast

import pytest
from openai.types.chat import ChatCompletionToolParam
from pydantic import ValidationError

from backend.copilot.prompting import get_sdk_supplement
from backend.copilot.service import DEFAULT_SYSTEM_PROMPT
from backend.copilot.tools import TOOL_REGISTRY
from backend.copilot.tools.run_agent import RunAgentInput

# Resolved once for the whole module so individual tests stay fast.
_SDK_SUPPLEMENT = get_sdk_supplement(use_e2b=False, cwd="/tmp/test")


# ---------------------------------------------------------------------------
# Prompt regression tests (original)
# ---------------------------------------------------------------------------


class TestSystemPromptBasics:
    """Verify the system prompt includes essential baseline content.

    After deduplication, the dry-run workflow lives only in the guide.
    The system prompt carries tone and personality only.
    """

    def test_mentions_automations(self):
        assert "automations" in DEFAULT_SYSTEM_PROMPT.lower()

    def test_mentions_action_oriented(self):
        assert "action-oriented" in DEFAULT_SYSTEM_PROMPT.lower()
class TestToolDescriptionsDryRunLoop:
    """Verify tool descriptions and parameters related to the dry-run loop."""

    def test_get_agent_building_guide_mentions_workflow(self):
        desc = TOOL_REGISTRY["get_agent_building_guide"].description
        assert "dry-run" in desc.lower()

    def test_run_agent_dry_run_param_exists_and_is_boolean(self):
        schema = TOOL_REGISTRY["run_agent"].as_openai_tool()
        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
        assert "dry_run" in params["properties"]
        assert params["properties"]["dry_run"]["type"] == "boolean"

    def test_run_agent_dry_run_param_mentions_simulation(self):
        """After deduplication the dry_run param description mentions simulation."""
        schema = TOOL_REGISTRY["run_agent"].as_openai_tool()
        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
        dry_run_desc = params["properties"]["dry_run"]["description"]
        assert "simulat" in dry_run_desc.lower()


class TestPromptingSupplementContent:
    """Verify the prompting supplement (via get_sdk_supplement) includes
    essential shared tool notes. After deduplication, the dry-run workflow
    lives only in the guide; the supplement carries storage, file-handling,
    and tool-discovery notes.
    """

    def test_includes_tool_discovery_priority(self):
        assert "Tool Discovery Priority" in _SDK_SUPPLEMENT

    def test_includes_find_block_first(self):
        assert "find_block first" in _SDK_SUPPLEMENT or "find_block" in _SDK_SUPPLEMENT

    def test_includes_send_authenticated_web_request(self):
        assert "SendAuthenticatedWebRequestBlock" in _SDK_SUPPLEMENT


class TestAgentBuildingGuideDryRunLoop:
    """Verify the agent building guide includes the dry-run loop."""

    @pytest.fixture
    def guide_content(self):
        guide_path = (
            Path(__file__).resolve().parent.parent.parent
            / "backend"
            / "copilot"
            / "sdk"
            / "agent_generation_guide.md"
        )
        return guide_path.read_text(encoding="utf-8")

    def test_has_dry_run_verification_section(self, guide_content):
        assert "REQUIRED: Dry-Run Verification Loop" in guide_content

    def test_workflow_includes_dry_run_step(self, guide_content):
        assert "dry_run=True" in guide_content

    def test_mentions_good_vs_bad_output(self, guide_content):
        assert "**Good output**" in guide_content
        assert "**Bad output**" in guide_content

    def test_mentions_repeat_until_pass(self, guide_content):
        lower = guide_content.lower()
        assert "repeat" in lower
        assert "clearly unfixable" in lower

    def test_mentions_wait_for_result(self, guide_content):
        assert "wait_for_result=120" in guide_content

    def test_mentions_view_agent_output(self, guide_content):
        assert "view_agent_output" in guide_content

    def test_workflow_has_dry_run_and_inspect_steps(self, guide_content):
        assert "**Dry-run**" in guide_content
        assert "**Inspect & fix**" in guide_content
# ---------------------------------------------------------------------------
# Functional tests: tool schema validation
# ---------------------------------------------------------------------------


class TestRunAgentToolSchema:
    """Validate the run_agent OpenAI tool schema exposes dry_run correctly.

    These go beyond substring checks — they verify the full schema structure
    that the LLM receives, ensuring the parameter is well-formed and will be
    parsed correctly by OpenAI function-calling.
    """

    @pytest.fixture
    def schema(self) -> ChatCompletionToolParam:
        return TOOL_REGISTRY["run_agent"].as_openai_tool()

    def test_schema_is_valid_openai_tool(self, schema: ChatCompletionToolParam):
        """The schema has the required top-level OpenAI structure."""
        assert schema["type"] == "function"
        assert "function" in schema
        func = schema["function"]
        assert "name" in func
        assert "description" in func
        assert "parameters" in func
        assert func["name"] == "run_agent"

    def test_dry_run_is_required(self, schema: ChatCompletionToolParam):
        """dry_run must be in 'required' so the LLM always provides it explicitly."""
        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
        required = params.get("required", [])
        assert "dry_run" in required

    def test_dry_run_is_boolean_type(self, schema: ChatCompletionToolParam):
        """dry_run must be typed as boolean so the LLM generates true/false."""
        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
        assert params["properties"]["dry_run"]["type"] == "boolean"

    def test_dry_run_description_is_nonempty(self, schema: ChatCompletionToolParam):
        """The description must be present and substantive for LLM guidance."""
        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
        desc = params["properties"]["dry_run"]["description"]
        assert isinstance(desc, str)
        assert len(desc) > 10, "Description too short to guide the LLM"

    def test_wait_for_result_coexists_with_dry_run(
        self, schema: ChatCompletionToolParam
    ):
        """wait_for_result must also be present — the guide instructs the LLM
        to pass both dry_run=True and wait_for_result=120 together."""
        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
        assert "wait_for_result" in params["properties"]
        assert params["properties"]["wait_for_result"]["type"] == "integer"


class TestRunBlockToolSchema:
    """Validate the run_block OpenAI tool schema exposes dry_run correctly."""

    @pytest.fixture
    def schema(self) -> ChatCompletionToolParam:
        return TOOL_REGISTRY["run_block"].as_openai_tool()

    def test_schema_is_valid_openai_tool(self, schema: ChatCompletionToolParam):
        assert schema["type"] == "function"
        func = schema["function"]
        assert func["name"] == "run_block"
        assert "parameters" in func

    def test_dry_run_exists_and_is_boolean(self, schema: ChatCompletionToolParam):
        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
        props = params["properties"]
        assert "dry_run" in props
        assert props["dry_run"]["type"] == "boolean"

    def test_dry_run_is_required(self, schema: ChatCompletionToolParam):
        """dry_run must be required — along with block_id and input_data."""
        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
        required = params.get("required", [])
        assert "dry_run" in required
        assert "block_id" in required
        assert "input_data" in required

    def test_dry_run_description_mentions_preview(
        self, schema: ChatCompletionToolParam
    ):
        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
        desc = params["properties"]["dry_run"]["description"]
        assert isinstance(desc, str)
        assert (
            "preview mode" in desc.lower()
        ), "run_block dry_run description should mention preview mode"
# ---------------------------------------------------------------------------
# Functional tests: RunAgentInput Pydantic model
# ---------------------------------------------------------------------------


class TestRunAgentInputModel:
    """Validate RunAgentInput Pydantic model handles dry_run correctly.

    The executor reads dry_run from this model, so it must parse, default,
    and validate properly.
    """

    def test_dry_run_accepts_true(self):
        model = RunAgentInput(username_agent_slug="user/agent", dry_run=True)
        assert model.dry_run is True

    def test_dry_run_accepts_false(self):
        """dry_run=False must be accepted when provided explicitly."""
        model = RunAgentInput(username_agent_slug="user/agent", dry_run=False)
        assert model.dry_run is False

    def test_dry_run_coerces_truthy_int(self):
        """Pydantic bool fields coerce int 1 to True."""
        model = RunAgentInput(username_agent_slug="user/agent", dry_run=1)  # type: ignore[arg-type]
        assert model.dry_run is True

    def test_dry_run_coerces_falsy_int(self):
        """Pydantic bool fields coerce int 0 to False."""
        model = RunAgentInput(username_agent_slug="user/agent", dry_run=0)  # type: ignore[arg-type]
        assert model.dry_run is False

    def test_dry_run_with_wait_for_result(self):
        """The guide instructs passing both dry_run=True and wait_for_result=120.
        The model must accept this combination."""
        model = RunAgentInput(
            username_agent_slug="user/agent",
            dry_run=True,
            wait_for_result=120,
        )
        assert model.dry_run is True
        assert model.wait_for_result == 120

    def test_wait_for_result_upper_bound(self):
        """wait_for_result is bounded at 300 seconds (ge=0, le=300)."""
        with pytest.raises(ValidationError):
            RunAgentInput(
                username_agent_slug="user/agent",
                dry_run=True,
                wait_for_result=301,
            )

    def test_string_fields_are_stripped(self):
        """The strip_strings validator should strip whitespace from string fields."""
        model = RunAgentInput(username_agent_slug=" user/agent ", dry_run=True)
        assert model.username_agent_slug == "user/agent"
# ---------------------------------------------------------------------------
# Functional tests: guide documents the correct workflow ordering
# ---------------------------------------------------------------------------


class TestGuideWorkflowOrdering:
    """Verify the guide documents workflow steps in the correct order.

    The LLM must see: create/edit -> dry-run -> inspect -> fix -> repeat.
    If these steps are reordered, the copilot would follow the wrong sequence.
    These tests verify *ordering*, not just presence.
    """

    @pytest.fixture
    def guide_content(self) -> str:
        guide_path = (
            Path(__file__).resolve().parent.parent.parent
            / "backend"
            / "copilot"
            / "sdk"
            / "agent_generation_guide.md"
        )
        return guide_path.read_text(encoding="utf-8")

    def test_create_before_dry_run_in_workflow(self, guide_content: str):
        """Step 7 (Save/create_agent) must appear before step 8 (Dry-run)."""
        create_pos = guide_content.index("create_agent")
        dry_run_pos = guide_content.index("dry_run=True")
        assert (
            create_pos < dry_run_pos
        ), "create_agent must appear before dry_run=True in the workflow"

    def test_dry_run_before_inspect_in_verification_section(self, guide_content: str):
        """In the verification loop section, Dry-run step must come before
        Inspect & fix step."""
        section_start = guide_content.index("REQUIRED: Dry-Run Verification Loop")
        section = guide_content[section_start:]
        dry_run_pos = section.index("**Dry-run**")
        inspect_pos = section.index("**Inspect")
        assert (
            dry_run_pos < inspect_pos
        ), "Dry-run step must come before Inspect & fix in the verification loop"

    def test_fix_before_repeat_in_verification_section(self, guide_content: str):
        """The Fix step must come before the Repeat step."""
        section_start = guide_content.index("REQUIRED: Dry-Run Verification Loop")
        section = guide_content[section_start:]
        fix_pos = section.index("**Fix**")
        repeat_pos = section.index("**Repeat**")
        assert fix_pos < repeat_pos

    def test_good_output_before_bad_output(self, guide_content: str):
        """Good output examples should be listed before bad output examples,
        so the LLM sees the success pattern first."""
        good_pos = guide_content.index("**Good output**")
        bad_pos = guide_content.index("**Bad output**")
        assert good_pos < bad_pos

    def test_numbered_steps_in_verification_section(self, guide_content: str):
        """The step-by-step workflow should have numbered steps 1-5."""
        section_start = guide_content.index("Step-by-step workflow")
        section = guide_content[section_start:]
        # The section should contain numbered items 1 through 5
        for step_num in range(1, 6):
            assert (
                f"{step_num}. " in section
            ), f"Missing numbered step {step_num} in verification workflow"

    def test_workflow_steps_are_in_numbered_order(self, guide_content: str):
        """The main workflow steps (1-9) must appear in ascending order."""
        # Extract the numbered workflow items from the top-level workflow section
        workflow_start = guide_content.index("### Workflow for Creating/Editing Agents")
        # End at the next ### section
        next_section = guide_content.index("### Agent JSON Structure")
        workflow_section = guide_content[workflow_start:next_section]
        step_positions = []
        for step_num in range(1, 10):
            pattern = rf"^{step_num}\.\s"
            match = re.search(pattern, workflow_section, re.MULTILINE)
            if match:
                step_positions.append((step_num, match.start()))
        # Verify at least steps 1-9 are present and in order
        assert (
            len(step_positions) >= 9
        ), f"Expected 9 workflow steps, found {len(step_positions)}"
        for i in range(1, len(step_positions)):
            prev_num, prev_pos = step_positions[i - 1]
            curr_num, curr_pos = step_positions[i]
            assert prev_pos < curr_pos, (
                f"Step {prev_num} (pos {prev_pos}) should appear before "
                f"step {curr_num} (pos {curr_pos})"
            )
@@ -10,6 +10,7 @@ import { toDisplayName } from "@/providers/agent-credentials/helper";

import { APIKeyCredentialsModal } from "./components/APIKeyCredentialsModal/APIKeyCredentialsModal";
import { CredentialsFlatView } from "./components/CredentialsFlatView/CredentialsFlatView";
import { CredentialTypeSelector } from "./components/CredentialTypeSelector/CredentialTypeSelector";
import { DeleteConfirmationModal } from "./components/DeleteConfirmationModal/DeleteConfirmationModal";
import { HostScopedCredentialsModal } from "./components/HotScopedCredentialsModal/HotScopedCredentialsModal";
import { OAuthFlowWaitingModal } from "./components/OAuthWaitingModal/OAuthWaitingModal";
import { PasswordCredentialsModal } from "./components/PasswordCredentialsModal/PasswordCredentialsModal";

@@ -90,6 +91,11 @@ export function CredentialsInput({
    handleActionButtonClick,
    handleCredentialSelect,
    handleOAuthLogin,
    handleDeleteCredential,
    handleDeleteConfirm,
    credentialToDelete,
    setCredentialToDelete,
    deleteCredentialsMutation,
  } = hookData;

  const displayName = toDisplayName(provider);

@@ -113,6 +119,7 @@ export function CredentialsInput({
          onSelectCredential={handleCredentialSelect}
          onClearCredential={() => onSelectCredential(undefined)}
          onAddCredential={handleActionButtonClick}
          onDeleteCredential={readOnly ? undefined : handleDeleteCredential}
          actionButtonText={actionButtonText}
          isOptional={isOptional}
          showTitle={showTitle}

@@ -192,6 +199,13 @@ export function CredentialsInput({
          Error: {oAuthError}
        </Text>
      )}

      <DeleteConfirmationModal
        credentialToDelete={credentialToDelete}
        isDeleting={deleteCredentialsMutation.isPending}
        onClose={() => setCredentialToDelete(null)}
        onConfirm={handleDeleteConfirm}
      />
    </>
  )}
</div>
@@ -31,6 +31,7 @@ type Props = {
  onSelectCredential: (credentialId: string) => void;
  onClearCredential: () => void;
  onAddCredential: () => void;
  onDeleteCredential?: (credential: { id: string; title: string }) => void;
};

export function CredentialsFlatView({

@@ -47,6 +48,7 @@ export function CredentialsFlatView({
  onSelectCredential,
  onClearCredential,
  onAddCredential,
  onDeleteCredential,
}: Props) {
  const hasCredentials = credentials.length > 0;

@@ -99,6 +101,15 @@ export function CredentialsFlatView({
              provider={provider}
              displayName={displayName}
              onSelect={() => onSelectCredential(credential.id)}
              onDelete={
                onDeleteCredential
                  ? () =>
                      onDeleteCredential({
                        id: credential.id,
                        title: credential.title || credential.id,
                      })
                  : undefined
              }
              readOnly={readOnly}
            />
          ))}
@@ -1,4 +1,4 @@
import { useEffect, useState } from "react";
import { useContext, useEffect, useState } from "react";
import { z } from "zod";
import { useForm } from "react-hook-form";
import { zodResolver } from "@hookform/resolvers/zod";

@@ -16,6 +16,7 @@ import {
  BlockIOCredentialsSubSchema,
  CredentialsMetaInput,
} from "@/lib/autogpt-server-api/types";
import { CredentialsProvidersContext } from "@/providers/agent-credentials/credentials-provider";
import { getHostFromUrl } from "@/lib/utils/url";
import { PlusIcon, TrashIcon } from "@phosphor-icons/react";

@@ -35,6 +36,7 @@ export function HostScopedCredentialsModal({
  siblingInputs,
}: Props) {
  const credentials = useCredentials(schema, siblingInputs);
  const allProviders = useContext(CredentialsProvidersContext);

  // Get current host from siblingInputs or discriminator_values
  const currentUrl = credentials?.discriminatorValue;

@@ -89,7 +91,25 @@ export function HostScopedCredentialsModal({
    return null;
  }

  const { provider, providerName, createHostScopedCredentials } = credentials;
  const {
    provider,
    providerName,
    createHostScopedCredentials,
    deleteCredentials,
  } = credentials;

  // Use the unfiltered credential list from the provider context for deduplication.
  // The hook's savedCredentials is pre-filtered by discriminatorValue, which may be
  // empty when no URL is entered yet — causing deduplication to miss existing creds.
  const allProviderCredentials =
    allProviders?.[provider]?.savedCredentials ?? [];

  const hasExistingForHost = allProviderCredentials.some(
    (c) =>
      c.type === "host_scoped" &&
      "host" in c &&
      c.host === (currentHost || form.getValues("host")),
  );

  const addHeaderPair = () => {
    setHeaderPairs([...headerPairs, { key: "", value: "" }]);

@@ -123,9 +143,19 @@ export function HostScopedCredentialsModal({
      {} as Record<string, string>,
    );

    // Delete existing host-scoped credentials for the same host to avoid duplicates.
    // Uses unfiltered provider credentials (not the hook's pre-filtered list).
    const host = values.host;
    const existingForHost = allProviderCredentials.filter(
      (c) => c.type === "host_scoped" && "host" in c && c.host === host,
    );
    for (const existing of existingForHost) {
      await deleteCredentials(existing.id, true);
    }

    const newCredentials = await createHostScopedCredentials({
      host: values.host,
      title: currentHost || values.host,
      host,
      title: currentHost || host,
      headers,
    });

@@ -139,7 +169,11 @@ export function HostScopedCredentialsModal({

  return (
    <Dialog
      title={`Add sensitive headers for ${providerName}`}
      title={
        hasExistingForHost
          ? `Update sensitive headers for ${providerName}`
          : `Add sensitive headers for ${providerName}`
      }
      controlled={{
        isOpen: open,
        set: (isOpen) => {

@@ -241,7 +275,9 @@ export function HostScopedCredentialsModal({

      <div className="pt-8">
        <Button type="submit" className="w-full" size="small">
          Save & use these credentials
          {hasExistingForHost
            ? "Update & use these credentials"
            : "Save & use these credentials"}
        </Button>
      </div>
    </form>
@@ -149,7 +149,7 @@ export function getActionButtonText(

    if (supportsOAuth2) return "Connect another account";
    if (supportsApiKey) return "Use a new API key";
    if (supportsUserPassword) return "Add a new username and password";
    if (supportsHostScoped) return "Add new headers";
    if (supportsHostScoped) return "Update headers";
    return "Add new credentials";
  } else {
    if (supportsOAuth2) return "Add account";