mirror of
https://github.com/Significant-Gravitas/AutoGPT.git
synced 2026-04-08 03:00:28 -04:00
Merge remote-tracking branch 'origin/feat/agent-generation-dry-run-loop' into combined-preview-test
@@ -114,6 +114,21 @@ After building the file, reference it with `@@agptfile:` in other tools:

- When spawning sub-agents for research, ensure each has a distinct
  non-overlapping scope to avoid redundant searches.

### Tool Discovery Priority

When the user asks to interact with a service or API, follow this order:

1. **find_block first** — Search platform blocks with `find_block`. The platform has hundreds of built-in blocks (Google Sheets, Docs, Calendar, Gmail, Slack, GitHub, etc.) that work without extra setup.

2. **run_mcp_tool** — If no matching block exists, check if a hosted MCP server is available for the service. Only use known MCP server URLs from the registry.

3. **SendAuthenticatedWebRequestBlock** — If no block or MCP server exists, use `SendAuthenticatedWebRequestBlock` with existing host-scoped credentials. Check available credentials via `connect_integration`.

4. **Manual API call** — As a last resort, guide the user to set up credentials and use `SendAuthenticatedWebRequestBlock` with direct API calls.

**Never skip step 1.** Built-in blocks are more reliable, tested, and user-friendly than MCP or raw API calls.
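The four-step priority above is a simple fallback chain. A minimal sketch of that ordering — note that `pick_tool` and its lookup dicts are hypothetical stand-ins for illustration, not real platform APIs:

```python
# Hypothetical sketch of the Tool Discovery Priority fallback chain.
# The three dicts stand in for find_block results, the hosted MCP server
# registry, and connect_integration's host-scoped credentials.
def pick_tool(service, blocks, mcp_servers, host_credentials):
    if service in blocks:
        return ("block", blocks[service])  # 1. find_block first
    if service in mcp_servers:
        return ("run_mcp_tool", mcp_servers[service])  # 2. hosted MCP server
    if service in host_credentials:
        # 3. existing host-scoped credentials
        return ("SendAuthenticatedWebRequestBlock", host_credentials[service])
    return ("manual_api_call", None)  # 4. last resort: guide the user
```

The key property is that each branch is tried only when every earlier branch has no match, which is why step 1 can never be skipped.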
### Sub-agent tasks

- When using the Task tool, NEVER set `run_in_background` to true.
  All tasks must run in the foreground.
@@ -46,6 +46,12 @@ Steps:

   or fix manually based on the error descriptions. Iterate until valid.
7. **Save**: Call `create_agent` (new) or `edit_agent` (existing) with
   the final `agent_json`
8. **Dry-run**: ALWAYS call `run_agent` with `dry_run=True` and
   `wait_for_result=120` to verify the agent works end-to-end.
9. **Inspect & fix**: Check the dry-run output for errors. If issues are
   found, call `edit_agent` to fix and dry-run again. Repeat until the
   simulation passes or the problems are clearly unfixable.
   See "REQUIRED: Dry-Run Verification Loop" section below for details.

### Agent JSON Structure
@@ -239,19 +245,51 @@ call in a loop until the task is complete:

Regular blocks work exactly like sub-agents as tools — wire each input
field from `source_name: "tools"` on the Orchestrator side.

### Testing with Dry Run
### REQUIRED: Dry-Run Verification Loop (create -> dry-run -> fix)

After saving an agent, suggest a dry run to validate wiring without consuming
real API calls, credentials, or credits:
After creating or editing an agent, you MUST dry-run it before telling the
user the agent is ready. NEVER skip this step.

1. **Run**: Call `run_agent` or `run_block` with `dry_run=True` and provide
   sample inputs. This executes the graph with mock outputs, verifying that
   links resolve correctly and required inputs are satisfied.
2. **Check results**: Call `view_agent_output` with `show_execution_details=True`
   to inspect the full node-by-node execution trace. This shows what each node
   received as input and produced as output, making it easy to spot wiring issues.
3. **Iterate**: If the dry run reveals wiring issues or missing inputs, fix
   the agent JSON and re-save before suggesting a real execution.

#### Step-by-step workflow

1. **Create/Edit**: Call `create_agent` or `edit_agent` to save the agent.
2. **Dry-run**: Call `run_agent` with `dry_run=True`, `wait_for_result=120`,
   and realistic sample inputs that exercise every path in the agent. This
   simulates execution using an LLM for each block — no real API calls,
   credentials, or credits are consumed.
3. **Inspect output**: Examine the dry-run result for problems. If
   `wait_for_result` returns only a summary, call
   `view_agent_output(execution_id=..., show_execution_details=True)` to
   see the full node-by-node execution trace. Look for:
   - **Errors / failed nodes** — a node raised an exception or returned an
     error status. Common causes: wrong `source_name`/`sink_name` in links,
     missing `input_default` values, or referencing a nonexistent block output.
   - **Null / empty outputs** — data did not flow through a link. Verify that
     `source_name` and `sink_name` match the block schemas exactly (case-
     sensitive, including nested `_#_` notation).
   - **Nodes that never executed** — the node was not reached. Likely a
     missing or broken link from an upstream node.
   - **Unexpected values** — data arrived but in the wrong type or
     structure. Check type compatibility between linked ports.
4. **Fix**: If any issues are found, call `edit_agent` with the corrected
   agent JSON, then go back to step 2.
5. **Repeat**: Continue the dry-run -> fix cycle until the simulation passes
   or the problems are clearly unfixable. If you stop making progress,
   report the remaining issues to the user and ask for guidance.
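The five steps above amount to a bounded retry loop. A minimal sketch, assuming hypothetical callables in place of the real tools (the actual loop is driven by the copilot LLM, not by Python code):

```python
# Minimal sketch of the create -> dry-run -> fix cycle. run_agent, inspect,
# and edit_agent are hypothetical stand-ins for the real tool calls.
def dry_run_until_passing(agent_json, run_agent, inspect, edit_agent, max_rounds=5):
    issues = []
    for _ in range(max_rounds):
        result = run_agent(agent_json, dry_run=True, wait_for_result=120)
        issues = inspect(result)  # failed nodes, null outputs, wrong types...
        if not issues:
            return agent_json, []  # simulation passed
        agent_json = edit_agent(agent_json, issues)  # fix, then re-run
    return agent_json, issues  # stopped making progress: report to the user
```

The `max_rounds` cap mirrors the guide's escape hatch: when progress stalls, stop looping and surface the remaining issues to the user.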
#### Good vs bad dry-run output

**Good output** (agent is ready):
- All nodes executed successfully (no errors in the execution trace)
- Data flows through every link with non-null, correctly-typed values
- The final `AgentOutputBlock` contains a meaningful result
- Status is `COMPLETED`

**Bad output** (needs fixing):
- Status is `FAILED` — check the error message for the failing node
- An output node received `null` — trace back to find the broken link
- A node received data in the wrong format (e.g. string where list expected)
- Nodes downstream of a failing node were skipped entirely
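The good/bad criteria above can be folded into a single check. The result dict shape here (`status`, `nodes`, `error`, `output` keys) is an assumption for illustration, not the platform's actual output schema:

```python
# Sketch: decide whether a dry-run result looks "good" per the checklist above.
# The dict keys are assumed for illustration, not guaranteed by the platform.
def dry_run_looks_good(result):
    if result.get("status") != "COMPLETED":
        return False  # FAILED status: check the failing node's error message
    nodes = result.get("nodes", [])
    if any(n.get("error") for n in nodes):
        return False  # a node raised or returned an error status
    # every executed node must have produced a non-null output
    return all(n.get("output") is not None for n in nodes)
```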
**Special block behaviour in dry-run mode:**
- **OrchestratorBlock** and **AgentExecutorBlock** execute for real so the
@@ -28,13 +28,12 @@ Each result includes a `remotes` array with the exact server URL to use.

### Important: Check blocks first

Before using `run_mcp_tool`, always check if the platform already has blocks for the service
using `find_block`. The platform has hundreds of built-in blocks (Google Sheets, Google Docs,
Google Calendar, Gmail, etc.) that work without MCP setup.
Always follow the **Tool Discovery Priority** described in the tool notes:
call `find_block` before resorting to `run_mcp_tool`.

Only use `run_mcp_tool` when:
- The service is in the known hosted MCP servers list above, OR
- You searched `find_block` first and found no matching blocks
- You searched `find_block` first and found no matching blocks, AND
- The service is in the known hosted MCP servers list above or found via the registry API

**Never guess or construct MCP server URLs.** Only use URLs from the known servers list above
or from the `remotes[].url` field in MCP registry search results.
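The revised gating rule — no matching block AND a URL from one of the two allowed sources — can be sketched as follows. The helper name and dict arguments are illustrative stand-ins, not real platform functions:

```python
# Sketch of the run_mcp_tool gating rule described above. "known_servers"
# and "registry_remotes" are hypothetical stand-ins for the hosted-servers
# list and the remotes[].url values from MCP registry search results.
def mcp_url_if_allowed(service, matching_blocks, known_servers, registry_remotes):
    if matching_blocks:
        return None  # a built-in block exists: use it instead
    # Never guess or construct URLs: only these two sources are allowed.
    return known_servers.get(service) or registry_remotes.get(service)
```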
@@ -42,7 +42,10 @@ class GetAgentBuildingGuideTool(BaseTool):

    @property
    def description(self) -> str:
        return "Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage). Call before generating agent JSON."
        return (
            "Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage, "
            "and the create->dry-run->fix iterative workflow). Call before generating agent JSON."
        )

    @property
    def parameters(self) -> dict[str, Any]:
@@ -153,7 +153,11 @@ class RunAgentTool(BaseTool):

            },
            "dry_run": {
                "type": "boolean",
                "description": "Execute in preview mode.",
                "description": (
                    "When true, simulates execution using an LLM for each block "
                    "— no real API calls, credentials, or credits. "
                    "See agent_generation_guide for the full workflow."
                ),
            },
        },
        "required": ["dry_run"],
0	autogpt_platform/backend/test/copilot/__init__.py	Normal file
394	autogpt_platform/backend/test/copilot/dry_run_loop_test.py	Normal file
@@ -0,0 +1,394 @@
"""Prompt regression tests AND functional tests for the dry-run verification loop.

NOTE: This file lives in test/copilot/ rather than being colocated with a
single source module because it is a cross-cutting test spanning multiple
modules: prompting.py, service.py, agent_generation_guide.md, and run_agent.py.

These tests verify that the create -> dry-run -> fix iterative workflow is
properly communicated through tool descriptions, the prompting supplement,
and the agent building guide.

After deduplication, the full dry-run workflow lives in the
agent_generation_guide.md only. The system prompt and individual tool
descriptions no longer repeat it — they keep a minimal footprint.

**Intentionally brittle**: the assertions check for specific substrings so
that accidental removal or rewording of key instructions is caught. If you
deliberately reword a prompt, update the corresponding assertion here.

--- Functional tests (added separately) ---

The dry-run loop is primarily a *prompt/guide* feature — the copilot reads
the guide and follows its instructions. There are no standalone Python
functions that implement "loop until passing" logic; the loop is driven by
the LLM. However, several pieces of real Python infrastructure make the
loop possible:

1. The ``run_agent`` and ``run_block`` OpenAI tool schemas expose a
   ``dry_run`` boolean parameter that the LLM must be able to set.
2. The ``RunAgentInput`` Pydantic model validates ``dry_run`` as a required
   bool, so the executor can branch on it.
3. The ``_check_prerequisites`` method in ``RunAgentTool`` bypasses
   credential and missing-input gates when ``dry_run=True``.
4. The guide documents the workflow steps in a specific order that the LLM
   must follow: create/edit -> dry-run -> inspect -> fix -> repeat.

The functional test classes below exercise items 1-4 directly.
"""
import re
from pathlib import Path
from typing import Any, cast

import pytest
from openai.types.chat import ChatCompletionToolParam
from pydantic import ValidationError

from backend.copilot.prompting import get_sdk_supplement
from backend.copilot.service import DEFAULT_SYSTEM_PROMPT
from backend.copilot.tools import TOOL_REGISTRY
from backend.copilot.tools.run_agent import RunAgentInput

# Resolved once for the whole module so individual tests stay fast.
_SDK_SUPPLEMENT = get_sdk_supplement(use_e2b=False, cwd="/tmp/test")


# ---------------------------------------------------------------------------
# Prompt regression tests (original)
# ---------------------------------------------------------------------------


class TestSystemPromptBasics:
    """Verify the system prompt includes essential baseline content.

    After deduplication, the dry-run workflow lives only in the guide.
    The system prompt carries tone and personality only.
    """

    def test_mentions_automations(self):
        assert "automations" in DEFAULT_SYSTEM_PROMPT.lower()

    def test_mentions_action_oriented(self):
        assert "action-oriented" in DEFAULT_SYSTEM_PROMPT.lower()
class TestToolDescriptionsDryRunLoop:
    """Verify tool descriptions and parameters related to the dry-run loop."""

    def test_get_agent_building_guide_mentions_workflow(self):
        desc = TOOL_REGISTRY["get_agent_building_guide"].description
        assert "dry-run" in desc.lower()

    def test_run_agent_dry_run_param_exists_and_is_boolean(self):
        schema = TOOL_REGISTRY["run_agent"].as_openai_tool()
        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
        assert "dry_run" in params["properties"]
        assert params["properties"]["dry_run"]["type"] == "boolean"

    def test_run_agent_dry_run_param_mentions_simulation(self):
        """After deduplication the dry_run param description mentions simulation."""
        schema = TOOL_REGISTRY["run_agent"].as_openai_tool()
        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
        dry_run_desc = params["properties"]["dry_run"]["description"]
        assert "simulat" in dry_run_desc.lower()


class TestPromptingSupplementContent:
    """Verify the prompting supplement (via get_sdk_supplement) includes
    essential shared tool notes. After deduplication, the dry-run workflow
    lives only in the guide; the supplement carries storage, file-handling,
    and tool-discovery notes.
    """

    def test_includes_tool_discovery_priority(self):
        assert "Tool Discovery Priority" in _SDK_SUPPLEMENT

    def test_includes_find_block_first(self):
        assert "find_block first" in _SDK_SUPPLEMENT or "find_block" in _SDK_SUPPLEMENT

    def test_includes_send_authenticated_web_request(self):
        assert "SendAuthenticatedWebRequestBlock" in _SDK_SUPPLEMENT


class TestAgentBuildingGuideDryRunLoop:
    """Verify the agent building guide includes the dry-run loop."""

    @pytest.fixture
    def guide_content(self):
        guide_path = (
            Path(__file__).resolve().parent.parent.parent
            / "backend"
            / "copilot"
            / "sdk"
            / "agent_generation_guide.md"
        )
        return guide_path.read_text(encoding="utf-8")

    def test_has_dry_run_verification_section(self, guide_content):
        assert "REQUIRED: Dry-Run Verification Loop" in guide_content

    def test_workflow_includes_dry_run_step(self, guide_content):
        assert "dry_run=True" in guide_content

    def test_mentions_good_vs_bad_output(self, guide_content):
        assert "**Good output**" in guide_content
        assert "**Bad output**" in guide_content

    def test_mentions_repeat_until_pass(self, guide_content):
        lower = guide_content.lower()
        assert "repeat" in lower
        assert "clearly unfixable" in lower

    def test_mentions_wait_for_result(self, guide_content):
        assert "wait_for_result=120" in guide_content

    def test_mentions_view_agent_output(self, guide_content):
        assert "view_agent_output" in guide_content

    def test_workflow_has_dry_run_and_inspect_steps(self, guide_content):
        assert "**Dry-run**" in guide_content
        assert "**Inspect & fix**" in guide_content
# ---------------------------------------------------------------------------
# Functional tests: tool schema validation
# ---------------------------------------------------------------------------


class TestRunAgentToolSchema:
    """Validate the run_agent OpenAI tool schema exposes dry_run correctly.

    These go beyond substring checks — they verify the full schema structure
    that the LLM receives, ensuring the parameter is well-formed and will be
    parsed correctly by OpenAI function-calling.
    """

    @pytest.fixture
    def schema(self) -> ChatCompletionToolParam:
        return TOOL_REGISTRY["run_agent"].as_openai_tool()

    def test_schema_is_valid_openai_tool(self, schema: ChatCompletionToolParam):
        """The schema has the required top-level OpenAI structure."""
        assert schema["type"] == "function"
        assert "function" in schema
        func = schema["function"]
        assert "name" in func
        assert "description" in func
        assert "parameters" in func
        assert func["name"] == "run_agent"

    def test_dry_run_is_required(self, schema: ChatCompletionToolParam):
        """dry_run must be in 'required' so the LLM always provides it explicitly."""
        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
        required = params.get("required", [])
        assert "dry_run" in required

    def test_dry_run_is_boolean_type(self, schema: ChatCompletionToolParam):
        """dry_run must be typed as boolean so the LLM generates true/false."""
        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
        assert params["properties"]["dry_run"]["type"] == "boolean"

    def test_dry_run_description_is_nonempty(self, schema: ChatCompletionToolParam):
        """The description must be present and substantive for LLM guidance."""
        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
        desc = params["properties"]["dry_run"]["description"]
        assert isinstance(desc, str)
        assert len(desc) > 10, "Description too short to guide the LLM"

    def test_wait_for_result_coexists_with_dry_run(
        self, schema: ChatCompletionToolParam
    ):
        """wait_for_result must also be present — the guide instructs the LLM
        to pass both dry_run=True and wait_for_result=120 together."""
        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
        assert "wait_for_result" in params["properties"]
        assert params["properties"]["wait_for_result"]["type"] == "integer"


class TestRunBlockToolSchema:
    """Validate the run_block OpenAI tool schema exposes dry_run correctly."""

    @pytest.fixture
    def schema(self) -> ChatCompletionToolParam:
        return TOOL_REGISTRY["run_block"].as_openai_tool()

    def test_schema_is_valid_openai_tool(self, schema: ChatCompletionToolParam):
        assert schema["type"] == "function"
        func = schema["function"]
        assert func["name"] == "run_block"
        assert "parameters" in func

    def test_dry_run_exists_and_is_boolean(self, schema: ChatCompletionToolParam):
        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
        props = params["properties"]
        assert "dry_run" in props
        assert props["dry_run"]["type"] == "boolean"

    def test_dry_run_is_required(self, schema: ChatCompletionToolParam):
        """dry_run must be required — along with block_id and input_data."""
        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
        required = params.get("required", [])
        assert "dry_run" in required
        assert "block_id" in required
        assert "input_data" in required

    def test_dry_run_description_mentions_preview(
        self, schema: ChatCompletionToolParam
    ):
        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
        desc = params["properties"]["dry_run"]["description"]
        assert isinstance(desc, str)
        assert (
            "preview mode" in desc.lower()
        ), "run_block dry_run description should mention preview mode"
# ---------------------------------------------------------------------------
# Functional tests: RunAgentInput Pydantic model
# ---------------------------------------------------------------------------


class TestRunAgentInputModel:
    """Validate RunAgentInput Pydantic model handles dry_run correctly.

    The executor reads dry_run from this model, so it must parse, default,
    and validate properly.
    """

    def test_dry_run_accepts_true(self):
        model = RunAgentInput(username_agent_slug="user/agent", dry_run=True)
        assert model.dry_run is True

    def test_dry_run_accepts_false(self):
        """dry_run=False must be accepted when provided explicitly."""
        model = RunAgentInput(username_agent_slug="user/agent", dry_run=False)
        assert model.dry_run is False

    def test_dry_run_coerces_truthy_int(self):
        """Pydantic bool fields coerce int 1 to True."""
        model = RunAgentInput(username_agent_slug="user/agent", dry_run=1)  # type: ignore[arg-type]
        assert model.dry_run is True

    def test_dry_run_coerces_falsy_int(self):
        """Pydantic bool fields coerce int 0 to False."""
        model = RunAgentInput(username_agent_slug="user/agent", dry_run=0)  # type: ignore[arg-type]
        assert model.dry_run is False

    def test_dry_run_with_wait_for_result(self):
        """The guide instructs passing both dry_run=True and wait_for_result=120.
        The model must accept this combination."""
        model = RunAgentInput(
            username_agent_slug="user/agent",
            dry_run=True,
            wait_for_result=120,
        )
        assert model.dry_run is True
        assert model.wait_for_result == 120

    def test_wait_for_result_upper_bound(self):
        """wait_for_result is bounded at 300 seconds (ge=0, le=300)."""
        with pytest.raises(ValidationError):
            RunAgentInput(
                username_agent_slug="user/agent",
                dry_run=True,
                wait_for_result=301,
            )

    def test_string_fields_are_stripped(self):
        """The strip_strings validator should strip whitespace from string fields."""
        model = RunAgentInput(username_agent_slug=" user/agent ", dry_run=True)
        assert model.username_agent_slug == "user/agent"
# ---------------------------------------------------------------------------
# Functional tests: guide documents the correct workflow ordering
# ---------------------------------------------------------------------------


class TestGuideWorkflowOrdering:
    """Verify the guide documents workflow steps in the correct order.

    The LLM must see: create/edit -> dry-run -> inspect -> fix -> repeat.
    If these steps are reordered, the copilot would follow the wrong sequence.
    These tests verify *ordering*, not just presence.
    """

    @pytest.fixture
    def guide_content(self) -> str:
        guide_path = (
            Path(__file__).resolve().parent.parent.parent
            / "backend"
            / "copilot"
            / "sdk"
            / "agent_generation_guide.md"
        )
        return guide_path.read_text(encoding="utf-8")

    def test_create_before_dry_run_in_workflow(self, guide_content: str):
        """Step 7 (Save/create_agent) must appear before step 8 (Dry-run)."""
        create_pos = guide_content.index("create_agent")
        dry_run_pos = guide_content.index("dry_run=True")
        assert (
            create_pos < dry_run_pos
        ), "create_agent must appear before dry_run=True in the workflow"

    def test_dry_run_before_inspect_in_verification_section(self, guide_content: str):
        """In the verification loop section, Dry-run step must come before
        Inspect & fix step."""
        section_start = guide_content.index("REQUIRED: Dry-Run Verification Loop")
        section = guide_content[section_start:]
        dry_run_pos = section.index("**Dry-run**")
        inspect_pos = section.index("**Inspect")
        assert (
            dry_run_pos < inspect_pos
        ), "Dry-run step must come before Inspect & fix in the verification loop"

    def test_fix_before_repeat_in_verification_section(self, guide_content: str):
        """The Fix step must come before the Repeat step."""
        section_start = guide_content.index("REQUIRED: Dry-Run Verification Loop")
        section = guide_content[section_start:]
        fix_pos = section.index("**Fix**")
        repeat_pos = section.index("**Repeat**")
        assert fix_pos < repeat_pos

    def test_good_output_before_bad_output(self, guide_content: str):
        """Good output examples should be listed before bad output examples,
        so the LLM sees the success pattern first."""
        good_pos = guide_content.index("**Good output**")
        bad_pos = guide_content.index("**Bad output**")
        assert good_pos < bad_pos

    def test_numbered_steps_in_verification_section(self, guide_content: str):
        """The step-by-step workflow should have numbered steps 1-5."""
        section_start = guide_content.index("Step-by-step workflow")
        section = guide_content[section_start:]
        # The section should contain numbered items 1 through 5
        for step_num in range(1, 6):
            assert (
                f"{step_num}. " in section
            ), f"Missing numbered step {step_num} in verification workflow"

    def test_workflow_steps_are_in_numbered_order(self, guide_content: str):
        """The main workflow steps (1-9) must appear in ascending order."""
        # Extract the numbered workflow items from the top-level workflow section
        workflow_start = guide_content.index("### Workflow for Creating/Editing Agents")
        # End at the next ### section
        next_section = guide_content.index("### Agent JSON Structure")
        workflow_section = guide_content[workflow_start:next_section]
        step_positions = []
        for step_num in range(1, 10):
            pattern = rf"^{step_num}\.\s"
            match = re.search(pattern, workflow_section, re.MULTILINE)
            if match:
                step_positions.append((step_num, match.start()))
        # Verify at least steps 1-9 are present and in order
        assert (
            len(step_positions) >= 9
        ), f"Expected 9 workflow steps, found {len(step_positions)}"
        for i in range(1, len(step_positions)):
            prev_num, prev_pos = step_positions[i - 1]
            curr_num, curr_pos = step_positions[i]
            assert prev_pos < curr_pos, (
                f"Step {prev_num} (pos {prev_pos}) should appear before "
                f"step {curr_num} (pos {curr_pos})"
            )
@@ -10,6 +10,7 @@ import { toDisplayName } from "@/providers/agent-credentials/helper";

import { APIKeyCredentialsModal } from "./components/APIKeyCredentialsModal/APIKeyCredentialsModal";
import { CredentialsFlatView } from "./components/CredentialsFlatView/CredentialsFlatView";
import { CredentialTypeSelector } from "./components/CredentialTypeSelector/CredentialTypeSelector";
import { DeleteConfirmationModal } from "./components/DeleteConfirmationModal/DeleteConfirmationModal";
import { HostScopedCredentialsModal } from "./components/HotScopedCredentialsModal/HotScopedCredentialsModal";
import { OAuthFlowWaitingModal } from "./components/OAuthWaitingModal/OAuthWaitingModal";
import { PasswordCredentialsModal } from "./components/PasswordCredentialsModal/PasswordCredentialsModal";

@@ -90,6 +91,11 @@ export function CredentialsInput({
    handleActionButtonClick,
    handleCredentialSelect,
    handleOAuthLogin,
    handleDeleteCredential,
    handleDeleteConfirm,
    credentialToDelete,
    setCredentialToDelete,
    deleteCredentialsMutation,
  } = hookData;

  const displayName = toDisplayName(provider);

@@ -113,6 +119,7 @@ export function CredentialsInput({
          onSelectCredential={handleCredentialSelect}
          onClearCredential={() => onSelectCredential(undefined)}
          onAddCredential={handleActionButtonClick}
          onDeleteCredential={readOnly ? undefined : handleDeleteCredential}
          actionButtonText={actionButtonText}
          isOptional={isOptional}
          showTitle={showTitle}

@@ -192,6 +199,13 @@ export function CredentialsInput({
          Error: {oAuthError}
        </Text>
      )}

      <DeleteConfirmationModal
        credentialToDelete={credentialToDelete}
        isDeleting={deleteCredentialsMutation.isPending}
        onClose={() => setCredentialToDelete(null)}
        onConfirm={handleDeleteConfirm}
      />
    </>
  )}
</div>
@@ -31,6 +31,7 @@ type Props = {
  onSelectCredential: (credentialId: string) => void;
  onClearCredential: () => void;
  onAddCredential: () => void;
  onDeleteCredential?: (credential: { id: string; title: string }) => void;
};

export function CredentialsFlatView({

@@ -47,6 +48,7 @@ export function CredentialsFlatView({
  onSelectCredential,
  onClearCredential,
  onAddCredential,
  onDeleteCredential,
}: Props) {
  const hasCredentials = credentials.length > 0;

@@ -99,6 +101,15 @@ export function CredentialsFlatView({
              provider={provider}
              displayName={displayName}
              onSelect={() => onSelectCredential(credential.id)}
              onDelete={
                onDeleteCredential
                  ? () =>
                      onDeleteCredential({
                        id: credential.id,
                        title: credential.title || credential.id,
                      })
                  : undefined
              }
              readOnly={readOnly}
            />
          ))}
@@ -1,4 +1,4 @@
import { useEffect, useState } from "react";
import { useContext, useEffect, useState } from "react";
import { z } from "zod";
import { useForm } from "react-hook-form";
import { zodResolver } from "@hookform/resolvers/zod";

@@ -16,6 +16,7 @@ import {
  BlockIOCredentialsSubSchema,
  CredentialsMetaInput,
} from "@/lib/autogpt-server-api/types";
import { CredentialsProvidersContext } from "@/providers/agent-credentials/credentials-provider";
import { getHostFromUrl } from "@/lib/utils/url";
import { PlusIcon, TrashIcon } from "@phosphor-icons/react";

@@ -35,6 +36,7 @@ export function HostScopedCredentialsModal({
  siblingInputs,
}: Props) {
  const credentials = useCredentials(schema, siblingInputs);
  const allProviders = useContext(CredentialsProvidersContext);

  // Get current host from siblingInputs or discriminator_values
  const currentUrl = credentials?.discriminatorValue;

@@ -89,7 +91,25 @@ export function HostScopedCredentialsModal({
    return null;
  }

  const { provider, providerName, createHostScopedCredentials } = credentials;
  const {
    provider,
    providerName,
    createHostScopedCredentials,
    deleteCredentials,
  } = credentials;

  // Use the unfiltered credential list from the provider context for deduplication.
  // The hook's savedCredentials is pre-filtered by discriminatorValue, which may be
  // empty when no URL is entered yet — causing deduplication to miss existing creds.
  const allProviderCredentials =
    allProviders?.[provider]?.savedCredentials ?? [];

  const hasExistingForHost = allProviderCredentials.some(
    (c) =>
      c.type === "host_scoped" &&
      "host" in c &&
      c.host === (currentHost || form.getValues("host")),
  );

  const addHeaderPair = () => {
    setHeaderPairs([...headerPairs, { key: "", value: "" }]);

@@ -123,9 +143,19 @@ export function HostScopedCredentialsModal({
      {} as Record<string, string>,
    );

    // Delete existing host-scoped credentials for the same host to avoid duplicates.
    // Uses unfiltered provider credentials (not the hook's pre-filtered list).
    const host = values.host;
    const existingForHost = allProviderCredentials.filter(
      (c) => c.type === "host_scoped" && "host" in c && c.host === host,
    );
    for (const existing of existingForHost) {
      await deleteCredentials(existing.id, true);
    }

    const newCredentials = await createHostScopedCredentials({
      host: values.host,
      title: currentHost || values.host,
      host,
      title: currentHost || host,
      headers,
    });

@@ -139,7 +169,11 @@ export function HostScopedCredentialsModal({

  return (
    <Dialog
      title={`Add sensitive headers for ${providerName}`}
      title={
        hasExistingForHost
          ? `Update sensitive headers for ${providerName}`
          : `Add sensitive headers for ${providerName}`
      }
      controlled={{
        isOpen: open,
        set: (isOpen) => {

@@ -241,7 +275,9 @@ export function HostScopedCredentialsModal({

      <div className="pt-8">
        <Button type="submit" className="w-full" size="small">
          Save & use these credentials
          {hasExistingForHost
            ? "Update & use these credentials"
            : "Save & use these credentials"}
        </Button>
      </div>
    </form>
@@ -149,7 +149,7 @@ export function getActionButtonText(

    if (supportsOAuth2) return "Connect another account";
    if (supportsApiKey) return "Use a new API key";
    if (supportsUserPassword) return "Add a new username and password";
    if (supportsHostScoped) return "Add new headers";
    if (supportsHostScoped) return "Update headers";
    return "Add new credentials";
  } else {
    if (supportsOAuth2) return "Add account";