Compare commits

...

36 Commits

Author SHA1 Message Date
Zamil Majdy
f796390994 fix(backend): address PR reviewer feedback on sensitive field filtering
- Expand SENSITIVE_FIELD_NAMES with private_key, client_secret, secret_key,
  passphrase, webhook_secret, bearer_token
- Switch from exact-match to substring matching for sensitive field names
  (e.g. 'my_api_key' is now correctly filtered)
- Add recursive scanning of nested dicts to prevent secret leakage through
  benign top-level keys (e.g. {"config": {"api_key": "..."}})
- Extract duplicated defaults-collection logic into shared
  filter_sensitive_fields() helper (DRY fix)
- Replace magic number 64 with MAX_TOOL_NAME_LENGTH constant
- Add tests for substring matching, nested filtering, expanded field names,
  and non-dict input_default edge case
2026-03-31 14:42:46 +02:00
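A minimal sketch of the substring matching this commit switches to (abridged field list; the real constant lives in `backend/util/security.py`):

```python
# Sketch of substring-based sensitive-field matching (abridged list);
# mirrors is_sensitive_field() in backend/util/security.py.
SENSITIVE_FIELD_NAMES = frozenset({"api_key", "password", "secret", "token"})

def is_sensitive_field(field_name: str) -> bool:
    lower = field_name.lower()
    return any(s in lower for s in SENSITIVE_FIELD_NAMES)

# Exact match would miss 'my_api_key'; substring match catches it.
print(is_sensitive_field("my_api_key"))  # True
print(is_sensitive_field("username"))    # False
```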
Zamil Majdy
ec9e45df98 test(backend): add round-trip routing test for disambiguated tool names
Existing tests verify that duplicate tools get suffixed names, but none
verify the reverse path: when the LLM calls a suffixed name like
`match_text_pattern_1`, the orchestrator resolves it back to the correct
node via `_process_tool_calls`. This test closes that gap by simulating
LLM tool calls with suffixed names and asserting the resolved tool_def
points to the correct `_sink_node_id`.
2026-03-29 17:10:25 +02:00
Zamil Majdy
ebd476471e test(backend): add more test conditions for tool name disambiguation
Adds 13 additional test cases covering:
- Parameters and metadata preservation through disambiguation
- Leading/trailing underscore names
- Exact 64-char boundary (no suffix, with suffix, two-digit suffix)
- Nested dict/list default values in descriptions
- None/null default values
- Customized name priority over block name
- Colliding customized names get suffixed
- Required fields in tool parameters
- Parameters unchanged after disambiguation

Addresses reviewer feedback requesting more test conditions.
2026-03-29 12:28:49 +02:00
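The 64-char boundary cases above exercise the candidate construction used by the disambiguation code (same slice as in `_disambiguate_tool_names`):

```python
# Sketch of the 64-char boundary handling: the base name is truncated so
# that name + suffix never exceeds the Anthropic limit.
MAX_TOOL_NAME_LENGTH = 64

def suffixed(name: str, n: int) -> str:
    suffix = f"_{n}"
    return f"{name[: MAX_TOOL_NAME_LENGTH - len(suffix)]}{suffix}"

name = "x" * 64                       # exactly at the limit
assert len(suffixed(name, 1)) == 64   # base cut to 62 chars + "_1"
assert len(suffixed(name, 10)) == 64  # two-digit suffix: base cut to 61
assert suffixed("short", 1) == "short_1"  # short names pass through intact
```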
Zamil Majdy
41578af2da refactor(backend): remove unused enumerate index in tool name disambiguation
Simplify _disambiguate_tool_names by removing the unused index from the
enumerate call. The index was stored in valid_tools tuples but never
referenced.
2026-03-29 11:50:18 +02:00
Zamil Majdy
d27b97cf74 Revert "fix(backend): use descriptive suffixes for duplicate tool names"
This reverts commit 0f7a47ccfa.
2026-03-29 11:21:31 +02:00
Zamil Majdy
0f7a47ccfa fix(backend): use descriptive suffixes for duplicate tool names
Instead of appending meaningless _1, _2 numeric suffixes, generate
descriptive names from hardcoded defaults (e.g. web_search_for_weather)
so the LLM can distinguish duplicate tools at a glance. Falls back to
numeric suffixes only when no descriptive suffix can be derived.

Addresses reviewer feedback from ntindle on PR #12555.
2026-03-29 11:16:36 +02:00
Zamil Majdy
db691aff62 Revert "fix(orchestrator): use descriptive suffixes for duplicate tool names"
This reverts commit 643b8a04b9.
2026-03-29 08:53:58 +02:00
Zamil Majdy
643b8a04b9 fix(orchestrator): use descriptive suffixes for duplicate tool names
Instead of opaque _1, _2 numeric suffixes, derive meaningful names from
hardcoded defaults (e.g. search_error, search_warning). This helps the
LLM understand what each tool variant does. Falls back to numeric
suffixes only when descriptive names collide.

Addresses review feedback from @ntindle.
2026-03-29 08:50:52 +02:00
Zamil Majdy
ebf9491dcb Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/orchestrator-duplicate-tool-names 2026-03-29 04:19:32 +02:00
Zamil Majdy
9ab49f7a4a fix(test): correct underscore count in cleanup test and fix formatting
The test_cleanup_special_characters_in_tool_name assertion expected 6
underscores but "My Tool! @#$% v2.0" has 7 non-alnum chars between
'l' and 'v' (! space @ # $ % space). Also apply black formatting to
the re.fullmatch assertion in test_unicode_in_block_names_and_defaults.
2026-03-28 07:32:23 +03:00
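The exact cleanup rule isn't shown in this compare view; assuming non-alphanumeric characters are each replaced with an underscore, the seven characters between 'l' and 'v' become seven underscores, matching the corrected assertion:

```python
import re

# Hypothetical reconstruction of the cleanup: map every non-alphanumeric
# character to an underscore. "! @#$% " between 'l' and 'v' is 7 chars.
cleaned = re.sub(r"[^a-zA-Z0-9]", "_", "My Tool! @#$% v2.0")
print(cleaned)  # == "My_Tool" + "_" * 7 + "v2_0"
```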
Zamil Majdy
091c5ebb67 Merge branch 'fix/orchestrator-duplicate-tool-names' of github.com:Significant-Gravitas/AutoGPT into fix/orchestrator-duplicate-tool-names 2026-03-28 07:30:39 +03:00
Zamil Majdy
f77cde318f Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/orchestrator-duplicate-tool-names 2026-03-28 07:27:35 +03:00
Zamil Majdy
2e69220d55 style(test): fix trailing whitespace in dedup tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 03:18:11 +00:00
Zamil Majdy
97b5fb89f6 chore: retrigger CI for Docker rate limit recovery
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 03:10:30 +00:00
Zamil Majdy
7f435c3b02 test(backend): add edge-case tests for tool name disambiguation
Addresses ntindle's review feedback to add more test conditions:
- Empty tool list (no-op)
- Single tool (no suffixing)
- All 10 tools with same name (_1 through _10)
- Multiple distinct duplicate groups
- Large default value truncation in description
- Malformed tool entries (missing function/name)
- Sensitive field filtering from defaults
- Pre-existing suffix collision (_1 already taken)
- Mixed duplicates and uniques
- Description preservation for non-duplicates

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 15:57:52 +00:00
Zamil Majdy
82c1b06940 fix(orchestrator): guard against non-string tool names in _disambiguate_tool_names
The existing check `"name" not in func` only verifies the key exists but
not that the value is a string.  When name is None, int, or list the
subsequent `name[: 64 - len(suffix)]` slice raises TypeError.

Replace with `not isinstance(func.get("name"), str)` which rejects
missing keys AND non-string values in one check.  Add a dedicated test.
2026-03-27 13:43:11 +07:00
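The failure mode this commit fixes can be sketched in a few lines:

```python
# Why the membership check is insufficient: the key can exist with a
# non-string value, and the later name slice then raises TypeError.
func = {"name": None}

assert "name" in func  # old check passes for a None value...
# func["name"][:62]    # ...but slicing None raises TypeError

# The replacement rejects missing keys AND non-string values in one check:
assert not isinstance(func.get("name"), str)
assert not isinstance({}.get("name"), str)  # missing key also rejected
assert isinstance({"name": "ok"}.get("name"), str)
```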
Zamil Majdy
f67e05ef98 revert(orchestrator): use simple _1 _2 _3 numeric suffixes for duplicate tool names
Remove _derive_suffix_from_defaults() added in 3f5c2b93c. Descriptive
suffixes from defaults produce extra-long, clashing names. Clean numeric
suffixes are better; context is already provided via the [Pre-configured: ...]
description enrichment. Update tests to match.
2026-03-27 13:32:20 +07:00
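The `[Pre-configured: ...]` enrichment that carries the context numeric suffixes alone would lose can be sketched as follows (mirrors the rendering in `_disambiguate_tool_names`; stdlib `json` is used here in place of `backend.util.json`):

```python
import json

# Sketch of the description enrichment: defaults are rendered as k=v pairs,
# with any value over 100 chars truncated at 80.
def enrich(description: str, defaults: dict) -> str:
    parts = []
    for k, v in defaults.items():
        rendered = json.dumps(v)
        if len(rendered) > 100:
            rendered = rendered[:80] + "...<truncated>"
        parts.append(f"{k}={rendered}")
    return f"{description} [Pre-configured: {', '.join(parts)}]"

print(enrich("Search the web", {"topic": "sports"}))
# Search the web [Pre-configured: topic="sports"]
```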
Zamil Majdy
8cddf26ed6 test(backend): add edge-case tests for tool name disambiguation
Add 10 new test cases for OrchestratorBlock tool dedup covering:
- Duplicate blocks with no hardcoded defaults (numeric fallback)
- Very long block names (>64 chars) truncation with suffix
- 5+ duplicate blocks all producing unique names
- Mixed: 2 same-type unnamed + 1 custom-named (only unnamed get suffixed)
- Sensitive fields filtered while non-sensitive shown in descriptions
- Empty input_default ({} and None) does not crash
- Direct _disambiguate_tool_names with empty/None defaults
- Unicode in block names and default values
- Unicode in defaults appearing in descriptions
- Tool name collision with existing suffix (while-loop skip)
- Numeric fallback skipping taken suffix names

Also fix existing test assertions to match descriptive suffix behavior
introduced in 3f5c2b93c.
2026-03-27 13:23:09 +07:00
Zamil Majdy
3f5c2b93cd fix(orchestrator): use descriptive suffixes for duplicate tool names
Instead of opaque _1, _2 suffixes, derive meaningful labels from
hardcoded defaults (e.g. search_topic_sports) so the LLM can
distinguish tool variants by name. Falls back to numeric suffixes
when defaults don't provide a useful label.
2026-03-27 13:13:54 +07:00
Zamil Majdy
7e62fdae48 fix(backend): address round 1 review findings for tool dedup
- Add missing non-dict entry to malformed tools test to actually exercise
  the isinstance(tool, dict) guard in _disambiguate_tool_names
- Expand SENSITIVE_FIELD_NAMES with authorization, access_token, and
  refresh_token to prevent leaking additional auth-related fields
2026-03-27 11:03:02 +07:00
Zamil Majdy
a866e8d709 fix(backend): harden _disambiguate_tool_names and extract shared SENSITIVE_FIELD_NAMES
- Use .get() with safe defaults in _disambiguate_tool_names so malformed
  tools (missing function/name keys) are silently skipped instead of
  raising KeyError.
- Move SENSITIVE_FIELD_NAMES to backend/util/security.py for reuse by
  other blocks.
- Add tests for malformed-tool resilience and missing-description handling.
2026-03-27 10:30:01 +07:00
Zamil Majdy
df2cd316f8 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/orchestrator-duplicate-tool-names 2026-03-26 23:52:23 +07:00
Zamil Majdy
f1b1c19612 fix(platform/blocks): remove remaining _skip alias in _create_agent_function_signature 2026-03-26 11:51:31 +07:00
Zamil Majdy
e5d42fcb99 fix(platform/blocks): remove redundant _skip alias, improve truncation marker for large default values 2026-03-26 11:36:17 +07:00
Zamil Majdy
8b289cacdd fix(platform/blocks): extract _SENSITIVE_FIELD_NAMES constant, use Counter for O(n) dedup, truncate large default values in descriptions 2026-03-26 11:23:35 +07:00
Zamil Majdy
1eeac09801 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/orchestrator-duplicate-tool-names 2026-03-26 08:05:16 +07:00
Zamil Majdy
afc02697af fix(platform/blocks): restore sensitive field filtering and suffix collision handling in tool dedup
The revert in dd0cca48b removed sensitive field filtering and suffix
collision handling, breaking two existing tests. This restores the
functionality using inline filtering at the _hardcoded_defaults
collection points (both _create_block_function_signature and
_create_agent_function_signature) and a while-loop in
_disambiguate_tool_names to skip already-taken suffix candidates.
2026-03-26 05:43:53 +07:00
Zamil Majdy
dd0cca48b4 Revert "fix(platform/blocks): filter sensitive fields and handle suffix collisions in tool dedup"
This reverts commit 60d4dd8ff2.
2026-03-25 22:58:59 +07:00
Zamil Majdy
60d4dd8ff2 fix(platform/blocks): filter sensitive fields and handle suffix collisions in tool dedup
- Filter credentials, api_key, password, secret, token, auth from
  tool descriptions to prevent leaking sensitive data to the LLM API
- Check `taken` set when assigning suffixes to avoid collisions with
  user-named tools (e.g. skip _1 if already taken, use _2 instead)
2026-03-25 22:43:21 +07:00
Zamil Majdy
8548bfcc4e refactor(platform/blocks): extract _disambiguate_tool_names as clean module-level helper
Replaces nested inline logic with a focused function. Early return when
no duplicates. No deep nesting. Under 30 lines.
2026-03-25 22:12:30 +07:00
Zamil Majdy
7e19a1aa68 refactor(platform/blocks): remove _is_sensitive_field from tool dedup
Credentials are handled by the credentials system, not input_default.
No secrets should appear in hardcoded defaults, so the pattern-based
sensitive field filter is unnecessary and gives a false sense of security.
2026-03-25 22:07:07 +07:00
Zamil Majdy
5723c1e230 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/orchestrator-duplicate-tool-names 2026-03-25 21:47:02 +07:00
Zamil Majdy
1968ecf355 fix(backend): add input_default to mock nodes in orchestrator tests
The _create_block_function_signature method now accesses
sink_node.input_default for storing hardcoded defaults. Existing test
mocks were missing this attribute, causing AttributeError in CI.
2026-03-25 18:11:01 +07:00
Zamil Majdy
f4d6bc1f5b fix(backend): prevent suffix collision with user-named tools
The dedup logic now checks the full set of existing tool names before
assigning a suffix. If a user-defined tool already uses e.g. 'my_tool_1',
the algorithm skips to '_2'. Fixes the edge case flagged by Sentry.
2026-03-25 16:42:58 +07:00
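The collision-skip behavior described here can be sketched as a small loop over suffix candidates:

```python
# Sketch of the collision-skip loop: suffix candidates that collide with
# existing (e.g. user-named) tool names are skipped until a free one is found.
MAX_TOOL_NAME_LENGTH = 64

def next_free_name(name: str, taken: set[str], counter: int = 1) -> str:
    while True:
        suffix = f"_{counter}"
        candidate = f"{name[: MAX_TOOL_NAME_LENGTH - len(suffix)]}{suffix}"
        if candidate not in taken:
            return candidate
        counter += 1

taken = {"my_tool", "my_tool_1"}         # 'my_tool_1' is user-defined
print(next_free_name("my_tool", taken))  # my_tool_2
```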
Zamil Majdy
823eb3d15a fix(backend): address PR review — sensitive fields, isinstance guard, name truncation
- Add isinstance(defaults, dict) guard in _create_agent_function_signature
- Filter credentials/api_key/password/secret/token from tool descriptions
- Truncate base tool name to stay within Anthropic 64-char limit
- Remove redundant cleanup loop for _hardcoded_defaults
- Add tests for sensitive field exclusion and name length truncation
2026-03-25 16:38:51 +07:00
Zamil Majdy
270c2f0f55 fix(backend): disambiguate duplicate tool names in OrchestratorBlock
When multiple nodes use the same block type, the orchestrator generated
identical tool names causing Anthropic API to reject with "Tool names
must be unique." Now appends _1, _2, etc. to duplicates and enriches
their descriptions with hardcoded defaults so the LLM can distinguish.
2026-03-25 16:31:35 +07:00
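The core of the fix can be sketched as Counter-based duplicate detection plus per-name numeric suffixing:

```python
from collections import Counter

# Minimal sketch of the fix: detect duplicate tool names and append
# _1, _2, ... so every name sent to the Anthropic API is unique.
names = ["store_value", "store_value", "search", "store_value"]
duplicates = {n for n, c in Counter(names).items() if c > 1}

counters: dict[str, int] = {}
unique = []
for n in names:
    if n in duplicates:
        counters[n] = counters.get(n, 0) + 1
        n = f"{n}_{counters[n]}"
    unique.append(n)

print(unique)  # ['store_value_1', 'store_value_2', 'search', 'store_value_3']
```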
4 changed files with 1881 additions and 2 deletions

View File

@@ -28,6 +28,7 @@ from backend.data.model import NodeExecutionStats, SchemaField
from backend.util import json
from backend.util.clients import get_database_manager_async_client
from backend.util.prompt import MAIN_OBJECTIVE_PREFIX
from backend.util.security import filter_sensitive_fields
if TYPE_CHECKING:
from backend.data.graph import Link, Node
@@ -35,6 +36,9 @@ if TYPE_CHECKING:
logger = logging.getLogger(__name__)
# Anthropic imposes a 64-character limit on tool/function names.
MAX_TOOL_NAME_LENGTH = 64
class ToolInfo(BaseModel):
"""Processed tool call information."""
@@ -258,6 +262,71 @@ def get_pending_tool_calls(conversation_history: list[Any] | None) -> dict[str,
return {call_id: count for call_id, count in pending_calls.items() if count > 0}
def _disambiguate_tool_names(tools: list[dict[str, Any]]) -> None:
    """Ensure all tool names are unique (Anthropic API requires this).

    When multiple nodes use the same block type, they get the same tool name.
    This appends _1, _2, etc. and enriches descriptions with hardcoded defaults
    so the LLM can distinguish them. Mutates the list in place.

    Malformed tools (missing ``function`` or ``function.name``) are silently
    skipped so the caller never crashes on unexpected input.
    """
    # Collect tools that have the required structure, skipping malformed ones.
    valid_tools: list[dict[str, Any]] = []
    for tool in tools:
        func = tool.get("function") if isinstance(tool, dict) else None
        if not isinstance(func, dict) or not isinstance(func.get("name"), str):
            # Strip internal metadata even from malformed entries.
            if isinstance(func, dict):
                func.pop("_hardcoded_defaults", None)
            continue
        valid_tools.append(tool)

    names = [t.get("function", {}).get("name", "") for t in valid_tools]
    name_counts = Counter(names)
    duplicates = {n for n, c in name_counts.items() if c > 1}
    if not duplicates:
        for t in valid_tools:
            t.get("function", {}).pop("_hardcoded_defaults", None)
        return

    taken: set[str] = set(names)
    counters: dict[str, int] = {}
    for tool in valid_tools:
        func = tool.get("function", {})
        name = func.get("name", "")
        defaults = func.pop("_hardcoded_defaults", {})
        if name not in duplicates:
            continue
        counters[name] = counters.get(name, 0) + 1
        # Skip suffixes that collide with existing (e.g. user-named) tools
        while True:
            suffix = f"_{counters[name]}"
            candidate = f"{name[: MAX_TOOL_NAME_LENGTH - len(suffix)]}{suffix}"
            if candidate not in taken:
                break
            counters[name] += 1
        func["name"] = candidate
        taken.add(candidate)
        if defaults and isinstance(defaults, dict):
            parts: list[str] = []
            for k, v in defaults.items():
                rendered = json.dumps(v)
                if len(rendered) > 100:
                    rendered = rendered[:80] + "...<truncated>"
                parts.append(f"{k}={rendered}")
            summary = ", ".join(parts)
            original_desc = func.get("description", "") or ""
            func["description"] = f"{original_desc} [Pre-configured: {summary}]"
class OrchestratorBlock(Block):
"""
A block that uses a language model to orchestrate tool calls, supporting both
@@ -507,6 +576,19 @@ class OrchestratorBlock(Block):
tool_function["_field_mapping"] = field_mapping
tool_function["_sink_node_id"] = sink_node.id
# Store hardcoded defaults (non-linked inputs) for disambiguation.
# Exclude linked fields, private fields, and credential/auth fields
# to avoid leaking sensitive data into tool descriptions.
defaults = sink_node.input_default
tool_function["_hardcoded_defaults"] = (
filter_sensitive_fields(
defaults,
linked_fields={link.sink_name for link in links},
)
if isinstance(defaults, dict)
else {}
)
return {"type": "function", "function": tool_function}
@staticmethod
@@ -581,6 +663,21 @@ class OrchestratorBlock(Block):
tool_function["_field_mapping"] = field_mapping
tool_function["_sink_node_id"] = sink_node.id
# Store hardcoded defaults (non-linked inputs) for disambiguation.
# Exclude linked fields, private fields, agent meta fields, and
# credential/auth fields to avoid leaking sensitive data.
_AGENT_META_FIELDS = frozenset({"graph_id", "graph_version", "input_schema"})
defaults = sink_node.input_default
tool_function["_hardcoded_defaults"] = (
filter_sensitive_fields(
defaults,
linked_fields={link.sink_name for link in links},
extra_excludes=_AGENT_META_FIELDS,
)
if isinstance(defaults, dict)
else {}
)
return {"type": "function", "function": tool_function}
@staticmethod
@@ -629,6 +726,7 @@ class OrchestratorBlock(Block):
)
return_tool_functions.append(tool_func)
_disambiguate_tool_names(return_tool_functions)
return return_tool_functions
async def _attempt_llm_call_with_validation(
@@ -996,7 +1094,10 @@ class OrchestratorBlock(Block):
credentials, input_data, iteration_prompt, tool_functions
)
except Exception as e:
yield "error", f"LLM call failed in agent mode iteration {iteration}: {str(e)}"
yield (
"error",
f"LLM call failed in agent mode iteration {iteration}: {str(e)}",
)
return
# Process tool calls
@@ -1041,7 +1142,10 @@ class OrchestratorBlock(Block):
if max_iterations < 0:
yield "finished", f"Agent mode completed after {iteration} iterations"
else:
yield "finished", f"Agent mode completed after {max_iterations} iterations (limit reached)"
yield (
"finished",
f"Agent mode completed after {max_iterations} iterations (limit reached)",
)
yield "conversations", current_prompt
async def run(

View File

@@ -1074,6 +1074,7 @@ async def test_orchestrator_uses_customized_name_for_blocks():
mock_node.block_id = StoreValueBlock().id
mock_node.metadata = {"customized_name": "My Custom Tool Name"}
mock_node.block = StoreValueBlock()
mock_node.input_default = {}
# Create a mock link
mock_link = MagicMock(spec=Link)
@@ -1105,6 +1106,7 @@ async def test_orchestrator_falls_back_to_block_name():
mock_node.block_id = StoreValueBlock().id
mock_node.metadata = {} # No customized_name
mock_node.block = StoreValueBlock()
mock_node.input_default = {}
# Create a mock link
mock_link = MagicMock(spec=Link)

File diff suppressed because it is too large

View File

@@ -0,0 +1,80 @@
"""Shared security constants for field-level filtering.
Other modules (e.g. orchestrator, future blocks) import from here so the
sensitive-field list stays in one place.
"""
from typing import Any
# Substrings used for case-insensitive matching against field names.
# A field is considered sensitive if any of these appear anywhere in the
# lowercased field name (substring match, not exact match).
SENSITIVE_FIELD_NAMES: frozenset[str] = frozenset(
{
"credentials",
"api_key",
"password",
"secret",
"secret_key",
"private_key",
"client_secret",
"token",
"auth",
"authorization",
"access_token",
"refresh_token",
"bearer_token",
"passphrase",
"webhook_secret",
}
)
def is_sensitive_field(field_name: str) -> bool:
"""Check if a field name is sensitive using substring matching.
Returns True if the lowercased field_name contains any of the
SENSITIVE_FIELD_NAMES as a substring.
"""
lower = field_name.lower()
return any(s in lower for s in SENSITIVE_FIELD_NAMES)
def filter_sensitive_fields(
data: dict[str, Any],
*,
extra_excludes: frozenset[str] | None = None,
linked_fields: set[str] | None = None,
) -> dict[str, Any]:
"""Return a copy of *data* with sensitive and private fields removed.
This also recursively scans one level of nested dicts to remove keys
that match sensitive field names, preventing secrets from leaking
through benign top-level key names (e.g. ``{"config": {"api_key": "..."}}``)
Args:
data: The dict to filter.
extra_excludes: Additional exact field names to exclude (e.g.
``{"graph_id", "graph_version", "input_schema"}``).
linked_fields: Field names to exclude because they are linked.
"""
result: dict[str, Any] = {}
excludes = extra_excludes or frozenset()
linked = linked_fields or set()
for k, v in data.items():
if k in linked:
continue
if k.startswith("_"):
continue
if k in excludes:
continue
if is_sensitive_field(k):
continue
# Recursively filter nested dicts one level deep
if isinstance(v, dict):
v = {nk: nv for nk, nv in v.items() if not is_sensitive_field(nk)}
if not v:
continue
result[k] = v
return result
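A usage sketch for `filter_sensitive_fields` (condensed copy of the two functions above so the example runs standalone; the real sensitive-field list is longer):

```python
# Condensed copy of backend/util/security.py for a standalone demo.
SENSITIVE_FIELD_NAMES = frozenset({"api_key", "password", "secret", "token"})

def is_sensitive_field(field_name: str) -> bool:
    lower = field_name.lower()
    return any(s in lower for s in SENSITIVE_FIELD_NAMES)

def filter_sensitive_fields(data, *, extra_excludes=None, linked_fields=None):
    result = {}
    excludes = extra_excludes or frozenset()
    linked = linked_fields or set()
    for k, v in data.items():
        if k in linked or k.startswith("_") or k in excludes:
            continue
        if is_sensitive_field(k):
            continue
        if isinstance(v, dict):  # filter nested dicts one level deep
            v = {nk: nv for nk, nv in v.items() if not is_sensitive_field(nk)}
            if not v:
                continue
        result[k] = v
    return result

defaults = {
    "query": "weather",                       # kept
    "my_api_key": "sk-123",                   # substring match -> dropped
    "config": {"api_key": "x", "mode": "a"},  # nested sensitive key filtered
    "linked_input": "ignored",                # linked -> dropped
}
print(filter_sensitive_fields(defaults, linked_fields={"linked_input"}))
# {'query': 'weather', 'config': {'mode': 'a'}}
```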