Mirror of https://github.com/Significant-Gravitas/AutoGPT.git, synced 2026-04-08 03:00:28 -04:00
## Summary

- **Problem**: When the LLM calls a tool with large file content, it must rewrite all of that content token by token. This is wasteful, since the files are already accessible on disk.
- **Solution**: Introduce an `@@agptfile:` reference protocol. The LLM passes a file path reference; the processor loads and substitutes the content before executing the tool.

### Protocol

```
@@agptfile:<uri>[<start>-<end>]
```

**Supported URI types:**

| URI | Source |
|-----|--------|
| `workspace://<file_id>` | Persistent workspace file by ID |
| `workspace:///<path>` | Workspace file by virtual path |
| `/absolute/path` | Absolute host or sandbox path |

**Line range** is optional; omitting it reads the whole file.

### Backend changes

- Rename the `@file:` prefix to `@@agptfile:` for uniqueness; extract a `FILE_REF_PREFIX` constant
- Extract shared execution-context ContextVars into `backend/copilot/context.py`, eliminating the duplicate ContextVar objects that caused `e2b_file_tools.py` to always see empty context
- `tool_adapter.py` imports ContextVars from `context.py` (single source of truth)
- `expand_file_refs_in_string` raises `FileRefExpansionError` on failure (instead of emitting inline error strings), blocking tool execution and returning a clear error hint to the model
- Tighten the URI regex: only expand refs starting with `workspace://` or `/`
- Aggregate budget: 1 MB total expansion cap across all refs in one string
- Per-file cap: 200 KB per individual ref
- Fix `_read_file_handler` to pass `get_sdk_cwd()` to `is_allowed_local_path`; ephemeral working-directory files were incorrectly blocked
- Fix `_is_allowed_local` in `e2b_file_tools.py` to pass `get_sdk_cwd()`
- Restrict the local-path allow-list to the `tool-results/` subdirectory only (was the entire session project dir)
- Add a `raise_on_error` param and remove the two-pass `_FILE_REF_ERROR_RE` detection
- Update system prompt docs and `tool_adapter` error messages

### Frontend changes

- `BlockInputCard`: hidden by default with a Show/Hide toggle and `mb-2` spacing

## Test plan

- [ ] `poetry run pytest backend/copilot/ -x --ignore=backend/copilot/sdk/file_ref_integration_test.py` passes
- [ ] `@@agptfile:workspace:///<path>[1-50]` expands correctly in tool calls
- [ ] Invalid line ranges produce `[file-ref error: ...]` inline messages
- [ ] Files outside `sdk_cwd` / `tool-results/` are rejected
- [ ] Block input card is hidden by default with a toggle
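To make the protocol concrete, here is a minimal sketch of how reference expansion could work. The names `FILE_REF_PREFIX`, `expand_file_refs_in_string`, and `FileRefExpansionError` come from the change list above; the `load_file` callback, the regex, and the exact validation order are illustrative assumptions, not the shipped implementation.

```python
import re

FILE_REF_PREFIX = "@@agptfile:"

# Caps from the description above: 200 KB per ref, 1 MB aggregate per string.
PER_FILE_CAP = 200 * 1024
AGGREGATE_CAP = 1024 * 1024

# Only workspace:// URIs and absolute paths are expanded; an optional
# [start-end] suffix selects a 1-indexed inclusive line range.
_REF_RE = re.compile(
    re.escape(FILE_REF_PREFIX)
    + r"(?P<uri>(?:workspace://|/)\S+?)"
    + r"(?:\[(?P<start>\d+)-(?P<end>\d+)\])?(?=\s|$)"
)


class FileRefExpansionError(Exception):
    """Raised when a reference cannot be expanded; blocks tool execution."""


def expand_file_refs_in_string(text: str, load_file) -> str:
    """Replace each @@agptfile: reference in `text` with file content.

    `load_file(uri) -> str` is a caller-supplied loader (hypothetical here);
    the real implementation would resolve workspace IDs, virtual paths, and
    sandbox/local absolute paths.
    """
    total = 0  # running aggregate across all refs in this string

    def _sub(m: re.Match) -> str:
        nonlocal total
        content = load_file(m.group("uri"))
        if m.group("start"):
            start, end = int(m.group("start")), int(m.group("end"))
            lines = content.splitlines(keepends=True)
            if not (1 <= start <= end <= len(lines)):
                raise FileRefExpansionError(f"invalid line range in {m.group(0)}")
            content = "".join(lines[start - 1 : end])
        size = len(content.encode("utf-8"))
        if size > PER_FILE_CAP:
            raise FileRefExpansionError(
                f"{m.group('uri')} exceeds the 200 KB per-file cap"
            )
        total += size
        if total > AGGREGATE_CAP:
            raise FileRefExpansionError("total expansion exceeds the 1 MB budget")
        return content

    return _REF_RE.sub(_sub, text)
```

Because the trailing `(?=\s|$)` lookahead anchors each match, a reference can sit inside a longer string argument, and a URI that starts with anything other than `workspace://` or `/` is simply left untouched.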
219 lines
8.2 KiB
Python
"""Centralized prompt building logic for CoPilot.
|
|
|
|
This module contains all prompt construction functions and constants,
|
|
handling the distinction between:
|
|
- SDK mode vs Baseline mode (tool documentation needs)
|
|
- Local mode vs E2B mode (storage/filesystem differences)
|
|
"""
|
|
|
|
from backend.copilot.tools import TOOL_REGISTRY
|
|
|
|
# Shared technical notes that apply to both SDK and baseline modes
|
|
_SHARED_TOOL_NOTES = """\

### Sharing files with the user
After saving a file to the persistent workspace with `write_workspace_file`,
share it with the user by embedding the `download_url` from the response in
your message as a Markdown link or image:

- **Any file** — shows as a clickable download link:
  `[report.csv](workspace://file_id#text/csv)`
- **Image** — renders inline in chat:
  ``
- **Video** — renders inline in chat with player controls:
  ``

The `download_url` field in the `write_workspace_file` response is already
in the correct format — paste it directly after the `(` in the Markdown.

### Passing file content to tools — @@agptfile: references
Instead of copying large file contents into a tool argument, pass a file
reference and the platform will load the content for you.

Syntax: `@@agptfile:<uri>[<start>-<end>]`

- `<uri>` **must** start with `workspace://` or `/` (absolute path):
  - `workspace://<file_id>` — workspace file by ID
  - `workspace:///<path>` — workspace file by virtual path
  - `/absolute/local/path` — ephemeral or sdk_cwd file
  - E2B sandbox absolute path (e.g. `/home/user/script.py`)
- `[<start>-<end>]` is an optional 1-indexed inclusive line range.
- URIs that do not start with `workspace://` or `/` are **not** expanded.

Examples:
```
@@agptfile:workspace://abc123
@@agptfile:workspace://abc123[10-50]
@@agptfile:workspace:///reports/q1.md
@@agptfile:/tmp/copilot-<session>/output.py[1-80]
@@agptfile:/home/user/script.py
```

You can embed a reference inside any string argument, or use it as the entire
value. Multiple references in one argument are all expanded.


### Sub-agent tasks
- When using the Task tool, NEVER set `run_in_background` to true.
  All tasks must run in the foreground.
"""


# Environment-specific supplement templates
def _build_storage_supplement(
    working_dir: str,
    sandbox_type: str,
    storage_system_1_name: str,
    storage_system_1_characteristics: list[str],
    storage_system_1_persistence: list[str],
    file_move_name_1_to_2: str,
    file_move_name_2_to_1: str,
) -> str:
    """Build storage/filesystem supplement for a specific environment.

    Template function handles all formatting (bullets, indentation, markdown).
    Callers provide clean data as lists of strings.

    Args:
        working_dir: Working directory path
        sandbox_type: Description of bash_exec sandbox
        storage_system_1_name: Name of primary storage (ephemeral or cloud)
        storage_system_1_characteristics: List of characteristic descriptions
        storage_system_1_persistence: List of persistence behavior descriptions
        file_move_name_1_to_2: Direction label for primary→persistent
        file_move_name_2_to_1: Direction label for persistent→primary
    """
    # Format lists as bullet points with proper indentation
    characteristics = "\n".join(f" - {c}" for c in storage_system_1_characteristics)
    persistence = "\n".join(f" - {p}" for p in storage_system_1_persistence)

    return f"""

## Tool notes

### Shell commands
- The SDK built-in Bash tool is NOT available. Use the `bash_exec` MCP tool
  for shell commands — it runs {sandbox_type}.

### Working directory
- Your working directory is: `{working_dir}`
- All SDK file tools AND `bash_exec` operate on the same filesystem
- Use relative paths or absolute paths under `{working_dir}` for all file operations

### Two storage systems — CRITICAL to understand

1. **{storage_system_1_name}** (`{working_dir}`):
{characteristics}
{persistence}

2. **Persistent workspace** (cloud storage):
 - Files here **survive across sessions indefinitely**

### Moving files between storages
- **{file_move_name_1_to_2}**: Copy to persistent workspace
- **{file_move_name_2_to_1}**: Download for processing

### File persistence
Important files (code, configs, outputs) should be saved to workspace to ensure they persist.
{_SHARED_TOOL_NOTES}"""


# Pre-built supplements for common environments
def _get_local_storage_supplement(cwd: str) -> str:
    """Local ephemeral storage (files lost between turns)."""
    return _build_storage_supplement(
        working_dir=cwd,
        sandbox_type="in a network-isolated sandbox",
        storage_system_1_name="Ephemeral working directory",
        storage_system_1_characteristics=[
            "Shared by SDK Read/Write/Edit/Glob/Grep tools AND `bash_exec`",
        ],
        storage_system_1_persistence=[
            "Files here are **lost between turns** — do NOT rely on them persisting",
            "Use for temporary work: running scripts, processing data, etc.",
        ],
        file_move_name_1_to_2="Ephemeral → Persistent",
        file_move_name_2_to_1="Persistent → Ephemeral",
    )


def _get_cloud_sandbox_supplement() -> str:
    """Cloud persistent sandbox (files survive across turns in session)."""
    return _build_storage_supplement(
        working_dir="/home/user",
        sandbox_type="in a cloud sandbox with full internet access",
        storage_system_1_name="Cloud sandbox",
        storage_system_1_characteristics=[
            "Shared by all file tools AND `bash_exec` — same filesystem",
            "Full Linux environment with internet access",
        ],
        storage_system_1_persistence=[
            "Files **persist across turns** within the current session",
            "Lost when the session expires (12 h inactivity)",
        ],
        file_move_name_1_to_2="Sandbox → Persistent",
        file_move_name_2_to_1="Persistent → Sandbox",
    )


def _generate_tool_documentation() -> str:
    """Auto-generate tool documentation from TOOL_REGISTRY.

    NOTE: This is ONLY used in baseline mode (direct OpenAI API).
    SDK mode doesn't need it since Claude gets tool schemas automatically.

    This generates a complete list of available tools with their descriptions,
    ensuring the documentation stays in sync with the actual tool implementations.
    All workflow guidance is now embedded in individual tool descriptions.

    Only documents tools that are available in the current environment
    (checked via tool.is_available property).
    """
    docs = "\n## AVAILABLE TOOLS\n\n"

    # Sort tools alphabetically for consistent output
    # Filter by is_available to match get_available_tools() behavior
    for name in sorted(TOOL_REGISTRY.keys()):
        tool = TOOL_REGISTRY[name]
        if not tool.is_available:
            continue
        schema = tool.as_openai_tool()
        desc = schema["function"].get("description", "No description available")
        # Format as bullet list with tool name in code style
        docs += f"- **`{name}`**: {desc}\n"

    return docs


def get_sdk_supplement(use_e2b: bool, cwd: str = "") -> str:
    """Get the supplement for SDK mode (Claude Agent SDK).

    SDK mode does NOT include tool documentation because Claude automatically
    receives tool schemas from the SDK. Only includes technical notes about
    storage systems and execution environment.

    Args:
        use_e2b: Whether E2B cloud sandbox is being used
        cwd: Current working directory (only used in local_storage mode)

    Returns:
        The supplement string to append to the system prompt
    """
    if use_e2b:
        return _get_cloud_sandbox_supplement()
    return _get_local_storage_supplement(cwd)


def get_baseline_supplement() -> str:
    """Get the supplement for baseline mode (direct OpenAI API).

    Baseline mode INCLUDES auto-generated tool documentation because the
    direct API doesn't automatically provide tool schemas to Claude.
    Also includes shared technical notes (but NOT SDK-specific environment details).

    Returns:
        The supplement string to append to the system prompt
    """
    tool_docs = _generate_tool_documentation()
    return tool_docs + _SHARED_TOOL_NOTES