Commit Graph

284 Commits

Author SHA1 Message Date
Zamil Majdy
fcaebd1bb7 refactor(backend/copilot): unified queue-backed copilot turns + async sub-AutoPilot + guide-read gate (#12841)
### Why / What / How

**Why:** the 10-min stream-level idle timeout was killing legitimate
long-running tool calls — notably sub-AutoPilot runs via
`run_block(AutoPilotBlock)`, which routinely take 15–45 min. The symptom
users saw was `"A tool call appears to be stuck"` even though AutoPilot
was actively working. A second long-standing rough edge was shipped
alongside: agents often skipped `get_agent_building_guide` when
generating agent JSON, producing schemas that failed validation and
burned turns on auto-fix loops.

**What:** three threaded pieces.

1. **Async sub-AutoPilot via `run_sub_session`.** New copilot tool that
delegates a task to a fresh (or resumed) sub-AutoPilot, and its
companion `get_sub_session_result` for polling/cancelling. The agent
starts with `run_sub_session(prompt, wait_for_result≤300s)` and, if the
sub isn't done inside the cap, receives a handle + polls via
`get_sub_session_result(wait_if_running≤300s)`. No single MCP call ever
blocks the stream for more than 5 min, so the 10-min stream-idle timer
stays simple and effective (derived as `MAX_TOOL_WAIT_SECONDS * 2`).

2. **Queue-backed copilot turn dispatch** — one code path for all three
callers.
- `run_sub_session` enqueues a `CoPilotExecutionEntry` on the existing
`copilot_execution` exchange instead of spawning an in-process
`asyncio.Task`.
- `AutoPilotBlock.execute_copilot` (graph block) now uses the **same
queue** instead of `collect_copilot_response` inline.
   - The HTTP SSE endpoint was already queue-backed.
- All three share a single primitive: `run_copilot_turn_via_queue` →
`create_session` → `enqueue_copilot_turn` → `wait_for_session_result`.
The event-aggregation logic (`EventAccumulator`/`process_event`) is a
shared module used by both the direct-stream path and the cross-process
waiter.
- Benefits: **deploy/crash resilience** (RabbitMQ redelivery survives
worker restarts), **natural load balancing** across copilot_executor
workers, **sessions as first-class resources** (UI users can
`/copilot?sessionId=<inner>` into any sub or AutoPilot block's session),
and every future stream-level feature (pending-messages drain #12737,
compaction policies, etc.) applies uniformly instead of bypassing
graph-block sessions.

3. **Guide-read gate on agent-generation tools.** `create_agent` /
`edit_agent` / `validate_agent_graph` / `fix_agent_graph` refuse until
the session has called `get_agent_building_guide`. The pre-existing soft
hint was routinely ignored; the gate makes the dependency enforceable.
All four tool descriptions advertise the requirement in one tightened
sentence ("Requires get_agent_building_guide first (refuses
otherwise).") that stays under the 32000-char schema budget.

**How:**

#### Queue-backed sub-AutoPilot + AutoPilotBlock

- `sdk/session_waiter.py` — new module. `SessionResult` dataclass
mirrors `CopilotResult`. `wait_for_session_result` subscribes to
`stream_registry`, drains events via shared `process_event`, returns
`(outcome, result)`. `wait_for_session_completion` is the cheaper
outcome-only variant. `run_copilot_turn_via_queue` is the canonical
three-step dispatch. Every exit path unsubscribes the listener.
- `sdk/stream_accumulator.py` — new module. `EventAccumulator`,
`ToolCallEntry`, `process_event` extracted from `collect.py`. Both the
direct-stream and cross-process paths now use the same fold logic.
- `tools/run_sub_session.py` / `tools/get_sub_session_result.py` —
rewritten around the shared primitive. `sub_session_id` is now the sub's
`ChatSession` id directly (no separate registry handle). Ownership
re-verified on every call via `get_chat_session`. Cancel via
`enqueue_cancel_task` on the existing `copilot_cancel` fan-out exchange.
- `blocks/autopilot.py` — `execute_copilot` replaced its inline
`collect_copilot_response` with `run_copilot_turn_via_queue`.
`SessionResult` carries response text, tool calls, and token usage back
from the worker so no DB round-trip is needed. The block's public I/O
contract (inputs, outputs, `ToolCallEntry` shape) is unchanged.
- `CoPilotExecutionEntry` gains a `permissions: CopilotPermissions |
None` field forwarded to the worker's `stream_fn` so the sub's
capability filter survives the queue hop. The processor passes it
through to `stream_chat_completion_sdk` /
`stream_chat_completion_baseline`.
- **Deleted**: `sdk/sub_session_registry.py` (module-level dict,
done-callback, abandoned-task cap, `notify_shutdown_and_cancel_all`,
`_reset_for_test`), plus the shutdown-notifier hook in
`copilot_executor.processor.cleanup` — redundant under queue-backed
execution.

#### Run_block single-tool cap (3)

- `tools/helpers.execute_block` caps block execution at
`MAX_TOOL_WAIT_SECONDS = 5 min` via `asyncio.wait_for` around the
generator consumption.
- On timeout: logs `copilot_tool_timeout tool=run_block block=…
block_id=… input_keys=… user=… session=… cap_s=…` (grep-friendly) and
returns an `ErrorResponse` that redirects the LLM to `run_agent` /
`run_sub_session`.
- Billing protection: `_charge_block_credits` is called in a `finally`
guarded by `asyncio.shield` and marked `charge_handled` **before** the
await so cancel-mid-charge doesn't double-bill and
cancel-mid-generator-before-charge still settles via the finally.

#### Guide-read gate

- `helpers.require_guide_read(session, tool_name)` scans
`session.messages` for any prior assistant tool call named
`get_agent_building_guide` (handles both OpenAI and flat shapes).
Applied at the top of `_execute` in `create_agent`, `edit_agent`,
`validate_agent_graph`, `fix_agent_graph`. Tool descriptions advertise
the requirement.

#### Shared timing constants

- `MAX_TOOL_WAIT_SECONDS = 5 * 60` + `STREAM_IDLE_TIMEOUT_SECONDS = 2 *
MAX_TOOL_WAIT_SECONDS` in `constants.py`. Every long-running tool
(`run_agent`, `view_agent_output`, `run_sub_session`,
`get_sub_session_result`, `run_block`) imports from one place; no more
hardcoded 300 / `10*60` literals drifting apart. Stream-idle invariant
("no single tool blocks close to the idle timeout") holds by
construction.

### Frontend

- Friendlier tool-card labels: `run_sub_session` → "Sub-AutoPilot",
`get_sub_session_result` → "Sub-AutoPilot result", `run_block` →
"Action" (matches the builder UI's own naming), `run_agent` → "Agent".
Fixes the double-verb "Running Run …" phrasing.
- `SubSessionStatusResponse.sub_autopilot_session_link` surfaces
`/copilot?sessionId=<inner>` so users can click into any sub's session
from the tool-call card — same pattern as `run_agent`'s
`library_agent_link`.

### Changes 🏗️

- **New modules**: `sdk/session_waiter.py`, `sdk/stream_accumulator.py`,
`tools/run_sub_session.py`, `tools/get_sub_session_result.py`,
`tools/sub_session_test.py`, `tools/agent_guide_gate_test.py`.
- **New response types**: `SubSessionStatusResponse`,
`SubSessionProgressSnapshot`, `SessionResult`.
- **New gate helper**: `require_guide_read` in `tools/helpers.py`.
- **Queue protocol**: `permissions` field on `CoPilotExecutionEntry`,
threaded through `processor.py` → `stream_fn`.
- **Hidden**: `AUTOPILOT_BLOCK_ID` in `COPILOT_EXCLUDED_BLOCK_IDS`
(run_block can't execute AutoPilotBlock; agents use `run_sub_session`
instead).
- **Deleted**: `sdk/sub_session_registry.py`, processor
shutdown-notifier hook.
- **Regenerated**: `openapi.json` for the new response types; block-docs
for the updated `ToolName` Literal.
- **Tool descriptions**: tightened the guide-gate hint across the four
agent-builder tools to stay under the 32000-char schema budget.
- **40+ tests** across sub_session, execute_block cap + billing races,
stream_accumulator, agent_guide_gate, frontend helpers.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Unit suite green on the full copilot tree; `poetry run format` +
`pyright` clean
- [x] Schema character budget test passes (tool descriptions trimmed to
stay under 32000)
- [x] Native UI E2E (`poetry run app` + `pnpm dev`):
`run_sub_session(wait_for_result=60)` returns `status="completed"` +
`sub_autopilot_session_link` inline;
`run_sub_session(wait_for_result=1)` returns `status="running"` +
handle, `get_sub_session_result(wait_if_running=60)` observes `running →
completed` transition
- [x] AutoPilotBlock (graph) goes through `copilot_executor` queue
end-to-end (verified via logs: ExecutionManager's AutoPilotBlock node
spawned session `f6de335b-…`, a different `CoPilotExecutor` worker
acquired its cluster lock and ran the SDK stream)
- [x] Guide gate: `create_agent` without a prior
`get_agent_building_guide` returns the refusal; agent reads the guide
and retries successfully
2026-04-18 23:11:41 +07:00
slepybear
334ec18c31 docs: convert in-code comments to MkDocs admonitions in block-sdk-gui… (#12819)
### Why / What / How

<!-- Why: Why does this PR exist? What problem does it solve, or what's
broken/missing without it? -->
This PR converts inline Python comments in code examples within
`block-sdk-guide.md` into MkDocs `!!! note` admonitions. This makes code
examples cleaner and more copy-paste friendly while preserving all
explanatory content.

<!-- What: What does this PR change? Summarize the changes at a high
level. -->
Converts inline comments in code blocks to admonitions following the
pattern established in PR #12396 (new_blocks.md) and PR #12313.

<!-- How: How does it work? Describe the approach, key implementation
details, or architecture decisions. -->
- Wrapped code examples with `!!! note` admonitions
- Removed inline comments from code blocks for clean copy-paste
- Added explanatory admonitions after each code block

### Changes 🏗️

- Provider configuration examples (API key and OAuth)
- Block class Input/Output schema annotations
- Block initialization parameters
- Test configuration
- OAuth and webhook handler implementations
- Authentication types and file handling patterns

### Checklist 📋

#### For documentation changes:
- [x] Follows the admonition pattern from PR #12396
- [x] No code changes, documentation only
- [x] Admonition syntax verified correct

#### For configuration changes:
- [ ] `.env.default` is updated or already compatible with my changes
- [ ] `docker-compose.yml` is updated or already compatible with my
changes

---

**Related Issues**: Closes #8946

Co-authored-by: slepybear <slepybear@users.noreply.github.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-04-17 07:47:52 +00:00
Toran Bruce Richards
f410929560 feat(platform): Add xAI Grok 4.20 models from OpenRouter (#12620)
Requested by @Torantulino

Adds the 2 xAI Grok 4.20 models available on OpenRouter that are missing
from the platform.

## Why

`x-ai/grok-4.20` and `x-ai/grok-4.20-multi-agent` are xAI's current
flagship models (released March 2026) and are available via OpenRouter,
but weren't accessible from the platform's LLM blocks.

## Changes

**`autogpt_platform/backend/backend/blocks/llm.py`**
- Added `GROK_4_20` and `GROK_4_20_MULTI_AGENT` enum members
- Added corresponding `MODEL_METADATA` entries (open_router provider, 2M
context window, price tier 3)

**`autogpt_platform/backend/backend/data/block_cost_config.py`**
- Added `MODEL_COST` entries at 5 credits each (flagship tier, $2/M in)

**`docs/integrations/block-integrations/llm.md`**
- Added new model IDs to all LLM block tables

| Model | Pricing | Context |
|-------|---------|---------|
| `x-ai/grok-4.20` | $2/M in, $6/M out | 2M |
| `x-ai/grok-4.20-multi-agent` | $2/M in, $6/M out | 2M |

Both models use the standard OpenRouter chat completions API — no
special handling needed.

Resolves: SECRT-2196

---------

Co-authored-by: Torantulino <22963551+Torantulino@users.noreply.github.com>
Co-authored-by: Toran Bruce Richards <Torantulino@users.noreply.github.com>
Co-authored-by: Otto (AGPT) <otto@agpt.co>
2026-04-16 12:14:56 +00:00
Zamil Majdy
77d8362983 docs(blocks): sync misc.md with memory_search/memory_store tools from dev merge 2026-04-09 23:15:02 +07:00
Toran Bruce Richards
f6ddcbc6cb feat(platform): Add all 12 Z.ai GLM models via OpenRouter (#12672)
## Summary

Add Z.ai (Zhipu AI) GLM model family to the platform LLM blocks, routed
through OpenRouter. This enables users to select any of the 12 Z.ai
models across all LLM-powered blocks (AI Text Generator, AI
Conversation, AI Structured Response, AI Text Summarizer, AI List
Generator).

## Gap Analysis

All 12 Z.ai models currently available on OpenRouter's API were missing
from the AutoGPT platform:

| Model | Context Window | Max Output | Price Tier | Cost |
|-------|---------------|------------|------------|------|
| GLM 4 32B | 128K | N/A | Tier 1 | 1 |
| GLM 4.5 | 131K | 98K | Tier 2 | 2 |
| GLM 4.5 Air | 131K | 98K | Tier 1 | 1 |
| GLM 4.5 Air (Free) | 131K | 96K | Tier 1 | 1 |
| GLM 4.5V (vision) | 65K | 16K | Tier 2 | 2 |
| GLM 4.6 | 204K | 204K | Tier 1 | 1 |
| GLM 4.6V (vision) | 131K | 131K | Tier 1 | 1 |
| GLM 4.7 | 202K | 65K | Tier 1 | 1 |
| GLM 4.7 Flash | 202K | N/A | Tier 1 | 1 |
| GLM 5 | 80K | 131K | Tier 2 | 2 |
| GLM 5 Turbo | 202K | 131K | Tier 3 | 4 |
| GLM 5V Turbo (vision) | 202K | 131K | Tier 3 | 4 |

## Changes

- **`autogpt_platform/backend/backend/blocks/llm.py`**: Added 12
`LlmModel` enum entries and corresponding `MODEL_METADATA` with context
windows, max output tokens, display names, and price tiers sourced from
OpenRouter API
- **`autogpt_platform/backend/backend/data/block_cost_config.py`**:
Added `MODEL_COST` entries for all 12 models, with costs scaled to match
pricing (1 for budget, 2 for mid-range, 4 for premium)

## How it works

All Z.ai models route through the existing OpenRouter provider
(`open_router`) — no new provider or API client code needed. Users with
an OpenRouter API key can immediately select any Z.ai model from the
model dropdown in any LLM block.

## Related

- Linear: REQ-83

---------

Co-authored-by: AutoGPT CoPilot <copilot@agpt.co>
2026-04-03 15:48:33 +00:00
Zamil Majdy
fff101e037 feat(backend): add SQL query block with multi-database support for CoPilot analytics (#12569)
## Summary
- Add a read-only SQL query block for CoPilot/AutoPilot analytics access
- Supports **multiple databases**: PostgreSQL, MySQL, SQLite, MSSQL via
SQLAlchemy
- Enforces read-only queries (SELECT only) with defense-in-depth SQL
validation using sqlparse
- SSRF protection: blocks connections to private/internal IPs
- Credentials stored securely via the platform credential system

## Changes
- New `SQLQueryBlock` in `backend/blocks/sql_query_block.py` with
`DatabaseType` enum
- SQLAlchemy-based execution with dialect-specific read-only and timeout
settings
- Connection URL validation ensuring driver matches selected database
type
- Comprehensive test suite (62 tests) including URL validation,
sanitization, serialization
- Documentation in `docs/integrations/block-integrations/data.md`
- Added `DATABASE` provider to `ProviderName` enum

### Checklist 📋
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan

#### Test plan:
- [x] Unit tests pass for query validation, URL validation, error
sanitization, value serialization
- [x] Read-only enforcement rejects INSERT/UPDATE/DELETE/DROP
- [x] Multi-statement injection blocked
- [x] SSRF protection blocks private IPs
- [x] Connection URL driver validation works for all 4 database types

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 06:43:40 +00:00
Zamil Majdy
f1ac05b2e0 fix(backend): propagate dry-run mode to special blocks with LLM-powered simulation (#12575)
## Summary
- **OrchestratorBlock & AgentExecutorBlock** now execute for real in
dry-run mode so the orchestrator can make LLM calls and agent executors
can spawn child graphs. Their downstream tool blocks and child-graph
blocks are still simulated via `simulate_block()`. Credential fields
from node defaults are restored since `validate_exec()` wipes them in
dry-run mode. Agent-mode iterations capped at 1 in dry-run.
- **All blocks** (including MCPToolBlock) are simulated via a single
generic `simulate_block()` path. The LLM prompt is grounded by
`inspect.getsource(block.run)`, giving the simulator access to the exact
implementation of each block's `run()` method. This produces realistic
mock responses for any block type without needing block-specific
simulation logic.
- Updated agent generation guide to document special block dry-run
behavior.
- Minor frontend fixes: exported `formatCents` from
`RateLimitResetDialog` for reuse in `UsagePanelContent`, used `useRef`
for stable callback references in `useResetRateLimit` to avoid stale
closures.
- 74 tests (21 existing dry-run + 53 new simulator tests covering prompt
building, passthrough logic, and special block dry-run).

## Design

The simulator (`backend/executor/simulator.py`) uses a two-tier
approach:

1. **Passthrough blocks** (OrchestratorBlock, AgentExecutorBlock):
`prepare_dry_run()` returns modified input_data so these blocks execute
for real in `manager.py`. OrchestratorBlock gets `max_iterations=1`
(agent mode) or 0 (traditional mode). AgentExecutorBlock spawns real
child graph executions whose blocks inherit `dry_run=True`.

2. **All other blocks**: `simulate_block()` builds an LLM prompt
containing:
   - Block name and description
   - Input/output schemas (JSON Schema)
   - The block's `run()` source code via `inspect.getsource(block.run)`
- The actual input values (with credentials stripped and long values
truncated)

The LLM then role-plays the block's execution, producing realistic
outputs grounded in the actual implementation.

Special handling for input/output blocks: `AgentInputBlock` and
`AgentOutputBlock` are pure passthrough (no LLM call needed).

## Test plan
- [x] All 74 tests pass (`pytest backend/copilot/tools/test_dry_run.py
backend/executor/simulator_test.py`)
- [x] Pre-commit hooks pass (ruff, isort, black, pyright, frontend
typecheck)
- [x] CI: all checks green
- [x] E2E: dry-run execution completes with `is_dry_run=true`, cost=0,
no errors
- [x] E2E: normal (non-dry-run) execution unchanged
- [x] E2E: Create agent with OrchestratorBlock + tool blocks, run with
`dry_run=True`, verify orchestrator makes real LLM calls while tool
blocks are simulated
- [x] E2E: AgentExecutorBlock spawns child graph in dry-run, child
blocks are LLM-simulated
- [x] E2E: Builder simulate button works end-to-end with special blocks

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 17:09:55 +00:00
Toran Bruce Richards
11b846dd49 fix(blocks): rename placeholder_values to options on AgentDropdownInputBlock (#12595)
## Summary

Resolves [REQ-78](https://linear.app/autogpt/issue/REQ-78): The
`placeholder_values` field on `AgentDropdownInputBlock` is misleadingly
named. In every major UI framework "placeholder" means non-binding hint
text that disappears on focus, but this field actually creates a
dropdown selector that restricts the user to only those values.

## Changes

### Core rename (`autogpt_platform/backend/backend/blocks/io.py`)
- Renamed `placeholder_values` → `options` on
`AgentDropdownInputBlock.Input`
- Added clear field description: *"If provided, renders the input as a
dropdown selector restricted to these values. Leave empty for free-text
input."*
- Updated class docstring to describe actual behavior
- Overrode `model_construct()` to remap legacy `placeholder_values` →
`options` for **backward compatibility** with existing persisted agent
JSON

### Tests (`autogpt_platform/backend/backend/blocks/test/test_block.py`)
- Updated existing tests to use canonical `options` field name
- Added 2 new backward-compat tests verifying legacy
`placeholder_values` still works through both `model_construct()` and
`Graph._generate_schema()` paths

### Documentation
- Updated
`autogpt_platform/backend/backend/copilot/sdk/agent_generation_guide.md`
— changed field name in CoPilot SDK guide
- Updated `docs/integrations/block-integrations/basic.md` — changed
field name and description in public docs

### Load tests
(`autogpt_platform/backend/load-tests/tests/api/graph-execution-test.js`)
- Removed spurious `placeholder_values: {}` from AgentInputBlock node
(this field never existed on AgentInputBlock)
- Fixed execution input to use `value` instead of `placeholder_values`

## Backward Compatibility

Existing agents with `placeholder_values` in their persisted
`input_default` JSON will continue to work — the `model_construct()`
override transparently remaps the old key to `options`. No database
migration needed since the field is stored inside a JSON blob, not as a
dedicated column.

## Testing

- All existing tests updated and passing
- 2 new backward-compat tests added
- No frontend changes needed (frontend reads `enum` from generated JSON
Schema, not the field name directly)

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-04-02 05:56:17 +00:00
Nicholas Tindle
88589764b5 dx(platform): normalize agent instructions for Claude and Codex (#12592)
### Why / What / How

Why: repo guidance was split between Claude-specific `CLAUDE.md` files
and Codex-specific `AGENTS.md` files, which duplicated instruction
content and made the same repository behave differently across agents.
The repo also had Claude skills under `.claude/skills` but no
Codex-visible repo skill path.

What: this PR bridges the repo's Claude skills into Codex and normalizes
shared instruction files so `AGENTS.md` becomes the canonical source
while each `CLAUDE.md` imports its sibling `AGENTS.md`.

How: add a repo-local `.agents/skills` symlink pointing to
`../.claude/skills`; move nested `CLAUDE.md` content into sibling
`AGENTS.md` files; replace each repo `CLAUDE.md` with a one-line
`@AGENTS.md` shim so Claude and Codex read the same scoped guidance
without duplicating text. The root `CLAUDE.md` now imports the root
`AGENTS.md` rather than symlinking to it.

Note: the instruction-file normalization commit was created with
`--no-verify` because the repo's frontend pre-commit `tsc` hook
currently fails on unrelated existing errors, largely missing
`autogpt_platform/frontend/src/app/api/__generated__/*` modules.

### Changes 🏗️

- Add `.agents/skills` as a repo-local symlink to `../.claude/skills` so
Codex discovers the existing Claude repo skills.
- Add a real root `CLAUDE.md` shim that imports the canonical root
`AGENTS.md`.
- Promote nested scoped instruction content into sibling `AGENTS.md`
files under `autogpt_platform/`, `autogpt_platform/backend/`,
`autogpt_platform/frontend/`, `autogpt_platform/frontend/src/tests/`,
and `docs/`.
- Replace the corresponding nested `CLAUDE.md` files with one-line
`@AGENTS.md` shims.
- Preserve the existing scoped instruction hierarchy while making the
shared content cross-compatible between Claude and Codex.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified `.agents/skills` resolves to `../.claude/skills`
  - [x] Verified each repo `CLAUDE.md` now contains only `@AGENTS.md`
- [x] Verified the expected `AGENTS.md` files exist at the root and
nested scoped directories
- [x] Verified the branch contains only the intended agent-guidance
commits relative to `dev` and the working tree is clean

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

No runtime configuration changes are included in this PR.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk: documentation/instruction-file reshuffle plus an
`.agents/skills` pointer; no runtime code paths are modified.
> 
> **Overview**
> Unifies agent guidance so **`AGENTS.md` becomes canonical** and all
corresponding `CLAUDE.md` files become 1-line shims (`@AGENTS.md`) at
the repo root, `autogpt_platform/`, backend, frontend, frontend tests,
and `docs/`.
> 
> Adds `.agents/skills` pointing to `../.claude/skills` so non-Claude
agents discover the same shared skills/instructions, eliminating
duplicated/agent-specific guidance content.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
839483c3b6. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
2026-04-01 09:08:51 +00:00
Zamil Majdy
3e25488b2d feat(copilot): add session-level dry_run flag to autopilot sessions (#12582)
## Summary
- Adds a session-level `dry_run` flag that forces ALL tool calls
(`run_block`, `run_agent`) in a copilot/autopilot session to use dry-run
simulation mode
- Stores the flag in a typed `ChatSessionMetadata` JSON model on the
`ChatSession` DB row, accessed via `session.dry_run` property
- Adds `dry_run` to the AutoPilot block Input schema so graph builders
can create dry-run autopilot nodes
- Refactors multiple copilot tools from `**kwargs` to explicit
parameters for type safety

## Changes
- **Prisma schema**: Added `metadata` JSON column to `ChatSession` model
with migration
- **Python models**: Added `ChatSessionMetadata` model with `dry_run`
field, added `metadata` field to `ChatSessionInfo` and `ChatSession`,
updated `from_db()`, `new()`, and `create_chat_session()`
- **Session propagation**: `set_execution_context(user_id, session)`
called from `baseline/service.py` so tool handlers can read
session-level flags via `session.dry_run`
- **Tool enforcement**: `run_block` and `run_agent` check
`session.dry_run` and force `dry_run=True` when set; `run_agent` blocks
scheduling in dry-run sessions
- **AutoPilot block**: Added `dry_run` input field, passes it when
creating sessions
- **Chat API**: Added `CreateSessionRequest` model with `dry_run` field
to `POST /sessions` endpoint; added `metadata` to session responses
- **Frontend**: Updated `useChatSession.ts` to pass body to the create
session mutation
- **Tool refactoring**: Multiple copilot tools refactored from
`**kwargs` to explicit named parameters (agent_browser, manage_folders,
workspace_files, connect_integration, agent_output, bash_exec, etc.) for
better type safety

## Test plan
- [x] Unit tests for `ChatSession.new()` with dry_run parameter
- [x] Unit tests for `RunBlockTool` session dry_run override
- [x] Unit tests for `RunAgentTool` session dry_run override
- [x] Unit tests for session dry_run blocks scheduling
- [x] Existing dry_run tests still pass (12/12)
- [x] Existing permissions tests still pass
- [x] All pre-commit hooks pass (ruff, isort, pyright, tsc)
- [ ] Manual: Create autopilot session with `dry_run=True`, verify
run_block/run_agent calls use simulation

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 16:27:36 +00:00
Zamil Majdy
37d9863552 feat(platform): add extended thinking execution mode to OrchestratorBlock (#12512)
## Summary
- Adds `ExecutionMode` enum with `BUILT_IN` (default built-in tool-call
loop) and `EXTENDED_THINKING` (delegates to Claude Agent SDK for richer
reasoning)
- Extracts shared `tool_call_loop` into `backend/util/tool_call_loop.py`
— reusable by both OrchestratorBlock agent mode and copilot baseline
- Refactors copilot baseline to use the shared `tool_call_loop` with
callback-driven iteration

## ExecutionMode enum
`ExecutionMode` (`backend/blocks/orchestrator.py`) controls how
OrchestratorBlock executes tool calls:
- **`BUILT_IN`** — Default mode. Runs the built-in tool-call loop
(supports all LLM providers).
- **`EXTENDED_THINKING`** — Delegates to the Claude Agent SDK for
extended thinking and multi-step planning. Requires Anthropic-compatible
providers (`anthropic` / `open_router`) and direct API credentials
(subscription mode not supported). Validates both provider and model
name at runtime.

## Shared tool_call_loop
`backend/util/tool_call_loop.py` provides a generic, provider-agnostic
conversation loop:
1. Call LLM with tools → 2. Extract tool calls → 3. Execute tools → 4.
Update conversation → 5. Repeat

Callers provide three callbacks:
- `llm_call`: wraps any LLM provider (OpenAI streaming, Anthropic,
llm.llm_call, etc.)
- `execute_tool`: wraps any tool execution (TOOL_REGISTRY, graph block
execution, etc.)
- `update_conversation`: formats messages for the specific protocol

## OrchestratorBlock EXTENDED_THINKING mode
- `_create_graph_mcp_server()` converts graph-connected blocks to MCP
tools
- `_execute_tools_sdk_mode()` runs `ClaudeSDKClient` with those MCP
tools
- Agent mode refactored to use shared `tool_call_loop`

## Copilot baseline refactored
- Streaming callbacks buffer `Stream*` events during loop execution
- Events are drained after `tool_call_loop` returns
- Same conversation logic, less code duplication

## SDK environment builder extraction
- `build_sdk_env()` extracted to `backend/copilot/sdk/env.py` for reuse
by both copilot SDK service and OrchestratorBlock

## Provider validation
EXTENDED_THINKING mode validates `provider in ('anthropic',
'open_router')` and `model_name.startswith('claude')` because the Claude
Agent SDK requires an Anthropic API key or OpenRouter key. Subscription
mode is not supported — it uses the platform's internal credit system
which doesn't provide raw API keys needed by the SDK. The validation
raises a clear `ValueError` if an unsupported provider or model is used.

## PR Dependencies
This PR builds on #12511 (Claude SDK client). It can be reviewed
independently — #12511 only adds the SDK client module which this PR
imports. If #12511 merges first, this PR will have no conflicts.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] All pre-commit hooks pass (typecheck, lint, format)
  - [x] Existing OrchestratorBlock tests still pass
- [x] Copilot baseline behavior unchanged (same stream events, same tool
execution)
- [x] Manual: OrchestratorBlock with execution_mode=EXTENDED_THINKING +
downstream blocks → SDK calls tools
  - [x] Agent mode regression test (non-SDK path works as before)
  - [x] SDK mode error handling (invalid provider raises ValueError)
2026-03-31 20:04:13 +07:00
Zamil Majdy
f79d8f0449 fix(backend): move placeholder_values exclusively to AgentDropdownInputBlock (#12551)
## Why

`AgentInputBlock` has a `placeholder_values` field whose
`generate_schema()` converts it into a JSON schema `enum`. The frontend
renders any field with `enum` as a dropdown/select. This means
AI-generated agents that populate `placeholder_values` with example
values (e.g. URLs) on regular `AgentInputBlock` nodes end up with
dropdowns instead of free-text inputs — users can't type custom values.

Only `AgentDropdownInputBlock` should produce dropdown behavior.

## What

- Removed `placeholder_values` field from `AgentInputBlock.Input`
- Moved the `enum` generation logic to
`AgentDropdownInputBlock.Input.generate_schema()`
- Cleaned up test data for non-dropdown input blocks
- Updated copilot agent generation guide to stop suggesting
`placeholder_values` for `AgentInputBlock`

## How

The base `AgentInputBlock.Input.generate_schema()` no longer converts
`placeholder_values` → `enum`. Only `AgentDropdownInputBlock.Input`
defines `placeholder_values` and overrides `generate_schema()` to
produce the `enum`.

**Backward compatibility**: Existing agents with `placeholder_values` on
`AgentInputBlock` nodes load fine — `model_construct()` silently ignores
extra fields not defined on the model. Those inputs will now render as
text fields (desired behavior).

## Test plan
- [x] `poetry run pytest backend/blocks/test/test_block.py -xvs` — all
block tests pass
- [x] `poetry run format && poetry run lint` — clean
- [ ] Import an agent JSON with `placeholder_values` on an
`AgentInputBlock` — verify it loads and renders as text input
- [ ] Create an agent with `AgentDropdownInputBlock` — verify dropdown
still works
2026-03-26 08:09:38 +00:00
An Vy Le
f871717f68 fix(backend): add sink input validation to AgentValidator (#12514)
## Summary

- Added `validate_sink_input_existence` method to `AgentValidator` to
ensure all sink names in links and input defaults reference valid input
schema fields in the corresponding block
- Added comprehensive tests covering valid/invalid sink names, nested
inputs, and default key handling
- Updated `ReadDiscordMessagesBlock` description to clarify it reads new
messages and triggers on new posts
- Removed leftover test function file

## Test plan

- [ ] Run `pytest` on `validator_test.py` to verify all sink input
validation cases pass
- [ ] Verify existing agent validation flow is unaffected
- [ ] Confirm `ReadDiscordMessagesBlock` description update is accurate

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-25 16:08:17 +00:00
Zamil Majdy
80bfd64ffa Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev 2026-03-24 21:18:11 +07:00
Zamil Majdy
0076ad2a1a hotfix(blocks): bump stagehand ^0.5.1 → ^3.4.0 to fix yanked litellm (#12539)
## Summary

**Critical CI fix** — litellm was compromised in a supply chain attack
(versions 1.82.7/1.82.8 contained infostealer malware) and PyPI
subsequently yanked many litellm versions including the 1.7x range that
stagehand 0.5.x depended on. This breaks `poetry lock` in CI for all
PRs.

- Bump `stagehand` from `^0.5.1` to `^3.4.0` — Stagehand v3 is a
Stainless-generated HTTP API client that **no longer depends on
litellm**, completely removing litellm from our dependency tree
- Migrate stagehand blocks to use `AsyncStagehand` + session-based API
(`sessions.start`, `session.navigate/act/observe/extract`)
- Net reduction of ~430 lines in `poetry.lock` from dropping litellm and
its transitive dependencies

## Why

All CI pipelines are blocked because `poetry lock` fails to resolve
yanked litellm versions that stagehand 0.5.x required.

## Test plan

- [x] CI passes (poetry lock resolves, backend tests green)
- [ ] Verify stagehand blocks still function with the new session-based
API
2026-03-24 21:17:19 +07:00
Zamil Majdy
9381057079 refactor(platform): rename SmartDecisionMakerBlock to OrchestratorBlock (#12511)
## Summary
- Renames `SmartDecisionMakerBlock` to `OrchestratorBlock` across the
entire codebase
- The block supports iteration/agent mode and general tool
orchestration, so "Smart Decision Maker" no longer accurately describes
its capabilities
- Block UUID (`3b191d9f-356f-482d-8238-ba04b6d18381`) remains unchanged
— fully backward compatible with existing graphs

## Changes
- Renamed block class, constants, file names, test files, docs, and
frontend enum
- Updated copilot agent generator (helpers, validator, fixer) references
- Updated agent generation guide documentation
- No functional changes — pure rename refactor

### For code changes
- [x] I have clearly listed my changes in the PR description
- [x] I have made corresponding changes to the documentation
- [x] My changes do not generate new warnings or errors
- [x] New and existing unit tests pass locally with my changes

## Test plan
- [x] All pre-commit hooks pass (typecheck, lint, format)
- [x] Existing graphs with this block continue to load and execute (same
UUID)
- [x] Agent mode / iteration mode works as before
- [x] Copilot agent generator correctly references the renamed block
2026-03-24 19:16:42 +07:00
Zamil Majdy
ee5382a064 feat(copilot): add tool/block capability filtering to AutoPilotBlock (#12482)
## Summary

- Adds `CopilotPermissions` model (`copilot/permissions.py`) — a
capability filter that restricts which tools and blocks the
AutoPilot/Copilot may use during a single execution
- Exposes 4 new `advanced=True` fields on `AutoPilotBlock`: `tools`,
`tools_exclude`, `blocks`, `blocks_exclude`
- Threads permissions through the full execution path: `AutoPilotBlock`
→ `collect_copilot_response` → `stream_chat_completion_sdk` →
`run_block`
- Implements recursion inheritance via contextvar: sub-agent executions
can only be *more* restrictive than their parent

## Design

**Tool filtering** (`tools` + `tools_exclude`):
- `tools_exclude=True` (default): `tools` is a **blacklist** — listed
tools denied, all others allowed. Empty list = allow all.
- `tools_exclude=False`: `tools` is a **whitelist** — only listed tools
are allowed.
- Users specify short names (`run_block`, `web_fetch`, `Read`, `Task`,
…) — mapped to full SDK format internally.
- Validated eagerly at block-run time with a clear error listing valid
names.

**Block filtering** (`blocks` + `blocks_exclude`):
- Same semantics as tool filtering, applied inside `run_block` via
contextvar.
- Each entry can be a full UUID, an 8-char partial UUID (first segment),
or a case-insensitive block name.
- Validated against the live block registry; invalid identifiers surface
a helpful error before the session is created.

**Recursion inheritance**:
- `_inherited_permissions` contextvar stores the parent execution's
permissions.
- On each `AutoPilotBlock.run()`, the child's permissions are merged
with the parent via `merged_with_parent()` — effective allowed sets are
intersected (tools) and the parent chain is kept for block checks.
- Sub-agents can never expand what the parent allowed.

## Test plan

- [x] 68 new unit tests in `copilot/permissions_test.py` and
`blocks/autopilot_permissions_test.py`
- [x] Block identifier matching: full UUID, partial UUID, name,
case-insensitivity
- [x] Tool allow/deny list semantics including edge cases (empty list,
unknown tool)
- [x] Parent/child merging and recursion ceiling correctness
- [x] `validate_tool_names` / `validate_block_identifiers` with mock
block registry
- [x] `apply_tool_permissions` SDK tool-list integration
- [x] `AutoPilotBlock.run()` — invalid tool/block yields error before
session creation
- [x] `AutoPilotBlock.run()` — valid permissions forwarded to
`execute_copilot`
- [x] Existing `AutoPilotBlock` block tests still pass (2/2)
- [x] All hooks pass (pyright, ruff, black, isort)
- [x] E2E: CoPilot chat works end-to-end with E2B sandbox (12s stream)
- [x] E2E: Permission fields render in Builder UI (Tools combobox,
exclude toggles)
- [x] E2E: Agent with restricted permissions (whitelist web_fetch only)
executes correctly
- [x] E2E: Permission values preserved through API round-trip
2026-03-24 07:49:58 +00:00
Nicholas Tindle
f01f668674 fix(backend): support Responses API in SmartDecisionMakerBlock (#12489)
## Summary

- Fixes SmartDecisionMakerBlock conversation management to work with
OpenAI's Responses API, which was introduced in #12099 (commit 1240f38)
- The migration to `responses.create` updated the outbound LLM call but
missed the conversation history serialization — the `raw_response` is
now the entire `Response` object (not a `ChatCompletionMessage`), and
tool calls/results use `function_call` / `function_call_output` types
instead of role-based messages
- This caused a 400 error on the second LLM call in agent mode:
`"Invalid value: ''. Supported values are: 'assistant', 'system',
'developer', and 'user'."`

### Changes

**`smart_decision_maker.py`** — 6 functions updated:
| Function | Fix |
|---|---|
| `_convert_raw_response_to_dict` | Detects Responses API `Response`
objects, extracts output items as a list |
| `_get_tool_requests` | Recognizes `type: "function_call"` items |
| `_get_tool_responses` | Recognizes `type: "function_call_output"`
items |
| `_create_tool_response` | New `responses_api` kwarg produces
`function_call_output` format |
| `_update_conversation` | Handles list return from
`_convert_raw_response_to_dict` |
| Non-agent mode path | Same list handling for traditional execution |

**`test_smart_decision_maker_responses_api.py`** — 61 tests covering:
- Every branch of all 6 affected helper functions
- Chat Completions, Anthropic, and Responses API formats
- End-to-end agent mode and traditional mode conversation validity

## Test plan

- [x] 61 new unit tests all pass
- [x] 11 existing SmartDecisionMakerBlock tests still pass (no
regressions)
- [x] All pre-commit hooks pass (ruff, black, isort, pyright)
- [ ] CI integration tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Updates core LLM invocation and agent conversation/tool-call
bookkeeping to match OpenAI’s Responses API, which can affect tool
execution loops and prompt serialization across providers. Risk is
mitigated by extensive new unit tests, but regressions could surface in
production agent-mode flows or token/usage accounting.
> 
> **Overview**
> **Migrates OpenAI calls from Chat Completions to the Responses API
end-to-end**, including tool schema conversion, output parsing,
reasoning/text extraction, and updated token usage fields in
`LLMResponse`.
> 
> **Fixes SmartDecisionMakerBlock conversation/tool handling for
Responses API** by treating `raw_response` as a Response object
(splitting it into `output` items for replay), recognizing
`function_call`/`function_call_output` entries, and emitting tool
outputs in the correct Responses format to prevent invalid follow-up
prompts.
> 
> Also adjusts prompt compaction/token estimation to understand
Responses API tool items, changes
`get_execution_outputs_by_node_exec_id` to return list-valued
`CompletedBlockOutput`, removes `gpt-3.5-turbo` from model/cost/docs
lists, and adds focused unit tests plus a lightweight `conftest.py` to
run these tests without the full server stack.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
ff292efd3d. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>
2026-03-20 03:23:52 +00:00
Nicholas Tindle
cbff3b53d3 Revert "feat(backend): migrate OpenAI provider to Responses API" (#12490)
Reverts Significant-Gravitas/AutoGPT#12099

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Reverts the OpenAI integration in `llm_call` from the Responses API
back to `chat.completions`, which can change tool-calling, JSON-mode
behavior, and token accounting across core AI blocks. The change is
localized but touches the primary LLM execution path and associated
tests/docs.
> 
> **Overview**
> Reverts the OpenAI path in `backend/blocks/llm.py` from the Responses
API back to `chat.completions`, including updating JSON-mode
(`response_format`), tool handling, and usage extraction to match the
Chat Completions response shape.
> 
> Removes the now-unused `backend/util/openai_responses.py` helpers and
their unit tests, updates LLM tests to mock `chat.completions.create`,
and adds `gpt-3.5-turbo` to the supported model list, cost config, and
LLM docs.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
7d6226d10e. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
2026-03-20 01:51:56 +00:00
Otto
1240f38f75 feat(backend): migrate OpenAI provider to Responses API (#12099)
## Summary

Migrates the OpenAI provider in the LLM block from
`chat.completions.create` to `responses.create` — OpenAI's newer,
unified API. Also removes the obsolete GPT-3.5-turbo model.

Resolves #11624
Linear:
[OPEN-2911](https://linear.app/autogpt/issue/OPEN-2911/update-openai-calls-to-use-responsescreate)

## Changes

- **`backend/blocks/llm.py`** — OpenAI provider now uses
`responses.create` exclusively. Removed GPT-3.5-turbo enum + metadata.
- **`backend/util/openai_responses.py`** *(new)* — Helpers for the
Responses API: tool format conversion, content/reasoning/usage/tool-call
extraction.
- **`backend/util/openai_responses_test.py`** *(new)* — Unit tests for
all helper functions.
- **`backend/data/block_cost_config.py`** — Removed GPT-3.5 cost entry.
- **`docs/integrations/block-integrations/llm.md`** — Regenerated block
docs.

## Key API differences handled

| Aspect | Chat Completions | Responses API |
|--------|-----------------|---------------|
| Messages param | `messages` | `input` |
| Max tokens param | `max_completion_tokens` | `max_output_tokens` |
| Usage fields | `prompt_tokens` / `completion_tokens` | `input_tokens`
/ `output_tokens` |
| Tool format | Nested under `function` key | Flat structure |

## Test plan

- [x] Unit tests for all `openai_responses.py` helpers
- [x] Existing LLM block tests updated for Responses API mocks
- [x] Regular OpenAI models work
- [x] Reasoning OpenAI models work
- [x] Non-OpenAI models work

---------

Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 09:19:31 +00:00
Zamil Majdy
5d9a169e04 feat(blocks): add AutoPilotBlock for invoking AutoPilot from graphs (#12439)
## Summary
- Adds `AutogptCopilotBlock` that invokes the platform's copilot system
(`stream_chat_completion_sdk`) directly from graph executions
- Enables sub-agent patterns: copilot can call this block recursively
(with depth limiting via `contextvars`)
- Enables scheduled copilot execution through the agent executor system
- No user credentials needed — uses server-side copilot config

## Inputs/Outputs
**Inputs:** prompt, system_context, session_id (continuation), timeout,
max_recursion_depth
**Outputs:** response text, tool_calls list, conversation_history JSON,
session_id, token_usage

## Test plan
- [x] Block test passes (`test_available_blocks[AutogptCopilotBlock]`)
- [x] Pre-commit hooks pass (format, lint, typecheck)
- [ ] Manual test: add block to graph, send prompt, verify response
- [ ] Manual test: chain two copilot blocks with session_id to verify
continuation
2026-03-18 11:22:25 +00:00
Otto
e657472162 feat(blocks): Add Nano Banana 2 to image generator, customizer, and editor blocks (#12218)
Requested by @Torantulino

Add `google/nano-banana-2` (Gemini 3.1 Flash Image) support across all
three image blocks.

### Changes

**`ai_image_customizer.py`**
- Add `NANO_BANANA_2 = "google/nano-banana-2"` to `GeminiImageModel`
enum
- Update block description to reference Nano-Banana models generically

**`ai_image_generator_block.py`**
- Add `NANO_BANANA_2` to `ImageGenModel` enum
- Add generation branch (identical to NBP except model name)

**`flux_kontext.py` (AI Image Editor)**
- Rename `FluxKontextModelName` → `ImageEditorModel` (with
backwards-compatible alias)
- Add `NANO_BANANA_PRO` and `NANO_BANANA_2` to the editor
- Model-aware branching in `run_model()`: NB models use `image_input`
list (not `input_image`), no `seed`, and add `output_format`

**`block_cost_config.py`**
- Add NB2 cost entries for all three blocks (14 credits, matching NBP)
- Add NB Pro cost entry for editor block
- Update editor block refs from `.PRO`/`.MAX` to
`.FLUX_KONTEXT_PRO`/`.FLUX_KONTEXT_MAX`

Resolves SECRT-2047

---------

Co-authored-by: Torantulino <Torantulino@users.noreply.github.com>
Co-authored-by: Abhimanyu Yadav <122007096+Abhi1992002@users.noreply.github.com>
2026-03-18 09:42:18 +00:00
Abhimanyu Yadav
e32d258a7e feat(blocks): add AgentMail integration blocks (#12417)
## Summary
- Add a full AgentMail integration with blocks for managing inboxes,
messages, threads, drafts, attachments, lists, and pods
- Includes shared provider configuration (`_config.py`) with API key
authentication
- 8 block modules covering ~25 individual blocks across all AgentMail
API surfaces

  ## Block Modules
  | Module | Blocks |
  |--------|--------|
  | `inbox.py` | Create, Get, List, Update, Delete inboxes |
| `messages.py` | Send, Get, List, Delete messages + org-wide listing |
  | `threads.py` | Get, List, Delete threads + org-wide listing |
| `drafts.py` | Create, Get, List, Update, Send, Delete drafts +
org-wide listing |
  | `attachments.py` | Download attachments |
  | `lists.py` | Create, Get, List, Update, Delete mailing lists |
  | `pods.py` | Create, Get, List, Update, Delete pods |

  ## Test plan
- [x] `poetry run pytest 'backend/blocks/test/test_block.py' -xvs` — all
new blocks pass the standard block test suite
  - [x] test all blocks manually
2026-03-17 12:40:32 +00:00
Nicholas Tindle
8892bcd230 docs: Add workspace and media file architecture documentation (#11989)
### Changes 🏗️

- Added comprehensive architecture documentation at
`docs/platform/workspace-media-architecture.md` covering:
  - Database models (`UserWorkspace`, `UserWorkspaceFile`)
  - `WorkspaceManager` API with session scoping
- `store_media_file()` media normalization pipeline (input types, return
formats)
  - Virus scanning responsibility boundaries
- Decision tree for choosing `WorkspaceManager` vs `store_media_file()`
- Configuration reference including `clamav_max_concurrency` and
`clamav_mark_failed_scans_as_clean`
  - Common patterns with error handling examples
- Updated `autogpt_platform/backend/CLAUDE.md` with a "Workspace & Media
Files" section referencing the new docs
- Removed duplicate `scan_content_safe()` call from
`WriteWorkspaceFileTool` — `WorkspaceManager.write_file()` already scans
internally, so the tool was double-scanning every file
- Replaced removed comment in `workspace.py` with explicit ownership
comment clarifying that `WorkspaceManager` is the single scanning
boundary

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified `scan_content_safe()` is called inside
`WorkspaceManager.write_file()` (workspace.py:186)
- [x] Verified `store_media_file()` scans all input branches including
local paths (file.py:351)
- [x] Verified documentation accuracy against current source code after
merge with dev
  - [x] CI checks all passing

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Mostly adds documentation and internal developer guidance; the only
code change is a comment clarifying `WorkspaceManager.write_file()` as
the single virus-scanning boundary, with no behavior change.
> 
> **Overview**
> Adds a new `docs/platform/workspace-media-architecture.md` describing
the Workspace storage layer vs the `store_media_file()` media pipeline,
including session scoping and virus-scanning/persistence responsibility
boundaries.
> 
> Updates backend `CLAUDE.md` to point contributors to the new doc when
working on CoPilot uploads/downloads or
`WorkspaceManager`/`store_media_file()`, and clarifies in
`WorkspaceManager.write_file()` (comment-only) that callers should not
duplicate virus scanning.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
18fcfa03f8. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 06:12:26 +00:00
Bently
ef446e4fe9 feat(llm): Add Cohere Command A Family Models (#12339)
## Summary
Adds the Cohere Command A family of models to AutoGPT Platform with
proper pricing configuration.

## Models Added
- **Command A 03.2025**: Flagship model (256k context, 8k output) - 3
credits
- **Command A Translate 08.2025**: State-of-the-art translation (8k
context, 8k output) - 3 credits
- **Command A Reasoning 08.2025**: First reasoning model (256k context,
32k output) - 6 credits
- **Command A Vision 07.2025**: First vision-capable model (128k
context, 8k output) - 3 credits

## Changes
- Added 4 new LlmModel enum entries with proper OpenRouter model IDs
- Added ModelMetadata for each model with correct context windows,
output limits, and price tiers
- Added pricing configuration in block_cost_config.py

## Testing
- [ ] Models appear in AutoGPT Platform model selector
- [ ] Pricing is correctly applied when using models

Resolves **SECRT-2083**
2026-03-12 11:56:30 +00:00
Bently
7b1e8ed786 feat(llm): Add Microsoft Phi-4 model support (#12342)
## Changes
- Added `MICROSOFT_PHI_4` to LlmModel enum (`microsoft/phi-4`)
- Configured model metadata:
  - 16K context window
  - 16K max output tokens
  - OpenRouter provider
- Set cost tier: 1
  - Input: $0.06 per 1M tokens
  - Output: $0.14 per 1M tokens

## Details
Microsoft Phi-4 is a 14B parameter model available through OpenRouter.
This PR adds proper support in the autogpt_platform backend.

Resolves SECRT-2086
2026-03-12 11:15:27 +00:00
Bently
3595c6e769 feat(llm): add Perplexity Sonar Reasoning Pro model (#12341)
## Summary
Adds support for Perplexity's new reasoning model:
`perplexity/sonar-reasoning-pro`

## Changes
-  Added `PERPLEXITY_SONAR_REASONING_PRO` to `LlmModel` enum
-  Added model metadata (128K context window, 8K max output tokens,
tier 2)
-  Set pricing at 5 credits (matches sonar-pro tier)

## Model Details
- **Model ID:** `perplexity/sonar-reasoning-pro`
- **Provider:** OpenRouter
- **Context Window:** 128,000 tokens
- **Max Output:** 8,000 tokens
- **Pricing:** $0.000002/token (prompt), $0.000008/token (completion)
- **Cost Tier:** 2 (5 credits)

## Testing
-  Black formatting passed
-  Ruff linting passed

Resolves SECRT-2084
2026-03-12 09:58:29 +00:00
Bently
ade2baa58f feat(llm): Add Grok 3 model support (#12343)
## Summary
Adds support for xAI's Grok 3 model to AutoGPT.

## Changes
- Added `GROK_3` to `LlmModel` enum with identifier `x-ai/grok-3`
- Configured model metadata:
  - Context window: 131,072 tokens (128k)
  - Max output: 32,768 tokens (32k)  
  - Provider: OpenRouter
  - Creator: xAI
  - Price tier: 2 (mid-tier)
- Set model cost to 3 credits (mid-tier pricing between fast models and
Grok 4)
- Updated block documentation to include Grok 3 in model lists

## Pricing Rationale
- **Grok 4**: 9 credits (tier 3 - premium, 256k context)
- **Grok 3**: 3 credits (tier 2 - mid-tier, 128k context) ← NEW
- **Grok 4 Fast/4.1 Fast/Code Fast**: 1 credit (tier 1 - affordable)

Grok 3 is positioned as a mid-tier model, priced similarly to other tier
2 models.

## Testing
- [x] Code passes `black` formatting
- [x] Code passes `ruff` linting
- [x] Model metadata and cost configuration added
- [x] Documentation updated

Closes SECRT-2079
2026-03-12 07:31:59 +00:00
Bently
89a5b3178a fix(llm): Update Gemini model lineup - add 3.1 models, deprecate 3 Pro Preview (#12331)
## 🔴 URGENT: Gemini 3 Pro Preview Shutdown - March 9, 2026

Google is shutting down Gemini 3 Pro Preview **tomorrow (March 9,
2026)**. This PR addresses SECRT-2067 by updating the Gemini model
lineup to prevent disruption.

---

## Changes

###  P0 - Critical (This Week)
- [x] **Remove/Replace Gemini 3 Pro Preview** → Migrated to 3.1 Pro
Preview
- [x] **Add Gemini 3.1 Pro Preview** (released Feb 19, 2026)

###  P1 - High Priority  
- [x] **Add Gemini 3.1 Flash Lite Preview** (released Mar 3, 2026)
- [x] **Add Gemini 3 Flash Preview** (released Dec 17, 2025)

###  P2 - Medium Priority
- [x] **Add Gemini 2.5 Pro (stable/GA)** (released Jun 17, 2025)

---

## Model Details

| Model | Context | Input Cost | Output Cost | Price Tier |
|-------|---------|------------|-------------|------------|
| **Gemini 3.1 Pro Preview** | 1.05M | $2.00/1M | $12.00/1M | 2 |
| **Gemini 3.1 Flash Lite Preview** | 1.05M | $0.25/1M | $1.50/1M | 1 |
| **Gemini 3 Flash Preview** | 1.05M | $0.50/1M | $3.00/1M | 1 |
| **Gemini 2.5 Pro (GA)** | 1.05M | $1.25/1M | $10.00/1M | 2 |
| ~~Gemini 3 Pro Preview~~ | ~~1.05M~~ | ~~$2.00/1M~~ | ~~$12.00/1M~~ |
**DEPRECATED** |

---

## Migration Strategy

**Database Migration:**
`20260308095500_migrate_deprecated_gemini_3_pro_preview`

- Automatically migrates all existing graphs using
`google/gemini-3-pro-preview` to `google/gemini-3.1-pro-preview`
- Updates: AgentBlock, AgentGraphExecution, AgentNodeExecution,
AgentGraph
- Zero user-facing disruption
- Migration runs on next deployment (before March 9 shutdown)

---

## Testing

- [ ] Verify new models appear in LLM block dropdown
- [ ] Test migration on staging database
- [ ] Confirm existing graphs using deprecated model auto-migrate
- [ ] Validate cost calculations for new models

---

## References

- **Linear Issue:**
[SECRT-2067](https://linear.app/autogpt/issue/SECRT-2067)
- **OpenRouter Models:** https://openrouter.ai/models/google
- **Google Deprecation Notice:**
https://ai.google.dev/gemini-api/docs/deprecations

---

## Checklist

- [x] Models added to `LlmModel` enum
- [x] Model metadata configured
- [x] Cost config updated
- [x] Database migration created
- [x] Deprecated model commented out (not removed for historical
reference)
- [ ] PR reviewed and approved
- [ ] Merged before March 9, 2026 deadline

---

**Priority:** 🔴 Critical - Must merge before March 9, 2026
2026-03-11 11:21:16 +00:00
Bently
34a2f9a0a2 feat(llm): add Mistral flagship models (Large 3, Medium 3.1, Small 3.2, Codestral) (#12337)
## Summary

Adds four missing Mistral AI flagship models to address the critical
coverage gap identified in
[SECRT-2082](https://linear.app/autogpt/issue/SECRT-2082).

## Models Added

| Model | Context | Max Output | Price Tier | Use Case |
|-------|---------|------------|------------|----------|
| **Mistral Large 3** | 262K | None | 2 (Medium) | Flagship reasoning
model, 41B active params (675B total), MoE architecture |
| **Mistral Medium 3.1** | 131K | None | 2 (Medium) | Balanced
performance/cost, 8x cheaper than traditional large models |
| **Mistral Small 3.2** | 131K | 131K | 1 (Low) | Fast, cost-efficient,
high-volume use cases |
| **Codestral 2508** | 256K | None | 1 (Low) | Code generation
specialist (FIM, correction, test gen) |

## Problem

Previously, the platform only offered:
- Mistral Nemo (1 official model)
- dolphin-mistral (third-party Ollama fine-tune)

This left significant gaps in Mistral's lineup, particularly:
- No flagship reasoning model
- No balanced mid-tier option
- No code-specialized model
- Missing multimodal capabilities (Large 3, Medium 3.1, Small 3.2 all
support text+image)

## Changes

**File:** `autogpt_platform/backend/backend/blocks/llm.py`

- Added 4 enum entries in `LlmModel` class
- Added 4 metadata entries in `MODEL_METADATA` dict
- All models use OpenRouter provider
- Follows existing pattern for model additions

## Testing

-  Enum values match OpenRouter model IDs
-  Metadata follows existing format
-  Context windows verified from OpenRouter API
-  Price tiers assigned appropriately

## Closes

- SECRT-2082

---

**Note:** All models are available via OpenRouter and tested. This
brings Mistral coverage in line with other major providers (OpenAI,
Anthropic, Google).
2026-03-11 08:48:48 +00:00
Zamil Majdy
9f4caa7dfc feat(blocks): add and harden GitHub blocks for full-cycle development (#12334)
## Summary
- Add 8 new GitHub blocks: GetRepositoryInfo, ForkRepository,
ListCommits, SearchCode, CompareBranches, GetRepositoryTree,
MultiFileCommit, MergePullRequest
- Split `repo.py` (2094 lines, 19 blocks) into domain-specific modules:
`repo.py`, `repo_branches.py`, `repo_files.py`, `commits.py`
- Concurrent blob creation via `asyncio.gather()` in MultiFileCommit
- URL-encode branch/ref params via `urllib.parse.quote()` for
defense-in-depth
- Step-level error handling in MultiFileCommit ref update with recovery
SHA
- Collapse FileOperation CREATE/UPDATE into UPSERT (Git Trees API treats
them identically)
- Add `ge=1, le=100` constraints on per_page SchemaFields
- Preserve URL scheme in `prepare_pr_api_url`
- Handle null commit authors gracefully in ListCommits
- Add unit tests for `prepare_pr_api_url`, error-path tests for
MergePR/MultiFileCommit, FileOperation enum validation tests

## Test plan
- [ ] Block tests pass for all 19 GitHub blocks (CI:
`test_available_blocks`)
- [ ] New test file `test_github_blocks.py` passes (prepare_pr_api_url,
error paths, enum)
- [ ] `check-docs-sync` passes with regenerated docs
- [ ] pyright/ruff clean on all changed files
2026-03-11 08:35:37 +00:00
nKOxxx
c7124a5240 Add documentation for Google Gemini integration (#12283)
## Summary
Adding comprehensive documentation for Google Gemini integration with
AutoGPT.

## Changes
- Added setup instructions for Gemini API
- Documented configuration options
- Added examples and best practices

## Related Issues
N/A - Documentation improvement

## Testing
- Verified documentation accuracy
- Tested all code examples

## Checklist
- [x] Code follows project style
- [x] Documentation updated
- [x] Tests pass (if applicable)
2026-03-09 15:13:28 +00:00
Reinier van der Leer
aa08063939 refactor(backend/db): Improve & clean up Marketplace DB layer & API (#12284)
These changes were part of #12206, but here they are separately for
easier review.
This is all primarily to make the v2 API (#11678) work possible/easier.

### Changes 🏗️

- Fix relations between `Profile`, `StoreListing`, and `AgentGraph`
- Redefine `StoreSubmission` view with more efficient joins (100x
speed-up on dev DB) and more consistent field names
- Clean up query functions in `store/db.py`
- Clean up models in `store/model.py`
- Add missing fields to `StoreAgent` and `StoreSubmission` views
- Rename ambiguous `agent_id` -> `graph_id`
- Clean up API route definitions & docs in `store/routes.py`
  - Make routes more consistent
- Avoid collision edge-case between `/agents/{username}/{agent_name}`
and `/agents/{store_listing_version_id}/*`
- Replace all usages of legacy `BackendAPI` for store endpoints with
generated client
- Remove scope requirements on public store endpoints in v1 external API

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Test all Marketplace views (including admin views)
    - [x] Download an agent from the marketplace
  - [x] Submit an agent to the Marketplace
  - [x] Approve/reject Marketplace submission
2026-03-06 14:38:12 +00:00
Bently
7c8c7bf395 feat(llm): add Claude Sonnet 4.6 model (#12158)
## Summary
Adds Claude Sonnet 4.6 (`claude-sonnet-4-6`) to the platform.

## Model Details (from [Anthropic
docs](https://www.anthropic.com/news/claude-sonnet-4-6))
- **API ID:** `claude-sonnet-4-6`
- **Pricing:** $3 / input MTok, $15 / output MTok (same as Sonnet 4.5)
- **Context window:** 200K tokens (1M beta)
- **Max output:** 64K tokens
- **Knowledge cutoff:** Aug 2025 (reliable), Jan 2026 (training data)

## Changes
- Added `CLAUDE_4_6_SONNET` to `LlmModel` enum
- Added metadata entry with correct context/output limits
- Updated Stagehand to use Sonnet 4.6 (better for browser automation
tasks)

## Why
Sonnet 4.6 brings major improvements in coding, computer use, and
reasoning. Developers with early access often prefer it to even Opus
4.5.

---------

Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-03-05 19:36:56 +00:00
Krzysztof Czerwinski
a1cb3d2a91 feat(blocks): Add Telegram blocks (#12141)
Add Telegram blocks that allow the use of [Telegram bots' API
features](https://core.telegram.org/bots/features).

### Changes 🏗️

1. Credentials & API layer: Bot token auth via `APIKeyCredentials`,
helper functions for JSON API calls (call_telegram_api) and multipart
file uploads (call_telegram_api_with_file)
2. Trigger blocks:
- `TelegramMessageTriggerBlock` — receives messages (text, photo, voice,
audio, document, video, edited message) with configurable event filters
- `TelegramMessageReactionTriggerBlock` — fires on reaction changes
(private chats auto, groups require admin)
2. Action blocks (11 total):
  - Send: Message, Photo, Voice, Audio, Document, Video
  - Reply to Message, Edit Message, Delete Message
  - Get File (download by file_id)
3. Webhook manager: Registers/deregisters webhooks via Telegram's
setWebhook API, validates incoming requests using
X-Telegram-Bot-Api-Secret-Token header
4. Provider registration: Added TELEGRAM to ProviderName enum and
registered `TelegramWebhooksManager`
5. Media send blocks support both URL passthrough (Telegram fetches
directly) and file upload for workspace/data URI inputs

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Non-AI UUIDs
  - [x] Blocks work correctly
    - [x] SendTelegramMessageBlock
    - [x] SendTelegramPhotoBlock
    - [x] SendTelegramVoiceBlock
    - [x] SendTelegramAudioBlock
    - [x] SendTelegramDocumentBlock
    - [x] SendTelegramVideoBlock
    - [x] ReplyToTelegramMessageBlock
    - [x] GetTelegramFileBlock
    - [x] DeleteTelegramMessageBlock
    - [x] EditTelegramMessageBlock
    - [x] TelegramMessageTriggerBlock (works for every trigger type)
    - [x] TelegramMessageReactionTriggerBlock

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-02-26 10:25:08 +00:00
Bently
ef42b17e3b docs: add Podman compatibility warning (#12120)
## Summary
Adds a warning to the Getting Started docs clarifying that **Podman and
podman-compose are not supported**.

## Problem
Users on Windows using `podman-compose` instead of Docker get errors
like:
```
Error: the specified Containerfile or Dockerfile does not exist, ..\..\autogpt_platform\backend\Dockerfile
```

This is because Podman handles relative paths differently than Docker,
causing incorrect path resolution on Windows.

## Solution
- Added a clear warning section after the Windows WSL 2 notes
- Explains the error users might see
- Directs them to install Docker Desktop instead

Closes #11358

<!-- greptile_comment -->

<details><summary><h3>Greptile Summary</h3></summary>

Adds a "Podman Not Supported" warning section to the Getting Started
documentation, placed after the Windows/WSL 2 installation notes. The
section clarifies that Docker is required, shows the typical error
message users encounter when using Podman, and directs them to install
Docker Desktop instead. This addresses issue #11358 where Windows users
using `podman-compose` hit path resolution errors.

- Adds `### ⚠️ Podman Not Supported` section under Manual Setup, after
Windows Installation Note
- Includes the specific error message users see with Podman for easy
identification
- Links to Docker Desktop installation docs as the recommended solution
- Formatting is consistent with existing sections in the document (emoji
headings, code blocks for errors)
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge — it only adds a documentation warning
section with no code changes.
- The change is a small, well-written documentation addition that adds a
Podman compatibility warning. It touches only one markdown file,
introduces no code changes, and is consistent with the existing document
structure and style. No issues were found.
- No files require special attention.
</details>


<details><summary><h3>Flowchart</h3></summary>

```mermaid
flowchart TD
    A[User wants to run AutoGPT] --> B{Which container runtime?}
    B -->|Docker / Docker Desktop| C[docker compose up -d --build]
    C --> D[AutoGPT starts successfully]
    B -->|Podman / podman-compose| E[podman-compose up -d --build]
    E --> F[Error: Containerfile or Dockerfile does not exist]
    F --> G[New warning section directs user to install Docker Desktop]
    G --> C
```
</details>


<sub>Last reviewed commit: 23ea6bd</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-23 15:19:24 +00:00
Eve
647c8ed8d4 feat(backend/blocks): enhance list concatenation with advanced operations (#12105)
## Summary

Enhances the existing `ConcatenateListsBlock` and adds five new
companion blocks for comprehensive list manipulation, addressing issue
#11139 ("Implement block to concatenate lists").

### Changes

- **Enhanced `ConcatenateListsBlock`** with optional deduplication
(`deduplicate`) and None-value filtering (`remove_none`), plus an output
`length` field
- **New `FlattenListBlock`**: Recursively flattens nested list
structures with configurable `max_depth`
- **New `InterleaveListsBlock`**: Round-robin interleaving of elements
from multiple lists
- **New `ZipListsBlock`**: Zips corresponding elements from multiple
lists with support for padding to longest or truncating to shortest
- **New `ListDifferenceBlock`**: Computes set difference between two
lists (regular or symmetric)
- **New `ListIntersectionBlock`**: Finds common elements between two
lists, preserving order

### Helper Utilities

Extracted reusable helper functions for validation, flattening,
deduplication, interleaving, chunking, and statistics computation to
support the blocks and enable future reuse.

### Test Coverage

Comprehensive test suite with 188 test functions across 29 test classes
covering:
- Built-in block test harness validation for all 6 blocks
- Manual edge-case tests for each block (empty inputs, large lists,
mixed types, nested structures)
- Internal method tests for all block classes
- Unit tests for all helper utility functions

Closes #11139

## Test plan

- [x] All files pass Python syntax validation (`ast.parse`)
- [x] Built-in `test_input`/`test_output` tests defined for all blocks
- [x] Manual tests cover edge cases: empty lists, large lists, mixed
types, nested structures, deduplication, None removal
- [x] Helper function tests validate all utility functions independently
- [x] All block IDs are valid UUID4
- [x] Block categories set to `BlockCategory.BASIC` for consistency with
existing list blocks


<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Enhanced `ConcatenateListsBlock` with deduplication and None-filtering
options, and added five new list manipulation blocks
(`FlattenListBlock`, `InterleaveListsBlock`, `ZipListsBlock`,
`ListDifferenceBlock`, `ListIntersectionBlock`) with comprehensive
helper functions and test coverage.

**Key Changes:**
- Enhanced `ConcatenateListsBlock` with `deduplicate` and `remove_none`
options, plus `length` output field
- Added `FlattenListBlock` for recursively flattening nested lists with
configurable `max_depth`
- Added `InterleaveListsBlock` for round-robin element interleaving
- Added `ZipListsBlock` with support for padding/truncation
- Added `ListDifferenceBlock` and `ListIntersectionBlock` for set
operations
- Extracted 12 reusable helper functions for validation, flattening,
deduplication, etc.
- Comprehensive test suite with 188 test functions covering edge cases

**Minor Issues:**
- Helper function `_deduplicate_list` has redundant logic in the `else`
branch that duplicates the `if` branch
- Three helper functions (`_filter_empty_collections`,
`_compute_list_statistics`, `_chunk_list`) are defined but unused -
consider removing unless planned for future use
- The `_make_hashable` function uses `hash(repr(item))` for unhashable
types, which correctly treats structurally identical dicts/lists as
duplicates
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- Safe to merge with minor style improvements recommended
- The implementation is well-structured with comprehensive test coverage
(188 tests), proper error handling, and follows existing block patterns.
All blocks use valid UUID4 IDs and correct categories. The helper
functions provide good code reuse. The minor issues are purely stylistic
(redundant code, unused helpers) and don't affect functionality or
safety.
- No files require special attention - both files are well-tested and
follow project conventions
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User
    participant Block as List Block
    participant Helper as Helper Functions
    participant Output
    
    User->>Block: Input (lists/parameters)
    Block->>Helper: _validate_all_lists()
    Helper-->>Block: validation result
    
    alt validation fails
        Block->>Output: error message
    else validation succeeds
        Block->>Helper: _concatenate_lists_simple() / _flatten_nested_list() / etc.
        Helper-->>Block: processed result
        
        opt deduplicate enabled
            Block->>Helper: _deduplicate_list()
            Helper-->>Block: deduplicated result
        end
        
        opt remove_none enabled
            Block->>Helper: _filter_none_values()
            Helper-->>Block: filtered result
        end
        
        Block->>Output: result + length
    end
    
    Output-->>User: Block outputs
```
</details>


<sub>Last reviewed commit: a6d5445</sub>

<!-- greptile_other_comments_section -->

<sub>(2/5) Greptile learns from your feedback when you react with thumbs
up/down!</sub>

<!-- /greptile_comment -->

---------

Co-authored-by: Otto <otto@agpt.co>
2026-02-16 05:39:53 +00:00
Zamil Majdy
f9f358c526 feat(mcp): Add MCP tool block with OAuth, tool discovery, and standard credential integration (#12011)
## Summary

<img width="1000" alt="image"
src="https://github.com/user-attachments/assets/18e8ef34-d222-453c-8b0a-1b25ef8cf806"
/>


<img width="250" alt="image"
src="https://github.com/user-attachments/assets/ba97556c-09c5-4f76-9f4e-49a2e8e57468"
/>

<img width="250" alt="image"
src="https://github.com/user-attachments/assets/68f7804a-fe74-442d-9849-39a229c052cf"
/>

<img width="250" alt="image"
src="https://github.com/user-attachments/assets/700690ba-f9fe-4726-8871-3bfbab586001"
/>

Full-stack MCP (Model Context Protocol) tool block integration that
allows users to connect to any MCP server, discover available tools,
authenticate via OAuth, and execute tools — all through the standard
AutoGPT credential system.

### Backend

- **MCPToolBlock** (`blocks/mcp/block.py`): New block using
`CredentialsMetaInput` pattern with optional credentials (`default={}`),
supporting both authenticated (OAuth) and public MCP servers. Includes
auto-lookup fallback for backward compatibility.
- **MCP Client** (`blocks/mcp/client.py`): HTTP transport with JSON-RPC
2.0, tool discovery, tool execution with robust error handling
(type-checked error fields, non-JSON response handling)
- **MCP OAuth Handler** (`blocks/mcp/oauth.py`): RFC 8414 discovery,
dynamic per-server OAuth with PKCE, token storage and refresh via
`raise_for_status=True`
- **MCP API Routes** (`api/features/mcp/routes.py`): `discover-tools`,
`oauth/login`, `oauth/callback` endpoints with credential cleanup,
defensive OAuth metadata validation
- **Credential system integration**:
- `CredentialsMetaInput` model_validator normalizes legacy
`"ProviderName.MCP"` format from Python 3.13's `str(StrEnum)` change
- `CredentialsFieldInfo.combine()` supports URL-based credential
discrimination (each MCP server gets its own credential entry)
- `aggregate_credentials_inputs` checks block schema defaults for
credential optionality
- Executor normalizes credential data for both Pydantic and JSON schema
validation paths
  - Chat credential matching handles MCP server URL filtering
- `provider_matches()` helper used consistently for Python 3.13 StrEnum
compatibility
- **Pre-run validation**: `_validate_graph_get_errors` now calls
`get_missing_input()` for custom block-level validation (MCP tool
arguments)
- **Security**: HTML tag stripping loop to prevent XSS bypass, SSRF
protection (removed trusted_origins)

### Frontend

- **MCPToolDialog** (`MCPToolDialog.tsx`): Full tool discovery UI —
enter server URL, authenticate if needed, browse tools, select tool and
configure
- **OAuth popup** (`oauth-popup.ts`): Shared utility supporting
cross-origin MCP OAuth flows with BroadcastChannel + localStorage
fallback
- **Credential integration**: MCP-specific OAuth flow in
`useCredentialsInput`, server URL filtering in `useCredentials`, MCP
callback page
- **CredentialsSelect**: Auto-selects first available credential instead
of defaulting to "None", credentials listed before "None" in dropdown
- **Node rendering**: Dynamic tool input schema rendering on MCP nodes,
proper handling in both legacy and new flow editors
- **Block title persistence**: `customized_name` set at block creation
for both MCP and Agent blocks — no fallback logic needed, titles survive
save/load reliably
- **Stable credential ordering**: Removed `sortByUnsetFirst` that caused
credential inputs to jump when selected

### Tests (~2060 lines)

- Unit tests: block, client, tool execution
- Integration tests: mock MCP server with auth
- OAuth flow tests
- API endpoint tests
- Credential combining/optionality tests
- E2e tests (skipped in CI, run manually)

## Key Design Decisions

1. **Optional credentials via `default={}`**: MCP servers can be public
(no auth) or private (OAuth). The `credentials` field has `default={}`
making it optional at the schema level, so public servers work without
prompting for credentials.

2. **URL-based credential discrimination**: Each MCP server URL gets its
own credential entry in the "Run agent" form (via
`discriminator="server_url"`), so agents using multiple MCP servers
prompt for each independently.

3. **Model-level normalization**: Python 3.13 changed `str(StrEnum)` to
return `"ClassName.MEMBER"`. Rather than scattering fixes across the
codebase, a Pydantic `model_validator(mode="before")` on
`CredentialsMetaInput` handles normalization centrally, and
`provider_matches()` handles lookups.

4. **Credential auto-select**: `CredentialsSelect` component defaults to
the first available credential and notifies the parent state, ensuring
credentials are pre-filled in the "Run agent" dialog without requiring
manual selection.

5. **customized_name for block titles**: Both MCP and Agent blocks set
`customized_name` in metadata at creation time. This eliminates
convoluted runtime fallback logic (`agent_name`, hostname extraction) —
the title is persisted once and read directly.

## Test plan

- [x] Unit/integration tests pass (68 MCP + 11 graph = 79 tests)
- [x] Manual: MCP block with public server (DeepWiki) — no credentials
needed, tools discovered and executable
- [x] Manual: MCP block with OAuth server (Linear, Sentry) — OAuth flow
prompts correctly
- [x] Manual: "Run agent" form shows correct credential requirements per
MCP server
- [x] Manual: Credential auto-selects when exactly one matches,
pre-selects first when multiple exist
- [x] Manual: Credential ordering stays stable when
selecting/deselecting
- [x] Manual: MCP block title persists after save and refresh
- [x] Manual: Agent block title persists after save and refresh (via
customized_name)
- [ ] Manual: Shared agent with MCP block prompts new user for
credentials

---------

Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Ubbe <hi@ubbe.dev>
2026-02-13 16:17:03 +00:00
Nicholas Tindle
cb166dd6fb feat(blocks): Store sandbox files to workspace (#12073)
Store files created by sandbox blocks (Claude Code, Code Executor) to
the user's workspace for persistence across runs.

### Changes 🏗️

- **New `sandbox_files.py` utility** (`backend/util/sandbox_files.py`)
  - Shared module for extracting files from E2B sandboxes
- Stores files to workspace via `store_media_file()` (includes virus
scanning, size limits)
  - Returns `SandboxFileOutput` with path, content, and `workspace_ref`

- **Claude Code block** (`backend/blocks/claude_code.py`)
  - Added `workspace_ref` field to `FileOutput` schema
  - Replaced inline `_extract_files()` with shared utility
  - Files from working directory now stored to workspace automatically

- **Code Executor block** (`backend/blocks/code_executor.py`)
  - Added `files` output field to `ExecuteCodeBlock.Output`
  - Creates `/output` directory in sandbox before execution
  - Extracts all files (text + binary) from `/output` after execution
- Updated `execute_code()` to support file extraction with
`extract_files` param

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Create agent with Claude Code block, have it create a file, verify
`workspace_ref` in output
- [x] Create agent with Code Executor block, write file to `/output`,
verify `workspace_ref` in output
  - [x] Verify files persist in workspace after sandbox disposal
- [x] Verify binary files (images, etc.) work correctly in Code Executor
- [x] Verify existing graphs using `content` field still work (backward
compat)

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

No configuration changes required - this is purely additive backend
code.

---

**Related:** Closes SECRT-1931

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds automatic extraction and workspace storage of sandbox-written
files (including binaries for code execution), which can affect output
payload size, performance, and file-handling edge cases.
> 
> **Overview**
> **Sandbox blocks now persist generated files to workspace.** A new
shared utility (`backend/util/sandbox_files.py`) extracts files from an
E2B sandbox (scoped by a start timestamp) and stores them via
`store_media_file`, returning `SandboxFileOutput` with `workspace_ref`.
> 
> `ClaudeCodeBlock` replaces its inline file-scraping logic with this
utility and updates the `files` output schema to include
`workspace_ref`.
> 
> `ExecuteCodeBlock` adds a `files` output and extends the executor
mixin to optionally extract/store files (text + binary) when an
`execution_context` is provided; related mocks/tests and docs are
updated accordingly.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
343854c0cf. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 15:56:59 +00:00
Reinier van der Leer
113e87a23c refactor(backend): Reduce circular imports (#12068)
I'm getting circular import issues because there is a lot of
cross-importing between `backend.data`, `backend.blocks`, and other
modules. This change reduces block-related cross-imports and thus risk
of breaking circular imports.

### Changes 🏗️

- Strip down `backend.data.block`
- Move `Block` base class and related class/enum defs to
`backend.blocks._base`
  - Move `is_block_auth_configured` to `backend.blocks._utils`
- Move `get_blocks()`, `get_io_block_ids()` etc. to `backend.blocks`
(`__init__.py`)
  - Update imports everywhere
- Remove unused and poorly typed `Block.create()`
  - Change usages from `block_cls.create()` to `block_cls()`
- Improve typing of `load_all_blocks` and `get_blocks`
- Move cross-import of `backend.api.features.library.model` from
`backend/data/__init__.py` to `backend/data/integrations.py`
- Remove deprecated attribute `NodeModel.webhook`
  - Re-generate OpenAPI spec and fix frontend usage
- Eliminate module-level `backend.blocks` import from `blocks/agent.py`
- Eliminate module-level `backend.data.execution` and
`backend.executor.manager` imports from `blocks/helpers/review.py`
- Replace `BlockInput` with `GraphInput` for graph inputs

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - CI static type-checking + tests should be sufficient for this
2026-02-12 12:07:49 +00:00
Otto
36aeb0b2b3 docs(blocks): clarify HumanInTheLoop output descriptions for agent builder (#12069)
## Problem

The agent builder (LLM) misinterprets the HumanInTheLoop block outputs.
It thinks `approved_data` and `rejected_data` will yield status strings
like "APPROVED" or "REJECTED" instead of understanding that the actual
input data passes through.

This leads to unnecessary complexity - the agent builder adds comparison
blocks to check for status strings that don't exist.

## Solution

Enriched the block docstring and all input/output field descriptions to
make it explicit that:
1. The output is the actual data itself, not a status string
2. The routing is determined by which output pin fires
3. How to use the block correctly (connect downstream blocks to
appropriate output pins)

## Changes

- Updated block docstring with clear "How it works" and "Example usage"
sections
- Enhanced `data` input description to explain data flow
- Enhanced `name` input description for reviewer context
- Enhanced `approved_data` output to explicitly state it's NOT a status
string
- Enhanced `rejected_data` output to explicitly state it's NOT a status
string
- Enhanced `review_message` output for clarity

## Testing

Documentation-only change to schema descriptions. No functional changes.

Fixes SECRT-1930

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Enhanced documentation for the `HumanInTheLoopBlock` to clarify how
output pins work. The key improvement explicitly states that output pins
(`approved_data` and `rejected_data`) yield the actual input data, not
status strings like "APPROVED" or "REJECTED". This prevents the agent
builder (LLM) from misinterpreting the block's behavior and adding
unnecessary comparison blocks.

**Key changes:**
- Added "How it works" and "Example usage" sections to the block
docstring
- Clarified that routing is determined by which output pin fires, not by
comparing output values
- Enhanced all input/output field descriptions with explicit data flow
explanations
- Emphasized that downstream blocks should be connected to the
appropriate output pin based on desired workflow path

This is a documentation-only change with no functional modifications to
the code logic.
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge with no risk
- Documentation-only change that accurately reflects the existing code
behavior. No functional changes, no runtime impact, and the enhanced
descriptions correctly explain how the block outputs work based on
verification of the implementation code.
- No files require special attention
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-02-11 15:43:58 +00:00
Nicholas Tindle
85b6520710 feat(blocks): Add video editing blocks (#11796)
<!-- Clearly explain the need for these changes: -->
This PR adds general-purpose video editing blocks for the AutoGPT
Platform, enabling automated video production workflows like documentary
creation, marketing videos, tutorial assembly, and content repurposing.

### Changes 🏗️

<!-- Concisely describe all of the changes made in this pull request:
-->

**New blocks added in `backend/blocks/video/`:**
- `VideoDownloadBlock` - Download videos from URLs (YouTube, Vimeo, news
sites, direct links) using yt-dlp
- `VideoClipBlock` - Extract time segments from videos with start/end
time validation
- `VideoConcatBlock` - Merge multiple video clips with optional
transitions (none, crossfade, fade_black)
- `VideoTextOverlayBlock` - Add text overlays/captions with positioning
and timing options
- `VideoNarrationBlock` - Generate AI narration via ElevenLabs and mix
with video audio (replace, mix, or ducking modes)

**Dependencies required:**
- `yt-dlp` - For video downloading
- `moviepy` - For video editing operations

**Implementation details:**
- All blocks follow the SDK pattern with proper error handling and
exception chaining
- Proper resource cleanup in `finally` blocks to prevent memory leaks
- Input validation (e.g., end_time > start_time)
- Test mocks included for CI

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Blocks follow the SDK pattern with
`BlockSchemaInput`/`BlockSchemaOutput`
  - [x] Resource cleanup is implemented in `finally` blocks
  - [x] Exception chaining is properly implemented
  - [x] Input validation is in place
  - [x] Test mocks are provided for CI environments

#### For configuration changes:
- [ ] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [ ] I have included a list of my configuration changes in the PR
description (under **Changes**)

N/A - No configuration changes required.


<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds new multimedia blocks that invoke ffmpeg/MoviePy and introduces
new external dependencies (plus container packages), which can impact
runtime stability and resource usage; download/overlay blocks are
present but disabled due to sandbox/policy concerns.
> 
> **Overview**
> Adds a new `backend.blocks.video` module with general-purpose video
workflow blocks (download, clip, concat w/ transitions, loop, add-audio,
text overlay, and ElevenLabs-powered narration), including shared
utilities for codec selection, filename cleanup, and an ffmpeg-based
chapter-strip workaround for MoviePy.
> 
> Extends credentials/config to support ElevenLabs
(`ELEVENLABS_API_KEY`, provider enum, system credentials, and cost
config) and adds new dependencies (`elevenlabs`, `yt-dlp`) plus Docker
runtime packages (`ffmpeg`, `imagemagick`).
> 
> Improves file/reference handling end-to-end by embedding MIME types in
`workspace://...#mime` outputs and updating frontend rendering to detect
video vs image from MIME fragments (and broaden supported audio/video
extensions), with optional enhanced output rendering behind a feature
flag in the legacy builder UI.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
da7a44d794. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com>
Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 22:22:33 +00:00
Bently
bfa942e032 feat(platform): Add Claude Opus 4.6 model support (#11983)
## Summary
Adds support for Anthropic's newly released Claude Opus 4.6 model.

## Changes
- Added `claude-opus-4-6` to the `LlmModel` enum
- Added model metadata: 200K context window (1M beta), **128K max output
tokens**
- Added block cost config (same pricing tier as Opus 4.5: $5/MTok input,
$25/MTok output)
- Updated chat config default model to Claude Opus 4.6

## Model Details
From [Anthropic's
docs](https://docs.anthropic.com/en/docs/about-claude/models):
- **API ID:** `claude-opus-4-6`
- **Context window:** 200K tokens (1M beta)
- **Max output:** 128K tokens (up from 64K on Opus 4.5)
- **Extended thinking:** Yes
- **Adaptive thinking:** Yes (new, Opus 4.6 exclusive)
- **Knowledge cutoff:** May 2025 (reliable), Aug 2025 (training)
- **Pricing:** $5/MTok input, $25/MTok output (same as Opus 4.5)

---------

Co-authored-by: Toran Bruce Richards <toran.richards@gmail.com>
2026-02-05 19:19:51 +00:00
Bently
3ca2387631 feat(blocks): Implement Text Encode block (#11857)
## Summary
Implements a `TextEncoderBlock` that encodes plain text into escape
sequences (the reverse of `TextDecoderBlock`).

## Changes

### Block Implementation
- Added `encoder_block.py` with `TextEncoderBlock` in
`autogpt_platform/backend/backend/blocks/`
- Uses `codecs.encode(text, "unicode_escape").decode("utf-8")` for
encoding
- Mirrors the structure and patterns of the existing `TextDecoderBlock`
- Categorised as `BlockCategory.TEXT`

### Documentation
- Added Text Encoder section to
`docs/integrations/block-integrations/text.md` (the auto-generated docs
file for TEXT category blocks)
- Expanded "How it works" with technical details on the encoding method,
validation, and edge cases
- Added 3 structured use cases per docs guidelines: JSON payload
preparation, Config/ENV generation, Snapshot fixtures
- Added Text Encoder to the overview table in
`docs/integrations/README.md`
- Removed standalone `encoder_block.md` (TEXT category blocks belong in
`text.md` per `CATEGORY_FILE_MAP` in `generate_block_docs.py`)

### Documentation Formatting (CodeRabbit feedback)
- Added blank lines around markdown tables (MD058)
- Added `text` language tags to fenced code blocks (MD040)
- Restructured use case section with bold headings per coding guidelines

## How Docs Were Synced
The `check-docs-sync` CI job runs `poetry run python
scripts/generate_block_docs.py --check` which expects blocks to be
documented in category-grouped files. Since `TextEncoderBlock` uses
`BlockCategory.TEXT`, the `CATEGORY_FILE_MAP` maps it to `text.md` — not
a standalone file. The block entry was added to `text.md` following the
exact format used by the generator (with `<!-- MANUAL -->` markers for
hand-written sections).

## Related Issue
Fixes #11111

---------

Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: lif <19658300+majiayu000@users.noreply.github.com>
Co-authored-by: Aryan Kaul <134673289+aryancodes1@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: Nick Tindle <nick@ntindle.com>
2026-02-05 17:31:02 +00:00
Otto
4f908d5cb3 fix(platform): Improve Linear Search Block [SECRT-1880] (#11967)
## Summary

Implements [SECRT-1880](https://linear.app/autogpt/issue/SECRT-1880) -
Improve Linear Search Block

## Changes

### Models (`models.py`)
- Added `State` model with `id`, `name`, and `type` fields for workflow
state information
- Added `state: State | None` field to `Issue` model

### API Client (`_api.py`)
- Updated `try_search_issues()` to:
- Add `max_results` parameter (default 10, was ~50) to reduce token
usage
  - Add `team_id` parameter for team filtering
- Return `createdAt`, `state`, `project`, and `assignee` fields in
results
- Fixed `try_get_team_by_name()` to return descriptive error message
when team not found instead of crashing with `IndexError`

### Block (`issues.py`)
- Added `max_results` input parameter (1-100, default 10)
- Added `team_name` input parameter for optional team filtering
- Added `error` output field for graceful error handling
- Added categories (`PRODUCTIVITY`, `ISSUE_TRACKING`)
- Updated test fixtures to include new fields

## Breaking Changes

| Change | Before | After | Mitigation |
|--------|--------|-------|------------|
| Default result count | ~50 | 10 | Users can set `max_results` up to
100 if needed |

## Non-Breaking Changes

- `state` field added to `Issue` (optional, defaults to `None`)
- `max_results` param added (has default value)
- `team_name` param added (optional, defaults to `None`)
- `error` output added (follows established pattern from GitHub blocks)

## Testing

- [x] Format/lint checks pass
- [x] Unit test fixtures updated

Resolves SECRT-1880

---------

Co-authored-by: Toran Bruce Richards <toran.richards@gmail.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Toran Bruce Richards <Torantulino@users.noreply.github.com>
2026-02-04 22:54:46 +00:00
Otto
7ee94d986c docs: add credentials prerequisites to create-basic-agent guide (#11913)
## Summary
Addresses #11785 - users were encountering `openai_api_key_credentials`
errors when following the create-basic-agent guide because it didn't
mention the need to configure API credentials before using AI blocks.

## Changes
Added a **Prerequisites** section to
`docs/platform/create-basic-agent.md` explaining:
- **Cloud users:** Go to Profile → Integrations to add API keys
- **Self-hosted (Docker):** Add keys to `autogpt_platform/backend/.env`
and restart services

Also added a note that the Calculator example doesn't need credentials,
making it a good first test.

## Related
- Issue: #11785
2026-01-31 03:05:31 +00:00
Bently
de0ec3d388 chore(llm): remove deprecated Claude 3.7 Sonnet model with migration and defensive handling (#11841)
## Summary
Remove `claude-3-7-sonnet-20250219` from LLM model definitions ahead of
Anthropic's API retirement, with comprehensive migration and defensive
error handling.

## Background
Anthropic is retiring Claude 3.7 Sonnet (`claude-3-7-sonnet-20250219`)
on **February 19, 2026 at 9:00 AM PT**. This PR removes the model from
the platform and migrates existing users to prevent service
interruptions.

## Changes

### Code Changes
- Remove `CLAUDE_3_7_SONNET` enum member from `LlmModel` in `llm.py`
- Remove corresponding `ModelMetadata` entry
- Remove `CLAUDE_3_7_SONNET` from `StagehandRecommendedLlmModel` enum
- Remove `CLAUDE_3_7_SONNET` from block cost config
- Add `CLAUDE_4_5_SONNET` to `StagehandRecommendedLlmModel` enum
- Update Stagehand block defaults from `CLAUDE_3_7_SONNET` to
`CLAUDE_4_5_SONNET` (staying in Claude family)
- Add defensive error handling in `CredentialsFieldInfo.discriminate()`
for deprecated model values

### Database Migration
- Adds migration `20260126120000_migrate_claude_3_7_to_4_5_sonnet`
- Migrates `AgentNode.constantInput` model references
- Migrates `AgentNodeExecutionInputOutput.data` preset overrides

### Documentation
- Updated `docs/integrations/block-integrations/llm.md` to remove
deprecated model
- Updated `docs/integrations/block-integrations/stagehand/blocks.md` to
remove deprecated model and add Claude 4.5 Sonnet

## Notes
- Agent JSON files in `autogpt_platform/backend/agents/` still reference
this model in their provider mappings. These are auto-generated and
should be regenerated separately.

## Testing
- [ ] Verify LLM block still functions with remaining models
- [ ] Confirm no import errors in affected files
- [ ] Verify migration runs successfully
- [ ] Verify deprecated model gives helpful error message instead of
KeyError
2026-01-30 08:40:55 +00:00
Reinier van der Leer
4cd5da678d refactor(claude): Split autogpt_platform/CLAUDE.md into project-specific files (#11788)
Split `autogpt_platform/CLAUDE.md` into project-specific files, to make
the scope of the instructions clearer.

Also, some minor improvements:

- Change references to other Markdown files to @file/path.md syntax that
Claude recognizes
- Update ambiguous/incorrect/outdated instructions
- Remove trailing slashes
- Fix broken file path references in other docs (including comments)
2026-01-29 17:33:02 +00:00
Nicholas Tindle
7668c17d9c feat(platform): add User Workspace for persistent CoPilot file storage (#11867)
Implements persistent User Workspace storage for CoPilot, enabling
blocks to save and retrieve files across sessions. Files are stored in
session-scoped virtual paths (`/sessions/{session_id}/`).

Fixes SECRT-1833

### Changes 🏗️

**Database & Storage:**
- Add `UserWorkspace` and `UserWorkspaceFile` Prisma models
- Implement `WorkspaceStorageBackend` abstraction (GCS for cloud, local
filesystem for self-hosted)
- Add `workspace_id` and `session_id` fields to `ExecutionContext`

**Backend API:**
- Add REST endpoints: `GET/POST /api/workspace/files`, `GET/DELETE
/api/workspace/files/{id}`, `GET /api/workspace/files/{id}/download`
- Add CoPilot tools: `list_workspace_files`, `read_workspace_file`,
`write_workspace_file`
- Integrate workspace storage into `store_media_file()` - returns
`workspace://file-id` references

**Block Updates:**
- Refactor all file-handling blocks to use unified `ExecutionContext`
parameter
- Update media-generating blocks to persist outputs to workspace
(AIImageGenerator, AIImageCustomizer, FluxKontext, TalkingHead, FAL
video, Bannerbear, etc.)

**Frontend:**
- Render `workspace://` image references in chat via proxy endpoint
- Add "AI cannot see this image" overlay indicator

**CoPilot Context Mapping:**
- Session = Agent (graph_id) = Run (graph_exec_id)
- Files scoped to `/sessions/{session_id}/`

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Create CoPilot session, generate image with AIImageGeneratorBlock
  - [ ] Verify image returns `workspace://file-id` (not base64)
  - [ ] Verify image renders in chat with visibility indicator
  - [ ] Verify workspace files persist across sessions
  - [ ] Test list/read/write workspace files via CoPilot tools
  - [ ] Test local storage backend for self-hosted deployments

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

🤖 Generated with [Claude Code](https://claude.ai/code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Introduces a new persistent file-storage surface area (DB tables,
storage backends, download API, and chat tools) and rewires
`store_media_file()`/block execution context across many blocks, so
regressions could impact file handling, access control, or storage
costs.
> 
> **Overview**
> Adds a **persistent per-user Workspace** (new
`UserWorkspace`/`UserWorkspaceFile` models plus `WorkspaceManager` +
`WorkspaceStorageBackend` with GCS/local implementations) and wires it
into the API via a new `/api/workspace/files/{file_id}/download` route
(including header-sanitized `Content-Disposition`) and shutdown
lifecycle hooks.
> 
> Extends `ExecutionContext` to carry execution identity +
`workspace_id`/`session_id`, updates executor tooling to clone
node-specific contexts, and updates `run_block` (CoPilot) to create a
session-scoped workspace and synthetic graph/run/node IDs.
> 
> Refactors `store_media_file()` to require `execution_context` +
`return_format` and to support `workspace://` references; migrates many
media/file-handling blocks and related tests to the new API and to
persist generated media as `workspace://...` (or fall back to data URIs
outside CoPilot), and adds CoPilot chat tools for
listing/reading/writing/deleting workspace files with safeguards against
context bloat.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
6abc70f793. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-01-29 05:49:47 +00:00
Bently
82d7134fc6 feat(blocks): Add ClaudeCodeBlock for executing tasks via Claude Code in E2B sandbox (#11761)
Introduces a new ClaudeCodeBlock that enables execution of coding tasks
using Anthropic's Claude Code in an E2B sandbox. This block unlocks
powerful agentic coding capabilities - Claude Code can autonomously
create files, install packages, run commands, and build complete
applications within a secure sandboxed environment.

Changes 🏗️

- New file backend/blocks/claude_code.py:
  - ClaudeCodeBlock - Execute tasks using Claude Code in an E2B sandbox
- Dual credential support: E2B API key (sandbox) + Anthropic API key
(Claude Code)
- Session continuation support via session_id, sandbox_id, and
conversation_history
- Automatic file extraction with path, relative_path, name, and content
fields
  - Configurable timeout, setup commands, and working directory
- dispose_sandbox option to keep sandbox alive for multi-turn
conversations

Checklist 📋

For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Create and execute ClaudeCodeBlock with a simple prompt ("Create a
hello world HTML file")
- [x] Verify files output includes correct path, relative_path, name,
and content
- [x] Test session continuation by passing session_id and sandbox_id
back
- [x] Build "Any API → Instant App" demo agent combining Firecrawl +
ClaudeCodeBlock + GitHub blocks
- [x] Verify generated files are pushed to GitHub with correct folder
structure using relative_path

Here are two example agents i made that can be used to test this agent,
they require github, anthropic and e2b access via api keys that are set
via the user/on the platform is testing on dev

The first agent is my

Any API → Instant App
"Transform any API documentation into a fully functional web
application. Just provide a docs URL and get a complete, ready-to-deploy
app pushed to a new GitHub repository."

[Any API → Instant
App_v36.json](https://github.com/user-attachments/files/24600326/Any.API.Instant.App_v36.json)


The second agent is my
Idea to project
"Simply enter your coding project's idea and this agent will make all of
the base initial code needed for you to start working on that project
and place it on github for you!"

[Idea to
project_v11.json](https://github.com/user-attachments/files/24600346/Idea.to.project_v11.json)

If you have any questions or issues let me know.

References
https://e2b.dev/blog/python-guide-run-claude-code-in-an-e2b-sandbox

https://github.com/e2b-dev/e2b-cookbook/tree/main/examples/anthropic-claude-code-in-sandbox-python
https://code.claude.com/docs/en/cli-reference

I tried to use E2b's "anthropic-claude-code" template but it kept
complaining it was out of date, so I make it manually spin up a E2b
instance and make it install the latest claude code and it uses that
2026-01-23 10:05:32 +00:00