Commit Graph

288 Commits

Author SHA1 Message Date
An Vy Le
3aa72b4245 feat(backend/copilot): inline picker-backed inputs via run_block + accept AgentInputBlock subclasses (#12880)
### Why / What / How

**Why:** Resolves #12875. CoPilot's agent-builder was hardcoding Google
Drive file IDs into consuming blocks' `input_default` instead of wiring
an `AgentGoogleDriveFileInputBlock`. A beta user hit this across **13
saved versions** of one agent. Root causes:

1. `validate_io_blocks` only accepted the literal base `AgentInputBlock`
/ `AgentOutputBlock` IDs, so even when CoPilot used a specialized
subclass like `AgentGoogleDriveFileInputBlock` as the only input, the
validator forced it to keep a throwaway base alongside — entrenching the
anti-pattern.
2. Running a Drive consumer directly via CoPilot's `run_block` silently
failed because the auto-credentials flow (picker attaches
`_credentials_id`) existed only in the graph executor, never in
CoPilot's direct-execution path.
3. Drive picker guidance lived in `agent_generation_guide.md` instead of
on the blocks themselves, so it duplicated and drifted from the code.
4. Observed in a live session: when asked to read a private sheet,
CoPilot refused with "share publicly or use the builder" instead of
calling `run_block` and letting the picker render — the prompt rule was
buried and the fallback path (omitted required picker field) returned a
generic schema preview.

**What:** Four coordinated platform + CoPilot improvements. No
block-specific validator rules, no Drive-specific code in UI or prompt.

**How:**

#### 1. `validate_io_blocks` subclass support

Accepts any block with `uiType == "Input"` / `"Output"` (populated from
`Block.block_type` at registration). `AgentGoogleDriveFileInputBlock`,
`AgentDropdownInputBlock`, `AgentTableInputBlock`, etc. stand alone.
Base-ID fallback preserved for call sites that pass a minimal blocks
list.

#### 2. Inline picker via `run_block`

- Extracted `_acquire_auto_credentials` from
`backend/executor/manager.py` into shared
`backend/executor/auto_credentials.py` (exports
`acquire_auto_credentials` + `MissingAutoCredentialsError`).
- Wired it into `backend/copilot/tools/helpers.py::execute_block`. When
`_credentials_id` is present, the block executes with creds injected
(chained flows work). When missing/null, `execute_block` returns the
existing `SetupRequirementsResponse` — frontend's `FormRenderer` renders
the picker inline via the existing
`GoogleDrivePickerField`/`GoogleDrivePickerInput`. On pick, the LLM
re-invokes `run_block` with the populated input — same continuation
pattern as OAuth-missing-credentials. No new response types, no new
continuation tool, no new frontend component.
- `run_block` now short-circuits to `SetupRequirementsResponse` when
missing required fields include a picker-backed field, skipping the
schema-preview round trip the LLM would otherwise take.
- `get_inputs_from_schema` spreads the full property schema (`**schema`)
instead of whitelisting — any `format` / `json_schema_extra` / custom
widget config flows through to the generic custom-field dispatch on the
frontend. Future picker formats (date pickers, file pickers, etc.) work
without backend changes.
- Frontend `SetupRequirementsCard/helpers.ts` uses index-signature
passthrough for arbitrary schema keys — no widget-specific code in that
layer.

#### 3. `validate_only` parameter on `run_block`

`run_block(id, {})` is not always a safe probe — for blocks with zero
required inputs, it executes. New `validate_only: true` parameter
returns `BlockDetailsResponse` (schema + missing-input list) without
executing, rendering picker cards, or charging credits. Same response
shape as the existing schema preview — no new branch, just an extra
condition on the existing one. LLM uses this for pre-flight when it's
unsure whether a block has required inputs.

#### 4. Block-local picker guidance

Agent-generation picker guidance relocated from the guide onto the
blocks themselves — surfaced at `find_block` time, exactly when the LLM
decides to wire a picker-backed consumer:

- `GoogleDriveFileField` (shared factory for every Drive field on
Sheets/Docs/etc.) appends a standard hint to the caller's description
covering: feed from the specialized input block, never hardcode (even
one parsed from a URL), picker is the only credential source.
- `AgentGoogleDriveFileInputBlock`'s block description now covers when
it's required, the `allowed_views` mapping, wiring direction, and a
concrete link-shape example.
- `agent_generation_guide.md` loses the dedicated 71-line Drive section.
The IO-blocks section now tells the LLM specialized subclasses satisfy
the requirement and carry their own usage guidance in block/field
descriptions — read them when `find_block` surfaces a match.
- New "Picker-backed inputs via `run_block`" section in the CoPilot
prompt, written generically (picker fields detected via `format` /
`auto_credentials` schema hints, no provider names hardcoded) — covers:
don't ask the user for URLs/IDs, don't refuse private-resource asks,
chained picker objects pass through as-is.
- Sharpened `MissingAutoCredentialsError` message so when a bare ID
reaches execution, the error explicitly tells the LLM the picker renders
inline (not "ask the user for something").

### Changes 🏗️

- `backend/copilot/tools/agent_generator/validator.py` —
`_collect_io_block_ids` + subclass-aware `validate_io_blocks`.
- `backend/executor/auto_credentials.py` (new) — shared
`acquire_auto_credentials` + `MissingAutoCredentialsError`.
- `backend/executor/manager.py` — imports from the shared module, drops
the local copy.
- `backend/copilot/tools/helpers.py` — `execute_block` calls
`acquire_auto_credentials`, merges kwargs, releases locks in `finally`,
returns `SetupRequirementsResponse` on missing creds.
`get_inputs_from_schema` spreads the full property schema.
- `backend/copilot/tools/run_block.py` — picker-field short-circuit +
`validate_only` parameter.
- `backend/copilot/prompting.py` — "Picker-backed inputs via
`run_block`" + "Pre-flight with `validate_only`" sections.
- `backend/blocks/google/_drive.py` — `GoogleDriveFileField` appends the
agent-builder hint to every Drive consumer's description.
- `backend/blocks/io.py` — `AgentGoogleDriveFileInputBlock` description
expanded.
- `backend/copilot/sdk/agent_generation_guide.md` — Drive section
removed, IO-blocks subclass note expanded.
- `frontend/.../SetupRequirementsCard/helpers.ts` — index-signature
passthrough for arbitrary schema keys; schema fields propagate into the
generated RJSF schema.
- Tests: new `TestExecuteBlockAutoCredentials` (4 cases) +
`validate_only` + picker-short-circuit cases in `run_block_test.py`;
`manager_auto_credentials_test.py` moved to new import path; 6 new
frontend cases in `SetupRequirementsCard/__tests__/helpers.test.ts`
covering schema passthrough.
- Also: one-line hoist of `import secrets` in
`backend/integrations/managed_providers/ayrshare.py` — ruff E402
introduced by #12883 was blocking our lint post-merge.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Backend unit suites: validator_test (48), helpers_test (40),
run_block_test (19), manager_auto_credentials_test (15) — **all green**
- [x] Frontend `SetupRequirementsCard` helpers — **75/75 pass**
(including 6 new passthrough cases)
- [x] `poetry run format` (ruff + isort + black) clean on touched files
(pre-existing pyright errors in unrelated `graphiti_core` /
`StreamEvent` / etc. files not introduced by this PR)
- [x] Live CoPilot chat on dev-builder confirmed the setup card renders
`custom/google_drive_picker_field` for a Drive consumer block called via
`run_block`
- [x] Live agent-generation confirmed CoPilot creates a subclass-only
agent (`AgentGoogleDriveFileInputBlock` → `GoogleSheetsReadBlock` →
`AgentOutputBlock`) with no throwaway base `AgentInputBlock`

#### For configuration changes:
- [x] N/A — no config changes

---------

Co-authored-by: majdyz <zamil.majdy@agpt.co>
2026-04-24 13:05:11 +07:00
Zamil Majdy
80bfde1ca6 feat(blocks): charge Ayrshare per-post + align Bannerbear/Jina floors (#12893)
## Why

The cost-tracking audit on 2026-04-23 ([Platform System
Credentials](https://www.notion.so/auto-gpt/4d251f343fe146bcb91b6a037d1bfc3c))
surfaced three gaps where the user wallet was silently subsidising
third-party spend:

1. **Ayrshare (13 blocks)** — zero charge on every social post. No
`BLOCK_COSTS` entry, no SDK `.with_base_cost` registration. Platform
absorbs the entire ~$149/mo Business plan.
2. **Bannerbear** — flat 1 credit/call below the ~$0.025/image unit cost
on the Starter tier ($49/mo / 2K images).
3. **JinaChunkingBlock** — wallet-free; siblings (`JinaEmbeddingBlock`,
`SearchTheWebBlock`) are charged.

## What

- New `backend/blocks/ayrshare/_cost.py` with two-tier
`AYRSHARE_POST_COSTS` (5 credits when `is_video=True`, 2 credits
otherwise — first-match wins in `block_usage_cost`).
- All 13 `PostTo*Block` classes decorated with
`@cost(*AYRSHARE_POST_COSTS)`.
- `BannerbearTextOverlayBlock` floor: 1 → 3 credits in
`bannerbear/_config.py`.
- `JinaChunkingBlock` added to `BLOCK_COSTS` with a flat 1-credit floor.
- `cost(...)` decorator generic-ized via `TypeVar`, so pyright retains
`PostToXBlock.Input/Output` narrowing.

## How

Ayrshare uses a decorator-based registration (not a direct `BLOCK_COSTS`
entry) because each `post_to_*.py` block imports from `backend.sdk`, and
`backend.sdk.cost_integration` imports `BLOCK_COSTS` — listing the
blocks in `block_cost_config.py` would create a circular import. The
`@cost` decorator defined in `sdk/cost_integration.py` was already the
approved escape hatch for this exact shape.

cost_filter in `block_usage_cost` already supports boolean-field
matching (see Apollo's `enrich_info` tier), so `{"is_video": True}` and
`{"is_video": False}` select the right tier at execution time.
`is_video` defaults to `False` on `BaseAyrshareInput`, so posts that
omit the field still land on the 2-credit default.

## Test plan

- [x] `poetry run pytest backend/data/block_cost_config_test.py` — new
6-test suite covers Ayrshare video/non-video/default tiers, the
Bannerbear floor, and the Jina chunking floor
- [x] `poetry run pytest backend/executor/manager_cost_tracking_test.py`
— no regressions (45 pre-existing tests still pass)
- [x] `poetry run ruff format` + `poetry run isort` + `poetry run ruff
check --fix`
- [x] `poetry run pyright` on touched files — 0 errors, 0 warnings
(pre-existing `LlmModel.KIMI_K2_*` errors are on dev and unrelated)
- [ ] Manual: run an Ayrshare post through the builder and confirm 2cr
(text/image) vs 5cr (video) charge
2026-04-23 20:39:35 +07:00
Zamil Majdy
39cdc0a5e0 fix(backend/copilot): tame Kimi compaction storm + tunable threshold + Langfuse cost backfill (#12889)
## Why

Investigation of two reported sessions
([85804387](https://dev-builder.agpt.co/copilot?sessionId=85804387-7708-4fdc-8ec9-64283cdd902d),
[19d69dec](https://dev-builder.agpt.co/copilot?sessionId=19d69dec-210f-4439-a94b-2d7d443b9909))
where Kimi K2.6 via OpenRouter was running ~30 min per turn with no
actions completed (Discord report from Toran). Langfuse traces showed:

- 31 generation calls per turn at p90 = 151s, max = 415s
- 2.57M uncached tokens, `cache_create=0`, ~4% cache_read — Moonshot's
OpenRouter endpoint silently drops Anthropic-style cache writes
- **3 SDK-internal compactions per turn** — each compaction is itself a
slow LLM round-trip
- Reconciled OpenRouter cost was being recorded to a DB row but never
surfaced on the Langfuse trace, leaving operators to grep pod logs

## What

Four commits, split by concern.

### 1. `fix(backend/copilot): skip CLAUDE_AUTOCOMPACT_PCT_OVERRIDE for
Moonshot/Kimi` (`5fd9c5aa`)

`env.py` was unconditionally setting
`CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=50` (introduced in #12747 to cap
cache-creation cost on Anthropic where context >200K = 54% of total
cost). On Kimi where `cache_create=0` silently, the cache-cost rationale
doesn't apply — but the 50% threshold still made the bundled CLI
auto-compact at ~100K tokens, triggering 3+ compactions per turn against
Kimi's larger effective window. Each compaction added a slow LLM
round-trip (one in our test ran 166s and burned the budget cap before
the user got any output).

Threads the resolved `sdk_model` (and `fallback_model`) into
`build_sdk_env` and skips the env var when the model matches
`is_moonshot_model(...)`. The CLI then uses its default ~93% threshold,
cutting compaction passes to 0–1.

### 2. `feat(backend/copilot): backfill OpenRouter reconciled cost to
Langfuse trace` (`f3de3624` + follow-ups `5ce3d038`, `d2c1a2cd`,
`d8e08525`, `d243bf6c9`)

`record_turn_cost_from_openrouter` runs as a fire-and-forget task after
the OTel span closes, so the Langfuse trace UI showed the SDK CLI's
rate-card estimate only — for non-Anthropic OpenRouter routes that
estimate is Sonnet pricing on Kimi tokens (~5x too high).

The backfill captures `langfuse.get_current_trace_id()` and threads it
into the reconcile task, which emits an `openrouter-cost-reconcile`
child event with the authoritative cost + token usage. **Bug caught
during /pr-test:** `propagate_attributes` only annotates an existing
OTel span, it doesn't create one — by the time the `finally` block runs,
SDK-emitted spans have ended and `get_current_trace_id()` returns None.
Fixed in `d8e08525` by wrapping the turn in
`langfuse.start_as_current_span(name="copilot-sdk-turn")`. Also tags
fallback-path events with `cost_source` so operators can distinguish
reconciled vs estimated turns.

### 3. `feat(backend/copilot): expose CLAUDE_AUTOCOMPACT_PCT_OVERRIDE as
a config knob` (`72416f73`)

The previously-hardcoded `50` is now
`claude_agent_autocompact_pct_override` (default 50, env
`CHAT_CLAUDE_AGENT_AUTOCOMPACT_PCT_OVERRIDE`). Setting to 0 omits the
env var entirely so the CLI uses its native ~93% threshold — useful when
the post-compact floor (system prompt + tool defs ≈ 65–110K) sits close
to an aggressive trigger and operators see back-to-back compaction
cascades. Moonshot routes still skip the env var unconditionally
regardless of config.

### 4. `fix(backend/copilot): align SDK retry compaction target with CLI
autocompact threshold` (`730ad256`)

`_reduce_context` was calling `compact_transcript` without an explicit
`target_tokens`, so it fell back to `get_compression_target(model) =
context_window - 60K`. For Sonnet 200K that's 140K — well above the
CLI's PCT=50 trigger of 90K — and for Kimi 256K it's 196K, above the
CLI's default 167K trigger. Result: a successful retry compaction landed
at 140K/196K and the CLI immediately re-compacted on the next call →
**two compactions per recovered turn**.

New `_compaction_target_tokens(model)` mirrors the CLI's `i6_()` formula
(`min(window * pct/100, window - 13K)`) with a 20K safety buffer so the
post-compact context sits comfortably below the CLI's trigger.

## How — empirical validation against the actual long Kimi transcript

Replayed the 199-message transcript from session 85804387 through the
bundled CLI in two configurations:

| | Post-fix (no override) | Pre-fix (`PCT_OVERRIDE=50`) |
|---|---|---|
| `autocompact: tokens=` | 126,312 | 126,341 |
| `threshold=` | **167,000** | **90,000** |
| Decision | 126K < 167K → **skip** | 126K > 90K → **COMPACTION FIRES**
|
| Duration | 21s | **166s** (8x slower) |
| Cost | $0.34 | **$0.82** (2.4x more) |
| Output | PONG (success) | empty (hit $0.50 budget cap, exit 1) |

The pre-fix configuration burned $0.82 of compaction work over 166s and
never produced a user response — exactly the failure mode reported.

**Why cascade happens at 50%, not at 93%:** post-compaction context is
`summary (~5–10K) + system_prompt + tool_definitions + skills + active
TodoWrite + memory ≈ 65–110K floor`. With trigger at 90K, post-compact
floor sits AT or above the trigger → next assistant message tips over →
immediate re-compaction → cascade until the CLI's rapid-refill breaker
trips at 3 attempts. With trigger at 167K, the same floor sits
comfortably below trigger → no cascade.

## Considered but not done

- **Force `cache_control` markers to reach Moonshot**: bundled CLI sends
them by default; Moonshot silently drops them per their own docs (uses
`X-Msh-Context-Cache` headers, not body markers). Real fix needs
bypassing OpenRouter — out of scope.
- **Slim the system prompt + tool definitions** to lower the
post-compact floor: real win but separate refactor with tool-use
accuracy A/B.
- **LD-driven auto-fallback to Sonnet on Kimi degradation**:
`claude_agent_fallback_model` already wires `--fallback-model` for
overload (529); auto-flipping on slowness needs latency aggregation
infra that doesn't exist yet.

## Test plan

- [x] `poetry run pytest backend/copilot/sdk/env_test.py
backend/copilot/sdk/openrouter_cost_test.py
backend/copilot/sdk/service_helpers_test.py` — 111 passed (37 env + 23
cost + 51 helpers, including 6 new env tests, 3 backfill tests, 6 new
compaction-target tests)
- [x] `poetry run pytest backend/copilot/sdk/` — 970+ passed
- [x] `poetry run pyright .` — 0 errors
- [x] `poetry run format` — clean
- [x] /pr-test --fix end-to-end against dev — 5/5 scenarios PASS,
including Anthropic route ($0.0174 cost +0.0% delta) and Moonshot route
($0.028 vs $0.018 → +58.2% delta validates reconcile rationale)
- [x] Transcript replay validation: pre-fix vs post-fix on real
126K-token transcript → 8x slower / 2.4x more expensive / fails entirely
on pre-fix; clean PONG on post-fix
2026-04-23 18:46:35 +07:00
Zamil Majdy
e3f6d36759 feat(backend/blocks): register 13 paid blocks + document credit/microdollar wallet boundary (#12876)
### Why / What / How

**Why.** Audit of `BLOCK_COSTS` against `credentials_store.py` system
credentials revealed **13 paid blocks** running for free from the credit
wallet's perspective — `BLOCK_COSTS.get(type(block))` returned `None`,
`cost = 0`, no `spend_credits` deduction. Users without their own API
key consumed system credentials with zero credit drain. Separately, the
credit wallet (user-facing prepaid balance) and the copilot microdollar
counter (operator-side meter that gates `daily_cost_limit_microdollars`)
were never documented as separate systems, so future readers kept
tripping on the "why isn't this block charging my limit?" question.

**What.** Three deltas, all credit-wallet-side:

- **Register the 13 paid blocks in `BLOCK_COSTS`** with reasonable
per-call credit prices (1 credit = $0.01). Pricing researched against
the providers' published rates with ~2-3x markup.
- **Document the credit/microdollar boundary** in
`copilot/rate_limit.py`: credits = user-facing prepaid wallet with
marketplace-creator charging; microdollars = operator-side meter that
only ticks on copilot LLM turns (baseline / SDK / web_search /
simulator). Block execution bills credits, not microdollars — explicit
contract.
- **Populate `provider_cost`** on PerplexityBlock so PlatformCostLog
rows carry the real OpenRouter `x-total-cost` value via the existing
`executor/cost_tracking.log_system_credential_cost` path (separate flow
from credit deduction).

### Block costs registered

| Provider | Block | Credits | Raw cost / markup |
|---|---|---|---|
| Perplexity (OpenRouter) | PerplexityBlock — Sonar | 1 | $0.001-0.005 /
call |
| | PerplexityBlock — Sonar Pro | 5 | $0.025 / call |
| | PerplexityBlock — Sonar Deep Research | 10 | up to $0.05 / call |
| Jina | FactCheckerBlock | 1 | $0.005 / call |
| Mem0 | AddMemoryBlock | 1 | $0.0004 / call (1c floor) |
| | SearchMemoryBlock | 1 | $0.004 / call |
| | GetAllMemoriesBlock | 1 | $0.004 / call |
| | GetLatestMemoryBlock | 1 | $0.004 / call |
| ScreenshotOne | ScreenshotWebPageBlock | 2 | $0.0085 / call (2.4x) |
| Nvidia | NvidiaDeepfakeDetectBlock | 2 | est $0.005 (no public SKU) |
| Smartlead | CreateCampaignBlock | 2 | $0.0065 send-equivalent (3x) |
| | AddLeadToCampaignBlock | 1 | $0.0065 (1.5x) |
| | SaveCampaignSequencesBlock | 1 | config-only |
| ZeroBounce | ValidateEmailsBlock | 2 | $0.008 / email (2.5x) |
| E2B + Anthropic | ClaudeCodeBlock | **100** | $0.50-$2 / typical
session (E2B sandbox + in-sandbox Claude) |

**Not in scope** — already covered via the SDK
`ProviderBuilder.with_base_cost()` pattern in their respective
`_config.py`: Exa, Linear, Airtable, Bannerbear, Wolfram, Firecrawl,
Wordpress, Baas, Stagehand, Dataforseo.

### How

1. `backend/data/block_cost_config.py` — 13 new `BlockCost` entries (3
Perplexity models + Fact Checker + 11 from this round).
2. `backend/copilot/rate_limit.py` — boundary docstring.
3. `backend/blocks/perplexity.py` — populate
`NodeExecutionStats.provider_cost` so PlatformCostLog rows carry the
real OpenRouter `x-total-cost` value.
4. Tests — `TestUnregisteredBlockRunsFree` regression +
`TestNewlyRegisteredBlockCosts` pinning every new entry by `cost_amount`
so a future refactor can't quietly drop one.

The companion Notion "Platform System Credentials" database has been
updated with a new `Platform Credit Cost` column populated across all 30
provider rows.

### Scope trim

An earlier revision piped block execution cost into the **copilot
microdollar counter** via `_record_block_microdollar_cost` in
`copilot/tools/helpers.py::execute_block`. That was reverted in
`16ae0f7b5` — the microdollar counter stays scoped to copilot LLM turns
only, credit wallet handles block execution. The pipe-through crossed a
boundary we explicitly want to keep.

### Changes

- `backend/data/block_cost_config.py` — 13 × `BlockCost` entries across
7 providers.
- `backend/blocks/perplexity.py` — populate `provider_cost` on the
execution stats (feeds PlatformCostLog).
- `backend/copilot/rate_limit.py` — boundary docstring only (no
behaviour change).
- `backend/copilot/tools/helpers_test.py` —
`TestUnregisteredBlockRunsFree` + `TestNewlyRegisteredBlockCosts` (8 new
regression tests).
- `backend/blocks/block_cost_tracking_test.py` — provider-cost
extraction pins.

### Checklist

For code changes:
- [x] Changes listed above
- [x] Test plan below
- [x] Tested according to the test plan:
- [x] `poetry run pytest backend/copilot/tools/helpers_test.py
backend/copilot/tools/run_block_test.py
backend/copilot/tools/continue_run_block_test.py
backend/blocks/block_cost_tracking_test.py
backend/blocks/test/test_perplexity.py` — passes
- [x] `poetry run pytest backend/executor/manager_cost_tracking_test.py
backend/copilot/rate_limit_test.py
backend/copilot/token_tracking_test.py` — passes (confirms docstring
edits didn't regress the LLM-turn microdollar path)
  - [x] Pyright clean on all touched files
- [ ] Manual: run PerplexityBlock via copilot `run_block` — credits
deduct, PlatformCostLog row visible with `provider_cost`, no
microdollar-counter tick.
- [ ] Manual: run an unregistered block via copilot — no error, no
credit drain, no silent billing.
- [ ] Manual: run ClaudeCodeBlock via builder — 100 credits deducted
from wallet.

### Companion PR

PR #12873 ships the copilot microdollar / rate-limit work (web_search
cost, simulator cost, reasoning / reconnect fixes). This PR is
credit-wallet only.
2026-04-22 12:03:02 +07:00
Zamil Majdy
fcaebd1bb7 refactor(backend/copilot): unified queue-backed copilot turns + async sub-AutoPilot + guide-read gate (#12841)
### Why / What / How

**Why:** the 10-min stream-level idle timeout was killing legitimate
long-running tool calls — notably sub-AutoPilot runs via
`run_block(AutoPilotBlock)`, which routinely take 15–45 min. The symptom
users saw was `"A tool call appears to be stuck"` even though AutoPilot
was actively working. A second long-standing rough edge was shipped
alongside: agents often skipped `get_agent_building_guide` when
generating agent JSON, producing schemas that failed validation and
burned turns on auto-fix loops.

**What:** three threaded pieces.

1. **Async sub-AutoPilot via `run_sub_session`.** New copilot tool that
delegates a task to a fresh (or resumed) sub-AutoPilot, and its
companion `get_sub_session_result` for polling/cancelling. The agent
starts with `run_sub_session(prompt, wait_for_result≤300s)` and, if the
sub isn't done inside the cap, receives a handle + polls via
`get_sub_session_result(wait_if_running≤300s)`. No single MCP call ever
blocks the stream for more than 5 min, so the 10-min stream-idle timer
stays simple and effective (derived as `MAX_TOOL_WAIT_SECONDS * 2`).

2. **Queue-backed copilot turn dispatch** — one code path for all three
callers.
- `run_sub_session` enqueues a `CoPilotExecutionEntry` on the existing
`copilot_execution` exchange instead of spawning an in-process
`asyncio.Task`.
- `AutoPilotBlock.execute_copilot` (graph block) now uses the **same
queue** instead of `collect_copilot_response` inline.
   - The HTTP SSE endpoint was already queue-backed.
- All three share a single primitive: `run_copilot_turn_via_queue` →
`create_session` → `enqueue_copilot_turn` → `wait_for_session_result`.
The event-aggregation logic (`EventAccumulator`/`process_event`) is a
shared module used by both the direct-stream path and the cross-process
waiter.
- Benefits: **deploy/crash resilience** (RabbitMQ redelivery survives
worker restarts), **natural load balancing** across copilot_executor
workers, **sessions as first-class resources** (UI users can
`/copilot?sessionId=<inner>` into any sub or AutoPilot block's session),
and every future stream-level feature (pending-messages drain #12737,
compaction policies, etc.) applies uniformly instead of bypassing
graph-block sessions.

3. **Guide-read gate on agent-generation tools.** `create_agent` /
`edit_agent` / `validate_agent_graph` / `fix_agent_graph` refuse until
the session has called `get_agent_building_guide`. The pre-existing soft
hint was routinely ignored; the gate makes the dependency enforceable.
All four tool descriptions advertise the requirement in one tightened
sentence ("Requires get_agent_building_guide first (refuses
otherwise).") that stays under the 32000-char schema budget.

**How:**

#### Queue-backed sub-AutoPilot + AutoPilotBlock

- `sdk/session_waiter.py` — new module. `SessionResult` dataclass
mirrors `CopilotResult`. `wait_for_session_result` subscribes to
`stream_registry`, drains events via shared `process_event`, returns
`(outcome, result)`. `wait_for_session_completion` is the cheaper
outcome-only variant. `run_copilot_turn_via_queue` is the canonical
three-step dispatch. Every exit path unsubscribes the listener.
- `sdk/stream_accumulator.py` — new module. `EventAccumulator`,
`ToolCallEntry`, `process_event` extracted from `collect.py`. Both the
direct-stream and cross-process paths now use the same fold logic.
- `tools/run_sub_session.py` / `tools/get_sub_session_result.py` —
rewritten around the shared primitive. `sub_session_id` is now the sub's
`ChatSession` id directly (no separate registry handle). Ownership
re-verified on every call via `get_chat_session`. Cancel via
`enqueue_cancel_task` on the existing `copilot_cancel` fan-out exchange.
- `blocks/autopilot.py` — `execute_copilot` replaced its inline
`collect_copilot_response` with `run_copilot_turn_via_queue`.
`SessionResult` carries response text, tool calls, and token usage back
from the worker so no DB round-trip is needed. The block's public I/O
contract (inputs, outputs, `ToolCallEntry` shape) is unchanged.
- `CoPilotExecutionEntry` gains a `permissions: CopilotPermissions |
None` field forwarded to the worker's `stream_fn` so the sub's
capability filter survives the queue hop. The processor passes it
through to `stream_chat_completion_sdk` /
`stream_chat_completion_baseline`.
- **Deleted**: `sdk/sub_session_registry.py` (module-level dict,
done-callback, abandoned-task cap, `notify_shutdown_and_cancel_all`,
`_reset_for_test`), plus the shutdown-notifier hook in
`copilot_executor.processor.cleanup` — redundant under queue-backed
execution.

#### Run_block single-tool cap (3)

- `tools/helpers.execute_block` caps block execution at
`MAX_TOOL_WAIT_SECONDS = 5 min` via `asyncio.wait_for` around the
generator consumption.
- On timeout: logs `copilot_tool_timeout tool=run_block block=…
block_id=… input_keys=… user=… session=… cap_s=…` (grep-friendly) and
returns an `ErrorResponse` that redirects the LLM to `run_agent` /
`run_sub_session`.
- Billing protection: `_charge_block_credits` is called in a `finally`
guarded by `asyncio.shield` and marked `charge_handled` **before** the
await so cancel-mid-charge doesn't double-bill and
cancel-mid-generator-before-charge still settles via the finally.

#### Guide-read gate

- `helpers.require_guide_read(session, tool_name)` scans
`session.messages` for any prior assistant tool call named
`get_agent_building_guide` (handles both OpenAI and flat shapes).
Applied at the top of `_execute` in `create_agent`, `edit_agent`,
`validate_agent_graph`, `fix_agent_graph`. Tool descriptions advertise
the requirement.

#### Shared timing constants

- `MAX_TOOL_WAIT_SECONDS = 5 * 60` + `STREAM_IDLE_TIMEOUT_SECONDS = 2 *
MAX_TOOL_WAIT_SECONDS` in `constants.py`. Every long-running tool
(`run_agent`, `view_agent_output`, `run_sub_session`,
`get_sub_session_result`, `run_block`) imports from one place; no more
hardcoded 300 / `10*60` literals drifting apart. Stream-idle invariant
("no single tool blocks close to the idle timeout") holds by
construction.

### Frontend

- Friendlier tool-card labels: `run_sub_session` → "Sub-AutoPilot",
`get_sub_session_result` → "Sub-AutoPilot result", `run_block` →
"Action" (matches the builder UI's own naming), `run_agent` → "Agent".
Fixes the double-verb "Running Run …" phrasing.
- `SubSessionStatusResponse.sub_autopilot_session_link` surfaces
`/copilot?sessionId=<inner>` so users can click into any sub's session
from the tool-call card — same pattern as `run_agent`'s
`library_agent_link`.

### Changes 🏗️

- **New modules**: `sdk/session_waiter.py`, `sdk/stream_accumulator.py`,
`tools/run_sub_session.py`, `tools/get_sub_session_result.py`,
`tools/sub_session_test.py`, `tools/agent_guide_gate_test.py`.
- **New response types**: `SubSessionStatusResponse`,
`SubSessionProgressSnapshot`, `SessionResult`.
- **New gate helper**: `require_guide_read` in `tools/helpers.py`.
- **Queue protocol**: `permissions` field on `CoPilotExecutionEntry`,
threaded through `processor.py` → `stream_fn`.
- **Hidden**: `AUTOPILOT_BLOCK_ID` in `COPILOT_EXCLUDED_BLOCK_IDS`
(run_block can't execute AutoPilotBlock; agents use `run_sub_session`
instead).
- **Deleted**: `sdk/sub_session_registry.py`, processor
shutdown-notifier hook.
- **Regenerated**: `openapi.json` for the new response types; block-docs
for the updated `ToolName` Literal.
- **Tool descriptions**: tightened the guide-gate hint across the four
agent-builder tools to stay under the 32000-char schema budget.
- **40+ tests** across sub_session, execute_block cap + billing races,
stream_accumulator, agent_guide_gate, frontend helpers.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Unit suite green on the full copilot tree; `poetry run format` +
`pyright` clean
- [x] Schema character budget test passes (tool descriptions trimmed to
stay under 32000)
- [x] Native UI E2E (`poetry run app` + `pnpm dev`):
`run_sub_session(wait_for_result=60)` returns `status="completed"` +
`sub_autopilot_session_link` inline;
`run_sub_session(wait_for_result=1)` returns `status="running"` +
handle, `get_sub_session_result(wait_if_running=60)` observes `running →
completed` transition
- [x] AutoPilotBlock (graph) goes through `copilot_executor` queue
end-to-end (verified via logs: ExecutionManager's AutoPilotBlock node
spawned session `f6de335b-…`, a different `CoPilotExecutor` worker
acquired its cluster lock and ran the SDK stream)
- [x] Guide gate: `create_agent` without a prior
`get_agent_building_guide` returns the refusal; agent reads the guide
and retries successfully
2026-04-18 23:11:41 +07:00
slepybear
334ec18c31 docs: convert in-code comments to MkDocs admonitions in block-sdk-gui… (#12819)
### Why / What / How

<!-- Why: Why does this PR exist? What problem does it solve, or what's
broken/missing without it? -->
This PR converts inline Python comments in code examples within
`block-sdk-guide.md` into MkDocs `!!! note` admonitions. This makes code
examples cleaner and more copy-paste friendly while preserving all
explanatory content.

<!-- What: What does this PR change? Summarize the changes at a high
level. -->
Converts inline comments in code blocks to admonitions following the
pattern established in PR #12396 (new_blocks.md) and PR #12313.

<!-- How: How does it work? Describe the approach, key implementation
details, or architecture decisions. -->
- Wrapped code examples with `!!! note` admonitions
- Removed inline comments from code blocks for clean copy-paste
- Added explanatory admonitions after each code block

### Changes 🏗️

- Provider configuration examples (API key and OAuth)
- Block class Input/Output schema annotations
- Block initialization parameters
- Test configuration
- OAuth and webhook handler implementations
- Authentication types and file handling patterns

### Checklist 📋

#### For documentation changes:
- [x] Follows the admonition pattern from PR #12396
- [x] No code changes, documentation only
- [x] Admonition syntax verified correct

#### For configuration changes:
- [ ] `.env.default` is updated or already compatible with my changes
- [ ] `docker-compose.yml` is updated or already compatible with my
changes

---

**Related Issues**: Closes #8946

Co-authored-by: slepybear <slepybear@users.noreply.github.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-04-17 07:47:52 +00:00
Toran Bruce Richards
f410929560 feat(platform): Add xAI Grok 4.20 models from OpenRouter (#12620)
Requested by @Torantulino

Adds the 2 xAI Grok 4.20 models available on OpenRouter that are missing
from the platform.

## Why

`x-ai/grok-4.20` and `x-ai/grok-4.20-multi-agent` are xAI's current
flagship models (released March 2026) and are available via OpenRouter,
but weren't accessible from the platform's LLM blocks.

## Changes

**`autogpt_platform/backend/backend/blocks/llm.py`**
- Added `GROK_4_20` and `GROK_4_20_MULTI_AGENT` enum members
- Added corresponding `MODEL_METADATA` entries (open_router provider, 2M
context window, price tier 3)

**`autogpt_platform/backend/backend/data/block_cost_config.py`**
- Added `MODEL_COST` entries at 5 credits each (flagship tier, $2/M in)

**`docs/integrations/block-integrations/llm.md`**
- Added new model IDs to all LLM block tables

| Model | Pricing | Context |
|-------|---------|---------|
| `x-ai/grok-4.20` | $2/M in, $6/M out | 2M |
| `x-ai/grok-4.20-multi-agent` | $2/M in, $6/M out | 2M |

Both models use the standard OpenRouter chat completions API — no
special handling needed.

Resolves: SECRT-2196

---------

Co-authored-by: Torantulino <22963551+Torantulino@users.noreply.github.com>
Co-authored-by: Toran Bruce Richards <Torantulino@users.noreply.github.com>
Co-authored-by: Otto (AGPT) <otto@agpt.co>
2026-04-16 12:14:56 +00:00
Zamil Majdy
77d8362983 docs(blocks): sync misc.md with memory_search/memory_store tools from dev merge 2026-04-09 23:15:02 +07:00
Toran Bruce Richards
f6ddcbc6cb feat(platform): Add all 12 Z.ai GLM models via OpenRouter (#12672)
## Summary

Add Z.ai (Zhipu AI) GLM model family to the platform LLM blocks, routed
through OpenRouter. This enables users to select any of the 12 Z.ai
models across all LLM-powered blocks (AI Text Generator, AI
Conversation, AI Structured Response, AI Text Summarizer, AI List
Generator).

## Gap Analysis

All 12 Z.ai models currently available on OpenRouter's API were missing
from the AutoGPT platform:

| Model | Context Window | Max Output | Price Tier | Cost |
|-------|---------------|------------|------------|------|
| GLM 4 32B | 128K | N/A | Tier 1 | 1 |
| GLM 4.5 | 131K | 98K | Tier 2 | 2 |
| GLM 4.5 Air | 131K | 98K | Tier 1 | 1 |
| GLM 4.5 Air (Free) | 131K | 96K | Tier 1 | 1 |
| GLM 4.5V (vision) | 65K | 16K | Tier 2 | 2 |
| GLM 4.6 | 204K | 204K | Tier 1 | 1 |
| GLM 4.6V (vision) | 131K | 131K | Tier 1 | 1 |
| GLM 4.7 | 202K | 65K | Tier 1 | 1 |
| GLM 4.7 Flash | 202K | N/A | Tier 1 | 1 |
| GLM 5 | 80K | 131K | Tier 2 | 2 |
| GLM 5 Turbo | 202K | 131K | Tier 3 | 4 |
| GLM 5V Turbo (vision) | 202K | 131K | Tier 3 | 4 |

## Changes

- **`autogpt_platform/backend/backend/blocks/llm.py`**: Added 12
`LlmModel` enum entries and corresponding `MODEL_METADATA` with context
windows, max output tokens, display names, and price tiers sourced from
OpenRouter API
- **`autogpt_platform/backend/backend/data/block_cost_config.py`**:
Added `MODEL_COST` entries for all 12 models, with costs scaled to match
pricing (1 for budget, 2 for mid-range, 4 for premium)

## How it works

All Z.ai models route through the existing OpenRouter provider
(`open_router`) — no new provider or API client code needed. Users with
an OpenRouter API key can immediately select any Z.ai model from the
model dropdown in any LLM block.

## Related

- Linear: REQ-83

---------

Co-authored-by: AutoGPT CoPilot <copilot@agpt.co>
2026-04-03 15:48:33 +00:00
Zamil Majdy
fff101e037 feat(backend): add SQL query block with multi-database support for CoPilot analytics (#12569)
## Summary
- Add a read-only SQL query block for CoPilot/AutoPilot analytics access
- Supports **multiple databases**: PostgreSQL, MySQL, SQLite, MSSQL via
SQLAlchemy
- Enforces read-only queries (SELECT only) with defense-in-depth SQL
validation using sqlparse
- SSRF protection: blocks connections to private/internal IPs
- Credentials stored securely via the platform credential system

## Changes
- New `SQLQueryBlock` in `backend/blocks/sql_query_block.py` with
`DatabaseType` enum
- SQLAlchemy-based execution with dialect-specific read-only and timeout
settings
- Connection URL validation ensuring driver matches selected database
type
- Comprehensive test suite (62 tests) including URL validation,
sanitization, serialization
- Documentation in `docs/integrations/block-integrations/data.md`
- Added `DATABASE` provider to `ProviderName` enum

### Checklist 📋
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan

#### Test plan:
- [x] Unit tests pass for query validation, URL validation, error
sanitization, value serialization
- [x] Read-only enforcement rejects INSERT/UPDATE/DELETE/DROP
- [x] Multi-statement injection blocked
- [x] SSRF protection blocks private IPs
- [x] Connection URL driver validation works for all 4 database types

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 06:43:40 +00:00
Zamil Majdy
f1ac05b2e0 fix(backend): propagate dry-run mode to special blocks with LLM-powered simulation (#12575)
## Summary
- **OrchestratorBlock & AgentExecutorBlock** now execute for real in
dry-run mode so the orchestrator can make LLM calls and agent executors
can spawn child graphs. Their downstream tool blocks and child-graph
blocks are still simulated via `simulate_block()`. Credential fields
from node defaults are restored since `validate_exec()` wipes them in
dry-run mode. Agent-mode iterations capped at 1 in dry-run.
- **All blocks** (including MCPToolBlock) are simulated via a single
generic `simulate_block()` path. The LLM prompt is grounded by
`inspect.getsource(block.run)`, giving the simulator access to the exact
implementation of each block's `run()` method. This produces realistic
mock responses for any block type without needing block-specific
simulation logic.
- Updated agent generation guide to document special block dry-run
behavior.
- Minor frontend fixes: exported `formatCents` from
`RateLimitResetDialog` for reuse in `UsagePanelContent`, used `useRef`
for stable callback references in `useResetRateLimit` to avoid stale
closures.
- 74 tests (21 existing dry-run + 53 new simulator tests covering prompt
building, passthrough logic, and special block dry-run).

## Design

The simulator (`backend/executor/simulator.py`) uses a two-tier
approach:

1. **Passthrough blocks** (OrchestratorBlock, AgentExecutorBlock):
`prepare_dry_run()` returns modified input_data so these blocks execute
for real in `manager.py`. OrchestratorBlock gets `max_iterations=1`
(agent mode) or 0 (traditional mode). AgentExecutorBlock spawns real
child graph executions whose blocks inherit `dry_run=True`.

2. **All other blocks**: `simulate_block()` builds an LLM prompt
containing:
   - Block name and description
   - Input/output schemas (JSON Schema)
   - The block's `run()` source code via `inspect.getsource(block.run)`
- The actual input values (with credentials stripped and long values
truncated)

The LLM then role-plays the block's execution, producing realistic
outputs grounded in the actual implementation.

Special handling for input/output blocks: `AgentInputBlock` and
`AgentOutputBlock` are pure passthrough (no LLM call needed).

## Test plan
- [x] All 74 tests pass (`pytest backend/copilot/tools/test_dry_run.py
backend/executor/simulator_test.py`)
- [x] Pre-commit hooks pass (ruff, isort, black, pyright, frontend
typecheck)
- [x] CI: all checks green
- [x] E2E: dry-run execution completes with `is_dry_run=true`, cost=0,
no errors
- [x] E2E: normal (non-dry-run) execution unchanged
- [x] E2E: Create agent with OrchestratorBlock + tool blocks, run with
`dry_run=True`, verify orchestrator makes real LLM calls while tool
blocks are simulated
- [x] E2E: AgentExecutorBlock spawns child graph in dry-run, child
blocks are LLM-simulated
- [x] E2E: Builder simulate button works end-to-end with special blocks

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 17:09:55 +00:00
Toran Bruce Richards
11b846dd49 fix(blocks): rename placeholder_values to options on AgentDropdownInputBlock (#12595)
## Summary

Resolves [REQ-78](https://linear.app/autogpt/issue/REQ-78): The
`placeholder_values` field on `AgentDropdownInputBlock` is misleadingly
named. In every major UI framework "placeholder" means non-binding hint
text that disappears on focus, but this field actually creates a
dropdown selector that restricts the user to only those values.

## Changes

### Core rename (`autogpt_platform/backend/backend/blocks/io.py`)
- Renamed `placeholder_values` → `options` on
`AgentDropdownInputBlock.Input`
- Added clear field description: *"If provided, renders the input as a
dropdown selector restricted to these values. Leave empty for free-text
input."*
- Updated class docstring to describe actual behavior
- Overrode `model_construct()` to remap legacy `placeholder_values` →
`options` for **backward compatibility** with existing persisted agent
JSON

### Tests (`autogpt_platform/backend/backend/blocks/test/test_block.py`)
- Updated existing tests to use canonical `options` field name
- Added 2 new backward-compat tests verifying legacy
`placeholder_values` still works through both `model_construct()` and
`Graph._generate_schema()` paths

### Documentation
- Updated
`autogpt_platform/backend/backend/copilot/sdk/agent_generation_guide.md`
— changed field name in CoPilot SDK guide
- Updated `docs/integrations/block-integrations/basic.md` — changed
field name and description in public docs

### Load tests
(`autogpt_platform/backend/load-tests/tests/api/graph-execution-test.js`)
- Removed spurious `placeholder_values: {}` from AgentInputBlock node
(this field never existed on AgentInputBlock)
- Fixed execution input to use `value` instead of `placeholder_values`

## Backward Compatibility

Existing agents with `placeholder_values` in their persisted
`input_default` JSON will continue to work — the `model_construct()`
override transparently remaps the old key to `options`. No database
migration needed since the field is stored inside a JSON blob, not as a
dedicated column.

## Testing

- All existing tests updated and passing
- 2 new backward-compat tests added
- No frontend changes needed (frontend reads `enum` from generated JSON
Schema, not the field name directly)

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-04-02 05:56:17 +00:00
Nicholas Tindle
88589764b5 dx(platform): normalize agent instructions for Claude and Codex (#12592)
### Why / What / How

Why: repo guidance was split between Claude-specific `CLAUDE.md` files
and Codex-specific `AGENTS.md` files, which duplicated instruction
content and made the same repository behave differently across agents.
The repo also had Claude skills under `.claude/skills` but no
Codex-visible repo skill path.

What: this PR bridges the repo's Claude skills into Codex and normalizes
shared instruction files so `AGENTS.md` becomes the canonical source
while each `CLAUDE.md` imports its sibling `AGENTS.md`.

How: add a repo-local `.agents/skills` symlink pointing to
`../.claude/skills`; move nested `CLAUDE.md` content into sibling
`AGENTS.md` files; replace each repo `CLAUDE.md` with a one-line
`@AGENTS.md` shim so Claude and Codex read the same scoped guidance
without duplicating text. The root `CLAUDE.md` now imports the root
`AGENTS.md` rather than symlinking to it.

Note: the instruction-file normalization commit was created with
`--no-verify` because the repo's frontend pre-commit `tsc` hook
currently fails on unrelated existing errors, largely missing
`autogpt_platform/frontend/src/app/api/__generated__/*` modules.

### Changes 🏗️

- Add `.agents/skills` as a repo-local symlink to `../.claude/skills` so
Codex discovers the existing Claude repo skills.
- Add a real root `CLAUDE.md` shim that imports the canonical root
`AGENTS.md`.
- Promote nested scoped instruction content into sibling `AGENTS.md`
files under `autogpt_platform/`, `autogpt_platform/backend/`,
`autogpt_platform/frontend/`, `autogpt_platform/frontend/src/tests/`,
and `docs/`.
- Replace the corresponding nested `CLAUDE.md` files with one-line
`@AGENTS.md` shims.
- Preserve the existing scoped instruction hierarchy while making the
shared content cross-compatible between Claude and Codex.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified `.agents/skills` resolves to `../.claude/skills`
  - [x] Verified each repo `CLAUDE.md` now contains only `@AGENTS.md`
- [x] Verified the expected `AGENTS.md` files exist at the root and
nested scoped directories
- [x] Verified the branch contains only the intended agent-guidance
commits relative to `dev` and the working tree is clean

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

No runtime configuration changes are included in this PR.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk: documentation/instruction-file reshuffle plus an
`.agents/skills` pointer; no runtime code paths are modified.
> 
> **Overview**
> Unifies agent guidance so **`AGENTS.md` becomes canonical** and all
corresponding `CLAUDE.md` files become 1-line shims (`@AGENTS.md`) at
the repo root, `autogpt_platform/`, backend, frontend, frontend tests,
and `docs/`.
> 
> Adds `.agents/skills` pointing to `../.claude/skills` so non-Claude
agents discover the same shared skills/instructions, eliminating
duplicated/agent-specific guidance content.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
839483c3b6. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
2026-04-01 09:08:51 +00:00
Zamil Majdy
3e25488b2d feat(copilot): add session-level dry_run flag to autopilot sessions (#12582)
## Summary
- Adds a session-level `dry_run` flag that forces ALL tool calls
(`run_block`, `run_agent`) in a copilot/autopilot session to use dry-run
simulation mode
- Stores the flag in a typed `ChatSessionMetadata` JSON model on the
`ChatSession` DB row, accessed via `session.dry_run` property
- Adds `dry_run` to the AutoPilot block Input schema so graph builders
can create dry-run autopilot nodes
- Refactors multiple copilot tools from `**kwargs` to explicit
parameters for type safety

## Changes
- **Prisma schema**: Added `metadata` JSON column to `ChatSession` model
with migration
- **Python models**: Added `ChatSessionMetadata` model with `dry_run`
field, added `metadata` field to `ChatSessionInfo` and `ChatSession`,
updated `from_db()`, `new()`, and `create_chat_session()`
- **Session propagation**: `set_execution_context(user_id, session)`
called from `baseline/service.py` so tool handlers can read
session-level flags via `session.dry_run`
- **Tool enforcement**: `run_block` and `run_agent` check
`session.dry_run` and force `dry_run=True` when set; `run_agent` blocks
scheduling in dry-run sessions
- **AutoPilot block**: Added `dry_run` input field, passes it when
creating sessions
- **Chat API**: Added `CreateSessionRequest` model with `dry_run` field
to `POST /sessions` endpoint; added `metadata` to session responses
- **Frontend**: Updated `useChatSession.ts` to pass body to the create
session mutation
- **Tool refactoring**: Multiple copilot tools refactored from
`**kwargs` to explicit named parameters (agent_browser, manage_folders,
workspace_files, connect_integration, agent_output, bash_exec, etc.) for
better type safety

## Test plan
- [x] Unit tests for `ChatSession.new()` with dry_run parameter
- [x] Unit tests for `RunBlockTool` session dry_run override
- [x] Unit tests for `RunAgentTool` session dry_run override
- [x] Unit tests for session dry_run blocks scheduling
- [x] Existing dry_run tests still pass (12/12)
- [x] Existing permissions tests still pass
- [x] All pre-commit hooks pass (ruff, isort, pyright, tsc)
- [ ] Manual: Create autopilot session with `dry_run=True`, verify
run_block/run_agent calls use simulation

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 16:27:36 +00:00
Zamil Majdy
37d9863552 feat(platform): add extended thinking execution mode to OrchestratorBlock (#12512)
## Summary
- Adds `ExecutionMode` enum with `BUILT_IN` (default built-in tool-call
loop) and `EXTENDED_THINKING` (delegates to Claude Agent SDK for richer
reasoning)
- Extracts shared `tool_call_loop` into `backend/util/tool_call_loop.py`
— reusable by both OrchestratorBlock agent mode and copilot baseline
- Refactors copilot baseline to use the shared `tool_call_loop` with
callback-driven iteration

## ExecutionMode enum
`ExecutionMode` (`backend/blocks/orchestrator.py`) controls how
OrchestratorBlock executes tool calls:
- **`BUILT_IN`** — Default mode. Runs the built-in tool-call loop
(supports all LLM providers).
- **`EXTENDED_THINKING`** — Delegates to the Claude Agent SDK for
extended thinking and multi-step planning. Requires Anthropic-compatible
providers (`anthropic` / `open_router`) and direct API credentials
(subscription mode not supported). Validates both provider and model
name at runtime.

## Shared tool_call_loop
`backend/util/tool_call_loop.py` provides a generic, provider-agnostic
conversation loop:
1. Call LLM with tools → 2. Extract tool calls → 3. Execute tools → 4.
Update conversation → 5. Repeat

Callers provide three callbacks:
- `llm_call`: wraps any LLM provider (OpenAI streaming, Anthropic,
llm.llm_call, etc.)
- `execute_tool`: wraps any tool execution (TOOL_REGISTRY, graph block
execution, etc.)
- `update_conversation`: formats messages for the specific protocol

## OrchestratorBlock EXTENDED_THINKING mode
- `_create_graph_mcp_server()` converts graph-connected blocks to MCP
tools
- `_execute_tools_sdk_mode()` runs `ClaudeSDKClient` with those MCP
tools
- Agent mode refactored to use shared `tool_call_loop`

## Copilot baseline refactored
- Streaming callbacks buffer `Stream*` events during loop execution
- Events are drained after `tool_call_loop` returns
- Same conversation logic, less code duplication

## SDK environment builder extraction
- `build_sdk_env()` extracted to `backend/copilot/sdk/env.py` for reuse
by both copilot SDK service and OrchestratorBlock

## Provider validation
EXTENDED_THINKING mode validates `provider in ('anthropic',
'open_router')` and `model_name.startswith('claude')` because the Claude
Agent SDK requires an Anthropic API key or OpenRouter key. Subscription
mode is not supported — it uses the platform's internal credit system
which doesn't provide raw API keys needed by the SDK. The validation
raises a clear `ValueError` if an unsupported provider or model is used.

## PR Dependencies
This PR builds on #12511 (Claude SDK client). It can be reviewed
independently — #12511 only adds the SDK client module which this PR
imports. If #12511 merges first, this PR will have no conflicts.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] All pre-commit hooks pass (typecheck, lint, format)
  - [x] Existing OrchestratorBlock tests still pass
- [x] Copilot baseline behavior unchanged (same stream events, same tool
execution)
- [x] Manual: OrchestratorBlock with execution_mode=EXTENDED_THINKING +
downstream blocks → SDK calls tools
  - [x] Agent mode regression test (non-SDK path works as before)
  - [x] SDK mode error handling (invalid provider raises ValueError)
2026-03-31 20:04:13 +07:00
Zamil Majdy
f79d8f0449 fix(backend): move placeholder_values exclusively to AgentDropdownInputBlock (#12551)
## Why

`AgentInputBlock` has a `placeholder_values` field whose
`generate_schema()` converts it into a JSON schema `enum`. The frontend
renders any field with `enum` as a dropdown/select. This means
AI-generated agents that populate `placeholder_values` with example
values (e.g. URLs) on regular `AgentInputBlock` nodes end up with
dropdowns instead of free-text inputs — users can't type custom values.

Only `AgentDropdownInputBlock` should produce dropdown behavior.

## What

- Removed `placeholder_values` field from `AgentInputBlock.Input`
- Moved the `enum` generation logic to
`AgentDropdownInputBlock.Input.generate_schema()`
- Cleaned up test data for non-dropdown input blocks
- Updated copilot agent generation guide to stop suggesting
`placeholder_values` for `AgentInputBlock`

## How

The base `AgentInputBlock.Input.generate_schema()` no longer converts
`placeholder_values` → `enum`. Only `AgentDropdownInputBlock.Input`
defines `placeholder_values` and overrides `generate_schema()` to
produce the `enum`.

**Backward compatibility**: Existing agents with `placeholder_values` on
`AgentInputBlock` nodes load fine — `model_construct()` silently ignores
extra fields not defined on the model. Those inputs will now render as
text fields (desired behavior).

## Test plan
- [x] `poetry run pytest backend/blocks/test/test_block.py -xvs` — all
block tests pass
- [x] `poetry run format && poetry run lint` — clean
- [ ] Import an agent JSON with `placeholder_values` on an
`AgentInputBlock` — verify it loads and renders as text input
- [ ] Create an agent with `AgentDropdownInputBlock` — verify dropdown
still works
2026-03-26 08:09:38 +00:00
An Vy Le
f871717f68 fix(backend): add sink input validation to AgentValidator (#12514)
## Summary

- Added `validate_sink_input_existence` method to `AgentValidator` to
ensure all sink names in links and input defaults reference valid input
schema fields in the corresponding block
- Added comprehensive tests covering valid/invalid sink names, nested
inputs, and default key handling
- Updated `ReadDiscordMessagesBlock` description to clarify it reads new
messages and triggers on new posts
- Removed leftover test function file

## Test plan

- [ ] Run `pytest` on `validator_test.py` to verify all sink input
validation cases pass
- [ ] Verify existing agent validation flow is unaffected
- [ ] Confirm `ReadDiscordMessagesBlock` description update is accurate

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-25 16:08:17 +00:00
Zamil Majdy
80bfd64ffa Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev 2026-03-24 21:18:11 +07:00
Zamil Majdy
0076ad2a1a hotfix(blocks): bump stagehand ^0.5.1 → ^3.4.0 to fix yanked litellm (#12539)
## Summary

**Critical CI fix** — litellm was compromised in a supply chain attack
(versions 1.82.7/1.82.8 contained infostealer malware) and PyPI
subsequently yanked many litellm versions including the 1.7x range that
stagehand 0.5.x depended on. This breaks `poetry lock` in CI for all
PRs.

- Bump `stagehand` from `^0.5.1` to `^3.4.0` — Stagehand v3 is a
Stainless-generated HTTP API client that **no longer depends on
litellm**, completely removing litellm from our dependency tree
- Migrate stagehand blocks to use `AsyncStagehand` + session-based API
(`sessions.start`, `session.navigate/act/observe/extract`)
- Net reduction of ~430 lines in `poetry.lock` from dropping litellm and
its transitive dependencies

## Why

All CI pipelines are blocked because `poetry lock` fails to resolve
yanked litellm versions that stagehand 0.5.x required.

## Test plan

- [x] CI passes (poetry lock resolves, backend tests green)
- [ ] Verify stagehand blocks still function with the new session-based
API
2026-03-24 21:17:19 +07:00
Zamil Majdy
9381057079 refactor(platform): rename SmartDecisionMakerBlock to OrchestratorBlock (#12511)
## Summary
- Renames `SmartDecisionMakerBlock` to `OrchestratorBlock` across the
entire codebase
- The block supports iteration/agent mode and general tool
orchestration, so "Smart Decision Maker" no longer accurately describes
its capabilities
- Block UUID (`3b191d9f-356f-482d-8238-ba04b6d18381`) remains unchanged
— fully backward compatible with existing graphs

## Changes
- Renamed block class, constants, file names, test files, docs, and
frontend enum
- Updated copilot agent generator (helpers, validator, fixer) references
- Updated agent generation guide documentation
- No functional changes — pure rename refactor

### For code changes
- [x] I have clearly listed my changes in the PR description
- [x] I have made corresponding changes to the documentation
- [x] My changes do not generate new warnings or errors
- [x] New and existing unit tests pass locally with my changes

## Test plan
- [x] All pre-commit hooks pass (typecheck, lint, format)
- [x] Existing graphs with this block continue to load and execute (same
UUID)
- [x] Agent mode / iteration mode works as before
- [x] Copilot agent generator correctly references the renamed block
2026-03-24 19:16:42 +07:00
Zamil Majdy
ee5382a064 feat(copilot): add tool/block capability filtering to AutoPilotBlock (#12482)
## Summary

- Adds `CopilotPermissions` model (`copilot/permissions.py`) — a
capability filter that restricts which tools and blocks the
AutoPilot/Copilot may use during a single execution
- Exposes 4 new `advanced=True` fields on `AutoPilotBlock`: `tools`,
`tools_exclude`, `blocks`, `blocks_exclude`
- Threads permissions through the full execution path: `AutoPilotBlock`
→ `collect_copilot_response` → `stream_chat_completion_sdk` →
`run_block`
- Implements recursion inheritance via contextvar: sub-agent executions
can only be *more* restrictive than their parent

## Design

**Tool filtering** (`tools` + `tools_exclude`):
- `tools_exclude=True` (default): `tools` is a **blacklist** — listed
tools denied, all others allowed. Empty list = allow all.
- `tools_exclude=False`: `tools` is a **whitelist** — only listed tools
are allowed.
- Users specify short names (`run_block`, `web_fetch`, `Read`, `Task`,
…) — mapped to full SDK format internally.
- Validated eagerly at block-run time with a clear error listing valid
names.

**Block filtering** (`blocks` + `blocks_exclude`):
- Same semantics as tool filtering, applied inside `run_block` via
contextvar.
- Each entry can be a full UUID, an 8-char partial UUID (first segment),
or a case-insensitive block name.
- Validated against the live block registry; invalid identifiers surface
a helpful error before the session is created.

**Recursion inheritance**:
- `_inherited_permissions` contextvar stores the parent execution's
permissions.
- On each `AutoPilotBlock.run()`, the child's permissions are merged
with the parent via `merged_with_parent()` — effective allowed sets are
intersected (tools) and the parent chain is kept for block checks.
- Sub-agents can never expand what the parent allowed.

## Test plan

- [x] 68 new unit tests in `copilot/permissions_test.py` and
`blocks/autopilot_permissions_test.py`
- [x] Block identifier matching: full UUID, partial UUID, name,
case-insensitivity
- [x] Tool allow/deny list semantics including edge cases (empty list,
unknown tool)
- [x] Parent/child merging and recursion ceiling correctness
- [x] `validate_tool_names` / `validate_block_identifiers` with mock
block registry
- [x] `apply_tool_permissions` SDK tool-list integration
- [x] `AutoPilotBlock.run()` — invalid tool/block yields error before
session creation
- [x] `AutoPilotBlock.run()` — valid permissions forwarded to
`execute_copilot`
- [x] Existing `AutoPilotBlock` block tests still pass (2/2)
- [x] All hooks pass (pyright, ruff, black, isort)
- [x] E2E: CoPilot chat works end-to-end with E2B sandbox (12s stream)
- [x] E2E: Permission fields render in Builder UI (Tools combobox,
exclude toggles)
- [x] E2E: Agent with restricted permissions (whitelist web_fetch only)
executes correctly
- [x] E2E: Permission values preserved through API round-trip
2026-03-24 07:49:58 +00:00
Nicholas Tindle
f01f668674 fix(backend): support Responses API in SmartDecisionMakerBlock (#12489)
## Summary

- Fixes SmartDecisionMakerBlock conversation management to work with
OpenAI's Responses API, which was introduced in #12099 (commit 1240f38)
- The migration to `responses.create` updated the outbound LLM call but
missed the conversation history serialization — the `raw_response` is
now the entire `Response` object (not a `ChatCompletionMessage`), and
tool calls/results use `function_call` / `function_call_output` types
instead of role-based messages
- This caused a 400 error on the second LLM call in agent mode:
`"Invalid value: ''. Supported values are: 'assistant', 'system',
'developer', and 'user'."`

### Changes

**`smart_decision_maker.py`** — 6 functions updated:
| Function | Fix |
|---|---|
| `_convert_raw_response_to_dict` | Detects Responses API `Response`
objects, extracts output items as a list |
| `_get_tool_requests` | Recognizes `type: "function_call"` items |
| `_get_tool_responses` | Recognizes `type: "function_call_output"`
items |
| `_create_tool_response` | New `responses_api` kwarg produces
`function_call_output` format |
| `_update_conversation` | Handles list return from
`_convert_raw_response_to_dict` |
| Non-agent mode path | Same list handling for traditional execution |

**`test_smart_decision_maker_responses_api.py`** — 61 tests covering:
- Every branch of all 6 affected helper functions
- Chat Completions, Anthropic, and Responses API formats
- End-to-end agent mode and traditional mode conversation validity

## Test plan

- [x] 61 new unit tests all pass
- [x] 11 existing SmartDecisionMakerBlock tests still pass (no
regressions)
- [x] All pre-commit hooks pass (ruff, black, isort, pyright)
- [ ] CI integration tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Updates core LLM invocation and agent conversation/tool-call
bookkeeping to match OpenAI’s Responses API, which can affect tool
execution loops and prompt serialization across providers. Risk is
mitigated by extensive new unit tests, but regressions could surface in
production agent-mode flows or token/usage accounting.
> 
> **Overview**
> **Migrates OpenAI calls from Chat Completions to the Responses API
end-to-end**, including tool schema conversion, output parsing,
reasoning/text extraction, and updated token usage fields in
`LLMResponse`.
> 
> **Fixes SmartDecisionMakerBlock conversation/tool handling for
Responses API** by treating `raw_response` as a Response object
(splitting it into `output` items for replay), recognizing
`function_call`/`function_call_output` entries, and emitting tool
outputs in the correct Responses format to prevent invalid follow-up
prompts.
> 
> Also adjusts prompt compaction/token estimation to understand
Responses API tool items, changes
`get_execution_outputs_by_node_exec_id` to return list-valued
`CompletedBlockOutput`, removes `gpt-3.5-turbo` from model/cost/docs
lists, and adds focused unit tests plus a lightweight `conftest.py` to
run these tests without the full server stack.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
ff292efd3d. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>
2026-03-20 03:23:52 +00:00
Nicholas Tindle
cbff3b53d3 Revert "feat(backend): migrate OpenAI provider to Responses API" (#12490)
Reverts Significant-Gravitas/AutoGPT#12099

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Reverts the OpenAI integration in `llm_call` from the Responses API
back to `chat.completions`, which can change tool-calling, JSON-mode
behavior, and token accounting across core AI blocks. The change is
localized but touches the primary LLM execution path and associated
tests/docs.
> 
> **Overview**
> Reverts the OpenAI path in `backend/blocks/llm.py` from the Responses
API back to `chat.completions`, including updating JSON-mode
(`response_format`), tool handling, and usage extraction to match the
Chat Completions response shape.
> 
> Removes the now-unused `backend/util/openai_responses.py` helpers and
their unit tests, updates LLM tests to mock `chat.completions.create`,
and adds `gpt-3.5-turbo` to the supported model list, cost config, and
LLM docs.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
7d6226d10e. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
2026-03-20 01:51:56 +00:00
Otto
1240f38f75 feat(backend): migrate OpenAI provider to Responses API (#12099)
## Summary

Migrates the OpenAI provider in the LLM block from
`chat.completions.create` to `responses.create` — OpenAI's newer,
unified API. Also removes the obsolete GPT-3.5-turbo model.

Resolves #11624
Linear:
[OPEN-2911](https://linear.app/autogpt/issue/OPEN-2911/update-openai-calls-to-use-responsescreate)

## Changes

- **`backend/blocks/llm.py`** — OpenAI provider now uses
`responses.create` exclusively. Removed GPT-3.5-turbo enum + metadata.
- **`backend/util/openai_responses.py`** *(new)* — Helpers for the
Responses API: tool format conversion, content/reasoning/usage/tool-call
extraction.
- **`backend/util/openai_responses_test.py`** *(new)* — Unit tests for
all helper functions.
- **`backend/data/block_cost_config.py`** — Removed GPT-3.5 cost entry.
- **`docs/integrations/block-integrations/llm.md`** — Regenerated block
docs.

## Key API differences handled

| Aspect | Chat Completions | Responses API |
|--------|-----------------|---------------|
| Messages param | `messages` | `input` |
| Max tokens param | `max_completion_tokens` | `max_output_tokens` |
| Usage fields | `prompt_tokens` / `completion_tokens` | `input_tokens`
/ `output_tokens` |
| Tool format | Nested under `function` key | Flat structure |

## Test plan

- [x] Unit tests for all `openai_responses.py` helpers
- [x] Existing LLM block tests updated for Responses API mocks
- [x] Regular OpenAI models work
- [x] Reasoning OpenAI models work
- [x] Non-OpenAI models work

---------

Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 09:19:31 +00:00
Zamil Majdy
5d9a169e04 feat(blocks): add AutoPilotBlock for invoking AutoPilot from graphs (#12439)
## Summary
- Adds `AutogptCopilotBlock` that invokes the platform's copilot system
(`stream_chat_completion_sdk`) directly from graph executions
- Enables sub-agent patterns: copilot can call this block recursively
(with depth limiting via `contextvars`)
- Enables scheduled copilot execution through the agent executor system
- No user credentials needed — uses server-side copilot config

## Inputs/Outputs
**Inputs:** prompt, system_context, session_id (continuation), timeout,
max_recursion_depth
**Outputs:** response text, tool_calls list, conversation_history JSON,
session_id, token_usage

## Test plan
- [x] Block test passes (`test_available_blocks[AutogptCopilotBlock]`)
- [x] Pre-commit hooks pass (format, lint, typecheck)
- [ ] Manual test: add block to graph, send prompt, verify response
- [ ] Manual test: chain two copilot blocks with session_id to verify
continuation
2026-03-18 11:22:25 +00:00
Otto
e657472162 feat(blocks): Add Nano Banana 2 to image generator, customizer, and editor blocks (#12218)
Requested by @Torantulino

Add `google/nano-banana-2` (Gemini 3.1 Flash Image) support across all
three image blocks.

### Changes

**`ai_image_customizer.py`**
- Add `NANO_BANANA_2 = "google/nano-banana-2"` to `GeminiImageModel`
enum
- Update block description to reference Nano-Banana models generically

**`ai_image_generator_block.py`**
- Add `NANO_BANANA_2` to `ImageGenModel` enum
- Add generation branch (identical to NBP except model name)

**`flux_kontext.py` (AI Image Editor)**
- Rename `FluxKontextModelName` → `ImageEditorModel` (with
backwards-compatible alias)
- Add `NANO_BANANA_PRO` and `NANO_BANANA_2` to the editor
- Model-aware branching in `run_model()`: NB models use `image_input`
list (not `input_image`), no `seed`, and add `output_format`

**`block_cost_config.py`**
- Add NB2 cost entries for all three blocks (14 credits, matching NBP)
- Add NB Pro cost entry for editor block
- Update editor block refs from `.PRO`/`.MAX` to
`.FLUX_KONTEXT_PRO`/`.FLUX_KONTEXT_MAX`

Resolves SECRT-2047

---------

Co-authored-by: Torantulino <Torantulino@users.noreply.github.com>
Co-authored-by: Abhimanyu Yadav <122007096+Abhi1992002@users.noreply.github.com>
2026-03-18 09:42:18 +00:00
Abhimanyu Yadav
e32d258a7e feat(blocks): add AgentMail integration blocks (#12417)
## Summary
- Add a full AgentMail integration with blocks for managing inboxes,
messages, threads, drafts, attachments, lists, and pods
- Includes shared provider configuration (`_config.py`) with API key
authentication
- 8 block modules covering ~25 individual blocks across all AgentMail
API surfaces

  ## Block Modules
  | Module | Blocks |
  |--------|--------|
  | `inbox.py` | Create, Get, List, Update, Delete inboxes |
| `messages.py` | Send, Get, List, Delete messages + org-wide listing |
  | `threads.py` | Get, List, Delete threads + org-wide listing |
| `drafts.py` | Create, Get, List, Update, Send, Delete drafts +
org-wide listing |
  | `attachments.py` | Download attachments |
  | `lists.py` | Create, Get, List, Update, Delete mailing lists |
  | `pods.py` | Create, Get, List, Update, Delete pods |

  ## Test plan
- [x] `poetry run pytest 'backend/blocks/test/test_block.py' -xvs` — all
new blocks pass the standard block test suite
  - [x] test all blocks manually
2026-03-17 12:40:32 +00:00
Nicholas Tindle
8892bcd230 docs: Add workspace and media file architecture documentation (#11989)
### Changes 🏗️

- Added comprehensive architecture documentation at
`docs/platform/workspace-media-architecture.md` covering:
  - Database models (`UserWorkspace`, `UserWorkspaceFile`)
  - `WorkspaceManager` API with session scoping
- `store_media_file()` media normalization pipeline (input types, return
formats)
  - Virus scanning responsibility boundaries
- Decision tree for choosing `WorkspaceManager` vs `store_media_file()`
- Configuration reference including `clamav_max_concurrency` and
`clamav_mark_failed_scans_as_clean`
  - Common patterns with error handling examples
- Updated `autogpt_platform/backend/CLAUDE.md` with a "Workspace & Media
Files" section referencing the new docs
- Removed duplicate `scan_content_safe()` call from
`WriteWorkspaceFileTool` — `WorkspaceManager.write_file()` already scans
internally, so the tool was double-scanning every file
- Replaced removed comment in `workspace.py` with explicit ownership
comment clarifying that `WorkspaceManager` is the single scanning
boundary

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified `scan_content_safe()` is called inside
`WorkspaceManager.write_file()` (workspace.py:186)
- [x] Verified `store_media_file()` scans all input branches including
local paths (file.py:351)
- [x] Verified documentation accuracy against current source code after
merge with dev
  - [x] CI checks all passing

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Mostly adds documentation and internal developer guidance; the only
code change is a comment clarifying `WorkspaceManager.write_file()` as
the single virus-scanning boundary, with no behavior change.
> 
> **Overview**
> Adds a new `docs/platform/workspace-media-architecture.md` describing
the Workspace storage layer vs the `store_media_file()` media pipeline,
including session scoping and virus-scanning/persistence responsibility
boundaries.
> 
> Updates backend `CLAUDE.md` to point contributors to the new doc when
working on CoPilot uploads/downloads or
`WorkspaceManager`/`store_media_file()`, and clarifies in
`WorkspaceManager.write_file()` (comment-only) that callers should not
duplicate virus scanning.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
18fcfa03f8. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 06:12:26 +00:00
Bently
ef446e4fe9 feat(llm): Add Cohere Command A Family Models (#12339)
## Summary
Adds the Cohere Command A family of models to AutoGPT Platform with
proper pricing configuration.

## Models Added
- **Command A 03.2025**: Flagship model (256k context, 8k output) - 3
credits
- **Command A Translate 08.2025**: State-of-the-art translation (8k
context, 8k output) - 3 credits
- **Command A Reasoning 08.2025**: First reasoning model (256k context,
32k output) - 6 credits
- **Command A Vision 07.2025**: First vision-capable model (128k
context, 8k output) - 3 credits

## Changes
- Added 4 new LlmModel enum entries with proper OpenRouter model IDs
- Added ModelMetadata for each model with correct context windows,
output limits, and price tiers
- Added pricing configuration in block_cost_config.py

## Testing
- [ ] Models appear in AutoGPT Platform model selector
- [ ] Pricing is correctly applied when using models

Resolves **SECRT-2083**
2026-03-12 11:56:30 +00:00
Bently
7b1e8ed786 feat(llm): Add Microsoft Phi-4 model support (#12342)
## Changes
- Added `MICROSOFT_PHI_4` to LlmModel enum (`microsoft/phi-4`)
- Configured model metadata:
  - 16K context window
  - 16K max output tokens
  - OpenRouter provider
- Set cost tier: 1
  - Input: $0.06 per 1M tokens
  - Output: $0.14 per 1M tokens

## Details
Microsoft Phi-4 is a 14B parameter model available through OpenRouter.
This PR adds proper support in the autogpt_platform backend.

Resolves SECRT-2086
2026-03-12 11:15:27 +00:00
Bently
3595c6e769 feat(llm): add Perplexity Sonar Reasoning Pro model (#12341)
## Summary
Adds support for Perplexity's new reasoning model:
`perplexity/sonar-reasoning-pro`

## Changes
-  Added `PERPLEXITY_SONAR_REASONING_PRO` to `LlmModel` enum
-  Added model metadata (128K context window, 8K max output tokens,
tier 2)
-  Set pricing at 5 credits (matches sonar-pro tier)

## Model Details
- **Model ID:** `perplexity/sonar-reasoning-pro`
- **Provider:** OpenRouter
- **Context Window:** 128,000 tokens
- **Max Output:** 8,000 tokens
- **Pricing:** $0.000002/token (prompt), $0.000008/token (completion)
- **Cost Tier:** 2 (5 credits)

## Testing
-  Black formatting passed
-  Ruff linting passed

Resolves SECRT-2084
2026-03-12 09:58:29 +00:00
Bently
ade2baa58f feat(llm): Add Grok 3 model support (#12343)
## Summary
Adds support for xAI's Grok 3 model to AutoGPT.

## Changes
- Added `GROK_3` to `LlmModel` enum with identifier `x-ai/grok-3`
- Configured model metadata:
  - Context window: 131,072 tokens (128k)
  - Max output: 32,768 tokens (32k)  
  - Provider: OpenRouter
  - Creator: xAI
  - Price tier: 2 (mid-tier)
- Set model cost to 3 credits (mid-tier pricing between fast models and
Grok 4)
- Updated block documentation to include Grok 3 in model lists

## Pricing Rationale
- **Grok 4**: 9 credits (tier 3 - premium, 256k context)
- **Grok 3**: 3 credits (tier 2 - mid-tier, 128k context) ← NEW
- **Grok 4 Fast/4.1 Fast/Code Fast**: 1 credit (tier 1 - affordable)

Grok 3 is positioned as a mid-tier model, priced similarly to other tier
2 models.

## Testing
- [x] Code passes `black` formatting
- [x] Code passes `ruff` linting
- [x] Model metadata and cost configuration added
- [x] Documentation updated

Closes SECRT-2079
2026-03-12 07:31:59 +00:00
Bently
89a5b3178a fix(llm): Update Gemini model lineup - add 3.1 models, deprecate 3 Pro Preview (#12331)
## 🔴 URGENT: Gemini 3 Pro Preview Shutdown - March 9, 2026

Google is shutting down Gemini 3 Pro Preview **tomorrow (March 9,
2026)**. This PR addresses SECRT-2067 by updating the Gemini model
lineup to prevent disruption.

---

## Changes

###  P0 - Critical (This Week)
- [x] **Remove/Replace Gemini 3 Pro Preview** → Migrated to 3.1 Pro
Preview
- [x] **Add Gemini 3.1 Pro Preview** (released Feb 19, 2026)

###  P1 - High Priority  
- [x] **Add Gemini 3.1 Flash Lite Preview** (released Mar 3, 2026)
- [x] **Add Gemini 3 Flash Preview** (released Dec 17, 2025)

###  P2 - Medium Priority
- [x] **Add Gemini 2.5 Pro (stable/GA)** (released Jun 17, 2025)

---

## Model Details

| Model | Context | Input Cost | Output Cost | Price Tier |
|-------|---------|------------|-------------|------------|
| **Gemini 3.1 Pro Preview** | 1.05M | $2.00/1M | $12.00/1M | 2 |
| **Gemini 3.1 Flash Lite Preview** | 1.05M | $0.25/1M | $1.50/1M | 1 |
| **Gemini 3 Flash Preview** | 1.05M | $0.50/1M | $3.00/1M | 1 |
| **Gemini 2.5 Pro (GA)** | 1.05M | $1.25/1M | $10.00/1M | 2 |
| ~~Gemini 3 Pro Preview~~ | ~~1.05M~~ | ~~$2.00/1M~~ | ~~$12.00/1M~~ |
**DEPRECATED** |

---

## Migration Strategy

**Database Migration:**
`20260308095500_migrate_deprecated_gemini_3_pro_preview`

- Automatically migrates all existing graphs using
`google/gemini-3-pro-preview` to `google/gemini-3.1-pro-preview`
- Updates: AgentBlock, AgentGraphExecution, AgentNodeExecution,
AgentGraph
- Zero user-facing disruption
- Migration runs on next deployment (before March 9 shutdown)

---

## Testing

- [ ] Verify new models appear in LLM block dropdown
- [ ] Test migration on staging database
- [ ] Confirm existing graphs using deprecated model auto-migrate
- [ ] Validate cost calculations for new models

---

## References

- **Linear Issue:**
[SECRT-2067](https://linear.app/autogpt/issue/SECRT-2067)
- **OpenRouter Models:** https://openrouter.ai/models/google
- **Google Deprecation Notice:**
https://ai.google.dev/gemini-api/docs/deprecations

---

## Checklist

- [x] Models added to `LlmModel` enum
- [x] Model metadata configured
- [x] Cost config updated
- [x] Database migration created
- [x] Deprecated model commented out (not removed for historical
reference)
- [ ] PR reviewed and approved
- [ ] Merged before March 9, 2026 deadline

---

**Priority:** 🔴 Critical - Must merge before March 9, 2026
2026-03-11 11:21:16 +00:00
Bently
34a2f9a0a2 feat(llm): add Mistral flagship models (Large 3, Medium 3.1, Small 3.2, Codestral) (#12337)
## Summary

Adds four missing Mistral AI flagship models to address the critical
coverage gap identified in
[SECRT-2082](https://linear.app/autogpt/issue/SECRT-2082).

## Models Added

| Model | Context | Max Output | Price Tier | Use Case |
|-------|---------|------------|------------|----------|
| **Mistral Large 3** | 262K | None | 2 (Medium) | Flagship reasoning
model, 41B active params (675B total), MoE architecture |
| **Mistral Medium 3.1** | 131K | None | 2 (Medium) | Balanced
performance/cost, 8x cheaper than traditional large models |
| **Mistral Small 3.2** | 131K | 131K | 1 (Low) | Fast, cost-efficient,
high-volume use cases |
| **Codestral 2508** | 256K | None | 1 (Low) | Code generation
specialist (FIM, correction, test gen) |

## Problem

Previously, the platform only offered:
- Mistral Nemo (1 official model)
- dolphin-mistral (third-party Ollama fine-tune)

This left significant gaps in Mistral's lineup, particularly:
- No flagship reasoning model
- No balanced mid-tier option
- No code-specialized model
- Missing multimodal capabilities (Large 3, Medium 3.1, Small 3.2 all
support text+image)

## Changes

**File:** `autogpt_platform/backend/backend/blocks/llm.py`

- Added 4 enum entries in `LlmModel` class
- Added 4 metadata entries in `MODEL_METADATA` dict
- All models use OpenRouter provider
- Follows existing pattern for model additions

## Testing

-  Enum values match OpenRouter model IDs
-  Metadata follows existing format
-  Context windows verified from OpenRouter API
-  Price tiers assigned appropriately

## Closes

- SECRT-2082

---

**Note:** All models are available via OpenRouter and tested. This
brings Mistral coverage in line with other major providers (OpenAI,
Anthropic, Google).
2026-03-11 08:48:48 +00:00
Zamil Majdy
9f4caa7dfc feat(blocks): add and harden GitHub blocks for full-cycle development (#12334)
## Summary
- Add 8 new GitHub blocks: GetRepositoryInfo, ForkRepository,
ListCommits, SearchCode, CompareBranches, GetRepositoryTree,
MultiFileCommit, MergePullRequest
- Split `repo.py` (2094 lines, 19 blocks) into domain-specific modules:
`repo.py`, `repo_branches.py`, `repo_files.py`, `commits.py`
- Concurrent blob creation via `asyncio.gather()` in MultiFileCommit
- URL-encode branch/ref params via `urllib.parse.quote()` for
defense-in-depth
- Step-level error handling in MultiFileCommit ref update with recovery
SHA
- Collapse FileOperation CREATE/UPDATE into UPSERT (Git Trees API treats
them identically)
- Add `ge=1, le=100` constraints on per_page SchemaFields
- Preserve URL scheme in `prepare_pr_api_url`
- Handle null commit authors gracefully in ListCommits
- Add unit tests for `prepare_pr_api_url`, error-path tests for
MergePR/MultiFileCommit, FileOperation enum validation tests

## Test plan
- [ ] Block tests pass for all 19 GitHub blocks (CI:
`test_available_blocks`)
- [ ] New test file `test_github_blocks.py` passes (prepare_pr_api_url,
error paths, enum)
- [ ] `check-docs-sync` passes with regenerated docs
- [ ] pyright/ruff clean on all changed files
2026-03-11 08:35:37 +00:00
nKOxxx
c7124a5240 Add documentation for Google Gemini integration (#12283)
## Summary
Adding comprehensive documentation for Google Gemini integration with
AutoGPT.

## Changes
- Added setup instructions for Gemini API
- Documented configuration options
- Added examples and best practices

## Related Issues
N/A - Documentation improvement

## Testing
- Verified documentation accuracy
- Tested all code examples

## Checklist
- [x] Code follows project style
- [x] Documentation updated
- [x] Tests pass (if applicable)
2026-03-09 15:13:28 +00:00
Reinier van der Leer
aa08063939 refactor(backend/db): Improve & clean up Marketplace DB layer & API (#12284)
These changes were part of #12206, but here they are separately for
easier review.
This is all primarily to make the v2 API (#11678) work possible/easier.

### Changes 🏗️

- Fix relations between `Profile`, `StoreListing`, and `AgentGraph`
- Redefine `StoreSubmission` view with more efficient joins (100x
speed-up on dev DB) and more consistent field names
- Clean up query functions in `store/db.py`
- Clean up models in `store/model.py`
- Add missing fields to `StoreAgent` and `StoreSubmission` views
- Rename ambiguous `agent_id` -> `graph_id`
- Clean up API route definitions & docs in `store/routes.py`
  - Make routes more consistent
- Avoid collision edge-case between `/agents/{username}/{agent_name}`
and `/agents/{store_listing_version_id}/*`
- Replace all usages of legacy `BackendAPI` for store endpoints with
generated client
- Remove scope requirements on public store endpoints in v1 external API

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Test all Marketplace views (including admin views)
    - [x] Download an agent from the marketplace
  - [x] Submit an agent to the Marketplace
  - [x] Approve/reject Marketplace submission
2026-03-06 14:38:12 +00:00
Bently
7c8c7bf395 feat(llm): add Claude Sonnet 4.6 model (#12158)
## Summary
Adds Claude Sonnet 4.6 (`claude-sonnet-4-6`) to the platform.

## Model Details (from [Anthropic
docs](https://www.anthropic.com/news/claude-sonnet-4-6))
- **API ID:** `claude-sonnet-4-6`
- **Pricing:** $3 / input MTok, $15 / output MTok (same as Sonnet 4.5)
- **Context window:** 200K tokens (1M beta)
- **Max output:** 64K tokens
- **Knowledge cutoff:** Aug 2025 (reliable), Jan 2026 (training data)

## Changes
- Added `CLAUDE_4_6_SONNET` to `LlmModel` enum
- Added metadata entry with correct context/output limits
- Updated Stagehand to use Sonnet 4.6 (better for browser automation
tasks)

## Why
Sonnet 4.6 brings major improvements in coding, computer use, and
reasoning. Developers with early access often prefer it to even Opus
4.5.

---------

Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-03-05 19:36:56 +00:00
Krzysztof Czerwinski
a1cb3d2a91 feat(blocks): Add Telegram blocks (#12141)
Add Telegram blocks that allow the use of [Telegram bots' API
features](https://core.telegram.org/bots/features).

### Changes 🏗️

1. Credentials & API layer: Bot token auth via `APIKeyCredentials`,
helper functions for JSON API calls (call_telegram_api) and multipart
file uploads (call_telegram_api_with_file)
2. Trigger blocks:
- `TelegramMessageTriggerBlock` — receives messages (text, photo, voice,
audio, document, video, edited message) with configurable event filters
- `TelegramMessageReactionTriggerBlock` — fires on reaction changes
(private chats auto, groups require admin)
2. Action blocks (11 total):
  - Send: Message, Photo, Voice, Audio, Document, Video
  - Reply to Message, Edit Message, Delete Message
  - Get File (download by file_id)
3. Webhook manager: Registers/deregisters webhooks via Telegram's
setWebhook API, validates incoming requests using
X-Telegram-Bot-Api-Secret-Token header
4. Provider registration: Added TELEGRAM to ProviderName enum and
registered `TelegramWebhooksManager`
5. Media send blocks support both URL passthrough (Telegram fetches
directly) and file upload for workspace/data URI inputs

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Non-AI UUIDs
  - [x] Blocks work correctly
    - [x] SendTelegramMessageBlock
    - [x] SendTelegramPhotoBlock
    - [x] SendTelegramVoiceBlock
    - [x] SendTelegramAudioBlock
    - [x] SendTelegramDocumentBlock
    - [x] SendTelegramVideoBlock
    - [x] ReplyToTelegramMessageBlock
    - [x] GetTelegramFileBlock
    - [x] DeleteTelegramMessageBlock
    - [x] EditTelegramMessageBlock
    - [x] TelegramMessageTriggerBlock (works for every trigger type)
    - [x] TelegramMessageReactionTriggerBlock

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-02-26 10:25:08 +00:00
Bently
ef42b17e3b docs: add Podman compatibility warning (#12120)
## Summary
Adds a warning to the Getting Started docs clarifying that **Podman and
podman-compose are not supported**.

## Problem
Users on Windows using `podman-compose` instead of Docker get errors
like:
```
Error: the specified Containerfile or Dockerfile does not exist, ..\..\autogpt_platform\backend\Dockerfile
```

This is because Podman handles relative paths differently than Docker,
causing incorrect path resolution on Windows.

## Solution
- Added a clear warning section after the Windows WSL 2 notes
- Explains the error users might see
- Directs them to install Docker Desktop instead

Closes #11358

<!-- greptile_comment -->

<details><summary><h3>Greptile Summary</h3></summary>

Adds a "Podman Not Supported" warning section to the Getting Started
documentation, placed after the Windows/WSL 2 installation notes. The
section clarifies that Docker is required, shows the typical error
message users encounter when using Podman, and directs them to install
Docker Desktop instead. This addresses issue #11358 where Windows users
using `podman-compose` hit path resolution errors.

- Adds `### ⚠️ Podman Not Supported` section under Manual Setup, after
Windows Installation Note
- Includes the specific error message users see with Podman for easy
identification
- Links to Docker Desktop installation docs as the recommended solution
- Formatting is consistent with existing sections in the document (emoji
headings, code blocks for errors)
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge — it only adds a documentation warning
section with no code changes.
- The change is a small, well-written documentation addition that adds a
Podman compatibility warning. It touches only one markdown file,
introduces no code changes, and is consistent with the existing document
structure and style. No issues were found.
- No files require special attention.
</details>


<details><summary><h3>Flowchart</h3></summary>

```mermaid
flowchart TD
    A[User wants to run AutoGPT] --> B{Which container runtime?}
    B -->|Docker / Docker Desktop| C[docker compose up -d --build]
    C --> D[AutoGPT starts successfully]
    B -->|Podman / podman-compose| E[podman-compose up -d --build]
    E --> F[Error: Containerfile or Dockerfile does not exist]
    F --> G[New warning section directs user to install Docker Desktop]
    G --> C
```
</details>


<sub>Last reviewed commit: 23ea6bd</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-23 15:19:24 +00:00
Eve
647c8ed8d4 feat(backend/blocks): enhance list concatenation with advanced operations (#12105)
## Summary

Enhances the existing `ConcatenateListsBlock` and adds five new
companion blocks for comprehensive list manipulation, addressing issue
#11139 ("Implement block to concatenate lists").

### Changes

- **Enhanced `ConcatenateListsBlock`** with optional deduplication
(`deduplicate`) and None-value filtering (`remove_none`), plus an output
`length` field
- **New `FlattenListBlock`**: Recursively flattens nested list
structures with configurable `max_depth`
- **New `InterleaveListsBlock`**: Round-robin interleaving of elements
from multiple lists
- **New `ZipListsBlock`**: Zips corresponding elements from multiple
lists with support for padding to longest or truncating to shortest
- **New `ListDifferenceBlock`**: Computes set difference between two
lists (regular or symmetric)
- **New `ListIntersectionBlock`**: Finds common elements between two
lists, preserving order

### Helper Utilities

Extracted reusable helper functions for validation, flattening,
deduplication, interleaving, chunking, and statistics computation to
support the blocks and enable future reuse.

### Test Coverage

Comprehensive test suite with 188 test functions across 29 test classes
covering:
- Built-in block test harness validation for all 6 blocks
- Manual edge-case tests for each block (empty inputs, large lists,
mixed types, nested structures)
- Internal method tests for all block classes
- Unit tests for all helper utility functions

Closes #11139

## Test plan

- [x] All files pass Python syntax validation (`ast.parse`)
- [x] Built-in `test_input`/`test_output` tests defined for all blocks
- [x] Manual tests cover edge cases: empty lists, large lists, mixed
types, nested structures, deduplication, None removal
- [x] Helper function tests validate all utility functions independently
- [x] All block IDs are valid UUID4
- [x] Block categories set to `BlockCategory.BASIC` for consistency with
existing list blocks


<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Enhanced `ConcatenateListsBlock` with deduplication and None-filtering
options, and added five new list manipulation blocks
(`FlattenListBlock`, `InterleaveListsBlock`, `ZipListsBlock`,
`ListDifferenceBlock`, `ListIntersectionBlock`) with comprehensive
helper functions and test coverage.

**Key Changes:**
- Enhanced `ConcatenateListsBlock` with `deduplicate` and `remove_none`
options, plus `length` output field
- Added `FlattenListBlock` for recursively flattening nested lists with
configurable `max_depth`
- Added `InterleaveListsBlock` for round-robin element interleaving
- Added `ZipListsBlock` with support for padding/truncation
- Added `ListDifferenceBlock` and `ListIntersectionBlock` for set
operations
- Extracted 12 reusable helper functions for validation, flattening,
deduplication, etc.
- Comprehensive test suite with 188 test functions covering edge cases

**Minor Issues:**
- Helper function `_deduplicate_list` has redundant logic in the `else`
branch that duplicates the `if` branch
- Three helper functions (`_filter_empty_collections`,
`_compute_list_statistics`, `_chunk_list`) are defined but unused -
consider removing unless planned for future use
- The `_make_hashable` function uses `hash(repr(item))` for unhashable
types, which correctly treats structurally identical dicts/lists as
duplicates
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- Safe to merge with minor style improvements recommended
- The implementation is well-structured with comprehensive test coverage
(188 tests), proper error handling, and follows existing block patterns.
All blocks use valid UUID4 IDs and correct categories. The helper
functions provide good code reuse. The minor issues are purely stylistic
(redundant code, unused helpers) and don't affect functionality or
safety.
- No files require special attention - both files are well-tested and
follow project conventions
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User
    participant Block as List Block
    participant Helper as Helper Functions
    participant Output
    
    User->>Block: Input (lists/parameters)
    Block->>Helper: _validate_all_lists()
    Helper-->>Block: validation result
    
    alt validation fails
        Block->>Output: error message
    else validation succeeds
        Block->>Helper: _concatenate_lists_simple() / _flatten_nested_list() / etc.
        Helper-->>Block: processed result
        
        opt deduplicate enabled
            Block->>Helper: _deduplicate_list()
            Helper-->>Block: deduplicated result
        end
        
        opt remove_none enabled
            Block->>Helper: _filter_none_values()
            Helper-->>Block: filtered result
        end
        
        Block->>Output: result + length
    end
    
    Output-->>User: Block outputs
```
</details>


<sub>Last reviewed commit: a6d5445</sub>

<!-- greptile_other_comments_section -->

<sub>(2/5) Greptile learns from your feedback when you react with thumbs
up/down!</sub>

<!-- /greptile_comment -->

---------

Co-authored-by: Otto <otto@agpt.co>
2026-02-16 05:39:53 +00:00
Zamil Majdy
f9f358c526 feat(mcp): Add MCP tool block with OAuth, tool discovery, and standard credential integration (#12011)
## Summary

<img width="1000" alt="image"
src="https://github.com/user-attachments/assets/18e8ef34-d222-453c-8b0a-1b25ef8cf806"
/>


<img width="250" alt="image"
src="https://github.com/user-attachments/assets/ba97556c-09c5-4f76-9f4e-49a2e8e57468"
/>

<img width="250" alt="image"
src="https://github.com/user-attachments/assets/68f7804a-fe74-442d-9849-39a229c052cf"
/>

<img width="250" alt="image"
src="https://github.com/user-attachments/assets/700690ba-f9fe-4726-8871-3bfbab586001"
/>

Full-stack MCP (Model Context Protocol) tool block integration that
allows users to connect to any MCP server, discover available tools,
authenticate via OAuth, and execute tools — all through the standard
AutoGPT credential system.

### Backend

- **MCPToolBlock** (`blocks/mcp/block.py`): New block using
`CredentialsMetaInput` pattern with optional credentials (`default={}`),
supporting both authenticated (OAuth) and public MCP servers. Includes
auto-lookup fallback for backward compatibility.
- **MCP Client** (`blocks/mcp/client.py`): HTTP transport with JSON-RPC
2.0, tool discovery, tool execution with robust error handling
(type-checked error fields, non-JSON response handling)
- **MCP OAuth Handler** (`blocks/mcp/oauth.py`): RFC 8414 discovery,
dynamic per-server OAuth with PKCE, token storage and refresh via
`raise_for_status=True`
- **MCP API Routes** (`api/features/mcp/routes.py`): `discover-tools`,
`oauth/login`, `oauth/callback` endpoints with credential cleanup,
defensive OAuth metadata validation
- **Credential system integration**:
- `CredentialsMetaInput` model_validator normalizes legacy
`"ProviderName.MCP"` format from Python 3.13's `str(StrEnum)` change
- `CredentialsFieldInfo.combine()` supports URL-based credential
discrimination (each MCP server gets its own credential entry)
- `aggregate_credentials_inputs` checks block schema defaults for
credential optionality
- Executor normalizes credential data for both Pydantic and JSON schema
validation paths
  - Chat credential matching handles MCP server URL filtering
- `provider_matches()` helper used consistently for Python 3.13 StrEnum
compatibility
- **Pre-run validation**: `_validate_graph_get_errors` now calls
`get_missing_input()` for custom block-level validation (MCP tool
arguments)
- **Security**: HTML tag stripping loop to prevent XSS bypass, SSRF
protection (removed trusted_origins)

### Frontend

- **MCPToolDialog** (`MCPToolDialog.tsx`): Full tool discovery UI —
enter server URL, authenticate if needed, browse tools, select tool and
configure
- **OAuth popup** (`oauth-popup.ts`): Shared utility supporting
cross-origin MCP OAuth flows with BroadcastChannel + localStorage
fallback
- **Credential integration**: MCP-specific OAuth flow in
`useCredentialsInput`, server URL filtering in `useCredentials`, MCP
callback page
- **CredentialsSelect**: Auto-selects first available credential instead
of defaulting to "None", credentials listed before "None" in dropdown
- **Node rendering**: Dynamic tool input schema rendering on MCP nodes,
proper handling in both legacy and new flow editors
- **Block title persistence**: `customized_name` set at block creation
for both MCP and Agent blocks — no fallback logic needed, titles survive
save/load reliably
- **Stable credential ordering**: Removed `sortByUnsetFirst` that caused
credential inputs to jump when selected

### Tests (~2060 lines)

- Unit tests: block, client, tool execution
- Integration tests: mock MCP server with auth
- OAuth flow tests
- API endpoint tests
- Credential combining/optionality tests
- E2e tests (skipped in CI, run manually)

## Key Design Decisions

1. **Optional credentials via `default={}`**: MCP servers can be public
(no auth) or private (OAuth). The `credentials` field has `default={}`
making it optional at the schema level, so public servers work without
prompting for credentials.

2. **URL-based credential discrimination**: Each MCP server URL gets its
own credential entry in the "Run agent" form (via
`discriminator="server_url"`), so agents using multiple MCP servers
prompt for each independently.

3. **Model-level normalization**: Python 3.13 changed `str(StrEnum)` to
return `"ClassName.MEMBER"`. Rather than scattering fixes across the
codebase, a Pydantic `model_validator(mode="before")` on
`CredentialsMetaInput` handles normalization centrally, and
`provider_matches()` handles lookups.

4. **Credential auto-select**: `CredentialsSelect` component defaults to
the first available credential and notifies the parent state, ensuring
credentials are pre-filled in the "Run agent" dialog without requiring
manual selection.

5. **customized_name for block titles**: Both MCP and Agent blocks set
`customized_name` in metadata at creation time. This eliminates
convoluted runtime fallback logic (`agent_name`, hostname extraction) —
the title is persisted once and read directly.

## Test plan

- [x] Unit/integration tests pass (68 MCP + 11 graph = 79 tests)
- [x] Manual: MCP block with public server (DeepWiki) — no credentials
needed, tools discovered and executable
- [x] Manual: MCP block with OAuth server (Linear, Sentry) — OAuth flow
prompts correctly
- [x] Manual: "Run agent" form shows correct credential requirements per
MCP server
- [x] Manual: Credential auto-selects when exactly one matches,
pre-selects first when multiple exist
- [x] Manual: Credential ordering stays stable when
selecting/deselecting
- [x] Manual: MCP block title persists after save and refresh
- [x] Manual: Agent block title persists after save and refresh (via
customized_name)
- [ ] Manual: Shared agent with MCP block prompts new user for
credentials

---------

Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Ubbe <hi@ubbe.dev>
2026-02-13 16:17:03 +00:00
Nicholas Tindle
cb166dd6fb feat(blocks): Store sandbox files to workspace (#12073)
Store files created by sandbox blocks (Claude Code, Code Executor) to
the user's workspace for persistence across runs.

### Changes 🏗️

- **New `sandbox_files.py` utility** (`backend/util/sandbox_files.py`)
  - Shared module for extracting files from E2B sandboxes
- Stores files to workspace via `store_media_file()` (includes virus
scanning, size limits)
  - Returns `SandboxFileOutput` with path, content, and `workspace_ref`

- **Claude Code block** (`backend/blocks/claude_code.py`)
  - Added `workspace_ref` field to `FileOutput` schema
  - Replaced inline `_extract_files()` with shared utility
  - Files from working directory now stored to workspace automatically

- **Code Executor block** (`backend/blocks/code_executor.py`)
  - Added `files` output field to `ExecuteCodeBlock.Output`
  - Creates `/output` directory in sandbox before execution
  - Extracts all files (text + binary) from `/output` after execution
- Updated `execute_code()` to support file extraction with
`extract_files` param

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Create agent with Claude Code block, have it create a file, verify
`workspace_ref` in output
- [x] Create agent with Code Executor block, write file to `/output`,
verify `workspace_ref` in output
  - [x] Verify files persist in workspace after sandbox disposal
- [x] Verify binary files (images, etc.) work correctly in Code Executor
- [x] Verify existing graphs using `content` field still work (backward
compat)

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

No configuration changes required - this is purely additive backend
code.

---

**Related:** Closes SECRT-1931

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds automatic extraction and workspace storage of sandbox-written
files (including binaries for code execution), which can affect output
payload size, performance, and file-handling edge cases.
> 
> **Overview**
> **Sandbox blocks now persist generated files to workspace.** A new
shared utility (`backend/util/sandbox_files.py`) extracts files from an
E2B sandbox (scoped by a start timestamp) and stores them via
`store_media_file`, returning `SandboxFileOutput` with `workspace_ref`.
> 
> `ClaudeCodeBlock` replaces its inline file-scraping logic with this
utility and updates the `files` output schema to include
`workspace_ref`.
> 
> `ExecuteCodeBlock` adds a `files` output and extends the executor
mixin to optionally extract/store files (text + binary) when an
`execution_context` is provided; related mocks/tests and docs are
updated accordingly.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
343854c0cf. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 15:56:59 +00:00
Reinier van der Leer
113e87a23c refactor(backend): Reduce circular imports (#12068)
I'm getting circular import issues because there is a lot of
cross-importing between `backend.data`, `backend.blocks`, and other
modules. This change reduces block-related cross-imports and thus risk
of breaking circular imports.

### Changes 🏗️

- Strip down `backend.data.block`
- Move `Block` base class and related class/enum defs to
`backend.blocks._base`
  - Move `is_block_auth_configured` to `backend.blocks._utils`
- Move `get_blocks()`, `get_io_block_ids()` etc. to `backend.blocks`
(`__init__.py`)
  - Update imports everywhere
- Remove unused and poorly typed `Block.create()`
  - Change usages from `block_cls.create()` to `block_cls()`
- Improve typing of `load_all_blocks` and `get_blocks`
- Move cross-import of `backend.api.features.library.model` from
`backend/data/__init__.py` to `backend/data/integrations.py`
- Remove deprecated attribute `NodeModel.webhook`
  - Re-generate OpenAPI spec and fix frontend usage
- Eliminate module-level `backend.blocks` import from `blocks/agent.py`
- Eliminate module-level `backend.data.execution` and
`backend.executor.manager` imports from `blocks/helpers/review.py`
- Replace `BlockInput` with `GraphInput` for graph inputs

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - CI static type-checking + tests should be sufficient for this
2026-02-12 12:07:49 +00:00
Otto
36aeb0b2b3 docs(blocks): clarify HumanInTheLoop output descriptions for agent builder (#12069)
## Problem

The agent builder (LLM) misinterprets the HumanInTheLoop block outputs.
It thinks `approved_data` and `rejected_data` will yield status strings
like "APPROVED" or "REJECTED" instead of understanding that the actual
input data passes through.

This leads to unnecessary complexity - the agent builder adds comparison
blocks to check for status strings that don't exist.

## Solution

Enriched the block docstring and all input/output field descriptions to
make it explicit that:
1. The output is the actual data itself, not a status string
2. The routing is determined by which output pin fires
3. How to use the block correctly (connect downstream blocks to
appropriate output pins)

## Changes

- Updated block docstring with clear "How it works" and "Example usage"
sections
- Enhanced `data` input description to explain data flow
- Enhanced `name` input description for reviewer context
- Enhanced `approved_data` output to explicitly state it's NOT a status
string
- Enhanced `rejected_data` output to explicitly state it's NOT a status
string
- Enhanced `review_message` output for clarity

## Testing

Documentation-only change to schema descriptions. No functional changes.

Fixes SECRT-1930

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Enhanced documentation for the `HumanInTheLoopBlock` to clarify how
output pins work. The key improvement explicitly states that output pins
(`approved_data` and `rejected_data`) yield the actual input data, not
status strings like "APPROVED" or "REJECTED". This prevents the agent
builder (LLM) from misinterpreting the block's behavior and adding
unnecessary comparison blocks.

**Key changes:**
- Added "How it works" and "Example usage" sections to the block
docstring
- Clarified that routing is determined by which output pin fires, not by
comparing output values
- Enhanced all input/output field descriptions with explicit data flow
explanations
- Emphasized that downstream blocks should be connected to the
appropriate output pin based on desired workflow path

This is a documentation-only change with no functional modifications to
the code logic.
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge with no risk
- Documentation-only change that accurately reflects the existing code
behavior. No functional changes, no runtime impact, and the enhanced
descriptions correctly explain how the block outputs work based on
verification of the implementation code.
- No files require special attention
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-02-11 15:43:58 +00:00
Nicholas Tindle
85b6520710 feat(blocks): Add video editing blocks (#11796)
<!-- Clearly explain the need for these changes: -->
This PR adds general-purpose video editing blocks for the AutoGPT
Platform, enabling automated video production workflows like documentary
creation, marketing videos, tutorial assembly, and content repurposing.

### Changes 🏗️

<!-- Concisely describe all of the changes made in this pull request:
-->

**New blocks added in `backend/blocks/video/`:**
- `VideoDownloadBlock` - Download videos from URLs (YouTube, Vimeo, news
sites, direct links) using yt-dlp
- `VideoClipBlock` - Extract time segments from videos with start/end
time validation
- `VideoConcatBlock` - Merge multiple video clips with optional
transitions (none, crossfade, fade_black)
- `VideoTextOverlayBlock` - Add text overlays/captions with positioning
and timing options
- `VideoNarrationBlock` - Generate AI narration via ElevenLabs and mix
with video audio (replace, mix, or ducking modes)

**Dependencies required:**
- `yt-dlp` - For video downloading
- `moviepy` - For video editing operations

**Implementation details:**
- All blocks follow the SDK pattern with proper error handling and
exception chaining
- Proper resource cleanup in `finally` blocks to prevent memory leaks
- Input validation (e.g., end_time > start_time)
- Test mocks included for CI

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Blocks follow the SDK pattern with
`BlockSchemaInput`/`BlockSchemaOutput`
  - [x] Resource cleanup is implemented in `finally` blocks
  - [x] Exception chaining is properly implemented
  - [x] Input validation is in place
  - [x] Test mocks are provided for CI environments

#### For configuration changes:
- [ ] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [ ] I have included a list of my configuration changes in the PR
description (under **Changes**)

N/A - No configuration changes required.


<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds new multimedia blocks that invoke ffmpeg/MoviePy and introduces
new external dependencies (plus container packages), which can impact
runtime stability and resource usage; download/overlay blocks are
present but disabled due to sandbox/policy concerns.
> 
> **Overview**
> Adds a new `backend.blocks.video` module with general-purpose video
workflow blocks (download, clip, concat w/ transitions, loop, add-audio,
text overlay, and ElevenLabs-powered narration), including shared
utilities for codec selection, filename cleanup, and an ffmpeg-based
chapter-strip workaround for MoviePy.
> 
> Extends credentials/config to support ElevenLabs
(`ELEVENLABS_API_KEY`, provider enum, system credentials, and cost
config) and adds new dependencies (`elevenlabs`, `yt-dlp`) plus Docker
runtime packages (`ffmpeg`, `imagemagick`).
> 
> Improves file/reference handling end-to-end by embedding MIME types in
`workspace://...#mime` outputs and updating frontend rendering to detect
video vs image from MIME fragments (and broaden supported audio/video
extensions), with optional enhanced output rendering behind a feature
flag in the legacy builder UI.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
da7a44d794. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com>
Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 22:22:33 +00:00
Bently
bfa942e032 feat(platform): Add Claude Opus 4.6 model support (#11983)
## Summary
Adds support for Anthropic's newly released Claude Opus 4.6 model.

## Changes
- Added `claude-opus-4-6` to the `LlmModel` enum
- Added model metadata: 200K context window (1M beta), **128K max output
tokens**
- Added block cost config (same pricing tier as Opus 4.5: $5/MTok input,
$25/MTok output)
- Updated chat config default model to Claude Opus 4.6

## Model Details
From [Anthropic's
docs](https://docs.anthropic.com/en/docs/about-claude/models):
- **API ID:** `claude-opus-4-6`
- **Context window:** 200K tokens (1M beta)
- **Max output:** 128K tokens (up from 64K on Opus 4.5)
- **Extended thinking:** Yes
- **Adaptive thinking:** Yes (new, Opus 4.6 exclusive)
- **Knowledge cutoff:** May 2025 (reliable), Aug 2025 (training)
- **Pricing:** $5/MTok input, $25/MTok output (same as Opus 4.5)

---------

Co-authored-by: Toran Bruce Richards <toran.richards@gmail.com>
2026-02-05 19:19:51 +00:00
Bently
3ca2387631 feat(blocks): Implement Text Encode block (#11857)
## Summary
Implements a `TextEncoderBlock` that encodes plain text into escape
sequences (the reverse of `TextDecoderBlock`).

## Changes

### Block Implementation
- Added `encoder_block.py` with `TextEncoderBlock` in
`autogpt_platform/backend/backend/blocks/`
- Uses `codecs.encode(text, "unicode_escape").decode("utf-8")` for
encoding
- Mirrors the structure and patterns of the existing `TextDecoderBlock`
- Categorised as `BlockCategory.TEXT`

### Documentation
- Added Text Encoder section to
`docs/integrations/block-integrations/text.md` (the auto-generated docs
file for TEXT category blocks)
- Expanded "How it works" with technical details on the encoding method,
validation, and edge cases
- Added 3 structured use cases per docs guidelines: JSON payload
preparation, Config/ENV generation, Snapshot fixtures
- Added Text Encoder to the overview table in
`docs/integrations/README.md`
- Removed standalone `encoder_block.md` (TEXT category blocks belong in
`text.md` per `CATEGORY_FILE_MAP` in `generate_block_docs.py`)

### Documentation Formatting (CodeRabbit feedback)
- Added blank lines around markdown tables (MD058)
- Added `text` language tags to fenced code blocks (MD040)
- Restructured use case section with bold headings per coding guidelines

## How Docs Were Synced
The `check-docs-sync` CI job runs `poetry run python
scripts/generate_block_docs.py --check` which expects blocks to be
documented in category-grouped files. Since `TextEncoderBlock` uses
`BlockCategory.TEXT`, the `CATEGORY_FILE_MAP` maps it to `text.md` — not
a standalone file. The block entry was added to `text.md` following the
exact format used by the generator (with `<!-- MANUAL -->` markers for
hand-written sections).

## Related Issue
Fixes #11111

---------

Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: lif <19658300+majiayu000@users.noreply.github.com>
Co-authored-by: Aryan Kaul <134673289+aryancodes1@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: Nick Tindle <nick@ntindle.com>
2026-02-05 17:31:02 +00:00
Otto
4f908d5cb3 fix(platform): Improve Linear Search Block [SECRT-1880] (#11967)
## Summary

Implements [SECRT-1880](https://linear.app/autogpt/issue/SECRT-1880) -
Improve Linear Search Block

## Changes

### Models (`models.py`)
- Added `State` model with `id`, `name`, and `type` fields for workflow
state information
- Added `state: State | None` field to `Issue` model

### API Client (`_api.py`)
- Updated `try_search_issues()` to:
- Add `max_results` parameter (default 10, was ~50) to reduce token
usage
  - Add `team_id` parameter for team filtering
- Return `createdAt`, `state`, `project`, and `assignee` fields in
results
- Fixed `try_get_team_by_name()` to return descriptive error message
when team not found instead of crashing with `IndexError`

### Block (`issues.py`)
- Added `max_results` input parameter (1-100, default 10)
- Added `team_name` input parameter for optional team filtering
- Added `error` output field for graceful error handling
- Added categories (`PRODUCTIVITY`, `ISSUE_TRACKING`)
- Updated test fixtures to include new fields

## Breaking Changes

| Change | Before | After | Mitigation |
|--------|--------|-------|------------|
| Default result count | ~50 | 10 | Users can set `max_results` up to
100 if needed |

## Non-Breaking Changes

- `state` field added to `Issue` (optional, defaults to `None`)
- `max_results` param added (has default value)
- `team_name` param added (optional, defaults to `None`)
- `error` output added (follows established pattern from GitHub blocks)

## Testing

- [x] Format/lint checks pass
- [x] Unit test fixtures updated

Resolves SECRT-1880

---------

Co-authored-by: Toran Bruce Richards <toran.richards@gmail.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Toran Bruce Richards <Torantulino@users.noreply.github.com>
2026-02-04 22:54:46 +00:00
Otto
7ee94d986c docs: add credentials prerequisites to create-basic-agent guide (#11913)
## Summary
Addresses #11785 - users were encountering `openai_api_key_credentials`
errors when following the create-basic-agent guide because it didn't
mention the need to configure API credentials before using AI blocks.

## Changes
Added a **Prerequisites** section to
`docs/platform/create-basic-agent.md` explaining:
- **Cloud users:** Go to Profile → Integrations to add API keys
- **Self-hosted (Docker):** Add keys to `autogpt_platform/backend/.env`
and restart services

Also added a note that the Calculator example doesn't need credentials,
making it a good first test.

## Related
- Issue: #11785
2026-01-31 03:05:31 +00:00