AutoGPT

mirror of https://github.com/Significant-Gravitas/AutoGPT.git synced 2026-04-30 03:00:41 -04:00

Author	SHA1	Message	Date
An Vy Le	3aa72b4245	feat(backend/copilot): inline picker-backed inputs via run_block + accept AgentInputBlock subclasses (#12880)

### Why / What / How

Why: Resolves #12875. CoPilot's agent-builder was hardcoding Google Drive file IDs into consuming blocks' `input_default` instead of wiring an `AgentGoogleDriveFileInputBlock`. A beta user hit this across 13 saved versions of one agent. Root causes:

1. `validate_io_blocks` only accepted the literal base `AgentInputBlock` / `AgentOutputBlock` IDs, so even when CoPilot used a specialized subclass like `AgentGoogleDriveFileInputBlock` as the only input, the validator forced it to keep a throwaway base alongside — entrenching the anti-pattern.
2. Running a Drive consumer directly via CoPilot's `run_block` silently failed because the auto-credentials flow (picker attaches `_credentials_id`) existed only in the graph executor, never in CoPilot's direct-execution path.
3. Drive picker guidance lived in `agent_generation_guide.md` instead of on the blocks themselves, so it duplicated and drifted from the code.
4. Observed in a live session: when asked to read a private sheet, CoPilot refused with "share publicly or use the builder" instead of calling `run_block` and letting the picker render — the prompt rule was buried and the fallback path (omitted required picker field) returned a generic schema preview.

What: Four coordinated platform + CoPilot improvements. No block-specific validator rules, no Drive-specific code in UI or prompt.

How:

#### 1. `validate_io_blocks` subclass support

Accepts any block with `uiType == "Input"` / `"Output"` (populated from `Block.block_type` at registration). `AgentGoogleDriveFileInputBlock`, `AgentDropdownInputBlock`, `AgentTableInputBlock`, etc. stand alone. Base-ID fallback preserved for call sites that pass a minimal blocks list.

#### 2. Inline picker via `run_block`

- Extracted `_acquire_auto_credentials` from `backend/executor/manager.py` into shared `backend/executor/auto_credentials.py` (exports `acquire_auto_credentials` + `MissingAutoCredentialsError`).
- Wired it into `backend/copilot/tools/helpers.py::execute_block`. When `_credentials_id` is present, the block executes with creds injected (chained flows work). When missing/null, `execute_block` returns the existing `SetupRequirementsResponse` — frontend's `FormRenderer` renders the picker inline via the existing `GoogleDrivePickerField`/`GoogleDrivePickerInput`. On pick, the LLM re-invokes `run_block` with the populated input — same continuation pattern as OAuth-missing-credentials. No new response types, no new continuation tool, no new frontend component.
- `run_block` now short-circuits to `SetupRequirementsResponse` when missing required fields include a picker-backed field, skipping the schema-preview round trip the LLM would otherwise take.
- `get_inputs_from_schema` spreads the full property schema (`schema`) instead of whitelisting — any `format` / `json_schema_extra` / custom widget config flows through to the generic custom-field dispatch on the frontend. Future picker formats (date pickers, file pickers, etc.) work without backend changes.
- Frontend `SetupRequirementsCard/helpers.ts` uses index-signature passthrough for arbitrary schema keys — no widget-specific code in that layer.

#### 3. `validate_only` parameter on `run_block`

`run_block(id, {})` is not always a safe probe — for blocks with zero required inputs, it executes. New `validate_only: true` parameter returns `BlockDetailsResponse` (schema + missing-input list) without executing, rendering picker cards, or charging credits. Same response shape as the existing schema preview — no new branch, just an extra condition on the existing one. The LLM uses this for pre-flight when it's unsure whether a block has required inputs.

#### 4. Block-local picker guidance

Agent-generation picker guidance relocated from the guide onto the blocks themselves — surfaced at `find_block` time, exactly when the LLM decides to wire a picker-backed consumer:

- `GoogleDriveFileField` (shared factory for every Drive field on Sheets/Docs/etc.) appends a standard hint to the caller's description covering: feed from the specialized input block, never hardcode an ID (even one parsed from a URL), picker is the only credential source.
- `AgentGoogleDriveFileInputBlock`'s block description now covers when it's required, the `allowed_views` mapping, wiring direction, and a concrete link-shape example.
- `agent_generation_guide.md` loses the dedicated 71-line Drive section. The IO-blocks section now tells the LLM specialized subclasses satisfy the requirement and carry their own usage guidance in block/field descriptions — read them when `find_block` surfaces a match.
- New "Picker-backed inputs via `run_block`" section in the CoPilot prompt, written generically (picker fields detected via `format` / `auto_credentials` schema hints, no provider names hardcoded) — covers: don't ask the user for URLs/IDs, don't refuse private-resource asks, chained picker objects pass through as-is.
- Sharpened `MissingAutoCredentialsError` message so when a bare ID reaches execution, the error explicitly tells the LLM the picker renders inline (not "ask the user for something").

### Changes 🏗️

- `backend/copilot/tools/agent_generator/validator.py` — `_collect_io_block_ids` + subclass-aware `validate_io_blocks`.
- `backend/executor/auto_credentials.py` (new) — shared `acquire_auto_credentials` + `MissingAutoCredentialsError`.
- `backend/executor/manager.py` — imports from the shared module, drops the local copy.
- `backend/copilot/tools/helpers.py` — `execute_block` calls `acquire_auto_credentials`, merges kwargs, releases locks in `finally`, returns `SetupRequirementsResponse` on missing creds. `get_inputs_from_schema` spreads the full property schema.
- `backend/copilot/tools/run_block.py` — picker-field short-circuit + `validate_only` parameter.
- `backend/copilot/prompting.py` — "Picker-backed inputs via `run_block`" + "Pre-flight with `validate_only`" sections.
- `backend/blocks/google/_drive.py` — `GoogleDriveFileField` appends the agent-builder hint to every Drive consumer's description.
- `backend/blocks/io.py` — `AgentGoogleDriveFileInputBlock` description expanded.
- `backend/copilot/sdk/agent_generation_guide.md` — Drive section removed, IO-blocks subclass note expanded.
- `frontend/.../SetupRequirementsCard/helpers.ts` — index-signature passthrough for arbitrary schema keys; schema fields propagate into the generated RJSF schema.
- Tests: new `TestExecuteBlockAutoCredentials` (4 cases) + `validate_only` + picker-short-circuit cases in `run_block_test.py`; `manager_auto_credentials_test.py` moved to new import path; 6 new frontend cases in `SetupRequirementsCard/__tests__/helpers.test.ts` covering schema passthrough.
- Also: one-line hoist of `import secrets` in `backend/integrations/managed_providers/ayrshare.py` — ruff E402 introduced by #12883 was blocking our lint post-merge.

### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Backend unit suites: validator_test (48), helpers_test (40), run_block_test (19), manager_auto_credentials_test (15) — all green
  - [x] Frontend `SetupRequirementsCard` helpers — 75/75 pass (including 6 new passthrough cases)
  - [x] `poetry run format` (ruff + isort + black) clean on touched files (pre-existing pyright errors in unrelated `graphiti_core` / `StreamEvent` / etc. files not introduced by this PR)
  - [x] Live CoPilot chat on dev-builder confirmed the setup card renders `custom/google_drive_picker_field` for a Drive consumer block called via `run_block`
  - [x] Live agent-generation confirmed CoPilot creates a subclass-only agent (`AgentGoogleDriveFileInputBlock` → `GoogleSheetsReadBlock` → `AgentOutputBlock`) with no throwaway base `AgentInputBlock`

#### For configuration changes:

- [x] N/A — no config changes

---------

Co-authored-by: majdyz <zamil.majdy@agpt.co>	2026-04-24 13:05:11 +07:00
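The order of checks described for `run_block` (`validate_only` first, then the picker short-circuit, then the existing schema preview, otherwise execute) can be sketched without any real I/O. This is a minimal sketch under stated assumptions: `RunBlockDecision`, `is_picker_field`, and `decide_run_block` are hypothetical names, the rule "a property is picker-backed if it carries an `auto_credentials` key or a `format` containing `picker`" is an assumed stand-in for the real schema hints, and the real tool returns `BlockDetailsResponse` / `SetupRequirementsResponse` objects rather than strings.

```python
from dataclasses import dataclass, field


@dataclass
class RunBlockDecision:
    """Hypothetical outcome of one run_block call."""

    kind: str  # "details" (schema preview), "setup" (render picker), or "execute"
    missing: list[str] = field(default_factory=list)


def is_picker_field(prop_schema: dict) -> bool:
    """Assumed detection rule based on schema hints, with no provider names."""
    fmt = prop_schema.get("format") or ""
    return "auto_credentials" in prop_schema or "picker" in str(fmt)


def decide_run_block(
    schema: dict, input_data: dict, validate_only: bool = False
) -> RunBlockDecision:
    """Pick the response kind in the order described in the PR."""
    properties = schema.get("properties", {})
    # A required field counts as missing when absent or explicitly null.
    missing = [
        name for name in schema.get("required", []) if input_data.get(name) is None
    ]
    # validate_only never executes, renders picker cards, or charges credits.
    if validate_only:
        return RunBlockDecision("details", missing)
    # A missing picker-backed field goes straight to the setup card.
    if any(is_picker_field(properties.get(name, {})) for name in missing):
        return RunBlockDecision("setup", missing)
    # Any other missing field falls back to the existing schema preview.
    if missing:
        return RunBlockDecision("details", missing)
    return RunBlockDecision("execute")
```

With a `spreadsheet` property carrying an `auto_credentials` hint, `decide_run_block(schema, {})` yields the `"setup"` decision, while the same call with `validate_only=True` yields `"details"`. Keeping the detection rule generic (schema hints rather than block IDs) mirrors the PR's stated goal of having no Drive-specific branches.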
Zamil Majdy	80bfde1ca6	feat(blocks): charge Ayrshare per-post + align Bannerbear/Jina floors (#12893)

## Why

The cost-tracking audit on 2026-04-23 ([Platform System Credentials](https://www.notion.so/auto-gpt/4d251f343fe146bcb91b6a037d1bfc3c)) surfaced three gaps where third-party spend silently bypassed the user wallet:

1. Ayrshare (13 blocks) — zero charge on every social post. No `BLOCK_COSTS` entry, no SDK `.with_base_cost` registration. Platform absorbs the entire ~$149/mo Business plan.
2. Bannerbear — flat 1 credit/call below the ~$0.025/image unit cost on the Starter tier ($49/mo / 2K images).
3. JinaChunkingBlock — wallet-free; siblings (`JinaEmbeddingBlock`, `SearchTheWebBlock`) are charged.

## What

- New `backend/blocks/ayrshare/_cost.py` with two-tier `AYRSHARE_POST_COSTS` (5 credits when `is_video=True`, 2 credits otherwise — first-match wins in `block_usage_cost`).
- All 13 `PostToBlock` classes decorated with `@cost(AYRSHARE_POST_COSTS)`.
- `BannerbearTextOverlayBlock` floor: 1 → 3 credits in `bannerbear/_config.py`.
- `JinaChunkingBlock` added to `BLOCK_COSTS` with a flat 1-credit floor.
- `cost(...)` decorator generic-ized via `TypeVar`, so pyright retains `PostToXBlock.Input/Output` narrowing.

## How

Ayrshare uses a decorator-based registration (not a direct `BLOCK_COSTS` entry) because each `post_to_.py` block imports from `backend.sdk`, and `backend.sdk.cost_integration` imports `BLOCK_COSTS` — listing the blocks in `block_cost_config.py` would create a circular import. The `@cost` decorator defined in `sdk/cost_integration.py` was already the approved escape hatch for this exact shape.

`cost_filter` in `block_usage_cost` already supports boolean-field matching (see Apollo's `enrich_info` tier), so `{"is_video": True}` and `{"is_video": False}` select the right tier at execution time. `is_video` defaults to `False` on `BaseAyrshareInput`, so posts that omit the field still land on the 2-credit default.
## Test plan

- [x] `poetry run pytest backend/data/block_cost_config_test.py` — new 6-test suite covers Ayrshare video/non-video/default tiers, the Bannerbear floor, and the Jina chunking floor
- [x] `poetry run pytest backend/executor/manager_cost_tracking_test.py` — no regressions (45 pre-existing tests still pass)
- [x] `poetry run ruff format` + `poetry run isort` + `poetry run ruff check --fix`
- [x] `poetry run pyright` on touched files — 0 errors, 0 warnings (pre-existing `LlmModel.KIMI_K2_` errors are on dev and unrelated)
- [ ] Manual: run an Ayrshare post through the builder and confirm 2cr (text/image) vs 5cr (video) charge	2026-04-23 20:39:35 +07:00
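The tier selection and decorator registration described above can be sketched in plain Python. `BlockCost`, `BLOCK_COSTS`, `cost`, and `block_usage_cost` below are simplified, hypothetical stand-ins (the real definitions in `block_cost_config.py` and `sdk/cost_integration.py` take different arguments), and `PostToXBlock` is a toy class with an assumed `input_defaults` attribute. The sketch only shows first-match tier selection and why an omitted `is_video` lands on the 2-credit tier.

```python
from dataclasses import dataclass, field
from typing import Callable, TypeVar


@dataclass
class BlockCost:
    """Simplified stand-in: a credit amount plus the input values it applies to."""

    cost_amount: int
    cost_filter: dict = field(default_factory=dict)


BLOCK_COSTS: dict[type, list[BlockCost]] = {}
BlockT = TypeVar("BlockT", bound=type)

# Two tiers; order matters because the first matching filter wins.
AYRSHARE_POST_COSTS = [
    BlockCost(cost_amount=5, cost_filter={"is_video": True}),
    BlockCost(cost_amount=2, cost_filter={"is_video": False}),
]


def cost(costs: list[BlockCost]) -> Callable[[BlockT], BlockT]:
    """Register costs for a block class and return the same class unchanged.

    Typing the decorator with BlockT lets type checkers keep the concrete
    subclass (and its nested attributes) instead of widening it to `type`.
    """

    def decorator(block_cls: BlockT) -> BlockT:
        BLOCK_COSTS[block_cls] = list(costs)
        return block_cls

    return decorator


def block_usage_cost(block_cls: type, input_data: dict, defaults: dict) -> int:
    """Return the first tier whose filter matches the effective inputs, else 0."""
    effective = {**defaults, **input_data}  # omitted fields fall back to defaults
    for tier in BLOCK_COSTS.get(block_cls, []):
        if all(effective.get(key) == value for key, value in tier.cost_filter.items()):
            return tier.cost_amount
    return 0


@cost(AYRSHARE_POST_COSTS)
class PostToXBlock:
    input_defaults = {"is_video": False}
```

Calling `block_usage_cost(PostToXBlock, {}, PostToXBlock.input_defaults)` returns 2 because the default `is_video=False` matches the second tier, and an unregistered class returns 0.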
Zamil Majdy	39cdc0a5e0	fix(backend/copilot): tame Kimi compaction storm + tunable threshold + Langfuse cost backfill (#12889)

## Why

Investigation of two reported sessions ([85804387](https://dev-builder.agpt.co/copilot?sessionId=85804387-7708-4fdc-8ec9-64283cdd902d), [19d69dec](https://dev-builder.agpt.co/copilot?sessionId=19d69dec-210f-4439-a94b-2d7d443b9909)) where Kimi K2.6 via OpenRouter was running ~30 min per turn with no actions completed (Discord report from Toran). Langfuse traces showed:

- 31 generation calls per turn at p90 = 151s, max = 415s
- 2.57M uncached tokens, `cache_create=0`, ~4% cache_read — Moonshot's OpenRouter endpoint silently drops Anthropic-style cache writes
- 3 SDK-internal compactions per turn — each compaction is itself a slow LLM round-trip
- Reconciled OpenRouter cost was being recorded to a DB row but never surfaced on the Langfuse trace, leaving operators to grep pod logs

## What

Four commits, split by concern.

### 1. `fix(backend/copilot): skip CLAUDE_AUTOCOMPACT_PCT_OVERRIDE for Moonshot/Kimi` (`5fd9c5aa`)

`env.py` was unconditionally setting `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=50` (introduced in #12747 to cap cache-creation cost on Anthropic where context >200K = 54% of total cost). On Kimi, where `cache_create` is silently 0, the cache-cost rationale doesn't apply — but the 50% threshold still made the bundled CLI auto-compact at ~100K tokens, triggering 3+ compactions per turn against Kimi's larger effective window. Each compaction added a slow LLM round-trip (one in our test ran 166s and burned the budget cap before the user got any output).

Threads the resolved `sdk_model` (and `fallback_model`) into `build_sdk_env` and skips the env var when the model matches `is_moonshot_model(...)`. The CLI then uses its default ~93% threshold, cutting compaction passes to 0–1.

### 2. `feat(backend/copilot): backfill OpenRouter reconciled cost to Langfuse trace` (`f3de3624` + follow-ups `5ce3d038`, `d2c1a2cd`, `d8e08525`, `d243bf6c9`)

`record_turn_cost_from_openrouter` runs as a fire-and-forget task after the OTel span closes, so the Langfuse trace UI showed the SDK CLI's rate-card estimate only — for non-Anthropic OpenRouter routes that estimate is Sonnet pricing on Kimi tokens (~5x too high). The backfill captures `langfuse.get_current_trace_id()` and threads it into the reconcile task, which emits an `openrouter-cost-reconcile` child event with the authoritative cost + token usage.

Bug caught during /pr-test: `propagate_attributes` only annotates an existing OTel span; it doesn't create one. By the time the `finally` block runs, SDK-emitted spans have ended and `get_current_trace_id()` returns None. Fixed in `d8e08525` by wrapping the turn in `langfuse.start_as_current_span(name="copilot-sdk-turn")`. Also tags fallback-path events with `cost_source` so operators can distinguish reconciled vs estimated turns.

### 3. `feat(backend/copilot): expose CLAUDE_AUTOCOMPACT_PCT_OVERRIDE as a config knob` (`72416f73`)

The previously-hardcoded `50` is now `claude_agent_autocompact_pct_override` (default 50, env `CHAT_CLAUDE_AGENT_AUTOCOMPACT_PCT_OVERRIDE`). Setting it to 0 omits the env var entirely so the CLI uses its native ~93% threshold — useful when the post-compact floor (system prompt + tool defs ≈ 65–110K) sits close to an aggressive trigger and operators see back-to-back compaction cascades. Moonshot routes still skip the env var unconditionally regardless of config.

### 4. `fix(backend/copilot): align SDK retry compaction target with CLI autocompact threshold` (`730ad256`)

`_reduce_context` was calling `compact_transcript` without an explicit `target_tokens`, so it fell back to `get_compression_target(model) = context_window - 60K`. For Sonnet 200K that's 140K — well above the CLI's PCT=50 trigger of 90K — and for Kimi 256K it's 196K, above the CLI's default 167K trigger. Result: a successful retry compaction landed at 140K/196K and the CLI immediately re-compacted on the next call → two compactions per recovered turn.

New `_compaction_target_tokens(model)` mirrors the CLI's `i6_()` formula (`min(window * pct/100, window - 13K)`) with a 20K safety buffer so the post-compact context sits comfortably below the CLI's trigger.

## How — empirical validation against the actual long Kimi transcript

Replayed the 199-message transcript from session 85804387 through the bundled CLI in two configurations:

| | Post-fix (no override) | Pre-fix (`PCT_OVERRIDE=50`) |
|---|---|---|
| `autocompact: tokens=` | 126,312 | 126,341 |
| `threshold=` | 167,000 | 90,000 |
| Decision | 126K < 167K → skip | 126K > 90K → COMPACTION FIRES |
| Duration | 21s | 166s (8x slower) |
| Cost | $0.34 | $0.82 (2.4x more) |
| Output | PONG (success) | empty (hit $0.50 budget cap, exit 1) |

The pre-fix configuration burned $0.82 of compaction work over 166s and never produced a user response — exactly the failure mode reported.

Why the cascade happens at 50%, not at 93%: post-compaction context is `summary (~5–10K) + system_prompt + tool_definitions + skills + active TodoWrite + memory ≈ 65–110K floor`. With the trigger at 90K, the post-compact floor sits AT or above the trigger → next assistant message tips over → immediate re-compaction → cascade until the CLI's rapid-refill breaker trips at 3 attempts. With the trigger at 167K, the same floor sits comfortably below the trigger → no cascade.

## Considered but not done

- Force `cache_control` markers to reach Moonshot: bundled CLI sends them by default; Moonshot silently drops them per their own docs (uses `X-Msh-Context-Cache` headers, not body markers). Real fix needs bypassing OpenRouter — out of scope.
- Slim the system prompt + tool definitions to lower the post-compact floor: real win but separate refactor with tool-use accuracy A/B.
- LD-driven auto-fallback to Sonnet on Kimi degradation: `claude_agent_fallback_model` already wires `--fallback-model` for overload (529); auto-flipping on slowness needs latency aggregation infra that doesn't exist yet.

## Test plan

- [x] `poetry run pytest backend/copilot/sdk/env_test.py backend/copilot/sdk/openrouter_cost_test.py backend/copilot/sdk/service_helpers_test.py` — 111 passed (37 env + 23 cost + 51 helpers, including 6 new env tests, 3 backfill tests, 6 new compaction-target tests)
- [x] `poetry run pytest backend/copilot/sdk/` — 970+ passed
- [x] `poetry run pyright .` — 0 errors
- [x] `poetry run format` — clean
- [x] /pr-test --fix end-to-end against dev — 5/5 scenarios PASS, including Anthropic route ($0.0174 cost +0.0% delta) and Moonshot route ($0.028 vs $0.018 → +58.2% delta validates reconcile rationale)
- [x] Transcript replay validation: pre-fix vs post-fix on real 126K-token transcript → 8x slower / 2.4x more expensive / fails entirely on pre-fix; clean PONG on post-fix	2026-04-23 18:46:35 +07:00
Zamil Majdy	e3f6d36759	feat(backend/blocks): register 13 paid blocks + document credit/microdollar wallet boundary (#12876)

### Why / What / How

Why. Audit of `BLOCK_COSTS` against `credentials_store.py` system credentials revealed 13 paid blocks running for free from the credit wallet's perspective — `BLOCK_COSTS.get(type(block))` returned `None`, `cost = 0`, no `spend_credits` deduction. Users without their own API key consumed system credentials with zero credit drain.

Separately, the credit wallet (user-facing prepaid balance) and the copilot microdollar counter (operator-side meter that gates `daily_cost_limit_microdollars`) were never documented as separate systems, so future readers kept tripping on the "why isn't this block charging my limit?" question.

What. Three deltas, all credit-wallet-side:

- Register the 13 paid blocks in `BLOCK_COSTS` with reasonable per-call credit prices (1 credit = $0.01). Pricing researched against the providers' published rates with ~2-3x markup.
- Document the credit/microdollar boundary in `copilot/rate_limit.py`: credits = user-facing prepaid wallet with marketplace-creator charging; microdollars = operator-side meter that only ticks on copilot LLM turns (baseline / SDK / web_search / simulator). Block execution bills credits, not microdollars — explicit contract.
- Populate `provider_cost` on PerplexityBlock so PlatformCostLog rows carry the real OpenRouter `x-total-cost` value via the existing `executor/cost_tracking.log_system_credential_cost` path (separate flow from credit deduction).
### Block costs registered

| Provider | Block | Credits | Raw cost / markup |
|---|---|---|---|
| Perplexity (OpenRouter) | PerplexityBlock — Sonar | 1 | $0.001-0.005 / call |
| | PerplexityBlock — Sonar Pro | 5 | $0.025 / call |
| | PerplexityBlock — Sonar Deep Research | 10 | up to $0.05 / call |
| Jina | FactCheckerBlock | 1 | $0.005 / call |
| Mem0 | AddMemoryBlock | 1 | $0.0004 / call (1c floor) |
| | SearchMemoryBlock | 1 | $0.004 / call |
| | GetAllMemoriesBlock | 1 | $0.004 / call |
| | GetLatestMemoryBlock | 1 | $0.004 / call |
| ScreenshotOne | ScreenshotWebPageBlock | 2 | $0.0085 / call (2.4x) |
| Nvidia | NvidiaDeepfakeDetectBlock | 2 | est $0.005 (no public SKU) |
| Smartlead | CreateCampaignBlock | 2 | $0.0065 send-equivalent (3x) |
| | AddLeadToCampaignBlock | 1 | $0.0065 (1.5x) |
| | SaveCampaignSequencesBlock | 1 | config-only |
| ZeroBounce | ValidateEmailsBlock | 2 | $0.008 / email (2.5x) |
| E2B + Anthropic | ClaudeCodeBlock | 100 | $0.50-$2 / typical session (E2B sandbox + in-sandbox Claude) |

Not in scope — already covered via the SDK `ProviderBuilder.with_base_cost()` pattern in their respective `_config.py`: Exa, Linear, Airtable, Bannerbear, Wolfram, Firecrawl, Wordpress, Baas, Stagehand, Dataforseo.

### How

1. `backend/data/block_cost_config.py` — 13 new `BlockCost` entries (3 Perplexity models + Fact Checker + 11 from this round).
2. `backend/copilot/rate_limit.py` — boundary docstring.
3. `backend/blocks/perplexity.py` — populate `NodeExecutionStats.provider_cost` so PlatformCostLog rows carry the real OpenRouter `x-total-cost` value.
4. Tests — `TestUnregisteredBlockRunsFree` regression + `TestNewlyRegisteredBlockCosts` pinning every new entry by `cost_amount` so a future refactor can't quietly drop one.
The companion Notion "Platform System Credentials" database has been updated with a new `Platform Credit Cost` column populated across all 30 provider rows.

### Scope trim

An earlier revision piped block execution cost into the copilot microdollar counter via `_record_block_microdollar_cost` in `copilot/tools/helpers.py::execute_block`. That was reverted in `16ae0f7b5` — the microdollar counter stays scoped to copilot LLM turns only, and the credit wallet handles block execution. The pipe-through crossed a boundary we explicitly want to keep.

### Changes

- `backend/data/block_cost_config.py` — 13 × `BlockCost` entries across 7 providers.
- `backend/blocks/perplexity.py` — populate `provider_cost` on the execution stats (feeds PlatformCostLog).
- `backend/copilot/rate_limit.py` — boundary docstring only (no behaviour change).
- `backend/copilot/tools/helpers_test.py` — `TestUnregisteredBlockRunsFree` + `TestNewlyRegisteredBlockCosts` (8 new regression tests).
- `backend/blocks/block_cost_tracking_test.py` — provider-cost extraction pins.

### Checklist

For code changes:

- [x] Changes listed above
- [x] Test plan below
- [x] Tested according to the test plan:
  - [x] `poetry run pytest backend/copilot/tools/helpers_test.py backend/copilot/tools/run_block_test.py backend/copilot/tools/continue_run_block_test.py backend/blocks/block_cost_tracking_test.py backend/blocks/test/test_perplexity.py` — passes
  - [x] `poetry run pytest backend/executor/manager_cost_tracking_test.py backend/copilot/rate_limit_test.py backend/copilot/token_tracking_test.py` — passes (confirms docstring edits didn't regress the LLM-turn microdollar path)
  - [x] Pyright clean on all touched files
  - [ ] Manual: run PerplexityBlock via copilot `run_block` — credits deduct, PlatformCostLog row visible with `provider_cost`, no microdollar-counter tick.
  - [ ] Manual: run an unregistered block via copilot — no error, no credit drain, no silent billing.
  - [ ] Manual: run ClaudeCodeBlock via builder — 100 credits deducted from wallet.

### Companion PR

PR #12873 ships the copilot microdollar / rate-limit work (web_search cost, simulator cost, reasoning / reconnect fixes). This PR is credit-wallet only.	2026-04-22 12:03:02 +07:00
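As a quick check on the pricing table, the markup figures in parentheses equal the charged amount (credits × $0.01, as stated in the PR) divided by the raw per-call cost. The sketch below recomputes a few rows with `decimal` to avoid float noise. It also models the "unregistered block runs free" lookup with a plain dict. `credits_for`, `effective_markup`, and the string-keyed registry are hypothetical illustrations; the real `BLOCK_COSTS` is keyed by block class and holds `BlockCost` entries.

```python
from decimal import ROUND_HALF_UP, Decimal

USD_PER_CREDIT = Decimal("0.01")  # 1 credit = $0.01, per the PR description


def credits_for(block_name: str, registry: dict[str, int]) -> int:
    """Unregistered blocks cost 0 credits, like BLOCK_COSTS.get(...) returning None."""
    return registry.get(block_name) or 0


def effective_markup(credits: int, raw_cost_usd: str) -> Decimal:
    """Charged dollars divided by the provider's raw per-call cost, to one decimal."""
    charged = USD_PER_CREDIT * credits
    ratio = charged / Decimal(raw_cost_usd)
    return ratio.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
```

For example, `effective_markup(2, "0.0085")` gives `Decimal("2.4")` for ScreenshotOne and `effective_markup(2, "0.008")` gives `Decimal("2.5")` for ZeroBounce, matching the table. Passing the raw cost as a string keeps `Decimal` from inheriting binary float error.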
Zamil Majdy	fcaebd1bb7	refactor(backend/copilot): unified queue-backed copilot turns + async sub-AutoPilot + guide-read gate (#12841)

### Why / What / How

Why: the 10-min stream-level idle timeout was killing legitimate long-running tool calls — notably sub-AutoPilot runs via `run_block(AutoPilotBlock)`, which routinely take 15–45 min. The symptom users saw was `"A tool call appears to be stuck"` even though AutoPilot was actively working. A fix for a second long-standing rough edge ships alongside: agents often skipped `get_agent_building_guide` when generating agent JSON, producing schemas that failed validation and burned turns on auto-fix loops.

What: three threaded pieces.

1. Async sub-AutoPilot via `run_sub_session`. New copilot tool that delegates a task to a fresh (or resumed) sub-AutoPilot, and its companion `get_sub_session_result` for polling/cancelling. The agent starts with `run_sub_session(prompt, wait_for_result≤300s)` and, if the sub isn't done inside the cap, receives a handle + polls via `get_sub_session_result(wait_if_running≤300s)`. No single MCP call ever blocks the stream for more than 5 min, so the 10-min stream-idle timer stays simple and effective (derived as `MAX_TOOL_WAIT_SECONDS * 2`).
2. Queue-backed copilot turn dispatch — one code path for all three callers.
   - `run_sub_session` enqueues a `CoPilotExecutionEntry` on the existing `copilot_execution` exchange instead of spawning an in-process `asyncio.Task`.
   - `AutoPilotBlock.execute_copilot` (graph block) now uses the same queue instead of `collect_copilot_response` inline.
   - The HTTP SSE endpoint was already queue-backed.
   - All three share a single primitive: `run_copilot_turn_via_queue` → `create_session` → `enqueue_copilot_turn` → `wait_for_session_result`. The event-aggregation logic (`EventAccumulator`/`process_event`) is a shared module used by both the direct-stream path and the cross-process waiter.
   - Benefits: deploy/crash resilience (RabbitMQ redelivery survives worker restarts), natural load balancing across copilot_executor workers, sessions as first-class resources (UI users can open `/copilot?sessionId=<inner>` for any sub or AutoPilot block's session), and every future stream-level feature (pending-messages drain #12737, compaction policies, etc.) applies uniformly instead of bypassing graph-block sessions.
3. Guide-read gate on agent-generation tools. `create_agent` / `edit_agent` / `validate_agent_graph` / `fix_agent_graph` refuse until the session has called `get_agent_building_guide`. The pre-existing soft hint was routinely ignored; the gate makes the dependency enforceable. All four tool descriptions advertise the requirement in one tightened sentence ("Requires get_agent_building_guide first (refuses otherwise).") that stays under the 32000-char schema budget.

How:

#### Queue-backed sub-AutoPilot + AutoPilotBlock

- `sdk/session_waiter.py` — new module. `SessionResult` dataclass mirrors `CopilotResult`. `wait_for_session_result` subscribes to `stream_registry`, drains events via shared `process_event`, returns `(outcome, result)`. `wait_for_session_completion` is the cheaper outcome-only variant. `run_copilot_turn_via_queue` is the canonical three-step dispatch. Every exit path unsubscribes the listener.
- `sdk/stream_accumulator.py` — new module. `EventAccumulator`, `ToolCallEntry`, `process_event` extracted from `collect.py`. Both the direct-stream and cross-process paths now use the same fold logic.
- `tools/run_sub_session.py` / `tools/get_sub_session_result.py` — rewritten around the shared primitive. `sub_session_id` is now the sub's `ChatSession` id directly (no separate registry handle). Ownership re-verified on every call via `get_chat_session`. Cancel via `enqueue_cancel_task` on the existing `copilot_cancel` fan-out exchange.
- `blocks/autopilot.py` — `execute_copilot` replaced its inline `collect_copilot_response` with `run_copilot_turn_via_queue`. `SessionResult` carries response text, tool calls, and token usage back from the worker so no DB round-trip is needed. The block's public I/O contract (inputs, outputs, `ToolCallEntry` shape) is unchanged.
- `CoPilotExecutionEntry` gains a `permissions: CopilotPermissions | None` field forwarded to the worker's `stream_fn` so the sub's capability filter survives the queue hop. The processor passes it through to `stream_chat_completion_sdk` / `stream_chat_completion_baseline`.
- Deleted: `sdk/sub_session_registry.py` (module-level dict, done-callback, abandoned-task cap, `notify_shutdown_and_cancel_all`, `_reset_for_test`), plus the shutdown-notifier hook in `copilot_executor.processor.cleanup` — redundant under queue-backed execution.

#### `run_block` single-tool cap (3)

- `tools/helpers.execute_block` caps block execution at `MAX_TOOL_WAIT_SECONDS = 5 min` via `asyncio.wait_for` around the generator consumption.
- On timeout: logs `copilot_tool_timeout tool=run_block block=… block_id=… input_keys=… user=… session=… cap_s=…` (grep-friendly) and returns an `ErrorResponse` that redirects the LLM to `run_agent` / `run_sub_session`.
- Billing protection: `_charge_block_credits` is called in a `finally` guarded by `asyncio.shield` and marked `charge_handled` before the await so cancel-mid-charge doesn't double-bill and cancel-mid-generator-before-charge still settles via the finally.

#### Guide-read gate

- `helpers.require_guide_read(session, tool_name)` scans `session.messages` for any prior assistant tool call named `get_agent_building_guide` (handles both OpenAI and flat shapes). Applied at the top of `_execute` in `create_agent`, `edit_agent`, `validate_agent_graph`, `fix_agent_graph`. Tool descriptions advertise the requirement.
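The guide-read gate amounts to a scan over prior assistant tool calls. The sketch below is a minimal illustration under stated assumptions: it takes a plain list of message dicts instead of a `session` object, assumes the OpenAI shape is `{"function": {"name": ...}}` and the "flat" shape is `{"name": ...}`, and returns a refusal string rather than the real tool's response type. `_tool_call_names` and `guide_was_read` are hypothetical helpers.

```python
from typing import Iterator, Optional

GUIDE_TOOL_NAME = "get_agent_building_guide"


def _tool_call_names(message: dict) -> Iterator[Optional[str]]:
    """Yield tool names from one message, accepting two assumed shapes."""
    for call in message.get("tool_calls") or []:
        function = call.get("function")
        if isinstance(function, dict):  # OpenAI shape: {"function": {"name": ...}}
            yield function.get("name")
        else:  # assumed flat shape: {"name": ...}
            yield call.get("name")


def guide_was_read(messages: list[dict]) -> bool:
    """True if any earlier assistant message called the guide tool."""
    return any(
        GUIDE_TOOL_NAME in _tool_call_names(message)
        for message in messages
        if message.get("role") == "assistant"
    )


def require_guide_read(messages: list[dict], tool_name: str) -> Optional[str]:
    """Return a refusal message, or None when the gated tool may proceed."""
    if guide_was_read(messages):
        return None
    return f"{tool_name} requires {GUIDE_TOOL_NAME} first (refuses otherwise)."
```

Only assistant messages count, so a user message that merely mentions the tool (or carries a stray `tool_calls` field) does not unlock `create_agent`.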
#### Shared timing constants

- `MAX_TOOL_WAIT_SECONDS = 5 * 60` + `STREAM_IDLE_TIMEOUT_SECONDS = 2 * MAX_TOOL_WAIT_SECONDS` in `constants.py`. Every long-running tool (`run_agent`, `view_agent_output`, `run_sub_session`, `get_sub_session_result`, `run_block`) imports from one place; no more hardcoded 300 / `1060` literals drifting apart. The stream-idle invariant ("no single tool blocks close to the idle timeout") holds by construction.

### Frontend

- Friendlier tool-card labels: `run_sub_session` → "Sub-AutoPilot", `get_sub_session_result` → "Sub-AutoPilot result", `run_block` → "Action" (matches the builder UI's own naming), `run_agent` → "Agent". Fixes the double-verb "Running Run …" phrasing.
- `SubSessionStatusResponse.sub_autopilot_session_link` surfaces `/copilot?sessionId=<inner>` so users can click into any sub's session from the tool-call card — same pattern as `run_agent`'s `library_agent_link`.

### Changes 🏗️

- New modules: `sdk/session_waiter.py`, `sdk/stream_accumulator.py`, `tools/run_sub_session.py`, `tools/get_sub_session_result.py`, `tools/sub_session_test.py`, `tools/agent_guide_gate_test.py`.
- New response types: `SubSessionStatusResponse`, `SubSessionProgressSnapshot`, `SessionResult`.
- New gate helper: `require_guide_read` in `tools/helpers.py`.
- Queue protocol: `permissions` field on `CoPilotExecutionEntry`, threaded through `processor.py` → `stream_fn`.
- Hidden: `AUTOPILOT_BLOCK_ID` in `COPILOT_EXCLUDED_BLOCK_IDS` (run_block can't execute AutoPilotBlock; agents use `run_sub_session` instead).
- Deleted: `sdk/sub_session_registry.py`, processor shutdown-notifier hook.
- Regenerated: `openapi.json` for the new response types; block-docs for the updated `ToolName` Literal.
- Tool descriptions: tightened the guide-gate hint across the four agent-builder tools to stay under the 32000-char schema budget.
- 40+ tests across sub_session, execute_block cap + billing races, stream_accumulator, agent_guide_gate, frontend helpers.
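The single-tool cap and the billing guard can be sketched with standard `asyncio` primitives (`asyncio.wait_for` for the cap, `asyncio.shield` around the charge). This is one plausible arrangement, not the real `execute_block`: `execute_block_capped`, `_drain`, and the `charge` callback are hypothetical, and charging on both the success and timeout paths is an assumption about how the `finally` settles. The `charge_handled` flag is set before the shielded await, so a cancel that lands mid-charge reaches the `finally` with the flag already set and does not bill twice.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

MAX_TOOL_WAIT_SECONDS = 5 * 60
STREAM_IDLE_TIMEOUT_SECONDS = 2 * MAX_TOOL_WAIT_SECONDS


async def _drain(outputs: AsyncIterator) -> list:
    """Consume every item the block's async generator yields."""
    return [item async for item in outputs]


async def execute_block_capped(
    outputs: AsyncIterator,
    charge: Callable[[], Awaitable[None]],
    cap_s: float = MAX_TOOL_WAIT_SECONDS,
) -> dict:
    charge_handled = False
    try:
        results = await asyncio.wait_for(_drain(outputs), timeout=cap_s)
        # Mark before awaiting so a cancel during the charge is not retried below.
        charge_handled = True
        await asyncio.shield(charge())
        return {"ok": True, "outputs": results}
    except asyncio.TimeoutError:
        return {
            "ok": False,
            "error": "Block exceeded the tool cap; use run_agent or run_sub_session.",
        }
    finally:
        # Timeouts and cancellations that happen before the charge settle here once.
        if not charge_handled:
            charge_handled = True
            await asyncio.shield(charge())
```

Because the stream-idle timeout is derived as twice the tool cap, no single capped call can reach the idle timer on its own; with the constants as given, the idle timeout works out to 600 seconds.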
### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Unit suite green on the full copilot tree; `poetry run format` + `pyright` clean
  - [x] Schema character budget test passes (tool descriptions trimmed to stay under 32000)
  - [x] Native UI E2E (`poetry run app` + `pnpm dev`): `run_sub_session(wait_for_result=60)` returns `status="completed"` + `sub_autopilot_session_link` inline; `run_sub_session(wait_for_result=1)` returns `status="running"` + handle, `get_sub_session_result(wait_if_running=60)` observes `running → completed` transition
  - [x] AutoPilotBlock (graph) goes through `copilot_executor` queue end-to-end (verified via logs: ExecutionManager's AutoPilotBlock node spawned session `f6de335b-…`, a different `CoPilotExecutor` worker acquired its cluster lock and ran the SDK stream)
  - [x] Guide gate: `create_agent` without a prior `get_agent_building_guide` returns the refusal; agent reads the guide and retries successfully	2026-04-18 23:11:41 +07:00
slepybear	334ec18c31	docs: convert in-code comments to MkDocs admonitions in block-sdk-gui… (#12819 ) ### Why / What / How This PR converts inline Python comments in code examples within `block-sdk-guide.md` into MkDocs `!!! note` admonitions. This makes code examples cleaner and more copy-paste friendly while preserving all explanatory content. Converts inline comments in code blocks to admonitions following the pattern established in PR #12396 (new_blocks.md) and PR #12313. - Wrapped code examples with `!!! note` admonitions - Removed inline comments from code blocks for clean copy-paste - Added explanatory admonitions after each code block ### Changes 🏗️ - Provider configuration examples (API key and OAuth) - Block class Input/Output schema annotations - Block initialization parameters - Test configuration - OAuth and webhook handler implementations - Authentication types and file handling patterns ### Checklist 📋 #### For documentation changes: - [x] Follows the admonition pattern from PR #12396 - [x] No code changes, documentation only - [x] Admonition syntax verified correct #### For configuration changes: - [ ] `.env.default` is updated or already compatible with my changes - [ ] `docker-compose.yml` is updated or already compatible with my changes --- Related Issues: Closes #8946 Co-authored-by: slepybear <slepybear@users.noreply.github.com> Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>	2026-04-17 07:47:52 +00:00
Toran Bruce Richards	f410929560	feat(platform): Add xAI Grok 4.20 models from OpenRouter (#12620 ) Requested by @Torantulino Adds the 2 xAI Grok 4.20 models available on OpenRouter that are missing from the platform. ## Why `x-ai/grok-4.20` and `x-ai/grok-4.20-multi-agent` are xAI's current flagship models (released March 2026) and are available via OpenRouter, but weren't accessible from the platform's LLM blocks. ## Changes `autogpt_platform/backend/backend/blocks/llm.py` - Added `GROK_4_20` and `GROK_4_20_MULTI_AGENT` enum members - Added corresponding `MODEL_METADATA` entries (open_router provider, 2M context window, price tier 3) `autogpt_platform/backend/backend/data/block_cost_config.py` - Added `MODEL_COST` entries at 5 credits each (flagship tier, $2/M in) `docs/integrations/block-integrations/llm.md` - Added new model IDs to all LLM block tables
| Model | Pricing | Context |
|-------|---------|---------|
| `x-ai/grok-4.20` | $2/M in, $6/M out | 2M |
| `x-ai/grok-4.20-multi-agent` | $2/M in, $6/M out | 2M |

Both models use the standard OpenRouter chat completions API — no special handling needed. Resolves: SECRT-2196 --------- Co-authored-by: Torantulino <22963551+Torantulino@users.noreply.github.com> Co-authored-by: Toran Bruce Richards <Torantulino@users.noreply.github.com> Co-authored-by: Otto (AGPT) <otto@agpt.co>	2026-04-16 12:14:56 +00:00
Zamil Majdy	77d8362983	docs(blocks): sync misc.md with memory_search/memory_store tools from dev merge	2026-04-09 23:15:02 +07:00
Toran Bruce Richards	f6ddcbc6cb	feat(platform): Add all 12 Z.ai GLM models via OpenRouter (#12672 ) ## Summary Add Z.ai (Zhipu AI) GLM model family to the platform LLM blocks, routed through OpenRouter. This enables users to select any of the 12 Z.ai models across all LLM-powered blocks (AI Text Generator, AI Conversation, AI Structured Response, AI Text Summarizer, AI List Generator). ## Gap Analysis All 12 Z.ai models currently available on OpenRouter's API were missing from the AutoGPT platform:
| Model | Context Window | Max Output | Price Tier | Cost |
|-------|---------------|------------|------------|------|
| GLM 4 32B | 128K | N/A | Tier 1 | 1 |
| GLM 4.5 | 131K | 98K | Tier 2 | 2 |
| GLM 4.5 Air | 131K | 98K | Tier 1 | 1 |
| GLM 4.5 Air (Free) | 131K | 96K | Tier 1 | 1 |
| GLM 4.5V (vision) | 65K | 16K | Tier 2 | 2 |
| GLM 4.6 | 204K | 204K | Tier 1 | 1 |
| GLM 4.6V (vision) | 131K | 131K | Tier 1 | 1 |
| GLM 4.7 | 202K | 65K | Tier 1 | 1 |
| GLM 4.7 Flash | 202K | N/A | Tier 1 | 1 |
| GLM 5 | 80K | 131K | Tier 2 | 2 |
| GLM 5 Turbo | 202K | 131K | Tier 3 | 4 |
| GLM 5V Turbo (vision) | 202K | 131K | Tier 3 | 4 |
## Changes - `autogpt_platform/backend/backend/blocks/llm.py`: Added 12 `LlmModel` enum entries and corresponding `MODEL_METADATA` with context windows, max output tokens, display names, and price tiers sourced from the OpenRouter API - `autogpt_platform/backend/backend/data/block_cost_config.py`: Added `MODEL_COST` entries for all 12 models, with costs scaled to match pricing (1 for budget, 2 for mid-range, 4 for premium) ## How it works All Z.ai models route through the existing OpenRouter provider (`open_router`) — no new provider or API client code needed. Users with an OpenRouter API key can immediately select any Z.ai model from the model dropdown in any LLM block.
## Related - Linear: REQ-83 --------- Co-authored-by: AutoGPT CoPilot <copilot@agpt.co>	2026-04-03 15:48:33 +00:00
Zamil Majdy	fff101e037	feat(backend): add SQL query block with multi-database support for CoPilot analytics (#12569 ) ## Summary - Add a read-only SQL query block for CoPilot/AutoPilot analytics access - Supports multiple databases: PostgreSQL, MySQL, SQLite, MSSQL via SQLAlchemy - Enforces read-only queries (SELECT only) with defense-in-depth SQL validation using sqlparse - SSRF protection: blocks connections to private/internal IPs - Credentials stored securely via the platform credential system ## Changes - New `SQLQueryBlock` in `backend/blocks/sql_query_block.py` with `DatabaseType` enum - SQLAlchemy-based execution with dialect-specific read-only and timeout settings - Connection URL validation ensuring driver matches selected database type - Comprehensive test suite (62 tests) including URL validation, sanitization, serialization - Documentation in `docs/integrations/block-integrations/data.md` - Added `DATABASE` provider to `ProviderName` enum ### Checklist 📋 - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan #### Test plan: - [x] Unit tests pass for query validation, URL validation, error sanitization, value serialization - [x] Read-only enforcement rejects INSERT/UPDATE/DELETE/DROP - [x] Multi-statement injection blocked - [x] SSRF protection blocks private IPs - [x] Connection URL driver validation works for all 4 database types --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 06:43:40 +00:00
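The two safety checks in this commit can be sketched with the standard library alone. This is a deliberately simpler stand-in for the real code, which validates statements with `sqlparse`. The names `is_read_only_select` and `is_blocked_address` are hypothetical. A production SSRF guard would also need to resolve the hostname and reject the connection if any resolved address is blocked.

```python
import ipaddress

# Keywords that must never appear in a read-only query (illustrative, not exhaustive).
_WRITE_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE",
    "TRUNCATE", "GRANT", "REVOKE", "MERGE",
}


def is_read_only_select(sql: str) -> bool:
    """Conservative check: exactly one statement, starting with SELECT,
    with no write keyword anywhere in it."""
    stmt = sql.strip().rstrip(";").strip()
    if not stmt or ";" in stmt:
        return False  # empty input or stacked statements
    tokens = stmt.replace("(", " ").replace(")", " ").replace(",", " ").upper().split()
    if not tokens or tokens[0] != "SELECT":
        return False  # also rejects CTEs (WITH ...), which is deliberately strict
    return not any(tok in _WRITE_KEYWORDS for tok in tokens)


def is_blocked_address(ip: str) -> bool:
    """SSRF guard for an already-resolved IP address (IPv4 or IPv6)."""
    addr = ipaddress.ip_address(ip)
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )
```

The keyword scan is conservative. It rejects any query that merely mentions a write keyword, which trades some false positives for safety.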
Zamil Majdy	f1ac05b2e0	fix(backend): propagate dry-run mode to special blocks with LLM-powered simulation (#12575 ) ## Summary - OrchestratorBlock & AgentExecutorBlock now execute for real in dry-run mode so the orchestrator can make LLM calls and agent executors can spawn child graphs. Their downstream tool blocks and child-graph blocks are still simulated via `simulate_block()`. Credential fields from node defaults are restored since `validate_exec()` wipes them in dry-run mode. Agent-mode iterations capped at 1 in dry-run. - All blocks (including MCPToolBlock) are simulated via a single generic `simulate_block()` path. The LLM prompt is grounded by `inspect.getsource(block.run)`, giving the simulator access to the exact implementation of each block's `run()` method. This produces realistic mock responses for any block type without needing block-specific simulation logic. - Updated agent generation guide to document special block dry-run behavior. - Minor frontend fixes: exported `formatCents` from `RateLimitResetDialog` for reuse in `UsagePanelContent`, used `useRef` for stable callback references in `useResetRateLimit` to avoid stale closures. - 74 tests (21 existing dry-run + 53 new simulator tests covering prompt building, passthrough logic, and special block dry-run). ## Design The simulator (`backend/executor/simulator.py`) uses a two-tier approach: 1. Passthrough blocks (OrchestratorBlock, AgentExecutorBlock): `prepare_dry_run()` returns modified input_data so these blocks execute for real in `manager.py`. OrchestratorBlock gets `max_iterations=1` (agent mode) or 0 (traditional mode). AgentExecutorBlock spawns real child graph executions whose blocks inherit `dry_run=True`. 2. 
All other blocks: `simulate_block()` builds an LLM prompt containing: - Block name and description - Input/output schemas (JSON Schema) - The block's `run()` source code via `inspect.getsource(block.run)` - The actual input values (with credentials stripped and long values truncated) The LLM then role-plays the block's execution, producing realistic outputs grounded in the actual implementation. Special handling for input/output blocks: `AgentInputBlock` and `AgentOutputBlock` are pure passthrough (no LLM call needed). ## Test plan - [x] All 74 tests pass (`pytest backend/copilot/tools/test_dry_run.py backend/executor/simulator_test.py`) - [x] Pre-commit hooks pass (ruff, isort, black, pyright, frontend typecheck) - [x] CI: all checks green - [x] E2E: dry-run execution completes with `is_dry_run=true`, cost=0, no errors - [x] E2E: normal (non-dry-run) execution unchanged - [x] E2E: Create agent with OrchestratorBlock + tool blocks, run with `dry_run=True`, verify orchestrator makes real LLM calls while tool blocks are simulated - [x] E2E: AgentExecutorBlock spawns child graph in dry-run, child blocks are LLM-simulated - [x] E2E: Builder simulate button works end-to-end with special blocks --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 17:09:55 +00:00
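Here is a rough sketch of how the simulation prompt for non-passthrough blocks could be assembled from the listed ingredients. Several details are illustrative assumptions:

- the function names;
- the 200-character truncation limit;
- treating any key that contains `credentials` as a secret.

In the real simulator, `run_source` would come from `inspect.getsource(block.run)`. This sketch takes it as a plain string so it runs anywhere.

```python
import json
from typing import Any

MAX_VALUE_CHARS = 200  # hypothetical truncation limit


def sanitize_for_prompt(input_data: dict[str, Any], max_chars: int = MAX_VALUE_CHARS) -> dict[str, Any]:
    """Drop credential fields and truncate long string values."""
    clean: dict[str, Any] = {}
    for key, value in input_data.items():
        if "credentials" in key.lower():
            continue  # never forward secrets to the simulating LLM
        if isinstance(value, str) and len(value) > max_chars:
            value = value[:max_chars] + f"... [truncated {len(value) - max_chars} chars]"
        clean[key] = value
    return clean


def build_simulation_prompt(
    block_name: str,
    description: str,
    input_schema: dict[str, Any],
    output_schema: dict[str, Any],
    run_source: str,
    input_data: dict[str, Any],
) -> str:
    """Assemble the grounding sections into a single prompt string."""
    sections = [
        f"You are simulating the execution of the block '{block_name}'.",
        f"Description: {description}",
        "Input schema (JSON Schema):\n" + json.dumps(input_schema, indent=2),
        "Output schema (JSON Schema):\n" + json.dumps(output_schema, indent=2),
        "Source of run():\n" + run_source,
        "Input values:\n" + json.dumps(sanitize_for_prompt(input_data), indent=2, default=str),
        "Reply with the outputs this block would produce, as JSON keyed by output name.",
    ]
    return "\n\n".join(sections)
```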
Toran Bruce Richards	11b846dd49	fix(blocks): rename placeholder_values to options on AgentDropdownInputBlock (#12595 ) ## Summary Resolves [REQ-78](https://linear.app/autogpt/issue/REQ-78): The `placeholder_values` field on `AgentDropdownInputBlock` is misleadingly named. In every major UI framework "placeholder" means non-binding hint text that disappears on focus, but this field actually creates a dropdown selector that restricts the user to only those values. ## Changes ### Core rename (`autogpt_platform/backend/backend/blocks/io.py`) - Renamed `placeholder_values` → `options` on `AgentDropdownInputBlock.Input` - Added clear field description: "If provided, renders the input as a dropdown selector restricted to these values. Leave empty for free-text input." - Updated class docstring to describe actual behavior - Overrode `model_construct()` to remap legacy `placeholder_values` → `options` for backward compatibility with existing persisted agent JSON ### Tests (`autogpt_platform/backend/backend/blocks/test/test_block.py`) - Updated existing tests to use canonical `options` field name - Added 2 new backward-compat tests verifying legacy `placeholder_values` still works through both `model_construct()` and `Graph._generate_schema()` paths ### Documentation - Updated `autogpt_platform/backend/backend/copilot/sdk/agent_generation_guide.md` — changed field name in CoPilot SDK guide - Updated `docs/integrations/block-integrations/basic.md` — changed field name and description in public docs ### Load tests (`autogpt_platform/backend/load-tests/tests/api/graph-execution-test.js`) - Removed spurious `placeholder_values: {}` from AgentInputBlock node (this field never existed on AgentInputBlock) - Fixed execution input to use `value` instead of `placeholder_values` ## Backward Compatibility Existing agents with `placeholder_values` in their persisted `input_default` JSON will continue to work — the `model_construct()` override transparently remaps the old key to 
`options`. No database migration needed since the field is stored inside a JSON blob, not as a dedicated column. ## Testing - All existing tests updated and passing - 2 new backward-compat tests added - No frontend changes needed (frontend reads `enum` from generated JSON Schema, not the field name directly) --------- Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>	2026-04-02 05:56:17 +00:00
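The backward-compatible remap can be illustrated without Pydantic. The PR performs it inside an overridden `model_construct()`. The standalone `remap_legacy_keys` function and `LEGACY_KEY_MAP` below are hypothetical. The rule that `options` wins when both keys are present is also an assumption.

```python
from typing import Any

# Legacy field name -> canonical field name.
LEGACY_KEY_MAP = {"placeholder_values": "options"}


def remap_legacy_keys(input_default: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of persisted block input with legacy keys renamed."""
    data = dict(input_default)  # never mutate the stored JSON in place
    for old, new in LEGACY_KEY_MAP.items():
        if old in data:
            legacy_value = data.pop(old)
            # Assumption: if both keys are present, the canonical one wins.
            data.setdefault(new, legacy_value)
    return data
```

Because the remap runs at load time, the JSON blob in the database never has to change, which is why no migration is needed.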
Nicholas Tindle	88589764b5	dx(platform): normalize agent instructions for Claude and Codex (#12592 ) ### Why / What / How Why: repo guidance was split between Claude-specific `CLAUDE.md` files and Codex-specific `AGENTS.md` files, which duplicated instruction content and made the same repository behave differently across agents. The repo also had Claude skills under `.claude/skills` but no Codex-visible repo skill path. What: this PR bridges the repo's Claude skills into Codex and normalizes shared instruction files so `AGENTS.md` becomes the canonical source while each `CLAUDE.md` imports its sibling `AGENTS.md`. How: add a repo-local `.agents/skills` symlink pointing to `../.claude/skills`; move nested `CLAUDE.md` content into sibling `AGENTS.md` files; replace each repo `CLAUDE.md` with a one-line `@AGENTS.md` shim so Claude and Codex read the same scoped guidance without duplicating text. The root `CLAUDE.md` now imports the root `AGENTS.md` rather than symlinking to it. Note: the instruction-file normalization commit was created with `--no-verify` because the repo's frontend pre-commit `tsc` hook currently fails on unrelated existing errors, largely missing `autogpt_platform/frontend/src/app/api/__generated__/` modules. ### Changes 🏗️ - Add `.agents/skills` as a repo-local symlink to `../.claude/skills` so Codex discovers the existing Claude repo skills. - Add a real root `CLAUDE.md` shim that imports the canonical root `AGENTS.md`. - Promote nested scoped instruction content into sibling `AGENTS.md` files under `autogpt_platform/`, `autogpt_platform/backend/`, `autogpt_platform/frontend/`, `autogpt_platform/frontend/src/tests/`, and `docs/`. - Replace the corresponding nested `CLAUDE.md` files with one-line `@AGENTS.md` shims. - Preserve the existing scoped instruction hierarchy while making the shared content cross-compatible between Claude and Codex. 
### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified `.agents/skills` resolves to `../.claude/skills` - [x] Verified each repo `CLAUDE.md` now contains only `@AGENTS.md` - [x] Verified the expected `AGENTS.md` files exist at the root and nested scoped directories - [x] Verified the branch contains only the intended agent-guidance commits relative to `dev` and the working tree is clean #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under Changes) No runtime configuration changes are included in this PR. --- > [!NOTE] > Low Risk > Low risk: documentation/instruction-file reshuffle plus an `.agents/skills` pointer; no runtime code paths are modified. > > Overview > Unifies agent guidance so `AGENTS.md` becomes canonical and all corresponding `CLAUDE.md` files become 1-line shims (`@AGENTS.md`) at the repo root, `autogpt_platform/`, backend, frontend, frontend tests, and `docs/`. > > Adds `.agents/skills` pointing to `../.claude/skills` so non-Claude agents discover the same shared skills/instructions, eliminating duplicated/agent-specific guidance content.	2026-04-01 09:08:51 +00:00
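To reproduce this layout in another repository, a hypothetical one-off migration script might look like the following. The PR made these changes by hand in git. This sketch only mirrors the described end state: a relative `.agents/skills` symlink plus `@AGENTS.md` shims. Its handling of existing files is an assumption.

```python
from pathlib import Path


def bridge_agent_instructions(repo_root: Path, scoped_dirs: list[str]) -> None:
    """Hypothetical one-off migration script for the AGENTS.md / CLAUDE.md layout."""
    # 1. `.agents/skills` -> `../.claude/skills` (relative, so clones elsewhere still work).
    skills_link = repo_root / ".agents" / "skills"
    skills_link.parent.mkdir(parents=True, exist_ok=True)
    if not skills_link.is_symlink():
        skills_link.symlink_to(Path("..") / ".claude" / "skills", target_is_directory=True)

    # 2. Make AGENTS.md canonical and turn each CLAUDE.md into an import shim.
    for rel in [".", *scoped_dirs]:
        directory = repo_root / rel
        claude_md, agents_md = directory / "CLAUDE.md", directory / "AGENTS.md"
        if claude_md.is_symlink():
            claude_md.unlink()  # replace a symlink with a real shim file
        if claude_md.exists() and not agents_md.exists():
            agents_md.write_text(claude_md.read_text())  # promote existing guidance
        claude_md.write_text("@AGENTS.md\n")
```

The symlink check comes first for a reason. Writing through a `CLAUDE.md` that symlinks to `AGENTS.md` would overwrite the canonical file with the shim.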
Zamil Majdy	3e25488b2d	feat(copilot): add session-level dry_run flag to autopilot sessions (#12582 ) ## Summary - Adds a session-level `dry_run` flag that forces ALL tool calls (`run_block`, `run_agent`) in a copilot/autopilot session to use dry-run simulation mode - Stores the flag in a typed `ChatSessionMetadata` JSON model on the `ChatSession` DB row, accessed via `session.dry_run` property - Adds `dry_run` to the AutoPilot block Input schema so graph builders can create dry-run autopilot nodes - Refactors multiple copilot tools from `kwargs` to explicit parameters for type safety ## Changes - Prisma schema: Added `metadata` JSON column to `ChatSession` model with migration - Python models: Added `ChatSessionMetadata` model with `dry_run` field, added `metadata` field to `ChatSessionInfo` and `ChatSession`, updated `from_db()`, `new()`, and `create_chat_session()` - Session propagation: `set_execution_context(user_id, session)` called from `baseline/service.py` so tool handlers can read session-level flags via `session.dry_run` - Tool enforcement: `run_block` and `run_agent` check `session.dry_run` and force `dry_run=True` when set; `run_agent` blocks scheduling in dry-run sessions - AutoPilot block: Added `dry_run` input field, passes it when creating sessions - Chat API: Added `CreateSessionRequest` model with `dry_run` field to `POST /sessions` endpoint; added `metadata` to session responses - Frontend: Updated `useChatSession.ts` to pass body to the create session mutation - Tool refactoring: Multiple copilot tools refactored from `kwargs` to explicit named parameters (agent_browser, manage_folders, workspace_files, connect_integration, agent_output, bash_exec, etc.) 
for better type safety ## Test plan - [x] Unit tests for `ChatSession.new()` with dry_run parameter - [x] Unit tests for `RunBlockTool` session dry_run override - [x] Unit tests for `RunAgentTool` session dry_run override - [x] Unit tests for session dry_run blocks scheduling - [x] Existing dry_run tests still pass (12/12) - [x] Existing permissions tests still pass - [x] All pre-commit hooks pass (ruff, isort, pyright, tsc) - [ ] Manual: Create autopilot session with `dry_run=True`, verify run_block/run_agent calls use simulation --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 16:27:36 +00:00
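The override rule amounts to a logical OR plus a scheduling guard. In this sketch, plain dataclasses stand in for the real Pydantic/Prisma-backed models. `effective_dry_run` and `ensure_scheduling_allowed` are hypothetical names, not the tool code, and so is the choice of `PermissionError`.

```python
from dataclasses import dataclass, field


@dataclass
class ChatSessionMetadata:
    """Typed view of the session's JSON `metadata` column (sketch)."""
    dry_run: bool = False


@dataclass
class ChatSession:
    session_id: str
    metadata: ChatSessionMetadata = field(default_factory=ChatSessionMetadata)

    @property
    def dry_run(self) -> bool:
        return self.metadata.dry_run


def effective_dry_run(session: ChatSession, requested: bool) -> bool:
    """A session-level flag can force simulation on; a tool call cannot turn it off."""
    return requested or session.dry_run


def ensure_scheduling_allowed(session: ChatSession) -> None:
    """Real runs can't be scheduled from a session that must only simulate."""
    if session.dry_run:
        raise PermissionError("Scheduling is disabled in dry-run sessions")
```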
Zamil Majdy	37d9863552	feat(platform): add extended thinking execution mode to OrchestratorBlock (#12512 ) ## Summary - Adds `ExecutionMode` enum with `BUILT_IN` (default built-in tool-call loop) and `EXTENDED_THINKING` (delegates to Claude Agent SDK for richer reasoning) - Extracts shared `tool_call_loop` into `backend/util/tool_call_loop.py` — reusable by both OrchestratorBlock agent mode and copilot baseline - Refactors copilot baseline to use the shared `tool_call_loop` with callback-driven iteration ## ExecutionMode enum `ExecutionMode` (`backend/blocks/orchestrator.py`) controls how OrchestratorBlock executes tool calls: - `BUILT_IN` — Default mode. Runs the built-in tool-call loop (supports all LLM providers). - `EXTENDED_THINKING` — Delegates to the Claude Agent SDK for extended thinking and multi-step planning. Requires Anthropic-compatible providers (`anthropic` / `open_router`) and direct API credentials (subscription mode not supported). Validates both provider and model name at runtime. ## Shared tool_call_loop `backend/util/tool_call_loop.py` provides a generic, provider-agnostic conversation loop: 1. Call LLM with tools → 2. Extract tool calls → 3. Execute tools → 4. Update conversation → 5. Repeat Callers provide three callbacks: - `llm_call`: wraps any LLM provider (OpenAI streaming, Anthropic, llm.llm_call, etc.) - `execute_tool`: wraps any tool execution (TOOL_REGISTRY, graph block execution, etc.) 
- `update_conversation`: formats messages for the specific protocol ## OrchestratorBlock EXTENDED_THINKING mode - `_create_graph_mcp_server()` converts graph-connected blocks to MCP tools - `_execute_tools_sdk_mode()` runs `ClaudeSDKClient` with those MCP tools - Agent mode refactored to use shared `tool_call_loop` ## Copilot baseline refactored - Streaming callbacks buffer `Stream*` events during loop execution - Events are drained after `tool_call_loop` returns - Same conversation logic, less code duplication ## SDK environment builder extraction - `build_sdk_env()` extracted to `backend/copilot/sdk/env.py` for reuse by both copilot SDK service and OrchestratorBlock ## Provider validation EXTENDED_THINKING mode validates `provider in ('anthropic', 'open_router')` and `model_name.startswith('claude')` because the Claude Agent SDK requires an Anthropic API key or OpenRouter key. Subscription mode is not supported — it uses the platform's internal credit system which doesn't provide raw API keys needed by the SDK. The validation raises a clear `ValueError` if an unsupported provider or model is used. ## PR Dependencies This PR builds on #12511 (Claude SDK client). It can be reviewed independently — #12511 only adds the SDK client module which this PR imports. If #12511 merges first, this PR will have no conflicts. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] All pre-commit hooks pass (typecheck, lint, format) - [x] Existing OrchestratorBlock tests still pass - [x] Copilot baseline behavior unchanged (same stream events, same tool execution) - [x] Manual: OrchestratorBlock with execution_mode=EXTENDED_THINKING + downstream blocks → SDK calls tools - [x] Agent mode regression test (non-SDK path works as before) - [x] SDK mode error handling (invalid provider raises ValueError)	2026-03-31 20:04:13 +07:00
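The five-step loop and its three callbacks could look roughly like this. Only the callback names come from the PR. The `ToolCall` and `LLMTurn` types, the `max_iterations` cap, and the exact signatures are assumptions. `validate_extended_thinking` restates the rule from the "Provider validation" section as a standalone check.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Message = dict[str, Any]


@dataclass
class ToolCall:
    id: str
    name: str
    arguments: dict[str, Any]


@dataclass
class LLMTurn:
    text: str
    tool_calls: list[ToolCall] = field(default_factory=list)


def tool_call_loop(
    messages: list[Message],
    llm_call: Callable[[list[Message]], LLMTurn],
    execute_tool: Callable[[ToolCall], Any],
    update_conversation: Callable[[list[Message], LLMTurn, list[tuple[ToolCall, Any]]], None],
    max_iterations: int = 10,
) -> LLMTurn:
    """Provider-agnostic loop: all provider/protocol details live in the callbacks."""
    turn = llm_call(messages)  # 1. call the LLM with tools
    for _ in range(max_iterations):
        if not turn.tool_calls:  # 2. no tool calls left, so this is the final answer
            return turn
        results = [(call, execute_tool(call)) for call in turn.tool_calls]  # 3. execute
        update_conversation(messages, turn, results)  # 4. append calls and results
        turn = llm_call(messages)  # 5. repeat
    return turn  # iteration budget exhausted; the caller decides what to do


def validate_extended_thinking(provider: str, model_name: str) -> None:
    """Standalone version of the provider/model rule stated in the PR."""
    if provider not in ("anthropic", "open_router"):
        raise ValueError(f"EXTENDED_THINKING requires anthropic or open_router, got {provider!r}")
    if not model_name.startswith("claude"):
        raise ValueError(f"EXTENDED_THINKING requires a Claude model, got {model_name!r}")
```

Keeping provider details inside the callbacks is what lets one loop serve both OrchestratorBlock agent mode and the copilot baseline.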
Zamil Majdy	f79d8f0449	fix(backend): move placeholder_values exclusively to AgentDropdownInputBlock (#12551 ) ## Why `AgentInputBlock` has a `placeholder_values` field whose `generate_schema()` converts it into a JSON schema `enum`. The frontend renders any field with `enum` as a dropdown/select. This means AI-generated agents that populate `placeholder_values` with example values (e.g. URLs) on regular `AgentInputBlock` nodes end up with dropdowns instead of free-text inputs — users can't type custom values. Only `AgentDropdownInputBlock` should produce dropdown behavior. ## What - Removed `placeholder_values` field from `AgentInputBlock.Input` - Moved the `enum` generation logic to `AgentDropdownInputBlock.Input.generate_schema()` - Cleaned up test data for non-dropdown input blocks - Updated copilot agent generation guide to stop suggesting `placeholder_values` for `AgentInputBlock` ## How The base `AgentInputBlock.Input.generate_schema()` no longer converts `placeholder_values` → `enum`. Only `AgentDropdownInputBlock.Input` defines `placeholder_values` and overrides `generate_schema()` to produce the `enum`. Backward compatibility: Existing agents with `placeholder_values` on `AgentInputBlock` nodes load fine — `model_construct()` silently ignores extra fields not defined on the model. Those inputs will now render as text fields (desired behavior). ## Test plan - [x] `poetry run pytest backend/blocks/test/test_block.py -xvs` — all block tests pass - [x] `poetry run format && poetry run lint` — clean - [ ] Import an agent JSON with `placeholder_values` on an `AgentInputBlock` — verify it loads and renders as text input - [ ] Create an agent with `AgentDropdownInputBlock` — verify dropdown still works	2026-03-26 08:09:38 +00:00
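The split can be sketched with plain classes: the base input schema never emits `enum`, and only the dropdown subclass does. Both class names are hypothetical stand-ins for `AgentInputBlock.Input` and `AgentDropdownInputBlock.Input`. The field is still called `placeholder_values` here because this commit predates the rename to `options` in #12595.

```python
from typing import Any, Optional


class AgentInputSchema:
    """Framework-free stand-in for `AgentInputBlock.Input` (hypothetical)."""

    def __init__(self, name: str, value_type: str = "string", **_ignored: Any) -> None:
        # Undeclared keys (e.g. a stale `placeholder_values`) are dropped,
        # analogous to the PR's note that model_construct() ignores extra fields.
        self.name = name
        self.value_type = value_type

    def generate_schema(self) -> dict[str, Any]:
        return {"title": self.name, "type": self.value_type}


class AgentDropdownInputSchema(AgentInputSchema):
    """Only the dropdown variant turns its values into a JSON Schema `enum`."""

    def __init__(
        self,
        name: str,
        value_type: str = "string",
        placeholder_values: Optional[list[Any]] = None,
        **_ignored: Any,
    ) -> None:
        super().__init__(name, value_type)
        self.placeholder_values = list(placeholder_values or [])

    def generate_schema(self) -> dict[str, Any]:
        schema = super().generate_schema()
        if self.placeholder_values:
            schema["enum"] = self.placeholder_values  # frontend renders `enum` as a select
        return schema
```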
An Vy Le	f871717f68	fix(backend): add sink input validation to AgentValidator (#12514 ) ## Summary - Added `validate_sink_input_existence` method to `AgentValidator` to ensure all sink names in links and input defaults reference valid input schema fields in the corresponding block - Added comprehensive tests covering valid/invalid sink names, nested inputs, and default key handling - Updated `ReadDiscordMessagesBlock` description to clarify it reads new messages and triggers on new posts - Removed leftover test function file ## Test plan - [ ] Run `pytest` on `validator_test.py` to verify all sink input validation cases pass - [ ] Verify existing agent validation flow is unaffected - [ ] Confirm `ReadDiscordMessagesBlock` description update is accurate 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>	2026-03-25 16:08:17 +00:00
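Here is a simplified version of the sink check. It assumes each node exposes the set of field names in its block's input schema. The `Node` and `Link` shapes and `find_invalid_sinks` are hypothetical. The sketch does not handle the nested inputs that the PR's tests also cover.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    id: str
    input_fields: set[str]  # field names from the block's input schema
    input_default: dict[str, Any] = field(default_factory=dict)


@dataclass
class Link:
    source_id: str
    sink_id: str
    sink_name: str


def find_invalid_sinks(nodes: list[Node], links: list[Link]) -> list[str]:
    """Return one error message per link sink or default key with no matching input."""
    by_id = {node.id: node for node in nodes}
    errors: list[str] = []
    for link in links:
        sink = by_id.get(link.sink_id)
        if sink is None:
            errors.append(f"link targets unknown node {link.sink_id!r}")
        elif link.sink_name not in sink.input_fields:
            errors.append(f"node {sink.id!r} has no input named {link.sink_name!r}")
    for node in nodes:
        for key in node.input_default:
            if key not in node.input_fields:
                errors.append(f"node {node.id!r} sets a default for unknown input {key!r}")
    return errors
```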
Zamil Majdy	80bfd64ffa	Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev	2026-03-24 21:18:11 +07:00
Zamil Majdy	0076ad2a1a	hotfix(blocks): bump stagehand ^0.5.1 → ^3.4.0 to fix yanked litellm (#12539 ) ## Summary Critical CI fix — litellm was compromised in a supply chain attack (versions 1.82.7/1.82.8 contained infostealer malware) and PyPI subsequently yanked many litellm versions including the 1.7x range that stagehand 0.5.x depended on. This breaks `poetry lock` in CI for all PRs. - Bump `stagehand` from `^0.5.1` to `^3.4.0` — Stagehand v3 is a Stainless-generated HTTP API client that no longer depends on litellm, completely removing litellm from our dependency tree - Migrate stagehand blocks to use `AsyncStagehand` + session-based API (`sessions.start`, `session.navigate/act/observe/extract`) - Net reduction of ~430 lines in `poetry.lock` from dropping litellm and its transitive dependencies ## Why All CI pipelines are blocked because `poetry lock` fails to resolve yanked litellm versions that stagehand 0.5.x required. ## Test plan - [x] CI passes (poetry lock resolves, backend tests green) - [ ] Verify stagehand blocks still function with the new session-based API	2026-03-24 21:17:19 +07:00
Zamil Majdy	9381057079	refactor(platform): rename SmartDecisionMakerBlock to OrchestratorBlock (#12511 ) ## Summary - Renames `SmartDecisionMakerBlock` to `OrchestratorBlock` across the entire codebase - The block supports iteration/agent mode and general tool orchestration, so "Smart Decision Maker" no longer accurately describes its capabilities - Block UUID (`3b191d9f-356f-482d-8238-ba04b6d18381`) remains unchanged — fully backward compatible with existing graphs ## Changes - Renamed block class, constants, file names, test files, docs, and frontend enum - Updated copilot agent generator (helpers, validator, fixer) references - Updated agent generation guide documentation - No functional changes — pure rename refactor ### For code changes - [x] I have clearly listed my changes in the PR description - [x] I have made corresponding changes to the documentation - [x] My changes do not generate new warnings or errors - [x] New and existing unit tests pass locally with my changes ## Test plan - [x] All pre-commit hooks pass (typecheck, lint, format) - [x] Existing graphs with this block continue to load and execute (same UUID) - [x] Agent mode / iteration mode works as before - [x] Copilot agent generator correctly references the renamed block	2026-03-24 19:16:42 +07:00
Zamil Majdy	ee5382a064	feat(copilot): add tool/block capability filtering to AutoPilotBlock (#12482 ) ## Summary - Adds `CopilotPermissions` model (`copilot/permissions.py`) — a capability filter that restricts which tools and blocks the AutoPilot/Copilot may use during a single execution - Exposes 4 new `advanced=True` fields on `AutoPilotBlock`: `tools`, `tools_exclude`, `blocks`, `blocks_exclude` - Threads permissions through the full execution path: `AutoPilotBlock` → `collect_copilot_response` → `stream_chat_completion_sdk` → `run_block` - Implements recursion inheritance via contextvar: sub-agent executions can only be more restrictive than their parent ## Design Tool filtering (`tools` + `tools_exclude`): - `tools_exclude=True` (default): `tools` is a blacklist — listed tools denied, all others allowed. Empty list = allow all. - `tools_exclude=False`: `tools` is a whitelist — only listed tools are allowed. - Users specify short names (`run_block`, `web_fetch`, `Read`, `Task`, …) — mapped to full SDK format internally. - Validated eagerly at block-run time with a clear error listing valid names. Block filtering (`blocks` + `blocks_exclude`): - Same semantics as tool filtering, applied inside `run_block` via contextvar. - Each entry can be a full UUID, an 8-char partial UUID (first segment), or a case-insensitive block name. - Validated against the live block registry; invalid identifiers surface a helpful error before the session is created. Recursion inheritance: - `_inherited_permissions` contextvar stores the parent execution's permissions. - On each `AutoPilotBlock.run()`, the child's permissions are merged with the parent via `merged_with_parent()` — effective allowed sets are intersected (tools) and the parent chain is kept for block checks. - Sub-agents can never expand what the parent allowed. 
## Test plan - [x] 68 new unit tests in `copilot/permissions_test.py` and `blocks/autopilot_permissions_test.py` - [x] Block identifier matching: full UUID, partial UUID, name, case-insensitivity - [x] Tool allow/deny list semantics including edge cases (empty list, unknown tool) - [x] Parent/child merging and recursion ceiling correctness - [x] `validate_tool_names` / `validate_block_identifiers` with mock block registry - [x] `apply_tool_permissions` SDK tool-list integration - [x] `AutoPilotBlock.run()` — invalid tool/block yields error before session creation - [x] `AutoPilotBlock.run()` — valid permissions forwarded to `execute_copilot` - [x] Existing `AutoPilotBlock` block tests still pass (2/2) - [x] All hooks pass (pyright, ruff, black, isort) - [x] E2E: CoPilot chat works end-to-end with E2B sandbox (12s stream) - [x] E2E: Permission fields render in Builder UI (Tools combobox, exclude toggles) - [x] E2E: Agent with restricted permissions (whitelist web_fetch only) executes correctly - [x] E2E: Permission values preserved through API round-trip	2026-03-24 07:49:58 +00:00
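The allow/deny semantics, identifier matching, and parent ceiling fit in one small model. This is a hypothetical reduction of `CopilotPermissions`. It checks every decision against the parent chain, which yields the same effective set as intersecting the parent and child allow-lists. Mapping short tool names to the SDK format and validating against the block registry are omitted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class BlockInfo:
    id: str  # full UUID
    name: str


def matches_block(identifier: str, block: BlockInfo) -> bool:
    """Full UUID, 8-char UUID prefix, or case-insensitive block name."""
    ident = identifier.strip().lower()
    block_id = block.id.lower()
    return (
        ident == block_id
        or (len(ident) == 8 and block_id.split("-")[0] == ident)
        or ident == block.name.lower()
    )


@dataclass
class CopilotPermissions:
    tools: list[str] = field(default_factory=list)
    tools_exclude: bool = True  # True: `tools` is a blacklist (empty = allow all)
    blocks: list[str] = field(default_factory=list)
    blocks_exclude: bool = True
    parent: Optional["CopilotPermissions"] = None

    def is_tool_allowed(self, tool: str) -> bool:
        if self.parent is not None and not self.parent.is_tool_allowed(tool):
            return False  # a child can never re-enable what the parent denies
        listed = tool in self.tools
        return not listed if self.tools_exclude else listed

    def is_block_allowed(self, block: BlockInfo) -> bool:
        if self.parent is not None and not self.parent.is_block_allowed(block):
            return False
        listed = any(matches_block(ident, block) for ident in self.blocks)
        return not listed if self.blocks_exclude else listed

    def merged_with_parent(self, parent: "CopilotPermissions") -> "CopilotPermissions":
        """Child keeps its own lists but is always checked against the parent chain."""
        return CopilotPermissions(
            tools=list(self.tools),
            tools_exclude=self.tools_exclude,
            blocks=list(self.blocks),
            blocks_exclude=self.blocks_exclude,
            parent=parent,
        )
```

For example, `matches_block("3b191d9f", block)` accepts the 8-character prefix of a UUID such as OrchestratorBlock's `3b191d9f-356f-482d-8238-ba04b6d18381`.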
Nicholas Tindle	f01f668674	fix(backend): support Responses API in SmartDecisionMakerBlock (#12489 ) ## Summary - Fixes SmartDecisionMakerBlock conversation management to work with OpenAI's Responses API, which was introduced in #12099 (commit `1240f38`) - The migration to `responses.create` updated the outbound LLM call but missed the conversation history serialization — the `raw_response` is now the entire `Response` object (not a `ChatCompletionMessage`), and tool calls/results use `function_call` / `function_call_output` types instead of role-based messages - This caused a 400 error on the second LLM call in agent mode: `"Invalid value: ''. Supported values are: 'assistant', 'system', 'developer', and 'user'."` ### Changes `smart_decision_maker.py` — 6 functions updated:
| Function | Fix |
|---|---|
| `_convert_raw_response_to_dict` | Detects Responses API `Response` objects, extracts output items as a list |
| `_get_tool_requests` | Recognizes `type: "function_call"` items |
| `_get_tool_responses` | Recognizes `type: "function_call_output"` items |
| `_create_tool_response` | New `responses_api` kwarg produces `function_call_output` format |
| `_update_conversation` | Handles list return from `_convert_raw_response_to_dict` |
| Non-agent mode path | Same list handling for traditional execution |

`test_smart_decision_maker_responses_api.py` — 61 tests covering: - Every branch of all 6 affected helper functions - Chat Completions, Anthropic, and Responses API formats - End-to-end agent mode and traditional mode conversation validity ## Test plan - [x] 61 new unit tests all pass - [x] 11 existing SmartDecisionMakerBlock tests still pass (no regressions) - [x] All pre-commit hooks pass (ruff, black, isort, pyright) - [ ] CI integration tests 🤖 Generated with [Claude Code](https://claude.com/claude-code) --- > [!NOTE] > Medium Risk > Updates core LLM invocation and agent conversation/tool-call bookkeeping to match
OpenAI’s Responses API, which can affect tool execution loops and prompt serialization across providers. Risk is mitigated by extensive new unit tests, but regressions could surface in production agent-mode flows or token/usage accounting. > > Overview > Migrates OpenAI calls from Chat Completions to the Responses API end-to-end, including tool schema conversion, output parsing, reasoning/text extraction, and updated token usage fields in `LLMResponse`. > > Fixes SmartDecisionMakerBlock conversation/tool handling for Responses API by treating `raw_response` as a Response object (splitting it into `output` items for replay), recognizing `function_call`/`function_call_output` entries, and emitting tool outputs in the correct Responses format to prevent invalid follow-up prompts. > > Also adjusts prompt compaction/token estimation to understand Responses API tool items, changes `get_execution_outputs_by_node_exec_id` to return list-valued `CompletedBlockOutput`, removes `gpt-3.5-turbo` from model/cost/docs lists, and adds focused unit tests plus a lightweight `conftest.py` to run these tests without the full server stack. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Otto <otto@agpt.co> Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>	2026-03-20 03:23:52 +00:00
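The core of this fix is serializing tool results in the shape each API expects. Chat Completions takes a role-based `tool` message. The Responses API takes a typed `function_call_output` item. The helpers below are a simplified, hypothetical sketch. They are not the private `_create_tool_response` / `_get_tool_requests` functions from `smart_decision_maker.py`.

```python
import json
from typing import Any


def make_tool_result(call_id: str, output: Any, *, responses_api: bool = False) -> dict[str, Any]:
    """Serialize one tool result for the next LLM request."""
    content = output if isinstance(output, str) else json.dumps(output)
    if responses_api:
        # Responses API: a typed input item with no `role` key.
        return {"type": "function_call_output", "call_id": call_id, "output": content}
    # Chat Completions: a role-based message.
    return {"role": "tool", "tool_call_id": call_id, "content": content}


def requested_call_ids(entry: dict[str, Any]) -> list[str]:
    """Return the tool-call IDs requested by one conversation entry, in either format."""
    if entry.get("type") == "function_call":
        return [entry["call_id"]]
    if entry.get("role") == "assistant":
        return [call["id"] for call in entry.get("tool_calls") or []]
    return []
```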
Nicholas Tindle	cbff3b53d3	Revert "feat(backend): migrate OpenAI provider to Responses API" (#12490 ) Reverts Significant-Gravitas/AutoGPT#12099 Cursor Bugbot summary (for commit `7d6226d10e`): Medium Risk. Reverts the OpenAI integration in `llm_call` from the Responses API back to `chat.completions`, which can change tool-calling, JSON-mode behavior, and token accounting across core AI blocks. The change is localized but touches the primary LLM execution path and associated tests/docs. Overview: Reverts the OpenAI path in `backend/blocks/llm.py` from the Responses API back to `chat.completions`, including updating JSON-mode (`response_format`), tool handling, and usage extraction to match the Chat Completions response shape. Removes the now-unused `backend/util/openai_responses.py` helpers and their unit tests, updates LLM tests to mock `chat.completions.create`, and adds `gpt-3.5-turbo` to the supported model list, cost config, and LLM docs.	2026-03-20 01:51:56 +00:00
Otto	1240f38f75	feat(backend): migrate OpenAI provider to Responses API (#12099 ) ## Summary Migrates the OpenAI provider in the LLM block from `chat.completions.create` to `responses.create` — OpenAI's newer, unified API. Also removes the obsolete GPT-3.5-turbo model. Resolves #11624 Linear: [OPEN-2911](https://linear.app/autogpt/issue/OPEN-2911/update-openai-calls-to-use-responsescreate) ## Changes - `backend/blocks/llm.py` — OpenAI provider now uses `responses.create` exclusively. Removed GPT-3.5-turbo enum + metadata. - `backend/util/openai_responses.py` (new) — Helpers for the Responses API: tool format conversion, content/reasoning/usage/tool-call extraction. - `backend/util/openai_responses_test.py` (new) — Unit tests for all helper functions. - `backend/data/block_cost_config.py` — Removed GPT-3.5 cost entry. - `docs/integrations/block-integrations/llm.md` — Regenerated block docs. ## Key API differences handled \| Aspect \| Chat Completions \| Responses API \| \|--------\|-----------------\|---------------\| \| Messages param \| `messages` \| `input` \| \| Max tokens param \| `max_completion_tokens` \| `max_output_tokens` \| \| Usage fields \| `prompt_tokens` / `completion_tokens` \| `input_tokens` / `output_tokens` \| \| Tool format \| Nested under `function` key \| Flat structure \| ## Test plan - [x] Unit tests for all `openai_responses.py` helpers - [x] Existing LLM block tests updated for Responses API mocks - [x] Regular OpenAI models work - [x] Reasoning OpenAI models work - [x] Non-OpenAI models work --------- Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-19 09:19:31 +00:00
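The parameter, usage and tool-format rows in the "Key API differences" table can be sketched as plain-dict conversions. This is an illustrative assumption, not the contents of `backend/util/openai_responses.py`. The helper names are hypothetical; only the field names come from the table and the public OpenAI request/response shapes.

```python
from typing import Any

# Request-parameter renames (Chat Completions -> Responses API).
PARAM_RENAMES = {"messages": "input", "max_completion_tokens": "max_output_tokens"}

# Usage-field renames (Responses API -> Chat Completions names).
USAGE_RENAMES = {"input_tokens": "prompt_tokens", "output_tokens": "completion_tokens"}


def to_responses_kwargs(chat_kwargs: dict[str, Any]) -> dict[str, Any]:
    """Rename Chat Completions request parameters to their Responses API names."""
    return {PARAM_RENAMES.get(k, k): v for k, v in chat_kwargs.items()}


def to_responses_tool(tool: dict[str, Any]) -> dict[str, Any]:
    """Flatten a nested Chat Completions function tool into the Responses API shape."""
    if tool.get("type") != "function" or "function" not in tool:
        return dict(tool)  # non-function tools pass through unchanged
    # Lift name/description/parameters (and any other keys) out of `function`.
    return {"type": "function", **tool["function"]}


def usage_as_chat_fields(usage: dict[str, int]) -> dict[str, int]:
    """Map Responses API usage counters onto the Chat Completions field names."""
    return {USAGE_RENAMES.get(k, k): v for k, v in usage.items()}
```

Responses API tools are flat, so any extra keys inside `function` (for example `strict`) are lifted to the top level unchanged.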
Zamil Majdy	5d9a169e04	feat(blocks): add AutoPilotBlock for invoking AutoPilot from graphs (#12439 ) ## Summary - Adds `AutogptCopilotBlock` that invokes the platform's copilot system (`stream_chat_completion_sdk`) directly from graph executions - Enables sub-agent patterns: copilot can call this block recursively (with depth limiting via `contextvars`) - Enables scheduled copilot execution through the agent executor system - No user credentials needed — uses server-side copilot config ## Inputs/Outputs Inputs: prompt, system_context, session_id (continuation), timeout, max_recursion_depth Outputs: response text, tool_calls list, conversation_history JSON, session_id, token_usage ## Test plan - [x] Block test passes (`test_available_blocks[AutogptCopilotBlock]`) - [x] Pre-commit hooks pass (format, lint, typecheck) - [ ] Manual test: add block to graph, send prompt, verify response - [ ] Manual test: chain two copilot blocks with session_id to verify continuation	2026-03-18 11:22:25 +00:00
Otto	e657472162	feat(blocks): Add Nano Banana 2 to image generator, customizer, and editor blocks (#12218 ) Requested by @Torantulino Add `google/nano-banana-2` (Gemini 3.1 Flash Image) support across all three image blocks. ### Changes `ai_image_customizer.py` - Add `NANO_BANANA_2 = "google/nano-banana-2"` to `GeminiImageModel` enum - Update block description to reference Nano-Banana models generically `ai_image_generator_block.py` - Add `NANO_BANANA_2` to `ImageGenModel` enum - Add generation branch (identical to NBP except model name) `flux_kontext.py` (AI Image Editor) - Rename `FluxKontextModelName` → `ImageEditorModel` (with backwards-compatible alias) - Add `NANO_BANANA_PRO` and `NANO_BANANA_2` to the editor - Model-aware branching in `run_model()`: NB models use `image_input` list (not `input_image`), no `seed`, and add `output_format` `block_cost_config.py` - Add NB2 cost entries for all three blocks (14 credits, matching NBP) - Add NB Pro cost entry for editor block - Update editor block refs from `.PRO`/`.MAX` to `.FLUX_KONTEXT_PRO`/`.FLUX_KONTEXT_MAX` Resolves SECRT-2047 --------- Co-authored-by: Torantulino <Torantulino@users.noreply.github.com> Co-authored-by: Abhimanyu Yadav <122007096+Abhi1992002@users.noreply.github.com>	2026-03-18 09:42:18 +00:00
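The model-aware branching described for `flux_kontext.py` could be sketched as follows. Only the `google/nano-banana-2` ID and the input key names (`image_input`, `input_image`, `seed`, `output_format`) come from the commit message. The other model IDs, the `"png"` default and the `build_editor_input` helper are assumptions for illustration.

```python
from enum import Enum
from typing import Any, Optional


class ImageEditorModel(str, Enum):
    # Only NANO_BANANA_2's ID appears in the commit; the other IDs are assumed.
    FLUX_KONTEXT_PRO = "black-forest-labs/flux-kontext-pro"
    FLUX_KONTEXT_MAX = "black-forest-labs/flux-kontext-max"
    NANO_BANANA_PRO = "google/nano-banana-pro"
    NANO_BANANA_2 = "google/nano-banana-2"


# Backwards-compatible alias so imports of the old name keep working.
FluxKontextModelName = ImageEditorModel

_NANO_BANANA = {ImageEditorModel.NANO_BANANA_PRO, ImageEditorModel.NANO_BANANA_2}


def build_editor_input(
    model: ImageEditorModel,
    prompt: str,
    image_url: Optional[str] = None,
    seed: Optional[int] = None,
    output_format: str = "png",  # assumed default
) -> dict[str, Any]:
    """Build a model-specific input payload (hypothetical helper)."""
    payload: dict[str, Any] = {"prompt": prompt}
    if model in _NANO_BANANA:
        # Nano Banana models: list-valued `image_input`, no `seed`, explicit `output_format`.
        if image_url:
            payload["image_input"] = [image_url]
        payload["output_format"] = output_format
    else:
        # Flux Kontext models: single `input_image` and an optional `seed`.
        if image_url:
            payload["input_image"] = image_url
        if seed is not None:
            payload["seed"] = seed
    return payload
```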
Abhimanyu Yadav	e32d258a7e	feat(blocks): add AgentMail integration blocks (#12417 ) ## Summary - Add a full AgentMail integration with blocks for managing inboxes, messages, threads, drafts, attachments, lists, and pods - Includes shared provider configuration (`_config.py`) with API key authentication - 8 block modules covering ~25 individual blocks across all AgentMail API surfaces ## Block Modules \| Module \| Blocks \| \|--------\|--------\| \| `inbox.py` \| Create, Get, List, Update, Delete inboxes \| \| `messages.py` \| Send, Get, List, Delete messages + org-wide listing \| \| `threads.py` \| Get, List, Delete threads + org-wide listing \| \| `drafts.py` \| Create, Get, List, Update, Send, Delete drafts + org-wide listing \| \| `attachments.py` \| Download attachments \| \| `lists.py` \| Create, Get, List, Update, Delete mailing lists \| \| `pods.py` \| Create, Get, List, Update, Delete pods \| ## Test plan - [x] `poetry run pytest 'backend/blocks/test/test_block.py' -xvs` — all new blocks pass the standard block test suite - [x] test all blocks manually	2026-03-17 12:40:32 +00:00
Nicholas Tindle	8892bcd230	docs: Add workspace and media file architecture documentation (#11989 ) ### Changes 🏗️ - Added comprehensive architecture documentation at `docs/platform/workspace-media-architecture.md` covering: - Database models (`UserWorkspace`, `UserWorkspaceFile`) - `WorkspaceManager` API with session scoping - `store_media_file()` media normalization pipeline (input types, return formats) - Virus scanning responsibility boundaries - Decision tree for choosing `WorkspaceManager` vs `store_media_file()` - Configuration reference including `clamav_max_concurrency` and `clamav_mark_failed_scans_as_clean` - Common patterns with error handling examples - Updated `autogpt_platform/backend/CLAUDE.md` with a "Workspace & Media Files" section referencing the new docs - Removed duplicate `scan_content_safe()` call from `WriteWorkspaceFileTool` — `WorkspaceManager.write_file()` already scans internally, so the tool was double-scanning every file - Replaced removed comment in `workspace.py` with explicit ownership comment clarifying that `WorkspaceManager` is the single scanning boundary ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified `scan_content_safe()` is called inside `WorkspaceManager.write_file()` (workspace.py:186) - [x] Verified `store_media_file()` scans all input branches including local paths (file.py:351) - [x] Verified documentation accuracy against current source code after merge with dev - [x] CI checks all passing Cursor Bugbot summary (for commit `18fcfa03f8`): Low Risk. Mostly adds documentation and internal developer guidance; the only code change is a comment clarifying `WorkspaceManager.write_file()` as the single virus-scanning boundary, with no behavior change.
Overview: Adds a new `docs/platform/workspace-media-architecture.md` describing the Workspace storage layer vs the `store_media_file()` media pipeline, including session scoping and virus-scanning/persistence responsibility boundaries. Updates backend `CLAUDE.md` to point contributors to the new doc when working on CoPilot uploads/downloads or `WorkspaceManager`/`store_media_file()`, and clarifies in `WorkspaceManager.write_file()` (comment-only) that callers should not duplicate virus scanning. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 06:12:26 +00:00
Bently	ef446e4fe9	feat(llm): Add Cohere Command A Family Models (#12339 ) ## Summary Adds the Cohere Command A family of models to AutoGPT Platform with proper pricing configuration. ## Models Added - Command A 03.2025: Flagship model (256k context, 8k output) - 3 credits - Command A Translate 08.2025: State-of-the-art translation (8k context, 8k output) - 3 credits - Command A Reasoning 08.2025: First reasoning model (256k context, 32k output) - 6 credits - Command A Vision 07.2025: First vision-capable model (128k context, 8k output) - 3 credits ## Changes - Added 4 new LlmModel enum entries with proper OpenRouter model IDs - Added ModelMetadata for each model with correct context windows, output limits, and price tiers - Added pricing configuration in block_cost_config.py ## Testing - [ ] Models appear in AutoGPT Platform model selector - [ ] Pricing is correctly applied when using models Resolves SECRT-2083	2026-03-12 11:56:30 +00:00
Bently	7b1e8ed786	feat(llm): Add Microsoft Phi-4 model support (#12342 ) ## Changes - Added `MICROSOFT_PHI_4` to LlmModel enum (`microsoft/phi-4`) - Configured model metadata: - 16K context window - 16K max output tokens - OpenRouter provider - Set cost tier: 1 - Input: $0.06 per 1M tokens - Output: $0.14 per 1M tokens ## Details Microsoft Phi-4 is a 14B parameter model available through OpenRouter. This PR adds proper support in the autogpt_platform backend. Resolves SECRT-2086	2026-03-12 11:15:27 +00:00
Bently	3595c6e769	feat(llm): add Perplexity Sonar Reasoning Pro model (#12341 ) ## Summary Adds support for Perplexity's new reasoning model: `perplexity/sonar-reasoning-pro` ## Changes - ✅ Added `PERPLEXITY_SONAR_REASONING_PRO` to `LlmModel` enum - ✅ Added model metadata (128K context window, 8K max output tokens, tier 2) - ✅ Set pricing at 5 credits (matches sonar-pro tier) ## Model Details - Model ID: `perplexity/sonar-reasoning-pro` - Provider: OpenRouter - Context Window: 128,000 tokens - Max Output: 8,000 tokens - Pricing: $0.000002/token (prompt), $0.000008/token (completion) - Cost Tier: 2 (5 credits) ## Testing - ✅ Black formatting passed - ✅ Ruff linting passed Resolves SECRT-2084	2026-03-12 09:58:29 +00:00
Bently	ade2baa58f	feat(llm): Add Grok 3 model support (#12343 ) ## Summary Adds support for xAI's Grok 3 model to AutoGPT. ## Changes - Added `GROK_3` to `LlmModel` enum with identifier `x-ai/grok-3` - Configured model metadata: - Context window: 131,072 tokens (128k) - Max output: 32,768 tokens (32k) - Provider: OpenRouter - Creator: xAI - Price tier: 2 (mid-tier) - Set model cost to 3 credits (mid-tier pricing between fast models and Grok 4) - Updated block documentation to include Grok 3 in model lists ## Pricing Rationale - Grok 4: 9 credits (tier 3 - premium, 256k context) - Grok 3: 3 credits (tier 2 - mid-tier, 128k context) ← NEW - Grok 4 Fast/4.1 Fast/Code Fast: 1 credit (tier 1 - affordable) Grok 3 is positioned as a mid-tier model, priced similarly to other tier 2 models. ## Testing - [x] Code passes `black` formatting - [x] Code passes `ruff` linting - [x] Model metadata and cost configuration added - [x] Documentation updated Closes SECRT-2079	2026-03-12 07:31:59 +00:00
Bently	89a5b3178a	fix(llm): Update Gemini model lineup - add 3.1 models, deprecate 3 Pro Preview (#12331 ) ## 🔴 URGENT: Gemini 3 Pro Preview Shutdown - March 9, 2026 Google is shutting down Gemini 3 Pro Preview tomorrow (March 9, 2026). This PR addresses SECRT-2067 by updating the Gemini model lineup to prevent disruption. --- ## Changes ### ✅ P0 - Critical (This Week) - [x] Remove/Replace Gemini 3 Pro Preview → Migrated to 3.1 Pro Preview - [x] Add Gemini 3.1 Pro Preview (released Feb 19, 2026) ### ✅ P1 - High Priority - [x] Add Gemini 3.1 Flash Lite Preview (released Mar 3, 2026) - [x] Add Gemini 3 Flash Preview (released Dec 17, 2025) ### ✅ P2 - Medium Priority - [x] Add Gemini 2.5 Pro (stable/GA) (released Jun 17, 2025) --- ## Model Details \| Model \| Context \| Input Cost \| Output Cost \| Price Tier \| \|-------\|---------\|------------\|-------------\|------------\| \| Gemini 3.1 Pro Preview \| 1.05M \| $2.00/1M \| $12.00/1M \| 2 \| \| Gemini 3.1 Flash Lite Preview \| 1.05M \| $0.25/1M \| $1.50/1M \| 1 \| \| Gemini 3 Flash Preview \| 1.05M \| $0.50/1M \| $3.00/1M \| 1 \| \| Gemini 2.5 Pro (GA) \| 1.05M \| $1.25/1M \| $10.00/1M \| 2 \| \| ~~Gemini 3 Pro Preview~~ \| ~~1.05M~~ \| ~~$2.00/1M~~ \| ~~$12.00/1M~~ \| DEPRECATED \| --- ## Migration Strategy Database Migration: `20260308095500_migrate_deprecated_gemini_3_pro_preview` - Automatically migrates all existing graphs using `google/gemini-3-pro-preview` to `google/gemini-3.1-pro-preview` - Updates: AgentBlock, AgentGraphExecution, AgentNodeExecution, AgentGraph - Zero user-facing disruption - Migration runs on next deployment (before March 9 shutdown) --- ## Testing - [ ] Verify new models appear in LLM block dropdown - [ ] Test migration on staging database - [ ] Confirm existing graphs using deprecated model auto-migrate - [ ] Validate cost calculations for new models --- ## References - Linear Issue: [SECRT-2067](https://linear.app/autogpt/issue/SECRT-2067) - OpenRouter Models: 
https://openrouter.ai/models/google - Google Deprecation Notice: https://ai.google.dev/gemini-api/docs/deprecations --- ## Checklist - [x] Models added to `LlmModel` enum - [x] Model metadata configured - [x] Cost config updated - [x] Database migration created - [x] Deprecated model commented out (not removed for historical reference) - [ ] PR reviewed and approved - [ ] Merged before March 9, 2026 deadline --- Priority: 🔴 Critical - Must merge before March 9, 2026	2026-03-11 11:21:16 +00:00
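The actual change is a SQL migration (`20260308095500_migrate_deprecated_gemini_3_pro_preview`) whose statements are not shown here. As a rough illustration only, the same ID substitution applied to a JSON-like node payload could look like the sketch below. `remap_models` is a hypothetical helper. Matching is by exact string, so a prompt that merely mentions the old ID is left alone.

```python
from typing import Any

# From the commit: deprecated model ID -> replacement model ID.
MODEL_REPLACEMENTS = {"google/gemini-3-pro-preview": "google/gemini-3.1-pro-preview"}


def remap_models(value: Any) -> Any:
    """Return a copy of a JSON-like structure with deprecated model IDs replaced."""
    if isinstance(value, dict):
        return {key: remap_models(item) for key, item in value.items()}
    if isinstance(value, list):
        return [remap_models(item) for item in value]
    if isinstance(value, str):
        # Exact-match replacement only; free text containing the ID is untouched.
        return MODEL_REPLACEMENTS.get(value, value)
    return value
```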
Bently	34a2f9a0a2	feat(llm): add Mistral flagship models (Large 3, Medium 3.1, Small 3.2, Codestral) (#12337 ) ## Summary Adds four missing Mistral AI flagship models to address the critical coverage gap identified in [SECRT-2082](https://linear.app/autogpt/issue/SECRT-2082). ## Models Added \| Model \| Context \| Max Output \| Price Tier \| Use Case \| \|-------\|---------\|------------\|------------\|----------\| \| Mistral Large 3 \| 262K \| None \| 2 (Medium) \| Flagship reasoning model, 41B active params (675B total), MoE architecture \| \| Mistral Medium 3.1 \| 131K \| None \| 2 (Medium) \| Balanced performance/cost, 8x cheaper than traditional large models \| \| Mistral Small 3.2 \| 131K \| 131K \| 1 (Low) \| Fast, cost-efficient, high-volume use cases \| \| Codestral 2508 \| 256K \| None \| 1 (Low) \| Code generation specialist (FIM, correction, test gen) \| ## Problem Previously, the platform only offered: - Mistral Nemo (1 official model) - dolphin-mistral (third-party Ollama fine-tune) This left significant gaps in Mistral's lineup, particularly: - No flagship reasoning model - No balanced mid-tier option - No code-specialized model - Missing multimodal capabilities (Large 3, Medium 3.1, Small 3.2 all support text+image) ## Changes File: `autogpt_platform/backend/backend/blocks/llm.py` - Added 4 enum entries in `LlmModel` class - Added 4 metadata entries in `MODEL_METADATA` dict - All models use OpenRouter provider - Follows existing pattern for model additions ## Testing - ✅ Enum values match OpenRouter model IDs - ✅ Metadata follows existing format - ✅ Context windows verified from OpenRouter API - ✅ Price tiers assigned appropriately ## Closes - SECRT-2082 --- Note: All models are available via OpenRouter and tested. This brings Mistral coverage in line with other major providers (OpenAI, Anthropic, Google).	2026-03-11 08:48:48 +00:00
Zamil Majdy	9f4caa7dfc	feat(blocks): add and harden GitHub blocks for full-cycle development (#12334 ) ## Summary - Add 8 new GitHub blocks: GetRepositoryInfo, ForkRepository, ListCommits, SearchCode, CompareBranches, GetRepositoryTree, MultiFileCommit, MergePullRequest - Split `repo.py` (2094 lines, 19 blocks) into domain-specific modules: `repo.py`, `repo_branches.py`, `repo_files.py`, `commits.py` - Concurrent blob creation via `asyncio.gather()` in MultiFileCommit - URL-encode branch/ref params via `urllib.parse.quote()` for defense-in-depth - Step-level error handling in MultiFileCommit ref update with recovery SHA - Collapse FileOperation CREATE/UPDATE into UPSERT (Git Trees API treats them identically) - Add `ge=1, le=100` constraints on per_page SchemaFields - Preserve URL scheme in `prepare_pr_api_url` - Handle null commit authors gracefully in ListCommits - Add unit tests for `prepare_pr_api_url`, error-path tests for MergePR/MultiFileCommit, FileOperation enum validation tests ## Test plan - [ ] Block tests pass for all 19 GitHub blocks (CI: `test_available_blocks`) - [ ] New test file `test_github_blocks.py` passes (prepare_pr_api_url, error paths, enum) - [ ] `check-docs-sync` passes with regenerated docs - [ ] pyright/ruff clean on all changed files	2026-03-11 08:35:37 +00:00
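Three of the techniques listed in this commit fit in a short sketch:

- percent-encoding refs with `urllib.parse.quote()`
- concurrent blob creation with `asyncio.gather()`
- collapsing CREATE/UPDATE into UPSERT

`create_blob` is injected as a stand-in for the GitHub blob-creation call, and `encode_ref` / `build_tree_entries` are hypothetical names. The sketch relies on the Git Trees API treating a blob `sha` the same way for new and existing paths, and on a `null` `sha` deleting the path.

```python
import asyncio
from enum import Enum
from typing import Any, Awaitable, Callable, Optional
from urllib.parse import quote


class FileOperation(str, Enum):
    UPSERT = "upsert"  # create and update produce the same tree entry
    DELETE = "delete"


def encode_ref(ref: str) -> str:
    """Percent-encode a branch/ref for a URL path (default safe="/" keeps slashes)."""
    return quote(ref)


async def build_tree_entries(
    files: list[tuple[str, FileOperation, Optional[str]]],
    create_blob: Callable[[str], Awaitable[str]],
) -> list[dict[str, Any]]:
    """Turn (path, operation, content) triples into Git Trees API entries."""
    upserts = [(path, content or "") for path, op, content in files if op is FileOperation.UPSERT]
    # Create every blob concurrently; gather returns results in input order.
    shas = await asyncio.gather(*(create_blob(content) for _, content in upserts))
    entries: list[dict[str, Any]] = [
        {"path": path, "mode": "100644", "type": "blob", "sha": sha}
        for (path, _), sha in zip(upserts, shas)
    ]
    # A null sha tells the Trees API to drop the path from the new tree.
    entries += [
        {"path": path, "mode": "100644", "type": "blob", "sha": None}
        for path, op, _ in files
        if op is FileOperation.DELETE
    ]
    return entries
```

`quote()` is left at its default `safe="/"` here, so multi-segment refs like `feature/x` keep their path shape while `?` and `#` cannot leak into the query or fragment. The commit does not say which `safe` setting the real blocks use.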
nKOxxx	c7124a5240	Add documentation for Google Gemini integration (#12283 ) ## Summary Adding comprehensive documentation for Google Gemini integration with AutoGPT. ## Changes - Added setup instructions for Gemini API - Documented configuration options - Added examples and best practices ## Related Issues N/A - Documentation improvement ## Testing - Verified documentation accuracy - Tested all code examples ## Checklist - [x] Code follows project style - [x] Documentation updated - [x] Tests pass (if applicable)	2026-03-09 15:13:28 +00:00
Reinier van der Leer	aa08063939	refactor(backend/db): Improve & clean up Marketplace DB layer & API (#12284 ) These changes were part of #12206, but here they are separately for easier review. This is all primarily to make the v2 API (#11678) work possible/easier. ### Changes 🏗️ - Fix relations between `Profile`, `StoreListing`, and `AgentGraph` - Redefine `StoreSubmission` view with more efficient joins (100x speed-up on dev DB) and more consistent field names - Clean up query functions in `store/db.py` - Clean up models in `store/model.py` - Add missing fields to `StoreAgent` and `StoreSubmission` views - Rename ambiguous `agent_id` -> `graph_id` - Clean up API route definitions & docs in `store/routes.py` - Make routes more consistent - Avoid collision edge-case between `/agents/{username}/{agent_name}` and `/agents/{store_listing_version_id}/*` - Replace all usages of legacy `BackendAPI` for store endpoints with generated client - Remove scope requirements on public store endpoints in v1 external API ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Test all Marketplace views (including admin views) - [x] Download an agent from the marketplace - [x] Submit an agent to the Marketplace - [x] Approve/reject Marketplace submission	2026-03-06 14:38:12 +00:00
Bently	7c8c7bf395	feat(llm): add Claude Sonnet 4.6 model (#12158 ) ## Summary Adds Claude Sonnet 4.6 (`claude-sonnet-4-6`) to the platform. ## Model Details (from [Anthropic docs](https://www.anthropic.com/news/claude-sonnet-4-6)) - API ID: `claude-sonnet-4-6` - Pricing: $3 / input MTok, $15 / output MTok (same as Sonnet 4.5) - Context window: 200K tokens (1M beta) - Max output: 64K tokens - Knowledge cutoff: Aug 2025 (reliable), Jan 2026 (training data) ## Changes - Added `CLAUDE_4_6_SONNET` to `LlmModel` enum - Added metadata entry with correct context/output limits - Updated Stagehand to use Sonnet 4.6 (better for browser automation tasks) ## Why Sonnet 4.6 brings major improvements in coding, computer use, and reasoning. Developers with early access often prefer it to even Opus 4.5. --------- Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>	2026-03-05 19:36:56 +00:00
Krzysztof Czerwinski	a1cb3d2a91	feat(blocks): Add Telegram blocks (#12141 ) Add Telegram blocks that allow the use of [Telegram bots' API features](https://core.telegram.org/bots/features). ### Changes 🏗️ 1. Credentials & API layer: Bot token auth via `APIKeyCredentials`, helper functions for JSON API calls (`call_telegram_api`) and multipart file uploads (`call_telegram_api_with_file`) 2. Trigger blocks: - `TelegramMessageTriggerBlock` — receives messages (text, photo, voice, audio, document, video, edited message) with configurable event filters - `TelegramMessageReactionTriggerBlock` — fires on reaction changes (automatic in private chats; groups require admin) 3. Action blocks (11 total): - Send: Message, Photo, Voice, Audio, Document, Video - Reply to Message, Edit Message, Delete Message - Get File (download by file_id) 4. Webhook manager: Registers/deregisters webhooks via Telegram's setWebhook API, validates incoming requests using the X-Telegram-Bot-Api-Secret-Token header 5. Provider registration: Added TELEGRAM to ProviderName enum and registered `TelegramWebhooksManager` 6. Media send blocks support both URL passthrough (Telegram fetches directly) and file upload for workspace/data URI inputs ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Non-AI UUIDs - [x] Blocks work correctly - [x] SendTelegramMessageBlock - [x] SendTelegramPhotoBlock - [x] SendTelegramVoiceBlock - [x] SendTelegramAudioBlock - [x] SendTelegramDocumentBlock - [x] SendTelegramVideoBlock - [x] ReplyToTelegramMessageBlock - [x] GetTelegramFileBlock - [x] DeleteTelegramMessageBlock - [x] EditTelegramMessageBlock - [x] TelegramMessageTriggerBlock (works for every trigger type) - [x] TelegramMessageReactionTriggerBlock --------- Co-authored-by: Reinier van der Leer <pwuts@agpt.co>	2026-02-26 10:25:08 +00:00
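Validating the `X-Telegram-Bot-Api-Secret-Token` header mentioned in this commit can be sketched in a few lines. The function names are hypothetical. The sketch assumes the same secret was passed as `secret_token` when calling `setWebhook`. It uses a constant-time comparison so response timing does not reveal how much of a guessed token matched.

```python
import hmac
import secrets
from typing import Mapping

SECRET_HEADER = "X-Telegram-Bot-Api-Secret-Token"


def new_webhook_secret() -> str:
    """Generate a value to pass as `secret_token` to setWebhook.

    Telegram accepts 1-256 characters from A-Z, a-z, 0-9, "_" and "-";
    token_urlsafe(32) yields 43 characters from exactly that alphabet.
    """
    return secrets.token_urlsafe(32)


def is_valid_telegram_request(headers: Mapping[str, str], expected_secret: str) -> bool:
    """Check the secret header Telegram sends with every webhook call.

    Assumes `headers` does the case-insensitive lookup most web frameworks
    provide; a plain dict needs the exact casing used here.
    """
    received = headers.get(SECRET_HEADER, "")
    # Constant-time comparison; inputs of different lengths simply return False.
    return hmac.compare_digest(received.encode(), expected_secret.encode())
```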
Bently	ef42b17e3b	docs: add Podman compatibility warning (#12120 ) ## Summary Adds a warning to the Getting Started docs clarifying that Podman and podman-compose are not supported. ## Problem Users on Windows using `podman-compose` instead of Docker get errors like: ``` Error: the specified Containerfile or Dockerfile does not exist, ..\..\autogpt_platform\backend\Dockerfile ``` This is because Podman handles relative paths differently than Docker, causing incorrect path resolution on Windows. ## Solution - Added a clear warning section after the Windows WSL 2 notes - Explains the error users might see - Directs them to install Docker Desktop instead Closes #11358 Greptile Summary: Adds a "Podman Not Supported" warning section to the Getting Started documentation, placed after the Windows/WSL 2 installation notes. The section clarifies that Docker is required, shows the typical error message users encounter when using Podman, and directs them to install Docker Desktop instead. This addresses issue #11358 where Windows users using `podman-compose` hit path resolution errors. - Adds `### ⚠️ Podman Not Supported` section under Manual Setup, after Windows Installation Note - Includes the specific error message users see with Podman for easy identification - Links to Docker Desktop installation docs as the recommended solution - Formatting is consistent with existing sections in the document (emoji headings, code blocks for errors) Confidence Score: 5/5 - This PR is safe to merge — it only adds a documentation warning section with no code changes. - The change is a small, well-written documentation addition that adds a Podman compatibility warning. It touches only one markdown file, introduces no code changes, and is consistent with the existing document structure and style. No issues were found. - No files require special attention. Flowchart:
```mermaid
flowchart TD
    A[User wants to run AutoGPT] --> B{Which container runtime?}
    B -->|Docker / Docker Desktop| C[docker compose up -d --build]
    C --> D[AutoGPT starts successfully]
    B -->|Podman / podman-compose| E[podman-compose up -d --build]
    E --> F[Error: Containerfile or Dockerfile does not exist]
    F --> G[New warning section directs user to install Docker Desktop]
    G --> C
```
Greptile last reviewed commit: 23ea6bd	2026-02-23 15:19:24 +00:00
Eve	647c8ed8d4	feat(backend/blocks): enhance list concatenation with advanced operations (#12105 ) ## Summary Enhances the existing `ConcatenateListsBlock` and adds five new companion blocks for comprehensive list manipulation, addressing issue #11139 ("Implement block to concatenate lists"). ### Changes - Enhanced `ConcatenateListsBlock` with optional deduplication (`deduplicate`) and None-value filtering (`remove_none`), plus an output `length` field - New `FlattenListBlock`: Recursively flattens nested list structures with configurable `max_depth` - New `InterleaveListsBlock`: Round-robin interleaving of elements from multiple lists - New `ZipListsBlock`: Zips corresponding elements from multiple lists with support for padding to longest or truncating to shortest - New `ListDifferenceBlock`: Computes set difference between two lists (regular or symmetric) - New `ListIntersectionBlock`: Finds common elements between two lists, preserving order ### Helper Utilities Extracted reusable helper functions for validation, flattening, deduplication, interleaving, chunking, and statistics computation to support the blocks and enable future reuse. 
### Test Coverage Comprehensive test suite with 188 test functions across 29 test classes covering: - Built-in block test harness validation for all 6 blocks - Manual edge-case tests for each block (empty inputs, large lists, mixed types, nested structures) - Internal method tests for all block classes - Unit tests for all helper utility functions Closes #11139 ## Test plan - [x] All files pass Python syntax validation (`ast.parse`) - [x] Built-in `test_input`/`test_output` tests defined for all blocks - [x] Manual tests cover edge cases: empty lists, large lists, mixed types, nested structures, deduplication, None removal - [x] Helper function tests validate all utility functions independently - [x] All block IDs are valid UUID4 - [x] Block categories set to `BlockCategory.BASIC` for consistency with existing list blocks Greptile Overview: Enhanced `ConcatenateListsBlock` with deduplication and None-filtering options, and added five new list manipulation blocks (`FlattenListBlock`, `InterleaveListsBlock`, `ZipListsBlock`, `ListDifferenceBlock`, `ListIntersectionBlock`) with comprehensive helper functions and test coverage. Key Changes: - Enhanced `ConcatenateListsBlock` with `deduplicate` and `remove_none` options, plus `length` output field - Added `FlattenListBlock` for recursively flattening nested lists with configurable `max_depth` - Added `InterleaveListsBlock` for round-robin element interleaving - Added `ZipListsBlock` with support for padding/truncation - Added `ListDifferenceBlock` and `ListIntersectionBlock` for set operations - Extracted 12 reusable helper functions for validation, flattening, deduplication, etc.
- Comprehensive test suite with 188 test functions covering edge cases Minor Issues: - Helper function `_deduplicate_list` has redundant logic in the `else` branch that duplicates the `if` branch - Three helper functions (`_filter_empty_collections`, `_compute_list_statistics`, `_chunk_list`) are defined but unused - consider removing unless planned for future use - The `_make_hashable` function uses `hash(repr(item))` for unhashable types, which correctly treats structurally identical dicts/lists as duplicates Confidence Score: 4/5 - Safe to merge with minor style improvements recommended - The implementation is well-structured with comprehensive test coverage (188 tests), proper error handling, and follows existing block patterns. All blocks use valid UUID4 IDs and correct categories. The helper functions provide good code reuse. The minor issues are purely stylistic (redundant code, unused helpers) and don't affect functionality or safety. - No files require special attention - both files are well-tested and follow project conventions Sequence Diagram:
```mermaid
sequenceDiagram
    participant User
    participant Block as List Block
    participant Helper as Helper Functions
    participant Output
    User->>Block: Input (lists/parameters)
    Block->>Helper: _validate_all_lists()
    Helper-->>Block: validation result
    alt validation fails
        Block->>Output: error message
    else validation succeeds
        Block->>Helper: _concatenate_lists_simple() / _flatten_nested_list() / etc.
        Helper-->>Block: processed result
        opt deduplicate enabled
            Block->>Helper: _deduplicate_list()
            Helper-->>Block: deduplicated result
        end
        opt remove_none enabled
            Block->>Helper: _filter_none_values()
            Helper-->>Block: filtered result
        end
        Block->>Output: result + length
    end
    Output-->>User: Block outputs
```
Greptile last reviewed commit: a6d5445 --------- Co-authored-by: Otto <otto@agpt.co>	2026-02-16 05:39:53 +00:00
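Three of the list operations in this commit can be sketched in plain Python:

- flattening with `max_depth`
- round-robin interleaving
- order-preserving deduplication with a `repr` fallback for unhashable items

These are hypothetical stand-ins, not the block's `_flatten_nested_list` / `_deduplicate_list` helpers. In particular, the `max_depth` convention (negative means unlimited) is an assumption.

```python
from typing import Any, Iterable


def flatten(items: list[Any], max_depth: int = -1) -> list[Any]:
    """Flatten nested lists; max_depth < 0 means unlimited, 0 means no flattening."""
    out: list[Any] = []
    for item in items:
        if isinstance(item, list) and max_depth != 0:
            out.extend(flatten(item, max_depth - 1))
        else:
            out.append(item)
    return out


def interleave(*lists: list[Any]) -> list[Any]:
    """Round-robin: take one element from each list in turn, skipping exhausted lists."""
    out: list[Any] = []
    longest = max((len(lst) for lst in lists), default=0)
    for i in range(longest):
        for lst in lists:
            if i < len(lst):
                out.append(lst[i])
    return out


def _dedup_key(item: Any) -> Any:
    try:
        hash(item)
        return item
    except TypeError:
        # Unhashable (dict, list, ...): items with identical reprs count as duplicates.
        return ("__repr__", repr(item))


def deduplicate(items: Iterable[Any]) -> list[Any]:
    """Drop later duplicates while keeping first-seen order."""
    seen: set[Any] = set()
    out: list[Any] = []
    for item in items:
        key = _dedup_key(item)
        if key not in seen:
            seen.add(key)
            out.append(item)
    return out
```

Using `repr` as the fallback key means two dicts count as duplicates only when their keys were inserted in the same order.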
Zamil Majdy	f9f358c526	feat(mcp): Add MCP tool block with OAuth, tool discovery, and standard credential integration (#12011 ) ## Summary Full-stack MCP (Model Context Protocol) tool block integration that allows users to connect to any MCP server, discover available tools, authenticate via OAuth, and execute tools — all through the standard AutoGPT credential system. ### Backend - MCPToolBlock (`blocks/mcp/block.py`): New block using `CredentialsMetaInput` pattern with optional credentials (`default={}`), supporting both authenticated (OAuth) and public MCP servers. Includes auto-lookup fallback for backward compatibility.
- MCP Client (`blocks/mcp/client.py`): HTTP transport with JSON-RPC 2.0, tool discovery, tool execution with robust error handling (type-checked error fields, non-JSON response handling) - MCP OAuth Handler (`blocks/mcp/oauth.py`): RFC 8414 discovery, dynamic per-server OAuth with PKCE, token storage and refresh via `raise_for_status=True` - MCP API Routes (`api/features/mcp/routes.py`): `discover-tools`, `oauth/login`, `oauth/callback` endpoints with credential cleanup, defensive OAuth metadata validation - Credential system integration: - `CredentialsMetaInput` model_validator normalizes legacy `"ProviderName.MCP"` format from Python 3.13's `str(StrEnum)` change - `CredentialsFieldInfo.combine()` supports URL-based credential discrimination (each MCP server gets its own credential entry) - `aggregate_credentials_inputs` checks block schema defaults for credential optionality - Executor normalizes credential data for both Pydantic and JSON schema validation paths - Chat credential matching handles MCP server URL filtering - `provider_matches()` helper used consistently for Python 3.13 StrEnum compatibility - Pre-run validation: `_validate_graph_get_errors` now calls `get_missing_input()` for custom block-level validation (MCP tool arguments) - Security: HTML tag stripping loop to prevent XSS bypass, SSRF protection (removed trusted_origins) ### Frontend - MCPToolDialog (`MCPToolDialog.tsx`): Full tool discovery UI — enter server URL, authenticate if needed, browse tools, select tool and configure - OAuth popup (`oauth-popup.ts`): Shared utility supporting cross-origin MCP OAuth flows with BroadcastChannel + localStorage fallback - Credential integration: MCP-specific OAuth flow in `useCredentialsInput`, server URL filtering in `useCredentials`, MCP callback page - CredentialsSelect: Auto-selects first available credential instead of defaulting to "None", credentials listed before "None" in dropdown - Node rendering: Dynamic tool input schema rendering on MCP 
nodes, proper handling in both legacy and new flow editors - Block title persistence: `customized_name` set at block creation for both MCP and Agent blocks — no fallback logic needed, titles survive save/load reliably - Stable credential ordering: Removed `sortByUnsetFirst` that caused credential inputs to jump when selected ### Tests (~2060 lines) - Unit tests: block, client, tool execution - Integration tests: mock MCP server with auth - OAuth flow tests - API endpoint tests - Credential combining/optionality tests - E2e tests (skipped in CI, run manually) ## Key Design Decisions 1. Optional credentials via `default={}`: MCP servers can be public (no auth) or private (OAuth). The `credentials` field has `default={}` making it optional at the schema level, so public servers work without prompting for credentials. 2. URL-based credential discrimination: Each MCP server URL gets its own credential entry in the "Run agent" form (via `discriminator="server_url"`), so agents using multiple MCP servers prompt for each independently. 3. Model-level normalization: Python 3.13 changed `str(StrEnum)` to return `"ClassName.MEMBER"`. Rather than scattering fixes across the codebase, a Pydantic `model_validator(mode="before")` on `CredentialsMetaInput` handles normalization centrally, and `provider_matches()` handles lookups. 4. Credential auto-select: `CredentialsSelect` component defaults to the first available credential and notifies the parent state, ensuring credentials are pre-filled in the "Run agent" dialog without requiring manual selection. 5. customized_name for block titles: Both MCP and Agent blocks set `customized_name` in metadata at creation time. This eliminates convoluted runtime fallback logic (`agent_name`, hostname extraction) — the title is persisted once and read directly. 
## Test plan - [x] Unit/integration tests pass (68 MCP + 11 graph = 79 tests) - [x] Manual: MCP block with public server (DeepWiki) — no credentials needed, tools discovered and executable - [x] Manual: MCP block with OAuth server (Linear, Sentry) — OAuth flow prompts correctly - [x] Manual: "Run agent" form shows correct credential requirements per MCP server - [x] Manual: Credential auto-selects when exactly one matches, pre-selects first when multiple exist - [x] Manual: Credential ordering stays stable when selecting/deselecting - [x] Manual: MCP block title persists after save and refresh - [x] Manual: Agent block title persists after save and refresh (via customized_name) - [ ] Manual: Shared agent with MCP block prompts new user for credentials --------- Co-authored-by: Otto <otto@agpt.co> Co-authored-by: Ubbe <hi@ubbe.dev>	2026-02-13 16:17:03 +00:00
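The MCP OAuth handler in this commit is described as doing "dynamic per-server OAuth with PKCE". Below is a minimal sketch of the PKCE piece. It assumes the standard S256 method from RFC 7636, because the commit does not say which method is used. `make_pkce_pair` and `s256_challenge` are hypothetical names, not the project's API.

```python
import base64
import hashlib
import secrets


def s256_challenge(verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), '=' padding removed (RFC 7636, S256)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple[str, str]:
    """Return a fresh (code_verifier, code_challenge) pair.

    Hypothetical helper for illustration; not taken from blocks/mcp/oauth.py.
    """
    # 32 random bytes encode to 43 base64url characters once padding is stripped,
    # which is inside RFC 7636's allowed verifier length of 43..128 characters.
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
    return verifier, s256_challenge(verifier)


if __name__ == "__main__":
    verifier, challenge = make_pkce_pair()
    print(verifier, challenge)
```

Under RFC 7636, the authorization request carries `code_challenge` with `code_challenge_method=S256`. The later token request carries the original `code_verifier`, which the server hashes and compares against the challenge.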
Nicholas Tindle	cb166dd6fb	feat(blocks): Store sandbox files to workspace (#12073 ) Store files created by sandbox blocks (Claude Code, Code Executor) to the user's workspace for persistence across runs. ### Changes 🏗️ - New `sandbox_files.py` utility (`backend/util/sandbox_files.py`) - Shared module for extracting files from E2B sandboxes - Stores files to workspace via `store_media_file()` (includes virus scanning, size limits) - Returns `SandboxFileOutput` with path, content, and `workspace_ref` - Claude Code block (`backend/blocks/claude_code.py`) - Added `workspace_ref` field to `FileOutput` schema - Replaced inline `_extract_files()` with shared utility - Files from working directory now stored to workspace automatically - Code Executor block (`backend/blocks/code_executor.py`) - Added `files` output field to `ExecuteCodeBlock.Output` - Creates `/output` directory in sandbox before execution - Extracts all files (text + binary) from `/output` after execution - Updated `execute_code()` to support file extraction with `extract_files` param ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Create agent with Claude Code block, have it create a file, verify `workspace_ref` in output - [x] Create agent with Code Executor block, write file to `/output`, verify `workspace_ref` in output - [x] Verify files persist in workspace after sandbox disposal - [x] Verify binary files (images, etc.) 
work correctly in Code Executor - [x] Verify existing graphs using `content` field still work (backward compat) #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under Changes) No configuration changes required - this is purely additive backend code. --- Related: Closes SECRT-1931 --- > [!NOTE] > Medium Risk > Adds automatic extraction and workspace storage of sandbox-written files (including binaries for code execution), which can affect output payload size, performance, and file-handling edge cases. > > Overview > Sandbox blocks now persist generated files to workspace. A new shared utility (`backend/util/sandbox_files.py`) extracts files from an E2B sandbox (scoped by a start timestamp) and stores them via `store_media_file`, returning `SandboxFileOutput` with `workspace_ref`. > > `ClaudeCodeBlock` replaces its inline file-scraping logic with this utility and updates the `files` output schema to include `workspace_ref`. > > `ExecuteCodeBlock` adds a `files` output and extends the executor mixin to optionally extract/store files (text + binary) when an `execution_context` is provided; related mocks/tests and docs are updated accordingly. > > Written by Cursor Bugbot for commit `343854c0cf`. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 15:56:59 +00:00
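The new `sandbox_files.py` utility is described as extracting files "scoped by a start timestamp". The sketch below shows that filtering idea against a local directory rather than an E2B sandbox, an assumption made so the example runs anywhere. `files_created_since` is a hypothetical name. The real utility also hands each selected file to `store_media_file()`, which is not modeled here.

```python
from pathlib import Path


def files_created_since(root: str, start_ts: float) -> list[Path]:
    """Return regular files under `root` modified at or after `start_ts` (a Unix timestamp).

    Hypothetical local-filesystem stand-in for the E2B-based extraction described
    in the commit; files last modified before the run started are skipped.
    """
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.stat().st_mtime >= start_ts
    )
```

In use, a caller would record `time.time()` before running the sandboxed code. It would then pass that value as `start_ts`, so pre-existing files in the output directory are not re-uploaded.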
Reinier van der Leer	113e87a23c	refactor(backend): Reduce circular imports (#12068 ) I'm getting circular import issues because there is a lot of cross-importing between `backend.data`, `backend.blocks`, and other modules. This change reduces block-related cross-imports and thus risk of breaking circular imports. ### Changes 🏗️ - Strip down `backend.data.block` - Move `Block` base class and related class/enum defs to `backend.blocks._base` - Move `is_block_auth_configured` to `backend.blocks._utils` - Move `get_blocks()`, `get_io_block_ids()` etc. to `backend.blocks` (`__init__.py`) - Update imports everywhere - Remove unused and poorly typed `Block.create()` - Change usages from `block_cls.create()` to `block_cls()` - Improve typing of `load_all_blocks` and `get_blocks` - Move cross-import of `backend.api.features.library.model` from `backend/data/__init__.py` to `backend/data/integrations.py` - Remove deprecated attribute `NodeModel.webhook` - Re-generate OpenAPI spec and fix frontend usage - Eliminate module-level `backend.blocks` import from `blocks/agent.py` - Eliminate module-level `backend.data.execution` and `backend.executor.manager` imports from `blocks/helpers/review.py` - Replace `BlockInput` with `GraphInput` for graph inputs ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - CI static type-checking + tests should be sufficient for this	2026-02-12 12:07:49 +00:00
Otto	36aeb0b2b3	docs(blocks): clarify HumanInTheLoop output descriptions for agent builder (#12069 ) ## Problem The agent builder (LLM) misinterprets the HumanInTheLoop block outputs. It thinks `approved_data` and `rejected_data` will yield status strings like "APPROVED" or "REJECTED" instead of understanding that the actual input data passes through. This leads to unnecessary complexity - the agent builder adds comparison blocks to check for status strings that don't exist. ## Solution Enriched the block docstring and all input/output field descriptions to make it explicit that: 1. The output is the actual data itself, not a status string 2. The routing is determined by which output pin fires 3. How to use the block correctly (connect downstream blocks to appropriate output pins) ## Changes - Updated block docstring with clear "How it works" and "Example usage" sections - Enhanced `data` input description to explain data flow - Enhanced `name` input description for reviewer context - Enhanced `approved_data` output to explicitly state it's NOT a status string - Enhanced `rejected_data` output to explicitly state it's NOT a status string - Enhanced `review_message` output for clarity ## Testing Documentation-only change to schema descriptions. No functional changes. Fixes SECRT-1930 ## Greptile Overview ### Greptile Summary Enhanced documentation for the `HumanInTheLoopBlock` to clarify how output pins work. The key improvement explicitly states that output pins (`approved_data` and `rejected_data`) yield the actual input data, not status strings like "APPROVED" or "REJECTED". This prevents the agent builder (LLM) from misinterpreting the block's behavior and adding unnecessary comparison blocks.
Key changes: - Added "How it works" and "Example usage" sections to the block docstring - Clarified that routing is determined by which output pin fires, not by comparing output values - Enhanced all input/output field descriptions with explicit data flow explanations - Emphasized that downstream blocks should be connected to the appropriate output pin based on desired workflow path This is a documentation-only change with no functional modifications to the code logic. ### Confidence Score: 5/5 - This PR is safe to merge with no risk - Documentation-only change that accurately reflects the existing code behavior. No functional changes, no runtime impact, and the enhanced descriptions correctly explain how the block outputs work based on verification of the implementation code. - No files require special attention Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>	2026-02-11 15:43:58 +00:00
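This change is documentation-only, but the behavior it documents can be sketched. Assume, as in AutoGPT's block API, that a block emits outputs by yielding `(output_name, value)` pairs. The hypothetical `route_review` generator below emits the same input `data` on either `approved_data` or `rejected_data`. The decision is carried by which pin fires, not by a status string.

```python
from typing import Any, Iterator


def route_review(data: Any, approved: bool) -> Iterator[tuple[str, Any]]:
    """Hypothetical sketch of HumanInTheLoop routing as the new descriptions explain it.

    The value emitted is always the original `data`; the reviewer's decision is
    encoded only in *which* output name is yielded.
    """
    if approved:
        yield "approved_data", data
    else:
        yield "rejected_data", data


if __name__ == "__main__":
    draft = {"subject": "Weekly report", "body": "..."}
    for pin, value in route_review(draft, approved=True):
        print(pin, value is draft)  # prints: approved_data True
```

A downstream "send" block wired to `approved_data` would receive the draft itself. No comparison block checking for an "APPROVED" string is needed.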
Nicholas Tindle	85b6520710	feat(blocks): Add video editing blocks (#11796 ) This PR adds general-purpose video editing blocks for the AutoGPT Platform, enabling automated video production workflows like documentary creation, marketing videos, tutorial assembly, and content repurposing. ### Changes 🏗️ New blocks added in `backend/blocks/video/`: - `VideoDownloadBlock` - Download videos from URLs (YouTube, Vimeo, news sites, direct links) using yt-dlp - `VideoClipBlock` - Extract time segments from videos with start/end time validation - `VideoConcatBlock` - Merge multiple video clips with optional transitions (none, crossfade, fade_black) - `VideoTextOverlayBlock` - Add text overlays/captions with positioning and timing options - `VideoNarrationBlock` - Generate AI narration via ElevenLabs and mix with video audio (replace, mix, or ducking modes) Dependencies required: - `yt-dlp` - For video downloading - `moviepy` - For video editing operations Implementation details: - All blocks follow the SDK pattern with proper error handling and exception chaining - Proper resource cleanup in `finally` blocks to prevent memory leaks - Input validation (e.g., end_time > start_time) - Test mocks included for CI ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Blocks follow the SDK pattern with `BlockSchemaInput`/`BlockSchemaOutput` - [x] Resource cleanup is implemented in `finally` blocks - [x] Exception chaining is properly implemented - [x] Input validation is in place - [x] Test mocks are provided for CI environments #### For configuration changes: - [ ] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [ ] I have
included a list of my configuration changes in the PR description (under Changes) N/A - No configuration changes required. --- > [!NOTE] > Medium Risk > Adds new multimedia blocks that invoke ffmpeg/MoviePy and introduces new external dependencies (plus container packages), which can impact runtime stability and resource usage; download/overlay blocks are present but disabled due to sandbox/policy concerns. > > Overview > Adds a new `backend.blocks.video` module with general-purpose video workflow blocks (download, clip, concat w/ transitions, loop, add-audio, text overlay, and ElevenLabs-powered narration), including shared utilities for codec selection, filename cleanup, and an ffmpeg-based chapter-strip workaround for MoviePy. > > Extends credentials/config to support ElevenLabs (`ELEVENLABS_API_KEY`, provider enum, system credentials, and cost config) and adds new dependencies (`elevenlabs`, `yt-dlp`) plus Docker runtime packages (`ffmpeg`, `imagemagick`). > > Improves file/reference handling end-to-end by embedding MIME types in `workspace://...#mime` outputs and updating frontend rendering to detect video vs image from MIME fragments (and broaden supported audio/video extensions), with optional enhanced output rendering behind a feature flag in the legacy builder UI. > > Written by Cursor Bugbot for commit `da7a44d794`. --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com> Co-authored-by: Otto <otto@agpt.co> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 22:22:33 +00:00
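The summary above mentions embedding MIME types in `workspace://...#mime` references so the frontend can tell video from image. The actual frontend code is TypeScript and is not shown in this log. The Python sketch below only illustrates the parsing idea, using `urllib.parse.urlsplit` to read the fragment. The helper name and the `workspace://<id>` shape before the fragment are assumptions.

```python
from urllib.parse import urlsplit

MEDIA_KINDS = {"video", "image", "audio"}


def media_kind(ref: str) -> str:
    """Classify a workspace reference by the MIME type carried in its '#' fragment.

    Hypothetical helper: returns 'video', 'image' or 'audio' from the MIME
    top-level type, and 'other' when the fragment is missing or unrecognised.
    """
    mime = urlsplit(ref).fragment.strip().lower()
    top_level = mime.split("/", 1)[0]
    return top_level if top_level in MEDIA_KINDS else "other"
```

Carrying the MIME type in the fragment means a renderer can choose a video or image element without fetching the file first.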
Bently	bfa942e032	feat(platform): Add Claude Opus 4.6 model support (#11983 ) ## Summary Adds support for Anthropic's newly released Claude Opus 4.6 model. ## Changes - Added `claude-opus-4-6` to the `LlmModel` enum - Added model metadata: 200K context window (1M beta), 128K max output tokens - Added block cost config (same pricing tier as Opus 4.5: $5/MTok input, $25/MTok output) - Updated chat config default model to Claude Opus 4.6 ## Model Details From [Anthropic's docs](https://docs.anthropic.com/en/docs/about-claude/models): - API ID: `claude-opus-4-6` - Context window: 200K tokens (1M beta) - Max output: 128K tokens (up from 64K on Opus 4.5) - Extended thinking: Yes - Adaptive thinking: Yes (new, Opus 4.6 exclusive) - Knowledge cutoff: May 2025 (reliable), Aug 2025 (training) - Pricing: $5/MTok input, $25/MTok output (same as Opus 4.5) --------- Co-authored-by: Toran Bruce Richards <toran.richards@gmail.com>	2026-02-05 19:19:51 +00:00
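The list prices quoted above are $5 per million input tokens and $25 per million output tokens. At those rates, a request's cost is a weighted sum. For example, 10,000 input tokens and 2,000 output tokens come to $0.05 + $0.05 = $0.10. The function below is a hypothetical helper that applies only those two rates. It does not model the platform's block cost config and ignores the 1M-context beta.

```python
# List prices quoted in the commit message, in USD per million tokens (MTok).
INPUT_USD_PER_MTOK = 5.0
OUTPUT_USD_PER_MTOK = 25.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical estimate: tokens * per-MTok rate, divided by one million."""
    return (input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


if __name__ == "__main__":
    print(f"${estimate_cost_usd(10_000, 2_000):.2f}")  # prints $0.10
```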
Bently	3ca2387631	feat(blocks): Implement Text Encode block (#11857 ) ## Summary Implements a `TextEncoderBlock` that encodes plain text into escape sequences (the reverse of `TextDecoderBlock`). ## Changes ### Block Implementation - Added `encoder_block.py` with `TextEncoderBlock` in `autogpt_platform/backend/backend/blocks/` - Uses `codecs.encode(text, "unicode_escape").decode("utf-8")` for encoding - Mirrors the structure and patterns of the existing `TextDecoderBlock` - Categorised as `BlockCategory.TEXT` ### Documentation - Added Text Encoder section to `docs/integrations/block-integrations/text.md` (the auto-generated docs file for TEXT category blocks) - Expanded "How it works" with technical details on the encoding method, validation, and edge cases - Added 3 structured use cases per docs guidelines: JSON payload preparation, Config/ENV generation, Snapshot fixtures - Added Text Encoder to the overview table in `docs/integrations/README.md` - Removed standalone `encoder_block.md` (TEXT category blocks belong in `text.md` per `CATEGORY_FILE_MAP` in `generate_block_docs.py`) ### Documentation Formatting (CodeRabbit feedback) - Added blank lines around markdown tables (MD058) - Added `text` language tags to fenced code blocks (MD040) - Restructured use case section with bold headings per coding guidelines ## How Docs Were Synced The `check-docs-sync` CI job runs `poetry run python scripts/generate_block_docs.py --check` which expects blocks to be documented in category-grouped files. Since `TextEncoderBlock` uses `BlockCategory.TEXT`, the `CATEGORY_FILE_MAP` maps it to `text.md` — not a standalone file. The block entry was added to `text.md` following the exact format used by the generator (with `<!-- MANUAL -->` markers for hand-written sections). 
## Related Issue Fixes #11111 --------- Co-authored-by: Otto <otto@agpt.co> Co-authored-by: lif <19658300+majiayu000@users.noreply.github.com> Co-authored-by: Aryan Kaul <134673289+aryancodes1@users.noreply.github.com> Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co> Co-authored-by: Nick Tindle <nick@ntindle.com>	2026-02-05 17:31:02 +00:00
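The commit names the exact call `TextEncoderBlock` uses. The sketch wraps it in a plain function with a hypothetical name, outside the block framework. The function turns real control and non-ASCII characters into escape sequences. The inverse call is included only to demonstrate the round trip. It is an assumption about how a decoder would work, not a copy of `TextDecoderBlock`.

```python
import codecs


def encode_text(text: str) -> str:
    """Replace control and non-ASCII characters with escape sequences.

    This is the call quoted in the commit: a real newline becomes the two
    characters backslash and 'n', and 'é' becomes backslash, 'x', 'e', '9'.
    """
    return codecs.encode(text, "unicode_escape").decode("utf-8")


def decode_text(escaped: str) -> str:
    """Reverse direction, used here only to show the round trip."""
    return codecs.decode(escaped, "unicode_escape")


if __name__ == "__main__":
    print(encode_text("Hello\nWorld\tcafé"))  # prints Hello\nWorld\tcaf\xe9
```

The `unicode_escape` output is always ASCII, so decoding it back is safe. Raw non-ASCII text fed straight into the decoder is not safe: its UTF-8 bytes are read as Latin-1 and come out garbled.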
Otto	4f908d5cb3	fix(platform): Improve Linear Search Block [SECRT-1880] (#11967 ) ## Summary Implements [SECRT-1880](https://linear.app/autogpt/issue/SECRT-1880) - Improve Linear Search Block ## Changes ### Models (`models.py`) - Added `State` model with `id`, `name`, and `type` fields for workflow state information - Added `state: State | None` field to `Issue` model ### API Client (`_api.py`) - Updated `try_search_issues()` to: - Add `max_results` parameter (default 10, was ~50) to reduce token usage - Add `team_id` parameter for team filtering - Return `createdAt`, `state`, `project`, and `assignee` fields in results - Fixed `try_get_team_by_name()` to return a descriptive error message when the team is not found instead of crashing with `IndexError` ### Block (`issues.py`) - Added `max_results` input parameter (1-100, default 10) - Added `team_name` input parameter for optional team filtering - Added `error` output field for graceful error handling - Added categories (`PRODUCTIVITY`, `ISSUE_TRACKING`) - Updated test fixtures to include new fields ## Breaking Changes

| Change | Before | After | Mitigation |
|--------|--------|-------|------------|
| Default result count | ~50 | 10 | Users can set `max_results` up to 100 if needed |

## Non-Breaking Changes - `state` field added to `Issue` (optional, defaults to `None`) - `max_results` param added (has default value) - `team_name` param added (optional, defaults to `None`) - `error` output added (follows established pattern from GitHub blocks) ## Testing - [x] Format/lint checks pass - [x] Unit test fixtures updated Resolves SECRT-1880 --------- Co-authored-by: Toran Bruce Richards <toran.richards@gmail.com> Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Toran Bruce Richards <Torantulino@users.noreply.github.com>	2026-02-04 22:54:46 +00:00
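The `try_get_team_by_name()` fix replaces an `IndexError`, presumably from indexing `[0]` into an empty result list, with a descriptive error. The sketch below shows that pattern plus a check for the new 1-100 `max_results` range. Both functions are hypothetical. The commit says the real function *returns* an error message. This sketch raises `ValueError` instead, a swapped-in choice so the failure cannot be silently ignored.

```python
from typing import Any


def find_team_id(teams: list[dict[str, Any]], name: str) -> str:
    """Return the id of the team called `name`, or fail with a message listing valid names."""
    matches = [team for team in teams if team.get("name") == name]
    if not matches:
        # Indexing matches[0] without this check would raise a bare IndexError.
        available = ", ".join(sorted(str(team.get("name")) for team in teams)) or "none"
        raise ValueError(f"Linear team {name!r} not found; available teams: {available}")
    return matches[0]["id"]


def check_max_results(value: int, low: int = 1, high: int = 100) -> int:
    """Reject values outside the block's documented 1-100 range."""
    if not low <= value <= high:
        raise ValueError(f"max_results must be between {low} and {high}, got {value}")
    return value
```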
Otto	7ee94d986c	docs: add credentials prerequisites to create-basic-agent guide (#11913 ) ## Summary Addresses #11785 - users were encountering `openai_api_key_credentials` errors when following the create-basic-agent guide because it didn't mention the need to configure API credentials before using AI blocks. ## Changes Added a Prerequisites section to `docs/platform/create-basic-agent.md` explaining: - Cloud users: Go to Profile → Integrations to add API keys - Self-hosted (Docker): Add keys to `autogpt_platform/backend/.env` and restart services Also added a note that the Calculator example doesn't need credentials, making it a good first test. ## Related - Issue: #11785	2026-01-31 03:05:31 +00:00


288 Commits