Mirror of https://github.com/Significant-Gravitas/AutoGPT.git (synced 2026-04-30 03:00:41 -04:00)
80bfd64ffa0e4454be369ddca794e916b8d357d1
8140 Commits
80bfd64ffa — Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev

0076ad2a1a — hotfix(blocks): bump stagehand ^0.5.1 → ^3.4.0 to fix yanked litellm (#12539)
## Summary

**Critical CI fix** — litellm was compromised in a supply-chain attack (versions 1.82.7/1.82.8 contained infostealer malware), and PyPI subsequently yanked many litellm versions, including the 1.7x range that stagehand 0.5.x depended on. This breaks `poetry lock` in CI for all PRs.

- Bump `stagehand` from `^0.5.1` to `^3.4.0` — Stagehand v3 is a Stainless-generated HTTP API client that **no longer depends on litellm**, completely removing litellm from our dependency tree
- Migrate stagehand blocks to use `AsyncStagehand` + session-based API (`sessions.start`, `session.navigate/act/observe/extract`)
- Net reduction of ~430 lines in `poetry.lock` from dropping litellm and its transitive dependencies

## Why

All CI pipelines are blocked because `poetry lock` fails to resolve the yanked litellm versions that stagehand 0.5.x required.

## Test plan

- [x] CI passes (poetry lock resolves, backend tests green)
- [ ] Verify stagehand blocks still function with the new session-based API
edb3d322f0 — feat(backend/copilot): parallel block execution via infrastructure-level pre-launch (#12472)
## Summary

- Implements **infrastructure-level parallel tool execution** for CoPilot: all tools called in a single LLM turn now execute concurrently, with zero changes to individual tool implementations or LLM prompts.
- Adds `pre_launch_tool_call()` to `tool_adapter.py`: when an `AssistantMessage` with `ToolUseBlock`s arrives, all tools are immediately fired as `asyncio.Task`s before the SDK dispatches MCP handlers. Each MCP handler then awaits its pre-launched task instead of executing fresh.
- Adds a `_tool_task_queues` `ContextVar` (initialized per-session in `set_execution_context()`) so concurrent sessions never share task queues.
- DRY refactor: extracts `prepare_block_for_execution()`, `check_hitl_review()`, and a `BlockPreparation` dataclass into `helpers.py` so the execution pipeline is reusable.
- 10 unit tests for the parallel pre-launch infrastructure (queue enqueue/dequeue, MCP prefix stripping, fallback path, `CancelledError` handling, multi-same-tool FIFO ordering).

## Root cause

The Claude Agent SDK CLI sends MCP tool calls as sequential request-response pairs: it waits for each `control_response` before issuing the next `mcp_message`. Even though Python dispatches handlers with `start_soon`, the CLI never issues call B until call A's response is sent — blocks always ran sequentially. The pre-launch pattern fixes this at the infrastructure level by starting all tasks before the SDK even dispatches the first handler.

## Test plan

- [x] `poetry run pytest backend/copilot/sdk/tool_adapter_test.py` — 27 tests pass (10 new parallel infra tests)
- [x] `poetry run pytest backend/copilot/tools/helpers_test.py` — 20 tests pass
- [x] `poetry run pytest backend/copilot/tools/run_block_test.py backend/copilot/tools/test_run_block_details.py` — all pass
- [x] Manually test in CoPilot: ask the agent to run two blocks simultaneously — verify both start executing before either completes
- [x] E2E: Both GetCurrentTimeBlock and CalculatorBlock executed concurrently (time=09:35:42, 42×7=294)
- [x] E2E: Pre-launch mechanism active — two run_block events at the same timestamp (3 ms apart)
- [x] E2E: Arg-mismatch fallback tested — system correctly cancels and falls back to direct execution
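The pre-launch pattern described above can be sketched in a few lines of asyncio. This is an illustrative sketch under assumed names (`pre_launch_tool_calls`, `slow_tool`, the per-tool list-of-tasks queue shape), not the actual `tool_adapter.py` implementation:

```python
import asyncio
from contextvars import ContextVar

# Per-tool FIFO queues live in a ContextVar so concurrent sessions
# never share state (mirrors the `_tool_task_queues` idea above).
_tool_task_queues: ContextVar[dict] = ContextVar("_tool_task_queues")

async def slow_tool(name: str, log: list) -> str:
    log.append(f"{name}:start")
    await asyncio.sleep(0.05)   # stands in for real block execution
    log.append(f"{name}:end")
    return name

def pre_launch_tool_calls(names: list, log: list) -> None:
    # Fire every tool call in the turn as a task BEFORE any handler runs.
    queues = _tool_task_queues.get()
    for name in names:
        queues.setdefault(name, []).append(asyncio.create_task(slow_tool(name, log)))

async def handler(name: str) -> str:
    # The MCP handler awaits its pre-launched task (FIFO per tool name)
    # instead of executing the tool fresh.
    return await _tool_task_queues.get()[name].pop(0)

async def main() -> list:
    log: list = []
    _tool_task_queues.set({})                 # per-session initialization
    pre_launch_tool_calls(["time", "calc"], log)
    await handler("time")                     # handlers still complete sequentially,
    await handler("calc")                     # but the work already overlaps
    return log

log = asyncio.run(main())
print(log)
```

Because both tasks are created before either handler awaits, the two `:start` entries appear before any `:end` entry, which is exactly the concurrency property the E2E checks above verify.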
9381057079 — refactor(platform): rename SmartDecisionMakerBlock to OrchestratorBlock (#12511)
## Summary

- Renames `SmartDecisionMakerBlock` to `OrchestratorBlock` across the entire codebase
- The block supports iteration/agent mode and general tool orchestration, so "Smart Decision Maker" no longer accurately describes its capabilities
- Block UUID (`3b191d9f-356f-482d-8238-ba04b6d18381`) remains unchanged — fully backward compatible with existing graphs

## Changes

- Renamed block class, constants, file names, test files, docs, and frontend enum
- Updated copilot agent generator (helpers, validator, fixer) references
- Updated agent generation guide documentation
- No functional changes — pure rename refactor

### For code changes

- [x] I have clearly listed my changes in the PR description
- [x] I have made corresponding changes to the documentation
- [x] My changes do not generate new warnings or errors
- [x] New and existing unit tests pass locally with my changes

## Test plan

- [x] All pre-commit hooks pass (typecheck, lint, format)
- [x] Existing graphs with this block continue to load and execute (same UUID)
- [x] Agent mode / iteration mode works as before
- [x] Copilot agent generator correctly references the renamed block
f21a36ca37 — fix(backend): downgrade user-caused LLM API errors to warning level (#12516)
Requested by @majdyz. Follow-up to #12513. Anthropic/OpenAI 401, 403, and 429 errors are user-caused (bad API keys, forbidden, rate limits) and should not hit Sentry as exceptions.

### Changes

**Changes in `blocks/llm.py`:**

- Anthropic `APIError` handler (line ~950): check `status_code` — use `logger.warning()` for 401/403/429, keep `logger.error()` for server errors
- Generic `Exception` handler in LLM block `run()` (line ~1467): same pattern — `logger.warning()` for user-caused status codes, `logger.exception()` for everything else
- Extracted `USER_ERROR_STATUS_CODES = (401, 403, 429)` module-level constant
- Added `break` to short-circuit the retry loop for user-caused errors
- Removed double-logging from the inner Anthropic handler

**Changes in `blocks/test/test_llm.py`:**

- Added 8 regression tests covering 401/403/429 fast-exit and 500 retry behavior

**Sentry issues addressed:**

- AUTOGPT-SERVER-8B6, 8B7, 8B8 — `[LLM-Block] Anthropic API error: Error code: 401 - invalid x-api-key`
- Any OpenAI 401/403/429 errors hitting the generic exception handler

Part of SECRT-2166

### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan

#### Test plan:

- [x] Unit tests for 401/403/429 Anthropic errors → warning log, no retry
- [x] Unit tests for 500 Anthropic errors → error log, retry
- [x] Unit tests for 401/403/429 OpenAI errors → warning log, no retry
- [x] Unit tests for 500 OpenAI errors → error log, retry
- [x] Verified the USER_ERROR_STATUS_CODES constant is used consistently
- [x] Verified no double-logging in the Anthropic handler path

Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
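The severity-split retry pattern described above can be sketched as follows. This is an assumed shape for illustration, not the actual `blocks/llm.py` code; `FakeAPIError` and `call_with_retries` are hypothetical stand-ins for the provider's `APIError` and the block's retry loop:

```python
import logging

logger = logging.getLogger("llm-block")

# User-caused status codes log at warning level and short-circuit the retry
# loop; server-side errors keep error-level logging and are retried.
USER_ERROR_STATUS_CODES = (401, 403, 429)

class FakeAPIError(Exception):
    def __init__(self, status_code: int):
        super().__init__(f"Error code: {status_code}")
        self.status_code = status_code

def call_with_retries(call, max_retries: int = 3) -> int:
    """Return how many attempts were made before succeeding or giving up."""
    attempts = 0
    for _ in range(max_retries):
        attempts += 1
        try:
            call()
            break
        except FakeAPIError as e:
            if e.status_code in USER_ERROR_STATUS_CODES:
                logger.warning("User-caused API error: %s", e)
                break   # bad key / forbidden / rate limit: retrying cannot help
            logger.error("Server-side API error: %s", e)   # worth retrying
    return attempts

def always_401():
    raise FakeAPIError(401)

def always_500():
    raise FakeAPIError(500)

print(call_with_retries(always_401))  # 1: fast exit, warning only
print(call_with_retries(always_500))  # 3: retries exhausted at error level
```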
ee5382a064 — feat(copilot): add tool/block capability filtering to AutoPilotBlock (#12482)
## Summary

- Adds a `CopilotPermissions` model (`copilot/permissions.py`) — a capability filter that restricts which tools and blocks the AutoPilot/Copilot may use during a single execution
- Exposes 4 new `advanced=True` fields on `AutoPilotBlock`: `tools`, `tools_exclude`, `blocks`, `blocks_exclude`
- Threads permissions through the full execution path: `AutoPilotBlock` → `collect_copilot_response` → `stream_chat_completion_sdk` → `run_block`
- Implements recursion inheritance via contextvar: sub-agent executions can only be *more* restrictive than their parent

## Design

**Tool filtering** (`tools` + `tools_exclude`):

- `tools_exclude=True` (default): `tools` is a **blacklist** — listed tools are denied, all others allowed. Empty list = allow all.
- `tools_exclude=False`: `tools` is a **whitelist** — only listed tools are allowed.
- Users specify short names (`run_block`, `web_fetch`, `Read`, `Task`, …) — mapped to the full SDK format internally.
- Validated eagerly at block-run time with a clear error listing valid names.

**Block filtering** (`blocks` + `blocks_exclude`):

- Same semantics as tool filtering, applied inside `run_block` via contextvar.
- Each entry can be a full UUID, an 8-char partial UUID (first segment), or a case-insensitive block name.
- Validated against the live block registry; invalid identifiers surface a helpful error before the session is created.

**Recursion inheritance**:

- An `_inherited_permissions` contextvar stores the parent execution's permissions.
- On each `AutoPilotBlock.run()`, the child's permissions are merged with the parent via `merged_with_parent()` — effective allowed sets are intersected (tools) and the parent chain is kept for block checks.
- Sub-agents can never expand what the parent allowed.

## Test plan

- [x] 68 new unit tests in `copilot/permissions_test.py` and `blocks/autopilot_permissions_test.py`
- [x] Block identifier matching: full UUID, partial UUID, name, case-insensitivity
- [x] Tool allow/deny list semantics, including edge cases (empty list, unknown tool)
- [x] Parent/child merging and recursion ceiling correctness
- [x] `validate_tool_names` / `validate_block_identifiers` with a mock block registry
- [x] `apply_tool_permissions` SDK tool-list integration
- [x] `AutoPilotBlock.run()` — invalid tool/block yields an error before session creation
- [x] `AutoPilotBlock.run()` — valid permissions forwarded to `execute_copilot`
- [x] Existing `AutoPilotBlock` block tests still pass (2/2)
- [x] All hooks pass (pyright, ruff, black, isort)
- [x] E2E: CoPilot chat works end-to-end with E2B sandbox (12s stream)
- [x] E2E: Permission fields render in Builder UI (Tools combobox, exclude toggles)
- [x] E2E: Agent with restricted permissions (whitelist web_fetch only) executes correctly
- [x] E2E: Permission values preserved through API round-trip
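The allow/deny and parent-merge semantics above can be sketched with a tiny model. Field and method names follow the PR text, but this is a hypothetical illustration, not the actual `copilot/permissions.py` API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CopilotPermissions:
    tools: frozenset = frozenset()
    tools_exclude: bool = True   # True: `tools` is a blacklist; False: a whitelist

    def allows(self, tool: str) -> bool:
        if self.tools_exclude:
            return tool not in self.tools   # blacklist; empty set = allow all
        return tool in self.tools           # whitelist

    def merged_with_parent(self, parent: "CopilotPermissions"):
        # A child can only be MORE restrictive: a tool is allowed iff both
        # the child and the parent allow it (intersection of allowed sets).
        child = self

        class _Merged:
            @staticmethod
            def allows(tool: str) -> bool:
                return child.allows(tool) and parent.allows(tool)

        return _Merged()

parent = CopilotPermissions(tools=frozenset({"web_fetch"}))    # blacklist: deny web_fetch
child = CopilotPermissions(tools=frozenset({"run_block", "web_fetch"}),
                           tools_exclude=False)                # whitelist
merged = child.merged_with_parent(parent)
print(merged.allows("run_block"))   # True: whitelisted and not parent-denied
print(merged.allows("web_fetch"))   # False: child lists it, but the parent denies it
print(merged.allows("Read"))        # False: not on the child's whitelist
```

The key invariant, matching the "recursion ceiling" tests above, is that no sub-agent call can be allowed unless every ancestor in the chain also allows it.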
b80e5ea987 — fix(backend): allow admins to download submitted agents pending review (#12535)
## Why
Admins cannot download submitted-but-not-yet-approved agents from
`/admin/marketplace`. Clicking "Download" fails silently with a Server
Components render error. This blocks admins from reviewing agents that
companies have submitted.
## What
Remove the redundant ownership/marketplace check from
`get_graph_as_admin()` that was silently tightened in PR #11323 (Nov
2025). Add regression tests for both the admin download path and the
non-admin marketplace access control.
## How
**Root cause:** In PR #11323, Reinier refactored an inline
`StoreListingVersion` query (which had no status filter) into a call to
`is_graph_published_in_marketplace()` (which requires `submissionStatus:
APPROVED`). This was collateral cleanup — his PR focused on sub-agent
execution permissions — but it broke admin download of pending agents.
**Fix:** Remove the ownership/marketplace check from
`get_graph_as_admin()`, keeping only the null guard. This is safe
because `get_graph_as_admin` is only callable through admin-protected
routes (`requires_admin_user` at router level).
**Tests added:**
- `test_admin_can_access_pending_agent_not_owned` — admin can access a
graph they don't own that isn't APPROVED
- `test_admin_download_pending_agent_with_subagents` — admin export
includes sub-graphs
- `test_get_graph_non_owner_approved_marketplace_agent` — protects PR
#11323: non-owners CAN access APPROVED agents
- `test_get_graph_non_owner_pending_marketplace_agent_denied` — protects
PR #11323: non-owners CANNOT access PENDING agents
### Checklist
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] 4 regression tests pass locally
- [x] Admin can download pending agents (verified via unit test)
- [x] Non-admin marketplace access control preserved
## Test plan
- [ ] Verify admin can download a submitted-but-not-approved agent from
`/admin/marketplace`
- [ ] Verify non-admin users still cannot access admin endpoints
- [ ] Verify the download succeeds without console errors
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---
> [!NOTE]
> **Medium Risk**
> Changes access-control behavior for admin graph retrieval; risk is
mitigated by route-level admin auth but misuse of `get_graph_as_admin()`
outside admin-protected routes would expose non-approved graphs.
>
> **Overview**
> Admins can now download/review **submitted-but-not-approved**
marketplace agents: `get_graph_as_admin()` no longer enforces ownership
or *marketplace APPROVED* checks, only returning `None` when the graph
doesn’t exist.
>
> Adds regression tests covering the admin download/export path
(including sub-graphs) and confirming non-admin behavior is unchanged:
non-owners can fetch **APPROVED** marketplace graphs but cannot access
**pending** ones.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot)</sup>
3d4fcfacb6 — fix(backend): add circuit breaker for infinite tool call retry loops (#12499)
## Summary
- Adds a two-layer circuit breaker to prevent AutoPilot from looping
infinitely when tool calls fail with empty parameters
- **Tool-level**: After 3 consecutive identical failures per tool,
returns a hard-stop message instructing the model to output content as
text instead of retrying
- **Stream-level**: After 6 consecutive empty tool calls (`input: {}`),
aborts the stream entirely with a user-visible error and retry button
## Background
In session `c5548b48`, the model completed all research successfully but
then spent 51+ minutes in an infinite loop trying to write output —
every tool call was sent with `input: {}` (likely due to context
saturation preventing argument serialization). 21+ identical failing
tool calls with no circuit breaker.
## Changes
- `tool_adapter.py`: Added `_check_circuit_breaker`,
`_record_tool_failure`, `_clear_tool_failures` functions with a
`ContextVar`-based tracker. Integrated into both `create_tool_handler`
(BaseTool) and the `_truncating` wrapper (all tools).
- `service.py`: Added empty-tool-call detection in the main stream loop
that counts consecutive `AssistantMessage`s with empty
`ToolUseBlock.input` and aborts after the limit.
- `test_circuit_breaker.py`: 7 unit tests covering threshold behavior,
per-args tracking, reset on success, and uninitialized tracker safety.
## Test plan
- [x] Unit tests pass (`pytest
backend/copilot/sdk/test_circuit_breaker.py` — 8/8 passing)
- [x] Pre-commit hooks pass (Ruff, Black, isort, typecheck all pass)
- [x] E2E: CoPilot tool calls work normally (GetCurrentTimeBlock
returned 09:16:39 UTC)
- [x] E2E: Circuit breaker pass-through verified (successful calls don't
trigger breaker)
- [x] E2E: Circuit breaker code integrated into tool_adapter truncating
wrapper
32eac6d52e — dx(skills): improve /pr-test to require screenshots, state verification, and fix accountability (#12527)
## Summary

- Add a "Critical Requirements" section making screenshots at every step, PR comment posting, state verification, negative tests, and full evidence reports non-negotiable
- Add a "State Manipulation for Realistic Testing" section with Redis CLI, DB query, and API before/after patterns
- Strengthen fix mode to require before/after screenshot pairs, rebuild only affected services, and commit after each fix
- Expand the test report format to include API evidence and screenshot evidence columns
- Bump version to 2.0.0

## Test plan

- [x] Run `/pr-test` on an existing PR and verify it follows the new critical requirements
- [x] Verify screenshots are posted to the PR comment
- [x] Verify fix mode produces before/after screenshot pairs
9762f4cde7 — chore(libs/deps-dev): bump the development-dependencies group across 1 directory with 2 updates (#12523)
Bumps the development-dependencies group with 2 updates in the /autogpt_platform/autogpt_libs directory: [pytest-cov](https://github.com/pytest-dev/pytest-cov) and [ruff](https://github.com/astral-sh/ruff).

Updates `pytest-cov` from 7.0.0 to 7.1.0. From [pytest-cov's changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst), 7.1.0 (2026-03-21):

- Fixed total coverage computation to always be consistent, regardless of reporting settings. Previously some reports could produce different total counts, which could make `--cov-fail-under` behave differently depending on reporting options. See [#641](https://github.com/pytest-dev/pytest-cov/issues/641).
- Improved handling of `ResourceWarning` from sqlite3. The plugin adds a warning filter for the sqlite3 unclosed-database `ResourceWarning` (since 6.2.0) and checks whether a filter for this message already exists by comparing the filter regular expression. When a filter is specified on the command line, the message is escaped and does not match the expected message; a check for the escaped regular expression now handles this case. With this fix one can suppress the sqlite3 `ResourceWarning` from the command line: `pytest -W "ignore:unclosed database in <sqlite3.Connection object at:ResourceWarning" ...`
- Various improvements to documentation. Contributed by Art Pelling in [#718](https://github.com/pytest-dev/pytest-cov/pull/718) and "vivodi" in [#738](https://github.com/pytest-dev/pytest-cov/pull/738); also closed [#736](https://github.com/pytest-dev/pytest-cov/issues/736).
- Fixed some assertions in tests. Contributed by Markéta Machová in [#722](https://github.com/pytest-dev/pytest-cov/pull/722).
- Removed unnecessary coverage configuration copying (meant as a backup because reporting commands had configuration side effects before coverage 5.0).
76901ba22f — docs: add Why/What/How structure to PR template, CLAUDE.md, and PR skills (#12525)
Requested by @majdyz

### Why / What / How

**Why:** PR descriptions currently explain the *what* and *how* but not the *why*. Without motivation context, reviewers can't judge whether an approach fits the problem. Nick flagged this in standup: "The PR descriptions you use are explaining the what not the why."

**What:** Adds a consistent Why / What / How structure to PR descriptions across the entire workflow — template, CLAUDE.md guidance, and all PR-related skills (`/pr-review`, `/pr-test`, `/pr-address`).

**How:**

- **`.github/PULL_REQUEST_TEMPLATE.md`**: Replaced the old vague `Changes` heading with a single `Why / What / How` section with guiding comments
- **`autogpt_platform/CLAUDE.md`**: Added a bullet under "Creating Pull Requests" requiring the Why/What/How structure
- **`.claude/skills/pr-review/SKILL.md`**: Added a "Read the PR description" step before reading the diff, and "Description quality" to the review checklist
- **`.claude/skills/pr-test/SKILL.md`**: Updated Step 1 to read the PR description and understand Why/What/How before testing
- **`.claude/skills/pr-address/SKILL.md`**: Added a "Read the PR description" step before fetching comments

## Test plan

- [x] All five files reviewed for correct formatting and consistency

Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
23b65939f3 — fix(backend/db): add DB_STATEMENT_CACHE_SIZE env var for Prisma engine (#12521)
## Summary

- Add `DB_STATEMENT_CACHE_SIZE` env var support for the Prisma query engine
- Wires through as a `statement_cache_size` URL parameter to control the LRU prepared-statement cache per connection in the Rust binary engine

## Why

Live investigation on dev pods showed the Prisma Rust engine growing from 34MB to 932MB over ~1hr due to an unbounded query plan cache. Despite `pgbouncer=true` in the DATABASE_URL (which should disable caching), the engine still caches. This gives explicit control: setting `DB_STATEMENT_CACHE_SIZE=0` disables the cache entirely.

## Live data (dev)

```
Fresh pod: Python=693MB, Engine=34MB,  Total=727MB
Bloated:   Python=2.1GB, Engine=932MB, Total=3GB
```

## Infra companion PR

[AutoGPT_cloud_infrastructure#299](https://github.com/Significant-Gravitas/AutoGPT_cloud_infrastructure/pull/299) sets `DB_STATEMENT_CACHE_SIZE=0` along with `PYTHONMALLOC=malloc` and memory limit changes.

## Test plan

- [ ] Deploy to dev and monitor Prisma engine memory over 1hr
- [ ] Verify queries still work correctly with the cache disabled
- [ ] Compare engine RSS on fresh vs aged pods
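Wiring an env var through as a connection-URL parameter can be sketched as below. `apply_statement_cache_size` is a hypothetical helper for illustration; the real change lives in the backend's Prisma setup:

```python
import os
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def apply_statement_cache_size(database_url: str) -> str:
    """Append statement_cache_size=<env value> to the URL's query string."""
    size = os.environ.get("DB_STATEMENT_CACHE_SIZE")
    if size is None:
        return database_url     # leave the URL untouched when unset
    parts = urlparse(database_url)
    query = dict(parse_qsl(parts.query))
    query["statement_cache_size"] = size    # "0" disables the per-connection LRU cache
    return urlunparse(parts._replace(query=urlencode(query)))

os.environ["DB_STATEMENT_CACHE_SIZE"] = "0"
print(apply_statement_cache_size("postgresql://db:5432/app?pgbouncer=true"))
# postgresql://db:5432/app?pgbouncer=true&statement_cache_size=0
```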
1c27eaac53 — dx(skills): improve /pr-test skill to show screenshots with explanations (#12518)
## Summary

- Update the /pr-test skill to consistently show screenshots inline to the user with explanations
- Post PR comments with inline images and per-screenshot descriptions (not just local file paths)
- Simplify the GitHub Git API upload flow for screenshot hosting

## Changes

- Step 5: Take screenshots at every significant test step (aim for 1+ per scenario)
- Step 6 (new): Show every screenshot to the user via the Read tool with 2-3 sentence explanations
- Step 7: Post a PR comment with inline images, a summary table, and per-screenshot context

## Test plan

- [x] Tested end-to-end on PR #12512 — screenshots uploaded and rendered correctly in the PR comment
923b164794 — fix(backend): use system chromium for agent-browser on all architectures (#12473)
## Summary

- Replaces the arch-conditional chromium install (ARM64 vs AMD64) with a single approach: always use the distro-packaged `chromium` and set `AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium`
- Removes `agent-browser install` entirely (it downloads Chrome for Testing, which has no ARM64 binary)
- Removes the `entrypoint.sh` wrapper script that was setting the env var at runtime
- Updates `autogpt_platform/db/docker/docker-compose.yml`: removes `external: true` from the network declarations so the Supabase stack can be brought up standalone (needed for the Docker integration tests in the test plan below — without this, `docker compose up` fails unless the platform stack is already running); also sets `GOTRUE_MAILER_AUTOCONFIRM: true` for local dev convenience (no SMTP setup required on first run — this compose file is not used in production)
- Updates `autogpt_platform/docker-compose.platform.yml`: mounts the `workspace` volume so agent-browser results (screenshots, snapshots) are accessible from other services; without this, the copilot workspace write fails in Docker

## Verification

Tested via Docker build on arm64 (Apple Silicon):

```
=== Testing agent-browser with system chromium ===
✓ Example Domain https://example.com/
=== SUCCESS: agent-browser launched with system chromium ===
```

agent-browser navigated to example.com in ~1.5s using system chromium (v146 from Debian trixie).

## Test plan

- [x] Docker build test on arm64: `agent-browser open https://example.com` succeeds with system chromium
- [x] Verify the amd64 Docker build still works (CI)
e86ac21c43 — feat(platform): add workflow import from other tools (n8n, Make.com, Zapier) (#12440)
## Summary

- Enable one-click import of workflows from other platforms (n8n, Make.com, Zapier, etc.) into AutoGPT via CoPilot
- **No backend endpoint** — import is entirely client-side: the dialog reads the file or fetches the n8n template URL, uploads the JSON to the workspace via `uploadFileDirect`, stores the file reference in `sessionStorage`, and redirects to CoPilot with `autosubmit=true`
- CoPilot receives the workflow JSON as a proper file attachment and uses the existing agent-generator pipeline to convert it
- Library dialog redesigned: 2 tabs — "AutoGPT agent" (upload exported agent JSON) and "Another platform" (file upload + optional n8n URL)

## How it works

1. User uploads a workflow JSON (or pastes an n8n template URL)
2. The frontend fetches/reads the JSON and uploads it to the user's workspace via the existing file upload API
3. The user is redirected to `/copilot?source=import&autosubmit=true`
4. CoPilot picks up the file from `sessionStorage` and sends it as a `FileUIPart` attachment with a prompt to recreate the workflow as an AutoGPT agent

## Test plan

- [x] Manual test: import a real n8n workflow JSON via the dialog
- [x] Manual test: paste an n8n template URL and verify it fetches + converts
- [x] Manual test: import Make.com / Zapier workflow export JSON
- [x] Repeated imports don't cause 409 conflicts (filenames use `crypto.randomUUID()`)
- [x] E2E: Import dialog has 2 tabs (AutoGPT agent + Another platform)
- [x] E2E: n8n quick-start template buttons present
- [x] E2E: n8n URL input enables the Import button on a valid URL
- [x] E2E: Workspace upload API returns file_id
94224be841 — Merge remote-tracking branch 'origin/master' into dev
da4bdc7ab9 — fix(backend+frontend): reduce Sentry noise from user-caused errors (#12513)
Requested by @majdyz
User-caused errors (no payment method, webhook agent invocation, missing
credentials, bad API keys) were hitting Sentry via `logger.exception()`
in the `ValueError` handler, creating noise that obscures real bugs.
Additionally, a frontend crash on the copilot page (BUILDER-71J) needed
fixing.
**Changes:**
**Backend — rest_api.py**
- Set `log_error=False` for the `ValueError` exception handler (line
278), consistent with how `FolderValidationError` and `NotFoundError`
are already handled. User-caused 400 errors no longer trigger
`logger.exception()` → Sentry.
**Backend — executor/manager.py**
- Downgrade `ExecutionManager` input validation skip errors from `error`
to `warning` level. Missing credentials is expected user behavior, not
an internal error.
**Backend — blocks/llm.py**
- Sanitize unpaired surrogates in LLM prompt content before sending to
provider APIs. Prevents `UnicodeEncodeError: surrogates not allowed`
when httpx encodes the JSON body (AUTOGPT-SERVER-8AX).
**Frontend — package.json**
- Upgrade `ai` SDK from `6.0.59` to `6.0.134` to fix BUILDER-71J
(`TypeError: undefined is not an object (evaluating
'this.activeResponse.state')` on /copilot page). This is a known issue
in the Vercel AI SDK fixed in later patch versions.
**Sentry issues addressed:**
- `No payment method found` (ValueError → 400)
- `This agent is triggered by an external event (webhook)` (ValueError →
400)
- `Node input updated with non-existent credentials` (ValueError → 400)
- `[ExecutionManager] Skip execution, input validation error: missing
input {credentials}`
- `UnicodeEncodeError: surrogates not allowed` (AUTOGPT-SERVER-8AX)
- `TypeError: activeResponse.state` (BUILDER-71J)
Resolves SECRT-2166
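The surrogate sanitization mentioned above can be done with a one-line round-trip. This is a minimal illustrative sketch, not the actual `blocks/llm.py` code: encoding with `errors="replace"` maps each lone surrogate to `?` instead of raising `UnicodeEncodeError` when the HTTP client serializes the JSON body.

```python
def sanitize_surrogates(text: str) -> str:
    # Lone (unpaired) surrogates cannot be UTF-8 encoded; "replace" swaps
    # each one for "?" so downstream JSON/HTTP encoding never raises.
    return text.encode("utf-8", errors="replace").decode("utf-8")

broken = "hello \ud83d world"          # lone high surrogate (half of an emoji pair)
print(sanitize_surrogates(broken))     # hello ? world
print(sanitize_surrogates("plain"))    # well-formed text passes through unchanged
```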
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
7176cecf25 — perf(copilot): reduce tool schema token cost by 34% (#12398)
## Summary

Reduce CoPilot per-turn token overhead by systematically trimming tool descriptions, parameter schemas, and system prompt content. All 35 MCP tool schemas are passed on every SDK call — this PR reduces their size.

### Strategy

1. **Tool descriptions**: Trimmed verbose multi-sentence explanations to concise single-sentence summaries while preserving meaning
2. **Parameter schemas**: Shortened parameter descriptions to essential info, removed some `default` values (handled in code)
3. **System prompt**: Condensed `_SHARED_TOOL_NOTES` and the storage supplement template in `prompting.py`
4. **Cross-tool references**: Removed duplicate workflow hints (e.g. "call find_block before run_block" appeared in BOTH tools — kept only in the dependent tool). Critical cross-tool references retained (e.g. `continue_run_block` in `run_block`, `fix_agent_graph` in `validate_agent`, `get_doc_page` in `search_docs`, `web_fetch` preference in `browser_navigate`)

### Token Impact

| Metric | Before | After | Reduction |
|--------|--------|-------|-----------|
| System Prompt | ~865 tokens | ~497 tokens | 43% |
| Tool Schemas | ~9,744 tokens | ~6,470 tokens | 34% |
| **Grand Total** | **~10,609 tokens** | **~6,967 tokens** | **34%** |

Saves **~3,642 tokens per conversation turn**.

### Key Decisions

- **Mostly description changes**: Tool logic, parameters, and types are unchanged. However, some schema-level `default` fields were removed (e.g. `save` in `customize_agent`) — these are machine-readable metadata, not just prose, and may affect LLM behavior.
- **Quality preserved**: All descriptions still convey what the tool does and essential usage patterns
- **Cross-references trimmed carefully**: Kept prerequisite hints in the dependent tool (run_block mentions find_block) but removed the reverse (find_block no longer mentions run_block). Critical cross-tool guidance retained where removal would degrade model behavior.
- **`run_time` description fixed**: Added missing supported values (today, last 30 days, ISO datetime) per review feedback

### Future Optimization

The SDK passes all 35 tools on every call. The MCP protocol's `list_tools()` handler supports dynamic tool registration — a follow-up PR could implement lazy tool loading (register core tools + a discovery meta-tool) to further reduce per-turn token cost.

### Changes

- Trimmed descriptions across 25 tool files
- Condensed `_SHARED_TOOL_NOTES` and `_build_storage_supplement` in `prompting.py`
- Fixed the `run_time` schema description in `agent_output.py`

### Checklist

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] All 273 copilot tests pass locally
  - [x] All 35 tools load and produce valid schemas
  - [x] Before/after token dumps compared
  - [x] Formatting passes (`poetry run format`)
  - [x] CI green
f35210761c — feat(devops): add /pr-test skill + subscription mode auto-provisioning (#12507)
## Summary

- Adds a `/pr-test` skill for automated E2E testing of PRs using docker compose, agent-browser, and API calls
- Covers full environment setup (copy .env, configure copilot auth, ARM64 Docker fix)
- Includes browser UI testing, direct API testing, screenshot capture, and test report generation
- Has a `--fix` mode for auto-fixing bugs found during testing (similar to `/pr-address`)
- **Screenshot uploads use the GitHub Git API** (blobs → tree → commit → ref) — no local git operations, safe for worktrees
- **Subscription mode improvements:**
  - Extract subscription auth logic to `sdk/subscription.py` — uses the SDK's bundled CLI binary instead of requiring `npm install -g @anthropic-ai/claude-code`
  - Auto-provision `~/.claude/.credentials.json` from the `CLAUDE_CODE_OAUTH_TOKEN` env var on container startup — no `claude login` needed in Docker
  - Add `scripts/refresh_claude_token.sh` — a cross-platform helper (macOS/Linux/Windows) to extract OAuth tokens from the host and update `backend/.env`

## Test plan

- [x] Validated the skill on multiple PRs (#12482, #12483, #12499, #12500, #12501, #12440, #12472) — all test scenarios passed
- [x] Confirmed screenshot upload via the GitHub Git API renders correctly on all 7 PRs
- [x] Verified subscription mode E2E in Docker: `refresh_claude_token.sh` → `docker compose up` → copilot chat responds correctly with no API keys (pure OAuth subscription)
- [x] Verified auto-provisioning of the credentials file inside the container from the `CLAUDE_CODE_OAUTH_TOKEN` env var
- [x] Confirmed bundled CLI detection (`claude_agent_sdk._bundled/claude`) works without a system-installed `claude`
- [x] `poetry run pytest backend/copilot/sdk/service_test.py` — 24/24 tests pass
1ebcf85669 — fix(platform): resolve 5 production Sentry alerts (#12496)
## Summary

Fixes 5 high-priority Sentry alerts from production:

- **AUTOGPT-SERVER-8AM**: Fix `TypeError: TypedDict does not support instance and class checks` — `_value_satisfies_type` in `type.py` now handles TypedDict classes that don't support `isinstance()` checks
- **AUTOGPT-SERVER-8AN**: Fix `ValueError: No payment method found` triggering a Sentry error — catch the expected ValueError in the auto-top-up endpoint and return HTTP 422 instead
- **BUILDER-7F5**: Fix `Upload failed (409): File already exists` — add an `overwrite` query param to the workspace upload endpoint and set it to `true` from the frontend direct-upload
- **BUILDER-7F0**: Fix `LaTeX-incompatible input` KaTeX warnings flooding Sentry — set `strict: false` on the rehype-katex plugin to suppress warnings for unrecognized Unicode characters
- **AUTOGPT-SERVER-89N**: Fix `Tool execution with manager failed: validation error for dict[str,list[any]]` — make RPC return type validation resilient (log a warning instead of crashing) and downgrade SmartDecisionMaker tool execution errors to warnings

## Test plan

- [ ] Verify TypedDict type coercion works for GithubMultiFileCommitBlock inputs
- [ ] Verify auto-top-up without a payment method returns 422, not 500
- [ ] Verify file re-upload in copilot succeeds (overwrites instead of 409)
- [ ] Verify LaTeX rendering with Unicode characters doesn't produce console warnings
- [ ] Verify SmartDecisionMaker tool execution failures are logged at warning level
||
|
|
ab7c38bda7 |
fix(frontend): detect closed OAuth popup and allow dismissing waiting modal (#12443)
Requested by @kcze When a user closes the OAuth sign-in popup without completing authentication, the 'Waiting on sign-in process' modal was stuck open with no way to dismiss it, forcing a page refresh. Two bugs caused this: 1. `oauth-popup.ts` had no detection for the popup being closed by the user. The promise would hang until the 5-minute timeout. 2. The modal's cancel button aborted a disconnected `AbortController` instead of the actual OAuth flow's abort function, so clicking cancel/close did nothing. ### Changes - Add `popup.closed` polling (500ms) in `openOAuthPopup()` that rejects the promise when the user closes the auth window - Add reject-on-abort so the cancel button properly terminates the flow - Replace the disconnected `oAuthPopupController` with a direct `cancelOAuthFlow()` function that calls the real abort ref - Handle popup-closed and user-canceled as silent cancellations (no error toast) ### Testing Tested manually ✅ - [x] Start OAuth flow → close popup window → modal dismisses automatically ✅ - [x] Start OAuth flow → click cancel on modal → popup closes, modal dismisses ✅ - [x] Complete OAuth flow normally → works as before ✅ Resolves SECRT-2054 --- Co-authored-by: Krzysztof Czerwinski (@kcze) <krzysztof.czerwinski@agpt.co> --------- Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
0f67e45d05 |
hotfix(marketplace): adjust card height overflow (#12497)
## Summary ### Before <img width="500" height="501" alt="Screenshot 2026-03-20 at 21 50 31" src="https://github.com/user-attachments/assets/6154cffb-6772-4c3d-a703-527c8ca0daff" /> ### After <img width="500" height="581" alt="Screenshot 2026-03-20 at 21 33 12" src="https://github.com/user-attachments/assets/2f9bd69d-30c5-4d06-ad1e-ed76b184afe5" /> ### Other minor fixes - minor spacing adjustments in creator/search pages when empty and between sections ### Summary - Increase StoreCard height from 25rem to 26.5rem to prevent content overflow - Replace manual tooltip-based title truncation with `OverflowText` component in StoreCard - Adjust carousel indicator positioning and hide it on md+ when exactly 3 featured agents are shown ## Test plan - [x] Verify marketplace cards display without text overflow - [x] Verify featured section carousel indicators behave correctly - [x] Check responsive behavior at common breakpoints 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
b9ce37600e |
refactor(frontend/marketplace): move download below Add to library with contextual text (#12486)
## Summary <img width="1487" height="670" alt="Screenshot 2026-03-20 at 00 52 58" src="https://github.com/user-attachments/assets/f09de2a0-3c5b-4bce-b6f4-8a853f6792cf" /> - Move the download button from inline next to "Add to library" to a separate line below it - Add contextual text: "Want to use this agent locally? Download here" - Style the "Download here" as a violet ghost button link with the download icon ## Test plan - [ ] Visit a marketplace agent page - [ ] Verify "Add to library" button renders in its row - [ ] Verify "Want to use this agent locally? Download here" appears below it - [ ] Click "Download here" and confirm the agent downloads correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
3921deaef1 |
fix(frontend): truncate marketplace card description to 2 lines (#12494)
Reduces `line-clamp` from 3 to 2 on the marketplace `StoreCard` description to prevent text from overlapping with the absolutely-positioned run count and +Add button at the bottom of the card. Resolves SECRT-2156. --- Co-authored-by: Abhimanyu Yadav (@Abhi1992002) <122007096+Abhi1992002@users.noreply.github.com> |
||
|
|
f01f668674 |
fix(backend): support Responses API in SmartDecisionMakerBlock (#12489)
## Summary - Fixes SmartDecisionMakerBlock conversation management to work with OpenAI's Responses API, which was introduced in #12099 (commit autogpt-platform-beta-v0.6.52 |
||
|
|
f7a3491f91 |
docs(platform): add TDD guidance to CLAUDE.md files (#12491)
Requested by @majdyz Adds TDD (test-driven development) guidance to CLAUDE.md files so Claude Code follows a test-first workflow when fixing bugs or adding features. **Changes:** - **Parent `CLAUDE.md`**: Cross-cutting TDD workflow — write a failing `xfail` test, implement the fix, remove the marker - **Backend `CLAUDE.md`**: Concrete pytest example with `@pytest.mark.xfail` pattern - **Frontend `CLAUDE.md`**: Note about using Playwright `.fixme` annotation for bug-fix tests The workflow is: write a failing test first → confirm it fails for the right reason → implement → confirm it passes. This ensures every bug fix is covered by a test that would have caught the regression. --- Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co> |
||
|
|
cbff3b53d3 |
Revert "feat(backend): migrate OpenAI provider to Responses API" (#12490)
Reverts Significant-Gravitas/AutoGPT#12099
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Reverts the OpenAI integration in `llm_call` from the Responses API back to `chat.completions`, which can change tool-calling, JSON-mode behavior, and token accounting across core AI blocks. The change is localized but touches the primary LLM execution path and associated tests/docs.
>
> **Overview**
> Reverts the OpenAI path in `backend/blocks/llm.py` from the Responses API back to `chat.completions`, including updating JSON-mode (`response_format`), tool handling, and usage extraction to match the Chat Completions response shape.
>
> Removes the now-unused `backend/util/openai_responses.py` helpers and their unit tests, updates LLM tests to mock `chat.completions.create`, and adds `gpt-3.5-turbo` to the supported model list, cost config, and LLM docs.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit |
||
|
|
5b9a4c52c9 |
revert(platform): Revert invite system (#12485)
## Summary Reverts the invite system PRs due to security gaps identified during review: - The move from Supabase-native `allowed_users` gating to application-level gating allows orphaned Supabase auth accounts (valid JWT without a platform `User`) - The auth middleware never verifies `User` existence, so orphaned users get 500s instead of clean 403s - OAuth/Google SSO signup completely bypasses the invite gate - The DB trigger that atomically created `User` + `Profile` on signup was dropped in favor of a client-initiated API call, introducing a failure window ### Reverted PRs - Reverts #12347 — Foundation: InvitedUser model, invite-gated signup, admin UI - Reverts #12374 — Tally enrichment: personalized prompts from form submissions - Reverts #12451 — Pre-check: POST /auth/check-invite endpoint - Reverts #12452 (collateral) — Themed prompt categories / SuggestionThemes UI. This PR built on top of #12374's `suggested_prompts` backend field and `/chat/suggested-prompts` endpoint, so it cannot remain without #12374. The copilot empty session falls back to hardcoded default prompts. 
### Migration Includes a new migration (`20260319120000_revert_invite_system`) that: - Drops the `InvitedUser` table and its enums (`InvitedUserStatus`, `TallyComputationStatus`) - Restores the `add_user_and_profile_to_platform()` trigger on `auth.users` - Backfills `User` + `Profile` rows for any auth accounts created during the invite-gate window ### What's NOT reverted - The `generate_username()` function (never dropped, still used by backfill migration) - The old `add_user_to_platform()` function (superseded by `add_user_and_profile_to_platform()`) - PR #12471 (admin UX improvements) — was never merged, no action needed ## Test plan - [x] Verify migration: `InvitedUser` table dropped, enums dropped, trigger restored - [x] Verify backfill: no orphaned auth users, no users without Profile - [x] Verify existing users can still log in (email + OAuth) - [x] Verify CoPilot chat page loads with default prompts - [ ] Verify new user signup creates `User` + `Profile` via the restored trigger - [ ] Verify admin `/admin/users` page loads without crashing - [ ] Run backend tests: `poetry run test` 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co> |
||
|
|
0ce1c90b55 |
fix(frontend): rename "CoPilot" to "AutoPilot" on credits page (#12481)
Requested by @kcze Renames "CoPilot" → "AutoPilot" on the credits/usage limits page: - **Heading:** "CoPilot Usage Limits" → "AutoPilot Usage Limits" - **Button:** "Open CoPilot" → "Open AutoPilot" - Comment updated to match --- Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co> Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co> |
||
|
|
d4c6eb9adc |
fix(frontend): collapse navbar text to icons below 1280px (#12484)
## Summary <img width="400" height="339" alt="Screenshot 2026-03-19 at 22 53 23" src="https://github.com/user-attachments/assets/2fa76b8f-424d-4764-90ac-b7a331f5f610" /> <img width="600" height="595" alt="Screenshot 2026-03-19 at 22 53 31" src="https://github.com/user-attachments/assets/23f51cc7-b01e-4d83-97ba-2c43683877db" /> <img width="800" height="523" alt="Screenshot 2026-03-19 at 22 53 36" src="https://github.com/user-attachments/assets/1e447b9a-1cca-428c-bccd-1730f1670b8e" /> Now that we have the `Give feedback` button on the Navigation bar, collapse some of the links below `1280px` so there is more space and they don't collide with each other... - Collapse navbar link text to icon-only below 1280px (`xl` breakpoint) to prevent crowding - Wallet button shows only the wallet icon below 1280px instead of "Earn credits" text - Feedback button shows only the chat icon below 1280px instead of "Give Feedback" text - Added `whitespace-nowrap` to feedback button to prevent wrapping ## Changes - `NavbarLink.tsx`: `lg:block` → `xl:block` for link text - `Wallet.tsx`: `md:hidden`/`md:inline-block` → `xl:hidden`/`xl:inline-block` - `FeedbackButton.tsx`: wrap text in `hidden xl:inline` span, add `whitespace-nowrap` ## Test plan - [ ] Resize browser between 1024px–1280px and verify navbar shows only icons - [ ] At 1280px+ verify full text labels appear for links, wallet, and feedback - [ ] Verify mobile navbar still works correctly below `md` breakpoint 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
1bb91b53b7 |
fix(frontend/marketplace): comprehensive marketplace UI redesign (#12462)
## Summary <img width="600" height="964" alt="Screenshot_2026-03-19_at_00 07 52" src="https://github.com/user-attachments/assets/95c0430a-26a3-499b-8f6a-25b9715d3012" /> <img width="600" height="968" alt="Screenshot_2026-03-19_at_00 08 01" src="https://github.com/user-attachments/assets/d440c3b0-c247-4f13-bf82-a51ff2e50902" /> <img width="600" height="939" alt="Screenshot_2026-03-19_at_00 08 14" src="https://github.com/user-attachments/assets/f19be759-e102-4a95-9474-64f18bce60cf" /> <img width="600" height="953" alt="Screenshot_2026-03-19_at_00 08 24" src="https://github.com/user-attachments/assets/ba4fa644-3958-45e2-89e9-a6a4448c63c5" /> - Re-style and re-skin the Marketplace pages to look more "professional" ... - Move the `Give feedback` button to the header ## Test plan - [x] Verify marketplace page search bar matches Form text field styling - [x] Verify agent cards have padding and subtle border - [x] Verify hover/focus states work correctly - [x] Check responsive behavior at different breakpoints 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
a5f9c43a41 |
feat(platform): replace suggestion pills with themed prompt categories (#12452)
## Summary https://github.com/user-attachments/assets/13da6d36-5f35-429b-a6cf-e18316bb8709 Replaces the flat list of suggestion pills in the CoPilot empty session with themed prompt categories (Learn, Create, Automate, Organize), each shown as a popover with contextual prompts. - **Backend**: Changes `suggested_prompts` from a flat `list[str]` to a themed `dict[str, list[str]]` keyed by category. Updates Tally extraction LLM prompt to generate prompts per theme, and the `/suggested-prompts` API to return grouped themes. Legacy `list[str]` rows are preserved under a `"General"` key for backward compatibility. - **Frontend**: Replaces inline pill buttons with a `SuggestionThemes` popover component. Each theme button (with icon) opens a dropdown of 5 relevant prompts. Falls back to hardcoded defaults when the API has no personalized prompts. Normalizes partial API responses by padding missing themes with defaults. Legacy `"General"` prompts are distributed round-robin across themes so existing users keep their personalized suggestions. 
### Changes 🏗️ - `backend/data/understanding.py`: `suggested_prompts` field changed from `list[str]` to `dict[str, list[str]]`; legacy list rows preserved under `"General"` key; list items validated as strings - `backend/data/tally.py`: LLM prompt updated to generate themed prompts; validation now per-theme with blank-string rejection - `backend/api/features/chat/routes.py`: New `SuggestedTheme` model; endpoint returns `themes[]` - `frontend/copilot/components/EmptySession/EmptySession.tsx`: Uses generated API types directly (no cast) - `frontend/copilot/components/EmptySession/helpers.ts`: `DEFAULT_THEMES` replaces `DEFAULT_QUICK_ACTIONS`; `getSuggestionThemes` normalizes partial API responses and distributes legacy `"General"` prompts across themes - `frontend/copilot/components/EmptySession/components/SuggestionThemes/`: New popover component with theme icons and loading states ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verify themed suggestion buttons render on CoPilot empty session - [x] Click each theme button and confirm popover opens with prompts - [x] Click a prompt and confirm it sends the message - [x] Verify fallback to default themes when API returns no custom prompts - [x] Verify legacy users' personalized prompts are preserved and visible 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
1240f38f75 |
feat(backend): migrate OpenAI provider to Responses API (#12099)
## Summary Migrates the OpenAI provider in the LLM block from `chat.completions.create` to `responses.create` — OpenAI's newer, unified API. Also removes the obsolete GPT-3.5-turbo model. Resolves #11624 Linear: [OPEN-2911](https://linear.app/autogpt/issue/OPEN-2911/update-openai-calls-to-use-responsescreate) ## Changes - **`backend/blocks/llm.py`** — OpenAI provider now uses `responses.create` exclusively. Removed GPT-3.5-turbo enum + metadata. - **`backend/util/openai_responses.py`** *(new)* — Helpers for the Responses API: tool format conversion, content/reasoning/usage/tool-call extraction. - **`backend/util/openai_responses_test.py`** *(new)* — Unit tests for all helper functions. - **`backend/data/block_cost_config.py`** — Removed GPT-3.5 cost entry. - **`docs/integrations/block-integrations/llm.md`** — Regenerated block docs. ## Key API differences handled | Aspect | Chat Completions | Responses API | |--------|-----------------|---------------| | Messages param | `messages` | `input` | | Max tokens param | `max_completion_tokens` | `max_output_tokens` | | Usage fields | `prompt_tokens` / `completion_tokens` | `input_tokens` / `output_tokens` | | Tool format | Nested under `function` key | Flat structure | ## Test plan - [x] Unit tests for all `openai_responses.py` helpers - [x] Existing LLM block tests updated for Responses API mocks - [x] Regular OpenAI models work - [x] Reasoning OpenAI models work - [x] Non-OpenAI models work --------- Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
f617f50f0b |
dx(skills): improve pr-address skill — full thread context + PR description backtick fix (#12480)
## Summary Improves the `pr-address` skill with two fixes: - **Full comment thread loading**: Adds `--paginate` to the inline comments fetch and explicit instructions to reconstruct threads using `in_reply_to_id`, reading root-to-last-reply before acting. Previously, only the opening comment was visible — missing reviewer replies led to wrong fixes. - **Backtick-safe PR descriptions**: Adds instructions to write the PR body to a temp file via `<<'PREOF'` heredoc before passing to `gh pr edit/create`. Inlining the body directly causes backticks to be shell-escaped, breaking markdown rendering. ## Test plan - [ ] Run `/pr-address` on a PR with multi-reply inline comment threads — verify the last reply is what gets acted on - [ ] Update a PR description containing backticks — verify they render correctly in GitHub |
||
|
|
943a1df815 |
dx(backend): Make Builder and Marketplace search work without embeddings (#12479)
When OpenAI credentials are unavailable (fork PRs, dev envs without API keys), both builder block search and store agent functionality break: 1. **Block search returns wrong results.** `unified_hybrid_search` falls back to a zero vector when embedding generation fails. With ~200 blocks in `UnifiedContentEmbedding`, the zero-vector semantic scores are garbage, and lexical matching on short block names is too weak — "Store Value" doesn't appear in the top results for query "Store Value". 2. **Store submission approval fails entirely.** `review_store_submission` calls `ensure_embedding()` inside a transaction. When it throws, the entire transaction rolls back — no store submissions get approved, the `StoreAgent` materialized view stays empty, and all marketplace e2e tests fail. 3. **Store search returns nothing.** Even when store data exists, `hybrid_search` queries `UnifiedContentEmbedding` which has no store agent rows (backfill failed). It succeeds with zero results rather than throwing, so the existing exception-based fallback never triggers. ### Changes 🏗️ - Replace `unified_hybrid_search` with in-memory text search in `_hybrid_search_blocks` (-> `_text_search_blocks`). All ~200 blocks are already loaded in memory, and `_score_primary_fields` provides correct deterministic text relevance scoring against block name, description, and input schema field descriptions — the same rich text the embedding pipeline uses. CamelCase block names are split via `split_camelcase()` to match the tokenization from PR #12400. - Make embedding generation in `review_store_submission` best-effort: catch failures and log a warning instead of rolling back the approval transaction. The backfill scheduler retries later when credentials become available. - Fall through to direct DB search when `hybrid_search` returns empty results (not just when it throws). 
The fallback uses ad-hoc `to_tsvector`/`plainto_tsquery` with `ts_rank_cd` ranking on `StoreAgent` view fields, restoring the search quality of the original pre-hybrid implementation (stemming, stop-word removal, relevance ranking). - Fix Playwright artifact upload in end-to-end test CI ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] `build.spec.ts`: 8/8 pass locally (was 0/7 before fix) - [x] All 79 e2e tests pass in CI (was 15 failures before fix) --- Co-authored-by: Reinier van der Leer (@Pwuts) --------- Co-authored-by: Reinier van der Leer <pwuts@agpt.co> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
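The in-memory block search above hinges on two pieces: splitting CamelCase block names into tokens and a deterministic overlap score. A minimal sketch, where `split_camelcase` mirrors the helper named in the PR but the regex and scoring weights are illustrative assumptions:

```python
import re

def split_camelcase(name: str):
    """'StoreValueBlock' -> ['Store', 'Value', 'Block']; handles acronyms."""
    return re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", name)

def score_block(query: str, name: str, description: str) -> float:
    """Deterministic text relevance: name matches weigh double."""
    terms = set(query.lower().split())
    name_tokens = {t.lower() for t in split_camelcase(name)}
    desc_tokens = set(description.lower().split())
    return 2 * len(terms & name_tokens) + len(terms & desc_tokens)
```

With this, the query "Store Value" ranks `StoreValueBlock` first even with no embeddings available, which is the failure mode the PR fixes.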
593001e0c8 |
fix(frontend): Remove dead Tutorial button from TallyPopup (#12474)
After the legacy builder was removed in #12082, the TallyPopup component still showed a "Tutorial" button (bottom-right, next to "Give Feedback") that navigated to `/build?resetTutorial=true`. Nothing handles that param anymore, so clicking it did nothing. This removes the dead button and its associated state/handler from TallyPopup and useTallyPopup. The working tutorial (Shepherd.js chalkboard icon in CustomControls) is unaffected. **Changes:** - `TallyPopup.tsx`: Remove Tutorial button JSX, unused imports (`usePathname`, `useSearchParams`), and `isNewBuilder` check - `useTallyPopup.ts`: Remove `showTutorial` state, `handleResetTutorial` handler, unused `useRouter` import Resolves SECRT-2109 --- Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co> Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co> |
||
|
|
e1db8234a3 |
fix(frontend/copilot): constrain markdown heading sizes in user chat messages (#12463)
### Before <img width="600" height="489" alt="Screenshot 2026-03-18 at 19 24 41" src="https://github.com/user-attachments/assets/bb8dc0fa-04cd-4f32-8125-2d7930b4acde" /> Formatted headings in user messages would look massive ### After <img width="600" height="549" alt="Screenshot 2026-03-18 at 19 24 33" src="https://github.com/user-attachments/assets/51230232-c914-42dd-821f-3b067b80bab4" /> Markdown headings (`# H1` through `###### H6`) and setext-style headings (`====`) in user chat messages rendered at their full HTML heading size, which looked disproportionately large in the chat bubble context. ### Changes 🏗️ - Added Tailwind CSS overrides on the user message `MessageContent` wrapper to cap all heading elements (h1-h6) at `text-lg font-semibold` - Only affects user messages in copilot chat (via `group-[.is-user]` selector); assistant messages are unchanged ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [ ] Send a user message containing `# Heading 1` through `###### Heading 6` and verify they all render at constrained size - [ ] Send a message with `====` separator pattern and verify it doesn't render as a mega H1 - [ ] Verify assistant messages with headings still render normally Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
282173be9d |
feat(copilot): GitHub CLI support — inject GH_TOKEN and connect_integration tool (#12426)
## Summary - When a user has connected GitHub, `GH_TOKEN` is automatically injected into the Claude Agent SDK subprocess environment so `gh` CLI commands work without any manual auth step - When GitHub is **not** connected, the copilot can call a new `connect_integration(provider="github")` MCP tool, which surfaces the same credential setup card used by regular GitHub blocks — the user connects inline without leaving the chat - After connecting, the copilot is instructed to retry the operation automatically ## Changes **Backend** - `sdk/service.py`: `_get_github_token_for_user()` fetches OAuth2 or API key credentials and injects `GH_TOKEN` + `GITHUB_TOKEN` into `sdk_env` before the SDK subprocess starts (per-request, thread-safe via `ClaudeAgentOptions.env`) - `tools/connect_integration.py`: new `ConnectIntegrationTool` MCP tool — returns `SetupRequirementsResponse` for a given provider (`github` for now); extensible via `_PROVIDER_INFO` dict - `tools/__init__.py`: registers `connect_integration` in `TOOL_REGISTRY` - `prompting.py`: adds GitHub CLI / `connect_integration` guidance to `_SHARED_TOOL_NOTES` **Frontend** - `ConnectIntegrationTool/ConnectIntegrationTool.tsx`: thin wrapper around the existing `SetupRequirementsCard` with a tailored retry instruction - `MessagePartRenderer.tsx`: dispatches `tool-connect_integration` to the new component ## Test plan - [ ] User with GitHub credentials: `gh pr list` works without any auth step in copilot - [ ] User without GitHub credentials: copilot calls `connect_integration`, card renders with GitHub credential input, after connecting copilot retries and `gh` works - [ ] `GH_TOKEN` is NOT leaked across users (injected via `ClaudeAgentOptions.env`, not `os.environ`) - [ ] `connect_integration` with unknown provider returns a graceful error message |
||
|
|
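The per-request, thread-safe token injection described above boils down to building a child-process environment dict instead of mutating `os.environ`. A minimal sketch under assumed names (`get_github_credentials` stands in for the real credential lookup):

```python
import os

def build_sdk_env(get_github_credentials, user_id: str) -> dict:
    """Build the subprocess env for one user's SDK session."""
    env = dict(os.environ)  # copy; never mutate the parent process env
    token = get_github_credentials(user_id)  # OAuth2 or API key, may be None
    if token:
        # gh CLI reads GH_TOKEN; some tooling expects GITHUB_TOKEN.
        env["GH_TOKEN"] = token
        env["GITHUB_TOKEN"] = token
    return env
```

Because the dict is built per request and passed to the subprocess (e.g. via `ClaudeAgentOptions.env`), concurrent users never observe each other's tokens.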
5d9a169e04 |
feat(blocks): add AutoPilotBlock for invoking AutoPilot from graphs (#12439)
## Summary - Adds `AutogptCopilotBlock` that invokes the platform's copilot system (`stream_chat_completion_sdk`) directly from graph executions - Enables sub-agent patterns: copilot can call this block recursively (with depth limiting via `contextvars`) - Enables scheduled copilot execution through the agent executor system - No user credentials needed — uses server-side copilot config ## Inputs/Outputs **Inputs:** prompt, system_context, session_id (continuation), timeout, max_recursion_depth **Outputs:** response text, tool_calls list, conversation_history JSON, session_id, token_usage ## Test plan - [x] Block test passes (`test_available_blocks[AutogptCopilotBlock]`) - [x] Pre-commit hooks pass (format, lint, typecheck) - [ ] Manual test: add block to graph, send prompt, verify response - [ ] Manual test: chain two copilot blocks with session_id to verify continuation |
||
|
|
6fd1050457 |
fix(backend): arch-conditional chromium in Docker for ARM64 compatibility (#12466)
## Summary - On **amd64**: keep `agent-browser install` (Chrome for Testing — pinned version tested with Playwright) + restore runtime libs - On **arm64**: install system `chromium` package (Chrome for Testing has no ARM64 binary) + skip `agent-browser install` - An entrypoint script sets `AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium` at container startup on arm64 (detected via presence of `/usr/bin/chromium`); on amd64 the var is left unset so agent-browser uses Chrome for Testing as before **Why not system chromium on amd64?** `agent-browser install` downloads a specific Chrome for Testing version pinned to the Playwright version in use. Using whatever Debian ships on amd64 could cause protocol compatibility issues. Introduced by #12301 (cc @Significant-Gravitas/zamil-majdy) ## Test plan - [ ] `docker compose up --build` succeeds on ARM64 (Apple Silicon) - [ ] `docker compose up --build` succeeds on x86_64 - [ ] Copilot browser tools (`browser_navigate`, `browser_act`, `browser_screenshot`) work in a Copilot session on both architectures --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co> |
||
|
|
02708bcd00 |
fix(platform): pre-check invite eligibility before Supabase signup (#12451)
Requested by @Swiftyos The invite gate check in `get_or_activate_user()` runs after Supabase creates the auth user, resulting in orphaned auth accounts with no platform access when a non-invited user signs up. Users could create a Supabase account but had no `User`, `Profile`, or `Onboarding` records — they could log in but access nothing. ### Changes 🏗️ **Backend** (`v1.py`, `invited_user.py`): - Add public `POST /api/auth/check-invite` endpoint (no auth required — this is a pre-signup check) - Add `check_invite_eligibility()` helper in the data layer - Returns `{allowed: true}` when `enable_invite_gate` is disabled - Extracted `is_internal_email()` helper to deduplicate `@agpt.co` bypass logic (was duplicated between route and `get_or_activate_user`) - Checks `InvitedUser` table for `INVITED` status - Added IP-based Redis rate limiting (10 req/60 s per IP, fails open if Redis unavailable, returns HTTP 429 when exceeded) - Fixed Redis pipeline atomicity: `incr` + `expire` now sent in a single pipeline round-trip, preventing a TTL-less key if `expire` had previously failed after `incr` - Fixed incorrect `await` on `pipe.incr()` / `pipe.expire()` — redis-py async pipeline queue methods are synchronous; only `execute()` is awaitable. 
The erroneous `await` was silently swallowed by the `except` block, making the rate limiter completely non-functional **Frontend** (`signup/actions.ts`): - Call the generated `postV1CheckIfAnEmailIsAllowedToSignUp` client (replacing raw `fetch`) before `supabase.auth.signUp()` - `ApiError` (non-OK HTTP responses) logs a Sentry warning with the HTTP status; network/other errors capture a Sentry exception - If not allowed, return `not_allowed` error (existing `EmailNotAllowedModal` handles this) - Graceful fallback: if the pre-check fails (backend unreachable), falls through to the existing flow — `get_or_activate_user()` remains as defense-in-depth **Tests** (`v1_test.py`, `invited_user_test.py`): - 5 route-level tests covering: gate disabled → allowed, `@agpt.co` bypass, eligible email, ineligible email, rate-limit exceeded - Rate-limit test mock updated to use pipeline interface (`pipeline().execute()` returns `[count, True]`) - Existing `invited_user_test.py` updated to cover `check_invite_eligibility` branches **Not changed:** - Google OAuth flow — already gated by OAuth provider settings - `get_or_activate_user()` — stays as backend safety net - All admin invite CRUD routes — unchanged ### Test plan 1. Email/password signup with invited email → signup proceeds normally 2. Email/password signup with non-invited email → `EmailNotAllowedModal` shown, no Supabase user created 3. `enable_invite_gate=false` → all emails allowed 4. Backend unreachable during pre-check → falls through to existing flow 5. Same IP exceeds 10 requests/60 s → HTTP 429 returned --- Co-authored-by: Craig Swift (@Swiftyos) <craigswift13@gmail.com> --------- Co-authored-by: Craig Swift (@Swiftyos) <craigswift13@gmail.com> Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co> |
||
|
|
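The rate-limiter fix above is worth a sketch, because the bug is subtle: redis-py async pipeline queue methods (`incr`, `expire`) are plain synchronous calls that only enqueue commands; awaiting them is wrong, and only `execute()` is awaitable. Both commands go out in one round-trip so the key never ends up TTL-less. Limits and key naming below are illustrative.

```python
async def check_rate_limit(redis, ip: str, limit: int = 10, window_s: int = 60) -> bool:
    """Return True if this request is allowed; fail open on Redis errors."""
    try:
        pipe = redis.pipeline()
        pipe.incr(f"invite-check:{ip}")              # queued, NOT awaited
        pipe.expire(f"invite-check:{ip}", window_s)  # queued, NOT awaited
        count, _ = await pipe.execute()              # single awaited round-trip
        return count <= limit
    except Exception:
        return True  # fail open if Redis is unavailable
```

Erroneously awaiting `pipe.incr(...)` raises a `TypeError` that, when swallowed by the `except`, makes the limiter silently allow everything, exactly the failure mode the PR describes.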
156d61fe5c |
dx(skills): add merge conflict detection and resolution to pr-address (#12469)
## Summary - Adds merge conflict detection as step 2 of the polling loop (between CI check and comment check), including handling of the transient `"UNKNOWN"` state - Adds a "Resolving merge conflicts" section with step-by-step instructions using 3-way merge (no force push needed since PRs are squash-merged) - Validates all three git conflict markers before staging to prevent committing broken code - Fixes `args` → `argument-hint` in skill frontmatter ## Test plan - [ ] Verify skill renders correctly in Claude Code |
||
|
|
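The "validate all three git conflict markers before staging" step above amounts to a line-prefix scan. A minimal sketch (note that a bare `=======` line also occurs in setext-style markdown, so a real check may want to require the other two markers nearby):

```python
CONFLICT_MARKERS = ("<<<<<<<", "=======", ">>>>>>>")

def has_conflict_markers(text: str) -> bool:
    """True if any line still begins with a git conflict marker."""
    return any(line.startswith(m)
               for line in text.splitlines()
               for m in CONFLICT_MARKERS)
```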
5a29de0e0e |
fix(platform): try-compact-retry for prompt-too-long errors in CoPilot SDK (#12413)
## Summary

When the Claude SDK returns a prompt-too-long error (e.g. transcript + query exceeds the model's context window), the streaming loop now retries with escalating fallbacks instead of failing immediately:

1. **Attempt 1**: Use the transcript as-is (normal path)
2. **Attempt 2**: Compact the transcript via LLM summarization (`compact_transcript`) and retry
3. **Attempt 3**: Drop the transcript entirely and fall back to DB-reconstructed context (`_build_query_message`)

If all 3 attempts fail, a `StreamError(code="prompt_too_long")` is yielded to the frontend.

### Key changes

**`service.py`**
- Add `_is_prompt_too_long(err)` — pattern-matches SDK exceptions for prompt-length errors (`prompt is too long`, `prompt_too_long`, `context_length_exceeded`, `request too large`)
- Wrap `async with ClaudeSDKClient` in a 3-attempt retry `for` loop with compaction/fallback logic
- Move `current_message`, `_build_query_message`, and `_prepare_file_attachments` before the retry loop (computed once, reused)
- Skip transcript upload in `finally` when `transcript_caused_error` (avoids persisting a broken/empty transcript)
- Reset `stream_completed` between retry iterations
- Document outer-scope variable contract in `_run_stream_attempt` closure (which variables are reassigned between retries vs read-only)

**`transcript.py`**
- Add `compact_transcript(content, log_prefix, model)` — converts JSONL → messages → `compress_context` (LLM summarization with truncation fallback) → JSONL
- Add helpers: `_flatten_assistant_content`, `_flatten_tool_result_content`, `_transcript_to_messages`, `_messages_to_transcript`, `_run_compression`
- Returns `None` when compaction fails or the transcript is already within budget (signals caller to fall through to DB fallback)
- Truncation fallback wrapped in a 30s timeout to prevent unbounded CPU time on large transcripts
- Accepts a `model` parameter to avoid creating a new `ChatConfig()` on every call

**`util/prompt.py`**
- Fix `_truncate_middle_tokens` edge case: returns empty string when `max_tok < 1`, properly handles `max_tok < 3`

**`config.py`**
- E2B sandbox timeout raised from 5 min to 15 min to accommodate compaction retries

**`prompt_too_long_test.py`** (new, 45 tests)
- `_is_prompt_too_long` positive/negative patterns, case sensitivity, BaseException handling
- Flatten helpers for assistant/tool_result content blocks
- `_transcript_to_messages` / `_messages_to_transcript` roundtrip, strippable types, empty content
- `compact_transcript` async tests: too few messages, not compacted, successful compaction, compression failure

**`retry_scenarios_test.py`** (new, 27 tests)
- Full retry state machine simulation covering all 8 scenarios:
  1. Normal flow (no retry)
  2. Compact succeeds → retry succeeds
  3. Compact fails → DB fallback succeeds
  4. No transcript → DB fallback succeeds
  5. Double fail → DB fallback on attempt 3
  6. All 3 attempts exhausted
  7. Non-prompt-too-long error (no retry)
  8. Compaction returns identical content → DB fallback
- Edge cases: nested exceptions, case insensitivity, unicode content, large transcripts, resume-after-compaction flow

**Shared test fixtures** (`conftest.py`)
- Extracted `build_test_transcript` helper used across 3 test files to eliminate duplication

## Test plan

- [x] `_is_prompt_too_long` correctly identifies prompt-length errors (8 positive, 5 negative patterns)
- [x] `compact_transcript` compacts oversized transcripts via LLM summarization
- [x] `compact_transcript` returns `None` on failure or when already within budget
- [x] Retry loop state machine: all 8 scenarios verified with state assertions
- [x] `TranscriptBuilder` works correctly after loading compacted transcripts
- [x] `_messages_to_transcript` roundtrip preserves content including unicode
- [x] `transcript_caused_error` prevents stale transcript upload
- [x] Truncation timeout prevents unbounded CPU time
- [x] All 139 unit tests pass locally
- [x] CI green (tests 3.11/3.12/3.13, types, CodeQL, linting)
|
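The escalating-fallback retry described above can be sketched roughly as follows. This is not the actual `service.py` code — `run_with_fallbacks`, `attempt_fn`, and `compact_fn` are hypothetical stand-ins, and only the error-message patterns are taken from the PR:

```python
from typing import Callable, Optional

_PATTERNS = (
    "prompt is too long",
    "prompt_too_long",
    "context_length_exceeded",
    "request too large",
)


def is_prompt_too_long(err: BaseException) -> bool:
    """Pattern-match an error message for prompt-length failures."""
    msg = str(err).lower()
    return any(p in msg for p in _PATTERNS)


def run_with_fallbacks(
    transcript: Optional[str],
    attempt_fn: Callable[[Optional[str]], str],
    compact_fn: Callable[[str], Optional[str]],
) -> str:
    """Escalating fallbacks: transcript as-is -> compacted -> dropped."""
    for attempt in (1, 2, 3):
        try:
            return attempt_fn(transcript)
        except Exception as err:
            if not is_prompt_too_long(err) or attempt == 3:
                raise  # unrelated error, or all attempts exhausted
            if attempt == 1 and transcript is not None:
                # Attempt 2: retry with the LLM-compacted transcript;
                # compact_fn returns None when compaction fails.
                transcript = compact_fn(transcript)
            else:
                # Attempt 3: drop the transcript, rely on DB-built context.
                transcript = None
    raise AssertionError("unreachable")
```

When compaction returns `None` (failure, or already within budget), attempt 2 effectively becomes the DB-fallback path — mirroring scenarios 3 and 4 in the retry test suite.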
||
|
|
e657472162 |
feat(blocks): Add Nano Banana 2 to image generator, customizer, and editor blocks (#12218)
Requested by @Torantulino

Add `google/nano-banana-2` (Gemini 3.1 Flash Image) support across all three image blocks.

### Changes

**`ai_image_customizer.py`**
- Add `NANO_BANANA_2 = "google/nano-banana-2"` to `GeminiImageModel` enum
- Update block description to reference Nano-Banana models generically

**`ai_image_generator_block.py`**
- Add `NANO_BANANA_2` to `ImageGenModel` enum
- Add generation branch (identical to NBP except model name)

**`flux_kontext.py` (AI Image Editor)**
- Rename `FluxKontextModelName` → `ImageEditorModel` (with backwards-compatible alias)
- Add `NANO_BANANA_PRO` and `NANO_BANANA_2` to the editor
- Model-aware branching in `run_model()`: NB models use `image_input` list (not `input_image`), no `seed`, and add `output_format`

**`block_cost_config.py`**
- Add NB2 cost entries for all three blocks (14 credits, matching NBP)
- Add NB Pro cost entry for editor block
- Update editor block refs from `.PRO`/`.MAX` to `.FLUX_KONTEXT_PRO`/`.FLUX_KONTEXT_MAX`

Resolves SECRT-2047

---------

Co-authored-by: Torantulino <Torantulino@users.noreply.github.com>
Co-authored-by: Abhimanyu Yadav <122007096+Abhi1992002@users.noreply.github.com>
|
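The rename-with-alias and the model-aware branching might look something like this minimal sketch. Only the `google/nano-banana-2` slug comes from the PR; the other enum values, `NB_MODELS`, and `build_model_input` are assumptions for illustration:

```python
from enum import Enum


class ImageEditorModel(str, Enum):
    # Only the nano-banana-2 slug is from the PR; the others are assumed.
    FLUX_KONTEXT_PRO = "black-forest-labs/flux-kontext-pro"
    NANO_BANANA_PRO = "google/nano-banana-pro"
    NANO_BANANA_2 = "google/nano-banana-2"


# Backwards-compatible alias so imports of the old name keep working
FluxKontextModelName = ImageEditorModel

NB_MODELS = {ImageEditorModel.NANO_BANANA_PRO, ImageEditorModel.NANO_BANANA_2}


def build_model_input(model: ImageEditorModel, image_url: str, seed: int = 0) -> dict:
    """Model-aware branching: NB models take a list under `image_input`,
    no `seed`, plus an explicit `output_format`."""
    if model in NB_MODELS:
        return {"image_input": [image_url], "output_format": "png"}
    return {"input_image": image_url, "seed": seed}
```

The alias means existing code referencing `FluxKontextModelName.FLUX_KONTEXT_PRO` keeps resolving to the same enum member after the rename.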
||
|
|
4d00e0f179 |
fix(blocks): allow falsy entries in AddToListBlock (#12028)
## Summary

- Treat `AddToListBlock.entry` as optional rather than truthy so 0/""/False are appended
- Extend block self-tests with a falsy entry case

## Testing

- Not run (pytest not available in environment)

Co-authored-by: DEEVEN SERU <144827577+DEVELOPER-DEEVEN@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
|
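The optional-vs-truthy distinction is the whole fix. A minimal sketch (hypothetical `add_to_list` helper, not the actual block implementation):

```python
from typing import Any, List, Optional


def add_to_list(entries: List[Any], entry: Optional[Any] = None) -> List[Any]:
    """Append `entry` when it is provided, including falsy values.

    The bug was a truthiness check (`if entry:`) that silently dropped
    0, "", and False; `is not None` treats the field as optional instead.
    """
    if entry is not None:
        entries.append(entry)
    return entries
```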
||
|
|
1d7282b5f3 |
fix(backend): Truncate filenames with excessively long 'extensions' (#12025)
Fixes issue where filenames with no dots until the end (or massive extensions) bypassed truncation logic, causing OSError [Errno 36]. Limits extension preservation to 20 chars.

---------

Co-authored-by: DEVELOPER-DEEVEN <144827577+DEVELOPER-DEEVEN@users.noreply.github.com>
|
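A sketch of the failure mode and the cap, under assumed names (`safe_truncate`, the 255-char limit, and the exact cut behavior are illustrative, not the PR's code):

```python
import os

MAX_NAME_LEN = 255  # common filesystem limit; exceeding it raises OSError [Errno 36]
MAX_EXT_LEN = 20    # cap on how much trailing "extension" is preserved


def safe_truncate(filename: str, limit: int = MAX_NAME_LEN) -> str:
    """Shorten the stem while keeping a reasonable extension.

    A dot-less name, or one whose "extension" exceeds MAX_EXT_LEN
    (e.g. a URL slug ending in one giant dotted segment), is simply cut
    at the limit instead of preserving the oversized suffix.
    """
    if len(filename) <= limit:
        return filename
    stem, ext = os.path.splitext(filename)
    if not ext or len(ext) > MAX_EXT_LEN:
        return filename[:limit]
    return stem[: limit - len(ext)] + ext
```

Without the `len(ext) > MAX_EXT_LEN` guard, a 300-char "extension" would survive truncation untouched and the result would still exceed the filesystem limit.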
||
|
|
e3591fcaa3 |
ci(backend): Python version specific type checking (#12453)
- Resolves #10657
- Partially based on #10913

### Changes 🏗️

- Run Pyright separately for each supported Python version
- Move type checking and linting into separate jobs
- Add `--skip-pyright` option to the lint script
- Move `linter.py` into `backend/scripts`
- Move the other scripts in `backend/` too, for consistency

### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - CI

---

Co-authored-by: @Joaco2603 <jpappa2603@gmail.com>

---------

Co-authored-by: Joaco2603 <jpappa2603@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
||
|
|
876dc32e17 |
chore(backend): Update poetry to v2.2.1 (#12459)
Poetry v2.2.1 has bugfixes that are relevant in the context of our `.pre-commit-config.yaml`.

### Changes 🏗️

- Update `poetry` from v2.1.1 to v2.2.1 (latest version supported by Dependabot)
- Re-generate `poetry.lock`

### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - CI
|
||
|
|
616e29f5e4 |
fix tests for 6d0e206
|
||
|
|
280a98ad38 |
dx(skills): poll for new PR comments while waiting for CI (#12461)
## Summary

- Updates the `pr-address` skill to poll for new PR comments while waiting for CI, instead of blocking solely on `gh pr checks --watch --fail-fast`
- Runs the CI watch in the background and polls all 3 comment endpoints every 30s
- Allows bot comments (coderabbitai, sentry) to be addressed in parallel with CI rather than sequentially

## Test plan

- [ ] Run `/pr-address` on a PR with pending CI and verify it detects new comments while CI is running
- [ ] Verify CI failures are still handled correctly after the combined wait
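The background-watch-plus-polling pattern can be sketched as below. This is an illustrative helper, not the skill's actual implementation: `watch_ci_and_poll` and `poll_comments` are hypothetical names, and in the skill the watch command would be `gh pr checks --watch --fail-fast` with `poll_comments` hitting the three PR comment endpoints:

```python
import subprocess
import time
from typing import Callable, List


def watch_ci_and_poll(
    watch_cmd: List[str],
    poll_comments: Callable[[], None],
    interval: float = 30.0,
) -> int:
    """Run the CI watch as a background process and poll for new
    comments until the watch exits; returns the watch's exit code."""
    watch = subprocess.Popen(watch_cmd)
    try:
        while watch.poll() is None:
            poll_comments()  # e.g. check all 3 PR comment endpoints
            time.sleep(interval)
    finally:
        if watch.poll() is None:  # clean up if we bail out early
            watch.terminate()
    return watch.returncode
```

Because the watch runs as a child process rather than a blocking foreground call, new bot comments surface during CI instead of only after it finishes.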