Compare commits

...

303 Commits

Author SHA1 Message Date
Bentlybro
be328c1ec5 feat(backend/llm-registry): wire refresh_runtime_caches to Redis invalidation and pub/sub
After any admin DB mutation, clear the shared Redis cache, refresh this
process's in-memory state, then publish a notification so all other workers
reload from Redis without hitting the database.
2026-04-07 18:35:41 +01:00
Bentlybro
8410448c16 fix(backend/llm-registry): enforce single recommended model in update_model
When setting is_recommended=True on a model, first clears the flag on all
other models within the same transaction so only one model can be
recommended at a time.
2026-04-07 18:35:08 +01:00
Bentlybro
e168597663 chore: regenerate OpenAPI schema from current backend endpoints 2026-04-07 18:35:08 +01:00
Bentlybro
1d903ae287 fix: add trailing newline to openapi.json 2026-04-07 18:35:08 +01:00
Bentlybro
1be7aebdea chore: regenerate OpenAPI schema for new migration endpoints 2026-04-07 18:35:08 +01:00
Bentlybro
36045c7007 feat(backend): Add model migration system - usage tracking, safe delete, disable with migration, revert
- GET /llm/models/{slug}/usage - count AgentNodes using a model
- DELETE /llm/models/{slug} with optional replacement_model_slug for safe migration
- POST /llm/models/{slug}/toggle with migration support when disabling
- GET /llm/migrations - list model migrations (with include_reverted filter)
- POST /llm/migrations/{id}/revert - revert a migration (restores nodes, re-enables source model)
- Transactional migration: counts nodes, migrates atomically, creates LlmModelMigration audit record
- Ported from original PR #11699's db.py
2026-04-07 18:35:08 +01:00
Bentlybro
445eb173a5 feat(backend): Add admin list endpoints, creator CRUD, model cost creation
- GET /llm/admin/providers - list all providers from DB (includes empty ones)
- GET /llm/admin/models - list all models with costs and creator info
- POST /llm/creators - create new creator
- PATCH /llm/creators/{name} - update creator
- DELETE /llm/creators/{name} - delete creator (with model check)
- Create LlmModelCost records when creating a model
- Resolve provider name to ID in create_model
- Add costs field to CreateLlmModelRequest
2026-04-07 18:35:08 +01:00
Bentlybro
393a138fee fix(backend): Use {slug:path} for model routes to support slugs with slashes
Model slugs like 'openai/gpt-oss-120b' contain forward slashes which
FastAPI's default {slug} parameter doesn't capture. Using {slug:path}
allows the full slug to be captured as a single parameter.
2026-04-07 18:35:08 +01:00
Bentlybro
ccc1e35c5b Add LLM creators endpoint and OpenAPI entry
Introduce a read endpoint for LLM model creators: add _map_creator_response serializer and an admin-only GET /llm/creators route that queries prisma.models.LlmModelCreator (ordered by name), logs results, and returns serialized creators with error handling. Also update frontend OpenAPI spec with the /api/llm/creators GET operation.
2026-04-07 18:35:07 +01:00
Bentlybro
c66f114e28 Add LLM model/provider API endpoints
Add admin LLM CRUD endpoints and request schemas to the OpenAPI spec. Introduces POST /api/llm/models and DELETE/PATCH /api/llm/models/{slug}, and POST /api/llm/providers and DELETE/PATCH /api/llm/providers/{name} (all with HTTPBearerJWT security where applicable). Adds CreateLlmModelRequest, UpdateLlmModelRequest, CreateLlmProviderRequest, and UpdateLlmProviderRequest component schemas and corresponding responses (201/200/204 plus validation and auth errors). Notes provider deletion requires no associated models.
2026-04-07 18:35:07 +01:00
Bentlybro
939edc73b8 feat(platform): Implement LLM registry admin API functionality
Implement full CRUD operations for admin API:

Database layer (db_write.py):
- create_provider, update_provider, delete_provider
- create_model, update_model, delete_model
- refresh_runtime_caches - invalidates in-memory registry after mutations
- Proper validation and error handling

Admin routes (admin_routes.py):
- All endpoints now functional (no more 501)
- Proper error responses (400 for validation, 404 for not found, 500 for server errors)
- Lookup by slug/name before operations
- Cache refresh after all mutations

Features:
- Provider deletion blocked if models exist (FK constraint)
- All mutations refresh registry cache automatically
- Proper logging for audit trail
- Admin auth enforced on all endpoints

Based on original implementation from PR #11699 (upstream-llm branch).

Builds on:
- PR #12357: Schema foundation
- PR #12359: Registry core
- PR #12371: Public read API
2026-04-07 18:35:07 +01:00
Bentlybro
d52409c853 feat(platform): Add LLM registry admin API skeleton - Part 4 of 6
Add admin write API endpoints for LLM registry management:
- POST /api/llm/models - Create model
- PATCH /api/llm/models/{slug} - Update model
- DELETE /api/llm/models/{slug} - Delete model
- POST /api/llm/providers - Create provider
- PATCH /api/llm/providers/{name} - Update provider
- DELETE /api/llm/providers/{name} - Delete provider

All endpoints require admin authentication via requires_admin_user.

Request/response models defined in admin_model.py:
- CreateLlmModelRequest, UpdateLlmModelRequest
- CreateLlmProviderRequest, UpdateLlmProviderRequest

Implementation coming in follow-up commits (currently returns 501 Not Implemented).

This builds on:
- PR #12357: Schema foundation
- PR #12359: Registry core
- PR #12371: Public read API
2026-04-07 18:35:07 +01:00
Bentlybro
90a68084eb fix(backend): Include is_enabled in public model list response 2026-04-07 18:34:58 +01:00
Bentlybro
fb9a3224be Add is_enabled field to OpenAPI model
Introduce a new boolean property `is_enabled` (default: true) into the OpenAPI schema in autogpt_platform/frontend/src/app/api/openapi.json next to `price_tier` and `is_recommended`. This exposes an enable/disable flag in the API model for consumers and defaults new entries to enabled.
2026-04-07 18:34:58 +01:00
Bentlybro
eb76b95aa5 Add is_enabled flag to LlmModel
Introduce an is_enabled: bool = True field to the LlmModel pydantic model to allow toggling model availability. Defaulting to True preserves backward compatibility and avoids breaking changes; can be used by APIs or UIs to filter or disable models without removing them.
2026-04-07 18:34:58 +01:00
Bentlybro
cc17884360 Add LLM models/providers endpoints to OpenAPI
Add two new GET endpoints to the OpenAPI spec: /api/llm/models (with optional enabled_only query param, JWT auth) and /api/llm/providers (JWT auth). These endpoints expose the in-memory LLM registry: list of models and grouped providers with their enabled models. Also add related component schemas (LlmModel, LlmModelCost, LlmModelCreator, LlmModelsResponse, LlmProvider, LlmProvidersResponse) describing model metadata, costs, creators and response shapes.
2026-04-07 18:34:58 +01:00
Bentlybro
1ce3cc0231 fix: remove incorrectly placed openapi.json file 2026-04-07 18:34:58 +01:00
Bentlybro
bd1f4b5701 fix: regenerate OpenAPI schema after rebase 2026-04-07 18:34:58 +01:00
Bentlybro
e89e56d90d feat(platform): Add LLM registry public read API
Implements public GET endpoints for querying LLM models and providers - Part 3 of 6 in the incremental registry rollout.

**Endpoints:**
- GET /api/llm/models - List all models (filterable by enabled_only)
- GET /api/llm/providers - List providers with their models

**Design:**
- Uses in-memory registry from PR 2 (no DB queries)
- Fast reads from cache populated at startup
- Grouped by provider for easy UI rendering

**Response models:**
- LlmModel - model info with capabilities, costs, creator
- LlmProvider - provider with nested models
- LlmModelsResponse - list + total count
- LlmProvidersResponse - grouped by provider

**Authentication:**
- Requires user auth (requires_user dependency)
- Public within authenticated sessions

**Integration:**
- Registered in rest_api.py at /api prefix
- Tagged with v2 + llm for OpenAPI grouping

**What's NOT included (later PRs):**
- Admin write API (PR 4)
- Block integration (PR 5)
- Redis cache (PR 6)

Lines: ~180 total
Files: 4 (3 new, 1 modified)
Review time: < 10 minutes
2026-04-07 18:34:58 +01:00
Bentlybro
2a923dcd92 feat(backend/llm-registry): add Redis-backed cache and cross-process pub/sub sync
- Wrap DB fetch with @cached(shared_cache=True) so results are stored in
  Redis automatically — other workers skip the DB on warm cache
- Add notifications.py with publish/subscribe helpers using llm_registry:refresh
  pub/sub channel for cross-process invalidation
- clear_registry_cache() invalidates the shared Redis entry before a forced
  DB refresh (called by admin mutations)
- rest_api.py: start a background subscription task so every worker reloads
  its in-process cache when another worker refreshes the registry
2026-04-07 18:34:45 +01:00
Bentlybro
1fffd21b16 fix(registry): switch to Pydantic models, add typed capabilities, add unit tests
- Replace frozen dataclasses with Pydantic BaseModel(frozen=True) for true immutability
- Add typed boolean fields for model capabilities (supports_tools, etc.)
- Add comprehensive unit tests for registry module
- Addresses Majdyz review feedback on PR #12359
2026-04-05 08:04:55 +00:00
Bentlybro
2241a62b75 fix(registry): address Majdyz review - extract helper, fix schema prefix, return copies, remove re-export 2026-04-04 20:38:13 +00:00
Bentlybro
a5b71b9783 style: fix trailing whitespace in registry.py 2026-03-25 14:12:06 +00:00
Bentlybro
7632548408 fix(startup): handle missing AgentNode table in migrate_llm_models
Tests fail with 'relation "platform.AgentNode" does not exist' because
migrate_llm_models() runs during startup and queries a table that doesn't
exist in fresh test databases.

This is an existing bug in the codebase - the function has no error handling.

Wrap the call in try/except to gracefully handle test environments where
the AgentNode table hasn't been created yet.
2026-03-25 14:12:06 +00:00
Bentlybro
05fa10925c refactor: address CodeRabbit/Majdyz review feedback
- Fix ModelMetadata duplicate type collision by importing from blocks.llm
- Remove _json_to_dict helper, use dict() inline
- Add warning when Provider relation is missing (data corruption indicator)
- Optimize get_default_model_slug with next() (single sort pass)
- Optimize _build_schema_options to use list comprehension
- Move llm_registry import to top-level in rest_api.py
- Ensure max_output_tokens falls back to context_window when null

All critical and quick-win issues addressed.
2026-03-25 14:12:06 +00:00
Bentlybro
c64246be87 fix: address Sentry/CodeRabbit critical and major issues
**CRITICAL FIX - ModelMetadata instantiation:**
- Removed non-existent 'supports_vision' argument
- Added required fields: display_name, provider_name, creator_name, price_tier
- Handle nullable DB fields (Creator, priceTier, maxOutputTokens) safely
- Fallback: creator_name='Unknown' if no Creator, price_tier=1 if invalid

**MAJOR FIX - Preserve pricing unit:**
- Added 'unit' field to RegistryModelCost dataclass
- Prevents RUN vs TOKENS ambiguity in cached costs
- Convert Prisma enum to string when building cost objects

**MAJOR FIX - Deterministic default model:**
- Sort recommended models by display_name before selection
- Prevents non-deterministic results when multiple models are recommended
- Ensures consistent default across refreshes

**STARTUP IMPROVEMENT:**
- Added comment: graceful fallback OK for now (no blocks use registry yet)
- Will be stricter in PR #5 when block integration lands
- Added success log message for registry refresh

Fixes identified by Sentry (critical TypeError) and CodeRabbit review.
2026-03-25 14:12:06 +00:00
Bentlybro
253937e7b9 feat(platform): Add LLM registry core - DB layer + in-memory cache
Implements the registry core for dynamic LLM model management:

**DB Layer:**
- Fetch models with provider, costs, and creator relations
- Prisma query with includes for related data
- Convert DB records to typed dataclasses

**In-memory Cache:**
- Global dict for fast model lookups
- Atomic cache refresh with lock protection
- Schema options generation for UI dropdowns

**Public API:**
- get_model(slug) - lookup by slug
- get_all_models() - all models (including disabled)
- get_enabled_models() - enabled models only
- get_schema_options() - UI dropdown data
- get_default_model_slug() - recommended or first enabled
- refresh_llm_registry() - manual refresh trigger

**Integration:**
- Refresh at API startup (before block init)
- Graceful fallback if registry unavailable
- Enables blocks to consume registry data

**Models:**
- RegistryModel - full model with metadata
- RegistryModelCost - pricing configuration
- RegistryModelCreator - model creator info
- ModelMetadata - context window, capabilities

**Next PRs:**
- PR #3: Public read API (GET endpoints)
- PR #4: Admin write API (POST/PATCH/DELETE)
- PR #5: Block integration (update LLM block)
- PR #6: Redis cache (solve thundering herd)

Lines: ~230 (registry.py ~210, __init__.py ~30, model.py from draft)
Files: 4 (3 new, 1 modified)
2026-03-25 14:12:06 +00:00
Bentlybro
73e481b508 revert: undo changes to graph.py
Reverting migrate_llm_models modifications per request.
Back to dev baseline for this file.
2026-03-25 14:12:06 +00:00
Bentlybro
f0cc4ae573 Seed LLM model creators and link models
Update migration to seed LLM model creators and associate them with models. Adds INSERTs for LlmModelCreator, updates the file header comment, introduces a creator_ids CTE, and extends the LlmModel INSERT to include creatorId (joining on creator name). Existing provider seeding and model cost logic remain unchanged; ON CONFLICT behavior preserved.
2026-03-25 13:57:41 +00:00
Bently
e0282b00db Merge branch 'dev' into feat/llm-registry-schema 2026-03-23 13:43:15 +00:00
Bentlybro
9a9c36b806 Update LLM registry seeds and conflict clause
Add and rename model slugs and costs in the LLM registry seed migration (e.g. rename 'o3' -> 'o3-2025-04-16', add 'gpt-5.2-2025-12-11', Anthropic 'claude-opus-4-6'/'claude-sonnet-4-6', multiple Google Gemini and Mistralai OpenRouter entries, and other provider models). Also tighten the ON CONFLICT upsert semantics so conflicts are ignored only when "credentialId" IS NULL, preventing silent skips for credentialed entries. These changes seed new models and ensure correct conflict handling during migration.
2026-03-23 13:20:48 +00:00
Zamil Majdy
e86ac21c43 feat(platform): add workflow import from other tools (n8n, Make.com, Zapier) (#12440)
## Summary
- Enable one-click import of workflows from other platforms (n8n,
Make.com, Zapier, etc.) into AutoGPT via CoPilot
- **No backend endpoint** — import is entirely client-side: the dialog
reads the file or fetches the n8n template URL, uploads the JSON to the
workspace via `uploadFileDirect`, stores the file reference in
`sessionStorage`, and redirects to CoPilot with `autosubmit=true`
- CoPilot receives the workflow JSON as a proper file attachment and
uses the existing agent-generator pipeline to convert it
- Library dialog redesigned: 2 tabs — "AutoGPT agent" (upload exported
agent JSON) and "Another platform" (file upload + optional n8n URL)

## How it works
1. User uploads a workflow JSON (or pastes an n8n template URL)
2. Frontend fetches/reads the JSON and uploads it to the user's
workspace via the existing file upload API
3. User is redirected to `/copilot?source=import&autosubmit=true`
4. CoPilot picks up the file from `sessionStorage` and sends it as a
`FileUIPart` attachment with a prompt to recreate the workflow as an
AutoGPT agent

## Test plan
- [x] Manual test: import a real n8n workflow JSON via the dialog
- [x] Manual test: paste an n8n template URL and verify it fetches +
converts
- [x] Manual test: import Make.com / Zapier workflow export JSON
- [x] Repeated imports don't cause 409 conflicts (filenames use
`crypto.randomUUID()`)
- [x] E2E: Import dialog has 2 tabs (AutoGPT agent + Another platform)
- [x] E2E: n8n quick-start template buttons present
- [x] E2E: n8n URL input enables Import button on valid URL
- [x] E2E: Workspace upload API returns file_id
2026-03-23 13:03:02 +00:00
Bentlybro
d5381625cd Add timestamps to LLM registry seed inserts
Update migration.sql to include createdAt and updatedAt columns/values for LlmProvider, LlmModel, and LlmModelCost seed inserts. Uses CURRENT_TIMESTAMP for both timestamp fields and adjusts the INSERT SELECT ordering for models to match the added columns. This ensures the seed data satisfies schemas that require timestamps and provides consistent created/updated metadata.
2026-03-23 12:59:30 +00:00
Bentlybro
f6ae3d6593 Update migration.sql 2026-03-23 12:43:06 +00:00
Lluis Agusti
94224be841 Merge remote-tracking branch 'origin/master' into dev 2026-03-23 20:42:32 +08:00
Bentlybro
0fb1b854df add llm's via migration 2026-03-23 12:37:29 +00:00
Otto
da4bdc7ab9 fix(backend+frontend): reduce Sentry noise from user-caused errors (#12513)
Requested by @majdyz

User-caused errors (no payment method, webhook agent invocation, missing
credentials, bad API keys) were hitting Sentry via `logger.exception()`
in the `ValueError` handler, creating noise that obscures real bugs.
Additionally, a frontend crash on the copilot page (BUILDER-71J) needed
fixing.

**Changes:**

**Backend — rest_api.py**
- Set `log_error=False` for the `ValueError` exception handler (line
278), consistent with how `FolderValidationError` and `NotFoundError`
are already handled. User-caused 400 errors no longer trigger
`logger.exception()` → Sentry.

**Backend — executor/manager.py**
- Downgrade `ExecutionManager` input validation skip errors from `error`
to `warning` level. Missing credentials is expected user behavior, not
an internal error.

**Backend — blocks/llm.py**
- Sanitize unpaired surrogates in LLM prompt content before sending to
provider APIs. Prevents `UnicodeEncodeError: surrogates not allowed`
when httpx encodes the JSON body (AUTOGPT-SERVER-8AX).

**Frontend — package.json**
- Upgrade `ai` SDK from `6.0.59` to `6.0.134` to fix BUILDER-71J
(`TypeError: undefined is not an object (evaluating
'this.activeResponse.state')` on /copilot page). This is a known issue
in the Vercel AI SDK fixed in later patch versions.

**Sentry issues addressed:**
- `No payment method found` (ValueError → 400)
- `This agent is triggered by an external event (webhook)` (ValueError →
400)
- `Node input updated with non-existent credentials` (ValueError → 400)
- `[ExecutionManager] Skip execution, input validation error: missing
input {credentials}`
- `UnicodeEncodeError: surrogates not allowed` (AUTOGPT-SERVER-8AX)
- `TypeError: activeResponse.state` (BUILDER-71J)

Resolves SECRT-2166

---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>

---------

Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
2026-03-23 12:22:49 +00:00
Zamil Majdy
7176cecf25 perf(copilot): reduce tool schema token cost by 34% (#12398)
## Summary

Reduce CoPilot per-turn token overhead by systematically trimming tool
descriptions, parameter schemas, and system prompt content. All 35 MCP
tool schemas are passed on every SDK call — this PR reduces their size.

### Strategy

1. **Tool descriptions**: Trimmed verbose multi-sentence explanations to
concise single-sentence summaries while preserving meaning
2. **Parameter schemas**: Shortened parameter descriptions to essential
info, removed some `default` values (handled in code)
3. **System prompt**: Condensed `_SHARED_TOOL_NOTES` and storage
supplement template in `prompting.py`
4. **Cross-tool references**: Removed duplicate workflow hints (e.g.
"call find_block before run_block" appeared in BOTH tools — kept only in
the dependent tool). Critical cross-tool references retained (e.g.
`continue_run_block` in `run_block`, `fix_agent_graph` in
`validate_agent`, `get_doc_page` in `search_docs`, `web_fetch`
preference in `browser_navigate`)

### Token Impact

| Metric | Before | After | Reduction |
|--------|--------|-------|-----------|
| System Prompt | ~865 tokens | ~497 tokens | 43% |
| Tool Schemas | ~9,744 tokens | ~6,470 tokens | 34% |
| **Grand Total** | **~10,609 tokens** | **~6,967 tokens** | **34%** |

Saves **~3,642 tokens per conversation turn**.

### Key Decisions

- **Mostly description changes**: Tool logic, parameters, and types
unchanged. However, some schema-level `default` fields were removed
(e.g. `save` in `customize_agent`) — these are machine-readable
metadata, not just prose, and may affect LLM behavior.
- **Quality preserved**: All descriptions still convey what the tool
does and essential usage patterns
- **Cross-references trimmed carefully**: Kept prerequisite hints in the
dependent tool (run_block mentions find_block) but removed the reverse
(find_block no longer mentions run_block). Critical cross-tool guidance
retained where removal would degrade model behavior.
- **`run_time` description fixed**: Added missing supported values
(today, last 30 days, ISO datetime) per review feedback

### Future Optimization

The SDK passes all 35 tools on every call. The MCP protocol's
`list_tools()` handler supports dynamic tool registration — a follow-up
PR could implement lazy tool loading (register core tools + a discovery
meta-tool) to further reduce per-turn token cost.

### Changes

- Trimmed descriptions across 25 tool files
- Condensed `_SHARED_TOOL_NOTES` and `_build_storage_supplement` in
`prompting.py`
- Fixed `run_time` schema description in `agent_output.py`

### Checklist

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] All 273 copilot tests pass locally
  - [x] All 35 tools load and produce valid schemas
  - [x] Before/after token dumps compared
  - [x] Formatting passes (`poetry run format`)
  - [x] CI green
2026-03-23 08:27:24 +00:00
Zamil Majdy
f35210761c feat(devops): add /pr-test skill + subscription mode auto-provisioning (#12507)
## Summary
- Adds `/pr-test` skill for automated E2E testing of PRs using docker
compose, agent-browser, and API calls
- Covers full environment setup (copy .env, configure copilot auth,
ARM64 Docker fix)
- Includes browser UI testing, direct API testing, screenshot capture,
and test report generation
- Has `--fix` mode for auto-fixing bugs found during testing (similar to
`/pr-address`)
- **Screenshot uploads use GitHub Git API** (blobs → tree → commit →
ref) — no local git operations, safe for worktrees
- **Subscription mode improvements:**
- Extract subscription auth logic to `sdk/subscription.py` — uses SDK's
bundled CLI binary instead of requiring `npm install -g
@anthropic-ai/claude-code`
- Auto-provision `~/.claude/.credentials.json` from
`CLAUDE_CODE_OAUTH_TOKEN` env var on container startup — no `claude
login` needed in Docker
- Add `scripts/refresh_claude_token.sh` — cross-platform helper
(macOS/Linux/Windows) to extract OAuth tokens from host and update
`backend/.env`

## Test plan
- [x] Validated skill on multiple PRs (#12482, #12483, #12499, #12500,
#12501, #12440, #12472) — all test scenarios passed
- [x] Confirmed screenshot upload via GitHub Git API renders correctly
on all 7 PRs
- [x] Verified subscription mode E2E in Docker:
`refresh_claude_token.sh` → `docker compose up` → copilot chat responds
correctly with no API keys (pure OAuth subscription)
- [x] Verified auto-provisioning of credentials file inside container
from `CLAUDE_CODE_OAUTH_TOKEN` env var
- [x] Confirmed bundled CLI detection
(`claude_agent_sdk._bundled/claude`) works without system-installed
`claude`
- [x] `poetry run pytest backend/copilot/sdk/service_test.py` — 24/24
tests pass
2026-03-23 15:29:00 +07:00
Zamil Majdy
1ebcf85669 fix(platform): resolve 5 production Sentry alerts (#12496)
## Summary

Fixes 5 high-priority Sentry alerts from production:

- **AUTOGPT-SERVER-8AM**: Fix `TypeError: TypedDict does not support
instance and class checks` — `_value_satisfies_type` in `type.py` now
handles TypedDict classes that don't support `isinstance()` checks
- **AUTOGPT-SERVER-8AN**: Fix `ValueError: No payment method found`
triggering Sentry error — catch the expected ValueError in the
auto-top-up endpoint and return HTTP 422 instead
- **BUILDER-7F5**: Fix `Upload failed (409): File already exists` — add
`overwrite` query param to workspace upload endpoint and set it to
`true` from the frontend direct-upload
- **BUILDER-7F0**: Fix `LaTeX-incompatible input` KaTeX warnings
flooding Sentry — set `strict: false` on rehype-katex plugin to suppress
warnings for unrecognized Unicode characters
- **AUTOGPT-SERVER-89N**: Fix `Tool execution with manager failed:
validation error for dict[str,list[any]]` — make RPC return type
validation resilient (log warning instead of crash) and downgrade
SmartDecisionMaker tool execution errors to warnings

## Test plan
- [ ] Verify TypedDict type coercion works for
GithubMultiFileCommitBlock inputs
- [ ] Verify auto-top-up without payment method returns 422, not 500
- [ ] Verify file re-upload in copilot succeeds (overwrites instead of
409)
- [ ] Verify LaTeX rendering with Unicode characters doesn't produce
console warnings
- [ ] Verify SmartDecisionMaker tool execution failures are logged at
warning level
2026-03-23 08:05:08 +00:00
Otto
ab7c38bda7 fix(frontend): detect closed OAuth popup and allow dismissing waiting modal (#12443)
Requested by @kcze

When a user closes the OAuth sign-in popup without completing
authentication, the 'Waiting on sign-in process' modal was stuck open
with no way to dismiss it, forcing a page refresh.

Two bugs caused this:

1. `oauth-popup.ts` had no detection for the popup being closed by the
user. The promise would hang until the 5-minute timeout.

2. The modal's cancel button aborted a disconnected `AbortController`
instead of the actual OAuth flow's abort function, so clicking
cancel/close did nothing.

### Changes

- Add `popup.closed` polling (500ms) in `openOAuthPopup()` that rejects
the promise when the user closes the auth window
- Add reject-on-abort so the cancel button properly terminates the flow
- Replace the disconnected `oAuthPopupController` with a direct
`cancelOAuthFlow()` function that calls the real abort ref
- Handle popup-closed and user-canceled as silent cancellations (no
error toast)

### Testing

Tested manually 
- [x] Start OAuth flow → close popup window → modal dismisses
automatically 
- [x] Start OAuth flow → click cancel on modal → popup closes, modal
dismisses 
- [x] Complete OAuth flow normally → works as before 

Resolves SECRT-2054

---
Co-authored-by: Krzysztof Czerwinski (@kcze)
<krzysztof.czerwinski@agpt.co>

---------

Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 14:41:09 +00:00
Ubbe
0f67e45d05 hotfix(marketplace): adjust card height overflow (#12497)
## Summary

### Before

<img width="500" height="501" alt="Screenshot 2026-03-20 at 21 50 31"
src="https://github.com/user-attachments/assets/6154cffb-6772-4c3d-a703-527c8ca0daff"
/>

### After

<img width="500" height="581" alt="Screenshot 2026-03-20 at 21 33 12"
src="https://github.com/user-attachments/assets/2f9bd69d-30c5-4d06-ad1e-ed76b184afe5"
/>

### Other minor fixes

- minor spacing adjustments in creator/search pages when empty and
between sections


### Summary

- Increase StoreCard height from 25rem to 26.5rem to prevent content
overflow
- Replace manual tooltip-based title truncation with `OverflowText`
component in StoreCard
- Adjust carousel indicator positioning and hide it on md+ when exactly
3 featured agents are shown

## Test plan
- [x] Verify marketplace cards display without text overflow
- [x] Verify featured section carousel indicators behave correctly
- [x] Check responsive behavior at common breakpoints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 22:03:28 +08:00
Ubbe
b9ce37600e refactor(frontend/marketplace): move download below Add to library with contextual text (#12486)
## Summary

<img width="1487" height="670" alt="Screenshot 2026-03-20 at 00 52 58"
src="https://github.com/user-attachments/assets/f09de2a0-3c5b-4bce-b6f4-8a853f6792cf"
/>


- Move the download button from inline next to "Add to library" to a
separate line below it
- Add contextual text: "Want to use this agent locally? Download here"
- Style the "Download here" as a violet ghost button link with the
download icon

## Test plan
- [ ] Visit a marketplace agent page
- [ ] Verify "Add to library" button renders in its row
- [ ] Verify "Want to use this agent locally? Download here" appears
below it
- [ ] Click "Download here" and confirm the agent downloads correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 13:13:59 +00:00
Otto
3921deaef1 fix(frontend): truncate marketplace card description to 2 lines (#12494)
Reduces `line-clamp` from 3 to 2 on the marketplace `StoreCard`
description to prevent text from overlapping with the
absolutely-positioned run count and +Add button at the bottom of the
card.

Resolves SECRT-2156.

---
Co-authored-by: Abhimanyu Yadav (@Abhi1992002)
<122007096+Abhi1992002@users.noreply.github.com>
2026-03-20 09:10:21 +00:00
Nicholas Tindle
f01f668674 fix(backend): support Responses API in SmartDecisionMakerBlock (#12489)
## Summary

- Fixes SmartDecisionMakerBlock conversation management to work with
OpenAI's Responses API, which was introduced in #12099 (commit 1240f38)
- The migration to `responses.create` updated the outbound LLM call but
missed the conversation history serialization — the `raw_response` is
now the entire `Response` object (not a `ChatCompletionMessage`), and
tool calls/results use `function_call` / `function_call_output` types
instead of role-based messages
- This caused a 400 error on the second LLM call in agent mode:
`"Invalid value: ''. Supported values are: 'assistant', 'system',
'developer', and 'user'."`

### Changes

**`smart_decision_maker.py`** — 6 functions updated:
| Function | Fix |
|---|---|
| `_convert_raw_response_to_dict` | Detects Responses API `Response`
objects, extracts output items as a list |
| `_get_tool_requests` | Recognizes `type: "function_call"` items |
| `_get_tool_responses` | Recognizes `type: "function_call_output"`
items |
| `_create_tool_response` | New `responses_api` kwarg produces
`function_call_output` format |
| `_update_conversation` | Handles list return from
`_convert_raw_response_to_dict` |
| Non-agent mode path | Same list handling for traditional execution |

**`test_smart_decision_maker_responses_api.py`** — 61 tests covering:
- Every branch of all 6 affected helper functions
- Chat Completions, Anthropic, and Responses API formats
- End-to-end agent mode and traditional mode conversation validity

## Test plan

- [x] 61 new unit tests all pass
- [x] 11 existing SmartDecisionMakerBlock tests still pass (no
regressions)
- [x] All pre-commit hooks pass (ruff, black, isort, pyright)
- [ ] CI integration tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Updates core LLM invocation and agent conversation/tool-call
bookkeeping to match OpenAI’s Responses API, which can affect tool
execution loops and prompt serialization across providers. Risk is
mitigated by extensive new unit tests, but regressions could surface in
production agent-mode flows or token/usage accounting.
> 
> **Overview**
> **Migrates OpenAI calls from Chat Completions to the Responses API
end-to-end**, including tool schema conversion, output parsing,
reasoning/text extraction, and updated token usage fields in
`LLMResponse`.
> 
> **Fixes SmartDecisionMakerBlock conversation/tool handling for
Responses API** by treating `raw_response` as a Response object
(splitting it into `output` items for replay), recognizing
`function_call`/`function_call_output` entries, and emitting tool
outputs in the correct Responses format to prevent invalid follow-up
prompts.
> 
> Also adjusts prompt compaction/token estimation to understand
Responses API tool items, changes
`get_execution_outputs_by_node_exec_id` to return list-valued
`CompletedBlockOutput`, removes `gpt-3.5-turbo` from model/cost/docs
lists, and adds focused unit tests plus a lightweight `conftest.py` to
run these tests without the full server stack.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
ff292efd3d. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>
2026-03-20 03:23:52 +00:00
Otto
f7a3491f91 docs(platform): add TDD guidance to CLAUDE.md files (#12491)
Requested by @majdyz

Adds TDD (test-driven development) guidance to CLAUDE.md files so Claude
Code follows a test-first workflow when fixing bugs or adding features.

**Changes:**
- **Parent `CLAUDE.md`**: Cross-cutting TDD workflow — write a failing
`xfail` test, implement the fix, remove the marker
- **Backend `CLAUDE.md`**: Concrete pytest example with
`@pytest.mark.xfail` pattern
- **Frontend `CLAUDE.md`**: Note about using Playwright `.fixme`
annotation for bug-fix tests

The workflow is: write a failing test first → confirm it fails for the
right reason → implement → confirm it passes. This ensures every bug fix
is covered by a test that would have caught the regression.

---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
2026-03-20 02:13:16 +00:00
Nicholas Tindle
cbff3b53d3 Revert "feat(backend): migrate OpenAI provider to Responses API" (#12490)
Reverts Significant-Gravitas/AutoGPT#12099

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Reverts the OpenAI integration in `llm_call` from the Responses API
back to `chat.completions`, which can change tool-calling, JSON-mode
behavior, and token accounting across core AI blocks. The change is
localized but touches the primary LLM execution path and associated
tests/docs.
> 
> **Overview**
> Reverts the OpenAI path in `backend/blocks/llm.py` from the Responses
API back to `chat.completions`, including updating JSON-mode
(`response_format`), tool handling, and usage extraction to match the
Chat Completions response shape.
> 
> Removes the now-unused `backend/util/openai_responses.py` helpers and
their unit tests, updates LLM tests to mock `chat.completions.create`,
and adds `gpt-3.5-turbo` to the supported model list, cost config, and
LLM docs.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
7d6226d10e. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
2026-03-20 01:51:56 +00:00
Reinier van der Leer
5b9a4c52c9 revert(platform): Revert invite system (#12485)
## Summary

Reverts the invite system PRs due to security gaps identified during
review:

- The move from Supabase-native `allowed_users` gating to
application-level gating allows orphaned Supabase auth accounts (valid
JWT without a platform `User`)
- The auth middleware never verifies `User` existence, so orphaned users
get 500s instead of clean 403s
- OAuth/Google SSO signup completely bypasses the invite gate
- The DB trigger that atomically created `User` + `Profile` on signup
was dropped in favor of a client-initiated API call, introducing a
failure window

### Reverted PRs
- Reverts #12347 — Foundation: InvitedUser model, invite-gated signup,
admin UI
- Reverts #12374 — Tally enrichment: personalized prompts from form
submissions
- Reverts #12451 — Pre-check: POST /auth/check-invite endpoint
- Reverts #12452 (collateral) — Themed prompt categories /
SuggestionThemes UI. This PR built on top of #12374's
`suggested_prompts` backend field and `/chat/suggested-prompts`
endpoint, so it cannot remain without #12374. The copilot empty session
falls back to hardcoded default prompts.

### Migration
Includes a new migration (`20260319120000_revert_invite_system`) that:
- Drops the `InvitedUser` table and its enums (`InvitedUserStatus`,
`TallyComputationStatus`)
- Restores the `add_user_and_profile_to_platform()` trigger on
`auth.users`
- Backfills `User` + `Profile` rows for any auth accounts created during
the invite-gate window

### What's NOT reverted
- The `generate_username()` function (never dropped, still used by
backfill migration)
- The old `add_user_to_platform()` function (superseded by
`add_user_and_profile_to_platform()`)
- PR #12471 (admin UX improvements) — was never merged, no action needed

## Test plan
- [x] Verify migration: `InvitedUser` table dropped, enums dropped,
trigger restored
- [x] Verify backfill: no orphaned auth users, no users without Profile
- [x] Verify existing users can still log in (email + OAuth)
- [x] Verify CoPilot chat page loads with default prompts
- [ ] Verify new user signup creates `User` + `Profile` via the restored
trigger
- [ ] Verify admin `/admin/users` page loads without crashing
- [ ] Run backend tests: `poetry run test`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-19 17:15:30 +00:00
Otto
0ce1c90b55 fix(frontend): rename "CoPilot" to "AutoPilot" on credits page (#12481)
Requested by @kcze

Renames "CoPilot" → "AutoPilot" on the credits/usage limits page:

- **Heading:** "CoPilot Usage Limits" → "AutoPilot Usage Limits"
- **Button:** "Open CoPilot" → "Open AutoPilot"
- Comment updated to match

---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>

Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
2026-03-19 15:25:21 +00:00
Ubbe
d4c6eb9adc fix(frontend): collapse navbar text to icons below 1280px (#12484)
## Summary

<img width="400" height="339" alt="Screenshot 2026-03-19 at 22 53 23"
src="https://github.com/user-attachments/assets/2fa76b8f-424d-4764-90ac-b7a331f5f610"
/>

<img width="600" height="595" alt="Screenshot 2026-03-19 at 22 53 31"
src="https://github.com/user-attachments/assets/23f51cc7-b01e-4d83-97ba-2c43683877db"
/>

<img width="800" height="523" alt="Screenshot 2026-03-19 at 22 53 36"
src="https://github.com/user-attachments/assets/1e447b9a-1cca-428c-bccd-1730f1670b8e"
/>

Now that we have the `Give feedback` button on the Navigation bar,
collpase some of the links below `1280px` so there is more space and
they don't collide with each other...

- Collapse navbar link text to icon-only below 1280px (`xl` breakpoint)
to prevent crowding
- Wallet button shows only the wallet icon below 1280px instead of "Earn
credits" text
- Feedback button shows only the chat icon below 1280px instead of "Give
Feedback" text
- Added `whitespace-nowrap` to feedback button to prevent wrapping

## Changes
- `NavbarLink.tsx`: `lg:block` → `xl:block` for link text
- `Wallet.tsx`: `md:hidden`/`md:inline-block` →
`xl:hidden`/`xl:inline-block`
- `FeedbackButton.tsx`: wrap text in `hidden xl:inline` span, add
`whitespace-nowrap`

## Test plan
- [ ] Resize browser between 1024px–1280px and verify navbar shows only
icons
- [ ] At 1280px+ verify full text labels appear for links, wallet, and
feedback
- [ ] Verify mobile navbar still works correctly below `md` breakpoint

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 15:10:27 +00:00
Ubbe
1bb91b53b7 fix(frontend/marketplace): comprehensive marketplace UI redesign (#12462)
## Summary

<img width="600" height="964" alt="Screenshot_2026-03-19_at_00 07 52"
src="https://github.com/user-attachments/assets/95c0430a-26a3-499b-8f6a-25b9715d3012"
/>
<img width="600" height="968" alt="Screenshot_2026-03-19_at_00 08 01"
src="https://github.com/user-attachments/assets/d440c3b0-c247-4f13-bf82-a51ff2e50902"
/>
<img width="600" height="939" alt="Screenshot_2026-03-19_at_00 08 14"
src="https://github.com/user-attachments/assets/f19be759-e102-4a95-9474-64f18bce60cf"
/>"
<img width="600" height="953" alt="Screenshot_2026-03-19_at_00 08 24"
src="https://github.com/user-attachments/assets/ba4fa644-3958-45e2-89e9-a6a4448c63c5"
/>



- Re-style and re-skin the Marketplace pages to look more "professional"
...
- Move the `Give feedback` button to the header

## Test plan
- [x] Verify marketplace page search bar matches Form text field styling
- [x] Verify agent cards have padding and subtle border
- [x] Verify hover/focus states work correctly
- [x] Check responsive behavior at different breakpoints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 22:28:01 +08:00
Bentlybro
64a011664a fix(schema): address Majdyz review feedback
- Add FK constraints on LlmModelMigration (sourceModelSlug, targetModelSlug → LlmModel.slug)
- Remove unused @@index([credentialProvider]) on LlmModelCost
- Remove redundant @@index([isReverted]) on LlmModelMigration (covered by composite)
- Add documentation for credentialProvider field explaining its purpose
- Add reverse relation fields to LlmModel (SourceMigrations, TargetMigrations)

Fixes data integrity: typos in migration slugs now caught at DB level.
2026-03-19 11:01:09 +00:00
Bentlybro
1db7c048d9 fix: isort import order 2026-03-19 11:01:09 +00:00
Bentlybro
4c5627c966 fix: use execute_raw_with_schema for proper multi-schema support
Per Sentry feedback: db.execute_raw ignores connection string's ?schema=
parameter and defaults to 'public' schema. This breaks in multi-schema setups.

Changes:
- Import execute_raw_with_schema from .db
- Use {schema_prefix} placeholder in query
- Call execute_raw_with_schema instead of db.execute_raw

This matches the pattern used in fix_llm_provider_credentials and other
schema-aware migrations. Works in both CI (public schema) and local
(platform schema from connection string).
2026-03-19 11:01:09 +00:00
Bentlybro
d97d137a51 fix: remove hardcoded schema prefix from migrate_llm_models query
The raw SQL query in migrate_llm_models() hardcoded platform."AgentNode"
which fails in CI where tables are in 'public' schema (not 'platform').

This code exists in dev but only runs when LLM registry has data. With our
new schema, the migration tries to run at startup and fails in CI.

Changed: UPDATE platform."AgentNode" -> UPDATE "AgentNode"

Matches pattern of all other migrations - let connection string's default
schema handle routing.
2026-03-19 11:01:09 +00:00
Bentlybro
ded9e293ff fix: remove CREATE SCHEMA to match CI environment
CI uses schema "public" as default (not "platform"), so creating
a platform schema then tables without prefix puts tables in public
but Prisma looks in platform.

Existing migrations don't create schema - they rely on connection
string's default. Remove CREATE SCHEMA IF NOT EXISTS to match.
2026-03-19 11:01:09 +00:00
Bentlybro
83d504bed2 fix: remove schema prefix from migration SQL to match existing pattern
CI failing with 'relation "platform.AgentNode" does not exist' because
Prisma generates queries differently when tables are created with
explicit schema prefixes.

Existing AutoGPT migrations use:
  CREATE TABLE "AgentNode" (...)

Not:
  CREATE TABLE "platform"."AgentNode" (...)

The connection string's ?schema=platform handles schema selection,
so explicit prefixes aren't needed and cause compatibility issues.

Changes:
- Remove all "platform". prefixes from:
  * CREATE TYPE statements
  * CREATE TABLE statements
  * CREATE INDEX statements
  * ALTER TABLE statements
  * REFERENCES clauses in foreign keys

Now matches existing migration pattern exactly.
2026-03-19 11:01:09 +00:00
Bentlybro
a5f1ffb35b fix: add partial unique indexes for data integrity
Per CodeRabbit feedback - fix 2 actual bugs:

1. Prevent multiple active migrations per source model
   - Add partial unique index: UNIQUE (sourceModelSlug) WHERE isReverted = false
   - Prevents ambiguous routing when resolving migrations

2. Allow both default and credential-specific costs
   - Remove @@unique([llmModelId, credentialProvider, unit])
   - Add 2 partial unique indexes:
     * UNIQUE (llmModelId, provider, unit) WHERE credentialId IS NULL (defaults)
     * UNIQUE (llmModelId, provider, credentialId, unit) WHERE credentialId IS NOT NULL (overrides)
   - Enables provider-level default costs + per-credential overrides

Schema comments document that these constraints exist in migration SQL.
2026-03-19 11:01:09 +00:00
Bentlybro
97c6516a14 fix: remove multiSchema - follow existing AutoGPT pattern
Remove unnecessary multiSchema configuration that broke existing models.

AutoGPT uses connection string's ?schema=platform parameter as default,
not Prisma's multiSchema feature. Existing models (User, AgentGraph, etc.)
have no @@schema() directives and work fine.

Changes:
- Remove schemas = ["platform", "public"] from datasource
- Remove "multiSchema" from previewFeatures
- Remove all @@schema() directives from LLM models and enum

Migration SQL already creates tables in platform schema explicitly
(CREATE TABLE "platform"."LlmProvider" etc.) which is correct.

This matches the existing pattern used throughout the codebase.
2026-03-19 11:01:09 +00:00
Bentlybro
876dde8bc7 fix: address CodeRabbit design feedback
Per CodeRabbit review:

1. **Safety: Change capability defaults false → safer for partial seeding**
   - supportsTools: true → false
   - supportsJsonOutput: true → false
   - Prevents partially-seeded rows from being assumed capable

2. **Clarity: Rename supportsParallelTool → supportsParallelToolCalls**
   - More explicit about what the field represents

3. **Performance: Remove redundant indexes**
   - Drop @@index([llmModelId]) - covered by unique constraint
   - Drop @@index([sourceModelSlug]) - covered by composite index
   - Reduces write overhead and storage

4. **Documentation: Acknowledge customCreditCost limitation**
   - It's unit-agnostic (doesn't distinguish RUN vs TOKENS)
   - Noted as TODO for follow-up PR with proper unit-aware override

Schema + migration both updated to match.
2026-03-19 11:01:09 +00:00
Bentlybro
0bfdd74b25 fix: add @@schema("platform") to LlmCostUnit enum
Sentry caught this - enums also need @@schema directive with multiSchema enabled.
Without it, Prisma looks for enum in public schema but it's created in platform.
2026-03-19 11:01:09 +00:00
Bentlybro
a7d2f81b18 feat: add database CHECK constraints for data integrity
Per CodeRabbit feedback - enforce numeric domain rules at DB level:

Migration:
- priceTier: CHECK (priceTier BETWEEN 1 AND 3)
- creditCost: CHECK (creditCost >= 0)
- nodeCount: CHECK (nodeCount >= 0)
- customCreditCost: CHECK (customCreditCost IS NULL OR customCreditCost >= 0)

Schema comments:
- Document constraints inline for developer visibility

Prevents invalid data (negative costs, out-of-range tiers) from
entering the database, matching backend/blocks/llm.py contract.
2026-03-19 11:01:09 +00:00
Bentlybro
3699eaa556 fix: use @@schema() instead of @@map() for platform schema + create schema in migration
Critical fixes from PR review:

1. Replace @@map("platform.ModelName") with @@schema("platform")
   - Sentry correctly identified: Prisma was looking for literal table "platform.LlmProvider" with dot
   - Proper syntax: enable multiSchema feature + use @@schema directive

2. Create platform schema in migration
   - CI failed: schema "platform" does not exist
   - Add CREATE SCHEMA IF NOT EXISTS at start of migration

Schema changes:
- datasource: add schemas = ["platform", "public"]
- generator: add "multiSchema" to previewFeatures
- All 5 models: @@map() → @@schema("platform")

Migration changes:
- Add CREATE SCHEMA IF NOT EXISTS "platform" before enum creation

Fixes CI failure and Sentry-identified bug.
2026-03-19 11:01:09 +00:00
Bentlybro
21adf9e0fb feat(platform): Add LLM registry database schema
Add Prisma schema and migration for dynamic LLM model registry:

Schema additions:
- LlmProvider: Registry of LLM providers (OpenAI, Anthropic, etc.)
- LlmModel: Individual models with capabilities and metadata
- LlmModelCost: Per-model pricing configuration
- LlmModelCreator: Model creators/trainers (OpenAI, Meta, etc.)
- LlmModelMigration: Track model migrations and reverts
- LlmCostUnit enum: RUN vs TOKENS pricing units

Key features:
- Model-specific capabilities (tools, JSON, reasoning, parallel calls)
- Flexible creator/provider separation (e.g., Meta model via Hugging Face)
- Migration tracking with custom pricing overrides
- Indexes for performance on common queries

Part 1 of incremental LLM registry implementation.
Refs: Draft PR #11699
2026-03-19 11:01:08 +00:00
Ubbe
a5f9c43a41 feat(platform): replace suggestion pills with themed prompt categories (#12452)
## Summary



https://github.com/user-attachments/assets/13da6d36-5f35-429b-a6cf-e18316bb8709



Replaces the flat list of suggestion pills in the CoPilot empty session
with themed prompt categories (Learn, Create, Automate, Organize), each
shown as a popover with contextual prompts.

- **Backend**: Changes `suggested_prompts` from a flat `list[str]` to a
themed `dict[str, list[str]]` keyed by category. Updates Tally
extraction LLM prompt to generate prompts per theme, and the
`/suggested-prompts` API to return grouped themes. Legacy `list[str]`
rows are preserved under a `"General"` key for backward compatibility.
- **Frontend**: Replaces inline pill buttons with a `SuggestionThemes`
popover component. Each theme button (with icon) opens a dropdown of 5
relevant prompts. Falls back to hardcoded defaults when the API has no
personalized prompts. Normalizes partial API responses by padding
missing themes with defaults. Legacy `"General"` prompts are distributed
round-robin across themes so existing users keep their personalized
suggestions.

### Changes 🏗️

- `backend/data/understanding.py`: `suggested_prompts` field changed
from `list[str]` to `dict[str, list[str]]`; legacy list rows preserved
under `"General"` key; list items validated as strings
- `backend/data/tally.py`: LLM prompt updated to generate themed
prompts; validation now per-theme with blank-string rejection
- `backend/api/features/chat/routes.py`: New `SuggestedTheme` model;
endpoint returns `themes[]`
- `frontend/copilot/components/EmptySession/EmptySession.tsx`: Uses
generated API types directly (no cast)
- `frontend/copilot/components/EmptySession/helpers.ts`:
`DEFAULT_THEMES` replaces `DEFAULT_QUICK_ACTIONS`; `getSuggestionThemes`
normalizes partial API responses and distributes legacy `"General"`
prompts across themes
-
`frontend/copilot/components/EmptySession/components/SuggestionThemes/`:
New popover component with theme icons and loading states

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verify themed suggestion buttons render on CoPilot empty session
  - [x] Click each theme button and confirm popover opens with prompts
  - [x] Click a prompt and confirm it sends the message
- [x] Verify fallback to default themes when API returns no custom
prompts
- [x] Verify legacy users' personalized prompts are preserved and
visible


🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 18:46:12 +08:00
Otto
1240f38f75 feat(backend): migrate OpenAI provider to Responses API (#12099)
## Summary

Migrates the OpenAI provider in the LLM block from
`chat.completions.create` to `responses.create` — OpenAI's newer,
unified API. Also removes the obsolete GPT-3.5-turbo model.

Resolves #11624
Linear:
[OPEN-2911](https://linear.app/autogpt/issue/OPEN-2911/update-openai-calls-to-use-responsescreate)

## Changes

- **`backend/blocks/llm.py`** — OpenAI provider now uses
`responses.create` exclusively. Removed GPT-3.5-turbo enum + metadata.
- **`backend/util/openai_responses.py`** *(new)* — Helpers for the
Responses API: tool format conversion, content/reasoning/usage/tool-call
extraction.
- **`backend/util/openai_responses_test.py`** *(new)* — Unit tests for
all helper functions.
- **`backend/data/block_cost_config.py`** — Removed GPT-3.5 cost entry.
- **`docs/integrations/block-integrations/llm.md`** — Regenerated block
docs.

## Key API differences handled

| Aspect | Chat Completions | Responses API |
|--------|-----------------|---------------|
| Messages param | `messages` | `input` |
| Max tokens param | `max_completion_tokens` | `max_output_tokens` |
| Usage fields | `prompt_tokens` / `completion_tokens` | `input_tokens`
/ `output_tokens` |
| Tool format | Nested under `function` key | Flat structure |

## Test plan

- [x] Unit tests for all `openai_responses.py` helpers
- [x] Existing LLM block tests updated for Responses API mocks
- [x] Regular OpenAI models work
- [x] Reasoning OpenAI models work
- [x] Non-OpenAI models work

---------

Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 09:19:31 +00:00
Zamil Majdy
f617f50f0b dx(skills): improve pr-address skill — full thread context + PR description backtick fix (#12480)
## Summary

Improves the `pr-address` skill with two fixes:

- **Full comment thread loading**: Adds `--paginate` to the inline
comments fetch and explicit instructions to reconstruct threads using
`in_reply_to_id`, reading root-to-last-reply before acting. Previously,
only the opening comment was visible — missing reviewer replies led to
wrong fixes.
- **Backtick-safe PR descriptions**: Adds instructions to write the PR
body to a temp file via `<<'PREOF'` heredoc before passing to `gh pr
edit/create`. Inlining the body directly causes backticks to be
shell-escaped, breaking markdown rendering.

## Test plan
- [ ] Run `/pr-address` on a PR with multi-reply inline comment threads
— verify the last reply is what gets acted on
- [ ] Update a PR description containing backticks — verify they render
correctly in GitHub
2026-03-19 15:11:14 +07:00
Otto
943a1df815 dx(backend): Make Builder and Marketplace search work without embeddings (#12479)
When OpenAI credentials are unavailable (fork PRs, dev envs without API
keys), both builder block search and store agent functionality break:

1. **Block search returns wrong results.** `unified_hybrid_search` falls
back to a zero vector when embedding generation fails. With ~200 blocks
in `UnifiedContentEmbedding`, the zero-vector semantic scores are
garbage, and lexical matching on short block names is too weak — "Store
Value" doesn't appear in the top results for query "Store Value".

2. **Store submission approval fails entirely.**
`review_store_submission` calls `ensure_embedding()` inside a
transaction. When it throws, the entire transaction rolls back — no
store submissions get approved, the `StoreAgent` materialized view stays
empty, and all marketplace e2e tests fail.

3. **Store search returns nothing.** Even when store data exists,
`hybrid_search` queries `UnifiedContentEmbedding` which has no store
agent rows (backfill failed). It succeeds with zero results rather than
throwing, so the existing exception-based fallback never triggers.

### Changes 🏗️

- Replace `unified_hybrid_search` with in-memory text search in
`_hybrid_search_blocks` (-> `_text_search_blocks`). All ~200 blocks are
already loaded in memory, and `_score_primary_fields` provides correct
deterministic text relevance scoring against block name, description,
and input schema field descriptions — the same rich text the embedding
pipeline uses. CamelCase block names are split via `split_camelcase()`
to match the tokenization from PR #12400.

- Make embedding generation in `review_store_submission` best-effort:
catch failures and log a warning instead of rolling back the approval
transaction. The backfill scheduler retries later when credentials
become available.

- Fall through to direct DB search when `hybrid_search` returns empty
results (not just when it throws). The fallback uses ad-hoc
`to_tsvector`/`plainto_tsquery` with `ts_rank_cd` ranking on
`StoreAgent` view fields, restoring the search quality of the original
pre-hybrid implementation (stemming, stop-word removal, relevance
ranking).

- Fix Playwright artifact upload in end-to-end test CI

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `build.spec.ts`: 8/8 pass locally (was 0/7 before fix)
  - [x] All 79 e2e tests pass in CI (was 15 failures before fix)

---
Co-authored-by: Reinier van der Leer (@Pwuts)

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 00:11:06 +00:00
Otto
593001e0c8 fix(frontend): Remove dead Tutorial button from TallyPopup (#12474)
After the legacy builder was removed in #12082, the TallyPopup component
still showed a "Tutorial" button (bottom-right, next to "Give Feedback")
that navigated to `/build?resetTutorial=true`. Nothing handles that
param anymore, so clicking it did nothing.

This removes the dead button and its associated state/handler from
TallyPopup and useTallyPopup. The working tutorial (Shepherd.js
chalkboard icon in CustomControls) is unaffected.

**Changes:**
- `TallyPopup.tsx`: Remove Tutorial button JSX, unused imports
(`usePathname`, `useSearchParams`), and `isNewBuilder` check
- `useTallyPopup.ts`: Remove `showTutorial` state, `handleResetTutorial`
handler, unused `useRouter` import

Resolves SECRT-2109

---
Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>

Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>
2026-03-19 00:09:46 +00:00
Ubbe
e1db8234a3 fix(frontend/copilot): constrain markdown heading sizes in user chat messages (#12463)
### Before

<img width="600" height="489" alt="Screenshot 2026-03-18 at 19 24 41"
src="https://github.com/user-attachments/assets/bb8dc0fa-04cd-4f32-8125-2d7930b4acde"
/>

Formatted headings in user messages would look massive

### After

<img width="600" height="549" alt="Screenshot 2026-03-18 at 19 24 33"
src="https://github.com/user-attachments/assets/51230232-c914-42dd-821f-3b067b80bab4"
/>

Markdown headings (`# H1` through `###### H6`) and setext-style headings
(`====`) in user chat messages rendered at their full HTML heading size,
which looked disproportionately large in the chat bubble context.

### Changes 🏗️

- Added Tailwind CSS overrides on the user message `MessageContent`
wrapper to cap all heading elements (h1-h6) at `text-lg font-semibold`
- Only affects user messages in copilot chat (via `group-[.is-user]`
selector); assistant messages are unchanged

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [ ] Send a user message containing `# Heading 1` through `######
Heading 6` and verify they all render at constrained size
- [ ] Send a message with `====` separator pattern and verify it doesn't
render as a mega H1
  - [ ] Verify assistant messages with headings still render normally

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 00:33:09 +08:00
Zamil Majdy
282173be9d feat(copilot): GitHub CLI support — inject GH_TOKEN and connect_integration tool (#12426)
## Summary

- When a user has connected GitHub, `GH_TOKEN` is automatically injected
into the Claude Agent SDK subprocess environment so `gh` CLI commands
work without any manual auth step
- When GitHub is **not** connected, the copilot can call a new
`connect_integration(provider="github")` MCP tool, which surfaces the
same credential setup card used by regular GitHub blocks — the user
connects inline without leaving the chat
- After connecting, the copilot is instructed to retry the operation
automatically

## Changes

**Backend**
- `sdk/service.py`: `_get_github_token_for_user()` fetches OAuth2 or API
key credentials and injects `GH_TOKEN` + `GITHUB_TOKEN` into `sdk_env`
before the SDK subprocess starts (per-request, thread-safe via
`ClaudeAgentOptions.env`)
- `tools/connect_integration.py`: new `ConnectIntegrationTool` MCP tool
— returns `SetupRequirementsResponse` for a given provider (`github` for
now); extensible via `_PROVIDER_INFO` dict
- `tools/__init__.py`: registers `connect_integration` in
`TOOL_REGISTRY`
- `prompting.py`: adds GitHub CLI / `connect_integration` guidance to
`_SHARED_TOOL_NOTES`

**Frontend**
- `ConnectIntegrationTool/ConnectIntegrationTool.tsx`: thin wrapper
around the existing `SetupRequirementsCard` with a tailored retry
instruction
- `MessagePartRenderer.tsx`: dispatches `tool-connect_integration` to
the new component

## Test plan

- [ ] User with GitHub credentials: `gh pr list` works without any auth
step in copilot
- [ ] User without GitHub credentials: copilot calls
`connect_integration`, card renders with GitHub credential input, after
connecting copilot retries and `gh` works
- [ ] `GH_TOKEN` is NOT leaked across users (injected via
`ClaudeAgentOptions.env`, not `os.environ`)
- [ ] `connect_integration` with unknown provider returns a graceful
error message
2026-03-18 11:52:42 +00:00
Zamil Majdy
5d9a169e04 feat(blocks): add AutoPilotBlock for invoking AutoPilot from graphs (#12439)
## Summary
- Adds `AutogptCopilotBlock` that invokes the platform's copilot system
(`stream_chat_completion_sdk`) directly from graph executions
- Enables sub-agent patterns: copilot can call this block recursively
(with depth limiting via `contextvars`)
- Enables scheduled copilot execution through the agent executor system
- No user credentials needed — uses server-side copilot config

## Inputs/Outputs
**Inputs:** prompt, system_context, session_id (continuation), timeout,
max_recursion_depth
**Outputs:** response text, tool_calls list, conversation_history JSON,
session_id, token_usage

## Test plan
- [x] Block test passes (`test_available_blocks[AutogptCopilotBlock]`)
- [x] Pre-commit hooks pass (format, lint, typecheck)
- [ ] Manual test: add block to graph, send prompt, verify response
- [ ] Manual test: chain two copilot blocks with session_id to verify
continuation
2026-03-18 11:22:25 +00:00
Ubbe
6fd1050457 fix(backend): arch-conditional chromium in Docker for ARM64 compatibility (#12466)
## Summary
- On **amd64**: keep `agent-browser install` (Chrome for Testing —
pinned version tested with Playwright) + restore runtime libs
- On **arm64**: install system `chromium` package (Chrome for Testing
has no ARM64 binary) + skip `agent-browser install`
- An entrypoint script sets
`AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium` at container startup
on arm64 (detected via presence of `/usr/bin/chromium`); on amd64 the
var is left unset so agent-browser uses Chrome for Testing as before

**Why not system chromium on amd64?** `agent-browser install` downloads
a specific Chrome for Testing version pinned to the Playwright version
in use. Using whatever Debian ships on amd64 could cause protocol
compatibility issues.

Introduced by #12301 (cc @Significant-Gravitas/zamil-majdy)

## Test plan
- [ ] `docker compose up --build` succeeds on ARM64 (Apple Silicon)
- [ ] `docker compose up --build` succeeds on x86_64
- [ ] Copilot browser tools (`browser_navigate`, `browser_act`,
`browser_screenshot`) work in a Copilot session on both architectures

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-18 19:08:14 +08:00
Otto
02708bcd00 fix(platform): pre-check invite eligibility before Supabase signup (#12451)
Requested by @Swiftyos

The invite gate check in `get_or_activate_user()` runs after Supabase
creates the auth user, resulting in orphaned auth accounts with no
platform access when a non-invited user signs up. Users could create a
Supabase account but had no `User`, `Profile`, or `Onboarding` records —
they could log in but access nothing.

### Changes 🏗️

**Backend** (`v1.py`, `invited_user.py`):
- Add public `POST /api/auth/check-invite` endpoint (no auth required —
this is a pre-signup check)
- Add `check_invite_eligibility()` helper in the data layer
- Returns `{allowed: true}` when `enable_invite_gate` is disabled
- Extracted `is_internal_email()` helper to deduplicate `@agpt.co`
bypass logic (was duplicated between route and `get_or_activate_user`)
- Checks `InvitedUser` table for `INVITED` status
- Added IP-based Redis rate limiting (10 req/60 s per IP, fails open if
Redis unavailable, returns HTTP 429 when exceeded)
- Fixed Redis pipeline atomicity: `incr` + `expire` now sent in a single
pipeline round-trip, preventing a TTL-less key if `expire` had
previously failed after `incr`
- Fixed incorrect `await` on `pipe.incr()` / `pipe.expire()` — redis-py
async pipeline queue methods are synchronous; only `execute()` is
awaitable. The erroneous `await` was silently swallowed by the `except`
block, making the rate limiter completely non-functional

**Frontend** (`signup/actions.ts`):
- Call the generated `postV1CheckIfAnEmailIsAllowedToSignUp` client
(replacing raw `fetch`) before `supabase.auth.signUp()`
- `ApiError` (non-OK HTTP responses) logs a Sentry warning with the HTTP
status; network/other errors capture a Sentry exception
- If not allowed, return `not_allowed` error (existing
`EmailNotAllowedModal` handles this)
- Graceful fallback: if the pre-check fails (backend unreachable), falls
through to the existing flow — `get_or_activate_user()` remains as
defense-in-depth

**Tests** (`v1_test.py`, `invited_user_test.py`):
- 5 route-level tests covering: gate disabled → allowed, `@agpt.co`
bypass, eligible email, ineligible email, rate-limit exceeded
- Rate-limit test mock updated to use pipeline interface
(`pipeline().execute()` returns `[count, True]`)
- Existing `invited_user_test.py` updated to cover
`check_invite_eligibility` branches

**Not changed:**
- Google OAuth flow — already gated by OAuth provider settings
- `get_or_activate_user()` — stays as backend safety net
- All admin invite CRUD routes — unchanged

### Test plan
1. Email/password signup with invited email → signup proceeds normally
2. Email/password signup with non-invited email → `EmailNotAllowedModal`
shown, no Supabase user created
3. `enable_invite_gate=false` → all emails allowed
4. Backend unreachable during pre-check → falls through to existing flow
5. Same IP exceeds 10 requests/60 s → HTTP 429 returned

---
Co-authored-by: Craig Swift (@Swiftyos) <craigswift13@gmail.com>

---------

Co-authored-by: Craig Swift (@Swiftyos) <craigswift13@gmail.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-18 10:36:50 +00:00
Zamil Majdy
156d61fe5c dx(skills): add merge conflict detection and resolution to pr-address (#12469)
## Summary
- Adds merge conflict detection as step 2 of the polling loop (between
CI check and comment check), including handling of the transient
`"UNKNOWN"` state
- Adds a "Resolving merge conflicts" section with step-by-step
instructions using 3-way merge (no force push needed since PRs are
squash-merged)
- Validates all three git conflict markers before staging to prevent
committing broken code
- Fixes `args` → `argument-hint` in skill frontmatter

## Test plan
- [ ] Verify skill renders correctly in Claude Code
2026-03-18 17:46:32 +07:00
Zamil Majdy
5a29de0e0e fix(platform): try-compact-retry for prompt-too-long errors in CoPilot SDK (#12413)
## Summary

When the Claude SDK returns a prompt-too-long error (e.g. transcript +
query exceeds the model's context window), the streaming loop now
retries with escalating fallbacks instead of failing immediately:

1. **Attempt 1**: Use the transcript as-is (normal path)
2. **Attempt 2**: Compact the transcript via LLM summarization
(`compact_transcript`) and retry
3. **Attempt 3**: Drop the transcript entirely and fall back to
DB-reconstructed context (`_build_query_message`)

If all 3 attempts fail, a `StreamError(code="prompt_too_long")` is
yielded to the frontend.

### Key changes

**`service.py`**
- Add `_is_prompt_too_long(err)` — pattern-matches SDK exceptions for
prompt-length errors (`prompt is too long`, `prompt_too_long`,
`context_length_exceeded`, `request too large`)
- Wrap `async with ClaudeSDKClient` in a 3-attempt retry `for` loop with
compaction/fallback logic
- Move `current_message`, `_build_query_message`, and
`_prepare_file_attachments` before the retry loop (computed once,
reused)
- Skip transcript upload in `finally` when `transcript_caused_error`
(avoids persisting a broken/empty transcript)
- Reset `stream_completed` between retry iterations
- Document outer-scope variable contract in `_run_stream_attempt`
closure (which variables are reassigned between retries vs read-only)

**`transcript.py`**
- Add `compact_transcript(content, log_prefix, model)` — converts JSONL
→ messages → `compress_context` (LLM summarization with truncation
fallback) → JSONL
- Add helpers: `_flatten_assistant_content`,
`_flatten_tool_result_content`, `_transcript_to_messages`,
`_messages_to_transcript`, `_run_compression`
- Returns `None` when compaction fails or transcript is already within
budget (signals caller to fall through to DB fallback)
- Truncation fallback wrapped in 30s timeout to prevent unbounded CPU
time on large transcripts
- Accepts `model` parameter to avoid creating a new `ChatConfig()` on
every call

**`util/prompt.py`**
- Fix `_truncate_middle_tokens` edge case: returns empty string when
`max_tok < 1`, properly handles `max_tok < 3`

**`config.py`**
- E2B sandbox timeout raised from 5 min to 15 min to accommodate
compaction retries

**`prompt_too_long_test.py`** (new, 45 tests)
- `_is_prompt_too_long` positive/negative patterns, case sensitivity,
BaseException handling
- Flatten helpers for assistant/tool_result content blocks
- `_transcript_to_messages` / `_messages_to_transcript` roundtrip,
strippable types, empty content
- `compact_transcript` async tests: too few messages, not compacted,
successful compaction, compression failure

**`retry_scenarios_test.py`** (new, 27 tests)
- Full retry state machine simulation covering all 8 scenarios:
  1. Normal flow (no retry)
  2. Compact succeeds → retry succeeds
  3. Compact fails → DB fallback succeeds
  4. No transcript → DB fallback succeeds
  5. Double fail → DB fallback on attempt 3
  6. All 3 attempts exhausted
  7. Non-prompt-too-long error (no retry)
  8. Compaction returns identical content → DB fallback
- Edge cases: nested exceptions, case insensitivity, unicode content,
large transcripts, resume-after-compaction flow

**Shared test fixtures** (`conftest.py`)
- Extracted `build_test_transcript` helper used across 3 test files to
eliminate duplication

## Test plan

- [x] `_is_prompt_too_long` correctly identifies prompt-length errors (8
positive, 5 negative patterns)
- [x] `compact_transcript` compacts oversized transcripts via LLM
summarization
- [x] `compact_transcript` returns `None` on failure or when already
within budget
- [x] Retry loop state machine: all 8 scenarios verified with state
assertions
- [x] `TranscriptBuilder` works correctly after loading compacted
transcripts
- [x] `_messages_to_transcript` roundtrip preserves content including
unicode
- [x] `transcript_caused_error` prevents stale transcript upload
- [x] Truncation timeout prevents unbounded CPU time
- [x] All 139 unit tests pass locally
- [x] CI green (tests 3.11/3.12/3.13, types, CodeQL, linting)
2026-03-18 10:27:31 +00:00
Otto
e657472162 feat(blocks): Add Nano Banana 2 to image generator, customizer, and editor blocks (#12218)
Requested by @Torantulino

Add `google/nano-banana-2` (Gemini 3.1 Flash Image) support across all
three image blocks.

### Changes

**`ai_image_customizer.py`**
- Add `NANO_BANANA_2 = "google/nano-banana-2"` to `GeminiImageModel`
enum
- Update block description to reference Nano-Banana models generically

**`ai_image_generator_block.py`**
- Add `NANO_BANANA_2` to `ImageGenModel` enum
- Add generation branch (identical to NBP except model name)

**`flux_kontext.py` (AI Image Editor)**
- Rename `FluxKontextModelName` → `ImageEditorModel` (with
backwards-compatible alias)
- Add `NANO_BANANA_PRO` and `NANO_BANANA_2` to the editor
- Model-aware branching in `run_model()`: NB models use `image_input`
list (not `input_image`), no `seed`, and add `output_format`

**`block_cost_config.py`**
- Add NB2 cost entries for all three blocks (14 credits, matching NBP)
- Add NB Pro cost entry for editor block
- Update editor block refs from `.PRO`/`.MAX` to
`.FLUX_KONTEXT_PRO`/`.FLUX_KONTEXT_MAX`

Resolves SECRT-2047

---------

Co-authored-by: Torantulino <Torantulino@users.noreply.github.com>
Co-authored-by: Abhimanyu Yadav <122007096+Abhi1992002@users.noreply.github.com>
2026-03-18 09:42:18 +00:00
DEEVEN SERU
4d00e0f179 fix(blocks): allow falsy entries in AddToListBlock (#12028)
## Summary
- treat AddToListBlock.entry as optional rather than truthy so
0/""/False are appended
- extend block self-tests with a falsy entry case

## Testing
- Not run (pytest not available in environment)

Co-authored-by: DEEVEN SERU <144827577+DEVELOPER-DEEVEN@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-03-18 09:42:14 +00:00
DEEVEN SERU
1d7282b5f3 fix(backend): Truncate filenames with excessively long 'extensions' (#12025)
Fixes issue where filenames with no dots until the end (or massive
extensions) bypassed truncation logic, causing OSError [Errno 36].
Limits extension preservation to 20 chars.

---------

Co-authored-by: DEVELOPER-DEEVEN <144827577+DEVELOPER-DEEVEN@users.noreply.github.com>
2026-03-18 09:42:06 +00:00
Reinier van der Leer
e3591fcaa3 ci(backend): Python version specific type checking (#12453)
- Resolves #10657
- Partially based on #10913

### Changes 🏗️

- Run Pyright separately for each supported Python version
  - Move type checking and linting into separate jobs
    - Add `--skip-pyright` option to lint script
- Move `linter.py` into `backend/scripts`
  - Move other scripts in `backend/` too for consistency

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - CI

---

Co-authored-by: @Joaco2603 <jpappa2603@gmail.com>

---------

Co-authored-by: Joaco2603 <jpappa2603@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 09:41:35 +00:00
Reinier van der Leer
876dc32e17 chore(backend): Update poetry to v2.2.1 (#12459)
Poetry v2.2.1 has bugfixes that are relevant in context of our
`.pre-commit-config.yaml`

### Changes 🏗️

- Update `poetry` from v2.1.1 to v2.2.1 (latest version supported by
Dependabot)
- Re-generate `poetry.lock`

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - CI
2026-03-18 09:41:28 +00:00
Reinier van der Leer
616e29f5e4 fix tests for 6d0e206 2026-03-18 10:39:51 +01:00
Zamil Majdy
280a98ad38 dx(skills): poll for new PR comments while waiting for CI (#12461)
## Summary
- Updates the `pr-address` skill to poll for new PR comments while
waiting for CI, instead of blocking solely on `gh pr checks --watch
--fail-fast`
- Runs CI watch in the background and polls all 3 comment endpoints
every 30s
- Allows bot comments (coderabbitai, sentry) to be addressed in parallel
with CI rather than sequentially

## Test plan
- [ ] Run `/pr-address` on a PR with pending CI and verify it detects
new comments while CI is running
- [ ] Verify CI failures are still handled correctly after the combined
wait
2026-03-18 15:07:13 +07:00
Reinier van der Leer
c7f2a7dd03 fix formatting 2026-03-17 20:30:33 +01:00
Otto
6d0e2063ec Merge commit from fork
* fix(backend): add resource limits to Jinja2 template rendering

Prevent DoS via computational exhaustion in FillTextTemplateBlock by:

- Subclassing SandboxedEnvironment to intercept ** and * operators
  with caps on exponent size (1000) and string repeat length (10K)
- Replacing range() global with a capped version (max 10K items)
- Wrapping template.render() in a ThreadPoolExecutor with a 10s
  timeout to kill runaway expressions

Addresses GHSA-ppw9-h7rv-gwq9 (CWE-400).

* address review: move helpers after TextFormatter, drop ThreadPoolExecutor

- Move _safe_range and _RestrictedEnvironment below TextFormatter
  (helpers after the function that uses them)
- Remove ThreadPoolExecutor timeout wrapper from format_string() —
  it has problematic behavior in async contexts and the static
  interception (operator caps, range limit) already covers the
  known attack vectors

* address review: extend sequence guard, harden format_email, add tests

- Extend * guard to cover list and tuple repetition, not just strings
  (blocks {{ [0] * 999999999 }} and {{ (0,) * 999999999 }})
- Rename MAX_STRING_REPEAT → MAX_SEQUENCE_REPEAT
- Use _RestrictedEnvironment in format_email (defense-in-depth)
- Add tests: list repeat, tuple repeat, negative exponent, nested
  exponentiation (18 tests total)

* add async timeout wrapper at block level

Wrap format_string calls in FillTextTemplateBlock and AgentOutputBlock
with asyncio.wait_for(asyncio.to_thread(...), timeout=10s).

This provides defense-in-depth: if an expression somehow bypasses the
static operator checks, the async timeout will cancel it. Uses
asyncio.to_thread for proper async integration (no event loop blocking)
and asyncio.wait_for for real cancellation on timeout.

* make format_string async with timeout kwarg

Move asyncio.wait_for + asyncio.to_thread into format_string() itself
with a timeout kwarg (default 10s). This way all callers get the
timeout automatically — no wrapper needed at each call site.

- format_string() is now async, callers use await
- format_email() is now async (calls format_string internally)
- Updated all callers: text.py, io.py, llm.py, smart_decision_maker.py,
  email.py, notifications.py
- Tests updated to use asyncio.run()

* use Jinja2 native async rendering instead of to_thread

Switch from asyncio.to_thread(template.render) to Jinja2's native
enable_async=True + template.render_async(). No thread overhead,
proper async integration. asyncio.wait_for timeout still applies.

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-03-17 20:24:04 +01:00
Zamil Majdy
8b577ae194 feat(backend/copilot): add direct ID lookup to find_agent and find_block tools (#12446)
## Summary
- Add direct `creator/slug` lookup to `find_agent` marketplace search,
bypassing full-text search when an exact identifier is provided
- Add direct UUID lookup to `find_block`, returning the block
immediately when a valid block ID is given
- Update tool descriptions and parameter hints to document the new
lookup capabilities

## Test plan
- [ ] Verify `find_agent` with a `creator/slug` query returns the exact
agent
- [ ] Verify `find_agent` falls back to search when slug lookup fails
- [ ] Verify `find_block` with a block UUID returns the exact block
- [ ] Verify `find_block` with a non-existent UUID falls through to
search
- [ ] Verify excluded block types/IDs are still filtered in direct
lookup
2026-03-17 16:41:17 +00:00
Zamil Majdy
d8f5f783ae feat(copilot): enable SmartDecisionMakerBlock in agent generator (#12438)
## Summary
- Enable the agent generator to create orchestrator agents using
**SmartDecisionMakerBlock** with agent mode
- SmartDecisionMaker + AgentExecutorBlock tools = autonomous agent that
decides which sub-agents to call, executes them, reads results, and
loops until done
- Follows existing patterns (AgentExecutorBlock/MCPToolBlock) for fixer,
validator, and guide documentation

## Changes
- Remove SmartDecisionMakerBlock from `COPILOT_EXCLUDED_BLOCK_IDS` in
`find_block.py`
- Add `SMART_DECISION_MAKER_BLOCK_ID` constant to `helpers.py`
- Add `fix_smart_decision_maker_blocks()` in `fixer.py` — populates
agent-mode defaults (`max_iterations=-1`,
`conversation_compaction=True`, etc.)
- Add `validate_smart_decision_maker_blocks()` in `validator.py` —
ensures downstream tool blocks are connected
- Add SmartDecisionMakerBlock documentation section in
`agent_generation_guide.md`
- Add 18 tests: 7 fixer, 7 validator, 4 e2e pipeline

## Test plan
- [x] All 18 new tests pass
(`test/agent_generator/test_smart_decision_maker.py`)
- [x] All 31 existing agent generator tests still pass
- [x] Pre-commit hooks (ruff, black, isort, pyright) all pass
- [ ] Manual: use CoPilot to generate an orchestrator agent with
SmartDecisionMakerBlock

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-03-17 16:30:04 +00:00
Reinier van der Leer
82d22f3680 dx(backend): Update CLAUDE.md (#12458)
- Prefer f-strings except for debug statements
- Top-down module/function/class ordering

As suggested by @majdyz, this is more effective than commenting on every
single instance on PRs.
2026-03-17 16:27:09 +00:00
Zamil Majdy
50622333d1 fix(backend/copilot): fix tool-result file read failures across turns (#12399)
## Summary
- **Path validation fix**: `is_allowed_local_path()` now correctly
handles the SDK's nested conversation UUID path structure
(`<encoded-cwd>/<conversation-uuid>/tool-results/<file>`) instead of
only matching `<encoded-cwd>/tool-results/<file>`
- **`read_workspace_file` fallback**: When the model mistakenly calls
`read_workspace_file` for an SDK tool-result path (local disk, not cloud
storage), the tool now falls back to reading from local disk instead of
returning "file not found"
- **Cross-turn cleanup fix**: Stopped deleting
`~/.claude/projects/<encoded-cwd>/` between turns — tool-result files
now persist across `--resume` turns so the model can re-read them. Added
TTL-based stale directory sweeping (24h) to prevent unbounded disk
growth.
- **System prompt**: Added guidance telling the model to use `read_file`
(not `read_workspace_file`) for SDK tool-result paths
- **Symlink escape fix** (e2b_file_tools.py): Added `readlink -f`
canonicalization inside the E2B sandbox to detect symlink-based path
escapes before writes
- **Stash timeout increase**: `wait_for_stash` timeout increased from
0.5s to 2.0s, with a post-timeout `sleep(0)` fallback

### Root cause
Investigated via Langfuse trace `5116befdca6a6ff9a8af6153753e267d`
(session `d5841fd8`). The model ran 3 Perplexity deep research calls,
SDK truncated large outputs to `~/.claude/projects/.../tool-results/`
files. Model then called `read_workspace_file` (cloud DB) instead of
`read_file` (local disk), getting "file not found". Additionally, the
path validation check didn't account for the SDK's nested UUID directory
structure, and cleanup between turns deleted tool-result files that the
transcript still referenced.

## Test plan
- [x] All 653 copilot tests pass (excluding 1 pre-existing infra test)
- [x] Security test `test_read_claude_projects_settings_json_denied`
still passes — non-tool-result files under the project dir are still
blocked
- [x] `poetry run format` passes all checks
2026-03-17 15:57:15 +00:00
Zamil Majdy
27af5782a9 feat(skills): add gh pr checks --watch to pr-address loop (#12457)
## Summary
- Teaches the `pr-address` skill to use `gh pr checks --watch
--fail-fast` for efficient CI waiting instead of manual polling
- Adds guidance on investigating failures with `gh run view
--log-failed`
- Adds explicit "between CI waits" section: re-fetch and address new bot
comments while CI runs

## Test plan
- [x] Verified the updated skill renders correctly
- [ ] Use `/pr-address` on a PR with pending CI to confirm the new flow
works
2026-03-17 22:10:18 +07:00
Otto
522f932e67 Merge commit from fork
SendEmailBlock accepted user-supplied smtp_server and smtp_port inputs
and passed them directly to smtplib.SMTP() with no IP validation,
bypassing the platform's SSRF protections in request.py.

This fix:
- Makes _resolve_and_check_blocked public in request.py so non-HTTP
  blocks can reuse the same IP validation
- Validates the SMTP server hostname via resolve_and_check_blocked()
  before connecting
- Restricts allowed SMTP ports to standard values (25, 465, 587, 2525)
- Catches SMTPConnectError and SMTPServerDisconnected to prevent TCP
  banner leakage in error messages

Fixes GHSA-4jwj-6mg5-wrwf
2026-03-17 15:55:49 +01:00
Otto
a6124b06d5 Merge commit from fork
* fix(backend): add HMAC signing to Redis cache to prevent pickle deserialization attacks

Add HMAC-SHA256 integrity verification to all values stored in the shared
Redis cache. This prevents cache poisoning attacks where an attacker with
Redis access injects malicious pickled payloads that execute arbitrary code
on deserialization.

Changes:
- Sign pickled values with HMAC-SHA256 before storing in Redis
- Verify HMAC signature before deserializing cached values
- Reject tampered or unsigned (legacy) cache entries gracefully
  (treated as cache misses, logged as warnings)
- Derive HMAC key from redis_password or unsubscribe_secret_key
- Add tests for HMAC round-trip, tamper detection, and legacy rejection

Fixes GHSA-rfg2-37xq-w4m9

* improve log message

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-03-17 15:52:37 +01:00
Otto
ae660ea04f Merge commit from fork
Replace NamedTemporaryFile(delete=False) with a direct Response,
preventing unbounded disk consumption on the public download endpoint.

Fixes: GHSA-374w-2pxq-c9jp
2026-03-17 15:33:55 +01:00
Otto
2479f3a1c4 Merge commit from fork
- Normalize IPv4-mapped IPv6 addresses (e.g. ::ffff:127.0.0.1) to IPv4
  before checking against blocked networks, preventing blocklist bypass
- Add missing blocked ranges: CGNAT (100.64.0.0/10), IETF Protocol
  Assignments (192.0.0.0/24), Benchmarking (198.18.0.0/15)
- Add comprehensive tests for IPv4-mapped bypass and new blocked ranges
2026-03-17 14:43:38 +01:00
Abhimanyu Yadav
8153306384 feat(frontend): reusable confetti with enhanced particles and dual bursts (#12454)
<!-- Clearly explain the need for these changes: -->

The previous confetti implementation using party-js was causing lag.
Replaced it with canvas-confetti for smoother, more performant
celebrations with enhanced visual effects.

### Changes 🏗️

- **New Confetti Component**: Reusable canvas-confetti wrapper with
AutoGPT purple color palette and Storybook stories demonstrating various
effects
- **Enhanced Wallet Confetti**: Dual simultaneous bursts at 45° and 135°
angles with larger particles (scalar 1.2) for better visibility
- **Enhanced Task Celebration**: Dual-burst confetti for task group and
individual task completion events
- **Onboarding Congrats Page**: Replaced party-js with canvas-confetti
for side-cannon animation effect
- **Dependency**: Added canvas-confetti v1.9.4, removed party-js

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Trigger task completion in wallet to see dual-burst confetti at
45° and 135° angles
- [x] Complete tasks/groups to verify celebration confetti displays with
larger particles
  - [x] Visit onboarding congratulations page to see side-cannon effect
  - [x] Verify confetti rendering performance and no console errors
2026-03-17 12:49:15 +00:00
Abhimanyu Yadav
9c3d100a22 feat(frontend): add builder e2e tests for new Flow Editor (#12436)
### Changes
- Replace skipped legacy builder tests with 8 working Playwright e2e
tests
  targeting the new Flow Editor
- Rewrite `BuildPage` page object to match new `data-id`/`data-testid`
  selectors
- Update `agent-activity.spec.ts` to use new `BuildPage` API

### Tests added
  - Build page loads successfully (canvas + control buttons)
  - Add a block via block menu search
  - Add multiple blocks
  - Remove a block (select + Backspace)
  - Save an agent (name/description, verify flowID in URL)
  - Save and verify run button becomes enabled
  - Copy and paste a node (Cmd+C/V)
  - Run an agent from the builder

 ### Test plan
  - [x] All 8 builder tests pass locally (`pnpm test:no-build
  src/tests/build.spec.ts`)
  - [x] `pnpm format`, `pnpm lint`, `pnpm types` all clean
  - [x] CI passes
2026-03-17 12:48:59 +00:00
Zamil Majdy
fc3bf6c154 fix(copilot): handle transient Anthropic API connection errors gracefully (#12445)
## Summary
- Detect transient Anthropic API errors (ECONNRESET, "socket connection
was closed unexpectedly") across all error paths in the copilot SDK
streaming loop
- Replace raw technical error messages with user-friendly text:
**"Anthropic connection interrupted — please retry"**
- Add `retryable` field to `StreamError` model so the frontend can
distinguish retryable errors
- Add **"Try Again" button** on the error card for transient errors,
which re-sends the last user message

### Background
Sentry issue
[AUTOGPT-SERVER-875](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-875)
— 25+ events since March 13, caused by Anthropic API infrastructure
instability (confirmed by their status page). Same SDK/code on dev and
prod, prod-only because of higher volume of long-running streaming
sessions.

### Changes
**Backend (`constants.py`, `service.py`, `response_adapter.py`,
`response_model.py`):**
- `is_transient_api_error()` — pattern-matching helper for known
transient error strings
- Intercept transient errors in 3 places: `AssistantMessage.error`,
stream exceptions, `BaseException` handler
- Use friendly message in error markers persisted to session (so it
shows properly on page refresh too)
- `StreamError.retryable` field for frontend consumption

**Frontend (`ChatContainer`, `ChatMessagesContainer`,
`MessagePartRenderer`):**
- Thread `onRetry` callback from `ChatContainer` →
`ChatMessagesContainer` → `MessagePartRenderer`
- Detect transient error text in error markers and show "Try Again"
button via existing `ErrorCard.onRetry`
- Clicking "Try Again" re-sends the last user message (backend
auto-cleans stale error markers)

Fixes SECRT-2128, SECRT-2129, SECRT-2130

## Test plan
- [ ] Verify transient error detection with `is_transient_api_error()`
for known patterns
- [ ] Confirm error card shows "Anthropic connection interrupted —
please retry" instead of raw socket error
- [ ] Confirm "Try Again" button appears on transient error cards
- [ ] Confirm "Try Again" re-sends the last user message successfully
- [ ] Confirm non-transient errors (e.g., "Prompt is too long") still
show original error text without retry button
- [ ] Verify error marker persists correctly on page refresh
2026-03-17 12:48:53 +00:00
Abhimanyu Yadav
e32d258a7e feat(blocks): add AgentMail integration blocks (#12417)
## Summary
- Add a full AgentMail integration with blocks for managing inboxes,
messages, threads, drafts, attachments, lists, and pods
- Includes shared provider configuration (`_config.py`) with API key
authentication
- 8 block modules covering ~25 individual blocks across all AgentMail
API surfaces

  ## Block Modules
  | Module | Blocks |
  |--------|--------|
  | `inbox.py` | Create, Get, List, Update, Delete inboxes |
| `messages.py` | Send, Get, List, Delete messages + org-wide listing |
  | `threads.py` | Get, List, Delete threads + org-wide listing |
| `drafts.py` | Create, Get, List, Update, Send, Delete drafts +
org-wide listing |
  | `attachments.py` | Download attachments |
  | `lists.py` | Create, Get, List, Update, Delete mailing lists |
  | `pods.py` | Create, Get, List, Update, Delete pods |

  ## Test plan
- [x] `poetry run pytest 'backend/blocks/test/test_block.py' -xvs` — all
new blocks pass the standard block test suite
  - [x] test all blocks manually
2026-03-17 12:40:32 +00:00
Abhimanyu Yadav
3e86544bfe feat(frontend): add graph search functionality to new builder (#12395)
### Changes
- Integrates the existing graph search components into the new builder's
control panel
- Search by block name/title, block type, node inputs/outputs, and
description with fuzzy matching
  (Jaro-Winkler)
- Clicking a result zooms/navigates to the node on the canvas
- Keyboard shortcut Cmd/Ctrl+F to open search
- Arrow key navigation and Enter to select within results
- Styled to match the new builder's block menu card pattern


https://github.com/user-attachments/assets/41ed676d-83b1-4f00-8611-00d20987a7af


### Test plan

- [x] Open builder with a graph containing multiple nodes
- [x] Click magnifying glass icon in control panel — search panel opens
- [x] Type a query — results filter by name, type, inputs, outputs
- [x] Click a result — canvas zooms to that node
- [x] Use arrow keys + Enter to navigate and select results
- [x] Press Cmd/Ctrl+F — search panel opens
- [x] Press Escape or click outside — search panel closes and query
clears
2026-03-17 12:19:54 +00:00
Abhimanyu Yadav
c6b729bdfa fix(frontend): replace custom LibraryTabs with design system TabsLine (#12444)
Replaces the custom LibraryTabs component with the design system's
TabsLine component throughout the library page for better UI
consistency. Also wires up favorite animation refs and removes the
unused `agentGraphVersion` field from the test fixture.

### Changes 🏗️

- Replace `LibraryTabs` with `TabsLine` from design system in
`FavoritesSection`, `LibrarySubSection`, and `page.tsx`
- Add favorite animation ref registration in `FavoritesSection` and
`LibrarySubSection`
- Inline tab type definition as `{ id: string; title: string; icon: Icon
}` in component props
- Remove unused `agentGraphVersion` field from `load_store_agents.py`
test

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Library page renders with both "All" and "Favorites" tabs using
TabsLine component
  - [x] Tab switching between all agents and favorites works correctly
  - [x] Favorite animations reference the correct tab element
2026-03-17 10:39:12 +00:00
Zamil Majdy
7a391fbd99 feat(platform): CoPilot credit charging, token rate limiting, and usage UI (#12385)
### Background
CoPilot block execution was not charging credits, LLM token usage was
not tracked, and there was no per-user rate limiting. This PR adds all
three, plus a frontend usage indicator.

### Screenshot

<!-- Drag-drop the usage limits screenshot here -->

### Changes

**Credit Charging** (`copilot/tools/helpers.py`)
- Pre-execution balance check + post-execution credit deduction via
`block_usage_cost` / `spend_credits`
- Uses adapter pattern (RPC fallback) so it works in the CoPilot
executor which has no Prisma connection

**Token Rate Limiting** (`copilot/rate_limit.py`)
- Redis-backed daily + weekly fixed-window counters per user
- Fail-open on Redis outages, clock-skew-safe weekly boundaries
- Configurable via `daily_token_limit` / `weekly_token_limit` (0 =
unlimited)

**Token Tracking**
- *Baseline* (`copilot/baseline/service.py`):
`stream_options={"include_usage": True}` with tiktoken fallback
estimation
- *SDK* (`copilot/sdk/service.py`): Extract usage from Claude Agent SDK
`ResultMessage`, including cached tokens
- Both: yield `StreamUsage` SSE events, persist `Usage` records, call
`record_token_usage` in `finally`

**Usage API** (`api/features/chat/routes.py`)
- `GET /api/chat/usage` — returns `CoPilotUsageStatus` (daily/weekly
used, limit, resets_at)
- Pre-turn `check_rate_limit` in `stream_chat_post` (returns 429 on
exceed)

**Frontend** (`copilot/components/UsageLimits/`)
- `UsageLimits` popover with daily/weekly progress bars, reset times,
dark mode
- `useUsageLimits` hook with 30s auto-refresh via generated Orval API
hook

### Tests
| Area | Tests | File |
|------|-------|------|
| Rate limiting | 22 | `rate_limit_test.py` |
| Credit charging | 12 | `helpers_test.py` |
| Usage API | 3 | `routes_test.py` |
| Frontend UI | 9 | `UsageLimits.test.tsx` |

### Checklist

- [x] Changes clearly listed
- [x] Test plan created and executed (46 backend + 9 frontend tests)
- [x] Pre-commit hooks pass (formatting, linting, type checks)
- [x] `.env.default` compatible (new config defaults to 0 = unlimited)
- [x] `docker-compose.yml` compatible (no changes needed)
2026-03-17 08:43:27 +00:00
Zamil Majdy
791dd7cb48 fix(backend): split CamelCase block names and filter disabled blocks before batch slicing (#12400)
## Summary

Two bugs causing blocks to be invisible in CoPilot search:

### Bug 1: CamelCase block names not tokenized
Block names like `AITextGeneratorBlock` were indexed as single tokens in
the search database. PostgreSQL's `plainto_tsquery('english', ...)` and
the BM25 tokenizer both treat CamelCase as one word, so searching for
"text generator" produced zero lexical/BM25 match.

**Fix:** Split CamelCase names into separate words before indexing (e.g.
`"AI Text Generator Block"`) and in the BM25 tokenizer.

### Bug 2: Disabled blocks exhausting batch budget (root cause of 36
missing blocks)
The `batch_size` limit in `get_missing_items()` was applied **before**
filtering out disabled blocks. With 120+ disabled blocks and
`batch_size=100`, the first 100 missing entries were all disabled
(skipped via `continue`), leaving the 36 enabled blocks beyond the slice
boundary **never indexed**. This made core blocks like
`AITextGeneratorBlock`, `AIConversationBlock`, `AIListGeneratorBlock`,
etc. completely invisible to search.

**Fix:** Filter disabled blocks from the missing list before slicing by
`batch_size`.

### Changes
- **`content_handlers.py`**: 
- Split CamelCase block names into space-separated words when building
`searchableText`
- Filter disabled blocks before applying `batch_size` slice so enabled
blocks aren't starved
- **`hybrid_search.py`**: Updated BM25 `tokenize()` to split CamelCase
tokens

### Evidence from local DB
```
Indexed blocks: 341
Total blocks: 497 (156 missing from index)
Missing (non-disabled): 36 — including AITextGeneratorBlock, AIConversationBlock, etc.

# batch_size analysis:
First 100 missing: 0 enabled, 100 disabled  ← batch exhausted by disabled blocks
After 100: 36 enabled                        ← never reached!
```

## Test plan
- [ ] Verify CamelCase splitting: `AITextGeneratorBlock` → `AI Text
Generator Block`
- [ ] Run `poetry run pytest backend/api/features/store/` for
regressions
- [ ] After deploy, trigger embedding backfill and verify all 36 blocks
get indexed
- [ ] Search for "text generator" in CoPilot and verify
`AITextGeneratorBlock` appears
2026-03-17 08:36:53 +00:00
Abhimanyu Yadav
f0800b9420 feat(frontend): add rich media previews for Builder node outputs and file inputs (#12432)
### Changes
- Add YouTube/Vimeo embed support to `VideoRenderer` — URLs render as
embedded
  iframe players instead of plain text
- Add new `AudioRenderer` — HTTP audio URLs (.mp3, .wav, .ogg, .m4a,
.aac,
  .flac) and data URIs render as inline audio players
- Add new `LinkRenderer` — any HTTP/HTTPS URL not claimed by a media
renderer
  becomes a clickable link with an external-link icon
- Add media preview button to `FileInput` — uploaded audio, video, and
image
files show an Eye icon that opens a preview dialog reusing the
OutputRenderer
  system
- Update `ContentRenderer` shortContent gate to allow new renderers
through in
  node previews


https://github.com/user-attachments/assets/eea27fb7-3870-4a1e-8d08-ba23b6e07d74

### Test plan
- [x] `pnpm vitest run src/components/contextual/OutputRenderers/` — 36
tests
  passing
- [x] `pnpm format && pnpm lint && pnpm types` — all clean
- [x] Manual: run a block that outputs a YouTube URL → embedded player
- [x] Manual: run a block that outputs an audio file URL → audio player
- [x] Manual: run a block that outputs a generic URL → clickable link
- [x] Manual: upload an audio/video/image file to a file input → Eye
icon
  appears, clicking opens preview dialog
2026-03-17 07:09:02 +00:00
Abhimanyu Yadav
60bc49ba50 fix(platform): fix image delete button on EditAgentForm (#12362)
### Summary
- SECRT-2094: Fix store image delete button accidentally submitting the
edit form — the remove image <button> in ThumbnailImages.tsx was missing
type="button", causing it to act as a form submit inside the
EditAgentForm. This closed the modal and showed a success toast without
the user clicking "Update submission".

https://github.com/user-attachments/assets/86cbdd7d-90b1-473c-9709-e75e956dea6b

###  Changes
- `frontend/.../ThumbnailImages.tsx` — added type="button" to image
remove button
2026-03-17 07:06:05 +00:00
Abhimanyu Yadav
ba4f4b6242 test(frontend): add integration tests for builder UI state stores and draft recovery (part-2) (#12435)
### Changes
- Add integration tests for `controlPanelStore` (sidebar panel state
  management)
- Add integration tests for `blockMenuStore` (search/filter/category
state,
  creator list deduplication, reset behavior)
- Add integration tests for `tutorialStore` (tutorial lifecycle, step
  progression, input values)
- Add integration tests for `DraftRecoveryPopup` (diff summary
rendering,
  restore/discard actions, null diff fallback, singular/plural text)

### Test plan
  - [x] All 54 tests pass across 4 new test files
  - [x] `pnpm format` clean
  - [x] `pnpm lint` clean
  - [x] `pnpm types` clean
2026-03-17 07:05:51 +00:00
Nicholas Tindle
8892bcd230 docs: Add workspace and media file architecture documentation (#11989)
### Changes 🏗️

- Added comprehensive architecture documentation at
`docs/platform/workspace-media-architecture.md` covering:
  - Database models (`UserWorkspace`, `UserWorkspaceFile`)
  - `WorkspaceManager` API with session scoping
- `store_media_file()` media normalization pipeline (input types, return
formats)
  - Virus scanning responsibility boundaries
- Decision tree for choosing `WorkspaceManager` vs `store_media_file()`
- Configuration reference including `clamav_max_concurrency` and
`clamav_mark_failed_scans_as_clean`
  - Common patterns with error handling examples
- Updated `autogpt_platform/backend/CLAUDE.md` with a "Workspace & Media
Files" section referencing the new docs
- Removed duplicate `scan_content_safe()` call from
`WriteWorkspaceFileTool` — `WorkspaceManager.write_file()` already scans
internally, so the tool was double-scanning every file
- Replaced removed comment in `workspace.py` with explicit ownership
comment clarifying that `WorkspaceManager` is the single scanning
boundary

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified `scan_content_safe()` is called inside
`WorkspaceManager.write_file()` (workspace.py:186)
- [x] Verified `store_media_file()` scans all input branches including
local paths (file.py:351)
- [x] Verified documentation accuracy against current source code after
merge with dev
  - [x] CI checks all passing

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Mostly adds documentation and internal developer guidance; the only
code change is a comment clarifying `WorkspaceManager.write_file()` as
the single virus-scanning boundary, with no behavior change.
> 
> **Overview**
> Adds a new `docs/platform/workspace-media-architecture.md` describing
the Workspace storage layer vs the `store_media_file()` media pipeline,
including session scoping and virus-scanning/persistence responsibility
boundaries.
> 
> Updates backend `CLAUDE.md` to point contributors to the new doc when
working on CoPilot uploads/downloads or
`WorkspaceManager`/`store_media_file()`, and clarifies in
`WorkspaceManager.write_file()` (comment-only) that callers should not
duplicate virus scanning.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
18fcfa03f8. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 06:12:26 +00:00
Zamil Majdy
48ff8300a4 Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev 2026-03-17 13:13:42 +07:00
Abhimanyu Yadav
c268fc6464 test(frontend/builder): add integration tests for builder stores, components, and hooks (part-1) (#12433)
### Changes
- Add 329 integration tests across 11 test files for the builder (visual
  workflow editor)
- Cover all Zustand stores (nodeStore, edgeStore, historyStore,
graphStore,
  copyPasteStore, blockMenuStore, controlPanelStore)
- Cover key components (CustomNode, NewBlockMenu, NewSaveControl,
RunGraph)
- Cover hooks (useFlow, useCopyPaste)

### Test files

  | File | Tests | Coverage |
  |------|-------|----------|
| `nodeStore.test.ts` | 58 | Node lifecycle, bulk ops, backend
conversion,
  execution tracking, status, errors, resolution mode |
  | `edgeStore.test.ts` | 37 | Edge CRUD, duplicate rejection, bead
  visualization, backend link conversion, upsert |
| `historyStore.test.ts` | 22 | Undo/redo, history limits (50),
microtask
  batching, deduplication, canUndo/canRedo |
| `graphStore.test.ts` | 28 | Execution status transitions,
isGraphRunning,
  schema management, sub-graphs |
| `copyPasteStore.test.ts` | 8 | Copy/paste with ID remapping, position
offset,
   edge preservation |
| `CustomNode.test.tsx` | 25 | Rendering by block type (NOTE, WEBHOOK,
AGENT,
  OUTPUT, AYRSHARE), error states |
| `NewBlockMenu.test.tsx` | 29 | Store state (search, filters, creators,
  categories), search/default view routing |
| `NewSaveControl.test.tsx` | 11 | Save dialog rendering, form
validation,
  version display, popover state |
| `RunGraph.test.tsx` | 11 | Run/stop button states, loading, click
handlers,
  RunInputDialog visibility |
  | `useFlow.test.ts` | 4 | Loading states, initial load completion |
| `useCopyPaste.test.ts` | 16 | Clipboard copy/paste, UUID remapping,
viewport
  centering, input field guard |
2026-03-17 05:24:55 +00:00
Reinier van der Leer
aff3fb44af ci(platform): Improve end-to-end CI & reduce its cost (#12437)
Our CI costs are skyrocketing, most of it because of
`platform-fullstack-ci.yml`. The `types` job currently uses in a
`big-boi` runner (= expensive), but doesn't need to.
Additionally, the "end-to-end tests" job is currently in
`platform-frontend-ci.yml` instead of `platform-fullstack-ci.yml`,
causing it not to run on backend changes (which it should).

### Changes 🏗️

- Simplify `check-api-types` job (renamed from `types`) and make it use
regular `ubuntu-latest` runner
- Export API schema from backend through CLI (instead of spinning it up
in docker)
- Fix dependency caching in `platform-fullstack-ci.yml` (based on recent
improvements in `platform-frontend-ci.yml`)
- Move `e2e_tests` job to `platform-fullstack-ci.yml`

Out-of-scope but necessary:
- Eliminate module-level init of OpenAI client in
`backend.copilot.service`

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - CI
2026-03-16 23:08:18 +00:00
Zamil Majdy
9a41312769 feat(backend/copilot): parse @@agptfile bare refs by file extension (#12392)
The `@@agptfile:` expansion system previously used content-sniffing
(trying
`json.loads` then `csv.Sniffer`) to decide whether to parse file content
as
structured data. This was fragile — a file containing just `"42"` would
be
parsed as an integer, and the heuristics could misfire on ambiguous
content.

This PR replaces content-sniffing with **extension/MIME-based format
detection**.
When the file has a well-known extension (`.json`, `.csv`, etc.) or MIME
type
fragment (`workspace://id#application/json`), the content is parsed
accordingly.
Unknown formats or parse failures always fall back to plain string — no
surprises.

> [!NOTE]
> This PR builds on the `@@agptfile:` file reference protocol introduced
in #12332 and the structured data auto-parsing added in #12390.
>
> **What is `@@agptfile:`?**
> It is a special URI prefix (e.g. `@@agptfile:workspace:///report.csv`)
that the CoPilot SDK expands inline before sending tool arguments to
blocks. This lets the AI reference workspace files by name, and the SDK
automatically reads and injects the file content. See #12332 for the
full design.

### Changes 🏗️

**New utility: `backend/util/file_content_parser.py`**
- `infer_format(uri)` — determines format from file extension or MIME
fragment
- `parse_file_content(content, fmt)` — parses content, never raises
- Supported text formats: JSON, JSONL/NDJSON, CSV, TSV, YAML, TOML
- Supported binary formats: Parquet (via pyarrow), Excel/XLSX (via
openpyxl)
- JSON scalars (strings, numbers, booleans, null) stay as strings — only
  containers (arrays, objects) are promoted
- CSV/TSV require ≥1 row and ≥2 columns to qualify as tabular data
- Added `openpyxl` dependency for Excel reading via pandas
- Case-insensitive MIME fragment matching per RFC 2045
- Shared `PARSE_EXCEPTIONS` constant to avoid duplication between
modules

**Updated `expand_file_refs_in_args` in `file_ref.py`**
- Bare refs now use `infer_format` + `parse_file_content` instead of the
  old `_try_parse_structured` content-sniffing function
- Binary formats (parquet, xlsx) read raw bytes via `read_file_bytes`
- Embedded refs (text around `@@agptfile:`) still produce plain strings
- **Size guards**: Workspace and sandbox file reads now enforce a 10 MB
limit
  (matching the existing local file limit) to prevent OOM on large files

**Updated `blocks/github/commits.py`**
- Consolidated `_create_blob` and `_create_binary_blob` into a single
function
  with an `encoding` parameter

**Updated copilot system prompt**
- Documents the extension-based structured data parsing and supported
formats

**66 new tests** in `file_content_parser_test.py` covering:
- Format inference (extension, MIME, case-insensitive, precedence)
- All 8 format parsers (happy path + edge cases + fallbacks)
- Binary format handling (string input fallback, invalid bytes fallback)
- Unknown format passthrough

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] All 66 file_content_parser_test.py tests pass
  - [x] All 31 file_ref_test.py tests pass
  - [x] All 13 file_ref_integration_test.py tests pass
  - [x] `poetry run format` passes clean (including pyright)
2026-03-16 22:31:21 +00:00
Ubbe
048fb06b0a feat(frontend): add "Jump Back In" button to Library page (#12387)
Adds a "Jump Back In" CTA at the top of the Library page to encourage
users to quickly rerun their most recently successful agent.

Closes SECRT-1536

### Changes 🏗️

- New `JumpBackIn` component with `useJumpBackIn` hook at
`library/components/JumpBackIn/`
- Fetches first page of library agents sorted by `updatedAt`
- Finds the first agent with a `COMPLETED` execution in
`recent_executions`
- Shows banner with agent name + "Jump Back In" button linking to
`/library/agents/{id}`
- Returns `null` (hidden) when loading or when no agent with a
successful run exists

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `pnpm format`, `pnpm lint`, `pnpm types` all pass
- [x] Verified banner is hidden when no successful runs exist (edge
case)
- [x] Verified library page renders correctly with no visual regressions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 21:35:03 +08:00
Zamil Majdy
3f653e6614 dx(.claude): refactor and consolidate Claude Code skills (#12424)
Refactors the Claude Code skills for a cleaner, more intuitive dev loop.

### Changes 🏗️

- **`/pr-review` (new)**: Actual code review skill — reads the PR diff,
fetches existing comments to avoid duplicates, and posts inline GitHub
comments with structured feedback (Blockers / Should Fix / Nice to Have
/ Nit) covering correctness, security, code quality, architecture, and
testing.

- **`/pr-address` (was `/babysit-pr`)**: Addresses review comments and
monitors CI until green. Renamed from `/babysit-pr` to `/pr-address` to
better reflect its purpose. Handles bot-specific feedback
(autogpt-reviewer, sentry, coderabbitai) and loops until all comments
are addressed and CI is green.

- **`/backend-check` + `/frontend-check` → `/check`**: Unified into a
single `/check` skill that auto-detects whether backend (Python) or
frontend (TypeScript) code changed and runs the appropriate formatting,
linting, type checking, and tests. Shared code quality rules applied to
both.

- **`/code-style` enhanced**: Now covers both Python and
TypeScript/React. Added learnings from real PR work: lazy `%s` logging,
TOCTOU awareness, SSE protocol rules (`data:` vs `: comment`), FastAPI
`Security()` vs `Depends()`, Redis pipeline atomicity, error path
sanitization, mock target rules after refactoring.

- **`/worktree` fixed**: Normal `git worktree` is now the default (was
branchlet-first). Branchlet moved to optional section. All paths derived
from `git rev-parse --show-toplevel`.

- **`/pr-create`, `/openapi-regen`, `/new-block` cleaned up**: Reference
`/check` and `/code-style` instead of duplicating instructions.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified all skill files parse correctly (valid YAML frontmatter)
  - [x] Verified skill auto-detection triggers updated in descriptions
- [x] Verified old backend-check and frontend-check directories removed
- [x] Verified pr-review and pr-address directories created with correct
content
2026-03-16 10:35:05 +00:00
Zamil Majdy
c9c3d54b2b fix(platform): reduce Sentry noise by filtering expected errors and downgrading log levels (#12430)
## Summary

Reduces Sentry error noise by ~90% by filtering out expected/transient
errors and downgrading inappropriate error-level logs to warnings. Most
of the top Sentry issues are not actual bugs but expected conditions
(user errors, transient infra, business logic) that were incorrectly
logged at ERROR level, causing them to be captured as Sentry events.

## Changes

### 1. Sentry `before_send` filter (`metrics.py`)
Added a `before_send` hook to filter known expected errors before they
reach Sentry:
- **AMQP/RabbitMQ connection errors** — transient during
deploys/restarts
- **User credential errors** — invalid API keys, missing auth headers
(user error, not platform bug)
- **Insufficient balance** — expected business logic
- **Blocked IP access** — security check working as intended
- **Discord bot token errors** — misconfiguration, not runtime error
- **Google metadata DNS errors** — expected in non-GCP environments
- **Inactive email recipients** — expected for bounced addresses
- **Unclosed client sessions/connectors** — resource cleanup noise

### 2. Connection retry log levels (`retry.py`)
- `conn_retry` final failure: `error` → `warning` (these are infra
retries, not bugs)
- `conn_retry` wrapper final failure: `error` → `warning`
- Discord alert send failure: `error` → `warning`

### 3. Block execution Sentry capture (`manager.py`)
- Skip `sentry_sdk.capture_exception()` for `ValueError` subclasses
(BlockExecutionError, BlockInputError, InsufficientBalanceError, etc.) —
these are user-caused errors, not platform bugs
- Downgrade executor shutdown/disconnect errors to warning

### 4. Scheduler log levels (`scheduler.py`)
- Graph validation failure: `error` → `warning` (expected for
old/invalid graphs)
- Unable to unschedule graph: `error` → `warning`
- Job listener failure: `error` → `warning`
- Async operation failure: `error` → `warning`

### 5. Discord system alert (`notifications.py`)
- Wrapped `discord_system_alert` endpoint with try/catch to prevent
unhandled exceptions (fixes AUTOGPT-SERVER-743, AUTOGPT-SERVER-7MW)

### 6. Notification system log levels (`notifications.py`)
- All batch processing errors: `error` → `warning`
- User email not found: `error` → `warning`
- Notification parsing errors: `error` → `warning`
- Email sending failures: `error` → `warning`
- Summary data gathering failure: `error` → `warning`
- Cleaned up unprofessional error messages

### 7. Cloud storage cleanup (`cloud_storage.py`)
- Cleanup error: `error` → `warning`

## Sentry Issues Addressed

### AMQP/RabbitMQ (~3.4M events total)
-
[AUTOGPT-SERVER-3H2](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H2)
— AMQPConnector ConnectionRefusedError (1.2M events)
-
[AUTOGPT-SERVER-3H3](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H3)
— AMQPConnectionWorkflowFailed (770K events)
-
[AUTOGPT-SERVER-3H4](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H4)
— AMQP connection workflow failed (770K events)
-
[AUTOGPT-SERVER-3H5](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H5)
— AMQPConnectionWorkflow reporting failure (770K events)
-
[AUTOGPT-SERVER-3H7](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H7)
— Socket failed to connect (514K events)
-
[AUTOGPT-SERVER-3H8](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H8)
— TCP Connection attempt failed (514K events)
-
[AUTOGPT-SERVER-3H6](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H6)
— AMQPConnectionError (93K events)
-
[AUTOGPT-SERVER-7SX](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-7SX)
— Error creating transport (69K events)
-
[AUTOGPT-SERVER-1TN](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-1TN)
— ChannelInvalidStateError (39K events)
-
[AUTOGPT-SERVER-6JC](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6JC)
— ConnectionClosedByBroker (2K events)
-
[AUTOGPT-SERVER-6RJ/6RK/6RN/6RQ/6RP/6RR](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6RJ)
— Various connection failures (~15K events)
-
[AUTOGPT-SERVER-4A5/6RM/7XN](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-4A5)
— Connection close/transport errors (~540 events)

### User Credential Errors (~15K events)
-
[AUTOGPT-SERVER-6S5](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6S5)
— Incorrect OpenAI API key (9.2K events)
-
[AUTOGPT-SERVER-7W4](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-7W4)
— Incorrect API key in AIConditionBlock (3.4K events)
-
[AUTOGPT-SERVER-83Y](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-83Y)
— AI condition invalid key (2.3K events)
-
[AUTOGPT-SERVER-7ZP](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-7ZP)
— Perplexity missing auth header (451 events)
-
[AUTOGPT-SERVER-7XK/7XM](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-7XK)
— Anthropic invalid key (125 events)
-
[AUTOGPT-SERVER-82C](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-82C)
— Missing auth header (27 events)
-
[AUTOGPT-SERVER-721](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-721)
— Ideogram invalid token (165 events)

### Business Logic / Validation (~120K events)
-
[AUTOGPT-SERVER-7YQ](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-7YQ)
— Disabled block used in graph (56K events)
-
[AUTOGPT-SERVER-6W3](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6W3)
— Graph failed validation (46K events)
-
[AUTOGPT-SERVER-6W2](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6W2)
— Unable to unschedule graph (46K events)
-
[AUTOGPT-SERVER-83X](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-83X)
— Blocked IP access (15K events)
-
[AUTOGPT-SERVER-6K9](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6K9)
— Insufficient balance (4K events)

### Discord Alert Failures (~24K events)
-
[AUTOGPT-SERVER-743](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-743)
— Discord improper token (22K events)
-
[AUTOGPT-SERVER-7MW](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-7MW)
— Discord 403 Missing Access (1.5K events)

### Notification System (~16K events)
-
[AUTOGPT-SERVER-550](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-550)
— Notification batch create error (8.3K events)
-
[AUTOGPT-SERVER-58H](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-58H)
— ValidationError for NotificationEventModel (3K events)
-
[AUTOGPT-SERVER-5C6](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-5C6)
— Get notification batch error (2.1K events)
-
[AUTOGPT-SERVER-4BT](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-4BT)
— Notification batch create error (1.8K events)
-
[AUTOGPT-SERVER-5E4](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-5E4)
— NotificationPreference validation (1.4K events)
-
[AUTOGPT-SERVER-508](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-508)
— Inactive email recipients (702 events)

### Infrastructure / Transient (~20K events)
-
[AUTOGPT-SERVER-6WJ](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6WJ)
— Unclosed client session (13K events)
-
[AUTOGPT-SERVER-745](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-745)
— Unclosed connector (5.8K events)
-
[AUTOGPT-SERVER-4V1](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-4V1)
— Google metadata DNS error (2.2K events)
-
[AUTOGPT-SERVER-80J](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-80J)
— CloudStorage DNS error (35 events)

### Executor Shutdown
-
[AUTOGPT-SERVER-55J](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-55J)
— Error disconnecting run client (118 events)

## Test plan
- [x] All pre-commit hooks pass (Ruff, isort, Black, Pyright typecheck)
- [x] All changed modules import successfully
- [ ] Deploy to staging and verify Sentry event volume drops
significantly
- [ ] Verify legitimate errors still appear in Sentry
2026-03-16 10:29:01 +00:00
Ubbe
53d58e21d3 feat(frontend): replace technical block terminology with user-friendly labels (#12389)
## Summary
- Replaces all user-facing "block" terminology in the CoPilot activity
stream with plain-English labels ("Step failed", "action",
"Credentials", etc.)
- Adds `humanizeFileName()` utility to display file names without
extensions, with title-case and spaces (e.g. `executive_memo.md` →
`"Executive Memo"`)
- Updates error messages across RunBlock, RunAgent, and FindBlocks tools
to use friendly language

## Test plan
- [ ] Open CoPilot and trigger a block execution — verify animation text
says "Running" / "Step failed" instead of "Running the block" / "Error
running block"
- [ ] Trigger a file read/write action — verify the activity shows
humanized file names (e.g. `Reading "Executive Memo"` not `Reading
executive_memo.md`)
- [ ] Trigger FindBlocks — verify labels say "Searching for actions" and
"Results" instead of "Searching for blocks" and "Block results"
- [ ] Check the work-done stats bar — verify it shows "action" /
"actions" instead of "block run" / "block runs"
- [ ] Trigger a setup requirements card — verify labels say
"Credentials" and "Inputs" instead of "Block credentials" and "Block
inputs"
- [ ] Visit `/copilot/styleguide` — verify error test data no longer
contains "Block execution" text

Resolves: [SECRT-2025](https://linear.app/autogpt/issue/SECRT-2025)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 09:00:25 +00:00
Ubbe
fa04fb41d8 feat(frontend): add "Run now" button to schedule view (#12388)
Adds a "Run now" action to the schedule detail view and sidebar
dropdown, allowing users to immediately trigger a scheduled agent run
without waiting for the next cron execution.

### Changes 🏗️

- **`useSelectedScheduleActions.ts`**: Added
`usePostV1ExecuteGraphAgent` hook and `handleRunNow` function that
executes the agent using the schedule's stored `input_data` and
`input_credentials`. On success, invalidates runs query and navigates to
the new run
- **`SelectedScheduleActions.tsx`**: Added Play icon button as first
action button, with loading spinner while running
- **`SelectedScheduleView.tsx`**: Threads `onSelectRun` prop and
`schedule` object to action components (both mobile and desktop layouts)
- **`NewAgentLibraryView.tsx`**: Passes `onSelectRun` handler to enable
navigation to the new run after execution
- **`ScheduleActionsDropdown.tsx`**: Added "Run now" dropdown menu item
with same execution logic
- **`ScheduleListItem.tsx`**: Added `onRunCreated` prop passed to
dropdown
- **`SidebarRunsList.tsx`**: Connects sidebar dropdown to run
selection/navigation

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `pnpm format`, `pnpm lint`, `pnpm types` all pass
- [x] Code review: follows existing patterns (mirrors "Run Again" in
SelectedRunActions)
  - [x] No visual regressions on agent detail page

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 17:00:41 +08:00
Otto
d9c16ded65 fix(copilot): prioritize block discovery over MCP and sanitize HTML errors (#12394)
Requested by @majdyz

When a user asks for Google Sheets integration, the CoPilot agent skips
block discovery entirely (despite 55+ Google Sheets blocks being
available), jumps straight to MCP, guesses a fake URL
(`https://sheets.googleapis.com/mcp`), and gets a raw HTML 404 error
page dumped into the conversation.

**Changes:**

1. **MCP guide** (`mcp_tool_guide.md`): Added "Check blocks first"
section directing the agent to use `find_block` before attempting MCP
for any service not in the known servers list. Explicitly prohibits
guessing/constructing MCP server URLs.

2. **Error handling** (`run_mcp_tool.py`): Detects HTML error pages in
HTTP responses (e.g. raw 404 pages from non-MCP endpoints) and returns a
clean one-liner like "This URL does not appear to host an MCP server"
instead of dumping the full HTML body.

**Note:** The main CoPilot system prompt (managed externally, not in
repo) should also be updated to reinforce block-first behavior in the
Capability Check section. This PR covers the in-repo changes.

Session reference: `9216df83-5f4a-48eb-9457-3ba2057638ae` (turn 3)
Ticket: [SECRT-2116](https://linear.app/autogpt/issue/SECRT-2116)

---
Co-authored-by: Zamil Majdy (@majdyz) <majdyz@gmail.com>

---------

Co-authored-by: Zamil Majdy (@majdyz) <majdyz@gmail.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-14 12:49:03 +00:00
Otto
6dc8429ae7 fix(copilot): downgrade agent validation failure log from error to warning (#12409)
Agent validation failures are expected when the LLM generates invalid
agent graphs (wrong block IDs, missing required inputs, bad output field
names). The validator catches these and returns proper error responses.

However, `validator.py:938` used `logger.error()`, which Sentry captures
as error events — flooding #platform-alerts with non-errors.

This changes it to `logger.warning()`, keeping the log visible for
debugging without triggering Sentry alerts.

Fixes SECRT-2120

---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
2026-03-14 12:48:36 +00:00
Zamil Majdy
cfe22e5a8f fix(backend/copilot): sync TranscriptBuilder with CLI on mid-stream compaction (#12401)
## Summary
- **Root cause**: `TranscriptBuilder` accumulates all raw SDK stream
messages including pre-compaction content. When the CLI compacts
mid-stream, the uploaded transcript was still uncompacted, causing
"Prompt is too long" errors on the next `--resume` turn.
- **Fix**: Detect mid-stream compaction via the `PreCompact` hook, read
the CLI's session file to get the compacted entries (summary +
post-compaction messages), and call
`TranscriptBuilder.replace_entries()` to sync it with the CLI's active
context. This ensures the uploaded transcript always matches what the
CLI sees.
- **Key changes**:
- `CompactionTracker`: stores `transcript_path` from `PreCompact` hook,
one-shot `compaction_just_ended` flag that correctly resets for multiple
compactions
- `read_compacted_entries()`: reads CLI session JSONL, finds
`isCompactSummary: true` entry, returns it + all entries after. Includes
path validation against the CLI projects directory.
- `TranscriptBuilder.replace_entries()`: clears and replaces all entries
with compacted ones, preserving `isCompactSummary` entries (which have
`type: "summary"` that would normally be stripped)
- `load_previous()`: also preserves `isCompactSummary` entries when
loading a previously compacted transcript
- Service stream loop: after compaction ends, reads compacted entries
and syncs TranscriptBuilder

## Test plan
- [x] 69 tests pass across `compaction_test.py` and `transcript_test.py`
- [x] Tests cover: one-shot flag behavior, multiple compactions within a
query, transcript path storage, path traversal rejection,
`read_compacted_entries` (7 tests), `replace_entries` (4 tests),
`load_previous` with compacted content (2 tests)
- [x] Pre-commit hooks pass (lint, format, typecheck)
- [ ] Manual test: trigger compaction in a multi-turn session and verify
the uploaded transcript reflects compaction
2026-03-13 22:17:46 +00:00
Otto
0b594a219c feat(copilot): support prompt-in-URL for shareable prompt links (#12406)
Requested by @torantula

Add support for shareable AutoPilot URLs that contain a prompt in the
URL hash fragment, inspired by [Lovable's
implementation](https://docs.lovable.dev/integrations/build-with-url).

**URL format:**
- `/copilot#prompt=URL-encoded-text` — pre-fills the input for the user
to review before sending
- `/copilot?autosubmit=true#prompt=...` — auto-creates a session and
sends the prompt immediately

**Example:**
```
https://platform.agpt.co/copilot#prompt=Create%20a%20todo%20app
https://platform.agpt.co/copilot?autosubmit=true#prompt=Create%20a%20todo%20app
```

**Key design decisions:**
- Uses URL fragment (`#`) instead of query params — fragments never hit
the server, so prompts stay client-side only (better for privacy, no
backend URL length limits)
- URL is cleaned via `history.replaceState` immediately after extraction
to prevent re-triggering on navigation/reload
- Leverages existing `pendingMessage` + `createSession()` flow for
auto-submit — no new backend APIs needed
- For populate-only mode, passes `initialPrompt` down through component
tree to pre-fill the chat input

**Files changed:**
- `useCopilotPage.ts` — URL hash extraction logic + `initialPrompt`
state
- `CopilotPage.tsx` — passes `initialPrompt` to `ChatContainer`
- `ChatContainer.tsx` — passes `initialPrompt` to `EmptySession`
- `EmptySession.tsx` — passes `initialPrompt` to `ChatInput`
- `ChatInput.tsx` / `useChatInput.ts` — accepts `initialValue` to
pre-fill the textarea

Fixes SECRT-2119

---
Co-authored-by: Toran Bruce Richards (@Torantulino) <toran@agpt.co>
2026-03-13 23:54:54 +07:00
Zamil Majdy
a8259ca935 feat(analytics): read-only SQL views layer with analytics schema (#12367)
### Changes 🏗️

Adds `autogpt_platform/analytics/` — 14 SQL view definitions that expose
production data safely through a locked-down `analytics` schema.

**Security model:**
- Views use `security_invoker = false` (PostgreSQL 15+), so they execute
as their owner (`postgres`), not the caller
- `analytics_readonly` role only has access to `analytics.*` — cannot
touch `platform` or `auth` tables directly

**Files:**
- `backend/generate_views.py` — does everything; auto-reads credentials
from `backend/.env`
- `analytics/queries/*.sql` — 14 documented view definitions (auth, user
activity, executions, onboarding funnel, cohort retention)

---

### Running locally (dev)

```bash
cd autogpt_platform/backend

# First time only — creates analytics schema, role, grants
poetry run analytics-setup

# Create / refresh views (auto-reads backend/.env)
poetry run analytics-views
```

### Running in production (Supabase)

```bash
cd autogpt_platform/backend

# Step 1 — first time only (run in Supabase SQL Editor as postgres superuser)
poetry run analytics-setup --dry-run
# Paste the output into Supabase SQL Editor and run

# Step 2 — apply views (use direct connection host, not pooler)
poetry run analytics-views --db-url "postgresql://postgres:PASSWORD@db.<ref>.supabase.co:5432/postgres"

# Step 3 — set password for analytics_readonly so external tools can connect
# Run in Supabase SQL Editor:
# ALTER ROLE analytics_readonly WITH PASSWORD 'your-password';
```

---

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Setup + views applied cleanly on local Postgres 15
- [x] `analytics_readonly` can `SELECT` from all 14 `analytics.*` views
- [x] `analytics_readonly` gets `permission denied` on `platform.*` and
`auth.*` directly

---------

Co-authored-by: Otto (AGPT) <otto@agpt.co>
2026-03-13 12:04:42 +00:00
Swifty
1f1288d623 feat(copilot): generate personalized quick-action prompts from Tally business understanding (#12374)
During Tally data extraction, the system now also generates personalized
quick-action prompts as part of the existing LLM extraction call
(configurable model, defaults to GPT-4o-mini, `temperature=0.0`). The
prompt asks the LLM for 5 candidates, then the code validates (filters
prompts >20 words) and keeps the top 3. These prompts are stored in the
existing `CoPilotUnderstanding.data` JSON field (at the top level, not
under `business`) and served to the frontend via a new API endpoint. The
copilot chat page uses them instead of hardcoded defaults when
available.

### Changes 🏗️

**Backend – Data models** (`understanding.py`):
- Added `suggested_prompts` field to `BusinessUnderstandingInput`
(optional) and `BusinessUnderstanding` (default empty list)
- Updated `from_db()` to deserialize `suggested_prompts` from top-level
of the data JSON
- Updated `merge_business_understanding_data()` with overwrite strategy
for prompts (full replace, not append)
- `format_understanding_for_prompt()` intentionally does **not** include
`suggested_prompts` — they are UI-only

**Backend – Prompt generation** (`tally.py`):
- Extended `_EXTRACTION_PROMPT` to request 5 suggested prompts alongside
the existing business understanding fields — all extracted in a single
LLM call (`temperature=0.0`)
- Post-extraction validation filters out prompts exceeding 20 words and
slices to the top 3
- Model is now configurable via `tally_extraction_llm_model` setting
(defaults to `openai/gpt-4o-mini`)

**Backend – API endpoint** (`routes.py`):
- Added `GET /api/chat/suggested-prompts` (auth required)
- Returns `{prompts: string[]}` from the user's cached business
understanding (48h Redis TTL)
- Returns empty array if no understanding or no prompts exist

**Frontend** (`EmptySession/`):
- `helpers.ts`: Extracted defaults to `DEFAULT_QUICK_ACTIONS`,
`getQuickActions()` now accepts optional custom prompts and falls back
to defaults
- `EmptySession.tsx`: Calls `useGetV2GetSuggestedPrompts` hook
(`staleTime: Infinity`) and passes results to `getQuickActions()` with
hardcoded fallback
- Fixed `useEffect` resize handler that previously used
`window.innerWidth` as a dependency (re-ran every render); now uses a
proper resize event listener
- Added skeleton loading state while prompts are being fetched

**Generated** (`__generated__/`):
- Regenerated Orval API client with new endpoint types and hooks

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Backend format + lint + pyright pass
  - [x] Frontend format + lint pass
  - [x] All existing tally tests pass (28/28)
  - [x] All chat route tests pass (9/9)
  - [x] All invited_user tests pass (7/7)
- [x] E2E: New user with tally data sees custom prompts on copilot page
  - [x] E2E: User without tally data sees hardcoded default prompts
  - [x] E2E: Clicking a custom prompt sends it as a chat message
2026-03-13 12:11:31 +01:00
Otto
02645732b8 feat(backend/copilot): enable E2B auto_resume and reduce safety-net timeout (#12397)
Enable E2B `auto_resume` lifecycle option and reduce the safety-net
timeout from 3 hours to 5 minutes.

Currently, if the explicit per-turn `pause_sandbox_direct()` call fails
(process crash, network issue, fire-and-forget task cancellation), the
sandbox keeps running for up to **3 hours** before the safety-net
timeout fires. With this change, worst-case billing drops to **5
minutes**.

### Changes
- Add `auto_resume: True` to sandbox lifecycle config — paused sandboxes
wake transparently on SDK activity
- Reduce `e2b_sandbox_timeout` default from 10800s (3h) → 300s (5min)
- Add `e2b_sandbox_auto_resume` config field (default: `True`)
- Guard: `auto_resume` only added when `on_timeout == "pause"`

### What doesn't change
- Explicit per-turn `pause_sandbox_direct()` remains the primary
mechanism
- `connect()` / `_try_reconnect()` flow unchanged
- Redis key management unchanged
- No latency impact (resume is ~1-2s regardless of trigger)

### Risk
Very low — `auto_resume` is additive. If it doesn't work as advertised,
`connect()` still resumes paused sandboxes exactly as before.

Ref: https://e2b.dev/docs/sandbox/auto-resume
Linear: SECRT-2118

---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
2026-03-13 10:29:28 +00:00
Swifty
ba301a3912 feat(platform): add whitelisting-backed beta user provisioning (#12347)
### Changes 🏗️

- add invite-backed beta provisioning with a new `InvitedUser` platform
model, Prisma migration, and first-login activation path that
materializes `User`, `Profile`, `UserOnboarding`, and
`CoPilotUnderstanding`
- replace the legacy beta allowlist check with invite-backed gating for
email/password signup and Tally pre-seeding during activation
- add admin backend APIs and frontend `/admin/users` management UI for
listing, creating, revoking, retrying, and bulk-uploading invited users
- add the design doc for the beta invite system and extend backend
coverage for invite activation, bulk uploads, and auth-route behavior
- configuration changes: introduce the new invite/tally schema objects
and migration; no new env vars or docker service changes are required

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `cd autogpt_platform/backend && poetry run format`
- [x] `cd autogpt_platform/backend && poetry run pytest -q` (run against
an isolated local Postgres database with non-conflicting service port
overrides)

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
2026-03-13 10:25:49 +01:00
Abhimanyu Yadav
0cd9c0d87a fix(frontend): show sub-folders when navigating inside a folder (#12316)
## Summary

When opening a folder in the library, sub-folders were not displayed —
only agents were shown. This was caused by two issues:

1. The folder list query always fetched root-level folders (no
`parent_id` filter), so sub-folders were never requested
2. `showFolders` was set to `false` whenever a folder was selected,
hiding all folders from the view

### Changes 🏗️

- Pass `parent_id` to the `useGetV2ListLibraryFolders` hook so it
fetches child folders of the currently selected folder
- Remove the `!selectedFolderId` condition from `showFolders` so folders
render inside other folders
- Fetch the current folder via `useGetV2GetFolder` instead of searching
the (now differently-scoped) folder list
- Clean up breadcrumb: remove emoji icon, match folder name text size to
"My Library", replace `Button` with plain `<button>` to remove extra
padding/gap

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Open a folder in the library and verify sub-folders are displayed
  - [x] Verify agents inside the folder still display correctly
- [x] Verify breadcrumb shows folder name without emoji, matching "My
Library" text size
- [x] Verify clicking "My Library" in breadcrumb navigates back to root
  - [x] Verify root-level view still shows all top-level folders
  - [x] Verify favorites tab does not show folders
2026-03-13 04:40:09 +00:00
Zamil Majdy
a083493aa2 fix(backend/copilot): auto-parse structured data and robust type coercion (#12390)
The copilot's `@@agptfile:` reference system always produces strings
when expanding
file references. This breaks blocks that expect structured types — e.g.
`GoogleSheetsWriteBlock` expects `values: list[list[str]]`, but receives
a raw CSV
string instead. Additionally, the copilot's input coercion was
duplicating logic from
the executor instead of reusing the shared `convert()` utility, and the
coercion had
no type-aware gating — it would always call `convert()`, which could
incorrectly
transform values that already matched the expected type (e.g.
stringifying a valid
`list[str]` in a `str | list[str]` union).

### Changes 🏗️

**Structured data parsing for `@@agptfile:` bare references:**
- When an entire tool argument value is a bare `@@agptfile:` reference,
the resolved
content is now auto-parsed: JSON → native types, CSV/TSV →
`list[list[str]]`
- Embedded references within larger strings still do plain text
substitution
- Updated copilot system prompt to document the structured data
capability

**Shared type coercion utility (`coerce_inputs_to_schema`):**
- Extracted `coerce_inputs_to_schema()` into `backend/util/type.py` —
shared by both
  the executor's `validate_exec()` and the copilot's `execute_block()`
- Uses Pydantic `model_fields` (not `__annotations__`) to include
inherited fields
- Added `_value_satisfies_type()` gate: only calls `convert()` when the
value doesn't
already match the target type, including recursive inner-element
checking for generics

**`_value_satisfies_type` — recursive type checking:**
- Handles `Any`, `Optional`, `Union`, `list[T]`, `dict[K,V]`, `set[T]`,
`tuple[T, ...]`,
  heterogeneous `tuple[str, int, bool]`, bare generics, nested generics
- Guards against non-runtime origins (`Literal`, etc.) to prevent
`isinstance()` crashes
- Returns `False` (not `True`) for unhandled generic origins as a safe
fallback

**Test coverage:**
- 51 new tests for `_value_satisfies_type` and `coerce_inputs_to_schema`
in `type_test.py`
- 8 new tests for `execute_block` type coercion in `helpers_test.py`

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] All existing file_ref tests pass
- [x] All new type_test.py tests pass (51 tests covering
_value_satisfies_type and coerce_inputs_to_schema)
- [x] All new helpers_test.py tests pass (8 tests covering execute_block
coercion)
  - [x] `poetry run format` passes clean
  - [x] `poetry run lint` passes clean
  - [x] Pyright type checking passes
2026-03-12 19:27:41 +00:00
Zamil Majdy
c51dc7ad99 fix(backend): agent generator sets invalid model on PerplexityBlocks (#12391)
Fixes the agent generator setting `gpt-5.2-2025-12-11` (or `gpt-4o`) as
the model for PerplexityBlocks instead of valid Perplexity models,
causing 100% failure rate for agents using Perplexity blocks.

### Changes 🏗️

- **Fixer: block-aware model validation** — `fix_ai_model_parameter()`
now reads the block's `inputSchema` to check for `enum` constraints on
the model field. Blocks with their own model enum (PerplexityBlock,
IdeogramBlock, CodexBlock, etc.) are validated against their own allowed
values with the correct default, instead of the hardcoded generic set
(`gpt-4o`, `claude-opus-4-6`). This also fixes `edit_agent` which runs
through the same fixer pipeline.
- **PerplexityBlock: runtime fallback** — Added a `field_validator` on
the model field that gracefully falls back to `SONAR` instead of
crashing when an invalid model value is encountered at runtime. Also
overrides `validate_data` to sanitize invalid model values *before* JSON
schema validation (which runs in `Block._execute` before Pydantic
instantiation), ensuring the fallback is actually reachable during block
execution.
- **DB migration** — Fixes existing PerplexityBlock nodes with invalid
model values in both `AgentNode.constantInput` and
`AgentNodeExecutionInputOutput` (preset overrides), matching the pattern
from the Gemini migration.
- **Tests** — Fixer tests for block-specific enum validation, plus
`validate_data`-level tests ensuring invalid models are sanitized before
JSON schema validation rejects them.

Resolves [SECRT-2097](https://linear.app/autogpt/issue/SECRT-2097)

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] All existing + new fixer tests pass
  - [x] PerplexityBlock block test passes
- [x] 11 perplexity_test.py tests pass (field_validator + validate_data
paths)
- [x] Verified invalid model (`gpt-5.2-2025-12-11`) falls back to
`perplexity/sonar` at runtime
  - [x] Verified valid Perplexity models are preserved by the fixer
  - [x] Migration covers both constantInput and preset overrides
2026-03-12 18:54:18 +00:00
Krzysztof Czerwinski
bc6b82218a feat(platform): add autopilot notification system (#12364)
Adds a notification system for the Copilot (AutoPilot) so users know
when background chats finish processing — via in-app indicators, sounds,
browser notifications, and document title badges.

### Changes 🏗️

**Backend**
- Add `is_processing` field to `SessionSummaryResponse` — batch-checks
Redis for active stream status on each session in the list endpoint
- Fix `is_processing` always returning `false` due to bytes vs string
comparison (`b"running"` → `"running"`) with `decode_responses=True`
Redis client
- Add `CopilotCompletionPayload` model for WebSocket notification events
- Publish `copilot_completion` notification via WebSocket when a session
completes in `stream_registry.mark_session_completed`

**Frontend — Notification UI**
- Add `NotificationBanner` component — amber banner prompting users to
enable browser notifications (auto-hides when already enabled or
dismissed)
- Add `NotificationDialog` component — modal dialog for enabling
notifications, supports force-open from sidebar menu for testing
- Fix repeated word "response" in dialog copy

**Frontend — Sidebar**
- Add bell icon in sidebar header with popover menu containing:
- Notifications toggle (requests browser permission on enable; shows
toast if denied)
  - Sound toggle (disabled when notifications are off)
  - "Show notification popup" button (for testing the dialog)
  - "Clear local data" button (resets all copilot localStorage keys)
- Bell icon states: `BellSlash` (disabled), `Bell` (enabled, no sound),
`BellRinging` (enabled + sound)
- Add processing indicator (PulseLoader) and completion checkmark
(CheckCircle) inline with chat title, to the left of the hamburger menu
- Processing indicator hides immediately when completion arrives (no
overlap with checkmark)
- Fix PulseLoader initial flash — start at `scale(0); opacity: 0` with
smoother keyframes
- Add 10s polling (`refetchInterval`) to session list so `is_processing`
updates automatically
- Clear document title badge when navigating to a completed chat
- Remove duplicate "Your chats" heading that appeared in both
SidebarHeader and SidebarContent

**Frontend — Notification Hook (`useCopilotNotifications`)**
- Listen for `copilot_completion` WebSocket events
- Track completed sessions in Zustand store
- Play notification sound (only for background sessions, not active
chat)
- Update `document.title` with unread count badge
- Send browser `Notification` when tab is hidden, with click-to-navigate
to the completed chat
- Reset document title on tab focus

**Frontend — Store & Storage**
- Add `completedSessionIDs`, `isNotificationsEnabled`, `isSoundEnabled`,
`showNotificationDialog`, `clearCopilotLocalData` to Zustand store
- Persist notification and sound preferences in localStorage
- On init, validate `isNotificationsEnabled` against actual
`Notification.permission`
- Add localStorage keys: `COPILOT_NOTIFICATIONS_ENABLED`,
`COPILOT_SOUND_ENABLED`, `COPILOT_NOTIFICATION_BANNER_DISMISSED`,
`COPILOT_NOTIFICATION_DIALOG_DISMISSED`

**Mobile**
- Add processing/completion indicators and sound toggle to MobileDrawer

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Open copilot, start a chat, switch to another chat — verify
processing indicator appears on the background chat
- [x] Wait for background chat to complete — verify checkmark appears,
processing indicator disappears
- [x] Enable notifications via bell menu — verify browser permission
prompt appears
- [x] With notifications enabled, complete a background chat while on
another tab — verify system notification appears with sound
- [x] Click system notification — verify it navigates to the completed
chat
- [x] Verify document title shows unread count and resets when
navigating to the chat or focusing the tab
  - [x] Toggle sound off — verify no sound plays on completion
- [x] Toggle notifications off — verify no sound, no system
notification, no badge
  - [x] Clear local data — verify all preferences reset
- [x] Verify notification banner hides when notifications already
enabled
- [x] Verify dialog auto-shows for first-time users and can be
force-opened from menu

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 14:03:24 +00:00
Otto
83e49f71cd fix(frontend): pass through Supabase error params in password reset callback (#12384)
When Supabase rejects a password reset token (expired, already used,
etc.), it redirects to the callback URL with `error`, `error_code`, and
`error_description` params instead of a `code`. Previously, the callback
only checked for `!code` and returned a generic "Missing verification
code" error, swallowing the actual Supabase error.

This meant the `ExpiredLinkMessage` UX (added in SECRT-1369 / #12123)
was never triggered for these cases — users just saw the email input
form again with no explanation.

Now the callback reads Supabase's error params and forwards them to
`/reset-password`, where the existing expired link detection picks them
up correctly.

**Note:** This doesn't fix the root cause of Pwuts's token expiry issue
(likely link preview/prefetch consuming the OTP), but it ensures users
see the proper "link expired" message with a "Request new link" button
instead of a confusing silent redirect.

---
Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>
2026-03-12 13:51:15 +00:00
Bently
ef446e4fe9 feat(llm): Add Cohere Command A Family Models (#12339)
## Summary
Adds the Cohere Command A family of models to AutoGPT Platform with
proper pricing configuration.

## Models Added
- **Command A 03.2025**: Flagship model (256k context, 8k output) - 3
credits
- **Command A Translate 08.2025**: State-of-the-art translation (8k
context, 8k output) - 3 credits
- **Command A Reasoning 08.2025**: First reasoning model (256k context,
32k output) - 6 credits
- **Command A Vision 07.2025**: First vision-capable model (128k
context, 8k output) - 3 credits

## Changes
- Added 4 new LlmModel enum entries with proper OpenRouter model IDs
- Added ModelMetadata for each model with correct context windows,
output limits, and price tiers
- Added pricing configuration in block_cost_config.py

## Testing
- [ ] Models appear in AutoGPT Platform model selector
- [ ] Pricing is correctly applied when using models

Resolves **SECRT-2083**
2026-03-12 11:56:30 +00:00
Bently
7b1e8ed786 feat(llm): Add Microsoft Phi-4 model support (#12342)
## Changes
- Added `MICROSOFT_PHI_4` to LlmModel enum (`microsoft/phi-4`)
- Configured model metadata:
  - 16K context window
  - 16K max output tokens
  - OpenRouter provider
- Set cost tier: 1
  - Input: $0.06 per 1M tokens
  - Output: $0.14 per 1M tokens

## Details
Microsoft Phi-4 is a 14B parameter model available through OpenRouter.
This PR adds proper support in the autogpt_platform backend.

Resolves SECRT-2086
2026-03-12 11:15:27 +00:00
Abhimanyu Yadav
7ccfff1040 feat(frontend): add credential type selector for multi-auth providers (#12378)
### Changes

- When a provider supports multiple credential types (e.g. GitHub with
both OAuth and API Key),
clicking "Add credential" now opens a tabbed dialog where users can
choose which type to use.
  Previously, OAuth always took priority and API key was unreachable.
- Each credential in the list now shows a type-specific icon (provider
icon for OAuth, key for API Key,
password/lock for others) and a small label badge (e.g. "API Key",
"OAuth").
- The native dropdown options also include the credential type in
parentheses for clarity.
- Single credential type providers behave exactly as before — no dialog,
direct action.


https://github.com/user-attachments/assets/79f3a097-ea97-426b-a2d9-781d7dcdb8a4



  ## Test plan
- [x] Test with a provider that has only one credential type (e.g.
OpenAI with api_key only) — should
  behave as before
- [x] Test with a provider that has multiple types (e.g. GitHub with
OAuth + API Key configured) —
  should show tabbed dialog
  - [x] Verify OAuth tab triggers the OAuth flow correctly
  - [x] Verify API Key tab shows the inline form and creates credentials
  - [x] Verify credential list shows correct icons and type badges
  - [x] Verify dropdown options show type in parentheses
2026-03-12 10:17:58 +00:00
Otto
81c7685a82 fix(frontend): release test fixes — scheduler time picker, unpublished banner (#12376)
Two frontend fixes from release testing (2026-03-11):

**SECRT-2102:** The schedule dialog shows an "At [hh]:[mm]" time picker
when selecting Custom > Every x Minutes or Hours, which makes no sense
for sub-day intervals. Now only shows the time picker for Custom > Days
and other frequency types.

**SECRT-2103:** The "Unpublished changes" banner shows for agents the
user doesn't own or create. Root cause: `owner_user_id` is the library
copy owner, not the graph creator. Changed to use `can_access_graph`
which correctly reflects write access.

---
Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>

---------

Co-authored-by: Reinier van der Leer (@Pwuts) <reinier@agpt.co>
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-03-12 10:02:26 +00:00
Bently
3595c6e769 feat(llm): add Perplexity Sonar Reasoning Pro model (#12341)
## Summary
Adds support for Perplexity's new reasoning model:
`perplexity/sonar-reasoning-pro`

## Changes
-  Added `PERPLEXITY_SONAR_REASONING_PRO` to `LlmModel` enum
-  Added model metadata (128K context window, 8K max output tokens,
tier 2)
-  Set pricing at 5 credits (matches sonar-pro tier)

## Model Details
- **Model ID:** `perplexity/sonar-reasoning-pro`
- **Provider:** OpenRouter
- **Context Window:** 128,000 tokens
- **Max Output:** 8,000 tokens
- **Pricing:** $0.000002/token (prompt), $0.000008/token (completion)
- **Cost Tier:** 2 (5 credits)

## Testing
-  Black formatting passed
-  Ruff linting passed

Resolves SECRT-2084
2026-03-12 09:58:29 +00:00
Abhimanyu Yadav
1c2953d61b fix(frontend): restore broken tutorial in builder (#12377)
### Changes
- Restored missing `shepherd.js/dist/css/shepherd.css` base styles
import
- Added missing .new-builder-tutorial-disable and
.new-builder-tutorial-highlight CSS classes to
  tutorial.css
- Fixed getFormContainerSelector() to include -node suffix matching the
actual DOM attribute

###  What broke
The old legacy-builder/tutorial.ts was the only file importing
Shepherd's base CSS. When #12082 removed
the legacy builder, the new tutorial lost all base Shepherd styles
(close button positioning, modal
overlay, tooltip layout). The new tutorial's custom CSS overrides
depended on these base styles
  existing.

  Test plan
  - [x] Start the tutorial from the builder (click the chalkboard icon)
- [x] Verify the close (X) button is positioned correctly in the
top-right of the popover
  - [x] Verify the modal overlay dims the background properly
- [x] Verify element highlighting works when the tutorial points to
blocks/buttons
- [x] Verify non-target blocks are grayed out during the "select
calculator" step
- [x] Complete the full tutorial flow end-to-end (add block → configure
→ connect → save → run)
2026-03-12 09:23:34 +00:00
Zamil Majdy
755bc84b1a fix(copilot): replace MCP jargon with user-friendly language (#12381)
Closes SECRT-2105

### Changes 🏗️

Replace all user-facing MCP technical terminology with plain, friendly
language across the CoPilot UI and LLM prompting.

**Backend (`run_mcp_tool.py`)**
- Added `_service_name()` helper that extracts a readable name from an
MCP host (`mcp.sentry.dev` → `Sentry`)
- `agent_name` in `SetupRequirementsResponse`: `"MCP: mcp.sentry.dev"` →
`"Sentry"`
- Auth message: `"The MCP server at X requires authentication. Please
connect your credentials to continue."` → `"To continue, sign in to
Sentry and approve access."`

**Backend (`mcp_tool_guide.md`)**
- Added "Communication style" section with before/after examples to
teach the LLM to avoid "MCP server", "OAuth", "credentials" jargon in
responses to users

**Frontend (`MCPSetupCard.tsx`)**
- Button: `"Connect to mcp.sentry.dev"` → `"Connect Sentry"`
- Connected state: `"Connected to mcp.sentry.dev!"` → `"Connected to
Sentry!"`
- Retry message: `"I've connected the MCP server credentials. Please
retry."` → `"I've connected. Please retry."`

**Frontend (`helpers.tsx`)**
- Added `serviceNameFromHost()` helper (exported, mirrors the backend
logic)
- Run text: `"Discovering MCP tools on mcp.sentry.dev"` → `"Connecting
to Sentry…"`
- Run text: `"Connecting to MCP server"` → `"Connecting…"`
- Run text: `"Connect to MCP: mcp.sentry.dev"` → `"Connect Sentry"`
(uses `agent_name` which is now just `"Sentry"`)
- Run text: `"Discovered N tool(s) on mcp.sentry.dev"` → `"Connected to
Sentry"`
- Error text: `"MCP error"` → `"Connection error"`

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [ ] Open CoPilot and ask it to connect to a service (e.g. Sentry,
Notion)
- [ ] Verify the run text accordion title shows `"Connecting to
Sentry…"` instead of `"Discovering MCP tools on mcp.sentry.dev"`
- [ ] Verify the auth card button shows `"Connect Sentry"` instead of
`"Connect to mcp.sentry.dev"`
- [ ] Verify the connected state shows `"Connected to Sentry!"` instead
of `"Connected to mcp.sentry.dev!"`
- [ ] Verify the LLM response text avoids "MCP server", "OAuth",
"credentials" terminology
2026-03-12 08:54:15 +00:00
Bently
ade2baa58f feat(llm): Add Grok 3 model support (#12343)
## Summary
Adds support for xAI's Grok 3 model to AutoGPT.

## Changes
- Added `GROK_3` to `LlmModel` enum with identifier `x-ai/grok-3`
- Configured model metadata:
  - Context window: 131,072 tokens (128k)
  - Max output: 32,768 tokens (32k)  
  - Provider: OpenRouter
  - Creator: xAI
  - Price tier: 2 (mid-tier)
- Set model cost to 3 credits (mid-tier pricing between fast models and
Grok 4)
- Updated block documentation to include Grok 3 in model lists

## Pricing Rationale
- **Grok 4**: 9 credits (tier 3 - premium, 256k context)
- **Grok 3**: 3 credits (tier 2 - mid-tier, 128k context) ← NEW
- **Grok 4 Fast/4.1 Fast/Code Fast**: 1 credit (tier 1 - affordable)

Grok 3 is positioned as a mid-tier model, priced similarly to other tier
2 models.

## Testing
- [x] Code passes `black` formatting
- [x] Code passes `ruff` linting
- [x] Model metadata and cost configuration added
- [x] Documentation updated

Closes SECRT-2079
2026-03-12 07:31:59 +00:00
Reinier van der Leer
4d35534a89 Merge branch 'master' into dev 2026-03-11 22:26:35 +01:00
Zamil Majdy
2cc748f34c chore(frontend): remove accidentally committed generated file (#12373)
`responseType.ts` was accidentally committed inside
`src/app/api/__generated__/models/` despite that directory being listed
in `.gitignore` (added in PR #12238).

### Changes 🏗️

- Removes
`autogpt_platform/frontend/src/app/api/__generated__/models/responseType.ts`
from git tracking — the file is already covered by the `.gitignore` rule
`src/app/api/__generated__/` and should never have been committed.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] No functional changes — only removes a stale tracked file that is
already gitignored
2026-03-11 14:22:37 +00:00
Shunyu Wu
c2e79fa5e1 fix(gmail): fallback to raw HTML when html2text conversion fails (#12369)
## Summary
- keep Gmail body extraction resilient when `html2text` converter raises
- fallback to raw HTML instead of failing extraction
- add regression test for converter failure path

Closes #12368

## Testing
- added unit test in
`autogpt_platform/backend/test/blocks/test_gmail.py`

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-11 11:46:57 +00:00
Bently
89a5b3178a fix(llm): Update Gemini model lineup - add 3.1 models, deprecate 3 Pro Preview (#12331)
## 🔴 URGENT: Gemini 3 Pro Preview Shutdown - March 9, 2026

Google is shutting down Gemini 3 Pro Preview **tomorrow (March 9,
2026)**. This PR addresses SECRT-2067 by updating the Gemini model
lineup to prevent disruption.

---

## Changes

###  P0 - Critical (This Week)
- [x] **Remove/Replace Gemini 3 Pro Preview** → Migrated to 3.1 Pro
Preview
- [x] **Add Gemini 3.1 Pro Preview** (released Feb 19, 2026)

###  P1 - High Priority  
- [x] **Add Gemini 3.1 Flash Lite Preview** (released Mar 3, 2026)
- [x] **Add Gemini 3 Flash Preview** (released Dec 17, 2025)

###  P2 - Medium Priority
- [x] **Add Gemini 2.5 Pro (stable/GA)** (released Jun 17, 2025)

---

## Model Details

| Model | Context | Input Cost | Output Cost | Price Tier |
|-------|---------|------------|-------------|------------|
| **Gemini 3.1 Pro Preview** | 1.05M | $2.00/1M | $12.00/1M | 2 |
| **Gemini 3.1 Flash Lite Preview** | 1.05M | $0.25/1M | $1.50/1M | 1 |
| **Gemini 3 Flash Preview** | 1.05M | $0.50/1M | $3.00/1M | 1 |
| **Gemini 2.5 Pro (GA)** | 1.05M | $1.25/1M | $10.00/1M | 2 |
| ~~Gemini 3 Pro Preview~~ | ~~1.05M~~ | ~~$2.00/1M~~ | ~~$12.00/1M~~ |
**DEPRECATED** |

---

## Migration Strategy

**Database Migration:**
`20260308095500_migrate_deprecated_gemini_3_pro_preview`

- Automatically migrates all existing graphs using
`google/gemini-3-pro-preview` to `google/gemini-3.1-pro-preview`
- Updates: AgentBlock, AgentGraphExecution, AgentNodeExecution,
AgentGraph
- Zero user-facing disruption
- Migration runs on next deployment (before March 9 shutdown)

---

## Testing

- [ ] Verify new models appear in LLM block dropdown
- [ ] Test migration on staging database
- [ ] Confirm existing graphs using deprecated model auto-migrate
- [ ] Validate cost calculations for new models

---

## References

- **Linear Issue:**
[SECRT-2067](https://linear.app/autogpt/issue/SECRT-2067)
- **OpenRouter Models:** https://openrouter.ai/models/google
- **Google Deprecation Notice:**
https://ai.google.dev/gemini-api/docs/deprecations

---

## Checklist

- [x] Models added to `LlmModel` enum
- [x] Model metadata configured
- [x] Cost config updated
- [x] Database migration created
- [x] Deprecated model commented out (not removed for historical
reference)
- [ ] PR reviewed and approved
- [ ] Merged before March 9, 2026 deadline

---

**Priority:** 🔴 Critical - Must merge before March 9, 2026
2026-03-11 11:21:16 +00:00
Abhimanyu Yadav
c62d9a24ff fix(frontend): show correct status in agent submission view modal (#12360)
### Changes 🏗️
- The "View" modal for agent submissions hardcoded "Agent is awaiting
review" regardless of actual status
- Now displays "Agent approved", "Agent rejected", or "Agent is awaiting
review" based on the submission's actual status
- Shows review feedback in a highlighted section for rejected agents
when review comments are available

<img width="1127" height="788" alt="Screenshot 2026-03-11 at 9 02 29 AM"
src="https://github.com/user-attachments/assets/840e0fb1-22c2-4fda-891b-967c8b8b4043"
/>
<img width="1105" height="680" alt="Screenshot 2026-03-11 at 9 02 46 AM"
src="https://github.com/user-attachments/assets/f0c407e6-c58e-4ec8-9988-9f5c69bfa9a7"
/>

  ## Test plan
- [x] Submit an agent and verify the view modal shows "Agent is awaiting
review"
- [x] View an approved agent submission and verify it shows "Agent
approved"
- [x] View a rejected agent submission and verify it shows "Agent
rejected"
- [x] View a rejected agent with review comments and verify the feedback
section appears

  Closes SECRT-2092
2026-03-11 10:08:17 +00:00
Abhimanyu Yadav
0e0bfaac29 fix(frontend): show specific error messages for store image upload failures (#12361)
### Changes
- Surface backend error details (file size limit, invalid file type,
virus detected, etc.) in the upload failed toast instead of showing a
generic "Upload Failed" message
- The backend already returns specific error messages (e.g., "File too
large. Maximum size is 50MB") but the frontend was discarding them with
a catch-all handler
  
<img width="1222" height="411" alt="Screenshot 2026-03-11 at 9 13 30 AM"
src="https://github.com/user-attachments/assets/34ab3d90-fffa-4788-917a-fe2a7f4144b9"
/>

  ## Test plan
- [x] Upload an image larger than 50MB to a store submission → should
see "File too large. Maximum size is 50MB"
- [x] Upload an unsupported file type → should see file type error
message
  - [x] Upload a valid image → should still work normally

  Resolves SECRT-2093
2026-03-11 10:07:37 +00:00
Bently
0633475915 fix(frontend/library): graceful schedule deletion with auto-selection (#12278)
### Motivation 🎯

Fixes the issue where deleting a schedule shows an error screen instead
of gracefully handling the deletion. Previously, when a user deleted a
schedule, a race condition occurred where the query cache refetch
completed before the URL
state updated, causing the component to try rendering a schedule that no
longer existed (resulting in a 404 error screen).

### Changes 🏗️

**1. Fixed deletion order to prevent error screen flash**
- `useSelectedScheduleActions.ts` - Call `onDeleted()` callback
**before** invalidating queries to clear selection first
- `ScheduleActionsDropdown.tsx` - Same fix for sidebar dropdown deletion

**2. Added smart auto-selection logic**
- `useNewAgentLibraryView.ts`:
  - Added query to fetch current schedules list
  - Added `handleScheduleDeleted(deletedScheduleId)` function that:
    - Auto-selects the first remaining schedule if others exist
    - Clears selection to show empty state if no schedules remain

**3. Wired up smart deletion handler throughout component tree**
- `NewAgentLibraryView.tsx` - Passes `handleScheduleDeleted` to child
components
- `SelectedScheduleView.tsx` - Changed callback from
`onClearSelectedRun` to `onScheduleDeleted` and passes schedule ID
- `SidebarRunsList.tsx` - Added `onScheduleDeleted` prop and passes it
through to list items

### Checklist 📋

**Test Plan:**
- [] Create 2-3 test schedules for an agent
- [] Delete a schedule from the detail view (trash icon in actions) when
other schedules exist → Verify next schedule auto-selects without error
- [] Delete a schedule from the sidebar dropdown (three-dot menu) when
other schedules exist → Verify next schedule auto-selects without error
- [] Delete all schedules until only one remains → Verify empty state
shows gracefully without error
- [] Verify "Schedule deleted" toast appears on successful deletion
- [] Verify no error screen appears at any point during deletion flow
2026-03-11 09:01:55 +00:00
Bently
34a2f9a0a2 feat(llm): add Mistral flagship models (Large 3, Medium 3.1, Small 3.2, Codestral) (#12337)
## Summary

Adds four missing Mistral AI flagship models to address the critical
coverage gap identified in
[SECRT-2082](https://linear.app/autogpt/issue/SECRT-2082).

## Models Added

| Model | Context | Max Output | Price Tier | Use Case |
|-------|---------|------------|------------|----------|
| **Mistral Large 3** | 262K | None | 2 (Medium) | Flagship reasoning
model, 41B active params (675B total), MoE architecture |
| **Mistral Medium 3.1** | 131K | None | 2 (Medium) | Balanced
performance/cost, 8x cheaper than traditional large models |
| **Mistral Small 3.2** | 131K | 131K | 1 (Low) | Fast, cost-efficient,
high-volume use cases |
| **Codestral 2508** | 256K | None | 1 (Low) | Code generation
specialist (FIM, correction, test gen) |

## Problem

Previously, the platform only offered:
- Mistral Nemo (1 official model)
- dolphin-mistral (third-party Ollama fine-tune)

This left significant gaps in Mistral's lineup, particularly:
- No flagship reasoning model
- No balanced mid-tier option
- No code-specialized model
- Missing multimodal capabilities (Large 3, Medium 3.1, Small 3.2 all
support text+image)

## Changes

**File:** `autogpt_platform/backend/backend/blocks/llm.py`

- Added 4 enum entries in `LlmModel` class
- Added 4 metadata entries in `MODEL_METADATA` dict
- All models use OpenRouter provider
- Follows existing pattern for model additions

## Testing

-  Enum values match OpenRouter model IDs
-  Metadata follows existing format
-  Context windows verified from OpenRouter API
-  Price tiers assigned appropriately

## Closes

- SECRT-2082

---

**Note:** All models are available via OpenRouter and tested. This
brings Mistral coverage in line with other major providers (OpenAI,
Anthropic, Google).
2026-03-11 08:48:48 +00:00
Zamil Majdy
9f4caa7dfc feat(blocks): add and harden GitHub blocks for full-cycle development (#12334)
## Summary
- Add 8 new GitHub blocks: GetRepositoryInfo, ForkRepository,
ListCommits, SearchCode, CompareBranches, GetRepositoryTree,
MultiFileCommit, MergePullRequest
- Split `repo.py` (2094 lines, 19 blocks) into domain-specific modules:
`repo.py`, `repo_branches.py`, `repo_files.py`, `commits.py`
- Concurrent blob creation via `asyncio.gather()` in MultiFileCommit
- URL-encode branch/ref params via `urllib.parse.quote()` for
defense-in-depth
- Step-level error handling in MultiFileCommit ref update with recovery
SHA
- Collapse FileOperation CREATE/UPDATE into UPSERT (Git Trees API treats
them identically)
- Add `ge=1, le=100` constraints on per_page SchemaFields
- Preserve URL scheme in `prepare_pr_api_url`
- Handle null commit authors gracefully in ListCommits
- Add unit tests for `prepare_pr_api_url`, error-path tests for
MergePR/MultiFileCommit, FileOperation enum validation tests

## Test plan
- [ ] Block tests pass for all 19 GitHub blocks (CI:
`test_available_blocks`)
- [ ] New test file `test_github_blocks.py` passes (prepare_pr_api_url,
error paths, enum)
- [ ] `check-docs-sync` passes with regenerated docs
- [ ] pyright/ruff clean on all changed files
2026-03-11 08:35:37 +00:00
Otto
0876d22e22 feat(frontend/copilot): improve TTS voice selection to avoid robotic voices (#12317)
Requested by @0ubbe

Refines the `pickBestVoice()` function to ensure non-robotic voices are
always preferred:

- **Filter out known low-quality engines** — eSpeak, Festival, MBROLA,
Flite, and Pico voices are deprioritized
- **Prefer remote/cloud-backed voices** — `localService: false` voices
are typically higher quality
- **Expand preferred voices list** — added Moira, Tessa (macOS), Jenny,
Aria, Guy (Windows OneCore)
- **Smarter fallback chain** — English default → English → any default →
first available

The previous fallback could select eSpeak or Festival voices on Linux
systems, resulting in robotic output. Now those are filtered out unless
they're the only option.

---
Co-authored-by: Ubbe <ubbe@users.noreply.github.com>

---------

Co-authored-by: Ubbe <hi@ubbe.dev>
Co-authored-by: Lluis Agusti <hi@llu.lu>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 16:47:42 +08:00
Zamil Majdy
15e3980d65 fix(frontend): buffer workspace file downloads to prevent truncation (#12349)
## Summary
- Workspace file downloads (images, CSVs, etc.) were silently truncated
(~10 KB lost from the end) when served through the Next.js proxy
- Root cause: `new NextResponse(response.body)` passes a
`ReadableStream` directly, which Next.js / Vercel silently truncates for
larger files
- Fix: fully buffer with `response.arrayBuffer()` before forwarding, and
set `Content-Length` from the actual buffer size
- Keeps the auth proxy intact — no signed URLs (which would be public
and expire, breaking chat history)

## Root cause verification
Confirmed locally on session `080f27f9-0379-4085-a67a-ee34cc40cd62`:
- Backend `write_workspace_file` logs **978,831 bytes** written
- Direct backend download (`curl
localhost:8006/api/workspace/files/.../download`): **978,831 bytes** 
- Browser download through Next.js proxy: **truncated** 

## Why not signed URLs?
- Signed URLs are effectively public — anyone with the link can download
the file (privacy concern)
- Signed URLs expire, but chat history persists — reopening a
conversation later would show broken downloads
- Buffering is fine: workspace files are capped at 100 MB, Vercel
function memory is 1 GB

## Related
- Discord thread: `#Truncated File Bug` channel
- Related PR #12319 (signed URL approach) — this fix is simpler and
preserves auth

## Test plan
- [ ] Download a workspace file (CSV, PNG, any type) through the copilot
UI
- [ ] Verify downloaded file size matches the original
- [ ] Verify PNGs open correctly and CSVs have all rows

cc @Swiftyos @uberdot @AdarshRawat1
2026-03-10 18:23:51 +00:00
Zamil Majdy
fe9eb2564b feat(copilot): HITL review for sensitive block execution (#12356)
## Summary
- Integrates existing Human-In-The-Loop (HITL) review infrastructure
into CoPilot's direct block execution (`run_block`) for blocks marked
with `is_sensitive_action=True`
- Removes the `PendingHumanReview → AgentGraphExecution` FK constraint
to support synthetic CoPilot session IDs (migration included)
- Adds `ReviewRequiredResponse` model + frontend `ReviewRequiredCard`
component to surface review status in the chat UI
- Auto-approval works within a CoPilot session: once a block is
approved, subsequent executions of the same block in the same session
are auto-approved (using `copilot-session-{session_id}` as
`graph_exec_id` and `copilot-node-{block_id}` as `node_id`)

## Test plan
- [x] All 11 `run_block_test.py` tests pass (3 new sensitive action
tests)
- [ ] Manual: Execute a block with `is_sensitive_action=True` in CoPilot
→ verify ReviewRequiredResponse is returned and rendered
- [ ] Manual: Approve in review panel → re-execute the same block →
verify auto-approval kicks in
- [ ] Manual: Verify non-sensitive blocks still execute without review
2026-03-10 18:20:11 +00:00
Otto
5641cdd3ca fix(backend): update test patches for validate_url → validate_url_host rename (#12358)
bfb843a renamed `validate_url` to `validate_url_host` in
`agent_browser`, `run_mcp_tool`, and MCP routes, but the corresponding
test files still patched the old name, causing `AttributeError` in CI.

Updates all mock patch targets and assertions across 3 test files:
- `agent_browser_test.py`
- `test_run_mcp_tool.py`  
- `mcp/test_routes.py`

---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>
2026-03-10 17:22:11 +00:00
Otto
bfb843a56e Merge commit from fork
* Fix SSRF via user-controlled ollama_host field

Validate ollama_host against BLOCKED_IP_NETWORKS before passing to
ollama.AsyncClient(). The server-configured default (env: OLLAMA_HOST)
is allowed without validation; user-supplied values that differ are
checked for private/internal IP resolution.

Fixes GHSA-6jx2-4h7q-3fx3

* Generalize validate_ollama_host to validate_host; fix description line length

* Rename to validate_untrusted_host with whitelist parameter

* Apply PR suggestion: include whitelist in error message; run formatting

* Move whitelist check after URL normalization; match on netloc

* revert unrelated formatting changes

* Dedup validate_url and validate_untrusted_host; normalize whitelist

* Move _resolve_and_check_blocked after calling functions

* dedup and clean up

* make trusted_hostnames truly optional

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-03-10 15:51:58 +01:00
Abhimanyu Yadav
684845d946 fix(frontend/builder): handle discriminated unions and improve node layout (#12354)
## Summary
- **Discriminated union support (oneOf)**: Added a new `OneOfField`
component that properly
renders Pydantic discriminated unions. Hides the unusable parent object
handle, auto-populates
the discriminator value, shows a dropdown with variant titles (e.g.,
"Username" / "UserId"), and
filters out the internal discriminator field from the form.
Non-discriminated `oneOf` schemas
  fall back to existing `AnyOfField` behavior.
- **Collapsible object outputs**: Object-type outputs with nested keys
(e.g.,
`PersonLookupResponse.Url`, `PersonLookupResponse.profile`) are now
collapsed by default behind a
caret toggle. Nested keys show short names instead of the full
`Parent.Key` prefix.
- **Node layout cleanup**: Removed excessive bottom margin (`mb-6`) from
`FormRenderer`, hide the
Advanced toggle when no advanced fields exist, and add rounded bottom
corners on OUTPUT-type
  blocks.

<img width="440" height="427" alt="Screenshot 2026-03-10 at 11 31 55 AM"
src="https://github.com/user-attachments/assets/06cc5414-4e02-4371-bdeb-1695e7cb2c97"
/>
<img width="371" height="320" alt="Screenshot 2026-03-10 at 11 36 52 AM"
src="https://github.com/user-attachments/assets/1a55f87a-c602-4f4d-b91b-6e49f810e5d5"
/>

  ## Test plan
- [x] Add a Twitter Get User block — verify "Identifier" shows a
dropdown (Username/UserId) with
no unusable parent handle, discriminator field is hidden, and the block
can run without staying
  INCOMPLETE
- [x] Add any block with object outputs (e.g., PersonLookupResponse) —
verify nested keys are
  collapsed by default and expand on click with short labels
- [x] Verify blocks without advanced fields don't show the Advanced
toggle
- [x] Verify existing `anyOf` schemas (optional types, 3+ variant
unions) still render correctly
  - [x] Check OUTPUT-type blocks have rounded bottom corners

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
Co-authored-by: eureka928 <meobius123@gmail.com>
2026-03-10 14:13:32 +00:00
Bently
6a6b23c2e1 fix(frontend): Remove unused Otto Server Action causing 107K+ errors (#12336)
## Summary

Fixes [OPEN-3025](https://linear.app/autogpt/issue/OPEN-3025) —
**107,571+ Server Action errors** in production

Removes the orphaned `askOtto` Server Action that was left behind after
the Otto chat widget removal in PR #12082.

## Problem

Next.js Server Actions that are never imported are excluded from the
server manifest. Old client bundles still reference the action ID,
causing "not found" errors.

**Sentry impact:**
- **BUILDER-3BN:** 107,571 events
- **BUILDER-729:** 285 events  
- **BUILDER-3QH:** 1,611 events
- **36+ users affected**

## Root Cause

1. **Mar 2025:** Otto widget added to `/build` page with `askOtto`
Server Action
2. **Feb 2026:** Otto widget removed (PR #12082), but `actions.ts` left
behind
3. **Result:** Dead code → not in manifest → errors

## Evidence

```bash
# Zero imports across frontend:
grep -r "askOtto" src/ --exclude="actions.ts"
# → No results

# Server manifest missing the action:
cat .next/server/server-reference-manifest.json
# → Only includes login/supabase actions, NOT build/actions
```

## Changes

-  Delete
`autogpt_platform/frontend/src/app/(platform)/build/actions.ts`

## Testing

1. Verify no imports of `askOtto` in codebase 
2. Check Sentry for error drop after deploy
3. Monitor for new "Server Action not found" errors

## Checklist

- [x] Dead code confirmed (zero imports)
- [x] Sentry issues documented
- [x] Clear commit message with context
2026-03-10 09:03:38 +00:00
Dream
d0a1d72e8a fix(frontend/builder): batch undo history for cascading operations (#12344)
## Summary

Fixes undo in the Builder not working correctly when deleting nodes.
When a node is deleted, React Flow fires `onNodesChange` (node removal)
and `onEdgesChange` (cascading edge cleanup) as separate callbacks —
each independently pushing to the undo history stack. This creates
intermediate states that break undo:

- Single undo restores a partial state (e.g. edges pointing to a deleted
node)
- Multiple undos required to fully restore the graph
- Redo also produces inconsistent states

Resolves #10999

### Changes 🏗️

- **`historyStore.ts`** — Added microtask-based batching to
`pushState()`. Multiple calls within the same synchronous execution
(same event loop tick) are coalesced into a single history entry,
keeping only the first pre-change snapshot. Uses `queueMicrotask` so all
cascading store updates from a single user action settle before the
history entry is committed.
- Reset `pendingState` in `initializeHistory()` and `clear()` to prevent
stale batched state from leaking across graph loads or navigation.

**Side benefit:** Copy/paste operations that add multiple nodes and
edges now also produce a single history entry instead of one per
node/edge.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Place 3 blocks A, B, C and connect A→B→C
  - [x] Delete block C (removes node + cascading edge B→C)
  - [x] Delete connection A→B
  - [x] Undo — connection A→B restored (single undo, not multiple)
  - [x] Undo — block C and connection B→C restored
  - [x] Redo — block C removed again with its connections
- [x] Copy/paste multiple connected blocks — single undo reverts entire
paste

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
Co-authored-by: Abhimanyu Yadav <122007096+Abhi1992002@users.noreply.github.com>
2026-03-10 04:55:07 +00:00
Zamil Majdy
f1945d6a2f feat(platform/copilot): @@agptfile: file-ref protocol for tool call inputs + block input toggle (#12332)
## Summary

- **Problem**: When the LLM calls a tool with large file content, it
must rewrite all content token-by-token. This is wasteful since the
files are already accessible on disk.
- **Solution**: Introduces an \`@@agptfile:\` reference protocol. The
LLM passes a file path reference; the processor loads and substitutes
the content before executing the tool.

### Protocol

\`\`\`
@@agptfile:<uri>[<start>-<end>]
\`\`\`

**Supported URI types:**
| URI | Source |
|-----|--------|
| \`workspace://<file_id>\` | Persistent workspace file by ID |
| \`workspace:///<path>\` | Workspace file by virtual path |
| \`/absolute/path\` | Absolute host or sandbox path |

**Line range** is optional; omitting it reads the whole file.

### Backend changes

- Rename \`@file:\` → \`@@agptfile:\` prefix for uniqueness; extract
\`FILE_REF_PREFIX\` constant
- Extract shared execution-context ContextVars into
\`backend/copilot/context.py\` — eliminates duplicate ContextVar objects
that caused \`e2b_file_tools.py\` to always see empty context
- \`tool_adapter.py\` imports ContextVars from \`context.py\` (single
source of truth)
- \`expand_file_refs_in_string\` raises \`FileRefExpansionError\` on
failure (instead of inline error strings), blocking tool execution and
returning a clear error hint to the model
- Tighten URI regex: only expand refs starting with \`workspace://\` or
\`/\`
- Aggregate budget: 1 MB total expansion cap across all refs in one
string
- Per-file cap: 200 KB per individual ref
- Fix \`_read_file_handler\` to pass \`get_sdk_cwd()\` to
\`is_allowed_local_path\` — ephemeral working directory files were
incorrectly blocked
- Fix \`_is_allowed_local\` in \`e2b_file_tools.py\` to pass
\`get_sdk_cwd()\`
- Restrict local path allow-list to \`tool-results/\` subdirectory only
(was entire session project dir)
- Add \`raise_on_error\` param + remove two-pass \`_FILE_REF_ERROR_RE\`
detection
- Update system prompt docs and tool_adapter error messages

### Frontend changes

- \`BlockInputCard\`: hidden by default with Show/Hide toggle + \`mb-2\`
spacing

## Test plan

- [ ] \`poetry run pytest backend/copilot/ -x
--ignore=backend/copilot/sdk/file_ref_integration_test.py\` passes
- [ ] \`@@agptfile:workspace:///<path>[1-50]\` expands correctly in tool
calls
- [ ] Invalid line ranges produce \`[file-ref error: ...]\` inline
messages
- [ ] Files outside \`sdk_cwd\` / \`tool-results/\` are rejected
- [ ] Block input card shows hidden by default with toggle
2026-03-09 18:39:13 +00:00
Zamil Majdy
6491cb1e23 feat(copilot): local agent generation with validation, fixing, MCP & sub-agent support (#12238)
## Summary

Port the agent generation pipeline from the external AgentGenerator
service into local copilot tools, making the Claude Agent SDK itself
handle validation, fixing, and block recommendation — no separate inner
LLM calls needed.

Key capabilities:
- **Local agent generation**: Create, edit, and customize agents
entirely within the SDK session
- **Graph validation**: 9 validation checks (block existence, link
references, type compatibility, IO blocks, etc.)
- **Graph fixing**: 17+ auto-fix methods (ID repair, link rewiring, type
conversion, credential stripping, dynamic block sink names, etc.)
- **MCP tool blocks**: Guide and fixer support for MCPToolBlock nodes
with proper dynamic input schema handling
- **Sub-agent composition**: AgentExecutorBlock support with library
agent schema enrichment
- **Embedding fallback**: Falls back to OpenRouter for embeddings when
`openai_internal_api_key` is unavailable
- **Actionable error messages**: Excluded block types (MCP, Agent)
return specific hints redirecting to the correct tool

### New Tools
- `validate_agent_graph` — run 9 validation checks on agent JSON
- `fix_agent_graph` — apply 17+ auto-fixes to agent JSON
- `get_blocks_for_goal` — recommend blocks for a given goal (with
optimized descriptions)

### Refactored Tools
- `create_agent`, `edit_agent`, `customize_agent` — accept `agent_json`
for local generation with shared fix→validate→save pipeline
- `find_block` — added `include_schemas` parameter, excludes MCP/Agent
blocks with actionable hints
- `run_block` — actionable error messages for excluded block types
- `find_library_agent` — enriched with `graph_version`, `input_schema`,
`output_schema` for sub-agent composition

### Architecture
- Split 2,558-line `validation.py` into `fixer.py`, `validator.py`,
`helpers.py`, `pipeline.py`
- Extracted shared `fix_validate_and_save()` pipeline (was duplicated
across 3 tools)
- Shared `OPENROUTER_BASE_URL` constant across codebase
- Comprehensive test coverage: 78+ unit tests for fixer/validator, 8
run_block tests, 17 SDK compat tests

## Test plan
- [x] `poetry run format` passes
- [x] `poetry run pytest -s -vvv backend/copilot/` — all tests pass
- [x] CI green on all Python versions (3.11, 3.12, 3.13)
- [x] Manual E2E: copilot generates agents with correct IO blocks,
links, and node structure
- [x] Manual E2E: MCP tool blocks use bare field names for dynamic
inputs
- [x] Manual E2E: sub-agent composition with AgentExecutorBlock
2026-03-09 16:10:22 +00:00
nKOxxx
c7124a5240 Add documentation for Google Gemini integration (#12283)
## Summary
Adding comprehensive documentation for Google Gemini integration with
AutoGPT.

## Changes
- Added setup instructions for Gemini API
- Documented configuration options
- Added examples and best practices

## Related Issues
N/A - Documentation improvement

## Testing
- Verified documentation accuracy
- Tested all code examples

## Checklist
- [x] Code follows project style
- [x] Documentation updated
- [x] Tests pass (if applicable)
2026-03-09 15:13:28 +00:00
Zamil Majdy
5537cb2858 dx: add shared Claude Code skills as auto-triggered guidelines (#12297)
## Summary
- Add 8 Claude Code skills under \`.claude/skills/\` that act as
**auto-triggered guidelines** — the LLM invokes them automatically based
on context, no manual \`/command\` needed
- Skills: \`pr-review\`, \`pr-create\`, \`new-block\`,
\`openapi-regen\`, \`backend-check\`, \`frontend-check\`,
\`worktree-setup\`, \`code-style\`
- Each skill has an explicit TRIGGER condition so the LLM knows when to
apply it without being asked

## Changes

### Skills (all auto-triggered by context)
| Skill | Trigger |
|-------|---------|
| \`pr-review\` | User shares a PR URL or asks to address review
comments |
| \`pr-create\` | User asks to create a PR, push changes for review, or
submit work |
| \`new-block\` | User asks to create a new block or add a new
integration |
| \`openapi-regen\` | API routes change, new endpoints added, or
frontend types are stale |
| \`backend-check\` | Backend Python code has been modified |
| \`frontend-check\` | Frontend TypeScript/React code has been modified
|
| \`worktree-setup\` | User asks to work on a branch in isolation or set
up a worktree |
| \`code-style\` | Writing or reviewing Python code |

## Test plan
- [ ] Verify skills appear automatically in Claude Code when context
matches (no \`/command\` needed)
- [ ] Modify frontend code — confirm \`frontend-check\` fires
automatically
- [ ] Ask Claude to "create a PR" — confirm \`pr-create\` fires without
\`/pr-create\`
- [ ] Share a PR URL — confirm \`pr-review\` fires automatically

---------

Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 15:10:38 +00:00
Zamil Majdy
aef5f6d666 feat(copilot): E2B sandbox auto-pause between turns to eliminate idle billing (#12330)
## Summary

### Before
- E2B sandboxes ran continuously between CoPilot turns, billing for idle
time
- Sandbox timeout caused **termination** (kill), losing all session
state
- No explicit cleanup when sessions were deleted — sandboxes leaked
- Single timeout concept with no separation between pause and kill
semantics

### After
- **Per-turn pause**: `pause_sandbox()` is called in the `finally` block
after every CoPilot turn, stopping billing instantly between turns
(paused sandboxes cost \$0 compute)
- **Auto-pause safety net**: Sandboxes are created with
`lifecycle={"on_timeout": "pause"}` (`pause_timeout` = 4h default) so
they auto-pause rather than terminate if the explicit pause is missed
- **Auto-reconnect**: `AsyncSandbox.connect()` in e2b SDK v2
auto-resumes paused sandboxes transparently — no extra code needed
- **Session delete cleanup**: `kill_sandbox()` is now called in
`delete_chat_session()` to explicitly terminate sandboxes and free
resources
- **Two distinct timeouts**: `pause_timeout` (4h, e2b auto-pause) vs
`redis_ttl` (12h, session key lifetime)

### Key Changes

| File | Change |
|------|--------|
| `pyproject.toml` | Bump `e2b-code-interpreter` `1.x` → `2.x` |
| `e2b_sandbox.py` | Add `pause_sandbox()`, `kill_sandbox()`,
`_act_on_sandbox()` helper; `lifecycle={"on_timeout": "pause"}`;
separate `pause_timeout` / `redis_ttl` params |
| `sdk/service.py` | Call `pause_sandbox()` in `finally` block
**before** transcript upload; use walrus operator for type-safe
`e2b_api_key` narrowing |
| `model.py` | Call `kill_sandbox()` in `delete_chat_session()`; inline
import to avoid circular dependency |
| `config.py` | Add `e2b_active` property; rename `e2b_sandbox_timeout`
default to 4h |
| `e2b_sandbox_test.py` | Add `test_pause_then_reconnect_reuses_sandbox`
test; update all `sandbox_timeout` → `pause_timeout` |

### Verified E2E
- Used real `E2B_API_KEY` from k8s dev cluster to manually verify:
sandbox created → paused → `is_running() == False` → reconnected via
`connect()` → state preserved → killed

## Test plan
- [x] `poetry run pytest backend/copilot/tools/e2b_sandbox_test.py` —
all 19 tests pass
- [x] CI: test (3.11, 3.12, 3.13), types — all green
- [x] E2E verified with real E2B credentials
2026-03-09 14:55:10 +00:00
Ubbe
8063391d0a feat(frontend/copilot): pin interactive tool cards outside reasoning collapse (#12346)
## Summary

<img width="400" height="227" alt="Screenshot 2026-03-09 at 22 43 10"
src="https://github.com/user-attachments/assets/0116e260-860d-4466-9763-e02de2766e50"
/>

<img width="600" height="618" alt="Screenshot 2026-03-09 at 22 43 14"
src="https://github.com/user-attachments/assets/beaa6aca-afa8-483f-ac06-439bf162c951"
/>

- When the copilot stream finishes, tool calls that require user
interaction (credentials, inputs, clarification) are now **pinned**
outside the "Show reasoning" collapse instead of being hidden
- Added `isInteractiveToolPart()` helper that checks tool output's
`type` field against a set of interactive response types
- Modified `splitReasoningAndResponse()` to extract interactive tools
from reasoning into the visible response section
- Added styleguide section with 3 demos: `setup_requirements`,
`agent_details`, and `agent_saved` pinning scenarios

### Interactive response types kept visible:
`setup_requirements`, `agent_details`, `block_details`, `need_login`,
`input_validation_error`, `clarification_needed`, `suggested_goal`,
`agent_preview`, `agent_saved`

Error responses remain in reasoning (LLM explains them in final text).

Closes SECRT-2088

## Test plan
- [ ] Verify copilot stream with interactive tool (e.g. run_agent
requiring credentials) keeps the tool card visible after stream ends
- [ ] Verify non-interactive tools (find_block, bash_exec) still
collapse into "Show reasoning"
- [ ] Verify styleguide page at `/copilot/styleguide` renders the new
"Reasoning Collapse: Interactive Tool Pinning" section correctly
- [ ] Verify `pnpm types`, `pnpm lint`, `pnpm format` all pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 23:12:14 +08:00
Otto
0bbb12d688 fix(frontend/copilot): hide New Chat button on Autopilot homepage (#12321)
Requested by @0ubbe

The **New Chat** button was visible on the Autopilot homepage where
clicking it has no effect (since `sessionId` is already `null`). This
hides the button when no chat session is active, so it only appears when
the user is viewing a conversation and wants to start a new one.

**Changes:**
- `ChatSidebar.tsx` — hide button in both collapsed and expanded sidebar
states when `sessionId` is null
- `MobileDrawer.tsx` — same fix for mobile drawer

---
Co-authored-by: Ubbe <ubbe@users.noreply.github.com>
2026-03-09 22:41:11 +08:00
Otto
eadc68f2a5 feat(frontend/copilot): move microphone button to right side of input box (#12320)
Requested by @olivia-1421

Moves the microphone/recording button from the left-side tools group to
the right side, next to the submit button. The left side is now reserved
for the attachment/upload (plus) button only.

**Before:** `[ 📎 🎤 ] .................. [ ➤ ]`
**After:**  `[ 📎 ] .................. [ 🎤 ➤ ]`

---
Co-authored-by: Olivia <olivia-1421@users.noreply.github.com>

---------

Co-authored-by: Ubbe <hi@ubbe.dev>
Co-authored-by: Lluis Agusti <hi@llu.lu>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 18:37:02 +08:00
Reinier van der Leer
19d775c435 Merge commit from fork 2026-03-08 10:25:24 +01:00
Reinier van der Leer
eca7b5e793 Merge commit from fork 2026-03-08 10:24:44 +01:00
Otto
c304a4937a fix(backend): Handle manual run attempts for triggered agents (#12298)
When a webhook-triggered agent is executed directly (e.g. via Copilot)
without actual webhook data, `GraphExecution.from_db()` crashes with
`KeyError: 'payload'` because it does a hard key access on
`exec.input_data["payload"]` for webhook blocks.

This caused 232 Sentry events (AUTOGPT-SERVER-821) and multiple
INCOMPLETE graph executions due to retries.

**Changes:**

1. **Defensive fix in `from_db()`** — use `.get("payload")` instead of
`["payload"]` to handle missing keys gracefully (matches existing
pattern for input blocks using `.get("value")`)

2. **Upfront refusal in `_construct_starting_node_execution_input()`** —
refuse execution of webhook/webhook_manual blocks when no payload is
provided. The check is placed after `nodes_input_masks` application, so
legitimate webhook triggers (which inject payload via
`nodes_input_masks`) pass through fine.

Resolves [SENTRY-1113: Copilot is able to manually initiate runs for
triggered agents (which
fails)](https://linear.app/autogpt/issue/SENTRY-1113/copilot-is-able-to-manually-initiate-runs-for-triggered-agents-which)

---
Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>
2026-03-06 20:47:51 +00:00
Zamil Majdy
7ead4c040f hotfix(backend/copilot): capture tool results in transcript (#12323)
## Summary
- Fixes tool results not being captured in the CoPilot transcript during
SDK-based streaming
- Adds `transcript_builder.add_user_message()` call with `tool_result`
content block when a `StreamToolOutputAvailable` event is received
- Ensures transcript accurately reflects the full conversation including
tool outputs, which is critical for Langfuse tracing and debugging

## Context
After the transcript refactor in #12318, tool call results from the SDK
streaming loop were not being recorded in the transcript. This meant
Langfuse traces were missing tool outputs, making it hard to debug agent
behavior.

## Test plan
- [ ] Verify CoPilot conversation with tool calls captures tool results
in Langfuse traces
- [ ] Verify transcript includes tool_result content blocks after tool
execution
2026-03-06 18:58:48 +00:00
Zamil Majdy
8cfabcf4fd refactor(backend/copilot): centralize prompt building in prompting.py (#12324)
## Summary

Centralizes all prompt building logic into a new
`backend/copilot/prompting.py` module with clear SDK vs baseline and
local vs E2B distinctions.

### Key Changes

**New `prompting.py` module:**
- `get_sdk_supplement(use_e2b, cwd)` - For SDK mode (NO tool docs -
Claude gets schemas automatically)
- `get_baseline_supplement(use_e2b, cwd)` - For baseline mode (WITH
auto-generated tool docs from TOOL_REGISTRY)
- Handles local/E2B storage differences

**SDK mode (`sdk/service.py`):**
- Removed 165+ lines of duplicate constants
- Now imports and uses `get_sdk_supplement()`
- Cleaner, more maintainable

**Baseline mode (`baseline/service.py`):**
- Now appends `get_baseline_supplement()` to system prompt
- Baseline mode finally gets tool documentation!

**Enhanced tool descriptions:**
- `create_agent`: Added feedback loop workflow (suggested_goal,
clarifying_questions)
- `run_mcp_tool`: Added known server URLs, 2-step workflow, auth
handling

**Tests:**
- Updated to verify SDK excludes tool docs, baseline includes them
- All existing tests pass

### Architecture Benefits

 Single source of truth for prompt supplements
 Clear SDK vs baseline distinction (SDK doesn't need tool docs)
 Clear local vs E2B distinction (storage systems)
 Easy to maintain and update
 Eliminates code duplication

## Test plan

- [x] Unit tests pass (TestPromptSupplement class)
- [x] SDK mode excludes tool documentation
- [x] Baseline mode includes tool documentation
- [x] E2B vs local mode differences handled correctly
2026-03-06 18:56:20 +00:00
Zamil Majdy
7bf407b66c Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev 2026-03-07 02:01:41 +07:00
Abhimanyu Yadav
0f813f1bf9 feat(copilot): Add folder management tools to CoPilot (#12290)
Adds folder management capabilities to the CoPilot, allowing users to
organize agents into folders directly from the chat interface.

<img width="823" height="356" alt="Screenshot 2026-03-05 at 5 26 30 PM"
src="https://github.com/user-attachments/assets/4c55f926-1e71-488f-9eb6-fca87c4ab01b"
/>
<img width="797" height="150" alt="Screenshot 2026-03-05 at 5 28 40 PM"
src="https://github.com/user-attachments/assets/5c9c6f8b-57ac-4122-b17d-b9f091bb7c4e"
/>
<img width="763" height="196" alt="Screenshot 2026-03-05 at 5 28 36 PM"
src="https://github.com/user-attachments/assets/d1b22b5d-921d-44ac-90e8-a5820bb3146d"
/>
<img width="756" height="199" alt="Screenshot 2026-03-05 at 5 30 17 PM"
src="https://github.com/user-attachments/assets/40a59748-f42e-4521-bae0-cc786918a9b5"
/>

### Changes

**Backend -- 6 new CoPilot tools** (`manage_folders.py`):
- `create_folder` -- Create folders with optional parent, icon, and
color
- `list_folders` -- List folder tree or children of a specific folder,
with optional `include_agents` to show agents inside each folder
- `update_folder` -- Rename or change icon/color
- `move_folder` -- Reparent a folder or move to root
- `delete_folder` -- Soft-delete (agents moved to root, not deleted)
- `move_agents_to_folder` -- Bulk-move agents into a folder or back to
root

**Backend -- DatabaseManager RPC registration**:
- Registered all 7 folder DB functions (`create_folder`, `list_folders`,
`get_folder_tree`, `update_folder`, `move_folder`, `delete_folder`,
`bulk_move_agents_to_folder`) in `DatabaseManager` and
`DatabaseManagerAsyncClient` so they work via RPC in the CoPilotExecutor
process
- `manage_folders.py` uses `db_accessors.library_db()` pattern
(consistent with all other copilot tools) instead of direct Prisma
imports

**Backend -- folder_id threading**:
- `create_agent` and `customize_agent` tools accept optional `folder_id`
to save agents directly into a folder
- `save_agent_to_library` -> `create_graph_in_library` ->
`create_library_agent` pipeline passes `folder_id` through
- `create_library_agent` refactored from `asyncio.gather` to sequential
loop to support conditional `folderId` assignment on the main graph only
(not sub-graphs)

**Backend -- system prompt and models**:
- Added folder tool descriptions and usage guidance to Otto's system
prompt
- Added `FolderAgentSummary` model for lightweight agent info in folder
listings
- Added 6 `ResponseType` enum values and corresponding Pydantic response
models (`FolderInfo`, `FolderTreeInfo`, `FolderCreatedResponse`, etc.)

**Frontend -- FolderTool UI component**:
- `FolderTool.tsx` -- Renders folder operations in chat using the
`file-tree` molecule component for tree view, with `FileIcon` for agents
and `FolderIcon` for folders (both `text-neutral-600`)
- `helpers.ts` -- Type guards, output parsing, animation text helpers,
and `FolderAgentSummary` type
- `MessagePartRenderer.tsx` -- Routes 6 folder tool types to
`FolderTool` component
- Flat folder list view shows agents inside `FolderCard` when
`include_agents` is set

**Frontend -- file-tree molecule**:
- Fixed 3 pre-existing lint errors in `file-tree.tsx` (unused `ref`,
`handleSelect`, `className` params)
- Updated tree indicator line color from `bg-neutral-100` to
`bg-neutral-400` for visibility
- Added `file-tree.stories.tsx` with 5 stories: Default, AllExpanded,
FoldersOnly, WithInitialSelection, NoIndicator
- Added `ui/scroll-area.tsx` (dependency of file-tree, was missing from
non-legacy ui folder)

### Checklist

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Create a folder via copilot chat ("create a folder called
Marketing")
  - [x] List folders ("show me my folders")
- [x] List folders with agents ("show me my folders and the agents in
them")
- [x] Update folder name/icon/color ("rename Marketing folder to Sales")
- [x] Move folder to a different parent ("move Sales into the Projects
folder")
  - [x] Delete a folder and verify agents move to root
- [x] Move agents into a folder ("put my newsletter agent in the
Marketing folder")
- [x] Create agent with folder_id ("create a scraper agent and save it
in my Tools folder")
- [x] Verify FolderTool UI renders loading, success, error, and empty
states correctly
- [x] Verify folder tree renders nested folders with file-tree component
- [x] Verify agents appear as FileIcon nodes in tree view when
include_agents is true
  - [x] Verify file-tree storybook stories render correctly
2026-03-06 14:59:03 +00:00
Reinier van der Leer
aa08063939 refactor(backend/db): Improve & clean up Marketplace DB layer & API (#12284)
These changes were part of #12206, but here they are separately for
easier review.
This is all primarily to make the v2 API (#11678) work possible/easier.

### Changes 🏗️

- Fix relations between `Profile`, `StoreListing`, and `AgentGraph`
- Redefine `StoreSubmission` view with more efficient joins (100x
speed-up on dev DB) and more consistent field names
- Clean up query functions in `store/db.py`
- Clean up models in `store/model.py`
- Add missing fields to `StoreAgent` and `StoreSubmission` views
- Rename ambiguous `agent_id` -> `graph_id`
- Clean up API route definitions & docs in `store/routes.py`
  - Make routes more consistent
- Avoid collision edge-case between `/agents/{username}/{agent_name}`
and `/agents/{store_listing_version_id}/*`
- Replace all usages of legacy `BackendAPI` for store endpoints with
generated client
- Remove scope requirements on public store endpoints in v1 external API

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Test all Marketplace views (including admin views)
    - [x] Download an agent from the marketplace
  - [x] Submit an agent to the Marketplace
  - [x] Approve/reject Marketplace submission
2026-03-06 14:38:12 +00:00
Zamil Majdy
bde6a4c0df Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev
# Conflicts:
#	autogpt_platform/backend/backend/copilot/sdk/service.py
2026-03-06 21:07:37 +07:00
Zamil Majdy
d56452898a hotfix(backend/copilot): refactor transcript to SDK-based atomic full-context model (#12318)
## Summary

Major refactor to eliminate CLI transcript race conditions and simplify
the codebase by building transcripts directly from SDK messages instead
of reading CLI files.

## Problem

The previous approach had race conditions:
- SDK reads CLI transcript file during stop hook
- CLI may not have finished writing → incomplete transcript
- Complex merge logic to detect and fix incomplete writes
- ~200 lines of synthetic entry detection and merge code

## Solution

**Atomic Full-Context Transcript Model:**
- Build transcript from SDK messages during streaming
(`TranscriptBuilder`)
- Each upload REPLACES the previous transcript entirely (atomic)
- No CLI file reading → no race conditions
- Eliminates all merge complexity

## Key Changes

### Core Refactor
- **NEW**: `transcript_builder.py` - Build JSONL from SDK messages
during streaming
- **SIMPLIFIED**: `transcript.py` - Removed merge logic, simplified
upload/download
- **SIMPLIFIED**: `service.py` - Use TranscriptBuilder, removed stop
hook callback
- **CLEANED**: `security_hooks.py` - Removed `on_stop` parameter

### Performance & Code Quality
- **orjson migration**: Use `backend.util.json` (2-3x faster than
stdlib)
- Added `fallback` parameter to `json.loads()` for cleaner error
handling
- Moved SDK imports to top-level per code style guidelines

### Bug Fixes
- Fixed garbage collection bug in background task handling
- Fixed double upload bug in timeout handling  
- Downgraded PII-risk logging from WARNING to DEBUG
- Added 30s timeout to prevent session lock hang

## Code Removed (~200 lines)

- `merge_with_previous_transcript()` - No longer needed
- `read_transcript_file()` - No longer needed
- `CapturedTranscript` dataclass - No longer needed
- `_on_stop()` callback - No longer needed
- Synthetic entry detection logic - No longer needed
- Manual append/merge logic in finally block - No longer needed

## Testing

-  All transcript tests passing (24/24)
-  Verified with real session logs showing proper transcript growth
-  Verified with Langfuse traces showing proper turn tracking (1-8)

## Transcript Growth Pattern

From session logs:
- **Turn 1**: 2 entries (initial)
- **Turn 2**: 5 entries (+3), 2257B uploaded
- **Turn N**: ~2N entries (linear growth)

Each upload is the **complete atomic state** - always REPLACES, never
incremental.

## Files Changed

```
backend/copilot/sdk/transcript_builder.py (NEW)   | +140 lines
backend/copilot/sdk/transcript.py                  | -198, +125 lines  
backend/copilot/sdk/service.py                     | -214, +160 lines
backend/copilot/sdk/security_hooks.py              | -33, +10 lines
backend/copilot/sdk/transcript_test.py             | -85, +36 lines
backend/util/json.py                               | +45 lines
```

**Net result**: -200 lines, more reliable, faster JSON operations.

## Migration Notes

This is a **breaking change** for any code that:
- Directly calls `merge_with_previous_transcript()` or
`read_transcript_file()`
- Relies on incremental transcript uploads
- Expects stop hook callbacks

All internal usage has been updated.

---

@ntindle - Tagging for autogpt-reviewer
2026-03-06 21:03:49 +07:00
Ubbe
7507240177 feat(copilot): collapse repeated tool calls and fix stream stuck on completion (#12282)
## Summary
- **Frontend:** Group consecutive completed generic tool parts into
collapsible summary rows with a "Reasoning" collapse for finalized
messages. Merge consecutive assistant messages on hydration to avoid
split bubbles. Extract GenericTool helpers. Add `reconnectExhausted`
state and a brief delay before refetching session to reduce stale
`active_stream` reconnect cycles.
- **Backend:** Make transcript upload fire-and-forget instead of
blocking the generator exit. The 30s upload timeout in
`_try_upload_transcript` was delaying `mark_session_completed()`,
keeping the SSE stream alive with only heartbeats after the LLM had
finished — causing the UI to stay stuck in "streaming" state.

## Test plan
- [ ] Send a message in Copilot that triggers multiple tool calls —
verify they collapse into a grouped summary row once completed
- [ ] Verify the final text response appears below the collapsed
reasoning section
- [ ] Confirm the stream properly closes after the agent finishes (no
stuck "Stop" button)
- [ ] Refresh mid-stream and verify reconnection works correctly
- [ ] Click Stop during streaming — verify the UI becomes responsive
immediately

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 21:21:59 +08:00
Abhimanyu Yadav
d7c3f5b8fc fix(frontend): bypass Next.js proxy for file uploads to fix 413 error (#12315)
## Summary
- File uploads routed through the Next.js API proxy (`/api/proxy/...`)
fail with HTTP 413 for files >4.5MB due to Vercel's serverless function
body size limit
- Created shared `uploadFileDirect` utility (`src/lib/direct-upload.ts`)
that uploads files directly from the browser to the Python backend,
bypassing the proxy entirely
- Updated `useWorkspaceUpload` to use direct upload instead of the
generated hook (which went through the proxy)
- Deduplicated the copilot page's inline upload logic to use the same
shared utility

## Changes 🏗️
- **New**: `src/lib/direct-upload.ts` — shared utility for
direct-to-backend file uploads (up to 256MB)
- **Updated**: `useWorkspaceUpload.ts` — replaced proxy-based generated
hook with `uploadFileDirect`
- **Updated**: `useCopilotPage.ts` — replaced inline upload logic with
shared `uploadFileDirect`, removed unused imports

## Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Upload a file >5MB via workspace file input (e.g. in agent
builder) — should succeed without 413
  - [x] Upload a file >5MB via copilot chat — should succeed without 413
  - [x] Upload a small file (<1MB) via both paths — should still work
  - [x] Verify file delete still works from workspace file input
2026-03-06 12:20:18 +00:00
Otto
3e108a813a fix(backend): Use db_manager for workspace in add_graph_execution (#12312)
When `add_graph_execution` is called from a context where the global
Prisma client isn't connected (e.g. CoPilot tools, external API), the
call to `get_or_create_workspace(user_id)` crashes with
`ClientNotConnectedError` because it directly accesses
`UserWorkspace.prisma()`.

The fix adds `workspace_db` to the existing `if prisma.is_connected()`
fallback pattern, consistent with how all other DB calls in the function
already work.

**Sentry:** AUTOGPT-SERVER-83T (and ~15 related issues going back to Jan
2026)

---
Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>

Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>
2026-03-06 08:48:15 +01:00
Krzysztof Czerwinski
08c49a78f8 feat(copilot): UX improvements (#12258)
CoPilot conversation UX improvements (SECRT-2055):

1. **Rename conversations** — Inline rename via the session dropdown
menu. New `PATCH /sessions/{session_id}/title` endpoint with server-side
validation (rejects blank/whitespace-only titles, normalizes
whitespace). Pressing Enter or clicking away submits; Escape cancels
without submitting.

2. **New Chat button moved to top & sticky** — The 'New Chat' button is
now at the top of the sidebar (under 'Your chats') instead of the
footer, and stays fixed — only the session list below it scrolls. A
subtle shadow separator mirrors the original footer style.

3. **Auto-generated title appears live** — After the first message in a
new chat, the sidebar polls for the backend-generated title and animates
it in smoothly once available. The backend also guards against
auto-title overwriting a user-set title.

4. **External Link popup redesign** — Replaced the CSS-hacked external
link confirmation dialog with a proper AutoGPT `Dialog` component using
the design system (`Button`, `Text`, `Dialog`). Removed the old
`globals.css` workaround.

<img width="321" height="263" alt="Screenshot 2026-03-03 at 6 31 50 pm"
src="https://github.com/user-attachments/assets/3cdd1c6f-cca6-4f16-8165-15a1dc2d53f7"
/>

<img width="374" height="74" alt="Screenshot 2026-03-02 at 6 39 07 pm"
src="https://github.com/user-attachments/assets/6f9fc953-5fa7-4469-9eab-7074e7604519"
/>

<img width="548" height="293" alt="Screenshot 2026-03-02 at 6 36 28 pm"
src="https://github.com/user-attachments/assets/0f34683b-7281-4826-ac6f-ac7926e67854"
/>

### Changes 🏗️

**Backend:**
- `routes.py`: Added `PATCH /sessions/{session_id}/title` endpoint with
`UpdateSessionTitleRequest` Pydantic model — validates non-blank title,
normalizes whitespace, returns 404 vs 500 correctly
- `routes_test.py`: New test file — 7 test cases covering success,
whitespace trimming, blank rejection (422), not found (404), internal
failure (500)
- `service.py`: Auto-title generation now checks if a user-set title
already exists before overwriting
- `openapi.json`: Updated with new endpoint schema

**Frontend:**
- `ChatSidebar.tsx`: Inline rename (Enter/blur submits, Escape cancels
via ref flag); "New Chat" button sticky at top with shadow separator;
session title animates when auto-generated title appears
(`AnimatePresence`)
- `useCopilotPage.ts`: Polls for auto-generated title after stream ends,
stops as soon as title appears in cache
- `MobileDrawer.tsx`: Updated to match sidebar layout changes
- `DeleteChatDialog.tsx`: Removed redundant `onClose` prop (controlled
Dialog already handles close)
- `message.tsx`: Added `ExternalLinkModal` using AutoGPT design system;
removed redundant `onClose` prop
- `globals.css`: Removed old CSS hack for external link modal

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Create a new chat, send a message — verify auto-generated title
appears in sidebar without refresh
- [x] Rename a chat via dropdown — Enter submits, Escape reverts, blank
title rejected
- [x] Rename a chat, then send another message — verify user title is
not overwritten by auto-title
- [x] With many chats, scroll the sidebar — verify "New Chat" button
stays fixed at top
- [x] Click an external link in a message — verify the new dialog
appears with AutoGPT styling

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 06:01:41 +00:00
Bently
5d56548e6b fix(frontend): prevent crash on /library with 401 error from pagination helper (#12292)
## Changes
Fixes crash on `/library` page when backend returns a 401 authentication
error.

### Problem

When the backend returns a 401 error, React Query still calls
`getNextPageParam` with the error response. The response doesn't have
the expected pagination structure, causing `pagination` to be
`undefined`. The code then crashes trying
 to access `pagination.current_page`.

Error:
TypeError: Cannot read properties of undefined (reading 'current_page')
    at Object.getNextPageParam

### Solution

Added a defensive null check in `getPaginationNextPageNumber()` to
handle cases where `pagination` is undefined:

```typescript
const { pagination } = lastPage.data;
if (!pagination) return undefined;
```
When undefined is returned, React Query interprets this as "no next page
available" and gracefully stops pagination instead of crashing.

Testing

- Manual testing: Verify /library page handles 401 errors without
crashing
- The fix is defensive and doesn't change behavior for successful
responses

Related Issues

Closes OPEN-2684
2026-03-05 19:52:36 +00:00
Otto
6ecf55d214 fix(frontend): fix 'Open link' button text color to white for contrast (#12304)
Requested by @ntindle

The Streamdown external link safety modal's "Open link" button had dark
text (`color: black`) on a dark background, making it unreadable.
Changed to `color: white` for proper contrast per our design system.

**File:** `autogpt_platform/frontend/src/app/globals.css`

Resolves SECRT-2061

---
Co-authored-by: Nick Tindle (@ntindle)
2026-03-05 19:50:39 +00:00
Bently
7c8c7bf395 feat(llm): add Claude Sonnet 4.6 model (#12158)
## Summary
Adds Claude Sonnet 4.6 (`claude-sonnet-4-6`) to the platform.

## Model Details (from [Anthropic
docs](https://www.anthropic.com/news/claude-sonnet-4-6))
- **API ID:** `claude-sonnet-4-6`
- **Pricing:** $3 / input MTok, $15 / output MTok (same as Sonnet 4.5)
- **Context window:** 200K tokens (1M beta)
- **Max output:** 64K tokens
- **Knowledge cutoff:** Aug 2025 (reliable), Jan 2026 (training data)

## Changes
- Added `CLAUDE_4_6_SONNET` to `LlmModel` enum
- Added metadata entry with correct context/output limits
- Updated Stagehand to use Sonnet 4.6 (better for browser automation
tasks)

## Why
Sonnet 4.6 brings major improvements in coding, computer use, and
reasoning. Developers with early access often prefer it to even Opus
4.5.

---------

Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-03-05 19:36:56 +00:00
Zamil Majdy
0b9e0665dd Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT 2026-03-06 02:32:36 +07:00
Zamil Majdy
be18436e8f Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev 2026-03-06 02:31:40 +07:00
Zamil Majdy
f6f268a1f0 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into HEAD 2026-03-06 02:29:56 +07:00
Zamil Majdy
ea0333c1fc fix(copilot): always upload transcript instead of size-based skip (#12303)
## Summary

Fixes copilot sessions "forgetting" previous turns due to stale
transcript storage.

**Root cause:** The transcript upload logic used byte size comparison
(`existing >= new → skip`) to prevent overwriting newer transcripts with
older ones. However, with `--resume` the CLI compacts old tool results,
so newer transcripts can have **fewer bytes** despite containing **more
conversation events**. This caused the stored transcript to freeze at
whatever the largest historical upload was — every subsequent turn
downloaded the same stale transcript and the agent lost context of
recent turns.

**Evidence from prod session `41a3814c`:**
- Stored transcript: 764KB (frozen, never updated)
- Turn 1 output: 379KB (75 lines) → upload skipped (764KB >= 379KB)
- Turn 2 output: 422KB (71 lines) → upload skipped (764KB >= 422KB)
- Turn 3 output: **empty** → upload skipped
- Agent resumed from the same stale 764KB transcript every turn, losing
context of the PR it created

**Fix:** Remove the size comparison entirely. The executor holds a
cluster lock per session, so concurrent uploads cannot race. Just always
overwrite with the latest transcript.

## Test plan
- [x] `poetry run pytest backend/copilot/sdk/transcript_test.py` — 25/25
pass
- [x] All pre-commit hooks pass
- [ ] After deploy: verify multi-turn sessions retain context across
turns
2026-03-06 02:26:52 +07:00
Zamil Majdy
21c705af6e fix(backend/copilot): prevent title update from overwriting session messages (#12302)
### Changes 🏗️

Fixes a race condition in `update_session_title()` where the background
title generation task could overwrite the Redis session cache with a
stale snapshot, causing the copilot to "forget" its previous turns.

**Root cause:** `update_session_title()` performs a read-modify-write on
the Redis cache (read full session → set title → write back). Meanwhile,
`upsert_chat_session()` writes a newer version with more messages during
streaming. If the title task reads early (e.g., 34 messages) and writes
late (after streaming persisted 101 messages), the stale 34-message
version overwrites the 101-message version. When the next message lands
on a different pod, it loads the stale session from Redis.

**Fix:** Replace the read-modify-write with a simple cache invalidation
(`invalidate_session_cache`). The title is already updated in the DB;
the next access just reloads from DB with the correct title and
messages. No locks, no deserialization of the full session blob, no risk
of stale overwrites.

**Evidence from prod logs (session `41a3814c`):**
- Pod `tm2jb` persisted session with 101 messages
- Pod `phflm` loaded session from Redis cache with only 35 messages (66
messages lost)
- The title background task ran between these events, overwriting the
cache

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `poetry run pytest backend/copilot/model_test.py` — 15/15 pass
  - [x] All pre-commit hooks pass (ruff, black, isort, pyright)
- [ ] After deploy: verify long sessions no longer lose context on
multi-pod setups
2026-03-05 18:49:41 +00:00
Zamil Majdy
a576be9db2 fix(backend): install agent-browser + Chromium in Docker image (#12301)
The Copilot browser tool (`browser_navigate`, `browser_act`,
`browser_screenshot`) has been broken on dev because `agent-browser` CLI
+ Chromium were never installed in the backend Docker image.

### Changes 🏗️

- Added `npx playwright install-deps chromium` to install Chromium
runtime libraries (libnss3, libatk, etc.)
- Added `npm install -g agent-browser` to install the CLI
- Added `agent-browser install` to download the Chromium binary
- Layer is placed after existing COPY-from-builder lines to preserve
Docker cache ordering

### Root cause

Every `browser_navigate` call fails with:
```
WARNING  [browser_navigate] open failed for <url>: agent-browser is not installed
(run: npm install -g agent-browser && agent-browser install).
```
The error originates from `FileNotFoundError` in `agent_browser.py:101`
when the subprocess tries to execute the `agent-browser` binary which
doesn't exist in the container.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified `agent-browser` binary is missing from current dev pod
via `kubectl logs`
- [x] Confirmed session `01eeac29-5a7` shows repeated failures for all
URLs
- [ ] After deploy: verify browser_navigate works in a Copilot session
on dev

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
2026-03-05 18:44:55 +00:00
dependabot[bot]
5e90585f10 chore(deps): bump crazy-max/ghaction-github-runtime from 3 to 4 (#12262)
Bumps
[crazy-max/ghaction-github-runtime](https://github.com/crazy-max/ghaction-github-runtime)
from 3 to 4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/crazy-max/ghaction-github-runtime/releases">crazy-max/ghaction-github-runtime's
releases</a>.</em></p>
<blockquote>
<h2>v3.1.0</h2>
<ul>
<li>Bump <code>@​actions/core</code> from 1.10.0 to 1.11.1 in <a
href="https://redirect.github.com/crazy-max/ghaction-github-runtime/pull/58">crazy-max/ghaction-github-runtime#58</a></li>
<li>Bump braces from 3.0.2 to 3.0.3 in <a
href="https://redirect.github.com/crazy-max/ghaction-github-runtime/pull/54">crazy-max/ghaction-github-runtime#54</a></li>
<li>Bump cross-spawn from 7.0.3 to 7.0.6 in <a
href="https://redirect.github.com/crazy-max/ghaction-github-runtime/pull/59">crazy-max/ghaction-github-runtime#59</a></li>
<li>Bump ip from 2.0.0 to 2.0.1 in <a
href="https://redirect.github.com/crazy-max/ghaction-github-runtime/pull/50">crazy-max/ghaction-github-runtime#50</a></li>
<li>Bump micromatch from 4.0.5 to 4.0.8 in <a
href="https://redirect.github.com/crazy-max/ghaction-github-runtime/pull/55">crazy-max/ghaction-github-runtime#55</a></li>
<li>Bump tar from 6.1.14 to 6.2.1 in <a
href="https://redirect.github.com/crazy-max/ghaction-github-runtime/pull/51">crazy-max/ghaction-github-runtime#51</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/crazy-max/ghaction-github-runtime/compare/v3.0.0...v3.1.0">https://github.com/crazy-max/ghaction-github-runtime/compare/v3.0.0...v3.1.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="04d248b846"><code>04d248b</code></a>
Merge pull request <a
href="https://redirect.github.com/crazy-max/ghaction-github-runtime/issues/76">#76</a>
from crazy-max/node24</li>
<li><a
href="c8f8e4e4e2"><code>c8f8e4e</code></a>
node 24 as default runtime</li>
<li><a
href="494a382acb"><code>494a382</code></a>
Merge pull request <a
href="https://redirect.github.com/crazy-max/ghaction-github-runtime/issues/68">#68</a>
from crazy-max/dependabot/npm_and_yarn/actions/core-2.0.1</li>
<li><a
href="5d51b8ef32"><code>5d51b8e</code></a>
Merge pull request <a
href="https://redirect.github.com/crazy-max/ghaction-github-runtime/issues/74">#74</a>
from crazy-max/dependabot/npm_and_yarn/minimatch-3.1.5</li>
<li><a
href="f7077dccce"><code>f7077dc</code></a>
chore: update generated content</li>
<li><a
href="4d1e03547a"><code>4d1e035</code></a>
chore(deps): bump minimatch from 3.1.2 to 3.1.5</li>
<li><a
href="b59d56d5bc"><code>b59d56d</code></a>
chore(deps): bump <code>@​actions/core</code> from 1.11.1 to 2.0.1</li>
<li><a
href="6d0e2ef281"><code>6d0e2ef</code></a>
Merge pull request <a
href="https://redirect.github.com/crazy-max/ghaction-github-runtime/issues/75">#75</a>
from crazy-max/esm</li>
<li><a
href="41d6f6acdb"><code>41d6f6a</code></a>
remove codecov config</li>
<li><a
href="b5018eca65"><code>b5018ec</code></a>
chore: update generated content</li>
<li>Additional commits viewable in <a
href="https://github.com/crazy-max/ghaction-github-runtime/compare/v3...v4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=crazy-max/ghaction-github-runtime&package-manager=github_actions&previous-version=3&new-version=4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-03-05 15:59:06 +00:00
Zamil Majdy
3e22a0e786 fix(copilot): pin claude-agent-sdk to 0.1.45 to fix tool_reference content block validation error (#12294)
Requested by @majdyz

## Problem

CoPilot throws `400 Invalid Anthropic Messages API request` errors on
first message, both locally and on Dev.

## Root Cause

The CLI's built-in `ToolSearch` tool returns `tool_reference` content
blocks (`{"type": "tool_reference", "tool_name":
"mcp__copilot__find_block"}`). When the CLI constructs the next
Anthropic API request, it passes these blocks as-is in the
`tool_result.content` field. However, the Anthropic Messages API only
accepts `text` and `image` content block types in tool results.

This causes a Zod validation error:
```
messages[3].content[0].content: Invalid input: expected string, received array
```

The error only manifests when using **OpenRouter** (`ANTHROPIC_BASE_URL`
set) because the Anthropic TypeScript SDK performs stricter client-side
Zod validation in that code path vs the subscription auth path.

PR #12288 bumped `claude-agent-sdk` from `0.1.39` to `^0.1.46`, which
upgraded the bundled Claude CLI from `v2.1.49` to `v2.1.69` where this
issue was introduced.

## Fix

Pin to `0.1.45` which has a CLI version that doesn't produce
`tool_reference` content blocks in tool results.

## Testing
- CoPilot first message should work without 400 errors via OpenRouter
- SDK compat tests should still pass
2026-03-05 13:12:26 +00:00
Ubbe
6abe39b33a feat(frontend/copilot): add per-turn work-done summary stats (#12257)
## Summary
- Adds per-turn work-done counters (e.g. "3 searches", "1 agent run")
shown as plain text on the final assistant message of each
user/assistant interaction pair
- Counters aggregate tool calls by category (searches, agents run,
blocks run, agents created/edited, agents scheduled)
- Copy and TTS actions now appear only on the final assistant message
per turn, with text aggregated from all assistant messages in that turn
- Removes the global JobStatsBar above the chat input

Resolves: SECRT-2026

## Test plan
- [ ] Work-done counters appear only on the last assistant message of
each turn (not on intermediate assistant messages)
- [ ] Counters increment correctly as tool call parts appear in messages
- [ ] Internal operations (add_understanding, search_docs, get_doc_page,
find_block) are NOT counted
- [ ] Max 3 counter categories shown, sorted by volume
- [ ] Copy/TTS actions appear only on the final assistant message per
turn
- [ ] Copy/TTS aggregate text from all assistant messages in the turn
- [ ] No counters or actions shown while streaming is still in progress
- [ ] No type errors, lint errors, or format issues introduced

Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 19:32:48 +08:00
Zamil Majdy
476cf1c601 feat(copilot): support Claude Code subscription auth for SDK mode (#12288)
## Summary

- Adds `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION` config flag to let the
copilot SDK path use the Claude CLI's own subscription auth (from
`claude login`) instead of API keys
- When enabled, the SDK subprocess inherits CLI credentials — no
`ANTHROPIC_BASE_URL`/`AUTH_TOKEN` override is injected
- Forces SDK mode regardless of LaunchDarkly flag (baseline path uses
`openai.AsyncOpenAI` which requires an API key)
- Validates CLI installation on first use with clear error messages

## Setup

```bash
npm install -g @anthropic-ai/claude-code
claude login
# then set in .env:
CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true
```

## Changes

| File | Change |
|------|--------|
| `copilot/config.py` | New `use_claude_code_subscription` field + env
var validator |
| `copilot/sdk/service.py` | `_validate_claude_code_subscription()` +
`_build_sdk_env()` early-return + fail-fast guard |
| `copilot/executor/processor.py` | Force SDK mode via short-circuit
`or` |

## Test plan

- [ ] Set `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true`, unset all API keys
- [ ] Run `claude login` on the host
- [ ] Start backend, send a copilot message — verify SDK subprocess uses
CLI auth
- [ ] Verify existing OpenRouter/API key flows still work (no
regression)
2026-03-05 09:55:35 +00:00
Zamil Majdy
25022f2d1e fix(copilot): handle empty tool_call arguments in baseline path (#12289)
## Summary

Handle empty/None `tool_call.arguments` in the baseline copilot path
that cause OpenRouter 400 errors when converting to Anthropic format.

## Changes

**`backend/copilot/baseline/service.py`**:
- Default empty `tc["arguments"]` to `"{}"` to prevent OpenRouter from
failing on empty tool arguments during format conversion.

## Test plan

- [x] Existing baseline tests pass
- [ ] Verify on staging: trigger a tool call in baseline mode and
confirm normal flow works
2026-03-05 09:53:05 +00:00
Zamil Majdy
ce1675cfc7 feat(copilot): add Langfuse tracing to baseline LLM path (#12281)
## Summary
Depends on #12276 (baseline code).

- Swap shared OpenAI client to `langfuse.openai.AsyncOpenAI` —
auto-captures all LLM calls (token usage, latency, model, prompts) as
Langfuse generations when configured
- Add `propagate_attributes()` context in baseline streaming for
`user_id`/`session_id` attribution, matching the SDK path's OTEL tracing
- No-op when Langfuse is not configured — `langfuse.openai.AsyncOpenAI`
falls back to standard `openai.AsyncOpenAI` behavior

## Observability parity

| Aspect | SDK path | Baseline path (after this PR) |
|--------|----------|-------------------------------|
| LLM call tracing | OTEL via `configure_claude_agent_sdk()` |
`langfuse.openai.AsyncOpenAI` auto-instrumentation |
| User/session context | `propagate_attributes()` |
`propagate_attributes()` |
| Langfuse prompts | Shared `_build_system_prompt()` | Shared
`_build_system_prompt()` |
| Token/cost tracking | Via OTEL spans | Via Langfuse generation objects
|

## Test plan
- [x] `poetry run format` passes (pyright, ruff, black, isort)
- [ ] Verify Langfuse traces appear for baseline path with
`CHAT_USE_CLAUDE_AGENT_SDK=false`
- [ ] Verify SDK path tracing is unaffected
2026-03-05 09:51:16 +00:00
Otto
3d0ede9f34 feat(backend/copilot): attach uploaded images and PDFs as multimodal vision blocks (#12273)
Requested by @majdyz

When users upload images or PDFs to CoPilot, the AI couldn't see the
content because the CLI's Zod validator rejects large base64 in MCP tool
results and even small images were misidentified (the CLI silently drops
or corrupts image content blocks in tool results).

## Approach

Embed uploaded images directly as **vision content blocks** in the user
message via `client._transport.write()`. The SDK's `client.query()` only
accepts string content, so we bypass it for multimodal messages —
writing a properly structured user message with `[...image_blocks,
{"type": "text", "text": query}]` directly to the transport. This
ensures the CLI binary receives images as native vision blocks, matching
how the Anthropic API handles multimodal input.

For binary files accessed via workspace tools at runtime, we save them
to the SDK's ephemeral working directory (`sdk_cwd`) and return a file
path for the CLI's built-in `Read` tool to handle natively.

## Changes

### Vision content blocks for attached files — `service.py`
- `_prepare_file_attachments` downloads workspace files before the
query, converts images to base64 vision blocks (`{"type": "image",
"source": {"type": "base64", ...}}`)
- When vision blocks are present, writes multimodal user message
directly to `client._transport` instead of using `client.query()`
- Non-image files (PDFs, text) are saved to `sdk_cwd` with a hint to use
the Read tool

### File-path based access for workspace tools — `workspace_files.py`
- `read_workspace_file` saves binary files to `sdk_cwd` instead of
returning base64, returning a path for the Read tool

### SDK context for ephemeral directory — `tool_adapter.py`
- Added `sdk_cwd` context variable so workspace tools can access the
ephemeral directory
- Removed inline base64 multimodal block machinery
(`_extract_content_block`, `_strip_base64_from_text`, `_BLOCK_BUILDERS`,
etc.)

### Frontend — rendering improvements
- `MessageAttachments.tsx` — uses `OutputRenderers` system
(`globalRegistry` + `OutputItem`) for image/video preview rendering
instead of custom components
- `GenericTool.tsx` — uses `OutputRenderers` system for inline image
rendering of base64 content
- `routes.py` — returns 409 for duplicate workspace filenames

### Tests
- `tool_adapter_test.py` — removed multimodal extraction/stripping
tests, added `get_sdk_cwd` tests
- `service_test.py` — rewritten for `_prepare_file_attachments` with
file-on-disk assertions

Closes OPEN-3022

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-05 09:09:59 +00:00
Ubbe
5474f7c495 feat(frontend/copilot): add output action buttons (upvote, downvote) with Langfuse feedback (#12260)
## Summary
- Feedback is submitted to the backend Langfuse integration
(`/api/chat/sessions/{id}/feedback`) for observability
- Downvote opens a modal dialog for optional detailed feedback text (max
2000 chars)
- Buttons are hidden during streaming and appear on hover; once feedback
is selected they stay visible

## Changes
- **`AssistantMessageActions.tsx`** (new): Renders copy (CopySimple),
thumbs-up, and thumbs-down buttons using `MessageAction` from the design
system. Visual states for selected feedback (green for upvote, red for
downvote with filled icons).
- **`FeedbackModal.tsx`** (new): Dialog with a textarea for optional
downvote comment, using the design system `Dialog` component.
- **`useMessageFeedback.ts`** (new): Hook managing per-message feedback
state and backend submission via `POST
/api/chat/sessions/{id}/feedback`.
- **`ChatMessagesContainer.tsx`** (modified): Renders
`AssistantMessageActions` after `MessageContent` for assistant messages
when not streaming.
- **`ChatContainer.tsx`** (modified): Passes `sessionID` prop through to
`ChatMessagesContainer`.

## Test plan
- [ ] Verify action buttons appear on hover over assistant messages
- [ ] Verify buttons are hidden during active streaming
- [ ] Click copy button → text copied to clipboard, success toast shown
- [ ] Click upvote → green highlight, "Thank you" toast, button locked
- [ ] Click downvote → red highlight, feedback modal opens
- [ ] Submit feedback modal with/without comment → modal closes,
feedback sent
- [ ] Cancel feedback modal → modal closes, downvote stays locked
- [ ] Verify feedback POST reaches `/api/chat/sessions/{id}/feedback`

### Linear issue
Closes SECRT-2051

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 17:01:05 +08:00
Abhimanyu Yadav
f1b771b7ee feat(platform): switch builder file inputs from base64 to workspace uploads (#12226)
## Summary

Builder node file inputs were stored as base64 data URIs directly in
graph JSON, bloating saves and causing lag. This PR uploads files to the
existing workspace system and stores lightweight `workspace://`
references instead.

## What changed

- **Upload**: When a user picks a file in a builder node input, it gets
uploaded to workspace storage and the graph stores a small
`workspace://file-id#mime/type` URI instead of a huge base64 string.

- **Delete**: When a user clears a file input, the workspace file is
soft-deleted from storage so it doesn't leave orphaned files behind.

- **Execution**: Wired up `workspace_id` on `ExecutionContext` so blocks
can resolve `workspace://` URIs during graph runs. `store_media_file()`
already knew how to handle them.

- **Output rendering**: Added a renderer that displays `workspace://`
URIs as images, videos, audio players, or download cards in node output.

- **Proxy fix**: Removed a `Content-Type: text/plain` override on
multipart form responses that was breaking the generated hooks' response
parsing.

Existing graphs with base64 `data:` URIs continue to work — no migration
needed.

## Test plan

- [x] Upload file in builder → spinner shows, completes, file label
appears
- [x] Save/reload graph → `workspace://` URI persists, not base64
- [x] Clear file input → workspace file is deleted
- [x] Run graph → blocks resolve `workspace://` files correctly
- [x] Output renders images/video/audio from `workspace://` URIs
- [x] Old graphs with base64 still work

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 08:38:18 +00:00
Otto
aa7a2f0a48 hotfix(frontend/signup): Add missing createUser() call in password signup (#12287)
Requested by @0ubbe

Password signup was missing the backend `createUser()` call that the
OAuth callback flow already had. This caused `getOnboardingStatus()` to
fail/hang for new users whose backend record didn't exist yet, resulting
in an infinite spinner after account creation.

## Root Cause

| Flow | createUser() | getOnboardingStatus() | Result |
|------|-------------|----------------------|--------|
| OAuth signup |  Called |  Works | Redirects correctly |
| Password signup |  Missing |  Fails/hangs | Infinite spinner |

## Fix

Adds `createUser()` call in `signup/actions.ts` after session is set,
before onboarding status check — matching the OAuth callback pattern.
Includes error handling with Sentry reporting.

## Testing

- Create a new password account → should redirect without spinner
- OAuth signup unaffected (no changes to that flow)

Fixes OPEN-3023

---------

Co-authored-by: Lluis Agusti <hi@llu.lu>
2026-03-05 16:11:51 +08:00
Nicholas Tindle
3722d05b9b fix(frontend/builder): make Google Drive file inputs chainable (#12274)
Resolves: OPEN-3018

Google Drive picker fields on INPUT blocks were missing connection
handles, making them non-chainable in the new builder.

### Changes 🏗️

- **Render `TitleFieldTemplate` with `InputNodeHandle`** — uses
`getHandleId()` with `fieldPathId.$id` (which correctly resolves to e.g.
`agpt_%_spreadsheet`), fixing the previous `_@_` handle error caused by
using `idSchema.$id` (undefined for custom RJSF FieldProps)
- **Override `showHandles: !!nodeId`** in uiOptions — the INPUT block's
`generate-ui-schema.ts` sets `showHandles: false`, but Google Drive
fields need handles to be chainable
- **Hide picker content when handle is connected** — uses
`useEdgeStore.isInputConnected()` to detect wired connections and
conditionally hides the picker/placeholder UI

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Add a Google Drive file input block to a graph in the new builder
  - [x] Verify the connection handle appears on the input
  - [x] Connect another block's output to the Google Drive input handle
- [x] Verify the picker UI hides when connected and reappears when
disconnected
- [x] Verify the Google Drive picker still works normally on non-INPUT
block nodes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes input-handle ID generation and conditional rendering for
Google Drive fields in the builder; regressions could break edge
connections or hide the picker unexpectedly on some nodes.
> 
> **Overview**
> Google Drive picker fields now render a proper RJSF
`TitleFieldTemplate` (and thus input handles) using a computed
`handleId` derived from `fieldPathId.$id`, and force `showHandles` on
when a `nodeId` is present.
> 
> The picker/placeholder UI is now conditionally hidden when
`useEdgeStore.isInputConnected()` reports the input handle is connected,
preventing duplicate input UI when the value comes from an upstream
node.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
1f1df53a38. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: abhi1992002 <abhimanyu1992002@gmail.com>
Co-authored-by: Abhimanyu Yadav <122007096+Abhi1992002@users.noreply.github.com>
2026-03-05 04:28:01 +00:00
Zamil Majdy
592830ce9b feat(copilot): persist large tool outputs to workspace with retrieval instructions (#12279)
## Summary
- Large tool outputs (>80K chars) are now persisted to session workspace
storage before truncation, preventing permanent data loss
- Truncated output includes a head preview (50K chars) with clear
retrieval instructions referencing `read_workspace_file` with
offset/length
- Added `offset` and `length` parameters to `ReadWorkspaceFileTool` for
paginated reads of large files without re-triggering truncation

## Problem
Tool outputs exceeding 100K chars were permanently lost — truncated by
`StreamToolOutputAvailable.model_post_init` using middle-out truncation.
The model had no way to retrieve the full output later, causing
recursive read loops where the agent repeatedly tries to re-read
truncated data.

## Solution
1. **`BaseTool.execute()`** — When output exceeds 80K chars, persist
full output to workspace at `tool-outputs/{tool_call_id}.json`, then
replace with a head preview wrapped in `<tool-output-truncated>` tags
containing retrieval instructions
2. **`ReadWorkspaceFileTool`** — New `offset`/`length` parameters enable
paginated reads so the agent can fetch slices without re-triggering
truncation
3. **Graceful fallback** — If workspace write fails, returns raw output
unchanged for existing truncation to handle

## Test plan
- [x] `base_test.py`: 5 tests covering persist+preview, fallback on
error, small output passthrough, large output persistence, anonymous
user skip
- [x] `workspace_files_test.py`: Ranged read test covering offset+length
slice, offset-only, offset beyond file length
- [ ] CI passes
- [ ] Review comments addressed
2026-03-05 00:50:01 +00:00
Zamil Majdy
6cc680f71c feat(copilot): improve SDK loading time (#12280)
## Summary
- Skip CLI version check at worker init (saves ~300ms/request)
- Pre-warm bundled CLI binary at startup to warm OS page caches (~500ms
saved on first request per worker)
- Parallelize E2B setup, system prompt fetch, and transcript download
with `asyncio.gather()` (saves ~200-500ms)
- Enable Langfuse prompt caching with configurable TTL (default 300s)

## Test plan
- [ ] `poetry run pytest backend/copilot/sdk/service_test.py -s -vvv`
- [ ] Manual: send copilot messages via SDK path, verify resume still
works on multi-turn
- [ ] Check executor logs for "CLI pre-warm done" messages
2026-03-05 00:49:14 +00:00
Otto
b342bfa3ba fix(frontend): revalidate layout after email/password login (#12285)
Requested by @ntindle

After logging in with email/password, the page navigates but renders a
blank/unauthenticated state (just logo + cookie banner). A manual page
refresh fixes it.

The `login` server action calls `signInWithPassword()` server-side but
doesn't call `revalidatePath()`, so Next.js serves cached RSC payloads
that don't reflect the new auth state. The OAuth callback route already
does this correctly.

**Fix:** Add `revalidatePath(next, "layout")` after successful login,
matching the OAuth callback pattern.

Closes SECRT-2059
2026-03-04 22:25:48 +00:00
Zamil Majdy
0215332386 feat(copilot): remove legacy copilot, add baseline non-SDK mode with tool calling (#12276)
## Summary
- Remove ~1200 lines of broken/unmaintained non-SDK copilot streaming
code (retry logic, parallel tool calls, context window management)
- Add `stream_chat_completion_baseline()` as a clean fallback LLM path
with full tool-calling support when `CHAT_USE_CLAUDE_AGENT_SDK=false`
(e.g. when Anthropic is down)
- Baseline reuses the same shared `TOOL_REGISTRY`,
`get_available_tools()`, and `execute_tool()` as the SDK path
- Move baseline code to dedicated `baseline/` folder (mirrors `sdk/`
structure)
- Clean up SDK service: remove unused params, fix model/env resolution,
fix stream error persistence
- Clean up config: remove `max_retries`, `thinking_enabled` fields
(non-SDK only)

## Changes
| File | Action |
|------|--------|
| `backend/copilot/baseline/__init__.py` | New — package export |
| `backend/copilot/baseline/service.py` | New — baseline streaming with
tool-call loop |
| `backend/copilot/baseline/service_test.py` | New — multi-turn keyword
recall test |
| `backend/copilot/service.py` | Remove ~1200 lines of legacy code, keep
shared helpers only |
| `backend/copilot/executor/processor.py` | Simplify branching to SDK vs
baseline |
| `backend/copilot/sdk/service.py` | Remove unused params, fix model/env
separation, fix stream error persistence |
| `backend/copilot/config.py` | Remove `max_retries`, `thinking_enabled`
|
| `backend/copilot/service_test.py` | Keep SDK test only (baseline test
moved) |
| `backend/copilot/parallel_tool_calls_test.py` | Deleted (tested
removed code) |

## Test plan
- [x] `poetry run format` passes
- [x] CI passes (all 3 Python versions, types, CodeQL)
- [ ] SDK path works unchanged in production
- [x] Baseline path (`CHAT_USE_CLAUDE_AGENT_SDK=false`) streams
responses with tool calling
- [x] Baseline emits correct Vercel AI SDK stream protocol events
2026-03-04 13:51:46 +00:00
Zamil Majdy
160d6eddfb feat(copilot): enable OpenRouter broadcast for SDK /messages endpoint (#12277)
## Summary

OpenRouter Broadcast silently drops traces for the Anthropic-native
`/api/v1/messages` endpoint unless an `x-session-id` HTTP header is
present. This was confirmed by systematic testing against our Langfuse
integration:

| Test | Endpoint | `x-session-id` header | Broadcast to Langfuse |
|------|----------|-----------------------|----------------------|
| 1 | `/chat/completions` | N/A (body fields work) |  |
| 2 | `/messages` (body fields only) |  |  |
| 3 | `/messages` (header + body) |  |  |
| 4 | `/messages` (`metadata.user_id` only) |  |  |
| 5 | `/messages` (header only) |  |  |

**Root cause:** OpenRouter only triggers broadcast for the `/messages`
endpoint when the `x-session-id` HTTP header is present — body-level
`session_id` and `metadata.user_id` are insufficient.

### Changes
- **SDK path:** Inject `x-session-id` and `x-user-id` via
`ANTHROPIC_CUSTOM_HEADERS` env var in `_build_sdk_env()`, which the
Claude Agent SDK CLI reads and attaches to every outgoing API request
- **Non-SDK path:** Add `trace` object (`trace_name` + `environment`) to
`extra_body` for richer broadcast metadata in Langfuse

This creates complementary traces alongside the existing OTEL
integration: broadcast provides cost/usage data from OpenRouter while
OTEL provides full tool-call observability with `userId`, `sessionId`,
`environment`, and `tags`.

## Test plan
- [x] Verified via test script: `/messages` with `x-session-id` header →
trace appears in Langfuse with correct `sessionId`
- [x] Verified `/chat/completions` with `trace` object → trace appears
with custom `trace_name`
- [x] Pre-commit hooks pass (ruff, black, isort, pyright)
- [ ] Deploy to dev and verify broadcast traces appear for real copilot
SDK sessions
2026-03-04 09:07:48 +00:00
Ubbe
a5db9c05d0 feat(frontend/copilot): add text-to-speech and share output actions (#12256)
## Summary
- Add text-to-speech action button to CoPilot assistant messages using
the browser Web Speech API
- Add share action button that uses the Web Share API with clipboard
fallback
- Replace inline SVG copy icon with Phosphor CopyIcon for consistency

## Linked Issue
SECRT-2052

## Test plan
- [ ] Verify copy button still works
- [ ] Click speaker icon and verify TTS reads aloud
- [ ] Click stop while playing and verify speech stops
- [ ] Click share icon and verify native share or clipboard fallback

Note: This PR should be merged after SECRT-2051 PR

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 16:08:54 +08:00
Otto
b74d41d50c fix(backend): handle UniqueViolationError in workspace file retry path (#12267)
Requested by @majdyz

When two concurrent requests write to the same workspace file path with
`overwrite=True`, the retry after deleting the conflicting file could
also hit a `UniqueViolationError`. This raw Prisma exception was
bubbling up unhandled to Sentry as a high-priority alert
(AUTOGPT-SERVER-7ZA).

Now the retry path catches `UniqueViolationError` specifically and
converts it to a `ValueError` with a clear message, matching the
existing pattern for the non-overwrite path.

**Change:** `autogpt_platform/backend/backend/util/workspace.py` — added
a specific `UniqueViolationError` catch before the generic `Exception`
catch in the retry block.

**Risk:** Minimal — only affects the already-failing retry path. No
behavior change for success paths.

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-04 07:04:50 +00:00
Otto
a897f9e124 feat(copilot): render context compaction as tool-call UI events (#12250)
Requested by @majdyz

When CoPilot compacts (summarizes/truncates) conversation history to fit
within context limits, the user now sees it rendered like a tool call —
a spinner while compaction runs, then a completion notice.

**Backend:**
- Added `compaction_start_events()`, `compaction_end_events()`,
`compaction_events()` in `response_model.py` using the existing
tool-call SSE protocol (`tool-input-start` → `tool-input-available` →
`tool-output-available`)
- All three compaction paths (legacy `service.py`, SDK pre-query, SDK
mid-stream) use the same pattern
- Pre-query and SDK-internal compaction tracked independently so neither
suppresses the other

**Frontend:**
- Added `compaction` tool category to `GenericTool` with
`ArrowsClockwise` icon
- Shows "Summarizing earlier messages…" with spinner while running
- Shows "Earlier messages were summarized" when done
- No expandable accordion — just the status line

**Cleanup:**
- Removed unused `system_notice_start/end_events`,
`COMPACTION_STARTED_MSG`
- Removed unused `system_notice_events`, `system_error_events`,
`_system_text_events`

Closes SECRT-2053

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-04 05:31:32 +00:00
Zamil Majdy
7fd26d3554 feat(copilot): run_mcp_tool — MCP server discovery and execution in Otto (#12213)
## Summary

Enables Otto (the AutoGPT copilot) to connect to any MCP (Model Context
Protocol) server, discover its tools, and execute them — with the same
credential login UI used in the graph builder.

**Why a dedicated `run_mcp_tool` instead of reusing `run_block` +
MCPToolBlock?**
Two blockers make `run_block` unworkable for MCP:
1. **No discovery mode** — `MCPToolBlock` errors with "No tool selected"
when `selected_tool` is empty; the agent can't learn what tools exist
before picking one.
2. **Credential matching bug** — `find_matching_credential()` (the block
execution path) does NOT check MCP server URLs; it would match any
stored MCP OAuth credential regardless of server. The correct
`_credential_is_for_mcp_server()` helper only applies in the graph path.

## Changes

### Backend
- **New `run_mcp_tool` copilot tool** (`run_mcp_tool.py`) — two-stage
flow:
1. `run_mcp_tool(server_url)` → discovers available tools via
`MCPClient.list_tools()`
2. `run_mcp_tool(server_url, tool_name, tool_arguments)` → executes via
`MCPClient.call_tool()`
- Lazy auth: fast DB credential lookup first
(`MCPToolBlock._auto_lookup_credential`); on HTTP 401/403 with no stored
creds, returns `SetupRequirementsResponse` so the frontend renders the
existing CredentialsGroupedView OAuth login card
- **New response models** in `models.py`: `MCPToolsDiscoveredResponse`,
`MCPToolOutputResponse`, `MCPToolInfo`
- **Exclude MCPToolBlock** from `find_block` / `run_block`
(`COPILOT_EXCLUDED_BLOCK_TYPES`)
- **System prompt update** — MCP section with two-step flow,
`input_schema` guidance, auth-wait instruction, and registry URL
(`registry.modelcontextprotocol.io`)

### Frontend
- **`RunMCPToolComponent`** — routes between credential prompt (reuses
`SetupRequirementsCard` from RunBlock) and result card; discovery step
shows only a minimal in-progress animation (agent-internal, not
user-facing)
- **`MCPToolOutputCard`** — renders tool result as formatted JSON or
plain text
- **`helpers.tsx`** — type guards (`isMCPToolOutput`,
`isSetupRequirementsOutput`, `isErrorOutput`), output parsing, animation
text
- Registered `tool-run_mcp_tool` case in `ChatMessagesContainer`

## Test plan

- [ ] Call `run_mcp_tool(server_url)` with a public MCP server → see
discovery animation, agent gets tool list
- [ ] Call `run_mcp_tool(server_url, tool_name, tool_arguments)` → see
`MCPToolOutputCard` with result
- [ ] Call with an auth-required server and no stored creds →
`SetupRequirementsCard` renders with MCP OAuth button
- [ ] After connecting credentials, retry → executes successfully
- [ ] `find_block("MCP")` returns no results (MCPToolBlock excluded)
- [ ] Backend unit tests: mock `MCPClient` for discovery + execution +
auth error paths

---------

Co-authored-by: Otto (AGPT) <otto@agpt.co>
2026-03-04 05:30:38 +00:00
Zamil Majdy
b504cf9854 feat(copilot): Add agent-browser multi-step browser automation tools (#12230)
## Summary

Adds three new Copilot tools for multi-step browser automation using the
[agent-browser](https://github.com/vercel-labs/agent-browser) CLI
(Playwright-based local daemon):

- **`browser_navigate`** — navigate to a URL and get an
accessibility-tree snapshot with `@ref` IDs
- **`browser_act`** — interact with page elements (click, fill, scroll,
check, press, select, `dblclick`, `type`, `wait`, back, forward,
reload); returns updated snapshot
- **`browser_screenshot`** — capture annotated screenshot (with `@ref`
overlays) and save to user workspace

Also adds **`browse_web`** (Stagehand + Browserbase) for one-shot
JS-rendered page extraction.

### Why two browser tools?

| Tool | When to use |
|------|-------------|
| `browse_web` | Single-shot extraction — cloud Browserbase session, no
local daemon needed |
| `browser_navigate` / `browser_act` | Multi-step flows (login →
navigate → scrape), persistent session within a Copilot session |

### Design decisions

- **SSRF protection**: Uses the same `validate_url()` from
`backend.util.request` as HTTP blocks — async DNS, all IPs checked, full
RFC 1918 + link-local + IPv6 coverage
- **Session isolation**: `_run()` passes both `--session <id>` (isolated
Chromium context per Copilot session) **and** `--session-name <id>`
(persist cookies within a session), preventing cross-session state
leakage while supporting login flows
- **Snapshot truncation**: Interactive-only accessibility tree
(`snapshot -i`) capped at 20 000 chars with a continuation hint
- **Screenshot storage**: PNG bytes uploaded to user workspace via
`WriteWorkspaceFileTool`; returns `file_id` for retrieval

### Bugs fixed in this PR

- Session isolation bug: `--session-name` alone shared browser history
across different Copilot sessions; added `--session` to isolate contexts
- Missing actions: added `dblclick`, `type` (append without clearing),
`wait` (CSS selector or ms delay)

## Test plan

- [x] 53 unit tests covering all three tools, all actions, SSRF
integration, auth check, session isolation, snapshot truncation,
timeout, missing binary
- [x] Integration test: real `agent-browser` CLI + Anthropic API
tool-calling loop (3/3 scenarios passed)
- [x] Linting (Ruff, isort, Black, Pyright) all passing

```
backend/copilot/tools/agent_browser_test.py  53 passed in 17.79s
```
2026-03-03 21:55:28 +00:00
Zamil Majdy
29da8db48e feat(copilot): E2B cloud sandbox — unified file tools, persistent execution, output truncation (#12212)
## Summary

- **E2B file tools**: New MCP tools
(`read_file`/`write_file`/`edit_file`/`glob`/`grep`) that operate
directly on the E2B sandbox filesystem (`/home/user`). When E2B is
active, these replace SDK built-in `Read/Write/Edit/Glob/Grep` so all
tools share a single coherent filesystem with `bash_exec` — no sync
needed.
- **E2B sandbox lifecycle**: New `e2b_sandbox.py` manages sandbox
creation and reconnection via Redis, with stale-key cleanup on
reconnection failure.
- **E2B enabled by default**: `use_e2b_sandbox` defaults to `True`; set
`CHAT_USE_E2B_SANDBOX=false` to disable.
- **Centralized output truncation**: All MCP tool outputs are truncated
via `_truncating` wrapper and stashed (`_pending_tool_outputs`) to
bypass SDK's head-truncation for the frontend.
- **Frontend tool display**: `GenericTool.tsx` now renders bash
stdout/stderr, file content, edit diffs (old/new), todo lists, and
glob/grep results with category-specific icons and status text.
- **Workspace file tools + E2B**: `read_workspace_file`'s `save_to_path`
and `write_workspace_file`'s `source_path` route to E2B sandbox when
active.

## Files changed

| Area | Files | What |
|------|-------|------|
| E2B file tools | `sdk/e2b_file_tools.py`, `sdk/e2b_file_tools_test.py`
| MCP file tool handlers + tests |
| E2B sandbox | `tools/e2b_sandbox.py` | Sandbox lifecycle
(create/reconnect/Redis) |
| Tool adapter | `sdk/tool_adapter.py` | MCP server, truncation, stash,
path validation |
| Service | `sdk/service.py` | E2B integration, prompt supplements |
| Security | `sdk/security_hooks.py`, `sdk/security_hooks_test.py` |
Path validation for E2B mode |
| Bash exec | `tools/bash_exec.py` | E2B execution path |
| Workspace files | `tools/workspace_files.py`,
`tools/workspace_files_test.py` | E2B-aware save/source paths |
| Config | `copilot/config.py` | E2B config fields (default on) |
| Truncation | `util/truncate.py` | Middle-out truncation fix |
| Frontend | `GenericTool.tsx` | Tool-specific display rendering |

## Test plan

- [x] `security_hooks_test.py` — 43 tests (path validation, tool access,
deny messages)
- [x] `e2b_file_tools_test.py` — 19 tests (path resolution, local read
safety)
- [x] `workspace_files_test.py` — 17 tests (ephemeral path validation)
- [x] CI green (backend 3.11/3.12/3.13, lint, types, e2e)
2026-03-03 21:31:38 +00:00
Nicholas Tindle
757ec1f064 feat(platform): Add file upload to copilot chat [SECRT-1788] (#12220)
## Summary

- Add file attachment support to copilot chat (documents, images,
spreadsheets, video, audio)
- Show upload progress with spinner overlays on file chips during upload
- Display attached files as styled pills in sent user messages using AI
SDK's native `FileUIPart`
- Backend upload endpoint with virus scanning (ClamAV), per-file size
limits, and per-user storage caps
- Enrich chat stream with file metadata so the LLM can access files via
`read_workspace_file`

Resolves: [SECRT-1788](https://linear.app/autogpt/issue/SECRT-1788)

### Backend
| File | Change |
|------|--------|
| `chat/routes.py` | Accept `file_ids` in stream request, enrich user
message with file metadata |
| `workspace/routes.py` | New `POST /files/upload` and `GET
/storage/usage` endpoints |
| `executor/utils.py` | Thread `file_ids` through
`CoPilotExecutionEntry` and RabbitMQ |
| `settings.py` | Add `max_file_size_mb` and `max_workspace_storage_mb`
config |

### Frontend
| File | Change |
|------|--------|
| `AttachmentMenu.tsx` | **New** — `+` button with popover for file
category selection |
| `FileChips.tsx` | **New** — file preview chips with upload spinner
state |
| `MessageAttachments.tsx` | **New** — paperclip pills rendering
`FileUIPart` in chat bubbles |
| `upload/route.ts` | **New** — Next.js API proxy for multipart uploads
to backend |
| `ChatInput.tsx` | Integrate attachment menu, file chips, upload
progress |
| `useCopilotPage.ts` | Upload flow, `FileUIPart` construction,
transport `file_ids` extraction |
| `ChatMessagesContainer.tsx` | Render file parts as
`MessageAttachments` |
| `ChatContainer.tsx` / `EmptySession.tsx` | Thread `isUploadingFiles`
prop |
| `useChatInput.ts` | `canSendEmpty` option for file-only sends |
| `stream/route.ts` | Forward `file_ids` to backend |

## Test plan

- [x] Attach files via `+` button → file chips appear with X buttons
- [x] Remove a chip → file is removed from the list
- [x] Send message with files → chips show upload spinners → message
appears with file attachment pills
- [x] Upload failure → toast error, chips revert to editable (no phantom
message sent)
- [x] New session (empty form): same upload flow works
- [x] Messages without files render normally
- [x] Network tab: `file_ids` present in stream POST body

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds authenticated file upload/storage-quota enforcement and threads
`file_ids` through the chat streaming path, which affects data handling
and storage behavior. Risk is mitigated by UUID/workspace scoping, size
limits, and virus scanning but still touches security- and
reliability-sensitive upload flows.
> 
> **Overview**
> Copilot chat now supports attaching files: the frontend adds
drag-and-drop and an attach button, shows selected files as removable
chips with an upload-in-progress state, and renders sent attachments
using AI SDK `FileUIPart` with download links.
> 
> On send, files are uploaded to the backend (with client-side limits
and failure handling) and the chat stream request includes `file_ids`;
the backend sanitizes/filters IDs, scopes them to the user’s workspace,
appends an `[Attached files]` metadata block to the user message for the
LLM, and forwards the sanitized IDs through `enqueue_copilot_turn`.
> 
> The backend adds `POST /workspace/files/upload` (filename
sanitization, per-file size limit, ClamAV scan, and per-user storage
quota with post-write rollback) plus `GET /workspace/storage/usage`,
introduces `max_workspace_storage_mb` config, optimizes workspace size
calculation, and fixes executor cleanup to avoid un-awaited coroutine
warnings; new route tests cover file ID validation and upload quota/scan
behaviors.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
8d3b95d046. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 20:23:27 +00:00
Ubbe
9442c648a4 fix(platform/copilot): bypass Vercel SSE proxy, refactor hook architecture (#12254)
## Summary

Reliability, architecture, and UX improvements for the CoPilot SSE
streaming pipeline.

### Frontend

- **SSE proxy bypass**: Connect directly to the Python backend for SSE
streams, avoiding the Next.js serverless proxy and its 800s Vercel
function timeout ceiling
- **Hook refactor**: Decompose the 490-line `useCopilotPage` monolith
into focused domain modules:
- `helpers.ts` — pure functions (`deduplicateMessages`,
`resolveInProgressTools`)
- `store.ts` — Zustand store for shared UI state (`sessionToDelete`,
drawer open/close)
- `useCopilotStream.ts` — SSE transport, `useChat` wrapper,
reconnect/resume logic, stop+cancel
  - `useCopilotPage.ts` — thin orchestrator (~160 lines)
- **ChatMessagesContainer refactor**: Split 525-line monolith into
sub-components:
  - `helpers.ts` — pure text parsing (markers, workspace URLs)
- `components/ThinkingIndicator.tsx` — ScaleLoader animation + cycling
phrases with pulse
- `components/MessagePartRenderer.tsx` — tool dispatch switch +
workspace media
- **Stop UX fixes**:
- Guard `isReconnecting` and resume effect with `isUserStoppingRef` so
the input unlocks immediately after explicit stop (previously stuck
until page refresh)
- Inject cancellation marker locally in `stop()` so "You manually
stopped this chat" shows instantly
- **Thinking indicator polish**: Replace MorphingBlob SVG with
ScaleLoader (16px), fix initial dark circle flash via
`animation-fill-mode: backwards`, smooth `animate-pulse` text instead of
shimmer gradient
- **ChatSidebar consolidation**: Reads `sessionToDelete` from Zustand
store instead of duplicating delete state/mutation locally
- **Auth error handling**: `getAuthHeaders()` throws on failure instead
of silently returning empty headers; 401 errors show user-facing toast
- **Stale closure fix**: Use refs for reconnect guards to avoid stale
closures during rapid reconnect cycles
- **Session switch resume**: Clear `hasResumedRef` on session switch so
returning to a session with an active stream auto-reconnects
- **Target session cache invalidation**: Invalidate the target session's
React Query cache on switch so `active_stream` is accurate for resume
- **Dedup hardening**: Content-fingerprint dedup resets on non-assistant
messages, preventing legitimate repeated responses from being dropped
- **Marker prefixes**: Hex-suffixed markers (`[__COPILOT_ERROR_f7a1__]`)
to prevent LLM false-positives
- **Code style**: Remove unnecessary `useCallback` wrappers per project
convention, replace unsafe `as` cast with runtime type guard

### Backend (minimal)

- **Faster heartbeat**: 10s → 3s interval to keep SSE alive through
proxies/LBs
- **Faster stall detection**: SSE subscriber queue timeout 30s → 10s
- **Marker prefixes**: Matching hex-suffixed prefixes for error/system
markers

## Test plan

- [ ] Verify SSE streams connect directly to backend (no Next.js proxy
in network tab)
- [ ] Verify reconnect works on transient disconnects (up to 3 attempts
with backoff)
- [ ] Verify auth failure shows user-facing toast
- [ ] Verify switching sessions and switching back shows messages and
resumes active stream
- [ ] Verify deleting a chat from sidebar works (shared Zustand state)
- [ ] Verify mobile drawer delete works (shared Zustand state)
- [ ] Verify thinking indicator shows ScaleLoader + pulsing text, no
dark circle flash
- [ ] Verify stopping a stream immediately unlocks the input and shows
"You manually stopped this chat"
- [ ] Verify marker prefix parsing still works with hex-suffixed
prefixes
- [ ] `pnpm format && pnpm lint && pnpm types` pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 19:56:24 +08:00
Zamil Majdy
1c51dd18aa fix(test): backdate UserBalance.updatedAt in test_block_credit_reset (#12236)
## Root cause

The test constructs \`month3\` using \`datetime.now().replace(month=3,
day=1)\` — hardcoded to **March of the real current year**. When
\`update(balance=400)\` runs, Prisma auto-sets \`updatedAt\` to the
**real wall-clock time**.

The refill guard in \`BetaUserCredit.get_credits\` is:
\`\`\`python
if (snapshot_time.year, snapshot_time.month) == (cur_time.year,
cur_time.month):
    return balance  # same month → skip refill
\`\`\`

This means the test only fails when run **during the real month of
March**, because the mocked \`month3\` and the real \`updatedAt\` both
land in March:

| Test runs in | \`snapshot_time\` (real \`updatedAt\`) | \`cur_time\`
(mocked month3) | Same? | Result |
|---|---|---|---|---|
| January 2026 | \`(2026, 1)\` | \`(2026, 3)\` |  | refill triggers  |
| February 2026 | \`(2026, 2)\` | \`(2026, 3)\` |  | refill triggers 
|
| **March 2026** | **\`(2026, 3)\`** | **\`(2026, 3)\`** | **** |
**skips refill ** |
| April 2026 | \`(2026, 4)\` | \`(2026, 3)\` |  | refill triggers  |

It would silently pass again in April, then fail again next March 2027.

## Fix

Explicitly pass \`updatedAt=month2\` when updating the balance to 400,
so the month2→month3 transition is correctly detected regardless of when
the test actually runs. This matches the existing pattern used earlier
in the same test for the month1 setup.

## Test plan
- [ ] \`pytest backend/data/credit_test.py::test_block_credit_reset\`
passes
- [ ] No other credit tests broken
2026-03-01 07:46:04 +00:00
Zamil Majdy
6f4f80871d feat(copilot): Langfuse SDK tracing for Claude Agent SDK path (#12228)
## Problem

The Copilot SDK path (`ClaudeSDKClient`) routes API calls through `POST
/api/v1/messages` (Anthropic-native endpoint). OpenRouter Broadcast
**silently excludes** this endpoint — it only forwards `POST
/api/v1/chat/completions` (OpenAI-compat) to Langfuse. As a result, all
SDK-path turns were invisible in Langfuse.

**Root cause confirmed** via live pod test: two HTTP calls (one per
endpoint), only the `/chat/completions` one appeared in Langfuse.

## Solution

Add **Langfuse SDK direct tracing** in `sdk/service.py`, wrapping each
`stream_chat_completion_sdk()` call with a `generation` observation.

### What gets captured per user turn
| Field | Value |
|---|---|
| `name` | `copilot-sdk-session` |
| `model` | resolved SDK model |
| `input` | user message |
| `output` | final accumulated assistant text |
| `usage_details.input` | aggregated input tokens (from
`ResultMessage.usage`) |
| `usage_details.output` | aggregated output tokens |
| `cost_details.total` | total cost USD |
| trace `session_id` | copilot session ID |
| trace `user_id` | authenticated user ID |
| trace `tags` | `["sdk"]` |

Token counts and cost are **aggregated** across all internal Anthropic
API calls in the session (tool-use turns included), sourced from
`ResultMessage.usage`.

### Implementation notes
- Span is opened via
`start_as_current_observation(as_type='generation')` before
`ClaudeSDKClient` enters
- Span is **always closed in `finally`** — survives errors,
cancellations, and user stops
- Fails open: any Langfuse init error is caught and logged at `DEBUG`,
tracing is disabled for that turn but the session continues normally
- Only runs when `_is_langfuse_configured()` returns true (same guard as
the non-SDK path)

## Also included

`reproduce_openrouter_broadcast_gap.py` — standalone repro script (no
sensitive data) demonstrating that `/api/v1/messages` is not captured by
OpenRouter Broadcast while `/api/v1/chat/completions` is. To be filed
with OpenRouter support.

## Test plan

- [ ] Deploy to dev, send a Copilot message via the SDK path
- [ ] Confirm trace appears in Langfuse with `tags=["sdk"]`, correct
`session_id`/`user_id`, non-zero token counts
- [ ] Confirm session still works normally when `LANGFUSE_PUBLIC_KEY` is
not set (no-op path)
- [ ] Confirm session still works on error/cancellation (span closed in
finally)
2026-02-27 16:26:46 +00:00
Ubbe
e8cca6cd9a feat(frontend/copilot): migrate ChatInput to ai-sdk prompt-input component (#12207)
## Summary

- **Migrate ChatInput** to composable `PromptInput*` sub-components from
AI SDK Elements, replacing the custom implementation with a boxy,
Claude-style input layout (textarea + footer with tools and submit)
- **Eliminate JS-based DOM height manipulation** (60+ lines removed from
`useChatInput.ts`) in favor of CSS-driven auto-resize via
`min-h`/`max-h`, fixing input sizing jumps (SECRT-2040)
- **Change stop button color** from red to black (`bg-zinc-800`) per
SECRT-2038, while keeping mic recording button red
- **Add new UI primitives**: `InputGroup`, `Spinner`, `Textarea`, and
`prompt-input` composition layer

### New files
- `src/components/ai-elements/prompt-input.tsx` — Composable prompt
input sub-components (PromptInput, PromptInputBody, PromptInputTextarea,
PromptInputFooter, PromptInputTools, PromptInputButton,
PromptInputSubmit)
- `src/components/ui/input-group.tsx` — Layout primitive with flex-col
support, rounded-xlarge styling
- `src/components/ui/spinner.tsx` — Loading spinner using Phosphor
CircleNotch
- `src/components/ui/textarea.tsx` — Base shadcn Textarea component

### Modified files
- `ChatInput.tsx` — Rewritten to compose PromptInput* sub-components
with InputGroup
- `useChatInput.ts` — Simplified: removed maxRows, hasMultipleLines,
handleKeyDown, all DOM style manipulation
- `useVoiceRecording.ts` — Removed `baseHandleKeyDown` dependency;
PromptInputTextarea handles Enter→submit natively

## Resolves
- SECRT-2042: Migrate copilot chat input to ai-sdk prompt-input
component
- SECRT-2038: Change stop button color from red to black

## Test plan
- [ ] Type a message and send it — verify it submits and clears the
input
- [ ] Multi-line input grows smoothly without sizing jumps
- [ ] Press Enter to send, Shift+Enter for new line
- [ ] Voice recording: press space on empty input to start, space again
to stop
- [ ] Mic button stays red while recording; stop-generating button is
black
- [ ] Input has boxy rectangular shape with rounded-xlarge corners
- [ ] Streaming: stop button appears during generation, clicking it
stops the stream
- [ ] EmptySession layout renders correctly with the new input
- [ ] Input is disabled during transcription with "Transcribing..."
placeholder

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 15:24:19 +00:00
Abhimanyu Yadav
bf6308e87c fix(frontend): truncate node output and fix dialog overflow (#12222)
## Summary
- **Truncate node output on canvas**: Input content now uses
`shortContent={true}` so it renders truncated text instead of full
content. Output items are capped to 3 per pin with `.slice(0, 3)`.
- **Increase truncation limit**: `TextRenderer` truncation raised from
100 to 200 characters for better readability.
- **Fix dialog content overflow**: Removed legacy `ScrollArea` from the
Full Preview dialog (`NodeDataViewer`) — it was preventing proper width
constraint, causing JSON/code content to overflow beyond the dialog
boundary. Replaced with a simple flex container that respects the
dialog's width.
- **Reposition action buttons**: Copy/download buttons moved from
right-side/absolute positioning to below the content, aligned left, for
better layout with horizontally-scrollable content.
- **Add overflow protection to ContentRenderer**: Added
`overflow-hidden` and `pre` word-wrap rules to prevent content from
breaking out of the node card on the canvas.

## Test plan
- [x] Open a node with long JSON output on the builder canvas — verify
content is truncated
- [x] Click the expand button to open "Full Output Preview" dialog —
verify content stays within dialog bounds and scrolls horizontally if
needed
- [x] Verify copy/download buttons appear below the content,
left-aligned
- [x] Check that input data also shows truncated on the canvas node
- [x] Verify output items are capped at 3 per pin on the canvas

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-02-27 10:53:16 +00:00
Otto
4e59143d16 Add plans/ to .gitignore (#12229)
Requested by @torantula

Adds `plans/` to `.gitignore` and removes one existing tracked plan
file.
2026-02-27 10:12:46 +00:00
Reinier van der Leer
d5efb6915b dx: Sync types & dependencies on pre-commit and post-checkout (#12211)
Our pre-commit hooks can use an update: the type check often fails based
on stale type definitions, the OpenAPI schema isn't synced/checked, and
the pre-push checks aren't installed by default.

### Changes 🏗️

- Regenerate Prisma `.pyi` type stub in on `prisma generate` hook:
Pyright prefers `.pyi` over `.py`, so a stale stub shadows the
regenerated `types.py`
- Also run setup hooks (dependency install, `prisma generate`, `pnpm
generate:api`) on `post-checkout`, to keep types and packages in sync
after switching branches
- Switch these hooks to `git diff` checks because `post-checkout`
doesn't support file triggers/filters
- Add `Check & Install dependencies - AutoGPT Platform - Frontend` hook
- Add `Sync API types - AutoGPT Platform - Backend -> Frontend` hook
- Fix non-ASCII issue in `export-api-schema` (`ensure_ascii=False`)
- Exclude `pnpm-lock.yaml` from `detect-secrets` hook (integrity hashes
cause ~1800 false positives)
- Add `default_stages: [pre-commit]`
- Add `post-checkout`, `pre-push` to `default_install_hook_types`

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Tested locally
2026-02-26 22:28:59 +01:00
Nicholas Tindle
b9aac42056 Merge branch 'master' into dev 2026-02-26 13:39:34 -06:00
Otto
95651d33da feat(backend): add fpdf2 dependency for PDF operations in copilot executor (#12216)
Requested by @majdyz

Adds [fpdf2](https://github.com/py-pdf/fpdf2) (v2.8.6) to backend
dependencies to enable PDF generation and manipulation in the copilot
executor.

fpdf2 is a lightweight PDF generation library (no external dependencies,
pure Python) that allows creating PDFs with text, images, tables, and
more.
2026-02-26 18:21:34 +00:00
Zamil Majdy
b30418d833 fix(copilot): inject working directory into SDK prompt + workspace download links (#12215)
## Summary

- Replaces the static `_SDK_TOOL_SUPPLEMENT` placeholder path with
`_build_sdk_tool_supplement(cwd: str)` that injects the session-specific
working directory
- `sdk_cwd` is computed once via `_make_sdk_cwd(session_id)`,
`os.makedirs` is called after lock acquisition (inside the protected
`try/finally`), and the same variable is used everywhere — no drift
between prompt and execution directory
- Added `ValueError`/`OSError` error handling for cwd preparation with
proper `StreamError` emission
- Teaches the SDK agent how to share workspace files with the user via
`workspace://` Markdown links (images render inline, videos render with
player controls, other files as download links)
- `WorkspaceWriteResponse` now includes `download_url` (pre-formatted
`workspace://file_id#mime` string) and a normalised `mime_type` field
(MIME parameters stripped, lowercased)
- Frontend: workspace `workspace://` regular links now resolve to
absolute URLs so Streamdown's "Copy link" copies the full URL
- Frontend: Streamdown's "Open link" button colour overridden to match
the design system (violet accent) — previously near-invisible in dark
mode due to `--primary` resolving to near-white

## Motivation

The SDK agent was seeing a hardcoded placeholder path in the system
prompt instead of the real working directory, causing it to reference
wrong paths in tool calls. Additionally, there was no guidance for the
agent on how to share files it writes to the workspace with the user in
chat.

## Test plan

- [ ] CI green (test 3.11 / 3.12 / 3.13)
- [ ] Start a copilot session with `CHAT_USE_CLAUDE_AGENT_SDK=true` and
verify the agent references the correct `sdk_cwd` path in its tool calls
- [ ] Ask the agent to write a file and confirm it responds with a
clickable download link / inline image using the `workspace://` syntax
- [ ] Verify the "Open link" button in the Streamdown external-link
dialog is visible in both light and dark mode
- [ ] Click "Copy link" on a workspace file link and confirm it copies
the full URL (including host)
2026-02-26 17:26:19 +00:00
Otto
ed729ddbe2 feat(copilot): Wait for agent execution completion (#12147)
Adds the ability for CoPilot to wait for agent execution to complete
before returning results.

Closes SECRT-2003.

## Changes

### New: `execution_utils.py`
- `wait_for_execution()` — uses Redis pubsub to wait for execution to
reach terminal state
- `TERMINAL_STATUSES` — shared frozenset of completed/failed/terminated
- `PAUSED_STATUSES` — handles REVIEW (human-in-the-loop) as a
stop-waiting state
- `get_execution_outputs()` — helper to extract outputs

### `run_agent.py`
- New `wait_for_result` parameter (0-300 seconds)
- When >0, waits for execution to complete and returns outputs directly
- Handles completed, failed, terminated, review, and timeout states with
appropriate responses

### `agent_output.py` (view_agent_output)
- New `wait_if_running` parameter (0-300 seconds)
- Includes running/queued/review executions when waiting is requested
- Status-aware response messages (completed, failed, running, review,
etc.)

## How it works
1. After starting execution, subscribes to Redis pubsub channel for
execution events
2. Re-checks DB after subscribing to close the race window
3. `asyncio.wait_for` enforces the timeout
4. On completion: returns full outputs via `AgentOutputResponse`
5. On timeout: returns current state with guidance to check again later
6. On error/terminated: returns `ErrorResponse` with details
7. Redis connection always cleaned up via `finally` block

## Testing

- [x] Run an agent with `wait_for_result=0` — should return immediately
with execution ID (existing behavior)
- [x] Run a fast agent with `wait_for_result=60` — should return
completed outputs
- [x] Run a slow agent with `wait_for_result=5` — should timeout and
return current status
- [x] Use `view_agent_output` with `wait_if_running=0` on a completed
execution — should return outputs
- [x] Use `view_agent_output` with `wait_if_running=30` on a running
execution — should wait and return
- [ ] ~~Verify Redis connections are cleaned up (no leaked pubsub
connections after timeout)~~
- [ ] ~~Test with a failed execution — should return error response~~
- [ ] ~~Test with a terminated execution — should return error response
(not "still running")~~

## Collaboration

This PR was developed in collaboration with @Pwuts.

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-02-26 16:41:33 +00:00
Otto
8c7030af0b fix(copilot): handle 'all' keyword in find_library_agent tool (#12138)
When users ask CoPilot to "show all my agents" or similar, the LLM was
passing the literal string "all" as a search query to
`find_library_agent`, which matched no agents because there's no agent
named "all". (issue:
[SECRT-2002](https://linear.app/autogpt/issue/SECRT-2002))

## Changes

- **Make `query` parameter optional** in `FindLibraryAgentTool` - users
can now omit it to list all agents
- **Add special keyword handling** - keywords like "all", "*",
"everything", "any", or empty string are treated as "list all" rather
than literal searches
- **Update response messages** - differentiate between "listing all
agents" vs "searching for X"

## Example

Before:
```
User: Show me all my agents
CoPilot: find_library_agent(query="all")
Result: No agents matching 'all' found in your library
```

After:
```
User: Show me all my agents  
CoPilot: find_library_agent(query="all") OR find_library_agent()
Result: Found 5 agents in your library
```

## Testing

- [x] Test with "show me all my agents" prompt
- [x] Test with empty query
- [x] Test with specific search terms (should still work as before)

## Collaboration

This PR was developed in collaboration with @Pwuts.
2026-02-26 16:07:40 +00:00
Otto
195b14286a fix(frontend): fix Streamdown link safety modal and add origin check (#12209)
Requested by @ntindle

Fixes the Streamdown link safety modal in CoPilot with three changes:

**1. Fix invisible "Open link" button (HIGH)**
Added Streamdown's dist directory to the Tailwind content scan in
`tailwind.config.ts`. Previously, Tailwind was only scanning
`./src/**/*.{ts,tsx}`, so classes used by Streamdown's internal modal
components (like `bg-primary`, `text-primary-foreground`,
`hover:bg-primary/90`) were being purged. The "Open link" button
rendered invisible but remained clickable.

**2. Add same-origin URL whitelist (MEDIUM)**
Configured `linkSafety.onLinkCheck` on the `<Streamdown>` component in
`message.tsx` to whitelist same-origin URLs. Previously, ALL links
(including internal `/api/proxy/...` workspace download URLs) triggered
the "Open external link?" modal. Now same-origin links open directly.

**3. Add Storybook stories (LOW)**
Added `message.stories.tsx` with stories covering default messages, user
messages, internal/external links, the link safety modal, and
conversations.

### Testing
- [ ] Open link safety modal → "Open link" button is visible with proper
styling
- [ ] Click a workspace download link → opens directly (no modal)
- [ ] Click an external link → shows safety modal
- [ ] Verify in both light and dark mode
- [ ] Verify on mobile viewport
- [ ] Storybook stories render correctly

Fixes SECRT-2044
2026-02-26 15:19:54 +00:00
Zamil Majdy
29ca034e40 fix(backend/frontend): error handling, stream reconnection, and chat switching (#12205)
## Problem

CoPilot executions were experiencing:
1. **Duplicate error markers** - Both `execute()` and `_execute_async()`
called `mark_session_completed`, sending duplicate completion markers
2. **RuntimeError bypass** - RuntimeErrors that weren't SDK cleanup
issues bypassed error persistence logic
3. **Generic error messages** - StreamError showed "An error occurred"
instead of actual error text
4. **Empty chat on reconnect** - Messages cleared immediately when
reconnecting, before new messages arrived
5. **Stream not resuming** - Switching chats (A → B → A) didn't resume
active streams due to stale `hasResumedRef`
6. **Excessive diagnostic logging** - 60+ lines of STREAM_DIAG console
logs not needed in production

## Changes 🏗️

### 1. Consolidated Exception Handling
**Files:** `backend/copilot/executor/processor.py`,
`backend/copilot/sdk/service.py`

**processor.py:**
- Removed all error handling from `execute()` method
- Kept error handling only in `_execute_async()` where work happens
- Merged `CancelledError` and `BaseException` handlers into single catch
- Uses `isinstance()` to determine error message

**service.py:**
- Merged `CancelledError` and `Exception` handlers into single catch
- Moved RuntimeError check inside main Exception handler
- Prevents non-cancel-scope RuntimeErrors from bypassing error
persistence

**Impact:** Eliminates duplicate `mark_session_completed` calls, ~70
lines of code removed

---

### 2. Fixed StreamError Message
**File:** `backend/copilot/sdk/service.py`

- Changed from generic `"An error occurred. Please try again."`
- Now shows actual error: `errorText=error_msg`
- Provides real error details to frontend during active stream

---

### 3. Deferred Message Clearing on Reconnect
**File:** `frontend/src/app/(platform)/copilot/useCopilotPage.ts`

- Added `shouldClearOnNextMessageRef` flag
- Set flag when reconnect starts
- Clear old assistant messages only AFTER first new message arrives
- Prevents empty chat flicker during reconnection

---

### 4. Fixed Chat Switching Stream Resume
**File:** `frontend/src/app/(platform)/copilot/useCopilotPage.ts`

**Problem:** When switching Chat A → B → A, the stream didn't resume
because `hasResumedRef.current.get(sessionId)` was still `true`

**Fix:** Clear `hasResumedRef` entry when navigating away from session

**Flow now:**
1. In Chat A with active stream
2. Switch to Chat B → clears `hasResumedRef` for Chat A
3. Switch back to Chat A → `hasResumedRef` is false → resumes stream 

---

### 5. Removed Diagnostic Logging
**Files:** `frontend/useCopilotPage.ts`, `frontend/useChatSession.ts`,
`backend/stream_registry.py`, `backend/processor.py`,
`backend/routes.py`

- Removed all `[STREAM_DIAG]` console logs (60+ lines)
- Logs were useful for debugging but not needed in production
- Cleaner codebase, reduced noise in logs

---

### 6. Exception Handling Order Consistency
**File:** `backend/copilot/executor/processor.py`

- Made both CancelledError and regular exception branches follow same
pattern
- Set `error_msg` before logging in both cases
- Consistent code structure

---

## Architecture Quality: **9/10**

**Strengths:**
- Eliminated duplicate completion markers
- All RuntimeErrors now get proper error persistence
- Real error messages shown to users
- Stream resume works reliably when switching chats
- Cleaner codebase with diagnostic logs removed
- Consistent exception handling patterns

**Trade-offs:**
- Message clearing deferred means brief period with stale + new messages
(acceptable, prevents empty chat)

## Test Plan

- [x] Verify no duplicate completion markers sent
- [x] Trigger RuntimeError, verify error persists
- [x] Check StreamError shows actual error message
- [x] Reconnect, verify chat doesn't go empty
- [x] Switch Chat A → B → A with active stream, verify resume works
- [x] Verify no STREAM_DIAG logs in console
- [x] Run `pnpm format && pnpm lint && pnpm types` - all passed
- [x] Run `poetry run format` - all passed
- [ ] Test in production

## Checklist 📋

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan  
- [x] I have tested my changes according to the test plan
- [x] `.env.default` is updated or compatible (no config changes)
- [x] `docker-compose.yml` is updated or compatible (no config changes)
2026-02-26 13:32:25 +00:00
Reinier van der Leer
1d9dd782a8 feat(backend/api): Add POST /graphs endpoint to external API (#12208)
- Resolves [SECRT-2031: Add upload agent to Library endpoint on external
API](https://linear.app/autogpt/issue/SECRT-2031)

### Changes 🏗️

- Add `POST /graphs` to v1 external API
- Add support for requiring multiple scopes in `require_permission`
middleware
- Add `WRITE_GRAPH` and `WRITE_LIBRARY` API permission scopes

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Test `POST /graphs` endpoint through `/docs` Swagger UI
2026-02-26 12:54:39 +01:00
Krzysztof Czerwinski
a1cb3d2a91 feat(blocks): Add Telegram blocks (#12141)
Add Telegram blocks that allow the use of [Telegram bots' API
features](https://core.telegram.org/bots/features).

### Changes 🏗️

1. Credentials & API layer: Bot token auth via `APIKeyCredentials`,
helper functions for JSON API calls (call_telegram_api) and multipart
file uploads (call_telegram_api_with_file)
2. Trigger blocks:
- `TelegramMessageTriggerBlock` — receives messages (text, photo, voice,
audio, document, video, edited message) with configurable event filters
- `TelegramMessageReactionTriggerBlock` — fires on reaction changes
(private chats auto, groups require admin)
2. Action blocks (11 total):
  - Send: Message, Photo, Voice, Audio, Document, Video
  - Reply to Message, Edit Message, Delete Message
  - Get File (download by file_id)
3. Webhook manager: Registers/deregisters webhooks via Telegram's
setWebhook API, validates incoming requests using
X-Telegram-Bot-Api-Secret-Token header
4. Provider registration: Added TELEGRAM to ProviderName enum and
registered `TelegramWebhooksManager`
5. Media send blocks support both URL passthrough (Telegram fetches
directly) and file upload for workspace/data URI inputs

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Non-AI UUIDs
  - [x] Blocks work correctly
    - [x] SendTelegramMessageBlock
    - [x] SendTelegramPhotoBlock
    - [x] SendTelegramVoiceBlock
    - [x] SendTelegramAudioBlock
    - [x] SendTelegramDocumentBlock
    - [x] SendTelegramVideoBlock
    - [x] ReplyToTelegramMessageBlock
    - [x] GetTelegramFileBlock
    - [x] DeleteTelegramMessageBlock
    - [x] EditTelegramMessageBlock
    - [x] TelegramMessageTriggerBlock (works for every trigger type)
    - [x] TelegramMessageReactionTriggerBlock

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-02-26 10:25:08 +00:00
Otto
1b91327034 fix(builder): Show X button on edge line hover, not just button hover (#12083)
## Summary

Fixes the issue where the X button for removing connections between
nodes only appears when hovering directly over the button itself. Users
now see the button when hovering anywhere on the connection line.

## Changes

- Added an invisible interaction path along the edge with a 20px stroke
width
- The path triggers the same hover state as the button
- This makes the X button visible when hovering the line OR the button
- Preserves existing behavior for broken edges (always visible)

## Testing

1. Hover over an edge line (not the button) → X button should appear
2. Move from line to button → button should stay visible  
3. Move away from both → button should fade out
4. Broken edges should still show X button always

## Linear

Fixes SECRT-1943

## Screenshots

This is a UX improvement - no visual changes except the button now
appears on line hover.

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

This PR improves the UX for edge deletion by adding an invisible
interaction path with a 20px stroke width that makes the delete button
(X) appear when hovering anywhere along the connection line, not just
when hovering directly over the button.

**Key Changes:**
- Added invisible `<path>` element before `BaseEdge` with
`stroke="transparent"` and `strokeWidth={20}`
- Path has `onMouseEnter` and `onMouseLeave` handlers that trigger the
same `setIsHovered` state used by the delete button
- Delete button visibility logic remains unchanged: fades in when
`isHovered` is true (or always visible for broken edges)
- Works uniformly for all edge types (regular, static, and broken edges)

**How It Works:**
The invisible path creates a wider hit area (20px) around the edge
curve, making it much easier for users to trigger the hover state. When
the mouse enters this area, `isHovered` becomes true, which causes the
delete button to fade in (via the existing opacity transition logic).
The button itself also has hover handlers, so moving from the line to
the button maintains the visible state smoothly.
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge with minimal risk - it's a small, focused UX
improvement with no logic changes
- The implementation is clean and focused: adds only 9 lines of code,
uses existing state management (`isHovered`), and doesn't modify any
deletion logic. The invisible path is a standard SVG/React pattern for
expanding hit areas, and the approach is consistent with how the delete
button already handles hover events. No breaking changes, no side
effects.
- No files require special attention
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

---------

Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>
2026-02-26 10:02:01 +00:00
Krzysztof Czerwinski
c7cdb40c5b feat(platform): Update new builder search (#11806)
### Changes 🏗️

- Add materialized view for suggested blocks
- Make `search` in builder accept comma separated filter list in query
- Remove Otto suggestions
- Use hybrid search for blocks search in builder
- Exclude `AgentExecutorBlocks` from builder
- Remove `Block` suffix from builder items

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Materialized view function works (when running manually)
- [x] Higher execution count blocks are shown first in "suggested
blocks" (uses materialized view)
  - [x] Hybrid search works
- [x] `AgentExecutorBlocks` doesn't appear on search results and in
blocks list
  - [x] `Block` suffix isn't displayed on blocks names in builder items

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-02-26 09:56:40 +00:00
Nicholas Tindle
77fb4419d0 Handle workspace:// URLs in regular markdown links (#12166)
### Changes 🏗️

Extended the `resolveWorkspaceUrls` function to handle both image syntax
(`![alt](workspace://id#mime)`) and regular link syntax
(`[text](workspace://id)`).

Previously, only image links were being resolved. Regular workspace
links were being blocked by Streamdown's rehype-harden sanitizer because
`workspace://` is not in the allowed URL-scheme whitelist, causing
"[blocked]" to appear next to link text.

The fix:
- Refactored the function to process image links first (existing
behavior)
- Added a second regex replacement to handle regular links using a
negative lookbehind (`(?<!!)`) to avoid matching image syntax
- Both patterns now resolve `workspace://` URLs to proxy download URLs
via `/api/proxy`
- Updated JSDoc comments to clarify the dual handling

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified image links with MIME type hints still resolve correctly
- [x] Verified regular workspace links now resolve to proxy URLs instead
of being blocked
- [x] Confirmed negative lookbehind prevents double-processing of image
syntax

https://claude.ai/code/session_0184TVJJcEoB8wbX9htCnv4b

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk: a small, localized frontend markdown preprocessing change
that only rewrites `workspace://` URLs to existing `/api/proxy` download
URLs; main risk is regex edge cases affecting link rendering.
> 
> **Overview**
> Updates `resolveWorkspaceUrls` in `ChatMessagesContainer` to rewrite
**both** `workspace://` image markdown and regular markdown links into
`/api/proxy` download URLs so Streamdown sanitization no longer shows
`[blocked]` for workspace links.
> 
> Image handling is preserved (including `#video/*` MIME hints via
`video:` alt text), and a second regex pass with a negative lookbehind
avoids double-processing image syntax when rewriting plain links.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
e17749b72c. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Ubbe <hi@ubbe.dev>
Co-authored-by: Lluis Agusti <hi@llu.lu>
2026-02-25 12:33:10 +00:00
Bently
9f002ce8f6 fix(frontend): improve UX for expired or duplicate password reset links (#12123)
## Summary
Improves the user experience when a password reset link has expired or
been used, replacing the confusing generic error with a clean, helpful
message.

## Changes
- Added `ExpiredLinkMessage` component that displays a user-friendly
error state
- Updated reset password page to detect expired/used links from:
- Supabase error format
(`error=access_denied&error_code=otp_expired&error_description=...`)
  - Internal clean format (`error=link_expired`)
- Enhanced callback route to detect and map expired/invalid link errors
- Clear, actionable UI with:
  - Friendly error message explaining what happened
  - "Send Me a New Link" button to request a new reset email
  - Login link for users who already have access

## Before
Users saw a confusing URL with error parameters and an unclear form:
```
/reset-password?error=access_denied&error_code=otp_expired&error_description=Email+link+is+invalid+or+has+expired
```

## After
Users see a clean, helpful message explaining the issue and how to fix
it.

<img width="548" height="454" alt="image"
src="https://github.com/user-attachments/assets/e867e522-146c-4d43-91b3-9e62d2957f95"
/>


Closes SECRT-1369

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [ ] Navigate to `/reset-password?error=link_expired` and verify the
ExpiredLinkMessage component appears
  - [ ] Click "Send Me a New Link" and verify the email form appears
- [ ] Navigate to
`/reset-password?error=access_denied&error_code=otp_expired` and verify
same behavior

<!-- greptile_comment -->

<details><summary><h3>Greptile Summary</h3></summary>

Improved password reset UX by adding an `ExpiredLinkMessage` component
that displays when users follow expired or already-used reset links. The
implementation detects expired link errors from Supabase
(`error_code=otp_expired`) and internal format (`error=link_expired`),
replacing confusing URL parameters with a clean message.

**Key changes:**
- Added error detection logic in both callback route and reset password
page to identify expired/invalid links
- Created new `ExpiredLinkMessage` component with friendly messaging
- Enhanced error handling to differentiate between expired links and
other errors

**Issues found:**
- The "Send Me a New Link" button misleadingly suggests it will send an
email, but it only reveals the email form - user must still enter email
and submit
- `access_denied` error detection may be too broad and could incorrectly
classify non-expired errors as expired links
</details>


<details><summary><h3>Confidence Score: 3/5</h3></summary>

- This PR improves UX but has logic issues that could mislead users
- The implementation correctly detects expired links and displays
helpful UI, but the "Send Me a New Link" button doesn't actually send an
email (just shows the form), which creates a misleading user experience.
Additionally, the `access_denied` error check is overly broad and could
incorrectly classify errors. These are functional issues that should be
addressed before merge.
- Pay close attention to `page.tsx` - the `handleSendNewLink` function
and error detection logic need refinement
</details>


<details><summary><h3>Flowchart</h3></summary>

```mermaid
flowchart TD
    Start[User clicks reset link with code] --> Callback[API: /auth/callback/reset-password]
    Callback --> CheckCode{Code valid?}
    
    CheckCode -->|No - expired/invalid/used| DetectError[Detect error type]
    DetectError --> CheckExpired{Contains expired/<br/>invalid/otp_expired/<br/>already/used?}
    CheckExpired -->|Yes| RedirectExpired[Redirect to /reset-password?error=link_expired]
    CheckExpired -->|No| RedirectOther[Redirect to /reset-password?error=message]
    
    CheckCode -->|Yes| RedirectSuccess[Redirect to /reset-password]
    
    RedirectExpired --> PageLoad[Page: /reset-password]
    RedirectOther --> PageLoad
    RedirectSuccess --> PageLoad
    
    PageLoad --> ParseParams[Parse URL params]
    ParseParams --> CheckErrorParams{Has error or<br/>error_code?}
    
    CheckErrorParams -->|Yes| CheckExpiredParams{error=link_expired OR<br/>errorCode=otp_expired OR<br/>error=access_denied OR<br/>description contains<br/>expired/invalid?}
    CheckExpiredParams -->|Yes| ShowExpired[Show ExpiredLinkMessage]
    CheckExpiredParams -->|No| ShowToast[Show error toast]
    
    CheckErrorParams -->|No| CheckUser{User<br/>authenticated?}
    
    ShowExpired --> ClickButton[User clicks 'Send Me a New Link']
    ClickButton --> HideExpired[setShowExpiredMessage false]
    HideExpired --> ShowForm[Show email form]
    
    ShowToast --> ClearParams[Clear error params from URL]
    ClearParams --> CheckUser
    
    CheckUser -->|Yes| ShowPasswordForm[Show password change form]
    CheckUser -->|No| ShowForm[Show email form]
```
</details>


<sub>Last reviewed commit: 80e9f40</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-25 12:11:55 +00:00
Ubbe
74691076c6 fix(frontend/copilot): show clarification and agent-saved cards without accordion (#12204)
### Background

The CoPilot tool UI wraps several output cards (clarification questions,
agent saved confirmation) inside a collapsible `ToolAccordion`. This
means users have to expand the accordion to see important interactive
content — clarification questions they need to answer, or confirmation
that their agent was created/updated.

### Changes 🏗️

- **Clarification questions always visible**: Moved
`ClarificationQuestionsCard` out of the `ToolAccordion` in both
`CreateAgent` and `EditAgent` tools so users immediately see and can
answer questions without expanding an accordion
- **Agent saved card always visible**: Moved the agent-saved
confirmation card out of the `ToolAccordion` in both tools so the
success state with library/builder links is immediately visible
- **Extracted `AgentSavedCard` component**: The agent-saved card was
duplicated between `CreateAgent` and `EditAgent` — extracted it into a
shared `copilot/components/AgentSavedCard/AgentSavedCard.tsx` component,
parameterized by `message` ("has been saved to your library!" vs "has
been updated!")
- **ClarificationQuestionsCard polish**: Updated spacing, icon
(`ChatTeardropDotsIcon`), typography variants, border styles, and number
badge sizing for a cleaner look
- **Minor atom tweaks**: Lightened `secondary` button variant
(`zinc-200` → `zinc-100`), changed textarea border radius from
`rounded-3xl` to `rounded-xl`

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `pnpm format` passes
  - [x] `pnpm lint` passes
  - [x] `pnpm types` passes
- [ ] Create an agent via CoPilot and verify the saved card shows
without accordion
- [ ] Trigger clarification questions and verify they show without
accordion
- [ ] Edit an agent via CoPilot and verify the updated card shows
without accordion
- [ ] Verify the ClarificationQuestionsCard styling looks correct
(spacing, icons, borders)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 18:13:01 +07:00
Otto
b15ad0df9b hotfix(frontend): fix null credits TypeError on /copilot (#12202)
Requested by @majdyz

Fix `TypeError: Cannot read properties of null (reading 'credits')` on
the /copilot page.

**Sentry:**
[BUILDER-71P](https://significant-gravitas.sentry.io/issues/7256025912/)
**Linear:** SENTRY-1110

## Root Cause

Two issues combined:

1. **`getUserCredit()` had a broken try/catch** — it wasn't `await`ing
`_get()`, so async errors (including null responses) were never caught
2. **`_makeClientRequest` returns `null` during logout** — when a user
is logging out and `/credits` races with auth teardown, the response is
`null`

Chain: logout starts → `/credits` fetch races → auth error →
`_makeClientRequest` returns `null` → `getUserCredit` passes `null`
through → `fetchCredits` does `null.credits` → 💥

## Fix

- `getUserCredit()`: Add `await` + null coalescing fallback to `{
credits: 0 }`
- `fetchCredits()`: Add optional chaining guard (`response?.credits ??
null`)
2026-02-25 10:38:08 +00:00
Abhimanyu Yadav
2136defea8 feat(library): implement folder organization system for agents (#12101)
### Changes 🏗️

This PR adds folder organization capabilities to the library, allowing
users to organize their agents into folders:

- Added new `LibraryFolder` model and database schema
- Created folder management API endpoints for CRUD operations
- Implemented folder tree structure with proper parent-child
relationships
- Added drag-and-drop functionality for moving agents between folders
- Created folder creation dialog with emoji picker for folder icons
- Added folder editing and deletion capabilities
- Implemented folder navigation in the library UI
- Added validation to prevent circular references and excessive nesting
- Created animation for favoriting agents
- Updated library agent list to show folder structure
- Added folder filtering to agent list queries

<img width="1512" height="950" alt="Screenshot 2026-02-13 at 9 08 45 PM"
src="https://github.com/user-attachments/assets/78778e03-4349-4d50-ad71-d83028ca004a"
/>

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Create a new folder with custom name, icon, and color
  - [x] Move agents into folders via drag and drop
  - [x] Move agents into folders via context menu
  - [x] Navigate between folders
  - [x] Edit folder properties (name, icon, color)
  - [x] Delete folders and verify agents return to root
  - [x] Verify favorite animation works when adding to favorites
  - [x] Test folder navigation with search functionality
  - [x] Verify folder tree structure is maintained

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

This PR implements a comprehensive folder organization system for
library agents, enabling hierarchical structure up to 5 levels deep.

**Backend Changes:**
- Added `LibraryFolder` model with self-referential hierarchy
(`parentId` → `Parent`/`Children`)
- Implemented CRUD operations with validation for circular references
and depth limits (MAX_FOLDER_DEPTH=5)
- Added `folderId` foreign key to `LibraryAgent` table
- Created folder management endpoints: list, get, create, update, move,
delete, and bulk agent moves
- Proper soft-delete cascade handling for folders and their contained
agents

**Frontend Changes:**
- Created folder creation/edit/delete dialogs with emoji picker
integration
- Implemented folder navigation UI with breadcrumbs and folder tree
structure
- Added drag-and-drop support for moving agents between folders
- Created context menu for agent actions (move to folder, remove from
folder)
- Added favorite animation system with `FavoriteAnimationProvider`
- Integrated folder filtering into agent list queries

**Key Features:**
- Folders support custom names, emoji icons, and hex colors
- Unique constraint per parent folder per user prevents duplicate names
- Validation prevents circular folder hierarchies and excessive nesting
- Agents can be moved between folders via drag-drop or context menu
- Deleting a folder soft-deletes all descendant folders and contained
agents
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- This PR is safe to merge with minor considerations for performance
optimization
- The implementation is well-structured with proper validation, error
handling, and database constraints. The folder hierarchy logic correctly
prevents circular references and enforces depth limits. However, there
are some performance concerns with N+1 queries in depth calculation and
circular reference checking that could be optimized for deeply nested
hierarchies. The foreign key constraint (ON DELETE RESTRICT) conflicts
with the hard-delete code path but shouldn't cause issues since
soft-deletes are the default. The client-side duplicate validation is
redundant but not harmful.
- Pay close attention to migration file (foreign key constraint) and
db.py (performance of recursive queries)
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant API
    participant DB

    User->>Frontend: Create folder with name/icon/color
    Frontend->>API: POST /v2/folders
    API->>DB: Validate parent exists & depth limit
    API->>DB: Check unique constraint (userId, parentId, name)
    DB-->>API: Folder created
    API-->>Frontend: LibraryFolder response
    Frontend-->>User: Show success toast

    User->>Frontend: Drag agent to folder
    Frontend->>API: POST /v2/folders/agents/bulk-move
    API->>DB: Verify folder exists
    API->>DB: Update LibraryAgent.folderId
    DB-->>API: Agents updated
    API-->>Frontend: Updated agents
    Frontend-->>User: Refresh agent list

    User->>Frontend: Navigate into folder
    Frontend->>API: GET /v2/library/agents?folder_id=X
    API->>DB: Query agents WHERE folderId=X
    DB-->>API: Filtered agents
    API-->>Frontend: Agent list
    Frontend-->>User: Display folder contents

    User->>Frontend: Delete folder
    Frontend->>API: DELETE /v2/folders/{id}
    API->>DB: Get descendant folders recursively
    API->>DB: Soft-delete folders + agents in transaction
    DB-->>API: Deletion complete
    API-->>Frontend: 204 No Content
    Frontend-->>User: Show success toast
```
</details>


<sub>Last reviewed commit: a6c2f64</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-24 15:04:56 +00:00
Zamil Majdy
6e61cb103c fix(copilot): workspace file listing fix (#12190)
Requested by @majdyz

Improves workspace file display in GenericTool:
- Base64 content decoding for workspace files
- Rich file object rendering (path, size, mime type)
- MCP text extraction from SDK tool responses (Read, Glob, Grep, Edit)
- Better file list formatting for both string and object file entries

---------

Co-authored-by: Otto (AGPT) <otto@agpt.co>
2026-02-24 12:33:24 +00:00
Zamil Majdy
0e72e1f5e7 fix(platform/copilot): fix stuck sessions, stop button, and StreamFinish reliability (#12191)
## Summary

- **Fix stuck sessions**: Root cause was `_stream_listener` infinite
xread loop when Redis session metadata TTL expired — `hget` returned
`None` which bypassed the `status != "running"` break condition. Fixed
by treating `None` status as non-running.
- **Fix stop button reliability**: Cancel endpoint now force-completes
via `mark_session_completed` when executor doesn't respond within 5s.
Returns `cancelled=True` for already-expired sessions.
- **Single-owner StreamFinish**: All `yield StreamFinish()` removed from
service layers (sdk/service.py, service.py, dummy.py).
`mark_session_completed` is now the single atomic source of truth for
publishing StreamFinish via Lua CAS script.
- **Rename task → session/turn**: Consistent terminology across
stream_registry and processor.
- **Frontend session refetch**: Added `refetchOnMount: true` so page
refresh re-fetches session state.
- **Test fixes**: Updated e2e, service, and run_agent tests for
StreamFinish removal; fixed async fixture decorators.

## Test plan
- [x] E2E dummy streaming tests pass (13 passed, 1 xfailed)
- [x] run_agent_test.py passes (async fixture decorator fix)
- [x] service_test.py passes (StreamFinish assertions removed)
- [ ] Manual: verify stuck sessions recover on page refresh
- [ ] Manual: verify stop button works for active and expired sessions
- [ ] Manual: verify no duplicate StreamFinish events in SSE stream
2026-02-24 10:49:22 +00:00
Swifty
163b0b3c9d feat(backend): pre-populate CoPilotUnderstanding from Tally form on signup (#12119)
When new users sign up, check if they previously filled out the Tally
beta application form and, if so, pre-populate their
CoPilotUnderstanding with business data extracted from that form. This
gives the CoPilot (Otto) immediate context about the user on their very
first chat interaction.

### Changes 🏗️

- **`backend/util/settings.py`**: Added `tally_api_key` to `Secrets`
class
- **`backend/.env.default`**: Added `TALLY_API_KEY=` env var entry
- **`backend/data/tally.py`** (new): Core Tally integration module
- Redis-cached email index of form submissions (1h TTL) with incremental
refresh via `startDate`
  - Paginated Tally API fetching with Bearer token auth
  - Email matching (case-insensitive) against submission data
- LLM extraction (gpt-4o-mini via OpenRouter) of
`BusinessUnderstandingInput` fields
  - Fire-and-forget orchestrator that is idempotent and never raises
- **`backend/api/features/v1.py`**: Added background task in
`get_or_create_user_route` to trigger Tally lookup on login (skips if
understanding already exists)
- **`backend/data/tally_test.py`** (new): 15 unit tests covering index
building, email case-insensitivity, cache hit/miss, format helpers,
idempotency, graceful degradation, and error resilience

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] All 15 unit tests pass (`poetry run pytest
backend/data/tally_test.py --noconftest -xvs`)
  - [x] Lint clean (`poetry run ruff check` on changed files)
  - [x] Type check clean (`poetry run pyright` on new files)
- [ ] Manual: Set `TALLY_API_KEY` in `.env`, create a new user, verify
CoPilotUnderstanding is populated
- [ ] Manual: Verify user creation succeeds when Tally API key is
missing or API is down

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
- Added `TALLY_API_KEY=` to `.env.default` (optional, empty by default —
feature is a no-op without it)

<!-- greptile_comment -->

<details><summary><h3>Greptile Summary</h3></summary>

This PR adds a Tally form integration that pre-populates
`CoPilotUnderstanding` for new users by matching their signup email
against cached Tally beta application form submissions, then using an
LLM (gpt-4o-mini via OpenRouter) to extract structured business data.

- **New module `tally.py`** implements Redis-cached email indexing of
Tally form submissions with incremental refresh, email matching, LLM
extraction, and an idempotent fire-and-forget orchestrator
- **`v1.py`** adds a background task on the `get_or_create_user_route`
to trigger Tally lookup on every login (idempotency check is inside the
called function)
- **`settings.py` / `.env.default`** adds `tally_api_key` as an optional
secret — feature is a no-op without it
- **`tally_test.py`** adds 15 unit tests with thorough mocking coverage
- **Bug: TTL mismatch** — `_LAST_FETCH_TTL` (2h) > `_INDEX_TTL` (1h)
creates a window where incremental refresh loses all previously indexed
emails because the base index has expired but `last_fetch` persists.
This will cause silent data loss for users whose form submissions were
indexed before the cache expiry
- **Bug: `str.format()` on LLM prompt** — form data containing `{` or
`}` will crash the prompt formatting, silently preventing understanding
population for those users
</details>


<details><summary><h3>Confidence Score: 2/5</h3></summary>

- This PR has two logic bugs that will cause silent data loss in
production — recommend fixing before merge.
- The TTL mismatch between `_LAST_FETCH_TTL` and `_INDEX_TTL` will
intermittently cause incomplete caches, silently dropping users from the
email index. The `str.format()` issue will cause failures for any form
submission containing curly braces. Both bugs are caught by the
top-level exception handler, so they won't crash the service, but they
will silently prevent the feature from working correctly for affected
users. The overall architecture is sound and well-tested for normal
paths.
- `autogpt_platform/backend/backend/data/tally.py` — contains both the
TTL mismatch bug in `_refresh_cache` and the `str.format()` issue in
`extract_business_understanding`
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User
    participant API as v1.py (get_or_create_user_route)
    participant Tally as tally.py (populate_understanding_from_tally)
    participant DB as Database (understanding)
    participant Redis
    participant TallyAPI as Tally API
    participant LLM as OpenRouter (gpt-4o-mini)

    User->>API: POST /auth/user (JWT)
    API->>API: get_or_create_user(user_data)
    API-->>User: Return user (immediate)
    API->>Tally: asyncio.create_task(populate_understanding_from_tally)

    Tally->>DB: get_business_understanding(user_id)
    alt Understanding exists
        DB-->>Tally: existing understanding
        Note over Tally: Skip (idempotent)
    else No understanding
        DB-->>Tally: None
        Tally->>Tally: Check tally_api_key configured
        Tally->>Redis: Check cached email index
        alt Cache hit
            Redis-->>Tally: email_index + questions
        else Cache miss
            Redis-->>Tally: None
            Tally->>TallyAPI: GET /forms/{id}/submissions (paginated)
            TallyAPI-->>Tally: submissions + questions
            Tally->>Tally: Build email index
            Tally->>Redis: Cache index (1h TTL)
        end
        Tally->>Tally: Lookup email in index
        alt Email found
            Tally->>Tally: format_submission_for_llm()
            Tally->>LLM: Extract BusinessUnderstandingInput
            LLM-->>Tally: JSON structured data
            Tally->>DB: upsert_business_understanding(user_id, input)
        end
    end
```
</details>


<sub>Last reviewed commit: 92d2da4</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Otto (AGPT) <otto@agpt.co>
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-02-24 11:31:29 +01:00
Bently
ef42b17e3b docs: add Podman compatibility warning (#12120)
## Summary
Adds a warning to the Getting Started docs clarifying that **Podman and
podman-compose are not supported**.

## Problem
Users on Windows using `podman-compose` instead of Docker get errors
like:
```
Error: the specified Containerfile or Dockerfile does not exist, ..\..\autogpt_platform\backend\Dockerfile
```

This is because Podman handles relative paths differently than Docker,
causing incorrect path resolution on Windows.

## Solution
- Added a clear warning section after the Windows WSL 2 notes
- Explains the error users might see
- Directs them to install Docker Desktop instead

Closes #11358

<!-- greptile_comment -->

<details><summary><h3>Greptile Summary</h3></summary>

Adds a "Podman Not Supported" warning section to the Getting Started
documentation, placed after the Windows/WSL 2 installation notes. The
section clarifies that Docker is required, shows the typical error
message users encounter when using Podman, and directs them to install
Docker Desktop instead. This addresses issue #11358 where Windows users
using `podman-compose` hit path resolution errors.

- Adds `### ⚠️ Podman Not Supported` section under Manual Setup, after
Windows Installation Note
- Includes the specific error message users see with Podman for easy
identification
- Links to Docker Desktop installation docs as the recommended solution
- Formatting is consistent with existing sections in the document (emoji
headings, code blocks for errors)
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge — it only adds a documentation warning
section with no code changes.
- The change is a small, well-written documentation addition that adds a
Podman compatibility warning. It touches only one markdown file,
introduces no code changes, and is consistent with the existing document
structure and style. No issues were found.
- No files require special attention.
</details>


<details><summary><h3>Flowchart</h3></summary>

```mermaid
flowchart TD
    A[User wants to run AutoGPT] --> B{Which container runtime?}
    B -->|Docker / Docker Desktop| C[docker compose up -d --build]
    C --> D[AutoGPT starts successfully]
    B -->|Podman / podman-compose| E[podman-compose up -d --build]
    E --> F[Error: Containerfile or Dockerfile does not exist]
    F --> G[New warning section directs user to install Docker Desktop]
    G --> C
```
</details>


<sub>Last reviewed commit: 23ea6bd</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-23 15:19:24 +00:00
Ubbe
a18ffd0b21 fix(frontend/copilot): always-visible credentials, inputs, and login prompts (#12194)
Credentials, inputs, and login prompts in copilot tool outputs were
hidden inside collapsible accordions — users could accidentally collapse
them, hiding blocking actionable UI. This PR extracts all blocking
requirements out of accordions so they're always visible.

### Changes 🏗️

- **RunAgent & RunBlock**: Extract `SetupRequirementsCard` (credentials
picker) out of `ToolAccordion` — renders standalone, always visible
- **RunAgent**: Also extract `AgentDetailsCard` (inputs needed) and
`need_login` message out of accordion
- **SetupRequirementsCard (RunBlock)**: Input form always visible
(removed toggle button and animation), unified "Proceed" button disabled
until credentials + inputs are satisfied
- **SetupRequirementsCard (RunAgent)**: "Proceed" button disabled until
all credentials are selected
- **Both cards**: Added titled box with border for credentials section
("Block credentials" / "Agent credentials"), matching the existing
inputs box pattern
- **CredentialsFlatView**: "Add" button uses `variant="primary"` when
user has no credentials (was `secondary`)
- **Styleguide**: Added mock `CredentialsProvidersContext` with two
scenarios:
  - No credentials → shows "add new" flow
  - Has credentials → shows selection list with existing accounts
- **CreateAgent & EditAgent**: Picked up user-initiated styling
refinements

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `pnpm format && pnpm lint && pnpm types` all pass
  - [ ] Visit `/copilot/styleguide` and verify:
- [ ] "Setup requirements — no credentials" shows add-credential button
(primary variant)
- [ ] "Setup requirements — has credentials" shows credential selection
dropdown
- [ ] Both RunAgent and RunBlock setup requirements render outside
accordion
- [ ] Trigger a copilot agent run that requires credentials — credential
picker always visible
- [ ] Trigger a copilot block run that requires credentials + inputs —
both sections visible, "Proceed" disabled until ready
- [ ] Trigger a copilot agent run that returns "agent details" — card
renders outside accordion
- [ ] Verify other output types (execution_started, error) still render
inside accordions


🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 16:39:21 +07:00
Otto
e40c8c70ce fix(copilot): collision detection, session locking, and sync for concurrent message saves (#12177)
Requested by @majdyz

Concurrent writers (incremental streaming saves from PR #12173 and
long-running tool callbacks) can race to persist messages with the same
`(sessionId, sequence)` pair, causing unique constraint violations on
`ChatMessage`.

**Root cause:** The streaming loop tracks `saved_msg_count` in-memory,
but the long-running tool callback (`_build_long_running_callback`) also
appends messages and calls `upsert_chat_session` independently — without
coordinating sequence numbers. When the streaming loop does its next
incremental save with the stale `saved_msg_count`, it tries to insert at
a sequence that already exists.

**Fix:** Multi-layered defense-in-depth approach:

1. **Collision detection with retry** (db.py): `add_chat_messages_batch`
uses `create_many()` in a transaction. On `UniqueViolationError`,
queries `MAX(sequence)+1` from DB and retries with the correct offset
(max 5 attempts).

2. **Robust sequence tracking** (db.py): `get_next_sequence()` uses
indexed `find_first` with `order={"sequence": "desc"}` for O(1) MAX
lookup, immune to deleted messages.

3. **Session-based counter** (model.py): Added `saved_message_count`
field to `ChatSession`. `upsert_chat_session` returns the session with
updated count, eliminating tuple returns throughout the codebase.

4. **MessageCounter dataclass** (sdk/service.py): Replaced list[int]
mutable reference pattern with a clean `MessageCounter` dataclass for
shared state between streaming loop and long-running callbacks.

5. **Session locking** (sdk/service.py): Prevent concurrent streams on
the same session using Redis `SET NX EX` distributed locks with TTL
refresh on heartbeats (config.stream_ttl = 3600s).

6. **Atomic operations** (db.py): Single timestamp for all messages and
session update in batch operations for consistency. Parallel queries
with `asyncio.gather` for lower latency.

7. **Config-based TTL** (sdk/service.py, config.py): Consolidated all
TTL constants to use `config.stream_ttl` (3600s) with lock refresh on
heartbeats.

### Key implementation details

- **create_many**: Uses `sessionId` directly (not nested
`Session.connect`) as `create_many` doesn't support nested creates
- **Type narrowing**: Added explicit `assert session is not None`
statements for pyright type checking in async contexts
- **Parallel operations**: Use `asyncio.gather` for independent DB
operations (create_many + session update)
- **Single timestamp**: All messages in a batch share the same
`createdAt` timestamp for atomicity

### Changes
- `backend/copilot/db.py`: Collision detection with `create_many` +
retry, indexed sequence lookup, single timestamp, parallel queries
- `backend/copilot/model.py`: Added `saved_message_count` field,
simplified return types
- `backend/copilot/sdk/service.py`: MessageCounter dataclass, session
locking with refresh, config-based TTL, type narrowing
- `backend/copilot/service.py`: Updated all callers to handle new return
types
- `backend/copilot/config.py`: Increased long_running_operation_ttl to
3600s with clarified docstring
- `backend/copilot/*_test.py`: Tests updated for new signatures

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-02-20 15:05:03 +00:00
Zamil Majdy
9cdcd6793f fix(copilot): remove stream timeout, add error propagation to frontend (#12175)
## Summary

Fixes critical reliability issues where long-running copilot sessions
were forcibly terminated and failures showed no error messages to users.

## Issues Fixed

1. **Silent failures**: Tasks failed but frontend showed "stopped" with
zero explanation
2. **Premature timeout**: Sessions auto-expired after 5 minutes even
when actively running

## Changes

### Error propagation to frontend
- Add `error_message` parameter to `mark_task_completed()`
- When `status="failed"`, publish `StreamError` before `StreamFinish` so
frontend displays reason
- Update all failure callers with specific error messages:
  - Session not found: `"Session {id} not found"`
  - Tool setup failed: `"Failed to setup tool {name}: {error}"`  
  - Task cancelled: `"Task was cancelled"`

### Remove stream timeout
- Delete `stream_timeout` config (was 300s/5min)
- Remove auto-expiry logic in `get_active_task_for_session()`
- Sessions now run indefinitely — user controls stopping via UI

## Why

**Auto-expiry was broken:**
- Used `created_at` (task start) not last activity
- SDK sessions with multiple LLM calls + subagent Tasks easily run
20-30+ minutes
- A task publishing chunks every second still got killed at 5min mark
- Hard timeout is inappropriate for long-running AI agents

**Error propagation was missing:**
- `mark_task_completed(status="failed")` only sent `StreamFinish`
- No `StreamError` event = frontend had no message to show user
- Backend logs showed errors but user saw nothing

## Test Plan

- [x] Formatter, linter, type-check pass
- [ ] Start a copilot session with Task tool (spawns subagent)
- [ ] Verify session runs beyond 5 minutes without auto-expiry
- [ ] Cancel a running session → frontend shows "Task was cancelled"
error
- [ ] Trigger a tool setup failure → frontend shows error message
- [ ] Session continues running until user clicks stop or task completes

## Files Changed

- `backend/copilot/config.py` — removed `stream_timeout`
- `backend/copilot/stream_registry.py` — removed auto-expiry, added
error propagation
- `backend/copilot/service.py` — error messages for 2 failure paths
- `backend/copilot/executor/processor.py` — error message for
cancellation
2026-02-20 09:16:22 +00:00
Zamil Majdy
fc64f83331 fix(copilot): SDK streaming reliability, parallel tools, incremental saves, frontend reconnection (#12173)
## Summary

Fixes multiple reliability issues in the copilot's Claude Agent SDK
streaming pipeline — tool outputs getting stuck, parallel tool calls
flushing prematurely, messages lost on page refresh, and SSE
reconnection failures.

## Changes

### Backend: Streaming loop rewrite (`sdk/service.py`)
- **Non-cancelling heartbeat pattern**: Replace `asyncio.timeout()` with
`asyncio.wait()` for SDK message iteration. The old approach corrupted
the SDK's internal anyio memory stream when timeouts fired
mid-`__anext__()`, causing `StopAsyncIteration` on the next call and
silently dropping all in-flight tool results.
- **Hook synchronization**: Add `wait_for_stash()` before
`convert_message()` — the SDK fires PostToolUse hooks via `start_soon()`
(fire-and-forget), so the next message can arrive before the hook
stashes its output. The new asyncio.Event-based mechanism bridges this
gap without arbitrary sleeps.
- **Error handling**: Add `asyncio.CancelledError` handling at both
inner (streaming loop) and outer (session) levels, plus pending task
cleanup in `finally` block to prevent leaked coroutines. Catch
`Exception` from `done.pop().result()` for SDK error messages.
- **Safety-net flush**: After streaming loop ends, flush any remaining
unresolved tool calls so the frontend stops showing spinners even if the
stream drops unexpectedly.
- **StreamFinish fallback**: Emit `StreamFinishStep` + `StreamFinish`
when stream ends without `ResultMessage` (StopAsyncIteration) so the
frontend transitions to "ready" state.
- **Incremental session saves**: Save session to PostgreSQL after each
tool input/output event (not just at stream end), so page refresh and
other devices see recent messages.
- **Enhanced logging**: All log lines now include `session_id[:12]`
prefix and tool call resolution state (unresolved/current/resolved
counts).

### Backend: Response adapter (`sdk/response_adapter.py`)
- **Parallel tool call support**: Skip `_flush_unresolved_tool_calls()`
when an AssistantMessage contains only ToolUseBlocks (parallel
continuation) — the prior tools are still executing concurrently and
haven't finished yet.
- **Duplicate output prevention**: Skip already-resolved tool results in
both UserMessage (ToolResultBlock) and parent_tool_use_id handling to
prevent duplicate `StreamToolOutputAvailable` events.
- **`has_unresolved_tool_calls` property**: Used by the streaming loop
to decide whether to wait for PostToolUse hooks.
- **`session_id` parameter**: Passed through for structured logging.

### Backend: Hook synchronization (`sdk/tool_adapter.py`)
- **`_stash_event` ContextVar**: asyncio.Event signaled by
`stash_pending_tool_output()` whenever a PostToolUse hook stashes
output.
- **`wait_for_stash()`**: Awaits the event with configurable timeout —
replaces the racy "hope the hook finished" approach.

### Backend: Security hooks (`sdk/security_hooks.py`)
- Enhanced logging in `post_tool_use_hook` — log whether tool is
built-in, preview of stashed output, warning when `tool_response` is
None.

### Backend: Incremental save optimization (`model.py`)
- **`existing_message_count` parameter** on `upsert_chat_session`: Skip
the DB query to count existing messages when the caller already tracks
this (streaming loop).
- **`skip_existence_check` parameter** on `_save_session_to_db`: Skip
the `get_chat_session` existence query when we know the session row
already exists. Reduces from 4 DB round trips to 2 per incremental save.

### Backend: SDK version bump (`pyproject.toml`, `poetry.lock`)
- Bump `claude-agent-sdk` from `^0.1.0` to `^0.1.39`.

### Backend: New tests
- **`sdk_compat_test.py`** (new file): SDK compatibility tests — verify
the installed SDK exposes every class, attribute, and method the copilot
integration relies on. Catches SDK upgrade breakage immediately.
- **`response_adapter_test.py`**: 9 new tests covering
flush-at-ResultMessage, flush-at-next-AssistantMessage, stashed output
flush, wait_for_stash signaling/timeout/fast-path, parallel tool call
non-premature-flush, text-message flush of prior tools, and
already-resolved tool skip in UserMessage.

### Frontend: Session hydration (`convertChatSessionToUiMessages.ts`)
- **`isComplete` option**: When session has no active stream, mark
dangling tool calls (no output in DB) as `output-available` with empty
output — stops stale spinners after page refresh.

### Frontend: Chat session hook (`useChatSession.ts`)
- Reorder `hasActiveStream` memo before `hydratedMessages` so
`isComplete` flag is available.
- Pass `{ isComplete: !hasActiveStream }` to
`convertChatSessionMessagesToUiMessages`.

### Frontend: Copilot page hook (`useCopilotPage.ts`)
- **Cache invalidation on stream end**: Invalidate React Query session
cache when stream transitions active → idle, so next hydration fetches
fresh messages from backend (staleTime: Infinity otherwise keeps stale
data).
- **Resume ref reset**: Reset `hasResumedRef` on stream end to allow
re-resume if SSE drops but backend task is still running.
- **Remove old `resolveInProgressTools` effect**: Replaced by
backend-side safety-net flush + hydration-time `isComplete` marking.

## Test plan
- [ ] Existing copilot tests pass (`pytest backend/copilot/ -x -q`)
- [ ] SDK compat tests pass (`pytest
backend/copilot/sdk/sdk_compat_test.py -v`)
- [ ] Tool outputs (bash_exec, web_fetch, WebSearch) appear in the UI
instead of getting stuck
- [ ] Parallel tool calls (e.g. multiple WebSearch) complete and display
results without premature flush
- [ ] Page refresh during active stream reconnects and recovers messages
- [ ] Opening session from another device shows recent tool results
- [ ] SSE drop → automatic reconnection without losing messages
- [ ] Long-running tools (create_agent) still delegate to background
infrastructure
2026-02-20 08:25:08 +00:00
Otto
7718c49f05 Make CoPilot todo/task list card expanded by default (#12168)
The todo card rendered by GenericTool was collapsed by default,
requiring users to click to see their checklist items. Now passes
`defaultExpanded` when the category is `"todo"` so the task list is
immediately visible.

**File changed:**
`autogpt_platform/frontend/src/app/(platform)/copilot/tools/GenericTool/GenericTool.tsx`

Resolves [SECRT-2017](https://linear.app/autogpt/issue/SECRT-2017)
2026-02-20 05:36:16 +00:00
Abhimanyu Yadav
0a1591fce2 refactor(frontend): remove old builder code and monitoring components
(#12082)

### Changes 🏗️

This PR removes old builder code and monitoring components as part of
the migration to the new flow editor:

- **NewControlPanel**: Simplified component by removing unused props
(`flowExecutionID`, `visualizeBeads`, `pinSavePopover`,
`pinBlocksPopover`, `nodes`, `onNodeSelect`, `onNodeHover`) and cleaned
up commented legacy code
- **Import paths**: Updated all references from
`legacy-builder/CustomNode` to `FlowEditor/nodes/CustomNode`
- **GraphContent**: Fixed type safety by properly handling
`customized_name` metadata and using `categoryColorMap` instead of
`getPrimaryCategoryColor`
- **useNewControlPanel**: Removed unused state and query parameter
handling related to old builder
- Removed dead code and commented-out imports throughout

### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
    - [x] Verify NewControlPanel renders correctly
    - [x] Test BlockMenu functionality
    - [x] Test Save Control
    - [x] Test Undo/Redo buttons
    - [x] Verify graph search menu still works with updated imports

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

This PR removes legacy builder components and monitoring page (~12,000
lines of code), simplifying `NewControlPanel` to focus only on the new
flow editor.

**Key changes:**
- Removed entire `legacy-builder/` directory (36 files) containing old
CustomNode, CustomEdge, Flow, and control components
- Deleted `/monitoring` page and all related components (9 files)
- Deleted `useAgentGraph` hook (1,043 lines) that was only used by
legacy components
- Simplified `NewControlPanel` by removing unused props
(`flowExecutionID`, `nodes`, `onNodeSelect`, etc.) and commented-out
code
- Updated imports in `NewSearchGraph` components to reference new
`FlowEditor/nodes/CustomNode` instead of deleted
`legacy-builder/CustomNode`
- Removed `/monitoring` from protected pages in `helpers.ts`
- Updated test files to remove monitoring-related test helpers

**Minor style issues:**
- `useNewControlPanel` hook returns unused state (`blockMenuSelected`)
that should be cleaned up
- Unnecessary double negation (`!!`) in `GraphContent.tsx:136`
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- This PR is safe to merge with minor style improvements recommended
- The refactor is a straightforward deletion of legacy code with no
references remaining in the codebase. All imports have been updated
correctly, tests cleaned up, and routing configuration updated. The only
issues are minor unused code that could be cleaned up but won't cause
runtime errors.
- No files require special attention - the unused state in
`useNewControlPanel.ts` is a minor style issue
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User
    participant NewControlPanel
    participant BlockMenu
    participant NewSaveControl
    participant UndoRedoButtons
    participant Store as blockMenuStore (Zustand)

    Note over NewControlPanel: Simplified component (removed props & legacy code)
    
    User->>NewControlPanel: Render
    NewControlPanel->>useNewControlPanel: Call hook (unused return)
    
    NewControlPanel->>BlockMenu: Render
    BlockMenu->>Store: Access state via useBlockMenuStore
    Store-->>BlockMenu: Return search, filters, etc.
    
    NewControlPanel->>NewSaveControl: Render
    NewControlPanel->>UndoRedoButtons: Render
    
    Note over NewControlPanel,Store: State management moved from hook to Zustand store
    Note over User: Legacy components (CustomNode, Flow, etc.) completely removed
```
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-20 05:19:08 +00:00
Zamil Majdy
681bb7c2b4 feat(copilot): workspace file tools, context reconstruction, transcript upload protection (#12164)
## Summary
- **Workspace file tools**: `write_workspace_file` now accepts plain
text `content`, `source_path` (copy from ephemeral disk), and graceful
fallback for invalid base64. `read_workspace_file` gains `save_to_path`
to download workspace files to the ephemeral working directory. Both
validate paths against session-specific ephemeral directory.
- **Context reconstruction**: `_format_conversation_context` now
includes tool call summaries and tool results (not just user/assistant
text), fixing agent amnesia when transcript is unavailable or stale.
- **Transcript upload protection**: Moved transcript upload from inside
the inner `try` block to the `finally` block, ensuring it always runs
even on streaming exceptions — prevents transcript loss that caused
staleness on subsequent turns.
- **Agent inactivity timeout**: Configurable timeout (default 300s)
kills hung Claude agents that stop producing SDK messages.
- **SDK system prompt**: Restructured with clear sections for shell
commands, two storage systems, file transfer workflows, and long-running
tools.
- **Path validation hardening**: `_validate_ephemeral_path` uses
`os.path.realpath` for both session dir and target path, fixing macOS
`/tmp` → `/private/tmp` symlink mismatch. Empty-string params normalised
to `None` to prevent dispatch assertion failures.

## Test plan
- [x] `_format_conversation_context` — empty, user, assistant, tool
calls, tool results, full conversation (query_builder_test.py)
- [x] `_build_query_message` — resume up-to-date, stale transcript gap,
zero msg count, no resume single/multi (query_builder_test.py)
- [x] `_validate_ephemeral_path` — valid path, traversal, cross-session,
symlink escape, nested (workspace_files_test.py)
- [x] `_resolve_write_content` — no sources, multiple sources, plain
text, base64, invalid base64, source_path, not found, outside ephemeral,
empty strings (workspace_files_test.py)
- [ ] Verify transcript upload occurs even after streaming error
- [ ] Verify agent inactivity timeout kills hung agents (300s default)

---------

Co-authored-by: Otto (AGPT) <otto@agpt.co>
2026-02-20 03:20:12 +00:00
Zamil Majdy
0818cd6683 fix(copilot): prevent background agent stalls and context hallucination (#12167)
## Summary
- **Block background Task agents**: The SDK's `Task` tool with
`run_in_background=true` stalls the SSE stream (no messages flow while
they execute) and the agents get killed when the main agent's turn ends
and we SIGTERM the CLI. The `PreToolUse` hook now denies these and tells
the agent to run tasks in the foreground instead.
- **Add heartbeats to SDK streaming**: Replaced the `async for` loop
with an explicit async iterator + `asyncio.wait_for(15s)`. Sends
`StreamHeartbeat` when the CLI is idle (e.g. during long tool execution)
to keep SSE connections alive through proxies/LBs.
- **Fix summarization hallucination**: The `_summarize_messages_llm`
prompt forced the LLM to produce ALL 9 sections ("You MUST include
ALL"), causing fabrication when the conversation didn't have content for
every section. Changed to optional sections with explicit
anti-hallucination instructions.

## Context
Session `7a9dda34-1068-4cfb-9132-5daf8ad31253` exhibited both issues:
1. The copilot tried to spin up background agents to create files in
parallel, then stopped responding
2. On resume, the copilot hallucinated having completed a "comprehensive
competitive analysis" with "9 deliverables" that never happened

## Test plan
- [x] All 26 security hooks tests pass (3 new: background blocked,
foreground allowed, limit enforced)
- [x] All 44 prompt utility tests pass
- [x] Linting and typecheck pass
- [ ] Manual test: copilot session where agent attempts to use Task tool
— should run foreground only
- [ ] Manual test: long-running tool execution — SSE should stay alive
via heartbeats
- [ ] Manual test: resume a multi-turn session — no hallucinated context
in summary
2026-02-19 20:00:15 +00:00
Zamil Majdy
7a39bdfaf8 feat(copilot): wire up stop button to cancel executor tasks (#12171)
## Summary

- The stop button was completely disconnected — clicking it only aborted
the client-side SSE fetch while the executor kept running indefinitely
- The executor already had full cancel infrastructure (RabbitMQ FANOUT
consumer, `CancelCoPilotEvent`, `threading.Event`, periodic cancel
checks) but nobody ever published a cancel message
- This PR wires up the missing pieces: a cancel REST endpoint, a publish
function, and frontend integration

## Changes

- **`executor/utils.py`**: Add `enqueue_cancel_task()` to publish
`CancelCoPilotEvent` to the existing FANOUT exchange
- **`routes.py`**: Add `POST /sessions/{session_id}/cancel` that finds
the active task, publishes cancel, and polls Redis until the task
confirms stopped (up to 10s timeout)
- **`cancel/route.ts`**: Next.js API proxy route for the cancel endpoint
- **`useCopilotPage.ts`**: Wrap AI SDK's `stop()` to also call the
backend cancel API — `sdkStop()` fires first for instant UI feedback,
then the cancel API waits for executor confirmation

## Test plan

- [ ] Start a copilot chat session and send a message
- [ ] Click "Stop generating" while streaming
- [ ] Verify executor logs show `Received cancel for {task_id}` and
`Cancelled during streaming`
- [ ] Verify the cancel endpoint returns `{"cancelled": true}` (not
timeout)
- [ ] Verify frontend transitions to idle state
- [ ] Verify clicking stop when no task is running returns
`{"cancelled": false, "reason": "no_active_task"}`
2026-02-19 19:57:51 +00:00
Otto
0b151f64e8 feat(copilot): Execute parallel tool calls concurrently (#12165)
When the LLM returns multiple tool calls in a single response (e.g.
multiple web fetches for a research task), they now execute concurrently
instead of sequentially. This can dramatically reduce latency for
multi-tool turns.

**Before:** Tool calls execute one after another — 7 web fetches × 2s
each = 14s total
**After:** All tool calls fire concurrently — 7 web fetches = ~2s total

### Changes

- **`service.py`**: New `_execute_tool_calls_parallel()` function that
spawns tool calls as concurrent `asyncio` tasks, collecting stream
events via `asyncio.Queue`
- **`service.py`**: `_yield_tool_call()` now accepts an optional
`session_lock` parameter for concurrent-safe session mutations
- **`base.py`**: Session lock exposed via `contextvars` so tools that
need it can access it without interface changes
- **`run_agent.py`**: Rate-limit counters (`successful_agent_runs`,
`successful_agent_schedules`) protected with the session lock to prevent
race conditions

### Concurrency Safety

| Shared State | Risk | Mitigation |
|---|---|---|
| `session.messages` (long-running tools only) | Race on append + upsert
| `session_lock` wraps mutations |
| `session.successful_agent_runs` counter | Bypass max-runs check |
`session_lock` wraps read-check-increment |
| Tool-internal state (DB queries, API calls) | None — stateless | No
mitigation needed |

### Testing

- Added `parallel_tool_calls_test.py` with tests for:
  - Parallel timing verification (sum vs max of delays)
  - Single tool call regression
  - Retryable error propagation
  - Shared session lock verification
  - Cancellation cleanup

Closes SECRT-2016

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-02-19 17:53:36 +00:00
Zamil Majdy
be2a48aedb feat(platform/copilot): add SuggestedGoalResponse for vague/unachievable goals (#12139)
## Summary

- Add `SUGGESTED_GOAL` response type and `SuggestedGoalResponse` model
to backend; vague/unachievable goals now return a structured suggestion
instead of a generic error
- Add `SuggestedGoalCard` frontend component (amber styling, "Use this
goal" button) that lets users accept and re-submit a refined goal in one
click
- Add error recovery buttons ("Try again", "Simplify goal") to the error
output block
- Update copilot system prompt with explicit guidance for handling
`suggested_goal` and `clarifying_questions` feedback loops
- Add `create_agent_test.py` covering all four decomposition result
types

## Test plan

- [ ] Trigger vague goal (e.g. "monitor social media") →
`SuggestedGoalCard` renders with amber styling
- [ ] Trigger unachievable goal (e.g. "read my mind") → card shows goal
type "Goal cannot be accomplished" with reason
- [ ] Click "Use this goal" → sends message and triggers new
`create_agent` call with the suggested goal
- [ ] Trigger an error → "Try again" and "Simplify goal" buttons appear
below the error
- [ ] Clarifying questions answered → LLM re-calls `create_agent` with
context (system prompt guidance)
- [ ] Backend tests pass: `poetry run pytest
backend/api/features/chat/tools/create_agent_test.py -xvs` (requires
Docker services)

<!-- greptile_comment -->

<details><summary><h3>Greptile Summary</h3></summary>

Replaced generic `ErrorResponse` with structured `SuggestedGoalResponse`
for vague/unachievable goals in the copilot agent creation flow. Added
frontend `SuggestedGoalCard` component with amber styling and "Use this
goal" button for one-click goal refinement. Enhanced system prompt with
explicit feedback loop handling for `suggested_goal` and
`clarifying_questions`. Added comprehensive test coverage for all four
decomposition result types.

**Key improvements:**
- Better UX: Users can now accept refined goals with one click instead
of manually retyping
- Clearer error recovery: Added "Try again" and "Simplify goal" buttons
to error blocks
- Structured data: Backend now returns `suggested_goal`, `reason`,
`original_goal`, and `goal_type` fields instead of embedding everything
in error messages

**Issue found:**
- The `reason` field from the backend is not being passed to or
displayed by the `SuggestedGoalCard` component, so users won't see the
explanation for why their goal was rejected (especially important for
unachievable goals where it explains what blocks are missing)
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- Safe to merge after fixing the missing `reason` field in the frontend
component
- Implementation is well-structured with good test coverage and follows
established patterns. The issue with the missing `reason` field is
straightforward to fix but important for UX - users won't understand why
their goal was rejected without it. All other changes are solid: backend
properly returns structured data, tests cover all cases, and the
component integration follows the project's conventions.
-
autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx
and SuggestedGoalCard.tsx need the `reason` prop added
</details>


<details><summary><h3>Flowchart</h3></summary>

```mermaid
flowchart TD
    Start[User submits goal to create_agent] --> Decompose[decompose_goal analyzes request]
    
    Decompose --> CheckType{Decomposition result type?}
    
    CheckType -->|clarifying_questions| Questions[Return ClarificationNeededResponse]
    Questions --> UserAnswers[User answers questions]
    UserAnswers --> Retry[Retry with context]
    Retry --> Decompose
    
    CheckType -->|vague_goal| VagueResponse[Return SuggestedGoalResponse<br/>goal_type: vague]
    VagueResponse --> ShowSuggestion[Frontend: SuggestedGoalCard<br/>amber styling]
    ShowSuggestion --> UserAccepts{User clicks<br/>Use this goal?}
    UserAccepts -->|Yes| NewGoal[Send suggested goal]
    NewGoal --> Decompose
    UserAccepts -->|No| End1[User refines manually]
    
    CheckType -->|unachievable_goal| UnachievableResponse[Return SuggestedGoalResponse<br/>goal_type: unachievable<br/>reason: missing blocks]
    UnachievableResponse --> ShowSuggestion
    
    CheckType -->|success| Generate[generate_agent creates workflow]
    Generate --> SaveOrPreview{save parameter?}
    SaveOrPreview -->|true| Save[Save to library<br/>AgentSavedResponse]
    SaveOrPreview -->|false| Preview[AgentPreviewResponse]
    
    CheckType -->|error| ErrorFlow[Return ErrorResponse]
    ErrorFlow --> ShowError[Frontend: Show error with<br/>Try again & Simplify goal buttons]
    ShowError --> UserRetry{User action?}
    UserRetry -->|Try again| Decompose
    UserRetry -->|Simplify goal| GetHelp[Ask LLM to simplify]
    GetHelp --> Decompose
    
    Save --> End2[Done]
    Preview --> End2
    End1 --> End2
```
</details>


<sub>Last reviewed commit: 2f37aee</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-19 16:11:41 +00:00
Ubbe
aeca4dbb79 docs(frontend): add mandatory pre-completion checks to CLAUDE.md (#12161)
### Changes 🏗️

Adds a **Pre-completion Checks (MANDATORY)** section to
`frontend/CLAUDE.md` that instructs Claude Code agents to always run the
following commands in order before reporting frontend work as done:

1. `pnpm format` — auto-fix formatting issues
2. `pnpm lint` — check for lint errors and fix them
3. `pnpm types` — check for type errors and fix them

This ensures code quality gates are enforced consistently by AI agents
working on the frontend.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified `pnpm format` passes cleanly
  - [x] Verified `pnpm lint` passes cleanly
  - [x] Verified `pnpm types` passes cleanly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 23:07:55 +08:00
Ubbe
7b85eeaae2 refactor(frontend): fix flaky e2e tests (#12156)
### Changes 🏗️

Some fixes to make running e2e more predictable...

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] e2e are imdempotent

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 21:38:50 +07:00
Ubbe
4db3be2d61 fix(frontend): switch minigame to snake (#12160)
## Changes 🏗️

<img width="600" height="416" alt="Screenshot 2026-02-19 at 18 05 39"
src="https://github.com/user-attachments/assets/930116ad-b611-4398-bee7-4e33ca4dc688"
/>

Make the mini game a snake 🐍 game, so we don't use assets (_possible
license issues_ ), and it's simpler...

## Checklist 📋

### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run the app and test
2026-02-19 19:28:59 +07:00
Ubbe
f57a1995d0 fix(frontend): make chat spinner centred when loading (#12154)
## Changes 🏗️

<img width="800" height="969" alt="Screenshot 2026-02-18 at 20 30 36"
src="https://github.com/user-attachments/assets/30d7d211-98c1-4159-94e1-86e81e29ad43"
/>

- Make the spinner centred when the chat is loading

## Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run the app and test locally
2026-02-19 17:31:07 +07:00
Zamil Majdy
3928c35928 feat(copilot): SDK tool output, transcript resume, stream reconnection, GenericTool UI (#12159)
## Summary

### SDK built-in tool output forwarding
- WebSearch, Read, TodoWrite outputs now render in the frontend —
PostToolUse hook stashes outputs before SDK truncation, response adapter
flushes unresolved tool calls via `_flush_unresolved_tool_calls` +
`parent_tool_use_id` handling
- Multi-call stash upgraded to `dict[str, list[str]]` to support
multiple calls to the same built-in tool in one turn

### Transcript-based `--resume` with staleness detection
- Simplified to single upload block after `async with` (Stop hook +
`appendFileSync` guarantees), extracted `_try_upload_transcript` helper
- **NEW**: `message_count` watermark + timestamp metadata stored
alongside transcript — on the next turn, detects staleness and
compresses only the missed messages instead of the full history (hybrid:
transcript via `--resume` + compressed gap)
- Removed redundant dual-strategy code and dead
`find_cli_transcript`/`read_fallback_transcript` functions

### Frontend stream reconnection
- **NEW**: Enabled `resume: true` on `useChat` with
`prepareReconnectToStreamRequest` — page refresh reconnects to active
backend streams via Redis replay (backend `resume_session_stream`
endpoint was already wired up)

### GenericTool.tsx UI overhaul
- Tool-specific icons (terminal, globe, file, search, edit, checklist)
with category-based display
- TodoWrite checklist rendering with status indicators
(completed/in-progress/pending)
- WebSearch/MCP content display via `extractMcpText` for MCP-style
content blocks + raw JSON fallback
- Defensive TodoItem filter per coderabbit review
- Proper accordion content per tool category (bash, web, file, search,
edit, todo)

### Image support
- MCP tool results now include `{"type": "image"}` content blocks when
workspace file responses contain `content_base64` with image MIME types

### Security & cleanup
- `AskUserQuestion` added to `SDK_DISALLOWED_TOOLS` (interactive CLI
tool, no terminal in copilot)
- 36 per-operation `[TIMING]`/`[TASK_LOOKUP]` diagnostic logs downgraded
info→debug
- Silent exception fixes: warning logs for swallowed errors in
stream_registry + service

## Test plan
- [ ] Verify copilot multi-turn conversations use `--resume` (check logs
for `Using --resume`)
- [ ] Verify stale transcript detection fills gap (check logs for
`Transcript stale: covers N of M messages`)
- [ ] Verify page refresh reconnects to active stream (check network tab
for GET to `/stream` returning SSE)
- [ ] Verify WebSearch, Read, TodoWrite tool outputs render in frontend
accordion
- [ ] Verify GenericTool icons and accordion content display correctly
for each tool type
- [ ] Verify production log volume is reduced (no more `[TIMING]` at
info level)

---------

Co-authored-by: Otto (AGPT) <otto@agpt.co>
2026-02-19 08:48:12 +00:00
Otto
dc77e7b6e6 feat(frontend): Replace advanced switch with chevron on builder nodes (#12152)
## Summary

Replaces the "Advanced" switch/toggle on builder nodes with a chevron
control, matching the UX pattern used for the outputs section.

Resolves
[OPEN-3006](https://linear.app/autogpt/issue/OPEN-3006/replace-advanced-switch-with-chevron-on-builder-nodes)

Before
<img width="443" height="348" alt="Screenshot 2026-02-17 at 9 01 31 pm"
src="https://github.com/user-attachments/assets/40e98669-3136-4e53-8d46-df18ea32c4d7"
/>
After
<img width="443" height="348" alt="Screenshot 2026-02-17 at 9 00 21 pm"
src="https://github.com/user-attachments/assets/0836e3ac-1d0a-43d7-9392-c9d5741b32b6"
/>

## Changes

- `NodeAdvancedToggle.tsx` — Replaced switch component with a chevron
expand/collapse toggle

## Testing

Tested and verified by @kpczerwinski

<!-- greptile_comment -->

<details><summary><h3>Greptile Summary</h3></summary>

Replaces the `Switch` toggle for the "Advanced" section on builder nodes
with a chevron (`CaretDownIcon`) expand/collapse control, matching the
existing UX pattern used in `OutputHandler.tsx`. The change is clean and
consistent with the codebase.

- Swapped `Switch` component for a ghost `Button` + `CaretDownIcon` with
a `rotate-180` transition for visual feedback
- Pattern closely mirrors the output section toggle in
`OutputHandler.tsx` (lines 120-136)
- Removed the top border separator and rounded bottom corners from the
container, adjusting the visual spacing
- Toggle logic correctly inverts the `showAdvanced` boolean state
- Uses Phosphor Icons and design system components per project
conventions
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge — it is a small, focused UI change with no
logic or security concerns.
- Single file changed with a straightforward UI component swap. The new
implementation follows an established pattern already in use in
OutputHandler.tsx. Toggle logic is correct and all conventions (Phosphor
Icons, design system components, Tailwind styling) are followed.
- No files require special attention.
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User
    participant NodeAdvancedToggle
    participant nodeStore

    User->>NodeAdvancedToggle: Click chevron button
    NodeAdvancedToggle->>nodeStore: setShowAdvanced(nodeId, !showAdvanced)
    nodeStore-->>NodeAdvancedToggle: Updated showAdvanced state
    NodeAdvancedToggle->>NodeAdvancedToggle: Rotate CaretDownIcon (0° ↔ 180°)
    Note over NodeAdvancedToggle: Advanced fields shown/hidden via FormCreator
```
</details>


<sub>Last reviewed commit: ad66080</sub>

<!-- greptile_other_comments_section -->

**Context used:**

- Context from `dashboard` - autogpt_platform/frontend/CLAUDE.md
([source](https://app.greptile.com/review/custom-context?memory=39861924-d320-41ba-a1a7-a8bff44f780a))
- Context from `dashboard` -
autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/ARCHITECTURE_FLOW_EDITOR.md
([source](https://app.greptile.com/review/custom-context?memory=0c5511fe-9aeb-4cf1-bbe9-798f2093b748))

<!-- /greptile_comment -->

---------

Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Ubbe <0ubbe@users.noreply.github.com>
Co-authored-by: Ubbe <hi@ubbe.dev>
2026-02-18 15:34:02 +00:00
Otto
ba75cc28b5 fix(copilot): Remove description from feature request search, add PII prevention (#12155)
Two targeted changes to the CoPilot feature request tools:

1. **Remove description from search results** — The
`search_feature_requests` tool no longer returns issue descriptions.
Only the title is needed for duplicate detection, reducing unnecessary
data exposure.

2. **Prevent PII in created issues** — Updated the
`create_feature_request` tool description and parameter descriptions to
explicitly instruct the LLM to never include personally identifiable
information (names, emails, company names, etc.) in Linear issue titles
and descriptions.

Resolves [SECRT-2010](https://linear.app/autogpt/issue/SECRT-2010)
2026-02-18 14:36:12 +01:00
Otto
15bcdae4e8 fix(backend/copilot): Clean up GCSWorkspaceStorage per worker (#12153)
The copilot executor runs each worker in its own thread with a dedicated
event loop (`asyncio.new_event_loop()`). `aiohttp.ClientSession` is
bound to the event loop where it was created — using it from a different
loop causes `asyncio.timeout()` to fail with:

```
RuntimeError: Timeout context manager should be used inside a task
```

This was the root cause of transcript upload failures tracked in
SECRT-2009 and [Sentry
#7272473694](https://significant-gravitas.sentry.io/issues/7272473694/).

### Fix

**One `GCSWorkspaceStorage` instance per event loop** instead of a
single shared global.

- `get_workspace_storage()` now returns a per-loop GCS instance (keyed
by `id(asyncio.get_running_loop())`). Local storage remains shared since
it has no async I/O.
- `shutdown_workspace_storage()` closes the instance for the **current**
loop only, so `session.close()` always runs on the loop that created the
session.
- `CoPilotProcessor.cleanup()` shuts down workspace storage on the
worker's own loop, then stops the loop.
- Manager cleanup submits `cleanup_worker` to each thread pool worker
before shutting down the executor — replacing the old approach of
creating a temporary event loop that couldn't close cross-loop sessions.

### Changes

| File | Change |
|------|--------|
| `util/workspace_storage.py` | `GCSWorkspaceStorage` back to simple
single-session class; `get_workspace_storage()` returns per-loop GCS
instance; `shutdown_workspace_storage()` scoped to current loop |
| `copilot/executor/processor.py` | Added `CoPilotProcessor.cleanup()`
and `cleanup_worker()` |
| `copilot/executor/manager.py` | Calls `cleanup_worker` on each thread
pool worker during shutdown |

Fixes SECRT-2009

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-02-18 11:17:39 +00:00
Otto
e9ba7e51db fix(copilot): Route workspace through db_accessors, fix transcript upload (#12148)
## Summary

Fixes two bugs in the copilot executor:

### SECRT-2008: WorkspaceManager bypasses db_accessors
`backend/util/workspace.py` imported 6 workspace functions directly from
`backend/data/workspace.py`, which call `prisma()` directly. In the
copilot executor (no Prisma connection), these fail.

**Fix:** Replace direct imports with `workspace_db()` from
`db_accessors`, routing through the database_manager HTTP client when
Prisma is unavailable. Also:
- Register all 6 workspace functions in `DatabaseManager` and
`DatabaseManagerAsyncClient`
- Add `UniqueViolationError` to the service `EXCEPTION_MAPPING` so it's
properly re-raised over HTTP (needed for race-condition retry logic)

### SECRT-2009: Transcript upload asyncio.timeout error
`asyncio.create_task()` at line 696 of `service.py` creates an orphaned
background task in the executor's thread event loop.
`gcloud-aio-storage`'s `asyncio.timeout()` fails in this context.

**Fix:** Replace `create_task` with direct `await`. The upload runs
after streaming completes (all chunks already yielded), so no
user-facing latency impact. The function already has internal try/except
error handling.
2026-02-17 22:24:19 +00:00
Reinier van der Leer
d23248f065 feat(backend/copilot): Copilot Executor Microservice (#12057)
Uncouple Copilot task execution from the REST API server. This should
help performance and scalability, and allows task execution to continue
regardless of the state of the user's connection.

- Resolves #12023

### Changes 🏗️

- Add `backend.copilot.executor`->`CoPilotExecutor` (setup similar to
`backend.executor`->`ExecutionManager`).

This executor service uses RabbitMQ-based task distribution, and sticks
with the existing Redis Streams setup for task output. It uses a cluster
lock mechanism to ensure a task is only executed by one pod, and the
`DatabaseManager` for pooled DB access.

- Add `backend.data.db_accessors` for automatic choice of direct/proxied
DB access

Chat requests now flow: API → RabbitMQ → CoPilot Executor → Redis
Streams → SSE Client. This enables horizontal scaling of chat processing
and isolates long-running LLM operations from the API service.

- Move non-API Copilot stuff into `backend.copilot` (from
`backend.api.features.chat`)
  - Updated import paths for all usages

- Move `backend.executor.database` to `backend.data.db_manager` and add
methods for copilot executor
  - Updated import paths for all usages
- Make `backend.copilot.db` RPC-compatible (-> DB ops return ~~Prisma~~
Pydantic models)
  - Make `backend.data.workspace` RPC-compatible
  - Make `backend.data.graphs.get_store_listed_graphs` RPC-compatible

DX:
- Add `copilot_executor` service to Docker setup

Config:
- Add `Config.num_copilot_workers` (default 5) and
`Config.copilot_executor_port` (default 8008)
- Remove unused `Config.agent_server_port`

> [!WARNING]
> **This change adds a new microservice to the system, with entrypoint
`backend.copilot.executor`.**
> The `docker compose` setup has been updated, but if you run the
Platform on something else, you'll have to update your deployment config
to include this new service.
>
> When running locally, the `CoPilotExecutor` uses port 8008 by default.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Copilot works
    - [x] Processes messages when triggered
    - [x] Can use its tools

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-02-17 16:15:28 +00:00
Bently
905373a712 fix(frontend): use singleton Shiki highlighter for code syntax highlighting (#12144)
## Summary
Addresses SENTRY-1051: Shiki warning about multiple highlighter
instances.

## Problem
The `@streamdown/code` package creates a **new Shiki highlighter for
each language** encountered. When users view AI chat responses with code
blocks in multiple languages (JavaScript, Python, JSON, YAML, etc.),
this creates 10+ highlighter instances, triggering Shiki's warning:

> "10 instances have been created. Shiki is supposed to be used as a
singleton, consider refactoring your code to cache your highlighter
instance"

This causes memory bloat and performance degradation.

## Solution
Introduced a custom code highlighting plugin that properly implements
the singleton pattern:

### New files:
- `src/lib/shiki-highlighter.ts` - Singleton highlighter management
- `src/lib/streamdown-code-plugin.ts` - Drop-in replacement for
`@streamdown/code`

### Key features:
- **Single shared highlighter** - One instance serves all code blocks
- **Preloaded common languages** - JS, TS, Python, JSON, Bash, YAML,
etc.
- **Lazy loading** - Additional languages loaded on demand
- **Result caching** - Avoids re-highlighting identical code blocks

### Changes:
- Added `shiki` as direct dependency
- Updated `message.tsx` to use the new plugin

## Testing
- [ ] Verify code blocks render correctly in AI chat
- [ ] Confirm no Shiki singleton warnings in console
- [ ] Test with multiple languages in same conversation

## Related
- Linear: SENTRY-1051
- Sentry: Multiple Shiki instances warning

<!-- greptile_comment -->

<details><summary><h3>Greptile Summary</h3></summary>

Replaced `@streamdown/code` with a custom singleton-based Shiki
highlighter implementation to resolve memory bloat from creating
multiple highlighter instances per language. The new implementation
creates a single shared highlighter with preloaded common languages (JS,
TS, Python, JSON, etc.) and lazy-loads additional languages on demand.
Results are cached to avoid re-highlighting identical code blocks.

**Key changes:**
- Added `shiki` v3.21.0 as a direct dependency
- Created `shiki-highlighter.ts` with singleton pattern and language
management utilities
- Created `streamdown-code-plugin.ts` as a drop-in replacement for
`@streamdown/code`
- Updated `message.tsx` to import from the new plugin instead of
`@streamdown/code`

The implementation follows React best practices with async highlighting
and callback-based notifications. The cache key uses code length +
prefix/suffix for efficient lookups on large code blocks.
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- Safe to merge with minor considerations for edge cases
- The implementation is solid with proper singleton pattern, caching,
and async handling. The code is well-structured and addresses the stated
problem. However, there's a subtle potential race condition in the
callback handling where multiple concurrent requests for the same cache
key could trigger duplicate highlight operations before the first
completes. The cache key generation using prefix/suffix could
theoretically cause false cache hits for large files with identical
prefixes and suffixes. Despite these edge cases, the implementation
should work correctly for the vast majority of use cases.
- No files require special attention
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant UI as Streamdown Component
    participant Plugin as Custom Code Plugin
    participant Cache as Token Cache
    participant Singleton as Shiki Highlighter (Singleton)
    participant Callbacks as Pending Callbacks

    UI->>Plugin: highlight(code, lang)
    Plugin->>Cache: Check cache key
    
    alt Cache hit
        Cache-->>Plugin: Return cached result
        Plugin-->>UI: Return highlighted tokens
    else Cache miss
        Plugin->>Callbacks: Register callback
        Plugin->>Singleton: Get highlighter instance
        
        alt First call
            Singleton->>Singleton: Create highlighter with preloaded languages
        end
        
        Singleton-->>Plugin: Return highlighter
        
        alt Language not loaded
            Plugin->>Singleton: Load language dynamically
        end
        
        Plugin->>Singleton: codeToTokens(code, lang, themes)
        Singleton-->>Plugin: Return tokens
        Plugin->>Cache: Store result
        Plugin->>Callbacks: Notify all waiting callbacks
        Callbacks-->>UI: Async callback with result
    end
```
</details>


<sub>Last reviewed commit: 96c793b</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-17 12:15:53 +00:00
Otto
ee9d39bc0f refactor(copilot): Replace legacy delete dialog with molecules/Dialog (#12136)
## Summary
Updates the session delete confirmation in CoPilot to use the new
`Dialog` component from `molecules/Dialog` instead of the legacy
`DeleteConfirmDialog`.

## Changes
- **ChatSidebar**: Use Dialog component for delete confirmation
(desktop)
- **CopilotPage**: Use Dialog component for delete confirmation (mobile)

## Behavior
- Dialog stays **open** during deletion with loading state on button
- Cancel button **disabled** while delete is in progress
- Delete button shows **loading spinner** during deletion
- Dialog only closes on successful delete or when cancel is clicked (if
not deleting)

## Screenshots
*Dialog uses the same styling as other molecules/Dialog instances in the
app*

## Requested by
@0ubbe

<!-- greptile_comment -->

<details><summary><h3>Greptile Summary</h3></summary>

Replaces the legacy `DeleteConfirmDialog` component with the new
`molecules/Dialog` component for session delete confirmations in both
desktop (ChatSidebar) and mobile (CopilotPage) views. The new
implementation maintains the same behavior: dialog stays open during
deletion with a loading state on the delete button and disabled cancel
button, closing only on successful deletion or cancel click.
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge with minimal risk
- This is a straightforward component replacement that maintains the
same behavior and UX. The Dialog component API is properly used with
controlled state, the loading states are correctly implemented, and both
mobile and desktop views are handled consistently. The changes are
well-tested patterns used elsewhere in the codebase.
- No files require special attention
</details>


<details><summary><h3>Flowchart</h3></summary>

```mermaid
flowchart TD
    A[User clicks delete button] --> B{isMobile?}
    B -->|Yes| C[CopilotPage Dialog]
    B -->|No| D[ChatSidebar Dialog]
    
    C --> E[Set sessionToDelete state]
    D --> E
    
    E --> F[Dialog opens with controlled.isOpen]
    F --> G{User action?}
    
    G -->|Cancel| H{isDeleting?}
    H -->|No| I[handleCancelDelete: setSessionToDelete null]
    H -->|Yes| J[Cancel button disabled]
    
    G -->|Confirm Delete| K[handleConfirmDelete called]
    K --> L[deleteSession mutation]
    L --> M[isDeleting = true]
    M --> N[Button shows loading spinner]
    M --> O[Cancel button disabled]
    
    L --> P{Mutation result?}
    P -->|Success| Q[Invalidate sessions query]
    Q --> R[Clear sessionId if current]
    R --> S[setSessionToDelete null]
    S --> T[Dialog closes]
    
    P -->|Error| U[Show toast error]
    U --> V[setSessionToDelete null]
    V --> W[Dialog closes]
```
</details>


<sub>Last reviewed commit: 275950c</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

---------

Co-authored-by: Lluis Agusti <hi@llu.lu>
Co-authored-by: Ubbe <hi@ubbe.dev>
2026-02-17 19:12:27 +07:00
Swifty
05aaf7a85e fix(backend): Rename LINEAR_API_KEY to COPILOT_LINEAR_API_KEY to prevent global access (#12143)
The `LINEAR_API_KEY` environment variable name is too generic — it
matches the key name used by integrations/blocks, meaning that if set
globally, it could inadvertently grant all users access to Linear
through the blocks system rather than restricting it to the copilot
feature-request tool.

This renames the setting to `COPILOT_LINEAR_API_KEY` to make it clear
this key is scoped exclusively to the copilot's feature-request
functionality, preventing it from being picked up as a general-purpose
Linear credential.

### Changes 🏗️

- Renamed `linear_api_key` → `copilot_linear_api_key` in `Secrets`
settings model (`backend/util/settings.py`)
- Updated all references in the copilot feature-request tool
(`backend/api/features/chat/tools/feature_requests.py`)

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified the rename is consistent across all references (settings
+ feature_requests tool)
  - [x] No other files reference the old `linear_api_key` setting name

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

> **Note:** The env var changes from `LINEAR_API_KEY` to
`COPILOT_LINEAR_API_KEY`. Any deployment using the old name will need to
update accordingly.

<!-- greptile_comment -->

<details><summary><h3>Greptile Summary</h3></summary>

Renamed `LINEAR_API_KEY` to `COPILOT_LINEAR_API_KEY` in settings and the
copilot feature-request tool to prevent unintended access through Linear
blocks.

**Key changes:**
- Updated `Secrets.linear_api_key` → `Secrets.copilot_linear_api_key` in
`backend/util/settings.py`
- Updated all references in
`backend/api/features/chat/tools/feature_requests.py`
- The rename prevents the copilot Linear key from being picked up by the
Linear blocks integration (which uses `LINEAR_API_KEY` via
`ProviderBuilder` in `backend/blocks/linear/_config.py`)

**Issues found:**
- `.env.default` still references `LINEAR_API_KEY` instead of
`COPILOT_LINEAR_API_KEY`
- Frontend styleguide has a hardcoded error message with the old
variable name
</details>


<details><summary><h3>Confidence Score: 3/5</h3></summary>

- Generally safe but requires fixing `.env.default` before deployment
- The code changes are correct and achieve the intended security
improvement by preventing scope leakage. However, the PR is incomplete -
`.env.default` wasn't updated (critical for deployment) and a frontend
error message reference was missed. These issues will cause
configuration problems for anyone deploying with the new variable name.
- Check `autogpt_platform/backend/.env.default` and
`autogpt_platform/frontend/src/app/(platform)/copilot/styleguide/page.tsx`
- both need updates to match the renamed variable
</details>


<details><summary><h3>Flowchart</h3></summary>

```mermaid
flowchart TD
    A[".env file<br/>COPILOT_LINEAR_API_KEY"] --> B["Secrets model<br/>copilot_linear_api_key"]
    B --> C["feature_requests.py<br/>_get_linear_config()"]
    C --> D["Creates APIKeyCredentials<br/>for copilot feature requests"]
    
    E[".env file<br/>LINEAR_API_KEY"] --> F["ProviderBuilder<br/>in blocks/linear/_config.py"]
    F --> G["Linear blocks integration<br/>for user workflows"]
    
    style A fill:#90EE90
    style B fill:#90EE90
    style C fill:#90EE90
    style D fill:#90EE90
    style E fill:#FFD700
    style F fill:#FFD700
    style G fill:#FFD700
```
</details>


<sub>Last reviewed commit: 86dc57a</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-17 11:16:43 +01:00
Reinier van der Leer
9d4dcbd9e0 fix(backend/docker): Make server last (= default) build stage
Without specifying an explicit build target it would build the `migrate` stage because it is the last stage in the Dockerfile. This caused deployment failures.

- Follow-up to #12124 and 074be7ae
2026-02-16 14:49:30 +01:00
Reinier van der Leer
074be7aea6 fix(backend/docker): Update run commands to match deployment
- Follow-up to #12124

Changes:
- Update `run` commands for all backend services in `docker-compose.platform.yml` to match the deployment commands used in production
- Add trigger on `docker-compose(.platform)?.yml` changes to the Frontend CI workflow
2026-02-16 14:23:29 +01:00
Otto
39d28b24fc ci(backend): Upgrade RabbitMQ from 3.12 (EOL) to 4.1.4 (#12118)
## Summary
Upgrades RabbitMQ from the end-of-life `rabbitmq:3.12-management` to
`rabbitmq:4.1.4`, aligning CI, local dev, and e2e testing with
production.

## Changes

### CI Workflow (`.github/workflows/platform-backend-ci.yml`)
- **Image:** `rabbitmq:3.12-management` → `rabbitmq:4.1.4`
- **Port:** Removed 15672 (management UI) — not used
- **Health check:** Added to prevent flaky tests from race conditions
during startup

### Docker Compose (`docker-compose.platform.yml`,
`docker-compose.test.yaml`)
- **Image:** `rabbitmq:management` → `rabbitmq:4.1.4`
- **Port:** Removed 15672 (management UI) — not used

## Why
- RabbitMQ 3.12 is EOL
- We don't use the management interface, so `-management` variant is
unnecessary
- CI and local dev/e2e should match production (4.1.4)

## Testing
CI validates that backend tests pass against RabbitMQ 4.1.4 on Python
3.11, 3.12, and 3.13.

---
Closes SECRT-1703
2026-02-16 12:45:39 +00:00
Reinier van der Leer
bf79a7748a fix(backend/build): Update stale Poetry usage in Dockerfile (#12124)
[SECRT-2006: Dev deployment failing: poetry not found in container
PATH](https://linear.app/autogpt/issue/SECRT-2006)

- Follow-up to #12090

### Changes 🏗️

- Remove now-broken Poetry path config values
- Remove usage of now-broken `poetry run` in container run command
- Add trigger on `backend/Dockerfile` changes to Frontend CI workflow

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - If it works, CI will pass
2026-02-16 13:54:20 +01:00
Otto
649d4ab7f5 feat(chat): Add delete chat session endpoint and UI (#12112)
## Summary

Adds the ability to delete chat sessions from the CoPilot interface.

## Changes

### Backend
- Add `DELETE /api/chat/sessions/{session_id}` endpoint in `routes.py`
- Returns 204 on success, 404 if not found or not owned by user
- Reuses existing `delete_chat_session` function from `model.py`

### Frontend
- Add delete button (trash icon) that appears on hover for each chat
session
- Add confirmation dialog before deletion using existing
`DeleteConfirmDialog` component
- Refresh session list after successful delete
- Clear current session selection if the deleted session was active
- Update OpenAPI spec with new endpoint

## Testing

1. Hover over a chat session in sidebar → trash icon appears
2. Click trash icon → confirmation dialog
3. Confirm deletion → session removed, list refreshes
4. If deleted session was active, selection is cleared

## Screenshots

Delete button appears on hover, confirmation dialog on click.

## Related Issues

Closes SECRT-1928

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Adds the ability to delete chat sessions from the CoPilot interface — a
new `DELETE /api/chat/sessions/{session_id}` backend endpoint and a
corresponding delete button with confirmation dialog in the
`ChatSidebar` frontend component.

- **Backend route** (`routes.py`): Clean implementation reusing the
existing `delete_chat_session` model function with proper auth guards
and 204/404 responses. No issues.
- **Frontend** (`ChatSidebar.tsx`): Adds hover-visible trash icon per
session, confirmation dialog, mutation with cache invalidation, and
active session clearing on delete. However, it uses a `__legacy__`
component (`DeleteConfirmDialog`) which violates the project's style
guide — new code should use the modern design system components. Error
handling only logs to console without user-facing feedback (project
convention is to use toast notifications for mutation errors).
`isDeleting` is destructured but unused.
- **OpenAPI spec** updated correctly.
- **Unrelated file included**:
`notes/plan-SECRT-1959-graph-edge-desync.md` is a planning document for
a different ticket and should be removed from this PR. The `notes/`
directory is newly introduced and both plan files should be reconsidered
for inclusion.
</details>


<details><summary><h3>Confidence Score: 3/5</h3></summary>

- Functionally correct but has style guide violations and includes
unrelated files that should be addressed before merge.
- The core feature implementation (backend DELETE endpoint and frontend
mutation logic) is sound and follows existing patterns. Score is lowered
because: (1) the frontend uses a legacy component explicitly prohibited
by the project's style guide, (2) mutation errors are not surfaced to
the user, and (3) the PR includes an unrelated planning document for a
different ticket.
- Pay close attention to `ChatSidebar.tsx` for the legacy component
import and error handling, and
`notes/plan-SECRT-1959-graph-edge-desync.md` which should be removed.
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User
    participant ChatSidebar as ChatSidebar (Frontend)
    participant ReactQuery as React Query
    participant API as DELETE /api/chat/sessions/{id}
    participant Model as model.delete_chat_session
    participant DB as db.delete_chat_session (Prisma)
    participant Redis as Redis Cache

    User->>ChatSidebar: Click trash icon on session
    ChatSidebar->>ChatSidebar: Show DeleteConfirmDialog
    User->>ChatSidebar: Confirm deletion
    ChatSidebar->>ReactQuery: deleteSession({ sessionId })
    ReactQuery->>API: DELETE /api/chat/sessions/{session_id}
    API->>Model: delete_chat_session(session_id, user_id)
    Model->>DB: delete_many(where: {id, userId})
    DB-->>Model: bool (deleted count > 0)
    Model->>Redis: Delete session cache key
    Model->>Model: Clean up session lock
    Model-->>API: True
    API-->>ReactQuery: 204 No Content
    ReactQuery->>ChatSidebar: onSuccess callback
    ChatSidebar->>ReactQuery: invalidateQueries(sessions list)
    ChatSidebar->>ChatSidebar: Clear sessionId if deleted was active
```
</details>


<sub>Last reviewed commit: 44a92c6</sub>

<!-- greptile_other_comments_section -->

<details><summary><h4>Context used (3)</h4></summary>

- Context from `dashboard` - autogpt_platform/frontend/CLAUDE.md
([source](https://app.greptile.com/review/custom-context?memory=39861924-d320-41ba-a1a7-a8bff44f780a))
- Context from `dashboard` - autogpt_platform/frontend/CONTRIBUTING.md
([source](https://app.greptile.com/review/custom-context?memory=cc4f1b17-cb5c-4b63-b218-c772b48e20ee))
- Context from `dashboard` - autogpt_platform/CLAUDE.md
([source](https://app.greptile.com/review/custom-context?memory=6e9dc5dc-8942-47df-8677-e60062ec8c3a))
</details>


<!-- /greptile_comment -->

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-02-16 12:19:18 +00:00
Ubbe
223df9d3da feat(frontend): improve create/edit copilot UX (#12117)
## Changes 🏗️

Make the UX nicer when running long tasks in Copilot, like creating an
agent, editing it or running a task.

## Checklist 📋

### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run locally and play the game!

<!-- greptile_comment -->

<details><summary><h3>Greptile Summary</h3></summary>

This PR replaces the static progress bar and idle wait screens with an
interactive mini-game across the Create, Edit, and Run Agent copilot
tools. The existing mini-game (a simple runner with projectile-dodge
boss encounters) is significantly overhauled into a two-mode game: a
runner mode with animated tree obstacles and a duel mode featuring a
melee boss fight with attack, guard, and movement mechanics.
Sprite-based rendering replaces the previous shape-drawing approach.

- **Create/Edit/Run Agent UX**: All three tool views now show the
mini-game with contextual overlays during long-running operations,
replacing the progress bar in EditAgent and adding the game to RunAgent
- **Game mechanics overhaul**: Boss encounters changed from
projectile-dodging to melee duel with attack (Z), block (X), movement
(arrows), and jump (Space) controls
- **Sprite rendering**: Added 9 sprite sheet assets for characters,
trees, and boss animations with fallback to shape rendering if images
fail to load
- **UI overlays**: Added React-managed overlay states for idle,
boss-intro, boss-defeated, and game-over screens with continue/retry
buttons
- **Minor issues found**: Unused `isRunActive` variable in
`MiniGame.tsx`, unreachable "leaving" boss phase in `useMiniGame.ts`,
and a missing `expanded` property in `getAccordionMeta` return type
annotation in `EditAgent.tsx`
- **Unused asset**: `archer-shoot.png` is included in the PR but never
imported or referenced in any code
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- This PR is safe to merge — it only affects the copilot mini-game UX
with no backend or data model changes.
- The changes are entirely frontend/cosmetic, scoped to the copilot
tools' waiting UX. The mini-game logic is self-contained in a
canvas-based hook and doesn't affect any application state, API calls,
or routing. The issues found are minor (unused variable, dead code, type
annotation gap, unused asset) and don't impact runtime behavior.
- `useMiniGame.ts` has the most complex logic changes (boss AI, death
animations, sprite rendering) and contains unreachable dead code in the
"leaving" phase handler. `EditAgent.tsx` has a return type annotation
that doesn't include `expanded`.
</details>


<details><summary><h3>Flowchart</h3></summary>

```mermaid
flowchart TD
    A[Game Idle] -->|"Start button"| B[Run Mode]
    B -->|"Jump over trees"| C{Score >= Threshold?}
    C -->|No| B
    C -->|"Yes, obstacles clear"| D[Boss Intro Overlay]
    D -->|"Continue button"| E[Duel Mode]
    E -->|"Attack Z / Guard X / Move ←→"| F{Boss HP <= 0?}
    F -->|No| G{Player hit & not guarding?}
    G -->|No| E
    G -->|Yes| H[Player Death Animation]
    H --> I[Game Over Overlay]
    I -->|"Retry button"| B
    F -->|Yes| J[Boss Death Animation]
    J --> K[Boss Defeated Overlay]
    K -->|"Continue button"| L[Reset Boss & Resume Run]
    L --> B
```
</details>


<sub>Last reviewed commit: ad80e24</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-16 10:53:08 +00:00
Ubbe
187ab04745 refactor(frontend): remove OldAgentLibraryView and NEW_AGENT_RUNS flag (#12088)
## Summary
- Removes the deprecated `OldAgentLibraryView` directory (13 files,
~2200 lines deleted)
- Removes the `NEW_AGENT_RUNS` feature flag from the `Flag` enum and
defaults
- Removes the legacy agent library page at `library/legacy/[id]`
- Moves shared `CronScheduler` components to
`src/components/contextual/CronScheduler/`
- Moves `agent-run-draft-view` and `agent-status-chip` to
`legacy-builder/` (co-located with their only consumer)
- Updates all import paths in consuming files (`AgentInfoStep`,
`SaveControl`, `RunnerInputUI`, `useRunGraph`)

## Test plan
- [x] `pnpm format` passes
- [x] `pnpm types` passes (no TypeScript errors)
- [x] No remaining references to `OldAgentLibraryView`,
`NEW_AGENT_RUNS`, or `new-agent-runs` in the codebase
- [x] Verify `RunnerInputUI` dialog still works in the legacy builder
- [x] Verify `AgentInfoStep` cron scheduling works in the publish modal
- [x] Verify `SaveControl` cron scheduling works in the legacy builder

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

This PR removes deprecated code from the legacy agent library view
system and consolidates the codebase to use the new agent runs
implementation exclusively. The refactor successfully removes ~2200
lines of code across 13 deleted files while properly relocating shared
components.

**Key changes:**
- Removed the entire `OldAgentLibraryView` directory and its 13
component files
- Removed the `NEW_AGENT_RUNS` feature flag from the `Flag` enum and
defaults
- Deleted the legacy agent library page route at `library/legacy/[id]`
- Moved `CronScheduler` components to
`src/components/contextual/CronScheduler/` for shared use across the
application
- Moved `agent-run-draft-view` and `agent-status-chip` to
`legacy-builder/` directory, co-locating them with their only consumer
- Updated `useRunGraph.ts` to import `GraphExecutionMeta` from the
generated API models instead of the deleted custom type definition
- Updated all import paths in consuming components (`AgentInfoStep`,
`SaveControl`, `RunnerInputUI`)

**Technical notes:**
- The new import path for `GraphExecutionMeta`
(`@/app/api/__generated__/models/graphExecutionMeta`) will be generated
when running `pnpm generate:api` from the OpenAPI spec
- All references to the old code have been cleanly removed from the
codebase
- The refactor maintains proper separation of concerns by moving shared
components to contextual locations
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- This PR is safe to merge with minimal risk, pending manual
verification of the UI components mentioned in the test plan
- The refactor is well-structured and all code changes are correct. The
score of 4 (rather than 5) reflects that the PR author has marked three
manual testing items as incomplete in the test plan: verifying
`RunnerInputUI` dialog, `AgentInfoStep` cron scheduling, and
`SaveControl` cron scheduling. While the code changes are sound, these
UI components should be manually tested before merging to ensure the
moved components work correctly in their new locations.
- No files require special attention. The author should complete the
manual testing checklist items for `RunnerInputUI`, `AgentInfoStep`, and
`SaveControl` as noted in the test plan.
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant Dev as Developer
    participant FE as Frontend Build
    participant API as Backend API
    participant Gen as Generated Types

    Note over Dev,Gen: Refactor: Remove OldAgentLibraryView & NEW_AGENT_RUNS flag

    Dev->>FE: Delete OldAgentLibraryView (13 files, ~2200 lines)
    Dev->>FE: Remove NEW_AGENT_RUNS from Flag enum
    Dev->>FE: Delete library/legacy/[id]/page.tsx
    
    Dev->>FE: Move CronScheduler → src/components/contextual/
    Dev->>FE: Move agent-run-draft-view → legacy-builder/
    Dev->>FE: Move agent-status-chip → legacy-builder/
    
    Dev->>FE: Update RunnerInputUI import path
    Dev->>FE: Update SaveControl import path
    Dev->>FE: Update AgentInfoStep import path
    
    Dev->>FE: Update useRunGraph.ts
    FE->>Gen: Import GraphExecutionMeta from generated models
    Note over Gen: Type available after pnpm generate:api
    
    Gen-->>API: Uses OpenAPI spec schema
    API-->>FE: Type-safe GraphExecutionMeta model
```
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 18:29:59 +08:00
Abhimanyu Yadav
e2d3c8a217 fix(frontend): Prevent node drag when selecting text in object editor key input (#11955)
## Summary
- Add `nodrag` class to the key name input wrapper in
`WrapIfAdditionalTemplate.tsx`
- This prevents the node from being dragged when users try to select
text in the key name input field
- Follows the same pattern used by other input components like
`TextWidget.tsx`

## Test plan
- [x] Open the new builder
- [x] Add a custom node with an Object input field
- [x] Try to select text in the key name input by clicking and dragging
- [x] Verify that text selection works without moving the block

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-16 06:59:33 +00:00
Eve
647c8ed8d4 feat(backend/blocks): enhance list concatenation with advanced operations (#12105)
## Summary

Enhances the existing `ConcatenateListsBlock` and adds five new
companion blocks for comprehensive list manipulation, addressing issue
#11139 ("Implement block to concatenate lists").

### Changes

- **Enhanced `ConcatenateListsBlock`** with optional deduplication
(`deduplicate`) and None-value filtering (`remove_none`), plus an output
`length` field
- **New `FlattenListBlock`**: Recursively flattens nested list
structures with configurable `max_depth`
- **New `InterleaveListsBlock`**: Round-robin interleaving of elements
from multiple lists
- **New `ZipListsBlock`**: Zips corresponding elements from multiple
lists with support for padding to longest or truncating to shortest
- **New `ListDifferenceBlock`**: Computes set difference between two
lists (regular or symmetric)
- **New `ListIntersectionBlock`**: Finds common elements between two
lists, preserving order

### Helper Utilities

Extracted reusable helper functions for validation, flattening,
deduplication, interleaving, chunking, and statistics computation to
support the blocks and enable future reuse.

### Test Coverage

Comprehensive test suite with 188 test functions across 29 test classes
covering:
- Built-in block test harness validation for all 6 blocks
- Manual edge-case tests for each block (empty inputs, large lists,
mixed types, nested structures)
- Internal method tests for all block classes
- Unit tests for all helper utility functions

Closes #11139

## Test plan

- [x] All files pass Python syntax validation (`ast.parse`)
- [x] Built-in `test_input`/`test_output` tests defined for all blocks
- [x] Manual tests cover edge cases: empty lists, large lists, mixed
types, nested structures, deduplication, None removal
- [x] Helper function tests validate all utility functions independently
- [x] All block IDs are valid UUID4
- [x] Block categories set to `BlockCategory.BASIC` for consistency with
existing list blocks


<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Enhanced `ConcatenateListsBlock` with deduplication and None-filtering
options, and added five new list manipulation blocks
(`FlattenListBlock`, `InterleaveListsBlock`, `ZipListsBlock`,
`ListDifferenceBlock`, `ListIntersectionBlock`) with comprehensive
helper functions and test coverage.

**Key Changes:**
- Enhanced `ConcatenateListsBlock` with `deduplicate` and `remove_none`
options, plus `length` output field
- Added `FlattenListBlock` for recursively flattening nested lists with
configurable `max_depth`
- Added `InterleaveListsBlock` for round-robin element interleaving
- Added `ZipListsBlock` with support for padding/truncation
- Added `ListDifferenceBlock` and `ListIntersectionBlock` for set
operations
- Extracted 12 reusable helper functions for validation, flattening,
deduplication, etc.
- Comprehensive test suite with 188 test functions covering edge cases

**Minor Issues:**
- Helper function `_deduplicate_list` has redundant logic in the `else`
branch that duplicates the `if` branch
- Three helper functions (`_filter_empty_collections`,
`_compute_list_statistics`, `_chunk_list`) are defined but unused -
consider removing unless planned for future use
- The `_make_hashable` function uses `hash(repr(item))` for unhashable
types, which correctly treats structurally identical dicts/lists as
duplicates
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- Safe to merge with minor style improvements recommended
- The implementation is well-structured with comprehensive test coverage
(188 tests), proper error handling, and follows existing block patterns.
All blocks use valid UUID4 IDs and correct categories. The helper
functions provide good code reuse. The minor issues are purely stylistic
(redundant code, unused helpers) and don't affect functionality or
safety.
- No files require special attention - both files are well-tested and
follow project conventions
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User
    participant Block as List Block
    participant Helper as Helper Functions
    participant Output
    
    User->>Block: Input (lists/parameters)
    Block->>Helper: _validate_all_lists()
    Helper-->>Block: validation result
    
    alt validation fails
        Block->>Output: error message
    else validation succeeds
        Block->>Helper: _concatenate_lists_simple() / _flatten_nested_list() / etc.
        Helper-->>Block: processed result
        
        opt deduplicate enabled
            Block->>Helper: _deduplicate_list()
            Helper-->>Block: deduplicated result
        end
        
        opt remove_none enabled
            Block->>Helper: _filter_none_values()
            Helper-->>Block: filtered result
        end
        
        Block->>Output: result + length
    end
    
    Output-->>User: Block outputs
```
</details>


<sub>Last reviewed commit: a6d5445</sub>

<!-- greptile_other_comments_section -->

<sub>(2/5) Greptile learns from your feedback when you react with thumbs
up/down!</sub>

<!-- /greptile_comment -->

---------

Co-authored-by: Otto <otto@agpt.co>
2026-02-16 05:39:53 +00:00
Zamil Majdy
27d94e395c feat(backend/sdk): enable WebSearch, block WebFetch, consolidate tool constants (#12108)
## Summary
- Enable Claude Agent SDK built-in **WebSearch** tool (Brave Search via
Anthropic API) for the CoPilot SDK agent
- Explicitly **block WebFetch** via `SDK_DISALLOWED_TOOLS`. The agent
uses the SSRF-protected `mcp__copilot__web_fetch` MCP tool instead
- **Consolidate** all tool security constants (`BLOCKED_TOOLS`,
`WORKSPACE_SCOPED_TOOLS`, `DANGEROUS_PATTERNS`, `SDK_DISALLOWED_TOOLS`)
into `tool_adapter.py` as a single source of truth — previously
scattered across `tool_adapter.py`, `security_hooks.py`, and inline in
`service.py`

## Changes
- `tool_adapter.py`: Add `WebSearch` to `_SDK_BUILTIN_TOOLS`, add
`SDK_DISALLOWED_TOOLS`, move security constants here
- `security_hooks.py`: Import constants from `tool_adapter.py` instead
of defining locally
- `service.py`: Use `SDK_DISALLOWED_TOOLS` instead of inline `["Bash"]`

## Test plan
- [x] All 21 security hooks tests pass
- [x] Ruff lint clean
- [x] All pre-commit hooks pass
- [ ] Verify WebSearch works in CoPilot chat (manual test)

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Consolidates tool security constants into `tool_adapter.py` as single
source of truth, enables WebSearch (Brave via Anthropic API), and
explicitly blocks WebFetch to prevent SSRF attacks. The change improves
security by ensuring the agent uses the SSRF-protected
`mcp__copilot__web_fetch` tool instead of the built-in WebFetch which
can access internal networks like `localhost:8006`.
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge with minimal risk
- The changes improve security by blocking WebFetch (SSRF risk) while
enabling safe WebSearch. The consolidation of constants into a single
source of truth improves maintainability. All existing tests pass (21
security hooks tests), and the refactoring is straightforward with no
behavioral changes to existing security logic. The only suggestions are
minor improvements: adding a test for WebFetch blocking and considering
a lowercase alias for consistency.
- No files require special attention
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant Agent as SDK Agent
    participant Hooks as Security Hooks
    participant TA as tool_adapter.py
    participant MCP as MCP Tools
    
    Note over TA: SDK_DISALLOWED_TOOLS = ["Bash", "WebFetch"]
    Note over TA: _SDK_BUILTIN_TOOLS includes WebSearch
    
    Agent->>Hooks: Request WebSearch (Brave API)
    Hooks->>TA: Check BLOCKED_TOOLS
    TA-->>Hooks: Not blocked
    Hooks-->>Agent: Allowed ✓
    Agent->>Agent: Execute via Anthropic API
    
    Agent->>Hooks: Request WebFetch (SSRF risk)
    Hooks->>TA: Check BLOCKED_TOOLS
    Note over TA: WebFetch in SDK_DISALLOWED_TOOLS
    TA-->>Hooks: Blocked
    Hooks-->>Agent: Denied ✗
    Note over Agent: Use mcp__copilot__web_fetch instead
    
    Agent->>Hooks: Request mcp__copilot__web_fetch
    Hooks->>MCP: Validate (MCP tool, not SDK builtin)
    MCP-->>Hooks: Has SSRF protection
    Hooks-->>Agent: Allowed ✓
    Agent->>MCP: Execute with SSRF checks
```
</details>


<sub>Last reviewed commit: 2d9975f</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-15 06:51:25 +00:00
DEEVEN SERU
b8f5c208d0 Handle errors in Jina ExtractWebsiteContentBlock (#12048)
## Summary
- catch Jina reader client/server errors in ExtractWebsiteContentBlock
and surface a clear error output keyed to the user URL
- guard empty responses to return an explicit error instead of yielding
blank content
- add regression tests covering the happy path and HTTP client failures
via a monkeypatched fetch

## Testing
- not run (pytest unavailable in this environment)

---------

Co-authored-by: Nicholas Tindle <nicktindle@outlook.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-02-13 19:15:09 +00:00
Otto
ca216dfd7f ci(docs-claude-review): Update comments instead of creating new ones (#12106)
## Changes 🏗️

This PR updates the Claude Block Docs Review CI workflow to update
existing comments instead of creating new ones on each push.

### What's Changed:
1. **Concurrency group** - Prevents race conditions if the workflow runs
twice simultaneously
2. **Comment cleanup step** - Deletes any previous Claude review comment
before posting a new one
3. **Marker instruction** - Instructs Claude to include a `<!--
CLAUDE_DOCS_REVIEW -->` marker in its comment for identification

### Why:
Previously, every PR push would create a new review comment, cluttering
the PR with multiple comments. Now only the most recent review is shown.

### Testing:
1. Create a PR that triggers this workflow (modify a file in
`docs/integrations/` or `autogpt_platform/backend/backend/blocks/`)
2. Verify first run creates comment with marker
3. Push another commit
4. Verify old comment is deleted and new comment is created (not
accumulated)

Requested by @Bentlybro

---

## Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan (will be
tested on merge)

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Added concurrency control and comment deduplication to prevent multiple
Claude review comments from accumulating on PRs. The workflow now
deletes previous review comments (identified by `<!-- CLAUDE_DOCS_REVIEW
-->` marker) before posting new ones, and uses concurrency groups to
prevent race conditions.
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge with minimal risk
- The changes are well-contained, follow GitHub Actions best practices,
and use built-in GitHub APIs safely. The concurrency control prevents
race conditions, and the comment cleanup logic uses proper filtering
with `head -1` to handle edge cases. The HTML comment marker approach is
standard and reliable.
- No files require special attention
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant GH as GitHub PR Event
    participant WF as Workflow
    participant API as GitHub API
    participant Claude as Claude Action
    
    GH->>WF: PR opened/synchronized
    WF->>WF: Check concurrency group
    Note over WF: Cancel any in-progress runs<br/>for same PR number
    WF->>API: Query PR comments
    API-->>WF: Return all comments
    WF->>WF: Filter for CLAUDE_DOCS_REVIEW marker
    alt Previous comment exists
        WF->>API: DELETE comment by ID
        API-->>WF: Comment deleted
    else No previous comment
        WF->>WF: Skip deletion
    end
    WF->>Claude: Run code review
    Claude->>API: POST new comment with marker
    API-->>Claude: Comment created
```
</details>


<sub>Last reviewed commit: fb1b436</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-13 16:46:23 +00:00
Zamil Majdy
f9f358c526 feat(mcp): Add MCP tool block with OAuth, tool discovery, and standard credential integration (#12011)
## Summary

<img width="1000" alt="image"
src="https://github.com/user-attachments/assets/18e8ef34-d222-453c-8b0a-1b25ef8cf806"
/>


<img width="250" alt="image"
src="https://github.com/user-attachments/assets/ba97556c-09c5-4f76-9f4e-49a2e8e57468"
/>

<img width="250" alt="image"
src="https://github.com/user-attachments/assets/68f7804a-fe74-442d-9849-39a229c052cf"
/>

<img width="250" alt="image"
src="https://github.com/user-attachments/assets/700690ba-f9fe-4726-8871-3bfbab586001"
/>

Full-stack MCP (Model Context Protocol) tool block integration that
allows users to connect to any MCP server, discover available tools,
authenticate via OAuth, and execute tools — all through the standard
AutoGPT credential system.

### Backend

- **MCPToolBlock** (`blocks/mcp/block.py`): New block using
`CredentialsMetaInput` pattern with optional credentials (`default={}`),
supporting both authenticated (OAuth) and public MCP servers. Includes
auto-lookup fallback for backward compatibility.
- **MCP Client** (`blocks/mcp/client.py`): HTTP transport with JSON-RPC
2.0, tool discovery, tool execution with robust error handling
(type-checked error fields, non-JSON response handling)
- **MCP OAuth Handler** (`blocks/mcp/oauth.py`): RFC 8414 discovery,
dynamic per-server OAuth with PKCE, token storage and refresh via
`raise_for_status=True`
- **MCP API Routes** (`api/features/mcp/routes.py`): `discover-tools`,
`oauth/login`, `oauth/callback` endpoints with credential cleanup,
defensive OAuth metadata validation
- **Credential system integration**:
- `CredentialsMetaInput` model_validator normalizes legacy
`"ProviderName.MCP"` format from Python 3.13's `str(StrEnum)` change
- `CredentialsFieldInfo.combine()` supports URL-based credential
discrimination (each MCP server gets its own credential entry)
- `aggregate_credentials_inputs` checks block schema defaults for
credential optionality
- Executor normalizes credential data for both Pydantic and JSON schema
validation paths
  - Chat credential matching handles MCP server URL filtering
- `provider_matches()` helper used consistently for Python 3.13 StrEnum
compatibility
- **Pre-run validation**: `_validate_graph_get_errors` now calls
`get_missing_input()` for custom block-level validation (MCP tool
arguments)
- **Security**: HTML tag stripping loop to prevent XSS bypass, SSRF
protection (removed trusted_origins)

### Frontend

- **MCPToolDialog** (`MCPToolDialog.tsx`): Full tool discovery UI —
enter server URL, authenticate if needed, browse tools, select tool and
configure
- **OAuth popup** (`oauth-popup.ts`): Shared utility supporting
cross-origin MCP OAuth flows with BroadcastChannel + localStorage
fallback
- **Credential integration**: MCP-specific OAuth flow in
`useCredentialsInput`, server URL filtering in `useCredentials`, MCP
callback page
- **CredentialsSelect**: Auto-selects first available credential instead
of defaulting to "None", credentials listed before "None" in dropdown
- **Node rendering**: Dynamic tool input schema rendering on MCP nodes,
proper handling in both legacy and new flow editors
- **Block title persistence**: `customized_name` set at block creation
for both MCP and Agent blocks — no fallback logic needed, titles survive
save/load reliably
- **Stable credential ordering**: Removed `sortByUnsetFirst` that caused
credential inputs to jump when selected

### Tests (~2060 lines)

- Unit tests: block, client, tool execution
- Integration tests: mock MCP server with auth
- OAuth flow tests
- API endpoint tests
- Credential combining/optionality tests
- E2e tests (skipped in CI, run manually)

## Key Design Decisions

1. **Optional credentials via `default={}`**: MCP servers can be public
(no auth) or private (OAuth). The `credentials` field has `default={}`
making it optional at the schema level, so public servers work without
prompting for credentials.

2. **URL-based credential discrimination**: Each MCP server URL gets its
own credential entry in the "Run agent" form (via
`discriminator="server_url"`), so agents using multiple MCP servers
prompt for each independently.

3. **Model-level normalization**: Python 3.13 changed `str(StrEnum)` to
return `"ClassName.MEMBER"`. Rather than scattering fixes across the
codebase, a Pydantic `model_validator(mode="before")` on
`CredentialsMetaInput` handles normalization centrally, and
`provider_matches()` handles lookups.

4. **Credential auto-select**: `CredentialsSelect` component defaults to
the first available credential and notifies the parent state, ensuring
credentials are pre-filled in the "Run agent" dialog without requiring
manual selection.

5. **customized_name for block titles**: Both MCP and Agent blocks set
`customized_name` in metadata at creation time. This eliminates
convoluted runtime fallback logic (`agent_name`, hostname extraction) —
the title is persisted once and read directly.

## Test plan

- [x] Unit/integration tests pass (68 MCP + 11 graph = 79 tests)
- [x] Manual: MCP block with public server (DeepWiki) — no credentials
needed, tools discovered and executable
- [x] Manual: MCP block with OAuth server (Linear, Sentry) — OAuth flow
prompts correctly
- [x] Manual: "Run agent" form shows correct credential requirements per
MCP server
- [x] Manual: Credential auto-selects when exactly one matches,
pre-selects first when multiple exist
- [x] Manual: Credential ordering stays stable when
selecting/deselecting
- [x] Manual: MCP block title persists after save and refresh
- [x] Manual: Agent block title persists after save and refresh (via
customized_name)
- [ ] Manual: Shared agent with MCP block prompts new user for
credentials

---------

Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Ubbe <hi@ubbe.dev>
2026-02-13 16:17:03 +00:00
Zamil Majdy
52b3aebf71 feat(backend/sdk): Claude Agent SDK integration for CoPilot (#12103)
## Summary

Full integration of the **Claude Agent SDK** to replace the existing
one-turn OpenAI-compatible CoPilot implementation with a multi-turn,
tool-using AI agent.

### What changed

**Core SDK Integration** (`chat/sdk/` — new module)
- **`service.py`**: Main orchestrator — spawns Claude Code CLI as a
subprocess per user message, streams responses back via SSE. Handles
conversation history compression, session lifecycle, and error recovery.
- **`response_adapter.py`**: Translates Claude Agent SDK events (text
deltas, tool use, errors, result messages) into the existing CoPilot
`StreamEvent` protocol so the frontend works unchanged.
- **`tool_adapter.py`**: Bridges CoPilot's MCP tools (find_block,
run_block, create_agent, etc.) into the SDK's tool format. Handles
schema conversion and result serialization.
- **`security_hooks.py`**: Pre/Post tool-use hooks that enforce a strict
allowlist of tools, block path traversal, sandbox file operations to
per-session workspace directories, cap sub-agent spawning, and prevent
the model from accessing unauthorized system resources.
- **`transcript.py`**: JSONL transcript I/O utilities for the stateless
`--resume` feature (see below).

**Stateless Multi-Turn Resume** (new)
- Instead of compressing conversation history via LLM on every turn
(lossy and expensive), we capture Claude Code's native JSONL session
transcript via a **Stop hook** callback, persist it in the DB
(`ChatSession.sdkTranscript`), and restore it on the next turn via
`--resume <file>`.
- This preserves full tool call/result context across turns with zero
token overhead for history.
- Feature-flagged via `CLAUDE_AGENT_USE_RESUME` (default: off).
- DB migration: `ALTER TABLE "ChatSession" ADD COLUMN "sdkTranscript"
TEXT`.

**Sandboxed Tool Execution** (`chat/tools/`)
- **`bash_exec.py`**: Sandboxed bash execution using bubblewrap
(`bwrap`) with read-only root filesystem, per-session writable
workspace, resource limits (CPU, memory, file size), and network
isolation.
- **`sandbox.py`**: Shared bubblewrap sandbox infrastructure — generates
`bwrap` command lines with configurable mounts, environment, and
resource constraints.
- **`web_fetch.py`**: URL fetching tool with domain allowlist, size
limits, and content-type filtering.
- **`check_operation_status.py`**: Polling tool for long-running
operations (agent creation, block execution) so the SDK doesn't block
waiting.
- **`find_block.py`** / **`run_block.py`**: Enhanced with category
filtering, optimized response size (removed raw JSON schemas), and
better error handling.

**Security**
- Path traversal prevention: session IDs sanitized, all file ops
confined to workspace dirs, symlink resolution.
- Tool allowlist enforcement via SDK hooks — model cannot call arbitrary
tools.
- Built-in `Bash` tool blocked via `disallowed_tools` to prevent
bypassing sandboxed `bash_exec`.
- Sub-agent (`Task`) spawning capped at configurable limit (default:
10).
- CodeQL-clean path sanitization patterns.

**Streaming & Reconnection**
- SSE stream registry backed by Redis Streams for crash-resilient
reconnection.
- Long-running operation tracking with TTL-based cleanup.
- Atomic message append to prevent race conditions on concurrent writes.

**Configuration** (`config.py`)
- `use_claude_agent_sdk` — master toggle (default: on)
- `claude_agent_model` — model override for SDK path
- `claude_agent_max_buffer_size` — JSON parsing buffer (10MB)
- `claude_agent_max_subtasks` — sub-agent cap (10)
- `claude_agent_use_resume` — transcript-based resume (default: off)
- `thinking_enabled` — extended thinking for Claude models

**Tests**
- `sdk/response_adapter_test.py` — 366 lines covering all event
translation paths
- `sdk/security_hooks_test.py` — 165 lines covering tool blocking, path
traversal, subtask limits
- `chat/model_test.py` — 214 lines covering session model serialization
- `chat/service_test.py` — Integration tests including multi-turn resume
keyword recall
- `tools/find_block_test.py` / `run_block_test.py` — Extended with new
tool behavior tests

## Test plan
- [x] Unit tests pass (`sdk/response_adapter_test.py`,
`security_hooks_test.py`, `model_test.py`)
- [x] Integration test: multi-turn keyword recall via `--resume`
(`service_test.py::test_sdk_resume_multi_turn`)
- [x] Manual E2E: CoPilot chat sessions with tool calls, bash execution,
and multi-turn context
- [x] Pre-commit hooks pass (ruff, isort, black, pyright, flake8)
- [ ] Staging deployment with `claude_agent_use_resume=false` initially
- [ ] Enable resume in staging, verify transcript capture and recall

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

This PR replaces the existing OpenAI-compatible CoPilot with a full
Claude Agent SDK integration, introducing multi-turn conversations,
stateless resume via JSONL transcripts, and sandboxed tool execution.

**Key changes:**
- **SDK integration** (`chat/sdk/`): spawns Claude Code CLI subprocess
per message, translates events to frontend protocol, bridges MCP tools
- **Stateless resume**: captures JSONL transcripts via Stop hook,
persists in `ChatSession.sdkTranscript`, restores with `--resume`
(feature-flagged, default off)
- **Sandboxed execution**: bubblewrap sandbox for bash commands with
filesystem whitelist, network isolation, resource limits
- **Security hooks**: tool allowlist enforcement, path traversal
prevention, workspace-scoped file operations, sub-agent spawn limits
- **Long-running operations**: delegates `create_agent`/`edit_agent` to
existing stream_registry infrastructure for SSE reconnection
- **Feature flag**: `CHAT_USE_CLAUDE_AGENT_SDK` with LaunchDarkly
support, defaults to enabled

**Security issues found:**
- Path traversal validation has logic errors in `security_hooks.py:82`
(tilde expansion order) and `service.py:266` (redundant `..` check)
- Config validator always prefers env var over explicit `False` value
(`config.py:162`)
- Race condition in `routes.py:323` — message persisted before task
registration, could duplicate on retry
- Resource limits in sandbox may fail silently (`sandbox.py:109`)

**Test coverage is strong** with 366 lines for response adapter, 165 for
security hooks, and integration tests for multi-turn resume.
</details>


<details><summary><h3>Confidence Score: 3/5</h3></summary>

- This PR is generally safe but has critical security issues in path
validation that must be fixed before merge
- Score reflects strong architecture and test coverage offset by real
security vulnerabilities: the tilde expansion bug in `security_hooks.py`
could allow sandbox escape, the race condition could cause message
duplication, and the silent ulimit failures could bypass resource
limits. The bubblewrap sandbox and allowlist enforcement are
well-designed, but the path validation bugs need fixing. The transcript
resume feature is properly feature-flagged. Overall the implementation
is solid but the security issues prevent a higher score.
- Pay close attention to
`backend/api/features/chat/sdk/security_hooks.py` (path traversal
vulnerability), `backend/api/features/chat/routes.py` (race condition),
`backend/api/features/chat/tools/sandbox.py` (silent resource limit
failures), and `backend/api/features/chat/sdk/service.py` (redundant
security check)
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant Frontend
    participant Routes as routes.py
    participant SDKService as sdk/service.py
    participant ClaudeSDK as Claude Agent SDK CLI
    participant SecurityHooks as security_hooks.py
    participant ToolAdapter as tool_adapter.py
    participant CoPilotTools as tools/*
    participant Sandbox as sandbox.py (bwrap)
    participant DB as Database
    participant Redis as stream_registry

    Frontend->>Routes: POST /chat (user message)
    Routes->>SDKService: stream_chat_completion_sdk()
    
    SDKService->>DB: get_chat_session()
    DB-->>SDKService: session + messages
    
    alt Resume enabled AND transcript exists
        SDKService->>SDKService: validate_transcript()
        SDKService->>SDKService: write_transcript_to_tempfile()
        Note over SDKService: Pass --resume to SDK
    else No resume
        SDKService->>SDKService: _compress_conversation_history()
        Note over SDKService: Inject history into user message
    end
    
    SDKService->>SecurityHooks: create_security_hooks()
    SDKService->>ToolAdapter: create_copilot_mcp_server()
    SDKService->>ClaudeSDK: spawn subprocess with MCP server
    
    loop Streaming Conversation
        ClaudeSDK->>SDKService: AssistantMessage (text/tool_use)
        SDKService->>Frontend: StreamTextDelta / StreamToolInputAvailable
        
        alt Tool Call
            ClaudeSDK->>SecurityHooks: PreToolUse hook
            SecurityHooks->>SecurityHooks: validate path, check allowlist
            alt Tool blocked
                SecurityHooks-->>ClaudeSDK: deny
            else Tool allowed
                SecurityHooks-->>ClaudeSDK: allow
                ClaudeSDK->>ToolAdapter: call MCP tool
                
                alt Long-running tool (create_agent, edit_agent)
                    ToolAdapter->>Redis: register task
                    ToolAdapter->>DB: save OperationPendingResponse
                    ToolAdapter->>ToolAdapter: spawn background task
                    ToolAdapter-->>ClaudeSDK: OperationStartedResponse
                else Regular tool (find_block, bash_exec)
                    ToolAdapter->>CoPilotTools: execute()
                    alt bash_exec
                        CoPilotTools->>Sandbox: run_sandboxed()
                        Sandbox->>Sandbox: build bwrap command
                        Note over Sandbox: Network isolation,<br/>filesystem whitelist,<br/>resource limits
                        Sandbox-->>CoPilotTools: stdout, stderr, exit_code
                    end
                    CoPilotTools-->>ToolAdapter: result
                    ToolAdapter->>ToolAdapter: stash full output
                    ToolAdapter-->>ClaudeSDK: MCP response
                end
                
                SecurityHooks->>SecurityHooks: PostToolUse hook (log)
            end
        end
        
        ClaudeSDK->>SDKService: UserMessage (ToolResultBlock)
        SDKService->>ToolAdapter: pop_pending_tool_output()
        SDKService->>Frontend: StreamToolOutputAvailable
    end
    
    ClaudeSDK->>SecurityHooks: Stop hook
    SecurityHooks->>SDKService: transcript_path callback
    SDKService->>SDKService: read_transcript_file()
    SDKService->>DB: save transcript to session.sdkTranscript
    
    ClaudeSDK->>SDKService: ResultMessage (success)
    SDKService->>Frontend: StreamFinish
    SDKService->>DB: upsert_chat_session()
```
</details>


<sub>Last reviewed commit: 28c1121</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

---------

Co-authored-by: Swifty <craigswift13@gmail.com>
2026-02-13 15:49:03 +00:00
Otto
965b7d3e04 dx: Add PR overlap detection & alert (#12104)
## Summary

Adds an automated workflow that detects potential merge conflicts
between open PRs, helping contributors coordinate proactively.

**Example output:** [See comment on PR
#12057](https://github.com/Significant-Gravitas/AutoGPT/pull/12057#issuecomment-3897330632)

## How it works

1. **Triggered on PR events** — runs when a PR is opened, pushed to, or
reopened
2. **Compares against all open PRs** targeting the same base branch
3. **Detects overlaps** at multiple levels:
   - File overlap (same files modified)
   - Line overlap (same line ranges modified)
   - Actual merge conflicts (attempts real merges)
4. **Posts a comment** on the PR with findings

## Features

- Full file paths with common prefix extraction for readability
- Conflict size (number of conflict regions + lines affected)
- Conflict types (content, added, deleted, modified/deleted, etc.)
- Last-updated timestamps for each PR
- Risk categorization (conflict, medium, low)
- Ignores noise files (openapi.json, lock files)
- Updates existing comment on subsequent pushes (no spam)
- Filters out PRs older than 14 days
- Clone-once optimization for fast merge testing (~48s for 19 PRs)

## Files

- `.github/scripts/detect_overlaps.py` — main detection script
- `.github/workflows/pr-overlap-check.yml` — workflow definition
2026-02-13 15:45:10 +00:00
Bently
c2368f15ff fix(blocks): disable PrintToConsoleBlock (#12100)
## Summary
Disables the Print to Console block as requested by Nick Tindle.

## Changes
- Added `disabled=True` to PrintToConsoleBlock in `basic.py`

## Testing
- Block will no longer appear in the platform UI
- Existing graphs using this block should be checked (block ID:
`f3b1c1b2-4c4f-4f0d-8d2f-4c4f0d8d2f4c`)

Closes OPEN-3000

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Added `disabled=True` parameter to `PrintToConsoleBlock` in `basic.py`
per Nick Tindle's request (OPEN-3000).

- Block follows the same disabling pattern used by other blocks in the
codebase (e.g., `BlockInstallationBlock`, video blocks, Ayrshare blocks)
- Block will no longer appear in the platform UI for new graph creation
- Existing graphs using this block (ID:
`f3b1c1b2-4c4f-4f0d-8d2f-4c4f0d8d2f4c`) will need to be checked for
compatibility
- Comment properly documents the reason for disabling
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge with minimal risk
- Single-line change that adds a well-documented flag following existing
patterns used throughout the codebase. The change is non-destructive and
only affects UI visibility of the block for new graphs.
- No files require special attention
</details>


<sub>Last reviewed commit: 759003b</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-13 15:20:23 +00:00
dependabot[bot]
9ac3f64d56 chore(deps): bump github/codeql-action from 3 to 4 (#12033)
Bumps [github/codeql-action](https://github.com/github/codeql-action)
from 3 to 4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/github/codeql-action/releases">github/codeql-action's
releases</a>.</em></p>
<blockquote>
<h2>v3.32.2</h2>
<ul>
<li>Update default CodeQL bundle version to <a
href="https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.24.1">2.24.1</a>.
<a
href="https://redirect.github.com/github/codeql-action/pull/3460">#3460</a></li>
</ul>
<h2>v3.32.1</h2>
<ul>
<li>A warning is now shown in Default Setup workflow logs if a <a
href="https://docs.github.com/en/code-security/how-tos/secure-at-scale/configure-organization-security/manage-usage-and-access/giving-org-access-private-registries">private
package registry is configured</a> using a GitHub Personal Access Token
(PAT), but no username is configured. <a
href="https://redirect.github.com/github/codeql-action/pull/3422">#3422</a></li>
<li>Fixed a bug which caused the CodeQL Action to fail when repository
properties cannot successfully be retrieved. <a
href="https://redirect.github.com/github/codeql-action/pull/3421">#3421</a></li>
</ul>
<h2>v3.32.0</h2>
<ul>
<li>Update default CodeQL bundle version to <a
href="https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.24.0">2.24.0</a>.
<a
href="https://redirect.github.com/github/codeql-action/pull/3425">#3425</a></li>
</ul>
<h2>v3.31.11</h2>
<ul>
<li>When running a Default Setup workflow with <a
href="https://docs.github.com/en/actions/how-tos/monitor-workflows/enable-debug-logging">Actions
debugging enabled</a>, the CodeQL Action will now use more unique names
when uploading logs from the Dependabot authentication proxy as workflow
artifacts. This ensures that the artifact names do not clash between
multiple jobs in a build matrix. <a
href="https://redirect.github.com/github/codeql-action/pull/3409">#3409</a></li>
<li>Improved error handling throughout the CodeQL Action. <a
href="https://redirect.github.com/github/codeql-action/pull/3415">#3415</a></li>
<li>Added experimental support for automatically excluding <a
href="https://docs.github.com/en/repositories/working-with-files/managing-files/customizing-how-changed-files-appear-on-github">generated
files</a> from the analysis. This feature is not currently enabled for
any analysis. In the future, it may be enabled by default for some
GitHub-managed analyses. <a
href="https://redirect.github.com/github/codeql-action/pull/3318">#3318</a></li>
<li>The changelog extracts that are included with releases of the CodeQL
Action are now shorter to avoid duplicated information from appearing in
Dependabot PRs. <a
href="https://redirect.github.com/github/codeql-action/pull/3403">#3403</a></li>
</ul>
<h2>v3.31.10</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>3.31.10 - 12 Jan 2026</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.9. <a
href="https://redirect.github.com/github/codeql-action/pull/3393">#3393</a></li>
</ul>
<p>See the full <a
href="https://github.com/github/codeql-action/blob/v3.31.10/CHANGELOG.md">CHANGELOG.md</a>
for more information.</p>
<h2>v3.31.9</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>3.31.9 - 16 Dec 2025</h2>
<p>No user facing changes.</p>
<p>See the full <a
href="https://github.com/github/codeql-action/blob/v3.31.9/CHANGELOG.md">CHANGELOG.md</a>
for more information.</p>
<h2>v3.31.8</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>3.31.8 - 11 Dec 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.8. <a
href="https://redirect.github.com/github/codeql-action/pull/3354">#3354</a></li>
</ul>
<p>See the full <a
href="https://github.com/github/codeql-action/blob/v3.31.8/CHANGELOG.md">CHANGELOG.md</a>
for more information.</p>
<h2>v3.31.7</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/github/codeql-action/blob/main/CHANGELOG.md">github/codeql-action's
changelog</a>.</em></p>
<blockquote>
<h2>4.31.11 - 23 Jan 2026</h2>
<ul>
<li>When running a Default Setup workflow with <a
href="https://docs.github.com/en/actions/how-tos/monitor-workflows/enable-debug-logging">Actions
debugging enabled</a>, the CodeQL Action will now use more unique names
when uploading logs from the Dependabot authentication proxy as workflow
artifacts. This ensures that the artifact names do not clash between
multiple jobs in a build matrix. <a
href="https://redirect.github.com/github/codeql-action/pull/3409">#3409</a></li>
<li>Improved error handling throughout the CodeQL Action. <a
href="https://redirect.github.com/github/codeql-action/pull/3415">#3415</a></li>
<li>Added experimental support for automatically excluding <a
href="https://docs.github.com/en/repositories/working-with-files/managing-files/customizing-how-changed-files-appear-on-github">generated
files</a> from the analysis. This feature is not currently enabled for
any analysis. In the future, it may be enabled by default for some
GitHub-managed analyses. <a
href="https://redirect.github.com/github/codeql-action/pull/3318">#3318</a></li>
<li>The changelog extracts that are included with releases of the CodeQL
Action are now shorter to avoid duplicated information from appearing in
Dependabot PRs. <a
href="https://redirect.github.com/github/codeql-action/pull/3403">#3403</a></li>
</ul>
<h2>4.31.10 - 12 Jan 2026</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.9. <a
href="https://redirect.github.com/github/codeql-action/pull/3393">#3393</a></li>
</ul>
<h2>4.31.9 - 16 Dec 2025</h2>
<p>No user facing changes.</p>
<h2>4.31.8 - 11 Dec 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.8. <a
href="https://redirect.github.com/github/codeql-action/pull/3354">#3354</a></li>
</ul>
<h2>4.31.7 - 05 Dec 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.7. <a
href="https://redirect.github.com/github/codeql-action/pull/3343">#3343</a></li>
</ul>
<h2>4.31.6 - 01 Dec 2025</h2>
<p>No user facing changes.</p>
<h2>4.31.5 - 24 Nov 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.6. <a
href="https://redirect.github.com/github/codeql-action/pull/3321">#3321</a></li>
</ul>
<h2>4.31.4 - 18 Nov 2025</h2>
<p>No user facing changes.</p>
<h2>4.31.3 - 13 Nov 2025</h2>
<ul>
<li>CodeQL Action v3 will be deprecated in December 2026. The Action now
logs a warning for customers who are running v3 but could be running v4.
For more information, see <a
href="https://github.blog/changelog/2025-10-28-upcoming-deprecation-of-codeql-action-v3/">Upcoming
deprecation of CodeQL Action v3</a>.</li>
<li>Update default CodeQL bundle version to 2.23.5. <a
href="https://redirect.github.com/github/codeql-action/pull/3288">#3288</a></li>
</ul>
<h2>4.31.2 - 30 Oct 2025</h2>
<p>No user facing changes.</p>
<h2>4.31.1 - 30 Oct 2025</h2>
<ul>
<li>The <code>add-snippets</code> input has been removed from the
<code>analyze</code> action. This input has been deprecated since CodeQL
Action 3.26.4 in August 2024 when this removal was announced.</li>
</ul>
<h2>4.31.0 - 24 Oct 2025</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="8aac4e47ac"><code>8aac4e4</code></a>
Merge pull request <a
href="https://redirect.github.com/github/codeql-action/issues/3448">#3448</a>
from github/mergeback/v4.32.1-to-main-6bc82e05</li>
<li><a
href="e8d7df4f04"><code>e8d7df4</code></a>
Rebuild</li>
<li><a
href="c1bba77db0"><code>c1bba77</code></a>
Update changelog and version after v4.32.1</li>
<li><a
href="6bc82e05fd"><code>6bc82e0</code></a>
Merge pull request <a
href="https://redirect.github.com/github/codeql-action/issues/3447">#3447</a>
from github/update-v4.32.1-f52cbc830</li>
<li><a
href="42f00f2d33"><code>42f00f2</code></a>
Add a couple of change notes</li>
<li><a
href="cedee6de9f"><code>cedee6d</code></a>
Update changelog for v4.32.1</li>
<li><a
href="f52cbc8309"><code>f52cbc8</code></a>
Merge pull request <a
href="https://redirect.github.com/github/codeql-action/issues/3445">#3445</a>
from github/dependabot/npm_and_yarn/fast-xml-parser-...</li>
<li>See full diff in <a
href="https://github.com/github/codeql-action/compare/v3...v4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github/codeql-action&package-manager=github_actions&previous-version=3&new-version=4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-13 15:04:05 +00:00
Swifty
5035b69c79 feat(platform): add feature request tools for CoPilot chat (#12102)
Users can now search for existing feature requests and submit new ones
directly through the CoPilot chat interface. Requests are tracked in
Linear with customer need attribution.

### Changes 🏗️

**Backend:**
- Added `SearchFeatureRequestsTool` and `CreateFeatureRequestTool` to
the CoPilot chat tools registry
- Integrated with Linear GraphQL API for searching issues in the feature
requests project, creating new issues, upserting customers, and
attaching customer needs
- Added `linear_api_key` secret to settings for system-level Linear API
access
- Added response models (`FeatureRequestSearchResponse`,
`FeatureRequestCreatedResponse`, `FeatureRequestInfo`) to the tools
models

**Frontend:**
- Added `SearchFeatureRequestsTool` and `CreateFeatureRequestTool` UI
components with full streaming state handling (input-streaming,
input-available, output-available, output-error)
- Added helper utilities for output parsing, type guards, animation
text, and icon rendering
- Wired tools into `ChatMessagesContainer` for rendering in the chat
- Added styleguide examples covering all tool states

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified search returns matching feature requests from Linear
- [x] Verified creating a new feature request creates an issue and
customer need in Linear
- [x] Verified adding a need to an existing issue works via
`existing_issue_id`
  - [x] Verified error states render correctly in the UI
  - [x] Verified styleguide page renders all tool states

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

New secret: `LINEAR_API_KEY` — required for system-level Linear API
operations (defaults to empty string).

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Adds feature request search and creation tools to CoPilot chat,
integrating with Linear's GraphQL API to track user feedback. Users can
now search existing feature requests and submit new ones (or add their
need to existing issues) directly through conversation.

**Key changes:**
- Backend: `SearchFeatureRequestsTool` and `CreateFeatureRequestTool`
with Linear API integration via system-level `LINEAR_API_KEY`
- Frontend: React components with streaming state handling and accordion
UI for search results and creation confirmations
- Models: Added `FeatureRequestSearchResponse` and
`FeatureRequestCreatedResponse` to response types
- Customer need tracking: Upserts customers in Linear and attaches needs
to issues for better feedback attribution

**Issues found:**
- Missing `LINEAR_API_KEY` entry in `.env.default` (required per PR
description checklist)
- Hardcoded project/team IDs reduce maintainability
- Global singleton pattern could cause issues in async contexts
- Using `user_id` as customer name reduces readability in Linear
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- Safe to merge with minor configuration fix required
- The implementation is well-structured with proper error handling, type
safety, and follows existing patterns in the codebase. The missing
`.env.default` entry is a straightforward configuration issue that must
be fixed before deployment but doesn't affect code quality. The other
findings are style improvements that don't impact functionality.
- Verify that `LINEAR_API_KEY` is added to `.env.default` before merging
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User
    participant CoPilot UI
    participant LLM
    participant FeatureRequestTool
    participant LinearClient
    participant Linear API

    User->>CoPilot UI: Request feature via chat
    CoPilot UI->>LLM: Send user message
    
    LLM->>FeatureRequestTool: search_feature_requests(query)
    FeatureRequestTool->>LinearClient: query(SEARCH_ISSUES_QUERY)
    LinearClient->>Linear API: POST /graphql (search)
    Linear API-->>LinearClient: searchIssues.nodes[]
    LinearClient-->>FeatureRequestTool: Feature request data
    FeatureRequestTool-->>LLM: FeatureRequestSearchResponse
    
    alt No existing requests found
        LLM->>FeatureRequestTool: create_feature_request(title, description)
        FeatureRequestTool->>LinearClient: mutate(CUSTOMER_UPSERT_MUTATION)
        LinearClient->>Linear API: POST /graphql (upsert customer)
        Linear API-->>LinearClient: customer {id, name}
        LinearClient-->>FeatureRequestTool: Customer data
        
        FeatureRequestTool->>LinearClient: mutate(ISSUE_CREATE_MUTATION)
        LinearClient->>Linear API: POST /graphql (create issue)
        Linear API-->>LinearClient: issue {id, identifier, url}
        LinearClient-->>FeatureRequestTool: Issue data
        
        FeatureRequestTool->>LinearClient: mutate(CUSTOMER_NEED_CREATE_MUTATION)
        LinearClient->>Linear API: POST /graphql (attach need)
        Linear API-->>LinearClient: need {id, issue}
        LinearClient-->>FeatureRequestTool: Need data
        FeatureRequestTool-->>LLM: FeatureRequestCreatedResponse
    else Existing request found
        LLM->>FeatureRequestTool: create_feature_request(title, description, existing_issue_id)
        FeatureRequestTool->>LinearClient: mutate(CUSTOMER_UPSERT_MUTATION)
        LinearClient->>Linear API: POST /graphql (upsert customer)
        Linear API-->>LinearClient: customer {id}
        LinearClient-->>FeatureRequestTool: Customer data
        
        FeatureRequestTool->>LinearClient: mutate(CUSTOMER_NEED_CREATE_MUTATION)
        LinearClient->>Linear API: POST /graphql (attach need to existing)
        Linear API-->>LinearClient: need {id, issue}
        LinearClient-->>FeatureRequestTool: Need data
        FeatureRequestTool-->>LLM: FeatureRequestCreatedResponse
    end
    
    LLM-->>CoPilot UI: Tool response + continuation
    CoPilot UI-->>User: Display result with accordion UI
```
</details>


<sub>Last reviewed commit: af2e093</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-13 15:27:00 +01:00
Otto
86af8fc856 ci: apply E2E CI optimizations to Claude workflows (#12097)
## Summary

Applies the CI performance optimizations from #12090 to Claude Code
workflows.

## Changes

### `claude.yml` & `claude-dependabot.yml`
- **pnpm caching**: Replaced manual `actions/cache` with `setup-node`
built-in `cache: "pnpm"`
- Removes 4 steps (set pnpm store dir, cache step, manual config) → 1
step

### `claude-ci-failure-auto-fix.yml`
- **Added dev environment setup** with optimized caching
- Now Claude can run lint/tests when fixing CI failures (previously
could only edit files)
- Uses the same optimized caching patterns

## Dependency

This PR is based on #12090 and will merge after it.

## Testing

- Workflow YAML syntax validated
- Patterns match proven #12090 implementation
- CI caching changes fail gracefully to uncached builds

## Linear

Fixes [SECRT-1950](https://linear.app/autogpt/issue/SECRT-1950)

## Future Enhancements

E2E test data caching could be added to Claude workflows if needed for
running integration tests. Currently Claude workflows set up a dev
environment but don't run E2E tests by default.

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Applies proven CI performance optimizations to Claude workflows by
simplifying pnpm caching and adding dev environment setup to the
auto-fix workflow.

**Key changes:**
- Replaced manual pnpm cache configuration (4 steps) with built-in
`setup-node` `cache: "pnpm"` support in `claude.yml` and
`claude-dependabot.yml`
- Added complete dev environment setup (Python/Poetry + Node.js/pnpm) to
`claude-ci-failure-auto-fix.yml` so Claude can run linting and tests
when fixing CI failures
- Correctly orders `corepack enable` before `setup-node` to ensure pnpm
is available for caching

The changes mirror the optimizations from PR #12090 and maintain
consistency across all Claude workflows.
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge with minimal risk
- The changes are CI infrastructure optimizations that mirror proven
patterns from PR #12090. The pnpm caching simplification reduces
complexity without changing functionality (caching failures gracefully
fall back to uncached builds). The dev environment setup in the auto-fix
workflow is additive and enables Claude to run linting/tests. All YAML
syntax is correct and the step ordering follows best practices.
- No files require special attention
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant GHA as GitHub Actions
    participant Corepack as Corepack
    participant SetupNode as setup-node@v6
    participant Cache as GHA Cache
    participant pnpm as pnpm

    Note over GHA,pnpm: Before (Manual Caching)
    GHA->>SetupNode: Set up Node.js 22
    SetupNode-->>GHA: Node.js ready
    GHA->>Corepack: Enable corepack
    Corepack-->>GHA: pnpm available
    GHA->>pnpm: Configure store directory
    pnpm-->>GHA: Store path set
    GHA->>Cache: actions/cache (manual key)
    Cache-->>GHA: Cache restored/missed
    GHA->>pnpm: Install dependencies
    pnpm-->>GHA: Dependencies installed

    Note over GHA,pnpm: After (Built-in Caching)
    GHA->>Corepack: Enable corepack
    Corepack-->>GHA: pnpm available
    GHA->>SetupNode: Set up Node.js 22<br/>cache: "pnpm"<br/>cache-dependency-path: pnpm-lock.yaml
    SetupNode->>Cache: Auto-detect pnpm store
    Cache-->>SetupNode: Cache restored/missed
    SetupNode-->>GHA: Node.js + cache ready
    GHA->>pnpm: Install dependencies
    pnpm-->>GHA: Dependencies installed
```
</details>


<sub>Last reviewed commit: f1681a0</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
Co-authored-by: Ubbe <hi@ubbe.dev>
2026-02-13 13:48:04 +00:00
Otto
dfa517300b debug(copilot): Add detailed API error logging (#11942)
## Summary
Adds comprehensive error logging for OpenRouter/OpenAI API errors to
help diagnose issues like provider routing failures, context length
exceeded, rate limits, etc.

## Background
While investigating
[SECRT-1859](https://linear.app/autogpt/issue/SECRT-1859), we found that
when OpenRouter returns errors, the actual error details weren't being
captured or logged. Langfuse traces showed `provider_name: 'unknown'`
and `completion: null` without any insight into WHY all providers
rejected the request.

## Changes
- Add `_extract_api_error_details()` to extract rich information from
API errors including:
  - Status code and request ID
  - Response body (contains OpenRouter's actual error message)
  - OpenRouter-specific headers (provider, model)
  - Rate limit headers
- Add `_log_api_error()` helper that logs errors with context:
  - Session ID for correlation
  - Message count (helps identify context length issues)
  - Model being used
  - Retry count
- Update error handling in `_stream_chat_chunks()` and
`_generate_llm_continuation()` to use new logging
- Extract provider's error message from response body for better user
feedback

## Example log output
```
API error: {
  'error_type': 'APIStatusError',
  'error_message': 'Provider returned error',
  'status_code': 400,
  'request_id': 'req_xxx',
  'response_body': {'error': {'message': 'context_length_exceeded', 'type': 'invalid_request_error'}},
  'openrouter_provider': 'unknown',
  'session_id': '44fbb803-...',
  'message_count': 52,
  'model': 'anthropic/claude-opus-4.5',
  'retry_count': 0
}
```

## Testing
- [ ] Verified code passes linting (black, isort, ruff)
- [ ] Error details are properly extracted from different error types

## Refs
- Linear: SECRT-1859
- Thread:
https://discord.com/channels/1126875755960336515/1467066151002571034

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-02-13 13:15:17 +00:00
Reinier van der Leer
43b25b5e2f ci(frontend): Speed up E2E test job (#12090)
The frontend `e2e_test` doesn't have a working build cache setup,
causing really slow builds = slow test jobs. These changes reduce total
test runtime from ~12 minutes to ~5 minutes.

### Changes 🏗️

- Inject build cache config into docker compose config; let `buildx
bake` use GHA cache directly
  - Add `docker-ci-fix-compose-build-cache.py` script
- Optimize `backend/Dockerfile` + root `.dockerignore`
- Replace broken DIY pnpm store caching with `actions/setup-node`
built-in cache management
- Add caching for test seed data created in DB

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - CI
2026-02-13 11:09:41 +01:00
Swifty
ab0b537cc7 refactor(backend): optimize find_block response size by removing raw JSON schemas (#12020)
### Changes 🏗️

The `find_block` AutoPilot tool was returning ~90K characters per
response (10 blocks). The bloat came from including full JSON Schema
objects (`input_schema`, `output_schema`) with all nested `$defs`,
`anyOf`, and type definitions for every block.

**What changed:**

- **`BlockInfoSummary` model**: Removed `input_schema` (raw JSON
Schema), `output_schema` (raw JSON Schema), and `categories`. Added
`output_fields` (compact field-level summaries matching the existing
`required_inputs` format).
- **`BlockListResponse` model**: Removed `usage_hint` (info now in
`message`).
- **`FindBlockTool._execute()`**: Now extracts compact `output_fields`
from output schema properties instead of including the entire raw
schema. Credentials handling is unchanged.
- **Test**: Added `test_response_size_average_chars_per_block` with
realistic block schemas (HTTP, Email, Claude Code) to measure and assert
response size stays under 2K chars/block.
- **`CLAUDE.md`**: Clarified `dev` vs `master` branching strategy.

**Result:** Average response size reduced from ~9,000 to ~1,300 chars
per block (~85% reduction). This directly reduces LLM token consumption,
latency, and API costs for AutoPilot interactions.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified models import and serialize correctly
- [x] Verified response size: 3,970 chars for 3 realistic blocks (avg
1,323/block)
- [x] Lint (`ruff check`) and type check (`pyright`) pass on changed
files
- [x] Frontend compatibility preserved: `blocks[].name` and `count`
fields retained for `block_list` handler

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Toran Bruce Richards <toran.richards@gmail.com>
2026-02-13 11:08:51 +01:00
dependabot[bot]
9a8c6ad609 chore(libs/deps): bump the production-dependencies group across 1 directory with 4 updates (#12056)
Bumps the production-dependencies group with 4 updates in the
/autogpt_platform/autogpt_libs directory:
[cryptography](https://github.com/pyca/cryptography),
[fastapi](https://github.com/fastapi/fastapi),
[launchdarkly-server-sdk](https://github.com/launchdarkly/python-server-sdk)
and [supabase](https://github.com/supabase/supabase-py).

Updates `cryptography` from 46.0.4 to 46.0.5
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst">cryptography's
changelog</a>.</em></p>
<blockquote>
<p>46.0.5 - 2026-02-10</p>
<pre><code>
* An attacker could create a malicious public key that reveals portions
of your
private key when using certain uncommon elliptic curves (binary curves).
This version now includes additional security checks to prevent this
attack.
This issue only affects binary elliptic curves, which are rarely used in
real-world applications. Credit to **XlabAI Team of Tencent Xuanwu Lab
and
Atuin Automated Vulnerability Discovery Engine** for reporting the
issue.
  **CVE-2026-26007**
* Support for ``SECT*`` binary elliptic curves is deprecated and will be
  removed in the next release.
<p>.. v46-0-4:<br />
</code></pre></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="06e120e682"><code>06e120e</code></a>
bump version for 46.0.5 release (<a
href="https://redirect.github.com/pyca/cryptography/issues/14289">#14289</a>)</li>
<li><a
href="0eebb9dbb6"><code>0eebb9d</code></a>
EC check key on cofactor &gt; 1 (<a
href="https://redirect.github.com/pyca/cryptography/issues/14287">#14287</a>)</li>
<li><a
href="bedf6e186b"><code>bedf6e1</code></a>
fix openssl version on 46 branch (<a
href="https://redirect.github.com/pyca/cryptography/issues/14220">#14220</a>)</li>
<li>See full diff in <a
href="https://github.com/pyca/cryptography/compare/46.0.4...46.0.5">compare
view</a></li>
</ul>
</details>
<br />

Updates `fastapi` from 0.128.0 to 0.128.7
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/fastapi/fastapi/releases">fastapi's
releases</a>.</em></p>
<blockquote>
<h2>0.128.7</h2>
<h3>Features</h3>
<ul>
<li> Show a clear error on attempt to include router into itself. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14258">#14258</a>
by <a
href="https://github.com/JavierSanchezCastro"><code>@​JavierSanchezCastro</code></a>.</li>
<li> Replace <code>dict</code> by <code>Mapping</code> on
<code>HTTPException.headers</code>. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/12997">#12997</a>
by <a
href="https://github.com/rijenkii"><code>@​rijenkii</code></a>.</li>
</ul>
<h3>Refactors</h3>
<ul>
<li>♻️ Simplify reading files in memory, do it sequentially instead of
(fake) parallel. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14884">#14884</a>
by <a
href="https://github.com/tiangolo"><code>@​tiangolo</code></a>.</li>
</ul>
<h3>Docs</h3>
<ul>
<li>📝 Use <code>dfn</code> tag for definitions instead of
<code>abbr</code> in docs. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14744">#14744</a>
by <a
href="https://github.com/YuriiMotov"><code>@​YuriiMotov</code></a>.</li>
</ul>
<h3>Internal</h3>
<ul>
<li> Tweak comment in test to reference PR. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14885">#14885</a>
by <a
href="https://github.com/tiangolo"><code>@​tiangolo</code></a>.</li>
<li>🔧 Update LLM-prompt for <code>abbr</code> and <code>dfn</code> tags.
PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14747">#14747</a>
by <a
href="https://github.com/YuriiMotov"><code>@​YuriiMotov</code></a>.</li>
<li> Test order for the submitted byte Files. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14828">#14828</a>
by <a
href="https://github.com/valentinDruzhinin"><code>@​valentinDruzhinin</code></a>.</li>
<li>🔧 Configure <code>test</code> workflow to run tests with
<code>inline-snapshot=review</code>. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14876">#14876</a>
by <a
href="https://github.com/YuriiMotov"><code>@​YuriiMotov</code></a>.</li>
</ul>
<h2>0.128.6</h2>
<h3>Fixes</h3>
<ul>
<li>🐛 Fix <code>on_startup</code> and <code>on_shutdown</code>
parameters of <code>APIRouter</code>. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14873">#14873</a>
by <a
href="https://github.com/YuriiMotov"><code>@​YuriiMotov</code></a>.</li>
</ul>
<h3>Translations</h3>
<ul>
<li>🌐 Update translations for zh (update-outdated). PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14843">#14843</a>
by <a
href="https://github.com/tiangolo"><code>@​tiangolo</code></a>.</li>
</ul>
<h3>Internal</h3>
<ul>
<li> Fix parameterized tests with snapshots. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14875">#14875</a>
by <a
href="https://github.com/YuriiMotov"><code>@​YuriiMotov</code></a>.</li>
</ul>
<h2>0.128.5</h2>
<h3>Refactors</h3>
<ul>
<li>♻️ Refactor and simplify Pydantic v2 (and v1) compatibility internal
utils. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14862">#14862</a>
by <a
href="https://github.com/tiangolo"><code>@​tiangolo</code></a>.</li>
</ul>
<h3>Internal</h3>
<ul>
<li> Add inline snapshot tests for OpenAPI before changes from Pydantic
v2. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14864">#14864</a>
by <a
href="https://github.com/tiangolo"><code>@​tiangolo</code></a>.</li>
</ul>
<h2>0.128.4</h2>
<h3>Refactors</h3>
<ul>
<li>♻️ Refactor internals, simplify Pydantic v2/v1 utils,
<code>create_model_field</code>, better types for
<code>lenient_issubclass</code>. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14860">#14860</a>
by <a
href="https://github.com/tiangolo"><code>@​tiangolo</code></a>.</li>
<li>♻️ Simplify internals, remove Pydantic v1 only logic, no longer
needed. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14857">#14857</a>
by <a
href="https://github.com/tiangolo"><code>@​tiangolo</code></a>.</li>
<li>♻️ Refactor internals, cleanup unneeded Pydantic v1 specific logic.
PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14856">#14856</a>
by <a
href="https://github.com/tiangolo"><code>@​tiangolo</code></a>.</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="8f82c94de0"><code>8f82c94</code></a>
🔖 Release version 0.128.7</li>
<li><a
href="5bb3423205"><code>5bb3423</code></a>
📝 Update release notes</li>
<li><a
href="6ce5e3e961"><code>6ce5e3e</code></a>
 Tweak comment in test to reference PR (<a
href="https://redirect.github.com/fastapi/fastapi/issues/14885">#14885</a>)</li>
<li><a
href="65da3dde12"><code>65da3dd</code></a>
📝 Update release notes</li>
<li><a
href="81f82fd955"><code>81f82fd</code></a>
🔧 Update LLM-prompt for <code>abbr</code> and <code>dfn</code> tags (<a
href="https://redirect.github.com/fastapi/fastapi/issues/14747">#14747</a>)</li>
<li><a
href="ff721017df"><code>ff72101</code></a>
📝 Update release notes</li>
<li><a
href="ca76a4eba9"><code>ca76a4e</code></a>
📝 Use <code>dfn</code> tag for definitions instead of <code>abbr</code>
in docs (<a
href="https://redirect.github.com/fastapi/fastapi/issues/14744">#14744</a>)</li>
<li><a
href="1133a4594d"><code>1133a45</code></a>
📝 Update release notes</li>
<li><a
href="38f965985e"><code>38f9659</code></a>
 Test order for the submitted byte Files (<a
href="https://redirect.github.com/fastapi/fastapi/issues/14828">#14828</a>)</li>
<li><a
href="3f1cc8f8f5"><code>3f1cc8f</code></a>
📝 Update release notes</li>
<li>Additional commits viewable in <a
href="https://github.com/fastapi/fastapi/compare/0.128.0...0.128.7">compare
view</a></li>
</ul>
</details>
<br />

Updates `launchdarkly-server-sdk` from 9.14.1 to 9.15.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/launchdarkly/python-server-sdk/releases">launchdarkly-server-sdk's
releases</a>.</em></p>
<blockquote>
<h2>v9.15.0</h2>
<h2><a
href="https://github.com/launchdarkly/python-server-sdk/compare/9.14.1...9.15.0">9.15.0</a>
(2026-02-10)</h2>
<h3>Features</h3>
<ul>
<li>Drop support for python 3.9 (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/393">#393</a>)
(<a
href="5b761bd306">5b761bd</a>)</li>
<li>Update ChangeSet to always require a Selector (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/405">#405</a>)
(<a
href="5dc4f81688">5dc4f81</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li>Add context manager for clearer, safer locks (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/396">#396</a>)
(<a
href="beca0fa498">beca0fa</a>)</li>
<li>Address potential race condition in FeatureStore update_availability
(<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/391">#391</a>)
(<a
href="31cf4875c3">31cf487</a>)</li>
<li>Allow modifying fdv2 data source options independent of main config
(<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/403">#403</a>)
(<a
href="d78079e7f3">d78079e</a>)</li>
<li>Mark copy_with_new_sdk_key method as deprecated (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/353">#353</a>)
(<a
href="e471ccc3d5">e471ccc</a>)</li>
<li>Prevent immediate polling on recoverable error (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/399">#399</a>)
(<a
href="da565a2dce">da565a2</a>)</li>
<li>Redis store is considered initialized when <code>$inited</code> key
is written (<a
href="e99a27d48f">e99a27d</a>)</li>
<li>Stop FeatureStoreClientWrapper poller on close (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/397">#397</a>)
(<a
href="468afdfef3">468afdf</a>)</li>
<li>Update DataSystemConfig to accept list of synchronizers (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/404">#404</a>)
(<a
href="c73ad14090">c73ad14</a>)</li>
<li>Update reason documentation with inExperiment value (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/401">#401</a>)
(<a
href="cbfc3dd887">cbfc3dd</a>)</li>
<li>Update Redis to write missing <code>$inited</code> key (<a
href="e99a27d48f">e99a27d</a>)</li>
</ul>
<hr />
<p>This PR was generated with <a
href="https://github.com/googleapis/release-please">Release Please</a>.
See <a
href="https://github.com/googleapis/release-please#release-please">documentation</a>.</p>
<!-- raw HTML omitted -->
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/launchdarkly/python-server-sdk/blob/main/CHANGELOG.md">launchdarkly-server-sdk's
changelog</a>.</em></p>
<blockquote>
<h2><a
href="https://github.com/launchdarkly/python-server-sdk/compare/9.14.1...9.15.0">9.15.0</a>
(2026-02-10)</h2>
<h3>⚠ BREAKING CHANGES</h3>
<p><strong>Note:</strong> The following breaking changes apply only to
FDv2 (Flag Delivery v2) early access features, which are not subject to
semantic versioning and may change without a major version bump.</p>
<ul>
<li>Update ChangeSet to always require a Selector (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/405">#405</a>)
(<a
href="5dc4f81688">5dc4f81</a>)
<ul>
<li>The <code>ChangeSetBuilder.finish()</code> method now requires a
<code>Selector</code> parameter.</li>
</ul>
</li>
<li>Update DataSystemConfig to accept list of synchronizers (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/404">#404</a>)
(<a
href="c73ad14090">c73ad14</a>)
<ul>
<li>The <code>DataSystemConfig.synchronizers</code> field now accepts a
list of synchronizers, and the
<code>ConfigBuilder.synchronizers()</code> method accepts variadic
arguments.</li>
</ul>
</li>
</ul>
<h3>Features</h3>
<ul>
<li>Drop support for python 3.9 (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/393">#393</a>)
(<a
href="5b761bd306">5b761bd</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li>Add context manager for clearer, safer locks (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/396">#396</a>)
(<a
href="beca0fa498">beca0fa</a>)</li>
<li>Address potential race condition in FeatureStore update_availability
(<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/391">#391</a>)
(<a
href="31cf4875c3">31cf487</a>)</li>
<li>Allow modifying fdv2 data source options independent of main config
(<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/403">#403</a>)
(<a
href="d78079e7f3">d78079e</a>)</li>
<li>Mark copy_with_new_sdk_key method as deprecated (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/353">#353</a>)
(<a
href="e471ccc3d5">e471ccc</a>)</li>
<li>Prevent immediate polling on recoverable error (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/399">#399</a>)
(<a
href="da565a2dce">da565a2</a>)</li>
<li>Redis store is considered initialized when <code>$inited</code> key
is written (<a
href="e99a27d48f">e99a27d</a>)</li>
<li>Stop FeatureStoreClientWrapper poller on close (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/397">#397</a>)
(<a
href="468afdfef3">468afdf</a>)</li>
<li>Update reason documentation with inExperiment value (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/401">#401</a>)
(<a
href="cbfc3dd887">cbfc3dd</a>)</li>
<li>Update Redis to write missing <code>$inited</code> key (<a
href="e99a27d48f">e99a27d</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="e542f737a6"><code>e542f73</code></a>
chore(main): release 9.15.0 (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/394">#394</a>)</li>
<li><a
href="e471ccc3d5"><code>e471ccc</code></a>
fix: Mark copy_with_new_sdk_key method as deprecated (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/353">#353</a>)</li>
<li><a
href="5dc4f81688"><code>5dc4f81</code></a>
feat: Update ChangeSet to always require a Selector (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/405">#405</a>)</li>
<li><a
href="f20fffeb1e"><code>f20fffe</code></a>
chore: Remove dead code, clarify names, other cleanup (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/398">#398</a>)</li>
<li><a
href="c73ad14090"><code>c73ad14</code></a>
fix: Update DataSystemConfig to accept list of synchronizers (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/404">#404</a>)</li>
<li><a
href="d78079e7f3"><code>d78079e</code></a>
fix: Allow modifying fdv2 data source options independent of main config
(<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/403">#403</a>)</li>
<li><a
href="e99a27d48f"><code>e99a27d</code></a>
chore: Support persistent data store verification in contract tests (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/402">#402</a>)</li>
<li><a
href="cbfc3dd887"><code>cbfc3dd</code></a>
fix: Update reason documentation with inExperiment value (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/401">#401</a>)</li>
<li><a
href="5a1adbb2de"><code>5a1adbb</code></a>
chore: Update sdk_metadata features (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/400">#400</a>)</li>
<li><a
href="da565a2dce"><code>da565a2</code></a>
fix: Prevent immediate polling on recoverable error (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/399">#399</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/launchdarkly/python-server-sdk/compare/9.14.1...9.15.0">compare
view</a></li>
</ul>
</details>
<br />

Updates `supabase` from 2.27.2 to 2.28.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/supabase/supabase-py/releases">supabase's
releases</a>.</em></p>
<blockquote>
<h2>v2.28.0</h2>
<h2><a
href="https://github.com/supabase/supabase-py/compare/v2.27.3...v2.28.0">2.28.0</a>
(2026-02-10)</h2>
<h3>Features</h3>
<ul>
<li><strong>storage:</strong> add list_v2 method to file_api client (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1377">#1377</a>)
(<a
href="259f4ad42d">259f4ad</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li><strong>auth:</strong> add missing is_sso_user, deleted_at,
banned_until to User model (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1375">#1375</a>)
(<a
href="7f84a62996">7f84a62</a>)</li>
<li><strong>realtime:</strong> ensure remove_channel removes channel
from channels dict (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1373">#1373</a>)
(<a
href="0923314039">0923314</a>)</li>
<li><strong>realtime:</strong> use pop with default in _handle_message
to prevent KeyError (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1388">#1388</a>)
(<a
href="baea26f7ce">baea26f</a>)</li>
<li><strong>storage3:</strong> replace print() with warnings.warn() for
trailing slash notice (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1380">#1380</a>)
(<a
href="50b099fa06">50b099f</a>)</li>
</ul>
<h2>v2.27.3</h2>
<h2><a
href="https://github.com/supabase/supabase-py/compare/v2.27.2...v2.27.3">2.27.3</a>
(2026-02-03)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>deprecate python 3.9 in all packages (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1365">#1365</a>)
(<a
href="cc72ed75d4">cc72ed7</a>)</li>
<li>ensure storage_url has trailing slash to prevent warning (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1367">#1367</a>)
(<a
href="4267ff1345">4267ff1</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/supabase/supabase-py/blob/main/CHANGELOG.md">supabase's
changelog</a>.</em></p>
<blockquote>
<h2><a
href="https://github.com/supabase/supabase-py/compare/v2.27.3...v2.28.0">2.28.0</a>
(2026-02-10)</h2>
<h3>Features</h3>
<ul>
<li><strong>storage:</strong> add list_v2 method to file_api client (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1377">#1377</a>)
(<a
href="259f4ad42d">259f4ad</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li><strong>auth:</strong> add missing is_sso_user, deleted_at,
banned_until to User model (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1375">#1375</a>)
(<a
href="7f84a62996">7f84a62</a>)</li>
<li><strong>realtime:</strong> ensure remove_channel removes channel
from channels dict (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1373">#1373</a>)
(<a
href="0923314039">0923314</a>)</li>
<li><strong>realtime:</strong> use pop with default in _handle_message
to prevent KeyError (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1388">#1388</a>)
(<a
href="baea26f7ce">baea26f</a>)</li>
<li><strong>storage3:</strong> replace print() with warnings.warn() for
trailing slash notice (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1380">#1380</a>)
(<a
href="50b099fa06">50b099f</a>)</li>
</ul>
<h2><a
href="https://github.com/supabase/supabase-py/compare/v2.27.2...v2.27.3">2.27.3</a>
(2026-02-03)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>deprecate python 3.9 in all packages (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1365">#1365</a>)
(<a
href="cc72ed75d4">cc72ed7</a>)</li>
<li>ensure storage_url has trailing slash to prevent warning (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1367">#1367</a>)
(<a
href="4267ff1345">4267ff1</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="59e338400b"><code>59e3384</code></a>
chore(main): release 2.28.0 (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1378">#1378</a>)</li>
<li><a
href="baea26f7ce"><code>baea26f</code></a>
fix(realtime): use pop with default in _handle_message to prevent
KeyError (#...</li>
<li><a
href="259f4ad42d"><code>259f4ad</code></a>
feat(storage): add list_v2 method to file_api client (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1377">#1377</a>)</li>
<li><a
href="50b099fa06"><code>50b099f</code></a>
fix(storage3): replace print() with warnings.warn() for trailing slash
notice...</li>
<li><a
href="0923314039"><code>0923314</code></a>
fix(realtime): ensure remove_channel removes channel from channels dict
(<a
href="https://redirect.github.com/supabase/supabase-py/issues/1373">#1373</a>)</li>
<li><a
href="7f84a62996"><code>7f84a62</code></a>
fix(auth): add missing is_sso_user, deleted_at, banned_until to User
model (#...</li>
<li><a
href="57dd6e2195"><code>57dd6e2</code></a>
chore(deps): bump the uv group across 1 directory with 3 updates (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1369">#1369</a>)</li>
<li><a
href="c357def670"><code>c357def</code></a>
chore(main): release 2.27.3 (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1368">#1368</a>)</li>
<li><a
href="4267ff1345"><code>4267ff1</code></a>
fix: ensure storage_url has trailing slash to prevent warning (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1367">#1367</a>)</li>
<li><a
href="cc72ed75d4"><code>cc72ed7</code></a>
fix: deprecate python 3.9 in all packages (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1365">#1365</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/supabase/supabase-py/compare/v2.27.2...v2.28.0">compare
view</a></li>
</ul>
</details>
<br />


Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's major version (unless you unignore this specific
dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's minor version (unless you unignore this specific
dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR
and stop Dependabot creating any more for the specific dependency
(unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore
conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will
remove the ignore condition of the specified dependency and ignore
conditions


</details>

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Dependency update bumps 4 packages in the production-dependencies group,
including a **critical security patch for `cryptography`**
(CVE-2026-26007) that prevents malicious public key attacks on binary
elliptic curves. The update also includes bug fixes for `fastapi`,
`launchdarkly-server-sdk`, and `supabase`.

- **cryptography** 46.0.4 → 46.0.5: patches CVE-2026-26007, deprecates
SECT* binary curves
- **fastapi** 0.128.0 → 0.128.7: bug fixes, improved error handling,
relaxed Starlette constraint
- **launchdarkly-server-sdk** 9.14.1 → 9.15.0: drops Python 3.9 support
(requires >=3.10), fixes race conditions
- **supabase** 2.27.2/2.27.3 → 2.28.0: realtime fixes, new User model
fields

The lock files correctly resolve all dependencies. Python 3.10+
requirement is already enforced in both packages. However, backend's
`pyproject.toml` still specifies `launchdarkly-server-sdk = "^9.14.1"`
while the lock file uses 9.15.0 (pulled from autogpt_libs dependency),
creating a minor version constraint inconsistency.
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- This PR is safe to merge with one minor style suggestion
- Automated dependency update with critical security patch for
cryptography. All updates are backwards-compatible within semver
constraints. Lock files correctly resolve all dependencies. Python 3.10+
is already enforced. Only minor issue is version constraint
inconsistency in backend's pyproject.toml for launchdarkly-server-sdk,
which doesn't affect functionality but should be aligned for clarity.
- autogpt_platform/backend/pyproject.toml needs launchdarkly-server-sdk
version constraint updated to ^9.15.0
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Otto <otto@agpt.co>
2026-02-13 09:10:11 +00:00
Ubbe
e8c50b96d1 fix(frontend): improve CoPilot chat table styling (#12094)
## Summary
- Remove left and right borders from tables rendered in CoPilot chat
- Increase cell padding (py-3 → py-3.5) for better spacing between text
and lines
- Applies to both Streamdown (main chat) and MarkdownRenderer (tool
outputs)

Design feedback from Olivia to make tables "breathe" more.

## Test plan
- [ ] Open CoPilot chat and trigger a response containing a table
- [ ] Verify tables no longer have left/right borders
- [ ] Verify increased spacing between rows
- [ ] Check both light and dark modes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Improved CoPilot chat table styling by removing left and right borders
and increasing vertical padding from `py-3` to `py-3.5`. Changes apply
to both:
- Streamdown-rendered tables (via CSS selector in `globals.css`)  
- MarkdownRenderer tables (via Tailwind classes)

The changes make tables "breathe" more per design feedback from Olivia.

**Issue Found:**
- The CSS padding value in `globals.css:192` is `0.625rem` (`py-2.5`)
but should be `0.875rem` (`py-3.5`) to match the PR description and the
MarkdownRenderer implementation.
</details>


<details><summary><h3>Confidence Score: 2/5</h3></summary>

- This PR has a logical error that will cause inconsistent table styling
between Streamdown and MarkdownRenderer tables
- The implementation has an inconsistency where the CSS file uses
`py-2.5` padding while the PR description and MarkdownRenderer use
`py-3.5`. This will result in different table padding between the two
rendering systems, contradicting the goal of consistent styling
improvements.
- Pay close attention to `autogpt_platform/frontend/src/app/globals.css`
- the padding value needs to be corrected to match the intended design
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2026-02-13 09:38:59 +08:00
Ubbe
30e854569a feat(frontend): add exact timestamp tooltip on run timestamps (#12087)
Resolves OPEN-2693: Make exact timestamp of runs accessible through UI.

The NewAgentLibraryView shows relative timestamps ("2 days ago") for
runs and schedules, but unlike the OldAgentLibraryView it didn't show
the exact timestamp on hover. This PR adds a native `title` tooltip so
users can see the full date/time by hovering.

### Changes 🏗️

- Added `descriptionTitle` prop to `SidebarItemCard` that renders as a
`title` attribute on the description text
- `TaskListItem` now passes the exact `run.started_at` timestamp via
`descriptionTitle`
- `ScheduleListItem` now passes the exact `schedule.next_run_time`
timestamp via `descriptionTitle`

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [ ] Open an agent in the library view
- [ ] Hover over a run's relative timestamp (e.g. "2 days ago") and
confirm the full date/time tooltip appears
- [ ] Hover over a schedule's relative timestamp and confirm the full
date/time tooltip appears

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Added native tooltip functionality to show exact timestamps in the
library view. The implementation adds a `descriptionTitle` prop to
`SidebarItemCard` that renders as a `title` attribute on the description
text. This allows users to hover over relative timestamps (e.g., "2 days
ago") to see the full date/time.

**Changes:**
- Added optional `descriptionTitle` prop to `SidebarItemCard` component
(SidebarItemCard.tsx:10)
- `TaskListItem` passes `run.started_at` as the tooltip value
(TaskListItem.tsx:84-86)
- `ScheduleListItem` passes `schedule.next_run_time` as the tooltip
value (ScheduleListItem.tsx:32)
- Unrelated fix included: Sentry configuration updated to suppress
cross-origin stylesheet errors (instrumentation-client.ts:25-28)

**Note:** The PR includes two separate commits - the main timestamp
tooltip feature and a Sentry error suppression fix. The PR description
only documents the timestamp feature.
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge with minimal risk
- The changes are straightforward and limited in scope - adding an
optional prop that forwards a native HTML attribute for tooltip
functionality. The Text component already supports forwarding arbitrary
HTML attributes through its spread operator (...rest), ensuring the
`title` attribute works correctly. Both the timestamp tooltip feature
and the Sentry configuration fix are low-risk improvements with no
breaking changes.
- No files require special attention
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User
    participant TaskListItem
    participant ScheduleListItem
    participant SidebarItemCard
    participant Text
    participant Browser

    User->>TaskListItem: Hover over run timestamp
    TaskListItem->>SidebarItemCard: Pass descriptionTitle (run.started_at)
    SidebarItemCard->>Text: Render with title attribute
    Text->>Browser: Forward title attribute to DOM
    Browser->>User: Display native tooltip with exact timestamp

    User->>ScheduleListItem: Hover over schedule timestamp
    ScheduleListItem->>SidebarItemCard: Pass descriptionTitle (schedule.next_run_time)
    SidebarItemCard->>Text: Render with title attribute
    Text->>Browser: Forward title attribute to DOM
    Browser->>User: Display native tooltip with exact timestamp
```
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 09:38:16 +08:00
Ubbe
301d7cbada fix(frontend): suppress cross-origin stylesheet security error (#12086)
## Summary
- Adds `ignoreErrors` to the Sentry client configuration
(`instrumentation-client.ts`) to filter out `SecurityError:
CSSStyleSheet.cssRules getter: Not allowed to access cross-origin
stylesheet` errors
- These errors are caused by Sentry Replay (rrweb) attempting to
serialize DOM snapshots that include cross-origin stylesheets (from
browser extensions or CDN-loaded CSS)
- This was reported via Sentry on production, occurring on any page when
logged in

## Changes
- **`frontend/instrumentation-client.ts`**: Added `ignoreErrors: [/Not
allowed to access cross-origin stylesheet/]` to `Sentry.init()` config

## Test plan
- [ ] Verify the error no longer appears in Sentry after deployment
- [ ] Verify Sentry Replay still works correctly for other errors
- [ ] Verify no regressions in error tracking (other errors should still
be captured)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Adds error filtering to Sentry client configuration to suppress
cross-origin stylesheet security errors that occur when Sentry Replay
(rrweb) attempts to serialize DOM snapshots containing stylesheets from
browser extensions or CDN-loaded CSS. This prevents noise in Sentry
error logs without affecting the capture of legitimate errors.
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge with minimal risk
- The change adds a simple error filter to suppress benign cross-origin
stylesheet errors that are caused by Sentry Replay itself. The regex
pattern is specific and only affects client-side error reporting, with
no impact on application functionality or legitimate error capture
- No files require special attention
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 09:37:54 +08:00
Ubbe
d95aef7665 fix(copilot): stream timeout, long-running tool polling, and CreateAgent UI refresh (#12070)
Agent generation completes on the backend but the UI does not
update/refresh to show the result.

### Changes 🏗️

![Uploading Screenshot 2026-02-13 at 00.44.54.png…]()


- **Stream start timeout (12s):** If the backend doesn't begin streaming
within 12 seconds of submitting a message, the stream is aborted and a
destructive toast is shown to the user.
- **Long-running tool polling:** Added `useLongRunningToolPolling` hook
that polls the session endpoint every 1.5s while a tool output is in an
operating state (`operation_started` / `operation_pending` /
`operation_in_progress`). When the backend completes, messages are
refreshed so the UI reflects the final result.
- **CreateAgent UI improvements:** Replaced the orbit loader / progress
bar with a mini-game, added expanded accordion for saved agents, and
improved the saved-agent card with image, icons, and links that open in
new tabs.
- **Backend tweaks:** Added `image_url` to `CreateAgentToolOutput`,
minor model/service updates for the dummy agent generator.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Send a message and verify the stream starts within 12s or a toast
appears
- [x] Trigger agent creation and verify the UI updates when the backend
completes
- [x] Verify the saved-agent card renders correctly with image, links,
and icons

---------

Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 20:06:40 +00:00
Nicholas Tindle
cb166dd6fb feat(blocks): Store sandbox files to workspace (#12073)
Store files created by sandbox blocks (Claude Code, Code Executor) to
the user's workspace for persistence across runs.

### Changes 🏗️

- **New `sandbox_files.py` utility** (`backend/util/sandbox_files.py`)
  - Shared module for extracting files from E2B sandboxes
- Stores files to workspace via `store_media_file()` (includes virus
scanning, size limits)
  - Returns `SandboxFileOutput` with path, content, and `workspace_ref`

- **Claude Code block** (`backend/blocks/claude_code.py`)
  - Added `workspace_ref` field to `FileOutput` schema
  - Replaced inline `_extract_files()` with shared utility
  - Files from working directory now stored to workspace automatically

- **Code Executor block** (`backend/blocks/code_executor.py`)
  - Added `files` output field to `ExecuteCodeBlock.Output`
  - Creates `/output` directory in sandbox before execution
  - Extracts all files (text + binary) from `/output` after execution
- Updated `execute_code()` to support file extraction with
`extract_files` param

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Create agent with Claude Code block, have it create a file, verify
`workspace_ref` in output
- [x] Create agent with Code Executor block, write file to `/output`,
verify `workspace_ref` in output
  - [x] Verify files persist in workspace after sandbox disposal
- [x] Verify binary files (images, etc.) work correctly in Code Executor
- [x] Verify existing graphs using `content` field still work (backward
compat)

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

No configuration changes required - this is purely additive backend
code.

---

**Related:** Closes SECRT-1931

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds automatic extraction and workspace storage of sandbox-written
files (including binaries for code execution), which can affect output
payload size, performance, and file-handling edge cases.
> 
> **Overview**
> **Sandbox blocks now persist generated files to workspace.** A new
shared utility (`backend/util/sandbox_files.py`) extracts files from an
E2B sandbox (scoped by a start timestamp) and stores them via
`store_media_file`, returning `SandboxFileOutput` with `workspace_ref`.
> 
> `ClaudeCodeBlock` replaces its inline file-scraping logic with this
utility and updates the `files` output schema to include
`workspace_ref`.
> 
> `ExecuteCodeBlock` adds a `files` output and extends the executor
mixin to optionally extract/store files (text + binary) when an
`execution_context` is provided; related mocks/tests and docs are
updated accordingly.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
343854c0cf. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 15:56:59 +00:00
Swifty
3d31f62bf1 Revert "added feature request tooling"
This reverts commit b8b6c9de23.
2026-02-12 16:39:24 +01:00
Swifty
b8b6c9de23 added feature request tooling 2026-02-12 16:38:17 +01:00
Abhimanyu Yadav
4f6055f494 refactor(frontend): remove default expiration date from API key credentials form (#12092)
### Changes 🏗️

Removed the default expiration date for API keys in the credentials
modal. Previously, API keys were set to expire the next day by default,
but now the expiration date field starts empty, allowing users to
explicitly choose whether they want to set an expiration date.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Open the API key credentials modal and verify the expiration date
field is empty by default
  - [x] Test creating an API key with and without an expiration date
  - [x] Verify both scenarios work correctly

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Removed the default expiration date for API key credentials in the
credentials modal. Previously, API keys were automatically set to expire
the next day at midnight. Now the expiration date field starts empty,
allowing users to explicitly choose whether to set an expiration.

- Removed `getDefaultExpirationDate()` helper function that calculated
tomorrow's date
- Changed default `expiresAt` value from calculated date to empty string
- Backend already supports optional expiration (`expires_at?: number`),
so no backend changes needed
- Form submission correctly handles empty expiration by passing
`undefined` to the API
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge with minimal risk
- The changes are straightforward and well-contained. The refactor
removes a helper function and changes a default value. The backend API
already supports optional expiration dates, and the form submission
logic correctly handles empty values by passing undefined. The change
improves UX by not forcing a default expiration date on users.
- No files require special attention
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-12 12:57:06 +00:00
Otto
695a185fa1 fix(frontend): remove fixed min-height from CoPilot message container (#12091)
## Summary

Removes the `min-h-screen` class from `ConversationContent` in
ChatMessagesContainer, which was causing fixed height layout issues in
the CoPilot chat interface.

## Changes

- Removed `min-h-screen` from ConversationContent className

## Linear

Fixes [SECRT-1944](https://linear.app/autogpt/issue/SECRT-1944)

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Removes the `min-h-screen` (100vh) class from `ConversationContent` that
was causing the chat message container to enforce a minimum viewport
height. The parent container already handles height constraints with
`h-full min-h-0` and flexbox layout, so the fixed minimum height was
creating layout conflicts. The component now properly grows within its
flex container using `flex-1`.
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge with minimal risk
- The change removes a single problematic CSS class that was causing
fixed height layout issues. The parent container already handles height
constraints properly with flexbox, and removing min-h-screen allows the
component to size correctly within its flex parent. This is a targeted,
low-risk bug fix with no logic changes.
- No files require special attention
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-12 12:46:29 +00:00
Reinier van der Leer
113e87a23c refactor(backend): Reduce circular imports (#12068)
I'm getting circular import issues because there is a lot of
cross-importing between `backend.data`, `backend.blocks`, and other
modules. This change reduces block-related cross-imports and thus risk
of breaking circular imports.

### Changes 🏗️

- Strip down `backend.data.block`
- Move `Block` base class and related class/enum defs to
`backend.blocks._base`
  - Move `is_block_auth_configured` to `backend.blocks._utils`
- Move `get_blocks()`, `get_io_block_ids()` etc. to `backend.blocks`
(`__init__.py`)
  - Update imports everywhere
- Remove unused and poorly typed `Block.create()`
  - Change usages from `block_cls.create()` to `block_cls()`
- Improve typing of `load_all_blocks` and `get_blocks`
- Move cross-import of `backend.api.features.library.model` from
`backend/data/__init__.py` to `backend/data/integrations.py`
- Remove deprecated attribute `NodeModel.webhook`
  - Re-generate OpenAPI spec and fix frontend usage
- Eliminate module-level `backend.blocks` import from `blocks/agent.py`
- Eliminate module-level `backend.data.execution` and
`backend.executor.manager` imports from `blocks/helpers/review.py`
- Replace `BlockInput` with `GraphInput` for graph inputs

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - CI static type-checking + tests should be sufficient for this
2026-02-12 12:07:49 +00:00
Abhimanyu Yadav
d09f1532a4 feat(frontend): replace legacy builder with new flow editor
(#12081)

### Changes 🏗️

This PR completes the migration from the legacy builder to the new Flow
editor by removing all legacy code and feature flags.

**Removed:**
- Old builder view toggle functionality (`BuilderViewTabs.tsx`)
- Legacy debug panel (`RightSidebar.tsx`)
- Feature flags: `NEW_FLOW_EDITOR` and `BUILDER_VIEW_SWITCH`
- `useBuilderView` hook and related view-switching logic

**Updated:**
- Simplified `build/page.tsx` to always render the new Flow editor
- Added CSS styling (`flow.css`) to properly render Phosphor icons in
React Flow handles

**Tests:**
- Skipped e2e test suite in `build.spec.ts` (legacy builder tests)
- Follow-up PR (#12082) will add new e2e tests for the Flow editor

### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
    - [x] Create a new flow and verify it loads correctly
    - [x] Add nodes and connections to verify basic functionality works
    - [x] Verify that node handles render correctly with the new CSS
- [x] Check that the UI is clean without the old debug panel or view
toggles

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
2026-02-12 11:16:01 +00:00
Zamil Majdy
a78145505b fix(copilot): merge split assistant messages to prevent Anthropic API errors (#12062)
## Summary
- When the copilot model responds with both text content AND a
long-running tool call (e.g., `create_agent`), the streaming code
created two separate consecutive assistant messages — one with text, one
with `tool_calls`. This caused Anthropic's API to reject with
`"unexpected tool_use_id found in tool_result blocks"` because the
`tool_result` couldn't find a matching `tool_use` in the immediately
preceding assistant message.
- Added a defensive merge of consecutive assistant messages in
`to_openai_messages()` (fixes existing corrupt sessions too)
- Fixed `_yield_tool_call` to add tool_calls to the existing
current-turn assistant message instead of creating a new one
- Changed `accumulated_tool_calls` assignment to use `extend` to prevent
overwriting tool_calls added by long-running tool flow

## Test plan
- [x] All 23 chat feature tests pass (`backend/api/features/chat/`)
- [x] All 44 prompt utility tests pass (`backend/util/prompt_test.py`)
- [x] All pre-commit hooks pass (ruff, isort, black, pyright)
- [ ] Manual test: create an agent via copilot, then ask a follow-up
question — should no longer get 400 error

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Fixes a critical bug where long-running tool calls (like `create_agent`)
caused Anthropic API 400 errors due to split assistant messages. The fix
ensures tool calls are added to the existing assistant message instead
of creating new ones, and adds a defensive merge function to repair any
existing corrupt sessions.

**Key changes:**
- Added `_merge_consecutive_assistant_messages()` to defensively merge
split assistant messages in `to_openai_messages()`
- Modified `_yield_tool_call()` to append tool calls to the current-turn
assistant message instead of creating a new one
- Changed `accumulated_tool_calls` from assignment to `extend` to
preserve tool calls already added by long-running tool flow

**Impact:** Resolves the issue where users received 400 errors after
creating agents via copilot and asking follow-up questions.
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- Safe to merge with minor verification recommended
- The changes are well-targeted and solve a real API compatibility
issue. The logic is sound: searching backwards for the current assistant
message is correct, and using `extend` instead of assignment prevents
overwriting. The defensive merge in `to_openai_messages()` also fixes
existing corrupt sessions. All existing tests pass according to the PR
description.
- No files require special attention - changes are localized and
defensive
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User
    participant StreamAPI as stream_chat_completion
    participant Chunks as _stream_chat_chunks
    participant ToolCall as _yield_tool_call
    participant Session as ChatSession
    
    User->>StreamAPI: Send message
    StreamAPI->>Chunks: Stream chat chunks
    
    alt Text + Long-running tool call
        Chunks->>StreamAPI: Text delta (content)
        StreamAPI->>Session: Append assistant message with content
        Chunks->>ToolCall: Tool call detected
        
        Note over ToolCall: OLD: Created new assistant message<br/>NEW: Appends to existing assistant
        
        ToolCall->>Session: Search backwards for current assistant
        ToolCall->>Session: Append tool_call to existing message
        ToolCall->>Session: Add pending tool result
    end
    
    StreamAPI->>StreamAPI: Merge accumulated_tool_calls
    Note over StreamAPI: Use extend (not assign)<br/>to preserve existing tool_calls
    
    StreamAPI->>Session: to_openai_messages()
    Session->>Session: _merge_consecutive_assistant_messages()
    Note over Session: Defensive: Merges any split<br/>assistant messages
    Session-->>StreamAPI: Merged messages
    
    StreamAPI->>User: Return response
```
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-12 01:52:17 +00:00
Otto
36aeb0b2b3 docs(blocks): clarify HumanInTheLoop output descriptions for agent builder (#12069)
## Problem

The agent builder (LLM) misinterprets the HumanInTheLoop block outputs.
It thinks `approved_data` and `rejected_data` will yield status strings
like "APPROVED" or "REJECTED" instead of understanding that the actual
input data passes through.

This leads to unnecessary complexity - the agent builder adds comparison
blocks to check for status strings that don't exist.

## Solution

Enriched the block docstring and all input/output field descriptions to
make it explicit that:
1. The output is the actual data itself, not a status string
2. The routing is determined by which output pin fires
3. How to use the block correctly (connect downstream blocks to
appropriate output pins)

## Changes

- Updated block docstring with clear "How it works" and "Example usage"
sections
- Enhanced `data` input description to explain data flow
- Enhanced `name` input description for reviewer context
- Enhanced `approved_data` output to explicitly state it's NOT a status
string
- Enhanced `rejected_data` output to explicitly state it's NOT a status
string
- Enhanced `review_message` output for clarity

## Testing

Documentation-only change to schema descriptions. No functional changes.

Fixes SECRT-1930

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Enhanced documentation for the `HumanInTheLoopBlock` to clarify how
output pins work. The key improvement explicitly states that output pins
(`approved_data` and `rejected_data`) yield the actual input data, not
status strings like "APPROVED" or "REJECTED". This prevents the agent
builder (LLM) from misinterpreting the block's behavior and adding
unnecessary comparison blocks.

**Key changes:**
- Added "How it works" and "Example usage" sections to the block
docstring
- Clarified that routing is determined by which output pin fires, not by
comparing output values
- Enhanced all input/output field descriptions with explicit data flow
explanations
- Emphasized that downstream blocks should be connected to the
appropriate output pin based on desired workflow path

This is a documentation-only change with no functional modifications to
the code logic.
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge with no risk
- Documentation-only change that accurately reflects the existing code
behavior. No functional changes, no runtime impact, and the enhanced
descriptions correctly explain how the block outputs work based on
verification of the implementation code.
- No files require special attention
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-02-11 15:43:58 +00:00
Ubbe
2a189c44c4 fix(frontend): API stream issues leaking into prompt (#12063)
## Changes 🏗️

<img width="800" height="621" alt="Screenshot 2026-02-11 at 19 32 39"
src="https://github.com/user-attachments/assets/e97be1a7-972e-4ae0-8dfa-6ade63cf287b"
/>

When the BE API has an error, prevent it from leaking into the stream
and instead handle it gracefully via toast.

## Checklist 📋

### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run the app locally and trust the changes

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

This PR fixes an issue where backend API stream errors were leaking into
the chat prompt instead of being handled gracefully. The fix involves
both backend and frontend changes to ensure error events conform to the
AI SDK's strict schema.

**Key Changes:**
- **Backend (`response_model.py`)**: Added custom `to_sse()` method for
`StreamError` that only emits `type` and `errorText` fields, stripping
extra fields like `code` and `details` that cause AI SDK validation
failures
- **Backend (`prompt.py`)**: Added validation step after context
compression to remove orphaned tool responses without matching tool
calls, preventing "unexpected tool_use_id" API errors
- **Frontend (`route.ts`)**: Implemented SSE stream normalization with
`normalizeSSEStream()` and `normalizeSSEEvent()` functions to strip
non-conforming fields from error events before they reach the AI SDK
- **Frontend (`ChatMessagesContainer.tsx`)**: Added toast notifications
for errors and improved error display UI with deduplication logic

The changes ensure a clean separation between internal error metadata
(useful for logging/debugging) and the strict schema required by the AI
SDK on the frontend.
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- This PR is safe to merge with low risk
- The changes are well-structured and address a specific bug with proper
error handling. The dual-layer approach (backend filtering in `to_sse()`
+ frontend normalization) provides defense-in-depth. However, the lack
of automated tests for the new error normalization logic and the
potential for edge cases in SSE parsing prevent a perfect score.
- Pay close attention to
`autogpt_platform/frontend/src/app/api/chat/sessions/[sessionId]/stream/route.ts`
- the SSE normalization logic should be tested with various error
scenarios
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User
    participant Frontend as ChatMessagesContainer
    participant Proxy as /api/chat/.../stream
    participant Backend as Backend API
    participant AISDK as AI SDK

    User->>Frontend: Send message
    Frontend->>Proxy: POST with message
    Proxy->>Backend: Forward request with auth
    Backend->>Backend: Process message
    
    alt Success Path
        Backend->>Proxy: SSE stream (text-delta, etc.)
        Proxy->>Proxy: normalizeSSEStream (pass through)
        Proxy->>AISDK: Forward SSE events
        AISDK->>Frontend: Update messages
        Frontend->>User: Display response
    else Error Path
        Backend->>Backend: StreamError.to_sse()
        Note over Backend: Only emit {type, errorText}
        Backend->>Proxy: SSE error event
        Proxy->>Proxy: normalizeSSEEvent()
        Note over Proxy: Strip extra fields (code, details)
        Proxy->>AISDK: {type: "error", errorText: "..."}
        AISDK->>Frontend: error state updated
        Frontend->>Frontend: Toast notification (deduplicated)
        Frontend->>User: Show error UI + toast
    end
```
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Otto-AGPT <otto@agpt.co>
2026-02-11 22:46:37 +08:00
Abhimanyu Yadav
508759610f fix(frontend): add min-width-0 to ContentCard to prevent overflow (#12060)
### Changes 🏗️

Added `min-w-0` class to the ContentCard component in the ToolAccordion
to prevent content overflow issues. This CSS fix ensures that the card
properly respects its container width constraints and allows text
truncation to work correctly when content is too wide.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified that tool content displays correctly in the accordion
- [x] Confirmed that long content properly truncates instead of
overflowing
  - [x] Tested with various screen sizes to ensure responsive behavior

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Added `min-w-0` class to `ContentCard` component to fix text truncation
overflow in grid layouts. This is a standard CSS fix that allows grid
items to shrink below their content size, enabling `truncate` classes on
child elements (`ContentCardTitle`, `ContentCardSubtitle`) to work
correctly. The fix follows the same pattern already used in
`ContentCardHeader` (line 54) and `ToolAccordion` (line 54).
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- Safe to merge with no risk
- Single-line CSS fix that addresses a well-known flexbox/grid layout
issue. The change follows existing patterns in the codebase and is
thoroughly tested. No logic changes, no breaking changes, no side
effects.
- No files require special attention
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-11 21:09:21 +08:00
Otto
062fe1aa70 fix(security): enforce disabled flag on blocks in graph validation (#12059)
## Summary
Blocks marked `disabled=True` (like BlockInstallationBlock) were not
being checked during graph validation, allowing them to be used via
direct API calls despite being hidden from the UI.

This adds a security check in `_validate_graph_get_errors()` to reject
any graph containing disabled blocks.

## Security Advisory
GHSA-4crw-9p35-9x54

## Linear
SECRT-1927

## Changes
- Added `block.disabled` check in graph validation (6 lines)

## Testing
- Graphs with disabled blocks → rejected with clear error message
- Graphs with valid blocks → unchanged behavior

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Adds critical security validation to prevent execution of disabled
blocks (like `BlockInstallationBlock`) via direct API calls. The fix
validates that `block.disabled` is `False` during graph validation in
`_validate_graph_get_errors()` on line 747-750, ensuring disabled blocks
are rejected before graph creation or execution. This closes a
vulnerability where blocks marked disabled in the UI could still be used
through API endpoints.
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge and addresses a critical security
vulnerability
- The fix is minimal (6 lines), correctly placed in the validation flow,
includes clear security context (GHSA reference), and follows existing
validation patterns. The check is positioned after block existence
validation and before input validation, ensuring disabled blocks are
caught early in both graph creation and execution paths.
- No files require special attention
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

---------

Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 03:28:19 +00:00
dependabot[bot]
2cd0d4fe0f chore(deps): bump actions/checkout from 4 to 6 (#12034)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to
6.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/releases">actions/checkout's
releases</a>.</em></p>
<blockquote>
<h2>v6.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Update README to include Node.js 24 support details and requirements
by <a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2248">actions/checkout#2248</a></li>
<li>Persist creds to a separate file by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2286">actions/checkout#2286</a></li>
<li>v6-beta by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2298">actions/checkout#2298</a></li>
<li>update readme/changelog for v6 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2311">actions/checkout#2311</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v5.0.0...v6.0.0">https://github.com/actions/checkout/compare/v5.0.0...v6.0.0</a></p>
<h2>v6-beta</h2>
<h2>What's Changed</h2>
<p>Updated persist-credentials to store the credentials under
<code>$RUNNER_TEMP</code> instead of directly in the local git
config.</p>
<p>This requires a minimum Actions Runner version of <a
href="https://github.com/actions/runner/releases/tag/v2.329.0">v2.329.0</a>
to access the persisted credentials for <a
href="https://docs.github.com/en/actions/tutorials/use-containerized-services/create-a-docker-container-action">Docker
container action</a> scenarios.</p>
<h2>v5.0.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Port v6 cleanup to v5 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2301">actions/checkout#2301</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v5...v5.0.1">https://github.com/actions/checkout/compare/v5...v5.0.1</a></p>
<h2>v5.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Update actions checkout to use node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2226">actions/checkout#2226</a></li>
<li>Prepare v5.0.0 release by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2238">actions/checkout#2238</a></li>
</ul>
<h2>⚠️ Minimum Compatible Runner Version</h2>
<p><strong>v2.327.1</strong><br />
<a
href="https://github.com/actions/runner/releases/tag/v2.327.1">Release
Notes</a></p>
<p>Make sure your runner is updated to this version or newer to use this
release.</p>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v4...v5.0.0">https://github.com/actions/checkout/compare/v4...v5.0.0</a></p>
<h2>v4.3.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Port v6 cleanup to v4 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2305">actions/checkout#2305</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/checkout/compare/v4...v4.3.1">https://github.com/actions/checkout/compare/v4...v4.3.1</a></p>
<h2>v4.3.0</h2>
<h2>What's Changed</h2>
<ul>
<li>docs: update README.md by <a
href="https://github.com/motss"><code>@​motss</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li>Add internal repos for checking out multiple repositories by <a
href="https://github.com/mouismail"><code>@​mouismail</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li>Documentation update - add recommended permissions to Readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/actions/checkout/blob/main/CHANGELOG.md">actions/checkout's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<h2>v6.0.2</h2>
<ul>
<li>Fix tag handling: preserve annotations and explicit fetch-tags by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2356">actions/checkout#2356</a></li>
</ul>
<h2>v6.0.1</h2>
<ul>
<li>Add worktree support for persist-credentials includeIf by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2327">actions/checkout#2327</a></li>
</ul>
<h2>v6.0.0</h2>
<ul>
<li>Persist creds to a separate file by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2286">actions/checkout#2286</a></li>
<li>Update README to include Node.js 24 support details and requirements
by <a href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2248">actions/checkout#2248</a></li>
</ul>
<h2>v5.0.1</h2>
<ul>
<li>Port v6 cleanup to v5 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2301">actions/checkout#2301</a></li>
</ul>
<h2>v5.0.0</h2>
<ul>
<li>Update actions checkout to use node 24 by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2226">actions/checkout#2226</a></li>
</ul>
<h2>v4.3.1</h2>
<ul>
<li>Port v6 cleanup to v4 by <a
href="https://github.com/ericsciple"><code>@​ericsciple</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2305">actions/checkout#2305</a></li>
</ul>
<h2>v4.3.0</h2>
<ul>
<li>docs: update README.md by <a
href="https://github.com/motss"><code>@​motss</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1971">actions/checkout#1971</a></li>
<li>Add internal repos for checking out multiple repositories by <a
href="https://github.com/mouismail"><code>@​mouismail</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1977">actions/checkout#1977</a></li>
<li>Documentation update - add recommended permissions to Readme by <a
href="https://github.com/benwells"><code>@​benwells</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2043">actions/checkout#2043</a></li>
<li>Adjust positioning of user email note and permissions heading by <a
href="https://github.com/joshmgross"><code>@​joshmgross</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2044">actions/checkout#2044</a></li>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@​nebuk89</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2194">actions/checkout#2194</a></li>
<li>Update CODEOWNERS for actions by <a
href="https://github.com/TingluoHuang"><code>@​TingluoHuang</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/2224">actions/checkout#2224</a></li>
<li>Update package dependencies by <a
href="https://github.com/salmanmkc"><code>@​salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/2236">actions/checkout#2236</a></li>
</ul>
<h2>v4.2.2</h2>
<ul>
<li><code>url-helper.ts</code> now leverages well-known environment
variables by <a href="https://github.com/jww3"><code>@​jww3</code></a>
in <a
href="https://redirect.github.com/actions/checkout/pull/1941">actions/checkout#1941</a></li>
<li>Expand unit test coverage for <code>isGhes</code> by <a
href="https://github.com/jww3"><code>@​jww3</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1946">actions/checkout#1946</a></li>
</ul>
<h2>v4.2.1</h2>
<ul>
<li>Check out other refs/* by commit if provided, fall back to ref by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1924">actions/checkout#1924</a></li>
</ul>
<h2>v4.2.0</h2>
<ul>
<li>Add Ref and Commit outputs by <a
href="https://github.com/lucacome"><code>@​lucacome</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1180">actions/checkout#1180</a></li>
<li>Dependency updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>- <a
href="https://redirect.github.com/actions/checkout/pull/1777">actions/checkout#1777</a>,
<a
href="https://redirect.github.com/actions/checkout/pull/1872">actions/checkout#1872</a></li>
</ul>
<h2>v4.1.7</h2>
<ul>
<li>Bump the minor-npm-dependencies group across 1 directory with 4
updates by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1739">actions/checkout#1739</a></li>
<li>Bump actions/checkout from 3 to 4 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1697">actions/checkout#1697</a></li>
<li>Check out other refs/* by commit by <a
href="https://github.com/orhantoy"><code>@​orhantoy</code></a> in <a
href="https://redirect.github.com/actions/checkout/pull/1774">actions/checkout#1774</a></li>
<li>Pin actions/checkout's own workflows to a known, good, stable
version. by <a href="https://github.com/jww3"><code>@​jww3</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1776">actions/checkout#1776</a></li>
</ul>
<h2>v4.1.6</h2>
<ul>
<li>Check platform to set archive extension appropriately by <a
href="https://github.com/cory-miller"><code>@​cory-miller</code></a> in
<a
href="https://redirect.github.com/actions/checkout/pull/1732">actions/checkout#1732</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="de0fac2e45"><code>de0fac2</code></a>
Fix tag handling: preserve annotations and explicit fetch-tags (<a
href="https://redirect.github.com/actions/checkout/issues/2356">#2356</a>)</li>
<li><a
href="064fe7f331"><code>064fe7f</code></a>
Add orchestration_id to git user-agent when ACTIONS_ORCHESTRATION_ID is
set (...</li>
<li><a
href="8e8c483db8"><code>8e8c483</code></a>
Clarify v6 README (<a
href="https://redirect.github.com/actions/checkout/issues/2328">#2328</a>)</li>
<li><a
href="033fa0dc0b"><code>033fa0d</code></a>
Add worktree support for persist-credentials includeIf (<a
href="https://redirect.github.com/actions/checkout/issues/2327">#2327</a>)</li>
<li><a
href="c2d88d3ecc"><code>c2d88d3</code></a>
Update all references from v5 and v4 to v6 (<a
href="https://redirect.github.com/actions/checkout/issues/2314">#2314</a>)</li>
<li><a
href="1af3b93b68"><code>1af3b93</code></a>
update readme/changelog for v6 (<a
href="https://redirect.github.com/actions/checkout/issues/2311">#2311</a>)</li>
<li><a
href="71cf2267d8"><code>71cf226</code></a>
v6-beta (<a
href="https://redirect.github.com/actions/checkout/issues/2298">#2298</a>)</li>
<li><a
href="069c695914"><code>069c695</code></a>
Persist creds to a separate file (<a
href="https://redirect.github.com/actions/checkout/issues/2286">#2286</a>)</li>
<li><a
href="ff7abcd0c3"><code>ff7abcd</code></a>
Update README to include Node.js 24 support details and requirements (<a
href="https://redirect.github.com/actions/checkout/issues/2248">#2248</a>)</li>
<li><a
href="08c6903cd8"><code>08c6903</code></a>
Prepare v5.0.0 release (<a
href="https://redirect.github.com/actions/checkout/issues/2238">#2238</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/actions/checkout/compare/v4...v6">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/checkout&package-manager=github_actions&previous-version=4&new-version=6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Otto <otto@agpt.co>
2026-02-11 02:25:51 +00:00
dependabot[bot]
1ecae8c87e chore(backend/deps): bump aiofiles from 24.1.0 to 25.1.0 in /autogpt_platform/backend (#12043)
Bumps [aiofiles](https://github.com/Tinche/aiofiles) from 24.1.0 to
25.1.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/Tinche/aiofiles/releases">aiofiles's
releases</a>.</em></p>
<blockquote>
<h2>v25.1.0</h2>
<ul>
<li>Switch to <a href="https://docs.astral.sh/uv/">uv</a> + add Python
v3.14 support. (<a
href="https://redirect.github.com/Tinche/aiofiles/pull/219">#219</a>)</li>
<li>Add <code>ruff</code> formatter and linter. <a
href="https://redirect.github.com/Tinche/aiofiles/pull/216">#216</a></li>
<li>Drop Python 3.8 support. If you require it, use version 24.1.0. <a
href="https://redirect.github.com/Tinche/aiofiles/pull/204">#204</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/danielsmyers"><code>@​danielsmyers</code></a>
made their first contribution in <a
href="https://redirect.github.com/Tinche/aiofiles/pull/185">Tinche/aiofiles#185</a></li>
<li><a
href="https://github.com/stankudrow"><code>@​stankudrow</code></a> made
their first contribution in <a
href="https://redirect.github.com/Tinche/aiofiles/pull/192">Tinche/aiofiles#192</a></li>
<li><a
href="https://github.com/waketzheng"><code>@​waketzheng</code></a> made
their first contribution in <a
href="https://redirect.github.com/Tinche/aiofiles/pull/221">Tinche/aiofiles#221</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/Tinche/aiofiles/compare/v24.1.0...v25.1.0">https://github.com/Tinche/aiofiles/compare/v24.1.0...v25.1.0</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/Tinche/aiofiles/blob/main/CHANGELOG.md">aiofiles's
changelog</a>.</em></p>
<blockquote>
<h2>25.1.0 (2025-10-09)</h2>
<ul>
<li>Switch to <a href="https://docs.astral.sh/uv/">uv</a> + add Python
v3.14 support.
(<a
href="https://redirect.github.com/Tinche/aiofiles/pull/219">#219</a>)</li>
<li>Add <code>ruff</code> formatter and linter.
<a
href="https://redirect.github.com/Tinche/aiofiles/pull/216">#216</a></li>
<li>Drop Python 3.8 support. If you require it, use version 24.1.0.
<a
href="https://redirect.github.com/Tinche/aiofiles/pull/204">#204</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="348f5ef656"><code>348f5ef</code></a>
v25.1.0</li>
<li><a
href="5e1bb8f12b"><code>5e1bb8f</code></a>
docs: update readme to use ruff badge (<a
href="https://redirect.github.com/Tinche/aiofiles/issues/221">#221</a>)</li>
<li><a
href="6fdc25c781"><code>6fdc25c</code></a>
Move to uv. (<a
href="https://redirect.github.com/Tinche/aiofiles/issues/219">#219</a>)</li>
<li><a
href="1989132423"><code>1989132</code></a>
set 'function' as a default fixture loop scope value</li>
<li><a
href="8986452a1b"><code>8986452</code></a>
add the 'asyncio_default_fixture_loop_scope=session' option</li>
<li><a
href="ccab1ff776"><code>ccab1ff</code></a>
update pytest-asyncio==1.0.0</li>
<li><a
href="8727c96f5b"><code>8727c96</code></a>
add PR <a
href="https://redirect.github.com/Tinche/aiofiles/issues/216">#216</a>
into the CHANGELOG</li>
<li><a
href="a9388e5f8d"><code>a9388e5</code></a>
add TID and ignore TID252</li>
<li><a
href="760366489a"><code>7603664</code></a>
remove [ruff].exclude keyval</li>
<li><a
href="7c49a5c5f2"><code>7c49a5c</code></a>
add final newlines</li>
<li>Additional commits viewable in <a
href="https://github.com/Tinche/aiofiles/compare/v24.1.0...v25.1.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=aiofiles&package-manager=pip&previous-version=24.1.0&new-version=25.1.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Otto <otto@agpt.co>
2026-02-10 23:32:30 +00:00
dependabot[bot]
659338f90c chore(deps): bump peter-evans/repository-dispatch from 3 to 4 (#12035)
Bumps
[peter-evans/repository-dispatch](https://github.com/peter-evans/repository-dispatch)
from 3 to 4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/peter-evans/repository-dispatch/releases">peter-evans/repository-dispatch's
releases</a>.</em></p>
<blockquote>
<h2>Repository Dispatch v4.0.0</h2>
<p>⚙️ Requires <a
href="https://github.com/actions/runner/releases/tag/v2.327.1">Actions
Runner v2.327.1</a> or later if you are using a self-hosted runner for
Node 24 support.</p>
<h2>What's Changed</h2>
<ul>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.8 to
18.19.10 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/306">peter-evans/repository-dispatch#306</a></li>
<li>build(deps): bump peter-evans/repository-dispatch from 2 to 3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/307">peter-evans/repository-dispatch#307</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.10 to
18.19.14 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/308">peter-evans/repository-dispatch#308</a></li>
<li>build(deps): bump peter-evans/create-pull-request from 5 to 6 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/310">peter-evans/repository-dispatch#310</a></li>
<li>build(deps): bump peter-evans/slash-command-dispatch from 3 to 4 by
<a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/309">peter-evans/repository-dispatch#309</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.14 to
18.19.15 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/311">peter-evans/repository-dispatch#311</a></li>
<li>build(deps-dev): bump prettier from 3.2.4 to 3.2.5 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/312">peter-evans/repository-dispatch#312</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.15 to
18.19.17 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/313">peter-evans/repository-dispatch#313</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.17 to
18.19.18 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/314">peter-evans/repository-dispatch#314</a></li>
<li>build(deps-dev): bump eslint-plugin-github from 4.10.1 to 4.10.2 by
<a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/316">peter-evans/repository-dispatch#316</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.18 to
18.19.21 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/317">peter-evans/repository-dispatch#317</a></li>
<li>build(deps-dev): bump eslint from 8.56.0 to 8.57.0 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/318">peter-evans/repository-dispatch#318</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.21 to
18.19.22 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/319">peter-evans/repository-dispatch#319</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.22 to
18.19.24 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/320">peter-evans/repository-dispatch#320</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.24 to
18.19.26 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/321">peter-evans/repository-dispatch#321</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.26 to
18.19.29 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/322">peter-evans/repository-dispatch#322</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.29 to
18.19.31 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/323">peter-evans/repository-dispatch#323</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.31 to
18.19.33 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/324">peter-evans/repository-dispatch#324</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.33 to
18.19.34 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/325">peter-evans/repository-dispatch#325</a></li>
<li>build(deps-dev): bump prettier from 3.2.5 to 3.3.1 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/326">peter-evans/repository-dispatch#326</a></li>
<li>build(deps-dev): bump prettier from 3.3.1 to 3.3.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/327">peter-evans/repository-dispatch#327</a></li>
<li>build(deps-dev): bump braces from 3.0.2 to 3.0.3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/328">peter-evans/repository-dispatch#328</a></li>
<li>build(deps-dev): bump ws from 7.5.9 to 7.5.10 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/329">peter-evans/repository-dispatch#329</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.34 to
18.19.38 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/330">peter-evans/repository-dispatch#330</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.38 to
18.19.39 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/332">peter-evans/repository-dispatch#332</a></li>
<li>build(deps-dev): bump prettier from 3.3.2 to 3.3.3 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/334">peter-evans/repository-dispatch#334</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.39 to
18.19.41 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/335">peter-evans/repository-dispatch#335</a></li>
<li>build(deps-dev): bump eslint-plugin-prettier from 5.1.3 to 5.2.1 by
<a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/336">peter-evans/repository-dispatch#336</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.41 to
18.19.42 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/337">peter-evans/repository-dispatch#337</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.42 to
18.19.43 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/338">peter-evans/repository-dispatch#338</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.43 to
18.19.44 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/339">peter-evans/repository-dispatch#339</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.44 to
18.19.45 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/340">peter-evans/repository-dispatch#340</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.45 to
18.19.47 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/341">peter-evans/repository-dispatch#341</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.47 to
18.19.50 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/343">peter-evans/repository-dispatch#343</a></li>
<li>build(deps): bump peter-evans/create-pull-request from 6 to 7 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/342">peter-evans/repository-dispatch#342</a></li>
<li>build(deps-dev): bump eslint from 8.57.0 to 8.57.1 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/344">peter-evans/repository-dispatch#344</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.50 to
18.19.53 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/345">peter-evans/repository-dispatch#345</a></li>
<li>build(deps-dev): bump <code>@​vercel/ncc</code> from 0.38.1 to
0.38.2 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/346">peter-evans/repository-dispatch#346</a></li>
<li>Update distribution by <a
href="https://github.com/actions-bot"><code>@​actions-bot</code></a> in
<a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/347">peter-evans/repository-dispatch#347</a></li>
<li>build(deps): bump <code>@​actions/core</code> from 1.10.1 to 1.11.0
by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/349">peter-evans/repository-dispatch#349</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.53 to
18.19.54 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/348">peter-evans/repository-dispatch#348</a></li>
<li>Update distribution by <a
href="https://github.com/actions-bot"><code>@​actions-bot</code></a> in
<a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/350">peter-evans/repository-dispatch#350</a></li>
<li>build(deps): bump <code>@​actions/core</code> from 1.11.0 to 1.11.1
by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/351">peter-evans/repository-dispatch#351</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.54 to
18.19.55 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/352">peter-evans/repository-dispatch#352</a></li>
<li>Update distribution by <a
href="https://github.com/actions-bot"><code>@​actions-bot</code></a> in
<a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/353">peter-evans/repository-dispatch#353</a></li>
<li>build(deps-dev): bump <code>@​types/node</code> from 18.19.55 to
18.19.56 by <a
href="https://github.com/dependabot"><code>@​dependabot</code></a>[bot]
in <a
href="https://redirect.github.com/peter-evans/repository-dispatch/pull/354">peter-evans/repository-dispatch#354</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="28959ce8df"><code>28959ce</code></a>
Fix node version in actions.yml (<a
href="https://redirect.github.com/peter-evans/repository-dispatch/issues/433">#433</a>)</li>
<li><a
href="25d29c2bbf"><code>25d29c2</code></a>
build(deps-dev): bump <code>@​types/node</code> in the npm group (<a
href="https://redirect.github.com/peter-evans/repository-dispatch/issues/432">#432</a>)</li>
<li><a
href="830136c664"><code>830136c</code></a>
build(deps): bump the github-actions group with 3 updates (<a
href="https://redirect.github.com/peter-evans/repository-dispatch/issues/431">#431</a>)</li>
<li><a
href="2c856c63fe"><code>2c856c6</code></a>
ci: update dependabot config</li>
<li><a
href="66739071c2"><code>6673907</code></a>
build(deps-dev): bump <code>@​types/node</code> from 18.19.127 to
18.19.129 (<a
href="https://redirect.github.com/peter-evans/repository-dispatch/issues/429">#429</a>)</li>
<li><a
href="952a211c1e"><code>952a211</code></a>
build(deps): bump peter-evans/repository-dispatch from 3 to 4 (<a
href="https://redirect.github.com/peter-evans/repository-dispatch/issues/428">#428</a>)</li>
<li><a
href="5fc4efd1a4"><code>5fc4efd</code></a>
docs: update readme</li>
<li><a
href="a628c95fd1"><code>a628c95</code></a>
feat: v4 (<a
href="https://redirect.github.com/peter-evans/repository-dispatch/issues/427">#427</a>)</li>
<li><a
href="de78ac1a71"><code>de78ac1</code></a>
build(deps-dev): bump <code>@​vercel/ncc</code> from 0.38.3 to 0.38.4
(<a
href="https://redirect.github.com/peter-evans/repository-dispatch/issues/425">#425</a>)</li>
<li><a
href="f49fa7f26b"><code>f49fa7f</code></a>
build(deps-dev): bump <code>@​types/node</code> from 18.19.124 to
18.19.127 (<a
href="https://redirect.github.com/peter-evans/repository-dispatch/issues/426">#426</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/peter-evans/repository-dispatch/compare/v3...v4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=peter-evans/repository-dispatch&package-manager=github_actions&previous-version=3&new-version=4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-02-10 21:28:23 +00:00
Abhimanyu Yadav
4df5b7bde7 refactor(frontend): remove defaultExpanded prop from ToolAccordion components (#12054)
### Changes

- Removed `defaultExpanded` prop from `ToolAccordion` in CreateAgent,
EditAgent, RunAgent, and RunBlock components to streamline the code and
improve readability.

### Impact

- This refactor enhances maintainability by reducing complexity in the
component structure while preserving existing functionality.

### Changes 🏗️

- Removed conditional expansion logic from all tool components
- Simplified ToolAccordion implementation across all affected components

### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Create and run agents with various tools to verify accordion
behavior works correctly
    - [x] Verify that UI components expand and collapse as expected
    - [x] Test with different output types to ensure proper rendering

---------

Co-authored-by: Ubbe <hi@ubbe.dev>
Co-authored-by: Lluis Agusti <hi@llu.lu>
2026-02-11 00:22:01 +08:00
Otto
017a00af46 feat(copilot): Enable extended thinking for Claude models (#12052)
## Summary

Enables Anthropic's extended thinking feature for Claude models in
CoPilot via OpenRouter. This keeps the model's chain-of-thought
reasoning internal rather than outputting it to users.

## Problem

The CoPilot prompt was designed for a thinking agent (with
`<internal_reasoning>` tags), but extended thinking wasn't enabled on
the API side. This caused the model to output its reasoning as regular
text, leaking internal analysis to users.

## Solution

Added thinking configuration to the OpenRouter `extra_body` for
Anthropic models:
```python
extra_body["provider"] = {
    "anthropic": {
        "thinking": {
            "type": "enabled",
            "budget_tokens": config.thinking_budget_tokens,
        }
    }
}
```

## Configuration

New settings in `ChatConfig`:
| Setting | Default | Description |
|---------|---------|-------------|
| `thinking_enabled` | `True` | Enable extended thinking for Claude
models |
| `thinking_budget_tokens` | `10000` | Token budget for thinking
(1000-100000) |

## Changes

- `config.py`: Added `thinking_enabled` and `thinking_budget_tokens`
settings
- `service.py`: Added thinking config to all 3 places where `extra_body`
is built for LLM calls

## Testing

- Verify CoPilot responses no longer include internal reasoning text
- Check that Claude's extended thinking is working (should see thinking
tokens in usage)
- Confirm non-Anthropic models are unaffected

## Related

Discussion:
https://discord.com/channels/1126875755960336515/1126875756925046928/1470779843552612607

---------

Co-authored-by: Swifty <craigswift13@gmail.com>
2026-02-10 16:18:05 +01:00
Reinier van der Leer
52650eed1d refactor(frontend/auth): Move /copilot auth check to middleware (#12053)
These "is the user authenticated, and should they be?" checks should not
be spread across the codebase, it's complex enough as it is. :')

- Follow-up to #12050

### Changes 🏗️

- Revert "fix(frontend): copilot redirect logout (#12050)"
- Add `/copilot` to `PROTECTED_PAGES` in `@/lib/supabase/helpers`

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Trivial change, we know this works for other pages
2026-02-10 14:43:33 +00:00
1022 changed files with 116934 additions and 36667 deletions

View File

@@ -0,0 +1,200 @@
---
name: pr-address
description: Address PR review comments and loop until CI green and all comments resolved. TRIGGER when user asks to address comments, fix PR feedback, respond to reviewers, or babysit/monitor a PR.
user-invocable: true
argument-hint: "[PR number or URL] — if omitted, finds PR for current branch."
metadata:
author: autogpt-team
version: "1.0.0"
---
# PR Address
## Find the PR
```bash
gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT
gh pr view {N}
```
## Fetch comments (all sources)
### 1. Inline review threads — GraphQL (primary source of actionable items)
Use GraphQL to fetch inline threads. It natively exposes `isResolved`, returns threads already grouped with all replies, and paginates via cursor — no manual thread reconstruction needed.
```bash
gh api graphql -f query='
{
repository(owner: "Significant-Gravitas", name: "AutoGPT") {
pullRequest(number: {N}) {
reviewThreads(first: 100) {
pageInfo { hasNextPage endCursor }
nodes {
id
isResolved
path
comments(last: 1) {
nodes { databaseId body author { login } createdAt }
}
}
}
}
}
}'
```
If `pageInfo.hasNextPage` is true, fetch subsequent pages by adding `after: "<endCursor>"` to `reviewThreads(first: 100, after: "...")` and repeat until `hasNextPage` is false.
**Filter to unresolved threads only** — skip any thread where `isResolved: true`. `comments(last: 1)` returns the most recent comment in the thread — act on that; it reflects the reviewer's final ask. Use the thread `id` (Relay global ID) to track threads across polls.
### 2. Top-level reviews — REST (MUST paginate)
```bash
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate
```
**CRITICAL — always `--paginate`.** Reviews default to 30 per page. PRs can have 80170+ reviews (mostly empty resolution events). Without pagination you miss reviews past position 30 — including `autogpt-reviewer`'s structured review which is typically posted after several CI runs and sits well beyond the first page.
Two things to extract:
- **Overall state**: look for `CHANGES_REQUESTED` or `APPROVED` reviews.
- **Actionable feedback**: non-empty bodies only. Empty-body reviews are thread-resolution events — they indicate progress but have no feedback to act on.
**Where each reviewer posts:**
- `autogpt-reviewer` — posts detailed structured reviews ("Blockers", "Should Fix", "Nice to Have") as **top-level reviews**. Not present on every PR. Address ALL items.
- `sentry[bot]` — posts bug predictions as **inline threads**. Fix real bugs, explain false positives.
- `coderabbitai[bot]` — posts summaries as **top-level reviews** AND actionable items as **inline threads**. Address actionable items.
- Human reviewers — can post in any source. Address ALL non-empty feedback.
### 3. PR conversation comments — REST
```bash
gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments --paginate
```
Mostly contains: bot summaries (`coderabbitai[bot]`), CI/conflict detection (`github-actions[bot]`), and author status updates. Scan for non-empty messages from non-bot human reviewers that aren't the PR author — those are the ones that need a response.
## For each unaddressed comment
Address comments **one at a time**: fix → commit → push → inline reply → next.
1. Read the referenced code, make the fix (or reply explaining why it's not needed)
2. Commit and push the fix
3. Reply **inline** (not as a new top-level comment) referencing the fixing commit — this is what resolves the conversation for bot reviewers (coderabbitai, sentry):
| Comment type | How to reply |
|---|---|
| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="🤖 Fixed in <commit-sha>: <description>"` |
| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="🤖 Fixed in <commit-sha>: <description>"` |
## Format and commit
After fixing, format the changed code:
- **Backend** (from `autogpt_platform/backend/`): `poetry run format`
- **Frontend** (from `autogpt_platform/frontend/`): `pnpm format && pnpm lint && pnpm types`
If API routes changed, regenerate the frontend client:
```bash
cd autogpt_platform/backend && poetry run rest &
REST_PID=$!
trap "kill $REST_PID 2>/dev/null" EXIT
WAIT=0; until curl -sf http://localhost:8006/health > /dev/null 2>&1; do sleep 1; WAIT=$((WAIT+1)); [ $WAIT -ge 60 ] && echo "Timed out" && exit 1; done
cd ../frontend && pnpm generate:api:force
kill $REST_PID 2>/dev/null; trap - EXIT
```
Never manually edit files in `src/app/api/__generated__/`.
Then commit and **push immediately** — never batch commits without pushing.
For backend commits in worktrees: `poetry run git commit` (pre-commit hooks).
## The loop
```text
address comments → format → commit → push
→ wait for CI (while addressing new comments) → fix failures → push
→ re-check comments after CI settles
→ repeat until: all comments addressed AND CI green AND no new comments arriving
```
### Polling for CI + new comments
After pushing, poll for **both** CI status and new comments in a single loop. Do not use `gh pr checks --watch` — it blocks the tool and prevents reacting to new comments while CI is running.
> **Note:** `gh pr checks --watch --fail-fast` is tempting but it blocks the entire Bash tool call, meaning the agent cannot check for or address new comments until CI fully completes. Always poll manually instead.
**Polling loop — repeat every 30 seconds:**
1. Check CI status:
```bash
gh pr checks {N} --repo Significant-Gravitas/AutoGPT --json bucket,name,link
```
Parse the results: if every check has `bucket` of `"pass"` or `"skipping"`, CI is green. If any has `"fail"`, CI has failed. Otherwise CI is still pending.
2. Check for merge conflicts:
```bash
gh pr view {N} --repo Significant-Gravitas/AutoGPT --json mergeable --jq '.mergeable'
```
If the result is `"CONFLICTING"`, the PR has a merge conflict — see "Resolving merge conflicts" below. If `"UNKNOWN"`, GitHub is still computing mergeability — wait and re-check next poll.
3. Check for new/changed comments (all three sources):
**Inline threads** — re-run the GraphQL query from "Fetch comments". For each unresolved thread, record `{thread_id, last_comment_databaseId}` as your baseline. On each poll, action is needed if:
- A new thread `id` appears that wasn't in the baseline (new thread), OR
- An existing thread's `last_comment_databaseId` has changed (new reply on existing thread)
**Conversation comments:**
```bash
gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments --paginate
```
Compare total count and newest `id` against baseline. Filter to non-empty, non-bot, non-author-update messages.
**Top-level reviews:**
```bash
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate
```
Watch for new non-empty reviews (`CHANGES_REQUESTED` or `COMMENTED` with body). Compare total count and newest `id` against baseline.
4. **React in this precedence order (first match wins):**
| What happened | Action |
|---|---|
| Merge conflict detected | See "Resolving merge conflicts" below. |
| Mergeability is `UNKNOWN` | GitHub is still computing mergeability. Sleep 30 seconds, then restart polling from the top. |
| New comments detected | Address them (fix → commit → push → reply). After pushing, re-fetch all comments to update your baseline, then restart this polling loop from the top (new commits invalidate CI status). |
| CI failed (bucket == "fail") | Get failed check links: `gh pr checks {N} --repo Significant-Gravitas/AutoGPT --json bucket,link --jq '.[] \| select(.bucket == "fail") \| .link'`. Extract run ID from link (format: `.../actions/runs/<run-id>/job/...`), read logs with `gh run view <run-id> --repo Significant-Gravitas/AutoGPT --log-failed`. Fix → commit → push → restart polling. |
| CI green + no new comments | **Do not exit immediately.** Bots (coderabbitai, sentry) often post reviews shortly after CI settles. Continue polling for **2 more cycles (60s)** after CI goes green. Only exit after 2 consecutive green+quiet polls. |
| CI pending + no new comments | Sleep 30 seconds, then poll again. |
**The loop ends when:** CI fully green + all comments addressed + **2 consecutive polls with no new comments after CI settled.**
### Resolving merge conflicts
1. Identify the PR's target branch and remote:
```bash
gh pr view {N} --repo Significant-Gravitas/AutoGPT --json baseRefName --jq '.baseRefName'
git remote -v # find the remote pointing to Significant-Gravitas/AutoGPT (typically 'upstream' in forks, 'origin' for direct contributors)
```
2. Pull the latest base branch with a 3-way merge:
```bash
git pull {base-remote} {base-branch} --no-rebase
```
3. Resolve conflicting files, then verify no conflict markers remain:
```bash
if grep -R -n -E '^(<<<<<<<|=======|>>>>>>>)' <conflicted-files>; then
echo "Unresolved conflict markers found — resolve before proceeding."
exit 1
fi
```
4. Stage and push:
```bash
git add <conflicted-files>
git commit -m "Resolve merge conflicts with {base-branch}"
git push
```
5. Restart the polling loop from the top — new commits reset CI status.

View File

@@ -0,0 +1,74 @@
---
name: pr-review
description: Review a PR for correctness, security, code quality, and testing issues. TRIGGER when user asks to review a PR, check PR quality, or give feedback on a PR.
user-invocable: true
args: "[PR number or URL] — if omitted, finds PR for current branch."
metadata:
author: autogpt-team
version: "1.0.0"
---
# PR Review
## Find the PR
```bash
gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT
gh pr view {N}
```
## Read the diff
```bash
gh pr diff {N}
```
## Fetch existing review comments
Before posting anything, fetch existing inline comments to avoid duplicates:
```bash
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews
```
## What to check
**Correctness:** logic errors, off-by-one, missing edge cases, race conditions (TOCTOU in file access, credit charging), error handling gaps, async correctness (missing `await`, unclosed resources).
**Security:** input validation at boundaries, no injection (command, XSS, SQL), secrets not logged, file paths sanitized (`os.path.basename()` in error messages).
**Code quality:** apply rules from backend/frontend CLAUDE.md files.
**Architecture:** DRY, single responsibility, modular functions. `Security()` vs `Depends()` for FastAPI auth. `data:` for SSE events, `: comment` for heartbeats. `transaction=True` for Redis pipelines.
**Testing:** edge cases covered, colocated `*_test.py` (backend) / `__tests__/` (frontend), mocks target where symbol is **used** not defined, `AsyncMock` for async.
## Output format
Every comment **must** be prefixed with `🤖` and a criticality badge:
| Tier | Badge | Meaning |
|---|---|---|
| Blocker | `🔴 **Blocker**` | Must fix before merge |
| Should Fix | `🟠 **Should Fix**` | Important improvement |
| Nice to Have | `🟡 **Nice to Have**` | Minor suggestion |
| Nit | `🔵 **Nit**` | Style / wording |
Example: `🤖 🔴 **Blocker**: Missing error handling for X — suggest wrapping in try/except.`
## Post inline comments
For each finding, post an inline comment on the PR (do not just write a local report):
```bash
# Get the latest commit SHA for the PR
COMMIT_SHA=$(gh api repos/Significant-Gravitas/AutoGPT/pulls/{N} --jq '.head.sha')
# Post an inline comment on a specific file/line
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments \
-f body="🤖 🔴 **Blocker**: <description>" \
-f commit_id="$COMMIT_SHA" \
-f path="<file path>" \
-F line=<line number>
```

View File

@@ -0,0 +1,534 @@
---
name: pr-test
description: "E2E manual testing of PRs/branches using docker compose, agent-browser, and API calls. TRIGGER when user asks to manually test a PR, test a feature end-to-end, or run integration tests against a running system."
user-invocable: true
argument-hint: "[worktree path or PR number] — tests the PR in the given worktree. Optional flags: --fix (auto-fix issues found)"
metadata:
author: autogpt-team
version: "1.0.0"
---
# Manual E2E Test
Test a PR/branch end-to-end by building the full platform, interacting via browser and API, capturing screenshots, and reporting results.
## Arguments
- `$ARGUMENTS` — worktree path (e.g. `$REPO_ROOT`) or PR number
- If `--fix` flag is present, auto-fix bugs found and push fixes (like pr-address loop)
## Step 0: Resolve the target
```bash
# If argument is a PR number, find its worktree
gh pr view {N} --json headRefName --jq '.headRefName'
# If argument is a path, use it directly
```
Determine:
- `REPO_ROOT` — the root repo directory: `git -C "$WORKTREE_PATH" worktree list | head -1 | awk '{print $1}'` (or `git rev-parse --show-toplevel` if not a worktree)
- `WORKTREE_PATH` — the worktree directory
- `PLATFORM_DIR``$WORKTREE_PATH/autogpt_platform`
- `BACKEND_DIR``$PLATFORM_DIR/backend`
- `FRONTEND_DIR``$PLATFORM_DIR/frontend`
- `PR_NUMBER` — the PR number (from `gh pr list --head $(git branch --show-current)`)
- `PR_TITLE` — the PR title, slugified (e.g. "Add copilot permissions" → "add-copilot-permissions")
- `RESULTS_DIR``$REPO_ROOT/test-results/PR-{PR_NUMBER}-{slugified-title}`
Create the results directory:
```bash
PR_NUMBER=$(cd $WORKTREE_PATH && gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT --json number --jq '.[0].number')
PR_TITLE=$(cd $WORKTREE_PATH && gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT --json title --jq '.[0].title' | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//' | head -c 50)
RESULTS_DIR="$REPO_ROOT/test-results/PR-${PR_NUMBER}-${PR_TITLE}"
mkdir -p $RESULTS_DIR
```
**Test user credentials** (for logging into the UI or verifying results manually):
- Email: `test@test.com`
- Password: `testtest123`
## Step 1: Understand the PR
Before testing, understand what changed:
```bash
cd $WORKTREE_PATH
git log --oneline dev..HEAD | head -20
git diff dev --stat
```
Read the changed files to understand:
1. What feature/fix does this PR implement?
2. What components are affected? (backend, frontend, copilot, executor, etc.)
3. What are the key user-facing behaviors to test?
## Step 2: Write test scenarios
Based on the PR analysis, write a test plan to `$RESULTS_DIR/test-plan.md`:
```markdown
# Test Plan: PR #{N} — {title}
## Scenarios
1. [Scenario name] — [what to verify]
2. ...
## API Tests (if applicable)
1. [Endpoint] — [expected behavior]
## UI Tests (if applicable)
1. [Page/component] — [interaction to test]
## Negative Tests
1. [What should NOT happen]
```
**Be critical** — include edge cases, error paths, and security checks.
## Step 3: Environment setup
### 3a. Copy .env files from the root worktree
The root worktree (`$REPO_ROOT`) has the canonical `.env` files with all API keys. Copy them to the target worktree:
```bash
# CRITICAL: .env files are NOT checked into git. They must be copied manually.
cp $REPO_ROOT/autogpt_platform/.env $PLATFORM_DIR/.env
cp $REPO_ROOT/autogpt_platform/backend/.env $BACKEND_DIR/.env
cp $REPO_ROOT/autogpt_platform/frontend/.env $FRONTEND_DIR/.env
```
### 3b. Configure copilot authentication
The copilot needs an LLM API to function. Two approaches (try subscription first):
#### Option 1: Subscription mode (preferred — uses your Claude Max/Pro subscription)
The `claude_agent_sdk` Python package **bundles its own Claude CLI binary** — no need to install `@anthropic-ai/claude-code` via npm. The backend auto-provisions credentials from environment variables on startup.
Run the helper script to extract tokens from your host and auto-update `backend/.env` (works on macOS, Linux, and Windows/WSL):
```bash
# Extracts OAuth tokens and writes CLAUDE_CODE_OAUTH_TOKEN + CLAUDE_CODE_REFRESH_TOKEN into .env
bash $BACKEND_DIR/scripts/refresh_claude_token.sh --env-file $BACKEND_DIR/.env
```
**How it works:** The script reads the OAuth token from:
- **macOS**: system keychain (`"Claude Code-credentials"`)
- **Linux/WSL**: `~/.claude/.credentials.json`
- **Windows**: `%APPDATA%/claude/.credentials.json`
It sets `CLAUDE_CODE_OAUTH_TOKEN`, `CLAUDE_CODE_REFRESH_TOKEN`, and `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true` in the `.env` file. On container startup, the backend auto-provisions `~/.claude/.credentials.json` inside the container from these env vars. The SDK's bundled CLI then authenticates using that file. No `claude login`, no npm install needed.
**Note:** The OAuth token expires (~24h). If copilot returns auth errors, re-run the script and restart: `$BACKEND_DIR/scripts/refresh_claude_token.sh --env-file $BACKEND_DIR/.env && docker compose up -d copilot_executor`
#### Option 2: OpenRouter API key mode (fallback)
If subscription mode doesn't work, switch to API key mode using OpenRouter:
```bash
# In $BACKEND_DIR/.env, ensure these are set:
CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=false
CHAT_API_KEY=<value of OPEN_ROUTER_API_KEY from the same .env>
CHAT_BASE_URL=https://openrouter.ai/api/v1
CHAT_USE_CLAUDE_AGENT_SDK=true
```
Use `sed` to update these values:
```bash
ORKEY=$(grep "^OPEN_ROUTER_API_KEY=" $BACKEND_DIR/.env | cut -d= -f2)
[ -n "$ORKEY" ] || { echo "ERROR: OPEN_ROUTER_API_KEY is missing in $BACKEND_DIR/.env"; exit 1; }
perl -i -pe 's/CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true/CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=false/' $BACKEND_DIR/.env
# Add or update CHAT_API_KEY and CHAT_BASE_URL
grep -q "^CHAT_API_KEY=" $BACKEND_DIR/.env && perl -i -pe "s|^CHAT_API_KEY=.*|CHAT_API_KEY=$ORKEY|" $BACKEND_DIR/.env || echo "CHAT_API_KEY=$ORKEY" >> $BACKEND_DIR/.env
grep -q "^CHAT_BASE_URL=" $BACKEND_DIR/.env && perl -i -pe 's|^CHAT_BASE_URL=.*|CHAT_BASE_URL=https://openrouter.ai/api/v1|' $BACKEND_DIR/.env || echo "CHAT_BASE_URL=https://openrouter.ai/api/v1" >> $BACKEND_DIR/.env
```
### 3c. Stop conflicting containers
```bash
# Stop any running app containers (keep infra: supabase, redis, rabbitmq, clamav)
docker ps --format "{{.Names}}" | grep -E "rest_server|executor|copilot|websocket|database_manager|scheduler|notification|frontend|migrate" | while read name; do
docker stop "$name" 2>/dev/null
done
```
### 3e. Build and start
```bash
cd $PLATFORM_DIR && docker compose build --no-cache 2>&1 | tail -20
if [ ${PIPESTATUS[0]} -ne 0 ]; then echo "ERROR: Docker build failed"; exit 1; fi
cd $PLATFORM_DIR && docker compose up -d 2>&1 | tail -20
if [ ${PIPESTATUS[0]} -ne 0 ]; then echo "ERROR: Docker compose up failed"; exit 1; fi
```
**Note:** If the container appears to be running old code (e.g. missing PR changes), use `docker compose build --no-cache` to force a full rebuild. Docker BuildKit may sometimes reuse cached `COPY` layers from a previous build on a different branch.
**Expected time: 3-8 minutes** for build, 5-10 minutes with `--no-cache`.
### 3f. Wait for services to be ready
```bash
# Poll until backend and frontend respond
for i in $(seq 1 60); do
BACKEND=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8006/docs 2>/dev/null)
FRONTEND=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000 2>/dev/null)
if [ "$BACKEND" = "200" ] && [ "$FRONTEND" = "200" ]; then
echo "Services ready"
break
fi
sleep 5
done
```
### 3h. Create test user and get auth token
```bash
ANON_KEY=$(grep "NEXT_PUBLIC_SUPABASE_ANON_KEY=" $FRONTEND_DIR/.env | sed 's/.*NEXT_PUBLIC_SUPABASE_ANON_KEY=//' | tr -d '[:space:]')
# Signup (idempotent — returns "User already registered" if exists)
RESULT=$(curl -s -X POST 'http://localhost:8000/auth/v1/signup' \
-H "apikey: $ANON_KEY" \
-H 'Content-Type: application/json' \
-d '{"email":"test@test.com","password":"testtest123"}')
# If "Database error finding user", restart supabase-auth and retry
if echo "$RESULT" | grep -q "Database error"; then
docker restart supabase-auth && sleep 5
curl -s -X POST 'http://localhost:8000/auth/v1/signup' \
-H "apikey: $ANON_KEY" \
-H 'Content-Type: application/json' \
-d '{"email":"test@test.com","password":"testtest123"}'
fi
# Get auth token
TOKEN=$(curl -s -X POST 'http://localhost:8000/auth/v1/token?grant_type=password' \
-H "apikey: $ANON_KEY" \
-H 'Content-Type: application/json' \
-d '{"email":"test@test.com","password":"testtest123"}' | jq -r '.access_token // ""')
```
**Use this token for ALL API calls:**
```bash
curl -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/...
```
## Step 4: Run tests
### Service ports reference
| Service | Port | URL |
|---------|------|-----|
| Frontend | 3000 | http://localhost:3000 |
| Backend REST | 8006 | http://localhost:8006 |
| Supabase Auth (via Kong) | 8000 | http://localhost:8000 |
| Executor | 8002 | http://localhost:8002 |
| Copilot Executor | 8008 | http://localhost:8008 |
| WebSocket | 8001 | http://localhost:8001 |
| Database Manager | 8005 | http://localhost:8005 |
| Redis | 6379 | localhost:6379 |
| RabbitMQ | 5672 | localhost:5672 |
### API testing
Use `curl` with the auth token for backend API tests:
```bash
# Example: List agents
curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/graphs | jq . | head -20
# Example: Create an agent
curl -s -X POST http://localhost:8006/api/graphs \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{...}' | jq .
# Example: Run an agent
curl -s -X POST "http://localhost:8006/api/graphs/{graph_id}/execute" \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{"data": {...}}'
# Example: Get execution results
curl -s -H "Authorization: Bearer $TOKEN" \
"http://localhost:8006/api/graphs/{graph_id}/executions/{exec_id}" | jq .
```
### Browser testing with agent-browser
```bash
# Close any existing session
agent-browser close 2>/dev/null || true
# Use --session-name to persist cookies across navigations
# This means login only needs to happen once per test session
agent-browser --session-name pr-test open 'http://localhost:3000/login' --timeout 15000
# Get interactive elements
agent-browser --session-name pr-test snapshot | grep "textbox\|button"
# Login
agent-browser --session-name pr-test fill {email_ref} "test@test.com"
agent-browser --session-name pr-test fill {password_ref} "testtest123"
agent-browser --session-name pr-test click {login_button_ref}
sleep 5
# Dismiss cookie banner if present
agent-browser --session-name pr-test click 'text=Accept All' 2>/dev/null || true
# Navigate — cookies are preserved so login persists
agent-browser --session-name pr-test open 'http://localhost:3000/copilot' --timeout 10000
# Take screenshot
agent-browser --session-name pr-test screenshot $RESULTS_DIR/01-page.png
# Interact with elements
agent-browser --session-name pr-test fill {ref} "text"
agent-browser --session-name pr-test press "Enter"
agent-browser --session-name pr-test click {ref}
agent-browser --session-name pr-test click 'text=Button Text'
# Read page content
agent-browser --session-name pr-test snapshot | grep "text:"
```
**Key pages:**
- `/copilot` — CoPilot chat (for testing copilot features)
- `/build` — Agent builder (for testing block/node features)
- `/build?flowID={id}` — Specific agent in builder
- `/library` — Agent library (for testing listing/import features)
- `/library/agents/{id}` — Agent detail with run history
- `/marketplace` — Marketplace
### Checking logs
```bash
# Backend REST server
docker logs autogpt_platform-rest_server-1 2>&1 | tail -30
# Executor (runs agent graphs)
docker logs autogpt_platform-executor-1 2>&1 | tail -30
# Copilot executor (runs copilot chat sessions)
docker logs autogpt_platform-copilot_executor-1 2>&1 | tail -30
# Frontend
docker logs autogpt_platform-frontend-1 2>&1 | tail -30
# Filter for errors
docker logs autogpt_platform-executor-1 2>&1 | grep -i "error\|exception\|traceback" | tail -20
```
### Copilot chat testing
The copilot uses SSE streaming. To test via API:
```bash
# Create a session
SESSION_ID=$(curl -s -X POST 'http://localhost:8006/api/chat/sessions' \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{}' | jq -r '.id // .session_id // ""')
# Stream a message (SSE - will stream chunks)
curl -N -X POST "http://localhost:8006/api/chat/sessions/$SESSION_ID/stream" \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{"message": "Hello, what can you help me with?"}' \
--max-time 60 2>/dev/null | head -50
```
Or test via browser (preferred for UI verification):
```bash
agent-browser --session-name pr-test open 'http://localhost:3000/copilot' --timeout 10000
# ... fill chat input and press Enter, wait 20-30s for response
```
## Step 5: Record results
For each test scenario, record in `$RESULTS_DIR/test-report.md`:
```markdown
# E2E Test Report: PR #{N} — {title}
Date: {date}
Branch: {branch}
Worktree: {path}
## Environment
- Docker services: [list running containers]
- API keys: OpenRouter={present/missing}, E2B={present/missing}
## Test Results
### Scenario 1: {name}
**Steps:**
1. ...
2. ...
**Expected:** ...
**Actual:** ...
**Result:** PASS / FAIL
**Screenshot:** {filename}.png
**Logs:** (if relevant)
### Scenario 2: {name}
...
## Summary
- Total: X scenarios
- Passed: Y
- Failed: Z
- Bugs found: [list]
```
Take screenshots at each significant step:
```bash
agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-{description}.png
```
## Step 6: Report results
After all tests complete, output a summary to the user:
1. Table of all scenarios with PASS/FAIL
2. Screenshots of failures (read the PNG files to show them)
3. Any bugs found with details
4. Recommendations
### Post test results as PR comment with screenshots
Upload screenshots to the PR using the GitHub Git API (no local git operations — safe for worktrees).
```bash
# Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
REPO="Significant-Gravitas/AutoGPT"
SCREENSHOTS_BRANCH="test-screenshots/pr-${PR_NUMBER}"
SCREENSHOTS_DIR="test-screenshots/PR-${PR_NUMBER}"
# Step 1: Create blobs for each screenshot
declare -a TREE_ENTRIES
for img in $RESULTS_DIR/*.png; do
BASENAME=$(basename "$img")
B64=$(base64 < "$img")
BLOB_SHA=$(gh api "repos/${REPO}/git/blobs" -f content="$B64" -f encoding="base64" --jq '.sha')
TREE_ENTRIES+=("-f" "tree[][path]=${SCREENSHOTS_DIR}/${BASENAME}" "-f" "tree[][mode]=100644" "-f" "tree[][type]=blob" "-f" "tree[][sha]=${BLOB_SHA}")
done
# Step 2: Create a tree with all screenshot blobs
# Build the tree JSON manually since gh api doesn't handle arrays well
TREE_JSON='['
FIRST=true
for img in $RESULTS_DIR/*.png; do
BASENAME=$(basename "$img")
B64=$(base64 < "$img")
BLOB_SHA=$(gh api "repos/${REPO}/git/blobs" -f content="$B64" -f encoding="base64" --jq '.sha')
if [ "$FIRST" = true ]; then FIRST=false; else TREE_JSON+=','; fi
TREE_JSON+="{\"path\":\"${SCREENSHOTS_DIR}/${BASENAME}\",\"mode\":\"100644\",\"type\":\"blob\",\"sha\":\"${BLOB_SHA}\"}"
done
TREE_JSON+=']'
TREE_SHA=$(echo "$TREE_JSON" | gh api "repos/${REPO}/git/trees" --input - -f base_tree="" --jq '.sha' 2>/dev/null \
|| echo "$TREE_JSON" | jq -c '{tree: .}' | gh api "repos/${REPO}/git/trees" --input - --jq '.sha')
# Step 3: Create a commit pointing to that tree
COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
-f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
-f tree="$TREE_SHA" \
--jq '.sha')
# Step 4: Create or update the ref (branch) — no local checkout needed
gh api "repos/${REPO}/git/refs" \
-f ref="refs/heads/${SCREENSHOTS_BRANCH}" \
-f sha="$COMMIT_SHA" 2>/dev/null \
|| gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" \
-X PATCH -f sha="$COMMIT_SHA" -f force=true
# Step 5: Build image markdown and post the comment
REPO_URL="https://raw.githubusercontent.com/${REPO}/${SCREENSHOTS_BRANCH}"
IMAGE_MARKDOWN=""
for img in $RESULTS_DIR/*.png; do
BASENAME=$(basename "$img")
IMAGE_MARKDOWN="$IMAGE_MARKDOWN
![${BASENAME}](${REPO_URL}/${SCREENSHOTS_DIR}/${BASENAME})"
done
gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -f body="$(cat <<EOF
## 🧪 E2E Test Report
$(cat $RESULTS_DIR/test-report.md)
### Screenshots
${IMAGE_MARKDOWN}
EOF
)"
```
This approach uses the GitHub Git API to create blobs, trees, commits, and refs entirely server-side. No local `git checkout` or `git push` — safe for worktrees and won't interfere with the PR branch.
## Fix mode (--fix flag)
When `--fix` is present, after finding a bug:
1. Identify the root cause in the code
2. Fix it in the worktree
3. Rebuild the affected service: `cd $PLATFORM_DIR && docker compose up --build -d {service_name}`
4. Re-test the scenario
5. If fix works, commit and push:
```bash
cd $WORKTREE_PATH
git add -A
git commit -m "fix: {description of fix}"
git push
```
6. Continue testing remaining scenarios
7. After all fixes, run the full test suite again to ensure no regressions
### Fix loop (like pr-address)
```text
test scenario → find bug → fix code → rebuild service → re-test
→ repeat until all scenarios pass
→ commit + push all fixes
→ run full re-test to verify
```
## Known issues and workarounds
### Problem: "Database error finding user" on signup
**Cause:** Supabase auth service schema cache is stale after migration.
**Fix:** `docker restart supabase-auth && sleep 5` then retry signup.
### Problem: Copilot returns auth errors in subscription mode
**Cause:** `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true` but `CLAUDE_CODE_OAUTH_TOKEN` is not set or expired.
**Fix:** Re-extract the OAuth token from macOS keychain (see step 3b, Option 1) and recreate the container (`docker compose up -d copilot_executor`). The backend auto-provisions `~/.claude/.credentials.json` from the env var on startup. No `npm install` or `claude login` needed — the SDK bundles its own CLI binary.
### Problem: agent-browser can't find chromium
**Cause:** The Dockerfile auto-provisions system chromium on all architectures (including ARM64). If your branch is behind `dev`, this may not be present yet.
**Fix:** Check if chromium exists: `which chromium || which chromium-browser`. If missing, install it: `apt-get install -y chromium` and set `AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium` in the container environment.
### Problem: agent-browser selector matches multiple elements
**Cause:** `text=X` matches all elements containing that text.
**Fix:** Use `agent-browser snapshot` to get specific `ref=eNN` references, then use those: `agent-browser click eNN`.
### Problem: Frontend shows cookie banner blocking interaction
**Fix:** `agent-browser click 'text=Accept All'` before other interactions.
### Problem: Container loses npm packages after rebuild
**Cause:** `docker compose up --build` rebuilds the image, losing runtime installs.
**Fix:** Add packages to the Dockerfile instead of installing at runtime.
### Problem: Services not starting after `docker compose up`
**Fix:** Wait and check health: `docker compose ps`. Common cause: migration hasn't finished. Check: `docker logs autogpt_platform-migrate-1 2>&1 | tail -5`. If supabase-db isn't healthy: `docker restart supabase-db && sleep 10`.
### Problem: Docker uses cached layers with old code (PR changes not visible)
**Cause:** `docker compose up --build` reuses cached `COPY` layers from previous builds. If the PR branch changes Python files but the previous build already cached that layer from `dev`, the container runs `dev` code.
**Fix:** Always use `docker compose build --no-cache` for the first build of a PR branch. Subsequent rebuilds within the same branch can use `--build`.
### Problem: `agent-browser open` loses login session
**Cause:** Without session persistence, `agent-browser open` starts fresh.
**Fix:** Use `--session-name pr-test` on ALL agent-browser commands. This auto-saves/restores cookies and localStorage across navigations. Alternatively, use `agent-browser eval "window.location.href = '...'"` to navigate within the same context.
### Problem: Supabase auth returns "Database error querying schema"
**Cause:** The database schema changed (migration ran) but supabase-auth has a stale schema cache.
**Fix:** `docker restart supabase-db && sleep 10 && docker restart supabase-auth && sleep 8`. If user data was lost, re-signup.

View File

@@ -0,0 +1,85 @@
---
name: worktree
description: Set up a new git worktree for parallel development. Creates the worktree, copies .env files, installs dependencies, and generates Prisma client. TRIGGER when user asks to set up a worktree, work on a branch in isolation, or needs a separate environment for a branch or PR.
user-invocable: true
args: "[name] — optional worktree name (e.g., 'AutoGPT7'). If omitted, uses next available AutoGPT<N>."
metadata:
author: autogpt-team
version: "3.0.0"
---
# Worktree Setup
## Create the worktree
Derive paths from the git toplevel. If a name is provided as argument, use it. Otherwise, check `git worktree list` and pick the next `AutoGPT<N>`.
```bash
ROOT=$(git rev-parse --show-toplevel)
PARENT=$(dirname "$ROOT")
# From an existing branch
git worktree add "$PARENT/<NAME>" <branch-name>
# From a new branch off dev
git worktree add -b <new-branch> "$PARENT/<NAME>" dev
```
## Copy environment files
Copy `.env` from the root worktree. Falls back to `.env.default` if `.env` doesn't exist.
```bash
ROOT=$(git rev-parse --show-toplevel)
TARGET="$(dirname "$ROOT")/<NAME>"
for envpath in autogpt_platform/backend autogpt_platform/frontend autogpt_platform; do
if [ -f "$ROOT/$envpath/.env" ]; then
cp "$ROOT/$envpath/.env" "$TARGET/$envpath/.env"
elif [ -f "$ROOT/$envpath/.env.default" ]; then
cp "$ROOT/$envpath/.env.default" "$TARGET/$envpath/.env"
fi
done
```
## Install dependencies
```bash
TARGET="$(dirname "$(git rev-parse --show-toplevel)")/<NAME>"
cd "$TARGET/autogpt_platform/autogpt_libs" && poetry install
cd "$TARGET/autogpt_platform/backend" && poetry install && poetry run prisma generate
cd "$TARGET/autogpt_platform/frontend" && pnpm install
```
Replace `<NAME>` with the actual worktree name (e.g., `AutoGPT7`).
## Running the app (optional)
Backend uses ports: 8001, 8002, 8003, 8005, 8006, 8007, 8008. Free them first if needed:
```bash
TARGET="$(dirname "$(git rev-parse --show-toplevel)")/<NAME>"
for port in 8001 8002 8003 8005 8006 8007 8008; do
lsof -ti :$port | xargs kill -9 2>/dev/null || true
done
cd "$TARGET/autogpt_platform/backend" && poetry run app
```
## CoPilot testing
SDK mode spawns a Claude subprocess — won't work inside Claude Code. Set `CHAT_USE_CLAUDE_AGENT_SDK=false` in `backend/.env` to use baseline mode.
## Cleanup
```bash
# Replace <NAME> with the actual worktree name (e.g., AutoGPT7)
git worktree remove "$(dirname "$(git rev-parse --show-toplevel)")/<NAME>"
```
## Alternative: Branchlet (optional)
If [branchlet](https://www.npmjs.com/package/branchlet) is installed:
```bash
branchlet create -n <name> -s <source-branch> -b <new-branch>
```

View File

@@ -5,42 +5,13 @@
!docs/
# Platform - Libs
!autogpt_platform/autogpt_libs/autogpt_libs/
!autogpt_platform/autogpt_libs/pyproject.toml
!autogpt_platform/autogpt_libs/poetry.lock
!autogpt_platform/autogpt_libs/README.md
!autogpt_platform/autogpt_libs/
# Platform - Backend
!autogpt_platform/backend/backend/
!autogpt_platform/backend/test/e2e_test_data.py
!autogpt_platform/backend/migrations/
!autogpt_platform/backend/schema.prisma
!autogpt_platform/backend/pyproject.toml
!autogpt_platform/backend/poetry.lock
!autogpt_platform/backend/README.md
!autogpt_platform/backend/.env
!autogpt_platform/backend/gen_prisma_types_stub.py
# Platform - Market
!autogpt_platform/market/market/
!autogpt_platform/market/scripts.py
!autogpt_platform/market/schema.prisma
!autogpt_platform/market/pyproject.toml
!autogpt_platform/market/poetry.lock
!autogpt_platform/market/README.md
!autogpt_platform/backend/
# Platform - Frontend
!autogpt_platform/frontend/src/
!autogpt_platform/frontend/public/
!autogpt_platform/frontend/scripts/
!autogpt_platform/frontend/package.json
!autogpt_platform/frontend/pnpm-lock.yaml
!autogpt_platform/frontend/tsconfig.json
!autogpt_platform/frontend/README.md
## config
!autogpt_platform/frontend/*.config.*
!autogpt_platform/frontend/.env.*
!autogpt_platform/frontend/.env
!autogpt_platform/frontend/
# Classic - AutoGPT
!classic/original_autogpt/autogpt/
@@ -64,6 +35,38 @@
# Classic - Frontend
!classic/frontend/build/web/
# Explicitly re-ignore some folders
.*
**/__pycache__
# Explicitly re-ignore unwanted files from whitelisted directories
# Note: These patterns MUST come after the whitelist rules to take effect
# Hidden files and directories (but keep frontend .env files needed for build)
**/.*
!autogpt_platform/frontend/.env
!autogpt_platform/frontend/.env.default
!autogpt_platform/frontend/.env.production
# Python artifacts
**/__pycache__/
**/*.pyc
**/*.pyo
**/.venv/
**/.ruff_cache/
**/.pytest_cache/
**/.coverage
**/htmlcov/
# Node artifacts
**/node_modules/
**/.next/
**/storybook-static/
**/playwright-report/
**/test-results/
# Build artifacts
**/dist/
**/build/
!autogpt_platform/frontend/src/**/build/
**/target/
# Logs and temp files
**/*.log
**/*.tmp

1229
.github/scripts/detect_overlaps.py vendored Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -22,7 +22,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
ref: ${{ github.event.workflow_run.head_branch }}
fetch-depth: 0
@@ -40,6 +40,48 @@ jobs:
git checkout -b "$BRANCH_NAME"
echo "branch_name=$BRANCH_NAME" >> $GITHUB_OUTPUT
# Backend Python/Poetry setup (so Claude can run linting/tests)
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.11"
- name: Set up Python dependency cache
uses: actions/cache@v5
with:
path: ~/.cache/pypoetry
key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
- name: Install Poetry
run: |
cd autogpt_platform/backend
HEAD_POETRY_VERSION=$(python3 ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
curl -sSL https://install.python-poetry.org | POETRY_VERSION=$HEAD_POETRY_VERSION python3 -
echo "$HOME/.local/bin" >> $GITHUB_PATH
- name: Install Python dependencies
working-directory: autogpt_platform/backend
run: poetry install
- name: Generate Prisma Client
working-directory: autogpt_platform/backend
run: poetry run prisma generate && poetry run gen-prisma-stub
# Frontend Node.js/pnpm setup (so Claude can run linting/tests)
- name: Enable corepack
run: corepack enable
- name: Set up Node.js
uses: actions/setup-node@v6
with:
node-version: "22"
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Install JavaScript dependencies
working-directory: autogpt_platform/frontend
run: pnpm install --frozen-lockfile
- name: Get CI failure details
id: failure_details
uses: actions/github-script@v8

View File

@@ -30,7 +30,7 @@ jobs:
actions: read # Required for CI access
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
fetch-depth: 1
@@ -77,27 +77,15 @@ jobs:
run: poetry run prisma generate && poetry run gen-prisma-stub
# Frontend Node.js/pnpm setup (mirrors platform-frontend-ci.yml)
- name: Enable corepack
run: corepack enable
- name: Set up Node.js
uses: actions/setup-node@v6
with:
node-version: "22"
- name: Enable corepack
run: corepack enable
- name: Set pnpm store directory
run: |
pnpm config set store-dir ~/.pnpm-store
echo "PNPM_HOME=$HOME/.pnpm-store" >> $GITHUB_ENV
- name: Cache frontend dependencies
uses: actions/cache@v5
with:
path: ~/.pnpm-store
key: ${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/package.json') }}
restore-keys: |
${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
${{ runner.os }}-pnpm-
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Install JavaScript dependencies
working-directory: autogpt_platform/frontend

View File

@@ -40,7 +40,7 @@ jobs:
actions: read # Required for CI access
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
fetch-depth: 1
@@ -93,27 +93,15 @@ jobs:
run: poetry run prisma generate && poetry run gen-prisma-stub
# Frontend Node.js/pnpm setup (mirrors platform-frontend-ci.yml)
- name: Enable corepack
run: corepack enable
- name: Set up Node.js
uses: actions/setup-node@v6
with:
node-version: "22"
- name: Enable corepack
run: corepack enable
- name: Set pnpm store directory
run: |
pnpm config set store-dir ~/.pnpm-store
echo "PNPM_HOME=$HOME/.pnpm-store" >> $GITHUB_ENV
- name: Cache frontend dependencies
uses: actions/cache@v5
with:
path: ~/.pnpm-store
key: ${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/package.json') }}
restore-keys: |
${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
${{ runner.os }}-pnpm-
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Install JavaScript dependencies
working-directory: autogpt_platform/frontend

View File

@@ -58,11 +58,11 @@ jobs:
# your codebase is analyzed, see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/codeql-code-scanning-for-compiled-languages
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v3
uses: github/codeql-action/init@v4
with:
languages: ${{ matrix.language }}
build-mode: ${{ matrix.build-mode }}
@@ -93,6 +93,6 @@ jobs:
exit 1
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v3
uses: github/codeql-action/analyze@v4
with:
category: "/language:${{matrix.language}}"

View File

@@ -27,7 +27,7 @@ jobs:
# If you do not check out your code, Copilot will do this for you.
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
fetch-depth: 0
submodules: true

View File

@@ -23,7 +23,7 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
fetch-depth: 1

View File

@@ -7,6 +7,10 @@ on:
- "docs/integrations/**"
- "autogpt_platform/backend/backend/blocks/**"
concurrency:
group: claude-docs-review-${{ github.event.pull_request.number }}
cancel-in-progress: true
jobs:
claude-review:
# Only run for PRs from members/collaborators
@@ -23,7 +27,7 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
fetch-depth: 0
@@ -91,5 +95,35 @@ jobs:
3. Read corresponding documentation files to verify accuracy
4. Provide your feedback as a PR comment
## IMPORTANT: Comment Marker
Start your PR comment with exactly this HTML comment marker on its own line:
<!-- CLAUDE_DOCS_REVIEW -->
This marker is used to identify and replace your comment on subsequent runs.
Be constructive and specific. If everything looks good, say so!
If there are issues, explain what's wrong and suggest how to fix it.
- name: Delete old Claude review comments
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
# Get all comment IDs with our marker, sorted by creation date (oldest first)
COMMENT_IDS=$(gh api \
repos/${{ github.repository }}/issues/${{ github.event.pull_request.number }}/comments \
--jq '[.[] | select(.body | contains("<!-- CLAUDE_DOCS_REVIEW -->"))] | sort_by(.created_at) | .[].id')
# Count comments
COMMENT_COUNT=$(echo "$COMMENT_IDS" | grep -c . || true)
if [ "$COMMENT_COUNT" -gt 1 ]; then
# Delete all but the last (newest) comment
echo "$COMMENT_IDS" | head -n -1 | while read -r COMMENT_ID; do
if [ -n "$COMMENT_ID" ]; then
echo "Deleting old review comment: $COMMENT_ID"
gh api -X DELETE repos/${{ github.repository }}/issues/comments/$COMMENT_ID
fi
done
else
echo "No old review comments to clean up"
fi

View File

@@ -28,7 +28,7 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
fetch-depth: 1

View File

@@ -25,7 +25,7 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
ref: ${{ github.event.inputs.git_ref || github.ref_name }}
@@ -52,7 +52,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Trigger deploy workflow
uses: peter-evans/repository-dispatch@v3
uses: peter-evans/repository-dispatch@v4
with:
token: ${{ secrets.DEPLOY_TOKEN }}
repository: Significant-Gravitas/AutoGPT_cloud_infrastructure

View File

@@ -17,7 +17,7 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
ref: ${{ github.ref_name || 'master' }}
@@ -45,7 +45,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Trigger deploy workflow
uses: peter-evans/repository-dispatch@v3
uses: peter-evans/repository-dispatch@v4
with:
token: ${{ secrets.DEPLOY_TOKEN }}
repository: Significant-Gravitas/AutoGPT_cloud_infrastructure

View File

@@ -5,12 +5,14 @@ on:
branches: [master, dev, ci-test*]
paths:
- ".github/workflows/platform-backend-ci.yml"
- ".github/workflows/scripts/get_package_version_from_lockfile.py"
- "autogpt_platform/backend/**"
- "autogpt_platform/autogpt_libs/**"
pull_request:
branches: [master, dev, release-*]
paths:
- ".github/workflows/platform-backend-ci.yml"
- ".github/workflows/scripts/get_package_version_from_lockfile.py"
- "autogpt_platform/backend/**"
- "autogpt_platform/autogpt_libs/**"
merge_group:
@@ -25,10 +27,91 @@ defaults:
working-directory: autogpt_platform/backend
jobs:
lint:
permissions:
contents: read
timeout-minutes: 10
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Set up Python 3.12
uses: actions/setup-python@v5
with:
python-version: "3.12"
- name: Set up Python dependency cache
uses: actions/cache@v5
with:
path: ~/.cache/pypoetry
key: poetry-${{ runner.os }}-py3.12-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
- name: Install Poetry
run: |
HEAD_POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
echo "Using Poetry version ${HEAD_POETRY_VERSION}"
curl -sSL https://install.python-poetry.org | POETRY_VERSION=$HEAD_POETRY_VERSION python3 -
- name: Install Python dependencies
run: poetry install
- name: Run Linters
run: poetry run lint --skip-pyright
env:
CI: true
PLAIN_OUTPUT: True
type-check:
permissions:
contents: read
timeout-minutes: 10
strategy:
fail-fast: false
matrix:
python-version: ["3.11", "3.12", "3.13"]
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Set up Python dependency cache
uses: actions/cache@v5
with:
path: ~/.cache/pypoetry
key: poetry-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
- name: Install Poetry
run: |
HEAD_POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
echo "Using Poetry version ${HEAD_POETRY_VERSION}"
curl -sSL https://install.python-poetry.org | POETRY_VERSION=$HEAD_POETRY_VERSION python3 -
- name: Install Python dependencies
run: poetry install
- name: Generate Prisma Client
run: poetry run prisma generate && poetry run gen-prisma-stub
- name: Run Pyright
run: poetry run pyright --pythonversion ${{ matrix.python-version }}
env:
CI: true
PLAIN_OUTPUT: True
test:
permissions:
contents: read
timeout-minutes: 30
timeout-minutes: 15
strategy:
fail-fast: false
matrix:
@@ -41,13 +124,18 @@ jobs:
ports:
- 6379:6379
rabbitmq:
image: rabbitmq:3.12-management
image: rabbitmq:4.1.4
ports:
- 5672:5672
- 15672:15672
env:
RABBITMQ_DEFAULT_USER: ${{ env.RABBITMQ_DEFAULT_USER }}
RABBITMQ_DEFAULT_PASS: ${{ env.RABBITMQ_DEFAULT_PASS }}
options: >-
--health-cmd "rabbitmq-diagnostics -q ping"
--health-interval 30s
--health-timeout 10s
--health-retries 5
--health-start-period 10s
clamav:
image: clamav/clamav-debian:latest
ports:
@@ -68,7 +156,7 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
fetch-depth: 0
submodules: true
@@ -91,9 +179,9 @@ jobs:
uses: actions/cache@v5
with:
path: ~/.cache/pypoetry
key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
key: poetry-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
- name: Install Poetry (Unix)
- name: Install Poetry
run: |
# Extract Poetry version from backend/poetry.lock
HEAD_POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
@@ -151,22 +239,22 @@ jobs:
echo "Waiting for ClamAV daemon to start..."
max_attempts=60
attempt=0
until nc -z localhost 3310 || [ $attempt -eq $max_attempts ]; do
echo "ClamAV is unavailable - sleeping (attempt $((attempt+1))/$max_attempts)"
sleep 5
attempt=$((attempt+1))
done
if [ $attempt -eq $max_attempts ]; then
echo "ClamAV failed to start after $((max_attempts*5)) seconds"
echo "Checking ClamAV service logs..."
docker logs $(docker ps -q --filter "ancestor=clamav/clamav-debian:latest") 2>&1 | tail -50 || echo "No ClamAV container found"
exit 1
fi
echo "ClamAV is ready!"
# Verify ClamAV is responsive
echo "Testing ClamAV connection..."
timeout 10 bash -c 'echo "PING" | nc localhost 3310' || {
@@ -181,18 +269,13 @@ jobs:
DATABASE_URL: ${{ steps.supabase.outputs.DB_URL }}
DIRECT_URL: ${{ steps.supabase.outputs.DB_URL }}
- id: lint
name: Run Linter
run: poetry run lint
- name: Run pytest with coverage
- name: Run pytest
run: |
if [[ "${{ runner.debug }}" == "1" ]]; then
poetry run pytest -s -vv -o log_cli=true -o log_cli_level=DEBUG
else
poetry run pytest -s -vv
fi
if: success() || (failure() && steps.lint.outcome == 'failure')
env:
LOG_LEVEL: ${{ runner.debug && 'DEBUG' || 'INFO' }}
DATABASE_URL: ${{ steps.supabase.outputs.DB_URL }}
@@ -204,6 +287,12 @@ jobs:
REDIS_PORT: "6379"
ENCRYPTION_KEY: "dvziYgz0KSK8FENhju0ZYi8-fRTfAdlz6YLhdB_jhNw=" # DO NOT USE IN PRODUCTION!!
# - name: Upload coverage reports to Codecov
# uses: codecov/codecov-action@v4
# with:
# token: ${{ secrets.CODECOV_TOKEN }}
# flags: backend,${{ runner.os }}
env:
CI: true
PLAIN_OUTPUT: True
@@ -217,9 +306,3 @@ jobs:
# the backend service, docker composes, and examples
RABBITMQ_DEFAULT_USER: "rabbitmq_user_default"
RABBITMQ_DEFAULT_PASS: "k0VMxyIJF9S35f3x2uaw5IWAl6Y536O7"
# - name: Upload coverage reports to Codecov
# uses: codecov/codecov-action@v4
# with:
# token: ${{ secrets.CODECOV_TOKEN }}
# flags: backend,${{ runner.os }}

View File

@@ -82,7 +82,7 @@ jobs:
- name: Dispatch Deploy Event
if: steps.check_status.outputs.should_deploy == 'true'
uses: peter-evans/repository-dispatch@v3
uses: peter-evans/repository-dispatch@v4
with:
token: ${{ secrets.DISPATCH_TOKEN }}
repository: Significant-Gravitas/AutoGPT_cloud_infrastructure
@@ -110,7 +110,7 @@ jobs:
- name: Dispatch Undeploy Event (from comment)
if: steps.check_status.outputs.should_undeploy == 'true'
uses: peter-evans/repository-dispatch@v3
uses: peter-evans/repository-dispatch@v4
with:
token: ${{ secrets.DISPATCH_TOKEN }}
repository: Significant-Gravitas/AutoGPT_cloud_infrastructure
@@ -168,7 +168,7 @@ jobs:
github.event_name == 'pull_request' &&
github.event.action == 'closed' &&
steps.check_pr_close.outputs.should_undeploy == 'true'
uses: peter-evans/repository-dispatch@v3
uses: peter-evans/repository-dispatch@v4
with:
token: ${{ secrets.DISPATCH_TOKEN }}
repository: Significant-Gravitas/AutoGPT_cloud_infrastructure

View File

@@ -6,10 +6,16 @@ on:
paths:
- ".github/workflows/platform-frontend-ci.yml"
- "autogpt_platform/frontend/**"
- "autogpt_platform/backend/Dockerfile"
- "autogpt_platform/docker-compose.yml"
- "autogpt_platform/docker-compose.platform.yml"
pull_request:
paths:
- ".github/workflows/platform-frontend-ci.yml"
- "autogpt_platform/frontend/**"
- "autogpt_platform/backend/Dockerfile"
- "autogpt_platform/docker-compose.yml"
- "autogpt_platform/docker-compose.platform.yml"
merge_group:
workflow_dispatch:
@@ -26,12 +32,11 @@ jobs:
setup:
runs-on: ubuntu-latest
outputs:
cache-key: ${{ steps.cache-key.outputs.key }}
components-changed: ${{ steps.filter.outputs.components }}
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
- name: Check for component changes
uses: dorny/paths-filter@v3
@@ -41,28 +46,17 @@ jobs:
components:
- 'autogpt_platform/frontend/src/components/**'
- name: Set up Node.js
uses: actions/setup-node@v6
with:
node-version: "22.18.0"
- name: Enable corepack
run: corepack enable
- name: Generate cache key
id: cache-key
run: echo "key=${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/package.json') }}" >> $GITHUB_OUTPUT
- name: Cache dependencies
uses: actions/cache@v5
- name: Set up Node
uses: actions/setup-node@v6
with:
path: ~/.pnpm-store
key: ${{ steps.cache-key.outputs.key }}
restore-keys: |
${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
${{ runner.os }}-pnpm-
node-version: "22.18.0"
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Install dependencies
- name: Install dependencies to populate cache
run: pnpm install --frozen-lockfile
lint:
@@ -71,24 +65,17 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Set up Node.js
uses: actions/setup-node@v6
with:
node-version: "22.18.0"
uses: actions/checkout@v6
- name: Enable corepack
run: corepack enable
- name: Restore dependencies cache
uses: actions/cache@v5
- name: Set up Node
uses: actions/setup-node@v6
with:
path: ~/.pnpm-store
key: ${{ needs.setup.outputs.cache-key }}
restore-keys: |
${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
${{ runner.os }}-pnpm-
node-version: "22.18.0"
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Install dependencies
run: pnpm install --frozen-lockfile
@@ -107,26 +94,19 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Set up Node.js
uses: actions/setup-node@v6
with:
node-version: "22.18.0"
- name: Enable corepack
run: corepack enable
- name: Restore dependencies cache
uses: actions/cache@v5
- name: Set up Node
uses: actions/setup-node@v6
with:
path: ~/.pnpm-store
key: ${{ needs.setup.outputs.cache-key }}
restore-keys: |
${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
${{ runner.os }}-pnpm-
node-version: "22.18.0"
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Install dependencies
run: pnpm install --frozen-lockfile
@@ -140,163 +120,25 @@ jobs:
token: ${{ secrets.GITHUB_TOKEN }}
exitOnceUploaded: true
e2e_test:
runs-on: big-boi
needs: setup
strategy:
fail-fast: false
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
submodules: recursive
- name: Set up Node.js
uses: actions/setup-node@v6
with:
node-version: "22.18.0"
- name: Enable corepack
run: corepack enable
- name: Copy default supabase .env
run: |
cp ../.env.default ../.env
- name: Copy backend .env and set OpenAI API key
run: |
cp ../backend/.env.default ../backend/.env
echo "OPENAI_INTERNAL_API_KEY=${{ secrets.OPENAI_API_KEY }}" >> ../backend/.env
env:
# Used by E2E test data script to generate embeddings for approved store agents
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Cache Docker layers
uses: actions/cache@v5
with:
path: /tmp/.buildx-cache
key: ${{ runner.os }}-buildx-frontend-test-${{ hashFiles('autogpt_platform/docker-compose.yml', 'autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/pyproject.toml', 'autogpt_platform/backend/poetry.lock') }}
restore-keys: |
${{ runner.os }}-buildx-frontend-test-
- name: Run docker compose
run: |
NEXT_PUBLIC_PW_TEST=true docker compose -f ../docker-compose.yml up -d
env:
DOCKER_BUILDKIT: 1
BUILDX_CACHE_FROM: type=local,src=/tmp/.buildx-cache
BUILDX_CACHE_TO: type=local,dest=/tmp/.buildx-cache-new,mode=max
- name: Move cache
run: |
rm -rf /tmp/.buildx-cache
if [ -d "/tmp/.buildx-cache-new" ]; then
mv /tmp/.buildx-cache-new /tmp/.buildx-cache
fi
- name: Wait for services to be ready
run: |
echo "Waiting for rest_server to be ready..."
timeout 60 sh -c 'until curl -f http://localhost:8006/health 2>/dev/null; do sleep 2; done' || echo "Rest server health check timeout, continuing..."
echo "Waiting for database to be ready..."
timeout 60 sh -c 'until docker compose -f ../docker-compose.yml exec -T db pg_isready -U postgres 2>/dev/null; do sleep 2; done' || echo "Database ready check timeout, continuing..."
- name: Create E2E test data
run: |
echo "Creating E2E test data..."
# First try to run the script from inside the container
if docker compose -f ../docker-compose.yml exec -T rest_server test -f /app/autogpt_platform/backend/test/e2e_test_data.py; then
echo "✅ Found e2e_test_data.py in container, running it..."
docker compose -f ../docker-compose.yml exec -T rest_server sh -c "cd /app/autogpt_platform && python backend/test/e2e_test_data.py" || {
echo "❌ E2E test data creation failed!"
docker compose -f ../docker-compose.yml logs --tail=50 rest_server
exit 1
}
else
echo "⚠️ e2e_test_data.py not found in container, copying and running..."
# Copy the script into the container and run it
docker cp ../backend/test/e2e_test_data.py $(docker compose -f ../docker-compose.yml ps -q rest_server):/tmp/e2e_test_data.py || {
echo "❌ Failed to copy script to container"
exit 1
}
docker compose -f ../docker-compose.yml exec -T rest_server sh -c "cd /app/autogpt_platform && python /tmp/e2e_test_data.py" || {
echo "❌ E2E test data creation failed!"
docker compose -f ../docker-compose.yml logs --tail=50 rest_server
exit 1
}
fi
- name: Restore dependencies cache
uses: actions/cache@v5
with:
path: ~/.pnpm-store
key: ${{ needs.setup.outputs.cache-key }}
restore-keys: |
${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
${{ runner.os }}-pnpm-
- name: Install dependencies
run: pnpm install --frozen-lockfile
- name: Install Browser 'chromium'
run: pnpm playwright install --with-deps chromium
- name: Run Playwright tests
run: pnpm test:no-build
continue-on-error: false
- name: Upload Playwright report
if: always()
uses: actions/upload-artifact@v4
with:
name: playwright-report
path: playwright-report
if-no-files-found: ignore
retention-days: 3
- name: Upload Playwright test results
if: always()
uses: actions/upload-artifact@v4
with:
name: playwright-test-results
path: test-results
if-no-files-found: ignore
retention-days: 3
- name: Print Final Docker Compose logs
if: always()
run: docker compose -f ../docker-compose.yml logs
integration_test:
runs-on: ubuntu-latest
needs: setup
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
submodules: recursive
- name: Set up Node.js
uses: actions/setup-node@v6
with:
node-version: "22.18.0"
- name: Enable corepack
run: corepack enable
- name: Restore dependencies cache
uses: actions/cache@v5
- name: Set up Node
uses: actions/setup-node@v6
with:
path: ~/.pnpm-store
key: ${{ needs.setup.outputs.cache-key }}
restore-keys: |
${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
${{ runner.os }}-pnpm-
node-version: "22.18.0"
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Install dependencies
run: pnpm install --frozen-lockfile

View File

@@ -1,14 +1,18 @@
name: AutoGPT Platform - Frontend CI
name: AutoGPT Platform - Full-stack CI
on:
push:
branches: [master, dev]
paths:
- ".github/workflows/platform-fullstack-ci.yml"
- ".github/workflows/scripts/docker-ci-fix-compose-build-cache.py"
- ".github/workflows/scripts/get_package_version_from_lockfile.py"
- "autogpt_platform/**"
pull_request:
paths:
- ".github/workflows/platform-fullstack-ci.yml"
- ".github/workflows/scripts/docker-ci-fix-compose-build-cache.py"
- ".github/workflows/scripts/get_package_version_from_lockfile.py"
- "autogpt_platform/**"
merge_group:
@@ -24,113 +28,285 @@ defaults:
jobs:
setup:
runs-on: ubuntu-latest
outputs:
cache-key: ${{ steps.cache-key.outputs.key }}
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Set up Node.js
uses: actions/setup-node@v6
with:
node-version: "22.18.0"
uses: actions/checkout@v6
- name: Enable corepack
run: corepack enable
- name: Generate cache key
id: cache-key
run: echo "key=${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/package.json') }}" >> $GITHUB_OUTPUT
- name: Cache dependencies
uses: actions/cache@v5
- name: Set up Node
uses: actions/setup-node@v6
with:
path: ~/.pnpm-store
key: ${{ steps.cache-key.outputs.key }}
restore-keys: |
${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
${{ runner.os }}-pnpm-
node-version: "22.18.0"
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Install dependencies
- name: Install dependencies to populate cache
run: pnpm install --frozen-lockfile
types:
runs-on: big-boi
check-api-types:
name: check API types
runs-on: ubuntu-latest
needs: setup
strategy:
fail-fast: false
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
submodules: recursive
- name: Set up Node.js
# ------------------------ Backend setup ------------------------
- name: Set up Backend - Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"
- name: Set up Backend - Install Poetry
working-directory: autogpt_platform/backend
run: |
POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
echo "Installing Poetry version ${POETRY_VERSION}"
curl -sSL https://install.python-poetry.org | POETRY_VERSION=$POETRY_VERSION python3 -
- name: Set up Backend - Set up dependency cache
uses: actions/cache@v5
with:
path: ~/.cache/pypoetry
key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
- name: Set up Backend - Install dependencies
working-directory: autogpt_platform/backend
run: poetry install
- name: Set up Backend - Generate Prisma client
working-directory: autogpt_platform/backend
run: poetry run prisma generate && poetry run gen-prisma-stub
- name: Set up Frontend - Export OpenAPI schema from Backend
working-directory: autogpt_platform/backend
run: poetry run export-api-schema --output ../frontend/src/app/api/openapi.json
# ------------------------ Frontend setup ------------------------
- name: Set up Frontend - Enable corepack
run: corepack enable
- name: Set up Frontend - Set up Node
uses: actions/setup-node@v6
with:
node-version: "22.18.0"
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Enable corepack
run: corepack enable
- name: Copy default supabase .env
run: |
cp ../.env.default ../.env
- name: Copy backend .env
run: |
cp ../backend/.env.default ../backend/.env
- name: Run docker compose
run: |
docker compose -f ../docker-compose.yml --profile local up -d deps_backend
- name: Restore dependencies cache
uses: actions/cache@v5
with:
path: ~/.pnpm-store
key: ${{ needs.setup.outputs.cache-key }}
restore-keys: |
${{ runner.os }}-pnpm-
- name: Install dependencies
- name: Set up Frontend - Install dependencies
run: pnpm install --frozen-lockfile
- name: Setup .env
run: cp .env.default .env
- name: Wait for services to be ready
run: |
echo "Waiting for rest_server to be ready..."
timeout 60 sh -c 'until curl -f http://localhost:8006/health 2>/dev/null; do sleep 2; done' || echo "Rest server health check timeout, continuing..."
echo "Waiting for database to be ready..."
timeout 60 sh -c 'until docker compose -f ../docker-compose.yml exec -T db pg_isready -U postgres 2>/dev/null; do sleep 2; done' || echo "Database ready check timeout, continuing..."
- name: Generate API queries
run: pnpm generate:api:force
- name: Set up Frontend - Format OpenAPI schema
id: format-schema
run: pnpm prettier --write ./src/app/api/openapi.json
- name: Check for API schema changes
run: |
if ! git diff --exit-code src/app/api/openapi.json; then
echo "❌ API schema changes detected in src/app/api/openapi.json"
echo ""
echo "The openapi.json file has been modified after running 'pnpm generate:api-all'."
echo "The openapi.json file has been modified after exporting the API schema."
echo "This usually means changes have been made in the BE endpoints without updating the Frontend."
echo "The API schema is now out of sync with the Front-end queries."
echo ""
echo "To fix this:"
echo "1. Pull the backend 'docker compose pull && docker compose up -d --build --force-recreate'"
echo "2. Run 'pnpm generate:api' locally"
echo "3. Run 'pnpm types' locally"
echo "4. Fix any TypeScript errors that may have been introduced"
echo "5. Commit and push your changes"
echo "\nIn the backend directory:"
echo "1. Run 'poetry run export-api-schema --output ../frontend/src/app/api/openapi.json'"
echo "\nIn the frontend directory:"
echo "2. Run 'pnpm prettier --write src/app/api/openapi.json'"
echo "3. Run 'pnpm generate:api'"
echo "4. Run 'pnpm types'"
echo "5. Fix any TypeScript errors that may have been introduced"
echo "6. Commit and push your changes"
echo ""
exit 1
else
echo "✅ No API schema changes detected"
fi
- name: Run Typescript checks
- name: Set up Frontend - Generate API client
id: generate-api-client
run: pnpm orval --config ./orval.config.ts
# Continue with type generation & check even if there are schema changes
if: success() || (steps.format-schema.outcome == 'success')
- name: Check for TypeScript errors
run: pnpm types
if: success() || (steps.generate-api-client.outcome == 'success')
e2e_test:
name: end-to-end tests
runs-on: big-boi
steps:
- name: Checkout repository
uses: actions/checkout@v6
with:
submodules: recursive
- name: Set up Platform - Copy default supabase .env
run: |
cp ../.env.default ../.env
- name: Set up Platform - Copy backend .env and set OpenAI API key
run: |
cp ../backend/.env.default ../backend/.env
echo "OPENAI_INTERNAL_API_KEY=${{ secrets.OPENAI_API_KEY }}" >> ../backend/.env
env:
# Used by E2E test data script to generate embeddings for approved store agents
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
- name: Set up Platform - Set up Docker Buildx
uses: docker/setup-buildx-action@v3
with:
driver: docker-container
driver-opts: network=host
- name: Set up Platform - Expose GHA cache to docker buildx CLI
uses: crazy-max/ghaction-github-runtime@v4
- name: Set up Platform - Build Docker images (with cache)
working-directory: autogpt_platform
run: |
pip install pyyaml
# Resolve extends and generate a flat compose file that bake can understand
docker compose -f docker-compose.yml config > docker-compose.resolved.yml
# Add cache configuration to the resolved compose file
python ../.github/workflows/scripts/docker-ci-fix-compose-build-cache.py \
--source docker-compose.resolved.yml \
--cache-from "type=gha" \
--cache-to "type=gha,mode=max" \
--backend-hash "${{ hashFiles('autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/poetry.lock', 'autogpt_platform/backend/backend/**') }}" \
--frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}" \
--git-ref "${{ github.ref }}"
# Build with bake using the resolved compose file (now includes cache config)
docker buildx bake --allow=fs.read=.. -f docker-compose.resolved.yml --load
env:
NEXT_PUBLIC_PW_TEST: true
- name: Set up tests - Cache E2E test data
id: e2e-data-cache
uses: actions/cache@v5
with:
path: /tmp/e2e_test_data.sql
key: e2e-test-data-${{ hashFiles('autogpt_platform/backend/test/e2e_test_data.py', 'autogpt_platform/backend/migrations/**', '.github/workflows/platform-fullstack-ci.yml') }}
- name: Set up Platform - Start Supabase DB + Auth
run: |
docker compose -f ../docker-compose.resolved.yml up -d db auth --no-build
echo "Waiting for database to be ready..."
timeout 60 sh -c 'until docker compose -f ../docker-compose.resolved.yml exec -T db pg_isready -U postgres 2>/dev/null; do sleep 2; done'
echo "Waiting for auth service to be ready..."
timeout 60 sh -c 'until docker compose -f ../docker-compose.resolved.yml exec -T db psql -U postgres -d postgres -c "SELECT 1 FROM auth.users LIMIT 1" 2>/dev/null; do sleep 2; done' || echo "Auth schema check timeout, continuing..."
- name: Set up Platform - Run migrations
run: |
echo "Running migrations..."
docker compose -f ../docker-compose.resolved.yml run --rm migrate
echo "✅ Migrations completed"
env:
NEXT_PUBLIC_PW_TEST: true
- name: Set up tests - Load cached E2E test data
if: steps.e2e-data-cache.outputs.cache-hit == 'true'
run: |
echo "✅ Found cached E2E test data, restoring..."
{
echo "SET session_replication_role = 'replica';"
cat /tmp/e2e_test_data.sql
echo "SET session_replication_role = 'origin';"
} | docker compose -f ../docker-compose.resolved.yml exec -T db psql -U postgres -d postgres -b
# Refresh materialized views after restore
docker compose -f ../docker-compose.resolved.yml exec -T db \
psql -U postgres -d postgres -b -c "SET search_path TO platform; SELECT refresh_store_materialized_views();" || true
echo "✅ E2E test data restored from cache"
- name: Set up Platform - Start (all other services)
run: |
docker compose -f ../docker-compose.resolved.yml up -d --no-build
echo "Waiting for rest_server to be ready..."
timeout 60 sh -c 'until curl -f http://localhost:8006/health 2>/dev/null; do sleep 2; done' || echo "Rest server health check timeout, continuing..."
env:
NEXT_PUBLIC_PW_TEST: true
- name: Set up tests - Create E2E test data
if: steps.e2e-data-cache.outputs.cache-hit != 'true'
run: |
echo "Creating E2E test data..."
docker cp ../backend/test/e2e_test_data.py $(docker compose -f ../docker-compose.resolved.yml ps -q rest_server):/tmp/e2e_test_data.py
docker compose -f ../docker-compose.resolved.yml exec -T rest_server sh -c "cd /app/autogpt_platform && python /tmp/e2e_test_data.py" || {
echo "❌ E2E test data creation failed!"
docker compose -f ../docker-compose.resolved.yml logs --tail=50 rest_server
exit 1
}
# Dump auth.users + platform schema for cache (two separate dumps)
echo "Dumping database for cache..."
{
docker compose -f ../docker-compose.resolved.yml exec -T db \
pg_dump -U postgres --data-only --column-inserts \
--table='auth.users' postgres
docker compose -f ../docker-compose.resolved.yml exec -T db \
pg_dump -U postgres --data-only --column-inserts \
--schema=platform \
--exclude-table='platform._prisma_migrations' \
--exclude-table='platform.apscheduler_jobs' \
--exclude-table='platform.apscheduler_jobs_batched_notifications' \
postgres
} > /tmp/e2e_test_data.sql
echo "✅ Database dump created for caching ($(wc -l < /tmp/e2e_test_data.sql) lines)"
- name: Set up tests - Enable corepack
run: corepack enable
- name: Set up tests - Set up Node
uses: actions/setup-node@v6
with:
node-version: "22.18.0"
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Set up tests - Install dependencies
run: pnpm install --frozen-lockfile
- name: Set up tests - Install browser 'chromium'
run: pnpm playwright install --with-deps chromium
- name: Run Playwright tests
run: pnpm test:no-build
continue-on-error: false
- name: Upload Playwright report
if: always()
uses: actions/upload-artifact@v4
with:
name: playwright-report
path: autogpt_platform/frontend/playwright-report
if-no-files-found: ignore
retention-days: 3
- name: Upload Playwright test results
if: always()
uses: actions/upload-artifact@v4
with:
name: playwright-test-results
path: autogpt_platform/frontend/test-results
if-no-files-found: ignore
retention-days: 3
- name: Print Final Docker Compose logs
if: always()
run: docker compose -f ../docker-compose.resolved.yml logs

39
.github/workflows/pr-overlap-check.yml vendored Normal file
View File

@@ -0,0 +1,39 @@
name: PR Overlap Detection
on:
pull_request:
types: [opened, synchronize, reopened]
branches:
- dev
- master
permissions:
contents: read
pull-requests: write
jobs:
check-overlaps:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
fetch-depth: 0 # Need full history for merge testing
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Configure git
run: |
git config user.email "github-actions[bot]@users.noreply.github.com"
git config user.name "github-actions[bot]"
- name: Run overlap detection
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# Always succeed - this check informs contributors, it shouldn't block merging
continue-on-error: true
run: |
python .github/scripts/detect_overlaps.py ${{ github.event.pull_request.number }}

View File

@@ -11,7 +11,7 @@ jobs:
steps:
# - name: Wait some time for all actions to start
# run: sleep 30
- uses: actions/checkout@v4
- uses: actions/checkout@v6
# with:
# fetch-depth: 0
- name: Set up Python

View File

@@ -0,0 +1,195 @@
#!/usr/bin/env python3
"""
Add cache configuration to a resolved docker-compose file for all services
that have a build key, and ensure image names match what docker compose expects.
"""
import argparse
import yaml
DEFAULT_BRANCH = "dev"
CACHE_BUILDS_FOR_COMPONENTS = ["backend", "frontend"]
def main():
parser = argparse.ArgumentParser(
description="Add cache config to a resolved compose file"
)
parser.add_argument(
"--source",
required=True,
help="Source compose file to read (should be output of `docker compose config`)",
)
parser.add_argument(
"--cache-from",
default="type=gha",
help="Cache source configuration",
)
parser.add_argument(
"--cache-to",
default="type=gha,mode=max",
help="Cache destination configuration",
)
for component in CACHE_BUILDS_FOR_COMPONENTS:
parser.add_argument(
f"--{component}-hash",
default="",
help=f"Hash for {component} cache scope (e.g., from hashFiles())",
)
parser.add_argument(
"--git-ref",
default="",
help="Git ref for branch-based cache scope (e.g., refs/heads/master)",
)
args = parser.parse_args()
# Normalize git ref to a safe scope name (e.g., refs/heads/master -> master)
git_ref_scope = ""
if args.git_ref:
git_ref_scope = args.git_ref.replace("refs/heads/", "").replace("/", "-")
with open(args.source, "r") as f:
compose = yaml.safe_load(f)
# Get project name from compose file or default
project_name = compose.get("name", "autogpt_platform")
def get_image_name(dockerfile: str, target: str) -> str:
"""Generate image name based on Dockerfile folder and build target."""
dockerfile_parts = dockerfile.replace("\\", "/").split("/")
if len(dockerfile_parts) >= 2:
folder_name = dockerfile_parts[-2] # e.g., "backend" or "frontend"
else:
folder_name = "app"
return f"{project_name}-{folder_name}:{target}"
def get_build_key(dockerfile: str, target: str) -> str:
"""Generate a unique key for a Dockerfile+target combination."""
return f"{dockerfile}:{target}"
def get_component(dockerfile: str) -> str | None:
"""Get component name (frontend/backend) from dockerfile path."""
for component in CACHE_BUILDS_FOR_COMPONENTS:
if component in dockerfile:
return component
return None
# First pass: collect all services with build configs and identify duplicates
# Track which (dockerfile, target) combinations we've seen
build_key_to_first_service: dict[str, str] = {}
services_to_build: list[str] = []
services_to_dedupe: list[str] = []
for service_name, service_config in compose.get("services", {}).items():
if "build" not in service_config:
continue
build_config = service_config["build"]
dockerfile = build_config.get("dockerfile", "Dockerfile")
target = build_config.get("target", "default")
build_key = get_build_key(dockerfile, target)
if build_key not in build_key_to_first_service:
# First service with this build config - it will do the actual build
build_key_to_first_service[build_key] = service_name
services_to_build.append(service_name)
else:
# Duplicate - will just use the image from the first service
services_to_dedupe.append(service_name)
# Second pass: configure builds and deduplicate
modified_services = []
for service_name, service_config in compose.get("services", {}).items():
if "build" not in service_config:
continue
build_config = service_config["build"]
dockerfile = build_config.get("dockerfile", "Dockerfile")
target = build_config.get("target", "latest")
image_name = get_image_name(dockerfile, target)
# Set image name for all services (needed for both builders and deduped)
service_config["image"] = image_name
if service_name in services_to_dedupe:
# Remove build config - this service will use the pre-built image
del service_config["build"]
continue
# This service will do the actual build - add cache config
cache_from_list = []
cache_to_list = []
component = get_component(dockerfile)
if not component:
# Skip services that don't clearly match frontend/backend
continue
# Get the hash for this component
component_hash = getattr(args, f"{component}_hash")
# Scope format: platform-{component}-{target}-{hash|ref}
# Example: platform-backend-server-abc123
if "type=gha" in args.cache_from:
# 1. Primary: exact hash match (most specific)
if component_hash:
hash_scope = f"platform-{component}-{target}-{component_hash}"
cache_from_list.append(f"{args.cache_from},scope={hash_scope}")
# 2. Fallback: branch-based cache
if git_ref_scope:
ref_scope = f"platform-{component}-{target}-{git_ref_scope}"
cache_from_list.append(f"{args.cache_from},scope={ref_scope}")
# 3. Fallback: dev branch cache (for PRs/feature branches)
if git_ref_scope and git_ref_scope != DEFAULT_BRANCH:
master_scope = f"platform-{component}-{target}-{DEFAULT_BRANCH}"
cache_from_list.append(f"{args.cache_from},scope={master_scope}")
if "type=gha" in args.cache_to:
# Write to both hash-based and branch-based scopes
if component_hash:
hash_scope = f"platform-{component}-{target}-{component_hash}"
cache_to_list.append(f"{args.cache_to},scope={hash_scope}")
if git_ref_scope:
ref_scope = f"platform-{component}-{target}-{git_ref_scope}"
cache_to_list.append(f"{args.cache_to},scope={ref_scope}")
# Ensure we have at least one cache source/target
if not cache_from_list:
cache_from_list.append(args.cache_from)
if not cache_to_list:
cache_to_list.append(args.cache_to)
build_config["cache_from"] = cache_from_list
build_config["cache_to"] = cache_to_list
modified_services.append(service_name)
# Write back to the same file
with open(args.source, "w") as f:
yaml.dump(compose, f, default_flow_style=False, sort_keys=False)
print(f"Added cache config to {len(modified_services)} services in {args.source}:")
for svc in modified_services:
svc_config = compose["services"][svc]
build_cfg = svc_config.get("build", {})
cache_from_list = build_cfg.get("cache_from", ["none"])
cache_to_list = build_cfg.get("cache_to", ["none"])
print(f" - {svc}")
print(f" image: {svc_config.get('image', 'N/A')}")
print(f" cache_from: {cache_from_list}")
print(f" cache_to: {cache_to_list}")
if services_to_dedupe:
print(
f"Deduplicated {len(services_to_dedupe)} services (will use pre-built images):"
)
for svc in services_to_dedupe:
print(f" - {svc} -> {compose['services'][svc].get('image', 'N/A')}")
if __name__ == "__main__":
main()

4
.gitignore vendored
View File

@@ -180,4 +180,6 @@ autogpt_platform/backend/settings.py
.claude/settings.local.json
CLAUDE.local.md
/autogpt_platform/backend/logs
.next
.next
# Implementation plans (generated by AI agents)
plans/

1
.nvmrc Normal file
View File

@@ -0,0 +1 @@
22

View File

@@ -1,3 +1,10 @@
default_install_hook_types:
- pre-commit
- pre-push
- post-checkout
default_stages: [pre-commit]
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
@@ -17,6 +24,7 @@ repos:
name: Detect secrets
description: Detects high entropy strings that are likely to be passwords.
files: ^autogpt_platform/
exclude: pnpm-lock\.yaml$
stages: [pre-push]
- repo: local
@@ -26,49 +34,106 @@ repos:
- id: poetry-install
name: Check & Install dependencies - AutoGPT Platform - Backend
alias: poetry-install-platform-backend
entry: poetry -C autogpt_platform/backend install
# include autogpt_libs source (since it's a path dependency)
files: ^autogpt_platform/(backend|autogpt_libs)/poetry\.lock$
types: [file]
entry: >
bash -c '
if [ -n "$PRE_COMMIT_FROM_REF" ]; then
git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
else
git diff --cached --name-only
fi | grep -qE "^autogpt_platform/(backend|autogpt_libs)/poetry\.lock$" || exit 0;
poetry -C autogpt_platform/backend install
'
always_run: true
language: system
pass_filenames: false
stages: [pre-commit, post-checkout]
- id: poetry-install
name: Check & Install dependencies - AutoGPT Platform - Libs
alias: poetry-install-platform-libs
entry: poetry -C autogpt_platform/autogpt_libs install
files: ^autogpt_platform/autogpt_libs/poetry\.lock$
types: [file]
entry: >
bash -c '
if [ -n "$PRE_COMMIT_FROM_REF" ]; then
git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
else
git diff --cached --name-only
fi | grep -qE "^autogpt_platform/autogpt_libs/poetry\.lock$" || exit 0;
poetry -C autogpt_platform/autogpt_libs install
'
always_run: true
language: system
pass_filenames: false
stages: [pre-commit, post-checkout]
- id: pnpm-install
name: Check & Install dependencies - AutoGPT Platform - Frontend
alias: pnpm-install-platform-frontend
entry: >
bash -c '
if [ -n "$PRE_COMMIT_FROM_REF" ]; then
git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
else
git diff --cached --name-only
fi | grep -qE "^autogpt_platform/frontend/pnpm-lock\.yaml$" || exit 0;
pnpm --prefix autogpt_platform/frontend install
'
always_run: true
language: system
pass_filenames: false
stages: [pre-commit, post-checkout]
- id: poetry-install
name: Check & Install dependencies - Classic - AutoGPT
alias: poetry-install-classic-autogpt
entry: poetry -C classic/original_autogpt install
entry: >
bash -c '
if [ -n "$PRE_COMMIT_FROM_REF" ]; then
git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
else
git diff --cached --name-only
fi | grep -qE "^classic/(original_autogpt|forge)/poetry\.lock$" || exit 0;
poetry -C classic/original_autogpt install
'
# include forge source (since it's a path dependency)
files: ^classic/(original_autogpt|forge)/poetry\.lock$
types: [file]
always_run: true
language: system
pass_filenames: false
stages: [pre-commit, post-checkout]
- id: poetry-install
name: Check & Install dependencies - Classic - Forge
alias: poetry-install-classic-forge
entry: poetry -C classic/forge install
files: ^classic/forge/poetry\.lock$
types: [file]
entry: >
bash -c '
if [ -n "$PRE_COMMIT_FROM_REF" ]; then
git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
else
git diff --cached --name-only
fi | grep -qE "^classic/forge/poetry\.lock$" || exit 0;
poetry -C classic/forge install
'
always_run: true
language: system
pass_filenames: false
stages: [pre-commit, post-checkout]
- id: poetry-install
name: Check & Install dependencies - Classic - Benchmark
alias: poetry-install-classic-benchmark
entry: poetry -C classic/benchmark install
files: ^classic/benchmark/poetry\.lock$
types: [file]
entry: >
bash -c '
if [ -n "$PRE_COMMIT_FROM_REF" ]; then
git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
else
git diff --cached --name-only
fi | grep -qE "^classic/benchmark/poetry\.lock$" || exit 0;
poetry -C classic/benchmark install
'
always_run: true
language: system
pass_filenames: false
stages: [pre-commit, post-checkout]
- repo: local
# For proper type checking, Prisma client must be up-to-date.
@@ -76,12 +141,54 @@ repos:
- id: prisma-generate
name: Prisma Generate - AutoGPT Platform - Backend
alias: prisma-generate-platform-backend
entry: bash -c 'cd autogpt_platform/backend && poetry run prisma generate'
entry: >
bash -c '
if [ -n "$PRE_COMMIT_FROM_REF" ]; then
git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
else
git diff --cached --name-only
fi | grep -qE "^autogpt_platform/((backend|autogpt_libs)/poetry\.lock|backend/schema\.prisma)$" || exit 0;
cd autogpt_platform/backend
&& poetry run prisma generate
&& poetry run gen-prisma-stub
'
# include everything that triggers poetry install + the prisma schema
files: ^autogpt_platform/((backend|autogpt_libs)/poetry\.lock|backend/schema.prisma)$
types: [file]
always_run: true
language: system
pass_filenames: false
stages: [pre-commit, post-checkout]
- id: export-api-schema
name: Export API schema - AutoGPT Platform - Backend -> Frontend
alias: export-api-schema-platform
entry: >
bash -c '
cd autogpt_platform/backend
&& poetry run export-api-schema --output ../frontend/src/app/api/openapi.json
&& cd ../frontend
&& pnpm prettier --write ./src/app/api/openapi.json
'
files: ^autogpt_platform/backend/
language: system
pass_filenames: false
- id: generate-api-client
name: Generate API client - AutoGPT Platform - Frontend
alias: generate-api-client-platform-frontend
entry: >
bash -c '
SCHEMA=autogpt_platform/frontend/src/app/api/openapi.json;
if [ -n "$PRE_COMMIT_FROM_REF" ]; then
git diff --quiet "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF" -- "$SCHEMA" && exit 0
else
git diff --quiet HEAD -- "$SCHEMA" && exit 0
fi;
cd autogpt_platform/frontend && pnpm generate:api
'
always_run: true
language: system
pass_filenames: false
stages: [pre-commit, post-checkout]
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.7.2

View File

@@ -1,2 +1,3 @@
*.ignore.*
*.ign.*
*.ign.*
.application.logs

View File

@@ -45,19 +45,47 @@ AutoGPT Platform is a monorepo containing:
- Backend/Frontend services use YAML anchors for consistent configuration
- Supabase services (`db/docker/docker-compose.yml`) follow the same pattern
### Branching Strategy
- **`dev`** is the main development branch. All PRs should target `dev`.
- **`master`** is the production branch. Only used for production releases.
### Creating Pull Requests
- Create the PR against the `dev` branch of the repository.
- Ensure the branch name is descriptive (e.g., `feature/add-new-block`)
- Use conventional commit messages (see below)
- Fill out the .github/PULL_REQUEST_TEMPLATE.md template as the PR description
- Always use `--body-file` to pass PR body — avoids shell interpretation of backticks and special characters:
```bash
PR_BODY=$(mktemp)
cat > "$PR_BODY" << 'PREOF'
## Summary
- use `backticks` freely here
PREOF
gh pr create --title "..." --body-file "$PR_BODY" --base dev
rm "$PR_BODY"
```
- Run the github pre-commit hooks to ensure code quality.
### Test-Driven Development (TDD)
When fixing a bug or adding a feature, follow a test-first approach:
1. **Write a failing test first** — create a test that reproduces the bug or validates the new behavior, marked with `@pytest.mark.xfail` (backend) or `.fixme` (Playwright). Run it to confirm it fails for the right reason.
2. **Implement the fix/feature** — write the minimal code to make the test pass.
3. **Remove the xfail marker** — once the test passes, remove the `xfail`/`.fixme` annotation and run the full test suite to confirm nothing else broke.
This ensures every change is covered by a test and that the test actually validates the intended behavior.
### Reviewing/Revising Pull Requests
- When the user runs /pr-comments or tries to fetch them, also run gh api /repos/Significant-Gravitas/AutoGPT/pulls/[issuenum]/reviews to get the reviews
- Use gh api /repos/Significant-Gravitas/AutoGPT/pulls/[issuenum]/reviews/[review_id]/comments to get the review contents
- Use gh api /repos/Significant-Gravitas/AutoGPT/issues/9924/comments to get the pr specific comments
Use `/pr-review` to review a PR or `/pr-address` to address comments.
When fetching comments manually:
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate` — top-level reviews
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate` — inline review comments (always paginate to avoid missing comments beyond page 1)
- `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments` — PR conversation comments
### Conventional Commits

View File

@@ -0,0 +1,40 @@
-- =============================================================
-- View: analytics.auth_activities
-- Looker source alias: ds49 | Charts: 1
-- =============================================================
-- DESCRIPTION
-- Tracks authentication events (login, logout, SSO, password
-- reset, etc.) from Supabase's internal audit log.
-- Useful for monitoring sign-in patterns and detecting anomalies.
--
-- SOURCE TABLES
-- auth.audit_log_entries — Supabase internal auth event log
--
-- OUTPUT COLUMNS
-- created_at TIMESTAMPTZ When the auth event occurred
-- actor_id TEXT User ID who triggered the event
-- actor_via_sso TEXT Whether the action was via SSO ('true'/'false')
-- action TEXT Event type (e.g. 'login', 'logout', 'token_refreshed')
--
-- WINDOW
-- Rolling 90 days from current date
--
-- EXAMPLE QUERIES
-- -- Daily login counts
-- SELECT DATE_TRUNC('day', created_at) AS day, COUNT(*) AS logins
-- FROM analytics.auth_activities
-- WHERE action = 'login'
-- GROUP BY 1 ORDER BY 1;
--
-- -- SSO vs password login breakdown
-- SELECT actor_via_sso, COUNT(*) FROM analytics.auth_activities
-- WHERE action = 'login' GROUP BY 1;
-- =============================================================
SELECT
created_at,
payload->>'actor_id' AS actor_id,
payload->>'actor_via_sso' AS actor_via_sso,
payload->>'action' AS action
FROM auth.audit_log_entries
WHERE created_at >= NOW() - INTERVAL '90 days'

View File

@@ -0,0 +1,105 @@
-- =============================================================
-- View: analytics.graph_execution
-- Looker source alias: ds16 | Charts: 21
-- =============================================================
-- DESCRIPTION
-- One row per agent graph execution (last 90 days).
-- Unpacks the JSONB stats column into individual numeric columns
-- and normalises the executionStatus — runs that failed due to
-- insufficient credits are reclassified as 'NO_CREDITS' for
-- easier filtering. Error messages are scrubbed of IDs and URLs
-- to allow safe grouping.
--
-- SOURCE TABLES
-- platform.AgentGraphExecution — Execution records
-- platform.AgentGraph — Agent graph metadata (for name)
-- platform.LibraryAgent — To flag possibly-AI (safe-mode) agents
--
-- OUTPUT COLUMNS
-- id TEXT Execution UUID
-- agentGraphId TEXT Agent graph UUID
-- agentGraphVersion INT Graph version number
-- executionStatus TEXT COMPLETED | FAILED | NO_CREDITS | RUNNING | QUEUED | TERMINATED
-- createdAt TIMESTAMPTZ When the execution was queued
-- updatedAt TIMESTAMPTZ Last status update time
-- userId TEXT Owner user UUID
-- agentGraphName TEXT Human-readable agent name
-- cputime DECIMAL Total CPU seconds consumed
-- walltime DECIMAL Total wall-clock seconds
-- node_count DECIMAL Number of nodes in the graph
-- nodes_cputime DECIMAL CPU time across all nodes
-- nodes_walltime DECIMAL Wall time across all nodes
-- execution_cost DECIMAL Credit cost of this execution
-- correctness_score FLOAT AI correctness score (if available)
-- possibly_ai BOOLEAN True if agent has sensitive_action_safe_mode enabled
-- groupedErrorMessage TEXT Scrubbed error string (IDs/URLs replaced with wildcards)
--
-- WINDOW
-- Rolling 90 days (createdAt > CURRENT_DATE - 90 days)
--
-- EXAMPLE QUERIES
-- -- Daily execution counts by status
-- SELECT DATE_TRUNC('day', "createdAt") AS day, "executionStatus", COUNT(*)
-- FROM analytics.graph_execution
-- GROUP BY 1, 2 ORDER BY 1;
--
-- -- Average cost per execution by agent
-- SELECT "agentGraphName", AVG("execution_cost") AS avg_cost, COUNT(*) AS runs
-- FROM analytics.graph_execution
-- WHERE "executionStatus" = 'COMPLETED'
-- GROUP BY 1 ORDER BY avg_cost DESC;
--
-- -- Top error messages
-- SELECT "groupedErrorMessage", COUNT(*) AS occurrences
-- FROM analytics.graph_execution
-- WHERE "executionStatus" = 'FAILED'
-- GROUP BY 1 ORDER BY 2 DESC LIMIT 20;
-- =============================================================
SELECT
ge."id" AS id,
ge."agentGraphId" AS agentGraphId,
ge."agentGraphVersion" AS agentGraphVersion,
CASE
WHEN jsonb_exists(ge."stats"::jsonb, 'error')
AND (
(ge."stats"::jsonb->>'error') ILIKE '%insufficient balance%'
OR (ge."stats"::jsonb->>'error') ILIKE '%you have no credits left%'
)
THEN 'NO_CREDITS'
ELSE CAST(ge."executionStatus" AS TEXT)
END AS executionStatus,
ge."createdAt" AS createdAt,
ge."updatedAt" AS updatedAt,
ge."userId" AS userId,
g."name" AS agentGraphName,
(ge."stats"::jsonb->>'cputime')::decimal AS cputime,
(ge."stats"::jsonb->>'walltime')::decimal AS walltime,
(ge."stats"::jsonb->>'node_count')::decimal AS node_count,
(ge."stats"::jsonb->>'nodes_cputime')::decimal AS nodes_cputime,
(ge."stats"::jsonb->>'nodes_walltime')::decimal AS nodes_walltime,
(ge."stats"::jsonb->>'cost')::decimal AS execution_cost,
(ge."stats"::jsonb->>'correctness_score')::float AS correctness_score,
COALESCE(la.possibly_ai, FALSE) AS possibly_ai,
REGEXP_REPLACE(
REGEXP_REPLACE(
TRIM(BOTH '"' FROM ge."stats"::jsonb->>'error'),
'(https?://)([A-Za-z0-9.-]+)(:[0-9]+)?(/[^\s]*)?',
'\1\2/...', 'gi'
),
'[a-zA-Z0-9_:-]*\d[a-zA-Z0-9_:-]*', '*', 'g'
) AS groupedErrorMessage
FROM platform."AgentGraphExecution" ge
LEFT JOIN platform."AgentGraph" g
ON ge."agentGraphId" = g."id"
AND ge."agentGraphVersion" = g."version"
LEFT JOIN (
SELECT DISTINCT ON ("userId", "agentGraphId")
"userId", "agentGraphId",
("settings"::jsonb->>'sensitive_action_safe_mode')::boolean AS possibly_ai
FROM platform."LibraryAgent"
WHERE "isDeleted" = FALSE
AND "isArchived" = FALSE
ORDER BY "userId", "agentGraphId", "agentGraphVersion" DESC
) la ON la."userId" = ge."userId" AND la."agentGraphId" = ge."agentGraphId"
WHERE ge."createdAt" > CURRENT_DATE - INTERVAL '90 days'

View File

@@ -0,0 +1,101 @@
-- =============================================================
-- View: analytics.node_block_execution
-- Looker source alias: ds14 | Charts: 11
-- =============================================================
-- DESCRIPTION
-- One row per node (block) execution (last 90 days).
-- Unpacks stats JSONB and joins to identify which block type
-- was run. For failed nodes, joins the error output and
-- scrubs it for safe grouping.
--
-- SOURCE TABLES
-- platform.AgentNodeExecution — Node execution records
-- platform.AgentNode — Node → block mapping
-- platform.AgentBlock — Block name/ID
-- platform.AgentNodeExecutionInputOutput — Error output values
--
-- OUTPUT COLUMNS
-- id TEXT Node execution UUID
-- agentGraphExecutionId TEXT Parent graph execution UUID
-- agentNodeId TEXT Node UUID within the graph
-- executionStatus TEXT COMPLETED | FAILED | QUEUED | RUNNING | TERMINATED
-- addedTime TIMESTAMPTZ When the node was queued
-- queuedTime TIMESTAMPTZ When it entered the queue
-- startedTime TIMESTAMPTZ When execution started
-- endedTime TIMESTAMPTZ When execution finished
-- inputSize BIGINT Input payload size in bytes
-- outputSize BIGINT Output payload size in bytes
-- walltime NUMERIC Wall-clock seconds for this node
-- cputime NUMERIC CPU seconds for this node
-- llmRetryCount INT Number of LLM retries
-- llmCallCount INT Number of LLM API calls made
-- inputTokenCount BIGINT LLM input tokens consumed
-- outputTokenCount BIGINT LLM output tokens produced
-- blockName TEXT Human-readable block name (e.g. 'OpenAIBlock')
-- blockId TEXT Block UUID
-- groupedErrorMessage TEXT Scrubbed error (IDs/URLs wildcarded)
-- errorMessage TEXT Raw error output (only set when FAILED)
--
-- WINDOW
-- Rolling 90 days (addedTime > CURRENT_DATE - 90 days)
--
-- EXAMPLE QUERIES
-- -- Most-used blocks by execution count
-- SELECT "blockName", COUNT(*) AS executions,
-- COUNT(*) FILTER (WHERE "executionStatus"='FAILED') AS failures
-- FROM analytics.node_block_execution
-- GROUP BY 1 ORDER BY executions DESC LIMIT 20;
--
-- -- Average LLM token usage per block
-- SELECT "blockName",
-- AVG("inputTokenCount") AS avg_input_tokens,
-- AVG("outputTokenCount") AS avg_output_tokens
-- FROM analytics.node_block_execution
-- WHERE "llmCallCount" > 0
-- GROUP BY 1 ORDER BY avg_input_tokens DESC;
--
-- -- Top failure reasons
-- SELECT "blockName", "groupedErrorMessage", COUNT(*) AS count
-- FROM analytics.node_block_execution
-- WHERE "executionStatus" = 'FAILED'
-- GROUP BY 1, 2 ORDER BY count DESC LIMIT 20;
-- =============================================================
SELECT
ne."id" AS id,
ne."agentGraphExecutionId" AS agentGraphExecutionId,
ne."agentNodeId" AS agentNodeId,
CAST(ne."executionStatus" AS TEXT) AS executionStatus,
ne."addedTime" AS addedTime,
ne."queuedTime" AS queuedTime,
ne."startedTime" AS startedTime,
ne."endedTime" AS endedTime,
(ne."stats"::jsonb->>'input_size')::bigint AS inputSize,
(ne."stats"::jsonb->>'output_size')::bigint AS outputSize,
(ne."stats"::jsonb->>'walltime')::numeric AS walltime,
(ne."stats"::jsonb->>'cputime')::numeric AS cputime,
(ne."stats"::jsonb->>'llm_retry_count')::int AS llmRetryCount,
(ne."stats"::jsonb->>'llm_call_count')::int AS llmCallCount,
(ne."stats"::jsonb->>'input_token_count')::bigint AS inputTokenCount,
(ne."stats"::jsonb->>'output_token_count')::bigint AS outputTokenCount,
b."name" AS blockName,
b."id" AS blockId,
REGEXP_REPLACE(
REGEXP_REPLACE(
TRIM(BOTH '"' FROM eio."data"::text),
'(https?://)([A-Za-z0-9.-]+)(:[0-9]+)?(/[^\s]*)?',
'\1\2/...', 'gi'
),
'[a-zA-Z0-9_:-]*\d[a-zA-Z0-9_:-]*', '*', 'g'
) AS groupedErrorMessage,
eio."data" AS errorMessage
FROM platform."AgentNodeExecution" ne
LEFT JOIN platform."AgentNode" nd
ON ne."agentNodeId" = nd."id"
LEFT JOIN platform."AgentBlock" b
ON nd."agentBlockId" = b."id"
LEFT JOIN platform."AgentNodeExecutionInputOutput" eio
ON eio."referencedByOutputExecId" = ne."id"
AND eio."name" = 'error'
AND ne."executionStatus" = 'FAILED'
WHERE ne."addedTime" > CURRENT_DATE - INTERVAL '90 days'

View File

@@ -0,0 +1,97 @@
-- =============================================================
-- View: analytics.retention_agent
-- Looker source alias: ds35 | Charts: 2
-- =============================================================
-- DESCRIPTION
-- Weekly cohort retention broken down per individual agent.
-- Cohort = week of a user's first use of THAT specific agent.
-- Tells you which agents keep users coming back vs. one-shot
-- use. Only includes cohorts from the last 180 days.
--
-- SOURCE TABLES
-- platform.AgentGraphExecution — Execution records (user × agent × time)
-- platform.AgentGraph — Agent names
--
-- OUTPUT COLUMNS
-- agent_id TEXT Agent graph UUID
-- agent_label TEXT 'AgentName [first8chars]'
-- agent_label_n TEXT 'AgentName [first8chars] (n=total_users)'
-- cohort_week_start DATE Week users first ran this agent
-- cohort_label TEXT ISO week label
-- cohort_label_n TEXT ISO week label with cohort size
-- user_lifetime_week INT Weeks since first use of this agent
-- cohort_users BIGINT Users in this cohort for this agent
-- active_users BIGINT Users who ran the agent again in week k
-- retention_rate FLOAT active_users / cohort_users
-- cohort_users_w0 BIGINT cohort_users only at week 0 (safe to SUM)
-- agent_total_users BIGINT Total users across all cohorts for this agent
--
-- EXAMPLE QUERIES
-- -- Best-retained agents at week 2
-- SELECT agent_label, AVG(retention_rate) AS w2_retention
-- FROM analytics.retention_agent
-- WHERE user_lifetime_week = 2 AND cohort_users >= 10
-- GROUP BY 1 ORDER BY w2_retention DESC LIMIT 10;
--
-- -- Agents with most unique users
-- SELECT DISTINCT agent_label, agent_total_users
-- FROM analytics.retention_agent
-- ORDER BY agent_total_users DESC LIMIT 20;
-- =============================================================
WITH params AS (SELECT 12::int AS max_weeks, (CURRENT_DATE - INTERVAL '180 days') AS cohort_start),
events AS (
SELECT e."userId"::text AS user_id, e."agentGraphId" AS agent_id,
e."createdAt"::timestamptz AS created_at,
DATE_TRUNC('week', e."createdAt")::date AS week_start
FROM platform."AgentGraphExecution" e
),
first_use AS (
SELECT user_id, agent_id, MIN(created_at) AS first_use_at,
DATE_TRUNC('week', MIN(created_at))::date AS cohort_week_start
FROM events GROUP BY 1,2
HAVING MIN(created_at) >= (SELECT cohort_start FROM params)
),
activity_weeks AS (SELECT DISTINCT user_id, agent_id, week_start FROM events),
user_week_age AS (
SELECT aw.user_id, aw.agent_id, fu.cohort_week_start,
((aw.week_start - DATE_TRUNC('week',fu.first_use_at)::date)/7)::int AS user_lifetime_week
FROM activity_weeks aw JOIN first_use fu USING (user_id, agent_id)
WHERE aw.week_start >= DATE_TRUNC('week',fu.first_use_at)::date
),
active_counts AS (
SELECT agent_id, cohort_week_start, user_lifetime_week, COUNT(DISTINCT user_id) AS active_users
FROM user_week_age WHERE user_lifetime_week >= 0 GROUP BY 1,2,3
),
cohort_sizes AS (
SELECT agent_id, cohort_week_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_use GROUP BY 1,2
),
cohort_caps AS (
SELECT cs.agent_id, cs.cohort_week_start, cs.cohort_users,
LEAST((SELECT max_weeks FROM params),
GREATEST(0,((DATE_TRUNC('week',CURRENT_DATE)::date-cs.cohort_week_start)/7)::int)) AS cap_weeks
FROM cohort_sizes cs
),
grid AS (
SELECT cc.agent_id, cc.cohort_week_start, gs AS user_lifetime_week, cc.cohort_users
FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_weeks) gs
),
agent_names AS (SELECT DISTINCT ON (g."id") g."id" AS agent_id, g."name" AS agent_name FROM platform."AgentGraph" g ORDER BY g."id", g."version" DESC),
agent_total_users AS (SELECT agent_id, SUM(cohort_users) AS agent_total_users FROM cohort_sizes GROUP BY 1)
SELECT
g.agent_id,
COALESCE(an.agent_name,'(unnamed)')||' ['||LEFT(g.agent_id::text,8)||']' AS agent_label,
COALESCE(an.agent_name,'(unnamed)')||' ['||LEFT(g.agent_id::text,8)||'] (n='||COALESCE(atu.agent_total_users,0)||')' AS agent_label_n,
g.cohort_week_start,
TO_CHAR(g.cohort_week_start,'IYYY-"W"IW') AS cohort_label,
TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')||' (n='||g.cohort_users||')' AS cohort_label_n,
g.user_lifetime_week, g.cohort_users,
COALESCE(ac.active_users,0) AS active_users,
COALESCE(ac.active_users,0)::float / NULLIF(g.cohort_users,0) AS retention_rate,
CASE WHEN g.user_lifetime_week=0 THEN g.cohort_users ELSE 0 END AS cohort_users_w0,
COALESCE(atu.agent_total_users,0) AS agent_total_users
FROM grid g
LEFT JOIN active_counts ac ON ac.agent_id=g.agent_id AND ac.cohort_week_start=g.cohort_week_start AND ac.user_lifetime_week=g.user_lifetime_week
LEFT JOIN agent_names an ON an.agent_id=g.agent_id
LEFT JOIN agent_total_users atu ON atu.agent_id=g.agent_id
ORDER BY agent_label, g.cohort_week_start, g.user_lifetime_week;

View File

@@ -0,0 +1,81 @@
-- =============================================================
-- View: analytics.retention_execution_daily
-- Looker source alias: ds111 | Charts: 1
-- =============================================================
-- DESCRIPTION
-- Daily cohort retention based on agent executions.
-- Cohort anchor = day of user's FIRST ever execution.
-- Only includes cohorts from the last 90 days, up to day 30.
-- Great for early engagement analysis (did users run another
-- agent the next day?).
--
-- SOURCE TABLES
-- platform.AgentGraphExecution — Execution records
--
-- OUTPUT COLUMNS
-- Same pattern as retention_login_daily.
-- cohort_day_start = day of first execution (not first login)
--
-- EXAMPLE QUERIES
-- -- Day-3 execution retention
-- SELECT cohort_label, retention_rate_bounded AS d3_retention
-- FROM analytics.retention_execution_daily
-- WHERE user_lifetime_day = 3 ORDER BY cohort_day_start;
-- =============================================================
WITH params AS (SELECT 30::int AS max_days, (CURRENT_DATE - INTERVAL '90 days') AS cohort_start),
events AS (
SELECT e."userId"::text AS user_id, e."createdAt"::timestamptz AS created_at,
DATE_TRUNC('day', e."createdAt")::date AS day_start
FROM platform."AgentGraphExecution" e WHERE e."userId" IS NOT NULL
),
first_exec AS (
SELECT user_id, MIN(created_at) AS first_exec_at,
DATE_TRUNC('day', MIN(created_at))::date AS cohort_day_start
FROM events GROUP BY 1
HAVING MIN(created_at) >= (SELECT cohort_start FROM params)
),
activity_days AS (SELECT DISTINCT user_id, day_start FROM events),
user_day_age AS (
SELECT ad.user_id, fe.cohort_day_start,
(ad.day_start - DATE_TRUNC('day',fe.first_exec_at)::date)::int AS user_lifetime_day
FROM activity_days ad JOIN first_exec fe USING (user_id)
WHERE ad.day_start >= DATE_TRUNC('day',fe.first_exec_at)::date
),
bounded_counts AS (
SELECT cohort_day_start, user_lifetime_day, COUNT(DISTINCT user_id) AS active_users_bounded
FROM user_day_age WHERE user_lifetime_day >= 0 GROUP BY 1,2
),
last_active AS (
SELECT cohort_day_start, user_id, MAX(user_lifetime_day) AS last_active_day FROM user_day_age GROUP BY 1,2
),
unbounded_counts AS (
SELECT la.cohort_day_start, gs AS user_lifetime_day, COUNT(*) AS retained_users_unbounded
FROM last_active la
CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_day,(SELECT max_days FROM params))) gs
GROUP BY 1,2
),
cohort_sizes AS (SELECT cohort_day_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_exec GROUP BY 1),
cohort_caps AS (
SELECT cs.cohort_day_start, cs.cohort_users,
LEAST((SELECT max_days FROM params), GREATEST(0,(CURRENT_DATE-cs.cohort_day_start)::int)) AS cap_days
FROM cohort_sizes cs
),
grid AS (
SELECT cc.cohort_day_start, gs AS user_lifetime_day, cc.cohort_users
FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_days) gs
)
SELECT
g.cohort_day_start,
TO_CHAR(g.cohort_day_start,'YYYY-MM-DD') AS cohort_label,
TO_CHAR(g.cohort_day_start,'YYYY-MM-DD')||' (n='||g.cohort_users||')' AS cohort_label_n,
g.user_lifetime_day, g.cohort_users,
COALESCE(b.active_users_bounded,0) AS active_users_bounded,
COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END AS retention_rate_bounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
CASE WHEN g.user_lifetime_day=0 THEN g.cohort_users ELSE 0 END AS cohort_users_d0
FROM grid g
LEFT JOIN bounded_counts b ON b.cohort_day_start=g.cohort_day_start AND b.user_lifetime_day=g.user_lifetime_day
LEFT JOIN unbounded_counts u ON u.cohort_day_start=g.cohort_day_start AND u.user_lifetime_day=g.user_lifetime_day
ORDER BY g.cohort_day_start, g.user_lifetime_day;

View File

@@ -0,0 +1,81 @@
-- =============================================================
-- View: analytics.retention_execution_weekly
-- Looker source alias: ds92 | Charts: 2
-- =============================================================
-- DESCRIPTION
-- Weekly cohort retention based on agent executions.
-- Cohort anchor = week of user's FIRST ever agent execution
-- (not first login). Only includes cohorts from the last 180 days.
-- Useful when you care about product engagement, not just visits.
--
-- SOURCE TABLES
-- platform.AgentGraphExecution — Execution records
--
-- OUTPUT COLUMNS
-- Same pattern as retention_login_weekly.
-- cohort_week_start = week of first execution (not first login)
--
-- EXAMPLE QUERIES
-- -- Week-2 execution retention
-- SELECT cohort_label, retention_rate_bounded
-- FROM analytics.retention_execution_weekly
-- WHERE user_lifetime_week = 2 ORDER BY cohort_week_start;
-- =============================================================
WITH params AS (SELECT 12::int AS max_weeks, (CURRENT_DATE - INTERVAL '180 days') AS cohort_start),
events AS (
SELECT e."userId"::text AS user_id, e."createdAt"::timestamptz AS created_at,
DATE_TRUNC('week', e."createdAt")::date AS week_start
FROM platform."AgentGraphExecution" e WHERE e."userId" IS NOT NULL
),
first_exec AS (
SELECT user_id, MIN(created_at) AS first_exec_at,
DATE_TRUNC('week', MIN(created_at))::date AS cohort_week_start
FROM events GROUP BY 1
HAVING MIN(created_at) >= (SELECT cohort_start FROM params)
),
activity_weeks AS (SELECT DISTINCT user_id, week_start FROM events),
user_week_age AS (
SELECT aw.user_id, fe.cohort_week_start,
((aw.week_start - DATE_TRUNC('week',fe.first_exec_at)::date)/7)::int AS user_lifetime_week
FROM activity_weeks aw JOIN first_exec fe USING (user_id)
WHERE aw.week_start >= DATE_TRUNC('week',fe.first_exec_at)::date
),
bounded_counts AS (
SELECT cohort_week_start, user_lifetime_week, COUNT(DISTINCT user_id) AS active_users_bounded
FROM user_week_age WHERE user_lifetime_week >= 0 GROUP BY 1,2
),
last_active AS (
SELECT cohort_week_start, user_id, MAX(user_lifetime_week) AS last_active_week FROM user_week_age GROUP BY 1,2
),
unbounded_counts AS (
SELECT la.cohort_week_start, gs AS user_lifetime_week, COUNT(*) AS retained_users_unbounded
FROM last_active la
CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_week,(SELECT max_weeks FROM params))) gs
GROUP BY 1,2
),
cohort_sizes AS (SELECT cohort_week_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_exec GROUP BY 1),
cohort_caps AS (
SELECT cs.cohort_week_start, cs.cohort_users,
LEAST((SELECT max_weeks FROM params),
GREATEST(0,((DATE_TRUNC('week',CURRENT_DATE)::date-cs.cohort_week_start)/7)::int)) AS cap_weeks
FROM cohort_sizes cs
),
grid AS (
SELECT cc.cohort_week_start, gs AS user_lifetime_week, cc.cohort_users
FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_weeks) gs
)
SELECT
g.cohort_week_start,
TO_CHAR(g.cohort_week_start,'IYYY-"W"IW') AS cohort_label,
TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')||' (n='||g.cohort_users||')' AS cohort_label_n,
g.user_lifetime_week, g.cohort_users,
COALESCE(b.active_users_bounded,0) AS active_users_bounded,
COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END AS retention_rate_bounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
CASE WHEN g.user_lifetime_week=0 THEN g.cohort_users ELSE 0 END AS cohort_users_w0
FROM grid g
LEFT JOIN bounded_counts b ON b.cohort_week_start=g.cohort_week_start AND b.user_lifetime_week=g.user_lifetime_week
LEFT JOIN unbounded_counts u ON u.cohort_week_start=g.cohort_week_start AND u.user_lifetime_week=g.user_lifetime_week
ORDER BY g.cohort_week_start, g.user_lifetime_week;

View File

@@ -0,0 +1,94 @@
-- =============================================================
-- View: analytics.retention_login_daily
-- Looker source alias: ds112 | Charts: 1
-- =============================================================
-- DESCRIPTION
-- Daily cohort retention based on login sessions.
-- Same logic as retention_login_weekly but at day granularity,
-- showing up to day 30 for cohorts from the last 90 days.
-- Useful for analysing early activation (days 1-7) in detail.
--
-- SOURCE TABLES
-- auth.sessions — Login session records
--
-- OUTPUT COLUMNS (same pattern as retention_login_weekly)
-- cohort_day_start DATE First day the cohort logged in
-- cohort_label TEXT Date string (e.g. '2025-03-01')
-- cohort_label_n TEXT Date + cohort size (e.g. '2025-03-01 (n=12)')
-- user_lifetime_day INT Days since first login (0 = signup day)
-- cohort_users BIGINT Total users in cohort
-- active_users_bounded BIGINT Users active on exactly day k
-- retained_users_unbounded BIGINT Users active any time on/after day k
-- retention_rate_bounded FLOAT bounded / cohort_users
-- retention_rate_unbounded FLOAT unbounded / cohort_users
-- cohort_users_d0 BIGINT cohort_users only at day 0, else 0 (safe to SUM)
--
-- EXAMPLE QUERIES
-- -- Day-1 retention rate (came back next day)
-- SELECT cohort_label, retention_rate_bounded AS d1_retention
-- FROM analytics.retention_login_daily
-- WHERE user_lifetime_day = 1 ORDER BY cohort_day_start;
--
-- -- Average retention curve across all cohorts
-- SELECT user_lifetime_day,
-- SUM(active_users_bounded)::float / NULLIF(SUM(cohort_users_d0), 0) AS avg_retention
-- FROM analytics.retention_login_daily
-- GROUP BY 1 ORDER BY 1;
-- =============================================================
WITH params AS (SELECT 30::int AS max_days, (CURRENT_DATE - INTERVAL '90 days')::date AS cohort_start),
events AS (
SELECT s.user_id::text AS user_id, s.created_at::timestamptz AS created_at,
DATE_TRUNC('day', s.created_at)::date AS day_start
FROM auth.sessions s WHERE s.user_id IS NOT NULL
),
first_login AS (
SELECT user_id, MIN(created_at) AS first_login_time,
DATE_TRUNC('day', MIN(created_at))::date AS cohort_day_start
FROM events GROUP BY 1
HAVING MIN(created_at) >= (SELECT cohort_start FROM params)
),
activity_days AS (SELECT DISTINCT user_id, day_start FROM events),
user_day_age AS (
SELECT ad.user_id, fl.cohort_day_start,
(ad.day_start - DATE_TRUNC('day', fl.first_login_time)::date)::int AS user_lifetime_day
FROM activity_days ad JOIN first_login fl USING (user_id)
WHERE ad.day_start >= DATE_TRUNC('day', fl.first_login_time)::date
),
bounded_counts AS (
SELECT cohort_day_start, user_lifetime_day, COUNT(DISTINCT user_id) AS active_users_bounded
FROM user_day_age WHERE user_lifetime_day >= 0 GROUP BY 1,2
),
last_active AS (
SELECT cohort_day_start, user_id, MAX(user_lifetime_day) AS last_active_day FROM user_day_age GROUP BY 1,2
),
unbounded_counts AS (
SELECT la.cohort_day_start, gs AS user_lifetime_day, COUNT(*) AS retained_users_unbounded
FROM last_active la
CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_day,(SELECT max_days FROM params))) gs
GROUP BY 1,2
),
cohort_sizes AS (SELECT cohort_day_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_login GROUP BY 1),
cohort_caps AS (
SELECT cs.cohort_day_start, cs.cohort_users,
LEAST((SELECT max_days FROM params), GREATEST(0,(CURRENT_DATE-cs.cohort_day_start)::int)) AS cap_days
FROM cohort_sizes cs
),
grid AS (
SELECT cc.cohort_day_start, gs AS user_lifetime_day, cc.cohort_users
FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_days) gs
)
SELECT
g.cohort_day_start,
TO_CHAR(g.cohort_day_start,'YYYY-MM-DD') AS cohort_label,
TO_CHAR(g.cohort_day_start,'YYYY-MM-DD')||' (n='||g.cohort_users||')' AS cohort_label_n,
g.user_lifetime_day, g.cohort_users,
COALESCE(b.active_users_bounded,0) AS active_users_bounded,
COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END AS retention_rate_bounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
CASE WHEN g.user_lifetime_day=0 THEN g.cohort_users ELSE 0 END AS cohort_users_d0
FROM grid g
LEFT JOIN bounded_counts b ON b.cohort_day_start=g.cohort_day_start AND b.user_lifetime_day=g.user_lifetime_day
LEFT JOIN unbounded_counts u ON u.cohort_day_start=g.cohort_day_start AND u.user_lifetime_day=g.user_lifetime_day
ORDER BY g.cohort_day_start, g.user_lifetime_day;

View File

@@ -0,0 +1,96 @@
-- =============================================================
-- View: analytics.retention_login_onboarded_weekly
-- Looker source alias: ds101 | Charts: 2
-- =============================================================
-- DESCRIPTION
-- Weekly cohort retention from login sessions, restricted to
-- users who "onboarded" — defined as running at least one
-- agent within 365 days of their first login.
-- Filters out users who signed up but never activated,
-- giving a cleaner view of engaged-user retention.
--
-- SOURCE TABLES
-- auth.sessions — Login session records
-- platform.AgentGraphExecution — Used to identify onboarders
--
-- OUTPUT COLUMNS
-- Same as retention_login_weekly (cohort_week_start, user_lifetime_week,
-- retention_rate_bounded, retention_rate_unbounded, etc.)
-- Only difference: cohort is filtered to onboarded users only.
--
-- EXAMPLE QUERIES
-- -- Compare week-4 retention: all users vs onboarded only
-- SELECT 'all_users' AS segment, AVG(retention_rate_bounded) AS w4_retention
-- FROM analytics.retention_login_weekly WHERE user_lifetime_week = 4
-- UNION ALL
-- SELECT 'onboarded', AVG(retention_rate_bounded)
-- FROM analytics.retention_login_onboarded_weekly WHERE user_lifetime_week = 4;
-- =============================================================
WITH params AS (SELECT 12::int AS max_weeks, 365::int AS onboarding_window_days),
events AS (
SELECT s.user_id::text AS user_id, s.created_at::timestamptz AS created_at,
DATE_TRUNC('week', s.created_at)::date AS week_start
FROM auth.sessions s WHERE s.user_id IS NOT NULL
),
first_login_all AS (
SELECT user_id, MIN(created_at) AS first_login_time,
DATE_TRUNC('week', MIN(created_at))::date AS cohort_week_start
FROM events GROUP BY 1
),
onboarders AS (
SELECT fl.user_id FROM first_login_all fl
WHERE EXISTS (
SELECT 1 FROM platform."AgentGraphExecution" e
WHERE e."userId"::text = fl.user_id
AND e."createdAt" >= fl.first_login_time
AND e."createdAt" < fl.first_login_time
+ make_interval(days => (SELECT onboarding_window_days FROM params))
)
),
first_login AS (SELECT * FROM first_login_all WHERE user_id IN (SELECT user_id FROM onboarders)),
activity_weeks AS (SELECT DISTINCT user_id, week_start FROM events),
user_week_age AS (
SELECT aw.user_id, fl.cohort_week_start,
((aw.week_start - DATE_TRUNC('week',fl.first_login_time)::date)/7)::int AS user_lifetime_week
FROM activity_weeks aw JOIN first_login fl USING (user_id)
WHERE aw.week_start >= DATE_TRUNC('week',fl.first_login_time)::date
),
bounded_counts AS (
SELECT cohort_week_start, user_lifetime_week, COUNT(DISTINCT user_id) AS active_users_bounded
FROM user_week_age WHERE user_lifetime_week >= 0 GROUP BY 1,2
),
last_active AS (
SELECT cohort_week_start, user_id, MAX(user_lifetime_week) AS last_active_week FROM user_week_age GROUP BY 1,2
),
unbounded_counts AS (
SELECT la.cohort_week_start, gs AS user_lifetime_week, COUNT(*) AS retained_users_unbounded
FROM last_active la
CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_week,(SELECT max_weeks FROM params))) gs
GROUP BY 1,2
),
cohort_sizes AS (SELECT cohort_week_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_login GROUP BY 1),
cohort_caps AS (
SELECT cs.cohort_week_start, cs.cohort_users,
LEAST((SELECT max_weeks FROM params),
GREATEST(0,((DATE_TRUNC('week',CURRENT_DATE)::date-cs.cohort_week_start)/7)::int)) AS cap_weeks
FROM cohort_sizes cs
),
grid AS (
SELECT cc.cohort_week_start, gs AS user_lifetime_week, cc.cohort_users
FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_weeks) gs
)
SELECT
g.cohort_week_start,
TO_CHAR(g.cohort_week_start,'IYYY-"W"IW') AS cohort_label,
TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')||' (n='||g.cohort_users||')' AS cohort_label_n,
g.user_lifetime_week, g.cohort_users,
COALESCE(b.active_users_bounded,0) AS active_users_bounded,
COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END AS retention_rate_bounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
CASE WHEN g.user_lifetime_week=0 THEN g.cohort_users ELSE 0 END AS cohort_users_w0
FROM grid g
LEFT JOIN bounded_counts b ON b.cohort_week_start=g.cohort_week_start AND b.user_lifetime_week=g.user_lifetime_week
LEFT JOIN unbounded_counts u ON u.cohort_week_start=g.cohort_week_start AND u.user_lifetime_week=g.user_lifetime_week
ORDER BY g.cohort_week_start, g.user_lifetime_week;

View File

@@ -0,0 +1,103 @@
-- =============================================================
-- View: analytics.retention_login_weekly
-- Looker source alias: ds83 | Charts: 2
-- =============================================================
-- DESCRIPTION
-- Weekly cohort retention based on login sessions.
-- Users are grouped by the ISO week of their first ever login.
-- For each cohort × lifetime-week combination, outputs both:
-- - bounded rate: % active in exactly that week
-- - unbounded rate: % who were ever active on or after that week
-- Weeks are capped to the cohort's actual age (no future data points).
--
-- SOURCE TABLES
-- auth.sessions — Login session records
--
-- HOW TO READ THE OUTPUT
-- cohort_week_start The Monday of the week users first logged in
-- user_lifetime_week 0 = signup week, 1 = one week later, etc.
-- retention_rate_bounded = active_users_bounded / cohort_users
-- retention_rate_unbounded = retained_users_unbounded / cohort_users
--
-- OUTPUT COLUMNS
-- cohort_week_start DATE First day of the cohort's signup week
-- cohort_label TEXT ISO week label (e.g. '2025-W01')
-- cohort_label_n TEXT ISO week label with cohort size (e.g. '2025-W01 (n=42)')
-- user_lifetime_week INT Weeks since first login (0 = signup week)
-- cohort_users BIGINT Total users in this cohort (denominator)
-- active_users_bounded BIGINT Users active in exactly week k
-- retained_users_unbounded BIGINT Users active any time on/after week k
-- retention_rate_bounded FLOAT bounded active / cohort_users
-- retention_rate_unbounded FLOAT unbounded retained / cohort_users
-- cohort_users_w0 BIGINT cohort_users only at week 0, else 0 (safe to SUM in pivot tables)
--
-- EXAMPLE QUERIES
-- -- Week-1 retention rate per cohort
-- SELECT cohort_label, retention_rate_bounded AS w1_retention
-- FROM analytics.retention_login_weekly
-- WHERE user_lifetime_week = 1
-- ORDER BY cohort_week_start;
--
-- -- Overall average retention curve (all cohorts combined)
-- SELECT user_lifetime_week,
-- SUM(active_users_bounded)::float / NULLIF(SUM(cohort_users_w0), 0) AS avg_retention
-- FROM analytics.retention_login_weekly
-- GROUP BY 1 ORDER BY 1;
-- =============================================================
WITH params AS (SELECT 12::int AS max_weeks),
events AS (
SELECT s.user_id::text AS user_id, s.created_at::timestamptz AS created_at,
DATE_TRUNC('week', s.created_at)::date AS week_start
FROM auth.sessions s WHERE s.user_id IS NOT NULL
),
first_login AS (
SELECT user_id, MIN(created_at) AS first_login_time,
DATE_TRUNC('week', MIN(created_at))::date AS cohort_week_start
FROM events GROUP BY 1
),
activity_weeks AS (SELECT DISTINCT user_id, week_start FROM events),
user_week_age AS (
SELECT aw.user_id, fl.cohort_week_start,
((aw.week_start - DATE_TRUNC('week', fl.first_login_time)::date) / 7)::int AS user_lifetime_week
FROM activity_weeks aw JOIN first_login fl USING (user_id)
WHERE aw.week_start >= DATE_TRUNC('week', fl.first_login_time)::date
),
bounded_counts AS (
SELECT cohort_week_start, user_lifetime_week, COUNT(DISTINCT user_id) AS active_users_bounded
FROM user_week_age WHERE user_lifetime_week >= 0 GROUP BY 1,2
),
last_active AS (
SELECT cohort_week_start, user_id, MAX(user_lifetime_week) AS last_active_week FROM user_week_age GROUP BY 1,2
),
unbounded_counts AS (
SELECT la.cohort_week_start, gs AS user_lifetime_week, COUNT(*) AS retained_users_unbounded
FROM last_active la
CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_week,(SELECT max_weeks FROM params))) gs
GROUP BY 1,2
),
cohort_sizes AS (SELECT cohort_week_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_login GROUP BY 1),
cohort_caps AS (
SELECT cs.cohort_week_start, cs.cohort_users,
LEAST((SELECT max_weeks FROM params),
GREATEST(0,((DATE_TRUNC('week',CURRENT_DATE)::date - cs.cohort_week_start)/7)::int)) AS cap_weeks
FROM cohort_sizes cs
),
grid AS (
SELECT cc.cohort_week_start, gs AS user_lifetime_week, cc.cohort_users
FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_weeks) gs
)
SELECT
g.cohort_week_start,
TO_CHAR(g.cohort_week_start,'IYYY-"W"IW') AS cohort_label,
TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')||' (n='||g.cohort_users||')' AS cohort_label_n,
g.user_lifetime_week, g.cohort_users,
COALESCE(b.active_users_bounded,0) AS active_users_bounded,
COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END AS retention_rate_bounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
CASE WHEN g.user_lifetime_week=0 THEN g.cohort_users ELSE 0 END AS cohort_users_w0
FROM grid g
LEFT JOIN bounded_counts b ON b.cohort_week_start=g.cohort_week_start AND b.user_lifetime_week=g.user_lifetime_week
LEFT JOIN unbounded_counts u ON u.cohort_week_start=g.cohort_week_start AND u.user_lifetime_week=g.user_lifetime_week
ORDER BY g.cohort_week_start, g.user_lifetime_week

View File

@@ -0,0 +1,71 @@
-- =============================================================
-- View: analytics.user_block_spending
-- Looker source alias: ds6 | Charts: 5
-- =============================================================
-- DESCRIPTION
-- One row per credit transaction (last 90 days).
-- Shows how users spend credits broken down by block type,
-- LLM provider and model. Joins node execution stats for
-- token-level detail.
--
-- SOURCE TABLES
-- platform.CreditTransaction — Credit debit/credit records
-- platform.AgentNodeExecution — Node execution stats (for token counts)
--
-- OUTPUT COLUMNS
-- transactionKey TEXT Unique transaction identifier
-- userId TEXT User who was charged
-- amount DECIMAL Credit amount (positive = credit, negative = debit)
-- negativeAmount DECIMAL amount * -1 (convenience for spend charts)
-- transactionType TEXT Transaction type (e.g. 'USAGE', 'REFUND', 'TOP_UP')
-- transactionTime TIMESTAMPTZ When the transaction was recorded
-- blockId TEXT Block UUID that triggered the spend
-- blockName TEXT Human-readable block name
-- llm_provider TEXT LLM provider (e.g. 'openai', 'anthropic')
-- llm_model TEXT Model name (e.g. 'gpt-4o', 'claude-3-5-sonnet')
-- node_exec_id TEXT Linked node execution UUID
-- llm_call_count INT LLM API calls made in that execution
-- llm_retry_count INT LLM retries in that execution
-- llm_input_token_count INT Input tokens consumed
-- llm_output_token_count INT Output tokens produced
--
-- WINDOW
-- Rolling 90 days (createdAt > CURRENT_DATE - 90 days)
--
-- EXAMPLE QUERIES
-- -- Total spend per user (last 90 days)
-- SELECT "userId", SUM("negativeAmount") AS total_spent
-- FROM analytics.user_block_spending
-- WHERE "transactionType" = 'USAGE'
-- GROUP BY 1 ORDER BY total_spent DESC;
--
-- -- Spend by LLM provider + model
-- SELECT "llm_provider", "llm_model",
-- SUM("negativeAmount") AS total_cost,
-- SUM("llm_input_token_count") AS input_tokens,
-- SUM("llm_output_token_count") AS output_tokens
-- FROM analytics.user_block_spending
-- WHERE "llm_provider" IS NOT NULL
-- GROUP BY 1, 2 ORDER BY total_cost DESC;
-- =============================================================
SELECT
c."transactionKey" AS transactionKey,
c."userId" AS userId,
c."amount" AS amount,
c."amount" * -1 AS negativeAmount,
c."type" AS transactionType,
c."createdAt" AS transactionTime,
c.metadata->>'block_id' AS blockId,
c.metadata->>'block' AS blockName,
c.metadata->'input'->'credentials'->>'provider' AS llm_provider,
c.metadata->'input'->>'model' AS llm_model,
c.metadata->>'node_exec_id' AS node_exec_id,
(ne."stats"->>'llm_call_count')::int AS llm_call_count,
(ne."stats"->>'llm_retry_count')::int AS llm_retry_count,
(ne."stats"->>'input_token_count')::int AS llm_input_token_count,
(ne."stats"->>'output_token_count')::int AS llm_output_token_count
FROM platform."CreditTransaction" c
LEFT JOIN platform."AgentNodeExecution" ne
ON (c.metadata->>'node_exec_id') = ne."id"::text
WHERE c."createdAt" > CURRENT_DATE - INTERVAL '90 days'

View File

@@ -0,0 +1,45 @@
-- =============================================================
-- View: analytics.user_onboarding
-- Looker source alias: ds68 | Charts: 3
-- =============================================================
-- DESCRIPTION
-- One row per user onboarding record. Contains the user's
-- stated usage reason, selected integrations, completed
-- onboarding steps and optional first agent selection.
-- Full history (no date filter) since onboarding happens
-- once per user.
--
-- SOURCE TABLES
-- platform.UserOnboarding — Onboarding state per user
--
-- OUTPUT COLUMNS
-- id TEXT Onboarding record UUID
-- createdAt TIMESTAMPTZ When onboarding started
-- updatedAt TIMESTAMPTZ Last update to onboarding state
-- usageReason TEXT Why user signed up (e.g. 'work', 'personal')
-- integrations TEXT[] Array of integration names the user selected
-- userId TEXT User UUID
-- completedSteps TEXT[] Array of onboarding step enums completed
-- selectedStoreListingVersionId TEXT First marketplace agent the user chose (if any)
--
-- EXAMPLE QUERIES
-- -- Usage reason breakdown
-- SELECT "usageReason", COUNT(*) FROM analytics.user_onboarding GROUP BY 1;
--
-- -- Completion rate per step
-- SELECT step, COUNT(*) AS users_completed
-- FROM analytics.user_onboarding
-- CROSS JOIN LATERAL UNNEST("completedSteps") AS step
-- GROUP BY 1 ORDER BY users_completed DESC;
-- =============================================================
SELECT
id,
"createdAt",
"updatedAt",
"usageReason",
integrations,
"userId",
"completedSteps",
"selectedStoreListingVersionId"
FROM platform."UserOnboarding"

View File

@@ -0,0 +1,100 @@
-- =============================================================
-- View: analytics.user_onboarding_funnel
-- Looker source alias: ds74 | Charts: 1
-- =============================================================
-- DESCRIPTION
-- Pre-aggregated onboarding funnel showing how many users
-- completed each step and the drop-off percentage from the
-- previous step. One row per onboarding step (all 22 steps
-- always present, even with 0 completions — prevents sparse
-- gaps from making LAG compare the wrong predecessors).
--
-- SOURCE TABLES
-- platform.UserOnboarding — Onboarding records with completedSteps array
--
-- OUTPUT COLUMNS
-- step TEXT Onboarding step enum name (e.g. 'WELCOME', 'CONGRATS')
-- step_order INT Numeric position in the funnel (1=first, 22=last)
-- users_completed BIGINT Distinct users who completed this step
-- pct_from_prev NUMERIC % of users from the previous step who reached this one
--
-- STEP ORDER
-- 1 WELCOME 9 MARKETPLACE_VISIT 17 SCHEDULE_AGENT
-- 2 USAGE_REASON 10 MARKETPLACE_ADD_AGENT 18 RUN_AGENTS
-- 3 INTEGRATIONS 11 MARKETPLACE_RUN_AGENT 19 RUN_3_DAYS
-- 4 AGENT_CHOICE 12 BUILDER_OPEN 20 TRIGGER_WEBHOOK
-- 5 AGENT_NEW_RUN 13 BUILDER_SAVE_AGENT 21 RUN_14_DAYS
-- 6 AGENT_INPUT 14 BUILDER_RUN_AGENT 22 RUN_AGENTS_100
-- 7 CONGRATS 15 VISIT_COPILOT
-- 8 GET_RESULTS 16 RE_RUN_AGENT
--
-- WINDOW
-- Users who started onboarding in the last 90 days
--
-- EXAMPLE QUERIES
-- -- Full funnel
-- SELECT * FROM analytics.user_onboarding_funnel ORDER BY step_order;
--
-- -- Biggest drop-off point
-- SELECT step, pct_from_prev FROM analytics.user_onboarding_funnel
-- ORDER BY pct_from_prev ASC LIMIT 3;
-- =============================================================
WITH all_steps AS (
-- Complete ordered grid of all 22 steps so zero-completion steps
-- are always present, keeping LAG comparisons correct.
SELECT step_name, step_order
FROM (VALUES
('WELCOME', 1),
('USAGE_REASON', 2),
('INTEGRATIONS', 3),
('AGENT_CHOICE', 4),
('AGENT_NEW_RUN', 5),
('AGENT_INPUT', 6),
('CONGRATS', 7),
('GET_RESULTS', 8),
('MARKETPLACE_VISIT', 9),
('MARKETPLACE_ADD_AGENT', 10),
('MARKETPLACE_RUN_AGENT', 11),
('BUILDER_OPEN', 12),
('BUILDER_SAVE_AGENT', 13),
('BUILDER_RUN_AGENT', 14),
('VISIT_COPILOT', 15),
('RE_RUN_AGENT', 16),
('SCHEDULE_AGENT', 17),
('RUN_AGENTS', 18),
('RUN_3_DAYS', 19),
('TRIGGER_WEBHOOK', 20),
('RUN_14_DAYS', 21),
('RUN_AGENTS_100', 22)
) AS t(step_name, step_order)
),
raw AS (
SELECT
u."userId",
step_txt::text AS step
FROM platform."UserOnboarding" u
CROSS JOIN LATERAL UNNEST(u."completedSteps") AS step_txt
WHERE u."createdAt" >= CURRENT_DATE - INTERVAL '90 days'
),
step_counts AS (
SELECT step, COUNT(DISTINCT "userId") AS users_completed
FROM raw GROUP BY step
),
funnel AS (
SELECT
a.step_name AS step,
a.step_order,
COALESCE(sc.users_completed, 0) AS users_completed,
ROUND(
100.0 * COALESCE(sc.users_completed, 0)
/ NULLIF(
LAG(COALESCE(sc.users_completed, 0)) OVER (ORDER BY a.step_order),
0
),
2
) AS pct_from_prev
FROM all_steps a
LEFT JOIN step_counts sc ON sc.step = a.step_name
)
SELECT * FROM funnel ORDER BY step_order

View File

@@ -0,0 +1,41 @@
-- =============================================================
-- View: analytics.user_onboarding_integration
-- Looker source alias: ds75 | Charts: 1
-- =============================================================
-- DESCRIPTION
-- Pre-aggregated count of users who selected each integration
-- during onboarding. One row per integration type, sorted
-- by popularity.
--
-- SOURCE TABLES
-- platform.UserOnboarding — integrations array column
--
-- OUTPUT COLUMNS
-- integration TEXT Integration name (e.g. 'github', 'slack', 'notion')
-- users_with_integration BIGINT Distinct users who selected this integration
--
-- WINDOW
-- Users who started onboarding in the last 90 days
--
-- EXAMPLE QUERIES
-- -- Full integration popularity ranking
-- SELECT * FROM analytics.user_onboarding_integration;
--
-- -- Top 5 integrations
-- SELECT * FROM analytics.user_onboarding_integration LIMIT 5;
-- =============================================================
WITH exploded AS (
SELECT
u."userId" AS user_id,
UNNEST(u."integrations") AS integration
FROM platform."UserOnboarding" u
WHERE u."createdAt" >= CURRENT_DATE - INTERVAL '90 days'
)
SELECT
integration,
COUNT(DISTINCT user_id) AS users_with_integration
FROM exploded
WHERE integration IS NOT NULL AND integration <> ''
GROUP BY integration
ORDER BY users_with_integration DESC

View File

@@ -0,0 +1,145 @@
-- =============================================================
-- View: analytics.users_activities
-- Looker source alias: ds56 | Charts: 5
-- =============================================================
-- DESCRIPTION
-- One row per user with lifetime activity summary.
-- Joins login sessions with agent graphs, executions and
-- node-level runs to give a full picture of how engaged
-- each user is. Includes a convenience flag for 7-day
-- activation (did the user return at least 7 days after
-- their first login?).
--
-- SOURCE TABLES
-- auth.sessions — Login/session records
-- platform.AgentGraph — Graphs (agents) built by the user
-- platform.AgentGraphExecution — Agent run history
-- platform.AgentNodeExecution — Individual block execution history
--
-- PERFORMANCE NOTE
-- Each CTE aggregates its own table independently by userId.
-- This avoids the fan-out that occurs when driving every join
-- from user_logins across the two largest tables
-- (AgentGraphExecution and AgentNodeExecution).
--
-- OUTPUT COLUMNS
-- user_id TEXT Supabase user UUID
-- first_login_time TIMESTAMPTZ First ever session created_at
-- last_login_time TIMESTAMPTZ Most recent session created_at
-- last_visit_time TIMESTAMPTZ Max of last refresh or login
-- last_agent_save_time TIMESTAMPTZ Last time user saved an agent graph
-- agent_count BIGINT Number of distinct active graphs built (0 if none)
-- first_agent_run_time TIMESTAMPTZ First ever graph execution
-- last_agent_run_time TIMESTAMPTZ Most recent graph execution
-- unique_agent_runs BIGINT Distinct agent graphs ever run (0 if none)
-- agent_runs BIGINT Total graph execution count (0 if none)
-- node_execution_count BIGINT Total node executions across all runs
-- node_execution_failed BIGINT Node executions with FAILED status
-- node_execution_completed BIGINT Node executions with COMPLETED status
-- node_execution_terminated BIGINT Node executions with TERMINATED status
-- node_execution_queued BIGINT Node executions with QUEUED status
-- node_execution_running BIGINT Node executions with RUNNING status
-- is_active_after_7d INT 1=returned after day 7, 0=did not, NULL=too early to tell
-- node_execution_incomplete BIGINT Node executions with INCOMPLETE status
-- node_execution_review BIGINT Node executions with REVIEW status
--
-- EXAMPLE QUERIES
-- -- Users who ran at least one agent and returned after 7 days
-- SELECT COUNT(*) FROM analytics.users_activities
-- WHERE agent_runs > 0 AND is_active_after_7d = 1;
--
-- -- Top 10 most active users by agent runs
-- SELECT user_id, agent_runs, node_execution_count
-- FROM analytics.users_activities
-- ORDER BY agent_runs DESC LIMIT 10;
--
-- -- 7-day activation rate
-- SELECT
-- SUM(CASE WHEN is_active_after_7d = 1 THEN 1 ELSE 0 END)::float
-- / NULLIF(COUNT(CASE WHEN is_active_after_7d IS NOT NULL THEN 1 END), 0)
-- AS activation_rate
-- FROM analytics.users_activities;
-- =============================================================
WITH user_logins AS (
SELECT
user_id::text AS user_id,
MIN(created_at) AS first_login_time,
MAX(created_at) AS last_login_time,
GREATEST(
MAX(refreshed_at)::timestamptz,
MAX(created_at)::timestamptz
) AS last_visit_time
FROM auth.sessions
GROUP BY user_id
),
user_agents AS (
-- Aggregate AgentGraph directly by userId (no fan-out from user_logins)
SELECT
"userId"::text AS user_id,
MAX("updatedAt") AS last_agent_save_time,
COUNT(DISTINCT "id") AS agent_count
FROM platform."AgentGraph"
WHERE "isActive"
GROUP BY "userId"
),
user_graph_runs AS (
-- Aggregate AgentGraphExecution directly by userId
SELECT
"userId"::text AS user_id,
MIN("createdAt") AS first_agent_run_time,
MAX("createdAt") AS last_agent_run_time,
COUNT(DISTINCT "agentGraphId") AS unique_agent_runs,
COUNT("id") AS agent_runs
FROM platform."AgentGraphExecution"
GROUP BY "userId"
),
user_node_runs AS (
-- Aggregate AgentNodeExecution directly; resolve userId via a
-- single join to AgentGraphExecution instead of fanning out from
-- user_logins through both large tables.
SELECT
g."userId"::text AS user_id,
COUNT(*) AS node_execution_count,
COUNT(*) FILTER (WHERE n."executionStatus" = 'FAILED') AS node_execution_failed,
COUNT(*) FILTER (WHERE n."executionStatus" = 'COMPLETED') AS node_execution_completed,
COUNT(*) FILTER (WHERE n."executionStatus" = 'TERMINATED') AS node_execution_terminated,
COUNT(*) FILTER (WHERE n."executionStatus" = 'QUEUED') AS node_execution_queued,
COUNT(*) FILTER (WHERE n."executionStatus" = 'RUNNING') AS node_execution_running,
COUNT(*) FILTER (WHERE n."executionStatus" = 'INCOMPLETE') AS node_execution_incomplete,
COUNT(*) FILTER (WHERE n."executionStatus" = 'REVIEW') AS node_execution_review
FROM platform."AgentNodeExecution" n
JOIN platform."AgentGraphExecution" g
ON g."id" = n."agentGraphExecutionId"
GROUP BY g."userId"
)
SELECT
ul.user_id,
ul.first_login_time,
ul.last_login_time,
ul.last_visit_time,
ua.last_agent_save_time,
COALESCE(ua.agent_count, 0) AS agent_count,
gr.first_agent_run_time,
gr.last_agent_run_time,
COALESCE(gr.unique_agent_runs, 0) AS unique_agent_runs,
COALESCE(gr.agent_runs, 0) AS agent_runs,
COALESCE(nr.node_execution_count, 0) AS node_execution_count,
COALESCE(nr.node_execution_failed, 0) AS node_execution_failed,
COALESCE(nr.node_execution_completed, 0) AS node_execution_completed,
COALESCE(nr.node_execution_terminated, 0) AS node_execution_terminated,
COALESCE(nr.node_execution_queued, 0) AS node_execution_queued,
COALESCE(nr.node_execution_running, 0) AS node_execution_running,
CASE
WHEN ul.first_login_time < NOW() - INTERVAL '7 days'
AND ul.last_visit_time >= ul.first_login_time + INTERVAL '7 days' THEN 1
WHEN ul.first_login_time < NOW() - INTERVAL '7 days'
AND ul.last_visit_time < ul.first_login_time + INTERVAL '7 days' THEN 0
ELSE NULL
END AS is_active_after_7d,
COALESCE(nr.node_execution_incomplete, 0) AS node_execution_incomplete,
COALESCE(nr.node_execution_review, 0) AS node_execution_review
FROM user_logins ul
LEFT JOIN user_agents ua ON ul.user_id = ua.user_id
LEFT JOIN user_graph_runs gr ON ul.user_id = gr.user_id
LEFT JOIN user_node_runs nr ON ul.user_id = nr.user_id

View File

@@ -448,61 +448,61 @@ toml = ["tomli ; python_full_version <= \"3.11.0a6\""]
[[package]]
name = "cryptography"
version = "46.0.4"
version = "46.0.5"
description = "cryptography is a package which provides cryptographic recipes and primitives to Python developers."
optional = false
python-versions = "!=3.9.0,!=3.9.1,>=3.8"
groups = ["main"]
files = [
{file = "cryptography-46.0.4-cp311-abi3-macosx_10_9_universal2.whl", hash = "sha256:281526e865ed4166009e235afadf3a4c4cba6056f99336a99efba65336fd5485"},
{file = "cryptography-46.0.4-cp311-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5f14fba5bf6f4390d7ff8f086c566454bff0411f6d8aa7af79c88b6f9267aecc"},
{file = "cryptography-46.0.4-cp311-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:47bcd19517e6389132f76e2d5303ded6cf3f78903da2158a671be8de024f4cd0"},
{file = "cryptography-46.0.4-cp311-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:01df4f50f314fbe7009f54046e908d1754f19d0c6d3070df1e6268c5a4af09fa"},
{file = "cryptography-46.0.4-cp311-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:5aa3e463596b0087b3da0dbe2b2487e9fc261d25da85754e30e3b40637d61f81"},
{file = "cryptography-46.0.4-cp311-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:0a9ad24359fee86f131836a9ac3bffc9329e956624a2d379b613f8f8abaf5255"},
{file = "cryptography-46.0.4-cp311-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:dc1272e25ef673efe72f2096e92ae39dea1a1a450dd44918b15351f72c5a168e"},
{file = "cryptography-46.0.4-cp311-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:de0f5f4ec8711ebc555f54735d4c673fc34b65c44283895f1a08c2b49d2fd99c"},
{file = "cryptography-46.0.4-cp311-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:eeeb2e33d8dbcccc34d64651f00a98cb41b2dc69cef866771a5717e6734dfa32"},
{file = "cryptography-46.0.4-cp311-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:3d425eacbc9aceafd2cb429e42f4e5d5633c6f873f5e567077043ef1b9bbf616"},
{file = "cryptography-46.0.4-cp311-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:91627ebf691d1ea3976a031b61fb7bac1ccd745afa03602275dda443e11c8de0"},
{file = "cryptography-46.0.4-cp311-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:2d08bc22efd73e8854b0b7caff402d735b354862f1145d7be3b9c0f740fef6a0"},
{file = "cryptography-46.0.4-cp311-abi3-win32.whl", hash = "sha256:82a62483daf20b8134f6e92898da70d04d0ef9a75829d732ea1018678185f4f5"},
{file = "cryptography-46.0.4-cp311-abi3-win_amd64.whl", hash = "sha256:6225d3ebe26a55dbc8ead5ad1265c0403552a63336499564675b29eb3184c09b"},
{file = "cryptography-46.0.4-cp314-cp314t-macosx_10_9_universal2.whl", hash = "sha256:485e2b65d25ec0d901bca7bcae0f53b00133bf3173916d8e421f6fddde103908"},
{file = "cryptography-46.0.4-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:078e5f06bd2fa5aea5a324f2a09f914b1484f1d0c2a4d6a8a28c74e72f65f2da"},
{file = "cryptography-46.0.4-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:dce1e4f068f03008da7fa51cc7abc6ddc5e5de3e3d1550334eaf8393982a5829"},
{file = "cryptography-46.0.4-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:2067461c80271f422ee7bdbe79b9b4be54a5162e90345f86a23445a0cf3fd8a2"},
{file = "cryptography-46.0.4-cp314-cp314t-manylinux_2_28_ppc64le.whl", hash = "sha256:c92010b58a51196a5f41c3795190203ac52edfd5dc3ff99149b4659eba9d2085"},
{file = "cryptography-46.0.4-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:829c2b12bbc5428ab02d6b7f7e9bbfd53e33efd6672d21341f2177470171ad8b"},
{file = "cryptography-46.0.4-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:62217ba44bf81b30abaeda1488686a04a702a261e26f87db51ff61d9d3510abd"},
{file = "cryptography-46.0.4-cp314-cp314t-manylinux_2_34_aarch64.whl", hash = "sha256:9c2da296c8d3415b93e6053f5a728649a87a48ce084a9aaf51d6e46c87c7f2d2"},
{file = "cryptography-46.0.4-cp314-cp314t-manylinux_2_34_ppc64le.whl", hash = "sha256:9b34d8ba84454641a6bf4d6762d15847ecbd85c1316c0a7984e6e4e9f748ec2e"},
{file = "cryptography-46.0.4-cp314-cp314t-manylinux_2_34_x86_64.whl", hash = "sha256:df4a817fa7138dd0c96c8c8c20f04b8aaa1fac3bbf610913dcad8ea82e1bfd3f"},
{file = "cryptography-46.0.4-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:b1de0ebf7587f28f9190b9cb526e901bf448c9e6a99655d2b07fff60e8212a82"},
{file = "cryptography-46.0.4-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:9b4d17bc7bd7cdd98e3af40b441feaea4c68225e2eb2341026c84511ad246c0c"},
{file = "cryptography-46.0.4-cp314-cp314t-win32.whl", hash = "sha256:c411f16275b0dea722d76544a61d6421e2cc829ad76eec79280dbdc9ddf50061"},
{file = "cryptography-46.0.4-cp314-cp314t-win_amd64.whl", hash = "sha256:728fedc529efc1439eb6107b677f7f7558adab4553ef8669f0d02d42d7b959a7"},
{file = "cryptography-46.0.4-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:a9556ba711f7c23f77b151d5798f3ac44a13455cc68db7697a1096e6d0563cab"},
{file = "cryptography-46.0.4-cp38-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:8bf75b0259e87fa70bddc0b8b4078b76e7fd512fd9afae6c1193bcf440a4dbef"},
{file = "cryptography-46.0.4-cp38-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:3c268a3490df22270955966ba236d6bc4a8f9b6e4ffddb78aac535f1a5ea471d"},
{file = "cryptography-46.0.4-cp38-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:812815182f6a0c1d49a37893a303b44eaac827d7f0d582cecfc81b6427f22973"},
{file = "cryptography-46.0.4-cp38-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:a90e43e3ef65e6dcf969dfe3bb40cbf5aef0d523dff95bfa24256be172a845f4"},
{file = "cryptography-46.0.4-cp38-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:a05177ff6296644ef2876fce50518dffb5bcdf903c85250974fc8bc85d54c0af"},
{file = "cryptography-46.0.4-cp38-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:daa392191f626d50f1b136c9b4cf08af69ca8279d110ea24f5c2700054d2e263"},
{file = "cryptography-46.0.4-cp38-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:e07ea39c5b048e085f15923511d8121e4a9dc45cee4e3b970ca4f0d338f23095"},
{file = "cryptography-46.0.4-cp38-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:d5a45ddc256f492ce42a4e35879c5e5528c09cd9ad12420828c972951d8e016b"},
{file = "cryptography-46.0.4-cp38-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:6bb5157bf6a350e5b28aee23beb2d84ae6f5be390b2f8ee7ea179cda077e1019"},
{file = "cryptography-46.0.4-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:dd5aba870a2c40f87a3af043e0dee7d9eb02d4aff88a797b48f2b43eff8c3ab4"},
{file = "cryptography-46.0.4-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:93d8291da8d71024379ab2cb0b5c57915300155ad42e07f76bea6ad838d7e59b"},
{file = "cryptography-46.0.4-cp38-abi3-win32.whl", hash = "sha256:0563655cb3c6d05fb2afe693340bc050c30f9f34e15763361cf08e94749401fc"},
{file = "cryptography-46.0.4-cp38-abi3-win_amd64.whl", hash = "sha256:fa0900b9ef9c49728887d1576fd8d9e7e3ea872fa9b25ef9b64888adc434e976"},
{file = "cryptography-46.0.4-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:766330cce7416c92b5e90c3bb71b1b79521760cdcfc3a6a1a182d4c9fab23d2b"},
{file = "cryptography-46.0.4-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl", hash = "sha256:c236a44acfb610e70f6b3e1c3ca20ff24459659231ef2f8c48e879e2d32b73da"},
{file = "cryptography-46.0.4-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl", hash = "sha256:8a15fb869670efa8f83cbffbc8753c1abf236883225aed74cd179b720ac9ec80"},
{file = "cryptography-46.0.4-pp311-pypy311_pp73-manylinux_2_34_aarch64.whl", hash = "sha256:fdc3daab53b212472f1524d070735b2f0c214239df131903bae1d598016fa822"},
{file = "cryptography-46.0.4-pp311-pypy311_pp73-manylinux_2_34_x86_64.whl", hash = "sha256:44cc0675b27cadb71bdbb96099cca1fa051cd11d2ade09e5cd3a2edb929ed947"},
{file = "cryptography-46.0.4-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:be8c01a7d5a55f9a47d1888162b76c8f49d62b234d88f0ff91a9fbebe32ffbc3"},
{file = "cryptography-46.0.4.tar.gz", hash = "sha256:bfd019f60f8abc2ed1b9be4ddc21cfef059c841d86d710bb69909a688cbb8f59"},
{file = "cryptography-46.0.5-cp311-abi3-macosx_10_9_universal2.whl", hash = "sha256:351695ada9ea9618b3500b490ad54c739860883df6c1f555e088eaf25b1bbaad"},
{file = "cryptography-46.0.5-cp311-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:c18ff11e86df2e28854939acde2d003f7984f721eba450b56a200ad90eeb0e6b"},
{file = "cryptography-46.0.5-cp311-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:4d7e3d356b8cd4ea5aff04f129d5f66ebdc7b6f8eae802b93739ed520c47c79b"},
{file = "cryptography-46.0.5-cp311-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:50bfb6925eff619c9c023b967d5b77a54e04256c4281b0e21336a130cd7fc263"},
{file = "cryptography-46.0.5-cp311-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:803812e111e75d1aa73690d2facc295eaefd4439be1023fefc4995eaea2af90d"},
{file = "cryptography-46.0.5-cp311-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:3ee190460e2fbe447175cda91b88b84ae8322a104fc27766ad09428754a618ed"},
{file = "cryptography-46.0.5-cp311-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:f145bba11b878005c496e93e257c1e88f154d278d2638e6450d17e0f31e558d2"},
{file = "cryptography-46.0.5-cp311-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:e9251e3be159d1020c4030bd2e5f84d6a43fe54b6c19c12f51cde9542a2817b2"},
{file = "cryptography-46.0.5-cp311-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:47fb8a66058b80e509c47118ef8a75d14c455e81ac369050f20ba0d23e77fee0"},
{file = "cryptography-46.0.5-cp311-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:4c3341037c136030cb46e4b1e17b7418ea4cbd9dd207e4a6f3b2b24e0d4ac731"},
{file = "cryptography-46.0.5-cp311-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:890bcb4abd5a2d3f852196437129eb3667d62630333aacc13dfd470fad3aaa82"},
{file = "cryptography-46.0.5-cp311-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:80a8d7bfdf38f87ca30a5391c0c9ce4ed2926918e017c29ddf643d0ed2778ea1"},
{file = "cryptography-46.0.5-cp311-abi3-win32.whl", hash = "sha256:60ee7e19e95104d4c03871d7d7dfb3d22ef8a9b9c6778c94e1c8fcc8365afd48"},
{file = "cryptography-46.0.5-cp311-abi3-win_amd64.whl", hash = "sha256:38946c54b16c885c72c4f59846be9743d699eee2b69b6988e0a00a01f46a61a4"},
{file = "cryptography-46.0.5-cp314-cp314t-macosx_10_9_universal2.whl", hash = "sha256:94a76daa32eb78d61339aff7952ea819b1734b46f73646a07decb40e5b3448e2"},
{file = "cryptography-46.0.5-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5be7bf2fb40769e05739dd0046e7b26f9d4670badc7b032d6ce4db64dddc0678"},
{file = "cryptography-46.0.5-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:fe346b143ff9685e40192a4960938545c699054ba11d4f9029f94751e3f71d87"},
{file = "cryptography-46.0.5-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:c69fd885df7d089548a42d5ec05be26050ebcd2283d89b3d30676eb32ff87dee"},
{file = "cryptography-46.0.5-cp314-cp314t-manylinux_2_28_ppc64le.whl", hash = "sha256:8293f3dea7fc929ef7240796ba231413afa7b68ce38fd21da2995549f5961981"},
{file = "cryptography-46.0.5-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:1abfdb89b41c3be0365328a410baa9df3ff8a9110fb75e7b52e66803ddabc9a9"},
{file = "cryptography-46.0.5-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:d66e421495fdb797610a08f43b05269e0a5ea7f5e652a89bfd5a7d3c1dee3648"},
{file = "cryptography-46.0.5-cp314-cp314t-manylinux_2_34_aarch64.whl", hash = "sha256:4e817a8920bfbcff8940ecfd60f23d01836408242b30f1a708d93198393a80b4"},
{file = "cryptography-46.0.5-cp314-cp314t-manylinux_2_34_ppc64le.whl", hash = "sha256:68f68d13f2e1cb95163fa3b4db4bf9a159a418f5f6e7242564fc75fcae667fd0"},
{file = "cryptography-46.0.5-cp314-cp314t-manylinux_2_34_x86_64.whl", hash = "sha256:a3d1fae9863299076f05cb8a778c467578262fae09f9dc0ee9b12eb4268ce663"},
{file = "cryptography-46.0.5-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:c4143987a42a2397f2fc3b4d7e3a7d313fbe684f67ff443999e803dd75a76826"},
{file = "cryptography-46.0.5-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:7d731d4b107030987fd61a7f8ab512b25b53cef8f233a97379ede116f30eb67d"},
{file = "cryptography-46.0.5-cp314-cp314t-win32.whl", hash = "sha256:c3bcce8521d785d510b2aad26ae2c966092b7daa8f45dd8f44734a104dc0bc1a"},
{file = "cryptography-46.0.5-cp314-cp314t-win_amd64.whl", hash = "sha256:4d8ae8659ab18c65ced284993c2265910f6c9e650189d4e3f68445ef82a810e4"},
{file = "cryptography-46.0.5-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:4108d4c09fbbf2789d0c926eb4152ae1760d5a2d97612b92d508d96c861e4d31"},
{file = "cryptography-46.0.5-cp38-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:7d1f30a86d2757199cb2d56e48cce14deddf1f9c95f1ef1b64ee91ea43fe2e18"},
{file = "cryptography-46.0.5-cp38-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:039917b0dc418bb9f6edce8a906572d69e74bd330b0b3fea4f79dab7f8ddd235"},
{file = "cryptography-46.0.5-cp38-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:ba2a27ff02f48193fc4daeadf8ad2590516fa3d0adeeb34336b96f7fa64c1e3a"},
{file = "cryptography-46.0.5-cp38-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:61aa400dce22cb001a98014f647dc21cda08f7915ceb95df0c9eaf84b4b6af76"},
{file = "cryptography-46.0.5-cp38-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:3ce58ba46e1bc2aac4f7d9290223cead56743fa6ab94a5d53292ffaac6a91614"},
{file = "cryptography-46.0.5-cp38-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:420d0e909050490d04359e7fdb5ed7e667ca5c3c402b809ae2563d7e66a92229"},
{file = "cryptography-46.0.5-cp38-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:582f5fcd2afa31622f317f80426a027f30dc792e9c80ffee87b993200ea115f1"},
{file = "cryptography-46.0.5-cp38-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:bfd56bb4b37ed4f330b82402f6f435845a5f5648edf1ad497da51a8452d5d62d"},
{file = "cryptography-46.0.5-cp38-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:a3d507bb6a513ca96ba84443226af944b0f7f47dcc9a399d110cd6146481d24c"},
{file = "cryptography-46.0.5-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:9f16fbdf4da055efb21c22d81b89f155f02ba420558db21288b3d0035bafd5f4"},
{file = "cryptography-46.0.5-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:ced80795227d70549a411a4ab66e8ce307899fad2220ce5ab2f296e687eacde9"},
{file = "cryptography-46.0.5-cp38-abi3-win32.whl", hash = "sha256:02f547fce831f5096c9a567fd41bc12ca8f11df260959ecc7c3202555cc47a72"},
{file = "cryptography-46.0.5-cp38-abi3-win_amd64.whl", hash = "sha256:556e106ee01aa13484ce9b0239bca667be5004efb0aabbed28d353df86445595"},
{file = "cryptography-46.0.5-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:3b4995dc971c9fb83c25aa44cf45f02ba86f71ee600d81091c2f0cbae116b06c"},
{file = "cryptography-46.0.5-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl", hash = "sha256:bc84e875994c3b445871ea7181d424588171efec3e185dced958dad9e001950a"},
{file = "cryptography-46.0.5-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl", hash = "sha256:2ae6971afd6246710480e3f15824ed3029a60fc16991db250034efd0b9fb4356"},
{file = "cryptography-46.0.5-pp311-pypy311_pp73-manylinux_2_34_aarch64.whl", hash = "sha256:d861ee9e76ace6cf36a6a89b959ec08e7bc2493ee39d07ffe5acb23ef46d27da"},
{file = "cryptography-46.0.5-pp311-pypy311_pp73-manylinux_2_34_x86_64.whl", hash = "sha256:2b7a67c9cd56372f3249b39699f2ad479f6991e62ea15800973b956f4b73e257"},
{file = "cryptography-46.0.5-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:8456928655f856c6e1533ff59d5be76578a7157224dbd9ce6872f25055ab9ab7"},
{file = "cryptography-46.0.5.tar.gz", hash = "sha256:abace499247268e3757271b2f1e244b36b06f8515cf27c4d49468fc9eb16e93d"},
]
[package.dependencies]
@@ -516,7 +516,7 @@ nox = ["nox[uv] (>=2024.4.15)"]
pep8test = ["check-sdist", "click (>=8.0.1)", "mypy (>=1.14)", "ruff (>=0.11.11)"]
sdist = ["build (>=1.0.0)"]
ssh = ["bcrypt (>=3.1.5)"]
test = ["certifi (>=2024)", "cryptography-vectors (==46.0.4)", "pretend (>=0.7)", "pytest (>=7.4.0)", "pytest-benchmark (>=4.0)", "pytest-cov (>=2.10.1)", "pytest-xdist (>=3.5.0)"]
test = ["certifi (>=2024)", "cryptography-vectors (==46.0.5)", "pretend (>=0.7)", "pytest (>=7.4.0)", "pytest-benchmark (>=4.0)", "pytest-cov (>=2.10.1)", "pytest-xdist (>=3.5.0)"]
test-randomorder = ["pytest-randomly"]
[[package]]
@@ -570,24 +570,25 @@ tests = ["coverage", "coveralls", "dill", "mock", "nose"]
[[package]]
name = "fastapi"
version = "0.128.0"
version = "0.128.7"
description = "FastAPI framework, high performance, easy to learn, fast to code, ready for production"
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "fastapi-0.128.0-py3-none-any.whl", hash = "sha256:aebd93f9716ee3b4f4fcfe13ffb7cf308d99c9f3ab5622d8877441072561582d"},
{file = "fastapi-0.128.0.tar.gz", hash = "sha256:1cc179e1cef10a6be60ffe429f79b829dce99d8de32d7acb7e6c8dfdf7f2645a"},
{file = "fastapi-0.128.7-py3-none-any.whl", hash = "sha256:6bd9bd31cb7047465f2d3fa3ba3f33b0870b17d4eaf7cdb36d1576ab060ad662"},
{file = "fastapi-0.128.7.tar.gz", hash = "sha256:783c273416995486c155ad2c0e2b45905dedfaf20b9ef8d9f6a9124670639a24"},
]
[package.dependencies]
annotated-doc = ">=0.0.2"
pydantic = ">=2.7.0"
starlette = ">=0.40.0,<0.51.0"
starlette = ">=0.40.0,<1.0.0"
typing-extensions = ">=4.8.0"
typing-inspection = ">=0.4.2"
[package.extras]
all = ["email-validator (>=2.0.0)", "fastapi-cli[standard] (>=0.0.8)", "httpx (>=0.23.0,<1.0.0)", "itsdangerous (>=1.1.0)", "jinja2 (>=3.1.5)", "orjson (>=3.2.1)", "pydantic-extra-types (>=2.0.0)", "pydantic-settings (>=2.0.0)", "python-multipart (>=0.0.18)", "pyyaml (>=5.3.1)", "ujson (>=4.0.1,!=4.0.2,!=4.1.0,!=4.2.0,!=4.3.0,!=5.0.0,!=5.1.0)", "uvicorn[standard] (>=0.12.0)"]
all = ["email-validator (>=2.0.0)", "fastapi-cli[standard] (>=0.0.8)", "httpx (>=0.23.0,<1.0.0)", "itsdangerous (>=1.1.0)", "jinja2 (>=3.1.5)", "orjson (>=3.9.3)", "pydantic-extra-types (>=2.0.0)", "pydantic-settings (>=2.0.0)", "python-multipart (>=0.0.18)", "pyyaml (>=5.3.1)", "ujson (>=5.8.0)", "uvicorn[standard] (>=0.12.0)"]
standard = ["email-validator (>=2.0.0)", "fastapi-cli[standard] (>=0.0.8)", "httpx (>=0.23.0,<1.0.0)", "jinja2 (>=3.1.5)", "pydantic-extra-types (>=2.0.0)", "pydantic-settings (>=2.0.0)", "python-multipart (>=0.0.18)", "uvicorn[standard] (>=0.12.0)"]
standard-no-fastapi-cloud-cli = ["email-validator (>=2.0.0)", "fastapi-cli[standard-no-fastapi-cloud-cli] (>=0.0.8)", "httpx (>=0.23.0,<1.0.0)", "jinja2 (>=3.1.5)", "pydantic-extra-types (>=2.0.0)", "pydantic-settings (>=2.0.0)", "python-multipart (>=0.0.18)", "uvicorn[standard] (>=0.12.0)"]
@@ -1062,14 +1063,14 @@ urllib3 = ">=1.26.0,<3"
[[package]]
name = "launchdarkly-server-sdk"
version = "9.14.1"
version = "9.15.0"
description = "LaunchDarkly SDK for Python"
optional = false
python-versions = ">=3.9"
python-versions = ">=3.10"
groups = ["main"]
files = [
{file = "launchdarkly_server_sdk-9.14.1-py3-none-any.whl", hash = "sha256:a9e2bd9ecdef845cd631ae0d4334a1115e5b44257c42eb2349492be4bac7815c"},
{file = "launchdarkly_server_sdk-9.14.1.tar.gz", hash = "sha256:1df44baf0a0efa74d8c1dad7a00592b98bce7d19edded7f770da8dbc49922213"},
{file = "launchdarkly_server_sdk-9.15.0-py3-none-any.whl", hash = "sha256:c267e29bfa3fb5e2a06a208448ada6ed5557a2924979b8d79c970b45d227c668"},
{file = "launchdarkly_server_sdk-9.15.0.tar.gz", hash = "sha256:f31441b74bc1a69c381db57c33116509e407a2612628ad6dff0a7dbb39d5020b"},
]
[package.dependencies]
@@ -1478,14 +1479,14 @@ testing = ["coverage", "pytest", "pytest-benchmark"]
[[package]]
name = "postgrest"
version = "2.27.2"
version = "2.28.0"
description = "PostgREST client for Python. This library provides an ORM interface to PostgREST."
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "postgrest-2.27.2-py3-none-any.whl", hash = "sha256:1666fef3de05ca097a314433dd5ae2f2d71c613cb7b233d0f468c4ffe37277da"},
{file = "postgrest-2.27.2.tar.gz", hash = "sha256:55407d530b5af3d64e883a71fec1f345d369958f723ce4a8ab0b7d169e313242"},
{file = "postgrest-2.28.0-py3-none-any.whl", hash = "sha256:7bca2f24dd1a1bf8a3d586c7482aba6cd41662da6733045fad585b63b7f7df75"},
{file = "postgrest-2.28.0.tar.gz", hash = "sha256:c36b38646d25ea4255321d3d924ce70f8d20ec7799cb42c1221d6a818d4f6515"},
]
[package.dependencies]
@@ -2248,14 +2249,14 @@ cli = ["click (>=5.0)"]
[[package]]
name = "realtime"
version = "2.27.2"
version = "2.28.0"
description = ""
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "realtime-2.27.2-py3-none-any.whl", hash = "sha256:34a9cbb26a274e707e8fc9e3ee0a66de944beac0fe604dc336d1e985db2c830f"},
{file = "realtime-2.27.2.tar.gz", hash = "sha256:b960a90294d2cea1b3f1275ecb89204304728e08fff1c393cc1b3150739556b3"},
{file = "realtime-2.28.0-py3-none-any.whl", hash = "sha256:db1bd59bab9b1fcc9f9d3b1a073bed35bf4994d720e6751f10031a58d57a3836"},
{file = "realtime-2.28.0.tar.gz", hash = "sha256:d18cedcebd6a8f22fcd509bc767f639761eb218b7b2b6f14fc4205b6259b50fc"},
]
[package.dependencies]
@@ -2436,14 +2437,14 @@ full = ["httpx (>=0.27.0,<0.29.0)", "itsdangerous", "jinja2", "python-multipart
[[package]]
name = "storage3"
version = "2.27.2"
version = "2.28.0"
description = "Supabase Storage client for Python."
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "storage3-2.27.2-py3-none-any.whl", hash = "sha256:e6f16e7a260729e7b1f46e9bf61746805a02e30f5e419ee1291007c432e3ec63"},
{file = "storage3-2.27.2.tar.gz", hash = "sha256:cb4807b7f86b4bb1272ac6fdd2f3cfd8ba577297046fa5f88557425200275af5"},
{file = "storage3-2.28.0-py3-none-any.whl", hash = "sha256:ecb50efd2ac71dabbdf97e99ad346eafa630c4c627a8e5a138ceb5fbbadae716"},
{file = "storage3-2.28.0.tar.gz", hash = "sha256:bc1d008aff67de7a0f2bd867baee7aadbcdb6f78f5a310b4f7a38e8c13c19865"},
]
[package.dependencies]
@@ -2487,35 +2488,35 @@ python-dateutil = ">=2.6.0"
[[package]]
name = "supabase"
version = "2.27.2"
version = "2.28.0"
description = "Supabase client for Python."
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "supabase-2.27.2-py3-none-any.whl", hash = "sha256:d4dce00b3a418ee578017ec577c0e5be47a9a636355009c76f20ed2faa15bc54"},
{file = "supabase-2.27.2.tar.gz", hash = "sha256:2aed40e4f3454438822442a1e94a47be6694c2c70392e7ae99b51a226d4293f7"},
{file = "supabase-2.28.0-py3-none-any.whl", hash = "sha256:42776971c7d0ccca16034df1ab96a31c50228eb1eb19da4249ad2f756fc20272"},
{file = "supabase-2.28.0.tar.gz", hash = "sha256:aea299aaab2a2eed3c57e0be7fc035c6807214194cce795a3575add20268ece1"},
]
[package.dependencies]
httpx = ">=0.26,<0.29"
postgrest = "2.27.2"
realtime = "2.27.2"
storage3 = "2.27.2"
supabase-auth = "2.27.2"
supabase-functions = "2.27.2"
postgrest = "2.28.0"
realtime = "2.28.0"
storage3 = "2.28.0"
supabase-auth = "2.28.0"
supabase-functions = "2.28.0"
yarl = ">=1.22.0"
[[package]]
name = "supabase-auth"
version = "2.27.2"
version = "2.28.0"
description = "Python Client Library for Supabase Auth"
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "supabase_auth-2.27.2-py3-none-any.whl", hash = "sha256:78ec25b11314d0a9527a7205f3b1c72560dccdc11b38392f80297ef98664ee91"},
{file = "supabase_auth-2.27.2.tar.gz", hash = "sha256:0f5bcc79b3677cb42e9d321f3c559070cfa40d6a29a67672cc8382fb7dc2fe97"},
{file = "supabase_auth-2.28.0-py3-none-any.whl", hash = "sha256:2ac85026cc285054c7fa6d41924f3a333e9ec298c013e5b5e1754039ba7caec9"},
{file = "supabase_auth-2.28.0.tar.gz", hash = "sha256:2bb8f18ff39934e44b28f10918db965659f3735cd6fbfcc022fe0b82dbf8233e"},
]
[package.dependencies]
@@ -2525,14 +2526,14 @@ pyjwt = {version = ">=2.10.1", extras = ["crypto"]}
[[package]]
name = "supabase-functions"
version = "2.27.2"
version = "2.28.0"
description = "Library for Supabase Functions"
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "supabase_functions-2.27.2-py3-none-any.whl", hash = "sha256:db480efc669d0bca07605b9b6f167312af43121adcc842a111f79bea416ef754"},
{file = "supabase_functions-2.27.2.tar.gz", hash = "sha256:d0c8266207a94371cb3fd35ad3c7f025b78a97cf026861e04ccd35ac1775f80b"},
{file = "supabase_functions-2.28.0-py3-none-any.whl", hash = "sha256:30bf2d586f8df285faf0621bb5d5bb3ec3157234fc820553ca156f009475e4ae"},
{file = "supabase_functions-2.28.0.tar.gz", hash = "sha256:db3dddfc37aca5858819eb461130968473bd8c75bd284581013958526dac718b"},
]
[package.dependencies]
@@ -2911,4 +2912,4 @@ type = ["pytest-mypy"]
[metadata]
lock-version = "2.1"
python-versions = ">=3.10,<4.0"
content-hash = "40eae94995dc0a388fa832ed4af9b6137f28d5b5ced3aaea70d5f91d4d9a179d"
content-hash = "9619cae908ad38fa2c48016a58bcf4241f6f5793aa0e6cc140276e91c433cbbb"

View File

@@ -11,14 +11,14 @@ python = ">=3.10,<4.0"
colorama = "^0.4.6"
cryptography = "^46.0"
expiringdict = "^1.2.2"
fastapi = "^0.128.0"
fastapi = "^0.128.7"
google-cloud-logging = "^3.13.0"
launchdarkly-server-sdk = "^9.14.1"
launchdarkly-server-sdk = "^9.15.0"
pydantic = "^2.12.5"
pydantic-settings = "^2.12.0"
pyjwt = { version = "^2.11.0", extras = ["crypto"] }
redis = "^6.2.0"
supabase = "^2.27.2"
supabase = "^2.28.0"
uvicorn = "^0.40.0"
[tool.poetry.group.dev.dependencies]

View File

@@ -104,6 +104,12 @@ TWITTER_CLIENT_SECRET=
# Make a new workspace for your OAuth APP -- trust me
# https://linear.app/settings/api/applications/new
# Callback URL: http://localhost:3000/auth/integrations/oauth_callback
LINEAR_API_KEY=
# Linear project and team IDs for the feature request tracker.
# Find these in your Linear workspace URL: linear.app/<workspace>/project/<project-id>
# and in team settings. Used by the chat copilot to file and search feature requests.
LINEAR_FEATURE_REQUEST_PROJECT_ID=
LINEAR_FEATURE_REQUEST_TEAM_ID=
LINEAR_CLIENT_ID=
LINEAR_CLIENT_SECRET=
@@ -184,5 +190,8 @@ ZEROBOUNCE_API_KEY=
POSTHOG_API_KEY=
POSTHOG_HOST=https://eu.i.posthog.com
# Tally Form Integration (pre-populate business understanding on signup)
TALLY_API_KEY=
# Other Services
AUTOMOD_API_KEY=

View File

@@ -58,10 +58,56 @@ poetry run pytest path/to/test.py --snapshot-update
- **Authentication**: JWT-based with Supabase integration
- **Security**: Cache protection middleware prevents sensitive data caching in browsers/proxies
## Code Style
- **Top-level imports only** — no local/inner imports (lazy imports only for heavy optional deps like `openpyxl`)
- **No duck typing** — no `hasattr`/`getattr`/`isinstance` for type dispatch; use typed interfaces/unions/protocols
- **Pydantic models** over dataclass/namedtuple/dict for structured data
- **No linter suppressors** — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code
- **List comprehensions** over manual loop-and-append
- **Early return** — guard clauses first, avoid deep nesting
- **f-strings vs printf syntax in log statements** — Use `%s` for deferred interpolation in `debug` statements, f-strings elsewhere for readability: `logger.debug("Processing %s items", count)`, `logger.info(f"Processing {count} items")`
- **Sanitize error paths** — `os.path.basename()` in error messages to avoid leaking directory structure
- **TOCTOU awareness** — avoid check-then-act patterns for file access and credit charging
- **`Security()` vs `Depends()`** — use `Security()` for auth deps to get proper OpenAPI security spec
- **Redis pipelines** — `transaction=True` for atomicity on multi-step operations
- **`max(0, value)` guards** — for computed values that should never be negative
- **SSE protocol** — `data:` lines for frontend-parsed events (must match Zod schema), `: comment` lines for heartbeats/status
- **File length** — keep files under ~300 lines; if a file grows beyond this, split by responsibility (e.g. extract helpers, models, or a sub-module into a new file). Never keep appending to a long file.
- **Function length** — keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions are a sign of mixed concerns, not complexity.
- **Top-down ordering** — define the main/public function or class first, then the helpers it uses below. A reader should encounter high-level logic before implementation details.
## Testing Approach
- Uses pytest with snapshot testing for API responses
- Test files are colocated with source files (`*_test.py`)
- Mock at boundaries — mock where the symbol is **used**, not where it's **defined**
- After refactoring, update mock targets to match new module paths
- Use `AsyncMock` for async functions (`from unittest.mock import AsyncMock`)
### Test-Driven Development (TDD)
When fixing a bug or adding a feature, write the test **before** the implementation:
```python
# 1. Write a failing test marked xfail
@pytest.mark.xfail(reason="Bug #1234: widget crashes on empty input")
def test_widget_handles_empty_input():
result = widget.process("")
assert result == Widget.EMPTY_RESULT
# 2. Run it — confirm it fails (XFAIL)
# poetry run pytest path/to/test.py::test_widget_handles_empty_input -xvs
# 3. Implement the fix
# 4. Remove xfail, run again — confirm it passes
def test_widget_handles_empty_input():
result = widget.process("")
assert result == Widget.EMPTY_RESULT
```
This catches regressions and proves the fix actually works. **Every bug fix should include a test that would have caught it.**
## Database Schema
@@ -157,6 +203,16 @@ yield "image_url", result_url
3. Write tests alongside the route file
4. Run `poetry run test` to verify
## Workspace & Media Files
**Read [Workspace & Media Architecture](../../docs/platform/workspace-media-architecture.md) when:**
- Working on CoPilot file upload/download features
- Building blocks that handle `MediaFileType` inputs/outputs
- Modifying `WorkspaceManager` or `store_media_file()`
- Debugging file persistence or virus scanning issues
Covers: `WorkspaceManager` (persistent storage with session scoping), `store_media_file()` (media normalization pipeline), and responsibility boundaries for virus scanning and persistence.
## Security Implementation
### Cache Protection Middleware

View File

@@ -1,3 +1,5 @@
# ============================ DEPENDENCY BUILDER ============================ #
FROM debian:13-slim AS builder
# Set environment variables
@@ -48,63 +50,128 @@ RUN poetry install --no-ansi --no-root
# Generate Prisma client
COPY autogpt_platform/backend/schema.prisma ./
COPY autogpt_platform/backend/backend/data/partial_types.py ./backend/data/partial_types.py
COPY autogpt_platform/backend/gen_prisma_types_stub.py ./
COPY autogpt_platform/backend/scripts/gen_prisma_types_stub.py ./scripts/
RUN poetry run prisma generate && poetry run gen-prisma-stub
FROM debian:13-slim AS server_dependencies
# =============================== DB MIGRATOR =============================== #
# Lightweight migrate stage - only needs Prisma CLI, not full Python environment
FROM debian:13-slim AS migrate
WORKDIR /app/autogpt_platform/backend
ENV DEBIAN_FRONTEND=noninteractive
# Install only what's needed for prisma migrate: Node.js and minimal Python for prisma-python
RUN apt-get update && apt-get install -y --no-install-recommends \
python3.13 \
python3-pip \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
# Copy Node.js from builder (needed for Prisma CLI)
COPY --from=builder /usr/bin/node /usr/bin/node
COPY --from=builder /usr/lib/node_modules /usr/lib/node_modules
COPY --from=builder /usr/bin/npm /usr/bin/npm
# Copy Prisma binaries
COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries
# Install prisma-client-py directly (much smaller than copying full venv)
RUN pip3 install prisma>=0.15.0 --break-system-packages
COPY autogpt_platform/backend/schema.prisma ./
COPY autogpt_platform/backend/backend/data/partial_types.py ./backend/data/partial_types.py
COPY autogpt_platform/backend/scripts/gen_prisma_types_stub.py ./scripts/
COPY autogpt_platform/backend/migrations ./migrations
# ============================== BACKEND SERVER ============================== #
FROM debian:13-slim AS server
WORKDIR /app
ENV POETRY_HOME=/opt/poetry \
POETRY_NO_INTERACTION=1 \
POETRY_VIRTUALENVS_CREATE=true \
POETRY_VIRTUALENVS_IN_PROJECT=true \
DEBIAN_FRONTEND=noninteractive
ENV PATH=/opt/poetry/bin:$PATH
ENV DEBIAN_FRONTEND=noninteractive
# Install Python, FFmpeg, and ImageMagick (required for video processing blocks)
RUN apt-get update && apt-get install -y \
# Install Python, FFmpeg, ImageMagick, and CLI tools for agent use.
# bubblewrap provides OS-level sandbox (whitelist-only FS + no network)
# for the bash_exec MCP tool (fallback when E2B is not configured).
# Using --no-install-recommends saves ~650MB by skipping unnecessary deps like llvm, mesa, etc.
RUN apt-get update && apt-get install -y --no-install-recommends \
python3.13 \
python3-pip \
ffmpeg \
imagemagick \
jq \
ripgrep \
tree \
bubblewrap \
&& rm -rf /var/lib/apt/lists/*
# Copy only necessary files from builder
COPY --from=builder /app /app
# Copy poetry (build-time only, for `poetry install --only-root` to create entry points)
COPY --from=builder /usr/local/lib/python3* /usr/local/lib/python3*
COPY --from=builder /usr/local/bin/poetry /usr/local/bin/poetry
# Copy Node.js installation for Prisma
# Copy Node.js installation for Prisma and agent-browser.
# npm/npx are symlinks in the builder (-> ../lib/node_modules/npm/bin/*-cli.js);
# COPY resolves them to regular files, breaking require() paths. Recreate as
# proper symlinks so npm/npx can find their modules.
COPY --from=builder /usr/bin/node /usr/bin/node
COPY --from=builder /usr/lib/node_modules /usr/lib/node_modules
COPY --from=builder /usr/bin/npm /usr/bin/npm
COPY --from=builder /usr/bin/npx /usr/bin/npx
RUN ln -s ../lib/node_modules/npm/bin/npm-cli.js /usr/bin/npm \
&& ln -s ../lib/node_modules/npm/bin/npx-cli.js /usr/bin/npx
COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries
ENV PATH="/app/autogpt_platform/backend/.venv/bin:$PATH"
# Install agent-browser (Copilot browser tool) + Chromium.
# On amd64: install runtime libs + run `agent-browser install` to download
# Chrome for Testing (pinned version, tested with Playwright).
# On arm64: install system chromium package — Chrome for Testing has no ARM64
# binary. AGENT_BROWSER_EXECUTABLE_PATH is set at runtime by the entrypoint
# script (below) to redirect agent-browser to the system binary.
ARG TARGETARCH
RUN apt-get update \
&& if [ "$TARGETARCH" = "arm64" ]; then \
apt-get install -y --no-install-recommends chromium fonts-liberation; \
else \
apt-get install -y --no-install-recommends \
libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 \
libdbus-1-3 libxkbcommon0 libatspi2.0-0t64 libxcomposite1 libxdamage1 \
libxfixes3 libxrandr2 libgbm1 libasound2t64 libpango-1.0-0 libcairo2 \
libx11-6 libx11-xcb1 libxcb1 libxext6 libglib2.0-0t64 \
fonts-liberation libfontconfig1; \
fi \
&& rm -rf /var/lib/apt/lists/* \
&& npm install -g agent-browser \
&& ([ "$TARGETARCH" = "arm64" ] || agent-browser install) \
&& rm -rf /tmp/* /root/.npm
RUN mkdir -p /app/autogpt_platform/autogpt_libs
RUN mkdir -p /app/autogpt_platform/backend
COPY autogpt_platform/autogpt_libs /app/autogpt_platform/autogpt_libs
COPY autogpt_platform/backend/poetry.lock autogpt_platform/backend/pyproject.toml /app/autogpt_platform/backend/
# On arm64 the system chromium is at /usr/bin/chromium; set
# AGENT_BROWSER_EXECUTABLE_PATH so agent-browser's daemon uses it instead of
# Chrome for Testing (which has no ARM64 binary). On amd64 the variable is left
# unset so agent-browser uses the Chrome for Testing binary it downloaded above.
RUN printf '#!/bin/sh\n[ -x /usr/bin/chromium ] && export AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium\nexec "$@"\n' \
> /usr/local/bin/entrypoint.sh \
&& chmod +x /usr/local/bin/entrypoint.sh
WORKDIR /app/autogpt_platform/backend
FROM server_dependencies AS migrate
# Copy only the .venv from builder (not the entire /app directory)
# The .venv includes the generated Prisma client
COPY --from=builder /app/autogpt_platform/backend/.venv ./.venv
ENV PATH="/app/autogpt_platform/backend/.venv/bin:$PATH"
# Migration stage only needs schema and migrations - much lighter than full backend
COPY autogpt_platform/backend/schema.prisma /app/autogpt_platform/backend/
COPY autogpt_platform/backend/backend/data/partial_types.py /app/autogpt_platform/backend/backend/data/partial_types.py
COPY autogpt_platform/backend/migrations /app/autogpt_platform/backend/migrations
# Copy dependency files + autogpt_libs (path dependency)
COPY autogpt_platform/autogpt_libs /app/autogpt_platform/autogpt_libs
COPY autogpt_platform/backend/poetry.lock autogpt_platform/backend/pyproject.toml ./
FROM server_dependencies AS server
COPY autogpt_platform/backend /app/autogpt_platform/backend
# Copy backend code + docs (for Copilot docs search)
COPY autogpt_platform/backend ./
COPY docs /app/docs
RUN poetry install --no-ansi --only-root
# Install the project package to create entry point scripts in .venv/bin/
# (e.g., rest, executor, ws, db, scheduler, notification - see [tool.poetry.scripts])
RUN POETRY_VIRTUALENVS_CREATE=true POETRY_VIRTUALENVS_IN_PROJECT=true \
poetry install --no-ansi --only-root
ENV PORT=8000
CMD ["poetry", "run", "rest"]
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
CMD ["rest"]

View File

@@ -1,4 +1,9 @@
"""Common test fixtures for server tests."""
"""Common test fixtures for server tests.
Note: Common fixtures like test_user_id, admin_user_id, target_user_id,
setup_test_user, and setup_admin_user are defined in the parent conftest.py
(backend/conftest.py) and are available here automatically.
"""
import pytest
from pytest_snapshot.plugin import Snapshot
@@ -11,54 +16,6 @@ def configured_snapshot(snapshot: Snapshot) -> Snapshot:
return snapshot
@pytest.fixture
def test_user_id() -> str:
"""Test user ID fixture."""
return "3e53486c-cf57-477e-ba2a-cb02dc828e1a"
@pytest.fixture
def admin_user_id() -> str:
"""Admin user ID fixture."""
return "4e53486c-cf57-477e-ba2a-cb02dc828e1b"
@pytest.fixture
def target_user_id() -> str:
"""Target user ID fixture."""
return "5e53486c-cf57-477e-ba2a-cb02dc828e1c"
@pytest.fixture
async def setup_test_user(test_user_id):
"""Create test user in database before tests."""
from backend.data.user import get_or_create_user
# Create the test user in the database using JWT token format
user_data = {
"sub": test_user_id,
"email": "test@example.com",
"user_metadata": {"name": "Test User"},
}
await get_or_create_user(user_data)
return test_user_id
@pytest.fixture
async def setup_admin_user(admin_user_id):
"""Create admin user in database before tests."""
from backend.data.user import get_or_create_user
# Create the admin user in the database using JWT token format
user_data = {
"sub": admin_user_id,
"email": "test-admin@example.com",
"user_metadata": {"name": "Test Admin"},
}
await get_or_create_user(user_data)
return admin_user_id
@pytest.fixture
def mock_jwt_user(test_user_id):
"""Provide mock JWT payload for regular user testing."""

View File

@@ -88,20 +88,23 @@ async def require_auth(
)
def require_permission(permission: APIKeyPermission):
def require_permission(*permissions: APIKeyPermission):
"""
Dependency function for checking specific permissions
Dependency function for checking required permissions.
All listed permissions must be present.
(works with API keys and OAuth tokens)
"""
async def check_permission(
async def check_permissions(
auth: APIAuthorizationInfo = Security(require_auth),
) -> APIAuthorizationInfo:
if permission not in auth.scopes:
missing = [p for p in permissions if p not in auth.scopes]
if missing:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail=f"Missing required permission: {permission.value}",
detail=f"Missing required permission(s): "
f"{', '.join(p.value for p in missing)}",
)
return auth
return check_permission
return check_permissions

View File

@@ -1,7 +1,7 @@
import logging
import urllib.parse
from collections import defaultdict
from typing import Annotated, Any, Literal, Optional, Sequence
from typing import Annotated, Any, Optional, Sequence
from fastapi import APIRouter, Body, HTTPException, Security
from prisma.enums import AgentExecutionStatus, APIKeyPermission
@@ -9,15 +9,17 @@ from pydantic import BaseModel, Field
from typing_extensions import TypedDict
import backend.api.features.store.cache as store_cache
import backend.api.features.store.db as store_db
import backend.api.features.store.model as store_model
import backend.data.block
from backend.api.external.middleware import require_permission
import backend.blocks
from backend.api.external.middleware import require_auth, require_permission
from backend.data import execution as execution_db
from backend.data import graph as graph_db
from backend.data import user as user_db
from backend.data.auth.base import APIAuthorizationInfo
from backend.data.block import BlockInput, CompletedBlockOutput
from backend.executor.utils import add_graph_execution
from backend.integrations.webhooks.graph_lifecycle_hooks import on_graph_activate
from backend.util.settings import Settings
from .integrations import integrations_router
@@ -67,7 +69,7 @@ async def get_user_info(
dependencies=[Security(require_permission(APIKeyPermission.READ_BLOCK))],
)
async def get_graph_blocks() -> Sequence[dict[Any, Any]]:
blocks = [block() for block in backend.data.block.get_blocks().values()]
blocks = [block() for block in backend.blocks.get_blocks().values()]
return [b.to_dict() for b in blocks if not b.disabled]
@@ -83,7 +85,7 @@ async def execute_graph_block(
require_permission(APIKeyPermission.EXECUTE_BLOCK)
),
) -> CompletedBlockOutput:
obj = backend.data.block.get_block(block_id)
obj = backend.blocks.get_block(block_id)
if not obj:
raise HTTPException(status_code=404, detail=f"Block #{block_id} not found.")
if obj.disabled:
@@ -95,6 +97,43 @@ async def execute_graph_block(
return output
@v1_router.post(
path="/graphs",
tags=["graphs"],
status_code=201,
dependencies=[
Security(
require_permission(
APIKeyPermission.WRITE_GRAPH, APIKeyPermission.WRITE_LIBRARY
)
)
],
)
async def create_graph(
graph: graph_db.Graph,
auth: APIAuthorizationInfo = Security(
require_permission(APIKeyPermission.WRITE_GRAPH, APIKeyPermission.WRITE_LIBRARY)
),
) -> graph_db.GraphModel:
"""
Create a new agent graph.
The graph will be validated and assigned a new ID.
It is automatically added to the user's library.
"""
from backend.api.features.library import db as library_db
graph_model = graph_db.make_graph_model(graph, auth.user_id)
graph_model.reassign_ids(user_id=auth.user_id, reassign_graph_id=True)
graph_model.validate_graph(for_run=False)
await graph_db.create_graph(graph_model, user_id=auth.user_id)
await library_db.create_library_agent(graph_model, auth.user_id)
activated_graph = await on_graph_activate(graph_model, user_id=auth.user_id)
return activated_graph
@v1_router.post(
path="/graphs/{graph_id}/execute/{graph_version}",
tags=["graphs"],
@@ -192,13 +231,13 @@ async def get_graph_execution_results(
@v1_router.get(
path="/store/agents",
tags=["store"],
dependencies=[Security(require_permission(APIKeyPermission.READ_STORE))],
dependencies=[Security(require_auth)], # data is public; auth required as anti-DDoS
response_model=store_model.StoreAgentsResponse,
)
async def get_store_agents(
featured: bool = False,
creator: str | None = None,
sorted_by: Literal["rating", "runs", "name", "updated_at"] | None = None,
sorted_by: store_db.StoreAgentsSortOptions | None = None,
search_query: str | None = None,
category: str | None = None,
page: int = 1,
@@ -240,7 +279,7 @@ async def get_store_agents(
@v1_router.get(
path="/store/agents/{username}/{agent_name}",
tags=["store"],
dependencies=[Security(require_permission(APIKeyPermission.READ_STORE))],
dependencies=[Security(require_auth)], # data is public; auth required as anti-DDoS
response_model=store_model.StoreAgentDetails,
)
async def get_store_agent(
@@ -268,13 +307,13 @@ async def get_store_agent(
@v1_router.get(
path="/store/creators",
tags=["store"],
dependencies=[Security(require_permission(APIKeyPermission.READ_STORE))],
dependencies=[Security(require_auth)], # data is public; auth required as anti-DDoS
response_model=store_model.CreatorsResponse,
)
async def get_store_creators(
featured: bool = False,
search_query: str | None = None,
sorted_by: Literal["agent_rating", "agent_runs", "num_agents"] | None = None,
sorted_by: store_db.StoreCreatorsSortOptions | None = None,
page: int = 1,
page_size: int = 20,
) -> store_model.CreatorsResponse:
@@ -310,7 +349,7 @@ async def get_store_creators(
@v1_router.get(
path="/store/creators/{username}",
tags=["store"],
dependencies=[Security(require_permission(APIKeyPermission.READ_STORE))],
dependencies=[Security(require_auth)], # data is public; auth required as anti-DDoS
response_model=store_model.CreatorDetails,
)
async def get_store_creator(

View File

@@ -15,9 +15,9 @@ from prisma.enums import APIKeyPermission
from pydantic import BaseModel, Field
from backend.api.external.middleware import require_permission
from backend.api.features.chat.model import ChatSession
from backend.api.features.chat.tools import find_agent_tool, run_agent_tool
from backend.api.features.chat.tools.models import ToolResponseBase
from backend.copilot.model import ChatSession
from backend.copilot.tools import find_agent_tool, run_agent_tool
from backend.copilot.tools.models import ToolResponseBase
from backend.data.auth.base import APIAuthorizationInfo
logger = logging.getLogger(__name__)

View File

@@ -24,14 +24,13 @@ router = fastapi.APIRouter(
@router.get(
"/listings",
summary="Get Admin Listings History",
response_model=store_model.StoreListingsWithVersionsResponse,
)
async def get_admin_listings_with_versions(
status: typing.Optional[prisma.enums.SubmissionStatus] = None,
search: typing.Optional[str] = None,
page: int = 1,
page_size: int = 20,
):
) -> store_model.StoreListingsWithVersionsAdminViewResponse:
"""
Get store listings with their version history for admins.
@@ -45,36 +44,26 @@ async def get_admin_listings_with_versions(
page_size: Number of items per page
Returns:
StoreListingsWithVersionsResponse with listings and their versions
Paginated listings with their versions
"""
try:
listings = await store_db.get_admin_listings_with_versions(
status=status,
search_query=search,
page=page,
page_size=page_size,
)
return listings
except Exception as e:
logger.exception("Error getting admin listings with versions: %s", e)
return fastapi.responses.JSONResponse(
status_code=500,
content={
"detail": "An error occurred while retrieving listings with versions"
},
)
listings = await store_db.get_admin_listings_with_versions(
status=status,
search_query=search,
page=page,
page_size=page_size,
)
return listings
@router.post(
"/submissions/{store_listing_version_id}/review",
summary="Review Store Submission",
response_model=store_model.StoreSubmission,
)
async def review_submission(
store_listing_version_id: str,
request: store_model.ReviewSubmissionRequest,
user_id: str = fastapi.Security(autogpt_libs.auth.get_user_id),
):
) -> store_model.StoreSubmissionAdminView:
"""
Review a store listing submission.
@@ -84,31 +73,24 @@ async def review_submission(
user_id: Authenticated admin user performing the review
Returns:
StoreSubmission with updated review information
StoreSubmissionAdminView with updated review information
"""
try:
already_approved = await store_db.check_submission_already_approved(
store_listing_version_id=store_listing_version_id,
)
submission = await store_db.review_store_submission(
store_listing_version_id=store_listing_version_id,
is_approved=request.is_approved,
external_comments=request.comments,
internal_comments=request.internal_comments or "",
reviewer_id=user_id,
)
already_approved = await store_db.check_submission_already_approved(
store_listing_version_id=store_listing_version_id,
)
submission = await store_db.review_store_submission(
store_listing_version_id=store_listing_version_id,
is_approved=request.is_approved,
external_comments=request.comments,
internal_comments=request.internal_comments or "",
reviewer_id=user_id,
)
state_changed = already_approved != request.is_approved
# Clear caches when the request is approved as it updates what is shown on the store
if state_changed:
store_cache.clear_all_caches()
return submission
except Exception as e:
logger.exception("Error reviewing submission: %s", e)
return fastapi.responses.JSONResponse(
status_code=500,
content={"detail": "An error occurred while reviewing the submission"},
)
state_changed = already_approved != request.is_approved
# Clear caches whenever approval state changes, since store visibility can change
if state_changed:
store_cache.clear_all_caches()
return submission
@router.get(

View File

@@ -1,28 +1,33 @@
import logging
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from difflib import SequenceMatcher
from typing import Sequence
from typing import Any, Sequence, get_args, get_origin
import prisma
from prisma.models import mv_suggested_blocks
import backend.api.features.library.db as library_db
import backend.api.features.library.model as library_model
import backend.api.features.store.db as store_db
import backend.api.features.store.model as store_model
import backend.data.block
from backend.blocks import load_all_blocks
from backend.blocks._base import (
AnyBlockSchema,
BlockCategory,
BlockInfo,
BlockSchema,
BlockType,
)
from backend.blocks.llm import LlmModel
from backend.data.block import AnyBlockSchema, BlockCategory, BlockInfo, BlockSchema
from backend.data.db import query_raw_with_schema
from backend.integrations.providers import ProviderName
from backend.util.cache import cached
from backend.util.models import Pagination
from backend.util.text import split_camelcase
from .model import (
BlockCategoryResponse,
BlockResponse,
BlockType,
BlockTypeFilter,
CountResponse,
FilterType,
Provider,
@@ -37,6 +42,16 @@ MAX_LIBRARY_AGENT_RESULTS = 100
MAX_MARKETPLACE_AGENT_RESULTS = 100
MIN_SCORE_FOR_FILTERED_RESULTS = 10.0
# Boost blocks over marketplace agents in search results
BLOCK_SCORE_BOOST = 50.0
# Block IDs to exclude from search results
EXCLUDED_BLOCK_IDS = frozenset(
{
"e189baac-8c20-45a1-94a7-55177ea42565", # AgentExecutorBlock
}
)
SearchResultItem = BlockInfo | library_model.LibraryAgent | store_model.StoreAgent
@@ -59,8 +74,8 @@ def get_block_categories(category_blocks: int = 3) -> list[BlockCategoryResponse
for block_type in load_all_blocks().values():
block: AnyBlockSchema = block_type()
# Skip disabled blocks
if block.disabled:
# Skip disabled and excluded blocks
if block.disabled or block.id in EXCLUDED_BLOCK_IDS:
continue
# Skip blocks that don't have categories (all should have at least one)
if not block.categories:
@@ -88,7 +103,7 @@ def get_block_categories(category_blocks: int = 3) -> list[BlockCategoryResponse
def get_blocks(
*,
category: str | None = None,
type: BlockType | None = None,
type: BlockTypeFilter | None = None,
provider: ProviderName | None = None,
page: int = 1,
page_size: int = 50,
@@ -111,6 +126,9 @@ def get_blocks(
# Skip disabled blocks
if block.disabled:
continue
# Skip excluded blocks
if block.id in EXCLUDED_BLOCK_IDS:
continue
# Skip blocks that don't match the category
if category and category not in {c.name.lower() for c in block.categories}:
continue
@@ -250,14 +268,25 @@ async def _build_cached_search_results(
"my_agents": 0,
}
block_results, block_total, integration_total = _collect_block_results(
normalized_query=normalized_query,
include_blocks=include_blocks,
include_integrations=include_integrations,
)
scored_items.extend(block_results)
total_items["blocks"] = block_total
total_items["integrations"] = integration_total
# Use hybrid search when query is present, otherwise list all blocks
if (include_blocks or include_integrations) and normalized_query:
block_results, block_total, integration_total = await _text_search_blocks(
query=search_query,
include_blocks=include_blocks,
include_integrations=include_integrations,
)
scored_items.extend(block_results)
total_items["blocks"] = block_total
total_items["integrations"] = integration_total
elif include_blocks or include_integrations:
# No query - list all blocks using in-memory approach
block_results, block_total, integration_total = _collect_block_results(
include_blocks=include_blocks,
include_integrations=include_integrations,
)
scored_items.extend(block_results)
total_items["blocks"] = block_total
total_items["integrations"] = integration_total
if include_library_agents:
library_response = await library_db.list_library_agents(
@@ -302,10 +331,14 @@ async def _build_cached_search_results(
def _collect_block_results(
*,
normalized_query: str,
include_blocks: bool,
include_integrations: bool,
) -> tuple[list[_ScoredItem], int, int]:
"""
Collect all blocks for listing (no search query).
All blocks get BLOCK_SCORE_BOOST to prioritize them over marketplace agents.
"""
results: list[_ScoredItem] = []
block_count = 0
integration_count = 0
@@ -318,6 +351,10 @@ def _collect_block_results(
if block.disabled:
continue
# Skip excluded blocks
if block.id in EXCLUDED_BLOCK_IDS:
continue
block_info = block.get_info()
credentials = list(block.input_schema.get_credentials_fields().values())
is_integration = len(credentials) > 0
@@ -327,10 +364,6 @@ def _collect_block_results(
if not is_integration and not include_blocks:
continue
score = _score_block(block, block_info, normalized_query)
if not _should_include_item(score, normalized_query):
continue
filter_type: FilterType = "integrations" if is_integration else "blocks"
if is_integration:
integration_count += 1
@@ -341,14 +374,86 @@ def _collect_block_results(
_ScoredItem(
item=block_info,
filter_type=filter_type,
score=score,
sort_key=_get_item_name(block_info),
score=BLOCK_SCORE_BOOST,
sort_key=block_info.name.lower(),
)
)
return results, block_count, integration_count
async def _text_search_blocks(
*,
query: str,
include_blocks: bool,
include_integrations: bool,
) -> tuple[list[_ScoredItem], int, int]:
"""
Search blocks using in-memory text matching over the block registry.
All blocks are already loaded in memory, so this is fast and reliable
regardless of whether OpenAI embeddings are available.
Scoring:
- Base: text relevance via _score_primary_fields, plus BLOCK_SCORE_BOOST
to prioritize blocks over marketplace agents in combined results
- +20 if the block has an LlmModel field and the query matches an LLM model name
"""
results: list[_ScoredItem] = []
if not include_blocks and not include_integrations:
return results, 0, 0
normalized_query = query.strip().lower()
all_results, _, _ = _collect_block_results(
include_blocks=include_blocks,
include_integrations=include_integrations,
)
all_blocks = load_all_blocks()
for item in all_results:
block_info = item.item
assert isinstance(block_info, BlockInfo)
name = split_camelcase(block_info.name).lower()
# Build rich description including input field descriptions,
# matching the searchable text that the embedding pipeline uses
desc_parts = [block_info.description or ""]
block_cls = all_blocks.get(block_info.id)
if block_cls is not None:
block: AnyBlockSchema = block_cls()
desc_parts += [
f"{f}: {info.description}"
for f, info in block.input_schema.model_fields.items()
if info.description
]
description = " ".join(desc_parts).lower()
score = _score_primary_fields(name, description, normalized_query)
# Add LLM model match bonus
if block_cls is not None and _matches_llm_model(
block_cls().input_schema, normalized_query
):
score += 20
if score >= MIN_SCORE_FOR_FILTERED_RESULTS:
results.append(
_ScoredItem(
item=block_info,
filter_type=item.filter_type,
score=score + BLOCK_SCORE_BOOST,
sort_key=name,
)
)
block_count = sum(1 for r in results if r.filter_type == "blocks")
integration_count = sum(1 for r in results if r.filter_type == "integrations")
return results, block_count, integration_count
def _build_library_items(
*,
agents: list[library_model.LibraryAgent],
@@ -467,6 +572,8 @@ async def _get_static_counts():
block: AnyBlockSchema = block_type()
if block.disabled:
continue
if block.id in EXCLUDED_BLOCK_IDS:
continue
all_blocks += 1
@@ -493,47 +600,25 @@ async def _get_static_counts():
}
def _contains_type(annotation: Any, target: type) -> bool:
"""Check if an annotation is or contains the target type (handles Optional/Union/Annotated)."""
if annotation is target:
return True
origin = get_origin(annotation)
if origin is None:
return False
return any(_contains_type(arg, target) for arg in get_args(annotation))
def _matches_llm_model(schema_cls: type[BlockSchema], query: str) -> bool:
for field in schema_cls.model_fields.values():
if field.annotation == LlmModel:
if _contains_type(field.annotation, LlmModel):
# Check if query matches any value in llm_models
if any(query in name for name in llm_models):
return True
return False
def _score_block(
block: AnyBlockSchema,
block_info: BlockInfo,
normalized_query: str,
) -> float:
if not normalized_query:
return 0.0
name = block_info.name.lower()
description = block_info.description.lower()
score = _score_primary_fields(name, description, normalized_query)
category_text = " ".join(
category.get("category", "").lower() for category in block_info.categories
)
score += _score_additional_field(category_text, normalized_query, 12, 6)
credentials_info = block.input_schema.get_credentials_fields_info().values()
provider_names = [
provider.value.lower()
for info in credentials_info
for provider in info.provider
]
provider_text = " ".join(provider_names)
score += _score_additional_field(provider_text, normalized_query, 15, 6)
if _matches_llm_model(block.input_schema, normalized_query):
score += 20
return score
def _score_library_agent(
agent: library_model.LibraryAgent,
normalized_query: str,
@@ -640,45 +725,32 @@ def _get_all_providers() -> dict[ProviderName, Provider]:
return providers
@cached(ttl_seconds=3600)
@cached(ttl_seconds=3600, shared_cache=True)
async def get_suggested_blocks(count: int = 5) -> list[BlockInfo]:
suggested_blocks = []
# Sum the number of executions for each block type
# Prisma cannot group by nested relations, so we do a raw query
# Calculate the cutoff timestamp
timestamp_threshold = datetime.now(timezone.utc) - timedelta(days=30)
"""Return the most-executed blocks from the last 14 days.
results = await query_raw_with_schema(
"""
SELECT
agent_node."agentBlockId" AS block_id,
COUNT(execution.id) AS execution_count
FROM {schema_prefix}"AgentNodeExecution" execution
JOIN {schema_prefix}"AgentNode" agent_node ON execution."agentNodeId" = agent_node.id
WHERE execution."endedTime" >= $1::timestamp
GROUP BY agent_node."agentBlockId"
ORDER BY execution_count DESC;
""",
timestamp_threshold,
)
Queries the mv_suggested_blocks materialized view (refreshed hourly via pg_cron)
and returns the top `count` blocks sorted by execution count, excluding
Input/Output/Agent block types and blocks in EXCLUDED_BLOCK_IDS.
"""
results = await mv_suggested_blocks.prisma().find_many()
# Get the top blocks based on execution count
# But ignore Input and Output blocks
# But ignore Input, Output, Agent, and excluded blocks
blocks: list[tuple[BlockInfo, int]] = []
execution_counts = {row.block_id: row.execution_count for row in results}
for block_type in load_all_blocks().values():
block: AnyBlockSchema = block_type()
if block.disabled or block.block_type in (
backend.data.block.BlockType.INPUT,
backend.data.block.BlockType.OUTPUT,
backend.data.block.BlockType.AGENT,
BlockType.INPUT,
BlockType.OUTPUT,
BlockType.AGENT,
):
continue
# Find the execution count for this block
execution_count = next(
(row["execution_count"] for row in results if row["block_id"] == block.id),
0,
)
if block.id in EXCLUDED_BLOCK_IDS:
continue
execution_count = execution_counts.get(block.id, 0)
blocks.append((block.get_info(), execution_count))
# Sort blocks by execution count
blocks.sort(key=lambda x: x[1], reverse=True)

View File

@@ -4,7 +4,7 @@ from pydantic import BaseModel
import backend.api.features.library.model as library_model
import backend.api.features.store.model as store_model
from backend.data.block import BlockInfo
from backend.blocks._base import BlockInfo
from backend.integrations.providers import ProviderName
from backend.util.models import Pagination
@@ -15,7 +15,7 @@ FilterType = Literal[
"my_agents",
]
BlockType = Literal["all", "input", "action", "output"]
BlockTypeFilter = Literal["all", "input", "action", "output"]
class SearchEntry(BaseModel):
@@ -27,7 +27,6 @@ class SearchEntry(BaseModel):
# Suggestions
class SuggestionsResponse(BaseModel):
otto_suggestions: list[str]
recent_searches: list[SearchEntry]
providers: list[ProviderName]
top_blocks: list[BlockInfo]

View File

@@ -1,5 +1,5 @@
import logging
from typing import Annotated, Sequence
from typing import Annotated, Sequence, cast, get_args
import fastapi
from autogpt_libs.auth.dependencies import get_user_id, requires_user
@@ -10,6 +10,8 @@ from backend.util.models import Pagination
from . import db as builder_db
from . import model as builder_model
VALID_FILTER_VALUES = get_args(builder_model.FilterType)
logger = logging.getLogger(__name__)
router = fastapi.APIRouter(
@@ -49,11 +51,6 @@ async def get_suggestions(
Get all suggestions for the Blocks Menu.
"""
return builder_model.SuggestionsResponse(
otto_suggestions=[
"What blocks do I need to get started?",
"Help me create a list",
"Help me feed my data to Google Maps",
],
recent_searches=await builder_db.get_recent_searches(user_id),
providers=[
ProviderName.TWITTER,
@@ -88,7 +85,7 @@ async def get_block_categories(
)
async def get_blocks(
category: Annotated[str | None, fastapi.Query()] = None,
type: Annotated[builder_model.BlockType | None, fastapi.Query()] = None,
type: Annotated[builder_model.BlockTypeFilter | None, fastapi.Query()] = None,
provider: Annotated[ProviderName | None, fastapi.Query()] = None,
page: Annotated[int, fastapi.Query()] = 1,
page_size: Annotated[int, fastapi.Query()] = 50,
@@ -151,7 +148,7 @@ async def get_providers(
async def search(
user_id: Annotated[str, fastapi.Security(get_user_id)],
search_query: Annotated[str | None, fastapi.Query()] = None,
filter: Annotated[list[builder_model.FilterType] | None, fastapi.Query()] = None,
filter: Annotated[str | None, fastapi.Query()] = None,
search_id: Annotated[str | None, fastapi.Query()] = None,
by_creator: Annotated[list[str] | None, fastapi.Query()] = None,
page: Annotated[int, fastapi.Query()] = 1,
@@ -160,9 +157,20 @@ async def search(
"""
Search for blocks (including integrations), marketplace agents, and user library agents.
"""
# If no filters are provided, then we will return all types
if not filter:
filter = [
# Parse and validate filter parameter
filters: list[builder_model.FilterType]
if filter:
filter_values = [f.strip() for f in filter.split(",")]
invalid_filters = [f for f in filter_values if f not in VALID_FILTER_VALUES]
if invalid_filters:
raise fastapi.HTTPException(
status_code=400,
detail=f"Invalid filter value(s): {', '.join(invalid_filters)}. "
f"Valid values are: {', '.join(VALID_FILTER_VALUES)}",
)
filters = cast(list[builder_model.FilterType], filter_values)
else:
filters = [
"blocks",
"integrations",
"marketplace_agents",
@@ -174,7 +182,7 @@ async def search(
cached_results = await builder_db.get_sorted_search_results(
user_id=user_id,
search_query=search_query,
filters=filter,
filters=filters,
by_creator=by_creator,
)
@@ -196,7 +204,7 @@ async def search(
user_id,
builder_model.SearchEntry(
search_query=search_query,
filter=filter,
filter=filters,
by_creator=by_creator,
search_id=search_id,
),

View File

@@ -1,368 +0,0 @@
"""Redis Streams consumer for operation completion messages.
This module provides a consumer (ChatCompletionConsumer) that listens for
completion notifications (OperationCompleteMessage) from external services
(like Agent Generator) and triggers the appropriate stream registry and
chat service updates via process_operation_success/process_operation_failure.
Why Redis Streams instead of RabbitMQ?
--------------------------------------
While the project typically uses RabbitMQ for async task queues (e.g., execution
queue), Redis Streams was chosen for chat completion notifications because:
1. **Unified Infrastructure**: The SSE reconnection feature already uses Redis
Streams (via stream_registry) for message persistence and replay. Using Redis
Streams for completion notifications keeps all chat streaming infrastructure
in one system, simplifying operations and reducing cross-system coordination.
2. **Message Replay**: Redis Streams support XREAD with arbitrary message IDs,
allowing consumers to replay missed messages after reconnection. This aligns
with the SSE reconnection pattern where clients can resume from last_message_id.
3. **Consumer Groups with XAUTOCLAIM**: Redis consumer groups provide automatic
load balancing across pods with explicit message claiming (XAUTOCLAIM) for
recovering from dead consumers - ideal for the completion callback pattern.
4. **Lower Latency**: For real-time SSE updates, Redis (already in-memory for
stream_registry) provides lower latency than an additional RabbitMQ hop.
5. **Atomicity with Task State**: Completion processing often needs to update
task metadata stored in Redis. Keeping both in Redis enables simpler
transactional semantics without distributed coordination.
The consumer uses Redis Streams with consumer groups for reliable message
processing across multiple platform pods, with XAUTOCLAIM for reclaiming
stale pending messages from dead consumers.
"""
import asyncio
import logging
import os
import uuid
from typing import Any
import orjson
from prisma import Prisma
from pydantic import BaseModel
from redis.exceptions import ResponseError
from backend.data.redis_client import get_redis_async
from . import stream_registry
from .completion_handler import process_operation_failure, process_operation_success
from .config import ChatConfig
logger = logging.getLogger(__name__)
config = ChatConfig()
class OperationCompleteMessage(BaseModel):
"""Message format for operation completion notifications."""
operation_id: str
task_id: str
success: bool
result: dict | str | None = None
error: str | None = None
class ChatCompletionConsumer:
"""Consumer for chat operation completion messages from Redis Streams.
This consumer initializes its own Prisma client in start() to ensure
database operations work correctly within this async context.
Uses Redis consumer groups to allow multiple platform pods to consume
messages reliably with automatic redelivery on failure.
"""
def __init__(self):
self._consumer_task: asyncio.Task | None = None
self._running = False
self._prisma: Prisma | None = None
self._consumer_name = f"consumer-{uuid.uuid4().hex[:8]}"
async def start(self) -> None:
"""Start the completion consumer."""
if self._running:
logger.warning("Completion consumer already running")
return
# Create consumer group if it doesn't exist
try:
redis = await get_redis_async()
await redis.xgroup_create(
config.stream_completion_name,
config.stream_consumer_group,
id="0",
mkstream=True,
)
logger.info(
f"Created consumer group '{config.stream_consumer_group}' "
f"on stream '{config.stream_completion_name}'"
)
except ResponseError as e:
if "BUSYGROUP" in str(e):
logger.debug(
f"Consumer group '{config.stream_consumer_group}' already exists"
)
else:
raise
self._running = True
self._consumer_task = asyncio.create_task(self._consume_messages())
logger.info(
f"Chat completion consumer started (consumer: {self._consumer_name})"
)
async def _ensure_prisma(self) -> Prisma:
"""Lazily initialize Prisma client on first use."""
if self._prisma is None:
database_url = os.getenv("DATABASE_URL", "postgresql://localhost:5432")
self._prisma = Prisma(datasource={"url": database_url})
await self._prisma.connect()
logger.info("[COMPLETION] Consumer Prisma client connected (lazy init)")
return self._prisma
async def stop(self) -> None:
"""Stop the completion consumer."""
self._running = False
if self._consumer_task:
self._consumer_task.cancel()
try:
await self._consumer_task
except asyncio.CancelledError:
pass
self._consumer_task = None
if self._prisma:
await self._prisma.disconnect()
self._prisma = None
logger.info("[COMPLETION] Consumer Prisma client disconnected")
logger.info("Chat completion consumer stopped")
async def _consume_messages(self) -> None:
"""Main message consumption loop with retry logic."""
max_retries = 10
retry_delay = 5 # seconds
retry_count = 0
block_timeout = 5000 # milliseconds
while self._running and retry_count < max_retries:
try:
redis = await get_redis_async()
# Reset retry count on successful connection
retry_count = 0
while self._running:
# First, claim any stale pending messages from dead consumers
# Redis does NOT auto-redeliver pending messages; we must explicitly
# claim them using XAUTOCLAIM
try:
claimed_result = await redis.xautoclaim(
name=config.stream_completion_name,
groupname=config.stream_consumer_group,
consumername=self._consumer_name,
min_idle_time=config.stream_claim_min_idle_ms,
start_id="0-0",
count=10,
)
# xautoclaim returns: (next_start_id, [(id, data), ...], [deleted_ids])
if claimed_result and len(claimed_result) >= 2:
claimed_entries = claimed_result[1]
if claimed_entries:
logger.info(
f"Claimed {len(claimed_entries)} stale pending messages"
)
for entry_id, data in claimed_entries:
if not self._running:
return
await self._process_entry(redis, entry_id, data)
except Exception as e:
logger.warning(f"XAUTOCLAIM failed (non-fatal): {e}")
# Read new messages from the stream
messages = await redis.xreadgroup(
groupname=config.stream_consumer_group,
consumername=self._consumer_name,
streams={config.stream_completion_name: ">"},
block=block_timeout,
count=10,
)
if not messages:
continue
for stream_name, entries in messages:
for entry_id, data in entries:
if not self._running:
return
await self._process_entry(redis, entry_id, data)
except asyncio.CancelledError:
logger.info("Consumer cancelled")
return
except Exception as e:
retry_count += 1
logger.error(
f"Consumer error (retry {retry_count}/{max_retries}): {e}",
exc_info=True,
)
if self._running and retry_count < max_retries:
await asyncio.sleep(retry_delay)
else:
logger.error("Max retries reached, stopping consumer")
return
async def _process_entry(
self, redis: Any, entry_id: str, data: dict[str, Any]
) -> None:
"""Process a single stream entry and acknowledge it on success.
Args:
redis: Redis client connection
entry_id: The stream entry ID
data: The entry data dict
"""
try:
# Handle the message
message_data = data.get("data")
if message_data:
await self._handle_message(
message_data.encode()
if isinstance(message_data, str)
else message_data
)
# Acknowledge the message after successful processing
await redis.xack(
config.stream_completion_name,
config.stream_consumer_group,
entry_id,
)
except Exception as e:
logger.error(
f"Error processing completion message {entry_id}: {e}",
exc_info=True,
)
# Message remains in pending state and will be claimed by
# XAUTOCLAIM after min_idle_time expires
async def _handle_message(self, body: bytes) -> None:
"""Handle a completion message using our own Prisma client."""
try:
data = orjson.loads(body)
message = OperationCompleteMessage(**data)
except Exception as e:
logger.error(f"Failed to parse completion message: {e}")
return
logger.info(
f"[COMPLETION] Received completion for operation {message.operation_id} "
f"(task_id={message.task_id}, success={message.success})"
)
# Find task in registry
task = await stream_registry.find_task_by_operation_id(message.operation_id)
if task is None:
task = await stream_registry.get_task(message.task_id)
if task is None:
logger.warning(
f"[COMPLETION] Task not found for operation {message.operation_id} "
f"(task_id={message.task_id})"
)
return
logger.info(
f"[COMPLETION] Found task: task_id={task.task_id}, "
f"session_id={task.session_id}, tool_call_id={task.tool_call_id}"
)
# Guard against empty task fields
if not task.task_id or not task.session_id or not task.tool_call_id:
logger.error(
f"[COMPLETION] Task has empty critical fields! "
f"task_id={task.task_id!r}, session_id={task.session_id!r}, "
f"tool_call_id={task.tool_call_id!r}"
)
return
if message.success:
await self._handle_success(task, message)
else:
await self._handle_failure(task, message)
async def _handle_success(
self,
task: stream_registry.ActiveTask,
message: OperationCompleteMessage,
) -> None:
"""Handle successful operation completion."""
prisma = await self._ensure_prisma()
await process_operation_success(task, message.result, prisma)
async def _handle_failure(
self,
task: stream_registry.ActiveTask,
message: OperationCompleteMessage,
) -> None:
"""Handle failed operation completion."""
prisma = await self._ensure_prisma()
await process_operation_failure(task, message.error, prisma)
# Module-level consumer instance
_consumer: ChatCompletionConsumer | None = None
async def start_completion_consumer() -> None:
"""Start the global completion consumer."""
global _consumer
if _consumer is None:
_consumer = ChatCompletionConsumer()
await _consumer.start()
async def stop_completion_consumer() -> None:
"""Stop the global completion consumer."""
global _consumer
if _consumer:
await _consumer.stop()
_consumer = None
async def publish_operation_complete(
operation_id: str,
task_id: str,
success: bool,
result: dict | str | None = None,
error: str | None = None,
) -> None:
"""Publish an operation completion message to Redis Streams.
Args:
operation_id: The operation ID that completed.
task_id: The task ID associated with the operation.
success: Whether the operation succeeded.
result: The result data (for success).
error: The error message (for failure).
"""
message = OperationCompleteMessage(
operation_id=operation_id,
task_id=task_id,
success=success,
result=result,
error=error,
)
redis = await get_redis_async()
await redis.xadd(
config.stream_completion_name,
{"data": message.model_dump_json()},
maxlen=config.stream_max_length,
)
logger.info(f"Published completion for operation {operation_id}")

View File

@@ -1,344 +0,0 @@
"""Shared completion handling for operation success and failure.
This module provides common logic for handling operation completion from both:
- The Redis Streams consumer (completion_consumer.py)
- The HTTP webhook endpoint (routes.py)
"""
import logging
from typing import Any
import orjson
from prisma import Prisma
from . import service as chat_service
from . import stream_registry
from .response_model import StreamError, StreamToolOutputAvailable
from .tools.models import ErrorResponse
logger = logging.getLogger(__name__)
# Tools that produce agent_json that needs to be saved to library
AGENT_GENERATION_TOOLS = {"create_agent", "edit_agent"}
# Keys that should be stripped from agent_json when returning in error responses
SENSITIVE_KEYS = frozenset(
{
"api_key",
"apikey",
"api_secret",
"password",
"secret",
"credentials",
"credential",
"token",
"access_token",
"refresh_token",
"private_key",
"privatekey",
"auth",
"authorization",
}
)
def _sanitize_agent_json(obj: Any) -> Any:
"""Recursively sanitize agent_json by removing sensitive keys.
Args:
obj: The object to sanitize (dict, list, or primitive)
Returns:
Sanitized copy with sensitive keys removed/redacted
"""
if isinstance(obj, dict):
return {
k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else _sanitize_agent_json(v)
for k, v in obj.items()
}
elif isinstance(obj, list):
return [_sanitize_agent_json(item) for item in obj]
else:
return obj
class ToolMessageUpdateError(Exception):
"""Raised when updating a tool message in the database fails."""
pass
async def _update_tool_message(
session_id: str,
tool_call_id: str,
content: str,
prisma_client: Prisma | None,
) -> None:
"""Update tool message in database.
Args:
session_id: The session ID
tool_call_id: The tool call ID to update
content: The new content for the message
prisma_client: Optional Prisma client. If None, uses chat_service.
Raises:
ToolMessageUpdateError: If the database update fails. The caller should
handle this to avoid marking the task as completed with inconsistent state.
"""
try:
if prisma_client:
# Use provided Prisma client (for consumer with its own connection)
updated_count = await prisma_client.chatmessage.update_many(
where={
"sessionId": session_id,
"toolCallId": tool_call_id,
},
data={"content": content},
)
# Check if any rows were updated - 0 means message not found
if updated_count == 0:
raise ToolMessageUpdateError(
f"No message found with tool_call_id={tool_call_id} in session {session_id}"
)
else:
# Use service function (for webhook endpoint)
await chat_service._update_pending_operation(
session_id=session_id,
tool_call_id=tool_call_id,
result=content,
)
except ToolMessageUpdateError:
raise
except Exception as e:
logger.error(f"[COMPLETION] Failed to update tool message: {e}", exc_info=True)
raise ToolMessageUpdateError(
f"Failed to update tool message for tool_call_id={tool_call_id}: {e}"
) from e
def serialize_result(result: dict | list | str | int | float | bool | None) -> str:
"""Serialize result to JSON string with sensible defaults.
Args:
result: The result to serialize. Can be a dict, list, string,
number, boolean, or None.
Returns:
JSON string representation of the result. Returns '{"status": "completed"}'
only when result is explicitly None.
"""
if isinstance(result, str):
return result
if result is None:
return '{"status": "completed"}'
return orjson.dumps(result).decode("utf-8")
async def _save_agent_from_result(
result: dict[str, Any],
user_id: str | None,
tool_name: str,
) -> dict[str, Any]:
"""Save agent to library if result contains agent_json.
Args:
result: The result dict that may contain agent_json
user_id: The user ID to save the agent for
tool_name: The tool name (create_agent or edit_agent)
Returns:
Updated result dict with saved agent details, or original result if no agent_json
"""
if not user_id:
logger.warning("[COMPLETION] Cannot save agent: no user_id in task")
return result
agent_json = result.get("agent_json")
if not agent_json:
logger.warning(
f"[COMPLETION] {tool_name} completed but no agent_json in result"
)
return result
try:
from .tools.agent_generator import save_agent_to_library
is_update = tool_name == "edit_agent"
created_graph, library_agent = await save_agent_to_library(
agent_json, user_id, is_update=is_update
)
logger.info(
f"[COMPLETION] Saved agent '{created_graph.name}' to library "
f"(graph_id={created_graph.id}, library_agent_id={library_agent.id})"
)
# Return a response similar to AgentSavedResponse
return {
"type": "agent_saved",
"message": f"Agent '{created_graph.name}' has been saved to your library!",
"agent_id": created_graph.id,
"agent_name": created_graph.name,
"library_agent_id": library_agent.id,
"library_agent_link": f"/library/agents/{library_agent.id}",
"agent_page_link": f"/build?flowID={created_graph.id}",
}
except Exception as e:
logger.error(
f"[COMPLETION] Failed to save agent to library: {e}",
exc_info=True,
)
# Return error but don't fail the whole operation
# Sanitize agent_json to remove sensitive keys before returning
return {
"type": "error",
"message": f"Agent was generated but failed to save: {str(e)}",
"error": str(e),
"agent_json": _sanitize_agent_json(agent_json),
}
async def process_operation_success(
task: stream_registry.ActiveTask,
result: dict | str | None,
prisma_client: Prisma | None = None,
) -> None:
"""Handle successful operation completion.
Publishes the result to the stream registry, updates the database,
generates LLM continuation, and marks the task as completed.
Args:
task: The active task that completed
result: The result data from the operation
prisma_client: Optional Prisma client for database operations.
If None, uses chat_service._update_pending_operation instead.
Raises:
ToolMessageUpdateError: If the database update fails. The task will be
marked as failed instead of completed to avoid inconsistent state.
"""
# For agent generation tools, save the agent to library
if task.tool_name in AGENT_GENERATION_TOOLS and isinstance(result, dict):
result = await _save_agent_from_result(result, task.user_id, task.tool_name)
# Serialize result for output (only substitute default when result is exactly None)
result_output = result if result is not None else {"status": "completed"}
output_str = (
result_output
if isinstance(result_output, str)
else orjson.dumps(result_output).decode("utf-8")
)
# Publish result to stream registry
await stream_registry.publish_chunk(
task.task_id,
StreamToolOutputAvailable(
toolCallId=task.tool_call_id,
toolName=task.tool_name,
output=output_str,
success=True,
),
)
# Update pending operation in database
# If this fails, we must not continue to mark the task as completed
result_str = serialize_result(result)
try:
await _update_tool_message(
session_id=task.session_id,
tool_call_id=task.tool_call_id,
content=result_str,
prisma_client=prisma_client,
)
except ToolMessageUpdateError:
# DB update failed - mark task as failed to avoid inconsistent state
logger.error(
f"[COMPLETION] DB update failed for task {task.task_id}, "
"marking as failed instead of completed"
)
await stream_registry.publish_chunk(
task.task_id,
StreamError(errorText="Failed to save operation result to database"),
)
await stream_registry.mark_task_completed(task.task_id, status="failed")
raise
# Generate LLM continuation with streaming
try:
await chat_service._generate_llm_continuation_with_streaming(
session_id=task.session_id,
user_id=task.user_id,
task_id=task.task_id,
)
except Exception as e:
logger.error(
f"[COMPLETION] Failed to generate LLM continuation: {e}",
exc_info=True,
)
# Mark task as completed and release Redis lock
await stream_registry.mark_task_completed(task.task_id, status="completed")
try:
await chat_service._mark_operation_completed(task.tool_call_id)
except Exception as e:
logger.error(f"[COMPLETION] Failed to mark operation completed: {e}")
logger.info(
f"[COMPLETION] Successfully processed completion for task {task.task_id}"
)
async def process_operation_failure(
task: stream_registry.ActiveTask,
error: str | None,
prisma_client: Prisma | None = None,
) -> None:
"""Handle failed operation completion.
Publishes the error to the stream registry, updates the database with
the error response, and marks the task as failed.
Args:
task: The active task that failed
error: The error message from the operation
prisma_client: Optional Prisma client for database operations.
If None, uses chat_service._update_pending_operation instead.
"""
error_msg = error or "Operation failed"
# Publish error to stream registry
await stream_registry.publish_chunk(
task.task_id,
StreamError(errorText=error_msg),
)
# Update pending operation with error
# If this fails, we still continue to mark the task as failed
error_response = ErrorResponse(
message=error_msg,
error=error,
)
try:
await _update_tool_message(
session_id=task.session_id,
tool_call_id=task.tool_call_id,
content=error_response.model_dump_json(),
prisma_client=prisma_client,
)
except ToolMessageUpdateError:
# DB update failed - log but continue with cleanup
logger.error(
f"[COMPLETION] DB update failed while processing failure for task {task.task_id}, "
"continuing with cleanup"
)
# Mark task as failed and release Redis lock
await stream_registry.mark_task_completed(task.task_id, status="failed")
try:
await chat_service._mark_operation_completed(task.tool_call_id)
except Exception as e:
logger.error(f"[COMPLETION] Failed to mark operation completed: {e}")
logger.info(f"[COMPLETION] Processed failure for task {task.task_id}: {error_msg}")

View File

@@ -1,146 +0,0 @@
"""Configuration management for chat system."""
import os
from pydantic import Field, field_validator
from pydantic_settings import BaseSettings
class ChatConfig(BaseSettings):
"""Configuration for the chat system."""
# OpenAI API Configuration
model: str = Field(
default="anthropic/claude-opus-4.6", description="Default model to use"
)
title_model: str = Field(
default="openai/gpt-4o-mini",
description="Model to use for generating session titles (should be fast/cheap)",
)
api_key: str | None = Field(default=None, description="OpenAI API key")
base_url: str | None = Field(
default="https://openrouter.ai/api/v1",
description="Base URL for API (e.g., for OpenRouter)",
)
# Session TTL Configuration - 12 hours
session_ttl: int = Field(default=43200, description="Session TTL in seconds")
# Streaming Configuration
max_context_messages: int = Field(
default=50, ge=1, le=200, description="Maximum context messages"
)
stream_timeout: int = Field(default=300, description="Stream timeout in seconds")
max_retries: int = Field(default=3, description="Maximum number of retries")
max_agent_runs: int = Field(default=30, description="Maximum number of agent runs")
max_agent_schedules: int = Field(
default=30, description="Maximum number of agent schedules"
)
# Long-running operation configuration
long_running_operation_ttl: int = Field(
default=600,
description="TTL in seconds for long-running operation tracking in Redis (safety net if pod dies)",
)
# Stream registry configuration for SSE reconnection
stream_ttl: int = Field(
default=3600,
description="TTL in seconds for stream data in Redis (1 hour)",
)
stream_max_length: int = Field(
default=10000,
description="Maximum number of messages to store per stream",
)
# Redis Streams configuration for completion consumer
stream_completion_name: str = Field(
default="chat:completions",
description="Redis Stream name for operation completions",
)
stream_consumer_group: str = Field(
default="chat_consumers",
description="Consumer group name for completion stream",
)
stream_claim_min_idle_ms: int = Field(
default=60000,
description="Minimum idle time in milliseconds before claiming pending messages from dead consumers",
)
# Redis key prefixes for stream registry
task_meta_prefix: str = Field(
default="chat:task:meta:",
description="Prefix for task metadata hash keys",
)
task_stream_prefix: str = Field(
default="chat:stream:",
description="Prefix for task message stream keys",
)
task_op_prefix: str = Field(
default="chat:task:op:",
description="Prefix for operation ID to task ID mapping keys",
)
internal_api_key: str | None = Field(
default=None,
description="API key for internal webhook callbacks (env: CHAT_INTERNAL_API_KEY)",
)
# Langfuse Prompt Management Configuration
# Note: Langfuse credentials are in Settings().secrets (settings.py)
langfuse_prompt_name: str = Field(
default="CoPilot Prompt",
description="Name of the prompt in Langfuse to fetch",
)
@field_validator("api_key", mode="before")
@classmethod
def get_api_key(cls, v):
"""Get API key from environment if not provided."""
if v is None:
# Try to get from environment variables
# First check for CHAT_API_KEY (Pydantic prefix)
v = os.getenv("CHAT_API_KEY")
if not v:
# Fall back to OPEN_ROUTER_API_KEY
v = os.getenv("OPEN_ROUTER_API_KEY")
if not v:
# Fall back to OPENAI_API_KEY
v = os.getenv("OPENAI_API_KEY")
return v
@field_validator("base_url", mode="before")
@classmethod
def get_base_url(cls, v):
"""Get base URL from environment if not provided."""
if v is None:
# Check for OpenRouter or custom base URL
v = os.getenv("CHAT_BASE_URL")
if not v:
v = os.getenv("OPENROUTER_BASE_URL")
if not v:
v = os.getenv("OPENAI_BASE_URL")
if not v:
v = "https://openrouter.ai/api/v1"
return v
@field_validator("internal_api_key", mode="before")
@classmethod
def get_internal_api_key(cls, v):
"""Get internal API key from environment if not provided."""
if v is None:
v = os.getenv("CHAT_INTERNAL_API_KEY")
return v
# Prompt paths for different contexts
PROMPT_PATHS: dict[str, str] = {
"default": "prompts/chat_system.md",
"onboarding": "prompts/onboarding_system.md",
}
class Config:
"""Pydantic config."""
env_file = ".env"
env_file_encoding = "utf-8"
extra = "ignore" # Ignore extra environment variables

View File

@@ -1,288 +0,0 @@
"""Database operations for chat sessions."""
import asyncio
import logging
from datetime import UTC, datetime
from typing import Any, cast
from prisma.models import ChatMessage as PrismaChatMessage
from prisma.models import ChatSession as PrismaChatSession
from prisma.types import (
ChatMessageCreateInput,
ChatSessionCreateInput,
ChatSessionUpdateInput,
ChatSessionWhereInput,
)
from backend.data.db import transaction
from backend.util.json import SafeJson
logger = logging.getLogger(__name__)
async def get_chat_session(session_id: str) -> PrismaChatSession | None:
"""Get a chat session by ID from the database."""
session = await PrismaChatSession.prisma().find_unique(
where={"id": session_id},
include={"Messages": True},
)
if session and session.Messages:
# Sort messages by sequence in Python - Prisma Python client doesn't support
# order_by in include clauses (unlike Prisma JS), so we sort after fetching
session.Messages.sort(key=lambda m: m.sequence)
return session
async def create_chat_session(
session_id: str,
user_id: str,
) -> PrismaChatSession:
"""Create a new chat session in the database."""
data = ChatSessionCreateInput(
id=session_id,
userId=user_id,
credentials=SafeJson({}),
successfulAgentRuns=SafeJson({}),
successfulAgentSchedules=SafeJson({}),
)
return await PrismaChatSession.prisma().create(data=data)
async def update_chat_session(
session_id: str,
credentials: dict[str, Any] | None = None,
successful_agent_runs: dict[str, Any] | None = None,
successful_agent_schedules: dict[str, Any] | None = None,
total_prompt_tokens: int | None = None,
total_completion_tokens: int | None = None,
title: str | None = None,
) -> PrismaChatSession | None:
"""Update a chat session's metadata."""
data: ChatSessionUpdateInput = {"updatedAt": datetime.now(UTC)}
if credentials is not None:
data["credentials"] = SafeJson(credentials)
if successful_agent_runs is not None:
data["successfulAgentRuns"] = SafeJson(successful_agent_runs)
if successful_agent_schedules is not None:
data["successfulAgentSchedules"] = SafeJson(successful_agent_schedules)
if total_prompt_tokens is not None:
data["totalPromptTokens"] = total_prompt_tokens
if total_completion_tokens is not None:
data["totalCompletionTokens"] = total_completion_tokens
if title is not None:
data["title"] = title
session = await PrismaChatSession.prisma().update(
where={"id": session_id},
data=data,
include={"Messages": True},
)
if session and session.Messages:
# Sort in Python - Prisma Python doesn't support order_by in include clauses
session.Messages.sort(key=lambda m: m.sequence)
return session
async def add_chat_message(
session_id: str,
role: str,
sequence: int,
content: str | None = None,
name: str | None = None,
tool_call_id: str | None = None,
refusal: str | None = None,
tool_calls: list[dict[str, Any]] | None = None,
function_call: dict[str, Any] | None = None,
) -> PrismaChatMessage:
"""Add a message to a chat session."""
# Build input dict dynamically rather than using ChatMessageCreateInput directly
# because Prisma's TypedDict validation rejects optional fields set to None.
# We only include fields that have values, then cast at the end.
data: dict[str, Any] = {
"Session": {"connect": {"id": session_id}},
"role": role,
"sequence": sequence,
}
# Add optional string fields
if content is not None:
data["content"] = content
if name is not None:
data["name"] = name
if tool_call_id is not None:
data["toolCallId"] = tool_call_id
if refusal is not None:
data["refusal"] = refusal
# Add optional JSON fields only when they have values
if tool_calls is not None:
data["toolCalls"] = SafeJson(tool_calls)
if function_call is not None:
data["functionCall"] = SafeJson(function_call)
# Run message create and session timestamp update in parallel for lower latency
_, message = await asyncio.gather(
PrismaChatSession.prisma().update(
where={"id": session_id},
data={"updatedAt": datetime.now(UTC)},
),
PrismaChatMessage.prisma().create(data=cast(ChatMessageCreateInput, data)),
)
return message
async def add_chat_messages_batch(
session_id: str,
messages: list[dict[str, Any]],
start_sequence: int,
) -> list[PrismaChatMessage]:
"""Add multiple messages to a chat session in a batch.
Uses a transaction for atomicity - if any message creation fails,
the entire batch is rolled back.
"""
if not messages:
return []
created_messages = []
async with transaction() as tx:
for i, msg in enumerate(messages):
# Build input dict dynamically rather than using ChatMessageCreateInput
# directly because Prisma's TypedDict validation rejects optional fields
# set to None. We only include fields that have values, then cast.
data: dict[str, Any] = {
"Session": {"connect": {"id": session_id}},
"role": msg["role"],
"sequence": start_sequence + i,
}
# Add optional string fields
if msg.get("content") is not None:
data["content"] = msg["content"]
if msg.get("name") is not None:
data["name"] = msg["name"]
if msg.get("tool_call_id") is not None:
data["toolCallId"] = msg["tool_call_id"]
if msg.get("refusal") is not None:
data["refusal"] = msg["refusal"]
# Add optional JSON fields only when they have values
if msg.get("tool_calls") is not None:
data["toolCalls"] = SafeJson(msg["tool_calls"])
if msg.get("function_call") is not None:
data["functionCall"] = SafeJson(msg["function_call"])
created = await PrismaChatMessage.prisma(tx).create(
data=cast(ChatMessageCreateInput, data)
)
created_messages.append(created)
# Update session's updatedAt timestamp within the same transaction.
# Note: Token usage (total_prompt_tokens, total_completion_tokens) is updated
# separately via update_chat_session() after streaming completes.
await PrismaChatSession.prisma(tx).update(
where={"id": session_id},
data={"updatedAt": datetime.now(UTC)},
)
return created_messages
async def get_user_chat_sessions(
user_id: str,
limit: int = 50,
offset: int = 0,
) -> list[PrismaChatSession]:
"""Get chat sessions for a user, ordered by most recent."""
return await PrismaChatSession.prisma().find_many(
where={"userId": user_id},
order={"updatedAt": "desc"},
take=limit,
skip=offset,
)
async def get_user_session_count(user_id: str) -> int:
"""Get the total number of chat sessions for a user."""
return await PrismaChatSession.prisma().count(where={"userId": user_id})
async def delete_chat_session(session_id: str, user_id: str | None = None) -> bool:
"""Delete a chat session and all its messages.
Args:
session_id: The session ID to delete.
user_id: If provided, validates that the session belongs to this user
before deletion. This prevents unauthorized deletion of other
users' sessions.
Returns:
True if deleted successfully, False otherwise.
"""
try:
# Build typed where clause with optional user_id validation
where_clause: ChatSessionWhereInput = {"id": session_id}
if user_id is not None:
where_clause["userId"] = user_id
result = await PrismaChatSession.prisma().delete_many(where=where_clause)
if result == 0:
logger.warning(
f"No session deleted for {session_id} "
f"(user_id validation: {user_id is not None})"
)
return False
return True
except Exception as e:
logger.error(f"Failed to delete chat session {session_id}: {e}")
return False
async def get_chat_session_message_count(session_id: str) -> int:
"""Get the number of messages in a chat session."""
count = await PrismaChatMessage.prisma().count(where={"sessionId": session_id})
return count
async def update_tool_message_content(
session_id: str,
tool_call_id: str,
new_content: str,
) -> bool:
"""Update the content of a tool message in chat history.
Used by background tasks to update pending operation messages with final results.
Args:
session_id: The chat session ID.
tool_call_id: The tool call ID to find the message.
new_content: The new content to set.
Returns:
True if a message was updated, False otherwise.
"""
try:
result = await PrismaChatMessage.prisma().update_many(
where={
"sessionId": session_id,
"toolCallId": tool_call_id,
},
data={
"content": new_content,
},
)
if result == 0:
logger.warning(
f"No message found to update for session {session_id}, "
f"tool_call_id {tool_call_id}"
)
return False
return True
except Exception as e:
logger.error(
f"Failed to update tool message for session {session_id}, "
f"tool_call_id {tool_call_id}: {e}"
)
return False

View File

@@ -1,119 +0,0 @@
import pytest
from .model import (
ChatMessage,
ChatSession,
Usage,
get_chat_session,
upsert_chat_session,
)
messages = [
ChatMessage(content="Hello, how are you?", role="user"),
ChatMessage(
content="I'm fine, thank you!",
role="assistant",
tool_calls=[
{
"id": "t123",
"type": "function",
"function": {
"name": "get_weather",
"arguments": '{"city": "New York"}',
},
}
],
),
ChatMessage(
content="I'm using the tool to get the weather",
role="tool",
tool_call_id="t123",
),
]
@pytest.mark.asyncio(loop_scope="session")
async def test_chatsession_serialization_deserialization():
s = ChatSession.new(user_id="abc123")
s.messages = messages
s.usage = [Usage(prompt_tokens=100, completion_tokens=200, total_tokens=300)]
serialized = s.model_dump_json()
s2 = ChatSession.model_validate_json(serialized)
assert s2.model_dump() == s.model_dump()
@pytest.mark.asyncio(loop_scope="session")
async def test_chatsession_redis_storage(setup_test_user, test_user_id):
s = ChatSession.new(user_id=test_user_id)
s.messages = messages
s = await upsert_chat_session(s)
s2 = await get_chat_session(
session_id=s.session_id,
user_id=s.user_id,
)
assert s2 == s
@pytest.mark.asyncio(loop_scope="session")
async def test_chatsession_redis_storage_user_id_mismatch(
setup_test_user, test_user_id
):
s = ChatSession.new(user_id=test_user_id)
s.messages = messages
s = await upsert_chat_session(s)
s2 = await get_chat_session(s.session_id, "different_user_id")
assert s2 is None
@pytest.mark.asyncio(loop_scope="session")
async def test_chatsession_db_storage(setup_test_user, test_user_id):
"""Test that messages are correctly saved to and loaded from DB (not cache)."""
from backend.data.redis_client import get_redis_async
# Create session with messages including assistant message
s = ChatSession.new(user_id=test_user_id)
s.messages = messages # Contains user, assistant, and tool messages
assert s.session_id is not None, "Session id is not set"
# Upsert to save to both cache and DB
s = await upsert_chat_session(s)
# Clear the Redis cache to force DB load
redis_key = f"chat:session:{s.session_id}"
async_redis = await get_redis_async()
await async_redis.delete(redis_key)
# Load from DB (cache was cleared)
s2 = await get_chat_session(
session_id=s.session_id,
user_id=s.user_id,
)
assert s2 is not None, "Session not found after loading from DB"
assert len(s2.messages) == len(
s.messages
), f"Message count mismatch: expected {len(s.messages)}, got {len(s2.messages)}"
# Verify all roles are present
roles = [m.role for m in s2.messages]
assert "user" in roles, f"User message missing. Roles found: {roles}"
assert "assistant" in roles, f"Assistant message missing. Roles found: {roles}"
assert "tool" in roles, f"Tool message missing. Roles found: {roles}"
# Verify message content
for orig, loaded in zip(s.messages, s2.messages):
assert orig.role == loaded.role, f"Role mismatch: {orig.role} != {loaded.role}"
assert (
orig.content == loaded.content
), f"Content mismatch for {orig.role}: {orig.content} != {loaded.content}"
if orig.tool_calls:
assert (
loaded.tool_calls is not None
), f"Tool calls missing for {orig.role} message"
assert len(orig.tool_calls) == len(loaded.tool_calls)

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,402 @@
"""Tests for chat API routes: session title update, file attachment validation, usage, and rate limiting."""
from datetime import UTC, datetime, timedelta
from unittest.mock import AsyncMock
import fastapi
import fastapi.testclient
import pytest
import pytest_mock
from backend.api.features.chat import routes as chat_routes
app = fastapi.FastAPI()
app.include_router(chat_routes.router)
client = fastapi.testclient.TestClient(app)
TEST_USER_ID = "3e53486c-cf57-477e-ba2a-cb02dc828e1a"
@pytest.fixture(autouse=True)
def setup_app_auth(mock_jwt_user):
"""Setup auth overrides for all tests in this module"""
from autogpt_libs.auth.jwt_utils import get_jwt_payload
app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
yield
app.dependency_overrides.clear()
def _mock_update_session_title(
mocker: pytest_mock.MockerFixture, *, success: bool = True
):
"""Mock update_session_title."""
return mocker.patch(
"backend.api.features.chat.routes.update_session_title",
new_callable=AsyncMock,
return_value=success,
)
# ─── Update title: success ─────────────────────────────────────────────
def test_update_title_success(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
mock_update = _mock_update_session_title(mocker, success=True)
response = client.patch(
"/sessions/sess-1/title",
json={"title": "My project"},
)
assert response.status_code == 200
assert response.json() == {"status": "ok"}
mock_update.assert_called_once_with("sess-1", test_user_id, "My project")
def test_update_title_trims_whitespace(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
mock_update = _mock_update_session_title(mocker, success=True)
response = client.patch(
"/sessions/sess-1/title",
json={"title": " trimmed "},
)
assert response.status_code == 200
mock_update.assert_called_once_with("sess-1", test_user_id, "trimmed")
# ─── Update title: blank / whitespace-only → 422 ──────────────────────
def test_update_title_blank_rejected(
test_user_id: str,
) -> None:
"""Whitespace-only titles must be rejected before hitting the DB."""
response = client.patch(
"/sessions/sess-1/title",
json={"title": " "},
)
assert response.status_code == 422
def test_update_title_empty_rejected(
test_user_id: str,
) -> None:
response = client.patch(
"/sessions/sess-1/title",
json={"title": ""},
)
assert response.status_code == 422
# ─── Update title: session not found or wrong user → 404 ──────────────
def test_update_title_not_found(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
_mock_update_session_title(mocker, success=False)
response = client.patch(
"/sessions/sess-1/title",
json={"title": "New name"},
)
assert response.status_code == 404
# ─── file_ids Pydantic validation ─────────────────────────────────────
def test_stream_chat_rejects_too_many_file_ids():
"""More than 20 file_ids should be rejected by Pydantic validation (422)."""
response = client.post(
"/sessions/sess-1/stream",
json={
"message": "hello",
"file_ids": [f"00000000-0000-0000-0000-{i:012d}" for i in range(21)],
},
)
assert response.status_code == 422
def _mock_stream_internals(mocker: pytest_mock.MockFixture):
"""Mock the async internals of stream_chat_post so tests can exercise
validation and enrichment logic without needing Redis/RabbitMQ."""
mocker.patch(
"backend.api.features.chat.routes._validate_and_get_session",
return_value=None,
)
mocker.patch(
"backend.api.features.chat.routes.append_and_save_message",
return_value=None,
)
mock_registry = mocker.MagicMock()
mock_registry.create_session = mocker.AsyncMock(return_value=None)
mocker.patch(
"backend.api.features.chat.routes.stream_registry",
mock_registry,
)
mocker.patch(
"backend.api.features.chat.routes.enqueue_copilot_turn",
return_value=None,
)
mocker.patch(
"backend.api.features.chat.routes.track_user_message",
return_value=None,
)
def test_stream_chat_accepts_20_file_ids(mocker: pytest_mock.MockFixture):
"""Exactly 20 file_ids should be accepted (not rejected by validation)."""
_mock_stream_internals(mocker)
# Patch workspace lookup as imported by the routes module
mocker.patch(
"backend.api.features.chat.routes.get_or_create_workspace",
return_value=type("W", (), {"id": "ws-1"})(),
)
mock_prisma = mocker.MagicMock()
mock_prisma.find_many = mocker.AsyncMock(return_value=[])
mocker.patch(
"prisma.models.UserWorkspaceFile.prisma",
return_value=mock_prisma,
)
response = client.post(
"/sessions/sess-1/stream",
json={
"message": "hello",
"file_ids": [f"00000000-0000-0000-0000-{i:012d}" for i in range(20)],
},
)
# Should get past validation — 200 streaming response expected
assert response.status_code == 200
# ─── UUID format filtering ─────────────────────────────────────────────
def test_file_ids_filters_invalid_uuids(mocker: pytest_mock.MockFixture):
"""Non-UUID strings in file_ids should be silently filtered out
and NOT passed to the database query."""
_mock_stream_internals(mocker)
mocker.patch(
"backend.api.features.chat.routes.get_or_create_workspace",
return_value=type("W", (), {"id": "ws-1"})(),
)
mock_prisma = mocker.MagicMock()
mock_prisma.find_many = mocker.AsyncMock(return_value=[])
mocker.patch(
"prisma.models.UserWorkspaceFile.prisma",
return_value=mock_prisma,
)
valid_id = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
client.post(
"/sessions/sess-1/stream",
json={
"message": "hello",
"file_ids": [
valid_id,
"not-a-uuid",
"../../../etc/passwd",
"",
],
},
)
# The find_many call should only receive the one valid UUID
mock_prisma.find_many.assert_called_once()
call_kwargs = mock_prisma.find_many.call_args[1]
assert call_kwargs["where"]["id"]["in"] == [valid_id]
# ─── Cross-workspace file_ids ─────────────────────────────────────────
def test_file_ids_scoped_to_workspace(mocker: pytest_mock.MockFixture):
"""The batch query should scope to the user's workspace."""
_mock_stream_internals(mocker)
mocker.patch(
"backend.api.features.chat.routes.get_or_create_workspace",
return_value=type("W", (), {"id": "my-workspace-id"})(),
)
mock_prisma = mocker.MagicMock()
mock_prisma.find_many = mocker.AsyncMock(return_value=[])
mocker.patch(
"prisma.models.UserWorkspaceFile.prisma",
return_value=mock_prisma,
)
fid = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
client.post(
"/sessions/sess-1/stream",
json={"message": "hi", "file_ids": [fid]},
)
call_kwargs = mock_prisma.find_many.call_args[1]
assert call_kwargs["where"]["workspaceId"] == "my-workspace-id"
assert call_kwargs["where"]["isDeleted"] is False
# ─── Rate limit → 429 ─────────────────────────────────────────────────
def test_stream_chat_returns_429_on_daily_rate_limit(mocker: pytest_mock.MockFixture):
"""When check_rate_limit raises RateLimitExceeded for daily limit the endpoint returns 429."""
from backend.copilot.rate_limit import RateLimitExceeded
_mock_stream_internals(mocker)
# Ensure the rate-limit branch is entered by setting a non-zero limit.
mocker.patch.object(chat_routes.config, "daily_token_limit", 10000)
mocker.patch.object(chat_routes.config, "weekly_token_limit", 50000)
mocker.patch(
"backend.api.features.chat.routes.check_rate_limit",
side_effect=RateLimitExceeded("daily", datetime.now(UTC) + timedelta(hours=1)),
)
response = client.post(
"/sessions/sess-1/stream",
json={"message": "hello"},
)
assert response.status_code == 429
assert "daily" in response.json()["detail"].lower()
def test_stream_chat_returns_429_on_weekly_rate_limit(mocker: pytest_mock.MockFixture):
"""When check_rate_limit raises RateLimitExceeded for weekly limit the endpoint returns 429."""
from backend.copilot.rate_limit import RateLimitExceeded
_mock_stream_internals(mocker)
mocker.patch.object(chat_routes.config, "daily_token_limit", 10000)
mocker.patch.object(chat_routes.config, "weekly_token_limit", 50000)
resets_at = datetime.now(UTC) + timedelta(days=3)
mocker.patch(
"backend.api.features.chat.routes.check_rate_limit",
side_effect=RateLimitExceeded("weekly", resets_at),
)
response = client.post(
"/sessions/sess-1/stream",
json={"message": "hello"},
)
assert response.status_code == 429
detail = response.json()["detail"].lower()
assert "weekly" in detail
assert "resets in" in detail
def test_stream_chat_429_includes_reset_time(mocker: pytest_mock.MockFixture):
"""The 429 response detail should include the human-readable reset time."""
from backend.copilot.rate_limit import RateLimitExceeded
_mock_stream_internals(mocker)
mocker.patch.object(chat_routes.config, "daily_token_limit", 10000)
mocker.patch.object(chat_routes.config, "weekly_token_limit", 50000)
mocker.patch(
"backend.api.features.chat.routes.check_rate_limit",
side_effect=RateLimitExceeded(
"daily", datetime.now(UTC) + timedelta(hours=2, minutes=30)
),
)
response = client.post(
"/sessions/sess-1/stream",
json={"message": "hello"},
)
assert response.status_code == 429
detail = response.json()["detail"]
assert "2h" in detail
assert "Resets in" in detail
# ─── Usage endpoint ───────────────────────────────────────────────────
def _mock_usage(
mocker: pytest_mock.MockerFixture,
*,
daily_used: int = 500,
weekly_used: int = 2000,
) -> AsyncMock:
"""Mock get_usage_status to return a predictable CoPilotUsageStatus."""
from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow
resets_at = datetime.now(UTC) + timedelta(days=1)
status = CoPilotUsageStatus(
daily=UsageWindow(used=daily_used, limit=10000, resets_at=resets_at),
weekly=UsageWindow(used=weekly_used, limit=50000, resets_at=resets_at),
)
return mocker.patch(
"backend.api.features.chat.routes.get_usage_status",
new_callable=AsyncMock,
return_value=status,
)
def test_usage_returns_daily_and_weekly(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""GET /usage returns daily and weekly usage."""
mock_get = _mock_usage(mocker, daily_used=500, weekly_used=2000)
mocker.patch.object(chat_routes.config, "daily_token_limit", 10000)
mocker.patch.object(chat_routes.config, "weekly_token_limit", 50000)
response = client.get("/usage")
assert response.status_code == 200
data = response.json()
assert data["daily"]["used"] == 500
assert data["weekly"]["used"] == 2000
mock_get.assert_called_once_with(
user_id=test_user_id,
daily_token_limit=10000,
weekly_token_limit=50000,
)
def test_usage_uses_config_limits(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""The endpoint forwards daily_token_limit and weekly_token_limit from config."""
mock_get = _mock_usage(mocker)
mocker.patch.object(chat_routes.config, "daily_token_limit", 99999)
mocker.patch.object(chat_routes.config, "weekly_token_limit", 77777)
response = client.get("/usage")
assert response.status_code == 200
mock_get.assert_called_once_with(
user_id=test_user_id,
daily_token_limit=99999,
weekly_token_limit=77777,
)
def test_usage_rejects_unauthenticated_request() -> None:
"""GET /usage should return 401 when no valid JWT is provided."""
unauthenticated_app = fastapi.FastAPI()
unauthenticated_app.include_router(chat_routes.router)
unauthenticated_client = fastapi.testclient.TestClient(unauthenticated_app)
response = unauthenticated_client.get("/usage")
assert response.status_code == 401

File diff suppressed because it is too large Load Diff

View File

@@ -1,82 +0,0 @@
import logging
from os import getenv
import pytest
from . import service as chat_service
from .model import create_chat_session, get_chat_session, upsert_chat_session
from .response_model import (
StreamError,
StreamFinish,
StreamTextDelta,
StreamToolOutputAvailable,
)
logger = logging.getLogger(__name__)
@pytest.mark.asyncio(loop_scope="session")
async def test_stream_chat_completion(setup_test_user, test_user_id):
"""
Test the stream_chat_completion function.
"""
api_key: str | None = getenv("OPEN_ROUTER_API_KEY")
if not api_key:
return pytest.skip("OPEN_ROUTER_API_KEY is not set, skipping test")
session = await create_chat_session(test_user_id)
has_errors = False
has_ended = False
assistant_message = ""
async for chunk in chat_service.stream_chat_completion(
session.session_id, "Hello, how are you?", user_id=session.user_id
):
logger.info(chunk)
if isinstance(chunk, StreamError):
has_errors = True
if isinstance(chunk, StreamTextDelta):
assistant_message += chunk.delta
if isinstance(chunk, StreamFinish):
has_ended = True
assert has_ended, "Chat completion did not end"
assert not has_errors, "Error occurred while streaming chat completion"
assert assistant_message, "Assistant message is empty"
@pytest.mark.asyncio(loop_scope="session")
async def test_stream_chat_completion_with_tool_calls(setup_test_user, test_user_id):
"""
Test the stream_chat_completion function.
"""
api_key: str | None = getenv("OPEN_ROUTER_API_KEY")
if not api_key:
return pytest.skip("OPEN_ROUTER_API_KEY is not set, skipping test")
session = await create_chat_session(test_user_id)
session = await upsert_chat_session(session)
has_errors = False
has_ended = False
had_tool_calls = False
async for chunk in chat_service.stream_chat_completion(
session.session_id,
"Please find me an agent that can help me with my business. Use the query 'moneny printing agent'",
user_id=session.user_id,
):
logger.info(chunk)
if isinstance(chunk, StreamError):
has_errors = True
if isinstance(chunk, StreamFinish):
has_ended = True
if isinstance(chunk, StreamToolOutputAvailable):
had_tool_calls = True
assert has_ended, "Chat completion did not end"
assert not has_errors, "Error occurred while streaming chat completion"
assert had_tool_calls, "Tool calls did not occur"
session = await get_chat_session(session.session_id)
assert session, "Session not found"
assert session.usage, "Usage is empty"

View File

@@ -1,498 +0,0 @@
"""External Agent Generator service client.
This module provides a client for communicating with the external Agent Generator
microservice. When AGENTGENERATOR_HOST is configured, the agent generation functions
will delegate to the external service instead of using the built-in LLM-based implementation.
"""
import logging
from typing import Any
import httpx
from backend.util.settings import Settings
logger = logging.getLogger(__name__)
def _create_error_response(
error_message: str,
error_type: str = "unknown",
details: dict[str, Any] | None = None,
) -> dict[str, Any]:
"""Create a standardized error response dict.
Args:
error_message: Human-readable error message
error_type: Machine-readable error type
details: Optional additional error details
Returns:
Error dict with type="error" and error details
"""
response: dict[str, Any] = {
"type": "error",
"error": error_message,
"error_type": error_type,
}
if details:
response["details"] = details
return response
def _classify_http_error(e: httpx.HTTPStatusError) -> tuple[str, str]:
"""Classify an HTTP error into error_type and message.
Args:
e: The HTTP status error
Returns:
Tuple of (error_type, error_message)
"""
status = e.response.status_code
if status == 429:
return "rate_limit", f"Agent Generator rate limited: {e}"
elif status == 503:
return "service_unavailable", f"Agent Generator unavailable: {e}"
elif status == 504 or status == 408:
return "timeout", f"Agent Generator timed out: {e}"
else:
return "http_error", f"HTTP error calling Agent Generator: {e}"
def _classify_request_error(e: httpx.RequestError) -> tuple[str, str]:
"""Classify a request error into error_type and message.
Args:
e: The request error
Returns:
Tuple of (error_type, error_message)
"""
error_str = str(e).lower()
if "timeout" in error_str or "timed out" in error_str:
return "timeout", f"Agent Generator request timed out: {e}"
elif "connect" in error_str:
return "connection_error", f"Could not connect to Agent Generator: {e}"
else:
return "request_error", f"Request error calling Agent Generator: {e}"
_client: httpx.AsyncClient | None = None
_settings: Settings | None = None
def _get_settings() -> Settings:
"""Get or create settings singleton."""
global _settings
if _settings is None:
_settings = Settings()
return _settings
def is_external_service_configured() -> bool:
"""Check if external Agent Generator service is configured."""
settings = _get_settings()
return bool(settings.config.agentgenerator_host)
def _get_base_url() -> str:
"""Get the base URL for the external service."""
settings = _get_settings()
host = settings.config.agentgenerator_host
port = settings.config.agentgenerator_port
return f"http://{host}:{port}"
def _get_client() -> httpx.AsyncClient:
"""Get or create the HTTP client for the external service."""
global _client
if _client is None:
settings = _get_settings()
_client = httpx.AsyncClient(
base_url=_get_base_url(),
timeout=httpx.Timeout(settings.config.agentgenerator_timeout),
)
return _client
async def decompose_goal_external(
description: str,
context: str = "",
library_agents: list[dict[str, Any]] | None = None,
) -> dict[str, Any] | None:
"""Call the external service to decompose a goal.
Args:
description: Natural language goal description
context: Additional context (e.g., answers to previous questions)
library_agents: User's library agents available for sub-agent composition
Returns:
Dict with either:
- {"type": "clarifying_questions", "questions": [...]}
- {"type": "instructions", "steps": [...]}
- {"type": "unachievable_goal", ...}
- {"type": "vague_goal", ...}
- {"type": "error", "error": "...", "error_type": "..."} on error
Or None on unexpected error
"""
client = _get_client()
if context:
description = f"{description}\n\nAdditional context from user:\n{context}"
payload: dict[str, Any] = {"description": description}
if library_agents:
payload["library_agents"] = library_agents
try:
response = await client.post("/api/decompose-description", json=payload)
response.raise_for_status()
data = response.json()
if not data.get("success"):
error_msg = data.get("error", "Unknown error from Agent Generator")
error_type = data.get("error_type", "unknown")
logger.error(
f"Agent Generator decomposition failed: {error_msg} "
f"(type: {error_type})"
)
return _create_error_response(error_msg, error_type)
# Map the response to the expected format
response_type = data.get("type")
if response_type == "instructions":
return {"type": "instructions", "steps": data.get("steps", [])}
elif response_type == "clarifying_questions":
return {
"type": "clarifying_questions",
"questions": data.get("questions", []),
}
elif response_type == "unachievable_goal":
return {
"type": "unachievable_goal",
"reason": data.get("reason"),
"suggested_goal": data.get("suggested_goal"),
}
elif response_type == "vague_goal":
return {
"type": "vague_goal",
"suggested_goal": data.get("suggested_goal"),
}
elif response_type == "error":
# Pass through error from the service
return _create_error_response(
data.get("error", "Unknown error"),
data.get("error_type", "unknown"),
)
else:
logger.error(
f"Unknown response type from external service: {response_type}"
)
return _create_error_response(
f"Unknown response type from Agent Generator: {response_type}",
"invalid_response",
)
except httpx.HTTPStatusError as e:
error_type, error_msg = _classify_http_error(e)
logger.error(error_msg)
return _create_error_response(error_msg, error_type)
except httpx.RequestError as e:
error_type, error_msg = _classify_request_error(e)
logger.error(error_msg)
return _create_error_response(error_msg, error_type)
except Exception as e:
error_msg = f"Unexpected error calling Agent Generator: {e}"
logger.error(error_msg)
return _create_error_response(error_msg, "unexpected_error")
async def generate_agent_external(
instructions: dict[str, Any],
library_agents: list[dict[str, Any]] | None = None,
operation_id: str | None = None,
task_id: str | None = None,
) -> dict[str, Any] | None:
"""Call the external service to generate an agent from instructions.
Args:
instructions: Structured instructions from decompose_goal
library_agents: User's library agents available for sub-agent composition
operation_id: Operation ID for async processing (enables Redis Streams callback)
task_id: Task ID for async processing (enables Redis Streams callback)
Returns:
Agent JSON dict, {"status": "accepted"} for async, or error dict {"type": "error", ...} on error
"""
client = _get_client()
# Build request payload
payload: dict[str, Any] = {"instructions": instructions}
if library_agents:
payload["library_agents"] = library_agents
if operation_id and task_id:
payload["operation_id"] = operation_id
payload["task_id"] = task_id
try:
response = await client.post("/api/generate-agent", json=payload)
# Handle 202 Accepted for async processing
if response.status_code == 202:
logger.info(
f"Agent Generator accepted async request "
f"(operation_id={operation_id}, task_id={task_id})"
)
return {
"status": "accepted",
"operation_id": operation_id,
"task_id": task_id,
}
response.raise_for_status()
data = response.json()
if not data.get("success"):
error_msg = data.get("error", "Unknown error from Agent Generator")
error_type = data.get("error_type", "unknown")
logger.error(
f"Agent Generator generation failed: {error_msg} (type: {error_type})"
)
return _create_error_response(error_msg, error_type)
return data.get("agent_json")
except httpx.HTTPStatusError as e:
error_type, error_msg = _classify_http_error(e)
logger.error(error_msg)
return _create_error_response(error_msg, error_type)
except httpx.RequestError as e:
error_type, error_msg = _classify_request_error(e)
logger.error(error_msg)
return _create_error_response(error_msg, error_type)
except Exception as e:
error_msg = f"Unexpected error calling Agent Generator: {e}"
logger.error(error_msg)
return _create_error_response(error_msg, "unexpected_error")
async def generate_agent_patch_external(
update_request: str,
current_agent: dict[str, Any],
library_agents: list[dict[str, Any]] | None = None,
operation_id: str | None = None,
task_id: str | None = None,
) -> dict[str, Any] | None:
"""Call the external service to generate a patch for an existing agent.
Args:
update_request: Natural language description of changes
current_agent: Current agent JSON
library_agents: User's library agents available for sub-agent composition
operation_id: Operation ID for async processing (enables Redis Streams callback)
task_id: Task ID for async processing (enables Redis Streams callback)
Returns:
Updated agent JSON, clarifying questions dict, {"status": "accepted"} for async, or error dict on error
"""
client = _get_client()
# Build request payload
payload: dict[str, Any] = {
"update_request": update_request,
"current_agent_json": current_agent,
}
if library_agents:
payload["library_agents"] = library_agents
if operation_id and task_id:
payload["operation_id"] = operation_id
payload["task_id"] = task_id
try:
response = await client.post("/api/update-agent", json=payload)
# Handle 202 Accepted for async processing
if response.status_code == 202:
logger.info(
f"Agent Generator accepted async update request "
f"(operation_id={operation_id}, task_id={task_id})"
)
return {
"status": "accepted",
"operation_id": operation_id,
"task_id": task_id,
}
response.raise_for_status()
data = response.json()
if not data.get("success"):
error_msg = data.get("error", "Unknown error from Agent Generator")
error_type = data.get("error_type", "unknown")
logger.error(
f"Agent Generator patch generation failed: {error_msg} "
f"(type: {error_type})"
)
return _create_error_response(error_msg, error_type)
# Check if it's clarifying questions
if data.get("type") == "clarifying_questions":
return {
"type": "clarifying_questions",
"questions": data.get("questions", []),
}
# Check if it's an error passed through
if data.get("type") == "error":
return _create_error_response(
data.get("error", "Unknown error"),
data.get("error_type", "unknown"),
)
# Otherwise return the updated agent JSON
return data.get("agent_json")
except httpx.HTTPStatusError as e:
error_type, error_msg = _classify_http_error(e)
logger.error(error_msg)
return _create_error_response(error_msg, error_type)
except httpx.RequestError as e:
error_type, error_msg = _classify_request_error(e)
logger.error(error_msg)
return _create_error_response(error_msg, error_type)
except Exception as e:
error_msg = f"Unexpected error calling Agent Generator: {e}"
logger.error(error_msg)
return _create_error_response(error_msg, "unexpected_error")
async def customize_template_external(
template_agent: dict[str, Any],
modification_request: str,
context: str = "",
) -> dict[str, Any] | None:
"""Call the external service to customize a template/marketplace agent.
Args:
template_agent: The template agent JSON to customize
modification_request: Natural language description of customizations
context: Additional context (e.g., answers to previous questions)
Returns:
Customized agent JSON, clarifying questions dict, or error dict on error
"""
client = _get_client()
request = modification_request
if context:
request = f"{modification_request}\n\nAdditional context from user:\n{context}"
payload: dict[str, Any] = {
"template_agent_json": template_agent,
"modification_request": request,
}
try:
response = await client.post("/api/template-modification", json=payload)
response.raise_for_status()
data = response.json()
if not data.get("success"):
error_msg = data.get("error", "Unknown error from Agent Generator")
error_type = data.get("error_type", "unknown")
logger.error(
f"Agent Generator template customization failed: {error_msg} "
f"(type: {error_type})"
)
return _create_error_response(error_msg, error_type)
# Check if it's clarifying questions
if data.get("type") == "clarifying_questions":
return {
"type": "clarifying_questions",
"questions": data.get("questions", []),
}
# Check if it's an error passed through
if data.get("type") == "error":
return _create_error_response(
data.get("error", "Unknown error"),
data.get("error_type", "unknown"),
)
# Otherwise return the customized agent JSON
return data.get("agent_json")
except httpx.HTTPStatusError as e:
error_type, error_msg = _classify_http_error(e)
logger.error(error_msg)
return _create_error_response(error_msg, error_type)
except httpx.RequestError as e:
error_type, error_msg = _classify_request_error(e)
logger.error(error_msg)
return _create_error_response(error_msg, error_type)
except Exception as e:
error_msg = f"Unexpected error calling Agent Generator: {e}"
logger.error(error_msg)
return _create_error_response(error_msg, "unexpected_error")
async def get_blocks_external() -> list[dict[str, Any]] | None:
"""Get available blocks from the external service.
Returns:
List of block info dicts or None on error
"""
client = _get_client()
try:
response = await client.get("/api/blocks")
response.raise_for_status()
data = response.json()
if not data.get("success"):
logger.error("External service returned error getting blocks")
return None
return data.get("blocks", [])
except httpx.HTTPStatusError as e:
logger.error(f"HTTP error getting blocks from external service: {e}")
return None
except httpx.RequestError as e:
logger.error(f"Request error getting blocks from external service: {e}")
return None
except Exception as e:
logger.error(f"Unexpected error getting blocks from external service: {e}")
return None
async def health_check() -> bool:
"""Check if the external service is healthy.
Returns:
True if healthy, False otherwise
"""
if not is_external_service_configured():
return False
client = _get_client()
try:
response = await client.get("/health")
response.raise_for_status()
data = response.json()
return data.get("status") == "healthy" and data.get("blocks_loaded", False)
except Exception as e:
logger.warning(f"External agent generator health check failed: {e}")
return False
async def close_client() -> None:
"""Close the HTTP client."""
global _client
if _client is not None:
await _client.aclose()
_client = None

View File

@@ -1,239 +0,0 @@
"""Shared agent search functionality for find_agent and find_library_agent tools."""
import logging
import re
from typing import Literal
from backend.api.features.library import db as library_db
from backend.api.features.store import db as store_db
from backend.util.exceptions import DatabaseError, NotFoundError
from .models import (
AgentInfo,
AgentsFoundResponse,
ErrorResponse,
NoResultsResponse,
ToolResponseBase,
)
logger = logging.getLogger(__name__)
SearchSource = Literal["marketplace", "library"]
_UUID_PATTERN = re.compile(
r"^[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89ab][a-f0-9]{3}-[a-f0-9]{12}$",
re.IGNORECASE,
)
def _is_uuid(text: str) -> bool:
"""Check if text is a valid UUID v4."""
return bool(_UUID_PATTERN.match(text.strip()))
async def _get_library_agent_by_id(user_id: str, agent_id: str) -> AgentInfo | None:
"""Fetch a library agent by ID (library agent ID or graph_id).
Tries multiple lookup strategies:
1. First by graph_id (AgentGraph primary key)
2. Then by library agent ID (LibraryAgent primary key)
Args:
user_id: The user ID
agent_id: The ID to look up (can be graph_id or library agent ID)
Returns:
AgentInfo if found, None otherwise
"""
try:
agent = await library_db.get_library_agent_by_graph_id(user_id, agent_id)
if agent:
logger.debug(f"Found library agent by graph_id: {agent.name}")
return AgentInfo(
id=agent.id,
name=agent.name,
description=agent.description or "",
source="library",
in_library=True,
creator=agent.creator_name,
status=agent.status.value,
can_access_graph=agent.can_access_graph,
has_external_trigger=agent.has_external_trigger,
new_output=agent.new_output,
graph_id=agent.graph_id,
)
except DatabaseError:
raise
except Exception as e:
logger.warning(
f"Could not fetch library agent by graph_id {agent_id}: {e}",
exc_info=True,
)
try:
agent = await library_db.get_library_agent(agent_id, user_id)
if agent:
logger.debug(f"Found library agent by library_id: {agent.name}")
return AgentInfo(
id=agent.id,
name=agent.name,
description=agent.description or "",
source="library",
in_library=True,
creator=agent.creator_name,
status=agent.status.value,
can_access_graph=agent.can_access_graph,
has_external_trigger=agent.has_external_trigger,
new_output=agent.new_output,
graph_id=agent.graph_id,
)
except NotFoundError:
logger.debug(f"Library agent not found by library_id: {agent_id}")
except DatabaseError:
raise
except Exception as e:
logger.warning(
f"Could not fetch library agent by library_id {agent_id}: {e}",
exc_info=True,
)
return None
async def search_agents(
query: str,
source: SearchSource,
session_id: str | None,
user_id: str | None = None,
) -> ToolResponseBase:
"""
Search for agents in marketplace or user library.
Args:
query: Search query string
source: "marketplace" or "library"
session_id: Chat session ID
user_id: User ID (required for library search)
Returns:
AgentsFoundResponse, NoResultsResponse, or ErrorResponse
"""
if not query:
return ErrorResponse(
message="Please provide a search query", session_id=session_id
)
if source == "library" and not user_id:
return ErrorResponse(
message="User authentication required to search library",
session_id=session_id,
)
agents: list[AgentInfo] = []
try:
if source == "marketplace":
logger.info(f"Searching marketplace for: {query}")
results = await store_db.get_store_agents(search_query=query, page_size=5)
for agent in results.agents:
agents.append(
AgentInfo(
id=f"{agent.creator}/{agent.slug}",
name=agent.agent_name,
description=agent.description or "",
source="marketplace",
in_library=False,
creator=agent.creator,
category="general",
rating=agent.rating,
runs=agent.runs,
is_featured=False,
)
)
else:
if _is_uuid(query):
logger.info(f"Query looks like UUID, trying direct lookup: {query}")
agent = await _get_library_agent_by_id(user_id, query) # type: ignore[arg-type]
if agent:
agents.append(agent)
logger.info(f"Found agent by direct ID lookup: {agent.name}")
if not agents:
logger.info(f"Searching user library for: {query}")
results = await library_db.list_library_agents(
user_id=user_id, # type: ignore[arg-type]
search_term=query,
page_size=10,
)
for agent in results.agents:
agents.append(
AgentInfo(
id=agent.id,
name=agent.name,
description=agent.description or "",
source="library",
in_library=True,
creator=agent.creator_name,
status=agent.status.value,
can_access_graph=agent.can_access_graph,
has_external_trigger=agent.has_external_trigger,
new_output=agent.new_output,
graph_id=agent.graph_id,
)
)
logger.info(f"Found {len(agents)} agents in {source}")
except NotFoundError:
pass
except DatabaseError as e:
logger.error(f"Error searching {source}: {e}", exc_info=True)
return ErrorResponse(
message=f"Failed to search {source}. Please try again.",
error=str(e),
session_id=session_id,
)
if not agents:
suggestions = (
[
"Try more general terms",
"Browse categories in the marketplace",
"Check spelling",
]
if source == "marketplace"
else [
"Try different keywords",
"Use find_agent to search the marketplace",
"Check your library at /library",
]
)
no_results_msg = (
f"No agents found matching '{query}'. Let the user know they can try different keywords or browse the marketplace. Also let them know you can create a custom agent for them based on their needs."
if source == "marketplace"
else f"No agents matching '{query}' found in your library. Let the user know you can create a custom agent for them based on their needs."
)
return NoResultsResponse(
message=no_results_msg, session_id=session_id, suggestions=suggestions
)
title = f"Found {len(agents)} agent{'s' if len(agents) != 1 else ''} "
title += (
f"for '{query}'"
if source == "marketplace"
else f"in your library for '{query}'"
)
message = (
"Now you have found some options for the user to choose from. "
"You can add a link to a recommended agent at: /marketplace/agent/agent_id "
"Please ask the user if they would like to use any of these agents. Let the user know we can create a custom agent for them based on their needs."
if source == "marketplace"
else "Found agents in the user's library. You can provide a link to view an agent at: "
"/library/agents/{agent_id}. Use agent_output to get execution results, or run_agent to execute. Let the user know we can create a custom agent for them based on their needs."
)
return AgentsFoundResponse(
message=message,
title=title,
agents=agents,
count=len(agents),
session_id=session_id,
)

View File

@@ -1,129 +0,0 @@
"""Base classes and shared utilities for chat tools."""
import logging
from typing import Any
from openai.types.chat import ChatCompletionToolParam
from backend.api.features.chat.model import ChatSession
from backend.api.features.chat.response_model import StreamToolOutputAvailable
from .models import ErrorResponse, NeedLoginResponse, ToolResponseBase
logger = logging.getLogger(__name__)
class BaseTool:
"""Base class for all chat tools."""
@property
def name(self) -> str:
"""Tool name for OpenAI function calling."""
raise NotImplementedError
@property
def description(self) -> str:
"""Tool description for OpenAI."""
raise NotImplementedError
@property
def parameters(self) -> dict[str, Any]:
"""Tool parameters schema for OpenAI."""
raise NotImplementedError
@property
def requires_auth(self) -> bool:
"""Whether this tool requires authentication."""
return False
@property
def is_long_running(self) -> bool:
"""Whether this tool is long-running and should execute in background.
Long-running tools (like agent generation) are executed via background
tasks to survive SSE disconnections. The result is persisted to chat
history and visible when the user refreshes.
"""
return False
def as_openai_tool(self) -> ChatCompletionToolParam:
"""Convert to OpenAI tool format."""
return ChatCompletionToolParam(
type="function",
function={
"name": self.name,
"description": self.description,
"parameters": self.parameters,
},
)
async def execute(
self,
user_id: str | None,
session: ChatSession,
tool_call_id: str,
**kwargs,
) -> StreamToolOutputAvailable:
"""Execute the tool with authentication check.
Args:
user_id: User ID (may be anonymous like "anon_123")
session_id: Chat session ID
**kwargs: Tool-specific parameters
Returns:
Pydantic response object
"""
if self.requires_auth and not user_id:
logger.error(
f"Attempted tool call for {self.name} but user not authenticated"
)
return StreamToolOutputAvailable(
toolCallId=tool_call_id,
toolName=self.name,
output=NeedLoginResponse(
message=f"Please sign in to use {self.name}",
session_id=session.session_id,
).model_dump_json(),
success=False,
)
try:
result = await self._execute(user_id, session, **kwargs)
return StreamToolOutputAvailable(
toolCallId=tool_call_id,
toolName=self.name,
output=result.model_dump_json(),
)
except Exception as e:
logger.error(f"Error in {self.name}: {e}", exc_info=True)
return StreamToolOutputAvailable(
toolCallId=tool_call_id,
toolName=self.name,
output=ErrorResponse(
message=f"An error occurred while executing {self.name}",
error=str(e),
session_id=session.session_id,
).model_dump_json(),
success=False,
)
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs,
) -> ToolResponseBase:
"""Internal execution logic to be implemented by subclasses.
Args:
user_id: User ID (authenticated or anonymous)
session_id: Chat session ID
**kwargs: Tool-specific parameters
Returns:
Pydantic response object
"""
raise NotImplementedError

View File

@@ -1,335 +0,0 @@
"""CreateAgentTool - Creates agents from natural language descriptions."""
import logging
from typing import Any
from backend.api.features.chat.model import ChatSession
from .agent_generator import (
AgentGeneratorNotConfiguredError,
decompose_goal,
enrich_library_agents_from_steps,
generate_agent,
get_all_relevant_agents_for_generation,
get_user_message_for_error,
save_agent_to_library,
)
from .base import BaseTool
from .models import (
AgentPreviewResponse,
AgentSavedResponse,
AsyncProcessingResponse,
ClarificationNeededResponse,
ClarifyingQuestion,
ErrorResponse,
ToolResponseBase,
)
logger = logging.getLogger(__name__)
class CreateAgentTool(BaseTool):
"""Tool for creating agents from natural language descriptions."""
@property
def name(self) -> str:
return "create_agent"
@property
def description(self) -> str:
return (
"Create a new agent workflow from a natural language description. "
"First generates a preview, then saves to library if save=true."
)
@property
def requires_auth(self) -> bool:
return True
@property
def is_long_running(self) -> bool:
return True
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"description": {
"type": "string",
"description": (
"Natural language description of what the agent should do. "
"Be specific about inputs, outputs, and the workflow steps."
),
},
"context": {
"type": "string",
"description": (
"Additional context or answers to previous clarifying questions. "
"Include any preferences or constraints mentioned by the user."
),
},
"save": {
"type": "boolean",
"description": (
"Whether to save the agent to the user's library. "
"Default is true. Set to false for preview only."
),
"default": True,
},
},
"required": ["description"],
}
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs,
) -> ToolResponseBase:
"""Execute the create_agent tool.
Flow:
1. Decompose the description into steps (may return clarifying questions)
2. Generate agent JSON (external service handles fixing and validation)
3. Preview or save based on the save parameter
"""
description = kwargs.get("description", "").strip()
context = kwargs.get("context", "")
save = kwargs.get("save", True)
session_id = session.session_id if session else None
# Extract async processing params (passed by long-running tool handler)
operation_id = kwargs.get("_operation_id")
task_id = kwargs.get("_task_id")
if not description:
return ErrorResponse(
message="Please provide a description of what the agent should do.",
error="Missing description parameter",
session_id=session_id,
)
library_agents = None
if user_id:
try:
library_agents = await get_all_relevant_agents_for_generation(
user_id=user_id,
search_query=description,
include_marketplace=True,
)
logger.debug(
f"Found {len(library_agents)} relevant agents for sub-agent composition"
)
except Exception as e:
logger.warning(f"Failed to fetch library agents: {e}")
try:
decomposition_result = await decompose_goal(
description, context, library_agents
)
except AgentGeneratorNotConfiguredError:
return ErrorResponse(
message=(
"Agent generation is not available. "
"The Agent Generator service is not configured."
),
error="service_not_configured",
session_id=session_id,
)
if decomposition_result is None:
return ErrorResponse(
message="Failed to analyze the goal. The agent generation service may be unavailable. Please try again.",
error="decomposition_failed",
details={"description": description[:100]},
session_id=session_id,
)
if decomposition_result.get("type") == "error":
error_msg = decomposition_result.get("error", "Unknown error")
error_type = decomposition_result.get("error_type", "unknown")
user_message = get_user_message_for_error(
error_type,
operation="analyze the goal",
llm_parse_message="The AI had trouble understanding this request. Please try rephrasing your goal.",
)
return ErrorResponse(
message=user_message,
error=f"decomposition_failed:{error_type}",
details={
"description": description[:100],
"service_error": error_msg,
"error_type": error_type,
},
session_id=session_id,
)
if decomposition_result.get("type") == "clarifying_questions":
questions = decomposition_result.get("questions", [])
return ClarificationNeededResponse(
message=(
"I need some more information to create this agent. "
"Please answer the following questions:"
),
questions=[
ClarifyingQuestion(
question=q.get("question", ""),
keyword=q.get("keyword", ""),
example=q.get("example"),
)
for q in questions
],
session_id=session_id,
)
if decomposition_result.get("type") == "unachievable_goal":
suggested = decomposition_result.get("suggested_goal", "")
reason = decomposition_result.get("reason", "")
return ErrorResponse(
message=(
f"This goal cannot be accomplished with the available blocks. "
f"{reason} "
f"Suggestion: {suggested}"
),
error="unachievable_goal",
details={"suggested_goal": suggested, "reason": reason},
session_id=session_id,
)
if decomposition_result.get("type") == "vague_goal":
suggested = decomposition_result.get("suggested_goal", "")
return ErrorResponse(
message=(
f"The goal is too vague to create a specific workflow. "
f"Suggestion: {suggested}"
),
error="vague_goal",
details={"suggested_goal": suggested},
session_id=session_id,
)
if user_id and library_agents is not None:
try:
library_agents = await enrich_library_agents_from_steps(
user_id=user_id,
decomposition_result=decomposition_result,
existing_agents=library_agents,
include_marketplace=True,
)
logger.debug(
f"After enrichment: {len(library_agents)} total agents for sub-agent composition"
)
except Exception as e:
logger.warning(f"Failed to enrich library agents from steps: {e}")
try:
agent_json = await generate_agent(
decomposition_result,
library_agents,
operation_id=operation_id,
task_id=task_id,
)
except AgentGeneratorNotConfiguredError:
return ErrorResponse(
message=(
"Agent generation is not available. "
"The Agent Generator service is not configured."
),
error="service_not_configured",
session_id=session_id,
)
if agent_json is None:
return ErrorResponse(
message="Failed to generate the agent. The agent generation service may be unavailable. Please try again.",
error="generation_failed",
details={"description": description[:100]},
session_id=session_id,
)
if isinstance(agent_json, dict) and agent_json.get("type") == "error":
error_msg = agent_json.get("error", "Unknown error")
error_type = agent_json.get("error_type", "unknown")
user_message = get_user_message_for_error(
error_type,
operation="generate the agent",
llm_parse_message="The AI had trouble generating the agent. Please try again or simplify your goal.",
validation_message=(
"I wasn't able to create a valid agent for this request. "
"The generated workflow had some structural issues. "
"Please try simplifying your goal or breaking it into smaller steps."
),
error_details=error_msg,
)
return ErrorResponse(
message=user_message,
error=f"generation_failed:{error_type}",
details={
"description": description[:100],
"service_error": error_msg,
"error_type": error_type,
},
session_id=session_id,
)
# Check if Agent Generator accepted for async processing
if agent_json.get("status") == "accepted":
logger.info(
f"Agent generation delegated to async processing "
f"(operation_id={operation_id}, task_id={task_id})"
)
return AsyncProcessingResponse(
message="Agent generation started. You'll be notified when it's complete.",
operation_id=operation_id,
task_id=task_id,
session_id=session_id,
)
agent_name = agent_json.get("name", "Generated Agent")
agent_description = agent_json.get("description", "")
node_count = len(agent_json.get("nodes", []))
link_count = len(agent_json.get("links", []))
if not save:
return AgentPreviewResponse(
message=(
f"I've generated an agent called '{agent_name}' with {node_count} blocks. "
f"Review it and call create_agent with save=true to save it to your library."
),
agent_json=agent_json,
agent_name=agent_name,
description=agent_description,
node_count=node_count,
link_count=link_count,
session_id=session_id,
)
if not user_id:
return ErrorResponse(
message="You must be logged in to save agents.",
error="auth_required",
session_id=session_id,
)
try:
created_graph, library_agent = await save_agent_to_library(
agent_json, user_id
)
return AgentSavedResponse(
message=f"Agent '{created_graph.name}' has been saved to your library!",
agent_id=created_graph.id,
agent_name=created_graph.name,
library_agent_id=library_agent.id,
library_agent_link=f"/library/agents/{library_agent.id}",
agent_page_link=f"/build?flowID={created_graph.id}",
session_id=session_id,
)
except Exception as e:
return ErrorResponse(
message=f"Failed to save the agent: {str(e)}",
error="save_failed",
details={"exception": str(e)},
session_id=session_id,
)

View File

@@ -1,337 +0,0 @@
"""CustomizeAgentTool - Customizes marketplace/template agents using natural language."""
import logging
from typing import Any
from backend.api.features.chat.model import ChatSession
from backend.api.features.store import db as store_db
from backend.api.features.store.exceptions import AgentNotFoundError
from .agent_generator import (
AgentGeneratorNotConfiguredError,
customize_template,
get_user_message_for_error,
graph_to_json,
save_agent_to_library,
)
from .base import BaseTool
from .models import (
AgentPreviewResponse,
AgentSavedResponse,
ClarificationNeededResponse,
ClarifyingQuestion,
ErrorResponse,
ToolResponseBase,
)
logger = logging.getLogger(__name__)
class CustomizeAgentTool(BaseTool):
"""Tool for customizing marketplace/template agents using natural language."""
@property
def name(self) -> str:
return "customize_agent"
@property
def description(self) -> str:
return (
"Customize a marketplace or template agent using natural language. "
"Takes an existing agent from the marketplace and modifies it based on "
"the user's requirements before adding to their library."
)
@property
def requires_auth(self) -> bool:
return True
@property
def is_long_running(self) -> bool:
return True
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"agent_id": {
"type": "string",
"description": (
"The marketplace agent ID in format 'creator/slug' "
"(e.g., 'autogpt/newsletter-writer'). "
"Get this from find_agent results."
),
},
"modifications": {
"type": "string",
"description": (
"Natural language description of how to customize the agent. "
"Be specific about what changes you want to make."
),
},
"context": {
"type": "string",
"description": (
"Additional context or answers to previous clarifying questions."
),
},
"save": {
"type": "boolean",
"description": (
"Whether to save the customized agent to the user's library. "
"Default is true. Set to false for preview only."
),
"default": True,
},
},
"required": ["agent_id", "modifications"],
}
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs,
) -> ToolResponseBase:
"""Execute the customize_agent tool.
Flow:
1. Parse the agent ID to get creator/slug
2. Fetch the template agent from the marketplace
3. Call customize_template with the modification request
4. Preview or save based on the save parameter
"""
agent_id = kwargs.get("agent_id", "").strip()
modifications = kwargs.get("modifications", "").strip()
context = kwargs.get("context", "")
save = kwargs.get("save", True)
session_id = session.session_id if session else None
if not agent_id:
return ErrorResponse(
message="Please provide the marketplace agent ID (e.g., 'creator/agent-name').",
error="missing_agent_id",
session_id=session_id,
)
if not modifications:
return ErrorResponse(
message="Please describe how you want to customize this agent.",
error="missing_modifications",
session_id=session_id,
)
# Parse agent_id in format "creator/slug"
parts = [p.strip() for p in agent_id.split("/")]
if len(parts) != 2 or not parts[0] or not parts[1]:
return ErrorResponse(
message=(
f"Invalid agent ID format: '{agent_id}'. "
"Expected format is 'creator/agent-name' "
"(e.g., 'autogpt/newsletter-writer')."
),
error="invalid_agent_id_format",
session_id=session_id,
)
creator_username, agent_slug = parts
# Fetch the marketplace agent details
try:
agent_details = await store_db.get_store_agent_details(
username=creator_username, agent_name=agent_slug
)
except AgentNotFoundError:
return ErrorResponse(
message=(
f"Could not find marketplace agent '{agent_id}'. "
"Please check the agent ID and try again."
),
error="agent_not_found",
session_id=session_id,
)
except Exception as e:
logger.error(f"Error fetching marketplace agent {agent_id}: {e}")
return ErrorResponse(
message="Failed to fetch the marketplace agent. Please try again.",
error="fetch_error",
session_id=session_id,
)
if not agent_details.store_listing_version_id:
return ErrorResponse(
message=(
f"The agent '{agent_id}' does not have an available version. "
"Please try a different agent."
),
error="no_version_available",
session_id=session_id,
)
# Get the full agent graph
try:
graph = await store_db.get_agent(agent_details.store_listing_version_id)
template_agent = graph_to_json(graph)
except Exception as e:
logger.error(f"Error fetching agent graph for {agent_id}: {e}")
return ErrorResponse(
message="Failed to fetch the agent configuration. Please try again.",
error="graph_fetch_error",
session_id=session_id,
)
# Call customize_template
try:
result = await customize_template(
template_agent=template_agent,
modification_request=modifications,
context=context,
)
except AgentGeneratorNotConfiguredError:
return ErrorResponse(
message=(
"Agent customization is not available. "
"The Agent Generator service is not configured."
),
error="service_not_configured",
session_id=session_id,
)
except Exception as e:
logger.error(f"Error calling customize_template for {agent_id}: {e}")
return ErrorResponse(
message=(
"Failed to customize the agent due to a service error. "
"Please try again."
),
error="customization_service_error",
session_id=session_id,
)
if result is None:
return ErrorResponse(
message=(
"Failed to customize the agent. "
"The agent generation service may be unavailable or timed out. "
"Please try again."
),
error="customization_failed",
session_id=session_id,
)
# Handle error response
if isinstance(result, dict) and result.get("type") == "error":
error_msg = result.get("error", "Unknown error")
error_type = result.get("error_type", "unknown")
user_message = get_user_message_for_error(
error_type,
operation="customize the agent",
llm_parse_message=(
"The AI had trouble customizing the agent. "
"Please try again or simplify your request."
),
validation_message=(
"The customized agent failed validation. "
"Please try rephrasing your request."
),
error_details=error_msg,
)
return ErrorResponse(
message=user_message,
error=f"customization_failed:{error_type}",
session_id=session_id,
)
# Handle clarifying questions
if isinstance(result, dict) and result.get("type") == "clarifying_questions":
questions = result.get("questions") or []
if not isinstance(questions, list):
logger.error(
f"Unexpected clarifying questions format: {type(questions)}"
)
questions = []
return ClarificationNeededResponse(
message=(
"I need some more information to customize this agent. "
"Please answer the following questions:"
),
questions=[
ClarifyingQuestion(
question=q.get("question", ""),
keyword=q.get("keyword", ""),
example=q.get("example"),
)
for q in questions
if isinstance(q, dict)
],
session_id=session_id,
)
# Result should be the customized agent JSON
if not isinstance(result, dict):
logger.error(f"Unexpected customize_template response type: {type(result)}")
return ErrorResponse(
message="Failed to customize the agent due to an unexpected response.",
error="unexpected_response_type",
session_id=session_id,
)
customized_agent = result
agent_name = customized_agent.get(
"name", f"Customized {agent_details.agent_name}"
)
agent_description = customized_agent.get("description", "")
nodes = customized_agent.get("nodes")
links = customized_agent.get("links")
node_count = len(nodes) if isinstance(nodes, list) else 0
link_count = len(links) if isinstance(links, list) else 0
if not save:
return AgentPreviewResponse(
message=(
f"I've customized the agent '{agent_details.agent_name}'. "
f"The customized agent has {node_count} blocks. "
f"Review it and call customize_agent with save=true to save it."
),
agent_json=customized_agent,
agent_name=agent_name,
description=agent_description,
node_count=node_count,
link_count=link_count,
session_id=session_id,
)
if not user_id:
return ErrorResponse(
message="You must be logged in to save agents.",
error="auth_required",
session_id=session_id,
)
# Save to user's library
try:
created_graph, library_agent = await save_agent_to_library(
customized_agent, user_id, is_update=False
)
return AgentSavedResponse(
message=(
f"Customized agent '{created_graph.name}' "
f"(based on '{agent_details.agent_name}') "
f"has been saved to your library!"
),
agent_id=created_graph.id,
agent_name=created_graph.name,
library_agent_id=library_agent.id,
library_agent_link=f"/library/agents/{library_agent.id}",
agent_page_link=f"/build?flowID={created_graph.id}",
session_id=session_id,
)
except Exception as e:
logger.error(f"Error saving customized agent: {e}")
return ErrorResponse(
message="Failed to save the customized agent. Please try again.",
error="save_failed",
session_id=session_id,
)

View File

@@ -1,284 +0,0 @@
"""EditAgentTool - Edits existing agents using natural language."""
import logging
from typing import Any
from backend.api.features.chat.model import ChatSession
from .agent_generator import (
AgentGeneratorNotConfiguredError,
generate_agent_patch,
get_agent_as_json,
get_all_relevant_agents_for_generation,
get_user_message_for_error,
save_agent_to_library,
)
from .base import BaseTool
from .models import (
AgentPreviewResponse,
AgentSavedResponse,
AsyncProcessingResponse,
ClarificationNeededResponse,
ClarifyingQuestion,
ErrorResponse,
ToolResponseBase,
)
logger = logging.getLogger(__name__)
class EditAgentTool(BaseTool):
"""Tool for editing existing agents using natural language."""
@property
def name(self) -> str:
return "edit_agent"
@property
def description(self) -> str:
return (
"Edit an existing agent from the user's library using natural language. "
"Generates updates to the agent while preserving unchanged parts."
)
@property
def requires_auth(self) -> bool:
return True
@property
def is_long_running(self) -> bool:
return True
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"agent_id": {
"type": "string",
"description": (
"The ID of the agent to edit. "
"Can be a graph ID or library agent ID."
),
},
"changes": {
"type": "string",
"description": (
"Natural language description of what changes to make. "
"Be specific about what to add, remove, or modify."
),
},
"context": {
"type": "string",
"description": (
"Additional context or answers to previous clarifying questions."
),
},
"save": {
"type": "boolean",
"description": (
"Whether to save the changes. "
"Default is true. Set to false for preview only."
),
"default": True,
},
},
"required": ["agent_id", "changes"],
}
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs,
) -> ToolResponseBase:
"""Execute the edit_agent tool.
Flow:
1. Fetch the current agent
2. Generate updated agent (external service handles fixing and validation)
3. Preview or save based on the save parameter
"""
agent_id = kwargs.get("agent_id", "").strip()
changes = kwargs.get("changes", "").strip()
context = kwargs.get("context", "")
save = kwargs.get("save", True)
session_id = session.session_id if session else None
# Extract async processing params (passed by long-running tool handler)
operation_id = kwargs.get("_operation_id")
task_id = kwargs.get("_task_id")
if not agent_id:
return ErrorResponse(
message="Please provide the agent ID to edit.",
error="Missing agent_id parameter",
session_id=session_id,
)
if not changes:
return ErrorResponse(
message="Please describe what changes you want to make.",
error="Missing changes parameter",
session_id=session_id,
)
current_agent = await get_agent_as_json(agent_id, user_id)
if current_agent is None:
return ErrorResponse(
message=f"Could not find agent with ID '{agent_id}' in your library.",
error="agent_not_found",
session_id=session_id,
)
library_agents = None
if user_id:
try:
graph_id = current_agent.get("id")
library_agents = await get_all_relevant_agents_for_generation(
user_id=user_id,
search_query=changes,
exclude_graph_id=graph_id,
include_marketplace=True,
)
logger.debug(
f"Found {len(library_agents)} relevant agents for sub-agent composition"
)
except Exception as e:
logger.warning(f"Failed to fetch library agents: {e}")
update_request = changes
if context:
update_request = f"{changes}\n\nAdditional context:\n{context}"
try:
result = await generate_agent_patch(
update_request,
current_agent,
library_agents,
operation_id=operation_id,
task_id=task_id,
)
except AgentGeneratorNotConfiguredError:
return ErrorResponse(
message=(
"Agent editing is not available. "
"The Agent Generator service is not configured."
),
error="service_not_configured",
session_id=session_id,
)
if result is None:
return ErrorResponse(
message="Failed to generate changes. The agent generation service may be unavailable or timed out. Please try again.",
error="update_generation_failed",
details={"agent_id": agent_id, "changes": changes[:100]},
session_id=session_id,
)
# Check if Agent Generator accepted for async processing
if result.get("status") == "accepted":
logger.info(
f"Agent edit delegated to async processing "
f"(operation_id={operation_id}, task_id={task_id})"
)
return AsyncProcessingResponse(
message="Agent edit started. You'll be notified when it's complete.",
operation_id=operation_id,
task_id=task_id,
session_id=session_id,
)
# Check if the result is an error from the external service
if isinstance(result, dict) and result.get("type") == "error":
error_msg = result.get("error", "Unknown error")
error_type = result.get("error_type", "unknown")
user_message = get_user_message_for_error(
error_type,
operation="generate the changes",
llm_parse_message="The AI had trouble generating the changes. Please try again or simplify your request.",
validation_message="The generated changes failed validation. Please try rephrasing your request.",
error_details=error_msg,
)
return ErrorResponse(
message=user_message,
error=f"update_generation_failed:{error_type}",
details={
"agent_id": agent_id,
"changes": changes[:100],
"service_error": error_msg,
"error_type": error_type,
},
session_id=session_id,
)
if result.get("type") == "clarifying_questions":
questions = result.get("questions", [])
return ClarificationNeededResponse(
message=(
"I need some more information about the changes. "
"Please answer the following questions:"
),
questions=[
ClarifyingQuestion(
question=q.get("question", ""),
keyword=q.get("keyword", ""),
example=q.get("example"),
)
for q in questions
],
session_id=session_id,
)
updated_agent = result
agent_name = updated_agent.get("name", "Updated Agent")
agent_description = updated_agent.get("description", "")
node_count = len(updated_agent.get("nodes", []))
link_count = len(updated_agent.get("links", []))
if not save:
return AgentPreviewResponse(
message=(
f"I've updated the agent. "
f"The agent now has {node_count} blocks. "
f"Review it and call edit_agent with save=true to save the changes."
),
agent_json=updated_agent,
agent_name=agent_name,
description=agent_description,
node_count=node_count,
link_count=link_count,
session_id=session_id,
)
if not user_id:
return ErrorResponse(
message="You must be logged in to save agents.",
error="auth_required",
session_id=session_id,
)
try:
created_graph, library_agent = await save_agent_to_library(
updated_agent, user_id, is_update=True
)
return AgentSavedResponse(
message=f"Updated agent '{created_graph.name}' has been saved to your library!",
agent_id=created_graph.id,
agent_name=created_graph.name,
library_agent_id=library_agent.id,
library_agent_link=f"/library/agents/{library_agent.id}",
agent_page_link=f"/build?flowID={created_graph.id}",
session_id=session_id,
)
except Exception as e:
return ErrorResponse(
message=f"Failed to save the updated agent: {str(e)}",
error="save_failed",
details={"exception": str(e)},
session_id=session_id,
)

View File

@@ -1,139 +0,0 @@
"""Tests for block filtering in FindBlockTool."""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from backend.api.features.chat.tools.find_block import (
COPILOT_EXCLUDED_BLOCK_IDS,
COPILOT_EXCLUDED_BLOCK_TYPES,
FindBlockTool,
)
from backend.api.features.chat.tools.models import BlockListResponse
from backend.data.block import BlockType
from ._test_data import make_session
_TEST_USER_ID = "test-user-find-block"
def make_mock_block(
block_id: str, name: str, block_type: BlockType, disabled: bool = False
):
"""Create a mock block for testing."""
mock = MagicMock()
mock.id = block_id
mock.name = name
mock.description = f"{name} description"
mock.block_type = block_type
mock.disabled = disabled
mock.input_schema = MagicMock()
mock.input_schema.jsonschema.return_value = {"properties": {}, "required": []}
mock.input_schema.get_credentials_fields.return_value = {}
mock.output_schema = MagicMock()
mock.output_schema.jsonschema.return_value = {}
mock.categories = []
return mock
class TestFindBlockFiltering:
"""Tests for block filtering in FindBlockTool."""
def test_excluded_block_types_contains_expected_types(self):
"""Verify COPILOT_EXCLUDED_BLOCK_TYPES contains all graph-only types."""
assert BlockType.INPUT in COPILOT_EXCLUDED_BLOCK_TYPES
assert BlockType.OUTPUT in COPILOT_EXCLUDED_BLOCK_TYPES
assert BlockType.WEBHOOK in COPILOT_EXCLUDED_BLOCK_TYPES
assert BlockType.WEBHOOK_MANUAL in COPILOT_EXCLUDED_BLOCK_TYPES
assert BlockType.NOTE in COPILOT_EXCLUDED_BLOCK_TYPES
assert BlockType.HUMAN_IN_THE_LOOP in COPILOT_EXCLUDED_BLOCK_TYPES
assert BlockType.AGENT in COPILOT_EXCLUDED_BLOCK_TYPES
def test_excluded_block_ids_contains_smart_decision_maker(self):
"""Verify SmartDecisionMakerBlock is in COPILOT_EXCLUDED_BLOCK_IDS."""
assert "3b191d9f-356f-482d-8238-ba04b6d18381" in COPILOT_EXCLUDED_BLOCK_IDS
@pytest.mark.asyncio(loop_scope="session")
async def test_excluded_block_type_filtered_from_results(self):
"""Verify blocks with excluded BlockTypes are filtered from search results."""
session = make_session(user_id=_TEST_USER_ID)
# Mock search returns an INPUT block (excluded) and a STANDARD block (included)
search_results = [
{"content_id": "input-block-id", "score": 0.9},
{"content_id": "standard-block-id", "score": 0.8},
]
input_block = make_mock_block("input-block-id", "Input Block", BlockType.INPUT)
standard_block = make_mock_block(
"standard-block-id", "HTTP Request", BlockType.STANDARD
)
def mock_get_block(block_id):
return {
"input-block-id": input_block,
"standard-block-id": standard_block,
}.get(block_id)
with patch(
"backend.api.features.chat.tools.find_block.unified_hybrid_search",
new_callable=AsyncMock,
return_value=(search_results, 2),
):
with patch(
"backend.api.features.chat.tools.find_block.get_block",
side_effect=mock_get_block,
):
tool = FindBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID, session=session, query="test"
)
# Should only return the standard block, not the INPUT block
assert isinstance(response, BlockListResponse)
assert len(response.blocks) == 1
assert response.blocks[0].id == "standard-block-id"
@pytest.mark.asyncio(loop_scope="session")
async def test_excluded_block_id_filtered_from_results(self):
"""Verify SmartDecisionMakerBlock is filtered from search results."""
session = make_session(user_id=_TEST_USER_ID)
smart_decision_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
search_results = [
{"content_id": smart_decision_id, "score": 0.9},
{"content_id": "normal-block-id", "score": 0.8},
]
# SmartDecisionMakerBlock has STANDARD type but is excluded by ID
smart_block = make_mock_block(
smart_decision_id, "Smart Decision Maker", BlockType.STANDARD
)
normal_block = make_mock_block(
"normal-block-id", "Normal Block", BlockType.STANDARD
)
def mock_get_block(block_id):
return {
smart_decision_id: smart_block,
"normal-block-id": normal_block,
}.get(block_id)
with patch(
"backend.api.features.chat.tools.find_block.unified_hybrid_search",
new_callable=AsyncMock,
return_value=(search_results, 2),
):
with patch(
"backend.api.features.chat.tools.find_block.get_block",
side_effect=mock_get_block,
):
tool = FindBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID, session=session, query="decision"
)
# Should only return normal block, not SmartDecisionMakerBlock
assert isinstance(response, BlockListResponse)
assert len(response.blocks) == 1
assert response.blocks[0].id == "normal-block-id"

View File

@@ -1,29 +0,0 @@
"""Shared helpers for chat tools."""
from typing import Any
def get_inputs_from_schema(
input_schema: dict[str, Any],
exclude_fields: set[str] | None = None,
) -> list[dict[str, Any]]:
"""Extract input field info from JSON schema."""
if not isinstance(input_schema, dict):
return []
exclude = exclude_fields or set()
properties = input_schema.get("properties", {})
required = set(input_schema.get("required", []))
return [
{
"name": name,
"title": schema.get("title", name),
"type": schema.get("type", "string"),
"description": schema.get("description", ""),
"required": name in required,
"default": schema.get("default"),
}
for name, schema in properties.items()
if name not in exclude
]

View File

@@ -1,423 +0,0 @@
"""Pydantic models for tool responses."""
from datetime import datetime
from enum import Enum
from typing import Any
from pydantic import BaseModel, Field
from backend.data.model import CredentialsMetaInput
class ResponseType(str, Enum):
"""Types of tool responses."""
AGENTS_FOUND = "agents_found"
AGENT_DETAILS = "agent_details"
SETUP_REQUIREMENTS = "setup_requirements"
EXECUTION_STARTED = "execution_started"
NEED_LOGIN = "need_login"
ERROR = "error"
NO_RESULTS = "no_results"
AGENT_OUTPUT = "agent_output"
UNDERSTANDING_UPDATED = "understanding_updated"
AGENT_PREVIEW = "agent_preview"
AGENT_SAVED = "agent_saved"
CLARIFICATION_NEEDED = "clarification_needed"
BLOCK_LIST = "block_list"
BLOCK_OUTPUT = "block_output"
DOC_SEARCH_RESULTS = "doc_search_results"
DOC_PAGE = "doc_page"
# Workspace response types
WORKSPACE_FILE_LIST = "workspace_file_list"
WORKSPACE_FILE_CONTENT = "workspace_file_content"
WORKSPACE_FILE_METADATA = "workspace_file_metadata"
WORKSPACE_FILE_WRITTEN = "workspace_file_written"
WORKSPACE_FILE_DELETED = "workspace_file_deleted"
# Long-running operation types
OPERATION_STARTED = "operation_started"
OPERATION_PENDING = "operation_pending"
OPERATION_IN_PROGRESS = "operation_in_progress"
# Input validation
INPUT_VALIDATION_ERROR = "input_validation_error"
# Base response model
class ToolResponseBase(BaseModel):
"""Base model for all tool responses."""
type: ResponseType
message: str
session_id: str | None = None
# Agent discovery models
class AgentInfo(BaseModel):
"""Information about an agent."""
id: str
name: str
description: str
source: str = Field(description="marketplace or library")
in_library: bool = False
creator: str | None = None
category: str | None = None
rating: float | None = None
runs: int | None = None
is_featured: bool | None = None
status: str | None = None
can_access_graph: bool | None = None
has_external_trigger: bool | None = None
new_output: bool | None = None
graph_id: str | None = None
inputs: dict[str, Any] | None = Field(
default=None,
description="Input schema for the agent, including field names, types, and defaults",
)
class AgentsFoundResponse(ToolResponseBase):
"""Response for find_agent tool."""
type: ResponseType = ResponseType.AGENTS_FOUND
title: str = "Available Agents"
agents: list[AgentInfo]
count: int
name: str = "agents_found"
class NoResultsResponse(ToolResponseBase):
"""Response when no agents found."""
type: ResponseType = ResponseType.NO_RESULTS
suggestions: list[str] = []
name: str = "no_results"
# Agent details models
class InputField(BaseModel):
"""Input field specification."""
name: str
type: str = "string"
description: str = ""
required: bool = False
default: Any | None = None
options: list[Any] | None = None
format: str | None = None
class ExecutionOptions(BaseModel):
"""Available execution options for an agent."""
manual: bool = True
scheduled: bool = True
webhook: bool = False
class AgentDetails(BaseModel):
"""Detailed agent information."""
id: str
name: str
description: str
in_library: bool = False
inputs: dict[str, Any] = {}
credentials: list[CredentialsMetaInput] = []
execution_options: ExecutionOptions = Field(default_factory=ExecutionOptions)
trigger_info: dict[str, Any] | None = None
class AgentDetailsResponse(ToolResponseBase):
"""Response for get_details action."""
type: ResponseType = ResponseType.AGENT_DETAILS
agent: AgentDetails
user_authenticated: bool = False
graph_id: str | None = None
graph_version: int | None = None
# Setup info models
class UserReadiness(BaseModel):
"""User readiness status."""
has_all_credentials: bool = False
missing_credentials: dict[str, Any] = {}
ready_to_run: bool = False
class SetupInfo(BaseModel):
"""Complete setup information."""
agent_id: str
agent_name: str
requirements: dict[str, list[Any]] = Field(
default_factory=lambda: {
"credentials": [],
"inputs": [],
"execution_modes": [],
},
)
user_readiness: UserReadiness = Field(default_factory=UserReadiness)
class SetupRequirementsResponse(ToolResponseBase):
"""Response for validate action."""
type: ResponseType = ResponseType.SETUP_REQUIREMENTS
setup_info: SetupInfo
graph_id: str | None = None
graph_version: int | None = None
# Execution models
class ExecutionStartedResponse(ToolResponseBase):
"""Response for run/schedule actions."""
type: ResponseType = ResponseType.EXECUTION_STARTED
execution_id: str
graph_id: str
graph_name: str
library_agent_id: str | None = None
library_agent_link: str | None = None
status: str = "QUEUED"
# Auth/error models
class NeedLoginResponse(ToolResponseBase):
"""Response when login is needed."""
type: ResponseType = ResponseType.NEED_LOGIN
agent_info: dict[str, Any] | None = None
class ErrorResponse(ToolResponseBase):
"""Response for errors."""
type: ResponseType = ResponseType.ERROR
error: str | None = None
details: dict[str, Any] | None = None
class InputValidationErrorResponse(ToolResponseBase):
"""Response when run_agent receives unknown input fields."""
type: ResponseType = ResponseType.INPUT_VALIDATION_ERROR
unrecognized_fields: list[str] = Field(
description="List of input field names that were not recognized"
)
inputs: dict[str, Any] = Field(
description="The agent's valid input schema for reference"
)
graph_id: str | None = None
graph_version: int | None = None
# Agent output models
class ExecutionOutputInfo(BaseModel):
"""Summary of a single execution's outputs."""
execution_id: str
status: str
started_at: datetime | None = None
ended_at: datetime | None = None
outputs: dict[str, list[Any]]
inputs_summary: dict[str, Any] | None = None
class AgentOutputResponse(ToolResponseBase):
"""Response for agent_output tool."""
type: ResponseType = ResponseType.AGENT_OUTPUT
agent_name: str
agent_id: str
library_agent_id: str | None = None
library_agent_link: str | None = None
execution: ExecutionOutputInfo | None = None
available_executions: list[dict[str, Any]] | None = None
total_executions: int = 0
# Business understanding models
class UnderstandingUpdatedResponse(ToolResponseBase):
"""Response for add_understanding tool."""
type: ResponseType = ResponseType.UNDERSTANDING_UPDATED
updated_fields: list[str] = Field(default_factory=list)
current_understanding: dict[str, Any] = Field(default_factory=dict)
# Agent generation models
class ClarifyingQuestion(BaseModel):
"""A question that needs user clarification."""
question: str
keyword: str
example: str | None = None
class AgentPreviewResponse(ToolResponseBase):
"""Response for previewing a generated agent before saving."""
type: ResponseType = ResponseType.AGENT_PREVIEW
agent_json: dict[str, Any]
agent_name: str
description: str
node_count: int
link_count: int = 0
class AgentSavedResponse(ToolResponseBase):
"""Response when an agent is saved to the library."""
type: ResponseType = ResponseType.AGENT_SAVED
agent_id: str
agent_name: str
library_agent_id: str
library_agent_link: str
agent_page_link: str # Link to the agent builder/editor page
class ClarificationNeededResponse(ToolResponseBase):
"""Response when the LLM needs more information from the user."""
type: ResponseType = ResponseType.CLARIFICATION_NEEDED
questions: list[ClarifyingQuestion] = Field(default_factory=list)
# Documentation search models
class DocSearchResult(BaseModel):
"""A single documentation search result."""
title: str
path: str
section: str
snippet: str # Short excerpt for UI display
score: float
doc_url: str | None = None
class DocSearchResultsResponse(ToolResponseBase):
"""Response for search_docs tool."""
type: ResponseType = ResponseType.DOC_SEARCH_RESULTS
results: list[DocSearchResult]
count: int
query: str
class DocPageResponse(ToolResponseBase):
"""Response for get_doc_page tool."""
type: ResponseType = ResponseType.DOC_PAGE
title: str
path: str
content: str # Full document content
doc_url: str | None = None
# Block models
class BlockInputFieldInfo(BaseModel):
"""Information about a block input field."""
name: str
type: str
description: str = ""
required: bool = False
default: Any | None = None
class BlockInfoSummary(BaseModel):
"""Summary of a block for search results."""
id: str
name: str
description: str
categories: list[str]
input_schema: dict[str, Any]
output_schema: dict[str, Any]
required_inputs: list[BlockInputFieldInfo] = Field(
default_factory=list,
description="List of required input fields for this block",
)
class BlockListResponse(ToolResponseBase):
"""Response for find_block tool."""
type: ResponseType = ResponseType.BLOCK_LIST
blocks: list[BlockInfoSummary]
count: int
query: str
usage_hint: str = Field(
default="To execute a block, call run_block with block_id set to the block's "
"'id' field and input_data containing the required fields from input_schema."
)
class BlockOutputResponse(ToolResponseBase):
"""Response for run_block tool."""
type: ResponseType = ResponseType.BLOCK_OUTPUT
block_id: str
block_name: str
outputs: dict[str, list[Any]]
success: bool = True
# Long-running operation models
class OperationStartedResponse(ToolResponseBase):
"""Response when a long-running operation has been started in the background.
This is returned immediately to the client while the operation continues
to execute. The user can close the tab and check back later.
The task_id can be used to reconnect to the SSE stream via
GET /chat/tasks/{task_id}/stream?last_idx=0
"""
type: ResponseType = ResponseType.OPERATION_STARTED
operation_id: str
tool_name: str
task_id: str | None = None # For SSE reconnection
class OperationPendingResponse(ToolResponseBase):
"""Response stored in chat history while a long-running operation is executing.
This is persisted to the database so users see a pending state when they
refresh before the operation completes.
"""
type: ResponseType = ResponseType.OPERATION_PENDING
operation_id: str
tool_name: str
class OperationInProgressResponse(ToolResponseBase):
"""Response when an operation is already in progress.
Returned for idempotency when the same tool_call_id is requested again
while the background task is still running.
"""
type: ResponseType = ResponseType.OPERATION_IN_PROGRESS
tool_call_id: str
class AsyncProcessingResponse(ToolResponseBase):
"""Response when an operation has been delegated to async processing.
This is returned by tools when the external service accepts the request
for async processing (HTTP 202 Accepted). The Redis Streams completion
consumer will handle the result when the external service completes.
The status field is specifically "accepted" to allow the long-running tool
handler to detect this response and skip LLM continuation.
"""
type: ResponseType = ResponseType.OPERATION_STARTED
status: str = "accepted" # Must be "accepted" for detection
operation_id: str | None = None
task_id: str | None = None

View File

@@ -1,355 +0,0 @@
"""Tool for executing blocks directly."""
import logging
import uuid
from collections import defaultdict
from typing import Any
from pydantic_core import PydanticUndefined
from backend.api.features.chat.model import ChatSession
from backend.api.features.chat.tools.find_block import (
COPILOT_EXCLUDED_BLOCK_IDS,
COPILOT_EXCLUDED_BLOCK_TYPES,
)
from backend.data.block import AnyBlockSchema, get_block
from backend.data.execution import ExecutionContext
from backend.data.model import CredentialsFieldInfo, CredentialsMetaInput
from backend.data.workspace import get_or_create_workspace
from backend.integrations.creds_manager import IntegrationCredentialsManager
from backend.util.exceptions import BlockError
from .base import BaseTool
from .helpers import get_inputs_from_schema
from .models import (
BlockOutputResponse,
ErrorResponse,
SetupInfo,
SetupRequirementsResponse,
ToolResponseBase,
UserReadiness,
)
from .utils import (
build_missing_credentials_from_field_info,
match_credentials_to_requirements,
)
logger = logging.getLogger(__name__)
class RunBlockTool(BaseTool):
"""Tool for executing a block and returning its outputs."""
@property
def name(self) -> str:
return "run_block"
@property
def description(self) -> str:
return (
"Execute a specific block with the provided input data. "
"IMPORTANT: You MUST call find_block first to get the block's 'id' - "
"do NOT guess or make up block IDs. "
"Use the 'id' from find_block results and provide input_data "
"matching the block's required_inputs."
)
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"block_id": {
"type": "string",
"description": (
"The block's 'id' field from find_block results. "
"NEVER guess this - always get it from find_block first."
),
},
"input_data": {
"type": "object",
"description": (
"Input values for the block. Use the 'required_inputs' field "
"from find_block to see what fields are needed."
),
},
},
"required": ["block_id", "input_data"],
}
@property
def requires_auth(self) -> bool:
return True
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs,
) -> ToolResponseBase:
"""Execute a block with the given input data.
Args:
user_id: User ID (required)
session: Chat session
block_id: Block UUID to execute
input_data: Input values for the block
Returns:
BlockOutputResponse: Block execution outputs
SetupRequirementsResponse: Missing credentials
ErrorResponse: Error message
"""
block_id = kwargs.get("block_id", "").strip()
input_data = kwargs.get("input_data", {})
session_id = session.session_id
if not block_id:
return ErrorResponse(
message="Please provide a block_id",
session_id=session_id,
)
if not isinstance(input_data, dict):
return ErrorResponse(
message="input_data must be an object",
session_id=session_id,
)
if not user_id:
return ErrorResponse(
message="Authentication required",
session_id=session_id,
)
# Get the block
block = get_block(block_id)
if not block:
return ErrorResponse(
message=f"Block '{block_id}' not found",
session_id=session_id,
)
if block.disabled:
return ErrorResponse(
message=f"Block '{block_id}' is disabled",
session_id=session_id,
)
# Check if block is excluded from CoPilot (graph-only blocks)
if (
block.block_type in COPILOT_EXCLUDED_BLOCK_TYPES
or block.id in COPILOT_EXCLUDED_BLOCK_IDS
):
return ErrorResponse(
message=(
f"Block '{block.name}' cannot be run directly in CoPilot. "
"This block is designed for use within graphs only."
),
session_id=session_id,
)
logger.info(f"Executing block {block.name} ({block_id}) for user {user_id}")
creds_manager = IntegrationCredentialsManager()
matched_credentials, missing_credentials = (
await self._resolve_block_credentials(user_id, block, input_data)
)
if missing_credentials:
# Return setup requirements response with missing credentials
credentials_fields_info = block.input_schema.get_credentials_fields_info()
missing_creds_dict = build_missing_credentials_from_field_info(
credentials_fields_info, set(matched_credentials.keys())
)
missing_creds_list = list(missing_creds_dict.values())
return SetupRequirementsResponse(
message=(
f"Block '{block.name}' requires credentials that are not configured. "
"Please set up the required credentials before running this block."
),
session_id=session_id,
setup_info=SetupInfo(
agent_id=block_id,
agent_name=block.name,
user_readiness=UserReadiness(
has_all_credentials=False,
missing_credentials=missing_creds_dict,
ready_to_run=False,
),
requirements={
"credentials": missing_creds_list,
"inputs": self._get_inputs_list(block),
"execution_modes": ["immediate"],
},
),
graph_id=None,
graph_version=None,
)
try:
# Get or create user's workspace for CoPilot file operations
workspace = await get_or_create_workspace(user_id)
# Generate synthetic IDs for CoPilot context
# Each chat session is treated as its own agent with one continuous run
# This means:
# - graph_id (agent) = session (memories scoped to session when limit_to_agent=True)
# - graph_exec_id (run) = session (memories scoped to session when limit_to_run=True)
# - node_exec_id = unique per block execution
synthetic_graph_id = f"copilot-session-{session.session_id}"
synthetic_graph_exec_id = f"copilot-session-{session.session_id}"
synthetic_node_id = f"copilot-node-{block_id}"
synthetic_node_exec_id = (
f"copilot-{session.session_id}-{uuid.uuid4().hex[:8]}"
)
# Create unified execution context with all required fields
execution_context = ExecutionContext(
# Execution identity
user_id=user_id,
graph_id=synthetic_graph_id,
graph_exec_id=synthetic_graph_exec_id,
graph_version=1, # Versions are 1-indexed
node_id=synthetic_node_id,
node_exec_id=synthetic_node_exec_id,
# Workspace with session scoping
workspace_id=workspace.id,
session_id=session.session_id,
)
# Prepare kwargs for block execution
# Keep individual kwargs for backwards compatibility with existing blocks
exec_kwargs: dict[str, Any] = {
"user_id": user_id,
"execution_context": execution_context,
# Legacy: individual kwargs for blocks not yet using execution_context
"workspace_id": workspace.id,
"graph_exec_id": synthetic_graph_exec_id,
"node_exec_id": synthetic_node_exec_id,
"node_id": synthetic_node_id,
"graph_version": 1, # Versions are 1-indexed
"graph_id": synthetic_graph_id,
}
for field_name, cred_meta in matched_credentials.items():
# Inject metadata into input_data (for validation)
if field_name not in input_data:
input_data[field_name] = cred_meta.model_dump()
# Fetch actual credentials and pass as kwargs (for execution)
actual_credentials = await creds_manager.get(
user_id, cred_meta.id, lock=False
)
if actual_credentials:
exec_kwargs[field_name] = actual_credentials
else:
return ErrorResponse(
message=f"Failed to retrieve credentials for {field_name}",
session_id=session_id,
)
# Execute the block and collect outputs
outputs: dict[str, list[Any]] = defaultdict(list)
async for output_name, output_data in block.execute(
input_data,
**exec_kwargs,
):
outputs[output_name].append(output_data)
return BlockOutputResponse(
message=f"Block '{block.name}' executed successfully",
block_id=block_id,
block_name=block.name,
outputs=dict(outputs),
success=True,
session_id=session_id,
)
except BlockError as e:
logger.warning(f"Block execution failed: {e}")
return ErrorResponse(
message=f"Block execution failed: {e}",
error=str(e),
session_id=session_id,
)
except Exception as e:
logger.error(f"Unexpected error executing block: {e}", exc_info=True)
return ErrorResponse(
message=f"Failed to execute block: {str(e)}",
error=str(e),
session_id=session_id,
)
async def _resolve_block_credentials(
self,
user_id: str,
block: AnyBlockSchema,
input_data: dict[str, Any] | None = None,
) -> tuple[dict[str, CredentialsMetaInput], list[CredentialsMetaInput]]:
"""
Resolve credentials for a block by matching user's available credentials.
Args:
user_id: User ID
block: Block to resolve credentials for
input_data: Input data for the block (used to determine provider via discriminator)
Returns:
tuple of (matched_credentials, missing_credentials) - matched credentials
are used for block execution, missing ones indicate setup requirements.
"""
input_data = input_data or {}
requirements = self._resolve_discriminated_credentials(block, input_data)
if not requirements:
return {}, []
return await match_credentials_to_requirements(user_id, requirements)
def _get_inputs_list(self, block: AnyBlockSchema) -> list[dict[str, Any]]:
"""Extract non-credential inputs from block schema."""
schema = block.input_schema.jsonschema()
credentials_fields = set(block.input_schema.get_credentials_fields().keys())
return get_inputs_from_schema(schema, exclude_fields=credentials_fields)
def _resolve_discriminated_credentials(
self,
block: AnyBlockSchema,
input_data: dict[str, Any],
) -> dict[str, CredentialsFieldInfo]:
"""Resolve credential requirements, applying discriminator logic where needed."""
credentials_fields_info = block.input_schema.get_credentials_fields_info()
if not credentials_fields_info:
return {}
resolved: dict[str, CredentialsFieldInfo] = {}
for field_name, field_info in credentials_fields_info.items():
effective_field_info = field_info
if field_info.discriminator and field_info.discriminator_mapping:
discriminator_value = input_data.get(field_info.discriminator)
if discriminator_value is None:
field = block.input_schema.model_fields.get(
field_info.discriminator
)
if field and field.default is not PydanticUndefined:
discriminator_value = field.default
if (
discriminator_value
and discriminator_value in field_info.discriminator_mapping
):
effective_field_info = field_info.discriminate(discriminator_value)
# For host-scoped credentials, add the discriminator value
# (e.g., URL) so _credential_is_for_host can match it
effective_field_info.discriminator_values.add(discriminator_value)
logger.debug(
f"Discriminated provider for {field_name}: "
f"{discriminator_value} -> {effective_field_info.provider}"
)
resolved[field_name] = effective_field_info
return resolved

View File

@@ -1,106 +0,0 @@
"""Tests for block execution guards in RunBlockTool."""
from unittest.mock import MagicMock, patch
import pytest
from backend.api.features.chat.tools.models import ErrorResponse
from backend.api.features.chat.tools.run_block import RunBlockTool
from backend.data.block import BlockType
from ._test_data import make_session
_TEST_USER_ID = "test-user-run-block"
def make_mock_block(
block_id: str, name: str, block_type: BlockType, disabled: bool = False
):
"""Create a mock block for testing."""
mock = MagicMock()
mock.id = block_id
mock.name = name
mock.block_type = block_type
mock.disabled = disabled
mock.input_schema = MagicMock()
mock.input_schema.jsonschema.return_value = {"properties": {}, "required": []}
mock.input_schema.get_credentials_fields_info.return_value = []
return mock
class TestRunBlockFiltering:
"""Tests for block execution guards in RunBlockTool."""
@pytest.mark.asyncio(loop_scope="session")
async def test_excluded_block_type_returns_error(self):
"""Attempting to execute a block with excluded BlockType returns error."""
session = make_session(user_id=_TEST_USER_ID)
input_block = make_mock_block("input-block-id", "Input Block", BlockType.INPUT)
with patch(
"backend.api.features.chat.tools.run_block.get_block",
return_value=input_block,
):
tool = RunBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
block_id="input-block-id",
input_data={},
)
assert isinstance(response, ErrorResponse)
assert "cannot be run directly in CoPilot" in response.message
assert "designed for use within graphs only" in response.message
@pytest.mark.asyncio(loop_scope="session")
async def test_excluded_block_id_returns_error(self):
"""Attempting to execute SmartDecisionMakerBlock returns error."""
session = make_session(user_id=_TEST_USER_ID)
smart_decision_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
smart_block = make_mock_block(
smart_decision_id, "Smart Decision Maker", BlockType.STANDARD
)
with patch(
"backend.api.features.chat.tools.run_block.get_block",
return_value=smart_block,
):
tool = RunBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
block_id=smart_decision_id,
input_data={},
)
assert isinstance(response, ErrorResponse)
assert "cannot be run directly in CoPilot" in response.message
@pytest.mark.asyncio(loop_scope="session")
async def test_non_excluded_block_passes_guard(self):
"""Non-excluded blocks pass the filtering guard (may fail later for other reasons)."""
session = make_session(user_id=_TEST_USER_ID)
standard_block = make_mock_block(
"standard-id", "HTTP Request", BlockType.STANDARD
)
with patch(
"backend.api.features.chat.tools.run_block.get_block",
return_value=standard_block,
):
tool = RunBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
block_id="standard-id",
input_data={},
)
# Should NOT be an ErrorResponse about CoPilot exclusion
# (may be other errors like missing credentials, but not the exclusion guard)
if isinstance(response, ErrorResponse):
assert "cannot be run directly in CoPilot" not in response.message

View File

@@ -1,620 +0,0 @@
"""CoPilot tools for workspace file operations."""
import base64
import logging
from typing import Any, Optional
from pydantic import BaseModel
from backend.api.features.chat.model import ChatSession
from backend.data.workspace import get_or_create_workspace
from backend.util.settings import Config
from backend.util.virus_scanner import scan_content_safe
from backend.util.workspace import WorkspaceManager
from .base import BaseTool
from .models import ErrorResponse, ResponseType, ToolResponseBase
logger = logging.getLogger(__name__)
class WorkspaceFileInfoData(BaseModel):
"""Data model for workspace file information (not a response itself)."""
file_id: str
name: str
path: str
mime_type: str
size_bytes: int
class WorkspaceFileListResponse(ToolResponseBase):
"""Response containing list of workspace files."""
type: ResponseType = ResponseType.WORKSPACE_FILE_LIST
files: list[WorkspaceFileInfoData]
total_count: int
class WorkspaceFileContentResponse(ToolResponseBase):
"""Response containing workspace file content (legacy, for small text files)."""
type: ResponseType = ResponseType.WORKSPACE_FILE_CONTENT
file_id: str
name: str
path: str
mime_type: str
content_base64: str
class WorkspaceFileMetadataResponse(ToolResponseBase):
"""Response containing workspace file metadata and download URL (prevents context bloat)."""
type: ResponseType = ResponseType.WORKSPACE_FILE_METADATA
file_id: str
name: str
path: str
mime_type: str
size_bytes: int
download_url: str
preview: str | None = None # First 500 chars for text files
class WorkspaceWriteResponse(ToolResponseBase):
"""Response after writing a file to workspace."""
type: ResponseType = ResponseType.WORKSPACE_FILE_WRITTEN
file_id: str
name: str
path: str
size_bytes: int
class WorkspaceDeleteResponse(ToolResponseBase):
"""Response after deleting a file from workspace."""
type: ResponseType = ResponseType.WORKSPACE_FILE_DELETED
file_id: str
success: bool
class ListWorkspaceFilesTool(BaseTool):
"""Tool for listing files in user's workspace."""
@property
def name(self) -> str:
return "list_workspace_files"
@property
def description(self) -> str:
return (
"List files in the user's workspace. "
"Returns file names, paths, sizes, and metadata. "
"Optionally filter by path prefix."
)
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"path_prefix": {
"type": "string",
"description": (
"Optional path prefix to filter files "
"(e.g., '/documents/' to list only files in documents folder). "
"By default, only files from the current session are listed."
),
},
"limit": {
"type": "integer",
"description": "Maximum number of files to return (default 50, max 100)",
"minimum": 1,
"maximum": 100,
},
"include_all_sessions": {
"type": "boolean",
"description": (
"If true, list files from all sessions. "
"Default is false (only current session's files)."
),
},
},
"required": [],
}
@property
def requires_auth(self) -> bool:
return True
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs,
) -> ToolResponseBase:
session_id = session.session_id
if not user_id:
return ErrorResponse(
message="Authentication required",
session_id=session_id,
)
path_prefix: Optional[str] = kwargs.get("path_prefix")
limit = min(kwargs.get("limit", 50), 100)
include_all_sessions: bool = kwargs.get("include_all_sessions", False)
try:
workspace = await get_or_create_workspace(user_id)
# Pass session_id for session-scoped file access
manager = WorkspaceManager(user_id, workspace.id, session_id)
files = await manager.list_files(
path=path_prefix,
limit=limit,
include_all_sessions=include_all_sessions,
)
total = await manager.get_file_count(
path=path_prefix,
include_all_sessions=include_all_sessions,
)
file_infos = [
WorkspaceFileInfoData(
file_id=f.id,
name=f.name,
path=f.path,
mime_type=f.mimeType,
size_bytes=f.sizeBytes,
)
for f in files
]
scope_msg = "all sessions" if include_all_sessions else "current session"
return WorkspaceFileListResponse(
files=file_infos,
total_count=total,
message=f"Found {len(files)} files in workspace ({scope_msg})",
session_id=session_id,
)
except Exception as e:
logger.error(f"Error listing workspace files: {e}", exc_info=True)
return ErrorResponse(
message=f"Failed to list workspace files: {str(e)}",
error=str(e),
session_id=session_id,
)
class ReadWorkspaceFileTool(BaseTool):
"""Tool for reading file content from workspace."""
# Size threshold for returning full content vs metadata+URL
# Files larger than this return metadata with download URL to prevent context bloat
MAX_INLINE_SIZE_BYTES = 32 * 1024 # 32KB
# Preview size for text files
PREVIEW_SIZE = 500
@property
def name(self) -> str:
return "read_workspace_file"
@property
def description(self) -> str:
return (
"Read a file from the user's workspace. "
"Specify either file_id or path to identify the file. "
"For small text files, returns content directly. "
"For large or binary files, returns metadata and a download URL. "
"Paths are scoped to the current session by default. "
"Use /sessions/<session_id>/... for cross-session access."
)
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"file_id": {
"type": "string",
"description": "The file's unique ID (from list_workspace_files)",
},
"path": {
"type": "string",
"description": (
"The virtual file path (e.g., '/documents/report.pdf'). "
"Scoped to current session by default."
),
},
"force_download_url": {
"type": "boolean",
"description": (
"If true, always return metadata+URL instead of inline content. "
"Default is false (auto-selects based on file size/type)."
),
},
},
"required": [], # At least one must be provided
}
@property
def requires_auth(self) -> bool:
return True
def _is_text_mime_type(self, mime_type: str) -> bool:
"""Check if the MIME type is a text-based type."""
text_types = [
"text/",
"application/json",
"application/xml",
"application/javascript",
"application/x-python",
"application/x-sh",
]
return any(mime_type.startswith(t) for t in text_types)
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs,
) -> ToolResponseBase:
session_id = session.session_id
if not user_id:
return ErrorResponse(
message="Authentication required",
session_id=session_id,
)
file_id: Optional[str] = kwargs.get("file_id")
path: Optional[str] = kwargs.get("path")
force_download_url: bool = kwargs.get("force_download_url", False)
if not file_id and not path:
return ErrorResponse(
message="Please provide either file_id or path",
session_id=session_id,
)
try:
workspace = await get_or_create_workspace(user_id)
# Pass session_id for session-scoped file access
manager = WorkspaceManager(user_id, workspace.id, session_id)
# Get file info
if file_id:
file_info = await manager.get_file_info(file_id)
if file_info is None:
return ErrorResponse(
message=f"File not found: {file_id}",
session_id=session_id,
)
target_file_id = file_id
else:
# path is guaranteed to be non-None here due to the check above
assert path is not None
file_info = await manager.get_file_info_by_path(path)
if file_info is None:
return ErrorResponse(
message=f"File not found at path: {path}",
session_id=session_id,
)
target_file_id = file_info.id
# Decide whether to return inline content or metadata+URL
is_small_file = file_info.sizeBytes <= self.MAX_INLINE_SIZE_BYTES
is_text_file = self._is_text_mime_type(file_info.mimeType)
# Return inline content for small text files (unless force_download_url)
if is_small_file and is_text_file and not force_download_url:
content = await manager.read_file_by_id(target_file_id)
content_b64 = base64.b64encode(content).decode("utf-8")
return WorkspaceFileContentResponse(
file_id=file_info.id,
name=file_info.name,
path=file_info.path,
mime_type=file_info.mimeType,
content_base64=content_b64,
message=f"Successfully read file: {file_info.name}",
session_id=session_id,
)
# Return metadata + workspace:// reference for large or binary files
# This prevents context bloat (100KB file = ~133KB as base64)
# Use workspace:// format so frontend urlTransform can add proxy prefix
download_url = f"workspace://{target_file_id}"
# Generate preview for text files
preview: str | None = None
if is_text_file:
try:
content = await manager.read_file_by_id(target_file_id)
preview_text = content[: self.PREVIEW_SIZE].decode(
"utf-8", errors="replace"
)
if len(content) > self.PREVIEW_SIZE:
preview_text += "..."
preview = preview_text
except Exception:
pass # Preview is optional
return WorkspaceFileMetadataResponse(
file_id=file_info.id,
name=file_info.name,
path=file_info.path,
mime_type=file_info.mimeType,
size_bytes=file_info.sizeBytes,
download_url=download_url,
preview=preview,
message=f"File: {file_info.name} ({file_info.sizeBytes} bytes). Use download_url to retrieve content.",
session_id=session_id,
)
except FileNotFoundError as e:
return ErrorResponse(
message=str(e),
session_id=session_id,
)
except Exception as e:
logger.error(f"Error reading workspace file: {e}", exc_info=True)
return ErrorResponse(
message=f"Failed to read workspace file: {str(e)}",
error=str(e),
session_id=session_id,
)
class WriteWorkspaceFileTool(BaseTool):
"""Tool for writing files to workspace."""
@property
def name(self) -> str:
return "write_workspace_file"
@property
def description(self) -> str:
return (
"Write or create a file in the user's workspace. "
"Provide the content as a base64-encoded string. "
f"Maximum file size is {Config().max_file_size_mb}MB. "
"Files are saved to the current session's folder by default. "
"Use /sessions/<session_id>/... for cross-session access."
)
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"filename": {
"type": "string",
"description": "Name for the file (e.g., 'report.pdf')",
},
"content_base64": {
"type": "string",
"description": "Base64-encoded file content",
},
"path": {
"type": "string",
"description": (
"Optional virtual path where to save the file "
"(e.g., '/documents/report.pdf'). "
"Defaults to '/{filename}'. Scoped to current session."
),
},
"mime_type": {
"type": "string",
"description": (
"Optional MIME type of the file. "
"Auto-detected from filename if not provided."
),
},
"overwrite": {
"type": "boolean",
"description": "Whether to overwrite if file exists at path (default: false)",
},
},
"required": ["filename", "content_base64"],
}
@property
def requires_auth(self) -> bool:
return True
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs,
) -> ToolResponseBase:
session_id = session.session_id
if not user_id:
return ErrorResponse(
message="Authentication required",
session_id=session_id,
)
filename: str = kwargs.get("filename", "")
content_b64: str = kwargs.get("content_base64", "")
path: Optional[str] = kwargs.get("path")
mime_type: Optional[str] = kwargs.get("mime_type")
overwrite: bool = kwargs.get("overwrite", False)
if not filename:
return ErrorResponse(
message="Please provide a filename",
session_id=session_id,
)
if not content_b64:
return ErrorResponse(
message="Please provide content_base64",
session_id=session_id,
)
# Decode content
try:
content = base64.b64decode(content_b64)
except Exception:
return ErrorResponse(
message="Invalid base64-encoded content",
session_id=session_id,
)
# Check size
max_file_size = Config().max_file_size_mb * 1024 * 1024
if len(content) > max_file_size:
return ErrorResponse(
message=f"File too large. Maximum size is {Config().max_file_size_mb}MB",
session_id=session_id,
)
try:
# Virus scan
await scan_content_safe(content, filename=filename)
workspace = await get_or_create_workspace(user_id)
# Pass session_id for session-scoped file access
manager = WorkspaceManager(user_id, workspace.id, session_id)
file_record = await manager.write_file(
content=content,
filename=filename,
path=path,
mime_type=mime_type,
overwrite=overwrite,
)
return WorkspaceWriteResponse(
file_id=file_record.id,
name=file_record.name,
path=file_record.path,
size_bytes=file_record.sizeBytes,
message=f"Successfully wrote file: {file_record.name}",
session_id=session_id,
)
except ValueError as e:
return ErrorResponse(
message=str(e),
session_id=session_id,
)
except Exception as e:
logger.error(f"Error writing workspace file: {e}", exc_info=True)
return ErrorResponse(
message=f"Failed to write workspace file: {str(e)}",
error=str(e),
session_id=session_id,
)
class DeleteWorkspaceFileTool(BaseTool):
"""Tool for deleting files from workspace."""
@property
def name(self) -> str:
return "delete_workspace_file"
@property
def description(self) -> str:
return (
"Delete a file from the user's workspace. "
"Specify either file_id or path to identify the file. "
"Paths are scoped to the current session by default. "
"Use /sessions/<session_id>/... for cross-session access."
)
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"file_id": {
"type": "string",
"description": "The file's unique ID (from list_workspace_files)",
},
"path": {
"type": "string",
"description": (
"The virtual file path (e.g., '/documents/report.pdf'). "
"Scoped to current session by default."
),
},
},
"required": [], # At least one must be provided
}
@property
def requires_auth(self) -> bool:
return True
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs,
) -> ToolResponseBase:
session_id = session.session_id
if not user_id:
return ErrorResponse(
message="Authentication required",
session_id=session_id,
)
file_id: Optional[str] = kwargs.get("file_id")
path: Optional[str] = kwargs.get("path")
if not file_id and not path:
return ErrorResponse(
message="Please provide either file_id or path",
session_id=session_id,
)
try:
workspace = await get_or_create_workspace(user_id)
# Pass session_id for session-scoped file access
manager = WorkspaceManager(user_id, workspace.id, session_id)
# Determine the file_id to delete
target_file_id: str
if file_id:
target_file_id = file_id
else:
# path is guaranteed to be non-None here due to the check above
assert path is not None
file_info = await manager.get_file_info_by_path(path)
if file_info is None:
return ErrorResponse(
message=f"File not found at path: {path}",
session_id=session_id,
)
target_file_id = file_info.id
success = await manager.delete_file(target_file_id)
if not success:
return ErrorResponse(
message=f"File not found: {target_file_id}",
session_id=session_id,
)
return WorkspaceDeleteResponse(
file_id=target_file_id,
success=True,
message="File deleted successfully",
session_id=session_id,
)
except Exception as e:
logger.error(f"Error deleting workspace file: {e}", exc_info=True)
return ErrorResponse(
message=f"Failed to delete workspace file: {str(e)}",
error=str(e),
session_id=session_id,
)

View File

@@ -638,7 +638,7 @@ async def test_process_review_action_auto_approve_creates_auto_approval_records(
# Mock get_node_executions to return node_id mapping
mock_get_node_executions = mocker.patch(
"backend.data.execution.get_node_executions"
"backend.api.features.executions.review.routes.get_node_executions"
)
mock_node_exec = mocker.Mock(spec=NodeExecutionResult)
mock_node_exec.node_exec_id = "test_node_123"
@@ -936,7 +936,7 @@ async def test_process_review_action_auto_approve_only_applies_to_approved_revie
# Mock get_node_executions to return node_id mapping
mock_get_node_executions = mocker.patch(
"backend.data.execution.get_node_executions"
"backend.api.features.executions.review.routes.get_node_executions"
)
mock_node_exec = mocker.Mock(spec=NodeExecutionResult)
mock_node_exec.node_exec_id = "node_exec_approved"
@@ -1148,7 +1148,7 @@ async def test_process_review_action_per_review_auto_approve_granularity(
# Mock get_node_executions to return batch node data
mock_get_node_executions = mocker.patch(
"backend.data.execution.get_node_executions"
"backend.api.features.executions.review.routes.get_node_executions"
)
# Create mock node executions for each review
mock_node_execs = []

View File

@@ -6,10 +6,15 @@ import autogpt_libs.auth as autogpt_auth_lib
from fastapi import APIRouter, HTTPException, Query, Security, status
from prisma.enums import ReviewStatus
from backend.copilot.constants import (
is_copilot_synthetic_id,
parse_node_id_from_exec_id,
)
from backend.data.execution import (
ExecutionContext,
ExecutionStatus,
get_graph_execution_meta,
get_node_executions,
)
from backend.data.graph import get_graph_settings
from backend.data.human_review import (
@@ -22,6 +27,7 @@ from backend.data.human_review import (
)
from backend.data.model import USER_TIMEZONE_NOT_SET
from backend.data.user import get_user_by_id
from backend.data.workspace import get_or_create_workspace
from backend.executor.utils import add_graph_execution
from .model import PendingHumanReviewModel, ReviewRequest, ReviewResponse
@@ -35,6 +41,38 @@ router = APIRouter(
)
async def _resolve_node_ids(
node_exec_ids: list[str],
graph_exec_id: str,
is_copilot: bool,
) -> dict[str, str]:
"""Resolve node_exec_id -> node_id for auto-approval records.
CoPilot synthetic IDs encode node_id in the format "{node_id}:{random}".
Graph executions look up node_id from NodeExecution records.
"""
if not node_exec_ids:
return {}
if is_copilot:
return {neid: parse_node_id_from_exec_id(neid) for neid in node_exec_ids}
node_execs = await get_node_executions(
graph_exec_id=graph_exec_id, include_exec_data=False
)
node_exec_map = {ne.node_exec_id: ne.node_id for ne in node_execs}
result = {}
for neid in node_exec_ids:
if neid in node_exec_map:
result[neid] = node_exec_map[neid]
else:
logger.error(
f"Failed to resolve node_id for {neid}: Node execution not found."
)
return result
@router.get(
"/pending",
summary="Get Pending Reviews",
@@ -109,14 +147,16 @@ async def list_pending_reviews_for_execution(
"""
# Verify user owns the graph execution before returning reviews
graph_exec = await get_graph_execution_meta(
user_id=user_id, execution_id=graph_exec_id
)
if not graph_exec:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail=f"Graph execution #{graph_exec_id} not found",
# (CoPilot synthetic IDs don't have graph execution records)
if not is_copilot_synthetic_id(graph_exec_id):
graph_exec = await get_graph_execution_meta(
user_id=user_id, execution_id=graph_exec_id
)
if not graph_exec:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail=f"Graph execution #{graph_exec_id} not found",
)
return await get_pending_reviews_for_execution(graph_exec_id, user_id)
@@ -159,30 +199,26 @@ async def process_review_action(
)
graph_exec_id = next(iter(graph_exec_ids))
is_copilot = is_copilot_synthetic_id(graph_exec_id)
# Validate execution status before processing reviews
graph_exec_meta = await get_graph_execution_meta(
user_id=user_id, execution_id=graph_exec_id
)
if not graph_exec_meta:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail=f"Graph execution #{graph_exec_id} not found",
)
# Only allow processing reviews if execution is paused for review
# or incomplete (partial execution with some reviews already processed)
if graph_exec_meta.status not in (
ExecutionStatus.REVIEW,
ExecutionStatus.INCOMPLETE,
):
raise HTTPException(
status_code=status.HTTP_409_CONFLICT,
detail=f"Cannot process reviews while execution status is {graph_exec_meta.status}. "
f"Reviews can only be processed when execution is paused (REVIEW status). "
f"Current status: {graph_exec_meta.status}",
# Validate execution status for graph executions (skip for CoPilot synthetic IDs)
if not is_copilot:
graph_exec_meta = await get_graph_execution_meta(
user_id=user_id, execution_id=graph_exec_id
)
if not graph_exec_meta:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail=f"Graph execution #{graph_exec_id} not found",
)
if graph_exec_meta.status not in (
ExecutionStatus.REVIEW,
ExecutionStatus.INCOMPLETE,
):
raise HTTPException(
status_code=status.HTTP_409_CONFLICT,
detail=f"Cannot process reviews while execution status is {graph_exec_meta.status}",
)
# Build review decisions map and track which reviews requested auto-approval
# Auto-approved reviews use original data (no modifications allowed)
@@ -235,7 +271,7 @@ async def process_review_action(
)
return (node_id, False)
# Collect node_exec_ids that need auto-approval
# Collect node_exec_ids that need auto-approval and resolve their node_ids
node_exec_ids_needing_auto_approval = [
node_exec_id
for node_exec_id, review_result in updated_reviews.items()
@@ -243,29 +279,16 @@ async def process_review_action(
and auto_approve_requests.get(node_exec_id, False)
]
# Batch-fetch node executions to get node_ids
node_id_map = await _resolve_node_ids(
node_exec_ids_needing_auto_approval, graph_exec_id, is_copilot
)
# Deduplicate by node_id — one auto-approval per node
nodes_needing_auto_approval: dict[str, Any] = {}
if node_exec_ids_needing_auto_approval:
from backend.data.execution import get_node_executions
node_execs = await get_node_executions(
graph_exec_id=graph_exec_id, include_exec_data=False
)
node_exec_map = {node_exec.node_exec_id: node_exec for node_exec in node_execs}
for node_exec_id in node_exec_ids_needing_auto_approval:
node_exec = node_exec_map.get(node_exec_id)
if node_exec:
review_result = updated_reviews[node_exec_id]
# Use the first approved review for this node (deduplicate by node_id)
if node_exec.node_id not in nodes_needing_auto_approval:
nodes_needing_auto_approval[node_exec.node_id] = review_result
else:
logger.error(
f"Failed to create auto-approval record for {node_exec_id}: "
f"Node execution not found. This may indicate a race condition "
f"or data inconsistency."
)
for node_exec_id in node_exec_ids_needing_auto_approval:
node_id = node_id_map.get(node_exec_id)
if node_id and node_id not in nodes_needing_auto_approval:
nodes_needing_auto_approval[node_id] = updated_reviews[node_exec_id]
# Execute all auto-approval creations in parallel (deduplicated by node_id)
auto_approval_results = await asyncio.gather(
@@ -280,13 +303,11 @@ async def process_review_action(
auto_approval_failed_count = 0
for result in auto_approval_results:
if isinstance(result, Exception):
# Unexpected exception during auto-approval creation
auto_approval_failed_count += 1
logger.error(
f"Unexpected exception during auto-approval creation: {result}"
)
elif isinstance(result, tuple) and len(result) == 2 and not result[1]:
# Auto-approval creation failed (returned False)
auto_approval_failed_count += 1
# Count results
@@ -301,30 +322,31 @@ async def process_review_action(
if review.status == ReviewStatus.REJECTED
)
# Resume execution only if ALL pending reviews for this execution have been processed
if updated_reviews:
# Resume graph execution only for real graph executions (not CoPilot)
# CoPilot sessions are resumed by the LLM retrying run_block with review_id
if not is_copilot and updated_reviews:
still_has_pending = await has_pending_reviews_for_graph_exec(graph_exec_id)
if not still_has_pending:
# Get the graph_id from any processed review
first_review = next(iter(updated_reviews.values()))
try:
# Fetch user and settings to build complete execution context
user = await get_user_by_id(user_id)
settings = await get_graph_settings(
user_id=user_id, graph_id=first_review.graph_id
)
# Preserve user's timezone preference when resuming execution
user_timezone = (
user.timezone if user.timezone != USER_TIMEZONE_NOT_SET else "UTC"
)
workspace = await get_or_create_workspace(user_id)
execution_context = ExecutionContext(
human_in_the_loop_safe_mode=settings.human_in_the_loop_safe_mode,
sensitive_action_safe_mode=settings.sensitive_action_safe_mode,
user_timezone=user_timezone,
workspace_id=workspace.id,
)
await add_graph_execution(

View File

@@ -1,7 +1,7 @@
import asyncio
import logging
from datetime import datetime, timedelta, timezone
from typing import TYPE_CHECKING, Annotated, List, Literal
from typing import TYPE_CHECKING, Annotated, Any, List, Literal
from autogpt_libs.auth import get_user_id
from fastapi import (
@@ -14,7 +14,7 @@ from fastapi import (
Security,
status,
)
from pydantic import BaseModel, Field, SecretStr
from pydantic import BaseModel, Field, SecretStr, model_validator
from starlette.status import HTTP_500_INTERNAL_SERVER_ERROR, HTTP_502_BAD_GATEWAY
from backend.api.features.library.db import set_preset_webhook, update_preset
@@ -39,7 +39,11 @@ from backend.data.onboarding import OnboardingStep, complete_onboarding_step
from backend.data.user import get_user_integrations
from backend.executor.utils import add_graph_execution
from backend.integrations.ayrshare import AyrshareClient, SocialPlatform
from backend.integrations.creds_manager import IntegrationCredentialsManager
from backend.integrations.credentials_store import provider_matches
from backend.integrations.creds_manager import (
IntegrationCredentialsManager,
create_mcp_oauth_handler,
)
from backend.integrations.oauth import CREDENTIALS_BY_PROVIDER, HANDLERS_BY_NAME
from backend.integrations.providers import ProviderName
from backend.integrations.webhooks import get_webhook_manager
@@ -102,9 +106,37 @@ class CredentialsMetaResponse(BaseModel):
scopes: list[str] | None
username: str | None
host: str | None = Field(
default=None, description="Host pattern for host-scoped credentials"
default=None,
description="Host pattern for host-scoped or MCP server URL for MCP credentials",
)
@model_validator(mode="before")
@classmethod
def _normalize_provider(cls, data: Any) -> Any:
"""Fix ``ProviderName.X`` format from Python 3.13 ``str(Enum)`` bug."""
if isinstance(data, dict):
prov = data.get("provider", "")
if isinstance(prov, str) and prov.startswith("ProviderName."):
member = prov.removeprefix("ProviderName.")
try:
data = {**data, "provider": ProviderName[member].value}
except KeyError:
pass
return data
@staticmethod
def get_host(cred: Credentials) -> str | None:
"""Extract host from credential: HostScoped host or MCP server URL."""
if isinstance(cred, HostScopedCredentials):
return cred.host
if isinstance(cred, OAuth2Credentials) and cred.provider in (
ProviderName.MCP,
ProviderName.MCP.value,
"ProviderName.MCP",
):
return (cred.metadata or {}).get("mcp_server_url")
return None
@router.post("/{provider}/callback", summary="Exchange OAuth code for tokens")
async def callback(
@@ -179,9 +211,7 @@ async def callback(
title=credentials.title,
scopes=credentials.scopes,
username=credentials.username,
host=(
credentials.host if isinstance(credentials, HostScopedCredentials) else None
),
host=(CredentialsMetaResponse.get_host(credentials)),
)
@@ -199,7 +229,7 @@ async def list_credentials(
title=cred.title,
scopes=cred.scopes if isinstance(cred, OAuth2Credentials) else None,
username=cred.username if isinstance(cred, OAuth2Credentials) else None,
host=cred.host if isinstance(cred, HostScopedCredentials) else None,
host=CredentialsMetaResponse.get_host(cred),
)
for cred in credentials
]
@@ -222,7 +252,7 @@ async def list_credentials_by_provider(
title=cred.title,
scopes=cred.scopes if isinstance(cred, OAuth2Credentials) else None,
username=cred.username if isinstance(cred, OAuth2Credentials) else None,
host=cred.host if isinstance(cred, HostScopedCredentials) else None,
host=CredentialsMetaResponse.get_host(cred),
)
for cred in credentials
]
@@ -322,7 +352,11 @@ async def delete_credentials(
tokens_revoked = None
if isinstance(creds, OAuth2Credentials):
handler = _get_provider_oauth_handler(request, provider)
if provider_matches(provider.value, ProviderName.MCP.value):
# MCP uses dynamic per-server OAuth — create handler from metadata
handler = create_mcp_oauth_handler(creds)
else:
handler = _get_provider_oauth_handler(request, provider)
tokens_revoked = await handler.revoke_tokens(creds)
return CredentialsDeletionResponse(revoked=tokens_revoked)

File diff suppressed because it is too large Load Diff

View File

@@ -4,7 +4,6 @@ import prisma.enums
import prisma.models
import pytest
import backend.api.features.store.exceptions
from backend.data.db import connect
from backend.data.includes import library_agent_include
@@ -144,6 +143,7 @@ async def test_add_agent_to_library(mocker):
)
mock_library_agent = mocker.patch("prisma.models.LibraryAgent.prisma")
mock_library_agent.return_value.find_first = mocker.AsyncMock(return_value=None)
mock_library_agent.return_value.find_unique = mocker.AsyncMock(return_value=None)
mock_library_agent.return_value.create = mocker.AsyncMock(
return_value=mock_library_agent_data
@@ -178,7 +178,6 @@ async def test_add_agent_to_library(mocker):
"agentGraphVersion": 1,
}
},
include={"AgentGraph": True},
)
# Check that create was called with the expected data including settings
create_call_args = mock_library_agent.return_value.create.call_args
@@ -218,7 +217,7 @@ async def test_add_agent_to_library_not_found(mocker):
)
# Call function and verify exception
with pytest.raises(backend.api.features.store.exceptions.AgentNotFoundError):
with pytest.raises(db.NotFoundError):
await db.add_store_agent_to_library("version123", "test-user")
# Verify mock called correctly

View File

@@ -0,0 +1,10 @@
class FolderValidationError(Exception):
"""Raised when folder operations fail validation."""
pass
class FolderAlreadyExistsError(FolderValidationError):
"""Raised when a folder with the same name already exists in the location."""
pass

View File

@@ -6,9 +6,12 @@ import prisma.enums
import prisma.models
import pydantic
from backend.data.block import BlockInput
from backend.data.graph import GraphModel, GraphSettings, GraphTriggerInfo
from backend.data.model import CredentialsMetaInput, is_credentials_field_name
from backend.data.model import (
CredentialsMetaInput,
GraphInput,
is_credentials_field_name,
)
from backend.util.json import loads as json_loads
from backend.util.models import Pagination
@@ -23,6 +26,95 @@ class LibraryAgentStatus(str, Enum):
ERROR = "ERROR"
# === Folder Models ===
class LibraryFolder(pydantic.BaseModel):
"""Represents a folder for organizing library agents."""
id: str
user_id: str
name: str
icon: str | None = None
color: str | None = None
parent_id: str | None = None
created_at: datetime.datetime
updated_at: datetime.datetime
agent_count: int = 0 # Direct agents in folder
subfolder_count: int = 0 # Direct child folders
@staticmethod
def from_db(
folder: prisma.models.LibraryFolder,
agent_count: int = 0,
subfolder_count: int = 0,
) -> "LibraryFolder":
"""Factory method that constructs a LibraryFolder from a Prisma model."""
return LibraryFolder(
id=folder.id,
user_id=folder.userId,
name=folder.name,
icon=folder.icon,
color=folder.color,
parent_id=folder.parentId,
created_at=folder.createdAt,
updated_at=folder.updatedAt,
agent_count=agent_count,
subfolder_count=subfolder_count,
)
class LibraryFolderTree(LibraryFolder):
"""Folder with nested children for tree view."""
children: list["LibraryFolderTree"] = []
class FolderCreateRequest(pydantic.BaseModel):
"""Request model for creating a folder."""
name: str = pydantic.Field(..., min_length=1, max_length=100)
icon: str | None = None
color: str | None = pydantic.Field(
None, pattern=r"^#[0-9A-Fa-f]{6}$", description="Hex color code (#RRGGBB)"
)
parent_id: str | None = None
class FolderUpdateRequest(pydantic.BaseModel):
"""Request model for updating a folder."""
name: str | None = pydantic.Field(None, min_length=1, max_length=100)
icon: str | None = None
color: str | None = None
class FolderMoveRequest(pydantic.BaseModel):
"""Request model for moving a folder to a new parent."""
target_parent_id: str | None = None # None = move to root
class BulkMoveAgentsRequest(pydantic.BaseModel):
"""Request model for moving multiple agents to a folder."""
agent_ids: list[str]
folder_id: str | None = None # None = move to root
class FolderListResponse(pydantic.BaseModel):
"""Response schema for a list of folders."""
folders: list[LibraryFolder]
pagination: Pagination
class FolderTreeResponse(pydantic.BaseModel):
"""Response schema for folder tree structure."""
tree: list[LibraryFolderTree]
class MarketplaceListingCreator(pydantic.BaseModel):
"""Creator information for a marketplace listing."""
@@ -73,7 +165,6 @@ class LibraryAgent(pydantic.BaseModel):
id: str
graph_id: str
graph_version: int
owner_user_id: str
image_url: str | None
@@ -114,9 +205,14 @@ class LibraryAgent(pydantic.BaseModel):
default_factory=list,
description="List of recent executions with status, score, and summary",
)
can_access_graph: bool
can_access_graph: bool = pydantic.Field(
description="Indicates whether the same user owns the corresponding graph"
)
is_latest_version: bool
is_favorite: bool
folder_id: str | None = None
folder_name: str | None = None # Denormalized for display
recommended_schedule_cron: str | None = None
settings: GraphSettings = pydantic.Field(default_factory=GraphSettings)
marketplace_listing: Optional["MarketplaceListing"] = None
@@ -229,7 +325,6 @@ class LibraryAgent(pydantic.BaseModel):
id=agent.id,
graph_id=agent.agentGraphId,
graph_version=agent.agentGraphVersion,
owner_user_id=agent.userId,
image_url=agent.imageUrl,
creator_name=creator_name,
creator_image_url=creator_image_url,
@@ -256,6 +351,8 @@ class LibraryAgent(pydantic.BaseModel):
can_access_graph=can_access_graph,
is_latest_version=is_latest_version,
is_favorite=agent.isFavorite,
folder_id=agent.folderId,
folder_name=agent.Folder.name if agent.Folder else None,
recommended_schedule_cron=agent.AgentGraph.recommendedScheduleCron,
settings=_parse_settings(agent.settings),
marketplace_listing=marketplace_listing_data,
@@ -323,7 +420,7 @@ class LibraryAgentPresetCreatable(pydantic.BaseModel):
graph_id: str
graph_version: int
inputs: BlockInput
inputs: GraphInput
credentials: dict[str, CredentialsMetaInput]
name: str
@@ -352,7 +449,7 @@ class LibraryAgentPresetUpdatable(pydantic.BaseModel):
Request model used when updating a preset for a library agent.
"""
inputs: Optional[BlockInput] = None
inputs: Optional[GraphInput] = None
credentials: Optional[dict[str, CredentialsMetaInput]] = None
name: Optional[str] = None
@@ -395,7 +492,7 @@ class LibraryAgentPreset(LibraryAgentPresetCreatable):
"Webhook must be included in AgentPreset query when webhookId is set"
)
input_data: BlockInput = {}
input_data: GraphInput = {}
input_credentials: dict[str, CredentialsMetaInput] = {}
for preset_input in preset.InputPresets:
@@ -467,3 +564,7 @@ class LibraryAgentUpdateRequest(pydantic.BaseModel):
settings: Optional[GraphSettings] = pydantic.Field(
default=None, description="User-specific settings for this library agent"
)
folder_id: Optional[str] = pydantic.Field(
default=None,
description="Folder ID to move agent to (None to move to root)",
)

View File

@@ -1,9 +1,11 @@
import fastapi
from .agents import router as agents_router
from .folders import router as folders_router
from .presets import router as presets_router
router = fastapi.APIRouter()
router.include_router(presets_router)
router.include_router(folders_router)
router.include_router(agents_router)

View File

@@ -41,6 +41,14 @@ async def list_library_agents(
ge=1,
description="Number of agents per page (must be >= 1)",
),
folder_id: Optional[str] = Query(
None,
description="Filter by folder ID",
),
include_root_only: bool = Query(
False,
description="Only return agents without a folder (root-level agents)",
),
) -> library_model.LibraryAgentResponse:
"""
Get all agents in the user's library (both created and saved).
@@ -51,6 +59,8 @@ async def list_library_agents(
sort_by=sort_by,
page=page,
page_size=page_size,
folder_id=folder_id,
include_root_only=include_root_only,
)
@@ -168,6 +178,7 @@ async def update_library_agent(
is_favorite=payload.is_favorite,
is_archived=payload.is_archived,
settings=payload.settings,
folder_id=payload.folder_id,
)

View File

@@ -0,0 +1,287 @@
from typing import Optional
import autogpt_libs.auth as autogpt_auth_lib
from fastapi import APIRouter, Query, Security, status
from fastapi.responses import Response
from .. import db as library_db
from .. import model as library_model
router = APIRouter(
prefix="/folders",
tags=["library", "folders", "private"],
dependencies=[Security(autogpt_auth_lib.requires_user)],
)
@router.get(
"",
summary="List Library Folders",
response_model=library_model.FolderListResponse,
responses={
200: {"description": "List of folders"},
500: {"description": "Server error"},
},
)
async def list_folders(
user_id: str = Security(autogpt_auth_lib.get_user_id),
parent_id: Optional[str] = Query(
None,
description="Filter by parent folder ID. If not provided, returns root-level folders.",
),
include_relations: bool = Query(
True,
description="Include agent and subfolder relations (for counts)",
),
) -> library_model.FolderListResponse:
"""
List folders for the authenticated user.
Args:
user_id: ID of the authenticated user.
parent_id: Optional parent folder ID to filter by.
include_relations: Whether to include agent and subfolder relations for counts.
Returns:
A FolderListResponse containing folders.
"""
folders = await library_db.list_folders(
user_id=user_id,
parent_id=parent_id,
include_relations=include_relations,
)
return library_model.FolderListResponse(
folders=folders,
pagination=library_model.Pagination(
total_items=len(folders),
total_pages=1,
current_page=1,
page_size=len(folders),
),
)
@router.get(
"/tree",
summary="Get Folder Tree",
response_model=library_model.FolderTreeResponse,
responses={
200: {"description": "Folder tree structure"},
500: {"description": "Server error"},
},
)
async def get_folder_tree(
user_id: str = Security(autogpt_auth_lib.get_user_id),
) -> library_model.FolderTreeResponse:
"""
Get the full folder tree for the authenticated user.
Args:
user_id: ID of the authenticated user.
Returns:
A FolderTreeResponse containing the nested folder structure.
"""
tree = await library_db.get_folder_tree(user_id=user_id)
return library_model.FolderTreeResponse(tree=tree)
@router.get(
"/{folder_id}",
summary="Get Folder",
response_model=library_model.LibraryFolder,
responses={
200: {"description": "Folder details"},
404: {"description": "Folder not found"},
500: {"description": "Server error"},
},
)
async def get_folder(
folder_id: str,
user_id: str = Security(autogpt_auth_lib.get_user_id),
) -> library_model.LibraryFolder:
"""
Get a specific folder.
Args:
folder_id: ID of the folder to retrieve.
user_id: ID of the authenticated user.
Returns:
The requested LibraryFolder.
"""
return await library_db.get_folder(folder_id=folder_id, user_id=user_id)
@router.post(
"",
summary="Create Folder",
status_code=status.HTTP_201_CREATED,
response_model=library_model.LibraryFolder,
responses={
201: {"description": "Folder created successfully"},
400: {"description": "Validation error"},
404: {"description": "Parent folder not found"},
409: {"description": "Folder name conflict"},
500: {"description": "Server error"},
},
)
async def create_folder(
payload: library_model.FolderCreateRequest,
user_id: str = Security(autogpt_auth_lib.get_user_id),
) -> library_model.LibraryFolder:
"""
Create a new folder.
Args:
payload: The folder creation request.
user_id: ID of the authenticated user.
Returns:
The created LibraryFolder.
"""
return await library_db.create_folder(
user_id=user_id,
name=payload.name,
parent_id=payload.parent_id,
icon=payload.icon,
color=payload.color,
)
@router.patch(
"/{folder_id}",
summary="Update Folder",
response_model=library_model.LibraryFolder,
responses={
200: {"description": "Folder updated successfully"},
400: {"description": "Validation error"},
404: {"description": "Folder not found"},
409: {"description": "Folder name conflict"},
500: {"description": "Server error"},
},
)
async def update_folder(
folder_id: str,
payload: library_model.FolderUpdateRequest,
user_id: str = Security(autogpt_auth_lib.get_user_id),
) -> library_model.LibraryFolder:
"""
Update a folder's properties.
Args:
folder_id: ID of the folder to update.
payload: The folder update request.
user_id: ID of the authenticated user.
Returns:
The updated LibraryFolder.
"""
return await library_db.update_folder(
folder_id=folder_id,
user_id=user_id,
name=payload.name,
icon=payload.icon,
color=payload.color,
)
@router.post(
"/{folder_id}/move",
summary="Move Folder",
response_model=library_model.LibraryFolder,
responses={
200: {"description": "Folder moved successfully"},
400: {"description": "Validation error (circular reference)"},
404: {"description": "Folder or target parent not found"},
409: {"description": "Folder name conflict in target location"},
500: {"description": "Server error"},
},
)
async def move_folder(
folder_id: str,
payload: library_model.FolderMoveRequest,
user_id: str = Security(autogpt_auth_lib.get_user_id),
) -> library_model.LibraryFolder:
"""
Move a folder to a new parent.
Args:
folder_id: ID of the folder to move.
payload: The move request with target parent.
user_id: ID of the authenticated user.
Returns:
The moved LibraryFolder.
"""
return await library_db.move_folder(
folder_id=folder_id,
user_id=user_id,
target_parent_id=payload.target_parent_id,
)
@router.delete(
"/{folder_id}",
summary="Delete Folder",
status_code=status.HTTP_204_NO_CONTENT,
responses={
204: {"description": "Folder deleted successfully"},
404: {"description": "Folder not found"},
500: {"description": "Server error"},
},
)
async def delete_folder(
folder_id: str,
user_id: str = Security(autogpt_auth_lib.get_user_id),
) -> Response:
"""
Soft-delete a folder and all its contents.
Args:
folder_id: ID of the folder to delete.
user_id: ID of the authenticated user.
Returns:
204 No Content if successful.
"""
await library_db.delete_folder(
folder_id=folder_id,
user_id=user_id,
soft_delete=True,
)
return Response(status_code=status.HTTP_204_NO_CONTENT)
# === Bulk Agent Operations ===
@router.post(
"/agents/bulk-move",
summary="Bulk Move Agents",
response_model=list[library_model.LibraryAgent],
responses={
200: {"description": "Agents moved successfully"},
404: {"description": "Folder not found"},
500: {"description": "Server error"},
},
)
async def bulk_move_agents(
payload: library_model.BulkMoveAgentsRequest,
user_id: str = Security(autogpt_auth_lib.get_user_id),
) -> list[library_model.LibraryAgent]:
"""
Move multiple agents to a folder.
Args:
payload: The bulk move request with agent IDs and target folder.
user_id: ID of the authenticated user.
Returns:
The updated LibraryAgents.
"""
return await library_db.bulk_move_agents_to_folder(
agent_ids=payload.agent_ids,
folder_id=payload.folder_id,
user_id=user_id,
)

View File

@@ -42,7 +42,6 @@ async def test_get_library_agents_success(
id="test-agent-1",
graph_id="test-agent-1",
graph_version=1,
owner_user_id=test_user_id,
name="Test Agent 1",
description="Test Description 1",
image_url=None,
@@ -67,7 +66,6 @@ async def test_get_library_agents_success(
id="test-agent-2",
graph_id="test-agent-2",
graph_version=1,
owner_user_id=test_user_id,
name="Test Agent 2",
description="Test Description 2",
image_url=None,
@@ -115,6 +113,8 @@ async def test_get_library_agents_success(
sort_by=library_model.LibraryAgentSort.UPDATED_AT,
page=1,
page_size=15,
folder_id=None,
include_root_only=False,
)
@@ -129,7 +129,6 @@ async def test_get_favorite_library_agents_success(
id="test-agent-1",
graph_id="test-agent-1",
graph_version=1,
owner_user_id=test_user_id,
name="Favorite Agent 1",
description="Test Favorite Description 1",
image_url=None,
@@ -182,7 +181,6 @@ def test_add_agent_to_library_success(
id="test-library-agent-id",
graph_id="test-agent-1",
graph_version=1,
owner_user_id=test_user_id,
name="Test Agent 1",
description="Test Description 1",
image_url=None,

View File

@@ -0,0 +1,511 @@
"""
MCP (Model Context Protocol) API routes.
Provides endpoints for MCP tool discovery and OAuth authentication so the
frontend can list available tools on an MCP server before placing a block.
"""
import logging
from typing import Annotated, Any
import fastapi
from autogpt_libs.auth import get_user_id
from fastapi import Security
from pydantic import BaseModel, Field, SecretStr
from backend.api.features.integrations.router import CredentialsMetaResponse
from backend.blocks.mcp.client import MCPClient, MCPClientError
from backend.blocks.mcp.helpers import (
auto_lookup_mcp_credential,
normalize_mcp_url,
server_host,
)
from backend.blocks.mcp.oauth import MCPOAuthHandler
from backend.data.model import OAuth2Credentials
from backend.integrations.creds_manager import IntegrationCredentialsManager
from backend.integrations.providers import ProviderName
from backend.util.request import HTTPClientError, Requests, validate_url_host
from backend.util.settings import Settings
logger = logging.getLogger(__name__)
settings = Settings()
router = fastapi.APIRouter(tags=["mcp"])
creds_manager = IntegrationCredentialsManager()
# ====================== Tool Discovery ====================== #
class DiscoverToolsRequest(BaseModel):
"""Request to discover tools on an MCP server."""
server_url: str = Field(description="URL of the MCP server")
auth_token: str | None = Field(
default=None,
description="Optional Bearer token for authenticated MCP servers",
)
class MCPToolResponse(BaseModel):
"""A single MCP tool returned by discovery."""
name: str
description: str
input_schema: dict[str, Any]
class DiscoverToolsResponse(BaseModel):
"""Response containing the list of tools available on an MCP server."""
tools: list[MCPToolResponse]
server_name: str | None = None
protocol_version: str | None = None
@router.post(
"/discover-tools",
summary="Discover available tools on an MCP server",
response_model=DiscoverToolsResponse,
)
async def discover_tools(
request: DiscoverToolsRequest,
user_id: Annotated[str, Security(get_user_id)],
) -> DiscoverToolsResponse:
"""
Connect to an MCP server and return its available tools.
If the user has a stored MCP credential for this server URL, it will be
used automatically — no need to pass an explicit auth token.
"""
# Validate URL to prevent SSRF — blocks loopback and private IP ranges.
try:
await validate_url_host(request.server_url)
except ValueError as e:
raise fastapi.HTTPException(status_code=400, detail=f"Invalid server URL: {e}")
auth_token = request.auth_token
# Auto-use stored MCP credential when no explicit token is provided.
if not auth_token:
best_cred = await auto_lookup_mcp_credential(
user_id, normalize_mcp_url(request.server_url)
)
if best_cred:
auth_token = best_cred.access_token.get_secret_value()
client = MCPClient(request.server_url, auth_token=auth_token)
try:
init_result = await client.initialize()
tools = await client.list_tools()
except HTTPClientError as e:
if e.status_code in (401, 403):
raise fastapi.HTTPException(
status_code=401,
detail="This MCP server requires authentication. "
"Please provide a valid auth token.",
)
raise fastapi.HTTPException(status_code=502, detail=str(e))
except MCPClientError as e:
raise fastapi.HTTPException(status_code=502, detail=str(e))
except Exception as e:
raise fastapi.HTTPException(
status_code=502,
detail=f"Failed to connect to MCP server: {e}",
)
return DiscoverToolsResponse(
tools=[
MCPToolResponse(
name=t.name,
description=t.description,
input_schema=t.input_schema,
)
for t in tools
],
server_name=(
init_result.get("serverInfo", {}).get("name")
or server_host(request.server_url)
or "MCP"
),
protocol_version=init_result.get("protocolVersion"),
)
# ======================== OAuth Flow ======================== #
class MCPOAuthLoginRequest(BaseModel):
"""Request to start an OAuth flow for an MCP server."""
server_url: str = Field(description="URL of the MCP server that requires OAuth")
class MCPOAuthLoginResponse(BaseModel):
"""Response with the OAuth login URL for the user to authenticate."""
login_url: str
state_token: str
@router.post(
"/oauth/login",
summary="Initiate OAuth login for an MCP server",
)
async def mcp_oauth_login(
request: MCPOAuthLoginRequest,
user_id: Annotated[str, Security(get_user_id)],
) -> MCPOAuthLoginResponse:
"""
Discover OAuth metadata from the MCP server and return a login URL.
1. Discovers the protected-resource metadata (RFC 9728)
2. Fetches the authorization server metadata (RFC 8414)
3. Performs Dynamic Client Registration (RFC 7591) if available
4. Returns the authorization URL for the frontend to open in a popup
"""
# Validate URL to prevent SSRF — blocks loopback and private IP ranges.
try:
await validate_url_host(request.server_url)
except ValueError as e:
raise fastapi.HTTPException(status_code=400, detail=f"Invalid server URL: {e}")
# Normalize the URL so that credentials stored here are matched consistently
# by auto_lookup_mcp_credential (which also uses normalized URLs).
server_url = normalize_mcp_url(request.server_url)
client = MCPClient(server_url)
# Step 1: Discover protected-resource metadata (RFC 9728)
protected_resource = await client.discover_auth()
metadata: dict[str, Any] | None = None
if protected_resource and protected_resource.get("authorization_servers"):
auth_server_url = protected_resource["authorization_servers"][0]
resource_url = protected_resource.get("resource", server_url)
# Validate the auth server URL from metadata to prevent SSRF.
try:
await validate_url_host(auth_server_url)
except ValueError as e:
raise fastapi.HTTPException(
status_code=400,
detail=f"Invalid authorization server URL in metadata: {e}",
)
# Step 2a: Discover auth-server metadata (RFC 8414)
metadata = await client.discover_auth_server_metadata(auth_server_url)
else:
# Fallback: Some MCP servers (e.g. Linear) are their own auth server
# and serve OAuth metadata directly without protected-resource metadata.
# Don't assume a resource_url — omitting it lets the auth server choose
# the correct audience for the token (RFC 8707 resource is optional).
resource_url = None
metadata = await client.discover_auth_server_metadata(server_url)
if (
not metadata
or "authorization_endpoint" not in metadata
or "token_endpoint" not in metadata
):
raise fastapi.HTTPException(
status_code=400,
detail="This MCP server does not advertise OAuth support. "
"You may need to provide an auth token manually.",
)
authorize_url = metadata["authorization_endpoint"]
token_url = metadata["token_endpoint"]
registration_endpoint = metadata.get("registration_endpoint")
revoke_url = metadata.get("revocation_endpoint")
# Step 3: Dynamic Client Registration (RFC 7591) if available
frontend_base_url = settings.config.frontend_base_url
if not frontend_base_url:
raise fastapi.HTTPException(
status_code=500,
detail="Frontend base URL is not configured.",
)
redirect_uri = f"{frontend_base_url}/auth/integrations/mcp_callback"
client_id = ""
client_secret = ""
if registration_endpoint:
# Validate the registration endpoint to prevent SSRF via metadata.
try:
await validate_url_host(registration_endpoint)
except ValueError:
pass # Skip registration, fall back to default client_id
else:
reg_result = await _register_mcp_client(
registration_endpoint, redirect_uri, server_url
)
if reg_result:
client_id = reg_result.get("client_id", "")
client_secret = reg_result.get("client_secret", "")
if not client_id:
client_id = "autogpt-platform"
# Step 4: Store state token with OAuth metadata for the callback
scopes = (protected_resource or {}).get("scopes_supported") or metadata.get(
"scopes_supported", []
)
state_token, code_challenge = await creds_manager.store.store_state_token(
user_id,
ProviderName.MCP.value,
scopes,
state_metadata={
"authorize_url": authorize_url,
"token_url": token_url,
"revoke_url": revoke_url,
"resource_url": resource_url,
"server_url": server_url,
"client_id": client_id,
"client_secret": client_secret,
},
)
# Step 5: Build and return the login URL
handler = MCPOAuthHandler(
client_id=client_id,
client_secret=client_secret,
redirect_uri=redirect_uri,
authorize_url=authorize_url,
token_url=token_url,
resource_url=resource_url,
)
login_url = handler.get_login_url(
scopes, state_token, code_challenge=code_challenge
)
return MCPOAuthLoginResponse(login_url=login_url, state_token=state_token)
class MCPOAuthCallbackRequest(BaseModel):
"""Request to exchange an OAuth code for tokens."""
code: str = Field(description="Authorization code from OAuth callback")
state_token: str = Field(description="State token for CSRF verification")
class MCPOAuthCallbackResponse(BaseModel):
"""Response after successfully storing OAuth credentials."""
credential_id: str
@router.post(
"/oauth/callback",
summary="Exchange OAuth code for MCP tokens",
)
async def mcp_oauth_callback(
request: MCPOAuthCallbackRequest,
user_id: Annotated[str, Security(get_user_id)],
) -> CredentialsMetaResponse:
"""
Exchange the authorization code for tokens and store the credential.
The frontend calls this after receiving the OAuth code from the popup.
On success, subsequent ``/discover-tools`` calls for the same server URL
will automatically use the stored credential.
"""
valid_state = await creds_manager.store.verify_state_token(
user_id, request.state_token, ProviderName.MCP.value
)
if not valid_state:
raise fastapi.HTTPException(
status_code=400,
detail="Invalid or expired state token.",
)
meta = valid_state.state_metadata
frontend_base_url = settings.config.frontend_base_url
if not frontend_base_url:
raise fastapi.HTTPException(
status_code=500,
detail="Frontend base URL is not configured.",
)
redirect_uri = f"{frontend_base_url}/auth/integrations/mcp_callback"
handler = MCPOAuthHandler(
client_id=meta["client_id"],
client_secret=meta.get("client_secret", ""),
redirect_uri=redirect_uri,
authorize_url=meta["authorize_url"],
token_url=meta["token_url"],
revoke_url=meta.get("revoke_url"),
resource_url=meta.get("resource_url"),
)
try:
credentials = await handler.exchange_code_for_tokens(
request.code, valid_state.scopes, valid_state.code_verifier
)
except Exception as e:
raise fastapi.HTTPException(
status_code=400,
detail=f"OAuth token exchange failed: {e}",
)
# Enrich credential metadata for future lookup and token refresh
if credentials.metadata is None:
credentials.metadata = {}
credentials.metadata["mcp_server_url"] = meta["server_url"]
credentials.metadata["mcp_client_id"] = meta["client_id"]
credentials.metadata["mcp_client_secret"] = meta.get("client_secret", "")
credentials.metadata["mcp_token_url"] = meta["token_url"]
credentials.metadata["mcp_resource_url"] = meta.get("resource_url", "")
hostname = server_host(meta["server_url"])
credentials.title = f"MCP: {hostname}"
# Remove old MCP credentials for the same server to prevent stale token buildup.
try:
old_creds = await creds_manager.store.get_creds_by_provider(
user_id, ProviderName.MCP.value
)
for old in old_creds:
if (
isinstance(old, OAuth2Credentials)
and (old.metadata or {}).get("mcp_server_url") == meta["server_url"]
):
await creds_manager.store.delete_creds_by_id(user_id, old.id)
logger.info(
"Removed old MCP credential %s for %s",
old.id,
server_host(meta["server_url"]),
)
except Exception:
logger.debug("Could not clean up old MCP credentials", exc_info=True)
await creds_manager.create(user_id, credentials)
return CredentialsMetaResponse(
id=credentials.id,
provider=credentials.provider,
type=credentials.type,
title=credentials.title,
scopes=credentials.scopes,
username=credentials.username,
host=credentials.metadata.get("mcp_server_url"),
)
# ======================== Bearer Token ======================== #
class MCPStoreTokenRequest(BaseModel):
"""Request to store a bearer token for an MCP server that doesn't support OAuth."""
server_url: str = Field(
description="MCP server URL the token authenticates against"
)
token: SecretStr = Field(
min_length=1, description="Bearer token / API key for the MCP server"
)
@router.post(
"/token",
summary="Store a bearer token for an MCP server",
)
async def mcp_store_token(
request: MCPStoreTokenRequest,
user_id: Annotated[str, Security(get_user_id)],
) -> CredentialsMetaResponse:
"""
Store a manually provided bearer token as an MCP credential.
Used by the Copilot MCPSetupCard when the server doesn't support the MCP
OAuth discovery flow (returns 400 from /oauth/login). Subsequent
``run_mcp_tool`` calls will automatically pick up the token via
``_auto_lookup_credential``.
"""
token = request.token.get_secret_value().strip()
if not token:
raise fastapi.HTTPException(status_code=422, detail="Token must not be blank.")
# Validate URL to prevent SSRF — blocks loopback and private IP ranges.
try:
await validate_url_host(request.server_url)
except ValueError as e:
raise fastapi.HTTPException(status_code=400, detail=f"Invalid server URL: {e}")
# Normalize URL so trailing-slash variants match existing credentials.
server_url = normalize_mcp_url(request.server_url)
hostname = server_host(server_url)
# Collect IDs of old credentials to clean up after successful create.
old_cred_ids: list[str] = []
try:
old_creds = await creds_manager.store.get_creds_by_provider(
user_id, ProviderName.MCP.value
)
old_cred_ids = [
old.id
for old in old_creds
if isinstance(old, OAuth2Credentials)
and normalize_mcp_url((old.metadata or {}).get("mcp_server_url", ""))
== server_url
]
except Exception:
logger.debug("Could not query old MCP token credentials", exc_info=True)
credentials = OAuth2Credentials(
provider=ProviderName.MCP.value,
title=f"MCP: {hostname}",
access_token=SecretStr(token),
scopes=[],
metadata={"mcp_server_url": server_url},
)
await creds_manager.create(user_id, credentials)
# Only delete old credentials after the new one is safely stored.
for old_id in old_cred_ids:
try:
await creds_manager.store.delete_creds_by_id(user_id, old_id)
except Exception:
logger.debug("Could not clean up old MCP token credential", exc_info=True)
return CredentialsMetaResponse(
id=credentials.id,
provider=credentials.provider,
type=credentials.type,
title=credentials.title,
scopes=credentials.scopes,
username=credentials.username,
host=hostname,
)
# ======================== Helpers ======================== #
async def _register_mcp_client(
registration_endpoint: str,
redirect_uri: str,
server_url: str,
) -> dict[str, Any] | None:
"""Attempt Dynamic Client Registration (RFC 7591) with an MCP auth server."""
try:
response = await Requests(raise_for_status=True).post(
registration_endpoint,
json={
"client_name": "AutoGPT Platform",
"redirect_uris": [redirect_uri],
"grant_types": ["authorization_code"],
"response_types": ["code"],
"token_endpoint_auth_method": "client_secret_post",
},
)
data = response.json()
if isinstance(data, dict) and "client_id" in data:
return data
return None
except Exception as e:
logger.warning(
"Dynamic client registration failed for %s: %s", server_host(server_url), e
)
return None

View File

@@ -0,0 +1,572 @@
"""Tests for MCP API routes.
Uses httpx.AsyncClient with ASGITransport instead of fastapi.testclient.TestClient
to avoid creating blocking portals that can corrupt pytest-asyncio's session event loop.
"""
from unittest.mock import AsyncMock, patch
import fastapi
import httpx
import pytest
import pytest_asyncio
from autogpt_libs.auth import get_user_id
from pydantic import SecretStr
from backend.api.features.mcp.routes import router
from backend.blocks.mcp.client import MCPClientError, MCPTool
from backend.data.model import OAuth2Credentials
from backend.util.request import HTTPClientError
app = fastapi.FastAPI()
app.include_router(router)
app.dependency_overrides[get_user_id] = lambda: "test-user-id"
@pytest_asyncio.fixture(scope="module")
async def client():
transport = httpx.ASGITransport(app=app)
async with httpx.AsyncClient(transport=transport, base_url="http://test") as c:
yield c
@pytest.fixture(autouse=True)
def _bypass_ssrf_validation():
"""Bypass validate_url_host in all route tests (test URLs don't resolve)."""
with patch(
"backend.api.features.mcp.routes.validate_url_host",
new_callable=AsyncMock,
):
yield
class TestDiscoverTools:
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_tools_success(self, client):
mock_tools = [
MCPTool(
name="get_weather",
description="Get weather for a city",
input_schema={
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"],
},
),
MCPTool(
name="add_numbers",
description="Add two numbers",
input_schema={
"type": "object",
"properties": {
"a": {"type": "number"},
"b": {"type": "number"},
},
},
),
]
with (
patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
patch(
"backend.api.features.mcp.routes.auto_lookup_mcp_credential",
new_callable=AsyncMock,
return_value=None,
),
):
instance = MockClient.return_value
instance.initialize = AsyncMock(
return_value={
"protocolVersion": "2025-03-26",
"serverInfo": {"name": "test-server"},
}
)
instance.list_tools = AsyncMock(return_value=mock_tools)
response = await client.post(
"/discover-tools",
json={"server_url": "https://mcp.example.com/mcp"},
)
assert response.status_code == 200
data = response.json()
assert len(data["tools"]) == 2
assert data["tools"][0]["name"] == "get_weather"
assert data["tools"][1]["name"] == "add_numbers"
assert data["server_name"] == "test-server"
assert data["protocol_version"] == "2025-03-26"
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_tools_with_auth_token(self, client):
with patch("backend.api.features.mcp.routes.MCPClient") as MockClient:
instance = MockClient.return_value
instance.initialize = AsyncMock(
return_value={"serverInfo": {}, "protocolVersion": "2025-03-26"}
)
instance.list_tools = AsyncMock(return_value=[])
response = await client.post(
"/discover-tools",
json={
"server_url": "https://mcp.example.com/mcp",
"auth_token": "my-secret-token",
},
)
assert response.status_code == 200
MockClient.assert_called_once_with(
"https://mcp.example.com/mcp",
auth_token="my-secret-token",
)
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_tools_auto_uses_stored_credential(self, client):
"""When no explicit token is given, stored MCP credentials are used."""
stored_cred = OAuth2Credentials(
provider="mcp",
title="MCP: example.com",
access_token=SecretStr("stored-token-123"),
refresh_token=None,
access_token_expires_at=None,
refresh_token_expires_at=None,
scopes=[],
metadata={"mcp_server_url": "https://mcp.example.com/mcp"},
)
with (
patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
patch(
"backend.api.features.mcp.routes.auto_lookup_mcp_credential",
new_callable=AsyncMock,
return_value=stored_cred,
),
):
instance = MockClient.return_value
instance.initialize = AsyncMock(
return_value={"serverInfo": {}, "protocolVersion": "2025-03-26"}
)
instance.list_tools = AsyncMock(return_value=[])
response = await client.post(
"/discover-tools",
json={"server_url": "https://mcp.example.com/mcp"},
)
assert response.status_code == 200
MockClient.assert_called_once_with(
"https://mcp.example.com/mcp",
auth_token="stored-token-123",
)
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_tools_mcp_error(self, client):
with (
patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
patch(
"backend.api.features.mcp.routes.auto_lookup_mcp_credential",
new_callable=AsyncMock,
return_value=None,
),
):
instance = MockClient.return_value
instance.initialize = AsyncMock(
side_effect=MCPClientError("Connection refused")
)
response = await client.post(
"/discover-tools",
json={"server_url": "https://bad-server.example.com/mcp"},
)
assert response.status_code == 502
assert "Connection refused" in response.json()["detail"]
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_tools_generic_error(self, client):
with (
patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
patch(
"backend.api.features.mcp.routes.auto_lookup_mcp_credential",
new_callable=AsyncMock,
return_value=None,
),
):
instance = MockClient.return_value
instance.initialize = AsyncMock(side_effect=Exception("Network timeout"))
response = await client.post(
"/discover-tools",
json={"server_url": "https://timeout.example.com/mcp"},
)
assert response.status_code == 502
assert "Failed to connect" in response.json()["detail"]
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_tools_auth_required(self, client):
with (
patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
patch(
"backend.api.features.mcp.routes.auto_lookup_mcp_credential",
new_callable=AsyncMock,
return_value=None,
),
):
instance = MockClient.return_value
instance.initialize = AsyncMock(
side_effect=HTTPClientError("HTTP 401 Error: Unauthorized", 401)
)
response = await client.post(
"/discover-tools",
json={"server_url": "https://auth-server.example.com/mcp"},
)
assert response.status_code == 401
assert "requires authentication" in response.json()["detail"]
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_tools_forbidden(self, client):
with (
patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
patch(
"backend.api.features.mcp.routes.auto_lookup_mcp_credential",
new_callable=AsyncMock,
return_value=None,
),
):
instance = MockClient.return_value
instance.initialize = AsyncMock(
side_effect=HTTPClientError("HTTP 403 Error: Forbidden", 403)
)
response = await client.post(
"/discover-tools",
json={"server_url": "https://auth-server.example.com/mcp"},
)
assert response.status_code == 401
assert "requires authentication" in response.json()["detail"]
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_tools_missing_url(self, client):
response = await client.post("/discover-tools", json={})
assert response.status_code == 422
class TestOAuthLogin:
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth_login_success(self, client):
with (
patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
patch("backend.api.features.mcp.routes.settings") as mock_settings,
patch(
"backend.api.features.mcp.routes._register_mcp_client"
) as mock_register,
):
instance = MockClient.return_value
instance.discover_auth = AsyncMock(
return_value={
"authorization_servers": ["https://auth.sentry.io"],
"resource": "https://mcp.sentry.dev/mcp",
"scopes_supported": ["openid"],
}
)
instance.discover_auth_server_metadata = AsyncMock(
return_value={
"authorization_endpoint": "https://auth.sentry.io/authorize",
"token_endpoint": "https://auth.sentry.io/token",
"registration_endpoint": "https://auth.sentry.io/register",
}
)
mock_register.return_value = {
"client_id": "registered-client-id",
"client_secret": "registered-secret",
}
mock_cm.store.store_state_token = AsyncMock(
return_value=("state-token-123", "code-challenge-abc")
)
mock_settings.config.frontend_base_url = "http://localhost:3000"
response = await client.post(
"/oauth/login",
json={"server_url": "https://mcp.sentry.dev/mcp"},
)
assert response.status_code == 200
data = response.json()
assert "login_url" in data
assert data["state_token"] == "state-token-123"
assert "auth.sentry.io/authorize" in data["login_url"]
assert "registered-client-id" in data["login_url"]
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth_login_no_oauth_support(self, client):
with patch("backend.api.features.mcp.routes.MCPClient") as MockClient:
instance = MockClient.return_value
instance.discover_auth = AsyncMock(return_value=None)
instance.discover_auth_server_metadata = AsyncMock(return_value=None)
response = await client.post(
"/oauth/login",
json={"server_url": "https://simple-server.example.com/mcp"},
)
assert response.status_code == 400
assert "does not advertise OAuth" in response.json()["detail"]
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth_login_fallback_to_public_client(self, client):
"""When DCR is unavailable, falls back to default public client ID."""
with (
patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
patch("backend.api.features.mcp.routes.settings") as mock_settings,
):
instance = MockClient.return_value
instance.discover_auth = AsyncMock(
return_value={
"authorization_servers": ["https://auth.example.com"],
"resource": "https://mcp.example.com/mcp",
}
)
instance.discover_auth_server_metadata = AsyncMock(
return_value={
"authorization_endpoint": "https://auth.example.com/authorize",
"token_endpoint": "https://auth.example.com/token",
# No registration_endpoint
}
)
mock_cm.store.store_state_token = AsyncMock(
return_value=("state-abc", "challenge-xyz")
)
mock_settings.config.frontend_base_url = "http://localhost:3000"
response = await client.post(
"/oauth/login",
json={"server_url": "https://mcp.example.com/mcp"},
)
assert response.status_code == 200
data = response.json()
assert "autogpt-platform" in data["login_url"]
class TestOAuthCallback:
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth_callback_success(self, client):
mock_creds = OAuth2Credentials(
provider="mcp",
title=None,
access_token=SecretStr("access-token-xyz"),
refresh_token=None,
access_token_expires_at=None,
refresh_token_expires_at=None,
scopes=[],
metadata={
"mcp_token_url": "https://auth.sentry.io/token",
"mcp_resource_url": "https://mcp.sentry.dev/mcp",
},
)
with (
patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
patch("backend.api.features.mcp.routes.settings") as mock_settings,
patch("backend.api.features.mcp.routes.MCPOAuthHandler") as MockHandler,
):
mock_settings.config.frontend_base_url = "http://localhost:3000"
# Mock state verification
mock_state = AsyncMock()
mock_state.state_metadata = {
"authorize_url": "https://auth.sentry.io/authorize",
"token_url": "https://auth.sentry.io/token",
"client_id": "test-client-id",
"client_secret": "test-secret",
"server_url": "https://mcp.sentry.dev/mcp",
}
mock_state.scopes = ["openid"]
mock_state.code_verifier = "verifier-123"
mock_cm.store.verify_state_token = AsyncMock(return_value=mock_state)
mock_cm.create = AsyncMock()
handler_instance = MockHandler.return_value
handler_instance.exchange_code_for_tokens = AsyncMock(
return_value=mock_creds
)
# Mock old credential cleanup
mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
response = await client.post(
"/oauth/callback",
json={"code": "auth-code-abc", "state_token": "state-token-123"},
)
assert response.status_code == 200
data = response.json()
assert "id" in data
assert data["provider"] == "mcp"
assert data["type"] == "oauth2"
mock_cm.create.assert_called_once()
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth_callback_invalid_state(self, client):
with patch("backend.api.features.mcp.routes.creds_manager") as mock_cm:
mock_cm.store.verify_state_token = AsyncMock(return_value=None)
response = await client.post(
"/oauth/callback",
json={"code": "auth-code", "state_token": "bad-state"},
)
assert response.status_code == 400
assert "Invalid or expired" in response.json()["detail"]
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth_callback_token_exchange_fails(self, client):
with (
patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
patch("backend.api.features.mcp.routes.settings") as mock_settings,
patch("backend.api.features.mcp.routes.MCPOAuthHandler") as MockHandler,
):
mock_settings.config.frontend_base_url = "http://localhost:3000"
mock_state = AsyncMock()
mock_state.state_metadata = {
"authorize_url": "https://auth.example.com/authorize",
"token_url": "https://auth.example.com/token",
"client_id": "cid",
"server_url": "https://mcp.example.com/mcp",
}
mock_state.scopes = []
mock_state.code_verifier = "v"
mock_cm.store.verify_state_token = AsyncMock(return_value=mock_state)
handler_instance = MockHandler.return_value
handler_instance.exchange_code_for_tokens = AsyncMock(
side_effect=RuntimeError("Token exchange failed")
)
response = await client.post(
"/oauth/callback",
json={"code": "bad-code", "state_token": "state"},
)
assert response.status_code == 400
assert "token exchange failed" in response.json()["detail"].lower()
class TestStoreToken:
@pytest.mark.asyncio(loop_scope="session")
async def test_store_token_success(self, client):
with patch("backend.api.features.mcp.routes.creds_manager") as mock_cm:
mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
mock_cm.create = AsyncMock()
response = await client.post(
"/token",
json={
"server_url": "https://mcp.example.com/mcp",
"token": "my-api-key-123",
},
)
assert response.status_code == 200
data = response.json()
assert data["provider"] == "mcp"
assert data["type"] == "oauth2"
assert data["host"] == "mcp.example.com"
mock_cm.create.assert_called_once()
@pytest.mark.asyncio(loop_scope="session")
async def test_store_token_blank_rejected(self, client):
"""Blank token string (after stripping) should return 422."""
response = await client.post(
"/token",
json={
"server_url": "https://mcp.example.com/mcp",
"token": " ",
},
)
# Pydantic min_length=1 catches the whitespace-only token
assert response.status_code == 422
@pytest.mark.asyncio(loop_scope="session")
async def test_store_token_replaces_old_credential(self, client):
old_cred = OAuth2Credentials(
provider="mcp",
title="MCP: mcp.example.com",
access_token=SecretStr("old-token"),
scopes=[],
metadata={"mcp_server_url": "https://mcp.example.com/mcp"},
)
with patch("backend.api.features.mcp.routes.creds_manager") as mock_cm:
mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[old_cred])
mock_cm.create = AsyncMock()
mock_cm.store.delete_creds_by_id = AsyncMock()
response = await client.post(
"/token",
json={
"server_url": "https://mcp.example.com/mcp",
"token": "new-token",
},
)
assert response.status_code == 200
mock_cm.store.delete_creds_by_id.assert_called_once_with(
"test-user-id", old_cred.id
)
class TestSSRFValidation:
"""Verify that validate_url_host is enforced on all endpoints."""
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_tools_ssrf_blocked(self, client):
with patch(
"backend.api.features.mcp.routes.validate_url_host",
new_callable=AsyncMock,
side_effect=ValueError("blocked loopback"),
):
response = await client.post(
"/discover-tools",
json={"server_url": "http://localhost/mcp"},
)
assert response.status_code == 400
assert "blocked loopback" in response.json()["detail"].lower()
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth_login_ssrf_blocked(self, client):
with patch(
"backend.api.features.mcp.routes.validate_url_host",
new_callable=AsyncMock,
side_effect=ValueError("blocked private IP"),
):
response = await client.post(
"/oauth/login",
json={"server_url": "http://10.0.0.1/mcp"},
)
assert response.status_code == 400
assert "blocked private ip" in response.json()["detail"].lower()
@pytest.mark.asyncio(loop_scope="session")
async def test_store_token_ssrf_blocked(self, client):
with patch(
"backend.api.features.mcp.routes.validate_url_host",
new_callable=AsyncMock,
side_effect=ValueError("blocked loopback"),
):
response = await client.post(
"/token",
json={
"server_url": "http://127.0.0.1/mcp",
"token": "some-token",
},
)
assert response.status_code == 400
assert "blocked loopback" in response.json()["detail"].lower()

View File

@@ -5,8 +5,8 @@ from typing import Optional
import aiohttp
from fastapi import HTTPException
from backend.blocks import get_block
from backend.data import graph as graph_db
from backend.data.block import get_block
from backend.util.settings import Settings
from .models import ApiResponse, ChatRequest, GraphData

View File

@@ -1,5 +1,3 @@
from typing import Literal
from backend.util.cache import cached
from . import db as store_db
@@ -23,7 +21,7 @@ def clear_all_caches():
async def _get_cached_store_agents(
featured: bool,
creator: str | None,
sorted_by: Literal["rating", "runs", "name", "updated_at"] | None,
sorted_by: store_db.StoreAgentsSortOptions | None,
search_query: str | None,
category: str | None,
page: int,
@@ -57,7 +55,7 @@ async def _get_cached_agent_details(
async def _get_cached_store_creators(
featured: bool,
search_query: str | None,
sorted_by: Literal["agent_rating", "agent_runs", "num_agents"] | None,
sorted_by: store_db.StoreCreatorsSortOptions | None,
page: int,
page_size: int,
):
@@ -75,4 +73,4 @@ async def _get_cached_store_creators(
@cached(maxsize=100, ttl_seconds=300, shared_cache=True)
async def _get_cached_creator_details(username: str):
"""Cached helper to get creator details."""
return await store_db.get_store_creator_details(username=username.lower())
return await store_db.get_store_creator(username=username.lower())

View File

@@ -5,19 +5,40 @@ Pluggable system for different content sources (store agents, blocks, docs).
Each handler knows how to fetch and process its content type for embedding.
"""
from __future__ import annotations
import asyncio
import functools
import itertools
import logging
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import Any
from typing import TYPE_CHECKING, Any, get_args, get_origin
from prisma.enums import ContentType
from backend.blocks import get_blocks
from backend.blocks.llm import LlmModel
from backend.data.db import query_raw_with_schema
from backend.util.text import split_camelcase
if TYPE_CHECKING:
from backend.blocks._base import AnyBlockSchema
logger = logging.getLogger(__name__)
def _contains_type(annotation: Any, target: type) -> bool:
"""Check if an annotation is or contains the target type (handles Optional/Union/Annotated)."""
if annotation is target:
return True
origin = get_origin(annotation)
if origin is None:
return False
return any(_contains_type(arg, target) for arg in get_args(annotation))
@dataclass
class ContentItem:
"""Represents a piece of content to be embedded."""
@@ -143,6 +164,28 @@ class StoreAgentHandler(ContentHandler):
}
@functools.lru_cache(maxsize=1)
def _get_enabled_blocks() -> dict[str, AnyBlockSchema]:
"""Return ``{block_id: block_instance}`` for all enabled, instantiable blocks.
Disabled blocks and blocks that fail to instantiate are silently skipped
(with a warning log), so callers never need their own try/except loop.
Results are cached for the process lifetime via ``lru_cache`` because
blocks are registered at import time and never change while running.
"""
enabled: dict[str, AnyBlockSchema] = {}
for block_id, block_cls in get_blocks().items():
try:
instance = block_cls()
except Exception as e:
logger.warning(f"Skipping block {block_id}: init failed: {e}")
continue
if not instance.disabled:
enabled[block_id] = instance
return enabled
class BlockHandler(ContentHandler):
"""Handler for block definitions (Python classes)."""
@@ -152,16 +195,14 @@ class BlockHandler(ContentHandler):
async def get_missing_items(self, batch_size: int) -> list[ContentItem]:
"""Fetch blocks without embeddings."""
from backend.data.block import get_blocks
# Get all available blocks
all_blocks = get_blocks()
# Check which ones have embeddings
if not all_blocks:
# to_thread keeps the first (heavy) call off the event loop. On
# subsequent calls the lru_cache makes this a dict lookup, so the
# thread-pool overhead is negligible compared to the DB queries below.
enabled = await asyncio.to_thread(_get_enabled_blocks)
if not enabled:
return []
block_ids = list(all_blocks.keys())
block_ids = list(enabled.keys())
# Query for existing embeddings
placeholders = ",".join([f"${i+1}" for i in range(len(block_ids))])
@@ -176,57 +217,53 @@ class BlockHandler(ContentHandler):
)
existing_ids = {row["contentId"] for row in existing_result}
missing_blocks = [
(block_id, block_cls)
for block_id, block_cls in all_blocks.items()
if block_id not in existing_ids
]
# Convert to ContentItem
# Convert to ContentItem — disabled filtering already done by
# _get_enabled_blocks so batch_size won't be exhausted by disabled blocks.
missing = ((bid, b) for bid, b in enabled.items() if bid not in existing_ids)
items = []
for block_id, block_cls in missing_blocks[:batch_size]:
for block_id, block in itertools.islice(missing, batch_size):
try:
block_instance = block_cls()
# Skip disabled blocks - they shouldn't be indexed
if block_instance.disabled:
continue
# Build searchable text from block metadata
parts = []
if hasattr(block_instance, "name") and block_instance.name:
parts.append(block_instance.name)
if (
hasattr(block_instance, "description")
and block_instance.description
):
parts.append(block_instance.description)
if hasattr(block_instance, "categories") and block_instance.categories:
# Convert BlockCategory enum to strings
parts.append(
" ".join(str(cat.value) for cat in block_instance.categories)
if not block.name:
logger.warning(
f"Block {block_id} has no name — using block_id as fallback"
)
display_name = split_camelcase(block.name) if block.name else ""
parts = []
if display_name:
parts.append(display_name)
if block.description:
parts.append(block.description)
if block.categories:
parts.append(" ".join(str(cat.value) for cat in block.categories))
# Add input/output schema info
if hasattr(block_instance, "input_schema"):
schema = block_instance.input_schema
if hasattr(schema, "model_json_schema"):
schema_dict = schema.model_json_schema()
if "properties" in schema_dict:
for prop_name, prop_info in schema_dict[
"properties"
].items():
if "description" in prop_info:
parts.append(
f"{prop_name}: {prop_info['description']}"
)
# Add input schema field descriptions
parts += [
f"{field_name}: {field_info.description}"
for field_name, field_info in block.input_schema.model_fields.items()
if field_info.description
]
searchable_text = " ".join(parts)
# Convert categories set of enums to list of strings for JSON serialization
categories = getattr(block_instance, "categories", set())
categories_list = (
[cat.value for cat in categories] if categories else []
[cat.value for cat in block.categories] if block.categories else []
)
# Extract provider names from credentials fields
credentials_info = block.input_schema.get_credentials_fields_info()
is_integration = len(credentials_info) > 0
provider_names = [
provider.value.lower()
for info in credentials_info.values()
for provider in info.provider
]
# Check if block has LlmModel field in input schema
has_llm_model_field = any(
_contains_type(field.annotation, LlmModel)
for field in block.input_schema.model_fields.values()
)
items.append(
@@ -235,10 +272,13 @@ class BlockHandler(ContentHandler):
content_type=ContentType.BLOCK,
searchable_text=searchable_text,
metadata={
"name": getattr(block_instance, "name", ""),
"name": display_name or block.name or block_id,
"categories": categories_list,
"providers": provider_names,
"has_llm_model_field": has_llm_model_field,
"is_integration": is_integration,
},
user_id=None, # Blocks are public
user_id=None,
)
)
except Exception as e:
@@ -249,22 +289,13 @@ class BlockHandler(ContentHandler):
async def get_stats(self) -> dict[str, int]:
"""Get statistics about block embedding coverage."""
from backend.data.block import get_blocks
all_blocks = get_blocks()
# Filter out disabled blocks - they're not indexed
enabled_block_ids = [
block_id
for block_id, block_cls in all_blocks.items()
if not block_cls().disabled
]
total_blocks = len(enabled_block_ids)
enabled = await asyncio.to_thread(_get_enabled_blocks)
total_blocks = len(enabled)
if total_blocks == 0:
return {"total": 0, "with_embeddings": 0, "without_embeddings": 0}
block_ids = enabled_block_ids
block_ids = list(enabled.keys())
placeholders = ",".join([f"${i+1}" for i in range(len(block_ids))])
embedded_result = await query_raw_with_schema(

View File

@@ -1,7 +1,5 @@
"""
E2E tests for content handlers (blocks, store agents, documentation).
Tests the full flow: discovering content → generating embeddings → storing.
Tests for content handlers (blocks, store agents, documentation).
"""
from pathlib import Path
@@ -15,15 +13,103 @@ from backend.api.features.store.content_handlers import (
BlockHandler,
DocumentationHandler,
StoreAgentHandler,
_get_enabled_blocks,
)
@pytest.fixture(autouse=True)
def _clear_block_cache():
"""Clear the lru_cache on _get_enabled_blocks before each test."""
_get_enabled_blocks.cache_clear()
yield
_get_enabled_blocks.cache_clear()
# ---------------------------------------------------------------------------
# Helper to build a mock block class that returns a pre-configured instance
# ---------------------------------------------------------------------------
def _make_block_class(
*,
name: str = "Block",
description: str = "",
disabled: bool = False,
categories: list[MagicMock] | None = None,
fields: dict[str, str] | None = None,
raise_on_init: Exception | None = None,
) -> MagicMock:
cls = MagicMock()
if raise_on_init is not None:
cls.side_effect = raise_on_init
return cls
inst = MagicMock()
inst.name = name
inst.disabled = disabled
inst.description = description
inst.categories = categories or []
field_mocks = {
fname: MagicMock(description=fdesc) for fname, fdesc in (fields or {}).items()
}
inst.input_schema.model_fields = field_mocks
inst.input_schema.get_credentials_fields_info.return_value = {}
cls.return_value = inst
return cls
# ---------------------------------------------------------------------------
# _get_enabled_blocks
# ---------------------------------------------------------------------------
def test_get_enabled_blocks_filters_disabled():
"""Disabled blocks are excluded."""
blocks = {
"enabled": _make_block_class(name="E", disabled=False),
"disabled": _make_block_class(name="D", disabled=True),
}
with patch(
"backend.api.features.store.content_handlers.get_blocks", return_value=blocks
):
result = _get_enabled_blocks()
assert list(result.keys()) == ["enabled"]
def test_get_enabled_blocks_skips_broken():
"""Blocks that raise on init are skipped without crashing."""
blocks = {
"good": _make_block_class(name="Good"),
"bad": _make_block_class(raise_on_init=RuntimeError("boom")),
}
with patch(
"backend.api.features.store.content_handlers.get_blocks", return_value=blocks
):
result = _get_enabled_blocks()
assert list(result.keys()) == ["good"]
def test_get_enabled_blocks_cached():
"""_get_enabled_blocks() calls get_blocks() only once across multiple calls."""
blocks = {"b1": _make_block_class(name="B1")}
with patch(
"backend.api.features.store.content_handlers.get_blocks", return_value=blocks
) as mock_get_blocks:
result1 = _get_enabled_blocks()
result2 = _get_enabled_blocks()
assert result1 is result2
mock_get_blocks.assert_called_once()
# ---------------------------------------------------------------------------
# StoreAgentHandler
# ---------------------------------------------------------------------------
@pytest.mark.asyncio(loop_scope="session")
async def test_store_agent_handler_get_missing_items(mocker):
"""Test StoreAgentHandler fetches approved agents without embeddings."""
handler = StoreAgentHandler()
# Mock database query
mock_missing = [
{
"id": "agent-1",
@@ -54,9 +140,7 @@ async def test_store_agent_handler_get_stats(mocker):
"""Test StoreAgentHandler returns correct stats."""
handler = StoreAgentHandler()
# Mock approved count query
mock_approved = [{"count": 50}]
# Mock embedded count query
mock_embedded = [{"count": 30}]
with patch(
@@ -70,73 +154,130 @@ async def test_store_agent_handler_get_stats(mocker):
assert stats["without_embeddings"] == 20
# ---------------------------------------------------------------------------
# BlockHandler
# ---------------------------------------------------------------------------
@pytest.mark.asyncio(loop_scope="session")
async def test_block_handler_get_missing_items(mocker):
async def test_block_handler_get_missing_items():
"""Test BlockHandler discovers blocks without embeddings."""
handler = BlockHandler()
# Mock get_blocks to return test blocks
mock_block_class = MagicMock()
mock_block_instance = MagicMock()
mock_block_instance.name = "Calculator Block"
mock_block_instance.description = "Performs calculations"
mock_block_instance.categories = [MagicMock(value="MATH")]
mock_block_instance.disabled = False
mock_block_instance.input_schema.model_json_schema.return_value = {
"properties": {"expression": {"description": "Math expression to evaluate"}}
blocks = {
"block-uuid-1": _make_block_class(
name="CalculatorBlock",
description="Performs calculations",
categories=[MagicMock(value="MATH")],
fields={"expression": "Math expression to evaluate"},
),
}
mock_block_class.return_value = mock_block_instance
mock_blocks = {"block-uuid-1": mock_block_class}
# Mock existing embeddings query (no embeddings exist)
mock_existing = []
with patch(
"backend.data.block.get_blocks",
return_value=mock_blocks,
"backend.api.features.store.content_handlers.get_blocks", return_value=blocks
):
with patch(
"backend.api.features.store.content_handlers.query_raw_with_schema",
return_value=mock_existing,
return_value=[],
):
items = await handler.get_missing_items(batch_size=10)
assert len(items) == 1
assert items[0].content_id == "block-uuid-1"
assert items[0].content_type == ContentType.BLOCK
# CamelCase should be split in searchable text and metadata name
assert "Calculator Block" in items[0].searchable_text
assert "Performs calculations" in items[0].searchable_text
assert "MATH" in items[0].searchable_text
assert "expression: Math expression" in items[0].searchable_text
assert items[0].metadata["name"] == "Calculator Block"
assert items[0].user_id is None
@pytest.mark.asyncio(loop_scope="session")
async def test_block_handler_get_stats(mocker):
async def test_block_handler_get_missing_items_splits_camelcase():
"""CamelCase block names are split for better search indexing."""
handler = BlockHandler()
blocks = {
"ai-block": _make_block_class(name="AITextGeneratorBlock"),
}
with patch(
"backend.api.features.store.content_handlers.get_blocks", return_value=blocks
):
with patch(
"backend.api.features.store.content_handlers.query_raw_with_schema",
return_value=[],
):
items = await handler.get_missing_items(batch_size=10)
assert len(items) == 1
assert "AI Text Generator Block" in items[0].searchable_text
@pytest.mark.asyncio(loop_scope="session")
async def test_block_handler_get_missing_items_batch_size_zero():
"""batch_size=0 returns an empty list; the DB is still queried to find missing IDs."""
handler = BlockHandler()
blocks = {"b1": _make_block_class(name="B1")}
with patch(
"backend.api.features.store.content_handlers.get_blocks", return_value=blocks
):
with patch(
"backend.api.features.store.content_handlers.query_raw_with_schema",
return_value=[],
) as mock_query:
items = await handler.get_missing_items(batch_size=0)
assert items == []
# DB query is still issued to learn which blocks lack embeddings;
# the empty result comes from itertools.islice limiting to 0 items.
mock_query.assert_called_once()
@pytest.mark.asyncio(loop_scope="session")
async def test_block_handler_disabled_dont_exhaust_batch():
"""Disabled blocks don't consume batch budget, so enabled blocks get indexed."""
handler = BlockHandler()
# 5 disabled + 3 enabled, batch_size=2
blocks = {
**{
f"dis-{i}": _make_block_class(name=f"D{i}", disabled=True) for i in range(5)
},
**{f"en-{i}": _make_block_class(name=f"E{i}") for i in range(3)},
}
with patch(
"backend.api.features.store.content_handlers.get_blocks", return_value=blocks
):
with patch(
"backend.api.features.store.content_handlers.query_raw_with_schema",
return_value=[],
):
items = await handler.get_missing_items(batch_size=2)
assert len(items) == 2
assert all(item.content_id.startswith("en-") for item in items)
@pytest.mark.asyncio(loop_scope="session")
async def test_block_handler_get_stats():
"""Test BlockHandler returns correct stats."""
handler = BlockHandler()
# Mock get_blocks - each block class returns an instance with disabled=False
def make_mock_block_class():
mock_class = MagicMock()
mock_instance = MagicMock()
mock_instance.disabled = False
mock_class.return_value = mock_instance
return mock_class
mock_blocks = {
"block-1": make_mock_block_class(),
"block-2": make_mock_block_class(),
"block-3": make_mock_block_class(),
blocks = {
"block-1": _make_block_class(name="B1"),
"block-2": _make_block_class(name="B2"),
"block-3": _make_block_class(name="B3"),
}
# Mock embedded count query (2 blocks have embeddings)
mock_embedded = [{"count": 2}]
with patch(
"backend.data.block.get_blocks",
return_value=mock_blocks,
"backend.api.features.store.content_handlers.get_blocks", return_value=blocks
):
with patch(
"backend.api.features.store.content_handlers.query_raw_with_schema",
@@ -149,21 +290,123 @@ async def test_block_handler_get_stats(mocker):
assert stats["without_embeddings"] == 1
@pytest.mark.asyncio(loop_scope="session")
async def test_block_handler_get_stats_skips_broken():
"""get_stats skips broken blocks instead of crashing."""
handler = BlockHandler()
blocks = {
"good": _make_block_class(name="Good"),
"bad": _make_block_class(raise_on_init=RuntimeError("boom")),
}
mock_embedded = [{"count": 1}]
with patch(
"backend.api.features.store.content_handlers.get_blocks", return_value=blocks
):
with patch(
"backend.api.features.store.content_handlers.query_raw_with_schema",
return_value=mock_embedded,
):
stats = await handler.get_stats()
assert stats["total"] == 1 # only the good block
assert stats["with_embeddings"] == 1
@pytest.mark.asyncio(loop_scope="session")
async def test_block_handler_handles_none_name():
"""When block.name is None the fallback display name logic is used."""
handler = BlockHandler()
blocks = {
"none-name-block": _make_block_class(
name="placeholder", # will be overridden to None below
description="A block with no name",
),
}
# Override the name to None after construction so _make_block_class
# doesn't interfere with the mock wiring.
blocks["none-name-block"].return_value.name = None
with patch(
"backend.api.features.store.content_handlers.get_blocks", return_value=blocks
):
with patch(
"backend.api.features.store.content_handlers.query_raw_with_schema",
return_value=[],
):
items = await handler.get_missing_items(batch_size=10)
assert len(items) == 1
# display_name should be "" because block.name is None
# searchable_text should still contain the description
assert "A block with no name" in items[0].searchable_text
# metadata["name"] falls back to block_id when both display_name
# and block.name are falsy, ensuring it is always a non-empty string.
assert items[0].metadata["name"] == "none-name-block"
@pytest.mark.asyncio(loop_scope="session")
async def test_block_handler_handles_empty_attributes():
"""Test BlockHandler handles blocks with empty/falsy attribute values."""
handler = BlockHandler()
blocks = {"block-minimal": _make_block_class(name="Minimal Block")}
with patch(
"backend.api.features.store.content_handlers.get_blocks", return_value=blocks
):
with patch(
"backend.api.features.store.content_handlers.query_raw_with_schema",
return_value=[],
):
items = await handler.get_missing_items(batch_size=10)
assert len(items) == 1
assert items[0].searchable_text == "Minimal Block"
@pytest.mark.asyncio(loop_scope="session")
async def test_block_handler_skips_failed_blocks():
"""Test BlockHandler skips blocks that fail to instantiate."""
handler = BlockHandler()
blocks = {
"good-block": _make_block_class(name="Good Block", description="Works fine"),
"bad-block": _make_block_class(raise_on_init=Exception("Instantiation failed")),
}
with patch(
"backend.api.features.store.content_handlers.get_blocks", return_value=blocks
):
with patch(
"backend.api.features.store.content_handlers.query_raw_with_schema",
return_value=[],
):
items = await handler.get_missing_items(batch_size=10)
assert len(items) == 1
assert items[0].content_id == "good-block"
# ---------------------------------------------------------------------------
# DocumentationHandler
# ---------------------------------------------------------------------------
@pytest.mark.asyncio(loop_scope="session")
async def test_documentation_handler_get_missing_items(tmp_path, mocker):
"""Test DocumentationHandler discovers docs without embeddings."""
handler = DocumentationHandler()
# Create temporary docs directory with test files
docs_root = tmp_path / "docs"
docs_root.mkdir()
(docs_root / "guide.md").write_text("# Getting Started\n\nThis is a guide.")
(docs_root / "api.mdx").write_text("# API Reference\n\nAPI documentation.")
# Mock _get_docs_root to return temp dir
with patch.object(handler, "_get_docs_root", return_value=docs_root):
# Mock existing embeddings query (no embeddings exist)
with patch(
"backend.api.features.store.content_handlers.query_raw_with_schema",
return_value=[],
@@ -172,7 +415,6 @@ async def test_documentation_handler_get_missing_items(tmp_path, mocker):
assert len(items) == 2
# Check guide.md (content_id format: doc_path::section_index)
guide_item = next(
(item for item in items if item.content_id == "guide.md::0"), None
)
@@ -183,7 +425,6 @@ async def test_documentation_handler_get_missing_items(tmp_path, mocker):
assert guide_item.metadata["doc_title"] == "Getting Started"
assert guide_item.user_id is None
# Check api.mdx (content_id format: doc_path::section_index)
api_item = next(
(item for item in items if item.content_id == "api.mdx::0"), None
)
@@ -196,14 +437,12 @@ async def test_documentation_handler_get_stats(tmp_path, mocker):
"""Test DocumentationHandler returns correct stats."""
handler = DocumentationHandler()
# Create temporary docs directory
docs_root = tmp_path / "docs"
docs_root.mkdir()
(docs_root / "doc1.md").write_text("# Doc 1")
(docs_root / "doc2.md").write_text("# Doc 2")
(docs_root / "doc3.mdx").write_text("# Doc 3")
# Mock embedded count query (1 doc has embedding)
mock_embedded = [{"count": 1}]
with patch.object(handler, "_get_docs_root", return_value=docs_root):
@@ -223,13 +462,11 @@ async def test_documentation_handler_title_extraction(tmp_path):
"""Test DocumentationHandler extracts title from markdown heading."""
handler = DocumentationHandler()
# Test with heading
doc_with_heading = tmp_path / "with_heading.md"
doc_with_heading.write_text("# My Title\n\nContent here")
title = handler._extract_doc_title(doc_with_heading)
assert title == "My Title"
# Test without heading
doc_without_heading = tmp_path / "no-heading.md"
doc_without_heading.write_text("Just content, no heading")
title = handler._extract_doc_title(doc_without_heading)
@@ -241,7 +478,6 @@ async def test_documentation_handler_markdown_chunking(tmp_path):
"""Test DocumentationHandler chunks markdown by headings."""
handler = DocumentationHandler()
# Test document with multiple sections
doc_with_sections = tmp_path / "sections.md"
doc_with_sections.write_text(
"# Document Title\n\n"
@@ -253,7 +489,6 @@ async def test_documentation_handler_markdown_chunking(tmp_path):
)
sections = handler._chunk_markdown_by_headings(doc_with_sections)
# Should have 3 sections: intro (with doc title), section one, section two
assert len(sections) == 3
assert sections[0].title == "Document Title"
assert sections[0].index == 0
@@ -267,7 +502,6 @@ async def test_documentation_handler_markdown_chunking(tmp_path):
assert sections[2].index == 2
assert "Content for section two" in sections[2].content
# Test document without headings
doc_no_sections = tmp_path / "no-sections.md"
doc_no_sections.write_text("Just plain content without any headings.")
sections = handler._chunk_markdown_by_headings(doc_no_sections)
@@ -281,21 +515,39 @@ async def test_documentation_handler_section_content_ids():
"""Test DocumentationHandler creates and parses section content IDs."""
handler = DocumentationHandler()
# Test making content ID
content_id = handler._make_section_content_id("docs/guide.md", 2)
assert content_id == "docs/guide.md::2"
# Test parsing content ID
doc_path, section_index = handler._parse_section_content_id("docs/guide.md::2")
assert doc_path == "docs/guide.md"
assert section_index == 2
# Test parsing legacy format (no section index)
doc_path, section_index = handler._parse_section_content_id("docs/old-format.md")
assert doc_path == "docs/old-format.md"
assert section_index == 0
@pytest.mark.asyncio(loop_scope="session")
async def test_documentation_handler_missing_docs_directory():
"""Test DocumentationHandler handles missing docs directory gracefully."""
handler = DocumentationHandler()
fake_path = Path("/nonexistent/docs")
with patch.object(handler, "_get_docs_root", return_value=fake_path):
items = await handler.get_missing_items(batch_size=10)
assert items == []
stats = await handler.get_stats()
assert stats["total"] == 0
assert stats["with_embeddings"] == 0
assert stats["without_embeddings"] == 0
# ---------------------------------------------------------------------------
# Registry
# ---------------------------------------------------------------------------
@pytest.mark.asyncio(loop_scope="session")
async def test_content_handlers_registry():
"""Test all content types are registered."""
@@ -306,86 +558,3 @@ async def test_content_handlers_registry():
assert isinstance(CONTENT_HANDLERS[ContentType.STORE_AGENT], StoreAgentHandler)
assert isinstance(CONTENT_HANDLERS[ContentType.BLOCK], BlockHandler)
assert isinstance(CONTENT_HANDLERS[ContentType.DOCUMENTATION], DocumentationHandler)
@pytest.mark.asyncio(loop_scope="session")
async def test_block_handler_handles_missing_attributes():
"""Test BlockHandler gracefully handles blocks with missing attributes."""
handler = BlockHandler()
# Mock block with minimal attributes
mock_block_class = MagicMock()
mock_block_instance = MagicMock()
mock_block_instance.name = "Minimal Block"
mock_block_instance.disabled = False
# No description, categories, or schema
del mock_block_instance.description
del mock_block_instance.categories
del mock_block_instance.input_schema
mock_block_class.return_value = mock_block_instance
mock_blocks = {"block-minimal": mock_block_class}
with patch(
"backend.data.block.get_blocks",
return_value=mock_blocks,
):
with patch(
"backend.api.features.store.content_handlers.query_raw_with_schema",
return_value=[],
):
items = await handler.get_missing_items(batch_size=10)
assert len(items) == 1
assert items[0].searchable_text == "Minimal Block"
@pytest.mark.asyncio(loop_scope="session")
async def test_block_handler_skips_failed_blocks():
"""Test BlockHandler skips blocks that fail to instantiate."""
handler = BlockHandler()
# Mock one good block and one bad block
good_block = MagicMock()
good_instance = MagicMock()
good_instance.name = "Good Block"
good_instance.description = "Works fine"
good_instance.categories = []
good_instance.disabled = False
good_block.return_value = good_instance
bad_block = MagicMock()
bad_block.side_effect = Exception("Instantiation failed")
mock_blocks = {"good-block": good_block, "bad-block": bad_block}
with patch(
"backend.data.block.get_blocks",
return_value=mock_blocks,
):
with patch(
"backend.api.features.store.content_handlers.query_raw_with_schema",
return_value=[],
):
items = await handler.get_missing_items(batch_size=10)
# Should only get the good block
assert len(items) == 1
assert items[0].content_id == "good-block"
@pytest.mark.asyncio(loop_scope="session")
async def test_documentation_handler_missing_docs_directory():
"""Test DocumentationHandler handles missing docs directory gracefully."""
handler = DocumentationHandler()
# Mock _get_docs_root to return non-existent path
fake_path = Path("/nonexistent/docs")
with patch.object(handler, "_get_docs_root", return_value=fake_path):
items = await handler.get_missing_items(batch_size=10)
assert items == []
stats = await handler.get_stats()
assert stats["total"] == 0
assert stats["with_embeddings"] == 0
assert stats["without_embeddings"] == 0

File diff suppressed because it is too large Load Diff

View File

@@ -26,7 +26,7 @@ async def test_get_store_agents(mocker):
mock_agents = [
prisma.models.StoreAgent(
listing_id="test-id",
storeListingVersionId="version123",
listing_version_id="version123",
slug="test-agent",
agent_name="Test Agent",
agent_video=None,
@@ -40,11 +40,11 @@ async def test_get_store_agents(mocker):
runs=10,
rating=4.5,
versions=["1.0"],
agentGraphVersions=["1"],
agentGraphId="test-graph-id",
graph_id="test-graph-id",
graph_versions=["1"],
updated_at=datetime.now(),
is_available=False,
useForOnboarding=False,
use_for_onboarding=False,
)
]
@@ -68,10 +68,10 @@ async def test_get_store_agents(mocker):
@pytest.mark.asyncio(loop_scope="session")
async def test_get_store_agent_details(mocker):
# Mock data
# Mock data - StoreAgent view already contains the active version data
mock_agent = prisma.models.StoreAgent(
listing_id="test-id",
storeListingVersionId="version123",
listing_version_id="version123",
slug="test-agent",
agent_name="Test Agent",
agent_video="video.mp4",
@@ -85,102 +85,38 @@ async def test_get_store_agent_details(mocker):
runs=10,
rating=4.5,
versions=["1.0"],
agentGraphVersions=["1"],
agentGraphId="test-graph-id",
updated_at=datetime.now(),
is_available=False,
useForOnboarding=False,
)
# Mock active version agent (what we want to return for active version)
mock_active_agent = prisma.models.StoreAgent(
listing_id="test-id",
storeListingVersionId="active-version-id",
slug="test-agent",
agent_name="Test Agent Active",
agent_video="active_video.mp4",
agent_image=["active_image.jpg"],
featured=False,
creator_username="creator",
creator_avatar="avatar.jpg",
sub_heading="Test heading active",
description="Test description active",
categories=["test"],
runs=15,
rating=4.8,
versions=["1.0", "2.0"],
agentGraphVersions=["1", "2"],
agentGraphId="test-graph-id-active",
graph_id="test-graph-id",
graph_versions=["1"],
updated_at=datetime.now(),
is_available=True,
useForOnboarding=False,
use_for_onboarding=False,
)
# Create a mock StoreListing result
mock_store_listing = mocker.MagicMock()
mock_store_listing.activeVersionId = "active-version-id"
mock_store_listing.hasApprovedVersion = True
mock_store_listing.ActiveVersion = mocker.MagicMock()
mock_store_listing.ActiveVersion.recommendedScheduleCron = None
# Mock StoreAgent prisma call - need to handle multiple calls
# Mock StoreAgent prisma call
mock_store_agent = mocker.patch("prisma.models.StoreAgent.prisma")
# Set up side_effect to return different results for different calls
def mock_find_first_side_effect(*args, **kwargs):
where_clause = kwargs.get("where", {})
if "storeListingVersionId" in where_clause:
# Second call for active version
return mock_active_agent
else:
# First call for initial lookup
return mock_agent
mock_store_agent.return_value.find_first = mocker.AsyncMock(
side_effect=mock_find_first_side_effect
)
# Mock Profile prisma call
mock_profile = mocker.MagicMock()
mock_profile.userId = "user-id-123"
mock_profile_db = mocker.patch("prisma.models.Profile.prisma")
mock_profile_db.return_value.find_first = mocker.AsyncMock(
return_value=mock_profile
)
# Mock StoreListing prisma call
mock_store_listing_db = mocker.patch("prisma.models.StoreListing.prisma")
mock_store_listing_db.return_value.find_first = mocker.AsyncMock(
return_value=mock_store_listing
)
mock_store_agent.return_value.find_first = mocker.AsyncMock(return_value=mock_agent)
# Call function
result = await db.get_store_agent_details("creator", "test-agent")
# Verify results - should use active version data
# Verify results - constructed from the StoreAgent view
assert result.slug == "test-agent"
assert result.agent_name == "Test Agent Active" # From active version
assert result.active_version_id == "active-version-id"
assert result.agent_name == "Test Agent"
assert result.active_version_id == "version123"
assert result.has_approved_version is True
assert (
result.store_listing_version_id == "active-version-id"
) # Should be active version ID
assert result.store_listing_version_id == "version123"
assert result.graph_id == "test-graph-id"
assert result.runs == 10
assert result.rating == 4.5
# Verify mocks called correctly - now expecting 2 calls
assert mock_store_agent.return_value.find_first.call_count == 2
# Check the specific calls
calls = mock_store_agent.return_value.find_first.call_args_list
assert calls[0] == mocker.call(
# Verify single StoreAgent lookup
mock_store_agent.return_value.find_first.assert_called_once_with(
where={"creator_username": "creator", "slug": "test-agent"}
)
assert calls[1] == mocker.call(where={"storeListingVersionId": "active-version-id"})
mock_store_listing_db.return_value.find_first.assert_called_once()
@pytest.mark.asyncio(loop_scope="session")
async def test_get_store_creator_details(mocker):
async def test_get_store_creator(mocker):
# Mock data
mock_creator_data = prisma.models.Creator(
name="Test Creator",
@@ -202,7 +138,7 @@ async def test_get_store_creator_details(mocker):
mock_creator.return_value.find_unique.return_value = mock_creator_data
# Call function
result = await db.get_store_creator_details("creator")
result = await db.get_store_creator("creator")
# Verify results
assert result.username == "creator"
@@ -218,61 +154,110 @@ async def test_get_store_creator_details(mocker):
@pytest.mark.asyncio(loop_scope="session")
async def test_create_store_submission(mocker):
# Mock data
now = datetime.now()
# Mock agent graph (with no pending submissions) and user with profile
mock_profile = prisma.models.Profile(
id="profile-id",
userId="user-id",
name="Test User",
username="testuser",
description="Test",
isFeatured=False,
links=[],
createdAt=now,
updatedAt=now,
)
mock_user = prisma.models.User(
id="user-id",
email="test@example.com",
createdAt=now,
updatedAt=now,
Profile=[mock_profile],
emailVerified=True,
metadata="{}", # type: ignore[reportArgumentType]
integrations="",
maxEmailsPerDay=1,
notifyOnAgentRun=True,
notifyOnZeroBalance=True,
notifyOnLowBalance=True,
notifyOnBlockExecutionFailed=True,
notifyOnContinuousAgentError=True,
notifyOnDailySummary=True,
notifyOnWeeklySummary=True,
notifyOnMonthlySummary=True,
notifyOnAgentApproved=True,
notifyOnAgentRejected=True,
timezone="Europe/Delft",
)
mock_agent = prisma.models.AgentGraph(
id="agent-id",
version=1,
userId="user-id",
createdAt=datetime.now(),
createdAt=now,
isActive=True,
StoreListingVersions=[],
User=mock_user,
)
mock_listing = prisma.models.StoreListing(
# Mock the created StoreListingVersion (returned by create)
mock_store_listing_obj = prisma.models.StoreListing(
id="listing-id",
createdAt=datetime.now(),
updatedAt=datetime.now(),
createdAt=now,
updatedAt=now,
isDeleted=False,
hasApprovedVersion=False,
slug="test-agent",
agentGraphId="agent-id",
agentGraphVersion=1,
owningUserId="user-id",
Versions=[
prisma.models.StoreListingVersion(
id="version-id",
agentGraphId="agent-id",
agentGraphVersion=1,
name="Test Agent",
description="Test description",
createdAt=datetime.now(),
updatedAt=datetime.now(),
subHeading="Test heading",
imageUrls=["image.jpg"],
categories=["test"],
isFeatured=False,
isDeleted=False,
version=1,
storeListingId="listing-id",
submissionStatus=prisma.enums.SubmissionStatus.PENDING,
isAvailable=True,
)
],
useForOnboarding=False,
)
mock_version = prisma.models.StoreListingVersion(
id="version-id",
agentGraphId="agent-id",
agentGraphVersion=1,
name="Test Agent",
description="Test description",
createdAt=now,
updatedAt=now,
subHeading="",
imageUrls=[],
categories=[],
isFeatured=False,
isDeleted=False,
version=1,
storeListingId="listing-id",
submissionStatus=prisma.enums.SubmissionStatus.PENDING,
isAvailable=True,
submittedAt=now,
StoreListing=mock_store_listing_obj,
)
# Mock prisma calls
mock_agent_graph = mocker.patch("prisma.models.AgentGraph.prisma")
mock_agent_graph.return_value.find_first = mocker.AsyncMock(return_value=mock_agent)
mock_store_listing = mocker.patch("prisma.models.StoreListing.prisma")
mock_store_listing.return_value.find_first = mocker.AsyncMock(return_value=None)
mock_store_listing.return_value.create = mocker.AsyncMock(return_value=mock_listing)
# Mock transaction context manager
mock_tx = mocker.MagicMock()
mocker.patch(
"backend.api.features.store.db.transaction",
return_value=mocker.AsyncMock(
__aenter__=mocker.AsyncMock(return_value=mock_tx),
__aexit__=mocker.AsyncMock(return_value=False),
),
)
mock_sl = mocker.patch("prisma.models.StoreListing.prisma")
mock_sl.return_value.find_unique = mocker.AsyncMock(return_value=None)
mock_slv = mocker.patch("prisma.models.StoreListingVersion.prisma")
mock_slv.return_value.create = mocker.AsyncMock(return_value=mock_version)
# Call function
result = await db.create_store_submission(
user_id="user-id",
agent_id="agent-id",
agent_version=1,
graph_id="agent-id",
graph_version=1,
slug="test-agent",
name="Test Agent",
description="Test description",
@@ -281,11 +266,11 @@ async def test_create_store_submission(mocker):
# Verify results
assert result.name == "Test Agent"
assert result.description == "Test description"
assert result.store_listing_version_id == "version-id"
assert result.listing_version_id == "version-id"
# Verify mocks called correctly
mock_agent_graph.return_value.find_first.assert_called_once()
mock_store_listing.return_value.create.assert_called_once()
mock_slv.return_value.create.assert_called_once()
@pytest.mark.asyncio(loop_scope="session")
@@ -318,7 +303,6 @@ async def test_update_profile(mocker):
description="Test description",
links=["link1"],
avatar_url="avatar.jpg",
is_featured=False,
)
# Call function
@@ -389,7 +373,7 @@ async def test_get_store_agents_with_search_and_filters_parameterized():
creators=["creator1'; DROP TABLE Users; --", "creator2"],
category="AI'; DELETE FROM StoreAgent; --",
featured=True,
sorted_by="rating",
sorted_by=db.StoreAgentsSortOptions.RATING,
page=1,
page_size=20,
)

View File

@@ -15,6 +15,7 @@ from prisma.enums import ContentType
from tiktoken import encoding_for_model
from backend.api.features.store.content_handlers import CONTENT_HANDLERS
from backend.blocks import get_blocks
from backend.data.db import execute_raw_with_schema, query_raw_with_schema
from backend.util.clients import get_openai_client
from backend.util.json import dumps
@@ -662,8 +663,6 @@ async def cleanup_orphaned_embeddings() -> dict[str, Any]:
)
current_ids = {row["id"] for row in valid_agents}
elif content_type == ContentType.BLOCK:
from backend.data.block import get_blocks
current_ids = set(get_blocks().keys())
elif content_type == ContentType.DOCUMENTATION:
# Use DocumentationHandler to get section-based content IDs

View File

@@ -57,12 +57,6 @@ class StoreError(ValueError):
pass
class AgentNotFoundError(NotFoundError):
"""Raised when an agent is not found"""
pass
class CreatorNotFoundError(NotFoundError):
"""Raised when a creator is not found"""

View File

@@ -31,12 +31,10 @@ logger = logging.getLogger(__name__)
def tokenize(text: str) -> list[str]:
"""Simple tokenizer for BM25 - lowercase and split on non-alphanumeric."""
"""Tokenize text for BM25."""
if not text:
return []
# Lowercase and split on non-alphanumeric characters
tokens = re.findall(r"\b\w+\b", text.lower())
return tokens
return re.findall(r"\b\w+\b", text.lower())
def bm25_rerank(
@@ -568,7 +566,7 @@ async def hybrid_search(
SELECT uce."contentId" as "storeListingVersionId"
FROM {{schema_prefix}}"UnifiedContentEmbedding" uce
INNER JOIN {{schema_prefix}}"StoreAgent" sa
ON uce."contentId" = sa."storeListingVersionId"
ON uce."contentId" = sa.listing_version_id
WHERE uce."contentType" = 'STORE_AGENT'::{{schema_prefix}}"ContentType"
AND uce."userId" IS NULL
AND uce.search @@ plainto_tsquery('english', {query_param})
@@ -582,7 +580,7 @@ async def hybrid_search(
SELECT uce."contentId", uce.embedding
FROM {{schema_prefix}}"UnifiedContentEmbedding" uce
INNER JOIN {{schema_prefix}}"StoreAgent" sa
ON uce."contentId" = sa."storeListingVersionId"
ON uce."contentId" = sa.listing_version_id
WHERE uce."contentType" = 'STORE_AGENT'::{{schema_prefix}}"ContentType"
AND uce."userId" IS NULL
AND {where_clause}
@@ -605,7 +603,7 @@ async def hybrid_search(
sa.featured,
sa.is_available,
sa.updated_at,
sa."agentGraphId",
sa.graph_id,
-- Searchable text for BM25 reranking
COALESCE(sa.agent_name, '') || ' ' || COALESCE(sa.sub_heading, '') || ' ' || COALESCE(sa.description, '') as searchable_text,
-- Semantic score
@@ -627,9 +625,9 @@ async def hybrid_search(
sa.runs as popularity_raw
FROM candidates c
INNER JOIN {{schema_prefix}}"StoreAgent" sa
ON c."storeListingVersionId" = sa."storeListingVersionId"
ON c."storeListingVersionId" = sa.listing_version_id
INNER JOIN {{schema_prefix}}"UnifiedContentEmbedding" uce
ON sa."storeListingVersionId" = uce."contentId"
ON sa.listing_version_id = uce."contentId"
AND uce."contentType" = 'STORE_AGENT'::{{schema_prefix}}"ContentType"
),
max_vals AS (
@@ -665,7 +663,7 @@ async def hybrid_search(
featured,
is_available,
updated_at,
"agentGraphId",
graph_id,
searchable_text,
semantic_score,
lexical_score,

View File

@@ -14,9 +14,27 @@ from backend.api.features.store.hybrid_search import (
HybridSearchWeights,
UnifiedSearchWeights,
hybrid_search,
tokenize,
unified_hybrid_search,
)
# ---------------------------------------------------------------------------
# tokenize (BM25)
# ---------------------------------------------------------------------------
@pytest.mark.parametrize(
"input_text, expected",
[
("AITextGeneratorBlock", ["aitextgeneratorblock"]),
("hello world", ["hello", "world"]),
("", []),
("HTTPRequest", ["httprequest"]),
],
)
def test_tokenize(input_text: str, expected: list[str]):
assert tokenize(input_text) == expected
@pytest.mark.asyncio(loop_scope="session")
@pytest.mark.integration

Some files were not shown because too many files have changed in this diff Show More