Compare commits

439 Commits

Author SHA1 Message Date
Zamil Majdy
64d9d6d880 Merge remote-tracking branch 'origin/codex/platform-cost-tracking' into combined-preview-test 2026-04-02 18:32:06 +02:00
Zamil Majdy
9fc324e28a Merge remote-tracking branch 'origin/fix/copilot-tool-output-e2b-bridging' into combined-preview-test 2026-04-02 18:32:06 +02:00
Zamil Majdy
adf66bdd24 Merge origin/fix/copilot-subagent-security (resolved conflicts) 2026-04-02 18:32:06 +02:00
Zamil Majdy
dc10ad715a Merge remote-tracking branch 'origin/feat/rate-limit-tiering' into combined-preview-test 2026-04-02 18:32:05 +02:00
Zamil Majdy
493c91e0dd Merge remote-tracking branch 'origin/feat/agent-generation-dry-run-loop' into combined-preview-test 2026-04-02 18:32:05 +02:00
Zamil Majdy
b278e66f4d Merge remote-tracking branch 'origin/dev' into combined-preview-test 2026-04-02 18:32:05 +02:00
Zamil Majdy
3e183ed2a3 fix(copilot): address 8 should-fix items from review 4051661771
1. Rewrite tautological env_test.py TestClaudeCodeTmpdir tests to call
   build_sdk_env(sdk_cwd=...) directly instead of copy-pasting the
   if-sdk_cwd pattern. Moved CLAUDE_CODE_TMPDIR logic into build_sdk_env().
2. Add DEL (\x7f), C1 (\x80-\x9f), BiDi, and zero-width chars to
   security_hooks_test.py sanitization test inputs.
3. Promote _sanitize() from closure to module-level pure function.
4. Fix GenericTool.tsx "model may poll again" -> user-friendly message.
5. Replace `as never` with @ts-expect-error + comment in useChatSession.ts.
6. Extract "Agent"/"Task"/"TaskOutput" string literals to named constants
   in helpers.ts, imported in GenericTool.tsx.
7. Extend _sanitize() to strip Unicode BiDi overrides (U+202A-U+202E,
   U+2066-U+2069) and zero-width characters (U+200B-U+200F, U+FEFF).
8. Document background agent slot lifecycle limitation in security_hooks.py
   (SubagentStop doesn't fire reliably for background agents).
2026-04-02 18:23:42 +02:00
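The character ranges listed in items 2 and 7 of commit 3e183ed2a3 can be sketched as a module-level pure function. This is a minimal illustration, not the project's `_sanitize()`: the name `sanitize`, the `max_len` truncation, and the decision to strip all C0 controls are assumptions.

```python
import re

# Hypothetical stand-in for the _sanitize() helper named in the commit.
_UNSAFE_CHARS = re.compile(
    "["
    r"\x00-\x1f"      # C0 control characters (assumed to be stripped as well)
    r"\x7f"           # DEL
    r"\x80-\x9f"      # C1 control characters
    r"\u200b-\u200f"  # zero-width characters and directional marks
    r"\u202a-\u202e"  # BiDi embeddings and overrides
    r"\u2066-\u2069"  # BiDi isolates
    r"\ufeff"         # zero-width no-break space (BOM)
    "]"
)


def sanitize(value: str, max_len: int = 200) -> str:
    """Remove characters that can corrupt or spoof log lines, then truncate."""
    return _UNSAFE_CHARS.sub("", value)[:max_len]
```

Keeping the function at module level, with no closure state, lets tests import and call it directly.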
Zamil Majdy
82887a2d92 fix(backend/copilot): address reviewer feedback on E2B bridge API surface
- Rename _bridge_to_sandbox to bridge_to_sandbox (public) since it is
  imported cross-module from tool_adapter.py (item 4)
- Extract duplicated bridge+append-annotation pattern into shared
  bridge_and_annotate() helper used by both e2b_file_tools and
  tool_adapter (item 5)
- Add tests verifying bridge_and_annotate is called from
  _read_file_handler in tool_adapter when a sandbox is active (item 2)
- Add unit tests for bridge_and_annotate helper itself
2026-04-02 18:22:13 +02:00
Zamil Majdy
993c43b623 feat(platform): add merge_stats to remaining blocks (FAL, Revid, D-ID, E2B, YouTube, Weather, TTS, Enrichlayer)
Every system credential block now has explicit merge_stats tracking.
No block relies on the generic fallback anymore.
2026-04-02 18:22:02 +02:00
Zamil Majdy
13fcc62a31 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-tool-output-e2b-bridging 2026-04-02 18:16:25 +02:00
Zamil Majdy
8fefa23468 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-subagent-security 2026-04-02 18:15:58 +02:00
Zamil Majdy
749a56ca20 fix(backend): make email lookup non-blocking in set_user_tier endpoint 2026-04-02 18:14:34 +02:00
Zamil Majdy
a8a62eeefc feat(platform): add merge_stats tracking to all system credential blocks
Every block that uses system credentials now calls merge_stats with
meaningful data after the API response:
- Google Maps: output_size = number of places returned (= detail API calls)
- Apollo people/org: output_size = results count
- Apollo person: output_size = 1 per enrichment
- SmartLead: output_size = leads added or 1 per operation
- Ideogram: output_size = 1 per image
- Replicate: output_size = 1 per prediction
- Nvidia: output_size = 1 per inference
- ScreenshotOne: output_size = 1 per screenshot
- ZeroBounce: output_size = 1 per email validated
- Mem0: output_size = 1 per memory operation
2026-04-02 18:13:15 +02:00
Zamil Majdy
173614bcc5 fix(platform): audit and fix per-provider tracking accuracy
- Fix ElevenLabs/D-ID field name: script -> script_input
- Remove incorrect Google Maps api_calls formula, use per_run instead
- Remove D-ID from generation_seconds (walltime includes polling)
- Jina embeddings: extract total_tokens from response.usage
- Simplify tracking types: cost_usd, tokens, characters,
  sandbox_seconds, walltime_seconds, per_run
2026-04-02 17:58:24 +02:00
Zamil Majdy
3396cb3f4c fix(frontend): show advanced fields toggle when all input fields are advanced
When every input field was marked as advanced, `buildExpectedInputsSchema`
returned null (no visible fields), causing the entire inputs card—including
the "Show advanced fields" toggle—to not render. This made the fields
completely inaccessible.

Two changes:
- Render the inputs card when `hasAdvancedFields` is true, even if
  `inputSchema` is null, so the toggle is always accessible.
- Base `needsInputs` on `expectedInputs.length > 0` instead of
  `inputSchema !== null` so the Proceed button and input message logic
  work correctly with advanced-only fields.
2026-04-02 17:58:15 +02:00
Zamil Majdy
0c5d628b74 fix(frontend): sync inputValues state when output prop updates in SetupRequirementsCard
The inputValues state was initialized from the output prop via useState,
which only runs on mount. When the output prop updated via streaming, the
form would show stale data. Added a useEffect that merges new initial
values from the prop while preserving user-edited fields.
2026-04-02 17:44:11 +02:00
Zamil Majdy
ed40549499 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/agent-generation-dry-run-loop 2026-04-02 17:43:17 +02:00
Zamil Majdy
fbe634fb19 fix(platform): handle null user_id in cost logs and fix 0.0 cost stored as NULL
- Add null-safe optional chaining for user_id.slice() in LogsTable, displaying
  "Deleted user" when user_id is null to prevent frontend crash
- Change `if cost_float` to `if cost_float is not None` in token_tracking.py
  so that a legitimate $0.00 cost is stored as 0 instead of NULL
2026-04-02 17:38:59 +02:00
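The truthiness bug fixed in commit fbe634fb19 is easy to reproduce in isolation. The sketch below is illustrative only: the function names and the conversion to microdollars are assumptions, not the contents of `token_tracking.py`.

```python
from typing import Optional

MICRO = 1_000_000  # microdollars per dollar


def microdollars_truthy(cost_float: Optional[float]) -> Optional[int]:
    # Old pattern: 0.0 is falsy, so a genuine $0.00 cost falls through to None.
    if cost_float:
        return round(cost_float * MICRO)
    return None


def microdollars_explicit(cost_float: Optional[float]) -> Optional[int]:
    # Fixed pattern: only a missing cost maps to None (stored as NULL).
    if cost_float is not None:
        return round(cost_float * MICRO)
    return None
```

With the explicit check, a free call and an unknown cost remain distinguishable in the stored data.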
Zamil Majdy
a338c72c42 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into codex/platform-cost-tracking 2026-04-02 17:36:14 +02:00
Zamil Majdy
a9d13f0cbf ci: retrigger CI (flaky event loop test) 2026-04-02 17:33:23 +02:00
Zamil Majdy
e83e50a8f1 fix(frontend): wrap handleDeleteConfirm to prevent MouseEvent as force param 2026-04-02 17:32:33 +02:00
Zamil Majdy
7f4398efa3 feat(platform): provider-specific tracking types for accurate cost metrics
Replace one-size-fits-all tracking cascade with provider-aware logic:
- cost_usd: OpenRouter (x-total-cost header), Exa (cost_dollars)
- tokens: OpenAI, Anthropic, Groq, Ollama (token counts)
- characters: Unreal Speech, ElevenLabs (input text length)
- api_calls: Google Maps (1 nearby + N detail calls)
- sandbox_seconds: E2B (sandbox execution time)
- generation_seconds: FAL, Revid, D-ID, Replicate (video/image gen time)
- per_run: Apollo, SmartLead, ZeroBounce, Jina, etc.
2026-04-02 17:30:15 +02:00
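The provider-aware selection described in commit 7f4398efa3 could be expressed as a lookup table with a `per_run` default. This is a hedged sketch: the lowercase provider keys and the function name `tracking_type_for` are hypothetical, and the table reflects the list in this commit message only.

```python
# Hypothetical provider keys; tracking type names are taken from the commit message.
TRACKING_TYPE_BY_PROVIDER = {
    "openrouter": "cost_usd",
    "exa": "cost_usd",
    "openai": "tokens",
    "anthropic": "tokens",
    "groq": "tokens",
    "ollama": "tokens",
    "unreal_speech": "characters",
    "elevenlabs": "characters",
    "google_maps": "api_calls",
    "e2b": "sandbox_seconds",
    "fal": "generation_seconds",
    "revid": "generation_seconds",
    "d_id": "generation_seconds",
    "replicate": "generation_seconds",
}


def tracking_type_for(provider: str) -> str:
    """Return the tracking type for a provider, defaulting to per_run."""
    return TRACKING_TYPE_BY_PROVIDER.get(provider.lower(), "per_run")
```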
Zamil Majdy
c2a054c511 fix(backend): prevent provider_cost loss on stats merge and widen costMicrodollars to BigInt
- NodeExecutionStats.__iadd__ was overwriting accumulated provider_cost
  with None when merging stats that lacked provider_cost (e.g. the final
  llm_call_count/llm_retry_count merge). Skip None values in __iadd__
  so existing data is never erased.
- Widen PlatformCostLog.costMicrodollars from Int (max ~$2,147) to
  BigInt to prevent theoretical overflow for high-cost aggregated
  node executions.
2026-04-02 17:28:27 +02:00
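The merge behaviour described in commit c2a054c511 can be sketched with a small dataclass. `Stats` is a simplified, hypothetical stand-in for `NodeExecutionStats`, and summing numeric fields is an assumption about how the real merge combines values. The constant at the end reproduces the "~$2,147" ceiling of a 32-bit signed integer counted in microdollars.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Stats:  # hypothetical, simplified stand-in for NodeExecutionStats
    llm_call_count: int = 0
    provider_cost: Optional[float] = None

    def __iadd__(self, other: "Stats") -> "Stats":
        for f in fields(self):
            incoming = getattr(other, f.name)
            if incoming is None:
                continue  # never erase an accumulated value with None
            current = getattr(self, f.name)
            merged = incoming if current is None else current + incoming
            setattr(self, f.name, merged)
        return self


# Largest value of a 32-bit signed Int, expressed in dollars when the
# column stores microdollars: 2_147_483_647 / 1_000_000 = 2147.483647.
INT32_MAX_DOLLARS = (2**31 - 1) / 1_000_000
```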
Zamil Majdy
b256560619 fix(frontend): add force-delete flow and try/catch for credential operations
- DeleteConfirmationModal now shows backend warning message and offers
  "Force Delete" when API returns need_confirmation instead of just a
  toast (mirrors integrations page pattern)
- HostScopedCredentialsModal onSubmit delete-then-create is now wrapped
  in try/catch to prevent silent credential loss on creation failure
2026-04-02 17:25:04 +02:00
Zamil Majdy
c63d5f538b Merge remote-tracking branch 'origin/feat/agent-generation-dry-run-loop' into combined-preview-test 2026-04-02 17:18:50 +02:00
Zamil Majdy
eeba884671 fix(platform): fix ClamAV connectivity in Docker containers
clamd was only listening on 127.0.0.1 inside its container, so
container-to-container connections on the Docker network were refused.

- Add CLAMD_CONF_TCPAddr=0.0.0.0 to docker-compose so clamd binds
  to all interfaces
- Change default clamav_service_host from "localhost" to "clamav"
  (the docker-compose service name), matching how other services
  like redis, rabbitmq, supabase-db are referenced
2026-04-02 17:18:13 +02:00
Zamil Majdy
90822e3f37 fix(frontend+backend): prefill block inputs and hide advanced in CoPilot setup card
Backend:
- get_inputs_from_schema() now accepts input_data to populate each field's
  value with what CoPilot already provided, and includes the advanced flag
  from the schema so the frontend can hide non-essential fields.

Frontend:
- SetupRequirementsCard prefills form inputs from backend-provided values
  instead of showing empty forms
- Advanced fields hidden by default with "Show advanced fields" toggle
  (matching builder behaviour)
- siblingInputs built from both input values and discriminator_values
  so the host pattern modal can extract the host from the URL
- extractInitialValues() populates form state from prefilled values
2026-04-02 17:18:06 +02:00
Zamil Majdy
a8bb6b5544 fix(frontend): prefill host pattern in CoPilot credential setup modal
The SetupRequirementsCard passed inputValues={{}} to CredentialsGroupedView,
which meant the HostScopedCredentialsModal never received the target URL
from the backend's discriminator_values. The "Host Pattern" field was always
empty even though the CoPilot knew the exact host (e.g. api.openai.com).

Add buildSiblingInputsFromCredentials() to extract the discriminator value
(URL) from the missing_credentials setup_info and pass it as siblingInputs
so the modal can prefill the host pattern.
2026-04-02 17:17:59 +02:00
Zamil Majdy
83b00f4789 feat(platform): add copilot/autopilot cost tracking via token_tracking.py
Copilot uses OpenRouter via a separate code path (not through the block
executor). This integrates PlatformCostLog into the shared
persist_and_record_usage() function which is called by both SDK and
baseline copilot paths, capturing:
- Every LLM turn (main conversation, title gen, context compression)
- Tokens (prompt + completion + cache)
- Actual USD cost when available (SDK path provides cost_usd)
- Session ID for correlation
2026-04-02 17:17:53 +02:00
Zamil Majdy
4cd53bb7f6 Merge remote-tracking branch 'origin/codex/platform-cost-tracking' into combined-preview-test 2026-04-02 17:14:29 +02:00
Zamil Majdy
96d83e9bbd Merge remote-tracking branch 'origin/fix/copilot-p0-cli-internals' into combined-preview-test 2026-04-02 17:14:29 +02:00
Zamil Majdy
e99f4ac767 Merge remote-tracking branch 'origin/feat/rate-limit-tiering' into combined-preview-test 2026-04-02 17:14:29 +02:00
Zamil Majdy
67c2540177 Merge remote-tracking branch 'origin/feat/agent-generation-dry-run-loop' into combined-preview-test 2026-04-02 17:14:29 +02:00
Nicholas Tindle
0da949ba42 feat(e2b): set git committer identity from user's GitHub profile (#12650)
## Summary

Sets git author/committer identity in E2B sandboxes using the user's
connected GitHub account profile, so commits are properly attributed.

## Changes

### `integration_creds.py`
- Added `get_github_user_git_identity(user_id)` that fetches the user's
name and email from the GitHub `/user` API
- Uses TTL cache (10 min) to avoid repeated API calls
- Falls back to GitHub noreply email
(`{id}+{login}@users.noreply.github.com`) when user has a private email
- Falls back to `login` if `name` is not set

### `bash_exec.py`
- After injecting integration env vars, calls
`get_github_user_git_identity()` and sets `GIT_AUTHOR_NAME`,
`GIT_AUTHOR_EMAIL`, `GIT_COMMITTER_NAME`, `GIT_COMMITTER_EMAIL`
- Only sets these if the user has a connected GitHub account

### `bash_exec_test.py`
- Added tests covering: identity set from GitHub profile, no identity
when GitHub not connected, no injection when no user_id

## Why
Previously, commits made inside E2B sandboxes had no author identity
set, leading to unattributed commits. This dynamically resolves identity
from the user's actual GitHub account rather than hardcoding a default.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds outbound calls to GitHub’s `/user` API during `bash_exec` runs
and injects returned identity into the sandbox environment, which could
impact reliability (network/timeouts) and attribution behavior. Caching
mitigates repeated calls but incorrect/expired tokens or API failures
may lead to missing identity in commits.
> 
> **Overview**
> Sets git author/committer environment variables in the E2B `bash_exec`
path by fetching the connected user’s GitHub profile and injecting
`GIT_AUTHOR_*`/`GIT_COMMITTER_*` into the sandbox env.
> 
> Introduces `get_github_user_git_identity()` with TTL caching
(including a short-lived null cache), fallback to GitHub noreply email
when needed, and ensures `invalidate_user_provider_cache()` also clears
identity caches for the `github` provider. Updates tests to cover
identity injection behavior and the new cache invalidation semantics.
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: AutoGPT <autopilot@agpt.co>
2026-04-02 15:07:22 +00:00
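The fallback rules listed for `get_github_user_git_identity()` in commit 0da949ba42 can be illustrated without any network call. This is a sketch under assumptions: the helper `git_identity_env` is hypothetical and takes an already-fetched profile dictionary with the `login`, `id`, `name`, and `email` fields of the GitHub `/user` response.

```python
from typing import Any, Dict


def git_identity_env(profile: Dict[str, Any]) -> Dict[str, str]:
    """Build git identity env vars from a GitHub /user profile (hypothetical helper)."""
    login = str(profile["login"])
    # Fall back to the login when no display name is set.
    name = str(profile.get("name") or login)
    # Fall back to the noreply address when the email is private (null).
    email = str(
        profile.get("email")
        or f"{profile['id']}+{login}@users.noreply.github.com"
    )
    return {
        "GIT_AUTHOR_NAME": name,
        "GIT_AUTHOR_EMAIL": email,
        "GIT_COMMITTER_NAME": name,
        "GIT_COMMITTER_EMAIL": email,
    }
```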
Zamil Majdy
95524e94b3 feat(platform): add tracking_type and tracking_amount to cost log metadata
Standardize cost tracking across providers:
- cost_usd: actual dollar cost (OpenRouter, Exa)
- tokens: total token count (LLM blocks)
- duration_seconds: execution time (video gen, sandboxes)
- per_run: flat per-request (all others)
2026-04-02 17:04:50 +02:00
Zamil Majdy
eda02f9ce6 fix(backend/copilot): remove duplicate StreamError in _HandledStreamError handler
The _HandledStreamError exception is only raised by _run_stream_attempt
*after* it has already yielded a StreamError to the client. The handler
in the retry loop was yielding a second StreamError for non-transient
errors (e.g. circuit breaker trips) and when transient retries were
exhausted, causing the client to receive duplicate error events.

Remove the redundant yield since the StreamError was already sent.
2026-04-02 17:03:40 +02:00
Zamil Majdy
9ab6082a23 fix(frontend): handle credential deletion errors with toast feedback
handleDeleteConfirm now catches API errors and shows a destructive
toast instead of silently failing. It also checks for
need_confirmation responses when the credential is still in use.
2026-04-02 17:03:00 +02:00
Zamil Majdy
2c517ff9a1 feat(platform): add per-provider cost extraction
- OpenRouter: Extract actual USD cost from x-total-cost response header
- Exa (search, contents): Write cost_dollars.total to execution_stats
- LLM blocks: Store provider_cost in stats when available
- Add provider_cost field to NodeExecutionStats
- Hook now converts provider_cost to costMicrodollars in PlatformCostLog
- Metadata includes both credit_cost and provider_cost_usd when available
2026-04-02 16:57:34 +02:00
Zamil Majdy
7020ae2189 fix(backend): handle NULL userId in platform cost models and queries
Make user_id Optional[str] in UserCostSummary and CostLogRow to handle
cases where the referenced user has been deleted. Use .get() for safe
access to user_id from query result rows. Regenerate OpenAPI schema.
2026-04-02 16:54:09 +02:00
Zamil Majdy
a49ac5ba13 fix(frontend): update CredentialsProvidersContext state on credential deletion
The delete mutation was using useDeleteV1DeleteCredentials which only
invalidated React Query caches but did not update the context's own
useState-managed credential list. Switch to the context's
deleteCredentials method which both calls the API and removes the
credential from the provider state, so the UI updates immediately.
2026-04-02 16:54:01 +02:00
Zamil Majdy
2a969e5018 fix(backend/copilot): yield final StreamError after transient retry exhaustion for _HandledStreamError
When _run_stream_attempt raises a _HandledStreamError and all transient
retries are exhausted, the outer retry loop sets ended_with_stream_error
but stream_err remains None. The post-loop code only emits a StreamError
when stream_err is not None, so the SSE stream closes silently and the
frontend never learns the request failed.

Yield a StreamError with the attempt's error message and code just before
breaking out of the retry loop, ensuring clients always receive an error
notification.
2026-04-02 16:49:18 +02:00
Zamil Majdy
79005b1be5 fix(backend): move audit log after user existence check in set_user_rate_limit_tier
The tier-change audit log was written before verifying the user exists,
creating misleading log entries for non-existent users. Move the user
existence check (via get_user_email_by_id) before the audit log and
remove the now-redundant prisma.errors.RecordNotFoundError catch.
2026-04-02 16:48:48 +02:00
Zamil Majdy
4f8cdbee47 Merge remote-tracking branch 'origin/codex/platform-cost-tracking' into combined-preview-test 2026-04-02 16:42:12 +02:00
Zamil Majdy
3ed444dd60 Merge remote-tracking branch 'origin/fix/copilot-credential-setup-ui' into combined-preview-test 2026-04-02 16:42:12 +02:00
Zamil Majdy
83e747ebcd Merge remote-tracking branch 'origin/fix/copilot-tool-output-e2b-bridging' into combined-preview-test 2026-04-02 16:42:12 +02:00
Zamil Majdy
827f2b0f87 Merge origin/fix/copilot-p0-cli-internals (resolved conflicts) 2026-04-02 16:42:12 +02:00
Zamil Majdy
b0d5d3b95e Merge origin/fix/copilot-subagent-security (resolved conflicts) 2026-04-02 16:42:12 +02:00
Zamil Majdy
eb9244be1a Merge origin/feat/copilot-mode-toggle (resolved conflicts) 2026-04-02 16:42:11 +02:00
Zamil Majdy
dd17e83299 Merge remote-tracking branch 'origin/feat/copilot-include-graph-option' into combined-preview-test 2026-04-02 16:42:11 +02:00
Zamil Majdy
74009bedac Merge origin/feat/rate-limit-tiering (resolved conflicts) 2026-04-02 16:42:11 +02:00
Zamil Majdy
72d0c8dad8 Merge remote-tracking branch 'origin/feat/agent-generation-dry-run-loop' into combined-preview-test 2026-04-02 16:42:11 +02:00
Zamil Majdy
e860f164e4 Merge remote-tracking branch 'origin/fix/dry-run-special-blocks' into combined-preview-test 2026-04-02 16:42:11 +02:00
Zamil Majdy
b9336984be fix(platform): re-add credit_cost to platform cost log metadata
Include the block's credit cost (from block_cost_config) in the log
metadata so every entry has a known cost proxy even when the provider
doesn't expose actual dollar costs.
2026-04-02 16:37:28 +02:00
Zamil Majdy
9924dedddc fix(platform): address bot review comments (sentry + coderabbit)
- CRITICAL: Use execute_raw_with_schema for INSERT (not query_raw)
- Remove accidentally committed transcripts/
- Add dry_run guard to skip cost logging for simulated executions
- Change onDelete: Cascade → SetNull to preserve cost history
- Add standalone createdAt index for date-only queries
- Add deterministic tiebreaker (id) to pagination ORDER BY
- Update migration SQL to match schema changes
2026-04-02 16:26:01 +02:00
Zamil Majdy
c054799b4f fix: regenerate API schema and block docs 2026-04-02 16:23:12 +02:00
Zamil Majdy
004d3957b3 docs: regenerate misc.md block docs after dev merge 2026-04-02 16:20:51 +02:00
Zamil Majdy
f3b5d584a3 fix(platform): address PR review round 5
- Replace ServerCrash icon with Receipt for Platform Costs sidebar
2026-04-02 16:02:00 +02:00
Zamil Majdy
476d9dcf80 fix(platform): address PR review round 4
- Add tests for query parameter forwarding and pagination
2026-04-02 16:00:08 +02:00
Zamil Majdy
072b623f8b fix(platform): address PR review round 3
- Remove duplicate block_usage_cost call from cost logging
- Add case-insensitive provider filter using LOWER()
- Add platform_cost_routes_test.py with basic endpoint tests
2026-04-02 15:58:00 +02:00
Zamil Majdy
a68f48e6b7 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-p0-cli-internals 2026-04-02 15:55:59 +02:00
Zamil Majdy
60e2474640 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/agent-generation-dry-run-loop 2026-04-02 15:55:58 +02:00
Zamil Majdy
a892bbd4dd Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/dry-run-special-blocks 2026-04-02 15:55:56 +02:00
Zamil Majdy
538e8619da Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-mode-toggle 2026-04-02 15:55:54 +02:00
Zamil Majdy
4edb1f6e4a Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-04-02 15:55:50 +02:00
Zamil Majdy
480d58607d Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-subagent-security 2026-04-02 15:55:49 +02:00
Zamil Majdy
8561eb35f2 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-include-graph-option 2026-04-02 15:55:47 +02:00
Zamil Majdy
0b4acd73f4 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-tool-output-e2b-bridging 2026-04-02 15:55:45 +02:00
Zamil Majdy
e9fe2991d6 chore: remove accidentally committed test screenshots 2026-04-02 15:55:23 +02:00
Zamil Majdy
26b0c95936 fix(platform): address PR review round 2
- Parallelize dashboard queries with asyncio.gather for ~3x speedup
- Move json import to top-level
- Use consistent p. table alias across all dashboard queries
2026-04-02 15:55:03 +02:00
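The parallelization mentioned in commit 26b0c95936 follows the usual `asyncio.gather` pattern. In this minimal sketch the three query names and `fake_query` are placeholders for the real dashboard queries, and the sleep only simulates database latency.

```python
import asyncio
from typing import Dict


async def fake_query(name: str, delay: float) -> str:
    # Placeholder for an awaitable database query.
    await asyncio.sleep(delay)
    return name


async def load_dashboard() -> Dict[str, str]:
    # The three awaitables run concurrently; gather returns results
    # in the order the awaitables were passed in.
    by_provider, by_user, recent_logs = await asyncio.gather(
        fake_query("by_provider", 0.02),
        fake_query("by_user", 0.02),
        fake_query("recent_logs", 0.02),
    )
    return {
        "by_provider": by_provider,
        "by_user": by_user,
        "recent_logs": recent_logs,
    }
```

When the queries are independent and I/O bound, total latency approaches that of the slowest query rather than the sum of all of them.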
Zamil Majdy
735965bbe5 docs: regenerate misc.md block docs after dev merge 2026-04-02 15:54:14 +02:00
Zamil Majdy
a8f9ed0f60 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into zamilmajdy/secrt-2171-sql-query-block-for-copilotautopilot-analytics-access 2026-04-02 15:53:49 +02:00
Zamil Majdy
308357de84 fix(platform): address PR review round 1
- Parameterize LIMIT/OFFSET in SQL queries to prevent injection
- Only log platform cost on successful block execution
- Convert model enum values to strings for proper logging
- Add error handling with try/catch/finally in frontend useEffect
- Drive filter state from URL params to prevent desync
- Add dark mode support using design tokens
- Return total_users count in dashboard for accurate reporting
- Add credit_cost to metadata as cost proxy until per-token pricing
2026-04-02 15:51:28 +02:00
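The first item of commit 308357de84, parameterized LIMIT/OFFSET, can be sketched with the standard-library `sqlite3` module. This is an illustration only: the table `cost_log`, the helper names, and the page-size cap of 100 are assumptions, and the project's own database driver and placeholder syntax may differ.

```python
import sqlite3
from typing import List, Tuple


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with five demo rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cost_log (id INTEGER PRIMARY KEY, provider TEXT)")
    conn.executemany(
        "INSERT INTO cost_log (id, provider) VALUES (?, ?)",
        [(i, f"provider-{i}") for i in range(1, 6)],
    )
    return conn


def fetch_page(conn: sqlite3.Connection, page: int, page_size: int) -> List[Tuple[int, str]]:
    # Coerce to int and clamp first, then bind as parameters so the
    # values are never interpolated into the SQL text.
    limit = max(1, min(int(page_size), 100))
    offset = max(0, (int(page) - 1) * limit)
    cursor = conn.execute(
        "SELECT id, provider FROM cost_log ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return cursor.fetchall()
```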
Zamil Majdy
1a6c50c6cc feat(platform): add platform cost tracking for system credentials
Track real API costs incurred when users consume system-managed credentials.
Captures provider, tokens, duration, and model per block execution and
surfaces an admin dashboard with provider/user aggregation and raw logs.
2026-04-02 15:42:18 +02:00
Zamil Majdy
9391dfa4b2 docs: regenerate block documentation to sync with code 2026-04-02 15:39:07 +02:00
Zamil Majdy
6b031085bd feat(platform): add generic ask_question copilot tool (#12647)
### Why / What / How

**Why:** The copilot can ask clarifying questions in plain text, but
that text gets collapsed into hidden "reasoning" UI when the LLM also
calls tools in the same turn. This makes clarification questions
invisible to users. The existing `ClarificationNeededResponse` model and
`ClarificationQuestionsCard` UI component were built for this purpose
but had no tool wiring them up.

**What:** Adds a generic `ask_question` tool that produces a visible,
interactive clarification card instead of collapsible plain text. Unlike
the agent-generation-specific `clarify_agent_request` proposed in
#12601, this tool is workflow-agnostic — usable for agent building,
editing, troubleshooting, or any flow needing user input.

**How:** 
- Backend: New `AskQuestionTool` reuses existing
`ClarificationNeededResponse` model. Registered in `TOOL_REGISTRY` and
`ToolName` permissions.
- Frontend: New `AskQuestion/` renderer reuses
`ClarificationQuestionsCard` from CreateAgent. Registered in
`CUSTOM_TOOL_TYPES` (prevents collapse into reasoning) and
`MessagePartRenderer`.
- Guide: `agent_generation_guide.md` updated to reference `ask_question`
for the clarification step.

### Changes 🏗️

- **`copilot/tools/ask_question.py`** — New generic tool: takes
`question`, optional `options[]` and `keyword`, returns
`ClarificationNeededResponse`
- **`copilot/tools/__init__.py`** — Register `ask_question` in
`TOOL_REGISTRY`
- **`copilot/permissions.py`** — Add `ask_question` to `ToolName`
literal
- **`copilot/sdk/agent_generation_guide.md`** — Reference `ask_question`
tool in clarification step
- **`ChatMessagesContainer/helpers.ts`** — Add `tool-ask_question` to
`CUSTOM_TOOL_TYPES`
- **`MessagePartRenderer.tsx`** — Add switch case for
`tool-ask_question`
- **`AskQuestion/AskQuestion.tsx`** — Renderer reusing
`ClarificationQuestionsCard`
- **`AskQuestion/helpers.ts`** — Output parsing and animation text

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Backend format + pyright pass
  - [x] Frontend lint + types pass
  - [x] Pre-commit hooks pass
- [ ] Manual test: copilot uses `ask_question` and card renders visibly
(not collapsed)
2026-04-02 12:56:48 +00:00
Zamil Majdy
6a69d7c68d fix(backend): hoist _COMMON_CRED_KEYS to module level, conditional source-code instruction
- Move _COMMON_CRED_KEYS to a module-level frozenset to avoid recreating
  it on every call to build_simulation_prompt
- Make the "Study the block's run() source code" instruction conditional
  on source code actually being available, falling back to a generic
  description-based instruction
2026-04-02 14:51:09 +02:00
Zamil Majdy
ad77e881c9 fix(backend/copilot): strip stale thinking blocks in upload_transcript
Add strip_stale_thinking_blocks() call to upload_transcript() alongside
the existing strip_progress_entries(). When a user switches from SDK
(extended_thinking) to baseline (fast) mode and back, the re-downloaded
transcript may contain stale thinking blocks from the SDK session.
Without stripping, these blocks consume significant tokens and trigger
unnecessary compaction cycles.
2026-04-02 14:50:50 +02:00
Zamil Majdy
f1aedfeedd fix(backend): guard against None name in _default_for_input_result
When name key exists with explicit None value, _default_for_input_result
would return None for string-typed pins instead of a string. Add
fallback to "sample input" and fix the type hint to reflect nullable.
2026-04-02 14:48:06 +02:00
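The pitfall behind commit f1aedfeedd is that `dict.get` only applies its default when the key is missing, not when the key maps to `None`. A minimal sketch, with `default_name` as a hypothetical stand-in for the relevant part of `_default_for_input_result`:

```python
from typing import Any, Dict


def default_name(input_result: Dict[str, Any]) -> str:
    # `or` covers both a missing key and an explicit None value.
    return input_result.get("name") or "sample input"
```

Note that `or` also replaces an empty string with the fallback, which may or may not be desired in the real code.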
Zamil Majdy
49c7ab4011 fix(backend/copilot): set correct stop_reason in baseline transcript entries
Set stop_reason="tool_use" for assistant messages with tool calls and
stop_reason="end_turn" for final text responses. This ensures the
transcript format is compatible with the SDK's --resume flag when a
user switches from fast to extended_thinking mode mid-conversation.
2026-04-02 14:39:47 +02:00
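The rule in commit 49c7ab4011 reduces to a single branch. In this sketch the message shape (a dictionary with an optional `tool_calls` list) and the function name are assumptions made for illustration; only the two `stop_reason` values come from the commit message.

```python
from typing import Any, Dict


def stop_reason_for(assistant_message: Dict[str, Any]) -> str:
    # Messages that request tool calls end with "tool_use";
    # final text responses end with "end_turn".
    return "tool_use" if assistant_message.get("tool_calls") else "end_turn"
```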
Zamil Majdy
2d04584c84 fix(backend/copilot): correct outdated E2B bridge threshold in system prompt
The prompt said files >5 MB go to /home/user/ but the actual threshold
was lowered to 32 KB. Replace with a generic description that avoids
hardcoding the threshold and directs the model to the [Sandbox copy
available at ...] annotation instead.
2026-04-02 14:39:35 +02:00
Zamil Majdy
2578f61abb fix(backend): remove dead simulation_context param, fix options rename, dedupe constant
- Remove unused simulation_context parameter from simulate_block, RunAgentInput, and _run_agent
- Update placeholder_values references to options (renamed in #12595), with fallback for legacy data
- Remove duplicate _THINKING_BLOCK_TYPES definition in transcript.py
- Update tests to use options field name
2026-04-02 14:38:28 +02:00
Zamil Majdy
927c6e7db0 fix(frontend): add aria-label and disabled state to mode toggle button
- Add aria-label for screen reader accessibility
- Disable button during streaming to prevent confusing mode switches mid-turn
- Add opacity/cursor styling when disabled
2026-04-02 14:38:00 +02:00
Zamil Majdy
f753e6162f fix(backend): consolidate test_agent_search.py into agent_search_test.py
The test file used prefix naming (test_*.py) which is inconsistent with
the codebase convention (*_test.py). Moved all tests into the existing
agent_search_test.py file and removed the duplicate.
2026-04-02 14:37:38 +02:00
Zamil Majdy
b996bc556b fix(backend): clamp search_users limit to [1, 50] to prevent negative take values
A negative limit query parameter would pass through min(limit, 50) as
a negative value to Prisma's take parameter, causing unexpected behavior.
Added max(1, ...) clamping and test coverage for the edge case.
2026-04-02 14:37:02 +02:00
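The clamping fix in commit b996bc556b can be shown in a few lines. The function name `clamp_limit` is hypothetical; the bounds 1 and 50 come from the commit subject.

```python
def clamp_limit(limit: int, lower: int = 1, upper: int = 50) -> int:
    # min() alone caps the top but lets negatives through;
    # wrapping it in max() enforces the lower bound too.
    return max(lower, min(limit, upper))
```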
Zamil Majdy
e4f79261c1 fix(docs): correct host field type from "str (password)" to "str (secret)"
The host field is marked as secret=True (hidden in UI) but is not a password.
The "(password)" label was misleading.
2026-04-02 14:36:12 +02:00
Zamil Majdy
09bc939498 chore: remove accidentally committed test screenshots
These binary images and log files inflate the repository and are not
needed for CI or code review.
2026-04-02 14:35:26 +02:00
Zamil Majdy
79c5a10f75 fix(backend/copilot): add missing security test for tool-outputs path allowlist
The allowlist was expanded to accept tool-outputs/ in addition to
tool-results/, but security_hooks_test.py only verified tool-results.
Add test_read_tool_outputs_allowed to close the security test coverage gap.
2026-04-02 14:35:18 +02:00
Zamil Majdy
2bf5a37646 fix(backend): add ge/le bounds to claude_agent_max_transient_retries config field
The field lacked validation bounds unlike max_turns and max_budget_usd,
allowing negative or excessively large values to be configured.
2026-04-02 14:35:09 +02:00
Zamil Majdy
d5d24e6e66 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/dry-run-special-blocks 2026-04-02 14:34:50 +02:00
Zamil Majdy
c9cbd7531e fix(copilot): sanitize tool_use_id and resp_preview in post_tool_use_hook, remove test-results
- post_tool_use_hook logged tool_use_id with only truncation ([:12]) while
  post_tool_failure_hook properly sanitized it via _sanitize(). Now both hooks
  use _sanitize() consistently to strip control characters before logging.
- resp_preview from tool_response was also logged without sanitization.
- Remove test-results/ directory that should not ship in a production PR.
2026-04-02 14:34:46 +02:00
Zamil Majdy
289a19d402 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-p0-cli-internals 2026-04-02 14:34:33 +02:00
Zamil Majdy
7800af1835 fix(backend): remove duplicate _THINKING_BLOCK_TYPES definition in transcript.py
The constant was already defined at module level (line 48) and used by both
_strip_thinking_from_non_last_assistant and _flatten_assistant_content. The
duplicate added at line 692 was redundant.
2026-04-02 14:34:03 +02:00
Zamil Majdy
114f91ff53 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-mode-toggle 2026-04-02 14:32:47 +02:00
Zamil Majdy
63a0153e4f fix(platform): fix ClamAV connectivity in Docker containers
clamd was only listening on 127.0.0.1 inside its container, so
container-to-container connections on the Docker network were refused.

- Add CLAMD_CONF_TCPAddr=0.0.0.0 to docker-compose so clamd binds
  to all interfaces
- Change default clamav_service_host from "localhost" to "clamav"
  (the docker-compose service name), matching how other services
  like redis, rabbitmq, supabase-db are referenced
2026-04-02 13:34:50 +02:00
Zamil Majdy
1364616ff1 fix(frontend+backend): prefill block inputs and hide advanced in CoPilot setup card
Backend:
- get_inputs_from_schema() now accepts input_data to populate each field's
  value with what CoPilot already provided, and includes the advanced flag
  from the schema so the frontend can hide non-essential fields.

Frontend:
- SetupRequirementsCard prefills form inputs from backend-provided values
  instead of showing empty forms
- Advanced fields hidden by default with "Show advanced fields" toggle
  (matching builder behaviour)
- siblingInputs built from both input values and discriminator_values
  so the host pattern modal can extract the host from the URL
- extractInitialValues() populates form state from prefilled values
2026-04-02 13:24:49 +02:00
Zamil Majdy
4e9169c1a2 fix(frontend): prefill host pattern in CoPilot credential setup modal
The SetupRequirementsCard passed inputValues={{}} to CredentialsGroupedView,
which meant the HostScopedCredentialsModal never received the target URL
from the backend's discriminator_values. The "Host Pattern" field was always
empty even though the CoPilot knew the exact host (e.g. api.openai.com).

Add buildSiblingInputsFromCredentials() to extract the discriminator value
(URL) from the missing_credentials setup_info and pass it as siblingInputs
so the modal can prefill the host pattern.
2026-04-02 12:40:45 +02:00
Zamil Majdy
705e97ec46 fix(backend/copilot): don't cache DEFAULT_TIER for non-existent users
When `_fetch_user_tier` is called for a user that doesn't exist yet, it
was returning `DEFAULT_TIER` (FREE) which the `@cached` decorator would
store for 5 minutes. If the user was then created with a higher tier
(e.g. PRO), they'd receive the stale cached FREE tier until TTL expiry.

Fix: raise `_UserNotFoundError` instead of returning `DEFAULT_TIER` when
the user record is missing or has no subscription tier. The `@cached`
decorator only caches successful returns, not exceptions. The outer
`get_user_tier` wrapper catches the exception and returns `DEFAULT_TIER`
without caching, so the next call re-queries the database.

Adds a regression test verifying that a not-found result is not cached
and a subsequent lookup after user creation returns the correct tier.
2026-04-02 11:33:10 +02:00
Zamil Majdy
8cea0bede0 fix(backend): generate type-appropriate dry-run fallback for typed AgentInputBlock subclasses
The simulator's AgentInputBlock passthrough always generated a string
fallback when no user value was provided. Typed subclasses like
AgentNumberInputBlock (int), AgentDateInputBlock (date), and
AgentToggleInputBlock (bool) then failed downstream validation.

Inspect the block's output schema `result` pin to determine the expected
type and generate an appropriate default (0 for int, today's date for
date, false for bool, etc.) instead of a plain string.
2026-04-02 11:32:51 +02:00
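Illustrative sketch for the commit above: choosing a fallback value from the declared type of an output pin. It assumes the pin type is expressed with JSON Schema type names and an optional `format`; the helper name `fallback_for` and the exact mapping are hypothetical.

```python
from datetime import date
from typing import Any, Optional


def fallback_for(json_type: str, fmt: Optional[str] = None) -> Any:
    """Return a default that should pass validation for the given type."""
    if json_type == "integer":
        return 0
    if json_type == "number":
        return 0.0
    if json_type == "boolean":
        return False
    if json_type == "string" and fmt == "date":
        return date.today().isoformat()
    if json_type == "array":
        return []
    if json_type == "object":
        return {}
    # Unknown or plain string types fall back to an empty string.
    return ""
```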
Zamil Majdy
5e52050788 fix(backend): re-enable dry-run input validation and fill missing simulator output pins
1. _base.py: Instead of blanket-skipping input validation in dry-run mode,
   validate non-credential fields so blocks executing for real (e.g.
   AgentExecutorBlock) still get proper input validation. Credential fields
   are excluded since they contain sentinel None values.

2. simulator.py: After yielding LLM-simulated outputs, fill in
   type-appropriate defaults for any required output pins the LLM omitted.
   This prevents downstream nodes from stalling in INCOMPLETE state when
   the simulation response is incomplete.
2026-04-02 11:11:57 +02:00
Zamil Majdy
0b77af29aa fix(platform): address PR reviewer blockers and should-fix items
- Add schema.prisma comment documenting intentional @default(PRO) for beta
- Validate tier value in RateLimitDisplay.tsx to prevent undefined renders
- Add user existence check (404) in get_user_rate_limit_tier endpoint
- Add auth test for search_users endpoint
- Add tier downgrade test (PRO -> FREE)
- Add test for get_user_rate_limit_tier with non-existent user
2026-04-02 11:10:31 +02:00
Zamil Majdy
bd4cc21fc6 fix(backend/copilot): preserve successful graph fetches on timeout and use py3.10-compat wait_for
Remove the blanket `a.graph = None` loop in the TimeoutError handler that
was wiping already-fetched graphs. Agents that completed before the timeout
keep their results; agents still pending already have graph=None from the
model default.

Also replace `asyncio.timeout()` (Python 3.11+) with `asyncio.wait_for()`,
which has been available since Python 3.4, to match the `python >= 3.10`
requirement in pyproject.toml.

Add tests for the timeout path, success path, and skip-no-graph-id path.
2026-04-02 11:08:20 +02:00
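Illustrative sketch for the commit above: bounding a batch of fetches with `asyncio.wait_for()` while keeping the results that finished before the deadline. The `Agent` dataclass, `_fetch`, and `enrich` are hypothetical stand-ins, not the project's types.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    name: str
    delay: float
    graph: Optional[str] = None  # stays None until a fetch completes


async def _fetch(agent: Agent) -> None:
    await asyncio.sleep(agent.delay)  # simulated network call
    agent.graph = f"graph-for-{agent.name}"


async def enrich(agents: List[Agent], timeout: float) -> None:
    try:
        await asyncio.wait_for(
            asyncio.gather(*(_fetch(a) for a in agents)),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        # Do not reset anything here: finished agents keep their graph,
        # cancelled ones still hold the default None.
        pass
```

Catching `asyncio.TimeoutError` works on Python 3.10 as well as on 3.11+, where it is an alias of the built-in `TimeoutError`.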
Zamil Majdy
19ea753639 fix(backend/copilot): address review feedback on _bridge_to_sandbox
- Use asyncio.to_thread for synchronous file read in async context
- Promote bridge failure logging from DEBUG to WARNING
- Extract magic number 2000 to _DEFAULT_READ_LIMIT named constant
2026-04-02 11:07:48 +02:00
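Illustrative sketch for the commit above: moving a blocking file read off the event loop with `asyncio.to_thread()`. Treating `_DEFAULT_READ_LIMIT` as a byte count is an assumption made for the example, as are the helper names.

```python
import asyncio

_DEFAULT_READ_LIMIT = 2000  # assumed here to be a byte limit


def _read_prefix(path: str, limit: int) -> bytes:
    # Ordinary blocking I/O; safe because it runs in a worker thread.
    with open(path, "rb") as fh:
        return fh.read(limit)


async def read_preview(path: str, limit: int = _DEFAULT_READ_LIMIT) -> bytes:
    # The event loop stays free to run other tasks while the file is read.
    return await asyncio.to_thread(_read_prefix, path, limit)
```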
Zamil Majdy
07fd734fa1 style: reformat _base.py (black) 2026-04-02 10:56:09 +02:00
Zamil Majdy
8a4a16ec5c fix(backend): don't null credentials in dry-run prepare_dry_run
validate_data strips None values from input_data before JSON schema
validation. Setting credentials=None caused the field to be absent,
failing the required check. Keep original credentials in input_data
(actual platform creds injected via extra_exec_kwargs in manager.py).

This fixes OrchestratorBlock failing with "credentials is a required
property" when executed as part of a child graph in dry-run mode.
2026-04-02 10:35:08 +02:00
Zamil Majdy
5551da674e fix(blocks): skip input validation in dry-run mode for blocks with sentinel credentials
Two fixes for dry-run execution of nested agents:

1. _base.py: Skip validate_data() when execution_context.dry_run is True.
   prepare_dry_run() sets credentials=None for OrchestratorBlock (platform
   key injected separately), but the block's own JSON schema validation
   rejected None as "required property". This caused any dry-run execution
   of graphs containing OrchestratorBlock to fail with BlockInputError.

2. agent.py: Check required inner-agent inputs against data["inputs"]
   instead of top-level data keys (previous commit 6f03ceeb88).
2026-04-02 10:33:22 +02:00
Zamil Majdy
e57e48272a security: remove test artifacts containing leaked API keys and OAuth tokens 2026-04-02 10:23:21 +02:00
Zamil Majdy
6f03ceeb88 fix(blocks): validate AgentExecutorBlock inputs against nested inputs dict
get_missing_input() and get_mismatch_error() were checking required fields
from the inner agent's input_schema against the top-level node data keys
(inputs, user_id, graph_id, etc.) instead of against data["inputs"] where
the actual field values live. This caused any AgentExecutorBlock with
required inner-agent inputs to fail validation with "This field is required"
even when the values were correctly provided in the inputs dict.
2026-04-02 10:10:06 +02:00
Zamil Majdy
554ff0b20b dx(backend/copilot): add live execution test evidence for subagent security hooks
Test results from live execution showing SubagentStart/SubagentStop hooks
firing correctly for two parallel Agent tool invocations with proper
slot tracking (active=N/10) and JSONL transcript persistence.
2026-04-02 10:06:56 +02:00
Zamil Majdy
c2f421cb42 dx(backend/copilot): add live execution guardrail verification for PR #12636
Programmatic verification from running container proving all P0 guardrails
are deployed and active: max_turns=50, max_budget_usd=5.0,
fallback_model=claude-sonnet-4-20250514, max_transient_retries=3,
security env vars, and _last_reset_attempt infinite-loop fix.
2026-04-02 10:01:46 +02:00
Zamil Majdy
dd228de17d fix(backend/copilot): preserve binary files when bridging to E2B sandbox
_bridge_to_sandbox was decoding all file content with
`errors='replace'`, silently corrupting non-UTF-8 bytes (images, PDFs,
etc.) by replacing them with U+FFFD.

Now attempts strict UTF-8 decode first; on failure writes raw bytes
via sandbox.files.write() (which accepts Union[str, bytes, IO]) or
base64-encoded shell pipe for /tmp paths.

Also updates _sandbox_write to accept str | bytes and adds tests for
both small and large binary file bridging.
2026-04-02 09:56:52 +02:00
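Illustrative sketch for the commit above: try a strict UTF-8 decode and fall back to the untouched bytes, rather than decoding with `errors="replace"`. The helper name `text_or_raw` is hypothetical.

```python
from typing import Union


def text_or_raw(content: bytes) -> Union[str, bytes]:
    """Return text for valid UTF-8, otherwise the original bytes unchanged."""
    try:
        return content.decode("utf-8")  # strict by default
    except UnicodeDecodeError:
        return content


# For contrast, the lossy behaviour the commit removes:
lossy = b"\xff".decode("utf-8", errors="replace")
```

A caller can then branch on `isinstance(result, bytes)` to pick a binary-safe write path.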
Zamil Majdy
c26ff22f9c fix(blocks): allow MySQL SELECT INTO @variable syntax in SQL query validation
The INTO keyword was blanket-blocked in _DISALLOWED_KEYWORDS, which
incorrectly rejected the valid read-only MySQL syntax
`SELECT ... INTO @variable` for session variable assignment.

Replace the blanket INTO ban with a contextual check that allows
INTO followed by @-prefixed user variables while still blocking:
- SELECT INTO table_name (PG/MSSQL table creation)
- SELECT INTO OUTFILE/DUMPFILE (MySQL filesystem writes)
- INSERT INTO (already caught by INSERT, but defense-in-depth)

Also remove dead OUTFILE/DUMPFILE entries from _DISALLOWED_KEYWORDS
since sqlparse classifies them as Name tokens, not Keywords, so
they were never matched by the keyword extraction logic.
2026-04-02 09:56:51 +02:00
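Illustrative sketch for the commit above: a contextual `INTO` check that accepts only `@`-prefixed targets. The commit describes a token-based check built on sqlparse; this example swaps in a plain regular expression to stay self-contained, so it is deliberately conservative (it also rejects the word "into" inside string literals). The function name is hypothetical.

```python
import re

# Capture the first non-whitespace run that follows the INTO keyword.
_INTO_TARGET = re.compile(r"\bINTO\b\s*(\S*)", re.IGNORECASE)


def into_usage_is_read_only(query: str) -> bool:
    """True if every INTO in the query targets an @user_variable."""
    for match in _INTO_TARGET.finditer(query):
        if not match.group(1).startswith("@"):
            return False
    return True
```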
Zamil Majdy
760360fbe9 fix(backend): use deterministic SHA-256 hash for Redis cache keys
Python's built-in `hash()` is randomised per-process via PYTHONHASHSEED.
In a multi-pod deployment each pod computes a different hash for the same
arguments, causing Redis cache lookups and invalidations (e.g.
`cache_delete`) to silently miss across pods.

Replace `hash()` with `hashlib.sha256` over the `repr()` of the key
tuple, which is deterministic across processes and machines.
2026-04-02 09:56:35 +02:00
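Illustrative sketch for the commit above: deriving a cache key from SHA-256 over the `repr()` of the arguments, so that every process computes the same key. The key layout and the function name `make_cache_key` are assumptions.

```python
import hashlib


def make_cache_key(func_name: str, args: tuple, kwargs: dict) -> str:
    # Sort kwargs so that call-site ordering does not change the key.
    key_tuple = (args, tuple(sorted(kwargs.items())))
    digest = hashlib.sha256(repr(key_tuple).encode("utf-8")).hexdigest()
    return f"cache:{func_name}:{digest}"
```

This is only deterministic when the arguments have a stable `repr()`; objects that fall back to the default `<... at 0x...>` form would still produce per-process keys.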
Zamil Majdy
e3d589b180 fix(backend/copilot): exclude StreamError/StreamStatus from events_yielded counter
StreamError and StreamStatus are ephemeral notifications, not content
events. When _run_stream_attempt yields a StreamError for a transient
API error before raising _HandledStreamError, the events_yielded counter
was incremented, causing _next_transient_backoff() to return None and
bypassing the retry logic entirely. Exclude these event types from the
counter so transient errors are properly retried with exponential backoff.
2026-04-02 09:56:34 +02:00
Zamil Majdy
913d93f47c test: add E2E dry-run loop validation screenshots (round 4)
Unit tests (37/37 pass) and browser E2E test confirm the full
create -> dry-run -> inspect -> fix -> dry-run loop is working.
2026-04-02 09:49:26 +02:00
Zamil Majdy
03e5d37dc4 test(backend/copilot): add E2E test screenshots for PR #12646 round 1 2026-04-02 09:38:04 +02:00
Zamil Majdy
6e2dab413e test(backend): add E2E test screenshots for SQL Query block PR #12569
Screenshots from round 3 E2E testing:
- Block search results showing SQL Query block
- Basic fields: DatabaseType, Host, Database, Query, Credentials
- Advanced fields: Port, ReadOnly, Timeout, MaxRows
- Credential modal with Username & Password labels
2026-04-02 09:35:25 +02:00
Zamil Majdy
b10dc7c2d5 ci(backend/copilot): add E2E test evidence for rate-limit tiering (round 4) 2026-04-02 09:25:26 +02:00
Zamil Majdy
8de935c84b dx(backend/copilot): add round 3 E2E test screenshots for PR #12636 2026-04-02 09:20:32 +02:00
Zamil Majdy
dd34b0dc48 fix(backend): lower bridge shell threshold and add collision-free sandbox paths
- Lower _BRIDGE_SHELL_MAX_BYTES from 5 MB to 32 KB to stay within
  ARG_MAX when base64-encoding content for shell transfer.
- Prefix bridged sandbox filenames with a 12-char SHA-256 hash of the
  full source path to prevent collisions when different source files
  share the same basename (e.g. multiple result.json files).
- Fix potential NameError in exception handler when basename is not yet
  assigned.
2026-04-02 09:07:43 +02:00
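Illustrative sketch for the commit above: prefixing a bridged filename with a 12-character SHA-256 hash of the full source path, so that files sharing a basename do not overwrite each other. The `<hash>-<basename>` layout and the function name are assumptions.

```python
import hashlib
import os


def bridged_name(source_path: str) -> str:
    prefix = hashlib.sha256(source_path.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}-{os.path.basename(source_path)}"
```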
Zamil Majdy
015e0d591e fix(backend/copilot): remove type: ignore from conftest, use named fixtures
Address CodeRabbit review: remove # type: ignore[override] from SDK
conftest fixtures per AGENTS.md no-suppressor rule. Use name= parameter
in pytest_asyncio.fixture decorator with private function names instead.
2026-04-02 08:29:06 +02:00
Zamil Majdy
2cb65f5c34 fix(backend/copilot): use working_dir in prompt examples instead of hardcoded /home/user
The storage supplement template and _persist_and_summarize had hardcoded
/home/user/ paths in save_to_path examples. In local (bubblewrap) mode
the working dir is /tmp/copilot-<session>/, not /home/user/. Use the
{working_dir} template variable in prompting.py and a generic
<working_dir> placeholder in base.py so the model gets correct paths
regardless of execution mode.
2026-04-02 08:26:18 +02:00
Zamil Majdy
3a49086c3d fix(backend/copilot): use resolved path for bridging, explicit return None
- Pass `resolved` (realpath-expanded) to `_bridge_to_sandbox` in
  `_read_file_handler` so the bridge target matches the file that was
  actually read (addresses review comment).
- Replace bare `return` with explicit `return None` in
  `_bridge_to_sandbox` large-file skip path for consistency with the
  declared `str | None` return type.
2026-04-02 08:19:25 +02:00
Zamil Majdy
0e567df1da fix(backend/copilot): add concrete tool examples to file copy prompting
The "Moving files between storages" section only had direction labels
("Sandbox → Persistent") with no tool examples, so the model had no way to
know how to copy. It now shows write_workspace_file(source_path=...) for
upload and read_workspace_file(save_to_path=...) for download.
2026-04-02 08:15:59 +02:00
Zamil Majdy
b5b754d5eb fix(backend/copilot): return sandbox path from bridge, inform model of copy location
Address CodeRabbit review: _bridge_to_sandbox now returns the sandbox
path (or None on failure) so callers can append "[Sandbox copy available
at /tmp/file.json]" to the Read result. This gives the model explicit
feedback about where to find the file in the sandbox, instead of
silently bridging with no indication.
2026-04-02 08:03:36 +02:00
Zamil Majdy
456bb1c4d0 fix(frontend): use unfiltered credentials for host-scoped deduplication
The useCredentials hook pre-filters savedCredentials by discriminatorValue.
When no URL is entered yet, the filtered list is empty, causing the
deduplication logic to miss existing credentials and create duplicates.

Fix: access the full unfiltered credential list from CredentialsProvidersContext
for both the hasExistingForHost check and the delete-before-create logic.
2026-04-02 08:02:23 +02:00
Zamil Majdy
263cd0ecac fix(backend/copilot): add bridging to Read tool, size limits, prompting for images
- Add _bridge_to_sandbox call in _read_file_handler (tool_adapter.py)
  so the MCP Read tool (which the model actually uses) also bridges
  SDK-internal files into the E2B sandbox — not just the E2B read_file
- Move E2B-specific bridging text to _E2B_TOOL_NOTES (not shown in
  local bubblewrap mode)
- Add size-tiered bridging: shell base64 for <=5MB, files API for
  5-50MB, skip for >50MB
- Add CRITICAL prompting sections for binary/image data handling
  (use workspace, not inline) and @@agptfile references
- Add 7 unit tests for _bridge_to_sandbox
- Fix comment accuracy in context.py, update docstring
2026-04-02 08:00:05 +02:00
Zamil Majdy
66afca6e0c fix(backend/copilot): address review feedback - size limits, prompting, tests
- Move E2B-specific bridging text from shared prompt section to E2B
  supplement's extra_notes (MAJOR 1)
- Add size cap to _bridge_to_sandbox: <=5MB uses shell base64 to /tmp,
  5-50MB uses sandbox.files.write to /home/user, >50MB skipped (MAJOR 2)
- Add 7 unit tests for _bridge_to_sandbox covering happy path, skip
  conditions, error handling, and size-based routing (MINOR 3)
- Fix inaccurate comment about tool-outputs name origin (NIT 7)
- Update is_allowed_local_path docstring to mention tool-outputs (NIT 9)
- Add prompting guidance for handling base64 images in tool outputs
  (save to workspace, show via download URL)
- Add prompting guidance for using @@agptfile: references instead of
  copy-pasting large data between tools
- Add no-op server/graph_cleanup fixtures to sdk/conftest.py so SDK
  unit tests don't require Postgres
2026-04-02 07:56:49 +02:00
Toran Bruce Richards
11b846dd49 fix(blocks): rename placeholder_values to options on AgentDropdownInputBlock (#12595)
## Summary

Resolves [REQ-78](https://linear.app/autogpt/issue/REQ-78): The
`placeholder_values` field on `AgentDropdownInputBlock` is misleadingly
named. In every major UI framework "placeholder" means non-binding hint
text that disappears on focus, but this field actually creates a
dropdown selector that restricts the user to only those values.

## Changes

### Core rename (`autogpt_platform/backend/backend/blocks/io.py`)
- Renamed `placeholder_values` → `options` on
`AgentDropdownInputBlock.Input`
- Added clear field description: *"If provided, renders the input as a
dropdown selector restricted to these values. Leave empty for free-text
input."*
- Updated class docstring to describe actual behavior
- Overrode `model_construct()` to remap legacy `placeholder_values` →
`options` for **backward compatibility** with existing persisted agent
JSON

### Tests (`autogpt_platform/backend/backend/blocks/test/test_block.py`)
- Updated existing tests to use canonical `options` field name
- Added 2 new backward-compat tests verifying legacy
`placeholder_values` still works through both `model_construct()` and
`Graph._generate_schema()` paths

### Documentation
- Updated
`autogpt_platform/backend/backend/copilot/sdk/agent_generation_guide.md`
— changed field name in CoPilot SDK guide
- Updated `docs/integrations/block-integrations/basic.md` — changed
field name and description in public docs

### Load tests
(`autogpt_platform/backend/load-tests/tests/api/graph-execution-test.js`)
- Removed spurious `placeholder_values: {}` from AgentInputBlock node
(this field never existed on AgentInputBlock)
- Fixed execution input to use `value` instead of `placeholder_values`

## Backward Compatibility

Existing agents with `placeholder_values` in their persisted
`input_default` JSON will continue to work — the `model_construct()`
override transparently remaps the old key to `options`. No database
migration needed since the field is stored inside a JSON blob, not as a
dedicated column.

## Testing

- All existing tests updated and passing
- 2 new backward-compat tests added
- No frontend changes needed (frontend reads `enum` from generated JSON
Schema, not the field name directly)

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-04-02 05:56:17 +00:00
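Illustrative sketch for the PR above: remapping a legacy key to its new name when loading persisted JSON. The PR does this inside a `model_construct()` override; this example uses a plain dictionary helper instead, and letting an explicit `options` value win over the legacy key is an assumption.

```python
def remap_legacy_keys(data: dict) -> dict:
    """Return a copy with `placeholder_values` renamed to `options`."""
    remapped = dict(data)
    if "placeholder_values" in remapped:
        legacy = remapped.pop("placeholder_values")
        # Keep an explicitly provided `options` value if both are present.
        remapped.setdefault("options", legacy)
    return remapped
```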
Zamil Majdy
a71396ee48 fix(backend): update dry-run tests for platform key + fix falsy value filter
- Mock `_get_platform_openrouter_key` in `test_prepare_dry_run_orchestrator_block`
  so the test doesn't depend on a real OpenRouter key being present in CI.
  Also fix incorrect assertion that model is preserved (it's overridden to
  the simulation model).

- Fix output filter in `simulate_block` that incorrectly dropped valid falsy
  values like `False`, `0`, and `[]`. Now only `None` and empty strings are
  skipped.

- Add `test_generic_block_preserves_falsy_values` test to cover the fix.
2026-04-02 07:52:09 +02:00
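Illustrative sketch for the commit above: a filter that skips only `None` and empty strings, in contrast to a truthiness check that would also drop `False`, `0`, and `[]`. The helper name `should_emit` is hypothetical.

```python
from typing import Any


def should_emit(value: Any) -> bool:
    """Keep every value except None and the empty string."""
    return value is not None and value != ""


candidates = [False, 0, [], None, "", "text"]
kept = [v for v in candidates if should_emit(v)]
dropped_by_truthiness = [v for v in candidates if not v]
```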
Zamil Majdy
beb43bb847 fix(frontend): replace duplicate host-scoped credentials and add delete support
- HostScopedCredentialsModal now deletes existing credentials for the same
  host before creating new ones, preventing duplicates
- Wire up delete flow: CredentialsFlatView passes onDelete to CredentialRow,
  CredentialsInput renders DeleteConfirmationModal
- Update button text to "Update headers" when credentials already exist
- Dynamic modal title/button: "Update" vs "Add" based on existing creds
2026-04-02 07:51:59 +02:00
Zamil Majdy
a55653f8c1 fix(backend): tighten fallback model detection and reset flag on retry
- Remove "overloaded" from the fallback detection pattern in _on_stderr;
  only "fallback" reliably indicates the SDK switched models. An
  "overloaded" stderr line may just be a transient 529 error that gets
  retried without activating the fallback.

- Reset fallback_model_activated = False at the start of each retry
  iteration (alongside fallback_notified) so a flag set during a failed
  attempt does not leak into the next attempt as a spurious notification.
2026-04-02 07:50:34 +02:00
Zamil Majdy
f3dd708cf6 fix(backend/copilot): fix tool output file reading between E2B and host
Three issues prevented the copilot agent from processing large tool
outputs (e.g. base64 images) in the E2B sandbox:

1. _persist_and_summarize used path= attribute in the truncation tag,
   which the model confused with a local filesystem path. Changed to
   workspace_path= and added save_to_path guidance for E2B processing.

2. is_allowed_local_path only accepted "tool-results" directory but the
   SDK may also use "tool-outputs". Now accepts both.

3. When E2B is active and the Read tool accesses an SDK-internal file,
   the content was returned to the conversation but not available in
   the sandbox for bash processing. Added automatic bridging that copies
   the file into /tmp/<filename> in the sandbox.
2026-04-02 07:47:39 +02:00
Zamil Majdy
c4ff31c79c fix(backend/copilot): remove duplicate test and narrow exception assertion
- Remove duplicate `test_dry_run_accepts_explicit_false` (identical to
  `test_dry_run_accepts_false`)
- Use `pydantic.ValidationError` instead of broad `Exception` in
  `test_wait_for_result_upper_bound`
2026-04-02 07:25:44 +02:00
Zamil Majdy
9f2257daaa refactor(backend): move dry-run credential logic from manager.py to simulator.py
- OrchestratorBlock now uses platform simulation model + OpenRouter key
  instead of user's model/credentials during dry-run
- Credential restore + fallback-to-simulation logic moved into
  prepare_dry_run() and get_dry_run_credentials() in simulator.py
- manager.py reduced by ~30 lines of business logic
- Falls back to LLM simulation if platform OpenRouter key unavailable
2026-04-02 07:10:28 +02:00
Zamil Majdy
925e9a047c fix(platform): address remaining should-fix items for rate-limit tiering
- Add docstring noting SubscriptionTier mirrors schema.prisma enum and
  can be replaced with prisma.enums import after prisma generate
- Remove unnecessary JSDoc comments from useRateLimitManager helpers
  per frontend code convention (avoid comments unless complex)
- Add audit trail: log old tier when admin changes a user's tier
- Fix stale test assertion (DEFAULT_TIER is FREE, not PRO)
- Show tier label ("Pro plan") in UsagePanelContent for end users
- Add formatResetTime unit tests (UsagePanelContent.test.ts)
- Add tier label display test in UsageLimits.test.tsx
- Fix pre-existing pyright errors from prisma stubs not having
  subscriptionTier (type: ignore until prisma generate is run)
2026-04-02 06:56:57 +02:00
Zamil Majdy
3e6faf2de7 fix(copilot): address remaining should-fix items from reviewer
- Extract _normalize_model_name() to deduplicate provider-prefix
  stripping and dot-to-hyphen normalization shared by _resolve_sdk_model
  and _resolve_fallback_model.
- Emit a StreamStatus notification when the SDK activates the fallback
  model (detected via CLI stderr lines containing "fallback" or
  "overloaded").
- Item 5 (transcript rollback) was already addressed — both
  _HandledStreamError and generic Exception handlers snapshot and
  restore transcript_builder._entries on retry.
2026-04-02 06:53:55 +02:00
Zamil Majdy
40a1f504c0 fix(copilot): address 6 should-fix items from reviewer
- Add CLAUDE_CODE_TMPDIR unit tests for build_sdk_env
- Strengthen _sanitize() tests with caplog assertions
- Fix user-facing text (no internal tool names)
- Rename task_tool_use_ids → subagent_tool_use_ids
- Standardize 'Starting agent' terminology
- Fix denial messages: sub-tasks → sub-agents
2026-04-02 06:49:24 +02:00
Zamil Majdy
22e8c5c353 fix(copilot): update response_adapter test for expanded transient patterns
"API rate limited" is now correctly caught by is_transient_api_error
after adding 429/rate-limit patterns. Use a non-transient error
("Invalid API key provided") to test the raw error pass-through path.
2026-04-02 06:31:24 +02:00
Zamil Majdy
1de2a7fb09 fix(platform): address PR review items for rate-limit tiering
- Change DEFAULT_TIER from PRO to FREE (fail-closed on DB errors)
- Use shared_cache=True (Redis-backed) for _fetch_user_tier so tier
  changes propagate across pods immediately
- Use TIER_MULTIPLIERS.get(tier, 1) to avoid KeyError on unknown tiers
- Rename _tier to tier in routes.py where the variable is used, and
  to _ where it is truly unused
- Add minimum 3-char query length for search_users to prevent user
  table enumeration
- Use generated API client (getV2SearchUsersByNameOrEmail) instead of
  raw fetch() in useRateLimitManager
- Remove unnecessary cast and fallback in RateLimitDisplay
- Fix fragile call-count-based _ld_side_effect in tests to use
  flag_key matching pattern
- Update test assertion for DEFAULT_TIER change (FREE not PRO)
2026-04-02 06:28:36 +02:00
Zamil Majdy
b3d9e9e856 fix(backend): add 429/5xx patterns to is_transient_api_error and add config validators
- Add rate-limit (429) and server error (5xx) string patterns to
  is_transient_api_error() so the fallback retry path catches these
  in addition to connection-level errors (ECONNRESET).
- Add ge/le validators on max_turns (1-500) and max_budget_usd
  (0.01-100.0) to prevent misconfiguration.
- Rename max_transient -> max_transient_retries and
  _can_retry_transient() -> _next_transient_backoff() for clarity.
- Add comprehensive tests for all new transient patterns and config
  boundary validation.
2026-04-02 06:21:51 +02:00
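Illustrative sketch for the commit above: string-pattern detection of transient API errors combined with a bounded exponential backoff that returns `None` once retries are exhausted. The pattern list, the base delay, and the function names are assumptions, not the project's values.

```python
from typing import Optional

# Hypothetical patterns covering connection resets, rate limits and 5xx.
_TRANSIENT_PATTERNS = ("econnreset", "rate limit", "429", "overloaded", "502", "503")


def looks_transient(message: str) -> bool:
    lowered = message.lower()
    return any(pattern in lowered for pattern in _TRANSIENT_PATTERNS)


def next_backoff(attempt: int, max_retries: int = 3, base: float = 1.0) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None."""
    if attempt >= max_retries:
        return None
    return base * (2 ** attempt)
```

Substring matching is cheap but coarse: a bare `"429"` also matches unrelated text containing those digits, so real code may prefer structured status codes where they are available.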
Zamil Majdy
48b166a82c fix(backend): address PR review items for include_graph feature
- Surface truncation notice to copilot via response message when
  >_MAX_GRAPH_FETCHES agents are skipped, instead of only logging
- Add guidance in agent_generation_guide to use include_graph only
  after narrowing to a specific agent by UUID
- Add tests for truncation, mixed graph_id presence, partial
  success/failure across multiple agents, and keyword-search
  enrichment path
2026-04-02 06:21:27 +02:00
Zamil Majdy
697b15ce81 fix(backend/copilot): always append user message to transcript on retries
When a duplicate user message was suppressed (e.g. network retry), the
user turn was not added to the transcript builder while the assistant
reply still was, creating a malformed assistant-after-assistant structure
that broke conversation resumption. Now the user message is always
appended to the transcript when present and is_user_message, regardless
of whether the session-level dedup suppressed it.
2026-04-02 06:18:26 +02:00
Zamil Majdy
5beabf936c fix(frontend): revert useChatSession mutation call to match generated API
The generated mutateAsync requires an argument even for void mutations
due to react-query typing. Use `as never` cast to satisfy both the
generated type and the void constraint.
2026-04-02 06:12:24 +02:00
Zamil Majdy
b9e29c96bd fix(backend/copilot): detect prompt-too-long in AssistantMessage content and ResultMessage success subtype (#12642)
## Why

PR #12625 fixed the prompt-too-long retry mechanism for most paths, but
two SDK-specific paths were still broken. The dev session `d2f7cba3`
kept accumulating synthetic "Prompt is too long" error entries on every
turn, growing the transcript from 2.5 MB → 3.2 MB, making recovery
impossible.

Root causes identified from production logs (`[T25]`, `[T28]`):

**Path 1 — AssistantMessage content check:**
When the Claude API rejects a prompt, the SDK surfaces it as
`AssistantMessage(error="invalid_request", content=[TextBlock("Prompt is
too long")])`. Our check only inspected `error_text = str(sdk_error)`
which is `"invalid_request"` — not a prompt-too-long pattern. The
content was then streamed out as `StreamText`, setting `events_yielded =
1`, which blocked retry even when the ResultMessage fired.

**Path 2 — ResultMessage success subtype:**
After the SDK auto-compacts internally (via `PreCompact` hook) and the
compacted transcript is _still_ too long, the SDK returns
`ResultMessage(subtype="success", result="Prompt is too long")`. Our
check only ran for `subtype="error"`. With `subtype="success"`, the
stream "completed normally", appended the synthetic error entry to the
transcript via `transcript_builder`, and uploaded it to GCS — causing
the transcript to grow on each failed turn.

## What

- **AssistantMessage handler**: when `sdk_error` is set, also check the
content text. `sdk_error` being non-`None` confirms this is an API error
message (not user-generated content), so content inspection is safe.
- **ResultMessage handler**: check `result` for prompt-too-long patterns
regardless of `subtype`, covering the SDK auto-compact path where
`subtype="success"` with `result="Prompt is too long"`.

## How

Two targeted one-line condition expansions in `_run_stream_attempt`,
plus two new integration tests in `retry_scenarios_test.py` that
reproduce each broken path and verify retry fires correctly.

## Changes

- `backend/copilot/sdk/service.py`: fix AssistantMessage content check +
ResultMessage subtype-independent check
- `backend/copilot/sdk/retry_scenarios_test.py`: add 2 integration tests
for the new scenarios

## Checklist

- [x] Tests added for both new scenarios (45 total, all pass)
- [x] Formatted (`poetry run format`)
- [x] No false-positive risk: AssistantMessage check gated behind
`sdk_error is not None`
- [x] Root cause verified from production pod logs
2026-04-01 22:32:09 +00:00
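Illustrative sketch for the PR above: the two widened checks, expressed with simplified stand-in message types. The dataclasses below are hypothetical and much smaller than the SDK's real `AssistantMessage` and `ResultMessage`; the single lowercase pattern is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

_PATTERN = "prompt is too long"


@dataclass
class AssistantMessage:  # stand-in, not the SDK class
    error: Optional[str]
    text: str


@dataclass
class ResultMessage:  # stand-in, not the SDK class
    subtype: str
    result: str


def _matches(text: str) -> bool:
    return _PATTERN in text.lower()


def assistant_hit_limit(msg: AssistantMessage) -> bool:
    # Content is inspected only when the SDK flagged an error, so ordinary
    # model output that mentions the phrase is never misclassified.
    if msg.error is None:
        return False
    return _matches(msg.error) or _matches(msg.text)


def result_hit_limit(msg: ResultMessage) -> bool:
    # Checked regardless of subtype, covering subtype="success".
    return _matches(msg.result)
```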
Zamil Majdy
32bfe1b209 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-p0-cli-internals 2026-04-01 20:52:00 +02:00
Zamil Majdy
62302db470 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/agent-generation-dry-run-loop 2026-04-01 20:51:58 +02:00
Zamil Majdy
89c7f34d26 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/dry-run-special-blocks 2026-04-01 20:51:54 +02:00
Zamil Majdy
543fc2da70 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into zamilmajdy/secrt-2171-sql-query-block-for-copilotautopilot-analytics-access 2026-04-01 20:51:52 +02:00
Zamil Majdy
7f986bc565 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-mode-toggle 2026-04-01 20:51:50 +02:00
Zamil Majdy
f4571cb9e1 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-04-01 20:51:48 +02:00
Zamil Majdy
5f41afe748 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-include-graph-option 2026-04-01 20:51:46 +02:00
Zamil Majdy
d046c01a65 feat(copilot): allow background sub-agents and add Agent tool UI
- Remove run_in_background deny block — SDK handles async lifecycle
  (returns isAsync:true, model polls via TaskOutput)
- Keep max_subtasks concurrency limit (background agents count too)
- Add "agent" tool category to frontend GenericTool with RobotIcon
- Detect isAsync output to show "Agent started" not "Agent completed"
- Add TaskOutput renderer showing retrieval status and results
- Fix pre-existing TS error in useChatSession (void mutation body)
- Update tests: background allowed, limit still enforced
2026-04-01 20:50:49 +02:00
Zamil Majdy
b220fe4347 test(copilot): add build_sdk_env tests for all 3 auth modes
Cover subscription, direct Anthropic, and OpenRouter auth modes in
build_sdk_env(). Also verifies that all modes return a mutable dict
that can accept security env vars like CLAUDE_CODE_TMPDIR.
2026-04-01 20:31:32 +02:00
Zamil Majdy
7af138adba fix(backend): use word-boundary regex for database name sanitization
Replaces naive str.replace() with re.sub() using \b word boundaries
when scrubbing database names from error messages. Prevents mangling
unrelated words when the database name is a common substring like
"test", "data", or "on".
2026-04-01 20:30:39 +02:00
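Illustrative sketch for the commit above: scrubbing a database name with `re.sub()` and `\b` word boundaries, compared with a naive `str.replace()`. The function name and the `<database>` placeholder are assumptions.

```python
import re


def scrub_db_name(message: str, db_name: str, placeholder: str = "<database>") -> str:
    pattern = r"\b" + re.escape(db_name) + r"\b"
    # A callable replacement avoids backslash interpretation in placeholder.
    return re.sub(pattern, lambda _match: placeholder, message)


message = 'database "test" does not exist; latest attempt failed'
scrubbed = scrub_db_name(message, "test")
naive = message.replace("test", "<database>")
```

Here `naive` also rewrites the tail of "latest", while `scrubbed` changes only the standalone name.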
Zamil Majdy
5c406a20ba fix(backend): handle AgentOutputBlock format field in dry-run simulation
Mirror the real AgentOutputBlock.run() behavior: when a format string
is provided, apply Jinja2 formatting and yield only the "output" pin;
when no format is provided, yield both "output" and "name" pins.
2026-04-01 20:29:40 +02:00
Zamil Majdy
61513b9dad fix(copilot): mock build_sdk_env to return {} instead of None in retry tests
The tests were mocking build_sdk_env to return None, but the service
code now assigns security env vars (CLAUDE_CODE_TMPDIR, etc.) to the
returned dict. This caused TypeError: 'NoneType' object does not
support item assignment in all 6 retry scenario tests.
2026-04-01 20:27:51 +02:00
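The failure mode fixed in 61513b9dad can be reproduced in isolation. This is a minimal sketch using `unittest.mock.Mock` as a stand-in for the patched `build_sdk_env`; the real tests patch the service module, and the path value is illustrative only.

```python
from unittest.mock import Mock

# Stand-in for the patched build_sdk_env (the real patch target is not shown here).
build_sdk_env = Mock(return_value={})

env = build_sdk_env()
# The caller assigns into the returned mapping, so the mock must return a dict.
env["CLAUDE_CODE_TMPDIR"] = "/tmp/sdk-session"  # illustrative path
```

Had the mock returned `None`, the assignment on the last line would raise the `TypeError` quoted in the commit message.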
Zamil Majdy
6f679a0e32 fix(backend/copilot): preserve tool_calls and tool_call_id through context compression 2026-04-01 20:27:33 +02:00
Zamil Majdy
b8065212b1 chore: remove accidentally committed test screenshots 2026-04-01 19:16:25 +02:00
Zamil Majdy
d5281a9a13 chore: remove accidentally committed test screenshots 2026-04-01 19:16:22 +02:00
Zamil Majdy
05495d8478 chore: remove accidentally committed test screenshots 2026-04-01 19:16:18 +02:00
Zamil Majdy
bae409d04e chore: remove accidentally committed test screenshots 2026-04-01 19:16:14 +02:00
Zamil Majdy
e11eb2caaa chore: remove accidentally committed test screenshots 2026-04-01 19:16:10 +02:00
Zamil Majdy
2c04768711 chore: remove accidentally committed test screenshots 2026-04-01 19:15:35 +02:00
Zamil Majdy
c9bf3aa339 fix(backend/copilot): clear partial graphs on timeout for consistent state 2026-04-01 19:13:10 +02:00
Zamil Majdy
4ac0ba570a fix(backend): fix copilot credential loading across event loops (#12628)
## Why

CoPilot autopilot sessions are inconsistently failing to load user
credentials (specifically GitHub OAuth). Some sessions proceed normally,
some show "provide credentials" prompts despite the user having valid
creds, and some are completely blocked.

Production logs confirmed the root cause: `RuntimeError: Task got Future
<Future pending> attached to a different loop` in the credential refresh
path, cascading into null-cache poisoning that blocks credential lookups
for 60 seconds.

## What

Three interrelated bugs in the credential system:

1. **`refresh_if_needed` always acquired Redis locks even with
`lock=False`** — The `lock` parameter only controlled the inner
credential lock, but the outer "refresh" scope lock was always acquired.
The copilot executor uses multiple worker threads with separate event
loops; the `asyncio.Lock` inside `AsyncRedisKeyedMutex` was bound to one
loop and failed on others.

2. **Stale event loop in `locks()` singleton** — Both
`IntegrationCredentialsManager` and `IntegrationCredentialsStore` cached
their `AsyncRedisKeyedMutex` without tracking which event loop created
it. When a different worker thread (with a different loop) reused the
singleton, it got the "Future attached to different loop" error.

3. **Null-cache poisoning on refresh failure** — When OAuth refresh
failed (due to the event loop error), the code fell through to cache "no
credentials found" for 60 seconds via `_null_cache`. This blocked ALL
subsequent credential lookups for that user+provider, even though the
credentials existed and could refresh fine on retry.

## How

- Split `refresh_if_needed` into `_refresh_locked` / `_refresh_unlocked`
so `lock=False` truly skips ALL Redis locking (safe for copilot's
best-effort background injection)
- Added event loop tracking to `locks()` in both
`IntegrationCredentialsManager` and `IntegrationCredentialsStore` —
recreates the mutex when the running loop changes
- Only populate `_null_cache` when the user genuinely has no
credentials; skip caching when OAuth refresh failed transiently
- Updated existing test to verify null-cache is not poisoned on refresh
failure
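
The event loop tracking in the second bullet can be sketched as follows. This is an assumption-laden illustration, not the actual `locks()` implementation: `LoopAwareLock` is a hypothetical class, and a plain `asyncio.Lock` stands in for `AsyncRedisKeyedMutex`.

```python
import asyncio
from typing import Optional


class LoopAwareLock:
    """Caches an asyncio.Lock and rebuilds it when the running loop changes."""

    def __init__(self) -> None:
        self._lock: Optional[asyncio.Lock] = None
        self._loop: Optional[asyncio.AbstractEventLoop] = None

    def get(self) -> asyncio.Lock:
        loop = asyncio.get_running_loop()  # must be called from a coroutine
        # Recreate the cached lock if it was built under a different loop.
        if self._lock is None or self._loop is not loop:
            self._lock = asyncio.Lock()
            self._loop = loop
        return self._lock


holder = LoopAwareLock()


async def use_lock() -> asyncio.Lock:
    lock = holder.get()
    async with lock:
        return lock
```

Each `asyncio.run(use_lock())` call runs on a fresh loop, so the holder hands out a new lock instead of reusing one bound to the previous loop.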

## Test plan

- [x] All 14 existing `integration_creds_test.py` tests pass
- [x] Updated
`test_oauth2_refresh_failure_returns_none_without_null_cache` verifies
null-cache is not populated on refresh failure
- [x] Format, lint, and typecheck pass
- [ ] Deploy to staging and verify copilot sessions consistently load
GitHub credentials
2026-04-02 00:11:38 +07:00
Zamil Majdy
d61a2c6cd0 Revert "fix(backend/copilot): detect prompt-too-long in AssistantMessage content and ResultMessage success subtype"
This reverts commit 1c301b4b61.
2026-04-01 18:59:38 +02:00
Zamil Majdy
1c301b4b61 fix(backend/copilot): detect prompt-too-long in AssistantMessage content and ResultMessage success subtype
The SDK returns AssistantMessage(error="invalid_request", content=[TextBlock("Prompt is too long")])
followed by ResultMessage(subtype="success", result="Prompt is too long") when the transcript is
rejected after internal auto-compaction. Both paths bypassed the retry mechanism:

- AssistantMessage handler only checked error_text ("invalid_request"), not the content which
  holds the actual error description. The content was then streamed as text, setting events_yielded=1,
  which blocked retry even when ResultMessage fired.
- ResultMessage handler only triggered prompt-too-long detection for subtype="error", not
  subtype="success". The stream "completed normally", stored the synthetic error entry in the
  transcript, and uploaded it — causing the transcript to grow unboundedly on each failed turn.

Fixes:
1. AssistantMessage handler: when sdk_error is set (confirmed error message), also check content
   text. sdk_error being set guarantees this is an API error, not user-generated content, so
   content inspection is safe.
2. ResultMessage handler: check result for prompt-too-long regardless of subtype, covering the
   case where the SDK auto-compacts internally but the result is still too long.

Adds integration tests for both new scenarios.
2026-04-01 18:28:46 +02:00
Zamil Majdy
e753aee7a0 fix(copilot): prevent infinite transient retry loop
The transient_retries counter was reset to 0 at the top of the while
loop on every iteration, including after transient retry `continue`
statements.  Since transient retries don't increment `attempt`, the
counter reset every time, creating an infinite retry loop that could
never exhaust the max_transient budget.

Fix: only reset transient_retries when the context-level `attempt`
actually changes, using a _last_reset_attempt sentinel.
2026-04-01 18:21:50 +02:00
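The sentinel fix in e753aee7a0 can be sketched with a scripted retry loop. Everything here is illustrative: `run_with_retries`, the outcome strings, and the behavior once the transient budget is spent are assumptions, since the log only describes the counter reset.

```python
def run_with_retries(outcomes, max_attempts=3, max_transient=2):
    """Drive a retry loop over a scripted sequence of outcomes.

    Each outcome is "ok", "transient", or "context" (a context-level failure
    that advances `attempt`). All names here are illustrative.
    """
    outcomes = iter(outcomes)
    attempt = 0
    transient_retries = 0
    last_reset_attempt = -1  # sentinel: no attempt has been reset yet
    while attempt < max_attempts:
        # Reset the transient budget only when `attempt` has actually changed,
        # not on every pass through the loop.
        if attempt != last_reset_attempt:
            transient_retries = 0
            last_reset_attempt = attempt
        outcome = next(outcomes)
        if outcome == "ok":
            return "ok"
        if outcome == "transient":
            if transient_retries < max_transient:
                transient_retries += 1
                continue  # retry without advancing `attempt`
            return "transient budget exhausted"
        attempt += 1  # context-level failure: move to the next attempt
    return "attempts exhausted"
```

If the reset ran unconditionally at the top of the loop, a stream of transient failures would never reach the `max_transient` limit, which is the infinite loop the commit describes.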
Zamil Majdy
f76566c834 fix(test): update dry-run param test to match deduplicated description
The run_agent dry_run description was updated during deduplication to
reference the agent_generation_guide instead of saying "preview mode".
Update the test assertion to match.
2026-04-01 18:18:20 +02:00
Zamil Majdy
a58b997141 fix(test): align simulation prompt test with error pin exclusion from required list
The test expected "error" in "Available output pins" but the prompt now
correctly excludes error from the required output pins list to match the
instruction telling the LLM to omit it.
2026-04-01 18:15:42 +02:00
Zamil Majdy
3f24a003ad fix(copilot): add None guard to fix pyright reportOperatorIssue
_resolve_fallback_model returns str | None, so pyright flags the
`"." not in result` assertion.  Add an explicit `is not None` check
before the containment test to narrow the type.
2026-04-01 18:15:16 +02:00
Zamil Majdy
1a645e1e37 fix(backend/copilot): align _flatten_assistant_content with master (drop tool_use blocks)
The merge conflict resolution copied the pre-#12625 version of
_flatten_assistant_content which converts tool_use blocks to
[tool_use: name] placeholders. Master's #12625 changed this to
drop tool_use blocks entirely to prevent the model from mimicking
them as plain text. Align the canonical transcript.py with master.
2026-04-01 18:14:59 +02:00
Zamil Majdy
bee76962b0 fix(backend): rollback write transaction on error in SQL query block
Use explicit except/else instead of finally to ensure write transactions
are rolled back when an exception occurs, rather than committed.
2026-04-01 18:13:37 +02:00
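The except/else structure from bee76962b0 is sketched below with the standard-library `sqlite3` module as a stand-in; the driver the SQL query block actually uses is not shown in the log, and `run_write_statements` is a hypothetical name.

```python
import sqlite3


def run_write_statements(conn: sqlite3.Connection, statements) -> None:
    """Run (sql, params) pairs as one unit: commit on success, roll back on error."""
    try:
        for sql, params in statements:
            conn.execute(sql, params)
    except Exception:
        # Undo everything written so far, then surface the original error.
        conn.rollback()
        raise
    else:
        # Reached only when no statement raised.
        conn.commit()
```

A `finally: conn.commit()` would run on the error path as well and persist the partial write; the `else` branch keeps the commit on the success path only.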
Zamil Majdy
864e68bed1 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-04-01 18:09:58 +02:00
Zamil Majdy
7c6201110c test: add E2E screenshots for PR #12578 2026-04-01 18:06:51 +02:00
Zamil Majdy
bded680b77 docs(backend): add cross-cutting test location explanation to dry_run_loop_test.py 2026-04-01 18:06:51 +02:00
Zamil Majdy
1e008dc172 fix(copilot): align dry_run_loop_test with #12582's required dry_run field
After merging dev, #12582 made dry_run a required field with description
"Execute in preview mode." — update tests to match:
- Assert dry_run is in required (not optional) for both run_agent/run_block
- Match "preview mode" instead of "simulation"/"guide" in descriptions
- Pass dry_run=False explicitly in RunAgentInput constructor tests
- Lower description length threshold to 10 (was 20) for the shorter text
2026-04-01 18:06:51 +02:00
Zamil Majdy
9966e122ab test(copilot): add functional tests for dry-run loop beyond substring checks
Add 23 new tests covering:
- run_agent and run_block OpenAI tool schema validation (type, optionality,
  description quality, coexistence of dry_run + wait_for_result)
- RunAgentInput Pydantic model behavior (default value, bool coercion,
  combined parameters, validation bounds, string stripping)
- Guide workflow ordering (create before dry-run, dry-run before inspect,
  fix before repeat, numbered step sequence)
2026-04-01 18:06:51 +02:00
Zamil Majdy
65108c31dc fix(copilot): reference SendAuthenticatedWebRequestBlock in tool discovery + fix CI 2026-04-01 18:06:51 +02:00
Zamil Majdy
7767c97f50 fix(copilot): deduplicate dry-run instructions, keep only in guide
Remove duplicated dry-run workflow text from prompting.py shared notes,
service.py DEFAULT_SYSTEM_PROMPT, run_agent.py tool/param descriptions,
create_agent.py, and edit_agent.py. The agent_generation_guide.md is the
single source of truth, loaded on-demand via get_agent_building_guide.
2026-04-01 18:06:51 +02:00
Zamil Majdy
69ab21ebe7 fix(copilot): address review round 2 — remove internal Python refs from guide, format system prompt
- Replace `_SHARED_TOOL_NOTES`, `prompting.py` reference in mcp_tool_guide.md
  with LLM-friendly wording ("described in the tool notes") since the guide
  is shown to the model, not to developers.
- Break the long single-line dry-run instruction in DEFAULT_SYSTEM_PROMPT
  into bullet points matching the surrounding prompt style for readability.
2026-04-01 18:06:31 +02:00
Zamil Majdy
6fe4e1b774 fix(copilot): address review round 1 — deduplicate prompts, relocate tests
- Slim down the duplicate error-pattern list in _SHARED_TOOL_NOTES
  (prompting.py) to a concise summary that references the guide for details,
  reducing maintenance surface from 5+ near-identical copies to one.
- Move dry_run_loop_test.py from backend/copilot/ (production package) to
  test/copilot/ to match the project's test directory convention.
- Route supplement tests through the public get_sdk_supplement() API instead
  of importing the private _SHARED_TOOL_NOTES symbol.
- Loosen overly-brittle assertions (exact step numbers, exact spacing around
  '/' in error pattern names) while preserving intent as prompt regression
  tests.  Add module-level docstring documenting the deliberate brittleness.
2026-04-01 18:06:31 +02:00
Zamil Majdy
c778cc9849 fix(platform): remove hardcoded 3-iteration cap from dry-run loop
Instead of capping at 3 iterations, let the copilot repeat the
dry-run -> fix cycle until the simulation passes or the problems
are clearly unfixable. This gives the copilot flexibility to keep
going if it's making progress, or stop early if issues are not
resolvable.
2026-04-01 18:06:31 +02:00
Zamil Majdy
50b635da6d fix(copilot): remove redundant "3 iterations" repetition in supplement
De-duplicate "after 3 iterations" from the same sentence that already
says "up to 3 iterations" — now reads "If issues persist, report..."
2026-04-01 18:06:31 +02:00
Zamil Majdy
08e254143b fix(copilot): standardize iteration wording, add test for tool discovery priority, fix cross-reference
- Standardize max-iteration wording to "3 iterations" everywhere (prompting.py,
  agent_generation_guide.md, tests) instead of mixed "3 times"/"3 iterations"
- Replace loose `or` fallback in test_shared_tool_notes_include_max_iterations
  with exact "3 iterations" assertion
- Add test_shared_tool_notes_include_tool_discovery_priority test
- Make mcp_tool_guide.md cross-reference explicit: point to `_SHARED_TOOL_NOTES`
  in `prompting.py` instead of vague "see shared supplement"
2026-04-01 18:06:31 +02:00
Zamil Majdy
89fcfc4e0a refactor(copilot): move tool/action search priority to shared supplement
Move the "check blocks first" strategy from `mcp_tool_guide.md` (only
loaded for MCP) into `_SHARED_TOOL_NOTES` so it applies to every
session. The MCP guide now references the shared strategy instead of
duplicating it.
2026-04-01 18:06:31 +02:00
Zamil Majdy
e7ca07f4bf fix(copilot): align dry-run prompt wording and tighten test assertion
- Align guide heading to "create -> dry-run -> fix" matching supplement
- Align error pattern names between guide and supplement to canonical form
- Drop loose "max " fallback in test assertion for precision
2026-04-01 18:06:31 +02:00
Zamil Majdy
c564ac7277 fix(copilot): address PR review - reduce prompt redundancy, tighten tests
- Slim down DEFAULT_SYSTEM_PROMPT to a brief one-liner referencing the
  supplement for detailed workflow (avoids ~300 token duplication)
- Tighten test assertions to use specific substring checks (e.g. section
  headers, exact phrases) instead of loose single-word matches
- Restore view_agent_output reference in the agent generation guide for
  node-by-node execution trace inspection
- Add test for view_agent_output mention in guide (22 tests total)
2026-04-01 18:06:31 +02:00
Zamil Majdy
ac3a826ad0 feat(copilot): add create -> dry-run -> fix loop to agent generation prompts
Instruct the copilot LLM to automatically dry-run agents after creating
or editing them, inspect the output for wiring issues, and fix iteratively
(up to 3 attempts) before presenting the agent as ready to the user.

Changes:
- System prompt: add "Agent Development: Create -> Dry-Run -> Fix Loop" section
- Tool descriptions: create_agent, edit_agent, run_agent, get_agent_building_guide
  now reference the dry-run verification workflow
- Prompting supplement: add "Iterative agent development" section with error
  pattern guidance (failed nodes, null outputs, unexecuted nodes)
- Agent generation guide: replace "Testing with Dry Run" with comprehensive
  "REQUIRED: Dry-Run Verification Loop" section including good/bad output
  examples and workflow steps 8-9
- Tests: 21 new tests verifying prompt content across all layers
2026-04-01 18:06:31 +02:00
Zamil Majdy
6f32184019 test: add E2E screenshots for PR #12575 2026-04-01 18:06:02 +02:00
Zamil Majdy
6d0eedae83 fix(backend): truncate large run() source code in simulation prompt
Prevent prompt blowup for blocks with very large run() implementations
by applying the same _MAX_INPUT_VALUE_CHARS limit used for input values.
2026-04-01 18:06:02 +02:00
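The truncation in 6d0eedae83 might be sketched like this. The limit value and the truncation marker are assumptions; the log names `_MAX_INPUT_VALUE_CHARS` but does not show its value or the exact suffix used.

```python
MAX_CHARS = 20_000  # assumed value; the real _MAX_INPUT_VALUE_CHARS is not shown


def truncate_for_prompt(text: str, max_chars: int = MAX_CHARS) -> str:
    """Cap text at max_chars and note how much was dropped (hypothetical helper)."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return f"{text[:max_chars]}\n... [truncated {omitted} characters]"
```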
Zamil Majdy
fb328f9d74 fix(backend): move os import to top-level, remove getattr duck typing, use schema-based credential stripping in simulator
- Move `import os` from function body to top-level (stdlib, no startup cost)
- Replace `getattr(ChatConfig(), "simulation_model", "")` with direct
  attribute access since the field has a default value
- Use `block.input_schema.get_credentials_fields()` to detect credential
  fields programmatically, falling back to common names
2026-04-01 18:06:02 +02:00
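The schema-based stripping in fb328f9d74 can be sketched as a pure function over the input dict. The fallback names and the function itself are assumptions; in the real code the field names come from `block.input_schema.get_credentials_fields()`, whose return shape is not shown in the log.

```python
# Assumed fallback names, used only when the schema reports no credential fields.
COMMON_CREDENTIAL_FIELDS = frozenset({"credentials", "api_key", "password", "token"})


def strip_credential_fields(input_data: dict, credential_fields=None) -> dict:
    """Return a copy of input_data without credential fields (hypothetical helper)."""
    names = set(credential_fields) if credential_fields else COMMON_CREDENTIAL_FIELDS
    return {key: value for key, value in input_data.items() if key not in names}
```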
Zamil Majdy
a369fbe169 fix(copilot): replace tautological env-var tests with source assertions
The TestSecurityEnvVars tests were testing Python dict assignment rather
than verifying the actual production code. Replace with source-level
assertions that grep service.py for the required env var names, catching
accidental removals without duplicating production logic.
2026-04-01 18:05:50 +02:00
Zamil Majdy
2a0b74cae4 fix(backend): update test for new prompt format (Available output pins)
The build_simulation_prompt now uses "Available output pins" instead of
"MUST include" — update the test from dev to match the new prompt.
2026-04-01 18:05:46 +02:00
Zamil Majdy
b08f9fc02a fix(platform): regenerate openapi.json and fix flaky test teardown
- Regenerate openapi.json to include Pydantic v2 ValidationError fields
  (input, ctx) that were added after the Gemini Flash commit
- Wrap oauth_test.py session fixture teardown in try/except to handle
  RuntimeError when event loop is already closed during session shutdown
2026-04-01 18:05:46 +02:00
Zamil Majdy
857acb2bbc feat(backend): use Gemini Flash for dry-run simulation, make model configurable 2026-04-01 18:05:26 +02:00
Zamil Majdy
0cb230c4f0 test(backend): add dry-run tests for AgentExecutorBlock child graph spawning
Verify prepare_dry_run returns an unmodified shallow copy for
AgentExecutorBlock (identity, equality, mutation isolation).

Also cover simulator edge cases: AgentInputBlock with all-None/missing
fields, and generic blocks yielding zero meaningful outputs.
2026-04-01 18:05:26 +02:00
Zamil Majdy
2cd5c0eab8 refactor(backend): unify MCP block simulation into generic path
Remove the MCP-specific simulation function and prompt builder.
MCPToolBlock now uses the same generic LLM simulation as all other
blocks, grounded by the block's run() source code. This eliminates
code duplication and ensures MCP blocks benefit from the same
improvements (e.g., source code grounding) as other blocks.

Also removes corresponding MCP-specific tests since the generic
simulate_block path covers the same functionality.
2026-04-01 18:05:26 +02:00
Zamil Majdy
7bf8e460ea fix(backend): add folder assignment to library agent upsert update path
The upsert's update path was missing the folder connection logic that
the create path had, causing folder changes to be silently ignored when
re-adding a previously deleted library agent.
2026-04-01 18:05:13 +02:00
Zamil Majdy
84d328517a fix(backend): always yield result pin in MCP simulation success path
The success path now explicitly yields ("result", ...) from the parsed
response rather than iterating all pins with a None/empty filter.
This prevents downstream starvation when the LLM legitimately returns
null for side-effect-only tool results.
2026-04-01 18:05:13 +02:00
Zamil Majdy
842ff6c600 fix(backend): yield result pin in MCP simulation error path
When simulate_mcp_block catches a RuntimeError/ValueError, it now yields
("result", None) before ("error", ...) so downstream nodes connected
to the result pin are not starved during dry-run error paths.
2026-04-01 18:05:13 +02:00
Zamil Majdy
b510fbee2a docs: fix stale iteration cap (5 → 1) in agent generation guide 2026-04-01 18:05:13 +02:00
Zamil Majdy
bb7f0ad1f2 test(simulator): align tests with dynamic pin yielding behavior
Update test assertions to match the simulator's current behavior where
empty/missing output pins are omitted rather than yielded. Also fix
prompt assertion strings to match the actual prompt text.
2026-04-01 18:05:13 +02:00
Zamil Majdy
3f8af89b63 fix(frontend): only show error styling when error output is non-empty 2026-04-01 18:04:47 +02:00
Zamil Majdy
375e5e1f10 fix(simulator): clean up error handling + dynamic pin yielding
- Don't force empty error pin — only yield error when there's a real error
- Yield all pins dynamically from LLM response (not just result+error)
- Allow logical error simulation (invalid input etc.) but not auth errors
- Omit pins with no meaningful value
2026-04-01 18:04:47 +02:00
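The dynamic pin yielding from 375e5e1f10 can be sketched as a generator over the parsed LLM response. What counts as "no meaningful value" is an assumption here (None, empty string, empty list, empty dict), and `iter_simulated_outputs` is a hypothetical name.

```python
def iter_simulated_outputs(parsed: dict):
    """Yield (pin, value) pairs from a parsed response, skipping empty pins."""
    for pin, value in parsed.items():
        # Assumed emptiness rule: None, "", [] and {} carry no meaningful value.
        # Falsy scalars such as 0 or False are kept.
        if value is None or value == "" or value == [] or value == {}:
            continue
        yield pin, value
```

Under this rule an empty `error` pin is simply omitted, so an error is yielded only when the response carries a real message.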
Zamil Majdy
fd1d706315 fix(frontend): replace lucide-react icons with Phosphor equivalents in mode toggle
Use Brain and Lightning from @phosphor-icons/react instead of Brain and
Zap from lucide-react to comply with the project icon guidelines.
2026-04-01 18:04:44 +02:00
Zamil Majdy
faf2f43f6a test(simulator): add unit tests for prompt building and passthrough logic
Covers credential stripping, realistic-output instructions, input/output
block passthrough, prepare_dry_run routing, missing-pin filling, and
LLM failure handling.
2026-04-01 18:04:17 +02:00
Zamil Majdy
eea230d37f fix(simulator): produce realistic output + strip credentials from prompt
- Strip credential fields from input before sending to LLM so it never
  sees null/empty credentials and incorrectly simulates auth failures.
- Strengthen prompt: NEVER return empty/null, always generate realistic
  URLs, text, and data structures. Error pin always empty string.
- Input blocks: generate default value when no user input provided
  (first dropdown option or block name).
2026-04-01 18:03:59 +02:00
Zamil Majdy
76965429f1 fix(simulator): restore input/output block passthrough in dry-run
Re-add the passthrough logic for AgentInputBlock and AgentOutputBlock
in simulate_block. These blocks are trivial passthroughs that don't
need LLM simulation -- forwarding input values directly is faster,
deterministic, and doesn't require API keys (which aren't available
in CI).
2026-04-01 18:03:39 +02:00
Zamil Majdy
eefa60368f test(simulator): remove input/output block passthrough tests
These tests asserted passthrough behavior for AgentInputBlock and
AgentOutputBlock which was removed in the preceding refactor commit.
The simulator now LLM-simulates these blocks using their run() source
code, so the old passthrough assertions are invalid and require an
API key not available in CI.
2026-04-01 18:03:39 +02:00
Zamil Majdy
88fe1e9b5e refactor(simulator): remove special-casing for input/output blocks
The simulator now has the block's run() source code via inspect.getsource(),
so it can figure out what any block does by reading the code. No need for
special isinstance checks for AgentInputBlock/AgentOutputBlock.
2026-04-01 18:03:39 +02:00
Zamil Majdy
93264b1177 fix(simulator): generate default values for input blocks in dry-run
When users click Simulate without providing input values,
AgentInputBlock.value is None and nothing gets yielded. This leaves
downstream blocks (like OrchestratorBlock) with unpopulated links,
causing them to be skipped entirely.

Fix: generate a sensible default — first dropdown option for
AgentDropdownInputBlock, or "sample {name}" for text inputs.
2026-04-01 18:03:39 +02:00
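The default-value rule in 93264b1177 reduces to a small function. This is a sketch with a hypothetical signature; the real code works on AgentInputBlock instances rather than bare arguments.

```python
def default_input_value(name: str, options=None):
    """Pick a dry-run default: first dropdown option, else "sample {name}"."""
    if options:
        return options[0]
    return f"sample {name}"
```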
Zamil Majdy
3269d17880 fix(simulator): use Python 3.11-compatible f-string in build_simulation_prompt
The nested f-string on line 224 used triple double-quotes inside a
triple double-quoted f-string, which is only valid from Python 3.12.
Extract the implementation section to a separate variable to fix the
SyntaxError on Python 3.11 CI.
2026-04-01 18:03:39 +02:00
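The workaround in 3269d17880 can be illustrated as below. The prompt wording and the `build_prompt` name are made up for the example; only the technique (building the section in its own variable before the outer f-string) follows the commit message.

```python
def build_prompt(block_name: str, source: str = "") -> str:
    """Build a prompt without nesting same-quoted f-strings (works on 3.11)."""
    # Build the optional section first, outside the outer f-string. The
    # expression parts contain no quotes or backslashes, which 3.11 requires.
    implementation_section = (
        f'\nImplementation:\n"""\n{source}\n"""\n' if source else ""
    )
    return f"""Simulate the block {block_name}.{implementation_section}
Return JSON only."""
```

Before Python 3.12, an expression inside an f-string cannot reuse the enclosing quote style, so placing a triple-double-quoted f-string inside another one is a SyntaxError on 3.11.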
Zamil Majdy
1e5788f2cf feat(simulator): include block run() source code in simulation prompt
The LLM simulator now receives the block's actual run() function source
via inspect.getsource(). This gives the LLM exact knowledge of how
inputs transform to outputs, producing far more accurate simulations.
2026-04-01 18:03:39 +02:00
Zamil Majdy
ca8214d95f fix(frontend): refetch execution details after websocket subscription to close race-condition gap
Dry-run executions can complete before the WebSocket subscription is
established, causing the frontend to miss realtime updates.  After the
subscription is confirmed, immediately invalidate the execution-details
query so react-query refetches the latest state from the REST API.

Also reduce the polling interval from 2s to 1s for more responsive
feedback during fast-completing executions.
2026-04-01 18:03:39 +02:00
Zamil Majdy
f58ce5cc70 fix(backend): passthrough input/output blocks and preserve user model in dry-run
Input blocks (AgentInputBlock and all subclasses) and AgentOutputBlock are
pure passthrough -- they just forward their input values. Previously they
went through the LLM simulator which produced verbose generated text
instead of the raw value.

Also stop swapping the OrchestratorBlock model to gpt-4o-mini during
dry-run. The user's own model and credentials are now preserved, which
avoids credential mismatches (e.g. Anthropic key vs OpenAI model).
Iterations are still capped to 1.
2026-04-01 18:03:39 +02:00
Zamil Majdy
bf29801b07 fix(backend): restore AgentExecutorBlock as dry-run passthrough block
In commit f2546b31, AgentExecutorBlock was inadvertently removed from the
passthrough list when can_simulate() was replaced with prepare_dry_run().
Since AgentExecutorBlock.Output has no properties, LLM simulation yields
zero outputs -- causing the block to "complete without output" during
dry-run.

Restore AgentExecutorBlock in prepare_dry_run() so it executes for real
during dry-run, spawning a child graph execution whose blocks are then
simulated (dry_run=True is inherited via execution context).
2026-04-01 18:03:39 +02:00
Zamil Majdy
dcc2bdd8ab fix(backend): preserve thinking blocks during transcript compaction (#12574)
AutoPilot users hit `invalid_request_error` ("thinking or
redacted_thinking blocks in the latest assistant message cannot be
modified") when sessions get long enough to trigger transcript
compaction. The Anthropic API requires thinking blocks in the last
assistant message to be byte-for-byte identical to the original response
— our compaction was flattening them to plain text, destroying the
cryptographic signatures.

Reported in Discord `#breakage` by John Ababseh with session
`31d3f08a-cb94-45eb-9fce-56b3f0287ef4`.

- **`compact_transcript`** now splits the transcript into a compressible
prefix and a preserved tail (last assistant entry + trailing entries).
Only the prefix is compressed; the tail is re-appended verbatim,
preserving thinking blocks exactly.
- **`_flatten_assistant_content`** now silently drops `thinking` and
`redacted_thinking` blocks instead of creating `[__thinking__]`
placeholders — they carry no useful context for compression summaries.
- **`response_adapter`** explicitly handles `ThinkingBlock` (skip
gracefully instead of silently falling through the isinstance chain).
- **`_format_sdk_content_blocks`** now passes through raw dict blocks
(e.g. `redacted_thinking` that the SDK may not have a typed class for)
verbatim to the transcript.

The key insight is the Anthropic API's asymmetric constraint:
- **Last assistant message**: thinking/redacted_thinking blocks must be
preserved byte-for-byte
- **Older assistant messages**: thinking blocks can be removed entirely

`compact_transcript` uses `_find_last_assistant_entry()` to split the
JSONL into two parts:
1. **Prefix** (everything before the last assistant): flattened and
compressed normally
2. **Tail** (last assistant + any trailing user message): preserved
verbatim and re-chained via `_rechain_tail()` to maintain the
`parentUuid` chain

This ensures the API always sees the original thinking blocks in the
last assistant message while still achieving meaningful compression on
older turns.
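
The prefix/tail split described above can be sketched as follows. The entry shape (dicts with `type`, `uuid`, `parentUuid`) and the choice to re-link every tail entry are assumptions; the function names mirror the ones in the commit message but are not the actual implementations.

```python
def find_last_assistant_index(entries):
    """Return the index of the last assistant entry, or None if there is none."""
    for index in range(len(entries) - 1, -1, -1):
        if entries[index].get("type") == "assistant":
            return index
    return None


def split_for_compaction(entries):
    """Split into (compressible prefix, preserved tail)."""
    index = find_last_assistant_index(entries)
    if index is None:
        return entries, []
    return entries[:index], entries[index:]


def rechain_tail(tail, new_parent_uuid):
    """Re-link the tail onto a new parent, leaving message content untouched."""
    rechained = []
    parent = new_parent_uuid
    for entry in tail:
        # Shallow copy: only parentUuid changes, so thinking blocks inside
        # the message stay exactly as they were.
        patched = {**entry, "parentUuid": parent}
        rechained.append(patched)
        parent = patched["uuid"]
    return rechained
```

Only the prefix would then be passed to the compressor, with the re-linked tail appended after the compressed summary.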

- [x] 25 new tests across `thinking_blocks_test.py` (TDD: written before
implementation)
  - [x] `_find_last_assistant_entry` splits correctly at last assistant,
    handles edges (no assistant, index 0, trailing user)
  - [x] `_rechain_tail` patches parentUuid chain, handles empty tail
  - [x] `_flatten_assistant_content` strips thinking/redacted_thinking
    blocks, handles mixed content
  - [x] `compact_transcript` preserves last assistant's thinking blocks
  - [x] `compact_transcript` strips thinking from older assistant messages
  - [x] Edge cases: trailing user message, single assistant, no thinking
    blocks
  - [x] `response_adapter` handles ThinkingBlock without crash
  - [x] `_format_sdk_content_blocks` preserves thinking block format and
    raw dict blocks
- [x] All existing copilot SDK tests pass
- [x] Pre-commit hooks (lint, format, typecheck) all pass
2026-04-01 18:03:22 +02:00
Zamil Majdy
e74a918c4a debug(backend): add info-level logging to AgentExecutorBlock event listener
Logs event receipt, skip reasons, and final output count to investigate
why sub-agent outputs are not reaching the parent during dry-run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:02:45 +02:00
Zamil Majdy
ff05b5b8d5 revert(backend): remove unnecessary DB fallback from AgentExecutorBlock
The DB fallback was added based on wrong analysis. The actual fix is
passing dry_run=True to add_graph_execution (previous commit) so
credential validation is skipped during dry-run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:02:45 +02:00
Zamil Majdy
56090f870c fix(backend): pass dry_run to add_graph_execution in AgentExecutorBlock
The sub-agent's graph validation rejects missing credentials. During
dry-run, credential errors should be stripped — but the dry_run flag
wasn't being passed to add_graph_execution, so validation always
enforced credentials even in dry-run mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:02:45 +02:00
Zamil Majdy
a3e3d3ff6b fix(backend): fallback to DB query for AgentExecutorBlock output in dry-run
During dry-run, the sub-agent's output events may not reach the
event_bus listener before the GRAPH_EXEC_UPDATE arrives (the simulated
execution completes faster than events propagate). This causes the
AgentExecutorBlock to complete with 0 outputs.

Adds a DB fallback: after the event loop breaks on graph COMPLETED,
if no outputs were yielded, query get_node_executions for the
sub-agent's OUTPUT block results and yield them.

Evidence: normal run produces 1 output, ALL dry-runs produce 0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:02:45 +02:00
Zamil Majdy
cfaa1ff0d4 fix(backend): execute AgentExecutorBlock for real in dry-run mode
Previously, AgentExecutorBlock was LLM-simulated during dry-run,
producing no meaningful output and making executions INCOMPLETE.

Now prepare_dry_run returns the input unchanged for AgentExecutorBlock,
letting it execute the sub-agent graph. The sub-agent's blocks are
individually simulated via the propagated dry_run execution_context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:02:45 +02:00
Zamil Majdy
6ab9a3285f fix(simulator): preserve traditional mode in dry-run preparation
prepare_dry_run now respects agent_mode_max_iterations=0 (traditional
mode) instead of unconditionally forcing agent mode. Only overrides
to 1 when the user configured agent mode (non-zero).
2026-04-01 18:02:45 +02:00
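The rule from 6ab9a3285f can be sketched as below. The function name is hypothetical and treating a missing key as traditional mode (0) is an assumption; the real `prepare_dry_run` also dispatches on block type, which is omitted here.

```python
def prepare_orchestrator_dry_run(input_data: dict) -> dict:
    """Cap agent-mode iterations to 1 for dry-run, leaving traditional mode alone."""
    prepared = dict(input_data)  # shallow copy; the caller's dict is not mutated
    # 0 means traditional mode and is preserved; any non-zero value is capped.
    if prepared.get("agent_mode_max_iterations", 0) != 0:
        prepared["agent_mode_max_iterations"] = 1
    return prepared
```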
Zamil Majdy
390324d5a1 fix(platform): fall back to simulation when dry-run credentials missing + poll execution details
Bug 1: OrchestratorBlock in dry-run fails with "credentials is a required
property" when the user hasn't configured any LLM credentials.  After
prepare_dry_run overrides the model to gpt-4o-mini, the block still
requires credentials. Now we check if required credentials fields are
still empty after restoring from node defaults and fall back to LLM
simulation instead of attempting real execution.

Bug 2: WebSocket not showing real-time updates for dry-run executions
due to a race condition — the execution can start and complete before
the frontend subscribes to WebSocket events.  Add refetchInterval
polling (2s) on execution details while the graph is running so the
frontend catches up on any missed events.
2026-04-01 18:02:45 +02:00
Zamil Majdy
9f51796dbe fix(backend): remove dry-run markers from simulated block output text
The [DRY RUN] prefix and "simulated successfully — no real execution
occurred" message were being fed back to the LLM, causing the copilot to
become aware it was in dry-run mode and change its behavior. The output
text now looks identical to real execution output. The UI still shows
the "Simulated" badge via the is_dry_run=True flag on the response.
2026-04-01 18:02:45 +02:00
Zamil Majdy
f42f0013df fix(platform): run OrchestratorBlock with cheap model in dry-run instead of skipping
Replace `can_simulate(block)` with `prepare_dry_run(block, input_data)` which
returns modified input_data (model=gpt-4o-mini, agent_mode_max_iterations=1) for
OrchestratorBlock so it executes for real with a cheap model during dry-run,
instead of being skipped entirely.
2026-04-01 18:02:19 +02:00
Zamil Majdy
3154e5b87a test(backend): mock get_global_rate_limits in reset_usage tests for determinism
All reset_copilot_usage tests that reach the get_global_rate_limits
call path now explicitly mock it, preventing LaunchDarkly flag
evaluation from interfering with test assertions.
2026-04-01 18:02:19 +02:00
Zamil Majdy
78cd14d501 fix(platform): address review R1 - fix docstring, stale closures, shared formatCents
- Fix misleading "Fails open" docstring in reset_daily_usage (it's
  fail-closed for billed operations).
- Use refs in useResetRateLimit to avoid stale closure in mutation
  callbacks.
- Replace eslint-disable with useRef pattern in RateLimitResetDialog.
- Export and share formatCents between dialog and panel components.
- Add clarifying comment for omitted rate_limit_reset_cost in inner
  get_usage_status call.
2026-04-01 18:02:19 +02:00
Zamil Majdy
137edb3e6e fix(backend): address review nits - fix docstring and hoist constant
- Fix misleading simulate_block docstring that claimed "Returns None"
  for passthrough blocks (it never does; callers use can_simulate())
- Hoist _DRY_RUN_MAX_ITERATIONS to module-level in manager.py
2026-04-01 18:02:19 +02:00
Zamil Majdy
449e9b17f1 fix(backend): simplify dry-run special block handling per review feedback
Remove overengineered simulation_context, dry_run_passthrough flag,
credential redaction/URL sanitization, and excessive utils validation.
The simulator now decides which blocks to handle via can_simulate() and
delegates MCPToolBlock to a specialized prompt internally. Manager
changes are minimal: try simulator, fall back to normal execution.

573 lines removed; 18 tests still pass.
2026-04-01 18:02:18 +02:00
Zamil Majdy
5b3f87d7c7 fix(backend): use exact URL equality assertions to silence CodeQL false positives
Replace substring `in` checks with exact equality assertions in
simulator_test.py. CodeQL flagged 4 instances of "Incomplete URL
substring sanitization" on test assertions like `assert "example.com"
in result`. Using `==` against the expected sanitized URL both silences
the CodeQL alert and makes the tests stricter.
2026-04-01 18:02:01 +02:00
Zamil Majdy
ee7209a575 test(backend): add simulator_test.py for redaction, URL sanitization, regex, and simulation_context
Cover test scenarios missing from test_dry_run.py:
- Secret field redaction (api_key, password, secret, tokens, credentials)
- URL sanitization (strip userinfo, query params, fragments)
- Non-secret field preservation
- simulation_context validation and 16KB size limit
- Regex false-positive guard (author, authority, token_count)
- Underscore-aware boundaries (api_secret, client_secret)
2026-04-01 18:02:01 +02:00
Zamil Majdy
7ea89b07ce fix(backend): address review R2 - use underscore-aware boundaries in secret regex
Replace \b word boundaries with (?:^|_)...(?:$|_) to treat underscores
as segment separators. This correctly catches compound keys like
api_secret, client_secret, secret_key, credentials while still avoiding
false positives on author, authority, token_count, etc.
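
A sketch of why `\b` misses compound keys, assuming a hypothetical keyword list (the real `_SECRET_KEY_PATTERN` keywords are not shown in this log). `\b` treats `_` as a word character, so `api_secret` looks like a single word to it.

```python
import re

# Hypothetical keyword list, for illustration only.
_KEYWORDS = r"(?:secret|password|credentials|api_key|auth)"

# \b sees "api_secret" as one word because "_" is a word character.
WORD_BOUNDARY = re.compile(rf"\b{_KEYWORDS}\b", re.IGNORECASE)
# Underscore-aware: a keyword must be a whole "_"-separated segment.
SEGMENT_BOUNDARY = re.compile(rf"(?:^|_){_KEYWORDS}(?:$|_)", re.IGNORECASE)


def is_secret_key(key: str) -> bool:
    """True when any "_"-separated segment of the key is a secret keyword."""
    return SEGMENT_BOUNDARY.search(key) is not None
```

With this keyword list, `is_secret_key("api_secret")` is true while `WORD_BOUNDARY.search("api_secret")` finds nothing, and `author` / `authority` match neither pattern.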
2026-04-01 18:02:01 +02:00
Zamil Majdy
5324e0cc2f fix(backend): address review R1 - tighten secret regex, hoist constant, unconditional iteration cap
- _SECRET_KEY_PATTERN: use word boundaries to avoid false positives on
  keys like "author", "authority", "token_count"
- _SIMULATION_CONTEXT_MAX_BYTES: hoist to module level in utils.py
- agent_mode_max_iterations: apply cap unconditionally for passthrough
  blocks in dry-run mode (not only when key already exists in input_data)
2026-04-01 18:02:01 +02:00
Zamil Majdy
c7cbb8b02e fix(backend): apply simulation_context to execution_context when resuming dry-run
When resuming a graph execution with an already-provided execution_context,
the computed safe_simulation_context was never applied to it, causing
simulation hints to be silently ignored for resumed dry-run executions.
2026-04-01 18:02:01 +02:00
Zamil Majdy
d66ffb1ee4 fix(backend): fall back to simulation when passthrough block lacks credentials
When dry_run_passthrough is true but the block's required credentials
weren't acquired (e.g. user hasn't configured LLM credentials), fall
back to LLM simulation instead of failing with a credentials error.
This makes dry-run robust for agents created without credentials.
2026-04-01 18:02:01 +02:00
Zamil Majdy
5d489c72b5 fix(backend): inherit dry_run from execution_context in child graph validation
When AgentExecutorBlock spawns a child graph execution, it passes
execution_context (with dry_run=True) but the dry_run parameter
defaults to False. This caused validate_and_construct_node_execution_input
to reject missing credentials in the sub-agent, even though dry-run
should skip credential validation.

Fix: derive dry_run from execution_context.dry_run when an execution_context
is provided. Also propagate simulation_context to child graphs.
2026-04-01 18:02:01 +02:00
Zamil Majdy
646ffe1693 fix(backend): address review - move simulation_context to user prompt, redact credentials, validate before DB write
- Move simulation_context validation before create_graph_execution to
  prevent orphaned INCOMPLETE records on validation failure (sentry)
- Move simulation_context from system prompt to user prompt to prevent
  prompt injection from caller-supplied data (coderabbitai)
- Add credential redaction (_redact_inputs) that masks secret-bearing
  fields (api_key, token, password, etc.) and sanitizes URLs by
  stripping userinfo/query/fragment before serializing to LLM prompts
- Sanitize MCP server_url in system prompt
- Update tests to assert simulation_context is in user_prompt not system_prompt
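
The URL sanitization step above can be sketched with the standard library. This is an illustration under assumptions, not the code of `_redact_inputs`; the function name `sanitize_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def sanitize_url(url: str) -> str:
    """Drop userinfo, query string, and fragment; keep scheme, host, port, path."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literal: restore the brackets that .hostname strips.
        host = f"[{host}]"
    netloc = f"{host}:{parts.port}" if parts.port is not None else host
    return urlunsplit((parts.scheme, netloc, parts.path, "", ""))
```

Rebuilding the netloc from `hostname` and `port` is what removes `user:password@`; blanking the last two tuple fields removes the query and fragment.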
2026-04-01 18:02:01 +02:00
Zamil Majdy
59b1811e8b fix(backend): validate simulation_context size and gate behind dry_run
- Only attach simulation_context when dry_run=True (ignored otherwise)
- Validate JSON-serializability and enforce 16KB size limit to prevent
  oversized queue payloads
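
A sketch of the two checks above, assuming a hypothetical helper name and `ValueError` as the failure signal (the real error type is not stated here).

```python
from __future__ import annotations

import json
from typing import Any

MAX_BYTES = 16 * 1024  # the 16KB limit named in the commit message


def validate_simulation_context(
    context: dict[str, Any] | None, dry_run: bool
) -> dict[str, Any] | None:
    """Return the context only for dry-run calls that pass both checks."""
    if not dry_run or context is None:
        return None  # ignored outside dry-run
    try:
        encoded = json.dumps(context).encode("utf-8")
    except (TypeError, ValueError) as exc:
        raise ValueError("simulation_context must be JSON-serializable") from exc
    if len(encoded) > MAX_BYTES:
        raise ValueError(f"simulation_context exceeds {MAX_BYTES} bytes")
    return context
```

Measuring the encoded bytes rather than the character count keeps the limit meaningful for non-ASCII payloads.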
2026-04-01 18:01:30 +02:00
Zamil Majdy
6404e58fb1 refactor(backend): address coderabbitai review - typed dry_run_passthrough, truncate MCP schema
- Add dry_run_passthrough property to Block base class; set on
  OrchestratorBlock and AgentExecutorBlock. Removes isinstance() dispatch
  from manager.py for dry-run routing.
- Truncate tool_input_schema in MCP simulation prompt to prevent oversized
  LLM payloads (reuses _MAX_INPUT_VALUE_CHARS limit).
- Replace isinstance(OrchestratorBlock) iteration cap check with generic
  field-presence check.
2026-04-01 18:01:30 +02:00
Zamil Majdy
e0bfa1524e feat(backend): add simulation_context for dry-run scenario hints
Thread an optional simulation_context dict through the execution pipeline
so users can provide scenario hints (expected emails, tickets, customer
data, etc.) that guide the LLM simulator to produce realistic outputs.

- Add simulation_context to ExecutionContext (propagates to child graphs)
- Accept simulation_context in REST API, copilot run_agent, and
  add_graph_execution
- Inject context into both block and MCP simulation prompts
- Add tool_description hidden field to MCPToolBlock for richer simulation
- Add 4 new tests for simulation_context and tool_description
2026-04-01 18:01:30 +02:00
Zamil Majdy
ac947a0c11 fix(backend): address CodeQL false positive - use full URL in test assertion
Use the complete URL variable instead of a substring to avoid CodeQL's
"Incomplete URL substring sanitization" alert in test code.
2026-04-01 18:00:43 +02:00
Zamil Majdy
c9f45f056a refactor(backend): address PR review - extract shared LLM retry loop, cap dry-run iterations
- Extract _call_llm_for_simulation() helper to deduplicate retry/error
  logic between simulate_block and simulate_mcp_block
- Cap OrchestratorBlock agent_mode_max_iterations to 5 in dry-run mode
  to prevent unbounded loops of real LLM calls
- Document LLM API cost implications in agent generation guide
- Update module docstring to reflect new dry-run behaviour
2026-04-01 18:00:43 +02:00
Zamil Majdy
89264091ad fix(backend/copilot): add missing strip_stale_thinking_blocks to canonical transcript module
The merge conflict resolution moved transcript.py to a re-export wrapper
but failed to copy strip_stale_thinking_blocks into the canonical
backend.copilot.transcript module. This caused an ImportError in
transcript_test.py which imports from the sdk wrapper.
2026-04-01 18:00:41 +02:00
Zamil Majdy
e3183f1955 test: add test screenshots for PR #12569 SQL query block testing round 2 2026-04-01 18:00:26 +02:00
Zamil Majdy
3ea243c760 fix(backend): resolve pyright type errors in SQL query block error handling
Replace dict(**kwargs) pattern with a local closure to preserve type
information for _sanitize_error parameters. Rename _format_operational_error
to _classify_operational_error since it now takes pre-sanitized input.
2026-04-01 18:00:26 +02:00
Zamil Majdy
991969612c refactor(backend): split SQL query block into block + helpers module
- Extract validation, sanitization, serialization, and query execution
  into sql_query_helpers.py to meet ~300-line file guideline
- Fix duck typing in _serialize_value: replace hasattr(value, "isoformat")
  with explicit isinstance(value, (datetime, date, time))
- Extract _configure_session and _run_in_transaction helpers to bring
  execute_query under ~40-line function guideline
- Extract _validate_query, _resolve_host, _format_operational_error
  helpers to simplify the run method
- Add database name scrubbing to _sanitize_error
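
The duck-typing fix in `_serialize_value` can be illustrated as follows. This is a reduced sketch that only covers the date/time branch; `serialize_value` and `NotADate` are hypothetical names.

```python
from datetime import date, datetime, time
from typing import Any


def serialize_value(value: Any) -> Any:
    """Convert date/time values to ISO strings; pass everything else through."""
    # Explicit type check: an unrelated object that merely defines
    # isoformat() is no longer treated as a date.
    if isinstance(value, (datetime, date, time)):
        return value.isoformat()
    return value


class NotADate:
    """Has an isoformat() method but is not a date/time type."""

    def isoformat(self) -> str:
        return "surprise"
```

With `hasattr(value, "isoformat")`, a `NotADate` instance would have been serialized as `"surprise"`; with `isinstance` it passes through unchanged.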
2026-04-01 18:00:26 +02:00
Zamil Majdy
8de9880f43 fix(docs): revert host type to 'str (password)' to match block docs generator output 2026-04-01 18:00:26 +02:00
Zamil Majdy
86d8efe697 fix(docs): correct host type from 'str (password)' to 'str (secret)' in SQL Query docs 2026-04-01 18:00:26 +02:00
Zamil Majdy
10ec6c7215 test(blocks): add SQL injection and URL.create() tests for SQL query block
Add tests documenting that single-statement SQL injection patterns (e.g.,
tautology, UNION-based, blind boolean) pass through validation by design,
since the block uses raw SQL via text(query) for trusted admin/analytics
use. Also add tests verifying URL.create() correctly handles special
characters in credentials (passwords with @, #, :, spaces, etc.) -- the
existing test_special_chars_in_password mocked execute_query and never
exercised the actual URL construction path.
2026-04-01 18:00:26 +02:00
Zamil Majdy
51e5371362 style(backend): replace Optional[int] with int | None in SQL query block
Use modern union syntax consistent with the rest of the codebase.
Remove unused Optional import.
2026-04-01 18:00:26 +02:00
Zamil Majdy
cdd14726ce fix(backend): preserve thinking blocks during transcript compaction (#12574)
AutoPilot users hit `invalid_request_error` ("thinking or
redacted_thinking blocks in the latest assistant message cannot be
modified") when sessions get long enough to trigger transcript
compaction. The Anthropic API requires thinking blocks in the last
assistant message to be byte-for-byte identical to the original response
— our compaction was flattening them to plain text, destroying the
cryptographic signatures.

Reported in Discord `#breakage` by John Ababseh with session
`31d3f08a-cb94-45eb-9fce-56b3f0287ef4`.

- **`compact_transcript`** now splits the transcript into a compressible
prefix and a preserved tail (last assistant entry + trailing entries).
Only the prefix is compressed; the tail is re-appended verbatim,
preserving thinking blocks exactly.
- **`_flatten_assistant_content`** now silently drops `thinking` and
`redacted_thinking` blocks instead of creating `[__thinking__]`
placeholders — they carry no useful context for compression summaries.
- **`response_adapter`** explicitly handles `ThinkingBlock` (skip
gracefully instead of silently falling through the isinstance chain).
- **`_format_sdk_content_blocks`** now passes through raw dict blocks
(e.g. `redacted_thinking` that the SDK may not have a typed class for)
verbatim to the transcript.

The key insight is the Anthropic API's asymmetric constraint:
- **Last assistant message**: thinking/redacted_thinking blocks must be
preserved byte-for-byte
- **Older assistant messages**: thinking blocks can be removed entirely

`compact_transcript` uses `_find_last_assistant_entry()` to split the
JSONL into two parts:
1. **Prefix** (everything before the last assistant): flattened and
compressed normally
2. **Tail** (last assistant + any trailing user message): preserved
verbatim and re-chained via `_rechain_tail()` to maintain the
`parentUuid` chain

This ensures the API always sees the original thinking blocks in the
last assistant message while still achieving meaningful compression on
older turns.
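
The prefix/tail split can be sketched as below. Entry shape (`type`, `uuid`, `parentUuid` keys) and the function names are assumptions for illustration; the real helpers are `_find_last_assistant_entry()` and `_rechain_tail()`, whose signatures are not shown here.

```python
from __future__ import annotations

from typing import Any


def find_last_assistant(entries: list[dict[str, Any]]) -> int | None:
    """Index of the last assistant entry, or None if there is none."""
    for index in range(len(entries) - 1, -1, -1):
        if entries[index].get("type") == "assistant":
            return index
    return None


def split_for_compaction(entries: list[dict[str, Any]]):
    """Return (compressible prefix, tail preserved verbatim)."""
    index = find_last_assistant(entries)
    if index is None:
        return list(entries), []
    return entries[:index], entries[index:]


def rechain_tail(tail: list[dict[str, Any]], new_parent_uuid: str):
    """Point the first tail entry at the compressed prefix's last uuid."""
    if not tail:
        return []
    # Copy only the first entry; its content blocks are left untouched.
    first = {**tail[0], "parentUuid": new_parent_uuid}
    return [first, *tail[1:]]
```

Only the `parentUuid` link of the first tail entry changes, so the thinking blocks inside the last assistant message stay identical to the original response.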

- [x] 25 new tests in `thinking_blocks_test.py` (TDD: written before
  implementation)
  - [x] `_find_last_assistant_entry` splits correctly at last assistant,
    handles edges (no assistant, index 0, trailing user)
  - [x] `_rechain_tail` patches parentUuid chain, handles empty tail
  - [x] `_flatten_assistant_content` strips thinking/redacted_thinking
    blocks, handles mixed content
  - [x] `compact_transcript` preserves last assistant's thinking blocks
  - [x] `compact_transcript` strips thinking from older assistant messages
  - [x] Edge cases: trailing user message, single assistant, no thinking
    blocks
  - [x] `response_adapter` handles ThinkingBlock without crash
  - [x] `_format_sdk_content_blocks` preserves thinking block format and
    raw dict blocks
- [x] All existing copilot SDK tests pass
- [x] Pre-commit hooks (lint, format, typecheck) all pass
2026-04-01 17:59:53 +02:00
Zamil Majdy
1ebd5635f6 fix(backend/copilot): make include_graph an explicit parameter in _execute
Use an explicit keyword argument instead of extracting from **kwargs
for better discoverability and type safety.
2026-04-01 17:59:52 +02:00
Zamil Majdy
349b6c63de fix(backend): handle TimeoutError in graph enrichment to prevent tool crash 2026-04-01 17:59:52 +02:00
Zamil Majdy
2f7cfa6f1b fix(backend/copilot): strip secrets from graph data in _enrich_agents_with_graph
- Pass `for_export=True` to `get_graph()` so `stripped_for_export()`
  filters credentials, api_key, password, token, secret fields from
  `input_default` before the graph reaches the LLM context
- Use `agent.graph_version` (active version) instead of `version=None`
  to avoid exposing draft/unpublished graph versions
- Add `asyncio.timeout(15)` around `asyncio.gather` to prevent
  indefinite blocking on hung DB connections
- Resolve `graph_db()` once before the gather instead of per-coroutine
- Drop `get_graph_db` alias in favor of `graph_db` to match codebase

Fixes the CRITICAL security finding from autogpt-pr-reviewer.
2026-04-01 17:59:52 +02:00
Zamil Majdy
049aa1ad7d fix(backend/copilot): use f-strings for warning logs per CLAUDE.md style
CLAUDE.md says: use %s for debug, f-strings elsewhere for readability.
Reverts the incorrect change to printf-style for warning-level logs.
2026-04-01 17:59:52 +02:00
Zamil Majdy
a16be2675b style: use lazy formatting in logger.warning calls
Replace f-strings with %-style lazy formatting in _enrich_agents_with_graph
warning logs to follow standard logging conventions.
2026-04-01 17:59:52 +02:00
Zamil Majdy
ac416a561e fix(backend/copilot): remove type: ignore by adding explicit graph_id guard in _fetch 2026-04-01 17:59:52 +02:00
Zamil Majdy
c47fcc1925 refactor(backend/copilot): use BaseGraph type for graph field
Use BaseGraph instead of Graph to get typed nodes+links without causing
the Pydantic OpenAPI schema split. BaseGraph-Input/Output already exists
on dev so no frontend imports break. Fetches via graph_db().get_graph().
2026-04-01 17:59:52 +02:00
Zamil Majdy
77fd8648a7 fix(frontend): regenerate openapi.json to sync Graph schema
The backend Graph model no longer uses separate Input/Output variants,
so the openapi.json was out of sync causing the generated `graph.ts`
type to be missing and failing CI type checks + e2e builds.
2026-04-01 17:59:52 +02:00
Zamil Majdy
4842599bec fix(backend/copilot): remove redundant graph_id guard in _fetch 2026-04-01 17:59:52 +02:00
Zamil Majdy
339e155823 fix(backend): log truncation when include_graph skips agents
When include_graph=true and more agents have graph_ids than
_MAX_GRAPH_FETCHES, log a warning indicating how many agents
were skipped. This makes the silent truncation visible.
2026-04-01 17:59:52 +02:00
Zamil Majdy
9344e62d66 fix: remove type: ignore with proper guard clause in _enrich_agents_with_graph
Narrow agent.graph_id from str | None to str with an early return,
eliminating the type: ignore[arg-type] suppressor.
2026-04-01 17:59:52 +02:00
Zamil Majdy
ee6cc20cbc fix(backend/copilot): address review — parallel fetch, None logging, failure tests
- Use asyncio.gather for parallel graph fetching instead of sequential loop
- Cap graph fetches at 10 to prevent excessive DB calls on broad searches
- Log warning when get_agent_as_json returns None (graph not found)
- Add tests for exception and None return failure paths
2026-04-01 17:59:52 +02:00
Zamil Majdy
eb96b019c5 refactor(backend/copilot): merge create/edit workflows in agent guide 2026-04-01 17:59:52 +02:00
Zamil Majdy
9cf6ac9ad9 feat(backend/copilot): add include_graph option to find_library_agent for agent debugging/editing
The copilot's edit_agent tool required the LLM to provide a complete agent
JSON (nodes + links) without ever seeing the current graph structure — it was
editing blindly. This adds an `include_graph` boolean parameter to the
existing `find_library_agent` tool so the copilot can fetch the full graph
before making modifications.

Also updates the agent generation guide to split creating vs editing
workflows, instructing the LLM to always fetch the current graph first.
2026-04-01 17:59:36 +02:00
Zamil Majdy
d3173605eb test(copilot): add unit tests for P0 guardrails
Tests for _resolve_fallback_model (5 tests), security env vars (4 tests),
and ChatConfig defaults (4 tests). All 13 tests pass.
2026-04-01 17:59:09 +02:00
Zamil Majdy
98c27653f2 fix(copilot): snapshot/restore TranscriptBuilder on transient retry
TranscriptBuilder._entries is independent from session.messages.
Rolling back session.messages alone left duplicate entries in the
uploaded --resume transcript. Now snapshot _entries and _last_uuid
before each attempt and restore them at both rollback locations on failure.
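
A sketch of the snapshot/restore idea, assuming a reduced builder with only the two fields named above. The `snapshot` and `restore` method names are hypothetical.

```python
from __future__ import annotations


class TranscriptBuilder:
    """Reduced stand-in holding only the state named in the commit message."""

    def __init__(self) -> None:
        self._entries: list[dict] = []
        self._last_uuid: str | None = None

    def append(self, uuid: str, text: str) -> None:
        self._entries.append({"uuid": uuid, "parentUuid": self._last_uuid, "text": text})
        self._last_uuid = uuid

    def snapshot(self) -> tuple[list[dict], str | None]:
        # Copy the list so later appends do not leak into the snapshot.
        return list(self._entries), self._last_uuid

    def restore(self, snap: tuple[list[dict], str | None]) -> None:
        entries, last_uuid = snap
        self._entries = list(entries)
        self._last_uuid = last_uuid
```

Taking the snapshot before each attempt and restoring it on a transient failure means a replayed attempt appends its entries once rather than twice.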
2026-04-01 17:59:09 +02:00
Zamil Majdy
dced534df3 fix(copilot): review round 3 — fix transient error code check, add SDK compat fields
- Fix exc.code check: "transient" -> "transient_api_error" to match
  the actual code set in _run_stream_attempt (line 1343)
- Add fallback_model, max_turns, max_budget_usd, stderr to SDK compat
  test so field renames in the SDK are caught early
2026-04-01 17:59:09 +02:00
Zamil Majdy
4ebe294707 fix(copilot): review round 2 — fix transient retry consuming context-level attempt
Convert for-loop to while-loop so transient retries (continue) replay
the same context-level attempt instead of advancing to the next one.
Previously, `continue` in a `for attempt in range(...)` loop would
increment `attempt`, causing transient retries to wastefully trigger
context reduction and reset the transient retry counter.

Now: transient retries stay at the same attempt (no attempt++), while
context-error retries explicitly increment attempt before continue.
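
The control flow can be sketched as a small simulation. The outcome strings, the function name, and the choice not to reset the transient counter between attempts are assumptions for illustration.

```python
from __future__ import annotations

from typing import Iterable


def run_attempts(
    outcomes: Iterable[str], max_attempts: int = 3, max_transient: int = 3
) -> tuple[int | None, list[int]]:
    """Replay scripted outcomes: 'ok', 'transient', or 'context'.

    Returns (attempt index that succeeded or None, attempt index per call).
    """
    script = iter(outcomes)
    attempt = 0
    transient_retries = 0
    calls: list[int] = []
    while attempt < max_attempts:
        calls.append(attempt)
        outcome = next(script)
        if outcome == "ok":
            return attempt, calls
        if outcome == "transient" and transient_retries < max_transient:
            transient_retries += 1
            continue  # replay the same attempt; no attempt += 1
        # Context error (or transient budget exhausted): move on explicitly.
        attempt += 1
    return None, calls
```

In a `for attempt in range(...)` loop the `continue` would advance `attempt` on every transient error; here two transient errors followed by success all run as attempt 0.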
2026-04-01 17:59:09 +02:00
Zamil Majdy
2e8e115cd1 fix(copilot): review round 1 — fix transient retry count, strip fallback model prefix
- Fix _can_retry_transient off-by-one: >= should be > so max_retries=3
  actually performs 3 retries instead of 2
- Move events_yielded check before counter increment to avoid wasting
  a retry slot when events were already sent
- Strip OpenRouter provider prefix from fallback model name (mirrors
  _resolve_sdk_model logic) to prevent model-not-found errors
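
The off-by-one can be checked with a small counter simulation. It assumes the counter is incremented before the comparison, which matches the "2 instead of 3" symptom but is not confirmed by this log; `count_retries` is a hypothetical helper.

```python
def count_retries(max_retries: int, use_ge: bool) -> int:
    """Count retries allowed when the guard increments first, then compares."""
    counter = 0
    performed = 0
    while True:
        counter += 1
        exhausted = counter >= max_retries if use_ge else counter > max_retries
        if exhausted:
            return performed
        performed += 1
```

Under that assumption, `count_retries(3, use_ge=True)` yields 2 and `count_retries(3, use_ge=False)` yields 3.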
2026-04-01 17:59:09 +02:00
Zamil Majdy
5ca49a8ec9 fix(copilot): P0 guardrails — SDK limits, security env vars, transient retry
Based on analysis of the Claude Code CLI internals, adds critical
guardrails rebased on the current dev architecture (env.py extraction):

1. SDK guardrails: fallback_model (auto-retry on 529), max_turns=50
   (runaway prevention), max_budget_usd=5.0 (per-query cost cap)

2. TMPDIR redirect: sets CLAUDE_CODE_TMPDIR to sdk_cwd so CLI output
   is routed into the per-session workspace for isolation/cleanup

3. Security env vars: DISABLE_CLAUDE_MDS, SKIP_PROMPT_HISTORY,
   DISABLE_AUTO_MEMORY, DISABLE_NONESSENTIAL_TRAFFIC

4. Transient error retry: 429/5xx/ECONNRESET errors now retry with
   exponential backoff (1s, 2s, 4s) in both _HandledStreamError and
   generic Exception handlers. Skips the retry if events were already yielded.
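
The backoff schedule in item 4 follows `base * 2**n`. A minimal sketch, with hypothetical helper names and a status-code check that covers only the 429/5xx part of the rule:

```python
def backoff_delays(max_retries: int = 3, base_seconds: float = 1.0) -> list[float]:
    """Delay before each retry: base, 2*base, 4*base, ..."""
    return [base_seconds * 2**n for n in range(max_retries)]


def is_transient_status(status_code: int) -> bool:
    """429 (rate limited) and any 5xx are treated as retryable."""
    return status_code == 429 or 500 <= status_code <= 599
```

`backoff_delays()` gives `[1.0, 2.0, 4.0]`, so three retries add at most seven seconds of waiting.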
2026-04-01 17:59:09 +02:00
Zamil Majdy
a9db5af0fa fix(tests): mock build_sdk_env to return {} instead of None
The CLAUDE_CODE_TMPDIR assignment requires sdk_env to be a dict,
not None. Fixes TypeError in retry scenario tests.
2026-04-01 17:59:07 +02:00
Zamil Majdy
dcbfcfb158 fix(copilot): review round 3 — add Agent to ToolName Literal for permissions
Add "Agent" to the ToolName Literal and test expected set so permission
filtering does not incorrectly block the Agent tool in permissioned
sessions. Without this, apply_tool_permissions would strip "Agent" from
the allowed_tools list.
2026-04-01 17:59:07 +02:00
Zamil Majdy
723b852ba4 fix(copilot): review round 2 — sanitize all untrusted hook inputs for logging
- Sanitize error message and tool_use_id in post_tool_failure_hook
  to prevent log injection via crafted error strings
- Sanitize trigger field in pre_compact_hook
- Use %-style formatting in failure hook for consistency with other hooks
2026-04-01 17:59:07 +02:00
Zamil Majdy
c7e0f8169a fix(copilot): review round 1 — hoist subagent constant, strip C1 chars, guard tmpdir
- Move _SUBAGENT_TOOLS frozenset to module level to avoid per-session allocation
- Extend _sanitize to strip C1 control characters (U+0080-U+009F) for
  defense against log injection via non-ASCII control sequences
- Guard CLAUDE_CODE_TMPDIR assignment with `if sdk_cwd:` for defensive
  consistency (matches PR #12636 approach)
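
A sketch of the control-character stripping described above, assuming a hypothetical default `max_len` and that truncation happens after stripping. It covers C0, DEL, and C1 only.

```python
import re

# C0 controls (U+0000-U+001F), DEL (U+007F), and C1 controls (U+0080-U+009F).
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f-\x9f]")


def sanitize(value: object, max_len: int = 200) -> str:
    """Strip control characters, then truncate, before logging a value."""
    return _CONTROL_CHARS.sub("", str(value))[:max_len]
```

Removing `\r` and `\n` prevents a crafted value from starting a fake log line; the C1 range closes the same gap for terminals that honor 8-bit control sequences.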
2026-04-01 17:59:07 +02:00
Zamil Majdy
ce1555c07a fix(copilot): address review round 2 — transcript path max_len, subagent tests
- SubagentStop: use max_len=500 for transcript path (consistent with
  pre_compact_hook)
- Add test coverage for SubagentStart/SubagentStop hooks including
  control character sanitization
2026-04-01 17:59:07 +02:00
Zamil Majdy
403a36a3fc fix(copilot): address review — robust sanitize, drop redundant None guard
- _sanitize: strip all C0 control chars + DEL, not just \n/\r
- Remove unnecessary `sdk_env is None` guard (build_sdk_env always returns dict)
2026-04-01 17:59:07 +02:00
Zamil Majdy
490643d65a refactor(copilot): hoist _sanitize helper and use it in pre_compact_hook
Move _sanitize() above all hooks so it can be reused. Refactor
pre_compact_hook to use _sanitize(max_len=500) instead of inline
.replace() calls for consistency across all hooks.
2026-04-01 17:59:07 +02:00
Zamil Majdy
2b14ecf5ee fix(copilot): sanitize hook inputs, rename constant, add Agent failure test
- Rename _SUBAGENT_TOOLS to _subagent_tools (frozenset, function-local)
- Extract _sanitize() helper for consistent log injection prevention
  across subagent_start_hook and subagent_stop_hook
- Add test_agent_slot_released_on_failure for coverage parity with
  the existing Task failure test
2026-04-01 17:59:07 +02:00
Zamil Majdy
14d6d66bdc refactor(copilot): use frozenset and extract _sanitize helper in hooks 2026-04-01 17:59:07 +02:00
Zamil Majdy
28443e2e33 fix(copilot): guard against None sdk_env from build_sdk_env
build_sdk_env can return None in test mocks. Guard with fallback
to empty dict before setting CLAUDE_CODE_TMPDIR.
2026-04-01 17:59:07 +02:00
Zamil Majdy
611a20d7df fix(copilot): sanitize transcript path in subagent stop hook
Strip control characters from agent_transcript_path before logging
to prevent log injection, matching the existing pattern in pre_compact_hook.
2026-04-01 17:59:07 +02:00
Zamil Majdy
ce201cd19c fix(copilot): remove HOME override to preserve subscription auth
Sentry correctly flagged that overriding HOME breaks subscription mode
(claude login) — the CLI looks for credentials at $HOME/.claude/.
Keep only CLAUDE_CODE_TMPDIR which fixes the sub-agent output path.
2026-04-01 17:59:07 +02:00
Zamil Majdy
0c76852768 fix(copilot): address self-review nits in security hooks logging 2026-04-01 17:59:07 +02:00
Zamil Majdy
414b8bbaac fix(copilot): recognize Agent tool name and route CLI state into workspace
The Claude Agent SDK CLI renamed the sub-agent tool from "Task" to "Agent"
in v2.x. Our security hooks only checked for "Task", so all sub-agent
security controls were silently bypassed: background execution was unblocked,
concurrency limiting didn't apply, and slot tracking was broken.

Additionally, the CLI writes sub-agent output to /tmp/claude-<uid>/ and
project state to $HOME/.claude/ — both outside the per-session workspace
(/tmp/copilot-<session>/). This caused PermissionError in E2B and silently
lost sub-agent results via failed @@agptfile: expansion.

Changes:
- Handle both "Task" and "Agent" tool names in security hooks
- Add "Agent" to _SDK_BUILTIN_ALWAYS allowed tools list
- Set CLAUDE_CODE_TMPDIR and HOME to sdk_cwd so CLI state lands in workspace
- Register SubagentStart/SubagentStop hooks for lifecycle visibility
- Add 5 new tests for Agent tool name handling and mixed slot sharing
2026-04-01 17:59:07 +02:00
Zamil Majdy
4c85f2399a fix(backend): propagate dry-run mode to special blocks (Orchestrator, AgentExecutor, MCP)
Previously dry-run mode simulated ALL blocks via LLM, but this didn't work
well for OrchestratorBlock, AgentExecutorBlock, and MCPToolBlock:

- OrchestratorBlock & AgentExecutorBlock now execute for real in dry-run
  mode so the orchestrator can make LLM calls and agent executors can
  spawn child graphs. Their downstream tool blocks and child-graph blocks
  are still simulated. Credential fields from node defaults are restored
  since validate_exec wipes them in dry-run mode.

- MCPToolBlock gets a specialised simulate_mcp_block() that builds an
  LLM prompt grounded in the selected tool's name and JSON Schema,
  producing more realistic mock responses than the generic simulator.
2026-04-01 17:58:51 +02:00
Zamil Majdy
db0e5a1b0b style(test): format SQL query block tests with ruff
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:58:49 +02:00
Zamil Majdy
22a5e76af9 fix(test): replace real-looking connection strings with test.invalid hosts
GitHub secret scanner flagged test connection strings as leaked secrets.
Replaced all real-looking IPs, hostnames, and Supabase URLs with
RFC 2606 reserved .invalid domains and RFC 5737 documentation IPs
(198.51.100.x).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:58:49 +02:00
Zamil Majdy
7919da16b4 test(backend): add security-focused tests for SQL query block
Adds 92 test cases covering:
- Single-statement validation (multi-statement injection blocked)
- Read-only enforcement (INSERT/UPDATE/DELETE/DROP rejected)
- Writable CTE detection (WITH...DELETE RETURNING blocked)
- SSRF protection: IPv4 private ranges, IPv6 loopback (::1),
  link-local (fe80::), Unix socket paths
- Error sanitization: passwords scrubbed, usernames scrubbed,
  IP addresses scrubbed from error messages
- Value serialization edge cases (datetime, Decimal, bytes)
- URL validation for all database types
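
The SSRF cases listed above can be approximated with the standard `ipaddress` module. This is a sketch under assumptions, not the block's actual check; `is_blocked_host` is a hypothetical name, and a real check would also resolve hostnames before testing them.

```python
import ipaddress


def is_blocked_host(host: str) -> bool:
    """Reject Unix socket paths and private, loopback, or link-local IPs."""
    if host.startswith("/"):
        return True  # Unix socket path
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; DNS resolution is out of scope for this sketch.
        return False
    return address.is_private or address.is_loopback or address.is_link_local
```

The same function handles IPv4 and IPv6 because `ip_address` returns the matching address type for either form.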

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:58:49 +02:00
Zamil Majdy
052f953afb fix(backend): replace f-string interpolation with str(int()) for SET timeout commands
Use explicit str(int(timeout * 1000)) instead of f-string interpolation
for SET statement_timeout / MAX_EXECUTION_TIME / LOCK_TIMEOUT commands.
SET commands don't support bind parameters in most databases, so we use
string concatenation with an int-cast value as defense-in-depth.
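
A sketch of the int-cast pattern for the PostgreSQL case; the function name is hypothetical.

```python
def build_statement_timeout(timeout_seconds: float) -> str:
    """Build a SET statement from a value forced through int()."""
    # int() raises on anything that is not numeric, so no SQL text
    # can reach the concatenation below.
    milliseconds = str(int(timeout_seconds * 1000))
    return "SET statement_timeout = " + milliseconds
```

A value of `1.5` produces `SET statement_timeout = 1500`, while a string such as `"30; DROP TABLE users"` fails inside `int()` with `ValueError` before any statement is built.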
2026-04-01 17:58:49 +02:00
Zamil Majdy
abd9fbe08a docs(backend): regenerate block docs to fix check-docs-sync 2026-04-01 17:58:49 +02:00
Zamil Majdy
81308af770 fix(backend): fix remaining stale pyodbc comment in MSSQL section 2026-04-01 17:58:49 +02:00
Zamil Majdy
a726c1d1d5 fix(backend): address round 2 review — port validation, comment fix, dead fallback
- Add ge=1, le=65535 port validation to Input schema
- Fix inaccurate comment: pymssql not pyodbc
- Replace _DATABASE_TYPE_DEFAULT_PORT.get() with direct dict access
  (all types have entries after SQLite removal)
- Update default port tests to use port=None instead of port=0
2026-04-01 17:58:49 +02:00
Zamil Majdy
a015bf9e1c fix(backend): address review round — remove SQLite, hide password, cleanup dead code
- Remove DatabaseType.SQLITE from enum (rejected at runtime, confusing UX)
- Remove all SQLite dead code paths (driver map, connect_args, runtime check)
- Change render_as_string(hide_password=False) to hide_password=True to avoid
  materializing plaintext credentials in a local variable
- Simplify pinned_host assignment (remove unreachable fallback branch)
- Remove SQLite-related test cases
- Add doc comment to _make_input noting read_only default deviation
2026-04-01 17:58:49 +02:00
Zamil Majdy
d99278a40d fix(backend): update _sanitize_error docstring to mention IPv6 scrubbing 2026-04-01 17:58:49 +02:00
Zamil Majdy
bd7d9a5697 fix(backend): address round 1 review findings for SQL query block
- Fix database name injection: pass URL object to create_engine() instead
  of rendered string to prevent query parameter injection via database name
- Refactor _validate_query_is_read_only to accept parsed Statement object,
  eliminating duplicate sqlparse.parse() call
- Add IPv6 address scrubbing to _sanitize_error
- Fix docs: remove sqlite from valid types, correct host type annotation
2026-04-01 17:58:48 +02:00
Zamil Majdy
9cfa53a2ff fix(backend): document MySQL MAX_EXECUTION_TIME limitation for write queries
Add code comment noting that MySQL's MAX_EXECUTION_TIME only applies to
SELECT statements; write operations rely on the database's wait_timeout.
2026-04-01 17:58:48 +02:00
Zamil Majdy
e6cf899a6d fix(docs): regenerate block docs to sync with code schema
Ran generate_block_docs.py to fix check-docs-sync CI failure.
The Inputs table is auto-generated from the block schema.
2026-04-01 17:58:48 +02:00
Zamil Majdy
b655b30aeb fix: address review findings on SQL query block PR
- Remove unnecessary pool_pre_ping/pool_recycle (engine disposed per-query)
- Fix _extract_keyword_tokens docstring to match implementation
- Move DATABASE enum entry to alphabetical position in ProviderName
- Add database entry to frontend providerIcons map
- Revert no-op string-literal extraction in API key modals
- Revert unused _provider param in getCredentialTypeLabel
2026-04-01 17:58:48 +02:00
Zamil Majdy
5b8daf5d4c fix(docs): correct SQL query block documentation to match code
- Fix "How it works" to say read-only by default (not write-enabled)
- Replace "connection URL" with "discrete host/port/database fields"
- Remove sqlite from database_type options (disabled in code)
- Fix host type from "str (password)" to "str (secret)"
2026-04-01 17:58:48 +02:00
Zamil Majdy
9b74b7bb41 fix(backend): handle single-quoted usernames in SQL error sanitization
MySQL and MSSQL error messages use single quotes around usernames (e.g.
"Access denied for user 'myuser'@'host'"), but _sanitize_error only
handled double-quoted usernames. This could leak usernames to the LLM.

Now handles both quote styles in the regex and bare replacement.
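
A sketch of a pattern that accepts either quote style. It illustrates only the quoted-username part, not the bare replacement, and the `<user>` placeholder and function name are hypothetical.

```python
import re

# Matches user "name" or user 'name', requiring the same quote on both sides.
_QUOTED_USER = re.compile(r"""(user\s+)(["'])[^"']+\2""", re.IGNORECASE)


def scrub_username(message: str) -> str:
    """Replace a quoted username after the word 'user' with a placeholder."""
    return _QUOTED_USER.sub(r"\g<1>\g<2><user>\g<2>", message)
```

The backreference `\2` keeps the original quote character, so the scrubbed message retains its driver-specific formatting.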
2026-04-01 17:58:48 +02:00
Zamil Majdy
a1578984cc fix(backend): update poetry.lock and regenerate block docs
- Run `poetry lock` to include pymssql and pymysql in the lock file
- Regenerate block docs to reflect the Optional[int] port field change
2026-04-01 17:58:48 +02:00
Zamil Majdy
c0869e9168 fix(backend): fix port default for MySQL/MSSQL and add missing DB drivers
- Make port field Optional[int] with default=None so the `or` fallback
  correctly picks the database-specific default port (3306 for MySQL,
  1433 for MSSQL) instead of always using 5432
- Add pymysql and pymssql dependencies and update driver names to
  mysql+pymysql and mssql+pymssql so SQLAlchemy can connect to these DBs
- Handle ModuleNotFoundError gracefully if a driver is unavailable
- Update pymssql connect_args to use login_timeout (pymssql API)
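
The `or` fallback in the first bullet can be sketched as follows. The dictionary keys and function name are hypothetical; the port numbers are the ones stated in this log.

```python
from __future__ import annotations

# Default ports as stated in the commit messages; keys are illustrative.
DEFAULT_PORTS = {"postgres": 5432, "mysql": 3306, "mssql": 1433}


def resolve_port(database_type: str, port: int | None = None) -> int:
    """Use the explicit port if given, else the database-specific default."""
    return port or DEFAULT_PORTS[database_type]
```

With a non-`None` default such as 5432, the `or` never falls through, which is why MySQL and MSSQL connections previously used the PostgreSQL port.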
2026-04-01 17:58:48 +02:00
Zamil Majdy
0db5a6ff9a docs: regenerate block docs after SQL block Input schema changes 2026-04-01 17:58:48 +02:00
Zamil Majdy
3664624445 fix(backend): improve SQL block Input schema UX
- Mark host field as secret=True so it renders as masked text in the UI
- Change default port from 0 to 5432 (PostgreSQL default) to avoid confusing "0"
- Set database_type to advanced=False so it shows as a prominent dropdown
- Reorder fields: database_type -> host -> port -> database -> query -> read_only
- Improve descriptions and placeholders for clarity
2026-04-01 17:58:48 +02:00
Zamil Majdy
f1e2ce0703 fix(backend): add MSSQL timeout enforcement and document read-only gap
Address review feedback: add SET LOCK_TIMEOUT for MSSQL connections to
enforce query timeout at the database level, consistent with the
PostgreSQL/MySQL implementations. Document that MSSQL lacks a
session-level read-only mode, with defense-in-depth handled by the SQL
validation layer and ROLLBACK in the finally block.
2026-04-01 17:58:48 +02:00
Zamil Majdy
c226cf0925 fix(backend): address Sentry review comments on SQL query block
- Use database_type enum instead of substring-checking connection string
  to determine driver-specific connect_args (fixes false match when db
  name/user/password contains "mssql" or "sqlite")
- Use BEGIN TRANSACTION for MSSQL instead of BEGIN (T-SQL syntax)
- Extend port sanitization regex to also match :port format (host:5432)
- Add test for colon-format port sanitization
2026-04-01 17:58:48 +02:00
Zamil Majdy
dade634b4a fix(backend): use plain string for host in test_input to fix JSON schema validation
The test_input host value should be a plain string (Pydantic coerces it
to SecretStr), not a SecretStr object which serializes as '**********'
and fails JSON schema validation in the block test framework.
2026-04-01 17:58:48 +02:00
Zamil Majdy
34101c4389 docs: regenerate block docs after host field type change to SecretStr 2026-04-01 17:58:48 +02:00
Zamil Majdy
2218254c8a fix(backend): make SQL block host a SecretStr and harden error sanitization
- Change `host` field from `str` to `SecretStr` so it is hidden from
  repr/logs but still stored in graph JSON.
- Expand `_sanitize_error` to strip hostnames, IP addresses, usernames,
  and port numbers from error messages exposed to the LLM.
- Add tests for hostname/IP/username/port scrubbing and an integration
  test verifying no infrastructure details leak through run() errors.
2026-04-01 17:58:48 +02:00
Zamil Majdy
4d63cffa7a docs: regenerate block docs after SQL query block port field change 2026-04-01 17:58:48 +02:00
Zamil Majdy
ebf3b920d8 fix(backend): address 3 review findings in SQL query block
1. Fix DNS rebinding TOCTOU: pin connection to resolved IP from
   check_host_allowed instead of re-resolving the hostname, preventing
   SSRF via DNS rebinding attacks.

2. Fix default port per database type: use _DATABASE_TYPE_DEFAULT_PORT
   lookup instead of hard-coded 5432, so MySQL (3306) and MSSQL (1433)
   work without manually specifying the port.

3. Fix MSSQL connect_timeout: use pyodbc's "timeout" key instead of
   "connect_timeout" which is silently ignored, preventing indefinite
   hangs on unreachable MSSQL servers.
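
Item 1 can be illustrated with a standard-library sketch: resolve the hostname once, validate every returned address, and hand the validated IP (not the hostname) to the driver. The function name `resolve_and_pin` and the exact blocklist checks are assumptions; the real `check_host_allowed` is not shown in this log.

```python
import ipaddress
import socket


def resolve_and_pin(host: str, port: int) -> str:
    """Resolve `host` once, reject blocked addresses, return the IP to connect to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    ips = [info[4][0] for info in infos]
    if not ips:
        raise ValueError("host did not resolve")
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        # Assumed blocklist: private, loopback, and link-local ranges.
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            raise ValueError("blocked address")
    # The caller connects to this IP directly, so a second DNS lookup
    # (which an attacker could answer differently) never happens.
    return ips[0]
```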
2026-04-01 17:58:48 +02:00
Zamil Majdy
9bd579b041 fix(platform): clean up Pyright warnings, fix comment-only query test, sync docs
- Use _DATABASE_TYPE_DEFAULT_PORT for port fallback in SQLQueryBlock.run()
- Rename **kwargs to **_kwargs and *args to *_args to silence not-accessed warnings
- Fix _validate_single_statement to reject comment-only queries as empty
- Fix test_comment_only_query to assert specific error message
- Prefix unused `provider` param with _ in getCredentialTypeLabel (frontend lint)
- Regenerate block docs to fix check-docs-sync CI
2026-04-01 17:58:48 +02:00
Zamil Majdy
41601cbb5c fix(platform): switch SQL block to user_password credentials to fix special char passwords
The SQL block previously used api_key credential type, stuffing the entire
connection URL (including password) into one field. This broke when passwords
contained special characters (@, #, !) that conflict with URL syntax.

Switch to user_password credential type with separate username/password fields.
Build the SQLAlchemy URL internally via URL.create() which accepts raw passwords
without URL encoding. Also restore accidentally deleted _validate_query_is_read_only
function, remove unused _encode_password_in_url/quote/unquote imports, and clean up
database-specific UI overrides in the frontend credential modals.
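
To show why a raw password inside a connection URL breaks, here is a standard-library sketch. It does not use SQLAlchemy's `URL.create()` (the fix this commit describes); the host, user, and password values are made up.

```python
from urllib.parse import quote, unquote, urlsplit

password = "p@ss#1"

# Naive interpolation: '#' starts a URL fragment and '@' ends the userinfo early,
# so the parser sees the wrong host and a truncated password.
broken = urlsplit(f"postgresql://user:{password}@db.example.com:5432/app")

# Percent-encoding the password keeps the URL unambiguous.
encoded = urlsplit(
    f"postgresql://user:{quote(password, safe='')}@db.example.com:5432/app"
)

recovered_password = unquote(encoded.password)
```

Here `broken.hostname` is `ss` rather than `db.example.com`, while the encoded form parses to the intended host and round-trips the password. Passing username and password as separate fields avoids the encoding step entirely.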
2026-04-01 17:58:48 +02:00
Zamil Majdy
c636b6f310 test(backend): add integration tests for SQLQueryBlock SSRF, SQLite, and error handling
Add run()-level tests covering SSRF private IP rejection (127.0.0.1,
10.x, 172.16.x, 192.168.x), Unix socket blocking, missing hostname
rejection, SQLite disabled error, credential sanitization on connection
failure, query timeout clean error, URL type mismatch rejection, happy
path, and EXECUTE keyword rejection. Also adds time serialization test.
2026-04-01 17:58:48 +02:00
Zamil Majdy
292be77b86 fix(platform): show "Connection URL" instead of "API Key" for database credentials
The SQL query block's credential dialog was misleadingly labeled since a
database connection URL is not an API key. This updates both backend and
frontend:

- Shorten the DatabaseCredentialsField description so it no longer
  truncates in the UI
- Make credential labels provider-aware so the database provider shows
  "Connection URL" instead of "API Key" in tab labels, input fields,
  placeholders, and action buttons
2026-04-01 17:58:48 +02:00
Zamil Majdy
dd3349e6bc fix(backend): block SELECT INTO, disable SQLite, fix read-only transaction ordering
- Add INTO, OUTFILE, DUMPFILE to disallowed SQL keywords to prevent
  SELECT...INTO table creation and file writes
- Disable SQLite database type (lacks path sandboxing and read-only
  enforcement) until proper restrictions are implemented
- Fix read-only transaction enforcement: use AUTOCOMMIT to issue SET
  commands, then open explicit BEGIN/ROLLBACK transaction for the user
  query so read-only constraints apply to it (not the next transaction)
- Add regression tests for SELECT INTO variants
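
A simplified sketch of keyword blocking that catches `SELECT ... INTO`. This is a regex approximation for illustration only; per an earlier commit in this log the block itself tokenizes with sqlparse. `INTO`, `OUTFILE`, and `DUMPFILE` come from the commit message; the other keywords and all names are assumptions.

```python
import re

# INTO/OUTFILE/DUMPFILE are from the commit; the rest are assumed examples.
DISALLOWED = {"INSERT", "UPDATE", "DELETE", "DROP", "INTO", "OUTFILE", "DUMPFILE"}

_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")  # single-quoted SQL strings
_WORD = re.compile(r"[A-Za-z_]+")                # identifiers and keywords


def find_disallowed(query: str) -> set:
    """Return disallowed keywords that appear outside string literals."""
    without_strings = _STRING_LITERAL.sub("''", query)
    words = {word.upper() for word in _WORD.findall(without_strings)}
    return words & DISALLOWED
```

Because underscores are part of a word, a column such as `into_count` does not match `INTO`.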
2026-04-01 17:58:48 +02:00
Zamil Majdy
bfdf4b99db fix(backend): make SSRF host check mockable for block test framework
Extract resolve_and_check_blocked into a check_host_allowed method on
SQLQueryBlock so the block test framework can mock it alongside
execute_query. Without this, test credentials pointing to localhost
trigger the SSRF blocklist in CI.
2026-04-01 17:58:48 +02:00
Zamil Majdy
aba78b0fdd refactor(backend): replace psycopg2 with SQLAlchemy for multi-database support
Refactor SQLQueryBlock to use SQLAlchemy instead of psycopg2, enabling
support for PostgreSQL, MySQL, SQLite, and MSSQL. Add a database_type
enum field to Input for selecting the target database. Connection
credentials now accept any SQLAlchemy connection URL format.

- Replace psycopg2 with sqlalchemy.create_engine + connection.execute(text())
- Add DatabaseType enum (postgres, mysql, sqlite, mssql)
- Add _validate_connection_url to ensure URL matches selected db type
- Rename ProviderName.POSTGRES to ProviderName.DATABASE
- Update SSRF protection to use SQLAlchemy URL parsing (make_url)
- Add urlparse import for SQLite network connection check
- Handle bytes serialization alongside memoryview
- Update tests with TestValidateConnectionUrl class and bytes test
- Update docs to reflect multi-database support
2026-04-01 17:58:48 +02:00
Zamil Majdy
12934dfd72 docs: regenerate block documentation for SQLQueryBlock 2026-04-01 17:58:48 +02:00
Zamil Majdy
c5507415fd fix(backend): harden SQL query block against injection, SSRF bypass, and precision loss
- Replace regex-based SQL validation with sqlparse tokenizer to prevent
  multi-statement injection via quoted comment bypass (e.g. SET LOCAL
  statement_timeout = 0). Keywords in string literals no longer cause
  false positives.
- Replace urlparse with psycopg2.extensions.parse_dsn for SSRF protection,
  handling both URI and libpq DSN formats. Reject missing hostname and
  Unix socket paths.
- Use server-side named cursor to enforce max_rows at the database level
  instead of fetching entire result set into client memory.
- Serialize fractional Decimal values as str instead of float to preserve
  exact precision for analytics data.
- Add sqlparse dependency.
- Add tests for multi-statement injection, string literal keywords, and
  high-precision Decimal serialization.
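
The Decimal rule can be sketched like this. The function name is hypothetical, and converting whole values to `int` is an assumption; the commit only states that fractional values become `str`.

```python
from decimal import Decimal
from typing import Union


def serialize_decimal(value: Decimal) -> Union[int, str]:
    """Serialize a Decimal for JSON without losing digits."""
    if not value.is_finite():
        return str(value)  # NaN and Infinity have no exact numeric JSON form
    if value == value.to_integral_value():
        return int(value)  # assumed handling for whole numbers
    return str(value)      # fractional values keep their exact digits
```

Going through `float` would round `Decimal("0.1234567890123456789")` to roughly 17 significant digits, which is the precision loss the commit avoids.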
2026-04-01 17:58:48 +02:00
Zamil Majdy
7ff096afd9 style(backend): extract sanitize_error to local vars for readability 2026-04-01 17:58:48 +02:00
Zamil Majdy
38fb504063 fix(backend): reduce keyword false positives, broaden SSRF handling, add tests
- Remove ambiguous keywords (COMMENT, ANALYZE, LOCK, CLUSTER, REINDEX,
  VACUUM) from disallowed list — they're harmless on readonly connections
  and cause false positives on common column names
- Add NOTE documenting intentional string-literal matching behavior
- Broaden SSRF exception handling to catch OSError (DNS failures)
- Add _serialize_value tests (Decimal, datetime, date, memoryview)
- Add tests for column names that look like keywords
2026-04-01 17:58:48 +02:00
Zamil Majdy
b4388a9c93 fix(backend): address PR review - security, async, SSRF, tests
- Add _sanitize_error() to scrub connection strings from error messages
- Wrap execute_query in asyncio.to_thread() to avoid blocking event loop
- Add SSRF protection via resolve_and_check_blocked() on database host
- Document intentional string-literal false positives in comment stripping
- Add sql_query_block_test.py with 36 tests for query validation and
  error sanitization
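
A minimal sketch of the `asyncio.to_thread()` pattern, with a stand-in for the synchronous driver call. Both function names are hypothetical and no database is contacted.

```python
import asyncio
import time


def execute_query_blocking(query: str) -> list:
    """Stand-in for a synchronous driver call; sleeps instead of querying."""
    time.sleep(0.05)
    return [{"query": query}]


async def run_query(query: str) -> list:
    # The blocking call runs in a worker thread, so other coroutines on the
    # event loop keep making progress while it waits.
    return await asyncio.to_thread(execute_query_blocking, query)


rows = asyncio.run(run_query("SELECT 1"))
```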
2026-04-01 17:58:48 +02:00
Zamil Majdy
a7a68e585a feat(backend): add SQL query block for CoPilot analytics access
Add a new SQLQueryBlock that allows CoPilot and user-built agents to
execute read-only SQL queries against PostgreSQL databases. This enables
data-driven answers for analytics (user metrics, retention, onboarding
funnels, execution stats) via the existing run_block tool.

- New POSTGRES provider in ProviderName enum
- APIKeyCredentials with connection string for MVP credential storage
- SELECT-only query validation with defense-in-depth keyword blocking
- Configurable query timeout (max 120s) and row limit (max 10000)
- Read-only connection mode + statement_timeout for safety
- JSON-safe serialization for Decimal, datetime, and binary types

Resolves: SECRT-2171
2026-04-01 17:58:48 +02:00
Zamil Majdy
14ad37b0c7 fix: resolve merge conflict in transcript.py re-export module 2026-04-01 17:53:57 +02:00
Zamil Majdy
24d0c35ed3 fix(backend/copilot): prompt-too-long retry, compaction churn, model-aware compression, and truncated tool call recovery (#12625)
## Why

CoPilot has several context management issues that degrade long
sessions:
1. "Prompt is too long" errors crash the session instead of triggering
retry/compaction
2. Stale thinking blocks bloat transcripts, causing unnecessary
compaction every turn
3. Compression target is hardcoded regardless of model context window
size
4. Truncated tool calls (empty `{}` args from max_tokens) kill the
session instead of guiding the model to self-correct

## What

**Fix 1: Prompt-too-long retry bypass (SENTRY-1207)**
The SDK surfaces "prompt too long" via `AssistantMessage.error` and
`ResultMessage.result` — neither triggered the retry/compaction loop
(only Python exceptions did). Now both paths are intercepted and
re-raised.

**Fix 2: Strip stale thinking blocks before upload**
Thinking/redacted_thinking blocks in non-last assistant entries are
10-50K tokens each but only needed for API signature verification in the
*last* message. Stripping before upload reduces transcript size and
prevents per-turn compaction.
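
A sketch of the stripping step, assuming transcript entries are dicts shaped like `{"type": "assistant", "message": {"id": ..., "content": [...]}}`. That shape and the function name are assumptions; the real `strip_stale_thinking_blocks()` may differ.

```python
from typing import Any, Dict, List

THINKING_TYPES = {"thinking", "redacted_thinking"}


def strip_stale_thinking(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop thinking blocks from every assistant entry except the last message."""
    # Reverse scan to find the id of the last assistant message.
    last_id = None
    for entry in reversed(entries):
        if entry.get("type") == "assistant":
            last_id = entry["message"].get("id")
            break

    result = []
    for entry in entries:
        if entry.get("type") == "assistant" and entry["message"].get("id") != last_id:
            message = entry["message"]
            kept = [b for b in message["content"] if b.get("type") not in THINKING_TYPES]
            entry = {**entry, "message": {**message, "content": kept}}
        result.append(entry)
    return result
```

The input list is not mutated; entries that need stripping are replaced with shallow copies.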

**Fix 3: Model-aware compression target**
`compress_context()` now computes `target_tokens` from the model's
context window (e.g. 140K for Opus 200K) instead of a hardcoded 120K
default. Larger models retain more history; smaller models compress more
aggressively.
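
The arithmetic can be sketched as below. The 60K overhead, the 120K default, and the 200K Opus window come from this PR description; the model key, the lookup table, and the fallback for unknown models are assumptions.

```python
from typing import Optional

OVERHEAD_TOKENS = 60_000        # overhead subtracted from the context window
DEFAULT_TARGET_TOKENS = 120_000  # previous hardcoded target, assumed as fallback

# Hypothetical lookup table; real model names and windows may differ.
CONTEXT_WINDOWS = {"opus": 200_000}


def compression_target(model: Optional[str]) -> int:
    """Compute a compression target from the model's context window."""
    window = CONTEXT_WINDOWS.get(model or "")
    if window is None:
        return DEFAULT_TARGET_TOKENS
    return window - OVERHEAD_TOKENS
```

For a 200K window this yields 140K, matching the example in the text.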

**Fix 4: Self-correcting truncated tool calls**
When the model's response exceeds max_tokens, tool call inputs get
silently truncated to `{}`. Previously this tripped a circuit breaker
after 3 attempts. Now the MCP wrapper detects empty args and returns
guidance: "write in chunks with `cat >>`, pass via
`@@agptfile:filename`". The model can self-correct instead of the
session dying.
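
One way to picture the wrapper, as a sketch only: the names, the handler signature, and the guidance wording are assumptions, and the actual `_truncating` wrapper in `tool_adapter.py` is not shown in this log.

```python
from typing import Any, Callable, Dict, List

ToolHandler = Callable[[Dict[str, Any]], str]

# Paraphrase of the guidance quoted in the PR description.
GUIDANCE = (
    "Tool call arguments were empty, likely truncated by max_tokens. "
    "Write large content in chunks with `cat >>` and pass it via "
    "`@@agptfile:filename`."
)


def guard_empty_args(required: List[str], handler: ToolHandler) -> ToolHandler:
    """Return guidance instead of failing when a tool with required params gets {}."""

    def wrapped(args: Dict[str, Any]) -> str:
        if required and not args:
            return GUIDANCE
        return handler(args)

    return wrapped
```

Tools without required parameters pass through unchanged, since `{}` is a valid call for them.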

## How

- **service.py**: `_is_prompt_too_long` checks in both
`AssistantMessage.error` and `ResultMessage` error handlers. Circuit
breaker limit raised from 3→5.
- **transcript.py**: `strip_stale_thinking_blocks()` reverse-scans for
last assistant `message.id`, strips thinking blocks from all others.
Called in `upload_transcript()`.
- **prompt.py**: `get_compression_target(model)` computes
`context_window - 60K overhead`. `compress_context()` uses it when
`target_tokens` is None.
- **tool_adapter.py**: `_truncating` wrapper intercepts empty args on
tools with required params, returns actionable guidance instead of
failing.

## Related

- Fixes SENTRY-1207
- Sessions: `d2f7cba3` (repeated compaction), `08b807d4` (prompt too
long), `130d527c` (truncated tool calls)
- Extends #12413, consolidates #12626

## Test plan

- [x] 6 unit tests for `strip_stale_thinking_blocks`
- [x] 1 integration test for ResultMessage prompt-too-long → compaction
retry
- [x] Pyright clean (0 errors), all pre-commit hooks pass
- [ ] E2E: Load transcripts from affected sessions and verify behavior
2026-04-01 15:10:57 +00:00
Zamil Majdy
389cd28879 test: add round 3 E2E screenshots for PR #12623 2026-04-01 17:01:10 +02:00
Zamil Majdy
656858eba1 test: add E2E screenshots for PR #12581 round 3 2026-04-01 16:58:11 +02:00
Zamil Majdy
8aae7751dc fix(backend/copilot): prevent duplicate block execution from pre-launch arg mismatch (#12632)
## Why

CoPilot sessions are duplicating Linear tickets and GitHub PRs.
Investigation of 5 production sessions (March 31st) found that 3/5
created duplicate Linear issues — each with consecutive IDs at the exact
same timestamp, but only one visible in Langfuse traces.

Production gcloud logs confirm: **279 arg mismatch warnings per day**,
**37 duplicate block execution pairs**, and all LinearCreateIssueBlock
failures occurring in pairs.

Related: SECRT-2204

## What

Replace the speculative pre-launch mechanism with the SDK's native
parallel dispatch via `readOnlyHint` tool annotations. Remove ~580 lines
of pre-launch infrastructure code.

## How

### Root cause
The pre-launch mechanism had three compounding bugs:
1. **Arg mismatch**: The SDK CLI normalises args between the
`AssistantMessage` (used for pre-launch) and the MCP `tools/call`
dispatch, causing frequent mismatches (279/day in prod)
2. **FIFO desync on denial**: Security hooks can deny tool calls,
causing the CLI to skip the MCP dispatch — but the pre-launched task
stays in the FIFO queue, misaligning all subsequent matches
3. **Cancel race**: `task.cancel()` is best-effort in asyncio — if the
HTTP call to Linear/GitHub already completed, the side effect is
irreversible
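
The cancel race (item 3) can be reproduced with a small asyncio sketch. The list `side_effects` stands in for an external system such as Linear or GitHub; all names are hypothetical.

```python
import asyncio

side_effects = []  # stands in for issues created in an external system


async def create_issue(title: str) -> str:
    side_effects.append(title)  # the "HTTP call" has already taken effect
    await asyncio.sleep(0.01)   # later work, e.g. parsing the response
    return title


async def main() -> bool:
    task = asyncio.create_task(create_issue("duplicate ticket"))
    await asyncio.sleep(0)  # let the task run up to its first await
    task.cancel()           # cancellation lands after the side effect
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task.cancelled()


was_cancelled = asyncio.run(main())
```

The task reports as cancelled, yet `side_effects` still contains the ticket: cancellation stops the coroutine but cannot undo work that has already completed.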

### Fix
- **Removed** `pre_launch_tool_call()`, `cancel_pending_tool_tasks()`,
`_tool_task_queues` ContextVar, all FIFO queue logic, and all 4
`cancel_pending_tool_tasks()` calls in `service.py`
- **Added** `readOnlyHint=True` annotations on 15+ read-only tools
(`find_block`, `search_docs`, `list_workspace_files`, etc.) — the SDK
CLI natively dispatches these in parallel ([ref:
anthropics/claude-code#14353](https://github.com/anthropics/claude-code/issues/14353))
- Side-effect tools (`run_block`, `bash_exec`, `create_agent`, etc.)
have no annotation → CLI runs them sequentially → no duplicate execution
risk

### Net change: -578 lines, +105 lines
2026-04-01 13:42:54 +00:00
An Vy Le
725da7e887 dx(backend/copilot): clarify ambiguous agent goals using find_block before generation (#12601)
### Why / What / How

**Why:** When a user asks CoPilot to build an agent with an ambiguous
goal (output format, delivery channel, data source, or trigger
unspecified), the agent generator previously made assumptions and jumped
straight into JSON generation. This produced agents that didn't match
what the user actually wanted, requiring multiple correction cycles.

**What:** Adds a "Clarifying Before Building" section to the agent
generation guide. When the goal is ambiguous, CoPilot first calls
`find_block` to discover what the platform actually supports for the
ambiguous dimension, then asks the user one concrete question grounded
in real platform options (e.g. "The platform supports Gmail, Slack, and
Google Docs — which should the agent use for delivery?"). Only after the
user answers does the full agent generation workflow proceed.

**How:** The clarification instruction is added to
`agent_generation_guide.md` — the guide loaded on-demand via
`get_agent_building_guide` when the LLM is about to build an agent. This
avoids polluting the system prompt supplement (which loads for every
CoPilot conversation, not just agent building). No dedicated tool is
needed — the LLM asks naturally in conversation text after discovering
real platform options via `find_block`.

### Changes 🏗️

- `backend/copilot/sdk/agent_generation_guide.md`: Adds "Clarifying
Before Building" section before the workflow steps. Instructs the model
to call `find_block` for the ambiguous dimension, ask the user one
grounded question, wait for the answer, then proceed to generation.
- `backend/copilot/prompting_test.py`: New test file verifying the guide
contains the clarification section and references `find_block`.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [ ] Ask CoPilot to "build an agent to send a report" (ambiguous
    output) and verify it calls `find_block` for delivery options and asks
    one grounded question before generating JSON
  - [ ] Ask CoPilot to "build an agent to scrape prices from Amazon and
    email me daily" (specific goal) and verify it skips clarification and
    proceeds directly to agent generation
  - [ ] Verify the clarification question lists real block options (e.g.
    Gmail, Slack, Google Docs) rather than abstract options

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-04-01 13:32:12 +00:00
seer-by-sentry[bot]
bd9e9ec614 fix(frontend): remove LaunchDarkly local storage bootstrapping (#12606)
### Why / What / How

**Why:** This PR fixes
[BUILDER-7HD](https://sentry.io/organizations/significant-gravitas/issues/7374387984/).
The LaunchDarkly SDK fails to construct its streaming URL when malformed
`localStorage` bootstrap data yields a non-string `_url`.

**What:** Removes the `bootstrap: "localStorage"` option from the
LaunchDarkly provider configuration.

**How:** LaunchDarkly no longer attempts to load initial feature flag
values from local storage. Flag values are now always fetched directly
from the LaunchDarkly service, which prevents failures caused by stale or
malformed local storage data.

### Changes 🏗️

- Removed the `bootstrap: "localStorage"` option from the LaunchDarkly
provider configuration.
- LaunchDarkly will now always fetch flag values directly from its
service, bypassing local storage.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Verify that LaunchDarkly flags are loaded correctly without
issues.
- [ ] Ensure no errors related to `localStorage` or streaming URL
construction appear in the console.

#### For configuration changes:

- [ ] `.env.default` is updated or already compatible with my changes
- [ ] `docker-compose.yml` is updated or already compatible with my
changes
- [ ] I have included a list of my configuration changes in the PR
description (under **Changes**)

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
Co-authored-by: seer-by-sentry[bot] <157164994+seer-by-sentry[bot]@users.noreply.github.com>
2026-04-01 19:12:54 +07:00
Nicholas Tindle
88589764b5 dx(platform): normalize agent instructions for Claude and Codex (#12592)
### Why / What / How

Why: repo guidance was split between Claude-specific `CLAUDE.md` files
and Codex-specific `AGENTS.md` files, which duplicated instruction
content and made the same repository behave differently across agents.
The repo also had Claude skills under `.claude/skills` but no
Codex-visible repo skill path.

What: this PR bridges the repo's Claude skills into Codex and normalizes
shared instruction files so `AGENTS.md` becomes the canonical source
while each `CLAUDE.md` imports its sibling `AGENTS.md`.

How: add a repo-local `.agents/skills` symlink pointing to
`../.claude/skills`; move nested `CLAUDE.md` content into sibling
`AGENTS.md` files; replace each repo `CLAUDE.md` with a one-line
`@AGENTS.md` shim so Claude and Codex read the same scoped guidance
without duplicating text. The root `CLAUDE.md` now imports the root
`AGENTS.md` rather than symlinking to it.

Note: the instruction-file normalization commit was created with
`--no-verify` because the repo's frontend pre-commit `tsc` hook
currently fails on unrelated existing errors, largely missing
`autogpt_platform/frontend/src/app/api/__generated__/*` modules.

### Changes 🏗️

- Add `.agents/skills` as a repo-local symlink to `../.claude/skills` so
Codex discovers the existing Claude repo skills.
- Add a real root `CLAUDE.md` shim that imports the canonical root
`AGENTS.md`.
- Promote nested scoped instruction content into sibling `AGENTS.md`
files under `autogpt_platform/`, `autogpt_platform/backend/`,
`autogpt_platform/frontend/`, `autogpt_platform/frontend/src/tests/`,
and `docs/`.
- Replace the corresponding nested `CLAUDE.md` files with one-line
`@AGENTS.md` shims.
- Preserve the existing scoped instruction hierarchy while making the
shared content cross-compatible between Claude and Codex.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified `.agents/skills` resolves to `../.claude/skills`
  - [x] Verified each repo `CLAUDE.md` now contains only `@AGENTS.md`
  - [x] Verified the expected `AGENTS.md` files exist at the root and
    nested scoped directories
  - [x] Verified the branch contains only the intended agent-guidance
    commits relative to `dev` and the working tree is clean

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

No runtime configuration changes are included in this PR.

2026-04-01 09:08:51 +00:00
Zamil Majdy
f0a3afda7d Add test screenshots for PR #12623 2026-04-01 08:49:33 +02:00
Zamil Majdy
a9cbb3ee2f test: add screenshots from PR #12581 round 2 testing 2026-04-01 08:47:38 +02:00
Zamil Majdy
1810452920 fix(frontend): use type-safe any cast for createSessionMutation call
The generated mutation type differs between local (void) and CI
(requires CreateSessionRequest) due to export-api-schema regeneration.
Use an explicit any cast to handle both generated type variants.
2026-04-01 08:46:17 +02:00
Zamil Majdy
4f6f3ca240 fix(frontend): remove redundant tier fetch and add empty-query guard
The backend get_user_rate_limit endpoint already returns tier in the
response — remove the separate fetchTier() calls that were duplicating
the request. Also guard search_users against empty queries to prevent
returning the entire user table. Fix pre-existing TS error in
useChatSession where createSessionMutation was called with an argument
the generated client no longer expects.
2026-04-01 08:13:15 +02:00
Zamil Majdy
9ffecbac02 fix(backend/copilot): add missing mode param to enqueue_copilot_turn docstring 2026-04-01 08:03:35 +02:00
Zamil Majdy
eb22cf4483 fix(frontend): remove duplicate JSDoc and simplify tier access in rate-limit admin UI 2026-04-01 06:33:52 +02:00
Zamil Majdy
16636b64c6 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-04-01 06:15:37 +02:00
Zamil Majdy
c2709fbc28 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-mode-toggle 2026-04-01 06:14:49 +02:00
Zamil Majdy
c659f3b058 fix(copilot): fix dry-run simulation showing INCOMPLETE/error status (#12580)
## Summary
- **Backend**: Strip empty `error` pins from dry-run simulation outputs
that the simulator always includes (set to `""` meaning "no error").
This was causing the LLM to misinterpret successful simulations as
failures and report "INCOMPLETE" status to users
- **Backend**: Add explicit "Status: COMPLETED" to dry-run response
message to prevent LLM misinterpretation
- **Backend**: Update simulation prompt to exclude `error` from the
"MUST include" keys list, and instruct LLM to omit error unless
simulating a logical failure
- **Frontend**: Fix `isRunBlockErrorOutput()` type guard that was too
broad (`"error" in output` matched BlockOutputResponse objects, not just
ErrorResponse), causing dry-run results to be displayed as errors
- **Frontend**: Fix `parseOutput()` fallback matching to not classify
BlockOutputResponse as ErrorResponse
- **Frontend**: Filter out empty error pins from `BlockOutputCard`
display and accordion metadata output key counting
- **Frontend**: Clear stale execution results before dry-run/no-input
runs so the UI shows fresh output
- **Frontend**: Fix first-click simulate race condition by invalidating
execution details query after WebSocket subscription confirms
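
The backend filtering in the first bullet might look like the sketch below, assuming simulation outputs are a flat mapping of pin name to value. That shape and the function name are assumptions.

```python
from typing import Any, Dict


def strip_empty_error_pin(outputs: Dict[str, Any]) -> Dict[str, Any]:
    """Drop an `error` pin whose value is the empty string ("no error")."""
    return {
        name: value
        for name, value in outputs.items()
        if not (name == "error" and value == "")
    }
```

A non-empty `error` value is kept, so a simulated logical failure still reaches the caller.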

## Test plan
- [x] All 12 existing + 5 new dry-run tests pass (`poetry run pytest
backend/copilot/tools/test_dry_run.py -x -v`)
- [x] All 23 helpers tests pass (`poetry run pytest
backend/copilot/tools/helpers_test.py -x -v`)
- [x] All 13 run_block tests pass (`poetry run pytest
backend/copilot/tools/run_block_test.py -x -v`)
- [x] Backend linting passes (ruff check + format)
- [x] Frontend linting passes (next lint)
- [ ] Manual: trigger dry-run on a block with error output pin (e.g.
Komodo Image Generator) — should show "Simulated" status with clean
output, no misleading "error" section
- [ ] Manual: first click on Simulate button should immediately show
results (no race condition)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-03-31 21:03:00 +00:00
Zamil Majdy
80581a8364 fix(copilot): add tool call circuit breakers and intermediate persistence (#12604)
## Why

CoPilot session `d2f7cba3` took **82 minutes** and cost **$20.66** for a
single user message. Root causes:
1. Redis session meta key expired after 1h, making the session invisible
to the resume endpoint — causing empty page on reload
2. Redis stream key also expired during sub-agent gaps (task_progress
events produced no chunks)
3. No intermediate persistence — session messages only saved to DB after
the entire turn completes
4. Sub-agents retried similar WebSearch queries (addressed via prompt
guidance)

## What

### Redis TTL fixes (root cause of empty session on reload)
- `publish_chunk()` now periodically refreshes **both** the session meta
key AND stream key TTL (every 60s).
- `task_progress` SDK events now emit `StreamHeartbeat` chunks, ensuring
`publish_chunk` is called even during long sub-agent gaps where no real
chunks are produced.
- Without this fix, turns exceeding the 1h `stream_ttl` lose their
"running" status and stream data, making `get_active_session()` return
False.

### Intermediate DB persistence
- Session messages flushed to DB every **30 seconds** or **10 new
messages** during the stream loop.
- Uses `asyncio.shield(upsert_chat_session())` matching the existing
`finally` block pattern.
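
The flush condition (every 30 seconds or 10 new messages) can be sketched as a small policy object. The class and its injectable clock are illustrative assumptions; the actual logic lives in the `_run_stream_attempt` stream loop and is not shown here.

```python
import time
from typing import Callable


class FlushPolicy:
    """Decide when buffered session messages should be written to the DB."""

    def __init__(
        self,
        interval_s: float = 30.0,
        max_pending: int = 10,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._interval_s = interval_s
        self._max_pending = max_pending
        self._clock = clock
        self._last_flush = clock()
        self._pending = 0

    def add_message(self) -> None:
        self._pending += 1

    def should_flush(self) -> bool:
        if self._pending == 0:
            return False  # nothing new to persist
        elapsed = self._clock() - self._last_flush
        return self._pending >= self._max_pending or elapsed >= self._interval_s

    def mark_flushed(self) -> None:
        self._pending = 0
        self._last_flush = self._clock()
```

Injecting the clock keeps the policy testable without real waiting; the stream loop would call `should_flush()` after each message and `mark_flushed()` after a successful upsert.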

### Orphaned message cleanup on rollback
- On stream attempt rollback, orphaned messages persisted by
intermediate flushes are now cleaned up from the DB via
`delete_messages_from_sequence`.
- Prevents stale messages from resurfacing on page reload after a failed
retry.

### Prompt guidance
- Added web search best practices to code supplement (search efficiency,
sub-agent scope separation).

### Approach: root cause fixes, not capability limits
- **No tool call caps** — artificial limits on WebSearch or total tool
calls would reduce autopilot capability without addressing why searches
were redundant.
- **Task tool remains enabled** — sub-agent delegation via Task is a
core capability. The existing `max_subtasks` concurrency guard is
sufficient.
- The real fixes (TTL refresh, persistence, prompt guidance) address the
underlying bugs and behavioral issues.

## How

### Files changed
- `stream_registry.py` — Redis meta + stream key TTL refresh in
`publish_chunk()`, module-level keepalive tracker
- `response_adapter.py` — `task_progress` SystemMessage →
StreamHeartbeat emission
- `service.py` — Intermediate DB persistence in `_run_stream_attempt`
stream loop, orphan cleanup on rollback
- `db.py` — `delete_messages_from_sequence` for rollback cleanup
- `prompting.py` — Web search best practices

### GCP log evidence
```
# Meta key expired during 82-min turn:
09:49 — GET_SESSION: active_session=False, msg_count=1  ← meta gone
10:18 — Session persisted in finally with 189 messages   ← turn completed

# T13 (1h45min) same bug reproduced live:
16:20 — task_progress events still arriving, but active_session=False

# Actual cost:
Turn usage: cache_read=347916, cache_create=212472, output=12375, cost_usd=20.66
```

### Test plan
- [x] task_progress emits StreamHeartbeat
- [x] Task background blocked, foreground allowed, slot release on
completion/failure
- [x] CI green (lint, type-check, tests, e2e, CodeQL)

---------

Co-authored-by: Zamil Majdy <majdy.zamil@gmail.com>
2026-03-31 21:01:56 +00:00
lif
3c046eb291 fix(frontend): show all agent outputs instead of only the last one (#12504)
Fixes #9175

### Changes 🏗️

The Agent Outputs panel only displayed the last execution result per
output node, discarding all prior outputs during a run.

**Root cause:** In `AgentOutputs.tsx`, the `outputs` useMemo extracted
only the last element from `nodeExecutionResults`:
```tsx
const latestResult = executionResults[executionResults.length - 1];
```

**Fix:** Changed `.map()` to `.flatMap()` over output nodes, iterating
through all `executionResults` for each node. Each execution result now
gets its own renderer lookup and metadata entry, so the panel shows
every output produced during the run.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified TypeScript compiles without errors
  - [x] Confirmed the flatMap logic correctly iterates all execution
    results
  - [x] Verified existing filter for null renderers is preserved
  - [x] Run an agent with multiple outputs and confirm all show in the
    panel

---------

Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-31 20:31:12 +00:00
Zamil Majdy
3adbaacc0e Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-mode-toggle 2026-03-31 19:07:34 +02:00
Zamil Majdy
4da3535a9c Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-03-31 19:07:23 +02:00
Zamil Majdy
3e25488b2d feat(copilot): add session-level dry_run flag to autopilot sessions (#12582)
## Summary
- Adds a session-level `dry_run` flag that forces ALL tool calls
(`run_block`, `run_agent`) in a copilot/autopilot session to use dry-run
simulation mode
- Stores the flag in a typed `ChatSessionMetadata` JSON model on the
`ChatSession` DB row, accessed via `session.dry_run` property
- Adds `dry_run` to the AutoPilot block Input schema so graph builders
can create dry-run autopilot nodes
- Refactors multiple copilot tools from `**kwargs` to explicit
parameters for type safety

## Changes
- **Prisma schema**: Added `metadata` JSON column to `ChatSession` model
with migration
- **Python models**: Added `ChatSessionMetadata` model with `dry_run`
field, added `metadata` field to `ChatSessionInfo` and `ChatSession`,
updated `from_db()`, `new()`, and `create_chat_session()`
- **Session propagation**: `set_execution_context(user_id, session)`
called from `baseline/service.py` so tool handlers can read
session-level flags via `session.dry_run`
- **Tool enforcement**: `run_block` and `run_agent` check
`session.dry_run` and force `dry_run=True` when set; `run_agent` blocks
scheduling in dry-run sessions
- **AutoPilot block**: Added `dry_run` input field, passes it when
creating sessions
- **Chat API**: Added `CreateSessionRequest` model with `dry_run` field
to `POST /sessions` endpoint; added `metadata` to session responses
- **Frontend**: Updated `useChatSession.ts` to pass body to the create
session mutation
- **Tool refactoring**: Multiple copilot tools refactored from
`**kwargs` to explicit named parameters (agent_browser, manage_folders,
workspace_files, connect_integration, agent_output, bash_exec, etc.) for
better type safety

## Test plan
- [x] Unit tests for `ChatSession.new()` with dry_run parameter
- [x] Unit tests for `RunBlockTool` session dry_run override
- [x] Unit tests for `RunAgentTool` session dry_run override
- [x] Unit tests for session dry_run blocks scheduling
- [x] Existing dry_run tests still pass (12/12)
- [x] Existing permissions tests still pass
- [x] All pre-commit hooks pass (ruff, isort, pyright, tsc)
- [ ] Manual: Create autopilot session with `dry_run=True`, verify
run_block/run_agent calls use simulation

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 16:27:36 +00:00
Zamil Majdy
56e0b568a4 fix(backend): update tests for transcript module move and new fixer defaults
- Update patch targets in transcript tests from
  backend.copilot.sdk.transcript to backend.copilot.transcript since
  the re-export shim only re-exports public symbols; private names
  like _projects_base and get_openai_client live in the canonical module.
- Update orchestrator fixer test assertions to account for 2 new
  _SDM_DEFAULTS (execution_mode, model) and add execution_mode to the
  E2E test's mock block inputSchema.
2026-03-31 18:26:18 +02:00
Zamil Majdy
4acac9ff5b fix: remove accidentally committed files and fix duplicate comment
- Remove .application.logs (local debug artifact)
- Remove test-results/ directory with PNG screenshots
- Remove duplicated JSDoc comment in useRateLimitManager.ts
2026-03-31 18:18:35 +02:00
Zamil Majdy
0b0777ac87 fix(copilot): update fix_orchestrator_blocks docstring to list all 6 defaults
The docstring only listed 4 defaults but _SDM_DEFAULTS has 6 entries
including execution_mode and model. Updated to reflect the actual behavior.
2026-03-31 17:49:54 +02:00
Zamil Majdy
698b1599cb fix(copilot): reject stale transcripts in baseline service 2026-03-31 17:41:06 +02:00
Zamil Majdy
a2f94f08d9 fix(copilot): address review comments round 3 2026-03-31 17:35:11 +02:00
Zamil Majdy
0c6f20f728 feat(copilot): set extended_thinking + Opus as OrchestratorBlock defaults
Update the agent generator fixer defaults so generated agents inherit
the copilot's default reasoning mode (extended_thinking with Opus).
User-set values are preserved — the fixer only fills in missing fields.
2026-03-31 17:23:06 +02:00
Zamil Majdy
d100b2515b fix(copilot): include tool messages in baseline conversation context
The baseline was only including user/assistant text messages when
building the OpenAI message list, dropping all tool_calls and tool
results. This meant the model had no memory of previous tool
invocations or their outputs in multi-turn conversations.

Now includes assistant messages with tool_calls and tool-role messages
with tool_call_id, giving the model full conversation context.
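
The change described above might be sketched as follows, assuming stored messages are plain dicts with `role`, `content`, `tool_calls`, and `tool_call_id` keys. `build_openai_messages` is a hypothetical helper, not the baseline service's actual function.

```python
def build_openai_messages(history: list[dict]) -> list[dict]:
    """Convert stored chat messages into an OpenAI-style message list."""
    out: list[dict] = []
    for m in history:
        role = m.get("role")
        if role == "assistant" and m.get("tool_calls"):
            # Keep the assistant turn that requested tools.
            out.append(
                {
                    "role": "assistant",
                    "content": m.get("content") or "",
                    "tool_calls": m["tool_calls"],
                }
            )
        elif role == "tool" and m.get("tool_call_id"):
            # Keep the tool result, linked back via tool_call_id.
            out.append(
                {
                    "role": "tool",
                    "tool_call_id": m["tool_call_id"],
                    "content": m.get("content") or "",
                }
            )
        elif role in ("user", "assistant") and m.get("content"):
            # Plain text turns, which were already included before the fix.
            out.append({"role": role, "content": m["content"]})
    return out
```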
2026-03-31 17:12:37 +02:00
Zamil Majdy
14113f96a9 feat(copilot): use Sonnet for fast mode, Opus for extended thinking
Add `fast_model` config field (default: anthropic/claude-sonnet-4) so
fast mode uses a faster/cheaper model while extended thinking keeps
using Opus. The baseline service now uses config.fast_model for all
LLM calls.
2026-03-31 17:07:04 +02:00
Zamil Majdy
ee40a4b9a8 refactor(copilot): move transcript modules to shared location 2026-03-31 16:29:48 +02:00
Zamil Majdy
0008cafc3b fix(copilot): fix transcript ordering and mode toggle mid-session
- Fix transcript ordering: move append_tool_result from tool executor
  to conversation updater so entries follow correct API order
  (assistant tool_use → user tool_result)
- Fix mode toggle mid-session: use useRef for copilotMode so transport
  closure reads latest value without recreating DefaultChatTransport
- Use Literal type for mode in CoPilotExecutionEntry for type safety
2026-03-31 16:02:36 +02:00
Zamil Majdy
f55bc84fe7 fix(copilot): address PR review comments
- Use Literal["fast", "extended_thinking"] for mode validation (blocker)
- Wrap transcript upload in asyncio.shield() (should fix)
- Restore top-level estimate_token_count imports (nice to have)
- Guard localStorage copilotMode read against invalid values (should fix)
- Replace inline SVGs with lucide-react Brain/Zap icons (nice to have)
2026-03-31 15:52:06 +02:00
Zamil Majdy
3cfee4c4b5 feat(copilot): add mode toggle and baseline transcript support
- Add transcript support to baseline autopilot (download/upload/build)
  for feature parity with SDK path, enabling seamless mode switching
- Thread `mode` field through full stack: StreamChatRequest → queue →
  executor → service selection (fast=baseline, extended_thinking=SDK)
- Add mode toggle button in ChatInput UI with brain/lightning icons
- Persist mode preference in localStorage via Zustand store
2026-03-31 15:46:23 +02:00
Zamil Majdy
c48b5239b9 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-03-31 15:17:31 +02:00
goingforstudying-ctrl
c410be890e fix: add empty choices guard in extract_openai_tool_calls() (#12540)
## Summary

`extract_openai_tool_calls()` in `llm.py` crashes with `IndexError` when
the LLM provider returns a response with an empty `choices` list.

### Changes 🏗️

- Added a guard check `if not response.choices: return None` before
accessing `response.choices[0]`
- This is consistent with the function's existing pattern of returning
`None` when no tool calls are found

### Bug Details

When an LLM provider returns a response with an empty choices list
(e.g., due to content filtering, rate limiting, or API errors),
`response.choices[0]` raises `IndexError`. This can crash the entire
agent execution pipeline.
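
A minimal sketch of the guard, using stand-in dataclasses instead of the real provider response type. `FakeResponse`, `FakeChoice`, `FakeMessage`, and `extract_tool_calls_sketch` are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class FakeMessage:
    tool_calls: Optional[list[Any]] = None


@dataclass
class FakeChoice:
    message: FakeMessage


@dataclass
class FakeResponse:
    choices: list[FakeChoice] = field(default_factory=list)


def extract_tool_calls_sketch(response: FakeResponse) -> Optional[list[Any]]:
    # Guard: with an empty list, choices[0] would raise IndexError.
    if not response.choices:
        return None
    tool_calls = response.choices[0].message.tool_calls
    # Same "no tool calls" result as the guard above.
    return tool_calls or None
```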

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - Verified that the function returns `None` when `response.choices` is
    empty
  - Verified existing behavior is unchanged when `response.choices` is
    non-empty

---------

Co-authored-by: goingforstudying-ctrl <forgithubuse@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-31 20:10:27 +07:00
Zamil Majdy
37d9863552 feat(platform): add extended thinking execution mode to OrchestratorBlock (#12512)
## Summary
- Adds `ExecutionMode` enum with `BUILT_IN` (default built-in tool-call
loop) and `EXTENDED_THINKING` (delegates to Claude Agent SDK for richer
reasoning)
- Extracts shared `tool_call_loop` into `backend/util/tool_call_loop.py`
— reusable by both OrchestratorBlock agent mode and copilot baseline
- Refactors copilot baseline to use the shared `tool_call_loop` with
callback-driven iteration

## ExecutionMode enum
`ExecutionMode` (`backend/blocks/orchestrator.py`) controls how
OrchestratorBlock executes tool calls:
- **`BUILT_IN`** — Default mode. Runs the built-in tool-call loop
(supports all LLM providers).
- **`EXTENDED_THINKING`** — Delegates to the Claude Agent SDK for
extended thinking and multi-step planning. Requires Anthropic-compatible
providers (`anthropic` / `open_router`) and direct API credentials
(subscription mode not supported). Validates both provider and model
name at runtime.

## Shared tool_call_loop
`backend/util/tool_call_loop.py` provides a generic, provider-agnostic
conversation loop:
1. Call LLM with tools → 2. Extract tool calls → 3. Execute tools → 4. Update conversation → 5. Repeat

Callers provide three callbacks:
- `llm_call`: wraps any LLM provider (OpenAI streaming, Anthropic,
llm.llm_call, etc.)
- `execute_tool`: wraps any tool execution (TOOL_REGISTRY, graph block
execution, etc.)
- `update_conversation`: formats messages for the specific protocol
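
The callback-driven loop could be sketched roughly as below. This is a simplified, synchronous illustration under stated assumptions: the callback signatures, the `max_iterations` cap, and the dict-based message shape are hypothetical and may differ from `backend/util/tool_call_loop.py`.

```python
from typing import Callable

Message = dict
ToolCall = dict


def tool_call_loop_sketch(
    messages: list[Message],
    llm_call: Callable[[list[Message]], list[ToolCall]],
    execute_tool: Callable[[ToolCall], str],
    update_conversation: Callable[[list[Message], ToolCall, str], None],
    max_iterations: int = 10,  # assumption: a safety cap on loop turns
) -> list[Message]:
    for _ in range(max_iterations):
        tool_calls = llm_call(messages)  # steps 1-2: call LLM, get tool calls
        if not tool_calls:
            break  # the model answered without requesting tools
        for call in tool_calls:
            result = execute_tool(call)  # step 3: run the tool
            update_conversation(messages, call, result)  # step 4: record result
    return messages


def demo() -> list[Message]:
    # Scripted "LLM": asks for one tool call, then stops.
    scripted = [[{"name": "add", "args": (2, 3)}], []]

    def llm_call(_messages: list[Message]) -> list[ToolCall]:
        return scripted.pop(0)

    def execute_tool(call: ToolCall) -> str:
        return str(sum(call["args"]))

    def update_conversation(msgs: list[Message], call: ToolCall, result: str) -> None:
        msgs.append({"role": "tool", "name": call["name"], "content": result})

    start = [{"role": "user", "content": "2+3?"}]
    return tool_call_loop_sketch(start, llm_call, execute_tool, update_conversation)
```

Because the loop only talks to its three callbacks, the same function can be driven by a scripted fake (as in `demo`) or by a real provider wrapper.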

## OrchestratorBlock EXTENDED_THINKING mode
- `_create_graph_mcp_server()` converts graph-connected blocks to MCP
tools
- `_execute_tools_sdk_mode()` runs `ClaudeSDKClient` with those MCP
tools
- Agent mode refactored to use shared `tool_call_loop`

## Copilot baseline refactored
- Streaming callbacks buffer `Stream*` events during loop execution
- Events are drained after `tool_call_loop` returns
- Same conversation logic, less code duplication

## SDK environment builder extraction
- `build_sdk_env()` extracted to `backend/copilot/sdk/env.py` for reuse
by both copilot SDK service and OrchestratorBlock

## Provider validation
EXTENDED_THINKING mode validates `provider in ('anthropic',
'open_router')` and `model_name.startswith('claude')` because the Claude
Agent SDK requires an Anthropic API key or OpenRouter key. Subscription
mode is not supported — it uses the platform's internal credit system
which doesn't provide raw API keys needed by the SDK. The validation
raises a clear `ValueError` if an unsupported provider or model is used.
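
A minimal sketch of the two checks quoted above. The function name and error wording are hypothetical, and the model names in any call are placeholders.

```python
SUPPORTED_PROVIDERS = ("anthropic", "open_router")


def validate_extended_thinking_sketch(provider: str, model_name: str) -> None:
    """Raise ValueError unless both the provider and model checks pass."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"EXTENDED_THINKING requires one of {SUPPORTED_PROVIDERS}, got {provider!r}"
        )
    if not model_name.startswith("claude"):
        raise ValueError(
            f"EXTENDED_THINKING requires a Claude model, got {model_name!r}"
        )
```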

## PR Dependencies
This PR builds on #12511 (Claude SDK client). It can be reviewed
independently — #12511 only adds the SDK client module which this PR
imports. If #12511 merges first, this PR will have no conflicts.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] All pre-commit hooks pass (typecheck, lint, format)
  - [x] Existing OrchestratorBlock tests still pass
  - [x] Copilot baseline behavior unchanged (same stream events, same tool
    execution)
  - [x] Manual: OrchestratorBlock with execution_mode=EXTENDED_THINKING +
    downstream blocks → SDK calls tools
  - [x] Agent mode regression test (non-SDK path works as before)
  - [x] SDK mode error handling (invalid provider raises ValueError)
2026-03-31 20:04:13 +07:00
Abhimanyu Yadav
57b17dc8e1 feat(platform): generic managed credential system with AgentMail auto-provisioning (#12537)
### Why / What / How

**Why:** We need a third credential type: **system-provided but unique
per user** (managed credentials). Currently we have system credentials
(same for all users) and user credentials (user provides their own
keys). Managed credentials bridge the gap — the platform provisions them
automatically, one per user, for integrations like AgentMail where each
user needs their own pod-scoped API key.

**What:**
- Generic **managed credential provider registry** — any integration can
register a provider that auto-provisions per-user credentials
- **AgentMail** is the first consumer: creates a pod + pod-scoped API
key using the org-level API key
- Managed credentials appear in the credential dropdown like normal API
keys but with `autogpt_managed=True` — users **cannot update or delete**
them
- **Auto-provisioning** on `GET /credentials` — lazily creates managed
credentials when users browse their credential list
- **Account deletion cleanup** utility — revokes external resources
(pods, API keys) before user deletion
- **Frontend UX** — hides the delete button for managed credentials on
the integrations page

**How:**

### Backend

**New files:**
- `backend/integrations/managed_credentials.py` —
`ManagedCredentialProvider` ABC, global registry,
`ensure_managed_credentials()` (with per-user asyncio lock +
`asyncio.gather` for concurrency), `cleanup_managed_credentials()`
- `backend/integrations/managed_providers/__init__.py` —
`register_all()` called at startup
- `backend/integrations/managed_providers/agentmail.py` —
`AgentMailManagedProvider` with `provision()` (creates pod + API key via
agentmail SDK) and `deprovision()` (deletes pod)

**Modified files:**
- `credentials_store.py` — `autogpt_managed` guards on update/delete,
`has_managed_credential()` / `add_managed_credential()` helpers
- `model.py` — `autogpt_managed: bool` + `metadata: dict` on
`_BaseCredentials`
- `router.py` — calls `ensure_managed_credentials()` in list endpoints,
removed explicit `/agentmail/connect` endpoint
- `user.py` — `cleanup_user_managed_credentials()` for account deletion
- `rest_api.py` — registers managed providers at startup
- `settings.py` — `agentmail_api_key` setting

### Frontend
- Added `autogpt_managed` to `CredentialsMetaResponse` type
- Conditionally hides delete button on integrations page for managed
credentials

### Key design decisions
- **Auto-provision in API layer, not data layer** — keeps
`get_all_creds()` side-effect-free
- **Race-safe** — per-(user, provider) asyncio lock with double-check
pattern prevents duplicate pods
- **Idempotent** — AgentMail SDK `client_id` ensures pod creation is
idempotent; `add_managed_credential()` uses upsert under Redis lock
- **Error-resilient** — provisioning failures are logged but never block
credential listing
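
The race-safety point above (per-(user, provider) lock with a double check) can be illustrated with a small in-memory sketch. It assumes a dict in place of the credential store and a fake `_provision` coroutine; none of these names come from `managed_credentials.py`.

```python
import asyncio
from collections import defaultdict

_locks: dict[tuple[str, str], asyncio.Lock] = defaultdict(asyncio.Lock)
_store: dict[tuple[str, str], str] = {}
_stats = {"provision_calls": 0}


async def _provision(user_id: str, provider: str) -> str:
    # Stand-in for the external API call that creates the credential.
    _stats["provision_calls"] += 1
    await asyncio.sleep(0)  # yield so concurrent callers can interleave
    return f"key-for-{user_id}-{provider}"


async def ensure_credential(user_id: str, provider: str) -> str:
    key = (user_id, provider)
    if key in _store:  # first check, without the lock
        return _store[key]
    async with _locks[key]:
        if key in _store:  # second check, now holding the lock
            return _store[key]
        _store[key] = await _provision(user_id, provider)
        return _store[key]


async def demo() -> tuple[list[str], int]:
    results = await asyncio.gather(
        *(ensure_credential("u1", "agentmail") for _ in range(5))
    )
    return list(results), _stats["provision_calls"]
```

Five concurrent callers for the same user share one lock, so only the first one provisions and the rest read the stored value on the second check.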

### Changes 🏗️

| File | Action | Description |
|------|--------|-------------|
| `backend/integrations/managed_credentials.py` | NEW | ABC, registry, ensure/cleanup |
| `backend/integrations/managed_providers/__init__.py` | NEW | Registers all providers at startup |
| `backend/integrations/managed_providers/agentmail.py` | NEW | AgentMail provisioning/deprovisioning |
| `backend/integrations/credentials_store.py` | MODIFY | Guards + managed credential helpers |
| `backend/data/model.py` | MODIFY | `autogpt_managed` + `metadata` fields |
| `backend/api/features/integrations/router.py` | MODIFY | Auto-provision on list, removed `/agentmail/connect` |
| `backend/data/user.py` | MODIFY | Account deletion cleanup |
| `backend/api/rest_api.py` | MODIFY | Provider registration at startup |
| `backend/util/settings.py` | MODIFY | `agentmail_api_key` setting |
| `frontend/.../integrations/page.tsx` | MODIFY | Hide delete for managed creds |
| `frontend/.../types.ts` | MODIFY | `autogpt_managed` field |

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] 23 tests pass in `router_test.py` (9 new tests for
    ensure/cleanup/auto-provisioning)
  - [x] `poetry run format && poetry run lint` — clean
  - [x] OpenAPI schema regenerated
  - [x] Manual: verify managed credential appears in AgentMail block
    dropdown
  - [x] Manual: verify delete button hidden for managed credentials
  - [x] Manual: verify managed credential cannot be deleted via API (403)

#### For configuration changes:
- [x] `.env.default` is updated with `AGENTMAIL_API_KEY=`

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-31 12:56:18 +00:00
Krishna Chaitanya
a20188ae59 fix(blocks): validate non-empty input in AIConversationBlock before LLM call (#12545)
### Why / What / How

**Why:** When `AIConversationBlock` receives an empty messages list and
an empty prompt, the block blindly forwards the empty array to the
downstream LLM API, which returns a cryptic `400 Bad Request` error:
`"Invalid 'messages': empty array. Expected an array with minimum length
1."` This is confusing for users who don't understand why their agent
failed.

**What:** Add early input validation in `AIConversationBlock.run()` that
raises a clear `ValueError` when both `messages` and `prompt` are empty.
Also add three unit tests covering the validation logic.

**How:** A simple guard clause at the top of the `run` method checks `if
not input_data.messages and not input_data.prompt` before the LLM call
is made. If both are empty, a descriptive `ValueError` is raised. If
either one has content, the block proceeds normally.
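
A minimal sketch of that guard clause, with a stand-in input type. `ConversationInput` and the error text are hypothetical; the real block uses its own input schema.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationInput:
    messages: list[dict] = field(default_factory=list)
    prompt: str = ""


def check_conversation_input(input_data: ConversationInput) -> None:
    # Reject only when BOTH inputs are empty; either one alone is enough.
    if not input_data.messages and not input_data.prompt:
        raise ValueError(
            "Provide at least one message or a non-empty prompt before calling the LLM."
        )
```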

### Changes

- `autogpt_platform/backend/backend/blocks/llm.py`: Add validation guard
in `AIConversationBlock.run()` to reject empty messages + empty prompt
before calling the LLM
- `autogpt_platform/backend/backend/blocks/test/test_llm.py`: Add
`TestAIConversationBlockValidation` with three tests:
  - `test_empty_messages_and_empty_prompt_raises_error`: validates the
    guard clause
  - `test_empty_messages_with_prompt_succeeds`: ensures prompt-only usage
    still works
  - `test_nonempty_messages_with_empty_prompt_succeeds`: ensures
    messages-only usage still works

### Checklist

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Lint passes (`ruff check`)
  - [x] Formatting passes (`ruff format`)
  - [x] New unit tests validate the empty-input guard and the happy paths

Closes #11875

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-31 12:43:42 +00:00
Krishna Chaitanya
2f42ff9b47 fix(blocks): validate email recipients in Gmail blocks before API call (#12546)
### Why / What / How

**Why:** When a user or LLM supplies a malformed recipient string (e.g.
a bare username, a JSON blob, or an empty value) to `GmailSendBlock`,
`GmailCreateDraftBlock`, or any reply block, the Gmail API returns an
opaque `HttpError 400: "Invalid To header"`. This surfaces as a
`BlockUnknownError` with no actionable guidance, making it impossible
for the LLM to self-correct. (Fixes #11954)

**What:** Adds a lightweight `validate_email_recipients()` function that
checks every recipient against a simplified RFC 5322 pattern
(`local@domain.tld`) and raises a clear `ValueError` listing all invalid
entries before any API call is made.

**How:** The validation is called in two shared code paths —
`create_mime_message()` (used by send and draft blocks) and
`_build_reply_message()` (used by reply blocks) — so all Gmail blocks
that compose outgoing email benefit from it with zero per-block changes.
The regex is intentionally permissive (any `x@y.z` passes) to avoid
false positives on unusual but valid addresses.
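
A permissive check in the spirit described above could look like this. The regex and the function name are assumptions for illustration, not the pattern in `gmail.py`.

```python
import re

# Assumption: "anything@anything.anything" with no whitespace and a single "@".
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_recipients_sketch(recipients: list[str], field_name: str = "to") -> None:
    """Raise ValueError listing every recipient that fails the pattern."""
    invalid = [r for r in recipients if not _EMAIL_RE.match(r.strip())]
    if invalid:
        raise ValueError(
            f"Invalid email address(es) in '{field_name}': {invalid!r}"
        )
```

Collecting all invalid entries before raising lets the caller (or an LLM) correct every bad recipient in one pass rather than one per attempt.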

### Changes 🏗️

- Added `validate_email_recipients()` helper in `gmail.py` with a
compiled regex
- Hooked validation into `create_mime_message()` for `to`, `cc`, and
`bcc` fields
- Hooked validation into `_build_reply_message()` for reply/draft-reply
blocks
- Added `TestValidateEmailRecipients` test class covering valid,
invalid, mixed, empty, JSON-string, and field-name scenarios

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified `validate_email_recipients` correctly accepts valid
    emails (`user@example.com`, `a@b.com`, `test@sub.domain.co`)
  - [x] Verified it rejects malformed entries (bare names, missing domain
    dot, empty strings, JSON strings)
  - [x] Verified error messages include the field name and all invalid
    entries
  - [x] Verified empty recipient lists pass without error
  - [x] Confirmed `gmail.py` and test file parse correctly (AST check)

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-31 12:37:33 +00:00
Zamil Majdy
914efc53e5 fix(backend): disambiguate duplicate tool names in OrchestratorBlock (#12555)
## Why
The OrchestratorBlock fails with `Tool names must be unique` when
multiple nodes use the same block type (e.g., two "Web Search" blocks
connected as tools). The Anthropic API rejects the request because
duplicate tool names are sent.

## What
- Detect duplicate tool names after building tool signatures
- Append `_1`, `_2`, etc. suffixes to disambiguate
- Enrich descriptions of duplicate tools with their hardcoded default
values so the LLM can distinguish between them
- Clean up internal `_hardcoded_defaults` metadata before sending to API
- Exclude sensitive/credential fields from default value descriptions

## How
- After `_create_tool_node_signatures` builds all tool functions, count
name occurrences
- For duplicates: rename with suffix and append `[Pre-configured:
key=value]` to description using the node's `input_default` (excluding
linked fields that the LLM provides)
- Added defensive `isinstance(defaults, dict)` check for compatibility
with test mocks
- Suffix collision avoidance: skips candidates that collide with
existing tool names
- Long tool names truncated to fit within 64-character API limit
- 47 unit tests covering: basic dedup, description enrichment, unique
names unchanged, no metadata leaks, single tool, triple duplicates,
linked field exclusion, mixed unique/duplicate scenarios, sensitive
field exclusion, long name truncation, suffix collision, malformed
tools, missing description, empty list, 10-tool all-same-name, multiple
distinct groups, large default truncation, suffix collision cascade,
parameter preservation, boundary name lengths, nested dict/list
defaults, null defaults, customized name priority, required fields
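
The renaming steps above (suffixing, collision avoidance, 64-character limit) might be sketched as follows. This covers names only, not description enrichment. Whether the real code suffixes every member of a duplicate group or keeps the first unchanged is not stated here, so suffixing all of them is an assumption.

```python
from collections import Counter

MAX_TOOL_NAME_LEN = 64  # API limit mentioned above


def disambiguate_tool_names(names: list[str]) -> list[str]:
    counts = Counter(names)
    # Unique names stay as-is and must never be reused as a candidate.
    taken = {name for name, count in counts.items() if count == 1}
    next_suffix: Counter = Counter()
    result: list[str] = []
    for name in names:
        if counts[name] == 1:
            result.append(name)
            continue
        while True:
            next_suffix[name] += 1
            suffix = f"_{next_suffix[name]}"
            # Truncate the base so base + suffix fits within the limit.
            candidate = name[: MAX_TOOL_NAME_LEN - len(suffix)] + suffix
            if candidate not in taken:
                break  # skip suffixes that collide with existing names
        taken.add(candidate)
        result.append(candidate)
    return result
```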

## Test plan
- [x] All 47 tests in `test_orchestrator_tool_dedup.py` pass
- [x] All 11 existing orchestrator unit tests pass (dict, dynamic
fields, responses API)
- [x] Pre-commit hooks pass (ruff, black, isort, pyright)
- [ ] Manual test: connect two same-type blocks to an orchestrator and
verify the LLM call succeeds

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 11:54:10 +00:00
Carson Kahn
17e78ca382 fix(docs): remove extraneous whitespace in README (#12587)
### Why / What / How

Remove extraneous whitespace in README.md:
- "Workflow Management" description: extra spaces between "block" and
"performs"
- "Agent Interaction" description: extra spaces between "user-friendly"
and "interface"

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-31 08:38:45 +00:00
Ubbe
7ba05366ed feat(platform/copilot): live timer stats with persisted duration (#12583)
## Why

The copilot chat had no indication of how long the AI spent "thinking"
on a response. Users couldn't tell if a long wait was normal or
something was stuck. Additionally, the thinking duration was lost on
page reload since it was only tracked client-side.

## What

- **Live elapsed timer**: Shows elapsed time ("23s", "1m 5s") in the
ThinkingIndicator while the AI is processing (appears after 20s to avoid
spam on quick responses)
- **Frozen "Thought for Xm Ys"**: Displays the final thinking duration
in TurnStatsBar after the response completes
- **Persisted duration**: Saves `durationMs` on the last assistant
message in the DB so the timer survives page reloads

## How

**Backend:**
- Added `durationMs Int?` column to `ChatMessage` (Prisma migration)
- `mark_session_completed` in `stream_registry.py` computes wall-clock
duration from Redis session `created_at` and saves it via
`DatabaseManager.set_turn_duration()`
- Invalidates Redis session cache after writing so GET returns fresh
data

**Frontend:**
- `useElapsedTimer` hook tracks client-side elapsed seconds during
streaming
- `ThinkingIndicator` shows only the elapsed time (no phrases) after
20s, with `font-mono text-sm` styling
- `TurnStatsBar` displays "Thought for Xs" after completion, preferring
live `elapsedSeconds` and falling back to persisted `durationMs`
- `convertChatSessionToUiMessages` extracts `duration_ms` from
historical messages into a `Map<string, number>` threaded through to
`ChatMessagesContainer`

## Test plan

- [ ] Send a message in copilot — verify ThinkingIndicator shows elapsed
time after 20s
- [ ] After response completes — verify "Thought for Xs" appears below
the response
- [ ] Refresh the page — verify "Thought for Xs" still appears
(persisted from DB)
- [ ] Check older conversations — they should NOT show timer (no
historical data)
- [ ] Verify no Zod/SSE validation errors in browser console

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 16:46:31 +07:00
Zamil Majdy
e44615f8b8 fix(frontend): merge tier into refreshed data after tier change 2026-03-30 06:00:51 +02:00
Zamil Majdy
22f0da0a03 fix(backend): correct ENTERPRISE multiplier comment (50x → 60x) 2026-03-29 20:55:40 +02:00
Zamil Majdy
9264b42050 fix(frontend): fetch user tier on admin rate-limits page
The Subscription Tier dropdown showed "PRO" for all users because
the tier was never fetched from the backend. Now fetches the tier
via getV2GetUserRateLimitTier after loading rate limits, and uses
postV2SetUserRateLimitTier (generated client) instead of raw fetch
for tier changes.
2026-03-29 13:55:49 +02:00
Zamil Majdy
3a40188024 test(backend): add end-to-end tests for tier-adjusted rate limits
Add TestTierLimitsRespected class that verifies the full flow:
get_global_rate_limits (with tier multiplier) -> check_rate_limit.

- PRO user with 3M usage is allowed (below 12.5M PRO limit)
- FREE user at 2.5M is blocked (at FREE limit)
- ENTERPRISE user with 100M usage is allowed (below 150M limit)

Addresses reviewer feedback requesting tests that verify limits are
actually respected end-to-end.
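
The three cases can be checked with simple arithmetic. In this sketch the FREE base of 2.5M comes from the message above, the PRO multiplier of 5 is inferred from 12.5M / 2.5M, and the ENTERPRISE multiplier of 60 matches both the 150M limit and the 60x comment fix in 22f0da0a03. The helper names are hypothetical.

```python
FREE_LIMIT = 2_500_000

# PRO multiplier is inferred (12.5M / 2.5M); ENTERPRISE is 60x.
TIER_MULTIPLIERS = {"FREE": 1, "PRO": 5, "ENTERPRISE": 60}


def tier_limit(tier: str) -> int:
    return FREE_LIMIT * TIER_MULTIPLIERS[tier]


def is_allowed(tier: str, usage: int) -> bool:
    # Usage exactly at the limit counts as blocked (the FREE case above).
    return usage < tier_limit(tier)
```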
2026-03-29 11:56:05 +02:00
Zamil Majdy
8d6433c1a5 Merge branch 'feat/rate-limit-tiering' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-03-29 06:42:40 +02:00
Zamil Majdy
c7430eaffb fix(platform): use lazy logger formatting in rate limit admin routes
Replace f-string interpolation in logger.info() calls with %s-style
lazy formatting to avoid unnecessary string construction when the log
level is above INFO.
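
A small sketch of the difference, using a counter object to show when string conversion actually happens. The logger name and `CountingUser` are hypothetical.

```python
import logging

logger = logging.getLogger("rate_limit_admin_sketch")
logger.setLevel(logging.WARNING)  # INFO records are dropped


class CountingUser:
    """Counts how often it is converted to a string."""

    def __init__(self) -> None:
        self.str_calls = 0

    def __str__(self) -> str:
        self.str_calls += 1
        return "user-123"


def log_lazy(user: CountingUser) -> None:
    # %s-style: formatting is deferred and skipped when INFO is disabled.
    logger.info("Fetched rate limit for %s", user)


def log_eager(user: CountingUser) -> None:
    # f-string: the message is built before logger.info() is even called.
    logger.info(f"Fetched rate limit for {user}")
```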
2026-03-29 06:42:03 +02:00
Zamil Majdy
dc272559c6 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-03-29 04:19:35 +02:00
Zamil Majdy
a98b0aee95 style(frontend): format useRateLimitManager.ts with prettier
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 19:56:54 +00:00
Zamil Majdy
264869cab9 fix(frontend): correct proxy path to /api/proxy/api/ for fetch calls
The Next.js proxy at /api/proxy/[...path] forwards the path to
AGPT_SERVER_URL which already includes /api. So the path needs
/api/proxy/api/... (double api — one for proxy route, one for backend).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 16:47:29 +00:00
Zamil Majdy
a85ba9e36d fix(frontend): use /api/proxy/ prefix for search_users and tier fetch calls
The generated API hooks use /api/proxy/ as baseUrl. Raw fetch() calls
must use the same proxy path to reach the backend through Next.js.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 16:32:52 +00:00
Zamil Majdy
18c5f67107 fix(frontend): use search_users only, remove credit-history fallback
getV2GetAllUsersHistory searches credit transactions, not users, so it is
useless for user search. Only use the search_users endpoint, which queries
the User table directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:31:26 +00:00
Zamil Majdy
0348e7b228 fix(frontend): add fallback to credit-history search when search_users unavailable
The search_users endpoint may not be deployed in preview environments
(Docker cache). Falls back to getV2GetAllUsersHistory (credit
transactions) which at least returns users with transaction history.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:26:25 +00:00
Zamil Majdy
e35376d3ec fix(frontend): regenerate openapi.json from backend export-api-schema
Generated using `poetry run export-api-schema` + prettier, matching
the exact CI pipeline. Includes all new endpoints: search_users,
tier management, SubscriptionTier enum.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 13:24:35 +00:00
Zamil Majdy
687af1bdc3 fix(frontend): propagate fetchRateLimit errors in handleTierChange
Use direct getV2GetUserRateLimit call instead of fetchRateLimit
(which swallows errors internally). This ensures the caller's
success/error toast is accurate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 12:26:35 +00:00
Zamil Majdy
694032e45f revert(frontend): restore PR-specific openapi.json
The dev server spec doesn't include this PR's changes (tier endpoints,
SubscriptionTier enum). Reverting to the PR-specific version.

The "check API types" CI job requires a local backend run to generate the
exact matching spec. This is a limitation for endpoint-adding PRs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 10:10:15 +00:00
Zamil Majdy
231a4b6f51 fix(frontend): use dev server spec as base for openapi.json
Uses the actual backend-generated spec from dev server as the base,
adds search_users endpoint, sorts alphabetically, and runs prettier.
This matches the exact CI pipeline: export → prettier → diff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 10:05:38 +00:00
Zamil Majdy
da6f77da47 fix(frontend): sort openapi.json paths alphabetically to match backend
The backend generates paths in alphabetical order. Our manually added
endpoint was at the end. Also fix unicode em-dash encoding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:47:47 +00:00
Zamil Majdy
1747f4e6f3 fix(frontend): add search_users endpoint to openapi.json in CI format
Uses exact format from CI-generated spec (tags, operationId, security).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:40:39 +00:00
Zamil Majdy
0d6d8e820c style(frontend): format openapi.json with prettier
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:25:55 +00:00
Zamil Majdy
24c286fbed fix(frontend): remove manual OpenAPI additions, let CI generate
The check API types CI job generates openapi.json from the running
backend. Manual additions don't match the auto-generated format.
Removing them so CI can generate the correct spec.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:17:19 +00:00
Zamil Majdy
c75f1ff749 fix(frontend): add search_users to OpenAPI spec and regenerate types
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:04:47 +00:00
Zamil Majdy
cfc6d3538c fix(backend): format user.py and add search_users endpoint tests
Fixes ruff formatting in search_users function. Adds tests for:
- Search returning multiple matching users
- Search with no results returning empty list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 08:54:32 +00:00
Zamil Majdy
e9540041d6 fix(platform): search users from User table instead of credit history
The admin rate-limits user search was querying CreditTransaction table,
which only returns users with transaction history. Users without any
credit transactions (e.g. new accounts) were missing from results.

Adds search_users() to data/user.py that queries the User table directly
with case-insensitive partial matching on email and name. Adds a new
GET /api/copilot/admin/rate_limit/search_users endpoint. Updates the
frontend to use this instead of the spending-history search.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 08:23:14 +00:00
Zamil Majdy
ca74f980c1 fix(copilot): resolve host-scoped credentials for authenticated web requests (#12579)
## Summary
- Fixed `_resolve_discriminated_credentials()` in `helpers.py` to handle
URL/host-based credential discrimination (used by
`SendAuthenticatedWebRequestBlock`)
- Previously, only provider-based discrimination (with
`discriminator_mapping`) was handled; URL-based discrimination (with
`discriminator` set but no `discriminator_mapping`) was silently skipped
- This caused host-scoped credentials to either match the wrong host or
fail to match at all when the CoPilot called `run_block` for
authenticated HTTP requests
- Added 14 targeted tests covering discriminator resolution, host
matching, credential resolution integration, and RunBlockTool end-to-end
flows

## Root Cause
`_resolve_discriminated_credentials()` checked `if
field_info.discriminator and field_info.discriminator_mapping:` which
excluded host-scoped credentials where `discriminator="url"` but
`discriminator_mapping=None`. The URL from `input_data` was never added
to `discriminator_values`, so `_credential_is_for_host()` received empty
`discriminator_values` and returned `True` for **any** host-scoped
credential regardless of URL match.

## Fix
When `discriminator` is set without `discriminator_mapping`, the URL
value from `input_data` is now copied into `discriminator_values` on a
shallow copy of the field info (to avoid mutating the cached schema).
This enables `_credential_is_for_host()` to properly match the
credential's host against the target URL.
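
The fix described above can be sketched roughly as follows. This is a minimal illustration, not the code in `helpers.py`: `FieldInfo` and `resolve_url_discriminator` are hypothetical stand-ins, and only the three field names mentioned in the text are taken from the source.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldInfo:
    """Hypothetical stand-in for the cached credentials field info."""

    discriminator: Optional[str] = None
    discriminator_mapping: Optional[dict] = None
    discriminator_values: set = field(default_factory=set)


def resolve_url_discriminator(info: FieldInfo, input_data: dict) -> FieldInfo:
    """Return field info whose discriminator_values carry the target URL."""
    # Provider-based discrimination (with a mapping) is a separate code path.
    if not info.discriminator or info.discriminator_mapping:
        return info
    url = input_data.get(info.discriminator)
    if not url:
        return info  # nothing to match against; leave the schema untouched
    resolved = copy.copy(info)  # shallow copy so the cached schema is not mutated
    resolved.discriminator_values = {url}  # rebind the set, do not update in place
    return resolved
```

Rebinding `discriminator_values` on the copy, rather than calling `.add()` on the shared set, is what keeps the original object unchanged.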

## Test plan
- [x] `TestResolveDiscriminatedCredentials` - 4 tests verifying URL
discriminator populates values, handles missing URL, doesn't mutate
original, preserves provider/type
- [x] `TestFindMatchingHostScopedCredential` - 5 tests verifying
correct/wrong host matching, wildcard hosts, multiple credential
selection
- [x] `TestResolveBlockCredentials` - 3 integration tests verifying full
credential resolution with matching/wrong/missing hosts
- [x] `TestRunBlockToolAuthenticatedHttp` - 2 end-to-end tests verifying
SetupRequirementsResponse when creds missing and BlockDetailsResponse
when creds matched
- [x] All 28 existing + new tests pass
- [x] Ruff lint, isort, Black formatting, pyright typecheck all pass

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 08:12:33 +00:00
Zamil Majdy
8ac86a03b5 fix(platform): correct tier multiplier labels and add tier validation tests
Fix TIER_MULTIPLIERS mismatch in RateLimitDisplay.tsx where PRO showed
"10x" (should be "5x") and BUSINESS showed "30x" (should be "20x"),
not matching backend rate_limit.py values.

Add tests for invalid tier API input (uppercase "INVALID"), FREE-tier
bypass prevention (negative test), and tier-change limit propagation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 00:00:55 +00:00
Zamil Majdy
2aac78eae4 fix(frontend): fix lint and type errors in tier selector
- Replace template literal with regular string for static URL
- Fix TypeScript cast via intermediate `unknown` for tier field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 23:43:28 +00:00
Zamil Majdy
dbfc791357 feat(frontend): add subscription tier selector to admin rate-limits page
Adds tier badge display and dropdown selector to the admin rate-limits
page. Admins can now view and change a user's subscription tier
(FREE/PRO/BUSINESS/ENTERPRISE) with multiplier info. The dropdown calls
POST /api/copilot/admin/rate_limit/tier and re-fetches limits to reflect
the new tier.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 16:18:12 +00:00
Zamil Majdy
68f5d2ad08 fix(blocks): raise AIConditionBlock errors instead of swallowing them (#12593)
## Why

Sentry alert
[AUTOGPT-SERVER-8C8](https://significant-gravitas.sentry.io/issues/7367978095/)
— `AIConditionBlock` failing in prod with:

```
Invalid 'max_output_tokens': integer below minimum value.
Expected a value >= 16, but got 10 instead.
```

Two problems:
1. `max_tokens=10` is below OpenAI's new minimum of 16
2. The `except Exception` handler was calling `logger.error()` which
triggered Sentry for what are known block errors, AND silently
defaulting to `result=False` — making the block appear to succeed with
an incorrect answer

## What

- Bump `max_tokens` from 10 to 16 (fixes the root cause)
- Remove the `try/except` entirely — the executor already handles
exceptions correctly (`ValueError` = known/no Sentry, everything else =
unknown/Sentry). The old handler was just swallowing errors and
producing wrong results.
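
To make the difference concrete, here is a small sketch contrasting the two shapes. The function names and the `call_llm` callable are assumptions made for illustration; the real block's parsing logic may differ.

```python
from typing import Callable


def parse_answer(raw: str) -> bool:
    """Map the model's one-word reply to a bool; anything else is an error."""
    answer = raw.strip().lower()
    if answer in ("true", "false"):
        return answer == "true"
    raise ValueError(f"Unexpected condition answer: {raw!r}")


def evaluate_swallowing(call_llm: Callable[[], str]) -> bool:
    """Old shape: any failure silently becomes False."""
    try:
        return parse_answer(call_llm())
    except Exception:
        return False


def evaluate_raising(call_llm: Callable[[], str]) -> bool:
    """New shape: no handler, so the caller sees the real failure."""
    return parse_answer(call_llm())
```

With the second shape, an API rejection surfaces as an exception for the executor to classify instead of looking like a legitimate "false" answer.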

## Test plan

- [x] Existing `AIConditionBlock` tests pass (block only expects
"true"/"false", 16 tokens is plenty)
- [x] No more silent `result=False` on errors
- [x] No more spurious Sentry alerts from `logger.error()`

Fixes AUTOGPT-SERVER-8C8

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 10:28:14 +00:00
Nicholas Tindle
2b3d730ca9 dx(skills): add /open-pr and /setup-repo skills (#12591)
### Why / What / How

**Why:** Agents working in worktrees lack guidance on two of the most
common workflows: properly opening PRs (using the repo template,
validating test coverage, triggering the review bot) and bootstrapping
the repo from scratch with a worktree-based layout. Without these
skills, agents either skip steps (no test plan, wrong template) or
require manual hand-holding for setup.

**What:** Adds two new Claude Code skills under `.claude/skills/`:
- `/open-pr` — A structured PR creation workflow that enforces the
canonical `.github/PULL_REQUEST_TEMPLATE.md`, validates test coverage
for existing and new behaviors, supports a configurable base branch, and
integrates the `/review` bot workflow for agents without local testing
capability. Cross-references `/pr-test`, `/pr-review`, and `/pr-address`
for the full PR lifecycle.
- `/setup-repo` — An interactive repo bootstrapping skill that creates a
worktree-based layout (main + reviews + N numbered work branches).
Handles .env file provisioning with graceful fallbacks (.env.default,
.env.example), copies branchlet config, installs dependencies, and is
fully idempotent (safe to re-run).

**How:** Markdown-based SKILL.md files following the existing skill
conventions. Both skills use proper bash patterns (seq-based loops
instead of brace expansion with variables, existence checks before
branch/worktree creation, error reporting on install failures).
`/open-pr` delegates to AskUserQuestion-style prompts for base branch
selection. `/setup-repo` uses AskUserQuestion for interactive branch
count and base branch selection.

### Changes 🏗️

- Added `.claude/skills/open-pr/SKILL.md` — PR creation workflow with:
  - Pre-flight checks (committed, pushed, formatted)
  - Test coverage validation (existing behavior not broken, new behavior
    covered)
  - Canonical PR template enforcement (read and fill verbatim, no
    pre-checked boxes)
  - Configurable base branch (defaults to dev)
  - Review bot workflow (`/review` comment + 30min wait) for agents
    without local testing
  - Related skills table linking `/pr-test`, `/pr-review`, `/pr-address`

- Added `.claude/skills/setup-repo/SKILL.md` — Repo bootstrap workflow
with:
  - Interactive setup (branch count: 4/8/16/custom, base branch selection)
  - Idempotent branch creation (skips existing branches with info message)
  - Idempotent worktree creation (skips existing directories)
  - .env provisioning with fallback chain (.env → .env.default →
    .env.example → warning)
  - Branchlet config propagation
  - Dependency installation with success/failure reporting per worktree

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified SKILL.md frontmatter follows existing skill conventions
  - [x] Verified trigger conditions match expected user intents
  - [x] Verified cross-references to existing skills are accurate
  - [x] Verified PR template section matches
    `.github/PULL_REQUEST_TEMPLATE.md`
  - [x] Verified bash snippets use correct patterns (seq, show-ref, quoted
    vars)
  - [x] Pre-commit hooks pass on all commits
  - [x] Addressed all CodeRabbit, Sentry, and Cursor review comments

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk documentation-only change: adds new markdown skills without
modifying runtime code. Main risk is workflow guidance drift (e.g.,
`.env`/worktree steps) if it diverges from actual repo conventions.
> 
> **Overview**
> Adds two new Claude Code skills under `.claude/skills/` to standardize
common developer workflows.
> 
> `/open-pr` documents a PR creation flow that enforces using
`.github/PULL_REQUEST_TEMPLATE.md` verbatim, calls out required test
coverage, and describes how to trigger/poll the `/review` bot when local
testing isn’t available.
> 
> `/setup-repo` documents an idempotent, interactive bootstrap for a
multi-worktree layout (creates `reviews` and `branch1..N`, provisions
`.env` files with `.env.default`/`.env.example` fallbacks, copies
`.branchlet.json`, and installs dependencies), complementing the
existing `/worktree` skill.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
80dbeb1596. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-03-27 10:22:03 +00:00
Zamil Majdy
f28628e34b fix(backend): preserve thinking blocks during transcript compaction (#12574)
## Why

AutoPilot users hit `invalid_request_error` ("thinking or
redacted_thinking blocks in the latest assistant message cannot be
modified") when sessions get long enough to trigger transcript
compaction. The Anthropic API requires thinking blocks in the last
assistant message to be byte-for-byte identical to the original response
— our compaction was flattening them to plain text, destroying the
cryptographic signatures.

Reported in Discord `#breakage` by John Ababseh with session
`31d3f08a-cb94-45eb-9fce-56b3f0287ef4`.

## What

- **`compact_transcript`** now splits the transcript into a compressible
prefix and a preserved tail (last assistant entry + trailing entries).
Only the prefix is compressed; the tail is re-appended verbatim,
preserving thinking blocks exactly.
- **`_flatten_assistant_content`** now silently drops `thinking` and
`redacted_thinking` blocks instead of creating `[__thinking__]`
placeholders — they carry no useful context for compression summaries.
- **`response_adapter`** explicitly handles `ThinkingBlock` (skip
gracefully instead of silently falling through the isinstance chain).
- **`_format_sdk_content_blocks`** now passes through raw dict blocks
(e.g. `redacted_thinking` that the SDK may not have a typed class for)
verbatim to the transcript.

## How

The key insight is the Anthropic API's asymmetric constraint:
- **Last assistant message**: thinking/redacted_thinking blocks must be
preserved byte-for-byte
- **Older assistant messages**: thinking blocks can be removed entirely

`compact_transcript` uses `_find_last_assistant_entry()` to split the
JSONL into two parts:
1. **Prefix** (everything before the last assistant): flattened and
compressed normally
2. **Tail** (last assistant + any trailing user message): preserved
verbatim and re-chained via `_rechain_tail()` to maintain the
`parentUuid` chain

This ensures the API always sees the original thinking blocks in the
last assistant message while still achieving meaningful compression on
older turns.
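
The split-and-rechain idea can be sketched as below. This is a simplified illustration under stated assumptions: entries are modelled as plain dicts with `type`, `uuid`, and `parentUuid` keys, the function names are hypothetical, and only the first tail entry is re-parented because the remaining tail entries already point at each other.

```python
from typing import Optional


def find_last_assistant(entries: list) -> Optional[int]:
    """Index of the last assistant entry, or None if there is none."""
    for i in range(len(entries) - 1, -1, -1):
        if entries[i].get("type") == "assistant":
            return i
    return None


def split_transcript(entries: list) -> tuple:
    """Split into (compressible prefix, preserved tail)."""
    idx = find_last_assistant(entries)
    if idx is None:
        return entries, []  # no assistant turn: everything is compressible
    return entries[:idx], entries[idx:]


def rechain_tail(tail: list, new_parent_uuid: str) -> list:
    """Point the first tail entry at the end of the compressed prefix."""
    if not tail:
        return []
    # Only the envelope field changes; the message payload (including any
    # thinking blocks) is reused as-is.
    first = {**tail[0], "parentUuid": new_parent_uuid}
    return [first, *tail[1:]]
```

Because `rechain_tail` copies only the outer dict, the nested message content of the last assistant entry is the same object as before and is never rewritten.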

## Test plan
- [x] 25 new tests across `thinking_blocks_test.py` (TDD: written before
implementation)
  - [x] `_find_last_assistant_entry` splits correctly at last assistant,
    handles edges (no assistant, index 0, trailing user)
  - [x] `_rechain_tail` patches parentUuid chain, handles empty tail
  - [x] `_flatten_assistant_content` strips thinking/redacted_thinking
    blocks, handles mixed content
  - [x] `compact_transcript` preserves last assistant's thinking blocks
  - [x] `compact_transcript` strips thinking from older assistant messages
  - [x] Edge cases: trailing user message, single assistant, no thinking
    blocks
  - [x] `response_adapter` handles ThinkingBlock without crash
  - [x] `_format_sdk_content_blocks` preserves thinking block format and
    raw dict blocks
- [x] All existing copilot SDK tests pass
- [x] Pre-commit hooks (lint, format, typecheck) all pass
2026-03-27 06:36:52 +00:00
Zamil Majdy
880c957c86 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-03-27 13:29:04 +07:00
Zamil Majdy
857a8ef0aa test(rate-limit): add tier-limit enforcement integration tests
Verifies that tier-multiplied limits are actually respected: usage
within allowance passes, usage at/above limit is rejected, and
higher tiers tolerate usage that would exceed lower tiers.
2026-03-27 13:15:22 +07:00
Zamil Majdy
1008f9fcd4 merge: resolve conflicts with dev, keep tier changes
Merge origin/dev into feat/rate-limit-tiering. Conflicts arose from
the admin-routes refactor (resolved_id rename, _patch_rate_limit_deps
helper) colliding with our 3-tuple get_global_rate_limits and tier
field additions. Resolution keeps our SubscriptionTier enum, 3-tuple
returns, and tier fields while adopting the incoming resolved_id
variable and DRY test helper. Snapshots now include both tier and
user_email fields.
2026-03-27 13:12:38 +07:00
Zamil Majdy
b6a027fd2b fix(platform): fix prod Sentry errors and reduce on-call alert noise (#12565)
## Why

Multiple Sentry issues paging on-call in prod:

1. **AUTOGPT-SERVER-8BP**: `ConversionError: Failed to convert
anthropic/claude-sonnet-4-6 to <enum 'LlmModel'>` — the copilot passes
OpenRouter-style provider-prefixed model names
(`anthropic/claude-sonnet-4-6`) to blocks, but the `LlmModel` enum only
recognizes the bare model ID (`claude-sonnet-4-6`).

2. **BUILDER-7GF**: `Error invoking postEvent: Method not found` —
Sentry SDK internal error on Chrome Mobile Android, not a platform bug.

3. **XMLParserBlock**: `BlockUnknownError raised by XMLParserBlock with
message: Error in input xml syntax` — user sent bad XML but the block
raised `SyntaxError`, which gets wrapped as `BlockUnknownError`
(unexpected) instead of `BlockExecutionError` (expected).

4. **AUTOGPT-SERVER-8BS**: `Virus scanning failed for Screenshot
2026-03-26 091900.png: range() arg 3 must not be zero` — empty (0-byte)
file upload causes `range(0, 0, 0)` in the virus scanner chunking loop,
and the failure is logged at `error` level which pages on-call.

5. **AUTOGPT-SERVER-8BT**: `ValueError: <Token var=<ContextVar
name='current_context'>> was created in a different Context` —
OpenTelemetry `context.detach()` fails when the SDK streaming async
generator is garbage-collected in a different context than where it was
created (client disconnect mid-stream).

6. **AUTOGPT-SERVER-8BW**: `RuntimeError: Attempted to exit cancel scope
in a different task than it was entered in` — anyio's
`TaskGroup.__aexit__` detects cancel scope entered in one task but
exited in another when `GeneratorExit` interrupts the SDK cleanup during
client disconnect.

7. **Workspace UniqueViolationError**: `UniqueViolationError: Unique
constraint failed on (workspaceId, path)` — race condition during
concurrent file uploads handled by `WorkspaceManager._persist_db_record`
retry logic, but Sentry still captures the exception at the raise site.

8. **Library UniqueViolationError**: `UniqueViolationError` on
`LibraryAgent (userId, agentGraphId, agentGraphVersion)` — race
conditions in `add_graph_to_library` and `create_library_agent` caused
crashes or silent data loss.

9. **Graph version collision**: `UniqueViolationError` on `AgentGraph
(id, version)` — copilot re-saving an agent at an existing version
collides with the primary key.

## What

### Backend: `LlmModel._missing_()` for provider-prefixed model names
- Adds `_missing_` classmethod to `LlmModel` enum that strips the
provider prefix (e.g., `anthropic/`) when direct lookup fails
- Self-contained in the enum — no changes to the generic type conversion
system
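
For readers unfamiliar with the hook, the following sketch shows how Python's `Enum._missing_` can strip a provider prefix. `DemoModel` and its single member are illustrative only and are not the real `LlmModel` definition.

```python
from enum import Enum


class DemoModel(Enum):
    """Illustrative enum; the real LlmModel has many more members."""

    CLAUDE_SONNET_4_6 = "claude-sonnet-4-6"

    @classmethod
    def _missing_(cls, value):
        # Called by Enum only after the direct value lookup has failed.
        if isinstance(value, str) and "/" in value:
            bare = value.split("/", 1)[1]  # drop the "provider/" prefix
            for member in cls:
                if member.value == bare:
                    return member
        return None  # Enum turns this into the usual ValueError
```

Returning `None` for unknown names preserves the standard `ValueError`, which matches the third item in the test plan.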

### Frontend: Filter Sentry SDK noise
- Adds `postEvent: Method not found` to `ignoreErrors` — a known Sentry
SDK issue on certain mobile browsers

### Backend: XMLParserBlock — raise ValueError instead of SyntaxError
- Changed `_validate_tokens()` to raise `ValueError` instead of
`SyntaxError`
- Changed the `except SyntaxError` handler in `run()` to re-raise as
`ValueError`
- This ensures `Block.execute()` wraps XML parsing failures as
`BlockExecutionError` (expected/user-caused) instead of
`BlockUnknownError` (unexpected/alerts Sentry)

### Backend: Virus scanner — handle empty files + reduce alert noise
- Added early return for empty (0-byte) files in `scan_file()` to avoid
`range() arg 3 must not be zero` when `chunk_size` is 0
- Added `max(1, len(content))` guard on `chunk_size` as defense-in-depth
- Downgraded `scan_content_safe` failure log from `error` to `warning`
so single-file scan failures don't page on-call via Sentry
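
A rough sketch of the two guards follows. The chunking function is hypothetical (the real `scan_file()` talks to ClamAV and is not reproduced here); it only illustrates why a zero step breaks `range()` and how the early return and the `max(1, ...)` clamp avoid it.

```python
def split_chunks(content: bytes, max_chunk: int = 1024) -> list:
    """Split content into chunks without ever passing a zero step to range()."""
    if not content:
        return []  # early return: an empty file has nothing to scan
    # Defense in depth: the step can never drop below 1.
    chunk_size = min(max_chunk, max(1, len(content)))
    return [
        content[start:start + chunk_size]
        for start in range(0, len(content), chunk_size)
    ]
```

Without the guards, a chunk size derived from an empty payload ends up as `range(0, 0, 0)`, which Python rejects with a `ValueError`.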

### Backend: Suppress SDK client cleanup errors on SSE disconnect
- Replaced `async with ClaudeSDKClient` in `_run_stream_attempt` with
manual `__aenter__`/`__aexit__` wrapped in new
`_safe_close_sdk_client()` helper
- `_safe_close_sdk_client()` catches `ValueError` (OTEL context token
mismatch) and `RuntimeError` (anyio cancel scope in wrong task) during
`__aexit__` and logs at `debug` level — these are expected when SSE
client disconnects mid-stream
- Added `_is_sdk_disconnect_error()` helper for defense-in-depth at the
outer `except BaseException` handler in `stream_chat_completion_sdk`
- Both Sentry errors (8BT and 8BW) are now suppressed without affecting
normal cleanup flow
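
The manual-exit pattern might look roughly like this. It is a sketch under assumptions: `safe_close` and `FlakyClient` are hypothetical names, and the real helper may log or classify errors differently.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def safe_close(client) -> None:
    """Exit an async context manager, tolerating disconnect-time errors."""
    try:
        await client.__aexit__(None, None, None)
    except (ValueError, RuntimeError) as exc:
        # Expected when the stream is torn down from another task or context.
        logger.debug("Ignoring SDK cleanup error: %s", exc)


class FlakyClient:
    """Hypothetical client whose cleanup fails like an interrupted stream."""

    def __init__(self, error=None):
        self.error = error
        self.closed = False

    async def __aexit__(self, exc_type, exc, tb):
        self.closed = True
        if self.error:
            raise self.error
```

For example, `asyncio.run(safe_close(FlakyClient(RuntimeError("cancel scope"))))` completes without raising, while a clean client is closed as usual.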

### Backend: Filter workspace UniqueViolationError from Sentry alerts
- Added `before_send` filter in `_before_send()` to drop
`UniqueViolationError` events where the message contains `workspaceId`
and `path`
- The error is already handled by `WorkspaceManager._persist_db_record`
retry logic — it must propagate for the retry logic to work, so the fix
is at the Sentry filter level rather than catching/suppressing at source
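
A filter of this kind could be sketched as follows. It relies on the Sentry SDK's `before_send(event, hint)` callback contract, where returning `None` drops the event; the matching on the exception class name is an assumption made to keep the example free of the Prisma import.

```python
def before_send(event: dict, hint: dict):
    """Drop the handled workspace race from Sentry; pass everything else on."""
    exc_info = hint.get("exc_info")
    if exc_info:
        exc = exc_info[1]  # (type, value, traceback)
        message = str(exc)
        if (
            type(exc).__name__ == "UniqueViolationError"
            and "workspaceId" in message
            and "path" in message
        ):
            return None  # returning None tells the SDK to discard the event
    return event
```

The exception itself is untouched, so the retry logic in the caller still sees it.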

### Backend: Library agent race condition fixes
- **`add_graph_to_library`**: Replaced check-then-create pattern with
create-then-catch-`UniqueViolationError`-then-update. On collision,
updates the existing row (restoring soft-deleted/archived agents)
instead of crashing.
- **`create_library_agent`**: Replaced `create` with `upsert` on the
`(userId, agentGraphId, agentGraphVersion)` composite unique constraint,
so concurrent adds restore soft-deleted entries instead of throwing.
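
The create-then-catch-then-update pattern can be illustrated with SQLite from the standard library. This is an analogy only: the table, columns, and return values are hypothetical, and the real code uses Prisma and `UniqueViolationError` rather than `sqlite3.IntegrityError`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE library_agent (
    user_id TEXT NOT NULL,
    graph_id TEXT NOT NULL,
    version INTEGER NOT NULL,
    is_deleted INTEGER NOT NULL DEFAULT 0,
    UNIQUE (user_id, graph_id, version)
)
"""


def add_graph_to_library(conn, user_id: str, graph_id: str, version: int) -> str:
    """Insert first; on a unique-constraint collision, restore the existing row."""
    key = (user_id, graph_id, version)
    try:
        conn.execute(
            "INSERT INTO library_agent (user_id, graph_id, version) VALUES (?, ?, ?)",
            key,
        )
        return "created"
    except sqlite3.IntegrityError:
        # Another writer got there first (or the row was soft-deleted earlier).
        conn.execute(
            "UPDATE library_agent SET is_deleted = 0 "
            "WHERE user_id = ? AND graph_id = ? AND version = ?",
            key,
        )
        return "restored"
```

Unlike check-then-create, there is no window between the check and the write in which a concurrent insert can slip through; the database's unique constraint is the arbiter.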

### Backend: Graph version auto-increment on collision
- `__create_graph` now checks if the `(id, version)` already exists
before `create_many`, and auto-increments the version to `max_existing +
1` to avoid `UniqueViolationError` when the copilot re-saves an agent.

### Backend: Workspace `get_or_create_workspace` upsert
- Changed from find-then-create to `upsert` to atomically handle
concurrent workspace creation.

## Test plan

- [x] `LlmModel("anthropic/claude-sonnet-4-6")` resolves correctly
- [x] `LlmModel("claude-sonnet-4-6")` still works (no regression)
- [x] `LlmModel("invalid/nonexistent-model")` still raises `ValueError`
- [x] XMLParserBlock: unclosed tags, extra closing tags, empty XML all
raise `ValueError`
- [x] XMLParserBlock: `SyntaxError` from gravitasml library is caught
and re-raised as `ValueError`
- [x] Virus scanner: empty file (0 bytes) returns clean without hitting
ClamAV
- [x] Virus scanner: single-byte file scans normally (regression test)
- [x] Virus scanner: `scan_content_safe` logs at WARNING not ERROR on
failure
- [x] SDK disconnect: `_is_sdk_disconnect_error` correctly identifies
cancel scope and context var errors
- [x] SDK disconnect: `_is_sdk_disconnect_error` rejects unrelated
errors
- [x] SDK disconnect: `_safe_close_sdk_client` suppresses ValueError,
RuntimeError, and unexpected exceptions
- [x] SDK disconnect: `_safe_close_sdk_client` calls `__aexit__` on
clean exit
- [x] Library: `add_graph_to_library` creates new agent on first call
- [x] Library: `add_graph_to_library` updates existing on
UniqueViolationError
- [x] Library: `create_library_agent` uses upsert to handle concurrent
adds
- [x] All existing workspace overwrite tests still pass
- [x] All tests passing (existing + 4 XML syntax + 3 virus scanner + 10
SDK disconnect + library tests)
2026-03-27 06:09:42 +00:00
Zamil Majdy
fb74fcf4a4 feat(platform): add shared admin user search + rate-limit modal on spending page (#12577)
## Why
Admin rate-limit management required manually entering user UUIDs. The
spending page already had user search but it wasn't reusable.

## What
- Extract `AdminUserSearch` as shared component from spending page
search
- Add rate-limit modal (usage bars + reset) to spending page user rows
- Add email/name/UUID search to standalone rate-limits page
- Backend: add email query parameter to rate-limit endpoint

## How
- `AdminUserSearch` in `admin/components/` — reused by both spending and
rate-limits
- `RateLimitModal` opens from spending page "Rate Limits" button
- Backend `_resolve_user_id()` accepts email or user_id
- Smart routing: exact email → direct lookup, UUID → direct, partial →
fuzzy search
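
The routing rule in the last bullet could be sketched like this. The classifier name, the return labels, and the deliberately loose email pattern are assumptions for illustration; the actual component may decide differently.

```python
import re
import uuid

# Loose shape check only; not a full email validator.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def classify_query(query: str) -> str:
    """Decide which lookup path a search string should take."""
    query = query.strip()
    try:
        uuid.UUID(query)
        return "uuid"  # direct lookup by user ID
    except ValueError:
        pass
    if EMAIL_RE.fullmatch(query):
        return "email"  # direct lookup by exact email
    return "partial"  # fall back to fuzzy search
```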

### Follow-up
- `AdminUserSearch` is a plain text input with no typeahead/fuzzy
suggestions — consider adding autocomplete dropdown with debounced
search

### Checklist 📋
- [x] Shared search component extracted and reused
- [x] Tests pass
- [x] Type-checked
2026-03-27 05:53:04 +00:00
Zamil Majdy
c26791e6ae fix(test): mock get_global_rate_limits in reset_usage tests
The reset_copilot_usage endpoint now calls get_global_rate_limits()
which applies the tier multiplier. Tests were not mocking this, so
the daily_limit was inflated by the PRO 5x multiplier, making the
"at limit" check fail. Mock get_global_rate_limits to return base
limits directly.
2026-03-27 12:19:19 +07:00
Zamil Majdy
cf66c08125 fix(platform): rewrite migration to create enum before referencing it
The migration assumed a pre-existing SubscriptionTier enum from an
intermediate commit that was squashed. On a fresh DB the ALTER TYPE
fails with "type SubscriptionTier does not exist". Replace the
alter/rename/recreate sequence with a simple CREATE TYPE + ADD COLUMN.
2026-03-27 12:05:21 +07:00
Zamil Majdy
b4362785e4 fix(platform): update enterprise tier multiplier from 50x to 60x 2026-03-27 11:31:24 +07:00
Zamil Majdy
f38fa96df4 refactor(platform): update tier structure — remove STANDARD, add BUSINESS, default to PRO
Product decision: simplify tiers for beta testing.
- Tiers: FREE(1x), PRO(5x, default on sign-up), BUSINESS(20x), ENTERPRISE(50x)
- Remove STANDARD tier, rename existing STANDARD users to PRO in migration
- Default sign-up tier changed from FREE to PRO during beta
- Migration: recreate enum without STANDARD, add BUSINESS, update default
2026-03-27 11:25:50 +07:00
Zamil Majdy
98c8f94ef2 fix(platform): address round 1 review findings for rate-limit tiering
- Document _fetch_user_tier caching behavior for None tier values
- Add clarifying comment that TIER_MULTIPLIERS uses int intentionally
- Add 3 unit tests for set_user_tier (happy path, RecordNotFoundError,
  cache invalidation)
- Fix test isolation: mock get_global_rate_limits in chat routes usage
  tests to avoid implicit LD/Prisma fallback dependency
2026-03-27 11:07:50 +07:00
Zamil Majdy
7b0111d9b5 test(copilot): add missing PRO tier 10x multiplier test
Complete the tier multiplier coverage matrix by adding a test case
for the PRO tier (10x). Previously only FREE (1x), STANDARD (5x),
and ENTERPRISE (25x) were tested.
2026-03-27 10:48:53 +07:00
Zamil Majdy
85e9e4c5b7 refactor(copilot): rename RateLimitTier to SubscriptionTier with Prisma enum
Rename `rateLimitTier` (String) to `subscriptionTier` (Prisma enum) across
the entire stack:

- schema.prisma: Add `SubscriptionTier` enum (FREE, STANDARD, PRO,
  ENTERPRISE), change User field from `rateLimitTier String` to
  `subscriptionTier SubscriptionTier`.
- migration.sql: CREATE TYPE + ALTER TABLE for the new enum column.
- rate_limit.py: Rename Python enum and update DB field references.
- All test files, admin routes, snapshots, and openapi.json updated to
  match the new naming.

Addresses PR feedback asking for a generic name and proper Prisma enum
instead of a free-form string.
2026-03-27 10:17:21 +07:00
Zamil Majdy
e900ee615a fix(copilot): move get_user_tier import to top-level and expose cache via public API
- sdk/service.py: Move `get_user_tier` import from local (inside function)
  to module-level — no circular dependency exists.
- rate_limit.py: Expose `cache_clear`/`cache_delete` as attributes on the
  public `get_user_tier` function so callers never need to import the
  private `_fetch_user_tier`.
- rate_limit_test.py: Remove `_fetch_user_tier` import; use
  `get_user_tier.cache_clear()` instead.
2026-03-27 09:52:59 +07:00
Zamil Majdy
e1d5113051 fix(platform): pass tier to get_usage_status() in admin rate limit endpoints
For consistency, pass tier=tier to get_usage_status() in the admin
get_user_rate_limit and reset_user_rate_limit endpoints as well.
2026-03-27 01:40:14 +07:00
Zamil Majdy
4963d227ea fix(platform): pass tier to get_usage_status() in reset_copilot_usage endpoint
The reset_copilot_usage endpoint was calling get_usage_status() without
the tier parameter, causing the response to always report STANDARD tier
regardless of the user's actual tier. Pass _tier from get_global_rate_limits()
to both get_usage_status() calls in the endpoint.
2026-03-27 01:37:01 +07:00
Zamil Majdy
19dea0e4ca fix(test): update usage test assertions to include tier parameter
Update test_usage_returns_daily_and_weekly and test_usage_uses_config_limits
to include tier=RateLimitTier.STANDARD in the expected call kwargs, matching
the new tier parameter added to get_usage_status().
2026-03-27 01:24:52 +07:00
Zamil Majdy
87d5a39267 fix(platform): use direct dict indexing for tier multiplier lookup
Use TIER_MULTIPLIERS[tier] instead of .get(tier, 1) to fail fast
if a new tier is added to the enum without a corresponding multiplier.
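
A small sketch of the difference, using a made-up enum with one tier deliberately left out of the mapping (the names and values here are illustrative, not the ones in rate_limit.py):

```python
from enum import Enum


class DemoTier(Enum):
    FREE = "FREE"
    PRO = "PRO"
    NEW = "NEW"  # added to the enum, but nobody updated the mapping


DEMO_MULTIPLIERS = {DemoTier.FREE: 1, DemoTier.PRO: 5}


def multiplier_lenient(tier: DemoTier) -> int:
    # Silently treats an unmapped tier as 1x.
    return DEMO_MULTIPLIERS.get(tier, 1)


def multiplier_strict(tier: DemoTier) -> int:
    # Raises KeyError for an unmapped tier, so the gap is noticed early.
    return DEMO_MULTIPLIERS[tier]
```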
2026-03-27 01:12:37 +07:00
Zamil Majdy
87ac8148e3 refactor(platform): pass tier to get_usage_status() instead of post-mutation
Add tier parameter to get_usage_status() so callers can set the tier
at construction time rather than mutating the model after creation.
This is safer if the model ever becomes frozen.
2026-03-27 01:01:44 +07:00
Zamil Majdy
491132f62f Merge dev: resolve conflicts + fix transient DB error caching default tier
Resolve merge conflicts between rate-limit tiering and reset-daily-usage
features (both additive). Fix Sentry-flagged bug where a transient DB
error in get_user_tier cached DEFAULT_TIER for 5 minutes, incorrectly
downgrading higher-tier users. Split into _fetch_user_tier (cached, raises
on error) and get_user_tier (uncached wrapper with fallback). Added
regression test test_db_error_is_not_cached.
2026-03-26 23:50:10 +07:00
Zamil Majdy
55815a3207 chore: trigger CI 2026-03-26 21:45:07 +07:00
Zamil Majdy
5c3aa11600 fix(test): add rateLimitTier to User mock in store db_test
The new rateLimitTier field on User is NOT NULL with a DB default,
so Prisma's Pydantic model requires it at construction time.
2026-03-26 21:06:38 +07:00
Zamil Majdy
28b26dde94 feat(platform): spend credits to reset CoPilot daily rate limit (#12526)
## Summary
- When users hit their daily CoPilot token limit, they can now spend
credits ($2.00 default) to reset it and continue working
- Adds a dialog prompt when rate limit error occurs, offering the
credit-based reset option
- Adds a "Reset daily limit" button in the usage limits panel when the
daily limit is reached
- Backend: new `POST /api/chat/usage/reset` endpoint,
`reset_daily_usage()` Redis helper, `rate_limit_reset_cost` config
- Frontend: `RateLimitResetDialog` component, updated
`UsagePanelContent` with reset button, `useCopilotStream` exposes rate
limit state
- **NEW: Resetting the daily limit also reduces weekly usage by the
daily limit amount**, effectively granting 1 extra day's worth of weekly
capacity (e.g., daily_limit=10000 → weekly usage reduced by 10000,
clamped to 0)
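
The weekly adjustment amounts to a clamped subtraction. The helper below is a hypothetical sketch of that arithmetic only; the real `reset_daily_usage()` operates on Redis counters.

```python
def weekly_usage_after_reset(weekly_used: int, daily_limit: int) -> int:
    """Credit one day's allowance back to the weekly counter, never below 0."""
    if daily_limit <= 0:
        return weekly_used  # no daily limit configured: nothing to credit
    return max(0, weekly_used - daily_limit)
```

For instance, with `daily_limit=10000` a weekly counter of 25000 becomes 15000, while a counter of 4000 is clamped to 0.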

## Context
Users have been confused about having credits available but being
blocked by rate limits (REQ-63, REQ-61). This provides a short-term
solution allowing users to spend credits to bypass their daily limit.

The weekly usage reduction ensures that a paid daily reset doesn't just
move the bottleneck to the weekly limit — users get genuine additional
capacity for the day they paid to unlock.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Hit daily rate limit → dialog appears with reset option
  - [x] Click "Reset for $2.00" → credits charged, daily counter reset,
    dialog closes
  - [x] Usage panel shows "Reset daily limit" button when at 100% daily
    usage
  - [x] When `rate_limit_reset_cost=0` (disabled), rate limit shows toast
    instead of dialog
  - [x] Insufficient credits → error toast shown
  - [x] Verify existing rate limit tests pass
  - [x] Unit tests: weekly counter reduced by daily_limit on reset
  - [x] Unit tests: weekly counter clamped to 0 when usage < daily_limit
  - [x] Unit tests: no weekly reduction when daily_token_limit=0

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
(new config fields `rate_limit_reset_cost` and `max_daily_resets` have
defaults in code)
- [x] `docker-compose.yml` is updated or already compatible with my
changes (no Docker changes needed)
2026-03-26 13:52:08 +00:00
Zamil Majdy
b5cbf8505b fix(backend): remove platform schema prefix from migration SQL
CI test database doesn't have the "platform" schema. Use unqualified
table name so the migration works in all environments.
2026-03-26 20:50:40 +07:00
Zamil Majdy
f49f63de76 fix: mock PrismaUser where it is used, not where it is defined
Change mock target from prisma.models.User.prisma to
backend.copilot.rate_limit.PrismaUser.prisma to follow the
coding guideline of mocking at the import boundary.
2026-03-26 20:48:57 +07:00
Zamil Majdy
8f76384942 fix: invalidate get_user_tier cache when tier is updated via set_user_tier
Call get_user_tier.cache_delete(user_id) after DB update so that
subsequent rate-limit checks immediately see the new tier instead
of using a stale cached value for up to 5 minutes.
2026-03-26 20:47:01 +07:00
Zamil Majdy
ffb8d366d6 fix: address PR review - cache tier lookups, return tier from get_global_rate_limits, fix error handling
- Add @cached(ttl_seconds=300) to get_user_tier() to avoid DB hit on every chat turn
- Change get_global_rate_limits() to return 3-tuple (daily, weekly, tier) so callers
  don't need redundant get_user_tier() calls
- Remove redundant get_user_tier() calls from admin routes and chat /usage endpoint
- Simplify `except (ValueError, Exception)` to `except Exception`
- Handle prisma.errors.RecordNotFoundError in set_user_tier admin endpoint (404 vs 500)
- Add test for user-not-found case on set_user_tier endpoint
- Clear tier cache between tests to prevent stale cached results
2026-03-26 20:42:01 +07:00
Zamil Majdy
432ef5ab5e feat(platform): add rate-limit tiering system for CoPilot
Add a three-tier rate-limiting system (standard/pro/max) that allows
assigning different token limits to users. Tier multipliers are applied
on top of the base limits from LaunchDarkly/config.

Changes:
- Add RateLimitTier enum with standard (1x), pro (5x), max (25x) multipliers
- Add rateLimitTier column to User model in Prisma schema
- Add get_user_tier/set_user_tier DB functions in rate_limit.py
- Update get_global_rate_limits to apply tier multiplier to base limits
- Add admin endpoints: GET/POST /admin/rate_limit/tier for tier management
- Include tier info in UserRateLimitResponse and CoPilotUsageStatus
- Send user tier as metadata in OTEL/Langfuse traces
- Add comprehensive tests (43 total, all passing)
- Add Prisma migration for the new column
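
A rough sketch of how the tier multipliers might be applied to the base limits. `RateLimitTier` and the 1x/5x/25x values come from the commit message; `TIER_MULTIPLIERS` and `apply_tier` are hypothetical names used only for illustration.

```python
from enum import Enum


class RateLimitTier(str, Enum):
    STANDARD = "standard"
    PRO = "pro"
    MAX = "max"


# Multipliers as listed in the commit message; the mapping name is an assumption.
TIER_MULTIPLIERS = {
    RateLimitTier.STANDARD: 1,
    RateLimitTier.PRO: 5,
    RateLimitTier.MAX: 25,
}


def apply_tier(base_daily: int, base_weekly: int, tier: RateLimitTier) -> tuple[int, int]:
    """Scale the base daily and weekly token limits by the tier multiplier."""
    multiplier = TIER_MULTIPLIERS[tier]
    return base_daily * multiplier, base_weekly * multiplier
```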
2026-03-26 20:31:53 +07:00
Zamil Majdy
d677978c90 feat(platform): admin rate limit check and reset with LD-configurable global limits (#12566)
## Why
Admins need visibility into per-user CoPilot rate limit usage and the
ability to reset a user's counters when needed (e.g., after a false
positive or for debugging). Additionally, the global rate limits were
hardcoded deploy-time constants with no way to adjust without
redeploying.

## What
- Admin endpoints to **check** a user's current rate limit usage and
**reset** their daily/weekly counters to zero
- Global rate limits are now **LaunchDarkly-configurable** via
`copilot-daily-token-limit` and `copilot-weekly-token-limit` flags,
falling back to existing `ChatConfig` values
- Frontend admin page at `/admin/rate-limits` with user lookup, usage
visualization, and reset capability
- Chat routes updated to source global limits from LD flags

## How
- **Backend**: Added `reset_user_usage()` to `rate_limit.py` that
deletes Redis usage keys. New admin routes in
`rate_limit_admin_routes.py` (GET `/api/copilot/admin/rate_limit` and
POST `/api/copilot/admin/rate_limit/reset`). Added
`COPILOT_DAILY_TOKEN_LIMIT` and `COPILOT_WEEKLY_TOKEN_LIMIT` to the
`Flag` enum. Chat routes use `_get_global_rate_limits()` helper that
checks LD first.
- **Frontend**: New `/admin/rate-limits` page with `RateLimitManager`
(user lookup) and `RateLimitDisplay` (usage bars + reset button). Added
`getUserRateLimit` and `resetUserRateLimit` to `BackendAPI` client.

## Test plan
- [x] Backend: 4 tests covering get, reset, redis failure, and
admin-only access
- [ ] Manual: Look up a user's rate limits in the admin UI
- [ ] Manual: Reset a user's usage counters
- [ ] Manual: Verify LD flag overrides are respected for global limits
2026-03-26 08:29:40 +00:00
Otto
a347c274b7 fix(frontend): replace unrealistic CoPilot suggestion prompt (#12564)
Replaces "Sort my bookmarks into categories" with "Summarize my unread
emails" in the Organize suggestion category. CoPilot has no access to
browser bookmarks or local files, so the original prompt was misleading.

---
Co-authored-by: Toran Bruce Richards (@Torantulino)
<Torantulino@users.noreply.github.com>
2026-03-26 08:10:28 +00:00
Zamil Majdy
f79d8f0449 fix(backend): move placeholder_values exclusively to AgentDropdownInputBlock (#12551)
## Why

`AgentInputBlock` has a `placeholder_values` field whose
`generate_schema()` converts it into a JSON schema `enum`. The frontend
renders any field with `enum` as a dropdown/select. This means
AI-generated agents that populate `placeholder_values` with example
values (e.g. URLs) on regular `AgentInputBlock` nodes end up with
dropdowns instead of free-text inputs — users can't type custom values.

Only `AgentDropdownInputBlock` should produce dropdown behavior.

## What

- Removed `placeholder_values` field from `AgentInputBlock.Input`
- Moved the `enum` generation logic to
`AgentDropdownInputBlock.Input.generate_schema()`
- Cleaned up test data for non-dropdown input blocks
- Updated copilot agent generation guide to stop suggesting
`placeholder_values` for `AgentInputBlock`

## How

The base `AgentInputBlock.Input.generate_schema()` no longer converts
`placeholder_values` → `enum`. Only `AgentDropdownInputBlock.Input`
defines `placeholder_values` and overrides `generate_schema()` to
produce the `enum`.

**Backward compatibility**: Existing agents with `placeholder_values` on
`AgentInputBlock` nodes load fine — `model_construct()` silently ignores
extra fields not defined on the model. Those inputs will now render as
text fields (desired behavior).
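
The split in responsibility can be sketched with plain dictionaries. The function names below are hypothetical; the real code overrides `generate_schema()` on the block input models, which this sketch does not reproduce.

```python
def input_schema(title: str) -> dict:
    """Schema for a regular input: free text, never an `enum`."""
    return {"type": "string", "title": title}


def dropdown_schema(title: str, placeholder_values: list[str]) -> dict:
    """Schema for a dropdown input: only here do the values become an `enum`."""
    schema = input_schema(title)
    if placeholder_values:
        schema["enum"] = list(placeholder_values)
    return schema
```

Because only the dropdown variant emits `enum`, a frontend that renders any `enum` field as a select would show a select for dropdown blocks and a text field everywhere else.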

## Test plan
- [x] `poetry run pytest backend/blocks/test/test_block.py -xvs` — all
block tests pass
- [x] `poetry run format && poetry run lint` — clean
- [ ] Import an agent JSON with `placeholder_values` on an
`AgentInputBlock` — verify it loads and renders as text input
- [ ] Create an agent with `AgentDropdownInputBlock` — verify dropdown
still works
2026-03-26 08:09:38 +00:00
Otto
1bc48c55d5 feat(copilot): add copy button to user prompt messages [SECRT-2172] (#12571)
Requested by @itsababseh

Users can copy assistant output messages but not their own prompts. This
adds the same copy button to user messages — appears on hover,
right-aligned, using the existing `CopyButton` component.

## Why

Users write long prompts and need to copy them to reuse or share.
Currently requires manual text selection. ChatGPT shows copy on hover
for user messages — this matches that pattern.

## What

- Added `CopyButton` to user prompt messages in
`ChatMessagesContainer.tsx`
- Shows on hover (`group-hover:opacity-100`), positioned right-aligned
below the message
- Reuses the existing `CopyButton` and `MessageActions` components —
zero new code

## How

One file changed, 11 lines added:
1. Import `MessageActions` and `CopyButton`
2. Render them after user `MessageContent`, gated on `message.role ===
"user"` and having text parts

---
Co-authored-by: itsababseh (@itsababseh)
<36419647+itsababseh@users.noreply.github.com>
2026-03-26 08:02:28 +00:00
Abhimanyu Yadav
9d0a31c0f1 fix(frontend/builder): fix array field item layout and add FormRenderer stories (#12532)
Fix broken UI when selecting nodes with array fields (list[str],
list[Enum]) in the builder. The select/input inside array items was
squeezed by the Remove button instead of taking full width.
<img width="2559" height="1077" alt="Screenshot 2026-03-26 at 10 23
34 AM"
src="https://github.com/user-attachments/assets/2ffc28a2-8d6c-428c-897c-021b1575723c"
/>

### Changes 🏗️

- **ArrayFieldItemTemplate**: Changed layout from horizontal flex-row to
vertical flex-col so the input takes full width and Remove button sits
below aligned left, with tighter spacing between them
- **Storybook config**: Added `renderers/**` glob to
`.storybook/main.ts` so renderer stories are discoverable
- **FormRenderer stories**: Added comprehensive Storybook stories
covering all backend field types (string, int, float, bool, enum,
date/time, list[str], list[int], list[Enum], list[bool], nested objects,
Optional, anyOf unions, oneOf discriminated unions, multi-select, list
of objects, and a kitchen sink). Includes exact Twitter GetUserBlock
schema for realistic oneOf + multi-select testing.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified array field items render with full-width input and Remove button below in Storybook
  - [x] Verified list[Enum] select dropdown takes full width
  - [x] Verified list[str] text input takes full width
  - [x] Verified all FormRenderer stories render without errors in Storybook
  - [x] Verified multi-select and oneOf discriminated union stories match real backend schemas
2026-03-26 06:15:30 +00:00
Abhimanyu Yadav
9b086e39c6 fix(frontend): hide placeholder text when copilot voice recording is active (#12534)
### Why / What / How

**Why:** When voice recording is active in the CoPilot chat input, the
recording UI (waveform + timer) overlays on top of the placeholder/hint
text, creating a visually broken appearance. Reported by a user via
SECRT-2163.

**What:** Hide the textarea placeholder text while voice recording is
active so it doesn't bleed through the `RecordingIndicator` overlay.

**How:** When `isRecording` is true, the placeholder is set to an empty
string. The existing `RecordingIndicator` overlay (waveform animation +
elapsed time) then displays cleanly without the hint text showing
underneath.

### Changes 🏗️

- Clear the `PromptInputTextarea` placeholder to `""` when voice
recording is active, preventing it from rendering behind the
`RecordingIndicator` overlay

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Open CoPilot chat at /copilot
  - [x] Click the microphone button or press Space to start voice recording
  - [x] Verify the placeholder text ("Type your message..." / "What else can I help with?") is hidden during recording
  - [x] Verify the RecordingIndicator (waveform + timer) displays cleanly without overlapping text
  - [x] Stop recording and verify placeholder text reappears
  - [x] Verify "Transcribing..." placeholder shows during transcription
2026-03-26 05:41:09 +00:00
Zamil Majdy
5867e4d613 Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev 2026-03-26 07:30:56 +07:00
An Vy Le
f871717f68 fix(backend): add sink input validation to AgentValidator (#12514)
## Summary

- Added `validate_sink_input_existence` method to `AgentValidator` to
ensure all sink names in links and input defaults reference valid input
schema fields in the corresponding block
- Added comprehensive tests covering valid/invalid sink names, nested
inputs, and default key handling
- Updated `ReadDiscordMessagesBlock` description to clarify it reads new
messages and triggers on new posts
- Removed leftover test function file

## Test plan

- [ ] Run `pytest` on `validator_test.py` to verify all sink input
validation cases pass
- [ ] Verify existing agent validation flow is unaffected
- [ ] Confirm `ReadDiscordMessagesBlock` description update is accurate

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-25 16:08:17 +00:00
Ubbe
f08e52dc86 fix(frontend): marketplace card description 3 lines + fallback color (#12557)
## Summary
- Increase the marketplace StoreCard description from 2 lines to 3 lines
for better readability
- Change fallback background colour for missing agent images from
`bg-violet-50` to `rgb(216, 208, 255)`

<img width="933" height="458" alt="Screenshot 2026-03-25 at 20 25 41"
src="https://github.com/user-attachments/assets/ea433741-1397-4585-b64c-c7c3b8109584"
/>
<img width="350" height="457" alt="Screenshot 2026-03-25 at 20 25 55"
src="https://github.com/user-attachments/assets/e2029c09-518a-4404-aa95-e202b4064d0b"
/>


## Test plan
- [x] Verified `pnpm format`, `pnpm lint`, `pnpm types` all pass
- [x] Visually confirmed description shows 3 lines on marketplace cards
- [x] Visually confirmed fallback color renders correctly for cards
without images

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 20:58:45 +08:00
Ubbe
500b345b3b fix(frontend): auto-reconnect copilot chat after device sleep/wake (#12519)
## Summary

- Adds `visibilitychange`-based sleep/wake detection to the copilot chat
— when the page becomes visible after >30s hidden, automatically refetch
the session and either resume an active stream or hydrate completed
messages
- Blocks chat input during re-sync (`isSyncing` state) to prevent users
from accidentally sending a message that overwrites the agent's
completed work
- Replaces `PulseLoader` with a spinning `CircleNotch` icon on sidebar
session names for background streaming sessions (closer to ChatGPT's UX)

## How it works

1. When the page goes hidden, we record a timestamp
2. When the page becomes visible, we check elapsed time
3. If >30s elapsed (indicating sleep or long background), we refetch the
session from the API
4. If backend still has `active_stream=true` → remove stale assistant
message and resume SSE
5. If backend is done → the refetch triggers React Query invalidation
which hydrates the completed messages
6. Chat input stays disabled (`isSyncing=true`) until re-sync completes
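
The decision made on wake can be modeled as a small pure function. This is a language-neutral sketch written in Python; the actual handler is frontend code reacting to `visibilitychange`, and the function name and return labels below are hypothetical.

```python
from typing import Optional

SLEEP_THRESHOLD_SECONDS = 30  # from the ">30s hidden" rule above


def on_visible(hidden_at: Optional[float], now: float, active_stream: bool) -> str:
    """Decide what to do when the page becomes visible again.

    Returns one of "none", "resume_stream", or "hydrate_messages"
    (labels invented for this sketch).
    """
    if hidden_at is None or now - hidden_at <= SLEEP_THRESHOLD_SECONDS:
        # Short tab switch: nothing to re-sync.
        return "none"
    if active_stream:
        # Backend is still streaming: drop the stale message and resume SSE.
        return "resume_stream"
    # Backend finished while we were away: load the completed messages.
    return "hydrate_messages"
```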

## Test plan

- [ ] Open copilot, start a long-running agent task
- [ ] Close laptop lid / lock screen for >30 seconds
- [ ] Wake device — verify chat shows the agent's completed response (or
resumes streaming)
- [ ] Verify chat input is temporarily disabled during re-sync, then
re-enables
- [ ] Verify sidebar shows spinning icon (not pulse loader) for
background sessions
- [ ] Verify no duplicate messages appear after wake
- [ ] Verify normal streaming (no sleep) still works as expected

Resolves: [SECRT-2159](https://linear.app/autogpt/issue/SECRT-2159)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 20:15:33 +08:00
Ubbe
995dd1b5f3 feat(platform): replace suggestion pills with themed prompt categories (#12515)
## Summary

<img width="700" height="575" alt="Screenshot 2026-03-23 at 21 40 07"
src="https://github.com/user-attachments/assets/f6138c63-dd5e-4bde-a2e4-7434d0d3ec72"
/>

Re-applies #12452 which was reverted as collateral in #12485 (invite
system revert).

Replaces the flat list of suggestion pills in the CoPilot empty session
with themed prompt categories (Learn, Create, Automate, Organize), each
shown as a popover with contextual prompts.

- **Backend**: Adds `suggested_prompts` as a themed `dict[str,
list[str]]` keyed by category. Updates Tally extraction LLM prompt to
generate prompts per theme, and the `/suggested-prompts` API to return
grouped themes. Legacy `list[str]` rows are preserved under a
`"General"` key for backward compatibility.
- **Frontend**: Replaces inline pill buttons with a `SuggestionThemes`
popover component. Each theme button (with icon) opens a dropdown of 5
relevant prompts. Falls back to hardcoded defaults when the API has no
personalized prompts. Normalizes partial API responses by padding
missing themes with defaults. Legacy `"General"` prompts are distributed
round-robin across themes.
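
A rough sketch of the two compatibility steps described above: wrapping a legacy list under `"General"`, then spreading those prompts round-robin over the four themes. The function names are hypothetical, and the second step actually runs in the frontend; Python is used here only to show the data flow.

```python
THEMES = ["Learn", "Create", "Automate", "Organize"]


def to_themed_prompts(raw) -> dict[str, list[str]]:
    """Accept either the legacy list shape or the new themed dict shape."""
    if isinstance(raw, list):
        # Legacy rows: keep every prompt, grouped under a single key.
        return {"General": list(raw)}
    if isinstance(raw, dict):
        return {theme: list(prompts) for theme, prompts in raw.items()}
    return {}


def distribute_general(themed: dict[str, list[str]]) -> dict[str, list[str]]:
    """Spread "General" prompts across the known themes in round-robin order."""
    result = {theme: list(themed.get(theme, [])) for theme in THEMES}
    for index, prompt in enumerate(themed.get("General", [])):
        result[THEMES[index % len(THEMES)]].append(prompt)
    return result
```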

### Changes 🏗️

- `backend/data/understanding.py`: `suggested_prompts` field added as
`dict[str, list[str]]`; legacy list rows preserved under `"General"` key
via `_json_to_themed_prompts`
- `backend/data/tally.py`: LLM prompt updated to generate themed
prompts; validation now per-theme with blank-string rejection
- `backend/api/features/chat/routes.py`: New `SuggestedTheme` model;
endpoint returns `themes[]`
- `frontend/copilot/components/EmptySession/EmptySession.tsx`: Uses
generated API hooks for suggested prompts
- `frontend/copilot/components/EmptySession/helpers.ts`:
`DEFAULT_THEMES` replaces `DEFAULT_QUICK_ACTIONS`; `getSuggestionThemes`
normalizes partial API responses
- `frontend/copilot/components/EmptySession/components/SuggestionThemes/`: New popover component with theme icons and loading states

### Checklist 📋

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verify themed suggestion buttons render on CoPilot empty session
  - [x] Click each theme button and confirm popover opens with prompts
  - [x] Click a prompt and confirm it sends the message
  - [x] Verify fallback to default themes when API returns no custom prompts
  - [x] Verify legacy users' personalized prompts are preserved and visible

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 15:32:49 +08:00
Zamil Majdy
336114f217 fix(backend): prevent graph execution stuck + steer SDK away from bash_exec (#12548)
## Summary

Two backend fixes for CoPilot stability:

1. **Steer model away from bash_exec for SDK tool-result files** — When
the SDK returns tool results as file paths, the copilot model was
attempting to use `bash_exec` to read them instead of treating the
content directly. Added system prompt guidance to prevent this.

2. **Guard against missing 'name' in execution input_data** —
`GraphExecution.from_db()` assumed all INPUT/OUTPUT block node
executions have a `name` field in `input_data`. This crashes with
`KeyError: 'name'` when non-standard blocks (e.g., OrchestratorBlock)
produce node executions without this field. Added `"name" in
exec.input_data` guards.
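
The guard in item 2 can be illustrated with a simplified stand-in. The function name and the `"value"` key are assumptions made for this sketch; the real code is in `GraphExecution.from_db()` and works on node execution records rather than bare dictionaries.

```python
def collect_named_inputs(input_data_list: list[dict]) -> dict:
    """Collect name -> value pairs, skipping records that have no "name".

    Hypothetical helper: shows the membership guard, not the real extraction.
    """
    inputs = {}
    for input_data in input_data_list:
        # Non-standard blocks may omit "name"; indexing directly would raise KeyError.
        if "name" in input_data:
            inputs[input_data["name"]] = input_data.get("value")
    return inputs
```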

## Why

- The bash_exec issue causes copilot to fail when processing SDK tool
outputs
- The KeyError crashes the `update_graph_execution_stats` endpoint,
causing graph executions to appear stuck (retries 35+ times, never
completes)

## How

- Added system prompt instruction to treat tool result file contents
directly
- Added `"name" in exec.input_data` guard in both input extraction (line
340) and output extraction (line 365) in `execution.py`

### Changes
- `backend/copilot/sdk/service.py` — system prompt guidance
- `backend/data/execution.py` — KeyError guard for missing `name` field

### Checklist 📋
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan

#### Test plan:
- [x] OrchestratorBlock graph execution no longer gets stuck
- [x] Standard Agent Input/Output blocks still work correctly
- [x] Copilot SDK tool results are processed without bash_exec
2026-03-25 13:58:24 +07:00
320 changed files with 29443 additions and 4488 deletions

1
.agents/skills Symbolic link
@@ -0,0 +1 @@
../.claude/skills

@@ -0,0 +1,106 @@
---
name: open-pr
description: Open a pull request with proper PR template, test coverage, and review workflow. Guides agents through creating a PR that follows repo conventions, ensures existing behaviors aren't broken, covers new behaviors with tests, and handles review via bot when local testing isn't possible. TRIGGER when user asks to "open a PR", "create a PR", "make a PR", "submit a PR", "open pull request", "push and create PR", or any variation of opening/submitting a pull request.
user-invocable: true
args: "[base-branch] — optional target branch (defaults to dev)."
metadata:
  author: autogpt-team
  version: "1.0.0"
---
# Open a Pull Request
## Step 1: Pre-flight checks
Before opening the PR:
1. Ensure all changes are committed
2. Ensure the branch is pushed to the remote (`git push -u origin <branch>`)
3. Run linters/formatters across the whole repo (not just changed files) and commit any fixes
## Step 2: Test coverage
**This is critical.** Before opening the PR, verify:
### Existing behavior is not broken
- Identify which modules/components your changes touch
- Run the existing test suites for those areas
- If tests fail, fix them before opening the PR — do not open a PR with known regressions
### New behavior has test coverage
- Every new feature, endpoint, or behavior change needs tests
- If you added a new block, add tests for that block
- If you changed API behavior, add or update API tests
- If you changed frontend behavior, verify it doesn't break existing flows
If you cannot run the full test suite locally, note which tests you ran and which you couldn't in the test plan.
## Step 3: Create the PR using the repo template
Read the canonical PR template at `.github/PULL_REQUEST_TEMPLATE.md` and use it **verbatim** as your PR body:
1. Read the template: `cat .github/PULL_REQUEST_TEMPLATE.md`
2. Preserve the exact section titles and formatting, including:
   - `### Why / What / How`
   - `### Changes 🏗️`
   - `### Checklist 📋`
3. Replace HTML comment prompts (`<!-- ... -->`) with actual content; do not leave them in
4. **Do not pre-check boxes** — leave all checkboxes as `- [ ]` until each step is actually completed
5. Do not alter the template structure, rename sections, or remove any checklist items
**PR title must use conventional commit format** (e.g., `feat(backend): add new block`, `fix(frontend): resolve routing bug`, `dx(skills): update PR workflow`). See CLAUDE.md for the full list of scopes.
Use `gh pr create` with the base branch (defaults to `dev` if no `[base-branch]` was provided). Use `--body-file` to avoid shell interpretation of backticks and special characters:
```bash
BASE_BRANCH="${BASE_BRANCH:-dev}"
PR_BODY=$(mktemp)
cat > "$PR_BODY" << 'PREOF'
<filled-in template from .github/PULL_REQUEST_TEMPLATE.md>
PREOF
gh pr create --base "$BASE_BRANCH" --title "<type>(scope): short description" --body-file "$PR_BODY"
rm "$PR_BODY"
```
## Step 4: Review workflow
### If you have a workspace that allows testing (docker, running backend, etc.)
- Run `/pr-test` to do E2E manual testing of the PR using docker compose, agent-browser, and API calls. This is the most thorough way to validate your changes before review.
- After testing, run `/pr-review` to self-review the PR for correctness, security, code quality, and testing gaps before requesting human review.
### If you do NOT have a workspace that allows testing
This is common for agents running in worktrees without a full stack. In this case:
1. Run `/pr-review` locally to catch obvious issues before pushing
2. **Comment `/review` on the PR** after creating it to trigger the review bot
3. **Poll for the review** rather than blindly waiting — check for new review comments every 30 seconds using `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate` and the GraphQL inline threads query. The bot typically responds within 30 minutes, but polling lets the agent react as soon as it arrives.
4. Do NOT proceed or merge until the bot review comes back
5. Address any issues the bot raises — use `/pr-address` which has a full polling loop with CI + comment tracking
```bash
# After creating the PR:
PR_NUMBER=$(gh pr view --json number -q .number)
gh pr comment "$PR_NUMBER" --body "/review"
# Then use /pr-address to poll for and address the review when it arrives
```
## Step 5: Address review feedback
Once the review bot or human reviewers leave comments:
- Run `/pr-address` to address review comments. It will loop until CI is green and all comments are resolved.
- Do not merge without human approval.
## Related skills
| Skill | When to use |
|---|---|
| `/pr-test` | E2E testing with docker compose, agent-browser, API calls — use when you have a running workspace |
| `/pr-review` | Review for correctness, security, code quality — use before requesting human review |
| `/pr-address` | Address reviewer comments and loop until CI green — use after reviews come in |
## Step 6: Post-creation
After the PR is created and review is triggered:
- Share the PR URL with the user
- If waiting on the review bot, let the user know the expected wait time (~30 min)
- Do not merge without human approval

@@ -0,0 +1,195 @@
---
name: setup-repo
description: Initialize a worktree-based repo layout for parallel development. Creates a main worktree, a reviews worktree for PR reviews, and N numbered work branches. Handles .env creation, dependency installation, and branchlet config. TRIGGER when user asks to set up the repo from scratch, initialize worktrees, bootstrap their dev environment, "setup repo", "setup worktrees", "initialize dev environment", "set up branches", or when a freshly cloned repo has no sibling worktrees.
user-invocable: true
args: "No arguments — interactive setup via prompts."
metadata:
  author: autogpt-team
  version: "1.0.0"
---
# Repository Setup
This skill sets up a worktree-based development layout from a freshly cloned repo. It creates:
- A **main** worktree (the primary checkout)
- A **reviews** worktree (for PR reviews)
- **N work branches** (branch1..branchN) for parallel development
## Step 1: Identify the repo
Determine the repo root and parent directory:
```bash
ROOT=$(git rev-parse --show-toplevel)
REPO_NAME=$(basename "$ROOT")
PARENT=$(dirname "$ROOT")
```
Detect if the repo is already inside a worktree layout by counting sibling worktrees (not just checking the directory name, which could be anything):
```bash
# Count worktrees that live under $PARENT (this includes $ROOT itself, hence the -gt 1 check below)
SIBLING_COUNT=$(git worktree list --porcelain 2>/dev/null | grep "^worktree " | grep -c "$PARENT/" || true)
if [ "$SIBLING_COUNT" -gt 1 ]; then
  echo "INFO: Existing worktree layout detected at $PARENT ($SIBLING_COUNT worktrees)"
  # Use $ROOT as-is; skip renaming/restructuring
else
  echo "INFO: Fresh clone detected, proceeding with setup"
fi
```
## Step 2: Ask the user questions
Use AskUserQuestion to gather setup preferences:
1. **How many parallel work branches do you need?** (Options: 4, 8, 16, or custom)
   - These become `branch1` through `branchN`
2. **Which branch should be the base?** (Options: origin/master, origin/dev, or custom)
   - All work branches and reviews will start from this
## Step 3: Fetch and set up branches
```bash
cd "$ROOT"
git fetch origin
# Create the reviews branch from base (skip if already exists)
if git show-ref --verify --quiet refs/heads/reviews; then
  echo "INFO: Branch 'reviews' already exists, skipping"
else
  git branch reviews <base-branch>
fi
# Create numbered work branches from base (skip if already exists)
for i in $(seq 1 "$COUNT"); do
  if git show-ref --verify --quiet "refs/heads/branch$i"; then
    echo "INFO: Branch 'branch$i' already exists, skipping"
  else
    git branch "branch$i" <base-branch>
  fi
done
```
## Step 4: Create worktrees
Create worktrees as siblings to the main checkout:
```bash
if [ -d "$PARENT/reviews" ]; then
  echo "INFO: Worktree '$PARENT/reviews' already exists, skipping"
else
  git worktree add "$PARENT/reviews" reviews
fi
for i in $(seq 1 "$COUNT"); do
  if [ -d "$PARENT/branch$i" ]; then
    echo "INFO: Worktree '$PARENT/branch$i' already exists, skipping"
  else
    git worktree add "$PARENT/branch$i" "branch$i"
  fi
done
```
## Step 5: Set up environment files
**Do NOT assume .env files exist.** For each worktree (including main if needed):
1. Check if `.env` exists in the source worktree for each path
2. If `.env` exists, copy it
3. If only `.env.default` or `.env.example` exists, copy that as `.env`
4. If neither exists, warn the user and list which env files are missing
Env file locations to check (same as the `/worktree` skill — keep these in sync):
- `autogpt_platform/.env`
- `autogpt_platform/backend/.env`
- `autogpt_platform/frontend/.env`
> **Note:** This env copying logic intentionally mirrors the `/worktree` skill's approach. If you update the path list or fallback logic here, update `/worktree` as well.
```bash
SOURCE="$ROOT"
WORKTREES="reviews"
for i in $(seq 1 "$COUNT"); do WORKTREES="$WORKTREES branch$i"; done
FOUND_ANY_ENV=0
for wt in $WORKTREES; do
  TARGET="$PARENT/$wt"
  for envpath in autogpt_platform autogpt_platform/backend autogpt_platform/frontend; do
    if [ -f "$SOURCE/$envpath/.env" ]; then
      FOUND_ANY_ENV=1
      cp "$SOURCE/$envpath/.env" "$TARGET/$envpath/.env"
    elif [ -f "$SOURCE/$envpath/.env.default" ]; then
      FOUND_ANY_ENV=1
      cp "$SOURCE/$envpath/.env.default" "$TARGET/$envpath/.env"
      echo "NOTE: $wt/$envpath/.env was created from .env.default — you may need to edit it"
    elif [ -f "$SOURCE/$envpath/.env.example" ]; then
      FOUND_ANY_ENV=1
      cp "$SOURCE/$envpath/.env.example" "$TARGET/$envpath/.env"
      echo "NOTE: $wt/$envpath/.env was created from .env.example — you may need to edit it"
    else
      echo "WARNING: No .env, .env.default, or .env.example found at $SOURCE/$envpath/"
    fi
  done
done
if [ "$FOUND_ANY_ENV" -eq 0 ]; then
  echo "WARNING: No environment files or templates were found in the source worktree."
  # Use AskUserQuestion to confirm: "Continue setup without env files?"
  # If the user declines, stop here and let them set up .env files first.
fi
```
## Step 6: Copy branchlet config
Copy `.branchlet.json` from main to each worktree so branchlet can manage sub-worktrees:
```bash
if [ -f "$ROOT/.branchlet.json" ]; then
  for wt in $WORKTREES; do
    cp "$ROOT/.branchlet.json" "$PARENT/$wt/.branchlet.json"
  done
fi
```
## Step 7: Install dependencies
Install deps in all worktrees. Run these sequentially per worktree:
```bash
for wt in $WORKTREES; do
  TARGET="$PARENT/$wt"
  echo "=== Installing deps for $wt ==="
  (cd "$TARGET/autogpt_platform/autogpt_libs" && poetry install) &&
    (cd "$TARGET/autogpt_platform/backend" && poetry install && poetry run prisma generate) &&
    (cd "$TARGET/autogpt_platform/frontend" && pnpm install) &&
    echo "=== Done: $wt ===" ||
    echo "=== FAILED: $wt ==="
done
```
This is slow. Run in background if possible and notify when complete.
## Step 8: Verify and report
After setup, verify and report to the user:
```bash
git worktree list
```
Summarize:
- Number of worktrees created
- Which env files were copied vs created from defaults vs missing
- Any warnings or errors encountered
## Final directory layout
```
parent/
  main/       # Primary checkout (already exists)
  reviews/    # PR review worktree
  branch1/    # Work branch 1
  branch2/    # Work branch 2
  ...
  branchN/    # Work branch N
```

@@ -1,6 +1,6 @@
# AutoGPT Platform Contribution Guide
This guide provides context for Codex when updating the **autogpt_platform** folder.
This guide provides context for coding agents when updating the **autogpt_platform** folder.
## Directory overview

1
CLAUDE.md Normal file
@@ -0,0 +1 @@
@AGENTS.md

@@ -83,13 +83,13 @@ The AutoGPT frontend is where users interact with our powerful AI automation pla
**Agent Builder:** For those who want to customize, our intuitive, low-code interface allows you to design and configure your own AI agents.
**Workflow Management:** Build, modify, and optimize your automation workflows with ease. You build your agent by connecting blocks, where each block performs a single action.
**Workflow Management:** Build, modify, and optimize your automation workflows with ease. You build your agent by connecting blocks, where each block performs a single action.
**Deployment Controls:** Manage the lifecycle of your agents, from testing to production.
**Ready-to-Use Agents:** Don't want to build? Simply select from our library of pre-configured agents and put them to work immediately.
**Agent Interaction:** Whether you've built your own or are using pre-configured agents, easily run and interact with them through our user-friendly interface.
**Agent Interaction:** Whether you've built your own or are using pre-configured agents, easily run and interact with them through our user-friendly interface.
**Monitoring and Analytics:** Keep track of your agents' performance and gain insights to continually improve your automation processes.

120
autogpt_platform/AGENTS.md Normal file

@@ -0,0 +1,120 @@
# AutoGPT Platform
This file provides guidance to coding agents when working with code in this repository.
## Repository Overview
AutoGPT Platform is a monorepo containing:
- **Backend** (`backend`): Python FastAPI server with async support
- **Frontend** (`frontend`): Next.js React application
- **Shared Libraries** (`autogpt_libs`): Common Python utilities
## Component Documentation
- **Backend**: See @backend/AGENTS.md for backend-specific commands, architecture, and development tasks
- **Frontend**: See @frontend/AGENTS.md for frontend-specific commands, architecture, and development patterns
## Key Concepts
1. **Agent Graphs**: Workflow definitions stored as JSON, executed by the backend
2. **Blocks**: Reusable components in `backend/backend/blocks/` that perform specific tasks
3. **Integrations**: OAuth and API connections stored per user
4. **Store**: Marketplace for sharing agent templates
5. **Virus Scanning**: ClamAV integration for file upload security
### Environment Configuration
#### Configuration Files
- **Backend**: `backend/.env.default` (defaults) → `backend/.env` (user overrides)
- **Frontend**: `frontend/.env.default` (defaults) → `frontend/.env` (user overrides)
- **Platform**: `.env.default` (Supabase/shared defaults) → `.env` (user overrides)
#### Docker Environment Loading Order
1. `.env.default` files provide base configuration (tracked in git)
2. `.env` files provide user-specific overrides (gitignored)
3. Docker Compose `environment:` sections provide service-specific overrides
4. Shell environment variables have highest precedence
#### Key Points
- All services use hardcoded defaults in docker-compose files (no `${VARIABLE}` substitutions)
- The `env_file` directive loads variables INTO containers at runtime
- Backend/Frontend services use YAML anchors for consistent configuration
- Supabase services (`db/docker/docker-compose.yml`) follow the same pattern
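As a rough illustration of the loading order listed above, the precedence can be modeled as a layered dictionary merge in which later layers win. This is only a sketch of the stated order, not how Docker Compose resolves variables internally; `resolve_env` and the sample variable names are hypothetical.

```python
from typing import Dict, Mapping


def resolve_env(*layers: Mapping[str, str]) -> Dict[str, str]:
    """Merge layers given from lowest to highest precedence; later layers win."""
    merged: Dict[str, str] = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical values, ordered as in the list above.
env_default = {"DB_HOST": "localhost", "LOG_LEVEL": "INFO"}  # 1. .env.default
env_user = {"LOG_LEVEL": "DEBUG"}                            # 2. .env
compose_environment = {"DB_HOST": "db"}                      # 3. environment: section
shell_environment: Dict[str, str] = {}                       # 4. shell variables

effective = resolve_env(
    env_default, env_user, compose_environment, shell_environment
)
```

A key set in a later layer replaces the same key from an earlier one, while keys that appear in only one layer pass through unchanged.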
### Branching Strategy
- **`dev`** is the main development branch. All PRs should target `dev`.
- **`master`** is the production branch. Only used for production releases.
### Creating Pull Requests
- Create the PR against the `dev` branch of the repository.
- **Split PRs by concern** — each PR should have a single clear purpose. For example, "usage tracking" and "credit charging" should be separate PRs even if related. Combining multiple concerns makes it harder for reviewers to understand what belongs to what.
- Ensure the branch name is descriptive (e.g., `feature/add-new-block`)
- Use conventional commit messages (see below)
- **Structure the PR description with Why / What / How** — Why: the motivation (what problem it solves, what's broken/missing without it); What: high-level summary of changes; How: approach, key implementation details, or architecture decisions. Reviewers need all three to judge whether the approach fits the problem.
- Fill out the .github/PULL_REQUEST_TEMPLATE.md template as the PR description
- Always use `--body-file` to pass PR body — avoids shell interpretation of backticks and special characters:
```bash
PR_BODY=$(mktemp)
cat > "$PR_BODY" << 'PREOF'
## Summary
- use `backticks` freely here
PREOF
gh pr create --title "..." --body-file "$PR_BODY" --base dev
rm "$PR_BODY"
```
- Run the github pre-commit hooks to ensure code quality.
### Test-Driven Development (TDD)
When fixing a bug or adding a feature, follow a test-first approach:
1. **Write a failing test first** — create a test that reproduces the bug or validates the new behavior, marked with `@pytest.mark.xfail` (backend) or `.fixme` (Playwright). Run it to confirm it fails for the right reason.
2. **Implement the fix/feature** — write the minimal code to make the test pass.
3. **Remove the xfail marker** — once the test passes, remove the `xfail`/`.fixme` annotation and run the full test suite to confirm nothing else broke.
This ensures every change is covered by a test and that the test actually validates the intended behavior.
### Reviewing/Revising Pull Requests
Use `/pr-review` to review a PR or `/pr-address` to address comments.
When fetching comments manually:
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate` — top-level reviews
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate` — inline review comments (always paginate to avoid missing comments beyond page 1)
- `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments` — PR conversation comments
### Conventional Commits
Use this format for commit messages and Pull Request titles:
**Conventional Commit Types:**
- `feat`: Introduces a new feature to the codebase
- `fix`: Patches a bug in the codebase
- `refactor`: Code change that neither fixes a bug nor adds a feature; also applies to removing features
- `ci`: Changes to CI configuration
- `docs`: Documentation-only changes
- `dx`: Improvements to the developer experience
**Recommended Base Scopes:**
- `platform`: Changes affecting both frontend and backend
- `frontend`
- `backend`
- `infra`
- `blocks`: Modifications/additions of individual blocks
**Subscope Examples:**
- `backend/executor`
- `backend/db`
- `frontend/builder` (includes changes to the block UI component)
- `infra/prod`
Use these scopes and subscopes for clarity and consistency in commit messages.
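A minimal sketch of checking a title against the commit types listed above, assuming the common `type(scope): summary` shape. The shape itself is an assumption, since this guide does not spell it out. `parse_title` and `TITLE_RE` are hypothetical names, and scopes are deliberately not restricted to the recommended list.

```python
import re
from typing import Dict, Optional

# Commit types taken from the list above.
TYPES = ("feat", "fix", "refactor", "ci", "docs", "dx")

# Assumed shape: type, optional (scope), colon, space, non-empty summary.
TITLE_RE = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"(?:\((?P<scope>[a-z0-9_/-]+)\))?"
    r": (?P<summary>\S.*)$"
)


def parse_title(title: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the parsed parts of a title, or None if it does not match."""
    match = TITLE_RE.match(title)
    return match.groupdict() if match else None


parsed = parse_title("fix(backend/executor): handle empty graph input")
```

Titles without a scope still parse, with `scope` set to `None`.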


@@ -1,120 +1 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Repository Overview
AutoGPT Platform is a monorepo containing:
- **Backend** (`backend`): Python FastAPI server with async support
- **Frontend** (`frontend`): Next.js React application
- **Shared Libraries** (`autogpt_libs`): Common Python utilities
## Component Documentation
- **Backend**: See @backend/CLAUDE.md for backend-specific commands, architecture, and development tasks
- **Frontend**: See @frontend/CLAUDE.md for frontend-specific commands, architecture, and development patterns
## Key Concepts
1. **Agent Graphs**: Workflow definitions stored as JSON, executed by the backend
2. **Blocks**: Reusable components in `backend/backend/blocks/` that perform specific tasks
3. **Integrations**: OAuth and API connections stored per user
4. **Store**: Marketplace for sharing agent templates
5. **Virus Scanning**: ClamAV integration for file upload security
### Environment Configuration
#### Configuration Files
- **Backend**: `backend/.env.default` (defaults) → `backend/.env` (user overrides)
- **Frontend**: `frontend/.env.default` (defaults) → `frontend/.env` (user overrides)
- **Platform**: `.env.default` (Supabase/shared defaults) → `.env` (user overrides)
#### Docker Environment Loading Order
1. `.env.default` files provide base configuration (tracked in git)
2. `.env` files provide user-specific overrides (gitignored)
3. Docker Compose `environment:` sections provide service-specific overrides
4. Shell environment variables have highest precedence
#### Key Points
- All services use hardcoded defaults in docker-compose files (no `${VARIABLE}` substitutions)
- The `env_file` directive loads variables INTO containers at runtime
- Backend/Frontend services use YAML anchors for consistent configuration
- Supabase services (`db/docker/docker-compose.yml`) follow the same pattern
### Branching Strategy
- **`dev`** is the main development branch. All PRs should target `dev`.
- **`master`** is the production branch. Only used for production releases.
### Creating Pull Requests
- Create the PR against the `dev` branch of the repository.
- **Split PRs by concern** — each PR should have a single clear purpose. For example, "usage tracking" and "credit charging" should be separate PRs even if related. Combining multiple concerns makes it harder for reviewers to understand what belongs to what.
- Ensure the branch name is descriptive (e.g., `feature/add-new-block`)
- Use conventional commit messages (see below)
- **Structure the PR description with Why / What / How** — Why: the motivation (what problem it solves, what's broken/missing without it); What: high-level summary of changes; How: approach, key implementation details, or architecture decisions. Reviewers need all three to judge whether the approach fits the problem.
- Fill out the .github/PULL_REQUEST_TEMPLATE.md template as the PR description
- Always use `--body-file` to pass PR body — avoids shell interpretation of backticks and special characters:
```bash
PR_BODY=$(mktemp)
cat > "$PR_BODY" << 'PREOF'
## Summary
- use `backticks` freely here
PREOF
gh pr create --title "..." --body-file "$PR_BODY" --base dev
rm "$PR_BODY"
```
- Run the github pre-commit hooks to ensure code quality.
### Test-Driven Development (TDD)
When fixing a bug or adding a feature, follow a test-first approach:
1. **Write a failing test first** — create a test that reproduces the bug or validates the new behavior, marked with `@pytest.mark.xfail` (backend) or `.fixme` (Playwright). Run it to confirm it fails for the right reason.
2. **Implement the fix/feature** — write the minimal code to make the test pass.
3. **Remove the xfail marker** — once the test passes, remove the `xfail`/`.fixme` annotation and run the full test suite to confirm nothing else broke.
This ensures every change is covered by a test and that the test actually validates the intended behavior.
### Reviewing/Revising Pull Requests
Use `/pr-review` to review a PR or `/pr-address` to address comments.
When fetching comments manually:
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate` — top-level reviews
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate` — inline review comments (always paginate to avoid missing comments beyond page 1)
- `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments` — PR conversation comments
### Conventional Commits
Use this format for commit messages and Pull Request titles:
**Conventional Commit Types:**
- `feat`: Introduces a new feature to the codebase
- `fix`: Patches a bug in the codebase
- `refactor`: Code change that neither fixes a bug nor adds a feature; also applies to removing features
- `ci`: Changes to CI configuration
- `docs`: Documentation-only changes
- `dx`: Improvements to the developer experience
**Recommended Base Scopes:**
- `platform`: Changes affecting both frontend and backend
- `frontend`
- `backend`
- `infra`
- `blocks`: Modifications/additions of individual blocks
**Subscope Examples:**
- `backend/executor`
- `backend/db`
- `frontend/builder` (includes changes to the block UI component)
- `infra/prod`
Use these scopes and subscopes for clarity and consistency in commit messages.
@AGENTS.md


@@ -178,6 +178,7 @@ SMTP_USERNAME=
SMTP_PASSWORD=
# Business & Marketing Tools
AGENTMAIL_API_KEY=
APOLLO_API_KEY=
ENRICHLAYER_API_KEY=
AYRSHARE_API_KEY=


@@ -0,0 +1,227 @@
# Backend
This file provides guidance to coding agents when working with the backend.
## Essential Commands
To run something with Python package dependencies you MUST use `poetry run ...`.
```bash
# Install dependencies
poetry install
# Run database migrations
poetry run prisma migrate dev
# Start all services (database, redis, rabbitmq, clamav)
docker compose up -d
# Run the backend as a whole
poetry run app
# Run tests
poetry run test
# Run specific test
poetry run pytest path/to/test_file.py::test_function_name
# Run block tests (tests that validate all blocks work correctly)
poetry run pytest backend/blocks/test/test_block.py -xvs
# Run tests for a specific block (e.g., GetCurrentTimeBlock)
poetry run pytest 'backend/blocks/test/test_block.py::test_available_blocks[GetCurrentTimeBlock]' -xvs
# Lint and format
# prefer format if you want to just "fix" it and only get the errors that can't be autofixed
poetry run format # Black + isort
poetry run lint # ruff
```
More details can be found in @TESTING.md
### Creating/Updating Snapshots
When you first write a test or when the expected output changes:
```bash
poetry run pytest path/to/test.py --snapshot-update
```
⚠️ **Important**: Always review snapshot changes before committing! Use `git diff` to verify the changes are expected.
## Architecture
- **API Layer**: FastAPI with REST and WebSocket endpoints
- **Database**: PostgreSQL with Prisma ORM, includes pgvector for embeddings
- **Queue System**: RabbitMQ for async task processing
- **Execution Engine**: Separate executor service processes agent workflows
- **Authentication**: JWT-based with Supabase integration
- **Security**: Cache protection middleware prevents sensitive data caching in browsers/proxies
## Code Style
- **Top-level imports only** — no local/inner imports (lazy imports only for heavy optional deps like `openpyxl`)
- **Absolute imports** — use `from backend.module import ...` for cross-package imports. Single-dot relative (`from .sibling import ...`) is acceptable for sibling modules within the same package (e.g., blocks). Avoid double-dot relative imports (`from ..parent import ...`) — use the absolute path instead
- **No duck typing** — no `hasattr`/`getattr`/`isinstance` for type dispatch; use typed interfaces/unions/protocols
- **Pydantic models** over dataclass/namedtuple/dict for structured data
- **No linter suppressors** — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code
- **List comprehensions** over manual loop-and-append
- **Early return** — guard clauses first, avoid deep nesting
- **f-strings vs printf syntax in log statements** — Use `%s` for deferred interpolation in `debug` statements, f-strings elsewhere for readability: `logger.debug("Processing %s items", count)`, `logger.info(f"Processing {count} items")`
- **Sanitize error paths** — `os.path.basename()` in error messages to avoid leaking directory structure
- **TOCTOU awareness** — avoid check-then-act patterns for file access and credit charging
- **`Security()` vs `Depends()`** — use `Security()` for auth deps to get proper OpenAPI security spec
- **Redis pipelines** — `transaction=True` for atomicity on multi-step operations
- **`max(0, value)` guards** — for computed values that should never be negative
- **SSE protocol** — `data:` lines for frontend-parsed events (must match Zod schema), `: comment` lines for heartbeats/status
- **File length** — keep files under ~300 lines; if a file grows beyond this, split by responsibility (e.g. extract helpers, models, or a sub-module into a new file). Never keep appending to a long file.
- **Function length** — keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions are a sign of mixed concerns, not complexity.
- **Top-down ordering** — define the main/public function or class first, then the helpers it uses below. A reader should encounter high-level logic before implementation details.
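The logging rule above can be illustrated with a small sketch. It assumes a logger whose level is `INFO`, so `DEBUG` records are dropped; the `Expensive` class and `count_renders` helper are hypothetical and exist only to count string conversions.

```python
import logging
from typing import Tuple

logger = logging.getLogger("style_example")
logger.setLevel(logging.INFO)  # DEBUG records are discarded at this level


class Expensive:
    """Counts how often it is rendered to a string."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "expensive"


def count_renders() -> Tuple[int, int]:
    deferred, eager = Expensive(), Expensive()
    # %s style: the argument is only formatted if the record is emitted.
    logger.debug("Value: %s", deferred)
    # f-string: the argument is formatted before debug() is called.
    logger.debug(f"Value: {eager}")
    return deferred.renders, eager.renders
```

With `DEBUG` disabled, only the f-string variant pays the formatting cost, which is why `%s` is preferred for `debug` statements.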
## Testing Approach
- Uses pytest with snapshot testing for API responses
- Test files are colocated with source files (`*_test.py`)
- Mock at boundaries — mock where the symbol is **used**, not where it's **defined**
- After refactoring, update mock targets to match new module paths
- Use `AsyncMock` for async functions (`from unittest.mock import AsyncMock`)
### Test-Driven Development (TDD)
When fixing a bug or adding a feature, write the test **before** the implementation:
```python
# 1. Write a failing test marked xfail
@pytest.mark.xfail(reason="Bug #1234: widget crashes on empty input")
def test_widget_handles_empty_input():
    result = widget.process("")
    assert result == Widget.EMPTY_RESULT
# 2. Run it — confirm it fails (XFAIL)
# poetry run pytest path/to/test.py::test_widget_handles_empty_input -xvs
# 3. Implement the fix
# 4. Remove xfail, run again — confirm it passes
def test_widget_handles_empty_input():
    result = widget.process("")
    assert result == Widget.EMPTY_RESULT
```
This catches regressions and proves the fix actually works. **Every bug fix should include a test that would have caught it.**
## Database Schema
Key models (defined in `schema.prisma`):
- `User`: Authentication and profile data
- `AgentGraph`: Workflow definitions with version control
- `AgentGraphExecution`: Execution history and results
- `AgentNode`: Individual nodes in a workflow
- `StoreListing`: Marketplace listings for sharing agents
## Environment Configuration
- **Backend**: `.env.default` (defaults) → `.env` (user overrides)
## Common Development Tasks
### Adding a new block
Follow the comprehensive [Block SDK Guide](@../../docs/platform/block-sdk-guide.md) which covers:
- Provider configuration with `ProviderBuilder`
- Block schema definition
- Authentication (API keys, OAuth, webhooks)
- Testing and validation
- File organization
Quick steps:
1. Create new file in `backend/blocks/`
2. Configure provider using `ProviderBuilder` in `_config.py`
3. Inherit from `Block` base class
4. Define input/output schemas using `BlockSchema`
5. Implement async `run` method
6. Generate unique block ID using `uuid.uuid4()`
7. Test with `poetry run pytest backend/blocks/test/test_block.py`
Note: when creating several new blocks, review their interfaces together and consider whether they would connect well in a graph-based editor, for example whether the outputs of one block map cleanly onto the inputs of another.
If you get pushback or hit complex block conditions, check the new_blocks guide in the docs.
#### Handling files in blocks with `store_media_file()`
When blocks need to work with files (images, videos, documents), use `store_media_file()` from `backend.util.file`. The `return_format` parameter determines what you get back:
| Format | Use When | Returns |
|--------|----------|---------|
| `"for_local_processing"` | Processing with local tools (ffmpeg, MoviePy, PIL) | Local file path (e.g., `"image.png"`) |
| `"for_external_api"` | Sending content to external APIs (Replicate, OpenAI) | Data URI (e.g., `"data:image/png;base64,..."`) |
| `"for_block_output"` | Returning output from your block | Smart: `workspace://` in CoPilot, data URI in graphs |
**Examples:**
```python
# INPUT: Need to process file locally with ffmpeg
local_path = await store_media_file(
    file=input_data.video,
    execution_context=execution_context,
    return_format="for_local_processing",
)
# local_path = "video.mp4" - use with Path/ffmpeg/etc

# INPUT: Need to send to external API like Replicate
image_b64 = await store_media_file(
    file=input_data.image,
    execution_context=execution_context,
    return_format="for_external_api",
)
# image_b64 = "data:image/png;base64,iVBORw0..." - send to API

# OUTPUT: Returning result from block
result_url = await store_media_file(
    file=generated_image_url,
    execution_context=execution_context,
    return_format="for_block_output",
)
yield "image_url", result_url
# In CoPilot: result_url = "workspace://abc123"
# In graphs: result_url = "data:image/png;base64,..."
```
**Key points:**
- `for_block_output` is the ONLY format that auto-adapts to execution context
- Always use `for_block_output` for block outputs unless you have a specific reason not to
- Never hardcode workspace checks - let `for_block_output` handle it
### Modifying the API
1. Update route in `backend/api/features/`
2. Add/update Pydantic models in same directory
3. Write tests alongside the route file
4. Run `poetry run test` to verify
## Workspace & Media Files
**Read [Workspace & Media Architecture](../../docs/platform/workspace-media-architecture.md) when:**
- Working on CoPilot file upload/download features
- Building blocks that handle `MediaFileType` inputs/outputs
- Modifying `WorkspaceManager` or `store_media_file()`
- Debugging file persistence or virus scanning issues
Covers: `WorkspaceManager` (persistent storage with session scoping), `store_media_file()` (media normalization pipeline), and responsibility boundaries for virus scanning and persistence.
## Security Implementation
### Cache Protection Middleware
- Located in `backend/api/middleware/security.py`
- Default behavior: Disables caching for ALL endpoints with `Cache-Control: no-store, no-cache, must-revalidate, private`
- Uses an allow list approach - only explicitly permitted paths can be cached
- Cacheable paths include: static assets (`static/*`, `_next/static/*`), health checks, public store pages, documentation
- Prevents sensitive data (auth tokens, API keys, user data) from being cached by browsers/proxies
- To allow caching for a new endpoint, add it to `CACHEABLE_PATHS` in the middleware
- Applied to both main API server and external API applications
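A minimal sketch of the allow-list idea described above, written as a pure function rather than real middleware. It is not the code in `backend/api/middleware/security.py`; the pattern list, `NO_STORE`, and `cache_control_for` are hypothetical stand-ins.

```python
from fnmatch import fnmatch
from typing import Optional

# Header value quoted in the list above.
NO_STORE = "no-store, no-cache, must-revalidate, private"

# Hypothetical allow list; the real one lives in CACHEABLE_PATHS.
CACHEABLE_PATTERNS = ("/static/*", "/_next/static/*", "/health")


def cache_control_for(path: str) -> Optional[str]:
    """Return the Cache-Control value to force, or None if caching is allowed."""
    if any(fnmatch(path, pattern) for pattern in CACHEABLE_PATTERNS):
        return None  # explicitly allowed: leave the response headers untouched
    return NO_STORE  # default: deny caching for everything else
```

Because the default branch denies caching, a newly added endpoint is protected until someone deliberately adds it to the allow list.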


@@ -1,227 +1 @@
# CLAUDE.md - Backend
This file provides guidance to Claude Code when working with the backend.
## Essential Commands
To run something with Python package dependencies you MUST use `poetry run ...`.
```bash
# Install dependencies
poetry install
# Run database migrations
poetry run prisma migrate dev
# Start all services (database, redis, rabbitmq, clamav)
docker compose up -d
# Run the backend as a whole
poetry run app
# Run tests
poetry run test
# Run specific test
poetry run pytest path/to/test_file.py::test_function_name
# Run block tests (tests that validate all blocks work correctly)
poetry run pytest backend/blocks/test/test_block.py -xvs
# Run tests for a specific block (e.g., GetCurrentTimeBlock)
poetry run pytest 'backend/blocks/test/test_block.py::test_available_blocks[GetCurrentTimeBlock]' -xvs
# Lint and format
# prefer format if you want to just "fix" it and only get the errors that can't be autofixed
poetry run format # Black + isort
poetry run lint # ruff
```
More details can be found in @TESTING.md
### Creating/Updating Snapshots
When you first write a test or when the expected output changes:
```bash
poetry run pytest path/to/test.py --snapshot-update
```
⚠️ **Important**: Always review snapshot changes before committing! Use `git diff` to verify the changes are expected.
## Architecture
- **API Layer**: FastAPI with REST and WebSocket endpoints
- **Database**: PostgreSQL with Prisma ORM, includes pgvector for embeddings
- **Queue System**: RabbitMQ for async task processing
- **Execution Engine**: Separate executor service processes agent workflows
- **Authentication**: JWT-based with Supabase integration
- **Security**: Cache protection middleware prevents sensitive data caching in browsers/proxies
## Code Style
- **Top-level imports only** — no local/inner imports (lazy imports only for heavy optional deps like `openpyxl`)
- **Absolute imports** — use `from backend.module import ...` for cross-package imports. Single-dot relative (`from .sibling import ...`) is acceptable for sibling modules within the same package (e.g., blocks). Avoid double-dot relative imports (`from ..parent import ...`) — use the absolute path instead
- **No duck typing** — no `hasattr`/`getattr`/`isinstance` for type dispatch; use typed interfaces/unions/protocols
- **Pydantic models** over dataclass/namedtuple/dict for structured data
- **No linter suppressors** — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code
- **List comprehensions** over manual loop-and-append
- **Early return** — guard clauses first, avoid deep nesting
- **f-strings vs printf syntax in log statements** — Use `%s` for deferred interpolation in `debug` statements, f-strings elsewhere for readability: `logger.debug("Processing %s items", count)`, `logger.info(f"Processing {count} items")`
- **Sanitize error paths** — `os.path.basename()` in error messages to avoid leaking directory structure
- **TOCTOU awareness** — avoid check-then-act patterns for file access and credit charging
- **`Security()` vs `Depends()`** — use `Security()` for auth deps to get proper OpenAPI security spec
- **Redis pipelines** — `transaction=True` for atomicity on multi-step operations
- **`max(0, value)` guards** — for computed values that should never be negative
- **SSE protocol** — `data:` lines for frontend-parsed events (must match Zod schema), `: comment` lines for heartbeats/status
- **File length** — keep files under ~300 lines; if a file grows beyond this, split by responsibility (e.g. extract helpers, models, or a sub-module into a new file). Never keep appending to a long file.
- **Function length** — keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions are a sign of mixed concerns, not complexity.
- **Top-down ordering** — define the main/public function or class first, then the helpers it uses below. A reader should encounter high-level logic before implementation details.
## Testing Approach
- Uses pytest with snapshot testing for API responses
- Test files are colocated with source files (`*_test.py`)
- Mock at boundaries — mock where the symbol is **used**, not where it's **defined**
- After refactoring, update mock targets to match new module paths
- Use `AsyncMock` for async functions (`from unittest.mock import AsyncMock`)
### Test-Driven Development (TDD)
When fixing a bug or adding a feature, write the test **before** the implementation:
```python
# 1. Write a failing test marked xfail
@pytest.mark.xfail(reason="Bug #1234: widget crashes on empty input")
def test_widget_handles_empty_input():
    result = widget.process("")
    assert result == Widget.EMPTY_RESULT
# 2. Run it — confirm it fails (XFAIL)
# poetry run pytest path/to/test.py::test_widget_handles_empty_input -xvs
# 3. Implement the fix
# 4. Remove xfail, run again — confirm it passes
def test_widget_handles_empty_input():
    result = widget.process("")
    assert result == Widget.EMPTY_RESULT
```
This catches regressions and proves the fix actually works. **Every bug fix should include a test that would have caught it.**
## Database Schema
Key models (defined in `schema.prisma`):
- `User`: Authentication and profile data
- `AgentGraph`: Workflow definitions with version control
- `AgentGraphExecution`: Execution history and results
- `AgentNode`: Individual nodes in a workflow
- `StoreListing`: Marketplace listings for sharing agents
## Environment Configuration
- **Backend**: `.env.default` (defaults) → `.env` (user overrides)
## Common Development Tasks
### Adding a new block
Follow the comprehensive [Block SDK Guide](@../../docs/content/platform/block-sdk-guide.md) which covers:
- Provider configuration with `ProviderBuilder`
- Block schema definition
- Authentication (API keys, OAuth, webhooks)
- Testing and validation
- File organization
Quick steps:
1. Create new file in `backend/blocks/`
2. Configure provider using `ProviderBuilder` in `_config.py`
3. Inherit from `Block` base class
4. Define input/output schemas using `BlockSchema`
5. Implement async `run` method
6. Generate unique block ID using `uuid.uuid4()`
7. Test with `poetry run pytest backend/blocks/test/test_block.py`
Note: when creating several new blocks, review their interfaces together and consider whether they would connect well in a graph-based editor, for example whether the outputs of one block map cleanly onto the inputs of another.
If you get pushback or hit complex block conditions, check the new_blocks guide in the docs.
#### Handling files in blocks with `store_media_file()`
When blocks need to work with files (images, videos, documents), use `store_media_file()` from `backend.util.file`. The `return_format` parameter determines what you get back:
| Format | Use When | Returns |
|--------|----------|---------|
| `"for_local_processing"` | Processing with local tools (ffmpeg, MoviePy, PIL) | Local file path (e.g., `"image.png"`) |
| `"for_external_api"` | Sending content to external APIs (Replicate, OpenAI) | Data URI (e.g., `"data:image/png;base64,..."`) |
| `"for_block_output"` | Returning output from your block | Smart: `workspace://` in CoPilot, data URI in graphs |
**Examples:**
```python
# INPUT: Need to process file locally with ffmpeg
local_path = await store_media_file(
    file=input_data.video,
    execution_context=execution_context,
    return_format="for_local_processing",
)
# local_path = "video.mp4" - use with Path/ffmpeg/etc

# INPUT: Need to send to external API like Replicate
image_b64 = await store_media_file(
    file=input_data.image,
    execution_context=execution_context,
    return_format="for_external_api",
)
# image_b64 = "data:image/png;base64,iVBORw0..." - send to API

# OUTPUT: Returning result from block
result_url = await store_media_file(
    file=generated_image_url,
    execution_context=execution_context,
    return_format="for_block_output",
)
yield "image_url", result_url
# In CoPilot: result_url = "workspace://abc123"
# In graphs: result_url = "data:image/png;base64,..."
```
**Key points:**
- `for_block_output` is the ONLY format that auto-adapts to execution context
- Always use `for_block_output` for block outputs unless you have a specific reason not to
- Never hardcode workspace checks - let `for_block_output` handle it
### Modifying the API
1. Update route in `backend/api/features/`
2. Add/update Pydantic models in same directory
3. Write tests alongside the route file
4. Run `poetry run test` to verify
## Workspace & Media Files
**Read [Workspace & Media Architecture](../../docs/platform/workspace-media-architecture.md) when:**
- Working on CoPilot file upload/download features
- Building blocks that handle `MediaFileType` inputs/outputs
- Modifying `WorkspaceManager` or `store_media_file()`
- Debugging file persistence or virus scanning issues
Covers: `WorkspaceManager` (persistent storage with session scoping), `store_media_file()` (media normalization pipeline), and responsibility boundaries for virus scanning and persistence.
## Security Implementation
### Cache Protection Middleware
- Located in `backend/api/middleware/security.py`
- Default behavior: Disables caching for ALL endpoints with `Cache-Control: no-store, no-cache, must-revalidate, private`
- Uses an allow list approach - only explicitly permitted paths can be cached
- Cacheable paths include: static assets (`static/*`, `_next/static/*`), health checks, public store pages, documentation
- Prevents sensitive data (auth tokens, API keys, user data) from being cached by browsers/proxies
- To allow caching for a new endpoint, add it to `CACHEABLE_PATHS` in the middleware
- Applied to both main API server and external API applications
@AGENTS.md


@@ -31,7 +31,10 @@ from backend.data.model import (
     UserPasswordCredentials,
     is_sdk_default,
 )
-from backend.integrations.credentials_store import provider_matches
+from backend.integrations.credentials_store import (
+    is_system_credential,
+    provider_matches,
+)
 from backend.integrations.creds_manager import IntegrationCredentialsManager
 from backend.integrations.oauth import CREDENTIALS_BY_PROVIDER, HANDLERS_BY_NAME
 from backend.integrations.providers import ProviderName
@@ -618,6 +621,11 @@ async def delete_credential(
         raise HTTPException(
             status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
         )
+    if is_system_credential(cred_id):
+        raise HTTPException(
+            status_code=status.HTTP_403_FORBIDDEN,
+            detail="System-managed credentials cannot be deleted",
+        )
     creds = await creds_manager.store.get_creds_by_id(auth.user_id, cred_id)
     if not creds:
         raise HTTPException(


@@ -72,7 +72,7 @@ class RunAgentRequest(BaseModel):
 def _create_ephemeral_session(user_id: str) -> ChatSession:
     """Create an ephemeral session for stateless API requests."""
-    return ChatSession.new(user_id)
+    return ChatSession.new(user_id, dry_run=False)
 @tools_router.post(


@@ -0,0 +1,85 @@
import logging
import typing
from datetime import datetime
from autogpt_libs.auth import get_user_id, requires_admin_user
from fastapi import APIRouter, Query, Security
from pydantic import BaseModel
from backend.data.platform_cost import (
CostLogRow,
PlatformCostDashboard,
get_platform_cost_dashboard,
get_platform_cost_logs,
)
from backend.util.models import Pagination
logger = logging.getLogger(__name__)
router = APIRouter(
prefix="/admin",
tags=["platform-cost", "admin"],
dependencies=[Security(requires_admin_user)],
)
class PlatformCostLogsResponse(BaseModel):
logs: list[CostLogRow]
pagination: Pagination
@router.get(
"/platform_costs/dashboard",
response_model=PlatformCostDashboard,
summary="Get Platform Cost Dashboard",
)
async def get_cost_dashboard(
admin_user_id: str = Security(get_user_id),
start: typing.Optional[datetime] = Query(None),
end: typing.Optional[datetime] = Query(None),
provider: typing.Optional[str] = Query(None),
user_id: typing.Optional[str] = Query(None),
):
logger.info(f"Admin {admin_user_id} fetching platform cost dashboard")
return await get_platform_cost_dashboard(
start=start,
end=end,
provider=provider,
user_id=user_id,
)
@router.get(
"/platform_costs/logs",
response_model=PlatformCostLogsResponse,
summary="Get Platform Cost Logs",
)
async def get_cost_logs(
admin_user_id: str = Security(get_user_id),
start: typing.Optional[datetime] = Query(None),
end: typing.Optional[datetime] = Query(None),
provider: typing.Optional[str] = Query(None),
user_id: typing.Optional[str] = Query(None),
page: int = Query(1, ge=1),
page_size: int = Query(50, ge=1, le=200),
):
logger.info(f"Admin {admin_user_id} fetching platform cost logs")
logs, total = await get_platform_cost_logs(
start=start,
end=end,
provider=provider,
user_id=user_id,
page=page,
page_size=page_size,
)
total_pages = (total + page_size - 1) // page_size
return PlatformCostLogsResponse(
logs=logs,
pagination=Pagination(
total_items=total,
total_pages=total_pages,
current_page=page,
page_size=page_size,
),
)
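
The `total_pages` computation in `get_cost_logs` is an integer ceiling division. A minimal standalone sketch follows; the helper name `total_pages_for` is hypothetical and only restates the expression used in the route.

```python
def total_pages_for(total_items: int, page_size: int) -> int:
    """Ceiling division without floats: pages needed to hold total_items."""
    if page_size < 1:
        # The route enforces page_size >= 1 through Query(ge=1).
        raise ValueError("page_size must be at least 1")
    return (total_items + page_size - 1) // page_size
```

An empty result set yields zero pages, so a client requesting `page=1` on an empty log receives `current_page=1` alongside `total_pages=0`.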

@@ -0,0 +1,135 @@
from unittest.mock import AsyncMock
import fastapi
import fastapi.testclient
import pytest
import pytest_mock
from autogpt_libs.auth.jwt_utils import get_jwt_payload
from .platform_cost_routes import router as platform_cost_router
app = fastapi.FastAPI()
app.include_router(platform_cost_router)
client = fastapi.testclient.TestClient(app)
@pytest.fixture(autouse=True)
def setup_app_admin_auth(mock_jwt_admin):
"""Setup admin auth overrides for all tests in this module"""
app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
yield
app.dependency_overrides.clear()
def test_get_dashboard_success(
mocker: pytest_mock.MockerFixture,
) -> None:
mock_dashboard = AsyncMock(
return_value=AsyncMock(
by_provider=[],
by_user=[],
total_cost_microdollars=0,
total_requests=0,
total_users=0,
model_dump=lambda **_: {
"by_provider": [],
"by_user": [],
"total_cost_microdollars": 0,
"total_requests": 0,
"total_users": 0,
},
)
)
mocker.patch(
"backend.api.features.admin.platform_cost_routes.get_platform_cost_dashboard",
mock_dashboard,
)
response = client.get("/admin/platform_costs/dashboard")
assert response.status_code == 200
data = response.json()
assert "by_provider" in data
assert "by_user" in data
assert data["total_cost_microdollars"] == 0
def test_get_logs_success(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.platform_cost_routes.get_platform_cost_logs",
AsyncMock(return_value=([], 0)),
)
response = client.get("/admin/platform_costs/logs")
assert response.status_code == 200
data = response.json()
assert data["logs"] == []
assert data["pagination"]["total_items"] == 0
def test_get_dashboard_with_filters(
mocker: pytest_mock.MockerFixture,
) -> None:
mock_dashboard = AsyncMock(
return_value=AsyncMock(
by_provider=[],
by_user=[],
total_cost_microdollars=0,
total_requests=0,
total_users=0,
model_dump=lambda **_: {
"by_provider": [],
"by_user": [],
"total_cost_microdollars": 0,
"total_requests": 0,
"total_users": 0,
},
)
)
mocker.patch(
"backend.api.features.admin.platform_cost_routes.get_platform_cost_dashboard",
mock_dashboard,
)
response = client.get(
"/admin/platform_costs/dashboard",
params={
"start": "2026-01-01T00:00:00",
"end": "2026-04-01T00:00:00",
"provider": "openai",
"user_id": "test-user-123",
},
)
assert response.status_code == 200
mock_dashboard.assert_called_once()
call_kwargs = mock_dashboard.call_args.kwargs
assert call_kwargs["provider"] == "openai"
assert call_kwargs["user_id"] == "test-user-123"
assert call_kwargs["start"] is not None
assert call_kwargs["end"] is not None
def test_get_logs_with_pagination(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.platform_cost_routes.get_platform_cost_logs",
AsyncMock(return_value=([], 0)),
)
response = client.get(
"/admin/platform_costs/logs",
params={"page": 2, "page_size": 25, "provider": "anthropic"},
)
assert response.status_code == 200
data = response.json()
assert data["pagination"]["current_page"] == 2
assert data["pagination"]["page_size"] == 25
def test_get_dashboard_requires_admin() -> None:
app.dependency_overrides.clear()
response = client.get("/admin/platform_costs/dashboard")
assert response.status_code in (401, 403)

@@ -0,0 +1,253 @@
"""Admin endpoints for checking and resetting user CoPilot rate limit usage."""
import logging
from typing import Optional
from autogpt_libs.auth import get_user_id, requires_admin_user
from fastapi import APIRouter, Body, HTTPException, Security
from pydantic import BaseModel
from backend.copilot.config import ChatConfig
from backend.copilot.rate_limit import (
SubscriptionTier,
get_global_rate_limits,
get_usage_status,
get_user_tier,
reset_user_usage,
set_user_tier,
)
from backend.data.user import get_user_by_email, get_user_email_by_id, search_users
logger = logging.getLogger(__name__)
config = ChatConfig()
router = APIRouter(
prefix="/admin",
tags=["copilot", "admin"],
dependencies=[Security(requires_admin_user)],
)
class UserRateLimitResponse(BaseModel):
user_id: str
user_email: Optional[str] = None
daily_token_limit: int
weekly_token_limit: int
daily_tokens_used: int
weekly_tokens_used: int
tier: SubscriptionTier
class UserTierResponse(BaseModel):
user_id: str
tier: SubscriptionTier
class SetUserTierRequest(BaseModel):
user_id: str
tier: SubscriptionTier
async def _resolve_user_id(
user_id: Optional[str], email: Optional[str]
) -> tuple[str, Optional[str]]:
"""Resolve a user_id and email from the provided parameters.
Returns (user_id, email). Accepts either user_id or email; at least one
must be provided. When both are provided, ``email`` takes precedence.
"""
if email:
user = await get_user_by_email(email)
if not user:
raise HTTPException(
status_code=404, detail="No user found with the provided email."
)
return user.id, email
if not user_id:
raise HTTPException(
status_code=400,
detail="Either user_id or email query parameter is required.",
)
# We have a user_id; try to look up their email for display purposes.
# This is non-critical -- a failure should not block the response.
try:
resolved_email = await get_user_email_by_id(user_id)
except Exception:
logger.warning("Failed to resolve email for user %s", user_id, exc_info=True)
resolved_email = None
return user_id, resolved_email
@router.get(
"/rate_limit",
response_model=UserRateLimitResponse,
summary="Get User Rate Limit",
)
async def get_user_rate_limit(
user_id: Optional[str] = None,
email: Optional[str] = None,
admin_user_id: str = Security(get_user_id),
) -> UserRateLimitResponse:
"""Get a user's current usage and effective rate limits. Admin-only.
Accepts either ``user_id`` or ``email`` as a query parameter.
When ``email`` is provided the user is looked up by email first.
"""
resolved_id, resolved_email = await _resolve_user_id(user_id, email)
logger.info("Admin %s checking rate limit for user %s", admin_user_id, resolved_id)
daily_limit, weekly_limit, tier = await get_global_rate_limits(
resolved_id, config.daily_token_limit, config.weekly_token_limit
)
usage = await get_usage_status(resolved_id, daily_limit, weekly_limit, tier=tier)
return UserRateLimitResponse(
user_id=resolved_id,
user_email=resolved_email,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
daily_tokens_used=usage.daily.used,
weekly_tokens_used=usage.weekly.used,
tier=tier,
)
@router.post(
"/rate_limit/reset",
response_model=UserRateLimitResponse,
summary="Reset User Rate Limit Usage",
)
async def reset_user_rate_limit(
user_id: str = Body(embed=True),
reset_weekly: bool = Body(False, embed=True),
admin_user_id: str = Security(get_user_id),
) -> UserRateLimitResponse:
"""Reset a user's daily usage counter (and optionally weekly). Admin-only."""
logger.info(
"Admin %s resetting rate limit for user %s (reset_weekly=%s)",
admin_user_id,
user_id,
reset_weekly,
)
try:
await reset_user_usage(user_id, reset_weekly=reset_weekly)
except Exception as e:
logger.exception("Failed to reset user usage")
raise HTTPException(status_code=500, detail="Failed to reset usage") from e
daily_limit, weekly_limit, tier = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
)
usage = await get_usage_status(user_id, daily_limit, weekly_limit, tier=tier)
try:
resolved_email = await get_user_email_by_id(user_id)
except Exception:
logger.warning("Failed to resolve email for user %s", user_id, exc_info=True)
resolved_email = None
return UserRateLimitResponse(
user_id=user_id,
user_email=resolved_email,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
daily_tokens_used=usage.daily.used,
weekly_tokens_used=usage.weekly.used,
tier=tier,
)
@router.get(
"/rate_limit/tier",
response_model=UserTierResponse,
summary="Get User Rate Limit Tier",
)
async def get_user_rate_limit_tier(
user_id: str,
admin_user_id: str = Security(get_user_id),
) -> UserTierResponse:
"""Get a user's current rate-limit tier. Admin-only.
Returns 404 if the user does not exist in the database.
"""
logger.info("Admin %s checking tier for user %s", admin_user_id, user_id)
resolved_email = await get_user_email_by_id(user_id)
if resolved_email is None:
raise HTTPException(status_code=404, detail=f"User {user_id} not found")
tier = await get_user_tier(user_id)
return UserTierResponse(user_id=user_id, tier=tier)
@router.post(
"/rate_limit/tier",
response_model=UserTierResponse,
summary="Set User Rate Limit Tier",
)
async def set_user_rate_limit_tier(
request: SetUserTierRequest,
admin_user_id: str = Security(get_user_id),
) -> UserTierResponse:
"""Set a user's rate-limit tier. Admin-only."""
old_tier = await get_user_tier(request.user_id)
# Resolve email for audit logging (non-blocking — don't fail the
# tier change if email lookup fails).
try:
resolved_email = await get_user_email_by_id(request.user_id)
except Exception:
logger.warning(
"Failed to resolve email for user %s", request.user_id, exc_info=True
)
resolved_email = None
logger.info(
"Admin %s changing tier for user %s (%s): %s -> %s",
admin_user_id,
request.user_id,
resolved_email or "unknown",
old_tier.value,
request.tier.value,
)
try:
await set_user_tier(request.user_id, request.tier)
except Exception as e:
logger.exception("Failed to set user tier")
raise HTTPException(status_code=500, detail="Failed to set tier") from e
return UserTierResponse(user_id=request.user_id, tier=request.tier)
class UserSearchResult(BaseModel):
user_id: str
user_email: Optional[str] = None
@router.get(
"/rate_limit/search_users",
response_model=list[UserSearchResult],
summary="Search Users by Name or Email",
)
async def admin_search_users(
query: str,
limit: int = 20,
admin_user_id: str = Security(get_user_id),
) -> list[UserSearchResult]:
"""Search users by partial email or name. Admin-only.
Queries the User table directly — returns results even for users
without credit transaction history.
"""
if len(query.strip()) < 3:
raise HTTPException(
status_code=400,
detail="Search query must be at least 3 characters.",
)
logger.info("Admin %s searching users with query=%r", admin_user_id, query)
results = await search_users(query, limit=max(1, min(limit, 50)))
return [UserSearchResult(user_id=uid, user_email=email) for uid, email in results]
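
The input handling in `admin_search_users` comes down to two checks: a minimum query length after stripping whitespace, and a `limit` clamped into a fixed range. The sketch below isolates both; the helper names `is_valid_query` and `clamp_limit` are hypothetical, while the bounds (3 characters, 1 to 50 results) are the ones used in the route.

```python
def is_valid_query(query: str, min_len: int = 3) -> bool:
    """A query counts only its non-whitespace-padded length."""
    return len(query.strip()) >= min_len


def clamp_limit(limit: int, lowest: int = 1, highest: int = 50) -> int:
    """Force limit into [lowest, highest] instead of rejecting it."""
    return max(lowest, min(limit, highest))
```

Clamping rather than rejecting means an out-of-range `limit` still returns results, which is what `test_search_users_negative_limit_clamped` relies on.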

@@ -0,0 +1,557 @@
import json
from types import SimpleNamespace
from unittest.mock import AsyncMock
import fastapi
import fastapi.testclient
import pytest
import pytest_mock
from autogpt_libs.auth.jwt_utils import get_jwt_payload
from pytest_snapshot.plugin import Snapshot
from backend.copilot.rate_limit import CoPilotUsageStatus, SubscriptionTier, UsageWindow
from .rate_limit_admin_routes import router as rate_limit_admin_router
app = fastapi.FastAPI()
app.include_router(rate_limit_admin_router)
client = fastapi.testclient.TestClient(app)
_MOCK_MODULE = "backend.api.features.admin.rate_limit_admin_routes"
_TARGET_EMAIL = "target@example.com"
@pytest.fixture(autouse=True)
def setup_app_admin_auth(mock_jwt_admin):
"""Setup admin auth overrides for all tests in this module"""
app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
yield
app.dependency_overrides.clear()
def _mock_usage_status(
daily_used: int = 500_000, weekly_used: int = 3_000_000
) -> CoPilotUsageStatus:
from datetime import UTC, datetime, timedelta
now = datetime.now(UTC)
return CoPilotUsageStatus(
daily=UsageWindow(
used=daily_used, limit=2_500_000, resets_at=now + timedelta(hours=6)
),
weekly=UsageWindow(
used=weekly_used, limit=12_500_000, resets_at=now + timedelta(days=3)
),
)
def _patch_rate_limit_deps(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
daily_used: int = 500_000,
weekly_used: int = 3_000_000,
):
"""Patch the common rate-limit + user-lookup dependencies."""
mocker.patch(
f"{_MOCK_MODULE}.get_global_rate_limits",
new_callable=AsyncMock,
return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
)
mocker.patch(
f"{_MOCK_MODULE}.get_usage_status",
new_callable=AsyncMock,
return_value=_mock_usage_status(daily_used=daily_used, weekly_used=weekly_used),
)
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
return_value=_TARGET_EMAIL,
)
def test_get_rate_limit(
mocker: pytest_mock.MockerFixture,
configured_snapshot: Snapshot,
target_user_id: str,
) -> None:
"""Test getting rate limit and usage for a user."""
_patch_rate_limit_deps(mocker, target_user_id)
response = client.get("/admin/rate_limit", params={"user_id": target_user_id})
assert response.status_code == 200
data = response.json()
assert data["user_id"] == target_user_id
assert data["user_email"] == _TARGET_EMAIL
assert data["daily_token_limit"] == 2_500_000
assert data["weekly_token_limit"] == 12_500_000
assert data["daily_tokens_used"] == 500_000
assert data["weekly_tokens_used"] == 3_000_000
assert data["tier"] == "FREE"
configured_snapshot.assert_match(
json.dumps(data, indent=2, sort_keys=True) + "\n",
"get_rate_limit",
)
def test_get_rate_limit_by_email(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test looking up rate limits via email instead of user_id."""
_patch_rate_limit_deps(mocker, target_user_id)
mock_user = SimpleNamespace(id=target_user_id, email=_TARGET_EMAIL)
mocker.patch(
f"{_MOCK_MODULE}.get_user_by_email",
new_callable=AsyncMock,
return_value=mock_user,
)
response = client.get("/admin/rate_limit", params={"email": _TARGET_EMAIL})
assert response.status_code == 200
data = response.json()
assert data["user_id"] == target_user_id
assert data["user_email"] == _TARGET_EMAIL
assert data["daily_token_limit"] == 2_500_000
def test_get_rate_limit_by_email_not_found(
mocker: pytest_mock.MockerFixture,
) -> None:
"""Test that looking up a non-existent email returns 404."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_by_email",
new_callable=AsyncMock,
return_value=None,
)
response = client.get("/admin/rate_limit", params={"email": "nobody@example.com"})
assert response.status_code == 404
def test_get_rate_limit_no_params() -> None:
"""Test that omitting both user_id and email returns 400."""
response = client.get("/admin/rate_limit")
assert response.status_code == 400
def test_reset_user_usage_daily_only(
mocker: pytest_mock.MockerFixture,
configured_snapshot: Snapshot,
target_user_id: str,
) -> None:
"""Test resetting only daily usage (default behaviour)."""
mock_reset = mocker.patch(
f"{_MOCK_MODULE}.reset_user_usage",
new_callable=AsyncMock,
)
_patch_rate_limit_deps(mocker, target_user_id, daily_used=0, weekly_used=3_000_000)
response = client.post(
"/admin/rate_limit/reset",
json={"user_id": target_user_id},
)
assert response.status_code == 200
data = response.json()
assert data["daily_tokens_used"] == 0
# Weekly is untouched
assert data["weekly_tokens_used"] == 3_000_000
assert data["tier"] == "FREE"
mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=False)
configured_snapshot.assert_match(
json.dumps(data, indent=2, sort_keys=True) + "\n",
"reset_user_usage_daily_only",
)
def test_reset_user_usage_daily_and_weekly(
mocker: pytest_mock.MockerFixture,
configured_snapshot: Snapshot,
target_user_id: str,
) -> None:
"""Test resetting both daily and weekly usage."""
mock_reset = mocker.patch(
f"{_MOCK_MODULE}.reset_user_usage",
new_callable=AsyncMock,
)
_patch_rate_limit_deps(mocker, target_user_id, daily_used=0, weekly_used=0)
response = client.post(
"/admin/rate_limit/reset",
json={"user_id": target_user_id, "reset_weekly": True},
)
assert response.status_code == 200
data = response.json()
assert data["daily_tokens_used"] == 0
assert data["weekly_tokens_used"] == 0
assert data["tier"] == "FREE"
mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=True)
configured_snapshot.assert_match(
json.dumps(data, indent=2, sort_keys=True) + "\n",
"reset_user_usage_daily_and_weekly",
)
def test_reset_user_usage_redis_failure(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test that Redis failure on reset returns 500."""
mocker.patch(
f"{_MOCK_MODULE}.reset_user_usage",
new_callable=AsyncMock,
side_effect=Exception("Redis connection refused"),
)
response = client.post(
"/admin/rate_limit/reset",
json={"user_id": target_user_id},
)
assert response.status_code == 500
def test_get_rate_limit_email_lookup_failure(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test that failing to resolve a user email degrades gracefully."""
mocker.patch(
f"{_MOCK_MODULE}.get_global_rate_limits",
new_callable=AsyncMock,
return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
)
mocker.patch(
f"{_MOCK_MODULE}.get_usage_status",
new_callable=AsyncMock,
return_value=_mock_usage_status(),
)
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
side_effect=Exception("DB connection lost"),
)
response = client.get("/admin/rate_limit", params={"user_id": target_user_id})
assert response.status_code == 200
data = response.json()
assert data["user_id"] == target_user_id
assert data["user_email"] is None
def test_admin_endpoints_require_admin_role(mock_jwt_user) -> None:
"""Test that rate limit admin endpoints require admin role."""
app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
response = client.get("/admin/rate_limit", params={"user_id": "test"})
assert response.status_code == 403
response = client.post(
"/admin/rate_limit/reset",
json={"user_id": "test"},
)
assert response.status_code == 403
# ---------------------------------------------------------------------------
# Tier management endpoints
# ---------------------------------------------------------------------------
def test_get_user_tier(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test getting a user's rate-limit tier."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
return_value=_TARGET_EMAIL,
)
mocker.patch(
f"{_MOCK_MODULE}.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.PRO,
)
response = client.get("/admin/rate_limit/tier", params={"user_id": target_user_id})
assert response.status_code == 200
data = response.json()
assert data["user_id"] == target_user_id
assert data["tier"] == "PRO"
def test_get_user_tier_user_not_found(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test that getting tier for a non-existent user returns 404."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
return_value=None,
)
response = client.get("/admin/rate_limit/tier", params={"user_id": target_user_id})
assert response.status_code == 404
def test_set_user_tier(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test setting a user's rate-limit tier (upgrade)."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
return_value=_TARGET_EMAIL,
)
mocker.patch(
f"{_MOCK_MODULE}.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.FREE,
)
mock_set = mocker.patch(
f"{_MOCK_MODULE}.set_user_tier",
new_callable=AsyncMock,
)
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "ENTERPRISE"},
)
assert response.status_code == 200
data = response.json()
assert data["user_id"] == target_user_id
assert data["tier"] == "ENTERPRISE"
mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.ENTERPRISE)
def test_set_user_tier_downgrade(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test downgrading a user's tier from PRO to FREE."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
return_value=_TARGET_EMAIL,
)
mocker.patch(
f"{_MOCK_MODULE}.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.PRO,
)
mock_set = mocker.patch(
f"{_MOCK_MODULE}.set_user_tier",
new_callable=AsyncMock,
)
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "FREE"},
)
assert response.status_code == 200
data = response.json()
assert data["user_id"] == target_user_id
assert data["tier"] == "FREE"
mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.FREE)
def test_set_user_tier_invalid_tier(
target_user_id: str,
) -> None:
"""Test that setting an invalid tier returns 422."""
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "invalid"},
)
assert response.status_code == 422
def test_set_user_tier_invalid_tier_uppercase(
target_user_id: str,
) -> None:
"""Test that setting an unrecognised uppercase tier (e.g. 'INVALID') returns 422.
Regression: ensures Pydantic enum validation rejects values that are not
members of SubscriptionTier, even when they look like valid enum names.
"""
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "INVALID"},
)
assert response.status_code == 422
body = response.json()
assert "detail" in body
def test_set_user_tier_email_lookup_failure_non_blocking(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test that email lookup failure doesn't block tier change."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
side_effect=Exception("DB connection failed"),
)
mocker.patch(
f"{_MOCK_MODULE}.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.FREE,
)
mock_set = mocker.patch(
f"{_MOCK_MODULE}.set_user_tier",
new_callable=AsyncMock,
)
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "PRO"},
)
assert response.status_code == 200
mock_set.assert_awaited_once()
def test_set_user_tier_db_failure(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test that DB failure on set tier returns 500."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
return_value=_TARGET_EMAIL,
)
mocker.patch(
f"{_MOCK_MODULE}.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.FREE,
)
mocker.patch(
f"{_MOCK_MODULE}.set_user_tier",
new_callable=AsyncMock,
side_effect=Exception("DB connection refused"),
)
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "PRO"},
)
assert response.status_code == 500
def test_tier_endpoints_require_admin_role(mock_jwt_user) -> None:
"""Test that tier admin endpoints require admin role."""
app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
response = client.get("/admin/rate_limit/tier", params={"user_id": "test"})
assert response.status_code == 403
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": "test", "tier": "PRO"},
)
assert response.status_code == 403
# ─── search_users endpoint ──────────────────────────────────────────
def test_search_users_returns_matching_users(
mocker: pytest_mock.MockerFixture,
admin_user_id: str,
) -> None:
"""Partial search should return all matching users from the User table."""
mocker.patch(
_MOCK_MODULE + ".search_users",
new_callable=AsyncMock,
return_value=[
("user-1", "zamil.majdy@gmail.com"),
("user-2", "zamil.majdy@agpt.co"),
],
)
response = client.get("/admin/rate_limit/search_users", params={"query": "zamil"})
assert response.status_code == 200
results = response.json()
assert len(results) == 2
assert results[0]["user_email"] == "zamil.majdy@gmail.com"
assert results[1]["user_email"] == "zamil.majdy@agpt.co"
def test_search_users_empty_results(
mocker: pytest_mock.MockerFixture,
admin_user_id: str,
) -> None:
"""Search with no matches returns empty list."""
mocker.patch(
_MOCK_MODULE + ".search_users",
new_callable=AsyncMock,
return_value=[],
)
response = client.get(
"/admin/rate_limit/search_users", params={"query": "nonexistent"}
)
assert response.status_code == 200
assert response.json() == []
def test_search_users_short_query_rejected(
admin_user_id: str,
) -> None:
"""Query shorter than 3 characters should return 400."""
response = client.get("/admin/rate_limit/search_users", params={"query": "ab"})
assert response.status_code == 400
def test_search_users_negative_limit_clamped(
mocker: pytest_mock.MockerFixture,
admin_user_id: str,
) -> None:
"""Negative limit should be clamped to 1, not passed through."""
mock_search = mocker.patch(
_MOCK_MODULE + ".search_users",
new_callable=AsyncMock,
return_value=[],
)
response = client.get(
"/admin/rate_limit/search_users", params={"query": "test", "limit": -1}
)
assert response.status_code == 200
mock_search.assert_awaited_once_with("test", limit=1)
def test_search_users_requires_admin_role(mock_jwt_user) -> None:
"""Test that the search_users endpoint requires admin role."""
app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
response = client.get("/admin/rate_limit/search_users", params={"query": "test"})
assert response.status_code == 403

@@ -4,14 +4,14 @@ import asyncio
import logging
import re
from collections.abc import AsyncGenerator
-from typing import Annotated
+from typing import Annotated, Literal
from uuid import uuid4
from autogpt_libs import auth
from fastapi import APIRouter, HTTPException, Query, Response, Security
from fastapi.responses import StreamingResponse
from prisma.models import UserWorkspaceFile
-from pydantic import BaseModel, Field, field_validator
+from pydantic import BaseModel, ConfigDict, Field, field_validator
from backend.copilot import service as chat_service
from backend.copilot import stream_registry
@@ -20,6 +20,7 @@ from backend.copilot.executor.utils import enqueue_cancel_task, enqueue_copilot_
from backend.copilot.model import (
ChatMessage,
ChatSession,
ChatSessionMetadata,
append_and_save_message,
create_chat_session,
delete_chat_session,
@@ -30,8 +31,14 @@ from backend.copilot.model import (
from backend.copilot.rate_limit import (
CoPilotUsageStatus,
RateLimitExceeded,
acquire_reset_lock,
check_rate_limit,
get_daily_reset_count,
get_global_rate_limits,
get_usage_status,
increment_daily_reset_count,
release_reset_lock,
reset_daily_usage,
)
from backend.copilot.response_model import StreamError, StreamFinish, StreamHeartbeat
from backend.copilot.tools.e2b_sandbox import kill_sandbox
@@ -59,9 +66,16 @@ from backend.copilot.tools.models import (
UnderstandingUpdatedResponse,
)
from backend.copilot.tracking import track_user_message
from backend.data.credit import UsageTransactionMetadata, get_user_credit_model
from backend.data.redis_client import get_redis_async
from backend.data.understanding import get_business_understanding
from backend.data.workspace import get_or_create_workspace
-from backend.util.exceptions import NotFoundError
+from backend.util.exceptions import InsufficientBalanceError, NotFoundError
from backend.util.settings import Settings
settings = Settings()
logger = logging.getLogger(__name__)
config = ChatConfig()
@@ -69,8 +83,6 @@ _UUID_RE = re.compile(
r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
)
-logger = logging.getLogger(__name__)
async def _validate_and_get_session(
session_id: str,
@@ -99,6 +111,23 @@ class StreamChatRequest(BaseModel):
file_ids: list[str] | None = Field(
default=None, max_length=20
) # Workspace file IDs attached to this message
mode: Literal["fast", "extended_thinking"] | None = Field(
default=None,
description="Autopilot mode: 'fast' for baseline LLM, 'extended_thinking' for Claude Agent SDK. "
"If None, uses the server default (extended_thinking).",
)
class CreateSessionRequest(BaseModel):
"""Request model for creating a new chat session.
``dry_run`` is a **top-level** field — do not nest it inside ``metadata``.
Extra/unknown fields are rejected (422) to prevent silent mis-use.
"""
model_config = ConfigDict(extra="forbid")
dry_run: bool = False
class CreateSessionResponse(BaseModel):
@@ -107,6 +136,7 @@ class CreateSessionResponse(BaseModel):
id: str
created_at: str
user_id: str | None
metadata: ChatSessionMetadata = ChatSessionMetadata()
class ActiveStreamInfo(BaseModel):
@@ -127,6 +157,7 @@ class SessionDetailResponse(BaseModel):
active_stream: ActiveStreamInfo | None = None # Present if stream is still active
total_prompt_tokens: int = 0
total_completion_tokens: int = 0
metadata: ChatSessionMetadata = ChatSessionMetadata()
class SessionSummaryResponse(BaseModel):
@@ -237,6 +268,7 @@ async def list_sessions(
)
async def create_session(
user_id: Annotated[str, Security(auth.get_user_id)],
request: CreateSessionRequest | None = None,
) -> CreateSessionResponse:
"""
Create a new chat session.
@@ -245,22 +277,28 @@ async def create_session(
Args:
user_id: The authenticated user ID parsed from the JWT (required).
request: Optional request body. When provided, ``dry_run=True``
forces run_block and run_agent calls to use dry-run simulation.
Returns:
CreateSessionResponse: Details of the created session.
"""
dry_run = request.dry_run if request else False
logger.info(
f"Creating session with user_id: "
f"...{user_id[-8:] if len(user_id) > 8 else '<redacted>'}"
f"{', dry_run=True' if dry_run else ''}"
)
-session = await create_chat_session(user_id)
+session = await create_chat_session(user_id, dry_run=dry_run)
return CreateSessionResponse(
id=session.session_id,
created_at=session.started_at.isoformat(),
user_id=session.user_id,
metadata=session.metadata,
)
@@ -409,6 +447,7 @@ async def get_session(
active_stream=active_stream_info,
total_prompt_tokens=total_prompt,
total_completion_tokens=total_completion,
metadata=session.metadata,
)
@@ -421,11 +460,193 @@ async def get_copilot_usage(
"""Get CoPilot usage status for the authenticated user.
Returns current token usage vs limits for daily and weekly windows.
Global defaults sourced from LaunchDarkly (falling back to config).
Includes the user's rate-limit tier.
"""
daily_limit, weekly_limit, tier = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
)
return await get_usage_status(
user_id=user_id,
-daily_token_limit=config.daily_token_limit,
-weekly_token_limit=config.weekly_token_limit,
+daily_token_limit=daily_limit,
+weekly_token_limit=weekly_limit,
rate_limit_reset_cost=config.rate_limit_reset_cost,
tier=tier,
)
class RateLimitResetResponse(BaseModel):
"""Response from resetting the daily rate limit."""
success: bool
credits_charged: int = Field(description="Credits charged (in cents)")
remaining_balance: int = Field(description="Credit balance after charge (in cents)")
usage: CoPilotUsageStatus = Field(description="Updated usage status after reset")
@router.post(
"/usage/reset",
status_code=200,
responses={
400: {
"description": "Bad Request (feature disabled or daily limit not reached)"
},
402: {"description": "Payment Required (insufficient credits)"},
429: {
"description": "Too Many Requests (max daily resets exceeded or reset in progress)"
},
503: {
"description": "Service Unavailable (Redis reset failed; credits refunded or support needed)"
},
},
)
async def reset_copilot_usage(
user_id: Annotated[str, Security(auth.get_user_id)],
) -> RateLimitResetResponse:
"""Reset the daily CoPilot rate limit by spending credits.
Allows users who have hit their daily token limit to spend credits
to reset their daily usage counter and continue working.
Returns 400 if the feature is disabled or the user is not over the limit.
Returns 402 if the user has insufficient credits.
"""
cost = config.rate_limit_reset_cost
if cost <= 0:
raise HTTPException(
status_code=400,
detail="Rate limit reset is not available.",
)
if not settings.config.enable_credit:
raise HTTPException(
status_code=400,
detail="Rate limit reset is not available (credit system is disabled).",
)
daily_limit, weekly_limit, tier = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
)
if daily_limit <= 0:
raise HTTPException(
status_code=400,
detail="No daily limit is configured — nothing to reset.",
)
# Check max daily resets. get_daily_reset_count returns None when Redis
# is unavailable; reject the reset in that case to prevent unlimited
# free resets when the counter store is down.
reset_count = await get_daily_reset_count(user_id)
if reset_count is None:
raise HTTPException(
status_code=503,
detail="Unable to verify reset eligibility — please try again later.",
)
if config.max_daily_resets > 0 and reset_count >= config.max_daily_resets:
raise HTTPException(
status_code=429,
detail=f"You've used all {config.max_daily_resets} resets for today.",
)
# Acquire a per-user lock to prevent TOCTOU races (concurrent resets).
if not await acquire_reset_lock(user_id):
raise HTTPException(
status_code=429,
detail="A reset is already in progress. Please try again.",
)
try:
# Verify the user is actually at or over their daily limit.
# (rate_limit_reset_cost intentionally omitted — this object is only
# used for limit checks, not returned to the client.)
usage_status = await get_usage_status(
user_id=user_id,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
tier=tier,
)
if daily_limit > 0 and usage_status.daily.used < daily_limit:
raise HTTPException(
status_code=400,
detail="You have not reached your daily limit yet.",
)
# If the weekly limit is also exhausted, resetting the daily counter
# won't help — the user would still be blocked by the weekly limit.
if weekly_limit > 0 and usage_status.weekly.used >= weekly_limit:
raise HTTPException(
status_code=400,
detail="Your weekly limit is also reached. Resetting the daily limit won't help.",
)
# Charge credits.
credit_model = await get_user_credit_model(user_id)
try:
remaining = await credit_model.spend_credits(
user_id=user_id,
cost=cost,
metadata=UsageTransactionMetadata(
reason="CoPilot daily rate limit reset",
),
)
except InsufficientBalanceError as e:
raise HTTPException(
status_code=402,
detail="Insufficient credits to reset your rate limit.",
) from e
# Reset daily usage in Redis. If this fails, refund the credits
# so the user is not charged for a service they did not receive.
if not await reset_daily_usage(user_id, daily_token_limit=daily_limit):
# Compensate: refund the charged credits.
refunded = False
try:
await credit_model.top_up_credits(user_id, cost)
refunded = True
logger.warning(
"Refunded %d credits to user %s after Redis reset failure",
cost,
user_id[:8],
)
except Exception:
logger.error(
"CRITICAL: Failed to refund %d credits to user %s "
"after Redis reset failure — manual intervention required",
cost,
user_id[:8],
exc_info=True,
)
if refunded:
raise HTTPException(
status_code=503,
detail="Rate limit reset failed — please try again later. "
"The credits charged for this reset have been refunded.",
)
raise HTTPException(
status_code=503,
detail="Rate limit reset failed and the automatic refund "
"also failed. Please contact support for assistance.",
)
# Track the reset count for daily cap enforcement.
await increment_daily_reset_count(user_id)
finally:
await release_reset_lock(user_id)
# Return updated usage status.
updated_usage = await get_usage_status(
user_id=user_id,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
rate_limit_reset_cost=config.rate_limit_reset_cost,
tier=tier,
)
return RateLimitResetResponse(
success=True,
credits_charged=cost,
remaining_balance=remaining,
usage=updated_usage,
)
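The charge-then-reset flow in reset_copilot_usage is a compensating-transaction pattern: spend first, attempt the side effect, and refund if the side effect fails. The following is a minimal sketch of that shape only. FakeLedger, ResetFailed, and charge_then_reset are hypothetical names for illustration, not part of this change, and the real endpoint additionally takes a per-user lock and handles a failed refund.

```python
import asyncio


class ResetFailed(Exception):
    """Raised when the usage counter could not be reset."""


class FakeLedger:
    """Hypothetical in-memory stand-in for the credit model."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    async def spend(self, cost: int) -> int:
        if cost > self.balance:
            raise ValueError("insufficient balance")
        self.balance -= cost
        return self.balance

    async def refund(self, cost: int) -> None:
        self.balance += cost


async def charge_then_reset(ledger: FakeLedger, cost: int, reset) -> int:
    """Charge `cost`, run `reset()`, and refund the charge if reset reports failure."""
    remaining = await ledger.spend(cost)
    if not await reset():
        # Compensate: the user must not pay for a reset that did not happen.
        await ledger.refund(cost)
        raise ResetFailed("reset failed; charge refunded")
    return remaining


async def _demo() -> tuple[int, bool, int]:
    async def ok() -> bool:
        return True

    async def broken() -> bool:
        return False

    ledger = FakeLedger(1000)
    remaining = await charge_then_reset(ledger, 300, ok)
    try:
        await charge_then_reset(ledger, 300, broken)
    except ResetFailed:
        failed = True
    else:
        failed = False
    return remaining, failed, ledger.balance


demo_result = asyncio.run(_demo())
```

In the sketch the second call fails, so the balance ends where the first successful charge left it.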
@@ -526,12 +747,16 @@ async def stream_chat_post(
# Pre-turn rate limit check (token-based).
# check_rate_limit short-circuits internally when both limits are 0.
# Global defaults sourced from LaunchDarkly, falling back to config.
if user_id:
try:
daily_limit, weekly_limit, _ = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
)
await check_rate_limit(
user_id=user_id,
daily_token_limit=config.daily_token_limit,
weekly_token_limit=config.weekly_token_limit,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
)
except RateLimitExceeded as e:
raise HTTPException(status_code=429, detail=str(e)) from e
@@ -620,6 +845,7 @@ async def stream_chat_post(
is_user_message=request.is_user_message,
context=request.context,
file_ids=sanitized_file_ids,
mode=request.mode,
)
setup_time = (time.perf_counter() - stream_start_time) * 1000
@@ -894,6 +1120,47 @@ async def session_assign_user(
return {"status": "ok"}
# ========== Suggested Prompts ==========
class SuggestedTheme(BaseModel):
"""A themed group of suggested prompts."""
name: str
prompts: list[str]
class SuggestedPromptsResponse(BaseModel):
"""Response model for user-specific suggested prompts grouped by theme."""
themes: list[SuggestedTheme]
@router.get(
"/suggested-prompts",
dependencies=[Security(auth.requires_user)],
)
async def get_suggested_prompts(
user_id: Annotated[str, Security(auth.get_user_id)],
) -> SuggestedPromptsResponse:
"""
Get LLM-generated suggested prompts grouped by theme.
Returns personalized quick-action prompts based on the user's
business understanding. Returns empty themes list if no custom
prompts are available.
"""
understanding = await get_business_understanding(user_id)
if understanding is None or not understanding.suggested_prompts:
return SuggestedPromptsResponse(themes=[])
themes = [
SuggestedTheme(name=name, prompts=prompts)
for name, prompts in understanding.suggested_prompts.items()
]
return SuggestedPromptsResponse(themes=themes)
# ========== Configuration ==========
@@ -942,7 +1209,7 @@ async def health_check() -> dict:
)
# Create and retrieve session to verify full data layer
session = await create_chat_session(health_check_user_id)
session = await create_chat_session(health_check_user_id, dry_run=False)
await get_chat_session(session.session_id, health_check_user_id)
return {


@@ -1,7 +1,7 @@
"""Tests for chat API routes: session title update, file attachment validation, usage, and rate limiting."""
from datetime import UTC, datetime, timedelta
from unittest.mock import AsyncMock
from unittest.mock import AsyncMock, MagicMock
import fastapi
import fastapi.testclient
@@ -9,6 +9,7 @@ import pytest
import pytest_mock
from backend.api.features.chat import routes as chat_routes
from backend.copilot.rate_limit import SubscriptionTier
app = fastapi.FastAPI()
app.include_router(chat_routes.router)
@@ -331,14 +332,28 @@ def _mock_usage(
*,
daily_used: int = 500,
weekly_used: int = 2000,
daily_limit: int = 10000,
weekly_limit: int = 50000,
tier: "SubscriptionTier" = SubscriptionTier.FREE,
) -> AsyncMock:
"""Mock get_usage_status to return a predictable CoPilotUsageStatus."""
"""Mock get_usage_status and get_global_rate_limits for usage endpoint tests.
Mocks both ``get_global_rate_limits`` (returns the given limits + tier) and
``get_usage_status`` so that tests exercise the endpoint without hitting
LaunchDarkly or Prisma.
"""
from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow
mocker.patch(
"backend.api.features.chat.routes.get_global_rate_limits",
new_callable=AsyncMock,
return_value=(daily_limit, weekly_limit, tier),
)
resets_at = datetime.now(UTC) + timedelta(days=1)
status = CoPilotUsageStatus(
daily=UsageWindow(used=daily_used, limit=10000, resets_at=resets_at),
weekly=UsageWindow(used=weekly_used, limit=50000, resets_at=resets_at),
daily=UsageWindow(used=daily_used, limit=daily_limit, resets_at=resets_at),
weekly=UsageWindow(used=weekly_used, limit=weekly_limit, resets_at=resets_at),
)
return mocker.patch(
"backend.api.features.chat.routes.get_usage_status",
@@ -368,6 +383,8 @@ def test_usage_returns_daily_and_weekly(
user_id=test_user_id,
daily_token_limit=10000,
weekly_token_limit=50000,
rate_limit_reset_cost=chat_routes.config.rate_limit_reset_cost,
tier=SubscriptionTier.FREE,
)
@@ -375,11 +392,10 @@ def test_usage_uses_config_limits(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""The endpoint forwards daily_token_limit and weekly_token_limit from config."""
mock_get = _mock_usage(mocker)
"""The endpoint forwards resolved limits from get_global_rate_limits to get_usage_status."""
mock_get = _mock_usage(mocker, daily_limit=99999, weekly_limit=77777)
mocker.patch.object(chat_routes.config, "daily_token_limit", 99999)
mocker.patch.object(chat_routes.config, "weekly_token_limit", 77777)
mocker.patch.object(chat_routes.config, "rate_limit_reset_cost", 500)
response = client.get("/usage")
@@ -388,6 +404,8 @@ def test_usage_uses_config_limits(
user_id=test_user_id,
daily_token_limit=99999,
weekly_token_limit=77777,
rate_limit_reset_cost=500,
tier=SubscriptionTier.FREE,
)
@@ -400,3 +418,126 @@ def test_usage_rejects_unauthenticated_request() -> None:
response = unauthenticated_client.get("/usage")
assert response.status_code == 401
# ─── Suggested prompts endpoint ──────────────────────────────────────
def _mock_get_business_understanding(
mocker: pytest_mock.MockerFixture,
*,
return_value=None,
):
"""Mock get_business_understanding."""
return mocker.patch(
"backend.api.features.chat.routes.get_business_understanding",
new_callable=AsyncMock,
return_value=return_value,
)
def test_suggested_prompts_returns_themes(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""User with themed prompts gets them back as themes list."""
mock_understanding = MagicMock()
mock_understanding.suggested_prompts = {
"Learn": ["L1", "L2"],
"Create": ["C1"],
}
_mock_get_business_understanding(mocker, return_value=mock_understanding)
response = client.get("/suggested-prompts")
assert response.status_code == 200
data = response.json()
assert "themes" in data
themes_by_name = {t["name"]: t["prompts"] for t in data["themes"]}
assert themes_by_name["Learn"] == ["L1", "L2"]
assert themes_by_name["Create"] == ["C1"]
def test_suggested_prompts_no_understanding(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""User with no understanding gets empty themes list."""
_mock_get_business_understanding(mocker, return_value=None)
response = client.get("/suggested-prompts")
assert response.status_code == 200
assert response.json() == {"themes": []}
def test_suggested_prompts_empty_prompts(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""User with understanding but empty prompts gets empty themes list."""
mock_understanding = MagicMock()
mock_understanding.suggested_prompts = {}
_mock_get_business_understanding(mocker, return_value=mock_understanding)
response = client.get("/suggested-prompts")
assert response.status_code == 200
assert response.json() == {"themes": []}
# ─── Create session: dry_run contract ─────────────────────────────────
def _mock_create_chat_session(mocker: pytest_mock.MockerFixture):
"""Mock create_chat_session to return a fake session."""
from backend.copilot.model import ChatSession
async def _fake_create(user_id: str, *, dry_run: bool):
return ChatSession.new(user_id, dry_run=dry_run)
return mocker.patch(
"backend.api.features.chat.routes.create_chat_session",
new_callable=AsyncMock,
side_effect=_fake_create,
)
def test_create_session_dry_run_true(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""Sending ``{"dry_run": true}`` sets metadata.dry_run to True."""
_mock_create_chat_session(mocker)
response = client.post("/sessions", json={"dry_run": True})
assert response.status_code == 200
assert response.json()["metadata"]["dry_run"] is True
def test_create_session_dry_run_default_false(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""Empty body defaults dry_run to False."""
_mock_create_chat_session(mocker)
response = client.post("/sessions")
assert response.status_code == 200
assert response.json()["metadata"]["dry_run"] is False
def test_create_session_rejects_nested_metadata(
test_user_id: str,
) -> None:
"""Sending ``{"metadata": {"dry_run": true}}`` must return 422, not silently
default to ``dry_run=False``. This guards against the common mistake of
nesting dry_run inside metadata instead of providing it at the top level."""
response = client.post(
"/sessions",
json={"metadata": {"dry_run": True}},
)
assert response.status_code == 422


@@ -40,11 +40,15 @@ from backend.data.onboarding import OnboardingStep, complete_onboarding_step
from backend.data.user import get_user_integrations
from backend.executor.utils import add_graph_execution
from backend.integrations.ayrshare import AyrshareClient, SocialPlatform
from backend.integrations.credentials_store import provider_matches
from backend.integrations.credentials_store import (
is_system_credential,
provider_matches,
)
from backend.integrations.creds_manager import (
IntegrationCredentialsManager,
create_mcp_oauth_handler,
)
from backend.integrations.managed_credentials import ensure_managed_credentials
from backend.integrations.oauth import CREDENTIALS_BY_PROVIDER, HANDLERS_BY_NAME
from backend.integrations.providers import ProviderName
from backend.integrations.webhooks import get_webhook_manager
@@ -110,6 +114,7 @@ class CredentialsMetaResponse(BaseModel):
default=None,
description="Host pattern for host-scoped or MCP server URL for MCP credentials",
)
is_managed: bool = False
@model_validator(mode="before")
@classmethod
@@ -148,6 +153,7 @@ def to_meta_response(cred: Credentials) -> CredentialsMetaResponse:
scopes=cred.scopes if isinstance(cred, OAuth2Credentials) else None,
username=cred.username if isinstance(cred, OAuth2Credentials) else None,
host=CredentialsMetaResponse.get_host(cred),
is_managed=cred.is_managed,
)
@@ -224,6 +230,9 @@ async def callback(
async def list_credentials(
user_id: Annotated[str, Security(get_user_id)],
) -> list[CredentialsMetaResponse]:
# Fire-and-forget: provision missing managed credentials in the background.
# The credential appears on the next page load; listing is never blocked.
asyncio.create_task(ensure_managed_credentials(user_id, creds_manager.store))
credentials = await creds_manager.store.get_all_creds(user_id)
return [
@@ -238,6 +247,7 @@ async def list_credentials_by_provider(
],
user_id: Annotated[str, Security(get_user_id)],
) -> list[CredentialsMetaResponse]:
asyncio.create_task(ensure_managed_credentials(user_id, creds_manager.store))
credentials = await creds_manager.store.get_creds_by_provider(user_id, provider)
return [
@@ -332,6 +342,11 @@ async def delete_credentials(
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
)
if is_system_credential(cred_id):
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="System-managed credentials cannot be deleted",
)
creds = await creds_manager.store.get_creds_by_id(user_id, cred_id)
if not creds:
raise HTTPException(
@@ -342,6 +357,11 @@ async def delete_credentials(
status_code=status.HTTP_404_NOT_FOUND,
detail="Credentials not found",
)
if creds.is_managed:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="AutoGPT-managed credentials cannot be deleted",
)
try:
await remove_all_webhooks_for_credentials(user_id, creds, force)
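The fire-and-forget asyncio.create_task calls in list_credentials and list_credentials_by_provider discard the returned task. The asyncio documentation notes that the event loop keeps only weak references to tasks, so an unreferenced task may be garbage-collected before it finishes. One common mitigation is to hold the task in a module-level set until it completes. The sketch below assumes that approach; spawn_background and _background_tasks are hypothetical names, not part of this change.

```python
import asyncio

# Strong references keep pending background tasks alive until they finish.
_background_tasks: set[asyncio.Task] = set()


def spawn_background(coro) -> asyncio.Task:
    """Schedule `coro` without awaiting it, keeping a reference until done."""
    task = asyncio.create_task(coro)
    _background_tasks.add(task)
    # Drop the reference once the task completes (success, error, or cancel).
    task.add_done_callback(_background_tasks.discard)
    return task


async def _demo():
    provisioned: list[str] = []

    async def provision(user_id: str) -> None:
        await asyncio.sleep(0)
        provisioned.append(user_id)

    task = spawn_background(provision("user-1"))
    tracked_while_running = task in _background_tasks
    await task
    await asyncio.sleep(0)  # give done callbacks a chance to run
    return provisioned, tracked_while_running, len(_background_tasks)


demo_result = asyncio.run(_demo())
```

Exceptions raised inside such a task are still only surfaced when the task is awaited or inspected, so the background coroutine should log its own failures.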


@@ -1,6 +1,7 @@
"""Tests for credentials API security: no secret leakage, SDK defaults filtered."""
from unittest.mock import AsyncMock, patch
from contextlib import asynccontextmanager
from unittest.mock import AsyncMock, MagicMock, patch
import fastapi
import fastapi.testclient
@@ -276,3 +277,294 @@ class TestCreateCredentialNoSecretInResponse:
assert resp.status_code == 403
mock_mgr.create.assert_not_called()
class TestManagedCredentials:
"""AutoGPT-managed credentials cannot be deleted by users."""
def test_delete_is_managed_returns_403(self):
cred = APIKeyCredentials(
id="managed-cred-1",
provider="agent_mail",
title="AgentMail (managed by AutoGPT)",
api_key=SecretStr("sk-managed-key"),
is_managed=True,
)
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.store.get_creds_by_id = AsyncMock(return_value=cred)
resp = client.request("DELETE", "/agent_mail/credentials/managed-cred-1")
assert resp.status_code == 403
assert "AutoGPT-managed" in resp.json()["detail"]
def test_list_credentials_includes_is_managed_field(self):
managed = APIKeyCredentials(
id="managed-1",
provider="agent_mail",
title="AgentMail (managed)",
api_key=SecretStr("sk-key"),
is_managed=True,
)
regular = APIKeyCredentials(
id="regular-1",
provider="openai",
title="My Key",
api_key=SecretStr("sk-key"),
)
with patch(
"backend.api.features.integrations.router.creds_manager"
) as mock_mgr:
mock_mgr.store.get_all_creds = AsyncMock(return_value=[managed, regular])
resp = client.get("/credentials")
assert resp.status_code == 200
data = resp.json()
managed_cred = next(c for c in data if c["id"] == "managed-1")
regular_cred = next(c for c in data if c["id"] == "regular-1")
assert managed_cred["is_managed"] is True
assert regular_cred["is_managed"] is False
# ---------------------------------------------------------------------------
# Managed credential provisioning infrastructure
# ---------------------------------------------------------------------------
def _make_managed_cred(
provider: str = "agent_mail", pod_id: str = "pod-abc"
) -> APIKeyCredentials:
return APIKeyCredentials(
id="managed-auto",
provider=provider,
title="AgentMail (managed by AutoGPT)",
api_key=SecretStr("sk-pod-key"),
is_managed=True,
metadata={"pod_id": pod_id},
)
def _make_store_mock(**kwargs) -> MagicMock:
"""Create a store mock whose awaited ``locks()`` returns an object with a no-op async ``locked()`` context manager."""
@asynccontextmanager
async def _noop_locked(key):
yield
locks_obj = MagicMock()
locks_obj.locked = _noop_locked
store = MagicMock(**kwargs)
store.locks = AsyncMock(return_value=locks_obj)
return store
class TestEnsureManagedCredentials:
"""Unit tests for ensure_managed_credentials in managed_credentials.py."""
@pytest.mark.asyncio
async def test_provisions_when_missing(self):
"""Provider.provision() is called when no managed credential exists."""
from backend.integrations.managed_credentials import (
_PROVIDERS,
_provisioned_users,
ensure_managed_credentials,
)
cred = _make_managed_cred()
provider = MagicMock()
provider.provider_name = "test_provider"
provider.is_available = AsyncMock(return_value=True)
provider.provision = AsyncMock(return_value=cred)
store = _make_store_mock()
store.has_managed_credential = AsyncMock(return_value=False)
store.add_managed_credential = AsyncMock()
saved = dict(_PROVIDERS)
_PROVIDERS.clear()
_PROVIDERS["test_provider"] = provider
_provisioned_users.pop("user-1", None)
try:
await ensure_managed_credentials("user-1", store)
finally:
_PROVIDERS.clear()
_PROVIDERS.update(saved)
_provisioned_users.pop("user-1", None)
provider.provision.assert_awaited_once_with("user-1")
store.add_managed_credential.assert_awaited_once_with("user-1", cred)
@pytest.mark.asyncio
async def test_skips_when_already_exists(self):
"""Provider.provision() is NOT called when managed credential exists."""
from backend.integrations.managed_credentials import (
_PROVIDERS,
_provisioned_users,
ensure_managed_credentials,
)
provider = MagicMock()
provider.provider_name = "test_provider"
provider.is_available = AsyncMock(return_value=True)
provider.provision = AsyncMock()
store = _make_store_mock()
store.has_managed_credential = AsyncMock(return_value=True)
saved = dict(_PROVIDERS)
_PROVIDERS.clear()
_PROVIDERS["test_provider"] = provider
_provisioned_users.pop("user-1", None)
try:
await ensure_managed_credentials("user-1", store)
finally:
_PROVIDERS.clear()
_PROVIDERS.update(saved)
_provisioned_users.pop("user-1", None)
provider.provision.assert_not_awaited()
@pytest.mark.asyncio
async def test_skips_when_unavailable(self):
"""Provider.provision() is NOT called when provider is not available."""
from backend.integrations.managed_credentials import (
_PROVIDERS,
_provisioned_users,
ensure_managed_credentials,
)
provider = MagicMock()
provider.provider_name = "test_provider"
provider.is_available = AsyncMock(return_value=False)
provider.provision = AsyncMock()
store = _make_store_mock()
store.has_managed_credential = AsyncMock()
saved = dict(_PROVIDERS)
_PROVIDERS.clear()
_PROVIDERS["test_provider"] = provider
_provisioned_users.pop("user-1", None)
try:
await ensure_managed_credentials("user-1", store)
finally:
_PROVIDERS.clear()
_PROVIDERS.update(saved)
_provisioned_users.pop("user-1", None)
provider.provision.assert_not_awaited()
store.has_managed_credential.assert_not_awaited()
@pytest.mark.asyncio
async def test_provision_failure_does_not_propagate(self):
"""A failed provision is logged but does not raise."""
from backend.integrations.managed_credentials import (
_PROVIDERS,
_provisioned_users,
ensure_managed_credentials,
)
provider = MagicMock()
provider.provider_name = "test_provider"
provider.is_available = AsyncMock(return_value=True)
provider.provision = AsyncMock(side_effect=RuntimeError("boom"))
store = _make_store_mock()
store.has_managed_credential = AsyncMock(return_value=False)
saved = dict(_PROVIDERS)
_PROVIDERS.clear()
_PROVIDERS["test_provider"] = provider
_provisioned_users.pop("user-1", None)
try:
await ensure_managed_credentials("user-1", store)
finally:
_PROVIDERS.clear()
_PROVIDERS.update(saved)
_provisioned_users.pop("user-1", None)
# No exception raised — provisioning failure is swallowed.
class TestCleanupManagedCredentials:
"""Unit tests for cleanup_managed_credentials."""
@pytest.mark.asyncio
async def test_calls_deprovision_for_managed_creds(self):
from backend.integrations.managed_credentials import (
_PROVIDERS,
cleanup_managed_credentials,
)
cred = _make_managed_cred()
provider = MagicMock()
provider.provider_name = "agent_mail"
provider.deprovision = AsyncMock()
store = MagicMock()
store.get_all_creds = AsyncMock(return_value=[cred])
saved = dict(_PROVIDERS)
_PROVIDERS.clear()
_PROVIDERS["agent_mail"] = provider
try:
await cleanup_managed_credentials("user-1", store)
finally:
_PROVIDERS.clear()
_PROVIDERS.update(saved)
provider.deprovision.assert_awaited_once_with("user-1", cred)
@pytest.mark.asyncio
async def test_skips_non_managed_creds(self):
from backend.integrations.managed_credentials import (
_PROVIDERS,
cleanup_managed_credentials,
)
regular = _make_api_key_cred()
provider = MagicMock()
provider.provider_name = "openai"
provider.deprovision = AsyncMock()
store = MagicMock()
store.get_all_creds = AsyncMock(return_value=[regular])
saved = dict(_PROVIDERS)
_PROVIDERS.clear()
_PROVIDERS["openai"] = provider
try:
await cleanup_managed_credentials("user-1", store)
finally:
_PROVIDERS.clear()
_PROVIDERS.update(saved)
provider.deprovision.assert_not_awaited()
@pytest.mark.asyncio
async def test_deprovision_failure_does_not_propagate(self):
from backend.integrations.managed_credentials import (
_PROVIDERS,
cleanup_managed_credentials,
)
cred = _make_managed_cred()
provider = MagicMock()
provider.provider_name = "agent_mail"
provider.deprovision = AsyncMock(side_effect=RuntimeError("boom"))
store = MagicMock()
store.get_all_creds = AsyncMock(return_value=[cred])
saved = dict(_PROVIDERS)
_PROVIDERS.clear()
_PROVIDERS["agent_mail"] = provider
try:
await cleanup_managed_credentials("user-1", store)
finally:
_PROVIDERS.clear()
_PROVIDERS.update(saved)
# No exception raised — cleanup failure is swallowed.
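Each test above saves, clears, repopulates, and restores the _PROVIDERS registry by hand. A possible alternative, sketched here under the assumption that the registry is a plain dict, is unittest.mock.patch.dict with clear=True, which restores the original contents on exit even if the body raises. The _PROVIDERS dict below is a local stand-in for illustration, not the real module-level registry.

```python
from unittest.mock import MagicMock, patch

# Hypothetical stand-in for the module-level provider registry.
_PROVIDERS: dict[str, object] = {"agent_mail": MagicMock(name="real_provider")}

stub = MagicMock(name="stub_provider")

# clear=True empties the dict before applying the replacement entries;
# the original contents are restored when the block exits.
with patch.dict(_PROVIDERS, {"test_provider": stub}, clear=True):
    names_inside = sorted(_PROVIDERS)

names_after = sorted(_PROVIDERS)
```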


@@ -17,8 +17,6 @@ from backend.data.includes import library_agent_include
from backend.util.exceptions import NotFoundError
from backend.util.json import SafeJson
from .db import get_library_agent_by_graph_id, update_library_agent
logger = logging.getLogger(__name__)
@@ -61,28 +59,17 @@ async def add_graph_to_library(
graph_model: GraphModel,
user_id: str,
) -> library_model.LibraryAgent:
"""Check existing / restore soft-deleted / create new LibraryAgent."""
if existing := await get_library_agent_by_graph_id(
user_id, graph_model.id, graph_model.version
):
return existing
"""Check existing / restore soft-deleted / create new LibraryAgent.
deleted_agent = await prisma.models.LibraryAgent.prisma().find_unique(
where={
"userId_agentGraphId_agentGraphVersion": {
"userId": user_id,
"agentGraphId": graph_model.id,
"agentGraphVersion": graph_model.version,
}
},
Uses a create-then-catch-UniqueViolationError-then-update pattern on
the (userId, agentGraphId, agentGraphVersion) composite unique constraint.
This is more robust than ``upsert`` because Prisma's upsert atomicity
guarantees are not well-documented for all versions.
"""
settings_json = SafeJson(GraphSettings.from_graph(graph_model).model_dump())
_include = library_agent_include(
user_id, include_nodes=False, include_executions=False
)
if deleted_agent and (deleted_agent.isDeleted or deleted_agent.isArchived):
return await update_library_agent(
deleted_agent.id,
user_id,
is_deleted=False,
is_archived=False,
)
try:
added_agent = await prisma.models.LibraryAgent.prisma().create(
@@ -98,23 +85,32 @@ async def add_graph_to_library(
},
"isCreatedByUser": False,
"useGraphIsActiveVersion": False,
"settings": SafeJson(
GraphSettings.from_graph(graph_model).model_dump()
),
"settings": settings_json,
},
include=library_agent_include(
user_id, include_nodes=False, include_executions=False
),
include=_include,
)
except prisma.errors.UniqueViolationError:
# Race condition: concurrent request created the row between our
# check and create. Re-read instead of crashing.
existing = await get_library_agent_by_graph_id(
user_id, graph_model.id, graph_model.version
# Already exists — update to restore if previously soft-deleted/archived
added_agent = await prisma.models.LibraryAgent.prisma().update(
where={
"userId_agentGraphId_agentGraphVersion": {
"userId": user_id,
"agentGraphId": graph_model.id,
"agentGraphVersion": graph_model.version,
}
},
data={
"isDeleted": False,
"isArchived": False,
"settings": settings_json,
},
include=_include,
)
if existing:
return existing
raise # Shouldn't happen, but don't swallow unexpected errors
if added_agent is None:
raise NotFoundError(
f"LibraryAgent for graph #{graph_model.id} "
f"v{graph_model.version} not found after UniqueViolationError"
)
logger.debug(
f"Added graph #{graph_model.id} v{graph_model.version} "

View File

@@ -1,71 +1,80 @@
from unittest.mock import AsyncMock, MagicMock, patch
import prisma.errors
import pytest
from ._add_to_library import add_graph_to_library
@pytest.mark.asyncio
async def test_add_graph_to_library_restores_archived_agent() -> None:
graph_model = MagicMock(id="graph-id", version=2)
archived_agent = MagicMock(id="library-agent-id", isDeleted=False, isArchived=True)
restored_agent = MagicMock(name="LibraryAgentModel")
async def test_add_graph_to_library_create_new_agent() -> None:
"""When no matching LibraryAgent exists, create inserts a new one."""
graph_model = MagicMock(id="graph-id", version=2, nodes=[])
created_agent = MagicMock(name="CreatedLibraryAgent")
converted_agent = MagicMock(name="ConvertedLibraryAgent")
with (
patch(
"backend.api.features.library._add_to_library.get_library_agent_by_graph_id",
new=AsyncMock(return_value=None),
),
patch(
"backend.api.features.library._add_to_library.prisma.models.LibraryAgent.prisma"
) as mock_prisma,
patch(
"backend.api.features.library._add_to_library.update_library_agent",
new=AsyncMock(return_value=restored_agent),
) as mock_update,
"backend.api.features.library._add_to_library.library_model.LibraryAgent.from_db",
return_value=converted_agent,
) as mock_from_db,
):
mock_prisma.return_value.find_unique = AsyncMock(return_value=archived_agent)
mock_prisma.return_value.create = AsyncMock(return_value=created_agent)
result = await add_graph_to_library("slv-id", graph_model, "user-id")
assert result is restored_agent
mock_update.assert_awaited_once_with(
"library-agent-id",
"user-id",
is_deleted=False,
is_archived=False,
)
mock_prisma.return_value.create.assert_not_called()
assert result is converted_agent
mock_from_db.assert_called_once_with(created_agent)
# Verify create was called with correct data
create_call = mock_prisma.return_value.create.call_args
create_data = create_call.kwargs["data"]
assert create_data["User"] == {"connect": {"id": "user-id"}}
assert create_data["AgentGraph"] == {
"connect": {"graphVersionId": {"id": "graph-id", "version": 2}}
}
assert create_data["isCreatedByUser"] is False
assert create_data["useGraphIsActiveVersion"] is False
@pytest.mark.asyncio
async def test_add_graph_to_library_restores_deleted_agent() -> None:
graph_model = MagicMock(id="graph-id", version=2)
deleted_agent = MagicMock(id="library-agent-id", isDeleted=True, isArchived=False)
restored_agent = MagicMock(name="LibraryAgentModel")
async def test_add_graph_to_library_unique_violation_updates_existing() -> None:
"""UniqueViolationError on create falls back to update."""
graph_model = MagicMock(id="graph-id", version=2, nodes=[])
updated_agent = MagicMock(name="UpdatedLibraryAgent")
converted_agent = MagicMock(name="ConvertedLibraryAgent")
with (
patch(
"backend.api.features.library._add_to_library.get_library_agent_by_graph_id",
new=AsyncMock(return_value=None),
),
patch(
"backend.api.features.library._add_to_library.prisma.models.LibraryAgent.prisma"
) as mock_prisma,
patch(
"backend.api.features.library._add_to_library.update_library_agent",
new=AsyncMock(return_value=restored_agent),
) as mock_update,
"backend.api.features.library._add_to_library.library_model.LibraryAgent.from_db",
return_value=converted_agent,
) as mock_from_db,
):
mock_prisma.return_value.find_unique = AsyncMock(return_value=deleted_agent)
mock_prisma.return_value.create = AsyncMock(
side_effect=prisma.errors.UniqueViolationError(
MagicMock(), message="unique constraint"
)
)
mock_prisma.return_value.update = AsyncMock(return_value=updated_agent)
result = await add_graph_to_library("slv-id", graph_model, "user-id")
assert result is restored_agent
mock_update.assert_awaited_once_with(
"library-agent-id",
"user-id",
is_deleted=False,
is_archived=False,
)
mock_prisma.return_value.create.assert_not_called()
assert result is converted_agent
mock_from_db.assert_called_once_with(updated_agent)
# Verify update was called with correct where and data
update_call = mock_prisma.return_value.update.call_args
assert update_call.kwargs["where"] == {
"userId_agentGraphId_agentGraphVersion": {
"userId": "user-id",
"agentGraphId": "graph-id",
"agentGraphVersion": 2,
}
}
update_data = update_call.kwargs["data"]
assert update_data["isDeleted"] is False
assert update_data["isArchived"] is False


@@ -436,32 +436,58 @@ async def create_library_agent(
async with transaction() as tx:
library_agents = await asyncio.gather(
*(
prisma.models.LibraryAgent.prisma(tx).create(
data=prisma.types.LibraryAgentCreateInput(
isCreatedByUser=(user_id == user_id),
useGraphIsActiveVersion=True,
User={"connect": {"id": user_id}},
AgentGraph={
"connect": {
"graphVersionId": {
"id": graph_entry.id,
"version": graph_entry.version,
prisma.models.LibraryAgent.prisma(tx).upsert(
where={
"userId_agentGraphId_agentGraphVersion": {
"userId": user_id,
"agentGraphId": graph_entry.id,
"agentGraphVersion": graph_entry.version,
}
},
data={
"create": prisma.types.LibraryAgentCreateInput(
isCreatedByUser=(user_id == graph.user_id),
useGraphIsActiveVersion=True,
User={"connect": {"id": user_id}},
AgentGraph={
"connect": {
"graphVersionId": {
"id": graph_entry.id,
"version": graph_entry.version,
}
}
}
},
settings=SafeJson(
GraphSettings.from_graph(
graph_entry,
hitl_safe_mode=hitl_safe_mode,
sensitive_action_safe_mode=sensitive_action_safe_mode,
).model_dump()
),
**(
{"Folder": {"connect": {"id": folder_id}}}
if folder_id and graph_entry is graph
else {}
),
),
"update": {
"isDeleted": False,
"isArchived": False,
"useGraphIsActiveVersion": True,
"settings": SafeJson(
GraphSettings.from_graph(
graph_entry,
hitl_safe_mode=hitl_safe_mode,
sensitive_action_safe_mode=sensitive_action_safe_mode,
).model_dump()
),
**(
{"Folder": {"connect": {"id": folder_id}}}
if folder_id and graph_entry is graph
else {}
),
},
settings=SafeJson(
GraphSettings.from_graph(
graph_entry,
hitl_safe_mode=hitl_safe_mode,
sensitive_action_safe_mode=sensitive_action_safe_mode,
).model_dump()
),
**(
{"Folder": {"connect": {"id": folder_id}}}
if folder_id and graph_entry is graph
else {}
),
),
},
include=library_agent_include(
user_id, include_nodes=False, include_executions=False
),
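The switch from `create` to `upsert` keyed on the composite (user, graph, version) tuple can be illustrated without a database. The following is a minimal in-memory sketch, not the Prisma client: the `upsert` helper and the `table` dict are hypothetical stand-ins that only mirror the create-or-update semantics and the restore of soft-deleted rows.

```python
from __future__ import annotations

# Hypothetical composite key: (user_id, graph_id, graph_version).
Key = tuple[str, str, int]


def upsert(table: dict[Key, dict], key: Key, create: dict, update: dict) -> dict:
    """Insert `create` when the key is absent, otherwise apply `update` in place."""
    row = table.get(key)
    if row is None:
        row = dict(create)
        table[key] = row
    else:
        row.update(update)
    return row


table: dict[Key, dict] = {}
key = ("user-1", "graph-1", 1)
restore = {"isDeleted": False, "isArchived": False}

created = upsert(table, key, create=dict(restore), update=restore)
created["isDeleted"] = True  # simulate a soft delete

# A second add for the same key restores the row instead of failing on a duplicate.
restored = upsert(table, key, create=dict(restore), update=restore)
print(restored["isDeleted"], len(table))
```

With a plain insert, the second call would collide with the existing row; the update branch is what turns a re-add into a restore.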

View File

@@ -1,4 +1,6 @@
from contextlib import asynccontextmanager
from datetime import datetime
from unittest.mock import AsyncMock, MagicMock, patch
import prisma.enums
import prisma.models
@@ -85,10 +87,6 @@ async def test_get_library_agents(mocker):
async def test_add_agent_to_library(mocker):
await connect()
# Mock the transaction context
mock_transaction = mocker.patch("backend.api.features.library.db.transaction")
mock_transaction.return_value.__aenter__ = mocker.AsyncMock(return_value=None)
mock_transaction.return_value.__aexit__ = mocker.AsyncMock(return_value=None)
# Mock data
mock_store_listing_data = prisma.models.StoreListingVersion(
id="version123",
@@ -143,13 +141,11 @@ async def test_add_agent_to_library(mocker):
)
mock_library_agent = mocker.patch("prisma.models.LibraryAgent.prisma")
mock_library_agent.return_value.find_first = mocker.AsyncMock(return_value=None)
mock_library_agent.return_value.find_unique = mocker.AsyncMock(return_value=None)
mock_library_agent.return_value.create = mocker.AsyncMock(
return_value=mock_library_agent_data
)
# Mock graph_db.get_graph function that's called to check for HITL blocks
# Mock graph_db.get_graph function that's called in resolve_graph_for_library
# (lives in _add_to_library.py after refactor, not db.py)
mock_graph_db = mocker.patch(
"backend.api.features.library._add_to_library.graph_db"
@@ -175,37 +171,27 @@ async def test_add_agent_to_library(mocker):
mock_store_listing_version.return_value.find_unique.assert_called_once_with(
where={"id": "version123"}, include={"AgentGraph": True}
)
mock_library_agent.return_value.find_unique.assert_called_once_with(
where={
"userId_agentGraphId_agentGraphVersion": {
"userId": "test-user",
"agentGraphId": "agent1",
"agentGraphVersion": 1,
}
},
)
# Check that create was called with the expected data including settings
create_call_args = mock_library_agent.return_value.create.call_args
assert create_call_args is not None
# Verify the main structure
expected_data = {
# Verify the create data structure
create_data = create_call_args.kwargs["data"]
expected_create = {
"User": {"connect": {"id": "test-user"}},
"AgentGraph": {"connect": {"graphVersionId": {"id": "agent1", "version": 1}}},
"isCreatedByUser": False,
"useGraphIsActiveVersion": False,
}
actual_data = create_call_args[1]["data"]
# Check that all expected fields are present
for key, value in expected_data.items():
assert actual_data[key] == value
for key, value in expected_create.items():
assert create_data[key] == value
# Check that settings field is present and is a SafeJson object
assert "settings" in actual_data
assert hasattr(actual_data["settings"], "__class__") # Should be a SafeJson object
assert "settings" in create_data
assert hasattr(create_data["settings"], "__class__") # Should be a SafeJson object
# Check include parameter
assert create_call_args[1]["include"] == library_agent_include(
assert create_call_args.kwargs["include"] == library_agent_include(
"test-user", include_nodes=False, include_executions=False
)
@@ -320,3 +306,50 @@ async def test_update_graph_in_library_allows_archived_library_agent(mocker):
include_archived=True,
)
mock_update_library_agent.assert_awaited_once_with("test-user", created_graph)
@pytest.mark.asyncio
async def test_create_library_agent_uses_upsert():
"""create_library_agent should use upsert (not create) to handle duplicates."""
mock_graph = MagicMock()
mock_graph.id = "graph-1"
mock_graph.version = 1
mock_graph.user_id = "user-1"
mock_graph.nodes = []
mock_graph.sub_graphs = []
mock_upserted = MagicMock(name="UpsertedLibraryAgent")
@asynccontextmanager
async def fake_tx():
yield None
with (
patch("backend.api.features.library.db.transaction", fake_tx),
patch("prisma.models.LibraryAgent.prisma") as mock_prisma,
patch(
"backend.api.features.library.db.add_generated_agent_image",
new=AsyncMock(),
),
patch(
"backend.api.features.library.model.LibraryAgent.from_db",
return_value=MagicMock(),
),
):
mock_prisma.return_value.upsert = AsyncMock(return_value=mock_upserted)
result = await db.create_library_agent(mock_graph, "user-1")
assert len(result) == 1
upsert_call = mock_prisma.return_value.upsert.call_args
assert upsert_call is not None
# Verify the upsert where clause uses the composite unique key
where = upsert_call.kwargs["where"]
assert "userId_agentGraphId_agentGraphVersion" in where
# Verify the upsert data has both create and update branches
data = upsert_call.kwargs["data"]
assert "create" in data
assert "update" in data
# Verify update branch restores soft-deleted/archived agents
assert data["update"]["isDeleted"] is False
assert data["update"]["isArchived"] is False
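The test above swaps the real transaction for an `asynccontextmanager` that yields `None` and then inspects the keyword arguments of the mocked `upsert`. A self-contained sketch of that pattern follows; `create_agent` is a hypothetical stand-in for the code under test, and only the standard-library mocking calls are real.

```python
import asyncio
from contextlib import asynccontextmanager
from unittest.mock import AsyncMock, MagicMock


@asynccontextmanager
async def fake_tx():
    # Stand-in for a database transaction; the body receives None as `tx`.
    yield None


async def create_agent(transaction, client):
    """Hypothetical code under test: upsert inside a transaction."""
    async with transaction() as tx:
        return await client(tx).upsert(
            where={"userId_agentGraphId_agentGraphVersion": {"userId": "user-1"}},
            data={
                "create": {"isCreatedByUser": True},
                "update": {"isDeleted": False, "isArchived": False},
            },
        )


client = MagicMock()
client.return_value.upsert = AsyncMock(return_value="upserted")

result = asyncio.run(create_agent(fake_tx, client))
upsert_call = client.return_value.upsert.call_args
print(result, sorted(upsert_call.kwargs["data"]))
```

Passing the context manager function itself (rather than a `MagicMock` with hand-written `__aenter__`/`__aexit__`) keeps the fake close to how the real `transaction()` is used.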

View File

@@ -12,6 +12,7 @@ Tests cover:
5. Complete OAuth flow end-to-end
"""
import asyncio
import base64
import hashlib
import secrets
@@ -58,14 +59,27 @@ async def test_user(server, test_user_id: str):
yield test_user_id
# Cleanup - delete in correct order due to foreign key constraints
await PrismaOAuthAccessToken.prisma().delete_many(where={"userId": test_user_id})
await PrismaOAuthRefreshToken.prisma().delete_many(where={"userId": test_user_id})
await PrismaOAuthAuthorizationCode.prisma().delete_many(
where={"userId": test_user_id}
)
await PrismaOAuthApplication.prisma().delete_many(where={"ownerId": test_user_id})
await PrismaUser.prisma().delete(where={"id": test_user_id})
# Cleanup - delete in correct order due to foreign key constraints.
# Wrap in try/except because the event loop or Prisma engine may already
# be closed during session teardown on Python 3.12+.
try:
await asyncio.gather(
PrismaOAuthAccessToken.prisma().delete_many(where={"userId": test_user_id}),
PrismaOAuthRefreshToken.prisma().delete_many(
where={"userId": test_user_id}
),
PrismaOAuthAuthorizationCode.prisma().delete_many(
where={"userId": test_user_id}
),
)
await asyncio.gather(
PrismaOAuthApplication.prisma().delete_many(
where={"ownerId": test_user_id}
),
PrismaUser.prisma().delete(where={"id": test_user_id}),
)
except RuntimeError:
pass
@pytest_asyncio.fixture

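The fixture cleanup above runs the child-table deletes concurrently, then the parent deletes, and tolerates a `RuntimeError` raised during teardown. A minimal sketch of that shape, with hypothetical `delete` coroutines standing in for the Prisma calls:

```python
import asyncio

log: list[str] = []


async def delete(name: str, fail: bool = False) -> None:
    # Hypothetical stand-in for a Prisma delete_many()/delete() call.
    if fail:
        raise RuntimeError("event loop is closed")
    log.append(name)


async def cleanup(fail_children: bool = False) -> None:
    try:
        # Phase 1: rows that reference the user or application.
        await asyncio.gather(
            delete("access_token", fail_children),
            delete("refresh_token"),
            delete("authorization_code"),
        )
        # Phase 2: the rows those children pointed at.
        await asyncio.gather(delete("application"), delete("user"))
    except RuntimeError:
        # Teardown may race with loop or engine shutdown; nothing left to do.
        pass


asyncio.run(cleanup())
print(log)
```

Note that a failure in phase 1 skips phase 2 entirely, so a swallowed error can leave parent rows behind; that trade-off is inherent to the pattern rather than specific to this sketch.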
View File

@@ -189,6 +189,7 @@ async def test_create_store_submission(mocker):
notifyOnAgentApproved=True,
notifyOnAgentRejected=True,
timezone="Europe/Delft",
subscriptionTier=prisma.enums.SubscriptionTier.FREE, # type: ignore[reportCallIssue,reportAttributeAccessIssue]
)
mock_agent = prisma.models.AgentGraph(
id="agent-id",

View File

@@ -18,6 +18,8 @@ from prisma.errors import PrismaError
import backend.api.features.admin.credit_admin_routes
import backend.api.features.admin.execution_analytics_routes
import backend.api.features.admin.platform_cost_routes
import backend.api.features.admin.rate_limit_admin_routes
import backend.api.features.admin.store_admin_routes
import backend.api.features.builder
import backend.api.features.builder.routes
@@ -117,6 +119,11 @@ async def lifespan_context(app: fastapi.FastAPI):
AutoRegistry.patch_integrations()
# Register managed credential providers (e.g. AgentMail)
from backend.integrations.managed_providers import register_all
register_all()
await backend.data.block.initialize_blocks()
await backend.data.user.migrate_and_encrypt_user_integrations()
@@ -318,6 +325,16 @@ app.include_router(
tags=["v2", "admin"],
prefix="/api/executions",
)
app.include_router(
backend.api.features.admin.rate_limit_admin_routes.router,
tags=["v2", "admin"],
prefix="/api/copilot",
)
app.include_router(
backend.api.features.admin.platform_cost_routes.router,
tags=["v2", "admin"],
prefix="/api/platform-costs",
)
app.include_router(
backend.api.features.executions.review.routes.router,
tags=["v2", "executions", "review"],

View File

@@ -698,13 +698,30 @@ class Block(ABC, Generic[BlockSchemaInputType, BlockSchemaOutputType]):
if should_pause:
return
# Validate the input data (original or reviewer-modified) once
if error := self.input_schema.validate_data(input_data):
raise BlockInputError(
message=f"Unable to execute block with invalid input data: {error}",
block_name=self.name,
block_id=self.id,
)
# Validate the input data (original or reviewer-modified) once.
# In dry-run mode, credential fields may contain sentinel None values
# that would fail JSON schema required checks. We still validate the
# non-credential fields so blocks that execute for real during dry-run
# (e.g. AgentExecutorBlock) get proper input validation.
is_dry_run = getattr(kwargs.get("execution_context"), "dry_run", False)
if is_dry_run:
cred_field_names = set(self.input_schema.get_credentials_fields().keys())
non_cred_data = {
k: v for k, v in input_data.items() if k not in cred_field_names
}
if error := self.input_schema.validate_data(non_cred_data):
raise BlockInputError(
message=f"Unable to execute block with invalid input data: {error}",
block_name=self.name,
block_id=self.id,
)
else:
if error := self.input_schema.validate_data(input_data):
raise BlockInputError(
message=f"Unable to execute block with invalid input data: {error}",
block_name=self.name,
block_id=self.id,
)
# Use the validated input data
async for output_name, output_data in self.run(

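The dry-run branch above validates everything except the credential fields, which may hold sentinel `None` values. The toy validator below sketches that idea under stated assumptions: `validate_inputs` is hypothetical, it checks only for missing required values, and it also drops credential names from the required list, which the diff does not show the real `validate_data` doing.

```python
from __future__ import annotations


def validate_inputs(
    input_data: dict,
    required: list[str],
    cred_field_names: set[str],
    dry_run: bool,
) -> str | None:
    """Return an error message, or None when the input passes."""
    if dry_run:
        # Skip credential fields: in dry-run they may be placeholder None values.
        data = {k: v for k, v in input_data.items() if k not in cred_field_names}
        needed = [f for f in required if f not in cred_field_names]
    else:
        data, needed = input_data, list(required)
    missing = [f for f in needed if data.get(f) is None]
    return f"missing required fields: {missing}" if missing else None


required = ["prompt", "credentials"]
creds = {"credentials"}
sample = {"prompt": "hi", "credentials": None}

print(validate_inputs(sample, required, creds, dry_run=True))
print(validate_inputs(sample, required, creds, dry_run=False))
```

Non-credential fields are still checked in dry-run, so a block that runs for real during a dry run does not receive unvalidated input.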
View File

@@ -49,11 +49,17 @@ class AgentExecutorBlock(Block):
@classmethod
def get_missing_input(cls, data: BlockInput) -> set[str]:
required_fields = cls.get_input_schema(data).get("required", [])
return set(required_fields) - set(data)
# Check against the nested `inputs` dict, not the top-level node
# data — required fields like "topic" live inside data["inputs"],
# not at data["topic"].
provided = data.get("inputs", {})
return set(required_fields) - set(provided)
@classmethod
def get_mismatch_error(cls, data: BlockInput) -> str | None:
return validate_with_jsonschema(cls.get_input_schema(data), data)
return validate_with_jsonschema(
cls.get_input_schema(data), data.get("inputs", {})
)
class Output(BlockSchema):
# Use BlockSchema to avoid automatic error field that could clash with graph outputs
@@ -88,6 +94,7 @@ class AgentExecutorBlock(Block):
execution_context=execution_context.model_copy(
update={"parent_execution_id": graph_exec_id},
),
dry_run=execution_context.dry_run,
)
logger = execution_utils.LogMetadata(
@@ -149,14 +156,19 @@ class AgentExecutorBlock(Block):
ExecutionStatus.TERMINATED,
ExecutionStatus.FAILED,
]:
logger.debug(
f"Execution {log_id} received event {event.event_type} with status {event.status}"
logger.info(
f"Execution {log_id} skipping event {event.event_type} status={event.status} "
f"node={getattr(event, 'node_exec_id', '?')}"
)
continue
if event.event_type == ExecutionEventType.GRAPH_EXEC_UPDATE:
# If the graph execution is COMPLETED, TERMINATED, or FAILED,
# we can stop listening for further events.
logger.info(
f"Execution {log_id} graph completed with status {event.status}, "
f"yielded {len(yielded_node_exec_ids)} outputs"
)
self.merge_stats(
NodeExecutionStats(
extra_cost=event.stats.cost if event.stats else 0,

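The `get_missing_input` change earlier in this file compares required fields against the nested `inputs` dict instead of the top-level node data. A small sketch of the difference, using hypothetical helper names and sample data:

```python
def missing_top_level(required: list[str], data: dict) -> set[str]:
    # Old behaviour: compare against the node's top-level keys.
    return set(required) - set(data)


def missing_nested(required: list[str], data: dict) -> set[str]:
    # New behaviour: compare against the keys inside data["inputs"].
    return set(required) - set(data.get("inputs", {}))


# Hypothetical node data: the sub-graph's "topic" input lives under "inputs".
node_data = {"graph_id": "graph-1", "inputs": {"topic": "cats"}}
required_fields = ["topic"]

print(missing_top_level(required_fields, node_data))
print(missing_nested(required_fields, node_data))
```

The top-level check reports `topic` as missing even though it was supplied, which is the false positive the nested lookup avoids.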
View File

@@ -1,3 +1,4 @@
import re
from typing import Any
from backend.blocks._base import (
@@ -19,6 +20,33 @@ from backend.blocks.llm import (
)
from backend.data.model import APIKeyCredentials, NodeExecutionStats, SchemaField
# Minimum max_output_tokens accepted by OpenAI-compatible APIs.
# A true/false answer fits comfortably within this budget.
MIN_LLM_OUTPUT_TOKENS = 16
def _parse_boolean_response(response_text: str) -> tuple[bool, str | None]:
"""Parse an LLM response into a boolean result.
Returns a ``(result, error)`` tuple. *error* is ``None`` when the
response is unambiguous; otherwise it contains a diagnostic message
and *result* defaults to ``False``.
"""
text = response_text.strip().lower()
if text == "true":
return True, None
if text == "false":
return False, None
# Fuzzy match: use word boundaries to avoid false positives like "untrue".
tokens = set(re.findall(r"\b(true|false|yes|no|1|0)\b", text))
if tokens == {"true"} or tokens == {"yes"} or tokens == {"1"}:
return True, None
if tokens == {"false"} or tokens == {"no"} or tokens == {"0"}:
return False, None
return False, f"Unclear AI response: '{response_text}'"
class AIConditionBlock(AIBlockBase):
"""
@@ -162,54 +190,26 @@ class AIConditionBlock(AIBlockBase):
]
# Call the LLM
try:
response = await self.llm_call(
credentials=credentials,
llm_model=input_data.model,
prompt=prompt,
max_tokens=10, # We only expect a true/false response
response = await self.llm_call(
credentials=credentials,
llm_model=input_data.model,
prompt=prompt,
max_tokens=MIN_LLM_OUTPUT_TOKENS,
)
# Extract the boolean result from the response
result, error = _parse_boolean_response(response.response)
if error:
yield "error", error
# Update internal stats
self.merge_stats(
NodeExecutionStats(
input_token_count=response.prompt_tokens,
output_token_count=response.completion_tokens,
)
# Extract the boolean result from the response
response_text = response.response.strip().lower()
if response_text == "true":
result = True
elif response_text == "false":
result = False
else:
# If the response is not clear, try to interpret it using word boundaries
import re
# Use word boundaries to avoid false positives like 'untrue' or '10'
tokens = set(re.findall(r"\b(true|false|yes|no|1|0)\b", response_text))
if tokens == {"true"} or tokens == {"yes"} or tokens == {"1"}:
result = True
elif tokens == {"false"} or tokens == {"no"} or tokens == {"0"}:
result = False
else:
# Unclear or conflicting response - default to False and yield error
result = False
yield "error", f"Unclear AI response: '{response.response}'"
# Update internal stats
self.merge_stats(
NodeExecutionStats(
input_token_count=response.prompt_tokens,
output_token_count=response.completion_tokens,
)
)
self.prompt = response.prompt
except Exception as e:
# In case of any error, default to False to be safe
result = False
# Log the error but don't fail the block execution
import logging
logger = logging.getLogger(__name__)
logger.error(f"AI condition evaluation failed: {str(e)}")
yield "error", f"AI evaluation failed: {str(e)}"
)
self.prompt = response.prompt
# Yield results
yield "result", result
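The parsing rules extracted into `_parse_boolean_response` can be exercised on their own. The sketch below restates the function from this diff under a public name for illustration and runs it on a few sample strings; the samples are made up.

```python
from __future__ import annotations

import re


def parse_boolean_response(response_text: str) -> tuple[bool, str | None]:
    """Return (result, error); error is None when the answer is unambiguous."""
    text = response_text.strip().lower()
    if text == "true":
        return True, None
    if text == "false":
        return False, None
    # Word boundaries keep "untrue" from matching "true".
    tokens = set(re.findall(r"\b(true|false|yes|no|1|0)\b", text))
    if tokens in ({"true"}, {"yes"}, {"1"}):
        return True, None
    if tokens in ({"false"}, {"no"}, {"0"}):
        return False, None
    return False, f"Unclear AI response: '{response_text}'"


for sample in ["True", "  yes. ", "untrue", "true and false", "The answer is no"]:
    print(repr(sample), "->", parse_boolean_response(sample))
```

Mixed affirmative tokens such as "true, yes" also fall through to the unclear branch, because the token set must equal exactly one of the accepted singletons.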

View File

@@ -0,0 +1,147 @@
"""Tests for AIConditionBlock: regression coverage for max_tokens and error propagation."""
from __future__ import annotations
from typing import cast
import pytest
from backend.blocks.ai_condition import (
MIN_LLM_OUTPUT_TOKENS,
AIConditionBlock,
_parse_boolean_response,
)
from backend.blocks.llm import (
DEFAULT_LLM_MODEL,
TEST_CREDENTIALS,
TEST_CREDENTIALS_INPUT,
AICredentials,
LLMResponse,
)
_TEST_AI_CREDENTIALS = cast(AICredentials, TEST_CREDENTIALS_INPUT)
# ---------------------------------------------------------------------------
# Helper to collect all yields from the async generator
# ---------------------------------------------------------------------------
async def _collect_outputs(block: AIConditionBlock, input_data, credentials):
outputs: dict[str, object] = {}
async for name, value in block.run(input_data, credentials=credentials):
outputs[name] = value
return outputs
def _make_input(**overrides) -> AIConditionBlock.Input:
defaults: dict = {
"input_value": "hello@example.com",
"condition": "the input is an email address",
"yes_value": "yes!",
"no_value": "no!",
"model": DEFAULT_LLM_MODEL,
"credentials": TEST_CREDENTIALS_INPUT,
}
defaults.update(overrides)
return AIConditionBlock.Input(**defaults)
def _mock_llm_response(response_text: str) -> LLMResponse:
return LLMResponse(
raw_response="",
prompt=[],
response=response_text,
tool_calls=None,
prompt_tokens=10,
completion_tokens=5,
reasoning=None,
)
# ---------------------------------------------------------------------------
# _parse_boolean_response unit tests
# ---------------------------------------------------------------------------
class TestParseBooleanResponse:
def test_true_exact(self):
assert _parse_boolean_response("true") == (True, None)
def test_false_exact(self):
assert _parse_boolean_response("false") == (False, None)
def test_true_with_whitespace(self):
assert _parse_boolean_response(" True ") == (True, None)
def test_yes_fuzzy(self):
assert _parse_boolean_response("Yes") == (True, None)
def test_no_fuzzy(self):
assert _parse_boolean_response("no") == (False, None)
def test_one_fuzzy(self):
assert _parse_boolean_response("1") == (True, None)
def test_zero_fuzzy(self):
assert _parse_boolean_response("0") == (False, None)
def test_unclear_response(self):
result, error = _parse_boolean_response("I'm not sure")
assert result is False
assert error is not None
assert "Unclear" in error
def test_conflicting_tokens(self):
result, error = _parse_boolean_response("true and false")
assert result is False
assert error is not None
# ---------------------------------------------------------------------------
# Regression: max_tokens is set to MIN_LLM_OUTPUT_TOKENS
# ---------------------------------------------------------------------------
class TestMaxTokensRegression:
@pytest.mark.asyncio
async def test_llm_call_receives_min_output_tokens(self):
"""max_tokens must be MIN_LLM_OUTPUT_TOKENS (16); the previous value
of 10 was too low and caused OpenAI to reject the request."""
block = AIConditionBlock()
captured_kwargs: dict = {}
async def spy_llm_call(**kwargs):
captured_kwargs.update(kwargs)
return _mock_llm_response("true")
block.llm_call = spy_llm_call # type: ignore[assignment]
input_data = _make_input()
await _collect_outputs(block, input_data, credentials=TEST_CREDENTIALS)
assert captured_kwargs["max_tokens"] == MIN_LLM_OUTPUT_TOKENS
assert captured_kwargs["max_tokens"] == 16
# ---------------------------------------------------------------------------
# Regression: exceptions from llm_call must propagate
# ---------------------------------------------------------------------------
class TestExceptionPropagation:
@pytest.mark.asyncio
async def test_llm_call_exception_propagates(self):
"""If llm_call raises, the exception must NOT be swallowed.
Previously the block caught all exceptions and silently returned
result=False."""
block = AIConditionBlock()
async def boom(**kwargs):
raise RuntimeError("LLM provider error")
block.llm_call = boom # type: ignore[assignment]
input_data = _make_input()
with pytest.raises(RuntimeError, match="LLM provider error"):
await _collect_outputs(block, input_data, credentials=TEST_CREDENTIALS)

View File

@@ -18,6 +18,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -358,6 +359,7 @@ class AIShortformVideoCreatorBlock(Block):
execution_context=execution_context,
return_format="for_block_output",
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "video_url", stored_url
@@ -565,6 +567,7 @@ class AIAdMakerVideoCreatorBlock(Block):
execution_context=execution_context,
return_format="for_block_output",
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "video_url", stored_url
@@ -760,4 +763,5 @@ class AIScreenshotToVideoAdBlock(Block):
execution_context=execution_context,
return_format="for_block_output",
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "video_url", stored_url

View File

@@ -17,7 +17,7 @@ from backend.blocks.apollo.models import (
PrimaryPhone,
SearchOrganizationsRequest,
)
from backend.data.model import CredentialsField, SchemaField
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
class SearchOrganizationsBlock(Block):
@@ -218,6 +218,7 @@ To find IDs, identify the values for organization_id when you call this endpoint
) -> BlockOutput:
query = SearchOrganizationsRequest(**input_data.model_dump())
organizations = await self.search_organizations(query, credentials)
self.merge_stats(NodeExecutionStats(output_size=len(organizations)))
for organization in organizations:
yield "organization", organization
yield "organizations", organizations

View File

@@ -21,7 +21,7 @@ from backend.blocks.apollo.models import (
SearchPeopleRequest,
SenorityLevels,
)
from backend.data.model import CredentialsField, SchemaField
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
class SearchPeopleBlock(Block):
@@ -366,4 +366,5 @@ class SearchPeopleBlock(Block):
*(enrich_or_fallback(person) for person in people)
)
self.merge_stats(NodeExecutionStats(output_size=len(people)))
yield "people", people

View File

@@ -13,7 +13,7 @@ from backend.blocks.apollo._auth import (
ApolloCredentialsInput,
)
from backend.blocks.apollo.models import Contact, EnrichPersonRequest
from backend.data.model import CredentialsField, SchemaField
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
class GetPersonDetailBlock(Block):
@@ -141,4 +141,5 @@ class GetPersonDetailBlock(Block):
**kwargs,
) -> BlockOutput:
query = EnrichPersonRequest(**input_data.model_dump())
self.merge_stats(NodeExecutionStats(output_size=1))
yield "contact", await self.enrich_person(query, credentials)

View File

@@ -146,6 +146,21 @@ class AutoPilotBlock(Block):
advanced=True,
)
dry_run: bool = SchemaField(
description=(
"When enabled, run_block and run_agent tool calls in this "
"autopilot session are forced to use dry-run simulation mode. "
"No real API calls, side effects, or credits are consumed "
"by those tools. Useful for testing agent wiring and "
"previewing outputs. "
"Only applies when creating a new session (session_id is empty). "
"When reusing an existing session_id, the session's original "
"dry_run setting is preserved."
),
default=False,
advanced=True,
)
# timeout_seconds removed: the SDK manages its own heartbeat-based
# timeouts internally; wrapping with asyncio.timeout corrupts the
# SDK's internal stream (see service.py CRITICAL comment).
@@ -232,11 +247,11 @@ class AutoPilotBlock(Block):
},
)
async def create_session(self, user_id: str) -> str:
async def create_session(self, user_id: str, *, dry_run: bool) -> str:
"""Create a new chat session and return its ID (mockable for tests)."""
from backend.copilot.model import create_chat_session # avoid circular import
session = await create_chat_session(user_id)
session = await create_chat_session(user_id, dry_run=dry_run)
return session.session_id
async def execute_copilot(
@@ -367,7 +382,9 @@ class AutoPilotBlock(Block):
# even if the downstream stream fails (avoids orphaned sessions).
sid = input_data.session_id
if not sid:
sid = await self.create_session(execution_context.user_id)
sid = await self.create_session(
execution_context.user_id, dry_run=input_data.dry_run
)
# NOTE: No asyncio.timeout() here — the SDK manages its own
# heartbeat-based timeouts internally. Wrapping with asyncio.timeout

View File

@@ -17,6 +17,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -342,6 +343,7 @@ class ExecuteCodeBlock(Block, BaseE2BExecutorMixin):
# Determine result object shape & filter out empty formats
main_result, results = self.process_execution_results(results)
self.merge_stats(NodeExecutionStats(output_size=1))
if main_result:
yield "main_result", main_result
yield "results", results
@@ -467,6 +469,7 @@ class InstantiateCodeSandboxBlock(Block, BaseE2BExecutorMixin):
setup_commands=input_data.setup_commands,
timeout=input_data.timeout,
)
self.merge_stats(NodeExecutionStats(output_size=1))
if sandbox_id:
yield "sandbox_id", sandbox_id
else:
@@ -577,6 +580,7 @@ class ExecuteCodeStepBlock(Block, BaseE2BExecutorMixin):
# Determine result object shape & filter out empty formats
main_result, results = self.process_execution_results(results)
self.merge_stats(NodeExecutionStats(output_size=1))
if main_result:
yield "main_result", main_result
yield "results", results

View File

@@ -73,7 +73,7 @@ class ReadDiscordMessagesBlock(Block):
id="df06086a-d5ac-4abb-9996-2ad0acb2eff7",
input_schema=ReadDiscordMessagesBlock.Input, # Assign input schema
output_schema=ReadDiscordMessagesBlock.Output, # Assign output schema
description="Reads messages from a Discord channel using a bot token.",
description="Reads new messages from a Discord channel using a bot token and triggers when a new message is posted",
categories={BlockCategory.SOCIAL},
test_input={
"continuous_read": False,

View File

@@ -15,7 +15,12 @@ from backend.blocks._base import (
BlockSchemaInput,
BlockSchemaOutput,
)
from backend.data.model import APIKeyCredentials, CredentialsField, SchemaField
from backend.data.model import (
APIKeyCredentials,
CredentialsField,
NodeExecutionStats,
SchemaField,
)
from backend.util.type import MediaFileType
from ._api import (
@@ -195,6 +200,7 @@ class GetLinkedinProfileBlock(Block):
include_social_media=input_data.include_social_media,
include_extra=input_data.include_extra,
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "profile", profile
except Exception as e:
logger.error(f"Error fetching LinkedIn profile: {str(e)}")
@@ -341,6 +347,7 @@ class LinkedinPersonLookupBlock(Block):
include_similarity_checks=input_data.include_similarity_checks,
enrich_profile=input_data.enrich_profile,
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "lookup_result", lookup_result
except Exception as e:
logger.error(f"Error looking up LinkedIn profile: {str(e)}")
@@ -443,6 +450,7 @@ class LinkedinRoleLookupBlock(Block):
company_name=input_data.company_name,
enrich_profile=input_data.enrich_profile,
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "role_lookup_result", role_lookup_result
except Exception as e:
logger.error(f"Error looking up role in company: {str(e)}")
@@ -523,6 +531,7 @@ class GetLinkedinProfilePictureBlock(Block):
credentials=credentials,
linkedin_profile_url=input_data.linkedin_profile_url,
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "profile_picture_url", profile_picture
except Exception as e:
logger.error(f"Error getting profile picture: {str(e)}")

View File

@@ -4,6 +4,7 @@ from typing import Optional
from exa_py import AsyncExa
from pydantic import BaseModel
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -223,3 +224,6 @@ class ExaContentsBlock(Block):
if response.cost_dollars:
yield "cost_dollars", response.cost_dollars
self.merge_stats(
NodeExecutionStats(provider_cost=response.cost_dollars.total)
)

View File

@@ -4,6 +4,7 @@ from typing import Optional
from exa_py import AsyncExa
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -206,3 +207,6 @@ class ExaSearchBlock(Block):
if response.cost_dollars:
yield "cost_dollars", response.cost_dollars
self.merge_stats(
NodeExecutionStats(provider_cost=response.cost_dollars.total)
)

View File

@@ -18,7 +18,7 @@ from backend.blocks.fal._auth import (
FalCredentialsInput,
)
from backend.data.execution import ExecutionContext
from backend.data.model import SchemaField
from backend.data.model import NodeExecutionStats, SchemaField
from backend.util.file import store_media_file
from backend.util.request import ClientResponseError, Requests
from backend.util.type import MediaFileType
@@ -230,6 +230,7 @@ class AIVideoGeneratorBlock(Block):
execution_context=execution_context,
return_format="for_block_output",
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "video_url", stored_url
except Exception as e:
error_message = str(e)

View File

@@ -1,5 +1,6 @@
import asyncio
import base64
import re
from abc import ABC
from email import encoders
from email.mime.base import MIMEBase
@@ -8,7 +9,7 @@ from email.mime.text import MIMEText
from email.policy import SMTP
from email.utils import getaddresses, parseaddr
from pathlib import Path
from typing import List, Literal, Optional
from typing import List, Literal, Optional, Protocol, runtime_checkable
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build
@@ -42,8 +43,52 @@ NO_WRAP_POLICY = SMTP.clone(max_line_length=0)
def serialize_email_recipients(recipients: list[str]) -> str:
"""Serialize recipients list to comma-separated string."""
return ", ".join(recipients)
"""Serialize recipients list to comma-separated string.
Strips leading/trailing whitespace from each address to keep MIME
headers clean (mirrors the strip done in ``validate_email_recipients``).
"""
return ", ".join(addr.strip() for addr in recipients)
# RFC 5322 simplified pattern: local@domain where domain has at least one dot
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
def validate_email_recipients(recipients: list[str], field_name: str = "to") -> None:
"""Validate that all recipients are plausible email addresses.
Raises ``ValueError`` with a user-friendly message listing every
invalid entry so the caller (or LLM) can correct them in one pass.
"""
invalid = [addr for addr in recipients if not _EMAIL_RE.match(addr.strip())]
if invalid:
formatted = ", ".join(f"'{a}'" for a in invalid)
raise ValueError(
f"Invalid email address(es) in '{field_name}': {formatted}. "
f"Each entry must be a valid email address (e.g. user@example.com)."
)
@runtime_checkable
class HasRecipients(Protocol):
to: list[str]
cc: list[str]
bcc: list[str]
def validate_all_recipients(input_data: HasRecipients) -> None:
"""Validate to/cc/bcc recipient fields on an input namespace.
Calls ``validate_email_recipients`` for ``to`` (required) and
``cc``/``bcc`` (when non-empty), raising ``ValueError`` on the
first field that contains an invalid address.
"""
validate_email_recipients(input_data.to, "to")
if input_data.cc:
validate_email_recipients(input_data.cc, "cc")
if input_data.bcc:
validate_email_recipients(input_data.bcc, "bcc")
def _make_mime_text(
@@ -100,14 +145,16 @@ async def create_mime_message(
) -> str:
"""Create a MIME message with attachments and return base64-encoded raw message."""
validate_all_recipients(input_data)
message = MIMEMultipart()
message["to"] = serialize_email_recipients(input_data.to)
message["subject"] = input_data.subject
if input_data.cc:
message["cc"] = ", ".join(input_data.cc)
message["cc"] = serialize_email_recipients(input_data.cc)
if input_data.bcc:
message["bcc"] = ", ".join(input_data.bcc)
message["bcc"] = serialize_email_recipients(input_data.bcc)
# Use the new helper function with content_type if available
content_type = getattr(input_data, "content_type", None)
@@ -1167,13 +1214,15 @@ async def _build_reply_message(
references.append(headers["message-id"])
# Create MIME message
validate_all_recipients(input_data)
msg = MIMEMultipart()
if input_data.to:
msg["To"] = ", ".join(input_data.to)
msg["To"] = serialize_email_recipients(input_data.to)
if input_data.cc:
msg["Cc"] = ", ".join(input_data.cc)
msg["Cc"] = serialize_email_recipients(input_data.cc)
if input_data.bcc:
msg["Bcc"] = ", ".join(input_data.bcc)
msg["Bcc"] = serialize_email_recipients(input_data.bcc)
msg["Subject"] = subject
if headers.get("message-id"):
msg["In-Reply-To"] = headers["message-id"]
@@ -1685,13 +1734,16 @@ To: {original_to}
else:
body = f"{forward_header}\n\n{original_body}"
# Validate all recipient lists before building the MIME message
validate_all_recipients(input_data)
# Create MIME message
msg = MIMEMultipart()
msg["To"] = ", ".join(input_data.to)
msg["To"] = serialize_email_recipients(input_data.to)
if input_data.cc:
msg["Cc"] = ", ".join(input_data.cc)
msg["Cc"] = serialize_email_recipients(input_data.cc)
if input_data.bcc:
msg["Bcc"] = ", ".join(input_data.bcc)
msg["Bcc"] = serialize_email_recipients(input_data.bcc)
msg["Subject"] = subject
# Add body with proper content type
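
The recipient validation introduced in this file can be tried in isolation. The sketch below is only illustrative: the diff references `_EMAIL_RE` but does not show its definition, so the pattern here is an assumption (a deliberately loose "local@domain.tld" check) and may differ from the one in the repository.

```python
import re

# Assumed pattern: the diff uses _EMAIL_RE without showing how it is defined.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_email_recipients(recipients: list[str], field_name: str) -> None:
    """Raise ValueError listing every entry that does not look like an email."""
    invalid = [addr for addr in recipients if not _EMAIL_RE.match(addr.strip())]
    if invalid:
        formatted = ", ".join(f"'{a}'" for a in invalid)
        raise ValueError(
            f"Invalid email address(es) in '{field_name}': {formatted}. "
            "Each entry must be a valid email address (e.g. user@example.com)."
        )


# All invalid entries are reported together, so the caller can fix them in one pass.
try:
    validate_email_recipients(["alice@example.com", "not-an-email", "bob@"], "cc")
except ValueError as exc:
    message = str(exc)
```

Collecting every invalid entry before raising is what lets a caller (or an LLM) correct the whole list at once instead of retrying address by address.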


@@ -14,6 +14,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -117,6 +118,7 @@ class GoogleMapsSearchBlock(Block):
input_data.radius,
input_data.max_results,
)
self.merge_stats(NodeExecutionStats(output_size=len(places)))
for place in places:
yield "place", place


@@ -14,6 +14,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -227,6 +228,7 @@ class IdeogramModelBlock(Block):
image_url=result,
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "result", result
async def run_model(


@@ -2,6 +2,8 @@ import copy
from datetime import date, time
from typing import Any, Optional
from pydantic import AliasChoices, Field
from backend.blocks._base import (
Block,
BlockCategory,
@@ -28,9 +30,9 @@ class AgentInputBlock(Block):
"""
This block is used to provide input to the graph.
It takes in a value, name, description, default values list and bool to limit selection to default values.
It takes in a value, name, and description.
It Outputs the value passed as input.
It outputs the value passed as input.
"""
class Input(BlockSchemaInput):
@@ -47,12 +49,6 @@ class AgentInputBlock(Block):
default=None,
advanced=True,
)
placeholder_values: list = SchemaField(
description="The placeholder values to be passed as input.",
default_factory=list,
advanced=True,
hidden=True,
)
advanced: bool = SchemaField(
description="Whether to show the input in the advanced section, if the field is not required.",
default=False,
@@ -65,10 +61,7 @@ class AgentInputBlock(Block):
)
def generate_schema(self):
schema = copy.deepcopy(self.get_field_schema("value"))
if possible_values := self.placeholder_values:
schema["enum"] = possible_values
return schema
return copy.deepcopy(self.get_field_schema("value"))
class Output(BlockSchema):
# Use BlockSchema to avoid automatic error field for interface definition
@@ -86,18 +79,16 @@ class AgentInputBlock(Block):
"value": "Hello, World!",
"name": "input_1",
"description": "Example test input.",
"placeholder_values": [],
},
{
"value": "Hello, World!",
"value": 42,
"name": "input_2",
"description": "Example test input with placeholders.",
"placeholder_values": ["Hello, World!"],
"description": "Example numeric input.",
},
],
"test_output": [
("result", "Hello, World!"),
("result", "Hello, World!"),
("result", 42),
],
"categories": {BlockCategory.INPUT, BlockCategory.BASIC},
"block_type": BlockType.INPUT,
@@ -245,13 +236,11 @@ class AgentShortTextInputBlock(AgentInputBlock):
"value": "Hello",
"name": "short_text_1",
"description": "Short text example 1",
"placeholder_values": [],
},
{
"value": "Quick test",
"name": "short_text_2",
"description": "Short text example 2",
"placeholder_values": ["Quick test", "Another option"],
},
],
test_output=[
@@ -285,13 +274,11 @@ class AgentLongTextInputBlock(AgentInputBlock):
"value": "Lorem ipsum dolor sit amet...",
"name": "long_text_1",
"description": "Long text example 1",
"placeholder_values": [],
},
{
"value": "Another multiline text input.",
"name": "long_text_2",
"description": "Long text example 2",
"placeholder_values": ["Another multiline text input."],
},
],
test_output=[
@@ -325,13 +312,11 @@ class AgentNumberInputBlock(AgentInputBlock):
"value": 42,
"name": "number_input_1",
"description": "Number example 1",
"placeholder_values": [],
},
{
"value": 314,
"name": "number_input_2",
"description": "Number example 2",
"placeholder_values": [314, 2718],
},
],
test_output=[
@@ -484,7 +469,8 @@ class AgentFileInputBlock(AgentInputBlock):
class AgentDropdownInputBlock(AgentInputBlock):
"""
A specialized text input block that relies on placeholder_values to present a dropdown.
A specialized text input block that presents a dropdown selector
restricted to a fixed set of values.
"""
class Input(AgentInputBlock.Input):
@@ -494,13 +480,26 @@ class AgentDropdownInputBlock(AgentInputBlock):
advanced=False,
title="Default Value",
)
placeholder_values: list = SchemaField(
description="Possible values for the dropdown.",
# Use Field() directly (not SchemaField) to pass validation_alias,
# which handles backward compat for legacy "placeholder_values" across
# all construction paths (model_construct, __init__, model_validate).
options: list = Field(
default_factory=list,
advanced=False,
title="Dropdown Options",
description=(
"If provided, renders the input as a dropdown selector "
"restricted to these values. Leave empty for free-text input."
),
validation_alias=AliasChoices("options", "placeholder_values"),
json_schema_extra={"advanced": False, "secret": False},
)
def generate_schema(self):
schema = super().generate_schema()
if possible_values := self.options:
schema["enum"] = possible_values
return schema
class Output(AgentInputBlock.Output):
result: str = SchemaField(description="Selected dropdown value.")
@@ -515,13 +514,13 @@ class AgentDropdownInputBlock(AgentInputBlock):
{
"value": "Option A",
"name": "dropdown_1",
"placeholder_values": ["Option A", "Option B", "Option C"],
"options": ["Option A", "Option B", "Option C"],
"description": "Dropdown example 1",
},
{
"value": "Option C",
"name": "dropdown_2",
"placeholder_values": ["Option A", "Option B", "Option C"],
"options": ["Option A", "Option B", "Option C"],
"description": "Dropdown example 2",
},
],
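
The effect of moving the `enum` restriction from the base block into the dropdown block can be sketched without the block framework. `BASE_VALUE_SCHEMA` below is a hypothetical stand-in for whatever `get_field_schema("value")` returns; the function mirrors the shape of the new `generate_schema` override but is not the repository code.

```python
import copy
from typing import Any

# Hypothetical stand-in for the JSON schema of the "value" field.
BASE_VALUE_SCHEMA: dict[str, Any] = {"type": "string", "title": "Value"}


def generate_schema(options: list[Any]) -> dict[str, Any]:
    """Return a copy of the base schema, restricted to `options` when non-empty."""
    schema = copy.deepcopy(BASE_VALUE_SCHEMA)
    if options:
        schema["enum"] = list(options)
    return schema


dropdown = generate_schema(["Option A", "Option B"])  # rendered as a dropdown
free_text = generate_schema([])  # no restriction, free-text input
```

The deep copy matters: without it, adding `enum` for one input would leak into the shared base schema.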


@@ -10,7 +10,7 @@ from backend.blocks.jina._auth import (
JinaCredentialsField,
JinaCredentialsInput,
)
from backend.data.model import SchemaField
from backend.data.model import NodeExecutionStats, SchemaField
from backend.util.request import Requests
@@ -45,5 +45,13 @@ class JinaEmbeddingBlock(Block):
}
data = {"input": input_data.texts, "model": input_data.model}
response = await Requests().post(url, headers=headers, json=data)
embeddings = [e["embedding"] for e in response.json()["data"]]
resp_json = response.json()
embeddings = [e["embedding"] for e in resp_json["data"]]
usage = resp_json.get("usage", {})
if usage.get("total_tokens"):
self.merge_stats(
NodeExecutionStats(
input_token_count=usage.get("total_tokens", 0),
)
)
yield "embeddings", embeddings


@@ -104,6 +104,18 @@ class LlmModelMeta(EnumMeta):
class LlmModel(str, Enum, metaclass=LlmModelMeta):
@classmethod
def _missing_(cls, value: object) -> "LlmModel | None":
"""Handle provider-prefixed model names like 'anthropic/claude-sonnet-4-6'."""
if isinstance(value, str) and "/" in value:
stripped = value.split("/", 1)[1]
try:
return cls(stripped)
except ValueError:
return None
return None
# OpenAI models
O3_MINI = "o3-mini"
O3 = "o3-2025-04-16"
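
The `_missing_` hook added above relies on standard `enum` behaviour: it is called only when the normal value lookup fails, and returning `None` makes the lookup raise `ValueError`. A minimal sketch, using a hypothetical two-member `Model` enum in place of `LlmModel` (the real class also uses a custom metaclass, omitted here; the member values are chosen for illustration):

```python
from enum import Enum


class Model(str, Enum):
    """Hypothetical two-member enum standing in for LlmModel."""

    O3_MINI = "o3-mini"
    CLAUDE_SONNET = "claude-sonnet-4-6"

    @classmethod
    def _missing_(cls, value: object) -> "Model | None":
        # Called by Enum only when the direct value lookup fails.
        if isinstance(value, str) and "/" in value:
            stripped = value.split("/", 1)[1]  # drop the "provider/" prefix
            try:
                return cls(stripped)
            except ValueError:
                return None
        return None


prefixed = Model("anthropic/claude-sonnet-4-6")
plain = Model("o3-mini")
```

An unknown name still fails in the usual way, so `Model("openai/unknown")` raises `ValueError` rather than silently mapping to some default.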
@@ -675,6 +687,7 @@ class LLMResponse(BaseModel):
prompt_tokens: int
completion_tokens: int
reasoning: Optional[str] = None
provider_cost: float | None = None
def convert_openai_tool_fmt_to_anthropic(
@@ -712,6 +725,9 @@ def convert_openai_tool_fmt_to_anthropic(
def extract_openai_reasoning(response) -> str | None:
"""Extract reasoning from OpenAI-compatible response if available."""
"""Note: This will likely not work since the reasoning is not present in another Response API"""
if not response.choices:
logger.warning("LLM response has empty choices in extract_openai_reasoning")
return None
reasoning = None
choice = response.choices[0]
if hasattr(choice, "reasoning") and getattr(choice, "reasoning", None):
@@ -727,6 +743,9 @@ def extract_openai_reasoning(response) -> str | None:
def extract_openai_tool_calls(response) -> list[ToolContentBlock] | None:
"""Extract tool calls from OpenAI-compatible response."""
if not response.choices:
logger.warning("LLM response has empty choices in extract_openai_tool_calls")
return None
if response.choices[0].message.tool_calls:
return [
ToolContentBlock(
@@ -960,6 +979,8 @@ async def llm_call(
response_format=response_format, # type: ignore
max_tokens=max_tokens,
)
if not response.choices:
raise ValueError("Groq returned empty choices in response")
return LLMResponse(
raw_response=response.choices[0].message,
prompt=prompt,
@@ -1019,16 +1040,22 @@ async def llm_call(
parallel_tool_calls=parallel_tool_calls_param,
)
# If there's no response, raise an error
if not response.choices:
if response:
raise ValueError(f"OpenRouter error: {response}")
else:
raise ValueError("No response from OpenRouter.")
raise ValueError(f"OpenRouter returned empty choices: {response}")
tool_calls = extract_openai_tool_calls(response)
reasoning = extract_openai_reasoning(response)
cost = None
try:
raw_resp = getattr(response, "_response", None)
if raw_resp and hasattr(raw_resp, "headers"):
cost_header = raw_resp.headers.get("x-total-cost")
if cost_header:
cost = float(cost_header)
except (ValueError, AttributeError):
pass
return LLMResponse(
raw_response=response.choices[0].message,
prompt=prompt,
@@ -1037,6 +1064,7 @@ async def llm_call(
prompt_tokens=response.usage.prompt_tokens if response.usage else 0,
completion_tokens=response.usage.completion_tokens if response.usage else 0,
reasoning=reasoning,
provider_cost=cost,
)
elif provider == "llama_api":
tools_param = tools if tools else openai.NOT_GIVEN
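
The cost extraction in the OpenRouter branch is a best-effort header read. The sketch below isolates that logic in a hypothetical helper, `extract_provider_cost`, and exercises it with fake response objects. Whether the SDK response exposes `_response` and whether the provider sets `x-total-cost` are taken from the diff as-is and are not verified here.

```python
from types import SimpleNamespace
from typing import Optional


def extract_provider_cost(response: object) -> Optional[float]:
    """Best-effort read of an `x-total-cost` header from a wrapped HTTP response."""
    try:
        raw_resp = getattr(response, "_response", None)
        if raw_resp and hasattr(raw_resp, "headers"):
            cost_header = raw_resp.headers.get("x-total-cost")
            if cost_header:
                return float(cost_header)
    except (ValueError, AttributeError):
        pass  # Malformed header or unexpected response shape: report no cost.
    return None


# Fake responses for illustration only; real SDK objects will differ.
with_cost = SimpleNamespace(_response=SimpleNamespace(headers={"x-total-cost": "0.0125"}))
malformed = SimpleNamespace(_response=SimpleNamespace(headers={"x-total-cost": "n/a"}))
no_raw = SimpleNamespace()
```

Returning `None` instead of `0.0` keeps "cost unknown" distinguishable from "cost was zero", which matches the `provider_cost is not None` check used when merging stats.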
@@ -1061,12 +1089,8 @@ async def llm_call(
parallel_tool_calls=parallel_tool_calls_param,
)
# If there's no response, raise an error
if not response.choices:
if response:
raise ValueError(f"Llama API error: {response}")
else:
raise ValueError("No response from Llama API.")
raise ValueError(f"Llama API returned empty choices: {response}")
tool_calls = extract_openai_tool_calls(response)
reasoning = extract_openai_reasoning(response)
@@ -1096,6 +1120,8 @@ async def llm_call(
messages=prompt, # type: ignore
max_tokens=max_tokens,
)
if not completion.choices:
raise ValueError("AI/ML API returned empty choices in response")
return LLMResponse(
raw_response=completion.choices[0].message,
@@ -1132,6 +1158,9 @@ async def llm_call(
parallel_tool_calls=parallel_tool_calls_param,
)
if not response.choices:
raise ValueError(f"v0 API returned empty choices: {response}")
tool_calls = extract_openai_tool_calls(response)
reasoning = extract_openai_reasoning(response)
@@ -1360,12 +1389,13 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
max_tokens=input_data.max_tokens,
)
response_text = llm_response.response
self.merge_stats(
NodeExecutionStats(
input_token_count=llm_response.prompt_tokens,
output_token_count=llm_response.completion_tokens,
)
cost_stats = NodeExecutionStats(
input_token_count=llm_response.prompt_tokens,
output_token_count=llm_response.completion_tokens,
)
if llm_response.provider_cost is not None:
cost_stats.provider_cost = llm_response.provider_cost
self.merge_stats(cost_stats)
logger.debug(f"LLM attempt-{retry_count} response: {response_text}")
if input_data.expected_format:
@@ -1999,6 +2029,19 @@ class AIConversationBlock(AIBlockBase):
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
has_messages = any(
isinstance(m, dict)
and isinstance(m.get("content"), str)
and bool(m["content"].strip())
for m in (input_data.messages or [])
)
has_prompt = bool(input_data.prompt and input_data.prompt.strip())
if not has_messages and not has_prompt:
raise ValueError(
"Cannot call LLM with no messages and no prompt. "
"Provide at least one message or a non-empty prompt."
)
response = await self.llm_call(
AIStructuredResponseGeneratorBlock.Input(
prompt=input_data.prompt,
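
The pre-flight check added to `AIConversationBlock.run` can be read as a single predicate. The helper name `has_usable_input` is hypothetical; the conditions are copied from the hunk above.

```python
from typing import Any, Optional


def has_usable_input(messages: Optional[list[Any]], prompt: Optional[str]) -> bool:
    """True if there is at least one non-blank message or a non-blank prompt."""
    has_messages = any(
        isinstance(m, dict)
        and isinstance(m.get("content"), str)
        and bool(m["content"].strip())
        for m in (messages or [])
    )
    has_prompt = bool(prompt and prompt.strip())
    return has_messages or has_prompt
```

Note that a message whose `content` is not a string (for example a list of content parts) does not count as usable under this check, so such a conversation would need a non-empty prompt to pass.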


@@ -89,6 +89,12 @@ class MCPToolBlock(Block):
default={},
hidden=True,
)
tool_description: str = SchemaField(
description="Description of the selected MCP tool. "
"Populated automatically when a tool is selected.",
default="",
hidden=True,
)
tool_arguments: dict[str, Any] = SchemaField(
description="Arguments to pass to the selected MCP tool. "


@@ -8,6 +8,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -153,6 +154,7 @@ class AddMemoryBlock(Block, Mem0Base):
messages,
**params,
)
self.merge_stats(NodeExecutionStats(output_size=1))
results = result.get("results", [])
yield "results", results
@@ -255,6 +257,7 @@ class SearchMemoryBlock(Block, Mem0Base):
result: list[dict[str, Any]] = client.search(
input_data.query, version="v2", filters=filters
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "memories", result
except Exception as e:
@@ -340,6 +343,7 @@ class GetAllMemoriesBlock(Block, Mem0Base):
filters=filters,
version="v2",
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "memories", memories
@@ -434,6 +438,7 @@ class GetLatestMemoryBlock(Block, Mem0Base):
filters=filters,
version="v2",
)
self.merge_stats(NodeExecutionStats(output_size=1))
if memories:
# Return the latest memory (first in the list as they're sorted by recency)


@@ -10,7 +10,7 @@ from backend.blocks.nvidia._auth import (
NvidiaCredentialsField,
NvidiaCredentialsInput,
)
from backend.data.model import SchemaField
from backend.data.model import NodeExecutionStats, SchemaField
from backend.util.request import Requests
from backend.util.type import MediaFileType
@@ -69,6 +69,7 @@ class NvidiaDeepfakeDetectBlock(Block):
data = response.json()
result = data.get("data", [{}])[0]
self.merge_stats(NodeExecutionStats(output_size=1))
# Get deepfake probability from first bounding box if any
deepfake_prob = 0.0

File diff suppressed because it is too large.


@@ -17,7 +17,12 @@ from backend.blocks.replicate._auth import (
ReplicateCredentialsInput,
)
from backend.blocks.replicate._helper import ReplicateOutputs, extract_result
from backend.data.model import APIKeyCredentials, CredentialsField, SchemaField
from backend.data.model import (
APIKeyCredentials,
CredentialsField,
NodeExecutionStats,
SchemaField,
)
from backend.util.exceptions import BlockExecutionError, BlockInputError
logger = logging.getLogger(__name__)
@@ -108,6 +113,7 @@ class ReplicateModelBlock(Block):
result = await self.run_model(
model_ref, input_data.model_inputs, credentials.api_key
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "result", result
yield "status", "succeeded"
yield "model_name", input_data.model_name


@@ -16,6 +16,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -185,6 +186,7 @@ class ScreenshotWebPageBlock(Block):
block_chats=input_data.block_chats,
cache=input_data.cache,
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "image", screenshot_data["image"]
except Exception as e:
yield "error", str(e)


@@ -15,6 +15,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -146,6 +147,7 @@ class GetWeatherInformationBlock(Block, GetRequest):
weather_data = await self.get_request(url, json=True)
if "main" in weather_data and "weather" in weather_data:
self.merge_stats(NodeExecutionStats(output_size=1))
yield "temperature", str(weather_data["main"]["temp"])
yield "humidity", str(weather_data["main"]["humidity"])
yield "condition", weather_data["weather"][0]["description"]


@@ -23,7 +23,7 @@ from backend.blocks.smartlead.models import (
SaveSequencesResponse,
Sequence,
)
from backend.data.model import CredentialsField, SchemaField
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
class CreateCampaignBlock(Block):
@@ -100,6 +100,7 @@ class CreateCampaignBlock(Block):
**kwargs,
) -> BlockOutput:
response = await self.create_campaign(input_data.name, credentials)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "id", response.id
yield "name", response.name
@@ -226,6 +227,7 @@ class AddLeadToCampaignBlock(Block):
response = await self.add_leads_to_campaign(
input_data.campaign_id, input_data.lead_list, credentials
)
self.merge_stats(NodeExecutionStats(output_size=len(input_data.lead_list)))
yield "campaign_id", input_data.campaign_id
yield "upload_count", response.upload_count
@@ -321,6 +323,7 @@ class SaveCampaignSequencesBlock(Block):
response = await self.save_campaign_sequences(
input_data.campaign_id, input_data.sequences, credentials
)
self.merge_stats(NodeExecutionStats(output_size=1))
if response.data:
yield "data", response.data


@@ -0,0 +1,304 @@
import asyncio
from typing import Any, Literal
from pydantic import SecretStr
from sqlalchemy.engine.url import URL
from sqlalchemy.exc import DBAPIError, OperationalError, ProgrammingError
from backend.blocks._base import (
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
)
from backend.blocks.sql_query_helpers import (
_DATABASE_TYPE_DEFAULT_PORT,
_DATABASE_TYPE_TO_DRIVER,
DatabaseType,
_execute_query,
_sanitize_error,
_validate_query_is_read_only,
_validate_single_statement,
)
from backend.data.model import (
CredentialsField,
CredentialsMetaInput,
SchemaField,
UserPasswordCredentials,
)
from backend.integrations.providers import ProviderName
from backend.util.request import resolve_and_check_blocked
TEST_CREDENTIALS = UserPasswordCredentials(
id="01234567-89ab-cdef-0123-456789abcdef",
provider="database",
username=SecretStr("test_user"),
password=SecretStr("test_pass"),
title="Mock Database credentials",
)
TEST_CREDENTIALS_INPUT = {
"provider": TEST_CREDENTIALS.provider,
"id": TEST_CREDENTIALS.id,
"type": TEST_CREDENTIALS.type,
"title": TEST_CREDENTIALS.title,
}
DatabaseCredentials = UserPasswordCredentials
DatabaseCredentialsInput = CredentialsMetaInput[
Literal[ProviderName.DATABASE],
Literal["user_password"],
]
def DatabaseCredentialsField() -> DatabaseCredentialsInput:
return CredentialsField(
description="Database username and password",
)
class SQLQueryBlock(Block):
class Input(BlockSchemaInput):
database_type: DatabaseType = SchemaField(
default=DatabaseType.POSTGRES,
description="Database engine",
advanced=False,
)
host: SecretStr = SchemaField(
description="Database hostname or IP address",
placeholder="db.example.com",
secret=True,
)
port: int | None = SchemaField(
default=None,
description=(
"Database port (leave empty for default: "
"PostgreSQL: 5432, MySQL: 3306, MSSQL: 1433)"
),
ge=1,
le=65535,
)
database: str = SchemaField(
description="Name of the database to connect to",
placeholder="my_database",
)
query: str = SchemaField(
description="SQL query to execute",
placeholder="SELECT * FROM analytics.daily_active_users LIMIT 10",
)
read_only: bool = SchemaField(
default=True,
description=(
"When enabled (default), only SELECT queries are allowed "
"and the database session is set to read-only mode. "
"Disable to allow write operations (INSERT, UPDATE, DELETE, etc.)."
),
)
timeout: int = SchemaField(
default=30,
description="Query timeout in seconds (max 120)",
ge=1,
le=120,
)
max_rows: int = SchemaField(
default=1000,
description="Maximum number of rows to return (max 10000)",
ge=1,
le=10000,
)
credentials: DatabaseCredentialsInput = DatabaseCredentialsField()
class Output(BlockSchemaOutput):
results: list[dict[str, Any]] = SchemaField(
description="Query results as a list of row dictionaries"
)
columns: list[str] = SchemaField(
description="Column names from the query result"
)
row_count: int = SchemaField(description="Number of rows returned")
affected_rows: int = SchemaField(
description="Number of rows affected by a write query (INSERT/UPDATE/DELETE)"
)
error: str = SchemaField(description="Error message if the query failed")
def __init__(self):
super().__init__(
id="4dc35c0f-4fd8-465e-9616-5a216f1ba2bc",
description=(
"Execute a SQL query. Read-only by default for safety "
"-- disable to allow write operations. "
"Supports PostgreSQL, MySQL, and MSSQL via SQLAlchemy."
),
categories={BlockCategory.DATA},
input_schema=SQLQueryBlock.Input,
output_schema=SQLQueryBlock.Output,
test_input={
"query": "SELECT 1 AS test_col",
"database_type": DatabaseType.POSTGRES,
"host": "localhost",
"database": "test_db",
"timeout": 30,
"max_rows": 1000,
"credentials": TEST_CREDENTIALS_INPUT,
},
test_credentials=TEST_CREDENTIALS,
test_output=[
("results", [{"test_col": 1}]),
("columns", ["test_col"]),
("row_count", 1),
],
test_mock={
"execute_query": lambda *_args, **_kwargs: (
[{"test_col": 1}],
["test_col"],
-1,
),
"check_host_allowed": lambda *_args, **_kwargs: ["127.0.0.1"],
},
)
@staticmethod
async def check_host_allowed(host: str) -> list[str]:
"""Validate that the given host is not a private/blocked address.
Returns the list of resolved IP addresses so the caller can pin the
connection to the validated IP (preventing DNS rebinding / TOCTOU).
Raises ValueError or OSError if the host is blocked.
Extracted as a method so it can be mocked during block tests.
"""
return await resolve_and_check_blocked(host)
@staticmethod
def execute_query(
connection_url: URL | str,
query: str,
timeout: int,
max_rows: int,
read_only: bool = True,
database_type: DatabaseType = DatabaseType.POSTGRES,
) -> tuple[list[dict[str, Any]], list[str], int]:
"""Execute a SQL query and return (rows, columns, affected_rows).
Delegates to ``_execute_query`` in ``sql_query_helpers``.
Extracted as a method so it can be mocked during block tests.
"""
return _execute_query(
connection_url=connection_url,
query=query,
timeout=timeout,
max_rows=max_rows,
read_only=read_only,
database_type=database_type,
)
async def run(
self,
input_data: Input,
*,
credentials: DatabaseCredentials,
**_kwargs: Any,
) -> BlockOutput:
# Validate query structure and read-only constraints.
error = self._validate_query(input_data)
if error:
yield "error", error
return
# Validate host and resolve for SSRF protection.
host, pinned_host, error = await self._resolve_host(input_data)
if error:
yield "error", error
return
# Build connection URL and execute.
port = input_data.port or _DATABASE_TYPE_DEFAULT_PORT[input_data.database_type]
username = credentials.username.get_secret_value()
connection_url = URL.create(
drivername=_DATABASE_TYPE_TO_DRIVER[input_data.database_type],
username=username,
password=credentials.password.get_secret_value(),
host=pinned_host,
port=port,
database=input_data.database,
)
conn_str = connection_url.render_as_string(hide_password=True)
db_name = input_data.database
def _sanitize(err: Exception) -> str:
return _sanitize_error(
str(err).strip(),
conn_str,
host=pinned_host,
original_host=host,
username=username,
port=port,
database=db_name,
)
try:
results, columns, affected = await asyncio.to_thread(
self.execute_query,
connection_url=connection_url,
query=input_data.query,
timeout=input_data.timeout,
max_rows=input_data.max_rows,
read_only=input_data.read_only,
database_type=input_data.database_type,
)
yield "results", results
yield "columns", columns
yield "row_count", len(results)
if affected >= 0:
yield "affected_rows", affected
except OperationalError as e:
yield "error", self._classify_operational_error(
_sanitize(e),
input_data.timeout,
)
except ProgrammingError as e:
yield "error", f"SQL error: {_sanitize(e)}"
except DBAPIError as e:
yield "error", f"Database error: {_sanitize(e)}"
except ModuleNotFoundError:
yield "error", (
f"Database driver not available for "
f"{input_data.database_type.value}. "
f"Please contact the platform administrator."
)
@staticmethod
def _validate_query(input_data: "SQLQueryBlock.Input") -> str | None:
"""Validate query structure and read-only constraints."""
stmt_error, parsed_stmt = _validate_single_statement(input_data.query)
if stmt_error:
return stmt_error
assert parsed_stmt is not None
if input_data.read_only:
return _validate_query_is_read_only(parsed_stmt)
return None
async def _resolve_host(
self, input_data: "SQLQueryBlock.Input"
) -> tuple[str, str, str | None]:
"""Validate and resolve the database host. Returns (host, pinned_ip, error)."""
host = input_data.host.get_secret_value().strip()
if not host:
return "", "", "Database host is required."
if host.startswith("/"):
return host, "", "Unix socket connections are not allowed."
try:
resolved_ips = await self.check_host_allowed(host)
except (ValueError, OSError) as e:
return host, "", f"Blocked host: {str(e).strip()}"
return host, resolved_ips[0], None
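
`_resolve_host` delegates the actual SSRF check to `resolve_and_check_blocked`, which is not shown in this diff. As a rough illustration of the general idea only (not the repository's implementation), a check over already-resolved addresses might look like the following; the function name and the exact set of rejected ranges are assumptions.

```python
import ipaddress


def check_resolved_ips(resolved_ips: list[str]) -> list[str]:
    """Reject the host if any resolved address is non-public; otherwise return the IPs."""
    if not resolved_ips:
        raise ValueError("Host did not resolve to any address.")
    for ip in resolved_ips:
        addr = ipaddress.ip_address(ip)
        if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved:
            raise ValueError(f"Address {ip} is not allowed.")
    return resolved_ips


# The caller then connects to the first validated IP rather than the hostname,
# so a second DNS lookup cannot return a different (internal) address.
pinned_ip = check_resolved_ips(["8.8.8.8"])[0]
```

Rejecting the host when any resolved address is blocked, rather than only the first, avoids a hostname that mixes public and internal records slipping through.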
@staticmethod
def _classify_operational_error(sanitized_msg: str, timeout: int) -> str:
"""Classify an already-sanitized OperationalError for user display."""
lower = sanitized_msg.lower()
if "timeout" in lower or "cancel" in lower:
return f"Query timed out after {timeout}s."
if "connect" in lower:
return f"Failed to connect to database: {sanitized_msg}"
return f"Database error: {sanitized_msg}"
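
The classification above is order-sensitive: the timeout check runs before the connection check. A standalone copy of the logic (renamed without the leading underscore, otherwise as in the diff) makes this easy to see:

```python
def classify_operational_error(sanitized_msg: str, timeout: int) -> str:
    """Map an already-sanitized driver message to a user-facing error string."""
    lower = sanitized_msg.lower()
    if "timeout" in lower or "cancel" in lower:
        return f"Query timed out after {timeout}s."
    if "connect" in lower:
        return f"Failed to connect to database: {sanitized_msg}"
    return f"Database error: {sanitized_msg}"


timed_out = classify_operational_error("canceling statement due to statement timeout", 30)
refused = classify_operational_error("could not connect to server", 30)
ambiguous = classify_operational_error("connection timeout expired", 30)
```

A message such as "connection timeout expired" contains both "connect" and "timeout" and is therefore reported as a query timeout, which may or may not be the intended wording for connection-phase timeouts.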

File diff suppressed because it is too large.


@@ -0,0 +1,376 @@
import re
from datetime import date, datetime, time
from decimal import Decimal
from enum import Enum
from typing import Any
import sqlparse
from sqlalchemy import create_engine, text
from sqlalchemy.engine.url import URL
class DatabaseType(str, Enum):
POSTGRES = "postgres"
MYSQL = "mysql"
MSSQL = "mssql"
# Defense-in-depth: reject queries containing data-modifying keywords.
# These are checked against parsed SQL tokens (not raw text) so column names
# and string literals do not cause false positives.
_DISALLOWED_KEYWORDS = {
"INSERT",
"UPDATE",
"DELETE",
"DROP",
"ALTER",
"CREATE",
"TRUNCATE",
"GRANT",
"REVOKE",
"COPY",
"EXECUTE",
"CALL",
"SET",
"RESET",
"DISCARD",
"NOTIFY",
"DO",
}
# Map DatabaseType enum values to the expected SQLAlchemy driver prefix.
_DATABASE_TYPE_TO_DRIVER = {
DatabaseType.POSTGRES: "postgresql",
DatabaseType.MYSQL: "mysql+pymysql",
DatabaseType.MSSQL: "mssql+pymssql",
}
# Default ports for each database type.
_DATABASE_TYPE_DEFAULT_PORT = {
DatabaseType.POSTGRES: 5432,
DatabaseType.MYSQL: 3306,
DatabaseType.MSSQL: 1433,
}
def _sanitize_error(
error_msg: str,
connection_string: str,
*,
host: str = "",
original_host: str = "",
username: str = "",
port: int = 0,
database: str = "",
) -> str:
"""Remove connection string, credentials, and infrastructure details
from error messages so they are safe to expose to the LLM.
Scrubs:
- The full connection string
- URL-embedded credentials (``://user:pass@``)
- ``password=<value>`` key-value pairs
- The database hostname / IP used for the connection
- The original (pre-resolution) hostname provided by the user
- Any IPv4 addresses that appear in the message
- Any bracketed IPv6 addresses (e.g. ``[::1]``, ``[fe80::1%eth0]``)
- The database username
- The database port number
- The database name
"""
sanitized = error_msg.replace(connection_string, "<connection_string>")
sanitized = re.sub(r"password=[^\s&]+", "password=***", sanitized)
sanitized = re.sub(r"://[^@]+@", "://***:***@", sanitized)
# Replace the known host (may be an IP already) before the generic IP pass.
# Also replace the original (pre-DNS-resolution) hostname if it differs.
if original_host and original_host != host:
sanitized = sanitized.replace(original_host, "<host>")
if host:
sanitized = sanitized.replace(host, "<host>")
# Replace any remaining IPv4 addresses (e.g. resolved IPs the driver logs)
sanitized = re.sub(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", "<ip>", sanitized)
# Replace bracketed IPv6 addresses (e.g. "[::1]", "[fe80::1%eth0]")
sanitized = re.sub(r"\[[0-9a-fA-F:]+(?:%[^\]]+)?\]", "<ip>", sanitized)
# Replace the database username (handles double-quoted, single-quoted,
# and unquoted formats across PostgreSQL, MySQL, and MSSQL error messages).
if username:
sanitized = re.sub(
r"""for user ["']?""" + re.escape(username) + r"""["']?""",
"for user <user>",
sanitized,
)
# Catch remaining bare occurrences in various quote styles:
# - PostgreSQL: "FATAL: role "myuser" does not exist"
# - MySQL: "Access denied for user 'myuser'@'host'"
# - MSSQL: "Login failed for user 'myuser'"
sanitized = sanitized.replace(f'"{username}"', "<user>")
sanitized = sanitized.replace(f"'{username}'", "<user>")
# Replace the port number (handles "port 5432" and ":5432" formats)
if port:
port_str = re.escape(str(port))
sanitized = re.sub(
r"(?:port |:)" + port_str + r"(?![0-9])",
lambda m: ("port " if m.group().startswith("p") else ":") + "<port>",
sanitized,
)
# Replace the database name to avoid leaking internal infrastructure names.
# Use word-boundary regex to prevent mangling when the database name is a
# common substring (e.g. "test", "data", "on").
if database:
sanitized = re.sub(r"\b" + re.escape(database) + r"\b", "<database>", sanitized)
return sanitized
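
To see what the scrubbing steps do to a typical driver message, here is a reduced sketch covering only the credential, host, IPv4, and port passes. The helper name `scrub` and the sample message are made up for illustration; the regular expressions are the ones from `_sanitize_error` above.

```python
import re


def scrub(error_msg: str, *, host: str = "", port: int = 0) -> str:
    """Simplified subset of the scrubbing steps: credentials, host, IPv4, port."""
    sanitized = re.sub(r"password=[^\s&]+", "password=***", error_msg)
    sanitized = re.sub(r"://[^@]+@", "://***:***@", sanitized)
    if host:
        sanitized = sanitized.replace(host, "<host>")
    sanitized = re.sub(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", "<ip>", sanitized)
    if port:
        port_str = re.escape(str(port))
        sanitized = re.sub(
            # The negative lookahead keeps "5432" from matching inside "54321".
            r"(?:port |:)" + port_str + r"(?![0-9])",
            lambda m: ("port " if m.group().startswith("p") else ":") + "<port>",
            sanitized,
        )
    return sanitized


raw = 'connection to server at "db.internal" (10.1.2.3), port 5432 failed'
clean = scrub(raw, host="db.internal", port=5432)
url_clean = scrub("postgresql://bob:s3cret@db:5432/x", port=5432)
```

The known host is replaced before the generic IPv4 pass so that a host given as an IP address is labelled `<host>` rather than `<ip>`.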
def _extract_keyword_tokens(parsed: sqlparse.sql.Statement) -> list[str]:
"""Extract keyword tokens from a parsed SQL statement.
Uses sqlparse token type classification to collect Keyword/DML/DDL/DCL
tokens. String literals and identifiers have different token types, so
they are naturally excluded from the result.
"""
return [
token.normalized.upper()
for token in parsed.flatten()
if token.ttype
in (
sqlparse.tokens.Keyword,
sqlparse.tokens.Keyword.DML,
sqlparse.tokens.Keyword.DDL,
sqlparse.tokens.Keyword.DCL,
)
]
def _has_disallowed_into(stmt: sqlparse.sql.Statement) -> bool:
"""Check if a statement contains a disallowed ``INTO`` clause.
``SELECT ... INTO @variable`` is a valid read-only MySQL syntax that stores
a query result into a session-scoped user variable. All other forms of
``INTO`` are data-modifying or file-writing and must be blocked:
* ``SELECT ... INTO new_table`` (PostgreSQL / MSSQL creates a table)
* ``SELECT ... INTO OUTFILE`` (MySQL writes to the filesystem)
* ``SELECT ... INTO DUMPFILE`` (MySQL writes to the filesystem)
* ``INSERT INTO ...`` (already blocked by INSERT being in the
disallowed set, but we reject INTO as well for defense-in-depth)
Returns ``True`` if the statement contains a disallowed ``INTO``.
"""
flat = list(stmt.flatten())
for i, token in enumerate(flat):
if not (
token.ttype in (sqlparse.tokens.Keyword,)
and token.normalized.upper() == "INTO"
):
continue
# Look at the first non-whitespace token after INTO.
j = i + 1
while j < len(flat) and flat[j].ttype is sqlparse.tokens.Text.Whitespace:
j += 1
if j >= len(flat):
# INTO at the very end of the statement is malformed, so block it.
return True
next_token = flat[j]
# MySQL user variable: either a single Name starting with "@"
# (e.g. ``@total``) or a bare ``@`` Operator token followed by a Name.
if next_token.ttype is sqlparse.tokens.Name and next_token.value.startswith(
"@"
):
continue
if next_token.ttype is sqlparse.tokens.Operator and next_token.value == "@":
continue
# Everything else (table name, OUTFILE, DUMPFILE, etc.) is disallowed.
return True
return False
def _validate_query_is_read_only(stmt: sqlparse.sql.Statement) -> str | None:
"""Validate that a parsed SQL statement is read-only (SELECT/WITH only).
Accepts an already-parsed statement from ``_validate_single_statement``
to avoid re-parsing. Checks:
1. Statement type must be SELECT (sqlparse classifies WITH...SELECT as SELECT)
2. No disallowed keywords (INSERT, UPDATE, DELETE, DROP, etc.)
3. No disallowed INTO clauses (allows MySQL ``SELECT ... INTO @variable``)
Returns an error message if the query is not read-only, None otherwise.
"""
# sqlparse returns 'SELECT' for SELECT and WITH...SELECT queries
if stmt.get_type() != "SELECT":
return "Only SELECT queries are allowed."
# Defense-in-depth: check parsed keyword tokens for disallowed keywords
for kw in _extract_keyword_tokens(stmt):
# Normalize multi-word tokens (e.g. "SET LOCAL" -> "SET")
base_kw = kw.split()[0] if " " in kw else kw
if base_kw in _DISALLOWED_KEYWORDS:
return f"Disallowed SQL keyword: {kw}"
# Contextual check for INTO: allow MySQL @variable syntax, block everything else
if _has_disallowed_into(stmt):
return "Disallowed SQL keyword: INTO"
return None
def _validate_single_statement(
query: str,
) -> tuple[str | None, sqlparse.sql.Statement | None]:
"""Validate that the query contains exactly one non-empty SQL statement.
Returns (error_message, parsed_statement). If error_message is not None,
the query is invalid and parsed_statement will be None.
"""
stripped = query.strip().rstrip(";").strip()
if not stripped:
return "Query is empty.", None
# Parse the SQL using sqlparse for proper tokenization
statements = sqlparse.parse(stripped)
# Filter out empty statements and comment-only statements
statements = [
s
for s in statements
if s.tokens
and str(s).strip()
and not all(
t.is_whitespace or t.ttype in sqlparse.tokens.Comment for t in s.flatten()
)
]
if not statements:
return "Query is empty.", None
# Reject multiple statements -- prevents injection via semicolons
if len(statements) > 1:
return "Only single statements are allowed.", None
return None, statements[0]
def _serialize_value(value: Any) -> Any:
"""Convert database-specific types to JSON-serializable Python types."""
if isinstance(value, Decimal):
# Use int for whole numbers; use str for fractional to preserve exact
# precision (float would silently round high-precision analytics values).
if value == value.to_integral_value():
return int(value)
return str(value)
if isinstance(value, (datetime, date, time)):
return value.isoformat()
if isinstance(value, memoryview):
return bytes(value).hex()
if isinstance(value, bytes):
return value.hex()
return value
def _configure_session(
conn: Any,
dialect_name: str,
timeout_ms: str,
read_only: bool,
) -> None:
"""Set session-level timeout and read-only mode for the given dialect."""
if dialect_name == "postgresql":
conn.execute(text("SET statement_timeout = " + timeout_ms))
if read_only:
conn.execute(text("SET default_transaction_read_only = ON"))
elif dialect_name == "mysql":
# NOTE: MAX_EXECUTION_TIME only applies to SELECT statements.
# Write queries (INSERT/UPDATE/DELETE) are not bounded by this
# setting; they rely on the database's wait_timeout instead.
conn.execute(text("SET SESSION MAX_EXECUTION_TIME = " + timeout_ms))
if read_only:
conn.execute(text("SET SESSION TRANSACTION READ ONLY"))
elif dialect_name == "mssql":
# MSSQL: SET LOCK_TIMEOUT limits lock-wait time (ms).
# pymssql's connect_args "login_timeout" handles the connection
# timeout, but LOCK_TIMEOUT covers in-query lock waits.
conn.execute(text("SET LOCK_TIMEOUT " + timeout_ms))
# MSSQL lacks a session-level read-only mode like
# PostgreSQL/MySQL. Read-only enforcement is handled by
# the SQL validation layer (_validate_query_is_read_only)
# and the ROLLBACK in the finally block.
def _run_in_transaction(
conn: Any,
dialect_name: str,
query: str,
max_rows: int,
read_only: bool,
) -> tuple[list[dict[str, Any]], list[str], int]:
"""Execute a query inside an explicit transaction, returning results."""
# MSSQL uses T-SQL "BEGIN TRANSACTION"; others use "BEGIN".
begin_stmt = "BEGIN TRANSACTION" if dialect_name == "mssql" else "BEGIN"
conn.execute(text(begin_stmt))
try:
result = conn.execute(text(query))
affected = result.rowcount if not result.returns_rows else -1
columns = list(result.keys()) if result.returns_rows else []
rows = result.fetchmany(max_rows) if result.returns_rows else []
results = [
{col: _serialize_value(val) for col, val in zip(columns, row)}
for row in rows
]
except Exception:
conn.execute(text("ROLLBACK"))
raise
else:
conn.execute(text("ROLLBACK" if read_only else "COMMIT"))
return results, columns, affected
def _execute_query(
connection_url: URL | str,
query: str,
timeout: int,
max_rows: int,
read_only: bool = True,
database_type: DatabaseType = DatabaseType.POSTGRES,
) -> tuple[list[dict[str, Any]], list[str], int]:
"""Execute a SQL query and return (rows, columns, affected_rows).
Uses SQLAlchemy to connect to any supported database.
For SELECT queries, rows are limited to ``max_rows`` via DBAPI fetchmany.
For write queries, affected_rows contains the rowcount from the driver.
When ``read_only`` is True, the database session is set to read-only
mode and the transaction is always rolled back.
"""
# Determine driver-specific connection timeout argument.
# pymssql uses "login_timeout", while PostgreSQL/MySQL use "connect_timeout".
timeout_key = (
"login_timeout" if database_type == DatabaseType.MSSQL else "connect_timeout"
)
engine = create_engine(connection_url, connect_args={timeout_key: 10})
try:
with engine.connect() as conn:
# Use AUTOCOMMIT so SET commands take effect immediately.
conn = conn.execution_options(isolation_level="AUTOCOMMIT")
# Compute timeout in milliseconds. The value is Pydantic-validated
# (ge=1, le=120), but we use int() as defense-in-depth.
# NOTE: SET commands do not support bind parameters in most
# databases, so we use str(int(...)) for safe interpolation.
timeout_ms = str(int(timeout * 1000))
_configure_session(conn, engine.dialect.name, timeout_ms, read_only)
return _run_in_transaction(
conn, engine.dialect.name, query, max_rows, read_only
)
finally:
engine.dispose()

View File

@@ -15,6 +15,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -181,6 +182,7 @@ class CreateTalkingAvatarVideoBlock(Block):
execution_context=execution_context,
return_format="for_block_output",
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "video_url", stored_url
return
elif status_response["status"] == "error":

View File

@@ -4,6 +4,8 @@ import pytest
from backend.blocks import get_blocks
from backend.blocks._base import Block, BlockSchemaInput
from backend.blocks.io import AgentDropdownInputBlock, AgentInputBlock
from backend.data.graph import BaseGraph
from backend.data.model import SchemaField
from backend.util.test import execute_block_test
@@ -279,3 +281,113 @@ class TestAutoCredentialsFieldsValidation:
assert "Duplicate auto_credentials kwarg_name 'credentials'" in str(
exc_info.value
)
def test_agent_input_block_ignores_legacy_placeholder_values():
"""Verify AgentInputBlock.Input.model_construct tolerates extra placeholder_values
for backward compatibility with existing agent JSON."""
legacy_data = {
"name": "url",
"value": "",
"description": "Enter a URL",
"placeholder_values": ["https://example.com"],
}
instance = AgentInputBlock.Input.model_construct(**legacy_data)
schema = instance.generate_schema()
assert (
"enum" not in schema
), "AgentInputBlock should not produce enum from legacy placeholder_values"
def test_dropdown_input_block_produces_enum():
"""Verify AgentDropdownInputBlock.Input.generate_schema() produces enum
using the canonical 'options' field name."""
opts = ["Option A", "Option B"]
instance = AgentDropdownInputBlock.Input.model_construct(
name="choice", value=None, options=opts
)
schema = instance.generate_schema()
assert schema.get("enum") == opts
def test_dropdown_input_block_legacy_placeholder_values_produces_enum():
"""Verify backward compat: passing legacy 'placeholder_values' to
AgentDropdownInputBlock still produces enum via model_construct remap."""
opts = ["Option A", "Option B"]
instance = AgentDropdownInputBlock.Input.model_construct(
name="choice", value=None, placeholder_values=opts
)
schema = instance.generate_schema()
assert (
schema.get("enum") == opts
), "Legacy placeholder_values should be remapped to options"
def test_generate_schema_integration_legacy_placeholder_values():
"""Test the full Graph._generate_schema path with legacy placeholder_values
on AgentInputBlock — verifies no enum leaks through the graph loading path."""
legacy_input_default = {
"name": "url",
"value": "",
"description": "Enter a URL",
"placeholder_values": ["https://example.com"],
}
result = BaseGraph._generate_schema(
(AgentInputBlock.Input, legacy_input_default),
)
url_props = result["properties"]["url"]
assert (
"enum" not in url_props
), "Graph schema should not contain enum from AgentInputBlock placeholder_values"
def test_generate_schema_integration_dropdown_produces_enum():
"""Test the full Graph._generate_schema path with AgentDropdownInputBlock
— verifies enum IS produced for dropdown blocks using canonical field name."""
dropdown_input_default = {
"name": "color",
"value": None,
"options": ["Red", "Green", "Blue"],
}
result = BaseGraph._generate_schema(
(AgentDropdownInputBlock.Input, dropdown_input_default),
)
color_props = result["properties"]["color"]
assert color_props.get("enum") == [
"Red",
"Green",
"Blue",
], "Graph schema should contain enum from AgentDropdownInputBlock"
def test_generate_schema_integration_dropdown_legacy_placeholder_values():
"""Test the full Graph._generate_schema path with AgentDropdownInputBlock
using legacy 'placeholder_values' — verifies backward compat produces enum."""
legacy_dropdown_input_default = {
"name": "color",
"value": None,
"placeholder_values": ["Red", "Green", "Blue"],
}
result = BaseGraph._generate_schema(
(AgentDropdownInputBlock.Input, legacy_dropdown_input_default),
)
color_props = result["properties"]["color"]
assert color_props.get("enum") == [
"Red",
"Green",
"Blue",
], "Legacy placeholder_values should still produce enum via model_construct remap"
def test_dropdown_input_block_init_legacy_placeholder_values():
"""Verify backward compat: constructing AgentDropdownInputBlock.Input via
model_validate with legacy 'placeholder_values' correctly maps to 'options'."""
opts = ["Option A", "Option B"]
instance = AgentDropdownInputBlock.Input.model_validate(
{"name": "choice", "value": None, "placeholder_values": opts}
)
assert (
instance.options == opts
), "Legacy placeholder_values should be remapped to options via model_validate"
schema = instance.generate_schema()
assert schema.get("enum") == opts
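The backward-compatibility tests above expect a legacy `placeholder_values` key to end up as `options` on the dropdown block. A minimal sketch of that kind of remap on a plain dict follows; `remap_legacy_options` is a hypothetical helper written for illustration, and the actual remap inside `AgentDropdownInputBlock.Input` may be implemented differently.

```python
from typing import Any


def remap_legacy_options(data: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: move legacy 'placeholder_values' to 'options'.

    The input dict is not mutated. If 'options' is already present it wins
    and the data is returned unchanged (an assumption of this sketch).
    """
    remapped = dict(data)
    if "placeholder_values" in remapped and "options" not in remapped:
        remapped["options"] = remapped.pop("placeholder_values")
    return remapped


legacy = {"name": "color", "value": None, "placeholder_values": ["Red", "Green"]}
print(remap_legacy_options(legacy))
```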

View File

@@ -207,6 +207,51 @@ class TestXMLParserBlockSecurity:
pass
class TestXMLParserBlockSyntaxErrors:
"""XML syntax errors should raise ValueError (not SyntaxError).
This ensures the base Block.execute() wraps them as BlockExecutionError
(expected / user-caused) instead of BlockUnknownError (unexpected / alerts
Sentry).
"""
async def test_unclosed_tag_raises_value_error(self):
"""Unclosed tags should raise ValueError, not SyntaxError."""
block = XMLParserBlock()
bad_xml = "<root><unclosed>"
with pytest.raises(ValueError, match="Unclosed tag"):
async for _ in block.run(XMLParserBlock.Input(input_xml=bad_xml)):
pass
async def test_unexpected_closing_tag_raises_value_error(self):
"""Extra closing tags should raise ValueError, not SyntaxError."""
block = XMLParserBlock()
bad_xml = "</unexpected>"
with pytest.raises(ValueError):
async for _ in block.run(XMLParserBlock.Input(input_xml=bad_xml)):
pass
async def test_empty_xml_raises_value_error(self):
"""Empty XML input should raise ValueError."""
block = XMLParserBlock()
with pytest.raises(ValueError, match="XML input is empty"):
async for _ in block.run(XMLParserBlock.Input(input_xml="")):
pass
async def test_syntax_error_from_parser_becomes_value_error(self):
"""SyntaxErrors from gravitasml library become ValueError (BlockExecutionError)."""
block = XMLParserBlock()
# Malformed XML that might trigger a SyntaxError from the parser
bad_xml = "<root><child>no closing"
with pytest.raises(ValueError):
async for _ in block.run(XMLParserBlock.Input(input_xml=bad_xml)):
pass
class TestStoreMediaFileSecurity:
"""Test file storage security limits."""

View File

@@ -488,6 +488,154 @@ class TestLLMStatsTracking:
assert outputs["response"] == {"result": "test"}
class TestAIConversationBlockValidation:
"""Test that AIConversationBlock validates inputs before calling the LLM."""
@pytest.mark.asyncio
async def test_empty_messages_and_empty_prompt_raises_error(self):
"""Empty messages with no prompt should raise ValueError, not a cryptic API error."""
block = llm.AIConversationBlock()
input_data = llm.AIConversationBlock.Input(
messages=[],
prompt="",
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
)
with pytest.raises(ValueError, match="no messages and no prompt"):
async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
pass
@pytest.mark.asyncio
async def test_empty_messages_with_prompt_succeeds(self):
"""Empty messages but a non-empty prompt should proceed without error."""
block = llm.AIConversationBlock()
async def mock_llm_call(input_data, credentials):
return {"response": "OK"}
with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
input_data = llm.AIConversationBlock.Input(
messages=[],
prompt="Hello, how are you?",
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
)
outputs = {}
async for name, data in block.run(
input_data, credentials=llm.TEST_CREDENTIALS
):
outputs[name] = data
assert outputs["response"] == "OK"
@pytest.mark.asyncio
async def test_nonempty_messages_with_empty_prompt_succeeds(self):
"""Non-empty messages with no prompt should proceed without error."""
block = llm.AIConversationBlock()
async def mock_llm_call(input_data, credentials):
return {"response": "response from conversation"}
with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
input_data = llm.AIConversationBlock.Input(
messages=[{"role": "user", "content": "Hello"}],
prompt="",
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
)
outputs = {}
async for name, data in block.run(
input_data, credentials=llm.TEST_CREDENTIALS
):
outputs[name] = data
assert outputs["response"] == "response from conversation"
@pytest.mark.asyncio
async def test_messages_with_empty_content_raises_error(self):
"""Messages with empty content strings should be treated as no messages."""
block = llm.AIConversationBlock()
input_data = llm.AIConversationBlock.Input(
messages=[{"role": "user", "content": ""}],
prompt="",
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
)
with pytest.raises(ValueError, match="no messages and no prompt"):
async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
pass
@pytest.mark.asyncio
async def test_messages_with_whitespace_content_raises_error(self):
"""Messages with whitespace-only content should be treated as no messages."""
block = llm.AIConversationBlock()
input_data = llm.AIConversationBlock.Input(
messages=[{"role": "user", "content": " "}],
prompt="",
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
)
with pytest.raises(ValueError, match="no messages and no prompt"):
async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
pass
@pytest.mark.asyncio
async def test_messages_with_none_entry_raises_error(self):
"""Messages list containing None should be treated as no messages."""
block = llm.AIConversationBlock()
input_data = llm.AIConversationBlock.Input(
messages=[None],
prompt="",
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
)
with pytest.raises(ValueError, match="no messages and no prompt"):
async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
pass
@pytest.mark.asyncio
async def test_messages_with_empty_dict_raises_error(self):
"""Messages list containing empty dict should be treated as no messages."""
block = llm.AIConversationBlock()
input_data = llm.AIConversationBlock.Input(
messages=[{}],
prompt="",
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
)
with pytest.raises(ValueError, match="no messages and no prompt"):
async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
pass
@pytest.mark.asyncio
async def test_messages_with_none_content_raises_error(self):
"""Messages with content=None should not crash with AttributeError."""
block = llm.AIConversationBlock()
input_data = llm.AIConversationBlock.Input(
messages=[{"role": "user", "content": None}],
prompt="",
model=llm.DEFAULT_LLM_MODEL,
credentials=_TEST_AI_CREDENTIALS,
)
with pytest.raises(ValueError, match="no messages and no prompt"):
async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
pass
class TestAITextSummarizerValidation:
"""Test that AITextSummarizerBlock validates LLM responses are strings."""
@@ -809,3 +957,33 @@ class TestUserErrorStatusCodeHandling:
mock_warning.assert_called_once()
mock_exception.assert_not_called()
class TestLlmModelMissing:
"""Test that LlmModel handles provider-prefixed model names."""
def test_provider_prefixed_model_resolves(self):
"""Provider-prefixed model string should resolve to the correct enum member."""
assert (
llm.LlmModel("anthropic/claude-sonnet-4-6")
== llm.LlmModel.CLAUDE_4_6_SONNET
)
def test_bare_model_still_works(self):
"""Bare (non-prefixed) model string should still resolve correctly."""
assert llm.LlmModel("claude-sonnet-4-6") == llm.LlmModel.CLAUDE_4_6_SONNET
def test_invalid_prefixed_model_raises(self):
"""Unknown provider-prefixed model string should raise ValueError."""
with pytest.raises(ValueError):
llm.LlmModel("invalid/nonexistent-model")
def test_slash_containing_value_direct_lookup(self):
"""Enum values with '/' (e.g., OpenRouter models) should resolve via direct lookup, not _missing_."""
assert llm.LlmModel("google/gemini-2.5-pro") == llm.LlmModel.GEMINI_2_5_PRO
def test_double_prefixed_slash_model(self):
"""Double-prefixed value should still resolve by stripping first prefix."""
assert (
llm.LlmModel("extra/google/gemini-2.5-pro") == llm.LlmModel.GEMINI_2_5_PRO
)
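The `TestLlmModelMissing` cases imply that a direct value lookup is tried first and that a provider prefix is stripped only when the lookup fails. One way such a fallback could be written with `Enum._missing_` is sketched below. `DemoModel` and its two members are hypothetical stand-ins; the real `LlmModel._missing_` is not shown in this diff and may differ.

```python
from enum import Enum


class DemoModel(str, Enum):
    """Hypothetical stand-in for LlmModel with two illustrative members."""

    CLAUDE_SONNET = "claude-sonnet-4-6"
    GEMINI_PRO = "google/gemini-2.5-pro"

    @classmethod
    def _missing_(cls, value):
        # Called only after the direct value lookup has failed.
        if isinstance(value, str) and "/" in value:
            # Drop the first "provider/" segment and retry the lookup.
            _, _, rest = value.partition("/")
            try:
                return cls(rest)
            except ValueError:
                return None
        return None


print(DemoModel("anthropic/claude-sonnet-4-6"))
print(DemoModel("extra/google/gemini-2.5-pro"))
```

Returning `None` from `_missing_` lets the enum machinery raise its usual `ValueError`, which matches the `test_invalid_prefixed_model_raises` expectation.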

View File

@@ -0,0 +1,87 @@
"""Tests for empty-choices guard in extract_openai_tool_calls() and extract_openai_reasoning()."""
from unittest.mock import MagicMock
from backend.blocks.llm import extract_openai_reasoning, extract_openai_tool_calls
class TestExtractOpenaiToolCallsEmptyChoices:
"""extract_openai_tool_calls() must return None when choices is empty."""
def test_returns_none_for_empty_choices(self):
response = MagicMock()
response.choices = []
assert extract_openai_tool_calls(response) is None
def test_returns_none_for_none_choices(self):
response = MagicMock()
response.choices = None
assert extract_openai_tool_calls(response) is None
def test_returns_tool_calls_when_choices_present(self):
tool = MagicMock()
tool.id = "call_1"
tool.type = "function"
tool.function.name = "my_func"
tool.function.arguments = '{"a": 1}'
message = MagicMock()
message.tool_calls = [tool]
choice = MagicMock()
choice.message = message
response = MagicMock()
response.choices = [choice]
result = extract_openai_tool_calls(response)
assert result is not None
assert len(result) == 1
assert result[0].function.name == "my_func"
def test_returns_none_when_no_tool_calls(self):
message = MagicMock()
message.tool_calls = None
choice = MagicMock()
choice.message = message
response = MagicMock()
response.choices = [choice]
assert extract_openai_tool_calls(response) is None
class TestExtractOpenaiReasoningEmptyChoices:
"""extract_openai_reasoning() must return None when choices is empty."""
def test_returns_none_for_empty_choices(self):
response = MagicMock()
response.choices = []
assert extract_openai_reasoning(response) is None
def test_returns_none_for_none_choices(self):
response = MagicMock()
response.choices = None
assert extract_openai_reasoning(response) is None
def test_returns_reasoning_from_choice(self):
choice = MagicMock()
choice.reasoning = "Step-by-step reasoning"
choice.message = MagicMock(spec=[]) # no 'reasoning' attr on message
response = MagicMock(spec=[]) # no 'reasoning' attr on response
response.choices = [choice]
result = extract_openai_reasoning(response)
assert result == "Step-by-step reasoning"
def test_returns_none_when_no_reasoning(self):
choice = MagicMock(spec=[]) # no 'reasoning' attr
choice.message = MagicMock(spec=[]) # no 'reasoning' attr
response = MagicMock(spec=[]) # no 'reasoning' attr
response.choices = [choice]
result = extract_openai_reasoning(response)
assert result is None
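The empty-choices tests above describe a guard that returns `None` instead of indexing into a missing or empty `choices` list. A minimal sketch of that pattern is shown here; `first_choice_tool_calls` is a hypothetical function written for this example and is not the body of `extract_openai_tool_calls`.

```python
from types import SimpleNamespace
from typing import Any, Optional


def first_choice_tool_calls(response: Any) -> Optional[list]:
    """Hypothetical guard: return tool calls from the first choice, else None."""
    choices = getattr(response, "choices", None)
    if not choices:
        # Covers both choices=None and choices=[].
        return None
    message = getattr(choices[0], "message", None)
    tool_calls = getattr(message, "tool_calls", None)
    return list(tool_calls) if tool_calls else None


# Illustrative response objects built with SimpleNamespace.
empty = SimpleNamespace(choices=[])
call = SimpleNamespace(id="call_1", type="function")
full = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(tool_calls=[call]))]
)
print(first_choice_tool_calls(empty), first_choice_tool_calls(full))
```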

View File

@@ -1074,6 +1074,7 @@ async def test_orchestrator_uses_customized_name_for_blocks():
mock_node.block_id = StoreValueBlock().id
mock_node.metadata = {"customized_name": "My Custom Tool Name"}
mock_node.block = StoreValueBlock()
mock_node.input_default = {}
# Create a mock link
mock_link = MagicMock(spec=Link)
@@ -1105,6 +1106,7 @@ async def test_orchestrator_falls_back_to_block_name():
mock_node.block_id = StoreValueBlock().id
mock_node.metadata = {} # No customized_name
mock_node.block = StoreValueBlock()
mock_node.input_default = {}
# Create a mock link
mock_link = MagicMock(spec=Link)

View File

@@ -0,0 +1,202 @@
"""Tests for ExecutionMode enum and provider validation in the orchestrator.
Covers:
- ExecutionMode enum members exist and have stable values
- EXTENDED_THINKING provider validation (anthropic/open_router allowed, others rejected)
- EXTENDED_THINKING model-name validation (must start with "claude")
"""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from backend.blocks.llm import LlmModel
from backend.blocks.orchestrator import ExecutionMode, OrchestratorBlock
# ---------------------------------------------------------------------------
# ExecutionMode enum integrity
# ---------------------------------------------------------------------------
class TestExecutionModeEnum:
"""Guard against accidental renames or removals of enum members."""
def test_built_in_exists(self):
assert hasattr(ExecutionMode, "BUILT_IN")
assert ExecutionMode.BUILT_IN.value == "built_in"
def test_extended_thinking_exists(self):
assert hasattr(ExecutionMode, "EXTENDED_THINKING")
assert ExecutionMode.EXTENDED_THINKING.value == "extended_thinking"
def test_exactly_two_members(self):
"""If a new mode is added, this test should be updated intentionally."""
assert set(ExecutionMode.__members__.keys()) == {
"BUILT_IN",
"EXTENDED_THINKING",
}
def test_string_enum(self):
"""ExecutionMode is a str enum so it serialises cleanly to JSON."""
assert isinstance(ExecutionMode.BUILT_IN, str)
assert isinstance(ExecutionMode.EXTENDED_THINKING, str)
def test_round_trip_from_value(self):
"""Constructing from the string value should return the same member."""
assert ExecutionMode("built_in") is ExecutionMode.BUILT_IN
assert ExecutionMode("extended_thinking") is ExecutionMode.EXTENDED_THINKING
# ---------------------------------------------------------------------------
# Provider validation (inline in OrchestratorBlock.run)
# ---------------------------------------------------------------------------
def _make_model_stub(provider: str, value: str):
"""Create a lightweight stub that behaves like LlmModel for validation."""
metadata = MagicMock()
metadata.provider = provider
stub = MagicMock()
stub.metadata = metadata
stub.value = value
return stub
class TestExtendedThinkingProviderValidation:
"""The orchestrator rejects EXTENDED_THINKING for non-Anthropic providers."""
def test_anthropic_provider_accepted(self):
"""provider='anthropic' + claude model should not raise."""
model = _make_model_stub("anthropic", "claude-opus-4-6")
provider = model.metadata.provider
model_name = model.value
assert provider in ("anthropic", "open_router")
assert model_name.startswith("claude")
def test_open_router_provider_accepted(self):
"""provider='open_router' + claude model should not raise."""
model = _make_model_stub("open_router", "claude-sonnet-4-6")
provider = model.metadata.provider
model_name = model.value
assert provider in ("anthropic", "open_router")
assert model_name.startswith("claude")
def test_openai_provider_rejected(self):
"""provider='openai' should be rejected for EXTENDED_THINKING."""
model = _make_model_stub("openai", "gpt-4o")
provider = model.metadata.provider
assert provider not in ("anthropic", "open_router")
def test_groq_provider_rejected(self):
model = _make_model_stub("groq", "llama-3.3-70b-versatile")
provider = model.metadata.provider
assert provider not in ("anthropic", "open_router")
def test_non_claude_model_rejected_even_if_anthropic_provider(self):
"""A hypothetical non-Claude model with provider='anthropic' is rejected."""
model = _make_model_stub("anthropic", "not-a-claude-model")
model_name = model.value
assert not model_name.startswith("claude")
def test_real_gpt4o_model_rejected(self):
"""Verify a real LlmModel enum member (GPT4O) fails the provider check."""
model = LlmModel.GPT4O
provider = model.metadata.provider
assert provider not in ("anthropic", "open_router")
def test_real_claude_model_passes(self):
"""Verify a real LlmModel enum member (CLAUDE_4_6_SONNET) passes."""
model = LlmModel.CLAUDE_4_6_SONNET
provider = model.metadata.provider
model_name = model.value
assert provider in ("anthropic", "open_router")
assert model_name.startswith("claude")
# ---------------------------------------------------------------------------
# Integration-style: exercise the validation branch via OrchestratorBlock.run
# ---------------------------------------------------------------------------
def _make_input_data(model, execution_mode=ExecutionMode.EXTENDED_THINKING):
"""Build a minimal MagicMock that satisfies OrchestratorBlock.run's early path."""
inp = MagicMock()
inp.execution_mode = execution_mode
inp.model = model
inp.prompt = "test"
inp.sys_prompt = ""
inp.conversation_history = []
inp.last_tool_output = None
inp.prompt_values = {}
return inp
async def _collect_run_outputs(block, input_data, **kwargs):
"""Exhaust the OrchestratorBlock.run async generator, collecting outputs."""
outputs = []
async for item in block.run(input_data, **kwargs):
outputs.append(item)
return outputs
class TestExtendedThinkingValidationRaisesInBlock:
"""Call OrchestratorBlock.run far enough to trigger the ValueError."""
@pytest.mark.asyncio
async def test_non_anthropic_provider_raises_valueerror(self):
"""EXTENDED_THINKING + openai provider raises ValueError."""
block = OrchestratorBlock()
input_data = _make_input_data(model=LlmModel.GPT4O)
with (
patch.object(
block,
"_create_tool_node_signatures",
new_callable=AsyncMock,
return_value=[],
),
pytest.raises(ValueError, match="Anthropic-compatible"),
):
await _collect_run_outputs(
block,
input_data,
credentials=MagicMock(),
graph_id="g",
node_id="n",
graph_exec_id="ge",
node_exec_id="ne",
user_id="u",
graph_version=1,
execution_context=MagicMock(),
execution_processor=MagicMock(),
)
@pytest.mark.asyncio
async def test_non_claude_model_with_anthropic_provider_raises(self):
"""A model with anthropic provider but non-claude name raises ValueError."""
block = OrchestratorBlock()
fake_model = _make_model_stub("anthropic", "not-a-claude-model")
input_data = _make_input_data(model=fake_model)
with (
patch.object(
block,
"_create_tool_node_signatures",
new_callable=AsyncMock,
return_value=[],
),
pytest.raises(ValueError, match="only supports Claude models"),
):
await _collect_run_outputs(
block,
input_data,
credentials=MagicMock(),
graph_id="g",
node_id="n",
graph_exec_id="ge",
node_exec_id="ne",
user_id="u",
graph_version=1,
execution_context=MagicMock(),
execution_processor=MagicMock(),
)
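The two checks exercised by these tests (provider allow-list, then model-name prefix) can be sketched as a standalone function. `check_extended_thinking` and the full error messages are assumptions made for illustration; only the substrings "Anthropic-compatible" and "only supports Claude models" come from the `pytest.raises(..., match=...)` patterns above, and the real validation lives inline in `OrchestratorBlock.run`.

```python
# Providers the tests treat as acceptable for EXTENDED_THINKING.
ALLOWED_PROVIDERS = ("anthropic", "open_router")


def check_extended_thinking(provider: str, model_name: str) -> None:
    """Hypothetical validator mirroring the two checks the tests describe."""
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(
            f"Extended thinking requires an Anthropic-compatible provider, "
            f"got '{provider}'."
        )
    if not model_name.startswith("claude"):
        raise ValueError(
            f"Extended thinking only supports Claude models, got '{model_name}'."
        )


check_extended_thinking("anthropic", "claude-opus-4-6")  # passes silently
```

Checking the provider first means a non-Anthropic model never reaches the name check, which is why the two integration tests can match on different messages.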

File diff suppressed because it is too large

View File

@@ -13,6 +13,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -104,4 +105,5 @@ class UnrealTextToSpeechBlock(Block):
input_data.text,
input_data.voice_id,
)
self.merge_stats(NodeExecutionStats(output_size=len(input_data.text)))
yield "mp3_url", api_response["OutputUri"]

View File

@@ -44,7 +44,7 @@ class XMLParserBlock(Block):
elif token.type == "TAG_CLOSE":
depth -= 1
if depth < 0:
- raise SyntaxError("Unexpected closing tag in XML input.")
+ raise ValueError("Unexpected closing tag in XML input.")
elif token.type in {"TEXT", "ESCAPE"}:
if depth == 0 and token.value:
raise ValueError(
@@ -53,7 +53,7 @@ class XMLParserBlock(Block):
)
if depth != 0:
- raise SyntaxError("Unclosed tag detected in XML input.")
+ raise ValueError("Unclosed tag detected in XML input.")
if not root_seen:
raise ValueError("XML must include a root element.")
@@ -76,4 +76,7 @@ class XMLParserBlock(Block):
except ValueError as val_e:
raise ValueError(f"Validation error for dict:{val_e}") from val_e
except SyntaxError as syn_e:
- raise SyntaxError(f"Error in input xml syntax: {syn_e}") from syn_e
+ # Raise as ValueError so the base Block.execute() wraps it as
+ # BlockExecutionError (expected user-caused failure) instead of
+ # BlockUnknownError (unexpected platform error that alerts Sentry).
+ raise ValueError(f"Error in input xml syntax: {syn_e}") from syn_e
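The depth tracking that this hunk touches can be illustrated in isolation. The sketch below is a simplified assumption: `Token`, the `"TAG_OPEN"` type name, and `check_balanced` are hypothetical and do not come from the gravitasml library or from `XMLParserBlock`. Only the `"TAG_CLOSE"` type and the three error messages are taken from the diff.

```python
from typing import Iterable, NamedTuple


class Token(NamedTuple):
    """Hypothetical token shape used only for this sketch."""

    type: str
    value: str = ""


def check_balanced(tokens: Iterable[Token]) -> None:
    """Raise ValueError (not SyntaxError) for unbalanced tags."""
    depth = 0
    root_seen = False
    for token in tokens:
        if token.type == "TAG_OPEN":  # assumed name for opening tags
            depth += 1
            root_seen = True
        elif token.type == "TAG_CLOSE":
            depth -= 1
            if depth < 0:
                raise ValueError("Unexpected closing tag in XML input.")
    if depth != 0:
        raise ValueError("Unclosed tag detected in XML input.")
    if not root_seen:
        raise ValueError("XML must include a root element.")


check_balanced([Token("TAG_OPEN", "root"), Token("TAG_CLOSE", "root")])
```

Raising `ValueError` throughout keeps every malformed-input case on the same path, which per the comment in the diff is what lets `Block.execute()` report it as a user-caused failure.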

View File

@@ -19,6 +19,7 @@ from backend.blocks._base import (
from backend.data.model import (
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
UserPasswordCredentials,
)
@@ -170,6 +171,7 @@ class TranscribeYoutubeVideoBlock(Block):
transcript = self.get_transcript(video_id, credentials)
transcript_text = self.format_transcript(transcript=transcript)
self.merge_stats(NodeExecutionStats(output_size=1))
# Only yield after all operations succeed
yield "video_id", video_id
yield "transcript", transcript_text

View File

@@ -21,7 +21,7 @@ from backend.blocks.zerobounce._auth import (
ZeroBounceCredentials,
ZeroBounceCredentialsInput,
)
- from backend.data.model import CredentialsField, SchemaField
+ from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
class Response(BaseModel):
@@ -177,5 +177,6 @@ class ValidateEmailsBlock(Block):
)
response_model = Response(**response.__dict__)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "response", response_model

View File

@@ -9,12 +9,16 @@ shared tool registry as the SDK path.
import asyncio
import logging
import uuid
- from collections.abc import AsyncGenerator
- from typing import Any
+ from collections.abc import AsyncGenerator, Sequence
+ from dataclasses import dataclass, field
+ from functools import partial
+ from typing import Any, cast
import orjson
from langfuse import propagate_attributes
from openai.types.chat import ChatCompletionMessageParam, ChatCompletionToolParam
from backend.copilot.context import set_execution_context
from backend.copilot.model import (
ChatMessage,
ChatSession,
@@ -47,8 +51,24 @@ from backend.copilot.service import (
from backend.copilot.token_tracking import persist_and_record_usage
from backend.copilot.tools import execute_tool, get_available_tools
from backend.copilot.tracking import track_user_message
from backend.copilot.transcript import (
download_transcript,
upload_transcript,
validate_transcript,
)
from backend.copilot.transcript_builder import TranscriptBuilder
from backend.util.exceptions import NotFoundError
- from backend.util.prompt import compress_context
+ from backend.util.prompt import (
+ compress_context,
+ estimate_token_count,
+ estimate_token_count_str,
+ )
from backend.util.tool_call_loop import (
LLMLoopResponse,
LLMToolCall,
ToolCallResult,
tool_call_loop,
)
logger = logging.getLogger(__name__)
@@ -59,6 +79,286 @@ _background_tasks: set[asyncio.Task[Any]] = set()
_MAX_TOOL_ROUNDS = 30
@dataclass
class _BaselineStreamState:
"""Mutable state shared between the tool-call loop callbacks.
Extracted from ``stream_chat_completion_baseline`` so that the callbacks
can be module-level functions instead of deeply nested closures.
"""
pending_events: list[StreamBaseResponse] = field(default_factory=list)
assistant_text: str = ""
text_block_id: str = field(default_factory=lambda: str(uuid.uuid4()))
text_started: bool = False
turn_prompt_tokens: int = 0
turn_completion_tokens: int = 0
async def _baseline_llm_caller(
messages: list[dict[str, Any]],
tools: Sequence[Any],
*,
state: _BaselineStreamState,
) -> LLMLoopResponse:
"""Stream an OpenAI-compatible response and collect results.
Extracted from ``stream_chat_completion_baseline`` for readability.
"""
state.pending_events.append(StreamStartStep())
round_text = ""
try:
client = _get_openai_client()
typed_messages = cast(list[ChatCompletionMessageParam], messages)
if tools:
typed_tools = cast(list[ChatCompletionToolParam], tools)
response = await client.chat.completions.create(
model=config.fast_model,
messages=typed_messages,
tools=typed_tools,
stream=True,
stream_options={"include_usage": True},
)
else:
response = await client.chat.completions.create(
model=config.fast_model,
messages=typed_messages,
stream=True,
stream_options={"include_usage": True},
)
tool_calls_by_index: dict[int, dict[str, str]] = {}
async for chunk in response:
if chunk.usage:
state.turn_prompt_tokens += chunk.usage.prompt_tokens or 0
state.turn_completion_tokens += chunk.usage.completion_tokens or 0
delta = chunk.choices[0].delta if chunk.choices else None
if not delta:
continue
if delta.content:
if not state.text_started:
state.pending_events.append(StreamTextStart(id=state.text_block_id))
state.text_started = True
round_text += delta.content
state.pending_events.append(
StreamTextDelta(id=state.text_block_id, delta=delta.content)
)
if delta.tool_calls:
for tc in delta.tool_calls:
idx = tc.index
if idx not in tool_calls_by_index:
tool_calls_by_index[idx] = {
"id": "",
"name": "",
"arguments": "",
}
entry = tool_calls_by_index[idx]
if tc.id:
entry["id"] = tc.id
if tc.function and tc.function.name:
entry["name"] = tc.function.name
if tc.function and tc.function.arguments:
entry["arguments"] += tc.function.arguments
# Close text block
if state.text_started:
state.pending_events.append(StreamTextEnd(id=state.text_block_id))
state.text_started = False
state.text_block_id = str(uuid.uuid4())
finally:
# Always persist partial text so the session history stays consistent,
# even when the stream is interrupted by an exception.
state.assistant_text += round_text
# Always emit StreamFinishStep to match the StreamStartStep,
# even if an exception occurred during streaming.
state.pending_events.append(StreamFinishStep())
# Convert to shared format
llm_tool_calls = [
LLMToolCall(
id=tc["id"],
name=tc["name"],
arguments=tc["arguments"] or "{}",
)
for tc in tool_calls_by_index.values()
]
return LLMLoopResponse(
response_text=round_text or None,
tool_calls=llm_tool_calls,
raw_response=None, # Not needed for baseline conversation updater
prompt_tokens=0, # Tracked via state accumulators
completion_tokens=0,
)
async def _baseline_tool_executor(
tool_call: LLMToolCall,
tools: Sequence[Any],
*,
state: _BaselineStreamState,
user_id: str | None,
session: ChatSession,
) -> ToolCallResult:
"""Execute a tool via the copilot tool registry.
Extracted from ``stream_chat_completion_baseline`` for readability.
"""
tool_call_id = tool_call.id
tool_name = tool_call.name
raw_args = tool_call.arguments or "{}"
try:
tool_args = orjson.loads(raw_args)
except orjson.JSONDecodeError as parse_err:
parse_error = f"Invalid JSON arguments for tool '{tool_name}': {parse_err}"
logger.warning("[Baseline] %s", parse_error)
state.pending_events.append(
StreamToolOutputAvailable(
toolCallId=tool_call_id,
toolName=tool_name,
output=parse_error,
success=False,
)
)
return ToolCallResult(
tool_call_id=tool_call_id,
tool_name=tool_name,
content=parse_error,
is_error=True,
)
state.pending_events.append(
StreamToolInputStart(toolCallId=tool_call_id, toolName=tool_name)
)
state.pending_events.append(
StreamToolInputAvailable(
toolCallId=tool_call_id,
toolName=tool_name,
input=tool_args,
)
)
try:
result: StreamToolOutputAvailable = await execute_tool(
tool_name=tool_name,
parameters=tool_args,
user_id=user_id,
session=session,
tool_call_id=tool_call_id,
)
state.pending_events.append(result)
tool_output = (
result.output if isinstance(result.output, str) else str(result.output)
)
return ToolCallResult(
tool_call_id=tool_call_id,
tool_name=tool_name,
content=tool_output,
)
except Exception as e:
error_output = f"Tool execution error: {e}"
logger.error(
"[Baseline] Tool %s failed: %s",
tool_name,
error_output,
exc_info=True,
)
state.pending_events.append(
StreamToolOutputAvailable(
toolCallId=tool_call_id,
toolName=tool_name,
output=error_output,
success=False,
)
)
return ToolCallResult(
tool_call_id=tool_call_id,
tool_name=tool_name,
content=error_output,
is_error=True,
)
def _baseline_conversation_updater(
messages: list[dict[str, Any]],
response: LLMLoopResponse,
tool_results: list[ToolCallResult] | None = None,
*,
transcript_builder: TranscriptBuilder,
model: str = "",
) -> None:
"""Update OpenAI message list with assistant response + tool results.
Extracted from ``stream_chat_completion_baseline`` for readability.
"""
if tool_results:
# Build assistant message with tool_calls
assistant_msg: dict[str, Any] = {"role": "assistant"}
if response.response_text:
assistant_msg["content"] = response.response_text
assistant_msg["tool_calls"] = [
{
"id": tc.id,
"type": "function",
"function": {"name": tc.name, "arguments": tc.arguments},
}
for tc in response.tool_calls
]
messages.append(assistant_msg)
# Record assistant message (with tool_calls) to transcript
content_blocks: list[dict[str, Any]] = []
if response.response_text:
content_blocks.append({"type": "text", "text": response.response_text})
for tc in response.tool_calls:
try:
args = orjson.loads(tc.arguments) if tc.arguments else {}
except Exception:
args = {}
content_blocks.append(
{
"type": "tool_use",
"id": tc.id,
"name": tc.name,
"input": args,
}
)
if content_blocks:
transcript_builder.append_assistant(
content_blocks=content_blocks,
model=model,
stop_reason="tool_use",
)
for tr in tool_results:
messages.append(
{
"role": "tool",
"tool_call_id": tr.tool_call_id,
"content": tr.content,
}
)
# Record tool result to transcript AFTER the assistant tool_use
# block to maintain correct Anthropic API ordering:
# assistant(tool_use) → user(tool_result)
transcript_builder.append_tool_result(
tool_use_id=tr.tool_call_id,
content=tr.content,
)
else:
if response.response_text:
messages.append({"role": "assistant", "content": response.response_text})
# Record final text to transcript
transcript_builder.append_assistant(
content_blocks=[{"type": "text", "text": response.response_text}],
model=model,
stop_reason="end_turn",
)
async def _update_title_async(
session_id: str, message: str, user_id: str | None
) -> None:
@@ -85,19 +385,23 @@ async def _compress_session_messages(
msg_dict: dict[str, Any] = {"role": msg.role}
if msg.content:
msg_dict["content"] = msg.content
if msg.tool_calls:
msg_dict["tool_calls"] = msg.tool_calls
if msg.tool_call_id:
msg_dict["tool_call_id"] = msg.tool_call_id
messages_dict.append(msg_dict)
try:
result = await compress_context(
messages=messages_dict,
model=config.model,
model=config.fast_model,
client=_get_openai_client(),
)
except Exception as e:
logger.warning("[Baseline] Context compression with LLM failed: %s", e)
result = await compress_context(
messages=messages_dict,
model=config.model,
model=config.fast_model,
client=None,
)
@@ -111,7 +415,12 @@ async def _compress_session_messages(
result.messages_dropped,
)
return [
ChatMessage(role=m["role"], content=m.get("content"))
ChatMessage(
role=m["role"],
content=m.get("content"),
tool_calls=m.get("tool_calls"),
tool_call_id=m.get("tool_call_id"),
)
for m in result.messages
]
@@ -142,7 +451,8 @@ async def stream_chat_completion_baseline(
f"Session {session_id} not found. Please create a new session first."
)
# Append user message
# Append user message (skip if it's an exact duplicate of the last message,
# e.g. from a network retry)
new_role = "user" if is_user_message else "assistant"
if message and (
len(session.messages) == 0
@@ -161,6 +471,54 @@ async def stream_chat_completion_baseline(
session = await upsert_chat_session(session)
# --- Transcript support (feature parity with SDK path) ---
transcript_builder = TranscriptBuilder()
transcript_covers_prefix = True
if user_id and len(session.messages) > 1:
try:
dl = await download_transcript(user_id, session_id, log_prefix="[Baseline]")
if dl and validate_transcript(dl.content):
# Reject stale transcripts: if msg_count is known and
# doesn't cover the current session, loading it would
# silently drop intermediate turns from the transcript.
session_msg_count = len(session.messages)
if dl.message_count and dl.message_count < session_msg_count - 1:
logger.warning(
"[Baseline] Transcript stale: covers %d of %d messages, skipping",
dl.message_count,
session_msg_count,
)
transcript_covers_prefix = False
else:
transcript_builder.load_previous(
dl.content, log_prefix="[Baseline]"
)
logger.info(
"[Baseline] Loaded transcript: %dB, msg_count=%d",
len(dl.content),
dl.message_count,
)
elif dl:
logger.warning("[Baseline] Downloaded transcript but invalid")
transcript_covers_prefix = False
else:
logger.debug("[Baseline] No transcript available")
transcript_covers_prefix = False
except Exception as e:
logger.warning("[Baseline] Transcript download failed: %s", e)
transcript_covers_prefix = False
# Append user message to transcript.
# Always append when the message is present and is from the user,
# even on duplicate-suppressed retries (is_new_message=False).
# The loaded transcript may be stale (uploaded before the previous
# attempt stored this message), so skipping it would leave the
# transcript without the user turn, creating a malformed
# assistant-after-assistant structure when the LLM reply is added.
if message and is_user_message:
transcript_builder.append_user(content=message)
# Generate title for new sessions
if is_user_message and not session.title:
user_messages = [m for m in session.messages if m.role == "user"]
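Reviewer note: the stale-transcript guard in this hunk reduces to a small pure predicate. The sketch below is an illustration only; `transcript_is_stale` is a hypothetical helper name, and the reading of the `- 1` (the session already contains the user message appended for this turn) is an assumption drawn from the surrounding code, not something the diff states.

```python
def transcript_is_stale(transcript_msg_count: int, session_msg_count: int) -> bool:
    """Return True when a stored transcript covers too few messages to reuse."""
    # A count of 0 (or a missing count) means "unknown"; as in the diff,
    # an unknown count is not treated as stale.
    if not transcript_msg_count:
        return False
    # Assumption: the session already holds this turn's user message, so a
    # transcript covering everything before it has session_msg_count - 1 entries.
    return transcript_msg_count < session_msg_count - 1


# A transcript covering 3 of 6 messages would be skipped; 5 of 6 would be loaded.
stale = transcript_is_stale(3, 6)
fresh = transcript_is_stale(5, 6)
```

When the predicate is true, the diff sets `transcript_covers_prefix = False`, which also suppresses the upload at the end of the turn.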
@@ -193,16 +551,37 @@ async def stream_chat_completion_baseline(
# Compress context if approaching the model's token limit
messages_for_context = await _compress_session_messages(session.messages)
# Build OpenAI message list from session history
# Build OpenAI message list from session history.
# Include tool_calls on assistant messages and tool-role results so the
# model retains full context of what tools were invoked and their outcomes.
openai_messages: list[dict[str, Any]] = [
{"role": "system", "content": system_prompt}
]
for msg in messages_for_context:
if msg.role in ("user", "assistant") and msg.content:
if msg.role == "assistant":
entry: dict[str, Any] = {"role": "assistant"}
if msg.content:
entry["content"] = msg.content
if msg.tool_calls:
entry["tool_calls"] = msg.tool_calls
if msg.content or msg.tool_calls:
openai_messages.append(entry)
elif msg.role == "tool" and msg.tool_call_id:
openai_messages.append(
{
"role": "tool",
"tool_call_id": msg.tool_call_id,
"content": msg.content or "",
}
)
elif msg.role == "user" and msg.content:
openai_messages.append({"role": msg.role, "content": msg.content})
tools = get_available_tools()
# Propagate execution context so tool handlers can read session-level flags.
set_execution_context(user_id, session)
yield StreamStart(messageId=message_id, sessionId=session_id)
# Propagate user/session context to Langfuse so all LLM calls within
@@ -219,191 +598,38 @@ async def stream_chat_completion_baseline(
except Exception:
logger.warning("[Baseline] Langfuse trace context setup failed")
assistant_text = ""
text_block_id = str(uuid.uuid4())
text_started = False
step_open = False
# Token usage accumulators — populated from streaming chunks
turn_prompt_tokens = 0
turn_completion_tokens = 0
_stream_error = False # Track whether an error occurred during streaming
state = _BaselineStreamState()
# Bind extracted module-level callbacks to this request's state/session
# using functools.partial so they satisfy the Protocol signatures.
_bound_llm_caller = partial(_baseline_llm_caller, state=state)
_bound_tool_executor = partial(
_baseline_tool_executor, state=state, user_id=user_id, session=session
)
_bound_conversation_updater = partial(
_baseline_conversation_updater,
transcript_builder=transcript_builder,
model=config.fast_model,
)
try:
for _round in range(_MAX_TOOL_ROUNDS):
# Open a new step for each LLM round
yield StreamStartStep()
step_open = True
loop_result = None
async for loop_result in tool_call_loop(
messages=openai_messages,
tools=tools,
llm_call=_bound_llm_caller,
execute_tool=_bound_tool_executor,
update_conversation=_bound_conversation_updater,
max_iterations=_MAX_TOOL_ROUNDS,
):
# Drain buffered events after each iteration (real-time streaming)
for evt in state.pending_events:
yield evt
state.pending_events.clear()
# Stream a response from the model
create_kwargs: dict[str, Any] = dict(
model=config.model,
messages=openai_messages,
stream=True,
stream_options={"include_usage": True},
)
if tools:
create_kwargs["tools"] = tools
response = await _get_openai_client().chat.completions.create(**create_kwargs) # type: ignore[arg-type] # dynamic kwargs
# Accumulate streamed response (text + tool calls)
round_text = ""
tool_calls_by_index: dict[int, dict[str, str]] = {}
async for chunk in response:
# Capture token usage from the streaming chunk.
# OpenRouter normalises all providers into OpenAI format
# where prompt_tokens already includes cached tokens
# (unlike Anthropic's native API). Use += to sum all
# tool-call rounds since each API call is independent.
# NOTE: stream_options={"include_usage": True} is not
# universally supported — some providers (Mistral, Llama
# via OpenRouter) always return chunk.usage=None. When
# that happens, tokens stay 0 and the tiktoken fallback
# below activates. Fail-open: one round is estimated.
if chunk.usage:
turn_prompt_tokens += chunk.usage.prompt_tokens or 0
turn_completion_tokens += chunk.usage.completion_tokens or 0
delta = chunk.choices[0].delta if chunk.choices else None
if not delta:
continue
# Text content
if delta.content:
if not text_started:
yield StreamTextStart(id=text_block_id)
text_started = True
round_text += delta.content
yield StreamTextDelta(id=text_block_id, delta=delta.content)
# Tool call fragments (streamed incrementally)
if delta.tool_calls:
for tc in delta.tool_calls:
idx = tc.index
if idx not in tool_calls_by_index:
tool_calls_by_index[idx] = {
"id": "",
"name": "",
"arguments": "",
}
entry = tool_calls_by_index[idx]
if tc.id:
entry["id"] = tc.id
if tc.function and tc.function.name:
entry["name"] = tc.function.name
if tc.function and tc.function.arguments:
entry["arguments"] += tc.function.arguments
# Close text block if we had one this round
if text_started:
yield StreamTextEnd(id=text_block_id)
text_started = False
text_block_id = str(uuid.uuid4())
# Accumulate text for session persistence
assistant_text += round_text
# No tool calls -> model is done
if not tool_calls_by_index:
yield StreamFinishStep()
step_open = False
break
# Close step before tool execution
yield StreamFinishStep()
step_open = False
# Append the assistant message with tool_calls to context.
assistant_msg: dict[str, Any] = {"role": "assistant"}
if round_text:
assistant_msg["content"] = round_text
assistant_msg["tool_calls"] = [
{
"id": tc["id"],
"type": "function",
"function": {
"name": tc["name"],
"arguments": tc["arguments"] or "{}",
},
}
for tc in tool_calls_by_index.values()
]
openai_messages.append(assistant_msg)
# Execute each tool call and stream events
for tc in tool_calls_by_index.values():
tool_call_id = tc["id"]
tool_name = tc["name"]
raw_args = tc["arguments"] or "{}"
try:
tool_args = orjson.loads(raw_args)
except orjson.JSONDecodeError as parse_err:
parse_error = (
f"Invalid JSON arguments for tool '{tool_name}': {parse_err}"
)
logger.warning("[Baseline] %s", parse_error)
yield StreamToolOutputAvailable(
toolCallId=tool_call_id,
toolName=tool_name,
output=parse_error,
success=False,
)
openai_messages.append(
{
"role": "tool",
"tool_call_id": tool_call_id,
"content": parse_error,
}
)
continue
yield StreamToolInputStart(toolCallId=tool_call_id, toolName=tool_name)
yield StreamToolInputAvailable(
toolCallId=tool_call_id,
toolName=tool_name,
input=tool_args,
)
# Execute via shared tool registry
try:
result: StreamToolOutputAvailable = await execute_tool(
tool_name=tool_name,
parameters=tool_args,
user_id=user_id,
session=session,
tool_call_id=tool_call_id,
)
yield result
tool_output = (
result.output
if isinstance(result.output, str)
else str(result.output)
)
except Exception as e:
error_output = f"Tool execution error: {e}"
logger.error(
"[Baseline] Tool %s failed: %s",
tool_name,
error_output,
exc_info=True,
)
yield StreamToolOutputAvailable(
toolCallId=tool_call_id,
toolName=tool_name,
output=error_output,
success=False,
)
tool_output = error_output
# Append tool result to context for next round
openai_messages.append(
{
"role": "tool",
"tool_call_id": tool_call_id,
"content": tool_output,
}
)
else:
# for-loop exhausted without break -> tool-round limit hit
if loop_result and not loop_result.finished_naturally:
limit_msg = (
f"Exceeded {_MAX_TOOL_ROUNDS} tool-call rounds "
"without a final response."
@@ -418,11 +644,28 @@ async def stream_chat_completion_baseline(
_stream_error = True
error_msg = str(e) or type(e).__name__
logger.error("[Baseline] Streaming error: %s", error_msg, exc_info=True)
# Close any open text/step before emitting error
if text_started:
yield StreamTextEnd(id=text_block_id)
if step_open:
yield StreamFinishStep()
# Close any open text block. The llm_caller's finally block
# already appended StreamFinishStep to pending_events, so we must
# insert StreamTextEnd *before* StreamFinishStep to preserve the
# protocol ordering:
# StreamStartStep -> StreamTextStart -> ...deltas... ->
# StreamTextEnd -> StreamFinishStep
# Appending (or yielding directly) would place it after
# StreamFinishStep, violating the protocol.
if state.text_started:
# Find the last StreamFinishStep and insert before it.
insert_pos = len(state.pending_events)
for i in range(len(state.pending_events) - 1, -1, -1):
if isinstance(state.pending_events[i], StreamFinishStep):
insert_pos = i
break
state.pending_events.insert(
insert_pos, StreamTextEnd(id=state.text_block_id)
)
# Drain pending events in correct order
for evt in state.pending_events:
yield evt
state.pending_events.clear()
yield StreamError(errorText=error_msg, code="baseline_error")
# Still persist whatever we got
finally:
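Reviewer note: the ordering fix in the error branch (placing `StreamTextEnd` before the trailing `StreamFinishStep`) can be checked in isolation. This is a minimal sketch under stated assumptions: `Event` and `insert_before_last` are hypothetical stand-ins, and event kinds are plain strings rather than the real `StreamBaseResponse` subclasses.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for a stream event; only the kind matters here."""
    kind: str


def insert_before_last(events: list, new: Event, marker_kind: str) -> None:
    """Insert `new` before the last event of `marker_kind`, or append if none exists."""
    insert_pos = len(events)
    # Scan from the end so the most recent marker is the one that is found.
    for i in range(len(events) - 1, -1, -1):
        if events[i].kind == marker_kind:
            insert_pos = i
            break
    events.insert(insert_pos, new)


pending = [
    Event("start_step"),
    Event("text_start"),
    Event("text_delta"),
    Event("finish_step"),  # appended by the caller's finally block
]
insert_before_last(pending, Event("text_end"), "finish_step")
order = [e.kind for e in pending]
```

If no finish marker is buffered, the helper falls back to appending, which matches the `insert_pos = len(state.pending_events)` default in the diff.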
@@ -442,26 +685,21 @@ async def stream_chat_completion_baseline(
# Skip fallback when an error occurred and no output was produced —
# charging rate-limit tokens for completely failed requests is unfair.
if (
turn_prompt_tokens == 0
and turn_completion_tokens == 0
and not (_stream_error and not assistant_text)
state.turn_prompt_tokens == 0
and state.turn_completion_tokens == 0
and not (_stream_error and not state.assistant_text)
):
from backend.util.prompt import (
estimate_token_count,
estimate_token_count_str,
state.turn_prompt_tokens = max(
estimate_token_count(openai_messages, model=config.fast_model), 1
)
turn_prompt_tokens = max(
estimate_token_count(openai_messages, model=config.model), 1
)
turn_completion_tokens = estimate_token_count_str(
assistant_text, model=config.model
state.turn_completion_tokens = estimate_token_count_str(
state.assistant_text, model=config.fast_model
)
logger.info(
"[Baseline] No streaming usage reported; estimated tokens: "
"prompt=%d, completion=%d",
turn_prompt_tokens,
turn_completion_tokens,
state.turn_prompt_tokens,
state.turn_completion_tokens,
)
# Persist token usage to session and record for rate limiting.
@@ -471,31 +709,50 @@ async def stream_chat_completion_baseline(
await persist_and_record_usage(
session=session,
user_id=user_id,
prompt_tokens=turn_prompt_tokens,
completion_tokens=turn_completion_tokens,
prompt_tokens=state.turn_prompt_tokens,
completion_tokens=state.turn_completion_tokens,
log_prefix="[Baseline]",
)
# Persist assistant response
if assistant_text:
if state.assistant_text:
session.messages.append(
ChatMessage(role="assistant", content=assistant_text)
ChatMessage(role="assistant", content=state.assistant_text)
)
try:
await upsert_chat_session(session)
except Exception as persist_err:
logger.error("[Baseline] Failed to persist session: %s", persist_err)
# --- Upload transcript for next-turn continuity ---
if user_id and transcript_covers_prefix:
try:
_transcript_content = transcript_builder.to_jsonl()
if _transcript_content and validate_transcript(_transcript_content):
await asyncio.shield(
upload_transcript(
user_id=user_id,
session_id=session_id,
content=_transcript_content,
message_count=len(session.messages),
log_prefix="[Baseline]",
)
)
else:
logger.debug("[Baseline] No valid transcript to upload")
except Exception as upload_err:
logger.error("[Baseline] Transcript upload failed: %s", upload_err)
# Yield usage and finish AFTER try/finally (not inside finally).
# PEP 525 prohibits yielding from finally in async generators during
# aclose() — doing so raises RuntimeError on client disconnect.
# On GeneratorExit the client is already gone, so unreachable yields
# are harmless; on normal completion they reach the SSE stream.
if turn_prompt_tokens > 0 or turn_completion_tokens > 0:
if state.turn_prompt_tokens > 0 or state.turn_completion_tokens > 0:
yield StreamUsage(
prompt_tokens=turn_prompt_tokens,
completion_tokens=turn_completion_tokens,
total_tokens=turn_prompt_tokens + turn_completion_tokens,
prompt_tokens=state.turn_prompt_tokens,
completion_tokens=state.turn_completion_tokens,
total_tokens=state.turn_prompt_tokens + state.turn_completion_tokens,
)
yield StreamFinish()
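Reviewer note: the per-index accumulation of streamed tool-call fragments in `_baseline_llm_caller` is easy to exercise without a live API. The sketch below assumes a simplified `Fragment` type (hypothetical; the real deltas come from the OpenAI client and nest `name` and `arguments` under `function`) and a hypothetical `merge_fragments` helper.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical, flattened stand-in for one streamed tool-call delta."""
    index: int
    id: str | None = None
    name: str | None = None
    arguments: str | None = None


def merge_fragments(fragments: list[Fragment]) -> list[dict[str, str]]:
    """Merge fragments by index; id and name are overwritten, arguments concatenated."""
    by_index: dict[int, dict[str, str]] = {}
    for frag in fragments:
        entry = by_index.setdefault(
            frag.index, {"id": "", "name": "", "arguments": ""}
        )
        if frag.id:
            entry["id"] = frag.id
        if frag.name:
            entry["name"] = frag.name
        if frag.arguments:
            entry["arguments"] += frag.arguments
    # Mirror the diff: an empty argument string is normalised to "{}".
    return [{**e, "arguments": e["arguments"] or "{}"} for e in by_index.values()]


calls = merge_fragments(
    [
        Fragment(0, id="call_1", name="search"),
        Fragment(0, arguments='{"q":'),
        Fragment(1, id="call_2", name="noop"),
        Fragment(0, arguments=' "cats"}'),
    ]
)
first_args = json.loads(calls[0]["arguments"])
```

Interleaved fragments for different indices are merged independently, so the order in which the provider streams them should not matter as long as each index's argument chunks arrive in order.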


@@ -31,7 +31,7 @@ async def test_baseline_multi_turn(setup_test_user, test_user_id):
if not api_key:
return pytest.skip("OPEN_ROUTER_API_KEY is not set, skipping test")
session = await create_chat_session(test_user_id)
session = await create_chat_session(test_user_id, dry_run=False)
session = await upsert_chat_session(session)
# --- Turn 1: send a message with a unique keyword ---


@@ -14,12 +14,21 @@ class ChatConfig(BaseSettings):
# OpenAI API Configuration
model: str = Field(
default="anthropic/claude-opus-4.6", description="Default model to use"
default="anthropic/claude-opus-4.6",
description="Default model for extended thinking mode",
)
fast_model: str = Field(
default="anthropic/claude-sonnet-4",
description="Model for fast mode (baseline path). Should be faster/cheaper than the default model.",
)
title_model: str = Field(
default="openai/gpt-4o-mini",
description="Model to use for generating session titles (should be fast/cheap)",
)
simulation_model: str = Field(
default="google/gemini-2.5-flash",
description="Model for dry-run block simulation (should be fast/cheap with good JSON output)",
)
api_key: str | None = Field(default=None, description="OpenAI API key")
base_url: str | None = Field(
default=OPENROUTER_BASE_URL,
@@ -77,11 +86,11 @@ class ChatConfig(BaseSettings):
# allows ~70-100 turns/day.
# Checked at the HTTP layer (routes.py) before each turn.
#
# TODO: These are deploy-time constants applied identically to every user.
# If per-user or per-plan limits are needed (e.g., free tier vs paid), these
# must move to the database (e.g., a UserPlan table) and get_usage_status /
# check_rate_limit would look up each user's specific limits instead of
# reading config.daily_token_limit / config.weekly_token_limit.
# These are base limits for the FREE tier. Higher tiers (PRO, BUSINESS,
# ENTERPRISE) multiply these by their tier multiplier (see
# rate_limit.TIER_MULTIPLIERS). User tier is stored in the
# User.subscriptionTier DB column and resolved inside
# get_global_rate_limits().
daily_token_limit: int = Field(
default=2_500_000,
description="Max tokens per day, resets at midnight UTC (0 = unlimited)",
@@ -91,6 +100,20 @@ class ChatConfig(BaseSettings):
description="Max tokens per week, resets Monday 00:00 UTC (0 = unlimited)",
)
# Cost (in credits / cents) to reset the daily rate limit using credits.
# When a user hits their daily limit, they can spend this amount to reset
# the daily counter and keep working. Set to 0 to disable the feature.
rate_limit_reset_cost: int = Field(
default=500,
ge=0,
description="Credit cost (in cents) for resetting the daily rate limit. 0 = disabled.",
)
max_daily_resets: int = Field(
default=5,
ge=0,
description="Maximum number of credit-based rate limit resets per user per day. 0 = unlimited.",
)
# Claude Agent SDK Configuration
use_claude_agent_sdk: bool = Field(
default=True,
@@ -115,6 +138,32 @@ class ChatConfig(BaseSettings):
description="Use --resume for multi-turn conversations instead of "
"history compression. Falls back to compression when unavailable.",
)
claude_agent_fallback_model: str = Field(
default="claude-sonnet-4-20250514",
description="Fallback model when the primary model is unavailable (e.g. 529 "
"overloaded). The SDK automatically retries with this cheaper model.",
)
claude_agent_max_turns: int = Field(
default=50,
ge=1,
le=500,
description="Maximum number of agentic turns (tool-use loops) per query. "
"Prevents runaway tool loops from burning budget.",
)
claude_agent_max_budget_usd: float = Field(
default=5.0,
ge=0.01,
le=100.0,
description="Maximum spend in USD per SDK query. The CLI aborts the "
"request if this budget is exceeded.",
)
claude_agent_max_transient_retries: int = Field(
default=3,
ge=0,
le=10,
description="Maximum number of retries for transient API errors "
"(429, 5xx, ECONNRESET) before surfacing the error to the user.",
)
use_openrouter: bool = Field(
default=True,
description="Enable routing API calls through the OpenRouter proxy. "
@@ -164,7 +213,7 @@ class ChatConfig(BaseSettings):
Single source of truth for "will the SDK route through OpenRouter?".
Checks the flag *and* that ``api_key`` + a valid ``base_url`` are
present — mirrors the fallback logic in ``_build_sdk_env``.
present — mirrors the fallback logic in ``build_sdk_env``.
"""
if not self.use_openrouter:
return False


@@ -44,12 +44,32 @@ def parse_node_id_from_exec_id(node_exec_id: str) -> str:
# Transient Anthropic API error detection
# ---------------------------------------------------------------------------
# Patterns in error text that indicate a transient Anthropic API error
# (ECONNRESET / dropped TCP connection) which is retryable.
# which is retryable. Covers:
# - Connection-level: ECONNRESET, dropped TCP connections
# - HTTP 429: rate-limit / too-many-requests
# - HTTP 5xx: server errors, overloaded
_TRANSIENT_ERROR_PATTERNS = (
# Connection-level
"socket connection was closed unexpectedly",
"ECONNRESET",
"connection was forcibly closed",
"network socket disconnected",
# 429 rate-limit patterns
"rate limit",
"rate_limit",
"too many requests",
"status code 429",
# 5xx server error patterns
"overloaded",
"internal server error",
"bad gateway",
"service unavailable",
"gateway timeout",
"status code 529",
"status code 500",
"status code 502",
"status code 503",
"status code 504",
)
FRIENDLY_TRANSIENT_MSG = "Anthropic connection interrupted — please retry"
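Reviewer note: the diff shows only the pattern tuple, not the function that consults it. The sketch below assumes case-insensitive substring matching; `looks_transient` is a hypothetical name and the pattern list is a shortened copy, so treat this as one plausible reading rather than the actual matcher.

```python
# Shortened copy of the patterns added in this hunk, for illustration only.
PATTERNS = (
    "ECONNRESET",
    "rate limit",
    "status code 429",
    "overloaded",
    "status code 529",
)


def looks_transient(error_text: str, patterns: tuple = PATTERNS) -> bool:
    """Return True if any pattern occurs in the error text, ignoring case (assumed)."""
    lowered = error_text.lower()
    return any(p.lower() in lowered for p in patterns)


hit_conn = looks_transient("Error: read ECONNRESET")
hit_overload = looks_transient("API returned status code 529: Overloaded")
miss = looks_transient("invalid api key")
```

Because the match is on free text, short patterns such as "rate limit" may also fire on messages that merely mention the phrase; whether that is acceptable depends on how the caller uses the result.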


@@ -149,7 +149,8 @@ def is_allowed_local_path(path: str, sdk_cwd: str | None = None) -> bool:
Allowed:
- Files under *sdk_cwd* (``/tmp/copilot-<session>/``)
- Files under ``~/.claude/projects/<encoded-cwd>/<uuid>/tool-results/...``.
- Files under ``~/.claude/projects/<encoded-cwd>/<uuid>/tool-results/...``
or ``tool-outputs/...``.
The SDK nests tool-results under a conversation UUID directory;
the UUID segment is validated with ``_UUID_RE``.
"""
@@ -174,17 +175,20 @@ def is_allowed_local_path(path: str, sdk_cwd: str | None = None) -> bool:
# Defence-in-depth: ensure project_dir didn't escape the base.
if not project_dir.startswith(SDK_PROJECTS_DIR + os.sep):
return False
# Only allow: <encoded-cwd>/<uuid>/tool-results/<file>
# Only allow: <encoded-cwd>/<uuid>/<tool-dir>/<file>
# The SDK always creates a conversation UUID directory between
# the project dir and tool-results/.
# the project dir and the tool directory.
# Accept both "tool-results" (SDK's persisted outputs) and
# "tool-outputs" (the model sometimes confuses workspace paths
# with filesystem paths and generates this variant).
if resolved.startswith(project_dir + os.sep):
relative = resolved[len(project_dir) + 1 :]
parts = relative.split(os.sep)
# Require exactly: [<uuid>, "tool-results", <file>, ...]
# Require exactly: [<uuid>, "tool-results"|"tool-outputs", <file>, ...]
if (
len(parts) >= 3
and _UUID_RE.match(parts[0])
and parts[1] == "tool-results"
and parts[1] in ("tool-results", "tool-outputs")
):
return True
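Reviewer note: the segment check at the end of `is_allowed_local_path` can be sketched on its own. This assumes the path has already been resolved and made relative to the project directory, as the hunk does before splitting; `is_tool_file` is a hypothetical helper, and the regex is an assumed shape for `_UUID_RE`, whose definition is not shown in the diff.

```python
import os
import re

# Assumed shape of _UUID_RE; the real pattern is not part of this diff.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)
TOOL_DIRS = ("tool-results", "tool-outputs")


def is_tool_file(relative: str) -> bool:
    """Accept <uuid>/<tool-dir>/<file>[/...] relative to the project directory."""
    parts = relative.split(os.sep)
    return (
        len(parts) >= 3
        and bool(UUID_RE.match(parts[0]))
        and parts[1] in TOOL_DIRS
    )


conv = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
ok = is_tool_file(os.path.join(conv, "tool-outputs", "output.json"))
no_uuid = is_tool_file(os.path.join("tool-results", "output.json"))
wrong_dir = is_tool_file(os.path.join(conv, "other", "output.json"))
```

The sketch deliberately omits the `realpath` resolution and the containment check against `SDK_PROJECTS_DIR`; those happen earlier in the function and are what guard against traversal.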


@@ -134,6 +134,21 @@ def test_is_allowed_local_path_tool_results_with_uuid():
_current_project_dir.set("")
def test_is_allowed_local_path_tool_outputs_with_uuid():
"""Files under <encoded-cwd>/<uuid>/tool-outputs/ are also allowed."""
encoded = "test-encoded-dir"
conv_uuid = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
path = os.path.join(
SDK_PROJECTS_DIR, encoded, conv_uuid, "tool-outputs", "output.json"
)
_current_project_dir.set(encoded)
try:
assert is_allowed_local_path(path, sdk_cwd=None)
finally:
_current_project_dir.set("")
def test_is_allowed_local_path_tool_results_without_uuid_rejected():
"""Direct <encoded-cwd>/tool-results/ (no UUID) is rejected."""
encoded = "test-encoded-dir"
@@ -159,7 +174,7 @@ def test_is_allowed_local_path_sibling_of_tool_results_is_rejected():
def test_is_allowed_local_path_valid_uuid_wrong_segment_name_rejected():
"""A valid UUID dir but non-'tool-results' second segment is rejected."""
"""A valid UUID dir but non-'tool-results'/'tool-outputs' second segment is rejected."""
encoded = "test-encoded-dir"
uuid_str = "12345678-1234-5678-9abc-def012345678"
path = os.path.join(


@@ -18,7 +18,13 @@ from prisma.types import (
from backend.data import db
from backend.util.json import SafeJson, sanitize_string
from .model import ChatMessage, ChatSession, ChatSessionInfo
from .model import (
ChatMessage,
ChatSession,
ChatSessionInfo,
ChatSessionMetadata,
invalidate_session_cache,
)
logger = logging.getLogger(__name__)
@@ -35,6 +41,7 @@ async def get_chat_session(session_id: str) -> ChatSession | None:
async def create_chat_session(
session_id: str,
user_id: str,
metadata: ChatSessionMetadata | None = None,
) -> ChatSessionInfo:
"""Create a new chat session in the database."""
data = ChatSessionCreateInput(
@@ -43,6 +50,7 @@ async def create_chat_session(
credentials=SafeJson({}),
successfulAgentRuns=SafeJson({}),
successfulAgentSchedules=SafeJson({}),
metadata=SafeJson((metadata or ChatSessionMetadata()).model_dump()),
)
prisma_session = await PrismaChatSession.prisma().create(data=data)
return ChatSessionInfo.from_db(prisma_session)
@@ -57,7 +65,12 @@ async def update_chat_session(
total_completion_tokens: int | None = None,
title: str | None = None,
) -> ChatSession | None:
"""Update a chat session's metadata."""
"""Update a chat session's mutable fields.
Note: ``metadata`` (which includes ``dry_run``) is intentionally omitted —
it is set once at creation time and treated as immutable for the lifetime
of the session.
"""
data: ChatSessionUpdateInput = {"updatedAt": datetime.now(UTC)}
if credentials is not None:
@@ -217,6 +230,9 @@ async def add_chat_messages_batch(
if msg.get("function_call") is not None:
data["functionCall"] = SafeJson(msg["function_call"])
if msg.get("duration_ms") is not None:
data["durationMs"] = msg["duration_ms"]
messages_data.append(data)
# Run create_many and session update in parallel within transaction
@@ -359,3 +375,22 @@ async def update_tool_message_content(
f"tool_call_id {tool_call_id}: {e}"
)
return False
async def set_turn_duration(session_id: str, duration_ms: int) -> None:
"""Set durationMs on the last assistant message in a session.
Also invalidates the Redis session cache so the next GET returns
the updated duration.
"""
last_msg = await PrismaChatMessage.prisma().find_first(
where={"sessionId": session_id, "role": "assistant"},
order={"sequence": "desc"},
)
if last_msg:
await PrismaChatMessage.prisma().update(
where={"id": last_msg.id},
data={"durationMs": duration_ms},
)
# Invalidate cache so the session is re-fetched from DB with durationMs
await invalidate_session_cache(session_id)


@@ -251,20 +251,31 @@ class CoPilotProcessor:
stream_fn = stream_chat_completion_dummy
log.warning("Using DUMMY service (CHAT_TEST_MODE=true)")
else:
use_sdk = (
config.use_claude_code_subscription
or await is_feature_enabled(
Flag.COPILOT_SDK,
entry.user_id or "anonymous",
default=config.use_claude_agent_sdk,
# Per-request mode override from the frontend takes priority.
# 'fast' → baseline (OpenAI-compatible), 'extended_thinking' → SDK.
if entry.mode == "fast":
use_sdk = False
elif entry.mode == "extended_thinking":
use_sdk = True
else:
# No mode specified — fall back to feature flag / config.
use_sdk = (
config.use_claude_code_subscription
or await is_feature_enabled(
Flag.COPILOT_SDK,
entry.user_id or "anonymous",
default=config.use_claude_agent_sdk,
)
)
)
stream_fn = (
sdk_service.stream_chat_completion_sdk
if use_sdk
else stream_chat_completion_baseline
)
log.info(f"Using {'SDK' if use_sdk else 'baseline'} service")
log.info(
f"Using {'SDK' if use_sdk else 'baseline'} service "
f"(mode={entry.mode or 'default'})"
)
# Stream chat completion and publish chunks to Redis.
# stream_and_publish wraps the raw stream with registry
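The precedence introduced in this hunk (explicit per-request mode first, then the server default) can be sketched in isolation. This is a minimal illustration, not the project's code: `resolve_use_sdk` is a hypothetical helper, and `flag_default` is an assumed boolean standing in for the combined subscription-config and feature-flag lookup shown in the diff.

```python
from typing import Literal, Optional

Mode = Optional[Literal["fast", "extended_thinking"]]


def resolve_use_sdk(mode: Mode, flag_default: bool) -> bool:
    """Return True when the SDK service should handle the turn.

    Hypothetical helper: `flag_default` stands in for the result of the
    config / feature-flag check that the diff performs when no mode is set.
    """
    if mode == "fast":
        # Explicit override: baseline (OpenAI-compatible) service.
        return False
    if mode == "extended_thinking":
        # Explicit override: SDK service.
        return True
    # No mode given: defer to the server-side default.
    return flag_default
```

Keeping the decision in a pure function like this would let the override rules be unit-tested without mocking the feature-flag client.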


@@ -6,6 +6,7 @@ Defines two exchanges and queues following the graph executor pattern:
"""
import logging
from typing import Literal
from pydantic import BaseModel
@@ -156,6 +157,9 @@ class CoPilotExecutionEntry(BaseModel):
file_ids: list[str] | None = None
"""Workspace file IDs attached to the user's message"""
mode: Literal["fast", "extended_thinking"] | None = None
"""Autopilot mode override: 'fast' or 'extended_thinking'. None = server default."""
class CancelCoPilotEvent(BaseModel):
"""Event to cancel a CoPilot operation."""
@@ -175,6 +179,7 @@ async def enqueue_copilot_turn(
is_user_message: bool = True,
context: dict[str, str] | None = None,
file_ids: list[str] | None = None,
mode: Literal["fast", "extended_thinking"] | None = None,
) -> None:
"""Enqueue a CoPilot task for processing by the executor service.
@@ -186,6 +191,7 @@ async def enqueue_copilot_turn(
is_user_message: Whether the message is from the user (vs system/assistant)
context: Optional context for the message (e.g., {url: str, content: str})
file_ids: Optional workspace file IDs attached to the user's message
mode: Autopilot mode override ('fast' or 'extended_thinking'). None = server default.
"""
from backend.util.clients import get_async_copilot_queue
@@ -197,6 +203,7 @@ async def enqueue_copilot_turn(
is_user_message=is_user_message,
context=context,
file_ids=file_ids,
mode=mode,
)
queue_client = await get_async_copilot_queue()


@@ -59,6 +59,16 @@ _null_cache: TTLCache[tuple[str, str], bool] = TTLCache(
maxsize=_CACHE_MAX_SIZE, ttl=_NULL_CACHE_TTL
)
# GitHub user identity caches (keyed by user_id only, not provider tuple).
# Declared here so invalidate_user_provider_cache() can reference them.
_GH_IDENTITY_CACHE_TTL = 600.0 # 10 min — profile data rarely changes
_gh_identity_cache: TTLCache[str, dict[str, str]] = TTLCache(
maxsize=_CACHE_MAX_SIZE, ttl=_GH_IDENTITY_CACHE_TTL
)
_gh_identity_null_cache: TTLCache[str, bool] = TTLCache(
maxsize=_CACHE_MAX_SIZE, ttl=_NULL_CACHE_TTL
)
def invalidate_user_provider_cache(user_id: str, provider: str) -> None:
"""Remove the cached entry for *user_id*/*provider* from both caches.
@@ -66,11 +76,19 @@ def invalidate_user_provider_cache(user_id: str, provider: str) -> None:
Call this after storing new credentials so that the next
``get_provider_token()`` call performs a fresh DB lookup instead of
serving a stale TTL-cached result.
For GitHub specifically, also clears the git-identity caches so that
``get_github_user_git_identity()`` re-fetches the user's profile on
the next call instead of serving stale identity data.
"""
key = (user_id, provider)
_token_cache.pop(key, None)
_null_cache.pop(key, None)
if provider == "github":
_gh_identity_cache.pop(user_id, None)
_gh_identity_null_cache.pop(user_id, None)
# Register this module's cache-bust function with the credentials manager so
# that any create/update/delete operation immediately evicts stale cache
@@ -123,6 +141,7 @@ async def get_provider_token(user_id: str, provider: str) -> str | None:
[c for c in creds_list if c.type == "oauth2"],
key=lambda c: 0 if "repo" in (cast(OAuth2Credentials, c).scopes or []) else 1,
)
refresh_failed = False
for creds in oauth2_creds:
if creds.type == "oauth2":
try:
@@ -141,6 +160,7 @@ async def get_provider_token(user_id: str, provider: str) -> str | None:
# Do NOT fall back to the stale token — it is likely expired
# or revoked. Returning None forces the caller to re-auth,
# preventing the LLM from receiving a non-functional token.
refresh_failed = True
continue
_token_cache[cache_key] = token
return token
@@ -152,8 +172,12 @@ async def get_provider_token(user_id: str, provider: str) -> str | None:
_token_cache[cache_key] = token
return token
# No credentials found — cache to avoid repeated DB hits.
_null_cache[cache_key] = True
# Only cache "not connected" when the user truly has no credentials for this
# provider. If we had OAuth credentials but refresh failed (e.g. transient
# network error, event-loop mismatch), do NOT cache the negative result —
# the next call should retry the refresh instead of being blocked for 60 s.
if not refresh_failed:
_null_cache[cache_key] = True
return None
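The negative-caching rule in this hunk can be illustrated with a simplified, in-memory sketch. It is an assumption-laden stand-in: `lookup_token`, `null_cache` (a plain set instead of a TTL cache), and the list of refresher callables are all hypothetical, and the API-key fallback from the real function is omitted.

```python
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical stand-in for the TTL-backed null cache in the diff.
null_cache: Set[Tuple[str, str]] = set()


def lookup_token(
    key: Tuple[str, str],
    refreshers: List[Callable[[], str]],
) -> Optional[str]:
    """Try each credential refresher; cache "not connected" only if none exist."""
    refresh_failed = False
    for refresh in refreshers:
        try:
            return refresh()
        except Exception:
            # Credentials exist but could not be refreshed right now.
            refresh_failed = True
    if not refresh_failed:
        # Truly no credentials: remember that to skip repeated lookups.
        null_cache.add(key)
    return None
```

With this shape, a transient refresh error returns `None` for the current call only, and the next call retries instead of hitting the negative cache.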
@@ -171,3 +195,76 @@ async def get_integration_env_vars(user_id: str) -> dict[str, str]:
for var in var_names:
env[var] = token
return env
# ---------------------------------------------------------------------------
# GitHub user identity (for git committer env vars)
# ---------------------------------------------------------------------------
async def get_github_user_git_identity(user_id: str) -> dict[str, str] | None:
"""Fetch the GitHub user's name and email for git committer env vars.
Uses the ``/user`` GitHub API endpoint with the user's stored token.
Returns a dict with ``GIT_AUTHOR_NAME``, ``GIT_AUTHOR_EMAIL``,
``GIT_COMMITTER_NAME``, and ``GIT_COMMITTER_EMAIL`` if the user has a
connected GitHub account. Returns ``None`` otherwise.
Results are cached for 10 minutes; "not connected" results are cached for
60 s (same as null-token cache).
"""
if user_id in _gh_identity_null_cache:
return None
if cached := _gh_identity_cache.get(user_id):
return cached
token = await get_provider_token(user_id, "github")
if not token:
_gh_identity_null_cache[user_id] = True
return None
import aiohttp
try:
async with aiohttp.ClientSession() as session:
async with session.get(
"https://api.github.com/user",
headers={
"Authorization": f"token {token}",
"Accept": "application/vnd.github+json",
},
timeout=aiohttp.ClientTimeout(total=5),
) as resp:
if resp.status != 200:
logger.warning(
"[git-identity] GitHub /user returned %s for user %s",
resp.status,
user_id,
)
return None
data = await resp.json()
except Exception as exc:
logger.warning(
"[git-identity] Failed to fetch GitHub profile for user %s: %s",
user_id,
exc,
)
return None
name = data.get("name") or data.get("login") or "AutoGPT User"
# GitHub may return email=null if the user has set their email to private.
# Fall back to the noreply address GitHub generates for every account.
email = data.get("email")
if not email:
gh_id = data.get("id", "")
login = data.get("login", "user")
email = f"{gh_id}+{login}@users.noreply.github.com"
identity = {
"GIT_AUTHOR_NAME": name,
"GIT_AUTHOR_EMAIL": email,
"GIT_COMMITTER_NAME": name,
"GIT_COMMITTER_EMAIL": email,
}
_gh_identity_cache[user_id] = identity
return identity
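The name and email fallback above can be exercised without any network access if it is pulled into a pure function. The sketch below assumes a hypothetical helper name, `build_git_identity`, and takes an already-decoded `/user` payload as a plain dict; it mirrors the fallback order shown in the diff rather than adding new behaviour.

```python
from typing import Any, Dict


def build_git_identity(profile: Dict[str, Any]) -> Dict[str, str]:
    """Map a GitHub `/user` payload to git author/committer env vars.

    Hypothetical helper; the fallback order follows the diff above.
    """
    name = profile.get("name") or profile.get("login") or "AutoGPT User"
    email = profile.get("email")
    if not email:
        # Private or missing email: use the ID-based noreply address.
        gh_id = profile.get("id", "")
        login = profile.get("login", "user")
        email = f"{gh_id}+{login}@users.noreply.github.com"
    return {
        "GIT_AUTHOR_NAME": name,
        "GIT_AUTHOR_EMAIL": email,
        "GIT_COMMITTER_NAME": name,
        "GIT_COMMITTER_EMAIL": email,
    }
```

For example, a payload with `login="octocat"`, `id=583231`, and no public name or email would yield the login as the name and the noreply address as the email.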


@@ -9,6 +9,8 @@ from backend.copilot.integration_creds import (
_NULL_CACHE_TTL,
_TOKEN_CACHE_TTL,
PROVIDER_ENV_VARS,
_gh_identity_cache,
_gh_identity_null_cache,
_null_cache,
_token_cache,
get_integration_env_vars,
@@ -49,9 +51,13 @@ def clear_caches():
"""Ensure clean caches before and after every test."""
_token_cache.clear()
_null_cache.clear()
_gh_identity_cache.clear()
_gh_identity_null_cache.clear()
yield
_token_cache.clear()
_null_cache.clear()
_gh_identity_cache.clear()
_gh_identity_null_cache.clear()
class TestInvalidateUserProviderCache:
@@ -77,6 +83,34 @@ class TestInvalidateUserProviderCache:
invalidate_user_provider_cache(_USER, _PROVIDER)
assert other_key in _token_cache
def test_clears_gh_identity_cache_for_github_provider(self):
"""When provider is 'github', identity caches must also be cleared."""
_gh_identity_cache[_USER] = {
"GIT_AUTHOR_NAME": "Old Name",
"GIT_AUTHOR_EMAIL": "old@example.com",
"GIT_COMMITTER_NAME": "Old Name",
"GIT_COMMITTER_EMAIL": "old@example.com",
}
invalidate_user_provider_cache(_USER, "github")
assert _USER not in _gh_identity_cache
def test_clears_gh_identity_null_cache_for_github_provider(self):
"""When provider is 'github', the identity null-cache must also be cleared."""
_gh_identity_null_cache[_USER] = True
invalidate_user_provider_cache(_USER, "github")
assert _USER not in _gh_identity_null_cache
def test_does_not_clear_gh_identity_cache_for_other_providers(self):
"""When provider is NOT 'github', identity caches must be left alone."""
_gh_identity_cache[_USER] = {
"GIT_AUTHOR_NAME": "Some Name",
"GIT_AUTHOR_EMAIL": "some@example.com",
"GIT_COMMITTER_NAME": "Some Name",
"GIT_COMMITTER_EMAIL": "some@example.com",
}
invalidate_user_provider_cache(_USER, "some-other-provider")
assert _USER in _gh_identity_cache
class TestGetProviderToken:
@pytest.mark.asyncio(loop_scope="session")
@@ -129,8 +163,15 @@ class TestGetProviderToken:
assert result == "oauth-tok"
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth2_refresh_failure_returns_none(self):
"""On refresh failure, return None instead of caching a stale token."""
async def test_oauth2_refresh_failure_returns_none_without_null_cache(self):
"""On refresh failure, return None but do NOT cache in null_cache.
The user has credentials — they just couldn't be refreshed right now
(e.g. transient network error or event-loop mismatch in the copilot
executor). Caching a negative result would block all credential
lookups for 60 s even though the creds exist and may refresh fine
on the next attempt.
"""
oauth_creds = _make_oauth2_creds("stale-oauth-tok")
mock_manager = MagicMock()
mock_manager.store.get_creds_by_provider = AsyncMock(return_value=[oauth_creds])
@@ -141,6 +182,8 @@ class TestGetProviderToken:
# Stale tokens must NOT be returned — forces re-auth.
assert result is None
# Must NOT cache negative result when refresh failed — next call retries.
assert (_USER, _PROVIDER) not in _null_cache
@pytest.mark.asyncio(loop_scope="session")
async def test_no_credentials_caches_null_entry(self):
@@ -176,6 +219,96 @@ class TestGetProviderToken:
assert _NULL_CACHE_TTL < _TOKEN_CACHE_TTL
class TestThreadSafetyLocks:
"""Bug reproduction: shared AsyncRedisKeyedMutex across threads caused
'Future attached to a different loop' when copilot workers accessed
credentials from different event loops."""
@pytest.mark.asyncio(loop_scope="session")
async def test_store_locks_returns_per_thread_instance(self):
"""IntegrationCredentialsStore.locks() must return different instances
for different threads (via @thread_cached)."""
import asyncio
import concurrent.futures
from backend.integrations.credentials_store import IntegrationCredentialsStore
store = IntegrationCredentialsStore()
async def get_locks_id():
mock_redis = AsyncMock()
with patch(
"backend.integrations.credentials_store.get_redis_async",
return_value=mock_redis,
):
locks = await store.locks()
return id(locks)
# Get locks from main thread
main_id = await get_locks_id()
# Get locks from a worker thread
def run_in_thread():
loop = asyncio.new_event_loop()
try:
return loop.run_until_complete(get_locks_id())
finally:
loop.close()
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
worker_id = await asyncio.get_event_loop().run_in_executor(
pool, run_in_thread
)
assert main_id != worker_id, (
"Store.locks() returned the same instance across threads. "
"This would cause 'Future attached to a different loop' errors."
)
@pytest.mark.asyncio(loop_scope="session")
async def test_manager_delegates_to_store_locks(self):
"""IntegrationCredentialsManager.locks() should delegate to store."""
from backend.integrations.creds_manager import IntegrationCredentialsManager
manager = IntegrationCredentialsManager()
mock_redis = AsyncMock()
with patch(
"backend.integrations.credentials_store.get_redis_async",
return_value=mock_redis,
):
locks = await manager.locks()
# Should have gotten it from the store
assert locks is not None
class TestRefreshUnlockedPath:
"""Bug reproduction: copilot worker threads need lock-free refresh because
Redis-backed asyncio.Lock created on one event loop can't be used on another."""
@pytest.mark.asyncio(loop_scope="session")
async def test_refresh_if_needed_lock_false_skips_redis(self):
"""refresh_if_needed(lock=False) must not touch Redis locks at all."""
from backend.integrations.creds_manager import IntegrationCredentialsManager
manager = IntegrationCredentialsManager()
creds = _make_oauth2_creds()
mock_handler = MagicMock()
mock_handler.needs_refresh = MagicMock(return_value=False)
with patch(
"backend.integrations.creds_manager._get_provider_oauth_handler",
new_callable=AsyncMock,
return_value=mock_handler,
):
result = await manager.refresh_if_needed(_USER, creds, lock=False)
# Should return credentials without touching locks
assert result.id == creds.id
class TestGetIntegrationEnvVars:
@pytest.mark.asyncio(loop_scope="session")
async def test_injects_all_env_vars_for_provider(self):


@@ -46,6 +46,16 @@ def _get_session_cache_key(session_id: str) -> str:
# ===================== Chat data models ===================== #
class ChatSessionMetadata(BaseModel):
"""Typed metadata stored in the ``metadata`` JSON column of ChatSession.
Add new session-level flags here instead of adding DB columns —
no migration required for new fields as long as a default is provided.
"""
dry_run: bool = False
class ChatMessage(BaseModel):
role: str
content: str | None = None
@@ -54,6 +64,7 @@ class ChatMessage(BaseModel):
refusal: str | None = None
tool_calls: list[dict] | None = None
function_call: dict | None = None
duration_ms: int | None = None
@staticmethod
def from_db(prisma_message: PrismaChatMessage) -> "ChatMessage":
@@ -66,6 +77,7 @@ class ChatMessage(BaseModel):
refusal=prisma_message.refusal,
tool_calls=_parse_json_field(prisma_message.toolCalls),
function_call=_parse_json_field(prisma_message.functionCall),
duration_ms=prisma_message.durationMs,
)
@@ -88,6 +100,12 @@ class ChatSessionInfo(BaseModel):
updated_at: datetime
successful_agent_runs: dict[str, int] = {}
successful_agent_schedules: dict[str, int] = {}
metadata: ChatSessionMetadata = ChatSessionMetadata()
@property
def dry_run(self) -> bool:
"""Convenience accessor for ``metadata.dry_run``."""
return self.metadata.dry_run
@classmethod
def from_db(cls, prisma_session: PrismaChatSession) -> Self:
@@ -101,6 +119,10 @@ class ChatSessionInfo(BaseModel):
prisma_session.successfulAgentSchedules, default={}
)
# Parse typed metadata from the JSON column.
raw_metadata = _parse_json_field(prisma_session.metadata, default={})
metadata = ChatSessionMetadata.model_validate(raw_metadata)
# Calculate usage from token counts.
# NOTE: Per-turn cache_read_tokens / cache_creation_tokens breakdown
# is lost after persistence — the DB only stores aggregate prompt and
@@ -126,6 +148,7 @@ class ChatSessionInfo(BaseModel):
updated_at=prisma_session.updatedAt,
successful_agent_runs=successful_agent_runs,
successful_agent_schedules=successful_agent_schedules,
metadata=metadata,
)
@@ -133,7 +156,7 @@ class ChatSession(ChatSessionInfo):
messages: list[ChatMessage]
@classmethod
def new(cls, user_id: str) -> Self:
def new(cls, user_id: str, *, dry_run: bool) -> Self:
return cls(
session_id=str(uuid.uuid4()),
user_id=user_id,
@@ -143,6 +166,7 @@ class ChatSession(ChatSessionInfo):
credentials={},
started_at=datetime.now(UTC),
updated_at=datetime.now(UTC),
metadata=ChatSessionMetadata(dry_run=dry_run),
)
@classmethod
@@ -530,6 +554,7 @@ async def _save_session_to_db(
await db.create_chat_session(
session_id=session.session_id,
user_id=session.user_id,
metadata=session.metadata,
)
existing_message_count = 0
@@ -607,21 +632,27 @@ async def append_and_save_message(session_id: str, message: ChatMessage) -> Chat
return session
async def create_chat_session(user_id: str) -> ChatSession:
async def create_chat_session(user_id: str, *, dry_run: bool) -> ChatSession:
"""Create a new chat session and persist it.
Args:
user_id: The authenticated user ID.
dry_run: When True, run_block and run_agent tool calls in this
session are forced to use dry-run simulation mode.
Raises:
DatabaseError: If the database write fails. We fail fast to ensure
callers never receive a non-persisted session that only exists
in cache (which would be lost when the cache expires).
"""
session = ChatSession.new(user_id)
session = ChatSession.new(user_id, dry_run=dry_run)
# Create in database first - fail fast if this fails
try:
await chat_db().create_chat_session(
session_id=session.session_id,
user_id=user_id,
metadata=session.metadata,
)
except Exception as e:
logger.error(f"Failed to create session {session.session_id} in database: {e}")
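The "typed metadata in a JSON column" idea from this file can be sketched with the standard library alone. This is only an illustration under stated assumptions: `SessionMetadata` is a dataclass stand-in for the Pydantic `ChatSessionMetadata` model, `parse_metadata` is a hypothetical helper, and silently dropping unknown keys is a choice made for this sketch.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class SessionMetadata:
    """Stand-in for ChatSessionMetadata; every field carries a default."""

    dry_run: bool = False


def parse_metadata(raw: Optional[str]) -> SessionMetadata:
    """Parse the JSON column, tolerating missing rows and missing keys."""
    data = json.loads(raw) if raw else {}
    known = {f.name for f in fields(SessionMetadata)}
    # Drop unknown keys so older readers tolerate newer writers (assumption).
    return SessionMetadata(**{k: v for k, v in data.items() if k in known})
```

Because every field has a default, rows written before a flag existed still parse, which is what makes a schema migration unnecessary for new flags.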


@@ -46,7 +46,7 @@ messages = [
@pytest.mark.asyncio(loop_scope="session")
async def test_chatsession_serialization_deserialization():
s = ChatSession.new(user_id="abc123")
s = ChatSession.new(user_id="abc123", dry_run=False)
s.messages = messages
s.usage = [Usage(prompt_tokens=100, completion_tokens=200, total_tokens=300)]
serialized = s.model_dump_json()
@@ -57,7 +57,7 @@ async def test_chatsession_serialization_deserialization():
@pytest.mark.asyncio(loop_scope="session")
async def test_chatsession_redis_storage(setup_test_user, test_user_id):
s = ChatSession.new(user_id=test_user_id)
s = ChatSession.new(user_id=test_user_id, dry_run=False)
s.messages = messages
s = await upsert_chat_session(s)
@@ -75,7 +75,7 @@ async def test_chatsession_redis_storage_user_id_mismatch(
setup_test_user, test_user_id
):
s = ChatSession.new(user_id=test_user_id)
s = ChatSession.new(user_id=test_user_id, dry_run=False)
s.messages = messages
s = await upsert_chat_session(s)
@@ -90,7 +90,7 @@ async def test_chatsession_db_storage(setup_test_user, test_user_id):
from backend.data.redis_client import get_redis_async
# Create session with messages including assistant message
s = ChatSession.new(user_id=test_user_id)
s = ChatSession.new(user_id=test_user_id, dry_run=False)
s.messages = messages # Contains user, assistant, and tool messages
assert s.session_id is not None, "Session id is not set"
# Upsert to save to both cache and DB
@@ -241,7 +241,7 @@ _raw_tc2 = {
def test_add_tool_call_appends_to_existing_assistant():
"""When the last assistant is from the current turn, tool_call is added to it."""
session = ChatSession.new(user_id="u")
session = ChatSession.new(user_id="u", dry_run=False)
session.messages = [
ChatMessage(role="user", content="hi"),
ChatMessage(role="assistant", content="working on it"),
@@ -254,7 +254,7 @@ def test_add_tool_call_appends_to_existing_assistant():
def test_add_tool_call_creates_assistant_when_none_exists():
"""When there's no current-turn assistant, a new one is created."""
session = ChatSession.new(user_id="u")
session = ChatSession.new(user_id="u", dry_run=False)
session.messages = [
ChatMessage(role="user", content="hi"),
]
@@ -267,7 +267,7 @@ def test_add_tool_call_creates_assistant_when_none_exists():
def test_add_tool_call_does_not_cross_user_boundary():
"""A user message acts as a boundary — previous assistant is not modified."""
session = ChatSession.new(user_id="u")
session = ChatSession.new(user_id="u", dry_run=False)
session.messages = [
ChatMessage(role="assistant", content="old turn"),
ChatMessage(role="user", content="new message"),
@@ -282,7 +282,7 @@ def test_add_tool_call_does_not_cross_user_boundary():
def test_add_tool_call_multiple_times():
"""Multiple long-running tool calls accumulate on the same assistant."""
session = ChatSession.new(user_id="u")
session = ChatSession.new(user_id="u", dry_run=False)
session.messages = [
ChatMessage(role="user", content="hi"),
ChatMessage(role="assistant", content="doing stuff"),
@@ -300,7 +300,7 @@ def test_add_tool_call_multiple_times():
def test_to_openai_messages_merges_split_assistants():
"""End-to-end: session with split assistants produces valid OpenAI messages."""
session = ChatSession.new(user_id="u")
session = ChatSession.new(user_id="u", dry_run=False)
session.messages = [
ChatMessage(role="user", content="build agent"),
ChatMessage(role="assistant", content="Let me build that"),
@@ -352,7 +352,7 @@ async def test_concurrent_saves_collision_detection(setup_test_user, test_user_i
import asyncio
# Create a session with initial messages
session = ChatSession.new(user_id=test_user_id)
session = ChatSession.new(user_id=test_user_id, dry_run=False)
for i in range(3):
session.messages.append(
ChatMessage(


@@ -66,6 +66,7 @@ from pydantic import BaseModel, PrivateAttr
ToolName = Literal[
# Platform tools (must match keys in TOOL_REGISTRY)
"add_understanding",
"ask_question",
"bash_exec",
"browser_act",
"browser_navigate",
@@ -102,6 +103,7 @@ ToolName = Literal[
"web_fetch",
"write_workspace_file",
# SDK built-ins
"Agent",
"Edit",
"Glob",
"Grep",


@@ -544,6 +544,7 @@ class TestApplyToolPermissions:
class TestSdkBuiltinToolNames:
def test_expected_builtins_present(self):
expected = {
"Agent",
"Read",
"Write",
"Edit",


@@ -18,6 +18,18 @@ After `write_workspace_file`, embed the `download_url` in Markdown:
- Image: `![chart](workspace://file_id#image/png)`
- Video: `![recording](workspace://file_id#video/mp4)`
### Handling binary/image data in tool outputs — CRITICAL
When a tool output contains base64-encoded binary data (images, PDFs, etc.):
1. **NEVER** try to inline or render the base64 content in your response.
2. **Save** the data to workspace using `write_workspace_file` (pass the base64 data URI as content).
3. **Show** the result via the workspace download URL in Markdown: `![image](workspace://file_id#image/png)`.
### Passing large data between tools — CRITICAL
When tool outputs produce large text that you need to feed into another tool:
- **NEVER** copy-paste the full text into the next tool call argument.
- **Save** the output to a file (workspace or local), then use `@@agptfile:` references.
- This avoids token limits and ensures data integrity.
### File references — @@agptfile:
Pass large file content to tools by reference: `@@agptfile:<uri>[<start>-<end>]`
- `workspace://<file_id>` or `workspace:///<path>` — workspace files
@@ -107,6 +119,28 @@ Do not re-fetch or re-generate data you already have from prior tool calls.
After building the file, reference it with `@@agptfile:` in other tools:
`@@agptfile:/home/user/report.md`
### Web search best practices
- If 3 similar web searches don't return the specific data you need, conclude
it isn't publicly available and work with what you have.
- Prefer fewer, well-targeted searches over many variations of the same query.
- When spawning sub-agents for research, ensure each has a distinct
non-overlapping scope to avoid redundant searches.
### Tool Discovery Priority
When the user asks to interact with a service or API, follow this order:
1. **find_block first** — Search platform blocks with `find_block`. The platform has hundreds of built-in blocks (Google Sheets, Docs, Calendar, Gmail, Slack, GitHub, etc.) that work without extra setup.
2. **run_mcp_tool** — If no matching block exists, check if a hosted MCP server is available for the service. Only use known MCP server URLs from the registry.
3. **SendAuthenticatedWebRequestBlock** — If no block or MCP server exists, use `SendAuthenticatedWebRequestBlock` with existing host-scoped credentials. Check available credentials via `connect_integration`.
4. **Manual API call** — As a last resort, guide the user to set up credentials and use `SendAuthenticatedWebRequestBlock` with direct API calls.
**Never skip step 1.** Built-in blocks are more reliable, tested, and user-friendly than MCP or raw API calls.
### Sub-agent tasks
- When using the Task tool, NEVER set `run_in_background` to true.
All tasks must run in the foreground.
@@ -131,6 +165,11 @@ parent autopilot handles orchestration.
# E2B-only notes — E2B has full internet access so gh CLI works there.
# Not shown in local (bubblewrap) mode: --unshare-net blocks all network.
_E2B_TOOL_NOTES = """
### SDK tool-result files in E2B
When you `Read` an SDK tool-result file, it is automatically copied into the
sandbox so `bash_exec` can access it for further processing.
The exact sandbox path is shown in the `[Sandbox copy available at ...]` note.
### GitHub CLI (`gh`) and git
- If the user has connected their GitHub account, both `gh` and `git` are
pre-authenticated — use them directly without any manual login step.
@@ -196,18 +235,22 @@ def _build_storage_supplement(
- Files here **survive across sessions indefinitely**
### Moving files between storages
- **{file_move_name_1_to_2}**: Copy to persistent workspace
- **{file_move_name_2_to_1}**: Download for processing
- **{file_move_name_1_to_2}**: `write_workspace_file(filename="output.json", source_path="/path/to/local/file")`
- **{file_move_name_2_to_1}**: `read_workspace_file(path="tool-outputs/data.json", save_to_path="{working_dir}/data.json")`
### File persistence
Important files (code, configs, outputs) should be saved to workspace to ensure they persist.
### SDK tool-result files
When tool outputs are large, the SDK truncates them and saves the full output to
a local file under `~/.claude/projects/.../tool-results/`. To read these files,
always use `read_file` or `Read` (NOT `read_workspace_file`).
`read_workspace_file` reads from cloud workspace storage, where SDK
tool-results are NOT stored.
a local file under `~/.claude/projects/.../tool-results/` (or `tool-outputs/`).
To read these files, use `Read` — it reads from the host filesystem.
### Large tool outputs saved to workspace
When a tool output contains `<tool-output-truncated workspace_path="...">`, the
full output is in workspace storage (NOT on the local filesystem). To access it:
- Use `read_workspace_file(path="...", offset=..., length=50000)` for reading sections.
- To process in the sandbox, use `read_workspace_file(path="...", save_to_path="{working_dir}/file.json")` first, then use `bash_exec` on the local copy.
{_SHARED_TOOL_NOTES}{extra_notes}"""


@@ -0,0 +1,28 @@
"""Tests for agent generation guide — verifies clarification section."""
from pathlib import Path
class TestAgentGenerationGuideContainsClarifySection:
"""The agent generation guide must include the clarification section."""
def test_guide_includes_clarify_section(self):
guide_path = Path(__file__).parent / "sdk" / "agent_generation_guide.md"
content = guide_path.read_text(encoding="utf-8")
assert "Before or During Building" in content
def test_guide_mentions_find_block_for_clarification(self):
guide_path = Path(__file__).parent / "sdk" / "agent_generation_guide.md"
content = guide_path.read_text(encoding="utf-8")
clarify_section = content.split("Before or During Building")[1].split(
"### Workflow"
)[0]
assert "find_block" in clarify_section
def test_guide_mentions_ask_question_tool(self):
guide_path = Path(__file__).parent / "sdk" / "agent_generation_guide.md"
content = guide_path.read_text(encoding="utf-8")
clarify_section = content.split("Before or During Building")[1].split(
"### Workflow"
)[0]
assert "ask_question" in clarify_section


@@ -9,11 +9,14 @@ UTC). Fails open when Redis is unavailable to avoid blocking users.
import asyncio
import logging
from datetime import UTC, datetime, timedelta
from enum import Enum
from prisma.models import User as PrismaUser
from pydantic import BaseModel, Field
from redis.exceptions import RedisError
from backend.data.redis_client import get_redis_async
from backend.util.cache import cached
logger = logging.getLogger(__name__)
@@ -21,6 +24,40 @@ logger = logging.getLogger(__name__)
_USAGE_KEY_PREFIX = "copilot:usage"
# ---------------------------------------------------------------------------
# Subscription tier definitions
# ---------------------------------------------------------------------------
class SubscriptionTier(str, Enum):
"""Subscription tiers with increasing token allowances.
Mirrors the ``SubscriptionTier`` enum in ``schema.prisma``.
Once ``prisma generate`` is run, this can be replaced with::
from prisma.enums import SubscriptionTier
"""
FREE = "FREE"
PRO = "PRO"
BUSINESS = "BUSINESS"
ENTERPRISE = "ENTERPRISE"
# Multiplier applied to the base limits (from LD / config) for each tier.
# Intentionally int (not float): keeps limits as whole token counts and avoids
# floating-point rounding. If fractional multipliers are ever needed, change
# the type and round the result in get_global_rate_limits().
TIER_MULTIPLIERS: dict[SubscriptionTier, int] = {
SubscriptionTier.FREE: 1,
SubscriptionTier.PRO: 5,
SubscriptionTier.BUSINESS: 20,
SubscriptionTier.ENTERPRISE: 60,
}
DEFAULT_TIER = SubscriptionTier.FREE
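How the integer multipliers might be applied to base limits can be sketched as follows. The diff names `get_global_rate_limits()` but does not show it, so `scaled_limits` here is a hypothetical helper, and `Tier` / `MULTIPLIERS` are local copies of the definitions above made so the snippet runs on its own.

```python
from enum import Enum
from typing import Dict, Tuple


class Tier(str, Enum):
    FREE = "FREE"
    PRO = "PRO"
    BUSINESS = "BUSINESS"
    ENTERPRISE = "ENTERPRISE"


# Local copy of the multipliers from the diff.
MULTIPLIERS: Dict[Tier, int] = {
    Tier.FREE: 1,
    Tier.PRO: 5,
    Tier.BUSINESS: 20,
    Tier.ENTERPRISE: 60,
}


def scaled_limits(base_daily: int, base_weekly: int, tier: Tier) -> Tuple[int, int]:
    """Scale base token limits by the tier multiplier (hypothetical helper)."""
    # Unknown tiers fall back to the FREE multiplier (assumption).
    m = MULTIPLIERS.get(tier, MULTIPLIERS[Tier.FREE])
    return base_daily * m, base_weekly * m
```

One consequence of integer multiplication is that a base limit of 0, which the usage functions treat as unlimited, stays 0 for every tier.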
class UsageWindow(BaseModel):
"""Usage within a single time window."""
@@ -36,6 +73,11 @@ class CoPilotUsageStatus(BaseModel):
daily: UsageWindow
weekly: UsageWindow
tier: SubscriptionTier = DEFAULT_TIER
reset_cost: int = Field(
default=0,
description="Credit cost (in cents) to reset the daily limit. 0 = feature disabled.",
)
class RateLimitExceeded(Exception):
@@ -61,6 +103,8 @@ async def get_usage_status(
user_id: str,
daily_token_limit: int,
weekly_token_limit: int,
rate_limit_reset_cost: int = 0,
tier: SubscriptionTier = DEFAULT_TIER,
) -> CoPilotUsageStatus:
"""Get current usage status for a user.
@@ -68,6 +112,8 @@ async def get_usage_status(
user_id: The user's ID.
daily_token_limit: Max tokens per day (0 = unlimited).
weekly_token_limit: Max tokens per week (0 = unlimited).
rate_limit_reset_cost: Credit cost (cents) to reset daily limit (0 = disabled).
tier: The user's rate-limit tier (included in the response).
Returns:
CoPilotUsageStatus with current usage and limits.
@@ -97,6 +143,8 @@ async def get_usage_status(
limit=weekly_token_limit,
resets_at=_weekly_reset_time(now=now),
),
tier=tier,
reset_cost=rate_limit_reset_cost,
)
@@ -141,6 +189,111 @@ async def check_rate_limit(
raise RateLimitExceeded("weekly", _weekly_reset_time(now=now))
async def reset_daily_usage(user_id: str, daily_token_limit: int = 0) -> bool:
"""Reset a user's daily token usage counter in Redis.
Called after a user pays credits to extend their daily limit.
Also reduces the weekly usage counter by ``daily_token_limit`` tokens
(clamped to 0) so the user effectively gets one extra day's worth of
weekly capacity.
Args:
user_id: The user's ID.
daily_token_limit: The configured daily token limit. When positive,
the weekly counter is reduced by this amount.
Returns False if Redis is unavailable so the caller can handle
compensation (fail-closed for billed operations, unlike the read-only
rate-limit checks which fail-open).
"""
now = datetime.now(UTC)
try:
redis = await get_redis_async()
# Use a MULTI/EXEC transaction so that DELETE (daily) and DECRBY
# (weekly) either both execute or neither does. This prevents the
# scenario where the daily counter is cleared but the weekly
# counter is not decremented — which would let the caller refund
# credits even though the daily limit was already reset.
d_key = _daily_key(user_id, now=now)
w_key = _weekly_key(user_id, now=now) if daily_token_limit > 0 else None
pipe = redis.pipeline(transaction=True)
pipe.delete(d_key)
if w_key is not None:
pipe.decrby(w_key, daily_token_limit)
results = await pipe.execute()
# Clamp negative weekly counter to 0 (best-effort; not critical).
if w_key is not None:
new_val = results[1] # DECRBY result
if new_val < 0:
await redis.set(w_key, 0, keepttl=True)
logger.info("Reset daily usage for user %s", user_id[:8])
return True
except (RedisError, ConnectionError, OSError):
logger.warning("Redis unavailable for resetting daily usage")
return False
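Reviewer note: the counter arithmetic that `reset_daily_usage` performs can be modelled without Redis. The sketch below is a pure-Python illustration only; `reset_counters` is a hypothetical helper that does not exist in this diff, and it folds the DECRBY and the follow-up clamp into a single step.

```python
def reset_counters(
    daily_used: int, weekly_used: int, daily_token_limit: int
) -> tuple[int, int]:
    """Model of a paid reset: return (daily, weekly) usage afterwards.

    Hypothetical helper for illustration; the real code issues DELETE and
    DECRBY inside a Redis MULTI/EXEC transaction and clamps afterwards.
    """
    new_daily = 0  # the daily key is deleted, so usage reads as 0
    if daily_token_limit > 0:
        # One extra day's worth of weekly capacity, never below zero.
        new_weekly = max(0, weekly_used - daily_token_limit)
    else:
        # A non-positive limit leaves the weekly counter untouched.
        new_weekly = weekly_used
    return new_daily, new_weekly


if __name__ == "__main__":
    print(reset_counters(10_000, 45_000, 10_000))
    print(reset_counters(10_000, 5_000, 10_000))
```

The two calls mirror the `decrby_result=35000` and `decrby_result=-5000` cases in `TestResetDailyUsage`: the first leaves weekly usage positive, the second is clamped to 0.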
_RESET_LOCK_PREFIX = "copilot:reset_lock"
_RESET_COUNT_PREFIX = "copilot:reset_count"
async def acquire_reset_lock(user_id: str, ttl_seconds: int = 10) -> bool:
"""Acquire a short-lived lock to serialize rate limit resets per user."""
try:
redis = await get_redis_async()
key = f"{_RESET_LOCK_PREFIX}:{user_id}"
return bool(await redis.set(key, "1", nx=True, ex=ttl_seconds))
except (RedisError, ConnectionError, OSError) as exc:
logger.warning("Redis unavailable for reset lock, rejecting reset: %s", exc)
return False
async def release_reset_lock(user_id: str) -> None:
"""Release the per-user reset lock."""
try:
redis = await get_redis_async()
await redis.delete(f"{_RESET_LOCK_PREFIX}:{user_id}")
except (RedisError, ConnectionError, OSError):
pass # Lock will expire via TTL
async def get_daily_reset_count(user_id: str) -> int | None:
"""Get how many times the user has reset today.
Returns None when Redis is unavailable so callers can fail-closed
for billed operations (as opposed to failing open for read-only
rate-limit checks).
"""
now = datetime.now(UTC)
try:
redis = await get_redis_async()
key = f"{_RESET_COUNT_PREFIX}:{user_id}:{now.strftime('%Y-%m-%d')}"
val = await redis.get(key)
return int(val or 0)
except (RedisError, ConnectionError, OSError):
logger.warning("Redis unavailable for reading daily reset count")
return None
async def increment_daily_reset_count(user_id: str) -> None:
"""Increment and track how many resets this user has done today."""
now = datetime.now(UTC)
try:
redis = await get_redis_async()
key = f"{_RESET_COUNT_PREFIX}:{user_id}:{now.strftime('%Y-%m-%d')}"
pipe = redis.pipeline(transaction=True)
pipe.incr(key)
seconds_until_reset = int((_daily_reset_time(now=now) - now).total_seconds())
pipe.expire(key, max(seconds_until_reset, 1))
await pipe.execute()
except (RedisError, ConnectionError, OSError):
logger.warning("Redis unavailable for tracking reset count")
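Reviewer note: the expiry passed to `pipe.expire` is the number of whole seconds until the next daily reset, floored at 1 so the key always receives a positive TTL. The sketch below assumes the daily reset falls on the next UTC midnight; `_daily_reset_time` is not shown in this hunk, so `next_utc_midnight` is a hypothetical stand-in for it.

```python
from datetime import datetime, timedelta, timezone


def next_utc_midnight(now: datetime) -> datetime:
    """Assumed reset boundary: 00:00 UTC on the following day."""
    return (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )


def ttl_until_reset(now: datetime) -> int:
    """Whole seconds until the assumed reset, never less than 1."""
    remaining = (next_utc_midnight(now) - now).total_seconds()
    return max(int(remaining), 1)


if __name__ == "__main__":
    evening = datetime(2026, 4, 2, 18, 0, 0, tzinfo=timezone.utc)
    print(ttl_until_reset(evening))
```

Without the `max(..., 1)` floor, a call in the last fraction of a second before the boundary would truncate to 0.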
async def record_token_usage(
user_id: str,
prompt_tokens: int,
@@ -231,6 +384,155 @@ async def record_token_usage(
)
class _UserNotFoundError(Exception):
"""Raised when a user record is missing or has no subscription tier.
Used internally by ``_fetch_user_tier`` to signal that nothing should be cached:
by raising instead of returning ``DEFAULT_TIER``, we prevent the ``@cached``
decorator from storing the fallback value. This avoids a race condition
where a non-existent user's DEFAULT_TIER is cached, then the user is
created with a higher tier but receives the stale cached FREE tier for
up to 5 minutes.
"""
@cached(maxsize=1000, ttl_seconds=300, shared_cache=True)
async def _fetch_user_tier(user_id: str) -> SubscriptionTier:
"""Fetch the user's rate-limit tier from the database (cached via Redis).
Uses ``shared_cache=True`` so that tier changes propagate across all pods
immediately when the cache entry is invalidated (via ``cache_delete``).
Only successful DB lookups of existing users with a valid tier are cached.
Raises ``_UserNotFoundError`` when the user is missing or has no tier, so
the ``@cached`` decorator does **not** store a fallback value. This
prevents a race condition where a non-existent user's ``DEFAULT_TIER`` is
cached and then persists after the user is created with a higher tier.
"""
user = await PrismaUser.prisma().find_unique(where={"id": user_id})
if user and user.subscriptionTier: # type: ignore[reportAttributeAccessIssue]
return SubscriptionTier(user.subscriptionTier) # type: ignore[reportAttributeAccessIssue]
raise _UserNotFoundError(user_id)
async def get_user_tier(user_id: str) -> SubscriptionTier:
"""Look up the user's rate-limit tier from the database.
Successful results are cached for 5 minutes (via ``_fetch_user_tier``)
to avoid a DB round-trip on every rate-limit check.
Falls back to ``DEFAULT_TIER`` **without caching** when the user is missing,
has no tier, the DB is unreachable, or the stored value is unrecognised, so
the next call retries the query instead of serving a stale fallback.
"""
try:
return await _fetch_user_tier(user_id)
except Exception as exc:
logger.warning(
"Failed to resolve rate-limit tier for user %s, defaulting to %s: %s",
user_id[:8],
DEFAULT_TIER.value,
exc,
)
return DEFAULT_TIER
# Expose cache management on the public function so callers (including tests)
# never need to reach into the private ``_fetch_user_tier``.
get_user_tier.cache_clear = _fetch_user_tier.cache_clear # type: ignore[attr-defined]
get_user_tier.cache_delete = _fetch_user_tier.cache_delete # type: ignore[attr-defined]
async def set_user_tier(user_id: str, tier: SubscriptionTier) -> None:
"""Persist the user's rate-limit tier to the database.
Also invalidates the ``get_user_tier`` cache for this user so that
subsequent rate-limit checks immediately see the new tier.
Raises:
prisma.errors.RecordNotFoundError: If the user does not exist.
"""
await PrismaUser.prisma().update(
where={"id": user_id},
data={"subscriptionTier": tier.value},
)
# Invalidate cached tier so rate-limit checks pick up the change immediately.
get_user_tier.cache_delete(user_id) # type: ignore[attr-defined]
async def get_global_rate_limits(
user_id: str,
config_daily: int,
config_weekly: int,
) -> tuple[int, int, SubscriptionTier]:
"""Resolve global rate limits from LaunchDarkly, falling back to config.
The base limits (from LD or config) are multiplied by the user's
tier multiplier so that higher tiers receive proportionally larger
allowances.
Args:
user_id: User ID for LD flag evaluation context.
config_daily: Fallback daily limit from ChatConfig.
config_weekly: Fallback weekly limit from ChatConfig.
Returns:
(daily_token_limit, weekly_token_limit, tier) 3-tuple.
"""
# Lazy import to avoid circular dependency:
# rate_limit -> feature_flag -> settings -> ... -> rate_limit
from backend.util.feature_flag import Flag, get_feature_flag_value
daily_raw = await get_feature_flag_value(
Flag.COPILOT_DAILY_TOKEN_LIMIT.value, user_id, config_daily
)
weekly_raw = await get_feature_flag_value(
Flag.COPILOT_WEEKLY_TOKEN_LIMIT.value, user_id, config_weekly
)
try:
daily = max(0, int(daily_raw))
except (TypeError, ValueError):
logger.warning("Invalid LD value for daily token limit: %r", daily_raw)
daily = config_daily
try:
weekly = max(0, int(weekly_raw))
except (TypeError, ValueError):
logger.warning("Invalid LD value for weekly token limit: %r", weekly_raw)
weekly = config_weekly
# Apply tier multiplier
tier = await get_user_tier(user_id)
multiplier = TIER_MULTIPLIERS.get(tier, 1)
if multiplier != 1:
daily = daily * multiplier
weekly = weekly * multiplier
return daily, weekly, tier
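Reviewer note: the limit resolution in `get_global_rate_limits` is two small steps, coercing the raw flag value and then scaling by the tier multiplier. The sketch below reproduces that arithmetic with plain strings for tiers. The multiplier values are the ones asserted in `test_tier_multipliers`, while `coerce_limit` and `effective_limits` are hypothetical helper names.

```python
TIER_MULTIPLIERS = {"FREE": 1, "PRO": 5, "BUSINESS": 20, "ENTERPRISE": 60}


def coerce_limit(raw: object, fallback: int) -> int:
    """Turn a raw flag value into a non-negative int, or use the fallback."""
    try:
        return max(0, int(raw))  # type: ignore[call-overload]
    except (TypeError, ValueError):
        return fallback


def effective_limits(base_daily: int, base_weekly: int, tier: str) -> tuple[int, int]:
    """Scale both base limits by the tier multiplier (unknown tiers get 1x)."""
    multiplier = TIER_MULTIPLIERS.get(tier, 1)
    return base_daily * multiplier, base_weekly * multiplier


if __name__ == "__main__":
    base_daily = coerce_limit("2500000", 1_000_000)
    base_weekly = coerce_limit(None, 12_500_000)  # invalid value, fallback used
    for tier in TIER_MULTIPLIERS:
        print(tier, effective_limits(base_daily, base_weekly, tier))
```

With a base of 2,500,000 daily and 12,500,000 weekly tokens, PRO resolves to 12,500,000 and 62,500,000, matching `test_pro_tier_5x_multiplier`.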
async def reset_user_usage(user_id: str, *, reset_weekly: bool = False) -> None:
"""Reset a user's usage counters.
Always deletes the daily Redis key. When *reset_weekly* is ``True``,
the weekly key is deleted as well.
Unlike read paths (``get_usage_status``, ``check_rate_limit``) which
fail-open on Redis errors, resets intentionally re-raise so the caller
knows the operation did not succeed. A silent failure here would leave
the admin believing the counters were zeroed when they were not.
"""
now = datetime.now(UTC)
keys_to_delete = [_daily_key(user_id, now=now)]
if reset_weekly:
keys_to_delete.append(_weekly_key(user_id, now=now))
try:
redis = await get_redis_async()
await redis.delete(*keys_to_delete)
except (RedisError, ConnectionError, OSError):
logger.warning("Redis unavailable for resetting user usage")
raise
# ---------------------------------------------------------------------------
# Private helpers
# ---------------------------------------------------------------------------


@@ -7,11 +7,19 @@ import pytest
from redis.exceptions import RedisError
from .rate_limit import (
DEFAULT_TIER,
TIER_MULTIPLIERS,
CoPilotUsageStatus,
RateLimitExceeded,
SubscriptionTier,
UsageWindow,
check_rate_limit,
get_global_rate_limits,
get_usage_status,
get_user_tier,
record_token_usage,
reset_daily_usage,
set_user_tier,
)
_USER = "test-user-rl"
@@ -332,3 +340,873 @@ class TestRecordTokenUsage:
):
# Should not raise — fail-open
await record_token_usage(_USER, prompt_tokens=100, completion_tokens=50)
# ---------------------------------------------------------------------------
# SubscriptionTier and tier multipliers
# ---------------------------------------------------------------------------
class TestSubscriptionTier:
def test_tier_values(self):
assert SubscriptionTier.FREE.value == "FREE"
assert SubscriptionTier.PRO.value == "PRO"
assert SubscriptionTier.BUSINESS.value == "BUSINESS"
assert SubscriptionTier.ENTERPRISE.value == "ENTERPRISE"
def test_tier_multipliers(self):
assert TIER_MULTIPLIERS[SubscriptionTier.FREE] == 1
assert TIER_MULTIPLIERS[SubscriptionTier.PRO] == 5
assert TIER_MULTIPLIERS[SubscriptionTier.BUSINESS] == 20
assert TIER_MULTIPLIERS[SubscriptionTier.ENTERPRISE] == 60
def test_default_tier_is_free(self):
assert DEFAULT_TIER == SubscriptionTier.FREE
def test_usage_status_includes_tier(self):
now = datetime.now(UTC)
status = CoPilotUsageStatus(
daily=UsageWindow(used=0, limit=100, resets_at=now + timedelta(hours=1)),
weekly=UsageWindow(used=0, limit=500, resets_at=now + timedelta(days=1)),
)
assert status.tier == SubscriptionTier.FREE
def test_usage_status_with_custom_tier(self):
now = datetime.now(UTC)
status = CoPilotUsageStatus(
daily=UsageWindow(used=0, limit=100, resets_at=now + timedelta(hours=1)),
weekly=UsageWindow(used=0, limit=500, resets_at=now + timedelta(days=1)),
tier=SubscriptionTier.PRO,
)
assert status.tier == SubscriptionTier.PRO
# ---------------------------------------------------------------------------
# get_user_tier
# ---------------------------------------------------------------------------
class TestGetUserTier:
@pytest.fixture(autouse=True)
def _clear_tier_cache(self):
"""Clear the get_user_tier cache before each test."""
get_user_tier.cache_clear() # type: ignore[attr-defined]
@pytest.mark.asyncio
async def test_returns_tier_from_db(self):
"""Should return the tier stored in the user record."""
mock_user = MagicMock()
mock_user.subscriptionTier = "PRO"
mock_prisma = AsyncMock()
mock_prisma.find_unique = AsyncMock(return_value=mock_user)
with patch(
"backend.copilot.rate_limit.PrismaUser.prisma",
return_value=mock_prisma,
):
tier = await get_user_tier(_USER)
assert tier == SubscriptionTier.PRO
@pytest.mark.asyncio
async def test_returns_default_when_user_not_found(self):
"""Should return DEFAULT_TIER when user is not in the DB."""
mock_prisma = AsyncMock()
mock_prisma.find_unique = AsyncMock(return_value=None)
with patch(
"backend.copilot.rate_limit.PrismaUser.prisma",
return_value=mock_prisma,
):
tier = await get_user_tier(_USER)
assert tier == DEFAULT_TIER
@pytest.mark.asyncio
async def test_returns_default_when_tier_is_none(self):
"""Should return DEFAULT_TIER when subscriptionTier is None."""
mock_user = MagicMock()
mock_user.subscriptionTier = None
mock_prisma = AsyncMock()
mock_prisma.find_unique = AsyncMock(return_value=mock_user)
with patch(
"backend.copilot.rate_limit.PrismaUser.prisma",
return_value=mock_prisma,
):
tier = await get_user_tier(_USER)
assert tier == DEFAULT_TIER
@pytest.mark.asyncio
async def test_returns_default_on_db_error(self):
"""Should fall back to DEFAULT_TIER when DB raises."""
mock_prisma = AsyncMock()
mock_prisma.find_unique = AsyncMock(side_effect=Exception("DB down"))
with patch(
"backend.copilot.rate_limit.PrismaUser.prisma",
return_value=mock_prisma,
):
tier = await get_user_tier(_USER)
assert tier == DEFAULT_TIER
@pytest.mark.asyncio
async def test_db_error_is_not_cached(self):
"""Transient DB errors should NOT cache the default tier.
Regression test: a transient DB failure previously cached DEFAULT_TIER
for 5 minutes, incorrectly downgrading higher-tier users until expiry.
"""
failing_prisma = AsyncMock()
failing_prisma.find_unique = AsyncMock(side_effect=Exception("DB down"))
with patch(
"backend.copilot.rate_limit.PrismaUser.prisma",
return_value=failing_prisma,
):
tier1 = await get_user_tier(_USER)
assert tier1 == DEFAULT_TIER
# Now DB recovers and returns PRO
mock_user = MagicMock()
mock_user.subscriptionTier = "PRO"
ok_prisma = AsyncMock()
ok_prisma.find_unique = AsyncMock(return_value=mock_user)
with patch(
"backend.copilot.rate_limit.PrismaUser.prisma",
return_value=ok_prisma,
):
tier2 = await get_user_tier(_USER)
# Should get PRO now — the error result was not cached
assert tier2 == SubscriptionTier.PRO
@pytest.mark.asyncio
async def test_returns_default_on_invalid_tier_value(self):
"""Should fall back to DEFAULT_TIER when stored value is invalid."""
mock_user = MagicMock()
mock_user.subscriptionTier = "invalid-tier"
mock_prisma = AsyncMock()
mock_prisma.find_unique = AsyncMock(return_value=mock_user)
with patch(
"backend.copilot.rate_limit.PrismaUser.prisma",
return_value=mock_prisma,
):
tier = await get_user_tier(_USER)
assert tier == DEFAULT_TIER
@pytest.mark.asyncio
async def test_user_not_found_is_not_cached(self):
"""Non-existent user should NOT cache DEFAULT_TIER.
Regression test: when ``get_user_tier`` is called before a user record
exists, the DEFAULT_TIER fallback must not be cached. Otherwise, a
newly created user with a higher tier (e.g. PRO) would receive the
stale cached FREE tier for up to 5 minutes.
"""
# First call: user does not exist yet
missing_prisma = AsyncMock()
missing_prisma.find_unique = AsyncMock(return_value=None)
with patch(
"backend.copilot.rate_limit.PrismaUser.prisma",
return_value=missing_prisma,
):
tier1 = await get_user_tier(_USER)
assert tier1 == DEFAULT_TIER
# Second call: user now exists with PRO tier
mock_user = MagicMock()
mock_user.subscriptionTier = "PRO"
ok_prisma = AsyncMock()
ok_prisma.find_unique = AsyncMock(return_value=mock_user)
with patch(
"backend.copilot.rate_limit.PrismaUser.prisma",
return_value=ok_prisma,
):
tier2 = await get_user_tier(_USER)
# Should get PRO — the not-found result was not cached
assert tier2 == SubscriptionTier.PRO
# ---------------------------------------------------------------------------
# set_user_tier
# ---------------------------------------------------------------------------
class TestSetUserTier:
@pytest.fixture(autouse=True)
def _clear_tier_cache(self):
"""Clear the get_user_tier cache before each test."""
get_user_tier.cache_clear() # type: ignore[attr-defined]
@pytest.mark.asyncio
async def test_updates_db_and_invalidates_cache(self):
"""set_user_tier should persist to DB and invalidate the tier cache."""
mock_prisma = AsyncMock()
mock_prisma.update = AsyncMock(return_value=None)
with patch(
"backend.copilot.rate_limit.PrismaUser.prisma",
return_value=mock_prisma,
):
await set_user_tier(_USER, SubscriptionTier.PRO)
mock_prisma.update.assert_awaited_once_with(
where={"id": _USER},
data={"subscriptionTier": "PRO"},
)
@pytest.mark.asyncio
async def test_record_not_found_propagates(self):
"""RecordNotFoundError from Prisma should propagate to callers."""
import prisma.errors
mock_prisma = AsyncMock()
mock_prisma.update = AsyncMock(
side_effect=prisma.errors.RecordNotFoundError(
{"error": "Record not found"}
),
)
with patch(
"backend.copilot.rate_limit.PrismaUser.prisma",
return_value=mock_prisma,
):
with pytest.raises(prisma.errors.RecordNotFoundError):
await set_user_tier(_USER, SubscriptionTier.ENTERPRISE)
@pytest.mark.asyncio
async def test_cache_invalidated_after_set(self):
"""After set_user_tier, get_user_tier should query DB again (not cache)."""
# First, populate the cache with BUSINESS
mock_user_biz = MagicMock()
mock_user_biz.subscriptionTier = "BUSINESS"
mock_prisma_get = AsyncMock()
mock_prisma_get.find_unique = AsyncMock(return_value=mock_user_biz)
with patch(
"backend.copilot.rate_limit.PrismaUser.prisma",
return_value=mock_prisma_get,
):
tier_before = await get_user_tier(_USER)
assert tier_before == SubscriptionTier.BUSINESS
# Now set tier to ENTERPRISE (this should invalidate the cache)
mock_prisma_set = AsyncMock()
mock_prisma_set.update = AsyncMock(return_value=None)
with patch(
"backend.copilot.rate_limit.PrismaUser.prisma",
return_value=mock_prisma_set,
):
await set_user_tier(_USER, SubscriptionTier.ENTERPRISE)
# Now get_user_tier should hit DB again (cache was invalidated)
mock_user_ent = MagicMock()
mock_user_ent.subscriptionTier = "ENTERPRISE"
mock_prisma_get2 = AsyncMock()
mock_prisma_get2.find_unique = AsyncMock(return_value=mock_user_ent)
with patch(
"backend.copilot.rate_limit.PrismaUser.prisma",
return_value=mock_prisma_get2,
):
tier_after = await get_user_tier(_USER)
assert tier_after == SubscriptionTier.ENTERPRISE
# ---------------------------------------------------------------------------
# get_global_rate_limits with tiers
# ---------------------------------------------------------------------------
class TestGetGlobalRateLimitsWithTiers:
@staticmethod
def _ld_side_effect(daily: int, weekly: int):
"""Return an async side_effect that dispatches by flag_key."""
async def _side_effect(flag_key: str, _uid: str, default: int) -> int:
if "daily" in flag_key.lower():
return daily
if "weekly" in flag_key.lower():
return weekly
return default
return _side_effect
@pytest.mark.asyncio
async def test_free_tier_no_multiplier(self):
"""Free tier should not change limits."""
with (
patch(
"backend.copilot.rate_limit.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.FREE,
),
patch(
"backend.util.feature_flag.get_feature_flag_value",
side_effect=self._ld_side_effect(2_500_000, 12_500_000),
),
):
daily, weekly, tier = await get_global_rate_limits(
_USER, 2_500_000, 12_500_000
)
assert daily == 2_500_000
assert weekly == 12_500_000
assert tier == SubscriptionTier.FREE
@pytest.mark.asyncio
async def test_pro_tier_5x_multiplier(self):
"""Pro tier should multiply limits by 5."""
with (
patch(
"backend.copilot.rate_limit.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.PRO,
),
patch(
"backend.util.feature_flag.get_feature_flag_value",
side_effect=self._ld_side_effect(2_500_000, 12_500_000),
),
):
daily, weekly, tier = await get_global_rate_limits(
_USER, 2_500_000, 12_500_000
)
assert daily == 12_500_000
assert weekly == 62_500_000
assert tier == SubscriptionTier.PRO
@pytest.mark.asyncio
async def test_business_tier_20x_multiplier(self):
"""Business tier should multiply limits by 20."""
with (
patch(
"backend.copilot.rate_limit.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.BUSINESS,
),
patch(
"backend.util.feature_flag.get_feature_flag_value",
side_effect=self._ld_side_effect(2_500_000, 12_500_000),
),
):
daily, weekly, tier = await get_global_rate_limits(
_USER, 2_500_000, 12_500_000
)
assert daily == 50_000_000
assert weekly == 250_000_000
assert tier == SubscriptionTier.BUSINESS
@pytest.mark.asyncio
async def test_enterprise_tier_60x_multiplier(self):
"""Enterprise tier should multiply limits by 60."""
with (
patch(
"backend.copilot.rate_limit.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.ENTERPRISE,
),
patch(
"backend.util.feature_flag.get_feature_flag_value",
side_effect=self._ld_side_effect(2_500_000, 12_500_000),
),
):
daily, weekly, tier = await get_global_rate_limits(
_USER, 2_500_000, 12_500_000
)
assert daily == 150_000_000
assert weekly == 750_000_000
assert tier == SubscriptionTier.ENTERPRISE
# ---------------------------------------------------------------------------
# End-to-end: tier limits are respected by check_rate_limit
# ---------------------------------------------------------------------------
class TestTierLimitsRespected:
"""Verify that tier-adjusted limits from get_global_rate_limits flow
correctly into check_rate_limit, so higher tiers allow more usage and
lower tiers are blocked when they would exceed their allocation."""
_BASE_DAILY = 2_500_000
_BASE_WEEKLY = 12_500_000
@staticmethod
def _ld_side_effect(daily: int, weekly: int):
async def _side_effect(flag_key: str, _uid: str, default: int) -> int:
if "daily" in flag_key.lower():
return daily
if "weekly" in flag_key.lower():
return weekly
return default
return _side_effect
@pytest.mark.asyncio
async def test_pro_user_allowed_above_free_limit(self):
"""A PRO user with usage above the FREE limit should be allowed."""
# Usage: 3M tokens (above FREE limit of 2.5M, below PRO limit of 12.5M)
mock_redis = AsyncMock()
mock_redis.get = AsyncMock(side_effect=["3000000", "3000000"])
with (
patch(
"backend.copilot.rate_limit.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.PRO,
),
patch(
"backend.util.feature_flag.get_feature_flag_value",
side_effect=self._ld_side_effect(self._BASE_DAILY, self._BASE_WEEKLY),
),
patch(
"backend.copilot.rate_limit.get_redis_async",
return_value=mock_redis,
),
):
daily, weekly, tier = await get_global_rate_limits(
_USER, self._BASE_DAILY, self._BASE_WEEKLY
)
# PRO: 5x multiplier
assert daily == 12_500_000
assert tier == SubscriptionTier.PRO
# Should NOT raise — 3M < 12.5M
await check_rate_limit(
_USER, daily_token_limit=daily, weekly_token_limit=weekly
)
@pytest.mark.asyncio
async def test_free_user_blocked_at_free_limit(self):
"""A FREE user at or above the base limit should be blocked."""
# Usage: 2.5M tokens (at FREE limit of 2.5M)
mock_redis = AsyncMock()
mock_redis.get = AsyncMock(side_effect=["2500000", "2500000"])
with (
patch(
"backend.copilot.rate_limit.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.FREE,
),
patch(
"backend.util.feature_flag.get_feature_flag_value",
side_effect=self._ld_side_effect(self._BASE_DAILY, self._BASE_WEEKLY),
),
patch(
"backend.copilot.rate_limit.get_redis_async",
return_value=mock_redis,
),
):
daily, weekly, tier = await get_global_rate_limits(
_USER, self._BASE_DAILY, self._BASE_WEEKLY
)
# FREE: 1x multiplier
assert daily == 2_500_000
assert tier == SubscriptionTier.FREE
# Should raise — 2.5M >= 2.5M
with pytest.raises(RateLimitExceeded):
await check_rate_limit(
_USER, daily_token_limit=daily, weekly_token_limit=weekly
)
@pytest.mark.asyncio
async def test_enterprise_user_has_highest_headroom(self):
"""An ENTERPRISE user should have 60x the base limit."""
# Usage: 100M tokens (huge, but below ENTERPRISE daily of 150M)
mock_redis = AsyncMock()
mock_redis.get = AsyncMock(side_effect=["100000000", "100000000"])
with (
patch(
"backend.copilot.rate_limit.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.ENTERPRISE,
),
patch(
"backend.util.feature_flag.get_feature_flag_value",
side_effect=self._ld_side_effect(self._BASE_DAILY, self._BASE_WEEKLY),
),
patch(
"backend.copilot.rate_limit.get_redis_async",
return_value=mock_redis,
),
):
daily, weekly, tier = await get_global_rate_limits(
_USER, self._BASE_DAILY, self._BASE_WEEKLY
)
assert daily == 150_000_000
assert tier == SubscriptionTier.ENTERPRISE
# Should NOT raise — 100M < 150M
await check_rate_limit(
_USER, daily_token_limit=daily, weekly_token_limit=weekly
)
# ---------------------------------------------------------------------------
# reset_daily_usage
# ---------------------------------------------------------------------------
class TestResetDailyUsage:
@staticmethod
def _make_pipeline_mock(decrby_result: int = 0) -> MagicMock:
"""Create a pipeline mock that returns [delete_result, decrby_result]."""
pipe = MagicMock()
pipe.execute = AsyncMock(return_value=[1, decrby_result])
return pipe
@pytest.mark.asyncio
async def test_deletes_daily_key(self):
mock_pipe = self._make_pipeline_mock(decrby_result=0)
mock_redis = AsyncMock()
mock_redis.pipeline = lambda **_kw: mock_pipe
with patch(
"backend.copilot.rate_limit.get_redis_async",
return_value=mock_redis,
):
result = await reset_daily_usage(_USER, daily_token_limit=10000)
assert result is True
mock_pipe.delete.assert_called_once()
@pytest.mark.asyncio
async def test_reduces_weekly_usage_via_decrby(self):
"""Weekly counter should be reduced via DECRBY in the pipeline."""
mock_pipe = self._make_pipeline_mock(decrby_result=35000)
mock_redis = AsyncMock()
mock_redis.pipeline = lambda **_kw: mock_pipe
with patch(
"backend.copilot.rate_limit.get_redis_async",
return_value=mock_redis,
):
await reset_daily_usage(_USER, daily_token_limit=10000)
mock_pipe.decrby.assert_called_once()
mock_redis.set.assert_not_called() # 35000 > 0, no clamp needed
@pytest.mark.asyncio
async def test_clamps_negative_weekly_to_zero(self):
"""If DECRBY goes negative, SET to 0 (outside the pipeline)."""
mock_pipe = self._make_pipeline_mock(decrby_result=-5000)
mock_redis = AsyncMock()
mock_redis.pipeline = lambda **_kw: mock_pipe
with patch(
"backend.copilot.rate_limit.get_redis_async",
return_value=mock_redis,
):
await reset_daily_usage(_USER, daily_token_limit=10000)
mock_pipe.decrby.assert_called_once()
mock_redis.set.assert_called_once()
@pytest.mark.asyncio
async def test_no_weekly_reduction_when_daily_limit_zero(self):
"""When daily_token_limit is 0, weekly counter should not be touched."""
mock_pipe = self._make_pipeline_mock()
mock_pipe.execute = AsyncMock(return_value=[1]) # only delete result
mock_redis = AsyncMock()
mock_redis.pipeline = lambda **_kw: mock_pipe
with patch(
"backend.copilot.rate_limit.get_redis_async",
return_value=mock_redis,
):
await reset_daily_usage(_USER, daily_token_limit=0)
mock_pipe.delete.assert_called_once()
mock_pipe.decrby.assert_not_called()
@pytest.mark.asyncio
async def test_returns_false_when_redis_unavailable(self):
with patch(
"backend.copilot.rate_limit.get_redis_async",
side_effect=ConnectionError("Redis down"),
):
result = await reset_daily_usage(_USER, daily_token_limit=10000)
assert result is False
# ---------------------------------------------------------------------------
# Tier-limit enforcement (integration-style)
# ---------------------------------------------------------------------------
class TestTierLimitsEnforced:
"""Verify that tier-multiplied limits are actually respected by
``check_rate_limit`` — i.e. that usage within the tier allowance passes
and usage at/above the tier allowance is rejected."""
_BASE_DAILY = 1_000_000
_BASE_WEEKLY = 5_000_000
@staticmethod
def _ld_side_effect(daily: int, weekly: int):
"""Mock LD flag lookup returning the given raw limits."""
async def _side_effect(flag_key: str, _uid: str, default: int) -> int:
if "daily" in flag_key.lower():
return daily
if "weekly" in flag_key.lower():
return weekly
return default
return _side_effect
@pytest.mark.asyncio
async def test_pro_within_limit_allowed(self):
"""Usage under PRO daily limit should not raise."""
pro_daily = self._BASE_DAILY * TIER_MULTIPLIERS[SubscriptionTier.PRO]
mock_redis = AsyncMock()
# Simulate usage just under the PRO daily limit
mock_redis.get = AsyncMock(side_effect=[str(pro_daily - 1), "0"])
with (
patch(
"backend.copilot.rate_limit.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.PRO,
),
patch(
"backend.util.feature_flag.get_feature_flag_value",
side_effect=self._ld_side_effect(self._BASE_DAILY, self._BASE_WEEKLY),
),
patch(
"backend.copilot.rate_limit.get_redis_async",
return_value=mock_redis,
),
):
daily, weekly, tier = await get_global_rate_limits(
_USER, self._BASE_DAILY, self._BASE_WEEKLY
)
assert tier == SubscriptionTier.PRO
assert daily == pro_daily
# Should not raise — usage is under the limit
await check_rate_limit(_USER, daily, weekly)
@pytest.mark.asyncio
async def test_pro_at_limit_rejected(self):
"""Usage at exactly the PRO daily limit should raise."""
pro_daily = self._BASE_DAILY * TIER_MULTIPLIERS[SubscriptionTier.PRO]
mock_redis = AsyncMock()
mock_redis.get = AsyncMock(side_effect=[str(pro_daily), "0"])
with (
patch(
"backend.copilot.rate_limit.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.PRO,
),
patch(
"backend.util.feature_flag.get_feature_flag_value",
side_effect=self._ld_side_effect(self._BASE_DAILY, self._BASE_WEEKLY),
),
patch(
"backend.copilot.rate_limit.get_redis_async",
return_value=mock_redis,
),
):
daily, weekly, tier = await get_global_rate_limits(
_USER, self._BASE_DAILY, self._BASE_WEEKLY
)
with pytest.raises(RateLimitExceeded) as exc_info:
await check_rate_limit(_USER, daily, weekly)
assert exc_info.value.window == "daily"
@pytest.mark.asyncio
async def test_business_higher_limit_allows_pro_overflow(self):
"""Usage exceeding PRO but under BUSINESS should pass for BUSINESS."""
pro_daily = self._BASE_DAILY * TIER_MULTIPLIERS[SubscriptionTier.PRO]
biz_daily = self._BASE_DAILY * TIER_MULTIPLIERS[SubscriptionTier.BUSINESS]
# Usage between PRO and BUSINESS limits
usage = pro_daily + 1_000_000
assert usage < biz_daily, "test sanity: usage must be under BUSINESS limit"
mock_redis = AsyncMock()
mock_redis.get = AsyncMock(side_effect=[str(usage), "0"])
with (
patch(
"backend.copilot.rate_limit.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.BUSINESS,
),
patch(
"backend.util.feature_flag.get_feature_flag_value",
side_effect=self._ld_side_effect(self._BASE_DAILY, self._BASE_WEEKLY),
),
patch(
"backend.copilot.rate_limit.get_redis_async",
return_value=mock_redis,
),
):
daily, weekly, tier = await get_global_rate_limits(
_USER, self._BASE_DAILY, self._BASE_WEEKLY
)
assert tier == SubscriptionTier.BUSINESS
assert daily == biz_daily
# Should not raise — BUSINESS tier can handle this
await check_rate_limit(_USER, daily, weekly)
@pytest.mark.asyncio
async def test_weekly_limit_enforced_for_tier(self):
"""Weekly limit should also be tier-multiplied and enforced."""
pro_weekly = self._BASE_WEEKLY * TIER_MULTIPLIERS[SubscriptionTier.PRO]
mock_redis = AsyncMock()
# Daily usage fine, weekly at limit
mock_redis.get = AsyncMock(side_effect=["0", str(pro_weekly)])
with (
patch(
"backend.copilot.rate_limit.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.PRO,
),
patch(
"backend.util.feature_flag.get_feature_flag_value",
side_effect=self._ld_side_effect(self._BASE_DAILY, self._BASE_WEEKLY),
),
patch(
"backend.copilot.rate_limit.get_redis_async",
return_value=mock_redis,
),
):
daily, weekly, tier = await get_global_rate_limits(
_USER, self._BASE_DAILY, self._BASE_WEEKLY
)
with pytest.raises(RateLimitExceeded) as exc_info:
await check_rate_limit(_USER, daily, weekly)
assert exc_info.value.window == "weekly"
@pytest.mark.asyncio
async def test_free_tier_base_limit_enforced(self):
"""Free tier (1x multiplier) should enforce the base limit exactly."""
mock_redis = AsyncMock()
mock_redis.get = AsyncMock(side_effect=[str(self._BASE_DAILY), "0"])
with (
patch(
"backend.copilot.rate_limit.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.FREE,
),
patch(
"backend.util.feature_flag.get_feature_flag_value",
side_effect=self._ld_side_effect(self._BASE_DAILY, self._BASE_WEEKLY),
),
patch(
"backend.copilot.rate_limit.get_redis_async",
return_value=mock_redis,
),
):
daily, weekly, tier = await get_global_rate_limits(
_USER, self._BASE_DAILY, self._BASE_WEEKLY
)
assert daily == self._BASE_DAILY # 1x multiplier
with pytest.raises(RateLimitExceeded):
await check_rate_limit(_USER, daily, weekly)
@pytest.mark.asyncio
async def test_free_tier_cannot_bypass_pro_limit(self):
"""A FREE-tier user whose usage is within PRO limits but over FREE
limits must still be rejected.
Negative test: ensures the tier multiplier is applied *before* the
rate-limit check, so a lower-tier user cannot 'bypass' limits that
would be acceptable for a higher tier.
"""
free_daily = self._BASE_DAILY * TIER_MULTIPLIERS[SubscriptionTier.FREE]
pro_daily = self._BASE_DAILY * TIER_MULTIPLIERS[SubscriptionTier.PRO]
# Usage above FREE limit but below PRO limit
usage = free_daily + 500_000
assert usage < pro_daily, "test sanity: usage must be under PRO limit"
mock_redis = AsyncMock()
mock_redis.get = AsyncMock(side_effect=[str(usage), "0"])
with (
patch(
"backend.copilot.rate_limit.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.FREE,
),
patch(
"backend.util.feature_flag.get_feature_flag_value",
side_effect=self._ld_side_effect(self._BASE_DAILY, self._BASE_WEEKLY),
),
patch(
"backend.copilot.rate_limit.get_redis_async",
return_value=mock_redis,
),
):
daily, weekly, tier = await get_global_rate_limits(
_USER, self._BASE_DAILY, self._BASE_WEEKLY
)
assert tier == SubscriptionTier.FREE
assert daily == free_daily # 1x, not 5x
with pytest.raises(RateLimitExceeded) as exc_info:
await check_rate_limit(_USER, daily, weekly)
assert exc_info.value.window == "daily"
@pytest.mark.asyncio
async def test_tier_change_updates_effective_limits(self):
"""After upgrading from FREE to BUSINESS, the effective limits must
increase accordingly.
Verifies that the tier multiplier is correctly applied after a tier
change, and that usage that was over the FREE limit is within the new
BUSINESS limit.
"""
free_daily = self._BASE_DAILY * TIER_MULTIPLIERS[SubscriptionTier.FREE]
biz_daily = self._BASE_DAILY * TIER_MULTIPLIERS[SubscriptionTier.BUSINESS]
# Usage above FREE limit but below BUSINESS limit
usage = free_daily + 500_000
assert usage < biz_daily, "test sanity: usage must be under BUSINESS limit"
mock_redis = AsyncMock()
mock_redis.get = AsyncMock(side_effect=[str(usage), "0"])
# Simulate the user having been upgraded to BUSINESS
with (
patch(
"backend.copilot.rate_limit.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.BUSINESS,
),
patch(
"backend.util.feature_flag.get_feature_flag_value",
side_effect=self._ld_side_effect(self._BASE_DAILY, self._BASE_WEEKLY),
),
patch(
"backend.copilot.rate_limit.get_redis_async",
return_value=mock_redis,
),
):
daily, weekly, tier = await get_global_rate_limits(
_USER, self._BASE_DAILY, self._BASE_WEEKLY
)
assert tier == SubscriptionTier.BUSINESS
assert daily == biz_daily # 20x
# Should NOT raise — usage is within the BUSINESS tier allowance
await check_rate_limit(_USER, daily, weekly)
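
The tier tests above all rely on one rule: the effective limit is the base limit times a per-tier multiplier, and usage that reaches that limit is rejected. A minimal sketch of the rule follows. `Tier`, `MULTIPLIERS`, `effective_limit` and `is_over_limit` are hypothetical stand-ins, not the real `backend.copilot.rate_limit` API. The 1x/5x/20x values come from the test comments, and the 2,500,000 base in the usage note is an assumption.

```python
from enum import Enum


class Tier(Enum):
    """Hypothetical stand-in for SubscriptionTier."""

    FREE = "FREE"
    PRO = "PRO"
    BUSINESS = "BUSINESS"


# Multipliers as stated in the test comments (1x, 5x, 20x).
# The real TIER_MULTIPLIERS table may contain other tiers.
MULTIPLIERS = {Tier.FREE: 1, Tier.PRO: 5, Tier.BUSINESS: 20}


def effective_limit(base: int, tier: Tier) -> int:
    """Scale the base token limit by the tier multiplier."""
    return base * MULTIPLIERS[tier]


def is_over_limit(used: int, base: int, tier: Tier) -> bool:
    """True once usage has reached the tier-adjusted limit."""
    return used >= effective_limit(base, tier)
```

With a base of 2,500,000 tokens, usage of 3,000,000 is over the FREE limit but within the PRO and BUSINESS limits, which is the situation the negative and upgrade tests construct.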


@@ -0,0 +1,315 @@
"""Unit tests for the POST /usage/reset endpoint."""
from __future__ import annotations
from datetime import UTC, datetime, timedelta
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from fastapi import HTTPException
from backend.api.features.chat.routes import reset_copilot_usage
from backend.copilot.rate_limit import CoPilotUsageStatus, SubscriptionTier, UsageWindow
from backend.util.exceptions import InsufficientBalanceError
# Minimal config mock matching ChatConfig fields used by the endpoint.
def _make_config(
rate_limit_reset_cost: int = 500,
daily_token_limit: int = 2_500_000,
weekly_token_limit: int = 12_500_000,
max_daily_resets: int = 5,
):
cfg = MagicMock()
cfg.rate_limit_reset_cost = rate_limit_reset_cost
cfg.daily_token_limit = daily_token_limit
cfg.weekly_token_limit = weekly_token_limit
cfg.max_daily_resets = max_daily_resets
return cfg
def _usage(daily_used: int = 3_000_000, daily_limit: int = 2_500_000):
return CoPilotUsageStatus(
daily=UsageWindow(
used=daily_used,
limit=daily_limit,
resets_at=datetime.now(UTC) + timedelta(hours=6),
),
weekly=UsageWindow(
used=5_000_000,
limit=12_500_000,
resets_at=datetime.now(UTC) + timedelta(days=3),
),
)
_MODULE = "backend.api.features.chat.routes"
def _mock_settings(enable_credit: bool = True):
"""Return a mock Settings object with the given enable_credit flag."""
mock = MagicMock()
mock.config.enable_credit = enable_credit
return mock
def _mock_rate_limits(
daily: int = 2_500_000,
weekly: int = 12_500_000,
tier: SubscriptionTier = SubscriptionTier.PRO,
):
"""Mock get_global_rate_limits to return fixed limits (no tier multiplier)."""
return patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(daily, weekly, tier)),
)
@pytest.mark.asyncio
class TestResetCopilotUsage:
async def test_feature_disabled_returns_400(self):
"""When rate_limit_reset_cost=0, endpoint returns 400."""
with patch(f"{_MODULE}.config", _make_config(rate_limit_reset_cost=0)):
with pytest.raises(HTTPException) as exc_info:
await reset_copilot_usage(user_id="user-1")
assert exc_info.value.status_code == 400
assert "not available" in exc_info.value.detail
async def test_no_daily_limit_returns_400(self):
"""When daily_token_limit=0 (unlimited), endpoint returns 400."""
with (
patch(f"{_MODULE}.config", _make_config(daily_token_limit=0)),
patch(f"{_MODULE}.settings", _mock_settings()),
_mock_rate_limits(daily=0),
):
with pytest.raises(HTTPException) as exc_info:
await reset_copilot_usage(user_id="user-1")
assert exc_info.value.status_code == 400
assert "nothing to reset" in exc_info.value.detail.lower()
async def test_not_at_limit_returns_400(self):
"""When user hasn't hit their daily limit, returns 400."""
cfg = _make_config()
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
patch(f"{_MODULE}.release_reset_lock", AsyncMock()) as mock_release,
patch(
f"{_MODULE}.get_usage_status",
AsyncMock(return_value=_usage(daily_used=1_000_000)),
),
):
with pytest.raises(HTTPException) as exc_info:
await reset_copilot_usage(user_id="user-1")
assert exc_info.value.status_code == 400
assert "not reached" in exc_info.value.detail
mock_release.assert_awaited_once()
async def test_insufficient_credits_returns_402(self):
"""When user doesn't have enough credits, returns 402."""
mock_credit_model = AsyncMock()
mock_credit_model.spend_credits.side_effect = InsufficientBalanceError(
message="Insufficient balance",
user_id="user-1",
balance=50,
amount=200,
)
cfg = _make_config()
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
patch(f"{_MODULE}.release_reset_lock", AsyncMock()) as mock_release,
patch(
f"{_MODULE}.get_usage_status",
AsyncMock(return_value=_usage()),
),
patch(
f"{_MODULE}.get_user_credit_model",
AsyncMock(return_value=mock_credit_model),
),
):
with pytest.raises(HTTPException) as exc_info:
await reset_copilot_usage(user_id="user-1")
assert exc_info.value.status_code == 402
mock_release.assert_awaited_once()
async def test_happy_path(self):
"""Successful reset: charges credits, resets usage, returns response."""
mock_credit_model = AsyncMock()
mock_credit_model.spend_credits.return_value = 1500 # remaining balance
cfg = _make_config()
updated_usage = _usage(daily_used=0)
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
patch(f"{_MODULE}.release_reset_lock", AsyncMock()),
patch(
f"{_MODULE}.get_usage_status",
AsyncMock(side_effect=[_usage(), updated_usage]),
),
patch(
f"{_MODULE}.get_user_credit_model",
AsyncMock(return_value=mock_credit_model),
),
patch(
f"{_MODULE}.reset_daily_usage", AsyncMock(return_value=True)
) as mock_reset,
patch(f"{_MODULE}.increment_daily_reset_count", AsyncMock()) as mock_incr,
):
result = await reset_copilot_usage(user_id="user-1")
assert result.success is True
assert result.credits_charged == 500
assert result.remaining_balance == 1500
mock_reset.assert_awaited_once()
mock_incr.assert_awaited_once()
async def test_max_daily_resets_exceeded(self):
"""When user has exhausted daily resets, returns 429."""
cfg = _make_config(max_daily_resets=3)
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=3)),
):
with pytest.raises(HTTPException) as exc_info:
await reset_copilot_usage(user_id="user-1")
assert exc_info.value.status_code == 429
async def test_credit_system_disabled_returns_400(self):
"""When enable_credit=False, endpoint returns 400."""
with (
patch(f"{_MODULE}.config", _make_config()),
patch(f"{_MODULE}.settings", _mock_settings(enable_credit=False)),
):
with pytest.raises(HTTPException) as exc_info:
await reset_copilot_usage(user_id="user-1")
assert exc_info.value.status_code == 400
assert "credit system is disabled" in exc_info.value.detail.lower()
async def test_weekly_limit_exhausted_returns_400(self):
"""When the weekly limit is also exhausted, resetting daily won't help."""
cfg = _make_config()
weekly_exhausted = CoPilotUsageStatus(
daily=UsageWindow(
used=3_000_000,
limit=2_500_000,
resets_at=datetime.now(UTC) + timedelta(hours=6),
),
weekly=UsageWindow(
used=12_500_000,
limit=12_500_000,
resets_at=datetime.now(UTC) + timedelta(days=3),
),
)
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
patch(f"{_MODULE}.release_reset_lock", AsyncMock()) as mock_release,
patch(
f"{_MODULE}.get_usage_status",
AsyncMock(return_value=weekly_exhausted),
),
):
with pytest.raises(HTTPException) as exc_info:
await reset_copilot_usage(user_id="user-1")
assert exc_info.value.status_code == 400
assert "weekly" in exc_info.value.detail.lower()
mock_release.assert_awaited_once()
async def test_redis_failure_for_reset_count_returns_503(self):
"""When Redis is unavailable for get_daily_reset_count, returns 503."""
with (
patch(f"{_MODULE}.config", _make_config()),
patch(f"{_MODULE}.settings", _mock_settings()),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=None)),
):
with pytest.raises(HTTPException) as exc_info:
await reset_copilot_usage(user_id="user-1")
assert exc_info.value.status_code == 503
assert "verify" in exc_info.value.detail.lower()
async def test_redis_reset_failure_refunds_credits(self):
"""When reset_daily_usage fails, credits are refunded and 503 returned."""
mock_credit_model = AsyncMock()
mock_credit_model.spend_credits.return_value = 1500
cfg = _make_config()
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
patch(f"{_MODULE}.release_reset_lock", AsyncMock()),
patch(
f"{_MODULE}.get_usage_status",
AsyncMock(return_value=_usage()),
),
patch(
f"{_MODULE}.get_user_credit_model",
AsyncMock(return_value=mock_credit_model),
),
patch(f"{_MODULE}.reset_daily_usage", AsyncMock(return_value=False)),
):
with pytest.raises(HTTPException) as exc_info:
await reset_copilot_usage(user_id="user-1")
assert exc_info.value.status_code == 503
assert "not been charged" in exc_info.value.detail
mock_credit_model.top_up_credits.assert_awaited_once()
async def test_redis_reset_failure_refund_also_fails(self):
"""When both reset and refund fail, error message reflects the truth."""
mock_credit_model = AsyncMock()
mock_credit_model.spend_credits.return_value = 1500
mock_credit_model.top_up_credits.side_effect = RuntimeError("db down")
cfg = _make_config()
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
patch(f"{_MODULE}.release_reset_lock", AsyncMock()),
patch(
f"{_MODULE}.get_usage_status",
AsyncMock(return_value=_usage()),
),
patch(
f"{_MODULE}.get_user_credit_model",
AsyncMock(return_value=mock_credit_model),
),
patch(f"{_MODULE}.reset_daily_usage", AsyncMock(return_value=False)),
):
with pytest.raises(HTTPException) as exc_info:
await reset_copilot_usage(user_id="user-1")
assert exc_info.value.status_code == 503
assert "contact support" in exc_info.value.detail.lower()
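
Read together, the tests above pin down a status code for each failure scenario of the reset endpoint. The sketch below collects those expectations into one pure function. It is only a reading aid: `reset_status` is a hypothetical helper, the order of checks is inferred from which dependencies each test patches, and the real `reset_copilot_usage` may differ. The refund path (503 after a failed Redis reset) is left out because it depends on side effects rather than inputs.

```python
from __future__ import annotations


def reset_status(
    *,
    reset_cost: int,
    credits_enabled: bool,
    daily_limit: int,
    resets_today: int | None,
    max_resets: int,
    daily_used: int,
    weekly_used: int,
    weekly_limit: int,
    balance: int,
) -> int:
    """Return the HTTP status the tests expect for a given scenario."""
    if reset_cost <= 0:
        return 400  # feature not available
    if not credits_enabled:
        return 400  # credit system is disabled
    if daily_limit <= 0:
        return 400  # unlimited, so nothing to reset
    if resets_today is None:
        return 503  # reset count could not be verified
    if resets_today >= max_resets:
        return 429  # daily resets exhausted
    if daily_used < daily_limit:
        return 400  # daily limit not reached
    # Assumption: a weekly limit of 0 means "unlimited", like the daily one.
    if weekly_limit > 0 and weekly_used >= weekly_limit:
        return 400  # resetting daily usage would not help
    if balance < reset_cost:
        return 402  # insufficient credits
    return 200
```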


@@ -3,26 +3,62 @@
You can create, edit, and customize agents directly. You ARE the brain —
generate the agent JSON yourself using block schemas, then validate and save.
### Clarifying — Before or During Building
Use `ask_question` whenever the user's intent is ambiguous — whether
that's before starting or midway through the workflow. Common moments:
- **Before building**: output format, delivery channel, data source, or
trigger is unspecified.
- **During block discovery**: multiple blocks could fit and the user
should choose.
- **During JSON generation**: a wiring decision depends on user
preference.
Steps:
1. Call `find_block` (or another discovery tool) to learn what the
platform actually supports for the ambiguous dimension.
2. Call `ask_question` with a concrete question listing the discovered
options (e.g. "The platform supports Gmail, Slack, and Google Docs —
which should the agent use for delivery?").
3. **Wait for the user's answer** before continuing.
**Skip this** when the goal already specifies all dimensions (e.g.
"scrape prices from Amazon and email me daily").
### Workflow for Creating/Editing Agents
1. **Discover blocks**: Call `find_block(query, include_schemas=true)` to
1. **If editing**: First narrow to the specific agent by UUID, then fetch its
graph: `find_library_agent(query="<agent_id>", include_graph=true)`. This
returns the full graph structure (nodes + links). **Never edit blindly**:
always inspect the current graph first so you know exactly what to change.
Avoid using `include_graph=true` with broad keyword searches, as fetching
multiple graphs at once is expensive and consumes LLM context budget.
2. **Discover blocks**: Call `find_block(query, include_schemas=true)` to
search for relevant blocks. This returns block IDs, names, descriptions,
and full input/output schemas.
2. **Find library agents**: Call `find_library_agent` to discover reusable
3. **Find library agents**: Call `find_library_agent` to discover reusable
agents that can be composed as sub-agents via `AgentExecutorBlock`.
3. **Generate JSON**: Build the agent JSON using block schemas:
- Use block IDs from step 1 as `block_id` in nodes
4. **Generate/modify JSON**: Build or modify the agent JSON using block schemas:
- Use block IDs from step 2 as `block_id` in nodes
- Wire outputs to inputs using links
- Set design-time config in `input_default`
- Use `AgentInputBlock` for values the user provides at runtime
4. **Write to workspace**: Save the JSON to a workspace file so the user
- When editing, apply targeted changes and preserve unchanged parts
5. **Write to workspace**: Save the JSON to a workspace file so the user
can review it: `write_workspace_file(filename="agent.json", content=...)`
5. **Validate**: Call `validate_agent_graph` with the agent JSON to check
6. **Validate**: Call `validate_agent_graph` with the agent JSON to check
for errors
6. **Fix if needed**: Call `fix_agent_graph` to auto-fix common issues,
7. **Fix if needed**: Call `fix_agent_graph` to auto-fix common issues,
or fix manually based on the error descriptions. Iterate until valid.
7. **Save**: Call `create_agent` (new) or `edit_agent` (existing) with
8. **Save**: Call `create_agent` (new) or `edit_agent` (existing) with
the final `agent_json`
8. **Dry-run**: ALWAYS call `run_agent` with `dry_run=True` and
`wait_for_result=120` to verify the agent works end-to-end.
9. **Inspect & fix**: Check the dry-run output for errors. If issues are
found, call `edit_agent` to fix and dry-run again. Repeat until the
simulation passes or the problems are clearly unfixable.
See "REQUIRED: Dry-Run Verification Loop" section below for details.
### Agent JSON Structure
@@ -67,9 +103,17 @@ These define the agent's interface — what it accepts and what it produces.
**AgentInputBlock** (ID: `c0a8e994-ebf1-4a9c-a4d8-89d09c86741b`):
- Defines a user-facing input field on the agent
- Required `input_default` fields: `name` (str), `value` (default: null)
- Optional: `title`, `description`, `placeholder_values` (for dropdowns)
- Optional: `title`, `description`
- Output: `result` — the user-provided value at runtime
- Create one AgentInputBlock per distinct input the agent needs
- For dropdown/select inputs, use **AgentDropdownInputBlock** instead (see below)
**AgentDropdownInputBlock** (ID: `655d6fdf-a334-421c-b733-520549c07cd1`):
- Specialized input block that presents a dropdown/select to the user
- Required `input_default` fields: `name` (str)
- Optional: `options` (list of dropdown values; when omitted/empty, input behaves as free-text), `title`, `description`, `value` (default selection)
- Output: `result` — the user-selected value at runtime
- Use this instead of AgentInputBlock when the user should pick from a fixed set of options
**AgentOutputBlock** (ID: `363ae599-353e-4804-937e-b2ee3cef3da4`):
- Defines a user-facing output displayed after the agent runs
@@ -208,19 +252,62 @@ call in a loop until the task is complete:
Regular blocks work exactly like sub-agents as tools — wire each input
field from `source_name: "tools"` on the Orchestrator side.
### Testing with Dry Run
### REQUIRED: Dry-Run Verification Loop (create -> dry-run -> fix)
After saving an agent, suggest a dry run to validate wiring without consuming
real API calls, credentials, or credits:
After creating or editing an agent, you MUST dry-run it before telling the
user the agent is ready. NEVER skip this step.
1. **Run**: Call `run_agent` or `run_block` with `dry_run=True` and provide
sample inputs. This executes the graph with mock outputs, verifying that
links resolve correctly and required inputs are satisfied.
2. **Check results**: Call `view_agent_output` with `show_execution_details=True`
to inspect the full node-by-node execution trace. This shows what each node
received as input and produced as output, making it easy to spot wiring issues.
3. **Iterate**: If the dry run reveals wiring issues or missing inputs, fix
the agent JSON and re-save before suggesting a real execution.
#### Step-by-step workflow
1. **Create/Edit**: Call `create_agent` or `edit_agent` to save the agent.
2. **Dry-run**: Call `run_agent` with `dry_run=True`, `wait_for_result=120`,
and realistic sample inputs that exercise every path in the agent. This
simulates execution using an LLM for each block — no real API calls,
credentials, or credits are consumed.
3. **Inspect output**: Examine the dry-run result for problems. If
`wait_for_result` returns only a summary, call
`view_agent_output(execution_id=..., show_execution_details=True)` to
see the full node-by-node execution trace. Look for:
- **Errors / failed nodes** — a node raised an exception or returned an
error status. Common causes: wrong `source_name`/`sink_name` in links,
missing `input_default` values, or referencing a nonexistent block output.
- **Null / empty outputs** — data did not flow through a link. Verify that
`source_name` and `sink_name` match the block schemas exactly
(case-sensitive, including nested `_#_` notation).
- **Nodes that never executed** — the node was not reached. Likely a
missing or broken link from an upstream node.
- **Unexpected values** — data arrived but in the wrong type or
structure. Check type compatibility between linked ports.
4. **Fix**: If any issues are found, call `edit_agent` with the corrected
agent JSON, then go back to step 2.
5. **Repeat**: Continue the dry-run -> fix cycle until the simulation passes
or the problems are clearly unfixable. If you stop making progress,
report the remaining issues to the user and ask for guidance.
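
The dry-run -> fix cycle in the steps above can be sketched as a bounded loop. This is an illustration only: `run_dry` and `fix` are hypothetical callables standing in for the `run_agent(dry_run=True)` and `edit_agent` tool calls, and `max_rounds` is an assumed cap for "stop when you are no longer making progress".

```python
from typing import Callable


def dry_run_until_clean(
    run_dry: Callable[[dict], dict],
    fix: Callable[[dict, dict], dict],
    agent_json: dict,
    max_rounds: int = 3,
) -> tuple[dict, dict]:
    """Repeat dry-run -> fix until the status is COMPLETED or rounds run out."""
    result = run_dry(agent_json)
    for _ in range(max_rounds):
        if result.get("status") == "COMPLETED":
            break
        # Apply a correction based on the failed result, then simulate again.
        agent_json = fix(agent_json, result)
        result = run_dry(agent_json)
    return agent_json, result
```

If the returned result is still not `COMPLETED`, the remaining issues go back to the user instead of into another round.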
#### Good vs bad dry-run output
**Good output** (agent is ready):
- All nodes executed successfully (no errors in the execution trace)
- Data flows through every link with non-null, correctly-typed values
- The final `AgentOutputBlock` contains a meaningful result
- Status is `COMPLETED`
**Bad output** (needs fixing):
- Status is `FAILED` — check the error message for the failing node
- An output node received `null` — trace back to find the broken link
- A node received data in the wrong format (e.g. string where list expected)
- Nodes downstream of a failing node were skipped entirely
**Special block behaviour in dry-run mode:**
- **OrchestratorBlock** and **AgentExecutorBlock** execute for real so the
orchestrator can make LLM calls and agent executors can spawn child graphs.
Their downstream tool blocks and child-graph blocks are still simulated.
Note: real LLM inference calls are made (consuming API quota), even though
platform credits are not charged. Agent-mode iterations are capped at 1 in
dry-run to keep it fast.
- **MCPToolBlock** is simulated using the selected tool's name and JSON Schema
so the LLM can produce a realistic mock response without connecting to the
MCP server.
### Example: Simple AI Text Processor


@@ -25,7 +25,7 @@ from backend.copilot.sdk.compaction import (
def _make_session() -> ChatSession:
return ChatSession.new(user_id="test-user")
return ChatSession.new(user_id="test-user", dry_run=False)
# ---------------------------------------------------------------------------


@@ -2,14 +2,30 @@
from __future__ import annotations
from collections.abc import AsyncIterator
from unittest.mock import patch
from uuid import uuid4
import pytest
import pytest_asyncio
from backend.util import json
@pytest_asyncio.fixture(scope="session", loop_scope="session", name="server")
async def _server_noop() -> None:
"""No-op server stub — SDK tests don't need the full backend."""
return None
@pytest_asyncio.fixture(
scope="session", loop_scope="session", autouse=True, name="graph_cleanup"
)
async def _graph_cleanup_noop() -> AsyncIterator[None]:
"""No-op graph cleanup stub."""
yield
@pytest.fixture()
def mock_chat_config():
"""Mock ChatConfig so compact_transcript tests skip real config lookup."""
@@ -25,24 +41,64 @@ def build_test_transcript(pairs: list[tuple[str, str]]) -> str:
Use this helper in any copilot SDK test that needs a well-formed
transcript without hitting the real storage layer.
Delegates to ``build_structured_transcript`` — plain content strings
are automatically wrapped in ``[{"type": "text", "text": ...}]`` for
assistant messages.
"""
# Cast widening: tuple[str, str] is structurally compatible with
# tuple[str, str | list[dict]] but list invariance requires explicit
# annotation.
widened: list[tuple[str, str | list[dict]]] = list(pairs)
return build_structured_transcript(widened)
def build_structured_transcript(
entries: list[tuple[str, str | list[dict]]],
) -> str:
"""Build a JSONL transcript with structured content blocks.
Each entry is (role, content) where content is either a plain string
(for user messages) or a list of content block dicts (for assistant
messages with thinking/tool_use/text blocks).
Example::
build_structured_transcript([
("user", "Hello"),
("assistant", [
{"type": "thinking", "thinking": "...", "signature": "sig1"},
{"type": "text", "text": "Hi there"},
]),
])
"""
lines: list[str] = []
last_uuid: str | None = None
for role, content in pairs:
for role, content in entries:
uid = str(uuid4())
entry_type = "assistant" if role == "assistant" else "user"
msg: dict = {"role": role, "content": content}
if role == "assistant":
msg.update(
{
"model": "",
"id": f"msg_{uid[:8]}",
"type": "message",
"content": [{"type": "text", "text": content}],
"stop_reason": "end_turn",
"stop_sequence": None,
}
)
if role == "assistant" and isinstance(content, list):
msg: dict = {
"role": "assistant",
"model": "claude-test",
"id": f"msg_{uid[:8]}",
"type": "message",
"content": content,
"stop_reason": "end_turn",
"stop_sequence": None,
}
elif role == "assistant":
msg = {
"role": "assistant",
"model": "claude-test",
"id": f"msg_{uid[:8]}",
"type": "message",
"content": [{"type": "text", "text": content}],
"stop_reason": "end_turn",
"stop_sequence": None,
}
else:
msg = {"role": role, "content": content}
entry = {
"type": entry_type,
"uuid": uid,


@@ -8,6 +8,9 @@ SDK-internal paths (``~/.claude/projects/…/tool-results/``) are handled
by the separate ``Read`` MCP tool registered in ``tool_adapter.py``.
"""
import asyncio
import base64
import hashlib
import itertools
import json
import logging
@@ -28,6 +31,12 @@ from backend.copilot.context import (
logger = logging.getLogger(__name__)
# Default number of lines returned by ``read_file`` when the caller does not
# specify a limit. Also used as the threshold in ``bridge_to_sandbox`` to
# decide whether the model is requesting the full file (and thus whether the
# bridge copy is worthwhile).
_DEFAULT_READ_LIMIT = 2000
async def _check_sandbox_symlink_escape(
sandbox: Any,
@@ -89,7 +98,7 @@ def _get_sandbox_and_path(
return sandbox, remote
async def _sandbox_write(sandbox: Any, path: str, content: str) -> None:
async def _sandbox_write(sandbox: Any, path: str, content: str | bytes) -> None:
"""Write *content* to *path* inside the sandbox.
The E2B filesystem API (``sandbox.files.write``) and the command API
@@ -102,11 +111,14 @@ async def _sandbox_write(sandbox: Any, path: str, content: str) -> None:
To work around this, writes targeting ``/tmp`` are performed via
``tee`` through the command API, which runs as the sandbox ``user``
and can therefore always overwrite user-owned files.
*content* may be ``str`` (text) or ``bytes`` (binary). Both paths
are handled correctly: text is encoded to bytes for the base64 shell
pipe, and raw bytes are passed through without any encoding.
"""
if path == "/tmp" or path.startswith("/tmp/"):
import base64 as _b64
encoded = _b64.b64encode(content.encode()).decode()
raw = content.encode() if isinstance(content, str) else content
encoded = base64.b64encode(raw).decode()
result = await sandbox.commands.run(
f"echo {shlex.quote(encoded)} | base64 -d > {shlex.quote(path)}",
cwd=E2B_WORKDIR,
@@ -128,14 +140,25 @@ async def _handle_read_file(args: dict[str, Any]) -> dict[str, Any]:
"""Read lines from a sandbox file, falling back to the local host for SDK-internal paths."""
file_path: str = args.get("file_path", "")
offset: int = max(0, int(args.get("offset", 0)))
limit: int = max(1, int(args.get("limit", 2000)))
limit: int = max(1, int(args.get("limit", _DEFAULT_READ_LIMIT)))
if not file_path:
return _mcp("file_path is required", error=True)
# SDK-internal paths (tool-results, ephemeral working dir) stay on the host.
# SDK-internal paths (tool-results/tool-outputs, ephemeral working dir)
# stay on the host. When E2B is active, also copy the file into the
# sandbox so bash_exec can access it for further processing.
if _is_allowed_local(file_path):
return _read_local(file_path, offset, limit)
result = _read_local(file_path, offset, limit)
if not result.get("isError"):
sandbox = _get_sandbox()
if sandbox is not None:
annotation = await bridge_and_annotate(
sandbox, file_path, offset, limit
)
if annotation:
result["content"][0]["text"] += annotation
return result
result = _get_sandbox_and_path(file_path)
if isinstance(result, dict):
@@ -302,6 +325,103 @@ async def _handle_grep(args: dict[str, Any]) -> dict[str, Any]:
return _mcp(output if output else "No matches found.")
# Bridging: copy SDK-internal files into E2B sandbox
# Files larger than this are written to /home/user/ via sandbox.files.write()
# instead of /tmp/ via shell base64, to avoid shell argument length limits
# and E2B command timeouts. Base64 expands content by ~33%, so keep this
# well under the Linux per-argument limit (MAX_ARG_STRLEN, 128 KB).
_BRIDGE_SHELL_MAX_BYTES = 32 * 1024 # 32 KB
# Files larger than this are skipped entirely to avoid excessive transfer times.
_BRIDGE_SKIP_BYTES = 50 * 1024 * 1024 # 50 MB
async def bridge_to_sandbox(
sandbox: Any, file_path: str, offset: int, limit: int
) -> str | None:
"""Best-effort copy of a host-side SDK file into the E2B sandbox.
When the model reads an SDK-internal file (e.g. tool-results), it often
wants to process the data with bash. Copying the file into the sandbox
under a stable name lets ``bash_exec`` access it without extra steps.
Only copies when offset=0 and limit is large enough to indicate the model
wants the full file. Errors are logged but never propagated.
Returns the sandbox path on success, or ``None`` on skip/failure.
Size handling:
- <= 32 KB: written to ``/tmp/<hash>-<basename>`` via shell base64
(``_sandbox_write``). Kept small to stay within shell argument limits.
- 32 KB - 50 MB: written to ``/home/user/<hash>-<basename>`` via
``sandbox.files.write()`` to avoid shell argument length limits.
- > 50 MB: skipped entirely with a warning.
The sandbox filename is prefixed with a short hash of the full source
path to avoid collisions when different source files share the same
basename (e.g. multiple ``result.json`` files).
"""
if offset != 0 or limit < _DEFAULT_READ_LIMIT:
return None
try:
expanded = os.path.realpath(os.path.expanduser(file_path))
basename = os.path.basename(expanded)
source_id = hashlib.sha256(expanded.encode()).hexdigest()[:12]
unique_name = f"{source_id}-{basename}"
file_size = os.path.getsize(expanded)
if file_size > _BRIDGE_SKIP_BYTES:
logger.warning(
"[E2B] Skipping bridge for large file (%d bytes): %s",
file_size,
basename,
)
return None
def _read_bytes() -> bytes:
with open(expanded, "rb") as fh:
return fh.read()
raw_content = await asyncio.to_thread(_read_bytes)
try:
text_content: str | None = raw_content.decode("utf-8")
except UnicodeDecodeError:
text_content = None
data: str | bytes = text_content if text_content is not None else raw_content
if file_size <= _BRIDGE_SHELL_MAX_BYTES:
sandbox_path = f"/tmp/{unique_name}"
await _sandbox_write(sandbox, sandbox_path, data)
else:
sandbox_path = f"/home/user/{unique_name}"
await sandbox.files.write(sandbox_path, data)
logger.info(
"[E2B] Bridged SDK file to sandbox: %s -> %s", basename, sandbox_path
)
return sandbox_path
except Exception:
logger.warning(
"[E2B] Failed to bridge SDK file to sandbox: %s",
file_path,
exc_info=True,
)
return None
async def bridge_and_annotate(
sandbox: Any, file_path: str, offset: int, limit: int
) -> str | None:
"""Bridge a host file to the sandbox and return a newline-prefixed annotation.
Combines ``bridge_to_sandbox`` with the standard annotation suffix so
callers don't need to duplicate the pattern. Returns a string like
``"\\n[Sandbox copy available at /tmp/abc-file.txt]"`` on success, or
``None`` if bridging was skipped or failed.
"""
sandbox_path = await bridge_to_sandbox(sandbox, file_path, offset, limit)
if sandbox_path is None:
return None
return f"\n[Sandbox copy available at {sandbox_path}]"
# Local read (for SDK-internal paths)
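
The size handling described in the `bridge_to_sandbox` docstring comes down to choosing a destination from the file size and prefixing the basename with a hash of the source path. A minimal sketch of that choice follows. `plan_bridge` is a hypothetical helper; the two constants mirror `_BRIDGE_SHELL_MAX_BYTES` and `_BRIDGE_SKIP_BYTES`. Unlike the real function, it does not call `realpath`/`expanduser` and performs no I/O.

```python
from __future__ import annotations

import hashlib
import os

SHELL_MAX_BYTES = 32 * 1024  # mirrors _BRIDGE_SHELL_MAX_BYTES
SKIP_BYTES = 50 * 1024 * 1024  # mirrors _BRIDGE_SKIP_BYTES


def plan_bridge(source_path: str, file_size: int) -> str | None:
    """Return the sandbox destination for a file, or None when it is too large."""
    if file_size > SKIP_BYTES:
        return None
    # A short hash of the full source path keeps same-named files apart.
    source_id = hashlib.sha256(source_path.encode()).hexdigest()[:12]
    name = f"{source_id}-{os.path.basename(source_path)}"
    prefix = "/tmp" if file_size <= SHELL_MAX_BYTES else "/home/user"
    return f"{prefix}/{name}"
```

Two `result.json` files from different directories therefore map to different sandbox names, which is the collision case the docstring mentions.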


@@ -3,6 +3,7 @@
Pure unit tests with no external dependencies (no E2B, no sandbox).
"""
import hashlib
import os
import shutil
from types import SimpleNamespace
@@ -13,12 +14,26 @@ import pytest
from backend.copilot.context import E2B_WORKDIR, SDK_PROJECTS_DIR, _current_project_dir
from .e2b_file_tools import (
_BRIDGE_SHELL_MAX_BYTES,
_BRIDGE_SKIP_BYTES,
_DEFAULT_READ_LIMIT,
_check_sandbox_symlink_escape,
_read_local,
_sandbox_write,
bridge_and_annotate,
bridge_to_sandbox,
resolve_sandbox_path,
)
def _expected_bridge_path(file_path: str, prefix: str = "/tmp") -> str:
"""Compute the expected sandbox path for a bridged file."""
expanded = os.path.realpath(os.path.expanduser(file_path))
basename = os.path.basename(expanded)
source_id = hashlib.sha256(expanded.encode()).hexdigest()[:12]
return f"{prefix}/{source_id}-{basename}"
# ---------------------------------------------------------------------------
# resolve_sandbox_path — sandbox path normalisation & boundary enforcement
# ---------------------------------------------------------------------------
@@ -91,9 +106,9 @@ class TestResolveSandboxPath:
# ---------------------------------------------------------------------------
# _read_local — host filesystem reads with allowlist enforcement
#
# In E2B mode, _read_local only allows tool-results paths (via
# is_allowed_local_path without sdk_cwd). Regular files live on the
# sandbox, not the host.
# In E2B mode, _read_local only allows tool-results/tool-outputs paths
# (via is_allowed_local_path without sdk_cwd). Regular files live on
# the sandbox, not the host.
# ---------------------------------------------------------------------------
@@ -119,7 +134,7 @@ class TestReadLocal:
)
token = _current_project_dir.set(encoded)
try:
result = _read_local(filepath, offset=0, limit=2000)
result = _read_local(filepath, offset=0, limit=_DEFAULT_READ_LIMIT)
assert result["isError"] is False
assert "line 1" in result["content"][0]["text"]
assert "line 2" in result["content"][0]["text"]
@@ -127,6 +142,25 @@ class TestReadLocal:
_current_project_dir.reset(token)
os.unlink(filepath)
def test_read_tool_outputs_file(self):
"""Reading a tool-outputs file should also succeed."""
encoded = "-tmp-copilot-e2b-test-read-outputs"
tool_outputs_dir = os.path.join(
SDK_PROJECTS_DIR, encoded, self._CONV_UUID, "tool-outputs"
)
os.makedirs(tool_outputs_dir, exist_ok=True)
filepath = os.path.join(tool_outputs_dir, "sdk-abc123.json")
with open(filepath, "w") as f:
f.write('{"data": "test"}\n')
token = _current_project_dir.set(encoded)
try:
result = _read_local(filepath, offset=0, limit=_DEFAULT_READ_LIMIT)
assert result["isError"] is False
assert "test" in result["content"][0]["text"]
finally:
_current_project_dir.reset(token)
shutil.rmtree(os.path.join(SDK_PROJECTS_DIR, encoded), ignore_errors=True)
def test_read_disallowed_path_blocked(self):
"""Reading /etc/passwd should be blocked by the allowlist."""
result = _read_local("/etc/passwd", offset=0, limit=10)
@@ -335,3 +369,199 @@ class TestSandboxWrite:
encoded_in_cmd = call_args.split("echo ")[1].split(" |")[0].strip("'")
decoded = base64.b64decode(encoded_in_cmd).decode()
assert decoded == content
# ---------------------------------------------------------------------------
# bridge_to_sandbox — copy SDK-internal files into E2B sandbox
# ---------------------------------------------------------------------------
def _make_bridge_sandbox() -> SimpleNamespace:
"""Build a sandbox mock suitable for bridge_to_sandbox tests."""
run_result = SimpleNamespace(stdout="", stderr="", exit_code=0)
commands = SimpleNamespace(run=AsyncMock(return_value=run_result))
files = SimpleNamespace(write=AsyncMock())
return SimpleNamespace(commands=commands, files=files)
class TestBridgeToSandbox:
@pytest.mark.asyncio
async def test_happy_path_small_file(self, tmp_path):
"""A small file is bridged to /tmp/<hash>-<basename> via _sandbox_write."""
f = tmp_path / "result.json"
f.write_text('{"ok": true}')
sandbox = _make_bridge_sandbox()
result = await bridge_to_sandbox(
sandbox, str(f), offset=0, limit=_DEFAULT_READ_LIMIT
)
expected = _expected_bridge_path(str(f))
assert result == expected
sandbox.commands.run.assert_called_once()
cmd = sandbox.commands.run.call_args[0][0]
assert "result.json" in cmd
sandbox.files.write.assert_not_called()
@pytest.mark.asyncio
async def test_skip_when_offset_nonzero(self, tmp_path):
"""Bridging is skipped when offset != 0 (partial read)."""
f = tmp_path / "data.txt"
f.write_text("content")
sandbox = _make_bridge_sandbox()
result = await bridge_to_sandbox(
sandbox, str(f), offset=10, limit=_DEFAULT_READ_LIMIT
)
assert result is None
sandbox.commands.run.assert_not_called()
sandbox.files.write.assert_not_called()
@pytest.mark.asyncio
async def test_skip_when_limit_too_small(self, tmp_path):
"""Bridging is skipped when limit < _DEFAULT_READ_LIMIT (partial read)."""
f = tmp_path / "data.txt"
f.write_text("content")
sandbox = _make_bridge_sandbox()
result = await bridge_to_sandbox(sandbox, str(f), offset=0, limit=100)
assert result is None
sandbox.commands.run.assert_not_called()
sandbox.files.write.assert_not_called()
@pytest.mark.asyncio
async def test_nonexistent_file_does_not_raise(self, tmp_path):
"""Bridging a non-existent file logs but does not propagate errors."""
sandbox = _make_bridge_sandbox()
await bridge_to_sandbox(
sandbox, str(tmp_path / "ghost.txt"), offset=0, limit=_DEFAULT_READ_LIMIT
)
sandbox.commands.run.assert_not_called()
sandbox.files.write.assert_not_called()
@pytest.mark.asyncio
async def test_sandbox_write_failure_returns_none(self, tmp_path):
"""If sandbox write fails, returns None (best-effort)."""
f = tmp_path / "data.txt"
f.write_text("content")
sandbox = _make_bridge_sandbox()
sandbox.commands.run.side_effect = RuntimeError("E2B timeout")
result = await bridge_to_sandbox(
sandbox, str(f), offset=0, limit=_DEFAULT_READ_LIMIT
)
assert result is None
@pytest.mark.asyncio
async def test_large_file_uses_files_api(self, tmp_path):
"""Files > 32 KB but <= 50 MB are written to /home/user/ via files.write."""
f = tmp_path / "big.json"
f.write_bytes(b"x" * (_BRIDGE_SHELL_MAX_BYTES + 1))
sandbox = _make_bridge_sandbox()
result = await bridge_to_sandbox(
sandbox, str(f), offset=0, limit=_DEFAULT_READ_LIMIT
)
expected = _expected_bridge_path(str(f), prefix="/home/user")
assert result == expected
sandbox.files.write.assert_called_once()
call_args = sandbox.files.write.call_args[0]
assert call_args[0] == expected
sandbox.commands.run.assert_not_called()
@pytest.mark.asyncio
async def test_small_binary_file_preserves_bytes(self, tmp_path):
"""A small binary file is bridged to /tmp via base64 without corruption."""
binary_data = bytes(range(256))
f = tmp_path / "image.png"
f.write_bytes(binary_data)
sandbox = _make_bridge_sandbox()
result = await bridge_to_sandbox(
sandbox, str(f), offset=0, limit=_DEFAULT_READ_LIMIT
)
expected = _expected_bridge_path(str(f))
assert result == expected
sandbox.commands.run.assert_called_once()
cmd = sandbox.commands.run.call_args[0][0]
assert "base64" in cmd
sandbox.files.write.assert_not_called()
@pytest.mark.asyncio
async def test_large_binary_file_writes_raw_bytes(self, tmp_path):
"""A large binary file is bridged to /home/user/ as raw bytes."""
binary_data = bytes(range(256)) * 200
f = tmp_path / "photo.jpg"
f.write_bytes(binary_data)
sandbox = _make_bridge_sandbox()
result = await bridge_to_sandbox(
sandbox, str(f), offset=0, limit=_DEFAULT_READ_LIMIT
)
expected = _expected_bridge_path(str(f), prefix="/home/user")
assert result == expected
sandbox.files.write.assert_called_once()
call_args = sandbox.files.write.call_args[0]
assert call_args[0] == expected
assert call_args[1] == binary_data
sandbox.commands.run.assert_not_called()
@pytest.mark.asyncio
async def test_very_large_file_skipped(self, tmp_path):
"""Files > 50 MB are skipped entirely."""
f = tmp_path / "huge.bin"
# Create a sparse file to avoid actually writing 50 MB
with open(f, "wb") as fh:
fh.seek(_BRIDGE_SKIP_BYTES + 1)
fh.write(b"\0")
sandbox = _make_bridge_sandbox()
result = await bridge_to_sandbox(
sandbox, str(f), offset=0, limit=_DEFAULT_READ_LIMIT
)
assert result is None
sandbox.commands.run.assert_not_called()
sandbox.files.write.assert_not_called()
# ---------------------------------------------------------------------------
# bridge_and_annotate — shared helper wrapping bridge_to_sandbox + annotation
# ---------------------------------------------------------------------------
class TestBridgeAndAnnotate:
@pytest.mark.asyncio
async def test_returns_annotation_on_success(self, tmp_path):
"""On success, returns a newline-prefixed annotation with the sandbox path."""
f = tmp_path / "data.json"
f.write_text('{"ok": true}')
sandbox = _make_bridge_sandbox()
annotation = await bridge_and_annotate(
sandbox, str(f), offset=0, limit=_DEFAULT_READ_LIMIT
)
expected_path = _expected_bridge_path(str(f))
assert annotation == f"\n[Sandbox copy available at {expected_path}]"
@pytest.mark.asyncio
async def test_returns_none_when_skipped(self, tmp_path):
"""When bridging is skipped (e.g. offset != 0), returns None."""
f = tmp_path / "data.json"
f.write_text("content")
sandbox = _make_bridge_sandbox()
annotation = await bridge_and_annotate(
sandbox, str(f), offset=10, limit=_DEFAULT_READ_LIMIT
)
assert annotation is None
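
The bridging tests above imply a naming scheme and a size-based routing rule. The following is a minimal sketch of both, not the actual `bridge_to_sandbox` implementation: `plan_bridge` is a hypothetical helper, and the two thresholds are assumptions taken from the test docstrings (32 KB and 50 MB), since the real values of `_BRIDGE_SHELL_MAX_BYTES` and `_BRIDGE_SKIP_BYTES` are not shown in this diff.

```python
from __future__ import annotations

import hashlib
import os

# Assumed thresholds, read off the test docstrings; the real constants
# (_BRIDGE_SHELL_MAX_BYTES, _BRIDGE_SKIP_BYTES) are not visible in this diff.
SHELL_MAX_BYTES = 32 * 1024
SKIP_BYTES = 50 * 1024 * 1024


def expected_bridge_path(file_path: str, prefix: str = "/tmp") -> str:
    """Mirror the test helper: <prefix>/<12-hex-digit path hash>-<basename>."""
    expanded = os.path.realpath(os.path.expanduser(file_path))
    basename = os.path.basename(expanded)
    source_id = hashlib.sha256(expanded.encode()).hexdigest()[:12]
    return f"{prefix}/{source_id}-{basename}"


def plan_bridge(file_path: str, size: int) -> tuple[str, str | None]:
    """Hypothetical helper: pick a transfer strategy and target path by size."""
    if size > SKIP_BYTES:
        # Too large to copy at all; the caller gets no sandbox path.
        return ("skip", None)
    if size > SHELL_MAX_BYTES:
        # Too large for a shell command; use the sandbox files API.
        return ("files_api", expected_bridge_path(file_path, prefix="/home/user"))
    # Small enough to pass through a base64-encoded shell command.
    return ("shell", expected_bridge_path(file_path))
```

Hashing the resolved source path keeps two files with the same basename from overwriting each other in the sandbox, which is presumably why the tests compute the expected path this way rather than hard-coding it.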


@@ -275,7 +275,7 @@ class TestCompactionE2E:
# --- Step 7: CompactionTracker receives PreCompact hook ---
tracker = CompactionTracker()
session = ChatSession.new(user_id="test-user")
session = ChatSession.new(user_id="test-user", dry_run=False)
tracker.on_compact(str(session_file))
# --- Step 8: Next SDK message arrives → emit_start ---
@@ -376,7 +376,7 @@ class TestCompactionE2E:
monkeypatch.setenv("CLAUDE_CONFIG_DIR", str(config_dir))
tracker = CompactionTracker()
session = ChatSession.new(user_id="test")
session = ChatSession.new(user_id="test", dry_run=False)
builder = TranscriptBuilder()
# --- First query with compaction ---


@@ -0,0 +1,82 @@
"""SDK environment variable builder — importable without circular deps.
Extracted from ``service.py`` so that ``backend.blocks.orchestrator``
can reuse the same subscription / OpenRouter / direct-Anthropic logic
without pulling in the full copilot service module (which would create a
circular import through ``executor`` → ``credit`` → ``block_cost_config``).
"""
from __future__ import annotations
from backend.copilot.config import ChatConfig
from backend.copilot.sdk.subscription import validate_subscription
# ChatConfig is stateless (reads env vars) — a separate instance is fine.
# A singleton would require importing service.py which causes the circular dep
# this module was created to avoid.
config = ChatConfig()
def build_sdk_env(
session_id: str | None = None,
user_id: str | None = None,
sdk_cwd: str | None = None,
) -> dict[str, str]:
"""Build env vars for the SDK CLI subprocess.
Three modes (checked in order):
1. **Subscription** — clears all keys; CLI uses ``claude login`` auth.
2. **Direct Anthropic** — returns ``{}``; subprocess inherits
``ANTHROPIC_API_KEY`` from the parent environment.
3. **OpenRouter** (default) — overrides base URL and auth token to
route through the proxy, with Langfuse trace headers.
When *sdk_cwd* is provided, ``CLAUDE_CODE_TMPDIR`` is set so that
the CLI writes temp/sub-agent output inside the per-session workspace
directory rather than an inaccessible system temp path.
"""
# --- Mode 1: Claude Code subscription auth ---
if config.use_claude_code_subscription:
validate_subscription()
env: dict[str, str] = {
"ANTHROPIC_API_KEY": "",
"ANTHROPIC_AUTH_TOKEN": "",
"ANTHROPIC_BASE_URL": "",
}
if sdk_cwd:
env["CLAUDE_CODE_TMPDIR"] = sdk_cwd
return env
# --- Mode 2: Direct Anthropic (no proxy hop) ---
if not config.openrouter_active:
env = {}
if sdk_cwd:
env["CLAUDE_CODE_TMPDIR"] = sdk_cwd
return env
# --- Mode 3: OpenRouter proxy ---
base = (config.base_url or "").rstrip("/")
if base.endswith("/v1"):
base = base[:-3]
env = {
"ANTHROPIC_BASE_URL": base,
"ANTHROPIC_AUTH_TOKEN": config.api_key or "",
"ANTHROPIC_API_KEY": "", # force CLI to use AUTH_TOKEN
}
# Inject broadcast headers so OpenRouter forwards traces to Langfuse.
def _safe(v: str) -> str:
return v.replace("\r", "").replace("\n", "").strip()[:128]
parts = []
if session_id:
parts.append(f"x-session-id: {_safe(session_id)}")
if user_id:
parts.append(f"x-user-id: {_safe(user_id)}")
if parts:
env["ANTHROPIC_CUSTOM_HEADERS"] = "\n".join(parts)
if sdk_cwd:
env["CLAUDE_CODE_TMPDIR"] = sdk_cwd
return env
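
The two string transformations in the OpenRouter branch above can be exercised on their own. The sketch below lifts them into standalone functions for illustration; `normalize_base_url`, `safe_header_value`, and `build_custom_headers` are hypothetical names that do not exist in the module, and the logic is copied from the branch rather than imported from it.

```python
from __future__ import annotations


def normalize_base_url(base_url: str | None) -> str:
    """Drop trailing slashes, then a trailing /v1 segment if present."""
    base = (base_url or "").rstrip("/")
    if base.endswith("/v1"):
        base = base[:-3]
    return base


def safe_header_value(value: str) -> str:
    """Remove CR/LF so a value cannot inject an extra header; cap at 128 chars."""
    return value.replace("\r", "").replace("\n", "").strip()[:128]


def build_custom_headers(
    session_id: str | None = None, user_id: str | None = None
) -> str | None:
    """Join the trace headers with newlines; return None when there are none."""
    parts = []
    if session_id:
        parts.append(f"x-session-id: {safe_header_value(session_id)}")
    if user_id:
        parts.append(f"x-user-id: {safe_header_value(user_id)}")
    return "\n".join(parts) if parts else None
```

Because the headers are newline-separated inside a single environment variable, stripping CR and LF from each value is what keeps one ID from being parsed as a second header line.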


@@ -0,0 +1,293 @@
"""Tests for build_sdk_env() — the SDK subprocess environment builder."""
from unittest.mock import patch
import pytest
from backend.copilot.config import ChatConfig
# ---------------------------------------------------------------------------
# Helpers — build a ChatConfig with explicit field values so tests don't
# depend on real environment variables.
# ---------------------------------------------------------------------------
def _make_config(**overrides) -> ChatConfig:
"""Create a ChatConfig with safe defaults, applying *overrides*."""
defaults = {
"use_claude_code_subscription": False,
"use_openrouter": False,
"api_key": None,
"base_url": None,
}
defaults.update(overrides)
return ChatConfig(**defaults)
# ---------------------------------------------------------------------------
# Mode 1 — Subscription auth
# ---------------------------------------------------------------------------
class TestBuildSdkEnvSubscription:
"""When ``use_claude_code_subscription`` is True, keys are blanked."""
@patch("backend.copilot.sdk.env.validate_subscription")
def test_returns_blanked_keys(self, mock_validate):
"""Subscription mode clears API_KEY, AUTH_TOKEN, and BASE_URL."""
cfg = _make_config(use_claude_code_subscription=True)
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
result = build_sdk_env()
assert result == {
"ANTHROPIC_API_KEY": "",
"ANTHROPIC_AUTH_TOKEN": "",
"ANTHROPIC_BASE_URL": "",
}
mock_validate.assert_called_once()
@patch(
"backend.copilot.sdk.env.validate_subscription",
side_effect=RuntimeError("CLI not found"),
)
def test_propagates_validation_error(self, mock_validate):
"""If validate_subscription fails, the error bubbles up."""
cfg = _make_config(use_claude_code_subscription=True)
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
with pytest.raises(RuntimeError, match="CLI not found"):
build_sdk_env()
# ---------------------------------------------------------------------------
# Mode 2 — Direct Anthropic (no OpenRouter)
# ---------------------------------------------------------------------------
class TestBuildSdkEnvDirectAnthropic:
"""When OpenRouter is inactive, return empty dict (inherit parent env)."""
def test_returns_empty_dict_when_openrouter_inactive(self):
cfg = _make_config(use_openrouter=False)
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
result = build_sdk_env()
assert result == {}
def test_returns_empty_dict_when_openrouter_flag_true_but_no_key(self):
"""OpenRouter flag is True but no api_key => openrouter_active is False."""
cfg = _make_config(use_openrouter=True, base_url="https://openrouter.ai/api/v1")
# Force api_key to None after construction (field_validator may pick up env vars)
object.__setattr__(cfg, "api_key", None)
assert not cfg.openrouter_active
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
result = build_sdk_env()
assert result == {}
# ---------------------------------------------------------------------------
# Mode 3 — OpenRouter proxy
# ---------------------------------------------------------------------------
class TestBuildSdkEnvOpenRouter:
"""When OpenRouter is active, return proxy env vars."""
def _openrouter_config(self, **overrides):
defaults = {
"use_openrouter": True,
"api_key": "sk-or-test-key",
"base_url": "https://openrouter.ai/api/v1",
}
defaults.update(overrides)
return _make_config(**defaults)
def test_basic_openrouter_env(self):
cfg = self._openrouter_config()
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
result = build_sdk_env()
assert result["ANTHROPIC_BASE_URL"] == "https://openrouter.ai/api"
assert result["ANTHROPIC_AUTH_TOKEN"] == "sk-or-test-key"
assert result["ANTHROPIC_API_KEY"] == ""
assert "ANTHROPIC_CUSTOM_HEADERS" not in result
def test_strips_trailing_v1(self):
"""The /v1 suffix is stripped from the base URL."""
cfg = self._openrouter_config(base_url="https://openrouter.ai/api/v1")
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
result = build_sdk_env()
assert result["ANTHROPIC_BASE_URL"] == "https://openrouter.ai/api"
def test_strips_trailing_v1_and_slash(self):
"""A trailing slash after /v1 is removed before the /v1 suffix is stripped."""
cfg = self._openrouter_config(base_url="https://openrouter.ai/api/v1/")
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
result = build_sdk_env()
# rstrip("/") first, then remove /v1
assert result["ANTHROPIC_BASE_URL"] == "https://openrouter.ai/api"
def test_no_v1_suffix_left_alone(self):
"""A base URL without /v1 is used as-is."""
cfg = self._openrouter_config(base_url="https://custom-proxy.example.com")
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
result = build_sdk_env()
assert result["ANTHROPIC_BASE_URL"] == "https://custom-proxy.example.com"
def test_session_id_header(self):
cfg = self._openrouter_config()
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
result = build_sdk_env(session_id="sess-123")
assert "ANTHROPIC_CUSTOM_HEADERS" in result
assert "x-session-id: sess-123" in result["ANTHROPIC_CUSTOM_HEADERS"]
def test_user_id_header(self):
cfg = self._openrouter_config()
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
result = build_sdk_env(user_id="user-456")
assert "x-user-id: user-456" in result["ANTHROPIC_CUSTOM_HEADERS"]
def test_both_headers(self):
cfg = self._openrouter_config()
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
result = build_sdk_env(session_id="s1", user_id="u2")
headers = result["ANTHROPIC_CUSTOM_HEADERS"]
assert "x-session-id: s1" in headers
assert "x-user-id: u2" in headers
# They should be newline-separated
assert "\n" in headers
def test_header_sanitisation_strips_newlines(self):
"""Newlines/carriage-returns in header values are stripped."""
cfg = self._openrouter_config()
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
result = build_sdk_env(session_id="bad\r\nvalue")
header_val = result["ANTHROPIC_CUSTOM_HEADERS"]
# The _safe helper removes \r and \n
assert "\r" not in header_val.split(": ", 1)[1]
assert "\n" not in header_val.split(": ", 1)[1]
assert "badvalue" in header_val
def test_header_value_truncated_to_128_chars(self):
"""Header values are truncated to 128 characters."""
cfg = self._openrouter_config()
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
long_id = "x" * 200
result = build_sdk_env(session_id=long_id)
# The value after "x-session-id: " should be at most 128 chars
header_line = result["ANTHROPIC_CUSTOM_HEADERS"]
value = header_line.split(": ", 1)[1]
assert len(value) == 128
# ---------------------------------------------------------------------------
# Mode priority
# ---------------------------------------------------------------------------
class TestBuildSdkEnvModePriority:
"""Subscription mode takes precedence over OpenRouter."""
@patch("backend.copilot.sdk.env.validate_subscription")
def test_subscription_overrides_openrouter(self, mock_validate):
cfg = _make_config(
use_claude_code_subscription=True,
use_openrouter=True,
api_key="sk-or-key",
base_url="https://openrouter.ai/api/v1",
)
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
result = build_sdk_env()
# Should get subscription result, not OpenRouter
assert result == {
"ANTHROPIC_API_KEY": "",
"ANTHROPIC_AUTH_TOKEN": "",
"ANTHROPIC_BASE_URL": "",
}
# ---------------------------------------------------------------------------
# CLAUDE_CODE_TMPDIR integration
# ---------------------------------------------------------------------------
class TestClaudeCodeTmpdir:
"""Verify build_sdk_env() sets CLAUDE_CODE_TMPDIR from *sdk_cwd*."""
def test_tmpdir_set_when_sdk_cwd_is_truthy(self):
"""CLAUDE_CODE_TMPDIR is set to sdk_cwd when sdk_cwd is truthy."""
cfg = _make_config(use_openrouter=False)
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
result = build_sdk_env(sdk_cwd="/tmp/copilot-workspace")
assert result["CLAUDE_CODE_TMPDIR"] == "/tmp/copilot-workspace"
def test_tmpdir_not_set_when_sdk_cwd_is_none(self):
"""CLAUDE_CODE_TMPDIR is NOT in the env when sdk_cwd is None."""
cfg = _make_config(use_openrouter=False)
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
result = build_sdk_env(sdk_cwd=None)
assert "CLAUDE_CODE_TMPDIR" not in result
def test_tmpdir_not_set_when_sdk_cwd_is_empty_string(self):
"""CLAUDE_CODE_TMPDIR is NOT in the env when sdk_cwd is empty string."""
cfg = _make_config(use_openrouter=False)
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
result = build_sdk_env(sdk_cwd="")
assert "CLAUDE_CODE_TMPDIR" not in result
@patch("backend.copilot.sdk.env.validate_subscription")
def test_tmpdir_set_in_subscription_mode(self, mock_validate):
"""CLAUDE_CODE_TMPDIR is set even in subscription mode."""
cfg = _make_config(use_claude_code_subscription=True)
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
result = build_sdk_env(sdk_cwd="/tmp/sub-workspace")
assert result["CLAUDE_CODE_TMPDIR"] == "/tmp/sub-workspace"
assert result["ANTHROPIC_API_KEY"] == ""
