Compare commits

..

396 Commits

Author SHA1 Message Date
Zamil Majdy
64d9d6d880 Merge remote-tracking branch 'origin/codex/platform-cost-tracking' into combined-preview-test 2026-04-02 18:32:06 +02:00
Zamil Majdy
9fc324e28a Merge remote-tracking branch 'origin/fix/copilot-tool-output-e2b-bridging' into combined-preview-test 2026-04-02 18:32:06 +02:00
Zamil Majdy
adf66bdd24 Merge origin/fix/copilot-subagent-security (resolved conflicts) 2026-04-02 18:32:06 +02:00
Zamil Majdy
dc10ad715a Merge remote-tracking branch 'origin/feat/rate-limit-tiering' into combined-preview-test 2026-04-02 18:32:05 +02:00
Zamil Majdy
493c91e0dd Merge remote-tracking branch 'origin/feat/agent-generation-dry-run-loop' into combined-preview-test 2026-04-02 18:32:05 +02:00
Zamil Majdy
b278e66f4d Merge remote-tracking branch 'origin/dev' into combined-preview-test 2026-04-02 18:32:05 +02:00
Zamil Majdy
3e183ed2a3 fix(copilot): address 8 should-fix items from review 4051661771
1. Rewrite tautological env_test.py TestClaudeCodeTmpdir tests to call
   build_sdk_env(sdk_cwd=...) directly instead of copy-pasting the
   if-sdk_cwd pattern. Moved CLAUDE_CODE_TMPDIR logic into build_sdk_env().
2. Add DEL (\x7f), C1 (\x80-\x9f), BiDi, and zero-width chars to
   security_hooks_test.py sanitization test inputs.
3. Promote _sanitize() from closure to module-level pure function.
4. Fix GenericTool.tsx "model may poll again" -> user-friendly message.
5. Replace `as never` with @ts-expect-error + comment in useChatSession.ts.
6. Extract "Agent"/"Task"/"TaskOutput" string literals to named constants
   in helpers.ts, imported in GenericTool.tsx.
7. Extend _sanitize() to strip Unicode BiDi overrides (U+202A-U+202E,
   U+2066-U+2069) and zero-width characters (U+200B-U+200F, U+FEFF).
8. Document background agent slot lifecycle limitation in security_hooks.py
   (SubagentStop doesn't fire reliably for background agents).
2026-04-02 18:23:42 +02:00
Zamil Majdy
82887a2d92 fix(backend/copilot): address reviewer feedback on E2B bridge API surface
- Rename _bridge_to_sandbox to bridge_to_sandbox (public) since it is
  imported cross-module from tool_adapter.py (item 4)
- Extract duplicated bridge+append-annotation pattern into shared
  bridge_and_annotate() helper used by both e2b_file_tools and
  tool_adapter (item 5)
- Add tests verifying bridge_and_annotate is called from
  _read_file_handler in tool_adapter when a sandbox is active (item 2)
- Add unit tests for bridge_and_annotate helper itself
2026-04-02 18:22:13 +02:00
Zamil Majdy
993c43b623 feat(platform): add merge_stats to remaining blocks (FAL, Revid, D-ID, E2B, YouTube, Weather, TTS, Enrichlayer)
Every system credential block now has explicit merge_stats tracking.
No block relies on the generic fallback anymore.
2026-04-02 18:22:02 +02:00
Zamil Majdy
13fcc62a31 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-tool-output-e2b-bridging 2026-04-02 18:16:25 +02:00
Zamil Majdy
8fefa23468 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-subagent-security 2026-04-02 18:15:58 +02:00
Zamil Majdy
749a56ca20 fix(backend): make email lookup non-blocking in set_user_tier endpoint 2026-04-02 18:14:34 +02:00
Zamil Majdy
a8a62eeefc feat(platform): add merge_stats tracking to all system credential blocks
Every block that uses system credentials now calls merge_stats with
meaningful data after the API response:
- Google Maps: output_size = number of places returned (= detail API calls)
- Apollo people/org: output_size = results count
- Apollo person: output_size = 1 per enrichment
- SmartLead: output_size = leads added or 1 per operation
- Ideogram: output_size = 1 per image
- Replicate: output_size = 1 per prediction
- Nvidia: output_size = 1 per inference
- ScreenshotOne: output_size = 1 per screenshot
- ZeroBounce: output_size = 1 per email validated
- Mem0: output_size = 1 per memory operation
2026-04-02 18:13:15 +02:00
Zamil Majdy
173614bcc5 fix(platform): audit and fix per-provider tracking accuracy
- Fix ElevenLabs/D-ID field name: script -> script_input
- Remove incorrect Google Maps api_calls formula, use per_run instead
- Remove D-ID from generation_seconds (walltime includes polling)
- Jina embeddings: extract total_tokens from response.usage
- Simplify tracking types: cost_usd, tokens, characters,
  sandbox_seconds, walltime_seconds, per_run
2026-04-02 17:58:24 +02:00
Zamil Majdy
3396cb3f4c fix(frontend): show advanced fields toggle when all input fields are advanced
When every input field was marked as advanced, `buildExpectedInputsSchema`
returned null (no visible fields), causing the entire inputs card—including
the "Show advanced fields" toggle—to not render. This made the fields
completely inaccessible.

Two changes:
- Render the inputs card when `hasAdvancedFields` is true, even if
  `inputSchema` is null, so the toggle is always accessible.
- Base `needsInputs` on `expectedInputs.length > 0` instead of
  `inputSchema !== null` so the Proceed button and input message logic
  work correctly with advanced-only fields.
2026-04-02 17:58:15 +02:00
Zamil Majdy
0c5d628b74 fix(frontend): sync inputValues state when output prop updates in SetupRequirementsCard
The inputValues state was initialized from the output prop via useState,
which only runs on mount. When the output prop updated via streaming, the
form would show stale data. Added a useEffect that merges new initial
values from the prop while preserving user-edited fields.
2026-04-02 17:44:11 +02:00
Zamil Majdy
ed40549499 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/agent-generation-dry-run-loop 2026-04-02 17:43:17 +02:00
Zamil Majdy
fbe634fb19 fix(platform): handle null user_id in cost logs and fix 0.0 cost stored as NULL
- Add null-safe optional chaining for user_id.slice() in LogsTable, displaying
  "Deleted user" when user_id is null to prevent frontend crash
- Change `if cost_float` to `if cost_float is not None` in token_tracking.py
  so that a legitimate $0.00 cost is stored as 0 instead of NULL
2026-04-02 17:38:59 +02:00
Zamil Majdy
a338c72c42 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into codex/platform-cost-tracking 2026-04-02 17:36:14 +02:00
Zamil Majdy
a9d13f0cbf ci: retrigger CI (flaky event loop test) 2026-04-02 17:33:23 +02:00
Zamil Majdy
e83e50a8f1 fix(frontend): wrap handleDeleteConfirm to prevent MouseEvent as force param 2026-04-02 17:32:33 +02:00
Zamil Majdy
7f4398efa3 feat(platform): provider-specific tracking types for accurate cost metrics
Replace one-size-fits-all tracking cascade with provider-aware logic:
- cost_usd: OpenRouter (x-total-cost header), Exa (cost_dollars)
- tokens: OpenAI, Anthropic, Groq, Ollama (token counts)
- characters: Unreal Speech, ElevenLabs (input text length)
- api_calls: Google Maps (1 nearby + N detail calls)
- sandbox_seconds: E2B (sandbox execution time)
- generation_seconds: FAL, Revid, D-ID, Replicate (video/image gen time)
- per_run: Apollo, SmartLead, ZeroBounce, Jina, etc.
2026-04-02 17:30:15 +02:00
Zamil Majdy
c2a054c511 fix(backend): prevent provider_cost loss on stats merge and widen costMicrodollars to BigInt
- NodeExecutionStats.__iadd__ was overwriting accumulated provider_cost
  with None when merging stats that lacked provider_cost (e.g. the final
  llm_call_count/llm_retry_count merge). Skip None values in __iadd__
  so existing data is never erased.
- Widen PlatformCostLog.costMicrodollars from Int (max ~$2,147) to
  BigInt to prevent theoretical overflow for high-cost aggregated
  node executions.
2026-04-02 17:28:27 +02:00
Zamil Majdy
b256560619 fix(frontend): add force-delete flow and try/catch for credential operations
- DeleteConfirmationModal now shows backend warning message and offers
  "Force Delete" when API returns need_confirmation instead of just a
  toast (mirrors integrations page pattern)
- HostScopedCredentialsModal onSubmit delete-then-create is now wrapped
  in try/catch to prevent silent credential loss on creation failure
2026-04-02 17:25:04 +02:00
Zamil Majdy
c63d5f538b Merge remote-tracking branch 'origin/feat/agent-generation-dry-run-loop' into combined-preview-test 2026-04-02 17:18:50 +02:00
Zamil Majdy
eeba884671 fix(platform): fix ClamAV connectivity in Docker containers
clamd was only listening on 127.0.0.1 inside its container, so
container-to-container connections on the Docker network were refused.

- Add CLAMD_CONF_TCPAddr=0.0.0.0 to docker-compose so clamd binds
  to all interfaces
- Change default clamav_service_host from "localhost" to "clamav"
  (the docker-compose service name), matching how other services
  like redis, rabbitmq, supabase-db are referenced
2026-04-02 17:18:13 +02:00
Zamil Majdy
90822e3f37 fix(frontend+backend): prefill block inputs and hide advanced in CoPilot setup card
Backend:
- get_inputs_from_schema() now accepts input_data to populate each field's
  value with what CoPilot already provided, and includes the advanced flag
  from the schema so the frontend can hide non-essential fields.

Frontend:
- SetupRequirementsCard prefills form inputs from backend-provided values
  instead of showing empty forms
- Advanced fields hidden by default with "Show advanced fields" toggle
  (matching builder behaviour)
- siblingInputs built from both input values and discriminator_values
  so the host pattern modal can extract the host from the URL
- extractInitialValues() populates form state from prefilled values
2026-04-02 17:18:06 +02:00
Zamil Majdy
a8bb6b5544 fix(frontend): prefill host pattern in CoPilot credential setup modal
The SetupRequirementsCard passed inputValues={{}} to CredentialsGroupedView,
which meant the HostScopedCredentialsModal never received the target URL
from the backend's discriminator_values. The "Host Pattern" field was always
empty even though the CoPilot knew the exact host (e.g. api.openai.com).

Add buildSiblingInputsFromCredentials() to extract the discriminator value
(URL) from the missing_credentials setup_info and pass it as siblingInputs
so the modal can prefill the host pattern.
2026-04-02 17:17:59 +02:00
Zamil Majdy
83b00f4789 feat(platform): add copilot/autopilot cost tracking via token_tracking.py
Copilot uses OpenRouter via a separate code path (not through the block
executor). This integrates PlatformCostLog into the shared
persist_and_record_usage() function which is called by both SDK and
baseline copilot paths, capturing:
- Every LLM turn (main conversation, title gen, context compression)
- Tokens (prompt + completion + cache)
- Actual USD cost when available (SDK path provides cost_usd)
- Session ID for correlation
2026-04-02 17:17:53 +02:00
Zamil Majdy
4cd53bb7f6 Merge remote-tracking branch 'origin/codex/platform-cost-tracking' into combined-preview-test 2026-04-02 17:14:29 +02:00
Zamil Majdy
96d83e9bbd Merge remote-tracking branch 'origin/fix/copilot-p0-cli-internals' into combined-preview-test 2026-04-02 17:14:29 +02:00
Zamil Majdy
e99f4ac767 Merge remote-tracking branch 'origin/feat/rate-limit-tiering' into combined-preview-test 2026-04-02 17:14:29 +02:00
Zamil Majdy
67c2540177 Merge remote-tracking branch 'origin/feat/agent-generation-dry-run-loop' into combined-preview-test 2026-04-02 17:14:29 +02:00
Zamil Majdy
95524e94b3 feat(platform): add tracking_type and tracking_amount to cost log metadata
Standardize cost tracking across providers:
- cost_usd: actual dollar cost (OpenRouter, Exa)
- tokens: total token count (LLM blocks)
- duration_seconds: execution time (video gen, sandboxes)
- per_run: flat per-request (all others)
2026-04-02 17:04:50 +02:00
Zamil Majdy
eda02f9ce6 fix(backend/copilot): remove duplicate StreamError in _HandledStreamError handler
The _HandledStreamError exception is only raised by _run_stream_attempt
*after* it has already yielded a StreamError to the client. The handler
in the retry loop was yielding a second StreamError for non-transient
errors (e.g. circuit breaker trips) and when transient retries were
exhausted, causing the client to receive duplicate error events.

Remove the redundant yield since the StreamError was already sent.
2026-04-02 17:03:40 +02:00
Zamil Majdy
9ab6082a23 fix(frontend): handle credential deletion errors with toast feedback
handleDeleteConfirm now catches API errors and shows a destructive
toast instead of silently failing. It also checks for
need_confirmation responses when the credential is still in use.
2026-04-02 17:03:00 +02:00
Zamil Majdy
2c517ff9a1 feat(platform): add per-provider cost extraction
- OpenRouter: Extract actual USD cost from x-total-cost response header
- Exa (search, contents): Write cost_dollars.total to execution_stats
- LLM blocks: Store provider_cost in stats when available
- Add provider_cost field to NodeExecutionStats
- Hook now converts provider_cost to costMicrodollars in PlatformCostLog
- Metadata includes both credit_cost and provider_cost_usd when available
2026-04-02 16:57:34 +02:00
Zamil Majdy
7020ae2189 fix(backend): handle NULL userId in platform cost models and queries
Make user_id Optional[str] in UserCostSummary and CostLogRow to handle
cases where the referenced user has been deleted. Use .get() for safe
access to user_id from query result rows. Regenerate OpenAPI schema.
2026-04-02 16:54:09 +02:00
Zamil Majdy
a49ac5ba13 fix(frontend): update CredentialsProvidersContext state on credential deletion
The delete mutation was using useDeleteV1DeleteCredentials which only
invalidated React Query caches but did not update the context's own
useState-managed credential list. Switch to the context's
deleteCredentials method which both calls the API and removes the
credential from the provider state, so the UI updates immediately.
2026-04-02 16:54:01 +02:00
Zamil Majdy
2a969e5018 fix(backend/copilot): yield final StreamError after transient retry exhaustion for _HandledStreamError
When _run_stream_attempt raises a _HandledStreamError and all transient
retries are exhausted, the outer retry loop sets ended_with_stream_error
but stream_err remains None.  The post-loop code only emits a StreamError
when stream_err is not None, so the SSE stream closes silently and the
frontend never learns the request failed.

Yield a StreamError with the attempt's error message and code just before
breaking out of the retry loop, ensuring clients always receive an error
notification.
2026-04-02 16:49:18 +02:00
Zamil Majdy
79005b1be5 fix(backend): move audit log after user existence check in set_user_rate_limit_tier
The tier-change audit log was written before verifying the user exists,
creating misleading log entries for non-existent users. Move the user
existence check (via get_user_email_by_id) before the audit log and
remove the now-redundant prisma.errors.RecordNotFoundError catch.
2026-04-02 16:48:48 +02:00
Zamil Majdy
4f8cdbee47 Merge remote-tracking branch 'origin/codex/platform-cost-tracking' into combined-preview-test 2026-04-02 16:42:12 +02:00
Zamil Majdy
3ed444dd60 Merge remote-tracking branch 'origin/fix/copilot-credential-setup-ui' into combined-preview-test 2026-04-02 16:42:12 +02:00
Zamil Majdy
83e747ebcd Merge remote-tracking branch 'origin/fix/copilot-tool-output-e2b-bridging' into combined-preview-test 2026-04-02 16:42:12 +02:00
Zamil Majdy
827f2b0f87 Merge origin/fix/copilot-p0-cli-internals (resolved conflicts) 2026-04-02 16:42:12 +02:00
Zamil Majdy
b0d5d3b95e Merge origin/fix/copilot-subagent-security (resolved conflicts) 2026-04-02 16:42:12 +02:00
Zamil Majdy
eb9244be1a Merge origin/feat/copilot-mode-toggle (resolved conflicts) 2026-04-02 16:42:11 +02:00
Zamil Majdy
dd17e83299 Merge remote-tracking branch 'origin/feat/copilot-include-graph-option' into combined-preview-test 2026-04-02 16:42:11 +02:00
Zamil Majdy
74009bedac Merge origin/feat/rate-limit-tiering (resolved conflicts) 2026-04-02 16:42:11 +02:00
Zamil Majdy
72d0c8dad8 Merge remote-tracking branch 'origin/feat/agent-generation-dry-run-loop' into combined-preview-test 2026-04-02 16:42:11 +02:00
Zamil Majdy
e860f164e4 Merge remote-tracking branch 'origin/fix/dry-run-special-blocks' into combined-preview-test 2026-04-02 16:42:11 +02:00
Zamil Majdy
b9336984be fix(platform): re-add credit_cost to platform cost log metadata
Include the block's credit cost (from block_cost_config) in the log
metadata so every entry has a known cost proxy even when the provider
doesn't expose actual dollar costs.
2026-04-02 16:37:28 +02:00
Zamil Majdy
9924dedddc fix(platform): address bot review comments (sentry + coderabbit)
- CRITICAL: Use execute_raw_with_schema for INSERT (not query_raw)
- Remove accidentally committed transcripts/
- Add dry_run guard to skip cost logging for simulated executions
- Change onDelete: Cascade → SetNull to preserve cost history
- Add standalone createdAt index for date-only queries
- Add deterministic tiebreaker (id) to pagination ORDER BY
- Update migration SQL to match schema changes
2026-04-02 16:26:01 +02:00
Zamil Majdy
c054799b4f fix: regenerate API schema and block docs 2026-04-02 16:23:12 +02:00
Zamil Majdy
004d3957b3 docs: regenerate misc.md block docs after dev merge 2026-04-02 16:20:51 +02:00
Zamil Majdy
f3b5d584a3 fix(platform): address PR review round 5
- Replace ServerCrash icon with Receipt for Platform Costs sidebar
2026-04-02 16:02:00 +02:00
Zamil Majdy
476d9dcf80 fix(platform): address PR review round 4
- Add tests for query parameter forwarding and pagination
2026-04-02 16:00:08 +02:00
Zamil Majdy
072b623f8b fix(platform): address PR review round 3
- Remove duplicate block_usage_cost call from cost logging
- Add case-insensitive provider filter using LOWER()
- Add platform_cost_routes_test.py with basic endpoint tests
2026-04-02 15:58:00 +02:00
Zamil Majdy
a68f48e6b7 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-p0-cli-internals 2026-04-02 15:55:59 +02:00
Zamil Majdy
60e2474640 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/agent-generation-dry-run-loop 2026-04-02 15:55:58 +02:00
Zamil Majdy
a892bbd4dd Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/dry-run-special-blocks 2026-04-02 15:55:56 +02:00
Zamil Majdy
538e8619da Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-mode-toggle 2026-04-02 15:55:54 +02:00
Zamil Majdy
4edb1f6e4a Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-04-02 15:55:50 +02:00
Zamil Majdy
480d58607d Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-subagent-security 2026-04-02 15:55:49 +02:00
Zamil Majdy
8561eb35f2 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-include-graph-option 2026-04-02 15:55:47 +02:00
Zamil Majdy
0b4acd73f4 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-tool-output-e2b-bridging 2026-04-02 15:55:45 +02:00
Zamil Majdy
e9fe2991d6 chore: remove accidentally committed test screenshots 2026-04-02 15:55:23 +02:00
Zamil Majdy
26b0c95936 fix(platform): address PR review round 2
- Parallelize dashboard queries with asyncio.gather for ~3x speedup
- Move json import to top-level
- Use consistent p. table alias across all dashboard queries
2026-04-02 15:55:03 +02:00
Zamil Majdy
735965bbe5 docs: regenerate misc.md block docs after dev merge 2026-04-02 15:54:14 +02:00
Zamil Majdy
a8f9ed0f60 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into zamilmajdy/secrt-2171-sql-query-block-for-copilotautopilot-analytics-access 2026-04-02 15:53:49 +02:00
Zamil Majdy
308357de84 fix(platform): address PR review round 1
- Parameterize LIMIT/OFFSET in SQL queries to prevent injection
- Only log platform cost on successful block execution
- Convert model enum values to strings for proper logging
- Add error handling with try/catch/finally in frontend useEffect
- Drive filter state from URL params to prevent desync
- Add dark mode support using design tokens
- Return total_users count in dashboard for accurate reporting
- Add credit_cost to metadata as cost proxy until per-token pricing
2026-04-02 15:51:28 +02:00
Zamil Majdy
1a6c50c6cc feat(platform): add platform cost tracking for system credentials
Track real API costs incurred when users consume system-managed credentials.
Captures provider, tokens, duration, and model per block execution and
surfaces an admin dashboard with provider/user aggregation and raw logs.
2026-04-02 15:42:18 +02:00
Zamil Majdy
9391dfa4b2 docs: regenerate block documentation to sync with code 2026-04-02 15:39:07 +02:00
Zamil Majdy
6a69d7c68d fix(backend): hoist _COMMON_CRED_KEYS to module level, conditional source-code instruction
- Move _COMMON_CRED_KEYS to a module-level frozenset to avoid recreating
  it on every call to build_simulation_prompt
- Make the "Study the block's run() source code" instruction conditional
  on source code actually being available, falling back to a generic
  description-based instruction
2026-04-02 14:51:09 +02:00
Zamil Majdy
ad77e881c9 fix(backend/copilot): strip stale thinking blocks in upload_transcript
Add strip_stale_thinking_blocks() call to upload_transcript() alongside
the existing strip_progress_entries(). When a user switches from SDK
(extended_thinking) to baseline (fast) mode and back, the re-downloaded
transcript may contain stale thinking blocks from the SDK session.
Without stripping, these blocks consume significant tokens and trigger
unnecessary compaction cycles.
2026-04-02 14:50:50 +02:00
Zamil Majdy
f1aedfeedd fix(backend): guard against None name in _default_for_input_result
When name key exists with explicit None value, _default_for_input_result
would return None for string-typed pins instead of a string. Add
fallback to "sample input" and fix the type hint to reflect nullable.
2026-04-02 14:48:06 +02:00
Zamil Majdy
49c7ab4011 fix(backend/copilot): set correct stop_reason in baseline transcript entries
Set stop_reason="tool_use" for assistant messages with tool calls and
stop_reason="end_turn" for final text responses. This ensures the
transcript format is compatible with the SDK's --resume flag when a
user switches from fast to extended_thinking mode mid-conversation.
2026-04-02 14:39:47 +02:00
Zamil Majdy
2d04584c84 fix(backend/copilot): correct outdated E2B bridge threshold in system prompt
The prompt said files >5 MB go to /home/user/ but the actual threshold
was lowered to 32 KB. Replace with a generic description that avoids
hardcoding the threshold and directs the model to the [Sandbox copy
available at ...] annotation instead.
2026-04-02 14:39:35 +02:00
Zamil Majdy
2578f61abb fix(backend): remove dead simulation_context param, fix options rename, dedupe constant
- Remove unused simulation_context parameter from simulate_block, RunAgentInput, and _run_agent
- Update placeholder_values references to options (renamed in #12595), with fallback for legacy data
- Remove duplicate _THINKING_BLOCK_TYPES definition in transcript.py
- Update tests to use options field name
2026-04-02 14:38:28 +02:00
Zamil Majdy
927c6e7db0 fix(frontend): add aria-label and disabled state to mode toggle button
- Add aria-label for screen reader accessibility
- Disable button during streaming to prevent confusing mode switches mid-turn
- Add opacity/cursor styling when disabled
2026-04-02 14:38:00 +02:00
Zamil Majdy
f753e6162f fix(backend): consolidate test_agent_search.py into agent_search_test.py
The test file used prefix naming (test_*.py) which is inconsistent with
the codebase convention (*_test.py). Moved all tests into the existing
agent_search_test.py file and removed the duplicate.
2026-04-02 14:37:38 +02:00
Zamil Majdy
b996bc556b fix(backend): clamp search_users limit to [1, 50] to prevent negative take values
A negative limit query parameter would pass through min(limit, 50) as
a negative value to Prisma's take parameter, causing unexpected behavior.
Added max(1, ...) clamping and test coverage for the edge case.
2026-04-02 14:37:02 +02:00
Zamil Majdy
e4f79261c1 fix(docs): correct host field type from "str (password)" to "str (secret)"
The host field is marked as secret=True (hidden in UI) but is not a password.
The "(password)" label was misleading.
2026-04-02 14:36:12 +02:00
Zamil Majdy
09bc939498 chore: remove accidentally committed test screenshots
These binary images and log files inflate the repository and are not
needed for CI or code review.
2026-04-02 14:35:26 +02:00
Zamil Majdy
79c5a10f75 fix(backend/copilot): add missing security test for tool-outputs path allowlist
The allowlist was expanded to accept tool-outputs/ in addition to
tool-results/, but security_hooks_test.py only verified tool-results.
Add test_read_tool_outputs_allowed to close the security test coverage gap.
2026-04-02 14:35:18 +02:00
Zamil Majdy
2bf5a37646 fix(backend): add ge/le bounds to claude_agent_max_transient_retries config field
The field lacked validation bounds unlike max_turns and max_budget_usd,
allowing negative or excessively large values to be configured.
2026-04-02 14:35:09 +02:00
Zamil Majdy
d5d24e6e66 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/dry-run-special-blocks 2026-04-02 14:34:50 +02:00
Zamil Majdy
c9cbd7531e fix(copilot): sanitize tool_use_id and resp_preview in post_tool_use_hook, remove test-results
- post_tool_use_hook logged tool_use_id with only truncation ([:12]) while
  post_tool_failure_hook properly sanitized it via _sanitize(). Now both hooks
  use _sanitize() consistently to strip control characters before logging.
- resp_preview from tool_response was also logged without sanitization.
- Remove test-results/ directory that should not ship in a production PR.
2026-04-02 14:34:46 +02:00
Zamil Majdy
289a19d402 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-p0-cli-internals 2026-04-02 14:34:33 +02:00
Zamil Majdy
7800af1835 fix(backend): remove duplicate _THINKING_BLOCK_TYPES definition in transcript.py
The constant was already defined at module level (line 48) and used by both
_strip_thinking_from_non_last_assistant and _flatten_assistant_content. The
duplicate added at line 692 was redundant.
2026-04-02 14:34:03 +02:00
Zamil Majdy
114f91ff53 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-mode-toggle 2026-04-02 14:32:47 +02:00
Zamil Majdy
63a0153e4f fix(platform): fix ClamAV connectivity in Docker containers
clamd was only listening on 127.0.0.1 inside its container, so
container-to-container connections on the Docker network were refused.

- Add CLAMD_CONF_TCPAddr=0.0.0.0 to docker-compose so clamd binds
  to all interfaces
- Change default clamav_service_host from "localhost" to "clamav"
  (the docker-compose service name), matching how other services
  like redis, rabbitmq, supabase-db are referenced
2026-04-02 13:34:50 +02:00
Zamil Majdy
1364616ff1 fix(frontend+backend): prefill block inputs and hide advanced in CoPilot setup card
Backend:
- get_inputs_from_schema() now accepts input_data to populate each field's
  value with what CoPilot already provided, and includes the advanced flag
  from the schema so the frontend can hide non-essential fields.

Frontend:
- SetupRequirementsCard prefills form inputs from backend-provided values
  instead of showing empty forms
- Advanced fields hidden by default with "Show advanced fields" toggle
  (matching builder behaviour)
- siblingInputs built from both input values and discriminator_values
  so the host pattern modal can extract the host from the URL
- extractInitialValues() populates form state from prefilled values
2026-04-02 13:24:49 +02:00
Zamil Majdy
4e9169c1a2 fix(frontend): prefill host pattern in CoPilot credential setup modal
The SetupRequirementsCard passed inputValues={{}} to CredentialsGroupedView,
which meant the HostScopedCredentialsModal never received the target URL
from the backend's discriminator_values. The "Host Pattern" field was always
empty even though the CoPilot knew the exact host (e.g. api.openai.com).

Add buildSiblingInputsFromCredentials() to extract the discriminator value
(URL) from the missing_credentials setup_info and pass it as siblingInputs
so the modal can prefill the host pattern.
2026-04-02 12:40:45 +02:00
Zamil Majdy
705e97ec46 fix(backend/copilot): don't cache DEFAULT_TIER for non-existent users
When `_fetch_user_tier` is called for a user that doesn't exist yet, it
was returning `DEFAULT_TIER` (FREE) which the `@cached` decorator would
store for 5 minutes. If the user was then created with a higher tier
(e.g. PRO), they'd receive the stale cached FREE tier until TTL expiry.

Fix: raise `_UserNotFoundError` instead of returning `DEFAULT_TIER` when
the user record is missing or has no subscription tier. The `@cached`
decorator only caches successful returns, not exceptions. The outer
`get_user_tier` wrapper catches the exception and returns `DEFAULT_TIER`
without caching, so the next call re-queries the database.

Adds a regression test verifying that a not-found result is not cached
and a subsequent lookup after user creation returns the correct tier.
2026-04-02 11:33:10 +02:00
Zamil Majdy
8cea0bede0 fix(backend): generate type-appropriate dry-run fallback for typed AgentInputBlock subclasses
The simulator's AgentInputBlock passthrough always generated a string
fallback when no user value was provided. Typed subclasses like
AgentNumberInputBlock (int), AgentDateInputBlock (date), and
AgentToggleInputBlock (bool) then failed downstream validation.

Inspect the block's output schema `result` pin to determine the expected
type and generate an appropriate default (0 for int, today's date for
date, false for bool, etc.) instead of a plain string.
2026-04-02 11:32:51 +02:00
Zamil Majdy
5e52050788 fix(backend): re-enable dry-run input validation and fill missing simulator output pins
1. _base.py: Instead of blanket-skipping input validation in dry-run mode,
   validate non-credential fields so blocks executing for real (e.g.
   AgentExecutorBlock) still get proper input validation. Credential fields
   are excluded since they contain sentinel None values.

2. simulator.py: After yielding LLM-simulated outputs, fill in
   type-appropriate defaults for any required output pins the LLM omitted.
   This prevents downstream nodes from stalling in INCOMPLETE state when
   the simulation response is incomplete.
2026-04-02 11:11:57 +02:00
Zamil Majdy
0b77af29aa fix(platform): address PR reviewer blockers and should-fix items
- Add schema.prisma comment documenting intentional @default(PRO) for beta
- Validate tier value in RateLimitDisplay.tsx to prevent undefined renders
- Add user existence check (404) in get_user_rate_limit_tier endpoint
- Add auth test for search_users endpoint
- Add tier downgrade test (PRO -> FREE)
- Add test for get_user_rate_limit_tier with non-existent user
2026-04-02 11:10:31 +02:00
Zamil Majdy
bd4cc21fc6 fix(backend/copilot): preserve successful graph fetches on timeout and use py3.10-compat wait_for
Remove the blanket `a.graph = None` loop in the TimeoutError handler that
was wiping already-fetched graphs. Agents that completed before the timeout
keep their results; agents still pending already have graph=None from the
model default.

Also replace `asyncio.timeout()` (Python 3.11+) with `asyncio.wait_for()`
which is available since Python 3.4, matching the `python >= 3.10`
requirement in pyproject.toml.

Add tests for the timeout path, success path, and skip-no-graph-id path.
2026-04-02 11:08:20 +02:00
Zamil Majdy
19ea753639 fix(backend/copilot): address review feedback on _bridge_to_sandbox
- Use asyncio.to_thread for synchronous file read in async context
- Promote bridge failure logging from DEBUG to WARNING
- Extract magic number 2000 to _DEFAULT_READ_LIMIT named constant
2026-04-02 11:07:48 +02:00
Zamil Majdy
07fd734fa1 style: reformat _base.py (black) 2026-04-02 10:56:09 +02:00
Zamil Majdy
8a4a16ec5c fix(backend): don't null credentials in dry-run prepare_dry_run
validate_data strips None values from input_data before JSON schema
validation. Setting credentials=None caused the field to be absent,
failing the required check. Keep original credentials in input_data
(actual platform creds injected via extra_exec_kwargs in manager.py).

This fixes OrchestratorBlock failing with "credentials is a required
property" when executed as part of a child graph in dry-run mode.
2026-04-02 10:35:08 +02:00
Zamil Majdy
5551da674e fix(blocks): skip input validation in dry-run mode for blocks with sentinel credentials
Two fixes for dry-run execution of nested agents:

1. _base.py: Skip validate_data() when execution_context.dry_run is True.
   prepare_dry_run() sets credentials=None for OrchestratorBlock (platform
   key injected separately), but the block's own JSON schema validation
   rejected None as "required property". This caused any dry-run execution
   of graphs containing OrchestratorBlock to fail with BlockInputError.

2. agent.py: Check required inner-agent inputs against data["inputs"]
   instead of top-level data keys (previous commit 6f03ceeb88).
2026-04-02 10:33:22 +02:00
Zamil Majdy
e57e48272a security: remove test artifacts containing leaked API keys and OAuth tokens 2026-04-02 10:23:21 +02:00
Zamil Majdy
6f03ceeb88 fix(blocks): validate AgentExecutorBlock inputs against nested inputs dict
get_missing_input() and get_mismatch_error() were checking required fields
from the inner agent's input_schema against the top-level node data keys
(inputs, user_id, graph_id, etc.) instead of against data["inputs"] where
the actual field values live. This caused any AgentExecutorBlock with
required inner-agent inputs to fail validation with "This field is required"
even when the values were correctly provided in the inputs dict.
2026-04-02 10:10:06 +02:00
Zamil Majdy
554ff0b20b dx(backend/copilot): add live execution test evidence for subagent security hooks
Test results from live execution showing SubagentStart/SubagentStop hooks
firing correctly for two parallel Agent tool invocations with proper
slot tracking (active=N/10) and JSONL transcript persistence.
2026-04-02 10:06:56 +02:00
Zamil Majdy
c2f421cb42 dx(backend/copilot): add live execution guardrail verification for PR #12636
Programmatic verification from running container proving all P0 guardrails
are deployed and active: max_turns=50, max_budget_usd=5.0,
fallback_model=claude-sonnet-4-20250514, max_transient_retries=3,
security env vars, and _last_reset_attempt infinite-loop fix.
2026-04-02 10:01:46 +02:00
Zamil Majdy
dd228de17d fix(backend/copilot): preserve binary files when bridging to E2B sandbox
_bridge_to_sandbox was decoding all file content with
`errors='replace'`, silently corrupting non-UTF-8 bytes (images, PDFs,
etc.) by replacing them with U+FFFD.

Now attempts strict UTF-8 decode first; on failure writes raw bytes
via sandbox.files.write() (which accepts Union[str, bytes, IO]) or
base64-encoded shell pipe for /tmp paths.

Also updates _sandbox_write to accept str | bytes and adds tests for
both small and large binary file bridging.
2026-04-02 09:56:52 +02:00
Zamil Majdy
c26ff22f9c fix(blocks): allow MySQL SELECT INTO @variable syntax in SQL query validation
The INTO keyword was blanket-blocked in _DISALLOWED_KEYWORDS, which
incorrectly rejected the valid read-only MySQL syntax
`SELECT ... INTO @variable` for session variable assignment.

Replace the blanket INTO ban with a contextual check that allows
INTO followed by @-prefixed user variables while still blocking:
- SELECT INTO table_name (PG/MSSQL table creation)
- SELECT INTO OUTFILE/DUMPFILE (MySQL filesystem writes)
- INSERT INTO (already caught by INSERT, but defense-in-depth)

Also remove dead OUTFILE/DUMPFILE entries from _DISALLOWED_KEYWORDS
since sqlparse classifies them as Name tokens, not Keywords, so
they were never matched by the keyword extraction logic.
2026-04-02 09:56:51 +02:00
Zamil Majdy
760360fbe9 fix(backend): use deterministic SHA-256 hash for Redis cache keys
Python's built-in `hash()` is randomised per-process via PYTHONHASHSEED.
In a multi-pod deployment each pod computes a different hash for the same
arguments, causing Redis cache lookups and invalidations (e.g.
`cache_delete`) to silently miss across pods.

Replace `hash()` with `hashlib.sha256` over the `repr()` of the key
tuple, which is deterministic across processes and machines.
2026-04-02 09:56:35 +02:00
Zamil Majdy
e3d589b180 fix(backend/copilot): exclude StreamError/StreamStatus from events_yielded counter
StreamError and StreamStatus are ephemeral notifications, not content
events. When _run_stream_attempt yields a StreamError for a transient
API error before raising _HandledStreamError, the events_yielded counter
was incremented, causing _next_transient_backoff() to return None and
bypassing the retry logic entirely. Exclude these event types from the
counter so transient errors are properly retried with exponential backoff.
2026-04-02 09:56:34 +02:00
Zamil Majdy
913d93f47c test: add E2E dry-run loop validation screenshots (round 4)
Unit tests (37/37 pass) and browser E2E test confirm the full
create -> dry-run -> inspect -> fix -> dry-run loop is working.
2026-04-02 09:49:26 +02:00
Zamil Majdy
03e5d37dc4 test(backend/copilot): add E2E test screenshots for PR #12646 round 1 2026-04-02 09:38:04 +02:00
Zamil Majdy
6e2dab413e test(backend): add E2E test screenshots for SQL Query block PR #12569
Screenshots from round 3 E2E testing:
- Block search results showing SQL Query block
- Basic fields: DatabaseType, Host, Database, Query, Credentials
- Advanced fields: Port, ReadOnly, Timeout, MaxRows
- Credential modal with Username & Password labels
2026-04-02 09:35:25 +02:00
Zamil Majdy
b10dc7c2d5 ci(backend/copilot): add E2E test evidence for rate-limit tiering (round 4) 2026-04-02 09:25:26 +02:00
Zamil Majdy
8de935c84b dx(backend/copilot): add round 3 E2E test screenshots for PR #12636 2026-04-02 09:20:32 +02:00
Zamil Majdy
dd34b0dc48 fix(backend): lower bridge shell threshold and add collision-free sandbox paths
- Lower _BRIDGE_SHELL_MAX_BYTES from 5 MB to 32 KB to stay within
  ARG_MAX when base64-encoding content for shell transfer.
- Prefix bridged sandbox filenames with a 12-char SHA-256 hash of the
  full source path to prevent collisions when different source files
  share the same basename (e.g. multiple result.json files).
- Fix potential NameError in exception handler when basename is not yet
  assigned.
2026-04-02 09:07:43 +02:00
Zamil Majdy
015e0d591e fix(backend/copilot): remove type: ignore from conftest, use named fixtures
Address CodeRabbit review: remove # type: ignore[override] from SDK
conftest fixtures per AGENTS.md no-suppressor rule. Use name= parameter
in pytest_asyncio.fixture decorator with private function names instead.
2026-04-02 08:29:06 +02:00
Zamil Majdy
2cb65f5c34 fix(backend/copilot): use working_dir in prompt examples instead of hardcoded /home/user
The storage supplement template and _persist_and_summarize had hardcoded
/home/user/ paths in save_to_path examples. In local (bubblewrap) mode
the working dir is /tmp/copilot-<session>/, not /home/user/. Use the
{working_dir} template variable in prompting.py and a generic
<working_dir> placeholder in base.py so the model gets correct paths
regardless of execution mode.
2026-04-02 08:26:18 +02:00
Zamil Majdy
3a49086c3d fix(backend/copilot): use resolved path for bridging, explicit return None
- Pass `resolved` (realpath-expanded) to `_bridge_to_sandbox` in
  `_read_file_handler` so the bridge target matches the file that was
  actually read (addresses review comment).
- Replace bare `return` with explicit `return None` in
  `_bridge_to_sandbox` large-file skip path for consistency with the
  declared `str | None` return type.
2026-04-02 08:19:25 +02:00
Zamil Majdy
0e567df1da fix(backend/copilot): add concrete tool examples to file copy prompting
The "Moving files between storages" section only had direction labels
("Sandbox → Persistent") with no tool examples. Model didn't know HOW
to copy. Now shows write_workspace_file(source_path=...) for upload and
read_workspace_file(save_to_path=...) for download.
2026-04-02 08:15:59 +02:00
Zamil Majdy
b5b754d5eb fix(backend/copilot): return sandbox path from bridge, inform model of copy location
Address CodeRabbit review: _bridge_to_sandbox now returns the sandbox
path (or None on failure) so callers can append "[Sandbox copy available
at /tmp/file.json]" to the Read result. This gives the model explicit
feedback about where to find the file in the sandbox, instead of
silently bridging with no indication.
2026-04-02 08:03:36 +02:00
Zamil Majdy
456bb1c4d0 fix(frontend): use unfiltered credentials for host-scoped deduplication
The useCredentials hook pre-filters savedCredentials by discriminatorValue.
When no URL is entered yet, the filtered list is empty, causing the
deduplication logic to miss existing credentials and create duplicates.

Fix: access the full unfiltered credential list from CredentialsProvidersContext
for both the hasExistingForHost check and the delete-before-create logic.
2026-04-02 08:02:23 +02:00
Zamil Majdy
263cd0ecac fix(backend/copilot): add bridging to Read tool, size limits, prompting for images
- Add _bridge_to_sandbox call in _read_file_handler (tool_adapter.py)
  so the MCP Read tool (which the model actually uses) also bridges
  SDK-internal files into the E2B sandbox — not just the E2B read_file
- Move E2B-specific bridging text to _E2B_TOOL_NOTES (not shown in
  local bubblewrap mode)
- Add size-tiered bridging: shell base64 for <=5MB, files API for
  5-50MB, skip for >50MB
- Add CRITICAL prompting sections for binary/image data handling
  (use workspace, not inline) and @@agptfile references
- Add 7 unit tests for _bridge_to_sandbox
- Fix comment accuracy in context.py, update docstring
2026-04-02 08:00:05 +02:00
Zamil Majdy
66afca6e0c fix(backend/copilot): address review feedback - size limits, prompting, tests
- Move E2B-specific bridging text from shared prompt section to E2B
  supplement's extra_notes (MAJOR 1)
- Add size cap to _bridge_to_sandbox: <=5MB uses shell base64 to /tmp,
  5-50MB uses sandbox.files.write to /home/user, >50MB skipped (MAJOR 2)
- Add 7 unit tests for _bridge_to_sandbox covering happy path, skip
  conditions, error handling, and size-based routing (MINOR 3)
- Fix inaccurate comment about tool-outputs name origin (NIT 7)
- Update is_allowed_local_path docstring to mention tool-outputs (NIT 9)
- Add prompting guidance for handling base64 images in tool outputs
  (save to workspace, show via download URL)
- Add prompting guidance for using @@agptfile: references instead of
  copy-pasting large data between tools
- Add no-op server/graph_cleanup fixtures to sdk/conftest.py so SDK
  unit tests don't require Postgres
2026-04-02 07:56:49 +02:00
Zamil Majdy
a71396ee48 fix(backend): update dry-run tests for platform key + fix falsy value filter
- Mock `_get_platform_openrouter_key` in `test_prepare_dry_run_orchestrator_block`
  so the test doesn't depend on a real OpenRouter key being present in CI.
  Also fix incorrect assertion that model is preserved (it's overridden to
  the simulation model).

- Fix output filter in `simulate_block` that incorrectly dropped valid falsy
  values like `False`, `0`, and `[]`. Now only `None` and empty strings are
  skipped.

- Add `test_generic_block_preserves_falsy_values` test to cover the fix.
2026-04-02 07:52:09 +02:00
Zamil Majdy
beb43bb847 fix(frontend): replace duplicate host-scoped credentials and add delete support
- HostScopedCredentialsModal now deletes existing credentials for the same
  host before creating new ones, preventing duplicates
- Wire up delete flow: CredentialsFlatView passes onDelete to CredentialRow,
  CredentialsInput renders DeleteConfirmationModal
- Update button text to "Update headers" when credentials already exist
- Dynamic modal title/button: "Update" vs "Add" based on existing creds
2026-04-02 07:51:59 +02:00
Zamil Majdy
a55653f8c1 fix(backend): tighten fallback model detection and reset flag on retry
- Remove "overloaded" from the fallback detection pattern in _on_stderr;
  only "fallback" reliably indicates the SDK switched models. An
  "overloaded" stderr line may just be a transient 529 error that gets
  retried without activating the fallback.

- Reset fallback_model_activated = False at the start of each retry
  iteration (alongside fallback_notified) so a flag set during a failed
  attempt does not leak into the next attempt as a spurious notification.
2026-04-02 07:50:34 +02:00
Zamil Majdy
f3dd708cf6 fix(backend/copilot): fix tool output file reading between E2B and host
Three issues prevented the copilot agent from processing large tool
outputs (e.g. base64 images) in the E2B sandbox:

1. _persist_and_summarize used path= attribute in the truncation tag,
   which the model confused with a local filesystem path. Changed to
   workspace_path= and added save_to_path guidance for E2B processing.

2. is_allowed_local_path only accepted "tool-results" directory but the
   SDK may also use "tool-outputs". Now accepts both.

3. When E2B is active and the Read tool accesses an SDK-internal file,
   the content was returned to the conversation but not available in
   the sandbox for bash processing. Added automatic bridging that copies
   the file into /tmp/<filename> in the sandbox.
2026-04-02 07:47:39 +02:00
Zamil Majdy
c4ff31c79c fix(backend/copilot): remove duplicate test and narrow exception assertion
- Remove duplicate `test_dry_run_accepts_explicit_false` (identical to
  `test_dry_run_accepts_false`)
- Use `pydantic.ValidationError` instead of broad `Exception` in
  `test_wait_for_result_upper_bound`
2026-04-02 07:25:44 +02:00
Zamil Majdy
9f2257daaa refactor(backend): move dry-run credential logic from manager.py to simulator.py
- OrchestratorBlock now uses platform simulation model + OpenRouter key
  instead of user's model/credentials during dry-run
- Credential restore + fallback-to-simulation logic moved into
  prepare_dry_run() and get_dry_run_credentials() in simulator.py
- manager.py reduced by ~30 lines of business logic
- Falls back to LLM simulation if platform OpenRouter key unavailable
2026-04-02 07:10:28 +02:00
Zamil Majdy
925e9a047c fix(platform): address remaining should-fix items for rate-limit tiering
- Add docstring noting SubscriptionTier mirrors schema.prisma enum and
  can be replaced with prisma.enums import after prisma generate
- Remove unnecessary JSDoc comments from useRateLimitManager helpers
  per frontend code convention (avoid comments unless complex)
- Add audit trail: log old tier when admin changes a user's tier
- Fix stale test assertion (DEFAULT_TIER is FREE, not PRO)
- Show tier label ("Pro plan") in UsagePanelContent for end users
- Add formatResetTime unit tests (UsagePanelContent.test.ts)
- Add tier label display test in UsageLimits.test.tsx
- Fix pre-existing pyright errors from prisma stubs not having
  subscriptionTier (type: ignore until prisma generate is run)
2026-04-02 06:56:57 +02:00
Zamil Majdy
3e6faf2de7 fix(copilot): address remaining should-fix items from reviewer
- Extract _normalize_model_name() to deduplicate provider-prefix
  stripping and dot-to-hyphen normalization shared by _resolve_sdk_model
  and _resolve_fallback_model.
- Emit a StreamStatus notification when the SDK activates the fallback
  model (detected via CLI stderr lines containing "fallback" or
  "overloaded").
- Item 5 (transcript rollback) was already addressed — both
  _HandledStreamError and generic Exception handlers snapshot and
  restore transcript_builder._entries on retry.
2026-04-02 06:53:55 +02:00
Zamil Majdy
40a1f504c0 fix(copilot): address 6 should-fix items from reviewer
- Add CLAUDE_CODE_TMPDIR unit tests for build_sdk_env
- Strengthen _sanitize() tests with caplog assertions
- Fix user-facing text (no internal tool names)
- Rename task_tool_use_ids → subagent_tool_use_ids
- Standardize 'Starting agent' terminology
- Fix denial messages: sub-tasks → sub-agents
2026-04-02 06:49:24 +02:00
Zamil Majdy
22e8c5c353 fix(copilot): update response_adapter test for expanded transient patterns
"API rate limited" is now correctly caught by is_transient_api_error
after adding 429/rate-limit patterns. Use a non-transient error
("Invalid API key provided") to test the raw error pass-through path.
2026-04-02 06:31:24 +02:00
Zamil Majdy
1de2a7fb09 fix(platform): address PR review items for rate-limit tiering
- Change DEFAULT_TIER from PRO to FREE (fail-closed on DB errors)
- Use shared_cache=True (Redis-backed) for _fetch_user_tier so tier
  changes propagate across pods immediately
- Use TIER_MULTIPLIERS.get(tier, 1) to avoid KeyError on unknown tiers
- Rename _tier to tier in routes.py where the variable is used, and
  to _ where it is truly unused
- Add minimum 3-char query length for search_users to prevent user
  table enumeration
- Use generated API client (getV2SearchUsersByNameOrEmail) instead of
  raw fetch() in useRateLimitManager
- Remove unnecessary cast and fallback in RateLimitDisplay
- Fix fragile call-count-based _ld_side_effect in tests to use
  flag_key matching pattern
- Update test assertion for DEFAULT_TIER change (FREE not PRO)
2026-04-02 06:28:36 +02:00
Zamil Majdy
b3d9e9e856 fix(backend): add 429/5xx patterns to is_transient_api_error and add config validators
- Add rate-limit (429) and server error (5xx) string patterns to
  is_transient_api_error() so the fallback retry path catches these
  in addition to connection-level errors (ECONNRESET).
- Add ge/le validators on max_turns (1-500) and max_budget_usd
  (0.01-100.0) to prevent misconfiguration.
- Rename max_transient -> max_transient_retries and
  _can_retry_transient() -> _next_transient_backoff() for clarity.
- Add comprehensive tests for all new transient patterns and config
  boundary validation.
2026-04-02 06:21:51 +02:00
Zamil Majdy
48b166a82c fix(backend): address PR review items for include_graph feature
- Surface truncation notice to copilot via response message when
  >_MAX_GRAPH_FETCHES agents are skipped, instead of only logging
- Add guidance in agent_generation_guide to use include_graph only
  after narrowing to a specific agent by UUID
- Add tests for truncation, mixed graph_id presence, partial
  success/failure across multiple agents, and keyword-search
  enrichment path
2026-04-02 06:21:27 +02:00
Zamil Majdy
697b15ce81 fix(backend/copilot): always append user message to transcript on retries
When a duplicate user message was suppressed (e.g. network retry), the
user turn was not added to the transcript builder while the assistant
reply still was, creating a malformed assistant-after-assistant structure
that broke conversation resumption. Now the user message is always
appended to the transcript when present and is_user_message, regardless
of whether the session-level dedup suppressed it.
2026-04-02 06:18:26 +02:00
Zamil Majdy
5beabf936c fix(frontend): revert useChatSession mutation call to match generated API
The generated mutateAsync requires an argument even for void mutations
due to react-query typing. Use `as never` cast to satisfy both the
generated type and the void constraint.
2026-04-02 06:12:24 +02:00
Zamil Majdy
32bfe1b209 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-p0-cli-internals 2026-04-01 20:52:00 +02:00
Zamil Majdy
62302db470 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/agent-generation-dry-run-loop 2026-04-01 20:51:58 +02:00
Zamil Majdy
89c7f34d26 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/dry-run-special-blocks 2026-04-01 20:51:54 +02:00
Zamil Majdy
543fc2da70 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into zamilmajdy/secrt-2171-sql-query-block-for-copilotautopilot-analytics-access 2026-04-01 20:51:52 +02:00
Zamil Majdy
7f986bc565 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-mode-toggle 2026-04-01 20:51:50 +02:00
Zamil Majdy
f4571cb9e1 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-04-01 20:51:48 +02:00
Zamil Majdy
5f41afe748 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-include-graph-option 2026-04-01 20:51:46 +02:00
Zamil Majdy
d046c01a65 feat(copilot): allow background sub-agents and add Agent tool UI
- Remove run_in_background deny block — SDK handles async lifecycle
  (returns isAsync:true, model polls via TaskOutput)
- Keep max_subtasks concurrency limit (background agents count too)
- Add "agent" tool category to frontend GenericTool with RobotIcon
- Detect isAsync output to show "Agent started" not "Agent completed"
- Add TaskOutput renderer showing retrieval status and results
- Fix pre-existing TS error in useChatSession (void mutation body)
- Update tests: background allowed, limit still enforced
2026-04-01 20:50:49 +02:00
Zamil Majdy
b220fe4347 test(copilot): add build_sdk_env tests for all 3 auth modes
Cover subscription, direct Anthropic, and OpenRouter auth modes in
build_sdk_env(). Also verifies that all modes return a mutable dict
that can accept security env vars like CLAUDE_CODE_TMPDIR.
2026-04-01 20:31:32 +02:00
Zamil Majdy
7af138adba fix(backend): use word-boundary regex for database name sanitization
Replaces naive str.replace() with re.sub() using \b word boundaries
when scrubbing database names from error messages. Prevents mangling
unrelated words when the database name is a common substring like
"test", "data", or "on".
2026-04-01 20:30:39 +02:00
Zamil Majdy
5c406a20ba fix(backend): handle AgentOutputBlock format field in dry-run simulation
Mirror the real AgentOutputBlock.run() behavior: when a format string
is provided, apply Jinja2 formatting and yield only the "output" pin;
when no format is provided, yield both "output" and "name" pins.
2026-04-01 20:29:40 +02:00
Zamil Majdy
61513b9dad fix(copilot): mock build_sdk_env to return {} instead of None in retry tests
The tests were mocking build_sdk_env to return None, but the service
code now assigns security env vars (CLAUDE_CODE_TMPDIR, etc.) to the
returned dict. This caused TypeError: 'NoneType' object does not
support item assignment in all 6 retry scenario tests.
2026-04-01 20:27:51 +02:00
Zamil Majdy
6f679a0e32 fix(backend/copilot): preserve tool_calls and tool_call_id through context compression 2026-04-01 20:27:33 +02:00
Zamil Majdy
b8065212b1 chore: remove accidentally committed test screenshots 2026-04-01 19:16:25 +02:00
Zamil Majdy
d5281a9a13 chore: remove accidentally committed test screenshots 2026-04-01 19:16:22 +02:00
Zamil Majdy
05495d8478 chore: remove accidentally committed test screenshots 2026-04-01 19:16:18 +02:00
Zamil Majdy
bae409d04e chore: remove accidentally committed test screenshots 2026-04-01 19:16:14 +02:00
Zamil Majdy
e11eb2caaa chore: remove accidentally committed test screenshots 2026-04-01 19:16:10 +02:00
Zamil Majdy
2c04768711 chore: remove accidentally committed test screenshots 2026-04-01 19:15:35 +02:00
Zamil Majdy
c9bf3aa339 fix(backend/copilot): clear partial graphs on timeout for consistent state 2026-04-01 19:13:10 +02:00
Zamil Majdy
e753aee7a0 fix(copilot): prevent infinite transient retry loop
The transient_retries counter was reset to 0 at the top of the while
loop on every iteration, including after transient retry `continue`
statements.  Since transient retries don't increment `attempt`, the
counter reset every time, creating an infinite retry loop that could
never exhaust the max_transient budget.

Fix: only reset transient_retries when the context-level `attempt`
actually changes, using a _last_reset_attempt sentinel.
2026-04-01 18:21:50 +02:00
Zamil Majdy
f76566c834 fix(test): update dry-run param test to match deduplicated description
The run_agent dry_run description was updated during deduplication to
reference the agent_generation_guide instead of saying "preview mode".
Update the test assertion to match.
2026-04-01 18:18:20 +02:00
Zamil Majdy
a58b997141 fix(test): align simulation prompt test with error pin exclusion from required list
The test expected "error" in "Available output pins" but the prompt now
correctly excludes error from the required output pins list to match the
instruction telling the LLM to omit it.
2026-04-01 18:15:42 +02:00
Zamil Majdy
3f24a003ad fix(copilot): add None guard to fix pyright reportOperatorIssue
_resolve_fallback_model returns str | None, so pyright flags the
`"." not in result` assertion.  Add an explicit `is not None` check
before the containment test to narrow the type.
2026-04-01 18:15:16 +02:00
Zamil Majdy
1a645e1e37 fix(backend/copilot): align _flatten_assistant_content with master (drop tool_use blocks)
The merge conflict resolution copied the pre-#12625 version of
_flatten_assistant_content which converts tool_use blocks to
[tool_use: name] placeholders. Master's #12625 changed this to
drop tool_use blocks entirely to prevent the model from mimicking
them as plain text. Align the canonical transcript.py with master.
2026-04-01 18:14:59 +02:00
Zamil Majdy
bee76962b0 fix(backend): rollback write transaction on error in SQL query block
Use explicit except/else instead of finally to ensure write transactions
are rolled back when an exception occurs, rather than committed.
2026-04-01 18:13:37 +02:00
Zamil Majdy
864e68bed1 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-04-01 18:09:58 +02:00
Zamil Majdy
7c6201110c test: add E2E screenshots for PR #12578 2026-04-01 18:06:51 +02:00
Zamil Majdy
bded680b77 docs(backend): add cross-cutting test location explanation to dry_run_loop_test.py 2026-04-01 18:06:51 +02:00
Zamil Majdy
1e008dc172 fix(copilot): align dry_run_loop_test with #12582's required dry_run field
After merging dev, #12582 made dry_run a required field with description
"Execute in preview mode." — update tests to match:
- Assert dry_run is in required (not optional) for both run_agent/run_block
- Match "preview mode" instead of "simulation"/"guide" in descriptions
- Pass dry_run=False explicitly in RunAgentInput constructor tests
- Lower description length threshold to 10 (was 20) for the shorter text
2026-04-01 18:06:51 +02:00
Zamil Majdy
9966e122ab test(copilot): add functional tests for dry-run loop beyond substring checks
Add 23 new tests covering:
- run_agent and run_block OpenAI tool schema validation (type, optionality,
  description quality, coexistence of dry_run + wait_for_result)
- RunAgentInput Pydantic model behavior (default value, bool coercion,
  combined parameters, validation bounds, string stripping)
- Guide workflow ordering (create before dry-run, dry-run before inspect,
  fix before repeat, numbered step sequence)
2026-04-01 18:06:51 +02:00
Zamil Majdy
65108c31dc fix(copilot): reference SendAuthenticatedWebRequestBlock in tool discovery + fix CI 2026-04-01 18:06:51 +02:00
Zamil Majdy
7767c97f50 fix(copilot): deduplicate dry-run instructions, keep only in guide
Remove duplicated dry-run workflow text from prompting.py shared notes,
service.py DEFAULT_SYSTEM_PROMPT, run_agent.py tool/param descriptions,
create_agent.py, and edit_agent.py. The agent_generation_guide.md is the
single source of truth, loaded on-demand via get_agent_building_guide.
2026-04-01 18:06:51 +02:00
Zamil Majdy
69ab21ebe7 fix(copilot): address review round 2 — remove internal Python refs from guide, format system prompt
- Replace `_SHARED_TOOL_NOTES`, `prompting.py` reference in mcp_tool_guide.md
  with LLM-friendly wording ("described in the tool notes") since the guide
  is shown to the model, not to developers.
- Break the long single-line dry-run instruction in DEFAULT_SYSTEM_PROMPT
  into bullet points matching the surrounding prompt style for readability.
2026-04-01 18:06:31 +02:00
Zamil Majdy
6fe4e1b774 fix(copilot): address review round 1 — deduplicate prompts, relocate tests
- Slim down the duplicate error-pattern list in _SHARED_TOOL_NOTES
  (prompting.py) to a concise summary that references the guide for details,
  reducing maintenance surface from 5+ near-identical copies to one.
- Move dry_run_loop_test.py from backend/copilot/ (production package) to
  test/copilot/ to match the project's test directory convention.
- Route supplement tests through the public get_sdk_supplement() API instead
  of importing the private _SHARED_TOOL_NOTES symbol.
- Loosen overly-brittle assertions (exact step numbers, exact spacing around
  '/' in error pattern names) while preserving intent as prompt regression
  tests.  Add module-level docstring documenting the deliberate brittleness.
2026-04-01 18:06:31 +02:00
Zamil Majdy
c778cc9849 fix(platform): remove hardcoded 3-iteration cap from dry-run loop
Instead of capping at 3 iterations, let the copilot repeat the
dry-run -> fix cycle until the simulation passes or the problems
are clearly unfixable. This gives the copilot flexibility to keep
going if it's making progress, or stop early if issues are not
resolvable.
2026-04-01 18:06:31 +02:00
Zamil Majdy
50b635da6d fix(copilot): remove redundant "3 iterations" repetition in supplement
De-duplicate "after 3 iterations" from the same sentence that already
says "up to 3 iterations" — now reads "If issues persist, report..."
2026-04-01 18:06:31 +02:00
Zamil Majdy
08e254143b fix(copilot): standardize iteration wording, add test for tool discovery priority, fix cross-reference
- Standardize max-iteration wording to "3 iterations" everywhere (prompting.py,
  agent_generation_guide.md, tests) instead of mixed "3 times"/"3 iterations"
- Replace loose `or` fallback in test_shared_tool_notes_include_max_iterations
  with exact "3 iterations" assertion
- Add test_shared_tool_notes_include_tool_discovery_priority test
- Make mcp_tool_guide.md cross-reference explicit: point to `_SHARED_TOOL_NOTES`
  in `prompting.py` instead of vague "see shared supplement"
2026-04-01 18:06:31 +02:00
Zamil Majdy
89fcfc4e0a refactor(copilot): move tool/action search priority to shared supplement
Move the "check blocks first" strategy from `mcp_tool_guide.md` (only
loaded for MCP) into `_SHARED_TOOL_NOTES` so it applies to every
session. The MCP guide now references the shared strategy instead of
duplicating it.
2026-04-01 18:06:31 +02:00
Zamil Majdy
e7ca07f4bf fix(copilot): align dry-run prompt wording and tighten test assertion
- Align guide heading to "create -> dry-run -> fix" matching supplement
- Align error pattern names between guide and supplement to canonical form
- Drop loose "max " fallback in test assertion for precision
2026-04-01 18:06:31 +02:00
Zamil Majdy
c564ac7277 fix(copilot): address PR review - reduce prompt redundancy, tighten tests
- Slim down DEFAULT_SYSTEM_PROMPT to a brief one-liner referencing the
  supplement for detailed workflow (avoids ~300 token duplication)
- Tighten test assertions to use specific substring checks (e.g. section
  headers, exact phrases) instead of loose single-word matches
- Restore view_agent_output reference in the agent generation guide for
  node-by-node execution trace inspection
- Add test for view_agent_output mention in guide (22 tests total)
2026-04-01 18:06:31 +02:00
Zamil Majdy
ac3a826ad0 feat(copilot): add create -> dry-run -> fix loop to agent generation prompts
Instruct the copilot LLM to automatically dry-run agents after creating
or editing them, inspect the output for wiring issues, and fix iteratively
(up to 3 attempts) before presenting the agent as ready to the user.

Changes:
- System prompt: add "Agent Development: Create -> Dry-Run -> Fix Loop" section
- Tool descriptions: create_agent, edit_agent, run_agent, get_agent_building_guide
  now reference the dry-run verification workflow
- Prompting supplement: add "Iterative agent development" section with error
  pattern guidance (failed nodes, null outputs, unexecuted nodes)
- Agent generation guide: replace "Testing with Dry Run" with comprehensive
  "REQUIRED: Dry-Run Verification Loop" section including good/bad output
  examples and workflow steps 8-9
- Tests: 21 new tests verifying prompt content across all layers
2026-04-01 18:06:31 +02:00
Zamil Majdy
6f32184019 test: add E2E screenshots for PR #12575 2026-04-01 18:06:02 +02:00
Zamil Majdy
6d0eedae83 fix(backend): truncate large run() source code in simulation prompt
Prevent prompt blowup for blocks with very large run() implementations
by applying the same _MAX_INPUT_VALUE_CHARS limit used for input values.
2026-04-01 18:06:02 +02:00
Zamil Majdy
fb328f9d74 fix(backend): move os import to top-level, remove getattr duck typing, use schema-based credential stripping in simulator
- Move `import os` from function body to top-level (stdlib, no startup cost)
- Replace `getattr(ChatConfig(), "simulation_model", "")` with direct
  attribute access since the field has a default value
- Use `block.input_schema.get_credentials_fields()` to detect credential
  fields programmatically, falling back to common names
2026-04-01 18:06:02 +02:00
Zamil Majdy
a369fbe169 fix(copilot): replace tautological env-var tests with source assertions
The TestSecurityEnvVars tests were testing Python dict assignment rather
than verifying the actual production code. Replace with source-level
assertions that grep service.py for the required env var names, catching
accidental removals without duplicating production logic.
2026-04-01 18:05:50 +02:00
Zamil Majdy
2a0b74cae4 fix(backend): update test for new prompt format (Available output pins)
The build_simulation_prompt now uses "Available output pins" instead of
"MUST include" — update the test from dev to match the new prompt.
2026-04-01 18:05:46 +02:00
Zamil Majdy
b08f9fc02a fix(platform): regenerate openapi.json and fix flaky test teardown
- Regenerate openapi.json to include Pydantic v2 ValidationError fields
  (input, ctx) that were added after the Gemini Flash commit
- Wrap oauth_test.py session fixture teardown in try/except to handle
  RuntimeError when event loop is already closed during session shutdown
2026-04-01 18:05:46 +02:00
Zamil Majdy
857acb2bbc feat(backend): use Gemini Flash for dry-run simulation, make model configurable 2026-04-01 18:05:26 +02:00
Zamil Majdy
0cb230c4f0 test(backend): add dry-run tests for AgentExecutorBlock child graph spawning
Verify prepare_dry_run returns an unmodified shallow copy for
AgentExecutorBlock (identity, equality, mutation isolation).

Also cover simulator edge cases: AgentInputBlock with all-None/missing
fields, and generic blocks yielding zero meaningful outputs.
2026-04-01 18:05:26 +02:00
Zamil Majdy
2cd5c0eab8 refactor(backend): unify MCP block simulation into generic path
Remove the MCP-specific simulation function and prompt builder.
MCPToolBlock now uses the same generic LLM simulation as all other
blocks, grounded by the block's run() source code. This eliminates
code duplication and ensures MCP blocks benefit from the same
improvements (e.g., source code grounding) as other blocks.

Also removes corresponding MCP-specific tests since the generic
simulate_block path covers the same functionality.
2026-04-01 18:05:26 +02:00
Zamil Majdy
7bf8e460ea fix(backend): add folder assignment to library agent upsert update path
The upsert's update path was missing the folder connection logic that
the create path had, causing folder changes to be silently ignored when
re-adding a previously deleted library agent.
2026-04-01 18:05:13 +02:00
Zamil Majdy
84d328517a fix(backend): always yield result pin in MCP simulation success path
The success path now explicitly yields ("result", ...) from the parsed
response rather than iterating all pins with a None/empty filter.
This prevents downstream starvation when the LLM legitimately returns
null for side-effect-only tool results.
2026-04-01 18:05:13 +02:00
Zamil Majdy
842ff6c600 fix(backend): yield result pin in MCP simulation error path
When simulate_mcp_block catches a RuntimeError/ValueError, it now yields
a ("result", None) before ("error", ...) so downstream nodes connected
to the result pin are not starved during dry-run error paths.
2026-04-01 18:05:13 +02:00
Zamil Majdy
b510fbee2a docs: fix stale iteration cap (5 → 1) in agent generation guide 2026-04-01 18:05:13 +02:00
Zamil Majdy
bb7f0ad1f2 test(simulator): align tests with dynamic pin yielding behavior
Update test assertions to match the simulator's current behavior where
empty/missing output pins are omitted rather than yielded. Also fix
prompt assertion strings to match the actual prompt text.
2026-04-01 18:05:13 +02:00
Zamil Majdy
3f8af89b63 fix(frontend): only show error styling when error output is non-empty 2026-04-01 18:04:47 +02:00
Zamil Majdy
375e5e1f10 fix(simulator): clean up error handling + dynamic pin yielding
- Don't force empty error pin — only yield error when there's a real error
- Yield all pins dynamically from LLM response (not just result+error)
- Allow logical error simulation (invalid input etc.) but not auth errors
- Omit pins with no meaningful value
2026-04-01 18:04:47 +02:00
Zamil Majdy
fd1d706315 fix(frontend): replace lucide-react icons with Phosphor equivalents in mode toggle
Use Brain and Lightning from @phosphor-icons/react instead of Brain and
Zap from lucide-react to comply with the project icon guidelines.
2026-04-01 18:04:44 +02:00
Zamil Majdy
faf2f43f6a test(simulator): add unit tests for prompt building and passthrough logic
Covers credential stripping, realistic-output instructions, input/output
block passthrough, prepare_dry_run routing, missing-pin filling, and
LLM failure handling.
2026-04-01 18:04:17 +02:00
Zamil Majdy
eea230d37f fix(simulator): produce realistic output + strip credentials from prompt
- Strip credential fields from input before sending to LLM so it never
  sees null/empty credentials and incorrectly simulates auth failures.
- Strengthen prompt: NEVER return empty/null, always generate realistic
  URLs, text, and data structures. Error pin always empty string.
- Input blocks: generate default value when no user input provided
  (first dropdown option or block name).
2026-04-01 18:03:59 +02:00
Zamil Majdy
76965429f1 fix(simulator): restore input/output block passthrough in dry-run
Re-add the passthrough logic for AgentInputBlock and AgentOutputBlock
in simulate_block. These blocks are trivial passthroughs that don't
need LLM simulation -- forwarding input values directly is faster,
deterministic, and doesn't require API keys (which aren't available
in CI).
2026-04-01 18:03:39 +02:00
Zamil Majdy
eefa60368f test(simulator): remove input/output block passthrough tests
These tests asserted passthrough behavior for AgentInputBlock and
AgentOutputBlock which was removed in the preceding refactor commit.
The simulator now LLM-simulates these blocks using their run() source
code, so the old passthrough assertions are invalid and require an
API key not available in CI.
2026-04-01 18:03:39 +02:00
Zamil Majdy
88fe1e9b5e refactor(simulator): remove special-casing for input/output blocks
The simulator now has the block's run() source code via inspect.getsource(),
so it can figure out what any block does by reading the code. No need for
special isinstance checks for AgentInputBlock/AgentOutputBlock.
2026-04-01 18:03:39 +02:00
Zamil Majdy
93264b1177 fix(simulator): generate default values for input blocks in dry-run
When users click Simulate without providing input values,
AgentInputBlock.value is None and nothing gets yielded. This leaves
downstream blocks (like OrchestratorBlock) with unpopulated links,
causing them to be skipped entirely.

Fix: generate a sensible default — first dropdown option for
AgentDropdownInputBlock, or "sample {name}" for text inputs.
2026-04-01 18:03:39 +02:00
Zamil Majdy
3269d17880 fix(simulator): use Python 3.11-compatible f-string in build_simulation_prompt
The nested f-string on line 224 used triple double-quotes inside a
triple double-quoted f-string, which is only valid from Python 3.12.
Extract the implementation section to a separate variable to fix the
SyntaxError on Python 3.11 CI.
2026-04-01 18:03:39 +02:00
Zamil Majdy
1e5788f2cf feat(simulator): include block run() source code in simulation prompt
The LLM simulator now receives the block's actual run() function source
via inspect.getsource(). This gives the LLM exact knowledge of how
inputs transform to outputs, producing far more accurate simulations.
2026-04-01 18:03:39 +02:00
Zamil Majdy
ca8214d95f fix(frontend): refetch execution details after websocket subscription to close race-condition gap
Dry-run executions can complete before the WebSocket subscription is
established, causing the frontend to miss realtime updates.  After the
subscription is confirmed, immediately invalidate the execution-details
query so react-query refetches the latest state from the REST API.

Also reduce the polling interval from 2s to 1s for more responsive
feedback during fast-completing executions.
2026-04-01 18:03:39 +02:00
Zamil Majdy
f58ce5cc70 fix(backend): passthrough input/output blocks and preserve user model in dry-run
Input blocks (AgentInputBlock and all subclasses) and AgentOutputBlock are
pure passthrough -- they just forward their input values. Previously they
went through the LLM simulator which produced verbose generated text
instead of the raw value.

Also stop swapping the OrchestratorBlock model to gpt-4o-mini during
dry-run. The user's own model and credentials are now preserved, which
avoids credential mismatches (e.g. Anthropic key vs OpenAI model).
Iterations are still capped to 1.
2026-04-01 18:03:39 +02:00
Zamil Majdy
bf29801b07 fix(backend): restore AgentExecutorBlock as dry-run passthrough block
In commit f2546b31, AgentExecutorBlock was inadvertently removed from the
passthrough list when can_simulate() was replaced with prepare_dry_run().
Since AgentExecutorBlock.Output has no properties, LLM simulation yields
zero outputs -- causing the block to "complete without output" during
dry-run.

Restore AgentExecutorBlock in prepare_dry_run() so it executes for real
during dry-run, spawning a child graph execution whose blocks are then
simulated (dry_run=True is inherited via execution context).
2026-04-01 18:03:39 +02:00
Zamil Majdy
dcc2bdd8ab fix(backend): preserve thinking blocks during transcript compaction (#12574)
AutoPilot users hit `invalid_request_error` ("thinking or
redacted_thinking blocks in the latest assistant message cannot be
modified") when sessions get long enough to trigger transcript
compaction. The Anthropic API requires thinking blocks in the last
assistant message to be byte-for-byte identical to the original response
— our compaction was flattening them to plain text, destroying the
cryptographic signatures.

Reported in Discord `#breakage` by John Ababseh with session
`31d3f08a-cb94-45eb-9fce-56b3f0287ef4`.

- **`compact_transcript`** now splits the transcript into a compressible
prefix and a preserved tail (last assistant entry + trailing entries).
Only the prefix is compressed; the tail is re-appended verbatim,
preserving thinking blocks exactly.
- **`_flatten_assistant_content`** now silently drops `thinking` and
`redacted_thinking` blocks instead of creating `[__thinking__]`
placeholders — they carry no useful context for compression summaries.
- **`response_adapter`** explicitly handles `ThinkingBlock` (skip
gracefully instead of silently falling through the isinstance chain).
- **`_format_sdk_content_blocks`** now passes through raw dict blocks
(e.g. `redacted_thinking` that the SDK may not have a typed class for)
verbatim to the transcript.

The key insight is the Anthropic API's asymmetric constraint:
- **Last assistant message**: thinking/redacted_thinking blocks must be
preserved byte-for-byte
- **Older assistant messages**: thinking blocks can be removed entirely

`compact_transcript` uses `_find_last_assistant_entry()` to split the
JSONL into two parts:
1. **Prefix** (everything before the last assistant): flattened and
compressed normally
2. **Tail** (last assistant + any trailing user message): preserved
verbatim and re-chained via `_rechain_tail()` to maintain the
`parentUuid` chain

This ensures the API always sees the original thinking blocks in the
last assistant message while still achieving meaningful compression on
older turns.

- [x] 25 new tests across `thinking_blocks_test.py` (TDD: written before
implementation)
- [x] `_find_last_assistant_entry` splits correctly at last assistant,
handles edges (no assistant, index 0, trailing user)
  - [x] `_rechain_tail` patches parentUuid chain, handles empty tail
- [x] `_flatten_assistant_content` strips thinking/redacted_thinking
blocks, handles mixed content
  - [x] `compact_transcript` preserves last assistant's thinking blocks
- [x] `compact_transcript` strips thinking from older assistant messages
- [x] Edge cases: trailing user message, single assistant, no thinking
blocks
  - [x] `response_adapter` handles ThinkingBlock without crash
- [x] `_format_sdk_content_blocks` preserves thinking block format and
raw dict blocks
- [x] All existing copilot SDK tests pass
- [x] Pre-commit hooks (lint, format, typecheck) all pass
2026-04-01 18:03:22 +02:00
Zamil Majdy
e74a918c4a debug(backend): add info-level logging to AgentExecutorBlock event listener
Logs event receipt, skip reasons, and final output count to investigate
why sub-agent outputs are not reaching the parent during dry-run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:02:45 +02:00
Zamil Majdy
ff05b5b8d5 revert(backend): remove unnecessary DB fallback from AgentExecutorBlock
The DB fallback was added based on wrong analysis. The actual fix is
passing dry_run=True to add_graph_execution (previous commit) so
credential validation is skipped during dry-run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:02:45 +02:00
Zamil Majdy
56090f870c fix(backend): pass dry_run to add_graph_execution in AgentExecutorBlock
The sub-agent's graph validation rejects missing credentials. During
dry-run, credential errors should be stripped — but the dry_run flag
wasn't being passed to add_graph_execution, so validation always
enforced credentials even in dry-run mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:02:45 +02:00
Zamil Majdy
a3e3d3ff6b fix(backend): fallback to DB query for AgentExecutorBlock output in dry-run
During dry-run, the sub-agent's output events may not reach the
event_bus listener before the GRAPH_EXEC_UPDATE arrives (the simulated
execution completes faster than events propagate). This causes the
AgentExecutorBlock to complete with 0 outputs.

Adds a DB fallback: after the event loop breaks on graph COMPLETED,
if no outputs were yielded, query get_node_executions for the
sub-agent's OUTPUT block results and yield them.

Evidence: normal run produces 1 output, ALL dry-runs produce 0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:02:45 +02:00
Zamil Majdy
cfaa1ff0d4 fix(backend): execute AgentExecutorBlock for real in dry-run mode
Previously, AgentExecutorBlock was LLM-simulated during dry-run,
producing no meaningful output and making executions INCOMPLETE.

Now prepare_dry_run returns the input unchanged for AgentExecutorBlock,
letting it execute the sub-agent graph. The sub-agent's blocks are
individually simulated via the propagated dry_run execution_context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:02:45 +02:00
Zamil Majdy
6ab9a3285f fix(simulator): preserve traditional mode in dry-run preparation
prepare_dry_run now respects agent_mode_max_iterations=0 (traditional
mode) instead of unconditionally forcing agent mode. Only overrides
to 1 when the user configured agent mode (non-zero).
2026-04-01 18:02:45 +02:00
Zamil Majdy
390324d5a1 fix(platform): fall back to simulation when dry-run credentials missing + poll execution details
Bug 1: OrchestratorBlock in dry-run fails with "credentials is a required
property" when the user hasn't configured any LLM credentials.  After
prepare_dry_run overrides the model to gpt-4o-mini, the block still
requires credentials. Now we check if required credentials fields are
still empty after restoring from node defaults and fall back to LLM
simulation instead of attempting real execution.

Bug 2: WebSocket not showing real-time updates for dry-run executions
due to a race condition — the execution can start and complete before
the frontend subscribes to WebSocket events.  Add refetchInterval
polling (2s) on execution details while the graph is running so the
frontend catches up on any missed events.
2026-04-01 18:02:45 +02:00
Zamil Majdy
9f51796dbe fix(backend): remove dry-run markers from simulated block output text
The [DRY RUN] prefix and "simulated successfully — no real execution
occurred" message was being fed back to the LLM, causing the copilot to
become aware it was in dry-run mode and change its behavior. The output
text now looks identical to real execution output. The UI still shows
the "Simulated" badge via the is_dry_run=True flag on the response.
2026-04-01 18:02:45 +02:00
Zamil Majdy
f42f0013df fix(platform): run OrchestratorBlock with cheap model in dry-run instead of skipping
Replace `can_simulate(block)` with `prepare_dry_run(block, input_data)` which
returns modified input_data (model=gpt-4o-mini, agent_mode_max_iterations=1) for
OrchestratorBlock so it executes for real with a cheap model during dry-run,
instead of being skipped entirely.
2026-04-01 18:02:19 +02:00
Zamil Majdy
3154e5b87a test(backend): mock get_global_rate_limits in reset_usage tests for determinism
All reset_copilot_usage tests that reach the get_global_rate_limits
call path now explicitly mock it, preventing LaunchDarkly flag
evaluation from interfering with test assertions.
2026-04-01 18:02:19 +02:00
Zamil Majdy
78cd14d501 fix(platform): address review R1 - fix docstring, stale closures, shared formatCents
- Fix misleading "Fails open" docstring in reset_daily_usage (it's
  fail-closed for billed operations).
- Use refs in useResetRateLimit to avoid stale closure in mutation
  callbacks.
- Replace eslint-disable with useRef pattern in RateLimitResetDialog.
- Export and share formatCents between dialog and panel components.
- Add clarifying comment for omitted rate_limit_reset_cost in inner
  get_usage_status call.
2026-04-01 18:02:19 +02:00
Zamil Majdy
137edb3e6e fix(backend): address review nits - fix docstring and hoist constant
- Fix misleading simulate_block docstring that claimed "Returns None"
  for passthrough blocks (it never does; callers use can_simulate())
- Hoist _DRY_RUN_MAX_ITERATIONS to module-level in manager.py
2026-04-01 18:02:19 +02:00
Zamil Majdy
449e9b17f1 fix(backend): simplify dry-run special block handling per review feedback
Remove overengineered simulation_context, dry_run_passthrough flag,
credential redaction/URL sanitization, and excessive utils validation.
The simulator now decides which blocks to handle via can_simulate() and
delegates MCPToolBlock to a specialized prompt internally. Manager
changes are minimal: try simulator, fall back to normal execution.

-573 lines removed, 18 tests still pass.
2026-04-01 18:02:18 +02:00
Zamil Majdy
5b3f87d7c7 fix(backend): use exact URL equality assertions to silence CodeQL false positives
Replace substring `in` checks with exact equality assertions in
simulator_test.py. CodeQL flagged 4 instances of "Incomplete URL
substring sanitization" on test assertions like `assert "example.com"
in result`. Using `==` against the expected sanitized URL both silences
the CodeQL alert and makes the tests stricter.
2026-04-01 18:02:01 +02:00
Zamil Majdy
ee7209a575 test(backend): add simulator_test.py for redaction, URL sanitization, regex, and simulation_context
Cover test scenarios missing from test_dry_run.py:
- Secret field redaction (api_key, password, secret, tokens, credentials)
- URL sanitization (strip userinfo, query params, fragments)
- Non-secret field preservation
- simulation_context validation and 16KB size limit
- Regex false-positive guard (author, authority, token_count)
- Underscore-aware boundaries (api_secret, client_secret)
2026-04-01 18:02:01 +02:00
Zamil Majdy
7ea89b07ce fix(backend): address review R2 - use underscore-aware boundaries in secret regex
Replace \b word boundaries with (?:^|_)...(?:$|_) to treat underscores
as segment separators. This correctly catches compound keys like
api_secret, client_secret, secret_key, credentials while still avoiding
false positives on author, authority, token_count, etc.
2026-04-01 18:02:01 +02:00
Zamil Majdy
5324e0cc2f fix(backend): address review R1 - tighten secret regex, hoist constant, unconditional iteration cap
- _SECRET_KEY_PATTERN: use word boundaries to avoid false positives on
  keys like "author", "authority", "token_count"
- _SIMULATION_CONTEXT_MAX_BYTES: hoist to module level in utils.py
- agent_mode_max_iterations: apply cap unconditionally for passthrough
  blocks in dry-run mode (not only when key already exists in input_data)
2026-04-01 18:02:01 +02:00
Zamil Majdy
c7cbb8b02e fix(backend): apply simulation_context to execution_context when resuming dry-run
When resuming a graph execution with an already-provided execution_context,
the computed safe_simulation_context was never applied to it, causing
simulation hints to be silently ignored for resumed dry-run executions.
2026-04-01 18:02:01 +02:00
Zamil Majdy
d66ffb1ee4 fix(backend): fall back to simulation when passthrough block lacks credentials
When dry_run_passthrough is true but the block's required credentials
weren't acquired (e.g. user hasn't configured LLM credentials), fall
back to LLM simulation instead of failing with a credentials error.
This makes dry-run robust for agents created without credentials.
2026-04-01 18:02:01 +02:00
Zamil Majdy
5d489c72b5 fix(backend): inherit dry_run from execution_context in child graph validation
When AgentExecutorBlock spawns a child graph execution, it passes
execution_context (with dry_run=True) but the dry_run parameter
defaults to False. This caused validate_and_construct_node_execution_input
to reject missing credentials in the sub-agent, even though dry-run
should skip credential validation.

Fix: derive dry_run from execution_context.dry_run when an execution_context
is provided. Also propagate simulation_context to child graphs.
2026-04-01 18:02:01 +02:00
Zamil Majdy
646ffe1693 fix(backend): address review - move simulation_context to user prompt, redact credentials, validate before DB write
- Move simulation_context validation before create_graph_execution to
  prevent orphaned INCOMPLETE records on validation failure (sentry)
- Move simulation_context from system prompt to user prompt to prevent
  prompt injection from caller-supplied data (coderabbitai)
- Add credential redaction (_redact_inputs) that masks secret-bearing
  fields (api_key, token, password, etc.) and sanitizes URLs by
  stripping userinfo/query/fragment before serializing to LLM prompts
- Sanitize MCP server_url in system prompt
- Update tests to assert simulation_context is in user_prompt not system_prompt
2026-04-01 18:02:01 +02:00
Zamil Majdy
59b1811e8b fix(backend): validate simulation_context size and gate behind dry_run
- Only attach simulation_context when dry_run=True (ignored otherwise)
- Validate JSON-serializability and enforce 16KB size limit to prevent
  oversized queue payloads
2026-04-01 18:01:30 +02:00
Zamil Majdy
6404e58fb1 refactor(backend): address coderabbitai review - typed dry_run_passthrough, truncate MCP schema
- Add dry_run_passthrough property to Block base class; set on
  OrchestratorBlock and AgentExecutorBlock. Removes isinstance() dispatch
  from manager.py for dry-run routing.
- Truncate tool_input_schema in MCP simulation prompt to prevent oversized
  LLM payloads (reuses _MAX_INPUT_VALUE_CHARS limit).
- Replace isinstance(OrchestratorBlock) iteration cap check with generic
  field-presence check.
2026-04-01 18:01:30 +02:00
Zamil Majdy
e0bfa1524e feat(backend): add simulation_context for dry-run scenario hints
Thread an optional simulation_context dict through the execution pipeline
so users can provide scenario hints (expected emails, tickets, customer
data, etc.) that guide the LLM simulator to produce realistic outputs.

- Add simulation_context to ExecutionContext (propagates to child graphs)
- Accept simulation_context in REST API, copilot run_agent, and
  add_graph_execution
- Inject context into both block and MCP simulation prompts
- Add tool_description hidden field to MCPToolBlock for richer simulation
- Add 4 new tests for simulation_context and tool_description
2026-04-01 18:01:30 +02:00
Zamil Majdy
ac947a0c11 fix(backend): address CodeQL false positive - use full URL in test assertion
Use the complete URL variable instead of a substring to avoid CodeQL's
"Incomplete URL substring sanitization" alert in test code.
2026-04-01 18:00:43 +02:00
Zamil Majdy
c9f45f056a refactor(backend): address PR review - extract shared LLM retry loop, cap dry-run iterations
- Extract _call_llm_for_simulation() helper to deduplicate retry/error
  logic between simulate_block and simulate_mcp_block
- Cap OrchestratorBlock agent_mode_max_iterations to 5 in dry-run mode
  to prevent unbounded loops of real LLM calls
- Document LLM API cost implications in agent generation guide
- Update module docstring to reflect new dry-run behaviour
2026-04-01 18:00:43 +02:00
Zamil Majdy
89264091ad fix(backend/copilot): add missing strip_stale_thinking_blocks to canonical transcript module
The merge conflict resolution moved transcript.py to a re-export wrapper
but failed to copy strip_stale_thinking_blocks into the canonical
backend.copilot.transcript module. This caused an ImportError in
transcript_test.py which imports from the sdk wrapper.
2026-04-01 18:00:41 +02:00
Zamil Majdy
e3183f1955 test: add test screenshots for PR #12569 SQL query block testing round 2 2026-04-01 18:00:26 +02:00
Zamil Majdy
3ea243c760 fix(backend): resolve pyright type errors in SQL query block error handling
Replace dict(**kwargs) pattern with a local closure to preserve type
information for _sanitize_error parameters. Rename _format_operational_error
to _classify_operational_error since it now takes pre-sanitized input.
2026-04-01 18:00:26 +02:00
Zamil Majdy
991969612c refactor(backend): split SQL query block into block + helpers module
- Extract validation, sanitization, serialization, and query execution
  into sql_query_helpers.py to meet ~300-line file guideline
- Fix duck typing in _serialize_value: replace hasattr(value, "isoformat")
  with explicit isinstance(value, (datetime, date, time))
- Extract _configure_session and _run_in_transaction helpers to bring
  execute_query under ~40-line function guideline
- Extract _validate_query, _resolve_host, _format_operational_error
  helpers to simplify the run method
- Add database name scrubbing to _sanitize_error
2026-04-01 18:00:26 +02:00
Zamil Majdy
8de9880f43 fix(docs): revert host type to 'str (password)' to match block docs generator output 2026-04-01 18:00:26 +02:00
Zamil Majdy
86d8efe697 fix(docs): correct host type from 'str (password)' to 'str (secret)' in SQL Query docs 2026-04-01 18:00:26 +02:00
Zamil Majdy
10ec6c7215 test(blocks): add SQL injection and URL.create() tests for SQL query block
Add tests documenting that single-statement SQL injection patterns (e.g.,
tautology, UNION-based, blind boolean) pass through validation by design,
since the block uses raw SQL via text(query) for trusted admin/analytics
use. Also add tests verifying URL.create() correctly handles special
characters in credentials (passwords with @, #, :, spaces, etc.) -- the
existing test_special_chars_in_password mocked execute_query and never
exercised the actual URL construction path.
2026-04-01 18:00:26 +02:00
Zamil Majdy
51e5371362 style(backend): replace Optional[int] with int | None in SQL query block
Use modern union syntax consistent with the rest of the codebase.
Remove unused Optional import.
2026-04-01 18:00:26 +02:00
Zamil Majdy
cdd14726ce fix(backend): preserve thinking blocks during transcript compaction (#12574)
AutoPilot users hit `invalid_request_error` ("thinking or
redacted_thinking blocks in the latest assistant message cannot be
modified") when sessions get long enough to trigger transcript
compaction. The Anthropic API requires thinking blocks in the last
assistant message to be byte-for-byte identical to the original response
— our compaction was flattening them to plain text, destroying the
cryptographic signatures.

Reported in Discord `#breakage` by John Ababseh with session
`31d3f08a-cb94-45eb-9fce-56b3f0287ef4`.

- **`compact_transcript`** now splits the transcript into a compressible
prefix and a preserved tail (last assistant entry + trailing entries).
Only the prefix is compressed; the tail is re-appended verbatim,
preserving thinking blocks exactly.
- **`_flatten_assistant_content`** now silently drops `thinking` and
`redacted_thinking` blocks instead of creating `[__thinking__]`
placeholders — they carry no useful context for compression summaries.
- **`response_adapter`** explicitly handles `ThinkingBlock` (skip
gracefully instead of silently falling through the isinstance chain).
- **`_format_sdk_content_blocks`** now passes through raw dict blocks
(e.g. `redacted_thinking` that the SDK may not have a typed class for)
verbatim to the transcript.

The key insight is the Anthropic API's asymmetric constraint:
- **Last assistant message**: thinking/redacted_thinking blocks must be
preserved byte-for-byte
- **Older assistant messages**: thinking blocks can be removed entirely

`compact_transcript` uses `_find_last_assistant_entry()` to split the
JSONL into two parts:
1. **Prefix** (everything before the last assistant): flattened and
compressed normally
2. **Tail** (last assistant + any trailing user message): preserved
verbatim and re-chained via `_rechain_tail()` to maintain the
`parentUuid` chain

This ensures the API always sees the original thinking blocks in the
last assistant message while still achieving meaningful compression on
older turns.

- [x] 25 new tests across `thinking_blocks_test.py` (TDD: written before
implementation)
- [x] `_find_last_assistant_entry` splits correctly at last assistant,
handles edges (no assistant, index 0, trailing user)
  - [x] `_rechain_tail` patches parentUuid chain, handles empty tail
- [x] `_flatten_assistant_content` strips thinking/redacted_thinking
blocks, handles mixed content
  - [x] `compact_transcript` preserves last assistant's thinking blocks
- [x] `compact_transcript` strips thinking from older assistant messages
- [x] Edge cases: trailing user message, single assistant, no thinking
blocks
  - [x] `response_adapter` handles ThinkingBlock without crash
- [x] `_format_sdk_content_blocks` preserves thinking block format and
raw dict blocks
- [x] All existing copilot SDK tests pass
- [x] Pre-commit hooks (lint, format, typecheck) all pass
2026-04-01 17:59:53 +02:00
Zamil Majdy
1ebd5635f6 fix(backend/copilot): make include_graph an explicit parameter in _execute
Use an explicit keyword argument instead of extracting from **kwargs
for better discoverability and type safety.
2026-04-01 17:59:52 +02:00
Zamil Majdy
349b6c63de fix(backend): handle TimeoutError in graph enrichment to prevent tool crash 2026-04-01 17:59:52 +02:00
Zamil Majdy
2f7cfa6f1b fix(backend/copilot): strip secrets from graph data in _enrich_agents_with_graph
- Pass `for_export=True` to `get_graph()` so `stripped_for_export()`
  filters credentials, api_key, password, token, secret fields from
  `input_default` before the graph reaches the LLM context
- Use `agent.graph_version` (active version) instead of `version=None`
  to avoid exposing draft/unpublished graph versions
- Add `asyncio.timeout(15)` around `asyncio.gather` to prevent
  indefinite blocking on hung DB connections
- Resolve `graph_db()` once before the gather instead of per-coroutine
- Drop `get_graph_db` alias in favor of `graph_db` to match codebase

Fixes the CRITICAL security finding from autogpt-pr-reviewer.
2026-04-01 17:59:52 +02:00
Zamil Majdy
049aa1ad7d fix(backend/copilot): use f-strings for warning logs per CLAUDE.md style
CLAUDE.md says: use %s for debug, f-strings elsewhere for readability.
Reverts the incorrect change to printf-style for warning-level logs.
2026-04-01 17:59:52 +02:00
Zamil Majdy
a16be2675b style: use lazy formatting in logger.warning calls
Replace f-strings with %-style lazy formatting in _enrich_agents_with_graph
warning logs to follow standard logging conventions.
2026-04-01 17:59:52 +02:00
Zamil Majdy
ac416a561e fix(backend/copilot): remove type: ignore by adding explicit graph_id guard in _fetch 2026-04-01 17:59:52 +02:00
Zamil Majdy
c47fcc1925 refactor(backend/copilot): use BaseGraph type for graph field
Use BaseGraph instead of Graph to get typed nodes+links without causing
the Pydantic OpenAPI schema split. BaseGraph-Input/Output already exists
on dev so no frontend imports break. Fetches via graph_db().get_graph().
2026-04-01 17:59:52 +02:00
Zamil Majdy
77fd8648a7 fix(frontend): regenerate openapi.json to sync Graph schema
The backend Graph model no longer uses separate Input/Output variants,
so the openapi.json was out of sync causing the generated `graph.ts`
type to be missing and failing CI type checks + e2e builds.
2026-04-01 17:59:52 +02:00
Zamil Majdy
4842599bec fix(backend/copilot): remove redundant graph_id guard in _fetch 2026-04-01 17:59:52 +02:00
Zamil Majdy
339e155823 fix(backend): log truncation when include_graph skips agents
When include_graph=true and more agents have graph_ids than
_MAX_GRAPH_FETCHES, log a warning indicating how many agents
were skipped. This makes the silent truncation visible.
2026-04-01 17:59:52 +02:00
Zamil Majdy
9344e62d66 fix: remove type: ignore with proper guard clause in _enrich_agents_with_graph
Narrow agent.graph_id from str | None to str with an early return,
eliminating the type: ignore[arg-type] suppressor.
2026-04-01 17:59:52 +02:00
Zamil Majdy
ee6cc20cbc fix(backend/copilot): address review — parallel fetch, None logging, failure tests
- Use asyncio.gather for parallel graph fetching instead of sequential loop
- Cap graph fetches at 10 to prevent excessive DB calls on broad searches
- Log warning when get_agent_as_json returns None (graph not found)
- Add tests for exception and None return failure paths
2026-04-01 17:59:52 +02:00
Zamil Majdy
eb96b019c5 refactor(backend/copilot): merge create/edit workflows in agent guide 2026-04-01 17:59:52 +02:00
Zamil Majdy
9cf6ac9ad9 feat(backend/copilot): add include_graph option to find_library_agent for agent debugging/editing
The copilot's edit_agent tool required the LLM to provide a complete agent
JSON (nodes + links) without ever seeing the current graph structure — it was
editing blindly. This adds an `include_graph` boolean parameter to the
existing `find_library_agent` tool so the copilot can fetch the full graph
before making modifications.

Also updates the agent generation guide to split creating vs editing
workflows, instructing the LLM to always fetch the current graph first.
2026-04-01 17:59:36 +02:00
Zamil Majdy
d3173605eb test(copilot): add unit tests for P0 guardrails
Tests for _resolve_fallback_model (5 tests), security env vars (4 tests),
and ChatConfig defaults (4 tests). All 13 tests pass.
2026-04-01 17:59:09 +02:00
Zamil Majdy
98c27653f2 fix(copilot): snapshot/restore TranscriptBuilder on transient retry
TranscriptBuilder._entries is independent from session.messages.
Rolling back session.messages alone left duplicate entries in the
uploaded --resume transcript. Now snapshot _entries + _last_uuid
before each attempt and restore both rollback locations on failure.
2026-04-01 17:59:09 +02:00
Zamil Majdy
dced534df3 fix(copilot): review round 3 — fix transient error code check, add SDK compat fields
- Fix exc.code check: "transient" -> "transient_api_error" to match
  the actual code set in _run_stream_attempt (line 1343)
- Add fallback_model, max_turns, max_budget_usd, stderr to SDK compat
  test so field renames in the SDK are caught early
2026-04-01 17:59:09 +02:00
Zamil Majdy
4ebe294707 fix(copilot): review round 2 — fix transient retry consuming context-level attempt
Convert for-loop to while-loop so transient retries (continue) replay
the same context-level attempt instead of advancing to the next one.
Previously, `continue` in a `for attempt in range(...)` loop would
increment `attempt`, causing transient retries to wastefully trigger
context reduction and reset the transient retry counter.

Now: transient retries stay at the same attempt (no attempt++), while
context-error retries explicitly increment attempt before continue.
2026-04-01 17:59:09 +02:00
Zamil Majdy
2e8e115cd1 fix(copilot): review round 1 — fix transient retry count, strip fallback model prefix
- Fix _can_retry_transient off-by-one: >= should be > so max_retries=3
  actually performs 3 retries instead of 2
- Move events_yielded check before counter increment to avoid wasting
  a retry slot when events were already sent
- Strip OpenRouter provider prefix from fallback model name (mirrors
  _resolve_sdk_model logic) to prevent model-not-found errors
2026-04-01 17:59:09 +02:00
Zamil Majdy
5ca49a8ec9 fix(copilot): P0 guardrails — SDK limits, security env vars, transient retry
Based on analysis of the Claude Code CLI internals, adds critical
guardrails rebased on the current dev architecture (env.py extraction):

1. SDK guardrails: fallback_model (auto-retry on 529), max_turns=50
   (runaway prevention), max_budget_usd=5.0 (per-query cost cap)

2. TMPDIR redirect: sets CLAUDE_CODE_TMPDIR to sdk_cwd so CLI output
   is routed into the per-session workspace for isolation/cleanup

3. Security env vars: DISABLE_CLAUDE_MDS, SKIP_PROMPT_HISTORY,
   DISABLE_AUTO_MEMORY, DISABLE_NONESSENTIAL_TRAFFIC

4. Transient error retry: 429/5xx/ECONNRESET errors now retry with
   exponential backoff (1s, 2s, 4s) in both _HandledStreamError and
   generic Exception handlers. Skips retry if events already yielded
2026-04-01 17:59:09 +02:00
Zamil Majdy
a9db5af0fa fix(tests): mock build_sdk_env to return {} instead of None
The CLAUDE_CODE_TMPDIR assignment requires sdk_env to be a dict,
not None. Fixes TypeError in retry scenario tests.
2026-04-01 17:59:07 +02:00
Zamil Majdy
dcbfcfb158 fix(copilot): review round 3 — add Agent to ToolName Literal for permissions
Add "Agent" to the ToolName Literal and test expected set so permission
filtering does not incorrectly block the Agent tool in permissioned
sessions. Without this, apply_tool_permissions would strip "Agent" from
the allowed_tools list.
2026-04-01 17:59:07 +02:00
Zamil Majdy
723b852ba4 fix(copilot): review round 2 — sanitize all untrusted hook inputs for logging
- Sanitize error message and tool_use_id in post_tool_failure_hook
  to prevent log injection via crafted error strings
- Sanitize trigger field in pre_compact_hook
- Use %-style formatting in failure hook for consistency with other hooks
2026-04-01 17:59:07 +02:00
Zamil Majdy
c7e0f8169a fix(copilot): review round 1 — hoist subagent constant, strip C1 chars, guard tmpdir
- Move _SUBAGENT_TOOLS frozenset to module level to avoid per-session allocation
- Extend _sanitize to strip C1 control characters (U+0080-U+009F) for
  defense against log injection via non-ASCII control sequences
- Guard CLAUDE_CODE_TMPDIR assignment with `if sdk_cwd:` for defensive
  consistency (matches PR #12636 approach)
2026-04-01 17:59:07 +02:00
Zamil Majdy
ce1555c07a fix(copilot): address review round 2 — transcript path max_len, subagent tests
- SubagentStop: use max_len=500 for transcript path (consistent with
  pre_compact_hook)
- Add test coverage for SubagentStart/SubagentStop hooks including
  control character sanitization
2026-04-01 17:59:07 +02:00
Zamil Majdy
403a36a3fc fix(copilot): address review — robust sanitize, drop redundant None guard
- _sanitize: strip all C0 control chars + DEL, not just \n/\r
- Remove unnecessary `sdk_env is None` guard (build_sdk_env always returns dict)
2026-04-01 17:59:07 +02:00
Zamil Majdy
490643d65a refactor(copilot): hoist _sanitize helper and use it in pre_compact_hook
Move _sanitize() above all hooks so it can be reused. Refactor
pre_compact_hook to use _sanitize(max_len=500) instead of inline
.replace() calls for consistency across all hooks.
2026-04-01 17:59:07 +02:00
Zamil Majdy
2b14ecf5ee fix(copilot): sanitize hook inputs, rename constant, add Agent failure test
- Rename _SUBAGENT_TOOLS to _subagent_tools (frozenset, function-local)
- Extract _sanitize() helper for consistent log injection prevention
  across subagent_start_hook and subagent_stop_hook
- Add test_agent_slot_released_on_failure for coverage parity with
  the existing Task failure test
2026-04-01 17:59:07 +02:00
Zamil Majdy
14d6d66bdc refactor(copilot): use frozenset and extract _sanitize helper in hooks 2026-04-01 17:59:07 +02:00
Zamil Majdy
28443e2e33 fix(copilot): guard against None sdk_env from build_sdk_env
build_sdk_env can return None in test mocks. Guard with fallback
to empty dict before setting CLAUDE_CODE_TMPDIR.
2026-04-01 17:59:07 +02:00
Zamil Majdy
611a20d7df fix(copilot): sanitize transcript path in subagent stop hook
Strip control characters from agent_transcript_path before logging
to prevent log injection, matching the existing pattern in pre_compact_hook.
2026-04-01 17:59:07 +02:00
Zamil Majdy
ce201cd19c fix(copilot): remove HOME override to preserve subscription auth
Sentry correctly flagged that overriding HOME breaks subscription mode
(claude login) — the CLI looks for credentials at $HOME/.claude/.
Keep only CLAUDE_CODE_TMPDIR which fixes the sub-agent output path.
2026-04-01 17:59:07 +02:00
Zamil Majdy
0c76852768 fix(copilot): address self-review nits in security hooks logging 2026-04-01 17:59:07 +02:00
Zamil Majdy
414b8bbaac fix(copilot): recognize Agent tool name and route CLI state into workspace
The Claude Agent SDK CLI renamed the sub-agent tool from "Task" to "Agent"
in v2.x. Our security hooks only checked for "Task", so all sub-agent
security controls were silently bypassed: background execution was unblocked,
concurrency limiting didn't apply, and slot tracking was broken.

Additionally, the CLI writes sub-agent output to /tmp/claude-<uid>/ and
project state to $HOME/.claude/ — both outside the per-session workspace
(/tmp/copilot-<session>/). This caused PermissionError in E2B and silently
lost sub-agent results via failed @@agptfile: expansion.

Changes:
- Handle both "Task" and "Agent" tool names in security hooks
- Add "Agent" to _SDK_BUILTIN_ALWAYS allowed tools list
- Set CLAUDE_CODE_TMPDIR and HOME to sdk_cwd so CLI state lands in workspace
- Register SubagentStart/SubagentStop hooks for lifecycle visibility
- Add 5 new tests for Agent tool name handling and mixed slot sharing
2026-04-01 17:59:07 +02:00
Zamil Majdy
4c85f2399a fix(backend): propagate dry-run mode to special blocks (Orchestrator, AgentExecutor, MCP)
Previously dry-run mode simulated ALL blocks via LLM, but this didn't work
well for OrchestratorBlock, AgentExecutorBlock, and MCPToolBlock:

- OrchestratorBlock & AgentExecutorBlock now execute for real in dry-run
  mode so the orchestrator can make LLM calls and agent executors can
  spawn child graphs. Their downstream tool blocks and child-graph blocks
  are still simulated. Credential fields from node defaults are restored
  since validate_exec wipes them in dry-run mode.

- MCPToolBlock gets a specialised simulate_mcp_block() that builds an
  LLM prompt grounded in the selected tool's name and JSON Schema,
  producing more realistic mock responses than the generic simulator.
2026-04-01 17:58:51 +02:00
Zamil Majdy
db0e5a1b0b style(test): format SQL query block tests with ruff
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:58:49 +02:00
Zamil Majdy
22a5e76af9 fix(test): replace real-looking connection strings with test.invalid hosts
GitHub secret scanner flagged test connection strings as leaked secrets.
Replaced all real-looking IPs, hostnames, and Supabase URLs with
RFC 2606 reserved .invalid domains and RFC 5737 documentation IPs
(198.51.100.x).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:58:49 +02:00
Zamil Majdy
7919da16b4 test(backend): add security-focused tests for SQL query block
Adds 92 test cases covering:
- Single-statement validation (multi-statement injection blocked)
- Read-only enforcement (INSERT/UPDATE/DELETE/DROP rejected)
- Writable CTE detection (WITH...DELETE RETURNING blocked)
- SSRF protection: IPv4 private ranges, IPv6 loopback (::1),
  link-local (fe80::), Unix socket paths
- Error sanitization: passwords scrubbed, usernames scrubbed,
  IP addresses scrubbed from error messages
- Value serialization edge cases (datetime, Decimal, bytes)
- URL validation for all database types

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:58:49 +02:00
Zamil Majdy
052f953afb fix(backend): replace f-string interpolation with str(int()) for SET timeout commands
Use explicit str(int(timeout * 1000)) instead of f-string interpolation
for SET statement_timeout / MAX_EXECUTION_TIME / LOCK_TIMEOUT commands.
SET commands don't support bind parameters in most databases, so we use
string concatenation with an int-cast value as defense-in-depth.
2026-04-01 17:58:49 +02:00
Zamil Majdy
abd9fbe08a docs(backend): regenerate block docs to fix check-docs-sync 2026-04-01 17:58:49 +02:00
Zamil Majdy
81308af770 fix(backend): fix remaining stale pyodbc comment in MSSQL section 2026-04-01 17:58:49 +02:00
Zamil Majdy
a726c1d1d5 fix(backend): address round 2 review — port validation, comment fix, dead fallback
- Add ge=1, le=65535 port validation to Input schema
- Fix inaccurate comment: pymssql not pyodbc
- Replace _DATABASE_TYPE_DEFAULT_PORT.get() with direct dict access
  (all types have entries after SQLite removal)
- Update default port tests to use port=None instead of port=0
2026-04-01 17:58:49 +02:00
Zamil Majdy
a015bf9e1c fix(backend): address review round — remove SQLite, hide password, cleanup dead code
- Remove DatabaseType.SQLITE from enum (rejected at runtime, confusing UX)
- Remove all SQLite dead code paths (driver map, connect_args, runtime check)
- Change render_as_string(hide_password=False) to hide_password=True to avoid
  materializing plaintext credentials in local variable
- Simplify pinned_host assignment (remove unreachable fallback branch)
- Remove SQLite-related test cases
- Add doc comment to _make_input noting read_only default deviation
2026-04-01 17:58:49 +02:00
Zamil Majdy
d99278a40d fix(backend): update _sanitize_error docstring to mention IPv6 scrubbing 2026-04-01 17:58:49 +02:00
Zamil Majdy
bd7d9a5697 fix(backend): address round 1 review findings for SQL query block
- Fix database name injection: pass URL object to create_engine() instead
  of rendered string to prevent query parameter injection via database name
- Refactor _validate_query_is_read_only to accept parsed Statement object,
  eliminating duplicate sqlparse.parse() call
- Add IPv6 address scrubbing to _sanitize_error
- Fix docs: remove sqlite from valid types, correct host type annotation
2026-04-01 17:58:48 +02:00
Zamil Majdy
9cfa53a2ff fix(backend): document MySQL MAX_EXECUTION_TIME limitation for write queries
Add code comment noting that MySQL's MAX_EXECUTION_TIME only applies to
SELECT statements; write operations rely on the database's wait_timeout.
2026-04-01 17:58:48 +02:00
Zamil Majdy
e6cf899a6d fix(docs): regenerate block docs to sync with code schema
Ran generate_block_docs.py to fix check-docs-sync CI failure.
The Inputs table is auto-generated from the block schema.
2026-04-01 17:58:48 +02:00
Zamil Majdy
b655b30aeb fix: address review findings on SQL query block PR
- Remove unnecessary pool_pre_ping/pool_recycle (engine disposed per-query)
- Fix _extract_keyword_tokens docstring to match implementation
- Move DATABASE enum entry to alphabetical position in ProviderName
- Add database entry to frontend providerIcons map
- Revert no-op string-literal extraction in API key modals
- Revert unused _provider param in getCredentialTypeLabel
2026-04-01 17:58:48 +02:00
Zamil Majdy
5b8daf5d4c fix(docs): correct SQL query block documentation to match code
- Fix "How it works" to say read-only by default (not write-enabled)
- Replace "connection URL" with "discrete host/port/database fields"
- Remove sqlite from database_type options (disabled in code)
- Fix host type from "str (password)" to "str (secret)"
2026-04-01 17:58:48 +02:00
Zamil Majdy
9b74b7bb41 fix(backend): handle single-quoted usernames in SQL error sanitization
MySQL and MSSQL error messages use single quotes around usernames (e.g.
"Access denied for user 'myuser'@'host'"), but _sanitize_error only
handled double-quoted usernames. This could leak usernames to the LLM.

Now handles both quote styles in the regex and bare replacement.
2026-04-01 17:58:48 +02:00
Zamil Majdy
a1578984cc fix(backend): update poetry.lock and regenerate block docs
- Run `poetry lock` to include pymssql and pymysql in the lock file
- Regenerate block docs to reflect the Optional[int] port field change
2026-04-01 17:58:48 +02:00
Zamil Majdy
c0869e9168 fix(backend): fix port default for MySQL/MSSQL and add missing DB drivers
- Make port field Optional[int] with default=None so the `or` fallback
  correctly picks the database-specific default port (3306 for MySQL,
  1433 for MSSQL) instead of always using 5432
- Add pymysql and pymssql dependencies and update driver names to
  mysql+pymysql and mssql+pymssql so SQLAlchemy can connect to these DBs
- Handle ModuleNotFoundError gracefully if a driver is unavailable
- Update pymssql connect_args to use login_timeout (pymssql API)
2026-04-01 17:58:48 +02:00
Zamil Majdy
0db5a6ff9a docs: regenerate block docs after SQL block Input schema changes 2026-04-01 17:58:48 +02:00
Zamil Majdy
3664624445 fix(backend): improve SQL block Input schema UX
- Mark host field as secret=True so it renders as masked text in the UI
- Change default port from 0 to 5432 (PostgreSQL default) to avoid confusing "0"
- Set database_type to advanced=False so it shows as a prominent dropdown
- Reorder fields: database_type -> host -> port -> database -> query -> read_only
- Improve descriptions and placeholders for clarity
2026-04-01 17:58:48 +02:00
Zamil Majdy
f1e2ce0703 fix(backend): add MSSQL timeout enforcement and document read-only gap
Address review feedback: add SET LOCK_TIMEOUT for MSSQL connections to
enforce query timeout at the database level, consistent with the
PostgreSQL/MySQL implementations. Document that MSSQL lacks a
session-level read-only mode, with defense-in-depth handled by the SQL
validation layer and ROLLBACK in the finally block.
2026-04-01 17:58:48 +02:00
Zamil Majdy
c226cf0925 fix(backend): address Sentry review comments on SQL query block
- Use database_type enum instead of substring-checking connection string
  to determine driver-specific connect_args (fixes false match when db
  name/user/password contains "mssql" or "sqlite")
- Use BEGIN TRANSACTION for MSSQL instead of BEGIN (T-SQL syntax)
- Extend port sanitization regex to also match :port format (host:5432)
- Add test for colon-format port sanitization
2026-04-01 17:58:48 +02:00
Zamil Majdy
dade634b4a fix(backend): use plain string for host in test_input to fix JSON schema validation
The test_input host value should be a plain string (Pydantic coerces it
to SecretStr), not a SecretStr object which serializes as '**********'
and fails JSON schema validation in the block test framework.
2026-04-01 17:58:48 +02:00
Zamil Majdy
34101c4389 docs: regenerate block docs after host field type change to SecretStr 2026-04-01 17:58:48 +02:00
Zamil Majdy
2218254c8a fix(backend): make SQL block host a SecretStr and harden error sanitization
- Change `host` field from `str` to `SecretStr` so it is hidden from
  repr/logs but still stored in graph JSON.
- Expand `_sanitize_error` to strip hostnames, IP addresses, usernames,
  and port numbers from error messages exposed to the LLM.
- Add tests for hostname/IP/username/port scrubbing and an integration
  test verifying no infrastructure details leak through run() errors.
2026-04-01 17:58:48 +02:00
Zamil Majdy
4d63cffa7a docs: regenerate block docs after SQL query block port field change 2026-04-01 17:58:48 +02:00
Zamil Majdy
ebf3b920d8 fix(backend): address 3 review findings in SQL query block
1. Fix DNS rebinding TOCTOU: pin connection to resolved IP from
   check_host_allowed instead of re-resolving the hostname, preventing
   SSRF via DNS rebinding attacks.

2. Fix default port per database type: use _DATABASE_TYPE_DEFAULT_PORT
   lookup instead of hard-coded 5432, so MySQL (3306) and MSSQL (1433)
   work without manually specifying the port.

3. Fix MSSQL connect_timeout: use pyodbc's "timeout" key instead of
   "connect_timeout" which is silently ignored, preventing indefinite
   hangs on unreachable MSSQL servers.
2026-04-01 17:58:48 +02:00
Zamil Majdy
9bd579b041 fix(platform): clean up Pyright warnings, fix comment-only query test, sync docs
- Use _DATABASE_TYPE_DEFAULT_PORT for port fallback in SQLQueryBlock.run()
- Rename **kwargs to **_kwargs and *args to *_args to silence not-accessed warnings
- Fix _validate_single_statement to reject comment-only queries as empty
- Fix test_comment_only_query to assert specific error message
- Prefix unused `provider` param with _ in getCredentialTypeLabel (frontend lint)
- Regenerate block docs to fix check-docs-sync CI
2026-04-01 17:58:48 +02:00
Zamil Majdy
41601cbb5c fix(platform): switch SQL block to user_password credentials to fix special char passwords
The SQL block previously used api_key credential type, stuffing the entire
connection URL (including password) into one field. This broke when passwords
contained special characters (@, #, !) that conflict with URL syntax.

Switch to user_password credential type with separate username/password fields.
Build the SQLAlchemy URL internally via URL.create() which accepts raw passwords
without URL encoding. Also restore accidentally deleted _validate_query_is_read_only
function, remove unused _encode_password_in_url/quote/unquote imports, and clean up
database-specific UI overrides in the frontend credential modals.
2026-04-01 17:58:48 +02:00
Zamil Majdy
c636b6f310 test(backend): add integration tests for SQLQueryBlock SSRF, SQLite, and error handling
Add run()-level tests covering SSRF private IP rejection (127.0.0.1,
10.x, 172.16.x, 192.168.x), Unix socket blocking, missing hostname
rejection, SQLite disabled error, credential sanitization on connection
failure, query timeout clean error, URL type mismatch rejection, happy
path, and EXECUTE keyword rejection. Also adds time serialization test.
2026-04-01 17:58:48 +02:00
Zamil Majdy
292be77b86 fix(platform): show "Connection URL" instead of "API Key" for database credentials
The SQL query block's credential dialog was misleadingly labeled since a
database connection URL is not an API key. This updates both backend and
frontend:

- Shorten the DatabaseCredentialsField description so it no longer
  truncates in the UI
- Make credential labels provider-aware so the database provider shows
  "Connection URL" instead of "API Key" in tab labels, input fields,
  placeholders, and action buttons
2026-04-01 17:58:48 +02:00
Zamil Majdy
dd3349e6bc fix(backend): block SELECT INTO, disable SQLite, fix read-only transaction ordering
- Add INTO, OUTFILE, DUMPFILE to disallowed SQL keywords to prevent
  SELECT...INTO table creation and file writes
- Disable SQLite database type (lacks path sandboxing and read-only
  enforcement) until proper restrictions are implemented
- Fix read-only transaction enforcement: use AUTOCOMMIT to issue SET
  commands, then open explicit BEGIN/ROLLBACK transaction for the user
  query so read-only constraints apply to it (not the next transaction)
- Add regression tests for SELECT INTO variants
2026-04-01 17:58:48 +02:00
Zamil Majdy
bfdf4b99db fix(backend): make SSRF host check mockable for block test framework
Extract resolve_and_check_blocked into a check_host_allowed method on
SQLQueryBlock so the block test framework can mock it alongside
execute_query. Without this, test credentials pointing to localhost
trigger the SSRF blocklist in CI.
2026-04-01 17:58:48 +02:00
Zamil Majdy
aba78b0fdd refactor(backend): replace psycopg2 with SQLAlchemy for multi-database support
Refactor SQLQueryBlock to use SQLAlchemy instead of psycopg2, enabling
support for PostgreSQL, MySQL, SQLite, and MSSQL. Add a database_type
enum field to Input for selecting the target database. Connection
credentials now accept any SQLAlchemy connection URL format.

- Replace psycopg2 with sqlalchemy.create_engine + connection.execute(text())
- Add DatabaseType enum (postgres, mysql, sqlite, mssql)
- Add _validate_connection_url to ensure URL matches selected db type
- Rename ProviderName.POSTGRES to ProviderName.DATABASE
- Update SSRF protection to use SQLAlchemy URL parsing (make_url)
- Add urlparse import for SQLite network connection check
- Handle bytes serialization alongside memoryview
- Update tests with TestValidateConnectionUrl class and bytes test
- Update docs to reflect multi-database support
2026-04-01 17:58:48 +02:00
Zamil Majdy
12934dfd72 docs: regenerate block documentation for SQLQueryBlock 2026-04-01 17:58:48 +02:00
Zamil Majdy
c5507415fd fix(backend): harden SQL query block against injection, SSRF bypass, and precision loss
- Replace regex-based SQL validation with sqlparse tokenizer to prevent
  multi-statement injection via quoted comment bypass (e.g. SET LOCAL
  statement_timeout = 0). Keywords in string literals no longer cause
  false positives.
- Replace urlparse with psycopg2.extensions.parse_dsn for SSRF protection,
  handling both URI and libpq DSN formats. Reject missing hostname and
  Unix socket paths.
- Use server-side named cursor to enforce max_rows at the database level
  instead of fetching entire result set into client memory.
- Serialize fractional Decimal values as str instead of float to preserve
  exact precision for analytics data.
- Add sqlparse dependency.
- Add tests for multi-statement injection, string literal keywords, and
  high-precision Decimal serialization.
2026-04-01 17:58:48 +02:00
Zamil Majdy
7ff096afd9 style(backend): extract sanitize_error to local vars for readability 2026-04-01 17:58:48 +02:00
Zamil Majdy
38fb504063 fix(backend): reduce keyword false positives, broaden SSRF handling, add tests
- Remove ambiguous keywords (COMMENT, ANALYZE, LOCK, CLUSTER, REINDEX,
  VACUUM) from disallowed list — they're harmless on readonly connections
  and cause false positives on common column names
- Add NOTE documenting intentional string-literal matching behavior
- Broaden SSRF exception handling to catch OSError (DNS failures)
- Add _serialize_value tests (Decimal, datetime, date, memoryview)
- Add tests for column names that look like keywords
2026-04-01 17:58:48 +02:00
Zamil Majdy
b4388a9c93 fix(backend): address PR review - security, async, SSRF, tests
- Add _sanitize_error() to scrub connection strings from error messages
- Wrap execute_query in asyncio.to_thread() to avoid blocking event loop
- Add SSRF protection via resolve_and_check_blocked() on database host
- Document intentional string-literal false positives in comment stripping
- Add sql_query_block_test.py with 36 tests for query validation and
  error sanitization
2026-04-01 17:58:48 +02:00
Zamil Majdy
a7a68e585a feat(backend): add SQL query block for CoPilot analytics access
Add a new SQLQueryBlock that allows CoPilot and user-built agents to
execute read-only SQL queries against PostgreSQL databases. This enables
data-driven answers for analytics (user metrics, retention, onboarding
funnels, execution stats) via the existing run_block tool.

- New POSTGRES provider in ProviderName enum
- APIKeyCredentials with connection string for MVP credential storage
- SELECT-only query validation with defense-in-depth keyword blocking
- Configurable query timeout (max 120s) and row limit (max 10000)
- Read-only connection mode + statement_timeout for safety
- JSON-safe serialization for Decimal, datetime, and binary types

Resolves: SECRT-2171
2026-04-01 17:58:48 +02:00
Zamil Majdy
14ad37b0c7 fix: resolve merge conflict in transcript.py re-export module 2026-04-01 17:53:57 +02:00
Zamil Majdy
389cd28879 test: add round 3 E2E screenshots for PR #12623 2026-04-01 17:01:10 +02:00
Zamil Majdy
656858eba1 test: add E2E screenshots for PR #12581 round 3 2026-04-01 16:58:11 +02:00
Zamil Majdy
f0a3afda7d Add test screenshots for PR #12623 2026-04-01 08:49:33 +02:00
Zamil Majdy
a9cbb3ee2f test: add screenshots from PR #12581 round 2 testing 2026-04-01 08:47:38 +02:00
Zamil Majdy
1810452920 fix(frontend): use type-safe any cast for createSessionMutation call
The generated mutation type differs between local (void) and CI
(requires CreateSessionRequest) due to export-api-schema regeneration.
Use an explicit any cast to handle both generated type variants.
2026-04-01 08:46:17 +02:00
Zamil Majdy
4f6f3ca240 fix(frontend): remove redundant tier fetch and add empty-query guard
The backend get_user_rate_limit endpoint already returns tier in the
response — remove the separate fetchTier() calls that were duplicating
the request. Also guard search_users against empty queries to prevent
returning the entire user table. Fix pre-existing TS error in
useChatSession where createSessionMutation was called with an argument
the generated client no longer expects.
2026-04-01 08:13:15 +02:00
Zamil Majdy
9ffecbac02 fix(backend/copilot): add missing mode param to enqueue_copilot_turn docstring 2026-04-01 08:03:35 +02:00
Zamil Majdy
eb22cf4483 fix(frontend): remove duplicate JSDoc and simplify tier access in rate-limit admin UI 2026-04-01 06:33:52 +02:00
Zamil Majdy
16636b64c6 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-04-01 06:15:37 +02:00
Zamil Majdy
c2709fbc28 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-mode-toggle 2026-04-01 06:14:49 +02:00
Zamil Majdy
3adbaacc0e Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-mode-toggle 2026-03-31 19:07:34 +02:00
Zamil Majdy
4da3535a9c Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-03-31 19:07:23 +02:00
Zamil Majdy
56e0b568a4 fix(backend): update tests for transcript module move and new fixer defaults
- Update patch targets in transcript tests from
  backend.copilot.sdk.transcript to backend.copilot.transcript since
  the re-export shim only re-exports public symbols; private names
  like _projects_base and get_openai_client live in the canonical module.
- Update orchestrator fixer test assertions to account for 2 new
  _SDM_DEFAULTS (execution_mode, model) and add execution_mode to the
  E2E test's mock block inputSchema.
2026-03-31 18:26:18 +02:00
Zamil Majdy
4acac9ff5b fix: remove accidentally committed files and fix duplicate comment
- Remove .application.logs (local debug artifact)
- Remove test-results/ directory with PNG screenshots
- Remove duplicated JSDoc comment in useRateLimitManager.ts
2026-03-31 18:18:35 +02:00
Zamil Majdy
0b0777ac87 fix(copilot): update fix_orchestrator_blocks docstring to list all 6 defaults
The docstring only listed 4 defaults but _SDM_DEFAULTS has 6 entries
including execution_mode and model. Updated to reflect the actual behavior.
2026-03-31 17:49:54 +02:00
Zamil Majdy
698b1599cb fix(copilot): reject stale transcripts in baseline service 2026-03-31 17:41:06 +02:00
Zamil Majdy
a2f94f08d9 fix(copilot): address review comments round 3 2026-03-31 17:35:11 +02:00
Zamil Majdy
0c6f20f728 feat(copilot): set extended_thinking + Opus as OrchestratorBlock defaults
Update the agent generator fixer defaults so generated agents inherit
the copilot's default reasoning mode (extended_thinking with Opus).
User-set values are preserved — the fixer only fills in missing fields.
2026-03-31 17:23:06 +02:00
Zamil Majdy
d100b2515b fix(copilot): include tool messages in baseline conversation context
The baseline was only including user/assistant text messages when
building the OpenAI message list, dropping all tool_calls and tool
results. This meant the model had no memory of previous tool
invocations or their outputs in multi-turn conversations.

Now includes assistant messages with tool_calls and tool-role messages
with tool_call_id, giving the model full conversation context.
2026-03-31 17:12:37 +02:00
Zamil Majdy
14113f96a9 feat(copilot): use Sonnet for fast mode, Opus for extended thinking
Add `fast_model` config field (default: anthropic/claude-sonnet-4) so
fast mode uses a faster/cheaper model while extended thinking keeps
using Opus. The baseline service now uses config.fast_model for all
LLM calls.
2026-03-31 17:07:04 +02:00
Zamil Majdy
ee40a4b9a8 refactor(copilot): move transcript modules to shared location 2026-03-31 16:29:48 +02:00
Zamil Majdy
0008cafc3b fix(copilot): fix transcript ordering and mode toggle mid-session
- Fix transcript ordering: move append_tool_result from tool executor
  to conversation updater so entries follow correct API order
  (assistant tool_use → user tool_result)
- Fix mode toggle mid-session: use useRef for copilotMode so transport
  closure reads latest value without recreating DefaultChatTransport
- Use Literal type for mode in CoPilotExecutionEntry for type safety
2026-03-31 16:02:36 +02:00
Zamil Majdy
f55bc84fe7 fix(copilot): address PR review comments
- Use Literal["fast", "extended_thinking"] for mode validation (blocker)
- Wrap transcript upload in asyncio.shield() (should fix)
- Restore top-level estimate_token_count imports (nice to have)
- Guard localStorage copilotMode read against invalid values (should fix)
- Replace inline SVGs with lucide-react Brain/Zap icons (nice to have)
2026-03-31 15:52:06 +02:00
Zamil Majdy
3cfee4c4b5 feat(copilot): add mode toggle and baseline transcript support
- Add transcript support to baseline autopilot (download/upload/build)
  for feature parity with SDK path, enabling seamless mode switching
- Thread `mode` field through full stack: StreamChatRequest → queue →
  executor → service selection (fast=baseline, extended_thinking=SDK)
- Add mode toggle button in ChatInput UI with brain/lightning icons
- Persist mode preference in localStorage via Zustand store
2026-03-31 15:46:23 +02:00
Zamil Majdy
c48b5239b9 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-03-31 15:17:31 +02:00
Zamil Majdy
e44615f8b8 fix(frontend): merge tier into refreshed data after tier change 2026-03-30 06:00:51 +02:00
Zamil Majdy
22f0da0a03 fix(backend): correct ENTERPRISE multiplier comment (50x → 60x) 2026-03-29 20:55:40 +02:00
Zamil Majdy
9264b42050 fix(frontend): fetch user tier on admin rate-limits page
The Subscription Tier dropdown showed "PRO" for all users because
the tier was never fetched from the backend. Now fetches the tier
via getV2GetUserRateLimitTier after loading rate limits, and uses
postV2SetUserRateLimitTier (generated client) instead of raw fetch
for tier changes.
2026-03-29 13:55:49 +02:00
Zamil Majdy
3a40188024 test(backend): add end-to-end tests for tier-adjusted rate limits
Add TestTierLimitsRespected class that verifies the full flow:
get_global_rate_limits (with tier multiplier) -> check_rate_limit.

- PRO user with 3M usage is allowed (below 12.5M PRO limit)
- FREE user at 2.5M is blocked (at FREE limit)
- ENTERPRISE user with 100M usage is allowed (below 150M limit)

Addresses reviewer feedback requesting tests that verify limits are
actually respected end-to-end.
2026-03-29 11:56:05 +02:00
Zamil Majdy
8d6433c1a5 Merge branch 'feat/rate-limit-tiering' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-03-29 06:42:40 +02:00
Zamil Majdy
c7430eaffb fix(platform): use lazy logger formatting in rate limit admin routes
Replace f-string interpolation in logger.info() calls with %s-style
lazy formatting to avoid unnecessary string construction when the log
level is above INFO.
2026-03-29 06:42:03 +02:00
Zamil Majdy
dc272559c6 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-03-29 04:19:35 +02:00
Zamil Majdy
a98b0aee95 style(frontend): format useRateLimitManager.ts with prettier
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 19:56:54 +00:00
Zamil Majdy
264869cab9 fix(frontend): correct proxy path to /api/proxy/api/ for fetch calls
The Next.js proxy at /api/proxy/[...path] forwards the path to
AGPT_SERVER_URL which already includes /api. So the path needs
/api/proxy/api/... (double api — one for proxy route, one for backend).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 16:47:29 +00:00
Zamil Majdy
a85ba9e36d fix(frontend): use /api/proxy/ prefix for search_users and tier fetch calls
The generated API hooks use /api/proxy/ as baseUrl. Raw fetch() calls
must use the same proxy path to reach the backend through Next.js.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 16:32:52 +00:00
Zamil Majdy
18c5f67107 fix(frontend): use search_users only, remove credit-history fallback
The getV2GetAllUsersHistory searches transactions, not users — useless
for user search. Only use the search_users endpoint which queries
the User table directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:31:26 +00:00
Zamil Majdy
0348e7b228 fix(frontend): add fallback to credit-history search when search_users unavailable
The search_users endpoint may not be deployed in preview environments
(Docker cache). Falls back to getV2GetAllUsersHistory (credit
transactions) which at least returns users with transaction history.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 15:26:25 +00:00
Zamil Majdy
e35376d3ec fix(frontend): regenerate openapi.json from backend export-api-schema
Generated using `poetry run export-api-schema` + prettier, matching
the exact CI pipeline. Includes all new endpoints: search_users,
tier management, SubscriptionTier enum.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 13:24:35 +00:00
Zamil Majdy
687af1bdc3 fix(frontend): propagate fetchRateLimit errors in handleTierChange
Use direct getV2GetUserRateLimit call instead of fetchRateLimit
(which swallows errors internally). This ensures the caller's
success/error toast is accurate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 12:26:35 +00:00
Zamil Majdy
694032e45f revert(frontend): restore PR-specific openapi.json
The dev server spec doesn't include this PR's changes (tier endpoints,
SubscriptionTier enum). Reverting to the PR-specific version.

The check API types CI requires a local backend run to generate the
exact matching spec. This is a limitation for endpoint-adding PRs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 10:10:15 +00:00
Zamil Majdy
231a4b6f51 fix(frontend): use dev server spec as base for openapi.json
Uses the actual backend-generated spec from dev server as the base,
adds search_users endpoint, sorts alphabetically, and runs prettier.
This matches the exact CI pipeline: export → prettier → diff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 10:05:38 +00:00
Zamil Majdy
da6f77da47 fix(frontend): sort openapi.json paths alphabetically to match backend
The backend generates paths in alphabetical order. Our manually added
endpoint was at the end. Also fix unicode em-dash encoding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:47:47 +00:00
Zamil Majdy
1747f4e6f3 fix(frontend): add search_users endpoint to openapi.json in CI format
Uses exact format from CI-generated spec (tags, operationId, security).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:40:39 +00:00
Zamil Majdy
0d6d8e820c style(frontend): format openapi.json with prettier
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:25:55 +00:00
Zamil Majdy
24c286fbed fix(frontend): remove manual OpenAPI additions, let CI generate
The check API types CI job generates openapi.json from the running
backend. Manual additions don't match the auto-generated format.
Removing them so CI can generate the correct spec.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:17:19 +00:00
Zamil Majdy
c75f1ff749 fix(frontend): add search_users to OpenAPI spec and regenerate types
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:04:47 +00:00
Zamil Majdy
cfc6d3538c fix(backend): format user.py and add search_users endpoint tests
Fixes ruff formatting in search_users function. Adds tests for:
- Search returning multiple matching users
- Search with no results returning empty list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 08:54:32 +00:00
Zamil Majdy
e9540041d6 fix(platform): search users from User table instead of credit history
The admin rate-limits user search was querying CreditTransaction table,
which only returns users with transaction history. Users without any
credit transactions (e.g. new accounts) were missing from results.

Adds search_users() to data/user.py that queries the User table directly
with case-insensitive partial matching on email and name. Adds a new
GET /api/copilot/admin/rate_limit/search_users endpoint. Updates the
frontend to use this instead of the spending-history search.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 08:23:14 +00:00
Zamil Majdy
8ac86a03b5 fix(platform): correct tier multiplier labels and add tier validation tests
Fix TIER_MULTIPLIERS mismatch in RateLimitDisplay.tsx where PRO showed
"10x" (should be "5x") and BUSINESS showed "30x" (should be "20x"),
not matching backend rate_limit.py values.

Add tests for invalid tier API input (uppercase "INVALID"), FREE-tier
bypass prevention (negative test), and tier-change limit propagation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 00:00:55 +00:00
Zamil Majdy
2aac78eae4 fix(frontend): fix lint and type errors in tier selector
- Replace template literal with regular string for static URL
- Fix TypeScript cast via intermediate `unknown` for tier field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 23:43:28 +00:00
Zamil Majdy
dbfc791357 feat(frontend): add subscription tier selector to admin rate-limits page
Adds tier badge display and dropdown selector to the admin rate-limits
page. Admins can now view and change a user's subscription tier
(FREE/PRO/BUSINESS/ENTERPRISE) with multiplier info. The dropdown calls
POST /api/copilot/admin/rate_limit/tier and re-fetches limits to reflect
the new tier.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 16:18:12 +00:00
Zamil Majdy
880c957c86 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/rate-limit-tiering 2026-03-27 13:29:04 +07:00
Zamil Majdy
857a8ef0aa test(rate-limit): add tier-limit enforcement integration tests
Verifies that tier-multiplied limits are actually respected: usage
within allowance passes, usage at/above limit is rejected, and
higher tiers tolerate usage that would exceed lower tiers.
2026-03-27 13:15:22 +07:00
Zamil Majdy
1008f9fcd4 merge: resolve conflicts with dev, keep tier changes
Merge origin/dev into feat/rate-limit-tiering. Conflicts arose from
the admin-routes refactor (resolved_id rename, _patch_rate_limit_deps
helper) colliding with our 3-tuple get_global_rate_limits and tier
field additions. Resolution keeps our SubscriptionTier enum, 3-tuple
returns, and tier fields while adopting the incoming resolved_id
variable and DRY test helper. Snapshots now include both tier and
user_email fields.
2026-03-27 13:12:38 +07:00
Zamil Majdy
c26791e6ae fix(test): mock get_global_rate_limits in reset_usage tests
The reset_copilot_usage endpoint now calls get_global_rate_limits()
which applies the tier multiplier. Tests were not mocking this, so
the daily_limit was inflated by the PRO 5x multiplier, making the
"at limit" check fail. Mock get_global_rate_limits to return base
limits directly.
2026-03-27 12:19:19 +07:00
Zamil Majdy
cf66c08125 fix(platform): rewrite migration to create enum before referencing it
The migration assumed a pre-existing SubscriptionTier enum from an
intermediate commit that was squashed. On a fresh DB the ALTER TYPE
fails with "type SubscriptionTier does not exist". Replace the
alter/rename/recreate sequence with a simple CREATE TYPE + ADD COLUMN.
2026-03-27 12:05:21 +07:00
Zamil Majdy
b4362785e4 fix(platform): update enterprise tier multiplier from 50x to 60x 2026-03-27 11:31:24 +07:00
Zamil Majdy
f38fa96df4 refactor(platform): update tier structure — remove STANDARD, add BUSINESS, default to PRO
Product decision: simplify tiers for beta testing.
- Tiers: FREE(1x), PRO(5x, default on sign-up), BUSINESS(20x), ENTERPRISE(50x)
- Remove STANDARD tier, rename existing STANDARD users to PRO in migration
- Default sign-up tier changed from FREE to PRO during beta
- Migration: recreate enum without STANDARD, add BUSINESS, update default
2026-03-27 11:25:50 +07:00
Zamil Majdy
98c8f94ef2 fix(platform): address round 1 review findings for rate-limit tiering
- Document _fetch_user_tier caching behavior for None tier values
- Add clarifying comment that TIER_MULTIPLIERS uses int intentionally
- Add 3 unit tests for set_user_tier (happy path, RecordNotFoundError,
  cache invalidation)
- Fix test isolation: mock get_global_rate_limits in chat routes usage
  tests to avoid implicit LD/Prisma fallback dependency
2026-03-27 11:07:50 +07:00
Zamil Majdy
7b0111d9b5 test(copilot): add missing PRO tier 10x multiplier test
Complete the tier multiplier coverage matrix by adding a test case
for the PRO tier (10x). Previously only FREE (1x), STANDARD (5x),
and ENTERPRISE (25x) were tested.
2026-03-27 10:48:53 +07:00
Zamil Majdy
85e9e4c5b7 refactor(copilot): rename RateLimitTier to SubscriptionTier with Prisma enum
Rename `rateLimitTier` (String) to `subscriptionTier` (Prisma enum) across
the entire stack:

- schema.prisma: Add `SubscriptionTier` enum (FREE, STANDARD, PRO,
  ENTERPRISE), change User field from `rateLimitTier String` to
  `subscriptionTier SubscriptionTier`.
- migration.sql: CREATE TYPE + ALTER TABLE for the new enum column.
- rate_limit.py: Rename Python enum and update DB field references.
- All test files, admin routes, snapshots, and openapi.json updated to
  match the new naming.

Addresses PR feedback asking for a generic name and proper Prisma enum
instead of a free-form string.
2026-03-27 10:17:21 +07:00
Zamil Majdy
e900ee615a fix(copilot): move get_user_tier import to top-level and expose cache via public API
- sdk/service.py: Move `get_user_tier` import from local (inside function)
  to module-level — no circular dependency exists.
- rate_limit.py: Expose `cache_clear`/`cache_delete` as attributes on the
  public `get_user_tier` function so callers never need to import the
  private `_fetch_user_tier`.
- rate_limit_test.py: Remove `_fetch_user_tier` import; use
  `get_user_tier.cache_clear()` instead.
2026-03-27 09:52:59 +07:00
Zamil Majdy
e1d5113051 fix(platform): pass tier to get_usage_status() in admin rate limit endpoints
For consistency, pass tier=tier to get_usage_status() in the admin
get_user_rate_limit and reset_user_rate_limit endpoints as well.
2026-03-27 01:40:14 +07:00
Zamil Majdy
4963d227ea fix(platform): pass tier to get_usage_status() in reset_copilot_usage endpoint
The reset_copilot_usage endpoint was calling get_usage_status() without
the tier parameter, causing the response to always report STANDARD tier
regardless of the user's actual tier. Pass _tier from get_global_rate_limits()
to both get_usage_status() calls in the endpoint.
2026-03-27 01:37:01 +07:00
Zamil Majdy
19dea0e4ca fix(test): update usage test assertions to include tier parameter
Update test_usage_returns_daily_and_weekly and test_usage_uses_config_limits
to include tier=RateLimitTier.STANDARD in the expected call kwargs, matching
the new tier parameter added to get_usage_status().
2026-03-27 01:24:52 +07:00
Zamil Majdy
87d5a39267 fix(platform): use direct dict indexing for tier multiplier lookup
Use TIER_MULTIPLIERS[tier] instead of .get(tier, 1) to fail fast
if a new tier is added to the enum without a corresponding multiplier.
2026-03-27 01:12:37 +07:00
Zamil Majdy
87ac8148e3 refactor(platform): pass tier to get_usage_status() instead of post-mutation
Add tier parameter to get_usage_status() so callers can set the tier
at construction time rather than mutating the model after creation.
This is safer if the model ever becomes frozen.
2026-03-27 01:01:44 +07:00
Zamil Majdy
491132f62f Merge dev: resolve conflicts + fix transient DB error caching default tier
Resolve merge conflicts between rate-limit tiering and reset-daily-usage
features (both additive). Fix Sentry-flagged bug where a transient DB
error in get_user_tier cached DEFAULT_TIER for 5 minutes, incorrectly
downgrading higher-tier users. Split into _fetch_user_tier (cached, raises
on error) and get_user_tier (uncached wrapper with fallback). Added
regression test test_db_error_is_not_cached.
2026-03-26 23:50:10 +07:00
Zamil Majdy
55815a3207 chore: trigger CI 2026-03-26 21:45:07 +07:00
Zamil Majdy
5c3aa11600 fix(test): add rateLimitTier to User mock in store db_test
The new rateLimitTier field on User is NOT NULL with a DB default,
so Prisma's Pydantic model requires it at construction time.
2026-03-26 21:06:38 +07:00
Zamil Majdy
b5cbf8505b fix(backend): remove platform schema prefix from migration SQL
CI test database doesn't have the "platform" schema. Use unqualified
table name so the migration works in all environments.
2026-03-26 20:50:40 +07:00
Zamil Majdy
f49f63de76 fix: mock PrismaUser where it is used, not where it is defined
Change mock target from prisma.models.User.prisma to
backend.copilot.rate_limit.PrismaUser.prisma to follow the
coding guideline of mocking at the import boundary.
2026-03-26 20:48:57 +07:00
Zamil Majdy
8f76384942 fix: invalidate get_user_tier cache when tier is updated via set_user_tier
Call get_user_tier.cache_delete(user_id) after DB update so that
subsequent rate-limit checks immediately see the new tier instead
of using a stale cached value for up to 5 minutes.
2026-03-26 20:47:01 +07:00
Zamil Majdy
ffb8d366d6 fix: address PR review - cache tier lookups, return tier from get_global_rate_limits, fix error handling
- Add @cached(ttl_seconds=300) to get_user_tier() to avoid DB hit on every chat turn
- Change get_global_rate_limits() to return 3-tuple (daily, weekly, tier) so callers
  don't need redundant get_user_tier() calls
- Remove redundant get_user_tier() calls from admin routes and chat /usage endpoint
- Simplify `except (ValueError, Exception)` to `except Exception`
- Handle prisma.errors.RecordNotFoundError in set_user_tier admin endpoint (404 vs 500)
- Add test for user-not-found case on set_user_tier endpoint
- Clear tier cache between tests to prevent stale cached results
2026-03-26 20:42:01 +07:00
Zamil Majdy
432ef5ab5e feat(platform): add rate-limit tiering system for CoPilot
Add a three-tier rate-limiting system (standard/pro/max) that allows
assigning different token limits to users. Tier multipliers are applied
on top of the base limits from LaunchDarkly/config.

Changes:
- Add RateLimitTier enum with standard (1x), pro (5x), max (25x) multipliers
- Add rateLimitTier column to User model in Prisma schema
- Add get_user_tier/set_user_tier DB functions in rate_limit.py
- Update get_global_rate_limits to apply tier multiplier to base limits
- Add admin endpoints: GET/POST /admin/rate_limit/tier for tier management
- Include tier info in UserRateLimitResponse and CoPilotUsageStatus
- Send user tier as metadata in OTEL/Langfuse traces
- Add comprehensive tests (43 total, all passing)
- Add Prisma migration for the new column
2026-03-26 20:31:53 +07:00
2562 changed files with 825304 additions and 65170 deletions

View File

@@ -1,10 +0,0 @@
{
"permissions": {
"allowedTools": [
"Read", "Grep", "Glob",
"Bash(ls:*)", "Bash(cat:*)", "Bash(grep:*)", "Bash(find:*)",
"Bash(git status:*)", "Bash(git diff:*)", "Bash(git log:*)", "Bash(git worktree:*)",
"Bash(tmux:*)", "Bash(sleep:*)", "Bash(branchlet:*)"
]
}
}

View File

@@ -1,545 +0,0 @@
---
name: orchestrate
description: "Meta-agent supervisor that manages a fleet of Claude Code agents running in tmux windows. Auto-discovers spare worktrees, spawns agents, monitors state, kicks idle agents, approves safe confirmations, and recycles worktrees when done. TRIGGER when user asks to supervise agents, run parallel tasks, manage worktrees, check agent status, or orchestrate parallel work."
user-invocable: true
argument-hint: "any free text — e.g. 'start 3 agents on X Y Z', 'show status', 'add task: implement feature A', 'stop', 'how many are free?'"
metadata:
author: autogpt-team
version: "6.0.0"
---
# Orchestrate — Agent Fleet Supervisor
One tmux session, N windows — each window is one agent working in its own worktree. Speak naturally; Claude maps your intent to the right scripts.
## Scripts
```bash
SKILLS_DIR=$(git rev-parse --show-toplevel)/.claude/skills/orchestrate/scripts
STATE_FILE=~/.claude/orchestrator-state.json
```
| Script | Purpose |
|---|---|
| `find-spare.sh [REPO_ROOT]` | List free worktrees — one `PATH BRANCH` per line |
| `spawn-agent.sh SESSION PATH SPARE NEW_BRANCH OBJECTIVE [PR_NUMBER] [STEPS...]` | Create window + checkout branch + launch claude + send task. **Stdout: `SESSION:WIN` only** |
| `recycle-agent.sh WINDOW PATH SPARE_BRANCH` | Kill window + restore spare branch |
| `run-loop.sh` | **Mechanical babysitter** — idle restart + dialog approval + recycle on ORCHESTRATOR:DONE + supervisor health check + all-done notification |
| `verify-complete.sh WINDOW` | Verify PR is done: checkpoints ✓ + 0 unresolved threads + CI green + no fresh CHANGES_REQUESTED. Repo auto-derived from state file `.repo` or git remote. |
| `notify.sh MESSAGE` | Send notification via Discord webhook (env `DISCORD_WEBHOOK_URL` or state `.discord_webhook`), macOS notification center, and stdout |
| `capacity.sh [REPO_ROOT]` | Print available + in-use worktrees |
| `status.sh` | Print fleet status + live pane commands |
| `poll-cycle.sh` | One monitoring cycle — classifies panes, tracks checkpoints, returns JSON action array |
| `classify-pane.sh WINDOW` | Classify one pane state |
## Supervision model
```
Orchestrating Claude (this Claude session — IS the supervisor)
└── Reads pane output, checks CI, intervenes with targeted guidance
run-loop.sh (separate tmux window, every 30s)
└── Mechanical only: idle restart, dialog approval, recycle on ORCHESTRATOR:DONE
```
**You (the orchestrating Claude)** are the supervisor. After spawning agents, stay in this conversation and actively monitor: poll each agent's pane every 2-3 minutes, check CI, nudge stalled agents, and verify completions. Do not spawn a separate supervisor Claude window — it loses context, is hard to observe, and compounds context compression problems.
**run-loop.sh** is the mechanical layer — zero tokens, handles things that need no judgment: restart crashed agents, press Enter on dialogs, recycle completed worktrees (only after `verify-complete.sh` passes).
## Checkpoint protocol
Agents output checkpoints as they complete each required step:
```
CHECKPOINT:<step-name>
```
Required steps are passed as args to `spawn-agent.sh` (e.g. `pr-address pr-test`). `run-loop.sh` will not recycle a window until all required checkpoints are found in the pane output. If `verify-complete.sh` fails, the agent is re-briefed automatically.
## Worktree lifecycle
```text
spare/N branch → spawn-agent.sh (--session-id UUID) → window + feat/branch + claude running
CHECKPOINT:<step> (as steps complete)
ORCHESTRATOR:DONE
verify-complete.sh: checkpoints ✓ + 0 threads + CI green + no fresh CHANGES_REQUESTED
state → "done", notify, window KEPT OPEN
user/orchestrator explicitly requests recycle
recycle-agent.sh → spare/N (free again)
```
**Windows are never auto-killed.** The worktree stays on its branch, the session stays alive. The agent is done working but the window, git state, and Claude session are all preserved until you choose to recycle.
**To resume a done or crashed session:**
```bash
# Resume by stored session ID (preferred — exact session, full context)
claude --resume SESSION_ID --permission-mode bypassPermissions
# Or resume most recent session in that worktree directory
cd /path/to/worktree && claude --continue --permission-mode bypassPermissions
```
**To manually recycle when ready:**
```bash
bash ~/.claude/orchestrator/scripts/recycle-agent.sh SESSION:WIN WORKTREE_PATH spare/N
# Then update state:
jq --arg w "SESSION:WIN" '.agents |= map(if .window == $w then .state = "recycled" else . end)' \
~/.claude/orchestrator-state.json > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
```
## State file (`~/.claude/orchestrator-state.json`)
Never committed to git. You maintain this file directly using `jq` + atomic writes (`.tmp``mv`).
```json
{
"active": true,
"tmux_session": "autogpt1",
"idle_threshold_seconds": 300,
"loop_window": "autogpt1:5",
"repo": "Significant-Gravitas/AutoGPT",
"discord_webhook": "https://discord.com/api/webhooks/...",
"last_poll_at": 0,
"agents": [
{
"window": "autogpt1:3",
"worktree": "AutoGPT6",
"worktree_path": "/path/to/AutoGPT6",
"spare_branch": "spare/6",
"branch": "feat/my-feature",
"objective": "Implement X and open a PR",
"pr_number": "12345",
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"steps": ["pr-address", "pr-test"],
"checkpoints": ["pr-address"],
"state": "running",
"last_output_hash": "",
"last_seen_at": 0,
"spawned_at": 0,
"idle_since": 0,
"revision_count": 0,
"last_rebriefed_at": 0
}
]
}
```
Top-level optional fields:
- `repo` — GitHub `owner/repo` for CI/thread checks. Auto-derived from git remote if omitted.
- `discord_webhook` — Discord webhook URL for completion notifications. Also reads `DISCORD_WEBHOOK_URL` env var.
Per-agent fields:
- `session_id` — UUID passed to `claude --session-id` at spawn; use with `claude --resume UUID` to restore exact session context after a crash or window close.
- `last_rebriefed_at` — Unix timestamp of last re-brief; enforces 5-min cooldown to prevent spam.
Agent states: `running` | `idle` | `stuck` | `waiting_approval` | `complete` | `done` | `escalated`
`done` means verified complete — window is still open, session still alive, worktree still on task branch. Not recycled yet.
## Serial /pr-test rule
`/pr-test` and `/pr-test --fix` run local Docker + integration tests that use shared ports, a shared database, and shared build caches. **Running two `/pr-test` jobs simultaneously will cause port conflicts and database corruption.**
**Rule: only one `/pr-test` runs at a time. The orchestrator serializes them.**
You (the orchestrating Claude) own the test queue:
1. Agents do `pr-review` and `pr-address` in parallel — that's safe (they only push code and reply to GitHub).
2. When a PR needs local testing, add it to your mental queue — don't give agents a `pr-test` step.
3. Run `/pr-test https://github.com/OWNER/REPO/pull/PR_NUMBER --fix` yourself, sequentially.
4. Feed results back to the relevant agent via `tmux send-keys`:
```bash
tmux send-keys -t SESSION:WIN "Local tests for PR #N: <paste failure output or 'all passed'>. Fix any failures and push, then output ORCHESTRATOR:DONE."
sleep 0.3
tmux send-keys -t SESSION:WIN Enter
```
5. Wait for CI to confirm green before marking the agent done.
If multiple PRs need testing at the same time, pick the one furthest along (fewest pending CI checks) and test it first. Only start the next test after the previous one completes.
## Session restore (tested and confirmed)
Agent sessions are saved to disk. To restore a closed or crashed session:
```bash
# If session_id is in state (preferred):
NEW_WIN=$(tmux new-window -t SESSION -n WORKTREE_NAME -P -F '#{window_index}')
tmux send-keys -t "SESSION:${NEW_WIN}" "cd /path/to/worktree && claude --resume SESSION_ID --permission-mode bypassPermissions" Enter
# If no session_id (use --continue for most recent session in that directory):
tmux send-keys -t "SESSION:${NEW_WIN}" "cd /path/to/worktree && claude --continue --permission-mode bypassPermissions" Enter
```
`--continue` restores the full conversation history including all tool calls, file edits, and context. The agent resumes exactly where it left off. After restoring, update the window address in the state file:
```bash
jq --arg old "SESSION:OLD_WIN" --arg new "SESSION:NEW_WIN" \
'(.agents[] | select(.window == $old)).window = $new' \
~/.claude/orchestrator-state.json > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
```
## Intent → action mapping
Match the user's message to one of these intents:
| The user says something like… | What to do |
|---|---|
| "status", "what's running", "show agents" | Run `status.sh` + `capacity.sh`, show output |
| "how many free", "capacity", "available worktrees" | Run `capacity.sh`, show output |
| "start N agents on X, Y, Z" or "run these tasks: …" | See **Spawning agents** below |
| "add task: …", "add one more agent for …" | See **Adding an agent** below |
| "stop", "shut down", "pause the fleet" | See **Stopping** below |
| "poll", "check now", "run a cycle" | Run `poll-cycle.sh`, process actions |
| "recycle window X", "free up autogpt3" | Run `recycle-agent.sh` directly |
When the intent is ambiguous, show capacity first and ask what tasks to run.
## Spawning agents
### 1. Resolve tmux session
```bash
tmux list-sessions -F "#{session_name}: #{session_windows} windows" 2>/dev/null
```
Use an existing session. **Never create a tmux session from within Claude** — it becomes a child of Claude's process and dies when the session ends. If no session exists, tell the user to run `tmux new-session -d -s autogpt1` in their terminal first, then re-invoke `/orchestrate`.
### 2. Show available capacity
```bash
bash $SKILLS_DIR/capacity.sh $(git rev-parse --show-toplevel)
```
### 3. Collect tasks from the user
For each task, gather:
- **objective** — what to do (e.g. "implement feature X and open a PR")
- **branch name** — e.g. `feat/my-feature` (derive from objective if not given)
- **pr_number** — GitHub PR number if working on an existing PR (for verification)
- **steps** — required checkpoint names in order (e.g. `pr-address pr-test`) — derive from objective
Ask for `idle_threshold_seconds` only if the user mentions it (default: 300).
Never ask the user to specify a worktree — auto-assign from `find-spare.sh`.
### 4. Spawn one agent per task
```bash
# Get ordered list of spare worktrees
SPARE_LIST=$(bash $SKILLS_DIR/find-spare.sh $(git rev-parse --show-toplevel))
# For each task, take the next spare line:
WORKTREE_PATH=$(echo "$SPARE_LINE" | awk '{print $1}')
SPARE_BRANCH=$(echo "$SPARE_LINE" | awk '{print $2}')
# With PR number and required steps:
WINDOW=$(bash $SKILLS_DIR/spawn-agent.sh "$SESSION" "$WORKTREE_PATH" "$SPARE_BRANCH" "$NEW_BRANCH" "$OBJECTIVE" "$PR_NUMBER" "pr-address" "pr-test")
# Without PR (new work):
WINDOW=$(bash $SKILLS_DIR/spawn-agent.sh "$SESSION" "$WORKTREE_PATH" "$SPARE_BRANCH" "$NEW_BRANCH" "$OBJECTIVE")
```
Build an agent record and append it to the state file. If the state file doesn't exist yet, initialize it:
```bash
# Derive repo from git remote (used by verify-complete.sh + supervisor)
REPO=$(git remote get-url origin 2>/dev/null | sed 's|.*github\.com[:/]||; s|\.git$||' || echo "")
jq -n \
--arg session "$SESSION" \
--arg repo "$REPO" \
--argjson threshold 300 \
'{active:true, tmux_session:$session, idle_threshold_seconds:$threshold,
repo:$repo, loop_window:null, supervisor_window:null, last_poll_at:0, agents:[]}' \
> ~/.claude/orchestrator-state.json
```
Optionally add a Discord webhook for completion notifications:
```bash
jq --arg hook "$DISCORD_WEBHOOK_URL" '.discord_webhook = $hook' ~/.claude/orchestrator-state.json \
> /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
```
`spawn-agent.sh` writes the initial agent record (window, worktree_path, branch, objective, state, etc.) to the state file automatically — **do not append the record again after calling it.** The record already exists and `pr_number`/`steps` are patched in by the script itself.
### 5. Start the mechanical babysitter
```bash
LOOP_WIN=$(tmux new-window -t "$SESSION" -n "orchestrator" -P -F '#{window_index}')
LOOP_WINDOW="${SESSION}:${LOOP_WIN}"
tmux send-keys -t "$LOOP_WINDOW" "bash $SKILLS_DIR/run-loop.sh" Enter
jq --arg w "$LOOP_WINDOW" '.loop_window = $w' ~/.claude/orchestrator-state.json \
> /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
```
### 6. Begin supervising directly in this conversation
You are the supervisor. After spawning, immediately start your first poll loop (see **Supervisor duties** below) and continue every 2-3 minutes. Do NOT spawn a separate supervisor Claude window.
## Adding an agent
Find the next spare worktree, then spawn and append to state — same as steps 24 above but for a single task. If no spare worktrees are available, tell the user.
## Supervisor duties (YOUR job, every 2-3 min in this conversation)
You are the supervisor. Run this poll loop directly in your Claude session — not in a separate window.
### Poll loop mechanism
You are reactive — you only act when a tool completes or the user sends a message. To create a self-sustaining poll loop without user involvement:
1. Start each poll with `run_in_background: true` + a sleep before the work:
```bash
sleep 120 && tmux capture-pane -t autogpt1:0 -p -S -200 | tail -40
# + similar for each active window
```
2. When the background job notifies you, read the pane output and take action.
3. Immediately schedule the next background poll — this keeps the loop alive.
4. Stop scheduling when all agents are done/escalated.
**Never tell the user "I'll poll every 2-3 minutes"** — that does nothing without a trigger. Start the background job instead.
### Each poll: what to check
```bash
# 1. Read state
cat ~/.claude/orchestrator-state.json | jq '.agents[] | {window, worktree, branch, state, pr_number, checkpoints}'
# 2. For each running/stuck/idle agent, capture pane
tmux capture-pane -t SESSION:WIN -p -S -200 | tail -60
```
For each agent, decide:
| What you see | Action |
|---|---|
| Spinner / tools running | Do nothing — agent is working |
| Idle `` prompt, no `ORCHESTRATOR:DONE` | Stalled — send specific nudge with objective from state |
| Stuck in error loop | Send targeted fix with exact error + solution |
| Waiting for input / question | Answer and unblock via `tmux send-keys` |
| CI red | `gh pr checks PR_NUMBER --repo REPO` → tell agent exactly what's failing |
| Context compacted / agent lost | Send recovery: `cat ~/.claude/orchestrator-state.json | jq '.agents[] | select(.window=="WIN")'` + `gh pr view PR_NUMBER --json title,body` |
| `ORCHESTRATOR:DONE` in output | Run `verify-complete.sh` — if it fails, re-brief with specific reason |
### Strict ORCHESTRATOR:DONE gate
`verify-complete.sh` handles the main checks automatically (checkpoints, threads, CI green, spawned_at, and CHANGES_REQUESTED). Run it:
**CHANGES_REQUESTED staleness rule**: a `CHANGES_REQUESTED` review only blocks if it was submitted *after* the latest commit. If the latest commit postdates the review, the review is considered stale (feedback already addressed) and does not block. This avoids false negatives when a bot reviewer hasn't re-reviewed after the agent's fixing commits.
```bash
SKILLS_DIR=~/.claude/orchestrator/scripts
bash $SKILLS_DIR/verify-complete.sh SESSION:WIN
```
If it passes → run-loop.sh will recycle the window automatically. No manual action needed.
If it fails → re-brief the agent with the failure reason. Never manually mark state `done` to bypass this.
### Re-brief a stalled agent
```bash
OBJ=$(jq -r --arg w SESSION:WIN '.agents[] | select(.window==$w) | .objective' ~/.claude/orchestrator-state.json)
PR=$(jq -r --arg w SESSION:WIN '.agents[] | select(.window==$w) | .pr_number' ~/.claude/orchestrator-state.json)
tmux send-keys -t SESSION:WIN "You appear stalled. Your objective: $OBJ. Check: gh pr view $PR --json title,body,headRefName to reorient."
sleep 0.3
tmux send-keys -t SESSION:WIN Enter
```
If `image_path` is set on the agent record, include: "Re-read context at IMAGE_PATH with the Read tool."
## Self-recovery protocol (agents)
spawn-agent.sh automatically includes this instruction in every objective:
> If your context compacts and you lose track of what to do, run:
> `cat ~/.claude/orchestrator-state.json | jq '.agents[] | select(.window=="SESSION:WIN")'`
> and `gh pr view PR_NUMBER --json title,body,headRefName` to reorient.
> Output each completed step as `CHECKPOINT:<step-name>` on its own line.
## Passing images and screenshots to agents
`tmux send-keys` is text-only — you cannot paste a raw image into a pane. To give an agent visual context (screenshots, diagrams, mockups):
1. **Save the image to a temp file** with a stable path:
```bash
# If the user drags in a screenshot or you receive a file path:
IMAGE_PATH="/tmp/orchestrator-context-$(date +%s).png"
cp "$USER_PROVIDED_PATH" "$IMAGE_PATH"
```
2. **Reference the path in the objective string**:
```bash
OBJECTIVE="Implement the layout shown in /tmp/orchestrator-context-1234567890.png. Read that image first with the Read tool to understand the design."
```
3. The agent uses its `Read` tool to view the image at startup — Claude Code agents are multimodal and can read image files directly.
**Rule**: always use `/tmp/orchestrator-context-<timestamp>.png` as the naming convention so the supervisor knows what to look for if it needs to re-brief an agent with the same image.
---
## Orchestrator final evaluation (YOU decide, not the script)
`verify-complete.sh` is a gate — it blocks premature marking. But it cannot tell you if the work is actually good. That is YOUR job.
When run-loop marks an agent `pending_evaluation` and you're notified, do all of these before marking done:
### 1. Run /pr-test (required, serialized, use TodoWrite to queue)
`/pr-test` is the only reliable confirmation that the objective is actually met. Run it yourself, not the agent.
**When multiple PRs reach `pending_evaluation` at the same time, use TodoWrite to queue them:**
```
- [ ] /pr-test PR #12636 — fix copilot retry logic
- [ ] /pr-test PR #12699 — builder chat panel
```
Run one at a time. Check off as you go.
```
/pr-test https://github.com/Significant-Gravitas/AutoGPT/pull/PR_NUMBER
```
**/pr-test can be lazy** — if it gives vague output, re-run with full context:
```
/pr-test https://github.com/OWNER/REPO/pull/PR_NUMBER
Context: This PR implements <objective from state file>. Key files: <list>.
Please verify: <specific behaviors to check>.
```
Only one `/pr-test` at a time — they share ports and DB.
### /pr-test result evaluation
**PARTIAL on any headline feature scenario is an immediate blocker.** Do not approve, do not mark done, do not let the agent output `ORCHESTRATOR:DONE`.
| `/pr-test` result | Action |
|---|---|
| All headline scenarios **PASS** | Proceed to evaluation step 2 |
| Any headline scenario **PARTIAL** | Re-brief the agent immediately — see below |
| Any headline scenario **FAIL** | Re-brief the agent immediately |
**What PARTIAL means**: the feature is only partly working. Example: the Apply button never appeared, or the AI returned no action blocks. The agent addressed part of the objective but not all of it.
**When any headline scenario is PARTIAL or FAIL:**
1. Do NOT mark the agent done or accept `ORCHESTRATOR:DONE`
2. Re-brief the agent with the specific scenario that failed and what was missing:
```bash
tmux send-keys -t SESSION:WIN "PARTIAL result on /pr-test — S5 (Apply button) never appeared. The AI must output JSON action blocks for the Apply button to render. Fix this before re-running /pr-test."
sleep 0.3
tmux send-keys -t SESSION:WIN Enter
```
3. Set state back to `running`:
```bash
jq --arg w "SESSION:WIN" '(.agents[] | select(.window == $w)).state = "running"' \
~/.claude/orchestrator-state.json > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
```
4. Wait for new `ORCHESTRATOR:DONE`, then re-run `/pr-test` from scratch
**Rule: only ALL-PASS qualifies for approval.** A mix of PASS + PARTIAL is a failure.
> **Why this matters**: PR #12699 was wrongly approved with S5 PARTIAL — the AI never output JSON action blocks so the Apply button never appeared. The fix was already in the agent's reach but slipped through because PARTIAL was not treated as blocking.
### 2. Do your own evaluation
1. **Read the PR diff and objective** — does the code actually implement what was asked? Is anything obviously missing or half-done?
2. **Read the resolved threads** — were comments addressed with real fixes, or just dismissed/resolved without changes?
3. **Check CI run names** — any suspicious retries that shouldn't have passed?
4. **Check the PR description** — title, summary, test plan complete?
### 3. Decide
- `/pr-test` all scenarios PASS + evaluation looks good → mark `done` in state, tell the user the PR is ready, ask if window should be closed
- `/pr-test` any scenario PARTIAL or FAIL → re-brief the agent with the specific failing scenario, set state back to `running` (see `/pr-test result evaluation` above)
- Evaluation finds gaps even with all PASS → re-brief the agent with specific gaps, set state back to `running`
**Never mark done based purely on script output.** You hold the full objective context; the script does not.
```bash
# Mark done after your positive evaluation:
jq --arg w "SESSION:WIN" '(.agents[] | select(.window == $w)).state = "done"' \
~/.claude/orchestrator-state.json > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
```
## When to stop the fleet
Stop the fleet (`active = false`) when **all** of the following are true:
| Check | How to verify |
|---|---|
| All agents are `done` or `escalated` | `jq '[.agents[] | select(.state | test("running\|stuck\|idle\|waiting_approval"))] | length' ~/.claude/orchestrator-state.json` == 0 |
| All PRs have 0 unresolved review threads | GraphQL `isResolved` check per PR |
| All PRs have green CI **on a run triggered after the agent's last push** | `gh run list --branch BRANCH --limit 1` timestamp > `spawned_at` in state |
| No fresh CHANGES_REQUESTED (after latest commit) | `verify-complete.sh` checks this — stale pre-commit reviews are ignored |
| No agents are `escalated` without human review | If any are escalated, surface to user first |
**Do NOT stop just because agents output `ORCHESTRATOR:DONE`.** That is a signal to verify, not a signal to stop.
**Do stop** if the user explicitly says "stop", "shut down", or "kill everything", even with agents still running.
```bash
# Graceful stop
jq '.active = false' ~/.claude/orchestrator-state.json > /tmp/orch.tmp \
&& mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
LOOP_WINDOW=$(jq -r '.loop_window // ""' ~/.claude/orchestrator-state.json)
[ -n "$LOOP_WINDOW" ] && tmux kill-window -t "$LOOP_WINDOW" 2>/dev/null || true
```
Does **not** recycle running worktrees — agents may still be mid-task. Run `capacity.sh` to see what's still in progress.
## tmux send-keys pattern
**Always split long messages into text + Enter as two separate calls with a sleep between them.** If sent as one call (`"text" Enter`), Enter can fire before the full string is buffered into Claude's input — leaving the message stuck as `[Pasted text +N lines]` unsent.
```bash
# CORRECT — text then Enter separately
tmux send-keys -t "$WINDOW" "your long message here"
sleep 0.3
tmux send-keys -t "$WINDOW" Enter
# WRONG — Enter may fire before text is buffered
tmux send-keys -t "$WINDOW" "your long message here" Enter
```
Short single-character sends (`y`, `Down`, empty Enter for dialog approval) are safe to combine since they have no buffering lag.
---
## Protected worktrees
Some worktrees must **never** be used as spare worktrees for agent tasks because they host files critical to the orchestrator itself:
| Worktree | Protected branch | Why |
|---|---|---|
| `AutoGPT1` | `dx/orchestrate-skill` | Hosts the orchestrate skill scripts. `recycle-agent.sh` would check out `spare/1`, wiping `.claude/skills/` and breaking all subsequent `spawn-agent.sh` calls. |
**Rule**: when selecting spare worktrees via `find-spare.sh`, skip any worktree whose CURRENT branch matches a protected branch. If you accidentally spawn an agent in a protected worktree, do not let `recycle-agent.sh` run on it — manually restore the branch after the agent finishes.
When `dx/orchestrate-skill` is merged into `dev`, `AutoGPT1` becomes a normal spare again.
---
## Key rules
1. **Scripts do all the heavy lifting** — don't reimplement their logic inline in this file
2. **Never ask the user to pick a worktree** — auto-assign from `find-spare.sh` output
3. **Never restart a running agent** — only restart on `idle` kicks (foreground is a shell)
4. **Auto-dismiss settings dialogs** — if "Enter to confirm" appears, send Down+Enter
5. **Always `--permission-mode bypassPermissions`** on every spawn
6. **Escalate after 3 kicks** — mark `escalated`, surface to user
7. **Atomic state writes** — always write to `.tmp` then `mv`
8. **Never approve destructive commands** outside the worktree scope — when in doubt, escalate
9. **Never recycle without verification** — `verify-complete.sh` must pass before recycling
10. **No TASK.md files** — commit risk; use state file + `gh pr view` for agent context persistence
11. **Re-brief stalled agents** — read objective from state file + `gh pr view`, send via tmux
12. **ORCHESTRATOR:DONE is a signal to verify, not to accept** — always run `verify-complete.sh` and check CI run timestamp before recycling
13. **Protected worktrees** — never use the worktree hosting the skill scripts as a spare
14. **Images via file path** — save screenshots to `/tmp/orchestrator-context-<ts>.png`, pass path in objective; agents read with the `Read` tool
15. **Split send-keys** — always separate text and Enter with `sleep 0.3` between calls for long strings

View File

@@ -1,43 +0,0 @@
#!/usr/bin/env bash
# capacity.sh — show fleet capacity: available spare worktrees + in-use agents
#
# Usage: capacity.sh [REPO_ROOT]
# REPO_ROOT defaults to the root worktree of the current git repo.
#
# Reads: ~/.claude/orchestrator-state.json (skipped if missing or corrupt)
set -euo pipefail
SCRIPTS_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
REPO_ROOT="${1:-$(git rev-parse --show-toplevel 2>/dev/null || echo "")}"
echo "=== Available (spare) worktrees ==="
if [ -n "$REPO_ROOT" ]; then
SPARE=$("$SCRIPTS_DIR/find-spare.sh" "$REPO_ROOT" 2>/dev/null || echo "")
else
SPARE=$("$SCRIPTS_DIR/find-spare.sh" 2>/dev/null || echo "")
fi
if [ -z "$SPARE" ]; then
echo " (none)"
else
while IFS= read -r line; do
[ -z "$line" ] && continue
echo "$line"
done <<< "$SPARE"
fi
echo ""
echo "=== In-use worktrees ==="
if [ -f "$STATE_FILE" ] && jq -e '.' "$STATE_FILE" >/dev/null 2>&1; then
IN_USE=$(jq -r '.agents[] | select(.state != "done") | " [\(.state)] \(.worktree_path) → \(.branch)"' \
"$STATE_FILE" 2>/dev/null || echo "")
if [ -n "$IN_USE" ]; then
echo "$IN_USE"
else
echo " (none)"
fi
else
echo " (no active state file)"
fi

View File

@@ -1,85 +0,0 @@
#!/usr/bin/env bash
# classify-pane.sh — Classify the current state of a tmux pane
#
# Usage: classify-pane.sh <tmux-target>
# tmux-target: e.g. "work:0", "work:1.0"
#
# Output (stdout): JSON object:
# { "state": "running|idle|waiting_approval|complete", "reason": "...", "pane_cmd": "..." }
#
# Exit codes: 0=ok, 1=error (invalid target or tmux window not found)
set -euo pipefail
TARGET="${1:-}"
if [ -z "$TARGET" ]; then
echo '{"state":"error","reason":"no target provided","pane_cmd":""}'
exit 1
fi
# Validate tmux target format: session:window or session:window.pane
if ! [[ "$TARGET" =~ ^[a-zA-Z0-9_.-]+:[a-zA-Z0-9_.-]+(\.[0-9]+)?$ ]]; then
echo '{"state":"error","reason":"invalid tmux target format","pane_cmd":""}'
exit 1
fi
# Check session exists (use %%:* to extract session name from session:window)
if ! tmux list-windows -t "${TARGET%%:*}" &>/dev/null 2>&1; then
echo '{"state":"error","reason":"tmux target not found","pane_cmd":""}'
exit 1
fi
# Get the current foreground command in the pane
PANE_CMD=$(tmux display-message -t "$TARGET" -p '#{pane_current_command}' 2>/dev/null || echo "unknown")
# Capture and strip ANSI codes (use perl for cross-platform compatibility — BSD sed lacks \x1b support)
RAW=$(tmux capture-pane -t "$TARGET" -p -S -50 2>/dev/null || echo "")
CLEAN=$(echo "$RAW" | perl -pe 's/\x1b\[[0-9;]*[a-zA-Z]//g; s/\x1b\(B//g; s/\x1b\[\?[0-9]*[hl]//g; s/\r//g' \
| grep -v '^[[:space:]]*$' || true)
# --- Check: explicit completion marker ---
# Must be on its own line (not buried in the objective text sent at spawn time).
if echo "$CLEAN" | grep -qE "^[[:space:]]*ORCHESTRATOR:DONE[[:space:]]*$"; then
jq -n --arg cmd "$PANE_CMD" '{"state":"complete","reason":"ORCHESTRATOR:DONE marker found","pane_cmd":$cmd}'
exit 0
fi
# --- Check: Claude Code approval prompt patterns ---
LAST_40=$(echo "$CLEAN" | tail -40)
APPROVAL_PATTERNS=(
"Do you want to proceed"
"Do you want to make this"
"\\[y/n\\]"
"\\[Y/n\\]"
"\\[n/Y\\]"
"Proceed\\?"
"Allow this command"
"Run bash command"
"Allow bash"
"Would you like"
"Press enter to continue"
"Esc to cancel"
)
for pattern in "${APPROVAL_PATTERNS[@]}"; do
if echo "$LAST_40" | grep -qiE "$pattern"; then
jq -n --arg pattern "$pattern" --arg cmd "$PANE_CMD" \
'{"state":"waiting_approval","reason":"approval pattern: \($pattern)","pane_cmd":$cmd}'
exit 0
fi
done
# --- Check: shell prompt (claude has exited) ---
# If the foreground process is a shell (not claude/node), the agent has exited
case "$PANE_CMD" in
zsh|bash|fish|sh|dash|tcsh|ksh)
jq -n --arg cmd "$PANE_CMD" \
'{"state":"idle","reason":"agent exited — shell prompt active","pane_cmd":$cmd}'
exit 0
;;
esac
# Agent is still running (claude/node/python is the foreground process)
jq -n --arg cmd "$PANE_CMD" \
'{"state":"running","reason":"foreground process: \($cmd)","pane_cmd":$cmd}'
exit 0

View File

@@ -1,24 +0,0 @@
#!/usr/bin/env bash
# find-spare.sh — list worktrees on spare/N branches (free to use)
#
# Usage: find-spare.sh [REPO_ROOT]
# REPO_ROOT defaults to the root worktree containing the current git repo.
#
# Output (stdout): one line per available worktree: "PATH BRANCH"
# e.g.: /Users/me/Code/AutoGPT3 spare/3
set -euo pipefail
REPO_ROOT="${1:-$(git rev-parse --show-toplevel 2>/dev/null || echo "")}"
if [ -z "$REPO_ROOT" ]; then
echo "Error: not inside a git repo and no REPO_ROOT provided" >&2
exit 1
fi
git -C "$REPO_ROOT" worktree list --porcelain \
| awk '
/^worktree / { path = substr($0, 10) }
/^branch / { branch = substr($0, 8); print path " " branch }
' \
| { grep -E " refs/heads/spare/[0-9]+$" || true; } \
| sed 's|refs/heads/||'

View File

@@ -1,40 +0,0 @@
#!/usr/bin/env bash
# notify.sh — send a fleet notification message
#
# Delivery order (first available wins):
# 1. Discord webhook — DISCORD_WEBHOOK_URL env var OR state file .discord_webhook
# 2. macOS notification center — osascript (silent fail if unavailable)
# 3. Stdout only
#
# Usage: notify.sh MESSAGE
# Exit: always 0 (notification failure must not abort the caller)
MESSAGE="${1:-}"
[ -z "$MESSAGE" ] && exit 0
STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
# --- Resolve Discord webhook ---
WEBHOOK="${DISCORD_WEBHOOK_URL:-}"
if [ -z "$WEBHOOK" ] && [ -f "$STATE_FILE" ]; then
WEBHOOK=$(jq -r '.discord_webhook // ""' "$STATE_FILE" 2>/dev/null || echo "")
fi
# --- Discord delivery ---
if [ -n "$WEBHOOK" ]; then
PAYLOAD=$(jq -n --arg msg "$MESSAGE" '{"content": $msg}')
curl -s -X POST "$WEBHOOK" \
-H "Content-Type: application/json" \
-d "$PAYLOAD" > /dev/null 2>&1 || true
fi
# --- macOS notification center (silent if not macOS or osascript missing) ---
if command -v osascript &>/dev/null 2>&1; then
# Escape single quotes for AppleScript
SAFE_MSG=$(echo "$MESSAGE" | sed "s/'/\\\\'/g")
osascript -e "display notification \"${SAFE_MSG}\" with title \"Orchestrator\"" 2>/dev/null || true
fi
# Always print to stdout so run-loop.sh logs it
echo "$MESSAGE"
exit 0

View File

@@ -1,257 +0,0 @@
#!/usr/bin/env bash
# poll-cycle.sh — Single orchestrator poll cycle
#
# Reads ~/.claude/orchestrator-state.json, classifies each agent, updates state,
# and outputs a JSON array of actions for Claude to take.
#
# Usage: poll-cycle.sh
# Output (stdout): JSON array of action objects
# [{ "window": "work:0", "action": "kick|approve|none", "state": "...",
# "worktree": "...", "objective": "...", "reason": "..." }]
#
# The state file is updated in-place (atomic write via .tmp).
set -euo pipefail
STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
SCRIPTS_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
CLASSIFY="$SCRIPTS_DIR/classify-pane.sh"
# Cross-platform md5: always outputs just the hex digest
md5_hash() {
if command -v md5sum &>/dev/null; then
md5sum | awk '{print $1}'
else
md5 | awk '{print $NF}'
fi
}
# Clean up temp file on any exit (avoids stale .tmp if jq write fails)
trap 'rm -f "${STATE_FILE}.tmp"' EXIT
# Ensure state file exists
if [ ! -f "$STATE_FILE" ]; then
echo '{"active":false,"agents":[]}' > "$STATE_FILE"
fi
# Validate JSON upfront before any jq reads that run under set -e.
# A truncated/corrupt file (e.g. from a SIGKILL mid-write) would otherwise
# abort the script at the ACTIVE read below without emitting any JSON output.
if ! jq -e '.' "$STATE_FILE" >/dev/null 2>&1; then
echo "State file parse error — check $STATE_FILE" >&2
echo "[]"
exit 0
fi
ACTIVE=$(jq -r '.active // false' "$STATE_FILE")
if [ "$ACTIVE" != "true" ]; then
echo "[]"
exit 0
fi
NOW=$(date +%s)
IDLE_THRESHOLD=$(jq -r '.idle_threshold_seconds // 300' "$STATE_FILE")
ACTIONS="[]"
UPDATED_AGENTS="[]"
# Read agents as newline-delimited JSON objects.
# jq exits non-zero when .agents[] has no matches on an empty array, which is valid —
# so we suppress that exit code and separately validate the file is well-formed JSON.
if ! AGENTS_JSON=$(jq -e -c '.agents // empty | .[]' "$STATE_FILE" 2>/dev/null); then
if ! jq -e '.' "$STATE_FILE" > /dev/null 2>&1; then
echo "State file parse error — check $STATE_FILE" >&2
fi
echo "[]"
exit 0
fi
if [ -z "$AGENTS_JSON" ]; then
echo "[]"
exit 0
fi
while IFS= read -r agent; do
[ -z "$agent" ] && continue
# Use // "" defaults so a single malformed field doesn't abort the whole cycle
WINDOW=$(echo "$agent" | jq -r '.window // ""')
WORKTREE=$(echo "$agent" | jq -r '.worktree // ""')
OBJECTIVE=$(echo "$agent"| jq -r '.objective // ""')
STATE=$(echo "$agent" | jq -r '.state // "running"')
LAST_HASH=$(echo "$agent"| jq -r '.last_output_hash // ""')
IDLE_SINCE=$(echo "$agent"| jq -r '.idle_since // 0')
REVISION_COUNT=$(echo "$agent"| jq -r '.revision_count // 0')
# Validate window format to prevent tmux target injection.
# Allow session:window (numeric or named) and session:window.pane
if ! [[ "$WINDOW" =~ ^[a-zA-Z0-9_.-]+:[a-zA-Z0-9_.-]+(\.[0-9]+)?$ ]]; then
echo "Skipping agent with invalid window value: $WINDOW" >&2
UPDATED_AGENTS=$(echo "$UPDATED_AGENTS" | jq --argjson a "$agent" '. + [$a]')
continue
fi
# Pass-through terminal-state agents
if [[ "$STATE" == "done" || "$STATE" == "escalated" || "$STATE" == "complete" || "$STATE" == "pending_evaluation" ]]; then
UPDATED_AGENTS=$(echo "$UPDATED_AGENTS" | jq --argjson a "$agent" '. + [$a]')
continue
fi
# Classify pane.
# classify-pane.sh always emits JSON before exit (even on error), so using
# "|| echo '...'" would concatenate two JSON objects when it exits non-zero.
# Use "|| true" inside the substitution so set -euo pipefail does not abort
# the poll cycle when classify exits with a non-zero status code.
CLASSIFICATION=$("$CLASSIFY" "$WINDOW" 2>/dev/null || true)
[ -z "$CLASSIFICATION" ] && CLASSIFICATION='{"state":"error","reason":"classify failed","pane_cmd":"unknown"}'
PANE_STATE=$(echo "$CLASSIFICATION" | jq -r '.state')
PANE_REASON=$(echo "$CLASSIFICATION" | jq -r '.reason')
# Capture full pane output once — used for hash (stuck detection) and checkpoint parsing.
# Use -S -500 to get the last ~500 lines of scrollback so checkpoints aren't missed.
RAW=$(tmux capture-pane -t "$WINDOW" -p -S -500 2>/dev/null || echo "")
# --- Checkpoint tracking ---
# Parse any "CHECKPOINT:<step>" lines the agent has output and merge into state file.
# The agent writes these as it completes each required step so verify-complete.sh can gate recycling.
EXISTING_CPS=$(echo "$agent" | jq -c '.checkpoints // []')
NEW_CHECKPOINTS_JSON="$EXISTING_CPS"
if [ -n "$RAW" ]; then
FOUND_CPS=$(echo "$RAW" \
| grep -oE "CHECKPOINT:[a-zA-Z0-9_-]+" \
| sed 's/CHECKPOINT://' \
| sort -u \
| jq -R . | jq -s . 2>/dev/null || echo "[]")
NEW_CHECKPOINTS_JSON=$(jq -n \
--argjson existing "$EXISTING_CPS" \
--argjson found "$FOUND_CPS" \
'($existing + $found) | unique' 2>/dev/null || echo "$EXISTING_CPS")
fi
# Compute content hash for stuck-detection (only for running agents)
CURRENT_HASH=""
if [[ "$PANE_STATE" == "running" ]] && [ -n "$RAW" ]; then
CURRENT_HASH=$(echo "$RAW" | tail -20 | md5_hash)
fi
NEW_STATE="$STATE"
NEW_IDLE_SINCE="$IDLE_SINCE"
NEW_REVISION_COUNT="$REVISION_COUNT"
ACTION="none"
REASON="$PANE_REASON"
case "$PANE_STATE" in
complete)
# Agent output ORCHESTRATOR:DONE — mark pending_evaluation so orchestrator handles it.
# run-loop does NOT verify or notify; orchestrator's background poll picks this up.
NEW_STATE="pending_evaluation"
ACTION="complete" # run-loop logs it but takes no action
;;
waiting_approval)
NEW_STATE="waiting_approval"
ACTION="approve"
;;
idle)
# Agent process has exited — needs restart
NEW_STATE="idle"
ACTION="kick"
REASON="agent exited (shell is foreground)"
NEW_REVISION_COUNT=$(( REVISION_COUNT + 1 ))
NEW_IDLE_SINCE=$NOW
if [ "$NEW_REVISION_COUNT" -ge 3 ]; then
NEW_STATE="escalated"
ACTION="none"
REASON="escalated after ${NEW_REVISION_COUNT} kicks — needs human attention"
fi
;;
running)
# Clear idle_since only when transitioning from idle (agent was kicked and
# restarted). Do NOT reset for stuck — idle_since must persist across polls
# so STUCK_DURATION can accumulate and trigger escalation.
# Also update the local IDLE_SINCE so the hash-stability check below uses
# the reset value on this same poll, not the stale kick timestamp.
if [[ "$STATE" == "idle" ]]; then
NEW_IDLE_SINCE=0
IDLE_SINCE=0
fi
# Check if hash has been stable (agent may be stuck mid-task)
if [ -n "$CURRENT_HASH" ] && [ "$CURRENT_HASH" = "$LAST_HASH" ] && [ "$LAST_HASH" != "" ]; then
if [ "$IDLE_SINCE" = "0" ] || [ "$IDLE_SINCE" = "null" ]; then
NEW_IDLE_SINCE=$NOW
else
STUCK_DURATION=$(( NOW - IDLE_SINCE ))
if [ "$STUCK_DURATION" -gt "$IDLE_THRESHOLD" ]; then
NEW_REVISION_COUNT=$(( REVISION_COUNT + 1 ))
NEW_IDLE_SINCE=$NOW
if [ "$NEW_REVISION_COUNT" -ge 3 ]; then
NEW_STATE="escalated"
ACTION="none"
REASON="escalated after ${NEW_REVISION_COUNT} kicks — needs human attention"
else
NEW_STATE="stuck"
ACTION="kick"
REASON="output unchanged for ${STUCK_DURATION}s (threshold: ${IDLE_THRESHOLD}s)"
fi
fi
fi
else
# Only reset the idle timer when we have a valid hash comparison (pane
# capture succeeded). If CURRENT_HASH is empty (tmux capture-pane failed),
# preserve existing timers so stuck detection is not inadvertently reset.
if [ -n "$CURRENT_HASH" ]; then
NEW_STATE="running"
NEW_IDLE_SINCE=0
fi
fi
;;
error)
REASON="classify error: $PANE_REASON"
;;
esac
# Build updated agent record (ensure idle_since and revision_count are numeric)
# Use || true on each jq call so a malformed field skips this agent rather than
# aborting the entire poll cycle under set -e.
UPDATED_AGENT=$(echo "$agent" | jq \
--arg state "$NEW_STATE" \
--arg hash "$CURRENT_HASH" \
--argjson now "$NOW" \
--arg idle_since "$NEW_IDLE_SINCE" \
--arg revision_count "$NEW_REVISION_COUNT" \
--argjson checkpoints "$NEW_CHECKPOINTS_JSON" \
'.state = $state
| .last_output_hash = (if $hash == "" then .last_output_hash else $hash end)
| .last_seen_at = $now
| .idle_since = ($idle_since | tonumber)
| .revision_count = ($revision_count | tonumber)
| .checkpoints = $checkpoints' 2>/dev/null) || {
echo "Warning: failed to build updated agent for window $WINDOW — keeping original" >&2
UPDATED_AGENTS=$(echo "$UPDATED_AGENTS" | jq --argjson a "$agent" '. + [$a]')
continue
}
UPDATED_AGENTS=$(echo "$UPDATED_AGENTS" | jq --argjson a "$UPDATED_AGENT" '. + [$a]')
# Add action if needed
if [ "$ACTION" != "none" ]; then
ACTION_OBJ=$(jq -n \
--arg window "$WINDOW" \
--arg action "$ACTION" \
--arg state "$NEW_STATE" \
--arg worktree "$WORKTREE" \
--arg objective "$OBJECTIVE" \
--arg reason "$REASON" \
'{window:$window, action:$action, state:$state, worktree:$worktree, objective:$objective, reason:$reason}')
ACTIONS=$(echo "$ACTIONS" | jq --argjson a "$ACTION_OBJ" '. + [$a]')
fi
done <<< "$AGENTS_JSON"
# Atomic state file update
jq --argjson agents "$UPDATED_AGENTS" \
--argjson now "$NOW" \
'.agents = $agents | .last_poll_at = $now' \
"$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
echo "$ACTIONS"

View File

@@ -1,32 +0,0 @@
#!/usr/bin/env bash
# recycle-agent.sh — kill a tmux window and restore the worktree to its spare branch
#
# Usage: recycle-agent.sh WINDOW WORKTREE_PATH SPARE_BRANCH
# WINDOW — tmux target, e.g. autogpt1:3
# WORKTREE_PATH — absolute path to the git worktree
# SPARE_BRANCH — branch to restore, e.g. spare/6
#
# Stdout: one status line
set -euo pipefail
if [ $# -lt 3 ]; then
echo "Usage: recycle-agent.sh WINDOW WORKTREE_PATH SPARE_BRANCH" >&2
exit 1
fi
WINDOW="$1"
WORKTREE_PATH="$2"
SPARE_BRANCH="$3"
# Kill the tmux window (ignore error — may already be gone)
tmux kill-window -t "$WINDOW" 2>/dev/null || true
# Restore to spare branch: abort any in-progress operation, then clean
git -C "$WORKTREE_PATH" rebase --abort 2>/dev/null || true
git -C "$WORKTREE_PATH" merge --abort 2>/dev/null || true
git -C "$WORKTREE_PATH" reset --hard HEAD 2>/dev/null
git -C "$WORKTREE_PATH" clean -fd 2>/dev/null
git -C "$WORKTREE_PATH" checkout "$SPARE_BRANCH"
echo "Recycled: $(basename "$WORKTREE_PATH")$SPARE_BRANCH (window $WINDOW closed)"

View File

@@ -1,164 +0,0 @@
#!/usr/bin/env bash
# run-loop.sh — Mechanical babysitter for the agent fleet (runs in its own tmux window)
#
# Handles ONLY two things that need no intelligence:
# idle → restart claude using --resume SESSION_ID (or --continue) to restore context
# approve → auto-approve safe dialogs, press Enter on numbered-option dialogs
#
# Everything else — ORCHESTRATOR:DONE, verification, /pr-test, final evaluation,
# marking done, deciding to close windows — is the orchestrating Claude's job.
# poll-cycle.sh sets state to pending_evaluation when ORCHESTRATOR:DONE is detected;
# the orchestrator's background poll loop handles it from there.
#
# Usage: run-loop.sh
# Env: POLL_INTERVAL (default: 30), ORCHESTRATOR_STATE_FILE
set -euo pipefail
# Copy scripts to a stable location outside the repo so they survive branch
# checkouts (e.g. recycle-agent.sh switching spare/N back into this worktree
# would wipe .claude/skills/orchestrate/scripts if the skill only exists on the
# current branch).
_ORIGIN_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
STABLE_SCRIPTS_DIR="$HOME/.claude/orchestrator/scripts"
mkdir -p "$STABLE_SCRIPTS_DIR"
cp "$_ORIGIN_DIR"/*.sh "$STABLE_SCRIPTS_DIR/"
chmod +x "$STABLE_SCRIPTS_DIR"/*.sh
SCRIPTS_DIR="$STABLE_SCRIPTS_DIR"
STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
POLL_INTERVAL="${POLL_INTERVAL:-30}"
# ---------------------------------------------------------------------------
# update_state WINDOW FIELD VALUE
# ---------------------------------------------------------------------------
update_state() {
local window="$1" field="$2" value="$3"
jq --arg w "$window" --arg f "$field" --arg v "$value" \
'.agents |= map(if .window == $w then .[$f] = $v else . end)' \
"$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
}
update_state_int() {
local window="$1" field="$2" value="$3"
jq --arg w "$window" --arg f "$field" --argjson v "$value" \
'.agents |= map(if .window == $w then .[$f] = $v else . end)' \
"$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
}
agent_field() {
jq -r --arg w "$1" --arg f "$2" \
'.agents[] | select(.window == $w) | .[$f] // ""' \
"$STATE_FILE" 2>/dev/null
}
# ---------------------------------------------------------------------------
# wait_for_prompt WINDOW — wait up to 60s for Claude's prompt
# ---------------------------------------------------------------------------
wait_for_prompt() {
local window="$1"
for i in $(seq 1 60); do
local cmd pane
cmd=$(tmux display-message -t "$window" -p '#{pane_current_command}' 2>/dev/null || echo "")
pane=$(tmux capture-pane -t "$window" -p 2>/dev/null || echo "")
if echo "$pane" | grep -q "Enter to confirm"; then
tmux send-keys -t "$window" Down Enter; sleep 2; continue
fi
[[ "$cmd" == "node" ]] && echo "$pane" | grep -q "" && return 0
sleep 1
done
return 1 # timed out
}
# ---------------------------------------------------------------------------
# handle_kick WINDOW STATE — only for idle (crashed) agents, not stuck
# ---------------------------------------------------------------------------
handle_kick() {
local window="$1" state="$2"
[[ "$state" != "idle" ]] && return # stuck agents handled by supervisor
local worktree_path session_id
worktree_path=$(agent_field "$window" "worktree_path")
session_id=$(agent_field "$window" "session_id")
echo "[$(date +%H:%M:%S)] KICK restart $window — agent exited, resuming session"
# Resume the exact session so the agent retains full context — no need to re-send objective
if [ -n "$session_id" ]; then
tmux send-keys -t "$window" "cd '${worktree_path}' && claude --resume '${session_id}' --permission-mode bypassPermissions" Enter
else
tmux send-keys -t "$window" "cd '${worktree_path}' && claude --continue --permission-mode bypassPermissions" Enter
fi
wait_for_prompt "$window" || echo "[$(date +%H:%M:%S)] KICK WARNING $window — timed out waiting for "
}
# ---------------------------------------------------------------------------
# handle_approve WINDOW — auto-approve dialogs that need no judgment
# ---------------------------------------------------------------------------
handle_approve() {
local window="$1"
local pane_tail
pane_tail=$(tmux capture-pane -t "$window" -p 2>/dev/null | tail -3 || echo "")
# Settings error dialog at startup
if echo "$pane_tail" | grep -q "Enter to confirm"; then
echo "[$(date +%H:%M:%S)] APPROVE dialog $window — settings error"
tmux send-keys -t "$window" Down Enter
return
fi
# Numbered-option dialog (e.g. "Do you want to make this edit?")
# is already on option 1 (Yes) — Enter confirms it
if echo "$pane_tail" | grep -qE "\s*1\." || echo "$pane_tail" | grep -q "Esc to cancel"; then
echo "[$(date +%H:%M:%S)] APPROVE edit $window"
tmux send-keys -t "$window" "" Enter
return
fi
# y/n prompt for safe operations
if echo "$pane_tail" | grep -qiE "(^git |^npm |^pnpm |^poetry |^pytest|^docker |^make |^cargo |^pip |^yarn |curl .*(localhost|127\.0\.0\.1))"; then
echo "[$(date +%H:%M:%S)] APPROVE safe $window"
tmux send-keys -t "$window" "y" Enter
return
fi
# Anything else — supervisor handles it, just log
echo "[$(date +%H:%M:%S)] APPROVE skip $window — unknown dialog, supervisor will handle"
}
# ---------------------------------------------------------------------------
# Main loop
# ---------------------------------------------------------------------------
echo "[$(date +%H:%M:%S)] run-loop started (mechanical only, poll every ${POLL_INTERVAL}s)"
echo "[$(date +%H:%M:%S)] Supervisor: orchestrating Claude session (not a separate window)"
echo "---"
while true; do
if ! jq -e '.active == true' "$STATE_FILE" >/dev/null 2>&1; then
echo "[$(date +%H:%M:%S)] active=false — exiting."
exit 0
fi
ACTIONS=$("$SCRIPTS_DIR/poll-cycle.sh" 2>/dev/null || echo "[]")
KICKED=0; DONE=0
while IFS= read -r action; do
[ -z "$action" ] && continue
WINDOW=$(echo "$action" | jq -r '.window // ""')
ACTION=$(echo "$action" | jq -r '.action // ""')
STATE=$(echo "$action" | jq -r '.state // ""')
case "$ACTION" in
kick) handle_kick "$WINDOW" "$STATE" || true; KICKED=$(( KICKED + 1 )) ;;
approve) handle_approve "$WINDOW" || true ;;
complete) DONE=$(( DONE + 1 )) ;; # poll-cycle already set state=pending_evaluation; orchestrator handles
esac
done < <(echo "$ACTIONS" | jq -c '.[]' 2>/dev/null || true)
RUNNING=$(jq '[.agents[] | select(.state | test("running|stuck|waiting_approval|idle"))] | length' \
"$STATE_FILE" 2>/dev/null || echo 0)
echo "[$(date +%H:%M:%S)] Poll — ${RUNNING} running ${KICKED} kicked ${DONE} recycled"
sleep "$POLL_INTERVAL"
done

View File

@@ -1,122 +0,0 @@
#!/usr/bin/env bash
# spawn-agent.sh — create tmux window, checkout branch, launch claude, send task
#
# Usage: spawn-agent.sh SESSION WORKTREE_PATH SPARE_BRANCH NEW_BRANCH OBJECTIVE [PR_NUMBER] [STEPS...]
# SESSION — tmux session name, e.g. autogpt1
# WORKTREE_PATH — absolute path to the git worktree
# SPARE_BRANCH — spare branch being replaced, e.g. spare/6 (saved for recycle)
# NEW_BRANCH — task branch to create, e.g. feat/my-feature
# OBJECTIVE — task description sent to the agent
# PR_NUMBER — (optional) GitHub PR number for completion verification
# STEPS... — (optional) required checkpoint names, e.g. pr-address pr-test
#
# Stdout: SESSION:WINDOW_INDEX (nothing else — callers rely on this)
# Exit non-zero on failure.
set -euo pipefail
if [ $# -lt 5 ]; then
echo "Usage: spawn-agent.sh SESSION WORKTREE_PATH SPARE_BRANCH NEW_BRANCH OBJECTIVE [PR_NUMBER] [STEPS...]" >&2
exit 1
fi
SESSION="$1"
WORKTREE_PATH="$2"
SPARE_BRANCH="$3"
NEW_BRANCH="$4"
OBJECTIVE="$5"
PR_NUMBER="${6:-}"
STEPS=("${@:7}")
WORKTREE_NAME=$(basename "$WORKTREE_PATH")
STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
# Generate a stable session ID so this agent's Claude session can always be resumed:
# claude --resume $SESSION_ID --permission-mode bypassPermissions
SESSION_ID=$(uuidgen 2>/dev/null || python3 -c "import uuid; print(uuid.uuid4())")
# Create (or switch to) the task branch
git -C "$WORKTREE_PATH" checkout -b "$NEW_BRANCH" 2>/dev/null \
|| git -C "$WORKTREE_PATH" checkout "$NEW_BRANCH"
# Open a new named tmux window; capture its numeric index
WIN_IDX=$(tmux new-window -t "$SESSION" -n "$WORKTREE_NAME" -P -F '#{window_index}')
WINDOW="${SESSION}:${WIN_IDX}"
# Append the initial agent record to the state file so subsequent jq updates find it.
# This must happen before the pr_number/steps update below.
if [ -f "$STATE_FILE" ]; then
NOW=$(date +%s)
jq --arg window "$WINDOW" \
--arg worktree "$WORKTREE_NAME" \
--arg worktree_path "$WORKTREE_PATH" \
--arg spare_branch "$SPARE_BRANCH" \
--arg branch "$NEW_BRANCH" \
--arg objective "$OBJECTIVE" \
--arg session_id "$SESSION_ID" \
--argjson now "$NOW" \
'.agents += [{
"window": $window,
"worktree": $worktree,
"worktree_path": $worktree_path,
"spare_branch": $spare_branch,
"branch": $branch,
"objective": $objective,
"session_id": $session_id,
"state": "running",
"checkpoints": [],
"last_output_hash": "",
"last_seen_at": $now,
"spawned_at": $now,
"idle_since": 0,
"revision_count": 0,
"last_rebriefed_at": 0
}]' "$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
fi
# Store pr_number + steps in state file if provided (enables verify-complete.sh).
# The agent record was appended above so the jq select now finds it.
if [ -n "$PR_NUMBER" ] && [ -f "$STATE_FILE" ]; then
if [ "${#STEPS[@]}" -gt 0 ]; then
STEPS_JSON=$(printf '%s\n' "${STEPS[@]}" | jq -R . | jq -s .)
else
STEPS_JSON='[]'
fi
jq --arg w "$WINDOW" --arg pr "$PR_NUMBER" --argjson steps "$STEPS_JSON" \
'.agents |= map(if .window == $w then . + {pr_number: $pr, steps: $steps, checkpoints: []} else . end)' \
"$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
fi
# Launch claude with a stable session ID so it can always be resumed after a crash:
# claude --resume SESSION_ID --permission-mode bypassPermissions
tmux send-keys -t "$WINDOW" "cd '${WORKTREE_PATH}' && claude --permission-mode bypassPermissions --session-id '${SESSION_ID}'" Enter
# Wait up to 60s for claude to be fully interactive:
# both pane_current_command == 'node' AND the '' prompt is visible.
PROMPT_FOUND=false
for i in $(seq 1 60); do
CMD=$(tmux display-message -t "$WINDOW" -p '#{pane_current_command}' 2>/dev/null || echo "")
PANE=$(tmux capture-pane -t "$WINDOW" -p 2>/dev/null || echo "")
if echo "$PANE" | grep -q "Enter to confirm"; then
tmux send-keys -t "$WINDOW" Down Enter
sleep 2
continue
fi
if [[ "$CMD" == "node" ]] && echo "$PANE" | grep -q ""; then
PROMPT_FOUND=true
break
fi
sleep 1
done
if ! $PROMPT_FOUND; then
echo "[spawn-agent] WARNING: timed out waiting for prompt on $WINDOW — sending objective anyway" >&2
fi
# Send the task. Split text and Enter — if combined, Enter can fire before the string
# is fully buffered, leaving the message stuck as "[Pasted text +N lines]" unsent.
tmux send-keys -t "$WINDOW" "${OBJECTIVE} Output each completed step as CHECKPOINT:<step-name>. When ALL steps are done, output ORCHESTRATOR:DONE on its own line."
sleep 0.3
tmux send-keys -t "$WINDOW" Enter
# Only output the window address — nothing else (callers parse this)
echo "$WINDOW"

View File

@@ -1,43 +0,0 @@
#!/usr/bin/env bash
# status.sh — print orchestrator status: state file summary + live tmux pane commands
#
# Usage: status.sh
# Reads: ~/.claude/orchestrator-state.json
set -euo pipefail
STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
if [ ! -f "$STATE_FILE" ] || ! jq -e '.' "$STATE_FILE" >/dev/null 2>&1; then
echo "No orchestrator state found at $STATE_FILE"
exit 0
fi
# Header: active status, session, thresholds, last poll
jq -r '
"=== Orchestrator [\(if .active then "RUNNING" else "STOPPED" end)] ===",
"Session: \(.tmux_session // "unknown") | Idle threshold: \(.idle_threshold_seconds // 300)s",
"Last poll: \(if (.last_poll_at // 0) == 0 then "never" else (.last_poll_at | strftime("%H:%M:%S")) end)",
""
' "$STATE_FILE"
# Each agent: state, window, worktree/branch, truncated objective
AGENT_COUNT=$(jq '.agents | length' "$STATE_FILE")
if [ "$AGENT_COUNT" -eq 0 ]; then
echo " (no agents registered)"
else
jq -r '
.agents[] |
" [\(.state | ascii_upcase)] \(.window) \(.worktree)/\(.branch)",
" \(.objective // "" | .[0:70])"
' "$STATE_FILE"
fi
echo ""
# Live pane_current_command for non-done agents
while IFS= read -r WINDOW; do
[ -z "$WINDOW" ] && continue
CMD=$(tmux display-message -t "$WINDOW" -p '#{pane_current_command}' 2>/dev/null || echo "unreachable")
echo " $WINDOW live: $CMD"
done < <(jq -r '.agents[] | select(.state != "done") | .window' "$STATE_FILE" 2>/dev/null || true)

View File

@@ -1,180 +0,0 @@
#!/usr/bin/env bash
# verify-complete.sh — verify a PR task is truly done before marking the agent done
#
# Check order matters:
# 1. Checkpoints — did the agent do all required steps?
# 2. CI complete — no pending (bots post comments AFTER their check runs, must wait)
# 3. CI passing — no failures (agent must fix before done)
# 4. spawned_at — a new CI run was triggered after agent spawned (proves real work)
# 5. Unresolved threads — checked AFTER CI so bot-posted comments are included
# 6. CHANGES_REQUESTED — checked AFTER CI so bot reviews are included
#
# Usage: verify-complete.sh WINDOW
# Exit 0 = verified complete; exit 1 = not complete (stderr has reason)
set -euo pipefail
WINDOW="$1"
STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
PR_NUMBER=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .pr_number // ""' "$STATE_FILE" 2>/dev/null)
STEPS=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .steps // [] | .[]' "$STATE_FILE" 2>/dev/null || true)
CHECKPOINTS=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .checkpoints // [] | .[]' "$STATE_FILE" 2>/dev/null || true)
WORKTREE_PATH=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .worktree_path // ""' "$STATE_FILE" 2>/dev/null)
BRANCH=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .branch // ""' "$STATE_FILE" 2>/dev/null)
SPAWNED_AT=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .spawned_at // "0"' "$STATE_FILE" 2>/dev/null || echo "0")
# No PR number = cannot verify
if [ -z "$PR_NUMBER" ]; then
echo "NOT COMPLETE: no pr_number in state — set pr_number or mark done manually" >&2
exit 1
fi
# --- Check 1: all required steps are checkpointed ---
MISSING=""
while IFS= read -r step; do
[ -z "$step" ] && continue
if ! echo "$CHECKPOINTS" | grep -qFx "$step"; then
MISSING="$MISSING $step"
fi
done <<< "$STEPS"
if [ -n "$MISSING" ]; then
echo "NOT COMPLETE: missing checkpoints:$MISSING on PR #$PR_NUMBER" >&2
exit 1
fi
# Resolve repo for all GitHub checks below
REPO=$(jq -r '.repo // ""' "$STATE_FILE" 2>/dev/null || echo "")
if [ -z "$REPO" ] && [ -n "$WORKTREE_PATH" ] && [ -d "$WORKTREE_PATH" ]; then
REPO=$(git -C "$WORKTREE_PATH" remote get-url origin 2>/dev/null \
| sed 's|.*github\.com[:/]||; s|\.git$||' || echo "")
fi
if [ -z "$REPO" ]; then
echo "Warning: cannot resolve repo — skipping CI/thread checks" >&2
echo "VERIFIED: PR #$PR_NUMBER — checkpoints ✓ (CI/thread checks skipped — no repo)"
exit 0
fi
CI_BUCKETS=$(gh pr checks "$PR_NUMBER" --repo "$REPO" --json bucket 2>/dev/null || echo "[]")
# --- Check 2: CI fully complete — no pending checks ---
# Pending checks MUST finish before we check threads/reviews:
# bots (Seer, Check PR Status, etc.) post comments and CHANGES_REQUESTED AFTER their CI check runs.
PENDING=$(echo "$CI_BUCKETS" | jq '[.[] | select(.bucket == "pending")] | length' 2>/dev/null || echo "0")
if [ "$PENDING" -gt 0 ]; then
PENDING_NAMES=$(gh pr checks "$PR_NUMBER" --repo "$REPO" --json bucket,name 2>/dev/null \
| jq -r '[.[] | select(.bucket == "pending") | .name] | join(", ")' 2>/dev/null || echo "unknown")
echo "NOT COMPLETE: $PENDING CI checks still pending on PR #$PR_NUMBER ($PENDING_NAMES)" >&2
exit 1
fi
# --- Check 3: CI passing — no failures ---
FAILING=$(echo "$CI_BUCKETS" | jq '[.[] | select(.bucket == "fail")] | length' 2>/dev/null || echo "0")
if [ "$FAILING" -gt 0 ]; then
FAILING_NAMES=$(gh pr checks "$PR_NUMBER" --repo "$REPO" --json bucket,name 2>/dev/null \
| jq -r '[.[] | select(.bucket == "fail") | .name] | join(", ")' 2>/dev/null || echo "unknown")
echo "NOT COMPLETE: $FAILING failing CI checks on PR #$PR_NUMBER ($FAILING_NAMES)" >&2
exit 1
fi
# --- Check 4: a new CI run was triggered AFTER the agent spawned ---
if [ -n "$BRANCH" ] && [ "${SPAWNED_AT:-0}" -gt 0 ]; then
LATEST_RUN_AT=$(gh run list --repo "$REPO" --branch "$BRANCH" \
--json createdAt --limit 1 2>/dev/null | jq -r '.[0].createdAt // ""')
if [ -n "$LATEST_RUN_AT" ]; then
if date --version >/dev/null 2>&1; then
LATEST_RUN_EPOCH=$(date -d "$LATEST_RUN_AT" "+%s" 2>/dev/null || echo "0")
else
LATEST_RUN_EPOCH=$(TZ=UTC date -j -f "%Y-%m-%dT%H:%M:%SZ" "$LATEST_RUN_AT" "+%s" 2>/dev/null || echo "0")
fi
if [ "$LATEST_RUN_EPOCH" -le "$SPAWNED_AT" ]; then
echo "NOT COMPLETE: latest CI run on $BRANCH predates agent spawn — agent may not have pushed yet" >&2
exit 1
fi
fi
fi
OWNER=$(echo "$REPO" | cut -d/ -f1)
REPONAME=$(echo "$REPO" | cut -d/ -f2)
# --- Check 5: no unresolved review threads (checked AFTER CI — bots post after their check) ---
UNRESOLVED=$(gh api graphql -f query="
{ repository(owner: \"${OWNER}\", name: \"${REPONAME}\") {
pullRequest(number: ${PR_NUMBER}) {
reviewThreads(first: 50) { nodes { isResolved } }
}
}
}
" --jq '[.data.repository.pullRequest.reviewThreads.nodes[] | select(.isResolved == false)] | length' 2>/dev/null || echo "0")
if [ "$UNRESOLVED" -gt 0 ]; then
echo "NOT COMPLETE: $UNRESOLVED unresolved review threads on PR #$PR_NUMBER" >&2
exit 1
fi
# --- Check 6: no CHANGES_REQUESTED (checked AFTER CI — bots post reviews after their check) ---
# A CHANGES_REQUESTED review is stale if the latest commit was pushed AFTER the review was submitted.
# Stale reviews (pre-dating the fixing commits) should not block verification.
#
# Fetch commits and latestReviews in a single call and fail closed — if gh fails,
# treat that as NOT COMPLETE rather than silently passing.
# Use latestReviews (not reviews) so each reviewer's latest state is used — superseded
# CHANGES_REQUESTED entries are automatically excluded when the reviewer later approved.
# Note: we intentionally use committedDate (not PR updatedAt) because updatedAt changes on any
# PR activity (bot comments, label changes) which would create false negatives.
PR_REVIEW_METADATA=$(gh pr view "$PR_NUMBER" --repo "$REPO" \
--json commits,latestReviews 2>/dev/null) || {
echo "NOT COMPLETE: unable to fetch PR review metadata for PR #$PR_NUMBER" >&2
exit 1
}
LATEST_COMMIT_DATE=$(jq -r '.commits[-1].committedDate // ""' <<< "$PR_REVIEW_METADATA")
CHANGES_REQUESTED_REVIEWS=$(jq '[.latestReviews[]? | select(.state == "CHANGES_REQUESTED")]' <<< "$PR_REVIEW_METADATA")
BLOCKING_CHANGES_REQUESTED=0
BLOCKING_REQUESTERS=""
if [ -n "$LATEST_COMMIT_DATE" ] && [ "$(echo "$CHANGES_REQUESTED_REVIEWS" | jq length)" -gt 0 ]; then
if date --version >/dev/null 2>&1; then
LATEST_COMMIT_EPOCH=$(date -d "$LATEST_COMMIT_DATE" "+%s" 2>/dev/null || echo "0")
else
LATEST_COMMIT_EPOCH=$(TZ=UTC date -j -f "%Y-%m-%dT%H:%M:%SZ" "$LATEST_COMMIT_DATE" "+%s" 2>/dev/null || echo "0")
fi
while IFS= read -r review; do
[ -z "$review" ] && continue
REVIEW_DATE=$(echo "$review" | jq -r '.submittedAt // ""')
REVIEWER=$(echo "$review" | jq -r '.author.login // "unknown"')
if [ -z "$REVIEW_DATE" ]; then
# No submission date — treat as fresh (conservative: blocks verification)
BLOCKING_CHANGES_REQUESTED=$(( BLOCKING_CHANGES_REQUESTED + 1 ))
BLOCKING_REQUESTERS="${BLOCKING_REQUESTERS:+$BLOCKING_REQUESTERS, }${REVIEWER}"
else
if date --version >/dev/null 2>&1; then
REVIEW_EPOCH=$(date -d "$REVIEW_DATE" "+%s" 2>/dev/null || echo "0")
else
REVIEW_EPOCH=$(TZ=UTC date -j -f "%Y-%m-%dT%H:%M:%SZ" "$REVIEW_DATE" "+%s" 2>/dev/null || echo "0")
fi
if [ "$REVIEW_EPOCH" -gt "$LATEST_COMMIT_EPOCH" ]; then
# Review was submitted AFTER latest commit — still fresh, blocks verification
BLOCKING_CHANGES_REQUESTED=$(( BLOCKING_CHANGES_REQUESTED + 1 ))
BLOCKING_REQUESTERS="${BLOCKING_REQUESTERS:+$BLOCKING_REQUESTERS, }${REVIEWER}"
fi
# Review submitted BEFORE latest commit — stale, skip
fi
done <<< "$(echo "$CHANGES_REQUESTED_REVIEWS" | jq -c '.[]')"
else
# No commit date or no changes_requested — check raw count as fallback
BLOCKING_CHANGES_REQUESTED=$(echo "$CHANGES_REQUESTED_REVIEWS" | jq length 2>/dev/null || echo "0")
BLOCKING_REQUESTERS=$(echo "$CHANGES_REQUESTED_REVIEWS" | jq -r '[.[].author.login] | join(", ")' 2>/dev/null || echo "unknown")
fi
if [ "$BLOCKING_CHANGES_REQUESTED" -gt 0 ]; then
echo "NOT COMPLETE: CHANGES_REQUESTED (after latest commit) from ${BLOCKING_REQUESTERS} on PR #$PR_NUMBER" >&2
exit 1
fi
echo "VERIFIED: PR #$PR_NUMBER — checkpoints ✓, CI complete + green, 0 unresolved threads, no CHANGES_REQUESTED"
exit 0

View File

@@ -90,34 +90,10 @@ Address comments **one at a time**: fix → commit → push → inline reply →
2. Commit and push the fix
3. Reply **inline** (not as a new top-level comment) referencing the fixing commit — this is what resolves the conversation for bot reviewers (coderabbitai, sentry):
Use a **markdown commit link** so GitHub renders it as a clickable reference. Get the full SHA with `git rev-parse HEAD` after committing:
| Comment type | How to reply |
|---|---|
| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="🤖 Fixed in [abc1234](https://github.com/Significant-Gravitas/AutoGPT/commit/FULL_SHA): <description>"` |
| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="🤖 Fixed in [abc1234](https://github.com/Significant-Gravitas/AutoGPT/commit/FULL_SHA): <description>"` |
## Codecov coverage
Codecov patch target is **80%** on changed lines. Checks are **informational** (not blocking) but should be green.
### Running coverage locally
**Backend** (from `autogpt_platform/backend/`):
```bash
poetry run pytest -s -vv --cov=backend --cov-branch --cov-report term-missing
```
**Frontend** (from `autogpt_platform/frontend/`):
```bash
pnpm vitest run --coverage
```
### When codecov/patch fails
1. Find uncovered files: `git diff --name-only $(gh pr view --json baseRefName --jq '.baseRefName')...HEAD`
2. For each uncovered file — extract inline logic to `helpers.ts`/`helpers.py` and test those (highest ROI). Colocate tests as `*_test.py` (backend) or `__tests__/*.test.ts` (frontend).
3. Run coverage locally to verify, commit, push.
| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="🤖 Fixed in <commit-sha>: <description>"` |
| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="🤖 Fixed in <commit-sha>: <description>"` |
## Format and commit

View File

@@ -547,8 +547,6 @@ Upload screenshots to the PR using the GitHub Git API (no local git operations
**This step is MANDATORY. Every test run MUST post a PR comment with screenshots. No exceptions.**
**CRITICAL — NEVER post a bare directory link like `https://github.com/.../tree/...`.** Every screenshot MUST appear as `![name](raw_url)` inline in the PR comment so reviewers can see them without clicking any links. After posting, the verification step below greps the comment for `![` tags and exits 1 if none are found — the test run is considered incomplete until this passes.
```bash
# Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
REPO="Significant-Gravitas/AutoGPT"
@@ -586,27 +584,15 @@ TREE_JSON+=']'
# Step 2: Create tree, commit, and branch ref
TREE_SHA=$(echo "$TREE_JSON" | jq -c '{tree: .}' | gh api "repos/${REPO}/git/trees" --input - --jq '.sha')
# Resolve parent commit so screenshots are chained, not orphan root commits
PARENT_SHA=$(gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" --jq '.object.sha' 2>/dev/null || echo "")
if [ -n "$PARENT_SHA" ]; then
COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
-f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
-f tree="$TREE_SHA" \
-f "parents[]=$PARENT_SHA" \
--jq '.sha')
else
COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
-f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
-f tree="$TREE_SHA" \
--jq '.sha')
fi
COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
-f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
-f tree="$TREE_SHA" \
--jq '.sha')
gh api "repos/${REPO}/git/refs" \
-f ref="refs/heads/${SCREENSHOTS_BRANCH}" \
-f sha="$COMMIT_SHA" 2>/dev/null \
|| gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" \
-X PATCH -f sha="$COMMIT_SHA" -F force=true
-X PATCH -f sha="$COMMIT_SHA" -f force=true
```
Then post the comment with **inline images AND explanations for each screenshot**:
@@ -672,15 +658,6 @@ INNEREOF
gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE"
rm -f "$COMMENT_FILE"
# Verify the posted comment contains inline images — exit 1 if none found
# Use separate --paginate + jq pipe: --jq applies per-page, not to the full list
LAST_COMMENT=$(gh api "repos/${REPO}/issues/$PR_NUMBER/comments" --paginate 2>/dev/null | jq -r '.[-1].body // ""')
if ! echo "$LAST_COMMENT" | grep -q '!\['; then
echo "ERROR: Posted comment contains no inline images (![). Bare directory links are not acceptable." >&2
exit 1
fi
echo "✓ Inline images verified in posted comment"
```
**The PR comment MUST include:**
@@ -690,103 +667,6 @@ echo "✓ Inline images verified in posted comment"
This approach uses the GitHub Git API to create blobs, trees, commits, and refs entirely server-side. No local `git checkout` or `git push` — safe for worktrees and won't interfere with the PR branch.
## Step 8: Evaluate and post a formal PR review
After the test comment is posted, evaluate whether the run was thorough enough to make a merge decision, then post a formal GitHub review (approve or request changes). **This step is mandatory — every test run MUST end with a formal review decision.**
### Evaluation criteria
Re-read the PR description:
```bash
gh pr view "$PR_NUMBER" --json body --jq '.body' --repo "$REPO"
```
Score the run against each criterion:
| Criterion | Pass condition |
|-----------|---------------|
| **Coverage** | Every feature/change described in the PR has at least one test scenario |
| **All scenarios pass** | No FAIL rows in the results table |
| **Negative tests** | At least one failure-path test per feature (invalid input, unauthorized, edge case) |
| **Before/after evidence** | Every state-changing API call has before/after values logged |
| **Screenshots are meaningful** | Screenshots show the actual state change, not just a loading spinner or blank page |
| **No regressions** | Existing core flows (login, agent create/run) still work |
### Decision logic
```
ALL criteria pass → APPROVE
Any scenario FAIL or missing PR feature → REQUEST_CHANGES (list gaps)
Evidence weak (no before/after, vague shots) → REQUEST_CHANGES (list what's missing)
```
### Post the review
```bash
REVIEW_FILE=$(mktemp)
# Count results
PASS_COUNT=$(echo "$TEST_RESULTS_TABLE" | grep -c "PASS" || true)
FAIL_COUNT=$(echo "$TEST_RESULTS_TABLE" | grep -c "FAIL" || true)
TOTAL=$(( PASS_COUNT + FAIL_COUNT ))
# List any coverage gaps found during evaluation (populate this array as you assess)
# e.g. COVERAGE_GAPS=("PR claims to add X but no test covers it")
COVERAGE_GAPS=()
```
**If APPROVING** — all criteria met, zero failures, full coverage:
```bash
cat > "$REVIEW_FILE" <<REVIEWEOF
## E2E Test Evaluation — APPROVED
**Results:** ${PASS_COUNT}/${TOTAL} scenarios passed.
**Coverage:** All features described in the PR were exercised.
**Evidence:** Before/after API values logged for all state-changing operations; screenshots show meaningful state transitions.
**Negative tests:** Failure paths tested for each feature.
No regressions observed on core flows.
REVIEWEOF
gh pr review "$PR_NUMBER" --repo "$REPO" --approve --body "$(cat "$REVIEW_FILE")"
echo "✅ PR approved"
```
**If REQUESTING CHANGES** — any failure, coverage gap, or missing evidence:
```bash
FAIL_LIST=$(echo "$TEST_RESULTS_TABLE" | grep "FAIL" | awk -F'|' '{print "- Scenario" $2 "failed"}' || true)
cat > "$REVIEW_FILE" <<REVIEWEOF
## E2E Test Evaluation — Changes Requested
**Results:** ${PASS_COUNT}/${TOTAL} scenarios passed, ${FAIL_COUNT} failed.
### Required before merge
${FAIL_LIST}
$(for gap in "${COVERAGE_GAPS[@]}"; do echo "- $gap"; done)
Please fix the above and re-run the E2E tests.
REVIEWEOF
gh pr review "$PR_NUMBER" --repo "$REPO" --request-changes --body "$(cat "$REVIEW_FILE")"
echo "❌ Changes requested"
```
```bash
rm -f "$REVIEW_FILE"
```
**Rules:**
- In `--fix` mode, fix all failures before posting the review — the review reflects the final state after fixes
- Never approve if any scenario failed, even if it seems like a flake — rerun that scenario first
- Never request changes for issues already fixed in this run
## Fix mode (--fix flag)
When `--fix` is present, the standard is HIGHER. Do not just note issues — FIX them immediately.

View File

@@ -1,224 +0,0 @@
---
name: write-frontend-tests
description: "Analyze the current branch diff against dev, plan integration tests for changed frontend pages/components, and write them. TRIGGER when user asks to write frontend tests, add test coverage, or 'write tests for my changes'."
user-invocable: true
args: "[base branch] — defaults to dev. Optionally pass a specific base branch to diff against."
metadata:
author: autogpt-team
version: "1.0.0"
---
# Write Frontend Tests
Analyze the current branch's frontend changes, plan integration tests, and write them.
## References
Before writing any tests, read the testing rules and conventions:
- `autogpt_platform/frontend/TESTING.md` — testing strategy, file locations, examples
- `autogpt_platform/frontend/src/tests/AGENTS.md` — detailed testing rules, MSW patterns, decision flowchart
- `autogpt_platform/frontend/src/tests/integrations/test-utils.tsx` — custom render with providers
- `autogpt_platform/frontend/src/tests/integrations/vitest.setup.tsx` — MSW server setup
## Step 1: Identify changed frontend files
```bash
BASE_BRANCH="${ARGUMENTS:-dev}"
cd autogpt_platform/frontend
# Get changed frontend files (excluding generated, config, and test files)
git diff "$BASE_BRANCH"...HEAD --name-only -- src/ \
| grep -v '__generated__' \
| grep -v '__tests__' \
| grep -v '\.test\.' \
| grep -v '\.stories\.' \
| grep -v '\.spec\.'
```
Also read the diff to understand what changed:
```bash
git diff "$BASE_BRANCH"...HEAD --stat -- src/
git diff "$BASE_BRANCH"...HEAD -- src/ | head -500
```
## Step 2: Categorize changes and find test targets
For each changed file, determine:
1. **Is it a page?** (`page.tsx`) — these are the primary test targets
2. **Is it a hook?** (`use*.ts`) — test via the page that uses it
3. **Is it a component?** (`.tsx` in `components/`) — test via the parent page unless it's complex enough to warrant isolation
4. **Is it a helper?** (`helpers.ts`, `utils.ts`) — unit test directly if pure logic
**Priority order:**
1. Pages with new/changed data fetching or user interactions
2. Components with complex internal logic (modals, forms, wizards)
3. Hooks with non-trivial business logic
4. Pure helper functions
Skip: styling-only changes, type-only changes, config changes.
## Step 3: Check for existing tests
For each test target, check if tests already exist:
```bash
# For a page at src/app/(platform)/library/page.tsx
ls src/app/\(platform\)/library/__tests__/ 2>/dev/null
# For a component at src/app/(platform)/library/components/AgentCard/AgentCard.tsx
ls src/app/\(platform\)/library/components/AgentCard/__tests__/ 2>/dev/null
```
Note which targets have no tests (need new files) vs which have tests that need updating.
## Step 4: Identify API endpoints used
For each test target, find which API hooks are used:
```bash
# Find generated API hook imports in the changed files
grep -rn 'from.*__generated__/endpoints' src/app/\(platform\)/library/
grep -rn 'use[A-Z].*V[12]' src/app/\(platform\)/library/
```
For each API hook found, locate the corresponding MSW handler:
```bash
# If the page uses useGetV2ListLibraryAgents, find its MSW handlers
grep -rn 'getGetV2ListLibraryAgents.*Handler' src/app/api/__generated__/endpoints/library/library.msw.ts
```
List every MSW handler you will need (200 for happy path, 4xx for error paths).
## Step 5: Write the test plan
Before writing code, output a plan as a numbered list:
```
Test plan for [branch name]:
1. src/app/(platform)/library/__tests__/main.test.tsx (NEW)
- Renders page with agent list (MSW 200)
- Shows loading state
- Shows error state (MSW 422)
- Handles empty agent list
2. src/app/(platform)/library/__tests__/search.test.tsx (NEW)
- Filters agents by search query
- Shows no results message
- Clears search
3. src/app/(platform)/library/components/AgentCard/__tests__/AgentCard.test.tsx (UPDATE)
- Add test for new "duplicate" action
```
Present this plan to the user. Wait for confirmation before proceeding. If the user has feedback, adjust the plan.
## Step 6: Write the tests
For each test file in the plan, follow these conventions:
### File structure
```tsx
import { render, screen, waitFor } from "@/tests/integrations/test-utils";
import { server } from "@/mocks/mock-server";
// Import MSW handlers for endpoints the page uses
import {
getGetV2ListLibraryAgentsMockHandler200,
getGetV2ListLibraryAgentsMockHandler422,
} from "@/app/api/__generated__/endpoints/library/library.msw";
// Import the component under test
import LibraryPage from "../page";
describe("LibraryPage", () => {
test("renders agent list from API", async () => {
server.use(getGetV2ListLibraryAgentsMockHandler200());
render(<LibraryPage />);
expect(await screen.findByText(/my agents/i)).toBeDefined();
});
test("shows error state on API failure", async () => {
server.use(getGetV2ListLibraryAgentsMockHandler422());
render(<LibraryPage />);
expect(await screen.findByText(/error/i)).toBeDefined();
});
});
```
### Rules
- Use `render()` from `@/tests/integrations/test-utils` (NOT from `@testing-library/react` directly)
- Use `server.use()` to set up MSW handlers BEFORE rendering
- Use `findBy*` (async) for elements that appear after data fetching — NOT `getBy*`
- Use `getBy*` only for elements that are immediately present in the DOM
- Use `screen` queries — do NOT destructure from `render()`
- Use `waitFor` when asserting side effects or state changes after interactions
- Import `fireEvent` or `userEvent` from the test-utils for interactions
- Do NOT mock internal hooks or functions — mock at the API boundary via MSW
- Do NOT use `act()` manually — `render` and `fireEvent` handle it
- Keep tests focused: one behavior per test
- Use descriptive test names that read like sentences
### Test location
```
# For pages: __tests__/ next to page.tsx
src/app/(platform)/library/__tests__/main.test.tsx
# For complex standalone components: __tests__/ inside component folder
src/app/(platform)/library/components/AgentCard/__tests__/AgentCard.test.tsx
# For pure helpers: co-located .test.ts
src/app/(platform)/library/helpers.test.ts
```
### Custom MSW overrides
When the auto-generated faker data is not enough, override with specific data:
```tsx
import { http, HttpResponse } from "msw";
server.use(
http.get("http://localhost:3000/api/proxy/api/v2/library/agents", () => {
return HttpResponse.json({
agents: [
{ id: "1", name: "Test Agent", description: "A test agent" },
],
pagination: { total_items: 1, total_pages: 1, page: 1, page_size: 10 },
});
}),
);
```
Use the proxy URL pattern: `http://localhost:3000/api/proxy/api/v{version}/{path}` — this matches the MSW base URL configured in `orval.config.ts`.
## Step 7: Run and verify
After writing all tests:
```bash
cd autogpt_platform/frontend
pnpm test:unit --reporter=verbose
```
If tests fail:
1. Read the error output carefully
2. Fix the test (not the source code, unless there is a genuine bug)
3. Re-run until all pass
Then run the full checks:
```bash
pnpm format
pnpm lint
pnpm types
```

View File

@@ -6,19 +6,11 @@ on:
paths:
- '.github/workflows/classic-autogpt-ci.yml'
- 'classic/original_autogpt/**'
- 'classic/direct_benchmark/**'
- 'classic/forge/**'
- 'classic/pyproject.toml'
- 'classic/poetry.lock'
pull_request:
branches: [ master, dev, release-* ]
paths:
- '.github/workflows/classic-autogpt-ci.yml'
- 'classic/original_autogpt/**'
- 'classic/direct_benchmark/**'
- 'classic/forge/**'
- 'classic/pyproject.toml'
- 'classic/poetry.lock'
concurrency:
group: ${{ format('classic-autogpt-ci-{0}', github.head_ref && format('{0}-{1}', github.event_name, github.event.pull_request.number) || github.sha) }}
@@ -27,22 +19,47 @@ concurrency:
defaults:
run:
shell: bash
working-directory: classic
working-directory: classic/original_autogpt
jobs:
test:
permissions:
contents: read
timeout-minutes: 30
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: ["3.10"]
platform-os: [ubuntu, macos, macos-arm64, windows]
runs-on: ${{ matrix.platform-os != 'macos-arm64' && format('{0}-latest', matrix.platform-os) || 'macos-14' }}
steps:
- name: Start MinIO service
# Quite slow on macOS (2~4 minutes to set up Docker)
# - name: Set up Docker (macOS)
# if: runner.os == 'macOS'
# uses: crazy-max/ghaction-setup-docker@v3
- name: Start MinIO service (Linux)
if: runner.os == 'Linux'
working-directory: '.'
run: |
docker pull minio/minio:edge-cicd
docker run -d -p 9000:9000 minio/minio:edge-cicd
- name: Start MinIO service (macOS)
if: runner.os == 'macOS'
working-directory: ${{ runner.temp }}
run: |
brew install minio/stable/minio
mkdir data
minio server ./data &
# No MinIO on Windows:
# - Windows doesn't support running Linux Docker containers
# - It doesn't seem possible to start background processes on Windows. They are
# killed after the step returns.
# See: https://github.com/actions/runner/issues/598#issuecomment-2011890429
- name: Checkout repository
uses: actions/checkout@v4
with:
@@ -54,23 +71,41 @@ jobs:
git config --global user.name "Auto-GPT-Bot"
git config --global user.email "github-bot@agpt.co"
- name: Set up Python 3.12
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: "3.12"
python-version: ${{ matrix.python-version }}
- id: get_date
name: Get date
run: echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_OUTPUT
- name: Set up Python dependency cache
# On Windows, unpacking cached dependencies takes longer than just installing them
if: runner.os != 'Windows'
uses: actions/cache@v4
with:
path: ~/.cache/pypoetry
key: poetry-${{ runner.os }}-${{ hashFiles('classic/poetry.lock') }}
path: ${{ runner.os == 'macOS' && '~/Library/Caches/pypoetry' || '~/.cache/pypoetry' }}
key: poetry-${{ runner.os }}-${{ hashFiles('classic/original_autogpt/poetry.lock') }}
- name: Install Poetry
run: curl -sSL https://install.python-poetry.org | python3 -
- name: Install Poetry (Unix)
if: runner.os != 'Windows'
run: |
curl -sSL https://install.python-poetry.org | python3 -
if [ "${{ runner.os }}" = "macOS" ]; then
PATH="$HOME/.local/bin:$PATH"
echo "$HOME/.local/bin" >> $GITHUB_PATH
fi
- name: Install Poetry (Windows)
if: runner.os == 'Windows'
shell: pwsh
run: |
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -
$env:PATH += ";$env:APPDATA\Python\Scripts"
echo "$env:APPDATA\Python\Scripts" >> $env:GITHUB_PATH
- name: Install Python dependencies
run: poetry install
@@ -81,13 +116,12 @@ jobs:
--cov=autogpt --cov-branch --cov-report term-missing --cov-report xml \
--numprocesses=logical --durations=10 \
--junitxml=junit.xml -o junit_family=legacy \
original_autogpt/tests/unit original_autogpt/tests/integration
tests/unit tests/integration
env:
CI: true
PLAIN_OUTPUT: True
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
S3_ENDPOINT_URL: http://127.0.0.1:9000
S3_ENDPOINT_URL: ${{ runner.os != 'Windows' && 'http://127.0.0.1:9000' || '' }}
AWS_ACCESS_KEY_ID: minioadmin
AWS_SECRET_ACCESS_KEY: minioadmin
@@ -101,11 +135,11 @@ jobs:
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
flags: autogpt-agent
flags: autogpt-agent,${{ runner.os }}
- name: Upload logs to artifact
if: always()
uses: actions/upload-artifact@v4
with:
name: test-logs
path: classic/logs/
path: classic/original_autogpt/logs/

View File

@@ -148,7 +148,7 @@ jobs:
--entrypoint poetry ${{ env.IMAGE_NAME }} run \
pytest -v --cov=autogpt --cov-branch --cov-report term-missing \
--numprocesses=4 --durations=10 \
original_autogpt/tests/unit original_autogpt/tests/integration 2>&1 | tee test_output.txt
tests/unit tests/integration 2>&1 | tee test_output.txt
test_failure=${PIPESTATUS[0]}

View File

@@ -10,9 +10,10 @@ on:
- '.github/workflows/classic-autogpts-ci.yml'
- 'classic/original_autogpt/**'
- 'classic/forge/**'
- 'classic/direct_benchmark/**'
- 'classic/pyproject.toml'
- 'classic/poetry.lock'
- 'classic/benchmark/**'
- 'classic/run'
- 'classic/cli.py'
- 'classic/setup.py'
- '!**/*.md'
pull_request:
branches: [ master, dev, release-* ]
@@ -20,9 +21,10 @@ on:
- '.github/workflows/classic-autogpts-ci.yml'
- 'classic/original_autogpt/**'
- 'classic/forge/**'
- 'classic/direct_benchmark/**'
- 'classic/pyproject.toml'
- 'classic/poetry.lock'
- 'classic/benchmark/**'
- 'classic/run'
- 'classic/cli.py'
- 'classic/setup.py'
- '!**/*.md'
defaults:
@@ -33,9 +35,13 @@ defaults:
jobs:
serve-agent-protocol:
runs-on: ubuntu-latest
strategy:
matrix:
agent-name: [ original_autogpt ]
fail-fast: false
timeout-minutes: 20
env:
min-python-version: '3.12'
min-python-version: '3.10'
steps:
- name: Checkout repository
uses: actions/checkout@v4
@@ -49,22 +55,22 @@ jobs:
python-version: ${{ env.min-python-version }}
- name: Install Poetry
working-directory: ./classic/${{ matrix.agent-name }}/
run: |
curl -sSL https://install.python-poetry.org | python -
- name: Install dependencies
run: poetry install
- name: Run smoke tests with direct-benchmark
- name: Run regression tests
run: |
poetry run direct-benchmark run \
--strategies one_shot \
--models claude \
--tests ReadFile,WriteFile \
--json
./run agent start ${{ matrix.agent-name }}
cd ${{ matrix.agent-name }}
poetry run agbenchmark --mock --test=BasicRetrieval --test=Battleship --test=WebArenaTask_0
poetry run agbenchmark --test=WriteFile
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
AGENT_NAME: ${{ matrix.agent-name }}
REQUESTS_CA_BUNDLE: /etc/ssl/certs/ca-certificates.crt
NONINTERACTIVE_MODE: "true"
CI: true
HELICONE_CACHE_ENABLED: false
HELICONE_PROPERTY_AGENT: ${{ matrix.agent-name }}
REPORTS_FOLDER: ${{ format('../../reports/{0}', matrix.agent-name) }}
TELEMETRY_ENVIRONMENT: autogpt-ci
TELEMETRY_OPT_IN: ${{ github.ref_name == 'master' }}

View File

@@ -1,24 +1,18 @@
name: Classic - Direct Benchmark CI
name: Classic - AGBenchmark CI
on:
push:
branches: [ master, dev, ci-test* ]
paths:
- 'classic/direct_benchmark/**'
- 'classic/original_autogpt/**'
- 'classic/forge/**'
- 'classic/benchmark/**'
- '!classic/benchmark/reports/**'
- .github/workflows/classic-benchmark-ci.yml
- 'classic/pyproject.toml'
- 'classic/poetry.lock'
pull_request:
branches: [ master, dev, release-* ]
paths:
- 'classic/direct_benchmark/**'
- 'classic/original_autogpt/**'
- 'classic/forge/**'
- 'classic/benchmark/**'
- '!classic/benchmark/reports/**'
- .github/workflows/classic-benchmark-ci.yml
- 'classic/pyproject.toml'
- 'classic/poetry.lock'
concurrency:
group: ${{ format('benchmark-ci-{0}', github.head_ref && format('{0}-{1}', github.event_name, github.event.pull_request.number) || github.sha) }}
@@ -29,16 +23,23 @@ defaults:
shell: bash
env:
min-python-version: '3.12'
min-python-version: '3.10'
jobs:
benchmark-tests:
runs-on: ubuntu-latest
test:
permissions:
contents: read
timeout-minutes: 30
strategy:
fail-fast: false
matrix:
python-version: ["3.10"]
platform-os: [ubuntu, macos, macos-arm64, windows]
runs-on: ${{ matrix.platform-os != 'macos-arm64' && format('{0}-latest', matrix.platform-os) || 'macos-14' }}
defaults:
run:
shell: bash
working-directory: classic
working-directory: classic/benchmark
steps:
- name: Checkout repository
uses: actions/checkout@v4
@@ -46,88 +47,71 @@ jobs:
fetch-depth: 0
submodules: true
- name: Set up Python ${{ env.min-python-version }}
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ env.min-python-version }}
python-version: ${{ matrix.python-version }}
- name: Set up Python dependency cache
# On Windows, unpacking cached dependencies takes longer than just installing them
if: runner.os != 'Windows'
uses: actions/cache@v4
with:
path: ~/.cache/pypoetry
key: poetry-${{ runner.os }}-${{ hashFiles('classic/poetry.lock') }}
path: ${{ runner.os == 'macOS' && '~/Library/Caches/pypoetry' || '~/.cache/pypoetry' }}
key: poetry-${{ runner.os }}-${{ hashFiles('classic/benchmark/poetry.lock') }}
- name: Install Poetry
- name: Install Poetry (Unix)
if: runner.os != 'Windows'
run: |
curl -sSL https://install.python-poetry.org | python3 -
- name: Install dependencies
if [ "${{ runner.os }}" = "macOS" ]; then
PATH="$HOME/.local/bin:$PATH"
echo "$HOME/.local/bin" >> $GITHUB_PATH
fi
- name: Install Poetry (Windows)
if: runner.os == 'Windows'
shell: pwsh
run: |
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -
$env:PATH += ";$env:APPDATA\Python\Scripts"
echo "$env:APPDATA\Python\Scripts" >> $env:GITHUB_PATH
- name: Install Python dependencies
run: poetry install
- name: Run basic benchmark tests
- name: Run pytest with coverage
run: |
echo "Testing ReadFile challenge with one_shot strategy..."
poetry run direct-benchmark run \
--fresh \
--strategies one_shot \
--models claude \
--tests ReadFile \
--json
echo "Testing WriteFile challenge..."
poetry run direct-benchmark run \
--fresh \
--strategies one_shot \
--models claude \
--tests WriteFile \
--json
poetry run pytest -vv \
--cov=agbenchmark --cov-branch --cov-report term-missing --cov-report xml \
--durations=10 \
--junitxml=junit.xml -o junit_family=legacy \
tests
env:
CI: true
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
NONINTERACTIVE_MODE: "true"
- name: Test category filtering
run: |
echo "Testing coding category..."
poetry run direct-benchmark run \
--fresh \
--strategies one_shot \
--models claude \
--categories coding \
--tests ReadFile,WriteFile \
--json
env:
CI: true
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
NONINTERACTIVE_MODE: "true"
- name: Upload test results to Codecov
if: ${{ !cancelled() }} # Run even if tests fail
uses: codecov/test-results-action@v1
with:
token: ${{ secrets.CODECOV_TOKEN }}
- name: Test multiple strategies
run: |
echo "Testing multiple strategies..."
poetry run direct-benchmark run \
--fresh \
--strategies one_shot,plan_execute \
--models claude \
--tests ReadFile \
--parallel 2 \
--json
env:
CI: true
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
NONINTERACTIVE_MODE: "true"
- name: Upload coverage reports to Codecov
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
flags: agbenchmark,${{ runner.os }}
# Run regression tests on maintain challenges
regression-tests:
self-test-with-agent:
runs-on: ubuntu-latest
timeout-minutes: 45
if: github.ref == 'refs/heads/master' || github.ref == 'refs/heads/dev'
defaults:
run:
shell: bash
working-directory: classic
strategy:
matrix:
agent-name: [forge]
fail-fast: false
timeout-minutes: 20
steps:
- name: Checkout repository
uses: actions/checkout@v4
@@ -140,31 +124,53 @@ jobs:
with:
python-version: ${{ env.min-python-version }}
- name: Set up Python dependency cache
uses: actions/cache@v4
with:
path: ~/.cache/pypoetry
key: poetry-${{ runner.os }}-${{ hashFiles('classic/poetry.lock') }}
- name: Install Poetry
run: |
curl -sSL https://install.python-poetry.org | python3 -
- name: Install dependencies
run: poetry install
curl -sSL https://install.python-poetry.org | python -
- name: Run regression tests
working-directory: classic
run: |
echo "Running regression tests (previously beaten challenges)..."
poetry run direct-benchmark run \
--fresh \
--strategies one_shot \
--models claude \
--maintain \
--parallel 4 \
--json
./run agent start ${{ matrix.agent-name }}
cd ${{ matrix.agent-name }}
set +e # Ignore non-zero exit codes and continue execution
echo "Running the following command: poetry run agbenchmark --maintain --mock"
poetry run agbenchmark --maintain --mock
EXIT_CODE=$?
set -e # Stop ignoring non-zero exit codes
# Check if the exit code was 5, and if so, exit with 0 instead
if [ $EXIT_CODE -eq 5 ]; then
echo "regression_tests.json is empty."
fi
echo "Running the following command: poetry run agbenchmark --mock"
poetry run agbenchmark --mock
echo "Running the following command: poetry run agbenchmark --mock --category=data"
poetry run agbenchmark --mock --category=data
echo "Running the following command: poetry run agbenchmark --mock --category=coding"
poetry run agbenchmark --mock --category=coding
# echo "Running the following command: poetry run agbenchmark --test=WriteFile"
# poetry run agbenchmark --test=WriteFile
cd ../benchmark
poetry install
echo "Adding the BUILD_SKILL_TREE environment variable. This will attempt to add new elements in the skill tree. If new elements are added, the CI fails because they should have been pushed"
export BUILD_SKILL_TREE=true
# poetry run agbenchmark --mock
# CHANGED=$(git diff --name-only | grep -E '(agbenchmark/challenges)|(../classic/frontend/assets)') || echo "No diffs"
# if [ ! -z "$CHANGED" ]; then
# echo "There are unstaged changes please run agbenchmark and commit those changes since they are needed."
# echo "$CHANGED"
# exit 1
# else
# echo "No unstaged changes."
# fi
env:
CI: true
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
NONINTERACTIVE_MODE: "true"
TELEMETRY_ENVIRONMENT: autogpt-benchmark-ci
TELEMETRY_OPT_IN: ${{ github.ref_name == 'master' }}

View File

@@ -6,15 +6,13 @@ on:
paths:
- '.github/workflows/classic-forge-ci.yml'
- 'classic/forge/**'
- 'classic/pyproject.toml'
- 'classic/poetry.lock'
- '!classic/forge/tests/vcr_cassettes'
pull_request:
branches: [ master, dev, release-* ]
paths:
- '.github/workflows/classic-forge-ci.yml'
- 'classic/forge/**'
- 'classic/pyproject.toml'
- 'classic/poetry.lock'
- '!classic/forge/tests/vcr_cassettes'
concurrency:
group: ${{ format('forge-ci-{0}', github.head_ref && format('{0}-{1}', github.event_name, github.event.pull_request.number) || github.sha) }}
@@ -23,60 +21,131 @@ concurrency:
defaults:
run:
shell: bash
working-directory: classic
working-directory: classic/forge
jobs:
test:
permissions:
contents: read
timeout-minutes: 30
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: ["3.10"]
platform-os: [ubuntu, macos, macos-arm64, windows]
runs-on: ${{ matrix.platform-os != 'macos-arm64' && format('{0}-latest', matrix.platform-os) || 'macos-14' }}
steps:
- name: Start MinIO service
# Quite slow on macOS (2~4 minutes to set up Docker)
# - name: Set up Docker (macOS)
# if: runner.os == 'macOS'
# uses: crazy-max/ghaction-setup-docker@v3
- name: Start MinIO service (Linux)
if: runner.os == 'Linux'
working-directory: '.'
run: |
docker pull minio/minio:edge-cicd
docker run -d -p 9000:9000 minio/minio:edge-cicd
- name: Start MinIO service (macOS)
if: runner.os == 'macOS'
working-directory: ${{ runner.temp }}
run: |
brew install minio/stable/minio
mkdir data
minio server ./data &
# No MinIO on Windows:
# - Windows doesn't support running Linux Docker containers
# - It doesn't seem possible to start background processes on Windows. They are
# killed after the step returns.
# See: https://github.com/actions/runner/issues/598#issuecomment-2011890429
- name: Checkout repository
uses: actions/checkout@v4
with:
fetch-depth: 0
submodules: true
- name: Set up Python 3.12
- name: Checkout cassettes
if: ${{ startsWith(github.event_name, 'pull_request') }}
env:
PR_BASE: ${{ github.event.pull_request.base.ref }}
PR_BRANCH: ${{ github.event.pull_request.head.ref }}
PR_AUTHOR: ${{ github.event.pull_request.user.login }}
run: |
cassette_branch="${PR_AUTHOR}-${PR_BRANCH}"
cassette_base_branch="${PR_BASE}"
cd tests/vcr_cassettes
if ! git ls-remote --exit-code --heads origin $cassette_base_branch ; then
cassette_base_branch="master"
fi
if git ls-remote --exit-code --heads origin $cassette_branch ; then
git fetch origin $cassette_branch
git fetch origin $cassette_base_branch
git checkout $cassette_branch
# Pick non-conflicting cassette updates from the base branch
git merge --no-commit --strategy-option=ours origin/$cassette_base_branch
echo "Using cassettes from mirror branch '$cassette_branch'," \
"synced to upstream branch '$cassette_base_branch'."
else
git checkout -b $cassette_branch
echo "Branch '$cassette_branch' does not exist in cassette submodule." \
"Using cassettes from '$cassette_base_branch'."
fi
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: "3.12"
python-version: ${{ matrix.python-version }}
- name: Set up Python dependency cache
# On Windows, unpacking cached dependencies takes longer than just installing them
if: runner.os != 'Windows'
uses: actions/cache@v4
with:
path: ~/.cache/pypoetry
key: poetry-${{ runner.os }}-${{ hashFiles('classic/poetry.lock') }}
path: ${{ runner.os == 'macOS' && '~/Library/Caches/pypoetry' || '~/.cache/pypoetry' }}
key: poetry-${{ runner.os }}-${{ hashFiles('classic/forge/poetry.lock') }}
- name: Install Poetry
run: curl -sSL https://install.python-poetry.org | python3 -
- name: Install Poetry (Unix)
if: runner.os != 'Windows'
run: |
curl -sSL https://install.python-poetry.org | python3 -
if [ "${{ runner.os }}" = "macOS" ]; then
PATH="$HOME/.local/bin:$PATH"
echo "$HOME/.local/bin" >> $GITHUB_PATH
fi
- name: Install Poetry (Windows)
if: runner.os == 'Windows'
shell: pwsh
run: |
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -
$env:PATH += ";$env:APPDATA\Python\Scripts"
echo "$env:APPDATA\Python\Scripts" >> $env:GITHUB_PATH
- name: Install Python dependencies
run: poetry install
- name: Install Playwright browsers
run: poetry run playwright install chromium
- name: Run pytest with coverage
run: |
poetry run pytest -vv \
--cov=forge --cov-branch --cov-report term-missing --cov-report xml \
--durations=10 \
--junitxml=junit.xml -o junit_family=legacy \
forge/forge forge/tests
forge
env:
CI: true
PLAIN_OUTPUT: True
# API keys - tests that need these will skip if not available
# Secrets are not available to fork PRs (GitHub security feature)
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
S3_ENDPOINT_URL: http://127.0.0.1:9000
S3_ENDPOINT_URL: ${{ runner.os != 'Windows' && 'http://127.0.0.1:9000' || '' }}
AWS_ACCESS_KEY_ID: minioadmin
AWS_SECRET_ACCESS_KEY: minioadmin
@@ -90,11 +159,85 @@ jobs:
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
flags: forge
flags: forge,${{ runner.os }}
- id: setup_git_auth
name: Set up git token authentication
# Cassettes may be pushed even when tests fail
if: success() || failure()
run: |
config_key="http.${{ github.server_url }}/.extraheader"
if [ "${{ runner.os }}" = 'macOS' ]; then
base64_pat=$(echo -n "pat:${{ secrets.PAT_REVIEW }}" | base64)
else
base64_pat=$(echo -n "pat:${{ secrets.PAT_REVIEW }}" | base64 -w0)
fi
git config "$config_key" \
"Authorization: Basic $base64_pat"
cd tests/vcr_cassettes
git config "$config_key" \
"Authorization: Basic $base64_pat"
echo "config_key=$config_key" >> $GITHUB_OUTPUT
- id: push_cassettes
name: Push updated cassettes
# For pull requests, push updated cassettes even when tests fail
if: github.event_name == 'push' || (! github.event.pull_request.head.repo.fork && (success() || failure()))
env:
PR_BRANCH: ${{ github.event.pull_request.head.ref }}
PR_AUTHOR: ${{ github.event.pull_request.user.login }}
run: |
if [ "${{ startsWith(github.event_name, 'pull_request') }}" = "true" ]; then
is_pull_request=true
cassette_branch="${PR_AUTHOR}-${PR_BRANCH}"
else
cassette_branch="${{ github.ref_name }}"
fi
cd tests/vcr_cassettes
# Commit & push changes to cassettes if any
if ! git diff --quiet; then
git add .
git commit -m "Auto-update cassettes"
git push origin HEAD:$cassette_branch
if [ ! $is_pull_request ]; then
cd ../..
git add tests/vcr_cassettes
git commit -m "Update cassette submodule"
git push origin HEAD:$cassette_branch
fi
echo "updated=true" >> $GITHUB_OUTPUT
else
echo "updated=false" >> $GITHUB_OUTPUT
echo "No cassette changes to commit"
fi
- name: Post Set up git token auth
if: steps.setup_git_auth.outcome == 'success'
run: |
git config --unset-all '${{ steps.setup_git_auth.outputs.config_key }}'
git submodule foreach git config --unset-all '${{ steps.setup_git_auth.outputs.config_key }}'
- name: Apply "behaviour change" label and comment on PR
if: ${{ startsWith(github.event_name, 'pull_request') }}
run: |
PR_NUMBER="${{ github.event.pull_request.number }}"
TOKEN="${{ secrets.PAT_REVIEW }}"
REPO="${{ github.repository }}"
if [[ "${{ steps.push_cassettes.outputs.updated }}" == "true" ]]; then
echo "Adding label and comment..."
echo $TOKEN | gh auth login --with-token
gh issue edit $PR_NUMBER --add-label "behaviour change"
gh issue comment $PR_NUMBER --body "You changed AutoGPT's behaviour on ${{ runner.os }}. The cassettes have been updated and will be merged to the submodule when this Pull Request gets merged."
fi
- name: Upload logs to artifact
if: always()
uses: actions/upload-artifact@v4
with:
name: test-logs
path: classic/logs/
path: classic/forge/logs/

View File

@@ -0,0 +1,60 @@
name: Classic - Frontend CI/CD
on:
push:
branches:
- master
- dev
- 'ci-test*' # This will match any branch that starts with "ci-test"
paths:
- 'classic/frontend/**'
- '.github/workflows/classic-frontend-ci.yml'
pull_request:
paths:
- 'classic/frontend/**'
- '.github/workflows/classic-frontend-ci.yml'
jobs:
build:
permissions:
contents: write
pull-requests: write
runs-on: ubuntu-latest
env:
BUILD_BRANCH: ${{ format('classic-frontend-build/{0}', github.ref_name) }}
steps:
- name: Checkout Repo
uses: actions/checkout@v4
- name: Setup Flutter
uses: subosito/flutter-action@v2
with:
flutter-version: '3.13.2'
- name: Build Flutter to Web
run: |
cd classic/frontend
flutter build web --base-href /app/
# - name: Commit and Push to ${{ env.BUILD_BRANCH }}
# if: github.event_name == 'push'
# run: |
# git config --local user.email "action@github.com"
# git config --local user.name "GitHub Action"
# git add classic/frontend/build/web
# git checkout -B ${{ env.BUILD_BRANCH }}
# git commit -m "Update frontend build to ${GITHUB_SHA:0:7}" -a
# git push -f origin ${{ env.BUILD_BRANCH }}
- name: Create PR ${{ env.BUILD_BRANCH }} -> ${{ github.ref_name }}
if: github.event_name == 'push'
uses: peter-evans/create-pull-request@v8
with:
add-paths: classic/frontend/build/web
base: ${{ github.ref_name }}
branch: ${{ env.BUILD_BRANCH }}
delete-branch: true
title: "Update frontend build in `${{ github.ref_name }}`"
body: "This PR updates the frontend build based on commit ${{ github.sha }}."
commit-message: "Update frontend build based on commit ${{ github.sha }}"

View File

@@ -7,9 +7,7 @@ on:
- '.github/workflows/classic-python-checks-ci.yml'
- 'classic/original_autogpt/**'
- 'classic/forge/**'
- 'classic/direct_benchmark/**'
- 'classic/pyproject.toml'
- 'classic/poetry.lock'
- 'classic/benchmark/**'
- '**.py'
- '!classic/forge/tests/vcr_cassettes'
pull_request:
@@ -18,9 +16,7 @@ on:
- '.github/workflows/classic-python-checks-ci.yml'
- 'classic/original_autogpt/**'
- 'classic/forge/**'
- 'classic/direct_benchmark/**'
- 'classic/pyproject.toml'
- 'classic/poetry.lock'
- 'classic/benchmark/**'
- '**.py'
- '!classic/forge/tests/vcr_cassettes'
@@ -31,13 +27,44 @@ concurrency:
defaults:
run:
shell: bash
working-directory: classic
jobs:
get-changed-parts:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
- id: changes-in
name: Determine affected subprojects
uses: dorny/paths-filter@v3
with:
filters: |
original_autogpt:
- classic/original_autogpt/autogpt/**
- classic/original_autogpt/tests/**
- classic/original_autogpt/poetry.lock
forge:
- classic/forge/forge/**
- classic/forge/tests/**
- classic/forge/poetry.lock
benchmark:
- classic/benchmark/agbenchmark/**
- classic/benchmark/tests/**
- classic/benchmark/poetry.lock
outputs:
changed-parts: ${{ steps.changes-in.outputs.changes }}
lint:
needs: get-changed-parts
runs-on: ubuntu-latest
env:
min-python-version: "3.12"
min-python-version: "3.10"
strategy:
matrix:
sub-package: ${{ fromJson(needs.get-changed-parts.outputs.changed-parts) }}
fail-fast: false
steps:
- name: Checkout repository
@@ -54,31 +81,42 @@ jobs:
uses: actions/cache@v4
with:
path: ~/.cache/pypoetry
key: ${{ runner.os }}-poetry-${{ hashFiles('classic/poetry.lock') }}
key: ${{ runner.os }}-poetry-${{ hashFiles(format('{0}/poetry.lock', matrix.sub-package)) }}
- name: Install Poetry
run: curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
- name: Install Python dependencies
run: poetry install
run: poetry -C classic/${{ matrix.sub-package }} install
# Lint
- name: Lint (isort)
run: poetry run isort --check .
working-directory: classic/${{ matrix.sub-package }}
- name: Lint (Black)
if: success() || failure()
run: poetry run black --check .
working-directory: classic/${{ matrix.sub-package }}
- name: Lint (Flake8)
if: success() || failure()
run: poetry run flake8 .
working-directory: classic/${{ matrix.sub-package }}
types:
needs: get-changed-parts
runs-on: ubuntu-latest
env:
min-python-version: "3.12"
min-python-version: "3.10"
strategy:
matrix:
sub-package: ${{ fromJson(needs.get-changed-parts.outputs.changed-parts) }}
fail-fast: false
steps:
- name: Checkout repository
@@ -95,16 +133,19 @@ jobs:
uses: actions/cache@v4
with:
path: ~/.cache/pypoetry
key: ${{ runner.os }}-poetry-${{ hashFiles('classic/poetry.lock') }}
key: ${{ runner.os }}-poetry-${{ hashFiles(format('{0}/poetry.lock', matrix.sub-package)) }}
- name: Install Poetry
run: curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
- name: Install Python dependencies
run: poetry install
run: poetry -C classic/${{ matrix.sub-package }} install
# Typecheck
- name: Typecheck
if: success() || failure()
run: poetry run pyright
working-directory: classic/${{ matrix.sub-package }}

View File

@@ -269,14 +269,12 @@ jobs:
DATABASE_URL: ${{ steps.supabase.outputs.DB_URL }}
DIRECT_URL: ${{ steps.supabase.outputs.DB_URL }}
- name: Run pytest with coverage
- name: Run pytest
run: |
if [[ "${{ runner.debug }}" == "1" ]]; then
poetry run pytest -s -vv -o log_cli=true -o log_cli_level=DEBUG \
--cov=backend --cov-branch --cov-report term-missing --cov-report xml
poetry run pytest -s -vv -o log_cli=true -o log_cli_level=DEBUG
else
poetry run pytest -s -vv \
--cov=backend --cov-branch --cov-report term-missing --cov-report xml
poetry run pytest -s -vv
fi
env:
LOG_LEVEL: ${{ runner.debug && 'DEBUG' || 'INFO' }}
@@ -289,13 +287,11 @@ jobs:
REDIS_PORT: "6379"
ENCRYPTION_KEY: "dvziYgz0KSK8FENhju0ZYi8-fRTfAdlz6YLhdB_jhNw=" # DO NOT USE IN PRODUCTION!!
- name: Upload coverage reports to Codecov
if: ${{ !cancelled() }}
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
flags: platform-backend
files: ./autogpt_platform/backend/coverage.xml
# - name: Upload coverage reports to Codecov
# uses: codecov/codecov-action@v4
# with:
# token: ${{ secrets.CODECOV_TOKEN }}
# flags: backend,${{ runner.os }}
env:
CI: true

View File

@@ -148,11 +148,3 @@ jobs:
- name: Run Integration Tests
run: pnpm test:unit
- name: Upload coverage reports to Codecov
if: ${{ !cancelled() }}
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
flags: platform-frontend
files: ./autogpt_platform/frontend/coverage/cobertura-coverage.xml

View File

@@ -179,30 +179,21 @@ jobs:
pip install pyyaml
# Resolve extends and generate a flat compose file that bake can understand
export NEXT_PUBLIC_SOURCEMAPS NEXT_PUBLIC_PW_TEST
docker compose -f docker-compose.yml config > docker-compose.resolved.yml
# Ensure NEXT_PUBLIC_SOURCEMAPS is in resolved compose
# (docker compose config on some versions drops this arg)
if ! grep -q "NEXT_PUBLIC_SOURCEMAPS" docker-compose.resolved.yml; then
echo "Injecting NEXT_PUBLIC_SOURCEMAPS into resolved compose (docker compose config dropped it)"
sed -i '/NEXT_PUBLIC_PW_TEST/a\ NEXT_PUBLIC_SOURCEMAPS: "true"' docker-compose.resolved.yml
fi
# Add cache configuration to the resolved compose file
python ../.github/workflows/scripts/docker-ci-fix-compose-build-cache.py \
--source docker-compose.resolved.yml \
--cache-from "type=gha" \
--cache-to "type=gha,mode=max" \
--backend-hash "${{ hashFiles('autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/poetry.lock', 'autogpt_platform/backend/backend/**') }}" \
--frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}-sourcemaps" \
--frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}" \
--git-ref "${{ github.ref }}"
# Build with bake using the resolved compose file (now includes cache config)
docker buildx bake --allow=fs.read=.. -f docker-compose.resolved.yml --load
env:
NEXT_PUBLIC_PW_TEST: true
NEXT_PUBLIC_SOURCEMAPS: true
- name: Set up tests - Cache E2E test data
id: e2e-data-cache
@@ -288,11 +279,6 @@ jobs:
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Copy source maps from Docker for E2E coverage
run: |
FRONTEND_CONTAINER=$(docker compose -f ../docker-compose.resolved.yml ps -q frontend)
docker cp "$FRONTEND_CONTAINER":/app/.next/static .next-static-coverage
- name: Set up tests - Install dependencies
run: pnpm install --frozen-lockfile
@@ -303,15 +289,6 @@ jobs:
run: pnpm test:no-build
continue-on-error: false
- name: Upload E2E coverage to Codecov
if: ${{ !cancelled() }}
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
flags: platform-frontend-e2e
files: ./autogpt_platform/frontend/coverage/e2e/cobertura-coverage.xml
disable_search: true
- name: Upload Playwright report
if: always()
uses: actions/upload-artifact@v4

10
.gitignore vendored
View File

@@ -3,7 +3,6 @@
classic/original_autogpt/keys.py
classic/original_autogpt/*.json
auto_gpt_workspace/*
.autogpt/
*.mpeg
.env
# Root .env files
@@ -17,7 +16,6 @@ log-ingestion.txt
/logs
*.log
*.mp3
!autogpt_platform/frontend/public/notification.mp3
mem.sqlite3
venvAutoGPT
@@ -161,10 +159,6 @@ CURRENT_BULLETIN.md
# AgBenchmark
classic/benchmark/agbenchmark/reports/
classic/reports/
classic/direct_benchmark/reports/
classic/.benchmark_workspaces/
classic/direct_benchmark/.benchmark_workspaces/
# Nodejs
package-lock.json
@@ -183,13 +177,9 @@ autogpt_platform/backend/settings.py
*.ign.*
.test-contents
**/.claude/settings.local.json
.claude/settings.local.json
CLAUDE.local.md
/autogpt_platform/backend/logs
# Test database
test.db
.next
# Implementation plans (generated by AI agents)
plans/

View File

@@ -1,36 +0,0 @@
title = "AutoGPT Gitleaks Config"
[extend]
useDefault = true
[allowlist]
description = "Global allowlist"
paths = [
# Template/example env files (no real secrets)
'''\.env\.(default|example|template)$''',
# Lock files
'''pnpm-lock\.yaml$''',
'''poetry\.lock$''',
# Secrets baseline
'''\.secrets\.baseline$''',
# Build artifacts and caches (should not be committed)
'''__pycache__/''',
'''classic/frontend/build/''',
# Docker dev setup (local dev JWTs/keys only)
'''autogpt_platform/db/docker/''',
# Load test configs (dev JWTs)
'''load-tests/configs/''',
# Test files with fake/fixture keys (_test.py, test_*.py, conftest.py)
'''(_test|test_.*|conftest)\.py$''',
# Documentation (only contains placeholder keys in curl/API examples)
'''docs/.*\.md$''',
# Firebase config (public API keys by design)
'''google-services\.json$''',
'''classic/frontend/(lib|web)/''',
]
# CI test-only encryption key (marked DO NOT USE IN PRODUCTION)
regexes = [
'''dvziYgz0KSK8FENhju0ZYi8''',
# LLM model name enum values falsely flagged as API keys
'''Llama-\d.*Instruct''',
]

3
.gitmodules vendored Normal file
View File

@@ -0,0 +1,3 @@
[submodule "classic/forge/tests/vcr_cassettes"]
path = classic/forge/tests/vcr_cassettes
url = https://github.com/Significant-Gravitas/Auto-GPT-test-cassettes

View File

@@ -23,15 +23,9 @@ repos:
- id: detect-secrets
name: Detect secrets
description: Detects high entropy strings that are likely to be passwords.
args: ["--baseline", ".secrets.baseline"]
files: ^autogpt_platform/
exclude: (pnpm-lock\.yaml|\.env\.(default|example|template))$
- repo: https://github.com/gitleaks/gitleaks
rev: v8.24.3
hooks:
- id: gitleaks
name: Detect secrets (gitleaks)
exclude: pnpm-lock\.yaml$
stages: [pre-push]
- repo: local
# For proper type checking, all dependencies need to be up-to-date.
@@ -90,16 +84,51 @@ repos:
stages: [pre-commit, post-checkout]
- id: poetry-install
name: Check & Install dependencies - Classic
alias: poetry-install-classic
name: Check & Install dependencies - Classic - AutoGPT
alias: poetry-install-classic-autogpt
entry: >
bash -c '
if [ -n "$PRE_COMMIT_FROM_REF" ]; then
git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
else
git diff --cached --name-only
fi | grep -qE "^classic/poetry\.lock$" || exit 0;
poetry -C classic install
fi | grep -qE "^classic/(original_autogpt|forge)/poetry\.lock$" || exit 0;
poetry -C classic/original_autogpt install
'
# include forge source (since it's a path dependency)
always_run: true
language: system
pass_filenames: false
stages: [pre-commit, post-checkout]
- id: poetry-install
name: Check & Install dependencies - Classic - Forge
alias: poetry-install-classic-forge
entry: >
bash -c '
if [ -n "$PRE_COMMIT_FROM_REF" ]; then
git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
else
git diff --cached --name-only
fi | grep -qE "^classic/forge/poetry\.lock$" || exit 0;
poetry -C classic/forge install
'
always_run: true
language: system
pass_filenames: false
stages: [pre-commit, post-checkout]
- id: poetry-install
name: Check & Install dependencies - Classic - Benchmark
alias: poetry-install-classic-benchmark
entry: >
bash -c '
if [ -n "$PRE_COMMIT_FROM_REF" ]; then
git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
else
git diff --cached --name-only
fi | grep -qE "^classic/benchmark/poetry\.lock$" || exit 0;
poetry -C classic/benchmark install
'
always_run: true
language: system
@@ -194,10 +223,26 @@ repos:
language: system
- id: isort
name: Lint (isort) - Classic
alias: isort-classic
entry: bash -c 'cd classic && poetry run isort $(echo "$@" | sed "s|classic/||g")' --
files: ^classic/(original_autogpt|forge|direct_benchmark)/
name: Lint (isort) - Classic - AutoGPT
alias: isort-classic-autogpt
entry: poetry -P classic/original_autogpt run isort -p autogpt
files: ^classic/original_autogpt/
types: [file, python]
language: system
- id: isort
name: Lint (isort) - Classic - Forge
alias: isort-classic-forge
entry: poetry -P classic/forge run isort -p forge
files: ^classic/forge/
types: [file, python]
language: system
- id: isort
name: Lint (isort) - Classic - Benchmark
alias: isort-classic-benchmark
entry: poetry -P classic/benchmark run isort -p agbenchmark
files: ^classic/benchmark/
types: [file, python]
language: system
@@ -211,13 +256,26 @@ repos:
- repo: https://github.com/PyCQA/flake8
rev: 7.0.0
# Use consolidated flake8 config at classic/.flake8
# To have flake8 load the config of the individual subprojects, we have to call
# them separately.
hooks:
- id: flake8
name: Lint (Flake8) - Classic
alias: flake8-classic
files: ^classic/(original_autogpt|forge|direct_benchmark)/
args: [--config=classic/.flake8]
name: Lint (Flake8) - Classic - AutoGPT
alias: flake8-classic-autogpt
files: ^classic/original_autogpt/(autogpt|scripts|tests)/
args: [--config=classic/original_autogpt/.flake8]
- id: flake8
name: Lint (Flake8) - Classic - Forge
alias: flake8-classic-forge
files: ^classic/forge/(forge|tests)/
args: [--config=classic/forge/.flake8]
- id: flake8
name: Lint (Flake8) - Classic - Benchmark
alias: flake8-classic-benchmark
files: ^classic/benchmark/(agbenchmark|tests)/((?!reports).)*[/.]
args: [--config=classic/benchmark/.flake8]
- repo: local
hooks:
@@ -253,10 +311,29 @@ repos:
pass_filenames: false
- id: pyright
name: Typecheck - Classic
alias: pyright-classic
entry: poetry -C classic run pyright
files: ^classic/(original_autogpt|forge|direct_benchmark)/.*\.py$|^classic/poetry\.lock$
name: Typecheck - Classic - AutoGPT
alias: pyright-classic-autogpt
entry: poetry -C classic/original_autogpt run pyright
# include forge source (since it's a path dependency) but exclude *_test.py files:
files: ^(classic/original_autogpt/((autogpt|scripts|tests)/|poetry\.lock$)|classic/forge/(forge/.*(?<!_test)\.py|poetry\.lock)$)
types: [file]
language: system
pass_filenames: false
- id: pyright
name: Typecheck - Classic - Forge
alias: pyright-classic-forge
entry: poetry -C classic/forge run pyright
files: ^classic/forge/(forge/|poetry\.lock$)
types: [file]
language: system
pass_filenames: false
- id: pyright
name: Typecheck - Classic - Benchmark
alias: pyright-classic-benchmark
entry: poetry -C classic/benchmark run pyright
files: ^classic/benchmark/(agbenchmark/|tests/|poetry\.lock$)
types: [file]
language: system
pass_filenames: false
@@ -283,9 +360,26 @@ repos:
# pass_filenames: false
# - id: pytest
# name: Run tests - Classic (excl. slow tests)
# alias: pytest-classic
# entry: bash -c 'cd classic && poetry run pytest -m "not slow"'
# files: ^classic/(original_autogpt|forge|direct_benchmark)/
# name: Run tests - Classic - AutoGPT (excl. slow tests)
# alias: pytest-classic-autogpt
# entry: bash -c 'cd classic/original_autogpt && poetry run pytest --cov=autogpt -m "not slow" tests/unit tests/integration'
# # include forge source (since it's a path dependency) but exclude *_test.py files:
# files: ^(classic/original_autogpt/((autogpt|tests)/|poetry\.lock$)|classic/forge/(forge/.*(?<!_test)\.py|poetry\.lock)$)
# language: system
# pass_filenames: false
# - id: pytest
# name: Run tests - Classic - Forge (excl. slow tests)
# alias: pytest-classic-forge
# entry: bash -c 'cd classic/forge && poetry run pytest --cov=forge -m "not slow"'
# files: ^classic/forge/(forge/|tests/|poetry\.lock$)
# language: system
# pass_filenames: false
# - id: pytest
# name: Run tests - Classic - Benchmark
# alias: pytest-classic-benchmark
# entry: bash -c 'cd classic/benchmark && poetry run pytest --cov=benchmark'
# files: ^classic/benchmark/(agbenchmark/|tests/|poetry\.lock$)
# language: system
# pass_filenames: false

View File

@@ -1,467 +0,0 @@
{
"version": "1.5.0",
"plugins_used": [
{
"name": "ArtifactoryDetector"
},
{
"name": "AWSKeyDetector"
},
{
"name": "AzureStorageKeyDetector"
},
{
"name": "Base64HighEntropyString",
"limit": 4.5
},
{
"name": "BasicAuthDetector"
},
{
"name": "CloudantDetector"
},
{
"name": "DiscordBotTokenDetector"
},
{
"name": "GitHubTokenDetector"
},
{
"name": "GitLabTokenDetector"
},
{
"name": "HexHighEntropyString",
"limit": 3.0
},
{
"name": "IbmCloudIamDetector"
},
{
"name": "IbmCosHmacDetector"
},
{
"name": "IPPublicDetector"
},
{
"name": "JwtTokenDetector"
},
{
"name": "KeywordDetector",
"keyword_exclude": ""
},
{
"name": "MailchimpDetector"
},
{
"name": "NpmDetector"
},
{
"name": "OpenAIDetector"
},
{
"name": "PrivateKeyDetector"
},
{
"name": "PypiTokenDetector"
},
{
"name": "SendGridDetector"
},
{
"name": "SlackDetector"
},
{
"name": "SoftlayerDetector"
},
{
"name": "SquareOAuthDetector"
},
{
"name": "StripeDetector"
},
{
"name": "TelegramBotTokenDetector"
},
{
"name": "TwilioKeyDetector"
}
],
"filters_used": [
{
"path": "detect_secrets.filters.allowlist.is_line_allowlisted"
},
{
"path": "detect_secrets.filters.common.is_ignored_due_to_verification_policies",
"min_level": 2
},
{
"path": "detect_secrets.filters.heuristic.is_indirect_reference"
},
{
"path": "detect_secrets.filters.heuristic.is_likely_id_string"
},
{
"path": "detect_secrets.filters.heuristic.is_lock_file"
},
{
"path": "detect_secrets.filters.heuristic.is_not_alphanumeric_string"
},
{
"path": "detect_secrets.filters.heuristic.is_potential_uuid"
},
{
"path": "detect_secrets.filters.heuristic.is_prefixed_with_dollar_sign"
},
{
"path": "detect_secrets.filters.heuristic.is_sequential_string"
},
{
"path": "detect_secrets.filters.heuristic.is_swagger_file"
},
{
"path": "detect_secrets.filters.heuristic.is_templated_secret"
},
{
"path": "detect_secrets.filters.regex.should_exclude_file",
"pattern": [
"\\.env$",
"pnpm-lock\\.yaml$",
"\\.env\\.(default|example|template)$",
"__pycache__",
"_test\\.py$",
"test_.*\\.py$",
"conftest\\.py$",
"poetry\\.lock$",
"node_modules"
]
}
],
"results": {
"autogpt_platform/backend/backend/api/external/v1/integrations.py": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/backend/backend/api/external/v1/integrations.py",
"hashed_secret": "665b1e3851eefefa3fb878654292f16597d25155",
"is_verified": false,
"line_number": 289
}
],
"autogpt_platform/backend/backend/blocks/airtable/_config.py": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/backend/backend/blocks/airtable/_config.py",
"hashed_secret": "57e168b03afb7c1ee3cdc4ee3db2fe1cc6e0df26",
"is_verified": false,
"line_number": 29
}
],
"autogpt_platform/backend/backend/blocks/dataforseo/_config.py": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/backend/backend/blocks/dataforseo/_config.py",
"hashed_secret": "32ce93887331fa5d192f2876ea15ec000c7d58b8",
"is_verified": false,
"line_number": 12
}
],
"autogpt_platform/backend/backend/blocks/github/checks.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/checks.py",
"hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238",
"is_verified": false,
"line_number": 108
}
],
"autogpt_platform/backend/backend/blocks/github/ci.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/ci.py",
"hashed_secret": "90bd1b48e958257948487b90bee080ba5ed00caa",
"is_verified": false,
"line_number": 123
}
],
"autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
"hashed_secret": "f96896dafced7387dcd22343b8ea29d3d2c65663",
"is_verified": false,
"line_number": 42
},
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
"hashed_secret": "b80a94d5e70bedf4f5f89d2f5a5255cc9492d12e",
"is_verified": false,
"line_number": 193
},
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
"hashed_secret": "75b17e517fe1b3136394f6bec80c4f892da75e42",
"is_verified": false,
"line_number": 344
},
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
"hashed_secret": "b0bfb5e4e2394e7f8906e5ed1dffd88b2bc89dd5",
"is_verified": false,
"line_number": 534
}
],
"autogpt_platform/backend/backend/blocks/github/statuses.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/statuses.py",
"hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238",
"is_verified": false,
"line_number": 85
}
],
"autogpt_platform/backend/backend/blocks/google/docs.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/google/docs.py",
"hashed_secret": "c95da0c6696342c867ef0c8258d2f74d20fd94d4",
"is_verified": false,
"line_number": 203
}
],
"autogpt_platform/backend/backend/blocks/google/sheets.py": [
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/google/sheets.py",
"hashed_secret": "bd5a04fa3667e693edc13239b6d310c5c7a8564b",
"is_verified": false,
"line_number": 57
}
],
"autogpt_platform/backend/backend/blocks/linear/_config.py": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/backend/backend/blocks/linear/_config.py",
"hashed_secret": "b37f020f42d6d613b6ce30103e4d408c4499b3bb",
"is_verified": false,
"line_number": 53
}
],
"autogpt_platform/backend/backend/blocks/medium.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/medium.py",
"hashed_secret": "ff998abc1ce6d8f01a675fa197368e44c8916e9c",
"is_verified": false,
"line_number": 131
}
],
"autogpt_platform/backend/backend/blocks/replicate/replicate_block.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/replicate/replicate_block.py",
"hashed_secret": "8bbdd6f26368f58ea4011d13d7f763cb662e66f0",
"is_verified": false,
"line_number": 55
}
],
"autogpt_platform/backend/backend/blocks/slant3d/webhook.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/slant3d/webhook.py",
"hashed_secret": "36263c76947443b2f6e6b78153967ac4a7da99f9",
"is_verified": false,
"line_number": 100
}
],
"autogpt_platform/backend/backend/blocks/talking_head.py": [
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/talking_head.py",
"hashed_secret": "44ce2d66222529eea4a32932823466fc0601c799",
"is_verified": false,
"line_number": 113
}
],
"autogpt_platform/backend/backend/blocks/wordpress/_config.py": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/backend/backend/blocks/wordpress/_config.py",
"hashed_secret": "e62679512436161b78e8a8d68c8829c2a1031ccb",
"is_verified": false,
"line_number": 17
}
],
"autogpt_platform/backend/backend/util/cache.py": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/backend/backend/util/cache.py",
"hashed_secret": "37f0c918c3fa47ca4a70e42037f9f123fdfbc75b",
"is_verified": false,
"line_number": 449
}
],
"autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts",
"hashed_secret": "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8",
"is_verified": false,
"line_number": 6
}
],
"autogpt_platform/frontend/src/app/(platform)/dictionaries/en.json": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/dictionaries/en.json",
"hashed_secret": "8be3c943b1609fffbfc51aad666d0a04adf83c9d",
"is_verified": false,
"line_number": 5
}
],
"autogpt_platform/frontend/src/app/(platform)/dictionaries/es.json": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/dictionaries/es.json",
"hashed_secret": "5a6d1c612954979ea99ee33dbb2d231b00f6ac0a",
"is_verified": false,
"line_number": 5
}
],
"autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts",
"hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
"is_verified": false,
"line_number": 6
},
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts",
"hashed_secret": "f72cbb45464d487064610c5411c576ca4019d380",
"is_verified": false,
"line_number": 8
}
],
"autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts",
"hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
"is_verified": false,
"line_number": 5
},
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts",
"hashed_secret": "f72cbb45464d487064610c5411c576ca4019d380",
"is_verified": false,
"line_number": 7
}
],
"autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx",
"hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
"is_verified": false,
"line_number": 192
},
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx",
"hashed_secret": "86275db852204937bbdbdebe5fabe8536e030ab6",
"is_verified": false,
"line_number": 193
}
],
"autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts",
"hashed_secret": "47acd2028cf81b5da88ddeedb2aea4eca4b71fbd",
"is_verified": false,
"line_number": 102
},
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts",
"hashed_secret": "8be3c943b1609fffbfc51aad666d0a04adf83c9d",
"is_verified": false,
"line_number": 103
}
],
"autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts": [
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "9c486c92f1a7420e1045c7ad963fbb7ba3621025",
"is_verified": false,
"line_number": 73
},
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "9277508c7a6effc8fb59163efbfada189e35425c",
"is_verified": false,
"line_number": 75
},
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "8dc7e2cb1d0935897d541bf5facab389b8a50340",
"is_verified": false,
"line_number": 77
},
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "79a26ad48775944299be6aaf9fb1d5302c1ed75b",
"is_verified": false,
"line_number": 79
},
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "a3b62b44500a1612e48d4cab8294df81561b3b1a",
"is_verified": false,
"line_number": 81
},
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "a58979bd0b21ef4f50417d001008e60dd7a85c64",
"is_verified": false,
"line_number": 83
},
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "6cb6e075f8e8c7c850f9d128d6608e5dbe209a79",
"is_verified": false,
"line_number": 85
}
],
"autogpt_platform/frontend/src/lib/constants.ts": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/lib/constants.ts",
"hashed_secret": "27b924db06a28cc755fb07c54f0fddc30659fe4d",
"is_verified": false,
"line_number": 10
}
],
"autogpt_platform/frontend/src/tests/credentials/index.ts": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/tests/credentials/index.ts",
"hashed_secret": "c18006fc138809314751cd1991f1e0b820fabd37",
"is_verified": false,
"line_number": 4
}
]
},
"generated_at": "2026-04-02T13:10:54Z"
}

View File

@@ -30,7 +30,7 @@ See `/frontend/CONTRIBUTING.md` for complete patterns. Quick reference:
- Regenerate with `pnpm generate:api`
- Pattern: `use{Method}{Version}{OperationName}`
4. **Styling**: Tailwind CSS only, use design tokens, Phosphor Icons only
5. **Testing**: Integration tests (Vitest + RTL + MSW) are the default (~90%, page-level). Playwright for E2E critical flows. Storybook for design system components. See `autogpt_platform/frontend/TESTING.md`
5. **Testing**: Add Storybook stories for new components, Playwright for E2E
6. **Code conventions**: Function declarations (not arrow functions) for components/handlers
- Component props should be `interface Props { ... }` (not exported) unless the interface needs to be used outside the component
@@ -47,9 +47,7 @@ See `/frontend/CONTRIBUTING.md` for complete patterns. Quick reference:
## Testing
- Backend: `poetry run test` (runs pytest with a docker based postgres + prisma).
- Frontend integration tests: `pnpm test:unit` (Vitest + RTL + MSW, primary testing approach).
- Frontend E2E tests: `pnpm test` or `pnpm test-ui` for Playwright tests.
- See `autogpt_platform/frontend/TESTING.md` for the full testing strategy.
- Frontend: `pnpm test` or `pnpm test-ui` for Playwright tests. See `docs/content/platform/contributing/tests.md` for tips.
Always run the relevant linters and tests before committing.
Use conventional commit messages for all commits (e.g. `feat(backend): add API`).

View File

@@ -0,0 +1,85 @@
import logging
import typing
from datetime import datetime
from autogpt_libs.auth import get_user_id, requires_admin_user
from fastapi import APIRouter, Query, Security
from pydantic import BaseModel
from backend.data.platform_cost import (
CostLogRow,
PlatformCostDashboard,
get_platform_cost_dashboard,
get_platform_cost_logs,
)
from backend.util.models import Pagination
logger = logging.getLogger(__name__)
router = APIRouter(
prefix="/admin",
tags=["platform-cost", "admin"],
dependencies=[Security(requires_admin_user)],
)
class PlatformCostLogsResponse(BaseModel):
logs: list[CostLogRow]
pagination: Pagination
@router.get(
"/platform_costs/dashboard",
response_model=PlatformCostDashboard,
summary="Get Platform Cost Dashboard",
)
async def get_cost_dashboard(
admin_user_id: str = Security(get_user_id),
start: typing.Optional[datetime] = Query(None),
end: typing.Optional[datetime] = Query(None),
provider: typing.Optional[str] = Query(None),
user_id: typing.Optional[str] = Query(None),
):
logger.info(f"Admin {admin_user_id} fetching platform cost dashboard")
return await get_platform_cost_dashboard(
start=start,
end=end,
provider=provider,
user_id=user_id,
)
@router.get(
"/platform_costs/logs",
response_model=PlatformCostLogsResponse,
summary="Get Platform Cost Logs",
)
async def get_cost_logs(
admin_user_id: str = Security(get_user_id),
start: typing.Optional[datetime] = Query(None),
end: typing.Optional[datetime] = Query(None),
provider: typing.Optional[str] = Query(None),
user_id: typing.Optional[str] = Query(None),
page: int = Query(1, ge=1),
page_size: int = Query(50, ge=1, le=200),
):
logger.info(f"Admin {admin_user_id} fetching platform cost logs")
logs, total = await get_platform_cost_logs(
start=start,
end=end,
provider=provider,
user_id=user_id,
page=page,
page_size=page_size,
)
total_pages = (total + page_size - 1) // page_size
return PlatformCostLogsResponse(
logs=logs,
pagination=Pagination(
total_items=total,
total_pages=total_pages,
current_page=page,
page_size=page_size,
),
)

View File

@@ -0,0 +1,135 @@
from unittest.mock import AsyncMock
import fastapi
import fastapi.testclient
import pytest
import pytest_mock
from autogpt_libs.auth.jwt_utils import get_jwt_payload
from .platform_cost_routes import router as platform_cost_router
app = fastapi.FastAPI()
app.include_router(platform_cost_router)
client = fastapi.testclient.TestClient(app)
@pytest.fixture(autouse=True)
def setup_app_admin_auth(mock_jwt_admin):
"""Setup admin auth overrides for all tests in this module"""
app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
yield
app.dependency_overrides.clear()
def test_get_dashboard_success(
mocker: pytest_mock.MockerFixture,
) -> None:
mock_dashboard = AsyncMock(
return_value=AsyncMock(
by_provider=[],
by_user=[],
total_cost_microdollars=0,
total_requests=0,
total_users=0,
model_dump=lambda **_: {
"by_provider": [],
"by_user": [],
"total_cost_microdollars": 0,
"total_requests": 0,
"total_users": 0,
},
)
)
mocker.patch(
"backend.api.features.admin.platform_cost_routes.get_platform_cost_dashboard",
mock_dashboard,
)
response = client.get("/admin/platform_costs/dashboard")
assert response.status_code == 200
data = response.json()
assert "by_provider" in data
assert "by_user" in data
assert data["total_cost_microdollars"] == 0
def test_get_logs_success(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.platform_cost_routes.get_platform_cost_logs",
AsyncMock(return_value=([], 0)),
)
response = client.get("/admin/platform_costs/logs")
assert response.status_code == 200
data = response.json()
assert data["logs"] == []
assert data["pagination"]["total_items"] == 0
def test_get_dashboard_with_filters(
mocker: pytest_mock.MockerFixture,
) -> None:
mock_dashboard = AsyncMock(
return_value=AsyncMock(
by_provider=[],
by_user=[],
total_cost_microdollars=0,
total_requests=0,
total_users=0,
model_dump=lambda **_: {
"by_provider": [],
"by_user": [],
"total_cost_microdollars": 0,
"total_requests": 0,
"total_users": 0,
},
)
)
mocker.patch(
"backend.api.features.admin.platform_cost_routes.get_platform_cost_dashboard",
mock_dashboard,
)
response = client.get(
"/admin/platform_costs/dashboard",
params={
"start": "2026-01-01T00:00:00",
"end": "2026-04-01T00:00:00",
"provider": "openai",
"user_id": "test-user-123",
},
)
assert response.status_code == 200
mock_dashboard.assert_called_once()
call_kwargs = mock_dashboard.call_args.kwargs
assert call_kwargs["provider"] == "openai"
assert call_kwargs["user_id"] == "test-user-123"
assert call_kwargs["start"] is not None
assert call_kwargs["end"] is not None
def test_get_logs_with_pagination(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.platform_cost_routes.get_platform_cost_logs",
AsyncMock(return_value=([], 0)),
)
response = client.get(
"/admin/platform_costs/logs",
params={"page": 2, "page_size": 25, "provider": "anthropic"},
)
assert response.status_code == 200
data = response.json()
assert data["pagination"]["current_page"] == 2
assert data["pagination"]["page_size"] == 25
def test_get_dashboard_requires_admin() -> None:
app.dependency_overrides.clear()
response = client.get("/admin/platform_costs/dashboard")
assert response.status_code in (401, 403)

View File

@@ -194,29 +194,23 @@ async def set_user_rate_limit_tier(
request: SetUserTierRequest,
admin_user_id: str = Security(get_user_id),
) -> UserTierResponse:
"""Set a user's rate-limit tier. Admin-only.
"""Set a user's rate-limit tier. Admin-only."""
old_tier = await get_user_tier(request.user_id)
Returns 404 if the user does not exist in the database.
"""
# Resolve email for audit logging (non-blocking — don't fail the
# tier change if email lookup fails).
try:
resolved_email = await get_user_email_by_id(request.user_id)
except Exception:
logger.warning(
"Failed to resolve email for user %s",
request.user_id,
exc_info=True,
"Failed to resolve email for user %s", request.user_id, exc_info=True
)
resolved_email = None
if resolved_email is None:
raise HTTPException(status_code=404, detail=f"User {request.user_id} not found")
old_tier = await get_user_tier(request.user_id)
logger.info(
"Admin %s changing tier for user %s (%s): %s -> %s",
admin_user_id,
request.user_id,
resolved_email,
resolved_email or "unknown",
old_tier.value,
request.tier.value,
)

View File

@@ -405,34 +405,24 @@ def test_set_user_tier_invalid_tier_uppercase(
assert "detail" in body
def test_set_user_tier_email_lookup_failure_returns_404(
def test_set_user_tier_email_lookup_failure_non_blocking(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test that email lookup failure returns 404 (user unverifiable)."""
"""Test that email lookup failure doesn't block tier change."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
side_effect=Exception("DB connection failed"),
)
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "PRO"},
)
assert response.status_code == 404
def test_set_user_tier_user_not_found(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test that setting tier for a non-existent user returns 404."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
f"{_MOCK_MODULE}.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.FREE,
)
mock_set = mocker.patch(
f"{_MOCK_MODULE}.set_user_tier",
new_callable=AsyncMock,
return_value=None,
)
response = client.post(
@@ -440,7 +430,8 @@ def test_set_user_tier_user_not_found(
json={"user_id": target_user_id, "tier": "PRO"},
)
assert response.status_code == 404
assert response.status_code == 200
mock_set.assert_awaited_once()
def test_set_user_tier_db_failure(

View File

@@ -4,7 +4,7 @@ import asyncio
import logging
import re
from collections.abc import AsyncGenerator
from typing import Annotated
from typing import Annotated, Literal
from uuid import uuid4
from autogpt_libs import auth
@@ -15,8 +15,7 @@ from pydantic import BaseModel, ConfigDict, Field, field_validator
from backend.copilot import service as chat_service
from backend.copilot import stream_registry
from backend.copilot.config import ChatConfig, CopilotMode
from backend.copilot.db import get_chat_messages_paginated
from backend.copilot.config import ChatConfig
from backend.copilot.executor.utils import enqueue_cancel_task, enqueue_copilot_turn
from backend.copilot.model import (
ChatMessage,
@@ -112,7 +111,7 @@ class StreamChatRequest(BaseModel):
file_ids: list[str] | None = Field(
default=None, max_length=20
) # Workspace file IDs attached to this message
mode: CopilotMode | None = Field(
mode: Literal["fast", "extended_thinking"] | None = Field(
default=None,
description="Autopilot mode: 'fast' for baseline LLM, 'extended_thinking' for Claude Agent SDK. "
"If None, uses the server default (extended_thinking).",
@@ -156,8 +155,6 @@ class SessionDetailResponse(BaseModel):
user_id: str | None
messages: list[dict]
active_stream: ActiveStreamInfo | None = None # Present if stream is still active
has_more_messages: bool = False
oldest_sequence: int | None = None
total_prompt_tokens: int = 0
total_completion_tokens: int = 0
metadata: ChatSessionMetadata = ChatSessionMetadata()
@@ -397,78 +394,60 @@ async def update_session_title_route(
async def get_session(
session_id: str,
user_id: Annotated[str, Security(auth.get_user_id)],
limit: int = Query(default=50, ge=1, le=200),
before_sequence: int | None = Query(default=None, ge=0),
) -> SessionDetailResponse:
"""
Retrieve the details of a specific chat session.
Supports cursor-based pagination via ``limit`` and ``before_sequence``.
When no pagination params are provided, returns the most recent messages.
Looks up a chat session by ID for the given user (if authenticated) and returns all session data including messages.
If there's an active stream for this session, returns active_stream info for reconnection.
Args:
session_id: The unique identifier for the desired chat session.
user_id: The authenticated user's ID.
limit: Maximum number of messages to return (1-200, default 50).
before_sequence: Return messages with sequence < this value (cursor).
user_id: The optional authenticated user ID, or None for anonymous access.
Returns:
SessionDetailResponse: Details for the requested session, including
active_stream info and pagination metadata.
SessionDetailResponse: Details for the requested session, including active_stream info if applicable.
"""
page = await get_chat_messages_paginated(
session_id, limit, before_sequence, user_id=user_id
)
if page is None:
session = await get_chat_session(session_id, user_id)
if not session:
raise NotFoundError(f"Session {session_id} not found.")
messages = [message.model_dump() for message in page.messages]
# Only check active stream on initial load (not on "load more" requests)
messages = [message.model_dump() for message in session.messages]
# Check if there's an active stream for this session
active_stream_info = None
if before_sequence is None:
active_session, last_message_id = await stream_registry.get_active_session(
session_id, user_id
)
logger.info(
f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
)
if active_session:
active_stream_info = ActiveStreamInfo(
turn_id=active_session.turn_id,
last_message_id=last_message_id,
)
# Skip session metadata on "load more" — frontend only needs messages
if before_sequence is not None:
return SessionDetailResponse(
id=page.session.session_id,
created_at=page.session.started_at.isoformat(),
updated_at=page.session.updated_at.isoformat(),
user_id=page.session.user_id or None,
messages=messages,
active_stream=None,
has_more_messages=page.has_more,
oldest_sequence=page.oldest_sequence,
total_prompt_tokens=0,
total_completion_tokens=0,
active_session, last_message_id = await stream_registry.get_active_session(
session_id, user_id
)
logger.info(
f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
)
if active_session:
# Keep the assistant message (including tool_calls) so the frontend can
# render the correct tool UI (e.g. CreateAgent with mini game).
# convertChatSessionToUiMessages handles isComplete=false by setting
# tool parts without output to state "input-available".
active_stream_info = ActiveStreamInfo(
turn_id=active_session.turn_id,
last_message_id=last_message_id,
)
total_prompt = sum(u.prompt_tokens for u in page.session.usage)
total_completion = sum(u.completion_tokens for u in page.session.usage)
# Sum token usage from session
total_prompt = sum(u.prompt_tokens for u in session.usage)
total_completion = sum(u.completion_tokens for u in session.usage)
return SessionDetailResponse(
id=page.session.session_id,
created_at=page.session.started_at.isoformat(),
updated_at=page.session.updated_at.isoformat(),
user_id=page.session.user_id or None,
id=session.session_id,
created_at=session.started_at.isoformat(),
updated_at=session.updated_at.isoformat(),
user_id=session.user_id or None,
messages=messages,
active_stream=active_stream_info,
has_more_messages=page.has_more,
oldest_sequence=page.oldest_sequence,
total_prompt_tokens=total_prompt,
total_completion_tokens=total_completion,
metadata=page.session.metadata,
metadata=session.metadata,
)

View File

@@ -541,41 +541,3 @@ def test_create_session_rejects_nested_metadata(
)
assert response.status_code == 422
class TestStreamChatRequestModeValidation:
"""Pydantic-level validation of the ``mode`` field on StreamChatRequest."""
def test_rejects_invalid_mode_value(self) -> None:
"""Any string outside the Literal set must raise ValidationError."""
from pydantic import ValidationError
from backend.api.features.chat.routes import StreamChatRequest
with pytest.raises(ValidationError):
StreamChatRequest(message="hi", mode="turbo") # type: ignore[arg-type]
def test_accepts_fast_mode(self) -> None:
from backend.api.features.chat.routes import StreamChatRequest
req = StreamChatRequest(message="hi", mode="fast")
assert req.mode == "fast"
def test_accepts_extended_thinking_mode(self) -> None:
from backend.api.features.chat.routes import StreamChatRequest
req = StreamChatRequest(message="hi", mode="extended_thinking")
assert req.mode == "extended_thinking"
def test_accepts_none_mode(self) -> None:
"""``mode=None`` is valid (server decides via feature flags)."""
from backend.api.features.chat.routes import StreamChatRequest
req = StreamChatRequest(message="hi", mode=None)
assert req.mode is None
def test_mode_defaults_to_none_when_omitted(self) -> None:
from backend.api.features.chat.routes import StreamChatRequest
req = StreamChatRequest(message="hi")
assert req.mode is None

View File

@@ -1,61 +0,0 @@
from unittest.mock import AsyncMock
import fastapi
import fastapi.testclient
import pytest
from backend.api.features.v1 import v1_router
app = fastapi.FastAPI()
app.include_router(v1_router)
client = fastapi.testclient.TestClient(app)
@pytest.fixture(autouse=True)
def setup_app_auth(mock_jwt_user):
from autogpt_libs.auth.jwt_utils import get_jwt_payload
app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
yield
app.dependency_overrides.clear()
def test_onboarding_profile_success(mocker):
mock_extract = mocker.patch(
"backend.api.features.v1.extract_business_understanding",
new_callable=AsyncMock,
)
mock_upsert = mocker.patch(
"backend.api.features.v1.upsert_business_understanding",
new_callable=AsyncMock,
)
from backend.data.understanding import BusinessUnderstandingInput
mock_extract.return_value = BusinessUnderstandingInput.model_construct(
user_name="John",
user_role="Founder/CEO",
pain_points=["Finding leads"],
suggested_prompts={"Learn": ["How do I automate lead gen?"]},
)
mock_upsert.return_value = AsyncMock()
response = client.post(
"/onboarding/profile",
json={
"user_name": "John",
"user_role": "Founder/CEO",
"pain_points": ["Finding leads", "Email & outreach"],
},
)
assert response.status_code == 200
mock_extract.assert_awaited_once()
mock_upsert.assert_awaited_once()
def test_onboarding_profile_missing_fields():
response = client.post(
"/onboarding/profile",
json={"user_name": "John"},
)
assert response.status_code == 422

View File

@@ -63,17 +63,12 @@ from backend.data.onboarding import (
UserOnboardingUpdate,
complete_onboarding_step,
complete_re_run_agent,
format_onboarding_for_extraction,
get_recommended_agents,
get_user_onboarding,
onboarding_enabled,
reset_user_onboarding,
update_user_onboarding,
)
from backend.data.tally import extract_business_understanding
from backend.data.understanding import (
BusinessUnderstandingInput,
upsert_business_understanding,
)
from backend.data.user import (
get_or_create_user,
get_user_by_id,
@@ -287,33 +282,35 @@ async def get_onboarding_agents(
return await get_recommended_agents(user_id)
class OnboardingProfileRequest(pydantic.BaseModel):
"""Request body for onboarding profile submission."""
user_name: str = pydantic.Field(min_length=1, max_length=100)
user_role: str = pydantic.Field(min_length=1, max_length=100)
pain_points: list[str] = pydantic.Field(default_factory=list, max_length=20)
class OnboardingStatusResponse(pydantic.BaseModel):
"""Response for onboarding completion check."""
"""Response for onboarding status check."""
is_completed: bool
is_onboarding_enabled: bool
is_chat_enabled: bool
@v1_router.get(
"/onboarding/completed",
summary="Check if onboarding is completed",
"/onboarding/enabled",
summary="Is onboarding enabled",
tags=["onboarding", "public"],
response_model=OnboardingStatusResponse,
dependencies=[Security(requires_user)],
)
async def is_onboarding_completed(
async def is_onboarding_enabled(
user_id: Annotated[str, Security(get_user_id)],
) -> OnboardingStatusResponse:
user_onboarding = await get_user_onboarding(user_id)
# Check if chat is enabled for user
is_chat_enabled = await is_feature_enabled(Flag.CHAT, user_id, False)
# If chat is enabled, skip legacy onboarding
if is_chat_enabled:
return OnboardingStatusResponse(
is_onboarding_enabled=False,
is_chat_enabled=True,
)
return OnboardingStatusResponse(
is_completed=OnboardingStep.VISIT_COPILOT in user_onboarding.completedSteps,
is_onboarding_enabled=await onboarding_enabled(),
is_chat_enabled=False,
)
@@ -328,38 +325,6 @@ async def reset_onboarding(user_id: Annotated[str, Security(get_user_id)]):
return await reset_user_onboarding(user_id)
@v1_router.post(
"/onboarding/profile",
summary="Submit onboarding profile",
tags=["onboarding"],
dependencies=[Security(requires_user)],
)
async def submit_onboarding_profile(
data: OnboardingProfileRequest,
user_id: Annotated[str, Security(get_user_id)],
):
formatted = format_onboarding_for_extraction(
user_name=data.user_name,
user_role=data.user_role,
pain_points=data.pain_points,
)
try:
understanding_input = await extract_business_understanding(formatted)
except Exception:
understanding_input = BusinessUnderstandingInput.model_construct()
# Ensure the direct fields are set even if LLM missed them
understanding_input.user_name = data.user_name
understanding_input.user_role = data.user_role
if not understanding_input.pain_points:
understanding_input.pain_points = data.pain_points
await upsert_business_understanding(user_id, understanding_input)
return {"status": "ok"}
########################################################
##################### Blocks ###########################
########################################################

View File

@@ -12,7 +12,7 @@ import fastapi
from autogpt_libs.auth.dependencies import get_user_id, requires_user
from fastapi import Query, UploadFile
from fastapi.responses import Response
from pydantic import BaseModel, Field
from pydantic import BaseModel
from backend.data.workspace import (
WorkspaceFile,
@@ -131,26 +131,9 @@ class StorageUsageResponse(BaseModel):
file_count: int
class WorkspaceFileItem(BaseModel):
id: str
name: str
path: str
mime_type: str
size_bytes: int
metadata: dict = Field(default_factory=dict)
created_at: str
class ListFilesResponse(BaseModel):
files: list[WorkspaceFileItem]
offset: int = 0
has_more: bool = False
@router.get(
"/files/{file_id}/download",
summary="Download file by ID",
operation_id="getWorkspaceDownloadFileById",
)
async def download_file(
user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -175,7 +158,6 @@ async def download_file(
@router.delete(
"/files/{file_id}",
summary="Delete a workspace file",
operation_id="deleteWorkspaceFile",
)
async def delete_workspace_file(
user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -201,7 +183,6 @@ async def delete_workspace_file(
@router.post(
"/files/upload",
summary="Upload file to workspace",
operation_id="uploadWorkspaceFile",
)
async def upload_file(
user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -215,9 +196,6 @@ async def upload_file(
Files are stored in session-scoped paths when session_id is provided,
so the agent's session-scoped tools can discover them automatically.
"""
# Empty-string session_id drops session scoping; normalize to None.
session_id = session_id or None
config = Config()
# Sanitize filename — strip any directory components
@@ -272,27 +250,16 @@ async def upload_file(
manager = WorkspaceManager(user_id, workspace.id, session_id)
try:
workspace_file = await manager.write_file(
content, filename, overwrite=overwrite, metadata={"origin": "user-upload"}
content, filename, overwrite=overwrite
)
except ValueError as e:
# write_file raises ValueError for both path-conflict and size-limit
# cases; map each to its correct HTTP status.
message = str(e)
if message.startswith("File too large"):
raise fastapi.HTTPException(status_code=413, detail=message) from e
raise fastapi.HTTPException(status_code=409, detail=message) from e
raise fastapi.HTTPException(status_code=409, detail=str(e)) from e
# Post-write storage check — eliminates TOCTOU race on the quota.
# If a concurrent upload pushed us over the limit, undo this write.
new_total = await get_workspace_total_size(workspace.id)
if storage_limit_bytes and new_total > storage_limit_bytes:
try:
await soft_delete_workspace_file(workspace_file.id, workspace.id)
except Exception as e:
logger.warning(
f"Failed to soft-delete over-quota file {workspace_file.id} "
f"in workspace {workspace.id}: {e}"
)
await soft_delete_workspace_file(workspace_file.id, workspace.id)
raise fastapi.HTTPException(
status_code=413,
detail={
@@ -314,7 +281,6 @@ async def upload_file(
@router.get(
"/storage/usage",
summary="Get workspace storage usage",
operation_id="getWorkspaceStorageUsage",
)
async def get_storage_usage(
user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -335,57 +301,3 @@ async def get_storage_usage(
used_percent=round((used_bytes / limit_bytes) * 100, 1) if limit_bytes else 0,
file_count=file_count,
)
@router.get(
"/files",
summary="List workspace files",
operation_id="listWorkspaceFiles",
)
async def list_workspace_files(
user_id: Annotated[str, fastapi.Security(get_user_id)],
session_id: str | None = Query(default=None),
limit: int = Query(default=200, ge=1, le=1000),
offset: int = Query(default=0, ge=0),
) -> ListFilesResponse:
"""
List files in the user's workspace.
When session_id is provided, only files for that session are returned.
Otherwise, all files across sessions are listed. Results are paginated
via `limit`/`offset`; `has_more` indicates whether additional pages exist.
"""
workspace = await get_or_create_workspace(user_id)
# Treat empty-string session_id the same as omitted — an empty value
# would otherwise silently list files across every session instead of
# scoping to one.
session_id = session_id or None
manager = WorkspaceManager(user_id, workspace.id, session_id)
include_all = session_id is None
# Fetch one extra to compute has_more without a separate count query.
files = await manager.list_files(
limit=limit + 1,
offset=offset,
include_all_sessions=include_all,
)
has_more = len(files) > limit
page = files[:limit]
return ListFilesResponse(
files=[
WorkspaceFileItem(
id=f.id,
name=f.name,
path=f.path,
mime_type=f.mime_type,
size_bytes=f.size_bytes,
metadata=f.metadata or {},
created_at=f.created_at.isoformat(),
)
for f in page
],
offset=offset,
has_more=has_more,
)

View File

@@ -1,28 +1,48 @@
"""Tests for workspace file upload and download routes."""
import io
from datetime import datetime, timezone
from unittest.mock import AsyncMock, MagicMock, patch
import fastapi
import fastapi.testclient
import pytest
import pytest_mock
from backend.api.features.workspace.routes import router
from backend.data.workspace import Workspace, WorkspaceFile
from backend.api.features.workspace import routes as workspace_routes
from backend.data.workspace import WorkspaceFile
app = fastapi.FastAPI()
app.include_router(router)
app.include_router(workspace_routes.router)
@app.exception_handler(ValueError)
async def _value_error_handler(
request: fastapi.Request, exc: ValueError
) -> fastapi.responses.JSONResponse:
"""Mirror the production ValueError → 400 mapping from the REST app."""
"""Mirror the production ValueError → 400 mapping from rest_api.py."""
return fastapi.responses.JSONResponse(status_code=400, content={"detail": str(exc)})
client = fastapi.testclient.TestClient(app)
TEST_USER_ID = "3e53486c-cf57-477e-ba2a-cb02dc828e1a"
MOCK_WORKSPACE = type("W", (), {"id": "ws-1"})()
_NOW = datetime(2023, 1, 1, tzinfo=timezone.utc)
MOCK_FILE = WorkspaceFile(
id="file-aaa-bbb",
workspace_id="ws-1",
created_at=_NOW,
updated_at=_NOW,
name="hello.txt",
path="/session/hello.txt",
mime_type="text/plain",
size_bytes=13,
storage_path="local://hello.txt",
)
@pytest.fixture(autouse=True)
def setup_app_auth(mock_jwt_user):
@@ -33,201 +53,25 @@ def setup_app_auth(mock_jwt_user):
app.dependency_overrides.clear()
def _make_workspace(user_id: str = "test-user-id") -> Workspace:
return Workspace(
id="ws-001",
user_id=user_id,
created_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
updated_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
def _make_file(**overrides) -> WorkspaceFile:
defaults = {
"id": "file-001",
"workspace_id": "ws-001",
"created_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
"updated_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
"name": "test.txt",
"path": "/test.txt",
"storage_path": "local://test.txt",
"mime_type": "text/plain",
"size_bytes": 100,
"checksum": None,
"is_deleted": False,
"deleted_at": None,
"metadata": {},
}
defaults.update(overrides)
return WorkspaceFile(**defaults)
def _make_file_mock(**overrides) -> MagicMock:
"""Create a mock WorkspaceFile to simulate DB records with null fields."""
defaults = {
"id": "file-001",
"name": "test.txt",
"path": "/test.txt",
"mime_type": "text/plain",
"size_bytes": 100,
"metadata": {},
"created_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
}
defaults.update(overrides)
mock = MagicMock(spec=WorkspaceFile)
for k, v in defaults.items():
setattr(mock, k, v)
return mock
# -- list_workspace_files tests --
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_returns_all_when_no_session(mock_manager_cls, mock_get_workspace):
mock_get_workspace.return_value = _make_workspace()
files = [
_make_file(id="f1", name="a.txt", metadata={"origin": "user-upload"}),
_make_file(id="f2", name="b.csv", metadata={"origin": "agent-created"}),
]
mock_instance = AsyncMock()
mock_instance.list_files.return_value = files
mock_manager_cls.return_value = mock_instance
response = client.get("/files")
assert response.status_code == 200
data = response.json()
assert len(data["files"]) == 2
assert data["has_more"] is False
assert data["offset"] == 0
assert data["files"][0]["id"] == "f1"
assert data["files"][0]["metadata"] == {"origin": "user-upload"}
assert data["files"][1]["id"] == "f2"
mock_instance.list_files.assert_called_once_with(
limit=201, offset=0, include_all_sessions=True
)
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_scopes_to_session_when_provided(
mock_manager_cls, mock_get_workspace, test_user_id
):
mock_get_workspace.return_value = _make_workspace(user_id=test_user_id)
mock_instance = AsyncMock()
mock_instance.list_files.return_value = []
mock_manager_cls.return_value = mock_instance
response = client.get("/files?session_id=sess-123")
assert response.status_code == 200
data = response.json()
assert data["files"] == []
assert data["has_more"] is False
mock_manager_cls.assert_called_once_with(test_user_id, "ws-001", "sess-123")
mock_instance.list_files.assert_called_once_with(
limit=201, offset=0, include_all_sessions=False
)
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_null_metadata_coerced_to_empty_dict(
mock_manager_cls, mock_get_workspace
):
"""Route uses `f.metadata or {}` for pre-existing files with null metadata."""
mock_get_workspace.return_value = _make_workspace()
mock_instance = AsyncMock()
mock_instance.list_files.return_value = [_make_file_mock(metadata=None)]
mock_manager_cls.return_value = mock_instance
response = client.get("/files")
assert response.status_code == 200
assert response.json()["files"][0]["metadata"] == {}
# -- upload_file metadata tests --
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.get_workspace_total_size")
@patch("backend.api.features.workspace.routes.scan_content_safe")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_upload_passes_user_upload_origin_metadata(
mock_manager_cls, mock_scan, mock_total_size, mock_get_workspace
):
mock_get_workspace.return_value = _make_workspace()
mock_total_size.return_value = 100
written = _make_file(id="new-file", name="doc.pdf")
mock_instance = AsyncMock()
mock_instance.write_file.return_value = written
mock_manager_cls.return_value = mock_instance
response = client.post(
"/files/upload",
files={"file": ("doc.pdf", b"fake-pdf-content", "application/pdf")},
)
assert response.status_code == 200
mock_instance.write_file.assert_called_once()
call_kwargs = mock_instance.write_file.call_args
assert call_kwargs.kwargs.get("metadata") == {"origin": "user-upload"}
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.get_workspace_total_size")
@patch("backend.api.features.workspace.routes.scan_content_safe")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_upload_returns_409_on_file_conflict(
mock_manager_cls, mock_scan, mock_total_size, mock_get_workspace
):
mock_get_workspace.return_value = _make_workspace()
mock_total_size.return_value = 100
mock_instance = AsyncMock()
mock_instance.write_file.side_effect = ValueError("File already exists at path")
mock_manager_cls.return_value = mock_instance
response = client.post(
"/files/upload",
files={"file": ("dup.txt", b"content", "text/plain")},
)
assert response.status_code == 409
assert "already exists" in response.json()["detail"]
# -- Restored upload/download/delete security + invariant tests --
def _upload(
filename: str = "hello.txt",
content: bytes = b"Hello, world!",
content_type: str = "text/plain",
):
"""Helper to POST a file upload."""
return client.post(
"/files/upload?session_id=sess-1",
files={"file": (filename, io.BytesIO(content), content_type)},
)
_MOCK_FILE = WorkspaceFile(
id="file-aaa-bbb",
workspace_id="ws-001",
created_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
updated_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
name="hello.txt",
path="/sessions/sess-1/hello.txt",
mime_type="text/plain",
size_bytes=13,
storage_path="local://hello.txt",
)
# ---- Happy path ----
def test_upload_happy_path(mocker):
def test_upload_happy_path(mocker: pytest_mock.MockFixture):
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=_make_workspace(),
return_value=MOCK_WORKSPACE,
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
@@ -238,7 +82,7 @@ def test_upload_happy_path(mocker):
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
@@ -252,7 +96,10 @@ def test_upload_happy_path(mocker):
assert data["size_bytes"] == 13
def test_upload_exceeds_max_file_size(mocker):
# ---- Per-file size limit ----
def test_upload_exceeds_max_file_size(mocker: pytest_mock.MockFixture):
"""Files larger than max_file_size_mb should be rejected with 413."""
cfg = mocker.patch("backend.api.features.workspace.routes.Config")
cfg.return_value.max_file_size_mb = 0 # 0 MB → any content is too big
@@ -262,11 +109,15 @@ def test_upload_exceeds_max_file_size(mocker):
assert response.status_code == 413
def test_upload_storage_quota_exceeded(mocker):
# ---- Storage quota exceeded ----
def test_upload_storage_quota_exceeded(mocker: pytest_mock.MockFixture):
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=_make_workspace(),
return_value=MOCK_WORKSPACE,
)
# Current usage already at limit
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
return_value=500 * 1024 * 1024,
@@ -277,22 +128,27 @@ def test_upload_storage_quota_exceeded(mocker):
assert "Storage limit exceeded" in response.text
def test_upload_post_write_quota_race(mocker):
"""Concurrent upload tipping over limit after write should soft-delete + 413."""
# ---- Post-write quota race (B2) ----
def test_upload_post_write_quota_race(mocker: pytest_mock.MockFixture):
"""If a concurrent upload tips the total over the limit after write,
the file should be soft-deleted and 413 returned."""
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=_make_workspace(),
return_value=MOCK_WORKSPACE,
)
# Pre-write check passes (under limit), but post-write check fails
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
side_effect=[0, 600 * 1024 * 1024],
side_effect=[0, 600 * 1024 * 1024], # first call OK, second over limit
)
mocker.patch(
"backend.api.features.workspace.routes.scan_content_safe",
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
@@ -304,14 +160,17 @@ def test_upload_post_write_quota_race(mocker):
response = _upload()
assert response.status_code == 413
mock_delete.assert_called_once_with("file-aaa-bbb", "ws-001")
mock_delete.assert_called_once_with("file-aaa-bbb", "ws-1")
def test_upload_any_extension(mocker):
# ---- Any extension accepted (no allowlist) ----
def test_upload_any_extension(mocker: pytest_mock.MockFixture):
"""Any file extension should be accepted — ClamAV is the security layer."""
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=_make_workspace(),
return_value=MOCK_WORKSPACE,
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
@@ -322,7 +181,7 @@ def test_upload_any_extension(mocker):
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
@@ -332,13 +191,16 @@ def test_upload_any_extension(mocker):
assert response.status_code == 200
def test_upload_blocked_by_virus_scan(mocker):
# ---- Virus scan rejection ----
def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):
"""Files flagged by ClamAV should be rejected and never written to storage."""
from backend.api.features.store.exceptions import VirusDetectedError
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=_make_workspace(),
return_value=MOCK_WORKSPACE,
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
@@ -349,7 +211,7 @@ def test_upload_blocked_by_virus_scan(mocker):
side_effect=VirusDetectedError("Eicar-Test-Signature"),
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
@@ -357,14 +219,18 @@ def test_upload_blocked_by_virus_scan(mocker):
response = _upload(filename="evil.exe", content=b"X5O!P%@AP...")
assert response.status_code == 400
assert "Virus detected" in response.text
mock_manager.write_file.assert_not_called()
def test_upload_file_without_extension(mocker):
# ---- No file extension ----
def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
"""Files without an extension should be accepted and stored as-is."""
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=_make_workspace(),
return_value=MOCK_WORKSPACE,
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
@@ -375,7 +241,7 @@ def test_upload_file_without_extension(mocker):
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
@@ -391,11 +257,14 @@ def test_upload_file_without_extension(mocker):
assert mock_manager.write_file.call_args[0][1] == "Makefile"
def test_upload_strips_path_components(mocker):
# ---- Filename sanitization (SF5) ----
def test_upload_strips_path_components(mocker: pytest_mock.MockFixture):
"""Path-traversal filenames should be reduced to their basename."""
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=_make_workspace(),
return_value=MOCK_WORKSPACE,
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
@@ -406,23 +275,28 @@ def test_upload_strips_path_components(mocker):
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
)
# Filename with traversal
_upload(filename="../../etc/passwd.txt")
# write_file should have been called with just the basename
mock_manager.write_file.assert_called_once()
call_args = mock_manager.write_file.call_args
assert call_args[0][1] == "passwd.txt"
def test_download_file_not_found(mocker):
# ---- Download ----
def test_download_file_not_found(mocker: pytest_mock.MockFixture):
mocker.patch(
"backend.api.features.workspace.routes.get_workspace",
return_value=_make_workspace(),
return_value=MOCK_WORKSPACE,
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_file",
@@ -433,11 +307,14 @@ def test_download_file_not_found(mocker):
assert response.status_code == 404
def test_delete_file_success(mocker):
# ---- Delete ----
def test_delete_file_success(mocker: pytest_mock.MockFixture):
"""Deleting an existing file should return {"deleted": true}."""
mocker.patch(
"backend.api.features.workspace.routes.get_workspace",
return_value=_make_workspace(),
return_value=MOCK_WORKSPACE,
)
mock_manager = mocker.MagicMock()
mock_manager.delete_file = mocker.AsyncMock(return_value=True)
@@ -452,11 +329,11 @@ def test_delete_file_success(mocker):
mock_manager.delete_file.assert_called_once_with("file-aaa-bbb")
def test_delete_file_not_found(mocker):
def test_delete_file_not_found(mocker: pytest_mock.MockFixture):
"""Deleting a non-existent file should return 404."""
mocker.patch(
"backend.api.features.workspace.routes.get_workspace",
return_value=_make_workspace(),
return_value=MOCK_WORKSPACE,
)
mock_manager = mocker.MagicMock()
mock_manager.delete_file = mocker.AsyncMock(return_value=False)
@@ -470,7 +347,7 @@ def test_delete_file_not_found(mocker):
assert "File not found" in response.text
def test_delete_file_no_workspace(mocker):
def test_delete_file_no_workspace(mocker: pytest_mock.MockFixture):
"""Deleting when user has no workspace should return 404."""
mocker.patch(
"backend.api.features.workspace.routes.get_workspace",
@@ -480,123 +357,3 @@ def test_delete_file_no_workspace(mocker):
response = client.delete("/files/file-aaa-bbb")
assert response.status_code == 404
assert "Workspace not found" in response.text
def test_upload_write_file_too_large_returns_413(mocker):
"""write_file raises ValueError("File too large: …") → must map to 413."""
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=_make_workspace(),
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
return_value=0,
)
mocker.patch(
"backend.api.features.workspace.routes.scan_content_safe",
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(
side_effect=ValueError("File too large: 900 bytes exceeds 1MB limit")
)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
)
response = _upload()
assert response.status_code == 413
assert "File too large" in response.text
def test_upload_write_file_conflict_returns_409(mocker):
"""Non-'File too large' ValueErrors from write_file stay as 409."""
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=_make_workspace(),
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
return_value=0,
)
mocker.patch(
"backend.api.features.workspace.routes.scan_content_safe",
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(
side_effect=ValueError("File already exists at path: /sessions/x/a.txt")
)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
)
response = _upload()
assert response.status_code == 409
assert "already exists" in response.text
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_has_more_true_when_limit_exceeded(
mock_manager_cls, mock_get_workspace
):
"""The limit+1 fetch trick must flip has_more=True and trim the page."""
mock_get_workspace.return_value = _make_workspace()
# Backend was asked for limit+1=3, and returned exactly 3 items.
files = [
_make_file(id="f1", name="a.txt"),
_make_file(id="f2", name="b.txt"),
_make_file(id="f3", name="c.txt"),
]
mock_instance = AsyncMock()
mock_instance.list_files.return_value = files
mock_manager_cls.return_value = mock_instance
response = client.get("/files?limit=2")
assert response.status_code == 200
data = response.json()
assert data["has_more"] is True
assert len(data["files"]) == 2
assert data["files"][0]["id"] == "f1"
assert data["files"][1]["id"] == "f2"
mock_instance.list_files.assert_called_once_with(
limit=3, offset=0, include_all_sessions=True
)
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_has_more_false_when_exactly_page_size(
mock_manager_cls, mock_get_workspace
):
"""Exactly `limit` rows means we're on the last page — has_more=False."""
mock_get_workspace.return_value = _make_workspace()
files = [_make_file(id="f1", name="a.txt"), _make_file(id="f2", name="b.txt")]
mock_instance = AsyncMock()
mock_instance.list_files.return_value = files
mock_manager_cls.return_value = mock_instance
response = client.get("/files?limit=2")
assert response.status_code == 200
data = response.json()
assert data["has_more"] is False
assert len(data["files"]) == 2
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_offset_is_echoed_back(mock_manager_cls, mock_get_workspace):
mock_get_workspace.return_value = _make_workspace()
mock_instance = AsyncMock()
mock_instance.list_files.return_value = []
mock_manager_cls.return_value = mock_instance
response = client.get("/files?offset=50&limit=10")
assert response.status_code == 200
assert response.json()["offset"] == 50
mock_instance.list_files.assert_called_once_with(
limit=11, offset=50, include_all_sessions=True
)

View File

@@ -18,6 +18,7 @@ from prisma.errors import PrismaError
import backend.api.features.admin.credit_admin_routes
import backend.api.features.admin.execution_analytics_routes
import backend.api.features.admin.platform_cost_routes
import backend.api.features.admin.rate_limit_admin_routes
import backend.api.features.admin.store_admin_routes
import backend.api.features.builder
@@ -329,6 +330,11 @@ app.include_router(
tags=["v2", "admin"],
prefix="/api/copilot",
)
app.include_router(
backend.api.features.admin.platform_cost_routes.router,
tags=["v2", "admin"],
prefix="/api/platform-costs",
)
app.include_router(
backend.api.features.executions.review.routes.router,
tags=["v2", "executions", "review"],

View File

@@ -18,6 +18,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -358,6 +359,7 @@ class AIShortformVideoCreatorBlock(Block):
execution_context=execution_context,
return_format="for_block_output",
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "video_url", stored_url
@@ -565,6 +567,7 @@ class AIAdMakerVideoCreatorBlock(Block):
execution_context=execution_context,
return_format="for_block_output",
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "video_url", stored_url
@@ -760,4 +763,5 @@ class AIScreenshotToVideoAdBlock(Block):
execution_context=execution_context,
return_format="for_block_output",
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "video_url", stored_url

View File

@@ -17,7 +17,7 @@ from backend.blocks.apollo.models import (
PrimaryPhone,
SearchOrganizationsRequest,
)
from backend.data.model import CredentialsField, SchemaField
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
class SearchOrganizationsBlock(Block):
@@ -218,6 +218,7 @@ To find IDs, identify the values for organization_id when you call this endpoint
) -> BlockOutput:
query = SearchOrganizationsRequest(**input_data.model_dump())
organizations = await self.search_organizations(query, credentials)
self.merge_stats(NodeExecutionStats(output_size=len(organizations)))
for organization in organizations:
yield "organization", organization
yield "organizations", organizations

View File

@@ -21,7 +21,7 @@ from backend.blocks.apollo.models import (
SearchPeopleRequest,
SenorityLevels,
)
from backend.data.model import CredentialsField, SchemaField
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
class SearchPeopleBlock(Block):
@@ -366,4 +366,5 @@ class SearchPeopleBlock(Block):
*(enrich_or_fallback(person) for person in people)
)
self.merge_stats(NodeExecutionStats(output_size=len(people)))
yield "people", people

View File

@@ -13,7 +13,7 @@ from backend.blocks.apollo._auth import (
ApolloCredentialsInput,
)
from backend.blocks.apollo.models import Contact, EnrichPersonRequest
from backend.data.model import CredentialsField, SchemaField
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
class GetPersonDetailBlock(Block):
@@ -141,4 +141,5 @@ class GetPersonDetailBlock(Block):
**kwargs,
) -> BlockOutput:
query = EnrichPersonRequest(**input_data.model_dump())
self.merge_stats(NodeExecutionStats(output_size=1))
yield "contact", await self.enrich_person(query, credentials)

View File

@@ -17,6 +17,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -342,6 +343,7 @@ class ExecuteCodeBlock(Block, BaseE2BExecutorMixin):
# Determine result object shape & filter out empty formats
main_result, results = self.process_execution_results(results)
self.merge_stats(NodeExecutionStats(output_size=1))
if main_result:
yield "main_result", main_result
yield "results", results
@@ -467,6 +469,7 @@ class InstantiateCodeSandboxBlock(Block, BaseE2BExecutorMixin):
setup_commands=input_data.setup_commands,
timeout=input_data.timeout,
)
self.merge_stats(NodeExecutionStats(output_size=1))
if sandbox_id:
yield "sandbox_id", sandbox_id
else:
@@ -577,6 +580,7 @@ class ExecuteCodeStepBlock(Block, BaseE2BExecutorMixin):
# Determine result object shape & filter out empty formats
main_result, results = self.process_execution_results(results)
self.merge_stats(NodeExecutionStats(output_size=1))
if main_result:
yield "main_result", main_result
yield "results", results

View File

@@ -15,7 +15,12 @@ from backend.blocks._base import (
BlockSchemaInput,
BlockSchemaOutput,
)
from backend.data.model import APIKeyCredentials, CredentialsField, SchemaField
from backend.data.model import (
APIKeyCredentials,
CredentialsField,
NodeExecutionStats,
SchemaField,
)
from backend.util.type import MediaFileType
from ._api import (
@@ -195,6 +200,7 @@ class GetLinkedinProfileBlock(Block):
include_social_media=input_data.include_social_media,
include_extra=input_data.include_extra,
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "profile", profile
except Exception as e:
logger.error(f"Error fetching LinkedIn profile: {str(e)}")
@@ -341,6 +347,7 @@ class LinkedinPersonLookupBlock(Block):
include_similarity_checks=input_data.include_similarity_checks,
enrich_profile=input_data.enrich_profile,
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "lookup_result", lookup_result
except Exception as e:
logger.error(f"Error looking up LinkedIn profile: {str(e)}")
@@ -443,6 +450,7 @@ class LinkedinRoleLookupBlock(Block):
company_name=input_data.company_name,
enrich_profile=input_data.enrich_profile,
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "role_lookup_result", role_lookup_result
except Exception as e:
logger.error(f"Error looking up role in company: {str(e)}")
@@ -523,6 +531,7 @@ class GetLinkedinProfilePictureBlock(Block):
credentials=credentials,
linkedin_profile_url=input_data.linkedin_profile_url,
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "profile_picture_url", profile_picture
except Exception as e:
logger.error(f"Error getting profile picture: {str(e)}")

View File

@@ -4,6 +4,7 @@ from typing import Optional
from exa_py import AsyncExa
from pydantic import BaseModel
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -223,3 +224,6 @@ class ExaContentsBlock(Block):
if response.cost_dollars:
yield "cost_dollars", response.cost_dollars
self.merge_stats(
NodeExecutionStats(provider_cost=response.cost_dollars.total)
)

View File

@@ -4,6 +4,7 @@ from typing import Optional
from exa_py import AsyncExa
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -206,3 +207,6 @@ class ExaSearchBlock(Block):
if response.cost_dollars:
yield "cost_dollars", response.cost_dollars
self.merge_stats(
NodeExecutionStats(provider_cost=response.cost_dollars.total)
)

View File

@@ -18,7 +18,7 @@ from backend.blocks.fal._auth import (
FalCredentialsInput,
)
from backend.data.execution import ExecutionContext
from backend.data.model import SchemaField
from backend.data.model import NodeExecutionStats, SchemaField
from backend.util.file import store_media_file
from backend.util.request import ClientResponseError, Requests
from backend.util.type import MediaFileType
@@ -230,6 +230,7 @@ class AIVideoGeneratorBlock(Block):
execution_context=execution_context,
return_format="for_block_output",
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "video_url", stored_url
except Exception as e:
error_message = str(e)

View File

@@ -14,6 +14,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -117,6 +118,7 @@ class GoogleMapsSearchBlock(Block):
input_data.radius,
input_data.max_results,
)
self.merge_stats(NodeExecutionStats(output_size=len(places)))
for place in places:
yield "place", place

View File

@@ -14,6 +14,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -227,6 +228,7 @@ class IdeogramModelBlock(Block):
image_url=result,
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "result", result
async def run_model(

View File

@@ -10,7 +10,7 @@ from backend.blocks.jina._auth import (
JinaCredentialsField,
JinaCredentialsInput,
)
from backend.data.model import SchemaField
from backend.data.model import NodeExecutionStats, SchemaField
from backend.util.request import Requests
@@ -45,5 +45,13 @@ class JinaEmbeddingBlock(Block):
}
data = {"input": input_data.texts, "model": input_data.model}
response = await Requests().post(url, headers=headers, json=data)
embeddings = [e["embedding"] for e in response.json()["data"]]
resp_json = response.json()
embeddings = [e["embedding"] for e in resp_json["data"]]
usage = resp_json.get("usage", {})
if usage.get("total_tokens"):
self.merge_stats(
NodeExecutionStats(
input_token_count=usage.get("total_tokens", 0),
)
)
yield "embeddings", embeddings

View File

@@ -205,19 +205,6 @@ class LlmModel(str, Enum, metaclass=LlmModelMeta):
KIMI_K2 = "moonshotai/kimi-k2"
QWEN3_235B_A22B_THINKING = "qwen/qwen3-235b-a22b-thinking-2507"
QWEN3_CODER = "qwen/qwen3-coder"
# Z.ai (Zhipu) models
ZAI_GLM_4_32B = "z-ai/glm-4-32b"
ZAI_GLM_4_5 = "z-ai/glm-4.5"
ZAI_GLM_4_5_AIR = "z-ai/glm-4.5-air"
ZAI_GLM_4_5_AIR_FREE = "z-ai/glm-4.5-air:free"
ZAI_GLM_4_5V = "z-ai/glm-4.5v"
ZAI_GLM_4_6 = "z-ai/glm-4.6"
ZAI_GLM_4_6V = "z-ai/glm-4.6v"
ZAI_GLM_4_7 = "z-ai/glm-4.7"
ZAI_GLM_4_7_FLASH = "z-ai/glm-4.7-flash"
ZAI_GLM_5 = "z-ai/glm-5"
ZAI_GLM_5_TURBO = "z-ai/glm-5-turbo"
ZAI_GLM_5V_TURBO = "z-ai/glm-5v-turbo"
# Llama API models
LLAMA_API_LLAMA_4_SCOUT = "Llama-4-Scout-17B-16E-Instruct-FP8"
LLAMA_API_LLAMA4_MAVERICK = "Llama-4-Maverick-17B-128E-Instruct-FP8"
@@ -643,43 +630,6 @@ MODEL_METADATA = {
LlmModel.QWEN3_CODER: ModelMetadata(
"open_router", 262144, 262144, "Qwen 3 Coder", "OpenRouter", "Qwen", 3
),
# https://openrouter.ai/models?q=z-ai
LlmModel.ZAI_GLM_4_32B: ModelMetadata(
"open_router", 128000, 128000, "GLM 4 32B", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_4_5: ModelMetadata(
"open_router", 131072, 98304, "GLM 4.5", "OpenRouter", "Z.ai", 2
),
LlmModel.ZAI_GLM_4_5_AIR: ModelMetadata(
"open_router", 131072, 98304, "GLM 4.5 Air", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_4_5_AIR_FREE: ModelMetadata(
"open_router", 131072, 96000, "GLM 4.5 Air (Free)", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_4_5V: ModelMetadata(
"open_router", 65536, 16384, "GLM 4.5V", "OpenRouter", "Z.ai", 2
),
LlmModel.ZAI_GLM_4_6: ModelMetadata(
"open_router", 204800, 204800, "GLM 4.6", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_4_6V: ModelMetadata(
"open_router", 131072, 131072, "GLM 4.6V", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_4_7: ModelMetadata(
"open_router", 202752, 65535, "GLM 4.7", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_4_7_FLASH: ModelMetadata(
"open_router", 202752, 202752, "GLM 4.7 Flash", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_5: ModelMetadata(
"open_router", 80000, 80000, "GLM 5", "OpenRouter", "Z.ai", 2
),
LlmModel.ZAI_GLM_5_TURBO: ModelMetadata(
"open_router", 202752, 131072, "GLM 5 Turbo", "OpenRouter", "Z.ai", 3
),
LlmModel.ZAI_GLM_5V_TURBO: ModelMetadata(
"open_router", 202752, 131072, "GLM 5V Turbo", "OpenRouter", "Z.ai", 3
),
# Llama API models
LlmModel.LLAMA_API_LLAMA_4_SCOUT: ModelMetadata(
"llama_api",
@@ -737,6 +687,7 @@ class LLMResponse(BaseModel):
prompt_tokens: int
completion_tokens: int
reasoning: Optional[str] = None
provider_cost: float | None = None
def convert_openai_tool_fmt_to_anthropic(
@@ -1095,6 +1046,16 @@ async def llm_call(
tool_calls = extract_openai_tool_calls(response)
reasoning = extract_openai_reasoning(response)
cost = None
try:
raw_resp = getattr(response, "_response", None)
if raw_resp and hasattr(raw_resp, "headers"):
cost_header = raw_resp.headers.get("x-total-cost")
if cost_header:
cost = float(cost_header)
except (ValueError, AttributeError):
pass
return LLMResponse(
raw_response=response.choices[0].message,
prompt=prompt,
@@ -1103,6 +1064,7 @@ async def llm_call(
prompt_tokens=response.usage.prompt_tokens if response.usage else 0,
completion_tokens=response.usage.completion_tokens if response.usage else 0,
reasoning=reasoning,
provider_cost=cost,
)
elif provider == "llama_api":
tools_param = tools if tools else openai.NOT_GIVEN
@@ -1427,12 +1389,13 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
max_tokens=input_data.max_tokens,
)
response_text = llm_response.response
self.merge_stats(
NodeExecutionStats(
input_token_count=llm_response.prompt_tokens,
output_token_count=llm_response.completion_tokens,
)
cost_stats = NodeExecutionStats(
input_token_count=llm_response.prompt_tokens,
output_token_count=llm_response.completion_tokens,
)
if llm_response.provider_cost is not None:
cost_stats.provider_cost = llm_response.provider_cost
self.merge_stats(cost_stats)
logger.debug(f"LLM attempt-{retry_count} response: {response_text}")
if input_data.expected_format:

View File

@@ -8,6 +8,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -153,6 +154,7 @@ class AddMemoryBlock(Block, Mem0Base):
messages,
**params,
)
self.merge_stats(NodeExecutionStats(output_size=1))
results = result.get("results", [])
yield "results", results
@@ -255,6 +257,7 @@ class SearchMemoryBlock(Block, Mem0Base):
result: list[dict[str, Any]] = client.search(
input_data.query, version="v2", filters=filters
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "memories", result
except Exception as e:
@@ -340,6 +343,7 @@ class GetAllMemoriesBlock(Block, Mem0Base):
filters=filters,
version="v2",
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "memories", memories
@@ -434,6 +438,7 @@ class GetLatestMemoryBlock(Block, Mem0Base):
filters=filters,
version="v2",
)
self.merge_stats(NodeExecutionStats(output_size=1))
if memories:
# Return the latest memory (first in the list as they're sorted by recency)

View File

@@ -10,7 +10,7 @@ from backend.blocks.nvidia._auth import (
NvidiaCredentialsField,
NvidiaCredentialsInput,
)
from backend.data.model import SchemaField
from backend.data.model import NodeExecutionStats, SchemaField
from backend.util.request import Requests
from backend.util.type import MediaFileType
@@ -69,6 +69,7 @@ class NvidiaDeepfakeDetectBlock(Block):
data = response.json()
result = data.get("data", [{}])[0]
self.merge_stats(NodeExecutionStats(output_size=1))
# Get deepfake probability from first bounding box if any
deepfake_prob = 0.0

View File

@@ -17,7 +17,12 @@ from backend.blocks.replicate._auth import (
ReplicateCredentialsInput,
)
from backend.blocks.replicate._helper import ReplicateOutputs, extract_result
from backend.data.model import APIKeyCredentials, CredentialsField, SchemaField
from backend.data.model import (
APIKeyCredentials,
CredentialsField,
NodeExecutionStats,
SchemaField,
)
from backend.util.exceptions import BlockExecutionError, BlockInputError
logger = logging.getLogger(__name__)
@@ -108,6 +113,7 @@ class ReplicateModelBlock(Block):
result = await self.run_model(
model_ref, input_data.model_inputs, credentials.api_key
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "result", result
yield "status", "succeeded"
yield "model_name", input_data.model_name

View File

@@ -16,6 +16,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -185,6 +186,7 @@ class ScreenshotWebPageBlock(Block):
block_chats=input_data.block_chats,
cache=input_data.cache,
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "image", screenshot_data["image"]
except Exception as e:
yield "error", str(e)

View File

@@ -15,6 +15,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -146,6 +147,7 @@ class GetWeatherInformationBlock(Block, GetRequest):
weather_data = await self.get_request(url, json=True)
if "main" in weather_data and "weather" in weather_data:
self.merge_stats(NodeExecutionStats(output_size=1))
yield "temperature", str(weather_data["main"]["temp"])
yield "humidity", str(weather_data["main"]["humidity"])
yield "condition", weather_data["weather"][0]["description"]

View File

@@ -23,7 +23,7 @@ from backend.blocks.smartlead.models import (
SaveSequencesResponse,
Sequence,
)
from backend.data.model import CredentialsField, SchemaField
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
class CreateCampaignBlock(Block):
@@ -100,6 +100,7 @@ class CreateCampaignBlock(Block):
**kwargs,
) -> BlockOutput:
response = await self.create_campaign(input_data.name, credentials)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "id", response.id
yield "name", response.name
@@ -226,6 +227,7 @@ class AddLeadToCampaignBlock(Block):
response = await self.add_leads_to_campaign(
input_data.campaign_id, input_data.lead_list, credentials
)
self.merge_stats(NodeExecutionStats(output_size=len(input_data.lead_list)))
yield "campaign_id", input_data.campaign_id
yield "upload_count", response.upload_count
@@ -321,6 +323,7 @@ class SaveCampaignSequencesBlock(Block):
response = await self.save_campaign_sequences(
input_data.campaign_id, input_data.sequences, credentials
)
self.merge_stats(NodeExecutionStats(output_size=1))
if response.data:
yield "data", response.data

View File

@@ -66,11 +66,7 @@ class SQLQueryBlock(Block):
advanced=False,
)
host: SecretStr = SchemaField(
description=(
"Database hostname or IP address. "
"Treated as a secret to avoid leaking infrastructure details. "
"Private/internal IPs are blocked (SSRF protection)."
),
description="Database hostname or IP address",
placeholder="db.example.com",
secret=True,
)
@@ -121,12 +117,6 @@ class SQLQueryBlock(Block):
description="Column names from the query result"
)
row_count: int = SchemaField(description="Number of rows returned")
truncated: bool = SchemaField(
description=(
"True when the result set was capped by max_rows, "
"indicating additional rows exist in the database"
)
)
affected_rows: int = SchemaField(
description="Number of rows affected by a write query (INSERT/UPDATE/DELETE)"
)
@@ -157,14 +147,12 @@ class SQLQueryBlock(Block):
("results", [{"test_col": 1}]),
("columns", ["test_col"]),
("row_count", 1),
("truncated", False),
],
test_mock={
"execute_query": lambda *_args, **_kwargs: (
[{"test_col": 1}],
["test_col"],
-1,
False,
),
"check_host_allowed": lambda *_args, **_kwargs: ["127.0.0.1"],
},
@@ -189,8 +177,8 @@ class SQLQueryBlock(Block):
max_rows: int,
read_only: bool = True,
database_type: DatabaseType = DatabaseType.POSTGRES,
) -> tuple[list[dict[str, Any]], list[str], int, bool]:
"""Execute a SQL query and return (rows, columns, affected_rows, truncated).
) -> tuple[list[dict[str, Any]], list[str], int]:
"""Execute a SQL query and return (rows, columns, affected_rows).
Delegates to ``_execute_query`` in ``sql_query_helpers``.
Extracted as a method so it can be mocked during block tests.
@@ -249,7 +237,7 @@ class SQLQueryBlock(Block):
)
try:
results, columns, affected, truncated = await asyncio.to_thread(
results, columns, affected = await asyncio.to_thread(
self.execute_query,
connection_url=connection_url,
query=input_data.query,
@@ -261,29 +249,22 @@ class SQLQueryBlock(Block):
yield "results", results
yield "columns", columns
yield "row_count", len(results)
yield "truncated", truncated
if affected >= 0:
yield "affected_rows", affected
except OperationalError as e:
yield (
"error",
self._classify_operational_error(
_sanitize(e),
input_data.timeout,
),
yield "error", self._classify_operational_error(
_sanitize(e),
input_data.timeout,
)
except ProgrammingError as e:
yield "error", f"SQL error: {_sanitize(e)}"
except DBAPIError as e:
yield "error", f"Database error: {_sanitize(e)}"
except ModuleNotFoundError:
yield (
"error",
(
f"Database driver not available for "
f"{input_data.database_type.value}. "
f"Please contact the platform administrator."
),
yield "error", (
f"Database driver not available for "
f"{input_data.database_type.value}. "
f"Please contact the platform administrator."
)
@staticmethod

View File

@@ -4,20 +4,15 @@ error sanitization, SSRF protection, and read/write mode behavior."""
from datetime import date, datetime, time
from decimal import Decimal
from typing import Any
from unittest.mock import AsyncMock, MagicMock, patch
from unittest.mock import AsyncMock
import pytest
from pydantic import SecretStr
from sqlalchemy import create_engine, text
from sqlalchemy.exc import OperationalError
from backend.blocks.sql_query_block import SQLQueryBlock, UserPasswordCredentials
from backend.blocks.sql_query_helpers import (
_CONNECT_TIMEOUT_SECONDS,
DatabaseType,
_configure_session,
_execute_query,
_run_in_transaction,
_sanitize_error,
_serialize_value,
_validate_query_is_read_only,
@@ -563,12 +558,11 @@ class TestSQLQueryBlockRunErrorHandling:
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
mock_rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
mock_cols = ["id", "name"]
block.execute_query = lambda **_kwargs: (mock_rows, mock_cols, -1, False) # type: ignore[assignment]
block.execute_query = lambda **_kwargs: (mock_rows, mock_cols, -1) # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
assert outputs["results"] == mock_rows
assert outputs["columns"] == mock_cols
assert outputs["row_count"] == 2
assert outputs["truncated"] is False
# SELECT does not produce affected_rows
assert "affected_rows" not in outputs
@@ -591,7 +585,7 @@ class TestSQLQueryBlockRunErrorHandling:
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
mock_rows = [{"id": 1}]
mock_cols = ["id"]
block.execute_query = lambda **_kwargs: (mock_rows, mock_cols, -1, False) # type: ignore[assignment]
block.execute_query = lambda **_kwargs: (mock_rows, mock_cols, -1) # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
assert "error" not in outputs
assert outputs["results"] == mock_rows
@@ -616,7 +610,7 @@ class TestSQLQueryBlockWriteMode:
read_only=False,
)
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
block.execute_query = lambda **_kwargs: ([], [], 1, False) # type: ignore[assignment]
block.execute_query = lambda **_kwargs: ([], [], 1) # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
assert "error" not in outputs
assert outputs["results"] == []
@@ -633,7 +627,7 @@ class TestSQLQueryBlockWriteMode:
read_only=False,
)
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
block.execute_query = lambda **_kwargs: ([], [], 1, False) # type: ignore[assignment]
block.execute_query = lambda **_kwargs: ([], [], 1) # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
assert "error" not in outputs
assert outputs["affected_rows"] == 1
@@ -648,7 +642,7 @@ class TestSQLQueryBlockWriteMode:
read_only=False,
)
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
block.execute_query = lambda **_kwargs: ([], [], 1, False) # type: ignore[assignment]
block.execute_query = lambda **_kwargs: ([], [], 1) # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
assert "error" not in outputs
assert outputs["affected_rows"] == 1
@@ -663,7 +657,7 @@ class TestSQLQueryBlockWriteMode:
read_only=False,
)
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
block.execute_query = lambda **_kwargs: ([], [], 0, False) # type: ignore[assignment]
block.execute_query = lambda **_kwargs: ([], [], 0) # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
assert "error" not in outputs
@@ -677,7 +671,7 @@ class TestSQLQueryBlockWriteMode:
read_only=False,
)
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
block.execute_query = lambda **_kwargs: ([], [], 0, False) # type: ignore[assignment]
block.execute_query = lambda **_kwargs: ([], [], 0) # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
assert "error" not in outputs
@@ -721,7 +715,7 @@ class TestSQLQueryBlockWriteMode:
read_only=False,
)
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
block.execute_query = lambda **_kwargs: ([], [], 42, False) # type: ignore[assignment]
block.execute_query = lambda **_kwargs: ([], [], 42) # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
assert "error" not in outputs
assert outputs["affected_rows"] == 42
@@ -738,7 +732,7 @@ class TestSQLQueryBlockWriteMode:
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
mock_rows = [{"id": 1, "name": "Alice"}]
mock_cols = ["id", "name"]
block.execute_query = lambda **_kwargs: (mock_rows, mock_cols, -1, False) # type: ignore[assignment]
block.execute_query = lambda **_kwargs: (mock_rows, mock_cols, -1) # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
assert outputs["results"] == mock_rows
assert outputs["columns"] == mock_cols
@@ -766,7 +760,7 @@ class TestSQLQueryBlockReadOnlyMode:
read_only=True,
)
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
block.execute_query = lambda **_kwargs: ([{"col": 1}], ["col"], -1, False) # type: ignore[assignment]
block.execute_query = lambda **_kwargs: ([{"col": 1}], ["col"], -1) # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
assert "error" not in outputs
assert outputs["results"] == [{"col": 1}]
@@ -829,11 +823,9 @@ class TestSQLQueryBlockDefaultPort:
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
captured_conn_str = {}
def fake_execute(
**kwargs: Any,
) -> tuple[list[dict[str, Any]], list[str], int, bool]:
def fake_execute(**kwargs: Any) -> tuple[list[dict[str, Any]], list[str], int]:
captured_conn_str["value"] = str(kwargs["connection_url"])
return [{"id": 1}], ["id"], -1, False
return [{"id": 1}], ["id"], -1
block.execute_query = fake_execute # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
@@ -854,11 +846,9 @@ class TestSQLQueryBlockDefaultPort:
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
captured_conn_str = {}
def fake_execute(
**kwargs: Any,
) -> tuple[list[dict[str, Any]], list[str], int, bool]:
def fake_execute(**kwargs: Any) -> tuple[list[dict[str, Any]], list[str], int]:
captured_conn_str["value"] = str(kwargs["connection_url"])
return [{"id": 1}], ["id"], -1, False
return [{"id": 1}], ["id"], -1
block.execute_query = fake_execute # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
@@ -879,11 +869,9 @@ class TestSQLQueryBlockDefaultPort:
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
captured_conn_str = {}
def fake_execute(
**kwargs: Any,
) -> tuple[list[dict[str, Any]], list[str], int, bool]:
def fake_execute(**kwargs: Any) -> tuple[list[dict[str, Any]], list[str], int]:
captured_conn_str["value"] = str(kwargs["connection_url"])
return [{"id": 1}], ["id"], -1, False
return [{"id": 1}], ["id"], -1
block.execute_query = fake_execute # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
@@ -904,11 +892,9 @@ class TestSQLQueryBlockDefaultPort:
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
captured_conn_str = {}
def fake_execute(
**kwargs: Any,
) -> tuple[list[dict[str, Any]], list[str], int, bool]:
def fake_execute(**kwargs: Any) -> tuple[list[dict[str, Any]], list[str], int]:
captured_conn_str["value"] = str(kwargs["connection_url"])
return [{"id": 1}], ["id"], -1, False
return [{"id": 1}], ["id"], -1
block.execute_query = fake_execute # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
@@ -933,11 +919,9 @@ class TestSQLQueryBlockDNSPinning:
block.check_host_allowed = AsyncMock(return_value=["93.184.216.34"]) # type: ignore[assignment]
captured_conn_str = {}
def fake_execute(
**kwargs: Any,
) -> tuple[list[dict[str, Any]], list[str], int, bool]:
def fake_execute(**kwargs: Any) -> tuple[list[dict[str, Any]], list[str], int]:
captured_conn_str["value"] = str(kwargs["connection_url"])
return [{"id": 1}], ["id"], -1, False
return [{"id": 1}], ["id"], -1
block.execute_query = fake_execute # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
@@ -1074,11 +1058,9 @@ class TestTimeoutEnforcement:
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
captured_kwargs: dict[str, Any] = {}
def fake_execute(
**kwargs: Any,
) -> tuple[list[dict[str, Any]], list[str], int, bool]:
def fake_execute(**kwargs: Any) -> tuple[list[dict[str, Any]], list[str], int]:
captured_kwargs.update(kwargs)
return [{"id": 1}], ["id"], -1, False
return [{"id": 1}], ["id"], -1
block.execute_query = fake_execute # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
@@ -1118,11 +1100,9 @@ class TestMaxRowsEnforcement:
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
captured_kwargs: dict[str, Any] = {}
def fake_execute(
**kwargs: Any,
) -> tuple[list[dict[str, Any]], list[str], int, bool]:
def fake_execute(**kwargs: Any) -> tuple[list[dict[str, Any]], list[str], int]:
captured_kwargs.update(kwargs)
return [{"id": 1}], ["id"], -1, False
return [{"id": 1}], ["id"], -1
block.execute_query = fake_execute # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
@@ -1141,12 +1121,11 @@ class TestMaxRowsEnforcement:
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
# Simulate the database returning exactly max_rows rows (truncated)
mock_rows = [{"id": i} for i in range(max_rows)]
block.execute_query = lambda **_kwargs: (mock_rows, ["id"], -1, True) # type: ignore[assignment]
block.execute_query = lambda **_kwargs: (mock_rows, ["id"], -1) # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
assert "error" not in outputs
assert outputs["row_count"] == max_rows
assert len(outputs["results"]) == max_rows
assert outputs["truncated"] is True
class TestPasswordInErrorMessages:
@@ -1403,11 +1382,9 @@ class TestURLCreateSpecialCharacters:
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
captured: dict[str, Any] = {}
def fake_execute(
**kwargs: Any,
) -> tuple[list[dict[str, Any]], list[str], int, bool]:
def fake_execute(**kwargs: Any) -> tuple[list[dict[str, Any]], list[str], int]:
captured.update(kwargs)
return [{"id": 1}], ["id"], -1, False
return [{"id": 1}], ["id"], -1
block.execute_query = fake_execute # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
@@ -1417,435 +1394,3 @@ class TestURLCreateSpecialCharacters:
assert hasattr(conn_url, "password"), "connection_url should be a URL object"
assert conn_url.password == special_pass
assert conn_url.username == special_user
# ---------------------------------------------------------------------------
# Tests for newly-added disallowed keywords (LOAD, REPLACE, MERGE, BULK, EXEC)
# ---------------------------------------------------------------------------
class TestNewDisallowedKeywords:
"""LOAD, REPLACE, MERGE, BULK, and EXEC must be blocked in read-only mode."""
@pytest.mark.parametrize(
"query,expected_keyword",
[
# MySQL file exfiltration
(
"SELECT * FROM (LOAD DATA LOCAL INFILE '/etc/passwd' INTO TABLE t) AS x",
"LOAD",
),
# MySQL REPLACE (INSERT-or-UPDATE)
(
"SELECT * FROM (REPLACE INTO t VALUES (1, 'a')) AS x",
"REPLACE",
),
# ANSI MERGE (UPSERT)
(
"SELECT * FROM (MERGE INTO t USING s ON t.id = s.id WHEN MATCHED THEN UPDATE SET t.v = s.v) AS x",
"MERGE",
),
# MSSQL BULK INSERT -- sqlparse may match INSERT before BULK,
# but the query is still blocked by the disallowed keyword check.
(
"SELECT * FROM (BULK INSERT t FROM '/data.csv') AS x",
"BULK",
),
# MSSQL EXEC stored procedure
(
"SELECT * FROM (EXEC sp_who2) AS x",
"EXEC",
),
],
)
def test_keyword_blocked_in_subquery_read_only(
self, query: str, expected_keyword: str
):
"""If sqlparse parses the statement as SELECT, the keyword check
must still catch the disallowed keyword inside it."""
error, stmt = _validate_single_statement(query)
if error is None and stmt is not None:
ro_error = _validate_query_is_read_only(stmt)
assert (
ro_error is not None
), f"Expected '{expected_keyword}' to be blocked but query passed: {query}"
assert "Disallowed SQL keyword" in ro_error
@pytest.mark.parametrize(
"query",
[
"LOAD DATA LOCAL INFILE '/etc/passwd' INTO TABLE t",
"REPLACE INTO users (id, name) VALUES (1, 'Alice')",
"MERGE INTO target USING source ON target.id = source.id "
"WHEN MATCHED THEN UPDATE SET target.name = source.name",
"BULK INSERT my_table FROM '/tmp/data.csv'",
"EXEC sp_who2",
],
)
def test_standalone_keyword_rejected_as_non_select(self, query: str):
"""Standalone statements using these keywords are rejected because
their statement type is not SELECT."""
error, stmt = _validate_single_statement(query)
if error is None and stmt is not None:
ro_error = _validate_query_is_read_only(stmt)
assert ro_error is not None
def test_bare_keyword_column_names_blocked(self):
"""Bare column names that match disallowed keywords (load, replace,
merge) ARE blocked because sqlparse classifies them as Keyword tokens.
Users should quote such column names (e.g. "load") to avoid this."""
query = "SELECT load, replace, merge FROM metrics"
error, stmt = _validate_single_statement(query)
assert error is None
assert stmt is not None
ro_error = _validate_query_is_read_only(stmt)
assert ro_error is not None
def test_quoted_keyword_column_names_allowed(self):
"""Double-quoted column names that happen to match disallowed keywords
must NOT trigger false positives (sqlparse classifies them as Names)."""
query = 'SELECT "load", "replace", "merge" FROM metrics'
error, stmt = _validate_single_statement(query)
assert error is None
assert stmt is not None
ro_error = _validate_query_is_read_only(stmt)
assert ro_error is None
def test_load_in_string_literal_allowed(self):
"""String literal containing 'LOAD' must not be blocked."""
query = "SELECT * FROM logs WHERE message = 'LOAD DATA INFILE test'"
error, stmt = _validate_single_statement(query)
assert error is None
assert stmt is not None
ro_error = _validate_query_is_read_only(stmt)
assert ro_error is None
# ---------------------------------------------------------------------------
# Decimal NaN / Infinity serialization tests
# ---------------------------------------------------------------------------
class TestSerializeDecimalSpecialValues:
"""Decimal NaN and Infinity must be serialized as strings, not crash."""
def test_decimal_nan(self):
result = _serialize_value(Decimal("NaN"))
assert isinstance(result, str)
assert "NaN" in result
def test_decimal_infinity(self):
result = _serialize_value(Decimal("Infinity"))
assert isinstance(result, str)
assert "Infinity" in result
def test_decimal_negative_infinity(self):
result = _serialize_value(Decimal("-Infinity"))
assert isinstance(result, str)
assert "-Infinity" in result
def test_decimal_snan(self):
result = _serialize_value(Decimal("sNaN"))
assert isinstance(result, str)
# ---------------------------------------------------------------------------
# ModuleNotFoundError path test
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
class TestModuleNotFoundError:
"""The block must yield a clean error when a database driver is missing."""
async def test_missing_driver_yields_clean_error(self):
block = SQLQueryBlock()
creds = _make_credentials()
input_data = _make_input(
creds,
query="SELECT 1",
database_type=DatabaseType.MSSQL,
read_only=False,
)
block.check_host_allowed = AsyncMock(return_value=["1.2.3.4"]) # type: ignore[assignment]
def raise_module_not_found(**_kwargs: Any) -> None:
raise ModuleNotFoundError("No module named 'pymssql'")
block.execute_query = raise_module_not_found # type: ignore[assignment]
outputs = await _collect_outputs(block, input_data, creds)
assert "error" in outputs
assert "driver not available" in outputs["error"].lower()
assert "mssql" in outputs["error"].lower()
# ---------------------------------------------------------------------------
# Integration tests: _execute_query, _configure_session, _run_in_transaction
# using an in-memory SQLite database.
#
# SQLite does not support SET commands, so _configure_session is a no-op for
# the "sqlite" dialect. These tests validate the execution pipeline itself:
# transaction management, fetchmany truncation, COMMIT vs ROLLBACK, and the
# new truncated flag.
# ---------------------------------------------------------------------------
def _extract_sql_texts(conn_mock: MagicMock) -> list[str]:
"""Extract the raw SQL strings from a mocked connection's execute calls."""
return [c[0][0].text for c in conn_mock.execute.call_args_list]
class TestConfigureSessionDialects:
"""Verify _configure_session emits correct SET commands per dialect."""
def test_postgresql_sets_timeout_and_read_only(self):
conn = MagicMock()
_configure_session(conn, "postgresql", "30000", read_only=True)
sqls = _extract_sql_texts(conn)
assert any("statement_timeout" in s and "30000" in s for s in sqls)
assert any("default_transaction_read_only" in s for s in sqls)
def test_postgresql_timeout_only_when_not_read_only(self):
conn = MagicMock()
_configure_session(conn, "postgresql", "5000", read_only=False)
assert conn.execute.call_count == 1
sqls = _extract_sql_texts(conn)
assert "statement_timeout" in sqls[0]
def test_mysql_sets_max_execution_time_and_read_only(self):
conn = MagicMock()
_configure_session(conn, "mysql", "10000", read_only=True)
sqls = _extract_sql_texts(conn)
assert any("MAX_EXECUTION_TIME" in s and "10000" in s for s in sqls)
assert any("READ ONLY" in s for s in sqls)
def test_mysql_timeout_only_when_not_read_only(self):
conn = MagicMock()
_configure_session(conn, "mysql", "10000", read_only=False)
assert conn.execute.call_count == 1
sqls = _extract_sql_texts(conn)
assert "MAX_EXECUTION_TIME" in sqls[0]
def test_mssql_sets_lock_timeout(self):
conn = MagicMock()
_configure_session(conn, "mssql", "15000", read_only=True)
sqls = _extract_sql_texts(conn)
assert any("LOCK_TIMEOUT" in s and "15000" in s for s in sqls)
assert conn.execute.call_count == 1
def test_unknown_dialect_is_noop(self):
conn = MagicMock()
_configure_session(conn, "sqlite", "5000", read_only=True)
conn.execute.assert_not_called()
class TestRunInTransactionSQLite:
"""Integration tests for _run_in_transaction using a real SQLite engine."""
def _make_sqlite_engine(self):
return create_engine("sqlite:///:memory:")
def test_select_returns_rows(self):
engine = self._make_sqlite_engine()
with engine.connect() as conn:
conn = conn.execution_options(isolation_level="AUTOCOMMIT")
conn.execute(text("CREATE TABLE t (id INTEGER, name TEXT)"))
conn.execute(text("INSERT INTO t VALUES (1, 'Alice')"))
conn.execute(text("INSERT INTO t VALUES (2, 'Bob')"))
results, columns, affected, truncated = _run_in_transaction(
conn,
"sqlite",
"SELECT * FROM t ORDER BY id",
max_rows=100,
read_only=True,
)
engine.dispose()
assert columns == ["id", "name"]
assert len(results) == 2
assert results[0] == {"id": 1, "name": "Alice"}
assert results[1] == {"id": 2, "name": "Bob"}
assert affected == -1
assert truncated is False
def test_fetchmany_truncation(self):
engine = self._make_sqlite_engine()
with engine.connect() as conn:
conn = conn.execution_options(isolation_level="AUTOCOMMIT")
conn.execute(text("CREATE TABLE t (id INTEGER)"))
for i in range(10):
conn.execute(text(f"INSERT INTO t VALUES ({i})"))
results, columns, affected, truncated = _run_in_transaction(
conn,
"sqlite",
"SELECT * FROM t ORDER BY id",
max_rows=3,
read_only=True,
)
engine.dispose()
assert len(results) == 3
assert truncated is True
assert results[0]["id"] == 0
assert results[2]["id"] == 2
def test_fetchmany_exact_count_not_truncated(self):
"""When result count equals max_rows, truncated is True because
there *might* be more rows (we cannot know without fetching more)."""
engine = self._make_sqlite_engine()
with engine.connect() as conn:
conn = conn.execution_options(isolation_level="AUTOCOMMIT")
conn.execute(text("CREATE TABLE t (id INTEGER)"))
for i in range(5):
conn.execute(text(f"INSERT INTO t VALUES ({i})"))
results, _, _, truncated = _run_in_transaction(
conn,
"sqlite",
"SELECT * FROM t",
max_rows=5,
read_only=True,
)
engine.dispose()
assert len(results) == 5
assert truncated is True
def test_fewer_rows_than_max_not_truncated(self):
engine = self._make_sqlite_engine()
with engine.connect() as conn:
conn = conn.execution_options(isolation_level="AUTOCOMMIT")
conn.execute(text("CREATE TABLE t (id INTEGER)"))
conn.execute(text("INSERT INTO t VALUES (1)"))
results, _, _, truncated = _run_in_transaction(
conn,
"sqlite",
"SELECT * FROM t",
max_rows=100,
read_only=True,
)
engine.dispose()
assert len(results) == 1
assert truncated is False
def test_rollback_on_read_only(self):
"""When read_only=True the transaction is rolled back, so writes are
not persisted."""
engine = self._make_sqlite_engine()
with engine.connect() as conn:
conn = conn.execution_options(isolation_level="AUTOCOMMIT")
conn.execute(text("CREATE TABLE t (id INTEGER)"))
conn.execute(text("INSERT INTO t VALUES (1)"))
_run_in_transaction(
conn,
"sqlite",
"INSERT INTO t VALUES (2)",
max_rows=100,
read_only=True,
)
# The INSERT should have been rolled back
result = conn.execute(text("SELECT COUNT(*) FROM t"))
count = result.scalar()
engine.dispose()
assert count == 1
def test_commit_on_write_mode(self):
"""When read_only=False the transaction is committed."""
engine = self._make_sqlite_engine()
with engine.connect() as conn:
conn = conn.execution_options(isolation_level="AUTOCOMMIT")
conn.execute(text("CREATE TABLE t (id INTEGER)"))
conn.execute(text("INSERT INTO t VALUES (1)"))
_run_in_transaction(
conn,
"sqlite",
"INSERT INTO t VALUES (2)",
max_rows=100,
read_only=False,
)
result = conn.execute(text("SELECT COUNT(*) FROM t"))
count = result.scalar()
engine.dispose()
assert count == 2
def test_rollback_on_exception(self):
"""On query error the transaction is rolled back and the exception
re-raised."""
engine = self._make_sqlite_engine()
with engine.connect() as conn:
conn = conn.execution_options(isolation_level="AUTOCOMMIT")
conn.execute(text("CREATE TABLE t (id INTEGER)"))
with pytest.raises(Exception):
_run_in_transaction(
conn,
"sqlite",
"SELECT * FROM nonexistent_table",
max_rows=100,
read_only=True,
)
engine.dispose()
def test_mssql_begin_transaction_syntax(self):
"""MSSQL dialect uses 'BEGIN TRANSACTION' instead of 'BEGIN'."""
conn = MagicMock()
result_mock = MagicMock()
result_mock.returns_rows = True
result_mock.keys.return_value = ["id"]
result_mock.fetchmany.return_value = [(1,)]
result_mock.rowcount = 1
conn.execute.return_value = result_mock
_run_in_transaction(conn, "mssql", "SELECT 1", max_rows=10, read_only=True)
sqls = _extract_sql_texts(conn)
assert sqls[0] == "BEGIN TRANSACTION"
def test_non_mssql_begin_syntax(self):
"""Non-MSSQL dialects use 'BEGIN' (not 'BEGIN TRANSACTION')."""
conn = MagicMock()
result_mock = MagicMock()
result_mock.returns_rows = True
result_mock.keys.return_value = ["id"]
result_mock.fetchmany.return_value = [(1,)]
result_mock.rowcount = 1
conn.execute.return_value = result_mock
_run_in_transaction(conn, "postgresql", "SELECT 1", max_rows=10, read_only=True)
sqls = _extract_sql_texts(conn)
assert sqls[0] == "BEGIN"
class TestExecuteQueryIntegration:
"""Integration tests for _execute_query using mocked engine internals.
Note: SQLite does not accept ``connect_timeout`` as a connect_arg, so
we cannot pass ``sqlite:///:memory:`` directly to ``_execute_query``.
Instead we mock ``create_engine`` to return a real SQLite engine while
bypassing the connect_args validation.
"""
def test_connect_timeout_constant_used(self):
"""The _CONNECT_TIMEOUT_SECONDS constant must be passed to create_engine."""
with patch("backend.blocks.sql_query_helpers.create_engine") as mock_ce:
mock_engine = MagicMock()
mock_conn = MagicMock()
mock_result = MagicMock()
mock_result.returns_rows = True
mock_result.keys.return_value = ["x"]
mock_result.fetchmany.return_value = [(1,)]
mock_result.rowcount = 1
mock_conn.execute.return_value = mock_result
mock_conn.execution_options.return_value = mock_conn
mock_engine.connect.return_value.__enter__ = lambda _: mock_conn
mock_engine.connect.return_value.__exit__ = lambda *_: None
mock_engine.dialect.name = "postgresql"
mock_ce.return_value = mock_engine
_execute_query(
"postgresql://u:p@h/d",
"SELECT 1",
timeout=10,
max_rows=100,
read_only=True,
)
mock_ce.assert_called_once()
_, kwargs = mock_ce.call_args
assert kwargs["connect_args"]["connect_timeout"] == _CONNECT_TIMEOUT_SECONDS

View File

@@ -36,16 +36,6 @@ _DISALLOWED_KEYWORDS = {
"DISCARD",
"NOTIFY",
"DO",
# MySQL file exfiltration: LOAD DATA LOCAL INFILE reads server/client files
"LOAD",
# MySQL REPLACE is INSERT-or-UPDATE; data modification
"REPLACE",
# ANSI MERGE (UPSERT) modifies data
"MERGE",
# MSSQL BULK INSERT loads external files into tables
"BULK",
# MSSQL EXEC / EXEC sp_name runs stored procedures (arbitrary code)
"EXEC",
}
# Map DatabaseType enum values to the expected SQLAlchemy driver prefix.
@@ -55,12 +45,6 @@ _DATABASE_TYPE_TO_DRIVER = {
DatabaseType.MSSQL: "mssql+pymssql",
}
# Connection timeout in seconds passed to the DBAPI driver (connect_timeout /
# login_timeout). This bounds how long the driver waits to establish a TCP
# connection to the database server. It is separate from the per-statement
# timeout configured via SET commands inside _configure_session().
_CONNECT_TIMEOUT_SECONDS = 10
# Default ports for each database type.
_DATABASE_TYPE_DEFAULT_PORT = {
DatabaseType.POSTGRES: 5432,
@@ -280,9 +264,6 @@ def _validate_single_statement(
def _serialize_value(value: Any) -> Any:
"""Convert database-specific types to JSON-serializable Python types."""
if isinstance(value, Decimal):
# NaN / Infinity are not valid JSON numbers; serialize as strings.
if value.is_nan() or value.is_infinite():
return str(value)
# Use int for whole numbers; use str for fractional to preserve exact
# precision (float would silently round high-precision analytics values).
if value == value.to_integral_value():
@@ -303,29 +284,7 @@ def _configure_session(
timeout_ms: str,
read_only: bool,
) -> None:
"""Set session-level timeout and read-only mode for the given dialect.
Timeout limitations by database:
* **PostgreSQL** ``statement_timeout`` reliably cancels any running
statement (SELECT or DML) after the configured duration.
* **MySQL** ``MAX_EXECUTION_TIME`` only applies to **read-only SELECT**
statements. DML (INSERT/UPDATE/DELETE) and DDL are *not* bounded by
this hint; they rely on the server's ``wait_timeout`` /
``interactive_timeout`` instead. There is no session-level setting in
MySQL that reliably cancels long-running writes.
* **MSSQL** ``SET LOCK_TIMEOUT`` only limits how long the server waits
to acquire a **lock**. CPU-bound queries (e.g. large scans, hash
joins) that do not block on locks will *not* be cancelled. MSSQL has
no session-level ``statement_timeout`` equivalent; the closest
mechanism is Resource Governor (requires sysadmin configuration) or
``CONTEXT_INFO``-based external monitoring.
Note: SQLite is not supported by this block. The ``_configure_session``
function is a no-op for unrecognised dialect names, so an SQLite engine
would skip all SET commands silently. The block's ``DatabaseType`` enum
intentionally excludes SQLite.
"""
"""Set session-level timeout and read-only mode for the given dialect."""
if dialect_name == "postgresql":
conn.execute(text("SET statement_timeout = " + timeout_ms))
if read_only:
@@ -334,14 +293,13 @@ def _configure_session(
# NOTE: MAX_EXECUTION_TIME only applies to SELECT statements.
# Write queries (INSERT/UPDATE/DELETE) are not bounded by this
# setting; they rely on the database's wait_timeout instead.
# See docstring above for full limitations.
conn.execute(text("SET SESSION MAX_EXECUTION_TIME = " + timeout_ms))
if read_only:
conn.execute(text("SET SESSION TRANSACTION READ ONLY"))
elif dialect_name == "mssql":
# MSSQL: SET LOCK_TIMEOUT limits lock-wait time (ms) only.
# CPU-bound queries without lock contention are NOT cancelled.
# See docstring above for full limitations.
# MSSQL: SET LOCK_TIMEOUT limits lock-wait time (ms).
# pymssql's connect_args "login_timeout" handles the connection
# timeout, but LOCK_TIMEOUT covers in-query lock waits.
conn.execute(text("SET LOCK_TIMEOUT " + timeout_ms))
# MSSQL lacks a session-level read-only mode like
# PostgreSQL/MySQL. Read-only enforcement is handled by
@@ -355,13 +313,8 @@ def _run_in_transaction(
query: str,
max_rows: int,
read_only: bool,
) -> tuple[list[dict[str, Any]], list[str], int, bool]:
"""Execute a query inside an explicit transaction, returning results.
Returns ``(rows, columns, affected_rows, truncated)`` where *truncated*
is ``True`` when ``fetchmany`` returned exactly ``max_rows`` rows,
indicating that additional rows may exist in the result set.
"""
) -> tuple[list[dict[str, Any]], list[str], int]:
"""Execute a query inside an explicit transaction, returning results."""
# MSSQL uses T-SQL "BEGIN TRANSACTION"; others use "BEGIN".
begin_stmt = "BEGIN TRANSACTION" if dialect_name == "mssql" else "BEGIN"
conn.execute(text(begin_stmt))
@@ -370,20 +323,16 @@ def _run_in_transaction(
affected = result.rowcount if not result.returns_rows else -1
columns = list(result.keys()) if result.returns_rows else []
rows = result.fetchmany(max_rows) if result.returns_rows else []
truncated = len(rows) == max_rows
results = [
{col: _serialize_value(val) for col, val in zip(columns, row)}
for row in rows
]
except Exception:
try:
conn.execute(text("ROLLBACK"))
except Exception:
pass
conn.execute(text("ROLLBACK"))
raise
else:
conn.execute(text("ROLLBACK" if read_only else "COMMIT"))
return results, columns, affected, truncated
return results, columns, affected
def _execute_query(
@@ -393,12 +342,11 @@ def _execute_query(
max_rows: int,
read_only: bool = True,
database_type: DatabaseType = DatabaseType.POSTGRES,
) -> tuple[list[dict[str, Any]], list[str], int, bool]:
"""Execute a SQL query and return (rows, columns, affected_rows, truncated).
) -> tuple[list[dict[str, Any]], list[str], int]:
"""Execute a SQL query and return (rows, columns, affected_rows).
Uses SQLAlchemy to connect to any supported database.
For SELECT queries, rows are limited to ``max_rows`` via DBAPI fetchmany.
``truncated`` is ``True`` when the result set was capped by ``max_rows``.
For write queries, affected_rows contains the rowcount from the driver.
When ``read_only`` is True, the database session is set to read-only
mode and the transaction is always rolled back.
@@ -408,9 +356,7 @@ def _execute_query(
timeout_key = (
"login_timeout" if database_type == DatabaseType.MSSQL else "connect_timeout"
)
engine = create_engine(
connection_url, connect_args={timeout_key: _CONNECT_TIMEOUT_SECONDS}
)
engine = create_engine(connection_url, connect_args={timeout_key: 10})
try:
with engine.connect() as conn:
# Use AUTOCOMMIT so SET commands take effect immediately.

View File

@@ -15,6 +15,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -181,6 +182,7 @@ class CreateTalkingAvatarVideoBlock(Block):
execution_context=execution_context,
return_format="for_block_output",
)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "video_url", stored_url
return
elif status_response["status"] == "error":

View File

@@ -13,6 +13,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -104,4 +105,5 @@ class UnrealTextToSpeechBlock(Block):
input_data.text,
input_data.voice_id,
)
self.merge_stats(NodeExecutionStats(output_size=len(input_data.text)))
yield "mp3_url", api_response["OutputUri"]

View File

@@ -19,6 +19,7 @@ from backend.blocks._base import (
from backend.data.model import (
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
UserPasswordCredentials,
)
@@ -170,6 +171,7 @@ class TranscribeYoutubeVideoBlock(Block):
transcript = self.get_transcript(video_id, credentials)
transcript_text = self.format_transcript(transcript=transcript)
self.merge_stats(NodeExecutionStats(output_size=1))
# Only yield after all operations succeed
yield "video_id", video_id
yield "transcript", transcript_text

View File

@@ -21,7 +21,7 @@ from backend.blocks.zerobounce._auth import (
ZeroBounceCredentials,
ZeroBounceCredentialsInput,
)
from backend.data.model import CredentialsField, SchemaField
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
class Response(BaseModel):
@@ -177,5 +177,6 @@ class ValidateEmailsBlock(Block):
)
response_model = Response(**response.__dict__)
self.merge_stats(NodeExecutionStats(output_size=1))
yield "response", response_model

File diff suppressed because it is too large Load Diff

View File

@@ -1,633 +0,0 @@
"""Unit tests for baseline service pure-logic helpers.
These tests cover ``_baseline_conversation_updater`` and ``_BaselineStreamState``
without requiring API keys, database connections, or network access.
"""
from unittest.mock import AsyncMock, patch
import pytest
from openai.types.chat import ChatCompletionToolParam
from backend.copilot.baseline.service import (
_baseline_conversation_updater,
_BaselineStreamState,
_compress_session_messages,
_ThinkingStripper,
)
from backend.copilot.model import ChatMessage
from backend.copilot.transcript_builder import TranscriptBuilder
from backend.util.prompt import CompressResult
from backend.util.tool_call_loop import LLMLoopResponse, LLMToolCall, ToolCallResult
class TestBaselineStreamState:
def test_defaults(self):
state = _BaselineStreamState()
assert state.pending_events == []
assert state.assistant_text == ""
assert state.text_started is False
assert state.turn_prompt_tokens == 0
assert state.turn_completion_tokens == 0
assert state.text_block_id # Should be a UUID string
def test_mutable_fields(self):
state = _BaselineStreamState()
state.assistant_text = "hello"
state.turn_prompt_tokens = 100
state.turn_completion_tokens = 50
assert state.assistant_text == "hello"
assert state.turn_prompt_tokens == 100
assert state.turn_completion_tokens == 50
class TestBaselineConversationUpdater:
"""Tests for _baseline_conversation_updater which updates the OpenAI
message list and transcript builder after each LLM call."""
def _make_transcript_builder(self) -> TranscriptBuilder:
builder = TranscriptBuilder()
builder.append_user("test question")
return builder
def test_text_only_response(self):
"""When the LLM returns text without tool calls, the updater appends
a single assistant message and records it in the transcript."""
messages: list = []
builder = self._make_transcript_builder()
response = LLMLoopResponse(
response_text="Hello, world!",
tool_calls=[],
raw_response=None,
prompt_tokens=0,
completion_tokens=0,
)
_baseline_conversation_updater(
messages,
response,
tool_results=None,
transcript_builder=builder,
model="test-model",
)
assert len(messages) == 1
assert messages[0]["role"] == "assistant"
assert messages[0]["content"] == "Hello, world!"
# Transcript should have user + assistant
assert builder.entry_count == 2
assert builder.last_entry_type == "assistant"
def test_tool_calls_response(self):
"""When the LLM returns tool calls, the updater appends the assistant
message with tool_calls and tool result messages."""
messages: list = []
builder = self._make_transcript_builder()
response = LLMLoopResponse(
response_text="Let me search...",
tool_calls=[
LLMToolCall(
id="tc_1",
name="search",
arguments='{"query": "test"}',
),
],
raw_response=None,
prompt_tokens=0,
completion_tokens=0,
)
tool_results = [
ToolCallResult(
tool_call_id="tc_1",
tool_name="search",
content="Found result",
),
]
_baseline_conversation_updater(
messages,
response,
tool_results=tool_results,
transcript_builder=builder,
model="test-model",
)
# Messages: assistant (with tool_calls) + tool result
assert len(messages) == 2
assert messages[0]["role"] == "assistant"
assert messages[0]["content"] == "Let me search..."
assert len(messages[0]["tool_calls"]) == 1
assert messages[0]["tool_calls"][0]["id"] == "tc_1"
assert messages[1]["role"] == "tool"
assert messages[1]["tool_call_id"] == "tc_1"
assert messages[1]["content"] == "Found result"
# Transcript: user + assistant(tool_use) + user(tool_result)
assert builder.entry_count == 3
def test_tool_calls_without_text(self):
"""Tool calls without accompanying text should still work."""
messages: list = []
builder = self._make_transcript_builder()
response = LLMLoopResponse(
response_text=None,
tool_calls=[
LLMToolCall(id="tc_1", name="run", arguments="{}"),
],
raw_response=None,
prompt_tokens=0,
completion_tokens=0,
)
tool_results = [
ToolCallResult(tool_call_id="tc_1", tool_name="run", content="done"),
]
_baseline_conversation_updater(
messages,
response,
tool_results=tool_results,
transcript_builder=builder,
model="test-model",
)
assert len(messages) == 2
assert "content" not in messages[0] # No text content
assert messages[0]["tool_calls"][0]["function"]["name"] == "run"
def test_no_text_no_tools(self):
"""When the response has no text and no tool calls, nothing is appended."""
messages: list = []
builder = self._make_transcript_builder()
response = LLMLoopResponse(
response_text=None,
tool_calls=[],
raw_response=None,
prompt_tokens=0,
completion_tokens=0,
)
_baseline_conversation_updater(
messages,
response,
tool_results=None,
transcript_builder=builder,
model="test-model",
)
assert len(messages) == 0
# Only the user entry from setup
assert builder.entry_count == 1
def test_multiple_tool_calls(self):
"""Multiple tool calls in a single response are all recorded."""
messages: list = []
builder = self._make_transcript_builder()
response = LLMLoopResponse(
response_text=None,
tool_calls=[
LLMToolCall(id="tc_1", name="tool_a", arguments="{}"),
LLMToolCall(id="tc_2", name="tool_b", arguments='{"x": 1}'),
],
raw_response=None,
prompt_tokens=0,
completion_tokens=0,
)
tool_results = [
ToolCallResult(tool_call_id="tc_1", tool_name="tool_a", content="result_a"),
ToolCallResult(tool_call_id="tc_2", tool_name="tool_b", content="result_b"),
]
_baseline_conversation_updater(
messages,
response,
tool_results=tool_results,
transcript_builder=builder,
model="test-model",
)
# 1 assistant + 2 tool results
assert len(messages) == 3
assert len(messages[0]["tool_calls"]) == 2
assert messages[1]["tool_call_id"] == "tc_1"
assert messages[2]["tool_call_id"] == "tc_2"
def test_invalid_tool_arguments_handled(self):
"""Tool call with invalid JSON arguments: the arguments field is
stored as-is in the message, and orjson failure falls back to {}
in the transcript content_blocks."""
messages: list = []
builder = self._make_transcript_builder()
response = LLMLoopResponse(
response_text=None,
tool_calls=[
LLMToolCall(id="tc_1", name="tool_x", arguments="not-json"),
],
raw_response=None,
prompt_tokens=0,
completion_tokens=0,
)
tool_results = [
ToolCallResult(tool_call_id="tc_1", tool_name="tool_x", content="ok"),
]
_baseline_conversation_updater(
messages,
response,
tool_results=tool_results,
transcript_builder=builder,
model="test-model",
)
# Should not raise — invalid JSON falls back to {} in transcript
assert len(messages) == 2
assert messages[0]["tool_calls"][0]["function"]["arguments"] == "not-json"
class TestCompressSessionMessagesPreservesToolCalls:
"""``_compress_session_messages`` must round-trip tool_calls + tool_call_id.
Compression serialises ChatMessage to dict for ``compress_context`` and
reifies the result back to ChatMessage. A regression that drops
``tool_calls`` or ``tool_call_id`` would corrupt the OpenAI message
list and break downstream tool-execution rounds.
"""
@pytest.mark.asyncio
async def test_compressed_output_keeps_tool_calls_and_ids(self):
# Simulate compression that returns a summary + the most recent
# assistant(tool_call) + tool(tool_result) intact.
summary = {"role": "system", "content": "prior turns: user asked X"}
assistant_with_tc = {
"role": "assistant",
"content": "calling tool",
"tool_calls": [
{
"id": "tc_abc",
"type": "function",
"function": {"name": "search", "arguments": '{"q":"y"}'},
}
],
}
tool_result = {
"role": "tool",
"tool_call_id": "tc_abc",
"content": "search result",
}
compress_result = CompressResult(
messages=[summary, assistant_with_tc, tool_result],
token_count=100,
was_compacted=True,
original_token_count=5000,
messages_summarized=10,
messages_dropped=0,
)
# Input: messages that should be compressed.
input_messages = [
ChatMessage(role="user", content="q1"),
ChatMessage(
role="assistant",
content="calling tool",
tool_calls=[
{
"id": "tc_abc",
"type": "function",
"function": {
"name": "search",
"arguments": '{"q":"y"}',
},
}
],
),
ChatMessage(
role="tool",
tool_call_id="tc_abc",
content="search result",
),
]
with patch(
"backend.copilot.baseline.service.compress_context",
new=AsyncMock(return_value=compress_result),
):
compressed = await _compress_session_messages(
input_messages, model="openrouter/anthropic/claude-opus-4"
)
# Summary, assistant(tool_calls), tool(tool_call_id).
assert len(compressed) == 3
# Assistant message must keep its tool_calls intact.
assistant_msg = compressed[1]
assert assistant_msg.role == "assistant"
assert assistant_msg.tool_calls is not None
assert len(assistant_msg.tool_calls) == 1
assert assistant_msg.tool_calls[0]["id"] == "tc_abc"
assert assistant_msg.tool_calls[0]["function"]["name"] == "search"
# Tool-role message must keep tool_call_id for OpenAI linkage.
tool_msg = compressed[2]
assert tool_msg.role == "tool"
assert tool_msg.tool_call_id == "tc_abc"
assert tool_msg.content == "search result"
@pytest.mark.asyncio
async def test_uncompressed_passthrough_keeps_fields(self):
"""When compression is a no-op (was_compacted=False), the original
messages must be returned unchanged — including tool_calls."""
input_messages = [
ChatMessage(
role="assistant",
content="c",
tool_calls=[
{
"id": "t1",
"type": "function",
"function": {"name": "f", "arguments": "{}"},
}
],
),
ChatMessage(role="tool", tool_call_id="t1", content="ok"),
]
noop_result = CompressResult(
messages=[], # ignored when was_compacted=False
token_count=10,
was_compacted=False,
)
with patch(
"backend.copilot.baseline.service.compress_context",
new=AsyncMock(return_value=noop_result),
):
out = await _compress_session_messages(
input_messages, model="openrouter/anthropic/claude-opus-4"
)
assert out is input_messages # same list returned
assert out[0].tool_calls is not None
assert out[0].tool_calls[0]["id"] == "t1"
assert out[1].tool_call_id == "t1"
# ---- _ThinkingStripper tests ---- #
def test_thinking_stripper_basic_thinking_tag() -> None:
"""<thinking>...</thinking> blocks are fully stripped."""
s = _ThinkingStripper()
assert s.process("<thinking>internal reasoning here</thinking>Hello!") == "Hello!"
def test_thinking_stripper_internal_reasoning_tag() -> None:
"""<internal_reasoning>...</internal_reasoning> blocks (Gemini) are stripped."""
s = _ThinkingStripper()
assert (
s.process("<internal_reasoning>step by step</internal_reasoning>Answer")
== "Answer"
)
def test_thinking_stripper_split_across_chunks() -> None:
"""Tags split across multiple chunks are handled correctly."""
s = _ThinkingStripper()
out = s.process("Hello <thin")
out += s.process("king>secret</thinking> world")
assert out == "Hello world"
def test_thinking_stripper_plain_text_preserved() -> None:
"""Plain text with the word 'thinking' is not stripped."""
s = _ThinkingStripper()
assert (
s.process("I am thinking about this problem")
== "I am thinking about this problem"
)
def test_thinking_stripper_multiple_blocks() -> None:
"""Multiple reasoning blocks in one stream are all stripped."""
s = _ThinkingStripper()
result = s.process(
"A<thinking>x</thinking>B<internal_reasoning>y</internal_reasoning>C"
)
assert result == "ABC"
def test_thinking_stripper_flush_discards_unclosed() -> None:
"""Unclosed reasoning block is discarded on flush."""
s = _ThinkingStripper()
s.process("Start<thinking>never closed")
flushed = s.flush()
assert "never closed" not in flushed
def test_thinking_stripper_empty_block() -> None:
"""Empty reasoning blocks are handled gracefully."""
s = _ThinkingStripper()
assert s.process("Before<thinking></thinking>After") == "BeforeAfter"
# ---- _filter_tools_by_permissions tests ---- #
def _make_tool(name: str) -> ChatCompletionToolParam:
"""Build a minimal OpenAI ChatCompletionToolParam."""
return ChatCompletionToolParam(
type="function",
function={"name": name, "parameters": {}},
)
class TestFilterToolsByPermissions:
"""Tests for _filter_tools_by_permissions."""
@patch(
"backend.copilot.permissions.all_known_tool_names",
return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
)
def test_empty_permissions_returns_all(self, _mock_names):
"""Empty permissions (no filtering) returns every tool unchanged."""
from backend.copilot.baseline.service import _filter_tools_by_permissions
from backend.copilot.permissions import CopilotPermissions
tools = [_make_tool("run_block"), _make_tool("web_fetch")]
perms = CopilotPermissions()
result = _filter_tools_by_permissions(tools, perms)
assert result == tools
@patch(
"backend.copilot.permissions.all_known_tool_names",
return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
)
def test_allowlist_keeps_only_matching(self, _mock_names):
"""Explicit allowlist (tools_exclude=False) keeps only listed tools."""
from backend.copilot.baseline.service import _filter_tools_by_permissions
from backend.copilot.permissions import CopilotPermissions
tools = [
_make_tool("run_block"),
_make_tool("web_fetch"),
_make_tool("bash_exec"),
]
perms = CopilotPermissions(tools=["web_fetch"], tools_exclude=False)
result = _filter_tools_by_permissions(tools, perms)
assert len(result) == 1
assert result[0]["function"]["name"] == "web_fetch"
@patch(
"backend.copilot.permissions.all_known_tool_names",
return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
)
def test_blacklist_excludes_listed(self, _mock_names):
"""Blacklist (tools_exclude=True) removes only the listed tools."""
from backend.copilot.baseline.service import _filter_tools_by_permissions
from backend.copilot.permissions import CopilotPermissions
tools = [
_make_tool("run_block"),
_make_tool("web_fetch"),
_make_tool("bash_exec"),
]
perms = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
result = _filter_tools_by_permissions(tools, perms)
names = [t["function"]["name"] for t in result]
assert "bash_exec" not in names
assert "run_block" in names
assert "web_fetch" in names
assert len(result) == 2
@patch(
"backend.copilot.permissions.all_known_tool_names",
return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
)
def test_unknown_tool_name_filtered_out(self, _mock_names):
"""A tool whose name is not in all_known_tool_names is dropped."""
from backend.copilot.baseline.service import _filter_tools_by_permissions
from backend.copilot.permissions import CopilotPermissions
tools = [_make_tool("run_block"), _make_tool("unknown_tool")]
perms = CopilotPermissions(tools=["run_block"], tools_exclude=False)
result = _filter_tools_by_permissions(tools, perms)
names = [t["function"]["name"] for t in result]
assert "unknown_tool" not in names
assert names == ["run_block"]
# ---- _prepare_baseline_attachments tests ---- #
class TestPrepareBaselineAttachments:
"""Tests for _prepare_baseline_attachments."""
@pytest.mark.asyncio
async def test_empty_file_ids(self):
"""Empty file_ids returns empty hint and blocks."""
from backend.copilot.baseline.service import _prepare_baseline_attachments
hint, blocks = await _prepare_baseline_attachments([], "user1", "sess1", "/tmp")
assert hint == ""
assert blocks == []
@pytest.mark.asyncio
async def test_empty_user_id(self):
"""Empty user_id returns empty hint and blocks."""
from backend.copilot.baseline.service import _prepare_baseline_attachments
hint, blocks = await _prepare_baseline_attachments(
["file1"], "", "sess1", "/tmp"
)
assert hint == ""
assert blocks == []
@pytest.mark.asyncio
async def test_image_file_returns_vision_blocks(self):
"""A PNG image within size limits is returned as a base64 vision block."""
from backend.copilot.baseline.service import _prepare_baseline_attachments
fake_info = AsyncMock()
fake_info.name = "photo.png"
fake_info.mime_type = "image/png"
fake_info.size_bytes = 1024
fake_manager = AsyncMock()
fake_manager.get_file_info = AsyncMock(return_value=fake_info)
fake_manager.read_file_by_id = AsyncMock(return_value=b"\x89PNG_FAKE_DATA")
with patch(
"backend.copilot.baseline.service.get_workspace_manager",
new=AsyncMock(return_value=fake_manager),
):
hint, blocks = await _prepare_baseline_attachments(
["fid1"], "user1", "sess1", "/tmp/workdir"
)
assert len(blocks) == 1
assert blocks[0]["type"] == "image"
assert blocks[0]["source"]["media_type"] == "image/png"
assert blocks[0]["source"]["type"] == "base64"
assert "photo.png" in hint
assert "embedded as image" in hint
@pytest.mark.asyncio
async def test_non_image_file_saved_to_working_dir(self, tmp_path):
"""A non-image file is written to working_dir."""
from backend.copilot.baseline.service import _prepare_baseline_attachments
fake_info = AsyncMock()
fake_info.name = "data.csv"
fake_info.mime_type = "text/csv"
fake_info.size_bytes = 42
fake_manager = AsyncMock()
fake_manager.get_file_info = AsyncMock(return_value=fake_info)
fake_manager.read_file_by_id = AsyncMock(return_value=b"col1,col2\na,b")
with patch(
"backend.copilot.baseline.service.get_workspace_manager",
new=AsyncMock(return_value=fake_manager),
):
hint, blocks = await _prepare_baseline_attachments(
["fid1"], "user1", "sess1", str(tmp_path)
)
assert blocks == []
assert "data.csv" in hint
assert "saved to" in hint
saved = tmp_path / "data.csv"
assert saved.exists()
assert saved.read_bytes() == b"col1,col2\na,b"
@pytest.mark.asyncio
async def test_file_not_found_skipped(self):
"""When get_file_info returns None the file is silently skipped."""
from backend.copilot.baseline.service import _prepare_baseline_attachments
fake_manager = AsyncMock()
fake_manager.get_file_info = AsyncMock(return_value=None)
with patch(
"backend.copilot.baseline.service.get_workspace_manager",
new=AsyncMock(return_value=fake_manager),
):
hint, blocks = await _prepare_baseline_attachments(
["missing_id"], "user1", "sess1", "/tmp"
)
assert hint == ""
assert blocks == []
@pytest.mark.asyncio
async def test_workspace_manager_error(self):
"""When get_workspace_manager raises, returns empty results."""
from backend.copilot.baseline.service import _prepare_baseline_attachments
with patch(
"backend.copilot.baseline.service.get_workspace_manager",
new=AsyncMock(side_effect=RuntimeError("connection failed")),
):
hint, blocks = await _prepare_baseline_attachments(
["fid1"], "user1", "sess1", "/tmp"
)
assert hint == ""
assert blocks == []

View File

@@ -1,667 +0,0 @@
"""Integration tests for baseline transcript flow.
Exercises the real helpers in ``baseline/service.py`` that download,
validate, load, append to, backfill, and upload the transcript.
Storage is mocked via ``download_transcript`` / ``upload_transcript``
patches; no network access is required.
"""
import json as stdlib_json
from unittest.mock import AsyncMock, patch
import pytest
from backend.copilot.baseline.service import (
_load_prior_transcript,
_record_turn_to_transcript,
_resolve_baseline_model,
_upload_final_transcript,
is_transcript_stale,
should_upload_transcript,
)
from backend.copilot.service import config
from backend.copilot.transcript import (
STOP_REASON_END_TURN,
STOP_REASON_TOOL_USE,
TranscriptDownload,
)
from backend.copilot.transcript_builder import TranscriptBuilder
from backend.util.tool_call_loop import LLMLoopResponse, LLMToolCall, ToolCallResult
def _make_transcript_content(*roles: str) -> str:
"""Build a minimal valid JSONL transcript from role names."""
lines = []
parent = ""
for i, role in enumerate(roles):
uid = f"uuid-{i}"
entry: dict = {
"type": role,
"uuid": uid,
"parentUuid": parent,
"message": {
"role": role,
"content": [{"type": "text", "text": f"{role} message {i}"}],
},
}
if role == "assistant":
entry["message"]["id"] = f"msg_{i}"
entry["message"]["model"] = "test-model"
entry["message"]["type"] = "message"
entry["message"]["stop_reason"] = STOP_REASON_END_TURN
lines.append(stdlib_json.dumps(entry))
parent = uid
return "\n".join(lines) + "\n"
class TestResolveBaselineModel:
"""Model selection honours the per-request mode."""
def test_fast_mode_selects_fast_model(self):
assert _resolve_baseline_model("fast") == config.fast_model
def test_extended_thinking_selects_default_model(self):
assert _resolve_baseline_model("extended_thinking") == config.model
def test_none_mode_selects_default_model(self):
"""Critical: baseline users without a mode MUST keep the default (opus)."""
assert _resolve_baseline_model(None) == config.model
def test_default_and_fast_models_differ(self):
"""Sanity: the two tiers are actually distinct in production config."""
assert config.model != config.fast_model
class TestLoadPriorTranscript:
"""``_load_prior_transcript`` wraps the download + validate + load flow."""
@pytest.mark.asyncio
async def test_loads_fresh_transcript(self):
builder = TranscriptBuilder()
content = _make_transcript_content("user", "assistant")
download = TranscriptDownload(content=content, message_count=2)
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=download),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=3,
transcript_builder=builder,
)
assert covers is True
assert builder.entry_count == 2
assert builder.last_entry_type == "assistant"
@pytest.mark.asyncio
async def test_rejects_stale_transcript(self):
"""msg_count strictly less than session-1 is treated as stale."""
builder = TranscriptBuilder()
content = _make_transcript_content("user", "assistant")
# session has 6 messages, transcript only covers 2 → stale.
download = TranscriptDownload(content=content, message_count=2)
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=download),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=6,
transcript_builder=builder,
)
assert covers is False
assert builder.is_empty
@pytest.mark.asyncio
async def test_missing_transcript_returns_false(self):
builder = TranscriptBuilder()
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=None),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=2,
transcript_builder=builder,
)
assert covers is False
assert builder.is_empty
@pytest.mark.asyncio
async def test_invalid_transcript_returns_false(self):
builder = TranscriptBuilder()
download = TranscriptDownload(
content='{"type":"progress","uuid":"a"}\n',
message_count=1,
)
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=download),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=2,
transcript_builder=builder,
)
assert covers is False
assert builder.is_empty
@pytest.mark.asyncio
async def test_download_exception_returns_false(self):
builder = TranscriptBuilder()
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(side_effect=RuntimeError("boom")),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=2,
transcript_builder=builder,
)
assert covers is False
assert builder.is_empty
@pytest.mark.asyncio
async def test_zero_message_count_not_stale(self):
"""When msg_count is 0 (unknown), staleness check is skipped."""
builder = TranscriptBuilder()
download = TranscriptDownload(
content=_make_transcript_content("user", "assistant"),
message_count=0,
)
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=download),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=20,
transcript_builder=builder,
)
assert covers is True
assert builder.entry_count == 2
class TestUploadFinalTranscript:
"""``_upload_final_transcript`` serialises and calls storage."""
@pytest.mark.asyncio
async def test_uploads_valid_transcript(self):
builder = TranscriptBuilder()
builder.append_user(content="hi")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "hello"}],
model="test-model",
stop_reason=STOP_REASON_END_TURN,
)
upload_mock = AsyncMock(return_value=None)
with patch(
"backend.copilot.baseline.service.upload_transcript",
new=upload_mock,
):
await _upload_final_transcript(
user_id="user-1",
session_id="session-1",
transcript_builder=builder,
session_msg_count=2,
)
upload_mock.assert_awaited_once()
assert upload_mock.await_args is not None
call_kwargs = upload_mock.await_args.kwargs
assert call_kwargs["user_id"] == "user-1"
assert call_kwargs["session_id"] == "session-1"
assert call_kwargs["message_count"] == 2
assert "hello" in call_kwargs["content"]
@pytest.mark.asyncio
async def test_skips_upload_when_builder_empty(self):
builder = TranscriptBuilder()
upload_mock = AsyncMock(return_value=None)
with patch(
"backend.copilot.baseline.service.upload_transcript",
new=upload_mock,
):
await _upload_final_transcript(
user_id="user-1",
session_id="session-1",
transcript_builder=builder,
session_msg_count=0,
)
upload_mock.assert_not_awaited()
@pytest.mark.asyncio
async def test_swallows_upload_exceptions(self):
"""Upload failures should not propagate (flow continues for the user)."""
builder = TranscriptBuilder()
builder.append_user(content="hi")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "hello"}],
model="test-model",
stop_reason=STOP_REASON_END_TURN,
)
with patch(
"backend.copilot.baseline.service.upload_transcript",
new=AsyncMock(side_effect=RuntimeError("storage unavailable")),
):
# Should not raise.
await _upload_final_transcript(
user_id="user-1",
session_id="session-1",
transcript_builder=builder,
session_msg_count=2,
)
class TestRecordTurnToTranscript:
"""``_record_turn_to_transcript`` translates LLMLoopResponse → transcript."""
def test_records_final_assistant_text(self):
builder = TranscriptBuilder()
builder.append_user(content="hi")
response = LLMLoopResponse(
response_text="hello there",
tool_calls=[],
raw_response=None,
)
_record_turn_to_transcript(
response,
tool_results=None,
transcript_builder=builder,
model="test-model",
)
assert builder.entry_count == 2
assert builder.last_entry_type == "assistant"
jsonl = builder.to_jsonl()
assert "hello there" in jsonl
assert STOP_REASON_END_TURN in jsonl
def test_records_tool_use_then_tool_result(self):
"""Anthropic ordering: assistant(tool_use) → user(tool_result)."""
builder = TranscriptBuilder()
builder.append_user(content="use a tool")
response = LLMLoopResponse(
response_text=None,
tool_calls=[
LLMToolCall(id="call-1", name="echo", arguments='{"text":"hi"}')
],
raw_response=None,
)
tool_results = [
ToolCallResult(tool_call_id="call-1", tool_name="echo", content="hi")
]
_record_turn_to_transcript(
response,
tool_results,
transcript_builder=builder,
model="test-model",
)
# user, assistant(tool_use), user(tool_result) = 3 entries
assert builder.entry_count == 3
jsonl = builder.to_jsonl()
assert STOP_REASON_TOOL_USE in jsonl
assert "tool_use" in jsonl
assert "tool_result" in jsonl
assert "call-1" in jsonl
def test_records_nothing_on_empty_response(self):
builder = TranscriptBuilder()
builder.append_user(content="hi")
response = LLMLoopResponse(
response_text=None,
tool_calls=[],
raw_response=None,
)
_record_turn_to_transcript(
response,
tool_results=None,
transcript_builder=builder,
model="test-model",
)
assert builder.entry_count == 1
def test_malformed_tool_args_dont_crash(self):
"""Bad JSON in tool arguments falls back to {} without raising."""
builder = TranscriptBuilder()
builder.append_user(content="hi")
response = LLMLoopResponse(
response_text=None,
tool_calls=[LLMToolCall(id="call-1", name="echo", arguments="{not-json")],
raw_response=None,
)
tool_results = [
ToolCallResult(tool_call_id="call-1", tool_name="echo", content="ok")
]
_record_turn_to_transcript(
response,
tool_results,
transcript_builder=builder,
model="test-model",
)
assert builder.entry_count == 3
jsonl = builder.to_jsonl()
assert '"input":{}' in jsonl
class TestRoundTrip:
"""End-to-end: load prior → append new turn → upload."""
@pytest.mark.asyncio
async def test_full_round_trip(self):
prior = _make_transcript_content("user", "assistant")
download = TranscriptDownload(content=prior, message_count=2)
builder = TranscriptBuilder()
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=download),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=3,
transcript_builder=builder,
)
assert covers is True
assert builder.entry_count == 2
# New user turn.
builder.append_user(content="new question")
assert builder.entry_count == 3
# New assistant turn.
response = LLMLoopResponse(
response_text="new answer",
tool_calls=[],
raw_response=None,
)
_record_turn_to_transcript(
response,
tool_results=None,
transcript_builder=builder,
model="test-model",
)
assert builder.entry_count == 4
# Upload.
upload_mock = AsyncMock(return_value=None)
with patch(
"backend.copilot.baseline.service.upload_transcript",
new=upload_mock,
):
await _upload_final_transcript(
user_id="user-1",
session_id="session-1",
transcript_builder=builder,
session_msg_count=4,
)
upload_mock.assert_awaited_once()
assert upload_mock.await_args is not None
uploaded = upload_mock.await_args.kwargs["content"]
assert "new question" in uploaded
assert "new answer" in uploaded
# Original content preserved in the round trip.
assert "user message 0" in uploaded
assert "assistant message 1" in uploaded
@pytest.mark.asyncio
async def test_backfill_append_guard(self):
"""Backfill only runs when the last entry is not already assistant."""
builder = TranscriptBuilder()
builder.append_user(content="hi")
# Simulate the backfill guard from stream_chat_completion_baseline.
assistant_text = "partial text before error"
if builder.last_entry_type != "assistant":
builder.append_assistant(
content_blocks=[{"type": "text", "text": assistant_text}],
model="test-model",
stop_reason=STOP_REASON_END_TURN,
)
assert builder.last_entry_type == "assistant"
assert "partial text before error" in builder.to_jsonl()
# Second invocation: the guard must prevent double-append.
initial_count = builder.entry_count
if builder.last_entry_type != "assistant":
builder.append_assistant(
content_blocks=[{"type": "text", "text": "duplicate"}],
model="test-model",
stop_reason=STOP_REASON_END_TURN,
)
assert builder.entry_count == initial_count
class TestIsTranscriptStale:
"""``is_transcript_stale`` gates prior-transcript loading."""
def test_none_download_is_not_stale(self):
assert is_transcript_stale(None, session_msg_count=5) is False
def test_zero_message_count_is_not_stale(self):
"""Legacy transcripts without msg_count tracking must remain usable."""
dl = TranscriptDownload(content="", message_count=0)
assert is_transcript_stale(dl, session_msg_count=20) is False
def test_stale_when_covers_less_than_prefix(self):
dl = TranscriptDownload(content="", message_count=2)
# session has 6 messages; transcript must cover at least 5 (6-1).
assert is_transcript_stale(dl, session_msg_count=6) is True
def test_fresh_when_covers_full_prefix(self):
dl = TranscriptDownload(content="", message_count=5)
assert is_transcript_stale(dl, session_msg_count=6) is False
def test_fresh_when_exceeds_prefix(self):
"""Race: transcript ahead of session count is still acceptable."""
dl = TranscriptDownload(content="", message_count=10)
assert is_transcript_stale(dl, session_msg_count=6) is False
def test_boundary_equal_to_prefix_minus_one(self):
dl = TranscriptDownload(content="", message_count=5)
assert is_transcript_stale(dl, session_msg_count=6) is False
class TestShouldUploadTranscript:
"""``should_upload_transcript`` gates the final upload."""
def test_upload_allowed_for_user_with_coverage(self):
assert should_upload_transcript("user-1", True) is True
def test_upload_skipped_when_no_user(self):
assert should_upload_transcript(None, True) is False
def test_upload_skipped_when_empty_user(self):
assert should_upload_transcript("", True) is False
def test_upload_skipped_without_coverage(self):
"""Partial transcript must never clobber a more complete stored one."""
assert should_upload_transcript("user-1", False) is False
def test_upload_skipped_when_no_user_and_no_coverage(self):
assert should_upload_transcript(None, False) is False
class TestTranscriptLifecycle:
"""End-to-end: download → validate → build → upload.
Simulates the full transcript lifecycle inside
``stream_chat_completion_baseline`` by mocking the storage layer and
driving each step through the real helpers.
"""
@pytest.mark.asyncio
async def test_full_lifecycle_happy_path(self):
"""Fresh download, append a turn, upload covers the session."""
builder = TranscriptBuilder()
prior = _make_transcript_content("user", "assistant")
download = TranscriptDownload(content=prior, message_count=2)
upload_mock = AsyncMock(return_value=None)
with (
patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=download),
),
patch(
"backend.copilot.baseline.service.upload_transcript",
new=upload_mock,
),
):
# --- 1. Download & load prior transcript ---
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=3,
transcript_builder=builder,
)
assert covers is True
# --- 2. Append a new user turn + a new assistant response ---
builder.append_user(content="follow-up question")
_record_turn_to_transcript(
LLMLoopResponse(
response_text="follow-up answer",
tool_calls=[],
raw_response=None,
),
tool_results=None,
transcript_builder=builder,
model="test-model",
)
# --- 3. Gate + upload ---
assert (
should_upload_transcript(
user_id="user-1", transcript_covers_prefix=covers
)
is True
)
await _upload_final_transcript(
user_id="user-1",
session_id="session-1",
transcript_builder=builder,
session_msg_count=4,
)
upload_mock.assert_awaited_once()
assert upload_mock.await_args is not None
uploaded = upload_mock.await_args.kwargs["content"]
assert "follow-up question" in uploaded
assert "follow-up answer" in uploaded
# Original prior-turn content preserved.
assert "user message 0" in uploaded
assert "assistant message 1" in uploaded
@pytest.mark.asyncio
async def test_lifecycle_stale_download_suppresses_upload(self):
"""Stale download → covers=False → upload must be skipped."""
builder = TranscriptBuilder()
# session has 10 msgs but stored transcript only covers 2 → stale.
stale = TranscriptDownload(
content=_make_transcript_content("user", "assistant"),
message_count=2,
)
upload_mock = AsyncMock(return_value=None)
with (
patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=stale),
),
patch(
"backend.copilot.baseline.service.upload_transcript",
new=upload_mock,
),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=10,
transcript_builder=builder,
)
assert covers is False
# The caller's gate mirrors the production path.
assert (
should_upload_transcript(user_id="user-1", transcript_covers_prefix=covers)
is False
)
upload_mock.assert_not_awaited()
@pytest.mark.asyncio
async def test_lifecycle_anonymous_user_skips_upload(self):
"""Anonymous (user_id=None) → upload gate must return False."""
builder = TranscriptBuilder()
builder.append_user(content="hi")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "hello"}],
model="test-model",
stop_reason=STOP_REASON_END_TURN,
)
assert (
should_upload_transcript(user_id=None, transcript_covers_prefix=True)
is False
)
@pytest.mark.asyncio
async def test_lifecycle_missing_download_still_uploads_new_content(self):
"""No prior transcript → covers defaults to True in the service,
new turn should upload cleanly."""
builder = TranscriptBuilder()
upload_mock = AsyncMock(return_value=None)
with (
patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=None),
),
patch(
"backend.copilot.baseline.service.upload_transcript",
new=upload_mock,
),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=1,
transcript_builder=builder,
)
# No download: covers is False, so the production path would
# skip upload. This protects against overwriting a future
# more-complete transcript with a single-turn snapshot.
assert covers is False
assert (
should_upload_transcript(
user_id="user-1", transcript_covers_prefix=covers
)
is False
)
upload_mock.assert_not_awaited()

View File

@@ -8,14 +8,6 @@ from pydantic_settings import BaseSettings
from backend.util.clients import OPENROUTER_BASE_URL
# Per-request routing mode for a single chat turn.
# - 'fast': route to the baseline OpenAI-compatible path with the cheaper model.
# - 'extended_thinking': route to the Claude Agent SDK path with the default
# (opus) model.
# ``None`` means "no override"; the server falls back to the Claude Code
# subscription flag → LaunchDarkly COPILOT_SDK → config.use_claude_agent_sdk.
CopilotMode = Literal["fast", "extended_thinking"]
class ChatConfig(BaseSettings):
"""Configuration for the chat system."""
@@ -146,6 +138,32 @@ class ChatConfig(BaseSettings):
description="Use --resume for multi-turn conversations instead of "
"history compression. Falls back to compression when unavailable.",
)
claude_agent_fallback_model: str = Field(
default="claude-sonnet-4-20250514",
description="Fallback model when the primary model is unavailable (e.g. 529 "
"overloaded). The SDK automatically retries with this cheaper model.",
)
claude_agent_max_turns: int = Field(
default=50,
ge=1,
le=500,
description="Maximum number of agentic turns (tool-use loops) per query. "
"Prevents runaway tool loops from burning budget.",
)
claude_agent_max_budget_usd: float = Field(
default=5.0,
ge=0.01,
le=100.0,
description="Maximum spend in USD per SDK query. The CLI aborts the "
"request if this budget is exceeded.",
)
claude_agent_max_transient_retries: int = Field(
default=3,
ge=0,
le=10,
description="Maximum number of retries for transient API errors "
"(429, 5xx, ECONNRESET) before surfacing the error to the user.",
)
use_openrouter: bool = Field(
default=True,
description="Enable routing API calls through the OpenRouter proxy. "

View File

@@ -44,12 +44,32 @@ def parse_node_id_from_exec_id(node_exec_id: str) -> str:
# Transient Anthropic API error detection
# ---------------------------------------------------------------------------
# Patterns in error text that indicate a transient Anthropic API error
# (ECONNRESET / dropped TCP connection) which is retryable.
# which is retryable. Covers:
# - Connection-level: ECONNRESET, dropped TCP connections
# - HTTP 429: rate-limit / too-many-requests
# - HTTP 5xx: server errors, overloaded
_TRANSIENT_ERROR_PATTERNS = (
# Connection-level
"socket connection was closed unexpectedly",
"ECONNRESET",
"connection was forcibly closed",
"network socket disconnected",
# 429 rate-limit patterns
"rate limit",
"rate_limit",
"too many requests",
"status code 429",
# 5xx server error patterns
"overloaded",
"internal server error",
"bad gateway",
"service unavailable",
"gateway timeout",
"status code 529",
"status code 500",
"status code 502",
"status code 503",
"status code 504",
)
FRIENDLY_TRANSIENT_MSG = "Anthropic connection interrupted — please retry"

View File

@@ -14,7 +14,6 @@ from prisma.types import (
ChatSessionUpdateInput,
ChatSessionWhereInput,
)
from pydantic import BaseModel
from backend.data import db
from backend.util.json import SafeJson, sanitize_string
@@ -24,22 +23,12 @@ from .model import (
ChatSession,
ChatSessionInfo,
ChatSessionMetadata,
cache_chat_session,
invalidate_session_cache,
)
from .model import get_chat_session as get_chat_session_cached
logger = logging.getLogger(__name__)
class PaginatedMessages(BaseModel):
"""Result of a paginated message query."""
messages: list[ChatMessage]
has_more: bool
oldest_sequence: int | None
session: ChatSessionInfo
async def get_chat_session(session_id: str) -> ChatSession | None:
"""Get a chat session by ID from the database."""
session = await PrismaChatSession.prisma().find_unique(
@@ -49,116 +38,6 @@ async def get_chat_session(session_id: str) -> ChatSession | None:
return ChatSession.from_db(session) if session else None
async def get_chat_session_metadata(session_id: str) -> ChatSessionInfo | None:
"""Get chat session metadata (without messages) for ownership validation."""
session = await PrismaChatSession.prisma().find_unique(
where={"id": session_id},
)
return ChatSessionInfo.from_db(session) if session else None
async def get_chat_messages_paginated(
session_id: str,
limit: int = 50,
before_sequence: int | None = None,
user_id: str | None = None,
) -> PaginatedMessages | None:
"""Get paginated messages for a session, newest first.
Verifies session existence (and ownership when ``user_id`` is provided)
in parallel with the message query. Returns ``None`` when the session
is not found or does not belong to the user.
Args:
session_id: The chat session ID.
limit: Max messages to return.
before_sequence: Cursor — return messages with sequence < this value.
user_id: If provided, filters via ``Session.userId`` so only the
session owner's messages are returned (acts as an ownership guard).
"""
# Build session-existence / ownership check
session_where: ChatSessionWhereInput = {"id": session_id}
if user_id is not None:
session_where["userId"] = user_id
# Build message include — fetch paginated messages in the same query
msg_include: dict[str, Any] = {
"order_by": {"sequence": "desc"},
"take": limit + 1,
}
if before_sequence is not None:
msg_include["where"] = {"sequence": {"lt": before_sequence}}
# Single query: session existence/ownership + paginated messages
session = await PrismaChatSession.prisma().find_first(
where=session_where,
include={"Messages": msg_include},
)
if session is None:
return None
session_info = ChatSessionInfo.from_db(session)
results = list(session.Messages) if session.Messages else []
has_more = len(results) > limit
results = results[:limit]
# Reverse to ascending order
results.reverse()
# Tool-call boundary fix: if the oldest message is a tool message,
# expand backward to include the preceding assistant message that
# owns the tool_calls, so convertChatSessionMessagesToUiMessages
# can pair them correctly.
_BOUNDARY_SCAN_LIMIT = 10
if results and results[0].role == "tool":
boundary_where: dict[str, Any] = {
"sessionId": session_id,
"sequence": {"lt": results[0].sequence},
}
if user_id is not None:
boundary_where["Session"] = {"is": {"userId": user_id}}
extra = await PrismaChatMessage.prisma().find_many(
where=boundary_where,
order={"sequence": "desc"},
take=_BOUNDARY_SCAN_LIMIT,
)
# Find the first non-tool message (should be the assistant)
boundary_msgs = []
found_owner = False
for msg in extra:
boundary_msgs.append(msg)
if msg.role != "tool":
found_owner = True
break
boundary_msgs.reverse()
if not found_owner:
logger.warning(
"Boundary expansion did not find owning assistant message "
"for session=%s before sequence=%s (%d msgs scanned)",
session_id,
results[0].sequence,
len(extra),
)
if boundary_msgs:
results = boundary_msgs + results
# Only mark has_more if the expanded boundary isn't the
# very start of the conversation (sequence 0).
if boundary_msgs[0].sequence > 0:
has_more = True
messages = [ChatMessage.from_db(m) for m in results]
oldest_sequence = messages[0].sequence if messages else None
return PaginatedMessages(
messages=messages,
has_more=has_more,
oldest_sequence=oldest_sequence,
session=session_info,
)
async def create_chat_session(
session_id: str,
user_id: str,
@@ -501,11 +380,8 @@ async def update_tool_message_content(
async def set_turn_duration(session_id: str, duration_ms: int) -> None:
"""Set durationMs on the last assistant message in a session.
Updates the Redis cache in-place instead of invalidating it.
Invalidation would delete the key, creating a window where concurrent
``get_chat_session`` calls re-populate the cache from DB — potentially
with stale data if the DB write from the previous turn hasn't propagated.
This race caused duplicate user messages on the next turn.
Also invalidates the Redis session cache so the next GET returns
the updated duration.
"""
last_msg = await PrismaChatMessage.prisma().find_first(
where={"sessionId": session_id, "role": "assistant"},
@@ -516,13 +392,5 @@ async def set_turn_duration(session_id: str, duration_ms: int) -> None:
where={"id": last_msg.id},
data={"durationMs": duration_ms},
)
# Update cache in-place rather than invalidating to avoid a
# race window where the empty cache gets re-populated with
# stale data by a concurrent get_chat_session call.
session = await get_chat_session_cached(session_id)
if session and session.messages:
for msg in reversed(session.messages):
if msg.role == "assistant":
msg.duration_ms = duration_ms
break
await cache_chat_session(session)
# Invalidate cache so the session is re-fetched from DB with durationMs
await invalidate_session_cache(session_id)

View File

@@ -1,388 +0,0 @@
"""Unit tests for copilot.db — paginated message queries."""
from __future__ import annotations
from datetime import UTC, datetime
from typing import Any
from unittest.mock import AsyncMock, patch
import pytest
from prisma.models import ChatMessage as PrismaChatMessage
from prisma.models import ChatSession as PrismaChatSession
from backend.copilot.db import (
PaginatedMessages,
get_chat_messages_paginated,
set_turn_duration,
)
from backend.copilot.model import ChatMessage as CopilotChatMessage
from backend.copilot.model import ChatSession, get_chat_session, upsert_chat_session
def _make_msg(
sequence: int,
role: str = "assistant",
content: str | None = "hello",
tool_calls: Any = None,
) -> PrismaChatMessage:
"""Build a minimal PrismaChatMessage for testing."""
return PrismaChatMessage(
id=f"msg-{sequence}",
createdAt=datetime.now(UTC),
sessionId="sess-1",
role=role,
content=content,
sequence=sequence,
toolCalls=tool_calls,
name=None,
toolCallId=None,
refusal=None,
functionCall=None,
)
def _make_session(
session_id: str = "sess-1",
user_id: str = "user-1",
messages: list[PrismaChatMessage] | None = None,
) -> PrismaChatSession:
"""Build a minimal PrismaChatSession for testing."""
now = datetime.now(UTC)
session = PrismaChatSession.model_construct(
id=session_id,
createdAt=now,
updatedAt=now,
userId=user_id,
credentials={},
successfulAgentRuns={},
successfulAgentSchedules={},
totalPromptTokens=0,
totalCompletionTokens=0,
title=None,
metadata={},
Messages=messages or [],
)
return session
SESSION_ID = "sess-1"
@pytest.fixture()
def mock_db():
"""Patch ChatSession.prisma().find_first and ChatMessage.prisma().find_many.
find_first is used for the main query (session + included messages).
find_many is used only for boundary expansion queries.
"""
with (
patch.object(PrismaChatSession, "prisma") as mock_session_prisma,
patch.object(PrismaChatMessage, "prisma") as mock_msg_prisma,
):
find_first = AsyncMock()
mock_session_prisma.return_value.find_first = find_first
find_many = AsyncMock(return_value=[])
mock_msg_prisma.return_value.find_many = find_many
yield find_first, find_many
# ---------- Basic pagination ----------
@pytest.mark.asyncio
async def test_basic_page_returns_messages_ascending(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""Messages are returned in ascending sequence order."""
find_first, _ = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(3), _make_msg(2), _make_msg(1)],
)
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
assert isinstance(page, PaginatedMessages)
assert [m.sequence for m in page.messages] == [1, 2, 3]
assert page.has_more is False
assert page.oldest_sequence == 1
@pytest.mark.asyncio
async def test_has_more_when_results_exceed_limit(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""has_more is True when DB returns more than limit items."""
find_first, _ = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(3), _make_msg(2), _make_msg(1)],
)
page = await get_chat_messages_paginated(SESSION_ID, limit=2)
assert page is not None
assert page.has_more is True
assert len(page.messages) == 2
assert [m.sequence for m in page.messages] == [2, 3]
@pytest.mark.asyncio
async def test_empty_session_returns_no_messages(
mock_db: tuple[AsyncMock, AsyncMock],
):
find_first, _ = mock_db
find_first.return_value = _make_session(messages=[])
page = await get_chat_messages_paginated(SESSION_ID, limit=50)
assert page is not None
assert page.messages == []
assert page.has_more is False
assert page.oldest_sequence is None
@pytest.mark.asyncio
async def test_before_sequence_filters_correctly(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""before_sequence is passed as a where filter inside the Messages include."""
find_first, _ = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(2), _make_msg(1)],
)
await get_chat_messages_paginated(SESSION_ID, limit=50, before_sequence=5)
call_kwargs = find_first.call_args
include = call_kwargs.kwargs.get("include") or call_kwargs[1].get("include")
assert include["Messages"]["where"] == {"sequence": {"lt": 5}}
@pytest.mark.asyncio
async def test_no_where_on_messages_without_before_sequence(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""Without before_sequence, the Messages include has no where clause."""
find_first, _ = mock_db
find_first.return_value = _make_session(messages=[_make_msg(1)])
await get_chat_messages_paginated(SESSION_ID, limit=50)
call_kwargs = find_first.call_args
include = call_kwargs.kwargs.get("include") or call_kwargs[1].get("include")
assert "where" not in include["Messages"]
@pytest.mark.asyncio
async def test_user_id_filter_applied_to_session_where(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""user_id adds a userId filter to the session-level where clause."""
find_first, _ = mock_db
find_first.return_value = _make_session(messages=[_make_msg(1)])
await get_chat_messages_paginated(SESSION_ID, limit=50, user_id="user-abc")
call_kwargs = find_first.call_args
where = call_kwargs.kwargs.get("where") or call_kwargs[1].get("where")
assert where["userId"] == "user-abc"
@pytest.mark.asyncio
async def test_session_not_found_returns_none(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""Returns None when session doesn't exist or user doesn't own it."""
find_first, _ = mock_db
find_first.return_value = None
page = await get_chat_messages_paginated(SESSION_ID, limit=50)
assert page is None
@pytest.mark.asyncio
async def test_session_info_included_in_result(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""PaginatedMessages includes session metadata."""
find_first, _ = mock_db
find_first.return_value = _make_session(messages=[_make_msg(1)])
page = await get_chat_messages_paginated(SESSION_ID, limit=50)
assert page is not None
assert page.session.session_id == SESSION_ID
# ---------- Backward boundary expansion ----------
@pytest.mark.asyncio
async def test_boundary_expansion_includes_assistant(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""When page starts with a tool message, expand backward to include
the owning assistant message."""
find_first, find_many = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(5, role="tool"), _make_msg(4, role="tool")],
)
find_many.return_value = [_make_msg(3, role="assistant")]
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
assert page is not None
assert [m.sequence for m in page.messages] == [3, 4, 5]
assert page.messages[0].role == "assistant"
assert page.oldest_sequence == 3
@pytest.mark.asyncio
async def test_boundary_expansion_includes_multiple_tool_msgs(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""Boundary expansion scans past consecutive tool messages to find
the owning assistant."""
find_first, find_many = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(7, role="tool")],
)
find_many.return_value = [
_make_msg(6, role="tool"),
_make_msg(5, role="tool"),
_make_msg(4, role="assistant"),
]
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
assert page is not None
assert [m.sequence for m in page.messages] == [4, 5, 6, 7]
assert page.messages[0].role == "assistant"
@pytest.mark.asyncio
async def test_boundary_expansion_sets_has_more_when_not_at_start(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""After boundary expansion, has_more=True if expanded msgs aren't at seq 0."""
find_first, find_many = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(3, role="tool")],
)
find_many.return_value = [_make_msg(2, role="assistant")]
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
assert page is not None
assert page.has_more is True
@pytest.mark.asyncio
async def test_boundary_expansion_no_has_more_at_conversation_start(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""has_more stays False when boundary expansion reaches seq 0."""
find_first, find_many = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(1, role="tool")],
)
find_many.return_value = [_make_msg(0, role="assistant")]
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
assert page is not None
assert page.has_more is False
assert page.oldest_sequence == 0
@pytest.mark.asyncio
async def test_no_boundary_expansion_when_first_msg_not_tool(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""No boundary expansion when the first message is not a tool message."""
find_first, find_many = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(3, role="user"), _make_msg(2, role="assistant")],
)
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
assert page is not None
assert find_many.call_count == 0
assert [m.sequence for m in page.messages] == [2, 3]
@pytest.mark.asyncio
async def test_boundary_expansion_warns_when_no_owner_found(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""When boundary scan doesn't find a non-tool message, a warning is logged
and the boundary messages are still included."""
find_first, find_many = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(10, role="tool")],
)
find_many.return_value = [_make_msg(i, role="tool") for i in range(9, -1, -1)]
with patch("backend.copilot.db.logger") as mock_logger:
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
mock_logger.warning.assert_called_once()
assert page is not None
assert page.messages[0].role == "tool"
assert len(page.messages) > 1
# ---------- Turn duration (integration tests) ----------
@pytest.mark.asyncio(loop_scope="session")
async def test_set_turn_duration_updates_cache_in_place(setup_test_user, test_user_id):
"""set_turn_duration patches the cached session without invalidation.
Verifies that after calling set_turn_duration the Redis-cached session
reflects the updated durationMs on the last assistant message, without
the cache having been deleted and re-populated (which could race with
concurrent get_chat_session calls).
"""
session = ChatSession.new(user_id=test_user_id, dry_run=False)
session.messages = [
CopilotChatMessage(role="user", content="hello"),
CopilotChatMessage(role="assistant", content="hi there"),
]
session = await upsert_chat_session(session)
# Ensure the session is in cache
cached = await get_chat_session(session.session_id, test_user_id)
assert cached is not None
assert cached.messages[-1].duration_ms is None
# Update turn duration — should patch cache in-place
await set_turn_duration(session.session_id, 1234)
# Read from cache (not DB) — the cache should already have the update
updated = await get_chat_session(session.session_id, test_user_id)
assert updated is not None
assistant_msgs = [m for m in updated.messages if m.role == "assistant"]
assert len(assistant_msgs) == 1
assert assistant_msgs[0].duration_ms == 1234
@pytest.mark.asyncio(loop_scope="session")
async def test_set_turn_duration_no_assistant_message(setup_test_user, test_user_id):
"""set_turn_duration is a no-op when there are no assistant messages."""
session = ChatSession.new(user_id=test_user_id, dry_run=False)
session.messages = [
CopilotChatMessage(role="user", content="hello"),
]
session = await upsert_chat_session(session)
# Should not raise
await set_turn_duration(session.session_id, 5678)
cached = await get_chat_session(session.session_id, test_user_id)
assert cached is not None
# User message should not have durationMs
assert cached.messages[0].duration_ms is None

View File

@@ -13,7 +13,7 @@ import time
from backend.copilot import stream_registry
from backend.copilot.baseline import stream_chat_completion_baseline
from backend.copilot.config import ChatConfig, CopilotMode
from backend.copilot.config import ChatConfig
from backend.copilot.response_model import StreamError
from backend.copilot.sdk import service as sdk_service
from backend.copilot.sdk.dummy import stream_chat_completion_dummy
@@ -30,57 +30,6 @@ from .utils import CoPilotExecutionEntry, CoPilotLogMetadata
logger = TruncatedLogger(logging.getLogger(__name__), prefix="[CoPilotExecutor]")
# ============ Mode Routing ============ #
async def resolve_effective_mode(
mode: CopilotMode | None,
user_id: str | None,
) -> CopilotMode | None:
"""Strip ``mode`` when the user is not entitled to the toggle.
The UI gates the mode toggle behind ``CHAT_MODE_OPTION``; the
processor enforces the same gate server-side so an authenticated
user cannot bypass the flag by crafting a request directly.
"""
if mode is None:
return None
allowed = await is_feature_enabled(
Flag.CHAT_MODE_OPTION,
user_id or "anonymous",
default=False,
)
if not allowed:
logger.info(f"Ignoring mode={mode} — CHAT_MODE_OPTION is disabled for user")
return None
return mode
async def resolve_use_sdk_for_mode(
mode: CopilotMode | None,
user_id: str | None,
*,
use_claude_code_subscription: bool,
config_default: bool,
) -> bool:
"""Pick the SDK vs baseline path for a single turn.
Per-request ``mode`` wins whenever it is set (after the
``CHAT_MODE_OPTION`` gate has been applied upstream). Otherwise
falls back to the Claude Code subscription override, then the
``COPILOT_SDK`` LaunchDarkly flag, then the config default.
"""
if mode == "fast":
return False
if mode == "extended_thinking":
return True
return use_claude_code_subscription or await is_feature_enabled(
Flag.COPILOT_SDK,
user_id or "anonymous",
default=config_default,
)
# ============ Module Entry Points ============ #
# Thread-local storage for processor instances
@@ -301,17 +250,23 @@ class CoPilotProcessor:
if config.test_mode:
stream_fn = stream_chat_completion_dummy
log.warning("Using DUMMY service (CHAT_TEST_MODE=true)")
effective_mode = None
else:
# Enforce server-side feature-flag gate so unauthorised
# users cannot force a mode by crafting the request.
effective_mode = await resolve_effective_mode(entry.mode, entry.user_id)
use_sdk = await resolve_use_sdk_for_mode(
effective_mode,
entry.user_id,
use_claude_code_subscription=config.use_claude_code_subscription,
config_default=config.use_claude_agent_sdk,
)
# Per-request mode override from the frontend takes priority.
# 'fast' → baseline (OpenAI-compatible), 'extended_thinking' → SDK.
if entry.mode == "fast":
use_sdk = False
elif entry.mode == "extended_thinking":
use_sdk = True
else:
# No mode specified — fall back to feature flag / config.
use_sdk = (
config.use_claude_code_subscription
or await is_feature_enabled(
Flag.COPILOT_SDK,
entry.user_id or "anonymous",
default=config.use_claude_agent_sdk,
)
)
stream_fn = (
sdk_service.stream_chat_completion_sdk
if use_sdk
@@ -319,7 +274,7 @@ class CoPilotProcessor:
)
log.info(
f"Using {'SDK' if use_sdk else 'baseline'} service "
f"(mode={effective_mode or 'default'})"
f"(mode={entry.mode or 'default'})"
)
# Stream chat completion and publish chunks to Redis.
@@ -332,7 +287,6 @@ class CoPilotProcessor:
user_id=entry.user_id,
context=entry.context,
file_ids=entry.file_ids,
mode=effective_mode,
)
async for chunk in stream_registry.stream_and_publish(
session_id=entry.session_id,

View File

@@ -1,175 +0,0 @@
"""Unit tests for CoPilot mode routing logic in the processor.
Tests cover the mode→service mapping:
- 'fast' → baseline service
- 'extended_thinking' → SDK service
- None → feature flag / config fallback
as well as the ``CHAT_MODE_OPTION`` server-side gate. The tests import
the real production helpers from ``processor.py`` so the routing logic
has meaningful coverage.
"""
from unittest.mock import AsyncMock, patch
import pytest
from backend.copilot.executor.processor import (
resolve_effective_mode,
resolve_use_sdk_for_mode,
)
class TestResolveUseSdkForMode:
"""Tests for the per-request mode routing logic."""
@pytest.mark.asyncio
async def test_fast_mode_uses_baseline(self):
"""mode='fast' always routes to baseline, regardless of flags."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=True),
):
assert (
await resolve_use_sdk_for_mode(
"fast",
"user-1",
use_claude_code_subscription=True,
config_default=True,
)
is False
)
@pytest.mark.asyncio
async def test_extended_thinking_uses_sdk(self):
"""mode='extended_thinking' always routes to SDK, regardless of flags."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=False),
):
assert (
await resolve_use_sdk_for_mode(
"extended_thinking",
"user-1",
use_claude_code_subscription=False,
config_default=False,
)
is True
)
@pytest.mark.asyncio
async def test_none_mode_uses_subscription_override(self):
"""mode=None with claude_code_subscription=True routes to SDK."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=False),
):
assert (
await resolve_use_sdk_for_mode(
None,
"user-1",
use_claude_code_subscription=True,
config_default=False,
)
is True
)
@pytest.mark.asyncio
async def test_none_mode_uses_feature_flag(self):
"""mode=None with feature flag enabled routes to SDK."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=True),
) as flag_mock:
assert (
await resolve_use_sdk_for_mode(
None,
"user-1",
use_claude_code_subscription=False,
config_default=False,
)
is True
)
flag_mock.assert_awaited_once()
@pytest.mark.asyncio
async def test_none_mode_uses_config_default(self):
"""mode=None falls back to config.use_claude_agent_sdk."""
# When LaunchDarkly returns the default (True), we expect SDK routing.
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=True),
):
assert (
await resolve_use_sdk_for_mode(
None,
"user-1",
use_claude_code_subscription=False,
config_default=True,
)
is True
)
@pytest.mark.asyncio
async def test_none_mode_all_disabled(self):
"""mode=None with all flags off routes to baseline."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=False),
):
assert (
await resolve_use_sdk_for_mode(
None,
"user-1",
use_claude_code_subscription=False,
config_default=False,
)
is False
)
class TestResolveEffectiveMode:
"""Tests for the CHAT_MODE_OPTION server-side gate."""
@pytest.mark.asyncio
async def test_none_mode_passes_through(self):
"""mode=None is returned as-is without a flag check."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=False),
) as flag_mock:
assert await resolve_effective_mode(None, "user-1") is None
flag_mock.assert_not_awaited()
@pytest.mark.asyncio
async def test_mode_stripped_when_flag_disabled(self):
"""When CHAT_MODE_OPTION is off, mode is dropped to None."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=False),
):
assert await resolve_effective_mode("fast", "user-1") is None
assert await resolve_effective_mode("extended_thinking", "user-1") is None
@pytest.mark.asyncio
async def test_mode_preserved_when_flag_enabled(self):
"""When CHAT_MODE_OPTION is on, the user-selected mode is preserved."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=True),
):
assert await resolve_effective_mode("fast", "user-1") == "fast"
assert (
await resolve_effective_mode("extended_thinking", "user-1")
== "extended_thinking"
)
@pytest.mark.asyncio
async def test_anonymous_user_with_mode(self):
"""Anonymous users (user_id=None) still pass through the gate."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=False),
) as flag_mock:
assert await resolve_effective_mode("fast", None) is None
flag_mock.assert_awaited_once()

View File

@@ -6,10 +6,10 @@ Defines two exchanges and queues following the graph executor pattern:
"""
import logging
from typing import Literal
from pydantic import BaseModel
from backend.copilot.config import CopilotMode
from backend.data.rabbitmq import Exchange, ExchangeType, Queue, RabbitMQConfig
from backend.util.logging import TruncatedLogger, is_structured_logging_enabled
@@ -157,7 +157,7 @@ class CoPilotExecutionEntry(BaseModel):
file_ids: list[str] | None = None
"""Workspace file IDs attached to the user's message"""
mode: CopilotMode | None = None
mode: Literal["fast", "extended_thinking"] | None = None
"""Autopilot mode override: 'fast' or 'extended_thinking'. None = server default."""
@@ -179,7 +179,7 @@ async def enqueue_copilot_turn(
is_user_message: bool = True,
context: dict[str, str] | None = None,
file_ids: list[str] | None = None,
mode: CopilotMode | None = None,
mode: Literal["fast", "extended_thinking"] | None = None,
) -> None:
"""Enqueue a CoPilot task for processing by the executor service.

View File

@@ -1,123 +0,0 @@
"""Tests for CoPilot executor utils (queue config, message models, logging)."""
from backend.copilot.executor.utils import (
COPILOT_EXECUTION_EXCHANGE,
COPILOT_EXECUTION_QUEUE_NAME,
COPILOT_EXECUTION_ROUTING_KEY,
CancelCoPilotEvent,
CoPilotExecutionEntry,
CoPilotLogMetadata,
create_copilot_queue_config,
)
class TestCoPilotExecutionEntry:
def test_basic_fields(self):
entry = CoPilotExecutionEntry(
session_id="s1",
user_id="u1",
message="hello",
)
assert entry.session_id == "s1"
assert entry.user_id == "u1"
assert entry.message == "hello"
assert entry.is_user_message is True
assert entry.mode is None
assert entry.context is None
assert entry.file_ids is None
def test_mode_field(self):
entry = CoPilotExecutionEntry(
session_id="s1",
user_id="u1",
message="test",
mode="fast",
)
assert entry.mode == "fast"
entry2 = CoPilotExecutionEntry(
session_id="s1",
user_id="u1",
message="test",
mode="extended_thinking",
)
assert entry2.mode == "extended_thinking"
def test_optional_fields(self):
entry = CoPilotExecutionEntry(
session_id="s1",
user_id="u1",
message="test",
turn_id="t1",
context={"url": "https://example.com"},
file_ids=["f1", "f2"],
is_user_message=False,
)
assert entry.turn_id == "t1"
assert entry.context == {"url": "https://example.com"}
assert entry.file_ids == ["f1", "f2"]
assert entry.is_user_message is False
def test_serialization_roundtrip(self):
entry = CoPilotExecutionEntry(
session_id="s1",
user_id="u1",
message="hello",
mode="fast",
)
json_str = entry.model_dump_json()
restored = CoPilotExecutionEntry.model_validate_json(json_str)
assert restored == entry
class TestCancelCoPilotEvent:
def test_basic(self):
event = CancelCoPilotEvent(session_id="s1")
assert event.session_id == "s1"
def test_serialization(self):
event = CancelCoPilotEvent(session_id="s1")
restored = CancelCoPilotEvent.model_validate_json(event.model_dump_json())
assert restored.session_id == "s1"
class TestCreateCopilotQueueConfig:
def test_returns_valid_config(self):
config = create_copilot_queue_config()
assert len(config.exchanges) == 2
assert len(config.queues) == 2
def test_execution_queue_properties(self):
config = create_copilot_queue_config()
exec_queue = next(
q for q in config.queues if q.name == COPILOT_EXECUTION_QUEUE_NAME
)
assert exec_queue.durable is True
assert exec_queue.exchange == COPILOT_EXECUTION_EXCHANGE
assert exec_queue.routing_key == COPILOT_EXECUTION_ROUTING_KEY
def test_cancel_queue_uses_fanout(self):
config = create_copilot_queue_config()
cancel_queue = next(
q for q in config.queues if q.name != COPILOT_EXECUTION_QUEUE_NAME
)
assert cancel_queue.exchange is not None
assert cancel_queue.exchange.type.value == "fanout"
class TestCoPilotLogMetadata:
def test_creates_logger_with_metadata(self):
import logging
base_logger = logging.getLogger("test")
log = CoPilotLogMetadata(base_logger, session_id="s1", user_id="u1")
assert log is not None
def test_filters_none_values(self):
import logging
base_logger = logging.getLogger("test")
log = CoPilotLogMetadata(
base_logger, session_id="s1", user_id=None, turn_id="t1"
)
assert log is not None

View File

@@ -64,7 +64,6 @@ class ChatMessage(BaseModel):
refusal: str | None = None
tool_calls: list[dict] | None = None
function_call: dict | None = None
sequence: int | None = None
duration_ms: int | None = None
@staticmethod
@@ -78,54 +77,10 @@ class ChatMessage(BaseModel):
refusal=prisma_message.refusal,
tool_calls=_parse_json_field(prisma_message.toolCalls),
function_call=_parse_json_field(prisma_message.functionCall),
sequence=prisma_message.sequence,
duration_ms=prisma_message.durationMs,
)
def is_message_duplicate(
messages: list[ChatMessage],
role: str,
content: str,
) -> bool:
"""Check whether *content* is already present in the current pending turn.
Only inspects trailing messages that share the given *role* (i.e. the
current turn). This ensures legitimately repeated messages across different
turns are not suppressed, while same-turn duplicates from stale cache are
still caught.
"""
for m in reversed(messages):
if m.role == role:
if m.content == content:
return True
else:
break
return False
def maybe_append_user_message(
session: "ChatSession",
message: str | None,
is_user_message: bool,
) -> bool:
"""Append a user/assistant message to the session if not already present.
The route handler already persists the user message before enqueueing,
so we check trailing same-role messages to avoid re-appending when the
session cache is slightly stale.
Returns True if the message was appended, False if skipped.
"""
if not message:
return False
role = "user" if is_user_message else "assistant"
if is_message_duplicate(session.messages, role, message):
return False
session.messages.append(ChatMessage(role=role, content=message))
return True
class Usage(BaseModel):
prompt_tokens: int
completion_tokens: int

View File

@@ -17,8 +17,6 @@ from .model import (
ChatSession,
Usage,
get_chat_session,
is_message_duplicate,
maybe_append_user_message,
upsert_chat_session,
)
@@ -426,151 +424,3 @@ async def test_concurrent_saves_collision_detection(setup_test_user, test_user_i
assert "Streaming message 1" in contents
assert "Streaming message 2" in contents
assert "Callback result" in contents
# --------------------------------------------------------------------------- #
# is_message_duplicate #
# --------------------------------------------------------------------------- #
def test_duplicate_detected_in_trailing_same_role():
"""Duplicate user message at the tail is detected."""
msgs = [
ChatMessage(role="user", content="hello"),
ChatMessage(role="assistant", content="hi there"),
ChatMessage(role="user", content="yes"),
]
assert is_message_duplicate(msgs, "user", "yes") is True
def test_duplicate_not_detected_across_turns():
"""Same text in a previous turn (separated by assistant) is NOT a duplicate."""
msgs = [
ChatMessage(role="user", content="yes"),
ChatMessage(role="assistant", content="ok"),
]
assert is_message_duplicate(msgs, "user", "yes") is False
def test_no_duplicate_on_empty_messages():
"""Empty message list never reports a duplicate."""
assert is_message_duplicate([], "user", "hello") is False
def test_no_duplicate_when_content_differs():
"""Different content in the trailing same-role block is not a duplicate."""
msgs = [
ChatMessage(role="assistant", content="response"),
ChatMessage(role="user", content="first message"),
]
assert is_message_duplicate(msgs, "user", "second message") is False
def test_duplicate_with_multiple_trailing_same_role():
"""Detects duplicate among multiple consecutive same-role messages."""
msgs = [
ChatMessage(role="assistant", content="response"),
ChatMessage(role="user", content="msg1"),
ChatMessage(role="user", content="msg2"),
]
assert is_message_duplicate(msgs, "user", "msg1") is True
assert is_message_duplicate(msgs, "user", "msg2") is True
assert is_message_duplicate(msgs, "user", "msg3") is False
def test_duplicate_check_for_assistant_role():
"""Works correctly when checking assistant role too."""
msgs = [
ChatMessage(role="user", content="hi"),
ChatMessage(role="assistant", content="hello"),
ChatMessage(role="assistant", content="how can I help?"),
]
assert is_message_duplicate(msgs, "assistant", "hello") is True
assert is_message_duplicate(msgs, "assistant", "new response") is False
def test_no_false_positive_when_content_is_none():
"""Messages with content=None in the trailing block do not match."""
msgs = [
ChatMessage(role="user", content=None),
ChatMessage(role="user", content="hello"),
]
assert is_message_duplicate(msgs, "user", "hello") is True
# None-content message should not match any string
msgs2 = [
ChatMessage(role="user", content=None),
]
assert is_message_duplicate(msgs2, "user", "hello") is False
def test_all_same_role_messages():
"""When all messages share the same role, the entire list is scanned."""
msgs = [
ChatMessage(role="user", content="first"),
ChatMessage(role="user", content="second"),
ChatMessage(role="user", content="third"),
]
assert is_message_duplicate(msgs, "user", "first") is True
assert is_message_duplicate(msgs, "user", "new") is False
# --------------------------------------------------------------------------- #
# maybe_append_user_message #
# --------------------------------------------------------------------------- #
def test_maybe_append_user_message_appends_new():
"""A new user message is appended and returns True."""
session = ChatSession.new(user_id="u", dry_run=False)
session.messages = [
ChatMessage(role="assistant", content="hello"),
]
result = maybe_append_user_message(session, "new msg", is_user_message=True)
assert result is True
assert len(session.messages) == 2
assert session.messages[-1].role == "user"
assert session.messages[-1].content == "new msg"
def test_maybe_append_user_message_skips_duplicate():
"""A duplicate user message is skipped and returns False."""
session = ChatSession.new(user_id="u", dry_run=False)
session.messages = [
ChatMessage(role="assistant", content="hello"),
ChatMessage(role="user", content="dup"),
]
result = maybe_append_user_message(session, "dup", is_user_message=True)
assert result is False
assert len(session.messages) == 2
def test_maybe_append_user_message_none_message():
"""None/empty message returns False without appending."""
session = ChatSession.new(user_id="u", dry_run=False)
assert maybe_append_user_message(session, None, is_user_message=True) is False
assert maybe_append_user_message(session, "", is_user_message=True) is False
assert len(session.messages) == 0
def test_maybe_append_assistant_message():
"""Works for assistant role when is_user_message=False."""
session = ChatSession.new(user_id="u", dry_run=False)
session.messages = [
ChatMessage(role="user", content="hi"),
]
result = maybe_append_user_message(session, "response", is_user_message=False)
assert result is True
assert session.messages[-1].role == "assistant"
assert session.messages[-1].content == "response"
def test_maybe_append_assistant_skips_duplicate():
"""Duplicate assistant message is skipped."""
session = ChatSession.new(user_id="u", dry_run=False)
session.messages = [
ChatMessage(role="user", content="hi"),
ChatMessage(role="assistant", content="dup"),
]
result = maybe_append_user_message(session, "dup", is_user_message=False)
assert result is False
assert len(session.messages) == 2

View File

@@ -13,21 +13,12 @@ from .rate_limit import (
RateLimitExceeded,
SubscriptionTier,
UsageWindow,
_daily_key,
_daily_reset_time,
_weekly_key,
_weekly_reset_time,
acquire_reset_lock,
check_rate_limit,
get_daily_reset_count,
get_global_rate_limits,
get_usage_status,
get_user_tier,
increment_daily_reset_count,
record_token_usage,
release_reset_lock,
reset_daily_usage,
reset_user_usage,
set_user_tier,
)
@@ -1219,205 +1210,3 @@ class TestTierLimitsEnforced:
assert daily == biz_daily # 20x
# Should NOT raise — usage is within the BUSINESS tier allowance
await check_rate_limit(_USER, daily, weekly)
# ---------------------------------------------------------------------------
# Private key/reset helpers
# ---------------------------------------------------------------------------
class TestKeyHelpers:
def test_daily_key_format(self):
now = datetime(2026, 4, 3, 12, 0, 0, tzinfo=UTC)
key = _daily_key("user-1", now=now)
assert "daily" in key
assert "user-1" in key
assert "2026-04-03" in key
def test_daily_key_defaults_to_now(self):
key = _daily_key("user-1")
assert "daily" in key
assert "user-1" in key
def test_weekly_key_format(self):
now = datetime(2026, 4, 3, 12, 0, 0, tzinfo=UTC)
key = _weekly_key("user-1", now=now)
assert "weekly" in key
assert "user-1" in key
assert "2026-W" in key
def test_weekly_key_defaults_to_now(self):
key = _weekly_key("user-1")
assert "weekly" in key
def test_daily_reset_time_is_next_midnight(self):
now = datetime(2026, 4, 3, 15, 30, 0, tzinfo=UTC)
reset = _daily_reset_time(now=now)
assert reset == datetime(2026, 4, 4, 0, 0, 0, tzinfo=UTC)
def test_daily_reset_time_defaults_to_now(self):
reset = _daily_reset_time()
assert reset.hour == 0
assert reset.minute == 0
def test_weekly_reset_time_is_next_monday(self):
# 2026-04-03 is a Friday
now = datetime(2026, 4, 3, 15, 30, 0, tzinfo=UTC)
reset = _weekly_reset_time(now=now)
assert reset.weekday() == 0 # Monday
assert reset == datetime(2026, 4, 6, 0, 0, 0, tzinfo=UTC)
def test_weekly_reset_time_defaults_to_now(self):
reset = _weekly_reset_time()
assert reset.weekday() == 0 # Monday
# ---------------------------------------------------------------------------
# acquire_reset_lock / release_reset_lock
# ---------------------------------------------------------------------------
class TestResetLock:
@pytest.mark.asyncio
async def test_acquire_lock_success(self):
mock_redis = AsyncMock()
mock_redis.set = AsyncMock(return_value=True)
with patch(
"backend.copilot.rate_limit.get_redis_async", return_value=mock_redis
):
result = await acquire_reset_lock("user-1")
assert result is True
@pytest.mark.asyncio
async def test_acquire_lock_already_held(self):
mock_redis = AsyncMock()
mock_redis.set = AsyncMock(return_value=False)
with patch(
"backend.copilot.rate_limit.get_redis_async", return_value=mock_redis
):
result = await acquire_reset_lock("user-1")
assert result is False
@pytest.mark.asyncio
async def test_acquire_lock_redis_unavailable(self):
with patch(
"backend.copilot.rate_limit.get_redis_async",
side_effect=RedisError("down"),
):
result = await acquire_reset_lock("user-1")
assert result is False
@pytest.mark.asyncio
async def test_release_lock_success(self):
mock_redis = AsyncMock()
with patch(
"backend.copilot.rate_limit.get_redis_async", return_value=mock_redis
):
await release_reset_lock("user-1")
mock_redis.delete.assert_called_once()
@pytest.mark.asyncio
async def test_release_lock_redis_unavailable(self):
with patch(
"backend.copilot.rate_limit.get_redis_async",
side_effect=RedisError("down"),
):
# Should not raise
await release_reset_lock("user-1")
# ---------------------------------------------------------------------------
# get_daily_reset_count / increment_daily_reset_count
# ---------------------------------------------------------------------------
class TestDailyResetCount:
@pytest.mark.asyncio
async def test_get_count_returns_value(self):
mock_redis = AsyncMock()
mock_redis.get = AsyncMock(return_value="3")
with patch(
"backend.copilot.rate_limit.get_redis_async", return_value=mock_redis
):
count = await get_daily_reset_count("user-1")
assert count == 3
@pytest.mark.asyncio
async def test_get_count_returns_zero_when_no_key(self):
mock_redis = AsyncMock()
mock_redis.get = AsyncMock(return_value=None)
with patch(
"backend.copilot.rate_limit.get_redis_async", return_value=mock_redis
):
count = await get_daily_reset_count("user-1")
assert count == 0
@pytest.mark.asyncio
async def test_get_count_returns_none_when_redis_unavailable(self):
with patch(
"backend.copilot.rate_limit.get_redis_async",
side_effect=RedisError("down"),
):
count = await get_daily_reset_count("user-1")
assert count is None
@pytest.mark.asyncio
async def test_increment_count(self):
mock_pipe = MagicMock()
mock_pipe.incr = MagicMock()
mock_pipe.expire = MagicMock()
mock_pipe.execute = AsyncMock()
mock_redis = AsyncMock()
mock_redis.pipeline = MagicMock(return_value=mock_pipe)
with patch(
"backend.copilot.rate_limit.get_redis_async", return_value=mock_redis
):
await increment_daily_reset_count("user-1")
mock_pipe.incr.assert_called_once()
mock_pipe.expire.assert_called_once()
@pytest.mark.asyncio
async def test_increment_count_redis_unavailable(self):
with patch(
"backend.copilot.rate_limit.get_redis_async",
side_effect=RedisError("down"),
):
# Should not raise
await increment_daily_reset_count("user-1")
# ---------------------------------------------------------------------------
# reset_user_usage
# ---------------------------------------------------------------------------
class TestResetUserUsage:
@pytest.mark.asyncio
async def test_resets_daily_key(self):
mock_redis = AsyncMock()
with patch(
"backend.copilot.rate_limit.get_redis_async", return_value=mock_redis
):
await reset_user_usage("user-1")
mock_redis.delete.assert_called_once()
@pytest.mark.asyncio
async def test_resets_daily_and_weekly(self):
mock_redis = AsyncMock()
with patch(
"backend.copilot.rate_limit.get_redis_async", return_value=mock_redis
):
await reset_user_usage("user-1", reset_weekly=True)
args = mock_redis.delete.call_args[0]
assert len(args) == 2 # both daily and weekly keys
@pytest.mark.asyncio
async def test_raises_on_redis_failure(self):
with patch(
"backend.copilot.rate_limit.get_redis_async",
side_effect=RedisError("down"),
):
with pytest.raises(RedisError):
await reset_user_usage("user-1")

View File

@@ -0,0 +1,391 @@
"""Tests for P0 guardrails: _resolve_fallback_model, security env vars, TMPDIR."""
from unittest.mock import patch
import pytest
from pydantic import ValidationError
from backend.copilot.config import ChatConfig
from backend.copilot.constants import is_transient_api_error
def _make_config(**overrides) -> ChatConfig:
"""Create a ChatConfig with safe defaults, applying *overrides*."""
defaults = {
"use_claude_code_subscription": False,
"use_openrouter": False,
"api_key": None,
"base_url": None,
}
defaults.update(overrides)
return ChatConfig(**defaults)
# ---------------------------------------------------------------------------
# _resolve_fallback_model
# ---------------------------------------------------------------------------
_SVC = "backend.copilot.sdk.service"
class TestResolveFallbackModel:
"""Provider-aware fallback model resolution."""
def test_returns_none_when_empty(self):
cfg = _make_config(claude_agent_fallback_model="")
with patch(f"{_SVC}.config", cfg):
from backend.copilot.sdk.service import _resolve_fallback_model
assert _resolve_fallback_model() is None
def test_strips_provider_prefix(self):
"""OpenRouter-style 'anthropic/claude-sonnet-4-...' is stripped."""
cfg = _make_config(
claude_agent_fallback_model="anthropic/claude-sonnet-4-20250514",
use_openrouter=True,
api_key="sk-test",
base_url="https://openrouter.ai/api/v1",
)
with patch(f"{_SVC}.config", cfg):
from backend.copilot.sdk.service import _resolve_fallback_model
result = _resolve_fallback_model()
assert result == "claude-sonnet-4-20250514"
assert "/" not in result
def test_dots_replaced_for_direct_anthropic(self):
"""Direct Anthropic requires hyphen-separated versions."""
cfg = _make_config(
claude_agent_fallback_model="claude-sonnet-4.5-20250514",
use_openrouter=False,
)
with patch(f"{_SVC}.config", cfg):
from backend.copilot.sdk.service import _resolve_fallback_model
result = _resolve_fallback_model()
assert result is not None
assert "." not in result
assert result == "claude-sonnet-4-5-20250514"
def test_dots_preserved_for_openrouter(self):
"""OpenRouter uses dot-separated versions — don't normalise."""
cfg = _make_config(
claude_agent_fallback_model="claude-sonnet-4.5-20250514",
use_openrouter=True,
api_key="sk-test",
base_url="https://openrouter.ai/api/v1",
)
with patch(f"{_SVC}.config", cfg):
from backend.copilot.sdk.service import _resolve_fallback_model
result = _resolve_fallback_model()
assert result == "claude-sonnet-4.5-20250514"
def test_default_value(self):
"""Default fallback model resolves to a valid string."""
cfg = _make_config()
with patch(f"{_SVC}.config", cfg):
from backend.copilot.sdk.service import _resolve_fallback_model
result = _resolve_fallback_model()
assert result is not None
assert "sonnet" in result.lower() or "claude" in result.lower()
# ---------------------------------------------------------------------------
# Security & isolation env vars
# ---------------------------------------------------------------------------
class TestSecurityEnvVars:
"""Verify the env-var contract in the service module.
The production code sets CLAUDE_CODE_TMPDIR and security env vars
inline after ``build_sdk_env()`` returns. We grep for these string
literals in ``service.py`` to ensure they aren't accidentally removed.
"""
_SERVICE_PATH = "autogpt_platform/backend/backend/copilot/sdk/service.py"
@staticmethod
def _read_service_source() -> str:
import pathlib
# Walk up from this test file to the repo root
repo = pathlib.Path(__file__).resolve().parents[5]
return (repo / TestSecurityEnvVars._SERVICE_PATH).read_text()
def test_tmpdir_env_var_present_in_source(self):
"""CLAUDE_CODE_TMPDIR must be set when sdk_cwd is provided."""
src = self._read_service_source()
assert 'sdk_env["CLAUDE_CODE_TMPDIR"]' in src
def test_home_not_overridden_in_source(self):
"""HOME must NOT be overridden — would break git/ssh/npm."""
src = self._read_service_source()
assert 'sdk_env["HOME"]' not in src
def test_security_env_vars_present_in_source(self):
"""All four security env vars must be set in the service module."""
src = self._read_service_source()
for var in (
"CLAUDE_CODE_DISABLE_CLAUDE_MDS",
"CLAUDE_CODE_SKIP_PROMPT_HISTORY",
"CLAUDE_CODE_DISABLE_AUTO_MEMORY",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC",
):
assert var in src, f"{var} not found in service.py"
# ---------------------------------------------------------------------------
# Config defaults
# ---------------------------------------------------------------------------
class TestConfigDefaults:
"""Verify ChatConfig P0 fields have correct defaults."""
def test_fallback_model_default(self):
cfg = _make_config()
assert cfg.claude_agent_fallback_model
assert "sonnet" in cfg.claude_agent_fallback_model.lower()
def test_max_turns_default(self):
cfg = _make_config()
assert cfg.claude_agent_max_turns == 50
def test_max_budget_usd_default(self):
cfg = _make_config()
assert cfg.claude_agent_max_budget_usd == 5.0
def test_max_transient_retries_default(self):
cfg = _make_config()
assert cfg.claude_agent_max_transient_retries == 3
# ---------------------------------------------------------------------------
# build_sdk_env — all 3 auth modes
# ---------------------------------------------------------------------------
_ENV = "backend.copilot.sdk.env"
class TestBuildSdkEnv:
"""Verify build_sdk_env returns correct dicts for each auth mode."""
def test_subscription_mode_clears_keys(self):
"""Mode 1: subscription clears API key / auth token / base URL."""
cfg = _make_config(use_claude_code_subscription=True)
with (
patch(f"{_ENV}.config", cfg),
patch(f"{_ENV}.validate_subscription"),
):
from backend.copilot.sdk.env import build_sdk_env
env = build_sdk_env(session_id="s1", user_id="u1")
assert env["ANTHROPIC_API_KEY"] == ""
assert env["ANTHROPIC_AUTH_TOKEN"] == ""
assert env["ANTHROPIC_BASE_URL"] == ""
def test_direct_anthropic_returns_empty_dict(self):
"""Mode 2: direct Anthropic returns {} (inherits from parent env)."""
cfg = _make_config(
use_claude_code_subscription=False,
use_openrouter=False,
)
with patch(f"{_ENV}.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
env = build_sdk_env()
assert env == {}
def test_openrouter_sets_base_url_and_auth(self):
"""Mode 3: OpenRouter sets base URL, auth token, and clears API key."""
cfg = _make_config(
use_claude_code_subscription=False,
use_openrouter=True,
api_key="sk-or-test",
base_url="https://openrouter.ai/api/v1",
)
with patch(f"{_ENV}.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
env = build_sdk_env(session_id="sess-1", user_id="user-1")
assert env["ANTHROPIC_BASE_URL"] == "https://openrouter.ai/api"
assert env["ANTHROPIC_AUTH_TOKEN"] == "sk-or-test"
assert env["ANTHROPIC_API_KEY"] == ""
assert "x-session-id: sess-1" in env["ANTHROPIC_CUSTOM_HEADERS"]
assert "x-user-id: user-1" in env["ANTHROPIC_CUSTOM_HEADERS"]
def test_openrouter_no_headers_when_ids_empty(self):
"""Mode 3: No custom headers when session_id/user_id are not given."""
cfg = _make_config(
use_claude_code_subscription=False,
use_openrouter=True,
api_key="sk-or-test",
base_url="https://openrouter.ai/api/v1",
)
with patch(f"{_ENV}.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
env = build_sdk_env()
assert "ANTHROPIC_CUSTOM_HEADERS" not in env
def test_all_modes_return_mutable_dict(self):
"""build_sdk_env must return a mutable dict (not None) so callers
can add security env vars like CLAUDE_CODE_TMPDIR."""
for cfg in (
_make_config(use_claude_code_subscription=True),
_make_config(use_openrouter=False),
_make_config(
use_openrouter=True,
api_key="k",
base_url="https://openrouter.ai/api/v1",
),
):
with (
patch(f"{_ENV}.config", cfg),
patch(f"{_ENV}.validate_subscription"),
):
from backend.copilot.sdk.env import build_sdk_env
env = build_sdk_env()
assert isinstance(env, dict)
env["CLAUDE_CODE_TMPDIR"] = "/tmp/test"
assert env["CLAUDE_CODE_TMPDIR"] == "/tmp/test"
# ---------------------------------------------------------------------------
# is_transient_api_error
# ---------------------------------------------------------------------------
class TestIsTransientApiError:
"""Verify that is_transient_api_error detects all transient patterns."""
@pytest.mark.parametrize(
"error_text",
[
"socket connection was closed unexpectedly",
"ECONNRESET",
"connection was forcibly closed",
"network socket disconnected",
],
)
def test_connection_level_errors(self, error_text: str):
assert is_transient_api_error(error_text)
@pytest.mark.parametrize(
"error_text",
[
"rate limit exceeded",
"rate_limit_error",
"Too Many Requests",
"status code 429",
],
)
def test_429_rate_limit_errors(self, error_text: str):
assert is_transient_api_error(error_text)
@pytest.mark.parametrize(
"error_text",
[
"API is overloaded",
"Internal Server Error",
"Bad Gateway",
"Service Unavailable",
"Gateway Timeout",
"status code 529",
"status code 500",
"status code 502",
"status code 503",
"status code 504",
],
)
def test_5xx_server_errors(self, error_text: str):
assert is_transient_api_error(error_text)
@pytest.mark.parametrize(
"error_text",
[
"invalid_api_key",
"Authentication failed",
"prompt is too long",
"model not found",
"",
],
)
def test_non_transient_errors(self, error_text: str):
assert not is_transient_api_error(error_text)
def test_case_insensitive(self):
assert is_transient_api_error("SOCKET CONNECTION WAS CLOSED UNEXPECTEDLY")
assert is_transient_api_error("econnreset")
# ---------------------------------------------------------------------------
# Config validators for max_turns / max_budget_usd
# ---------------------------------------------------------------------------
class TestConfigValidators:
"""Verify ge/le bounds on max_turns and max_budget_usd."""
def test_max_turns_rejects_zero(self):
with pytest.raises(ValidationError):
_make_config(claude_agent_max_turns=0)
def test_max_turns_rejects_negative(self):
with pytest.raises(ValidationError):
_make_config(claude_agent_max_turns=-1)
def test_max_turns_rejects_above_500(self):
with pytest.raises(ValidationError):
_make_config(claude_agent_max_turns=501)
def test_max_turns_accepts_boundary_values(self):
cfg_low = _make_config(claude_agent_max_turns=1)
assert cfg_low.claude_agent_max_turns == 1
cfg_high = _make_config(claude_agent_max_turns=500)
assert cfg_high.claude_agent_max_turns == 500
def test_max_budget_rejects_zero(self):
with pytest.raises(ValidationError):
_make_config(claude_agent_max_budget_usd=0.0)
def test_max_budget_rejects_negative(self):
with pytest.raises(ValidationError):
_make_config(claude_agent_max_budget_usd=-1.0)
def test_max_budget_rejects_above_100(self):
with pytest.raises(ValidationError):
_make_config(claude_agent_max_budget_usd=100.01)
def test_max_budget_accepts_boundary_values(self):
cfg_low = _make_config(claude_agent_max_budget_usd=0.01)
assert cfg_low.claude_agent_max_budget_usd == 0.01
cfg_high = _make_config(claude_agent_max_budget_usd=100.0)
assert cfg_high.claude_agent_max_budget_usd == 100.0
def test_max_transient_retries_rejects_negative(self):
with pytest.raises(ValidationError):
_make_config(claude_agent_max_transient_retries=-1)
def test_max_transient_retries_rejects_above_10(self):
with pytest.raises(ValidationError):
_make_config(claude_agent_max_transient_retries=11)
def test_max_transient_retries_accepts_boundary_values(self):
cfg_low = _make_config(claude_agent_max_transient_retries=0)
assert cfg_low.claude_agent_max_transient_retries == 0
cfg_high = _make_config(claude_agent_max_transient_retries=10)
assert cfg_high.claude_agent_max_transient_retries == 10

View File

@@ -8,19 +8,20 @@ from uuid import uuid4
import pytest
from backend.copilot.transcript import (
_flatten_assistant_content,
_flatten_tool_result_content,
_messages_to_transcript,
_run_compression,
_transcript_to_messages,
)
from backend.util import json
from backend.util.prompt import CompressResult
from .conftest import build_test_transcript as _build_transcript
from .service import _friendly_error_text, _is_prompt_too_long
from .transcript import compact_transcript, validate_transcript
from .transcript import (
_flatten_assistant_content,
_flatten_tool_result_content,
_messages_to_transcript,
_run_compression,
_transcript_to_messages,
compact_transcript,
validate_transcript,
)
# ---------------------------------------------------------------------------
# _flatten_assistant_content

View File

@@ -260,13 +260,13 @@ def test_result_error_emits_error_and_finish():
is_error=True,
num_turns=0,
session_id="s1",
result="API rate limited",
result="Invalid API key provided",
)
results = adapter.convert_message(msg)
# No step was open, so no FinishStep — just Error + Finish
assert len(results) == 2
assert isinstance(results[0], StreamError)
assert "API rate limited" in results[0].errorText
assert "Invalid API key provided" in results[0].errorText
assert isinstance(results[1], StreamFinish)

View File

@@ -26,17 +26,18 @@ from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from backend.copilot.transcript import (
_flatten_assistant_content,
_flatten_tool_result_content,
_messages_to_transcript,
_transcript_to_messages,
)
from backend.util import json
from .conftest import build_test_transcript as _build_transcript
from .service import _MAX_STREAM_ATTEMPTS, _reduce_context
from .transcript import compact_transcript, validate_transcript
from .transcript import (
_flatten_assistant_content,
_flatten_tool_result_content,
_messages_to_transcript,
_transcript_to_messages,
compact_transcript,
validate_transcript,
)
from .transcript_builder import TranscriptBuilder
# ---------------------------------------------------------------------------
@@ -1404,9 +1405,9 @@ class TestStreamChatCompletionRetryIntegration:
events.append(event)
# Should NOT retry — only 1 attempt for auth errors
assert (
attempt_count[0] == 1
), f"Expected 1 attempt (no retry for auth error), got {attempt_count[0]}"
assert attempt_count[0] == 1, (
f"Expected 1 attempt (no retry for auth error), " f"got {attempt_count[0]}"
)
errors = [e for e in events if isinstance(e, StreamError)]
assert errors, "Expected StreamError"
assert errors[0].code == "sdk_stream_error"

View File

@@ -105,6 +105,10 @@ def test_agent_options_accepts_all_our_fields():
"env",
"resume",
"max_buffer_size",
"stderr",
"fallback_model",
"max_turns",
"max_budget_usd",
]
sig = inspect.signature(ClaudeAgentOptions)
for field in fields_we_use:

View File

@@ -33,24 +33,12 @@ from pydantic import BaseModel
from backend.copilot.context import get_workspace_manager
from backend.copilot.permissions import apply_tool_permissions
from backend.copilot.rate_limit import get_user_tier
from backend.copilot.transcript import (
_run_compression,
cleanup_stale_project_dirs,
compact_transcript,
download_transcript,
read_compacted_entries,
upload_transcript,
validate_transcript,
write_transcript_to_tempfile,
)
from backend.copilot.transcript_builder import TranscriptBuilder
from backend.data.redis_client import get_redis_async
from backend.executor.cluster_lock import AsyncClusterLock
from backend.util.exceptions import NotFoundError
from backend.util.settings import Settings
from ..config import ChatConfig, CopilotMode
from ..config import ChatConfig
from ..constants import (
COPILOT_ERROR_PREFIX,
COPILOT_RETRYABLE_ERROR_PREFIX,
@@ -63,7 +51,6 @@ from ..model import (
ChatMessage,
ChatSession,
get_chat_session,
maybe_append_user_message,
update_session_title,
upsert_chat_session,
)
@@ -105,6 +92,17 @@ from .tool_adapter import (
set_execution_context,
wait_for_stash,
)
from .transcript import (
_run_compression,
cleanup_stale_project_dirs,
compact_transcript,
download_transcript,
read_compacted_entries,
upload_transcript,
validate_transcript,
write_transcript_to_tempfile,
)
from .transcript_builder import TranscriptBuilder
logger = logging.getLogger(__name__)
config = ChatConfig()
@@ -131,11 +129,6 @@ _CIRCUIT_BREAKER_ERROR_MSG = (
"Try breaking your request into smaller parts."
)
# Idle timeout: abort the stream if no meaningful SDK message (only heartbeats)
# arrives for this many seconds. This catches hung tool calls (e.g. WebSearch
# hanging on a search provider that never responds).
_IDLE_TIMEOUT_SECONDS = 10 * 60 # 10 minutes
# Patterns that indicate the prompt/request exceeds the model's context limit.
# Matched case-insensitively against the full exception chain.
_PROMPT_TOO_LONG_PATTERNS: tuple[str, ...] = (
@@ -1278,8 +1271,6 @@ async def _run_stream_attempt(
await client.query(state.query_message, session_id=ctx.session_id)
state.transcript_builder.append_user(content=ctx.current_message)
_last_real_msg_time = time.monotonic()
async for sdk_msg in _iter_sdk_messages(client):
# Heartbeat sentinel — refresh lock and keep SSE alive
if sdk_msg is None:
@@ -1287,34 +1278,8 @@ async def _run_stream_attempt(
for ev in ctx.compaction.emit_start_if_ready():
yield ev
yield StreamHeartbeat()
# Idle timeout: if no real SDK message for too long, a tool
# call is likely hung (e.g. WebSearch provider not responding).
idle_seconds = time.monotonic() - _last_real_msg_time
if idle_seconds >= _IDLE_TIMEOUT_SECONDS:
logger.error(
"%s Idle timeout after %.0fs with no SDK message — "
"aborting stream (likely hung tool call)",
ctx.log_prefix,
idle_seconds,
)
stream_error_msg = (
"A tool call appears to be stuck "
"(no response for 10 minutes). "
"Please try again."
)
stream_error_code = "idle_timeout"
_append_error_marker(ctx.session, stream_error_msg, retryable=True)
yield StreamError(
errorText=stream_error_msg,
code=stream_error_code,
)
ended_with_stream_error = True
break
continue
_last_real_msg_time = time.monotonic()
logger.info(
"%s Received: %s %s (unresolved=%d, current=%d, resolved=%d)",
ctx.log_prefix,
@@ -1563,21 +1528,9 @@ async def _run_stream_attempt(
# --- Intermediate persistence ---
# Flush session messages to DB periodically so page reloads
# show progress during long-running turns.
#
# IMPORTANT: Skip the flush while tool calls are pending
# (tool_calls set on assistant but results not yet received).
# The DB save is append-only (uses start_sequence), so if we
# flush the assistant message before tool_calls are set on it
# (text and tool_use arrive as separate SDK events), the
# tool_calls update is lost — the next flush starts past it.
_msgs_since_flush += 1
now = time.monotonic()
has_pending_tools = (
acc.has_appended_assistant
and acc.accumulated_tool_calls
and not acc.has_tool_results
)
if not has_pending_tools and (
if (
_msgs_since_flush >= _FLUSH_MESSAGE_THRESHOLD
or (now - _last_flush_time) >= _FLUSH_INTERVAL_SECONDS
):
@@ -1677,7 +1630,6 @@ async def stream_chat_completion_sdk(
session: ChatSession | None = None,
file_ids: list[str] | None = None,
permissions: "CopilotPermissions | None" = None,
mode: CopilotMode | None = None,
**_kwargs: Any,
) -> AsyncIterator[StreamBaseResponse]:
"""Stream chat completion using Claude Agent SDK.
@@ -1686,10 +1638,7 @@ async def stream_chat_completion_sdk(
file_ids: Optional workspace file IDs attached to the user's message.
Images are embedded as vision content blocks; other files are
saved to the SDK working directory for the Read tool.
mode: Accepted for signature compatibility with the baseline path.
The SDK path does not currently branch on this value.
"""
_ = mode # SDK path ignores the requested mode.
if session is None:
session = await get_chat_session(session_id, user_id)
@@ -1720,12 +1669,19 @@ async def stream_chat_completion_sdk(
)
session.messages.pop()
if maybe_append_user_message(session, message, is_user_message):
# Append the new message to the session if it's not already there
new_message_role = "user" if is_user_message else "assistant"
if message and (
len(session.messages) == 0
or not (
session.messages[-1].role == new_message_role
and session.messages[-1].content == message
)
):
session.messages.append(ChatMessage(role=new_message_role, content=message))
if is_user_message:
track_user_message(
user_id=user_id,
session_id=session_id,
message_length=len(message or ""),
user_id=user_id, session_id=session_id, message_length=len(message)
)
# Structured log prefix: [SDK][<session>][T<turn>]
@@ -1990,20 +1946,15 @@ async def stream_chat_completion_sdk(
# langsmith tracing integration attaches them to every span. This
# is what Langfuse (or any OTEL backend) maps to its native
# user/session fields.
_user_tier = await get_user_tier(user_id) if user_id else None
_otel_metadata: dict[str, str] = {
"resume": str(use_resume),
"conversation_turn": str(turn),
}
if _user_tier:
_otel_metadata["subscription_tier"] = _user_tier.value
_otel_ctx = propagate_attributes(
user_id=user_id,
session_id=session_id,
trace_name="copilot-sdk",
tags=["sdk"],
metadata=_otel_metadata,
metadata={
"resume": str(use_resume),
"conversation_turn": str(turn),
},
)
_otel_ctx.__enter__()

View File

@@ -10,6 +10,7 @@ import pytest
from .service import (
_is_sdk_disconnect_error,
_normalize_model_name,
_prepare_file_attachments,
_resolve_sdk_model,
_safe_close_sdk_client,
@@ -405,6 +406,49 @@ def _clean_config_env(monkeypatch: pytest.MonkeyPatch) -> None:
monkeypatch.delenv(var, raising=False)
class TestNormalizeModelName:
"""Tests for _normalize_model_name — shared provider-aware normalization."""
def test_strips_provider_prefix(self, monkeypatch, _clean_config_env):
from backend.copilot import config as cfg_mod
cfg = cfg_mod.ChatConfig(
use_openrouter=False,
api_key=None,
base_url=None,
use_claude_code_subscription=False,
)
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
assert _normalize_model_name("anthropic/claude-opus-4.6") == "claude-opus-4-6"
def test_dots_preserved_for_openrouter(self, monkeypatch, _clean_config_env):
from backend.copilot import config as cfg_mod
cfg = cfg_mod.ChatConfig(
use_openrouter=True,
api_key="or-key",
base_url="https://openrouter.ai/api/v1",
use_claude_code_subscription=False,
)
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
assert _normalize_model_name("anthropic/claude-opus-4.6") == "claude-opus-4.6"
def test_no_prefix_no_dots(self, monkeypatch, _clean_config_env):
from backend.copilot import config as cfg_mod
cfg = cfg_mod.ChatConfig(
use_openrouter=False,
api_key=None,
base_url=None,
use_claude_code_subscription=False,
)
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
assert (
_normalize_model_name("claude-sonnet-4-20250514")
== "claude-sonnet-4-20250514"
)
class TestResolveSdkModel:
"""Tests for _resolve_sdk_model — model ID resolution for the SDK CLI."""

View File

@@ -27,19 +27,20 @@ from backend.copilot.response_model import (
StreamTextDelta,
StreamTextStart,
)
from backend.copilot.transcript import (
_find_last_assistant_entry,
_flatten_assistant_content,
_messages_to_transcript,
_rechain_tail,
_transcript_to_messages,
)
from backend.util import json
from .conftest import build_structured_transcript
from .response_adapter import SDKResponseAdapter
from .service import _format_sdk_content_blocks
from .transcript import compact_transcript, validate_transcript
from .transcript import (
_find_last_assistant_entry,
_flatten_assistant_content,
_messages_to_transcript,
_rechain_tail,
_transcript_to_messages,
compact_transcript,
validate_transcript,
)
# ---------------------------------------------------------------------------
# Fixtures: realistic thinking block content

View File

@@ -1,25 +1,33 @@
"""Re-export public API from shared ``backend.copilot.transcript``.
"""Re-export from shared ``backend.copilot.transcript`` for backward compat.
The canonical implementation now lives at ``backend.copilot.transcript``
so both the SDK and baseline paths can import without cross-package
dependencies. Public symbols are re-exported here so existing ``from
dependencies. All symbols are re-exported here so existing ``from
.transcript import ...`` statements within the ``sdk`` package continue
to work without modification.
"""
from backend.copilot.transcript import (
_MAX_PROJECT_DIRS_TO_SWEEP,
_STALE_PROJECT_DIR_SECONDS,
COMPACT_MSG_ID_PREFIX,
ENTRY_TYPE_MESSAGE,
STOP_REASON_END_TURN,
STRIPPABLE_TYPES,
TRANSCRIPT_STORAGE_PREFIX,
TranscriptDownload,
_find_last_assistant_entry,
_flatten_assistant_content,
_flatten_tool_result_content,
_messages_to_transcript,
_rechain_tail,
_run_compression,
_transcript_to_messages,
cleanup_stale_project_dirs,
compact_transcript,
delete_transcript,
download_transcript,
read_compacted_entries,
strip_for_upload,
strip_progress_entries,
strip_stale_thinking_blocks,
upload_transcript,
@@ -34,12 +42,20 @@ __all__ = [
"STRIPPABLE_TYPES",
"TRANSCRIPT_STORAGE_PREFIX",
"TranscriptDownload",
"_MAX_PROJECT_DIRS_TO_SWEEP",
"_STALE_PROJECT_DIR_SECONDS",
"_find_last_assistant_entry",
"_flatten_assistant_content",
"_flatten_tool_result_content",
"_messages_to_transcript",
"_rechain_tail",
"_run_compression",
"_transcript_to_messages",
"cleanup_stale_project_dirs",
"compact_transcript",
"delete_transcript",
"download_transcript",
"read_compacted_entries",
"strip_for_upload",
"strip_progress_entries",
"strip_stale_thinking_blocks",
"upload_transcript",

View File

@@ -850,7 +850,7 @@ class TestRunCompression:
@pytest.mark.asyncio
async def test_no_client_uses_truncation(self):
"""Path (a): ``get_openai_client()`` returns None → truncation only."""
from backend.copilot.transcript import _run_compression
from .transcript import _run_compression
truncation_result = self._make_compress_result(
True, [{"role": "user", "content": "truncated"}]
@@ -885,7 +885,7 @@ class TestRunCompression:
@pytest.mark.asyncio
async def test_llm_success_returns_llm_result(self):
"""Path (b): ``get_openai_client()`` returns a client → LLM compresses."""
from backend.copilot.transcript import _run_compression
from .transcript import _run_compression
llm_result = self._make_compress_result(
True, [{"role": "user", "content": "LLM summary"}]
@@ -916,7 +916,7 @@ class TestRunCompression:
@pytest.mark.asyncio
async def test_llm_failure_falls_back_to_truncation(self):
"""Path (c): LLM call raises → truncation fallback used instead."""
from backend.copilot.transcript import _run_compression
from .transcript import _run_compression
truncation_result = self._make_compress_result(
True, [{"role": "user", "content": "truncated fallback"}]
@@ -953,7 +953,7 @@ class TestRunCompression:
@pytest.mark.asyncio
async def test_llm_timeout_falls_back_to_truncation(self):
"""Path (d): LLM call exceeds timeout → truncation fallback used."""
from backend.copilot.transcript import _run_compression
from .transcript import _run_compression
truncation_result = self._make_compress_result(
True, [{"role": "user", "content": "truncated after timeout"}]
@@ -1007,7 +1007,7 @@ class TestCleanupStaleProjectDirs:
def test_removes_old_copilot_dirs(self, tmp_path, monkeypatch):
"""Directories matching copilot pattern older than threshold are removed."""
from backend.copilot.transcript import (
from backend.copilot.sdk.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
@@ -1039,7 +1039,7 @@ class TestCleanupStaleProjectDirs:
def test_ignores_non_copilot_dirs(self, tmp_path, monkeypatch):
"""Directories not matching copilot pattern are left alone."""
from backend.copilot.transcript import cleanup_stale_project_dirs
from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
@@ -1062,7 +1062,7 @@ class TestCleanupStaleProjectDirs:
def test_ttl_boundary_not_removed(self, tmp_path, monkeypatch):
"""A directory exactly at the TTL boundary should NOT be removed."""
from backend.copilot.transcript import (
from backend.copilot.sdk.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
@@ -1088,7 +1088,7 @@ class TestCleanupStaleProjectDirs:
def test_skips_non_directory_entries(self, tmp_path, monkeypatch):
"""Regular files matching the copilot pattern are not removed."""
from backend.copilot.transcript import (
from backend.copilot.sdk.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
@@ -1114,7 +1114,7 @@ class TestCleanupStaleProjectDirs:
def test_missing_base_dir_returns_zero(self, tmp_path, monkeypatch):
"""If the projects base directory doesn't exist, return 0 gracefully."""
from backend.copilot.transcript import cleanup_stale_project_dirs
from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
nonexistent = str(tmp_path / "does-not-exist" / "projects")
monkeypatch.setattr(
@@ -1129,7 +1129,7 @@ class TestCleanupStaleProjectDirs:
"""When encoded_cwd is supplied only that directory is swept."""
import time
from backend.copilot.transcript import (
from backend.copilot.sdk.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
@@ -1160,7 +1160,7 @@ class TestCleanupStaleProjectDirs:
def test_scoped_fresh_dir_not_removed(self, tmp_path, monkeypatch):
"""Scoped sweep leaves a fresh directory alone."""
from backend.copilot.transcript import cleanup_stale_project_dirs
from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
@@ -1181,7 +1181,7 @@ class TestCleanupStaleProjectDirs:
"""Scoped sweep refuses to remove a non-copilot directory."""
import time
from backend.copilot.transcript import (
from backend.copilot.sdk.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)

View File

@@ -4,12 +4,15 @@ Both the baseline (OpenRouter) and SDK (Anthropic) service layers need to:
1. Append a ``Usage`` record to the session.
2. Log the turn's token counts.
3. Record weighted usage in Redis for rate-limiting.
4. Write a PlatformCostLog entry for admin cost tracking.
This module extracts that common logic so both paths stay in sync.
"""
import logging
from backend.data.platform_cost import PlatformCostEntry, log_platform_cost_safe
from .model import ChatSession, Usage
from .rate_limit import record_token_usage
@@ -95,4 +98,47 @@ async def persist_and_record_usage(
except Exception as usage_err:
logger.warning(f"{log_prefix} Failed to record token usage: {usage_err}")
# Log to PlatformCostLog for admin cost dashboard
if user_id and total_tokens > 0:
cost_float = None
if cost_usd is not None:
try:
cost_float = float(cost_usd)
except (ValueError, TypeError):
pass
cost_microdollars = (
int(cost_float * 1_000_000) if cost_float is not None else None
)
session_id = session.session_id if session else None
if cost_float is not None:
tracking_type = "cost_usd"
tracking_amount = cost_float
else:
tracking_type = "tokens"
tracking_amount = total_tokens
await log_platform_cost_safe(
PlatformCostEntry(
user_id=user_id,
graph_exec_id=session_id,
block_id="copilot",
block_name=f"copilot:{log_prefix.strip(' []')}".rstrip(":"),
provider="open_router",
credential_id="copilot_system",
cost_microdollars=cost_microdollars,
input_tokens=prompt_tokens,
output_tokens=completion_tokens,
model=None,
metadata={
"tracking_type": tracking_type,
"tracking_amount": tracking_amount,
"cache_read_tokens": cache_read_tokens,
"cache_creation_tokens": cache_creation_tokens,
"source": "copilot",
},
)
)
return total_tokens

View File

@@ -33,12 +33,6 @@ _GET_CURRENT_DATE_BLOCK_ID = "b29c1b50-5d0e-4d9f-8f9d-1b0e6fcbf0b1"
_GMAIL_SEND_BLOCK_ID = "6c27abc2-e51d-499e-a85f-5a0041ba94f0"
_TEXT_REPLACE_BLOCK_ID = "7e7c87ab-3469-4bcc-9abe-67705091b713"
# Default OrchestratorBlock model/mode — kept in sync with ChatConfig.model.
# ChatConfig uses the OpenRouter format ("anthropic/claude-opus-4.6");
# OrchestratorBlock uses the native Anthropic model name.
ORCHESTRATOR_DEFAULT_MODEL = "claude-opus-4-6"
ORCHESTRATOR_DEFAULT_EXECUTION_MODE = "extended_thinking"
# Defaults applied to OrchestratorBlock nodes by the fixer.
# execution_mode and model match the copilot's default (extended thinking
# with Opus) so generated agents inherit the same reasoning capabilities.
@@ -48,8 +42,8 @@ _SDM_DEFAULTS: dict[str, int | bool | str] = {
"conversation_compaction": True,
"retry": 3,
"multiple_tool_calls": False,
"execution_mode": ORCHESTRATOR_DEFAULT_EXECUTION_MODE,
"model": ORCHESTRATOR_DEFAULT_MODEL,
"execution_mode": "extended_thinking",
"model": "claude-opus-4-6",
}
@@ -890,12 +884,6 @@ class AgentFixer:
)
if is_ai_block:
# Skip AI blocks that don't expose a "model" input property
# (some AI-category blocks have no model selector at all).
input_properties = block.get("inputSchema", {}).get("properties", {})
if "model" not in input_properties:
continue
node_id = node.get("id")
input_default = node.get("input_default", {})
current_model = input_default.get("model")
@@ -904,7 +892,9 @@ class AgentFixer:
# Blocks with a block-specific enum on the model field (e.g.
# PerplexityBlock) use their own enum values; others use the
# generic set.
model_schema = input_properties.get("model", {})
model_schema = (
block.get("inputSchema", {}).get("properties", {}).get("model", {})
)
block_model_enum = model_schema.get("enum")
if block_model_enum:
@@ -1765,12 +1755,6 @@ class AgentFixer:
agent = self.fix_node_x_coordinates(agent, node_lookup=node_lookup)
agent = self.fix_getcurrentdate_offset(agent)
# Apply OrchestratorBlock defaults BEFORE fix_ai_model_parameter so that
# the orchestrator-specific model (claude-opus-4-6) is set first and
# fix_ai_model_parameter sees it as a valid allowed model instead of
# overwriting it with the generic default (gpt-4o).
agent = self.fix_orchestrator_blocks(agent)
# Apply fixes that require blocks information
if blocks:
agent = self.fix_invalid_nested_sink_links(
@@ -1788,6 +1772,9 @@ class AgentFixer:
# Apply fixes for MCPToolBlock nodes
agent = self.fix_mcp_tool_blocks(agent)
# Apply fixes for OrchestratorBlock nodes (agent-mode defaults)
agent = self.fix_orchestrator_blocks(agent)
# Apply fixes for AgentExecutorBlock nodes (sub-agents)
if library_agents:
agent = self.fix_agent_executor_blocks(agent, library_agents)

View File

@@ -580,29 +580,6 @@ class TestFixAiModelParameter:
assert result["nodes"][0]["input_default"]["model"] == "perplexity/sonar"
def test_ai_block_without_model_property_is_skipped(self):
"""AI-category blocks that have no 'model' input property should not
have a model injected — they simply don't expose a model selector."""
fixer = AgentFixer()
block_id = generate_uuid()
node = _make_node(node_id="n1", block_id=block_id, input_default={})
agent = _make_agent(nodes=[node])
blocks = [
{
"id": block_id,
"name": "SomeAIBlock",
"categories": [{"category": "AI"}],
"inputSchema": {
"properties": {"prompt": {"type": "string"}},
},
}
]
result = fixer.fix_ai_model_parameter(agent, blocks)
assert "model" not in result["nodes"][0]["input_default"]
class TestFixAgentExecutorBlocks:
"""Tests for fix_agent_executor_blocks."""

View File

@@ -1,15 +0,0 @@
"""Tests for GetAgentBuildingGuideTool."""
from backend.copilot.tools.get_agent_building_guide import _load_guide
def test_load_guide_returns_string():
guide = _load_guide()
assert isinstance(guide, str)
assert len(guide) > 100
def test_load_guide_caches():
guide1 = _load_guide()
guide2 = _load_guide()
assert guide1 is guide2

View File

@@ -90,12 +90,11 @@ async def test_simulate_block_basic():
with patch(
"backend.executor.simulator.get_openai_client", return_value=mock_client
) as mock_get_client:
):
outputs = []
async for name, data in simulate_block(mock_block, {"query": "test"}):
outputs.append((name, data))
mock_get_client.assert_called_once_with(prefer_openrouter=True)
assert ("result", "simulated output") in outputs
# Empty error pin should NOT be yielded — the simulator omits empty values
assert ("error", "") not in outputs

View File

@@ -845,7 +845,6 @@ class WriteWorkspaceFileTool(BaseTool):
path=path,
mime_type=mime_type,
overwrite=overwrite,
metadata={"origin": "agent-created"},
)
# Build informative source label and message.

View File

@@ -217,131 +217,6 @@ def strip_stale_thinking_blocks(content: str) -> str:
return "\n".join(result_lines) + "\n"
def strip_for_upload(content: str) -> str:
"""Combined single-parse strip of progress entries and stale thinking blocks.
Equivalent to ``strip_stale_thinking_blocks(strip_progress_entries(content))``
but parses the JSONL only once, avoiding redundant ``split`` + ``json.loads``
passes on every upload.
"""
lines = content.strip().split("\n")
if not lines:
return content
parsed: list[tuple[str, dict | None]] = []
for line in lines:
parsed.append((line, json.loads(line, fallback=None)))
# --- Phase 1: progress stripping (reparent children) ---
stripped_uuids: set[str] = set()
uuid_to_parent: dict[str, str] = {}
for _line, entry in parsed:
if not isinstance(entry, dict):
continue
uid = entry.get("uuid", "")
parent = entry.get("parentUuid", "")
if uid:
uuid_to_parent[uid] = parent
if (
entry.get("type", "") in STRIPPABLE_TYPES
and uid
and not entry.get("isCompactSummary")
):
stripped_uuids.add(uid)
reparented: set[str] = set()
for _line, entry in parsed:
if not isinstance(entry, dict):
continue
parent = entry.get("parentUuid", "")
original_parent = parent
seen_parents: set[str] = set()
while parent in stripped_uuids and parent not in seen_parents:
seen_parents.add(parent)
parent = uuid_to_parent.get(parent, "")
if parent != original_parent:
entry["parentUuid"] = parent
uid = entry.get("uuid", "")
if uid:
reparented.add(uid)
# --- Phase 2: identify last assistant for thinking-block stripping ---
last_asst_msg_id: str | None = None
last_asst_idx: int | None = None
for i in range(len(parsed) - 1, -1, -1):
_line, entry = parsed[i]
if not isinstance(entry, dict):
continue
if entry.get("type", "") in STRIPPABLE_TYPES and not entry.get(
"isCompactSummary"
):
continue
msg = entry.get("message", {})
if msg.get("role") == "assistant":
last_asst_msg_id = msg.get("id")
last_asst_idx = i
break
# --- Phase 3: single output pass ---
result_lines: list[str] = []
thinking_stripped = 0
for i, (line, entry) in enumerate(parsed):
if not isinstance(entry, dict):
result_lines.append(line)
continue
# Drop progress/metadata entries
if entry.get("type", "") in STRIPPABLE_TYPES and not entry.get(
"isCompactSummary"
):
continue
needs_reserialize = False
uid = entry.get("uuid", "")
# Reparented entries need re-serialization
if uid in reparented:
needs_reserialize = True
# Strip stale thinking blocks from non-last assistant entries
if last_asst_idx is not None:
msg = entry.get("message", {})
is_last_turn = (
last_asst_msg_id is not None and msg.get("id") == last_asst_msg_id
) or (last_asst_msg_id is None and i == last_asst_idx)
if (
msg.get("role") == "assistant"
and not is_last_turn
and isinstance(msg.get("content"), list)
):
content_blocks = msg["content"]
filtered = [
b
for b in content_blocks
if not (
isinstance(b, dict) and b.get("type") in _THINKING_BLOCK_TYPES
)
]
if len(filtered) < len(content_blocks):
thinking_stripped += len(content_blocks) - len(filtered)
entry = {**entry, "message": {**msg, "content": filtered}}
needs_reserialize = True
if needs_reserialize:
result_lines.append(json.dumps(entry, separators=(",", ":")))
else:
result_lines.append(line)
if thinking_stripped:
logger.info(
"[Transcript] Stripped %d stale thinking block(s) from non-last entries",
thinking_stripped,
)
return "\n".join(result_lines) + "\n"
# ---------------------------------------------------------------------------
# Local file I/O (write temp file for --resume)
# ---------------------------------------------------------------------------
@@ -658,7 +533,6 @@ async def upload_transcript(
content: str,
message_count: int = 0,
log_prefix: str = "[Transcript]",
skip_strip: bool = False,
) -> None:
"""Strip progress entries and stale thinking blocks, then upload transcript.
@@ -671,18 +545,14 @@ async def upload_transcript(
Args:
content: Complete JSONL transcript (from TranscriptBuilder).
message_count: ``len(session.messages)`` at upload time.
skip_strip: When ``True``, skip the strip + re-validate pass.
Safe for builder-generated content (baseline path) which
never emits progress entries or stale thinking blocks.
"""
if skip_strip:
# Caller guarantees the content is already clean and valid.
stripped = content
else:
# Strip metadata entries and stale thinking blocks in a single parse.
# SDK-built transcripts may have progress entries; strip for safety.
stripped = strip_for_upload(content)
if not skip_strip and not validate_transcript(stripped):
# Strip metadata entries (progress, file-history-snapshot, etc.)
# Note: SDK-built transcripts shouldn't have these, but strip for safety.
# Also strip stale thinking blocks from non-last assistant entries to
# prevent token bloat when switching between SDK and baseline modes.
stripped = strip_progress_entries(content)
stripped = strip_stale_thinking_blocks(stripped)
if not validate_transcript(stripped):
# Log entry types for debugging — helps identify why validation failed
entry_types = [
json.loads(line, fallback={"type": "INVALID_JSON"}).get("type", "?")
@@ -703,34 +573,27 @@ async def upload_transcript(
storage = await get_workspace_storage()
wid, fid, fname = _storage_path_parts(user_id, session_id)
encoded = stripped.encode("utf-8")
meta = {"message_count": message_count, "uploaded_at": time.time()}
mwid, mfid, mfname = _meta_storage_path_parts(user_id, session_id)
meta_encoded = json.dumps(meta).encode("utf-8")
# Transcript + metadata are independent objects at different keys, so
# write them concurrently. ``return_exceptions`` keeps a metadata
# failure from sinking the transcript write.
transcript_result, metadata_result = await asyncio.gather(
storage.store(
workspace_id=wid,
file_id=fid,
filename=fname,
content=encoded,
),
storage.store(
await storage.store(
workspace_id=wid,
file_id=fid,
filename=fname,
content=encoded,
)
# Update metadata so message_count stays current. The gap-fill logic
# in _build_query_message relies on it to avoid re-compressing messages.
try:
meta = {"message_count": message_count, "uploaded_at": time.time()}
mwid, mfid, mfname = _meta_storage_path_parts(user_id, session_id)
await storage.store(
workspace_id=mwid,
file_id=mfid,
filename=mfname,
content=meta_encoded,
),
return_exceptions=True,
)
if isinstance(transcript_result, BaseException):
raise transcript_result
if isinstance(metadata_result, BaseException):
# Metadata is best-effort — the gap-fill logic in
# _build_query_message tolerates a missing metadata file.
logger.warning("%s Failed to write metadata: %s", log_prefix, metadata_result)
content=json.dumps(meta).encode("utf-8"),
)
except Exception as e:
logger.warning("%s Failed to write metadata: %s", log_prefix, e)
logger.info(
"%s Uploaded %dB (stripped from %dB, msg_count=%d)",
@@ -750,44 +613,33 @@ async def download_transcript(
Returns a ``TranscriptDownload`` with the JSONL content and the
``message_count`` watermark from the upload, or ``None`` if not found.
The content and metadata fetches run concurrently since they are
independent objects in the bucket.
"""
storage = await get_workspace_storage()
path = _build_storage_path(user_id, session_id, storage)
meta_path = _build_meta_storage_path(user_id, session_id, storage)
content_task = asyncio.create_task(storage.retrieve(path))
meta_task = asyncio.create_task(storage.retrieve(meta_path))
content_result, meta_result = await asyncio.gather(
content_task, meta_task, return_exceptions=True
)
if isinstance(content_result, FileNotFoundError):
try:
data = await storage.retrieve(path)
content = data.decode("utf-8")
except FileNotFoundError:
logger.debug("%s No transcript in storage", log_prefix)
return None
if isinstance(content_result, BaseException):
logger.warning(
"%s Failed to download transcript: %s", log_prefix, content_result
)
except Exception as e:
logger.warning("%s Failed to download transcript: %s", log_prefix, e)
return None
content = content_result.decode("utf-8")
# Metadata is best-effort — old transcripts won't have it.
# Try to load metadata (best-effort — old transcripts won't have it)
message_count = 0
uploaded_at = 0.0
if isinstance(meta_result, FileNotFoundError):
pass # No metadata — treat as unknown (msg_count=0 → always fill gap)
elif isinstance(meta_result, BaseException):
logger.debug(
"%s Failed to load transcript metadata: %s", log_prefix, meta_result
)
else:
meta = json.loads(meta_result.decode("utf-8"), fallback={})
try:
meta_path = _build_meta_storage_path(user_id, session_id, storage)
meta_data = await storage.retrieve(meta_path)
meta = json.loads(meta_data.decode("utf-8"), fallback={})
message_count = meta.get("message_count", 0)
uploaded_at = meta.get("uploaded_at", 0.0)
except FileNotFoundError:
pass # No metadata — treat as unknown (msg_count=0 → always fill gap)
except Exception as e:
logger.debug("%s Failed to load transcript metadata: %s", log_prefix, e)
logger.info(
"%s Downloaded %dB (msg_count=%d)", log_prefix, len(content), message_count
@@ -829,7 +681,6 @@ async def delete_transcript(user_id: str, session_id: str) -> None:
# JSONL protocol values used in transcript serialization.
STOP_REASON_END_TURN = "end_turn"
STOP_REASON_TOOL_USE = "tool_use"
COMPACT_MSG_ID_PREFIX = "msg_compact_"
ENTRY_TYPE_MESSAGE = "message"

View File

@@ -29,7 +29,7 @@ class TranscriptEntry(BaseModel):
type: str
uuid: str
parentUuid: str = ""
parentUuid: str | None
isCompactSummary: bool | None = None
message: dict[str, Any]
@@ -67,7 +67,7 @@ class TranscriptBuilder:
return TranscriptEntry(
type=entry_type,
uuid=data.get("uuid") or str(uuid4()),
parentUuid=data.get("parentUuid") or "",
parentUuid=data.get("parentUuid"),
isCompactSummary=data.get("isCompactSummary"),
message=data.get("message", {}),
)
@@ -118,7 +118,7 @@ class TranscriptBuilder:
TranscriptEntry(
type="user",
uuid=msg_uuid,
parentUuid=self._last_uuid or "",
parentUuid=self._last_uuid,
message={"role": "user", "content": content},
)
)
@@ -158,7 +158,7 @@ class TranscriptBuilder:
TranscriptEntry(
type="assistant",
uuid=msg_uuid,
parentUuid=self._last_uuid or "",
parentUuid=self._last_uuid,
message={
"role": "assistant",
"model": model,
@@ -233,8 +233,3 @@ class TranscriptBuilder:
def is_empty(self) -> bool:
"""Whether this builder has any entries."""
return len(self._entries) == 0
@property
def last_entry_type(self) -> str | None:
"""Type of the last entry, or None if empty."""
return self._entries[-1].type if self._entries else None

Some files were not shown because too many files have changed in this diff Show More