Compare commits

...

111 Commits

Author SHA1 Message Date
Zamil Majdy
f95ced2b7b fix(backend/copilot): single-owner StreamFinish via mark_session_completed
mark_session_completed() is now the SINGLE place that publishes
StreamFinish to the turn stream. Simplified API:
- mark_session_completed(session_id) → completed
- mark_session_completed(session_id, error_message='...') → failed

Flow: set status (Lua CAS) → StreamError if failed → StreamFinish.

The processor intercepts StreamFinish from generators, calls
mark_session_completed instead. Removed _mark_task_failed (redundant).
Removed cleanup_turn_stream (streams have TTL, eager deletion raced
with _stream_listener xread).
2026-02-24 14:35:40 +07:00
Zamil Majdy
4e9d3aae48 refactor(backend/copilot): rename stream_registry 'task' terminology to 'session'/'turn'
Rename all 'task' references in stream_registry.py to use consistent
session/turn terminology so callers know which ID to pass:

- ActiveTask → ActiveSession
- _get_task_meta_key → _get_session_meta_key
- _get_task_stream_key → _get_turn_stream_key
- create_task → create_session_task
- subscribe_to_task → subscribe_to_session
- unsubscribe_from_task → unsubscribe_from_session
- mark_task_completed → mark_session_completed
- get_task → get_session
- get_active_task_for_session → get_active_session
- Config: task_meta_prefix → session_meta_prefix, task_stream_prefix → turn_stream_prefix

Redis key default values preserved for backward compat.
2026-02-24 13:48:43 +07:00
Zamil Majdy
e073665d26 fix(frontend/copilot): restore original error copy text
Keep the improved collapsible <details> UI but restore the original
copywriting: 'The assistant encountered an error. Please try sending
your message again.'
2026-02-24 13:37:20 +07:00
Zamil Majdy
76a924fdef fix(backend/copilot): address PR review comments
- Move Redis stream cleanup from processor into stream_registry.cleanup_turn_stream()
- Remove obvious comment on CancelledError raise
- Remove unused get_redis_async import from processor
- Remove unnecessary comment in ChatMessagesContainer
- Save session on CancelledError to prevent message loss
2026-02-24 13:07:59 +07:00
Zamil Majdy
b30138664d Merge branch 'dev' into fix/copilot-synchronous-long-running-tools 2026-02-24 10:09:59 +07:00
Zamil Majdy
f10615c915 fix(frontend/copilot): allow duplicate user messages
Previously, sending the same message twice would cause the second one to disappear
due to aggressive fingerprint-based deduplication that applied to all messages.

Now fingerprint deduplication only applies to assistant messages (where it's needed
for hydration/stream boundary dedup). User messages are only deduplicated by ID,
allowing users to send the same text multiple times.

Fixes issue where duplicate user messages would be filtered out.
2026-02-24 06:00:18 +07:00
Zamil Majdy
f6973ec3b2 fix(frontend/copilot): integrate loading indicators with accordions instead of stacking
Previously, loading indicators were shown separately above accordions, causing them to stack.
Now the loading text only appears when there's no accordion, keeping the UI cleaner.

Changes:
- GenericTool: Only show loading text when not showing accordion
- RunAgentTool: Only show loading text when not showing accordion or other content
- For TodoWrite and running agents, the accordion now contains all the UI without duplicate indicators

This fixes the issue where parallel tasks show stacked loading indicators above their respective accordions.
2026-02-24 05:58:51 +07:00
Zamil Majdy
5195a77ed8 fix(frontend/copilot): show agent generation errors inline instead of stacking accordions
When an agent generation fails and retries, both the error accordion and the
new loading accordion were stacking on top of each other instead of replacing.

Changes:
- Modified CreateAgent and EditAgent tools to show errors inline (no accordion)
- Errors now get replaced when a new attempt starts, instead of stacking
- Error display includes technical details in collapsible section and retry button
- Prevents confusing UI where old error and new loading state both appear

Also includes previous fixes:
- Skip incomplete tool calls during hydration to prevent "No tool invocation found" errors
- Use last_message_id="0-0" in resume endpoint to replay all chunks for proper tool call delivery
2026-02-24 05:54:24 +07:00
Zamil Majdy
6fe5b9942e feat: fix openapi.json 2026-02-24 04:27:48 +07:00
Zamil Majdy
da43c214be refactor(backend/copilot): eliminate task_id, use only session_id and turn_id
- Remove dead GET /tasks/{task_id}/stream and GET /tasks/{task_id} endpoints
- Rename CancelCoPilotEvent.task_id to session_id
- Rename enqueue_cancel_task param to session_id
- Rename all task_id local vars in manager.py to session_id
- Rename Redis lock key copilot:task:{id}:lock to copilot:session:{id}:lock
- Rename StreamStart.taskId to sessionId
- Remove set_task_id from SDKResponseAdapter (uses self.session_id)
- Remove task_id from CancelTaskResponse and CoPilotExecutionEntry
- Update agent_generator dummy/service docstrings
- Regenerate openapi.json and frontend generated types
- Update tests
2026-02-24 04:15:34 +07:00
Zamil Majdy
2406e21162 revert: remove TEMPORARY test mode, restore LaunchDarkly flag logic
- Remove COPILOT_TEST_MODE conditional
- Restore original use_sdk feature flag behavior
- Addresses PR review comment
2026-02-24 03:22:57 +07:00
Zamil Majdy
c388985c75 refactor(backend/copilot): move local imports to module level
- Move redis_client import to top of processor.py
- Move uuid4 import to top of routes.py
- Addresses PR review comments about local imports
2026-02-24 03:16:26 +07:00
Zamil Majdy
392d1c3e48 refactor(backend/copilot): remove redundant task_id, use session_id directly
- Remove task_id from CoPilotExecutionEntry (was always session_id)
- Remove task_id param from enqueue_copilot_task
- Manager uses entry.session_id for dedup/locking
- Clean up cancel endpoint to use session_id directly
- Revert unintended response_adapter.py changes
2026-02-24 03:01:15 +07:00
Zamil Majdy
e5bdc7b8a9 fix(backend/copilot): remove debug logging from response_adapter
- Remove UserMessage block logging
- Remove flush operation logging
- Remove flushed output logging
- Eliminates production log overhead as requested in PR review
2026-02-24 02:45:58 +07:00
Zamil Majdy
8c75b5018f fix(backend/copilot): downgrade debug logging to logger.debug in response_adapter
- Change logger.info to logger.debug for UserMessage, flush operations
- Reduces production log noise and overhead from tool call serialization
- Addresses PR review comment about debug logging
2026-02-24 02:42:19 +07:00
Zamil Majdy
bf8c2eb5c0 fix(backend/copilot): increase stale task timeout to match 1-hour tool limit
- Increase timeout from 10 minutes to COPILOT_CONSUMER_TIMEOUT_SECONDS (1 hour) + 5min buffer
- Prevents false failures when users refresh during long-running agent generation (20-30 min)
- Fixes critical bug where valid tasks were marked as failed after 10 minutes
- Use shared constant instead of hardcoded value for consistency
2026-02-24 02:41:17 +07:00
Zamil Majdy
db281f1bd0 feat: fix openapi.json 2026-02-24 02:33:18 +07:00
Zamil Majdy
2663ac2377 fix(backend/copilot): update test mocks after agent_generator refactor
The get_all_relevant_agents_for_generation and decompose_goal functions
were moved from create_agent to agent_generator module. Tests need to
mock these at the import location (create_agent) not the definition
location (agent_generator).

Fixes 3 failing tests:
- test_vague_goal_returns_suggested_goal_response
- test_unachievable_goal_returns_suggested_goal_response
- test_clarifying_questions_returns_clarification_needed_response
2026-02-24 02:19:27 +07:00
Zamil Majdy
f9596b63c7 refactor(backend/copilot): rename task_id to turn_id in ActiveStreamInfo
Use turn_id consistently where it represents the per-turn stream UUID.
Fix missing turn_id generation and propagation in stream_chat_post.
2026-02-24 02:16:50 +07:00
Zamil Majdy
4088b2ba63 fix(platform): address PR review comments
- Revert out-of-scope changes: .gitignore, auth/dependencies.py, auth/jwt_utils.py,
  backend/.gitignore, sdk/response_adapter.py
- Raise ValueError instead of silently truncating search queries in
  agent_generator/core.py so the AI can adjust its search term
- Remove .application.logs artifact
2026-02-24 01:57:29 +07:00
Zamil Majdy
004c2d4201 chore: remove test artifacts and stale analysis files 2026-02-24 01:34:31 +07:00
Zamil Majdy
f4ccdc3e9e refactor(platform/copilot): introduce turn_id for stream isolation and cleanup
- Introduce turn_id (per-turn UUID) for Redis stream key isolation,
  preventing cross-turn event accumulation on refresh
- Remove incremental DB saves during streaming — persist only at end of turn
- Remove debug/timing logging artifacts added during development
- Remove accidentally committed analysis MD files
- Clean up excessive logging throughout copilot backend
2026-02-24 01:09:33 +07:00
Bently
ef42b17e3b docs: add Podman compatibility warning (#12120)
## Summary
Adds a warning to the Getting Started docs clarifying that **Podman and
podman-compose are not supported**.

## Problem
Users on Windows using `podman-compose` instead of Docker get errors
like:
```
Error: the specified Containerfile or Dockerfile does not exist, ..\..\autogpt_platform\backend\Dockerfile
```

This is because Podman handles relative paths differently than Docker,
causing incorrect path resolution on Windows.

## Solution
- Added a clear warning section after the Windows WSL 2 notes
- Explains the error users might see
- Directs them to install Docker Desktop instead

Closes #11358

<!-- greptile_comment -->

<details><summary><h3>Greptile Summary</h3></summary>

Adds a "Podman Not Supported" warning section to the Getting Started
documentation, placed after the Windows/WSL 2 installation notes. The
section clarifies that Docker is required, shows the typical error
message users encounter when using Podman, and directs them to install
Docker Desktop instead. This addresses issue #11358 where Windows users
using `podman-compose` hit path resolution errors.

- Adds `### ⚠️ Podman Not Supported` section under Manual Setup, after
Windows Installation Note
- Includes the specific error message users see with Podman for easy
identification
- Links to Docker Desktop installation docs as the recommended solution
- Formatting is consistent with existing sections in the document (emoji
headings, code blocks for errors)
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge — it only adds a documentation warning
section with no code changes.
- The change is a small, well-written documentation addition that adds a
Podman compatibility warning. It touches only one markdown file,
introduces no code changes, and is consistent with the existing document
structure and style. No issues were found.
- No files require special attention.
</details>


<details><summary><h3>Flowchart</h3></summary>

```mermaid
flowchart TD
    A[User wants to run AutoGPT] --> B{Which container runtime?}
    B -->|Docker / Docker Desktop| C[docker compose up -d --build]
    C --> D[AutoGPT starts successfully]
    B -->|Podman / podman-compose| E[podman-compose up -d --build]
    E --> F[Error: Containerfile or Dockerfile does not exist]
    F --> G[New warning section directs user to install Docker Desktop]
    G --> C
```
</details>


<sub>Last reviewed commit: 23ea6bd</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-23 15:19:24 +00:00
Zamil Majdy
df14b031b2 fix(platform/copilot): fix resume stream replay and message dedup
Backend: Subscribe resume stream from latest position ($) instead of 0-0.
Hydrated messages from REST already contain all history — replay only
delivers NEW chunks (tool progress, heartbeats, StreamFinish) going
forward. Prevents intro text duplication on refresh.

Frontend: Include toolCallId in message fingerprint so tool-only messages
(no text content) are properly deduplicated across hydration/stream
boundary.
2026-02-23 19:55:48 +07:00
Zamil Majdy
dd047d65a8 fix(platform/copilot): preserve tool_calls on refresh, improve EditAgent UI
- Keep assistant message with tool_calls in GET /sessions response during
  active stream so frontend renders correct tool UI (mini game) on refresh
- Use last_message_id from Redis instead of 0-0 for SSE replay position
- Match EditAgent saved-output card to CreateAgent sparkles design
- Verified StreamFinish flow is correct (mark_task_completed publishes it)
2026-02-23 19:47:05 +07:00
Zamil Majdy
d864ab6908 fix(frontend/copilot): revert hasResumedRef.delete to prevent infinite resume loop
The delete call was causing a feedback loop: stream completes → ref cleared
→ hasActiveStream still true (stale query) → resume triggers again → 73+
rapid-fire GET /stream requests. The ref should persist for the full page
lifecycle since a fresh page load creates a new Map via useRef.
2026-02-23 19:33:02 +07:00
Zamil Majdy
5751bf35a4 fix(frontend/copilot): clear hasResumedRef when stream completes
Prevents stale resume flag from blocking subsequent stream resumptions
in the same session after page refresh.
2026-02-23 19:16:34 +07:00
Zamil Majdy
b02445e755 fix(backend/copilot): fix CI test failures from merge
- Re-add missing get_all_relevant_agents_for_generation import in create_agent.py
- Fix test assertions for generate_agent_external (2 args, not 4) and
  generate_agent_patch_external (3 args, not 5) after operation_id/task_id removal
2026-02-23 19:05:14 +07:00
Zamil Majdy
0b3539ca89 fix(backend/copilot): address PR review comments
- Move json import to module level in response_model.py (was inside to_sse())
- Track fire-and-forget asyncio.create_task for title update to prevent GC
- Remove redundant inline import asyncio in service.py
2026-02-23 18:46:39 +07:00
Zamil Majdy
1d042c4f10 refactor(platform): remove long-running tool infrastructure and fix copilot bugs
Remove the entire long-running/async tool execution infrastructure that
became redundant after the executor system was introduced. All tools now
execute uniformly through execute_tool() with heartbeat support.

Backend cleanup:
- Remove _execute_long_running_tool_with_streaming and is_long_running
- Delete completion_consumer.py and completion_handler.py (dead code)
- Remove _generate_llm_continuation (both variants), _update_pending_operation,
  _mark_operation_started/completed
- Remove operation_id plumbing from stream_registry, routes, executor
- Remove LongRunningCallback from SDK tool_adapter
- Remove dead config fields (stream_completion_name, etc.)

Frontend cleanup:
- Remove OperationInProgressResponse checks from CreateAgent/EditAgent
- Remove debug logging from useCopilotPage
- Delete dead task-level stream route and useAsymptoticProgress hook
- Regenerate OpenAPI types

Bug fixes:
- #3: Pessimistic input lock — disable chat while session loading or errored
- #3: Replace O(N) SCAN with O(1) direct lookup in get_active_task_for_session
- #1: Content-based deduplication to prevent intro message replay on resume
- Fix parallel_tool_calls_test to match new _yield_tool_call signature
2026-02-23 18:34:46 +07:00
Zamil Majdy
082389d77f Merge remote-tracking branch 'origin/dev' into fix/copilot-synchronous-long-running-tools
# Conflicts:
#	autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx
#	autogpt_platform/frontend/src/app/(platform)/copilot/tools/EditAgent/EditAgent.tsx
2026-02-23 18:28:01 +07:00
Ubbe
a18ffd0b21 fix(frontend/copilot): always-visible credentials, inputs, and login prompts (#12194)
Credentials, inputs, and login prompts in copilot tool outputs were
hidden inside collapsible accordions — users could accidentally collapse
them, hiding blocking actionable UI. This PR extracts all blocking
requirements out of accordions so they're always visible.

### Changes 🏗️

- **RunAgent & RunBlock**: Extract `SetupRequirementsCard` (credentials
picker) out of `ToolAccordion` — renders standalone, always visible
- **RunAgent**: Also extract `AgentDetailsCard` (inputs needed) and
`need_login` message out of accordion
- **SetupRequirementsCard (RunBlock)**: Input form always visible
(removed toggle button and animation), unified "Proceed" button disabled
until credentials + inputs are satisfied
- **SetupRequirementsCard (RunAgent)**: "Proceed" button disabled until
all credentials are selected
- **Both cards**: Added titled box with border for credentials section
("Block credentials" / "Agent credentials"), matching the existing
inputs box pattern
- **CredentialsFlatView**: "Add" button uses `variant="primary"` when
user has no credentials (was `secondary`)
- **Styleguide**: Added mock `CredentialsProvidersContext` with two
scenarios:
  - No credentials → shows "add new" flow
  - Has credentials → shows selection list with existing accounts
- **CreateAgent & EditAgent**: Picked up user-initiated styling
refinements

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `pnpm format && pnpm lint && pnpm types` all pass
  - [ ] Visit `/copilot/styleguide` and verify:
- [ ] "Setup requirements — no credentials" shows add-credential button
(primary variant)
- [ ] "Setup requirements — has credentials" shows credential selection
dropdown
- [ ] Both RunAgent and RunBlock setup requirements render outside
accordion
- [ ] Trigger a copilot agent run that requires credentials — credential
picker always visible
- [ ] Trigger a copilot block run that requires credentials + inputs —
both sections visible, "Proceed" disabled until ready
- [ ] Trigger a copilot agent run that returns "agent details" — card
renders outside accordion
- [ ] Verify other output types (execution_started, error) still render
inside accordions


🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 16:39:21 +07:00
Zamil Majdy
e44154ba3d refactor(backend/copilot): remove StreamFinishStep from error path too
Same reasoning - single-step synchronous tools don't need step separation
in either success or error paths. Frontend only needs StreamFinish signal.
2026-02-23 15:07:34 +07:00
Zamil Majdy
046a04ec19 refactor(backend/copilot): remove unnecessary StreamFinishStep from sync tool path
StreamFinishStep is only needed for multi-step LLM continuations (tool → LLM → tool).
For synchronous long-running tools, there's only 1 step, so step separation isn't needed.

Frontend doesn't listen to finish_step events (confirmed via grep - no matches).
Only StreamFinish is required to signal stream completion.

Simplifies success path to match actual requirements.
2026-02-23 15:06:50 +07:00
Zamil Majdy
77de834b33 fix(backend/copilot): publish StreamFinish after sync tool completion
CRITICAL BUG FIX: Addresses issue #2 from manual testing - agent generation
result doesn't show without refresh.

Root Cause:
_execute_long_running_tool_with_streaming() was publishing the tool result
but NOT publishing StreamFinish/StreamFinishStep events. The error path
correctly published these events (lines 1682-1683), but the success path
was missing them.

Impact:
- Frontend never received StreamFinish signal after tool completion
- AI SDK waited indefinitely for stream to end
- UI showed mini-game forever until page refresh
- useLongRunningToolPolling hook was removed assuming sync execution would work
- But sync execution wasn't signaling completion properly

Fix:
After publishing tool result chunk, now also publish:
1. StreamFinishStep() - signals step completion
2. StreamFinish() - signals stream end
3. mark_task_completed(status="completed") - updates stream registry

This matches the error path behavior and ensures frontend knows when
long-running tools complete.

Also added CRITICAL_BUGS_ANALYSIS.md documenting all 3 manual test issues:
1. Intro message replay on retry
2. Agent generation result not showing (FIXED by this commit)
3. Red button/lock delay after refresh
2026-02-23 15:01:42 +07:00
Zamil Majdy
6d728a0f9c fix(backend/copilot): use ContextVar for tool_use_id instead of DB query
Addresses autogpt-reviewer blocker #1 (dead ContextVar).

The _current_tool_use_id ContextVar was being set in security_hooks.py
but never consumed. Now the long-running callback uses it as the primary
source (with DB query as fallback), which is faster and more reliable.

Flow:
1. security_hooks.py sets _current_tool_use_id for long-running tools
2. Callback reads from ContextVar first (no DB query needed)
3. Falls back to _find_latest_tool_use_id() if not set
4. Final fallback generates sdk-{uuid} if neither works

This fixes the "dead code" issue identified in PR review.
2026-02-23 14:56:16 +07:00
Zamil Majdy
7b10edb820 test(backend/copilot): add 4 reconnection tests for stream deduplication and event ordering
- test_message_deduplication: verify message IDs are unique
- test_event_ordering: verify StreamStart < text deltas < StreamFinish
- test_stream_completeness: verify exactly 1 start, N text, 1 finish
- test_text_delta_consistency: verify all deltas share same block ID and build coherent text

All tests pass using COPILOT_TEST_MODE=true with dummy implementations.
Improves test coverage for reconnection flow identified in RECONNECTION_FLOW_REVIEW.md.
2026-02-23 14:53:09 +07:00
Zamil Majdy
3c4fcf7972 docs(copilot): add comprehensive reconnection flow review
- Complete analysis of reconnection implementation
- Verified backend and frontend coordination
- Identified test coverage gaps
- All implementation is correct, just needs tests
2026-02-23 14:42:35 +07:00
Zamil Majdy
d9bce8b702 fix(copilot): critical fixes - heartbeat timing and async code removal
## Critical Bug Fixes

### 1. Fix Heartbeat Timing Mismatch
- **Problem**: Frontend timeout (12s) < backend heartbeat (15s) = stream timeout
- **Solution**: Changed heartbeat interval from 15s to 10s
- **Files**:
  - `sdk/service.py`: _HEARTBEAT_INTERVAL = 10.0
  - `service.py`: heartbeat_interval = 10.0 (2 locations)
- **Impact**: Eliminates "Stream timed out" toast on long operations

### 2. Remove Dead Async Long-Running Tool Code
- **Problem**: Recent removal of operation_id/task_id left dead subscription code
- **Solution**: Simplified _execute_long_running_tool_with_streaming()
- **Removed**:
  - create_task(blocking=True) logic
  - subscribe_to_task() queue subscription
  - Async wait loop for agent generator tools
  - Distinction between agent tools vs regular tools
- **Result**: ~80 lines of dead code removed, cleaner synchronous execution

## Testing

### New Comprehensive E2E Test Suite
- **File**: backend/copilot/test_copilot_e2e.py
- **Tests**: 8 comprehensive tests (7 passing, 1 xfail)
- **Coverage**:
  - Basic streaming flow (events, order, completeness)
  - Timeout prevention
  - Event types verification
  - Text content coherence
  - Heartbeat structure
  - Error handling
  - Concurrent sessions
  - Session state persistence (xfail - DB fixture issue)
- **Result**: Automated testing without LLM API calls via dummy mode

## Documentation

### Complete Architecture Flow Analysis
- **File**: COMPLETE_ARCHITECTURE_FLOW.md
- **Content**:
  - Full request flow from frontend to backend
  - SDK vs standard service comparison
  - RabbitMQ executor architecture explanation
  - Critical issues identified (heartbeat, dead code, context)
  - Recommended actions by priority
  - File reference guide

## Verification

All tests pass with COPILOT_TEST_MODE=true:
7 passed, 1 xfailed, 2 warnings in 14.41s

## Related Issues

Addresses parts of:
- #34: Chat loading (stream timeout fixed)
- #37: Agent execution (simplified tool execution)
- #40: Context maintenance (documented for investigation)
2026-02-23 14:36:44 +07:00
Zamil Majdy
171d7cfd06 fix(copilot): make agent generation synchronous with 30min timeout
- Increase agentgenerator_timeout from 600s to 1800s (30 minutes)
- Remove async complexity (operation_id/task_id) from agent generation
- Tools now block synchronously on HTTP response (simpler, cleaner)
- Remove dual Redis subscription issues
- Fix duplicate Otto intro with $ fallback and continuous deduplication
- Fix bare except statements and TypeScript errors

Previous async approach created duplicate subscriptions and race conditions.
Synchronous approach is simpler and safer - tools already run async in service.py.
2026-02-23 14:12:09 +07:00
Zamil Majdy
942ceeb19b refactor(copilot): remove task_id completely, use session_id everywhere
- Remove ActiveTask.task_id field (was redundant, always equaled session_id)
- Update OperationCompleteMessage to use session_id instead of task_id
- Remove all task_id references from stream_registry
- Simplify code by using session_id directly throughout

This completes the task_id -> session_id refactor with no more intermediate
variables or backwards compatibility fields.
2026-02-22 22:29:52 +07:00
Zamil Majdy
f4f607a865 revert: undo unrelated formatting changes from task_id removal commit
The commit dd10e1b33 accidentally included 84 files with formatting-only
changes (black/ruff reformatting) that were unrelated to the task_id removal.

This commit reverts those files back to their original state while keeping
the actual copilot-related changes.

Files reverted: blocks, tests, utilities (84 files)
Files kept: copilot, frontend copilot components (15 files)
2026-02-22 22:19:19 +07:00
Zamil Majdy
492eb5edd6 feat(copilot): implement async agent generation with blocking wait
Implements proper async execution for agent generator tools while maintaining
blocking UX to prevent race conditions and ensure predictable behavior.

Architecture:
- Agent Generator executes asynchronously (returns 202 Accepted)
- HTTP request stays open and waits for completion (no polling, event-driven)
- Heartbeats keep SSE connection alive during long operations
- Session stays locked - user cannot send new messages during generation
- When complete, continuation happens in same request (no concurrent LLM calls)

Changes:
- Add 'blocking' flag to ActiveTask model and Redis storage
- Update create_task() to accept blocking parameter
- Modify _execute_long_running_tool_with_streaming():
  * For agent tools (create_agent, edit_agent, customize_agent):
    - Pass operation_id/session_id to enable async on Agent Generator
    - Create task with blocking=True
    - Subscribe to stream_registry for completion
    - Wait via event-driven subscription (no polling!)
    - Forward progress chunks to SSE
  * For other tools: keep synchronous execution
- Update completion_handler to skip LLM continuation if blocking=True
  (HTTP request handles continuation instead)

Benefits:
- No HTTP timeout issues (Agent Generator doesn't block their service)
- No polling waste (event-driven via Redis Pub/Sub)
- No race conditions (session stays locked)
- Predictable UX (stop button visible, can't chat during generation)
- Efficient (heartbeats keep connection alive, progress updates forwarded)

Related to #<issue-number>
2026-02-22 18:34:35 +07:00
Zamil Majdy
b5c2d49260 debug(copilot): add comprehensive logging to track conversation flow
- Add DEBUG_CONVERSATION logs to track messages through the system
- Log raw message input, session history, OpenAI message conversion
- Log text deltas being yielded to frontend
- Log when LLM continuation is triggered
- This will help debug the 'double response then stuck' issue

Related to fix/copilot-synchronous-long-running-tools refactor
2026-02-22 17:56:15 +07:00
Zamil Majdy
dd10e1b339 fix(copilot): remove task_id concept entirely, use session_id for all streaming
- Remove separate task_id creation for long-running tools
- Update stream_registry to use session_id as primary identifier
- Update all stream_registry calls across codebase to use session_id
- Keep ActiveTask.task_id field for backwards compatibility (equals session_id)
- Fix mini game showing forever by ensuring results reach correct stream
- Remove _task_id parameter from stream_chat_completion
- Update processor, service, sdk/service, routes, completion handlers

Root cause: Long-running tools were creating separate task_ids and publishing
to different streams than the frontend was subscribed to. Now everything uses
session_id, ensuring results reach the frontend properly.

Files modified:
- backend/copilot/stream_registry.py
- backend/api/features/chat/routes.py
- backend/copilot/executor/processor.py
- backend/copilot/service.py
- backend/copilot/sdk/service.py
- backend/copilot/completion_handler.py
- backend/copilot/completion_consumer.py
- frontend EditAgent/CreateAgent components (remove duplicate loader)
- frontend ChatMessagesContainer (remove unused imports)

Tests: 139/140 passed (1 SDK initialization failure unrelated to changes)
2026-02-22 17:35:39 +07:00
Zamil Majdy
5cca95b78c fix(frontend/copilot): remove duplicate status text when accordion is visible
Hide the MorphingTextAnimation status text when the accordion is shown to
avoid duplication. Previously showed:
1. "Editing the agent" (status text)
2. "Editing agent, this may take a few minutes..." (accordion title)

Now only shows the accordion with its title, which contains more useful
information (includes the mini game and full messaging).
2026-02-22 16:10:03 +07:00
Zamil Majdy
82e265db7b fix(frontend/copilot): fix stream not resuming after page refresh
The useEffect for resuming streams was only depending on sessionId, but it
checked hasActiveStream and hydratedMessages inside. This caused a race
condition where the effect would run before those values loaded, return early,
and never re-run when they became available.

Added hasActiveStream, hydratedMessages, status, and resumeStream to the
dependency array so the effect re-runs when these values change.

Fixes the critical bug where:
1. User sends create/edit agent request
2. User refreshes page
3. SSE stream never reconnects
4. Final result never arrives (stuck with red button forever)
5. User has to keep refreshing manually to see progress
2026-02-22 16:06:50 +07:00
Zamil Majdy
ad19fd7e2d Revert "fix(backend/copilot): send initial operation_in_progress event before sync execution"
This reverts commit 20dea80fc9.
2026-02-22 15:53:39 +07:00
Zamil Majdy
dc4a994d5c Revert "fix(backend/copilot): publish initial operation_in_progress to stream registry"
This reverts commit 3aca0d6e3b.
2026-02-22 15:53:39 +07:00
Zamil Majdy
3aca0d6e3b fix(backend/copilot): publish initial operation_in_progress to stream registry
Publish the initial OperationInProgressResponse to stream_registry.publish_chunk()
so it's available when users refresh/reconnect. Previously it was only yielded
to the current SSE connection but not persisted, causing the mini game UI to
disappear after page refresh.

Fixes: "I see the agent generating UI and I see the game but once I refresh it's gone"
2026-02-22 15:51:20 +07:00
Zamil Majdy
20dea80fc9 fix(backend/copilot): send initial operation_in_progress event before sync execution
Yield initial StreamToolOutputAvailable with OperationInProgressResponse before
starting synchronous execution of long-running tools (create_agent, edit_agent).

This ensures the frontend receives output immediately and can render the accordion
UI with mini-game while the synchronous operation is running. Previously, the
frontend only had input but no output until the tool completed, preventing the
UI from rendering.

Flow is now:
1. StreamToolInputAvailable - creates tool part
2. StreamToolOutputAvailable (OperationInProgressResponse) - triggers UI rendering
3. [synchronous execution with heartbeats]
4. StreamToolOutputAvailable (final result) - updates with actual result

Addresses user feedback: "why don't we update the event before calling the sync request"
2026-02-22 15:44:42 +07:00
Zamil Majdy
bbfcfef20b fix(frontend/copilot): force accordion to render during create/edit agent execution
Set hasExpandableContent=true unconditionally in CreateAgent and EditAgent
components to ensure the accordion (with mini game) renders immediately when
the tool starts executing, not just after output is received.

This fixes the issue where users only saw the "Creating agent..." text with
spinner but no expandable UI until the agent generation completed.
2026-02-22 15:42:53 +07:00
Zamil Majdy
9f53e82831 fix(copilot): show mini game during agent creation/editing
When tool is executing (state=input-available, no output yet), expand
accordion to show mini game. Streaming updates work fine - SSE delivers
updates in real-time regardless of sync/async execution.

Changes:
- getAccordionMeta: accept null, show mini game message when no output
- Remove '&& output' check to render accordion during execution
- Add null guards before type check functions
2026-02-22 15:35:47 +07:00
Zamil Majdy
285b10dab8 Revert "fix(copilot): render accordion during execution and fix type errors"
Revert "fix(copilot): show mini game during synchronous agent operations"

These changes broke real-time streaming - UI was not updating without refresh.
Need a different approach for showing mini game that doesn't break streaming.
2026-02-22 15:31:27 +07:00
Zamil Majdy
351e404f46 fix(copilot): render accordion during execution and fix type errors
Removed '&& output' check to allow accordion to render when no output yet.
Added null checks before type guard calls to satisfy TypeScript.

This shows the mini game during agent creation/editing.
2026-02-22 15:29:26 +07:00
Zamil Majdy
818f0daca0 fix(copilot): show mini game during synchronous agent operations
With synchronous execution, getAccordionMeta() received null output
and fell through to error case, hiding the MiniGame.

Now explicitly handles null output and expands accordion to show
mini game while waiting for agent creation/editing to complete.
2026-02-22 15:26:53 +07:00
Zamil Majdy
48c2dcbd7f feat(copilot): let copilot search for library agents before create/edit
Instead of automatically searching for library agents using the user's
full description (which caused 'Search term too long' errors), we now:

1. Add library_agent_ids parameter to create_agent and edit_agent tools
2. Update tool descriptions to instruct the copilot to search first
3. Copilot uses find_library_agent to search for relevant agents
4. Copilot passes the agent IDs to the tool
5. Tool fetches agents by ID and passes to generator

This approach:
- Separates concerns: copilot handles search, tool handles generation
- Lets copilot make intelligent decisions about relevance
- Avoids search term length issues
- Gives copilot more control over sub-agent composition

New function: get_library_agents_by_ids() to fetch multiple agents by ID

Tool descriptions now explicitly instruct:
'IMPORTANT: Before calling this tool, search for relevant existing
agents using find_library_agent that could be used as building blocks.
Pass their IDs in the library_agent_ids parameter.'
2026-02-22 15:22:32 +07:00
Zamil Majdy
9229bc6751 fix(copilot): truncate search queries to avoid 100-char DB limit
When creating or editing agents with long descriptions, the full user
description was being passed as a search query to list_library_agents
and get_store_agents. This caused 'Search term is too long' errors
because the DB enforces a 100-character limit on search terms.

Now truncating search queries to 100 chars in:
- get_library_agents_for_generation
- search_marketplace_agents_for_generation

This fixes the error while preserving the sub-agent composition feature
that searches for relevant existing agents during agent creation/editing.

Fixes issue where users couldn't create agents with long descriptions
like 'Create an agent that generates funny, humorous images of bekantan
(proboscis monkeys) using AI image generation...'
2026-02-22 15:11:23 +07:00
Zamil Majdy
8e63a1d38b fix(copilot): force synchronous execution by not passing operation_id to tools
CRITICAL: We were accidentally triggering async/webhook mode by passing
_operation_id and _task_id to the tool. This caused:
- Dummy agent generator to return 202 Accepted
- 30s background task with Redis webhook
- Massive polling (hundreds of xread operations)
- Complex async flow instead of simple synchronous blocking

FIX: Don't pass operation_id/task_id to the tool execution. These are only
for our internal task tracking, not for controlling tool execution mode.

Now:
- Dummy blocks for 30s and returns result directly (synchronous)
- No 202 Accepted, no webhook, no polling loop
- Single SSE stream with final result
- Much simpler flow

The polling code remains as fallback for when the REAL external
agent-generator service needs to return 202 (queue overload, etc.)
2026-02-22 07:33:53 +07:00
Zamil Majdy
73ec753b86 fix(copilot): add stale task auto-cleanup and reduce polling aggression
STALE TASK DETECTION:
- Auto-complete tasks running >10 minutes without updates
- Prevents infinite polling when StreamFinish sent but task not marked complete
- Logged as warning for debugging

REDUCED POLLING:
- Changed POLL_INTERVAL_MS from 1.5s to 5s (less aggressive)
- Polling is fallback for legacy async tools (202 Accepted)
- Modern synchronous tools use SSE stream only (no polling)

Addresses user concerns:
- "Can't the code check if task is stale?" -> Yes, now it does
- "Why polling so aggressive?" -> Reduced from 1.5s to 5s
- "If we use polling, why Redis?" -> Redis streams for real-time,
  polling only for legacy async fallback
2026-02-22 07:30:14 +07:00
Zamil Majdy
e781a87570 fix(copilot): mark task as completed in synchronous long-running success path
CRITICAL BUG FIX: The synchronous long-running tool execution was never
marking the stream_registry task as completed in the success path, causing:
- Frontend infinite polling loop (active_task=True persists forever)
- Same messages rendered repeatedly
- "Something Went Wrong" toast after timeout

Root cause: service.py:1548 returned without calling
stream_registry.mark_task_completed() and _mark_operation_completed()
in the success path, while the error path (line 1516) correctly did.

Fix: Added task completion cleanup before yielding final
StreamToolOutputAvailable in the success path.
2026-02-22 07:23:41 +07:00
Zamil Majdy
f2c9471f3e chore: untrack accidentally committed .application.logs file 2026-02-22 07:18:18 +07:00
Zamil Majdy
195ebf2a1a fix(copilot): critical fixes - hasExpandableContent, Redis lock, dummy race condition
- Fix hasExpandableContent to check isLongRunning first (CreateAgent/EditAgent)
  Without this, MiniGame accordion won't expand during synchronous blocking
- Fix Redis lock not released in polling success path (service.py:1731)
- Fix race condition in dummy.py: copy current_agent BEFORE sleep
- Standardize dummy helper payloads (remove inconsistent "type" field)
- Add exc_info=True to error logging in dummy helpers
2026-02-22 07:16:10 +07:00
Zamil Majdy
b3f34b3319 fix(copilot): complete synchronous execution with frontend support and cleanup
This commit completes the synchronous long-running tool execution refactor by:

1. **Fixes all PR review issues**:
   - Move logging import to module level in response_model.py
   - Change INFO logging to DEBUG with isEnabledFor guards to prevent sensitive data leakage
   - Fix tool name lookup in SDK to check function.name (OpenAI format)
   - Add heartbeats during synchronous execution to keep SSE connection alive
   - Fix polling timeout to release Redis lock and mark task as failed
   - Track background tasks in dummy.py to prevent garbage collection

2. **Add frontend support for callProviderMetadata**:
   - Update CreateAgent.tsx to detect long-running via callProviderMetadata.isLongRunning
   - Update EditAgent.tsx with same detection logic
   - Maintain backwards compatibility with operation_started/pending output types

3. **Remove dead backend code**:
   - Remove unused _background_tasks set from service.py (only used in dummy.py now)
   - Remove dead _execute_long_running_tool function (never called)
   - Add comments clarifying legacy vs new execution paths

4. **Update frontend comments**:
   - Clarify that useLongRunningToolPolling is for async fallback and legacy sessions
   - Document that modern synchronous execution bypasses polling

The synchronous execution flow now works end-to-end:
- Backend sets callProviderMetadata.isLongRunning on tool input
- Backend blocks synchronously with heartbeats
- Frontend shows mini-game immediately when detecting isLongRunning
- Backend returns actual result after completion
- Frontend displays result (single LLM continuation)

Async fallback path (202 Accepted + webhooks) still supported for external services.
2026-02-22 06:07:38 +07:00
Zamil Majdy
d5115de6dd fix(backend/copilot): refactor long-running tools to synchronous execution
Removes background task spawning for long-running tools (create_agent,
edit_agent) in favor of synchronous blocking execution. This simplifies
the conversation flow by returning actual results to Claude in a single
LLM continuation instead of two separate continuations.

Changes:
- Remove asyncio.create_task() background task spawning
- Execute tools synchronously with await
- Fix database access to use get_chat_session().messages
- Remove unused pending_msg/started_msg variables
- Fix dummy agent generator to match real behavior
- Add callProviderMetadata for frontend mini-game indication

Benefits:
- Single LLM continuation with actual result (not operation started)
- Clearer conversation flow
- Frontend mini-game still shows via SSE during blocking execution
- Simplified codebase without background task tracking
2026-02-22 01:07:49 +07:00
Otto
e40c8c70ce fix(copilot): collision detection, session locking, and sync for concurrent message saves (#12177)
Requested by @majdyz

Concurrent writers (incremental streaming saves from PR #12173 and
long-running tool callbacks) can race to persist messages with the same
`(sessionId, sequence)` pair, causing unique constraint violations on
`ChatMessage`.

**Root cause:** The streaming loop tracks `saved_msg_count` in-memory,
but the long-running tool callback (`_build_long_running_callback`) also
appends messages and calls `upsert_chat_session` independently — without
coordinating sequence numbers. When the streaming loop does its next
incremental save with the stale `saved_msg_count`, it tries to insert at
a sequence that already exists.

**Fix:** Multi-layered defense-in-depth approach:

1. **Collision detection with retry** (db.py): `add_chat_messages_batch`
uses `create_many()` in a transaction. On `UniqueViolationError`,
queries `MAX(sequence)+1` from DB and retries with the correct offset
(max 5 attempts).

2. **Robust sequence tracking** (db.py): `get_next_sequence()` uses
indexed `find_first` with `order={"sequence": "desc"}` for O(1) MAX
lookup, immune to deleted messages.

3. **Session-based counter** (model.py): Added `saved_message_count`
field to `ChatSession`. `upsert_chat_session` returns the session with
updated count, eliminating tuple returns throughout the codebase.

4. **MessageCounter dataclass** (sdk/service.py): Replaced list[int]
mutable reference pattern with a clean `MessageCounter` dataclass for
shared state between streaming loop and long-running callbacks.

5. **Session locking** (sdk/service.py): Prevent concurrent streams on
the same session using Redis `SET NX EX` distributed locks with TTL
refresh on heartbeats (config.stream_ttl = 3600s).

6. **Atomic operations** (db.py): Single timestamp for all messages and
session update in batch operations for consistency. Parallel queries
with `asyncio.gather` for lower latency.

7. **Config-based TTL** (sdk/service.py, config.py): Consolidated all
TTL constants to use `config.stream_ttl` (3600s) with lock refresh on
heartbeats.

### Key implementation details

- **create_many**: Uses `sessionId` directly (not nested
`Session.connect`) as `create_many` doesn't support nested creates
- **Type narrowing**: Added explicit `assert session is not None`
statements for pyright type checking in async contexts
- **Parallel operations**: Use `asyncio.gather` for independent DB
operations (create_many + session update)
- **Single timestamp**: All messages in a batch share the same
`createdAt` timestamp for atomicity

### Changes
- `backend/copilot/db.py`: Collision detection with `create_many` +
retry, indexed sequence lookup, single timestamp, parallel queries
- `backend/copilot/model.py`: Added `saved_message_count` field,
simplified return types
- `backend/copilot/sdk/service.py`: MessageCounter dataclass, session
locking with refresh, config-based TTL, type narrowing
- `backend/copilot/service.py`: Updated all callers to handle new return
types
- `backend/copilot/config.py`: Increased long_running_operation_ttl to
3600s with clarified docstring
- `backend/copilot/*_test.py`: Tests updated for new signatures

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-02-20 15:05:03 +00:00
Zamil Majdy
9cdcd6793f fix(copilot): remove stream timeout, add error propagation to frontend (#12175)
## Summary

Fixes critical reliability issues where long-running copilot sessions
were forcibly terminated and failures showed no error messages to users.

## Issues Fixed

1. **Silent failures**: Tasks failed but frontend showed "stopped" with
zero explanation
2. **Premature timeout**: Sessions auto-expired after 5 minutes even
when actively running

## Changes

### Error propagation to frontend
- Add `error_message` parameter to `mark_task_completed()`
- When `status="failed"`, publish `StreamError` before `StreamFinish` so
frontend displays reason
- Update all failure callers with specific error messages:
  - Session not found: `"Session {id} not found"`
  - Tool setup failed: `"Failed to setup tool {name}: {error}"`  
  - Task cancelled: `"Task was cancelled"`

### Remove stream timeout
- Delete `stream_timeout` config (was 300s/5min)
- Remove auto-expiry logic in `get_active_task_for_session()`
- Sessions now run indefinitely — user controls stopping via UI

## Why

**Auto-expiry was broken:**
- Used `created_at` (task start) not last activity
- SDK sessions with multiple LLM calls + subagent Tasks easily run
20-30+ minutes
- A task publishing chunks every second still got killed at 5min mark
- Hard timeout is inappropriate for long-running AI agents

**Error propagation was missing:**
- `mark_task_completed(status="failed")` only sent `StreamFinish`
- No `StreamError` event = frontend had no message to show user
- Backend logs showed errors but user saw nothing

## Test Plan

- [x] Formatter, linter, type-check pass
- [ ] Start a copilot session with Task tool (spawns subagent)
- [ ] Verify session runs beyond 5 minutes without auto-expiry
- [ ] Cancel a running session → frontend shows "Task was cancelled"
error
- [ ] Trigger a tool setup failure → frontend shows error message
- [ ] Session continues running until user clicks stop or task completes

## Files Changed

- `backend/copilot/config.py` — removed `stream_timeout`
- `backend/copilot/stream_registry.py` — removed auto-expiry, added
error propagation
- `backend/copilot/service.py` — error messages for 2 failure paths
- `backend/copilot/executor/processor.py` — error message for
cancellation
2026-02-20 09:16:22 +00:00
Zamil Majdy
fc64f83331 fix(copilot): SDK streaming reliability, parallel tools, incremental saves, frontend reconnection (#12173)
## Summary

Fixes multiple reliability issues in the copilot's Claude Agent SDK
streaming pipeline — tool outputs getting stuck, parallel tool calls
flushing prematurely, messages lost on page refresh, and SSE
reconnection failures.

## Changes

### Backend: Streaming loop rewrite (`sdk/service.py`)
- **Non-cancelling heartbeat pattern**: Replace `asyncio.timeout()` with
`asyncio.wait()` for SDK message iteration. The old approach corrupted
the SDK's internal anyio memory stream when timeouts fired
mid-`__anext__()`, causing `StopAsyncIteration` on the next call and
silently dropping all in-flight tool results.
- **Hook synchronization**: Add `wait_for_stash()` before
`convert_message()` — the SDK fires PostToolUse hooks via `start_soon()`
(fire-and-forget), so the next message can arrive before the hook
stashes its output. The new asyncio.Event-based mechanism bridges this
gap without arbitrary sleeps.
- **Error handling**: Add `asyncio.CancelledError` handling at both
inner (streaming loop) and outer (session) levels, plus pending task
cleanup in `finally` block to prevent leaked coroutines. Catch
`Exception` from `done.pop().result()` for SDK error messages.
- **Safety-net flush**: After streaming loop ends, flush any remaining
unresolved tool calls so the frontend stops showing spinners even if the
stream drops unexpectedly.
- **StreamFinish fallback**: Emit `StreamFinishStep` + `StreamFinish`
when stream ends without `ResultMessage` (StopAsyncIteration) so the
frontend transitions to "ready" state.
- **Incremental session saves**: Save session to PostgreSQL after each
tool input/output event (not just at stream end), so page refresh and
other devices see recent messages.
- **Enhanced logging**: All log lines now include `session_id[:12]`
prefix and tool call resolution state (unresolved/current/resolved
counts).

### Backend: Response adapter (`sdk/response_adapter.py`)
- **Parallel tool call support**: Skip `_flush_unresolved_tool_calls()`
when an AssistantMessage contains only ToolUseBlocks (parallel
continuation) — the prior tools are still executing concurrently and
haven't finished yet.
- **Duplicate output prevention**: Skip already-resolved tool results in
both UserMessage (ToolResultBlock) and parent_tool_use_id handling to
prevent duplicate `StreamToolOutputAvailable` events.
- **`has_unresolved_tool_calls` property**: Used by the streaming loop
to decide whether to wait for PostToolUse hooks.
- **`session_id` parameter**: Passed through for structured logging.

### Backend: Hook synchronization (`sdk/tool_adapter.py`)
- **`_stash_event` ContextVar**: asyncio.Event signaled by
`stash_pending_tool_output()` whenever a PostToolUse hook stashes
output.
- **`wait_for_stash()`**: Awaits the event with configurable timeout —
replaces the racy "hope the hook finished" approach.

### Backend: Security hooks (`sdk/security_hooks.py`)
- Enhanced logging in `post_tool_use_hook` — log whether tool is
built-in, preview of stashed output, warning when `tool_response` is
None.

### Backend: Incremental save optimization (`model.py`)
- **`existing_message_count` parameter** on `upsert_chat_session`: Skip
the DB query to count existing messages when the caller already tracks
this (streaming loop).
- **`skip_existence_check` parameter** on `_save_session_to_db`: Skip
the `get_chat_session` existence query when we know the session row
already exists. Reduces from 4 DB round trips to 2 per incremental save.

### Backend: SDK version bump (`pyproject.toml`, `poetry.lock`)
- Bump `claude-agent-sdk` from `^0.1.0` to `^0.1.39`.

### Backend: New tests
- **`sdk_compat_test.py`** (new file): SDK compatibility tests — verify
the installed SDK exposes every class, attribute, and method the copilot
integration relies on. Catches SDK upgrade breakage immediately.
- **`response_adapter_test.py`**: 9 new tests covering
flush-at-ResultMessage, flush-at-next-AssistantMessage, stashed output
flush, wait_for_stash signaling/timeout/fast-path, parallel tool call
non-premature-flush, text-message flush of prior tools, and
already-resolved tool skip in UserMessage.

### Frontend: Session hydration (`convertChatSessionToUiMessages.ts`)
- **`isComplete` option**: When session has no active stream, mark
dangling tool calls (no output in DB) as `output-available` with empty
output — stops stale spinners after page refresh.

### Frontend: Chat session hook (`useChatSession.ts`)
- Reorder `hasActiveStream` memo before `hydratedMessages` so
`isComplete` flag is available.
- Pass `{ isComplete: !hasActiveStream }` to
`convertChatSessionMessagesToUiMessages`.

### Frontend: Copilot page hook (`useCopilotPage.ts`)
- **Cache invalidation on stream end**: Invalidate React Query session
cache when stream transitions active → idle, so next hydration fetches
fresh messages from backend (staleTime: Infinity otherwise keeps stale
data).
- **Resume ref reset**: Reset `hasResumedRef` on stream end to allow
re-resume if SSE drops but backend task is still running.
- **Remove old `resolveInProgressTools` effect**: Replaced by
backend-side safety-net flush + hydration-time `isComplete` marking.

## Test plan
- [ ] Existing copilot tests pass (`pytest backend/copilot/ -x -q`)
- [ ] SDK compat tests pass (`pytest
backend/copilot/sdk/sdk_compat_test.py -v`)
- [ ] Tool outputs (bash_exec, web_fetch, WebSearch) appear in the UI
instead of getting stuck
- [ ] Parallel tool calls (e.g. multiple WebSearch) complete and display
results without premature flush
- [ ] Page refresh during active stream reconnects and recovers messages
- [ ] Opening session from another device shows recent tool results
- [ ] SSE drop → automatic reconnection without losing messages
- [ ] Long-running tools (create_agent) still delegate to background
infrastructure
2026-02-20 08:25:08 +00:00
Otto
7718c49f05 Make CoPilot todo/task list card expanded by default (#12168)
The todo card rendered by GenericTool was collapsed by default,
requiring users to click to see their checklist items. Now passes
`defaultExpanded` when the category is `"todo"` so the task list is
immediately visible.

**File changed:**
`autogpt_platform/frontend/src/app/(platform)/copilot/tools/GenericTool/GenericTool.tsx`

Resolves [SECRT-2017](https://linear.app/autogpt/issue/SECRT-2017)
2026-02-20 05:36:16 +00:00
Abhimanyu Yadav
0a1591fce2 refactor(frontend): remove old builder code and monitoring components
(#12082)

### Changes 🏗️

This PR removes old builder code and monitoring components as part of
the migration to the new flow editor:

- **NewControlPanel**: Simplified component by removing unused props
(`flowExecutionID`, `visualizeBeads`, `pinSavePopover`,
`pinBlocksPopover`, `nodes`, `onNodeSelect`, `onNodeHover`) and cleaned
up commented legacy code
- **Import paths**: Updated all references from
`legacy-builder/CustomNode` to `FlowEditor/nodes/CustomNode`
- **GraphContent**: Fixed type safety by properly handling
`customized_name` metadata and using `categoryColorMap` instead of
`getPrimaryCategoryColor`
- **useNewControlPanel**: Removed unused state and query parameter
handling related to old builder
- Removed dead code and commented-out imports throughout

### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
    - [x] Verify NewControlPanel renders correctly
    - [x] Test BlockMenu functionality
    - [x] Test Save Control
    - [x] Test Undo/Redo buttons
    - [x] Verify graph search menu still works with updated imports

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

This PR removes legacy builder components and monitoring page (~12,000
lines of code), simplifying `NewControlPanel` to focus only on the new
flow editor.

**Key changes:**
- Removed entire `legacy-builder/` directory (36 files) containing old
CustomNode, CustomEdge, Flow, and control components
- Deleted `/monitoring` page and all related components (9 files)
- Deleted `useAgentGraph` hook (1,043 lines) that was only used by
legacy components
- Simplified `NewControlPanel` by removing unused props
(`flowExecutionID`, `nodes`, `onNodeSelect`, etc.) and commented-out
code
- Updated imports in `NewSearchGraph` components to reference new
`FlowEditor/nodes/CustomNode` instead of deleted
`legacy-builder/CustomNode`
- Removed `/monitoring` from protected pages in `helpers.ts`
- Updated test files to remove monitoring-related test helpers

**Minor style issues:**
- `useNewControlPanel` hook returns unused state (`blockMenuSelected`)
that should be cleaned up
- Unnecessary double negation (`!!`) in `GraphContent.tsx:136`
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- This PR is safe to merge with minor style improvements recommended
- The refactor is a straightforward deletion of legacy code with no
references remaining in the codebase. All imports have been updated
correctly, tests cleaned up, and routing configuration updated. The only
issues are minor unused code that could be cleaned up but won't cause
runtime errors.
- No files require special attention - the unused state in
`useNewControlPanel.ts` is a minor style issue
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User
    participant NewControlPanel
    participant BlockMenu
    participant NewSaveControl
    participant UndoRedoButtons
    participant Store as blockMenuStore (Zustand)

    Note over NewControlPanel: Simplified component (removed props & legacy code)
    
    User->>NewControlPanel: Render
    NewControlPanel->>useNewControlPanel: Call hook (unused return)
    
    NewControlPanel->>BlockMenu: Render
    BlockMenu->>Store: Access state via useBlockMenuStore
    Store-->>BlockMenu: Return search, filters, etc.
    
    NewControlPanel->>NewSaveControl: Render
    NewControlPanel->>UndoRedoButtons: Render
    
    Note over NewControlPanel,Store: State management moved from hook to Zustand store
    Note over User: Legacy components (CustomNode, Flow, etc.) completely removed
```
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-20 05:19:08 +00:00
Zamil Majdy
681bb7c2b4 feat(copilot): workspace file tools, context reconstruction, transcript upload protection (#12164)
## Summary
- **Workspace file tools**: `write_workspace_file` now accepts plain
text `content`, `source_path` (copy from ephemeral disk), and graceful
fallback for invalid base64. `read_workspace_file` gains `save_to_path`
to download workspace files to the ephemeral working directory. Both
validate paths against session-specific ephemeral directory.
- **Context reconstruction**: `_format_conversation_context` now
includes tool call summaries and tool results (not just user/assistant
text), fixing agent amnesia when transcript is unavailable or stale.
- **Transcript upload protection**: Moved transcript upload from inside
the inner `try` block to the `finally` block, ensuring it always runs
even on streaming exceptions — prevents transcript loss that caused
staleness on subsequent turns.
- **Agent inactivity timeout**: Configurable timeout (default 300s)
kills hung Claude agents that stop producing SDK messages.
- **SDK system prompt**: Restructured with clear sections for shell
commands, two storage systems, file transfer workflows, and long-running
tools.
- **Path validation hardening**: `_validate_ephemeral_path` uses
`os.path.realpath` for both session dir and target path, fixing macOS
`/tmp` → `/private/tmp` symlink mismatch. Empty-string params normalised
to `None` to prevent dispatch assertion failures.

## Test plan
- [x] `_format_conversation_context` — empty, user, assistant, tool
calls, tool results, full conversation (query_builder_test.py)
- [x] `_build_query_message` — resume up-to-date, stale transcript gap,
zero msg count, no resume single/multi (query_builder_test.py)
- [x] `_validate_ephemeral_path` — valid path, traversal, cross-session,
symlink escape, nested (workspace_files_test.py)
- [x] `_resolve_write_content` — no sources, multiple sources, plain
text, base64, invalid base64, source_path, not found, outside ephemeral,
empty strings (workspace_files_test.py)
- [ ] Verify transcript upload occurs even after streaming error
- [ ] Verify agent inactivity timeout kills hung agents (300s default)

---------

Co-authored-by: Otto (AGPT) <otto@agpt.co>
2026-02-20 03:20:12 +00:00
Zamil Majdy
0818cd6683 fix(copilot): prevent background agent stalls and context hallucination (#12167)
## Summary
- **Block background Task agents**: The SDK's `Task` tool with
`run_in_background=true` stalls the SSE stream (no messages flow while
they execute) and the agents get killed when the main agent's turn ends
and we SIGTERM the CLI. The `PreToolUse` hook now denies these and tells
the agent to run tasks in the foreground instead.
- **Add heartbeats to SDK streaming**: Replaced the `async for` loop
with an explicit async iterator + `asyncio.wait_for(15s)`. Sends
`StreamHeartbeat` when the CLI is idle (e.g. during long tool execution)
to keep SSE connections alive through proxies/LBs.
- **Fix summarization hallucination**: The `_summarize_messages_llm`
prompt forced the LLM to produce ALL 9 sections ("You MUST include
ALL"), causing fabrication when the conversation didn't have content for
every section. Changed to optional sections with explicit
anti-hallucination instructions.

## Context
Session `7a9dda34-1068-4cfb-9132-5daf8ad31253` exhibited both issues:
1. The copilot tried to spin up background agents to create files in
parallel, then stopped responding
2. On resume, the copilot hallucinated having completed a "comprehensive
competitive analysis" with "9 deliverables" that never happened

## Test plan
- [x] All 26 security hooks tests pass (3 new: background blocked,
foreground allowed, limit enforced)
- [x] All 44 prompt utility tests pass
- [x] Linting and typecheck pass
- [ ] Manual test: copilot session where agent attempts to use Task tool
— should run foreground only
- [ ] Manual test: long-running tool execution — SSE should stay alive
via heartbeats
- [ ] Manual test: resume a multi-turn session — no hallucinated context
in summary
2026-02-19 20:00:15 +00:00
Zamil Majdy
7a39bdfaf8 feat(copilot): wire up stop button to cancel executor tasks (#12171)
## Summary

- The stop button was completely disconnected — clicking it only aborted
the client-side SSE fetch while the executor kept running indefinitely
- The executor already had full cancel infrastructure (RabbitMQ FANOUT
consumer, `CancelCoPilotEvent`, `threading.Event`, periodic cancel
checks) but nobody ever published a cancel message
- This PR wires up the missing pieces: a cancel REST endpoint, a publish
function, and frontend integration

## Changes

- **`executor/utils.py`**: Add `enqueue_cancel_task()` to publish
`CancelCoPilotEvent` to the existing FANOUT exchange
- **`routes.py`**: Add `POST /sessions/{session_id}/cancel` that finds
the active task, publishes cancel, and polls Redis until the task
confirms stopped (up to 10s timeout)
- **`cancel/route.ts`**: Next.js API proxy route for the cancel endpoint
- **`useCopilotPage.ts`**: Wrap AI SDK's `stop()` to also call the
backend cancel API — `sdkStop()` fires first for instant UI feedback,
then the cancel API waits for executor confirmation

## Test plan

- [ ] Start a copilot chat session and send a message
- [ ] Click "Stop generating" while streaming
- [ ] Verify executor logs show `Received cancel for {task_id}` and
`Cancelled during streaming`
- [ ] Verify the cancel endpoint returns `{"cancelled": true}` (not
timeout)
- [ ] Verify frontend transitions to idle state
- [ ] Verify clicking stop when no task is running returns
`{"cancelled": false, "reason": "no_active_task"}`
2026-02-19 19:57:51 +00:00
Otto
0b151f64e8 feat(copilot): Execute parallel tool calls concurrently (#12165)
When the LLM returns multiple tool calls in a single response (e.g.
multiple web fetches for a research task), they now execute concurrently
instead of sequentially. This can dramatically reduce latency for
multi-tool turns.

**Before:** Tool calls execute one after another — 7 web fetches × 2s
each = 14s total
**After:** All tool calls fire concurrently — 7 web fetches = ~2s total

### Changes

- **`service.py`**: New `_execute_tool_calls_parallel()` function that
spawns tool calls as concurrent `asyncio` tasks, collecting stream
events via `asyncio.Queue`
- **`service.py`**: `_yield_tool_call()` now accepts an optional
`session_lock` parameter for concurrent-safe session mutations
- **`base.py`**: Session lock exposed via `contextvars` so tools that
need it can access it without interface changes
- **`run_agent.py`**: Rate-limit counters (`successful_agent_runs`,
`successful_agent_schedules`) protected with the session lock to prevent
race conditions

### Concurrency Safety

| Shared State | Risk | Mitigation |
|---|---|---|
| `session.messages` (long-running tools only) | Race on append + upsert
| `session_lock` wraps mutations |
| `session.successful_agent_runs` counter | Bypass max-runs check |
`session_lock` wraps read-check-increment |
| Tool-internal state (DB queries, API calls) | None — stateless | No
mitigation needed |

### Testing

- Added `parallel_tool_calls_test.py` with tests for:
  - Parallel timing verification (sum vs max of delays)
  - Single tool call regression
  - Retryable error propagation
  - Shared session lock verification
  - Cancellation cleanup

Closes SECRT-2016

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-02-19 17:53:36 +00:00
Zamil Majdy
be2a48aedb feat(platform/copilot): add SuggestedGoalResponse for vague/unachievable goals (#12139)
## Summary

- Add `SUGGESTED_GOAL` response type and `SuggestedGoalResponse` model
to backend; vague/unachievable goals now return a structured suggestion
instead of a generic error
- Add `SuggestedGoalCard` frontend component (amber styling, "Use this
goal" button) that lets users accept and re-submit a refined goal in one
click
- Add error recovery buttons ("Try again", "Simplify goal") to the error
output block
- Update copilot system prompt with explicit guidance for handling
`suggested_goal` and `clarifying_questions` feedback loops
- Add `create_agent_test.py` covering all four decomposition result
types

## Test plan

- [ ] Trigger vague goal (e.g. "monitor social media") →
`SuggestedGoalCard` renders with amber styling
- [ ] Trigger unachievable goal (e.g. "read my mind") → card shows goal
type "Goal cannot be accomplished" with reason
- [ ] Click "Use this goal" → sends message and triggers new
`create_agent` call with the suggested goal
- [ ] Trigger an error → "Try again" and "Simplify goal" buttons appear
below the error
- [ ] Clarifying questions answered → LLM re-calls `create_agent` with
context (system prompt guidance)
- [ ] Backend tests pass: `poetry run pytest
backend/api/features/chat/tools/create_agent_test.py -xvs` (requires
Docker services)

<!-- greptile_comment -->

<details><summary><h3>Greptile Summary</h3></summary>

Replaced generic `ErrorResponse` with structured `SuggestedGoalResponse`
for vague/unachievable goals in the copilot agent creation flow. Added
frontend `SuggestedGoalCard` component with amber styling and "Use this
goal" button for one-click goal refinement. Enhanced system prompt with
explicit feedback loop handling for `suggested_goal` and
`clarifying_questions`. Added comprehensive test coverage for all four
decomposition result types.

**Key improvements:**
- Better UX: Users can now accept refined goals with one click instead
of manually retyping
- Clearer error recovery: Added "Try again" and "Simplify goal" buttons
to error blocks
- Structured data: Backend now returns `suggested_goal`, `reason`,
`original_goal`, and `goal_type` fields instead of embedding everything
in error messages

**Issue found:**
- The `reason` field from the backend is not being passed to or
displayed by the `SuggestedGoalCard` component, so users won't see the
explanation for why their goal was rejected (especially important for
unachievable goals where it explains what blocks are missing)
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- Safe to merge after fixing the missing `reason` field in the frontend
component
- Implementation is well-structured with good test coverage and follows
established patterns. The issue with the missing `reason` field is
straightforward to fix but important for UX - users won't understand why
their goal was rejected without it. All other changes are solid: backend
properly returns structured data, tests cover all cases, and the
component integration follows the project's conventions.
-
autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx
and SuggestedGoalCard.tsx need the `reason` prop added
</details>


<details><summary><h3>Flowchart</h3></summary>

```mermaid
flowchart TD
    Start[User submits goal to create_agent] --> Decompose[decompose_goal analyzes request]
    
    Decompose --> CheckType{Decomposition result type?}
    
    CheckType -->|clarifying_questions| Questions[Return ClarificationNeededResponse]
    Questions --> UserAnswers[User answers questions]
    UserAnswers --> Retry[Retry with context]
    Retry --> Decompose
    
    CheckType -->|vague_goal| VagueResponse[Return SuggestedGoalResponse<br/>goal_type: vague]
    VagueResponse --> ShowSuggestion[Frontend: SuggestedGoalCard<br/>amber styling]
    ShowSuggestion --> UserAccepts{User clicks<br/>Use this goal?}
    UserAccepts -->|Yes| NewGoal[Send suggested goal]
    NewGoal --> Decompose
    UserAccepts -->|No| End1[User refines manually]
    
    CheckType -->|unachievable_goal| UnachievableResponse[Return SuggestedGoalResponse<br/>goal_type: unachievable<br/>reason: missing blocks]
    UnachievableResponse --> ShowSuggestion
    
    CheckType -->|success| Generate[generate_agent creates workflow]
    Generate --> SaveOrPreview{save parameter?}
    SaveOrPreview -->|true| Save[Save to library<br/>AgentSavedResponse]
    SaveOrPreview -->|false| Preview[AgentPreviewResponse]
    
    CheckType -->|error| ErrorFlow[Return ErrorResponse]
    ErrorFlow --> ShowError[Frontend: Show error with<br/>Try again & Simplify goal buttons]
    ShowError --> UserRetry{User action?}
    UserRetry -->|Try again| Decompose
    UserRetry -->|Simplify goal| GetHelp[Ask LLM to simplify]
    GetHelp --> Decompose
    
    Save --> End2[Done]
    Preview --> End2
    End1 --> End2
```
</details>


<sub>Last reviewed commit: 2f37aee</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-19 16:11:41 +00:00
Ubbe
aeca4dbb79 docs(frontend): add mandatory pre-completion checks to CLAUDE.md (#12161)
### Changes 🏗️

Adds a **Pre-completion Checks (MANDATORY)** section to
`frontend/CLAUDE.md` that instructs Claude Code agents to always run the
following commands in order before reporting frontend work as done:

1. `pnpm format` — auto-fix formatting issues
2. `pnpm lint` — check for lint errors and fix them
3. `pnpm types` — check for type errors and fix them

This ensures code quality gates are enforced consistently by AI agents
working on the frontend.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified `pnpm format` passes cleanly
  - [x] Verified `pnpm lint` passes cleanly
  - [x] Verified `pnpm types` passes cleanly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 23:07:55 +08:00
Ubbe
7b85eeaae2 refactor(frontend): fix flaky e2e tests (#12156)
### Changes 🏗️

Some fixes to make running e2e more predictable...

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] e2e are imdempotent

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 21:38:50 +07:00
Ubbe
4db3be2d61 fix(frontend): switch minigame to snake (#12160)
## Changes 🏗️

<img width="600" height="416" alt="Screenshot 2026-02-19 at 18 05 39"
src="https://github.com/user-attachments/assets/930116ad-b611-4398-bee7-4e33ca4dc688"
/>

Make the mini game a snake 🐍 game, so we don't use assets (_possible
license issues_ ), and it's simpler...

## Checklist 📋

### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run the app and test
2026-02-19 19:28:59 +07:00
Ubbe
f57a1995d0 fix(frontend): make chat spinner centred when loading (#12154)
## Changes 🏗️

<img width="800" height="969" alt="Screenshot 2026-02-18 at 20 30 36"
src="https://github.com/user-attachments/assets/30d7d211-98c1-4159-94e1-86e81e29ad43"
/>

- Make the spinner centred when the chat is loading

## Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run the app and test locally
2026-02-19 17:31:07 +07:00
Zamil Majdy
3928c35928 feat(copilot): SDK tool output, transcript resume, stream reconnection, GenericTool UI (#12159)
## Summary

### SDK built-in tool output forwarding
- WebSearch, Read, TodoWrite outputs now render in the frontend —
PostToolUse hook stashes outputs before SDK truncation, response adapter
flushes unresolved tool calls via `_flush_unresolved_tool_calls` +
`parent_tool_use_id` handling
- Multi-call stash upgraded to `dict[str, list[str]]` to support
multiple calls to the same built-in tool in one turn

### Transcript-based `--resume` with staleness detection
- Simplified to single upload block after `async with` (Stop hook +
`appendFileSync` guarantees), extracted `_try_upload_transcript` helper
- **NEW**: `message_count` watermark + timestamp metadata stored
alongside transcript — on the next turn, detects staleness and
compresses only the missed messages instead of the full history (hybrid:
transcript via `--resume` + compressed gap)
- Removed redundant dual-strategy code and dead
`find_cli_transcript`/`read_fallback_transcript` functions

### Frontend stream reconnection
- **NEW**: Enabled `resume: true` on `useChat` with
`prepareReconnectToStreamRequest` — page refresh reconnects to active
backend streams via Redis replay (backend `resume_session_stream`
endpoint was already wired up)

### GenericTool.tsx UI overhaul
- Tool-specific icons (terminal, globe, file, search, edit, checklist)
with category-based display
- TodoWrite checklist rendering with status indicators
(completed/in-progress/pending)
- WebSearch/MCP content display via `extractMcpText` for MCP-style
content blocks + raw JSON fallback
- Defensive TodoItem filter per coderabbit review
- Proper accordion content per tool category (bash, web, file, search,
edit, todo)

### Image support
- MCP tool results now include `{"type": "image"}` content blocks when
workspace file responses contain `content_base64` with image MIME types

### Security & cleanup
- `AskUserQuestion` added to `SDK_DISALLOWED_TOOLS` (interactive CLI
tool, no terminal in copilot)
- 36 per-operation `[TIMING]`/`[TASK_LOOKUP]` diagnostic logs downgraded
info→debug
- Silent exception fixes: warning logs for swallowed errors in
stream_registry + service

## Test plan
- [ ] Verify copilot multi-turn conversations use `--resume` (check logs
for `Using --resume`)
- [ ] Verify stale transcript detection fills gap (check logs for
`Transcript stale: covers N of M messages`)
- [ ] Verify page refresh reconnects to active stream (check network tab
for GET to `/stream` returning SSE)
- [ ] Verify WebSearch, Read, TodoWrite tool outputs render in frontend
accordion
- [ ] Verify GenericTool icons and accordion content display correctly
for each tool type
- [ ] Verify production log volume is reduced (no more `[TIMING]` at
info level)

---------

Co-authored-by: Otto (AGPT) <otto@agpt.co>
2026-02-19 08:48:12 +00:00
Otto
dc77e7b6e6 feat(frontend): Replace advanced switch with chevron on builder nodes (#12152)
## Summary

Replaces the "Advanced" switch/toggle on builder nodes with a chevron
control, matching the UX pattern used for the outputs section.

Resolves
[OPEN-3006](https://linear.app/autogpt/issue/OPEN-3006/replace-advanced-switch-with-chevron-on-builder-nodes)

Before
<img width="443" height="348" alt="Screenshot 2026-02-17 at 9 01 31 pm"
src="https://github.com/user-attachments/assets/40e98669-3136-4e53-8d46-df18ea32c4d7"
/>
After
<img width="443" height="348" alt="Screenshot 2026-02-17 at 9 00 21 pm"
src="https://github.com/user-attachments/assets/0836e3ac-1d0a-43d7-9392-c9d5741b32b6"
/>

## Changes

- `NodeAdvancedToggle.tsx` — Replaced switch component with a chevron
expand/collapse toggle

## Testing

Tested and verified by @kpczerwinski

<!-- greptile_comment -->

<details><summary><h3>Greptile Summary</h3></summary>

Replaces the `Switch` toggle for the "Advanced" section on builder nodes
with a chevron (`CaretDownIcon`) expand/collapse control, matching the
existing UX pattern used in `OutputHandler.tsx`. The change is clean and
consistent with the codebase.

- Swapped `Switch` component for a ghost `Button` + `CaretDownIcon` with
a `rotate-180` transition for visual feedback
- Pattern closely mirrors the output section toggle in
`OutputHandler.tsx` (lines 120-136)
- Removed the top border separator and rounded bottom corners from the
container, adjusting the visual spacing
- Toggle logic correctly inverts the `showAdvanced` boolean state
- Uses Phosphor Icons and design system components per project
conventions
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge — it is a small, focused UI change with no
logic or security concerns.
- Single file changed with a straightforward UI component swap. The new
implementation follows an established pattern already in use in
OutputHandler.tsx. Toggle logic is correct and all conventions (Phosphor
Icons, design system components, Tailwind styling) are followed.
- No files require special attention.
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User
    participant NodeAdvancedToggle
    participant nodeStore

    User->>NodeAdvancedToggle: Click chevron button
    NodeAdvancedToggle->>nodeStore: setShowAdvanced(nodeId, !showAdvanced)
    nodeStore-->>NodeAdvancedToggle: Updated showAdvanced state
    NodeAdvancedToggle->>NodeAdvancedToggle: Rotate CaretDownIcon (0° ↔ 180°)
    Note over NodeAdvancedToggle: Advanced fields shown/hidden via FormCreator
```
</details>


<sub>Last reviewed commit: ad66080</sub>

<!-- greptile_other_comments_section -->

**Context used:**

- Context from `dashboard` - autogpt_platform/frontend/CLAUDE.md
([source](https://app.greptile.com/review/custom-context?memory=39861924-d320-41ba-a1a7-a8bff44f780a))
- Context from `dashboard` -
autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/ARCHITECTURE_FLOW_EDITOR.md
([source](https://app.greptile.com/review/custom-context?memory=0c5511fe-9aeb-4cf1-bbe9-798f2093b748))

<!-- /greptile_comment -->

---------

Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Ubbe <0ubbe@users.noreply.github.com>
Co-authored-by: Ubbe <hi@ubbe.dev>
2026-02-18 15:34:02 +00:00
Otto
ba75cc28b5 fix(copilot): Remove description from feature request search, add PII prevention (#12155)
Two targeted changes to the CoPilot feature request tools:

1. **Remove description from search results** — The
`search_feature_requests` tool no longer returns issue descriptions.
Only the title is needed for duplicate detection, reducing unnecessary
data exposure.

2. **Prevent PII in created issues** — Updated the
`create_feature_request` tool description and parameter descriptions to
explicitly instruct the LLM to never include personally identifiable
information (names, emails, company names, etc.) in Linear issue titles
and descriptions.

Resolves [SECRT-2010](https://linear.app/autogpt/issue/SECRT-2010)
2026-02-18 14:36:12 +01:00
Otto
15bcdae4e8 fix(backend/copilot): Clean up GCSWorkspaceStorage per worker (#12153)
The copilot executor runs each worker in its own thread with a dedicated
event loop (`asyncio.new_event_loop()`). `aiohttp.ClientSession` is
bound to the event loop where it was created — using it from a different
loop causes `asyncio.timeout()` to fail with:

```
RuntimeError: Timeout context manager should be used inside a task
```

This was the root cause of transcript upload failures tracked in
SECRT-2009 and [Sentry
#7272473694](https://significant-gravitas.sentry.io/issues/7272473694/).

### Fix

**One `GCSWorkspaceStorage` instance per event loop** instead of a
single shared global.

- `get_workspace_storage()` now returns a per-loop GCS instance (keyed
by `id(asyncio.get_running_loop())`). Local storage remains shared since
it has no async I/O.
- `shutdown_workspace_storage()` closes the instance for the **current**
loop only, so `session.close()` always runs on the loop that created the
session.
- `CoPilotProcessor.cleanup()` shuts down workspace storage on the
worker's own loop, then stops the loop.
- Manager cleanup submits `cleanup_worker` to each thread pool worker
before shutting down the executor — replacing the old approach of
creating a temporary event loop that couldn't close cross-loop sessions.

### Changes

| File | Change |
|------|--------|
| `util/workspace_storage.py` | `GCSWorkspaceStorage` back to simple
single-session class; `get_workspace_storage()` returns per-loop GCS
instance; `shutdown_workspace_storage()` scoped to current loop |
| `copilot/executor/processor.py` | Added `CoPilotProcessor.cleanup()`
and `cleanup_worker()` |
| `copilot/executor/manager.py` | Calls `cleanup_worker` on each thread
pool worker during shutdown |

Fixes SECRT-2009

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-02-18 11:17:39 +00:00
Otto
e9ba7e51db fix(copilot): Route workspace through db_accessors, fix transcript upload (#12148)
## Summary

Fixes two bugs in the copilot executor:

### SECRT-2008: WorkspaceManager bypasses db_accessors
`backend/util/workspace.py` imported 6 workspace functions directly from
`backend/data/workspace.py`, which call `prisma()` directly. In the
copilot executor (no Prisma connection), these fail.

**Fix:** Replace direct imports with `workspace_db()` from
`db_accessors`, routing through the database_manager HTTP client when
Prisma is unavailable. Also:
- Register all 6 workspace functions in `DatabaseManager` and
`DatabaseManagerAsyncClient`
- Add `UniqueViolationError` to the service `EXCEPTION_MAPPING` so it's
properly re-raised over HTTP (needed for race-condition retry logic)

### SECRT-2009: Transcript upload asyncio.timeout error
`asyncio.create_task()` at line 696 of `service.py` creates an orphaned
background task in the executor's thread event loop.
`gcloud-aio-storage`'s `asyncio.timeout()` fails in this context.

**Fix:** Replace `create_task` with direct `await`. The upload runs
after streaming completes (all chunks already yielded), so no
user-facing latency impact. The function already has internal try/except
error handling.
2026-02-17 22:24:19 +00:00
Reinier van der Leer
d23248f065 feat(backend/copilot): Copilot Executor Microservice (#12057)
Uncouple Copilot task execution from the REST API server. This should
help performance and scalability, and allows task execution to continue
regardless of the state of the user's connection.

- Resolves #12023

### Changes 🏗️

- Add `backend.copilot.executor`->`CoPilotExecutor` (setup similar to
`backend.executor`->`ExecutionManager`).

This executor service uses RabbitMQ-based task distribution, and sticks
with the existing Redis Streams setup for task output. It uses a cluster
lock mechanism to ensure a task is only executed by one pod, and the
`DatabaseManager` for pooled DB access.

- Add `backend.data.db_accessors` for automatic choice of direct/proxied
DB access

Chat requests now flow: API → RabbitMQ → CoPilot Executor → Redis
Streams → SSE Client. This enables horizontal scaling of chat processing
and isolates long-running LLM operations from the API service.

- Move non-API Copilot stuff into `backend.copilot` (from
`backend.api.features.chat`)
  - Updated import paths for all usages

- Move `backend.executor.database` to `backend.data.db_manager` and add
methods for copilot executor
  - Updated import paths for all usages
- Make `backend.copilot.db` RPC-compatible (-> DB ops return ~~Prisma~~
Pydantic models)
  - Make `backend.data.workspace` RPC-compatible
  - Make `backend.data.graphs.get_store_listed_graphs` RPC-compatible

DX:
- Add `copilot_executor` service to Docker setup

Config:
- Add `Config.num_copilot_workers` (default 5) and
`Config.copilot_executor_port` (default 8008)
- Remove unused `Config.agent_server_port`

> [!WARNING]
> **This change adds a new microservice to the system, with entrypoint
`backend.copilot.executor`.**
> The `docker compose` setup has been updated, but if you run the
Platform on something else, you'll have to update your deployment config
to include this new service.
>
> When running locally, the `CoPilotExecutor` uses port 8008 by default.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Copilot works
    - [x] Processes messages when triggered
    - [x] Can use its tools

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-02-17 16:15:28 +00:00
Bently
905373a712 fix(frontend): use singleton Shiki highlighter for code syntax highlighting (#12144)
## Summary
Addresses SENTRY-1051: Shiki warning about multiple highlighter
instances.

## Problem
The `@streamdown/code` package creates a **new Shiki highlighter for
each language** encountered. When users view AI chat responses with code
blocks in multiple languages (JavaScript, Python, JSON, YAML, etc.),
this creates 10+ highlighter instances, triggering Shiki's warning:

> "10 instances have been created. Shiki is supposed to be used as a
singleton, consider refactoring your code to cache your highlighter
instance"

This causes memory bloat and performance degradation.

## Solution
Introduced a custom code highlighting plugin that properly implements
the singleton pattern:

### New files:
- `src/lib/shiki-highlighter.ts` - Singleton highlighter management
- `src/lib/streamdown-code-plugin.ts` - Drop-in replacement for
`@streamdown/code`

### Key features:
- **Single shared highlighter** - One instance serves all code blocks
- **Preloaded common languages** - JS, TS, Python, JSON, Bash, YAML,
etc.
- **Lazy loading** - Additional languages loaded on demand
- **Result caching** - Avoids re-highlighting identical code blocks

### Changes:
- Added `shiki` as direct dependency
- Updated `message.tsx` to use the new plugin

## Testing
- [ ] Verify code blocks render correctly in AI chat
- [ ] Confirm no Shiki singleton warnings in console
- [ ] Test with multiple languages in same conversation

## Related
- Linear: SENTRY-1051
- Sentry: Multiple Shiki instances warning

<!-- greptile_comment -->

<details><summary><h3>Greptile Summary</h3></summary>

Replaced `@streamdown/code` with a custom singleton-based Shiki
highlighter implementation to resolve memory bloat from creating
multiple highlighter instances per language. The new implementation
creates a single shared highlighter with preloaded common languages (JS,
TS, Python, JSON, etc.) and lazy-loads additional languages on demand.
Results are cached to avoid re-highlighting identical code blocks.

**Key changes:**
- Added `shiki` v3.21.0 as a direct dependency
- Created `shiki-highlighter.ts` with singleton pattern and language
management utilities
- Created `streamdown-code-plugin.ts` as a drop-in replacement for
`@streamdown/code`
- Updated `message.tsx` to import from the new plugin instead of
`@streamdown/code`

The implementation follows React best practices with async highlighting
and callback-based notifications. The cache key uses code length +
prefix/suffix for efficient lookups on large code blocks.
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- Safe to merge with minor considerations for edge cases
- The implementation is solid with proper singleton pattern, caching,
and async handling. The code is well-structured and addresses the stated
problem. However, there's a subtle potential race condition in the
callback handling where multiple concurrent requests for the same cache
key could trigger duplicate highlight operations before the first
completes. The cache key generation using prefix/suffix could
theoretically cause false cache hits for large files with identical
prefixes and suffixes. Despite these edge cases, the implementation
should work correctly for the vast majority of use cases.
- No files require special attention
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant UI as Streamdown Component
    participant Plugin as Custom Code Plugin
    participant Cache as Token Cache
    participant Singleton as Shiki Highlighter (Singleton)
    participant Callbacks as Pending Callbacks

    UI->>Plugin: highlight(code, lang)
    Plugin->>Cache: Check cache key
    
    alt Cache hit
        Cache-->>Plugin: Return cached result
        Plugin-->>UI: Return highlighted tokens
    else Cache miss
        Plugin->>Callbacks: Register callback
        Plugin->>Singleton: Get highlighter instance
        
        alt First call
            Singleton->>Singleton: Create highlighter with preloaded languages
        end
        
        Singleton-->>Plugin: Return highlighter
        
        alt Language not loaded
            Plugin->>Singleton: Load language dynamically
        end
        
        Plugin->>Singleton: codeToTokens(code, lang, themes)
        Singleton-->>Plugin: Return tokens
        Plugin->>Cache: Store result
        Plugin->>Callbacks: Notify all waiting callbacks
        Callbacks-->>UI: Async callback with result
    end
```
</details>


<sub>Last reviewed commit: 96c793b</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-17 12:15:53 +00:00
Otto
ee9d39bc0f refactor(copilot): Replace legacy delete dialog with molecules/Dialog (#12136)
## Summary
Updates the session delete confirmation in CoPilot to use the new
`Dialog` component from `molecules/Dialog` instead of the legacy
`DeleteConfirmDialog`.

## Changes
- **ChatSidebar**: Use Dialog component for delete confirmation
(desktop)
- **CopilotPage**: Use Dialog component for delete confirmation (mobile)

## Behavior
- Dialog stays **open** during deletion with loading state on button
- Cancel button **disabled** while delete is in progress
- Delete button shows **loading spinner** during deletion
- Dialog only closes on successful delete or when cancel is clicked (if
not deleting)

## Screenshots
*Dialog uses the same styling as other molecules/Dialog instances in the
app*

## Requested by
@0ubbe

<!-- greptile_comment -->

<details><summary><h3>Greptile Summary</h3></summary>

Replaces the legacy `DeleteConfirmDialog` component with the new
`molecules/Dialog` component for session delete confirmations in both
desktop (ChatSidebar) and mobile (CopilotPage) views. The new
implementation maintains the same behavior: dialog stays open during
deletion with a loading state on the delete button and disabled cancel
button, closing only on successful deletion or cancel click.
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge with minimal risk
- This is a straightforward component replacement that maintains the
same behavior and UX. The Dialog component API is properly used with
controlled state, the loading states are correctly implemented, and both
mobile and desktop views are handled consistently. The changes are
well-tested patterns used elsewhere in the codebase.
- No files require special attention
</details>


<details><summary><h3>Flowchart</h3></summary>

```mermaid
flowchart TD
    A[User clicks delete button] --> B{isMobile?}
    B -->|Yes| C[CopilotPage Dialog]
    B -->|No| D[ChatSidebar Dialog]
    
    C --> E[Set sessionToDelete state]
    D --> E
    
    E --> F[Dialog opens with controlled.isOpen]
    F --> G{User action?}
    
    G -->|Cancel| H{isDeleting?}
    H -->|No| I[handleCancelDelete: setSessionToDelete null]
    H -->|Yes| J[Cancel button disabled]
    
    G -->|Confirm Delete| K[handleConfirmDelete called]
    K --> L[deleteSession mutation]
    L --> M[isDeleting = true]
    M --> N[Button shows loading spinner]
    M --> O[Cancel button disabled]
    
    L --> P{Mutation result?}
    P -->|Success| Q[Invalidate sessions query]
    Q --> R[Clear sessionId if current]
    R --> S[setSessionToDelete null]
    S --> T[Dialog closes]
    
    P -->|Error| U[Show toast error]
    U --> V[setSessionToDelete null]
    V --> W[Dialog closes]
```
</details>


<sub>Last reviewed commit: 275950c</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

---------

Co-authored-by: Lluis Agusti <hi@llu.lu>
Co-authored-by: Ubbe <hi@ubbe.dev>
2026-02-17 19:12:27 +07:00
Swifty
05aaf7a85e fix(backend): Rename LINEAR_API_KEY to COPILOT_LINEAR_API_KEY to prevent global access (#12143)
The `LINEAR_API_KEY` environment variable name is too generic — it
matches the key name used by integrations/blocks, meaning that if set
globally, it could inadvertently grant all users access to Linear
through the blocks system rather than restricting it to the copilot
feature-request tool.

This renames the setting to `COPILOT_LINEAR_API_KEY` to make it clear
this key is scoped exclusively to the copilot's feature-request
functionality, preventing it from being picked up as a general-purpose
Linear credential.

### Changes 🏗️

- Renamed `linear_api_key` → `copilot_linear_api_key` in `Secrets`
settings model (`backend/util/settings.py`)
- Updated all references in the copilot feature-request tool
(`backend/api/features/chat/tools/feature_requests.py`)

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified the rename is consistent across all references (settings
+ feature_requests tool)
  - [x] No other files reference the old `linear_api_key` setting name

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

> **Note:** The env var changes from `LINEAR_API_KEY` to
`COPILOT_LINEAR_API_KEY`. Any deployment using the old name will need to
update accordingly.

<!-- greptile_comment -->

<details><summary><h3>Greptile Summary</h3></summary>

Renamed `LINEAR_API_KEY` to `COPILOT_LINEAR_API_KEY` in settings and the
copilot feature-request tool to prevent unintended access through Linear
blocks.

**Key changes:**
- Updated `Secrets.linear_api_key` → `Secrets.copilot_linear_api_key` in
`backend/util/settings.py`
- Updated all references in
`backend/api/features/chat/tools/feature_requests.py`
- The rename prevents the copilot Linear key from being picked up by the
Linear blocks integration (which uses `LINEAR_API_KEY` via
`ProviderBuilder` in `backend/blocks/linear/_config.py`)

**Issues found:**
- `.env.default` still references `LINEAR_API_KEY` instead of
`COPILOT_LINEAR_API_KEY`
- Frontend styleguide has a hardcoded error message with the old
variable name
</details>


<details><summary><h3>Confidence Score: 3/5</h3></summary>

- Generally safe but requires fixing `.env.default` before deployment
- The code changes are correct and achieve the intended security
improvement by preventing scope leakage. However, the PR is incomplete -
`.env.default` wasn't updated (critical for deployment) and a frontend
error message reference was missed. These issues will cause
configuration problems for anyone deploying with the new variable name.
- Check `autogpt_platform/backend/.env.default` and
`autogpt_platform/frontend/src/app/(platform)/copilot/styleguide/page.tsx`
- both need updates to match the renamed variable
</details>


<details><summary><h3>Flowchart</h3></summary>

```mermaid
flowchart TD
    A[".env file<br/>COPILOT_LINEAR_API_KEY"] --> B["Secrets model<br/>copilot_linear_api_key"]
    B --> C["feature_requests.py<br/>_get_linear_config()"]
    C --> D["Creates APIKeyCredentials<br/>for copilot feature requests"]
    
    E[".env file<br/>LINEAR_API_KEY"] --> F["ProviderBuilder<br/>in blocks/linear/_config.py"]
    F --> G["Linear blocks integration<br/>for user workflows"]
    
    style A fill:#90EE90
    style B fill:#90EE90
    style C fill:#90EE90
    style D fill:#90EE90
    style E fill:#FFD700
    style F fill:#FFD700
    style G fill:#FFD700
```
</details>


<sub>Last reviewed commit: 86dc57a</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-17 11:16:43 +01:00
Reinier van der Leer
9d4dcbd9e0 fix(backend/docker): Make server last (= default) build stage
Without specifying an explicit build target it would build the `migrate` stage because it is the last stage in the Dockerfile. This caused deployment failures.

- Follow-up to #12124 and 074be7ae
2026-02-16 14:49:30 +01:00
Reinier van der Leer
074be7aea6 fix(backend/docker): Update run commands to match deployment
- Follow-up to #12124

Changes:
- Update `run` commands for all backend services in `docker-compose.platform.yml` to match the deployment commands used in production
- Add trigger on `docker-compose(.platform)?.yml` changes to the Frontend CI workflow
2026-02-16 14:23:29 +01:00
Otto
39d28b24fc ci(backend): Upgrade RabbitMQ from 3.12 (EOL) to 4.1.4 (#12118)
## Summary
Upgrades RabbitMQ from the end-of-life `rabbitmq:3.12-management` to
`rabbitmq:4.1.4`, aligning CI, local dev, and e2e testing with
production.

## Changes

### CI Workflow (`.github/workflows/platform-backend-ci.yml`)
- **Image:** `rabbitmq:3.12-management` → `rabbitmq:4.1.4`
- **Port:** Removed 15672 (management UI) — not used
- **Health check:** Added to prevent flaky tests from race conditions
during startup

### Docker Compose (`docker-compose.platform.yml`,
`docker-compose.test.yaml`)
- **Image:** `rabbitmq:management` → `rabbitmq:4.1.4`
- **Port:** Removed 15672 (management UI) — not used

## Why
- RabbitMQ 3.12 is EOL
- We don't use the management interface, so `-management` variant is
unnecessary
- CI and local dev/e2e should match production (4.1.4)

## Testing
CI validates that backend tests pass against RabbitMQ 4.1.4 on Python
3.11, 3.12, and 3.13.

---
Closes SECRT-1703
2026-02-16 12:45:39 +00:00
Reinier van der Leer
bf79a7748a fix(backend/build): Update stale Poetry usage in Dockerfile (#12124)
[SECRT-2006: Dev deployment failing: poetry not found in container
PATH](https://linear.app/autogpt/issue/SECRT-2006)

- Follow-up to #12090

### Changes 🏗️

- Remove now-broken Poetry path config values
- Remove usage of now-broken `poetry run` in container run command
- Add trigger on `backend/Dockerfile` changes to Frontend CI workflow

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - If it works, CI will pass
2026-02-16 13:54:20 +01:00
Otto
649d4ab7f5 feat(chat): Add delete chat session endpoint and UI (#12112)
## Summary

Adds the ability to delete chat sessions from the CoPilot interface.

## Changes

### Backend
- Add `DELETE /api/chat/sessions/{session_id}` endpoint in `routes.py`
- Returns 204 on success, 404 if not found or not owned by user
- Reuses existing `delete_chat_session` function from `model.py`

### Frontend
- Add delete button (trash icon) that appears on hover for each chat
session
- Add confirmation dialog before deletion using existing
`DeleteConfirmDialog` component
- Refresh session list after successful delete
- Clear current session selection if the deleted session was active
- Update OpenAPI spec with new endpoint

## Testing

1. Hover over a chat session in sidebar → trash icon appears
2. Click trash icon → confirmation dialog
3. Confirm deletion → session removed, list refreshes
4. If deleted session was active, selection is cleared

## Screenshots

Delete button appears on hover, confirmation dialog on click.

## Related Issues

Closes SECRT-1928

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Adds the ability to delete chat sessions from the CoPilot interface — a
new `DELETE /api/chat/sessions/{session_id}` backend endpoint and a
corresponding delete button with confirmation dialog in the
`ChatSidebar` frontend component.

- **Backend route** (`routes.py`): Clean implementation reusing the
existing `delete_chat_session` model function with proper auth guards
and 204/404 responses. No issues.
- **Frontend** (`ChatSidebar.tsx`): Adds hover-visible trash icon per
session, confirmation dialog, mutation with cache invalidation, and
active session clearing on delete. However, it uses a `__legacy__`
component (`DeleteConfirmDialog`) which violates the project's style
guide — new code should use the modern design system components. Error
handling only logs to console without user-facing feedback (project
convention is to use toast notifications for mutation errors).
`isDeleting` is destructured but unused.
- **OpenAPI spec** updated correctly.
- **Unrelated file included**:
`notes/plan-SECRT-1959-graph-edge-desync.md` is a planning document for
a different ticket and should be removed from this PR. The `notes/`
directory is newly introduced and both plan files should be reconsidered
for inclusion.
</details>


<details><summary><h3>Confidence Score: 3/5</h3></summary>

- Functionally correct but has style guide violations and includes
unrelated files that should be addressed before merge.
- The core feature implementation (backend DELETE endpoint and frontend
mutation logic) is sound and follows existing patterns. Score is lowered
because: (1) the frontend uses a legacy component explicitly prohibited
by the project's style guide, (2) mutation errors are not surfaced to
the user, and (3) the PR includes an unrelated planning document for a
different ticket.
- Pay close attention to `ChatSidebar.tsx` for the legacy component
import and error handling, and
`notes/plan-SECRT-1959-graph-edge-desync.md` which should be removed.
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User
    participant ChatSidebar as ChatSidebar (Frontend)
    participant ReactQuery as React Query
    participant API as DELETE /api/chat/sessions/{id}
    participant Model as model.delete_chat_session
    participant DB as db.delete_chat_session (Prisma)
    participant Redis as Redis Cache

    User->>ChatSidebar: Click trash icon on session
    ChatSidebar->>ChatSidebar: Show DeleteConfirmDialog
    User->>ChatSidebar: Confirm deletion
    ChatSidebar->>ReactQuery: deleteSession({ sessionId })
    ReactQuery->>API: DELETE /api/chat/sessions/{session_id}
    API->>Model: delete_chat_session(session_id, user_id)
    Model->>DB: delete_many(where: {id, userId})
    DB-->>Model: bool (deleted count > 0)
    Model->>Redis: Delete session cache key
    Model->>Model: Clean up session lock
    Model-->>API: True
    API-->>ReactQuery: 204 No Content
    ReactQuery->>ChatSidebar: onSuccess callback
    ChatSidebar->>ReactQuery: invalidateQueries(sessions list)
    ChatSidebar->>ChatSidebar: Clear sessionId if deleted was active
```
</details>


<sub>Last reviewed commit: 44a92c6</sub>

<!-- greptile_other_comments_section -->

<details><summary><h4>Context used (3)</h4></summary>

- Context from `dashboard` - autogpt_platform/frontend/CLAUDE.md
([source](https://app.greptile.com/review/custom-context?memory=39861924-d320-41ba-a1a7-a8bff44f780a))
- Context from `dashboard` - autogpt_platform/frontend/CONTRIBUTING.md
([source](https://app.greptile.com/review/custom-context?memory=cc4f1b17-cb5c-4b63-b218-c772b48e20ee))
- Context from `dashboard` - autogpt_platform/CLAUDE.md
([source](https://app.greptile.com/review/custom-context?memory=6e9dc5dc-8942-47df-8677-e60062ec8c3a))
</details>


<!-- /greptile_comment -->

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-02-16 12:19:18 +00:00
Ubbe
223df9d3da feat(frontend): improve create/edit copilot UX (#12117)
## Changes 🏗️

Make the UX nicer when running long tasks in Copilot, like creating an
agent, editing it or running a task.

## Checklist 📋

### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run locally and play the game!

<!-- greptile_comment -->

<details><summary><h3>Greptile Summary</h3></summary>

This PR replaces the static progress bar and idle wait screens with an
interactive mini-game across the Create, Edit, and Run Agent copilot
tools. The existing mini-game (a simple runner with projectile-dodge
boss encounters) is significantly overhauled into a two-mode game: a
runner mode with animated tree obstacles and a duel mode featuring a
melee boss fight with attack, guard, and movement mechanics.
Sprite-based rendering replaces the previous shape-drawing approach.

- **Create/Edit/Run Agent UX**: All three tool views now show the
mini-game with contextual overlays during long-running operations,
replacing the progress bar in EditAgent and adding the game to RunAgent
- **Game mechanics overhaul**: Boss encounters changed from
projectile-dodging to melee duel with attack (Z), block (X), movement
(arrows), and jump (Space) controls
- **Sprite rendering**: Added 9 sprite sheet assets for characters,
trees, and boss animations with fallback to shape rendering if images
fail to load
- **UI overlays**: Added React-managed overlay states for idle,
boss-intro, boss-defeated, and game-over screens with continue/retry
buttons
- **Minor issues found**: Unused `isRunActive` variable in
`MiniGame.tsx`, unreachable "leaving" boss phase in `useMiniGame.ts`,
and a missing `expanded` property in `getAccordionMeta` return type
annotation in `EditAgent.tsx`
- **Unused asset**: `archer-shoot.png` is included in the PR but never
imported or referenced in any code
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- This PR is safe to merge — it only affects the copilot mini-game UX
with no backend or data model changes.
- The changes are entirely frontend/cosmetic, scoped to the copilot
tools' waiting UX. The mini-game logic is self-contained in a
canvas-based hook and doesn't affect any application state, API calls,
or routing. The issues found are minor (unused variable, dead code, type
annotation gap, unused asset) and don't impact runtime behavior.
- `useMiniGame.ts` has the most complex logic changes (boss AI, death
animations, sprite rendering) and contains unreachable dead code in the
"leaving" phase handler. `EditAgent.tsx` has a return type annotation
that doesn't include `expanded`.
</details>


<details><summary><h3>Flowchart</h3></summary>

```mermaid
flowchart TD
    A[Game Idle] -->|"Start button"| B[Run Mode]
    B -->|"Jump over trees"| C{Score >= Threshold?}
    C -->|No| B
    C -->|"Yes, obstacles clear"| D[Boss Intro Overlay]
    D -->|"Continue button"| E[Duel Mode]
    E -->|"Attack Z / Guard X / Move ←→"| F{Boss HP <= 0?}
    F -->|No| G{Player hit & not guarding?}
    G -->|No| E
    G -->|Yes| H[Player Death Animation]
    H --> I[Game Over Overlay]
    I -->|"Retry button"| B
    F -->|Yes| J[Boss Death Animation]
    J --> K[Boss Defeated Overlay]
    K -->|"Continue button"| L[Reset Boss & Resume Run]
    L --> B
```
</details>


<sub>Last reviewed commit: ad80e24</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-16 10:53:08 +00:00
Ubbe
187ab04745 refactor(frontend): remove OldAgentLibraryView and NEW_AGENT_RUNS flag (#12088)
## Summary
- Removes the deprecated `OldAgentLibraryView` directory (13 files,
~2200 lines deleted)
- Removes the `NEW_AGENT_RUNS` feature flag from the `Flag` enum and
defaults
- Removes the legacy agent library page at `library/legacy/[id]`
- Moves shared `CronScheduler` components to
`src/components/contextual/CronScheduler/`
- Moves `agent-run-draft-view` and `agent-status-chip` to
`legacy-builder/` (co-located with their only consumer)
- Updates all import paths in consuming files (`AgentInfoStep`,
`SaveControl`, `RunnerInputUI`, `useRunGraph`)

## Test plan
- [x] `pnpm format` passes
- [x] `pnpm types` passes (no TypeScript errors)
- [x] No remaining references to `OldAgentLibraryView`,
`NEW_AGENT_RUNS`, or `new-agent-runs` in the codebase
- [x] Verify `RunnerInputUI` dialog still works in the legacy builder
- [x] Verify `AgentInfoStep` cron scheduling works in the publish modal
- [x] Verify `SaveControl` cron scheduling works in the legacy builder

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

This PR removes deprecated code from the legacy agent library view
system and consolidates the codebase to use the new agent runs
implementation exclusively. The refactor successfully removes ~2200
lines of code across 13 deleted files while properly relocating shared
components.

**Key changes:**
- Removed the entire `OldAgentLibraryView` directory and its 13
component files
- Removed the `NEW_AGENT_RUNS` feature flag from the `Flag` enum and
defaults
- Deleted the legacy agent library page route at `library/legacy/[id]`
- Moved `CronScheduler` components to
`src/components/contextual/CronScheduler/` for shared use across the
application
- Moved `agent-run-draft-view` and `agent-status-chip` to
`legacy-builder/` directory, co-locating them with their only consumer
- Updated `useRunGraph.ts` to import `GraphExecutionMeta` from the
generated API models instead of the deleted custom type definition
- Updated all import paths in consuming components (`AgentInfoStep`,
`SaveControl`, `RunnerInputUI`)

**Technical notes:**
- The new import path for `GraphExecutionMeta`
(`@/app/api/__generated__/models/graphExecutionMeta`) will be generated
when running `pnpm generate:api` from the OpenAPI spec
- All references to the old code have been cleanly removed from the
codebase
- The refactor maintains proper separation of concerns by moving shared
components to contextual locations
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- This PR is safe to merge with minimal risk, pending manual
verification of the UI components mentioned in the test plan
- The refactor is well-structured and all code changes are correct. The
score of 4 (rather than 5) reflects that the PR author has marked three
manual testing items as incomplete in the test plan: verifying
`RunnerInputUI` dialog, `AgentInfoStep` cron scheduling, and
`SaveControl` cron scheduling. While the code changes are sound, these
UI components should be manually tested before merging to ensure the
moved components work correctly in their new locations.
- No files require special attention. The author should complete the
manual testing checklist items for `RunnerInputUI`, `AgentInfoStep`, and
`SaveControl` as noted in the test plan.
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant Dev as Developer
    participant FE as Frontend Build
    participant API as Backend API
    participant Gen as Generated Types

    Note over Dev,Gen: Refactor: Remove OldAgentLibraryView & NEW_AGENT_RUNS flag

    Dev->>FE: Delete OldAgentLibraryView (13 files, ~2200 lines)
    Dev->>FE: Remove NEW_AGENT_RUNS from Flag enum
    Dev->>FE: Delete library/legacy/[id]/page.tsx
    
    Dev->>FE: Move CronScheduler → src/components/contextual/
    Dev->>FE: Move agent-run-draft-view → legacy-builder/
    Dev->>FE: Move agent-status-chip → legacy-builder/
    
    Dev->>FE: Update RunnerInputUI import path
    Dev->>FE: Update SaveControl import path
    Dev->>FE: Update AgentInfoStep import path
    
    Dev->>FE: Update useRunGraph.ts
    FE->>Gen: Import GraphExecutionMeta from generated models
    Note over Gen: Type available after pnpm generate:api
    
    Gen-->>API: Uses OpenAPI spec schema
    API-->>FE: Type-safe GraphExecutionMeta model
```
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 18:29:59 +08:00
Abhimanyu Yadav
e2d3c8a217 fix(frontend): Prevent node drag when selecting text in object editor key input (#11955)
## Summary
- Add `nodrag` class to the key name input wrapper in
`WrapIfAdditionalTemplate.tsx`
- This prevents the node from being dragged when users try to select
text in the key name input field
- Follows the same pattern used by other input components like
`TextWidget.tsx`

## Test plan
- [x] Open the new builder
- [x] Add a custom node with an Object input field
- [x] Try to select text in the key name input by clicking and dragging
- [x] Verify that text selection works without moving the block

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-16 06:59:33 +00:00
Eve
647c8ed8d4 feat(backend/blocks): enhance list concatenation with advanced operations (#12105)
## Summary

Enhances the existing `ConcatenateListsBlock` and adds five new
companion blocks for comprehensive list manipulation, addressing issue
#11139 ("Implement block to concatenate lists").

### Changes

- **Enhanced `ConcatenateListsBlock`** with optional deduplication
(`deduplicate`) and None-value filtering (`remove_none`), plus an output
`length` field
- **New `FlattenListBlock`**: Recursively flattens nested list
structures with configurable `max_depth`
- **New `InterleaveListsBlock`**: Round-robin interleaving of elements
from multiple lists
- **New `ZipListsBlock`**: Zips corresponding elements from multiple
lists with support for padding to longest or truncating to shortest
- **New `ListDifferenceBlock`**: Computes set difference between two
lists (regular or symmetric)
- **New `ListIntersectionBlock`**: Finds common elements between two
lists, preserving order

### Helper Utilities

Extracted reusable helper functions for validation, flattening,
deduplication, interleaving, chunking, and statistics computation to
support the blocks and enable future reuse.

### Test Coverage

Comprehensive test suite with 188 test functions across 29 test classes
covering:
- Built-in block test harness validation for all 6 blocks
- Manual edge-case tests for each block (empty inputs, large lists,
mixed types, nested structures)
- Internal method tests for all block classes
- Unit tests for all helper utility functions

Closes #11139

## Test plan

- [x] All files pass Python syntax validation (`ast.parse`)
- [x] Built-in `test_input`/`test_output` tests defined for all blocks
- [x] Manual tests cover edge cases: empty lists, large lists, mixed
types, nested structures, deduplication, None removal
- [x] Helper function tests validate all utility functions independently
- [x] All block IDs are valid UUID4
- [x] Block categories set to `BlockCategory.BASIC` for consistency with
existing list blocks


<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Enhanced `ConcatenateListsBlock` with deduplication and None-filtering
options, and added five new list manipulation blocks
(`FlattenListBlock`, `InterleaveListsBlock`, `ZipListsBlock`,
`ListDifferenceBlock`, `ListIntersectionBlock`) with comprehensive
helper functions and test coverage.

**Key Changes:**
- Enhanced `ConcatenateListsBlock` with `deduplicate` and `remove_none`
options, plus `length` output field
- Added `FlattenListBlock` for recursively flattening nested lists with
configurable `max_depth`
- Added `InterleaveListsBlock` for round-robin element interleaving
- Added `ZipListsBlock` with support for padding/truncation
- Added `ListDifferenceBlock` and `ListIntersectionBlock` for set
operations
- Extracted 12 reusable helper functions for validation, flattening,
deduplication, etc.
- Comprehensive test suite with 188 test functions covering edge cases

**Minor Issues:**
- Helper function `_deduplicate_list` has redundant logic in the `else`
branch that duplicates the `if` branch
- Three helper functions (`_filter_empty_collections`,
`_compute_list_statistics`, `_chunk_list`) are defined but unused -
consider removing unless planned for future use
- The `_make_hashable` function uses `hash(repr(item))` for unhashable
types, which correctly treats structurally identical dicts/lists as
duplicates
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- Safe to merge with minor style improvements recommended
- The implementation is well-structured with comprehensive test coverage
(188 tests), proper error handling, and follows existing block patterns.
All blocks use valid UUID4 IDs and correct categories. The helper
functions provide good code reuse. The minor issues are purely stylistic
(redundant code, unused helpers) and don't affect functionality or
safety.
- No files require special attention - both files are well-tested and
follow project conventions
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User
    participant Block as List Block
    participant Helper as Helper Functions
    participant Output
    
    User->>Block: Input (lists/parameters)
    Block->>Helper: _validate_all_lists()
    Helper-->>Block: validation result
    
    alt validation fails
        Block->>Output: error message
    else validation succeeds
        Block->>Helper: _concatenate_lists_simple() / _flatten_nested_list() / etc.
        Helper-->>Block: processed result
        
        opt deduplicate enabled
            Block->>Helper: _deduplicate_list()
            Helper-->>Block: deduplicated result
        end
        
        opt remove_none enabled
            Block->>Helper: _filter_none_values()
            Helper-->>Block: filtered result
        end
        
        Block->>Output: result + length
    end
    
    Output-->>User: Block outputs
```
</details>


<sub>Last reviewed commit: a6d5445</sub>

<!-- greptile_other_comments_section -->

<sub>(2/5) Greptile learns from your feedback when you react with thumbs
up/down!</sub>

<!-- /greptile_comment -->

---------

Co-authored-by: Otto <otto@agpt.co>
2026-02-16 05:39:53 +00:00
Zamil Majdy
27d94e395c feat(backend/sdk): enable WebSearch, block WebFetch, consolidate tool constants (#12108)
## Summary
- Enable Claude Agent SDK built-in **WebSearch** tool (Brave Search via
Anthropic API) for the CoPilot SDK agent
- Explicitly **block WebFetch** via `SDK_DISALLOWED_TOOLS`. The agent
uses the SSRF-protected `mcp__copilot__web_fetch` MCP tool instead
- **Consolidate** all tool security constants (`BLOCKED_TOOLS`,
`WORKSPACE_SCOPED_TOOLS`, `DANGEROUS_PATTERNS`, `SDK_DISALLOWED_TOOLS`)
into `tool_adapter.py` as a single source of truth — previously
scattered across `tool_adapter.py`, `security_hooks.py`, and inline in
`service.py`

## Changes
- `tool_adapter.py`: Add `WebSearch` to `_SDK_BUILTIN_TOOLS`, add
`SDK_DISALLOWED_TOOLS`, move security constants here
- `security_hooks.py`: Import constants from `tool_adapter.py` instead
of defining locally
- `service.py`: Use `SDK_DISALLOWED_TOOLS` instead of inline `["Bash"]`

## Test plan
- [x] All 21 security hooks tests pass
- [x] Ruff lint clean
- [x] All pre-commit hooks pass
- [ ] Verify WebSearch works in CoPilot chat (manual test)

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Consolidates tool security constants into `tool_adapter.py` as single
source of truth, enables WebSearch (Brave via Anthropic API), and
explicitly blocks WebFetch to prevent SSRF attacks. The change improves
security by ensuring the agent uses the SSRF-protected
`mcp__copilot__web_fetch` tool instead of the built-in WebFetch which
can access internal networks like `localhost:8006`.
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge with minimal risk
- The changes improve security by blocking WebFetch (SSRF risk) while
enabling safe WebSearch. The consolidation of constants into a single
source of truth improves maintainability. All existing tests pass (21
security hooks tests), and the refactoring is straightforward with no
behavioral changes to existing security logic. The only suggestions are
minor improvements: adding a test for WebFetch blocking and considering
a lowercase alias for consistency.
- No files require special attention
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant Agent as SDK Agent
    participant Hooks as Security Hooks
    participant TA as tool_adapter.py
    participant MCP as MCP Tools
    
    Note over TA: SDK_DISALLOWED_TOOLS = ["Bash", "WebFetch"]
    Note over TA: _SDK_BUILTIN_TOOLS includes WebSearch
    
    Agent->>Hooks: Request WebSearch (Brave API)
    Hooks->>TA: Check BLOCKED_TOOLS
    TA-->>Hooks: Not blocked
    Hooks-->>Agent: Allowed ✓
    Agent->>Agent: Execute via Anthropic API
    
    Agent->>Hooks: Request WebFetch (SSRF risk)
    Hooks->>TA: Check BLOCKED_TOOLS
    Note over TA: WebFetch in SDK_DISALLOWED_TOOLS
    TA-->>Hooks: Blocked
    Hooks-->>Agent: Denied ✗
    Note over Agent: Use mcp__copilot__web_fetch instead
    
    Agent->>Hooks: Request mcp__copilot__web_fetch
    Hooks->>MCP: Validate (MCP tool, not SDK builtin)
    MCP-->>Hooks: Has SSRF protection
    Hooks-->>Agent: Allowed ✓
    Agent->>MCP: Execute with SSRF checks
```
</details>


<sub>Last reviewed commit: 2d9975f</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-15 06:51:25 +00:00
DEEVEN SERU
b8f5c208d0 Handle errors in Jina ExtractWebsiteContentBlock (#12048)
## Summary
- catch Jina reader client/server errors in ExtractWebsiteContentBlock
and surface a clear error output keyed to the user URL
- guard empty responses to return an explicit error instead of yielding
blank content
- add regression tests covering the happy path and HTTP client failures
via a monkeypatched fetch

## Testing
- not run (pytest unavailable in this environment)

---------

Co-authored-by: Nicholas Tindle <nicktindle@outlook.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-02-13 19:15:09 +00:00
Otto
ca216dfd7f ci(docs-claude-review): Update comments instead of creating new ones (#12106)
## Changes 🏗️

This PR updates the Claude Block Docs Review CI workflow to update
existing comments instead of creating new ones on each push.

### What's Changed:
1. **Concurrency group** - Prevents race conditions if the workflow runs
twice simultaneously
2. **Comment cleanup step** - Deletes any previous Claude review comment
before posting a new one
3. **Marker instruction** - Instructs Claude to include a `<!--
CLAUDE_DOCS_REVIEW -->` marker in its comment for identification

### Why:
Previously, every PR push would create a new review comment, cluttering
the PR with multiple comments. Now only the most recent review is shown.

### Testing:
1. Create a PR that triggers this workflow (modify a file in
`docs/integrations/` or `autogpt_platform/backend/backend/blocks/`)
2. Verify first run creates comment with marker
3. Push another commit
4. Verify old comment is deleted and new comment is created (not
accumulated)

Requested by @Bentlybro

---

## Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan (will be
tested on merge)

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Added concurrency control and comment deduplication to prevent multiple
Claude review comments from accumulating on PRs. The workflow now
deletes previous review comments (identified by `<!-- CLAUDE_DOCS_REVIEW
-->` marker) before posting new ones, and uses concurrency groups to
prevent race conditions.
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge with minimal risk
- The changes are well-contained, follow GitHub Actions best practices,
and use built-in GitHub APIs safely. The concurrency control prevents
race conditions, and the comment cleanup logic uses proper filtering
with `head -1` to handle edge cases. The HTML comment marker approach is
standard and reliable.
- No files require special attention
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant GH as GitHub PR Event
    participant WF as Workflow
    participant API as GitHub API
    participant Claude as Claude Action
    
    GH->>WF: PR opened/synchronized
    WF->>WF: Check concurrency group
    Note over WF: Cancel any in-progress runs<br/>for same PR number
    WF->>API: Query PR comments
    API-->>WF: Return all comments
    WF->>WF: Filter for CLAUDE_DOCS_REVIEW marker
    alt Previous comment exists
        WF->>API: DELETE comment by ID
        API-->>WF: Comment deleted
    else No previous comment
        WF->>WF: Skip deletion
    end
    WF->>Claude: Run code review
    Claude->>API: POST new comment with marker
    API-->>Claude: Comment created
```
</details>


<sub>Last reviewed commit: fb1b436</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-13 16:46:23 +00:00
Zamil Majdy
f9f358c526 feat(mcp): Add MCP tool block with OAuth, tool discovery, and standard credential integration (#12011)
## Summary

<img width="1000" alt="image"
src="https://github.com/user-attachments/assets/18e8ef34-d222-453c-8b0a-1b25ef8cf806"
/>


<img width="250" alt="image"
src="https://github.com/user-attachments/assets/ba97556c-09c5-4f76-9f4e-49a2e8e57468"
/>

<img width="250" alt="image"
src="https://github.com/user-attachments/assets/68f7804a-fe74-442d-9849-39a229c052cf"
/>

<img width="250" alt="image"
src="https://github.com/user-attachments/assets/700690ba-f9fe-4726-8871-3bfbab586001"
/>

Full-stack MCP (Model Context Protocol) tool block integration that
allows users to connect to any MCP server, discover available tools,
authenticate via OAuth, and execute tools — all through the standard
AutoGPT credential system.

### Backend

- **MCPToolBlock** (`blocks/mcp/block.py`): New block using
`CredentialsMetaInput` pattern with optional credentials (`default={}`),
supporting both authenticated (OAuth) and public MCP servers. Includes
auto-lookup fallback for backward compatibility.
- **MCP Client** (`blocks/mcp/client.py`): HTTP transport with JSON-RPC
2.0, tool discovery, tool execution with robust error handling
(type-checked error fields, non-JSON response handling)
- **MCP OAuth Handler** (`blocks/mcp/oauth.py`): RFC 8414 discovery,
dynamic per-server OAuth with PKCE, token storage and refresh via
`raise_for_status=True`
- **MCP API Routes** (`api/features/mcp/routes.py`): `discover-tools`,
`oauth/login`, `oauth/callback` endpoints with credential cleanup,
defensive OAuth metadata validation
- **Credential system integration**:
- `CredentialsMetaInput` model_validator normalizes legacy
`"ProviderName.MCP"` format from Python 3.13's `str(StrEnum)` change
- `CredentialsFieldInfo.combine()` supports URL-based credential
discrimination (each MCP server gets its own credential entry)
- `aggregate_credentials_inputs` checks block schema defaults for
credential optionality
- Executor normalizes credential data for both Pydantic and JSON schema
validation paths
  - Chat credential matching handles MCP server URL filtering
- `provider_matches()` helper used consistently for Python 3.13 StrEnum
compatibility
- **Pre-run validation**: `_validate_graph_get_errors` now calls
`get_missing_input()` for custom block-level validation (MCP tool
arguments)
- **Security**: HTML tag stripping loop to prevent XSS bypass, SSRF
protection (removed trusted_origins)

### Frontend

- **MCPToolDialog** (`MCPToolDialog.tsx`): Full tool discovery UI —
enter server URL, authenticate if needed, browse tools, select tool and
configure
- **OAuth popup** (`oauth-popup.ts`): Shared utility supporting
cross-origin MCP OAuth flows with BroadcastChannel + localStorage
fallback
- **Credential integration**: MCP-specific OAuth flow in
`useCredentialsInput`, server URL filtering in `useCredentials`, MCP
callback page
- **CredentialsSelect**: Auto-selects first available credential instead
of defaulting to "None", credentials listed before "None" in dropdown
- **Node rendering**: Dynamic tool input schema rendering on MCP nodes,
proper handling in both legacy and new flow editors
- **Block title persistence**: `customized_name` set at block creation
for both MCP and Agent blocks — no fallback logic needed, titles survive
save/load reliably
- **Stable credential ordering**: Removed `sortByUnsetFirst` that caused
credential inputs to jump when selected

### Tests (~2060 lines)

- Unit tests: block, client, tool execution
- Integration tests: mock MCP server with auth
- OAuth flow tests
- API endpoint tests
- Credential combining/optionality tests
- E2e tests (skipped in CI, run manually)

## Key Design Decisions

1. **Optional credentials via `default={}`**: MCP servers can be public
(no auth) or private (OAuth). The `credentials` field has `default={}`
making it optional at the schema level, so public servers work without
prompting for credentials.

2. **URL-based credential discrimination**: Each MCP server URL gets its
own credential entry in the "Run agent" form (via
`discriminator="server_url"`), so agents using multiple MCP servers
prompt for each independently.

3. **Model-level normalization**: Python 3.13 changed `str(StrEnum)` to
return `"ClassName.MEMBER"`. Rather than scattering fixes across the
codebase, a Pydantic `model_validator(mode="before")` on
`CredentialsMetaInput` handles normalization centrally, and
`provider_matches()` handles lookups.

4. **Credential auto-select**: `CredentialsSelect` component defaults to
the first available credential and notifies the parent state, ensuring
credentials are pre-filled in the "Run agent" dialog without requiring
manual selection.

5. **customized_name for block titles**: Both MCP and Agent blocks set
`customized_name` in metadata at creation time. This eliminates
convoluted runtime fallback logic (`agent_name`, hostname extraction) —
the title is persisted once and read directly.

## Test plan

- [x] Unit/integration tests pass (68 MCP + 11 graph = 79 tests)
- [x] Manual: MCP block with public server (DeepWiki) — no credentials
needed, tools discovered and executable
- [x] Manual: MCP block with OAuth server (Linear, Sentry) — OAuth flow
prompts correctly
- [x] Manual: "Run agent" form shows correct credential requirements per
MCP server
- [x] Manual: Credential auto-selects when exactly one matches,
pre-selects first when multiple exist
- [x] Manual: Credential ordering stays stable when
selecting/deselecting
- [x] Manual: MCP block title persists after save and refresh
- [x] Manual: Agent block title persists after save and refresh (via
customized_name)
- [ ] Manual: Shared agent with MCP block prompts new user for
credentials

---------

Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Ubbe <hi@ubbe.dev>
2026-02-13 16:17:03 +00:00
Zamil Majdy
52b3aebf71 feat(backend/sdk): Claude Agent SDK integration for CoPilot (#12103)
## Summary

Full integration of the **Claude Agent SDK** to replace the existing
one-turn OpenAI-compatible CoPilot implementation with a multi-turn,
tool-using AI agent.

### What changed

**Core SDK Integration** (`chat/sdk/` — new module)
- **`service.py`**: Main orchestrator — spawns Claude Code CLI as a
subprocess per user message, streams responses back via SSE. Handles
conversation history compression, session lifecycle, and error recovery.
- **`response_adapter.py`**: Translates Claude Agent SDK events (text
deltas, tool use, errors, result messages) into the existing CoPilot
`StreamEvent` protocol so the frontend works unchanged.
- **`tool_adapter.py`**: Bridges CoPilot's MCP tools (find_block,
run_block, create_agent, etc.) into the SDK's tool format. Handles
schema conversion and result serialization.
- **`security_hooks.py`**: Pre/Post tool-use hooks that enforce a strict
allowlist of tools, block path traversal, sandbox file operations to
per-session workspace directories, cap sub-agent spawning, and prevent
the model from accessing unauthorized system resources.
- **`transcript.py`**: JSONL transcript I/O utilities for the stateless
`--resume` feature (see below).

**Stateless Multi-Turn Resume** (new)
- Instead of compressing conversation history via LLM on every turn
(lossy and expensive), we capture Claude Code's native JSONL session
transcript via a **Stop hook** callback, persist it in the DB
(`ChatSession.sdkTranscript`), and restore it on the next turn via
`--resume <file>`.
- This preserves full tool call/result context across turns with zero
token overhead for history.
- Feature-flagged via `CLAUDE_AGENT_USE_RESUME` (default: off).
- DB migration: `ALTER TABLE "ChatSession" ADD COLUMN "sdkTranscript"
TEXT`.

**Sandboxed Tool Execution** (`chat/tools/`)
- **`bash_exec.py`**: Sandboxed bash execution using bubblewrap
(`bwrap`) with read-only root filesystem, per-session writable
workspace, resource limits (CPU, memory, file size), and network
isolation.
- **`sandbox.py`**: Shared bubblewrap sandbox infrastructure — generates
`bwrap` command lines with configurable mounts, environment, and
resource constraints.
- **`web_fetch.py`**: URL fetching tool with domain allowlist, size
limits, and content-type filtering.
- **`check_operation_status.py`**: Polling tool for long-running
operations (agent creation, block execution) so the SDK doesn't block
waiting.
- **`find_block.py`** / **`run_block.py`**: Enhanced with category
filtering, optimized response size (removed raw JSON schemas), and
better error handling.

**Security**
- Path traversal prevention: session IDs sanitized, all file ops
confined to workspace dirs, symlink resolution.
- Tool allowlist enforcement via SDK hooks — model cannot call arbitrary
tools.
- Built-in `Bash` tool blocked via `disallowed_tools` to prevent
bypassing sandboxed `bash_exec`.
- Sub-agent (`Task`) spawning capped at configurable limit (default:
10).
- CodeQL-clean path sanitization patterns.

**Streaming & Reconnection**
- SSE stream registry backed by Redis Streams for crash-resilient
reconnection.
- Long-running operation tracking with TTL-based cleanup.
- Atomic message append to prevent race conditions on concurrent writes.

**Configuration** (`config.py`)
- `use_claude_agent_sdk` — master toggle (default: on)
- `claude_agent_model` — model override for SDK path
- `claude_agent_max_buffer_size` — JSON parsing buffer (10MB)
- `claude_agent_max_subtasks` — sub-agent cap (10)
- `claude_agent_use_resume` — transcript-based resume (default: off)
- `thinking_enabled` — extended thinking for Claude models

**Tests**
- `sdk/response_adapter_test.py` — 366 lines covering all event
translation paths
- `sdk/security_hooks_test.py` — 165 lines covering tool blocking, path
traversal, subtask limits
- `chat/model_test.py` — 214 lines covering session model serialization
- `chat/service_test.py` — Integration tests including multi-turn resume
keyword recall
- `tools/find_block_test.py` / `run_block_test.py` — Extended with new
tool behavior tests

## Test plan
- [x] Unit tests pass (`sdk/response_adapter_test.py`,
`security_hooks_test.py`, `model_test.py`)
- [x] Integration test: multi-turn keyword recall via `--resume`
(`service_test.py::test_sdk_resume_multi_turn`)
- [x] Manual E2E: CoPilot chat sessions with tool calls, bash execution,
and multi-turn context
- [x] Pre-commit hooks pass (ruff, isort, black, pyright, flake8)
- [ ] Staging deployment with `claude_agent_use_resume=false` initially
- [ ] Enable resume in staging, verify transcript capture and recall

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

This PR replaces the existing OpenAI-compatible CoPilot with a full
Claude Agent SDK integration, introducing multi-turn conversations,
stateless resume via JSONL transcripts, and sandboxed tool execution.

**Key changes:**
- **SDK integration** (`chat/sdk/`): spawns Claude Code CLI subprocess
per message, translates events to frontend protocol, bridges MCP tools
- **Stateless resume**: captures JSONL transcripts via Stop hook,
persists in `ChatSession.sdkTranscript`, restores with `--resume`
(feature-flagged, default off)
- **Sandboxed execution**: bubblewrap sandbox for bash commands with
filesystem whitelist, network isolation, resource limits
- **Security hooks**: tool allowlist enforcement, path traversal
prevention, workspace-scoped file operations, sub-agent spawn limits
- **Long-running operations**: delegates `create_agent`/`edit_agent` to
existing stream_registry infrastructure for SSE reconnection
- **Feature flag**: `CHAT_USE_CLAUDE_AGENT_SDK` with LaunchDarkly
support, defaults to enabled

**Security issues found:**
- Path traversal validation has logic errors in `security_hooks.py:82`
(tilde expansion order) and `service.py:266` (redundant `..` check)
- Config validator always prefers env var over explicit `False` value
(`config.py:162`)
- Race condition in `routes.py:323` — message persisted before task
registration, could duplicate on retry
- Resource limits in sandbox may fail silently (`sandbox.py:109`)

**Test coverage is strong** with 366 lines for response adapter, 165 for
security hooks, and integration tests for multi-turn resume.
</details>


<details><summary><h3>Confidence Score: 3/5</h3></summary>

- This PR is generally safe but has critical security issues in path
validation that must be fixed before merge
- Score reflects strong architecture and test coverage offset by real
security vulnerabilities: the tilde expansion bug in `security_hooks.py`
could allow sandbox escape, the race condition could cause message
duplication, and the silent ulimit failures could bypass resource
limits. The bubblewrap sandbox and allowlist enforcement are
well-designed, but the path validation bugs need fixing. The transcript
resume feature is properly feature-flagged. Overall the implementation
is solid but the security issues prevent a higher score.
- Pay close attention to
`backend/api/features/chat/sdk/security_hooks.py` (path traversal
vulnerability), `backend/api/features/chat/routes.py` (race condition),
`backend/api/features/chat/tools/sandbox.py` (silent resource limit
failures), and `backend/api/features/chat/sdk/service.py` (redundant
security check)
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant Frontend
    participant Routes as routes.py
    participant SDKService as sdk/service.py
    participant ClaudeSDK as Claude Agent SDK CLI
    participant SecurityHooks as security_hooks.py
    participant ToolAdapter as tool_adapter.py
    participant CoPilotTools as tools/*
    participant Sandbox as sandbox.py (bwrap)
    participant DB as Database
    participant Redis as stream_registry

    Frontend->>Routes: POST /chat (user message)
    Routes->>SDKService: stream_chat_completion_sdk()
    
    SDKService->>DB: get_chat_session()
    DB-->>SDKService: session + messages
    
    alt Resume enabled AND transcript exists
        SDKService->>SDKService: validate_transcript()
        SDKService->>SDKService: write_transcript_to_tempfile()
        Note over SDKService: Pass --resume to SDK
    else No resume
        SDKService->>SDKService: _compress_conversation_history()
        Note over SDKService: Inject history into user message
    end
    
    SDKService->>SecurityHooks: create_security_hooks()
    SDKService->>ToolAdapter: create_copilot_mcp_server()
    SDKService->>ClaudeSDK: spawn subprocess with MCP server
    
    loop Streaming Conversation
        ClaudeSDK->>SDKService: AssistantMessage (text/tool_use)
        SDKService->>Frontend: StreamTextDelta / StreamToolInputAvailable
        
        alt Tool Call
            ClaudeSDK->>SecurityHooks: PreToolUse hook
            SecurityHooks->>SecurityHooks: validate path, check allowlist
            alt Tool blocked
                SecurityHooks-->>ClaudeSDK: deny
            else Tool allowed
                SecurityHooks-->>ClaudeSDK: allow
                ClaudeSDK->>ToolAdapter: call MCP tool
                
                alt Long-running tool (create_agent, edit_agent)
                    ToolAdapter->>Redis: register task
                    ToolAdapter->>DB: save OperationPendingResponse
                    ToolAdapter->>ToolAdapter: spawn background task
                    ToolAdapter-->>ClaudeSDK: OperationStartedResponse
                else Regular tool (find_block, bash_exec)
                    ToolAdapter->>CoPilotTools: execute()
                    alt bash_exec
                        CoPilotTools->>Sandbox: run_sandboxed()
                        Sandbox->>Sandbox: build bwrap command
                        Note over Sandbox: Network isolation,<br/>filesystem whitelist,<br/>resource limits
                        Sandbox-->>CoPilotTools: stdout, stderr, exit_code
                    end
                    CoPilotTools-->>ToolAdapter: result
                    ToolAdapter->>ToolAdapter: stash full output
                    ToolAdapter-->>ClaudeSDK: MCP response
                end
                
                SecurityHooks->>SecurityHooks: PostToolUse hook (log)
            end
        end
        
        ClaudeSDK->>SDKService: UserMessage (ToolResultBlock)
        SDKService->>ToolAdapter: pop_pending_tool_output()
        SDKService->>Frontend: StreamToolOutputAvailable
    end
    
    ClaudeSDK->>SecurityHooks: Stop hook
    SecurityHooks->>SDKService: transcript_path callback
    SDKService->>SDKService: read_transcript_file()
    SDKService->>DB: save transcript to session.sdkTranscript
    
    ClaudeSDK->>SDKService: ResultMessage (success)
    SDKService->>Frontend: StreamFinish
    SDKService->>DB: upsert_chat_session()
```
</details>


<sub>Last reviewed commit: 28c1121</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

---------

Co-authored-by: Swifty <craigswift13@gmail.com>
2026-02-13 15:49:03 +00:00
Otto
965b7d3e04 dx: Add PR overlap detection & alert (#12104)
## Summary

Adds an automated workflow that detects potential merge conflicts
between open PRs, helping contributors coordinate proactively.

**Example output:** [See comment on PR
#12057](https://github.com/Significant-Gravitas/AutoGPT/pull/12057#issuecomment-3897330632)

## How it works

1. **Triggered on PR events** — runs when a PR is opened, pushed to, or
reopened
2. **Compares against all open PRs** targeting the same base branch
3. **Detects overlaps** at multiple levels:
   - File overlap (same files modified)
   - Line overlap (same line ranges modified)
   - Actual merge conflicts (attempts real merges)
4. **Posts a comment** on the PR with findings

## Features

- Full file paths with common prefix extraction for readability
- Conflict size (number of conflict regions + lines affected)
- Conflict types (content, added, deleted, modified/deleted, etc.)
- Last-updated timestamps for each PR
- Risk categorization (conflict, medium, low)
- Ignores noise files (openapi.json, lock files)
- Updates existing comment on subsequent pushes (no spam)
- Filters out PRs older than 14 days
- Clone-once optimization for fast merge testing (~48s for 19 PRs)

## Files

- `.github/scripts/detect_overlaps.py` — main detection script
- `.github/workflows/pr-overlap-check.yml` — workflow definition
2026-02-13 15:45:10 +00:00
Bently
c2368f15ff fix(blocks): disable PrintToConsoleBlock (#12100)
## Summary
Disables the Print to Console block as requested by Nick Tindle.

## Changes
- Added `disabled=True` to PrintToConsoleBlock in `basic.py`

## Testing
- Block will no longer appear in the platform UI
- Existing graphs using this block should be checked (block ID:
`f3b1c1b2-4c4f-4f0d-8d2f-4c4f0d8d2f4c`)

Closes OPEN-3000

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Added `disabled=True` parameter to `PrintToConsoleBlock` in `basic.py`
per Nick Tindle's request (OPEN-3000).

- Block follows the same disabling pattern used by other blocks in the
codebase (e.g., `BlockInstallationBlock`, video blocks, Ayrshare blocks)
- Block will no longer appear in the platform UI for new graph creation
- Existing graphs using this block (ID:
`f3b1c1b2-4c4f-4f0d-8d2f-4c4f0d8d2f4c`) will need to be checked for
compatibility
- Comment properly documents the reason for disabling
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge with minimal risk
- Single-line change that adds a well-documented flag following existing
patterns used throughout the codebase. The change is non-destructive and
only affects UI visibility of the block for new graphs.
- No files require special attention
</details>


<sub>Last reviewed commit: 759003b</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-13 15:20:23 +00:00
dependabot[bot]
9ac3f64d56 chore(deps): bump github/codeql-action from 3 to 4 (#12033)
Bumps [github/codeql-action](https://github.com/github/codeql-action)
from 3 to 4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/github/codeql-action/releases">github/codeql-action's
releases</a>.</em></p>
<blockquote>
<h2>v3.32.2</h2>
<ul>
<li>Update default CodeQL bundle version to <a
href="https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.24.1">2.24.1</a>.
<a
href="https://redirect.github.com/github/codeql-action/pull/3460">#3460</a></li>
</ul>
<h2>v3.32.1</h2>
<ul>
<li>A warning is now shown in Default Setup workflow logs if a <a
href="https://docs.github.com/en/code-security/how-tos/secure-at-scale/configure-organization-security/manage-usage-and-access/giving-org-access-private-registries">private
package registry is configured</a> using a GitHub Personal Access Token
(PAT), but no username is configured. <a
href="https://redirect.github.com/github/codeql-action/pull/3422">#3422</a></li>
<li>Fixed a bug which caused the CodeQL Action to fail when repository
properties cannot successfully be retrieved. <a
href="https://redirect.github.com/github/codeql-action/pull/3421">#3421</a></li>
</ul>
<h2>v3.32.0</h2>
<ul>
<li>Update default CodeQL bundle version to <a
href="https://github.com/github/codeql-action/releases/tag/codeql-bundle-v2.24.0">2.24.0</a>.
<a
href="https://redirect.github.com/github/codeql-action/pull/3425">#3425</a></li>
</ul>
<h2>v3.31.11</h2>
<ul>
<li>When running a Default Setup workflow with <a
href="https://docs.github.com/en/actions/how-tos/monitor-workflows/enable-debug-logging">Actions
debugging enabled</a>, the CodeQL Action will now use more unique names
when uploading logs from the Dependabot authentication proxy as workflow
artifacts. This ensures that the artifact names do not clash between
multiple jobs in a build matrix. <a
href="https://redirect.github.com/github/codeql-action/pull/3409">#3409</a></li>
<li>Improved error handling throughout the CodeQL Action. <a
href="https://redirect.github.com/github/codeql-action/pull/3415">#3415</a></li>
<li>Added experimental support for automatically excluding <a
href="https://docs.github.com/en/repositories/working-with-files/managing-files/customizing-how-changed-files-appear-on-github">generated
files</a> from the analysis. This feature is not currently enabled for
any analysis. In the future, it may be enabled by default for some
GitHub-managed analyses. <a
href="https://redirect.github.com/github/codeql-action/pull/3318">#3318</a></li>
<li>The changelog extracts that are included with releases of the CodeQL
Action are now shorter to avoid duplicated information from appearing in
Dependabot PRs. <a
href="https://redirect.github.com/github/codeql-action/pull/3403">#3403</a></li>
</ul>
<h2>v3.31.10</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>3.31.10 - 12 Jan 2026</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.9. <a
href="https://redirect.github.com/github/codeql-action/pull/3393">#3393</a></li>
</ul>
<p>See the full <a
href="https://github.com/github/codeql-action/blob/v3.31.10/CHANGELOG.md">CHANGELOG.md</a>
for more information.</p>
<h2>v3.31.9</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>3.31.9 - 16 Dec 2025</h2>
<p>No user facing changes.</p>
<p>See the full <a
href="https://github.com/github/codeql-action/blob/v3.31.9/CHANGELOG.md">CHANGELOG.md</a>
for more information.</p>
<h2>v3.31.8</h2>
<h1>CodeQL Action Changelog</h1>
<p>See the <a
href="https://github.com/github/codeql-action/releases">releases
page</a> for the relevant changes to the CodeQL CLI and language
packs.</p>
<h2>3.31.8 - 11 Dec 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.8. <a
href="https://redirect.github.com/github/codeql-action/pull/3354">#3354</a></li>
</ul>
<p>See the full <a
href="https://github.com/github/codeql-action/blob/v3.31.8/CHANGELOG.md">CHANGELOG.md</a>
for more information.</p>
<h2>v3.31.7</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/github/codeql-action/blob/main/CHANGELOG.md">github/codeql-action's
changelog</a>.</em></p>
<blockquote>
<h2>4.31.11 - 23 Jan 2026</h2>
<ul>
<li>When running a Default Setup workflow with <a
href="https://docs.github.com/en/actions/how-tos/monitor-workflows/enable-debug-logging">Actions
debugging enabled</a>, the CodeQL Action will now use more unique names
when uploading logs from the Dependabot authentication proxy as workflow
artifacts. This ensures that the artifact names do not clash between
multiple jobs in a build matrix. <a
href="https://redirect.github.com/github/codeql-action/pull/3409">#3409</a></li>
<li>Improved error handling throughout the CodeQL Action. <a
href="https://redirect.github.com/github/codeql-action/pull/3415">#3415</a></li>
<li>Added experimental support for automatically excluding <a
href="https://docs.github.com/en/repositories/working-with-files/managing-files/customizing-how-changed-files-appear-on-github">generated
files</a> from the analysis. This feature is not currently enabled for
any analysis. In the future, it may be enabled by default for some
GitHub-managed analyses. <a
href="https://redirect.github.com/github/codeql-action/pull/3318">#3318</a></li>
<li>The changelog extracts that are included with releases of the CodeQL
Action are now shorter to avoid duplicated information from appearing in
Dependabot PRs. <a
href="https://redirect.github.com/github/codeql-action/pull/3403">#3403</a></li>
</ul>
<h2>4.31.10 - 12 Jan 2026</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.9. <a
href="https://redirect.github.com/github/codeql-action/pull/3393">#3393</a></li>
</ul>
<h2>4.31.9 - 16 Dec 2025</h2>
<p>No user facing changes.</p>
<h2>4.31.8 - 11 Dec 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.8. <a
href="https://redirect.github.com/github/codeql-action/pull/3354">#3354</a></li>
</ul>
<h2>4.31.7 - 05 Dec 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.7. <a
href="https://redirect.github.com/github/codeql-action/pull/3343">#3343</a></li>
</ul>
<h2>4.31.6 - 01 Dec 2025</h2>
<p>No user facing changes.</p>
<h2>4.31.5 - 24 Nov 2025</h2>
<ul>
<li>Update default CodeQL bundle version to 2.23.6. <a
href="https://redirect.github.com/github/codeql-action/pull/3321">#3321</a></li>
</ul>
<h2>4.31.4 - 18 Nov 2025</h2>
<p>No user facing changes.</p>
<h2>4.31.3 - 13 Nov 2025</h2>
<ul>
<li>CodeQL Action v3 will be deprecated in December 2026. The Action now
logs a warning for customers who are running v3 but could be running v4.
For more information, see <a
href="https://github.blog/changelog/2025-10-28-upcoming-deprecation-of-codeql-action-v3/">Upcoming
deprecation of CodeQL Action v3</a>.</li>
<li>Update default CodeQL bundle version to 2.23.5. <a
href="https://redirect.github.com/github/codeql-action/pull/3288">#3288</a></li>
</ul>
<h2>4.31.2 - 30 Oct 2025</h2>
<p>No user facing changes.</p>
<h2>4.31.1 - 30 Oct 2025</h2>
<ul>
<li>The <code>add-snippets</code> input has been removed from the
<code>analyze</code> action. This input has been deprecated since CodeQL
Action 3.26.4 in August 2024 when this removal was announced.</li>
</ul>
<h2>4.31.0 - 24 Oct 2025</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="8aac4e47ac"><code>8aac4e4</code></a>
Merge pull request <a
href="https://redirect.github.com/github/codeql-action/issues/3448">#3448</a>
from github/mergeback/v4.32.1-to-main-6bc82e05</li>
<li><a
href="e8d7df4f04"><code>e8d7df4</code></a>
Rebuild</li>
<li><a
href="c1bba77db0"><code>c1bba77</code></a>
Update changelog and version after v4.32.1</li>
<li><a
href="6bc82e05fd"><code>6bc82e0</code></a>
Merge pull request <a
href="https://redirect.github.com/github/codeql-action/issues/3447">#3447</a>
from github/update-v4.32.1-f52cbc830</li>
<li><a
href="42f00f2d33"><code>42f00f2</code></a>
Add a couple of change notes</li>
<li><a
href="cedee6de9f"><code>cedee6d</code></a>
Update changelog for v4.32.1</li>
<li><a
href="f52cbc8309"><code>f52cbc8</code></a>
Merge pull request <a
href="https://redirect.github.com/github/codeql-action/issues/3445">#3445</a>
from github/dependabot/npm_and_yarn/fast-xml-parser-...</li>
<li>See full diff in <a
href="https://github.com/github/codeql-action/compare/v3...v4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github/codeql-action&package-manager=github_actions&previous-version=3&new-version=4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-13 15:04:05 +00:00
Swifty
5035b69c79 feat(platform): add feature request tools for CoPilot chat (#12102)
Users can now search for existing feature requests and submit new ones
directly through the CoPilot chat interface. Requests are tracked in
Linear with customer need attribution.

### Changes 🏗️

**Backend:**
- Added `SearchFeatureRequestsTool` and `CreateFeatureRequestTool` to
the CoPilot chat tools registry
- Integrated with Linear GraphQL API for searching issues in the feature
requests project, creating new issues, upserting customers, and
attaching customer needs
- Added `linear_api_key` secret to settings for system-level Linear API
access
- Added response models (`FeatureRequestSearchResponse`,
`FeatureRequestCreatedResponse`, `FeatureRequestInfo`) to the tools
models

**Frontend:**
- Added `SearchFeatureRequestsTool` and `CreateFeatureRequestTool` UI
components with full streaming state handling (input-streaming,
input-available, output-available, output-error)
- Added helper utilities for output parsing, type guards, animation
text, and icon rendering
- Wired tools into `ChatMessagesContainer` for rendering in the chat
- Added styleguide examples covering all tool states

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified search returns matching feature requests from Linear
- [x] Verified creating a new feature request creates an issue and
customer need in Linear
- [x] Verified adding a need to an existing issue works via
`existing_issue_id`
  - [x] Verified error states render correctly in the UI
  - [x] Verified styleguide page renders all tool states

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

New secret: `LINEAR_API_KEY` — required for system-level Linear API
operations (defaults to empty string).

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Adds feature request search and creation tools to CoPilot chat,
integrating with Linear's GraphQL API to track user feedback. Users can
now search existing feature requests and submit new ones (or add their
need to existing issues) directly through conversation.

**Key changes:**
- Backend: `SearchFeatureRequestsTool` and `CreateFeatureRequestTool`
with Linear API integration via system-level `LINEAR_API_KEY`
- Frontend: React components with streaming state handling and accordion
UI for search results and creation confirmations
- Models: Added `FeatureRequestSearchResponse` and
`FeatureRequestCreatedResponse` to response types
- Customer need tracking: Upserts customers in Linear and attaches needs
to issues for better feedback attribution

**Issues found:**
- Missing `LINEAR_API_KEY` entry in `.env.default` (required per PR
description checklist)
- Hardcoded project/team IDs reduce maintainability
- Global singleton pattern could cause issues in async contexts
- Using `user_id` as customer name reduces readability in Linear
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- Safe to merge with minor configuration fix required
- The implementation is well-structured with proper error handling, type
safety, and follows existing patterns in the codebase. The missing
`.env.default` entry is a straightforward configuration issue that must
be fixed before deployment but doesn't affect code quality. The other
findings are style improvements that don't impact functionality.
- Verify that `LINEAR_API_KEY` is added to `.env.default` before merging
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User
    participant CoPilot UI
    participant LLM
    participant FeatureRequestTool
    participant LinearClient
    participant Linear API

    User->>CoPilot UI: Request feature via chat
    CoPilot UI->>LLM: Send user message
    
    LLM->>FeatureRequestTool: search_feature_requests(query)
    FeatureRequestTool->>LinearClient: query(SEARCH_ISSUES_QUERY)
    LinearClient->>Linear API: POST /graphql (search)
    Linear API-->>LinearClient: searchIssues.nodes[]
    LinearClient-->>FeatureRequestTool: Feature request data
    FeatureRequestTool-->>LLM: FeatureRequestSearchResponse
    
    alt No existing requests found
        LLM->>FeatureRequestTool: create_feature_request(title, description)
        FeatureRequestTool->>LinearClient: mutate(CUSTOMER_UPSERT_MUTATION)
        LinearClient->>Linear API: POST /graphql (upsert customer)
        Linear API-->>LinearClient: customer {id, name}
        LinearClient-->>FeatureRequestTool: Customer data
        
        FeatureRequestTool->>LinearClient: mutate(ISSUE_CREATE_MUTATION)
        LinearClient->>Linear API: POST /graphql (create issue)
        Linear API-->>LinearClient: issue {id, identifier, url}
        LinearClient-->>FeatureRequestTool: Issue data
        
        FeatureRequestTool->>LinearClient: mutate(CUSTOMER_NEED_CREATE_MUTATION)
        LinearClient->>Linear API: POST /graphql (attach need)
        Linear API-->>LinearClient: need {id, issue}
        LinearClient-->>FeatureRequestTool: Need data
        FeatureRequestTool-->>LLM: FeatureRequestCreatedResponse
    else Existing request found
        LLM->>FeatureRequestTool: create_feature_request(title, description, existing_issue_id)
        FeatureRequestTool->>LinearClient: mutate(CUSTOMER_UPSERT_MUTATION)
        LinearClient->>Linear API: POST /graphql (upsert customer)
        Linear API-->>LinearClient: customer {id}
        LinearClient-->>FeatureRequestTool: Customer data
        
        FeatureRequestTool->>LinearClient: mutate(CUSTOMER_NEED_CREATE_MUTATION)
        LinearClient->>Linear API: POST /graphql (attach need to existing)
        Linear API-->>LinearClient: need {id, issue}
        LinearClient-->>FeatureRequestTool: Need data
        FeatureRequestTool-->>LLM: FeatureRequestCreatedResponse
    end
    
    LLM-->>CoPilot UI: Tool response + continuation
    CoPilot UI-->>User: Display result with accordion UI
```
</details>


<sub>Last reviewed commit: af2e093</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
2026-02-13 15:27:00 +01:00
Otto
86af8fc856 ci: apply E2E CI optimizations to Claude workflows (#12097)
## Summary

Applies the CI performance optimizations from #12090 to Claude Code
workflows.

## Changes

### `claude.yml` & `claude-dependabot.yml`
- **pnpm caching**: Replaced manual `actions/cache` with `setup-node`
built-in `cache: "pnpm"`
- Removes 4 steps (set pnpm store dir, cache step, manual config) → 1
step

### `claude-ci-failure-auto-fix.yml`
- **Added dev environment setup** with optimized caching
- Now Claude can run lint/tests when fixing CI failures (previously
could only edit files)
- Uses the same optimized caching patterns

## Dependency

This PR is based on #12090 and will merge after it.

## Testing

- Workflow YAML syntax validated
- Patterns match proven #12090 implementation
- CI caching changes fail gracefully to uncached builds

## Linear

Fixes [SECRT-1950](https://linear.app/autogpt/issue/SECRT-1950)

## Future Enhancements

E2E test data caching could be added to Claude workflows if needed for
running integration tests. Currently Claude workflows set up a dev
environment but don't run E2E tests by default.

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Applies proven CI performance optimizations to Claude workflows by
simplifying pnpm caching and adding dev environment setup to the
auto-fix workflow.

**Key changes:**
- Replaced manual pnpm cache configuration (4 steps) with built-in
`setup-node` `cache: "pnpm"` support in `claude.yml` and
`claude-dependabot.yml`
- Added complete dev environment setup (Python/Poetry + Node.js/pnpm) to
`claude-ci-failure-auto-fix.yml` so Claude can run linting and tests
when fixing CI failures
- Correctly orders `corepack enable` before `setup-node` to ensure pnpm
is available for caching

The changes mirror the optimizations from PR #12090 and maintain
consistency across all Claude workflows.
</details>


<details><summary><h3>Confidence Score: 5/5</h3></summary>

- This PR is safe to merge with minimal risk
- The changes are CI infrastructure optimizations that mirror proven
patterns from PR #12090. The pnpm caching simplification reduces
complexity without changing functionality (caching failures gracefully
fall back to uncached builds). The dev environment setup in the auto-fix
workflow is additive and enables Claude to run linting/tests. All YAML
syntax is correct and the step ordering follows best practices.
- No files require special attention
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant GHA as GitHub Actions
    participant Corepack as Corepack
    participant SetupNode as setup-node@v6
    participant Cache as GHA Cache
    participant pnpm as pnpm

    Note over GHA,pnpm: Before (Manual Caching)
    GHA->>SetupNode: Set up Node.js 22
    SetupNode-->>GHA: Node.js ready
    GHA->>Corepack: Enable corepack
    Corepack-->>GHA: pnpm available
    GHA->>pnpm: Configure store directory
    pnpm-->>GHA: Store path set
    GHA->>Cache: actions/cache (manual key)
    Cache-->>GHA: Cache restored/missed
    GHA->>pnpm: Install dependencies
    pnpm-->>GHA: Dependencies installed

    Note over GHA,pnpm: After (Built-in Caching)
    GHA->>Corepack: Enable corepack
    Corepack-->>GHA: pnpm available
    GHA->>SetupNode: Set up Node.js 22<br/>cache: "pnpm"<br/>cache-dependency-path: pnpm-lock.yaml
    SetupNode->>Cache: Auto-detect pnpm store
    Cache-->>SetupNode: Cache restored/missed
    SetupNode-->>GHA: Node.js + cache ready
    GHA->>pnpm: Install dependencies
    pnpm-->>GHA: Dependencies installed
```
</details>


<sub>Last reviewed commit: f1681a0</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
Co-authored-by: Ubbe <hi@ubbe.dev>
2026-02-13 13:48:04 +00:00
Otto
dfa517300b debug(copilot): Add detailed API error logging (#11942)
## Summary
Adds comprehensive error logging for OpenRouter/OpenAI API errors to
help diagnose issues like provider routing failures, context length
exceeded, rate limits, etc.

## Background
While investigating
[SECRT-1859](https://linear.app/autogpt/issue/SECRT-1859), we found that
when OpenRouter returns errors, the actual error details weren't being
captured or logged. Langfuse traces showed `provider_name: 'unknown'`
and `completion: null` without any insight into WHY all providers
rejected the request.

## Changes
- Add `_extract_api_error_details()` to extract rich information from
API errors including:
  - Status code and request ID
  - Response body (contains OpenRouter's actual error message)
  - OpenRouter-specific headers (provider, model)
  - Rate limit headers
- Add `_log_api_error()` helper that logs errors with context:
  - Session ID for correlation
  - Message count (helps identify context length issues)
  - Model being used
  - Retry count
- Update error handling in `_stream_chat_chunks()` and
`_generate_llm_continuation()` to use new logging
- Extract provider's error message from response body for better user
feedback

## Example log output
```
API error: {
  'error_type': 'APIStatusError',
  'error_message': 'Provider returned error',
  'status_code': 400,
  'request_id': 'req_xxx',
  'response_body': {'error': {'message': 'context_length_exceeded', 'type': 'invalid_request_error'}},
  'openrouter_provider': 'unknown',
  'session_id': '44fbb803-...',
  'message_count': 52,
  'model': 'anthropic/claude-opus-4.5',
  'retry_count': 0
}
```

## Testing
- [ ] Verified code passes linting (black, isort, ruff)
- [ ] Error details are properly extracted from different error types

## Refs
- Linear: SECRT-1859
- Thread:
https://discord.com/channels/1126875755960336515/1467066151002571034

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-02-13 13:15:17 +00:00
Reinier van der Leer
43b25b5e2f ci(frontend): Speed up E2E test job (#12090)
The frontend `e2e_test` doesn't have a working build cache setup,
causing really slow builds = slow test jobs. These changes reduce total
test runtime from ~12 minutes to ~5 minutes.

### Changes 🏗️

- Inject build cache config into docker compose config; let `buildx
bake` use GHA cache directly
  - Add `docker-ci-fix-compose-build-cache.py` script
- Optimize `backend/Dockerfile` + root `.dockerignore`
- Replace broken DIY pnpm store caching with `actions/setup-node`
built-in cache management
- Add caching for test seed data created in DB

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - CI
2026-02-13 11:09:41 +01:00
Swifty
ab0b537cc7 refactor(backend): optimize find_block response size by removing raw JSON schemas (#12020)
### Changes 🏗️

The `find_block` AutoPilot tool was returning ~90K characters per
response (10 blocks). The bloat came from including full JSON Schema
objects (`input_schema`, `output_schema`) with all nested `$defs`,
`anyOf`, and type definitions for every block.

**What changed:**

- **`BlockInfoSummary` model**: Removed `input_schema` (raw JSON
Schema), `output_schema` (raw JSON Schema), and `categories`. Added
`output_fields` (compact field-level summaries matching the existing
`required_inputs` format).
- **`BlockListResponse` model**: Removed `usage_hint` (info now in
`message`).
- **`FindBlockTool._execute()`**: Now extracts compact `output_fields`
from output schema properties instead of including the entire raw
schema. Credentials handling is unchanged.
- **Test**: Added `test_response_size_average_chars_per_block` with
realistic block schemas (HTTP, Email, Claude Code) to measure and assert
response size stays under 2K chars/block.
- **`CLAUDE.md`**: Clarified `dev` vs `master` branching strategy.

**Result:** Average response size reduced from ~9,000 to ~1,300 chars
per block (~85% reduction). This directly reduces LLM token consumption,
latency, and API costs for AutoPilot interactions.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified models import and serialize correctly
- [x] Verified response size: 3,970 chars for 3 realistic blocks (avg
1,323/block)
- [x] Lint (`ruff check`) and type check (`pyright`) pass on changed
files
- [x] Frontend compatibility preserved: `blocks[].name` and `count`
fields retained for `block_list` handler

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Toran Bruce Richards <toran.richards@gmail.com>
2026-02-13 11:08:51 +01:00
dependabot[bot]
9a8c6ad609 chore(libs/deps): bump the production-dependencies group across 1 directory with 4 updates (#12056)
Bumps the production-dependencies group with 4 updates in the
/autogpt_platform/autogpt_libs directory:
[cryptography](https://github.com/pyca/cryptography),
[fastapi](https://github.com/fastapi/fastapi),
[launchdarkly-server-sdk](https://github.com/launchdarkly/python-server-sdk)
and [supabase](https://github.com/supabase/supabase-py).

Updates `cryptography` from 46.0.4 to 46.0.5
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst">cryptography's
changelog</a>.</em></p>
<blockquote>
<p>46.0.5 - 2026-02-10</p>
<pre><code>
* An attacker could create a malicious public key that reveals portions
of your
private key when using certain uncommon elliptic curves (binary curves).
This version now includes additional security checks to prevent this
attack.
This issue only affects binary elliptic curves, which are rarely used in
real-world applications. Credit to **XlabAI Team of Tencent Xuanwu Lab
and
Atuin Automated Vulnerability Discovery Engine** for reporting the
issue.
  **CVE-2026-26007**
* Support for ``SECT*`` binary elliptic curves is deprecated and will be
  removed in the next release.
<p>.. v46-0-4:<br />
</code></pre></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="06e120e682"><code>06e120e</code></a>
bump version for 46.0.5 release (<a
href="https://redirect.github.com/pyca/cryptography/issues/14289">#14289</a>)</li>
<li><a
href="0eebb9dbb6"><code>0eebb9d</code></a>
EC check key on cofactor &gt; 1 (<a
href="https://redirect.github.com/pyca/cryptography/issues/14287">#14287</a>)</li>
<li><a
href="bedf6e186b"><code>bedf6e1</code></a>
fix openssl version on 46 branch (<a
href="https://redirect.github.com/pyca/cryptography/issues/14220">#14220</a>)</li>
<li>See full diff in <a
href="https://github.com/pyca/cryptography/compare/46.0.4...46.0.5">compare
view</a></li>
</ul>
</details>
<br />

Updates `fastapi` from 0.128.0 to 0.128.7
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/fastapi/fastapi/releases">fastapi's
releases</a>.</em></p>
<blockquote>
<h2>0.128.7</h2>
<h3>Features</h3>
<ul>
<li> Show a clear error on attempt to include router into itself. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14258">#14258</a>
by <a
href="https://github.com/JavierSanchezCastro"><code>@​JavierSanchezCastro</code></a>.</li>
<li> Replace <code>dict</code> by <code>Mapping</code> on
<code>HTTPException.headers</code>. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/12997">#12997</a>
by <a
href="https://github.com/rijenkii"><code>@​rijenkii</code></a>.</li>
</ul>
<h3>Refactors</h3>
<ul>
<li>♻️ Simplify reading files in memory, do it sequentially instead of
(fake) parallel. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14884">#14884</a>
by <a
href="https://github.com/tiangolo"><code>@​tiangolo</code></a>.</li>
</ul>
<h3>Docs</h3>
<ul>
<li>📝 Use <code>dfn</code> tag for definitions instead of
<code>abbr</code> in docs. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14744">#14744</a>
by <a
href="https://github.com/YuriiMotov"><code>@​YuriiMotov</code></a>.</li>
</ul>
<h3>Internal</h3>
<ul>
<li> Tweak comment in test to reference PR. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14885">#14885</a>
by <a
href="https://github.com/tiangolo"><code>@​tiangolo</code></a>.</li>
<li>🔧 Update LLM-prompt for <code>abbr</code> and <code>dfn</code> tags.
PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14747">#14747</a>
by <a
href="https://github.com/YuriiMotov"><code>@​YuriiMotov</code></a>.</li>
<li> Test order for the submitted byte Files. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14828">#14828</a>
by <a
href="https://github.com/valentinDruzhinin"><code>@​valentinDruzhinin</code></a>.</li>
<li>🔧 Configure <code>test</code> workflow to run tests with
<code>inline-snapshot=review</code>. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14876">#14876</a>
by <a
href="https://github.com/YuriiMotov"><code>@​YuriiMotov</code></a>.</li>
</ul>
<h2>0.128.6</h2>
<h3>Fixes</h3>
<ul>
<li>🐛 Fix <code>on_startup</code> and <code>on_shutdown</code>
parameters of <code>APIRouter</code>. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14873">#14873</a>
by <a
href="https://github.com/YuriiMotov"><code>@​YuriiMotov</code></a>.</li>
</ul>
<h3>Translations</h3>
<ul>
<li>🌐 Update translations for zh (update-outdated). PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14843">#14843</a>
by <a
href="https://github.com/tiangolo"><code>@​tiangolo</code></a>.</li>
</ul>
<h3>Internal</h3>
<ul>
<li> Fix parameterized tests with snapshots. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14875">#14875</a>
by <a
href="https://github.com/YuriiMotov"><code>@​YuriiMotov</code></a>.</li>
</ul>
<h2>0.128.5</h2>
<h3>Refactors</h3>
<ul>
<li>♻️ Refactor and simplify Pydantic v2 (and v1) compatibility internal
utils. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14862">#14862</a>
by <a
href="https://github.com/tiangolo"><code>@​tiangolo</code></a>.</li>
</ul>
<h3>Internal</h3>
<ul>
<li> Add inline snapshot tests for OpenAPI before changes from Pydantic
v2. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14864">#14864</a>
by <a
href="https://github.com/tiangolo"><code>@​tiangolo</code></a>.</li>
</ul>
<h2>0.128.4</h2>
<h3>Refactors</h3>
<ul>
<li>♻️ Refactor internals, simplify Pydantic v2/v1 utils,
<code>create_model_field</code>, better types for
<code>lenient_issubclass</code>. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14860">#14860</a>
by <a
href="https://github.com/tiangolo"><code>@​tiangolo</code></a>.</li>
<li>♻️ Simplify internals, remove Pydantic v1 only logic, no longer
needed. PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14857">#14857</a>
by <a
href="https://github.com/tiangolo"><code>@​tiangolo</code></a>.</li>
<li>♻️ Refactor internals, cleanup unneeded Pydantic v1 specific logic.
PR <a
href="https://redirect.github.com/fastapi/fastapi/pull/14856">#14856</a>
by <a
href="https://github.com/tiangolo"><code>@​tiangolo</code></a>.</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="8f82c94de0"><code>8f82c94</code></a>
🔖 Release version 0.128.7</li>
<li><a
href="5bb3423205"><code>5bb3423</code></a>
📝 Update release notes</li>
<li><a
href="6ce5e3e961"><code>6ce5e3e</code></a>
 Tweak comment in test to reference PR (<a
href="https://redirect.github.com/fastapi/fastapi/issues/14885">#14885</a>)</li>
<li><a
href="65da3dde12"><code>65da3dd</code></a>
📝 Update release notes</li>
<li><a
href="81f82fd955"><code>81f82fd</code></a>
🔧 Update LLM-prompt for <code>abbr</code> and <code>dfn</code> tags (<a
href="https://redirect.github.com/fastapi/fastapi/issues/14747">#14747</a>)</li>
<li><a
href="ff721017df"><code>ff72101</code></a>
📝 Update release notes</li>
<li><a
href="ca76a4eba9"><code>ca76a4e</code></a>
📝 Use <code>dfn</code> tag for definitions instead of <code>abbr</code>
in docs (<a
href="https://redirect.github.com/fastapi/fastapi/issues/14744">#14744</a>)</li>
<li><a
href="1133a4594d"><code>1133a45</code></a>
📝 Update release notes</li>
<li><a
href="38f965985e"><code>38f9659</code></a>
 Test order for the submitted byte Files (<a
href="https://redirect.github.com/fastapi/fastapi/issues/14828">#14828</a>)</li>
<li><a
href="3f1cc8f8f5"><code>3f1cc8f</code></a>
📝 Update release notes</li>
<li>Additional commits viewable in <a
href="https://github.com/fastapi/fastapi/compare/0.128.0...0.128.7">compare
view</a></li>
</ul>
</details>
<br />

Updates `launchdarkly-server-sdk` from 9.14.1 to 9.15.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/launchdarkly/python-server-sdk/releases">launchdarkly-server-sdk's
releases</a>.</em></p>
<blockquote>
<h2>v9.15.0</h2>
<h2><a
href="https://github.com/launchdarkly/python-server-sdk/compare/9.14.1...9.15.0">9.15.0</a>
(2026-02-10)</h2>
<h3>Features</h3>
<ul>
<li>Drop support for python 3.9 (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/393">#393</a>)
(<a
href="5b761bd306">5b761bd</a>)</li>
<li>Update ChangeSet to always require a Selector (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/405">#405</a>)
(<a
href="5dc4f81688">5dc4f81</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li>Add context manager for clearer, safer locks (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/396">#396</a>)
(<a
href="beca0fa498">beca0fa</a>)</li>
<li>Address potential race condition in FeatureStore update_availability
(<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/391">#391</a>)
(<a
href="31cf4875c3">31cf487</a>)</li>
<li>Allow modifying fdv2 data source options independent of main config
(<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/403">#403</a>)
(<a
href="d78079e7f3">d78079e</a>)</li>
<li>Mark copy_with_new_sdk_key method as deprecated (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/353">#353</a>)
(<a
href="e471ccc3d5">e471ccc</a>)</li>
<li>Prevent immediate polling on recoverable error (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/399">#399</a>)
(<a
href="da565a2dce">da565a2</a>)</li>
<li>Redis store is considered initialized when <code>$inited</code> key
is written (<a
href="e99a27d48f">e99a27d</a>)</li>
<li>Stop FeatureStoreClientWrapper poller on close (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/397">#397</a>)
(<a
href="468afdfef3">468afdf</a>)</li>
<li>Update DataSystemConfig to accept list of synchronizers (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/404">#404</a>)
(<a
href="c73ad14090">c73ad14</a>)</li>
<li>Update reason documentation with inExperiment value (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/401">#401</a>)
(<a
href="cbfc3dd887">cbfc3dd</a>)</li>
<li>Update Redis to write missing <code>$inited</code> key (<a
href="e99a27d48f">e99a27d</a>)</li>
</ul>
<hr />
<p>This PR was generated with <a
href="https://github.com/googleapis/release-please">Release Please</a>.
See <a
href="https://github.com/googleapis/release-please#release-please">documentation</a>.</p>
<!-- raw HTML omitted -->
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/launchdarkly/python-server-sdk/blob/main/CHANGELOG.md">launchdarkly-server-sdk's
changelog</a>.</em></p>
<blockquote>
<h2><a
href="https://github.com/launchdarkly/python-server-sdk/compare/9.14.1...9.15.0">9.15.0</a>
(2026-02-10)</h2>
<h3>⚠ BREAKING CHANGES</h3>
<p><strong>Note:</strong> The following breaking changes apply only to
FDv2 (Flag Delivery v2) early access features, which are not subject to
semantic versioning and may change without a major version bump.</p>
<ul>
<li>Update ChangeSet to always require a Selector (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/405">#405</a>)
(<a
href="5dc4f81688">5dc4f81</a>)
<ul>
<li>The <code>ChangeSetBuilder.finish()</code> method now requires a
<code>Selector</code> parameter.</li>
</ul>
</li>
<li>Update DataSystemConfig to accept list of synchronizers (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/404">#404</a>)
(<a
href="c73ad14090">c73ad14</a>)
<ul>
<li>The <code>DataSystemConfig.synchronizers</code> field now accepts a
list of synchronizers, and the
<code>ConfigBuilder.synchronizers()</code> method accepts variadic
arguments.</li>
</ul>
</li>
</ul>
<h3>Features</h3>
<ul>
<li>Drop support for python 3.9 (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/393">#393</a>)
(<a
href="5b761bd306">5b761bd</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li>Add context manager for clearer, safer locks (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/396">#396</a>)
(<a
href="beca0fa498">beca0fa</a>)</li>
<li>Address potential race condition in FeatureStore update_availability
(<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/391">#391</a>)
(<a
href="31cf4875c3">31cf487</a>)</li>
<li>Allow modifying fdv2 data source options independent of main config
(<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/403">#403</a>)
(<a
href="d78079e7f3">d78079e</a>)</li>
<li>Mark copy_with_new_sdk_key method as deprecated (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/353">#353</a>)
(<a
href="e471ccc3d5">e471ccc</a>)</li>
<li>Prevent immediate polling on recoverable error (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/399">#399</a>)
(<a
href="da565a2dce">da565a2</a>)</li>
<li>Redis store is considered initialized when <code>$inited</code> key
is written (<a
href="e99a27d48f">e99a27d</a>)</li>
<li>Stop FeatureStoreClientWrapper poller on close (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/397">#397</a>)
(<a
href="468afdfef3">468afdf</a>)</li>
<li>Update reason documentation with inExperiment value (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/401">#401</a>)
(<a
href="cbfc3dd887">cbfc3dd</a>)</li>
<li>Update Redis to write missing <code>$inited</code> key (<a
href="e99a27d48f">e99a27d</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="e542f737a6"><code>e542f73</code></a>
chore(main): release 9.15.0 (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/394">#394</a>)</li>
<li><a
href="e471ccc3d5"><code>e471ccc</code></a>
fix: Mark copy_with_new_sdk_key method as deprecated (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/353">#353</a>)</li>
<li><a
href="5dc4f81688"><code>5dc4f81</code></a>
feat: Update ChangeSet to always require a Selector (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/405">#405</a>)</li>
<li><a
href="f20fffeb1e"><code>f20fffe</code></a>
chore: Remove dead code, clarify names, other cleanup (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/398">#398</a>)</li>
<li><a
href="c73ad14090"><code>c73ad14</code></a>
fix: Update DataSystemConfig to accept list of synchronizers (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/404">#404</a>)</li>
<li><a
href="d78079e7f3"><code>d78079e</code></a>
fix: Allow modifying fdv2 data source options independent of main config
(<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/403">#403</a>)</li>
<li><a
href="e99a27d48f"><code>e99a27d</code></a>
chore: Support persistent data store verification in contract tests (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/402">#402</a>)</li>
<li><a
href="cbfc3dd887"><code>cbfc3dd</code></a>
fix: Update reason documentation with inExperiment value (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/401">#401</a>)</li>
<li><a
href="5a1adbb2de"><code>5a1adbb</code></a>
chore: Update sdk_metadata features (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/400">#400</a>)</li>
<li><a
href="da565a2dce"><code>da565a2</code></a>
fix: Prevent immediate polling on recoverable error (<a
href="https://redirect.github.com/launchdarkly/python-server-sdk/issues/399">#399</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/launchdarkly/python-server-sdk/compare/9.14.1...9.15.0">compare
view</a></li>
</ul>
</details>
<br />

Updates `supabase` from 2.27.2 to 2.28.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/supabase/supabase-py/releases">supabase's
releases</a>.</em></p>
<blockquote>
<h2>v2.28.0</h2>
<h2><a
href="https://github.com/supabase/supabase-py/compare/v2.27.3...v2.28.0">2.28.0</a>
(2026-02-10)</h2>
<h3>Features</h3>
<ul>
<li><strong>storage:</strong> add list_v2 method to file_api client (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1377">#1377</a>)
(<a
href="259f4ad42d">259f4ad</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li><strong>auth:</strong> add missing is_sso_user, deleted_at,
banned_until to User model (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1375">#1375</a>)
(<a
href="7f84a62996">7f84a62</a>)</li>
<li><strong>realtime:</strong> ensure remove_channel removes channel
from channels dict (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1373">#1373</a>)
(<a
href="0923314039">0923314</a>)</li>
<li><strong>realtime:</strong> use pop with default in _handle_message
to prevent KeyError (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1388">#1388</a>)
(<a
href="baea26f7ce">baea26f</a>)</li>
<li><strong>storage3:</strong> replace print() with warnings.warn() for
trailing slash notice (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1380">#1380</a>)
(<a
href="50b099fa06">50b099f</a>)</li>
</ul>
<h2>v2.27.3</h2>
<h2><a
href="https://github.com/supabase/supabase-py/compare/v2.27.2...v2.27.3">2.27.3</a>
(2026-02-03)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>deprecate python 3.9 in all packages (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1365">#1365</a>)
(<a
href="cc72ed75d4">cc72ed7</a>)</li>
<li>ensure storage_url has trailing slash to prevent warning (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1367">#1367</a>)
(<a
href="4267ff1345">4267ff1</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/supabase/supabase-py/blob/main/CHANGELOG.md">supabase's
changelog</a>.</em></p>
<blockquote>
<h2><a
href="https://github.com/supabase/supabase-py/compare/v2.27.3...v2.28.0">2.28.0</a>
(2026-02-10)</h2>
<h3>Features</h3>
<ul>
<li><strong>storage:</strong> add list_v2 method to file_api client (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1377">#1377</a>)
(<a
href="259f4ad42d">259f4ad</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li><strong>auth:</strong> add missing is_sso_user, deleted_at,
banned_until to User model (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1375">#1375</a>)
(<a
href="7f84a62996">7f84a62</a>)</li>
<li><strong>realtime:</strong> ensure remove_channel removes channel
from channels dict (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1373">#1373</a>)
(<a
href="0923314039">0923314</a>)</li>
<li><strong>realtime:</strong> use pop with default in _handle_message
to prevent KeyError (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1388">#1388</a>)
(<a
href="baea26f7ce">baea26f</a>)</li>
<li><strong>storage3:</strong> replace print() with warnings.warn() for
trailing slash notice (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1380">#1380</a>)
(<a
href="50b099fa06">50b099f</a>)</li>
</ul>
<h2><a
href="https://github.com/supabase/supabase-py/compare/v2.27.2...v2.27.3">2.27.3</a>
(2026-02-03)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>deprecate python 3.9 in all packages (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1365">#1365</a>)
(<a
href="cc72ed75d4">cc72ed7</a>)</li>
<li>ensure storage_url has trailing slash to prevent warning (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1367">#1367</a>)
(<a
href="4267ff1345">4267ff1</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="59e338400b"><code>59e3384</code></a>
chore(main): release 2.28.0 (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1378">#1378</a>)</li>
<li><a
href="baea26f7ce"><code>baea26f</code></a>
fix(realtime): use pop with default in _handle_message to prevent
KeyError (#...</li>
<li><a
href="259f4ad42d"><code>259f4ad</code></a>
feat(storage): add list_v2 method to file_api client (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1377">#1377</a>)</li>
<li><a
href="50b099fa06"><code>50b099f</code></a>
fix(storage3): replace print() with warnings.warn() for trailing slash
notice...</li>
<li><a
href="0923314039"><code>0923314</code></a>
fix(realtime): ensure remove_channel removes channel from channels dict
(<a
href="https://redirect.github.com/supabase/supabase-py/issues/1373">#1373</a>)</li>
<li><a
href="7f84a62996"><code>7f84a62</code></a>
fix(auth): add missing is_sso_user, deleted_at, banned_until to User
model (#...</li>
<li><a
href="57dd6e2195"><code>57dd6e2</code></a>
chore(deps): bump the uv group across 1 directory with 3 updates (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1369">#1369</a>)</li>
<li><a
href="c357def670"><code>c357def</code></a>
chore(main): release 2.27.3 (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1368">#1368</a>)</li>
<li><a
href="4267ff1345"><code>4267ff1</code></a>
fix: ensure storage_url has trailing slash to prevent warning (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1367">#1367</a>)</li>
<li><a
href="cc72ed75d4"><code>cc72ed7</code></a>
fix: deprecate python 3.9 in all packages (<a
href="https://redirect.github.com/supabase/supabase-py/issues/1365">#1365</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/supabase/supabase-py/compare/v2.27.2...v2.28.0">compare
view</a></li>
</ul>
</details>
<br />


Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's major version (unless you unignore this specific
dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's minor version (unless you unignore this specific
dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR
and stop Dependabot creating any more for the specific dependency
(unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore
conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will
remove the ignore condition of the specified dependency and ignore
conditions


</details>

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Dependency update bumps 4 packages in the production-dependencies group,
including a **critical security patch for `cryptography`**
(CVE-2026-26007) that prevents malicious public key attacks on binary
elliptic curves. The update also includes bug fixes for `fastapi`,
`launchdarkly-server-sdk`, and `supabase`.

- **cryptography** 46.0.4 → 46.0.5: patches CVE-2026-26007, deprecates
SECT* binary curves
- **fastapi** 0.128.0 → 0.128.7: bug fixes, improved error handling,
relaxed Starlette constraint
- **launchdarkly-server-sdk** 9.14.1 → 9.15.0: drops Python 3.9 support
(requires >=3.10), fixes race conditions
- **supabase** 2.27.2/2.27.3 → 2.28.0: realtime fixes, new User model
fields

The lock files correctly resolve all dependencies. Python 3.10+
requirement is already enforced in both packages. However, backend's
`pyproject.toml` still specifies `launchdarkly-server-sdk = "^9.14.1"`
while the lock file uses 9.15.0 (pulled from autogpt_libs dependency),
creating a minor version constraint inconsistency.
</details>


<details><summary><h3>Confidence Score: 4/5</h3></summary>

- This PR is safe to merge with one minor style suggestion
- Automated dependency update with critical security patch for
cryptography. All updates are backwards-compatible within semver
constraints. Lock files correctly resolve all dependencies. Python 3.10+
is already enforced. Only minor issue is version constraint
inconsistency in backend's pyproject.toml for launchdarkly-server-sdk,
which doesn't affect functionality but should be aligned for clarity.
- autogpt_platform/backend/pyproject.toml needs launchdarkly-server-sdk
version constraint updated to ^9.15.0
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Otto <otto@agpt.co>
2026-02-13 09:10:11 +00:00
301 changed files with 26613 additions and 21046 deletions

View File

@@ -5,42 +5,13 @@
!docs/
# Platform - Libs
!autogpt_platform/autogpt_libs/autogpt_libs/
!autogpt_platform/autogpt_libs/pyproject.toml
!autogpt_platform/autogpt_libs/poetry.lock
!autogpt_platform/autogpt_libs/README.md
!autogpt_platform/autogpt_libs/
# Platform - Backend
!autogpt_platform/backend/backend/
!autogpt_platform/backend/test/e2e_test_data.py
!autogpt_platform/backend/migrations/
!autogpt_platform/backend/schema.prisma
!autogpt_platform/backend/pyproject.toml
!autogpt_platform/backend/poetry.lock
!autogpt_platform/backend/README.md
!autogpt_platform/backend/.env
!autogpt_platform/backend/gen_prisma_types_stub.py
# Platform - Market
!autogpt_platform/market/market/
!autogpt_platform/market/scripts.py
!autogpt_platform/market/schema.prisma
!autogpt_platform/market/pyproject.toml
!autogpt_platform/market/poetry.lock
!autogpt_platform/market/README.md
!autogpt_platform/backend/
# Platform - Frontend
!autogpt_platform/frontend/src/
!autogpt_platform/frontend/public/
!autogpt_platform/frontend/scripts/
!autogpt_platform/frontend/package.json
!autogpt_platform/frontend/pnpm-lock.yaml
!autogpt_platform/frontend/tsconfig.json
!autogpt_platform/frontend/README.md
## config
!autogpt_platform/frontend/*.config.*
!autogpt_platform/frontend/.env.*
!autogpt_platform/frontend/.env
!autogpt_platform/frontend/
# Classic - AutoGPT
!classic/original_autogpt/autogpt/
@@ -64,6 +35,38 @@
# Classic - Frontend
!classic/frontend/build/web/
# Explicitly re-ignore some folders
.*
**/__pycache__
# Explicitly re-ignore unwanted files from whitelisted directories
# Note: These patterns MUST come after the whitelist rules to take effect
# Hidden files and directories (but keep frontend .env files needed for build)
**/.*
!autogpt_platform/frontend/.env
!autogpt_platform/frontend/.env.default
!autogpt_platform/frontend/.env.production
# Python artifacts
**/__pycache__/
**/*.pyc
**/*.pyo
**/.venv/
**/.ruff_cache/
**/.pytest_cache/
**/.coverage
**/htmlcov/
# Node artifacts
**/node_modules/
**/.next/
**/storybook-static/
**/playwright-report/
**/test-results/
# Build artifacts
**/dist/
**/build/
!autogpt_platform/frontend/src/**/build/
**/target/
# Logs and temp files
**/*.log
**/*.tmp

1229
.github/scripts/detect_overlaps.py vendored Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -40,6 +40,48 @@ jobs:
git checkout -b "$BRANCH_NAME"
echo "branch_name=$BRANCH_NAME" >> $GITHUB_OUTPUT
# Backend Python/Poetry setup (so Claude can run linting/tests)
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.11"
- name: Set up Python dependency cache
uses: actions/cache@v5
with:
path: ~/.cache/pypoetry
key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
- name: Install Poetry
run: |
cd autogpt_platform/backend
HEAD_POETRY_VERSION=$(python3 ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
curl -sSL https://install.python-poetry.org | POETRY_VERSION=$HEAD_POETRY_VERSION python3 -
echo "$HOME/.local/bin" >> $GITHUB_PATH
- name: Install Python dependencies
working-directory: autogpt_platform/backend
run: poetry install
- name: Generate Prisma Client
working-directory: autogpt_platform/backend
run: poetry run prisma generate && poetry run gen-prisma-stub
# Frontend Node.js/pnpm setup (so Claude can run linting/tests)
- name: Enable corepack
run: corepack enable
- name: Set up Node.js
uses: actions/setup-node@v6
with:
node-version: "22"
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Install JavaScript dependencies
working-directory: autogpt_platform/frontend
run: pnpm install --frozen-lockfile
- name: Get CI failure details
id: failure_details
uses: actions/github-script@v8

View File

@@ -77,27 +77,15 @@ jobs:
run: poetry run prisma generate && poetry run gen-prisma-stub
# Frontend Node.js/pnpm setup (mirrors platform-frontend-ci.yml)
- name: Enable corepack
run: corepack enable
- name: Set up Node.js
uses: actions/setup-node@v6
with:
node-version: "22"
- name: Enable corepack
run: corepack enable
- name: Set pnpm store directory
run: |
pnpm config set store-dir ~/.pnpm-store
echo "PNPM_HOME=$HOME/.pnpm-store" >> $GITHUB_ENV
- name: Cache frontend dependencies
uses: actions/cache@v5
with:
path: ~/.pnpm-store
key: ${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/package.json') }}
restore-keys: |
${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
${{ runner.os }}-pnpm-
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Install JavaScript dependencies
working-directory: autogpt_platform/frontend

View File

@@ -93,27 +93,15 @@ jobs:
run: poetry run prisma generate && poetry run gen-prisma-stub
# Frontend Node.js/pnpm setup (mirrors platform-frontend-ci.yml)
- name: Enable corepack
run: corepack enable
- name: Set up Node.js
uses: actions/setup-node@v6
with:
node-version: "22"
- name: Enable corepack
run: corepack enable
- name: Set pnpm store directory
run: |
pnpm config set store-dir ~/.pnpm-store
echo "PNPM_HOME=$HOME/.pnpm-store" >> $GITHUB_ENV
- name: Cache frontend dependencies
uses: actions/cache@v5
with:
path: ~/.pnpm-store
key: ${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/package.json') }}
restore-keys: |
${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
${{ runner.os }}-pnpm-
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Install JavaScript dependencies
working-directory: autogpt_platform/frontend

View File

@@ -62,7 +62,7 @@ jobs:
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v3
uses: github/codeql-action/init@v4
with:
languages: ${{ matrix.language }}
build-mode: ${{ matrix.build-mode }}
@@ -93,6 +93,6 @@ jobs:
exit 1
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v3
uses: github/codeql-action/analyze@v4
with:
category: "/language:${{matrix.language}}"

View File

@@ -7,6 +7,10 @@ on:
- "docs/integrations/**"
- "autogpt_platform/backend/backend/blocks/**"
concurrency:
group: claude-docs-review-${{ github.event.pull_request.number }}
cancel-in-progress: true
jobs:
claude-review:
# Only run for PRs from members/collaborators
@@ -91,5 +95,35 @@ jobs:
3. Read corresponding documentation files to verify accuracy
4. Provide your feedback as a PR comment
## IMPORTANT: Comment Marker
Start your PR comment with exactly this HTML comment marker on its own line:
<!-- CLAUDE_DOCS_REVIEW -->
This marker is used to identify and replace your comment on subsequent runs.
Be constructive and specific. If everything looks good, say so!
If there are issues, explain what's wrong and suggest how to fix it.
- name: Delete old Claude review comments
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
# Get all comment IDs with our marker, sorted by creation date (oldest first)
COMMENT_IDS=$(gh api \
repos/${{ github.repository }}/issues/${{ github.event.pull_request.number }}/comments \
--jq '[.[] | select(.body | contains("<!-- CLAUDE_DOCS_REVIEW -->"))] | sort_by(.created_at) | .[].id')
# Count comments
COMMENT_COUNT=$(echo "$COMMENT_IDS" | grep -c . || true)
if [ "$COMMENT_COUNT" -gt 1 ]; then
# Delete all but the last (newest) comment
echo "$COMMENT_IDS" | head -n -1 | while read -r COMMENT_ID; do
if [ -n "$COMMENT_ID" ]; then
echo "Deleting old review comment: $COMMENT_ID"
gh api -X DELETE repos/${{ github.repository }}/issues/comments/$COMMENT_ID
fi
done
else
echo "No old review comments to clean up"
fi

View File

@@ -41,13 +41,18 @@ jobs:
ports:
- 6379:6379
rabbitmq:
image: rabbitmq:3.12-management
image: rabbitmq:4.1.4
ports:
- 5672:5672
- 15672:15672
env:
RABBITMQ_DEFAULT_USER: ${{ env.RABBITMQ_DEFAULT_USER }}
RABBITMQ_DEFAULT_PASS: ${{ env.RABBITMQ_DEFAULT_PASS }}
options: >-
--health-cmd "rabbitmq-diagnostics -q ping"
--health-interval 30s
--health-timeout 10s
--health-retries 5
--health-start-period 10s
clamav:
image: clamav/clamav-debian:latest
ports:

View File

@@ -6,10 +6,16 @@ on:
paths:
- ".github/workflows/platform-frontend-ci.yml"
- "autogpt_platform/frontend/**"
- "autogpt_platform/backend/Dockerfile"
- "autogpt_platform/docker-compose.yml"
- "autogpt_platform/docker-compose.platform.yml"
pull_request:
paths:
- ".github/workflows/platform-frontend-ci.yml"
- "autogpt_platform/frontend/**"
- "autogpt_platform/backend/Dockerfile"
- "autogpt_platform/docker-compose.yml"
- "autogpt_platform/docker-compose.platform.yml"
merge_group:
workflow_dispatch:
@@ -26,7 +32,6 @@ jobs:
setup:
runs-on: ubuntu-latest
outputs:
cache-key: ${{ steps.cache-key.outputs.key }}
components-changed: ${{ steps.filter.outputs.components }}
steps:
@@ -41,28 +46,17 @@ jobs:
components:
- 'autogpt_platform/frontend/src/components/**'
- name: Set up Node.js
uses: actions/setup-node@v6
with:
node-version: "22.18.0"
- name: Enable corepack
run: corepack enable
- name: Generate cache key
id: cache-key
run: echo "key=${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/package.json') }}" >> $GITHUB_OUTPUT
- name: Cache dependencies
uses: actions/cache@v5
- name: Set up Node
uses: actions/setup-node@v6
with:
path: ~/.pnpm-store
key: ${{ steps.cache-key.outputs.key }}
restore-keys: |
${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
${{ runner.os }}-pnpm-
node-version: "22.18.0"
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Install dependencies
- name: Install dependencies to populate cache
run: pnpm install --frozen-lockfile
lint:
@@ -73,22 +67,15 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v6
- name: Set up Node.js
uses: actions/setup-node@v6
with:
node-version: "22.18.0"
- name: Enable corepack
run: corepack enable
- name: Restore dependencies cache
uses: actions/cache@v5
- name: Set up Node
uses: actions/setup-node@v6
with:
path: ~/.pnpm-store
key: ${{ needs.setup.outputs.cache-key }}
restore-keys: |
${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
${{ runner.os }}-pnpm-
node-version: "22.18.0"
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Install dependencies
run: pnpm install --frozen-lockfile
@@ -111,22 +98,15 @@ jobs:
with:
fetch-depth: 0
- name: Set up Node.js
uses: actions/setup-node@v6
with:
node-version: "22.18.0"
- name: Enable corepack
run: corepack enable
- name: Restore dependencies cache
uses: actions/cache@v5
- name: Set up Node
uses: actions/setup-node@v6
with:
path: ~/.pnpm-store
key: ${{ needs.setup.outputs.cache-key }}
restore-keys: |
${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
${{ runner.os }}-pnpm-
node-version: "22.18.0"
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Install dependencies
run: pnpm install --frozen-lockfile
@@ -141,10 +121,8 @@ jobs:
exitOnceUploaded: true
e2e_test:
name: end-to-end tests
runs-on: big-boi
needs: setup
strategy:
fail-fast: false
steps:
- name: Checkout repository
@@ -152,19 +130,11 @@ jobs:
with:
submodules: recursive
- name: Set up Node.js
uses: actions/setup-node@v6
with:
node-version: "22.18.0"
- name: Enable corepack
run: corepack enable
- name: Copy default supabase .env
- name: Set up Platform - Copy default supabase .env
run: |
cp ../.env.default ../.env
- name: Copy backend .env and set OpenAI API key
- name: Set up Platform - Copy backend .env and set OpenAI API key
run: |
cp ../backend/.env.default ../backend/.env
echo "OPENAI_INTERNAL_API_KEY=${{ secrets.OPENAI_API_KEY }}" >> ../backend/.env
@@ -172,77 +142,125 @@ jobs:
# Used by E2E test data script to generate embeddings for approved store agents
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
- name: Set up Docker Buildx
- name: Set up Platform - Set up Docker Buildx
uses: docker/setup-buildx-action@v3
with:
driver: docker-container
driver-opts: network=host
- name: Cache Docker layers
- name: Set up Platform - Expose GHA cache to docker buildx CLI
uses: crazy-max/ghaction-github-runtime@v3
- name: Set up Platform - Build Docker images (with cache)
working-directory: autogpt_platform
run: |
pip install pyyaml
# Resolve extends and generate a flat compose file that bake can understand
docker compose -f docker-compose.yml config > docker-compose.resolved.yml
# Add cache configuration to the resolved compose file
python ../.github/workflows/scripts/docker-ci-fix-compose-build-cache.py \
--source docker-compose.resolved.yml \
--cache-from "type=gha" \
--cache-to "type=gha,mode=max" \
--backend-hash "${{ hashFiles('autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/poetry.lock', 'autogpt_platform/backend/backend') }}" \
--frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src') }}" \
--git-ref "${{ github.ref }}"
# Build with bake using the resolved compose file (now includes cache config)
docker buildx bake --allow=fs.read=.. -f docker-compose.resolved.yml --load
env:
NEXT_PUBLIC_PW_TEST: true
- name: Set up tests - Cache E2E test data
id: e2e-data-cache
uses: actions/cache@v5
with:
path: /tmp/.buildx-cache
key: ${{ runner.os }}-buildx-frontend-test-${{ hashFiles('autogpt_platform/docker-compose.yml', 'autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/pyproject.toml', 'autogpt_platform/backend/poetry.lock') }}
restore-keys: |
${{ runner.os }}-buildx-frontend-test-
path: /tmp/e2e_test_data.sql
key: e2e-test-data-${{ hashFiles('autogpt_platform/backend/test/e2e_test_data.py', 'autogpt_platform/backend/migrations/**', '.github/workflows/platform-frontend-ci.yml') }}
- name: Run docker compose
- name: Set up Platform - Start Supabase DB + Auth
run: |
NEXT_PUBLIC_PW_TEST=true docker compose -f ../docker-compose.yml up -d
docker compose -f ../docker-compose.resolved.yml up -d db auth --no-build
echo "Waiting for database to be ready..."
timeout 60 sh -c 'until docker compose -f ../docker-compose.resolved.yml exec -T db pg_isready -U postgres 2>/dev/null; do sleep 2; done'
echo "Waiting for auth service to be ready..."
timeout 60 sh -c 'until docker compose -f ../docker-compose.resolved.yml exec -T db psql -U postgres -d postgres -c "SELECT 1 FROM auth.users LIMIT 1" 2>/dev/null; do sleep 2; done' || echo "Auth schema check timeout, continuing..."
- name: Set up Platform - Run migrations
run: |
echo "Running migrations..."
docker compose -f ../docker-compose.resolved.yml run --rm migrate
echo "✅ Migrations completed"
env:
DOCKER_BUILDKIT: 1
BUILDX_CACHE_FROM: type=local,src=/tmp/.buildx-cache
BUILDX_CACHE_TO: type=local,dest=/tmp/.buildx-cache-new,mode=max
NEXT_PUBLIC_PW_TEST: true
- name: Move cache
- name: Set up tests - Load cached E2E test data
if: steps.e2e-data-cache.outputs.cache-hit == 'true'
run: |
rm -rf /tmp/.buildx-cache
if [ -d "/tmp/.buildx-cache-new" ]; then
mv /tmp/.buildx-cache-new /tmp/.buildx-cache
fi
echo "✅ Found cached E2E test data, restoring..."
{
echo "SET session_replication_role = 'replica';"
cat /tmp/e2e_test_data.sql
echo "SET session_replication_role = 'origin';"
} | docker compose -f ../docker-compose.resolved.yml exec -T db psql -U postgres -d postgres -b
# Refresh materialized views after restore
docker compose -f ../docker-compose.resolved.yml exec -T db \
psql -U postgres -d postgres -b -c "SET search_path TO platform; SELECT refresh_store_materialized_views();" || true
- name: Wait for services to be ready
echo "✅ E2E test data restored from cache"
- name: Set up Platform - Start (all other services)
run: |
docker compose -f ../docker-compose.resolved.yml up -d --no-build
echo "Waiting for rest_server to be ready..."
timeout 60 sh -c 'until curl -f http://localhost:8006/health 2>/dev/null; do sleep 2; done' || echo "Rest server health check timeout, continuing..."
echo "Waiting for database to be ready..."
timeout 60 sh -c 'until docker compose -f ../docker-compose.yml exec -T db pg_isready -U postgres 2>/dev/null; do sleep 2; done' || echo "Database ready check timeout, continuing..."
env:
NEXT_PUBLIC_PW_TEST: true
- name: Create E2E test data
- name: Set up tests - Create E2E test data
if: steps.e2e-data-cache.outputs.cache-hit != 'true'
run: |
echo "Creating E2E test data..."
# First try to run the script from inside the container
if docker compose -f ../docker-compose.yml exec -T rest_server test -f /app/autogpt_platform/backend/test/e2e_test_data.py; then
echo "✅ Found e2e_test_data.py in container, running it..."
docker compose -f ../docker-compose.yml exec -T rest_server sh -c "cd /app/autogpt_platform && python backend/test/e2e_test_data.py" || {
echo "❌ E2E test data creation failed!"
docker compose -f ../docker-compose.yml logs --tail=50 rest_server
exit 1
}
else
echo "⚠️ e2e_test_data.py not found in container, copying and running..."
# Copy the script into the container and run it
docker cp ../backend/test/e2e_test_data.py $(docker compose -f ../docker-compose.yml ps -q rest_server):/tmp/e2e_test_data.py || {
echo "❌ Failed to copy script to container"
exit 1
}
docker compose -f ../docker-compose.yml exec -T rest_server sh -c "cd /app/autogpt_platform && python /tmp/e2e_test_data.py" || {
echo "❌ E2E test data creation failed!"
docker compose -f ../docker-compose.yml logs --tail=50 rest_server
exit 1
}
fi
docker cp ../backend/test/e2e_test_data.py $(docker compose -f ../docker-compose.resolved.yml ps -q rest_server):/tmp/e2e_test_data.py
docker compose -f ../docker-compose.resolved.yml exec -T rest_server sh -c "cd /app/autogpt_platform && python /tmp/e2e_test_data.py" || {
echo "❌ E2E test data creation failed!"
docker compose -f ../docker-compose.resolved.yml logs --tail=50 rest_server
exit 1
}
- name: Restore dependencies cache
uses: actions/cache@v5
# Dump auth.users + platform schema for cache (two separate dumps)
echo "Dumping database for cache..."
{
docker compose -f ../docker-compose.resolved.yml exec -T db \
pg_dump -U postgres --data-only --column-inserts \
--table='auth.users' postgres
docker compose -f ../docker-compose.resolved.yml exec -T db \
pg_dump -U postgres --data-only --column-inserts \
--schema=platform \
--exclude-table='platform._prisma_migrations' \
--exclude-table='platform.apscheduler_jobs' \
--exclude-table='platform.apscheduler_jobs_batched_notifications' \
postgres
} > /tmp/e2e_test_data.sql
echo "✅ Database dump created for caching ($(wc -l < /tmp/e2e_test_data.sql) lines)"
- name: Set up tests - Enable corepack
run: corepack enable
- name: Set up tests - Set up Node
uses: actions/setup-node@v6
with:
path: ~/.pnpm-store
key: ${{ needs.setup.outputs.cache-key }}
restore-keys: |
${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
${{ runner.os }}-pnpm-
node-version: "22.18.0"
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Install dependencies
- name: Set up tests - Install dependencies
run: pnpm install --frozen-lockfile
- name: Install Browser 'chromium'
- name: Set up tests - Install browser 'chromium'
run: pnpm playwright install --with-deps chromium
- name: Run Playwright tests
@@ -269,7 +287,7 @@ jobs:
- name: Print Final Docker Compose logs
if: always()
run: docker compose -f ../docker-compose.yml logs
run: docker compose -f ../docker-compose.resolved.yml logs
integration_test:
runs-on: ubuntu-latest
@@ -281,22 +299,15 @@ jobs:
with:
submodules: recursive
- name: Set up Node.js
uses: actions/setup-node@v6
with:
node-version: "22.18.0"
- name: Enable corepack
run: corepack enable
- name: Restore dependencies cache
uses: actions/cache@v5
- name: Set up Node
uses: actions/setup-node@v6
with:
path: ~/.pnpm-store
key: ${{ needs.setup.outputs.cache-key }}
restore-keys: |
${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
${{ runner.os }}-pnpm-
node-version: "22.18.0"
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Install dependencies
run: pnpm install --frozen-lockfile

39
.github/workflows/pr-overlap-check.yml vendored Normal file
View File

@@ -0,0 +1,39 @@
name: PR Overlap Detection
on:
pull_request:
types: [opened, synchronize, reopened]
branches:
- dev
- master
permissions:
contents: read
pull-requests: write
jobs:
check-overlaps:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
fetch-depth: 0 # Need full history for merge testing
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Configure git
run: |
git config user.email "github-actions[bot]@users.noreply.github.com"
git config user.name "github-actions[bot]"
- name: Run overlap detection
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# Always succeed - this check informs contributors, it shouldn't block merging
continue-on-error: true
run: |
python .github/scripts/detect_overlaps.py ${{ github.event.pull_request.number }}

View File

@@ -0,0 +1,195 @@
#!/usr/bin/env python3
"""
Add cache configuration to a resolved docker-compose file for all services
that have a build key, and ensure image names match what docker compose expects.
"""
import argparse
import yaml
DEFAULT_BRANCH = "dev"
CACHE_BUILDS_FOR_COMPONENTS = ["backend", "frontend"]
def main():
parser = argparse.ArgumentParser(
description="Add cache config to a resolved compose file"
)
parser.add_argument(
"--source",
required=True,
help="Source compose file to read (should be output of `docker compose config`)",
)
parser.add_argument(
"--cache-from",
default="type=gha",
help="Cache source configuration",
)
parser.add_argument(
"--cache-to",
default="type=gha,mode=max",
help="Cache destination configuration",
)
for component in CACHE_BUILDS_FOR_COMPONENTS:
parser.add_argument(
f"--{component}-hash",
default="",
help=f"Hash for {component} cache scope (e.g., from hashFiles())",
)
parser.add_argument(
"--git-ref",
default="",
help="Git ref for branch-based cache scope (e.g., refs/heads/master)",
)
args = parser.parse_args()
# Normalize git ref to a safe scope name (e.g., refs/heads/master -> master)
git_ref_scope = ""
if args.git_ref:
git_ref_scope = args.git_ref.replace("refs/heads/", "").replace("/", "-")
with open(args.source, "r") as f:
compose = yaml.safe_load(f)
# Get project name from compose file or default
project_name = compose.get("name", "autogpt_platform")
def get_image_name(dockerfile: str, target: str) -> str:
"""Generate image name based on Dockerfile folder and build target."""
dockerfile_parts = dockerfile.replace("\\", "/").split("/")
if len(dockerfile_parts) >= 2:
folder_name = dockerfile_parts[-2] # e.g., "backend" or "frontend"
else:
folder_name = "app"
return f"{project_name}-{folder_name}:{target}"
def get_build_key(dockerfile: str, target: str) -> str:
"""Generate a unique key for a Dockerfile+target combination."""
return f"{dockerfile}:{target}"
def get_component(dockerfile: str) -> str | None:
"""Get component name (frontend/backend) from dockerfile path."""
for component in CACHE_BUILDS_FOR_COMPONENTS:
if component in dockerfile:
return component
return None
# First pass: collect all services with build configs and identify duplicates
# Track which (dockerfile, target) combinations we've seen
build_key_to_first_service: dict[str, str] = {}
services_to_build: list[str] = []
services_to_dedupe: list[str] = []
for service_name, service_config in compose.get("services", {}).items():
if "build" not in service_config:
continue
build_config = service_config["build"]
dockerfile = build_config.get("dockerfile", "Dockerfile")
target = build_config.get("target", "default")
build_key = get_build_key(dockerfile, target)
if build_key not in build_key_to_first_service:
# First service with this build config - it will do the actual build
build_key_to_first_service[build_key] = service_name
services_to_build.append(service_name)
else:
# Duplicate - will just use the image from the first service
services_to_dedupe.append(service_name)
# Second pass: configure builds and deduplicate
modified_services = []
for service_name, service_config in compose.get("services", {}).items():
if "build" not in service_config:
continue
build_config = service_config["build"]
dockerfile = build_config.get("dockerfile", "Dockerfile")
target = build_config.get("target", "latest")
image_name = get_image_name(dockerfile, target)
# Set image name for all services (needed for both builders and deduped)
service_config["image"] = image_name
if service_name in services_to_dedupe:
# Remove build config - this service will use the pre-built image
del service_config["build"]
continue
# This service will do the actual build - add cache config
cache_from_list = []
cache_to_list = []
component = get_component(dockerfile)
if not component:
# Skip services that don't clearly match frontend/backend
continue
# Get the hash for this component
component_hash = getattr(args, f"{component}_hash")
# Scope format: platform-{component}-{target}-{hash|ref}
# Example: platform-backend-server-abc123
if "type=gha" in args.cache_from:
# 1. Primary: exact hash match (most specific)
if component_hash:
hash_scope = f"platform-{component}-{target}-{component_hash}"
cache_from_list.append(f"{args.cache_from},scope={hash_scope}")
# 2. Fallback: branch-based cache
if git_ref_scope:
ref_scope = f"platform-{component}-{target}-{git_ref_scope}"
cache_from_list.append(f"{args.cache_from},scope={ref_scope}")
# 3. Fallback: dev branch cache (for PRs/feature branches)
if git_ref_scope and git_ref_scope != DEFAULT_BRANCH:
master_scope = f"platform-{component}-{target}-{DEFAULT_BRANCH}"
cache_from_list.append(f"{args.cache_from},scope={master_scope}")
if "type=gha" in args.cache_to:
# Write to both hash-based and branch-based scopes
if component_hash:
hash_scope = f"platform-{component}-{target}-{component_hash}"
cache_to_list.append(f"{args.cache_to},scope={hash_scope}")
if git_ref_scope:
ref_scope = f"platform-{component}-{target}-{git_ref_scope}"
cache_to_list.append(f"{args.cache_to},scope={ref_scope}")
# Ensure we have at least one cache source/target
if not cache_from_list:
cache_from_list.append(args.cache_from)
if not cache_to_list:
cache_to_list.append(args.cache_to)
build_config["cache_from"] = cache_from_list
build_config["cache_to"] = cache_to_list
modified_services.append(service_name)
# Write back to the same file
with open(args.source, "w") as f:
yaml.dump(compose, f, default_flow_style=False, sort_keys=False)
print(f"Added cache config to {len(modified_services)} services in {args.source}:")
for svc in modified_services:
svc_config = compose["services"][svc]
build_cfg = svc_config.get("build", {})
cache_from_list = build_cfg.get("cache_from", ["none"])
cache_to_list = build_cfg.get("cache_to", ["none"])
print(f" - {svc}")
print(f" image: {svc_config.get('image', 'N/A')}")
print(f" cache_from: {cache_from_list}")
print(f" cache_to: {cache_to_list}")
if services_to_dedupe:
print(
f"Deduplicated {len(services_to_dedupe)} services (will use pre-built images):"
)
for svc in services_to_dedupe:
print(f" - {svc} -> {compose['services'][svc].get('image', 'N/A')}")
if __name__ == "__main__":
main()

View File

@@ -45,6 +45,11 @@ AutoGPT Platform is a monorepo containing:
- Backend/Frontend services use YAML anchors for consistent configuration
- Supabase services (`db/docker/docker-compose.yml`) follow the same pattern
### Branching Strategy
- **`dev`** is the main development branch. All PRs should target `dev`.
- **`master`** is the production branch. Only used for production releases.
### Creating Pull Requests
- Create the PR against the `dev` branch of the repository.

View File

@@ -448,61 +448,61 @@ toml = ["tomli ; python_full_version <= \"3.11.0a6\""]
[[package]]
name = "cryptography"
version = "46.0.4"
version = "46.0.5"
description = "cryptography is a package which provides cryptographic recipes and primitives to Python developers."
optional = false
python-versions = "!=3.9.0,!=3.9.1,>=3.8"
groups = ["main"]
files = [
{file = "cryptography-46.0.4-cp311-abi3-macosx_10_9_universal2.whl", hash = "sha256:281526e865ed4166009e235afadf3a4c4cba6056f99336a99efba65336fd5485"},
{file = "cryptography-46.0.4-cp311-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5f14fba5bf6f4390d7ff8f086c566454bff0411f6d8aa7af79c88b6f9267aecc"},
{file = "cryptography-46.0.4-cp311-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:47bcd19517e6389132f76e2d5303ded6cf3f78903da2158a671be8de024f4cd0"},
{file = "cryptography-46.0.4-cp311-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:01df4f50f314fbe7009f54046e908d1754f19d0c6d3070df1e6268c5a4af09fa"},
{file = "cryptography-46.0.4-cp311-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:5aa3e463596b0087b3da0dbe2b2487e9fc261d25da85754e30e3b40637d61f81"},
{file = "cryptography-46.0.4-cp311-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:0a9ad24359fee86f131836a9ac3bffc9329e956624a2d379b613f8f8abaf5255"},
{file = "cryptography-46.0.4-cp311-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:dc1272e25ef673efe72f2096e92ae39dea1a1a450dd44918b15351f72c5a168e"},
{file = "cryptography-46.0.4-cp311-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:de0f5f4ec8711ebc555f54735d4c673fc34b65c44283895f1a08c2b49d2fd99c"},
{file = "cryptography-46.0.4-cp311-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:eeeb2e33d8dbcccc34d64651f00a98cb41b2dc69cef866771a5717e6734dfa32"},
{file = "cryptography-46.0.4-cp311-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:3d425eacbc9aceafd2cb429e42f4e5d5633c6f873f5e567077043ef1b9bbf616"},
{file = "cryptography-46.0.4-cp311-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:91627ebf691d1ea3976a031b61fb7bac1ccd745afa03602275dda443e11c8de0"},
{file = "cryptography-46.0.4-cp311-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:2d08bc22efd73e8854b0b7caff402d735b354862f1145d7be3b9c0f740fef6a0"},
{file = "cryptography-46.0.4-cp311-abi3-win32.whl", hash = "sha256:82a62483daf20b8134f6e92898da70d04d0ef9a75829d732ea1018678185f4f5"},
{file = "cryptography-46.0.4-cp311-abi3-win_amd64.whl", hash = "sha256:6225d3ebe26a55dbc8ead5ad1265c0403552a63336499564675b29eb3184c09b"},
{file = "cryptography-46.0.4-cp314-cp314t-macosx_10_9_universal2.whl", hash = "sha256:485e2b65d25ec0d901bca7bcae0f53b00133bf3173916d8e421f6fddde103908"},
{file = "cryptography-46.0.4-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:078e5f06bd2fa5aea5a324f2a09f914b1484f1d0c2a4d6a8a28c74e72f65f2da"},
{file = "cryptography-46.0.4-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:dce1e4f068f03008da7fa51cc7abc6ddc5e5de3e3d1550334eaf8393982a5829"},
{file = "cryptography-46.0.4-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:2067461c80271f422ee7bdbe79b9b4be54a5162e90345f86a23445a0cf3fd8a2"},
{file = "cryptography-46.0.4-cp314-cp314t-manylinux_2_28_ppc64le.whl", hash = "sha256:c92010b58a51196a5f41c3795190203ac52edfd5dc3ff99149b4659eba9d2085"},
{file = "cryptography-46.0.4-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:829c2b12bbc5428ab02d6b7f7e9bbfd53e33efd6672d21341f2177470171ad8b"},
{file = "cryptography-46.0.4-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:62217ba44bf81b30abaeda1488686a04a702a261e26f87db51ff61d9d3510abd"},
{file = "cryptography-46.0.4-cp314-cp314t-manylinux_2_34_aarch64.whl", hash = "sha256:9c2da296c8d3415b93e6053f5a728649a87a48ce084a9aaf51d6e46c87c7f2d2"},
{file = "cryptography-46.0.4-cp314-cp314t-manylinux_2_34_ppc64le.whl", hash = "sha256:9b34d8ba84454641a6bf4d6762d15847ecbd85c1316c0a7984e6e4e9f748ec2e"},
{file = "cryptography-46.0.4-cp314-cp314t-manylinux_2_34_x86_64.whl", hash = "sha256:df4a817fa7138dd0c96c8c8c20f04b8aaa1fac3bbf610913dcad8ea82e1bfd3f"},
{file = "cryptography-46.0.4-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:b1de0ebf7587f28f9190b9cb526e901bf448c9e6a99655d2b07fff60e8212a82"},
{file = "cryptography-46.0.4-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:9b4d17bc7bd7cdd98e3af40b441feaea4c68225e2eb2341026c84511ad246c0c"},
{file = "cryptography-46.0.4-cp314-cp314t-win32.whl", hash = "sha256:c411f16275b0dea722d76544a61d6421e2cc829ad76eec79280dbdc9ddf50061"},
{file = "cryptography-46.0.4-cp314-cp314t-win_amd64.whl", hash = "sha256:728fedc529efc1439eb6107b677f7f7558adab4553ef8669f0d02d42d7b959a7"},
{file = "cryptography-46.0.4-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:a9556ba711f7c23f77b151d5798f3ac44a13455cc68db7697a1096e6d0563cab"},
{file = "cryptography-46.0.4-cp38-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:8bf75b0259e87fa70bddc0b8b4078b76e7fd512fd9afae6c1193bcf440a4dbef"},
{file = "cryptography-46.0.4-cp38-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:3c268a3490df22270955966ba236d6bc4a8f9b6e4ffddb78aac535f1a5ea471d"},
{file = "cryptography-46.0.4-cp38-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:812815182f6a0c1d49a37893a303b44eaac827d7f0d582cecfc81b6427f22973"},
{file = "cryptography-46.0.4-cp38-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:a90e43e3ef65e6dcf969dfe3bb40cbf5aef0d523dff95bfa24256be172a845f4"},
{file = "cryptography-46.0.4-cp38-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:a05177ff6296644ef2876fce50518dffb5bcdf903c85250974fc8bc85d54c0af"},
{file = "cryptography-46.0.4-cp38-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:daa392191f626d50f1b136c9b4cf08af69ca8279d110ea24f5c2700054d2e263"},
{file = "cryptography-46.0.4-cp38-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:e07ea39c5b048e085f15923511d8121e4a9dc45cee4e3b970ca4f0d338f23095"},
{file = "cryptography-46.0.4-cp38-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:d5a45ddc256f492ce42a4e35879c5e5528c09cd9ad12420828c972951d8e016b"},
{file = "cryptography-46.0.4-cp38-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:6bb5157bf6a350e5b28aee23beb2d84ae6f5be390b2f8ee7ea179cda077e1019"},
{file = "cryptography-46.0.4-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:dd5aba870a2c40f87a3af043e0dee7d9eb02d4aff88a797b48f2b43eff8c3ab4"},
{file = "cryptography-46.0.4-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:93d8291da8d71024379ab2cb0b5c57915300155ad42e07f76bea6ad838d7e59b"},
{file = "cryptography-46.0.4-cp38-abi3-win32.whl", hash = "sha256:0563655cb3c6d05fb2afe693340bc050c30f9f34e15763361cf08e94749401fc"},
{file = "cryptography-46.0.4-cp38-abi3-win_amd64.whl", hash = "sha256:fa0900b9ef9c49728887d1576fd8d9e7e3ea872fa9b25ef9b64888adc434e976"},
{file = "cryptography-46.0.4-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:766330cce7416c92b5e90c3bb71b1b79521760cdcfc3a6a1a182d4c9fab23d2b"},
{file = "cryptography-46.0.4-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl", hash = "sha256:c236a44acfb610e70f6b3e1c3ca20ff24459659231ef2f8c48e879e2d32b73da"},
{file = "cryptography-46.0.4-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl", hash = "sha256:8a15fb869670efa8f83cbffbc8753c1abf236883225aed74cd179b720ac9ec80"},
{file = "cryptography-46.0.4-pp311-pypy311_pp73-manylinux_2_34_aarch64.whl", hash = "sha256:fdc3daab53b212472f1524d070735b2f0c214239df131903bae1d598016fa822"},
{file = "cryptography-46.0.4-pp311-pypy311_pp73-manylinux_2_34_x86_64.whl", hash = "sha256:44cc0675b27cadb71bdbb96099cca1fa051cd11d2ade09e5cd3a2edb929ed947"},
{file = "cryptography-46.0.4-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:be8c01a7d5a55f9a47d1888162b76c8f49d62b234d88f0ff91a9fbebe32ffbc3"},
{file = "cryptography-46.0.4.tar.gz", hash = "sha256:bfd019f60f8abc2ed1b9be4ddc21cfef059c841d86d710bb69909a688cbb8f59"},
{file = "cryptography-46.0.5-cp311-abi3-macosx_10_9_universal2.whl", hash = "sha256:351695ada9ea9618b3500b490ad54c739860883df6c1f555e088eaf25b1bbaad"},
{file = "cryptography-46.0.5-cp311-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:c18ff11e86df2e28854939acde2d003f7984f721eba450b56a200ad90eeb0e6b"},
{file = "cryptography-46.0.5-cp311-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:4d7e3d356b8cd4ea5aff04f129d5f66ebdc7b6f8eae802b93739ed520c47c79b"},
{file = "cryptography-46.0.5-cp311-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:50bfb6925eff619c9c023b967d5b77a54e04256c4281b0e21336a130cd7fc263"},
{file = "cryptography-46.0.5-cp311-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:803812e111e75d1aa73690d2facc295eaefd4439be1023fefc4995eaea2af90d"},
{file = "cryptography-46.0.5-cp311-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:3ee190460e2fbe447175cda91b88b84ae8322a104fc27766ad09428754a618ed"},
{file = "cryptography-46.0.5-cp311-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:f145bba11b878005c496e93e257c1e88f154d278d2638e6450d17e0f31e558d2"},
{file = "cryptography-46.0.5-cp311-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:e9251e3be159d1020c4030bd2e5f84d6a43fe54b6c19c12f51cde9542a2817b2"},
{file = "cryptography-46.0.5-cp311-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:47fb8a66058b80e509c47118ef8a75d14c455e81ac369050f20ba0d23e77fee0"},
{file = "cryptography-46.0.5-cp311-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:4c3341037c136030cb46e4b1e17b7418ea4cbd9dd207e4a6f3b2b24e0d4ac731"},
{file = "cryptography-46.0.5-cp311-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:890bcb4abd5a2d3f852196437129eb3667d62630333aacc13dfd470fad3aaa82"},
{file = "cryptography-46.0.5-cp311-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:80a8d7bfdf38f87ca30a5391c0c9ce4ed2926918e017c29ddf643d0ed2778ea1"},
{file = "cryptography-46.0.5-cp311-abi3-win32.whl", hash = "sha256:60ee7e19e95104d4c03871d7d7dfb3d22ef8a9b9c6778c94e1c8fcc8365afd48"},
{file = "cryptography-46.0.5-cp311-abi3-win_amd64.whl", hash = "sha256:38946c54b16c885c72c4f59846be9743d699eee2b69b6988e0a00a01f46a61a4"},
{file = "cryptography-46.0.5-cp314-cp314t-macosx_10_9_universal2.whl", hash = "sha256:94a76daa32eb78d61339aff7952ea819b1734b46f73646a07decb40e5b3448e2"},
{file = "cryptography-46.0.5-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5be7bf2fb40769e05739dd0046e7b26f9d4670badc7b032d6ce4db64dddc0678"},
{file = "cryptography-46.0.5-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:fe346b143ff9685e40192a4960938545c699054ba11d4f9029f94751e3f71d87"},
{file = "cryptography-46.0.5-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:c69fd885df7d089548a42d5ec05be26050ebcd2283d89b3d30676eb32ff87dee"},
{file = "cryptography-46.0.5-cp314-cp314t-manylinux_2_28_ppc64le.whl", hash = "sha256:8293f3dea7fc929ef7240796ba231413afa7b68ce38fd21da2995549f5961981"},
{file = "cryptography-46.0.5-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:1abfdb89b41c3be0365328a410baa9df3ff8a9110fb75e7b52e66803ddabc9a9"},
{file = "cryptography-46.0.5-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:d66e421495fdb797610a08f43b05269e0a5ea7f5e652a89bfd5a7d3c1dee3648"},
{file = "cryptography-46.0.5-cp314-cp314t-manylinux_2_34_aarch64.whl", hash = "sha256:4e817a8920bfbcff8940ecfd60f23d01836408242b30f1a708d93198393a80b4"},
{file = "cryptography-46.0.5-cp314-cp314t-manylinux_2_34_ppc64le.whl", hash = "sha256:68f68d13f2e1cb95163fa3b4db4bf9a159a418f5f6e7242564fc75fcae667fd0"},
{file = "cryptography-46.0.5-cp314-cp314t-manylinux_2_34_x86_64.whl", hash = "sha256:a3d1fae9863299076f05cb8a778c467578262fae09f9dc0ee9b12eb4268ce663"},
{file = "cryptography-46.0.5-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:c4143987a42a2397f2fc3b4d7e3a7d313fbe684f67ff443999e803dd75a76826"},
{file = "cryptography-46.0.5-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:7d731d4b107030987fd61a7f8ab512b25b53cef8f233a97379ede116f30eb67d"},
{file = "cryptography-46.0.5-cp314-cp314t-win32.whl", hash = "sha256:c3bcce8521d785d510b2aad26ae2c966092b7daa8f45dd8f44734a104dc0bc1a"},
{file = "cryptography-46.0.5-cp314-cp314t-win_amd64.whl", hash = "sha256:4d8ae8659ab18c65ced284993c2265910f6c9e650189d4e3f68445ef82a810e4"},
{file = "cryptography-46.0.5-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:4108d4c09fbbf2789d0c926eb4152ae1760d5a2d97612b92d508d96c861e4d31"},
{file = "cryptography-46.0.5-cp38-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:7d1f30a86d2757199cb2d56e48cce14deddf1f9c95f1ef1b64ee91ea43fe2e18"},
{file = "cryptography-46.0.5-cp38-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:039917b0dc418bb9f6edce8a906572d69e74bd330b0b3fea4f79dab7f8ddd235"},
{file = "cryptography-46.0.5-cp38-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:ba2a27ff02f48193fc4daeadf8ad2590516fa3d0adeeb34336b96f7fa64c1e3a"},
{file = "cryptography-46.0.5-cp38-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:61aa400dce22cb001a98014f647dc21cda08f7915ceb95df0c9eaf84b4b6af76"},
{file = "cryptography-46.0.5-cp38-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:3ce58ba46e1bc2aac4f7d9290223cead56743fa6ab94a5d53292ffaac6a91614"},
{file = "cryptography-46.0.5-cp38-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:420d0e909050490d04359e7fdb5ed7e667ca5c3c402b809ae2563d7e66a92229"},
{file = "cryptography-46.0.5-cp38-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:582f5fcd2afa31622f317f80426a027f30dc792e9c80ffee87b993200ea115f1"},
{file = "cryptography-46.0.5-cp38-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:bfd56bb4b37ed4f330b82402f6f435845a5f5648edf1ad497da51a8452d5d62d"},
{file = "cryptography-46.0.5-cp38-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:a3d507bb6a513ca96ba84443226af944b0f7f47dcc9a399d110cd6146481d24c"},
{file = "cryptography-46.0.5-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:9f16fbdf4da055efb21c22d81b89f155f02ba420558db21288b3d0035bafd5f4"},
{file = "cryptography-46.0.5-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:ced80795227d70549a411a4ab66e8ce307899fad2220ce5ab2f296e687eacde9"},
{file = "cryptography-46.0.5-cp38-abi3-win32.whl", hash = "sha256:02f547fce831f5096c9a567fd41bc12ca8f11df260959ecc7c3202555cc47a72"},
{file = "cryptography-46.0.5-cp38-abi3-win_amd64.whl", hash = "sha256:556e106ee01aa13484ce9b0239bca667be5004efb0aabbed28d353df86445595"},
{file = "cryptography-46.0.5-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:3b4995dc971c9fb83c25aa44cf45f02ba86f71ee600d81091c2f0cbae116b06c"},
{file = "cryptography-46.0.5-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl", hash = "sha256:bc84e875994c3b445871ea7181d424588171efec3e185dced958dad9e001950a"},
{file = "cryptography-46.0.5-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl", hash = "sha256:2ae6971afd6246710480e3f15824ed3029a60fc16991db250034efd0b9fb4356"},
{file = "cryptography-46.0.5-pp311-pypy311_pp73-manylinux_2_34_aarch64.whl", hash = "sha256:d861ee9e76ace6cf36a6a89b959ec08e7bc2493ee39d07ffe5acb23ef46d27da"},
{file = "cryptography-46.0.5-pp311-pypy311_pp73-manylinux_2_34_x86_64.whl", hash = "sha256:2b7a67c9cd56372f3249b39699f2ad479f6991e62ea15800973b956f4b73e257"},
{file = "cryptography-46.0.5-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:8456928655f856c6e1533ff59d5be76578a7157224dbd9ce6872f25055ab9ab7"},
{file = "cryptography-46.0.5.tar.gz", hash = "sha256:abace499247268e3757271b2f1e244b36b06f8515cf27c4d49468fc9eb16e93d"},
]
[package.dependencies]
@@ -516,7 +516,7 @@ nox = ["nox[uv] (>=2024.4.15)"]
pep8test = ["check-sdist", "click (>=8.0.1)", "mypy (>=1.14)", "ruff (>=0.11.11)"]
sdist = ["build (>=1.0.0)"]
ssh = ["bcrypt (>=3.1.5)"]
test = ["certifi (>=2024)", "cryptography-vectors (==46.0.4)", "pretend (>=0.7)", "pytest (>=7.4.0)", "pytest-benchmark (>=4.0)", "pytest-cov (>=2.10.1)", "pytest-xdist (>=3.5.0)"]
test = ["certifi (>=2024)", "cryptography-vectors (==46.0.5)", "pretend (>=0.7)", "pytest (>=7.4.0)", "pytest-benchmark (>=4.0)", "pytest-cov (>=2.10.1)", "pytest-xdist (>=3.5.0)"]
test-randomorder = ["pytest-randomly"]
[[package]]
@@ -570,24 +570,25 @@ tests = ["coverage", "coveralls", "dill", "mock", "nose"]
[[package]]
name = "fastapi"
version = "0.128.0"
version = "0.128.7"
description = "FastAPI framework, high performance, easy to learn, fast to code, ready for production"
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "fastapi-0.128.0-py3-none-any.whl", hash = "sha256:aebd93f9716ee3b4f4fcfe13ffb7cf308d99c9f3ab5622d8877441072561582d"},
{file = "fastapi-0.128.0.tar.gz", hash = "sha256:1cc179e1cef10a6be60ffe429f79b829dce99d8de32d7acb7e6c8dfdf7f2645a"},
{file = "fastapi-0.128.7-py3-none-any.whl", hash = "sha256:6bd9bd31cb7047465f2d3fa3ba3f33b0870b17d4eaf7cdb36d1576ab060ad662"},
{file = "fastapi-0.128.7.tar.gz", hash = "sha256:783c273416995486c155ad2c0e2b45905dedfaf20b9ef8d9f6a9124670639a24"},
]
[package.dependencies]
annotated-doc = ">=0.0.2"
pydantic = ">=2.7.0"
starlette = ">=0.40.0,<0.51.0"
starlette = ">=0.40.0,<1.0.0"
typing-extensions = ">=4.8.0"
typing-inspection = ">=0.4.2"
[package.extras]
all = ["email-validator (>=2.0.0)", "fastapi-cli[standard] (>=0.0.8)", "httpx (>=0.23.0,<1.0.0)", "itsdangerous (>=1.1.0)", "jinja2 (>=3.1.5)", "orjson (>=3.2.1)", "pydantic-extra-types (>=2.0.0)", "pydantic-settings (>=2.0.0)", "python-multipart (>=0.0.18)", "pyyaml (>=5.3.1)", "ujson (>=4.0.1,!=4.0.2,!=4.1.0,!=4.2.0,!=4.3.0,!=5.0.0,!=5.1.0)", "uvicorn[standard] (>=0.12.0)"]
all = ["email-validator (>=2.0.0)", "fastapi-cli[standard] (>=0.0.8)", "httpx (>=0.23.0,<1.0.0)", "itsdangerous (>=1.1.0)", "jinja2 (>=3.1.5)", "orjson (>=3.9.3)", "pydantic-extra-types (>=2.0.0)", "pydantic-settings (>=2.0.0)", "python-multipart (>=0.0.18)", "pyyaml (>=5.3.1)", "ujson (>=5.8.0)", "uvicorn[standard] (>=0.12.0)"]
standard = ["email-validator (>=2.0.0)", "fastapi-cli[standard] (>=0.0.8)", "httpx (>=0.23.0,<1.0.0)", "jinja2 (>=3.1.5)", "pydantic-extra-types (>=2.0.0)", "pydantic-settings (>=2.0.0)", "python-multipart (>=0.0.18)", "uvicorn[standard] (>=0.12.0)"]
standard-no-fastapi-cloud-cli = ["email-validator (>=2.0.0)", "fastapi-cli[standard-no-fastapi-cloud-cli] (>=0.0.8)", "httpx (>=0.23.0,<1.0.0)", "jinja2 (>=3.1.5)", "pydantic-extra-types (>=2.0.0)", "pydantic-settings (>=2.0.0)", "python-multipart (>=0.0.18)", "uvicorn[standard] (>=0.12.0)"]
@@ -1062,14 +1063,14 @@ urllib3 = ">=1.26.0,<3"
[[package]]
name = "launchdarkly-server-sdk"
version = "9.14.1"
version = "9.15.0"
description = "LaunchDarkly SDK for Python"
optional = false
python-versions = ">=3.9"
python-versions = ">=3.10"
groups = ["main"]
files = [
{file = "launchdarkly_server_sdk-9.14.1-py3-none-any.whl", hash = "sha256:a9e2bd9ecdef845cd631ae0d4334a1115e5b44257c42eb2349492be4bac7815c"},
{file = "launchdarkly_server_sdk-9.14.1.tar.gz", hash = "sha256:1df44baf0a0efa74d8c1dad7a00592b98bce7d19edded7f770da8dbc49922213"},
{file = "launchdarkly_server_sdk-9.15.0-py3-none-any.whl", hash = "sha256:c267e29bfa3fb5e2a06a208448ada6ed5557a2924979b8d79c970b45d227c668"},
{file = "launchdarkly_server_sdk-9.15.0.tar.gz", hash = "sha256:f31441b74bc1a69c381db57c33116509e407a2612628ad6dff0a7dbb39d5020b"},
]
[package.dependencies]
@@ -1478,14 +1479,14 @@ testing = ["coverage", "pytest", "pytest-benchmark"]
[[package]]
name = "postgrest"
version = "2.27.2"
version = "2.28.0"
description = "PostgREST client for Python. This library provides an ORM interface to PostgREST."
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "postgrest-2.27.2-py3-none-any.whl", hash = "sha256:1666fef3de05ca097a314433dd5ae2f2d71c613cb7b233d0f468c4ffe37277da"},
{file = "postgrest-2.27.2.tar.gz", hash = "sha256:55407d530b5af3d64e883a71fec1f345d369958f723ce4a8ab0b7d169e313242"},
{file = "postgrest-2.28.0-py3-none-any.whl", hash = "sha256:7bca2f24dd1a1bf8a3d586c7482aba6cd41662da6733045fad585b63b7f7df75"},
{file = "postgrest-2.28.0.tar.gz", hash = "sha256:c36b38646d25ea4255321d3d924ce70f8d20ec7799cb42c1221d6a818d4f6515"},
]
[package.dependencies]
@@ -2248,14 +2249,14 @@ cli = ["click (>=5.0)"]
[[package]]
name = "realtime"
version = "2.27.2"
version = "2.28.0"
description = ""
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "realtime-2.27.2-py3-none-any.whl", hash = "sha256:34a9cbb26a274e707e8fc9e3ee0a66de944beac0fe604dc336d1e985db2c830f"},
{file = "realtime-2.27.2.tar.gz", hash = "sha256:b960a90294d2cea1b3f1275ecb89204304728e08fff1c393cc1b3150739556b3"},
{file = "realtime-2.28.0-py3-none-any.whl", hash = "sha256:db1bd59bab9b1fcc9f9d3b1a073bed35bf4994d720e6751f10031a58d57a3836"},
{file = "realtime-2.28.0.tar.gz", hash = "sha256:d18cedcebd6a8f22fcd509bc767f639761eb218b7b2b6f14fc4205b6259b50fc"},
]
[package.dependencies]
@@ -2436,14 +2437,14 @@ full = ["httpx (>=0.27.0,<0.29.0)", "itsdangerous", "jinja2", "python-multipart
[[package]]
name = "storage3"
version = "2.27.2"
version = "2.28.0"
description = "Supabase Storage client for Python."
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "storage3-2.27.2-py3-none-any.whl", hash = "sha256:e6f16e7a260729e7b1f46e9bf61746805a02e30f5e419ee1291007c432e3ec63"},
{file = "storage3-2.27.2.tar.gz", hash = "sha256:cb4807b7f86b4bb1272ac6fdd2f3cfd8ba577297046fa5f88557425200275af5"},
{file = "storage3-2.28.0-py3-none-any.whl", hash = "sha256:ecb50efd2ac71dabbdf97e99ad346eafa630c4c627a8e5a138ceb5fbbadae716"},
{file = "storage3-2.28.0.tar.gz", hash = "sha256:bc1d008aff67de7a0f2bd867baee7aadbcdb6f78f5a310b4f7a38e8c13c19865"},
]
[package.dependencies]
@@ -2487,35 +2488,35 @@ python-dateutil = ">=2.6.0"
[[package]]
name = "supabase"
version = "2.27.2"
version = "2.28.0"
description = "Supabase client for Python."
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "supabase-2.27.2-py3-none-any.whl", hash = "sha256:d4dce00b3a418ee578017ec577c0e5be47a9a636355009c76f20ed2faa15bc54"},
{file = "supabase-2.27.2.tar.gz", hash = "sha256:2aed40e4f3454438822442a1e94a47be6694c2c70392e7ae99b51a226d4293f7"},
{file = "supabase-2.28.0-py3-none-any.whl", hash = "sha256:42776971c7d0ccca16034df1ab96a31c50228eb1eb19da4249ad2f756fc20272"},
{file = "supabase-2.28.0.tar.gz", hash = "sha256:aea299aaab2a2eed3c57e0be7fc035c6807214194cce795a3575add20268ece1"},
]
[package.dependencies]
httpx = ">=0.26,<0.29"
postgrest = "2.27.2"
realtime = "2.27.2"
storage3 = "2.27.2"
supabase-auth = "2.27.2"
supabase-functions = "2.27.2"
postgrest = "2.28.0"
realtime = "2.28.0"
storage3 = "2.28.0"
supabase-auth = "2.28.0"
supabase-functions = "2.28.0"
yarl = ">=1.22.0"
[[package]]
name = "supabase-auth"
version = "2.27.2"
version = "2.28.0"
description = "Python Client Library for Supabase Auth"
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "supabase_auth-2.27.2-py3-none-any.whl", hash = "sha256:78ec25b11314d0a9527a7205f3b1c72560dccdc11b38392f80297ef98664ee91"},
{file = "supabase_auth-2.27.2.tar.gz", hash = "sha256:0f5bcc79b3677cb42e9d321f3c559070cfa40d6a29a67672cc8382fb7dc2fe97"},
{file = "supabase_auth-2.28.0-py3-none-any.whl", hash = "sha256:2ac85026cc285054c7fa6d41924f3a333e9ec298c013e5b5e1754039ba7caec9"},
{file = "supabase_auth-2.28.0.tar.gz", hash = "sha256:2bb8f18ff39934e44b28f10918db965659f3735cd6fbfcc022fe0b82dbf8233e"},
]
[package.dependencies]
@@ -2525,14 +2526,14 @@ pyjwt = {version = ">=2.10.1", extras = ["crypto"]}
[[package]]
name = "supabase-functions"
version = "2.27.2"
version = "2.28.0"
description = "Library for Supabase Functions"
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "supabase_functions-2.27.2-py3-none-any.whl", hash = "sha256:db480efc669d0bca07605b9b6f167312af43121adcc842a111f79bea416ef754"},
{file = "supabase_functions-2.27.2.tar.gz", hash = "sha256:d0c8266207a94371cb3fd35ad3c7f025b78a97cf026861e04ccd35ac1775f80b"},
{file = "supabase_functions-2.28.0-py3-none-any.whl", hash = "sha256:30bf2d586f8df285faf0621bb5d5bb3ec3157234fc820553ca156f009475e4ae"},
{file = "supabase_functions-2.28.0.tar.gz", hash = "sha256:db3dddfc37aca5858819eb461130968473bd8c75bd284581013958526dac718b"},
]
[package.dependencies]
@@ -2911,4 +2912,4 @@ type = ["pytest-mypy"]
[metadata]
lock-version = "2.1"
python-versions = ">=3.10,<4.0"
content-hash = "40eae94995dc0a388fa832ed4af9b6137f28d5b5ced3aaea70d5f91d4d9a179d"
content-hash = "9619cae908ad38fa2c48016a58bcf4241f6f5793aa0e6cc140276e91c433cbbb"

View File

@@ -11,14 +11,14 @@ python = ">=3.10,<4.0"
colorama = "^0.4.6"
cryptography = "^46.0"
expiringdict = "^1.2.2"
fastapi = "^0.128.0"
fastapi = "^0.128.7"
google-cloud-logging = "^3.13.0"
launchdarkly-server-sdk = "^9.14.1"
launchdarkly-server-sdk = "^9.15.0"
pydantic = "^2.12.5"
pydantic-settings = "^2.12.0"
pyjwt = { version = "^2.11.0", extras = ["crypto"] }
redis = "^6.2.0"
supabase = "^2.27.2"
supabase = "^2.28.0"
uvicorn = "^0.40.0"
[tool.poetry.group.dev.dependencies]

View File

@@ -104,6 +104,12 @@ TWITTER_CLIENT_SECRET=
# Make a new workspace for your OAuth APP -- trust me
# https://linear.app/settings/api/applications/new
# Callback URL: http://localhost:3000/auth/integrations/oauth_callback
LINEAR_API_KEY=
# Linear project and team IDs for the feature request tracker.
# Find these in your Linear workspace URL: linear.app/<workspace>/project/<project-id>
# and in team settings. Used by the chat copilot to file and search feature requests.
LINEAR_FEATURE_REQUEST_PROJECT_ID=
LINEAR_FEATURE_REQUEST_TEAM_ID=
LINEAR_CLIENT_ID=
LINEAR_CLIENT_SECRET=

View File

@@ -1,3 +1,5 @@
# ============================ DEPENDENCY BUILDER ============================ #
FROM debian:13-slim AS builder
# Set environment variables
@@ -51,27 +53,62 @@ COPY autogpt_platform/backend/backend/data/partial_types.py ./backend/data/parti
COPY autogpt_platform/backend/gen_prisma_types_stub.py ./
RUN poetry run prisma generate && poetry run gen-prisma-stub
FROM debian:13-slim AS server_dependencies
# =============================== DB MIGRATOR =============================== #
# Lightweight migrate stage - only needs Prisma CLI, not full Python environment
FROM debian:13-slim AS migrate
WORKDIR /app/autogpt_platform/backend
ENV DEBIAN_FRONTEND=noninteractive
# Install only what's needed for prisma migrate: Node.js and minimal Python for prisma-python
RUN apt-get update && apt-get install -y --no-install-recommends \
python3.13 \
python3-pip \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
# Copy Node.js from builder (needed for Prisma CLI)
COPY --from=builder /usr/bin/node /usr/bin/node
COPY --from=builder /usr/lib/node_modules /usr/lib/node_modules
COPY --from=builder /usr/bin/npm /usr/bin/npm
# Copy Prisma binaries
COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries
# Install prisma-client-py directly (much smaller than copying full venv)
RUN pip3 install prisma>=0.15.0 --break-system-packages
COPY autogpt_platform/backend/schema.prisma ./
COPY autogpt_platform/backend/backend/data/partial_types.py ./backend/data/partial_types.py
COPY autogpt_platform/backend/gen_prisma_types_stub.py ./
COPY autogpt_platform/backend/migrations ./migrations
# ============================== BACKEND SERVER ============================== #
FROM debian:13-slim AS server
WORKDIR /app
ENV POETRY_HOME=/opt/poetry \
POETRY_NO_INTERACTION=1 \
POETRY_VIRTUALENVS_CREATE=true \
POETRY_VIRTUALENVS_IN_PROJECT=true \
DEBIAN_FRONTEND=noninteractive
ENV PATH=/opt/poetry/bin:$PATH
ENV DEBIAN_FRONTEND=noninteractive
# Install Python, FFmpeg, and ImageMagick (required for video processing blocks)
RUN apt-get update && apt-get install -y \
# Install Python, FFmpeg, ImageMagick, and CLI tools for agent use.
# bubblewrap provides OS-level sandbox (whitelist-only FS + no network)
# for the bash_exec MCP tool.
# Using --no-install-recommends saves ~650MB by skipping unnecessary deps like llvm, mesa, etc.
RUN apt-get update && apt-get install -y --no-install-recommends \
python3.13 \
python3-pip \
ffmpeg \
imagemagick \
jq \
ripgrep \
tree \
bubblewrap \
&& rm -rf /var/lib/apt/lists/*
# Copy only necessary files from builder
COPY --from=builder /app /app
# Copy poetry (build-time only, for `poetry install --only-root` to create entry points)
COPY --from=builder /usr/local/lib/python3* /usr/local/lib/python3*
COPY --from=builder /usr/local/bin/poetry /usr/local/bin/poetry
# Copy Node.js installation for Prisma
@@ -81,30 +118,25 @@ COPY --from=builder /usr/bin/npm /usr/bin/npm
COPY --from=builder /usr/bin/npx /usr/bin/npx
COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries
ENV PATH="/app/autogpt_platform/backend/.venv/bin:$PATH"
RUN mkdir -p /app/autogpt_platform/autogpt_libs
RUN mkdir -p /app/autogpt_platform/backend
COPY autogpt_platform/autogpt_libs /app/autogpt_platform/autogpt_libs
COPY autogpt_platform/backend/poetry.lock autogpt_platform/backend/pyproject.toml /app/autogpt_platform/backend/
WORKDIR /app/autogpt_platform/backend
FROM server_dependencies AS migrate
# Copy only the .venv from builder (not the entire /app directory)
# The .venv includes the generated Prisma client
COPY --from=builder /app/autogpt_platform/backend/.venv ./.venv
ENV PATH="/app/autogpt_platform/backend/.venv/bin:$PATH"
# Migration stage only needs schema and migrations - much lighter than full backend
COPY autogpt_platform/backend/schema.prisma /app/autogpt_platform/backend/
COPY autogpt_platform/backend/backend/data/partial_types.py /app/autogpt_platform/backend/backend/data/partial_types.py
COPY autogpt_platform/backend/migrations /app/autogpt_platform/backend/migrations
# Copy dependency files + autogpt_libs (path dependency)
COPY autogpt_platform/autogpt_libs /app/autogpt_platform/autogpt_libs
COPY autogpt_platform/backend/poetry.lock autogpt_platform/backend/pyproject.toml ./
FROM server_dependencies AS server
COPY autogpt_platform/backend /app/autogpt_platform/backend
# Copy backend code + docs (for Copilot docs search)
COPY autogpt_platform/backend ./
COPY docs /app/docs
RUN poetry install --no-ansi --only-root
# Install the project package to create entry point scripts in .venv/bin/
# (e.g., rest, executor, ws, db, scheduler, notification - see [tool.poetry.scripts])
RUN POETRY_VIRTUALENVS_CREATE=true POETRY_VIRTUALENVS_IN_PROJECT=true \
poetry install --no-ansi --only-root
ENV PORT=8000
CMD ["poetry", "run", "rest"]
CMD ["rest"]

View File

@@ -1,4 +1,9 @@
"""Common test fixtures for server tests."""
"""Common test fixtures for server tests.
Note: Common fixtures like test_user_id, admin_user_id, target_user_id,
setup_test_user, and setup_admin_user are defined in the parent conftest.py
(backend/conftest.py) and are available here automatically.
"""
import pytest
from pytest_snapshot.plugin import Snapshot
@@ -11,54 +16,6 @@ def configured_snapshot(snapshot: Snapshot) -> Snapshot:
return snapshot
@pytest.fixture
def test_user_id() -> str:
"""Test user ID fixture."""
return "3e53486c-cf57-477e-ba2a-cb02dc828e1a"
@pytest.fixture
def admin_user_id() -> str:
"""Admin user ID fixture."""
return "4e53486c-cf57-477e-ba2a-cb02dc828e1b"
@pytest.fixture
def target_user_id() -> str:
"""Target user ID fixture."""
return "5e53486c-cf57-477e-ba2a-cb02dc828e1c"
@pytest.fixture
async def setup_test_user(test_user_id):
"""Create test user in database before tests."""
from backend.data.user import get_or_create_user
# Create the test user in the database using JWT token format
user_data = {
"sub": test_user_id,
"email": "test@example.com",
"user_metadata": {"name": "Test User"},
}
await get_or_create_user(user_data)
return test_user_id
@pytest.fixture
async def setup_admin_user(admin_user_id):
"""Create admin user in database before tests."""
from backend.data.user import get_or_create_user
# Create the admin user in the database using JWT token format
user_data = {
"sub": admin_user_id,
"email": "test-admin@example.com",
"user_metadata": {"name": "Test Admin"},
}
await get_or_create_user(user_data)
return admin_user_id
@pytest.fixture
def mock_jwt_user(test_user_id):
"""Provide mock JWT payload for regular user testing."""

View File

@@ -15,9 +15,9 @@ from prisma.enums import APIKeyPermission
from pydantic import BaseModel, Field
from backend.api.external.middleware import require_permission
from backend.api.features.chat.model import ChatSession
from backend.api.features.chat.tools import find_agent_tool, run_agent_tool
from backend.api.features.chat.tools.models import ToolResponseBase
from backend.copilot.model import ChatSession
from backend.copilot.tools import find_agent_tool, run_agent_tool
from backend.copilot.tools.models import ToolResponseBase
from backend.data.auth.base import APIAuthorizationInfo
logger = logging.getLogger(__name__)

View File

@@ -1,368 +0,0 @@
"""Redis Streams consumer for operation completion messages.
This module provides a consumer (ChatCompletionConsumer) that listens for
completion notifications (OperationCompleteMessage) from external services
(like Agent Generator) and triggers the appropriate stream registry and
chat service updates via process_operation_success/process_operation_failure.
Why Redis Streams instead of RabbitMQ?
--------------------------------------
While the project typically uses RabbitMQ for async task queues (e.g., execution
queue), Redis Streams was chosen for chat completion notifications because:
1. **Unified Infrastructure**: The SSE reconnection feature already uses Redis
Streams (via stream_registry) for message persistence and replay. Using Redis
Streams for completion notifications keeps all chat streaming infrastructure
in one system, simplifying operations and reducing cross-system coordination.
2. **Message Replay**: Redis Streams support XREAD with arbitrary message IDs,
allowing consumers to replay missed messages after reconnection. This aligns
with the SSE reconnection pattern where clients can resume from last_message_id.
3. **Consumer Groups with XAUTOCLAIM**: Redis consumer groups provide automatic
load balancing across pods with explicit message claiming (XAUTOCLAIM) for
recovering from dead consumers - ideal for the completion callback pattern.
4. **Lower Latency**: For real-time SSE updates, Redis (already in-memory for
stream_registry) provides lower latency than an additional RabbitMQ hop.
5. **Atomicity with Task State**: Completion processing often needs to update
task metadata stored in Redis. Keeping both in Redis enables simpler
transactional semantics without distributed coordination.
The consumer uses Redis Streams with consumer groups for reliable message
processing across multiple platform pods, with XAUTOCLAIM for reclaiming
stale pending messages from dead consumers.
"""
import asyncio
import logging
import os
import uuid
from typing import Any
import orjson
from prisma import Prisma
from pydantic import BaseModel
from redis.exceptions import ResponseError
from backend.data.redis_client import get_redis_async
from . import stream_registry
from .completion_handler import process_operation_failure, process_operation_success
from .config import ChatConfig
logger = logging.getLogger(__name__)
config = ChatConfig()
class OperationCompleteMessage(BaseModel):
"""Message format for operation completion notifications."""
operation_id: str
task_id: str
success: bool
result: dict | str | None = None
error: str | None = None
class ChatCompletionConsumer:
"""Consumer for chat operation completion messages from Redis Streams.
This consumer initializes its own Prisma client in start() to ensure
database operations work correctly within this async context.
Uses Redis consumer groups to allow multiple platform pods to consume
messages reliably with automatic redelivery on failure.
"""
def __init__(self):
self._consumer_task: asyncio.Task | None = None
self._running = False
self._prisma: Prisma | None = None
self._consumer_name = f"consumer-{uuid.uuid4().hex[:8]}"
async def start(self) -> None:
"""Start the completion consumer."""
if self._running:
logger.warning("Completion consumer already running")
return
# Create consumer group if it doesn't exist
try:
redis = await get_redis_async()
await redis.xgroup_create(
config.stream_completion_name,
config.stream_consumer_group,
id="0",
mkstream=True,
)
logger.info(
f"Created consumer group '{config.stream_consumer_group}' "
f"on stream '{config.stream_completion_name}'"
)
except ResponseError as e:
if "BUSYGROUP" in str(e):
logger.debug(
f"Consumer group '{config.stream_consumer_group}' already exists"
)
else:
raise
self._running = True
self._consumer_task = asyncio.create_task(self._consume_messages())
logger.info(
f"Chat completion consumer started (consumer: {self._consumer_name})"
)
async def _ensure_prisma(self) -> Prisma:
"""Lazily initialize Prisma client on first use."""
if self._prisma is None:
database_url = os.getenv("DATABASE_URL", "postgresql://localhost:5432")
self._prisma = Prisma(datasource={"url": database_url})
await self._prisma.connect()
logger.info("[COMPLETION] Consumer Prisma client connected (lazy init)")
return self._prisma
async def stop(self) -> None:
"""Stop the completion consumer."""
self._running = False
if self._consumer_task:
self._consumer_task.cancel()
try:
await self._consumer_task
except asyncio.CancelledError:
pass
self._consumer_task = None
if self._prisma:
await self._prisma.disconnect()
self._prisma = None
logger.info("[COMPLETION] Consumer Prisma client disconnected")
logger.info("Chat completion consumer stopped")
async def _consume_messages(self) -> None:
"""Main message consumption loop with retry logic."""
max_retries = 10
retry_delay = 5 # seconds
retry_count = 0
block_timeout = 5000 # milliseconds
while self._running and retry_count < max_retries:
try:
redis = await get_redis_async()
# Reset retry count on successful connection
retry_count = 0
while self._running:
# First, claim any stale pending messages from dead consumers
# Redis does NOT auto-redeliver pending messages; we must explicitly
# claim them using XAUTOCLAIM
try:
claimed_result = await redis.xautoclaim(
name=config.stream_completion_name,
groupname=config.stream_consumer_group,
consumername=self._consumer_name,
min_idle_time=config.stream_claim_min_idle_ms,
start_id="0-0",
count=10,
)
# xautoclaim returns: (next_start_id, [(id, data), ...], [deleted_ids])
if claimed_result and len(claimed_result) >= 2:
claimed_entries = claimed_result[1]
if claimed_entries:
logger.info(
f"Claimed {len(claimed_entries)} stale pending messages"
)
for entry_id, data in claimed_entries:
if not self._running:
return
await self._process_entry(redis, entry_id, data)
except Exception as e:
logger.warning(f"XAUTOCLAIM failed (non-fatal): {e}")
# Read new messages from the stream
messages = await redis.xreadgroup(
groupname=config.stream_consumer_group,
consumername=self._consumer_name,
streams={config.stream_completion_name: ">"},
block=block_timeout,
count=10,
)
if not messages:
continue
for stream_name, entries in messages:
for entry_id, data in entries:
if not self._running:
return
await self._process_entry(redis, entry_id, data)
except asyncio.CancelledError:
logger.info("Consumer cancelled")
return
except Exception as e:
retry_count += 1
logger.error(
f"Consumer error (retry {retry_count}/{max_retries}): {e}",
exc_info=True,
)
if self._running and retry_count < max_retries:
await asyncio.sleep(retry_delay)
else:
logger.error("Max retries reached, stopping consumer")
return
async def _process_entry(
self, redis: Any, entry_id: str, data: dict[str, Any]
) -> None:
"""Process a single stream entry and acknowledge it on success.
Args:
redis: Redis client connection
entry_id: The stream entry ID
data: The entry data dict
"""
try:
# Handle the message
message_data = data.get("data")
if message_data:
await self._handle_message(
message_data.encode()
if isinstance(message_data, str)
else message_data
)
# Acknowledge the message after successful processing
await redis.xack(
config.stream_completion_name,
config.stream_consumer_group,
entry_id,
)
except Exception as e:
logger.error(
f"Error processing completion message {entry_id}: {e}",
exc_info=True,
)
# Message remains in pending state and will be claimed by
# XAUTOCLAIM after min_idle_time expires
async def _handle_message(self, body: bytes) -> None:
"""Handle a completion message using our own Prisma client."""
try:
data = orjson.loads(body)
message = OperationCompleteMessage(**data)
except Exception as e:
logger.error(f"Failed to parse completion message: {e}")
return
logger.info(
f"[COMPLETION] Received completion for operation {message.operation_id} "
f"(task_id={message.task_id}, success={message.success})"
)
# Find task in registry
task = await stream_registry.find_task_by_operation_id(message.operation_id)
if task is None:
task = await stream_registry.get_task(message.task_id)
if task is None:
logger.warning(
f"[COMPLETION] Task not found for operation {message.operation_id} "
f"(task_id={message.task_id})"
)
return
logger.info(
f"[COMPLETION] Found task: task_id={task.task_id}, "
f"session_id={task.session_id}, tool_call_id={task.tool_call_id}"
)
# Guard against empty task fields
if not task.task_id or not task.session_id or not task.tool_call_id:
logger.error(
f"[COMPLETION] Task has empty critical fields! "
f"task_id={task.task_id!r}, session_id={task.session_id!r}, "
f"tool_call_id={task.tool_call_id!r}"
)
return
if message.success:
await self._handle_success(task, message)
else:
await self._handle_failure(task, message)
async def _handle_success(
self,
task: stream_registry.ActiveTask,
message: OperationCompleteMessage,
) -> None:
"""Handle successful operation completion."""
prisma = await self._ensure_prisma()
await process_operation_success(task, message.result, prisma)
async def _handle_failure(
self,
task: stream_registry.ActiveTask,
message: OperationCompleteMessage,
) -> None:
"""Handle failed operation completion."""
prisma = await self._ensure_prisma()
await process_operation_failure(task, message.error, prisma)
# Module-level consumer instance
_consumer: ChatCompletionConsumer | None = None
async def start_completion_consumer() -> None:
"""Start the global completion consumer."""
global _consumer
if _consumer is None:
_consumer = ChatCompletionConsumer()
await _consumer.start()
async def stop_completion_consumer() -> None:
"""Stop the global completion consumer."""
global _consumer
if _consumer:
await _consumer.stop()
_consumer = None
async def publish_operation_complete(
operation_id: str,
task_id: str,
success: bool,
result: dict | str | None = None,
error: str | None = None,
) -> None:
"""Publish an operation completion message to Redis Streams.
Args:
operation_id: The operation ID that completed.
task_id: The task ID associated with the operation.
success: Whether the operation succeeded.
result: The result data (for success).
error: The error message (for failure).
"""
message = OperationCompleteMessage(
operation_id=operation_id,
task_id=task_id,
success=success,
result=result,
error=error,
)
redis = await get_redis_async()
await redis.xadd(
config.stream_completion_name,
{"data": message.model_dump_json()},
maxlen=config.stream_max_length,
)
logger.info(f"Published completion for operation {operation_id}")

View File

@@ -1,344 +0,0 @@
"""Shared completion handling for operation success and failure.
This module provides common logic for handling operation completion from both:
- The Redis Streams consumer (completion_consumer.py)
- The HTTP webhook endpoint (routes.py)
"""
import logging
from typing import Any
import orjson
from prisma import Prisma
from . import service as chat_service
from . import stream_registry
from .response_model import StreamError, StreamToolOutputAvailable
from .tools.models import ErrorResponse
logger = logging.getLogger(__name__)
# Tools that produce agent_json that needs to be saved to library
AGENT_GENERATION_TOOLS = {"create_agent", "edit_agent"}
# Keys that should be stripped from agent_json when returning in error responses
SENSITIVE_KEYS = frozenset(
{
"api_key",
"apikey",
"api_secret",
"password",
"secret",
"credentials",
"credential",
"token",
"access_token",
"refresh_token",
"private_key",
"privatekey",
"auth",
"authorization",
}
)
def _sanitize_agent_json(obj: Any) -> Any:
"""Recursively sanitize agent_json by removing sensitive keys.
Args:
obj: The object to sanitize (dict, list, or primitive)
Returns:
Sanitized copy with sensitive keys removed/redacted
"""
if isinstance(obj, dict):
return {
k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else _sanitize_agent_json(v)
for k, v in obj.items()
}
elif isinstance(obj, list):
return [_sanitize_agent_json(item) for item in obj]
else:
return obj
class ToolMessageUpdateError(Exception):
"""Raised when updating a tool message in the database fails."""
pass
async def _update_tool_message(
session_id: str,
tool_call_id: str,
content: str,
prisma_client: Prisma | None,
) -> None:
"""Update tool message in database.
Args:
session_id: The session ID
tool_call_id: The tool call ID to update
content: The new content for the message
prisma_client: Optional Prisma client. If None, uses chat_service.
Raises:
ToolMessageUpdateError: If the database update fails. The caller should
handle this to avoid marking the task as completed with inconsistent state.
"""
try:
if prisma_client:
# Use provided Prisma client (for consumer with its own connection)
updated_count = await prisma_client.chatmessage.update_many(
where={
"sessionId": session_id,
"toolCallId": tool_call_id,
},
data={"content": content},
)
# Check if any rows were updated - 0 means message not found
if updated_count == 0:
raise ToolMessageUpdateError(
f"No message found with tool_call_id={tool_call_id} in session {session_id}"
)
else:
# Use service function (for webhook endpoint)
await chat_service._update_pending_operation(
session_id=session_id,
tool_call_id=tool_call_id,
result=content,
)
except ToolMessageUpdateError:
raise
except Exception as e:
logger.error(f"[COMPLETION] Failed to update tool message: {e}", exc_info=True)
raise ToolMessageUpdateError(
f"Failed to update tool message for tool_call_id={tool_call_id}: {e}"
) from e
def serialize_result(result: dict | list | str | int | float | bool | None) -> str:
"""Serialize result to JSON string with sensible defaults.
Args:
result: The result to serialize. Can be a dict, list, string,
number, boolean, or None.
Returns:
JSON string representation of the result. Returns '{"status": "completed"}'
only when result is explicitly None.
"""
if isinstance(result, str):
return result
if result is None:
return '{"status": "completed"}'
return orjson.dumps(result).decode("utf-8")
async def _save_agent_from_result(
result: dict[str, Any],
user_id: str | None,
tool_name: str,
) -> dict[str, Any]:
"""Save agent to library if result contains agent_json.
Args:
result: The result dict that may contain agent_json
user_id: The user ID to save the agent for
tool_name: The tool name (create_agent or edit_agent)
Returns:
Updated result dict with saved agent details, or original result if no agent_json
"""
if not user_id:
logger.warning("[COMPLETION] Cannot save agent: no user_id in task")
return result
agent_json = result.get("agent_json")
if not agent_json:
logger.warning(
f"[COMPLETION] {tool_name} completed but no agent_json in result"
)
return result
try:
from .tools.agent_generator import save_agent_to_library
is_update = tool_name == "edit_agent"
created_graph, library_agent = await save_agent_to_library(
agent_json, user_id, is_update=is_update
)
logger.info(
f"[COMPLETION] Saved agent '{created_graph.name}' to library "
f"(graph_id={created_graph.id}, library_agent_id={library_agent.id})"
)
# Return a response similar to AgentSavedResponse
return {
"type": "agent_saved",
"message": f"Agent '{created_graph.name}' has been saved to your library!",
"agent_id": created_graph.id,
"agent_name": created_graph.name,
"library_agent_id": library_agent.id,
"library_agent_link": f"/library/agents/{library_agent.id}",
"agent_page_link": f"/build?flowID={created_graph.id}",
}
except Exception as e:
logger.error(
f"[COMPLETION] Failed to save agent to library: {e}",
exc_info=True,
)
# Return error but don't fail the whole operation
# Sanitize agent_json to remove sensitive keys before returning
return {
"type": "error",
"message": f"Agent was generated but failed to save: {str(e)}",
"error": str(e),
"agent_json": _sanitize_agent_json(agent_json),
}
async def process_operation_success(
task: stream_registry.ActiveTask,
result: dict | str | None,
prisma_client: Prisma | None = None,
) -> None:
"""Handle successful operation completion.
Publishes the result to the stream registry, updates the database,
generates LLM continuation, and marks the task as completed.
Args:
task: The active task that completed
result: The result data from the operation
prisma_client: Optional Prisma client for database operations.
If None, uses chat_service._update_pending_operation instead.
Raises:
ToolMessageUpdateError: If the database update fails. The task will be
marked as failed instead of completed to avoid inconsistent state.
"""
# For agent generation tools, save the agent to library
if task.tool_name in AGENT_GENERATION_TOOLS and isinstance(result, dict):
result = await _save_agent_from_result(result, task.user_id, task.tool_name)
# Serialize result for output (only substitute default when result is exactly None)
result_output = result if result is not None else {"status": "completed"}
output_str = (
result_output
if isinstance(result_output, str)
else orjson.dumps(result_output).decode("utf-8")
)
# Publish result to stream registry
await stream_registry.publish_chunk(
task.task_id,
StreamToolOutputAvailable(
toolCallId=task.tool_call_id,
toolName=task.tool_name,
output=output_str,
success=True,
),
)
# Update pending operation in database
# If this fails, we must not continue to mark the task as completed
result_str = serialize_result(result)
try:
await _update_tool_message(
session_id=task.session_id,
tool_call_id=task.tool_call_id,
content=result_str,
prisma_client=prisma_client,
)
except ToolMessageUpdateError:
# DB update failed - mark task as failed to avoid inconsistent state
logger.error(
f"[COMPLETION] DB update failed for task {task.task_id}, "
"marking as failed instead of completed"
)
await stream_registry.publish_chunk(
task.task_id,
StreamError(errorText="Failed to save operation result to database"),
)
await stream_registry.mark_task_completed(task.task_id, status="failed")
raise
# Generate LLM continuation with streaming
try:
await chat_service._generate_llm_continuation_with_streaming(
session_id=task.session_id,
user_id=task.user_id,
task_id=task.task_id,
)
except Exception as e:
logger.error(
f"[COMPLETION] Failed to generate LLM continuation: {e}",
exc_info=True,
)
# Mark task as completed and release Redis lock
await stream_registry.mark_task_completed(task.task_id, status="completed")
try:
await chat_service._mark_operation_completed(task.tool_call_id)
except Exception as e:
logger.error(f"[COMPLETION] Failed to mark operation completed: {e}")
logger.info(
f"[COMPLETION] Successfully processed completion for task {task.task_id}"
)
async def process_operation_failure(
task: stream_registry.ActiveTask,
error: str | None,
prisma_client: Prisma | None = None,
) -> None:
"""Handle failed operation completion.
Publishes the error to the stream registry, updates the database with
the error response, and marks the task as failed.
Args:
task: The active task that failed
error: The error message from the operation
prisma_client: Optional Prisma client for database operations.
If None, uses chat_service._update_pending_operation instead.
"""
error_msg = error or "Operation failed"
# Publish error to stream registry
await stream_registry.publish_chunk(
task.task_id,
StreamError(errorText=error_msg),
)
# Update pending operation with error
# If this fails, we still continue to mark the task as failed
error_response = ErrorResponse(
message=error_msg,
error=error,
)
try:
await _update_tool_message(
session_id=task.session_id,
tool_call_id=task.tool_call_id,
content=error_response.model_dump_json(),
prisma_client=prisma_client,
)
except ToolMessageUpdateError:
# DB update failed - log but continue with cleanup
logger.error(
f"[COMPLETION] DB update failed while processing failure for task {task.task_id}, "
"continuing with cleanup"
)
# Mark task as failed and release Redis lock
await stream_registry.mark_task_completed(task.task_id, status="failed")
try:
await chat_service._mark_operation_completed(task.tool_call_id)
except Exception as e:
logger.error(f"[COMPLETION] Failed to mark operation completed: {e}")
logger.info(f"[COMPLETION] Processed failure for task {task.task_id}: {error_msg}")

View File

@@ -1,29 +1,37 @@
"""Chat API routes for chat session management and streaming via SSE."""
import asyncio
import logging
import uuid as uuid_module
from collections.abc import AsyncGenerator
from typing import Annotated
from uuid import uuid4
from autogpt_libs import auth
from fastapi import APIRouter, Depends, Header, HTTPException, Query, Response, Security
from fastapi import APIRouter, Depends, HTTPException, Query, Response, Security
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from backend.util.exceptions import NotFoundError
from . import service as chat_service
from . import stream_registry
from .completion_handler import process_operation_failure, process_operation_success
from .config import ChatConfig
from .model import ChatSession, create_chat_session, get_chat_session, get_user_sessions
from .response_model import StreamFinish, StreamHeartbeat
from .tools.models import (
from backend.copilot import service as chat_service
from backend.copilot import stream_registry
from backend.copilot.config import ChatConfig
from backend.copilot.executor.utils import enqueue_cancel_task, enqueue_copilot_task
from backend.copilot.model import (
ChatMessage,
ChatSession,
append_and_save_message,
create_chat_session,
delete_chat_session,
get_chat_session,
get_user_sessions,
)
from backend.copilot.response_model import StreamError, StreamFinish, StreamHeartbeat
from backend.copilot.tools.models import (
AgentDetailsResponse,
AgentOutputResponse,
AgentPreviewResponse,
AgentSavedResponse,
AgentsFoundResponse,
BlockDetailsResponse,
BlockListResponse,
BlockOutputResponse,
ClarificationNeededResponse,
@@ -34,12 +42,12 @@ from .tools.models import (
InputValidationErrorResponse,
NeedLoginResponse,
NoResultsResponse,
OperationInProgressResponse,
OperationPendingResponse,
OperationStartedResponse,
SetupRequirementsResponse,
SuggestedGoalResponse,
UnderstandingUpdatedResponse,
)
from backend.copilot.tracking import track_user_message
from backend.util.exceptions import NotFoundError
config = ChatConfig()
@@ -84,10 +92,8 @@ class CreateSessionResponse(BaseModel):
class ActiveStreamInfo(BaseModel):
"""Information about an active stream for reconnection."""
task_id: str
turn_id: str
last_message_id: str # Redis Stream message ID for resumption
operation_id: str # Operation ID for completion tracking
tool_name: str # Name of the tool being executed
class SessionDetailResponse(BaseModel):
@@ -117,12 +123,11 @@ class ListSessionsResponse(BaseModel):
total: int
class OperationCompleteRequest(BaseModel):
"""Request model for external completion webhook."""
class CancelTaskResponse(BaseModel):
"""Response model for the cancel task endpoint."""
success: bool
result: dict | str | None = None
error: str | None = None
cancelled: bool
reason: str | None = None
# ========== Routes ==========
@@ -199,6 +204,43 @@ async def create_session(
)
@router.delete(
"/sessions/{session_id}",
dependencies=[Security(auth.requires_user)],
status_code=204,
responses={404: {"description": "Session not found or access denied"}},
)
async def delete_session(
session_id: str,
user_id: Annotated[str, Security(auth.get_user_id)],
) -> Response:
"""
Delete a chat session.
Permanently removes a chat session and all its messages.
Only the owner can delete their sessions.
Args:
session_id: The session ID to delete.
user_id: The authenticated user's ID.
Returns:
204 No Content on success.
Raises:
HTTPException: 404 if session not found or not owned by user.
"""
deleted = await delete_chat_session(session_id, user_id)
if not deleted:
raise HTTPException(
status_code=404,
detail=f"Session {session_id} not found or access denied",
)
return Response(status_code=204)
@router.get(
"/sessions/{session_id}",
)
@@ -210,7 +252,7 @@ async def get_session(
Retrieve the details of a specific chat session.
Looks up a chat session by ID for the given user (if authenticated) and returns all session data including messages.
If there's an active stream for this session, returns the task_id for reconnection.
If there's an active stream for this session, returns active_stream info for reconnection.
Args:
session_id: The unique identifier for the desired chat session.
@@ -228,24 +270,21 @@ async def get_session(
# Check if there's an active stream for this session
active_stream_info = None
active_task, last_message_id = await stream_registry.get_active_task_for_session(
active_session, last_message_id = await stream_registry.get_active_session(
session_id, user_id
)
if active_task:
# Filter out the in-progress assistant message from the session response.
# The client will receive the complete assistant response through the SSE
# stream replay instead, preventing duplicate content.
if messages and messages[-1].get("role") == "assistant":
messages = messages[:-1]
# Use "0-0" as last_message_id to replay the stream from the beginning.
# Since we filtered out the cached assistant message, the client needs
# the full stream to reconstruct the response.
logger.info(
f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
)
if active_session:
# Keep the assistant message (including tool_calls) so the frontend can
# render the correct tool UI (e.g. CreateAgent with mini game).
# convertChatSessionToUiMessages handles isComplete=false by setting
# tool parts without output to state "input-available".
active_stream_info = ActiveStreamInfo(
task_id=active_task.task_id,
last_message_id="0-0",
operation_id=active_task.operation_id,
tool_name=active_task.tool_name,
turn_id=active_session.turn_id,
last_message_id=last_message_id,
)
return SessionDetailResponse(
@@ -258,6 +297,50 @@ async def get_session(
)
@router.post(
"/sessions/{session_id}/cancel",
status_code=200,
)
async def cancel_session_task(
session_id: str,
user_id: Annotated[str | None, Depends(auth.get_user_id)],
) -> CancelTaskResponse:
"""Cancel the active streaming task for a session.
Publishes a cancel event to the executor via RabbitMQ FANOUT, then
polls Redis until the task status flips from ``running`` or a timeout
(5 s) is reached. Returns only after the cancellation is confirmed.
"""
await _validate_and_get_session(session_id, user_id)
active_session, _ = await stream_registry.get_active_session(session_id, user_id)
if not active_session:
return CancelTaskResponse(cancelled=False, reason="no_active_task")
await enqueue_cancel_task(session_id)
logger.info(f"[CANCEL] Published cancel for session ...{session_id[-8:]}")
# Poll until the executor confirms the task is no longer running.
poll_interval = 0.5
max_wait = 5.0
waited = 0.0
while waited < max_wait:
await asyncio.sleep(poll_interval)
waited += poll_interval
task = await stream_registry.get_session(session_id)
if task is None or task.status != "running":
logger.info(
f"[CANCEL] Session ...{session_id[-8:]} confirmed stopped "
f"(status={task.status if task else 'gone'}) after {waited:.1f}s"
)
return CancelTaskResponse(cancelled=True)
logger.warning(
f"[CANCEL] Session ...{session_id[-8:]} not confirmed after {max_wait}s"
)
return CancelTaskResponse(cancelled=True, reason="cancel_published_not_confirmed")
@router.post(
"/sessions/{session_id}/stream",
)
@@ -275,16 +358,15 @@ async def stream_chat_post(
- Tool execution results
The AI generation runs in a background task that continues even if the client disconnects.
All chunks are written to Redis for reconnection support. If the client disconnects,
they can reconnect using GET /tasks/{task_id}/stream to resume from where they left off.
All chunks are written to a per-turn Redis stream for reconnection support. If the client
disconnects, they can reconnect using GET /sessions/{session_id}/stream to resume.
Args:
session_id: The chat session identifier to associate with the streamed messages.
request: Request body containing message, is_user_message, and optional context.
user_id: Optional authenticated user ID.
Returns:
StreamingResponse: SSE-formatted response chunks. First chunk is a "start" event
containing the task_id for reconnection.
StreamingResponse: SSE-formatted response chunks.
"""
import asyncio
@@ -300,10 +382,9 @@ async def stream_chat_post(
f"user={user_id}, message_len={len(request.message)}",
extra={"json_fields": log_meta},
)
session = await _validate_and_get_session(session_id, user_id)
await _validate_and_get_session(session_id, user_id)
logger.info(
f"[TIMING] session validated in {(time.perf_counter() - stream_start_time)*1000:.1f}ms",
f"[TIMING] session validated in {(time.perf_counter() - stream_start_time) * 1000:.1f}ms",
extra={
"json_fields": {
**log_meta,
@@ -312,22 +393,39 @@ async def stream_chat_post(
},
)
# Atomically append user message to session BEFORE creating task to avoid
# race condition where GET_SESSION sees task as "running" but message isn't
# saved yet. append_and_save_message re-fetches inside a lock to prevent
# message loss from concurrent requests.
if request.message:
message = ChatMessage(
role="user" if request.is_user_message else "assistant",
content=request.message,
)
if request.is_user_message:
track_user_message(
user_id=user_id,
session_id=session_id,
message_length=len(request.message),
)
logger.info(f"[STREAM] Saving user message to session {session_id}")
await append_and_save_message(session_id, message)
logger.info(f"[STREAM] User message saved for session {session_id}")
# Create a task in the stream registry for reconnection support
task_id = str(uuid_module.uuid4())
operation_id = str(uuid_module.uuid4())
log_meta["task_id"] = task_id
turn_id = str(uuid4())
log_meta["turn_id"] = turn_id
task_create_start = time.perf_counter()
await stream_registry.create_task(
task_id=task_id,
await stream_registry.create_session_task(
session_id=session_id,
user_id=user_id,
tool_call_id="chat_stream", # Not a tool call, but needed for the model
tool_call_id="chat_stream",
tool_name="chat",
operation_id=operation_id,
turn_id=turn_id,
)
logger.info(
f"[TIMING] create_task completed in {(time.perf_counter() - task_create_start)*1000:.1f}ms",
f"[TIMING] create_session_task completed in {(time.perf_counter() - task_create_start) * 1000:.1f}ms",
extra={
"json_fields": {
**log_meta,
@@ -336,82 +434,21 @@ async def stream_chat_post(
},
)
# Background task that runs the AI generation independently of SSE connection
async def run_ai_generation():
import time as time_module
# Per-turn stream is always fresh (unique turn_id), subscribe from beginning
subscribe_from_id = "0-0"
gen_start_time = time_module.perf_counter()
logger.info(
f"[TIMING] run_ai_generation STARTED, task={task_id}, session={session_id}, user={user_id}",
extra={"json_fields": log_meta},
)
first_chunk_time, ttfc = None, None
chunk_count = 0
try:
async for chunk in chat_service.stream_chat_completion(
session_id,
request.message,
is_user_message=request.is_user_message,
user_id=user_id,
session=session, # Pass pre-fetched session to avoid double-fetch
context=request.context,
_task_id=task_id, # Pass task_id so service emits start with taskId for reconnection
):
chunk_count += 1
if first_chunk_time is None:
first_chunk_time = time_module.perf_counter()
ttfc = first_chunk_time - gen_start_time
logger.info(
f"[TIMING] FIRST AI CHUNK at {ttfc:.2f}s, type={type(chunk).__name__}",
extra={
"json_fields": {
**log_meta,
"chunk_type": type(chunk).__name__,
"time_to_first_chunk_ms": ttfc * 1000,
}
},
)
# Write to Redis (subscribers will receive via XREAD)
await stream_registry.publish_chunk(task_id, chunk)
await enqueue_copilot_task(
session_id=session_id,
user_id=user_id,
message=request.message,
turn_id=turn_id,
is_user_message=request.is_user_message,
context=request.context,
)
gen_end_time = time_module.perf_counter()
total_time = (gen_end_time - gen_start_time) * 1000
logger.info(
f"[TIMING] run_ai_generation FINISHED in {total_time/1000:.1f}s; "
f"task={task_id}, session={session_id}, "
f"ttfc={ttfc or -1:.2f}s, n_chunks={chunk_count}",
extra={
"json_fields": {
**log_meta,
"total_time_ms": total_time,
"time_to_first_chunk_ms": (
ttfc * 1000 if ttfc is not None else None
),
"n_chunks": chunk_count,
}
},
)
await stream_registry.mark_task_completed(task_id, "completed")
except Exception as e:
elapsed = time_module.perf_counter() - gen_start_time
logger.error(
f"[TIMING] run_ai_generation ERROR after {elapsed:.2f}s: {e}",
extra={
"json_fields": {
**log_meta,
"elapsed_ms": elapsed * 1000,
"error": str(e),
}
},
)
await stream_registry.mark_task_completed(task_id, "failed")
# Start the AI generation in a background task
bg_task = asyncio.create_task(run_ai_generation())
await stream_registry.set_task_asyncio_task(task_id, bg_task)
setup_time = (time.perf_counter() - stream_start_time) * 1000
logger.info(
f"[TIMING] Background task started, setup={setup_time:.1f}ms",
f"[TIMING] Task enqueued to RabbitMQ, setup={setup_time:.1f}ms",
extra={"json_fields": {**log_meta, "setup_time_ms": setup_time}},
)
@@ -421,7 +458,7 @@ async def stream_chat_post(
event_gen_start = time_module.perf_counter()
logger.info(
f"[TIMING] event_generator STARTED, task={task_id}, session={session_id}, "
f"[TIMING] event_generator STARTED, turn={turn_id}, session={session_id}, "
f"user={user_id}",
extra={"json_fields": log_meta},
)
@@ -429,11 +466,12 @@ async def stream_chat_post(
first_chunk_yielded = False
chunks_yielded = 0
try:
# Subscribe to the task stream (this replays existing messages + live updates)
subscriber_queue = await stream_registry.subscribe_to_task(
task_id=task_id,
# Subscribe from the position we captured before enqueuing
# This avoids replaying old messages while catching all new ones
subscriber_queue = await stream_registry.subscribe_to_session(
session_id=session_id,
user_id=user_id,
last_message_id="0-0", # Get all messages from the beginning
last_message_id=subscribe_from_id,
)
if subscriber_queue is None:
@@ -506,23 +544,29 @@ async def stream_chat_post(
"json_fields": {**log_meta, "elapsed_ms": elapsed, "error": str(e)}
},
)
# Surface error to frontend so it doesn't appear stuck
yield StreamError(
errorText="An error occurred. Please try again.",
code="stream_error",
).to_sse()
yield StreamFinish().to_sse()
finally:
# Unsubscribe when client disconnects or stream ends to prevent resource leak
# Unsubscribe when client disconnects or stream ends
if subscriber_queue is not None:
try:
await stream_registry.unsubscribe_from_task(
task_id, subscriber_queue
await stream_registry.unsubscribe_from_session(
session_id, subscriber_queue
)
except Exception as unsub_err:
logger.error(
f"Error unsubscribing from task {task_id}: {unsub_err}",
f"Error unsubscribing from session {session_id}: {unsub_err}",
exc_info=True,
)
# AI SDK protocol termination - always yield even if unsubscribe fails
total_time = time_module.perf_counter() - event_gen_start
logger.info(
f"[TIMING] event_generator FINISHED in {total_time:.2f}s; "
f"task={task_id}, session={session_id}, n_chunks={chunks_yielded}",
f"turn={turn_id}, session={session_id}, n_chunks={chunks_yielded}",
extra={
"json_fields": {
**log_meta,
@@ -569,17 +613,22 @@ async def resume_session_stream(
"""
import asyncio
active_task, _last_id = await stream_registry.get_active_task_for_session(
active_session, last_message_id = await stream_registry.get_active_session(
session_id, user_id
)
if not active_task:
if not active_session:
return Response(status_code=204)
subscriber_queue = await stream_registry.subscribe_to_task(
task_id=active_task.task_id,
# Subscribe from the beginning ("0-0") to replay all chunks for this turn.
# This is necessary because hydrated messages filter out incomplete tool calls
# to avoid "No tool invocation found" errors. The resume stream delivers
# those tool calls fresh with proper SDK state.
# The AI SDK's deduplication will handle any duplicate chunks.
subscriber_queue = await stream_registry.subscribe_to_session(
session_id=session_id,
user_id=user_id,
last_message_id="0-0", # Full replay so useChat rebuilds the message
last_message_id="0-0",
)
if subscriber_queue is None:
@@ -615,12 +664,12 @@ async def resume_session_stream(
logger.error(f"Error in resume stream for session {session_id}: {e}")
finally:
try:
await stream_registry.unsubscribe_from_task(
active_task.task_id, subscriber_queue
await stream_registry.unsubscribe_from_session(
session_id, subscriber_queue
)
except Exception as unsub_err:
logger.error(
f"Error unsubscribing from task {active_task.task_id}: {unsub_err}",
f"Error unsubscribing from session {active_session.session_id}: {unsub_err}",
exc_info=True,
)
logger.info(
@@ -671,231 +720,6 @@ async def session_assign_user(
return {"status": "ok"}
# ========== Task Streaming (SSE Reconnection) ==========
@router.get(
"/tasks/{task_id}/stream",
)
async def stream_task(
task_id: str,
user_id: str | None = Depends(auth.get_user_id),
last_message_id: str = Query(
default="0-0",
description="Last Redis Stream message ID received (e.g., '1706540123456-0'). Use '0-0' for full replay.",
),
):
"""
Reconnect to a long-running task's SSE stream.
When a long-running operation (like agent generation) starts, the client
receives a task_id. If the connection drops, the client can reconnect
using this endpoint to resume receiving updates.
Args:
task_id: The task ID from the operation_started response.
user_id: Authenticated user ID for ownership validation.
last_message_id: Last Redis Stream message ID received ("0-0" for full replay).
Returns:
StreamingResponse: SSE-formatted response chunks starting after last_message_id.
Raises:
HTTPException: 404 if task not found, 410 if task expired, 403 if access denied.
"""
# Check task existence and expiry before subscribing
task, error_code = await stream_registry.get_task_with_expiry_info(task_id)
if error_code == "TASK_EXPIRED":
raise HTTPException(
status_code=410,
detail={
"code": "TASK_EXPIRED",
"message": "This operation has expired. Please try again.",
},
)
if error_code == "TASK_NOT_FOUND":
raise HTTPException(
status_code=404,
detail={
"code": "TASK_NOT_FOUND",
"message": f"Task {task_id} not found.",
},
)
# Validate ownership if task has an owner
if task and task.user_id and user_id != task.user_id:
raise HTTPException(
status_code=403,
detail={
"code": "ACCESS_DENIED",
"message": "You do not have access to this task.",
},
)
# Get subscriber queue from stream registry
subscriber_queue = await stream_registry.subscribe_to_task(
task_id=task_id,
user_id=user_id,
last_message_id=last_message_id,
)
if subscriber_queue is None:
raise HTTPException(
status_code=404,
detail={
"code": "TASK_NOT_FOUND",
"message": f"Task {task_id} not found or access denied.",
},
)
async def event_generator() -> AsyncGenerator[str, None]:
import asyncio
heartbeat_interval = 15.0 # Send heartbeat every 15 seconds
try:
while True:
try:
# Wait for next chunk with timeout for heartbeats
chunk = await asyncio.wait_for(
subscriber_queue.get(), timeout=heartbeat_interval
)
yield chunk.to_sse()
# Check for finish signal
if isinstance(chunk, StreamFinish):
break
except asyncio.TimeoutError:
# Send heartbeat to keep connection alive
yield StreamHeartbeat().to_sse()
except Exception as e:
logger.error(f"Error in task stream {task_id}: {e}", exc_info=True)
finally:
# Unsubscribe when client disconnects or stream ends
try:
await stream_registry.unsubscribe_from_task(task_id, subscriber_queue)
except Exception as unsub_err:
logger.error(
f"Error unsubscribing from task {task_id}: {unsub_err}",
exc_info=True,
)
# AI SDK protocol termination - always yield even if unsubscribe fails
yield "data: [DONE]\n\n"
return StreamingResponse(
event_generator(),
media_type="text/event-stream",
headers={
"Cache-Control": "no-cache",
"Connection": "keep-alive",
"X-Accel-Buffering": "no",
"x-vercel-ai-ui-message-stream": "v1",
},
)
@router.get(
"/tasks/{task_id}",
)
async def get_task_status(
task_id: str,
user_id: str | None = Depends(auth.get_user_id),
) -> dict:
"""
Get the status of a long-running task.
Args:
task_id: The task ID to check.
user_id: Authenticated user ID for ownership validation.
Returns:
dict: Task status including task_id, status, tool_name, and operation_id.
Raises:
NotFoundError: If task_id is not found or user doesn't have access.
"""
task = await stream_registry.get_task(task_id)
if task is None:
raise NotFoundError(f"Task {task_id} not found.")
# Validate ownership - if task has an owner, requester must match
if task.user_id and user_id != task.user_id:
raise NotFoundError(f"Task {task_id} not found.")
return {
"task_id": task.task_id,
"session_id": task.session_id,
"status": task.status,
"tool_name": task.tool_name,
"operation_id": task.operation_id,
"created_at": task.created_at.isoformat(),
}
# ========== External Completion Webhook ==========
@router.post(
"/operations/{operation_id}/complete",
status_code=200,
)
async def complete_operation(
operation_id: str,
request: OperationCompleteRequest,
x_api_key: str | None = Header(default=None),
) -> dict:
"""
External completion webhook for long-running operations.
Called by Agent Generator (or other services) when an operation completes.
This triggers the stream registry to publish completion and continue LLM generation.
Args:
operation_id: The operation ID to complete.
request: Completion payload with success status and result/error.
x_api_key: Internal API key for authentication.
Returns:
dict: Status of the completion.
Raises:
HTTPException: If API key is invalid or operation not found.
"""
# Validate internal API key - reject if not configured or invalid
if not config.internal_api_key:
logger.error(
"Operation complete webhook rejected: CHAT_INTERNAL_API_KEY not configured"
)
raise HTTPException(
status_code=503,
detail="Webhook not available: internal API key not configured",
)
if x_api_key != config.internal_api_key:
raise HTTPException(status_code=401, detail="Invalid API key")
# Find task by operation_id
task = await stream_registry.find_task_by_operation_id(operation_id)
if task is None:
raise HTTPException(
status_code=404,
detail=f"Operation {operation_id} not found",
)
logger.info(
f"Received completion webhook for operation {operation_id} "
f"(task_id={task.task_id}, success={request.success})"
)
if request.success:
await process_operation_success(task, request.result)
else:
await process_operation_failure(task, request.error)
return {"status": "ok", "task_id": task.task_id}
# ========== Configuration ==========
@@ -970,13 +794,12 @@ ToolResponseUnion = (
| AgentPreviewResponse
| AgentSavedResponse
| ClarificationNeededResponse
| SuggestedGoalResponse
| BlockListResponse
| BlockDetailsResponse
| BlockOutputResponse
| DocSearchResultsResponse
| DocPageResponse
| OperationStartedResponse
| OperationPendingResponse
| OperationInProgressResponse
)

View File

@@ -1,82 +0,0 @@
import logging
from os import getenv
import pytest
from . import service as chat_service
from .model import create_chat_session, get_chat_session, upsert_chat_session
from .response_model import (
StreamError,
StreamFinish,
StreamTextDelta,
StreamToolOutputAvailable,
)
logger = logging.getLogger(__name__)
@pytest.mark.asyncio(loop_scope="session")
async def test_stream_chat_completion(setup_test_user, test_user_id):
"""
Test the stream_chat_completion function.
"""
api_key: str | None = getenv("OPEN_ROUTER_API_KEY")
if not api_key:
return pytest.skip("OPEN_ROUTER_API_KEY is not set, skipping test")
session = await create_chat_session(test_user_id)
has_errors = False
has_ended = False
assistant_message = ""
async for chunk in chat_service.stream_chat_completion(
session.session_id, "Hello, how are you?", user_id=session.user_id
):
logger.info(chunk)
if isinstance(chunk, StreamError):
has_errors = True
if isinstance(chunk, StreamTextDelta):
assistant_message += chunk.delta
if isinstance(chunk, StreamFinish):
has_ended = True
assert has_ended, "Chat completion did not end"
assert not has_errors, "Error occurred while streaming chat completion"
assert assistant_message, "Assistant message is empty"
@pytest.mark.asyncio(loop_scope="session")
async def test_stream_chat_completion_with_tool_calls(setup_test_user, test_user_id):
"""
Test the stream_chat_completion function.
"""
api_key: str | None = getenv("OPEN_ROUTER_API_KEY")
if not api_key:
return pytest.skip("OPEN_ROUTER_API_KEY is not set, skipping test")
session = await create_chat_session(test_user_id)
session = await upsert_chat_session(session)
has_errors = False
has_ended = False
had_tool_calls = False
async for chunk in chat_service.stream_chat_completion(
session.session_id,
"Please find me an agent that can help me with my business. Use the query 'moneny printing agent'",
user_id=session.user_id,
):
logger.info(chunk)
if isinstance(chunk, StreamError):
has_errors = True
if isinstance(chunk, StreamFinish):
has_ended = True
if isinstance(chunk, StreamToolOutputAvailable):
had_tool_calls = True
assert has_ended, "Chat completion did not end"
assert not has_errors, "Error occurred while streaming chat completion"
assert had_tool_calls, "Tool calls did not occur"
session = await get_chat_session(session.session_id)
assert session, "Session not found"
assert session.usage, "Usage is empty"

View File

@@ -1,139 +0,0 @@
"""Tests for block filtering in FindBlockTool."""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from backend.api.features.chat.tools.find_block import (
COPILOT_EXCLUDED_BLOCK_IDS,
COPILOT_EXCLUDED_BLOCK_TYPES,
FindBlockTool,
)
from backend.api.features.chat.tools.models import BlockListResponse
from backend.blocks._base import BlockType
from ._test_data import make_session
_TEST_USER_ID = "test-user-find-block"
def make_mock_block(
block_id: str, name: str, block_type: BlockType, disabled: bool = False
):
"""Create a mock block for testing."""
mock = MagicMock()
mock.id = block_id
mock.name = name
mock.description = f"{name} description"
mock.block_type = block_type
mock.disabled = disabled
mock.input_schema = MagicMock()
mock.input_schema.jsonschema.return_value = {"properties": {}, "required": []}
mock.input_schema.get_credentials_fields.return_value = {}
mock.output_schema = MagicMock()
mock.output_schema.jsonschema.return_value = {}
mock.categories = []
return mock
class TestFindBlockFiltering:
"""Tests for block filtering in FindBlockTool."""
def test_excluded_block_types_contains_expected_types(self):
"""Verify COPILOT_EXCLUDED_BLOCK_TYPES contains all graph-only types."""
assert BlockType.INPUT in COPILOT_EXCLUDED_BLOCK_TYPES
assert BlockType.OUTPUT in COPILOT_EXCLUDED_BLOCK_TYPES
assert BlockType.WEBHOOK in COPILOT_EXCLUDED_BLOCK_TYPES
assert BlockType.WEBHOOK_MANUAL in COPILOT_EXCLUDED_BLOCK_TYPES
assert BlockType.NOTE in COPILOT_EXCLUDED_BLOCK_TYPES
assert BlockType.HUMAN_IN_THE_LOOP in COPILOT_EXCLUDED_BLOCK_TYPES
assert BlockType.AGENT in COPILOT_EXCLUDED_BLOCK_TYPES
def test_excluded_block_ids_contains_smart_decision_maker(self):
"""Verify SmartDecisionMakerBlock is in COPILOT_EXCLUDED_BLOCK_IDS."""
assert "3b191d9f-356f-482d-8238-ba04b6d18381" in COPILOT_EXCLUDED_BLOCK_IDS
@pytest.mark.asyncio(loop_scope="session")
async def test_excluded_block_type_filtered_from_results(self):
"""Verify blocks with excluded BlockTypes are filtered from search results."""
session = make_session(user_id=_TEST_USER_ID)
# Mock search returns an INPUT block (excluded) and a STANDARD block (included)
search_results = [
{"content_id": "input-block-id", "score": 0.9},
{"content_id": "standard-block-id", "score": 0.8},
]
input_block = make_mock_block("input-block-id", "Input Block", BlockType.INPUT)
standard_block = make_mock_block(
"standard-block-id", "HTTP Request", BlockType.STANDARD
)
def mock_get_block(block_id):
return {
"input-block-id": input_block,
"standard-block-id": standard_block,
}.get(block_id)
with patch(
"backend.api.features.chat.tools.find_block.unified_hybrid_search",
new_callable=AsyncMock,
return_value=(search_results, 2),
):
with patch(
"backend.api.features.chat.tools.find_block.get_block",
side_effect=mock_get_block,
):
tool = FindBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID, session=session, query="test"
)
# Should only return the standard block, not the INPUT block
assert isinstance(response, BlockListResponse)
assert len(response.blocks) == 1
assert response.blocks[0].id == "standard-block-id"
@pytest.mark.asyncio(loop_scope="session")
async def test_excluded_block_id_filtered_from_results(self):
"""Verify SmartDecisionMakerBlock is filtered from search results."""
session = make_session(user_id=_TEST_USER_ID)
smart_decision_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
search_results = [
{"content_id": smart_decision_id, "score": 0.9},
{"content_id": "normal-block-id", "score": 0.8},
]
# SmartDecisionMakerBlock has STANDARD type but is excluded by ID
smart_block = make_mock_block(
smart_decision_id, "Smart Decision Maker", BlockType.STANDARD
)
normal_block = make_mock_block(
"normal-block-id", "Normal Block", BlockType.STANDARD
)
def mock_get_block(block_id):
return {
smart_decision_id: smart_block,
"normal-block-id": normal_block,
}.get(block_id)
with patch(
"backend.api.features.chat.tools.find_block.unified_hybrid_search",
new_callable=AsyncMock,
return_value=(search_results, 2),
):
with patch(
"backend.api.features.chat.tools.find_block.get_block",
side_effect=mock_get_block,
):
tool = FindBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID, session=session, query="decision"
)
# Should only return normal block, not SmartDecisionMakerBlock
assert isinstance(response, BlockListResponse)
assert len(response.blocks) == 1
assert response.blocks[0].id == "normal-block-id"

View File

@@ -1,106 +0,0 @@
"""Tests for block execution guards in RunBlockTool."""
from unittest.mock import MagicMock, patch
import pytest
from backend.api.features.chat.tools.models import ErrorResponse
from backend.api.features.chat.tools.run_block import RunBlockTool
from backend.blocks._base import BlockType
from ._test_data import make_session
_TEST_USER_ID = "test-user-run-block"
def make_mock_block(
block_id: str, name: str, block_type: BlockType, disabled: bool = False
):
"""Create a mock block for testing."""
mock = MagicMock()
mock.id = block_id
mock.name = name
mock.block_type = block_type
mock.disabled = disabled
mock.input_schema = MagicMock()
mock.input_schema.jsonschema.return_value = {"properties": {}, "required": []}
mock.input_schema.get_credentials_fields_info.return_value = []
return mock
class TestRunBlockFiltering:
"""Tests for block execution guards in RunBlockTool."""
@pytest.mark.asyncio(loop_scope="session")
async def test_excluded_block_type_returns_error(self):
"""Attempting to execute a block with excluded BlockType returns error."""
session = make_session(user_id=_TEST_USER_ID)
input_block = make_mock_block("input-block-id", "Input Block", BlockType.INPUT)
with patch(
"backend.api.features.chat.tools.run_block.get_block",
return_value=input_block,
):
tool = RunBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
block_id="input-block-id",
input_data={},
)
assert isinstance(response, ErrorResponse)
assert "cannot be run directly in CoPilot" in response.message
assert "designed for use within graphs only" in response.message
@pytest.mark.asyncio(loop_scope="session")
async def test_excluded_block_id_returns_error(self):
"""Attempting to execute SmartDecisionMakerBlock returns error."""
session = make_session(user_id=_TEST_USER_ID)
smart_decision_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
smart_block = make_mock_block(
smart_decision_id, "Smart Decision Maker", BlockType.STANDARD
)
with patch(
"backend.api.features.chat.tools.run_block.get_block",
return_value=smart_block,
):
tool = RunBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
block_id=smart_decision_id,
input_data={},
)
assert isinstance(response, ErrorResponse)
assert "cannot be run directly in CoPilot" in response.message
@pytest.mark.asyncio(loop_scope="session")
async def test_non_excluded_block_passes_guard(self):
"""Non-excluded blocks pass the filtering guard (may fail later for other reasons)."""
session = make_session(user_id=_TEST_USER_ID)
standard_block = make_mock_block(
"standard-id", "HTTP Request", BlockType.STANDARD
)
with patch(
"backend.api.features.chat.tools.run_block.get_block",
return_value=standard_block,
):
tool = RunBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
block_id="standard-id",
input_data={},
)
# Should NOT be an ErrorResponse about CoPilot exclusion
# (may be other errors like missing credentials, but not the exclusion guard)
if isinstance(response, ErrorResponse):
assert "cannot be run directly in CoPilot" not in response.message

View File

@@ -1,620 +0,0 @@
"""CoPilot tools for workspace file operations."""
import base64
import logging
from typing import Any, Optional
from pydantic import BaseModel
from backend.api.features.chat.model import ChatSession
from backend.data.workspace import get_or_create_workspace
from backend.util.settings import Config
from backend.util.virus_scanner import scan_content_safe
from backend.util.workspace import WorkspaceManager
from .base import BaseTool
from .models import ErrorResponse, ResponseType, ToolResponseBase
logger = logging.getLogger(__name__)
class WorkspaceFileInfoData(BaseModel):
"""Data model for workspace file information (not a response itself)."""
file_id: str
name: str
path: str
mime_type: str
size_bytes: int
class WorkspaceFileListResponse(ToolResponseBase):
"""Response containing list of workspace files."""
type: ResponseType = ResponseType.WORKSPACE_FILE_LIST
files: list[WorkspaceFileInfoData]
total_count: int
class WorkspaceFileContentResponse(ToolResponseBase):
"""Response containing workspace file content (legacy, for small text files)."""
type: ResponseType = ResponseType.WORKSPACE_FILE_CONTENT
file_id: str
name: str
path: str
mime_type: str
content_base64: str
class WorkspaceFileMetadataResponse(ToolResponseBase):
"""Response containing workspace file metadata and download URL (prevents context bloat)."""
type: ResponseType = ResponseType.WORKSPACE_FILE_METADATA
file_id: str
name: str
path: str
mime_type: str
size_bytes: int
download_url: str
preview: str | None = None # First 500 chars for text files
class WorkspaceWriteResponse(ToolResponseBase):
"""Response after writing a file to workspace."""
type: ResponseType = ResponseType.WORKSPACE_FILE_WRITTEN
file_id: str
name: str
path: str
size_bytes: int
class WorkspaceDeleteResponse(ToolResponseBase):
"""Response after deleting a file from workspace."""
type: ResponseType = ResponseType.WORKSPACE_FILE_DELETED
file_id: str
success: bool
class ListWorkspaceFilesTool(BaseTool):
"""Tool for listing files in user's workspace."""
@property
def name(self) -> str:
return "list_workspace_files"
@property
def description(self) -> str:
return (
"List files in the user's workspace. "
"Returns file names, paths, sizes, and metadata. "
"Optionally filter by path prefix."
)
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"path_prefix": {
"type": "string",
"description": (
"Optional path prefix to filter files "
"(e.g., '/documents/' to list only files in documents folder). "
"By default, only files from the current session are listed."
),
},
"limit": {
"type": "integer",
"description": "Maximum number of files to return (default 50, max 100)",
"minimum": 1,
"maximum": 100,
},
"include_all_sessions": {
"type": "boolean",
"description": (
"If true, list files from all sessions. "
"Default is false (only current session's files)."
),
},
},
"required": [],
}
@property
def requires_auth(self) -> bool:
return True
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs,
) -> ToolResponseBase:
session_id = session.session_id
if not user_id:
return ErrorResponse(
message="Authentication required",
session_id=session_id,
)
path_prefix: Optional[str] = kwargs.get("path_prefix")
limit = min(kwargs.get("limit", 50), 100)
include_all_sessions: bool = kwargs.get("include_all_sessions", False)
try:
workspace = await get_or_create_workspace(user_id)
# Pass session_id for session-scoped file access
manager = WorkspaceManager(user_id, workspace.id, session_id)
files = await manager.list_files(
path=path_prefix,
limit=limit,
include_all_sessions=include_all_sessions,
)
total = await manager.get_file_count(
path=path_prefix,
include_all_sessions=include_all_sessions,
)
file_infos = [
WorkspaceFileInfoData(
file_id=f.id,
name=f.name,
path=f.path,
mime_type=f.mimeType,
size_bytes=f.sizeBytes,
)
for f in files
]
scope_msg = "all sessions" if include_all_sessions else "current session"
return WorkspaceFileListResponse(
files=file_infos,
total_count=total,
message=f"Found {len(files)} files in workspace ({scope_msg})",
session_id=session_id,
)
except Exception as e:
logger.error(f"Error listing workspace files: {e}", exc_info=True)
return ErrorResponse(
message=f"Failed to list workspace files: {str(e)}",
error=str(e),
session_id=session_id,
)
class ReadWorkspaceFileTool(BaseTool):
"""Tool for reading file content from workspace."""
# Size threshold for returning full content vs metadata+URL
# Files larger than this return metadata with download URL to prevent context bloat
MAX_INLINE_SIZE_BYTES = 32 * 1024 # 32KB
# Preview size for text files
PREVIEW_SIZE = 500
@property
def name(self) -> str:
return "read_workspace_file"
@property
def description(self) -> str:
return (
"Read a file from the user's workspace. "
"Specify either file_id or path to identify the file. "
"For small text files, returns content directly. "
"For large or binary files, returns metadata and a download URL. "
"Paths are scoped to the current session by default. "
"Use /sessions/<session_id>/... for cross-session access."
)
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"file_id": {
"type": "string",
"description": "The file's unique ID (from list_workspace_files)",
},
"path": {
"type": "string",
"description": (
"The virtual file path (e.g., '/documents/report.pdf'). "
"Scoped to current session by default."
),
},
"force_download_url": {
"type": "boolean",
"description": (
"If true, always return metadata+URL instead of inline content. "
"Default is false (auto-selects based on file size/type)."
),
},
},
"required": [], # At least one must be provided
}
@property
def requires_auth(self) -> bool:
return True
def _is_text_mime_type(self, mime_type: str) -> bool:
"""Check if the MIME type is a text-based type."""
text_types = [
"text/",
"application/json",
"application/xml",
"application/javascript",
"application/x-python",
"application/x-sh",
]
return any(mime_type.startswith(t) for t in text_types)
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs,
) -> ToolResponseBase:
session_id = session.session_id
if not user_id:
return ErrorResponse(
message="Authentication required",
session_id=session_id,
)
file_id: Optional[str] = kwargs.get("file_id")
path: Optional[str] = kwargs.get("path")
force_download_url: bool = kwargs.get("force_download_url", False)
if not file_id and not path:
return ErrorResponse(
message="Please provide either file_id or path",
session_id=session_id,
)
try:
workspace = await get_or_create_workspace(user_id)
# Pass session_id for session-scoped file access
manager = WorkspaceManager(user_id, workspace.id, session_id)
# Get file info
if file_id:
file_info = await manager.get_file_info(file_id)
if file_info is None:
return ErrorResponse(
message=f"File not found: {file_id}",
session_id=session_id,
)
target_file_id = file_id
else:
# path is guaranteed to be non-None here due to the check above
assert path is not None
file_info = await manager.get_file_info_by_path(path)
if file_info is None:
return ErrorResponse(
message=f"File not found at path: {path}",
session_id=session_id,
)
target_file_id = file_info.id
# Decide whether to return inline content or metadata+URL
is_small_file = file_info.sizeBytes <= self.MAX_INLINE_SIZE_BYTES
is_text_file = self._is_text_mime_type(file_info.mimeType)
# Return inline content for small text files (unless force_download_url)
if is_small_file and is_text_file and not force_download_url:
content = await manager.read_file_by_id(target_file_id)
content_b64 = base64.b64encode(content).decode("utf-8")
return WorkspaceFileContentResponse(
file_id=file_info.id,
name=file_info.name,
path=file_info.path,
mime_type=file_info.mimeType,
content_base64=content_b64,
message=f"Successfully read file: {file_info.name}",
session_id=session_id,
)
# Return metadata + workspace:// reference for large or binary files
# This prevents context bloat (100KB file = ~133KB as base64)
# Use workspace:// format so frontend urlTransform can add proxy prefix
download_url = f"workspace://{target_file_id}"
# Generate preview for text files
preview: str | None = None
if is_text_file:
try:
content = await manager.read_file_by_id(target_file_id)
preview_text = content[: self.PREVIEW_SIZE].decode(
"utf-8", errors="replace"
)
if len(content) > self.PREVIEW_SIZE:
preview_text += "..."
preview = preview_text
except Exception:
pass # Preview is optional
return WorkspaceFileMetadataResponse(
file_id=file_info.id,
name=file_info.name,
path=file_info.path,
mime_type=file_info.mimeType,
size_bytes=file_info.sizeBytes,
download_url=download_url,
preview=preview,
message=f"File: {file_info.name} ({file_info.sizeBytes} bytes). Use download_url to retrieve content.",
session_id=session_id,
)
except FileNotFoundError as e:
return ErrorResponse(
message=str(e),
session_id=session_id,
)
except Exception as e:
logger.error(f"Error reading workspace file: {e}", exc_info=True)
return ErrorResponse(
message=f"Failed to read workspace file: {str(e)}",
error=str(e),
session_id=session_id,
)
class WriteWorkspaceFileTool(BaseTool):
"""Tool for writing files to workspace."""
@property
def name(self) -> str:
return "write_workspace_file"
@property
def description(self) -> str:
return (
"Write or create a file in the user's workspace. "
"Provide the content as a base64-encoded string. "
f"Maximum file size is {Config().max_file_size_mb}MB. "
"Files are saved to the current session's folder by default. "
"Use /sessions/<session_id>/... for cross-session access."
)
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"filename": {
"type": "string",
"description": "Name for the file (e.g., 'report.pdf')",
},
"content_base64": {
"type": "string",
"description": "Base64-encoded file content",
},
"path": {
"type": "string",
"description": (
"Optional virtual path where to save the file "
"(e.g., '/documents/report.pdf'). "
"Defaults to '/{filename}'. Scoped to current session."
),
},
"mime_type": {
"type": "string",
"description": (
"Optional MIME type of the file. "
"Auto-detected from filename if not provided."
),
},
"overwrite": {
"type": "boolean",
"description": "Whether to overwrite if file exists at path (default: false)",
},
},
"required": ["filename", "content_base64"],
}
@property
def requires_auth(self) -> bool:
return True
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs,
) -> ToolResponseBase:
session_id = session.session_id
if not user_id:
return ErrorResponse(
message="Authentication required",
session_id=session_id,
)
filename: str = kwargs.get("filename", "")
content_b64: str = kwargs.get("content_base64", "")
path: Optional[str] = kwargs.get("path")
mime_type: Optional[str] = kwargs.get("mime_type")
overwrite: bool = kwargs.get("overwrite", False)
if not filename:
return ErrorResponse(
message="Please provide a filename",
session_id=session_id,
)
if not content_b64:
return ErrorResponse(
message="Please provide content_base64",
session_id=session_id,
)
# Decode content
try:
content = base64.b64decode(content_b64)
except Exception:
return ErrorResponse(
message="Invalid base64-encoded content",
session_id=session_id,
)
# Check size
max_file_size = Config().max_file_size_mb * 1024 * 1024
if len(content) > max_file_size:
return ErrorResponse(
message=f"File too large. Maximum size is {Config().max_file_size_mb}MB",
session_id=session_id,
)
try:
# Virus scan
await scan_content_safe(content, filename=filename)
workspace = await get_or_create_workspace(user_id)
# Pass session_id for session-scoped file access
manager = WorkspaceManager(user_id, workspace.id, session_id)
file_record = await manager.write_file(
content=content,
filename=filename,
path=path,
mime_type=mime_type,
overwrite=overwrite,
)
return WorkspaceWriteResponse(
file_id=file_record.id,
name=file_record.name,
path=file_record.path,
size_bytes=file_record.sizeBytes,
message=f"Successfully wrote file: {file_record.name}",
session_id=session_id,
)
except ValueError as e:
return ErrorResponse(
message=str(e),
session_id=session_id,
)
except Exception as e:
logger.error(f"Error writing workspace file: {e}", exc_info=True)
return ErrorResponse(
message=f"Failed to write workspace file: {str(e)}",
error=str(e),
session_id=session_id,
)
class DeleteWorkspaceFileTool(BaseTool):
"""Tool for deleting files from workspace."""
@property
def name(self) -> str:
return "delete_workspace_file"
@property
def description(self) -> str:
return (
"Delete a file from the user's workspace. "
"Specify either file_id or path to identify the file. "
"Paths are scoped to the current session by default. "
"Use /sessions/<session_id>/... for cross-session access."
)
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"file_id": {
"type": "string",
"description": "The file's unique ID (from list_workspace_files)",
},
"path": {
"type": "string",
"description": (
"The virtual file path (e.g., '/documents/report.pdf'). "
"Scoped to current session by default."
),
},
},
"required": [], # At least one must be provided
}
@property
def requires_auth(self) -> bool:
return True
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs,
) -> ToolResponseBase:
session_id = session.session_id
if not user_id:
return ErrorResponse(
message="Authentication required",
session_id=session_id,
)
file_id: Optional[str] = kwargs.get("file_id")
path: Optional[str] = kwargs.get("path")
if not file_id and not path:
return ErrorResponse(
message="Please provide either file_id or path",
session_id=session_id,
)
try:
workspace = await get_or_create_workspace(user_id)
# Pass session_id for session-scoped file access
manager = WorkspaceManager(user_id, workspace.id, session_id)
# Determine the file_id to delete
target_file_id: str
if file_id:
target_file_id = file_id
else:
# path is guaranteed to be non-None here due to the check above
assert path is not None
file_info = await manager.get_file_info_by_path(path)
if file_info is None:
return ErrorResponse(
message=f"File not found at path: {path}",
session_id=session_id,
)
target_file_id = file_info.id
success = await manager.delete_file(target_file_id)
if not success:
return ErrorResponse(
message=f"File not found: {target_file_id}",
session_id=session_id,
)
return WorkspaceDeleteResponse(
file_id=target_file_id,
success=True,
message="File deleted successfully",
session_id=session_id,
)
except Exception as e:
logger.error(f"Error deleting workspace file: {e}", exc_info=True)
return ErrorResponse(
message=f"Failed to delete workspace file: {str(e)}",
error=str(e),
session_id=session_id,
)

View File

@@ -1,7 +1,7 @@
import asyncio
import logging
from datetime import datetime, timedelta, timezone
from typing import TYPE_CHECKING, Annotated, List, Literal
from typing import TYPE_CHECKING, Annotated, Any, List, Literal
from autogpt_libs.auth import get_user_id
from fastapi import (
@@ -14,7 +14,7 @@ from fastapi import (
Security,
status,
)
from pydantic import BaseModel, Field, SecretStr
from pydantic import BaseModel, Field, SecretStr, model_validator
from starlette.status import HTTP_500_INTERNAL_SERVER_ERROR, HTTP_502_BAD_GATEWAY
from backend.api.features.library.db import set_preset_webhook, update_preset
@@ -39,7 +39,11 @@ from backend.data.onboarding import OnboardingStep, complete_onboarding_step
from backend.data.user import get_user_integrations
from backend.executor.utils import add_graph_execution
from backend.integrations.ayrshare import AyrshareClient, SocialPlatform
from backend.integrations.creds_manager import IntegrationCredentialsManager
from backend.integrations.credentials_store import provider_matches
from backend.integrations.creds_manager import (
IntegrationCredentialsManager,
create_mcp_oauth_handler,
)
from backend.integrations.oauth import CREDENTIALS_BY_PROVIDER, HANDLERS_BY_NAME
from backend.integrations.providers import ProviderName
from backend.integrations.webhooks import get_webhook_manager
@@ -102,9 +106,37 @@ class CredentialsMetaResponse(BaseModel):
scopes: list[str] | None
username: str | None
host: str | None = Field(
default=None, description="Host pattern for host-scoped credentials"
default=None,
description="Host pattern for host-scoped or MCP server URL for MCP credentials",
)
@model_validator(mode="before")
@classmethod
def _normalize_provider(cls, data: Any) -> Any:
"""Fix ``ProviderName.X`` format from Python 3.13 ``str(Enum)`` bug."""
if isinstance(data, dict):
prov = data.get("provider", "")
if isinstance(prov, str) and prov.startswith("ProviderName."):
member = prov.removeprefix("ProviderName.")
try:
data = {**data, "provider": ProviderName[member].value}
except KeyError:
pass
return data
@staticmethod
def get_host(cred: Credentials) -> str | None:
"""Extract host from credential: HostScoped host or MCP server URL."""
if isinstance(cred, HostScopedCredentials):
return cred.host
if isinstance(cred, OAuth2Credentials) and cred.provider in (
ProviderName.MCP,
ProviderName.MCP.value,
"ProviderName.MCP",
):
return (cred.metadata or {}).get("mcp_server_url")
return None
@router.post("/{provider}/callback", summary="Exchange OAuth code for tokens")
async def callback(
@@ -179,9 +211,7 @@ async def callback(
title=credentials.title,
scopes=credentials.scopes,
username=credentials.username,
host=(
credentials.host if isinstance(credentials, HostScopedCredentials) else None
),
host=(CredentialsMetaResponse.get_host(credentials)),
)
@@ -199,7 +229,7 @@ async def list_credentials(
title=cred.title,
scopes=cred.scopes if isinstance(cred, OAuth2Credentials) else None,
username=cred.username if isinstance(cred, OAuth2Credentials) else None,
host=cred.host if isinstance(cred, HostScopedCredentials) else None,
host=CredentialsMetaResponse.get_host(cred),
)
for cred in credentials
]
@@ -222,7 +252,7 @@ async def list_credentials_by_provider(
title=cred.title,
scopes=cred.scopes if isinstance(cred, OAuth2Credentials) else None,
username=cred.username if isinstance(cred, OAuth2Credentials) else None,
host=cred.host if isinstance(cred, HostScopedCredentials) else None,
host=CredentialsMetaResponse.get_host(cred),
)
for cred in credentials
]
@@ -322,7 +352,11 @@ async def delete_credentials(
tokens_revoked = None
if isinstance(creds, OAuth2Credentials):
handler = _get_provider_oauth_handler(request, provider)
if provider_matches(provider.value, ProviderName.MCP.value):
# MCP uses dynamic per-server OAuth — create handler from metadata
handler = create_mcp_oauth_handler(creds)
else:
handler = _get_provider_oauth_handler(request, provider)
tokens_revoked = await handler.revoke_tokens(creds)
return CredentialsDeletionResponse(revoked=tokens_revoked)

View File

@@ -0,0 +1,404 @@
"""
MCP (Model Context Protocol) API routes.
Provides endpoints for MCP tool discovery and OAuth authentication so the
frontend can list available tools on an MCP server before placing a block.
"""
import logging
from typing import Annotated, Any
from urllib.parse import urlparse
import fastapi
from autogpt_libs.auth import get_user_id
from fastapi import Security
from pydantic import BaseModel, Field
from backend.api.features.integrations.router import CredentialsMetaResponse
from backend.blocks.mcp.client import MCPClient, MCPClientError
from backend.blocks.mcp.oauth import MCPOAuthHandler
from backend.data.model import OAuth2Credentials
from backend.integrations.creds_manager import IntegrationCredentialsManager
from backend.integrations.providers import ProviderName
from backend.util.request import HTTPClientError, Requests
from backend.util.settings import Settings
logger = logging.getLogger(__name__)
settings = Settings()
router = fastapi.APIRouter(tags=["mcp"])
creds_manager = IntegrationCredentialsManager()
# ====================== Tool Discovery ====================== #
class DiscoverToolsRequest(BaseModel):
"""Request to discover tools on an MCP server."""
server_url: str = Field(description="URL of the MCP server")
auth_token: str | None = Field(
default=None,
description="Optional Bearer token for authenticated MCP servers",
)
class MCPToolResponse(BaseModel):
"""A single MCP tool returned by discovery."""
name: str
description: str
input_schema: dict[str, Any]
class DiscoverToolsResponse(BaseModel):
"""Response containing the list of tools available on an MCP server."""
tools: list[MCPToolResponse]
server_name: str | None = None
protocol_version: str | None = None
@router.post(
"/discover-tools",
summary="Discover available tools on an MCP server",
response_model=DiscoverToolsResponse,
)
async def discover_tools(
request: DiscoverToolsRequest,
user_id: Annotated[str, Security(get_user_id)],
) -> DiscoverToolsResponse:
"""
Connect to an MCP server and return its available tools.
If the user has a stored MCP credential for this server URL, it will be
used automatically — no need to pass an explicit auth token.
"""
auth_token = request.auth_token
# Auto-use stored MCP credential when no explicit token is provided.
if not auth_token:
mcp_creds = await creds_manager.store.get_creds_by_provider(
user_id, ProviderName.MCP.value
)
# Find the freshest credential for this server URL
best_cred: OAuth2Credentials | None = None
for cred in mcp_creds:
if (
isinstance(cred, OAuth2Credentials)
and (cred.metadata or {}).get("mcp_server_url") == request.server_url
):
if best_cred is None or (
(cred.access_token_expires_at or 0)
> (best_cred.access_token_expires_at or 0)
):
best_cred = cred
if best_cred:
# Refresh the token if expired before using it
best_cred = await creds_manager.refresh_if_needed(user_id, best_cred)
logger.info(
f"Using MCP credential {best_cred.id} for {request.server_url}, "
f"expires_at={best_cred.access_token_expires_at}"
)
auth_token = best_cred.access_token.get_secret_value()
client = MCPClient(request.server_url, auth_token=auth_token)
try:
init_result = await client.initialize()
tools = await client.list_tools()
except HTTPClientError as e:
if e.status_code in (401, 403):
raise fastapi.HTTPException(
status_code=401,
detail="This MCP server requires authentication. "
"Please provide a valid auth token.",
)
raise fastapi.HTTPException(status_code=502, detail=str(e))
except MCPClientError as e:
raise fastapi.HTTPException(status_code=502, detail=str(e))
except Exception as e:
raise fastapi.HTTPException(
status_code=502,
detail=f"Failed to connect to MCP server: {e}",
)
return DiscoverToolsResponse(
tools=[
MCPToolResponse(
name=t.name,
description=t.description,
input_schema=t.input_schema,
)
for t in tools
],
server_name=(
init_result.get("serverInfo", {}).get("name")
or urlparse(request.server_url).hostname
or "MCP"
),
protocol_version=init_result.get("protocolVersion"),
)
# ======================== OAuth Flow ======================== #
class MCPOAuthLoginRequest(BaseModel):
"""Request to start an OAuth flow for an MCP server."""
server_url: str = Field(description="URL of the MCP server that requires OAuth")
class MCPOAuthLoginResponse(BaseModel):
"""Response with the OAuth login URL for the user to authenticate."""
login_url: str
state_token: str
@router.post(
"/oauth/login",
summary="Initiate OAuth login for an MCP server",
)
async def mcp_oauth_login(
request: MCPOAuthLoginRequest,
user_id: Annotated[str, Security(get_user_id)],
) -> MCPOAuthLoginResponse:
"""
Discover OAuth metadata from the MCP server and return a login URL.
1. Discovers the protected-resource metadata (RFC 9728)
2. Fetches the authorization server metadata (RFC 8414)
3. Performs Dynamic Client Registration (RFC 7591) if available
4. Returns the authorization URL for the frontend to open in a popup
"""
client = MCPClient(request.server_url)
# Step 1: Discover protected-resource metadata (RFC 9728)
protected_resource = await client.discover_auth()
metadata: dict[str, Any] | None = None
if protected_resource and protected_resource.get("authorization_servers"):
auth_server_url = protected_resource["authorization_servers"][0]
resource_url = protected_resource.get("resource", request.server_url)
# Step 2a: Discover auth-server metadata (RFC 8414)
metadata = await client.discover_auth_server_metadata(auth_server_url)
else:
# Fallback: Some MCP servers (e.g. Linear) are their own auth server
# and serve OAuth metadata directly without protected-resource metadata.
# Don't assume a resource_url — omitting it lets the auth server choose
# the correct audience for the token (RFC 8707 resource is optional).
resource_url = None
metadata = await client.discover_auth_server_metadata(request.server_url)
if (
not metadata
or "authorization_endpoint" not in metadata
or "token_endpoint" not in metadata
):
raise fastapi.HTTPException(
status_code=400,
detail="This MCP server does not advertise OAuth support. "
"You may need to provide an auth token manually.",
)
authorize_url = metadata["authorization_endpoint"]
token_url = metadata["token_endpoint"]
registration_endpoint = metadata.get("registration_endpoint")
revoke_url = metadata.get("revocation_endpoint")
# Step 3: Dynamic Client Registration (RFC 7591) if available
frontend_base_url = settings.config.frontend_base_url
if not frontend_base_url:
raise fastapi.HTTPException(
status_code=500,
detail="Frontend base URL is not configured.",
)
redirect_uri = f"{frontend_base_url}/auth/integrations/mcp_callback"
client_id = ""
client_secret = ""
if registration_endpoint:
reg_result = await _register_mcp_client(
registration_endpoint, redirect_uri, request.server_url
)
if reg_result:
client_id = reg_result.get("client_id", "")
client_secret = reg_result.get("client_secret", "")
if not client_id:
client_id = "autogpt-platform"
# Step 4: Store state token with OAuth metadata for the callback
scopes = (protected_resource or {}).get("scopes_supported") or metadata.get(
"scopes_supported", []
)
state_token, code_challenge = await creds_manager.store.store_state_token(
user_id,
ProviderName.MCP.value,
scopes,
state_metadata={
"authorize_url": authorize_url,
"token_url": token_url,
"revoke_url": revoke_url,
"resource_url": resource_url,
"server_url": request.server_url,
"client_id": client_id,
"client_secret": client_secret,
},
)
# Step 5: Build and return the login URL
handler = MCPOAuthHandler(
client_id=client_id,
client_secret=client_secret,
redirect_uri=redirect_uri,
authorize_url=authorize_url,
token_url=token_url,
resource_url=resource_url,
)
login_url = handler.get_login_url(
scopes, state_token, code_challenge=code_challenge
)
return MCPOAuthLoginResponse(login_url=login_url, state_token=state_token)
class MCPOAuthCallbackRequest(BaseModel):
"""Request to exchange an OAuth code for tokens."""
code: str = Field(description="Authorization code from OAuth callback")
state_token: str = Field(description="State token for CSRF verification")
class MCPOAuthCallbackResponse(BaseModel):
"""Response after successfully storing OAuth credentials."""
credential_id: str
@router.post(
"/oauth/callback",
summary="Exchange OAuth code for MCP tokens",
)
async def mcp_oauth_callback(
request: MCPOAuthCallbackRequest,
user_id: Annotated[str, Security(get_user_id)],
) -> CredentialsMetaResponse:
"""
Exchange the authorization code for tokens and store the credential.
The frontend calls this after receiving the OAuth code from the popup.
On success, subsequent ``/discover-tools`` calls for the same server URL
will automatically use the stored credential.
"""
valid_state = await creds_manager.store.verify_state_token(
user_id, request.state_token, ProviderName.MCP.value
)
if not valid_state:
raise fastapi.HTTPException(
status_code=400,
detail="Invalid or expired state token.",
)
meta = valid_state.state_metadata
frontend_base_url = settings.config.frontend_base_url
if not frontend_base_url:
raise fastapi.HTTPException(
status_code=500,
detail="Frontend base URL is not configured.",
)
redirect_uri = f"{frontend_base_url}/auth/integrations/mcp_callback"
handler = MCPOAuthHandler(
client_id=meta["client_id"],
client_secret=meta.get("client_secret", ""),
redirect_uri=redirect_uri,
authorize_url=meta["authorize_url"],
token_url=meta["token_url"],
revoke_url=meta.get("revoke_url"),
resource_url=meta.get("resource_url"),
)
try:
credentials = await handler.exchange_code_for_tokens(
request.code, valid_state.scopes, valid_state.code_verifier
)
except Exception as e:
raise fastapi.HTTPException(
status_code=400,
detail=f"OAuth token exchange failed: {e}",
)
# Enrich credential metadata for future lookup and token refresh
if credentials.metadata is None:
credentials.metadata = {}
credentials.metadata["mcp_server_url"] = meta["server_url"]
credentials.metadata["mcp_client_id"] = meta["client_id"]
credentials.metadata["mcp_client_secret"] = meta.get("client_secret", "")
credentials.metadata["mcp_token_url"] = meta["token_url"]
credentials.metadata["mcp_resource_url"] = meta.get("resource_url", "")
hostname = urlparse(meta["server_url"]).hostname or meta["server_url"]
credentials.title = f"MCP: {hostname}"
# Remove old MCP credentials for the same server to prevent stale token buildup.
try:
old_creds = await creds_manager.store.get_creds_by_provider(
user_id, ProviderName.MCP.value
)
for old in old_creds:
if (
isinstance(old, OAuth2Credentials)
and (old.metadata or {}).get("mcp_server_url") == meta["server_url"]
):
await creds_manager.store.delete_creds_by_id(user_id, old.id)
logger.info(
f"Removed old MCP credential {old.id} for {meta['server_url']}"
)
except Exception:
logger.debug("Could not clean up old MCP credentials", exc_info=True)
await creds_manager.create(user_id, credentials)
return CredentialsMetaResponse(
id=credentials.id,
provider=credentials.provider,
type=credentials.type,
title=credentials.title,
scopes=credentials.scopes,
username=credentials.username,
host=credentials.metadata.get("mcp_server_url"),
)
# ======================== Helpers ======================== #
async def _register_mcp_client(
registration_endpoint: str,
redirect_uri: str,
server_url: str,
) -> dict[str, Any] | None:
"""Attempt Dynamic Client Registration (RFC 7591) with an MCP auth server."""
try:
response = await Requests(raise_for_status=True).post(
registration_endpoint,
json={
"client_name": "AutoGPT Platform",
"redirect_uris": [redirect_uri],
"grant_types": ["authorization_code"],
"response_types": ["code"],
"token_endpoint_auth_method": "client_secret_post",
},
)
data = response.json()
if isinstance(data, dict) and "client_id" in data:
return data
return None
except Exception as e:
logger.warning(f"Dynamic client registration failed for {server_url}: {e}")
return None

View File

@@ -0,0 +1,436 @@
"""Tests for MCP API routes.
Uses httpx.AsyncClient with ASGITransport instead of fastapi.testclient.TestClient
to avoid creating blocking portals that can corrupt pytest-asyncio's session event loop.
"""
from unittest.mock import AsyncMock, patch
import fastapi
import httpx
import pytest
import pytest_asyncio
from autogpt_libs.auth import get_user_id
from backend.api.features.mcp.routes import router
from backend.blocks.mcp.client import MCPClientError, MCPTool
from backend.util.request import HTTPClientError
app = fastapi.FastAPI()
app.include_router(router)
app.dependency_overrides[get_user_id] = lambda: "test-user-id"
@pytest_asyncio.fixture(scope="module")
async def client():
transport = httpx.ASGITransport(app=app)
async with httpx.AsyncClient(transport=transport, base_url="http://test") as c:
yield c
class TestDiscoverTools:
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_tools_success(self, client):
mock_tools = [
MCPTool(
name="get_weather",
description="Get weather for a city",
input_schema={
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"],
},
),
MCPTool(
name="add_numbers",
description="Add two numbers",
input_schema={
"type": "object",
"properties": {
"a": {"type": "number"},
"b": {"type": "number"},
},
},
),
]
with (
patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
):
mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
instance = MockClient.return_value
instance.initialize = AsyncMock(
return_value={
"protocolVersion": "2025-03-26",
"serverInfo": {"name": "test-server"},
}
)
instance.list_tools = AsyncMock(return_value=mock_tools)
response = await client.post(
"/discover-tools",
json={"server_url": "https://mcp.example.com/mcp"},
)
assert response.status_code == 200
data = response.json()
assert len(data["tools"]) == 2
assert data["tools"][0]["name"] == "get_weather"
assert data["tools"][1]["name"] == "add_numbers"
assert data["server_name"] == "test-server"
assert data["protocol_version"] == "2025-03-26"
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_tools_with_auth_token(self, client):
with patch("backend.api.features.mcp.routes.MCPClient") as MockClient:
instance = MockClient.return_value
instance.initialize = AsyncMock(
return_value={"serverInfo": {}, "protocolVersion": "2025-03-26"}
)
instance.list_tools = AsyncMock(return_value=[])
response = await client.post(
"/discover-tools",
json={
"server_url": "https://mcp.example.com/mcp",
"auth_token": "my-secret-token",
},
)
assert response.status_code == 200
MockClient.assert_called_once_with(
"https://mcp.example.com/mcp",
auth_token="my-secret-token",
)
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_tools_auto_uses_stored_credential(self, client):
"""When no explicit token is given, stored MCP credentials are used."""
from pydantic import SecretStr
from backend.data.model import OAuth2Credentials
stored_cred = OAuth2Credentials(
provider="mcp",
title="MCP: example.com",
access_token=SecretStr("stored-token-123"),
refresh_token=None,
access_token_expires_at=None,
refresh_token_expires_at=None,
scopes=[],
metadata={"mcp_server_url": "https://mcp.example.com/mcp"},
)
with (
patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
):
mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[stored_cred])
mock_cm.refresh_if_needed = AsyncMock(return_value=stored_cred)
instance = MockClient.return_value
instance.initialize = AsyncMock(
return_value={"serverInfo": {}, "protocolVersion": "2025-03-26"}
)
instance.list_tools = AsyncMock(return_value=[])
response = await client.post(
"/discover-tools",
json={"server_url": "https://mcp.example.com/mcp"},
)
assert response.status_code == 200
MockClient.assert_called_once_with(
"https://mcp.example.com/mcp",
auth_token="stored-token-123",
)
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_tools_mcp_error(self, client):
with (
patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
):
mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
instance = MockClient.return_value
instance.initialize = AsyncMock(
side_effect=MCPClientError("Connection refused")
)
response = await client.post(
"/discover-tools",
json={"server_url": "https://bad-server.example.com/mcp"},
)
assert response.status_code == 502
assert "Connection refused" in response.json()["detail"]
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_tools_generic_error(self, client):
with (
patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
):
mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
instance = MockClient.return_value
instance.initialize = AsyncMock(side_effect=Exception("Network timeout"))
response = await client.post(
"/discover-tools",
json={"server_url": "https://timeout.example.com/mcp"},
)
assert response.status_code == 502
assert "Failed to connect" in response.json()["detail"]
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_tools_auth_required(self, client):
with (
patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
):
mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
instance = MockClient.return_value
instance.initialize = AsyncMock(
side_effect=HTTPClientError("HTTP 401 Error: Unauthorized", 401)
)
response = await client.post(
"/discover-tools",
json={"server_url": "https://auth-server.example.com/mcp"},
)
assert response.status_code == 401
assert "requires authentication" in response.json()["detail"]
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_tools_forbidden(self, client):
with (
patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
):
mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
instance = MockClient.return_value
instance.initialize = AsyncMock(
side_effect=HTTPClientError("HTTP 403 Error: Forbidden", 403)
)
response = await client.post(
"/discover-tools",
json={"server_url": "https://auth-server.example.com/mcp"},
)
assert response.status_code == 401
assert "requires authentication" in response.json()["detail"]
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_tools_missing_url(self, client):
response = await client.post("/discover-tools", json={})
assert response.status_code == 422
class TestOAuthLogin:
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth_login_success(self, client):
with (
patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
patch("backend.api.features.mcp.routes.settings") as mock_settings,
patch(
"backend.api.features.mcp.routes._register_mcp_client"
) as mock_register,
):
instance = MockClient.return_value
instance.discover_auth = AsyncMock(
return_value={
"authorization_servers": ["https://auth.sentry.io"],
"resource": "https://mcp.sentry.dev/mcp",
"scopes_supported": ["openid"],
}
)
instance.discover_auth_server_metadata = AsyncMock(
return_value={
"authorization_endpoint": "https://auth.sentry.io/authorize",
"token_endpoint": "https://auth.sentry.io/token",
"registration_endpoint": "https://auth.sentry.io/register",
}
)
mock_register.return_value = {
"client_id": "registered-client-id",
"client_secret": "registered-secret",
}
mock_cm.store.store_state_token = AsyncMock(
return_value=("state-token-123", "code-challenge-abc")
)
mock_settings.config.frontend_base_url = "http://localhost:3000"
response = await client.post(
"/oauth/login",
json={"server_url": "https://mcp.sentry.dev/mcp"},
)
assert response.status_code == 200
data = response.json()
assert "login_url" in data
assert data["state_token"] == "state-token-123"
assert "auth.sentry.io/authorize" in data["login_url"]
assert "registered-client-id" in data["login_url"]
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth_login_no_oauth_support(self, client):
with patch("backend.api.features.mcp.routes.MCPClient") as MockClient:
instance = MockClient.return_value
instance.discover_auth = AsyncMock(return_value=None)
instance.discover_auth_server_metadata = AsyncMock(return_value=None)
response = await client.post(
"/oauth/login",
json={"server_url": "https://simple-server.example.com/mcp"},
)
assert response.status_code == 400
assert "does not advertise OAuth" in response.json()["detail"]
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth_login_fallback_to_public_client(self, client):
"""When DCR is unavailable, falls back to default public client ID."""
with (
patch("backend.api.features.mcp.routes.MCPClient") as MockClient,
patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
patch("backend.api.features.mcp.routes.settings") as mock_settings,
):
instance = MockClient.return_value
instance.discover_auth = AsyncMock(
return_value={
"authorization_servers": ["https://auth.example.com"],
"resource": "https://mcp.example.com/mcp",
}
)
instance.discover_auth_server_metadata = AsyncMock(
return_value={
"authorization_endpoint": "https://auth.example.com/authorize",
"token_endpoint": "https://auth.example.com/token",
# No registration_endpoint
}
)
mock_cm.store.store_state_token = AsyncMock(
return_value=("state-abc", "challenge-xyz")
)
mock_settings.config.frontend_base_url = "http://localhost:3000"
response = await client.post(
"/oauth/login",
json={"server_url": "https://mcp.example.com/mcp"},
)
assert response.status_code == 200
data = response.json()
assert "autogpt-platform" in data["login_url"]
class TestOAuthCallback:
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth_callback_success(self, client):
from pydantic import SecretStr
from backend.data.model import OAuth2Credentials
mock_creds = OAuth2Credentials(
provider="mcp",
title=None,
access_token=SecretStr("access-token-xyz"),
refresh_token=None,
access_token_expires_at=None,
refresh_token_expires_at=None,
scopes=[],
metadata={
"mcp_token_url": "https://auth.sentry.io/token",
"mcp_resource_url": "https://mcp.sentry.dev/mcp",
},
)
with (
patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
patch("backend.api.features.mcp.routes.settings") as mock_settings,
patch("backend.api.features.mcp.routes.MCPOAuthHandler") as MockHandler,
):
mock_settings.config.frontend_base_url = "http://localhost:3000"
# Mock state verification
mock_state = AsyncMock()
mock_state.state_metadata = {
"authorize_url": "https://auth.sentry.io/authorize",
"token_url": "https://auth.sentry.io/token",
"client_id": "test-client-id",
"client_secret": "test-secret",
"server_url": "https://mcp.sentry.dev/mcp",
}
mock_state.scopes = ["openid"]
mock_state.code_verifier = "verifier-123"
mock_cm.store.verify_state_token = AsyncMock(return_value=mock_state)
mock_cm.create = AsyncMock()
handler_instance = MockHandler.return_value
handler_instance.exchange_code_for_tokens = AsyncMock(
return_value=mock_creds
)
# Mock old credential cleanup
mock_cm.store.get_creds_by_provider = AsyncMock(return_value=[])
response = await client.post(
"/oauth/callback",
json={"code": "auth-code-abc", "state_token": "state-token-123"},
)
assert response.status_code == 200
data = response.json()
assert "id" in data
assert data["provider"] == "mcp"
assert data["type"] == "oauth2"
mock_cm.create.assert_called_once()
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth_callback_invalid_state(self, client):
with patch("backend.api.features.mcp.routes.creds_manager") as mock_cm:
mock_cm.store.verify_state_token = AsyncMock(return_value=None)
response = await client.post(
"/oauth/callback",
json={"code": "auth-code", "state_token": "bad-state"},
)
assert response.status_code == 400
assert "Invalid or expired" in response.json()["detail"]
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth_callback_token_exchange_fails(self, client):
with (
patch("backend.api.features.mcp.routes.creds_manager") as mock_cm,
patch("backend.api.features.mcp.routes.settings") as mock_settings,
patch("backend.api.features.mcp.routes.MCPOAuthHandler") as MockHandler,
):
mock_settings.config.frontend_base_url = "http://localhost:3000"
mock_state = AsyncMock()
mock_state.state_metadata = {
"authorize_url": "https://auth.example.com/authorize",
"token_url": "https://auth.example.com/token",
"client_id": "cid",
"server_url": "https://mcp.example.com/mcp",
}
mock_state.scopes = []
mock_state.code_verifier = "v"
mock_cm.store.verify_state_token = AsyncMock(return_value=mock_state)
handler_instance = MockHandler.return_value
handler_instance.exchange_code_for_tokens = AsyncMock(
side_effect=RuntimeError("Token exchange failed")
)
response = await client.post(
"/oauth/callback",
json={"code": "bad-code", "state_token": "state"},
)
assert response.status_code == 400
assert "token exchange failed" in response.json()["detail"].lower()

View File

@@ -11,7 +11,7 @@ import fastapi
from autogpt_libs.auth.dependencies import get_user_id, requires_user
from fastapi.responses import Response
from backend.data.workspace import get_workspace, get_workspace_file
from backend.data.workspace import WorkspaceFile, get_workspace, get_workspace_file
from backend.util.workspace_storage import get_workspace_storage
@@ -44,11 +44,11 @@ router = fastapi.APIRouter(
)
def _create_streaming_response(content: bytes, file) -> Response:
def _create_streaming_response(content: bytes, file: WorkspaceFile) -> Response:
"""Create a streaming response for file content."""
return Response(
content=content,
media_type=file.mimeType,
media_type=file.mime_type,
headers={
"Content-Disposition": _sanitize_filename_for_header(file.name),
"Content-Length": str(len(content)),
@@ -56,7 +56,7 @@ def _create_streaming_response(content: bytes, file) -> Response:
)
async def _create_file_download_response(file) -> Response:
async def _create_file_download_response(file: WorkspaceFile) -> Response:
"""
Create a download response for a workspace file.
@@ -66,33 +66,33 @@ async def _create_file_download_response(file) -> Response:
storage = await get_workspace_storage()
# For local storage, stream the file directly
if file.storagePath.startswith("local://"):
content = await storage.retrieve(file.storagePath)
if file.storage_path.startswith("local://"):
content = await storage.retrieve(file.storage_path)
return _create_streaming_response(content, file)
# For GCS, try to redirect to signed URL, fall back to streaming
try:
url = await storage.get_download_url(file.storagePath, expires_in=300)
url = await storage.get_download_url(file.storage_path, expires_in=300)
# If we got back an API path (fallback), stream directly instead
if url.startswith("/api/"):
content = await storage.retrieve(file.storagePath)
content = await storage.retrieve(file.storage_path)
return _create_streaming_response(content, file)
return fastapi.responses.RedirectResponse(url=url, status_code=302)
except Exception as e:
# Log the signed URL failure with context
logger.error(
f"Failed to get signed URL for file {file.id} "
f"(storagePath={file.storagePath}): {e}",
f"(storagePath={file.storage_path}): {e}",
exc_info=True,
)
# Fall back to streaming directly from GCS
try:
content = await storage.retrieve(file.storagePath)
content = await storage.retrieve(file.storage_path)
return _create_streaming_response(content, file)
except Exception as fallback_error:
logger.error(
f"Fallback streaming also failed for file {file.id} "
f"(storagePath={file.storagePath}): {fallback_error}",
f"(storagePath={file.storage_path}): {fallback_error}",
exc_info=True,
)
raise

View File

@@ -26,6 +26,7 @@ import backend.api.features.executions.review.routes
import backend.api.features.library.db
import backend.api.features.library.model
import backend.api.features.library.routes
import backend.api.features.mcp.routes as mcp_routes
import backend.api.features.oauth
import backend.api.features.otto.routes
import backend.api.features.postmark.postmark
@@ -40,10 +41,6 @@ import backend.data.user
import backend.integrations.webhooks.utils
import backend.util.service
import backend.util.settings
from backend.api.features.chat.completion_consumer import (
start_completion_consumer,
stop_completion_consumer,
)
from backend.blocks.llm import DEFAULT_LLM_MODEL
from backend.data.model import Credentials
from backend.integrations.providers import ProviderName
@@ -122,21 +119,9 @@ async def lifespan_context(app: fastapi.FastAPI):
await backend.data.graph.migrate_llm_models(DEFAULT_LLM_MODEL)
await backend.integrations.webhooks.utils.migrate_legacy_triggered_graphs()
# Start chat completion consumer for Redis Streams notifications
try:
await start_completion_consumer()
except Exception as e:
logger.warning(f"Could not start chat completion consumer: {e}")
with launch_darkly_context():
yield
# Stop chat completion consumer
try:
await stop_completion_consumer()
except Exception as e:
logger.warning(f"Error stopping chat completion consumer: {e}")
try:
await shutdown_cloud_storage_handler()
except Exception as e:
@@ -343,6 +328,11 @@ app.include_router(
tags=["workspace"],
prefix="/api/workspace",
)
app.include_router(
mcp_routes.router,
tags=["v2", "mcp"],
prefix="/api/mcp",
)
app.include_router(
backend.api.features.oauth.router,
tags=["oauth"],

View File

@@ -24,7 +24,7 @@ def run_processes(*processes: "AppProcess", **kwargs):
# Run the last process in the foreground.
processes[-1].start(background=False, **kwargs)
finally:
for process in processes:
for process in reversed(processes):
try:
process.stop()
except Exception as e:
@@ -38,7 +38,9 @@ def main(**kwargs):
from backend.api.rest_api import AgentServer
from backend.api.ws_api import WebsocketServer
from backend.executor import DatabaseManager, ExecutionManager, Scheduler
from backend.copilot.executor.manager import CoPilotExecutor
from backend.data.db_manager import DatabaseManager
from backend.executor import ExecutionManager, Scheduler
from backend.notifications import NotificationManager
run_processes(
@@ -48,6 +50,7 @@ def main(**kwargs):
WebsocketServer(),
AgentServer(),
ExecutionManager(),
CoPilotExecutor(),
**kwargs,
)

View File

@@ -64,6 +64,7 @@ class BlockType(Enum):
AI = "AI"
AYRSHARE = "Ayrshare"
HUMAN_IN_THE_LOOP = "Human In The Loop"
MCP_TOOL = "MCP Tool"
class BlockCategory(Enum):

View File

@@ -126,6 +126,7 @@ class PrintToConsoleBlock(Block):
output_schema=PrintToConsoleBlock.Output,
test_input={"text": "Hello, World!"},
is_sensitive_action=True,
disabled=True, # Disabled per Nick Tindle's request (OPEN-3000)
test_output=[
("output", "Hello, World!"),
("status", "printed"),

View File

@@ -682,17 +682,219 @@ class ListIsEmptyBlock(Block):
yield "is_empty", len(input_data.list) == 0
# =============================================================================
# List Concatenation Helpers
# =============================================================================
def _validate_list_input(item: Any, index: int) -> str | None:
"""Validate that an item is a list. Returns error message or None."""
if item is None:
return None # None is acceptable, will be skipped
if not isinstance(item, list):
return (
f"Invalid input at index {index}: expected a list, "
f"got {type(item).__name__}. "
f"All items in 'lists' must be lists (e.g., [[1, 2], [3, 4]])."
)
return None
def _validate_all_lists(lists: List[Any]) -> str | None:
"""Validate that all items in a sequence are lists. Returns first error or None."""
for idx, item in enumerate(lists):
error = _validate_list_input(item, idx)
if error is not None and item is not None:
return error
return None
def _concatenate_lists_simple(lists: List[List[Any]]) -> List[Any]:
"""Concatenate a sequence of lists into a single list, skipping None values."""
result: List[Any] = []
for lst in lists:
if lst is None:
continue
result.extend(lst)
return result
def _flatten_nested_list(nested: List[Any], max_depth: int = -1) -> List[Any]:
"""
Recursively flatten a nested list structure.
Args:
nested: The list to flatten.
max_depth: Maximum recursion depth. -1 means unlimited.
Returns:
A flat list with all nested elements extracted.
"""
result: List[Any] = []
_flatten_recursive(nested, result, current_depth=0, max_depth=max_depth)
return result
_MAX_FLATTEN_DEPTH = 1000
def _flatten_recursive(
items: List[Any],
result: List[Any],
current_depth: int,
max_depth: int,
) -> None:
"""Internal recursive helper for flattening nested lists."""
if current_depth > _MAX_FLATTEN_DEPTH:
raise RecursionError(
f"Flattening exceeded maximum depth of {_MAX_FLATTEN_DEPTH} levels. "
"Input may be too deeply nested."
)
for item in items:
if isinstance(item, list) and (max_depth == -1 or current_depth < max_depth):
_flatten_recursive(item, result, current_depth + 1, max_depth)
else:
result.append(item)
def _deduplicate_list(items: List[Any]) -> List[Any]:
"""
Remove duplicate elements from a list, preserving order of first occurrences.
Args:
items: The list to deduplicate.
Returns:
A list with duplicates removed, maintaining original order.
"""
seen: set = set()
result: List[Any] = []
for item in items:
item_id = _make_hashable(item)
if item_id not in seen:
seen.add(item_id)
result.append(item)
return result
def _make_hashable(item: Any):
"""
Create a hashable representation of any item for deduplication.
Converts unhashable types (dicts, lists) into deterministic tuple structures.
"""
if isinstance(item, dict):
return tuple(
sorted(
((_make_hashable(k), _make_hashable(v)) for k, v in item.items()),
key=lambda x: (str(type(x[0])), str(x[0])),
)
)
if isinstance(item, (list, tuple)):
return tuple(_make_hashable(i) for i in item)
if isinstance(item, set):
return frozenset(_make_hashable(i) for i in item)
return item
def _filter_none_values(items: List[Any]) -> List[Any]:
"""Remove None values from a list."""
return [item for item in items if item is not None]
def _compute_nesting_depth(
items: Any, current: int = 0, max_depth: int = _MAX_FLATTEN_DEPTH
) -> int:
"""
Compute the maximum nesting depth of a list structure using iteration to avoid RecursionError.
Uses a stack-based approach to handle deeply nested structures without hitting Python's
recursion limit (~1000 levels).
"""
if not isinstance(items, list):
return current
# Stack contains tuples of (item, depth)
stack = [(items, current)]
max_observed_depth = current
while stack:
item, depth = stack.pop()
if depth > max_depth:
return depth
if not isinstance(item, list):
max_observed_depth = max(max_observed_depth, depth)
continue
if len(item) == 0:
max_observed_depth = max(max_observed_depth, depth + 1)
continue
# Add all children to stack with incremented depth
for child in item:
stack.append((child, depth + 1))
return max_observed_depth
def _interleave_lists(lists: List[List[Any]]) -> List[Any]:
"""
Interleave elements from multiple lists in round-robin fashion.
Example: [[1,2,3], [a,b], [x,y,z]] -> [1, a, x, 2, b, y, 3, z]
"""
if not lists:
return []
filtered = [lst for lst in lists if lst is not None]
if not filtered:
return []
result: List[Any] = []
max_len = max(len(lst) for lst in filtered)
for i in range(max_len):
for lst in filtered:
if i < len(lst):
result.append(lst[i])
return result
# =============================================================================
# List Concatenation Blocks
# =============================================================================
class ConcatenateListsBlock(Block):
"""
Concatenates two or more lists into a single list.
This block accepts a list of lists and combines all their elements
in order into one flat output list. It supports options for
deduplication and None-filtering to provide flexible list merging
capabilities for workflow pipelines.
"""
class Input(BlockSchemaInput):
lists: List[List[Any]] = SchemaField(
description="A list of lists to concatenate together. All lists will be combined in order into a single list.",
placeholder="e.g., [[1, 2], [3, 4], [5, 6]]",
)
deduplicate: bool = SchemaField(
description="If True, remove duplicate elements from the concatenated result while preserving order.",
default=False,
advanced=True,
)
remove_none: bool = SchemaField(
description="If True, remove None values from the concatenated result.",
default=False,
advanced=True,
)
class Output(BlockSchemaOutput):
concatenated_list: List[Any] = SchemaField(
description="The concatenated list containing all elements from all input lists in order."
)
length: int = SchemaField(
description="The total number of elements in the concatenated list."
)
error: str = SchemaField(
description="Error message if concatenation failed due to invalid input types."
)
@@ -700,7 +902,7 @@ class ConcatenateListsBlock(Block):
def __init__(self):
super().__init__(
id="3cf9298b-5817-4141-9d80-7c2cc5199c8e",
description="Concatenates multiple lists into a single list. All elements from all input lists are combined in order.",
description="Concatenates multiple lists into a single list. All elements from all input lists are combined in order. Supports optional deduplication and None removal.",
categories={BlockCategory.BASIC},
input_schema=ConcatenateListsBlock.Input,
output_schema=ConcatenateListsBlock.Output,
@@ -709,29 +911,497 @@ class ConcatenateListsBlock(Block):
{"lists": [["a", "b"], ["c"], ["d", "e", "f"]]},
{"lists": [[1, 2], []]},
{"lists": []},
{"lists": [[1, 2, 2, 3], [3, 4]], "deduplicate": True},
{"lists": [[1, None, 2], [None, 3]], "remove_none": True},
],
test_output=[
("concatenated_list", [1, 2, 3, 4, 5, 6]),
("length", 6),
("concatenated_list", ["a", "b", "c", "d", "e", "f"]),
("length", 6),
("concatenated_list", [1, 2]),
("length", 2),
("concatenated_list", []),
("length", 0),
("concatenated_list", [1, 2, 3, 4]),
("length", 4),
("concatenated_list", [1, 2, 3]),
("length", 3),
],
)
def _validate_inputs(self, lists: List[Any]) -> str | None:
return _validate_all_lists(lists)
def _perform_concatenation(self, lists: List[List[Any]]) -> List[Any]:
return _concatenate_lists_simple(lists)
def _apply_deduplication(self, items: List[Any]) -> List[Any]:
return _deduplicate_list(items)
def _apply_none_removal(self, items: List[Any]) -> List[Any]:
return _filter_none_values(items)
def _post_process(
self, items: List[Any], deduplicate: bool, remove_none: bool
) -> List[Any]:
"""Apply all post-processing steps to the concatenated result."""
result = items
if remove_none:
result = self._apply_none_removal(result)
if deduplicate:
result = self._apply_deduplication(result)
return result
async def run(self, input_data: Input, **kwargs) -> BlockOutput:
concatenated = []
for idx, lst in enumerate(input_data.lists):
if lst is None:
# Skip None values to avoid errors
continue
if not isinstance(lst, list):
# Type validation: each item must be a list
# Strings are iterable and would cause extend() to iterate character-by-character
# Non-iterable types would raise TypeError
yield "error", (
f"Invalid input at index {idx}: expected a list, got {type(lst).__name__}. "
f"All items in 'lists' must be lists (e.g., [[1, 2], [3, 4]])."
)
return
concatenated.extend(lst)
yield "concatenated_list", concatenated
# Validate all inputs are lists
validation_error = self._validate_inputs(input_data.lists)
if validation_error is not None:
yield "error", validation_error
return
# Perform concatenation
concatenated = self._perform_concatenation(input_data.lists)
# Apply post-processing
result = self._post_process(
concatenated, input_data.deduplicate, input_data.remove_none
)
yield "concatenated_list", result
yield "length", len(result)
class FlattenListBlock(Block):
"""
Flattens a nested list structure into a single flat list.
This block takes a list that may contain nested lists at any depth
and produces a single-level list with all leaf elements. Useful
for normalizing data structures from multiple sources that may
have varying levels of nesting.
"""
class Input(BlockSchemaInput):
nested_list: List[Any] = SchemaField(
description="A potentially nested list to flatten into a single-level list.",
placeholder="e.g., [[1, [2, 3]], [4, [5, [6]]]]",
)
max_depth: int = SchemaField(
description="Maximum depth to flatten. -1 means flatten completely. 1 means flatten only one level.",
default=-1,
advanced=True,
)
class Output(BlockSchemaOutput):
flattened_list: List[Any] = SchemaField(
description="The flattened list with all nested elements extracted."
)
length: int = SchemaField(
description="The number of elements in the flattened list."
)
original_depth: int = SchemaField(
description="The maximum nesting depth of the original input list."
)
error: str = SchemaField(description="Error message if flattening failed.")
def __init__(self):
super().__init__(
id="cc45bb0f-d035-4756-96a7-fe3e36254b4d",
description="Flattens a nested list structure into a single flat list. Supports configurable maximum flattening depth.",
categories={BlockCategory.BASIC},
input_schema=FlattenListBlock.Input,
output_schema=FlattenListBlock.Output,
test_input=[
{"nested_list": [[1, 2], [3, [4, 5]]]},
{"nested_list": [1, [2, [3, [4]]]]},
{"nested_list": [1, [2, [3, [4]]], 5], "max_depth": 1},
{"nested_list": []},
{"nested_list": [1, 2, 3]},
],
test_output=[
("flattened_list", [1, 2, 3, 4, 5]),
("length", 5),
("original_depth", 3),
("flattened_list", [1, 2, 3, 4]),
("length", 4),
("original_depth", 4),
("flattened_list", [1, 2, [3, [4]], 5]),
("length", 4),
("original_depth", 4),
("flattened_list", []),
("length", 0),
("original_depth", 1),
("flattened_list", [1, 2, 3]),
("length", 3),
("original_depth", 1),
],
)
def _compute_depth(self, items: List[Any]) -> int:
"""Compute the nesting depth of the input list."""
return _compute_nesting_depth(items)
def _flatten(self, items: List[Any], max_depth: int) -> List[Any]:
"""Flatten the list to the specified depth."""
return _flatten_nested_list(items, max_depth=max_depth)
def _validate_max_depth(self, max_depth: int) -> str | None:
"""Validate the max_depth parameter."""
if max_depth < -1:
return f"max_depth must be -1 (unlimited) or a non-negative integer, got {max_depth}"
return None
async def run(self, input_data: Input, **kwargs) -> BlockOutput:
# Validate max_depth
depth_error = self._validate_max_depth(input_data.max_depth)
if depth_error is not None:
yield "error", depth_error
return
original_depth = self._compute_depth(input_data.nested_list)
flattened = self._flatten(input_data.nested_list, input_data.max_depth)
yield "flattened_list", flattened
yield "length", len(flattened)
yield "original_depth", original_depth
class InterleaveListsBlock(Block):
"""
Interleaves elements from multiple lists in round-robin fashion.
Given multiple input lists, this block takes one element from each
list in turn, producing an output where elements alternate between
sources. Lists of different lengths are handled gracefully - shorter
lists simply stop contributing once exhausted.
"""
class Input(BlockSchemaInput):
lists: List[List[Any]] = SchemaField(
description="A list of lists to interleave. Elements will be taken in round-robin order.",
placeholder="e.g., [[1, 2, 3], ['a', 'b', 'c']]",
)
class Output(BlockSchemaOutput):
interleaved_list: List[Any] = SchemaField(
description="The interleaved list with elements alternating from each input list."
)
length: int = SchemaField(
description="The total number of elements in the interleaved list."
)
error: str = SchemaField(description="Error message if interleaving failed.")
def __init__(self):
super().__init__(
id="9f616084-1d9f-4f8e-bc00-5b9d2a75cd75",
description="Interleaves elements from multiple lists in round-robin fashion, alternating between sources.",
categories={BlockCategory.BASIC},
input_schema=InterleaveListsBlock.Input,
output_schema=InterleaveListsBlock.Output,
test_input=[
{"lists": [[1, 2, 3], ["a", "b", "c"]]},
{"lists": [[1, 2, 3], ["a", "b"], ["x", "y", "z"]]},
{"lists": [[1], [2], [3]]},
{"lists": []},
],
test_output=[
("interleaved_list", [1, "a", 2, "b", 3, "c"]),
("length", 6),
("interleaved_list", [1, "a", "x", 2, "b", "y", 3, "z"]),
("length", 8),
("interleaved_list", [1, 2, 3]),
("length", 3),
("interleaved_list", []),
("length", 0),
],
)
def _validate_inputs(self, lists: List[Any]) -> str | None:
return _validate_all_lists(lists)
def _interleave(self, lists: List[List[Any]]) -> List[Any]:
return _interleave_lists(lists)
async def run(self, input_data: Input, **kwargs) -> BlockOutput:
validation_error = self._validate_inputs(input_data.lists)
if validation_error is not None:
yield "error", validation_error
return
result = self._interleave(input_data.lists)
yield "interleaved_list", result
yield "length", len(result)
class ZipListsBlock(Block):
"""
Zips multiple lists together into a list of grouped tuples/lists.
Takes two or more input lists and combines corresponding elements
into sub-lists. For example, zipping [1,2,3] and ['a','b','c']
produces [[1,'a'], [2,'b'], [3,'c']]. Supports both truncating
to shortest list and padding to longest list with a fill value.
"""
class Input(BlockSchemaInput):
lists: List[List[Any]] = SchemaField(
description="A list of lists to zip together. Corresponding elements will be grouped.",
placeholder="e.g., [[1, 2, 3], ['a', 'b', 'c']]",
)
pad_to_longest: bool = SchemaField(
description="If True, pad shorter lists with fill_value to match the longest list. If False, truncate to shortest.",
default=False,
advanced=True,
)
fill_value: Any = SchemaField(
description="Value to use for padding when pad_to_longest is True.",
default=None,
advanced=True,
)
class Output(BlockSchemaOutput):
zipped_list: List[List[Any]] = SchemaField(
description="The zipped list of grouped elements."
)
length: int = SchemaField(
description="The number of groups in the zipped result."
)
error: str = SchemaField(description="Error message if zipping failed.")
def __init__(self):
super().__init__(
id="0d0e684f-5cb9-4c4b-b8d1-47a0860e0c07",
description="Zips multiple lists together into a list of grouped elements. Supports padding to longest or truncating to shortest.",
categories={BlockCategory.BASIC},
input_schema=ZipListsBlock.Input,
output_schema=ZipListsBlock.Output,
test_input=[
{"lists": [[1, 2, 3], ["a", "b", "c"]]},
{"lists": [[1, 2, 3], ["a", "b"]]},
{
"lists": [[1, 2], ["a", "b", "c"]],
"pad_to_longest": True,
"fill_value": 0,
},
{"lists": []},
],
test_output=[
("zipped_list", [[1, "a"], [2, "b"], [3, "c"]]),
("length", 3),
("zipped_list", [[1, "a"], [2, "b"]]),
("length", 2),
("zipped_list", [[1, "a"], [2, "b"], [0, "c"]]),
("length", 3),
("zipped_list", []),
("length", 0),
],
)
def _validate_inputs(self, lists: List[Any]) -> str | None:
return _validate_all_lists(lists)
def _zip_truncate(self, lists: List[List[Any]]) -> List[List[Any]]:
"""Zip lists, truncating to shortest."""
filtered = [lst for lst in lists if lst is not None]
if not filtered:
return []
return [list(group) for group in zip(*filtered)]
def _zip_pad(self, lists: List[List[Any]], fill_value: Any) -> List[List[Any]]:
"""Zip lists, padding shorter ones with fill_value."""
if not lists:
return []
lists = [lst for lst in lists if lst is not None]
if not lists:
return []
max_len = max(len(lst) for lst in lists)
result: List[List[Any]] = []
for i in range(max_len):
group: List[Any] = []
for lst in lists:
if i < len(lst):
group.append(lst[i])
else:
group.append(fill_value)
result.append(group)
return result
async def run(self, input_data: Input, **kwargs) -> BlockOutput:
validation_error = self._validate_inputs(input_data.lists)
if validation_error is not None:
yield "error", validation_error
return
if not input_data.lists:
yield "zipped_list", []
yield "length", 0
return
if input_data.pad_to_longest:
result = self._zip_pad(input_data.lists, input_data.fill_value)
else:
result = self._zip_truncate(input_data.lists)
yield "zipped_list", result
yield "length", len(result)
class ListDifferenceBlock(Block):
"""
Computes the difference between two lists (elements in the first
list that are not in the second list).
This is useful for finding items that exist in one dataset but
not in another, such as finding new items, missing items, or
items that need to be processed.
"""
class Input(BlockSchemaInput):
list_a: List[Any] = SchemaField(
description="The primary list to check elements from.",
placeholder="e.g., [1, 2, 3, 4, 5]",
)
list_b: List[Any] = SchemaField(
description="The list to subtract. Elements found here will be removed from list_a.",
placeholder="e.g., [3, 4, 5, 6]",
)
symmetric: bool = SchemaField(
description="If True, compute symmetric difference (elements in either list but not both).",
default=False,
advanced=True,
)
class Output(BlockSchemaOutput):
difference: List[Any] = SchemaField(
description="Elements from list_a not found in list_b (or symmetric difference if enabled)."
)
length: int = SchemaField(
description="The number of elements in the difference result."
)
error: str = SchemaField(description="Error message if the operation failed.")
def __init__(self):
super().__init__(
id="05309873-9d61-447e-96b5-b804e2511829",
description="Computes the difference between two lists. Returns elements in the first list not found in the second, or symmetric difference.",
categories={BlockCategory.BASIC},
input_schema=ListDifferenceBlock.Input,
output_schema=ListDifferenceBlock.Output,
test_input=[
{"list_a": [1, 2, 3, 4, 5], "list_b": [3, 4, 5, 6, 7]},
{
"list_a": [1, 2, 3, 4, 5],
"list_b": [3, 4, 5, 6, 7],
"symmetric": True,
},
{"list_a": ["a", "b", "c"], "list_b": ["b"]},
{"list_a": [], "list_b": [1, 2, 3]},
],
test_output=[
("difference", [1, 2]),
("length", 2),
("difference", [1, 2, 6, 7]),
("length", 4),
("difference", ["a", "c"]),
("length", 2),
("difference", []),
("length", 0),
],
)
def _compute_difference(self, list_a: List[Any], list_b: List[Any]) -> List[Any]:
"""Compute elements in list_a not in list_b."""
b_hashes = {_make_hashable(item) for item in list_b}
return [item for item in list_a if _make_hashable(item) not in b_hashes]
def _compute_symmetric_difference(
self, list_a: List[Any], list_b: List[Any]
) -> List[Any]:
"""Compute elements in either list but not both."""
a_hashes = {_make_hashable(item) for item in list_a}
b_hashes = {_make_hashable(item) for item in list_b}
only_in_a = [item for item in list_a if _make_hashable(item) not in b_hashes]
only_in_b = [item for item in list_b if _make_hashable(item) not in a_hashes]
return only_in_a + only_in_b
async def run(self, input_data: Input, **kwargs) -> BlockOutput:
if input_data.symmetric:
result = self._compute_symmetric_difference(
input_data.list_a, input_data.list_b
)
else:
result = self._compute_difference(input_data.list_a, input_data.list_b)
yield "difference", result
yield "length", len(result)
class ListIntersectionBlock(Block):
"""
Computes the intersection of two lists (elements present in both lists).
This is useful for finding common items between two datasets,
such as shared tags, mutual connections, or overlapping categories.
"""
class Input(BlockSchemaInput):
list_a: List[Any] = SchemaField(
description="The first list to intersect.",
placeholder="e.g., [1, 2, 3, 4, 5]",
)
list_b: List[Any] = SchemaField(
description="The second list to intersect.",
placeholder="e.g., [3, 4, 5, 6, 7]",
)
class Output(BlockSchemaOutput):
intersection: List[Any] = SchemaField(
description="Elements present in both list_a and list_b."
)
length: int = SchemaField(
description="The number of elements in the intersection."
)
error: str = SchemaField(description="Error message if the operation failed.")
def __init__(self):
super().__init__(
id="b6eb08b6-dbe3-411b-b9b4-2508cb311a1f",
description="Computes the intersection of two lists, returning only elements present in both.",
categories={BlockCategory.BASIC},
input_schema=ListIntersectionBlock.Input,
output_schema=ListIntersectionBlock.Output,
test_input=[
{"list_a": [1, 2, 3, 4, 5], "list_b": [3, 4, 5, 6, 7]},
{"list_a": ["a", "b", "c"], "list_b": ["c", "d", "e"]},
{"list_a": [1, 2], "list_b": [3, 4]},
{"list_a": [], "list_b": [1, 2, 3]},
],
test_output=[
("intersection", [3, 4, 5]),
("length", 3),
("intersection", ["c"]),
("length", 1),
("intersection", []),
("length", 0),
("intersection", []),
("length", 0),
],
)
def _compute_intersection(self, list_a: List[Any], list_b: List[Any]) -> List[Any]:
"""Compute elements present in both lists, preserving order from list_a."""
b_hashes = {_make_hashable(item) for item in list_b}
seen: set = set()
result: List[Any] = []
for item in list_a:
h = _make_hashable(item)
if h in b_hashes and h not in seen:
result.append(item)
seen.add(h)
return result
async def run(self, input_data: Input, **kwargs) -> BlockOutput:
result = self._compute_intersection(input_data.list_a, input_data.list_b)
yield "intersection", result
yield "length", len(result)

View File

@@ -17,6 +17,7 @@ from backend.blocks.jina._auth import (
from backend.blocks.search import GetRequest
from backend.data.model import SchemaField
from backend.util.exceptions import BlockExecutionError
from backend.util.request import HTTPClientError, HTTPServerError, validate_url
class SearchTheWebBlock(Block, GetRequest):
@@ -110,7 +111,12 @@ class ExtractWebsiteContentBlock(Block, GetRequest):
self, input_data: Input, *, credentials: JinaCredentials, **kwargs
) -> BlockOutput:
if input_data.raw_content:
url = input_data.url
try:
parsed_url, _, _ = await validate_url(input_data.url, [])
url = parsed_url.geturl()
except ValueError as e:
yield "error", f"Invalid URL: {e}"
return
headers = {}
else:
url = f"https://r.jina.ai/{input_data.url}"
@@ -119,5 +125,20 @@ class ExtractWebsiteContentBlock(Block, GetRequest):
"Authorization": f"Bearer {credentials.api_key.get_secret_value()}",
}
content = await self.get_request(url, json=False, headers=headers)
try:
content = await self.get_request(url, json=False, headers=headers)
except HTTPClientError as e:
yield "error", f"Client error ({e.status_code}) fetching {input_data.url}: {e}"
return
except HTTPServerError as e:
yield "error", f"Server error ({e.status_code}) fetching {input_data.url}: {e}"
return
except Exception as e:
yield "error", f"Failed to fetch {input_data.url}: {e}"
return
if not content:
yield "error", f"No content returned for {input_data.url}"
return
yield "content", content

View File

@@ -0,0 +1,300 @@
"""
MCP (Model Context Protocol) Tool Block.
A single dynamic block that can connect to any MCP server, discover available tools,
and execute them. Works like AgentExecutorBlock — the user selects a tool from a
dropdown and the input/output schema adapts dynamically.
"""
import json
import logging
from typing import Any, Literal
from pydantic import SecretStr
from backend.blocks._base import (
Block,
BlockCategory,
BlockSchemaInput,
BlockSchemaOutput,
BlockType,
)
from backend.blocks.mcp.client import MCPClient, MCPClientError
from backend.data.block import BlockInput, BlockOutput
from backend.data.model import (
CredentialsField,
CredentialsMetaInput,
OAuth2Credentials,
SchemaField,
)
from backend.integrations.providers import ProviderName
from backend.util.json import validate_with_jsonschema
logger = logging.getLogger(__name__)
TEST_CREDENTIALS = OAuth2Credentials(
id="test-mcp-cred",
provider="mcp",
access_token=SecretStr("mock-mcp-token"),
refresh_token=SecretStr("mock-refresh"),
scopes=[],
title="Mock MCP credential",
)
TEST_CREDENTIALS_INPUT = {
"provider": TEST_CREDENTIALS.provider,
"id": TEST_CREDENTIALS.id,
"type": TEST_CREDENTIALS.type,
"title": TEST_CREDENTIALS.title,
}
MCPCredentials = CredentialsMetaInput[Literal[ProviderName.MCP], Literal["oauth2"]]
class MCPToolBlock(Block):
"""
A block that connects to an MCP server, lets the user pick a tool,
and executes it with dynamic input/output schema.
The flow:
1. User provides an MCP server URL (and optional credentials)
2. Frontend calls the backend to get tool list from that URL
3. User selects a tool from a dropdown (available_tools)
4. The block's input schema updates to reflect the selected tool's parameters
5. On execution, the block calls the MCP server to run the tool
"""
class Input(BlockSchemaInput):
server_url: str = SchemaField(
description="URL of the MCP server (Streamable HTTP endpoint)",
placeholder="https://mcp.example.com/mcp",
)
credentials: MCPCredentials = CredentialsField(
discriminator="server_url",
description="MCP server OAuth credentials",
default={},
)
selected_tool: str = SchemaField(
description="The MCP tool to execute",
placeholder="Select a tool",
default="",
)
tool_input_schema: dict[str, Any] = SchemaField(
description="JSON Schema for the selected tool's input parameters. "
"Populated automatically when a tool is selected.",
default={},
hidden=True,
)
tool_arguments: dict[str, Any] = SchemaField(
description="Arguments to pass to the selected MCP tool. "
"The fields here are defined by the tool's input schema.",
default={},
)
@classmethod
def get_input_schema(cls, data: BlockInput) -> dict[str, Any]:
"""Return the tool's input schema so the builder UI renders dynamic fields."""
return data.get("tool_input_schema", {})
@classmethod
def get_input_defaults(cls, data: BlockInput) -> BlockInput:
"""Return the current tool_arguments as defaults for the dynamic fields."""
return data.get("tool_arguments", {})
@classmethod
def get_missing_input(cls, data: BlockInput) -> set[str]:
"""Check which required tool arguments are missing."""
required_fields = cls.get_input_schema(data).get("required", [])
tool_arguments = data.get("tool_arguments", {})
return set(required_fields) - set(tool_arguments)
@classmethod
def get_mismatch_error(cls, data: BlockInput) -> str | None:
"""Validate tool_arguments against the tool's input schema."""
tool_schema = cls.get_input_schema(data)
if not tool_schema:
return None
tool_arguments = data.get("tool_arguments", {})
return validate_with_jsonschema(tool_schema, tool_arguments)
class Output(BlockSchemaOutput):
result: Any = SchemaField(description="The result returned by the MCP tool")
error: str = SchemaField(description="Error message if the tool call failed")
def __init__(self):
super().__init__(
id="a0a4b1c2-d3e4-4f56-a7b8-c9d0e1f2a3b4",
description="Connect to any MCP server and execute its tools. "
"Provide a server URL, select a tool, and pass arguments dynamically.",
categories={BlockCategory.DEVELOPER_TOOLS},
input_schema=MCPToolBlock.Input,
output_schema=MCPToolBlock.Output,
block_type=BlockType.MCP_TOOL,
test_credentials=TEST_CREDENTIALS,
test_input={
"server_url": "https://mcp.example.com/mcp",
"credentials": TEST_CREDENTIALS_INPUT,
"selected_tool": "get_weather",
"tool_input_schema": {
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"],
},
"tool_arguments": {"city": "London"},
},
test_output=[
(
"result",
{"weather": "sunny", "temperature": 20},
),
],
test_mock={
"_call_mcp_tool": lambda *a, **kw: {
"weather": "sunny",
"temperature": 20,
},
},
)
async def _call_mcp_tool(
self,
server_url: str,
tool_name: str,
arguments: dict[str, Any],
auth_token: str | None = None,
) -> Any:
"""Call a tool on the MCP server. Extracted for easy mocking in tests."""
client = MCPClient(server_url, auth_token=auth_token)
await client.initialize()
result = await client.call_tool(tool_name, arguments)
if result.is_error:
error_text = ""
for item in result.content:
if item.get("type") == "text":
error_text += item.get("text", "")
raise MCPClientError(
f"MCP tool '{tool_name}' returned an error: "
f"{error_text or 'Unknown error'}"
)
# Extract text content from the result
output_parts = []
for item in result.content:
if item.get("type") == "text":
text = item.get("text", "")
# Try to parse as JSON for structured output
try:
output_parts.append(json.loads(text))
except (json.JSONDecodeError, ValueError):
output_parts.append(text)
elif item.get("type") == "image":
output_parts.append(
{
"type": "image",
"data": item.get("data"),
"mimeType": item.get("mimeType"),
}
)
elif item.get("type") == "resource":
output_parts.append(item.get("resource", {}))
# If single result, unwrap
if len(output_parts) == 1:
return output_parts[0]
return output_parts if output_parts else None
@staticmethod
async def _auto_lookup_credential(
user_id: str, server_url: str
) -> "OAuth2Credentials | None":
"""Auto-lookup stored MCP credential for a server URL.
This is a fallback for nodes that don't have ``credentials`` explicitly
set (e.g. nodes created before the credential field was wired up).
"""
from backend.integrations.creds_manager import IntegrationCredentialsManager
from backend.integrations.providers import ProviderName
try:
mgr = IntegrationCredentialsManager()
mcp_creds = await mgr.store.get_creds_by_provider(
user_id, ProviderName.MCP.value
)
best: OAuth2Credentials | None = None
for cred in mcp_creds:
if (
isinstance(cred, OAuth2Credentials)
and (cred.metadata or {}).get("mcp_server_url") == server_url
):
if best is None or (
(cred.access_token_expires_at or 0)
> (best.access_token_expires_at or 0)
):
best = cred
if best:
best = await mgr.refresh_if_needed(user_id, best)
logger.info(
"Auto-resolved MCP credential %s for %s", best.id, server_url
)
return best
except Exception:
logger.warning("Auto-lookup MCP credential failed", exc_info=True)
return None
async def run(
self,
input_data: Input,
*,
user_id: str,
credentials: OAuth2Credentials | None = None,
**kwargs,
) -> BlockOutput:
if not input_data.server_url:
yield "error", "MCP server URL is required"
return
if not input_data.selected_tool:
yield "error", "No tool selected. Please select a tool from the dropdown."
return
# Validate required tool arguments before calling the server.
# The executor-level validation is bypassed for MCP blocks because
# get_input_defaults() flattens tool_arguments, stripping tool_input_schema
# from the validation context.
required = set(input_data.tool_input_schema.get("required", []))
if required:
missing = required - set(input_data.tool_arguments.keys())
if missing:
yield "error", (
f"Missing required argument(s): {', '.join(sorted(missing))}. "
f"Please fill in all required fields marked with * in the block form."
)
return
# If no credentials were injected by the executor (e.g. legacy nodes
# that don't have the credentials field set), try to auto-lookup
# the stored MCP credential for this server URL.
if credentials is None:
credentials = await self._auto_lookup_credential(
user_id, input_data.server_url
)
auth_token = (
credentials.access_token.get_secret_value() if credentials else None
)
try:
result = await self._call_mcp_tool(
server_url=input_data.server_url,
tool_name=input_data.selected_tool,
arguments=input_data.tool_arguments,
auth_token=auth_token,
)
yield "result", result
except MCPClientError as e:
yield "error", str(e)
except Exception as e:
logger.exception(f"MCP tool call failed: {e}")
yield "error", f"MCP tool call failed: {str(e)}"

View File

@@ -0,0 +1,323 @@
"""
MCP (Model Context Protocol) HTTP client.
Implements the MCP Streamable HTTP transport for listing tools and calling tools
on remote MCP servers. Uses JSON-RPC 2.0 over HTTP POST.
Handles both JSON and SSE (text/event-stream) response formats per the MCP spec.
Reference: https://modelcontextprotocol.io/specification/2025-03-26/basic/transports
"""
import json
import logging
from dataclasses import dataclass, field
from typing import Any
from backend.util.request import Requests
logger = logging.getLogger(__name__)
@dataclass
class MCPTool:
"""Represents an MCP tool discovered from a server."""
name: str
description: str
input_schema: dict[str, Any]
@dataclass
class MCPCallResult:
"""Result from calling an MCP tool."""
content: list[dict[str, Any]] = field(default_factory=list)
is_error: bool = False
class MCPClientError(Exception):
"""Raised when an MCP protocol error occurs."""
pass
class MCPClient:
"""
Async HTTP client for the MCP Streamable HTTP transport.
Communicates with MCP servers using JSON-RPC 2.0 over HTTP POST.
Supports optional Bearer token authentication.
"""
def __init__(
self,
server_url: str,
auth_token: str | None = None,
):
self.server_url = server_url.rstrip("/")
self.auth_token = auth_token
self._request_id = 0
self._session_id: str | None = None
def _next_id(self) -> int:
self._request_id += 1
return self._request_id
def _build_headers(self) -> dict[str, str]:
headers = {
"Content-Type": "application/json",
"Accept": "application/json, text/event-stream",
}
if self.auth_token:
headers["Authorization"] = f"Bearer {self.auth_token}"
if self._session_id:
headers["Mcp-Session-Id"] = self._session_id
return headers
def _build_jsonrpc_request(
self, method: str, params: dict[str, Any] | None = None
) -> dict[str, Any]:
req: dict[str, Any] = {
"jsonrpc": "2.0",
"method": method,
"id": self._next_id(),
}
if params is not None:
req["params"] = params
return req
@staticmethod
def _parse_sse_response(text: str) -> dict[str, Any]:
"""Parse an SSE (text/event-stream) response body into JSON-RPC data.
MCP servers may return responses as SSE with format:
event: message
data: {"jsonrpc":"2.0","result":{...},"id":1}
We extract the last `data:` line that contains a JSON-RPC response
(i.e. has an "id" field), which is the reply to our request.
"""
last_data: dict[str, Any] | None = None
for line in text.splitlines():
stripped = line.strip()
if stripped.startswith("data:"):
payload = stripped[len("data:") :].strip()
if not payload:
continue
try:
parsed = json.loads(payload)
# Only keep JSON-RPC responses (have "id"), skip notifications
if isinstance(parsed, dict) and "id" in parsed:
last_data = parsed
except (json.JSONDecodeError, ValueError):
continue
if last_data is None:
raise MCPClientError("No JSON-RPC response found in SSE stream")
return last_data
async def _send_request(
self, method: str, params: dict[str, Any] | None = None
) -> Any:
"""Send a JSON-RPC request to the MCP server and return the result.
Handles both ``application/json`` and ``text/event-stream`` responses
as required by the MCP Streamable HTTP transport specification.
"""
payload = self._build_jsonrpc_request(method, params)
headers = self._build_headers()
requests = Requests(
raise_for_status=True,
extra_headers=headers,
)
response = await requests.post(self.server_url, json=payload)
# Capture session ID from response (MCP Streamable HTTP transport)
session_id = response.headers.get("Mcp-Session-Id")
if session_id:
self._session_id = session_id
content_type = response.headers.get("content-type", "")
if "text/event-stream" in content_type:
body = self._parse_sse_response(response.text())
else:
try:
body = response.json()
except Exception as e:
raise MCPClientError(
f"MCP server returned non-JSON response: {e}"
) from e
if not isinstance(body, dict):
raise MCPClientError(
f"MCP server returned unexpected JSON type: {type(body).__name__}"
)
# Handle JSON-RPC error
if "error" in body:
error = body["error"]
if isinstance(error, dict):
raise MCPClientError(
f"MCP server error [{error.get('code', '?')}]: "
f"{error.get('message', 'Unknown error')}"
)
raise MCPClientError(f"MCP server error: {error}")
return body.get("result")
async def _send_notification(self, method: str) -> None:
"""Send a JSON-RPC notification (no id, no response expected)."""
headers = self._build_headers()
notification = {"jsonrpc": "2.0", "method": method}
requests = Requests(
raise_for_status=False,
extra_headers=headers,
)
await requests.post(self.server_url, json=notification)
async def discover_auth(self) -> dict[str, Any] | None:
"""Probe the MCP server's OAuth metadata (RFC 9728 / MCP spec).
Returns ``None`` if the server doesn't require auth, otherwise returns
a dict with:
- ``authorization_servers``: list of authorization server URLs
- ``resource``: the resource indicator URL (usually the MCP endpoint)
- ``scopes_supported``: optional list of supported scopes
The caller can then fetch the authorization server metadata to get
``authorization_endpoint``, ``token_endpoint``, etc.
"""
from urllib.parse import urlparse
parsed = urlparse(self.server_url)
base = f"{parsed.scheme}://{parsed.netloc}"
# Build candidates for protected-resource metadata (per RFC 9728)
path = parsed.path.rstrip("/")
candidates = []
if path and path != "/":
candidates.append(f"{base}/.well-known/oauth-protected-resource{path}")
candidates.append(f"{base}/.well-known/oauth-protected-resource")
requests = Requests(
raise_for_status=False,
)
for url in candidates:
try:
resp = await requests.get(url)
if resp.status == 200:
data = resp.json()
if isinstance(data, dict) and "authorization_servers" in data:
return data
except Exception:
continue
return None
async def discover_auth_server_metadata(
self, auth_server_url: str
) -> dict[str, Any] | None:
"""Fetch the OAuth Authorization Server Metadata (RFC 8414).
Given an authorization server URL, returns a dict with:
- ``authorization_endpoint``
- ``token_endpoint``
- ``registration_endpoint`` (for dynamic client registration)
- ``scopes_supported``
- ``code_challenge_methods_supported``
- etc.
"""
from urllib.parse import urlparse
parsed = urlparse(auth_server_url)
base = f"{parsed.scheme}://{parsed.netloc}"
path = parsed.path.rstrip("/")
# Try standard metadata endpoints (RFC 8414 and OpenID Connect)
candidates = []
if path and path != "/":
candidates.append(f"{base}/.well-known/oauth-authorization-server{path}")
candidates.append(f"{base}/.well-known/oauth-authorization-server")
candidates.append(f"{base}/.well-known/openid-configuration")
requests = Requests(
raise_for_status=False,
)
for url in candidates:
try:
resp = await requests.get(url)
if resp.status == 200:
data = resp.json()
if isinstance(data, dict) and "authorization_endpoint" in data:
return data
except Exception:
continue
return None
async def initialize(self) -> dict[str, Any]:
"""
Send the MCP initialize request.
This is required by the MCP protocol before any other requests.
Returns the server's capabilities.
"""
result = await self._send_request(
"initialize",
{
"protocolVersion": "2025-03-26",
"capabilities": {},
"clientInfo": {"name": "AutoGPT-Platform", "version": "1.0.0"},
},
)
# Send initialized notification (no response expected)
await self._send_notification("notifications/initialized")
return result or {}
async def list_tools(self) -> list[MCPTool]:
"""
Discover available tools from the MCP server.
Returns a list of MCPTool objects with name, description, and input schema.
"""
result = await self._send_request("tools/list")
if not result or "tools" not in result:
return []
tools = []
for tool_data in result["tools"]:
tools.append(
MCPTool(
name=tool_data.get("name", ""),
description=tool_data.get("description", ""),
input_schema=tool_data.get("inputSchema", {}),
)
)
return tools
async def call_tool(
self, tool_name: str, arguments: dict[str, Any]
) -> MCPCallResult:
"""
Call a tool on the MCP server.
Args:
tool_name: The name of the tool to call.
arguments: The arguments to pass to the tool.
Returns:
MCPCallResult with the tool's response content.
"""
result = await self._send_request(
"tools/call",
{"name": tool_name, "arguments": arguments},
)
if not result:
return MCPCallResult(is_error=True)
return MCPCallResult(
content=result.get("content", []),
is_error=result.get("isError", False),
)

View File

@@ -0,0 +1,204 @@
"""
MCP OAuth handler for MCP servers that use OAuth 2.1 authorization.
Unlike other OAuth handlers (GitHub, Google, etc.) where endpoints are fixed,
MCP servers have dynamic endpoints discovered via RFC 9728 / RFC 8414 metadata.
This handler accepts those endpoints at construction time.
"""
import logging
import time
import urllib.parse
from typing import ClassVar, Optional
from pydantic import SecretStr
from backend.data.model import OAuth2Credentials
from backend.integrations.oauth.base import BaseOAuthHandler
from backend.integrations.providers import ProviderName
from backend.util.request import Requests
logger = logging.getLogger(__name__)
class MCPOAuthHandler(BaseOAuthHandler):
"""
OAuth handler for MCP servers with dynamically-discovered endpoints.
Construction requires the authorization and token endpoint URLs,
which are obtained via MCP OAuth metadata discovery
(``MCPClient.discover_auth`` + ``discover_auth_server_metadata``).
"""
PROVIDER_NAME: ClassVar[ProviderName | str] = ProviderName.MCP
DEFAULT_SCOPES: ClassVar[list[str]] = []
def __init__(
self,
client_id: str,
client_secret: str,
redirect_uri: str,
*,
authorize_url: str,
token_url: str,
revoke_url: str | None = None,
resource_url: str | None = None,
):
self.client_id = client_id
self.client_secret = client_secret
self.redirect_uri = redirect_uri
self.authorize_url = authorize_url
self.token_url = token_url
self.revoke_url = revoke_url
self.resource_url = resource_url
def get_login_url(
self,
scopes: list[str],
state: str,
code_challenge: Optional[str],
) -> str:
scopes = self.handle_default_scopes(scopes)
params: dict[str, str] = {
"response_type": "code",
"client_id": self.client_id,
"redirect_uri": self.redirect_uri,
"state": state,
}
if scopes:
params["scope"] = " ".join(scopes)
# PKCE (S256) — included when the caller provides a code_challenge
if code_challenge:
params["code_challenge"] = code_challenge
params["code_challenge_method"] = "S256"
# MCP spec requires resource indicator (RFC 8707)
if self.resource_url:
params["resource"] = self.resource_url
return f"{self.authorize_url}?{urllib.parse.urlencode(params)}"
async def exchange_code_for_tokens(
self,
code: str,
scopes: list[str],
code_verifier: Optional[str],
) -> OAuth2Credentials:
data: dict[str, str] = {
"grant_type": "authorization_code",
"code": code,
"redirect_uri": self.redirect_uri,
"client_id": self.client_id,
}
if self.client_secret:
data["client_secret"] = self.client_secret
if code_verifier:
data["code_verifier"] = code_verifier
if self.resource_url:
data["resource"] = self.resource_url
response = await Requests(raise_for_status=True).post(
self.token_url,
data=data,
headers={"Content-Type": "application/x-www-form-urlencoded"},
)
tokens = response.json()
if "error" in tokens:
raise RuntimeError(
f"Token exchange failed: {tokens.get('error_description', tokens['error'])}"
)
if "access_token" not in tokens:
raise RuntimeError("OAuth token response missing 'access_token' field")
now = int(time.time())
expires_in = tokens.get("expires_in")
return OAuth2Credentials(
provider=self.PROVIDER_NAME,
title=None,
access_token=SecretStr(tokens["access_token"]),
refresh_token=(
SecretStr(tokens["refresh_token"])
if tokens.get("refresh_token")
else None
),
access_token_expires_at=now + expires_in if expires_in else None,
refresh_token_expires_at=None,
scopes=scopes,
metadata={
"mcp_token_url": self.token_url,
"mcp_resource_url": self.resource_url,
},
)
async def _refresh_tokens(
self, credentials: OAuth2Credentials
) -> OAuth2Credentials:
if not credentials.refresh_token:
raise ValueError("No refresh token available for MCP OAuth credentials")
data: dict[str, str] = {
"grant_type": "refresh_token",
"refresh_token": credentials.refresh_token.get_secret_value(),
"client_id": self.client_id,
}
if self.client_secret:
data["client_secret"] = self.client_secret
if self.resource_url:
data["resource"] = self.resource_url
response = await Requests(raise_for_status=True).post(
self.token_url,
data=data,
headers={"Content-Type": "application/x-www-form-urlencoded"},
)
tokens = response.json()
if "error" in tokens:
raise RuntimeError(
f"Token refresh failed: {tokens.get('error_description', tokens['error'])}"
)
if "access_token" not in tokens:
raise RuntimeError("OAuth refresh response missing 'access_token' field")
now = int(time.time())
expires_in = tokens.get("expires_in")
return OAuth2Credentials(
id=credentials.id,
provider=self.PROVIDER_NAME,
title=credentials.title,
access_token=SecretStr(tokens["access_token"]),
refresh_token=(
SecretStr(tokens["refresh_token"])
if tokens.get("refresh_token")
else credentials.refresh_token
),
access_token_expires_at=now + expires_in if expires_in else None,
refresh_token_expires_at=credentials.refresh_token_expires_at,
scopes=credentials.scopes,
metadata=credentials.metadata,
)
async def revoke_tokens(self, credentials: OAuth2Credentials) -> bool:
if not self.revoke_url:
return False
try:
data = {
"token": credentials.access_token.get_secret_value(),
"token_type_hint": "access_token",
"client_id": self.client_id,
}
await Requests().post(
self.revoke_url,
data=data,
headers={"Content-Type": "application/x-www-form-urlencoded"},
)
return True
except Exception:
logger.warning("Failed to revoke MCP OAuth tokens", exc_info=True)
return False

View File

@@ -0,0 +1,109 @@
"""
End-to-end tests against a real public MCP server.
These tests hit the OpenAI docs MCP server (https://developers.openai.com/mcp)
which is publicly accessible without authentication and returns SSE responses.
Mark: These are tagged with ``@pytest.mark.e2e`` so they can be run/skipped
independently of the rest of the test suite (they require network access).
"""
import json
import os
import pytest
from backend.blocks.mcp.client import MCPClient
# Public MCP server that requires no authentication
OPENAI_DOCS_MCP_URL = "https://developers.openai.com/mcp"
# Skip all tests in this module unless RUN_E2E env var is set
pytestmark = pytest.mark.skipif(
not os.environ.get("RUN_E2E"), reason="set RUN_E2E=1 to run e2e tests"
)
class TestRealMCPServer:
"""Tests against the live OpenAI docs MCP server."""
@pytest.mark.asyncio(loop_scope="session")
async def test_initialize(self):
"""Verify we can complete the MCP handshake with a real server."""
client = MCPClient(OPENAI_DOCS_MCP_URL)
result = await client.initialize()
assert result["protocolVersion"] == "2025-03-26"
assert "serverInfo" in result
assert result["serverInfo"]["name"] == "openai-docs-mcp"
assert "tools" in result.get("capabilities", {})
@pytest.mark.asyncio(loop_scope="session")
async def test_list_tools(self):
"""Verify we can discover tools from a real MCP server."""
client = MCPClient(OPENAI_DOCS_MCP_URL)
await client.initialize()
tools = await client.list_tools()
assert len(tools) >= 3 # server has at least 5 tools as of writing
tool_names = {t.name for t in tools}
# These tools are documented and should be stable
assert "search_openai_docs" in tool_names
assert "list_openai_docs" in tool_names
assert "fetch_openai_doc" in tool_names
# Verify schema structure
search_tool = next(t for t in tools if t.name == "search_openai_docs")
assert "query" in search_tool.input_schema.get("properties", {})
assert "query" in search_tool.input_schema.get("required", [])
@pytest.mark.asyncio(loop_scope="session")
async def test_call_tool_list_api_endpoints(self):
"""Call the list_api_endpoints tool and verify we get real data."""
client = MCPClient(OPENAI_DOCS_MCP_URL)
await client.initialize()
result = await client.call_tool("list_api_endpoints", {})
assert not result.is_error
assert len(result.content) >= 1
assert result.content[0]["type"] == "text"
data = json.loads(result.content[0]["text"])
assert "paths" in data or "urls" in data
# The OpenAI API should have many endpoints
total = data.get("total", len(data.get("paths", [])))
assert total > 50
@pytest.mark.asyncio(loop_scope="session")
async def test_call_tool_search(self):
"""Search for docs and verify we get results."""
client = MCPClient(OPENAI_DOCS_MCP_URL)
await client.initialize()
result = await client.call_tool(
"search_openai_docs", {"query": "chat completions", "limit": 3}
)
assert not result.is_error
assert len(result.content) >= 1
@pytest.mark.asyncio(loop_scope="session")
async def test_sse_response_handling(self):
"""Verify the client correctly handles SSE responses from a real server.
This is the key test — our local test server returns JSON,
but real MCP servers typically return SSE. This proves the
SSE parsing works end-to-end.
"""
client = MCPClient(OPENAI_DOCS_MCP_URL)
# initialize() internally calls _send_request which must parse SSE
result = await client.initialize()
# If we got here without error, SSE parsing works
assert isinstance(result, dict)
assert "protocolVersion" in result
# Also verify list_tools works (another SSE response)
tools = await client.list_tools()
assert len(tools) > 0
assert all(hasattr(t, "name") for t in tools)

View File

@@ -0,0 +1,389 @@
"""
Integration tests for MCP client and MCPToolBlock against a real HTTP server.
These tests spin up a local MCP test server and run the full client/block flow
against it — no mocking, real HTTP requests.
"""
import asyncio
import json
import threading
from unittest.mock import patch
import pytest
from aiohttp import web
from pydantic import SecretStr
from backend.blocks.mcp.block import MCPToolBlock
from backend.blocks.mcp.client import MCPClient
from backend.blocks.mcp.test_server import create_test_mcp_app
from backend.data.model import OAuth2Credentials
MOCK_USER_ID = "test-user-integration"
class _MCPTestServer:
"""
Run an MCP test server in a background thread with its own event loop.
This avoids event loop conflicts with pytest-asyncio.
"""
def __init__(self, auth_token: str | None = None):
self.auth_token = auth_token
self.url: str = ""
self._runner: web.AppRunner | None = None
self._loop: asyncio.AbstractEventLoop | None = None
self._thread: threading.Thread | None = None
self._started = threading.Event()
def _run(self):
self._loop = asyncio.new_event_loop()
asyncio.set_event_loop(self._loop)
self._loop.run_until_complete(self._start())
self._started.set()
self._loop.run_forever()
async def _start(self):
app = create_test_mcp_app(auth_token=self.auth_token)
self._runner = web.AppRunner(app)
await self._runner.setup()
site = web.TCPSite(self._runner, "127.0.0.1", 0)
await site.start()
port = site._server.sockets[0].getsockname()[1] # type: ignore[union-attr]
self.url = f"http://127.0.0.1:{port}/mcp"
def start(self):
self._thread = threading.Thread(target=self._run, daemon=True)
self._thread.start()
if not self._started.wait(timeout=5):
raise RuntimeError("MCP test server failed to start within 5 seconds")
return self
def stop(self):
if self._loop and self._runner:
asyncio.run_coroutine_threadsafe(self._runner.cleanup(), self._loop).result(
timeout=5
)
self._loop.call_soon_threadsafe(self._loop.stop)
if self._thread:
self._thread.join(timeout=5)
@pytest.fixture(scope="module")
def mcp_server():
"""Start a local MCP test server in a background thread."""
server = _MCPTestServer()
server.start()
yield server.url
server.stop()
@pytest.fixture(scope="module")
def mcp_server_with_auth():
"""Start a local MCP test server with auth in a background thread."""
server = _MCPTestServer(auth_token="test-secret-token")
server.start()
yield server.url, "test-secret-token"
server.stop()
@pytest.fixture(autouse=True)
def _allow_localhost():
"""
Allow 127.0.0.1 through SSRF protection for integration tests.
The Requests class blocks private IPs by default. We patch the Requests
constructor to always include 127.0.0.1 as a trusted origin so the local
test server is reachable.
"""
from backend.util.request import Requests
original_init = Requests.__init__
def patched_init(self, *args, **kwargs):
trusted = list(kwargs.get("trusted_origins") or [])
trusted.append("http://127.0.0.1")
kwargs["trusted_origins"] = trusted
original_init(self, *args, **kwargs)
with patch.object(Requests, "__init__", patched_init):
yield
def _make_client(url: str, auth_token: str | None = None) -> MCPClient:
"""Create an MCPClient for integration tests."""
return MCPClient(url, auth_token=auth_token)
# ── MCPClient integration tests ──────────────────────────────────────
class TestMCPClientIntegration:
"""Test MCPClient against a real local MCP server."""
@pytest.mark.asyncio(loop_scope="session")
async def test_initialize(self, mcp_server):
client = _make_client(mcp_server)
result = await client.initialize()
assert result["protocolVersion"] == "2025-03-26"
assert result["serverInfo"]["name"] == "test-mcp-server"
assert "tools" in result["capabilities"]
@pytest.mark.asyncio(loop_scope="session")
async def test_list_tools(self, mcp_server):
client = _make_client(mcp_server)
await client.initialize()
tools = await client.list_tools()
assert len(tools) == 3
tool_names = {t.name for t in tools}
assert tool_names == {"get_weather", "add_numbers", "echo"}
# Check get_weather schema
weather = next(t for t in tools if t.name == "get_weather")
assert weather.description == "Get current weather for a city"
assert "city" in weather.input_schema["properties"]
assert weather.input_schema["required"] == ["city"]
# Check add_numbers schema
add = next(t for t in tools if t.name == "add_numbers")
assert "a" in add.input_schema["properties"]
assert "b" in add.input_schema["properties"]
@pytest.mark.asyncio(loop_scope="session")
async def test_call_tool_get_weather(self, mcp_server):
client = _make_client(mcp_server)
await client.initialize()
result = await client.call_tool("get_weather", {"city": "London"})
assert not result.is_error
assert len(result.content) == 1
assert result.content[0]["type"] == "text"
data = json.loads(result.content[0]["text"])
assert data["city"] == "London"
assert data["temperature"] == 22
assert data["condition"] == "sunny"
@pytest.mark.asyncio(loop_scope="session")
async def test_call_tool_add_numbers(self, mcp_server):
client = _make_client(mcp_server)
await client.initialize()
result = await client.call_tool("add_numbers", {"a": 3, "b": 7})
assert not result.is_error
data = json.loads(result.content[0]["text"])
assert data["result"] == 10
@pytest.mark.asyncio(loop_scope="session")
async def test_call_tool_echo(self, mcp_server):
client = _make_client(mcp_server)
await client.initialize()
result = await client.call_tool("echo", {"message": "Hello MCP!"})
assert not result.is_error
assert result.content[0]["text"] == "Hello MCP!"
@pytest.mark.asyncio(loop_scope="session")
async def test_call_unknown_tool(self, mcp_server):
client = _make_client(mcp_server)
await client.initialize()
result = await client.call_tool("nonexistent_tool", {})
assert result.is_error
assert "Unknown tool" in result.content[0]["text"]
@pytest.mark.asyncio(loop_scope="session")
async def test_auth_success(self, mcp_server_with_auth):
url, token = mcp_server_with_auth
client = _make_client(url, auth_token=token)
result = await client.initialize()
assert result["protocolVersion"] == "2025-03-26"
tools = await client.list_tools()
assert len(tools) == 3
@pytest.mark.asyncio(loop_scope="session")
async def test_auth_failure(self, mcp_server_with_auth):
url, _ = mcp_server_with_auth
client = _make_client(url, auth_token="wrong-token")
with pytest.raises(Exception):
await client.initialize()
@pytest.mark.asyncio(loop_scope="session")
async def test_auth_missing(self, mcp_server_with_auth):
url, _ = mcp_server_with_auth
client = _make_client(url)
with pytest.raises(Exception):
await client.initialize()
# ── MCPToolBlock integration tests ───────────────────────────────────
class TestMCPToolBlockIntegration:
"""Test MCPToolBlock end-to-end against a real local MCP server."""
@pytest.mark.asyncio(loop_scope="session")
async def test_full_flow_get_weather(self, mcp_server):
"""Full flow: discover tools, select one, execute it."""
# Step 1: Discover tools (simulating what the frontend/API would do)
client = _make_client(mcp_server)
await client.initialize()
tools = await client.list_tools()
assert len(tools) == 3
# Step 2: User selects "get_weather" and we get its schema
weather_tool = next(t for t in tools if t.name == "get_weather")
# Step 3: Execute the block — no credentials (public server)
block = MCPToolBlock()
input_data = MCPToolBlock.Input(
server_url=mcp_server,
selected_tool="get_weather",
tool_input_schema=weather_tool.input_schema,
tool_arguments={"city": "Paris"},
)
outputs = []
async for name, data in block.run(input_data, user_id=MOCK_USER_ID):
outputs.append((name, data))
assert len(outputs) == 1
assert outputs[0][0] == "result"
result = outputs[0][1]
assert result["city"] == "Paris"
assert result["temperature"] == 22
assert result["condition"] == "sunny"
@pytest.mark.asyncio(loop_scope="session")
async def test_full_flow_add_numbers(self, mcp_server):
"""Full flow for add_numbers tool."""
client = _make_client(mcp_server)
await client.initialize()
tools = await client.list_tools()
add_tool = next(t for t in tools if t.name == "add_numbers")
block = MCPToolBlock()
input_data = MCPToolBlock.Input(
server_url=mcp_server,
selected_tool="add_numbers",
tool_input_schema=add_tool.input_schema,
tool_arguments={"a": 42, "b": 58},
)
outputs = []
async for name, data in block.run(input_data, user_id=MOCK_USER_ID):
outputs.append((name, data))
assert len(outputs) == 1
assert outputs[0][0] == "result"
assert outputs[0][1]["result"] == 100
@pytest.mark.asyncio(loop_scope="session")
async def test_full_flow_echo_plain_text(self, mcp_server):
"""Verify plain text (non-JSON) responses work."""
block = MCPToolBlock()
input_data = MCPToolBlock.Input(
server_url=mcp_server,
selected_tool="echo",
tool_input_schema={
"type": "object",
"properties": {"message": {"type": "string"}},
"required": ["message"],
},
tool_arguments={"message": "Hello from AutoGPT!"},
)
outputs = []
async for name, data in block.run(input_data, user_id=MOCK_USER_ID):
outputs.append((name, data))
assert len(outputs) == 1
assert outputs[0][0] == "result"
assert outputs[0][1] == "Hello from AutoGPT!"
@pytest.mark.asyncio(loop_scope="session")
async def test_full_flow_unknown_tool_yields_error(self, mcp_server):
"""Calling an unknown tool should yield an error output."""
block = MCPToolBlock()
input_data = MCPToolBlock.Input(
server_url=mcp_server,
selected_tool="nonexistent_tool",
tool_arguments={},
)
outputs = []
async for name, data in block.run(input_data, user_id=MOCK_USER_ID):
outputs.append((name, data))
assert len(outputs) == 1
assert outputs[0][0] == "error"
assert "returned an error" in outputs[0][1]
@pytest.mark.asyncio(loop_scope="session")
async def test_full_flow_with_auth(self, mcp_server_with_auth):
"""Full flow with authentication via credentials kwarg."""
url, token = mcp_server_with_auth
block = MCPToolBlock()
input_data = MCPToolBlock.Input(
server_url=url,
selected_tool="echo",
tool_input_schema={
"type": "object",
"properties": {"message": {"type": "string"}},
"required": ["message"],
},
tool_arguments={"message": "Authenticated!"},
)
# Pass credentials via the standard kwarg (as the executor would)
test_creds = OAuth2Credentials(
id="test-cred",
provider="mcp",
access_token=SecretStr(token),
refresh_token=SecretStr(""),
scopes=[],
title="Test MCP credential",
)
outputs = []
async for name, data in block.run(
input_data, user_id=MOCK_USER_ID, credentials=test_creds
):
outputs.append((name, data))
assert len(outputs) == 1
assert outputs[0][0] == "result"
assert outputs[0][1] == "Authenticated!"
@pytest.mark.asyncio(loop_scope="session")
async def test_no_credentials_runs_without_auth(self, mcp_server):
"""Block runs without auth when no credentials are provided."""
block = MCPToolBlock()
input_data = MCPToolBlock.Input(
server_url=mcp_server,
selected_tool="echo",
tool_input_schema={
"type": "object",
"properties": {"message": {"type": "string"}},
"required": ["message"],
},
tool_arguments={"message": "No auth needed"},
)
outputs = []
async for name, data in block.run(
input_data, user_id=MOCK_USER_ID, credentials=None
):
outputs.append((name, data))
assert len(outputs) == 1
assert outputs[0][0] == "result"
assert outputs[0][1] == "No auth needed"

View File

@@ -0,0 +1,619 @@
"""
Tests for MCP client and MCPToolBlock.
"""
import json
from unittest.mock import AsyncMock, patch
import pytest
from backend.blocks.mcp.block import MCPToolBlock
from backend.blocks.mcp.client import MCPCallResult, MCPClient, MCPClientError
from backend.util.test import execute_block_test
# ── SSE parsing unit tests ───────────────────────────────────────────
class TestSSEParsing:
"""Tests for SSE (text/event-stream) response parsing."""
def test_parse_sse_simple(self):
sse = (
"event: message\n"
'data: {"jsonrpc":"2.0","result":{"tools":[]},"id":1}\n'
"\n"
)
body = MCPClient._parse_sse_response(sse)
assert body["result"] == {"tools": []}
assert body["id"] == 1
def test_parse_sse_with_notifications(self):
"""SSE streams can contain notifications (no id) before the response."""
sse = (
"event: message\n"
'data: {"jsonrpc":"2.0","method":"some/notification"}\n'
"\n"
"event: message\n"
'data: {"jsonrpc":"2.0","result":{"ok":true},"id":2}\n'
"\n"
)
body = MCPClient._parse_sse_response(sse)
assert body["result"] == {"ok": True}
assert body["id"] == 2
def test_parse_sse_error_response(self):
sse = (
"event: message\n"
'data: {"jsonrpc":"2.0","error":{"code":-32600,"message":"Bad Request"},"id":1}\n'
)
body = MCPClient._parse_sse_response(sse)
assert "error" in body
assert body["error"]["code"] == -32600
def test_parse_sse_no_data_raises(self):
with pytest.raises(MCPClientError, match="No JSON-RPC response found"):
MCPClient._parse_sse_response("event: message\n\n")
def test_parse_sse_empty_raises(self):
with pytest.raises(MCPClientError, match="No JSON-RPC response found"):
MCPClient._parse_sse_response("")
def test_parse_sse_ignores_non_data_lines(self):
sse = (
": comment line\n"
"event: message\n"
"id: 123\n"
'data: {"jsonrpc":"2.0","result":"ok","id":1}\n'
"\n"
)
body = MCPClient._parse_sse_response(sse)
assert body["result"] == "ok"
def test_parse_sse_uses_last_response(self):
"""If multiple responses exist, use the last one."""
sse = (
'data: {"jsonrpc":"2.0","result":"first","id":1}\n'
"\n"
'data: {"jsonrpc":"2.0","result":"second","id":2}\n'
"\n"
)
body = MCPClient._parse_sse_response(sse)
assert body["result"] == "second"
# ── MCPClient unit tests ─────────────────────────────────────────────
class TestMCPClient:
"""Tests for the MCP HTTP client."""
def test_build_headers_without_auth(self):
client = MCPClient("https://mcp.example.com")
headers = client._build_headers()
assert "Authorization" not in headers
assert headers["Content-Type"] == "application/json"
def test_build_headers_with_auth(self):
client = MCPClient("https://mcp.example.com", auth_token="my-token")
headers = client._build_headers()
assert headers["Authorization"] == "Bearer my-token"
def test_build_jsonrpc_request(self):
client = MCPClient("https://mcp.example.com")
req = client._build_jsonrpc_request("tools/list")
assert req["jsonrpc"] == "2.0"
assert req["method"] == "tools/list"
assert "id" in req
assert "params" not in req
def test_build_jsonrpc_request_with_params(self):
client = MCPClient("https://mcp.example.com")
req = client._build_jsonrpc_request(
"tools/call", {"name": "test", "arguments": {"x": 1}}
)
assert req["params"] == {"name": "test", "arguments": {"x": 1}}
def test_request_id_increments(self):
client = MCPClient("https://mcp.example.com")
req1 = client._build_jsonrpc_request("tools/list")
req2 = client._build_jsonrpc_request("tools/list")
assert req2["id"] > req1["id"]
def test_server_url_trailing_slash_stripped(self):
client = MCPClient("https://mcp.example.com/mcp/")
assert client.server_url == "https://mcp.example.com/mcp"
@pytest.mark.asyncio(loop_scope="session")
async def test_send_request_success(self):
client = MCPClient("https://mcp.example.com")
mock_response = AsyncMock()
mock_response.json.return_value = {
"jsonrpc": "2.0",
"result": {"tools": []},
"id": 1,
}
with patch.object(client, "_send_request", return_value={"tools": []}):
result = await client._send_request("tools/list")
assert result == {"tools": []}
@pytest.mark.asyncio(loop_scope="session")
async def test_send_request_error(self):
client = MCPClient("https://mcp.example.com")
async def mock_send(*args, **kwargs):
raise MCPClientError("MCP server error [-32600]: Invalid Request")
with patch.object(client, "_send_request", side_effect=mock_send):
with pytest.raises(MCPClientError, match="Invalid Request"):
await client._send_request("tools/list")
@pytest.mark.asyncio(loop_scope="session")
async def test_list_tools(self):
client = MCPClient("https://mcp.example.com")
mock_result = {
"tools": [
{
"name": "get_weather",
"description": "Get current weather for a city",
"inputSchema": {
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"],
},
},
{
"name": "search",
"description": "Search the web",
"inputSchema": {
"type": "object",
"properties": {"query": {"type": "string"}},
"required": ["query"],
},
},
]
}
with patch.object(client, "_send_request", return_value=mock_result):
tools = await client.list_tools()
assert len(tools) == 2
assert tools[0].name == "get_weather"
assert tools[0].description == "Get current weather for a city"
assert tools[0].input_schema["properties"]["city"]["type"] == "string"
assert tools[1].name == "search"
@pytest.mark.asyncio(loop_scope="session")
async def test_list_tools_empty(self):
client = MCPClient("https://mcp.example.com")
with patch.object(client, "_send_request", return_value={"tools": []}):
tools = await client.list_tools()
assert tools == []
@pytest.mark.asyncio(loop_scope="session")
async def test_list_tools_none_result(self):
client = MCPClient("https://mcp.example.com")
with patch.object(client, "_send_request", return_value=None):
tools = await client.list_tools()
assert tools == []
@pytest.mark.asyncio(loop_scope="session")
async def test_call_tool_success(self):
client = MCPClient("https://mcp.example.com")
mock_result = {
"content": [
{"type": "text", "text": json.dumps({"temp": 20, "city": "London"})}
],
"isError": False,
}
with patch.object(client, "_send_request", return_value=mock_result):
result = await client.call_tool("get_weather", {"city": "London"})
assert not result.is_error
assert len(result.content) == 1
assert result.content[0]["type"] == "text"
@pytest.mark.asyncio(loop_scope="session")
async def test_call_tool_error(self):
client = MCPClient("https://mcp.example.com")
mock_result = {
"content": [{"type": "text", "text": "City not found"}],
"isError": True,
}
with patch.object(client, "_send_request", return_value=mock_result):
result = await client.call_tool("get_weather", {"city": "???"})
assert result.is_error
@pytest.mark.asyncio(loop_scope="session")
async def test_call_tool_none_result(self):
client = MCPClient("https://mcp.example.com")
with patch.object(client, "_send_request", return_value=None):
result = await client.call_tool("get_weather", {"city": "London"})
assert result.is_error
@pytest.mark.asyncio(loop_scope="session")
async def test_initialize(self):
client = MCPClient("https://mcp.example.com")
mock_result = {
"protocolVersion": "2025-03-26",
"capabilities": {"tools": {}},
"serverInfo": {"name": "test-server", "version": "1.0.0"},
}
with (
patch.object(client, "_send_request", return_value=mock_result) as mock_req,
patch.object(client, "_send_notification") as mock_notif,
):
result = await client.initialize()
mock_req.assert_called_once()
mock_notif.assert_called_once_with("notifications/initialized")
assert result["protocolVersion"] == "2025-03-26"
# ── MCPToolBlock unit tests ──────────────────────────────────────────
MOCK_USER_ID = "test-user-123"
class TestMCPToolBlock:
"""Tests for the MCPToolBlock."""
def test_block_instantiation(self):
block = MCPToolBlock()
assert block.id == "a0a4b1c2-d3e4-4f56-a7b8-c9d0e1f2a3b4"
assert block.name == "MCPToolBlock"
def test_input_schema_has_required_fields(self):
block = MCPToolBlock()
schema = block.input_schema.jsonschema()
props = schema.get("properties", {})
assert "server_url" in props
assert "selected_tool" in props
assert "tool_arguments" in props
assert "credentials" in props
def test_output_schema(self):
block = MCPToolBlock()
schema = block.output_schema.jsonschema()
props = schema.get("properties", {})
assert "result" in props
assert "error" in props
def test_get_input_schema_with_tool_schema(self):
tool_schema = {
"type": "object",
"properties": {"query": {"type": "string"}},
"required": ["query"],
}
data = {"tool_input_schema": tool_schema}
result = MCPToolBlock.Input.get_input_schema(data)
assert result == tool_schema
def test_get_input_schema_without_tool_schema(self):
result = MCPToolBlock.Input.get_input_schema({})
assert result == {}
def test_get_input_defaults(self):
data = {"tool_arguments": {"city": "London"}}
result = MCPToolBlock.Input.get_input_defaults(data)
assert result == {"city": "London"}
def test_get_missing_input(self):
data = {
"tool_input_schema": {
"type": "object",
"properties": {
"city": {"type": "string"},
"units": {"type": "string"},
},
"required": ["city", "units"],
},
"tool_arguments": {"city": "London"},
}
missing = MCPToolBlock.Input.get_missing_input(data)
assert missing == {"units"}
def test_get_missing_input_all_present(self):
data = {
"tool_input_schema": {
"type": "object",
"properties": {"city": {"type": "string"}},
"required": ["city"],
},
"tool_arguments": {"city": "London"},
}
missing = MCPToolBlock.Input.get_missing_input(data)
assert missing == set()
@pytest.mark.asyncio(loop_scope="session")
async def test_run_with_mock(self):
"""Test the block using the built-in test infrastructure."""
block = MCPToolBlock()
await execute_block_test(block)
@pytest.mark.asyncio(loop_scope="session")
async def test_run_missing_server_url(self):
block = MCPToolBlock()
input_data = MCPToolBlock.Input(
server_url="",
selected_tool="test",
)
outputs = []
async for name, data in block.run(input_data, user_id=MOCK_USER_ID):
outputs.append((name, data))
assert outputs == [("error", "MCP server URL is required")]
@pytest.mark.asyncio(loop_scope="session")
async def test_run_missing_tool(self):
block = MCPToolBlock()
input_data = MCPToolBlock.Input(
server_url="https://mcp.example.com/mcp",
selected_tool="",
)
outputs = []
async for name, data in block.run(input_data, user_id=MOCK_USER_ID):
outputs.append((name, data))
assert outputs == [
("error", "No tool selected. Please select a tool from the dropdown.")
]
@pytest.mark.asyncio(loop_scope="session")
async def test_run_success(self):
block = MCPToolBlock()
input_data = MCPToolBlock.Input(
server_url="https://mcp.example.com/mcp",
selected_tool="get_weather",
tool_input_schema={
"type": "object",
"properties": {"city": {"type": "string"}},
},
tool_arguments={"city": "London"},
)
async def mock_call(*args, **kwargs):
return {"temp": 20, "city": "London"}
block._call_mcp_tool = mock_call # type: ignore
outputs = []
async for name, data in block.run(input_data, user_id=MOCK_USER_ID):
outputs.append((name, data))
assert len(outputs) == 1
assert outputs[0][0] == "result"
assert outputs[0][1] == {"temp": 20, "city": "London"}
@pytest.mark.asyncio(loop_scope="session")
async def test_run_mcp_error(self):
block = MCPToolBlock()
input_data = MCPToolBlock.Input(
server_url="https://mcp.example.com/mcp",
selected_tool="bad_tool",
)
async def mock_call(*args, **kwargs):
raise MCPClientError("Tool not found")
block._call_mcp_tool = mock_call # type: ignore
outputs = []
async for name, data in block.run(input_data, user_id=MOCK_USER_ID):
outputs.append((name, data))
assert outputs[0][0] == "error"
assert "Tool not found" in outputs[0][1]
@pytest.mark.asyncio(loop_scope="session")
async def test_call_mcp_tool_parses_json_text(self):
block = MCPToolBlock()
mock_result = MCPCallResult(
content=[
{"type": "text", "text": '{"temp": 20}'},
],
is_error=False,
)
async def mock_init(self):
return {}
async def mock_call(self, name, args):
return mock_result
with (
patch.object(MCPClient, "initialize", mock_init),
patch.object(MCPClient, "call_tool", mock_call),
):
result = await block._call_mcp_tool(
"https://mcp.example.com", "test_tool", {}
)
assert result == {"temp": 20}
@pytest.mark.asyncio(loop_scope="session")
async def test_call_mcp_tool_plain_text(self):
block = MCPToolBlock()
mock_result = MCPCallResult(
content=[
{"type": "text", "text": "Hello, world!"},
],
is_error=False,
)
async def mock_init(self):
return {}
async def mock_call(self, name, args):
return mock_result
with (
patch.object(MCPClient, "initialize", mock_init),
patch.object(MCPClient, "call_tool", mock_call),
):
result = await block._call_mcp_tool(
"https://mcp.example.com", "test_tool", {}
)
assert result == "Hello, world!"
@pytest.mark.asyncio(loop_scope="session")
async def test_call_mcp_tool_multiple_content(self):
block = MCPToolBlock()
mock_result = MCPCallResult(
content=[
{"type": "text", "text": "Part 1"},
{"type": "text", "text": '{"part": 2}'},
],
is_error=False,
)
async def mock_init(self):
return {}
async def mock_call(self, name, args):
return mock_result
with (
patch.object(MCPClient, "initialize", mock_init),
patch.object(MCPClient, "call_tool", mock_call),
):
result = await block._call_mcp_tool(
"https://mcp.example.com", "test_tool", {}
)
assert result == ["Part 1", {"part": 2}]
@pytest.mark.asyncio(loop_scope="session")
async def test_call_mcp_tool_error_result(self):
block = MCPToolBlock()
mock_result = MCPCallResult(
content=[{"type": "text", "text": "Something went wrong"}],
is_error=True,
)
async def mock_init(self):
return {}
async def mock_call(self, name, args):
return mock_result
with (
patch.object(MCPClient, "initialize", mock_init),
patch.object(MCPClient, "call_tool", mock_call),
):
with pytest.raises(MCPClientError, match="returned an error"):
await block._call_mcp_tool("https://mcp.example.com", "test_tool", {})
@pytest.mark.asyncio(loop_scope="session")
async def test_call_mcp_tool_image_content(self):
block = MCPToolBlock()
mock_result = MCPCallResult(
content=[
{
"type": "image",
"data": "base64data==",
"mimeType": "image/png",
}
],
is_error=False,
)
async def mock_init(self):
return {}
async def mock_call(self, name, args):
return mock_result
with (
patch.object(MCPClient, "initialize", mock_init),
patch.object(MCPClient, "call_tool", mock_call),
):
result = await block._call_mcp_tool(
"https://mcp.example.com", "test_tool", {}
)
assert result == {
"type": "image",
"data": "base64data==",
"mimeType": "image/png",
}
@pytest.mark.asyncio(loop_scope="session")
async def test_run_with_credentials(self):
"""Verify the block uses OAuth2Credentials and passes auth token."""
from pydantic import SecretStr
from backend.data.model import OAuth2Credentials
block = MCPToolBlock()
input_data = MCPToolBlock.Input(
server_url="https://mcp.example.com/mcp",
selected_tool="test_tool",
)
captured_tokens: list[str | None] = []
async def mock_call(server_url, tool_name, arguments, auth_token=None):
captured_tokens.append(auth_token)
return "ok"
block._call_mcp_tool = mock_call # type: ignore
test_creds = OAuth2Credentials(
id="cred-123",
provider="mcp",
access_token=SecretStr("resolved-token"),
refresh_token=SecretStr(""),
scopes=[],
title="Test MCP credential",
)
async for _ in block.run(
input_data, user_id=MOCK_USER_ID, credentials=test_creds
):
pass
assert captured_tokens == ["resolved-token"]
@pytest.mark.asyncio(loop_scope="session")
async def test_run_without_credentials(self):
"""Verify the block works without credentials (public server)."""
block = MCPToolBlock()
input_data = MCPToolBlock.Input(
server_url="https://mcp.example.com/mcp",
selected_tool="test_tool",
)
captured_tokens: list[str | None] = []
async def mock_call(server_url, tool_name, arguments, auth_token=None):
captured_tokens.append(auth_token)
return "ok"
block._call_mcp_tool = mock_call # type: ignore
outputs = []
async for name, data in block.run(input_data, user_id=MOCK_USER_ID):
outputs.append((name, data))
assert captured_tokens == [None]
assert outputs == [("result", "ok")]

View File

@@ -0,0 +1,242 @@
"""
Tests for MCP OAuth handler.
"""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from pydantic import SecretStr
from backend.blocks.mcp.client import MCPClient
from backend.blocks.mcp.oauth import MCPOAuthHandler
from backend.data.model import OAuth2Credentials
def _mock_response(json_data: dict, status: int = 200) -> MagicMock:
"""Create a mock Response with synchronous json() (matching Requests.Response)."""
resp = MagicMock()
resp.status = status
resp.ok = 200 <= status < 300
resp.json.return_value = json_data
return resp
class TestMCPOAuthHandler:
"""Tests for the MCPOAuthHandler."""
def _make_handler(self, **overrides) -> MCPOAuthHandler:
defaults = {
"client_id": "test-client-id",
"client_secret": "test-client-secret",
"redirect_uri": "https://app.example.com/callback",
"authorize_url": "https://auth.example.com/authorize",
"token_url": "https://auth.example.com/token",
}
defaults.update(overrides)
return MCPOAuthHandler(**defaults)
def test_get_login_url_basic(self):
handler = self._make_handler()
url = handler.get_login_url(
scopes=["read", "write"],
state="random-state-token",
code_challenge="S256-challenge-value",
)
assert "https://auth.example.com/authorize?" in url
assert "response_type=code" in url
assert "client_id=test-client-id" in url
assert "state=random-state-token" in url
assert "code_challenge=S256-challenge-value" in url
assert "code_challenge_method=S256" in url
assert "scope=read+write" in url
def test_get_login_url_with_resource(self):
handler = self._make_handler(resource_url="https://mcp.example.com/mcp")
url = handler.get_login_url(
scopes=[], state="state", code_challenge="challenge"
)
assert "resource=https" in url
def test_get_login_url_without_pkce(self):
handler = self._make_handler()
url = handler.get_login_url(scopes=["read"], state="state", code_challenge=None)
assert "code_challenge" not in url
assert "code_challenge_method" not in url
@pytest.mark.asyncio(loop_scope="session")
async def test_exchange_code_for_tokens(self):
handler = self._make_handler()
resp = _mock_response(
{
"access_token": "new-access-token",
"refresh_token": "new-refresh-token",
"expires_in": 3600,
"token_type": "Bearer",
}
)
with patch("backend.blocks.mcp.oauth.Requests") as MockRequests:
instance = MockRequests.return_value
instance.post = AsyncMock(return_value=resp)
creds = await handler.exchange_code_for_tokens(
code="auth-code",
scopes=["read"],
code_verifier="pkce-verifier",
)
assert isinstance(creds, OAuth2Credentials)
assert creds.access_token.get_secret_value() == "new-access-token"
assert creds.refresh_token is not None
assert creds.refresh_token.get_secret_value() == "new-refresh-token"
assert creds.scopes == ["read"]
assert creds.access_token_expires_at is not None
@pytest.mark.asyncio(loop_scope="session")
async def test_refresh_tokens(self):
handler = self._make_handler()
existing_creds = OAuth2Credentials(
id="existing-id",
provider="mcp",
access_token=SecretStr("old-token"),
refresh_token=SecretStr("old-refresh"),
scopes=["read"],
title="test",
)
resp = _mock_response(
{
"access_token": "refreshed-token",
"refresh_token": "new-refresh",
"expires_in": 3600,
}
)
with patch("backend.blocks.mcp.oauth.Requests") as MockRequests:
instance = MockRequests.return_value
instance.post = AsyncMock(return_value=resp)
refreshed = await handler._refresh_tokens(existing_creds)
assert refreshed.id == "existing-id"
assert refreshed.access_token.get_secret_value() == "refreshed-token"
assert refreshed.refresh_token is not None
assert refreshed.refresh_token.get_secret_value() == "new-refresh"
@pytest.mark.asyncio(loop_scope="session")
async def test_refresh_tokens_no_refresh_token(self):
handler = self._make_handler()
creds = OAuth2Credentials(
provider="mcp",
access_token=SecretStr("token"),
scopes=["read"],
title="test",
)
with pytest.raises(ValueError, match="No refresh token"):
await handler._refresh_tokens(creds)
@pytest.mark.asyncio(loop_scope="session")
async def test_revoke_tokens_no_url(self):
handler = self._make_handler(revoke_url=None)
creds = OAuth2Credentials(
provider="mcp",
access_token=SecretStr("token"),
scopes=[],
title="test",
)
result = await handler.revoke_tokens(creds)
assert result is False
@pytest.mark.asyncio(loop_scope="session")
async def test_revoke_tokens_with_url(self):
handler = self._make_handler(revoke_url="https://auth.example.com/revoke")
creds = OAuth2Credentials(
provider="mcp",
access_token=SecretStr("token"),
scopes=[],
title="test",
)
resp = _mock_response({}, status=200)
with patch("backend.blocks.mcp.oauth.Requests") as MockRequests:
instance = MockRequests.return_value
instance.post = AsyncMock(return_value=resp)
result = await handler.revoke_tokens(creds)
assert result is True
class TestMCPClientDiscovery:
"""Tests for MCPClient OAuth metadata discovery."""
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_auth_found(self):
client = MCPClient("https://mcp.example.com/mcp")
metadata = {
"authorization_servers": ["https://auth.example.com"],
"resource": "https://mcp.example.com/mcp",
}
resp = _mock_response(metadata, status=200)
with patch("backend.blocks.mcp.client.Requests") as MockRequests:
instance = MockRequests.return_value
instance.get = AsyncMock(return_value=resp)
result = await client.discover_auth()
assert result is not None
assert result["authorization_servers"] == ["https://auth.example.com"]
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_auth_not_found(self):
client = MCPClient("https://mcp.example.com/mcp")
resp = _mock_response({}, status=404)
with patch("backend.blocks.mcp.client.Requests") as MockRequests:
instance = MockRequests.return_value
instance.get = AsyncMock(return_value=resp)
result = await client.discover_auth()
assert result is None
@pytest.mark.asyncio(loop_scope="session")
async def test_discover_auth_server_metadata(self):
client = MCPClient("https://mcp.example.com/mcp")
server_metadata = {
"issuer": "https://auth.example.com",
"authorization_endpoint": "https://auth.example.com/authorize",
"token_endpoint": "https://auth.example.com/token",
"registration_endpoint": "https://auth.example.com/register",
"code_challenge_methods_supported": ["S256"],
}
resp = _mock_response(server_metadata, status=200)
with patch("backend.blocks.mcp.client.Requests") as MockRequests:
instance = MockRequests.return_value
instance.get = AsyncMock(return_value=resp)
result = await client.discover_auth_server_metadata(
"https://auth.example.com"
)
assert result is not None
assert result["authorization_endpoint"] == "https://auth.example.com/authorize"
assert result["token_endpoint"] == "https://auth.example.com/token"

View File

@@ -0,0 +1,162 @@
"""
Minimal MCP server for integration testing.
Implements the MCP Streamable HTTP transport (JSON-RPC 2.0 over HTTP POST)
with a few sample tools. Runs on localhost with a random available port.
"""
import json
import logging
from aiohttp import web
logger = logging.getLogger(__name__)
# Sample tools this test server exposes
TEST_TOOLS = [
{
"name": "get_weather",
"description": "Get current weather for a city",
"inputSchema": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "City name",
},
},
"required": ["city"],
},
},
{
"name": "add_numbers",
"description": "Add two numbers together",
"inputSchema": {
"type": "object",
"properties": {
"a": {"type": "number", "description": "First number"},
"b": {"type": "number", "description": "Second number"},
},
"required": ["a", "b"],
},
},
{
"name": "echo",
"description": "Echo back the input message",
"inputSchema": {
"type": "object",
"properties": {
"message": {"type": "string", "description": "Message to echo"},
},
"required": ["message"],
},
},
]
def _handle_initialize(params: dict) -> dict:
return {
"protocolVersion": "2025-03-26",
"capabilities": {"tools": {"listChanged": False}},
"serverInfo": {"name": "test-mcp-server", "version": "1.0.0"},
}
def _handle_tools_list(params: dict) -> dict:
return {"tools": TEST_TOOLS}
def _handle_tools_call(params: dict) -> dict:
tool_name = params.get("name", "")
arguments = params.get("arguments", {})
if tool_name == "get_weather":
city = arguments.get("city", "Unknown")
return {
"content": [
{
"type": "text",
"text": json.dumps(
{"city": city, "temperature": 22, "condition": "sunny"}
),
}
],
}
elif tool_name == "add_numbers":
a = arguments.get("a", 0)
b = arguments.get("b", 0)
return {
"content": [{"type": "text", "text": json.dumps({"result": a + b})}],
}
elif tool_name == "echo":
message = arguments.get("message", "")
return {
"content": [{"type": "text", "text": message}],
}
else:
return {
"content": [{"type": "text", "text": f"Unknown tool: {tool_name}"}],
"isError": True,
}
HANDLERS = {
"initialize": _handle_initialize,
"tools/list": _handle_tools_list,
"tools/call": _handle_tools_call,
}
async def handle_mcp_request(request: web.Request) -> web.Response:
"""Handle incoming MCP JSON-RPC 2.0 requests."""
# Check auth if configured
expected_token = request.app.get("auth_token")
if expected_token:
auth_header = request.headers.get("Authorization", "")
if auth_header != f"Bearer {expected_token}":
return web.json_response(
{
"jsonrpc": "2.0",
"error": {"code": -32001, "message": "Unauthorized"},
"id": None,
},
status=401,
)
body = await request.json()
# Handle notifications (no id field) — just acknowledge
if "id" not in body:
return web.Response(status=202)
method = body.get("method", "")
params = body.get("params", {})
request_id = body.get("id")
handler = HANDLERS.get(method)
if not handler:
return web.json_response(
{
"jsonrpc": "2.0",
"error": {
"code": -32601,
"message": f"Method not found: {method}",
},
"id": request_id,
}
)
result = handler(params)
return web.json_response({"jsonrpc": "2.0", "result": result, "id": request_id})
def create_test_mcp_app(auth_token: str | None = None) -> web.Application:
"""Create an aiohttp app that acts as an MCP server."""
app = web.Application()
app.router.add_post("/mcp", handle_mcp_request)
if auth_token:
app["auth_token"] = auth_token
return app

View File

@@ -1,6 +1,7 @@
import logging
import os
import pytest
import pytest_asyncio
from dotenv import load_dotenv
@@ -27,6 +28,54 @@ async def server():
yield server
@pytest.fixture
def test_user_id() -> str:
"""Test user ID fixture."""
return "3e53486c-cf57-477e-ba2a-cb02dc828e1a"
@pytest.fixture
def admin_user_id() -> str:
"""Admin user ID fixture."""
return "4e53486c-cf57-477e-ba2a-cb02dc828e1b"
@pytest.fixture
def target_user_id() -> str:
"""Target user ID fixture."""
return "5e53486c-cf57-477e-ba2a-cb02dc828e1c"
@pytest.fixture
async def setup_test_user(test_user_id):
"""Create test user in database before tests."""
from backend.data.user import get_or_create_user
# Create the test user in the database using JWT token format
user_data = {
"sub": test_user_id,
"email": "test@example.com",
"user_metadata": {"name": "Test User"},
}
await get_or_create_user(user_data)
return test_user_id
@pytest.fixture
async def setup_admin_user(admin_user_id):
"""Create admin user in database before tests."""
from backend.data.user import get_or_create_user
# Create the admin user in the database using JWT token format
user_data = {
"sub": admin_user_id,
"email": "test-admin@example.com",
"user_metadata": {"name": "Test Admin"},
}
await get_or_create_user(user_data)
return admin_user_id
@pytest_asyncio.fixture(scope="session", loop_scope="session", autouse=True)
async def graph_cleanup(server):
created_graph_ids = []

View File

@@ -0,0 +1 @@

View File

@@ -27,63 +27,38 @@ class ChatConfig(BaseSettings):
session_ttl: int = Field(default=43200, description="Session TTL in seconds")
# Streaming Configuration
max_context_messages: int = Field(
default=50, ge=1, le=200, description="Maximum context messages"
max_retries: int = Field(
default=3,
description="Max retries for fallback path (SDK handles retries internally)",
)
stream_timeout: int = Field(default=300, description="Stream timeout in seconds")
max_retries: int = Field(default=3, description="Maximum number of retries")
max_agent_runs: int = Field(default=30, description="Maximum number of agent runs")
max_agent_schedules: int = Field(
default=30, description="Maximum number of agent schedules"
)
# Long-running operation configuration
long_running_operation_ttl: int = Field(
default=600,
description="TTL in seconds for long-running operation tracking in Redis (safety net if pod dies)",
)
# Stream registry configuration for SSE reconnection
stream_ttl: int = Field(
default=3600,
description="TTL in seconds for stream data in Redis (1 hour)",
)
stream_lock_ttl: int = Field(
default=120,
description="TTL in seconds for stream lock (2 minutes). Short timeout allows "
"reconnection after refresh/crash without long waits.",
)
stream_max_length: int = Field(
default=10000,
description="Maximum number of messages to store per stream",
)
# Redis Streams configuration for completion consumer
stream_completion_name: str = Field(
default="chat:completions",
description="Redis Stream name for operation completions",
)
stream_consumer_group: str = Field(
default="chat_consumers",
description="Consumer group name for completion stream",
)
stream_claim_min_idle_ms: int = Field(
default=60000,
description="Minimum idle time in milliseconds before claiming pending messages from dead consumers",
)
# Redis key prefixes for stream registry
task_meta_prefix: str = Field(
session_meta_prefix: str = Field(
default="chat:task:meta:",
description="Prefix for task metadata hash keys",
description="Prefix for session metadata hash keys",
)
task_stream_prefix: str = Field(
turn_stream_prefix: str = Field(
default="chat:stream:",
description="Prefix for task message stream keys",
)
task_op_prefix: str = Field(
default="chat:task:op:",
description="Prefix for operation ID to task ID mapping keys",
)
internal_api_key: str | None = Field(
default=None,
description="API key for internal webhook callbacks (env: CHAT_INTERNAL_API_KEY)",
description="Prefix for turn message stream keys",
)
# Langfuse Prompt Management Configuration
@@ -93,6 +68,31 @@ class ChatConfig(BaseSettings):
description="Name of the prompt in Langfuse to fetch",
)
# Claude Agent SDK Configuration
use_claude_agent_sdk: bool = Field(
default=True,
description="Use Claude Agent SDK for chat completions",
)
claude_agent_model: str | None = Field(
default=None,
description="Model for the Claude Agent SDK path. If None, derives from "
"the `model` field by stripping the OpenRouter provider prefix.",
)
claude_agent_max_buffer_size: int = Field(
default=10 * 1024 * 1024, # 10MB (default SDK is 1MB)
description="Max buffer size in bytes for Claude Agent SDK JSON message parsing. "
"Increase if tool outputs exceed the limit.",
)
claude_agent_max_subtasks: int = Field(
default=10,
description="Max number of sub-agent Tasks the SDK can spawn per session.",
)
claude_agent_use_resume: bool = Field(
default=True,
description="Use --resume for multi-turn conversations instead of "
"history compression. Falls back to compression when unavailable.",
)
# Extended thinking configuration for Claude models
thinking_enabled: bool = Field(
default=True,
@@ -130,13 +130,16 @@ class ChatConfig(BaseSettings):
v = "https://openrouter.ai/api/v1"
return v
@field_validator("internal_api_key", mode="before")
@field_validator("use_claude_agent_sdk", mode="before")
@classmethod
def get_internal_api_key(cls, v):
"""Get internal API key from environment if not provided."""
if v is None:
v = os.getenv("CHAT_INTERNAL_API_KEY")
return v
def get_use_claude_agent_sdk(cls, v):
"""Get use_claude_agent_sdk from environment if not provided."""
# Check environment variable - default to True if not set
env_val = os.getenv("CHAT_USE_CLAUDE_AGENT_SDK", "").lower()
if env_val:
return env_val in ("true", "1", "yes", "on")
# Default to True (SDK enabled by default)
return True if v is None else v
# Prompt paths for different contexts
PROMPT_PATHS: dict[str, str] = {

View File

@@ -3,8 +3,9 @@
import asyncio
import logging
from datetime import UTC, datetime
from typing import Any, cast
from typing import Any
from prisma.errors import UniqueViolationError
from prisma.models import ChatMessage as PrismaChatMessage
from prisma.models import ChatSession as PrismaChatSession
from prisma.types import (
@@ -14,29 +15,27 @@ from prisma.types import (
ChatSessionWhereInput,
)
from backend.data.db import transaction
from backend.data import db
from backend.util.json import SafeJson
from .model import ChatMessage, ChatSession, ChatSessionInfo
logger = logging.getLogger(__name__)
async def get_chat_session(session_id: str) -> PrismaChatSession | None:
async def get_chat_session(session_id: str) -> ChatSession | None:
"""Get a chat session by ID from the database."""
session = await PrismaChatSession.prisma().find_unique(
where={"id": session_id},
include={"Messages": True},
include={"Messages": {"order_by": {"sequence": "asc"}}},
)
if session and session.Messages:
# Sort messages by sequence in Python - Prisma Python client doesn't support
# order_by in include clauses (unlike Prisma JS), so we sort after fetching
session.Messages.sort(key=lambda m: m.sequence)
return session
return ChatSession.from_db(session) if session else None
async def create_chat_session(
session_id: str,
user_id: str,
) -> PrismaChatSession:
) -> ChatSessionInfo:
"""Create a new chat session in the database."""
data = ChatSessionCreateInput(
id=session_id,
@@ -45,7 +44,8 @@ async def create_chat_session(
successfulAgentRuns=SafeJson({}),
successfulAgentSchedules=SafeJson({}),
)
return await PrismaChatSession.prisma().create(data=data)
prisma_session = await PrismaChatSession.prisma().create(data=data)
return ChatSessionInfo.from_db(prisma_session)
async def update_chat_session(
@@ -56,7 +56,7 @@ async def update_chat_session(
total_prompt_tokens: int | None = None,
total_completion_tokens: int | None = None,
title: str | None = None,
) -> PrismaChatSession | None:
) -> ChatSession | None:
"""Update a chat session's metadata."""
data: ChatSessionUpdateInput = {"updatedAt": datetime.now(UTC)}
@@ -76,12 +76,9 @@ async def update_chat_session(
session = await PrismaChatSession.prisma().update(
where={"id": session_id},
data=data,
include={"Messages": True},
include={"Messages": {"order_by": {"sequence": "asc"}}},
)
if session and session.Messages:
# Sort in Python - Prisma Python doesn't support order_by in include clauses
session.Messages.sort(key=lambda m: m.sequence)
return session
return ChatSession.from_db(session) if session else None
async def add_chat_message(
@@ -94,12 +91,11 @@ async def add_chat_message(
refusal: str | None = None,
tool_calls: list[dict[str, Any]] | None = None,
function_call: dict[str, Any] | None = None,
) -> PrismaChatMessage:
) -> ChatMessage:
"""Add a message to a chat session."""
# Build input dict dynamically rather than using ChatMessageCreateInput directly
# because Prisma's TypedDict validation rejects optional fields set to None.
# We only include fields that have values, then cast at the end.
data: dict[str, Any] = {
# Build ChatMessageCreateInput with only non-None values
# (Prisma TypedDict rejects optional fields set to None)
data: ChatMessageCreateInput = {
"Session": {"connect": {"id": session_id}},
"role": role,
"sequence": sequence,
@@ -127,81 +123,117 @@ async def add_chat_message(
where={"id": session_id},
data={"updatedAt": datetime.now(UTC)},
),
PrismaChatMessage.prisma().create(data=cast(ChatMessageCreateInput, data)),
PrismaChatMessage.prisma().create(data=data),
)
return message
return ChatMessage.from_db(message)
async def add_chat_messages_batch(
session_id: str,
messages: list[dict[str, Any]],
start_sequence: int,
) -> list[PrismaChatMessage]:
) -> int:
"""Add multiple messages to a chat session in a batch.
Uses a transaction for atomicity - if any message creation fails,
the entire batch is rolled back.
Uses collision detection with retry: tries to create messages starting
at start_sequence. If a unique constraint violation occurs (e.g., the
streaming loop and long-running callback race), queries the latest
sequence and retries with the correct offset. This avoids unnecessary
upserts and DB queries in the common case (no collision).
Returns:
Next sequence number for the next message to be inserted. This equals
start_sequence + len(messages) and allows callers to update their
counters even when collision detection adjusts start_sequence.
"""
if not messages:
return []
# No messages to add - return current count
return start_sequence
created_messages = []
max_retries = 5
for attempt in range(max_retries):
try:
# Single timestamp for all messages and session update
now = datetime.now(UTC)
async with transaction() as tx:
for i, msg in enumerate(messages):
# Build input dict dynamically rather than using ChatMessageCreateInput
# directly because Prisma's TypedDict validation rejects optional fields
# set to None. We only include fields that have values, then cast.
data: dict[str, Any] = {
"Session": {"connect": {"id": session_id}},
"role": msg["role"],
"sequence": start_sequence + i,
}
async with db.transaction() as tx:
# Build all message data
messages_data = []
for i, msg in enumerate(messages):
# Build ChatMessageCreateInput with only non-None values
# (Prisma TypedDict rejects optional fields set to None)
# Note: create_many doesn't support nested creates, use sessionId directly
data: ChatMessageCreateInput = {
"sessionId": session_id,
"role": msg["role"],
"sequence": start_sequence + i,
"createdAt": now,
}
# Add optional string fields
if msg.get("content") is not None:
data["content"] = msg["content"]
if msg.get("name") is not None:
data["name"] = msg["name"]
if msg.get("tool_call_id") is not None:
data["toolCallId"] = msg["tool_call_id"]
if msg.get("refusal") is not None:
data["refusal"] = msg["refusal"]
# Add optional string fields
if msg.get("content") is not None:
data["content"] = msg["content"]
if msg.get("name") is not None:
data["name"] = msg["name"]
if msg.get("tool_call_id") is not None:
data["toolCallId"] = msg["tool_call_id"]
if msg.get("refusal") is not None:
data["refusal"] = msg["refusal"]
# Add optional JSON fields only when they have values
if msg.get("tool_calls") is not None:
data["toolCalls"] = SafeJson(msg["tool_calls"])
if msg.get("function_call") is not None:
data["functionCall"] = SafeJson(msg["function_call"])
# Add optional JSON fields only when they have values
if msg.get("tool_calls") is not None:
data["toolCalls"] = SafeJson(msg["tool_calls"])
if msg.get("function_call") is not None:
data["functionCall"] = SafeJson(msg["function_call"])
created = await PrismaChatMessage.prisma(tx).create(
data=cast(ChatMessageCreateInput, data)
)
created_messages.append(created)
messages_data.append(data)
# Update session's updatedAt timestamp within the same transaction.
# Note: Token usage (total_prompt_tokens, total_completion_tokens) is updated
# separately via update_chat_session() after streaming completes.
await PrismaChatSession.prisma(tx).update(
where={"id": session_id},
data={"updatedAt": datetime.now(UTC)},
)
# Run create_many and session update in parallel within transaction
# Both use the same timestamp for consistency
await asyncio.gather(
PrismaChatMessage.prisma(tx).create_many(data=messages_data),
PrismaChatSession.prisma(tx).update(
where={"id": session_id},
data={"updatedAt": now},
),
)
return created_messages
# Return next sequence number for counter sync
return start_sequence + len(messages)
except UniqueViolationError:
if attempt < max_retries - 1:
# Collision detected - query MAX(sequence)+1 and retry with correct offset
logger.info(
f"Collision detected for session {session_id} at sequence "
f"{start_sequence}, querying DB for latest sequence"
)
start_sequence = await get_next_sequence(session_id)
logger.info(
f"Retrying batch insert with start_sequence={start_sequence}"
)
continue
else:
# Max retries exceeded - propagate error
raise
# Should never reach here due to raise in exception handler
raise RuntimeError(f"Failed to insert messages after {max_retries} attempts")
async def get_user_chat_sessions(
user_id: str,
limit: int = 50,
offset: int = 0,
) -> list[PrismaChatSession]:
) -> list[ChatSessionInfo]:
"""Get chat sessions for a user, ordered by most recent."""
return await PrismaChatSession.prisma().find_many(
prisma_sessions = await PrismaChatSession.prisma().find_many(
where={"userId": user_id},
order={"updatedAt": "desc"},
take=limit,
skip=offset,
)
return [ChatSessionInfo.from_db(s) for s in prisma_sessions]
async def get_user_session_count(user_id: str) -> int:
@@ -240,10 +272,20 @@ async def delete_chat_session(session_id: str, user_id: str | None = None) -> bo
return False
async def get_chat_session_message_count(session_id: str) -> int:
"""Get the number of messages in a chat session."""
count = await PrismaChatMessage.prisma().count(where={"sessionId": session_id})
return count
async def get_next_sequence(session_id: str) -> int:
"""Get the next sequence number for a new message in this session.
Uses MAX(sequence) + 1 for robustness. Returns 0 if no messages exist.
More robust than COUNT(*) because it's immune to deleted messages.
Optimized to select only the sequence column using raw SQL.
The unique index on (sessionId, sequence) makes this query fast.
"""
results = await db.query_raw_with_schema(
'SELECT "sequence" FROM {schema_prefix}"ChatMessage" WHERE "sessionId" = $1 ORDER BY "sequence" DESC LIMIT 1',
session_id,
)
return 0 if not results else results[0]["sequence"] + 1
async def update_tool_message_content(

View File

@@ -0,0 +1,18 @@
"""Entry point for running the CoPilot Executor service.
Usage:
python -m backend.copilot.executor
"""
from backend.app import run_processes
from .manager import CoPilotExecutor
def main():
"""Run the CoPilot Executor service."""
run_processes(CoPilotExecutor())
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,520 @@
"""CoPilot Executor Manager - main service for CoPilot task execution.
This module contains the CoPilotExecutor class that consumes chat tasks from
RabbitMQ and processes them using a thread pool, following the graph executor pattern.
"""
import logging
import os
import threading
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from pika.adapters.blocking_connection import BlockingChannel
from pika.exceptions import AMQPChannelError, AMQPConnectionError
from pika.spec import Basic, BasicProperties
from prometheus_client import Gauge, start_http_server
from backend.data import redis_client as redis
from backend.data.rabbitmq import SyncRabbitMQ
from backend.executor.cluster_lock import ClusterLock
from backend.util.decorator import error_logged
from backend.util.logging import TruncatedLogger
from backend.util.process import AppProcess
from backend.util.retry import continuous_retry
from backend.util.settings import Settings
from .processor import execute_copilot_task, init_worker
from .utils import (
COPILOT_CANCEL_QUEUE_NAME,
COPILOT_EXECUTION_QUEUE_NAME,
GRACEFUL_SHUTDOWN_TIMEOUT_SECONDS,
CancelCoPilotEvent,
CoPilotExecutionEntry,
create_copilot_queue_config,
)
logger = TruncatedLogger(logging.getLogger(__name__), prefix="[CoPilotExecutor]")
settings = Settings()
# Prometheus metrics
active_tasks_gauge = Gauge(
"copilot_executor_active_tasks",
"Number of active CoPilot tasks",
)
pool_size_gauge = Gauge(
"copilot_executor_pool_size",
"Maximum number of CoPilot executor workers",
)
utilization_gauge = Gauge(
"copilot_executor_utilization_ratio",
"Ratio of active tasks to pool size",
)
class CoPilotExecutor(AppProcess):
"""CoPilot Executor service for processing chat generation tasks.
This service consumes tasks from RabbitMQ, processes them using a thread pool,
and publishes results to Redis Streams. It follows the graph executor pattern
for reliable message handling and graceful shutdown.
Key features:
- RabbitMQ-based task distribution with manual acknowledgment
- Thread pool executor for concurrent task processing
- Cluster lock for duplicate prevention across pods
- Graceful shutdown with timeout for in-flight tasks
- FANOUT exchange for cancellation broadcast
"""
def __init__(self):
super().__init__()
self.pool_size = settings.config.num_copilot_workers
self.active_tasks: dict[str, tuple[Future, threading.Event]] = {}
self.executor_id = str(uuid.uuid4())
self._executor = None
self._stop_consuming = None
self._cancel_thread = None
self._cancel_client = None
self._run_thread = None
self._run_client = None
self._task_locks: dict[str, ClusterLock] = {}
self._active_tasks_lock = threading.Lock()
# ============ Main Entry Points (AppProcess interface) ============ #
def run(self):
"""Main service loop - consume from RabbitMQ."""
logger.info(f"Pod assigned executor_id: {self.executor_id}")
logger.info(f"Spawn max-{self.pool_size} workers...")
pool_size_gauge.set(self.pool_size)
self._update_metrics()
start_http_server(settings.config.copilot_executor_port)
self.cancel_thread.start()
self.run_thread.start()
while True:
time.sleep(1e5)
def cleanup(self):
"""Graceful shutdown with active execution waiting."""
pid = os.getpid()
logger.info(f"[cleanup {pid}] Starting graceful shutdown...")
# Signal the consumer thread to stop
try:
self.stop_consuming.set()
run_channel = self.run_client.get_channel()
run_channel.connection.add_callback_threadsafe(
lambda: run_channel.stop_consuming()
)
logger.info(f"[cleanup {pid}] Consumer has been signaled to stop")
except Exception as e:
logger.error(f"[cleanup {pid}] Error stopping consumer: {e}")
# Wait for active executions to complete
if self.active_tasks:
logger.info(
f"[cleanup {pid}] Waiting for {len(self.active_tasks)} active tasks to complete (timeout: {GRACEFUL_SHUTDOWN_TIMEOUT_SECONDS}s)..."
)
start_time = time.monotonic()
last_refresh = start_time
lock_refresh_interval = settings.config.cluster_lock_timeout / 10
while (
self.active_tasks
and (time.monotonic() - start_time) < GRACEFUL_SHUTDOWN_TIMEOUT_SECONDS
):
self._cleanup_completed_tasks()
if not self.active_tasks:
break
# Refresh cluster locks periodically
current_time = time.monotonic()
if current_time - last_refresh >= lock_refresh_interval:
for lock in list(self._task_locks.values()):
try:
lock.refresh()
except Exception as e:
logger.warning(
f"[cleanup {pid}] Failed to refresh lock: {e}"
)
last_refresh = current_time
logger.info(
f"[cleanup {pid}] {len(self.active_tasks)} tasks still active, waiting..."
)
time.sleep(10.0)
# Stop message consumers
if self._run_thread:
self._stop_message_consumers(
self._run_thread, self.run_client, "[cleanup][run]"
)
if self._cancel_thread:
self._stop_message_consumers(
self._cancel_thread, self.cancel_client, "[cleanup][cancel]"
)
# Clean up worker threads (closes per-loop workspace storage sessions)
if self._executor:
from .processor import cleanup_worker
logger.info(f"[cleanup {pid}] Cleaning up workers...")
futures = []
for _ in range(self._executor._max_workers):
futures.append(self._executor.submit(cleanup_worker))
for f in futures:
try:
f.result(timeout=10)
except Exception as e:
logger.warning(f"[cleanup {pid}] Worker cleanup error: {e}")
logger.info(f"[cleanup {pid}] Shutting down executor...")
self._executor.shutdown(wait=False)
# Release any remaining locks
for session_id, lock in list(self._task_locks.items()):
try:
lock.release()
logger.info(f"[cleanup {pid}] Released lock for {session_id}")
except Exception as e:
logger.error(
f"[cleanup {pid}] Failed to release lock for {session_id}: {e}"
)
logger.info(f"[cleanup {pid}] Graceful shutdown completed")
# ============ RabbitMQ Consumer Methods ============ #
@continuous_retry()
def _consume_cancel(self):
"""Consume cancellation messages from FANOUT exchange."""
if self.stop_consuming.is_set() and not self.active_tasks:
logger.info("Stop reconnecting cancel consumer - service cleaned up")
return
if not self.cancel_client.is_ready:
self.cancel_client.disconnect()
self.cancel_client.connect()
# Check again after connect - shutdown may have been requested
if self.stop_consuming.is_set() and not self.active_tasks:
logger.info("Stop consuming requested during reconnect - disconnecting")
self.cancel_client.disconnect()
return
cancel_channel = self.cancel_client.get_channel()
cancel_channel.basic_consume(
queue=COPILOT_CANCEL_QUEUE_NAME,
on_message_callback=self._handle_cancel_message,
auto_ack=True,
)
logger.info("Starting to consume cancel messages...")
cancel_channel.start_consuming()
if not self.stop_consuming.is_set() or self.active_tasks:
raise RuntimeError("Cancel message consumer stopped unexpectedly")
logger.info("Cancel message consumer stopped gracefully")
@continuous_retry()
def _consume_run(self):
"""Consume run messages from DIRECT exchange."""
if self.stop_consuming.is_set():
logger.info("Stop reconnecting run consumer - service cleaned up")
return
if not self.run_client.is_ready:
self.run_client.disconnect()
self.run_client.connect()
# Check again after connect - shutdown may have been requested
if self.stop_consuming.is_set():
logger.info("Stop consuming requested during reconnect - disconnecting")
self.run_client.disconnect()
return
run_channel = self.run_client.get_channel()
run_channel.basic_qos(prefetch_count=self.pool_size)
run_channel.basic_consume(
queue=COPILOT_EXECUTION_QUEUE_NAME,
on_message_callback=self._handle_run_message,
auto_ack=False,
consumer_tag="copilot_execution_consumer",
)
logger.info("Starting to consume run messages...")
run_channel.start_consuming()
if not self.stop_consuming.is_set():
raise RuntimeError("Run message consumer stopped unexpectedly")
logger.info("Run message consumer stopped gracefully")
# ============ Message Handlers ============ #
@error_logged(swallow=True)
def _handle_cancel_message(
self,
_channel: BlockingChannel,
_method: Basic.Deliver,
_properties: BasicProperties,
body: bytes,
):
"""Handle cancel message from FANOUT exchange."""
request = CancelCoPilotEvent.model_validate_json(body)
session_id = request.session_id
if not session_id:
logger.warning("Cancel message missing 'session_id'")
return
if session_id not in self.active_tasks:
logger.debug(f"Cancel received for {session_id} but not active")
return
_, cancel_event = self.active_tasks[session_id]
logger.info(f"Received cancel for {session_id}")
if not cancel_event.is_set():
cancel_event.set()
else:
logger.debug(f"Cancel already set for {session_id}")
def _handle_run_message(
self,
_channel: BlockingChannel,
method: Basic.Deliver,
_properties: BasicProperties,
body: bytes,
):
"""Handle run message from DIRECT exchange."""
delivery_tag = method.delivery_tag
# Capture the channel used at message delivery time to ensure we ack
# on the correct channel. Delivery tags are channel-scoped and become
# invalid if the channel is recreated after reconnection.
delivery_channel = _channel
def ack_message(reject: bool, requeue: bool):
"""Acknowledge or reject the message.
Uses the channel from the original message delivery. If the channel
is no longer open (e.g., after reconnection), logs a warning and
skips the ack - RabbitMQ will redeliver the message automatically.
"""
try:
if not delivery_channel.is_open:
logger.warning(
f"Channel closed, cannot ack delivery_tag={delivery_tag}. "
"Message will be redelivered by RabbitMQ."
)
return
if reject:
delivery_channel.connection.add_callback_threadsafe(
lambda: delivery_channel.basic_nack(
delivery_tag, requeue=requeue
)
)
else:
delivery_channel.connection.add_callback_threadsafe(
lambda: delivery_channel.basic_ack(delivery_tag)
)
except (AMQPChannelError, AMQPConnectionError) as e:
# Channel/connection errors indicate stale delivery tag - don't retry
logger.warning(
f"Cannot ack delivery_tag={delivery_tag} due to channel/connection "
f"error: {e}. Message will be redelivered by RabbitMQ."
)
except Exception as e:
# Other errors might be transient, but log and skip to avoid blocking
logger.error(
f"Unexpected error acking delivery_tag={delivery_tag}: {e}"
)
# Check if we're shutting down
if self.stop_consuming.is_set():
logger.info("Rejecting new task during shutdown")
ack_message(reject=True, requeue=True)
return
# Check if we can accept more tasks
self._cleanup_completed_tasks()
if len(self.active_tasks) >= self.pool_size:
ack_message(reject=True, requeue=True)
return
try:
entry = CoPilotExecutionEntry.model_validate_json(body)
except Exception as e:
logger.error(f"Could not parse run message: {e}, body={body}")
ack_message(reject=True, requeue=False)
return
session_id = entry.session_id
# Check for local duplicate - session is already running on this executor
if session_id in self.active_tasks:
logger.warning(
f"Session {session_id} already running locally, rejecting duplicate"
)
ack_message(reject=True, requeue=False)
return
# Try to acquire cluster-wide lock
cluster_lock = ClusterLock(
redis=redis.get_redis(),
key=f"copilot:session:{session_id}:lock",
owner_id=self.executor_id,
timeout=settings.config.cluster_lock_timeout,
)
current_owner = cluster_lock.try_acquire()
if current_owner != self.executor_id:
if current_owner is not None:
logger.warning(
f"Session {session_id} already running on pod {current_owner}"
)
ack_message(reject=True, requeue=False)
else:
logger.warning(
f"Could not acquire lock for {session_id} - Redis unavailable"
)
ack_message(reject=True, requeue=True)
return
# Execute the task
try:
self._task_locks[session_id] = cluster_lock
logger.info(
f"Acquired cluster lock for {session_id}, "
f"executor_id={self.executor_id}"
)
cancel_event = threading.Event()
future = self.executor.submit(
execute_copilot_task, entry, cancel_event, cluster_lock
)
self.active_tasks[session_id] = (future, cancel_event)
except Exception as e:
logger.warning(f"Failed to setup execution for {session_id}: {e}")
cluster_lock.release()
if session_id in self._task_locks:
del self._task_locks[session_id]
ack_message(reject=True, requeue=True)
return
self._update_metrics()
def on_run_done(f: Future):
logger.info(f"Run completed for {session_id}")
try:
if exec_error := f.exception():
logger.error(f"Execution for {session_id} failed: {exec_error}")
ack_message(reject=True, requeue=False)
else:
ack_message(reject=False, requeue=False)
except BaseException as e:
logger.exception(f"Error in run completion callback: {e}")
finally:
# Release the cluster lock
if session_id in self._task_locks:
logger.info(f"Releasing cluster lock for {session_id}")
self._task_locks[session_id].release()
del self._task_locks[session_id]
self._cleanup_completed_tasks()
future.add_done_callback(on_run_done)
# ============ Helper Methods ============ #
def _cleanup_completed_tasks(self) -> list[str]:
"""Remove completed futures from active_tasks and update metrics."""
completed_tasks = []
with self._active_tasks_lock:
for session_id, (future, _) in list(self.active_tasks.items()):
if future.done():
completed_tasks.append(session_id)
self.active_tasks.pop(session_id, None)
logger.info(f"Cleaned up completed session {session_id}")
self._update_metrics()
return completed_tasks
def _update_metrics(self):
"""Update Prometheus metrics."""
active_count = len(self.active_tasks)
active_tasks_gauge.set(active_count)
if self.stop_consuming.is_set():
utilization_gauge.set(1.0)
else:
utilization_gauge.set(
active_count / self.pool_size if self.pool_size > 0 else 0
)
def _stop_message_consumers(
self, thread: threading.Thread, client: SyncRabbitMQ, prefix: str
):
"""Stop a message consumer thread."""
try:
channel = client.get_channel()
channel.connection.add_callback_threadsafe(lambda: channel.stop_consuming())
thread.join(timeout=300)
if thread.is_alive():
logger.error(
f"{prefix} Thread did not finish in time, forcing disconnect"
)
client.disconnect()
logger.info(f"{prefix} Client disconnected")
except Exception as e:
logger.error(f"{prefix} Error disconnecting client: {e}")
# ============ Lazy-initialized Properties ============ #
@property
def cancel_thread(self) -> threading.Thread:
if self._cancel_thread is None:
self._cancel_thread = threading.Thread(
target=lambda: self._consume_cancel(),
daemon=True,
)
return self._cancel_thread
@property
def run_thread(self) -> threading.Thread:
if self._run_thread is None:
self._run_thread = threading.Thread(
target=lambda: self._consume_run(),
daemon=True,
)
return self._run_thread
@property
def stop_consuming(self) -> threading.Event:
if self._stop_consuming is None:
self._stop_consuming = threading.Event()
return self._stop_consuming
@property
def executor(self) -> ThreadPoolExecutor:
if self._executor is None:
self._executor = ThreadPoolExecutor(
max_workers=self.pool_size,
initializer=init_worker,
)
return self._executor
@property
def cancel_client(self) -> SyncRabbitMQ:
if self._cancel_client is None:
self._cancel_client = SyncRabbitMQ(create_copilot_queue_config())
return self._cancel_client
@property
def run_client(self) -> SyncRabbitMQ:
if self._run_client is None:
self._run_client = SyncRabbitMQ(create_copilot_queue_config())
return self._run_client

View File

@@ -0,0 +1,287 @@
"""CoPilot execution processor - per-worker execution logic.
This module contains the processor class that handles CoPilot task execution
in a thread-local context, following the graph executor pattern.
"""
import asyncio
import logging
import threading
import time
from backend.copilot import service as copilot_service
from backend.copilot import stream_registry
from backend.copilot.config import ChatConfig
from backend.copilot.response_model import StreamFinish, StreamFinishStep
from backend.copilot.sdk import service as sdk_service
from backend.executor.cluster_lock import ClusterLock
from backend.util.decorator import error_logged
from backend.util.feature_flag import Flag, is_feature_enabled
from backend.util.logging import TruncatedLogger, configure_logging
from backend.util.process import set_service_name
from backend.util.retry import func_retry
from .utils import CoPilotExecutionEntry, CoPilotLogMetadata
logger = TruncatedLogger(logging.getLogger(__name__), prefix="[CoPilotExecutor]")
# ============ Module Entry Points ============ #
# Thread-local storage for processor instances
_tls = threading.local()
def execute_copilot_task(
entry: CoPilotExecutionEntry,
cancel: threading.Event,
cluster_lock: ClusterLock,
):
"""Execute a CoPilot task using the thread-local processor.
This function is the entry point called by the thread pool executor.
Args:
entry: The task payload
cancel: Threading event to signal cancellation
cluster_lock: Distributed lock for this execution
"""
processor: CoPilotProcessor = _tls.processor
return processor.execute(entry, cancel, cluster_lock)
def init_worker():
"""Initialize the processor for the current worker thread.
This function is called by the thread pool executor when a new worker
thread is created. It ensures each worker has its own processor instance.
"""
_tls.processor = CoPilotProcessor()
_tls.processor.on_executor_start()
def cleanup_worker():
"""Clean up the processor for the current worker thread.
Should be called before the worker thread's event loop is destroyed so
that event-loop-bound resources (e.g. ``aiohttp.ClientSession``) are
closed on the correct loop.
"""
processor: CoPilotProcessor | None = getattr(_tls, "processor", None)
if processor is not None:
processor.cleanup()
# ============ Processor Class ============ #
class CoPilotProcessor:
"""Per-worker execution logic for CoPilot tasks.
This class is instantiated once per worker thread and handles the execution
of CoPilot chat generation tasks. It maintains an async event loop for
running the async service code.
The execution flow:
1. CoPilot task is picked from RabbitMQ queue
2. Manager submits task to thread pool
3. Processor executes the task in its event loop
4. Results are published to Redis Streams
"""
@func_retry
def on_executor_start(self):
"""Initialize the processor when the worker thread starts.
This method is called once per worker thread to set up the async event
loop and initialize any required resources.
Database is accessed only through DatabaseManager, so we don't need to connect
to Prisma directly.
"""
configure_logging()
set_service_name("CoPilotExecutor")
self.tid = threading.get_ident()
self.execution_loop = asyncio.new_event_loop()
self.execution_thread = threading.Thread(
target=self.execution_loop.run_forever, daemon=True
)
self.execution_thread.start()
logger.info(f"[CoPilotExecutor] Worker {self.tid} started")
def cleanup(self):
"""Clean up event-loop-bound resources before the loop is destroyed.
Shuts down the workspace storage instance that belongs to this
worker's event loop, ensuring ``aiohttp.ClientSession.close()``
runs on the same loop that created the session.
"""
from backend.util.workspace_storage import shutdown_workspace_storage
try:
future = asyncio.run_coroutine_threadsafe(
shutdown_workspace_storage(), self.execution_loop
)
future.result(timeout=5)
except Exception as e:
logger.warning(f"[CoPilotExecutor] Worker {self.tid} cleanup error: {e}")
# Stop the event loop
self.execution_loop.call_soon_threadsafe(self.execution_loop.stop)
self.execution_thread.join(timeout=5)
logger.info(f"[CoPilotExecutor] Worker {self.tid} cleaned up")
@error_logged(swallow=False)
def execute(
self,
entry: CoPilotExecutionEntry,
cancel: threading.Event,
cluster_lock: ClusterLock,
):
"""Execute a CoPilot task.
This is the main entry point for task execution. It runs the async
execution logic in the worker's event loop and handles errors.
Args:
entry: The task payload containing session and message info
cancel: Threading event to signal cancellation
cluster_lock: Distributed lock to prevent duplicate execution
"""
log = CoPilotLogMetadata(
logging.getLogger(__name__),
session_id=entry.session_id,
user_id=entry.user_id,
)
log.info("Starting execution")
start_time = time.monotonic()
try:
# Run the async execution in our event loop
future = asyncio.run_coroutine_threadsafe(
self._execute_async(entry, cancel, cluster_lock, log),
self.execution_loop,
)
# Wait for completion, checking cancel periodically
while not future.done():
try:
future.result(timeout=1.0)
except asyncio.TimeoutError:
if cancel.is_set():
log.info("Cancellation requested")
future.cancel()
break
# Refresh cluster lock to maintain ownership
cluster_lock.refresh()
if not future.cancelled():
# Get result to propagate any exceptions
future.result()
elapsed = time.monotonic() - start_time
log.info(f"Execution completed in {elapsed:.2f}s")
except Exception as e:
elapsed = time.monotonic() - start_time
log.error(f"Execution failed after {elapsed:.2f}s: {e}")
# Note: _execute_async already marks the task as failed before re-raising,
# so we don't call _mark_task_failed here to avoid duplicate error events.
raise
async def _execute_async(
self,
entry: CoPilotExecutionEntry,
cancel: threading.Event,
cluster_lock: ClusterLock,
log: CoPilotLogMetadata,
):
"""Async execution logic for CoPilot task.
This method calls the existing stream_chat_completion service function
and publishes results to the stream registry.
Args:
entry: The task payload
cancel: Threading event to signal cancellation
cluster_lock: Distributed lock for refresh
log: Structured logger for this task
"""
last_refresh = time.monotonic()
refresh_interval = 30.0 # Refresh lock every 30 seconds
try:
# Choose service based on LaunchDarkly flag
config = ChatConfig()
use_sdk = await is_feature_enabled(
Flag.COPILOT_SDK,
entry.user_id or "anonymous",
default=config.use_claude_agent_sdk,
)
stream_fn = (
sdk_service.stream_chat_completion_sdk
if use_sdk
else copilot_service.stream_chat_completion
)
log.info(f"Using {'SDK' if use_sdk else 'standard'} service")
# Stream chat completion and publish chunks to Redis
async for chunk in stream_fn(
session_id=entry.session_id,
message=entry.message if entry.message else None,
is_user_message=entry.is_user_message,
user_id=entry.user_id,
context=entry.context,
):
# Check for cancellation
if cancel.is_set():
log.info("Cancelled during streaming")
await stream_registry.publish_chunk(
entry.turn_id, StreamFinishStep()
)
await stream_registry.mark_session_completed(
entry.session_id,
error_message="Operation cancelled",
)
return
current_time = time.monotonic()
if current_time - last_refresh >= refresh_interval:
cluster_lock.refresh()
last_refresh = current_time
if isinstance(chunk, StreamFinish):
break
try:
await stream_registry.publish_chunk(entry.turn_id, chunk)
except Exception as e:
log.error(
f"Error publishing chunk {type(chunk).__name__}: {e}",
exc_info=True,
)
await stream_registry.mark_session_completed(entry.session_id)
log.info("Task completed successfully")
except asyncio.CancelledError:
log.info("Task cancelled")
await stream_registry.mark_session_completed(
entry.session_id, error_message="Task was cancelled"
)
raise
except Exception as e:
log.error(f"Task failed: {e}")
try:
await stream_registry.publish_chunk(entry.turn_id, StreamFinishStep())
await stream_registry.mark_session_completed(
entry.session_id, error_message=str(e)
)
except Exception as mark_err:
logger.error(
f"Failed to mark session {entry.session_id} as failed: {mark_err}"
)
raise

View File

@@ -0,0 +1,218 @@
"""RabbitMQ queue configuration for CoPilot executor.
Defines two exchanges and queues following the graph executor pattern:
- 'copilot_execution' (DIRECT) for chat generation tasks
- 'copilot_cancel' (FANOUT) for cancellation requests
"""
import logging
from pydantic import BaseModel
from backend.data.rabbitmq import Exchange, ExchangeType, Queue, RabbitMQConfig
from backend.util.logging import TruncatedLogger, is_structured_logging_enabled
logger = logging.getLogger(__name__)
# ============ Logging Helper ============ #
class CoPilotLogMetadata(TruncatedLogger):
"""Structured logging helper for CoPilot executor.
In cloud environments (structured logging enabled), uses a simple prefix
and passes metadata via json_fields. In local environments, uses a detailed
prefix with all metadata key-value pairs for easier debugging.
Args:
logger: The underlying logger instance
max_length: Maximum log message length before truncation
**kwargs: Metadata key-value pairs (e.g., session_id="xyz", turn_id="abc")
These are added to json_fields in cloud mode, or to the prefix in local mode.
"""
def __init__(
self,
logger: logging.Logger,
max_length: int = 1000,
**kwargs: str | None,
):
# Filter out None values
metadata = {k: v for k, v in kwargs.items() if v is not None}
metadata["component"] = "CoPilotExecutor"
if is_structured_logging_enabled():
prefix = "[CoPilotExecutor]"
else:
# Build prefix from metadata key-value pairs
meta_parts = "|".join(
f"{k}:{v}" for k, v in metadata.items() if k != "component"
)
prefix = (
f"[CoPilotExecutor|{meta_parts}]" if meta_parts else "[CoPilotExecutor]"
)
super().__init__(
logger,
max_length=max_length,
prefix=prefix,
metadata=metadata,
)
# ============ Exchange and Queue Configuration ============ #
COPILOT_EXECUTION_EXCHANGE = Exchange(
name="copilot_execution",
type=ExchangeType.DIRECT,
durable=True,
auto_delete=False,
)
COPILOT_EXECUTION_QUEUE_NAME = "copilot_execution_queue"
COPILOT_EXECUTION_ROUTING_KEY = "copilot.run"
COPILOT_CANCEL_EXCHANGE = Exchange(
name="copilot_cancel",
type=ExchangeType.FANOUT,
durable=True,
auto_delete=False,
)
COPILOT_CANCEL_QUEUE_NAME = "copilot_cancel_queue"
# CoPilot operations can include extended thinking and agent generation
# which may take 30+ minutes to complete
COPILOT_CONSUMER_TIMEOUT_SECONDS = 60 * 60 # 1 hour
# Graceful shutdown timeout - allow in-flight operations to complete
GRACEFUL_SHUTDOWN_TIMEOUT_SECONDS = 30 * 60 # 30 minutes
def create_copilot_queue_config() -> RabbitMQConfig:
"""Create RabbitMQ configuration for CoPilot executor.
Defines two exchanges and queues:
- 'copilot_execution' (DIRECT) for chat generation tasks
- 'copilot_cancel' (FANOUT) for cancellation requests
Returns:
RabbitMQConfig with exchanges and queues defined
"""
run_queue = Queue(
name=COPILOT_EXECUTION_QUEUE_NAME,
exchange=COPILOT_EXECUTION_EXCHANGE,
routing_key=COPILOT_EXECUTION_ROUTING_KEY,
durable=True,
auto_delete=False,
arguments={
# Extended consumer timeout for long-running LLM operations
# Default 30-minute timeout is insufficient for extended thinking
# and agent generation which can take 30+ minutes
"x-consumer-timeout": COPILOT_CONSUMER_TIMEOUT_SECONDS
* 1000,
},
)
cancel_queue = Queue(
name=COPILOT_CANCEL_QUEUE_NAME,
exchange=COPILOT_CANCEL_EXCHANGE,
routing_key="", # not used for FANOUT
durable=True,
auto_delete=False,
)
return RabbitMQConfig(
vhost="/",
exchanges=[COPILOT_EXECUTION_EXCHANGE, COPILOT_CANCEL_EXCHANGE],
queues=[run_queue, cancel_queue],
)
# ============ Message Models ============ #
class CoPilotExecutionEntry(BaseModel):
"""Task payload for CoPilot AI generation.
This model represents a chat generation task to be processed by the executor.
"""
session_id: str
"""Chat session ID (also used for dedup/locking)"""
turn_id: str = ""
"""Per-turn UUID for Redis stream isolation"""
user_id: str | None
"""User ID (may be None for anonymous users)"""
message: str
"""User's message to process"""
is_user_message: bool = True
"""Whether the message is from the user (vs system/assistant)"""
context: dict[str, str] | None = None
"""Optional context for the message (e.g., {url: str, content: str})"""
class CancelCoPilotEvent(BaseModel):
"""Event to cancel a CoPilot operation."""
session_id: str
"""Session ID to cancel"""
# ============ Queue Publishing Helpers ============ #
async def enqueue_copilot_task(
session_id: str,
user_id: str | None,
message: str,
turn_id: str = "",
is_user_message: bool = True,
context: dict[str, str] | None = None,
) -> None:
"""Enqueue a CoPilot task for processing by the executor service.
Args:
session_id: Chat session ID (also used for dedup/locking)
user_id: User ID (may be None for anonymous users)
message: User's message to process
turn_id: Per-turn UUID for Redis stream isolation
is_user_message: Whether the message is from the user (vs system/assistant)
context: Optional context for the message (e.g., {url: str, content: str})
"""
from backend.util.clients import get_async_copilot_queue
entry = CoPilotExecutionEntry(
session_id=session_id,
turn_id=turn_id,
user_id=user_id,
message=message,
is_user_message=is_user_message,
context=context,
)
queue_client = await get_async_copilot_queue()
await queue_client.publish_message(
routing_key=COPILOT_EXECUTION_ROUTING_KEY,
message=entry.model_dump_json(),
exchange=COPILOT_EXECUTION_EXCHANGE,
)
async def enqueue_cancel_task(session_id: str) -> None:
"""Publish a cancel request for a running CoPilot session.
Sends a ``CancelCoPilotEvent`` to the FANOUT exchange so all executor
pods receive the cancellation signal.
"""
from backend.util.clients import get_async_copilot_queue
event = CancelCoPilotEvent(session_id=session_id)
queue_client = await get_async_copilot_queue()
await queue_client.publish_message(
routing_key="", # FANOUT ignores routing key
message=event.model_dump_json(),
exchange=COPILOT_CANCEL_EXCHANGE,
)

View File

@@ -2,7 +2,7 @@ import asyncio
import logging
import uuid
from datetime import UTC, datetime
from typing import Any, cast
from typing import Any, Self, cast
from weakref import WeakValueDictionary
from openai.types.chat import (
@@ -23,26 +23,17 @@ from prisma.models import ChatMessage as PrismaChatMessage
from prisma.models import ChatSession as PrismaChatSession
from pydantic import BaseModel
from backend.data.db_accessors import chat_db
from backend.data.redis_client import get_redis_async
from backend.util import json
from backend.util.exceptions import DatabaseError, RedisError
from . import db as chat_db
from .config import ChatConfig
logger = logging.getLogger(__name__)
config = ChatConfig()
def _parse_json_field(value: str | dict | list | None, default: Any = None) -> Any:
"""Parse a JSON field that may be stored as string or already parsed."""
if value is None:
return default
if isinstance(value, str):
return json.loads(value)
return value
# Redis cache key prefix for chat sessions
CHAT_SESSION_CACHE_PREFIX = "chat:session:"
@@ -52,28 +43,7 @@ def _get_session_cache_key(session_id: str) -> str:
return f"{CHAT_SESSION_CACHE_PREFIX}{session_id}"
# Session-level locks to prevent race conditions during concurrent upserts.
# Uses WeakValueDictionary to automatically garbage collect locks when no longer referenced,
# preventing unbounded memory growth while maintaining lock semantics for active sessions.
# Invalidation: Locks are auto-removed by GC when no coroutine holds a reference (after
# async with lock: completes). Explicit cleanup also occurs in delete_chat_session().
_session_locks: WeakValueDictionary[str, asyncio.Lock] = WeakValueDictionary()
_session_locks_mutex = asyncio.Lock()
async def _get_session_lock(session_id: str) -> asyncio.Lock:
"""Get or create a lock for a specific session to prevent concurrent upserts.
Uses WeakValueDictionary for automatic cleanup: locks are garbage collected
when no coroutine holds a reference to them, preventing memory leaks from
unbounded growth of session locks.
"""
async with _session_locks_mutex:
lock = _session_locks.get(session_id)
if lock is None:
lock = asyncio.Lock()
_session_locks[session_id] = lock
return lock
# ===================== Chat data models ===================== #
class ChatMessage(BaseModel):
@@ -85,6 +55,19 @@ class ChatMessage(BaseModel):
tool_calls: list[dict] | None = None
function_call: dict | None = None
@staticmethod
def from_db(prisma_message: PrismaChatMessage) -> "ChatMessage":
"""Convert a Prisma ChatMessage to a Pydantic ChatMessage."""
return ChatMessage(
role=prisma_message.role,
content=prisma_message.content,
name=prisma_message.name,
tool_call_id=prisma_message.toolCallId,
refusal=prisma_message.refusal,
tool_calls=_parse_json_field(prisma_message.toolCalls),
function_call=_parse_json_field(prisma_message.functionCall),
)
class Usage(BaseModel):
prompt_tokens: int
@@ -92,11 +75,10 @@ class Usage(BaseModel):
total_tokens: int
class ChatSession(BaseModel):
class ChatSessionInfo(BaseModel):
session_id: str
user_id: str
title: str | None = None
messages: list[ChatMessage]
usage: list[Usage]
credentials: dict[str, dict] = {} # Map of provider -> credential metadata
started_at: datetime
@@ -104,60 +86,9 @@ class ChatSession(BaseModel):
successful_agent_runs: dict[str, int] = {}
successful_agent_schedules: dict[str, int] = {}
def add_tool_call_to_current_turn(self, tool_call: dict) -> None:
"""Attach a tool_call to the current turn's assistant message.
Searches backwards for the most recent assistant message (stopping at
any user message boundary). If found, appends the tool_call to it.
Otherwise creates a new assistant message with the tool_call.
"""
for msg in reversed(self.messages):
if msg.role == "user":
break
if msg.role == "assistant":
if not msg.tool_calls:
msg.tool_calls = []
msg.tool_calls.append(tool_call)
return
self.messages.append(
ChatMessage(role="assistant", content="", tool_calls=[tool_call])
)
@staticmethod
def new(user_id: str) -> "ChatSession":
return ChatSession(
session_id=str(uuid.uuid4()),
user_id=user_id,
title=None,
messages=[],
usage=[],
credentials={},
started_at=datetime.now(UTC),
updated_at=datetime.now(UTC),
)
@staticmethod
def from_db(
prisma_session: PrismaChatSession,
prisma_messages: list[PrismaChatMessage] | None = None,
) -> "ChatSession":
"""Convert Prisma models to Pydantic ChatSession."""
messages = []
if prisma_messages:
for msg in prisma_messages:
messages.append(
ChatMessage(
role=msg.role,
content=msg.content,
name=msg.name,
tool_call_id=msg.toolCallId,
refusal=msg.refusal,
tool_calls=_parse_json_field(msg.toolCalls),
function_call=_parse_json_field(msg.functionCall),
)
)
@classmethod
def from_db(cls, prisma_session: PrismaChatSession) -> Self:
"""Convert Prisma ChatSession to Pydantic ChatSession."""
# Parse JSON fields from Prisma
credentials = _parse_json_field(prisma_session.credentials, default={})
successful_agent_runs = _parse_json_field(
@@ -179,11 +110,10 @@ class ChatSession(BaseModel):
)
)
return ChatSession(
return cls(
session_id=prisma_session.id,
user_id=prisma_session.userId,
title=prisma_session.title,
messages=messages,
usage=usage,
credentials=credentials,
started_at=prisma_session.createdAt,
@@ -192,46 +122,55 @@ class ChatSession(BaseModel):
successful_agent_schedules=successful_agent_schedules,
)
@staticmethod
def _merge_consecutive_assistant_messages(
messages: list[ChatCompletionMessageParam],
) -> list[ChatCompletionMessageParam]:
"""Merge consecutive assistant messages into single messages.
Long-running tool flows can create split assistant messages: one with
text content and another with tool_calls. Anthropic's API requires
tool_result blocks to reference a tool_use in the immediately preceding
assistant message, so these splits cause 400 errors via OpenRouter.
class ChatSession(ChatSessionInfo):
messages: list[ChatMessage]
@classmethod
def new(cls, user_id: str) -> Self:
return cls(
session_id=str(uuid.uuid4()),
user_id=user_id,
title=None,
messages=[],
usage=[],
credentials={},
started_at=datetime.now(UTC),
updated_at=datetime.now(UTC),
)
@classmethod
def from_db(cls, prisma_session: PrismaChatSession) -> Self:
"""Convert Prisma ChatSession to Pydantic ChatSession."""
if prisma_session.Messages is None:
raise ValueError(
f"Prisma session {prisma_session.id} is missing Messages relation"
)
return cls(
**ChatSessionInfo.from_db(prisma_session).model_dump(),
messages=[ChatMessage.from_db(m) for m in prisma_session.Messages],
)
def add_tool_call_to_current_turn(self, tool_call: dict) -> None:
"""Attach a tool_call to the current turn's assistant message.
Searches backwards for the most recent assistant message (stopping at
any user message boundary). If found, appends the tool_call to it.
Otherwise creates a new assistant message with the tool_call.
"""
if len(messages) < 2:
return messages
for msg in reversed(self.messages):
if msg.role == "user":
break
if msg.role == "assistant":
if not msg.tool_calls:
msg.tool_calls = []
msg.tool_calls.append(tool_call)
return
result: list[ChatCompletionMessageParam] = [messages[0]]
for msg in messages[1:]:
prev = result[-1]
if prev.get("role") != "assistant" or msg.get("role") != "assistant":
result.append(msg)
continue
prev = cast(ChatCompletionAssistantMessageParam, prev)
curr = cast(ChatCompletionAssistantMessageParam, msg)
curr_content = curr.get("content") or ""
if curr_content:
prev_content = prev.get("content") or ""
prev["content"] = (
f"{prev_content}\n{curr_content}" if prev_content else curr_content
)
curr_tool_calls = curr.get("tool_calls")
if curr_tool_calls:
prev_tool_calls = prev.get("tool_calls")
prev["tool_calls"] = (
list(prev_tool_calls) + list(curr_tool_calls)
if prev_tool_calls
else list(curr_tool_calls)
)
return result
self.messages.append(
ChatMessage(role="assistant", content="", tool_calls=[tool_call])
)
def to_openai_messages(self) -> list[ChatCompletionMessageParam]:
messages = []
@@ -321,39 +260,68 @@ class ChatSession(BaseModel):
)
return self._merge_consecutive_assistant_messages(messages)
@staticmethod
def _merge_consecutive_assistant_messages(
messages: list[ChatCompletionMessageParam],
) -> list[ChatCompletionMessageParam]:
"""Merge consecutive assistant messages into single messages.
async def _get_session_from_cache(session_id: str) -> ChatSession | None:
"""Get a chat session from Redis cache."""
redis_key = _get_session_cache_key(session_id)
async_redis = await get_redis_async()
raw_session: bytes | None = await async_redis.get(redis_key)
Long-running tool flows can create split assistant messages: one with
text content and another with tool_calls. Anthropic's API requires
tool_result blocks to reference a tool_use in the immediately preceding
assistant message, so these splits cause 400 errors via OpenRouter.
"""
if len(messages) < 2:
return messages
if raw_session is None:
return None
result: list[ChatCompletionMessageParam] = [messages[0]]
for msg in messages[1:]:
prev = result[-1]
if prev.get("role") != "assistant" or msg.get("role") != "assistant":
result.append(msg)
continue
try:
session = ChatSession.model_validate_json(raw_session)
logger.info(
f"Loading session {session_id} from cache: "
f"message_count={len(session.messages)}, "
f"roles={[m.role for m in session.messages]}"
)
return session
except Exception as e:
logger.error(f"Failed to deserialize session {session_id}: {e}", exc_info=True)
raise RedisError(f"Corrupted session data for {session_id}") from e
prev = cast(ChatCompletionAssistantMessageParam, prev)
curr = cast(ChatCompletionAssistantMessageParam, msg)
curr_content = curr.get("content") or ""
if curr_content:
prev_content = prev.get("content") or ""
prev["content"] = (
f"{prev_content}\n{curr_content}" if prev_content else curr_content
)
curr_tool_calls = curr.get("tool_calls")
if curr_tool_calls:
prev_tool_calls = prev.get("tool_calls")
prev["tool_calls"] = (
list(prev_tool_calls) + list(curr_tool_calls)
if prev_tool_calls
else list(curr_tool_calls)
)
return result
async def _cache_session(session: ChatSession) -> None:
"""Cache a chat session in Redis."""
redis_key = _get_session_cache_key(session.session_id)
async_redis = await get_redis_async()
await async_redis.setex(redis_key, config.session_ttl, session.model_dump_json())
def _parse_json_field(value: str | dict | list | None, default: Any = None) -> Any:
"""Parse a JSON field that may be stored as string or already parsed."""
if value is None:
return default
if isinstance(value, str):
return json.loads(value)
return value
# ================ Chat cache + DB operations ================ #
# NOTE: Database calls are automatically routed through DatabaseManager if Prisma is not
# connected directly.
async def cache_chat_session(session: ChatSession) -> None:
"""Cache a chat session without persisting to the database."""
await _cache_session(session)
"""Cache a chat session in Redis (without persisting to the database)."""
redis_key = _get_session_cache_key(session.session_id)
async_redis = await get_redis_async()
await async_redis.setex(redis_key, config.session_ttl, session.model_dump_json())
async def invalidate_session_cache(session_id: str) -> None:
@@ -371,80 +339,6 @@ async def invalidate_session_cache(session_id: str) -> None:
logger.warning(f"Failed to invalidate session cache for {session_id}: {e}")
async def _get_session_from_db(session_id: str) -> ChatSession | None:
"""Get a chat session from the database."""
prisma_session = await chat_db.get_chat_session(session_id)
if not prisma_session:
return None
messages = prisma_session.Messages
logger.info(
f"Loading session {session_id} from DB: "
f"has_messages={messages is not None}, "
f"message_count={len(messages) if messages else 0}, "
f"roles={[m.role for m in messages] if messages else []}"
)
return ChatSession.from_db(prisma_session, messages)
async def _save_session_to_db(
session: ChatSession, existing_message_count: int
) -> None:
"""Save or update a chat session in the database."""
# Check if session exists in DB
existing = await chat_db.get_chat_session(session.session_id)
if not existing:
# Create new session
await chat_db.create_chat_session(
session_id=session.session_id,
user_id=session.user_id,
)
existing_message_count = 0
# Calculate total tokens from usage
total_prompt = sum(u.prompt_tokens for u in session.usage)
total_completion = sum(u.completion_tokens for u in session.usage)
# Update session metadata
await chat_db.update_chat_session(
session_id=session.session_id,
credentials=session.credentials,
successful_agent_runs=session.successful_agent_runs,
successful_agent_schedules=session.successful_agent_schedules,
total_prompt_tokens=total_prompt,
total_completion_tokens=total_completion,
)
# Add new messages (only those after existing count)
new_messages = session.messages[existing_message_count:]
if new_messages:
messages_data = []
for msg in new_messages:
messages_data.append(
{
"role": msg.role,
"content": msg.content,
"name": msg.name,
"tool_call_id": msg.tool_call_id,
"refusal": msg.refusal,
"tool_calls": msg.tool_calls,
"function_call": msg.function_call,
}
)
logger.info(
f"Saving {len(new_messages)} new messages to DB for session {session.session_id}: "
f"roles={[m['role'] for m in messages_data]}, "
f"start_sequence={existing_message_count}"
)
await chat_db.add_chat_messages_batch(
session_id=session.session_id,
messages=messages_data,
start_sequence=existing_message_count,
)
async def get_chat_session(
session_id: str,
user_id: str | None = None,
@@ -476,7 +370,7 @@ async def get_chat_session(
logger.warning(f"Unexpected cache error for session {session_id}: {e}")
# Fall back to database
logger.info(f"Session {session_id} not in cache, checking database")
logger.debug(f"Session {session_id} not in cache, checking database")
session = await _get_session_from_db(session_id)
if session is None:
@@ -492,7 +386,7 @@ async def get_chat_session(
# Cache the session from DB
try:
await _cache_session(session)
await cache_chat_session(session)
logger.info(f"Cached session {session_id} from database")
except Exception as e:
logger.warning(f"Failed to cache session {session_id}: {e}")
@@ -500,6 +394,44 @@ async def get_chat_session(
return session
async def _get_session_from_cache(session_id: str) -> ChatSession | None:
"""Get a chat session from Redis cache."""
redis_key = _get_session_cache_key(session_id)
async_redis = await get_redis_async()
raw_session: bytes | None = await async_redis.get(redis_key)
if raw_session is None:
return None
try:
session = ChatSession.model_validate_json(raw_session)
logger.info(
f"Loading session {session_id} from cache: "
f"message_count={len(session.messages)}, "
f"roles={[m.role for m in session.messages]}"
)
return session
except Exception as e:
logger.error(f"Failed to deserialize session {session_id}: {e}", exc_info=True)
raise RedisError(f"Corrupted session data for {session_id}") from e
async def _get_session_from_db(session_id: str) -> ChatSession | None:
"""Get a chat session from the database."""
session = await chat_db().get_chat_session(session_id)
if not session:
return None
logger.info(
f"Loaded session {session_id} from DB: "
f"has_messages={bool(session.messages)}, "
f"message_count={len(session.messages)}, "
f"roles={[m.role for m in session.messages]}"
)
return session
async def upsert_chat_session(
session: ChatSession,
) -> ChatSession:
@@ -519,16 +451,18 @@ async def upsert_chat_session(
lock = await _get_session_lock(session.session_id)
async with lock:
# Get existing message count from DB for incremental saves
existing_message_count = await chat_db.get_chat_session_message_count(
session.session_id
)
# Always query DB for existing message count to ensure consistency
existing_message_count = await chat_db().get_next_sequence(session.session_id)
db_error: Exception | None = None
# Save to database (primary storage)
try:
await _save_session_to_db(session, existing_message_count)
await _save_session_to_db(
session,
existing_message_count,
skip_existence_check=existing_message_count > 0,
)
except Exception as e:
logger.error(
f"Failed to save session {session.session_id} to database: {e}"
@@ -537,7 +471,7 @@ async def upsert_chat_session(
# Save to cache (best-effort, even if DB failed)
try:
await _cache_session(session)
await cache_chat_session(session)
except Exception as e:
# If DB succeeded but cache failed, raise cache error
if db_error is None:
@@ -558,6 +492,107 @@ async def upsert_chat_session(
return session
async def _save_session_to_db(
session: ChatSession,
existing_message_count: int,
*,
skip_existence_check: bool = False,
) -> None:
"""Save or update a chat session in the database.
Args:
skip_existence_check: When True, skip the ``get_chat_session`` query
and assume the session row already exists. Saves one DB round trip
for incremental saves during streaming.
"""
db = chat_db()
if not skip_existence_check:
# Check if session exists in DB
existing = await db.get_chat_session(session.session_id)
if not existing:
# Create new session
await db.create_chat_session(
session_id=session.session_id,
user_id=session.user_id,
)
existing_message_count = 0
# Calculate total tokens from usage
total_prompt = sum(u.prompt_tokens for u in session.usage)
total_completion = sum(u.completion_tokens for u in session.usage)
# Update session metadata
await db.update_chat_session(
session_id=session.session_id,
credentials=session.credentials,
successful_agent_runs=session.successful_agent_runs,
successful_agent_schedules=session.successful_agent_schedules,
total_prompt_tokens=total_prompt,
total_completion_tokens=total_completion,
)
# Add new messages (only those after existing count)
new_messages = session.messages[existing_message_count:]
if new_messages:
messages_data = []
for msg in new_messages:
messages_data.append(
{
"role": msg.role,
"content": msg.content,
"name": msg.name,
"tool_call_id": msg.tool_call_id,
"refusal": msg.refusal,
"tool_calls": msg.tool_calls,
"function_call": msg.function_call,
}
)
logger.info(
f"Saving {len(new_messages)} new messages to DB for session {session.session_id}: "
f"roles={[m['role'] for m in messages_data]}, "
f"start_sequence={existing_message_count}"
)
await db.add_chat_messages_batch(
session_id=session.session_id,
messages=messages_data,
start_sequence=existing_message_count,
)
async def append_and_save_message(session_id: str, message: ChatMessage) -> ChatSession:
"""Atomically append a message to a session and persist it.
Acquires the session lock, re-fetches the latest session state,
appends the message, and saves preventing message loss when
concurrent requests modify the same session.
"""
lock = await _get_session_lock(session_id)
async with lock:
session = await get_chat_session(session_id)
if session is None:
raise ValueError(f"Session {session_id} not found")
session.messages.append(message)
existing_message_count = await chat_db().get_next_sequence(session_id)
try:
await _save_session_to_db(session, existing_message_count)
except Exception as e:
raise DatabaseError(
f"Failed to persist message to session {session_id}"
) from e
try:
await cache_chat_session(session)
except Exception as e:
logger.warning(f"Cache write failed for session {session_id}: {e}")
return session
async def create_chat_session(user_id: str) -> ChatSession:
"""Create a new chat session and persist it.
@@ -570,7 +605,7 @@ async def create_chat_session(user_id: str) -> ChatSession:
# Create in database first - fail fast if this fails
try:
await chat_db.create_chat_session(
await chat_db().create_chat_session(
session_id=session.session_id,
user_id=user_id,
)
@@ -582,7 +617,7 @@ async def create_chat_session(user_id: str) -> ChatSession:
# Cache the session (best-effort optimization, DB is source of truth)
try:
await _cache_session(session)
await cache_chat_session(session)
except Exception as e:
logger.warning(f"Failed to cache new session {session.session_id}: {e}")
@@ -593,20 +628,16 @@ async def get_user_sessions(
user_id: str,
limit: int = 50,
offset: int = 0,
) -> tuple[list[ChatSession], int]:
) -> tuple[list[ChatSessionInfo], int]:
"""Get chat sessions for a user from the database with total count.
Returns:
A tuple of (sessions, total_count) where total_count is the overall
number of sessions for the user (not just the current page).
"""
prisma_sessions = await chat_db.get_user_chat_sessions(user_id, limit, offset)
total_count = await chat_db.get_user_session_count(user_id)
sessions = []
for prisma_session in prisma_sessions:
# Convert without messages for listing (lighter weight)
sessions.append(ChatSession.from_db(prisma_session, None))
db = chat_db()
sessions = await db.get_user_chat_sessions(user_id, limit, offset)
total_count = await db.get_user_session_count(user_id)
return sessions, total_count
@@ -624,7 +655,7 @@ async def delete_chat_session(session_id: str, user_id: str | None = None) -> bo
"""
# Delete from database first (with optional user_id validation)
# This confirms ownership before invalidating cache
deleted = await chat_db.delete_chat_session(session_id, user_id)
deleted = await chat_db().delete_chat_session(session_id, user_id)
if not deleted:
return False
@@ -659,20 +690,52 @@ async def update_session_title(session_id: str, title: str) -> bool:
True if updated successfully, False otherwise.
"""
try:
result = await chat_db.update_chat_session(session_id=session_id, title=title)
result = await chat_db().update_chat_session(session_id=session_id, title=title)
if result is None:
logger.warning(f"Session {session_id} not found for title update")
return False
# Invalidate cache so next fetch gets updated title
# Update title in cache if it exists (instead of invalidating).
# This prevents race conditions where cache invalidation causes
# the frontend to see stale DB data while streaming is still in progress.
try:
redis_key = _get_session_cache_key(session_id)
async_redis = await get_redis_async()
await async_redis.delete(redis_key)
cached = await _get_session_from_cache(session_id)
if cached:
cached.title = title
await cache_chat_session(cached)
except Exception as e:
logger.warning(f"Failed to invalidate cache for session {session_id}: {e}")
# Not critical - title will be correct on next full cache refresh
logger.warning(
f"Failed to update title in cache for session {session_id}: {e}"
)
return True
except Exception as e:
logger.error(f"Failed to update title for session {session_id}: {e}")
return False
# ==================== Chat session locks ==================== #
_session_locks: WeakValueDictionary[str, asyncio.Lock] = WeakValueDictionary()
_session_locks_mutex = asyncio.Lock()
async def _get_session_lock(session_id: str) -> asyncio.Lock:
"""Get or create a lock for a specific session to prevent concurrent upserts.
This was originally added to solve the specific problem of race conditions between
the session title thread and the conversation thread, which always occurs on the
same instance as we prevent rapid request sends on the frontend.
Uses WeakValueDictionary for automatic cleanup: locks are garbage collected
when no coroutine holds a reference to them, preventing memory leaks from
unbounded growth of session locks. Explicit cleanup also occurs
in `delete_chat_session()`.
"""
async with _session_locks_mutex:
lock = _session_locks.get(session_id)
if lock is None:
lock = asyncio.Lock()
_session_locks[session_id] = lock
return lock

View File

@@ -331,3 +331,96 @@ def test_to_openai_messages_merges_split_assistants():
tc_list = merged.get("tool_calls")
assert tc_list is not None and len(list(tc_list)) == 1
assert list(tc_list)[0]["id"] == "tc1"
# --------------------------------------------------------------------------- #
# Concurrent save collision detection #
# --------------------------------------------------------------------------- #
@pytest.mark.asyncio(loop_scope="session")
async def test_concurrent_saves_collision_detection(setup_test_user, test_user_id):
"""Test that concurrent saves from streaming loop and callback handle collisions correctly.
Simulates the race condition where:
1. Streaming loop starts with saved_msg_count=5
2. Long-running callback appends message #5 and saves
3. Streaming loop tries to save with stale count=5
The collision detection should handle this gracefully.
"""
import asyncio
# Create a session with initial messages
session = ChatSession.new(user_id=test_user_id)
for i in range(3):
session.messages.append(
ChatMessage(
role="user" if i % 2 == 0 else "assistant", content=f"Message {i}"
)
)
# Save initial messages
session = await upsert_chat_session(session)
# Simulate streaming loop and callback saving concurrently
async def streaming_loop_save():
"""Simulates streaming loop saving messages."""
# Add 2 messages
session.messages.append(ChatMessage(role="user", content="Streaming message 1"))
session.messages.append(
ChatMessage(role="assistant", content="Streaming message 2")
)
# Wait a bit to let callback potentially save first
await asyncio.sleep(0.01)
# Save (will query DB for existing count)
return await upsert_chat_session(session)
async def callback_save():
"""Simulates long-running callback saving a message."""
# Add 1 message
session.messages.append(
ChatMessage(role="tool", content="Callback result", tool_call_id="tc1")
)
# Save immediately (will query DB for existing count)
return await upsert_chat_session(session)
# Run both saves concurrently - one will hit collision detection
results = await asyncio.gather(streaming_loop_save(), callback_save())
# Both should succeed
assert all(r is not None for r in results)
# Reload session from DB to verify
from backend.data.redis_client import get_redis_async
redis_key = f"chat:session:{session.session_id}"
async_redis = await get_redis_async()
await async_redis.delete(redis_key) # Clear cache to force DB load
loaded_session = await get_chat_session(session.session_id, test_user_id)
assert loaded_session is not None
# Should have all 6 messages (3 initial + 2 streaming + 1 callback)
assert len(loaded_session.messages) == 6
# Verify no duplicate sequences
sequences = []
for i, msg in enumerate(loaded_session.messages):
# Messages should have sequential sequence numbers starting from 0
sequences.append(i)
# All sequences should be unique and sequential
assert sequences == list(range(6))
# Verify message content is preserved
contents = [m.content for m in loaded_session.messages]
assert "Message 0" in contents
assert "Message 1" in contents
assert "Message 2" in contents
assert "Streaming message 1" in contents
assert "Streaming message 2" in contents
assert "Callback result" in contents

View File

@@ -0,0 +1,269 @@
"""Tests for parallel tool call execution in CoPilot.
These tests mock _yield_tool_call to avoid importing the full copilot stack
which requires Prisma, DB connections, etc.
"""
import asyncio
import time
from typing import Any, cast
import pytest
@pytest.mark.asyncio
async def test_parallel_tool_calls_run_concurrently():
"""Multiple tool calls should complete in ~max(delays), not sum(delays)."""
from backend.copilot.response_model import (
StreamToolInputAvailable,
StreamToolOutputAvailable,
)
from backend.copilot.service import _execute_tool_calls_parallel
n_tools = 3
delay_per_tool = 0.2
tool_calls = [
{
"id": f"call_{i}",
"type": "function",
"function": {"name": f"tool_{i}", "arguments": "{}"},
}
for i in range(n_tools)
]
class FakeSession:
session_id = "test"
user_id = "test"
def __init__(self):
self.messages = []
original_yield = None
async def fake_yield(tc_list, idx, sess):
yield StreamToolInputAvailable(
toolCallId=tc_list[idx]["id"],
toolName=tc_list[idx]["function"]["name"],
input={},
)
await asyncio.sleep(delay_per_tool)
yield StreamToolOutputAvailable(
toolCallId=tc_list[idx]["id"],
toolName=tc_list[idx]["function"]["name"],
output="{}",
)
import backend.copilot.service as svc
original_yield = svc._yield_tool_call
svc._yield_tool_call = fake_yield
try:
start = time.monotonic()
events = []
async for event in _execute_tool_calls_parallel(
tool_calls, cast(Any, FakeSession())
):
events.append(event)
elapsed = time.monotonic() - start
finally:
svc._yield_tool_call = original_yield
assert len(events) == n_tools * 2
# Parallel: should take ~delay, not ~n*delay
assert elapsed < delay_per_tool * (
n_tools - 0.5
), f"Took {elapsed:.2f}s, expected parallel (~{delay_per_tool}s)"
@pytest.mark.asyncio
async def test_single_tool_call_works():
"""Single tool call should work identically."""
from backend.copilot.response_model import (
StreamToolInputAvailable,
StreamToolOutputAvailable,
)
from backend.copilot.service import _execute_tool_calls_parallel
tool_calls = [
{
"id": "call_0",
"type": "function",
"function": {"name": "t", "arguments": "{}"},
}
]
class FakeSession:
session_id = "test"
user_id = "test"
def __init__(self):
self.messages = []
async def fake_yield(tc_list, idx, sess):
yield StreamToolInputAvailable(toolCallId="call_0", toolName="t", input={})
yield StreamToolOutputAvailable(toolCallId="call_0", toolName="t", output="{}")
import backend.copilot.service as svc
orig = svc._yield_tool_call
svc._yield_tool_call = fake_yield
try:
events = [
e
async for e in _execute_tool_calls_parallel(
tool_calls, cast(Any, FakeSession())
)
]
finally:
svc._yield_tool_call = orig
assert len(events) == 2
@pytest.mark.asyncio
async def test_retryable_error_propagates():
"""Retryable errors should be raised after all tools finish."""
from backend.copilot.response_model import StreamToolOutputAvailable
from backend.copilot.service import _execute_tool_calls_parallel
tool_calls = [
{
"id": f"call_{i}",
"type": "function",
"function": {"name": f"t_{i}", "arguments": "{}"},
}
for i in range(2)
]
class FakeSession:
session_id = "test"
user_id = "test"
def __init__(self):
self.messages = []
async def fake_yield(tc_list, idx, sess):
if idx == 1:
raise KeyError("bad")
from backend.copilot.response_model import StreamToolInputAvailable
yield StreamToolInputAvailable(
toolCallId=tc_list[idx]["id"], toolName="t_0", input={}
)
await asyncio.sleep(0.05)
yield StreamToolOutputAvailable(
toolCallId=tc_list[idx]["id"], toolName="t_0", output="{}"
)
import backend.copilot.service as svc
orig = svc._yield_tool_call
svc._yield_tool_call = fake_yield
try:
events = []
with pytest.raises(KeyError):
async for event in _execute_tool_calls_parallel(
tool_calls, cast(Any, FakeSession())
):
events.append(event)
# First tool's events should still be yielded
assert any(isinstance(e, StreamToolOutputAvailable) for e in events)
finally:
svc._yield_tool_call = orig
@pytest.mark.asyncio
async def test_session_shared_across_parallel_tools():
"""All parallel tools should receive the same session instance."""
from backend.copilot.response_model import (
StreamToolInputAvailable,
StreamToolOutputAvailable,
)
from backend.copilot.service import _execute_tool_calls_parallel
tool_calls = [
{
"id": f"call_{i}",
"type": "function",
"function": {"name": f"t_{i}", "arguments": "{}"},
}
for i in range(3)
]
class FakeSession:
session_id = "test"
user_id = "test"
def __init__(self):
self.messages = []
observed_sessions = []
async def fake_yield(tc_list, idx, sess):
observed_sessions.append(sess)
yield StreamToolInputAvailable(
toolCallId=tc_list[idx]["id"], toolName=f"t_{idx}", input={}
)
yield StreamToolOutputAvailable(
toolCallId=tc_list[idx]["id"], toolName=f"t_{idx}", output="{}"
)
import backend.copilot.service as svc
orig = svc._yield_tool_call
svc._yield_tool_call = fake_yield
try:
async for _ in _execute_tool_calls_parallel(
tool_calls, cast(Any, FakeSession())
):
pass
finally:
svc._yield_tool_call = orig
assert len(observed_sessions) == 3
assert observed_sessions[0] is observed_sessions[1] is observed_sessions[2]
@pytest.mark.asyncio
async def test_cancellation_cleans_up():
"""Generator close should cancel in-flight tasks."""
from backend.copilot.response_model import StreamToolInputAvailable
from backend.copilot.service import _execute_tool_calls_parallel
tool_calls = [
{
"id": f"call_{i}",
"type": "function",
"function": {"name": f"t_{i}", "arguments": "{}"},
}
for i in range(2)
]
class FakeSession:
session_id = "test"
user_id = "test"
def __init__(self):
self.messages = []
started = asyncio.Event()
async def fake_yield(tc_list, idx, sess):
yield StreamToolInputAvailable(
toolCallId=tc_list[idx]["id"], toolName=f"t_{idx}", input={}
)
started.set()
await asyncio.sleep(10) # simulate long-running
import backend.copilot.service as svc
orig = svc._yield_tool_call
svc._yield_tool_call = fake_yield
try:
gen = _execute_tool_calls_parallel(tool_calls, cast(Any, FakeSession()))
await gen.__anext__() # get first event
await started.wait()
await gen.aclose() # close generator
finally:
svc._yield_tool_call = orig
# If we get here without hanging, cleanup worked

View File

@@ -5,6 +5,8 @@ This module implements the AI SDK UI Stream Protocol (v1) for streaming chat res
See: https://ai-sdk.dev/docs/ai-sdk-ui/stream-protocol
"""
import json
import logging
from enum import Enum
from typing import Any
@@ -12,6 +14,8 @@ from pydantic import BaseModel, Field
from backend.util.json import dumps as json_dumps
logger = logging.getLogger(__name__)
class ResponseType(str, Enum):
"""Types of streaming responses following AI SDK protocol."""
@@ -47,7 +51,8 @@ class StreamBaseResponse(BaseModel):
def to_sse(self) -> str:
"""Convert to SSE format."""
return f"data: {self.model_dump_json()}\n\n"
json_str = self.model_dump_json(exclude_none=True)
return f"data: {json_str}\n\n"
# ========== Message Lifecycle ==========
@@ -58,15 +63,13 @@ class StreamStart(StreamBaseResponse):
type: ResponseType = ResponseType.START
messageId: str = Field(..., description="Unique message ID")
taskId: str | None = Field(
sessionId: str | None = Field(
default=None,
description="Task ID for SSE reconnection. Clients can reconnect using GET /tasks/{taskId}/stream",
description="Session ID for SSE reconnection.",
)
def to_sse(self) -> str:
"""Convert to SSE format, excluding non-protocol fields like taskId."""
import json
"""Convert to SSE format, excluding non-protocol fields like sessionId."""
data: dict[str, Any] = {
"type": self.type.value,
"messageId": self.messageId,
@@ -163,8 +166,6 @@ class StreamToolOutputAvailable(StreamBaseResponse):
def to_sse(self) -> str:
"""Convert to SSE format, excluding non-spec fields."""
import json
data = {
"type": self.type.value,
"toolCallId": self.toolCallId,

View File

@@ -0,0 +1,14 @@
"""Claude Agent SDK integration for CoPilot.
This module provides the integration layer between the Claude Agent SDK
and the existing CoPilot tool system, enabling drop-in replacement of
the current LLM orchestration with the battle-tested Claude Agent SDK.
"""
from .service import stream_chat_completion_sdk
from .tool_adapter import create_copilot_mcp_server
__all__ = [
"stream_chat_completion_sdk",
"create_copilot_mcp_server",
]

View File

@@ -0,0 +1,65 @@
"""Dummy SDK service for testing copilot streaming.
Returns mock streaming responses without calling Claude Agent SDK.
Enable via COPILOT_TEST_MODE=true environment variable.
WARNING: This is for testing only. Do not use in production.
"""
import asyncio
import logging
import uuid
from collections.abc import AsyncGenerator
from ..model import ChatSession
from ..response_model import (
StreamBaseResponse,
StreamFinish,
StreamStart,
StreamTextDelta,
)
logger = logging.getLogger(__name__)
async def stream_chat_completion_dummy(
session_id: str,
message: str | None = None,
tool_call_response: str | None = None,
is_user_message: bool = True,
user_id: str | None = None,
retry_count: int = 0,
session: ChatSession | None = None,
context: dict[str, str] | None = None,
) -> AsyncGenerator[StreamBaseResponse, None]:
"""Stream dummy chat completion for testing.
Returns a simple streaming response with text deltas to test:
- Streaming infrastructure works
- No timeout occurs
- Text arrives in chunks
- StreamFinish is sent
"""
logger.warning(
f"[TEST MODE] Using dummy copilot streaming for session {session_id}"
)
message_id = str(uuid.uuid4())
text_block_id = str(uuid.uuid4())
# Start the stream
yield StreamStart(messageId=message_id, sessionId=session_id)
# Simulate streaming text response with delays
dummy_response = "I counted: 1... 2... 3. All done!"
words = dummy_response.split()
for i, word in enumerate(words):
# Add space except for last word
text = word if i == len(words) - 1 else f"{word} "
yield StreamTextDelta(id=text_block_id, delta=text)
# Small delay to simulate real streaming
await asyncio.sleep(0.1)
# Finish the stream
yield StreamFinish()

View File

@@ -0,0 +1,221 @@
"""Tests for _format_conversation_context and _build_query_message."""
from datetime import UTC, datetime
import pytest
from backend.copilot.model import ChatMessage, ChatSession
from backend.copilot.sdk.service import (
_build_query_message,
_format_conversation_context,
)
# ---------------------------------------------------------------------------
# _format_conversation_context
# ---------------------------------------------------------------------------
def test_format_empty_list():
assert _format_conversation_context([]) is None
def test_format_none_content_messages():
msgs = [ChatMessage(role="user", content=None)]
assert _format_conversation_context(msgs) is None
def test_format_user_message():
msgs = [ChatMessage(role="user", content="hello")]
result = _format_conversation_context(msgs)
assert result is not None
assert "User: hello" in result
assert result.startswith("<conversation_history>")
assert result.endswith("</conversation_history>")
def test_format_assistant_text():
msgs = [ChatMessage(role="assistant", content="hi there")]
result = _format_conversation_context(msgs)
assert result is not None
assert "You responded: hi there" in result
def test_format_assistant_tool_calls():
msgs = [
ChatMessage(
role="assistant",
content=None,
tool_calls=[{"function": {"name": "search", "arguments": '{"q": "test"}'}}],
)
]
result = _format_conversation_context(msgs)
assert result is not None
assert 'You called tool: search({"q": "test"})' in result
def test_format_tool_result():
msgs = [ChatMessage(role="tool", content='{"result": "ok"}')]
result = _format_conversation_context(msgs)
assert result is not None
assert 'Tool result: {"result": "ok"}' in result
def test_format_tool_result_none_content():
msgs = [ChatMessage(role="tool", content=None)]
result = _format_conversation_context(msgs)
assert result is not None
assert "Tool result: " in result
def test_format_full_conversation():
msgs = [
ChatMessage(role="user", content="find agents"),
ChatMessage(
role="assistant",
content="I'll search for agents.",
tool_calls=[
{"function": {"name": "find_agents", "arguments": '{"q": "test"}'}}
],
),
ChatMessage(role="tool", content='[{"id": "1", "name": "Agent1"}]'),
ChatMessage(role="assistant", content="Found Agent1."),
]
result = _format_conversation_context(msgs)
assert result is not None
assert "User: find agents" in result
assert "You responded: I'll search for agents." in result
assert "You called tool: find_agents" in result
assert "Tool result:" in result
assert "You responded: Found Agent1." in result
# ---------------------------------------------------------------------------
# _build_query_message
# ---------------------------------------------------------------------------
def _make_session(messages: list[ChatMessage]) -> ChatSession:
"""Build a minimal ChatSession with the given messages."""
now = datetime.now(UTC)
return ChatSession(
session_id="test-session",
user_id="user-1",
messages=messages,
title="test",
usage=[],
started_at=now,
updated_at=now,
)
@pytest.mark.asyncio
async def test_build_query_resume_up_to_date():
"""With --resume and transcript covers all messages, return raw message."""
session = _make_session(
[
ChatMessage(role="user", content="hello"),
ChatMessage(role="assistant", content="hi"),
ChatMessage(role="user", content="what's new?"),
]
)
result = await _build_query_message(
"what's new?",
session,
use_resume=True,
transcript_msg_count=2,
session_id="test-session",
)
# transcript_msg_count == msg_count - 1, so no gap
assert result == "what's new?"
@pytest.mark.asyncio
async def test_build_query_resume_stale_transcript():
"""With --resume and stale transcript, gap context is prepended."""
session = _make_session(
[
ChatMessage(role="user", content="turn 1"),
ChatMessage(role="assistant", content="reply 1"),
ChatMessage(role="user", content="turn 2"),
ChatMessage(role="assistant", content="reply 2"),
ChatMessage(role="user", content="turn 3"),
]
)
result = await _build_query_message(
"turn 3",
session,
use_resume=True,
transcript_msg_count=2,
session_id="test-session",
)
assert "<conversation_history>" in result
assert "turn 2" in result
assert "reply 2" in result
assert "Now, the user says:\nturn 3" in result
@pytest.mark.asyncio
async def test_build_query_resume_zero_msg_count():
"""With --resume but transcript_msg_count=0, return raw message."""
session = _make_session(
[
ChatMessage(role="user", content="hello"),
ChatMessage(role="assistant", content="hi"),
ChatMessage(role="user", content="new msg"),
]
)
result = await _build_query_message(
"new msg",
session,
use_resume=True,
transcript_msg_count=0,
session_id="test-session",
)
assert result == "new msg"
@pytest.mark.asyncio
async def test_build_query_no_resume_single_message():
"""Without --resume and only 1 message, return raw message."""
session = _make_session([ChatMessage(role="user", content="first")])
result = await _build_query_message(
"first",
session,
use_resume=False,
transcript_msg_count=0,
session_id="test-session",
)
assert result == "first"
@pytest.mark.asyncio
async def test_build_query_no_resume_multi_message(monkeypatch):
"""Without --resume and multiple messages, compress and prepend."""
session = _make_session(
[
ChatMessage(role="user", content="older question"),
ChatMessage(role="assistant", content="older answer"),
ChatMessage(role="user", content="new question"),
]
)
# Mock _compress_conversation_history to return the messages as-is
async def _mock_compress(sess):
return sess.messages[:-1]
monkeypatch.setattr(
"backend.copilot.sdk.service._compress_conversation_history",
_mock_compress,
)
result = await _build_query_message(
"new question",
session,
use_resume=False,
transcript_msg_count=0,
session_id="test-session",
)
assert "<conversation_history>" in result
assert "older question" in result
assert "older answer" in result
assert "Now, the user says:\nnew question" in result

View File

@@ -0,0 +1,371 @@
"""Response adapter for converting Claude Agent SDK messages to Vercel AI SDK format.
This module provides the adapter layer that converts streaming messages from
the Claude Agent SDK into the Vercel AI SDK UI Stream Protocol format that
the frontend expects.
"""
import json
import logging
import uuid
from claude_agent_sdk import (
AssistantMessage,
Message,
ResultMessage,
SystemMessage,
TextBlock,
ToolResultBlock,
ToolUseBlock,
UserMessage,
)
from backend.copilot.response_model import (
StreamBaseResponse,
StreamError,
StreamFinish,
StreamFinishStep,
StreamStart,
StreamStartStep,
StreamTextDelta,
StreamTextEnd,
StreamTextStart,
StreamToolInputAvailable,
StreamToolInputStart,
StreamToolOutputAvailable,
)
from .tool_adapter import MCP_TOOL_PREFIX, pop_pending_tool_output
logger = logging.getLogger(__name__)
class SDKResponseAdapter:
"""Adapter for converting Claude Agent SDK messages to Vercel AI SDK format.
This class maintains state during a streaming session to properly track
text blocks, tool calls, and message lifecycle.
"""
def __init__(self, message_id: str | None = None, session_id: str | None = None):
self.message_id = message_id or str(uuid.uuid4())
self.session_id = session_id
self.text_block_id = str(uuid.uuid4())
self.has_started_text = False
self.has_ended_text = False
self.current_tool_calls: dict[str, dict[str, str]] = {}
self.resolved_tool_calls: set[str] = set()
self.step_open = False
@property
def has_unresolved_tool_calls(self) -> bool:
"""True when there are tool calls that haven't received output yet."""
return bool(self.current_tool_calls.keys() - self.resolved_tool_calls)
def convert_message(self, sdk_message: Message) -> list[StreamBaseResponse]:
"""Convert a single SDK message to Vercel AI SDK format."""
responses: list[StreamBaseResponse] = []
if isinstance(sdk_message, SystemMessage):
if sdk_message.subtype == "init":
responses.append(
StreamStart(messageId=self.message_id, sessionId=self.session_id)
)
# Open the first step (matches non-SDK: StreamStart then StreamStartStep)
responses.append(StreamStartStep())
self.step_open = True
elif isinstance(sdk_message, AssistantMessage):
# Flush any SDK built-in tool calls that didn't get a UserMessage
# result (e.g. WebSearch, Read handled internally by the CLI).
# BUT skip flush when this AssistantMessage is a parallel tool
# continuation (contains only ToolUseBlocks) — the prior tools
# are still executing concurrently and haven't finished yet.
is_tool_only = all(isinstance(b, ToolUseBlock) for b in sdk_message.content)
if not is_tool_only:
self._flush_unresolved_tool_calls(responses)
# After tool results, the SDK sends a new AssistantMessage for the
# next LLM turn. Open a new step if the previous one was closed.
if not self.step_open:
responses.append(StreamStartStep())
self.step_open = True
for block in sdk_message.content:
if isinstance(block, TextBlock):
if block.text:
self._ensure_text_started(responses)
responses.append(
StreamTextDelta(id=self.text_block_id, delta=block.text)
)
elif isinstance(block, ToolUseBlock):
self._end_text_if_open(responses)
# Strip MCP prefix so frontend sees "find_block"
# instead of "mcp__copilot__find_block".
tool_name = block.name.removeprefix(MCP_TOOL_PREFIX)
responses.append(
StreamToolInputStart(toolCallId=block.id, toolName=tool_name)
)
responses.append(
StreamToolInputAvailable(
toolCallId=block.id,
toolName=tool_name,
input=block.input,
)
)
self.current_tool_calls[block.id] = {"name": tool_name}
elif isinstance(sdk_message, UserMessage):
# UserMessage carries tool results back from tool execution.
content = sdk_message.content
blocks = content if isinstance(content, list) else []
resolved_in_blocks: set[str] = set()
sid = (self.session_id or "?")[:12]
parent_id_preview = getattr(sdk_message, "parent_tool_use_id", None)
logger.info(
"[SDK] [%s] UserMessage: %d blocks, content_type=%s, "
"parent_tool_use_id=%s",
sid,
len(blocks),
type(content).__name__,
parent_id_preview[:12] if parent_id_preview else "None",
)
for block in blocks:
if isinstance(block, ToolResultBlock) and block.tool_use_id:
# Skip if already resolved (e.g. by flush) — the real
# result supersedes the empty flush, but re-emitting
# would confuse the frontend's state machine.
if block.tool_use_id in self.resolved_tool_calls:
continue
tool_info = self.current_tool_calls.get(block.tool_use_id, {})
tool_name = tool_info.get("name", "unknown")
# Prefer the stashed full output over the SDK's
# (potentially truncated) ToolResultBlock content.
# The SDK truncates large results, writing them to disk,
# which breaks frontend widget parsing.
output = pop_pending_tool_output(tool_name) or (
_extract_tool_output(block.content)
)
responses.append(
StreamToolOutputAvailable(
toolCallId=block.tool_use_id,
toolName=tool_name,
output=output,
success=not (block.is_error or False),
)
)
resolved_in_blocks.add(block.tool_use_id)
# Handle SDK built-in tool results carried via parent_tool_use_id
# instead of (or in addition to) ToolResultBlock content.
parent_id = sdk_message.parent_tool_use_id
if (
parent_id
and parent_id not in resolved_in_blocks
and parent_id not in self.resolved_tool_calls
):
tool_info = self.current_tool_calls.get(parent_id, {})
tool_name = tool_info.get("name", "unknown")
# Try stashed output first (from PostToolUse hook),
# then tool_use_result dict, then string content.
output = pop_pending_tool_output(tool_name)
if not output:
tur = sdk_message.tool_use_result
if tur is not None:
output = _extract_tool_use_result(tur)
if not output and isinstance(content, str) and content.strip():
output = content.strip()
if output:
responses.append(
StreamToolOutputAvailable(
toolCallId=parent_id,
toolName=tool_name,
output=output,
success=True,
)
)
resolved_in_blocks.add(parent_id)
self.resolved_tool_calls.update(resolved_in_blocks)
# Close the current step after tool results — the next
# AssistantMessage will open a new step for the continuation.
if self.step_open:
responses.append(StreamFinishStep())
self.step_open = False
elif isinstance(sdk_message, ResultMessage):
self._flush_unresolved_tool_calls(responses)
self._end_text_if_open(responses)
# Close the step before finishing.
if self.step_open:
responses.append(StreamFinishStep())
self.step_open = False
if sdk_message.subtype == "success":
responses.append(StreamFinish())
elif sdk_message.subtype in ("error", "error_during_execution"):
error_msg = sdk_message.result or "Unknown error"
responses.append(
StreamError(errorText=str(error_msg), code="sdk_error")
)
responses.append(StreamFinish())
else:
logger.warning(
f"Unexpected ResultMessage subtype: {sdk_message.subtype}"
)
responses.append(StreamFinish())
else:
logger.debug(f"Unhandled SDK message type: {type(sdk_message).__name__}")
return responses
def _ensure_text_started(self, responses: list[StreamBaseResponse]) -> None:
"""Start (or restart) a text block if needed."""
if not self.has_started_text or self.has_ended_text:
if self.has_ended_text:
self.text_block_id = str(uuid.uuid4())
self.has_ended_text = False
responses.append(StreamTextStart(id=self.text_block_id))
self.has_started_text = True
def _end_text_if_open(self, responses: list[StreamBaseResponse]) -> None:
"""End the current text block if one is open."""
if self.has_started_text and not self.has_ended_text:
responses.append(StreamTextEnd(id=self.text_block_id))
self.has_ended_text = True
def _flush_unresolved_tool_calls(self, responses: list[StreamBaseResponse]) -> None:
"""Emit outputs for tool calls that didn't receive a UserMessage result.
SDK built-in tools (WebSearch, Read, etc.) may be executed by the CLI
internally without surfacing a separate ``UserMessage`` with
``ToolResultBlock`` content. The ``PostToolUse`` hook stashes their
output, which we pop and emit here before the next ``AssistantMessage``
starts.
"""
unresolved = [
(tid, info.get("name", "unknown"))
for tid, info in self.current_tool_calls.items()
if tid not in self.resolved_tool_calls
]
sid = (self.session_id or "?")[:12]
if not unresolved:
logger.info(
"[SDK] [%s] Flush called but all %d tool(s) already resolved",
sid,
len(self.current_tool_calls),
)
return
logger.info(
"[SDK] [%s] Flushing %d unresolved tool call(s): %s",
sid,
len(unresolved),
", ".join(f"{name}({tid[:12]})" for tid, name in unresolved),
)
flushed = False
for tool_id, tool_name in unresolved:
output = pop_pending_tool_output(tool_name)
if output is not None:
responses.append(
StreamToolOutputAvailable(
toolCallId=tool_id,
toolName=tool_name,
output=output,
success=True,
)
)
self.resolved_tool_calls.add(tool_id)
flushed = True
logger.info(
"[SDK] [%s] Flushed stashed output for %s " "(call %s, %d chars)",
sid,
tool_name,
tool_id[:12],
len(output),
)
else:
# No output available — emit an empty output so the frontend
# transitions the tool from input-available to output-available
# (stops the spinner).
responses.append(
StreamToolOutputAvailable(
toolCallId=tool_id,
toolName=tool_name,
output="",
success=True,
)
)
self.resolved_tool_calls.add(tool_id)
flushed = True
logger.warning(
"[SDK] [%s] Flushed EMPTY output for unresolved tool %s "
"(call %s) — stash was empty (likely SDK hook race "
"condition: PostToolUse hook hadn't completed before "
"flush was triggered)",
sid,
tool_name,
tool_id[:12],
)
if flushed and self.step_open:
responses.append(StreamFinishStep())
self.step_open = False
def _extract_tool_output(content: str | list[dict[str, str]] | None) -> str:
"""Extract a string output from a ToolResultBlock's content field."""
if isinstance(content, str):
return content
if isinstance(content, list):
parts = [item.get("text", "") for item in content if item.get("type") == "text"]
if parts:
return "".join(parts)
try:
return json.dumps(content)
except (TypeError, ValueError):
return str(content)
if content is None:
return ""
try:
return json.dumps(content)
except (TypeError, ValueError):
return str(content)
def _extract_tool_use_result(result: object) -> str:
"""Extract a string from a UserMessage's ``tool_use_result`` dict.
SDK built-in tools may store their result in ``tool_use_result``
instead of (or in addition to) ``ToolResultBlock`` content blocks.
"""
if isinstance(result, str):
return result
if isinstance(result, dict):
# Try common result keys
for key in ("content", "text", "output", "stdout", "result"):
val = result.get(key)
if isinstance(val, str) and val:
return val
# Fall back to JSON serialization of the whole dict
try:
return json.dumps(result)
except (TypeError, ValueError):
return str(result)
if result is None:
return ""
try:
return json.dumps(result)
except (TypeError, ValueError):
return str(result)

View File

@@ -0,0 +1,678 @@
"""Unit tests for the SDK response adapter."""
import asyncio
import pytest
from claude_agent_sdk import (
AssistantMessage,
ResultMessage,
SystemMessage,
TextBlock,
ToolResultBlock,
ToolUseBlock,
UserMessage,
)
from backend.copilot.response_model import (
StreamBaseResponse,
StreamError,
StreamFinish,
StreamFinishStep,
StreamStart,
StreamStartStep,
StreamTextDelta,
StreamTextEnd,
StreamTextStart,
StreamToolInputAvailable,
StreamToolInputStart,
StreamToolOutputAvailable,
)
from .response_adapter import SDKResponseAdapter
from .tool_adapter import MCP_TOOL_PREFIX
from .tool_adapter import _pending_tool_outputs as _pto
from .tool_adapter import _stash_event
from .tool_adapter import stash_pending_tool_output as _stash
from .tool_adapter import wait_for_stash
def _adapter() -> SDKResponseAdapter:
return SDKResponseAdapter(message_id="msg-1", session_id="session-1")
# -- SystemMessage -----------------------------------------------------------
def test_system_init_emits_start_and_step():
adapter = _adapter()
results = adapter.convert_message(SystemMessage(subtype="init", data={}))
assert len(results) == 2
assert isinstance(results[0], StreamStart)
assert results[0].messageId == "msg-1"
assert results[0].sessionId == "session-1"
assert isinstance(results[1], StreamStartStep)
def test_system_non_init_emits_nothing():
adapter = _adapter()
results = adapter.convert_message(SystemMessage(subtype="other", data={}))
assert results == []
# -- AssistantMessage with TextBlock -----------------------------------------
def test_text_block_emits_step_start_and_delta():
adapter = _adapter()
msg = AssistantMessage(content=[TextBlock(text="hello")], model="test")
results = adapter.convert_message(msg)
assert len(results) == 3
assert isinstance(results[0], StreamStartStep)
assert isinstance(results[1], StreamTextStart)
assert isinstance(results[2], StreamTextDelta)
assert results[2].delta == "hello"
def test_empty_text_block_emits_only_step():
adapter = _adapter()
msg = AssistantMessage(content=[TextBlock(text="")], model="test")
results = adapter.convert_message(msg)
# Empty text skipped, but step still opens
assert len(results) == 1
assert isinstance(results[0], StreamStartStep)
def test_multiple_text_deltas_reuse_block_id():
adapter = _adapter()
msg1 = AssistantMessage(content=[TextBlock(text="a")], model="test")
msg2 = AssistantMessage(content=[TextBlock(text="b")], model="test")
r1 = adapter.convert_message(msg1)
r2 = adapter.convert_message(msg2)
# First gets step+start+delta, second only delta (block & step already started)
assert len(r1) == 3
assert isinstance(r1[0], StreamStartStep)
assert isinstance(r1[1], StreamTextStart)
assert len(r2) == 1
assert isinstance(r2[0], StreamTextDelta)
assert r1[1].id == r2[0].id # same block ID
# -- AssistantMessage with ToolUseBlock --------------------------------------
def test_tool_use_emits_input_start_and_available():
"""Tool names arrive with MCP prefix and should be stripped for the frontend."""
adapter = _adapter()
msg = AssistantMessage(
content=[
ToolUseBlock(
id="tool-1",
name=f"{MCP_TOOL_PREFIX}find_agent",
input={"q": "x"},
)
],
model="test",
)
results = adapter.convert_message(msg)
assert len(results) == 3
assert isinstance(results[0], StreamStartStep)
assert isinstance(results[1], StreamToolInputStart)
assert results[1].toolCallId == "tool-1"
assert results[1].toolName == "find_agent" # prefix stripped
assert isinstance(results[2], StreamToolInputAvailable)
assert results[2].toolName == "find_agent" # prefix stripped
assert results[2].input == {"q": "x"}
def test_text_then_tool_ends_text_block():
adapter = _adapter()
text_msg = AssistantMessage(content=[TextBlock(text="thinking...")], model="test")
tool_msg = AssistantMessage(
content=[ToolUseBlock(id="t1", name=f"{MCP_TOOL_PREFIX}tool", input={})],
model="test",
)
adapter.convert_message(text_msg) # opens step + text
results = adapter.convert_message(tool_msg)
# Step already open, so: TextEnd, ToolInputStart, ToolInputAvailable
assert len(results) == 3
assert isinstance(results[0], StreamTextEnd)
assert isinstance(results[1], StreamToolInputStart)
# -- UserMessage with ToolResultBlock ----------------------------------------
def test_tool_result_emits_output_and_finish_step():
adapter = _adapter()
# First register the tool call (opens step) — SDK sends prefixed name
tool_msg = AssistantMessage(
content=[ToolUseBlock(id="t1", name=f"{MCP_TOOL_PREFIX}find_agent", input={})],
model="test",
)
adapter.convert_message(tool_msg)
# Now send tool result
result_msg = UserMessage(
content=[ToolResultBlock(tool_use_id="t1", content="found 3 agents")]
)
results = adapter.convert_message(result_msg)
assert len(results) == 2
assert isinstance(results[0], StreamToolOutputAvailable)
assert results[0].toolCallId == "t1"
assert results[0].toolName == "find_agent" # prefix stripped
assert results[0].output == "found 3 agents"
assert results[0].success is True
assert isinstance(results[1], StreamFinishStep)
def test_tool_result_error():
adapter = _adapter()
adapter.convert_message(
AssistantMessage(
content=[
ToolUseBlock(id="t1", name=f"{MCP_TOOL_PREFIX}run_agent", input={})
],
model="test",
)
)
result_msg = UserMessage(
content=[ToolResultBlock(tool_use_id="t1", content="timeout", is_error=True)]
)
results = adapter.convert_message(result_msg)
assert isinstance(results[0], StreamToolOutputAvailable)
assert results[0].success is False
assert isinstance(results[1], StreamFinishStep)
def test_tool_result_list_content():
adapter = _adapter()
adapter.convert_message(
AssistantMessage(
content=[ToolUseBlock(id="t1", name=f"{MCP_TOOL_PREFIX}tool", input={})],
model="test",
)
)
result_msg = UserMessage(
content=[
ToolResultBlock(
tool_use_id="t1",
content=[
{"type": "text", "text": "line1"},
{"type": "text", "text": "line2"},
],
)
]
)
results = adapter.convert_message(result_msg)
assert isinstance(results[0], StreamToolOutputAvailable)
assert results[0].output == "line1line2"
assert isinstance(results[1], StreamFinishStep)
def test_string_user_message_ignored():
"""A plain string UserMessage (not tool results) produces no output."""
adapter = _adapter()
results = adapter.convert_message(UserMessage(content="hello"))
assert results == []
# -- ResultMessage -----------------------------------------------------------
def test_result_success_emits_finish_step_and_finish():
adapter = _adapter()
# Start some text first (opens step)
adapter.convert_message(
AssistantMessage(content=[TextBlock(text="done")], model="test")
)
msg = ResultMessage(
subtype="success",
duration_ms=100,
duration_api_ms=50,
is_error=False,
num_turns=1,
session_id="s1",
)
results = adapter.convert_message(msg)
# TextEnd + FinishStep + StreamFinish
assert len(results) == 3
assert isinstance(results[0], StreamTextEnd)
assert isinstance(results[1], StreamFinishStep)
assert isinstance(results[2], StreamFinish)
def test_result_error_emits_error_and_finish():
adapter = _adapter()
msg = ResultMessage(
subtype="error",
duration_ms=100,
duration_api_ms=50,
is_error=True,
num_turns=0,
session_id="s1",
result="API rate limited",
)
results = adapter.convert_message(msg)
# No step was open, so no FinishStep — just Error + Finish
assert len(results) == 2
assert isinstance(results[0], StreamError)
assert "API rate limited" in results[0].errorText
assert isinstance(results[1], StreamFinish)
# -- Text after tools (new block ID) ----------------------------------------
def test_text_after_tool_gets_new_block_id():
adapter = _adapter()
# Text -> Tool -> ToolResult -> Text should get a new text block ID and step
adapter.convert_message(
AssistantMessage(content=[TextBlock(text="before")], model="test")
)
adapter.convert_message(
AssistantMessage(
content=[ToolUseBlock(id="t1", name=f"{MCP_TOOL_PREFIX}tool", input={})],
model="test",
)
)
# Send tool result (closes step)
adapter.convert_message(
UserMessage(content=[ToolResultBlock(tool_use_id="t1", content="ok")])
)
results = adapter.convert_message(
AssistantMessage(content=[TextBlock(text="after")], model="test")
)
# Should get StreamStartStep (new step) + StreamTextStart (new block) + StreamTextDelta
assert len(results) == 3
assert isinstance(results[0], StreamStartStep)
assert isinstance(results[1], StreamTextStart)
assert isinstance(results[2], StreamTextDelta)
assert results[2].delta == "after"
# -- Full conversation flow --------------------------------------------------
def test_full_conversation_flow():
"""Simulate a complete conversation: init -> text -> tool -> result -> text -> finish."""
adapter = _adapter()
all_responses: list[StreamBaseResponse] = []
# 1. Init
all_responses.extend(
adapter.convert_message(SystemMessage(subtype="init", data={}))
)
# 2. Assistant text
all_responses.extend(
adapter.convert_message(
AssistantMessage(content=[TextBlock(text="Let me search")], model="test")
)
)
# 3. Tool use
all_responses.extend(
adapter.convert_message(
AssistantMessage(
content=[
ToolUseBlock(
id="t1",
name=f"{MCP_TOOL_PREFIX}find_agent",
input={"query": "email"},
)
],
model="test",
)
)
)
# 4. Tool result
all_responses.extend(
adapter.convert_message(
UserMessage(
content=[ToolResultBlock(tool_use_id="t1", content="Found 2 agents")]
)
)
)
# 5. More text
all_responses.extend(
adapter.convert_message(
AssistantMessage(content=[TextBlock(text="I found 2")], model="test")
)
)
# 6. Result
all_responses.extend(
adapter.convert_message(
ResultMessage(
subtype="success",
duration_ms=500,
duration_api_ms=400,
is_error=False,
num_turns=2,
session_id="s1",
)
)
)
types = [type(r).__name__ for r in all_responses]
assert types == [
"StreamStart",
"StreamStartStep", # step 1: text + tool call
"StreamTextStart",
"StreamTextDelta", # "Let me search"
"StreamTextEnd", # closed before tool
"StreamToolInputStart",
"StreamToolInputAvailable",
"StreamToolOutputAvailable", # tool result
"StreamFinishStep", # step 1 closed after tool result
"StreamStartStep", # step 2: continuation text
"StreamTextStart", # new block after tool
"StreamTextDelta", # "I found 2"
"StreamTextEnd", # closed by result
"StreamFinishStep", # step 2 closed
"StreamFinish",
]
# -- Flush unresolved tool calls --------------------------------------------
def test_flush_unresolved_at_result_message():
"""Built-in tools (WebSearch) without UserMessage results get flushed at ResultMessage."""
adapter = _adapter()
all_responses: list[StreamBaseResponse] = []
# 1. Init
all_responses.extend(
adapter.convert_message(SystemMessage(subtype="init", data={}))
)
# 2. Tool use (built-in tool — no MCP prefix)
all_responses.extend(
adapter.convert_message(
AssistantMessage(
content=[
ToolUseBlock(id="ws-1", name="WebSearch", input={"query": "test"})
],
model="test",
)
)
)
# 3. No UserMessage for this tool — go straight to ResultMessage
all_responses.extend(
adapter.convert_message(
ResultMessage(
subtype="success",
duration_ms=100,
duration_api_ms=50,
is_error=False,
num_turns=1,
session_id="s1",
)
)
)
types = [type(r).__name__ for r in all_responses]
assert types == [
"StreamStart",
"StreamStartStep",
"StreamToolInputStart",
"StreamToolInputAvailable",
"StreamToolOutputAvailable", # flushed with empty output
"StreamFinishStep", # step closed by flush
"StreamFinish",
]
# The flushed output should be empty (no stash available)
output_event = [
r for r in all_responses if isinstance(r, StreamToolOutputAvailable)
][0]
assert output_event.toolCallId == "ws-1"
assert output_event.toolName == "WebSearch"
assert output_event.output == ""
def test_flush_unresolved_at_next_assistant_message():
"""Built-in tools get flushed when the next AssistantMessage arrives."""
adapter = _adapter()
all_responses: list[StreamBaseResponse] = []
# 1. Init
all_responses.extend(
adapter.convert_message(SystemMessage(subtype="init", data={}))
)
# 2. Tool use (built-in — no UserMessage will come)
all_responses.extend(
adapter.convert_message(
AssistantMessage(
content=[
ToolUseBlock(id="ws-1", name="WebSearch", input={"query": "test"})
],
model="test",
)
)
)
# 3. Next AssistantMessage triggers flush before processing its blocks
all_responses.extend(
adapter.convert_message(
AssistantMessage(
content=[TextBlock(text="Here are the results")], model="test"
)
)
)
types = [type(r).__name__ for r in all_responses]
assert types == [
"StreamStart",
"StreamStartStep",
"StreamToolInputStart",
"StreamToolInputAvailable",
# Flush at next AssistantMessage:
"StreamToolOutputAvailable",
"StreamFinishStep", # step closed by flush
# New step for continuation text:
"StreamStartStep",
"StreamTextStart",
"StreamTextDelta",
]
def test_flush_with_stashed_output():
"""Stashed output from PostToolUse hook is used when flushing."""
adapter = _adapter()
# Simulate PostToolUse hook stashing output
_pto.set({})
_stash("WebSearch", "Search result: 5 items found")
all_responses: list[StreamBaseResponse] = []
# Tool use
all_responses.extend(
adapter.convert_message(
AssistantMessage(
content=[
ToolUseBlock(id="ws-1", name="WebSearch", input={"query": "test"})
],
model="test",
)
)
)
# ResultMessage triggers flush
all_responses.extend(
adapter.convert_message(
ResultMessage(
subtype="success",
duration_ms=100,
duration_api_ms=50,
is_error=False,
num_turns=1,
session_id="s1",
)
)
)
output_events = [
r for r in all_responses if isinstance(r, StreamToolOutputAvailable)
]
assert len(output_events) == 1
assert output_events[0].output == "Search result: 5 items found"
# Cleanup
_pto.set({}) # type: ignore[arg-type]
# -- wait_for_stash synchronisation tests --
@pytest.mark.asyncio
async def test_wait_for_stash_signaled():
"""wait_for_stash returns True when stash_pending_tool_output signals."""
_pto.set({})
event = asyncio.Event()
_stash_event.set(event)
# Simulate a PostToolUse hook that stashes output after a short delay
async def delayed_stash():
await asyncio.sleep(0.01)
_stash("WebSearch", "result data")
asyncio.create_task(delayed_stash())
result = await wait_for_stash(timeout=1.0)
assert result is True
assert _pto.get({}).get("WebSearch") == ["result data"]
# Cleanup
_pto.set({}) # type: ignore[arg-type]
_stash_event.set(None)
@pytest.mark.asyncio
async def test_wait_for_stash_timeout():
"""wait_for_stash returns False on timeout when no stash occurs."""
_pto.set({})
event = asyncio.Event()
_stash_event.set(event)
result = await wait_for_stash(timeout=0.05)
assert result is False
# Cleanup
_pto.set({}) # type: ignore[arg-type]
_stash_event.set(None)
@pytest.mark.asyncio
async def test_wait_for_stash_already_stashed():
"""wait_for_stash picks up a stash that happened just before the wait."""
_pto.set({})
event = asyncio.Event()
_stash_event.set(event)
# Stash before waiting — simulates hook completing before message arrives
_stash("Read", "file contents")
# Event is now set; wait_for_stash detects the fast path and returns
# immediately without timing out.
result = await wait_for_stash(timeout=0.05)
assert result is True
# But the stash itself is populated
assert _pto.get({}).get("Read") == ["file contents"]
# Cleanup
_pto.set({}) # type: ignore[arg-type]
_stash_event.set(None)
# -- Parallel tool call tests --
def test_parallel_tool_calls_not_flushed_prematurely():
"""Parallel tool calls should NOT be flushed when the next AssistantMessage
only contains ToolUseBlocks (parallel continuation)."""
adapter = SDKResponseAdapter()
# Init
adapter.convert_message(SystemMessage(subtype="init", data={}))
# First AssistantMessage: tool call #1
msg1 = AssistantMessage(
content=[ToolUseBlock(id="t1", name="WebSearch", input={"q": "foo"})],
model="test",
)
r1 = adapter.convert_message(msg1)
assert any(isinstance(r, StreamToolInputAvailable) for r in r1)
assert adapter.has_unresolved_tool_calls
# Second AssistantMessage: tool call #2 (parallel continuation)
msg2 = AssistantMessage(
content=[ToolUseBlock(id="t2", name="WebSearch", input={"q": "bar"})],
model="test",
)
r2 = adapter.convert_message(msg2)
# No flush should have happened — t1 should NOT have StreamToolOutputAvailable
output_events = [r for r in r2 if isinstance(r, StreamToolOutputAvailable)]
assert len(output_events) == 0, (
f"Tool-only AssistantMessage should not flush prior tools, "
f"but got {len(output_events)} output events"
)
# Both t1 and t2 should still be unresolved
assert "t1" not in adapter.resolved_tool_calls
assert "t2" not in adapter.resolved_tool_calls
def test_text_assistant_message_flushes_prior_tools():
"""An AssistantMessage with text (new turn) should flush unresolved tools."""
adapter = SDKResponseAdapter()
# Init
adapter.convert_message(SystemMessage(subtype="init", data={}))
# Tool call
msg1 = AssistantMessage(
content=[ToolUseBlock(id="t1", name="WebSearch", input={"q": "foo"})],
model="test",
)
adapter.convert_message(msg1)
assert adapter.has_unresolved_tool_calls
# Text AssistantMessage (new turn after tools completed)
msg2 = AssistantMessage(
content=[TextBlock(text="Here are the results")],
model="test",
)
r2 = adapter.convert_message(msg2)
# Flush SHOULD have happened — t1 gets empty output
output_events = [r for r in r2 if isinstance(r, StreamToolOutputAvailable)]
assert len(output_events) == 1
assert output_events[0].toolCallId == "t1"
assert "t1" in adapter.resolved_tool_calls
def test_already_resolved_tool_skipped_in_user_message():
"""A tool result in UserMessage should be skipped if already resolved by flush."""
adapter = SDKResponseAdapter()
adapter.convert_message(SystemMessage(subtype="init", data={}))
# Tool call + flush via text message
adapter.convert_message(
AssistantMessage(
content=[ToolUseBlock(id="t1", name="WebSearch", input={})],
model="test",
)
)
adapter.convert_message(
AssistantMessage(
content=[TextBlock(text="Done")],
model="test",
)
)
assert "t1" in adapter.resolved_tool_calls
# Now UserMessage arrives with the real result — should be skipped
user_msg = UserMessage(content=[ToolResultBlock(tool_use_id="t1", content="real")])
r = adapter.convert_message(user_msg)
output_events = [r_ for r_ in r if isinstance(r_, StreamToolOutputAvailable)]
assert (
len(output_events) == 0
), "Already-resolved tool should not emit duplicate output"

View File

@@ -0,0 +1,194 @@
"""SDK compatibility tests — verify the claude-agent-sdk public API surface we depend on.
Instead of pinning to a narrow version range, these tests verify that the
installed SDK exposes every class, function, attribute, and method the copilot
integration relies on. If an SDK upgrade removes or renames something these
tests will catch it immediately.
"""
import inspect
import pytest
# ---------------------------------------------------------------------------
# Public types & factories
# ---------------------------------------------------------------------------
def test_sdk_exports_client_and_options():
from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient
assert inspect.isclass(ClaudeSDKClient)
assert inspect.isclass(ClaudeAgentOptions)
def test_sdk_exports_message_types():
from claude_agent_sdk import (
AssistantMessage,
Message,
ResultMessage,
SystemMessage,
UserMessage,
)
for cls in (AssistantMessage, ResultMessage, SystemMessage, UserMessage):
assert inspect.isclass(cls), f"{cls.__name__} is not a class"
# Message is a Union type alias, just verify it's importable
assert Message is not None
def test_sdk_exports_content_block_types():
from claude_agent_sdk import TextBlock, ToolResultBlock, ToolUseBlock
for cls in (TextBlock, ToolResultBlock, ToolUseBlock):
assert inspect.isclass(cls), f"{cls.__name__} is not a class"
def test_sdk_exports_mcp_helpers():
from claude_agent_sdk import create_sdk_mcp_server, tool
assert callable(create_sdk_mcp_server)
assert callable(tool)
# ---------------------------------------------------------------------------
# ClaudeSDKClient interface
# ---------------------------------------------------------------------------
def test_client_has_required_methods():
from claude_agent_sdk import ClaudeSDKClient
required = ["connect", "disconnect", "query", "receive_messages"]
for name in required:
attr = getattr(ClaudeSDKClient, name, None)
assert attr is not None, f"ClaudeSDKClient.{name} missing"
assert callable(attr), f"ClaudeSDKClient.{name} is not callable"
def test_client_supports_async_context_manager():
from claude_agent_sdk import ClaudeSDKClient
assert hasattr(ClaudeSDKClient, "__aenter__")
assert hasattr(ClaudeSDKClient, "__aexit__")
# ---------------------------------------------------------------------------
# ClaudeAgentOptions fields
# ---------------------------------------------------------------------------
def test_agent_options_accepts_required_fields():
"""Verify ClaudeAgentOptions accepts all kwargs our code passes."""
from claude_agent_sdk import ClaudeAgentOptions
opts = ClaudeAgentOptions(
system_prompt="test",
cwd="/tmp",
)
assert opts.system_prompt == "test"
assert opts.cwd == "/tmp"
def test_agent_options_accepts_all_our_fields():
"""Comprehensive check of every field we use in service.py."""
from claude_agent_sdk import ClaudeAgentOptions
fields_we_use = [
"system_prompt",
"mcp_servers",
"allowed_tools",
"disallowed_tools",
"hooks",
"cwd",
"model",
"env",
"resume",
"max_buffer_size",
]
sig = inspect.signature(ClaudeAgentOptions)
for field in fields_we_use:
assert field in sig.parameters, (
f"ClaudeAgentOptions no longer accepts '{field}'"
f"available params: {list(sig.parameters.keys())}"
)
# ---------------------------------------------------------------------------
# Message attributes
# ---------------------------------------------------------------------------
def test_assistant_message_has_content_and_model():
from claude_agent_sdk import AssistantMessage, TextBlock
msg = AssistantMessage(content=[TextBlock(text="hi")], model="test")
assert hasattr(msg, "content")
assert hasattr(msg, "model")
def test_result_message_has_required_attrs():
from claude_agent_sdk import ResultMessage
msg = ResultMessage(
subtype="success",
duration_ms=100,
duration_api_ms=50,
is_error=False,
num_turns=1,
session_id="s1",
)
assert msg.subtype == "success"
assert hasattr(msg, "result")
def test_system_message_has_subtype_and_data():
from claude_agent_sdk import SystemMessage
msg = SystemMessage(subtype="init", data={})
assert msg.subtype == "init"
assert msg.data == {}
def test_user_message_has_parent_tool_use_id():
from claude_agent_sdk import UserMessage
msg = UserMessage(content="test")
assert hasattr(msg, "parent_tool_use_id")
assert hasattr(msg, "tool_use_result")
def test_tool_use_block_has_id_name_input():
from claude_agent_sdk import ToolUseBlock
block = ToolUseBlock(id="t1", name="test", input={"key": "val"})
assert block.id == "t1"
assert block.name == "test"
assert block.input == {"key": "val"}
def test_tool_result_block_has_required_attrs():
from claude_agent_sdk import ToolResultBlock
block = ToolResultBlock(tool_use_id="t1", content="result")
assert block.tool_use_id == "t1"
assert block.content == "result"
assert hasattr(block, "is_error")
# ---------------------------------------------------------------------------
# Hook types
# ---------------------------------------------------------------------------
@pytest.mark.parametrize(
"hook_event",
["PreToolUse", "PostToolUse", "Stop"],
)
def test_sdk_exports_hook_event_type(hook_event: str):
"""Verify HookEvent literal includes the events our security_hooks use."""
from claude_agent_sdk.types import HookEvent
# HookEvent is a Literal type — check that our events are valid values.
# We can't easily inspect Literal at runtime, so just verify the type exists.
assert HookEvent is not None

View File

@@ -0,0 +1,351 @@
"""Security hooks for Claude Agent SDK integration.
This module provides security hooks that validate tool calls before execution,
ensuring multi-user isolation and preventing unauthorized operations.
"""
import json
import logging
import os
import re
from collections.abc import Callable
from typing import Any, cast
from .tool_adapter import (
BLOCKED_TOOLS,
DANGEROUS_PATTERNS,
MCP_TOOL_PREFIX,
WORKSPACE_SCOPED_TOOLS,
stash_pending_tool_output,
)
logger = logging.getLogger(__name__)
def _deny(reason: str) -> dict[str, Any]:
"""Return a hook denial response."""
return {
"hookSpecificOutput": {
"hookEventName": "PreToolUse",
"permissionDecision": "deny",
"permissionDecisionReason": reason,
}
}
def _validate_workspace_path(
tool_name: str, tool_input: dict[str, Any], sdk_cwd: str | None
) -> dict[str, Any]:
"""Validate that a workspace-scoped tool only accesses allowed paths.
Allowed directories:
- The SDK working directory (``/tmp/copilot-<session>/``)
- The SDK tool-results directory (``~/.claude/projects/…/tool-results/``)
"""
path = tool_input.get("file_path") or tool_input.get("path") or ""
if not path:
# Glob/Grep without a path default to cwd which is already sandboxed
return {}
# Resolve relative paths against sdk_cwd (the SDK sets cwd so the LLM
# naturally uses relative paths like "test.txt" instead of absolute ones).
# Tilde paths (~/) are home-dir references, not relative — expand first.
if path.startswith("~"):
resolved = os.path.realpath(os.path.expanduser(path))
elif not os.path.isabs(path) and sdk_cwd:
resolved = os.path.realpath(os.path.join(sdk_cwd, path))
else:
resolved = os.path.realpath(path)
# Allow access within the SDK working directory
if sdk_cwd:
norm_cwd = os.path.realpath(sdk_cwd)
if resolved.startswith(norm_cwd + os.sep) or resolved == norm_cwd:
return {}
# Allow access to ~/.claude/projects/*/tool-results/ (big tool results)
claude_dir = os.path.realpath(os.path.expanduser("~/.claude/projects"))
tool_results_seg = os.sep + "tool-results" + os.sep
if resolved.startswith(claude_dir + os.sep) and tool_results_seg in resolved:
return {}
logger.warning(
f"Blocked {tool_name} outside workspace: {path} (resolved={resolved})"
)
workspace_hint = f" Allowed workspace: {sdk_cwd}" if sdk_cwd else ""
return _deny(
f"[SECURITY] Tool '{tool_name}' can only access files within the workspace "
f"directory.{workspace_hint} "
"This is enforced by the platform and cannot be bypassed."
)
def _validate_tool_access(
tool_name: str, tool_input: dict[str, Any], sdk_cwd: str | None = None
) -> dict[str, Any]:
"""Validate that a tool call is allowed.
Returns:
Empty dict to allow, or dict with hookSpecificOutput to deny
"""
# Block forbidden tools
if tool_name in BLOCKED_TOOLS:
logger.warning(f"Blocked tool access attempt: {tool_name}")
return _deny(
f"[SECURITY] Tool '{tool_name}' is blocked for security. "
"This is enforced by the platform and cannot be bypassed. "
"Use the CoPilot-specific MCP tools instead."
)
# Workspace-scoped tools: allowed only within the SDK workspace directory
if tool_name in WORKSPACE_SCOPED_TOOLS:
return _validate_workspace_path(tool_name, tool_input, sdk_cwd)
# Check for dangerous patterns in tool input
# Use json.dumps for predictable format (str() produces Python repr)
input_str = json.dumps(tool_input) if tool_input else ""
for pattern in DANGEROUS_PATTERNS:
if re.search(pattern, input_str, re.IGNORECASE):
logger.warning(
f"Blocked dangerous pattern in tool input: {pattern} in {tool_name}"
)
return _deny(
"[SECURITY] Input contains a blocked pattern. "
"This is enforced by the platform and cannot be bypassed."
)
return {}
def _validate_user_isolation(
tool_name: str, tool_input: dict[str, Any], user_id: str | None
) -> dict[str, Any]:
"""Validate that tool calls respect user isolation."""
# For workspace file tools, ensure path doesn't escape
if "workspace" in tool_name.lower():
# The "path" param is a cloud storage key (e.g. "/ASEAN/report.md")
# where a leading "/" is normal. Only check for ".." traversal.
# Filesystem paths (source_path, save_to_path) are validated inside
# the tool itself via _validate_ephemeral_path.
path = tool_input.get("path", "") or tool_input.get("file_path", "")
if path and ".." in path:
logger.warning(f"Blocked path traversal attempt: {path} by user {user_id}")
return {
"hookSpecificOutput": {
"hookEventName": "PreToolUse",
"permissionDecision": "deny",
"permissionDecisionReason": "Path traversal not allowed",
}
}
return {}
def create_security_hooks(
user_id: str | None,
sdk_cwd: str | None = None,
max_subtasks: int = 3,
on_stop: Callable[[str, str], None] | None = None,
) -> dict[str, Any]:
"""Create the security hooks configuration for Claude Agent SDK.
Includes security validation and observability hooks:
- PreToolUse: Security validation before tool execution
- PostToolUse: Log successful tool executions
- PostToolUseFailure: Log and handle failed tool executions
- PreCompact: Log context compaction events (SDK handles compaction automatically)
- Stop: Capture transcript path for stateless resume (when *on_stop* is provided)
Args:
user_id: Current user ID for isolation validation
sdk_cwd: SDK working directory for workspace-scoped tool validation
max_subtasks: Maximum Task (sub-agent) spawns allowed per session
on_stop: Callback ``(transcript_path, sdk_session_id)`` invoked when
the SDK finishes processing — used to read the JSONL transcript
before the CLI process exits.
Returns:
Hooks configuration dict for ClaudeAgentOptions
"""
try:
from claude_agent_sdk import HookMatcher
from claude_agent_sdk.types import HookContext, HookInput, SyncHookJSONOutput
# Per-session counter for Task sub-agent spawns
task_spawn_count = 0
async def pre_tool_use_hook(
input_data: HookInput,
tool_use_id: str | None,
context: HookContext,
) -> SyncHookJSONOutput:
"""Combined pre-tool-use validation hook."""
nonlocal task_spawn_count
_ = context # unused but required by signature
tool_name = cast(str, input_data.get("tool_name", ""))
tool_input = cast(dict[str, Any], input_data.get("tool_input", {}))
# Rate-limit Task (sub-agent) spawns per session
if tool_name == "Task":
# Block background task execution first — denied calls
# should not consume a subtask slot.
if tool_input.get("run_in_background"):
logger.info(f"[SDK] Blocked background Task, user={user_id}")
return cast(
SyncHookJSONOutput,
_deny(
"Background task execution is not supported. "
"Run tasks in the foreground instead "
"(remove the run_in_background parameter)."
),
)
if task_spawn_count >= max_subtasks:
logger.warning(
f"[SDK] Task limit reached ({max_subtasks}), user={user_id}"
)
return cast(
SyncHookJSONOutput,
_deny(
f"Maximum {max_subtasks} sub-tasks per session. "
"Please continue in the main conversation."
),
)
task_spawn_count += 1
# Strip MCP prefix for consistent validation
is_copilot_tool = tool_name.startswith(MCP_TOOL_PREFIX)
clean_name = tool_name.removeprefix(MCP_TOOL_PREFIX)
# Only block non-CoPilot tools; our MCP-registered tools
# (including Read for oversized results) are already sandboxed.
if not is_copilot_tool:
result = _validate_tool_access(clean_name, tool_input, sdk_cwd)
if result:
return cast(SyncHookJSONOutput, result)
# Validate user isolation
result = _validate_user_isolation(clean_name, tool_input, user_id)
if result:
return cast(SyncHookJSONOutput, result)
logger.debug(f"[SDK] Tool start: {tool_name}, user={user_id}")
return cast(SyncHookJSONOutput, {})
async def post_tool_use_hook(
input_data: HookInput,
tool_use_id: str | None,
context: HookContext,
) -> SyncHookJSONOutput:
"""Log successful tool executions and stash SDK built-in tool outputs.
MCP tools stash their output in ``_execute_tool_sync`` before the
SDK can truncate it. SDK built-in tools (WebSearch, Read, etc.)
are executed by the CLI internally — this hook captures their
output so the response adapter can forward it to the frontend.
"""
_ = context
tool_name = cast(str, input_data.get("tool_name", ""))
is_builtin = not tool_name.startswith(MCP_TOOL_PREFIX)
logger.info(
"[SDK] PostToolUse: %s (builtin=%s, tool_use_id=%s)",
tool_name,
is_builtin,
(tool_use_id or "")[:12],
)
# Stash output for SDK built-in tools so the response adapter can
# emit StreamToolOutputAvailable even when the CLI doesn't surface
# a separate UserMessage with ToolResultBlock content.
if is_builtin:
tool_response = input_data.get("tool_response")
if tool_response is not None:
resp_preview = str(tool_response)[:100]
logger.info(
"[SDK] Stashing builtin output for %s (%d chars): %s...",
tool_name,
len(str(tool_response)),
resp_preview,
)
stash_pending_tool_output(tool_name, tool_response)
else:
logger.warning(
"[SDK] PostToolUse for builtin %s but tool_response is None",
tool_name,
)
return cast(SyncHookJSONOutput, {})
async def post_tool_failure_hook(
input_data: HookInput,
tool_use_id: str | None,
context: HookContext,
) -> SyncHookJSONOutput:
"""Log failed tool executions for debugging."""
_ = context
tool_name = cast(str, input_data.get("tool_name", ""))
error = input_data.get("error", "Unknown error")
logger.warning(
f"[SDK] Tool failed: {tool_name}, error={error}, "
f"user={user_id}, tool_use_id={tool_use_id}"
)
return cast(SyncHookJSONOutput, {})
async def pre_compact_hook(
input_data: HookInput,
tool_use_id: str | None,
context: HookContext,
) -> SyncHookJSONOutput:
"""Log when SDK triggers context compaction.
The SDK automatically compacts conversation history when it grows too large.
This hook provides visibility into when compaction happens.
"""
_ = context, tool_use_id
trigger = input_data.get("trigger", "auto")
logger.info(
f"[SDK] Context compaction triggered: {trigger}, user={user_id}"
)
return cast(SyncHookJSONOutput, {})
# --- Stop hook: capture transcript path for stateless resume ---
async def stop_hook(
input_data: HookInput,
tool_use_id: str | None,
context: HookContext,
) -> SyncHookJSONOutput:
"""Capture transcript path when SDK finishes processing.
The Stop hook fires while the CLI process is still alive, giving us
a reliable window to read the JSONL transcript before SIGTERM.
"""
_ = context, tool_use_id
transcript_path = cast(str, input_data.get("transcript_path", ""))
sdk_session_id = cast(str, input_data.get("session_id", ""))
if transcript_path and on_stop:
logger.info(
f"[SDK] Stop hook: transcript_path={transcript_path}, "
f"sdk_session_id={sdk_session_id[:12]}..."
)
on_stop(transcript_path, sdk_session_id)
return cast(SyncHookJSONOutput, {})
hooks: dict[str, Any] = {
"PreToolUse": [HookMatcher(matcher="*", hooks=[pre_tool_use_hook])],
"PostToolUse": [HookMatcher(matcher="*", hooks=[post_tool_use_hook])],
"PostToolUseFailure": [
HookMatcher(matcher="*", hooks=[post_tool_failure_hook])
],
"PreCompact": [HookMatcher(matcher="*", hooks=[pre_compact_hook])],
}
if on_stop is not None:
hooks["Stop"] = [HookMatcher(matcher=None, hooks=[stop_hook])]
return hooks
except ImportError:
# Fallback for when SDK isn't available - return empty hooks
logger.warning("claude-agent-sdk not available, security hooks disabled")
return {}

View File

@@ -0,0 +1,335 @@
"""Tests for SDK security hooks — workspace paths, tool access, and deny messages.
These are pure unit tests with no external dependencies (no SDK, no DB, no server).
They validate that the security hooks correctly block unauthorized paths,
tool access, and dangerous input patterns.
"""
import os
import pytest
from .security_hooks import _validate_tool_access, _validate_user_isolation
from .service import _is_tool_error_or_denial
SDK_CWD = "/tmp/copilot-abc123"
def _sdk_available() -> bool:
try:
import claude_agent_sdk # noqa: F401
return True
except ImportError:
return False
def _is_denied(result: dict) -> bool:
hook = result.get("hookSpecificOutput", {})
return hook.get("permissionDecision") == "deny"
def _reason(result: dict) -> str:
return result.get("hookSpecificOutput", {}).get("permissionDecisionReason", "")
# -- Blocked tools -----------------------------------------------------------
def test_blocked_tools_denied():
for tool in ("bash", "shell", "exec", "terminal", "command"):
result = _validate_tool_access(tool, {})
assert _is_denied(result), f"{tool} should be blocked"
def test_unknown_tool_allowed():
result = _validate_tool_access("SomeCustomTool", {})
assert result == {}
# -- Workspace-scoped tools --------------------------------------------------
def test_read_within_workspace_allowed():
result = _validate_tool_access(
"Read", {"file_path": f"{SDK_CWD}/file.txt"}, sdk_cwd=SDK_CWD
)
assert result == {}
def test_write_within_workspace_allowed():
result = _validate_tool_access(
"Write", {"file_path": f"{SDK_CWD}/output.json"}, sdk_cwd=SDK_CWD
)
assert result == {}
def test_edit_within_workspace_allowed():
result = _validate_tool_access(
"Edit", {"file_path": f"{SDK_CWD}/src/main.py"}, sdk_cwd=SDK_CWD
)
assert result == {}
def test_glob_within_workspace_allowed():
result = _validate_tool_access("Glob", {"path": f"{SDK_CWD}/src"}, sdk_cwd=SDK_CWD)
assert result == {}
def test_grep_within_workspace_allowed():
result = _validate_tool_access("Grep", {"path": f"{SDK_CWD}/src"}, sdk_cwd=SDK_CWD)
assert result == {}
def test_read_outside_workspace_denied():
result = _validate_tool_access(
"Read", {"file_path": "/etc/passwd"}, sdk_cwd=SDK_CWD
)
assert _is_denied(result)
def test_write_outside_workspace_denied():
result = _validate_tool_access(
"Write", {"file_path": "/home/user/secrets.txt"}, sdk_cwd=SDK_CWD
)
assert _is_denied(result)
def test_traversal_attack_denied():
result = _validate_tool_access(
"Read",
{"file_path": f"{SDK_CWD}/../../etc/passwd"},
sdk_cwd=SDK_CWD,
)
assert _is_denied(result)
def test_no_path_allowed():
"""Glob/Grep without a path argument defaults to cwd — should pass."""
result = _validate_tool_access("Glob", {}, sdk_cwd=SDK_CWD)
assert result == {}
def test_read_no_cwd_denies_absolute():
"""If no sdk_cwd is set, absolute paths are denied."""
result = _validate_tool_access("Read", {"file_path": "/tmp/anything"})
assert _is_denied(result)
# -- Tool-results directory --------------------------------------------------
def test_read_tool_results_allowed():
home = os.path.expanduser("~")
path = f"{home}/.claude/projects/-tmp-copilot-abc123/tool-results/12345.txt"
result = _validate_tool_access("Read", {"file_path": path}, sdk_cwd=SDK_CWD)
assert result == {}
def test_read_claude_projects_without_tool_results_denied():
home = os.path.expanduser("~")
path = f"{home}/.claude/projects/-tmp-copilot-abc123/settings.json"
result = _validate_tool_access("Read", {"file_path": path}, sdk_cwd=SDK_CWD)
assert _is_denied(result)
# -- Built-in Bash is blocked (use bash_exec MCP tool instead) ---------------
def test_bash_builtin_always_blocked():
"""SDK built-in Bash is blocked — bash_exec MCP tool with bubblewrap is used instead."""
result = _validate_tool_access("Bash", {"command": "echo hello"}, sdk_cwd=SDK_CWD)
assert _is_denied(result)
# -- Dangerous patterns ------------------------------------------------------
def test_dangerous_pattern_blocked():
result = _validate_tool_access("SomeTool", {"cmd": "sudo rm -rf /"})
assert _is_denied(result)
def test_subprocess_pattern_blocked():
result = _validate_tool_access("SomeTool", {"code": "subprocess.run(...)"})
assert _is_denied(result)
# -- User isolation ----------------------------------------------------------
def test_workspace_path_traversal_blocked():
result = _validate_user_isolation(
"workspace_read", {"path": "../../../etc/shadow"}, user_id="user-1"
)
assert _is_denied(result)
def test_workspace_absolute_path_allowed():
"""Workspace 'path' is a cloud storage key — leading '/' is normal."""
result = _validate_user_isolation(
"workspace_read", {"path": "/ASEAN/report.md"}, user_id="user-1"
)
assert result == {}
def test_workspace_normal_path_allowed():
result = _validate_user_isolation(
"workspace_read", {"path": "src/main.py"}, user_id="user-1"
)
assert result == {}
def test_non_workspace_tool_passes_isolation():
result = _validate_user_isolation(
"find_agent", {"query": "email"}, user_id="user-1"
)
assert result == {}
# -- Deny message quality ----------------------------------------------------
def test_blocked_tool_message_clarity():
"""Deny messages must include [SECURITY] and 'cannot be bypassed'."""
reason = _reason(_validate_tool_access("bash", {}))
assert "[SECURITY]" in reason
assert "cannot be bypassed" in reason
def test_bash_builtin_blocked_message_clarity():
reason = _reason(_validate_tool_access("Bash", {"command": "echo hello"}))
assert "[SECURITY]" in reason
assert "cannot be bypassed" in reason
# -- Task sub-agent hooks (require SDK) --------------------------------------
@pytest.fixture()
def _hooks():
"""Create security hooks and return the PreToolUse handler."""
from .security_hooks import create_security_hooks
hooks = create_security_hooks(user_id="u1", sdk_cwd=SDK_CWD, max_subtasks=2)
pre = hooks["PreToolUse"][0].hooks[0]
return pre
@pytest.mark.skipif(not _sdk_available(), reason="claude_agent_sdk not installed")
@pytest.mark.asyncio
async def test_task_background_blocked(_hooks):
"""Task with run_in_background=true must be denied."""
result = await _hooks(
{"tool_name": "Task", "tool_input": {"run_in_background": True, "prompt": "x"}},
tool_use_id=None,
context={},
)
assert _is_denied(result)
assert "foreground" in _reason(result).lower()
@pytest.mark.skipif(not _sdk_available(), reason="claude_agent_sdk not installed")
@pytest.mark.asyncio
async def test_task_foreground_allowed(_hooks):
"""Task without run_in_background should be allowed."""
result = await _hooks(
{"tool_name": "Task", "tool_input": {"prompt": "do stuff"}},
tool_use_id=None,
context={},
)
assert not _is_denied(result)
@pytest.mark.skipif(not _sdk_available(), reason="claude_agent_sdk not installed")
@pytest.mark.asyncio
async def test_task_limit_enforced(_hooks):
"""Task spawns beyond max_subtasks should be denied."""
# First two should pass
for _ in range(2):
result = await _hooks(
{"tool_name": "Task", "tool_input": {"prompt": "ok"}},
tool_use_id=None,
context={},
)
assert not _is_denied(result)
# Third should be denied (limit=2)
result = await _hooks(
{"tool_name": "Task", "tool_input": {"prompt": "over limit"}},
tool_use_id=None,
context={},
)
assert _is_denied(result)
assert "Maximum" in _reason(result)
# -- _is_tool_error_or_denial ------------------------------------------------
class TestIsToolErrorOrDenial:
def test_none_content(self):
assert _is_tool_error_or_denial(None) is False
def test_empty_content(self):
assert _is_tool_error_or_denial("") is False
def test_benign_output(self):
assert _is_tool_error_or_denial("All good, no issues.") is False
def test_security_marker(self):
assert _is_tool_error_or_denial("[SECURITY] Tool access blocked") is True
def test_cannot_be_bypassed(self):
assert _is_tool_error_or_denial("This restriction cannot be bypassed.") is True
def test_not_allowed(self):
assert _is_tool_error_or_denial("Operation not allowed in sandbox") is True
def test_background_task_denial(self):
assert (
_is_tool_error_or_denial(
"Background task execution is not supported. "
"Run tasks in the foreground instead."
)
is True
)
def test_subtask_limit_denial(self):
assert (
_is_tool_error_or_denial(
"Maximum 2 sub-tasks per session. Please continue in the main conversation."
)
is True
)
def test_denied_marker(self):
assert (
_is_tool_error_or_denial("Access denied: insufficient privileges") is True
)
def test_blocked_marker(self):
assert _is_tool_error_or_denial("Request blocked by security policy") is True
def test_failed_marker(self):
assert _is_tool_error_or_denial("Failed to execute tool: timeout") is True
def test_mcp_iserror(self):
assert _is_tool_error_or_denial('{"isError": true, "content": []}') is True
def test_benign_error_in_value(self):
"""Content like '0 errors found' should not trigger — 'error' was removed."""
assert _is_tool_error_or_denial("0 errors found") is False
def test_benign_permission_field(self):
"""Schema descriptions mentioning 'permission' should not trigger."""
assert (
_is_tool_error_or_denial(
'{"fields": [{"name": "permission_level", "type": "int"}]}'
)
is False
)
def test_benign_not_found_in_listing(self):
"""File listing containing 'not found' in filenames should not trigger."""
assert _is_tool_error_or_denial("readme.md\nfile-not-found-handler.py") is False

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,463 @@
"""Tool adapter for wrapping existing CoPilot tools as Claude Agent SDK MCP tools.
This module provides the adapter layer that converts existing BaseTool implementations
into in-process MCP tools that can be used with the Claude Agent SDK.
"""
import asyncio
import itertools
import json
import logging
import os
import uuid
from contextvars import ContextVar
from typing import Any
from backend.copilot.model import ChatSession
from backend.copilot.tools import TOOL_REGISTRY
from backend.copilot.tools.base import BaseTool
logger = logging.getLogger(__name__)
# Allowed base directory for the Read tool (SDK saves oversized tool results here).
# Restricted to ~/.claude/projects/ and further validated to require "tool-results"
# in the path — prevents reading settings, credentials, or other sensitive files.
_SDK_PROJECTS_DIR = os.path.expanduser("~/.claude/projects/")
# MCP server naming - the SDK prefixes tool names as "mcp__{server_name}__{tool}"
MCP_SERVER_NAME = "copilot"
MCP_TOOL_PREFIX = f"mcp__{MCP_SERVER_NAME}__"
# Context variables to pass user/session info to tool execution
_current_user_id: ContextVar[str | None] = ContextVar("current_user_id", default=None)
_current_session: ContextVar[ChatSession | None] = ContextVar(
"current_session", default=None
)
# Stash for MCP tool outputs before the SDK potentially truncates them.
# Keyed by tool_name → full output string. Consumed (popped) by the
# response adapter when it builds StreamToolOutputAvailable.
_pending_tool_outputs: ContextVar[dict[str, list[str]]] = ContextVar(
"pending_tool_outputs",
default=None, # type: ignore[arg-type]
)
# Event signaled whenever stash_pending_tool_output() adds a new entry.
# Used by the streaming loop to wait for PostToolUse hooks to complete
# instead of sleeping an arbitrary duration. The SDK fires hooks via
# start_soon (fire-and-forget) so the next message can arrive before
# the hook stashes its output — this event bridges that gap.
_stash_event: ContextVar[asyncio.Event | None] = ContextVar(
"_stash_event", default=None
)
def set_execution_context(
user_id: str | None,
session: ChatSession,
) -> None:
"""Set the execution context for tool calls.
This must be called before streaming begins to ensure tools have access
to user_id and session information.
Args:
user_id: Current user's ID.
session: Current chat session.
"""
_current_user_id.set(user_id)
_current_session.set(session)
_pending_tool_outputs.set({})
_stash_event.set(asyncio.Event())
def get_execution_context() -> tuple[str | None, ChatSession | None]:
"""Get the current execution context."""
return (
_current_user_id.get(),
_current_session.get(),
)
def pop_pending_tool_output(tool_name: str) -> str | None:
"""Pop and return the oldest stashed output for *tool_name*.
The SDK CLI may truncate large tool results (writing them to disk and
replacing the content with a file reference). This stash keeps the
original MCP output so the response adapter can forward it to the
frontend for proper widget rendering.
Uses a FIFO queue per tool name so duplicate calls to the same tool
in one turn each get their own output.
Returns ``None`` if nothing was stashed for *tool_name*.
"""
pending = _pending_tool_outputs.get(None)
if pending is None:
return None
queue = pending.get(tool_name)
if not queue:
pending.pop(tool_name, None)
return None
value = queue.pop(0)
if not queue:
del pending[tool_name]
return value
def stash_pending_tool_output(tool_name: str, output: Any) -> None:
"""Stash tool output for later retrieval by the response adapter.
Used by the PostToolUse hook to capture SDK built-in tool outputs
(WebSearch, Read, etc.) that aren't available through the MCP stash
mechanism in ``_execute_tool_sync``.
Appends to a FIFO queue per tool name so multiple calls to the same
tool in one turn are all preserved.
"""
pending = _pending_tool_outputs.get(None)
if pending is None:
return
if isinstance(output, str):
text = output
else:
try:
text = json.dumps(output)
except (TypeError, ValueError):
text = str(output)
pending.setdefault(tool_name, []).append(text)
# Signal any waiters that new output is available.
event = _stash_event.get(None)
if event is not None:
event.set()
async def wait_for_stash(timeout: float = 0.5) -> bool:
"""Wait for a PostToolUse hook to stash tool output.
The SDK fires PostToolUse hooks asynchronously via ``start_soon()`` —
the next message (AssistantMessage/ResultMessage) can arrive before the
hook completes and stashes its output. This function bridges that gap
by waiting on the ``_stash_event``, which is signaled by
:func:`stash_pending_tool_output`.
After the event fires, callers should ``await asyncio.sleep(0)`` to
give any remaining concurrent hooks a chance to complete.
Returns ``True`` if a stash signal was received, ``False`` on timeout.
The timeout is a safety net — normally the stash happens within
microseconds of yielding to the event loop.
"""
event = _stash_event.get(None)
if event is None:
return False
# Fast path: hook already completed before we got here.
if event.is_set():
event.clear()
return True
# Slow path: wait for the hook to signal.
try:
async with asyncio.timeout(timeout):
await event.wait()
event.clear()
return True
except TimeoutError:
return False
async def _execute_tool_sync(
base_tool: BaseTool,
user_id: str | None,
session: ChatSession,
args: dict[str, Any],
) -> dict[str, Any]:
"""Execute a tool synchronously and return MCP-formatted response."""
effective_id = f"sdk-{uuid.uuid4().hex[:12]}"
result = await base_tool.execute(
user_id=user_id,
session=session,
tool_call_id=effective_id,
**args,
)
text = (
result.output if isinstance(result.output, str) else json.dumps(result.output)
)
# Stash the full output before the SDK potentially truncates it.
pending = _pending_tool_outputs.get(None)
if pending is not None:
pending.setdefault(base_tool.name, []).append(text)
content_blocks: list[dict[str, str]] = [{"type": "text", "text": text}]
# If the tool result contains inline image data, add an MCP image block
# so Claude can "see" the image (e.g. read_workspace_file on a small PNG).
image_block = _extract_image_block(text)
if image_block:
content_blocks.append(image_block)
return {
"content": content_blocks,
"isError": not result.success,
}
# MIME types that Claude can process as image content blocks.
_SUPPORTED_IMAGE_TYPES = frozenset(
{"image/png", "image/jpeg", "image/gif", "image/webp"}
)
def _extract_image_block(text: str) -> dict[str, str] | None:
"""Extract an MCP image content block from a tool result JSON string.
Detects workspace file responses with ``content_base64`` and an image
MIME type, returning an MCP-format image block that allows Claude to
"see" the image. Returns ``None`` if the result is not an inline image.
"""
try:
data = json.loads(text)
except (json.JSONDecodeError, TypeError):
return None
if not isinstance(data, dict):
return None
mime_type = data.get("mime_type", "")
base64_content = data.get("content_base64", "")
# Only inline small images — large ones would exceed Claude's limits.
# 32 KB raw ≈ ~43 KB base64.
_MAX_IMAGE_BASE64_BYTES = 43_000
if (
mime_type in _SUPPORTED_IMAGE_TYPES
and base64_content
and len(base64_content) <= _MAX_IMAGE_BASE64_BYTES
):
return {
"type": "image",
"data": base64_content,
"mimeType": mime_type,
}
return None
def _mcp_error(message: str) -> dict[str, Any]:
return {
"content": [
{"type": "text", "text": json.dumps({"error": message, "type": "error"})}
],
"isError": True,
}
def create_tool_handler(base_tool: BaseTool):
"""Create an async handler function for a BaseTool.
This wraps the existing BaseTool._execute method to be compatible
with the Claude Agent SDK MCP tool format.
"""
async def tool_handler(args: dict[str, Any]) -> dict[str, Any]:
"""Execute the wrapped tool and return MCP-formatted response."""
user_id, session = get_execution_context()
if session is None:
return _mcp_error("No session context available")
try:
return await _execute_tool_sync(base_tool, user_id, session, args)
except Exception as e:
logger.error(f"Error executing tool {base_tool.name}: {e}", exc_info=True)
return _mcp_error(f"Failed to execute {base_tool.name}: {e}")
return tool_handler
def _build_input_schema(base_tool: BaseTool) -> dict[str, Any]:
"""Build a JSON Schema input schema for a tool."""
return {
"type": "object",
"properties": base_tool.parameters.get("properties", {}),
"required": base_tool.parameters.get("required", []),
}
async def _read_file_handler(args: dict[str, Any]) -> dict[str, Any]:
"""Read a file with optional offset/limit. Restricted to SDK working directory.
After reading, the file is deleted to prevent accumulation in long-running pods.
"""
file_path = args.get("file_path", "")
offset = args.get("offset", 0)
limit = args.get("limit", 2000)
# Security: only allow reads under ~/.claude/projects/**/tool-results/
real_path = os.path.realpath(file_path)
if not real_path.startswith(_SDK_PROJECTS_DIR) or "tool-results" not in real_path:
return {
"content": [{"type": "text", "text": f"Access denied: {file_path}"}],
"isError": True,
}
try:
with open(real_path) as f:
selected = list(itertools.islice(f, offset, offset + limit))
content = "".join(selected)
# Cleanup happens in _cleanup_sdk_tool_results after session ends;
# don't delete here — the SDK may read in multiple chunks.
return {"content": [{"type": "text", "text": content}], "isError": False}
except FileNotFoundError:
return {
"content": [{"type": "text", "text": f"File not found: {file_path}"}],
"isError": True,
}
except Exception as e:
return {
"content": [{"type": "text", "text": f"Error reading file: {e}"}],
"isError": True,
}
_READ_TOOL_NAME = "Read"
_READ_TOOL_DESCRIPTION = (
"Read a file from the local filesystem. "
"Use offset and limit to read specific line ranges for large files."
)
_READ_TOOL_SCHEMA = {
"type": "object",
"properties": {
"file_path": {
"type": "string",
"description": "The absolute path to the file to read",
},
"offset": {
"type": "integer",
"description": "Line number to start reading from (0-indexed). Default: 0",
},
"limit": {
"type": "integer",
"description": "Number of lines to read. Default: 2000",
},
},
"required": ["file_path"],
}
# Create the MCP server configuration
def create_copilot_mcp_server():
"""Create an in-process MCP server configuration for CoPilot tools.
This can be passed to ClaudeAgentOptions.mcp_servers.
Note: The actual SDK MCP server creation depends on the claude-agent-sdk
package being available. This function returns the configuration that
can be used with the SDK.
"""
try:
from claude_agent_sdk import create_sdk_mcp_server, tool
# Create decorated tool functions
sdk_tools = []
for tool_name, base_tool in TOOL_REGISTRY.items():
handler = create_tool_handler(base_tool)
decorated = tool(
tool_name,
base_tool.description,
_build_input_schema(base_tool),
)(handler)
sdk_tools.append(decorated)
# Add the Read tool so the SDK can read back oversized tool results
read_tool = tool(
_READ_TOOL_NAME,
_READ_TOOL_DESCRIPTION,
_READ_TOOL_SCHEMA,
)(_read_file_handler)
sdk_tools.append(read_tool)
server = create_sdk_mcp_server(
name=MCP_SERVER_NAME,
version="1.0.0",
tools=sdk_tools,
)
return server
except ImportError:
# Let ImportError propagate so service.py handles the fallback
raise
# SDK built-in tools allowed within the workspace directory.
# Security hooks validate that file paths stay within sdk_cwd.
# Bash is NOT included — use the sandboxed MCP bash_exec tool instead,
# which provides kernel-level network isolation via unshare --net.
# Task allows spawning sub-agents (rate-limited by security hooks).
# WebSearch uses Brave Search via Anthropic's API — safe, no SSRF risk.
# TodoWrite manages the task checklist shown in the UI — no security concern.
_SDK_BUILTIN_TOOLS = [
"Read",
"Write",
"Edit",
"Glob",
"Grep",
"Task",
"WebSearch",
"TodoWrite",
]
# SDK built-in tools that must be explicitly blocked.
# Bash: dangerous — agent uses mcp__copilot__bash_exec with kernel-level
# network isolation (unshare --net) instead.
# WebFetch: SSRF risk — can reach internal network (localhost, 10.x, etc.).
# Agent uses the SSRF-protected mcp__copilot__web_fetch tool instead.
# AskUserQuestion: interactive CLI tool — no terminal in copilot context.
SDK_DISALLOWED_TOOLS = [
"Bash",
"WebFetch",
"AskUserQuestion",
]
# Tools that are blocked entirely in security hooks (defence-in-depth).
# Includes SDK_DISALLOWED_TOOLS plus common aliases/synonyms.
BLOCKED_TOOLS = {
*SDK_DISALLOWED_TOOLS,
"bash",
"shell",
"exec",
"terminal",
"command",
}
# Tools allowed only when their path argument stays within the SDK workspace.
# The SDK uses these to handle oversized tool results (writes to tool-results/
# files, then reads them back) and for workspace file operations.
WORKSPACE_SCOPED_TOOLS = {"Read", "Write", "Edit", "Glob", "Grep"}
# Dangerous patterns in tool inputs
DANGEROUS_PATTERNS = [
r"sudo",
r"rm\s+-rf",
r"dd\s+if=",
r"/etc/passwd",
r"/etc/shadow",
r"chmod\s+777",
r"curl\s+.*\|.*sh",
r"wget\s+.*\|.*sh",
r"eval\s*\(",
r"exec\s*\(",
r"__import__",
r"os\.system",
r"subprocess",
]
# List of tool names for allowed_tools configuration
# Include MCP tools, the MCP Read tool for oversized results,
# and SDK built-in file tools for workspace operations.
COPILOT_TOOL_NAMES = [
*[f"{MCP_TOOL_PREFIX}{name}" for name in TOOL_REGISTRY.keys()],
f"{MCP_TOOL_PREFIX}{_READ_TOOL_NAME}",
*_SDK_BUILTIN_TOOLS,
]

View File

@@ -0,0 +1,467 @@
"""JSONL transcript management for stateless multi-turn resume.
The Claude Code CLI persists conversations as JSONL files (one JSON object per
line). When the SDK's ``Stop`` hook fires we read this file, strip bloat
(progress entries, metadata), and upload the result to bucket storage. On the
next turn we download the transcript, write it to a temp file, and pass
``--resume`` so the CLI can reconstruct the full conversation.
Storage is handled via ``WorkspaceStorageBackend`` (GCS in prod, local
filesystem for self-hosted) — no DB column needed.
"""
import json
import logging
import os
import re
import time
from dataclasses import dataclass
logger = logging.getLogger(__name__)
# UUIDs are hex + hyphens; strip everything else to prevent path injection.
_SAFE_ID_RE = re.compile(r"[^0-9a-fA-F-]")
# Entry types that can be safely removed from the transcript without breaking
# the parentUuid conversation tree that ``--resume`` relies on.
# - progress: UI progress ticks, no message content (avg 97KB for agent_progress)
# - file-history-snapshot: undo tracking metadata
# - queue-operation: internal queue bookkeeping
# - summary: session summaries
# - pr-link: PR link metadata
STRIPPABLE_TYPES = frozenset(
{"progress", "file-history-snapshot", "queue-operation", "summary", "pr-link"}
)
@dataclass
class TranscriptDownload:
"""Result of downloading a transcript with its metadata."""
content: str
message_count: int = 0 # session.messages length when uploaded
uploaded_at: float = 0.0 # epoch timestamp of upload
# Workspace storage constants — deterministic path from session_id.
TRANSCRIPT_STORAGE_PREFIX = "chat-transcripts"
# ---------------------------------------------------------------------------
# Progress stripping
# ---------------------------------------------------------------------------
def strip_progress_entries(content: str) -> str:
"""Remove progress/metadata entries from a JSONL transcript.
Removes entries whose ``type`` is in ``STRIPPABLE_TYPES`` and reparents
any remaining child entries so the ``parentUuid`` chain stays intact.
Typically reduces transcript size by ~30%.
"""
lines = content.strip().split("\n")
entries: list[dict] = []
for line in lines:
try:
entries.append(json.loads(line))
except json.JSONDecodeError:
# Keep unparseable lines as-is (safety)
entries.append({"_raw": line})
stripped_uuids: set[str] = set()
uuid_to_parent: dict[str, str] = {}
kept: list[dict] = []
for entry in entries:
if "_raw" in entry:
kept.append(entry)
continue
uid = entry.get("uuid", "")
parent = entry.get("parentUuid", "")
entry_type = entry.get("type", "")
if uid:
uuid_to_parent[uid] = parent
if entry_type in STRIPPABLE_TYPES:
if uid:
stripped_uuids.add(uid)
else:
kept.append(entry)
# Reparent: walk up chain through stripped entries to find surviving ancestor
for entry in kept:
if "_raw" in entry:
continue
parent = entry.get("parentUuid", "")
original_parent = parent
while parent in stripped_uuids:
parent = uuid_to_parent.get(parent, "")
if parent != original_parent:
entry["parentUuid"] = parent
result_lines: list[str] = []
for entry in kept:
if "_raw" in entry:
result_lines.append(entry["_raw"])
else:
result_lines.append(json.dumps(entry, separators=(",", ":")))
return "\n".join(result_lines) + "\n"
# ---------------------------------------------------------------------------
# Local file I/O (read from CLI's JSONL, write temp file for --resume)
# ---------------------------------------------------------------------------
def read_transcript_file(transcript_path: str) -> str | None:
"""Read a JSONL transcript file from disk.
Returns the raw JSONL content, or ``None`` if the file is missing, empty,
or only contains metadata (≤2 lines with no conversation messages).
"""
if not transcript_path or not os.path.isfile(transcript_path):
logger.debug(f"[Transcript] File not found: {transcript_path}")
return None
try:
with open(transcript_path) as f:
content = f.read()
if not content.strip():
logger.debug("[Transcript] File is empty: %s", transcript_path)
return None
lines = content.strip().split("\n")
# Validate that the transcript has real conversation content
# (not just metadata like queue-operation entries).
if not validate_transcript(content):
logger.debug(
"[Transcript] No conversation content (%d lines) in %s",
len(lines),
transcript_path,
)
return None
logger.info(
f"[Transcript] Read {len(lines)} lines, "
f"{len(content)} bytes from {transcript_path}"
)
return content
except (json.JSONDecodeError, OSError) as e:
logger.warning(f"[Transcript] Failed to read {transcript_path}: {e}")
return None
def _sanitize_id(raw_id: str, max_len: int = 36) -> str:
"""Sanitize an ID for safe use in file paths.
Session/user IDs are expected to be UUIDs (hex + hyphens). Strip
everything else and truncate to *max_len* so the result cannot introduce
path separators or other special characters.
"""
cleaned = _SAFE_ID_RE.sub("", raw_id or "")[:max_len]
return cleaned or "unknown"
_SAFE_CWD_PREFIX = os.path.realpath("/tmp/copilot-")
def _encode_cwd_for_cli(cwd: str) -> str:
"""Encode a working directory path the same way the Claude CLI does.
The CLI replaces all non-alphanumeric characters with ``-``.
"""
return re.sub(r"[^a-zA-Z0-9]", "-", os.path.realpath(cwd))
def cleanup_cli_project_dir(sdk_cwd: str) -> None:
"""Remove the CLI's project directory for a specific working directory.
The CLI stores session data under ``~/.claude/projects/<encoded_cwd>/``.
Each SDK turn uses a unique ``sdk_cwd``, so the project directory is
safe to remove entirely after the transcript has been uploaded.
"""
import shutil
cwd_encoded = _encode_cwd_for_cli(sdk_cwd)
config_dir = os.environ.get("CLAUDE_CONFIG_DIR") or os.path.expanduser("~/.claude")
projects_base = os.path.realpath(os.path.join(config_dir, "projects"))
project_dir = os.path.realpath(os.path.join(projects_base, cwd_encoded))
if not project_dir.startswith(projects_base + os.sep):
logger.warning(
f"[Transcript] Cleanup path escaped projects base: {project_dir}"
)
return
if os.path.isdir(project_dir):
shutil.rmtree(project_dir, ignore_errors=True)
logger.debug(f"[Transcript] Cleaned up CLI project dir: {project_dir}")
else:
logger.debug(f"[Transcript] Project dir not found: {project_dir}")
def write_transcript_to_tempfile(
transcript_content: str,
session_id: str,
cwd: str,
) -> str | None:
"""Write JSONL transcript to a temp file inside *cwd* for ``--resume``.
The file lives in the session working directory so it is cleaned up
automatically when the session ends.
Returns the absolute path to the file, or ``None`` on failure.
"""
# Validate cwd is under the expected sandbox prefix (CodeQL sanitizer).
real_cwd = os.path.realpath(cwd)
if not real_cwd.startswith(_SAFE_CWD_PREFIX):
logger.warning(f"[Transcript] cwd outside sandbox: {cwd}")
return None
try:
os.makedirs(real_cwd, exist_ok=True)
safe_id = _sanitize_id(session_id, max_len=8)
jsonl_path = os.path.realpath(
os.path.join(real_cwd, f"transcript-{safe_id}.jsonl")
)
if not jsonl_path.startswith(real_cwd):
logger.warning(f"[Transcript] Path escaped cwd: {jsonl_path}")
return None
with open(jsonl_path, "w") as f:
f.write(transcript_content)
logger.info(f"[Transcript] Wrote resume file: {jsonl_path}")
return jsonl_path
except OSError as e:
logger.warning(f"[Transcript] Failed to write resume file: {e}")
return None
def validate_transcript(content: str | None) -> bool:
"""Check that a transcript has actual conversation messages.
A valid transcript for resume needs at least one user message and one
assistant message (not just queue-operation / file-history-snapshot
metadata).
"""
if not content or not content.strip():
return False
lines = content.strip().split("\n")
if len(lines) < 2:
return False
has_user = False
has_assistant = False
for line in lines:
try:
entry = json.loads(line)
msg_type = entry.get("type")
if msg_type == "user":
has_user = True
elif msg_type == "assistant":
has_assistant = True
except json.JSONDecodeError:
return False
return has_user and has_assistant
# ---------------------------------------------------------------------------
# Bucket storage (GCS / local via WorkspaceStorageBackend)
# ---------------------------------------------------------------------------
def _storage_path_parts(user_id: str, session_id: str) -> tuple[str, str, str]:
"""Return (workspace_id, file_id, filename) for a session's transcript.
Path structure: ``chat-transcripts/{user_id}/{session_id}.jsonl``
IDs are sanitized to hex+hyphen to prevent path traversal.
"""
return (
TRANSCRIPT_STORAGE_PREFIX,
_sanitize_id(user_id),
f"{_sanitize_id(session_id)}.jsonl",
)
def _meta_storage_path_parts(user_id: str, session_id: str) -> tuple[str, str, str]:
"""Return (workspace_id, file_id, filename) for a session's transcript metadata."""
return (
TRANSCRIPT_STORAGE_PREFIX,
_sanitize_id(user_id),
f"{_sanitize_id(session_id)}.meta.json",
)
def _build_storage_path(user_id: str, session_id: str, backend: object) -> str:
"""Build the full storage path string that ``retrieve()`` expects.
``store()`` returns a path like ``gcs://bucket/workspaces/...`` or
``local://workspace_id/file_id/filename``. Since we use deterministic
arguments we can reconstruct the same path for download/delete without
having stored the return value.
"""
from backend.util.workspace_storage import GCSWorkspaceStorage
wid, fid, fname = _storage_path_parts(user_id, session_id)
if isinstance(backend, GCSWorkspaceStorage):
blob = f"workspaces/{wid}/{fid}/{fname}"
return f"gcs://{backend.bucket_name}/{blob}"
else:
# LocalWorkspaceStorage returns local://{relative_path}
return f"local://{wid}/{fid}/{fname}"
async def upload_transcript(
user_id: str,
session_id: str,
content: str,
message_count: int = 0,
) -> None:
"""Strip progress entries and upload transcript to bucket storage.
Safety: only overwrites when the new (stripped) transcript is larger than
what is already stored. Since JSONL is append-only, the latest transcript
is always the longest. This prevents a slow/stale background task from
clobbering a newer upload from a concurrent turn.
Args:
message_count: ``len(session.messages)`` at upload time — used by
the next turn to detect staleness and compress only the gap.
"""
from backend.util.workspace_storage import get_workspace_storage
stripped = strip_progress_entries(content)
if not validate_transcript(stripped):
logger.warning(
f"[Transcript] Skipping upload — stripped content not valid "
f"for session {session_id}"
)
return
storage = await get_workspace_storage()
wid, fid, fname = _storage_path_parts(user_id, session_id)
encoded = stripped.encode("utf-8")
new_size = len(encoded)
# Check existing transcript size to avoid overwriting newer with older
path = _build_storage_path(user_id, session_id, storage)
try:
existing = await storage.retrieve(path)
if len(existing) >= new_size:
logger.info(
f"[Transcript] Skipping upload — existing ({len(existing)}B) "
f">= new ({new_size}B) for session {session_id}"
)
return
except (FileNotFoundError, Exception):
pass # No existing transcript or retrieval error — proceed with upload
await storage.store(
workspace_id=wid,
file_id=fid,
filename=fname,
content=encoded,
)
# Store metadata alongside the transcript so the next turn can detect
# staleness and only compress the gap instead of the full history.
# Wrapped in try/except so a metadata write failure doesn't orphan
# the already-uploaded transcript — the next turn will just fall back
# to full gap fill (msg_count=0).
try:
meta = {"message_count": message_count, "uploaded_at": time.time()}
mwid, mfid, mfname = _meta_storage_path_parts(user_id, session_id)
await storage.store(
workspace_id=mwid,
file_id=mfid,
filename=mfname,
content=json.dumps(meta).encode("utf-8"),
)
except Exception as e:
logger.warning(f"[Transcript] Failed to write metadata for {session_id}: {e}")
logger.info(
f"[Transcript] Uploaded {new_size}B "
f"(stripped from {len(content)}B, msg_count={message_count}) "
f"for session {session_id}"
)
async def download_transcript(
user_id: str, session_id: str
) -> TranscriptDownload | None:
"""Download transcript and metadata from bucket storage.
Returns a ``TranscriptDownload`` with the JSONL content and the
``message_count`` watermark from the upload, or ``None`` if not found.
"""
from backend.util.workspace_storage import get_workspace_storage
storage = await get_workspace_storage()
path = _build_storage_path(user_id, session_id, storage)
try:
data = await storage.retrieve(path)
content = data.decode("utf-8")
except FileNotFoundError:
logger.debug(f"[Transcript] No transcript in storage for {session_id}")
return None
except Exception as e:
logger.warning(f"[Transcript] Failed to download transcript: {e}")
return None
# Try to load metadata (best-effort — old transcripts won't have it)
message_count = 0
uploaded_at = 0.0
try:
from backend.util.workspace_storage import GCSWorkspaceStorage
mwid, mfid, mfname = _meta_storage_path_parts(user_id, session_id)
if isinstance(storage, GCSWorkspaceStorage):
blob = f"workspaces/{mwid}/{mfid}/{mfname}"
meta_path = f"gcs://{storage.bucket_name}/{blob}"
else:
meta_path = f"local://{mwid}/{mfid}/{mfname}"
meta_data = await storage.retrieve(meta_path)
meta = json.loads(meta_data.decode("utf-8"))
message_count = meta.get("message_count", 0)
uploaded_at = meta.get("uploaded_at", 0.0)
except (FileNotFoundError, json.JSONDecodeError, Exception):
pass # No metadata — treat as unknown (msg_count=0 → always fill gap)
logger.info(
f"[Transcript] Downloaded {len(content)}B "
f"(msg_count={message_count}) for session {session_id}"
)
return TranscriptDownload(
content=content,
message_count=message_count,
uploaded_at=uploaded_at,
)
async def delete_transcript(user_id: str, session_id: str) -> None:
"""Delete transcript from bucket storage (e.g. after resume failure)."""
from backend.util.workspace_storage import get_workspace_storage
storage = await get_workspace_storage()
path = _build_storage_path(user_id, session_id, storage)
try:
await storage.delete(path)
logger.info(f"[Transcript] Deleted transcript for session {session_id}")
except Exception as e:
logger.warning(f"[Transcript] Failed to delete transcript: {e}")

View File

@@ -0,0 +1,255 @@
"""Unit tests for JSONL transcript management utilities."""
import json
import os
from .transcript import (
STRIPPABLE_TYPES,
read_transcript_file,
strip_progress_entries,
validate_transcript,
write_transcript_to_tempfile,
)
def _make_jsonl(*entries: dict) -> str:
return "\n".join(json.dumps(e) for e in entries) + "\n"
# --- Fixtures ---
METADATA_LINE = {"type": "queue-operation", "subtype": "create"}
FILE_HISTORY = {"type": "file-history-snapshot", "files": []}
USER_MSG = {"type": "user", "uuid": "u1", "message": {"role": "user", "content": "hi"}}
ASST_MSG = {
"type": "assistant",
"uuid": "a1",
"parentUuid": "u1",
"message": {"role": "assistant", "content": "hello"},
}
PROGRESS_ENTRY = {
"type": "progress",
"uuid": "p1",
"parentUuid": "u1",
"data": {"type": "bash_progress", "stdout": "running..."},
}
VALID_TRANSCRIPT = _make_jsonl(METADATA_LINE, FILE_HISTORY, USER_MSG, ASST_MSG)
# --- read_transcript_file ---
class TestReadTranscriptFile:
def test_returns_content_for_valid_file(self, tmp_path):
path = tmp_path / "session.jsonl"
path.write_text(VALID_TRANSCRIPT)
result = read_transcript_file(str(path))
assert result is not None
assert "user" in result
def test_returns_none_for_missing_file(self):
assert read_transcript_file("/nonexistent/path.jsonl") is None
def test_returns_none_for_empty_path(self):
assert read_transcript_file("") is None
def test_returns_none_for_empty_file(self, tmp_path):
path = tmp_path / "empty.jsonl"
path.write_text("")
assert read_transcript_file(str(path)) is None
def test_returns_none_for_metadata_only(self, tmp_path):
content = _make_jsonl(METADATA_LINE, FILE_HISTORY)
path = tmp_path / "meta.jsonl"
path.write_text(content)
assert read_transcript_file(str(path)) is None
def test_returns_none_for_invalid_json(self, tmp_path):
path = tmp_path / "bad.jsonl"
path.write_text("not json\n{}\n{}\n")
assert read_transcript_file(str(path)) is None
def test_no_size_limit(self, tmp_path):
"""Large files are accepted — bucket storage has no size limit."""
big_content = {"type": "user", "uuid": "u9", "data": "x" * 1_000_000}
content = _make_jsonl(METADATA_LINE, FILE_HISTORY, big_content, ASST_MSG)
path = tmp_path / "big.jsonl"
path.write_text(content)
result = read_transcript_file(str(path))
assert result is not None
# --- write_transcript_to_tempfile ---
class TestWriteTranscriptToTempfile:
"""Tests use /tmp/copilot-* paths to satisfy the sandbox prefix check."""
def test_writes_file_and_returns_path(self):
cwd = "/tmp/copilot-test-write"
try:
result = write_transcript_to_tempfile(
VALID_TRANSCRIPT, "sess-1234-abcd", cwd
)
assert result is not None
assert os.path.isfile(result)
assert result.endswith(".jsonl")
with open(result) as f:
assert f.read() == VALID_TRANSCRIPT
finally:
import shutil
shutil.rmtree(cwd, ignore_errors=True)
def test_creates_parent_directory(self):
cwd = "/tmp/copilot-test-mkdir"
try:
result = write_transcript_to_tempfile(VALID_TRANSCRIPT, "sess-1234", cwd)
assert result is not None
assert os.path.isdir(cwd)
finally:
import shutil
shutil.rmtree(cwd, ignore_errors=True)
def test_uses_session_id_prefix(self):
cwd = "/tmp/copilot-test-prefix"
try:
result = write_transcript_to_tempfile(
VALID_TRANSCRIPT, "abcdef12-rest", cwd
)
assert result is not None
assert "abcdef12" in os.path.basename(result)
finally:
import shutil
shutil.rmtree(cwd, ignore_errors=True)
def test_rejects_cwd_outside_sandbox(self, tmp_path):
cwd = str(tmp_path / "not-copilot")
result = write_transcript_to_tempfile(VALID_TRANSCRIPT, "sess-1234", cwd)
assert result is None
# --- validate_transcript ---
class TestValidateTranscript:
def test_valid_transcript(self):
assert validate_transcript(VALID_TRANSCRIPT) is True
def test_none_content(self):
assert validate_transcript(None) is False
def test_empty_content(self):
assert validate_transcript("") is False
def test_metadata_only(self):
content = _make_jsonl(METADATA_LINE, FILE_HISTORY)
assert validate_transcript(content) is False
def test_user_only_no_assistant(self):
content = _make_jsonl(METADATA_LINE, FILE_HISTORY, USER_MSG)
assert validate_transcript(content) is False
def test_assistant_only_no_user(self):
content = _make_jsonl(METADATA_LINE, FILE_HISTORY, ASST_MSG)
assert validate_transcript(content) is False
def test_invalid_json_returns_false(self):
assert validate_transcript("not json\n{}\n{}\n") is False
# --- strip_progress_entries ---
class TestStripProgressEntries:
def test_strips_all_strippable_types(self):
"""All STRIPPABLE_TYPES are removed from the output."""
entries = [
USER_MSG,
{"type": "progress", "uuid": "p1", "parentUuid": "u1"},
{"type": "file-history-snapshot", "files": []},
{"type": "queue-operation", "subtype": "create"},
{"type": "summary", "text": "..."},
{"type": "pr-link", "url": "..."},
ASST_MSG,
]
result = strip_progress_entries(_make_jsonl(*entries))
result_types = {json.loads(line)["type"] for line in result.strip().split("\n")}
assert result_types == {"user", "assistant"}
for stype in STRIPPABLE_TYPES:
assert stype not in result_types
def test_reparents_children_of_stripped_entries(self):
"""An assistant message whose parent is a progress entry gets reparented."""
progress = {
"type": "progress",
"uuid": "p1",
"parentUuid": "u1",
"data": {"type": "bash_progress"},
}
asst = {
"type": "assistant",
"uuid": "a1",
"parentUuid": "p1", # Points to progress
"message": {"role": "assistant", "content": "done"},
}
content = _make_jsonl(USER_MSG, progress, asst)
result = strip_progress_entries(content)
lines = [json.loads(line) for line in result.strip().split("\n")]
asst_entry = next(e for e in lines if e["type"] == "assistant")
# Should be reparented to u1 (the user message)
assert asst_entry["parentUuid"] == "u1"
def test_reparents_through_chain(self):
"""Reparenting walks through multiple stripped entries."""
p1 = {"type": "progress", "uuid": "p1", "parentUuid": "u1"}
p2 = {"type": "progress", "uuid": "p2", "parentUuid": "p1"}
p3 = {"type": "progress", "uuid": "p3", "parentUuid": "p2"}
asst = {
"type": "assistant",
"uuid": "a1",
"parentUuid": "p3", # 3 levels deep
"message": {"role": "assistant", "content": "done"},
}
content = _make_jsonl(USER_MSG, p1, p2, p3, asst)
result = strip_progress_entries(content)
lines = [json.loads(line) for line in result.strip().split("\n")]
asst_entry = next(e for e in lines if e["type"] == "assistant")
assert asst_entry["parentUuid"] == "u1"
def test_preserves_non_strippable_entries(self):
"""User, assistant, and system entries are preserved."""
system = {"type": "system", "uuid": "s1", "message": "prompt"}
content = _make_jsonl(system, USER_MSG, ASST_MSG)
result = strip_progress_entries(content)
result_types = [json.loads(line)["type"] for line in result.strip().split("\n")]
assert result_types == ["system", "user", "assistant"]
def test_empty_input(self):
result = strip_progress_entries("")
# Should return just a newline (empty content stripped)
assert result.strip() == ""
def test_no_strippable_entries(self):
"""When there's nothing to strip, output matches input structure."""
content = _make_jsonl(USER_MSG, ASST_MSG)
result = strip_progress_entries(content)
result_lines = result.strip().split("\n")
assert len(result_lines) == 2
def test_handles_entries_without_uuid(self):
"""Entries without uuid field are handled gracefully."""
no_uuid = {"type": "queue-operation", "subtype": "create"}
content = _make_jsonl(no_uuid, USER_MSG, ASST_MSG)
result = strip_progress_entries(content)
result_types = [json.loads(line)["type"] for line in result.strip().split("\n")]
# queue-operation is strippable
assert "queue-operation" not in result_types
assert "user" in result_types
assert "assistant" in result_types

View File

@@ -0,0 +1,184 @@
import asyncio
import logging
from os import getenv
import pytest
from . import service as chat_service
from .model import create_chat_session, get_chat_session, upsert_chat_session
from .response_model import (
StreamError,
StreamFinish,
StreamTextDelta,
StreamToolOutputAvailable,
)
from .sdk import service as sdk_service
from .sdk.transcript import download_transcript
logger = logging.getLogger(__name__)
@pytest.mark.asyncio(loop_scope="session")
async def test_stream_chat_completion(setup_test_user, test_user_id):
"""
Test the stream_chat_completion function.
"""
api_key: str | None = getenv("OPEN_ROUTER_API_KEY")
if not api_key:
return pytest.skip("OPEN_ROUTER_API_KEY is not set, skipping test")
session = await create_chat_session(test_user_id)
has_errors = False
has_ended = False
assistant_message = ""
async for chunk in chat_service.stream_chat_completion(
session.session_id, "Hello, how are you?", user_id=session.user_id
):
logger.info(chunk)
if isinstance(chunk, StreamError):
has_errors = True
if isinstance(chunk, StreamTextDelta):
assistant_message += chunk.delta
if isinstance(chunk, StreamFinish):
has_ended = True
assert has_ended, "Chat completion did not end"
assert not has_errors, "Error occurred while streaming chat completion"
assert assistant_message, "Assistant message is empty"
@pytest.mark.asyncio(loop_scope="session")
async def test_stream_chat_completion_with_tool_calls(setup_test_user, test_user_id):
"""
Test the stream_chat_completion function.
"""
api_key: str | None = getenv("OPEN_ROUTER_API_KEY")
if not api_key:
return pytest.skip("OPEN_ROUTER_API_KEY is not set, skipping test")
session = await create_chat_session(test_user_id)
session = await upsert_chat_session(session)
has_errors = False
has_ended = False
had_tool_calls = False
async for chunk in chat_service.stream_chat_completion(
session.session_id,
"Please find me an agent that can help me with my business. Use the query 'moneny printing agent'",
user_id=session.user_id,
):
logger.info(chunk)
if isinstance(chunk, StreamError):
has_errors = True
if isinstance(chunk, StreamFinish):
has_ended = True
if isinstance(chunk, StreamToolOutputAvailable):
had_tool_calls = True
assert has_ended, "Chat completion did not end"
assert not has_errors, "Error occurred while streaming chat completion"
assert had_tool_calls, "Tool calls did not occur"
session = await get_chat_session(session.session_id)
assert session, "Session not found"
assert session.usage, "Usage is empty"
@pytest.mark.asyncio(loop_scope="session")
async def test_sdk_resume_multi_turn(setup_test_user, test_user_id):
"""Test that the SDK --resume path captures and uses transcripts across turns.
Turn 1: Send a message containing a unique keyword.
Turn 2: Ask the model to recall that keyword — proving the transcript was
persisted and restored via --resume.
"""
api_key: str | None = getenv("OPEN_ROUTER_API_KEY")
if not api_key:
return pytest.skip("OPEN_ROUTER_API_KEY is not set, skipping test")
from .config import ChatConfig
cfg = ChatConfig()
if not cfg.claude_agent_use_resume:
return pytest.skip("CLAUDE_AGENT_USE_RESUME is not enabled, skipping test")
session = await create_chat_session(test_user_id)
session = await upsert_chat_session(session)
# --- Turn 1: send a message with a unique keyword ---
keyword = "ZEPHYR42"
turn1_msg = (
f"Please remember this special keyword: {keyword}. "
"Just confirm you've noted it, keep your response brief."
)
turn1_text = ""
turn1_errors: list[str] = []
turn1_ended = False
async for chunk in sdk_service.stream_chat_completion_sdk(
session.session_id,
turn1_msg,
user_id=test_user_id,
):
if isinstance(chunk, StreamTextDelta):
turn1_text += chunk.delta
elif isinstance(chunk, StreamError):
turn1_errors.append(chunk.errorText)
elif isinstance(chunk, StreamFinish):
turn1_ended = True
assert turn1_ended, "Turn 1 did not finish"
assert not turn1_errors, f"Turn 1 errors: {turn1_errors}"
assert turn1_text, "Turn 1 produced no text"
# Wait for background upload task to complete (retry up to 5s).
# The CLI may not produce a usable transcript for very short
# conversations (only metadata entries) — this is environment-dependent
# (CLI version, platform). When that happens, multi-turn still works
# via conversation compression (non-resume path), but we can't test
# the --resume round-trip.
transcript = None
for _ in range(10):
await asyncio.sleep(0.5)
transcript = await download_transcript(test_user_id, session.session_id)
if transcript:
break
if not transcript:
return pytest.skip(
"CLI did not produce a usable transcript — "
"cannot test --resume round-trip in this environment"
)
logger.info(f"Turn 1 transcript uploaded: {len(transcript.content)} bytes")
# Reload session for turn 2
session = await get_chat_session(session.session_id, test_user_id)
assert session, "Session not found after turn 1"
# --- Turn 2: ask model to recall the keyword ---
turn2_msg = "What was the special keyword I asked you to remember?"
turn2_text = ""
turn2_errors: list[str] = []
turn2_ended = False
async for chunk in sdk_service.stream_chat_completion_sdk(
session.session_id,
turn2_msg,
user_id=test_user_id,
session=session,
):
if isinstance(chunk, StreamTextDelta):
turn2_text += chunk.delta
elif isinstance(chunk, StreamError):
turn2_errors.append(chunk.errorText)
elif isinstance(chunk, StreamFinish):
turn2_ended = True
assert turn2_ended, "Turn 2 did not finish"
assert not turn2_errors, f"Turn 2 errors: {turn2_errors}"
assert turn2_text, "Turn 2 produced no text"
assert keyword in turn2_text, (
f"Model did not recall keyword '{keyword}' in turn 2. "
f"Response: {turn2_text[:200]}"
)
logger.info(f"Turn 2 recalled keyword successfully: {turn2_text[:100]}")

View File

@@ -0,0 +1,420 @@
"""End-to-end tests for Copilot streaming with dummy implementations.
These tests verify the complete copilot flow using dummy implementations
for agent generator and SDK service, allowing automated testing without
external LLM calls.
Enable test mode with COPILOT_TEST_MODE=true environment variable.
"""
import asyncio
import os
from uuid import uuid4
import pytest
from backend.copilot.model import ChatMessage, ChatSession, upsert_chat_session
from backend.copilot.response_model import (
StreamError,
StreamFinish,
StreamHeartbeat,
StreamStart,
StreamTextDelta,
)
from backend.copilot.sdk.dummy import stream_chat_completion_dummy
@pytest.fixture(autouse=True)
def enable_test_mode():
"""Enable test mode for all tests in this module."""
os.environ["COPILOT_TEST_MODE"] = "true"
yield
os.environ.pop("COPILOT_TEST_MODE", None)
@pytest.mark.asyncio
async def test_dummy_streaming_basic_flow():
"""Test that dummy streaming produces correct event sequence."""
events = []
async for event in stream_chat_completion_dummy(
session_id="test-session-basic",
message="Hello",
is_user_message=True,
user_id="test-user",
):
events.append(event)
# Verify we got events
assert len(events) > 0, "Should receive events"
# Verify StreamStart
start_events = [e for e in events if isinstance(e, StreamStart)]
assert len(start_events) == 1
assert start_events[0].messageId
assert start_events[0].sessionId
# Verify StreamTextDelta events
text_events = [e for e in events if isinstance(e, StreamTextDelta)]
assert len(text_events) > 0
full_text = "".join(e.delta for e in text_events)
assert len(full_text) > 0
# Verify StreamFinish
finish_events = [e for e in events if isinstance(e, StreamFinish)]
assert len(finish_events) == 1
# Verify order: start before text before finish
start_idx = events.index(start_events[0])
finish_idx = events.index(finish_events[0])
first_text_idx = events.index(text_events[0]) if text_events else -1
if first_text_idx >= 0:
assert start_idx < first_text_idx < finish_idx
print(f"✅ Basic flow: {len(events)} events, {len(text_events)} text deltas")
@pytest.mark.asyncio
async def test_streaming_no_timeout():
"""Test that streaming completes within reasonable time without timeout."""
import time
start_time = time.monotonic()
event_count = 0
async for event in stream_chat_completion_dummy(
session_id="test-session-timeout",
message="count to 10",
is_user_message=True,
user_id="test-user",
):
event_count += 1
elapsed = time.monotonic() - start_time
# Should complete in < 5 seconds (dummy has 0.1s delays between words)
assert elapsed < 5.0, f"Streaming took {elapsed:.1f}s, expected < 5s"
assert event_count > 0, "Should receive events"
print(f"✅ No timeout: completed in {elapsed:.2f}s with {event_count} events")
@pytest.mark.asyncio
async def test_streaming_event_types():
"""Test that all expected event types are present."""
event_types = set()
async for event in stream_chat_completion_dummy(
session_id="test-session-types",
message="test",
is_user_message=True,
user_id="test-user",
):
event_types.add(type(event).__name__)
# Required event types
assert "StreamStart" in event_types, "Missing StreamStart"
assert "StreamTextDelta" in event_types, "Missing StreamTextDelta"
assert "StreamFinish" in event_types, "Missing StreamFinish"
print(f"✅ Event types: {sorted(event_types)}")
@pytest.mark.asyncio
async def test_streaming_text_content():
"""Test that streamed text is coherent and complete."""
text_events = []
async for event in stream_chat_completion_dummy(
session_id="test-session-content",
message="count to 3",
is_user_message=True,
user_id="test-user",
):
if isinstance(event, StreamTextDelta):
text_events.append(event)
# Verify text deltas
assert len(text_events) > 0, "Should have text deltas"
# Reconstruct full text
full_text = "".join(e.delta for e in text_events)
assert len(full_text) > 0, "Text should not be empty"
assert (
"1" in full_text or "counted" in full_text.lower()
), "Text should contain count"
# Verify all deltas have IDs
for text_event in text_events:
assert text_event.id, "Text delta must have ID"
assert text_event.delta, "Text delta must have content"
print(f"✅ Text content: '{full_text}' ({len(text_events)} deltas)")
@pytest.mark.asyncio
async def test_streaming_heartbeat_timing():
"""Test that heartbeats are sent at correct interval during long operations."""
# This test would need a dummy that takes longer
# For now, just verify heartbeat structure if we receive one
heartbeats = []
async for event in stream_chat_completion_dummy(
session_id="test-session-heartbeat",
message="test",
is_user_message=True,
user_id="test-user",
):
if isinstance(event, StreamHeartbeat):
heartbeats.append(event)
# Dummy is fast, so we might not get heartbeats
# But if we do, verify they're valid
if heartbeats:
print(f"✅ Heartbeat structure verified ({len(heartbeats)} received)")
else:
print("✅ No heartbeats (dummy executes quickly)")
@pytest.mark.asyncio
async def test_error_handling():
"""Test that errors are properly formatted and sent."""
# This would require a dummy that can trigger errors
# For now, just verify error event structure
error = StreamError(errorText="Test error", code="test_error")
assert error.errorText == "Test error"
assert error.code == "test_error"
assert str(error.type.value) in ["error", "error"]
print("✅ Error structure verified")
@pytest.mark.asyncio
async def test_concurrent_sessions():
"""Test that multiple sessions can stream concurrently."""
async def stream_session(session_id: str) -> int:
count = 0
async for event in stream_chat_completion_dummy(
session_id=session_id,
message="test",
is_user_message=True,
user_id="test-user",
):
count += 1
return count
# Run 3 concurrent sessions
results = await asyncio.gather(
stream_session("session-1"),
stream_session("session-2"),
stream_session("session-3"),
)
# All should complete successfully
assert all(count > 0 for count in results), "All sessions should produce events"
print(f"✅ Concurrent sessions: {results} events each")
@pytest.mark.asyncio
@pytest.mark.xfail(
reason="Event loop isolation issue with DB operations in tests - needs fixture refactoring"
)
async def test_session_state_persistence():
"""Test that session state is maintained across multiple messages."""
from datetime import datetime, timezone
session_id = f"test-session-{uuid4()}"
user_id = "test-user"
# Create session with first message
session = ChatSession(
session_id=session_id,
user_id=user_id,
messages=[
ChatMessage(role="user", content="Hello"),
ChatMessage(role="assistant", content="Hi there!"),
],
usage=[],
started_at=datetime.now(timezone.utc),
updated_at=datetime.now(timezone.utc),
)
await upsert_chat_session(session)
# Stream second message
events = []
async for event in stream_chat_completion_dummy(
session_id=session_id,
message="How are you?",
is_user_message=True,
user_id=user_id,
session=session, # Pass existing session
):
events.append(event)
# Verify events were produced
assert len(events) > 0, "Should produce events for second message"
# Verify we got a complete response
finish_events = [e for e in events if isinstance(e, StreamFinish)]
assert len(finish_events) == 1, "Should have StreamFinish"
print(f"✅ Session persistence: {len(events)} events for second message")
@pytest.mark.asyncio
async def test_message_deduplication():
"""Test that duplicate messages are filtered out."""
# Simulate receiving duplicate events (e.g., from reconnection)
events = []
# First stream
async for event in stream_chat_completion_dummy(
session_id="test-dedup-1",
message="Hello",
is_user_message=True,
user_id="test-user",
):
events.append(event)
if isinstance(event, StreamFinish):
break
# Count unique message IDs in StreamStart events
start_events = [e for e in events if isinstance(e, StreamStart)]
message_ids = [e.messageId for e in start_events]
# Verify all IDs are present
assert len(message_ids) == len(set(message_ids)), "Message IDs should be unique"
print(f"✅ Deduplication: {len(events)} events, all unique")
@pytest.mark.asyncio
async def test_event_ordering():
"""Test that events arrive in correct order."""
events = []
async for event in stream_chat_completion_dummy(
session_id="test-ordering",
message="Test",
is_user_message=True,
user_id="test-user",
):
events.append(event)
# Find event indices
start_idx = next(
(i for i, e in enumerate(events) if isinstance(e, StreamStart)), None
)
text_indices = [i for i, e in enumerate(events) if isinstance(e, StreamTextDelta)]
finish_idx = next(
(i for i, e in enumerate(events) if isinstance(e, StreamFinish)), None
)
# Verify ordering
assert start_idx is not None, "Should have StreamStart"
assert finish_idx is not None, "Should have StreamFinish"
assert start_idx == 0, "StreamStart should be first"
assert finish_idx == len(events) - 1, "StreamFinish should be last"
if text_indices:
assert all(
start_idx < i < finish_idx for i in text_indices
), "Text deltas should be between start and finish"
print(f"✅ Event ordering: start({start_idx}) < text < finish({finish_idx})")
@pytest.mark.asyncio
async def test_stream_completeness():
"""Test that stream includes all required event types."""
events = []
async for event in stream_chat_completion_dummy(
session_id="test-completeness",
message="Complete stream test",
is_user_message=True,
user_id="test-user",
):
events.append(event)
# Check for required events
has_start = any(isinstance(e, StreamStart) for e in events)
has_text = any(isinstance(e, StreamTextDelta) for e in events)
has_finish = any(isinstance(e, StreamFinish) for e in events)
assert has_start, "Stream must include StreamStart"
assert has_text, "Stream must include text deltas"
assert has_finish, "Stream must include StreamFinish"
# Verify exactly one start and one finish
start_count = sum(1 for e in events if isinstance(e, StreamStart))
finish_count = sum(1 for e in events if isinstance(e, StreamFinish))
assert start_count == 1, f"Should have exactly 1 StreamStart, got {start_count}"
assert finish_count == 1, f"Should have exactly 1 StreamFinish, got {finish_count}"
print(
f"✅ Completeness: 1 start, {sum(1 for e in events if isinstance(e, StreamTextDelta))} text, 1 finish"
)
@pytest.mark.asyncio
async def test_text_delta_consistency():
"""Test that text deltas have consistent IDs and build coherent text."""
text_events = []
async for event in stream_chat_completion_dummy(
session_id="test-consistency",
message="Test consistency",
is_user_message=True,
user_id="test-user",
):
if isinstance(event, StreamTextDelta):
text_events.append(event)
# Verify all text deltas have IDs
assert all(e.id for e in text_events), "All text deltas must have IDs"
# Verify all deltas have the same ID (same text block)
if text_events:
first_id = text_events[0].id
assert all(
e.id == first_id for e in text_events
), "All text deltas should share the same block ID"
# Verify deltas build coherent text
full_text = "".join(e.delta for e in text_events)
assert len(full_text) > 0, "Deltas should build non-empty text"
assert (
full_text == full_text.strip()
), "Text should not have leading/trailing whitespace artifacts"
print(
f"✅ Consistency: {len(text_events)} deltas with ID '{text_events[0].id if text_events else 'N/A'}', text: '{full_text}'"
)
if __name__ == "__main__":
# Run tests directly
print("Running Copilot E2E tests with dummy implementations...")
print("=" * 60)
asyncio.run(test_dummy_streaming_basic_flow())
asyncio.run(test_streaming_no_timeout())
asyncio.run(test_streaming_event_types())
asyncio.run(test_streaming_text_content())
asyncio.run(test_streaming_heartbeat_timing())
asyncio.run(test_error_handling())
asyncio.run(test_concurrent_sessions())
asyncio.run(test_session_state_persistence())
asyncio.run(test_message_deduplication())
asyncio.run(test_event_ordering())
asyncio.run(test_stream_completeness())
asyncio.run(test_text_delta_consistency())
print("=" * 60)
print("✅ All E2E tests passed!")

View File

@@ -3,15 +3,17 @@ from typing import TYPE_CHECKING, Any
from openai.types.chat import ChatCompletionToolParam
from backend.api.features.chat.model import ChatSession
from backend.api.features.chat.tracking import track_tool_called
from backend.copilot.model import ChatSession
from backend.copilot.tracking import track_tool_called
from .add_understanding import AddUnderstandingTool
from .agent_output import AgentOutputTool
from .base import BaseTool
from .bash_exec import BashExecTool
from .create_agent import CreateAgentTool
from .customize_agent import CustomizeAgentTool
from .edit_agent import EditAgentTool
from .feature_requests import CreateFeatureRequestTool, SearchFeatureRequestsTool
from .find_agent import FindAgentTool
from .find_block import FindBlockTool
from .find_library_agent import FindLibraryAgentTool
@@ -19,6 +21,7 @@ from .get_doc_page import GetDocPageTool
from .run_agent import RunAgentTool
from .run_block import RunBlockTool
from .search_docs import SearchDocsTool
from .web_fetch import WebFetchTool
from .workspace_files import (
DeleteWorkspaceFileTool,
ListWorkspaceFilesTool,
@@ -27,7 +30,7 @@ from .workspace_files import (
)
if TYPE_CHECKING:
from backend.api.features.chat.response_model import StreamToolOutputAvailable
from backend.copilot.response_model import StreamToolOutputAvailable
logger = logging.getLogger(__name__)
@@ -45,6 +48,14 @@ TOOL_REGISTRY: dict[str, BaseTool] = {
"view_agent_output": AgentOutputTool(),
"search_docs": SearchDocsTool(),
"get_doc_page": GetDocPageTool(),
# Web fetch for safe URL retrieval
"web_fetch": WebFetchTool(),
# Sandboxed code execution (bubblewrap)
"bash_exec": BashExecTool(),
# Persistent workspace tools (cloud storage, survives across sessions)
# Feature request tools
"search_feature_requests": SearchFeatureRequestsTool(),
"create_feature_request": CreateFeatureRequestTool(),
# Workspace tools for CoPilot file operations
"list_workspace_files": ListWorkspaceFilesTool(),
"read_workspace_file": ReadWorkspaceFileTool(),

View File

@@ -6,11 +6,11 @@ import pytest
from prisma.types import ProfileCreateInput
from pydantic import SecretStr
from backend.api.features.chat.model import ChatSession
from backend.api.features.store import db as store_db
from backend.blocks.firecrawl.scrape import FirecrawlScrapeBlock
from backend.blocks.io import AgentInputBlock, AgentOutputBlock
from backend.blocks.llm import AITextGeneratorBlock
from backend.copilot.model import ChatSession
from backend.data.db import prisma
from backend.data.graph import Graph, Link, Node, create_graph
from backend.data.model import APIKeyCredentials

View File

@@ -3,11 +3,9 @@
import logging
from typing import Any
from backend.api.features.chat.model import ChatSession
from backend.data.understanding import (
BusinessUnderstandingInput,
upsert_business_understanding,
)
from backend.copilot.model import ChatSession
from backend.data.db_accessors import understanding_db
from backend.data.understanding import BusinessUnderstandingInput
from .base import BaseTool
from .models import ErrorResponse, ToolResponseBase, UnderstandingUpdatedResponse
@@ -99,7 +97,9 @@ and automations for the user's specific needs."""
]
# Upsert with merge
understanding = await upsert_business_understanding(user_id, input_data)
understanding = await understanding_db().upsert_business_understanding(
user_id, input_data
)
# Build current understanding summary (filter out empty values)
current_understanding = {

View File

@@ -19,6 +19,7 @@ from .core import (
get_all_relevant_agents_for_generation,
get_library_agent_by_graph_id,
get_library_agent_by_id,
get_library_agents_by_ids,
get_library_agents_for_generation,
graph_to_json,
json_to_graph,
@@ -49,6 +50,7 @@ __all__ = [
"get_all_relevant_agents_for_generation",
"get_library_agent_by_graph_id",
"get_library_agent_by_id",
"get_library_agents_by_ids",
"get_library_agents_for_generation",
"get_user_message_for_error",
"graph_to_json",

View File

@@ -3,11 +3,11 @@
import logging
import re
import uuid
from collections.abc import Sequence
from typing import Any, NotRequired, TypedDict
from backend.api.features.library import db as library_db
from backend.api.features.store import db as store_db
from backend.data.graph import Graph, Link, Node, get_graph, get_store_listed_graphs
from backend.data.db_accessors import graph_db, library_db, store_db
from backend.data.graph import Graph, Link, Node
from backend.util.exceptions import DatabaseError, NotFoundError
from .service import (
@@ -79,7 +79,7 @@ AgentSummary = LibraryAgentSummary | MarketplaceAgentSummary | dict[str, Any]
def _to_dict_list(
agents: list[AgentSummary] | list[dict[str, Any]] | None,
agents: Sequence[AgentSummary] | Sequence[dict[str, Any]] | None,
) -> list[dict[str, Any]] | None:
"""Convert typed agent summaries to plain dicts for external service calls."""
if agents is None:
@@ -145,8 +145,9 @@ async def get_library_agent_by_id(
Returns:
LibraryAgentSummary if found, None otherwise
"""
db = library_db()
try:
agent = await library_db.get_library_agent_by_graph_id(user_id, agent_id)
agent = await db.get_library_agent_by_graph_id(user_id, agent_id)
if agent:
logger.debug(f"Found library agent by graph_id: {agent.name}")
return LibraryAgentSummary(
@@ -163,7 +164,7 @@ async def get_library_agent_by_id(
logger.debug(f"Could not fetch library agent by graph_id {agent_id}: {e}")
try:
agent = await library_db.get_library_agent(agent_id, user_id)
agent = await db.get_library_agent(agent_id, user_id)
if agent:
logger.debug(f"Found library agent by library_id: {agent.name}")
return LibraryAgentSummary(
@@ -190,6 +191,36 @@ async def get_library_agent_by_id(
get_library_agent_by_graph_id = get_library_agent_by_id
async def get_library_agents_by_ids(
user_id: str,
agent_ids: list[str],
) -> list[LibraryAgentSummary]:
"""Fetch multiple library agents by their IDs.
Args:
user_id: The user ID
agent_ids: List of agent IDs (can be graph_ids or library agent IDs)
Returns:
List of LibraryAgentSummary for found agents (silently skips not found)
"""
agents: list[LibraryAgentSummary] = []
for agent_id in agent_ids:
try:
agent = await get_library_agent_by_id(user_id, agent_id)
if agent:
agents.append(agent)
logger.debug(f"Fetched library agent by ID: {agent['name']}")
else:
logger.warning(f"Library agent not found for ID: {agent_id}")
except Exception as e:
logger.warning(f"Failed to fetch library agent {agent_id}: {e}")
continue
logger.info(f"Fetched {len(agents)}/{len(agent_ids)} library agents by ID")
return agents
async def get_library_agents_for_generation(
user_id: str,
search_query: str | None = None,
@@ -214,10 +245,17 @@ async def get_library_agents_for_generation(
Returns:
List of LibraryAgentSummary with schemas and recent executions for sub-agent composition
"""
search_term = search_query.strip() if search_query else None
if search_term and len(search_term) > 100:
raise ValueError(
f"Search query is too long ({len(search_term)} chars, max 100). "
f"Please use a shorter, more specific search term."
)
try:
response = await library_db.list_library_agents(
response = await library_db().list_library_agents(
user_id=user_id,
search_term=search_query,
search_term=search_term,
page=1,
page_size=max_results,
include_executions=True,
@@ -271,9 +309,16 @@ async def search_marketplace_agents_for_generation(
Returns:
List of LibraryAgentSummary with full input/output schemas
"""
search_term = search_query.strip()
if len(search_term) > 100:
raise ValueError(
f"Search query is too long ({len(search_term)} chars, max 100). "
f"Please use a shorter, more specific search term."
)
try:
response = await store_db.get_store_agents(
search_query=search_query,
response = await store_db().get_store_agents(
search_query=search_term,
page=1,
page_size=max_results,
)
@@ -286,7 +331,7 @@ async def search_marketplace_agents_for_generation(
return []
graph_ids = [agent.agent_graph_id for agent in agents_with_graphs]
graphs = await get_store_listed_graphs(*graph_ids)
graphs = await graph_db().get_store_listed_graphs(graph_ids)
results: list[LibraryAgentSummary] = []
for agent in agents_with_graphs:
@@ -424,7 +469,7 @@ def extract_search_terms_from_steps(
async def enrich_library_agents_from_steps(
user_id: str,
decomposition_result: DecompositionResult | dict[str, Any],
existing_agents: list[AgentSummary] | list[dict[str, Any]],
existing_agents: Sequence[AgentSummary] | Sequence[dict[str, Any]],
exclude_graph_id: str | None = None,
include_marketplace: bool = True,
max_additional_results: int = 10,
@@ -448,7 +493,7 @@ async def enrich_library_agents_from_steps(
search_terms = extract_search_terms_from_steps(decomposition_result)
if not search_terms:
return existing_agents
return list(existing_agents)
existing_ids: set[str] = set()
existing_names: set[str] = set()
@@ -511,7 +556,7 @@ async def enrich_library_agents_from_steps(
async def decompose_goal(
description: str,
context: str = "",
library_agents: list[AgentSummary] | None = None,
library_agents: Sequence[AgentSummary] | None = None,
) -> DecompositionResult | None:
"""Break down a goal into steps or return clarifying questions.
@@ -539,22 +584,16 @@ async def decompose_goal(
async def generate_agent(
instructions: DecompositionResult | dict[str, Any],
library_agents: list[AgentSummary] | list[dict[str, Any]] | None = None,
operation_id: str | None = None,
task_id: str | None = None,
library_agents: Sequence[AgentSummary] | Sequence[dict[str, Any]] | None = None,
) -> dict[str, Any] | None:
"""Generate agent JSON from instructions.
Args:
instructions: Structured instructions from decompose_goal
library_agents: User's library agents available for sub-agent composition
operation_id: Operation ID for async processing (enables Redis Streams
completion notification)
task_id: Task ID for async processing (enables Redis Streams persistence
and SSE delivery)
Returns:
Agent JSON dict, {"status": "accepted"} for async, error dict {"type": "error", ...}, or None on error
Agent JSON dict, error dict {"type": "error", ...}, or None on error
Raises:
AgentGeneratorNotConfiguredError: If the external service is not configured.
@@ -562,13 +601,9 @@ async def generate_agent(
_check_service_configured()
logger.info("Calling external Agent Generator service for generate_agent")
result = await generate_agent_external(
dict(instructions), _to_dict_list(library_agents), operation_id, task_id
dict(instructions), _to_dict_list(library_agents)
)
# Don't modify async response
if result and result.get("status") == "accepted":
return result
if result:
if isinstance(result, dict) and result.get("type") == "error":
return result
@@ -673,9 +708,10 @@ async def save_agent_to_library(
Tuple of (created Graph, LibraryAgent)
"""
graph = json_to_graph(agent_json)
db = library_db()
if is_update:
return await library_db.update_graph_in_library(graph, user_id)
return await library_db.create_graph_in_library(graph, user_id)
return await db.update_graph_in_library(graph, user_id)
return await db.create_graph_in_library(graph, user_id)
def graph_to_json(graph: Graph) -> dict[str, Any]:
@@ -735,12 +771,14 @@ async def get_agent_as_json(
Returns:
Agent as JSON dict or None if not found
"""
graph = await get_graph(agent_id, version=None, user_id=user_id)
db = graph_db()
graph = await db.get_graph(agent_id, version=None, user_id=user_id)
if not graph and user_id:
try:
library_agent = await library_db.get_library_agent(agent_id, user_id)
graph = await get_graph(
library_agent = await library_db().get_library_agent(agent_id, user_id)
graph = await db.get_graph(
library_agent.graph_id, version=None, user_id=user_id
)
except NotFoundError:
@@ -755,9 +793,7 @@ async def get_agent_as_json(
async def generate_agent_patch(
update_request: str,
current_agent: dict[str, Any],
library_agents: list[AgentSummary] | None = None,
operation_id: str | None = None,
task_id: str | None = None,
library_agents: Sequence[AgentSummary] | None = None,
) -> dict[str, Any] | None:
"""Update an existing agent using natural language.
@@ -770,12 +806,10 @@ async def generate_agent_patch(
update_request: Natural language description of changes
current_agent: Current agent JSON
library_agents: User's library agents available for sub-agent composition
operation_id: Operation ID for async processing (enables Redis Streams callback)
task_id: Task ID for async processing (enables Redis Streams callback)
Returns:
Updated agent JSON, clarifying questions dict {"type": "clarifying_questions", ...},
{"status": "accepted"} for async, error dict {"type": "error", ...}, or None on error
error dict {"type": "error", ...}, or None on error
Raises:
AgentGeneratorNotConfiguredError: If the external service is not configured.
@@ -786,8 +820,6 @@ async def generate_agent_patch(
update_request,
current_agent,
_to_dict_list(library_agents),
operation_id,
task_id,
)

View File

@@ -102,10 +102,15 @@ async def generate_agent_dummy(
instructions: dict[str, Any],
library_agents: list[dict[str, Any]] | None = None,
operation_id: str | None = None,
task_id: str | None = None,
session_id: str | None = None,
) -> dict[str, Any]:
"""Return dummy agent JSON after a simulated delay."""
logger.info("Using dummy agent generator for generate_agent (30s delay)")
"""Return dummy agent synchronously (blocks for 30s, returns agent JSON).
Note: operation_id and session_id parameters are ignored - we always use synchronous mode.
"""
logger.info(
"Using dummy agent generator (sync mode): returning agent JSON after 30s"
)
await asyncio.sleep(30)
return _generate_dummy_agent_json()
@@ -115,10 +120,16 @@ async def generate_agent_patch_dummy(
current_agent: dict[str, Any],
library_agents: list[dict[str, Any]] | None = None,
operation_id: str | None = None,
task_id: str | None = None,
session_id: str | None = None,
) -> dict[str, Any]:
"""Return dummy patched agent (returns the current agent with updated description)."""
logger.info("Using dummy agent generator for generate_agent_patch")
"""Return dummy patched agent synchronously (blocks for 30s, returns patched agent JSON).
Note: operation_id and session_id parameters are ignored - we always use synchronous mode.
"""
logger.info(
"Using dummy agent generator patch (sync mode): returning patched agent after 30s"
)
await asyncio.sleep(30)
patched = current_agent.copy()
patched["description"] = (
f"{current_agent.get('description', '')} (updated: {update_request})"

View File

@@ -242,24 +242,18 @@ async def decompose_goal_external(
async def generate_agent_external(
instructions: dict[str, Any],
library_agents: list[dict[str, Any]] | None = None,
operation_id: str | None = None,
task_id: str | None = None,
) -> dict[str, Any] | None:
"""Call the external service to generate an agent from instructions.
Args:
instructions: Structured instructions from decompose_goal
library_agents: User's library agents available for sub-agent composition
operation_id: Operation ID for async processing (enables Redis Streams callback)
task_id: Task ID for async processing (enables Redis Streams callback)
Returns:
Agent JSON dict, {"status": "accepted"} for async, or error dict {"type": "error", ...} on error
Agent JSON dict or error dict {"type": "error", ...} on error
"""
if _is_dummy_mode():
return await generate_agent_dummy(
instructions, library_agents, operation_id, task_id
)
return await generate_agent_dummy(instructions, library_agents)
client = _get_client()
@@ -267,25 +261,9 @@ async def generate_agent_external(
payload: dict[str, Any] = {"instructions": instructions}
if library_agents:
payload["library_agents"] = library_agents
if operation_id and task_id:
payload["operation_id"] = operation_id
payload["task_id"] = task_id
try:
response = await client.post("/api/generate-agent", json=payload)
# Handle 202 Accepted for async processing
if response.status_code == 202:
logger.info(
f"Agent Generator accepted async request "
f"(operation_id={operation_id}, task_id={task_id})"
)
return {
"status": "accepted",
"operation_id": operation_id,
"task_id": task_id,
}
response.raise_for_status()
data = response.json()
@@ -317,8 +295,6 @@ async def generate_agent_patch_external(
update_request: str,
current_agent: dict[str, Any],
library_agents: list[dict[str, Any]] | None = None,
operation_id: str | None = None,
task_id: str | None = None,
) -> dict[str, Any] | None:
"""Call the external service to generate a patch for an existing agent.
@@ -327,14 +303,14 @@ async def generate_agent_patch_external(
current_agent: Current agent JSON
library_agents: User's library agents available for sub-agent composition
operation_id: Operation ID for async processing (enables Redis Streams callback)
task_id: Task ID for async processing (enables Redis Streams callback)
session_id: Session ID for async processing (enables Redis Streams callback)
Returns:
Updated agent JSON, clarifying questions dict, {"status": "accepted"} for async, or error dict on error
"""
if _is_dummy_mode():
return await generate_agent_patch_dummy(
update_request, current_agent, library_agents, operation_id, task_id
update_request, current_agent, library_agents
)
client = _get_client()
@@ -346,25 +322,9 @@ async def generate_agent_patch_external(
}
if library_agents:
payload["library_agents"] = library_agents
if operation_id and task_id:
payload["operation_id"] = operation_id
payload["task_id"] = task_id
try:
response = await client.post("/api/update-agent", json=payload)
# Handle 202 Accepted for async processing
if response.status_code == 202:
logger.info(
f"Agent Generator accepted async update request "
f"(operation_id={operation_id}, task_id={task_id})"
)
return {
"status": "accepted",
"operation_id": operation_id,
"task_id": task_id,
}
response.raise_for_status()
data = response.json()
@@ -419,6 +379,8 @@ async def customize_template_external(
template_agent: The template agent JSON to customize
modification_request: Natural language description of customizations
context: Additional context (e.g., answers to previous questions)
operation_id: Operation ID for async processing (enables Redis Streams callback)
session_id: Session ID for async processing (enables Redis Streams callback)
Returns:
Customized agent JSON, clarifying questions dict, or error dict on error

View File

@@ -7,10 +7,9 @@ from typing import Any
from pydantic import BaseModel, field_validator
from backend.api.features.chat.model import ChatSession
from backend.api.features.library import db as library_db
from backend.api.features.library.model import LibraryAgent
from backend.data import execution as execution_db
from backend.copilot.model import ChatSession
from backend.data.db_accessors import execution_db, library_db
from backend.data.execution import ExecutionStatus, GraphExecution, GraphExecutionMeta
from .base import BaseTool
@@ -165,10 +164,12 @@ class AgentOutputTool(BaseTool):
Resolve agent from provided identifiers.
Returns (library_agent, error_message).
"""
lib_db = library_db()
# Priority 1: Exact library agent ID
if library_agent_id:
try:
agent = await library_db.get_library_agent(library_agent_id, user_id)
agent = await lib_db.get_library_agent(library_agent_id, user_id)
return agent, None
except Exception as e:
logger.warning(f"Failed to get library agent by ID: {e}")
@@ -182,7 +183,7 @@ class AgentOutputTool(BaseTool):
return None, f"Agent '{store_slug}' not found in marketplace"
# Find in user's library by graph_id
agent = await library_db.get_library_agent_by_graph_id(user_id, graph.id)
agent = await lib_db.get_library_agent_by_graph_id(user_id, graph.id)
if not agent:
return (
None,
@@ -194,7 +195,7 @@ class AgentOutputTool(BaseTool):
# Priority 3: Fuzzy name search in library
if agent_name:
try:
response = await library_db.list_library_agents(
response = await lib_db.list_library_agents(
user_id=user_id,
search_term=agent_name,
page_size=5,
@@ -228,9 +229,11 @@ class AgentOutputTool(BaseTool):
Fetch execution(s) based on filters.
Returns (single_execution, available_executions_meta, error_message).
"""
exec_db = execution_db()
# If specific execution_id provided, fetch it directly
if execution_id:
execution = await execution_db.get_graph_execution(
execution = await exec_db.get_graph_execution(
user_id=user_id,
execution_id=execution_id,
include_node_executions=False,
@@ -240,7 +243,7 @@ class AgentOutputTool(BaseTool):
return execution, [], None
# Get completed executions with time filters
executions = await execution_db.get_graph_executions(
executions = await exec_db.get_graph_executions(
graph_id=graph_id,
user_id=user_id,
statuses=[ExecutionStatus.COMPLETED],
@@ -254,7 +257,7 @@ class AgentOutputTool(BaseTool):
# If only one execution, fetch full details
if len(executions) == 1:
full_execution = await execution_db.get_graph_execution(
full_execution = await exec_db.get_graph_execution(
user_id=user_id,
execution_id=executions[0].id,
include_node_executions=False,
@@ -262,7 +265,7 @@ class AgentOutputTool(BaseTool):
return full_execution, [], None
# Multiple executions - return latest with full details, plus list of available
full_execution = await execution_db.get_graph_execution(
full_execution = await exec_db.get_graph_execution(
user_id=user_id,
execution_id=executions[0].id,
include_node_executions=False,
@@ -380,7 +383,7 @@ class AgentOutputTool(BaseTool):
and not input_data.store_slug
):
# Fetch execution directly to get graph_id
execution = await execution_db.get_graph_execution(
execution = await execution_db().get_graph_execution(
user_id=user_id,
execution_id=input_data.execution_id,
include_node_executions=False,
@@ -392,7 +395,7 @@ class AgentOutputTool(BaseTool):
)
# Find library agent by graph_id
agent = await library_db.get_library_agent_by_graph_id(
agent = await library_db().get_library_agent_by_graph_id(
user_id, execution.graph_id
)
if not agent:

View File

@@ -4,8 +4,7 @@ import logging
import re
from typing import Literal
from backend.api.features.library import db as library_db
from backend.api.features.store import db as store_db
from backend.data.db_accessors import library_db, store_db
from backend.util.exceptions import DatabaseError, NotFoundError
from .models import (
@@ -45,8 +44,10 @@ async def _get_library_agent_by_id(user_id: str, agent_id: str) -> AgentInfo | N
Returns:
AgentInfo if found, None otherwise
"""
lib_db = library_db()
try:
agent = await library_db.get_library_agent_by_graph_id(user_id, agent_id)
agent = await lib_db.get_library_agent_by_graph_id(user_id, agent_id)
if agent:
logger.debug(f"Found library agent by graph_id: {agent.name}")
return AgentInfo(
@@ -71,7 +72,7 @@ async def _get_library_agent_by_id(user_id: str, agent_id: str) -> AgentInfo | N
)
try:
agent = await library_db.get_library_agent(agent_id, user_id)
agent = await lib_db.get_library_agent(agent_id, user_id)
if agent:
logger.debug(f"Found library agent by library_id: {agent.name}")
return AgentInfo(
@@ -133,7 +134,7 @@ async def search_agents(
try:
if source == "marketplace":
logger.info(f"Searching marketplace for: {query}")
results = await store_db.get_store_agents(search_query=query, page_size=5)
results = await store_db().get_store_agents(search_query=query, page_size=5)
for agent in results.agents:
agents.append(
AgentInfo(
@@ -159,7 +160,7 @@ async def search_agents(
if not agents:
logger.info(f"Searching user library for: {query}")
results = await library_db.list_library_agents(
results = await library_db().list_library_agents(
user_id=user_id, # type: ignore[arg-type]
search_term=query,
page_size=10,

View File

@@ -5,8 +5,8 @@ from typing import Any
from openai.types.chat import ChatCompletionToolParam
from backend.api.features.chat.model import ChatSession
from backend.api.features.chat.response_model import StreamToolOutputAvailable
from backend.copilot.model import ChatSession
from backend.copilot.response_model import StreamToolOutputAvailable
from .models import ErrorResponse, NeedLoginResponse, ToolResponseBase
@@ -36,16 +36,6 @@ class BaseTool:
"""Whether this tool requires authentication."""
return False
@property
def is_long_running(self) -> bool:
"""Whether this tool is long-running and should execute in background.
Long-running tools (like agent generation) are executed via background
tasks to survive SSE disconnections. The result is persisted to chat
history and visible when the user refreshes.
"""
return False
def as_openai_tool(self) -> ChatCompletionToolParam:
"""Convert to OpenAI tool format."""
return ChatCompletionToolParam(

View File

@@ -0,0 +1,124 @@
"""Bash execution tool — run shell commands in a bubblewrap sandbox.
Full Bash scripting is allowed (loops, conditionals, pipes, functions, etc.).
Safety comes from OS-level isolation (bubblewrap): only system dirs visible
read-only, writable workspace only, clean env, no network.
Requires bubblewrap (``bwrap``) — the tool is disabled when bwrap is not
available (e.g. macOS development).
"""
import logging
from typing import Any
from backend.copilot.model import ChatSession
from .base import BaseTool
from .models import BashExecResponse, ErrorResponse, ToolResponseBase
from .sandbox import get_workspace_dir, has_full_sandbox, run_sandboxed
logger = logging.getLogger(__name__)
class BashExecTool(BaseTool):
"""Execute Bash commands in a bubblewrap sandbox."""
@property
def name(self) -> str:
return "bash_exec"
@property
def description(self) -> str:
if not has_full_sandbox():
return (
"Bash execution is DISABLED — bubblewrap sandbox is not "
"available on this platform. Do not call this tool."
)
return (
"Execute a Bash command or script in a bubblewrap sandbox. "
"Full Bash scripting is supported (loops, conditionals, pipes, "
"functions, etc.). "
"The sandbox shares the same working directory as the SDK Read/Write "
"tools — files created by either are accessible to both. "
"SECURITY: Only system directories (/usr, /bin, /lib, /etc) are "
"visible read-only, the per-session workspace is the only writable "
"path, environment variables are wiped (no secrets), all network "
"access is blocked at the kernel level, and resource limits are "
"enforced (max 64 processes, 512MB memory, 50MB file size). "
"Application code, configs, and other directories are NOT accessible. "
"To fetch web content, use the web_fetch tool instead. "
"Execution is killed after the timeout (default 30s, max 120s). "
"Returns stdout and stderr. "
"Useful for file manipulation, data processing with Unix tools "
"(grep, awk, sed, jq, etc.), and running shell scripts."
)
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"command": {
"type": "string",
"description": "Bash command or script to execute.",
},
"timeout": {
"type": "integer",
"description": (
"Max execution time in seconds (default 30, max 120)."
),
"default": 30,
},
},
"required": ["command"],
}
@property
def requires_auth(self) -> bool:
return False
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs: Any,
) -> ToolResponseBase:
session_id = session.session_id if session else None
if not has_full_sandbox():
return ErrorResponse(
message="bash_exec requires bubblewrap sandbox (Linux only).",
error="sandbox_unavailable",
session_id=session_id,
)
command: str = (kwargs.get("command") or "").strip()
timeout: int = kwargs.get("timeout", 30)
if not command:
return ErrorResponse(
message="No command provided.",
error="empty_command",
session_id=session_id,
)
workspace = get_workspace_dir(session_id or "default")
stdout, stderr, exit_code, timed_out = await run_sandboxed(
command=["bash", "-c", command],
cwd=workspace,
timeout=timeout,
)
return BashExecResponse(
message=(
"Execution timed out"
if timed_out
else f"Command executed (exit {exit_code})"
),
stdout=stdout,
stderr=stderr,
exit_code=exit_code,
timed_out=timed_out,
session_id=session_id,
)

View File

@@ -3,14 +3,13 @@
import logging
from typing import Any
from backend.api.features.chat.model import ChatSession
from backend.copilot.model import ChatSession
from .agent_generator import (
AgentGeneratorNotConfiguredError,
decompose_goal,
enrich_library_agents_from_steps,
generate_agent,
get_all_relevant_agents_for_generation,
get_user_message_for_error,
save_agent_to_library,
)
@@ -18,10 +17,10 @@ from .base import BaseTool
from .models import (
AgentPreviewResponse,
AgentSavedResponse,
AsyncProcessingResponse,
ClarificationNeededResponse,
ClarifyingQuestion,
ErrorResponse,
SuggestedGoalResponse,
ToolResponseBase,
)
@@ -39,17 +38,16 @@ class CreateAgentTool(BaseTool):
def description(self) -> str:
return (
"Create a new agent workflow from a natural language description. "
"First generates a preview, then saves to library if save=true."
"First generates a preview, then saves to library if save=true. "
"\n\nIMPORTANT: Before calling this tool, search for relevant existing agents "
"using find_library_agent that could be used as building blocks. "
"Pass their IDs in the library_agent_ids parameter so the generator can compose them."
)
@property
def requires_auth(self) -> bool:
return True
@property
def is_long_running(self) -> bool:
return True
@property
def parameters(self) -> dict[str, Any]:
return {
@@ -69,6 +67,15 @@ class CreateAgentTool(BaseTool):
"Include any preferences or constraints mentioned by the user."
),
},
"library_agent_ids": {
"type": "array",
"items": {"type": "string"},
"description": (
"List of library agent IDs to use as building blocks. "
"Search for relevant agents using find_library_agent first, "
"then pass their IDs here so they can be composed into the new agent."
),
},
"save": {
"type": "boolean",
"description": (
@@ -96,12 +103,14 @@ class CreateAgentTool(BaseTool):
"""
description = kwargs.get("description", "").strip()
context = kwargs.get("context", "")
library_agent_ids = kwargs.get("library_agent_ids", [])
save = kwargs.get("save", True)
session_id = session.session_id if session else None
# Extract async processing params (passed by long-running tool handler)
operation_id = kwargs.get("_operation_id")
task_id = kwargs.get("_task_id")
logger.info(
f"[AGENT_CREATE_DEBUG] START - description_len={len(description)}, "
f"library_agent_ids={library_agent_ids}, save={save}, user_id={user_id}, session_id={session_id}"
)
if not description:
return ErrorResponse(
@@ -110,25 +119,34 @@ class CreateAgentTool(BaseTool):
session_id=session_id,
)
# Fetch library agents by IDs if provided
library_agents = None
if user_id:
if user_id and library_agent_ids:
try:
library_agents = await get_all_relevant_agents_for_generation(
from .agent_generator import get_library_agents_by_ids
library_agents = await get_library_agents_by_ids(
user_id=user_id,
search_query=description,
include_marketplace=True,
agent_ids=library_agent_ids,
)
logger.debug(
f"Found {len(library_agents)} relevant agents for sub-agent composition"
f"Fetched {len(library_agents)} library agents by ID for sub-agent composition"
)
except Exception as e:
logger.warning(f"Failed to fetch library agents: {e}")
logger.warning(f"Failed to fetch library agents by IDs: {e}")
try:
decomposition_result = await decompose_goal(
description, context, library_agents
)
logger.info(
f"[AGENT_CREATE_DEBUG] DECOMPOSE - type={decomposition_result.get('type') if decomposition_result else None}, "
f"session_id={session_id}"
)
except AgentGeneratorNotConfiguredError:
logger.error(
f"[AGENT_CREATE_DEBUG] ERROR - AgentGeneratorNotConfigured, session_id={session_id}"
)
return ErrorResponse(
message=(
"Agent generation is not available. "
@@ -186,26 +204,28 @@ class CreateAgentTool(BaseTool):
if decomposition_result.get("type") == "unachievable_goal":
suggested = decomposition_result.get("suggested_goal", "")
reason = decomposition_result.get("reason", "")
return ErrorResponse(
return SuggestedGoalResponse(
message=(
f"This goal cannot be accomplished with the available blocks. "
f"{reason} "
f"Suggestion: {suggested}"
f"This goal cannot be accomplished with the available blocks. {reason}"
),
error="unachievable_goal",
details={"suggested_goal": suggested, "reason": reason},
suggested_goal=suggested,
reason=reason,
original_goal=description,
goal_type="unachievable",
session_id=session_id,
)
if decomposition_result.get("type") == "vague_goal":
suggested = decomposition_result.get("suggested_goal", "")
return ErrorResponse(
message=(
f"The goal is too vague to create a specific workflow. "
f"Suggestion: {suggested}"
),
error="vague_goal",
details={"suggested_goal": suggested},
reason = decomposition_result.get(
"reason", "The goal needs more specific details"
)
return SuggestedGoalResponse(
message="The goal is too vague to create a specific workflow.",
suggested_goal=suggested,
reason=reason,
original_goal=description,
goal_type="vague",
session_id=session_id,
)
@@ -227,10 +247,17 @@ class CreateAgentTool(BaseTool):
agent_json = await generate_agent(
decomposition_result,
library_agents,
operation_id=operation_id,
task_id=task_id,
)
logger.info(
f"[AGENT_CREATE_DEBUG] GENERATE - "
f"success={agent_json is not None}, "
f"is_error={isinstance(agent_json, dict) and agent_json.get('type') == 'error'}, "
f"session_id={session_id}"
)
except AgentGeneratorNotConfiguredError:
logger.error(
f"[AGENT_CREATE_DEBUG] ERROR - AgentGeneratorNotConfigured during generation, session_id={session_id}"
)
return ErrorResponse(
message=(
"Agent generation is not available. "
@@ -273,25 +300,20 @@ class CreateAgentTool(BaseTool):
session_id=session_id,
)
# Check if Agent Generator accepted for async processing
if agent_json.get("status") == "accepted":
logger.info(
f"Agent generation delegated to async processing "
f"(operation_id={operation_id}, task_id={task_id})"
)
return AsyncProcessingResponse(
message="Agent generation started. You'll be notified when it's complete.",
operation_id=operation_id,
task_id=task_id,
session_id=session_id,
)
agent_name = agent_json.get("name", "Generated Agent")
agent_description = agent_json.get("description", "")
node_count = len(agent_json.get("nodes", []))
link_count = len(agent_json.get("links", []))
logger.info(
f"[AGENT_CREATE_DEBUG] AGENT_JSON - name={agent_name}, "
f"nodes={node_count}, links={link_count}, save={save}, session_id={session_id}"
)
if not save:
logger.info(
f"[AGENT_CREATE_DEBUG] RETURN - AgentPreviewResponse, session_id={session_id}"
)
return AgentPreviewResponse(
message=(
f"I've generated an agent called '{agent_name}' with {node_count} blocks. "
@@ -317,6 +339,13 @@ class CreateAgentTool(BaseTool):
agent_json, user_id
)
logger.info(
f"[AGENT_CREATE_DEBUG] SAVED - graph_id={created_graph.id}, "
f"library_agent_id={library_agent.id}, session_id={session_id}"
)
logger.info(
f"[AGENT_CREATE_DEBUG] RETURN - AgentSavedResponse, session_id={session_id}"
)
return AgentSavedResponse(
message=f"Agent '{created_graph.name}' has been saved to your library!",
agent_id=created_graph.id,
@@ -327,6 +356,12 @@ class CreateAgentTool(BaseTool):
session_id=session_id,
)
except Exception as e:
logger.error(
f"[AGENT_CREATE_DEBUG] ERROR - save_failed: {str(e)}, session_id={session_id}"
)
logger.info(
f"[AGENT_CREATE_DEBUG] RETURN - ErrorResponse (save_failed), session_id={session_id}"
)
return ErrorResponse(
message=f"Failed to save the agent: {str(e)}",
error="save_failed",

View File

@@ -0,0 +1,127 @@
"""Tests for CreateAgentTool response types."""
from unittest.mock import AsyncMock, patch
import pytest
from backend.copilot.tools.create_agent import CreateAgentTool
from backend.copilot.tools.models import (
ClarificationNeededResponse,
ErrorResponse,
SuggestedGoalResponse,
)
from ._test_data import make_session
_TEST_USER_ID = "test-user-create-agent"
@pytest.fixture
def tool():
return CreateAgentTool()
@pytest.fixture
def session():
return make_session(_TEST_USER_ID)
@pytest.mark.asyncio
async def test_missing_description_returns_error(tool, session):
"""Missing description returns ErrorResponse."""
result = await tool._execute(user_id=_TEST_USER_ID, session=session, description="")
assert isinstance(result, ErrorResponse)
assert result.error == "Missing description parameter"
@pytest.mark.asyncio
async def test_vague_goal_returns_suggested_goal_response(tool, session):
"""vague_goal decomposition result returns SuggestedGoalResponse, not ErrorResponse."""
vague_result = {
"type": "vague_goal",
"suggested_goal": "Monitor Twitter mentions for a specific keyword and send a daily digest email",
}
with (
patch(
"backend.copilot.tools.create_agent.decompose_goal",
new_callable=AsyncMock,
return_value=vague_result,
),
):
result = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
description="monitor social media",
)
assert isinstance(result, SuggestedGoalResponse)
assert result.goal_type == "vague"
assert result.suggested_goal == vague_result["suggested_goal"]
assert result.original_goal == "monitor social media"
assert result.reason == "The goal needs more specific details"
assert not isinstance(result, ErrorResponse)
@pytest.mark.asyncio
async def test_unachievable_goal_returns_suggested_goal_response(tool, session):
"""unachievable_goal decomposition result returns SuggestedGoalResponse, not ErrorResponse."""
unachievable_result = {
"type": "unachievable_goal",
"suggested_goal": "Summarize the latest news articles on a topic and send them by email",
"reason": "There are no blocks for mind-reading.",
}
with (
patch(
"backend.copilot.tools.create_agent.decompose_goal",
new_callable=AsyncMock,
return_value=unachievable_result,
),
):
result = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
description="read my mind",
)
assert isinstance(result, SuggestedGoalResponse)
assert result.goal_type == "unachievable"
assert result.suggested_goal == unachievable_result["suggested_goal"]
assert result.original_goal == "read my mind"
assert result.reason == unachievable_result["reason"]
assert not isinstance(result, ErrorResponse)
@pytest.mark.asyncio
async def test_clarifying_questions_returns_clarification_needed_response(
tool, session
):
"""clarifying_questions decomposition result returns ClarificationNeededResponse."""
clarifying_result = {
"type": "clarifying_questions",
"questions": [
{
"question": "What platform should be monitored?",
"keyword": "platform",
"example": "Twitter, Reddit",
}
],
}
with (
patch(
"backend.copilot.tools.create_agent.decompose_goal",
new_callable=AsyncMock,
return_value=clarifying_result,
),
):
result = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
description="monitor social media and alert me",
)
assert isinstance(result, ClarificationNeededResponse)
assert len(result.questions) == 1
assert result.questions[0].keyword == "platform"

View File

@@ -3,9 +3,9 @@
import logging
from typing import Any
from backend.api.features.chat.model import ChatSession
from backend.api.features.store import db as store_db
from backend.api.features.store.exceptions import AgentNotFoundError
from backend.copilot.model import ChatSession
from backend.data.db_accessors import store_db as get_store_db
from .agent_generator import (
AgentGeneratorNotConfiguredError,
@@ -46,10 +46,6 @@ class CustomizeAgentTool(BaseTool):
def requires_auth(self) -> bool:
return True
@property
def is_long_running(self) -> bool:
return True
@property
def parameters(self) -> dict[str, Any]:
return {
@@ -137,6 +133,8 @@ class CustomizeAgentTool(BaseTool):
creator_username, agent_slug = parts
store_db = get_store_db()
# Fetch the marketplace agent details
try:
agent_details = await store_db.get_store_agent_details(

View File

@@ -3,13 +3,12 @@
import logging
from typing import Any
from backend.api.features.chat.model import ChatSession
from backend.copilot.model import ChatSession
from .agent_generator import (
AgentGeneratorNotConfiguredError,
generate_agent_patch,
get_agent_as_json,
get_all_relevant_agents_for_generation,
get_user_message_for_error,
save_agent_to_library,
)
@@ -17,7 +16,6 @@ from .base import BaseTool
from .models import (
AgentPreviewResponse,
AgentSavedResponse,
AsyncProcessingResponse,
ClarificationNeededResponse,
ClarifyingQuestion,
ErrorResponse,
@@ -38,17 +36,16 @@ class EditAgentTool(BaseTool):
def description(self) -> str:
return (
"Edit an existing agent from the user's library using natural language. "
"Generates updates to the agent while preserving unchanged parts."
"Generates updates to the agent while preserving unchanged parts. "
"\n\nIMPORTANT: Before calling this tool, if the changes involve adding new "
"functionality, search for relevant existing agents using find_library_agent "
"that could be used as building blocks. Pass their IDs in library_agent_ids."
)
@property
def requires_auth(self) -> bool:
return True
@property
def is_long_running(self) -> bool:
return True
@property
def parameters(self) -> dict[str, Any]:
return {
@@ -74,6 +71,15 @@ class EditAgentTool(BaseTool):
"Additional context or answers to previous clarifying questions."
),
},
"library_agent_ids": {
"type": "array",
"items": {"type": "string"},
"description": (
"List of library agent IDs to use as building blocks for the changes. "
"If adding new functionality, search for relevant agents using "
"find_library_agent first, then pass their IDs here."
),
},
"save": {
"type": "boolean",
"description": (
@@ -102,13 +108,10 @@ class EditAgentTool(BaseTool):
agent_id = kwargs.get("agent_id", "").strip()
changes = kwargs.get("changes", "").strip()
context = kwargs.get("context", "")
library_agent_ids = kwargs.get("library_agent_ids", [])
save = kwargs.get("save", True)
session_id = session.session_id if session else None
# Extract async processing params (passed by long-running tool handler)
operation_id = kwargs.get("_operation_id")
task_id = kwargs.get("_task_id")
if not agent_id:
return ErrorResponse(
message="Please provide the agent ID to edit.",
@@ -132,21 +135,25 @@ class EditAgentTool(BaseTool):
session_id=session_id,
)
# Fetch library agents by IDs if provided
library_agents = None
if user_id:
if user_id and library_agent_ids:
try:
from .agent_generator import get_library_agents_by_ids
graph_id = current_agent.get("id")
library_agents = await get_all_relevant_agents_for_generation(
# Filter out the current agent being edited
filtered_ids = [id for id in library_agent_ids if id != graph_id]
library_agents = await get_library_agents_by_ids(
user_id=user_id,
search_query=changes,
exclude_graph_id=graph_id,
include_marketplace=True,
agent_ids=filtered_ids,
)
logger.debug(
f"Found {len(library_agents)} relevant agents for sub-agent composition"
f"Fetched {len(library_agents)} library agents by ID for sub-agent composition"
)
except Exception as e:
logger.warning(f"Failed to fetch library agents: {e}")
logger.warning(f"Failed to fetch library agents by IDs: {e}")
update_request = changes
if context:
@@ -157,8 +164,6 @@ class EditAgentTool(BaseTool):
update_request,
current_agent,
library_agents,
operation_id=operation_id,
task_id=task_id,
)
except AgentGeneratorNotConfiguredError:
return ErrorResponse(
@@ -178,19 +183,6 @@ class EditAgentTool(BaseTool):
session_id=session_id,
)
# Check if Agent Generator accepted for async processing
if result.get("status") == "accepted":
logger.info(
f"Agent edit delegated to async processing "
f"(operation_id={operation_id}, task_id={task_id})"
)
return AsyncProcessingResponse(
message="Agent edit started. You'll be notified when it's complete.",
operation_id=operation_id,
task_id=task_id,
session_id=session_id,
)
# Check if the result is an error from the external service
if isinstance(result, dict) and result.get("type") == "error":
error_msg = result.get("error", "Unknown error")

View File

@@ -0,0 +1,462 @@
"""Feature request tools - search and create feature requests via Linear."""
import logging
from typing import Any
from pydantic import SecretStr
from backend.blocks.linear._api import LinearClient
from backend.copilot.model import ChatSession
from backend.data.db_accessors import user_db
from backend.data.model import APIKeyCredentials
from backend.util.settings import Settings
from .base import BaseTool
from .models import (
ErrorResponse,
FeatureRequestCreatedResponse,
FeatureRequestInfo,
FeatureRequestSearchResponse,
NoResultsResponse,
ToolResponseBase,
)
logger = logging.getLogger(__name__)
MAX_SEARCH_RESULTS = 10
# GraphQL queries/mutations
SEARCH_ISSUES_QUERY = """
query SearchFeatureRequests($term: String!, $filter: IssueFilter, $first: Int) {
searchIssues(term: $term, filter: $filter, first: $first) {
nodes {
id
identifier
title
}
}
}
"""
CUSTOMER_UPSERT_MUTATION = """
mutation CustomerUpsert($input: CustomerUpsertInput!) {
customerUpsert(input: $input) {
success
customer {
id
name
externalIds
}
}
}
"""
ISSUE_CREATE_MUTATION = """
mutation IssueCreate($input: IssueCreateInput!) {
issueCreate(input: $input) {
success
issue {
id
identifier
title
url
}
}
}
"""
CUSTOMER_NEED_CREATE_MUTATION = """
mutation CustomerNeedCreate($input: CustomerNeedCreateInput!) {
customerNeedCreate(input: $input) {
success
need {
id
body
customer {
id
name
}
issue {
id
identifier
title
url
}
}
}
}
"""
_settings: Settings | None = None
def _get_settings() -> Settings:
global _settings
if _settings is None:
_settings = Settings()
return _settings
def _get_linear_config() -> tuple[LinearClient, str, str]:
"""Return a configured Linear client, project ID, and team ID.
Raises RuntimeError if any required setting is missing.
"""
secrets = _get_settings().secrets
if not secrets.copilot_linear_api_key:
raise RuntimeError("COPILOT_LINEAR_API_KEY is not configured")
if not secrets.linear_feature_request_project_id:
raise RuntimeError("LINEAR_FEATURE_REQUEST_PROJECT_ID is not configured")
if not secrets.linear_feature_request_team_id:
raise RuntimeError("LINEAR_FEATURE_REQUEST_TEAM_ID is not configured")
credentials = APIKeyCredentials(
id="system-linear",
provider="linear",
api_key=SecretStr(secrets.copilot_linear_api_key),
title="System Linear API Key",
)
client = LinearClient(credentials=credentials)
return (
client,
secrets.linear_feature_request_project_id,
secrets.linear_feature_request_team_id,
)
class SearchFeatureRequestsTool(BaseTool):
"""Tool for searching existing feature requests in Linear."""
@property
def name(self) -> str:
return "search_feature_requests"
@property
def description(self) -> str:
return (
"Search existing feature requests to check if a similar request "
"already exists before creating a new one. Returns matching feature "
"requests with their ID, title, and description."
)
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search term to find matching feature requests.",
},
},
"required": ["query"],
}
@property
def requires_auth(self) -> bool:
return True
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs,
) -> ToolResponseBase:
query = kwargs.get("query", "").strip()
session_id = session.session_id if session else None
if not query:
return ErrorResponse(
message="Please provide a search query.",
error="Missing query parameter",
session_id=session_id,
)
try:
client, project_id, _team_id = _get_linear_config()
data = await client.query(
SEARCH_ISSUES_QUERY,
{
"term": query,
"filter": {
"project": {"id": {"eq": project_id}},
},
"first": MAX_SEARCH_RESULTS,
},
)
nodes = data.get("searchIssues", {}).get("nodes", [])
if not nodes:
return NoResultsResponse(
message=f"No feature requests found matching '{query}'.",
suggestions=[
"Try different keywords",
"Use broader search terms",
"You can create a new feature request if none exists",
],
session_id=session_id,
)
results = [
FeatureRequestInfo(
id=node["id"],
identifier=node["identifier"],
title=node["title"],
)
for node in nodes
]
return FeatureRequestSearchResponse(
message=f"Found {len(results)} feature request(s) matching '{query}'.",
results=results,
count=len(results),
query=query,
session_id=session_id,
)
except Exception as e:
logger.exception("Failed to search feature requests")
return ErrorResponse(
message="Failed to search feature requests.",
error=str(e),
session_id=session_id,
)
class CreateFeatureRequestTool(BaseTool):
"""Tool for creating feature requests (or adding needs to existing ones)."""
@property
def name(self) -> str:
return "create_feature_request"
@property
def description(self) -> str:
return (
"Create a new feature request or add a customer need to an existing one. "
"Always search first with search_feature_requests to avoid duplicates. "
"If a matching request exists, pass its ID as existing_issue_id to add "
"the user's need to it instead of creating a duplicate. "
"IMPORTANT: Never include personally identifiable information (PII) in "
"the title or description — no names, emails, phone numbers, company "
"names, or other identifying details. Write titles and descriptions in "
"generic, feature-focused language."
)
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"title": {
"type": "string",
"description": (
"Title for the feature request. Must be generic and "
"feature-focused — do not include any user names, emails, "
"company names, or other PII."
),
},
"description": {
"type": "string",
"description": (
"Detailed description of what the user wants and why. "
"Must not contain any personally identifiable information "
"(PII) — describe the feature need generically without "
"referencing specific users, companies, or contact details."
),
},
"existing_issue_id": {
"type": "string",
"description": (
"If adding a need to an existing feature request, "
"provide its Linear issue ID (from search results). "
"Omit to create a new feature request."
),
},
},
"required": ["title", "description"],
}
@property
def requires_auth(self) -> bool:
return True
async def _find_or_create_customer(
self, client: LinearClient, user_id: str, name: str
) -> dict:
"""Find existing customer by user_id or create a new one via upsert.
Args:
client: Linear API client.
user_id: Stable external ID used to deduplicate customers.
name: Human-readable display name (e.g. the user's email).
"""
data = await client.mutate(
CUSTOMER_UPSERT_MUTATION,
{
"input": {
"name": name,
"externalId": user_id,
},
},
)
result = data.get("customerUpsert", {})
if not result.get("success"):
raise RuntimeError(f"Failed to upsert customer: {data}")
return result["customer"]
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs,
) -> ToolResponseBase:
title = kwargs.get("title", "").strip()
description = kwargs.get("description", "").strip()
existing_issue_id = kwargs.get("existing_issue_id")
session_id = session.session_id if session else None
if not title or not description:
return ErrorResponse(
message="Both title and description are required.",
error="Missing required parameters",
session_id=session_id,
)
if not user_id:
return ErrorResponse(
message="Authentication required to create feature requests.",
error="Missing user_id",
session_id=session_id,
)
try:
client, project_id, team_id = _get_linear_config()
except Exception as e:
logger.exception("Failed to initialize Linear client")
return ErrorResponse(
message="Failed to create feature request.",
error=str(e),
session_id=session_id,
)
# Resolve a human-readable name (email) for the Linear customer record.
# Fall back to user_id if the lookup fails or returns None.
try:
customer_display_name = (
await user_db().get_user_email_by_id(user_id) or user_id
)
except Exception:
customer_display_name = user_id
# Step 1: Find or create customer for this user
try:
customer = await self._find_or_create_customer(
client, user_id, customer_display_name
)
customer_id = customer["id"]
customer_name = customer["name"]
except Exception as e:
logger.exception("Failed to upsert customer in Linear")
return ErrorResponse(
message="Failed to create feature request.",
error=str(e),
session_id=session_id,
)
# Step 2: Create or reuse issue
issue_id: str | None = None
issue_identifier: str | None = None
if existing_issue_id:
# Add need to existing issue - we still need the issue details for response
is_new_issue = False
issue_id = existing_issue_id
else:
# Create new issue in the feature requests project
try:
data = await client.mutate(
ISSUE_CREATE_MUTATION,
{
"input": {
"title": title,
"description": description,
"teamId": team_id,
"projectId": project_id,
},
},
)
result = data.get("issueCreate", {})
if not result.get("success"):
return ErrorResponse(
message="Failed to create feature request issue.",
error=str(data),
session_id=session_id,
)
issue = result["issue"]
issue_id = issue["id"]
issue_identifier = issue.get("identifier")
except Exception as e:
logger.exception("Failed to create feature request issue")
return ErrorResponse(
message="Failed to create feature request.",
error=str(e),
session_id=session_id,
)
is_new_issue = True
# Step 3: Create customer need on the issue
try:
data = await client.mutate(
CUSTOMER_NEED_CREATE_MUTATION,
{
"input": {
"customerId": customer_id,
"issueId": issue_id,
"body": description,
"priority": 0,
},
},
)
need_result = data.get("customerNeedCreate", {})
if not need_result.get("success"):
orphaned = (
{"issue_id": issue_id, "issue_identifier": issue_identifier}
if is_new_issue
else None
)
return ErrorResponse(
message="Failed to attach customer need to the feature request.",
error=str(data),
details=orphaned,
session_id=session_id,
)
need = need_result["need"]
issue_info = need["issue"]
except Exception as e:
logger.exception("Failed to create customer need")
orphaned = (
{"issue_id": issue_id, "issue_identifier": issue_identifier}
if is_new_issue
else None
)
return ErrorResponse(
message="Failed to attach customer need to the feature request.",
error=str(e),
details=orphaned,
session_id=session_id,
)
return FeatureRequestCreatedResponse(
message=(
f"{'Created new feature request' if is_new_issue else 'Added your request to existing feature request'}: "
f"{issue_info['title']}."
),
issue_id=issue_info["id"],
issue_identifier=issue_info["identifier"],
issue_title=issue_info["title"],
issue_url=issue_info.get("url", ""),
is_new_issue=is_new_issue,
customer_name=customer_name,
session_id=session_id,
)

View File

@@ -0,0 +1,610 @@
"""Tests for SearchFeatureRequestsTool and CreateFeatureRequestTool."""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from ._test_data import make_session
from .feature_requests import CreateFeatureRequestTool, SearchFeatureRequestsTool
from .models import (
ErrorResponse,
FeatureRequestCreatedResponse,
FeatureRequestSearchResponse,
NoResultsResponse,
)
_TEST_USER_ID = "test-user-feature-requests"
_TEST_USER_EMAIL = "testuser@example.com"
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
_FAKE_PROJECT_ID = "test-project-id"
_FAKE_TEAM_ID = "test-team-id"
def _mock_linear_config(*, query_return=None, mutate_return=None):
"""Return a patched _get_linear_config that yields a mock LinearClient."""
client = AsyncMock()
if query_return is not None:
client.query.return_value = query_return
if mutate_return is not None:
client.mutate.return_value = mutate_return
return (
patch(
"backend.copilot.tools.feature_requests._get_linear_config",
return_value=(client, _FAKE_PROJECT_ID, _FAKE_TEAM_ID),
),
client,
)
def _search_response(nodes: list[dict]) -> dict:
return {"searchIssues": {"nodes": nodes}}
def _customer_upsert_response(
customer_id: str = "cust-1", name: str = _TEST_USER_EMAIL, success: bool = True
) -> dict:
return {
"customerUpsert": {
"success": success,
"customer": {"id": customer_id, "name": name, "externalIds": [name]},
}
}
def _issue_create_response(
issue_id: str = "issue-1",
identifier: str = "FR-1",
title: str = "New Feature",
success: bool = True,
) -> dict:
return {
"issueCreate": {
"success": success,
"issue": {
"id": issue_id,
"identifier": identifier,
"title": title,
"url": f"https://linear.app/issue/{identifier}",
},
}
}
def _need_create_response(
need_id: str = "need-1",
issue_id: str = "issue-1",
identifier: str = "FR-1",
title: str = "New Feature",
success: bool = True,
) -> dict:
return {
"customerNeedCreate": {
"success": success,
"need": {
"id": need_id,
"body": "description",
"customer": {"id": "cust-1", "name": _TEST_USER_EMAIL},
"issue": {
"id": issue_id,
"identifier": identifier,
"title": title,
"url": f"https://linear.app/issue/{identifier}",
},
},
}
}
# ===========================================================================
# SearchFeatureRequestsTool
# ===========================================================================
class TestSearchFeatureRequestsTool:
"""Tests for SearchFeatureRequestsTool._execute."""
@pytest.mark.asyncio(loop_scope="session")
async def test_successful_search(self):
session = make_session(user_id=_TEST_USER_ID)
nodes = [
{
"id": "id-1",
"identifier": "FR-1",
"title": "Dark mode",
},
{
"id": "id-2",
"identifier": "FR-2",
"title": "Dark theme",
},
]
patcher, _ = _mock_linear_config(query_return=_search_response(nodes))
with patcher:
tool = SearchFeatureRequestsTool()
resp = await tool._execute(
user_id=_TEST_USER_ID, session=session, query="dark mode"
)
assert isinstance(resp, FeatureRequestSearchResponse)
assert resp.count == 2
assert resp.results[0].id == "id-1"
assert resp.results[1].identifier == "FR-2"
assert resp.query == "dark mode"
@pytest.mark.asyncio(loop_scope="session")
async def test_no_results(self):
session = make_session(user_id=_TEST_USER_ID)
patcher, _ = _mock_linear_config(query_return=_search_response([]))
with patcher:
tool = SearchFeatureRequestsTool()
resp = await tool._execute(
user_id=_TEST_USER_ID, session=session, query="nonexistent"
)
assert isinstance(resp, NoResultsResponse)
assert "nonexistent" in resp.message
@pytest.mark.asyncio(loop_scope="session")
async def test_empty_query_returns_error(self):
session = make_session(user_id=_TEST_USER_ID)
tool = SearchFeatureRequestsTool()
resp = await tool._execute(user_id=_TEST_USER_ID, session=session, query=" ")
assert isinstance(resp, ErrorResponse)
assert resp.error is not None
assert "query" in resp.error.lower()
@pytest.mark.asyncio(loop_scope="session")
async def test_missing_query_returns_error(self):
session = make_session(user_id=_TEST_USER_ID)
tool = SearchFeatureRequestsTool()
resp = await tool._execute(user_id=_TEST_USER_ID, session=session)
assert isinstance(resp, ErrorResponse)
@pytest.mark.asyncio(loop_scope="session")
async def test_api_failure(self):
session = make_session(user_id=_TEST_USER_ID)
patcher, client = _mock_linear_config()
client.query.side_effect = RuntimeError("Linear API down")
with patcher:
tool = SearchFeatureRequestsTool()
resp = await tool._execute(
user_id=_TEST_USER_ID, session=session, query="test"
)
assert isinstance(resp, ErrorResponse)
assert resp.error is not None
assert "Linear API down" in resp.error
@pytest.mark.asyncio(loop_scope="session")
async def test_malformed_node_returns_error(self):
"""A node missing required keys should be caught by the try/except."""
session = make_session(user_id=_TEST_USER_ID)
# Node missing 'identifier' key
bad_nodes = [{"id": "id-1", "title": "Missing identifier"}]
patcher, _ = _mock_linear_config(query_return=_search_response(bad_nodes))
with patcher:
tool = SearchFeatureRequestsTool()
resp = await tool._execute(
user_id=_TEST_USER_ID, session=session, query="test"
)
assert isinstance(resp, ErrorResponse)
@pytest.mark.asyncio(loop_scope="session")
async def test_linear_client_init_failure(self):
session = make_session(user_id=_TEST_USER_ID)
with patch(
"backend.copilot.tools.feature_requests._get_linear_config",
side_effect=RuntimeError("No API key"),
):
tool = SearchFeatureRequestsTool()
resp = await tool._execute(
user_id=_TEST_USER_ID, session=session, query="test"
)
assert isinstance(resp, ErrorResponse)
assert resp.error is not None
assert "No API key" in resp.error
# ===========================================================================
# CreateFeatureRequestTool
# ===========================================================================
class TestCreateFeatureRequestTool:
"""Tests for CreateFeatureRequestTool._execute."""
@pytest.fixture(autouse=True)
def _patch_email_lookup(self):
mock_user_db = MagicMock()
mock_user_db.get_user_email_by_id = AsyncMock(return_value=_TEST_USER_EMAIL)
with patch(
"backend.copilot.tools.feature_requests.user_db",
return_value=mock_user_db,
):
yield
# ---- Happy paths -------------------------------------------------------
@pytest.mark.asyncio(loop_scope="session")
async def test_create_new_issue(self):
"""Full happy path: upsert customer -> create issue -> attach need."""
session = make_session(user_id=_TEST_USER_ID)
patcher, client = _mock_linear_config()
client.mutate.side_effect = [
_customer_upsert_response(),
_issue_create_response(),
_need_create_response(),
]
with patcher:
tool = CreateFeatureRequestTool()
resp = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
title="New Feature",
description="Please add this",
)
assert isinstance(resp, FeatureRequestCreatedResponse)
assert resp.is_new_issue is True
assert resp.issue_identifier == "FR-1"
assert resp.customer_name == _TEST_USER_EMAIL
assert client.mutate.call_count == 3
@pytest.mark.asyncio(loop_scope="session")
async def test_add_need_to_existing_issue(self):
"""When existing_issue_id is provided, skip issue creation."""
session = make_session(user_id=_TEST_USER_ID)
patcher, client = _mock_linear_config()
client.mutate.side_effect = [
_customer_upsert_response(),
_need_create_response(issue_id="existing-1", identifier="FR-99"),
]
with patcher:
tool = CreateFeatureRequestTool()
resp = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
title="Existing Feature",
description="Me too",
existing_issue_id="existing-1",
)
assert isinstance(resp, FeatureRequestCreatedResponse)
assert resp.is_new_issue is False
assert resp.issue_id == "existing-1"
# Only 2 mutations: customer upsert + need create (no issue create)
assert client.mutate.call_count == 2
# ---- Validation errors -------------------------------------------------
@pytest.mark.asyncio(loop_scope="session")
async def test_missing_title(self):
session = make_session(user_id=_TEST_USER_ID)
tool = CreateFeatureRequestTool()
resp = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
title="",
description="some desc",
)
assert isinstance(resp, ErrorResponse)
assert resp.error is not None
assert "required" in resp.error.lower()
@pytest.mark.asyncio(loop_scope="session")
async def test_missing_description(self):
session = make_session(user_id=_TEST_USER_ID)
tool = CreateFeatureRequestTool()
resp = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
title="Some title",
description="",
)
assert isinstance(resp, ErrorResponse)
assert resp.error is not None
assert "required" in resp.error.lower()
@pytest.mark.asyncio(loop_scope="session")
async def test_missing_user_id(self):
session = make_session(user_id=_TEST_USER_ID)
tool = CreateFeatureRequestTool()
resp = await tool._execute(
user_id=None,
session=session,
title="Some title",
description="Some desc",
)
assert isinstance(resp, ErrorResponse)
assert resp.error is not None
assert "user_id" in resp.error.lower()
# ---- Linear client init failure ----------------------------------------
@pytest.mark.asyncio(loop_scope="session")
async def test_linear_client_init_failure(self):
session = make_session(user_id=_TEST_USER_ID)
with patch(
"backend.copilot.tools.feature_requests._get_linear_config",
side_effect=RuntimeError("No API key"),
):
tool = CreateFeatureRequestTool()
resp = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
title="Title",
description="Desc",
)
assert isinstance(resp, ErrorResponse)
assert resp.error is not None
assert "No API key" in resp.error
# ---- Customer upsert failures ------------------------------------------
@pytest.mark.asyncio(loop_scope="session")
async def test_customer_upsert_api_error(self):
session = make_session(user_id=_TEST_USER_ID)
patcher, client = _mock_linear_config()
client.mutate.side_effect = RuntimeError("Customer API error")
with patcher:
tool = CreateFeatureRequestTool()
resp = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
title="Title",
description="Desc",
)
assert isinstance(resp, ErrorResponse)
assert resp.error is not None
assert "Customer API error" in resp.error
@pytest.mark.asyncio(loop_scope="session")
async def test_customer_upsert_not_success(self):
session = make_session(user_id=_TEST_USER_ID)
patcher, client = _mock_linear_config()
client.mutate.return_value = _customer_upsert_response(success=False)
with patcher:
tool = CreateFeatureRequestTool()
resp = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
title="Title",
description="Desc",
)
assert isinstance(resp, ErrorResponse)
@pytest.mark.asyncio(loop_scope="session")
async def test_customer_malformed_response(self):
"""Customer dict missing 'id' key should be caught."""
session = make_session(user_id=_TEST_USER_ID)
patcher, client = _mock_linear_config()
# success=True but customer has no 'id'
client.mutate.return_value = {
"customerUpsert": {
"success": True,
"customer": {"name": _TEST_USER_ID},
}
}
with patcher:
tool = CreateFeatureRequestTool()
resp = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
title="Title",
description="Desc",
)
assert isinstance(resp, ErrorResponse)
# ---- Issue creation failures -------------------------------------------
@pytest.mark.asyncio(loop_scope="session")
async def test_issue_create_api_error(self):
session = make_session(user_id=_TEST_USER_ID)
patcher, client = _mock_linear_config()
client.mutate.side_effect = [
_customer_upsert_response(),
RuntimeError("Issue create failed"),
]
with patcher:
tool = CreateFeatureRequestTool()
resp = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
title="Title",
description="Desc",
)
assert isinstance(resp, ErrorResponse)
assert resp.error is not None
assert "Issue create failed" in resp.error
@pytest.mark.asyncio(loop_scope="session")
async def test_issue_create_not_success(self):
session = make_session(user_id=_TEST_USER_ID)
patcher, client = _mock_linear_config()
client.mutate.side_effect = [
_customer_upsert_response(),
_issue_create_response(success=False),
]
with patcher:
tool = CreateFeatureRequestTool()
resp = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
title="Title",
description="Desc",
)
assert isinstance(resp, ErrorResponse)
assert "Failed to create feature request issue" in resp.message
@pytest.mark.asyncio(loop_scope="session")
async def test_issue_create_malformed_response(self):
"""issueCreate success=True but missing 'issue' key."""
session = make_session(user_id=_TEST_USER_ID)
patcher, client = _mock_linear_config()
client.mutate.side_effect = [
_customer_upsert_response(),
{"issueCreate": {"success": True}}, # no 'issue' key
]
with patcher:
tool = CreateFeatureRequestTool()
resp = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
title="Title",
description="Desc",
)
assert isinstance(resp, ErrorResponse)
# ---- Customer need attachment failures ---------------------------------
@pytest.mark.asyncio(loop_scope="session")
async def test_need_create_api_error_new_issue(self):
"""Need creation fails after new issue was created -> orphaned issue info."""
session = make_session(user_id=_TEST_USER_ID)
patcher, client = _mock_linear_config()
client.mutate.side_effect = [
_customer_upsert_response(),
_issue_create_response(issue_id="orphan-1", identifier="FR-10"),
RuntimeError("Need attach failed"),
]
with patcher:
tool = CreateFeatureRequestTool()
resp = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
title="Title",
description="Desc",
)
assert isinstance(resp, ErrorResponse)
assert resp.error is not None
assert "Need attach failed" in resp.error
assert resp.details is not None
assert resp.details["issue_id"] == "orphan-1"
assert resp.details["issue_identifier"] == "FR-10"
@pytest.mark.asyncio(loop_scope="session")
async def test_need_create_api_error_existing_issue(self):
"""Need creation fails on existing issue -> no orphaned info."""
session = make_session(user_id=_TEST_USER_ID)
patcher, client = _mock_linear_config()
client.mutate.side_effect = [
_customer_upsert_response(),
RuntimeError("Need attach failed"),
]
with patcher:
tool = CreateFeatureRequestTool()
resp = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
title="Title",
description="Desc",
existing_issue_id="existing-1",
)
assert isinstance(resp, ErrorResponse)
assert resp.details is None
@pytest.mark.asyncio(loop_scope="session")
async def test_need_create_not_success_includes_orphaned_info(self):
"""customerNeedCreate returns success=False -> includes orphaned issue."""
session = make_session(user_id=_TEST_USER_ID)
patcher, client = _mock_linear_config()
client.mutate.side_effect = [
_customer_upsert_response(),
_issue_create_response(issue_id="orphan-2", identifier="FR-20"),
_need_create_response(success=False),
]
with patcher:
tool = CreateFeatureRequestTool()
resp = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
title="Title",
description="Desc",
)
assert isinstance(resp, ErrorResponse)
assert resp.details is not None
assert resp.details["issue_id"] == "orphan-2"
assert resp.details["issue_identifier"] == "FR-20"
@pytest.mark.asyncio(loop_scope="session")
async def test_need_create_not_success_existing_issue_no_details(self):
"""customerNeedCreate fails on existing issue -> no orphaned info."""
session = make_session(user_id=_TEST_USER_ID)
patcher, client = _mock_linear_config()
client.mutate.side_effect = [
_customer_upsert_response(),
_need_create_response(success=False),
]
with patcher:
tool = CreateFeatureRequestTool()
resp = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
title="Title",
description="Desc",
existing_issue_id="existing-1",
)
assert isinstance(resp, ErrorResponse)
assert resp.details is None
@pytest.mark.asyncio(loop_scope="session")
async def test_need_create_malformed_response(self):
"""need_result missing 'need' key after success=True."""
session = make_session(user_id=_TEST_USER_ID)
patcher, client = _mock_linear_config()
client.mutate.side_effect = [
_customer_upsert_response(),
_issue_create_response(),
{"customerNeedCreate": {"success": True}}, # no 'need' key
]
with patcher:
tool = CreateFeatureRequestTool()
resp = await tool._execute(
user_id=_TEST_USER_ID,
session=session,
title="Title",
description="Desc",
)
assert isinstance(resp, ErrorResponse)
assert resp.details is not None
assert resp.details["issue_id"] == "issue-1"

View File

@@ -2,7 +2,7 @@
from typing import Any
from backend.api.features.chat.model import ChatSession
from backend.copilot.model import ChatSession
from .agent_search import search_agents
from .base import BaseTool

View File

@@ -3,18 +3,18 @@ from typing import Any
from prisma.enums import ContentType
from backend.api.features.chat.model import ChatSession
from backend.api.features.chat.tools.base import BaseTool, ToolResponseBase
from backend.api.features.chat.tools.models import (
from backend.blocks import get_block
from backend.blocks._base import BlockType
from backend.copilot.model import ChatSession
from backend.data.db_accessors import search
from .base import BaseTool, ToolResponseBase
from .models import (
BlockInfoSummary,
BlockInputFieldInfo,
BlockListResponse,
ErrorResponse,
NoResultsResponse,
)
from backend.api.features.store.hybrid_search import unified_hybrid_search
from backend.blocks import get_block
from backend.blocks._base import BlockType
logger = logging.getLogger(__name__)
@@ -55,7 +55,8 @@ class FindBlockTool(BaseTool):
"Blocks are reusable components that perform specific tasks like "
"sending emails, making API calls, processing text, etc. "
"IMPORTANT: Use this tool FIRST to get the block's 'id' before calling run_block. "
"The response includes each block's id, required_inputs, and input_schema."
"The response includes each block's id, name, and description. "
"Call run_block with the block's id **with no inputs** to see detailed inputs/outputs and execute it."
)
@property
@@ -107,7 +108,7 @@ class FindBlockTool(BaseTool):
try:
# Search for blocks using hybrid search
results, total = await unified_hybrid_search(
results, total = await search().unified_hybrid_search(
query=query,
content_types=[ContentType.BLOCK],
page=1,
@@ -124,7 +125,7 @@ class FindBlockTool(BaseTool):
session_id=session_id,
)
# Enrich results with full block information
# Enrich results with block information
blocks: list[BlockInfoSummary] = []
for result in results:
block_id = result["content_id"]
@@ -141,65 +142,12 @@ class FindBlockTool(BaseTool):
):
continue
# Get input/output schemas
input_schema = {}
output_schema = {}
try:
input_schema = block.input_schema.jsonschema()
except Exception as e:
logger.debug(
"Failed to generate input schema for block %s: %s",
block_id,
e,
)
try:
output_schema = block.output_schema.jsonschema()
except Exception as e:
logger.debug(
"Failed to generate output schema for block %s: %s",
block_id,
e,
)
# Get categories from block instance
categories = []
if hasattr(block, "categories") and block.categories:
categories = [cat.value for cat in block.categories]
# Extract required inputs for easier use
required_inputs: list[BlockInputFieldInfo] = []
if input_schema:
properties = input_schema.get("properties", {})
required_fields = set(input_schema.get("required", []))
# Get credential field names to exclude from required inputs
credentials_fields = set(
block.input_schema.get_credentials_fields().keys()
)
for field_name, field_schema in properties.items():
# Skip credential fields - they're handled separately
if field_name in credentials_fields:
continue
required_inputs.append(
BlockInputFieldInfo(
name=field_name,
type=field_schema.get("type", "string"),
description=field_schema.get("description", ""),
required=field_name in required_fields,
default=field_schema.get("default"),
)
)
blocks.append(
BlockInfoSummary(
id=block_id,
name=block.name,
description=block.description or "",
categories=categories,
input_schema=input_schema,
output_schema=output_schema,
required_inputs=required_inputs,
categories=[c.value for c in block.categories],
)
)
@@ -228,8 +176,7 @@ class FindBlockTool(BaseTool):
return BlockListResponse(
message=(
f"Found {len(blocks)} block(s) matching '{query}'. "
"To execute a block, use run_block with the block's 'id' field "
"and provide 'input_data' matching the block's input_schema."
"To see a block's inputs/outputs and execute it, use run_block with the block's 'id' - providing no inputs."
),
blocks=blocks,
count=len(blocks),

View File

@@ -0,0 +1,401 @@
"""Tests for block filtering in FindBlockTool."""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from backend.blocks._base import BlockType
from ._test_data import make_session
from .find_block import (
COPILOT_EXCLUDED_BLOCK_IDS,
COPILOT_EXCLUDED_BLOCK_TYPES,
FindBlockTool,
)
from .models import BlockListResponse
_TEST_USER_ID = "test-user-find-block"
def make_mock_block(
block_id: str,
name: str,
block_type: BlockType,
disabled: bool = False,
input_schema: dict | None = None,
output_schema: dict | None = None,
credentials_fields: dict | None = None,
):
"""Create a mock block for testing."""
mock = MagicMock()
mock.id = block_id
mock.name = name
mock.description = f"{name} description"
mock.block_type = block_type
mock.disabled = disabled
mock.input_schema = MagicMock()
mock.input_schema.jsonschema.return_value = input_schema or {
"properties": {},
"required": [],
}
mock.input_schema.get_credentials_fields.return_value = credentials_fields or {}
mock.output_schema = MagicMock()
mock.output_schema.jsonschema.return_value = output_schema or {}
mock.categories = []
return mock
class TestFindBlockFiltering:
"""Tests for block filtering in FindBlockTool."""
def test_excluded_block_types_contains_expected_types(self):
"""Verify COPILOT_EXCLUDED_BLOCK_TYPES contains all graph-only types."""
assert BlockType.INPUT in COPILOT_EXCLUDED_BLOCK_TYPES
assert BlockType.OUTPUT in COPILOT_EXCLUDED_BLOCK_TYPES
assert BlockType.WEBHOOK in COPILOT_EXCLUDED_BLOCK_TYPES
assert BlockType.WEBHOOK_MANUAL in COPILOT_EXCLUDED_BLOCK_TYPES
assert BlockType.NOTE in COPILOT_EXCLUDED_BLOCK_TYPES
assert BlockType.HUMAN_IN_THE_LOOP in COPILOT_EXCLUDED_BLOCK_TYPES
assert BlockType.AGENT in COPILOT_EXCLUDED_BLOCK_TYPES
def test_excluded_block_ids_contains_smart_decision_maker(self):
"""Verify SmartDecisionMakerBlock is in COPILOT_EXCLUDED_BLOCK_IDS."""
assert "3b191d9f-356f-482d-8238-ba04b6d18381" in COPILOT_EXCLUDED_BLOCK_IDS
@pytest.mark.asyncio(loop_scope="session")
async def test_excluded_block_type_filtered_from_results(self):
"""Verify blocks with excluded BlockTypes are filtered from search results."""
session = make_session(user_id=_TEST_USER_ID)
# Mock search returns an INPUT block (excluded) and a STANDARD block (included)
search_results = [
{"content_id": "input-block-id", "score": 0.9},
{"content_id": "standard-block-id", "score": 0.8},
]
input_block = make_mock_block("input-block-id", "Input Block", BlockType.INPUT)
standard_block = make_mock_block(
"standard-block-id", "HTTP Request", BlockType.STANDARD
)
def mock_get_block(block_id):
return {
"input-block-id": input_block,
"standard-block-id": standard_block,
}.get(block_id)
mock_search_db = MagicMock()
mock_search_db.unified_hybrid_search = AsyncMock(
return_value=(search_results, 2)
)
with patch(
"backend.copilot.tools.find_block.search",
return_value=mock_search_db,
):
with patch(
"backend.copilot.tools.find_block.get_block",
side_effect=mock_get_block,
):
tool = FindBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID, session=session, query="test"
)
# Should only return the standard block, not the INPUT block
assert isinstance(response, BlockListResponse)
assert len(response.blocks) == 1
assert response.blocks[0].id == "standard-block-id"
@pytest.mark.asyncio(loop_scope="session")
async def test_excluded_block_id_filtered_from_results(self):
"""Verify SmartDecisionMakerBlock is filtered from search results."""
session = make_session(user_id=_TEST_USER_ID)
smart_decision_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
search_results = [
{"content_id": smart_decision_id, "score": 0.9},
{"content_id": "normal-block-id", "score": 0.8},
]
# SmartDecisionMakerBlock has STANDARD type but is excluded by ID
smart_block = make_mock_block(
smart_decision_id, "Smart Decision Maker", BlockType.STANDARD
)
normal_block = make_mock_block(
"normal-block-id", "Normal Block", BlockType.STANDARD
)
def mock_get_block(block_id):
return {
smart_decision_id: smart_block,
"normal-block-id": normal_block,
}.get(block_id)
mock_search_db = MagicMock()
mock_search_db.unified_hybrid_search = AsyncMock(
return_value=(search_results, 2)
)
with patch(
"backend.copilot.tools.find_block.search",
return_value=mock_search_db,
):
with patch(
"backend.copilot.tools.find_block.get_block",
side_effect=mock_get_block,
):
tool = FindBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID, session=session, query="decision"
)
# Should only return normal block, not SmartDecisionMakerBlock
assert isinstance(response, BlockListResponse)
assert len(response.blocks) == 1
assert response.blocks[0].id == "normal-block-id"
@pytest.mark.asyncio(loop_scope="session")
async def test_response_size_average_chars_per_block(self):
"""Measure average chars per block in the serialized response."""
session = make_session(user_id=_TEST_USER_ID)
# Realistic block definitions modeled after real blocks
block_defs = [
{
"id": "http-block-id",
"name": "Send Web Request",
"input_schema": {
"properties": {
"url": {
"type": "string",
"description": "The URL to send the request to",
},
"method": {
"type": "string",
"description": "The HTTP method to use",
},
"headers": {
"type": "object",
"description": "Headers to include in the request",
},
"json_format": {
"type": "boolean",
"description": "If true, send the body as JSON",
},
"body": {
"type": "object",
"description": "Form/JSON body payload",
},
"credentials": {
"type": "object",
"description": "HTTP credentials",
},
},
"required": ["url", "method"],
},
"output_schema": {
"properties": {
"response": {
"type": "object",
"description": "The response from the server",
},
"client_error": {
"type": "object",
"description": "Errors on 4xx status codes",
},
"server_error": {
"type": "object",
"description": "Errors on 5xx status codes",
},
"error": {
"type": "string",
"description": "Errors for all other exceptions",
},
},
},
"credentials_fields": {"credentials": True},
},
{
"id": "email-block-id",
"name": "Send Email",
"input_schema": {
"properties": {
"to_email": {
"type": "string",
"description": "Recipient email address",
},
"subject": {
"type": "string",
"description": "Subject of the email",
},
"body": {
"type": "string",
"description": "Body of the email",
},
"config": {
"type": "object",
"description": "SMTP Config",
},
"credentials": {
"type": "object",
"description": "SMTP credentials",
},
},
"required": ["to_email", "subject", "body", "credentials"],
},
"output_schema": {
"properties": {
"status": {
"type": "string",
"description": "Status of the email sending operation",
},
"error": {
"type": "string",
"description": "Error message if sending failed",
},
},
},
"credentials_fields": {"credentials": True},
},
{
"id": "claude-code-block-id",
"name": "Claude Code",
"input_schema": {
"properties": {
"e2b_credentials": {
"type": "object",
"description": "API key for E2B platform",
},
"anthropic_credentials": {
"type": "object",
"description": "API key for Anthropic",
},
"prompt": {
"type": "string",
"description": "Task or instruction for Claude Code",
},
"timeout": {
"type": "integer",
"description": "Sandbox timeout in seconds",
},
"setup_commands": {
"type": "array",
"description": "Shell commands to run before execution",
},
"working_directory": {
"type": "string",
"description": "Working directory for Claude Code",
},
"session_id": {
"type": "string",
"description": "Session ID to resume a conversation",
},
"sandbox_id": {
"type": "string",
"description": "Sandbox ID to reconnect to",
},
"conversation_history": {
"type": "string",
"description": "Previous conversation history",
},
"dispose_sandbox": {
"type": "boolean",
"description": "Whether to dispose sandbox after execution",
},
},
"required": [
"e2b_credentials",
"anthropic_credentials",
"prompt",
],
},
"output_schema": {
"properties": {
"response": {
"type": "string",
"description": "Output from Claude Code execution",
},
"files": {
"type": "array",
"description": "Files created/modified by Claude Code",
},
"conversation_history": {
"type": "string",
"description": "Full conversation history",
},
"session_id": {
"type": "string",
"description": "Session ID for this conversation",
},
"sandbox_id": {
"type": "string",
"description": "ID of the sandbox instance",
},
"error": {
"type": "string",
"description": "Error message if execution failed",
},
},
},
"credentials_fields": {
"e2b_credentials": True,
"anthropic_credentials": True,
},
},
]
search_results = [
{"content_id": d["id"], "score": 0.9 - i * 0.1}
for i, d in enumerate(block_defs)
]
mock_blocks = {
d["id"]: make_mock_block(
block_id=d["id"],
name=d["name"],
block_type=BlockType.STANDARD,
input_schema=d["input_schema"],
output_schema=d["output_schema"],
credentials_fields=d["credentials_fields"],
)
for d in block_defs
}
mock_search_db = MagicMock()
mock_search_db.unified_hybrid_search = AsyncMock(
return_value=(search_results, len(search_results))
)
with (
patch(
"backend.copilot.tools.find_block.search",
return_value=mock_search_db,
),
patch(
"backend.copilot.tools.find_block.get_block",
side_effect=lambda bid: mock_blocks.get(bid),
),
):
tool = FindBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID, session=session, query="test"
)
assert isinstance(response, BlockListResponse)
assert response.count == len(block_defs)
total_chars = len(response.model_dump_json())
avg_chars = total_chars // response.count
# Print for visibility in test output
print(f"\nTotal response size: {total_chars} chars")
print(f"Number of blocks: {response.count}")
print(f"Average chars per block: {avg_chars}")
# The old response was ~90K for 10 blocks (~9K per block).
# Previous optimization reduced it to ~1.5K per block (no raw JSON schemas).
# Now with only id/name/description, we expect ~300 chars per block.
assert avg_chars < 500, (
f"Average chars per block ({avg_chars}) exceeds 500. "
f"Total response: {total_chars} chars for {response.count} blocks."
)

View File

@@ -2,7 +2,7 @@
from typing import Any
from backend.api.features.chat.model import ChatSession
from backend.copilot.model import ChatSession
from .agent_search import search_agents
from .base import BaseTool

View File

@@ -4,13 +4,10 @@ import logging
from pathlib import Path
from typing import Any
from backend.api.features.chat.model import ChatSession
from backend.api.features.chat.tools.base import BaseTool
from backend.api.features.chat.tools.models import (
DocPageResponse,
ErrorResponse,
ToolResponseBase,
)
from backend.copilot.model import ChatSession
from .base import BaseTool
from .models import DocPageResponse, ErrorResponse, ToolResponseBase
logger = logging.getLogger(__name__)

View File

@@ -2,7 +2,7 @@
from datetime import datetime
from enum import Enum
from typing import Any
from typing import Any, Literal
from pydantic import BaseModel, Field
@@ -25,6 +25,7 @@ class ResponseType(str, Enum):
AGENT_SAVED = "agent_saved"
CLARIFICATION_NEEDED = "clarification_needed"
BLOCK_LIST = "block_list"
BLOCK_DETAILS = "block_details"
BLOCK_OUTPUT = "block_output"
DOC_SEARCH_RESULTS = "doc_search_results"
DOC_PAGE = "doc_page"
@@ -35,11 +36,18 @@ class ResponseType(str, Enum):
WORKSPACE_FILE_WRITTEN = "workspace_file_written"
WORKSPACE_FILE_DELETED = "workspace_file_deleted"
# Long-running operation types
OPERATION_STARTED = "operation_started"
OPERATION_PENDING = "operation_pending"
OPERATION_IN_PROGRESS = "operation_in_progress"
# Input validation
INPUT_VALIDATION_ERROR = "input_validation_error"
# Web fetch
WEB_FETCH = "web_fetch"
# Code execution
BASH_EXEC = "bash_exec"
# Feature request types
FEATURE_REQUEST_SEARCH = "feature_request_search"
FEATURE_REQUEST_CREATED = "feature_request_created"
# Goal refinement
SUGGESTED_GOAL = "suggested_goal"
# Base response model
@@ -286,6 +294,22 @@ class ClarificationNeededResponse(ToolResponseBase):
questions: list[ClarifyingQuestion] = Field(default_factory=list)
class SuggestedGoalResponse(ToolResponseBase):
"""Response when the goal needs refinement with a suggested alternative."""
type: ResponseType = ResponseType.SUGGESTED_GOAL
suggested_goal: str = Field(description="The suggested alternative goal")
reason: str = Field(
default="", description="Why the original goal needs refinement"
)
original_goal: str = Field(
default="", description="The user's original goal for context"
)
goal_type: Literal["vague", "unachievable"] = Field(
default="vague", description="Type: 'vague' or 'unachievable'"
)
# Documentation search models
class DocSearchResult(BaseModel):
"""A single documentation search result."""
@@ -335,11 +359,17 @@ class BlockInfoSummary(BaseModel):
name: str
description: str
categories: list[str]
input_schema: dict[str, Any]
output_schema: dict[str, Any]
input_schema: dict[str, Any] = Field(
default_factory=dict,
description="Full JSON schema for block inputs",
)
output_schema: dict[str, Any] = Field(
default_factory=dict,
description="Full JSON schema for block outputs",
)
required_inputs: list[BlockInputFieldInfo] = Field(
default_factory=list,
description="List of required input fields for this block",
description="List of input fields for this block",
)
@@ -352,10 +382,29 @@ class BlockListResponse(ToolResponseBase):
query: str
usage_hint: str = Field(
default="To execute a block, call run_block with block_id set to the block's "
"'id' field and input_data containing the required fields from input_schema."
"'id' field and input_data containing the fields listed in required_inputs."
)
class BlockDetails(BaseModel):
"""Detailed block information."""
id: str
name: str
description: str
inputs: dict[str, Any] = {}
outputs: dict[str, Any] = {}
credentials: list[CredentialsMetaInput] = []
class BlockDetailsResponse(ToolResponseBase):
"""Response for block details (first run_block attempt)."""
type: ResponseType = ResponseType.BLOCK_DETAILS
block: BlockDetails
user_authenticated: bool = False
class BlockOutputResponse(ToolResponseBase):
"""Response for run_block tool."""
@@ -367,34 +416,6 @@ class BlockOutputResponse(ToolResponseBase):
# Long-running operation models
class OperationStartedResponse(ToolResponseBase):
"""Response when a long-running operation has been started in the background.
This is returned immediately to the client while the operation continues
to execute. The user can close the tab and check back later.
The task_id can be used to reconnect to the SSE stream via
GET /chat/tasks/{task_id}/stream?last_idx=0
"""
type: ResponseType = ResponseType.OPERATION_STARTED
operation_id: str
tool_name: str
task_id: str | None = None # For SSE reconnection
class OperationPendingResponse(ToolResponseBase):
"""Response stored in chat history while a long-running operation is executing.
This is persisted to the database so users see a pending state when they
refresh before the operation completes.
"""
type: ResponseType = ResponseType.OPERATION_PENDING
operation_id: str
tool_name: str
class OperationInProgressResponse(ToolResponseBase):
"""Response when an operation is already in progress.
@@ -406,18 +427,52 @@ class OperationInProgressResponse(ToolResponseBase):
tool_call_id: str
class AsyncProcessingResponse(ToolResponseBase):
"""Response when an operation has been delegated to async processing.
class WebFetchResponse(ToolResponseBase):
"""Response for web_fetch tool."""
This is returned by tools when the external service accepts the request
for async processing (HTTP 202 Accepted). The Redis Streams completion
consumer will handle the result when the external service completes.
type: ResponseType = ResponseType.WEB_FETCH
url: str
status_code: int
content_type: str
content: str
truncated: bool = False
The status field is specifically "accepted" to allow the long-running tool
handler to detect this response and skip LLM continuation.
"""
type: ResponseType = ResponseType.OPERATION_STARTED
status: str = "accepted" # Must be "accepted" for detection
operation_id: str | None = None
task_id: str | None = None
class BashExecResponse(ToolResponseBase):
"""Response for bash_exec tool."""
type: ResponseType = ResponseType.BASH_EXEC
stdout: str
stderr: str
exit_code: int
timed_out: bool = False
# Feature request models
class FeatureRequestInfo(BaseModel):
"""Information about a feature request issue."""
id: str
identifier: str
title: str
class FeatureRequestSearchResponse(ToolResponseBase):
"""Response for search_feature_requests tool."""
type: ResponseType = ResponseType.FEATURE_REQUEST_SEARCH
results: list[FeatureRequestInfo]
count: int
query: str
class FeatureRequestCreatedResponse(ToolResponseBase):
"""Response for create_feature_request tool."""
type: ResponseType = ResponseType.FEATURE_REQUEST_CREATED
issue_id: str
issue_identifier: str
issue_title: str
issue_url: str
is_new_issue: bool # False if added to existing
customer_name: str

Some files were not shown because too many files have changed in this diff Show More