Compare commits

...

45 Commits

Author SHA1 Message Date
Nicholas Tindle
f20693d02b chore(classic): remove deprecated benchmark suite (agbenchmark)
Remove the entire classic/benchmark directory containing the agbenchmark
suite, test agent reports (mini-agi, babyagi, beebot, etc.), and challenge
fixtures. This benchmark harness is no longer used.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 01:41:44 -05:00
Nicholas Tindle
a4188c5657 chore(classic): remove deprecated Flutter frontend and unneeded files
- Remove deprecated Flutter frontend (replaced by autogpt_platform)
- Remove shell scripts (run, setup, autogpt.sh, etc.)
- Remove tutorials (outdated)
- Remove CLI-USAGE.md, FORGE-QUICKSTART.md, and cli.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 01:34:01 -05:00
Ubbe
0f67e45d05 hotfix(marketplace): adjust card height overflow (#12497)
## Summary

### Before

<img width="500" height="501" alt="Screenshot 2026-03-20 at 21 50 31"
src="https://github.com/user-attachments/assets/6154cffb-6772-4c3d-a703-527c8ca0daff"
/>

### After

<img width="500" height="581" alt="Screenshot 2026-03-20 at 21 33 12"
src="https://github.com/user-attachments/assets/2f9bd69d-30c5-4d06-ad1e-ed76b184afe5"
/>

### Other minor fixes

- minor spacing adjustments in creator/search pages when empty and
between sections


### Summary

- Increase StoreCard height from 25rem to 26.5rem to prevent content
overflow
- Replace manual tooltip-based title truncation with `OverflowText`
component in StoreCard
- Adjust carousel indicator positioning and hide it on md+ when exactly
3 featured agents are shown

## Test plan
- [x] Verify marketplace cards display without text overflow
- [x] Verify featured section carousel indicators behave correctly
- [x] Check responsive behavior at common breakpoints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 22:03:28 +08:00
Nicholas Tindle
f01f668674 fix(backend): support Responses API in SmartDecisionMakerBlock (#12489)
## Summary

- Fixes SmartDecisionMakerBlock conversation management to work with
OpenAI's Responses API, which was introduced in #12099 (commit 1240f38)
- The migration to `responses.create` updated the outbound LLM call but
missed the conversation history serialization — the `raw_response` is
now the entire `Response` object (not a `ChatCompletionMessage`), and
tool calls/results use `function_call` / `function_call_output` types
instead of role-based messages
- This caused a 400 error on the second LLM call in agent mode:
`"Invalid value: ''. Supported values are: 'assistant', 'system',
'developer', and 'user'."`

### Changes

**`smart_decision_maker.py`** — 6 functions updated:
| Function | Fix |
|---|---|
| `_convert_raw_response_to_dict` | Detects Responses API `Response`
objects, extracts output items as a list |
| `_get_tool_requests` | Recognizes `type: "function_call"` items |
| `_get_tool_responses` | Recognizes `type: "function_call_output"`
items |
| `_create_tool_response` | New `responses_api` kwarg produces
`function_call_output` format |
| `_update_conversation` | Handles list return from
`_convert_raw_response_to_dict` |
| Non-agent mode path | Same list handling for traditional execution |

**`test_smart_decision_maker_responses_api.py`** — 61 tests covering:
- Every branch of all 6 affected helper functions
- Chat Completions, Anthropic, and Responses API formats
- End-to-end agent mode and traditional mode conversation validity

## Test plan

- [x] 61 new unit tests all pass
- [x] 11 existing SmartDecisionMakerBlock tests still pass (no
regressions)
- [x] All pre-commit hooks pass (ruff, black, isort, pyright)
- [ ] CI integration tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Updates core LLM invocation and agent conversation/tool-call
bookkeeping to match OpenAI’s Responses API, which can affect tool
execution loops and prompt serialization across providers. Risk is
mitigated by extensive new unit tests, but regressions could surface in
production agent-mode flows or token/usage accounting.
> 
> **Overview**
> **Migrates OpenAI calls from Chat Completions to the Responses API
end-to-end**, including tool schema conversion, output parsing,
reasoning/text extraction, and updated token usage fields in
`LLMResponse`.
> 
> **Fixes SmartDecisionMakerBlock conversation/tool handling for
Responses API** by treating `raw_response` as a Response object
(splitting it into `output` items for replay), recognizing
`function_call`/`function_call_output` entries, and emitting tool
outputs in the correct Responses format to prevent invalid follow-up
prompts.
> 
> Also adjusts prompt compaction/token estimation to understand
Responses API tool items, changes
`get_execution_outputs_by_node_exec_id` to return list-valued
`CompletedBlockOutput`, removes `gpt-3.5-turbo` from model/cost/docs
lists, and adds focused unit tests plus a lightweight `conftest.py` to
run these tests without the full server stack.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
ff292efd3d. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>
2026-03-20 03:23:52 +00:00
Otto
f7a3491f91 docs(platform): add TDD guidance to CLAUDE.md files (#12491)
Requested by @majdyz

Adds TDD (test-driven development) guidance to CLAUDE.md files so Claude
Code follows a test-first workflow when fixing bugs or adding features.

**Changes:**
- **Parent `CLAUDE.md`**: Cross-cutting TDD workflow — write a failing
`xfail` test, implement the fix, remove the marker
- **Backend `CLAUDE.md`**: Concrete pytest example with
`@pytest.mark.xfail` pattern
- **Frontend `CLAUDE.md`**: Note about using Playwright `.fixme`
annotation for bug-fix tests

The workflow is: write a failing test first → confirm it fails for the
right reason → implement → confirm it passes. This ensures every bug fix
is covered by a test that would have caught the regression.

---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
2026-03-20 02:13:16 +00:00
Nicholas Tindle
cbff3b53d3 Revert "feat(backend): migrate OpenAI provider to Responses API" (#12490)
Reverts Significant-Gravitas/AutoGPT#12099

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Reverts the OpenAI integration in `llm_call` from the Responses API
back to `chat.completions`, which can change tool-calling, JSON-mode
behavior, and token accounting across core AI blocks. The change is
localized but touches the primary LLM execution path and associated
tests/docs.
> 
> **Overview**
> Reverts the OpenAI path in `backend/blocks/llm.py` from the Responses
API back to `chat.completions`, including updating JSON-mode
(`response_format`), tool handling, and usage extraction to match the
Chat Completions response shape.
> 
> Removes the now-unused `backend/util/openai_responses.py` helpers and
their unit tests, updates LLM tests to mock `chat.completions.create`,
and adds `gpt-3.5-turbo` to the supported model list, cost config, and
LLM docs.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
7d6226d10e. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
2026-03-20 01:51:56 +00:00
Reinier van der Leer
5b9a4c52c9 revert(platform): Revert invite system (#12485)
## Summary

Reverts the invite system PRs due to security gaps identified during
review:

- The move from Supabase-native `allowed_users` gating to
application-level gating allows orphaned Supabase auth accounts (valid
JWT without a platform `User`)
- The auth middleware never verifies `User` existence, so orphaned users
get 500s instead of clean 403s
- OAuth/Google SSO signup completely bypasses the invite gate
- The DB trigger that atomically created `User` + `Profile` on signup
was dropped in favor of a client-initiated API call, introducing a
failure window

### Reverted PRs
- Reverts #12347 — Foundation: InvitedUser model, invite-gated signup,
admin UI
- Reverts #12374 — Tally enrichment: personalized prompts from form
submissions
- Reverts #12451 — Pre-check: POST /auth/check-invite endpoint
- Reverts #12452 (collateral) — Themed prompt categories /
SuggestionThemes UI. This PR built on top of #12374's
`suggested_prompts` backend field and `/chat/suggested-prompts`
endpoint, so it cannot remain without #12374. The copilot empty session
falls back to hardcoded default prompts.

### Migration
Includes a new migration (`20260319120000_revert_invite_system`) that:
- Drops the `InvitedUser` table and its enums (`InvitedUserStatus`,
`TallyComputationStatus`)
- Restores the `add_user_and_profile_to_platform()` trigger on
`auth.users`
- Backfills `User` + `Profile` rows for any auth accounts created during
the invite-gate window

### What's NOT reverted
- The `generate_username()` function (never dropped, still used by
backfill migration)
- The old `add_user_to_platform()` function (superseded by
`add_user_and_profile_to_platform()`)
- PR #12471 (admin UX improvements) — was never merged, no action needed

## Test plan
- [x] Verify migration: `InvitedUser` table dropped, enums dropped,
trigger restored
- [x] Verify backfill: no orphaned auth users, no users without Profile
- [x] Verify existing users can still log in (email + OAuth)
- [x] Verify CoPilot chat page loads with default prompts
- [ ] Verify new user signup creates `User` + `Profile` via the restored
trigger
- [ ] Verify admin `/admin/users` page loads without crashing
- [ ] Run backend tests: `poetry run test`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-19 17:15:30 +00:00
Otto
0ce1c90b55 fix(frontend): rename "CoPilot" to "AutoPilot" on credits page (#12481)
Requested by @kcze

Renames "CoPilot" → "AutoPilot" on the credits/usage limits page:

- **Heading:** "CoPilot Usage Limits" → "AutoPilot Usage Limits"
- **Button:** "Open CoPilot" → "Open AutoPilot"
- Comment updated to match

---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>

Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
2026-03-19 15:25:21 +00:00
Ubbe
d4c6eb9adc fix(frontend): collapse navbar text to icons below 1280px (#12484)
## Summary

<img width="400" height="339" alt="Screenshot 2026-03-19 at 22 53 23"
src="https://github.com/user-attachments/assets/2fa76b8f-424d-4764-90ac-b7a331f5f610"
/>

<img width="600" height="595" alt="Screenshot 2026-03-19 at 22 53 31"
src="https://github.com/user-attachments/assets/23f51cc7-b01e-4d83-97ba-2c43683877db"
/>

<img width="800" height="523" alt="Screenshot 2026-03-19 at 22 53 36"
src="https://github.com/user-attachments/assets/1e447b9a-1cca-428c-bccd-1730f1670b8e"
/>

Now that we have the `Give feedback` button on the Navigation bar,
collpase some of the links below `1280px` so there is more space and
they don't collide with each other...

- Collapse navbar link text to icon-only below 1280px (`xl` breakpoint)
to prevent crowding
- Wallet button shows only the wallet icon below 1280px instead of "Earn
credits" text
- Feedback button shows only the chat icon below 1280px instead of "Give
Feedback" text
- Added `whitespace-nowrap` to feedback button to prevent wrapping

## Changes
- `NavbarLink.tsx`: `lg:block` → `xl:block` for link text
- `Wallet.tsx`: `md:hidden`/`md:inline-block` →
`xl:hidden`/`xl:inline-block`
- `FeedbackButton.tsx`: wrap text in `hidden xl:inline` span, add
`whitespace-nowrap`

## Test plan
- [ ] Resize browser between 1024px–1280px and verify navbar shows only
icons
- [ ] At 1280px+ verify full text labels appear for links, wallet, and
feedback
- [ ] Verify mobile navbar still works correctly below `md` breakpoint

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 15:10:27 +00:00
Ubbe
1bb91b53b7 fix(frontend/marketplace): comprehensive marketplace UI redesign (#12462)
## Summary

<img width="600" height="964" alt="Screenshot_2026-03-19_at_00 07 52"
src="https://github.com/user-attachments/assets/95c0430a-26a3-499b-8f6a-25b9715d3012"
/>
<img width="600" height="968" alt="Screenshot_2026-03-19_at_00 08 01"
src="https://github.com/user-attachments/assets/d440c3b0-c247-4f13-bf82-a51ff2e50902"
/>
<img width="600" height="939" alt="Screenshot_2026-03-19_at_00 08 14"
src="https://github.com/user-attachments/assets/f19be759-e102-4a95-9474-64f18bce60cf"
/>"
<img width="600" height="953" alt="Screenshot_2026-03-19_at_00 08 24"
src="https://github.com/user-attachments/assets/ba4fa644-3958-45e2-89e9-a6a4448c63c5"
/>



- Re-style and re-skin the Marketplace pages to look more "professional"
...
- Move the `Give feedback` button to the header

## Test plan
- [x] Verify marketplace page search bar matches Form text field styling
- [x] Verify agent cards have padding and subtle border
- [x] Verify hover/focus states work correctly
- [x] Check responsive behavior at different breakpoints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 22:28:01 +08:00
Ubbe
a5f9c43a41 feat(platform): replace suggestion pills with themed prompt categories (#12452)
## Summary



https://github.com/user-attachments/assets/13da6d36-5f35-429b-a6cf-e18316bb8709



Replaces the flat list of suggestion pills in the CoPilot empty session
with themed prompt categories (Learn, Create, Automate, Organize), each
shown as a popover with contextual prompts.

- **Backend**: Changes `suggested_prompts` from a flat `list[str]` to a
themed `dict[str, list[str]]` keyed by category. Updates Tally
extraction LLM prompt to generate prompts per theme, and the
`/suggested-prompts` API to return grouped themes. Legacy `list[str]`
rows are preserved under a `"General"` key for backward compatibility.
- **Frontend**: Replaces inline pill buttons with a `SuggestionThemes`
popover component. Each theme button (with icon) opens a dropdown of 5
relevant prompts. Falls back to hardcoded defaults when the API has no
personalized prompts. Normalizes partial API responses by padding
missing themes with defaults. Legacy `"General"` prompts are distributed
round-robin across themes so existing users keep their personalized
suggestions.

### Changes 🏗️

- `backend/data/understanding.py`: `suggested_prompts` field changed
from `list[str]` to `dict[str, list[str]]`; legacy list rows preserved
under `"General"` key; list items validated as strings
- `backend/data/tally.py`: LLM prompt updated to generate themed
prompts; validation now per-theme with blank-string rejection
- `backend/api/features/chat/routes.py`: New `SuggestedTheme` model;
endpoint returns `themes[]`
- `frontend/copilot/components/EmptySession/EmptySession.tsx`: Uses
generated API types directly (no cast)
- `frontend/copilot/components/EmptySession/helpers.ts`:
`DEFAULT_THEMES` replaces `DEFAULT_QUICK_ACTIONS`; `getSuggestionThemes`
normalizes partial API responses and distributes legacy `"General"`
prompts across themes
-
`frontend/copilot/components/EmptySession/components/SuggestionThemes/`:
New popover component with theme icons and loading states

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verify themed suggestion buttons render on CoPilot empty session
  - [x] Click each theme button and confirm popover opens with prompts
  - [x] Click a prompt and confirm it sends the message
- [x] Verify fallback to default themes when API returns no custom
prompts
- [x] Verify legacy users' personalized prompts are preserved and
visible


🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 18:46:12 +08:00
Otto
1240f38f75 feat(backend): migrate OpenAI provider to Responses API (#12099)
## Summary

Migrates the OpenAI provider in the LLM block from
`chat.completions.create` to `responses.create` — OpenAI's newer,
unified API. Also removes the obsolete GPT-3.5-turbo model.

Resolves #11624
Linear:
[OPEN-2911](https://linear.app/autogpt/issue/OPEN-2911/update-openai-calls-to-use-responsescreate)

## Changes

- **`backend/blocks/llm.py`** — OpenAI provider now uses
`responses.create` exclusively. Removed GPT-3.5-turbo enum + metadata.
- **`backend/util/openai_responses.py`** *(new)* — Helpers for the
Responses API: tool format conversion, content/reasoning/usage/tool-call
extraction.
- **`backend/util/openai_responses_test.py`** *(new)* — Unit tests for
all helper functions.
- **`backend/data/block_cost_config.py`** — Removed GPT-3.5 cost entry.
- **`docs/integrations/block-integrations/llm.md`** — Regenerated block
docs.

## Key API differences handled

| Aspect | Chat Completions | Responses API |
|--------|-----------------|---------------|
| Messages param | `messages` | `input` |
| Max tokens param | `max_completion_tokens` | `max_output_tokens` |
| Usage fields | `prompt_tokens` / `completion_tokens` | `input_tokens`
/ `output_tokens` |
| Tool format | Nested under `function` key | Flat structure |

## Test plan

- [x] Unit tests for all `openai_responses.py` helpers
- [x] Existing LLM block tests updated for Responses API mocks
- [x] Regular OpenAI models work
- [x] Reasoning OpenAI models work
- [x] Non-OpenAI models work

---------

Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 09:19:31 +00:00
Zamil Majdy
f617f50f0b dx(skills): improve pr-address skill — full thread context + PR description backtick fix (#12480)
## Summary

Improves the `pr-address` skill with two fixes:

- **Full comment thread loading**: Adds `--paginate` to the inline
comments fetch and explicit instructions to reconstruct threads using
`in_reply_to_id`, reading root-to-last-reply before acting. Previously,
only the opening comment was visible — missing reviewer replies led to
wrong fixes.
- **Backtick-safe PR descriptions**: Adds instructions to write the PR
body to a temp file via `<<'PREOF'` heredoc before passing to `gh pr
edit/create`. Inlining the body directly causes backticks to be
shell-escaped, breaking markdown rendering.

## Test plan
- [ ] Run `/pr-address` on a PR with multi-reply inline comment threads
— verify the last reply is what gets acted on
- [ ] Update a PR description containing backticks — verify they render
correctly in GitHub
2026-03-19 15:11:14 +07:00
Otto
943a1df815 dx(backend): Make Builder and Marketplace search work without embeddings (#12479)
When OpenAI credentials are unavailable (fork PRs, dev envs without API
keys), both builder block search and store agent functionality break:

1. **Block search returns wrong results.** `unified_hybrid_search` falls
back to a zero vector when embedding generation fails. With ~200 blocks
in `UnifiedContentEmbedding`, the zero-vector semantic scores are
garbage, and lexical matching on short block names is too weak — "Store
Value" doesn't appear in the top results for query "Store Value".

2. **Store submission approval fails entirely.**
`review_store_submission` calls `ensure_embedding()` inside a
transaction. When it throws, the entire transaction rolls back — no
store submissions get approved, the `StoreAgent` materialized view stays
empty, and all marketplace e2e tests fail.

3. **Store search returns nothing.** Even when store data exists,
`hybrid_search` queries `UnifiedContentEmbedding` which has no store
agent rows (backfill failed). It succeeds with zero results rather than
throwing, so the existing exception-based fallback never triggers.

### Changes 🏗️

- Replace `unified_hybrid_search` with in-memory text search in
`_hybrid_search_blocks` (-> `_text_search_blocks`). All ~200 blocks are
already loaded in memory, and `_score_primary_fields` provides correct
deterministic text relevance scoring against block name, description,
and input schema field descriptions — the same rich text the embedding
pipeline uses. CamelCase block names are split via `split_camelcase()`
to match the tokenization from PR #12400.

- Make embedding generation in `review_store_submission` best-effort:
catch failures and log a warning instead of rolling back the approval
transaction. The backfill scheduler retries later when credentials
become available.

- Fall through to direct DB search when `hybrid_search` returns empty
results (not just when it throws). The fallback uses ad-hoc
`to_tsvector`/`plainto_tsquery` with `ts_rank_cd` ranking on
`StoreAgent` view fields, restoring the search quality of the original
pre-hybrid implementation (stemming, stop-word removal, relevance
ranking).

- Fix Playwright artifact upload in end-to-end test CI

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `build.spec.ts`: 8/8 pass locally (was 0/7 before fix)
  - [x] All 79 e2e tests pass in CI (was 15 failures before fix)

---
Co-authored-by: Reinier van der Leer (@Pwuts)

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 00:11:06 +00:00
Otto
593001e0c8 fix(frontend): Remove dead Tutorial button from TallyPopup (#12474)
After the legacy builder was removed in #12082, the TallyPopup component
still showed a "Tutorial" button (bottom-right, next to "Give Feedback")
that navigated to `/build?resetTutorial=true`. Nothing handles that
param anymore, so clicking it did nothing.

This removes the dead button and its associated state/handler from
TallyPopup and useTallyPopup. The working tutorial (Shepherd.js
chalkboard icon in CustomControls) is unaffected.

**Changes:**
- `TallyPopup.tsx`: Remove Tutorial button JSX, unused imports
(`usePathname`, `useSearchParams`), and `isNewBuilder` check
- `useTallyPopup.ts`: Remove `showTutorial` state, `handleResetTutorial`
handler, unused `useRouter` import

Resolves SECRT-2109

---
Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>

Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>
2026-03-19 00:09:46 +00:00
Ubbe
e1db8234a3 fix(frontend/copilot): constrain markdown heading sizes in user chat messages (#12463)
### Before

<img width="600" height="489" alt="Screenshot 2026-03-18 at 19 24 41"
src="https://github.com/user-attachments/assets/bb8dc0fa-04cd-4f32-8125-2d7930b4acde"
/>

Formatted headings in user messages would look massive

### After

<img width="600" height="549" alt="Screenshot 2026-03-18 at 19 24 33"
src="https://github.com/user-attachments/assets/51230232-c914-42dd-821f-3b067b80bab4"
/>

Markdown headings (`# H1` through `###### H6`) and setext-style headings
(`====`) in user chat messages rendered at their full HTML heading size,
which looked disproportionately large in the chat bubble context.

### Changes 🏗️

- Added Tailwind CSS overrides on the user message `MessageContent`
wrapper to cap all heading elements (h1-h6) at `text-lg font-semibold`
- Only affects user messages in copilot chat (via `group-[.is-user]`
selector); assistant messages are unchanged

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [ ] Send a user message containing `# Heading 1` through `######
Heading 6` and verify they all render at constrained size
- [ ] Send a message with `====` separator pattern and verify it doesn't
render as a mega H1
  - [ ] Verify assistant messages with headings still render normally

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 00:33:09 +08:00
Zamil Majdy
282173be9d feat(copilot): GitHub CLI support — inject GH_TOKEN and connect_integration tool (#12426)
## Summary

- When a user has connected GitHub, `GH_TOKEN` is automatically injected
into the Claude Agent SDK subprocess environment so `gh` CLI commands
work without any manual auth step
- When GitHub is **not** connected, the copilot can call a new
`connect_integration(provider="github")` MCP tool, which surfaces the
same credential setup card used by regular GitHub blocks — the user
connects inline without leaving the chat
- After connecting, the copilot is instructed to retry the operation
automatically

## Changes

**Backend**
- `sdk/service.py`: `_get_github_token_for_user()` fetches OAuth2 or API
key credentials and injects `GH_TOKEN` + `GITHUB_TOKEN` into `sdk_env`
before the SDK subprocess starts (per-request, thread-safe via
`ClaudeAgentOptions.env`)
- `tools/connect_integration.py`: new `ConnectIntegrationTool` MCP tool
— returns `SetupRequirementsResponse` for a given provider (`github` for
now); extensible via `_PROVIDER_INFO` dict
- `tools/__init__.py`: registers `connect_integration` in
`TOOL_REGISTRY`
- `prompting.py`: adds GitHub CLI / `connect_integration` guidance to
`_SHARED_TOOL_NOTES`

**Frontend**
- `ConnectIntegrationTool/ConnectIntegrationTool.tsx`: thin wrapper
around the existing `SetupRequirementsCard` with a tailored retry
instruction
- `MessagePartRenderer.tsx`: dispatches `tool-connect_integration` to
the new component

## Test plan

- [ ] User with GitHub credentials: `gh pr list` works without any auth
step in copilot
- [ ] User without GitHub credentials: copilot calls
`connect_integration`, card renders with GitHub credential input, after
connecting copilot retries and `gh` works
- [ ] `GH_TOKEN` is NOT leaked across users (injected via
`ClaudeAgentOptions.env`, not `os.environ`)
- [ ] `connect_integration` with unknown provider returns a graceful
error message
2026-03-18 11:52:42 +00:00
Zamil Majdy
5d9a169e04 feat(blocks): add AutoPilotBlock for invoking AutoPilot from graphs (#12439)
## Summary
- Adds `AutogptCopilotBlock` that invokes the platform's copilot system
(`stream_chat_completion_sdk`) directly from graph executions
- Enables sub-agent patterns: copilot can call this block recursively
(with depth limiting via `contextvars`)
- Enables scheduled copilot execution through the agent executor system
- No user credentials needed — uses server-side copilot config

## Inputs/Outputs
**Inputs:** prompt, system_context, session_id (continuation), timeout,
max_recursion_depth
**Outputs:** response text, tool_calls list, conversation_history JSON,
session_id, token_usage

## Test plan
- [x] Block test passes (`test_available_blocks[AutogptCopilotBlock]`)
- [x] Pre-commit hooks pass (format, lint, typecheck)
- [ ] Manual test: add block to graph, send prompt, verify response
- [ ] Manual test: chain two copilot blocks with session_id to verify
continuation
2026-03-18 11:22:25 +00:00
Ubbe
6fd1050457 fix(backend): arch-conditional chromium in Docker for ARM64 compatibility (#12466)
## Summary
- On **amd64**: keep `agent-browser install` (Chrome for Testing —
pinned version tested with Playwright) + restore runtime libs
- On **arm64**: install system `chromium` package (Chrome for Testing
has no ARM64 binary) + skip `agent-browser install`
- An entrypoint script sets
`AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium` at container startup
on arm64 (detected via presence of `/usr/bin/chromium`); on amd64 the
var is left unset so agent-browser uses Chrome for Testing as before

**Why not system chromium on amd64?** `agent-browser install` downloads
a specific Chrome for Testing version pinned to the Playwright version
in use. Using whatever Debian ships on amd64 could cause protocol
compatibility issues.

Introduced by #12301 (cc @Significant-Gravitas/zamil-majdy)

## Test plan
- [ ] `docker compose up --build` succeeds on ARM64 (Apple Silicon)
- [ ] `docker compose up --build` succeeds on x86_64
- [ ] Copilot browser tools (`browser_navigate`, `browser_act`,
`browser_screenshot`) work in a Copilot session on both architectures

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-18 19:08:14 +08:00
Otto
02708bcd00 fix(platform): pre-check invite eligibility before Supabase signup (#12451)
Requested by @Swiftyos

The invite gate check in `get_or_activate_user()` runs after Supabase
creates the auth user, resulting in orphaned auth accounts with no
platform access when a non-invited user signs up. Users could create a
Supabase account but had no `User`, `Profile`, or `Onboarding` records —
they could log in but access nothing.

### Changes 🏗️

**Backend** (`v1.py`, `invited_user.py`):
- Add public `POST /api/auth/check-invite` endpoint (no auth required —
this is a pre-signup check)
- Add `check_invite_eligibility()` helper in the data layer
- Returns `{allowed: true}` when `enable_invite_gate` is disabled
- Extracted `is_internal_email()` helper to deduplicate `@agpt.co`
bypass logic (was duplicated between route and `get_or_activate_user`)
- Checks `InvitedUser` table for `INVITED` status
- Added IP-based Redis rate limiting (10 req/60 s per IP, fails open if
Redis unavailable, returns HTTP 429 when exceeded)
- Fixed Redis pipeline atomicity: `incr` + `expire` now sent in a single
pipeline round-trip, preventing a TTL-less key if `expire` had
previously failed after `incr`
- Fixed incorrect `await` on `pipe.incr()` / `pipe.expire()` — redis-py
async pipeline queue methods are synchronous; only `execute()` is
awaitable. The erroneous `await` was silently swallowed by the `except`
block, making the rate limiter completely non-functional

**Frontend** (`signup/actions.ts`):
- Call the generated `postV1CheckIfAnEmailIsAllowedToSignUp` client
(replacing raw `fetch`) before `supabase.auth.signUp()`
- `ApiError` (non-OK HTTP responses) logs a Sentry warning with the HTTP
status; network/other errors capture a Sentry exception
- If not allowed, return `not_allowed` error (existing
`EmailNotAllowedModal` handles this)
- Graceful fallback: if the pre-check fails (backend unreachable), falls
through to the existing flow — `get_or_activate_user()` remains as
defense-in-depth

**Tests** (`v1_test.py`, `invited_user_test.py`):
- 5 route-level tests covering: gate disabled → allowed, `@agpt.co`
bypass, eligible email, ineligible email, rate-limit exceeded
- Rate-limit test mock updated to use pipeline interface
(`pipeline().execute()` returns `[count, True]`)
- Existing `invited_user_test.py` updated to cover
`check_invite_eligibility` branches

**Not changed:**
- Google OAuth flow — already gated by OAuth provider settings
- `get_or_activate_user()` — stays as backend safety net
- All admin invite CRUD routes — unchanged

### Test plan
1. Email/password signup with invited email → signup proceeds normally
2. Email/password signup with non-invited email → `EmailNotAllowedModal`
shown, no Supabase user created
3. `enable_invite_gate=false` → all emails allowed
4. Backend unreachable during pre-check → falls through to existing flow
5. Same IP exceeds 10 requests/60 s → HTTP 429 returned

---
Co-authored-by: Craig Swift (@Swiftyos) <craigswift13@gmail.com>

---------

Co-authored-by: Craig Swift (@Swiftyos) <craigswift13@gmail.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-18 10:36:50 +00:00
Zamil Majdy
156d61fe5c dx(skills): add merge conflict detection and resolution to pr-address (#12469)
## Summary
- Adds merge conflict detection as step 2 of the polling loop (between
CI check and comment check), including handling of the transient
`"UNKNOWN"` state
- Adds a "Resolving merge conflicts" section with step-by-step
instructions using 3-way merge (no force push needed since PRs are
squash-merged)
- Validates all three git conflict markers before staging to prevent
committing broken code
- Fixes `args` → `argument-hint` in skill frontmatter

## Test plan
- [ ] Verify skill renders correctly in Claude Code
2026-03-18 17:46:32 +07:00
Zamil Majdy
5a29de0e0e fix(platform): try-compact-retry for prompt-too-long errors in CoPilot SDK (#12413)
## Summary

When the Claude SDK returns a prompt-too-long error (e.g. transcript +
query exceeds the model's context window), the streaming loop now
retries with escalating fallbacks instead of failing immediately:

1. **Attempt 1**: Use the transcript as-is (normal path)
2. **Attempt 2**: Compact the transcript via LLM summarization
(`compact_transcript`) and retry
3. **Attempt 3**: Drop the transcript entirely and fall back to
DB-reconstructed context (`_build_query_message`)

If all 3 attempts fail, a `StreamError(code="prompt_too_long")` is
yielded to the frontend.

### Key changes

**`service.py`**
- Add `_is_prompt_too_long(err)` — pattern-matches SDK exceptions for
prompt-length errors (`prompt is too long`, `prompt_too_long`,
`context_length_exceeded`, `request too large`)
- Wrap `async with ClaudeSDKClient` in a 3-attempt retry `for` loop with
compaction/fallback logic
- Move `current_message`, `_build_query_message`, and
`_prepare_file_attachments` before the retry loop (computed once,
reused)
- Skip transcript upload in `finally` when `transcript_caused_error`
(avoids persisting a broken/empty transcript)
- Reset `stream_completed` between retry iterations
- Document outer-scope variable contract in `_run_stream_attempt`
closure (which variables are reassigned between retries vs read-only)

**`transcript.py`**
- Add `compact_transcript(content, log_prefix, model)` — converts JSONL
→ messages → `compress_context` (LLM summarization with truncation
fallback) → JSONL
- Add helpers: `_flatten_assistant_content`,
`_flatten_tool_result_content`, `_transcript_to_messages`,
`_messages_to_transcript`, `_run_compression`
- Returns `None` when compaction fails or transcript is already within
budget (signals caller to fall through to DB fallback)
- Truncation fallback wrapped in 30s timeout to prevent unbounded CPU
time on large transcripts
- Accepts `model` parameter to avoid creating a new `ChatConfig()` on
every call

**`util/prompt.py`**
- Fix `_truncate_middle_tokens` edge case: returns empty string when
`max_tok < 1`, properly handles `max_tok < 3`

**`config.py`**
- E2B sandbox timeout raised from 5 min to 15 min to accommodate
compaction retries

**`prompt_too_long_test.py`** (new, 45 tests)
- `_is_prompt_too_long` positive/negative patterns, case sensitivity,
BaseException handling
- Flatten helpers for assistant/tool_result content blocks
- `_transcript_to_messages` / `_messages_to_transcript` roundtrip,
strippable types, empty content
- `compact_transcript` async tests: too few messages, not compacted,
successful compaction, compression failure

**`retry_scenarios_test.py`** (new, 27 tests)
- Full retry state machine simulation covering all 8 scenarios:
  1. Normal flow (no retry)
  2. Compact succeeds → retry succeeds
  3. Compact fails → DB fallback succeeds
  4. No transcript → DB fallback succeeds
  5. Double fail → DB fallback on attempt 3
  6. All 3 attempts exhausted
  7. Non-prompt-too-long error (no retry)
  8. Compaction returns identical content → DB fallback
- Edge cases: nested exceptions, case insensitivity, unicode content,
large transcripts, resume-after-compaction flow

**Shared test fixtures** (`conftest.py`)
- Extracted `build_test_transcript` helper used across 3 test files to
eliminate duplication

## Test plan

- [x] `_is_prompt_too_long` correctly identifies prompt-length errors (8
positive, 5 negative patterns)
- [x] `compact_transcript` compacts oversized transcripts via LLM
summarization
- [x] `compact_transcript` returns `None` on failure or when already
within budget
- [x] Retry loop state machine: all 8 scenarios verified with state
assertions
- [x] `TranscriptBuilder` works correctly after loading compacted
transcripts
- [x] `_messages_to_transcript` roundtrip preserves content including
unicode
- [x] `transcript_caused_error` prevents stale transcript upload
- [x] Truncation timeout prevents unbounded CPU time
- [x] All 139 unit tests pass locally
- [x] CI green (tests 3.11/3.12/3.13, types, CodeQL, linting)
2026-03-18 10:27:31 +00:00
Otto
e657472162 feat(blocks): Add Nano Banana 2 to image generator, customizer, and editor blocks (#12218)
Requested by @Torantulino

Add `google/nano-banana-2` (Gemini 3.1 Flash Image) support across all
three image blocks.

### Changes

**`ai_image_customizer.py`**
- Add `NANO_BANANA_2 = "google/nano-banana-2"` to `GeminiImageModel`
enum
- Update block description to reference Nano-Banana models generically

**`ai_image_generator_block.py`**
- Add `NANO_BANANA_2` to `ImageGenModel` enum
- Add generation branch (identical to NBP except model name)

**`flux_kontext.py` (AI Image Editor)**
- Rename `FluxKontextModelName` → `ImageEditorModel` (with
backwards-compatible alias)
- Add `NANO_BANANA_PRO` and `NANO_BANANA_2` to the editor
- Model-aware branching in `run_model()`: NB models use `image_input`
list (not `input_image`), no `seed`, and add `output_format`

**`block_cost_config.py`**
- Add NB2 cost entries for all three blocks (14 credits, matching NBP)
- Add NB Pro cost entry for editor block
- Update editor block refs from `.PRO`/`.MAX` to
`.FLUX_KONTEXT_PRO`/`.FLUX_KONTEXT_MAX`

Resolves SECRT-2047

---------

Co-authored-by: Torantulino <Torantulino@users.noreply.github.com>
Co-authored-by: Abhimanyu Yadav <122007096+Abhi1992002@users.noreply.github.com>
2026-03-18 09:42:18 +00:00
DEEVEN SERU
4d00e0f179 fix(blocks): allow falsy entries in AddToListBlock (#12028)
## Summary
- treat AddToListBlock.entry as optional rather than truthy so
0/""/False are appended
- extend block self-tests with a falsy entry case

## Testing
- Not run (pytest not available in environment)

Co-authored-by: DEEVEN SERU <144827577+DEVELOPER-DEEVEN@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-03-18 09:42:14 +00:00
DEEVEN SERU
1d7282b5f3 fix(backend): Truncate filenames with excessively long 'extensions' (#12025)
Fixes issue where filenames with no dots until the end (or massive
extensions) bypassed truncation logic, causing OSError [Errno 36].
Limits extension preservation to 20 chars.

---------

Co-authored-by: DEVELOPER-DEEVEN <144827577+DEVELOPER-DEEVEN@users.noreply.github.com>
2026-03-18 09:42:06 +00:00
Reinier van der Leer
e3591fcaa3 ci(backend): Python version specific type checking (#12453)
- Resolves #10657
- Partially based on #10913

### Changes 🏗️

- Run Pyright separately for each supported Python version
  - Move type checking and linting into separate jobs
    - Add `--skip-pyright` option to lint script
- Move `linter.py` into `backend/scripts`
  - Move other scripts in `backend/` too for consistency

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - CI

---

Co-authored-by: @Joaco2603 <jpappa2603@gmail.com>

---------

Co-authored-by: Joaco2603 <jpappa2603@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 09:41:35 +00:00
Reinier van der Leer
876dc32e17 chore(backend): Update poetry to v2.2.1 (#12459)
Poetry v2.2.1 has bugfixes that are relevant in context of our
`.pre-commit-config.yaml`

### Changes 🏗️

- Update `poetry` from v2.1.1 to v2.2.1 (latest version supported by
Dependabot)
- Re-generate `poetry.lock`

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - CI
2026-03-18 09:41:28 +00:00
Reinier van der Leer
616e29f5e4 fix tests for 6d0e206 2026-03-18 10:39:51 +01:00
Zamil Majdy
280a98ad38 dx(skills): poll for new PR comments while waiting for CI (#12461)
## Summary
- Updates the `pr-address` skill to poll for new PR comments while
waiting for CI, instead of blocking solely on `gh pr checks --watch
--fail-fast`
- Runs CI watch in the background and polls all 3 comment endpoints
every 30s
- Allows bot comments (coderabbitai, sentry) to be addressed in parallel
with CI rather than sequentially

## Test plan
- [ ] Run `/pr-address` on a PR with pending CI and verify it detects
new comments while CI is running
- [ ] Verify CI failures are still handled correctly after the combined
wait
2026-03-18 15:07:13 +07:00
Reinier van der Leer
c7f2a7dd03 fix formatting 2026-03-17 20:30:33 +01:00
Otto
6d0e2063ec Merge commit from fork
* fix(backend): add resource limits to Jinja2 template rendering

Prevent DoS via computational exhaustion in FillTextTemplateBlock by:

- Subclassing SandboxedEnvironment to intercept ** and * operators
  with caps on exponent size (1000) and string repeat length (10K)
- Replacing range() global with a capped version (max 10K items)
- Wrapping template.render() in a ThreadPoolExecutor with a 10s
  timeout to kill runaway expressions

Addresses GHSA-ppw9-h7rv-gwq9 (CWE-400).

* address review: move helpers after TextFormatter, drop ThreadPoolExecutor

- Move _safe_range and _RestrictedEnvironment below TextFormatter
  (helpers after the function that uses them)
- Remove ThreadPoolExecutor timeout wrapper from format_string() —
  it has problematic behavior in async contexts and the static
  interception (operator caps, range limit) already covers the
  known attack vectors

* address review: extend sequence guard, harden format_email, add tests

- Extend * guard to cover list and tuple repetition, not just strings
  (blocks {{ [0] * 999999999 }} and {{ (0,) * 999999999 }})
- Rename MAX_STRING_REPEAT → MAX_SEQUENCE_REPEAT
- Use _RestrictedEnvironment in format_email (defense-in-depth)
- Add tests: list repeat, tuple repeat, negative exponent, nested
  exponentiation (18 tests total)

* add async timeout wrapper at block level

Wrap format_string calls in FillTextTemplateBlock and AgentOutputBlock
with asyncio.wait_for(asyncio.to_thread(...), timeout=10s).

This provides defense-in-depth: if an expression somehow bypasses the
static operator checks, the async timeout will cancel it. Uses
asyncio.to_thread for proper async integration (no event loop blocking)
and asyncio.wait_for for real cancellation on timeout.

* make format_string async with timeout kwarg

Move asyncio.wait_for + asyncio.to_thread into format_string() itself
with a timeout kwarg (default 10s). This way all callers get the
timeout automatically — no wrapper needed at each call site.

- format_string() is now async, callers use await
- format_email() is now async (calls format_string internally)
- Updated all callers: text.py, io.py, llm.py, smart_decision_maker.py,
  email.py, notifications.py
- Tests updated to use asyncio.run()

* use Jinja2 native async rendering instead of to_thread

Switch from asyncio.to_thread(template.render) to Jinja2's native
enable_async=True + template.render_async(). No thread overhead,
proper async integration. asyncio.wait_for timeout still applies.

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-03-17 20:24:04 +01:00
Zamil Majdy
8b577ae194 feat(backend/copilot): add direct ID lookup to find_agent and find_block tools (#12446)
## Summary
- Add direct `creator/slug` lookup to `find_agent` marketplace search,
bypassing full-text search when an exact identifier is provided
- Add direct UUID lookup to `find_block`, returning the block
immediately when a valid block ID is given
- Update tool descriptions and parameter hints to document the new
lookup capabilities

## Test plan
- [ ] Verify `find_agent` with a `creator/slug` query returns the exact
agent
- [ ] Verify `find_agent` falls back to search when slug lookup fails
- [ ] Verify `find_block` with a block UUID returns the exact block
- [ ] Verify `find_block` with a non-existent UUID falls through to
search
- [ ] Verify excluded block types/IDs are still filtered in direct
lookup
2026-03-17 16:41:17 +00:00
Zamil Majdy
d8f5f783ae feat(copilot): enable SmartDecisionMakerBlock in agent generator (#12438)
## Summary
- Enable the agent generator to create orchestrator agents using
**SmartDecisionMakerBlock** with agent mode
- SmartDecisionMaker + AgentExecutorBlock tools = autonomous agent that
decides which sub-agents to call, executes them, reads results, and
loops until done
- Follows existing patterns (AgentExecutorBlock/MCPToolBlock) for fixer,
validator, and guide documentation

## Changes
- Remove SmartDecisionMakerBlock from `COPILOT_EXCLUDED_BLOCK_IDS` in
`find_block.py`
- Add `SMART_DECISION_MAKER_BLOCK_ID` constant to `helpers.py`
- Add `fix_smart_decision_maker_blocks()` in `fixer.py` — populates
agent-mode defaults (`max_iterations=-1`,
`conversation_compaction=True`, etc.)
- Add `validate_smart_decision_maker_blocks()` in `validator.py` —
ensures downstream tool blocks are connected
- Add SmartDecisionMakerBlock documentation section in
`agent_generation_guide.md`
- Add 18 tests: 7 fixer, 7 validator, 4 e2e pipeline

## Test plan
- [x] All 18 new tests pass
(`test/agent_generator/test_smart_decision_maker.py`)
- [x] All 31 existing agent generator tests still pass
- [x] Pre-commit hooks (ruff, black, isort, pyright) all pass
- [ ] Manual: use CoPilot to generate an orchestrator agent with
SmartDecisionMakerBlock

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-03-17 16:30:04 +00:00
Reinier van der Leer
82d22f3680 dx(backend): Update CLAUDE.md (#12458)
- Prefer f-strings except for debug statements
- Top-down module/function/class ordering

As suggested by @majdyz, this is more effective than commenting on every
single instance on PRs.
2026-03-17 16:27:09 +00:00
Zamil Majdy
50622333d1 fix(backend/copilot): fix tool-result file read failures across turns (#12399)
## Summary
- **Path validation fix**: `is_allowed_local_path()` now correctly
handles the SDK's nested conversation UUID path structure
(`<encoded-cwd>/<conversation-uuid>/tool-results/<file>`) instead of
only matching `<encoded-cwd>/tool-results/<file>`
- **`read_workspace_file` fallback**: When the model mistakenly calls
`read_workspace_file` for an SDK tool-result path (local disk, not cloud
storage), the tool now falls back to reading from local disk instead of
returning "file not found"
- **Cross-turn cleanup fix**: Stopped deleting
`~/.claude/projects/<encoded-cwd>/` between turns — tool-result files
now persist across `--resume` turns so the model can re-read them. Added
TTL-based stale directory sweeping (24h) to prevent unbounded disk
growth.
- **System prompt**: Added guidance telling the model to use `read_file`
(not `read_workspace_file`) for SDK tool-result paths
- **Symlink escape fix** (e2b_file_tools.py): Added `readlink -f`
canonicalization inside the E2B sandbox to detect symlink-based path
escapes before writes
- **Stash timeout increase**: `wait_for_stash` timeout increased from
0.5s to 2.0s, with a post-timeout `sleep(0)` fallback

### Root cause
Investigated via Langfuse trace `5116befdca6a6ff9a8af6153753e267d`
(session `d5841fd8`). The model ran 3 Perplexity deep research calls,
SDK truncated large outputs to `~/.claude/projects/.../tool-results/`
files. Model then called `read_workspace_file` (cloud DB) instead of
`read_file` (local disk), getting "file not found". Additionally, the
path validation check didn't account for the SDK's nested UUID directory
structure, and cleanup between turns deleted tool-result files that the
transcript still referenced.

## Test plan
- [x] All 653 copilot tests pass (excluding 1 pre-existing infra test)
- [x] Security test `test_read_claude_projects_settings_json_denied`
still passes — non-tool-result files under the project dir are still
blocked
- [x] `poetry run format` passes all checks
2026-03-17 15:57:15 +00:00
Zamil Majdy
27af5782a9 feat(skills): add gh pr checks --watch to pr-address loop (#12457)
## Summary
- Teaches the `pr-address` skill to use `gh pr checks --watch
--fail-fast` for efficient CI waiting instead of manual polling
- Adds guidance on investigating failures with `gh run view
--log-failed`
- Adds explicit "between CI waits" section: re-fetch and address new bot
comments while CI runs

## Test plan
- [x] Verified the updated skill renders correctly
- [ ] Use `/pr-address` on a PR with pending CI to confirm the new flow
works
2026-03-17 22:10:18 +07:00
Otto
522f932e67 Merge commit from fork
SendEmailBlock accepted user-supplied smtp_server and smtp_port inputs
and passed them directly to smtplib.SMTP() with no IP validation,
bypassing the platform's SSRF protections in request.py.

This fix:
- Makes _resolve_and_check_blocked public in request.py so non-HTTP
  blocks can reuse the same IP validation
- Validates the SMTP server hostname via resolve_and_check_blocked()
  before connecting
- Restricts allowed SMTP ports to standard values (25, 465, 587, 2525)
- Catches SMTPConnectError and SMTPServerDisconnected to prevent TCP
  banner leakage in error messages

Fixes GHSA-4jwj-6mg5-wrwf
2026-03-17 15:55:49 +01:00
Otto
a6124b06d5 Merge commit from fork
* fix(backend): add HMAC signing to Redis cache to prevent pickle deserialization attacks

Add HMAC-SHA256 integrity verification to all values stored in the shared
Redis cache. This prevents cache poisoning attacks where an attacker with
Redis access injects malicious pickled payloads that execute arbitrary code
on deserialization.

Changes:
- Sign pickled values with HMAC-SHA256 before storing in Redis
- Verify HMAC signature before deserializing cached values
- Reject tampered or unsigned (legacy) cache entries gracefully
  (treated as cache misses, logged as warnings)
- Derive HMAC key from redis_password or unsubscribe_secret_key
- Add tests for HMAC round-trip, tamper detection, and legacy rejection

Fixes GHSA-rfg2-37xq-w4m9

* improve log message

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-03-17 15:52:37 +01:00
Otto
ae660ea04f Merge commit from fork
Replace NamedTemporaryFile(delete=False) with a direct Response,
preventing unbounded disk consumption on the public download endpoint.

Fixes: GHSA-374w-2pxq-c9jp
2026-03-17 15:33:55 +01:00
Otto
2479f3a1c4 Merge commit from fork
- Normalize IPv4-mapped IPv6 addresses (e.g. ::ffff:127.0.0.1) to IPv4
  before checking against blocked networks, preventing blocklist bypass
- Add missing blocked ranges: CGNAT (100.64.0.0/10), IETF Protocol
  Assignments (192.0.0.0/24), Benchmarking (198.18.0.0/15)
- Add comprehensive tests for IPv4-mapped bypass and new blocked ranges
2026-03-17 14:43:38 +01:00
Abhimanyu Yadav
8153306384 feat(frontend): reusable confetti with enhanced particles and dual bursts (#12454)
<!-- Clearly explain the need for these changes: -->

The previous confetti implementation using party-js was causing lag.
Replaced it with canvas-confetti for smoother, more performant
celebrations with enhanced visual effects.

### Changes 🏗️

- **New Confetti Component**: Reusable canvas-confetti wrapper with
AutoGPT purple color palette and Storybook stories demonstrating various
effects
- **Enhanced Wallet Confetti**: Dual simultaneous bursts at 45° and 135°
angles with larger particles (scalar 1.2) for better visibility
- **Enhanced Task Celebration**: Dual-burst confetti for task group and
individual task completion events
- **Onboarding Congrats Page**: Replaced party-js with canvas-confetti
for side-cannon animation effect
- **Dependency**: Added canvas-confetti v1.9.4, removed party-js

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Trigger task completion in wallet to see dual-burst confetti at
45° and 135° angles
- [x] Complete tasks/groups to verify celebration confetti displays with
larger particles
  - [x] Visit onboarding congratulations page to see side-cannon effect
  - [x] Verify confetti rendering performance and no console errors
2026-03-17 12:49:15 +00:00
Abhimanyu Yadav
9c3d100a22 feat(frontend): add builder e2e tests for new Flow Editor (#12436)
### Changes
- Replace skipped legacy builder tests with 8 working Playwright e2e
tests
  targeting the new Flow Editor
- Rewrite `BuildPage` page object to match new `data-id`/`data-testid`
  selectors
- Update `agent-activity.spec.ts` to use new `BuildPage` API

### Tests added
  - Build page loads successfully (canvas + control buttons)
  - Add a block via block menu search
  - Add multiple blocks
  - Remove a block (select + Backspace)
  - Save an agent (name/description, verify flowID in URL)
  - Save and verify run button becomes enabled
  - Copy and paste a node (Cmd+C/V)
  - Run an agent from the builder

 ### Test plan
  - [x] All 8 builder tests pass locally (`pnpm test:no-build
  src/tests/build.spec.ts`)
  - [x] `pnpm format`, `pnpm lint`, `pnpm types` all clean
  - [x] CI passes
2026-03-17 12:48:59 +00:00
Zamil Majdy
fc3bf6c154 fix(copilot): handle transient Anthropic API connection errors gracefully (#12445)
## Summary
- Detect transient Anthropic API errors (ECONNRESET, "socket connection
was closed unexpectedly") across all error paths in the copilot SDK
streaming loop
- Replace raw technical error messages with user-friendly text:
**"Anthropic connection interrupted — please retry"**
- Add `retryable` field to `StreamError` model so the frontend can
distinguish retryable errors
- Add **"Try Again" button** on the error card for transient errors,
which re-sends the last user message

### Background
Sentry issue
[AUTOGPT-SERVER-875](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-875)
— 25+ events since March 13, caused by Anthropic API infrastructure
instability (confirmed by their status page). Same SDK/code on dev and
prod, prod-only because of higher volume of long-running streaming
sessions.

### Changes
**Backend (`constants.py`, `service.py`, `response_adapter.py`,
`response_model.py`):**
- `is_transient_api_error()` — pattern-matching helper for known
transient error strings
- Intercept transient errors in 3 places: `AssistantMessage.error`,
stream exceptions, `BaseException` handler
- Use friendly message in error markers persisted to session (so it
shows properly on page refresh too)
- `StreamError.retryable` field for frontend consumption

**Frontend (`ChatContainer`, `ChatMessagesContainer`,
`MessagePartRenderer`):**
- Thread `onRetry` callback from `ChatContainer` →
`ChatMessagesContainer` → `MessagePartRenderer`
- Detect transient error text in error markers and show "Try Again"
button via existing `ErrorCard.onRetry`
- Clicking "Try Again" re-sends the last user message (backend
auto-cleans stale error markers)

Fixes SECRT-2128, SECRT-2129, SECRT-2130

## Test plan
- [ ] Verify transient error detection with `is_transient_api_error()`
for known patterns
- [ ] Confirm error card shows "Anthropic connection interrupted —
please retry" instead of raw socket error
- [ ] Confirm "Try Again" button appears on transient error cards
- [ ] Confirm "Try Again" re-sends the last user message successfully
- [ ] Confirm non-transient errors (e.g., "Prompt is too long") still
show original error text without retry button
- [ ] Verify error marker persists correctly on page refresh
2026-03-17 12:48:53 +00:00
Abhimanyu Yadav
e32d258a7e feat(blocks): add AgentMail integration blocks (#12417)
## Summary
- Add a full AgentMail integration with blocks for managing inboxes,
messages, threads, drafts, attachments, lists, and pods
- Includes shared provider configuration (`_config.py`) with API key
authentication
- 8 block modules covering ~25 individual blocks across all AgentMail
API surfaces

  ## Block Modules
  | Module | Blocks |
  |--------|--------|
  | `inbox.py` | Create, Get, List, Update, Delete inboxes |
| `messages.py` | Send, Get, List, Delete messages + org-wide listing |
  | `threads.py` | Get, List, Delete threads + org-wide listing |
| `drafts.py` | Create, Get, List, Update, Send, Delete drafts +
org-wide listing |
  | `attachments.py` | Download attachments |
  | `lists.py` | Create, Get, List, Update, Delete mailing lists |
  | `pods.py` | Create, Get, List, Update, Delete pods |

  ## Test plan
- [x] `poetry run pytest 'backend/blocks/test/test_block.py' -xvs` — all
new blocks pass the standard block test suite
  - [x] test all blocks manually
2026-03-17 12:40:32 +00:00
Abhimanyu Yadav
3e86544bfe feat(frontend): add graph search functionality to new builder (#12395)
### Changes
- Integrates the existing graph search components into the new builder's
control panel
- Search by block name/title, block type, node inputs/outputs, and
description with fuzzy matching
  (Jaro-Winkler)
- Clicking a result zooms/navigates to the node on the canvas
- Keyboard shortcut Cmd/Ctrl+F to open search
- Arrow key navigation and Enter to select within results
- Styled to match the new builder's block menu card pattern


https://github.com/user-attachments/assets/41ed676d-83b1-4f00-8611-00d20987a7af


### Test plan

- [x] Open builder with a graph containing multiple nodes
- [x] Click magnifying glass icon in control panel — search panel opens
- [x] Type a query — results filter by name, type, inputs, outputs
- [x] Click a result — canvas zooms to that node
- [x] Use arrow keys + Enter to navigate and select results
- [x] Press Cmd/Ctrl+F — search panel opens
- [x] Press Escape or click outside — search panel closes and query
clears
2026-03-17 12:19:54 +00:00
2283 changed files with 21105 additions and 735816 deletions

View File

@@ -2,7 +2,7 @@
name: pr-address
description: Address PR review comments and loop until CI green and all comments resolved. TRIGGER when user asks to address comments, fix PR feedback, respond to reviewers, or babysit/monitor a PR.
user-invocable: true
args: "[PR number or URL] — if omitted, finds PR for current branch."
argument-hint: "[PR number or URL] — if omitted, finds PR for current branch."
metadata:
author: autogpt-team
version: "1.0.0"
@@ -19,16 +19,60 @@ gh pr view {N}
## Fetch comments (all sources)
### 1. Inline review threads — GraphQL (primary source of actionable items)
Use GraphQL to fetch inline threads. It natively exposes `isResolved`, returns threads already grouped with all replies, and paginates via cursor — no manual thread reconstruction needed.
```bash
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews # top-level reviews
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments # inline review comments
gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments # PR conversation comments
gh api graphql -f query='
{
repository(owner: "Significant-Gravitas", name: "AutoGPT") {
pullRequest(number: {N}) {
reviewThreads(first: 100) {
pageInfo { hasNextPage endCursor }
nodes {
id
isResolved
path
comments(last: 1) {
nodes { databaseId body author { login } createdAt }
}
}
}
}
}
}'
```
**Bots to watch for:**
- `autogpt-reviewer` — posts "Blockers", "Should Fix", "Nice to Have". Address ALL of them.
- `sentry[bot]` — bug predictions. Fix real bugs, explain false positives.
- `coderabbitai[bot]` — automated review. Address actionable items.
If `pageInfo.hasNextPage` is true, fetch subsequent pages by adding `after: "<endCursor>"` to `reviewThreads(first: 100, after: "...")` and repeat until `hasNextPage` is false.
**Filter to unresolved threads only** — skip any thread where `isResolved: true`. `comments(last: 1)` returns the most recent comment in the thread — act on that; it reflects the reviewer's final ask. Use the thread `id` (Relay global ID) to track threads across polls.
### 2. Top-level reviews — REST (MUST paginate)
```bash
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate
```
**CRITICAL — always `--paginate`.** Reviews default to 30 per page. PRs can have 80170+ reviews (mostly empty resolution events). Without pagination you miss reviews past position 30 — including `autogpt-reviewer`'s structured review which is typically posted after several CI runs and sits well beyond the first page.
Two things to extract:
- **Overall state**: look for `CHANGES_REQUESTED` or `APPROVED` reviews.
- **Actionable feedback**: non-empty bodies only. Empty-body reviews are thread-resolution events — they indicate progress but have no feedback to act on.
**Where each reviewer posts:**
- `autogpt-reviewer` — posts detailed structured reviews ("Blockers", "Should Fix", "Nice to Have") as **top-level reviews**. Not present on every PR. Address ALL items.
- `sentry[bot]` — posts bug predictions as **inline threads**. Fix real bugs, explain false positives.
- `coderabbitai[bot]` — posts summaries as **top-level reviews** AND actionable items as **inline threads**. Address actionable items.
- Human reviewers — can post in any source. Address ALL non-empty feedback.
### 3. PR conversation comments — REST
```bash
gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments --paginate
```
Mostly contains: bot summaries (`coderabbitai[bot]`), CI/conflict detection (`github-actions[bot]`), and author status updates. Scan for non-empty messages from non-bot human reviewers that aren't the PR author — those are the ones that need a response.
## For each unaddressed comment
@@ -40,8 +84,8 @@ Address comments **one at a time**: fix → commit → push → inline reply →
| Comment type | How to reply |
|---|---|
| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="Fixed in <commit-sha>: <description>"` |
| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="Fixed in <commit-sha>: <description>"` |
| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="🤖 Fixed in <commit-sha>: <description>"` |
| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="🤖 Fixed in <commit-sha>: <description>"` |
## Format and commit
@@ -69,11 +113,88 @@ For backend commits in worktrees: `poetry run git commit` (pre-commit hooks).
```text
address comments → format → commit → push
re-check comments → fix new ones → push
→ wait for CI → re-check comments after CI settles
wait for CI (while addressing new comments) → fix failures → push
→ re-check comments after CI settles
→ repeat until: all comments addressed AND CI green AND no new comments arriving
```
While CI runs, stay productive: run local tests, address remaining comments.
### Polling for CI + new comments
**The loop ends when:** CI fully green + all comments addressed + no new comments since CI settled.
After pushing, poll for **both** CI status and new comments in a single loop. Do not use `gh pr checks --watch` — it blocks the tool and prevents reacting to new comments while CI is running.
> **Note:** `gh pr checks --watch --fail-fast` is tempting but it blocks the entire Bash tool call, meaning the agent cannot check for or address new comments until CI fully completes. Always poll manually instead.
**Polling loop — repeat every 30 seconds:**
1. Check CI status:
```bash
gh pr checks {N} --repo Significant-Gravitas/AutoGPT --json bucket,name,link
```
Parse the results: if every check has `bucket` of `"pass"` or `"skipping"`, CI is green. If any has `"fail"`, CI has failed. Otherwise CI is still pending.
2. Check for merge conflicts:
```bash
gh pr view {N} --repo Significant-Gravitas/AutoGPT --json mergeable --jq '.mergeable'
```
If the result is `"CONFLICTING"`, the PR has a merge conflict — see "Resolving merge conflicts" below. If `"UNKNOWN"`, GitHub is still computing mergeability — wait and re-check next poll.
3. Check for new/changed comments (all three sources):
**Inline threads** — re-run the GraphQL query from "Fetch comments". For each unresolved thread, record `{thread_id, last_comment_databaseId}` as your baseline. On each poll, action is needed if:
- A new thread `id` appears that wasn't in the baseline (new thread), OR
- An existing thread's `last_comment_databaseId` has changed (new reply on existing thread)
**Conversation comments:**
```bash
gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments --paginate
```
Compare total count and newest `id` against baseline. Filter to non-empty, non-bot, non-author-update messages.
**Top-level reviews:**
```bash
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate
```
Watch for new non-empty reviews (`CHANGES_REQUESTED` or `COMMENTED` with body). Compare total count and newest `id` against baseline.
4. **React in this precedence order (first match wins):**
| What happened | Action |
|---|---|
| Merge conflict detected | See "Resolving merge conflicts" below. |
| Mergeability is `UNKNOWN` | GitHub is still computing mergeability. Sleep 30 seconds, then restart polling from the top. |
| New comments detected | Address them (fix → commit → push → reply). After pushing, re-fetch all comments to update your baseline, then restart this polling loop from the top (new commits invalidate CI status). |
| CI failed (bucket == "fail") | Get failed check links: `gh pr checks {N} --repo Significant-Gravitas/AutoGPT --json bucket,link --jq '.[] \| select(.bucket == "fail") \| .link'`. Extract run ID from link (format: `.../actions/runs/<run-id>/job/...`), read logs with `gh run view <run-id> --repo Significant-Gravitas/AutoGPT --log-failed`. Fix → commit → push → restart polling. |
| CI green + no new comments | **Do not exit immediately.** Bots (coderabbitai, sentry) often post reviews shortly after CI settles. Continue polling for **2 more cycles (60s)** after CI goes green. Only exit after 2 consecutive green+quiet polls. |
| CI pending + no new comments | Sleep 30 seconds, then poll again. |
**The loop ends when:** CI fully green + all comments addressed + **2 consecutive polls with no new comments after CI settled.**
### Resolving merge conflicts
1. Identify the PR's target branch and remote:
```bash
gh pr view {N} --repo Significant-Gravitas/AutoGPT --json baseRefName --jq '.baseRefName'
git remote -v # find the remote pointing to Significant-Gravitas/AutoGPT (typically 'upstream' in forks, 'origin' for direct contributors)
```
2. Pull the latest base branch with a 3-way merge:
```bash
git pull {base-remote} {base-branch} --no-rebase
```
3. Resolve conflicting files, then verify no conflict markers remain:
```bash
if grep -R -n -E '^(<<<<<<<|=======|>>>>>>>)' <conflicted-files>; then
echo "Unresolved conflict markers found — resolve before proceeding."
exit 1
fi
```
4. Stage and push:
```bash
git add <conflicted-files>
git commit -m "Resolve merge conflicts with {base-branch}"
git push
```
5. Restart the polling loop from the top — new commits reset CI status.

View File

@@ -28,7 +28,7 @@ gh pr diff {N}
Before posting anything, fetch existing inline comments to avoid duplicates:
```bash
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews
```

View File

@@ -27,10 +27,91 @@ defaults:
working-directory: autogpt_platform/backend
jobs:
lint:
permissions:
contents: read
timeout-minutes: 10
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Set up Python 3.12
uses: actions/setup-python@v5
with:
python-version: "3.12"
- name: Set up Python dependency cache
uses: actions/cache@v5
with:
path: ~/.cache/pypoetry
key: poetry-${{ runner.os }}-py3.12-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
- name: Install Poetry
run: |
HEAD_POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
echo "Using Poetry version ${HEAD_POETRY_VERSION}"
curl -sSL https://install.python-poetry.org | POETRY_VERSION=$HEAD_POETRY_VERSION python3 -
- name: Install Python dependencies
run: poetry install
- name: Run Linters
run: poetry run lint --skip-pyright
env:
CI: true
PLAIN_OUTPUT: True
type-check:
permissions:
contents: read
timeout-minutes: 10
strategy:
fail-fast: false
matrix:
python-version: ["3.11", "3.12", "3.13"]
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Set up Python dependency cache
uses: actions/cache@v5
with:
path: ~/.cache/pypoetry
key: poetry-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
- name: Install Poetry
run: |
HEAD_POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
echo "Using Poetry version ${HEAD_POETRY_VERSION}"
curl -sSL https://install.python-poetry.org | POETRY_VERSION=$HEAD_POETRY_VERSION python3 -
- name: Install Python dependencies
run: poetry install
- name: Generate Prisma Client
run: poetry run prisma generate && poetry run gen-prisma-stub
- name: Run Pyright
run: poetry run pyright --pythonversion ${{ matrix.python-version }}
env:
CI: true
PLAIN_OUTPUT: True
test:
permissions:
contents: read
timeout-minutes: 30
timeout-minutes: 15
strategy:
fail-fast: false
matrix:
@@ -98,9 +179,9 @@ jobs:
uses: actions/cache@v5
with:
path: ~/.cache/pypoetry
key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
key: poetry-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
- name: Install Poetry (Unix)
- name: Install Poetry
run: |
# Extract Poetry version from backend/poetry.lock
HEAD_POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
@@ -158,22 +239,22 @@ jobs:
echo "Waiting for ClamAV daemon to start..."
max_attempts=60
attempt=0
until nc -z localhost 3310 || [ $attempt -eq $max_attempts ]; do
echo "ClamAV is unavailable - sleeping (attempt $((attempt+1))/$max_attempts)"
sleep 5
attempt=$((attempt+1))
done
if [ $attempt -eq $max_attempts ]; then
echo "ClamAV failed to start after $((max_attempts*5)) seconds"
echo "Checking ClamAV service logs..."
docker logs $(docker ps -q --filter "ancestor=clamav/clamav-debian:latest") 2>&1 | tail -50 || echo "No ClamAV container found"
exit 1
fi
echo "ClamAV is ready!"
# Verify ClamAV is responsive
echo "Testing ClamAV connection..."
timeout 10 bash -c 'echo "PING" | nc localhost 3310' || {
@@ -188,18 +269,13 @@ jobs:
DATABASE_URL: ${{ steps.supabase.outputs.DB_URL }}
DIRECT_URL: ${{ steps.supabase.outputs.DB_URL }}
- id: lint
name: Run Linter
run: poetry run lint
- name: Run pytest with coverage
- name: Run pytest
run: |
if [[ "${{ runner.debug }}" == "1" ]]; then
poetry run pytest -s -vv -o log_cli=true -o log_cli_level=DEBUG
else
poetry run pytest -s -vv
fi
if: success() || (failure() && steps.lint.outcome == 'failure')
env:
LOG_LEVEL: ${{ runner.debug && 'DEBUG' || 'INFO' }}
DATABASE_URL: ${{ steps.supabase.outputs.DB_URL }}
@@ -211,6 +287,12 @@ jobs:
REDIS_PORT: "6379"
ENCRYPTION_KEY: "dvziYgz0KSK8FENhju0ZYi8-fRTfAdlz6YLhdB_jhNw=" # DO NOT USE IN PRODUCTION!!
# - name: Upload coverage reports to Codecov
# uses: codecov/codecov-action@v4
# with:
# token: ${{ secrets.CODECOV_TOKEN }}
# flags: backend,${{ runner.os }}
env:
CI: true
PLAIN_OUTPUT: True
@@ -224,9 +306,3 @@ jobs:
# the backend service, docker composes, and examples
RABBITMQ_DEFAULT_USER: "rabbitmq_user_default"
RABBITMQ_DEFAULT_PASS: "k0VMxyIJF9S35f3x2uaw5IWAl6Y536O7"
# - name: Upload coverage reports to Codecov
# uses: codecov/codecov-action@v4
# with:
# token: ${{ secrets.CODECOV_TOKEN }}
# flags: backend,${{ runner.os }}

View File

@@ -294,7 +294,7 @@ jobs:
uses: actions/upload-artifact@v4
with:
name: playwright-report
path: playwright-report
path: autogpt_platform/frontend/playwright-report
if-no-files-found: ignore
retention-days: 3
@@ -303,7 +303,7 @@ jobs:
uses: actions/upload-artifact@v4
with:
name: playwright-test-results
path: test-results
path: autogpt_platform/frontend/test-results
if-no-files-found: ignore
retention-days: 3

View File

@@ -56,15 +56,35 @@ AutoGPT Platform is a monorepo containing:
- Ensure the branch name is descriptive (e.g., `feature/add-new-block`)
- Use conventional commit messages (see below)
- Fill out the .github/PULL_REQUEST_TEMPLATE.md template as the PR description
- Always use `--body-file` to pass PR body — avoids shell interpretation of backticks and special characters:
```bash
PR_BODY=$(mktemp)
cat > "$PR_BODY" << 'PREOF'
## Summary
- use `backticks` freely here
PREOF
gh pr create --title "..." --body-file "$PR_BODY" --base dev
rm "$PR_BODY"
```
- Run the github pre-commit hooks to ensure code quality.
### Test-Driven Development (TDD)
When fixing a bug or adding a feature, follow a test-first approach:
1. **Write a failing test first** — create a test that reproduces the bug or validates the new behavior, marked with `@pytest.mark.xfail` (backend) or `.fixme` (Playwright). Run it to confirm it fails for the right reason.
2. **Implement the fix/feature** — write the minimal code to make the test pass.
3. **Remove the xfail marker** — once the test passes, remove the `xfail`/`.fixme` annotation and run the full test suite to confirm nothing else broke.
This ensures every change is covered by a test and that the test actually validates the intended behavior.
### Reviewing/Revising Pull Requests
Use `/pr-review` to review a PR or `/pr-address` to address comments.
When fetching comments manually:
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews` — top-level reviews
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments` — inline review comments
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate` — top-level reviews
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate` — inline review comments (always paginate to avoid missing comments beyond page 1)
- `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments` — PR conversation comments
### Conventional Commits

View File

@@ -37,10 +37,6 @@ JWT_VERIFY_KEY=your-super-secret-jwt-token-with-at-least-32-characters-long
ENCRYPTION_KEY=dvziYgz0KSK8FENhju0ZYi8-fRTfAdlz6YLhdB_jhNw=
UNSUBSCRIBE_SECRET_KEY=HlP8ivStJjmbf6NKi78m_3FnOogut0t5ckzjsIqeaio=
## ===== SIGNUP / INVITE GATE ===== ##
# Set to true to require an invite before users can sign up
ENABLE_INVITE_GATE=false
## ===== IMPORTANT OPTIONAL CONFIGURATION ===== ##
# Platform URLs (set these for webhooks and OAuth to work)
PLATFORM_BASE_URL=http://localhost:8000

View File

@@ -66,7 +66,7 @@ poetry run pytest path/to/test.py --snapshot-update
- **No linter suppressors** — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code
- **List comprehensions** over manual loop-and-append
- **Early return** — guard clauses first, avoid deep nesting
- **Lazy `%s` logging** — `logger.info("Processing %s items", count)` not `logger.info(f"Processing {count} items")`
- **f-strings vs printf syntax in log statements** — Use `%s` for deferred interpolation in `debug` statements, f-strings elsewhere for readability: `logger.debug("Processing %s items", count)`, `logger.info(f"Processing {count} items")`
- **Sanitize error paths** — `os.path.basename()` in error messages to avoid leaking directory structure
- **TOCTOU awareness** — avoid check-then-act patterns for file access and credit charging
- **`Security()` vs `Depends()`** — use `Security()` for auth deps to get proper OpenAPI security spec
@@ -75,6 +75,7 @@ poetry run pytest path/to/test.py --snapshot-update
- **SSE protocol** — `data:` lines for frontend-parsed events (must match Zod schema), `: comment` lines for heartbeats/status
- **File length** — keep files under ~300 lines; if a file grows beyond this, split by responsibility (e.g. extract helpers, models, or a sub-module into a new file). Never keep appending to a long file.
- **Function length** — keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions are a sign of mixed concerns, not complexity.
- **Top-down ordering** — define the main/public function or class first, then the helpers it uses below. A reader should encounter high-level logic before implementation details.
## Testing Approach
@@ -84,6 +85,30 @@ poetry run pytest path/to/test.py --snapshot-update
- After refactoring, update mock targets to match new module paths
- Use `AsyncMock` for async functions (`from unittest.mock import AsyncMock`)
### Test-Driven Development (TDD)
When fixing a bug or adding a feature, write the test **before** the implementation:
```python
# 1. Write a failing test marked xfail
@pytest.mark.xfail(reason="Bug #1234: widget crashes on empty input")
def test_widget_handles_empty_input():
result = widget.process("")
assert result == Widget.EMPTY_RESULT
# 2. Run it — confirm it fails (XFAIL)
# poetry run pytest path/to/test.py::test_widget_handles_empty_input -xvs
# 3. Implement the fix
# 4. Remove xfail, run again — confirm it passes
def test_widget_handles_empty_input():
result = widget.process("")
assert result == Widget.EMPTY_RESULT
```
This catches regressions and proves the fix actually works. **Every bug fix should include a test that would have caught it.**
## Database Schema
Key models (defined in `schema.prisma`):

View File

@@ -50,7 +50,7 @@ RUN poetry install --no-ansi --no-root
# Generate Prisma client
COPY autogpt_platform/backend/schema.prisma ./
COPY autogpt_platform/backend/backend/data/partial_types.py ./backend/data/partial_types.py
COPY autogpt_platform/backend/gen_prisma_types_stub.py ./
COPY autogpt_platform/backend/scripts/gen_prisma_types_stub.py ./scripts/
RUN poetry run prisma generate && poetry run gen-prisma-stub
# =============================== DB MIGRATOR =============================== #
@@ -82,7 +82,7 @@ RUN pip3 install prisma>=0.15.0 --break-system-packages
COPY autogpt_platform/backend/schema.prisma ./
COPY autogpt_platform/backend/backend/data/partial_types.py ./backend/data/partial_types.py
COPY autogpt_platform/backend/gen_prisma_types_stub.py ./
COPY autogpt_platform/backend/scripts/gen_prisma_types_stub.py ./scripts/
COPY autogpt_platform/backend/migrations ./migrations
# ============================== BACKEND SERVER ============================== #
@@ -121,19 +121,37 @@ RUN ln -s ../lib/node_modules/npm/bin/npm-cli.js /usr/bin/npm \
&& ln -s ../lib/node_modules/npm/bin/npx-cli.js /usr/bin/npx
COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries
# Install agent-browser (Copilot browser tool) + Chromium runtime dependencies.
# These are the runtime libraries Chromium/Playwright needs on Debian 13 (trixie).
RUN apt-get update && apt-get install -y --no-install-recommends \
libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 \
libdbus-1-3 libxkbcommon0 libatspi2.0-0t64 libxcomposite1 libxdamage1 \
libxfixes3 libxrandr2 libgbm1 libasound2t64 libpango-1.0-0 libcairo2 \
libx11-6 libx11-xcb1 libxcb1 libxext6 libglib2.0-0t64 \
fonts-liberation libfontconfig1 \
# Install agent-browser (Copilot browser tool) + Chromium.
# On amd64: install runtime libs + run `agent-browser install` to download
# Chrome for Testing (pinned version, tested with Playwright).
# On arm64: install system chromium package — Chrome for Testing has no ARM64
# binary. AGENT_BROWSER_EXECUTABLE_PATH is set at runtime by the entrypoint
# script (below) to redirect agent-browser to the system binary.
ARG TARGETARCH
RUN apt-get update \
&& if [ "$TARGETARCH" = "arm64" ]; then \
apt-get install -y --no-install-recommends chromium fonts-liberation; \
else \
apt-get install -y --no-install-recommends \
libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 \
libdbus-1-3 libxkbcommon0 libatspi2.0-0t64 libxcomposite1 libxdamage1 \
libxfixes3 libxrandr2 libgbm1 libasound2t64 libpango-1.0-0 libcairo2 \
libx11-6 libx11-xcb1 libxcb1 libxext6 libglib2.0-0t64 \
fonts-liberation libfontconfig1; \
fi \
&& rm -rf /var/lib/apt/lists/* \
&& npm install -g agent-browser \
&& agent-browser install \
&& ([ "$TARGETARCH" = "arm64" ] || agent-browser install) \
&& rm -rf /tmp/* /root/.npm
# On arm64 the system chromium is at /usr/bin/chromium; set
# AGENT_BROWSER_EXECUTABLE_PATH so agent-browser's daemon uses it instead of
# Chrome for Testing (which has no ARM64 binary). On amd64 the variable is left
# unset so agent-browser uses the Chrome for Testing binary it downloaded above.
RUN printf '#!/bin/sh\n[ -x /usr/bin/chromium ] && export AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium\nexec "$@"\n' \
> /usr/local/bin/entrypoint.sh \
&& chmod +x /usr/local/bin/entrypoint.sh
WORKDIR /app/autogpt_platform/backend
# Copy only the .venv from builder (not the entire /app directory)
@@ -155,4 +173,5 @@ RUN POETRY_VIRTUALENVS_CREATE=true POETRY_VIRTUALENVS_IN_PROJECT=true \
ENV PORT=8000
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
CMD ["rest"]

View File

@@ -1,17 +1,8 @@
from __future__ import annotations
from datetime import datetime
from typing import TYPE_CHECKING, Any, Literal, Optional
import prisma.enums
from pydantic import BaseModel, EmailStr
from pydantic import BaseModel
from backend.data.model import UserTransaction
from backend.util.models import Pagination
if TYPE_CHECKING:
from backend.data.invited_user import BulkInvitedUsersResult, InvitedUserRecord
class UserHistoryResponse(BaseModel):
"""Response model for listings with version history"""
@@ -23,70 +14,3 @@ class UserHistoryResponse(BaseModel):
class AddUserCreditsResponse(BaseModel):
new_balance: int
transaction_key: str
class CreateInvitedUserRequest(BaseModel):
email: EmailStr
name: Optional[str] = None
class InvitedUserResponse(BaseModel):
id: str
email: str
status: prisma.enums.InvitedUserStatus
auth_user_id: Optional[str] = None
name: Optional[str] = None
tally_understanding: Optional[dict[str, Any]] = None
tally_status: prisma.enums.TallyComputationStatus
tally_computed_at: Optional[datetime] = None
tally_error: Optional[str] = None
created_at: datetime
updated_at: datetime
@classmethod
def from_record(cls, record: InvitedUserRecord) -> InvitedUserResponse:
return cls.model_validate(record.model_dump())
class InvitedUsersResponse(BaseModel):
invited_users: list[InvitedUserResponse]
pagination: Pagination
class BulkInvitedUserRowResponse(BaseModel):
row_number: int
email: Optional[str] = None
name: Optional[str] = None
status: Literal["CREATED", "SKIPPED", "ERROR"]
message: str
invited_user: Optional[InvitedUserResponse] = None
class BulkInvitedUsersResponse(BaseModel):
created_count: int
skipped_count: int
error_count: int
results: list[BulkInvitedUserRowResponse]
@classmethod
def from_result(cls, result: BulkInvitedUsersResult) -> BulkInvitedUsersResponse:
return cls(
created_count=result.created_count,
skipped_count=result.skipped_count,
error_count=result.error_count,
results=[
BulkInvitedUserRowResponse(
row_number=row.row_number,
email=row.email,
name=row.name,
status=row.status,
message=row.message,
invited_user=(
InvitedUserResponse.from_record(row.invited_user)
if row.invited_user is not None
else None
),
)
for row in result.results
],
)

View File

@@ -1,137 +0,0 @@
import logging
import math
from autogpt_libs.auth import get_user_id, requires_admin_user
from fastapi import APIRouter, File, Query, Security, UploadFile
from backend.data.invited_user import (
bulk_create_invited_users_from_file,
create_invited_user,
list_invited_users,
retry_invited_user_tally,
revoke_invited_user,
)
from backend.data.tally import mask_email
from backend.util.models import Pagination
from .model import (
BulkInvitedUsersResponse,
CreateInvitedUserRequest,
InvitedUserResponse,
InvitedUsersResponse,
)
logger = logging.getLogger(__name__)
router = APIRouter(
prefix="/admin",
tags=["users", "admin"],
dependencies=[Security(requires_admin_user)],
)
@router.get(
"/invited-users",
response_model=InvitedUsersResponse,
summary="List Invited Users",
)
async def get_invited_users(
admin_user_id: str = Security(get_user_id),
page: int = Query(1, ge=1),
page_size: int = Query(50, ge=1, le=200),
) -> InvitedUsersResponse:
logger.info("Admin user %s requested invited users", admin_user_id)
invited_users, total = await list_invited_users(page=page, page_size=page_size)
return InvitedUsersResponse(
invited_users=[InvitedUserResponse.from_record(iu) for iu in invited_users],
pagination=Pagination(
total_items=total,
total_pages=max(1, math.ceil(total / page_size)),
current_page=page,
page_size=page_size,
),
)
@router.post(
"/invited-users",
response_model=InvitedUserResponse,
summary="Create Invited User",
)
async def create_invited_user_route(
request: CreateInvitedUserRequest,
admin_user_id: str = Security(get_user_id),
) -> InvitedUserResponse:
logger.info(
"Admin user %s creating invited user for %s",
admin_user_id,
mask_email(request.email),
)
invited_user = await create_invited_user(request.email, request.name)
logger.info(
"Admin user %s created invited user %s",
admin_user_id,
invited_user.id,
)
return InvitedUserResponse.from_record(invited_user)
@router.post(
"/invited-users/bulk",
response_model=BulkInvitedUsersResponse,
summary="Bulk Create Invited Users",
operation_id="postV2BulkCreateInvitedUsers",
)
async def bulk_create_invited_users_route(
file: UploadFile = File(...),
admin_user_id: str = Security(get_user_id),
) -> BulkInvitedUsersResponse:
logger.info(
"Admin user %s bulk invited users from %s",
admin_user_id,
file.filename or "<unnamed>",
)
content = await file.read()
result = await bulk_create_invited_users_from_file(file.filename, content)
return BulkInvitedUsersResponse.from_result(result)
@router.post(
"/invited-users/{invited_user_id}/revoke",
response_model=InvitedUserResponse,
summary="Revoke Invited User",
)
async def revoke_invited_user_route(
invited_user_id: str,
admin_user_id: str = Security(get_user_id),
) -> InvitedUserResponse:
logger.info(
"Admin user %s revoking invited user %s", admin_user_id, invited_user_id
)
invited_user = await revoke_invited_user(invited_user_id)
logger.info("Admin user %s revoked invited user %s", admin_user_id, invited_user_id)
return InvitedUserResponse.from_record(invited_user)
@router.post(
"/invited-users/{invited_user_id}/retry-tally",
response_model=InvitedUserResponse,
summary="Retry Invited User Tally",
)
async def retry_invited_user_tally_route(
invited_user_id: str,
admin_user_id: str = Security(get_user_id),
) -> InvitedUserResponse:
logger.info(
"Admin user %s retrying Tally seed for invited user %s",
admin_user_id,
invited_user_id,
)
invited_user = await retry_invited_user_tally(invited_user_id)
logger.info(
"Admin user %s retried Tally seed for invited user %s",
admin_user_id,
invited_user_id,
)
return InvitedUserResponse.from_record(invited_user)

View File

@@ -1,168 +0,0 @@
from datetime import datetime, timezone
from unittest.mock import AsyncMock
import fastapi
import fastapi.testclient
import prisma.enums
import pytest
import pytest_mock
from autogpt_libs.auth.jwt_utils import get_jwt_payload
from backend.data.invited_user import (
BulkInvitedUserRowResult,
BulkInvitedUsersResult,
InvitedUserRecord,
)
from .user_admin_routes import router as user_admin_router
app = fastapi.FastAPI()
app.include_router(user_admin_router)
client = fastapi.testclient.TestClient(app)
@pytest.fixture(autouse=True)
def setup_app_admin_auth(mock_jwt_admin):
app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
yield
app.dependency_overrides.clear()
def _sample_invited_user() -> InvitedUserRecord:
now = datetime.now(timezone.utc)
return InvitedUserRecord(
id="invite-1",
email="invited@example.com",
status=prisma.enums.InvitedUserStatus.INVITED,
auth_user_id=None,
name="Invited User",
tally_understanding=None,
tally_status=prisma.enums.TallyComputationStatus.PENDING,
tally_computed_at=None,
tally_error=None,
created_at=now,
updated_at=now,
)
def _sample_bulk_invited_users_result() -> BulkInvitedUsersResult:
return BulkInvitedUsersResult(
created_count=1,
skipped_count=1,
error_count=0,
results=[
BulkInvitedUserRowResult(
row_number=1,
email="invited@example.com",
name=None,
status="CREATED",
message="Invite created",
invited_user=_sample_invited_user(),
),
BulkInvitedUserRowResult(
row_number=2,
email="duplicate@example.com",
name=None,
status="SKIPPED",
message="An invited user with this email already exists",
invited_user=None,
),
],
)
def test_get_invited_users(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.user_admin_routes.list_invited_users",
AsyncMock(return_value=([_sample_invited_user()], 1)),
)
response = client.get("/admin/invited-users")
assert response.status_code == 200
data = response.json()
assert len(data["invited_users"]) == 1
assert data["invited_users"][0]["email"] == "invited@example.com"
assert data["invited_users"][0]["status"] == "INVITED"
assert data["pagination"]["total_items"] == 1
assert data["pagination"]["current_page"] == 1
assert data["pagination"]["page_size"] == 50
def test_create_invited_user(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.user_admin_routes.create_invited_user",
AsyncMock(return_value=_sample_invited_user()),
)
response = client.post(
"/admin/invited-users",
json={"email": "invited@example.com", "name": "Invited User"},
)
assert response.status_code == 200
data = response.json()
assert data["email"] == "invited@example.com"
assert data["name"] == "Invited User"
def test_bulk_create_invited_users(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.user_admin_routes.bulk_create_invited_users_from_file",
AsyncMock(return_value=_sample_bulk_invited_users_result()),
)
response = client.post(
"/admin/invited-users/bulk",
files={
"file": ("invites.txt", b"invited@example.com\nduplicate@example.com\n")
},
)
assert response.status_code == 200
data = response.json()
assert data["created_count"] == 1
assert data["skipped_count"] == 1
assert data["results"][0]["status"] == "CREATED"
assert data["results"][1]["status"] == "SKIPPED"
def test_revoke_invited_user(
mocker: pytest_mock.MockerFixture,
) -> None:
revoked = _sample_invited_user().model_copy(
update={"status": prisma.enums.InvitedUserStatus.REVOKED}
)
mocker.patch(
"backend.api.features.admin.user_admin_routes.revoke_invited_user",
AsyncMock(return_value=revoked),
)
response = client.post("/admin/invited-users/invite-1/revoke")
assert response.status_code == 200
assert response.json()["status"] == "REVOKED"
def test_retry_invited_user_tally(
mocker: pytest_mock.MockerFixture,
) -> None:
retried = _sample_invited_user().model_copy(
update={"tally_status": prisma.enums.TallyComputationStatus.RUNNING}
)
mocker.patch(
"backend.api.features.admin.user_admin_routes.retry_invited_user_tally",
AsyncMock(return_value=retried),
)
response = client.post("/admin/invited-users/invite-1/retry-tally")
assert response.status_code == 200
assert response.json()["tally_status"] == "RUNNING"

View File

@@ -4,14 +4,12 @@ from difflib import SequenceMatcher
from typing import Any, Sequence, get_args, get_origin
import prisma
from prisma.enums import ContentType
from prisma.models import mv_suggested_blocks
import backend.api.features.library.db as library_db
import backend.api.features.library.model as library_model
import backend.api.features.store.db as store_db
import backend.api.features.store.model as store_model
from backend.api.features.store.hybrid_search import unified_hybrid_search
from backend.blocks import load_all_blocks
from backend.blocks._base import (
AnyBlockSchema,
@@ -24,6 +22,7 @@ from backend.blocks.llm import LlmModel
from backend.integrations.providers import ProviderName
from backend.util.cache import cached
from backend.util.models import Pagination
from backend.util.text import split_camelcase
from .model import (
BlockCategoryResponse,
@@ -271,7 +270,7 @@ async def _build_cached_search_results(
# Use hybrid search when query is present, otherwise list all blocks
if (include_blocks or include_integrations) and normalized_query:
block_results, block_total, integration_total = await _hybrid_search_blocks(
block_results, block_total, integration_total = await _text_search_blocks(
query=search_query,
include_blocks=include_blocks,
include_integrations=include_integrations,
@@ -383,117 +382,75 @@ def _collect_block_results(
return results, block_count, integration_count
async def _hybrid_search_blocks(
async def _text_search_blocks(
*,
query: str,
include_blocks: bool,
include_integrations: bool,
) -> tuple[list[_ScoredItem], int, int]:
"""
Search blocks using hybrid search with builder-specific filtering.
Search blocks using in-memory text matching over the block registry.
Uses unified_hybrid_search for semantic + lexical search, then applies
post-filtering for block/integration types and scoring adjustments.
All blocks are already loaded in memory, so this is fast and reliable
regardless of whether OpenAI embeddings are available.
Scoring:
- Base: hybrid relevance score (0-1) scaled to 0-100, plus BLOCK_SCORE_BOOST
- Base: text relevance via _score_primary_fields, plus BLOCK_SCORE_BOOST
to prioritize blocks over marketplace agents in combined results
- +30 for exact name match, +15 for prefix name match
- +20 if the block has an LlmModel field and the query matches an LLM model name
Args:
query: The search query string
include_blocks: Whether to include regular blocks
include_integrations: Whether to include integration blocks
Returns:
Tuple of (scored_items, block_count, integration_count)
"""
results: list[_ScoredItem] = []
block_count = 0
integration_count = 0
if not include_blocks and not include_integrations:
return results, block_count, integration_count
return results, 0, 0
normalized_query = query.strip().lower()
# Fetch more results to account for post-filtering
search_results, _ = await unified_hybrid_search(
query=query,
content_types=[ContentType.BLOCK],
page=1,
page_size=150,
min_score=0.10,
all_results, _, _ = _collect_block_results(
include_blocks=include_blocks,
include_integrations=include_integrations,
)
# Load all blocks for getting BlockInfo
all_blocks = load_all_blocks()
for result in search_results:
block_id = result["content_id"]
for item in all_results:
block_info = item.item
assert isinstance(block_info, BlockInfo)
name = split_camelcase(block_info.name).lower()
# Skip excluded blocks
if block_id in EXCLUDED_BLOCK_IDS:
continue
# Build rich description including input field descriptions,
# matching the searchable text that the embedding pipeline uses
desc_parts = [block_info.description or ""]
block_cls = all_blocks.get(block_info.id)
if block_cls is not None:
block: AnyBlockSchema = block_cls()
desc_parts += [
f"{f}: {info.description}"
for f, info in block.input_schema.model_fields.items()
if info.description
]
description = " ".join(desc_parts).lower()
metadata = result.get("metadata", {})
hybrid_score = result.get("relevance", 0.0)
# Get the actual block class
if block_id not in all_blocks:
continue
block_cls = all_blocks[block_id]
block: AnyBlockSchema = block_cls()
if block.disabled:
continue
# Check block/integration filter using metadata
is_integration = metadata.get("is_integration", False)
if is_integration and not include_integrations:
continue
if not is_integration and not include_blocks:
continue
# Get block info
block_info = block.get_info()
# Calculate final score: scale hybrid score and add builder-specific bonuses
# Hybrid scores are 0-1, builder scores were 0-200+
# Add BLOCK_SCORE_BOOST to prioritize blocks over marketplace agents
final_score = hybrid_score * 100 + BLOCK_SCORE_BOOST
score = _score_primary_fields(name, description, normalized_query)
# Add LLM model match bonus
has_llm_field = metadata.get("has_llm_model_field", False)
if has_llm_field and _matches_llm_model(block.input_schema, normalized_query):
final_score += 20
if block_cls is not None and _matches_llm_model(
block_cls().input_schema, normalized_query
):
score += 20
# Add exact/prefix match bonus for deterministic tie-breaking
name = block_info.name.lower()
if name == normalized_query:
final_score += 30
elif name.startswith(normalized_query):
final_score += 15
# Track counts
filter_type: FilterType = "integrations" if is_integration else "blocks"
if is_integration:
integration_count += 1
else:
block_count += 1
results.append(
_ScoredItem(
item=block_info,
filter_type=filter_type,
score=final_score,
sort_key=name,
if score >= MIN_SCORE_FOR_FILTERED_RESULTS:
results.append(
_ScoredItem(
item=block_info,
filter_type=item.filter_type,
score=score + BLOCK_SCORE_BOOST,
sort_key=name,
)
)
)
block_count = sum(1 for r in results if r.filter_type == "blocks")
integration_count = sum(1 for r in results if r.filter_type == "integrations")
return results, block_count, integration_count

View File

@@ -60,7 +60,6 @@ from backend.copilot.tools.models import (
)
from backend.copilot.tracking import track_user_message
from backend.data.redis_client import get_redis_async
from backend.data.understanding import get_business_understanding
from backend.data.workspace import get_or_create_workspace
from backend.util.exceptions import NotFoundError
@@ -895,36 +894,6 @@ async def session_assign_user(
return {"status": "ok"}
# ========== Suggested Prompts ==========
class SuggestedPromptsResponse(BaseModel):
"""Response model for user-specific suggested prompts."""
prompts: list[str]
@router.get(
"/suggested-prompts",
dependencies=[Security(auth.requires_user)],
)
async def get_suggested_prompts(
user_id: Annotated[str, Security(auth.get_user_id)],
) -> SuggestedPromptsResponse:
"""
Get LLM-generated suggested prompts for the authenticated user.
Returns personalized quick-action prompts based on the user's
business understanding. Returns an empty list if no custom prompts
are available.
"""
understanding = await get_business_understanding(user_id)
if understanding is None:
return SuggestedPromptsResponse(prompts=[])
return SuggestedPromptsResponse(prompts=understanding.suggested_prompts)
# ========== Configuration ==========

View File

@@ -1,7 +1,7 @@
"""Tests for chat API routes: session title update, file attachment validation, usage, rate limiting, and suggested prompts."""
"""Tests for chat API routes: session title update, file attachment validation, usage, and rate limiting."""
from datetime import UTC, datetime, timedelta
from unittest.mock import AsyncMock, MagicMock
from unittest.mock import AsyncMock
import fastapi
import fastapi.testclient
@@ -400,62 +400,3 @@ def test_usage_rejects_unauthenticated_request() -> None:
response = unauthenticated_client.get("/usage")
assert response.status_code == 401
# ─── Suggested prompts endpoint ──────────────────────────────────────
def _mock_get_business_understanding(
mocker: pytest_mock.MockerFixture,
*,
return_value=None,
):
"""Mock get_business_understanding."""
return mocker.patch(
"backend.api.features.chat.routes.get_business_understanding",
new_callable=AsyncMock,
return_value=return_value,
)
def test_suggested_prompts_returns_prompts(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""User with understanding and prompts gets them back."""
mock_understanding = MagicMock()
mock_understanding.suggested_prompts = ["Do X", "Do Y", "Do Z"]
_mock_get_business_understanding(mocker, return_value=mock_understanding)
response = client.get("/suggested-prompts")
assert response.status_code == 200
assert response.json() == {"prompts": ["Do X", "Do Y", "Do Z"]}
def test_suggested_prompts_no_understanding(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""User with no understanding gets empty list."""
_mock_get_business_understanding(mocker, return_value=None)
response = client.get("/suggested-prompts")
assert response.status_code == 200
assert response.json() == {"prompts": []}
def test_suggested_prompts_empty_prompts(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""User with understanding but no prompts gets empty list."""
mock_understanding = MagicMock()
mock_understanding.suggested_prompts = []
_mock_get_business_understanding(mocker, return_value=mock_understanding)
response = client.get("/suggested-prompts")
assert response.status_code == 200
assert response.json() == {"prompts": []}

View File

@@ -9,7 +9,7 @@ import prisma.errors
import prisma.models
import prisma.types
from backend.data.db import transaction
from backend.data.db import query_raw_with_schema, transaction
from backend.data.graph import (
GraphModel,
GraphModelWithoutNodes,
@@ -104,7 +104,8 @@ async def get_store_agents(
# search_used_hybrid remains False, will use fallback path below
# Convert hybrid search results (dict format) if hybrid succeeded
if search_used_hybrid:
# Fall through to direct DB search if hybrid returned nothing
if search_used_hybrid and agents:
total_pages = (total + page_size - 1) // page_size
store_agents: list[store_model.StoreAgent] = []
for agent in agents:
@@ -130,52 +131,20 @@ async def get_store_agents(
)
continue
if not search_used_hybrid:
# Fallback path - use basic search or no search
where_clause: prisma.types.StoreAgentWhereInput = {"is_available": True}
if featured:
where_clause["featured"] = featured
if creators:
where_clause["creator_username"] = {"in": creators}
if category:
where_clause["categories"] = {"has": category}
# Add basic text search if search_query provided but hybrid failed
if search_query:
where_clause["OR"] = [
{"agent_name": {"contains": search_query, "mode": "insensitive"}},
{"sub_heading": {"contains": search_query, "mode": "insensitive"}},
{"description": {"contains": search_query, "mode": "insensitive"}},
]
order_by = []
if sorted_by == StoreAgentsSortOptions.RATING:
order_by.append({"rating": "desc"})
elif sorted_by == StoreAgentsSortOptions.RUNS:
order_by.append({"runs": "desc"})
elif sorted_by == StoreAgentsSortOptions.NAME:
order_by.append({"agent_name": "asc"})
elif sorted_by == StoreAgentsSortOptions.UPDATED_AT:
order_by.append({"updated_at": "desc"})
db_agents = await prisma.models.StoreAgent.prisma().find_many(
where=where_clause,
order=order_by,
skip=(page - 1) * page_size,
take=page_size,
if not search_used_hybrid or not agents:
# Fallback path: direct DB query with optional tsvector search.
# This mirrors the original pre-hybrid-search implementation.
store_agents, total = await _fallback_store_agent_search(
search_query=search_query,
featured=featured,
creators=creators,
category=category,
sorted_by=sorted_by,
page=page,
page_size=page_size,
)
total = await prisma.models.StoreAgent.prisma().count(where=where_clause)
total_pages = (total + page_size - 1) // page_size
store_agents: list[store_model.StoreAgent] = []
for agent in db_agents:
try:
store_agents.append(store_model.StoreAgent.from_db(agent))
except Exception as e:
logger.error(f"Error parsing StoreAgent from db: {e}")
continue
logger.debug(f"Found {len(store_agents)} agents")
return store_model.StoreAgentsResponse(
agents=store_agents,
@@ -195,6 +164,126 @@ async def get_store_agents(
# await log_search_term(search_query=search_term)
async def _fallback_store_agent_search(
*,
search_query: str | None,
featured: bool,
creators: list[str] | None,
category: str | None,
sorted_by: StoreAgentsSortOptions | None,
page: int,
page_size: int,
) -> tuple[list[store_model.StoreAgent], int]:
"""Direct DB search fallback when hybrid search is unavailable or empty.
Uses ad-hoc to_tsvector/plainto_tsquery with ts_rank_cd for text search,
matching the quality of the original pre-hybrid-search implementation.
Falls back to simple listing when no search query is provided.
"""
if not search_query:
# No search query — use Prisma for simple filtered listing
where_clause: prisma.types.StoreAgentWhereInput = {"is_available": True}
if featured:
where_clause["featured"] = featured
if creators:
where_clause["creator_username"] = {"in": creators}
if category:
where_clause["categories"] = {"has": category}
order_by = []
if sorted_by == StoreAgentsSortOptions.RATING:
order_by.append({"rating": "desc"})
elif sorted_by == StoreAgentsSortOptions.RUNS:
order_by.append({"runs": "desc"})
elif sorted_by == StoreAgentsSortOptions.NAME:
order_by.append({"agent_name": "asc"})
elif sorted_by == StoreAgentsSortOptions.UPDATED_AT:
order_by.append({"updated_at": "desc"})
db_agents = await prisma.models.StoreAgent.prisma().find_many(
where=where_clause,
order=order_by,
skip=(page - 1) * page_size,
take=page_size,
)
total = await prisma.models.StoreAgent.prisma().count(where=where_clause)
return [store_model.StoreAgent.from_db(a) for a in db_agents], total
# Text search using ad-hoc tsvector on StoreAgent view fields
params: list[Any] = [search_query]
filters = ["sa.is_available = true"]
param_idx = 2
if featured:
filters.append("sa.featured = true")
if creators:
params.append(creators)
filters.append(f"sa.creator_username = ANY(${param_idx})")
param_idx += 1
if category:
params.append(category)
filters.append(f"${param_idx} = ANY(sa.categories)")
param_idx += 1
where_sql = " AND ".join(filters)
params.extend([page_size, (page - 1) * page_size])
limit_param = f"${param_idx}"
param_idx += 1
offset_param = f"${param_idx}"
sql = f"""
WITH ranked AS (
SELECT sa.*,
ts_rank_cd(
to_tsvector('english',
COALESCE(sa.agent_name, '') || ' ' ||
COALESCE(sa.sub_heading, '') || ' ' ||
COALESCE(sa.description, '')
),
plainto_tsquery('english', $1)
) AS rank,
COUNT(*) OVER () AS total_count
FROM {{schema_prefix}}"StoreAgent" sa
WHERE {where_sql}
AND to_tsvector('english',
COALESCE(sa.agent_name, '') || ' ' ||
COALESCE(sa.sub_heading, '') || ' ' ||
COALESCE(sa.description, '')
) @@ plainto_tsquery('english', $1)
)
SELECT * FROM ranked
ORDER BY rank DESC
LIMIT {limit_param} OFFSET {offset_param}
"""
results = await query_raw_with_schema(sql, *params)
total = results[0]["total_count"] if results else 0
store_agents = []
for row in results:
try:
store_agents.append(
store_model.StoreAgent(
slug=row["slug"],
agent_name=row["agent_name"],
agent_image=row["agent_image"][0] if row["agent_image"] else "",
creator=row["creator_username"] or "Needs Profile",
creator_avatar=row["creator_avatar"] or "",
sub_heading=row["sub_heading"],
description=row["description"],
runs=row["runs"],
rating=row["rating"],
agent_graph_id=row.get("graph_id", ""),
)
)
except Exception as e:
logger.error(f"Error parsing StoreAgent from fallback search: {e}")
continue
return store_agents, total
async def log_search_term(search_query: str):
"""Log a search term to the database"""
@@ -1139,16 +1228,21 @@ async def review_store_submission(
},
)
# Generate embedding for approved listing (blocking - admin operation)
# Inside transaction: if embedding fails, entire transaction rolls back
await ensure_embedding(
version_id=store_listing_version_id,
name=submission.name,
description=submission.description,
sub_heading=submission.subHeading,
categories=submission.categories,
tx=tx,
)
# Generate embedding for approved listing (best-effort)
try:
await ensure_embedding(
version_id=store_listing_version_id,
name=submission.name,
description=submission.description,
sub_heading=submission.subHeading,
categories=submission.categories,
tx=tx,
)
except Exception as emb_err:
logger.warning(
f"Could not generate embedding for listing "
f"{store_listing_version_id}: {emb_err}"
)
await prisma.models.StoreListing.prisma(tx).update(
where={"id": submission.storeListingId},

View File

@@ -1,5 +1,4 @@
import logging
import tempfile
import urllib.parse
import autogpt_libs.auth
@@ -259,21 +258,18 @@ async def get_graph_meta_by_store_listing_version_id(
)
async def download_agent_file(
store_listing_version_id: str,
) -> fastapi.responses.FileResponse:
) -> fastapi.responses.Response:
"""Download agent graph file for a specific marketplace listing version"""
graph_data = await store_db.get_agent(store_listing_version_id)
file_name = f"agent_{graph_data.id}_v{graph_data.version or 'latest'}.json"
# Sending graph as a stream (similar to marketplace v1)
with tempfile.NamedTemporaryFile(
mode="w", suffix=".json", delete=False
) as tmp_file:
tmp_file.write(backend.util.json.dumps(graph_data))
tmp_file.flush()
return fastapi.responses.FileResponse(
tmp_file.name, filename=file_name, media_type="application/json"
)
return fastapi.responses.Response(
content=backend.util.json.dumps(graph_data),
media_type="application/json",
headers={
"Content-Disposition": f'attachment; filename="{file_name}"',
},
)
##############################################

View File

@@ -55,7 +55,6 @@ from backend.data.credit import (
set_auto_top_up,
)
from backend.data.graph import GraphSettings
from backend.data.invited_user import get_or_activate_user
from backend.data.model import CredentialsMetaInput, UserOnboarding
from backend.data.notifications import NotificationPreference, NotificationPreferenceDTO
from backend.data.onboarding import (
@@ -71,6 +70,7 @@ from backend.data.onboarding import (
update_user_onboarding,
)
from backend.data.user import (
get_or_create_user,
get_user_by_id,
get_user_notification_preference,
update_user_email,
@@ -136,10 +136,12 @@ _tally_background_tasks: set[asyncio.Task] = set()
dependencies=[Security(requires_user)],
)
async def get_or_create_user_route(user_data: dict = Security(get_jwt_payload)):
user = await get_or_activate_user(user_data)
user = await get_or_create_user(user_data)
# Fire-and-forget: backfill Tally understanding when invite pre-seeding did
# not produce a stored result before first activation.
# Fire-and-forget: populate business understanding from Tally form.
# We use created_at proximity instead of an is_new flag because
# get_or_create_user is cached — a separate is_new return value would be
# unreliable on repeated calls within the cache TTL.
age_seconds = (datetime.now(timezone.utc) - user.created_at).total_seconds()
if age_seconds < 30:
try:
@@ -163,8 +165,7 @@ async def get_or_create_user_route(user_data: dict = Security(get_jwt_payload)):
dependencies=[Security(requires_user)],
)
async def update_user_email_route(
user_id: Annotated[str, Security(get_user_id)],
email: str = Body(...),
user_id: Annotated[str, Security(get_user_id)], email: str = Body(...)
) -> dict[str, str]:
await update_user_email(user_id, email)
@@ -178,16 +179,10 @@ async def update_user_email_route(
dependencies=[Security(requires_user)],
)
async def get_user_timezone_route(
user_id: Annotated[str, Security(get_user_id)],
user_data: dict = Security(get_jwt_payload),
) -> TimezoneResponse:
"""Get user timezone setting."""
try:
user = await get_user_by_id(user_id)
except ValueError:
raise HTTPException(
status_code=HTTP_404_NOT_FOUND,
detail="User not found. Please complete activation via /auth/user first.",
)
user = await get_or_create_user(user_data)
return TimezoneResponse(timezone=user.timezone)
@@ -198,8 +193,7 @@ async def get_user_timezone_route(
dependencies=[Security(requires_user)],
)
async def update_user_timezone_route(
user_id: Annotated[str, Security(get_user_id)],
request: UpdateTimezoneRequest,
user_id: Annotated[str, Security(get_user_id)], request: UpdateTimezoneRequest
) -> TimezoneResponse:
"""Update user timezone. The timezone should be a valid IANA timezone identifier."""
user = await update_user_timezone(user_id, str(request.timezone))

View File

@@ -51,7 +51,7 @@ def test_get_or_create_user_route(
}
mocker.patch(
"backend.api.features.v1.get_or_activate_user",
"backend.api.features.v1.get_or_create_user",
return_value=mock_user,
)

View File

@@ -19,7 +19,6 @@ from prisma.errors import PrismaError
import backend.api.features.admin.credit_admin_routes
import backend.api.features.admin.execution_analytics_routes
import backend.api.features.admin.store_admin_routes
import backend.api.features.admin.user_admin_routes
import backend.api.features.builder
import backend.api.features.builder.routes
import backend.api.features.chat.routes as chat_routes
@@ -312,11 +311,6 @@ app.include_router(
tags=["v2", "admin"],
prefix="/api/executions",
)
app.include_router(
backend.api.features.admin.user_admin_routes.router,
tags=["v2", "admin"],
prefix="/api/users",
)
app.include_router(
backend.api.features.executions.review.routes.router,
tags=["v2", "executions", "review"],

View File

@@ -0,0 +1,33 @@
"""
Shared configuration for all AgentMail blocks.
"""
from agentmail import AsyncAgentMail
from backend.sdk import APIKeyCredentials, ProviderBuilder, SecretStr
agent_mail = (
ProviderBuilder("agent_mail")
.with_api_key("AGENTMAIL_API_KEY", "AgentMail API Key")
.build()
)
TEST_CREDENTIALS = APIKeyCredentials(
id="01234567-89ab-cdef-0123-456789abcdef",
provider="agent_mail",
title="Mock AgentMail API Key",
api_key=SecretStr("mock-agentmail-api-key"),
expires_at=None,
)
TEST_CREDENTIALS_INPUT = {
"id": TEST_CREDENTIALS.id,
"provider": TEST_CREDENTIALS.provider,
"type": TEST_CREDENTIALS.type,
"title": TEST_CREDENTIALS.title,
}
def _client(credentials: APIKeyCredentials) -> AsyncAgentMail:
"""Create an AsyncAgentMail client from credentials."""
return AsyncAgentMail(api_key=credentials.api_key.get_secret_value())

View File

@@ -0,0 +1,211 @@
"""
AgentMail Attachment blocks — download file attachments from messages and threads.
Attachments are files associated with messages (PDFs, CSVs, images, etc.).
To send attachments, include them in the attachments parameter when using
AgentMailSendMessageBlock or AgentMailReplyToMessageBlock.
To download, first get the attachment_id from a message's attachments array,
then use these blocks to retrieve the file content as base64.
"""
import base64
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
class AgentMailGetMessageAttachmentBlock(Block):
"""
Download a file attachment from a specific email message.
Retrieves the raw file content and returns it as base64-encoded data.
First get the attachment_id from a message object's attachments array,
then use this block to download the file.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the message belongs to"
)
message_id: str = SchemaField(
description="Message ID containing the attachment"
)
attachment_id: str = SchemaField(
description="Attachment ID to download (from the message's attachments array)"
)
class Output(BlockSchemaOutput):
content_base64: str = SchemaField(
description="File content encoded as a base64 string. Decode with base64.b64decode() to get raw bytes."
)
attachment_id: str = SchemaField(
description="The attachment ID that was downloaded"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="a283ffc4-8087-4c3d-9135-8f26b86742ec",
description="Download a file attachment from an email message. Returns base64-encoded file content.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"message_id": "test-msg",
"attachment_id": "test-attach",
},
test_output=[
("content_base64", "dGVzdA=="),
("attachment_id", "test-attach"),
],
test_mock={
"get_attachment": lambda *a, **kw: b"test",
},
)
@staticmethod
async def get_attachment(
credentials: APIKeyCredentials,
inbox_id: str,
message_id: str,
attachment_id: str,
):
client = _client(credentials)
return await client.inboxes.messages.get_attachment(
inbox_id=inbox_id,
message_id=message_id,
attachment_id=attachment_id,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
data = await self.get_attachment(
credentials=credentials,
inbox_id=input_data.inbox_id,
message_id=input_data.message_id,
attachment_id=input_data.attachment_id,
)
if isinstance(data, bytes):
encoded = base64.b64encode(data).decode()
elif isinstance(data, str):
encoded = base64.b64encode(data.encode("utf-8")).decode()
else:
raise TypeError(
f"Unexpected attachment data type: {type(data).__name__}"
)
yield "content_base64", encoded
yield "attachment_id", input_data.attachment_id
except Exception as e:
yield "error", str(e)
class AgentMailGetThreadAttachmentBlock(Block):
"""
Download a file attachment from a conversation thread.
Same as GetMessageAttachment but looks up by thread ID instead of
message ID. Useful when you know the thread but not the specific
message containing the attachment.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the thread belongs to"
)
thread_id: str = SchemaField(description="Thread ID containing the attachment")
attachment_id: str = SchemaField(
description="Attachment ID to download (from a message's attachments array within the thread)"
)
class Output(BlockSchemaOutput):
content_base64: str = SchemaField(
description="File content encoded as a base64 string. Decode with base64.b64decode() to get raw bytes."
)
attachment_id: str = SchemaField(
description="The attachment ID that was downloaded"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="06b6a4c4-9d71-4992-9e9c-cf3b352763b5",
description="Download a file attachment from a conversation thread. Returns base64-encoded file content.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"thread_id": "test-thread",
"attachment_id": "test-attach",
},
test_output=[
("content_base64", "dGVzdA=="),
("attachment_id", "test-attach"),
],
test_mock={
"get_attachment": lambda *a, **kw: b"test",
},
)
@staticmethod
async def get_attachment(
credentials: APIKeyCredentials,
inbox_id: str,
thread_id: str,
attachment_id: str,
):
client = _client(credentials)
return await client.inboxes.threads.get_attachment(
inbox_id=inbox_id,
thread_id=thread_id,
attachment_id=attachment_id,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
data = await self.get_attachment(
credentials=credentials,
inbox_id=input_data.inbox_id,
thread_id=input_data.thread_id,
attachment_id=input_data.attachment_id,
)
if isinstance(data, bytes):
encoded = base64.b64encode(data).decode()
elif isinstance(data, str):
encoded = base64.b64encode(data.encode("utf-8")).decode()
else:
raise TypeError(
f"Unexpected attachment data type: {type(data).__name__}"
)
yield "content_base64", encoded
yield "attachment_id", input_data.attachment_id
except Exception as e:
yield "error", str(e)

View File

@@ -0,0 +1,678 @@
"""
AgentMail Draft blocks — create, get, list, update, send, and delete drafts.
A Draft is an unsent message that can be reviewed, edited, and sent later.
Drafts enable human-in-the-loop review, scheduled sending (via send_at),
and complex multi-step email composition workflows.
"""
from typing import Optional
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
class AgentMailCreateDraftBlock(Block):
"""
Create a draft email in an AgentMail inbox for review or scheduled sending.
Drafts let agents prepare emails without sending immediately. Use send_at
to schedule automatic sending at a future time (ISO 8601 format).
Scheduled drafts are auto-labeled 'scheduled' and can be cancelled by
deleting the draft.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to create the draft in"
)
to: list[str] = SchemaField(
description="Recipient email addresses (e.g. ['user@example.com'])"
)
subject: str = SchemaField(description="Email subject line", default="")
text: str = SchemaField(description="Plain text body of the draft", default="")
html: str = SchemaField(
description="Rich HTML body of the draft", default="", advanced=True
)
cc: list[str] = SchemaField(
description="CC recipient email addresses",
default_factory=list,
advanced=True,
)
bcc: list[str] = SchemaField(
description="BCC recipient email addresses",
default_factory=list,
advanced=True,
)
in_reply_to: str = SchemaField(
description="Message ID this draft replies to, for threading follow-up drafts",
default="",
advanced=True,
)
send_at: str = SchemaField(
description="Schedule automatic sending at this ISO 8601 datetime (e.g. '2025-01-15T09:00:00Z'). Leave empty for manual send.",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
draft_id: str = SchemaField(
description="Unique identifier of the created draft"
)
send_status: str = SchemaField(
description="'scheduled' if send_at was set, empty otherwise. Values: scheduled, sending, failed.",
default="",
)
result: dict = SchemaField(
description="Complete draft object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="25ac9086-69fd-48b8-b910-9dbe04b8f3bd",
description="Create a draft email for review or scheduled sending. Use send_at for automatic future delivery.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"to": ["user@example.com"],
},
test_output=[
("draft_id", "mock-draft-id"),
("send_status", ""),
("result", dict),
],
test_mock={
"create_draft": lambda *a, **kw: type(
"Draft",
(),
{
"draft_id": "mock-draft-id",
"send_status": "",
"model_dump": lambda self: {"draft_id": "mock-draft-id"},
},
)(),
},
)
@staticmethod
async def create_draft(credentials: APIKeyCredentials, inbox_id: str, **params):
client = _client(credentials)
return await client.inboxes.drafts.create(inbox_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"to": input_data.to}
if input_data.subject:
params["subject"] = input_data.subject
if input_data.text:
params["text"] = input_data.text
if input_data.html:
params["html"] = input_data.html
if input_data.cc:
params["cc"] = input_data.cc
if input_data.bcc:
params["bcc"] = input_data.bcc
if input_data.in_reply_to:
params["in_reply_to"] = input_data.in_reply_to
if input_data.send_at:
params["send_at"] = input_data.send_at
draft = await self.create_draft(credentials, input_data.inbox_id, **params)
result = draft.model_dump()
yield "draft_id", draft.draft_id
yield "send_status", draft.send_status or ""
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailGetDraftBlock(Block):
"""
Retrieve a specific draft from an AgentMail inbox.
Returns the draft contents including recipients, subject, body, and
scheduled send status. Use this to review a draft before approving it.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the draft belongs to"
)
draft_id: str = SchemaField(description="Draft ID to retrieve")
class Output(BlockSchemaOutput):
draft_id: str = SchemaField(description="Unique identifier of the draft")
subject: str = SchemaField(description="Draft subject line", default="")
send_status: str = SchemaField(
description="Scheduled send status: 'scheduled', 'sending', 'failed', or empty",
default="",
)
send_at: str = SchemaField(
description="Scheduled send time (ISO 8601) if set", default=""
)
result: dict = SchemaField(description="Complete draft object with all fields")
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="8e57780d-dc25-43d4-a0f4-1f02877b09fb",
description="Retrieve a draft email to review its contents, recipients, and scheduled send status.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"draft_id": "test-draft",
},
test_output=[
("draft_id", "test-draft"),
("subject", ""),
("send_status", ""),
("send_at", ""),
("result", dict),
],
test_mock={
"get_draft": lambda *a, **kw: type(
"Draft",
(),
{
"draft_id": "test-draft",
"subject": "",
"send_status": "",
"send_at": "",
"model_dump": lambda self: {"draft_id": "test-draft"},
},
)(),
},
)
@staticmethod
async def get_draft(credentials: APIKeyCredentials, inbox_id: str, draft_id: str):
client = _client(credentials)
return await client.inboxes.drafts.get(inbox_id=inbox_id, draft_id=draft_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
draft = await self.get_draft(
credentials, input_data.inbox_id, input_data.draft_id
)
result = draft.model_dump()
yield "draft_id", draft.draft_id
yield "subject", draft.subject or ""
yield "send_status", draft.send_status or ""
yield "send_at", draft.send_at or ""
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailListDraftsBlock(Block):
"""
List all drafts in an AgentMail inbox with optional label filtering.
Use labels=['scheduled'] to find all drafts queued for future sending.
Useful for building approval dashboards or monitoring pending outreach.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to list drafts from"
)
limit: int = SchemaField(
description="Maximum number of drafts to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
labels: list[str] = SchemaField(
description="Filter drafts by labels (e.g. ['scheduled'] for pending sends)",
default_factory=list,
advanced=True,
)
class Output(BlockSchemaOutput):
drafts: list[dict] = SchemaField(
description="List of draft objects with subject, recipients, send_status, etc."
)
count: int = SchemaField(description="Number of drafts returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="e84883b7-7c39-4c5c-88e8-0a72b078ea63",
description="List drafts in an AgentMail inbox. Filter by labels=['scheduled'] to find pending sends.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
},
test_output=[
("drafts", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_drafts": lambda *a, **kw: type(
"Resp",
(),
{
"drafts": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_drafts(credentials: APIKeyCredentials, inbox_id: str, **params):
client = _client(credentials)
return await client.inboxes.drafts.list(inbox_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
if input_data.labels:
params["labels"] = input_data.labels
response = await self.list_drafts(
credentials, input_data.inbox_id, **params
)
drafts = [d.model_dump() for d in response.drafts]
yield "drafts", drafts
yield "count", response.count
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailUpdateDraftBlock(Block):
"""
Update an existing draft's content, recipients, or scheduled send time.
Use this to reschedule a draft (change send_at), modify recipients,
or edit the subject/body before sending. To cancel a scheduled send,
delete the draft instead.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the draft belongs to"
)
draft_id: str = SchemaField(description="Draft ID to update")
to: Optional[list[str]] = SchemaField(
description="Updated recipient email addresses (replaces existing list). Omit to keep current value.",
default=None,
)
subject: Optional[str] = SchemaField(
description="Updated subject line. Omit to keep current value.",
default=None,
)
text: Optional[str] = SchemaField(
description="Updated plain text body. Omit to keep current value.",
default=None,
)
html: Optional[str] = SchemaField(
description="Updated HTML body. Omit to keep current value.",
default=None,
advanced=True,
)
send_at: Optional[str] = SchemaField(
description="Reschedule: new ISO 8601 send time (e.g. '2025-01-20T14:00:00Z'). Omit to keep current value.",
default=None,
advanced=True,
)
class Output(BlockSchemaOutput):
draft_id: str = SchemaField(description="The updated draft ID")
send_status: str = SchemaField(description="Updated send status", default="")
result: dict = SchemaField(description="Complete updated draft object")
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="351f6e51-695a-421a-9032-46a587b10336",
description="Update a draft's content, recipients, or scheduled send time. Use to reschedule or edit before sending.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"draft_id": "test-draft",
},
test_output=[
("draft_id", "test-draft"),
("send_status", ""),
("result", dict),
],
test_mock={
"update_draft": lambda *a, **kw: type(
"Draft",
(),
{
"draft_id": "test-draft",
"send_status": "",
"model_dump": lambda self: {"draft_id": "test-draft"},
},
)(),
},
)
@staticmethod
async def update_draft(
credentials: APIKeyCredentials, inbox_id: str, draft_id: str, **params
):
client = _client(credentials)
return await client.inboxes.drafts.update(
inbox_id=inbox_id, draft_id=draft_id, **params
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {}
if input_data.to is not None:
params["to"] = input_data.to
if input_data.subject is not None:
params["subject"] = input_data.subject
if input_data.text is not None:
params["text"] = input_data.text
if input_data.html is not None:
params["html"] = input_data.html
if input_data.send_at is not None:
params["send_at"] = input_data.send_at
draft = await self.update_draft(
credentials, input_data.inbox_id, input_data.draft_id, **params
)
result = draft.model_dump()
yield "draft_id", draft.draft_id
yield "send_status", draft.send_status or ""
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailSendDraftBlock(Block):
"""
Send a draft immediately, converting it into a delivered message.
The draft is deleted after successful sending and becomes a regular
message with a message_id. Use this for human-in-the-loop approval
workflows: agent creates draft, human reviews, then this block sends it.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the draft belongs to"
)
draft_id: str = SchemaField(description="Draft ID to send now")
class Output(BlockSchemaOutput):
message_id: str = SchemaField(
description="Message ID of the now-sent email (draft is deleted)"
)
thread_id: str = SchemaField(
description="Thread ID the sent message belongs to"
)
result: dict = SchemaField(description="Complete sent message object")
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="37c39e83-475d-4b3d-843a-d923d001b85a",
description="Send a draft immediately, converting it into a delivered message. The draft is deleted after sending.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"draft_id": "test-draft",
},
test_output=[
("message_id", "mock-msg-id"),
("thread_id", "mock-thread-id"),
("result", dict),
],
test_mock={
"send_draft": lambda *a, **kw: type(
"Msg",
(),
{
"message_id": "mock-msg-id",
"thread_id": "mock-thread-id",
"model_dump": lambda self: {"message_id": "mock-msg-id"},
},
)(),
},
)
@staticmethod
async def send_draft(credentials: APIKeyCredentials, inbox_id: str, draft_id: str):
client = _client(credentials)
return await client.inboxes.drafts.send(inbox_id=inbox_id, draft_id=draft_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
msg = await self.send_draft(
credentials, input_data.inbox_id, input_data.draft_id
)
result = msg.model_dump()
yield "message_id", msg.message_id
yield "thread_id", msg.thread_id or ""
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailDeleteDraftBlock(Block):
"""
Delete a draft from an AgentMail inbox. Also cancels any scheduled send.
If the draft was scheduled with send_at, deleting it cancels the
scheduled delivery. This is the way to cancel a scheduled email.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the draft belongs to"
)
draft_id: str = SchemaField(
description="Draft ID to delete (also cancels scheduled sends)"
)
class Output(BlockSchemaOutput):
success: bool = SchemaField(
description="True if the draft was successfully deleted/cancelled"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="9023eb99-3e2f-4def-808b-d9c584b3d9e7",
description="Delete a draft or cancel a scheduled email. Removes the draft permanently.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"draft_id": "test-draft",
},
test_output=[("success", True)],
test_mock={
"delete_draft": lambda *a, **kw: None,
},
)
@staticmethod
async def delete_draft(
credentials: APIKeyCredentials, inbox_id: str, draft_id: str
):
client = _client(credentials)
await client.inboxes.drafts.delete(inbox_id=inbox_id, draft_id=draft_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
await self.delete_draft(
credentials, input_data.inbox_id, input_data.draft_id
)
yield "success", True
except Exception as e:
yield "error", str(e)
class AgentMailListOrgDraftsBlock(Block):
"""
List all drafts across every inbox in your organization.
Returns drafts from all inboxes in one query. Perfect for building
a central approval dashboard where a human supervisor can review
and approve any draft created by any agent.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
limit: int = SchemaField(
description="Maximum number of drafts to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
drafts: list[dict] = SchemaField(
description="List of draft objects from all inboxes in the organization"
)
count: int = SchemaField(description="Number of drafts returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="ed7558ae-3a07-45f5-af55-a25fe88c9971",
description="List all drafts across every inbox in your organization. Use for central approval dashboards.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT},
test_output=[
("drafts", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_org_drafts": lambda *a, **kw: type(
"Resp",
(),
{
"drafts": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_org_drafts(credentials: APIKeyCredentials, **params):
client = _client(credentials)
return await client.drafts.list(**params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
response = await self.list_org_drafts(credentials, **params)
drafts = [d.model_dump() for d in response.drafts]
yield "drafts", drafts
yield "count", response.count
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)

View File

@@ -0,0 +1,414 @@
"""
AgentMail Inbox blocks — create, get, list, update, and delete inboxes.
An Inbox is a fully programmable email account for AI agents. Each inbox gets
a unique email address and can send, receive, and manage emails via the
AgentMail API. You can create thousands of inboxes on demand.
"""
from agentmail.inboxes.types import CreateInboxRequest
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
class AgentMailCreateInboxBlock(Block):
"""
Create a new email inbox for an AI agent via AgentMail.
Each inbox gets a unique email address (e.g. username@agentmail.to).
If username and domain are not provided, AgentMail auto-generates them.
Use custom domains by specifying the domain field.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
username: str = SchemaField(
description="Local part of the email address (e.g. 'support' for support@domain.com). Leave empty to auto-generate.",
default="",
advanced=False,
)
domain: str = SchemaField(
description="Email domain (e.g. 'mydomain.com'). Defaults to agentmail.to if empty.",
default="",
advanced=False,
)
display_name: str = SchemaField(
description="Friendly name shown in the 'From' field of sent emails (e.g. 'Support Agent')",
default="",
advanced=False,
)
class Output(BlockSchemaOutput):
inbox_id: str = SchemaField(
description="Unique identifier for the created inbox (also the email address)"
)
email_address: str = SchemaField(
description="Full email address of the inbox (e.g. support@agentmail.to)"
)
result: dict = SchemaField(
description="Complete inbox object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="7a8ac219-c6ec-4eec-a828-81af283ce04c",
description="Create a new email inbox for an AI agent via AgentMail. Each inbox gets a unique address and can send/receive emails.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT},
test_output=[
("inbox_id", "mock-inbox-id"),
("email_address", "mock-inbox-id"),
("result", dict),
],
test_mock={
"create_inbox": lambda *a, **kw: type(
"Inbox",
(),
{
"inbox_id": "mock-inbox-id",
"model_dump": lambda self: {"inbox_id": "mock-inbox-id"},
},
)(),
},
)
@staticmethod
async def create_inbox(credentials: APIKeyCredentials, **params):
client = _client(credentials)
return await client.inboxes.create(request=CreateInboxRequest(**params))
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {}
if input_data.username:
params["username"] = input_data.username
if input_data.domain:
params["domain"] = input_data.domain
if input_data.display_name:
params["display_name"] = input_data.display_name
inbox = await self.create_inbox(credentials, **params)
result = inbox.model_dump()
yield "inbox_id", inbox.inbox_id
yield "email_address", inbox.inbox_id
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailGetInboxBlock(Block):
"""
Retrieve details of an existing AgentMail inbox by its ID or email address.
Returns the inbox metadata including email address, display name, and
configuration. Use this to check if an inbox exists or get its properties.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to look up (e.g. 'support@agentmail.to')"
)
class Output(BlockSchemaOutput):
inbox_id: str = SchemaField(description="Unique identifier of the inbox")
email_address: str = SchemaField(description="Full email address of the inbox")
display_name: str = SchemaField(
description="Friendly name shown in the 'From' field", default=""
)
result: dict = SchemaField(
description="Complete inbox object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="b858f62b-6c12-4736-aaf2-dbc5a9281320",
description="Retrieve details of an existing AgentMail inbox including its email address, display name, and configuration.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
},
test_output=[
("inbox_id", "test-inbox"),
("email_address", "test-inbox"),
("display_name", ""),
("result", dict),
],
test_mock={
"get_inbox": lambda *a, **kw: type(
"Inbox",
(),
{
"inbox_id": "test-inbox",
"display_name": "",
"model_dump": lambda self: {"inbox_id": "test-inbox"},
},
)(),
},
)
@staticmethod
async def get_inbox(credentials: APIKeyCredentials, inbox_id: str):
client = _client(credentials)
return await client.inboxes.get(inbox_id=inbox_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
inbox = await self.get_inbox(credentials, input_data.inbox_id)
result = inbox.model_dump()
yield "inbox_id", inbox.inbox_id
yield "email_address", inbox.inbox_id
yield "display_name", inbox.display_name or ""
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailListInboxesBlock(Block):
"""
List all email inboxes in your AgentMail organization.
Returns a paginated list of all inboxes with their metadata.
Use page_token for pagination when you have many inboxes.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
limit: int = SchemaField(
description="Maximum number of inboxes to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page of results",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
inboxes: list[dict] = SchemaField(
description="List of inbox objects, each containing inbox_id, email_address, display_name, etc."
)
count: int = SchemaField(
description="Total number of inboxes in your organization"
)
next_page_token: str = SchemaField(
description="Token to pass as page_token to get the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="cfd84a06-2121-4cef-8d14-8badf52d22f0",
description="List all email inboxes in your AgentMail organization with pagination support.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT},
test_output=[
("inboxes", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_inboxes": lambda *a, **kw: type(
"Resp",
(),
{
"inboxes": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_inboxes(credentials: APIKeyCredentials, **params):
client = _client(credentials)
return await client.inboxes.list(**params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
response = await self.list_inboxes(credentials, **params)
inboxes = [i.model_dump() for i in response.inboxes]
yield "inboxes", inboxes
yield "count", (c if (c := response.count) is not None else len(inboxes))
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailUpdateInboxBlock(Block):
"""
Update the display name of an existing AgentMail inbox.
Changes the friendly name shown in the 'From' field when emails are sent
from this inbox. The email address itself cannot be changed.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to update (e.g. 'support@agentmail.to')"
)
display_name: str = SchemaField(
description="New display name for the inbox (e.g. 'Customer Support Bot')"
)
class Output(BlockSchemaOutput):
inbox_id: str = SchemaField(description="The updated inbox ID")
result: dict = SchemaField(
description="Complete updated inbox object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="59b49f59-a6d1-4203-94c0-3908adac50b6",
description="Update the display name of an AgentMail inbox. Changes the 'From' name shown when emails are sent.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"display_name": "Updated",
},
test_output=[
("inbox_id", "test-inbox"),
("result", dict),
],
test_mock={
"update_inbox": lambda *a, **kw: type(
"Inbox",
(),
{
"inbox_id": "test-inbox",
"model_dump": lambda self: {"inbox_id": "test-inbox"},
},
)(),
},
)
@staticmethod
async def update_inbox(credentials: APIKeyCredentials, inbox_id: str, **params):
client = _client(credentials)
return await client.inboxes.update(inbox_id=inbox_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
inbox = await self.update_inbox(
credentials,
input_data.inbox_id,
display_name=input_data.display_name,
)
result = inbox.model_dump()
yield "inbox_id", inbox.inbox_id
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailDeleteInboxBlock(Block):
"""
Permanently delete an AgentMail inbox and all its data.
This removes the inbox, all its messages, threads, and drafts.
This action cannot be undone. The email address will no longer
receive or send emails.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to permanently delete"
)
class Output(BlockSchemaOutput):
success: bool = SchemaField(
description="True if the inbox was successfully deleted"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="ade970ae-8428-4a7b-9278-b52054dbf535",
description="Permanently delete an AgentMail inbox and all its messages, threads, and drafts. This action cannot be undone.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
},
test_output=[("success", True)],
test_mock={
"delete_inbox": lambda *a, **kw: None,
},
)
@staticmethod
async def delete_inbox(credentials: APIKeyCredentials, inbox_id: str):
client = _client(credentials)
await client.inboxes.delete(inbox_id=inbox_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
await self.delete_inbox(credentials, input_data.inbox_id)
yield "success", True
except Exception as e:
yield "error", str(e)

View File

@@ -0,0 +1,384 @@
"""
AgentMail List blocks — manage allow/block lists for email filtering.
Lists let you control which email addresses and domains your agents can
send to or receive from. There are four list types based on two dimensions:
direction (send/receive) and type (allow/block).
- receive + allow: Only accept emails from these addresses/domains
- receive + block: Reject emails from these addresses/domains
- send + allow: Only send emails to these addresses/domains
- send + block: Prevent sending emails to these addresses/domains
"""
from enum import Enum
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
class ListDirection(str, Enum):
SEND = "send"
RECEIVE = "receive"
class ListType(str, Enum):
ALLOW = "allow"
BLOCK = "block"
class AgentMailListEntriesBlock(Block):
"""
List all entries in an AgentMail allow/block list.
Retrieves email addresses and domains that are currently allowed
or blocked for sending or receiving. Use direction and list_type
to select which of the four lists to query.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
direction: ListDirection = SchemaField(
description="'send' to filter outgoing emails, 'receive' to filter incoming emails"
)
list_type: ListType = SchemaField(
description="'allow' for whitelist (only permit these), 'block' for blacklist (reject these)"
)
limit: int = SchemaField(
description="Maximum number of entries to return per page",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
entries: list[dict] = SchemaField(
description="List of entries, each with an email address or domain"
)
count: int = SchemaField(description="Number of entries returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="01489100-35da-45aa-8a01-9540ba0e9a21",
description="List all entries in an AgentMail allow/block list. Choose send/receive direction and allow/block type.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"direction": "receive",
"list_type": "block",
},
test_output=[
("entries", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_entries": lambda *a, **kw: type(
"Resp",
(),
{
"entries": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_entries(
credentials: APIKeyCredentials, direction: str, list_type: str, **params
):
client = _client(credentials)
return await client.lists.list(direction, list_type, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
response = await self.list_entries(
credentials,
input_data.direction.value,
input_data.list_type.value,
**params,
)
entries = [e.model_dump() for e in response.entries]
yield "entries", entries
yield "count", (c if (c := response.count) is not None else len(entries))
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailCreateListEntryBlock(Block):
"""
Add an email address or domain to an AgentMail allow/block list.
Entries can be full email addresses (e.g. 'partner@example.com') or
entire domains (e.g. 'example.com'). For block lists, you can optionally
provide a reason (e.g. 'spam', 'competitor').
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
direction: ListDirection = SchemaField(
description="'send' for outgoing email rules, 'receive' for incoming email rules"
)
list_type: ListType = SchemaField(
description="'allow' to whitelist, 'block' to blacklist"
)
entry: str = SchemaField(
description="Email address (user@example.com) or domain (example.com) to add"
)
reason: str = SchemaField(
description="Reason for blocking (only used with block lists, e.g. 'spam', 'competitor')",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
entry: str = SchemaField(
description="The email address or domain that was added"
)
result: dict = SchemaField(description="Complete entry object")
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="b6650a0a-b113-40cf-8243-ff20f684f9b8",
description="Add an email address or domain to an allow/block list. Block spam senders or whitelist trusted domains.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"direction": "receive",
"list_type": "block",
"entry": "spam@example.com",
},
test_output=[
("entry", "spam@example.com"),
("result", dict),
],
test_mock={
"create_entry": lambda *a, **kw: type(
"Entry",
(),
{
"model_dump": lambda self: {"entry": "spam@example.com"},
},
)(),
},
)
@staticmethod
async def create_entry(
credentials: APIKeyCredentials, direction: str, list_type: str, **params
):
client = _client(credentials)
return await client.lists.create(direction, list_type, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"entry": input_data.entry}
if input_data.reason and input_data.list_type == ListType.BLOCK:
params["reason"] = input_data.reason
result = await self.create_entry(
credentials,
input_data.direction.value,
input_data.list_type.value,
**params,
)
result_dict = result.model_dump()
yield "entry", input_data.entry
yield "result", result_dict
except Exception as e:
yield "error", str(e)
class AgentMailGetListEntryBlock(Block):
"""
Check if an email address or domain exists in an AgentMail allow/block list.
Returns the entry details if found. Use this to verify whether a specific
address or domain is currently allowed or blocked.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
direction: ListDirection = SchemaField(
description="'send' for outgoing rules, 'receive' for incoming rules"
)
list_type: ListType = SchemaField(
description="'allow' for whitelist, 'block' for blacklist"
)
entry: str = SchemaField(description="Email address or domain to look up")
class Output(BlockSchemaOutput):
entry: str = SchemaField(
description="The email address or domain that was found"
)
result: dict = SchemaField(description="Complete entry object with metadata")
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="fb117058-ab27-40d1-9231-eb1dd526fc7a",
description="Check if an email address or domain is in an allow/block list. Verify filtering rules.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"direction": "receive",
"list_type": "block",
"entry": "spam@example.com",
},
test_output=[
("entry", "spam@example.com"),
("result", dict),
],
test_mock={
"get_entry": lambda *a, **kw: type(
"Entry",
(),
{
"model_dump": lambda self: {"entry": "spam@example.com"},
},
)(),
},
)
@staticmethod
async def get_entry(
credentials: APIKeyCredentials, direction: str, list_type: str, entry: str
):
client = _client(credentials)
return await client.lists.get(direction, list_type, entry=entry)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
result = await self.get_entry(
credentials,
input_data.direction.value,
input_data.list_type.value,
input_data.entry,
)
result_dict = result.model_dump()
yield "entry", input_data.entry
yield "result", result_dict
except Exception as e:
yield "error", str(e)
class AgentMailDeleteListEntryBlock(Block):
"""
Remove an email address or domain from an AgentMail allow/block list.
After removal, the address/domain will no longer be filtered by this list.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
direction: ListDirection = SchemaField(
description="'send' for outgoing rules, 'receive' for incoming rules"
)
list_type: ListType = SchemaField(
description="'allow' for whitelist, 'block' for blacklist"
)
entry: str = SchemaField(
description="Email address or domain to remove from the list"
)
class Output(BlockSchemaOutput):
success: bool = SchemaField(
description="True if the entry was successfully removed"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="2b8d57f1-1c9e-470f-a70b-5991c80fad5f",
description="Remove an email address or domain from an allow/block list to stop filtering it.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"direction": "receive",
"list_type": "block",
"entry": "spam@example.com",
},
test_output=[("success", True)],
test_mock={
"delete_entry": lambda *a, **kw: None,
},
)
@staticmethod
async def delete_entry(
credentials: APIKeyCredentials, direction: str, list_type: str, entry: str
):
client = _client(credentials)
await client.lists.delete(direction, list_type, entry=entry)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
await self.delete_entry(
credentials,
input_data.direction.value,
input_data.list_type.value,
input_data.entry,
)
yield "success", True
except Exception as e:
yield "error", str(e)

View File

@@ -0,0 +1,695 @@
"""
AgentMail Message blocks — send, list, get, reply, forward, and update messages.
A Message is an individual email within a Thread. Agents can send new messages
(which create threads), reply to existing messages, forward them, and manage
labels for state tracking (e.g. read/unread, campaign tags).
"""
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
class AgentMailSendMessageBlock(Block):
"""
Send a new email from an AgentMail inbox, automatically creating a new thread.
Supports plain text and HTML bodies, CC/BCC recipients, and labels for
organizing messages (e.g. campaign tracking, state management).
Max 50 combined recipients across to, cc, and bcc.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to send from (e.g. 'agent@agentmail.to')"
)
to: list[str] = SchemaField(
description="Recipient email addresses (e.g. ['user@example.com'])"
)
subject: str = SchemaField(description="Email subject line")
text: str = SchemaField(
description="Plain text body of the email. Always provide this as a fallback for email clients that don't render HTML."
)
html: str = SchemaField(
description="Rich HTML body of the email. Embed CSS in a <style> tag for best compatibility across email clients.",
default="",
advanced=True,
)
cc: list[str] = SchemaField(
description="CC recipient email addresses for human-in-the-loop oversight",
default_factory=list,
advanced=True,
)
bcc: list[str] = SchemaField(
description="BCC recipient email addresses (hidden from other recipients)",
default_factory=list,
advanced=True,
)
labels: list[str] = SchemaField(
description="Labels to tag the message for filtering and state management (e.g. ['outreach', 'q4-campaign'])",
default_factory=list,
advanced=True,
)
class Output(BlockSchemaOutput):
message_id: str = SchemaField(
description="Unique identifier of the sent message"
)
thread_id: str = SchemaField(
description="Thread ID grouping this message and any future replies"
)
result: dict = SchemaField(
description="Complete sent message object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="b67469b2-7748-4d81-a223-4ebd332cca89",
description="Send a new email from an AgentMail inbox. Creates a new conversation thread. Supports HTML, CC/BCC, and labels.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"to": ["user@example.com"],
"subject": "Test",
"text": "Hello",
},
test_output=[
("message_id", "mock-msg-id"),
("thread_id", "mock-thread-id"),
("result", dict),
],
test_mock={
"send_message": lambda *a, **kw: type(
"Msg",
(),
{
"message_id": "mock-msg-id",
"thread_id": "mock-thread-id",
"model_dump": lambda self: {
"message_id": "mock-msg-id",
"thread_id": "mock-thread-id",
},
},
)(),
},
)
@staticmethod
async def send_message(credentials: APIKeyCredentials, inbox_id: str, **params):
client = _client(credentials)
return await client.inboxes.messages.send(inbox_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
total = len(input_data.to) + len(input_data.cc) + len(input_data.bcc)
if total > 50:
raise ValueError(
f"Max 50 combined recipients across to, cc, and bcc (got {total})"
)
params: dict = {
"to": input_data.to,
"subject": input_data.subject,
"text": input_data.text,
}
if input_data.html:
params["html"] = input_data.html
if input_data.cc:
params["cc"] = input_data.cc
if input_data.bcc:
params["bcc"] = input_data.bcc
if input_data.labels:
params["labels"] = input_data.labels
msg = await self.send_message(credentials, input_data.inbox_id, **params)
result = msg.model_dump()
yield "message_id", msg.message_id
yield "thread_id", msg.thread_id or ""
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailListMessagesBlock(Block):
"""
List all messages in an AgentMail inbox with optional label filtering.
Returns a paginated list of messages. Use labels to filter (e.g.
labels=['unread'] to only get unprocessed messages). Useful for
polling workflows or building inbox views.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to list messages from"
)
limit: int = SchemaField(
description="Maximum number of messages to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
labels: list[str] = SchemaField(
description="Only return messages with ALL of these labels (e.g. ['unread'] or ['q4-campaign', 'follow-up'])",
default_factory=list,
advanced=True,
)
class Output(BlockSchemaOutput):
messages: list[dict] = SchemaField(
description="List of message objects with subject, sender, text, html, labels, etc."
)
count: int = SchemaField(description="Number of messages returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="721234df-c7a2-4927-b205-744badbd5844",
description="List messages in an AgentMail inbox. Filter by labels to find unread, campaign-tagged, or categorized messages.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
},
test_output=[
("messages", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_messages": lambda *a, **kw: type(
"Resp",
(),
{
"messages": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_messages(credentials: APIKeyCredentials, inbox_id: str, **params):
client = _client(credentials)
return await client.inboxes.messages.list(inbox_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
if input_data.labels:
params["labels"] = input_data.labels
response = await self.list_messages(
credentials, input_data.inbox_id, **params
)
messages = [m.model_dump() for m in response.messages]
yield "messages", messages
yield "count", (c if (c := response.count) is not None else len(messages))
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailGetMessageBlock(Block):
"""
Retrieve a specific email message by ID from an AgentMail inbox.
Returns the full message including subject, body (text and HTML),
sender, recipients, and attachments. Use extracted_text to get
only the new reply content without quoted history.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the message belongs to"
)
message_id: str = SchemaField(
description="Message ID to retrieve (e.g. '<abc123@agentmail.to>')"
)
class Output(BlockSchemaOutput):
message_id: str = SchemaField(description="Unique identifier of the message")
thread_id: str = SchemaField(description="Thread this message belongs to")
subject: str = SchemaField(description="Email subject line")
text: str = SchemaField(
description="Full plain text body (may include quoted reply history)"
)
extracted_text: str = SchemaField(
description="Just the new reply content with quoted history stripped. Best for AI processing.",
default="",
)
html: str = SchemaField(description="HTML body of the email", default="")
result: dict = SchemaField(
description="Complete message object with all fields including sender, recipients, attachments, labels"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="2788bdfa-1527-4603-a5e4-a455c05c032f",
description="Retrieve a specific email message by ID. Includes extracted_text for clean reply content without quoted history.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"message_id": "test-msg",
},
test_output=[
("message_id", "test-msg"),
("thread_id", "t1"),
("subject", "Hi"),
("text", "Hello"),
("extracted_text", "Hello"),
("html", ""),
("result", dict),
],
test_mock={
"get_message": lambda *a, **kw: type(
"Msg",
(),
{
"message_id": "test-msg",
"thread_id": "t1",
"subject": "Hi",
"text": "Hello",
"extracted_text": "Hello",
"html": "",
"model_dump": lambda self: {"message_id": "test-msg"},
},
)(),
},
)
@staticmethod
async def get_message(
credentials: APIKeyCredentials,
inbox_id: str,
message_id: str,
):
client = _client(credentials)
return await client.inboxes.messages.get(
inbox_id=inbox_id, message_id=message_id
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
msg = await self.get_message(
credentials, input_data.inbox_id, input_data.message_id
)
result = msg.model_dump()
yield "message_id", msg.message_id
yield "thread_id", msg.thread_id or ""
yield "subject", msg.subject or ""
yield "text", msg.text or ""
yield "extracted_text", msg.extracted_text or ""
yield "html", msg.html or ""
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailReplyToMessageBlock(Block):
"""
Reply to an existing email message, keeping the reply in the same thread.
The reply is automatically added to the same conversation thread as the
original message. Use this for multi-turn agent conversations.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to send the reply from"
)
message_id: str = SchemaField(
description="Message ID to reply to (e.g. '<abc123@agentmail.to>')"
)
text: str = SchemaField(description="Plain text body of the reply")
html: str = SchemaField(
description="Rich HTML body of the reply",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
message_id: str = SchemaField(
description="Unique identifier of the reply message"
)
thread_id: str = SchemaField(description="Thread ID the reply was added to")
result: dict = SchemaField(
description="Complete reply message object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="b9fe53fa-5026-4547-9570-b54ccb487229",
description="Reply to an existing email in the same conversation thread. Use for multi-turn agent conversations.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"message_id": "test-msg",
"text": "Reply",
},
test_output=[
("message_id", "mock-reply-id"),
("thread_id", "mock-thread-id"),
("result", dict),
],
test_mock={
"reply_to_message": lambda *a, **kw: type(
"Msg",
(),
{
"message_id": "mock-reply-id",
"thread_id": "mock-thread-id",
"model_dump": lambda self: {"message_id": "mock-reply-id"},
},
)(),
},
)
@staticmethod
async def reply_to_message(
credentials: APIKeyCredentials, inbox_id: str, message_id: str, **params
):
client = _client(credentials)
return await client.inboxes.messages.reply(
inbox_id=inbox_id, message_id=message_id, **params
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"text": input_data.text}
if input_data.html:
params["html"] = input_data.html
reply = await self.reply_to_message(
credentials,
input_data.inbox_id,
input_data.message_id,
**params,
)
result = reply.model_dump()
yield "message_id", reply.message_id
yield "thread_id", reply.thread_id or ""
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailForwardMessageBlock(Block):
"""
Forward an existing email message to one or more recipients.
Sends the original message content to different email addresses.
Optionally prepend additional text or override the subject line.
Max 50 combined recipients across to, cc, and bcc.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to forward from"
)
message_id: str = SchemaField(description="Message ID to forward")
to: list[str] = SchemaField(
description="Recipient email addresses to forward the message to (e.g. ['user@example.com'])"
)
cc: list[str] = SchemaField(
description="CC recipient email addresses",
default_factory=list,
advanced=True,
)
bcc: list[str] = SchemaField(
description="BCC recipient email addresses (hidden from other recipients)",
default_factory=list,
advanced=True,
)
subject: str = SchemaField(
description="Override the subject line (defaults to 'Fwd: <original subject>')",
default="",
advanced=True,
)
text: str = SchemaField(
description="Additional plain text to prepend before the forwarded content",
default="",
advanced=True,
)
html: str = SchemaField(
description="Additional HTML to prepend before the forwarded content",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
message_id: str = SchemaField(
description="Unique identifier of the forwarded message"
)
thread_id: str = SchemaField(description="Thread ID of the forward")
result: dict = SchemaField(
description="Complete forwarded message object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="b70c7e33-5d66-4f8e-897f-ac73a7bfce82",
description="Forward an email message to one or more recipients. Supports CC/BCC and optional extra text or subject override.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"message_id": "test-msg",
"to": ["user@example.com"],
},
test_output=[
("message_id", "mock-fwd-id"),
("thread_id", "mock-thread-id"),
("result", dict),
],
test_mock={
"forward_message": lambda *a, **kw: type(
"Msg",
(),
{
"message_id": "mock-fwd-id",
"thread_id": "mock-thread-id",
"model_dump": lambda self: {"message_id": "mock-fwd-id"},
},
)(),
},
)
@staticmethod
async def forward_message(
credentials: APIKeyCredentials, inbox_id: str, message_id: str, **params
):
client = _client(credentials)
return await client.inboxes.messages.forward(
inbox_id=inbox_id, message_id=message_id, **params
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
total = len(input_data.to) + len(input_data.cc) + len(input_data.bcc)
if total > 50:
raise ValueError(
f"Max 50 combined recipients across to, cc, and bcc (got {total})"
)
params: dict = {"to": input_data.to}
if input_data.cc:
params["cc"] = input_data.cc
if input_data.bcc:
params["bcc"] = input_data.bcc
if input_data.subject:
params["subject"] = input_data.subject
if input_data.text:
params["text"] = input_data.text
if input_data.html:
params["html"] = input_data.html
fwd = await self.forward_message(
credentials,
input_data.inbox_id,
input_data.message_id,
**params,
)
result = fwd.model_dump()
yield "message_id", fwd.message_id
yield "thread_id", fwd.thread_id or ""
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailUpdateMessageBlock(Block):
"""
Add or remove labels on an email message for state management.
Labels are string tags used to track message state (read/unread),
categorize messages (billing, support), or tag campaigns (q4-outreach).
Common pattern: add 'read' and remove 'unread' after processing a message.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the message belongs to"
)
message_id: str = SchemaField(description="Message ID to update labels on")
add_labels: list[str] = SchemaField(
description="Labels to add (e.g. ['read', 'processed', 'high-priority'])",
default_factory=list,
)
remove_labels: list[str] = SchemaField(
description="Labels to remove (e.g. ['unread', 'pending'])",
default_factory=list,
)
class Output(BlockSchemaOutput):
message_id: str = SchemaField(description="The updated message ID")
result: dict = SchemaField(
description="Complete updated message object with current labels"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="694ff816-4c89-4a5e-a552-8c31be187735",
description="Add or remove labels on an email message. Use for read/unread tracking, campaign tagging, or state management.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"message_id": "test-msg",
"add_labels": ["read"],
},
test_output=[
("message_id", "test-msg"),
("result", dict),
],
test_mock={
"update_message": lambda *a, **kw: type(
"Msg",
(),
{
"message_id": "test-msg",
"model_dump": lambda self: {"message_id": "test-msg"},
},
)(),
},
)
@staticmethod
async def update_message(
credentials: APIKeyCredentials, inbox_id: str, message_id: str, **params
):
client = _client(credentials)
return await client.inboxes.messages.update(
inbox_id=inbox_id, message_id=message_id, **params
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
if not input_data.add_labels and not input_data.remove_labels:
raise ValueError(
"Must specify at least one label operation: add_labels or remove_labels"
)
params: dict = {}
if input_data.add_labels:
params["add_labels"] = input_data.add_labels
if input_data.remove_labels:
params["remove_labels"] = input_data.remove_labels
msg = await self.update_message(
credentials,
input_data.inbox_id,
input_data.message_id,
**params,
)
result = msg.model_dump()
yield "message_id", msg.message_id
yield "result", result
except Exception as e:
yield "error", str(e)

View File

@@ -0,0 +1,651 @@
"""
AgentMail Pod blocks — create, get, list, delete pods and list pod-scoped resources.
Pods provide multi-tenant isolation between your customers. Each pod acts as
an isolated workspace containing its own inboxes, domains, threads, and drafts.
Use pods when building SaaS platforms, agency tools, or AI agent fleets that
serve multiple customers.
"""
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
class AgentMailCreatePodBlock(Block):
"""
Create a new pod for multi-tenant customer isolation.
Each pod acts as an isolated workspace for one customer or tenant.
Use client_id to map pods to your internal tenant IDs for idempotent
creation (safe to retry without creating duplicates).
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
client_id: str = SchemaField(
description="Your internal tenant/customer ID for idempotent mapping. Lets you access the pod by your own ID instead of AgentMail's pod_id.",
default="",
)
class Output(BlockSchemaOutput):
pod_id: str = SchemaField(description="Unique identifier of the created pod")
result: dict = SchemaField(description="Complete pod object with all metadata")
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="a2db9784-2d17-4f8f-9d6b-0214e6f22101",
description="Create a new pod for multi-tenant customer isolation. Use client_id to map to your internal tenant IDs.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT},
test_output=[
("pod_id", "mock-pod-id"),
("result", dict),
],
test_mock={
"create_pod": lambda *a, **kw: type(
"Pod",
(),
{
"pod_id": "mock-pod-id",
"model_dump": lambda self: {"pod_id": "mock-pod-id"},
},
)(),
},
)
@staticmethod
async def create_pod(credentials: APIKeyCredentials, **params):
client = _client(credentials)
return await client.pods.create(**params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {}
if input_data.client_id:
params["client_id"] = input_data.client_id
pod = await self.create_pod(credentials, **params)
result = pod.model_dump()
yield "pod_id", pod.pod_id
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailGetPodBlock(Block):
"""
Retrieve details of an existing pod by its ID.
Returns the pod metadata including its client_id mapping and
creation timestamp.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
pod_id: str = SchemaField(description="Pod ID to retrieve")
class Output(BlockSchemaOutput):
pod_id: str = SchemaField(description="Unique identifier of the pod")
result: dict = SchemaField(description="Complete pod object with all metadata")
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="553361bc-bb1b-4322-9ad4-0c226200217e",
description="Retrieve details of an existing pod including its client_id mapping and metadata.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT, "pod_id": "test-pod"},
test_output=[
("pod_id", "test-pod"),
("result", dict),
],
test_mock={
"get_pod": lambda *a, **kw: type(
"Pod",
(),
{
"pod_id": "test-pod",
"model_dump": lambda self: {"pod_id": "test-pod"},
},
)(),
},
)
@staticmethod
async def get_pod(credentials: APIKeyCredentials, pod_id: str):
client = _client(credentials)
return await client.pods.get(pod_id=pod_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
pod = await self.get_pod(credentials, pod_id=input_data.pod_id)
result = pod.model_dump()
yield "pod_id", pod.pod_id
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailListPodsBlock(Block):
"""
List all pods in your AgentMail organization.
Returns a paginated list of all tenant pods with their metadata.
Use this to see all customer workspaces at a glance.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
limit: int = SchemaField(
description="Maximum number of pods to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
pods: list[dict] = SchemaField(
description="List of pod objects with pod_id, client_id, creation time, etc."
)
count: int = SchemaField(description="Number of pods returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="9d3725ee-2968-431a-a816-857ab41e1420",
description="List all tenant pods in your organization. See all customer workspaces at a glance.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT},
test_output=[
("pods", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_pods": lambda *a, **kw: type(
"Resp",
(),
{
"pods": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_pods(credentials: APIKeyCredentials, **params):
client = _client(credentials)
return await client.pods.list(**params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
response = await self.list_pods(credentials, **params)
pods = [p.model_dump() for p in response.pods]
yield "pods", pods
yield "count", response.count
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailDeletePodBlock(Block):
"""
Permanently delete a pod. All inboxes and domains must be removed first.
You cannot delete a pod that still contains inboxes or domains.
Delete all child resources first, then delete the pod.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
pod_id: str = SchemaField(
description="Pod ID to permanently delete (must have no inboxes or domains)"
)
class Output(BlockSchemaOutput):
success: bool = SchemaField(
description="True if the pod was successfully deleted"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="f371f8cd-682d-4f5f-905c-529c74a8fb35",
description="Permanently delete a pod. All inboxes and domains must be removed first.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT, "pod_id": "test-pod"},
test_output=[("success", True)],
test_mock={
"delete_pod": lambda *a, **kw: None,
},
)
@staticmethod
async def delete_pod(credentials: APIKeyCredentials, pod_id: str):
client = _client(credentials)
await client.pods.delete(pod_id=pod_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
await self.delete_pod(credentials, pod_id=input_data.pod_id)
yield "success", True
except Exception as e:
yield "error", str(e)
class AgentMailListPodInboxesBlock(Block):
"""
List all inboxes within a specific pod (customer workspace).
Returns only the inboxes belonging to this pod, providing
tenant-scoped visibility.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
pod_id: str = SchemaField(description="Pod ID to list inboxes from")
limit: int = SchemaField(
description="Maximum number of inboxes to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
inboxes: list[dict] = SchemaField(
description="List of inbox objects within this pod"
)
count: int = SchemaField(description="Number of inboxes returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="a8c17ce0-b7c1-4bc3-ae39-680e1952e5d0",
description="List all inboxes within a pod. View email accounts scoped to a specific customer.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT, "pod_id": "test-pod"},
test_output=[
("inboxes", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_pod_inboxes": lambda *a, **kw: type(
"Resp",
(),
{
"inboxes": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_pod_inboxes(credentials: APIKeyCredentials, pod_id: str, **params):
client = _client(credentials)
return await client.pods.inboxes.list(pod_id=pod_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
response = await self.list_pod_inboxes(
credentials, pod_id=input_data.pod_id, **params
)
inboxes = [i.model_dump() for i in response.inboxes]
yield "inboxes", inboxes
yield "count", response.count
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailListPodThreadsBlock(Block):
"""
List all conversation threads across all inboxes within a pod.
Returns threads from every inbox in the pod. Use for building
per-customer dashboards showing all email activity, or for
supervisor agents monitoring a customer's conversations.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
pod_id: str = SchemaField(description="Pod ID to list threads from")
limit: int = SchemaField(
description="Maximum number of threads to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
labels: list[str] = SchemaField(
description="Only return threads matching ALL of these labels",
default_factory=list,
advanced=True,
)
class Output(BlockSchemaOutput):
threads: list[dict] = SchemaField(
description="List of thread objects from all inboxes in this pod"
)
count: int = SchemaField(description="Number of threads returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="80214f08-8b85-4533-a6b8-f8123bfcb410",
description="List all conversation threads across all inboxes within a pod. View all email activity for a customer.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT, "pod_id": "test-pod"},
test_output=[
("threads", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_pod_threads": lambda *a, **kw: type(
"Resp",
(),
{
"threads": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_pod_threads(credentials: APIKeyCredentials, pod_id: str, **params):
client = _client(credentials)
return await client.pods.threads.list(pod_id=pod_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
if input_data.labels:
params["labels"] = input_data.labels
response = await self.list_pod_threads(
credentials, pod_id=input_data.pod_id, **params
)
threads = [t.model_dump() for t in response.threads]
yield "threads", threads
yield "count", response.count
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailListPodDraftsBlock(Block):
"""
List all drafts across all inboxes within a pod.
Returns pending drafts from every inbox in the pod. Use for
per-customer approval dashboards or monitoring scheduled sends.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
pod_id: str = SchemaField(description="Pod ID to list drafts from")
limit: int = SchemaField(
description="Maximum number of drafts to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
drafts: list[dict] = SchemaField(
description="List of draft objects from all inboxes in this pod"
)
count: int = SchemaField(description="Number of drafts returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="12fd7a3e-51ad-4b20-97c1-0391f207f517",
description="List all drafts across all inboxes within a pod. View pending emails for a customer.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT, "pod_id": "test-pod"},
test_output=[
("drafts", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_pod_drafts": lambda *a, **kw: type(
"Resp",
(),
{
"drafts": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_pod_drafts(credentials: APIKeyCredentials, pod_id: str, **params):
client = _client(credentials)
return await client.pods.drafts.list(pod_id=pod_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
response = await self.list_pod_drafts(
credentials, pod_id=input_data.pod_id, **params
)
drafts = [d.model_dump() for d in response.drafts]
yield "drafts", drafts
yield "count", response.count
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailCreatePodInboxBlock(Block):
"""
Create a new email inbox within a specific pod (customer workspace).
The inbox is automatically scoped to the pod and inherits its
isolation guarantees. If username/domain are not provided,
AgentMail auto-generates a unique address.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
pod_id: str = SchemaField(description="Pod ID to create the inbox in")
username: str = SchemaField(
description="Local part of the email address (e.g. 'support'). Leave empty to auto-generate.",
default="",
)
domain: str = SchemaField(
description="Email domain (e.g. 'mydomain.com'). Defaults to agentmail.to if empty.",
default="",
)
display_name: str = SchemaField(
description="Friendly name shown in the 'From' field (e.g. 'Customer Support')",
default="",
)
class Output(BlockSchemaOutput):
inbox_id: str = SchemaField(
description="Unique identifier of the created inbox"
)
email_address: str = SchemaField(description="Full email address of the inbox")
result: dict = SchemaField(
description="Complete inbox object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="c6862373-1ac6-402e-89e6-7db1fea882af",
description="Create a new email inbox within a pod. The inbox is scoped to the customer workspace.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT, "pod_id": "test-pod"},
test_output=[
("inbox_id", "mock-inbox-id"),
("email_address", "mock-inbox-id"),
("result", dict),
],
test_mock={
"create_pod_inbox": lambda *a, **kw: type(
"Inbox",
(),
{
"inbox_id": "mock-inbox-id",
"model_dump": lambda self: {"inbox_id": "mock-inbox-id"},
},
)(),
},
)
@staticmethod
async def create_pod_inbox(credentials: APIKeyCredentials, pod_id: str, **params):
client = _client(credentials)
return await client.pods.inboxes.create(pod_id=pod_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {}
if input_data.username:
params["username"] = input_data.username
if input_data.domain:
params["domain"] = input_data.domain
if input_data.display_name:
params["display_name"] = input_data.display_name
inbox = await self.create_pod_inbox(
credentials, pod_id=input_data.pod_id, **params
)
result = inbox.model_dump()
yield "inbox_id", inbox.inbox_id
yield "email_address", inbox.inbox_id
yield "result", result
except Exception as e:
yield "error", str(e)

View File

@@ -0,0 +1,438 @@
"""
AgentMail Thread blocks — list, get, and delete conversation threads.
A Thread groups related messages into a single conversation. Threads are
created automatically when a new message is sent and grow as replies are added.
Threads can be queried per-inbox or across the entire organization.
"""
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
class AgentMailListInboxThreadsBlock(Block):
"""
List all conversation threads within a specific AgentMail inbox.
Returns a paginated list of threads with optional label filtering.
Use labels to find threads by campaign, status, or custom tags.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to list threads from"
)
limit: int = SchemaField(
description="Maximum number of threads to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
labels: list[str] = SchemaField(
description="Only return threads matching ALL of these labels (e.g. ['q4-campaign', 'follow-up'])",
default_factory=list,
advanced=True,
)
class Output(BlockSchemaOutput):
threads: list[dict] = SchemaField(
description="List of thread objects with thread_id, subject, message count, labels, etc."
)
count: int = SchemaField(description="Number of threads returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="63dd9e2d-ef81-405c-b034-c031f0437334",
description="List all conversation threads in an AgentMail inbox. Filter by labels for campaign tracking or status management.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
},
test_output=[
("threads", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_threads": lambda *a, **kw: type(
"Resp",
(),
{
"threads": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_threads(credentials: APIKeyCredentials, inbox_id: str, **params):
client = _client(credentials)
return await client.inboxes.threads.list(inbox_id=inbox_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
if input_data.labels:
params["labels"] = input_data.labels
response = await self.list_threads(
credentials, input_data.inbox_id, **params
)
threads = [t.model_dump() for t in response.threads]
yield "threads", threads
yield "count", (c if (c := response.count) is not None else len(threads))
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailGetInboxThreadBlock(Block):
"""
Retrieve a single conversation thread from an AgentMail inbox.
Returns the thread with all its messages in chronological order.
Use this to get the full conversation history for context when
composing replies.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the thread belongs to"
)
thread_id: str = SchemaField(description="Thread ID to retrieve")
class Output(BlockSchemaOutput):
thread_id: str = SchemaField(description="Unique identifier of the thread")
messages: list[dict] = SchemaField(
description="All messages in the thread, in chronological order"
)
result: dict = SchemaField(
description="Complete thread object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="42866290-1479-4153-83e7-550b703e9da2",
description="Retrieve a conversation thread with all its messages. Use for getting full conversation context before replying.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"thread_id": "test-thread",
},
test_output=[
("thread_id", "test-thread"),
("messages", []),
("result", dict),
],
test_mock={
"get_thread": lambda *a, **kw: type(
"Thread",
(),
{
"thread_id": "test-thread",
"messages": [],
"model_dump": lambda self: {
"thread_id": "test-thread",
"messages": [],
},
},
)(),
},
)
@staticmethod
async def get_thread(credentials: APIKeyCredentials, inbox_id: str, thread_id: str):
client = _client(credentials)
return await client.inboxes.threads.get(inbox_id=inbox_id, thread_id=thread_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
thread = await self.get_thread(
credentials, input_data.inbox_id, input_data.thread_id
)
messages = [m.model_dump() for m in thread.messages]
result = thread.model_dump()
result["messages"] = messages
yield "thread_id", thread.thread_id
yield "messages", messages
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailDeleteInboxThreadBlock(Block):
"""
Permanently delete a conversation thread and all its messages from an inbox.
This removes the thread and every message within it. This action
cannot be undone.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the thread belongs to"
)
thread_id: str = SchemaField(description="Thread ID to permanently delete")
class Output(BlockSchemaOutput):
success: bool = SchemaField(
description="True if the thread was successfully deleted"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="18cd5f6f-4ff6-45da-8300-25a50ea7fb75",
description="Permanently delete a conversation thread and all its messages. This action cannot be undone.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"thread_id": "test-thread",
},
test_output=[("success", True)],
test_mock={
"delete_thread": lambda *a, **kw: None,
},
)
@staticmethod
async def delete_thread(
credentials: APIKeyCredentials, inbox_id: str, thread_id: str
):
client = _client(credentials)
await client.inboxes.threads.delete(inbox_id=inbox_id, thread_id=thread_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
await self.delete_thread(
credentials, input_data.inbox_id, input_data.thread_id
)
yield "success", True
except Exception as e:
yield "error", str(e)
class AgentMailListOrgThreadsBlock(Block):
"""
List conversation threads across ALL inboxes in your organization.
Unlike per-inbox listing, this returns threads from every inbox.
Ideal for building supervisor agents that monitor all conversations,
analytics dashboards, or cross-agent routing workflows.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
limit: int = SchemaField(
description="Maximum number of threads to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
labels: list[str] = SchemaField(
description="Only return threads matching ALL of these labels",
default_factory=list,
advanced=True,
)
class Output(BlockSchemaOutput):
threads: list[dict] = SchemaField(
description="List of thread objects from all inboxes in the organization"
)
count: int = SchemaField(description="Number of threads returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="d7a0657b-58ab-48b2-898b-7bd94f44a708",
description="List threads across ALL inboxes in your organization. Use for supervisor agents, dashboards, or cross-agent monitoring.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT},
test_output=[
("threads", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_org_threads": lambda *a, **kw: type(
"Resp",
(),
{
"threads": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_org_threads(credentials: APIKeyCredentials, **params):
client = _client(credentials)
return await client.threads.list(**params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
if input_data.labels:
params["labels"] = input_data.labels
response = await self.list_org_threads(credentials, **params)
threads = [t.model_dump() for t in response.threads]
yield "threads", threads
yield "count", (c if (c := response.count) is not None else len(threads))
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailGetOrgThreadBlock(Block):
"""
Retrieve a single conversation thread by ID from anywhere in the organization.
Works without needing to know which inbox the thread belongs to.
Returns the thread with all its messages in chronological order.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
thread_id: str = SchemaField(
description="Thread ID to retrieve (works across all inboxes)"
)
class Output(BlockSchemaOutput):
thread_id: str = SchemaField(description="Unique identifier of the thread")
messages: list[dict] = SchemaField(
description="All messages in the thread, in chronological order"
)
result: dict = SchemaField(
description="Complete thread object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="39aaae31-3eb1-44c6-9e37-5a44a4529649",
description="Retrieve a conversation thread by ID from anywhere in the organization, without needing the inbox ID.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"thread_id": "test-thread",
},
test_output=[
("thread_id", "test-thread"),
("messages", []),
("result", dict),
],
test_mock={
"get_org_thread": lambda *a, **kw: type(
"Thread",
(),
{
"thread_id": "test-thread",
"messages": [],
"model_dump": lambda self: {
"thread_id": "test-thread",
"messages": [],
},
},
)(),
},
)
@staticmethod
async def get_org_thread(credentials: APIKeyCredentials, thread_id: str):
client = _client(credentials)
return await client.threads.get(thread_id=thread_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
thread = await self.get_org_thread(credentials, input_data.thread_id)
messages = [m.model_dump() for m in thread.messages]
result = thread.model_dump()
result["messages"] = messages
yield "thread_id", thread.thread_id
yield "messages", messages
yield "result", result
except Exception as e:
yield "error", str(e)

View File

@@ -27,6 +27,7 @@ from backend.util.file import MediaFileType, store_media_file
class GeminiImageModel(str, Enum):
NANO_BANANA = "google/nano-banana"
NANO_BANANA_PRO = "google/nano-banana-pro"
NANO_BANANA_2 = "google/nano-banana-2"
class AspectRatio(str, Enum):
@@ -77,7 +78,7 @@ class AIImageCustomizerBlock(Block):
)
model: GeminiImageModel = SchemaField(
description="The AI model to use for image generation and editing",
default=GeminiImageModel.NANO_BANANA,
default=GeminiImageModel.NANO_BANANA_2,
title="Model",
)
images: list[MediaFileType] = SchemaField(
@@ -103,7 +104,7 @@ class AIImageCustomizerBlock(Block):
super().__init__(
id="d76bbe4c-930e-4894-8469-b66775511f71",
description=(
"Generate and edit custom images using Google's Nano-Banana model from Gemini 2.5. "
"Generate and edit custom images using Google's Nano-Banana models from Gemini. "
"Provide a prompt and optional reference images to create or modify images."
),
categories={BlockCategory.AI, BlockCategory.MULTIMEDIA},
@@ -111,7 +112,7 @@ class AIImageCustomizerBlock(Block):
output_schema=AIImageCustomizerBlock.Output,
test_input={
"prompt": "Make the scene more vibrant and colorful",
"model": GeminiImageModel.NANO_BANANA,
"model": GeminiImageModel.NANO_BANANA_2,
"images": [],
"aspect_ratio": AspectRatio.MATCH_INPUT_IMAGE,
"output_format": OutputFormat.JPG,

View File

@@ -115,6 +115,7 @@ class ImageGenModel(str, Enum):
RECRAFT = "Recraft v3"
SD3_5 = "Stable Diffusion 3.5 Medium"
NANO_BANANA_PRO = "Nano Banana Pro"
NANO_BANANA_2 = "Nano Banana 2"
class AIImageGeneratorBlock(Block):
@@ -131,7 +132,7 @@ class AIImageGeneratorBlock(Block):
)
model: ImageGenModel = SchemaField(
description="The AI model to use for image generation",
default=ImageGenModel.SD3_5,
default=ImageGenModel.NANO_BANANA_2,
title="Model",
)
size: ImageSize = SchemaField(
@@ -165,7 +166,7 @@ class AIImageGeneratorBlock(Block):
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"prompt": "An octopus using a laptop in a snowy forest with 'AutoGPT' clearly visible on the screen",
"model": ImageGenModel.RECRAFT,
"model": ImageGenModel.NANO_BANANA_2,
"size": ImageSize.SQUARE,
"style": ImageStyle.REALISTIC,
},
@@ -179,7 +180,9 @@ class AIImageGeneratorBlock(Block):
],
test_mock={
# Return a data URI directly so store_media_file doesn't need to download
"_run_client": lambda *args, **kwargs: "data:image/webp;base64,UklGRiQAAABXRUJQVlA4IBgAAAAwAQCdASoBAAEAAQAcJYgCdAEO"
"_run_client": lambda *args, **kwargs: (
"data:image/webp;base64,UklGRiQAAABXRUJQVlA4IBgAAAAwAQCdASoBAAEAAQAcJYgCdAEO"
)
},
)
@@ -280,17 +283,24 @@ class AIImageGeneratorBlock(Block):
)
return output
elif input_data.model == ImageGenModel.NANO_BANANA_PRO:
# Use Nano Banana Pro (Google Gemini 3 Pro Image)
elif input_data.model in (
ImageGenModel.NANO_BANANA_PRO,
ImageGenModel.NANO_BANANA_2,
):
# Use Nano Banana models (Google Gemini image variants)
model_map = {
ImageGenModel.NANO_BANANA_PRO: "google/nano-banana-pro",
ImageGenModel.NANO_BANANA_2: "google/nano-banana-2",
}
input_params = {
"prompt": modified_prompt,
"aspect_ratio": SIZE_TO_NANO_BANANA_RATIO[input_data.size],
"resolution": "2K", # Default to 2K for good quality/cost balance
"resolution": "2K",
"output_format": "jpg",
"safety_filter_level": "block_only_high", # Most permissive
"safety_filter_level": "block_only_high",
}
output = await self._run_client(
credentials, "google/nano-banana-pro", input_params
credentials, model_map[input_data.model], input_params
)
return output

View File

@@ -0,0 +1,376 @@
from __future__ import annotations
import asyncio
import contextvars
import json
import logging
from typing import TYPE_CHECKING, Any
from typing_extensions import TypedDict # Needed for Python <3.12 compatibility
from backend.blocks._base import (
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
)
from backend.data.model import SchemaField
if TYPE_CHECKING:
from backend.data.execution import ExecutionContext
logger = logging.getLogger(__name__)
# Block ID shared between autopilot.py and copilot prompting.py.
AUTOPILOT_BLOCK_ID = "c069dc6b-c3ed-4c12-b6e5-d47361e64ce6"
class ToolCallEntry(TypedDict):
"""A single tool invocation record from an autopilot execution."""
tool_call_id: str
tool_name: str
input: Any
output: Any | None
success: bool | None
class TokenUsage(TypedDict):
"""Aggregated token counts from the autopilot stream."""
prompt_tokens: int
completion_tokens: int
total_tokens: int
class AutoPilotBlock(Block):
"""Execute tasks using AutoGPT AutoPilot with full access to platform tools.
The autopilot can manage agents, access workspace files, fetch web content,
run blocks, and more. This block enables sub-agent patterns (autopilot calling
autopilot) and scheduled autopilot execution via the agent executor.
"""
class Input(BlockSchemaInput):
"""Input schema for the AutoPilot block."""
prompt: str = SchemaField(
description=(
"The task or instruction for the autopilot to execute. "
"The autopilot has access to platform tools like agent management, "
"workspace files, web fetch, block execution, and more."
),
placeholder="Find my agents and list them",
advanced=False,
)
system_context: str = SchemaField(
description=(
"Optional additional context prepended to the prompt. "
"Use this to constrain autopilot behavior, provide domain "
"context, or set output format requirements."
),
default="",
advanced=True,
)
session_id: str = SchemaField(
description=(
"Session ID to continue an existing autopilot conversation. "
"Leave empty to start a new session. "
"Use the session_id output from a previous run to continue."
),
default="",
advanced=True,
)
max_recursion_depth: int = SchemaField(
description=(
"Maximum nesting depth when the autopilot calls this block "
"recursively (sub-agent pattern). Prevents infinite loops."
),
default=3,
ge=1,
le=10,
advanced=True,
)
# timeout_seconds removed: the SDK manages its own heartbeat-based
# timeouts internally; wrapping with asyncio.timeout corrupts the
# SDK's internal stream (see service.py CRITICAL comment).
class Output(BlockSchemaOutput):
"""Output schema for the AutoPilot block."""
response: str = SchemaField(
description="The final text response from the autopilot."
)
tool_calls: list[ToolCallEntry] = SchemaField(
description=(
"List of tools called during execution. Each entry has "
"tool_call_id, tool_name, input, output, and success fields."
),
)
conversation_history: str = SchemaField(
description=(
"Current turn messages (user prompt + assistant reply) as JSON. "
"It can be used for logging or analysis."
),
)
session_id: str = SchemaField(
description=(
"Session ID for this conversation. "
"Pass this back to continue the conversation in a future run."
),
)
token_usage: TokenUsage = SchemaField(
description=(
"Token usage statistics: prompt_tokens, "
"completion_tokens, total_tokens."
),
)
def __init__(self):
super().__init__(
id=AUTOPILOT_BLOCK_ID,
description=(
"Execute tasks using AutoGPT AutoPilot with full access to "
"platform tools (agent management, workspace files, web fetch, "
"block execution, and more). Enables sub-agent patterns and "
"scheduled autopilot execution."
),
categories={BlockCategory.AI, BlockCategory.AGENT},
input_schema=AutoPilotBlock.Input,
output_schema=AutoPilotBlock.Output,
test_input={
"prompt": "List my agents",
"system_context": "",
"session_id": "",
"max_recursion_depth": 3,
},
test_output=[
("response", "You have 2 agents: Agent A and Agent B."),
("tool_calls", []),
(
"conversation_history",
'[{"role": "user", "content": "List my agents"}]',
),
("session_id", "test-session-id"),
(
"token_usage",
{
"prompt_tokens": 100,
"completion_tokens": 50,
"total_tokens": 150,
},
),
],
test_mock={
"create_session": lambda *args, **kwargs: "test-session-id",
"execute_copilot": lambda *args, **kwargs: (
"You have 2 agents: Agent A and Agent B.",
[],
'[{"role": "user", "content": "List my agents"}]',
"test-session-id",
{
"prompt_tokens": 100,
"completion_tokens": 50,
"total_tokens": 150,
},
),
},
)
async def create_session(self, user_id: str) -> str:
"""Create a new chat session and return its ID (mockable for tests)."""
from backend.copilot.model import create_chat_session
session = await create_chat_session(user_id)
return session.session_id
async def execute_copilot(
self,
prompt: str,
system_context: str,
session_id: str,
max_recursion_depth: int,
user_id: str,
) -> tuple[str, list[ToolCallEntry], str, str, TokenUsage]:
"""Invoke the copilot and collect all stream results.
Delegates to :func:`collect_copilot_response` — the shared helper that
consumes ``stream_chat_completion_sdk`` without wrapping it in an
``asyncio.timeout`` (the SDK manages its own heartbeat-based timeouts).
Args:
prompt: The user task/instruction.
system_context: Optional context prepended to the prompt.
session_id: Chat session to use.
max_recursion_depth: Maximum allowed recursion nesting.
user_id: Authenticated user ID.
Returns:
A tuple of (response_text, tool_calls, history_json, session_id, usage).
"""
from backend.copilot.sdk.collect import collect_copilot_response
tokens = _check_recursion(max_recursion_depth)
try:
effective_prompt = prompt
if system_context:
effective_prompt = f"[System Context: {system_context}]\n\n{prompt}"
result = await collect_copilot_response(
session_id=session_id,
message=effective_prompt,
user_id=user_id,
)
# Build a lightweight conversation summary from streamed data.
turn_messages: list[dict[str, Any]] = [
{"role": "user", "content": effective_prompt},
]
if result.tool_calls:
turn_messages.append(
{
"role": "assistant",
"content": result.response_text,
"tool_calls": result.tool_calls,
}
)
else:
turn_messages.append(
{"role": "assistant", "content": result.response_text}
)
history_json = json.dumps(turn_messages, default=str)
tool_calls: list[ToolCallEntry] = [
{
"tool_call_id": tc["tool_call_id"],
"tool_name": tc["tool_name"],
"input": tc["input"],
"output": tc["output"],
"success": tc["success"],
}
for tc in result.tool_calls
]
usage: TokenUsage = {
"prompt_tokens": result.prompt_tokens,
"completion_tokens": result.completion_tokens,
"total_tokens": result.total_tokens,
}
return (
result.response_text,
tool_calls,
history_json,
session_id,
usage,
)
finally:
_reset_recursion(tokens)
async def run(
self,
input_data: Input,
*,
execution_context: ExecutionContext,
**kwargs,
) -> BlockOutput:
"""Validate inputs, invoke the autopilot, and yield structured outputs.
Yields session_id even on failure so callers can inspect/resume the session.
"""
if not input_data.prompt.strip():
yield "error", "Prompt cannot be empty."
return
if not execution_context.user_id:
yield "error", "Cannot run autopilot without an authenticated user."
return
if input_data.max_recursion_depth < 1:
yield "error", "max_recursion_depth must be at least 1."
return
# Create session eagerly so the user always gets the session_id,
# even if the downstream stream fails (avoids orphaned sessions).
sid = input_data.session_id
if not sid:
sid = await self.create_session(execution_context.user_id)
# NOTE: No asyncio.timeout() here — the SDK manages its own
# heartbeat-based timeouts internally. Wrapping with asyncio.timeout
# would cancel the task mid-flight, corrupting the SDK's internal
# anyio memory stream (see service.py CRITICAL comment).
try:
response, tool_calls, history, _, usage = await self.execute_copilot(
prompt=input_data.prompt,
system_context=input_data.system_context,
session_id=sid,
max_recursion_depth=input_data.max_recursion_depth,
user_id=execution_context.user_id,
)
yield "response", response
yield "tool_calls", tool_calls
yield "conversation_history", history
yield "session_id", sid
yield "token_usage", usage
except asyncio.CancelledError:
yield "session_id", sid
yield "error", "AutoPilot execution was cancelled."
raise
except Exception as exc:
yield "session_id", sid
yield "error", str(exc)
# ---------------------------------------------------------------------------
# Helpers placed after the block class for top-down readability.
# ---------------------------------------------------------------------------
# Task-scoped recursion depth counter & chain-wide limit.
# contextvars are scoped to the current asyncio task, so concurrent
# graph executions each get independent counters.
_autopilot_recursion_depth: contextvars.ContextVar[int] = contextvars.ContextVar(
"_autopilot_recursion_depth", default=0
)
_autopilot_recursion_limit: contextvars.ContextVar[int | None] = contextvars.ContextVar(
"_autopilot_recursion_limit", default=None
)
def _check_recursion(
max_depth: int,
) -> tuple[contextvars.Token[int], contextvars.Token[int | None]]:
"""Check and increment recursion depth.
Returns ContextVar tokens that must be passed to ``_reset_recursion``
when the caller exits to restore the previous depth.
Raises:
RuntimeError: If the current depth already meets or exceeds the limit.
"""
current = _autopilot_recursion_depth.get()
inherited = _autopilot_recursion_limit.get()
limit = max_depth if inherited is None else min(inherited, max_depth)
if current >= limit:
raise RuntimeError(
f"AutoPilot recursion depth limit reached ({limit}). "
"The autopilot has called itself too many times."
)
return (
_autopilot_recursion_depth.set(current + 1),
_autopilot_recursion_limit.set(limit),
)
def _reset_recursion(
tokens: tuple[contextvars.Token[int], contextvars.Token[int | None]],
) -> None:
"""Restore recursion depth and limit to their previous values."""
_autopilot_recursion_depth.reset(tokens[0])
_autopilot_recursion_limit.reset(tokens[1])

View File

@@ -472,7 +472,7 @@ class AddToListBlock(Block):
async def run(self, input_data: Input, **kwargs) -> BlockOutput:
entries_added = input_data.entries.copy()
if input_data.entry:
if input_data.entry is not None:
entries_added.append(input_data.entry)
updated_list = input_data.list.copy()

View File

@@ -21,6 +21,7 @@ from backend.data.model import (
UserPasswordCredentials,
)
from backend.integrations.providers import ProviderName
from backend.util.request import resolve_and_check_blocked
TEST_CREDENTIALS = UserPasswordCredentials(
id="01234567-89ab-cdef-0123-456789abcdef",
@@ -99,6 +100,8 @@ class SendEmailBlock(Block):
is_sensitive_action=True,
)
ALLOWED_SMTP_PORTS = {25, 465, 587, 2525}
@staticmethod
def send_email(
config: SMTPConfig,
@@ -129,6 +132,17 @@ class SendEmailBlock(Block):
self, input_data: Input, *, credentials: SMTPCredentials, **kwargs
) -> BlockOutput:
try:
# --- SSRF Protection ---
smtp_port = input_data.config.smtp_port
if smtp_port not in self.ALLOWED_SMTP_PORTS:
yield "error", (
f"SMTP port {smtp_port} is not allowed. "
f"Allowed ports: {sorted(self.ALLOWED_SMTP_PORTS)}"
)
return
await resolve_and_check_blocked(input_data.config.smtp_server)
status = self.send_email(
config=input_data.config,
to_email=input_data.to_email,
@@ -180,7 +194,19 @@ class SendEmailBlock(Block):
"was rejected by the server. "
"Please verify your account is authorized to send emails."
)
except smtplib.SMTPConnectError:
yield "error", (
f"Cannot connect to SMTP server '{input_data.config.smtp_server}' "
f"on port {input_data.config.smtp_port}."
)
except smtplib.SMTPServerDisconnected:
yield "error", (
f"SMTP server '{input_data.config.smtp_server}' "
"disconnected unexpectedly."
)
except smtplib.SMTPDataError as e:
yield "error", f"Email data rejected by server: {str(e)}"
except ValueError as e:
yield "error", str(e)
except Exception as e:
raise e

View File

@@ -34,17 +34,29 @@ TEST_CREDENTIALS_INPUT = {
"provider": TEST_CREDENTIALS.provider,
"id": TEST_CREDENTIALS.id,
"type": TEST_CREDENTIALS.type,
"title": TEST_CREDENTIALS.type,
"title": TEST_CREDENTIALS.title,
}
class FluxKontextModelName(str, Enum):
PRO = "Flux Kontext Pro"
MAX = "Flux Kontext Max"
class ImageEditorModel(str, Enum):
FLUX_KONTEXT_PRO = "Flux Kontext Pro"
FLUX_KONTEXT_MAX = "Flux Kontext Max"
NANO_BANANA_PRO = "Nano Banana Pro"
NANO_BANANA_2 = "Nano Banana 2"
@property
def api_name(self) -> str:
return f"black-forest-labs/flux-kontext-{self.name.lower()}"
_map = {
"FLUX_KONTEXT_PRO": "black-forest-labs/flux-kontext-pro",
"FLUX_KONTEXT_MAX": "black-forest-labs/flux-kontext-max",
"NANO_BANANA_PRO": "google/nano-banana-pro",
"NANO_BANANA_2": "google/nano-banana-2",
}
return _map[self.name]
# Keep old name as alias for backwards compatibility
FluxKontextModelName = ImageEditorModel
class AspectRatio(str, Enum):
@@ -69,7 +81,7 @@ class AIImageEditorBlock(Block):
credentials: CredentialsMetaInput[
Literal[ProviderName.REPLICATE], Literal["api_key"]
] = CredentialsField(
description="Replicate API key with permissions for Flux Kontext models",
description="Replicate API key with permissions for Flux Kontext and Nano Banana models",
)
prompt: str = SchemaField(
description="Text instruction describing the desired edit",
@@ -87,14 +99,14 @@ class AIImageEditorBlock(Block):
advanced=False,
)
seed: Optional[int] = SchemaField(
description="Random seed. Set for reproducible generation",
description="Random seed. Set for reproducible generation (Flux Kontext only; ignored by Nano Banana models)",
default=None,
title="Seed",
advanced=True,
)
model: FluxKontextModelName = SchemaField(
model: ImageEditorModel = SchemaField(
description="Model variant to use",
default=FluxKontextModelName.PRO,
default=ImageEditorModel.NANO_BANANA_2,
title="Model",
)
@@ -107,7 +119,7 @@ class AIImageEditorBlock(Block):
super().__init__(
id="3fd9c73d-4370-4925-a1ff-1b86b99fabfa",
description=(
"Edit images using BlackForest Labs' Flux Kontext models. Provide a prompt "
"Edit images using Flux Kontext or Google Nano Banana models. Provide a prompt "
"and optional reference image to generate a modified image."
),
categories={BlockCategory.AI, BlockCategory.MULTIMEDIA},
@@ -118,7 +130,7 @@ class AIImageEditorBlock(Block):
"input_image": "data:image/png;base64,MQ==",
"aspect_ratio": AspectRatio.MATCH_INPUT_IMAGE,
"seed": None,
"model": FluxKontextModelName.PRO,
"model": ImageEditorModel.NANO_BANANA_2,
"credentials": TEST_CREDENTIALS_INPUT,
},
test_output=[
@@ -127,7 +139,9 @@ class AIImageEditorBlock(Block):
],
test_mock={
# Use data URI to avoid HTTP requests during tests
"run_model": lambda *args, **kwargs: "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==",
"run_model": lambda *args, **kwargs: (
"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="
),
},
test_credentials=TEST_CREDENTIALS,
)
@@ -142,7 +156,7 @@ class AIImageEditorBlock(Block):
) -> BlockOutput:
result = await self.run_model(
api_key=credentials.api_key,
model_name=input_data.model.api_name,
model=input_data.model,
prompt=input_data.prompt,
input_image_b64=(
await store_media_file(
@@ -169,7 +183,7 @@ class AIImageEditorBlock(Block):
async def run_model(
self,
api_key: SecretStr,
model_name: str,
model: ImageEditorModel,
prompt: str,
input_image_b64: Optional[str],
aspect_ratio: str,
@@ -178,12 +192,29 @@ class AIImageEditorBlock(Block):
graph_exec_id: str,
) -> MediaFileType:
client = ReplicateClient(api_token=api_key.get_secret_value())
input_params = {
"prompt": prompt,
"input_image": input_image_b64,
"aspect_ratio": aspect_ratio,
**({"seed": seed} if seed is not None else {}),
}
model_name = model.api_name
is_nano_banana = model in (
ImageEditorModel.NANO_BANANA_PRO,
ImageEditorModel.NANO_BANANA_2,
)
if is_nano_banana:
input_params: dict = {
"prompt": prompt,
"aspect_ratio": aspect_ratio,
"output_format": "jpg",
"safety_filter_level": "block_only_high",
}
# NB API expects "image_input" as a list, unlike Flux's single "input_image"
if input_image_b64:
input_params["image_input"] = [input_image_b64]
else:
input_params = {
"prompt": prompt,
"input_image": input_image_b64,
"aspect_ratio": aspect_ratio,
**({"seed": seed} if seed is not None else {}),
}
try:
output: FileOutput | list[FileOutput] = await client.async_run( # type: ignore

View File

@@ -211,7 +211,7 @@ class AgentOutputBlock(Block):
if input_data.format:
try:
formatter = TextFormatter(autoescape=input_data.escape_html)
yield "output", formatter.format_string(
yield "output", await formatter.format_string(
input_data.format, {input_data.name: input_data.value}
)
except Exception as e:

View File

@@ -33,6 +33,13 @@ from backend.integrations.providers import ProviderName
from backend.util import json
from backend.util.clients import OPENROUTER_BASE_URL
from backend.util.logging import TruncatedLogger
from backend.util.openai_responses import (
convert_tools_to_responses_format,
extract_responses_content,
extract_responses_reasoning,
extract_responses_tool_calls,
extract_responses_usage,
)
from backend.util.prompt import compress_context, estimate_token_count
from backend.util.request import validate_url_host
from backend.util.settings import Settings
@@ -111,7 +118,6 @@ class LlmModel(str, Enum, metaclass=LlmModelMeta):
GPT4O_MINI = "gpt-4o-mini"
GPT4O = "gpt-4o"
GPT4_TURBO = "gpt-4-turbo"
GPT3_5_TURBO = "gpt-3.5-turbo"
# Anthropic models
CLAUDE_4_1_OPUS = "claude-opus-4-1-20250805"
CLAUDE_4_OPUS = "claude-opus-4-20250514"
@@ -277,9 +283,6 @@ MODEL_METADATA = {
LlmModel.GPT4_TURBO: ModelMetadata(
"openai", 128000, 4096, "GPT-4 Turbo", "OpenAI", "OpenAI", 3
), # gpt-4-turbo-2024-04-09
LlmModel.GPT3_5_TURBO: ModelMetadata(
"openai", 16385, 4096, "GPT-3.5 Turbo", "OpenAI", "OpenAI", 1
), # gpt-3.5-turbo-0125
# https://docs.anthropic.com/en/docs/about-claude/models
LlmModel.CLAUDE_4_1_OPUS: ModelMetadata(
"anthropic", 200000, 32000, "Claude Opus 4.1", "Anthropic", "Anthropic", 3
@@ -801,36 +804,53 @@ async def llm_call(
max_tokens = max(min(available_tokens, model_max_output, user_max), 1)
if provider == "openai":
tools_param = tools if tools else openai.NOT_GIVEN
oai_client = openai.AsyncOpenAI(api_key=credentials.api_key.get_secret_value())
response_format = None
parallel_tool_calls = get_parallel_tool_calls_param(
llm_model, parallel_tool_calls
)
tools_param = convert_tools_to_responses_format(tools) if tools else openai.omit
text_config = openai.omit
if force_json_output:
response_format = {"type": "json_object"}
text_config = {"format": {"type": "json_object"}} # type: ignore
response = await oai_client.chat.completions.create(
response = await oai_client.responses.create(
model=llm_model.value,
messages=prompt, # type: ignore
response_format=response_format, # type: ignore
max_completion_tokens=max_tokens,
tools=tools_param, # type: ignore
parallel_tool_calls=parallel_tool_calls,
input=prompt, # type: ignore[arg-type]
tools=tools_param, # type: ignore[arg-type]
max_output_tokens=max_tokens,
parallel_tool_calls=get_parallel_tool_calls_param(
llm_model, parallel_tool_calls
),
text=text_config, # type: ignore[arg-type]
store=False,
)
tool_calls = extract_openai_tool_calls(response)
reasoning = extract_openai_reasoning(response)
raw_tool_calls = extract_responses_tool_calls(response)
tool_calls = (
[
ToolContentBlock(
id=tc["id"],
type=tc["type"],
function=ToolCall(
name=tc["function"]["name"],
arguments=tc["function"]["arguments"],
),
)
for tc in raw_tool_calls
]
if raw_tool_calls
else None
)
reasoning = extract_responses_reasoning(response)
content = extract_responses_content(response)
prompt_tokens, completion_tokens = extract_responses_usage(response)
return LLMResponse(
raw_response=response.choices[0].message,
raw_response=response,
prompt=prompt,
response=response.choices[0].message.content or "",
response=content,
tool_calls=tool_calls,
prompt_tokens=response.usage.prompt_tokens if response.usage else 0,
completion_tokens=response.usage.completion_tokens if response.usage else 0,
prompt_tokens=prompt_tokens,
completion_tokens=completion_tokens,
reasoning=reasoning,
)
elif provider == "anthropic":
@@ -1276,8 +1296,10 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
values = input_data.prompt_values
if values:
input_data.prompt = fmt.format_string(input_data.prompt, values)
input_data.sys_prompt = fmt.format_string(input_data.sys_prompt, values)
input_data.prompt = await fmt.format_string(input_data.prompt, values)
input_data.sys_prompt = await fmt.format_string(
input_data.sys_prompt, values
)
if input_data.sys_prompt:
prompt.append({"role": "system", "content": input_data.sys_prompt})

View File

@@ -61,20 +61,27 @@ class ExecutionParams(BaseModel):
def _get_tool_requests(entry: dict[str, Any]) -> list[str]:
"""
Return a list of tool_call_ids if the entry is a tool request.
Supports both OpenAI and Anthropics formats.
Supports OpenAI Chat Completions, Responses API, and Anthropic formats.
"""
tool_call_ids = []
# OpenAI Responses API: function_call items have type="function_call"
if entry.get("type") == "function_call":
if call_id := entry.get("call_id"):
tool_call_ids.append(call_id)
return tool_call_ids
if entry.get("role") != "assistant":
return tool_call_ids
# OpenAI: check for tool_calls in the entry.
# OpenAI Chat Completions: check for tool_calls in the entry.
calls = entry.get("tool_calls")
if isinstance(calls, list):
for call in calls:
if tool_id := call.get("id"):
tool_call_ids.append(tool_id)
# Anthropics: check content items for tool_use type.
# Anthropic: check content items for tool_use type.
content = entry.get("content")
if isinstance(content, list):
for item in content:
@@ -89,16 +96,22 @@ def _get_tool_requests(entry: dict[str, Any]) -> list[str]:
def _get_tool_responses(entry: dict[str, Any]) -> list[str]:
"""
Return a list of tool_call_ids if the entry is a tool response.
Supports both OpenAI and Anthropics formats.
Supports OpenAI Chat Completions, Responses API, and Anthropic formats.
"""
tool_call_ids: list[str] = []
# OpenAI: a tool response message with role "tool" and key "tool_call_id".
# OpenAI Responses API: function_call_output items
if entry.get("type") == "function_call_output":
if call_id := entry.get("call_id"):
tool_call_ids.append(str(call_id))
return tool_call_ids
# OpenAI Chat Completions: a tool response message with role "tool".
if entry.get("role") == "tool":
if tool_call_id := entry.get("tool_call_id"):
tool_call_ids.append(str(tool_call_id))
# Anthropics: check content items for tool_result type.
# Anthropic: check content items for tool_result type.
if entry.get("role") == "user":
content = entry.get("content")
if isinstance(content, list):
@@ -111,14 +124,16 @@ def _get_tool_responses(entry: dict[str, Any]) -> list[str]:
return tool_call_ids
def _create_tool_response(call_id: str, output: Any) -> dict[str, Any]:
def _create_tool_response(
call_id: str, output: Any, *, responses_api: bool = False
) -> dict[str, Any]:
"""
Create a tool response message for either OpenAI or Anthropics,
based on the tool_id format.
Create a tool response message for OpenAI, Anthropic, or OpenAI Responses API,
based on the tool_id format and the responses_api flag.
"""
content = output if isinstance(output, str) else json.dumps(output)
# Anthropics format: tool IDs typically start with "toolu_"
# Anthropic format: tool IDs typically start with "toolu_"
if call_id.startswith("toolu_"):
return {
"role": "user",
@@ -128,8 +143,11 @@ def _create_tool_response(call_id: str, output: Any) -> dict[str, Any]:
],
}
# OpenAI format: tool IDs typically start with "call_".
# Or default fallback (if the tool_id doesn't match any known prefix)
# OpenAI Responses API format
if responses_api:
return {"type": "function_call_output", "call_id": call_id, "output": content}
# OpenAI Chat Completions format (default fallback)
return {"role": "tool", "tool_call_id": call_id, "content": content}
@@ -177,10 +195,19 @@ def _combine_tool_responses(tool_outputs: list[dict[str, Any]]) -> list[dict[str
return tool_outputs
def _convert_raw_response_to_dict(raw_response: Any) -> dict[str, Any]:
def _convert_raw_response_to_dict(
raw_response: Any,
) -> dict[str, Any] | list[dict[str, Any]]:
"""
Safely convert raw_response to dictionary format for conversation history.
Handles different response types from different LLM providers.
For the OpenAI Responses API, the raw_response is the entire Response
object. Its ``output`` items (messages, function_calls) are extracted
individually so they can be used as valid input items on the next call.
Returns a **list** of dicts in that case.
For Chat Completions / Anthropic / Ollama, returns a single dict.
"""
if isinstance(raw_response, str):
# Ollama returns a string, convert to dict format
@@ -188,11 +215,28 @@ def _convert_raw_response_to_dict(raw_response: Any) -> dict[str, Any]:
elif isinstance(raw_response, dict):
# Already a dict (from tests or some providers)
return raw_response
elif _is_responses_api_object(raw_response):
# OpenAI Responses API: extract individual output items
items = [json.to_dict(item) for item in raw_response.output]
return items if items else [{"role": "assistant", "content": ""}]
else:
# OpenAI/Anthropic return objects, convert with json.to_dict
# Chat Completions / Anthropic return message objects
return json.to_dict(raw_response)
def _is_responses_api_object(obj: Any) -> bool:
"""Detect an OpenAI Responses API Response object.
These have ``object == "response"`` and an ``output`` list, but no
``role`` attribute (unlike ChatCompletionMessage).
"""
return (
getattr(obj, "object", None) == "response"
and hasattr(obj, "output")
and not hasattr(obj, "role")
)
def get_pending_tool_calls(conversation_history: list[Any] | None) -> dict[str, int]:
"""
All the tool calls entry in the conversation history requires a response.
@@ -754,19 +798,34 @@ class SmartDecisionMakerBlock(Block):
self, prompt: list[dict], response, tool_outputs: list | None = None
):
"""Update conversation history with response and tool outputs."""
# Don't add separate reasoning message with tool calls (breaks Anthropic's tool_use->tool_result pairing)
assistant_message = _convert_raw_response_to_dict(response.raw_response)
has_tool_calls = isinstance(assistant_message.get("content"), list) and any(
item.get("type") == "tool_use"
for item in assistant_message.get("content", [])
)
converted = _convert_raw_response_to_dict(response.raw_response)
if response.reasoning and not has_tool_calls:
prompt.append(
{"role": "assistant", "content": f"[Reasoning]: {response.reasoning}"}
if isinstance(converted, list):
# Responses API: output items are already individual dicts
has_tool_calls = any(
item.get("type") == "function_call" for item in converted
)
prompt.append(assistant_message)
if response.reasoning and not has_tool_calls:
prompt.append(
{
"role": "assistant",
"content": f"[Reasoning]: {response.reasoning}",
}
)
prompt.extend(converted)
else:
# Chat Completions / Anthropic: single assistant message dict
has_tool_calls = isinstance(converted.get("content"), list) and any(
item.get("type") == "tool_use" for item in converted.get("content", [])
)
if response.reasoning and not has_tool_calls:
prompt.append(
{
"role": "assistant",
"content": f"[Reasoning]: {response.reasoning}",
}
)
prompt.append(converted)
if tool_outputs:
prompt.extend(tool_outputs)
@@ -776,6 +835,8 @@ class SmartDecisionMakerBlock(Block):
tool_info: ToolInfo,
execution_params: ExecutionParams,
execution_processor: "ExecutionProcessor",
*,
responses_api: bool = False,
) -> dict:
"""Execute a single tool using the execution manager for proper integration."""
# Lazy imports to avoid circular dependencies
@@ -868,13 +929,17 @@ class SmartDecisionMakerBlock(Block):
if node_outputs
else "Tool executed successfully"
)
return _create_tool_response(tool_call.id, tool_response_content)
return _create_tool_response(
tool_call.id, tool_response_content, responses_api=responses_api
)
except Exception as e:
logger.error(f"Tool execution with manager failed: {e}")
# Return error response
return _create_tool_response(
tool_call.id, f"Tool execution failed: {str(e)}"
tool_call.id,
f"Tool execution failed: {str(e)}",
responses_api=responses_api,
)
async def _execute_tools_agent_mode(
@@ -895,6 +960,7 @@ class SmartDecisionMakerBlock(Block):
"""Execute tools in agent mode with a loop until finished."""
max_iterations = input_data.agent_mode_max_iterations
iteration = 0
use_responses_api = input_data.model.metadata.provider == "openai"
# Execution parameters for tool execution
execution_params = ExecutionParams(
@@ -951,14 +1017,19 @@ class SmartDecisionMakerBlock(Block):
for tool_info in processed_tools:
try:
tool_response = await self._execute_single_tool_with_manager(
tool_info, execution_params, execution_processor
tool_info,
execution_params,
execution_processor,
responses_api=use_responses_api,
)
tool_outputs.append(tool_response)
except Exception as e:
logger.error(f"Tool execution failed: {e}")
# Create error response for the tool
error_response = _create_tool_response(
tool_info.tool_call.id, f"Error: {str(e)}"
tool_info.tool_call.id,
f"Error: {str(e)}",
responses_api=use_responses_api,
)
tool_outputs.append(error_response)
@@ -1020,11 +1091,17 @@ class SmartDecisionMakerBlock(Block):
if pending_tool_calls and input_data.last_tool_output is None:
raise ValueError(f"Tool call requires an output for {pending_tool_calls}")
use_responses_api = input_data.model.metadata.provider == "openai"
tool_output = []
if pending_tool_calls and input_data.last_tool_output is not None:
first_call_id = next(iter(pending_tool_calls.keys()))
tool_output.append(
_create_tool_response(first_call_id, input_data.last_tool_output)
_create_tool_response(
first_call_id,
input_data.last_tool_output,
responses_api=use_responses_api,
)
)
prompt.extend(tool_output)
@@ -1050,11 +1127,15 @@ class SmartDecisionMakerBlock(Block):
values = input_data.prompt_values
if values:
input_data.prompt = llm.fmt.format_string(input_data.prompt, values)
input_data.sys_prompt = llm.fmt.format_string(input_data.sys_prompt, values)
input_data.prompt = await llm.fmt.format_string(input_data.prompt, values)
input_data.sys_prompt = await llm.fmt.format_string(
input_data.sys_prompt, values
)
if input_data.sys_prompt and not any(
p["role"] == "system" and p["content"].startswith(MAIN_OBJECTIVE_PREFIX)
p.get("role") == "system"
and isinstance(p.get("content"), str)
and p["content"].startswith(MAIN_OBJECTIVE_PREFIX)
for p in prompt
):
prompt.append(
@@ -1065,7 +1146,9 @@ class SmartDecisionMakerBlock(Block):
)
if input_data.prompt and not any(
p["role"] == "user" and p["content"].startswith(MAIN_OBJECTIVE_PREFIX)
p.get("role") == "user"
and isinstance(p.get("content"), str)
and p["content"].startswith(MAIN_OBJECTIVE_PREFIX)
for p in prompt
):
prompt.append(
@@ -1173,11 +1256,26 @@ class SmartDecisionMakerBlock(Block):
)
yield emit_key, arg_value
if response.reasoning:
converted = _convert_raw_response_to_dict(response.raw_response)
# Check for tool calls to avoid inserting reasoning between tool pairs
if isinstance(converted, list):
has_tool_calls = any(
item.get("type") == "function_call" for item in converted
)
else:
has_tool_calls = isinstance(converted.get("content"), list) and any(
item.get("type") == "tool_use" for item in converted.get("content", [])
)
if response.reasoning and not has_tool_calls:
prompt.append(
{"role": "assistant", "content": f"[Reasoning]: {response.reasoning}"}
)
prompt.append(_convert_raw_response_to_dict(response.raw_response))
if isinstance(converted, list):
prompt.extend(converted)
else:
prompt.append(converted)
yield "conversations", prompt

View File

@@ -0,0 +1,223 @@
"""Tests for AutoPilotBlock: recursion guard, streaming, validation, and error paths."""
import asyncio
from unittest.mock import AsyncMock
import pytest
from backend.blocks.autopilot import (
AUTOPILOT_BLOCK_ID,
AutoPilotBlock,
_autopilot_recursion_depth,
_autopilot_recursion_limit,
_check_recursion,
_reset_recursion,
)
from backend.data.execution import ExecutionContext
def _make_context(user_id: str = "test-user-123") -> ExecutionContext:
"""Helper to build an ExecutionContext for tests."""
return ExecutionContext(
user_id=user_id,
graph_id="graph-1",
graph_exec_id="gexec-1",
graph_version=1,
node_id="node-1",
node_exec_id="nexec-1",
)
# ---------------------------------------------------------------------------
# Recursion guard unit tests
# ---------------------------------------------------------------------------
class TestCheckRecursion:
"""Unit tests for _check_recursion / _reset_recursion."""
def test_first_call_increments_depth(self):
tokens = _check_recursion(3)
try:
assert _autopilot_recursion_depth.get() == 1
assert _autopilot_recursion_limit.get() == 3
finally:
_reset_recursion(tokens)
def test_reset_restores_previous_values(self):
assert _autopilot_recursion_depth.get() == 0
assert _autopilot_recursion_limit.get() is None
tokens = _check_recursion(5)
_reset_recursion(tokens)
assert _autopilot_recursion_depth.get() == 0
assert _autopilot_recursion_limit.get() is None
def test_exceeding_limit_raises(self):
t1 = _check_recursion(2)
try:
t2 = _check_recursion(2)
try:
with pytest.raises(RuntimeError, match="recursion depth limit"):
_check_recursion(2)
finally:
_reset_recursion(t2)
finally:
_reset_recursion(t1)
def test_nested_calls_respect_inherited_limit(self):
"""Inner call with higher max_depth still respects outer limit."""
t1 = _check_recursion(2) # sets limit=2
try:
t2 = _check_recursion(10) # inner wants 10, but inherited is 2
try:
# depth is now 2, limit is min(10, 2) = 2 → should raise
with pytest.raises(RuntimeError, match="recursion depth limit"):
_check_recursion(10)
finally:
_reset_recursion(t2)
finally:
_reset_recursion(t1)
def test_limit_of_one_blocks_immediately_on_second_call(self):
t1 = _check_recursion(1)
try:
with pytest.raises(RuntimeError):
_check_recursion(1)
finally:
_reset_recursion(t1)
# ---------------------------------------------------------------------------
# AutoPilotBlock.run() validation tests
# ---------------------------------------------------------------------------
class TestRunValidation:
"""Tests for input validation in AutoPilotBlock.run()."""
@pytest.fixture
def block(self):
return AutoPilotBlock()
@pytest.mark.asyncio
async def test_empty_prompt_yields_error(self, block):
block.Input # ensure schema is accessible
input_data = block.Input(prompt=" ", max_recursion_depth=3)
ctx = _make_context()
outputs = {}
async for name, value in block.run(input_data, execution_context=ctx):
outputs[name] = value
assert outputs.get("error") == "Prompt cannot be empty."
assert "response" not in outputs
@pytest.mark.asyncio
async def test_missing_user_id_yields_error(self, block):
input_data = block.Input(prompt="hello", max_recursion_depth=3)
ctx = _make_context(user_id="")
outputs = {}
async for name, value in block.run(input_data, execution_context=ctx):
outputs[name] = value
assert "authenticated user" in outputs.get("error", "")
@pytest.mark.asyncio
async def test_successful_run_yields_all_outputs(self, block):
"""With execute_copilot mocked, run() should yield all 5 success outputs."""
mock_result = (
"Hello world",
[],
'[{"role":"user","content":"hi"}]',
"sess-abc",
{"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15},
)
block.execute_copilot = AsyncMock(return_value=mock_result)
block.create_session = AsyncMock(return_value="sess-abc")
input_data = block.Input(prompt="hi", max_recursion_depth=3)
ctx = _make_context()
outputs = {}
async for name, value in block.run(input_data, execution_context=ctx):
outputs[name] = value
assert outputs["response"] == "Hello world"
assert outputs["tool_calls"] == []
assert outputs["session_id"] == "sess-abc"
assert outputs["token_usage"]["total_tokens"] == 15
assert "error" not in outputs
@pytest.mark.asyncio
async def test_exception_yields_error(self, block):
"""On unexpected failure, run() should yield an error output."""
block.execute_copilot = AsyncMock(side_effect=RuntimeError("boom"))
block.create_session = AsyncMock(return_value="sess-fail")
input_data = block.Input(prompt="do something", max_recursion_depth=3)
ctx = _make_context()
outputs = {}
async for name, value in block.run(input_data, execution_context=ctx):
outputs[name] = value
assert outputs["session_id"] == "sess-fail"
assert "boom" in outputs.get("error", "")
@pytest.mark.asyncio
async def test_cancelled_error_yields_error_and_reraises(self, block):
"""CancelledError should yield error, then re-raise."""
block.execute_copilot = AsyncMock(side_effect=asyncio.CancelledError())
block.create_session = AsyncMock(return_value="sess-cancel")
input_data = block.Input(prompt="do something", max_recursion_depth=3)
ctx = _make_context()
outputs = {}
with pytest.raises(asyncio.CancelledError):
async for name, value in block.run(input_data, execution_context=ctx):
outputs[name] = value
assert outputs["session_id"] == "sess-cancel"
assert "cancelled" in outputs.get("error", "").lower()
@pytest.mark.asyncio
async def test_existing_session_id_skips_create(self, block):
"""When session_id is provided, create_session should not be called."""
mock_result = (
"ok",
[],
"[]",
"existing-sid",
{"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
)
block.execute_copilot = AsyncMock(return_value=mock_result)
block.create_session = AsyncMock()
input_data = block.Input(
prompt="test", session_id="existing-sid", max_recursion_depth=3
)
ctx = _make_context()
async for _ in block.run(input_data, execution_context=ctx):
pass
block.create_session.assert_not_called()
# ---------------------------------------------------------------------------
# Block registration / ID tests
# ---------------------------------------------------------------------------
class TestBlockRegistration:
def test_block_id_matches_constant(self):
block = AutoPilotBlock()
assert block.id == AUTOPILOT_BLOCK_ID
def test_max_recursion_depth_has_upper_bound(self):
"""Schema should enforce le=10."""
schema = AutoPilotBlock.Input.model_json_schema()
max_rec = schema["properties"]["max_recursion_depth"]
assert (
max_rec.get("maximum") == 10 or max_rec.get("exclusiveMaximum", 999) <= 11
)
def test_output_schema_has_no_duplicate_error_field(self):
"""Output should inherit error from BlockSchemaOutput, not redefine it."""
# The field should exist (inherited) but there should be no explicit
# redefinition. We verify by checking the class __annotations__ directly.
assert "error" not in AutoPilotBlock.Output.__annotations__

View File

@@ -13,18 +13,17 @@ class TestLLMStatsTracking:
"""Test that llm_call returns proper token counts in LLMResponse."""
import backend.blocks.llm as llm
# Mock the OpenAI client
# Mock the OpenAI Responses API response
mock_response = MagicMock()
mock_response.choices = [
MagicMock(message=MagicMock(content="Test response", tool_calls=None))
]
mock_response.usage = MagicMock(prompt_tokens=10, completion_tokens=20)
mock_response.output_text = "Test response"
mock_response.output = []
mock_response.usage = MagicMock(input_tokens=10, output_tokens=20)
# Test with mocked OpenAI response
with patch("openai.AsyncOpenAI") as mock_openai:
mock_client = AsyncMock()
mock_openai.return_value = mock_client
mock_client.chat.completions.create = AsyncMock(return_value=mock_response)
mock_client.responses.create = AsyncMock(return_value=mock_response)
response = await llm.llm_call(
credentials=llm.TEST_CREDENTIALS,
@@ -271,30 +270,17 @@ class TestLLMStatsTracking:
mock_response = MagicMock()
# Return different responses for chunk summary vs final summary
if call_count == 1:
mock_response.choices = [
MagicMock(
message=MagicMock(
content='<json_output id="test123456">{"summary": "Test chunk summary"}</json_output>',
tool_calls=None,
)
)
]
mock_response.output_text = '<json_output id="test123456">{"summary": "Test chunk summary"}</json_output>'
else:
mock_response.choices = [
MagicMock(
message=MagicMock(
content='<json_output id="test123456">{"final_summary": "Test final summary"}</json_output>',
tool_calls=None,
)
)
]
mock_response.usage = MagicMock(prompt_tokens=50, completion_tokens=30)
mock_response.output_text = '<json_output id="test123456">{"final_summary": "Test final summary"}</json_output>'
mock_response.output = []
mock_response.usage = MagicMock(input_tokens=50, output_tokens=30)
return mock_response
with patch("openai.AsyncOpenAI") as mock_openai:
mock_client = AsyncMock()
mock_openai.return_value = mock_client
mock_client.chat.completions.create = mock_create
mock_client.responses.create = mock_create
# Test with very short text (should only need 1 chunk + 1 final summary)
input_data = llm.AITextSummarizerBlock.Input(

View File

@@ -290,7 +290,9 @@ class FillTextTemplateBlock(Block):
async def run(self, input_data: Input, **kwargs) -> BlockOutput:
formatter = text.TextFormatter(autoescape=input_data.escape_html)
yield "output", formatter.format_string(input_data.format, input_data.values)
yield "output", await formatter.format_string(
input_data.format, input_data.values
)
class CombineTextsBlock(Block):

View File

@@ -115,10 +115,22 @@ class ChatConfig(BaseSettings):
description="Use --resume for multi-turn conversations instead of "
"history compression. Falls back to compression when unavailable.",
)
use_openrouter: bool = Field(
default=True,
description="Enable routing API calls through the OpenRouter proxy. "
"The actual decision also requires ``api_key`` and ``base_url`` — "
"use the ``openrouter_active`` property for the final answer.",
)
use_claude_code_subscription: bool = Field(
default=False,
description="For personal/dev use: use Claude Code CLI subscription auth instead of API keys. Requires `claude login` on the host. Only works with SDK mode.",
)
test_mode: bool = Field(
default=False,
description="Use dummy service instead of real LLM calls. "
"Send __test_transient_error__, __test_fatal_error__, or "
"__test_slow_response__ to trigger specific scenarios.",
)
# E2B Sandbox Configuration
use_e2b_sandbox: bool = Field(
@@ -136,7 +148,7 @@ class ChatConfig(BaseSettings):
description="E2B sandbox template to use for copilot sessions.",
)
e2b_sandbox_timeout: int = Field(
default=300, # 5 min safety net — explicit per-turn pause is the primary mechanism
default=420, # 7 min safety net — allows headroom for compaction retries
description="E2B sandbox running-time timeout (seconds). "
"E2B timeout is wall-clock (not idle). Explicit per-turn pause is the primary "
"mechanism; this is the safety net.",
@@ -146,6 +158,21 @@ class ChatConfig(BaseSettings):
description="E2B lifecycle action on timeout: 'pause' (default, free) or 'kill'.",
)
@property
def openrouter_active(self) -> bool:
"""True when OpenRouter is enabled AND credentials are usable.
Single source of truth for "will the SDK route through OpenRouter?".
Checks the flag *and* that ``api_key`` + a valid ``base_url`` are
present — mirrors the fallback logic in ``_build_sdk_env``.
"""
if not self.use_openrouter:
return False
base = (self.base_url or "").rstrip("/")
if base.endswith("/v1"):
base = base[:-3]
return bool(self.api_key and base and base.startswith("http"))
@property
def e2b_active(self) -> bool:
"""True when E2B is enabled and the API key is present.
@@ -168,15 +195,6 @@ class ChatConfig(BaseSettings):
"""
return self.e2b_api_key if self.e2b_active else None
@field_validator("use_e2b_sandbox", mode="before")
@classmethod
def get_use_e2b_sandbox(cls, v):
"""Get use_e2b_sandbox from environment if not provided."""
env_val = os.getenv("CHAT_USE_E2B_SANDBOX", "").lower()
if env_val:
return env_val in ("true", "1", "yes", "on")
return True if v is None else v
@field_validator("e2b_api_key", mode="before")
@classmethod
def get_e2b_api_key(cls, v):
@@ -219,26 +237,6 @@ class ChatConfig(BaseSettings):
v = OPENROUTER_BASE_URL
return v
@field_validator("use_claude_agent_sdk", mode="before")
@classmethod
def get_use_claude_agent_sdk(cls, v):
"""Get use_claude_agent_sdk from environment if not provided."""
# Check environment variable - default to True if not set
env_val = os.getenv("CHAT_USE_CLAUDE_AGENT_SDK", "").lower()
if env_val:
return env_val in ("true", "1", "yes", "on")
# Default to True (SDK enabled by default)
return True if v is None else v
@field_validator("use_claude_code_subscription", mode="before")
@classmethod
def get_use_claude_code_subscription(cls, v):
"""Get use_claude_code_subscription from environment if not provided."""
env_val = os.getenv("CHAT_USE_CLAUDE_CODE_SUBSCRIPTION", "").lower()
if env_val:
return env_val in ("true", "1", "yes", "on")
return False if v is None else v
# Prompt paths for different contexts
PROMPT_PATHS: dict[str, str] = {
"default": "prompts/chat_system.md",
@@ -248,6 +246,7 @@ class ChatConfig(BaseSettings):
class Config:
"""Pydantic config."""
env_prefix = "CHAT_"
env_file = ".env"
env_file_encoding = "utf-8"
extra = "ignore" # Ignore extra environment variables

View File

@@ -6,19 +6,70 @@ from .config import ChatConfig
# Env vars that the ChatConfig validators read — must be cleared so they don't
# override the explicit constructor values we pass in each test.
_E2B_ENV_VARS = (
_ENV_VARS_TO_CLEAR = (
"CHAT_USE_E2B_SANDBOX",
"CHAT_E2B_API_KEY",
"E2B_API_KEY",
"CHAT_USE_OPENROUTER",
"CHAT_API_KEY",
"OPEN_ROUTER_API_KEY",
"OPENAI_API_KEY",
"CHAT_BASE_URL",
"OPENROUTER_BASE_URL",
"OPENAI_BASE_URL",
)
@pytest.fixture(autouse=True)
def _clean_e2b_env(monkeypatch: pytest.MonkeyPatch) -> None:
for var in _E2B_ENV_VARS:
def _clean_env(monkeypatch: pytest.MonkeyPatch) -> None:
for var in _ENV_VARS_TO_CLEAR:
monkeypatch.delenv(var, raising=False)
class TestOpenrouterActive:
"""Tests for the openrouter_active property."""
def test_enabled_with_credentials_returns_true(self):
cfg = ChatConfig(
use_openrouter=True,
api_key="or-key",
base_url="https://openrouter.ai/api/v1",
)
assert cfg.openrouter_active is True
def test_enabled_but_missing_api_key_returns_false(self):
cfg = ChatConfig(
use_openrouter=True,
api_key=None,
base_url="https://openrouter.ai/api/v1",
)
assert cfg.openrouter_active is False
def test_disabled_returns_false_despite_credentials(self):
cfg = ChatConfig(
use_openrouter=False,
api_key="or-key",
base_url="https://openrouter.ai/api/v1",
)
assert cfg.openrouter_active is False
def test_strips_v1_suffix_and_still_valid(self):
cfg = ChatConfig(
use_openrouter=True,
api_key="or-key",
base_url="https://openrouter.ai/api/v1",
)
assert cfg.openrouter_active is True
def test_invalid_base_url_returns_false(self):
cfg = ChatConfig(
use_openrouter=True,
api_key="or-key",
base_url="not-a-url",
)
assert cfg.openrouter_active is False
class TestE2BActive:
"""Tests for the e2b_active property — single source of truth for E2B usage."""

View File

@@ -4,6 +4,9 @@
# The hex suffix makes accidental LLM generation of these strings virtually
# impossible, avoiding false-positive marker detection in normal conversation.
COPILOT_ERROR_PREFIX = "[__COPILOT_ERROR_f7a1__]" # Renders as ErrorCard
COPILOT_RETRYABLE_ERROR_PREFIX = (
"[__COPILOT_RETRYABLE_ERROR_a9c2__]" # ErrorCard + retry
)
COPILOT_SYSTEM_PREFIX = "[__COPILOT_SYSTEM_e3b0__]" # Renders as system info message
# Prefix for all synthetic IDs generated by CoPilot block execution.
@@ -35,3 +38,24 @@ def parse_node_id_from_exec_id(node_exec_id: str) -> str:
Format: "{node_id}:{random_hex}" → returns "{node_id}".
"""
return node_exec_id.rsplit(COPILOT_NODE_EXEC_ID_SEPARATOR, 1)[0]
# ---------------------------------------------------------------------------
# Transient Anthropic API error detection
# ---------------------------------------------------------------------------
# Patterns in error text that indicate a transient Anthropic API error
# (ECONNRESET / dropped TCP connection) which is retryable.
_TRANSIENT_ERROR_PATTERNS = (
"socket connection was closed unexpectedly",
"ECONNRESET",
"connection was forcibly closed",
"network socket disconnected",
)
FRIENDLY_TRANSIENT_MSG = "Anthropic connection interrupted — please retry"
def is_transient_api_error(error_text: str) -> bool:
"""Return True if *error_text* matches a known transient Anthropic API error."""
lower = error_text.lower()
return any(pat.lower() in lower for pat in _TRANSIENT_ERROR_PATTERNS)

View File

@@ -17,8 +17,17 @@ from backend.util.workspace import WorkspaceManager
if TYPE_CHECKING:
from e2b import AsyncSandbox
# Allowed base directory for the Read tool.
_SDK_PROJECTS_DIR = os.path.realpath(os.path.expanduser("~/.claude/projects"))
# Allowed base directory for the Read tool. Public so service.py can use it
# for sweep operations without depending on a private implementation detail.
# Respects CLAUDE_CONFIG_DIR env var, consistent with transcript.py's
# _projects_base() function.
_config_dir = os.environ.get("CLAUDE_CONFIG_DIR") or os.path.expanduser("~/.claude")
SDK_PROJECTS_DIR = os.path.realpath(os.path.join(_config_dir, "projects"))
# Compiled UUID pattern for validating conversation directory names.
# Kept as a module-level constant so the security-relevant pattern is easy
# to audit in one place and avoids recompilation on every call.
_UUID_RE = re.compile(r"^[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}$", re.IGNORECASE)
# Encoded project-directory name for the current session (e.g.
# "-private-tmp-copilot-<uuid>"). Set by set_execution_context() so path
@@ -35,11 +44,20 @@ _current_sandbox: ContextVar["AsyncSandbox | None"] = ContextVar(
_current_sdk_cwd: ContextVar[str] = ContextVar("_current_sdk_cwd", default="")
def _encode_cwd_for_cli(cwd: str) -> str:
"""Encode a working directory path the same way the Claude CLI does."""
def encode_cwd_for_cli(cwd: str) -> str:
"""Encode a working directory path the same way the Claude CLI does.
The Claude CLI encodes the absolute cwd as a directory name by replacing
every non-alphanumeric character with ``-``. For example
``/tmp/copilot-abc`` becomes ``-tmp-copilot-abc``.
"""
return re.sub(r"[^a-zA-Z0-9]", "-", os.path.realpath(cwd))
# Keep the private alias for internal callers (backwards compat).
_encode_cwd_for_cli = encode_cwd_for_cli
def set_execution_context(
user_id: str | None,
session: ChatSession,
@@ -100,7 +118,9 @@ def is_allowed_local_path(path: str, sdk_cwd: str | None = None) -> bool:
Allowed:
- Files under *sdk_cwd* (``/tmp/copilot-<session>/``)
- Files under ``~/.claude/projects/<encoded-cwd>/tool-results/`` (SDK tool-results)
- Files under ``~/.claude/projects/<encoded-cwd>/<uuid>/tool-results/...``.
The SDK nests tool-results under a conversation UUID directory;
the UUID segment is validated with ``_UUID_RE``.
"""
if not path:
return False
@@ -119,10 +139,22 @@ def is_allowed_local_path(path: str, sdk_cwd: str | None = None) -> bool:
encoded = _current_project_dir.get("")
if encoded:
tool_results_dir = os.path.join(_SDK_PROJECTS_DIR, encoded, "tool-results")
if resolved == tool_results_dir or resolved.startswith(
tool_results_dir + os.sep
):
return True
project_dir = os.path.realpath(os.path.join(SDK_PROJECTS_DIR, encoded))
# Defence-in-depth: ensure project_dir didn't escape the base.
if not project_dir.startswith(SDK_PROJECTS_DIR + os.sep):
return False
# Only allow: <encoded-cwd>/<uuid>/tool-results/<file>
# The SDK always creates a conversation UUID directory between
# the project dir and tool-results/.
if resolved.startswith(project_dir + os.sep):
relative = resolved[len(project_dir) + 1 :]
parts = relative.split(os.sep)
# Require exactly: [<uuid>, "tool-results", <file>, ...]
if (
len(parts) >= 3
and _UUID_RE.match(parts[0])
and parts[1] == "tool-results"
):
return True
return False

View File

@@ -9,7 +9,7 @@ from unittest.mock import MagicMock
import pytest
from backend.copilot.context import (
_SDK_PROJECTS_DIR,
SDK_PROJECTS_DIR,
_current_project_dir,
get_current_sandbox,
get_execution_context,
@@ -104,11 +104,13 @@ def test_is_allowed_local_path_no_sdk_cwd_no_project_dir():
assert not is_allowed_local_path("/tmp/some-file.txt", sdk_cwd=None)
def test_is_allowed_local_path_tool_results_dir():
"""Files under the tool-results directory for the current project are allowed."""
def test_is_allowed_local_path_tool_results_with_uuid():
"""Files under <encoded-cwd>/<uuid>/tool-results/ are allowed."""
encoded = "test-encoded-dir"
tool_results_dir = os.path.join(_SDK_PROJECTS_DIR, encoded, "tool-results")
path = os.path.join(tool_results_dir, "output.txt")
conv_uuid = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
path = os.path.join(
SDK_PROJECTS_DIR, encoded, conv_uuid, "tool-results", "output.txt"
)
_current_project_dir.set(encoded)
try:
@@ -117,10 +119,22 @@ def test_is_allowed_local_path_tool_results_dir():
_current_project_dir.set("")
def test_is_allowed_local_path_tool_results_without_uuid_rejected():
"""Direct <encoded-cwd>/tool-results/ (no UUID) is rejected."""
encoded = "test-encoded-dir"
path = os.path.join(SDK_PROJECTS_DIR, encoded, "tool-results", "output.txt")
_current_project_dir.set(encoded)
try:
assert not is_allowed_local_path(path, sdk_cwd=None)
finally:
_current_project_dir.set("")
def test_is_allowed_local_path_sibling_of_tool_results_is_rejected():
"""A path adjacent to tool-results/ but not inside it is rejected."""
encoded = "test-encoded-dir"
sibling_path = os.path.join(_SDK_PROJECTS_DIR, encoded, "other-dir", "file.txt")
sibling_path = os.path.join(SDK_PROJECTS_DIR, encoded, "other-dir", "file.txt")
_current_project_dir.set(encoded)
try:
@@ -129,6 +143,21 @@ def test_is_allowed_local_path_sibling_of_tool_results_is_rejected():
_current_project_dir.set("")
def test_is_allowed_local_path_valid_uuid_wrong_segment_name_rejected():
"""A valid UUID dir but non-'tool-results' second segment is rejected."""
encoded = "test-encoded-dir"
uuid_str = "12345678-1234-5678-9abc-def012345678"
path = os.path.join(
SDK_PROJECTS_DIR, encoded, uuid_str, "not-tool-results", "output.txt"
)
_current_project_dir.set(encoded)
try:
assert not is_allowed_local_path(path, sdk_cwd=None)
finally:
_current_project_dir.set("")
# ---------------------------------------------------------------------------
# resolve_sandbox_path
# ---------------------------------------------------------------------------

View File

@@ -16,6 +16,7 @@ from backend.copilot.baseline import stream_chat_completion_baseline
from backend.copilot.config import ChatConfig
from backend.copilot.response_model import StreamFinish
from backend.copilot.sdk import service as sdk_service
from backend.copilot.sdk.dummy import stream_chat_completion_dummy
from backend.executor.cluster_lock import ClusterLock
from backend.util.decorator import error_logged
from backend.util.feature_flag import Flag, is_feature_enabled
@@ -246,17 +247,25 @@ class CoPilotProcessor:
# Choose service based on LaunchDarkly flag.
# Claude Code subscription forces SDK mode (CLI subprocess auth).
config = ChatConfig()
use_sdk = config.use_claude_code_subscription or await is_feature_enabled(
Flag.COPILOT_SDK,
entry.user_id or "anonymous",
default=config.use_claude_agent_sdk,
)
stream_fn = (
sdk_service.stream_chat_completion_sdk
if use_sdk
else stream_chat_completion_baseline
)
log.info(f"Using {'SDK' if use_sdk else 'baseline'} service")
if config.test_mode:
stream_fn = stream_chat_completion_dummy
log.warning("Using DUMMY service (CHAT_TEST_MODE=true)")
else:
use_sdk = (
config.use_claude_code_subscription
or await is_feature_enabled(
Flag.COPILOT_SDK,
entry.user_id or "anonymous",
default=config.use_claude_agent_sdk,
)
)
stream_fn = (
sdk_service.stream_chat_completion_sdk
if use_sdk
else stream_chat_completion_baseline
)
log.info(f"Using {'SDK' if use_sdk else 'baseline'} service")
# Stream chat completion and publish chunks to Redis.
async for chunk in stream_fn(

View File

@@ -0,0 +1,173 @@
"""Integration credential lookup with per-process TTL cache.
Provides token retrieval for connected integrations so that copilot tools
(e.g. bash_exec) can inject auth tokens into the execution environment without
hitting the database on every command.
Cache semantics (handled automatically by TTLCache):
- Token found → cached for _TOKEN_CACHE_TTL (5 min). Avoids repeated DB hits
for users who have credentials and are running many bash commands.
- No credentials found → cached for _NULL_CACHE_TTL (60 s). Avoids a DB hit
on every E2B command for users who haven't connected an account yet, while
still picking up a newly-connected account within one minute.
Both caches are bounded to _CACHE_MAX_SIZE entries; cachetools evicts the
least-recently-used entry when the limit is reached.
Multi-worker note: both caches are in-process only. Each worker/replica
maintains its own independent cache, so a credential fetch may be duplicated
across processes. This is acceptable for the current goal (reduce DB hits per
session per-process), but if cache efficiency across replicas becomes important
a shared cache (e.g. Redis) should be used instead.
"""
import logging
from typing import cast
from cachetools import TTLCache
from backend.copilot.providers import SUPPORTED_PROVIDERS
from backend.data.model import APIKeyCredentials, OAuth2Credentials
from backend.integrations.creds_manager import (
IntegrationCredentialsManager,
register_creds_changed_hook,
)
logger = logging.getLogger(__name__)
# Derived from the single SUPPORTED_PROVIDERS registry for backward compat.
PROVIDER_ENV_VARS: dict[str, list[str]] = {
slug: entry["env_vars"] for slug, entry in SUPPORTED_PROVIDERS.items()
}
_TOKEN_CACHE_TTL = 300.0 # seconds — for found tokens
_NULL_CACHE_TTL = 60.0 # seconds — for "not connected" results
_CACHE_MAX_SIZE = 10_000
# (user_id, provider) → token string. TTLCache handles expiry + eviction.
# Thread-safety note: TTLCache is NOT thread-safe, but that is acceptable here
# because all callers (get_provider_token, invalidate_user_provider_cache) run
# exclusively on the asyncio event loop. There are no await points between a
# cache read and its corresponding write within any function, so no concurrent
# coroutine can interleave. If ThreadPoolExecutor workers are ever added to
# this path, a threading.RLock should be wrapped around these caches.
_token_cache: TTLCache[tuple[str, str], str] = TTLCache(
maxsize=_CACHE_MAX_SIZE, ttl=_TOKEN_CACHE_TTL
)
# Separate cache for "no credentials" results with a shorter TTL.
_null_cache: TTLCache[tuple[str, str], bool] = TTLCache(
maxsize=_CACHE_MAX_SIZE, ttl=_NULL_CACHE_TTL
)
def invalidate_user_provider_cache(user_id: str, provider: str) -> None:
"""Remove the cached entry for *user_id*/*provider* from both caches.
Call this after storing new credentials so that the next
``get_provider_token()`` call performs a fresh DB lookup instead of
serving a stale TTL-cached result.
"""
key = (user_id, provider)
_token_cache.pop(key, None)
_null_cache.pop(key, None)
# Register this module's cache-bust function with the credentials manager so
# that any create/update/delete operation immediately evicts stale cache
# entries. This avoids a lazy import inside creds_manager and eliminates the
# circular-import risk.
try:
register_creds_changed_hook(invalidate_user_provider_cache)
except RuntimeError:
# Hook already registered (e.g. module re-import in tests).
pass
# Module-level singleton to avoid re-instantiating IntegrationCredentialsManager
# on every cache-miss call to get_provider_token().
_manager = IntegrationCredentialsManager()
async def get_provider_token(user_id: str, provider: str) -> str | None:
"""Return the user's access token for *provider*, or ``None`` if not connected.
OAuth2 tokens are preferred (refreshed if needed); API keys are the fallback.
Found tokens are cached for _TOKEN_CACHE_TTL (5 min). "Not connected" results
are cached for _NULL_CACHE_TTL (60 s) to avoid a DB hit on every bash_exec
command for users who haven't connected yet, while still picking up a
newly-connected account within one minute.
"""
cache_key = (user_id, provider)
if cache_key in _null_cache:
return None
if cached := _token_cache.get(cache_key):
return cached
manager = _manager
try:
creds_list = await manager.store.get_creds_by_provider(user_id, provider)
except Exception:
logger.warning(
"Failed to fetch %s credentials for user %s",
provider,
user_id,
exc_info=True,
)
return None
# Pass 1: prefer OAuth2 (carry scope info, refreshable via token endpoint).
# Sort so broader-scoped tokens come first: a token with "repo" scope covers
# full git access, while a public-data-only token lacks push/pull permission.
# lock=False — background injection; not worth a distributed lock acquisition.
oauth2_creds = sorted(
[c for c in creds_list if c.type == "oauth2"],
key=lambda c: 0 if "repo" in (cast(OAuth2Credentials, c).scopes or []) else 1,
)
for creds in oauth2_creds:
if creds.type == "oauth2":
try:
fresh = await manager.refresh_if_needed(
user_id, cast(OAuth2Credentials, creds), lock=False
)
token = fresh.access_token.get_secret_value()
except Exception:
logger.warning(
"Failed to refresh %s OAuth token for user %s; "
"discarding stale token to force re-auth",
provider,
user_id,
exc_info=True,
)
# Do NOT fall back to the stale token — it is likely expired
# or revoked. Returning None forces the caller to re-auth,
# preventing the LLM from receiving a non-functional token.
continue
_token_cache[cache_key] = token
return token
# Pass 2: fall back to API key (no expiry, no refresh needed).
for creds in creds_list:
if creds.type == "api_key":
token = cast(APIKeyCredentials, creds).api_key.get_secret_value()
_token_cache[cache_key] = token
return token
# No credentials found — cache to avoid repeated DB hits.
_null_cache[cache_key] = True
return None
async def get_integration_env_vars(user_id: str) -> dict[str, str]:
"""Return env vars for all providers the user has connected.
Iterates :data:`PROVIDER_ENV_VARS`, fetches each token, and builds a flat
``{env_var: token}`` dict ready to pass to a subprocess or E2B sandbox.
Only providers with a stored credential contribute entries.
"""
env: dict[str, str] = {}
for provider, var_names in PROVIDER_ENV_VARS.items():
token = await get_provider_token(user_id, provider)
if token:
for var in var_names:
env[var] = token
return env

View File

@@ -0,0 +1,195 @@
"""Tests for integration_creds — TTL cache and token lookup paths."""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from pydantic import SecretStr
from backend.copilot.integration_creds import (
_NULL_CACHE_TTL,
_TOKEN_CACHE_TTL,
PROVIDER_ENV_VARS,
_null_cache,
_token_cache,
get_integration_env_vars,
get_provider_token,
invalidate_user_provider_cache,
)
from backend.data.model import APIKeyCredentials, OAuth2Credentials
_USER = "user-integration-creds-test"
_PROVIDER = "github"
def _make_api_key_creds(key: str = "test-api-key") -> APIKeyCredentials:
return APIKeyCredentials(
id="creds-api-key",
provider=_PROVIDER,
api_key=SecretStr(key),
title="Test API Key",
expires_at=None,
)
def _make_oauth2_creds(token: str = "test-oauth-token") -> OAuth2Credentials:
return OAuth2Credentials(
id="creds-oauth2",
provider=_PROVIDER,
title="Test OAuth",
access_token=SecretStr(token),
refresh_token=SecretStr("test-refresh"),
access_token_expires_at=None,
refresh_token_expires_at=None,
scopes=[],
)
@pytest.fixture(autouse=True)
def clear_caches():
"""Ensure clean caches before and after every test."""
_token_cache.clear()
_null_cache.clear()
yield
_token_cache.clear()
_null_cache.clear()
class TestInvalidateUserProviderCache:
def test_removes_token_entry(self):
key = (_USER, _PROVIDER)
_token_cache[key] = "tok"
invalidate_user_provider_cache(_USER, _PROVIDER)
assert key not in _token_cache
def test_removes_null_entry(self):
key = (_USER, _PROVIDER)
_null_cache[key] = True
invalidate_user_provider_cache(_USER, _PROVIDER)
assert key not in _null_cache
def test_noop_when_key_not_cached(self):
# Should not raise even when there is no cache entry.
invalidate_user_provider_cache("no-such-user", _PROVIDER)
def test_only_removes_targeted_key(self):
other_key = ("other-user", _PROVIDER)
_token_cache[other_key] = "other-tok"
invalidate_user_provider_cache(_USER, _PROVIDER)
assert other_key in _token_cache
class TestGetProviderToken:
@pytest.mark.asyncio(loop_scope="session")
async def test_returns_cached_token_without_db_hit(self):
_token_cache[(_USER, _PROVIDER)] = "cached-tok"
mock_manager = MagicMock()
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
assert result == "cached-tok"
mock_manager.store.get_creds_by_provider.assert_not_called()
@pytest.mark.asyncio(loop_scope="session")
async def test_returns_none_for_null_cached_provider(self):
_null_cache[(_USER, _PROVIDER)] = True
mock_manager = MagicMock()
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
assert result is None
mock_manager.store.get_creds_by_provider.assert_not_called()
@pytest.mark.asyncio(loop_scope="session")
async def test_api_key_creds_returned_and_cached(self):
api_creds = _make_api_key_creds("my-api-key")
mock_manager = MagicMock()
mock_manager.store.get_creds_by_provider = AsyncMock(return_value=[api_creds])
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
assert result == "my-api-key"
assert _token_cache.get((_USER, _PROVIDER)) == "my-api-key"
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth2_preferred_over_api_key(self):
oauth_creds = _make_oauth2_creds("oauth-tok")
api_creds = _make_api_key_creds("api-tok")
mock_manager = MagicMock()
mock_manager.store.get_creds_by_provider = AsyncMock(
return_value=[api_creds, oauth_creds]
)
mock_manager.refresh_if_needed = AsyncMock(return_value=oauth_creds)
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
assert result == "oauth-tok"
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth2_refresh_failure_returns_none(self):
"""On refresh failure, return None instead of caching a stale token."""
oauth_creds = _make_oauth2_creds("stale-oauth-tok")
mock_manager = MagicMock()
mock_manager.store.get_creds_by_provider = AsyncMock(return_value=[oauth_creds])
mock_manager.refresh_if_needed = AsyncMock(side_effect=RuntimeError("network"))
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
# Stale tokens must NOT be returned — forces re-auth.
assert result is None
@pytest.mark.asyncio(loop_scope="session")
async def test_no_credentials_caches_null_entry(self):
mock_manager = MagicMock()
mock_manager.store.get_creds_by_provider = AsyncMock(return_value=[])
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
assert result is None
assert _null_cache.get((_USER, _PROVIDER)) is True
@pytest.mark.asyncio(loop_scope="session")
async def test_db_exception_returns_none_without_caching(self):
mock_manager = MagicMock()
mock_manager.store.get_creds_by_provider = AsyncMock(
side_effect=RuntimeError("db down")
)
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
assert result is None
# DB errors are not cached — next call will retry
assert (_USER, _PROVIDER) not in _token_cache
assert (_USER, _PROVIDER) not in _null_cache
@pytest.mark.asyncio(loop_scope="session")
async def test_null_cache_has_shorter_ttl_than_token_cache(self):
"""Verify the TTL constants are set correctly for each cache."""
assert _null_cache.ttl == _NULL_CACHE_TTL
assert _token_cache.ttl == _TOKEN_CACHE_TTL
assert _NULL_CACHE_TTL < _TOKEN_CACHE_TTL
class TestGetIntegrationEnvVars:
@pytest.mark.asyncio(loop_scope="session")
async def test_injects_all_env_vars_for_provider(self):
_token_cache[(_USER, "github")] = "gh-tok"
result = await get_integration_env_vars(_USER)
for var in PROVIDER_ENV_VARS["github"]:
assert result[var] == "gh-tok"
@pytest.mark.asyncio(loop_scope="session")
async def test_empty_dict_when_no_credentials(self):
_null_cache[(_USER, "github")] = True
result = await get_integration_env_vars(_USER)
assert result == {}

View File

@@ -6,10 +6,11 @@ handling the distinction between:
- Local mode vs E2B mode (storage/filesystem differences)
"""
from backend.blocks.autopilot import AUTOPILOT_BLOCK_ID
from backend.copilot.tools import TOOL_REGISTRY
# Shared technical notes that apply to both SDK and baseline modes
_SHARED_TOOL_NOTES = """\
_SHARED_TOOL_NOTES = f"""\
### Sharing files with the user
After saving a file to the persistent workspace with `write_workspace_file`,
@@ -81,18 +82,53 @@ that would be corrupted by text encoding.
Example — committing an image file to GitHub:
```json
{
"files": [{
{{
"files": [{{
"path": "docs/hero.png",
"content": "workspace://abc123#image/png",
"operation": "upsert"
}]
}
}}]
}}
```
### Sub-agent tasks
- When using the Task tool, NEVER set `run_in_background` to true.
All tasks must run in the foreground.
### Delegating to another autopilot (sub-autopilot pattern)
Use the **AutoPilotBlock** (`run_block` with block_id
`{AUTOPILOT_BLOCK_ID}`) to delegate a task to a fresh
autopilot instance. The sub-autopilot has its own full tool set and can
perform multi-step work autonomously.
- **Input**: `prompt` (required) — the task description.
Optional: `system_context` to constrain behavior, `session_id` to
continue a previous conversation, `max_recursion_depth` (default 3).
- **Output**: `response` (text), `tool_calls` (list), `session_id`
(for continuation), `conversation_history`, `token_usage`.
Use this when a task is complex enough to benefit from a separate
autopilot context, e.g. "research X and write a report" while the
parent autopilot handles orchestration.
"""
# E2B-only notes — E2B has full internet access so gh CLI works there.
# Not shown in local (bubblewrap) mode: --unshare-net blocks all network.
_E2B_TOOL_NOTES = """
### GitHub CLI (`gh`) and git
- If the user has connected their GitHub account, both `gh` and `git` are
pre-authenticated — use them directly without any manual login step.
`git` HTTPS operations (clone, push, pull) work automatically.
- If the token changes mid-session (e.g. user reconnects with a new token),
run `gh auth setup-git` to re-register the credential helper.
- If `gh` or `git` fails with an authentication error (e.g. "authentication
required", "could not read Username", or exit code 128), call
`connect_integration(provider="github")` to surface the GitHub credentials
setup card so the user can connect their account. Once connected, retry
the operation.
- For operations that need broader access (e.g. private org repos, GitHub
Actions), pass the required scopes: e.g.
`connect_integration(provider="github", scopes=["repo", "read:org"])`.
"""
@@ -105,6 +141,7 @@ def _build_storage_supplement(
storage_system_1_persistence: list[str],
file_move_name_1_to_2: str,
file_move_name_2_to_1: str,
extra_notes: str = "",
) -> str:
"""Build storage/filesystem supplement for a specific environment.
@@ -119,6 +156,7 @@ def _build_storage_supplement(
storage_system_1_persistence: List of persistence behavior descriptions
file_move_name_1_to_2: Direction label for primary→persistent
file_move_name_2_to_1: Direction label for persistent→primary
extra_notes: Environment-specific notes appended after shared notes
"""
# Format lists as bullet points with proper indentation
characteristics = "\n".join(f" - {c}" for c in storage_system_1_characteristics)
@@ -152,12 +190,23 @@ def _build_storage_supplement(
### File persistence
Important files (code, configs, outputs) should be saved to workspace to ensure they persist.
{_SHARED_TOOL_NOTES}"""
### SDK tool-result files
When tool outputs are large, the SDK truncates them and saves the full output to
a local file under `~/.claude/projects/.../tool-results/`. To read these files,
always use `read_file` or `Read` (NOT `read_workspace_file`).
`read_workspace_file` reads from cloud workspace storage, where SDK
tool-results are NOT stored.
{_SHARED_TOOL_NOTES}{extra_notes}"""
# Pre-built supplements for common environments
def _get_local_storage_supplement(cwd: str) -> str:
"""Local ephemeral storage (files lost between turns)."""
"""Local ephemeral storage (files lost between turns).
Network is isolated (bubblewrap --unshare-net), so internet-dependent CLIs
like gh will not work — no integration env-var notes are included.
"""
return _build_storage_supplement(
working_dir=cwd,
sandbox_type="in a network-isolated sandbox",
@@ -175,7 +224,11 @@ def _get_local_storage_supplement(cwd: str) -> str:
def _get_cloud_sandbox_supplement() -> str:
"""Cloud persistent sandbox (files survive across turns in session)."""
"""Cloud persistent sandbox (files survive across turns in session).
E2B has full internet access, so integration tokens (GH_TOKEN etc.) are
injected per command in bash_exec — include the CLI guidance notes.
"""
return _build_storage_supplement(
working_dir="/home/user",
sandbox_type="in a cloud sandbox with full internet access",
@@ -190,6 +243,7 @@ def _get_cloud_sandbox_supplement() -> str:
],
file_move_name_1_to_2="Sandbox → Persistent",
file_move_name_2_to_1="Persistent → Sandbox",
extra_notes=_E2B_TOOL_NOTES,
)

View File

@@ -0,0 +1,63 @@
"""Single source of truth for copilot-supported integration providers.
Both :mod:`~backend.copilot.integration_creds` (env-var injection) and
:mod:`~backend.copilot.tools.connect_integration` (UI setup card) import from
here, eliminating the risk of the two registries drifting out of sync.
"""
from typing import TypedDict
class ProviderEntry(TypedDict):
"""Metadata for a supported integration provider.
Attributes:
name: Human-readable display name (e.g. "GitHub").
env_vars: Environment variable names injected when the provider is
connected (e.g. ``["GH_TOKEN", "GITHUB_TOKEN"]``).
default_scopes: Default OAuth scopes requested when the agent does not
specify any.
"""
name: str
env_vars: list[str]
default_scopes: list[str]
def _is_github_oauth_configured() -> bool:
"""Return True if GitHub OAuth env vars are set.
Uses a lazy import to avoid triggering ``Secrets()`` during module import,
which can fail in environments where secrets are not yet loaded (e.g. tests,
CLI tooling).
"""
from backend.blocks.github._auth import GITHUB_OAUTH_IS_CONFIGURED
return GITHUB_OAUTH_IS_CONFIGURED
# -- Registry ----------------------------------------------------------------
# Add new providers here. Both env-var injection and the setup-card tool read
# from this single registry.
SUPPORTED_PROVIDERS: dict[str, ProviderEntry] = {
"github": {
"name": "GitHub",
"env_vars": ["GH_TOKEN", "GITHUB_TOKEN"],
"default_scopes": ["repo"],
},
}
def get_provider_auth_types(provider: str) -> list[str]:
"""Return the supported credential types for *provider* at runtime.
OAuth types are only offered when the corresponding OAuth client env vars
are configured.
"""
if provider == "github":
if _is_github_oauth_configured():
return ["api_key", "oauth2"]
return ["api_key"]
# Default for unknown/future providers — API key only.
return ["api_key"]

View File

@@ -43,6 +43,7 @@ class ResponseType(str, Enum):
ERROR = "error"
USAGE = "usage"
HEARTBEAT = "heartbeat"
STATUS = "status"
class StreamBaseResponse(BaseModel):
@@ -263,3 +264,19 @@ class StreamHeartbeat(StreamBaseResponse):
def to_sse(self) -> str:
"""Convert to SSE comment format to keep connection alive."""
return ": heartbeat\n\n"
class StreamStatus(StreamBaseResponse):
"""Transient status notification shown to the user during long operations.
Used to provide feedback when the backend performs behind-the-scenes work
(e.g., compacting conversation context on a retry) that would otherwise
leave the user staring at an unexplained pause.
Sent as a proper ``data:`` event so the frontend can display it to the
user. The AI SDK stream parser gracefully skips unknown chunk types
(logs a console warning), so this does not break the stream.
"""
type: ResponseType = ResponseType.STATUS
message: str = Field(..., description="Human-readable status message")

View File

@@ -19,9 +19,19 @@ least invasive way to break the cycle while keeping module-level constants
intact.
"""
from typing import Any
from typing import TYPE_CHECKING, Any
# Static imports for type checkers so they can resolve __all__ entries
# without executing the lazy-import machinery at runtime.
if TYPE_CHECKING:
from .collect import CopilotResult as CopilotResult
from .collect import collect_copilot_response as collect_copilot_response
from .service import stream_chat_completion_sdk as stream_chat_completion_sdk
from .tool_adapter import create_copilot_mcp_server as create_copilot_mcp_server
__all__ = [
"CopilotResult",
"collect_copilot_response",
"stream_chat_completion_sdk",
"create_copilot_mcp_server",
]
@@ -29,6 +39,8 @@ __all__ = [
# Dispatch table for PEP 562 lazy imports. Each entry is a (module, attr)
# pair so new exports can be added without touching __getattr__ itself.
_LAZY_IMPORTS: dict[str, tuple[str, str]] = {
"CopilotResult": (".collect", "CopilotResult"),
"collect_copilot_response": (".collect", "collect_copilot_response"),
"stream_chat_completion_sdk": (".service", "stream_chat_completion_sdk"),
"create_copilot_mcp_server": (".tool_adapter", "create_copilot_mcp_server"),
}

View File

@@ -143,6 +143,71 @@ To use an MCP (Model Context Protocol) tool as a node in the agent:
tool_arguments.
6. Output: `result` (the tool's return value) and `error` (error message)
### Using SmartDecisionMakerBlock (AI Orchestrator with Agent Mode)
To create an agent where AI autonomously decides which tools or sub-agents to
call in a loop until the task is complete:
1. Create a `SmartDecisionMakerBlock` node
(ID: `3b191d9f-356f-482d-8238-ba04b6d18381`)
2. Set `input_default`:
- `agent_mode_max_iterations`: Choose based on task complexity:
- `1` for single-step tool calls (AI picks one tool, calls it, done)
- `3``10` for multi-step tasks (AI calls tools iteratively)
- `-1` for open-ended orchestration (AI loops until it decides it's done).
**Use with caution** — prefer bounded iterations (310) unless
genuinely needed, as unbounded loops risk runaway cost and execution.
Do NOT use `0` (traditional mode) — it requires complex external
conversation-history loop wiring that the agent generator does not
produce.
- `conversation_compaction`: `true` (recommended to avoid context overflow)
- `retry`: Number of retries on tool-call failure (default `3`).
Set to `0` to disable retries.
- `multiple_tool_calls`: Whether the AI can invoke multiple tools in a
single turn (default `false`). Enable when tools are independent and
can run concurrently.
- Optional: `sys_prompt` for extra LLM context about how to orchestrate
3. Wire the `prompt` input from an `AgentInputBlock` (the user's task)
4. Create downstream tool blocks — regular blocks **or** `AgentExecutorBlock`
nodes that call sub-agents
5. Link each tool to the SmartDecisionMaker: set `source_name: "tools"` on
the SmartDecisionMaker side and `sink_name: <input_field>` on each tool
block's input. Create one link per input field the tool needs.
6. Wire the `finished` output to an `AgentOutputBlock` for the final result
7. Credentials (LLM API key) are configured by the user in the platform UI
after saving — do NOT require them upfront
**Example — Orchestrator calling two sub-agents:**
- Node 1: `AgentInputBlock` (input_default: `{"name": "task"}`)
- Node 2: `SmartDecisionMakerBlock` (input_default:
`{"agent_mode_max_iterations": 10, "conversation_compaction": true}`)
- Node 3: `AgentExecutorBlock` (sub-agent A — set `graph_id`, `graph_version`,
`input_schema`, `output_schema` from library agent)
- Node 4: `AgentExecutorBlock` (sub-agent B — same pattern)
- Node 5: `AgentOutputBlock` (input_default: `{"name": "result"}`)
- Links:
- Input→SDM: `source_name: "result"`, `sink_name: "prompt"`
- SDM→Agent A (per input field): `source_name: "tools"`,
`sink_name: "<agent_a_input_field>"`
- SDM→Agent B (per input field): `source_name: "tools"`,
`sink_name: "<agent_b_input_field>"`
- SDM→Output: `source_name: "finished"`, `sink_name: "value"`
**Example — Orchestrator calling regular blocks as tools:**
- Node 1: `AgentInputBlock` (input_default: `{"name": "task"}`)
- Node 2: `SmartDecisionMakerBlock` (input_default:
`{"agent_mode_max_iterations": 5, "conversation_compaction": true}`)
- Node 3: `GetWebpageBlock` (regular block — the AI calls it as a tool)
- Node 4: `AITextGeneratorBlock` (another regular block as a tool)
- Node 5: `AgentOutputBlock` (input_default: `{"name": "result"}`)
- Links:
- Input→SDM: `source_name: "result"`, `sink_name: "prompt"`
- SDM→GetWebpage: `source_name: "tools"`, `sink_name: "url"`
- SDM→AITextGenerator: `source_name: "tools"`, `sink_name: "prompt"`
- SDM→Output: `source_name: "finished"`, `sink_name: "value"`
Regular blocks work exactly like sub-agents as tools — wire each input
field from `source_name: "tools"` on the SmartDecisionMaker side.
### Example: Simple AI Text Processor
A minimal agent with input, processing, and output:

View File

@@ -0,0 +1,108 @@
"""Public helpers for consuming a copilot stream as a simple request-response.
This module exposes :class:`CopilotResult` and :func:`collect_copilot_response`
so that callers (e.g. the AutoPilot block) can consume the copilot stream
without implementing their own event loop.
"""
from __future__ import annotations
from typing import Any
class CopilotResult:
"""Aggregated result from consuming a copilot stream.
Returned by :func:`collect_copilot_response` so callers don't need to
implement their own event-loop over the raw stream events.
"""
__slots__ = (
"response_text",
"tool_calls",
"prompt_tokens",
"completion_tokens",
"total_tokens",
)
def __init__(self) -> None:
self.response_text: str = ""
self.tool_calls: list[dict[str, Any]] = []
self.prompt_tokens: int = 0
self.completion_tokens: int = 0
self.total_tokens: int = 0
async def collect_copilot_response(
*,
session_id: str,
message: str,
user_id: str,
is_user_message: bool = True,
) -> CopilotResult:
"""Consume :func:`stream_chat_completion_sdk` and return aggregated results.
This is the recommended entry-point for callers that need a simple
request-response interface (e.g. the AutoPilot block) rather than
streaming individual events. It avoids duplicating the event-collection
logic and does NOT wrap the stream in ``asyncio.timeout`` — the SDK
manages its own heartbeat-based timeouts internally.
Args:
session_id: Chat session to use.
message: The user message / prompt.
user_id: Authenticated user ID.
is_user_message: Whether this is a user-initiated message.
Returns:
A :class:`CopilotResult` with the aggregated response text,
tool calls, and token usage.
Raises:
RuntimeError: If the stream yields a ``StreamError`` event.
"""
from backend.copilot.response_model import (
StreamError,
StreamTextDelta,
StreamToolInputAvailable,
StreamToolOutputAvailable,
StreamUsage,
)
from .service import stream_chat_completion_sdk
result = CopilotResult()
response_parts: list[str] = []
tool_calls_by_id: dict[str, dict[str, Any]] = {}
async for event in stream_chat_completion_sdk(
session_id=session_id,
message=message,
is_user_message=is_user_message,
user_id=user_id,
):
if isinstance(event, StreamTextDelta):
response_parts.append(event.delta)
elif isinstance(event, StreamToolInputAvailable):
entry: dict[str, Any] = {
"tool_call_id": event.toolCallId,
"tool_name": event.toolName,
"input": event.input,
"output": None,
"success": None,
}
result.tool_calls.append(entry)
tool_calls_by_id[event.toolCallId] = entry
elif isinstance(event, StreamToolOutputAvailable):
if tc := tool_calls_by_id.get(event.toolCallId):
tc["output"] = event.output
tc["success"] = event.success
elif isinstance(event, StreamUsage):
result.prompt_tokens += event.prompt_tokens
result.completion_tokens += event.completion_tokens
result.total_tokens += event.total_tokens
elif isinstance(event, StreamError):
raise RuntimeError(f"Copilot error: {event.errorText}")
result.response_text = "".join(response_parts)
return result

View File

@@ -12,6 +12,7 @@ import asyncio
import logging
import uuid
from dataclasses import dataclass, field
from typing import Any
from ..constants import COMPACTION_DONE_MSG, COMPACTION_TOOL_NAME
from ..model import ChatMessage, ChatSession
@@ -119,14 +120,12 @@ def filter_compaction_messages(
filtered: list[ChatMessage] = []
for msg in messages:
if msg.role == "assistant" and msg.tool_calls:
real_calls: list[dict[str, Any]] = []
for tc in msg.tool_calls:
if tc.get("function", {}).get("name") == COMPACTION_TOOL_NAME:
compaction_ids.add(tc.get("id", ""))
real_calls = [
tc
for tc in msg.tool_calls
if tc.get("function", {}).get("name") != COMPACTION_TOOL_NAME
]
else:
real_calls.append(tc)
if not real_calls and not msg.content:
continue
if msg.role == "tool" and msg.tool_call_id in compaction_ids:
@@ -222,6 +221,7 @@ class CompactionTracker:
def reset_for_query(self) -> None:
"""Reset per-query state before a new SDK query."""
self._compact_start.clear()
self._done = False
self._start_emitted = False
self._tool_call_id = ""

View File

@@ -0,0 +1,54 @@
"""Shared test fixtures for copilot SDK tests."""
from __future__ import annotations
from unittest.mock import patch
from uuid import uuid4
import pytest
from backend.util import json
@pytest.fixture()
def mock_chat_config():
"""Mock ChatConfig so compact_transcript tests skip real config lookup."""
with patch(
"backend.copilot.config.ChatConfig",
return_value=type("Cfg", (), {"model": "m", "api_key": "k", "base_url": "u"})(),
):
yield
def build_test_transcript(pairs: list[tuple[str, str]]) -> str:
"""Build a minimal valid JSONL transcript from (role, content) pairs.
Use this helper in any copilot SDK test that needs a well-formed
transcript without hitting the real storage layer.
"""
lines: list[str] = []
last_uuid: str | None = None
for role, content in pairs:
uid = str(uuid4())
entry_type = "assistant" if role == "assistant" else "user"
msg: dict = {"role": role, "content": content}
if role == "assistant":
msg.update(
{
"model": "",
"id": f"msg_{uid[:8]}",
"type": "message",
"content": [{"type": "text", "text": content}],
"stop_reason": "end_turn",
"stop_sequence": None,
}
)
entry = {
"type": entry_type,
"uuid": uid,
"parentUuid": last_uuid,
"message": msg,
}
lines.append(json.dumps(entry, separators=(",", ":")))
last_uuid = uid
return "\n".join(lines) + "\n"

View File

@@ -1,9 +1,17 @@
"""Dummy SDK service for testing copilot streaming.
Returns mock streaming responses without calling Claude Agent SDK.
Enable via COPILOT_TEST_MODE=true environment variable.
Enable via CHAT_TEST_MODE=true in .env (ChatConfig.test_mode).
WARNING: This is for testing only. Do not use in production.
Magic keywords (case-insensitive, anywhere in message):
__test_transient_error__ — Simulate a transient Anthropic API error
(ECONNRESET). Streams partial text, then
yields StreamError with retryable prefix.
__test_fatal_error__ — Simulate a non-retryable SDK error.
__test_slow_response__ — Simulate a slow response (2s per word).
(no keyword) — Normal dummy response.
"""
import asyncio
@@ -12,12 +20,39 @@ import uuid
from collections.abc import AsyncGenerator
from typing import Any
from ..model import ChatSession
from ..response_model import StreamBaseResponse, StreamStart, StreamTextDelta
from ..constants import (
COPILOT_ERROR_PREFIX,
COPILOT_RETRYABLE_ERROR_PREFIX,
FRIENDLY_TRANSIENT_MSG,
)
from ..model import ChatMessage, ChatSession, get_chat_session, upsert_chat_session
from ..response_model import (
StreamBaseResponse,
StreamError,
StreamFinish,
StreamFinishStep,
StreamStart,
StreamStartStep,
StreamTextDelta,
StreamTextEnd,
StreamTextStart,
)
logger = logging.getLogger(__name__)
async def _safe_upsert(session: ChatSession) -> None:
"""Best-effort session persist — skip silently if DB is unavailable."""
try:
await upsert_chat_session(session)
except Exception:
logger.debug("[TEST MODE] Could not persist session (DB unavailable)")
def _has_keyword(message: str | None, keyword: str) -> bool:
return keyword in (message or "").lower()
async def stream_chat_completion_dummy(
session_id: str,
message: str | None = None,
@@ -36,24 +71,89 @@ async def stream_chat_completion_dummy(
- No timeout occurs
- Text arrives in chunks
- StreamFinish is sent by mark_session_completed
See module docstring for magic keywords that trigger error scenarios.
"""
logger.warning(
f"[TEST MODE] Using dummy copilot streaming for session {session_id}"
)
# Load session from DB (matches SDK service behaviour) so error markers
# and the assistant reply are persisted and survive page refresh.
# Best-effort: skip if DB is unavailable (e.g. unit tests).
if session is None:
try:
session = await get_chat_session(session_id, user_id)
except Exception:
logger.debug("[TEST MODE] Could not load session (DB unavailable)")
session = None
message_id = str(uuid.uuid4())
text_block_id = str(uuid.uuid4())
# Start the stream
# Start the stream (matches baseline: StreamStart → StreamStartStep)
yield StreamStart(messageId=message_id, sessionId=session_id)
yield StreamStartStep()
# Simulate streaming text response with delays
# --- Magic keyword: transient error (retryable) -------------------------
if _has_keyword(message, "__test_transient_error__"):
# Stream some partial text first (simulates mid-stream failure)
yield StreamTextStart(id=text_block_id)
for word in ["Working", "on", "it..."]:
yield StreamTextDelta(id=text_block_id, delta=f"{word} ")
await asyncio.sleep(0.1)
yield StreamTextEnd(id=text_block_id)
yield StreamFinishStep()
# Persist retryable marker so "Try Again" button shows after refresh
if session:
session.messages.append(
ChatMessage(
role="assistant",
content=f"{COPILOT_RETRYABLE_ERROR_PREFIX} {FRIENDLY_TRANSIENT_MSG}",
)
)
await _safe_upsert(session)
yield StreamError(
errorText=FRIENDLY_TRANSIENT_MSG,
code="transient_api_error",
)
return
# --- Magic keyword: fatal error (non-retryable) -------------------------
if _has_keyword(message, "__test_fatal_error__"):
yield StreamFinishStep()
error_msg = "Internal SDK error: model refused to respond"
# Persist non-retryable error marker
if session:
session.messages.append(
ChatMessage(
role="assistant",
content=f"{COPILOT_ERROR_PREFIX} {error_msg}",
)
)
await _safe_upsert(session)
yield StreamError(errorText=error_msg, code="sdk_error")
return
# --- Magic keyword: slow response ---------------------------------------
delay = 2.0 if _has_keyword(message, "__test_slow_response__") else 0.1
# --- Normal dummy response ----------------------------------------------
dummy_response = "I counted: 1... 2... 3. All done!"
words = dummy_response.split()
yield StreamTextStart(id=text_block_id)
for i, word in enumerate(words):
# Add space except for last word
text = word if i == len(words) - 1 else f"{word} "
yield StreamTextDelta(id=text_block_id, delta=text)
# Small delay to simulate real streaming
await asyncio.sleep(0.1)
await asyncio.sleep(delay)
yield StreamTextEnd(id=text_block_id)
# Persist the assistant reply so it survives page refresh
if session:
session.messages.append(ChatMessage(role="assistant", content=dummy_response))
await _safe_upsert(session)
yield StreamFinishStep()
yield StreamFinish()

View File

@@ -26,6 +26,41 @@ from backend.copilot.context import (
logger = logging.getLogger(__name__)
async def _check_sandbox_symlink_escape(
sandbox: Any,
parent: str,
) -> str | None:
"""Resolve the canonical parent path inside the sandbox to detect symlink escapes.
``normpath`` (used by ``resolve_sandbox_path``) only normalises the string;
``readlink -f`` follows actual symlinks on the sandbox filesystem.
Returns the canonical parent path, or ``None`` if the path escapes
``E2B_WORKDIR``.
Note: There is an inherent TOCTOU window between this check and the
subsequent ``sandbox.files.write()``. A symlink could theoretically be
replaced between the two operations. This is acceptable in the E2B
sandbox model since the sandbox is single-user and ephemeral.
"""
canonical_res = await sandbox.commands.run(
f"readlink -f {shlex.quote(parent or E2B_WORKDIR)}",
cwd=E2B_WORKDIR,
timeout=5,
)
canonical_parent = (canonical_res.stdout or "").strip()
if (
canonical_res.exit_code != 0
or not canonical_parent
or (
canonical_parent != E2B_WORKDIR
and not canonical_parent.startswith(E2B_WORKDIR + "/")
)
):
return None
return canonical_parent
def _get_sandbox():
return get_current_sandbox()
@@ -106,6 +141,10 @@ async def _handle_write_file(args: dict[str, Any]) -> dict[str, Any]:
parent = os.path.dirname(remote)
if parent and parent != E2B_WORKDIR:
await sandbox.files.make_dir(parent)
canonical_parent = await _check_sandbox_symlink_escape(sandbox, parent)
if canonical_parent is None:
return _mcp(f"Path must be within {E2B_WORKDIR}: {parent}", error=True)
remote = os.path.join(canonical_parent, os.path.basename(remote))
await sandbox.files.write(remote, content)
except Exception as exc:
return _mcp(f"Failed to write {remote}: {exc}", error=True)
@@ -130,6 +169,12 @@ async def _handle_edit_file(args: dict[str, Any]) -> dict[str, Any]:
return result
sandbox, remote = result
parent = os.path.dirname(remote)
canonical_parent = await _check_sandbox_symlink_escape(sandbox, parent)
if canonical_parent is None:
return _mcp(f"Path must be within {E2B_WORKDIR}: {parent}", error=True)
remote = os.path.join(canonical_parent, os.path.basename(remote))
try:
raw: bytes = await sandbox.files.read(remote, format="bytes")
content = raw.decode("utf-8", errors="replace")

View File

@@ -4,15 +4,19 @@ Pure unit tests with no external dependencies (no E2B, no sandbox).
"""
import os
import shutil
from types import SimpleNamespace
from unittest.mock import AsyncMock
import pytest
from backend.copilot.context import _current_project_dir
from .e2b_file_tools import _read_local, resolve_sandbox_path
_SDK_PROJECTS_DIR = os.path.realpath(os.path.expanduser("~/.claude/projects"))
from backend.copilot.context import E2B_WORKDIR, SDK_PROJECTS_DIR, _current_project_dir
from .e2b_file_tools import (
_check_sandbox_symlink_escape,
_read_local,
resolve_sandbox_path,
)
# ---------------------------------------------------------------------------
# resolve_sandbox_path — sandbox path normalisation & boundary enforcement
@@ -21,46 +25,48 @@ _SDK_PROJECTS_DIR = os.path.realpath(os.path.expanduser("~/.claude/projects"))
class TestResolveSandboxPath:
def test_relative_path_resolved(self):
assert resolve_sandbox_path("src/main.py") == "/home/user/src/main.py"
assert resolve_sandbox_path("src/main.py") == f"{E2B_WORKDIR}/src/main.py"
def test_absolute_within_sandbox(self):
assert resolve_sandbox_path("/home/user/file.txt") == "/home/user/file.txt"
assert (
resolve_sandbox_path(f"{E2B_WORKDIR}/file.txt") == f"{E2B_WORKDIR}/file.txt"
)
def test_workdir_itself(self):
assert resolve_sandbox_path("/home/user") == "/home/user"
assert resolve_sandbox_path(E2B_WORKDIR) == E2B_WORKDIR
def test_relative_dotslash(self):
assert resolve_sandbox_path("./README.md") == "/home/user/README.md"
assert resolve_sandbox_path("./README.md") == f"{E2B_WORKDIR}/README.md"
def test_traversal_blocked(self):
with pytest.raises(ValueError, match="must be within /home/user"):
with pytest.raises(ValueError, match=f"must be within {E2B_WORKDIR}"):
resolve_sandbox_path("../../etc/passwd")
def test_absolute_traversal_blocked(self):
with pytest.raises(ValueError, match="must be within /home/user"):
resolve_sandbox_path("/home/user/../../etc/passwd")
with pytest.raises(ValueError, match=f"must be within {E2B_WORKDIR}"):
resolve_sandbox_path(f"{E2B_WORKDIR}/../../etc/passwd")
def test_absolute_outside_sandbox_blocked(self):
with pytest.raises(ValueError, match="must be within /home/user"):
with pytest.raises(ValueError, match=f"must be within {E2B_WORKDIR}"):
resolve_sandbox_path("/etc/passwd")
def test_root_blocked(self):
with pytest.raises(ValueError, match="must be within /home/user"):
with pytest.raises(ValueError, match=f"must be within {E2B_WORKDIR}"):
resolve_sandbox_path("/")
def test_home_other_user_blocked(self):
with pytest.raises(ValueError, match="must be within /home/user"):
with pytest.raises(ValueError, match=f"must be within {E2B_WORKDIR}"):
resolve_sandbox_path("/home/other/file.txt")
def test_deep_nested_allowed(self):
assert resolve_sandbox_path("a/b/c/d/e.txt") == "/home/user/a/b/c/d/e.txt"
assert resolve_sandbox_path("a/b/c/d/e.txt") == f"{E2B_WORKDIR}/a/b/c/d/e.txt"
def test_trailing_slash_normalised(self):
assert resolve_sandbox_path("src/") == "/home/user/src"
assert resolve_sandbox_path("src/") == f"{E2B_WORKDIR}/src"
def test_double_dots_within_sandbox_ok(self):
"""Path that resolves back within /home/user is allowed."""
assert resolve_sandbox_path("a/b/../c.txt") == "/home/user/a/c.txt"
"""Path that resolves back within E2B_WORKDIR is allowed."""
assert resolve_sandbox_path("a/b/../c.txt") == f"{E2B_WORKDIR}/a/c.txt"
# ---------------------------------------------------------------------------
@@ -73,9 +79,13 @@ class TestResolveSandboxPath:
class TestReadLocal:
_CONV_UUID = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
def _make_tool_results_file(self, encoded: str, filename: str, content: str) -> str:
"""Create a tool-results file and return its path."""
tool_results_dir = os.path.join(_SDK_PROJECTS_DIR, encoded, "tool-results")
"""Create a tool-results file under <encoded>/<uuid>/tool-results/."""
tool_results_dir = os.path.join(
SDK_PROJECTS_DIR, encoded, self._CONV_UUID, "tool-results"
)
os.makedirs(tool_results_dir, exist_ok=True)
filepath = os.path.join(tool_results_dir, filename)
with open(filepath, "w") as f:
@@ -107,7 +117,9 @@ class TestReadLocal:
def test_read_nonexistent_tool_results(self):
"""A tool-results path that doesn't exist returns FileNotFoundError."""
encoded = "-tmp-copilot-e2b-test-nofile"
tool_results_dir = os.path.join(_SDK_PROJECTS_DIR, encoded, "tool-results")
tool_results_dir = os.path.join(
SDK_PROJECTS_DIR, encoded, self._CONV_UUID, "tool-results"
)
os.makedirs(tool_results_dir, exist_ok=True)
filepath = os.path.join(tool_results_dir, "nonexistent.txt")
token = _current_project_dir.set(encoded)
@@ -117,7 +129,7 @@ class TestReadLocal:
assert "not found" in result["content"][0]["text"].lower()
finally:
_current_project_dir.reset(token)
os.rmdir(tool_results_dir)
shutil.rmtree(os.path.join(SDK_PROJECTS_DIR, encoded), ignore_errors=True)
def test_read_traversal_path_blocked(self):
"""A traversal attempt that escapes allowed directories is blocked."""
@@ -152,3 +164,66 @@ class TestReadLocal:
"""Without _current_project_dir set, all paths are blocked."""
result = _read_local("/tmp/anything.txt", offset=0, limit=10)
assert result["isError"] is True
# ---------------------------------------------------------------------------
# _check_sandbox_symlink_escape — symlink escape detection
# ---------------------------------------------------------------------------
def _make_sandbox(stdout: str, exit_code: int = 0) -> SimpleNamespace:
"""Build a minimal sandbox mock whose commands.run returns a fixed result."""
run_result = SimpleNamespace(stdout=stdout, exit_code=exit_code)
commands = SimpleNamespace(run=AsyncMock(return_value=run_result))
return SimpleNamespace(commands=commands)
class TestCheckSandboxSymlinkEscape:
@pytest.mark.asyncio
async def test_canonical_path_within_workdir_returns_path(self):
"""When readlink -f resolves to a path inside E2B_WORKDIR, returns it."""
sandbox = _make_sandbox(stdout=f"{E2B_WORKDIR}/src\n", exit_code=0)
result = await _check_sandbox_symlink_escape(sandbox, f"{E2B_WORKDIR}/src")
assert result == f"{E2B_WORKDIR}/src"
@pytest.mark.asyncio
async def test_workdir_itself_returns_workdir(self):
"""When readlink -f resolves to E2B_WORKDIR exactly, returns E2B_WORKDIR."""
sandbox = _make_sandbox(stdout=f"{E2B_WORKDIR}\n", exit_code=0)
result = await _check_sandbox_symlink_escape(sandbox, E2B_WORKDIR)
assert result == E2B_WORKDIR
@pytest.mark.asyncio
async def test_symlink_escape_returns_none(self):
"""When readlink -f resolves outside E2B_WORKDIR (symlink escape), returns None."""
sandbox = _make_sandbox(stdout="/etc\n", exit_code=0)
result = await _check_sandbox_symlink_escape(sandbox, f"{E2B_WORKDIR}/evil")
assert result is None
@pytest.mark.asyncio
async def test_nonzero_exit_code_returns_none(self):
"""A non-zero exit code from readlink -f returns None."""
sandbox = _make_sandbox(stdout="", exit_code=1)
result = await _check_sandbox_symlink_escape(sandbox, f"{E2B_WORKDIR}/src")
assert result is None
@pytest.mark.asyncio
async def test_empty_stdout_returns_none(self):
"""Empty stdout from readlink (e.g. path doesn't exist yet) returns None."""
sandbox = _make_sandbox(stdout="", exit_code=0)
result = await _check_sandbox_symlink_escape(sandbox, f"{E2B_WORKDIR}/src")
assert result is None
@pytest.mark.asyncio
async def test_prefix_collision_returns_none(self):
"""A path prefixed with E2B_WORKDIR but not within it is rejected."""
sandbox = _make_sandbox(stdout=f"{E2B_WORKDIR}-evil\n", exit_code=0)
result = await _check_sandbox_symlink_escape(sandbox, f"{E2B_WORKDIR}-evil")
assert result is None
@pytest.mark.asyncio
async def test_deeply_nested_path_within_workdir(self):
"""Deep nested paths inside E2B_WORKDIR are allowed."""
sandbox = _make_sandbox(stdout=f"{E2B_WORKDIR}/a/b/c/d\n", exit_code=0)
result = await _check_sandbox_symlink_escape(sandbox, f"{E2B_WORKDIR}/a/b/c/d")
assert result == f"{E2B_WORKDIR}/a/b/c/d"

View File

@@ -0,0 +1,651 @@
"""Tests for retry logic and transcript compaction helpers."""
from __future__ import annotations
import asyncio
from unittest.mock import AsyncMock, patch
from uuid import uuid4
import pytest
from backend.util import json
from backend.util.prompt import CompressResult
from .conftest import build_test_transcript as _build_transcript
from .service import _friendly_error_text, _is_prompt_too_long
from .transcript import (
_flatten_assistant_content,
_flatten_tool_result_content,
_messages_to_transcript,
_run_compression,
_transcript_to_messages,
compact_transcript,
validate_transcript,
)
# ---------------------------------------------------------------------------
# _flatten_assistant_content
# ---------------------------------------------------------------------------
class TestFlattenAssistantContent:
def test_text_blocks(self):
blocks = [
{"type": "text", "text": "Hello"},
{"type": "text", "text": "World"},
]
assert _flatten_assistant_content(blocks) == "Hello\nWorld"
def test_tool_use_blocks(self):
blocks = [{"type": "tool_use", "name": "read_file", "input": {}}]
assert _flatten_assistant_content(blocks) == "[tool_use: read_file]"
def test_mixed_blocks(self):
blocks = [
{"type": "text", "text": "Let me read that."},
{"type": "tool_use", "name": "Read", "input": {"path": "/foo"}},
]
result = _flatten_assistant_content(blocks)
assert "Let me read that." in result
assert "[tool_use: Read]" in result
def test_raw_strings(self):
assert _flatten_assistant_content(["hello", "world"]) == "hello\nworld"
def test_unknown_block_type_preserved_as_placeholder(self):
blocks = [
{"type": "text", "text": "See this image:"},
{"type": "image", "source": {"type": "base64", "data": "..."}},
]
result = _flatten_assistant_content(blocks)
assert "See this image:" in result
assert "[__image__]" in result
def test_empty(self):
assert _flatten_assistant_content([]) == ""
# ---------------------------------------------------------------------------
# _flatten_tool_result_content
# ---------------------------------------------------------------------------
class TestFlattenToolResultContent:
def test_tool_result_with_text(self):
blocks = [
{
"type": "tool_result",
"tool_use_id": "123",
"content": [{"type": "text", "text": "file contents here"}],
}
]
assert _flatten_tool_result_content(blocks) == "file contents here"
def test_tool_result_with_string_content(self):
blocks = [{"type": "tool_result", "tool_use_id": "123", "content": "ok"}]
assert _flatten_tool_result_content(blocks) == "ok"
def test_text_block(self):
blocks = [{"type": "text", "text": "plain text"}]
assert _flatten_tool_result_content(blocks) == "plain text"
def test_raw_string(self):
assert _flatten_tool_result_content(["raw"]) == "raw"
def test_tool_result_with_none_content(self):
"""tool_result with content=None should produce empty string."""
blocks = [{"type": "tool_result", "tool_use_id": "x", "content": None}]
assert _flatten_tool_result_content(blocks) == ""
def test_tool_result_with_empty_list_content(self):
"""tool_result with content=[] should produce empty string."""
blocks = [{"type": "tool_result", "tool_use_id": "x", "content": []}]
assert _flatten_tool_result_content(blocks) == ""
def test_empty(self):
assert _flatten_tool_result_content([]) == ""
def test_nested_dict_without_text(self):
"""Dict blocks without text key use json.dumps fallback."""
blocks = [
{
"type": "tool_result",
"tool_use_id": "x",
"content": [{"type": "image", "source": "data:..."}],
}
]
result = _flatten_tool_result_content(blocks)
assert "image" in result # json.dumps fallback
def test_unknown_block_type_preserved_as_placeholder(self):
blocks = [{"type": "image", "source": {"type": "base64", "data": "..."}}]
result = _flatten_tool_result_content(blocks)
assert "[__image__]" in result
# ---------------------------------------------------------------------------
# _transcript_to_messages
# ---------------------------------------------------------------------------
def _make_entry(entry_type: str, role: str, content: str | list, **kwargs) -> str:
"""Build a JSONL line for testing."""
uid = str(uuid4())
msg: dict = {"role": role, "content": content}
msg.update(kwargs)
entry = {
"type": entry_type,
"uuid": uid,
"parentUuid": None,
"message": msg,
}
return json.dumps(entry, separators=(",", ":"))
class TestTranscriptToMessages:
def test_basic_roundtrip(self):
lines = [
_make_entry("user", "user", "Hello"),
_make_entry("assistant", "assistant", [{"type": "text", "text": "Hi"}]),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 2
assert messages[0] == {"role": "user", "content": "Hello"}
assert messages[1] == {"role": "assistant", "content": "Hi"}
def test_skips_strippable_types(self):
"""Progress and metadata entries are excluded."""
lines = [
_make_entry("user", "user", "Hello"),
json.dumps(
{
"type": "progress",
"uuid": str(uuid4()),
"parentUuid": None,
"message": {"role": "assistant", "content": "..."},
}
),
_make_entry("assistant", "assistant", [{"type": "text", "text": "Hi"}]),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 2
def test_empty_content(self):
assert _transcript_to_messages("") == []
def test_tool_result_content(self):
"""User entries with tool_result content blocks are flattened."""
lines = [
_make_entry(
"user",
"user",
[
{
"type": "tool_result",
"tool_use_id": "123",
"content": "tool output",
}
],
),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 1
assert messages[0]["content"] == "tool output"
def test_malformed_json_lines_skipped(self):
"""Malformed JSON lines in transcript are silently skipped."""
lines = [
_make_entry("user", "user", "Hello"),
"this is not valid json",
_make_entry("assistant", "assistant", [{"type": "text", "text": "Hi"}]),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 2
def test_empty_lines_skipped(self):
"""Empty lines and whitespace-only lines are skipped."""
lines = [
_make_entry("user", "user", "Hello"),
"",
" ",
_make_entry("assistant", "assistant", [{"type": "text", "text": "Hi"}]),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 2
def test_unicode_content_preserved(self):
"""Unicode characters survive transcript roundtrip."""
lines = [
_make_entry("user", "user", "Hello 你好 🌍"),
_make_entry(
"assistant",
"assistant",
[{"type": "text", "text": "Bonjour 日本語 émojis 🎉"}],
),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert messages[0]["content"] == "Hello 你好 🌍"
assert messages[1]["content"] == "Bonjour 日本語 émojis 🎉"
def test_entry_without_role_skipped(self):
"""Entries with missing role in message are skipped."""
entry_no_role = json.dumps(
{
"type": "user",
"uuid": str(uuid4()),
"parentUuid": None,
"message": {"content": "no role here"},
}
)
lines = [
entry_no_role,
_make_entry("user", "user", "Hello"),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 1
assert messages[0]["content"] == "Hello"
def test_tool_use_and_result_pairs(self):
"""Tool use + tool result pairs are properly flattened."""
lines = [
_make_entry(
"assistant",
"assistant",
[
{"type": "text", "text": "Let me check."},
{"type": "tool_use", "name": "read_file", "input": {"path": "/x"}},
],
),
_make_entry(
"user",
"user",
[
{
"type": "tool_result",
"tool_use_id": "abc",
"content": [{"type": "text", "text": "file contents"}],
}
],
),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 2
assert "Let me check." in messages[0]["content"]
assert "[tool_use: read_file]" in messages[0]["content"]
assert messages[1]["content"] == "file contents"
# ---------------------------------------------------------------------------
# _messages_to_transcript
# ---------------------------------------------------------------------------
class TestMessagesToTranscript:
def test_produces_valid_jsonl(self):
messages = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there"},
]
result = _messages_to_transcript(messages)
lines = result.strip().split("\n")
assert len(lines) == 2
for line in lines:
parsed = json.loads(line)
assert "type" in parsed
assert "uuid" in parsed
assert "message" in parsed
def test_assistant_has_proper_structure(self):
messages = [{"role": "assistant", "content": "Hello"}]
result = _messages_to_transcript(messages)
entry = json.loads(result.strip())
assert entry["type"] == "assistant"
msg = entry["message"]
assert msg["role"] == "assistant"
assert msg["type"] == "message"
assert msg["stop_reason"] == "end_turn"
assert isinstance(msg["content"], list)
assert msg["content"][0]["type"] == "text"
def test_user_has_plain_content(self):
messages = [{"role": "user", "content": "Hi"}]
result = _messages_to_transcript(messages)
entry = json.loads(result.strip())
assert entry["type"] == "user"
assert entry["message"]["content"] == "Hi"
def test_parent_uuid_chain(self):
messages = [
{"role": "user", "content": "A"},
{"role": "assistant", "content": "B"},
{"role": "user", "content": "C"},
]
result = _messages_to_transcript(messages)
lines = result.strip().split("\n")
entries = [json.loads(line) for line in lines]
assert entries[0]["parentUuid"] == ""
assert entries[1]["parentUuid"] == entries[0]["uuid"]
assert entries[2]["parentUuid"] == entries[1]["uuid"]
def test_empty_messages(self):
assert _messages_to_transcript([]) == ""
def test_output_is_valid_transcript(self):
"""Output should pass validate_transcript if it has assistant entries."""
messages = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi"},
]
result = _messages_to_transcript(messages)
assert validate_transcript(result)
def test_roundtrip_to_messages(self):
"""Messages → transcript → messages preserves structure."""
original = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there"},
{"role": "user", "content": "How are you?"},
]
transcript = _messages_to_transcript(original)
restored = _transcript_to_messages(transcript)
assert len(restored) == len(original)
for orig, rest in zip(original, restored):
assert orig["role"] == rest["role"]
assert orig["content"] == rest["content"]
# ---------------------------------------------------------------------------
# compact_transcript
# ---------------------------------------------------------------------------
class TestCompactTranscript:
@pytest.mark.asyncio
async def test_too_few_messages_returns_none(self, mock_chat_config):
"""compact_transcript returns None when transcript has < 2 messages."""
transcript = _build_transcript([("user", "Hello")])
result = await compact_transcript(transcript, model="test-model")
assert result is None
@pytest.mark.asyncio
async def test_returns_none_when_not_compacted(self, mock_chat_config):
"""When compress_context says no compaction needed, returns None.
The compressor couldn't reduce it, so retrying with the same
content would fail identically."""
transcript = _build_transcript(
[
("user", "Hello"),
("assistant", "Hi there"),
]
)
mock_result = type(
"CompressResult",
(),
{
"was_compacted": False,
"messages": [],
"original_token_count": 100,
"token_count": 100,
"messages_summarized": 0,
"messages_dropped": 0,
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):
result = await compact_transcript(transcript, model="test-model")
assert result is None
@pytest.mark.asyncio
async def test_returns_compacted_transcript(self, mock_chat_config):
"""When compaction succeeds, returns a valid compacted transcript."""
transcript = _build_transcript(
[
("user", "Hello"),
("assistant", "Hi"),
("user", "More"),
("assistant", "Details"),
]
)
compacted_msgs = [
{"role": "user", "content": "[summary]"},
{"role": "assistant", "content": "Summarized response"},
]
mock_result = type(
"CompressResult",
(),
{
"was_compacted": True,
"messages": compacted_msgs,
"original_token_count": 500,
"token_count": 100,
"messages_summarized": 2,
"messages_dropped": 0,
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):
result = await compact_transcript(transcript, model="test-model")
assert result is not None
assert validate_transcript(result)
msgs = _transcript_to_messages(result)
assert len(msgs) == 2
assert msgs[1]["content"] == "Summarized response"
@pytest.mark.asyncio
async def test_returns_none_on_compression_failure(self, mock_chat_config):
"""When _run_compression raises, returns None."""
transcript = _build_transcript(
[
("user", "Hello"),
("assistant", "Hi"),
]
)
with patch(
"backend.copilot.sdk.transcript._run_compression",
new_callable=AsyncMock,
side_effect=RuntimeError("LLM unavailable"),
):
result = await compact_transcript(transcript, model="test-model")
assert result is None
# ---------------------------------------------------------------------------
# _is_prompt_too_long
# ---------------------------------------------------------------------------
class TestIsPromptTooLong:
"""Unit tests for _is_prompt_too_long pattern matching."""
def test_prompt_is_too_long(self):
err = RuntimeError("prompt is too long for model context")
assert _is_prompt_too_long(err) is True
def test_request_too_large(self):
err = Exception("request too large: 250000 tokens")
assert _is_prompt_too_long(err) is True
def test_maximum_context_length(self):
err = ValueError("maximum context length exceeded")
assert _is_prompt_too_long(err) is True
def test_context_length_exceeded(self):
err = Exception("context_length_exceeded")
assert _is_prompt_too_long(err) is True
def test_input_tokens_exceed(self):
err = Exception("input tokens exceed the max_tokens limit")
assert _is_prompt_too_long(err) is True
def test_input_is_too_long(self):
err = Exception("input is too long for the model")
assert _is_prompt_too_long(err) is True
def test_content_length_exceeds(self):
err = Exception("content length exceeds maximum")
assert _is_prompt_too_long(err) is True
def test_unrelated_error_returns_false(self):
err = RuntimeError("network timeout")
assert _is_prompt_too_long(err) is False
def test_auth_error_returns_false(self):
err = Exception("authentication failed: invalid API key")
assert _is_prompt_too_long(err) is False
def test_chained_exception_detected(self):
"""Prompt-too-long error wrapped in another exception is detected."""
inner = RuntimeError("prompt is too long")
outer = Exception("SDK error")
outer.__cause__ = inner
assert _is_prompt_too_long(outer) is True
def test_case_insensitive(self):
err = Exception("PROMPT IS TOO LONG")
assert _is_prompt_too_long(err) is True
def test_old_max_tokens_exceeded_not_matched(self):
"""The old broad 'max_tokens_exceeded' pattern was removed.
Only 'input tokens exceed' should match now."""
err = Exception("max_tokens_exceeded")
assert _is_prompt_too_long(err) is False
# ---------------------------------------------------------------------------
# _run_compression timeout fallback
# ---------------------------------------------------------------------------
class TestRunCompressionTimeout:
"""Verify _run_compression falls back to truncation when LLM times out."""
@pytest.mark.asyncio
async def test_timeout_falls_back_to_truncation(self):
"""When compress_context with LLM client times out,
_run_compression falls back to truncation (client=None)."""
messages = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there"},
]
truncation_result = CompressResult(
messages=messages,
was_compacted=False,
original_token_count=50,
token_count=50,
messages_summarized=0,
messages_dropped=0,
)
call_args: list[dict] = []
async def _mock_compress(**kwargs):
call_args.append(kwargs)
if kwargs.get("client") is not None:
# Simulate timeout by raising asyncio.TimeoutError
raise asyncio.TimeoutError("LLM compaction timed out")
return truncation_result
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
return_value="fake-client",
),
patch(
"backend.copilot.sdk.transcript.compress_context",
side_effect=_mock_compress,
),
):
result = await _run_compression(messages, "test-model", "[test]")
assert result == truncation_result
# Should have been called twice: once with client, once without
assert len(call_args) == 2
assert call_args[0]["client"] is not None # LLM attempt
assert call_args[1]["client"] is None # truncation fallback
@pytest.mark.asyncio
async def test_no_client_uses_truncation_directly(self):
"""When no OpenAI client is configured, goes straight to truncation."""
messages = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there"},
]
truncation_result = CompressResult(
messages=messages,
was_compacted=False,
original_token_count=50,
token_count=50,
messages_summarized=0,
messages_dropped=0,
)
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
return_value=None,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
new_callable=AsyncMock,
return_value=truncation_result,
) as mock_compress,
):
result = await _run_compression(messages, "test-model", "[test]")
assert result == truncation_result
mock_compress.assert_called_once()
# When no client, compress_context is called with client=None
assert mock_compress.call_args.kwargs.get("client") is None
# ---------------------------------------------------------------------------
# _friendly_error_text
# ---------------------------------------------------------------------------
class TestFriendlyErrorText:
"""Verify user-friendly error message mapping."""
def test_authentication_error(self):
result = _friendly_error_text("authentication failed: invalid API key")
assert "Authentication" in result
assert "API key" in result
def test_rate_limit_error(self):
result = _friendly_error_text("rate limit exceeded")
assert "Rate limit" in result
def test_overloaded_error(self):
result = _friendly_error_text("API is overloaded")
assert "overloaded" in result
def test_timeout_error(self):
result = _friendly_error_text("Request timeout after 30s")
assert "timed out" in result
def test_connection_error(self):
result = _friendly_error_text("Connection refused")
assert "Connection" in result or "connection" in result
def test_unknown_error_passthrough(self):
result = _friendly_error_text("some unknown error XYZ")
assert "SDK stream error:" in result
assert "XYZ" in result
def test_unauthorized_error(self):
result = _friendly_error_text("401 Unauthorized")
assert "Authentication" in result

View File

@@ -20,6 +20,7 @@ from claude_agent_sdk import (
UserMessage,
)
from backend.copilot.constants import FRIENDLY_TRANSIENT_MSG, is_transient_api_error
from backend.copilot.response_model import (
StreamBaseResponse,
StreamError,
@@ -214,10 +215,12 @@ class SDKResponseAdapter:
if sdk_message.subtype == "success":
responses.append(StreamFinish())
elif sdk_message.subtype in ("error", "error_during_execution"):
error_msg = sdk_message.result or "Unknown error"
responses.append(
StreamError(errorText=str(error_msg), code="sdk_error")
)
raw_error = str(sdk_message.result or "Unknown error")
if is_transient_api_error(raw_error):
error_text, code = FRIENDLY_TRANSIENT_MSG, "transient_api_error"
else:
error_text, code = raw_error, "sdk_error"
responses.append(StreamError(errorText=error_text, code=code))
responses.append(StreamFinish())
else:
logger.warning(

File diff suppressed because it is too large Load Diff

View File

@@ -42,7 +42,7 @@ def _validate_workspace_path(
Delegates to :func:`is_allowed_local_path` which permits:
- The SDK working directory (``/tmp/copilot-<session>/``)
- The current session's tool-results directory
(``~/.claude/projects/<encoded-cwd>/tool-results/``)
(``~/.claude/projects/<encoded-cwd>/<uuid>/tool-results/``)
"""
path = tool_input.get("file_path") or tool_input.get("path") or ""
if not path:
@@ -302,7 +302,11 @@ def create_security_hooks(
"""
_ = context, tool_use_id
trigger = input_data.get("trigger", "auto")
# Sanitize untrusted input before logging to prevent log injection
# Sanitize untrusted input: strip control chars for logging AND
# for the value passed downstream. read_compacted_entries()
# validates against _projects_base() as defence-in-depth, but
# sanitizing here prevents log injection and rejects obviously
# malformed paths early.
transcript_path = (
str(input_data.get("transcript_path", ""))
.replace("\n", "")

View File

@@ -122,7 +122,7 @@ def test_read_no_cwd_denies_absolute():
def test_read_tool_results_allowed():
home = os.path.expanduser("~")
path = f"{home}/.claude/projects/-tmp-copilot-abc123/tool-results/12345.txt"
path = f"{home}/.claude/projects/-tmp-copilot-abc123/a1b2c3d4-e5f6-7890-abcd-ef1234567890/tool-results/12345.txt"
# is_allowed_local_path requires the session's encoded cwd to be set
token = _current_project_dir.set("-tmp-copilot-abc123")
try:

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,283 @@
"""Unit tests for extracted service helpers.
Covers ``_is_prompt_too_long``, ``_reduce_context``, ``_iter_sdk_messages``,
and the ``ReducedContext`` named tuple.
"""
from __future__ import annotations
import asyncio
from collections.abc import AsyncGenerator
from unittest.mock import AsyncMock, patch
import pytest
from .conftest import build_test_transcript as _build_transcript
from .service import (
ReducedContext,
_is_prompt_too_long,
_iter_sdk_messages,
_reduce_context,
)
# ---------------------------------------------------------------------------
# _is_prompt_too_long
# ---------------------------------------------------------------------------
class TestIsPromptTooLong:
def test_direct_match(self) -> None:
assert _is_prompt_too_long(Exception("prompt is too long")) is True
def test_case_insensitive(self) -> None:
assert _is_prompt_too_long(Exception("PROMPT IS TOO LONG")) is True
def test_no_match(self) -> None:
assert _is_prompt_too_long(Exception("network timeout")) is False
def test_request_too_large(self) -> None:
assert _is_prompt_too_long(Exception("request too large for model")) is True
def test_context_length_exceeded(self) -> None:
assert _is_prompt_too_long(Exception("context_length_exceeded")) is True
def test_max_tokens_exceeded_not_matched(self) -> None:
"""'max_tokens_exceeded' is intentionally excluded (too broad)."""
assert _is_prompt_too_long(Exception("max_tokens_exceeded")) is False
def test_max_tokens_config_error_no_match(self) -> None:
"""'max_tokens must be at least 1' should NOT match."""
assert _is_prompt_too_long(Exception("max_tokens must be at least 1")) is False
def test_chained_cause(self) -> None:
inner = Exception("prompt is too long")
outer = RuntimeError("SDK error")
outer.__cause__ = inner
assert _is_prompt_too_long(outer) is True
def test_chained_context(self) -> None:
inner = Exception("request too large")
outer = RuntimeError("wrapped")
outer.__context__ = inner
assert _is_prompt_too_long(outer) is True
def test_deep_chain(self) -> None:
bottom = Exception("maximum context length")
middle = RuntimeError("middle")
middle.__cause__ = bottom
top = ValueError("top")
top.__cause__ = middle
assert _is_prompt_too_long(top) is True
def test_chain_no_match(self) -> None:
inner = Exception("rate limit exceeded")
outer = RuntimeError("wrapped")
outer.__cause__ = inner
assert _is_prompt_too_long(outer) is False
def test_cycle_detection(self) -> None:
"""Exception chain with a cycle should not infinite-loop."""
a = Exception("error a")
b = Exception("error b")
a.__cause__ = b
b.__cause__ = a # cycle
assert _is_prompt_too_long(a) is False
def test_all_patterns(self) -> None:
patterns = [
"prompt is too long",
"request too large",
"maximum context length",
"context_length_exceeded",
"input tokens exceed",
"input is too long",
"content length exceeds",
]
for pattern in patterns:
assert _is_prompt_too_long(Exception(pattern)) is True, pattern
# ---------------------------------------------------------------------------
# _reduce_context
# ---------------------------------------------------------------------------
class TestReduceContext:
@pytest.mark.asyncio
async def test_first_retry_compaction_success(self) -> None:
transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
compacted = _build_transcript([("user", "hi"), ("assistant", "[summary]")])
with (
patch(
"backend.copilot.sdk.service.compact_transcript",
new_callable=AsyncMock,
return_value=compacted,
),
patch(
"backend.copilot.sdk.service.validate_transcript",
return_value=True,
),
patch(
"backend.copilot.sdk.service.write_transcript_to_tempfile",
return_value="/tmp/resume.jsonl",
),
):
ctx = await _reduce_context(
transcript, False, "sess-123", "/tmp/cwd", "[test]"
)
assert isinstance(ctx, ReducedContext)
assert ctx.use_resume is True
assert ctx.resume_file == "/tmp/resume.jsonl"
assert ctx.transcript_lost is False
assert ctx.tried_compaction is True
@pytest.mark.asyncio
async def test_compaction_fails_drops_transcript(self) -> None:
transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
with patch(
"backend.copilot.sdk.service.compact_transcript",
new_callable=AsyncMock,
return_value=None,
):
ctx = await _reduce_context(
transcript, False, "sess-123", "/tmp/cwd", "[test]"
)
assert ctx.use_resume is False
assert ctx.resume_file is None
assert ctx.transcript_lost is True
assert ctx.tried_compaction is True
@pytest.mark.asyncio
async def test_already_tried_compaction_skips(self) -> None:
transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
ctx = await _reduce_context(transcript, True, "sess-123", "/tmp/cwd", "[test]")
assert ctx.use_resume is False
assert ctx.transcript_lost is True
assert ctx.tried_compaction is True
@pytest.mark.asyncio
async def test_empty_transcript_drops(self) -> None:
ctx = await _reduce_context("", False, "sess-123", "/tmp/cwd", "[test]")
assert ctx.use_resume is False
assert ctx.transcript_lost is True
@pytest.mark.asyncio
async def test_compaction_returns_same_content_drops(self) -> None:
transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
with patch(
"backend.copilot.sdk.service.compact_transcript",
new_callable=AsyncMock,
return_value=transcript, # same content
):
ctx = await _reduce_context(
transcript, False, "sess-123", "/tmp/cwd", "[test]"
)
assert ctx.transcript_lost is True
@pytest.mark.asyncio
async def test_write_tempfile_fails_drops(self) -> None:
transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
compacted = _build_transcript([("user", "hi"), ("assistant", "[summary]")])
with (
patch(
"backend.copilot.sdk.service.compact_transcript",
new_callable=AsyncMock,
return_value=compacted,
),
patch(
"backend.copilot.sdk.service.validate_transcript",
return_value=True,
),
patch(
"backend.copilot.sdk.service.write_transcript_to_tempfile",
return_value=None,
),
):
ctx = await _reduce_context(
transcript, False, "sess-123", "/tmp/cwd", "[test]"
)
assert ctx.transcript_lost is True
# ---------------------------------------------------------------------------
# _iter_sdk_messages
# ---------------------------------------------------------------------------
class TestIterSdkMessages:
@pytest.mark.asyncio
async def test_yields_messages(self) -> None:
messages = ["msg1", "msg2", "msg3"]
client = AsyncMock()
async def _fake_receive() -> AsyncGenerator[str]:
for m in messages:
yield m
client.receive_response = _fake_receive
result = [msg async for msg in _iter_sdk_messages(client)]
assert result == messages
@pytest.mark.asyncio
async def test_heartbeat_on_timeout(self) -> None:
"""Yields None when asyncio.wait times out."""
client = AsyncMock()
received: list = []
async def _slow_receive() -> AsyncGenerator[str]:
await asyncio.sleep(100) # never completes
yield "never" # pragma: no cover — unreachable, yield makes this an async generator
client.receive_response = _slow_receive
with patch("backend.copilot.sdk.service._HEARTBEAT_INTERVAL", 0.01):
count = 0
async for msg in _iter_sdk_messages(client):
received.append(msg)
count += 1
if count >= 3:
break
assert all(m is None for m in received)
@pytest.mark.asyncio
async def test_exception_propagates(self) -> None:
client = AsyncMock()
async def _error_receive() -> AsyncGenerator[str]:
raise RuntimeError("SDK crash")
yield # pragma: no cover — unreachable, yield makes this an async generator
client.receive_response = _error_receive
with pytest.raises(RuntimeError, match="SDK crash"):
async for _ in _iter_sdk_messages(client):
pass
@pytest.mark.asyncio
async def test_task_cleanup_on_break(self) -> None:
"""Pending task is cancelled when generator is closed."""
client = AsyncMock()
async def _slow_receive() -> AsyncGenerator[str]:
yield "first"
await asyncio.sleep(100)
yield "second"
client.receive_response = _slow_receive
gen = _iter_sdk_messages(client)
first = await gen.__anext__()
assert first == "first"
await gen.aclose() # should cancel pending task cleanly

View File

@@ -8,7 +8,7 @@ from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from .service import _prepare_file_attachments
from .service import _prepare_file_attachments, _resolve_sdk_model
@dataclass
@@ -288,3 +288,214 @@ class TestPromptSupplement:
# Count how many times this tool appears as a bullet point
count = docs.count(f"- **`{tool_name}`**")
assert count == 1, f"Tool '{tool_name}' appears {count} times (should be 1)"
# ---------------------------------------------------------------------------
# _cleanup_sdk_tool_results — orchestration + rate-limiting
# ---------------------------------------------------------------------------
class TestCleanupSdkToolResults:
"""Tests for _cleanup_sdk_tool_results orchestration and sweep rate-limiting."""
# All valid cwds must start with /tmp/copilot- (the _SDK_CWD_PREFIX).
_CWD_PREFIX = "/tmp/copilot-"
@pytest.mark.asyncio
async def test_removes_cwd_directory(self):
"""Cleanup removes the session working directory."""
from .service import _cleanup_sdk_tool_results
cwd = "/tmp/copilot-test-cleanup-remove"
os.makedirs(cwd, exist_ok=True)
with patch("backend.copilot.sdk.service.cleanup_stale_project_dirs"):
import backend.copilot.sdk.service as svc_mod
svc_mod._last_sweep_time = 0.0
await _cleanup_sdk_tool_results(cwd)
assert not os.path.exists(cwd)
@pytest.mark.asyncio
async def test_sweep_runs_when_interval_elapsed(self):
"""cleanup_stale_project_dirs is called when 5-minute interval has elapsed."""
import backend.copilot.sdk.service as svc_mod
from .service import _cleanup_sdk_tool_results
cwd = "/tmp/copilot-test-sweep-elapsed"
os.makedirs(cwd, exist_ok=True)
with patch(
"backend.copilot.sdk.service.cleanup_stale_project_dirs"
) as mock_sweep:
# Set last sweep to a time far in the past
svc_mod._last_sweep_time = 0.0
await _cleanup_sdk_tool_results(cwd)
mock_sweep.assert_called_once()
@pytest.mark.asyncio
async def test_sweep_skipped_within_interval(self):
"""cleanup_stale_project_dirs is NOT called when within 5-minute interval."""
import time
import backend.copilot.sdk.service as svc_mod
from .service import _cleanup_sdk_tool_results
cwd = "/tmp/copilot-test-sweep-ratelimit"
os.makedirs(cwd, exist_ok=True)
with patch(
"backend.copilot.sdk.service.cleanup_stale_project_dirs"
) as mock_sweep:
# Set last sweep to now — interval not elapsed
svc_mod._last_sweep_time = time.time()
await _cleanup_sdk_tool_results(cwd)
mock_sweep.assert_not_called()
@pytest.mark.asyncio
async def test_rejects_path_outside_prefix(self, tmp_path):
"""Cleanup rejects a cwd that does not start with the expected prefix."""
from .service import _cleanup_sdk_tool_results
evil_cwd = str(tmp_path / "evil-path")
os.makedirs(evil_cwd, exist_ok=True)
with patch(
"backend.copilot.sdk.service.cleanup_stale_project_dirs"
) as mock_sweep:
await _cleanup_sdk_tool_results(evil_cwd)
# Directory should NOT have been removed (rejected early)
assert os.path.exists(evil_cwd)
mock_sweep.assert_not_called()
# ---------------------------------------------------------------------------
# Env vars that ChatConfig validators read — must be cleared so explicit
# constructor values are used.
# ---------------------------------------------------------------------------
_CONFIG_ENV_VARS = (
"CHAT_USE_OPENROUTER",
"CHAT_API_KEY",
"OPEN_ROUTER_API_KEY",
"OPENAI_API_KEY",
"CHAT_BASE_URL",
"OPENROUTER_BASE_URL",
"OPENAI_BASE_URL",
"CHAT_USE_CLAUDE_CODE_SUBSCRIPTION",
"CHAT_USE_CLAUDE_AGENT_SDK",
)
@pytest.fixture()
def _clean_config_env(monkeypatch: pytest.MonkeyPatch) -> None:
for var in _CONFIG_ENV_VARS:
monkeypatch.delenv(var, raising=False)
class TestResolveSdkModel:
"""Tests for _resolve_sdk_model — model ID resolution for the SDK CLI."""
def test_openrouter_active_keeps_dots(self, monkeypatch, _clean_config_env):
"""When OpenRouter is fully active, model keeps dot-separated version."""
from backend.copilot import config as cfg_mod
cfg = cfg_mod.ChatConfig(
model="anthropic/claude-opus-4.6",
claude_agent_model=None,
use_openrouter=True,
api_key="or-key",
base_url="https://openrouter.ai/api/v1",
use_claude_code_subscription=False,
)
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
assert _resolve_sdk_model() == "claude-opus-4.6"
def test_openrouter_disabled_normalizes_to_hyphens(
self, monkeypatch, _clean_config_env
):
"""When OpenRouter is disabled, dots are replaced with hyphens."""
from backend.copilot import config as cfg_mod
cfg = cfg_mod.ChatConfig(
model="anthropic/claude-opus-4.6",
claude_agent_model=None,
use_openrouter=False,
api_key=None,
base_url=None,
use_claude_code_subscription=False,
)
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
assert _resolve_sdk_model() == "claude-opus-4-6"
def test_openrouter_enabled_but_missing_key_normalizes(
self, monkeypatch, _clean_config_env
):
"""When OpenRouter is enabled but api_key is missing, falls back to
direct Anthropic and normalizes dots to hyphens."""
from backend.copilot import config as cfg_mod
cfg = cfg_mod.ChatConfig(
model="anthropic/claude-opus-4.6",
claude_agent_model=None,
use_openrouter=True,
api_key=None,
base_url="https://openrouter.ai/api/v1",
use_claude_code_subscription=False,
)
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
assert _resolve_sdk_model() == "claude-opus-4-6"
def test_explicit_claude_agent_model_takes_precedence(
self, monkeypatch, _clean_config_env
):
"""When claude_agent_model is explicitly set, it is returned as-is."""
from backend.copilot import config as cfg_mod
cfg = cfg_mod.ChatConfig(
model="anthropic/claude-opus-4.6",
claude_agent_model="claude-sonnet-4-5-20250514",
use_openrouter=True,
api_key="or-key",
base_url="https://openrouter.ai/api/v1",
use_claude_code_subscription=False,
)
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
assert _resolve_sdk_model() == "claude-sonnet-4-5-20250514"
def test_subscription_mode_returns_none(self, monkeypatch, _clean_config_env):
"""When using Claude Code subscription, returns None (CLI picks model)."""
from backend.copilot import config as cfg_mod
cfg = cfg_mod.ChatConfig(
model="anthropic/claude-opus-4.6",
claude_agent_model=None,
use_openrouter=False,
api_key=None,
base_url=None,
use_claude_code_subscription=True,
)
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
assert _resolve_sdk_model() is None
def test_model_without_provider_prefix(self, monkeypatch, _clean_config_env):
"""When model has no provider prefix, it still normalizes correctly."""
from backend.copilot import config as cfg_mod
cfg = cfg_mod.ChatConfig(
model="claude-opus-4.6",
claude_agent_model=None,
use_openrouter=False,
api_key=None,
base_url=None,
use_claude_code_subscription=False,
)
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
assert _resolve_sdk_model() == "claude-opus-4-6"

View File

@@ -146,7 +146,7 @@ def stash_pending_tool_output(tool_name: str, output: Any) -> None:
event.set()
async def wait_for_stash(timeout: float = 0.5) -> bool:
async def wait_for_stash(timeout: float = 2.0) -> bool:
"""Wait for a PostToolUse hook to stash tool output.
The SDK fires PostToolUse hooks asynchronously via ``start_soon()`` —
@@ -155,12 +155,12 @@ async def wait_for_stash(timeout: float = 0.5) -> bool:
by waiting on the ``_stash_event``, which is signaled by
:func:`stash_pending_tool_output`.
After the event fires, callers should ``await asyncio.sleep(0)`` to
give any remaining concurrent hooks a chance to complete.
Returns ``True`` if a stash signal was received, ``False`` on timeout.
The timeout is a safety net — normally the stash happens within
microseconds of yielding to the event loop.
The 2.0 s default was chosen based on production metrics: the original
0.5 s caused frequent timeouts under load (parallel tool calls, large
outputs). 2.0 s gives a comfortable margin while still failing fast
when the hook genuinely will not fire.
"""
event = _stash_event.get(None)
if event is None:
@@ -285,7 +285,7 @@ async def _read_file_handler(args: dict[str, Any]) -> dict[str, Any]:
resolved = os.path.realpath(os.path.expanduser(file_path))
try:
with open(resolved) as f:
with open(resolved, encoding="utf-8", errors="replace") as f:
selected = list(itertools.islice(f, offset, offset + limit))
# Cleanup happens in _cleanup_sdk_tool_results after session ends;
# don't delete here — the SDK may read in multiple chunks.

View File

@@ -10,6 +10,9 @@ Storage is handled via ``WorkspaceStorageBackend`` (GCS in prod, local
filesystem for self-hosted) — no DB column needed.
"""
from __future__ import annotations
import asyncio
import logging
import os
import re
@@ -17,8 +20,12 @@ import shutil
import time
from dataclasses import dataclass
from pathlib import Path
from uuid import uuid4
from backend.util import json
from backend.util.clients import get_openai_client
from backend.util.prompt import CompressResult, compress_context
from backend.util.workspace_storage import GCSWorkspaceStorage, get_workspace_storage
logger = logging.getLogger(__name__)
@@ -99,7 +106,14 @@ def strip_progress_entries(content: str) -> str:
continue
parent = entry.get("parentUuid", "")
original_parent = parent
while parent in stripped_uuids:
# seen_parents is local per-entry (not shared across iterations) so
# it can only detect cycles within a single ancestry walk, not across
# entries. This is intentional: each entry's parent chain is
# independent, and reusing a global set would incorrectly short-circuit
# valid re-use of the same UUID as a parent in different subtrees.
seen_parents: set[str] = set()
while parent in stripped_uuids and parent not in seen_parents:
seen_parents.add(parent)
parent = uuid_to_parent.get(parent, "")
if parent != original_parent:
entry["parentUuid"] = parent
@@ -151,44 +165,110 @@ def _projects_base() -> str:
return os.path.realpath(os.path.join(config_dir, "projects"))
def _cli_project_dir(sdk_cwd: str) -> str | None:
"""Return the CLI's project directory for a given working directory.
_STALE_PROJECT_DIR_SECONDS = 12 * 3600 # 12 hours — matches max session lifetime
_MAX_PROJECT_DIRS_TO_SWEEP = 50 # limit per sweep to avoid long pauses
Returns ``None`` if the path would escape the projects base.
def cleanup_stale_project_dirs(encoded_cwd: str | None = None) -> int:
"""Remove CLI project directories older than ``_STALE_PROJECT_DIR_SECONDS``.
Each CoPilot SDK turn creates a unique ``~/.claude/projects/<encoded-cwd>/``
directory. These are intentionally kept across turns so the model can read
tool-result files via ``--resume``. However, after a session ends they
become stale. This function sweeps old ones to prevent unbounded disk
growth.
When *encoded_cwd* is provided the sweep is scoped to that single
directory, making the operation safe in multi-tenant environments where
multiple copilot sessions share the same host. Without it the function
falls back to sweeping all directories matching the copilot naming pattern
(``-tmp-copilot-``), which is only safe for single-tenant deployments.
Returns the number of directories removed.
"""
cwd_encoded = re.sub(r"[^a-zA-Z0-9]", "-", os.path.realpath(sdk_cwd))
projects_base = _projects_base()
project_dir = os.path.realpath(os.path.join(projects_base, cwd_encoded))
if not os.path.isdir(projects_base):
return 0
if not project_dir.startswith(projects_base + os.sep):
logger.warning(
"[Transcript] Project dir escaped projects base: %s", project_dir
)
return None
return project_dir
now = time.time()
removed = 0
def _safe_glob_jsonl(project_dir: str) -> list[Path]:
"""Glob ``*.jsonl`` files, filtering out symlinks that escape the directory."""
try:
resolved_base = Path(project_dir).resolve()
except OSError as e:
logger.warning("[Transcript] Failed to resolve project dir: %s", e)
return []
result: list[Path] = []
for candidate in Path(project_dir).glob("*.jsonl"):
try:
resolved = candidate.resolve()
if resolved.is_relative_to(resolved_base):
result.append(resolved)
except (OSError, RuntimeError) as e:
logger.debug(
"[Transcript] Skipping invalid CLI session candidate %s: %s",
candidate,
e,
# Scoped mode: only clean up the one directory for the current session.
if encoded_cwd:
target = Path(projects_base) / encoded_cwd
if not target.is_dir():
return 0
# Guard: only sweep copilot-generated dirs.
if "-tmp-copilot-" not in target.name:
logger.warning(
"[Transcript] Refusing to sweep non-copilot dir: %s", target.name
)
return result
return 0
try:
# st_mtime is used as a proxy for session activity. Claude CLI writes
# its JSONL transcript into this directory during each turn, so mtime
# advances on every turn. A directory whose mtime is older than
# _STALE_PROJECT_DIR_SECONDS has not had an active turn in that window
# and is safe to remove (the session cannot --resume after cleanup).
age = now - target.stat().st_mtime
except OSError:
return 0
if age < _STALE_PROJECT_DIR_SECONDS:
return 0
try:
shutil.rmtree(target, ignore_errors=True)
removed = 1
except OSError:
pass
if removed:
logger.info(
"[Transcript] Swept stale CLI project dir %s (age %ds > %ds)",
target.name,
int(age),
_STALE_PROJECT_DIR_SECONDS,
)
return removed
# Unscoped fallback: sweep all copilot dirs across the projects base.
# Only safe for single-tenant deployments; callers should prefer the
# scoped variant by passing encoded_cwd.
try:
entries = Path(projects_base).iterdir()
except OSError as e:
logger.warning("[Transcript] Failed to list projects dir: %s", e)
return 0
for entry in entries:
if removed >= _MAX_PROJECT_DIRS_TO_SWEEP:
break
# Only sweep copilot-generated dirs (pattern: -tmp-copilot- or
# -private-tmp-copilot-).
if "-tmp-copilot-" not in entry.name:
continue
if not entry.is_dir():
continue
try:
# See the scoped-mode comment above: st_mtime advances on every turn,
# so a stale mtime reliably indicates an inactive session.
age = now - entry.stat().st_mtime
except OSError:
continue
if age < _STALE_PROJECT_DIR_SECONDS:
continue
try:
shutil.rmtree(entry, ignore_errors=True)
removed += 1
except OSError:
pass
if removed:
logger.info(
"[Transcript] Swept %d stale CLI project dirs (older than %ds)",
removed,
_STALE_PROJECT_DIR_SECONDS,
)
return removed
def read_compacted_entries(transcript_path: str) -> list[dict] | None:
@@ -255,63 +335,6 @@ def read_compacted_entries(transcript_path: str) -> list[dict] | None:
return entries
def read_cli_session_file(sdk_cwd: str) -> str | None:
"""Read the CLI's own session file, which reflects any compaction.
The CLI writes its session transcript to
``~/.claude/projects/<encoded_cwd>/<session_id>.jsonl``.
Since each SDK turn uses a unique ``sdk_cwd``, there should be
exactly one ``.jsonl`` file in that directory.
Returns the file content, or ``None`` if not found.
"""
project_dir = _cli_project_dir(sdk_cwd)
if not project_dir or not os.path.isdir(project_dir):
return None
jsonl_files = _safe_glob_jsonl(project_dir)
if not jsonl_files:
logger.debug("[Transcript] No CLI session file found in %s", project_dir)
return None
# Pick the most recently modified file (should be only one per turn).
try:
session_file = max(jsonl_files, key=lambda p: p.stat().st_mtime)
except OSError as e:
logger.warning("[Transcript] Failed to inspect CLI session files: %s", e)
return None
try:
content = session_file.read_text()
logger.info(
"[Transcript] Read CLI session file: %s (%d bytes)",
session_file,
len(content),
)
return content
except OSError as e:
logger.warning("[Transcript] Failed to read CLI session file: %s", e)
return None
def cleanup_cli_project_dir(sdk_cwd: str) -> None:
"""Remove the CLI's project directory for a specific working directory.
The CLI stores session data under ``~/.claude/projects/<encoded_cwd>/``.
Each SDK turn uses a unique ``sdk_cwd``, so the project directory is
safe to remove entirely after the transcript has been uploaded.
"""
project_dir = _cli_project_dir(sdk_cwd)
if not project_dir:
return
if os.path.isdir(project_dir):
shutil.rmtree(project_dir, ignore_errors=True)
logger.debug("[Transcript] Cleaned up CLI project dir: %s", project_dir)
else:
logger.debug("[Transcript] Project dir not found: %s", project_dir)
def write_transcript_to_tempfile(
transcript_content: str,
session_id: str,
@@ -327,7 +350,7 @@ def write_transcript_to_tempfile(
# Validate cwd is under the expected sandbox prefix (CodeQL sanitizer).
real_cwd = os.path.realpath(cwd)
if not real_cwd.startswith(_SAFE_CWD_PREFIX):
logger.warning(f"[Transcript] cwd outside sandbox: {cwd}")
logger.warning("[Transcript] cwd outside sandbox: %s", cwd)
return None
try:
@@ -337,17 +360,17 @@ def write_transcript_to_tempfile(
os.path.join(real_cwd, f"transcript-{safe_id}.jsonl")
)
if not jsonl_path.startswith(real_cwd):
logger.warning(f"[Transcript] Path escaped cwd: {jsonl_path}")
logger.warning("[Transcript] Path escaped cwd: %s", jsonl_path)
return None
with open(jsonl_path, "w") as f:
f.write(transcript_content)
logger.info(f"[Transcript] Wrote resume file: {jsonl_path}")
logger.info("[Transcript] Wrote resume file: %s", jsonl_path)
return jsonl_path
except OSError as e:
logger.warning(f"[Transcript] Failed to write resume file: {e}")
logger.warning("[Transcript] Failed to write resume file: %s", e)
return None
@@ -408,8 +431,6 @@ def _meta_storage_path_parts(user_id: str, session_id: str) -> tuple[str, str, s
def _build_path_from_parts(parts: tuple[str, str, str], backend: object) -> str:
"""Build a full storage path from (workspace_id, file_id, filename) parts."""
from backend.util.workspace_storage import GCSWorkspaceStorage
wid, fid, fname = parts
if isinstance(backend, GCSWorkspaceStorage):
blob = f"workspaces/{wid}/{fid}/{fname}"
@@ -448,17 +469,15 @@ async def upload_transcript(
content: Complete JSONL transcript (from TranscriptBuilder).
message_count: ``len(session.messages)`` at upload time.
"""
from backend.util.workspace_storage import get_workspace_storage
# Strip metadata entries (progress, file-history-snapshot, etc.)
# Note: SDK-built transcripts shouldn't have these, but strip for safety
stripped = strip_progress_entries(content)
if not validate_transcript(stripped):
# Log entry types for debugging — helps identify why validation failed
entry_types: list[str] = []
for line in stripped.strip().split("\n"):
entry = json.loads(line, fallback={"type": "INVALID_JSON"})
entry_types.append(entry.get("type", "?"))
entry_types = [
json.loads(line, fallback={"type": "INVALID_JSON"}).get("type", "?")
for line in stripped.strip().split("\n")
]
logger.warning(
"%s Skipping upload — stripped content not valid "
"(types=%s, stripped_len=%d, raw_len=%d)",
@@ -494,11 +513,14 @@ async def upload_transcript(
content=json.dumps(meta).encode("utf-8"),
)
except Exception as e:
logger.warning(f"{log_prefix} Failed to write metadata: {e}")
logger.warning("%s Failed to write metadata: %s", log_prefix, e)
logger.info(
f"{log_prefix} Uploaded {len(encoded)}B "
f"(stripped from {len(content)}B, msg_count={message_count})"
"%s Uploaded %dB (stripped from %dB, msg_count=%d)",
log_prefix,
len(encoded),
len(content),
message_count,
)
@@ -512,8 +534,6 @@ async def download_transcript(
Returns a ``TranscriptDownload`` with the JSONL content and the
``message_count`` watermark from the upload, or ``None`` if not found.
"""
from backend.util.workspace_storage import get_workspace_storage
storage = await get_workspace_storage()
path = _build_storage_path(user_id, session_id, storage)
@@ -521,10 +541,10 @@ async def download_transcript(
data = await storage.retrieve(path)
content = data.decode("utf-8")
except FileNotFoundError:
logger.debug(f"{log_prefix} No transcript in storage")
logger.debug("%s No transcript in storage", log_prefix)
return None
except Exception as e:
logger.warning(f"{log_prefix} Failed to download transcript: {e}")
logger.warning("%s Failed to download transcript: %s", log_prefix, e)
return None
# Try to load metadata (best-effort — old transcripts won't have it)
@@ -536,10 +556,14 @@ async def download_transcript(
meta = json.loads(meta_data.decode("utf-8"), fallback={})
message_count = meta.get("message_count", 0)
uploaded_at = meta.get("uploaded_at", 0.0)
except (FileNotFoundError, Exception):
except FileNotFoundError:
pass # No metadata — treat as unknown (msg_count=0 → always fill gap)
except Exception as e:
logger.debug("%s Failed to load transcript metadata: %s", log_prefix, e)
logger.info(f"{log_prefix} Downloaded {len(content)}B (msg_count={message_count})")
logger.info(
"%s Downloaded %dB (msg_count=%d)", log_prefix, len(content), message_count
)
return TranscriptDownload(
content=content,
message_count=message_count,
@@ -553,8 +577,6 @@ async def delete_transcript(user_id: str, session_id: str) -> None:
Removes both the ``.jsonl`` transcript and the companion ``.meta.json``
so stale ``message_count`` watermarks cannot corrupt gap-fill logic.
"""
from backend.util.workspace_storage import get_workspace_storage
storage = await get_workspace_storage()
path = _build_storage_path(user_id, session_id, storage)
@@ -571,3 +593,280 @@ async def delete_transcript(user_id: str, session_id: str) -> None:
logger.info("[Transcript] Deleted metadata for session %s", session_id)
except Exception as e:
logger.warning("[Transcript] Failed to delete metadata: %s", e)
# ---------------------------------------------------------------------------
# Transcript compaction — LLM summarization for prompt-too-long recovery
# ---------------------------------------------------------------------------
# JSONL protocol values used in transcript serialization.
STOP_REASON_END_TURN = "end_turn"
COMPACT_MSG_ID_PREFIX = "msg_compact_"
ENTRY_TYPE_MESSAGE = "message"
def _flatten_assistant_content(blocks: list) -> str:
"""Flatten assistant content blocks into a single plain-text string.
Structured ``tool_use`` blocks are converted to ``[tool_use: name]``
placeholders. This is intentional: ``compress_context`` requires plain
text for token counting and LLM summarization. The structural loss is
acceptable because compaction only runs when the original transcript was
already too large for the model — a summarized plain-text version is
better than no context at all.
"""
parts: list[str] = []
for block in blocks:
if isinstance(block, dict):
btype = block.get("type", "")
if btype == "text":
parts.append(block.get("text", ""))
elif btype == "tool_use":
parts.append(f"[tool_use: {block.get('name', '?')}]")
else:
# Preserve non-text blocks (e.g. image) as placeholders.
# Use __prefix__ to distinguish from literal user text.
parts.append(f"[__{btype}__]")
elif isinstance(block, str):
parts.append(block)
return "\n".join(parts) if parts else ""
def _flatten_tool_result_content(blocks: list) -> str:
"""Flatten tool_result and other content blocks into plain text.
Handles nested tool_result structures, text blocks, and raw strings.
Uses ``json.dumps`` as fallback for dict blocks without a ``text`` key
or where ``text`` is ``None``.
Like ``_flatten_assistant_content``, structured blocks (images, nested
tool results) are reduced to text representations for compression.
"""
str_parts: list[str] = []
for block in blocks:
if isinstance(block, dict) and block.get("type") == "tool_result":
inner = block.get("content") or ""
if isinstance(inner, list):
for sub in inner:
if isinstance(sub, dict):
sub_type = sub.get("type")
if sub_type in ("image", "document"):
# Avoid serializing base64 binary data into
# the compaction input — use a placeholder.
str_parts.append(f"[__{sub_type}__]")
elif sub_type == "text" or sub.get("text") is not None:
str_parts.append(str(sub.get("text", "")))
else:
str_parts.append(json.dumps(sub))
else:
str_parts.append(str(sub))
else:
str_parts.append(str(inner))
elif isinstance(block, dict) and block.get("type") == "text":
str_parts.append(str(block.get("text", "")))
elif isinstance(block, dict):
# Preserve non-text/non-tool_result blocks (e.g. image) as placeholders.
# Use __prefix__ to distinguish from literal user text.
btype = block.get("type", "unknown")
str_parts.append(f"[__{btype}__]")
elif isinstance(block, str):
str_parts.append(block)
return "\n".join(str_parts) if str_parts else ""
def _transcript_to_messages(content: str) -> list[dict]:
"""Convert JSONL transcript entries to plain message dicts for compression.
Parses each line of the JSONL *content*, skips strippable metadata entries
(progress, file-history-snapshot, etc.), and extracts the ``role`` and
flattened ``content`` from the ``message`` field of each remaining entry.
Structured content blocks (``tool_use``, ``tool_result``, images) are
flattened to plain text via ``_flatten_assistant_content`` and
``_flatten_tool_result_content`` so that ``compress_context`` can
perform token counting and LLM summarization on uniform strings.
Returns:
A list of ``{"role": str, "content": str}`` dicts suitable for
``compress_context``.
"""
messages: list[dict] = []
for line in content.strip().split("\n"):
if not line.strip():
continue
entry = json.loads(line, fallback=None)
if not isinstance(entry, dict):
continue
if entry.get("type", "") in STRIPPABLE_TYPES and not entry.get(
"isCompactSummary"
):
continue
msg = entry.get("message", {})
role = msg.get("role", "")
if not role:
continue
msg_dict: dict = {"role": role}
raw_content = msg.get("content")
if role == "assistant" and isinstance(raw_content, list):
msg_dict["content"] = _flatten_assistant_content(raw_content)
elif isinstance(raw_content, list):
msg_dict["content"] = _flatten_tool_result_content(raw_content)
else:
msg_dict["content"] = raw_content or ""
messages.append(msg_dict)
return messages
def _messages_to_transcript(messages: list[dict]) -> str:
"""Convert compressed message dicts back to JSONL transcript format.
Rebuilds a minimal JSONL transcript from the ``{"role", "content"}``
dicts returned by ``compress_context``. Each message becomes one JSONL
line with a fresh ``uuid`` / ``parentUuid`` chain so the CLI's
``--resume`` flag can reconstruct a valid conversation tree.
Assistant messages are wrapped in the full ``message`` envelope
(``id``, ``model``, ``stop_reason``, structured ``content`` blocks)
that the CLI expects. User messages use the simpler ``{role, content}``
form.
Returns:
A newline-terminated JSONL string, or an empty string if *messages*
is empty.
"""
lines: list[str] = []
last_uuid: str = "" # root entry uses empty string, not null
for msg in messages:
role = msg.get("role", "user")
entry_type = "assistant" if role == "assistant" else "user"
uid = str(uuid4())
content = msg.get("content", "")
if role == "assistant":
message: dict = {
"role": "assistant",
"model": "",
"id": f"{COMPACT_MSG_ID_PREFIX}{uuid4().hex[:24]}",
"type": ENTRY_TYPE_MESSAGE,
"content": [{"type": "text", "text": content}] if content else [],
"stop_reason": STOP_REASON_END_TURN,
"stop_sequence": None,
}
else:
message = {"role": role, "content": content}
entry = {
"type": entry_type,
"uuid": uid,
"parentUuid": last_uuid,
"message": message,
}
lines.append(json.dumps(entry, separators=(",", ":")))
last_uuid = uid
return "\n".join(lines) + "\n" if lines else ""
_COMPACTION_TIMEOUT_SECONDS = 60
_TRUNCATION_TIMEOUT_SECONDS = 30
async def _run_compression(
messages: list[dict],
model: str,
log_prefix: str,
) -> CompressResult:
"""Run LLM-based compression with truncation fallback.
Uses the shared OpenAI client from ``get_openai_client()``.
If no client is configured or the LLM call fails, falls back to
truncation-based compression which drops older messages without
summarization.
A 60-second timeout prevents a hung LLM call from blocking the
retry path indefinitely. The truncation fallback also has a
30-second timeout to guard against slow tokenization on very large
transcripts.
"""
client = get_openai_client()
if client is None:
logger.warning("%s No OpenAI client configured, using truncation", log_prefix)
return await asyncio.wait_for(
compress_context(messages=messages, model=model, client=None),
timeout=_TRUNCATION_TIMEOUT_SECONDS,
)
try:
return await asyncio.wait_for(
compress_context(messages=messages, model=model, client=client),
timeout=_COMPACTION_TIMEOUT_SECONDS,
)
except Exception as e:
logger.warning("%s LLM compaction failed, using truncation: %s", log_prefix, e)
return await asyncio.wait_for(
compress_context(messages=messages, model=model, client=None),
timeout=_TRUNCATION_TIMEOUT_SECONDS,
)
async def compact_transcript(
content: str,
*,
model: str,
log_prefix: str = "[Transcript]",
) -> str | None:
"""Compact an oversized JSONL transcript using LLM summarization.
Converts transcript entries to plain messages, runs ``compress_context``
(the same compressor used for pre-query history), and rebuilds JSONL.
Structured content (``tool_use`` blocks, ``tool_result`` nesting, images)
is flattened to plain text for compression. This matches the fidelity of
the Plan C (DB compression) fallback path, where
``_format_conversation_context`` similarly renders tool calls as
``You called tool: name(args)`` and results as ``Tool result: ...``.
Neither path preserves structured API content blocks — the compacted
context serves as text history for the LLM, which creates proper
structured tool calls going forward.
Images are per-turn attachments loaded from workspace storage by file ID
(via ``_prepare_file_attachments``), not part of the conversation history.
They are re-attached each turn and are unaffected by compaction.
Returns the compacted JSONL string, or ``None`` on failure.
See also:
``_compress_messages`` in ``service.py`` — compresses ``ChatMessage``
lists for pre-query DB history. Both share ``compress_context()``
but operate on different input formats (JSONL transcript entries
here vs. ChatMessage dicts there).
"""
messages = _transcript_to_messages(content)
if len(messages) < 2:
logger.warning("%s Too few messages to compact (%d)", log_prefix, len(messages))
return None
try:
result = await _run_compression(messages, model, log_prefix)
if not result.was_compacted:
# Compressor says it's within budget, but the SDK rejected it.
# Return None so the caller falls through to DB fallback.
logger.warning(
"%s Compressor reports within budget but SDK rejected — "
"signalling failure",
log_prefix,
)
return None
logger.info(
"%s Compacted transcript: %d->%d tokens (%d summarized, %d dropped)",
log_prefix,
result.original_token_count,
result.token_count,
result.messages_summarized,
result.messages_dropped,
)
compacted = _messages_to_transcript(result.messages)
if not validate_transcript(compacted):
logger.warning("%s Compacted transcript failed validation", log_prefix)
return None
return compacted
except Exception as e:
logger.error(
"%s Transcript compaction failed: %s", log_prefix, e, exc_info=True
)
return None

View File

@@ -68,7 +68,7 @@ class TranscriptBuilder:
type=entry_type,
uuid=data.get("uuid") or str(uuid4()),
parentUuid=data.get("parentUuid"),
isCompactSummary=data.get("isCompactSummary") or None,
isCompactSummary=data.get("isCompactSummary"),
message=data.get("message", {}),
)

View File

@@ -1,7 +1,8 @@
"""Unit tests for JSONL transcript management utilities."""
import asyncio
import os
from unittest.mock import AsyncMock, patch
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
@@ -9,9 +10,7 @@ from backend.util import json
from .transcript import (
STRIPPABLE_TYPES,
_cli_project_dir,
delete_transcript,
read_cli_session_file,
read_compacted_entries,
strip_progress_entries,
validate_transcript,
@@ -292,85 +291,6 @@ class TestStripProgressEntries:
assert asst_entry["parentUuid"] == "u1" # reparented
# --- read_cli_session_file ---
class TestReadCliSessionFile:
def test_no_matching_files_returns_none(self, tmp_path, monkeypatch):
"""read_cli_session_file returns None when no .jsonl files exist."""
# Create a project dir with no jsonl files
project_dir = tmp_path / "projects" / "encoded-cwd"
project_dir.mkdir(parents=True)
monkeypatch.setattr(
"backend.copilot.sdk.transcript._cli_project_dir",
lambda sdk_cwd: str(project_dir),
)
assert read_cli_session_file("/fake/cwd") is None
def test_one_jsonl_file_returns_content(self, tmp_path, monkeypatch):
"""read_cli_session_file returns the content of a single .jsonl file."""
project_dir = tmp_path / "projects" / "encoded-cwd"
project_dir.mkdir(parents=True)
jsonl_file = project_dir / "session.jsonl"
jsonl_file.write_text("line1\nline2\n")
monkeypatch.setattr(
"backend.copilot.sdk.transcript._cli_project_dir",
lambda sdk_cwd: str(project_dir),
)
result = read_cli_session_file("/fake/cwd")
assert result == "line1\nline2\n"
def test_symlink_escaping_project_dir_is_skipped(self, tmp_path, monkeypatch):
"""read_cli_session_file skips symlinks that escape the project dir."""
project_dir = tmp_path / "projects" / "encoded-cwd"
project_dir.mkdir(parents=True)
# Create a file outside the project dir
outside = tmp_path / "outside"
outside.mkdir()
outside_file = outside / "evil.jsonl"
outside_file.write_text("should not be read\n")
# Symlink from inside project_dir to outside file
symlink = project_dir / "evil.jsonl"
symlink.symlink_to(outside_file)
monkeypatch.setattr(
"backend.copilot.sdk.transcript._cli_project_dir",
lambda sdk_cwd: str(project_dir),
)
# The symlink target resolves outside project_dir, so it should be skipped
result = read_cli_session_file("/fake/cwd")
assert result is None
# --- _cli_project_dir ---
class TestCliProjectDir:
def test_returns_none_for_path_traversal(self, tmp_path, monkeypatch):
"""_cli_project_dir returns None when the project dir symlink escapes projects base."""
config_dir = tmp_path / "config"
config_dir.mkdir()
projects_dir = config_dir / "projects"
projects_dir.mkdir()
monkeypatch.setenv("CLAUDE_CONFIG_DIR", str(config_dir))
# Create a symlink inside projects/ that points outside of it.
# _cli_project_dir encodes the cwd as all-alnum-hyphens, so use a
# cwd whose encoded form matches the symlink name we create.
evil_target = tmp_path / "escaped"
evil_target.mkdir()
# The encoded form of "/evil/cwd" is "-evil-cwd"
symlink_path = projects_dir / "-evil-cwd"
symlink_path.symlink_to(evil_target)
result = _cli_project_dir("/evil/cwd")
assert result is None
# --- delete_transcript ---
@@ -382,7 +302,7 @@ class TestDeleteTranscript:
mock_storage.delete = AsyncMock()
with patch(
"backend.util.workspace_storage.get_workspace_storage",
"backend.copilot.sdk.transcript.get_workspace_storage",
new_callable=AsyncMock,
return_value=mock_storage,
):
@@ -402,7 +322,7 @@ class TestDeleteTranscript:
)
with patch(
"backend.util.workspace_storage.get_workspace_storage",
"backend.copilot.sdk.transcript.get_workspace_storage",
new_callable=AsyncMock,
return_value=mock_storage,
):
@@ -420,7 +340,7 @@ class TestDeleteTranscript:
)
with patch(
"backend.util.workspace_storage.get_workspace_storage",
"backend.copilot.sdk.transcript.get_workspace_storage",
new_callable=AsyncMock,
return_value=mock_storage,
):
@@ -897,3 +817,386 @@ class TestCompactionFlowIntegration:
output2 = builder2.to_jsonl()
lines2 = [json.loads(line) for line in output2.strip().split("\n")]
assert lines2[-1]["parentUuid"] == "a2"
# ---------------------------------------------------------------------------
# _run_compression (direct tests for the 3 code paths)
# ---------------------------------------------------------------------------
class TestRunCompression:
"""Direct tests for ``_run_compression`` covering all 3 code paths.
Paths:
(a) No OpenAI client configured → truncation fallback immediately.
(b) LLM success → returns LLM-compressed result.
(c) LLM call raises → truncation fallback.
"""
def _make_compress_result(self, was_compacted: bool, msgs=None):
"""Build a minimal CompressResult-like object."""
from types import SimpleNamespace
return SimpleNamespace(
was_compacted=was_compacted,
messages=msgs or [{"role": "user", "content": "summary"}],
original_token_count=500,
token_count=100 if was_compacted else 500,
messages_summarized=2 if was_compacted else 0,
messages_dropped=0,
)
@pytest.mark.asyncio
async def test_no_client_uses_truncation(self):
"""Path (a): ``get_openai_client()`` returns None → truncation only."""
from .transcript import _run_compression
truncation_result = self._make_compress_result(
True, [{"role": "user", "content": "truncated"}]
)
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
return_value=None,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
new_callable=AsyncMock,
return_value=truncation_result,
) as mock_compress,
):
result = await _run_compression(
[{"role": "user", "content": "hello"}],
model="test-model",
log_prefix="[test]",
)
# compress_context called with client=None (truncation mode)
call_kwargs = mock_compress.call_args
assert (
call_kwargs.kwargs.get("client") is None
or (call_kwargs.args and call_kwargs.args[2] is None)
or mock_compress.call_args[1].get("client") is None
)
assert result is truncation_result
@pytest.mark.asyncio
async def test_llm_success_returns_llm_result(self):
"""Path (b): ``get_openai_client()`` returns a client → LLM compresses."""
from .transcript import _run_compression
llm_result = self._make_compress_result(
True, [{"role": "user", "content": "LLM summary"}]
)
mock_client = MagicMock()
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
return_value=mock_client,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
new_callable=AsyncMock,
return_value=llm_result,
) as mock_compress,
):
result = await _run_compression(
[{"role": "user", "content": "long conversation"}],
model="test-model",
log_prefix="[test]",
)
# compress_context called with the real client
assert mock_compress.called
assert result is llm_result
@pytest.mark.asyncio
async def test_llm_failure_falls_back_to_truncation(self):
"""Path (c): LLM call raises → truncation fallback used instead."""
from .transcript import _run_compression
truncation_result = self._make_compress_result(
True, [{"role": "user", "content": "truncated fallback"}]
)
mock_client = MagicMock()
call_count = [0]
async def _compress_side_effect(**kwargs):
call_count[0] += 1
if kwargs.get("client") is not None:
raise RuntimeError("LLM timeout")
return truncation_result
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
return_value=mock_client,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
side_effect=_compress_side_effect,
),
):
result = await _run_compression(
[{"role": "user", "content": "long conversation"}],
model="test-model",
log_prefix="[test]",
)
# compress_context called twice: once for LLM (raises), once for truncation
assert call_count[0] == 2
assert result is truncation_result
@pytest.mark.asyncio
async def test_llm_timeout_falls_back_to_truncation(self):
"""Path (d): LLM call exceeds timeout → truncation fallback used."""
from .transcript import _run_compression
truncation_result = self._make_compress_result(
True, [{"role": "user", "content": "truncated after timeout"}]
)
call_count = [0]
async def _compress_side_effect(*, messages, model, client):
call_count[0] += 1
if client is not None:
# Simulate a hang that exceeds the timeout
await asyncio.sleep(9999)
return truncation_result
fake_client = MagicMock()
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
return_value=fake_client,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
side_effect=_compress_side_effect,
),
patch(
"backend.copilot.sdk.transcript._COMPACTION_TIMEOUT_SECONDS",
0.05,
),
patch(
"backend.copilot.sdk.transcript._TRUNCATION_TIMEOUT_SECONDS",
5,
),
):
result = await _run_compression(
[{"role": "user", "content": "long conversation"}],
model="test-model",
log_prefix="[test]",
)
# compress_context called twice: once for LLM (times out), once truncation
assert call_count[0] == 2
assert result is truncation_result
# ---------------------------------------------------------------------------
# cleanup_stale_project_dirs
# ---------------------------------------------------------------------------
class TestCleanupStaleProjectDirs:
"""Tests for cleanup_stale_project_dirs (disk leak prevention)."""
def test_removes_old_copilot_dirs(self, tmp_path, monkeypatch):
"""Directories matching copilot pattern older than threshold are removed."""
from backend.copilot.sdk.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
lambda: str(projects_dir),
)
# Create a stale dir
stale = projects_dir / "-tmp-copilot-old-session"
stale.mkdir()
# Set mtime to past the threshold
import time
old_time = time.time() - _STALE_PROJECT_DIR_SECONDS - 100
os.utime(stale, (old_time, old_time))
# Create a fresh dir
fresh = projects_dir / "-tmp-copilot-new-session"
fresh.mkdir()
removed = cleanup_stale_project_dirs()
assert removed == 1
assert not stale.exists()
assert fresh.exists()
def test_ignores_non_copilot_dirs(self, tmp_path, monkeypatch):
"""Directories not matching copilot pattern are left alone."""
from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
lambda: str(projects_dir),
)
# Non-copilot dir that's old
import time
other = projects_dir / "some-other-project"
other.mkdir()
old_time = time.time() - 999999
os.utime(other, (old_time, old_time))
removed = cleanup_stale_project_dirs()
assert removed == 0
assert other.exists()
def test_ttl_boundary_not_removed(self, tmp_path, monkeypatch):
"""A directory exactly at the TTL boundary should NOT be removed."""
from backend.copilot.sdk.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
lambda: str(projects_dir),
)
import time
# Dir that's exactly at the TTL (age == threshold, not >) — should survive
boundary = projects_dir / "-tmp-copilot-boundary"
boundary.mkdir()
boundary_time = time.time() - _STALE_PROJECT_DIR_SECONDS + 1
os.utime(boundary, (boundary_time, boundary_time))
removed = cleanup_stale_project_dirs()
assert removed == 0
assert boundary.exists()
def test_skips_non_directory_entries(self, tmp_path, monkeypatch):
"""Regular files matching the copilot pattern are not removed."""
from backend.copilot.sdk.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
lambda: str(projects_dir),
)
import time
# Create a regular FILE (not a dir) with the copilot pattern name
stale_file = projects_dir / "-tmp-copilot-stale-file"
stale_file.write_text("not a dir")
old_time = time.time() - _STALE_PROJECT_DIR_SECONDS - 100
os.utime(stale_file, (old_time, old_time))
removed = cleanup_stale_project_dirs()
assert removed == 0
assert stale_file.exists()
def test_missing_base_dir_returns_zero(self, tmp_path, monkeypatch):
"""If the projects base directory doesn't exist, return 0 gracefully."""
from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
nonexistent = str(tmp_path / "does-not-exist" / "projects")
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
lambda: nonexistent,
)
removed = cleanup_stale_project_dirs()
assert removed == 0
def test_scoped_removes_only_target_dir(self, tmp_path, monkeypatch):
"""When encoded_cwd is supplied only that directory is swept."""
import time
from backend.copilot.sdk.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
lambda: str(projects_dir),
)
old_time = time.time() - _STALE_PROJECT_DIR_SECONDS - 100
# Two stale copilot dirs
target = projects_dir / "-tmp-copilot-session-abc"
target.mkdir()
os.utime(target, (old_time, old_time))
other = projects_dir / "-tmp-copilot-session-xyz"
other.mkdir()
os.utime(other, (old_time, old_time))
# Only the target dir should be removed
removed = cleanup_stale_project_dirs(encoded_cwd="-tmp-copilot-session-abc")
assert removed == 1
assert not target.exists()
assert other.exists() # untouched — not the current session
def test_scoped_fresh_dir_not_removed(self, tmp_path, monkeypatch):
"""Scoped sweep leaves a fresh directory alone."""
from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
lambda: str(projects_dir),
)
fresh = projects_dir / "-tmp-copilot-session-new"
fresh.mkdir()
# mtime is now — well within TTL
removed = cleanup_stale_project_dirs(encoded_cwd="-tmp-copilot-session-new")
assert removed == 0
assert fresh.exists()
def test_scoped_non_copilot_dir_not_removed(self, tmp_path, monkeypatch):
"""Scoped sweep refuses to remove a non-copilot directory."""
import time
from backend.copilot.sdk.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
lambda: str(projects_dir),
)
old_time = time.time() - _STALE_PROJECT_DIR_SECONDS - 100
non_copilot = projects_dir / "some-other-project"
non_copilot.mkdir()
os.utime(non_copilot, (old_time, old_time))
removed = cleanup_stale_project_dirs(encoded_cwd="some-other-project")
assert removed == 0
assert non_copilot.exists()

View File

@@ -4,11 +4,12 @@ These tests verify the complete copilot flow using dummy implementations
for agent generator and SDK service, allowing automated testing without
external LLM calls.
Enable test mode with COPILOT_TEST_MODE=true environment variable.
Enable test mode with CHAT_TEST_MODE=true environment variable (or in .env).
Note: StreamFinish is NOT emitted by the dummy service — it is published
by mark_session_completed in the processor layer. These tests only cover
the service-level streaming output (StreamStart + StreamTextDelta).
The dummy service emits the full AI SDK protocol event sequence:
StreamStart → StreamStartStep → StreamTextStart → StreamTextDelta(s) →
StreamTextEnd → StreamFinishStep → StreamFinish.
The processor skips StreamFinish and publishes its own via mark_session_completed.
"""
import asyncio
@@ -20,9 +21,14 @@ import pytest
from backend.copilot.model import ChatMessage, ChatSession, upsert_chat_session
from backend.copilot.response_model import (
StreamError,
StreamFinish,
StreamFinishStep,
StreamHeartbeat,
StreamStart,
StreamStartStep,
StreamTextDelta,
StreamTextEnd,
StreamTextStart,
)
from backend.copilot.sdk.dummy import stream_chat_completion_dummy
@@ -30,9 +36,9 @@ from backend.copilot.sdk.dummy import stream_chat_completion_dummy
@pytest.fixture(autouse=True)
def enable_test_mode():
"""Enable test mode for all tests in this module."""
os.environ["COPILOT_TEST_MODE"] = "true"
os.environ["CHAT_TEST_MODE"] = "true"
yield
os.environ.pop("COPILOT_TEST_MODE", None)
os.environ.pop("CHAT_TEST_MODE", None)
@pytest.mark.asyncio
@@ -110,9 +116,14 @@ async def test_streaming_event_types():
):
event_types.add(type(event).__name__)
# Required event types (StreamFinish is published by processor, not service)
# Required event types for full AI SDK protocol
assert "StreamStart" in event_types, "Missing StreamStart"
assert "StreamStartStep" in event_types, "Missing StreamStartStep"
assert "StreamTextStart" in event_types, "Missing StreamTextStart"
assert "StreamTextDelta" in event_types, "Missing StreamTextDelta"
assert "StreamTextEnd" in event_types, "Missing StreamTextEnd"
assert "StreamFinishStep" in event_types, "Missing StreamFinishStep"
assert "StreamFinish" in event_types, "Missing StreamFinish"
print(f"✅ Event types: {sorted(event_types)}")
@@ -175,16 +186,17 @@ async def test_streaming_heartbeat_timing():
@pytest.mark.asyncio
async def test_error_handling():
"""Test that errors are properly formatted and sent."""
# This would require a dummy that can trigger errors
# For now, just verify error event structure
"""Test that error events have correct SSE structure."""
error = StreamError(errorText="Test error", code="test_error")
assert error.errorText == "Test error"
assert error.code == "test_error"
assert str(error.type.value) in ["error", "error"]
print("✅ Error structure verified")
# Verify to_sse() strips code (AI SDK protocol compliance)
sse = error.to_sse()
assert '"errorText"' in sse
assert '"code"' not in sse, "to_sse() must strip code field for AI SDK"
print("✅ Error structure verified (code stripped in SSE)")
@pytest.mark.asyncio
@@ -326,20 +338,85 @@ async def test_stream_completeness():
):
events.append(event)
# Check for required events (StreamFinish is published by processor)
has_start = any(isinstance(e, StreamStart) for e in events)
has_text = any(isinstance(e, StreamTextDelta) for e in events)
assert has_start, "Stream must include StreamStart"
assert has_text, "Stream must include text deltas"
# Check for all required event types
assert any(isinstance(e, StreamStart) for e in events), "Missing StreamStart"
assert any(
isinstance(e, StreamStartStep) for e in events
), "Missing StreamStartStep"
assert any(
isinstance(e, StreamTextStart) for e in events
), "Missing StreamTextStart"
assert any(
isinstance(e, StreamTextDelta) for e in events
), "Missing StreamTextDelta"
assert any(isinstance(e, StreamTextEnd) for e in events), "Missing StreamTextEnd"
assert any(
isinstance(e, StreamFinishStep) for e in events
), "Missing StreamFinishStep"
assert any(isinstance(e, StreamFinish) for e in events), "Missing StreamFinish"
# Verify exactly one start
start_count = sum(1 for e in events if isinstance(e, StreamStart))
assert start_count == 1, f"Should have exactly 1 StreamStart, got {start_count}"
print(
f"✅ Completeness: 1 start, {sum(1 for e in events if isinstance(e, StreamTextDelta))} text deltas"
)
print(f"✅ Completeness: {len(events)} events, full protocol sequence")
@pytest.mark.asyncio
async def test_transient_error_shows_retryable():
"""Test __test_transient_error__ yields partial text then retryable StreamError."""
events = []
async for event in stream_chat_completion_dummy(
session_id="test-transient",
message="please fail __test_transient_error__",
is_user_message=True,
user_id="test-user",
):
events.append(event)
# Should start with StreamStart
assert isinstance(events[0], StreamStart)
# Should have some partial text before the error
text_events = [e for e in events if isinstance(e, StreamTextDelta)]
assert len(text_events) > 0, "Should stream partial text before error"
# Should end with StreamError
error_events = [e for e in events if isinstance(e, StreamError)]
assert len(error_events) == 1, "Should have exactly one StreamError"
assert error_events[0].code == "transient_api_error"
assert "connection interrupted" in error_events[0].errorText.lower()
print(f"✅ Transient error: {len(text_events)} partial deltas + retryable error")
@pytest.mark.asyncio
async def test_fatal_error_not_retryable():
"""Test __test_fatal_error__ yields StreamError without retryable code."""
events = []
async for event in stream_chat_completion_dummy(
session_id="test-fatal",
message="__test_fatal_error__",
is_user_message=True,
user_id="test-user",
):
events.append(event)
assert isinstance(events[0], StreamStart)
# Should have StreamError with sdk_error code (not transient)
error_events = [e for e in events if isinstance(e, StreamError)]
assert len(error_events) == 1
assert error_events[0].code == "sdk_error"
assert "transient" not in error_events[0].code
# Should NOT have any text deltas (fatal errors fail immediately)
text_events = [e for e in events if isinstance(e, StreamTextDelta)]
assert len(text_events) == 0, "Fatal error should not stream any text"
print("✅ Fatal error: immediate error, no partial text")
@pytest.mark.asyncio
@@ -395,6 +472,8 @@ if __name__ == "__main__":
asyncio.run(test_message_deduplication())
asyncio.run(test_event_ordering())
asyncio.run(test_stream_completeness())
asyncio.run(test_transient_error_shows_retryable())
asyncio.run(test_fatal_error_not_retryable())
asyncio.run(test_text_delta_consistency())
print("=" * 60)

View File

@@ -12,6 +12,7 @@ from .agent_browser import BrowserActTool, BrowserNavigateTool, BrowserScreensho
from .agent_output import AgentOutputTool
from .base import BaseTool
from .bash_exec import BashExecTool
from .connect_integration import ConnectIntegrationTool
from .continue_run_block import ContinueRunBlockTool
from .create_agent import CreateAgentTool
from .customize_agent import CustomizeAgentTool
@@ -84,6 +85,7 @@ TOOL_REGISTRY: dict[str, BaseTool] = {
"browser_screenshot": BrowserScreenshotTool(),
# Sandboxed code execution (bubblewrap)
"bash_exec": BashExecTool(),
"connect_integration": ConnectIntegrationTool(),
# Persistent workspace tools (cloud storage, survives across sessions)
# Feature request tools
"search_feature_requests": SearchFeatureRequestsTool(),

View File

@@ -20,7 +20,9 @@ SSRF protection:
Requires:
npm install -g agent-browser
agent-browser install (downloads Chromium, one-time per machine)
agent-browser install (downloads Chromium, one-time — skipped in Docker
where system chromium is pre-installed and
AGENT_BROWSER_EXECUTABLE_PATH is set)
"""
import asyncio

View File

@@ -7,6 +7,7 @@ from typing import Any
from .helpers import (
AGENT_EXECUTOR_BLOCK_ID,
MCP_TOOL_BLOCK_ID,
SMART_DECISION_MAKER_BLOCK_ID,
AgentDict,
are_types_compatible,
generate_uuid,
@@ -30,6 +31,14 @@ _GET_CURRENT_DATE_BLOCK_ID = "b29c1b50-5d0e-4d9f-8f9d-1b0e6fcbf0b1"
_GMAIL_SEND_BLOCK_ID = "6c27abc2-e51d-499e-a85f-5a0041ba94f0"
_TEXT_REPLACE_BLOCK_ID = "7e7c87ab-3469-4bcc-9abe-67705091b713"
# Defaults applied to SmartDecisionMakerBlock nodes by the fixer.
_SDM_DEFAULTS: dict[str, int | bool] = {
"agent_mode_max_iterations": 10,
"conversation_compaction": True,
"retry": 3,
"multiple_tool_calls": False,
}
class AgentFixer:
"""
@@ -1630,6 +1639,43 @@ class AgentFixer:
return agent
def fix_smart_decision_maker_blocks(self, agent: AgentDict) -> AgentDict:
"""Fix SmartDecisionMakerBlock nodes to ensure agent-mode defaults.
Ensures:
1. ``agent_mode_max_iterations`` defaults to ``10`` (bounded agent mode)
2. ``conversation_compaction`` defaults to ``True``
3. ``retry`` defaults to ``3``
4. ``multiple_tool_calls`` defaults to ``False``
Args:
agent: The agent dictionary to fix
Returns:
The fixed agent dictionary
"""
nodes = agent.get("nodes", [])
for node in nodes:
if node.get("block_id") != SMART_DECISION_MAKER_BLOCK_ID:
continue
node_id = node.get("id", "unknown")
input_default = node.get("input_default")
if not isinstance(input_default, dict):
input_default = {}
node["input_default"] = input_default
for field, default_value in _SDM_DEFAULTS.items():
if field not in input_default or input_default[field] is None:
input_default[field] = default_value
self.add_fix_log(
f"SmartDecisionMakerBlock {node_id}: "
f"Set {field}={default_value!r}"
)
return agent
def fix_dynamic_block_sink_names(self, agent: AgentDict) -> AgentDict:
"""Fix links that use _#_ notation for dynamic block sink names.
@@ -1717,6 +1763,9 @@ class AgentFixer:
# Apply fixes for MCPToolBlock nodes
agent = self.fix_mcp_tool_blocks(agent)
# Apply fixes for SmartDecisionMakerBlock nodes (agent-mode defaults)
agent = self.fix_smart_decision_maker_blocks(agent)
# Apply fixes for AgentExecutorBlock nodes (sub-agents)
if library_agents:
agent = self.fix_agent_executor_blocks(agent, library_agents)

View File

@@ -12,6 +12,7 @@ __all__ = [
"AGENT_OUTPUT_BLOCK_ID",
"AgentDict",
"MCP_TOOL_BLOCK_ID",
"SMART_DECISION_MAKER_BLOCK_ID",
"UUID_REGEX",
"are_types_compatible",
"generate_uuid",
@@ -33,6 +34,7 @@ UUID_REGEX = re.compile(r"^" + UUID_RE_STR + r"$")
AGENT_EXECUTOR_BLOCK_ID = "e189baac-8c20-45a1-94a7-55177ea42565"
MCP_TOOL_BLOCK_ID = "a0a4b1c2-d3e4-4f56-a7b8-c9d0e1f2a3b4"
SMART_DECISION_MAKER_BLOCK_ID = "3b191d9f-356f-482d-8238-ba04b6d18381"
AGENT_INPUT_BLOCK_ID = "c0a8e994-ebf1-4a9c-a4d8-89d09c86741b"
AGENT_OUTPUT_BLOCK_ID = "363ae599-353e-4804-937e-b2ee3cef3da4"

View File

@@ -10,6 +10,7 @@ from .helpers import (
AGENT_INPUT_BLOCK_ID,
AGENT_OUTPUT_BLOCK_ID,
MCP_TOOL_BLOCK_ID,
SMART_DECISION_MAKER_BLOCK_ID,
AgentDict,
are_types_compatible,
get_defined_property_type,
@@ -181,15 +182,23 @@ class AgentValidator:
return valid
def _build_node_lookup(self, agent: AgentDict) -> dict[str, dict[str, Any]]:
"""Build a node-id → node dict from the agent's nodes."""
return {node.get("id", ""): node for node in agent.get("nodes", [])}
def validate_data_type_compatibility(
self, agent: AgentDict, blocks: list[dict[str, Any]]
self,
agent: AgentDict,
blocks: list[dict[str, Any]],
node_lookup: dict[str, dict[str, Any]] | None = None,
) -> bool:
"""
Validate that linked data types are compatible between source and sink.
Returns True if all data types are compatible, False otherwise.
"""
valid = True
node_lookup = {node.get("id", ""): node for node in agent.get("nodes", [])}
if node_lookup is None:
node_lookup = self._build_node_lookup(agent)
block_lookup = {block.get("id", ""): block for block in blocks}
for link in agent.get("links", []):
@@ -209,8 +218,8 @@ class AgentValidator:
valid = False
continue
source_node = node_lookup.get(source_id, "")
sink_node = node_lookup.get(sink_id, "")
source_node = node_lookup.get(source_id)
sink_node = node_lookup.get(sink_id)
if not source_node or not sink_node:
continue
@@ -248,7 +257,10 @@ class AgentValidator:
return valid
def validate_nested_sink_links(
self, agent: AgentDict, blocks: list[dict[str, Any]]
self,
agent: AgentDict,
blocks: list[dict[str, Any]],
node_lookup: dict[str, dict[str, Any]] | None = None,
) -> bool:
"""
Validate nested sink links (links with _#_ notation).
@@ -262,7 +274,8 @@ class AgentValidator:
block_names = {
block.get("id", ""): block.get("name", "Unknown Block") for block in blocks
}
node_lookup = {node.get("id", ""): node for node in agent.get("nodes", [])}
if node_lookup is None:
node_lookup = self._build_node_lookup(agent)
for link in agent.get("links", []):
sink_name = link.get("sink_name", "")
@@ -388,7 +401,10 @@ class AgentValidator:
return valid
def validate_source_output_existence(
self, agent: AgentDict, blocks: list[dict[str, Any]]
self,
agent: AgentDict,
blocks: list[dict[str, Any]],
node_lookup: dict[str, dict[str, Any]] | None = None,
) -> bool:
"""
Validate that all source_names in links exist in the corresponding
@@ -401,6 +417,7 @@ class AgentValidator:
Args:
agent: The agent dictionary to validate
blocks: List of available blocks with their schemas
node_lookup: Optional pre-built node-id → node dict
Returns:
True if all source output fields exist, False otherwise
@@ -415,7 +432,8 @@ class AgentValidator:
block_names = {
block.get("id", ""): block.get("name", "Unknown Block") for block in blocks
}
node_lookup = {node.get("id", ""): node for node in agent.get("nodes", [])}
if node_lookup is None:
node_lookup = self._build_node_lookup(agent)
for link in agent.get("links", []):
source_id = link.get("source_id")
@@ -809,6 +827,96 @@ class AgentValidator:
return valid
def validate_smart_decision_maker_blocks(
self,
agent: AgentDict,
node_lookup: dict[str, dict[str, Any]] | None = None,
) -> bool:
"""Validate that SmartDecisionMakerBlock nodes have downstream tools.
Checks that each SmartDecisionMakerBlock node has at least one link
with ``source_name == "tools"`` connecting to a downstream block.
Without tools, the block has nothing to call and will error at runtime.
Returns True if all SmartDecisionMakerBlock nodes are valid.
"""
valid = True
nodes = agent.get("nodes", [])
links = agent.get("links", [])
if node_lookup is None:
node_lookup = self._build_node_lookup(agent)
non_tool_block_ids = {AGENT_INPUT_BLOCK_ID, AGENT_OUTPUT_BLOCK_ID}
for node in nodes:
if node.get("block_id") != SMART_DECISION_MAKER_BLOCK_ID:
continue
node_id = node.get("id", "unknown")
customized_name = (node.get("metadata") or {}).get(
"customized_name", node_id
)
# Warn if agent_mode_max_iterations is 0 (traditional mode) —
# requires complex external conversation-history loop wiring
# that the agent generator does not produce.
input_default = node.get("input_default", {})
max_iter = input_default.get("agent_mode_max_iterations")
if max_iter is not None and not isinstance(max_iter, int):
self.add_error(
f"SmartDecisionMakerBlock node '{customized_name}' "
f"({node_id}) has non-integer "
f"agent_mode_max_iterations={max_iter!r}. "
f"This field must be an integer."
)
valid = False
elif isinstance(max_iter, int) and max_iter < -1:
self.add_error(
f"SmartDecisionMakerBlock node '{customized_name}' "
f"({node_id}) has invalid "
f"agent_mode_max_iterations={max_iter}. "
f"Use -1 for infinite or a positive number for "
f"bounded iterations."
)
valid = False
elif isinstance(max_iter, int) and max_iter > 100:
self.add_error(
f"SmartDecisionMakerBlock node '{customized_name}' "
f"({node_id}) has agent_mode_max_iterations="
f"{max_iter} which is unusually high. Values above "
f"100 risk excessive cost and long execution times. "
f"Consider using a lower value (3-10) or -1 for "
f"genuinely open-ended tasks."
)
valid = False
elif max_iter == 0:
self.add_error(
f"SmartDecisionMakerBlock node '{customized_name}' "
f"({node_id}) has agent_mode_max_iterations=0 "
f"(traditional mode). The agent generator only supports "
f"agent mode (set to -1 for infinite or a positive "
f"number for bounded iterations)."
)
valid = False
has_tools = any(
link.get("source_id") == node_id
and link.get("source_name") == "tools"
and node_lookup.get(link.get("sink_id", ""), {}).get("block_id")
not in non_tool_block_ids
for link in links
)
if not has_tools:
self.add_error(
f"SmartDecisionMakerBlock node '{customized_name}' "
f"({node_id}) has no downstream tool blocks connected. "
f"Connect at least one block to its 'tools' output so "
f"the AI has tools to call."
)
valid = False
return valid
def validate_mcp_tool_blocks(self, agent: AgentDict) -> bool:
"""Validate that MCPToolBlock nodes have required fields.
@@ -870,6 +978,9 @@ class AgentValidator:
logger.info("Validating agent...")
self.errors = []
# Build node lookup once and share across validation methods
node_lookup = self._build_node_lookup(agent)
checks = [
(
"Block existence",
@@ -885,15 +996,15 @@ class AgentValidator:
),
(
"Data type compatibility",
self.validate_data_type_compatibility(agent, blocks),
self.validate_data_type_compatibility(agent, blocks, node_lookup),
),
(
"Nested sink links",
self.validate_nested_sink_links(agent, blocks),
self.validate_nested_sink_links(agent, blocks, node_lookup),
),
(
"Source output existence",
self.validate_source_output_existence(agent, blocks),
self.validate_source_output_existence(agent, blocks, node_lookup),
),
(
"Prompt double curly braces spaces",
@@ -913,6 +1024,10 @@ class AgentValidator:
"MCP tool blocks",
self.validate_mcp_tool_blocks(agent),
),
(
"SmartDecisionMaker blocks",
self.validate_smart_decision_maker_blocks(agent, node_lookup),
),
]
# Add AgentExecutorBlock detailed validation if library_agents

View File

@@ -3,11 +3,11 @@
from __future__ import annotations
import logging
import re
from typing import TYPE_CHECKING, Literal
if TYPE_CHECKING:
from backend.api.features.library.model import LibraryAgent
from backend.api.features.store.model import StoreAgent, StoreAgentDetails
from backend.data.db_accessors import library_db, store_db
from backend.util.exceptions import DatabaseError, NotFoundError
@@ -19,16 +19,12 @@ from .models import (
NoResultsResponse,
ToolResponseBase,
)
from .utils import is_creator_slug, is_uuid
logger = logging.getLogger(__name__)
SearchSource = Literal["marketplace", "library"]
_UUID_PATTERN = re.compile(
r"^[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89ab][a-f0-9]{3}-[a-f0-9]{12}$",
re.IGNORECASE,
)
# Keywords that should be treated as "list all" rather than a literal search
_LIST_ALL_KEYWORDS = frozenset({"all", "*", "everything", "any", ""})
@@ -39,149 +35,160 @@ async def search_agents(
session_id: str | None = None,
user_id: str | None = None,
) -> ToolResponseBase:
"""
Search for agents in marketplace or user library.
"""Search for agents in marketplace or user library."""
if source == "marketplace":
return await _search_marketplace(query, session_id)
else:
return await _search_library(query, session_id, user_id)
For library searches, keywords like "all", "*", "everything", or an empty
query will list all agents without filtering.
Args:
query: Search query string. Special keywords list all library agents.
source: "marketplace" or "library"
session_id: Chat session ID
user_id: User ID (required for library search)
Returns:
AgentsFoundResponse, NoResultsResponse, or ErrorResponse
"""
# Normalize list-all keywords to empty string for library searches
if source == "library" and query.lower().strip() in _LIST_ALL_KEYWORDS:
query = ""
if source == "marketplace" and not query:
async def _search_marketplace(query: str, session_id: str | None) -> ToolResponseBase:
"""Search marketplace agents, with direct creator/slug lookup fallback."""
query = query.strip()
if not query:
return ErrorResponse(
message="Please provide a search query", session_id=session_id
)
if source == "library" and not user_id:
return ErrorResponse(
message="User authentication required to search library",
session_id=session_id,
)
agents: list[AgentInfo] = []
try:
if source == "marketplace":
# Direct lookup if query matches "creator/slug" pattern
if is_creator_slug(query):
logger.info(f"Query looks like creator/slug, trying direct lookup: {query}")
creator, slug = query.split("/", 1)
agent_info = await _get_marketplace_agent_by_slug(creator, slug)
if agent_info:
agents.append(agent_info)
if not agents:
logger.info(f"Searching marketplace for: {query}")
results = await store_db().get_store_agents(search_query=query, page_size=5)
for agent in results.agents:
agents.append(
AgentInfo(
id=f"{agent.creator}/{agent.slug}",
name=agent.agent_name,
description=agent.description or "",
source="marketplace",
in_library=False,
creator=agent.creator,
category="general",
rating=agent.rating,
runs=agent.runs,
is_featured=False,
)
)
else:
if _is_uuid(query):
logger.info(f"Query looks like UUID, trying direct lookup: {query}")
agent = await _get_library_agent_by_id(user_id, query) # type: ignore[arg-type]
if agent:
agents.append(agent)
logger.info(f"Found agent by direct ID lookup: {agent.name}")
if not agents:
search_term = query or None
logger.info(
f"{'Listing all agents in' if not query else 'Searching'} "
f"user library{'' if not query else f' for: {query}'}"
)
results = await library_db().list_library_agents(
user_id=user_id, # type: ignore[arg-type]
search_term=search_term,
page_size=50 if not query else 10,
)
for agent in results.agents:
agents.append(_library_agent_to_info(agent))
logger.info(f"Found {len(agents)} agents in {source}")
agents.append(_marketplace_agent_to_info(agent))
except NotFoundError:
pass
except DatabaseError as e:
logger.error(f"Error searching {source}: {e}", exc_info=True)
logger.error(f"Error searching marketplace: {e}", exc_info=True)
return ErrorResponse(
message=f"Failed to search {source}. Please try again.",
message="Failed to search marketplace. Please try again.",
error=str(e),
session_id=session_id,
)
if not agents:
if source == "marketplace":
suggestions = [
"Try more general terms",
"Browse categories in the marketplace",
"Check spelling",
]
no_results_msg = (
return NoResultsResponse(
message=(
f"No agents found matching '{query}'. Let the user know they can "
"try different keywords or browse the marketplace. Also let them "
"know you can create a custom agent for them based on their needs."
),
suggestions=[
"Try more general terms",
"Browse categories in the marketplace",
"Check spelling",
],
session_id=session_id,
)
return AgentsFoundResponse(
message=(
"Now you have found some options for the user to choose from. "
"You can add a link to a recommended agent at: /marketplace/agent/agent_id "
"Please ask the user if they would like to use any of these agents. "
"Let the user know we can create a custom agent for them based on their needs."
),
title=f"Found {len(agents)} agent{'s' if len(agents) != 1 else ''} for '{query}'",
agents=agents,
count=len(agents),
session_id=session_id,
)
async def _search_library(
query: str, session_id: str | None, user_id: str | None
) -> ToolResponseBase:
"""Search user's library agents, with direct UUID lookup fallback."""
if not user_id:
return ErrorResponse(
message="User authentication required to search library",
session_id=session_id,
)
query = query.strip()
# Normalize list-all keywords to empty string
if query.lower() in _LIST_ALL_KEYWORDS:
query = ""
agents: list[AgentInfo] = []
try:
if is_uuid(query):
logger.info(f"Query looks like UUID, trying direct lookup: {query}")
agent = await _get_library_agent_by_id(user_id, query)
if agent:
agents.append(agent)
if not agents:
logger.info(
f"{'Listing all agents in' if not query else 'Searching'} "
f"user library{'' if not query else f' for: {query}'}"
)
elif not query:
# User asked to list all but library is empty
suggestions = [
"Browse the marketplace to find and add agents",
"Use find_agent to search the marketplace",
]
no_results_msg = (
"Your library is empty. Let the user know they can browse the "
"marketplace to find agents, or you can create a custom agent "
"for them based on their needs."
results = await library_db().list_library_agents(
user_id=user_id,
search_term=query or None,
page_size=50 if not query else 10,
)
else:
suggestions = [
"Try different keywords",
"Use find_agent to search the marketplace",
"Check your library at /library",
]
no_results_msg = (
for agent in results.agents:
agents.append(_library_agent_to_info(agent))
except NotFoundError:
pass
except DatabaseError as e:
logger.error(f"Error searching library: {e}", exc_info=True)
return ErrorResponse(
message="Failed to search library. Please try again.",
error=str(e),
session_id=session_id,
)
if not agents:
if not query:
return NoResultsResponse(
message=(
"Your library is empty. Let the user know they can browse the "
"marketplace to find agents, or you can create a custom agent "
"for them based on their needs."
),
suggestions=[
"Browse the marketplace to find and add agents",
"Use find_agent to search the marketplace",
],
session_id=session_id,
)
return NoResultsResponse(
message=(
f"No agents matching '{query}' found in your library. Let the "
"user know you can create a custom agent for them based on "
"their needs."
)
return NoResultsResponse(
message=no_results_msg, session_id=session_id, suggestions=suggestions
),
suggestions=[
"Try different keywords",
"Use find_agent to search the marketplace",
"Check your library at /library",
],
session_id=session_id,
)
if source == "marketplace":
title = (
f"Found {len(agents)} agent{'s' if len(agents) != 1 else ''} for '{query}'"
)
elif not query:
if not query:
title = f"Found {len(agents)} agent{'s' if len(agents) != 1 else ''} in your library"
else:
title = f"Found {len(agents)} agent{'s' if len(agents) != 1 else ''} in your library for '{query}'"
message = (
"Now you have found some options for the user to choose from. "
"You can add a link to a recommended agent at: /marketplace/agent/agent_id "
"Please ask the user if they would like to use any of these agents. "
"Let the user know we can create a custom agent for them based on their needs."
if source == "marketplace"
else "Found agents in the user's library. You can provide a link to view "
"an agent at: /library/agents/{agent_id}. Use agent_output to get "
"execution results, or run_agent to execute. Let the user know we can "
"create a custom agent for them based on their needs."
)
return AgentsFoundResponse(
message=message,
message=(
"Found agents in the user's library. You can provide a link to view "
"an agent at: /library/agents/{agent_id}. Use agent_output to get "
"execution results, or run_agent to execute. Let the user know we can "
"create a custom agent for them based on their needs."
),
title=title,
agents=agents,
count=len(agents),
@@ -189,9 +196,20 @@ async def search_agents(
)
def _is_uuid(text: str) -> bool:
"""Check if text is a valid UUID v4."""
return bool(_UUID_PATTERN.match(text.strip()))
def _marketplace_agent_to_info(agent: StoreAgent | StoreAgentDetails) -> AgentInfo:
"""Convert a marketplace agent (StoreAgent or StoreAgentDetails) to an AgentInfo."""
return AgentInfo(
id=f"{agent.creator}/{agent.slug}",
name=agent.agent_name,
description=agent.description or "",
source="marketplace",
in_library=False,
creator=agent.creator,
category="general",
rating=agent.rating,
runs=agent.runs,
is_featured=False,
)
def _library_agent_to_info(agent: LibraryAgent) -> AgentInfo:
@@ -214,6 +232,23 @@ def _library_agent_to_info(agent: LibraryAgent) -> AgentInfo:
)
async def _get_marketplace_agent_by_slug(creator: str, slug: str) -> AgentInfo | None:
"""Fetch a marketplace agent by creator/slug identifier."""
try:
details = await store_db().get_store_agent_details(creator, slug)
return _marketplace_agent_to_info(details)
except NotFoundError:
pass
except DatabaseError:
raise
except Exception as e:
logger.warning(
f"Could not fetch marketplace agent {creator}/{slug}: {e}",
exc_info=True,
)
return None
async def _get_library_agent_by_id(user_id: str, agent_id: str) -> AgentInfo | None:
"""Fetch a library agent by ID (library agent ID or graph_id).
@@ -226,10 +261,9 @@ async def _get_library_agent_by_id(user_id: str, agent_id: str) -> AgentInfo | N
try:
agent = await lib_db.get_library_agent_by_graph_id(user_id, agent_id)
if agent:
logger.debug(f"Found library agent by graph_id: {agent.name}")
return _library_agent_to_info(agent)
except NotFoundError:
logger.debug(f"Library agent not found by graph_id: {agent_id}")
pass
except DatabaseError:
raise
except Exception as e:
@@ -241,10 +275,9 @@ async def _get_library_agent_by_id(user_id: str, agent_id: str) -> AgentInfo | N
try:
agent = await lib_db.get_library_agent(agent_id, user_id)
if agent:
logger.debug(f"Found library agent by library_id: {agent.name}")
return _library_agent_to_info(agent)
except NotFoundError:
logger.debug(f"Library agent not found by library_id: {agent_id}")
pass
except DatabaseError:
raise
except Exception as e:

View File

@@ -0,0 +1,170 @@
"""Tests for agent search direct lookup functionality."""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from .agent_search import search_agents
from .models import AgentsFoundResponse, NoResultsResponse
_TEST_USER_ID = "test-user-agent-search"
class TestMarketplaceSlugLookup:
"""Tests for creator/slug direct lookup in marketplace search."""
@pytest.mark.asyncio(loop_scope="session")
async def test_slug_lookup_found(self):
"""creator/slug query returns the agent directly."""
mock_details = MagicMock()
mock_details.creator = "testuser"
mock_details.slug = "my-agent"
mock_details.agent_name = "My Agent"
mock_details.description = "A test agent"
mock_details.rating = 4.5
mock_details.runs = 100
mock_store = MagicMock()
mock_store.get_store_agent_details = AsyncMock(return_value=mock_details)
with patch(
"backend.copilot.tools.agent_search.store_db",
return_value=mock_store,
):
response = await search_agents(
query="testuser/my-agent",
source="marketplace",
session_id="test-session",
)
assert isinstance(response, AgentsFoundResponse)
assert response.count == 1
assert response.agents[0].id == "testuser/my-agent"
assert response.agents[0].name == "My Agent"
@pytest.mark.asyncio(loop_scope="session")
async def test_slug_lookup_not_found_falls_back_to_search(self):
"""creator/slug not found falls back to general search."""
from backend.util.exceptions import NotFoundError
mock_store = MagicMock()
mock_store.get_store_agent_details = AsyncMock(side_effect=NotFoundError(""))
# Fallback search returns results
mock_search_results = MagicMock()
mock_agent = MagicMock()
mock_agent.creator = "other"
mock_agent.slug = "similar-agent"
mock_agent.agent_name = "Similar Agent"
mock_agent.description = "A similar agent"
mock_agent.rating = 3.0
mock_agent.runs = 50
mock_search_results.agents = [mock_agent]
mock_store.get_store_agents = AsyncMock(return_value=mock_search_results)
with patch(
"backend.copilot.tools.agent_search.store_db",
return_value=mock_store,
):
response = await search_agents(
query="testuser/my-agent",
source="marketplace",
session_id="test-session",
)
assert isinstance(response, AgentsFoundResponse)
assert response.count == 1
assert response.agents[0].id == "other/similar-agent"
@pytest.mark.asyncio(loop_scope="session")
async def test_slug_lookup_not_found_no_search_results(self):
"""creator/slug not found and search returns nothing."""
from backend.util.exceptions import NotFoundError
mock_store = MagicMock()
mock_store.get_store_agent_details = AsyncMock(side_effect=NotFoundError(""))
mock_search_results = MagicMock()
mock_search_results.agents = []
mock_store.get_store_agents = AsyncMock(return_value=mock_search_results)
with patch(
"backend.copilot.tools.agent_search.store_db",
return_value=mock_store,
):
response = await search_agents(
query="testuser/nonexistent",
source="marketplace",
session_id="test-session",
)
assert isinstance(response, NoResultsResponse)
@pytest.mark.asyncio(loop_scope="session")
async def test_non_slug_query_goes_to_search(self):
"""Regular keyword query skips slug lookup and goes to search."""
mock_store = MagicMock()
mock_search_results = MagicMock()
mock_agent = MagicMock()
mock_agent.creator = "creator1"
mock_agent.slug = "email-agent"
mock_agent.agent_name = "Email Agent"
mock_agent.description = "Sends emails"
mock_agent.rating = 4.0
mock_agent.runs = 200
mock_search_results.agents = [mock_agent]
mock_store.get_store_agents = AsyncMock(return_value=mock_search_results)
with patch(
"backend.copilot.tools.agent_search.store_db",
return_value=mock_store,
):
response = await search_agents(
query="email",
source="marketplace",
session_id="test-session",
)
assert isinstance(response, AgentsFoundResponse)
# get_store_agent_details should NOT have been called
mock_store.get_store_agent_details.assert_not_called()
class TestLibraryUUIDLookup:
"""Tests for UUID direct lookup in library search."""
@pytest.mark.asyncio(loop_scope="session")
async def test_uuid_lookup_found_by_graph_id(self):
"""UUID query matching a graph_id returns the agent directly."""
agent_id = "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"
mock_agent = MagicMock()
mock_agent.id = "lib-agent-id"
mock_agent.name = "My Library Agent"
mock_agent.description = "A library agent"
mock_agent.creator_name = "testuser"
mock_agent.status.value = "HEALTHY"
mock_agent.can_access_graph = True
mock_agent.has_external_trigger = False
mock_agent.new_output = False
mock_agent.graph_id = agent_id
mock_agent.graph_version = 1
mock_agent.input_schema = {}
mock_agent.output_schema = {}
mock_lib_db = MagicMock()
mock_lib_db.get_library_agent_by_graph_id = AsyncMock(return_value=mock_agent)
with patch(
"backend.copilot.tools.agent_search.library_db",
return_value=mock_lib_db,
):
response = await search_agents(
query=agent_id,
source="library",
session_id="test-session",
user_id=_TEST_USER_ID,
)
assert isinstance(response, AgentsFoundResponse)
assert response.count == 1
assert response.agents[0].name == "My Library Agent"

View File

@@ -164,8 +164,9 @@ class BaseTool:
"""
if self.requires_auth and not user_id:
logger.error(
f"Attempted tool call for {self.name} but user not authenticated"
logger.warning(
"Attempted tool call for %s but user not authenticated",
self.name,
)
return StreamToolOutputAvailable(
toolCallId=tool_call_id,
@@ -196,7 +197,7 @@ class BaseTool:
output=raw_output,
)
except Exception as e:
logger.error(f"Error in {self.name}: {e}", exc_info=True)
logger.warning("Error in %s", self.name, exc_info=True)
return StreamToolOutputAvailable(
toolCallId=tool_call_id,
toolName=self.name,

View File

@@ -22,6 +22,7 @@ from e2b import AsyncSandbox
from e2b.exceptions import TimeoutException
from backend.copilot.context import E2B_WORKDIR, get_current_sandbox
from backend.copilot.integration_creds import get_integration_env_vars
from backend.copilot.model import ChatSession
from .base import BaseTool
@@ -74,7 +75,10 @@ class BashExecTool(BaseTool):
@property
def requires_auth(self) -> bool:
return False
# True because _execute_on_e2b injects user tokens (GH_TOKEN etc.)
# when user_id is present. Defense-in-depth: ensures only authenticated
# users reach the token injection path.
return True
async def _execute(
self,
@@ -82,6 +86,14 @@ class BashExecTool(BaseTool):
session: ChatSession,
**kwargs: Any,
) -> ToolResponseBase:
"""Run a bash command on E2B (if available) or in a bubblewrap sandbox.
Dispatches to :meth:`_execute_on_e2b` when a sandbox is present in the
current execution context, otherwise falls back to the local bubblewrap
sandbox. Returns a :class:`BashExecResponse` on success or an
:class:`ErrorResponse` when the sandbox is unavailable or the command
is empty.
"""
session_id = session.session_id if session else None
command: str = (kwargs.get("command") or "").strip()
@@ -96,7 +108,9 @@ class BashExecTool(BaseTool):
sandbox = get_current_sandbox()
if sandbox is not None:
return await self._execute_on_e2b(sandbox, command, timeout, session_id)
return await self._execute_on_e2b(
sandbox, command, timeout, session_id, user_id
)
# Bubblewrap fallback: local isolated execution.
if not has_full_sandbox():
@@ -133,19 +147,42 @@ class BashExecTool(BaseTool):
command: str,
timeout: int,
session_id: str | None,
user_id: str | None = None,
) -> ToolResponseBase:
"""Execute *command* on the E2B sandbox via commands.run()."""
"""Execute *command* on the E2B sandbox via commands.run().
Integration tokens (e.g. GH_TOKEN) are injected into the sandbox env
for any user with connected accounts. E2B has full internet access, so
CLI tools like ``gh`` work without manual authentication.
"""
envs: dict[str, str] = {
"PATH": "/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin",
}
# Collect injected secret values so we can scrub them from output.
secret_values: list[str] = []
if user_id is not None:
integration_env = await get_integration_env_vars(user_id)
secret_values = [v for v in integration_env.values() if v]
envs.update(integration_env)
try:
result = await sandbox.commands.run(
f"bash -c {shlex.quote(command)}",
cwd=E2B_WORKDIR,
timeout=timeout,
envs={"PATH": "/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin"},
envs=envs,
)
stdout = result.stdout or ""
stderr = result.stderr or ""
# Scrub injected tokens from command output to prevent exfiltration
# via `echo $GH_TOKEN`, `env`, `printenv`, etc.
for secret in secret_values:
stdout = stdout.replace(secret, "[REDACTED]")
stderr = stderr.replace(secret, "[REDACTED]")
return BashExecResponse(
message=f"Command executed on E2B (exit {result.exit_code})",
stdout=result.stdout or "",
stderr=result.stderr or "",
stdout=stdout,
stderr=stderr,
exit_code=result.exit_code,
timed_out=False,
session_id=session_id,

View File

@@ -0,0 +1,78 @@
"""Tests for BashExecTool — E2B path with token injection."""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from ._test_data import make_session
from .bash_exec import BashExecTool
from .models import BashExecResponse
_USER = "user-bash-exec-test"
def _make_tool() -> BashExecTool:
return BashExecTool()
def _make_sandbox(exit_code: int = 0, stdout: str = "", stderr: str = "") -> MagicMock:
result = MagicMock()
result.exit_code = exit_code
result.stdout = stdout
result.stderr = stderr
sandbox = MagicMock()
sandbox.commands.run = AsyncMock(return_value=result)
return sandbox
class TestBashExecE2BTokenInjection:
@pytest.mark.asyncio(loop_scope="session")
async def test_token_injected_when_user_id_set(self):
"""When user_id is provided, integration env vars are merged into sandbox envs."""
tool = _make_tool()
session = make_session(user_id=_USER)
sandbox = _make_sandbox(stdout="ok")
env_vars = {"GH_TOKEN": "gh-secret", "GITHUB_TOKEN": "gh-secret"}
with patch(
"backend.copilot.tools.bash_exec.get_integration_env_vars",
new=AsyncMock(return_value=env_vars),
) as mock_get_env:
result = await tool._execute_on_e2b(
sandbox=sandbox,
command="echo hi",
timeout=10,
session_id=session.session_id,
user_id=_USER,
)
mock_get_env.assert_awaited_once_with(_USER)
call_kwargs = sandbox.commands.run.call_args[1]
assert call_kwargs["envs"]["GH_TOKEN"] == "gh-secret"
assert call_kwargs["envs"]["GITHUB_TOKEN"] == "gh-secret"
assert isinstance(result, BashExecResponse)
@pytest.mark.asyncio(loop_scope="session")
async def test_no_token_injection_when_user_id_is_none(self):
"""When user_id is None, get_integration_env_vars must NOT be called."""
tool = _make_tool()
session = make_session(user_id=_USER)
sandbox = _make_sandbox(stdout="ok")
with patch(
"backend.copilot.tools.bash_exec.get_integration_env_vars",
new=AsyncMock(return_value={"GH_TOKEN": "should-not-appear"}),
) as mock_get_env:
result = await tool._execute_on_e2b(
sandbox=sandbox,
command="echo hi",
timeout=10,
session_id=session.session_id,
user_id=None,
)
mock_get_env.assert_not_called()
call_kwargs = sandbox.commands.run.call_args[1]
assert "GH_TOKEN" not in call_kwargs["envs"]
assert isinstance(result, BashExecResponse)

View File

@@ -0,0 +1,196 @@
"""Tool for prompting the user to connect a required integration.
When the copilot encounters an authentication failure (e.g. `gh` CLI returns
"authentication required"), it calls this tool to surface the credentials
setup card in the chat — the same UI that appears when a GitHub block runs
without configured credentials.
"""
from typing import Any, TypedDict
from backend.copilot.model import ChatSession
from backend.copilot.providers import SUPPORTED_PROVIDERS, get_provider_auth_types
from backend.copilot.tools.models import (
ErrorResponse,
ResponseType,
SetupInfo,
SetupRequirementsResponse,
ToolResponseBase,
UserReadiness,
)
from .base import BaseTool
class _CredentialEntry(TypedDict):
"""Shape of each entry inside SetupRequirementsResponse.user_readiness.missing_credentials.
Partially overlaps with :class:`~backend.data.model.CredentialsMetaInput`
(``id``, ``title``, ``provider``) but carries extra UI-facing fields
(``types``, ``scopes``) that the frontend ``SetupRequirementsCard`` needs
to render the inline credential setup card.
Display name is derived from :data:`SUPPORTED_PROVIDERS` at build time
rather than stored here — eliminates the old ``provider_name`` field.
``types`` replaces the old singular ``type`` field; the frontend already
prefers ``types`` and only fell back to ``type`` for compatibility.
"""
id: str
title: str
# Slug used as the credential key (e.g. "github").
provider: str
# All supported credential types the user can choose from (e.g. ["api_key", "oauth2"]).
# The first element is the default/primary type.
types: list[str]
scopes: list[str]
class ConnectIntegrationTool(BaseTool):
"""Surface the credentials setup UI when an integration is not connected."""
@property
def name(self) -> str:
return "connect_integration"
@property
def description(self) -> str:
return (
"Prompt the user to connect a required integration (e.g. GitHub). "
"Call this when an external CLI or API call fails because the user "
"has not connected the relevant account. "
"The tool surfaces a credentials setup card in the chat so the user "
"can authenticate without leaving the page. "
"After the user connects the account, retry the operation. "
"In E2B/cloud sandbox mode the token (GH_TOKEN/GITHUB_TOKEN) is "
"automatically injected per-command in bash_exec — no manual export needed. "
"In local bubblewrap mode network is isolated so GitHub CLI commands "
"will still fail after connecting; inform the user of this limitation."
)
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"provider": {
"type": "string",
"description": (
"Integration provider slug, e.g. 'github'. "
"Must be one of the supported providers."
),
"enum": list(SUPPORTED_PROVIDERS.keys()),
},
"reason": {
"type": "string",
"description": (
"Brief explanation of why the integration is needed, "
"shown to the user in the setup card."
),
"maxLength": 500,
},
"scopes": {
"type": "array",
"items": {"type": "string"},
"description": (
"OAuth scopes to request. Omit to use the provider default. "
"Add extra scopes when you need more access — e.g. for GitHub: "
"'repo' (clone/push/pull), 'read:org' (org membership), "
"'workflow' (GitHub Actions). "
"Requesting only the scopes you actually need is best practice."
),
},
},
"required": ["provider"],
}
@property
def requires_auth(self) -> bool:
# Require auth so only authenticated users can trigger the setup card.
# The card itself is user-agnostic (no per-user data needed), so
# user_id is intentionally unused in _execute.
return True
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs: Any,
) -> ToolResponseBase:
"""Build and return a :class:`SetupRequirementsResponse` for the requested provider.
Validates the *provider* slug against the known registry, merges any
agent-requested OAuth *scopes* with the provider defaults, and constructs
the credential setup card payload that the frontend renders as an inline
authentication prompt.
Returns an :class:`ErrorResponse` if *provider* is unknown.
"""
_ = user_id # setup card is user-agnostic; auth is enforced via requires_auth
session_id = session.session_id if session else None
provider: str = (kwargs.get("provider") or "").strip().lower()
reason: str = (kwargs.get("reason") or "").strip()[
:500
] # cap LLM-controlled text
extra_scopes: list[str] = [
str(s).strip() for s in (kwargs.get("scopes") or []) if str(s).strip()
]
entry = SUPPORTED_PROVIDERS.get(provider)
if not entry:
supported = ", ".join(f"'{p}'" for p in SUPPORTED_PROVIDERS)
return ErrorResponse(
message=(
f"Unknown provider '{provider}'. "
f"Supported providers: {supported}."
),
error="unknown_provider",
session_id=session_id,
)
display_name: str = entry["name"]
supported_types: list[str] = get_provider_auth_types(provider)
# Merge agent-requested scopes with provider defaults (deduplicated, order preserved).
default_scopes: list[str] = entry["default_scopes"]
seen: set[str] = set()
scopes: list[str] = []
for s in default_scopes + extra_scopes:
if s not in seen:
seen.add(s)
scopes.append(s)
field_key = f"{provider}_credentials"
message_parts = [
f"To continue, please connect your {display_name} account.",
]
if reason:
message_parts.append(reason)
credential_entry: _CredentialEntry = {
"id": field_key,
"title": f"{display_name} Credentials",
"provider": provider,
"types": supported_types,
"scopes": scopes,
}
missing_credentials: dict[str, _CredentialEntry] = {field_key: credential_entry}
return SetupRequirementsResponse(
type=ResponseType.SETUP_REQUIREMENTS,
message=" ".join(message_parts),
session_id=session_id,
setup_info=SetupInfo(
agent_id=f"connect_{provider}",
agent_name=display_name,
user_readiness=UserReadiness(
has_all_credentials=False,
missing_credentials=missing_credentials,
ready_to_run=False,
),
requirements={
"credentials": [missing_credentials[field_key]],
"inputs": [],
"execution_modes": [],
},
),
)

View File

@@ -0,0 +1,135 @@
"""Tests for ConnectIntegrationTool."""
import pytest
from ._test_data import make_session
from .connect_integration import ConnectIntegrationTool
from .models import ErrorResponse, SetupRequirementsResponse
_TEST_USER_ID = "test-user-connect-integration"
class TestConnectIntegrationTool:
def _make_tool(self) -> ConnectIntegrationTool:
return ConnectIntegrationTool()
@pytest.mark.asyncio(loop_scope="session")
async def test_unknown_provider_returns_error(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="nonexistent"
)
assert isinstance(result, ErrorResponse)
assert result.error == "unknown_provider"
assert "nonexistent" in result.message
assert "github" in result.message
@pytest.mark.asyncio(loop_scope="session")
async def test_empty_provider_returns_error(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider=""
)
assert isinstance(result, ErrorResponse)
assert result.error == "unknown_provider"
@pytest.mark.asyncio(loop_scope="session")
async def test_github_provider_returns_setup_response(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="github"
)
assert isinstance(result, SetupRequirementsResponse)
assert result.setup_info.agent_name == "GitHub"
assert result.setup_info.agent_id == "connect_github"
@pytest.mark.asyncio(loop_scope="session")
async def test_github_has_missing_credentials_in_readiness(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="github"
)
assert isinstance(result, SetupRequirementsResponse)
readiness = result.setup_info.user_readiness
assert readiness.has_all_credentials is False
assert readiness.ready_to_run is False
assert "github_credentials" in readiness.missing_credentials
@pytest.mark.asyncio(loop_scope="session")
async def test_github_requirements_include_credential_entry(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="github"
)
assert isinstance(result, SetupRequirementsResponse)
creds = result.setup_info.requirements["credentials"]
assert len(creds) == 1
assert creds[0]["provider"] == "github"
assert creds[0]["id"] == "github_credentials"
@pytest.mark.asyncio(loop_scope="session")
async def test_reason_appears_in_message(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
reason = "Needed to create a pull request."
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="github", reason=reason
)
assert isinstance(result, SetupRequirementsResponse)
assert reason in result.message
@pytest.mark.asyncio(loop_scope="session")
async def test_session_id_propagated(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="github"
)
assert isinstance(result, SetupRequirementsResponse)
assert result.session_id == session.session_id
@pytest.mark.asyncio(loop_scope="session")
async def test_provider_case_insensitive(self):
"""Provider slug is normalised to lowercase before lookup."""
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="GitHub"
)
assert isinstance(result, SetupRequirementsResponse)
def test_tool_name(self):
assert ConnectIntegrationTool().name == "connect_integration"
def test_requires_auth(self):
assert ConnectIntegrationTool().requires_auth is True
@pytest.mark.asyncio(loop_scope="session")
async def test_unauthenticated_user_gets_need_login_response(self):
"""execute() with user_id=None must return NeedLoginResponse, not the setup card.
This verifies that the requires_auth guard in BaseTool.execute() fires
before _execute() is called, so unauthenticated callers cannot probe
which integrations are configured.
"""
import json
tool = self._make_tool()
# Session still needs a user_id string; the None is passed to execute()
# to simulate an unauthenticated call.
session = make_session(user_id=_TEST_USER_ID)
result = await tool.execute(
user_id=None,
session=session,
tool_call_id="test-call-id",
provider="github",
)
raw = result.output
output = json.loads(raw) if isinstance(raw, str) else raw
assert output.get("type") == "need_login"
assert result.success is False

View File

@@ -41,8 +41,7 @@ import contextlib
import logging
from typing import Any, Awaitable, Callable, Literal
from e2b import AsyncSandbox
from e2b.sandbox.sandbox_api import SandboxLifecycle
from e2b import AsyncSandbox, SandboxLifecycle
from backend.data.redis_client import get_redis_async

View File

@@ -19,7 +19,8 @@ class FindAgentTool(BaseTool):
@property
def description(self) -> str:
return (
"Discover agents from the marketplace based on capabilities and user needs."
"Discover agents from the marketplace based on capabilities and "
"user needs, or look up a specific agent by its creator/slug ID."
)
@property
@@ -29,7 +30,7 @@ class FindAgentTool(BaseTool):
"properties": {
"query": {
"type": "string",
"description": "Search query describing what the user wants to accomplish. Use single keywords for best results.",
"description": "Search query describing what the user wants to accomplish, or a creator/slug ID (e.g. 'username/agent-name') for direct lookup. Use single keywords for best results.",
},
},
"required": ["query"],
@@ -38,6 +39,7 @@ class FindAgentTool(BaseTool):
async def _execute(
self, user_id: str | None, session: ChatSession, **kwargs
) -> ToolResponseBase:
"""Search marketplace for agents matching the query."""
return await search_agents(
query=kwargs.get("query", "").strip(),
source="marketplace",

View File

@@ -15,6 +15,7 @@ from .models import (
ErrorResponse,
NoResultsResponse,
)
from .utils import is_uuid
logger = logging.getLogger(__name__)
@@ -37,7 +38,8 @@ COPILOT_EXCLUDED_BLOCK_TYPES = {
# Specific block IDs excluded from CoPilot (STANDARD type but still require graph context)
COPILOT_EXCLUDED_BLOCK_IDS = {
# SmartDecisionMakerBlock - dynamically discovers downstream blocks via graph topology
# SmartDecisionMakerBlock - dynamically discovers downstream blocks via graph topology;
# usable in agent graphs (guide hardcodes its ID) but cannot run standalone.
"3b191d9f-356f-482d-8238-ba04b6d18381",
}
@@ -52,7 +54,8 @@ class FindBlockTool(BaseTool):
@property
def description(self) -> str:
return (
"Search for available blocks by name or description. "
"Search for available blocks by name or description, or look up a "
"specific block by its ID. "
"Blocks are reusable components that perform specific tasks like "
"sending emails, making API calls, processing text, etc. "
"IMPORTANT: Use this tool FIRST to get the block's 'id' before calling run_block. "
@@ -68,7 +71,8 @@ class FindBlockTool(BaseTool):
"query": {
"type": "string",
"description": (
"Search query to find blocks by name or description. "
"Search query to find blocks by name or description, "
"or a block ID (UUID) for direct lookup. "
"Use keywords like 'email', 'http', 'text', 'ai', etc."
),
},
@@ -113,11 +117,77 @@ class FindBlockTool(BaseTool):
if not query:
return ErrorResponse(
message="Please provide a search query",
message="Please provide a search query or block ID",
session_id=session_id,
)
try:
# Direct ID lookup if query looks like a UUID
if is_uuid(query):
block = get_block(query.lower())
if block:
if block.disabled:
return NoResultsResponse(
message=f"Block '{block.name}' (ID: {block.id}) is disabled and cannot be used.",
suggestions=["Search for an alternative block by name"],
session_id=session_id,
)
if (
block.block_type in COPILOT_EXCLUDED_BLOCK_TYPES
or block.id in COPILOT_EXCLUDED_BLOCK_IDS
):
if block.block_type == BlockType.MCP_TOOL:
return NoResultsResponse(
message=(
f"Block '{block.name}' (ID: {block.id}) is not "
"runnable through find_block/run_block. Use "
"run_mcp_tool instead."
),
suggestions=[
"Use run_mcp_tool to discover and run this MCP tool",
"Search for an alternative block by name",
],
session_id=session_id,
)
return NoResultsResponse(
message=(
f"Block '{block.name}' (ID: {block.id}) is not available "
"in CoPilot. It can only be used within agent graphs."
),
suggestions=[
"Search for an alternative block by name",
"Use this block in an agent graph instead",
],
session_id=session_id,
)
summary = BlockInfoSummary(
id=block.id,
name=block.name,
description=(
block.optimized_description or block.description or ""
),
categories=[c.value for c in block.categories],
)
if include_schemas:
info = block.get_info()
summary.input_schema = info.inputSchema
summary.output_schema = info.outputSchema
summary.static_output = info.staticOutput
return BlockListResponse(
message=(
f"Found block '{block.name}' by ID. "
"To see inputs/outputs and execute it, use "
"run_block with the block's 'id' - providing "
"no inputs."
),
blocks=[summary],
count=1,
query=query,
session_id=session_id,
)
# Search for blocks using hybrid search
results, total = await search().unified_hybrid_search(
query=query,

View File

@@ -499,3 +499,123 @@ class TestFindBlockFiltering:
assert response.blocks[0].input_schema == input_schema
assert response.blocks[0].output_schema == output_schema
assert response.blocks[0].static_output is True
class TestFindBlockDirectLookup:
"""Tests for direct UUID lookup in FindBlockTool."""
@pytest.mark.asyncio(loop_scope="session")
async def test_uuid_lookup_found(self):
"""UUID query returns the block directly without search."""
session = make_session(user_id=_TEST_USER_ID)
block_id = "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"
block = make_mock_block(block_id, "Test Block", BlockType.STANDARD)
with patch(
"backend.copilot.tools.find_block.get_block",
return_value=block,
):
tool = FindBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID, session=session, query=block_id
)
assert isinstance(response, BlockListResponse)
assert response.count == 1
assert response.blocks[0].id == block_id
assert response.blocks[0].name == "Test Block"
@pytest.mark.asyncio(loop_scope="session")
async def test_uuid_lookup_not_found_falls_through(self):
"""UUID that doesn't match any block falls through to search."""
session = make_session(user_id=_TEST_USER_ID)
block_id = "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"
mock_search_db = MagicMock()
mock_search_db.unified_hybrid_search = AsyncMock(return_value=([], 0))
with (
patch(
"backend.copilot.tools.find_block.get_block",
return_value=None,
),
patch(
"backend.copilot.tools.find_block.search",
return_value=mock_search_db,
),
):
tool = FindBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID, session=session, query=block_id
)
from .models import NoResultsResponse
assert isinstance(response, NoResultsResponse)
@pytest.mark.asyncio(loop_scope="session")
async def test_uuid_lookup_disabled_block(self):
"""UUID matching a disabled block returns NoResultsResponse."""
session = make_session(user_id=_TEST_USER_ID)
block_id = "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"
block = make_mock_block(
block_id, "Disabled Block", BlockType.STANDARD, disabled=True
)
with patch(
"backend.copilot.tools.find_block.get_block",
return_value=block,
):
tool = FindBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID, session=session, query=block_id
)
from .models import NoResultsResponse
assert isinstance(response, NoResultsResponse)
assert "disabled" in response.message.lower()
@pytest.mark.asyncio(loop_scope="session")
async def test_uuid_lookup_excluded_block_type(self):
"""UUID matching an excluded block type returns NoResultsResponse."""
session = make_session(user_id=_TEST_USER_ID)
block_id = "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"
block = make_mock_block(block_id, "Input Block", BlockType.INPUT)
with patch(
"backend.copilot.tools.find_block.get_block",
return_value=block,
):
tool = FindBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID, session=session, query=block_id
)
from .models import NoResultsResponse
assert isinstance(response, NoResultsResponse)
assert "not available" in response.message.lower()
@pytest.mark.asyncio(loop_scope="session")
async def test_uuid_lookup_excluded_block_id(self):
"""UUID matching an excluded block ID returns NoResultsResponse."""
session = make_session(user_id=_TEST_USER_ID)
smart_decision_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
block = make_mock_block(
smart_decision_id, "Smart Decision Maker", BlockType.STANDARD
)
with patch(
"backend.copilot.tools.find_block.get_block",
return_value=block,
):
tool = FindBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID, session=session, query=smart_decision_id
)
from .models import NoResultsResponse
assert isinstance(response, NoResultsResponse)
assert "not available" in response.message.lower()

View File

@@ -1,6 +1,7 @@
"""Shared utilities for chat tools."""
import logging
import re
from typing import Any
from backend.api.features.library import model as library_model
@@ -19,6 +20,26 @@ from backend.util.exceptions import NotFoundError
logger = logging.getLogger(__name__)
# Shared UUID v4 pattern used by multiple tools for direct ID lookups.
_UUID_V4_PATTERN = re.compile(
r"^[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89ab][a-f0-9]{3}-[a-f0-9]{12}$",
re.IGNORECASE,
)
def is_uuid(text: str) -> bool:
"""Check if text is a valid UUID v4."""
return bool(_UUID_V4_PATTERN.match(text.strip()))
# Matches "creator/slug" identifiers used in the marketplace
_CREATOR_SLUG_PATTERN = re.compile(r"^[\w-]+/[\w-]+$")
def is_creator_slug(text: str) -> bool:
"""Check if text matches a 'creator/slug' marketplace identifier."""
return bool(_CREATOR_SLUG_PATTERN.match(text.strip()))
async def fetch_graph_from_store_slug(
username: str,

View File

@@ -2,6 +2,7 @@
import base64
import logging
import mimetypes
import os
from typing import Any, Optional
@@ -10,7 +11,9 @@ from pydantic import BaseModel
from backend.copilot.context import (
E2B_WORKDIR,
get_current_sandbox,
get_sdk_cwd,
get_workspace_manager,
is_allowed_local_path,
resolve_sandbox_path,
)
from backend.copilot.model import ChatSession
@@ -24,6 +27,10 @@ from .models import ErrorResponse, ResponseType, ToolResponseBase
logger = logging.getLogger(__name__)
# Sentinel file_id used when a tool-result file is read directly from the local
# host filesystem (rather than from workspace storage).
_LOCAL_TOOL_RESULT_FILE_ID = "local"
async def _resolve_write_content(
content_text: str | None,
@@ -275,6 +282,93 @@ class WorkspaceFileContentResponse(ToolResponseBase):
content_base64: str
_MAX_LOCAL_TOOL_RESULT_BYTES = 10 * 1024 * 1024 # 10 MB
def _read_local_tool_result(
path: str,
char_offset: int,
char_length: Optional[int],
session_id: str,
sdk_cwd: str | None = None,
) -> ToolResponseBase:
"""Read an SDK tool-result file from local disk.
This is a fallback for when the model mistakenly calls
``read_workspace_file`` with an SDK tool-result path that only exists on
the host filesystem, not in cloud workspace storage.
Defence-in-depth: validates *path* via :func:`is_allowed_local_path`
regardless of what the caller has already checked.
"""
# TOCTOU: path validated then opened separately. Acceptable because
# the tool-results directory is server-controlled, not user-writable.
expanded = os.path.realpath(os.path.expanduser(path))
# Defence-in-depth: re-check with resolved path (caller checked raw path).
if not is_allowed_local_path(expanded, sdk_cwd or get_sdk_cwd()):
return ErrorResponse(
message=f"Path not allowed: {os.path.basename(path)}", session_id=session_id
)
try:
# The 10 MB cap (_MAX_LOCAL_TOOL_RESULT_BYTES) bounds memory usage.
# Pre-read size check prevents loading files far above the cap;
# the remaining TOCTOU gap is acceptable for server-controlled paths.
file_size = os.path.getsize(expanded)
if file_size > _MAX_LOCAL_TOOL_RESULT_BYTES:
return ErrorResponse(
message=(f"File too large: {os.path.basename(path)}"),
session_id=session_id,
)
# Detect binary files: try strict UTF-8 first, fall back to
# base64-encoding the raw bytes for binary content.
with open(expanded, "rb") as fh:
raw = fh.read()
try:
text_content = raw.decode("utf-8")
except UnicodeDecodeError:
# Binary file — return raw base64, ignore char_offset/char_length
return WorkspaceFileContentResponse(
file_id=_LOCAL_TOOL_RESULT_FILE_ID,
name=os.path.basename(path),
path=path,
mime_type=mimetypes.guess_type(path)[0] or "application/octet-stream",
content_base64=base64.b64encode(raw).decode("ascii"),
message=(
f"Read {file_size:,} bytes (binary) from local tool-result "
f"{os.path.basename(path)}"
),
session_id=session_id,
)
end = (
char_offset + char_length if char_length is not None else len(text_content)
)
slice_text = text_content[char_offset:end]
except FileNotFoundError:
return ErrorResponse(
message=f"File not found: {os.path.basename(path)}", session_id=session_id
)
except Exception as exc:
return ErrorResponse(
message=f"Error reading file: {type(exc).__name__}", session_id=session_id
)
return WorkspaceFileContentResponse(
file_id=_LOCAL_TOOL_RESULT_FILE_ID,
name=os.path.basename(path),
path=path,
mime_type=mimetypes.guess_type(path)[0] or "text/plain",
content_base64=base64.b64encode(slice_text.encode("utf-8")).decode("ascii"),
message=(
f"Read chars {char_offset}\u2013{char_offset + len(slice_text)} "
f"of {len(text_content):,} chars from local tool-result "
f"{os.path.basename(path)}"
),
session_id=session_id,
)
class WorkspaceFileMetadataResponse(ToolResponseBase):
"""Response containing workspace file metadata and download URL (prevents context bloat)."""
@@ -533,6 +627,14 @@ class ReadWorkspaceFileTool(BaseTool):
manager = await get_workspace_manager(user_id, session_id)
resolved = await _resolve_file(manager, file_id, path, session_id)
if isinstance(resolved, ErrorResponse):
# Fallback: if the path is an SDK tool-result on local disk,
# read it directly instead of failing. The model sometimes
# calls read_workspace_file for these paths by mistake.
sdk_cwd = get_sdk_cwd()
if path and is_allowed_local_path(path, sdk_cwd):
return _read_local_tool_result(
path, char_offset, char_length, session_id, sdk_cwd=sdk_cwd
)
return resolved
target_file_id, file_info = resolved

View File

@@ -2,18 +2,25 @@
import base64
import os
import shutil
from unittest.mock import AsyncMock, patch
import pytest
from backend.copilot.context import SDK_PROJECTS_DIR, _current_project_dir
from backend.copilot.tools._test_data import make_session, setup_test_data
from backend.copilot.tools.models import ErrorResponse
from backend.copilot.tools.workspace_files import (
_MAX_LOCAL_TOOL_RESULT_BYTES,
DeleteWorkspaceFileTool,
ListWorkspaceFilesTool,
ReadWorkspaceFileTool,
WorkspaceDeleteResponse,
WorkspaceFileContentResponse,
WorkspaceFileListResponse,
WorkspaceWriteResponse,
WriteWorkspaceFileTool,
_read_local_tool_result,
_resolve_write_content,
_validate_ephemeral_path,
)
@@ -325,3 +332,294 @@ async def test_write_workspace_file_source_path(setup_test_data):
await delete_tool._execute(
user_id=user.id, session=session, file_id=write_resp.file_id
)
# ---------------------------------------------------------------------------
# _read_local_tool_result — local disk fallback for SDK tool-result files
# ---------------------------------------------------------------------------
_CONV_UUID = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
class TestReadLocalToolResult:
"""Tests for _read_local_tool_result (local disk fallback)."""
def _make_tool_result(self, encoded: str, filename: str, content: bytes) -> str:
"""Create a tool-results file and return its path."""
tool_dir = os.path.join(SDK_PROJECTS_DIR, encoded, _CONV_UUID, "tool-results")
os.makedirs(tool_dir, exist_ok=True)
filepath = os.path.join(tool_dir, filename)
with open(filepath, "wb") as f:
f.write(content)
return filepath
def _cleanup(self, encoded: str) -> None:
shutil.rmtree(os.path.join(SDK_PROJECTS_DIR, encoded), ignore_errors=True)
def test_read_text_file(self):
"""Read a UTF-8 text tool-result file."""
encoded = "-tmp-copilot-local-read-text"
path = self._make_tool_result(encoded, "output.txt", b"hello world")
token = _current_project_dir.set(encoded)
try:
result = _read_local_tool_result(path, 0, None, "s1")
assert isinstance(result, WorkspaceFileContentResponse)
decoded = base64.b64decode(result.content_base64).decode("utf-8")
assert decoded == "hello world"
assert "text/plain" in result.mime_type
finally:
_current_project_dir.reset(token)
self._cleanup(encoded)
def test_read_text_with_offset(self):
"""Read a slice of a text file using char_offset and char_length."""
encoded = "-tmp-copilot-local-read-offset"
path = self._make_tool_result(encoded, "data.txt", b"ABCDEFGHIJ")
token = _current_project_dir.set(encoded)
try:
result = _read_local_tool_result(path, 3, 4, "s1")
assert isinstance(result, WorkspaceFileContentResponse)
decoded = base64.b64decode(result.content_base64).decode("utf-8")
assert decoded == "DEFG"
finally:
_current_project_dir.reset(token)
self._cleanup(encoded)
def test_read_binary_file(self):
"""Binary files are returned as raw base64."""
encoded = "-tmp-copilot-local-read-binary"
binary_data = bytes(range(256))
path = self._make_tool_result(encoded, "image.png", binary_data)
token = _current_project_dir.set(encoded)
try:
result = _read_local_tool_result(path, 0, None, "s1")
assert isinstance(result, WorkspaceFileContentResponse)
decoded = base64.b64decode(result.content_base64)
assert decoded == binary_data
assert "binary" in result.message
finally:
_current_project_dir.reset(token)
self._cleanup(encoded)
def test_disallowed_path_rejected(self):
"""Paths not under allowed directories are rejected."""
result = _read_local_tool_result("/etc/passwd", 0, None, "s1")
assert isinstance(result, ErrorResponse)
assert "not allowed" in result.message.lower()
def test_file_not_found(self):
"""Missing files return an error."""
encoded = "-tmp-copilot-local-read-missing"
tool_dir = os.path.join(SDK_PROJECTS_DIR, encoded, _CONV_UUID, "tool-results")
os.makedirs(tool_dir, exist_ok=True)
path = os.path.join(tool_dir, "nope.txt")
token = _current_project_dir.set(encoded)
try:
result = _read_local_tool_result(path, 0, None, "s1")
assert isinstance(result, ErrorResponse)
assert "not found" in result.message.lower()
finally:
_current_project_dir.reset(token)
self._cleanup(encoded)
def test_file_too_large(self, monkeypatch):
"""Files exceeding the size limit are rejected."""
encoded = "-tmp-copilot-local-read-large"
# Create a small file but fake os.path.getsize to return a huge value
path = self._make_tool_result(encoded, "big.txt", b"small")
token = _current_project_dir.set(encoded)
monkeypatch.setattr(
"os.path.getsize", lambda _: _MAX_LOCAL_TOOL_RESULT_BYTES + 1
)
try:
result = _read_local_tool_result(path, 0, None, "s1")
assert isinstance(result, ErrorResponse)
assert "too large" in result.message.lower()
finally:
_current_project_dir.reset(token)
self._cleanup(encoded)
def test_offset_beyond_file_length(self):
"""Offset past end-of-file returns empty content."""
encoded = "-tmp-copilot-local-read-past-eof"
path = self._make_tool_result(encoded, "short.txt", b"abc")
token = _current_project_dir.set(encoded)
try:
result = _read_local_tool_result(path, 999, 10, "s1")
assert isinstance(result, WorkspaceFileContentResponse)
decoded = base64.b64decode(result.content_base64).decode("utf-8")
assert decoded == ""
finally:
_current_project_dir.reset(token)
self._cleanup(encoded)
def test_zero_length_read(self):
"""Requesting zero characters returns empty content."""
encoded = "-tmp-copilot-local-read-zero-len"
path = self._make_tool_result(encoded, "data.txt", b"ABCDEF")
token = _current_project_dir.set(encoded)
try:
result = _read_local_tool_result(path, 2, 0, "s1")
assert isinstance(result, WorkspaceFileContentResponse)
decoded = base64.b64decode(result.content_base64).decode("utf-8")
assert decoded == ""
finally:
_current_project_dir.reset(token)
self._cleanup(encoded)
def test_mime_type_from_json_extension(self):
"""JSON files get application/json MIME type, not hardcoded text/plain."""
encoded = "-tmp-copilot-local-read-json"
path = self._make_tool_result(encoded, "result.json", b'{"key": "value"}')
token = _current_project_dir.set(encoded)
try:
result = _read_local_tool_result(path, 0, None, "s1")
assert isinstance(result, WorkspaceFileContentResponse)
assert result.mime_type == "application/json"
finally:
_current_project_dir.reset(token)
self._cleanup(encoded)
def test_mime_type_from_png_extension(self):
"""Binary .png files get image/png MIME type via mimetypes."""
encoded = "-tmp-copilot-local-read-png-mime"
binary_data = bytes(range(256))
path = self._make_tool_result(encoded, "chart.png", binary_data)
token = _current_project_dir.set(encoded)
try:
result = _read_local_tool_result(path, 0, None, "s1")
assert isinstance(result, WorkspaceFileContentResponse)
assert result.mime_type == "image/png"
finally:
_current_project_dir.reset(token)
self._cleanup(encoded)
def test_explicit_sdk_cwd_parameter(self):
"""The sdk_cwd parameter overrides get_sdk_cwd() for path validation."""
encoded = "-tmp-copilot-local-read-sdkcwd"
path = self._make_tool_result(encoded, "out.txt", b"content")
token = _current_project_dir.set(encoded)
try:
# Pass sdk_cwd explicitly — should still succeed because the path
# is under SDK_PROJECTS_DIR which is always allowed.
result = _read_local_tool_result(
path, 0, None, "s1", sdk_cwd="/tmp/copilot-test"
)
assert isinstance(result, WorkspaceFileContentResponse)
decoded = base64.b64decode(result.content_base64).decode("utf-8")
assert decoded == "content"
finally:
_current_project_dir.reset(token)
self._cleanup(encoded)
def test_offset_with_no_length_reads_to_end(self):
"""When char_length is None, read from offset to end of file."""
encoded = "-tmp-copilot-local-read-offset-noLen"
path = self._make_tool_result(encoded, "data.txt", b"0123456789")
token = _current_project_dir.set(encoded)
try:
result = _read_local_tool_result(path, 5, None, "s1")
assert isinstance(result, WorkspaceFileContentResponse)
decoded = base64.b64decode(result.content_base64).decode("utf-8")
assert decoded == "56789"
finally:
_current_project_dir.reset(token)
self._cleanup(encoded)
# ---------------------------------------------------------------------------
# ReadWorkspaceFileTool fallback to _read_local_tool_result
# ---------------------------------------------------------------------------
@pytest.mark.asyncio(loop_scope="session")
async def test_read_workspace_file_falls_back_to_local_tool_result(setup_test_data):
"""When _resolve_file returns ErrorResponse for an allowed local path,
ReadWorkspaceFileTool should fall back to _read_local_tool_result."""
user = setup_test_data["user"]
session = make_session(user.id)
# Create a real tool-result file on disk so the fallback can read it.
encoded = "-tmp-copilot-fallback-test"
conv_uuid = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
tool_dir = os.path.join(SDK_PROJECTS_DIR, encoded, conv_uuid, "tool-results")
os.makedirs(tool_dir, exist_ok=True)
filepath = os.path.join(tool_dir, "result.txt")
with open(filepath, "w") as f:
f.write("fallback content")
token = _current_project_dir.set(encoded)
try:
# Mock _resolve_file to return an ErrorResponse (simulating "file not
# found in workspace") so the fallback branch is exercised.
mock_resolve = AsyncMock(
return_value=ErrorResponse(
message="File not found at path: result.txt",
session_id=session.session_id,
)
)
with patch("backend.copilot.tools.workspace_files._resolve_file", mock_resolve):
read_tool = ReadWorkspaceFileTool()
result = await read_tool._execute(
user_id=user.id,
session=session,
path=filepath,
)
# Should have fallen back to _read_local_tool_result and succeeded.
assert isinstance(result, WorkspaceFileContentResponse), (
f"Expected fallback to local read, got {type(result).__name__}: "
f"{getattr(result, 'message', '')}"
)
decoded = base64.b64decode(result.content_base64).decode("utf-8")
assert decoded == "fallback content"
mock_resolve.assert_awaited_once()
finally:
_current_project_dir.reset(token)
shutil.rmtree(os.path.join(SDK_PROJECTS_DIR, encoded), ignore_errors=True)
@pytest.mark.asyncio(loop_scope="session")
async def test_read_workspace_file_no_fallback_when_resolve_succeeds(setup_test_data):
"""When _resolve_file succeeds, the local-disk fallback must NOT be invoked."""
user = setup_test_data["user"]
session = make_session(user.id)
fake_file_id = "fake-file-id-001"
fake_content = b"workspace content"
# Build a minimal file_info stub that the tool's happy-path needs.
class _FakeFileInfo:
id = fake_file_id
name = "result.json"
path = "/result.json"
mime_type = "text/plain"
size_bytes = len(fake_content)
mock_resolve = AsyncMock(return_value=(fake_file_id, _FakeFileInfo()))
mock_manager = AsyncMock()
mock_manager.read_file_by_id = AsyncMock(return_value=fake_content)
with (
patch("backend.copilot.tools.workspace_files._resolve_file", mock_resolve),
patch(
"backend.copilot.tools.workspace_files.get_workspace_manager",
AsyncMock(return_value=mock_manager),
),
patch(
"backend.copilot.tools.workspace_files._read_local_tool_result"
) as patched_local,
):
read_tool = ReadWorkspaceFileTool()
result = await read_tool._execute(
user_id=user.id,
session=session,
file_id=fake_file_id,
)
# Fallback must not have been called.
patched_local.assert_not_called()
# Normal workspace path must have produced a content response.
assert isinstance(result, WorkspaceFileContentResponse)
assert base64.b64decode(result.content_base64) == fake_content

View File

@@ -76,7 +76,6 @@ MODEL_COST: dict[LlmModel, int] = {
LlmModel.GPT4O_MINI: 1,
LlmModel.GPT4O: 3,
LlmModel.GPT4_TURBO: 10,
LlmModel.GPT3_5_TURBO: 1,
LlmModel.CLAUDE_4_1_OPUS: 21,
LlmModel.CLAUDE_4_OPUS: 21,
LlmModel.CLAUDE_4_SONNET: 5,
@@ -423,7 +422,7 @@ BLOCK_COSTS: dict[Type[Block], list[BlockCost]] = {
BlockCost(
cost_amount=10,
cost_filter={
"model": FluxKontextModelName.PRO.api_name,
"model": FluxKontextModelName.FLUX_KONTEXT_PRO,
"credentials": {
"id": replicate_credentials.id,
"provider": replicate_credentials.provider,
@@ -434,7 +433,29 @@ BLOCK_COSTS: dict[Type[Block], list[BlockCost]] = {
BlockCost(
cost_amount=20,
cost_filter={
"model": FluxKontextModelName.MAX.api_name,
"model": FluxKontextModelName.FLUX_KONTEXT_MAX,
"credentials": {
"id": replicate_credentials.id,
"provider": replicate_credentials.provider,
"type": replicate_credentials.type,
},
},
),
BlockCost(
cost_amount=14, # Nano Banana Pro
cost_filter={
"model": FluxKontextModelName.NANO_BANANA_PRO,
"credentials": {
"id": replicate_credentials.id,
"provider": replicate_credentials.provider,
"type": replicate_credentials.type,
},
},
),
BlockCost(
cost_amount=14, # Nano Banana 2
cost_filter={
"model": FluxKontextModelName.NANO_BANANA_2,
"credentials": {
"id": replicate_credentials.id,
"provider": replicate_credentials.provider,
@@ -632,6 +653,17 @@ BLOCK_COSTS: dict[Type[Block], list[BlockCost]] = {
},
},
),
BlockCost(
cost_amount=14, # Nano Banana 2: same pricing tier as Pro
cost_filter={
"model": ImageGenModel.NANO_BANANA_2,
"credentials": {
"id": replicate_credentials.id,
"provider": replicate_credentials.provider,
"type": replicate_credentials.type,
},
},
),
],
AIImageCustomizerBlock: [
BlockCost(
@@ -656,6 +688,17 @@ BLOCK_COSTS: dict[Type[Block], list[BlockCost]] = {
},
},
),
BlockCost(
cost_amount=14, # Nano Banana 2: same pricing tier as Pro
cost_filter={
"model": GeminiImageModel.NANO_BANANA_2,
"credentials": {
"id": replicate_credentials.id,
"provider": replicate_credentials.provider,
"type": replicate_credentials.type,
},
},
),
],
VideoNarrationBlock: [
BlockCost(

View File

@@ -877,12 +877,12 @@ async def get_execution_outputs_by_node_exec_id(
where={"referencedByOutputExecId": node_exec_id}
)
result = {}
result: CompletedBlockOutput = defaultdict(list)
for output in outputs:
if output.data is not None:
result[output.name] = type_utils.convert(output.data, JsonValue)
result[output.name].append(type_utils.convert(output.data, JsonValue))
return result
return dict(result)
async def update_graph_execution_start_time(

View File

@@ -0,0 +1,102 @@
"""Test that get_execution_outputs_by_node_exec_id returns CompletedBlockOutput.
CompletedBlockOutput is dict[str, list[Any]] — values must be lists.
The RPC service layer validates return types via TypeAdapter, so if
the function returns plain values instead of lists, it causes:
1 validation error for dict[str,list[any]] response
Input should be a valid list [type=list_type, input_value='', input_type=str]
This breaks SmartDecisionMakerBlock agent mode tool execution.
"""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from pydantic import TypeAdapter
from backend.data.block import CompletedBlockOutput
@pytest.mark.asyncio
async def test_outputs_are_lists():
"""Each value in the returned dict must be a list, matching CompletedBlockOutput."""
from backend.data.execution import get_execution_outputs_by_node_exec_id
mock_output = MagicMock()
mock_output.name = "response"
mock_output.data = "some text output"
with patch(
"backend.data.execution.AgentNodeExecutionInputOutput.prisma"
) as mock_prisma:
mock_prisma.return_value.find_many = AsyncMock(return_value=[mock_output])
result = await get_execution_outputs_by_node_exec_id("test-exec-id")
# The result must conform to CompletedBlockOutput = dict[str, list[Any]]
assert "response" in result
assert isinstance(
result["response"], list
), f"Expected list, got {type(result['response']).__name__}: {result['response']!r}"
# Must also pass TypeAdapter validation (this is what the RPC layer does)
adapter = TypeAdapter(CompletedBlockOutput)
validated = adapter.validate_python(result) # This is the line that fails in prod
assert validated == {"response": ["some text output"]}
@pytest.mark.asyncio
async def test_multiple_outputs_same_name_are_collected():
"""Multiple outputs with the same name should all appear in the list."""
from backend.data.execution import get_execution_outputs_by_node_exec_id
mock_out1 = MagicMock()
mock_out1.name = "result"
mock_out1.data = "first"
mock_out2 = MagicMock()
mock_out2.name = "result"
mock_out2.data = "second"
with patch(
"backend.data.execution.AgentNodeExecutionInputOutput.prisma"
) as mock_prisma:
mock_prisma.return_value.find_many = AsyncMock(
return_value=[mock_out1, mock_out2]
)
result = await get_execution_outputs_by_node_exec_id("test-exec-id")
assert isinstance(result["result"], list)
assert len(result["result"]) == 2
@pytest.mark.asyncio
async def test_empty_outputs_returns_empty_dict():
"""No outputs → empty dict."""
from backend.data.execution import get_execution_outputs_by_node_exec_id
with patch(
"backend.data.execution.AgentNodeExecutionInputOutput.prisma"
) as mock_prisma:
mock_prisma.return_value.find_many = AsyncMock(return_value=[])
result = await get_execution_outputs_by_node_exec_id("test-exec-id")
assert result == {}
@pytest.mark.asyncio
async def test_none_data_skipped():
"""Outputs with data=None should be skipped."""
from backend.data.execution import get_execution_outputs_by_node_exec_id
mock_output = MagicMock()
mock_output.name = "response"
mock_output.data = None
with patch(
"backend.data.execution.AgentNodeExecutionInputOutput.prisma"
) as mock_prisma:
mock_prisma.return_value.find_many = AsyncMock(return_value=[mock_output])
result = await get_execution_outputs_by_node_exec_id("test-exec-id")
assert result == {}

View File

@@ -1,750 +0,0 @@
import asyncio
import csv
import io
import logging
import os
import re
import socket
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Literal, Optional
from uuid import uuid4
import prisma.enums
import prisma.models
import prisma.types
from prisma.errors import UniqueViolationError
from pydantic import BaseModel, EmailStr, TypeAdapter, ValidationError
from backend.data.db import transaction
from backend.data.model import User
from backend.data.redis_client import get_redis_async
from backend.data.tally import get_business_understanding_input_from_tally, mask_email
from backend.data.understanding import (
BusinessUnderstandingInput,
merge_business_understanding_data,
)
from backend.data.user import get_user_by_email, get_user_by_id
from backend.executor.cluster_lock import AsyncClusterLock
from backend.util.exceptions import (
NotAuthorizedError,
NotFoundError,
PreconditionFailed,
)
from backend.util.json import SafeJson
from backend.util.settings import Settings
logger = logging.getLogger(__name__)
_settings = Settings()
_WORKER_ID = f"{socket.gethostname()}:{os.getpid()}"
_tally_seed_tasks: dict[str, asyncio.Task] = {}
_TALLY_STALE_SECONDS = 300
_MAX_TALLY_ERROR_LENGTH = 200
_email_adapter = TypeAdapter(EmailStr)
MAX_BULK_INVITE_FILE_BYTES = 1024 * 1024
MAX_BULK_INVITE_ROWS = 500
class InvitedUserRecord(BaseModel):
id: str
email: str
status: prisma.enums.InvitedUserStatus
auth_user_id: Optional[str] = None
name: Optional[str] = None
tally_understanding: Optional[dict[str, Any]] = None
tally_status: prisma.enums.TallyComputationStatus
tally_computed_at: Optional[datetime] = None
tally_error: Optional[str] = None
created_at: datetime
updated_at: datetime
@classmethod
def from_db(cls, invited_user: "prisma.models.InvitedUser") -> "InvitedUserRecord":
payload = (
invited_user.tallyUnderstanding
if isinstance(invited_user.tallyUnderstanding, dict)
else None
)
return cls(
id=invited_user.id,
email=invited_user.email,
status=invited_user.status,
auth_user_id=invited_user.authUserId,
name=invited_user.name,
tally_understanding=payload,
tally_status=invited_user.tallyStatus,
tally_computed_at=invited_user.tallyComputedAt,
tally_error=invited_user.tallyError,
created_at=invited_user.createdAt,
updated_at=invited_user.updatedAt,
)
class BulkInvitedUserRowResult(BaseModel):
row_number: int
email: Optional[str] = None
name: Optional[str] = None
status: Literal["CREATED", "SKIPPED", "ERROR"]
message: str
invited_user: Optional[InvitedUserRecord] = None
class BulkInvitedUsersResult(BaseModel):
created_count: int
skipped_count: int
error_count: int
results: list[BulkInvitedUserRowResult]
@dataclass(frozen=True)
class _ParsedInviteRow:
row_number: int
email: str
name: Optional[str]
def normalize_email(email: str) -> str:
return email.strip().lower()
def _normalize_name(name: Optional[str]) -> Optional[str]:
if name is None:
return None
normalized = name.strip()
return normalized or None
def _default_profile_name(email: str, preferred_name: Optional[str]) -> str:
if preferred_name:
return preferred_name
local_part = email.split("@", 1)[0].strip()
return local_part or "user"
def _sanitize_username_base(email: str) -> str:
local_part = email.split("@", 1)[0].lower()
sanitized = re.sub(r"[^a-z0-9-]", "", local_part)
sanitized = sanitized.strip("-")
return sanitized[:40] or "user"
async def _generate_unique_profile_username(email: str, tx) -> str:
base = _sanitize_username_base(email)
for _ in range(2):
candidate = f"{base}-{uuid4().hex[:6]}"
existing = await prisma.models.Profile.prisma(tx).find_unique(
where={"username": candidate}
)
if existing is None:
return candidate
raise RuntimeError(f"Unable to generate unique username for {email}")
async def _ensure_default_profile(
user_id: str,
email: str,
preferred_name: Optional[str],
tx,
) -> None:
existing_profile = await prisma.models.Profile.prisma(tx).find_unique(
where={"userId": user_id}
)
if existing_profile is not None:
return
username = await _generate_unique_profile_username(email, tx)
await prisma.models.Profile.prisma(tx).create(
data=prisma.types.ProfileCreateInput(
userId=user_id,
name=_default_profile_name(email, preferred_name),
username=username,
description="I'm new here",
links=[],
avatarUrl="",
)
)
async def _ensure_default_onboarding(user_id: str, tx) -> None:
await prisma.models.UserOnboarding.prisma(tx).upsert(
where={"userId": user_id},
data={
"create": prisma.types.UserOnboardingCreateInput(userId=user_id),
"update": {},
},
)
async def _apply_tally_understanding(
user_id: str,
invited_user: "prisma.models.InvitedUser",
tx,
) -> None:
if not isinstance(invited_user.tallyUnderstanding, dict):
return
try:
input_data = BusinessUnderstandingInput.model_validate(
invited_user.tallyUnderstanding
)
except Exception:
logger.warning(
"Malformed tallyUnderstanding for invited user %s; skipping",
invited_user.id,
exc_info=True,
)
return
payload = merge_business_understanding_data({}, input_data)
await prisma.models.CoPilotUnderstanding.prisma(tx).upsert(
where={"userId": user_id},
data={
"create": {"userId": user_id, "data": SafeJson(payload)},
"update": {"data": SafeJson(payload)},
},
)
async def list_invited_users(
page: int = 1,
page_size: int = 50,
) -> tuple[list[InvitedUserRecord], int]:
total = await prisma.models.InvitedUser.prisma().count()
invited_users = await prisma.models.InvitedUser.prisma().find_many(
order={"createdAt": "desc"},
skip=(page - 1) * page_size,
take=page_size,
)
return [InvitedUserRecord.from_db(iu) for iu in invited_users], total
async def create_invited_user(
email: str, name: Optional[str] = None
) -> InvitedUserRecord:
normalized_email = normalize_email(email)
normalized_name = _normalize_name(name)
existing_user = await prisma.models.User.prisma().find_unique(
where={"email": normalized_email}
)
if existing_user is not None:
raise PreconditionFailed("An active user with this email already exists")
existing_invited_user = await prisma.models.InvitedUser.prisma().find_unique(
where={"email": normalized_email}
)
if existing_invited_user is not None:
raise PreconditionFailed("An invited user with this email already exists")
try:
invited_user = await prisma.models.InvitedUser.prisma().create(
data={
"email": normalized_email,
"name": normalized_name,
"status": prisma.enums.InvitedUserStatus.INVITED,
"tallyStatus": prisma.enums.TallyComputationStatus.PENDING,
}
)
except UniqueViolationError:
raise PreconditionFailed("An invited user with this email already exists")
schedule_invited_user_tally_precompute(invited_user.id)
return InvitedUserRecord.from_db(invited_user)
async def revoke_invited_user(invited_user_id: str) -> InvitedUserRecord:
invited_user = await prisma.models.InvitedUser.prisma().find_unique(
where={"id": invited_user_id}
)
if invited_user is None:
raise NotFoundError(f"Invited user {invited_user_id} not found")
if invited_user.status == prisma.enums.InvitedUserStatus.CLAIMED:
raise PreconditionFailed("Claimed invited users cannot be revoked")
if invited_user.status == prisma.enums.InvitedUserStatus.REVOKED:
return InvitedUserRecord.from_db(invited_user)
revoked_user = await prisma.models.InvitedUser.prisma().update(
where={"id": invited_user_id},
data={"status": prisma.enums.InvitedUserStatus.REVOKED},
)
if revoked_user is None:
raise NotFoundError(f"Invited user {invited_user_id} not found")
return InvitedUserRecord.from_db(revoked_user)
async def retry_invited_user_tally(invited_user_id: str) -> InvitedUserRecord:
invited_user = await prisma.models.InvitedUser.prisma().find_unique(
where={"id": invited_user_id}
)
if invited_user is None:
raise NotFoundError(f"Invited user {invited_user_id} not found")
if invited_user.status == prisma.enums.InvitedUserStatus.REVOKED:
raise PreconditionFailed("Revoked invited users cannot retry Tally seeding")
refreshed_user = await prisma.models.InvitedUser.prisma().update(
where={"id": invited_user_id},
data={
"tallyUnderstanding": None,
"tallyStatus": prisma.enums.TallyComputationStatus.PENDING,
"tallyComputedAt": None,
"tallyError": None,
},
)
if refreshed_user is None:
raise NotFoundError(f"Invited user {invited_user_id} not found")
schedule_invited_user_tally_precompute(invited_user_id)
return InvitedUserRecord.from_db(refreshed_user)
def _decode_bulk_invite_file(content: bytes) -> str:
if len(content) > MAX_BULK_INVITE_FILE_BYTES:
raise ValueError("Invite file exceeds the maximum size of 1 MB")
try:
return content.decode("utf-8-sig")
except UnicodeDecodeError as exc:
raise ValueError("Invite file must be UTF-8 encoded") from exc
def _parse_bulk_invite_csv(text: str) -> list[_ParsedInviteRow]:
indexed_rows: list[tuple[int, list[str]]] = []
for row_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
normalized_row = [cell.strip() for cell in row]
if any(normalized_row):
indexed_rows.append((row_number, normalized_row))
if not indexed_rows:
return []
header = [cell.lower() for cell in indexed_rows[0][1]]
has_header = "email" in header
email_index = header.index("email") if has_header else 0
name_index: Optional[int] = (
header.index("name")
if has_header and "name" in header
else (1 if not has_header else None)
)
data_rows = indexed_rows[1:] if has_header else indexed_rows
parsed_rows: list[_ParsedInviteRow] = []
for row_number, row in data_rows:
if len(parsed_rows) >= MAX_BULK_INVITE_ROWS:
break
email = row[email_index].strip() if len(row) > email_index else ""
name = (
row[name_index].strip()
if name_index is not None and len(row) > name_index
else ""
)
parsed_rows.append(
_ParsedInviteRow(
row_number=row_number,
email=email,
name=name or None,
)
)
return parsed_rows
def _parse_bulk_invite_text(text: str) -> list[_ParsedInviteRow]:
parsed_rows: list[_ParsedInviteRow] = []
for row_number, raw_line in enumerate(text.splitlines(), start=1):
if len(parsed_rows) >= MAX_BULK_INVITE_ROWS:
break
line = raw_line.strip()
if not line or line.startswith("#"):
continue
parsed_rows.append(
_ParsedInviteRow(
row_number=row_number,
email=line,
name=None,
)
)
return parsed_rows
def _parse_bulk_invite_file(
filename: Optional[str],
content: bytes,
) -> list[_ParsedInviteRow]:
text = _decode_bulk_invite_file(content)
file_name = filename.lower() if filename else ""
parsed_rows = (
_parse_bulk_invite_csv(text)
if file_name.endswith(".csv")
else _parse_bulk_invite_text(text)
)
if not parsed_rows:
raise ValueError("Invite file did not contain any emails")
return parsed_rows
async def bulk_create_invited_users_from_file(
filename: Optional[str],
content: bytes,
) -> BulkInvitedUsersResult:
parsed_rows = _parse_bulk_invite_file(filename, content)
created_count = 0
skipped_count = 0
error_count = 0
results: list[BulkInvitedUserRowResult] = []
seen_emails: set[str] = set()
for row in parsed_rows:
row_name = _normalize_name(row.name)
try:
validated_email = _email_adapter.validate_python(row.email)
except ValidationError:
error_count += 1
results.append(
BulkInvitedUserRowResult(
row_number=row.row_number,
email=row.email or None,
name=row_name,
status="ERROR",
message="Invalid email address",
)
)
continue
normalized_email = normalize_email(str(validated_email))
if normalized_email in seen_emails:
skipped_count += 1
results.append(
BulkInvitedUserRowResult(
row_number=row.row_number,
email=normalized_email,
name=row_name,
status="SKIPPED",
message="Duplicate email in upload file",
)
)
continue
seen_emails.add(normalized_email)
try:
invited_user = await create_invited_user(normalized_email, row_name)
except PreconditionFailed as exc:
skipped_count += 1
results.append(
BulkInvitedUserRowResult(
row_number=row.row_number,
email=normalized_email,
name=row_name,
status="SKIPPED",
message=str(exc),
)
)
except Exception:
masked = mask_email(normalized_email)
logger.exception(
"Failed to create bulk invite for row %s (%s)",
row.row_number,
masked,
)
error_count += 1
results.append(
BulkInvitedUserRowResult(
row_number=row.row_number,
email=normalized_email,
name=row_name,
status="ERROR",
message="Unexpected error creating invite",
)
)
else:
created_count += 1
results.append(
BulkInvitedUserRowResult(
row_number=row.row_number,
email=normalized_email,
name=row_name,
status="CREATED",
message="Invite created",
invited_user=invited_user,
)
)
return BulkInvitedUsersResult(
created_count=created_count,
skipped_count=skipped_count,
error_count=error_count,
results=results,
)
async def _compute_invited_user_tally_seed(invited_user_id: str) -> None:
invited_user = await prisma.models.InvitedUser.prisma().find_unique(
where={"id": invited_user_id}
)
if invited_user is None:
return
if invited_user.status == prisma.enums.InvitedUserStatus.REVOKED:
return
try:
r = await get_redis_async()
except Exception:
r = None
lock: AsyncClusterLock | None = None
if r is not None:
lock = AsyncClusterLock(
redis=r,
key=f"tally_seed:{invited_user_id}",
owner_id=_WORKER_ID,
timeout=_TALLY_STALE_SECONDS,
)
current_owner = await lock.try_acquire()
if current_owner is None:
logger.warn("Redis unvailable for tally lock - skipping tally enrichement")
return
elif current_owner != _WORKER_ID:
logger.debug(
"Tally seed for %s already locked by %s, skipping",
invited_user_id,
current_owner,
)
return
if (
invited_user.tallyStatus == prisma.enums.TallyComputationStatus.RUNNING
and invited_user.updatedAt is not None
):
age = (datetime.now(timezone.utc) - invited_user.updatedAt).total_seconds()
if age < _TALLY_STALE_SECONDS:
logger.debug(
"Tally task for %s still RUNNING (age=%ds), skipping",
invited_user_id,
int(age),
)
return
logger.info(
"Tally task for %s is stale (age=%ds), re-running",
invited_user_id,
int(age),
)
await prisma.models.InvitedUser.prisma().update(
where={"id": invited_user_id},
data={
"tallyStatus": prisma.enums.TallyComputationStatus.RUNNING,
"tallyError": None,
},
)
try:
input_data = await get_business_understanding_input_from_tally(
invited_user.email,
require_api_key=True,
)
payload = (
SafeJson(input_data.model_dump(exclude_none=True))
if input_data is not None
else None
)
await prisma.models.InvitedUser.prisma().update(
where={"id": invited_user_id},
data={
"tallyUnderstanding": payload,
"tallyStatus": prisma.enums.TallyComputationStatus.READY,
"tallyComputedAt": datetime.now(timezone.utc),
"tallyError": None,
},
)
except Exception as exc:
logger.exception(
"Failed to compute Tally understanding for invited user %s",
invited_user_id,
)
sanitized_error = re.sub(
r"https?://\S+", "<url>", f"{type(exc).__name__}: {exc}"
)[:_MAX_TALLY_ERROR_LENGTH]
await prisma.models.InvitedUser.prisma().update(
where={"id": invited_user_id},
data={
"tallyStatus": prisma.enums.TallyComputationStatus.FAILED,
"tallyError": sanitized_error,
},
)
def schedule_invited_user_tally_precompute(invited_user_id: str) -> None:
existing = _tally_seed_tasks.get(invited_user_id)
if existing is not None and not existing.done():
logger.debug("Tally task already running for %s, skipping", invited_user_id)
return
task = asyncio.create_task(_compute_invited_user_tally_seed(invited_user_id))
_tally_seed_tasks[invited_user_id] = task
def _on_done(t: asyncio.Task, _id: str = invited_user_id) -> None:
if _tally_seed_tasks.get(_id) is t:
del _tally_seed_tasks[_id]
task.add_done_callback(_on_done)
async def _open_signup_create_user(
auth_user_id: str,
normalized_email: str,
metadata_name: Optional[str],
) -> User:
"""Create a user without requiring an invite (open signup mode)."""
preferred_name = _normalize_name(metadata_name)
try:
async with transaction() as tx:
user = await prisma.models.User.prisma(tx).create(
data=prisma.types.UserCreateInput(
id=auth_user_id,
email=normalized_email,
name=preferred_name,
)
)
await _ensure_default_profile(
auth_user_id, normalized_email, preferred_name, tx
)
await _ensure_default_onboarding(auth_user_id, tx)
except UniqueViolationError:
existing = await prisma.models.User.prisma().find_unique(
where={"id": auth_user_id}
)
if existing is not None:
return User.from_db(existing)
raise
return User.from_db(user)
# TODO: We need to change this functions logic before going live
async def get_or_activate_user(user_data: dict) -> User:
auth_user_id = user_data.get("sub")
if not auth_user_id:
raise NotAuthorizedError("User ID not found in token")
auth_email = user_data.get("email")
if not auth_email:
raise NotAuthorizedError("Email not found in token")
normalized_email = normalize_email(auth_email)
user_metadata = user_data.get("user_metadata")
metadata_name = (
user_metadata.get("name") if isinstance(user_metadata, dict) else None
)
existing_user = None
try:
existing_user = await get_user_by_id(auth_user_id)
except ValueError:
existing_user = None
except Exception:
logger.exception("Error on get user by id during tally enrichment process")
raise
if existing_user is not None:
return existing_user
if not _settings.config.enable_invite_gate or normalized_email.endswith("@agpt.co"):
return await _open_signup_create_user(
auth_user_id, normalized_email, metadata_name
)
invited_user = await prisma.models.InvitedUser.prisma().find_unique(
where={"email": normalized_email}
)
if invited_user is None:
raise NotAuthorizedError("Your email is not allowed to access the platform")
if invited_user.status != prisma.enums.InvitedUserStatus.INVITED:
raise NotAuthorizedError("Your invitation is no longer active")
try:
async with transaction() as tx:
current_user = await prisma.models.User.prisma(tx).find_unique(
where={"id": auth_user_id}
)
if current_user is not None:
return User.from_db(current_user)
current_invited_user = await prisma.models.InvitedUser.prisma(
tx
).find_unique(where={"email": normalized_email})
if current_invited_user is None:
raise NotAuthorizedError(
"Your email is not allowed to access the platform"
)
if current_invited_user.status != prisma.enums.InvitedUserStatus.INVITED:
raise NotAuthorizedError("Your invitation is no longer active")
if current_invited_user.authUserId not in (None, auth_user_id):
raise NotAuthorizedError("Your invitation has already been claimed")
preferred_name = current_invited_user.name or _normalize_name(metadata_name)
await prisma.models.User.prisma(tx).create(
data=prisma.types.UserCreateInput(
id=auth_user_id,
email=normalized_email,
name=preferred_name,
)
)
await prisma.models.InvitedUser.prisma(tx).update(
where={"id": current_invited_user.id},
data={
"status": prisma.enums.InvitedUserStatus.CLAIMED,
"authUserId": auth_user_id,
},
)
await _ensure_default_profile(
auth_user_id,
normalized_email,
preferred_name,
tx,
)
await _ensure_default_onboarding(auth_user_id, tx)
await _apply_tally_understanding(auth_user_id, current_invited_user, tx)
except UniqueViolationError:
logger.info("Concurrent activation for user %s; re-fetching", auth_user_id)
already_created = await prisma.models.User.prisma().find_unique(
where={"id": auth_user_id}
)
if already_created is not None:
return User.from_db(already_created)
raise RuntimeError(
f"UniqueViolationError during activation but user {auth_user_id} not found"
)
get_user_by_id.cache_delete(auth_user_id)
get_user_by_email.cache_delete(normalized_email)
activated_user = await prisma.models.User.prisma().find_unique(
where={"id": auth_user_id}
)
if activated_user is None:
raise RuntimeError(
f"Activated user {auth_user_id} was not found after creation"
)
return User.from_db(activated_user)

View File

@@ -1,335 +0,0 @@
from contextlib import asynccontextmanager
from datetime import datetime, timezone
from types import SimpleNamespace
from typing import cast
from unittest.mock import AsyncMock, Mock
import prisma.enums
import prisma.models
import pytest
import pytest_mock
from backend.util.exceptions import NotAuthorizedError, PreconditionFailed
from .invited_user import (
InvitedUserRecord,
bulk_create_invited_users_from_file,
create_invited_user,
get_or_activate_user,
retry_invited_user_tally,
)
def _invited_user_db_record(
*,
status: prisma.enums.InvitedUserStatus = prisma.enums.InvitedUserStatus.INVITED,
tally_understanding: dict | None = None,
):
now = datetime.now(timezone.utc)
return SimpleNamespace(
id="invite-1",
email="invited@example.com",
status=status,
authUserId=None,
name="Invited User",
tallyUnderstanding=tally_understanding,
tallyStatus=prisma.enums.TallyComputationStatus.PENDING,
tallyComputedAt=None,
tallyError=None,
createdAt=now,
updatedAt=now,
)
def _invited_user_record(
*,
status: prisma.enums.InvitedUserStatus = prisma.enums.InvitedUserStatus.INVITED,
tally_understanding: dict | None = None,
):
return InvitedUserRecord.from_db(
cast(
prisma.models.InvitedUser,
_invited_user_db_record(
status=status,
tally_understanding=tally_understanding,
),
)
)
def _user_db_record():
now = datetime.now(timezone.utc)
return SimpleNamespace(
id="auth-user-1",
email="invited@example.com",
emailVerified=True,
name="Invited User",
createdAt=now,
updatedAt=now,
metadata={},
integrations="",
stripeCustomerId=None,
topUpConfig=None,
maxEmailsPerDay=3,
notifyOnAgentRun=True,
notifyOnZeroBalance=True,
notifyOnLowBalance=True,
notifyOnBlockExecutionFailed=True,
notifyOnContinuousAgentError=True,
notifyOnDailySummary=True,
notifyOnWeeklySummary=True,
notifyOnMonthlySummary=True,
notifyOnAgentApproved=True,
notifyOnAgentRejected=True,
timezone="not-set",
)
@pytest.mark.asyncio
async def test_create_invited_user_rejects_existing_active_user(
mocker: pytest_mock.MockerFixture,
) -> None:
user_repo = Mock()
user_repo.find_unique = AsyncMock(return_value=_user_db_record())
invited_user_repo = Mock()
invited_user_repo.find_unique = AsyncMock()
mocker.patch(
"backend.data.invited_user.prisma.models.User.prisma", return_value=user_repo
)
mocker.patch(
"backend.data.invited_user.prisma.models.InvitedUser.prisma",
return_value=invited_user_repo,
)
with pytest.raises(PreconditionFailed):
await create_invited_user("Invited@example.com")
@pytest.mark.asyncio
async def test_create_invited_user_schedules_tally_seed(
mocker: pytest_mock.MockerFixture,
) -> None:
user_repo = Mock()
user_repo.find_unique = AsyncMock(return_value=None)
invited_user_repo = Mock()
invited_user_repo.find_unique = AsyncMock(return_value=None)
invited_user_repo.create = AsyncMock(return_value=_invited_user_db_record())
schedule = mocker.patch(
"backend.data.invited_user.schedule_invited_user_tally_precompute"
)
mocker.patch(
"backend.data.invited_user.prisma.models.User.prisma", return_value=user_repo
)
mocker.patch(
"backend.data.invited_user.prisma.models.InvitedUser.prisma",
return_value=invited_user_repo,
)
invited_user = await create_invited_user("Invited@example.com", "Invited User")
assert invited_user.email == "invited@example.com"
invited_user_repo.create.assert_awaited_once()
schedule.assert_called_once_with("invite-1")
@pytest.mark.asyncio
async def test_retry_invited_user_tally_resets_state_and_schedules(
mocker: pytest_mock.MockerFixture,
) -> None:
invited_user_repo = Mock()
invited_user_repo.find_unique = AsyncMock(return_value=_invited_user_db_record())
invited_user_repo.update = AsyncMock(return_value=_invited_user_db_record())
schedule = mocker.patch(
"backend.data.invited_user.schedule_invited_user_tally_precompute"
)
mocker.patch(
"backend.data.invited_user.prisma.models.InvitedUser.prisma",
return_value=invited_user_repo,
)
invited_user = await retry_invited_user_tally("invite-1")
assert invited_user.id == "invite-1"
invited_user_repo.update.assert_awaited_once()
schedule.assert_called_once_with("invite-1")
@pytest.mark.asyncio
async def test_get_or_activate_user_requires_invite(
mocker: pytest_mock.MockerFixture,
) -> None:
invited_user_repo = Mock()
invited_user_repo.find_unique = AsyncMock(return_value=None)
mock_get_user_by_id = AsyncMock(side_effect=ValueError("User not found"))
mock_get_user_by_id.cache_delete = Mock()
mocker.patch(
"backend.data.invited_user.get_user_by_id",
mock_get_user_by_id,
)
mocker.patch(
"backend.data.invited_user._settings.config.enable_invite_gate",
True,
)
mocker.patch(
"backend.data.invited_user.prisma.models.InvitedUser.prisma",
return_value=invited_user_repo,
)
with pytest.raises(NotAuthorizedError):
await get_or_activate_user(
{"sub": "auth-user-1", "email": "invited@example.com"}
)
@pytest.mark.asyncio
async def test_get_or_activate_user_creates_user_from_invite(
mocker: pytest_mock.MockerFixture,
) -> None:
tx = object()
invited_user = _invited_user_db_record(
tally_understanding={"user_name": "Invited User", "industry": "Automation"}
)
created_user = _user_db_record()
outside_user_repo = Mock()
# Only called once at post-transaction verification (line 741);
# get_user_by_id (line 657) uses prisma.user.find_unique, not this mock.
outside_user_repo.find_unique = AsyncMock(return_value=created_user)
inside_user_repo = Mock()
inside_user_repo.find_unique = AsyncMock(return_value=None)
inside_user_repo.create = AsyncMock(return_value=created_user)
outside_invited_repo = Mock()
outside_invited_repo.find_unique = AsyncMock(return_value=invited_user)
inside_invited_repo = Mock()
inside_invited_repo.find_unique = AsyncMock(return_value=invited_user)
inside_invited_repo.update = AsyncMock(return_value=invited_user)
def user_prisma(client=None):
return inside_user_repo if client is tx else outside_user_repo
def invited_user_prisma(client=None):
return inside_invited_repo if client is tx else outside_invited_repo
@asynccontextmanager
async def fake_transaction():
yield tx
# Mock get_user_by_id since it uses prisma.user.find_unique (global client),
# not prisma.models.User.prisma().find_unique which we mock above.
mock_get_user_by_id = AsyncMock(side_effect=ValueError("User not found"))
mock_get_user_by_id.cache_delete = Mock()
mocker.patch(
"backend.data.invited_user.get_user_by_id",
mock_get_user_by_id,
)
mock_get_user_by_email = AsyncMock()
mock_get_user_by_email.cache_delete = Mock()
mocker.patch(
"backend.data.invited_user.get_user_by_email",
mock_get_user_by_email,
)
ensure_profile = mocker.patch(
"backend.data.invited_user._ensure_default_profile",
AsyncMock(),
)
ensure_onboarding = mocker.patch(
"backend.data.invited_user._ensure_default_onboarding",
AsyncMock(),
)
apply_tally = mocker.patch(
"backend.data.invited_user._apply_tally_understanding",
AsyncMock(),
)
mocker.patch("backend.data.invited_user.transaction", fake_transaction)
mocker.patch(
"backend.data.invited_user.prisma.models.User.prisma", side_effect=user_prisma
)
mocker.patch(
"backend.data.invited_user.prisma.models.InvitedUser.prisma",
side_effect=invited_user_prisma,
)
user = await get_or_activate_user(
{
"sub": "auth-user-1",
"email": "Invited@example.com",
"user_metadata": {"name": "Invited User"},
}
)
assert user.id == "auth-user-1"
inside_user_repo.create.assert_awaited_once()
inside_invited_repo.update.assert_awaited_once()
ensure_profile.assert_awaited_once()
ensure_onboarding.assert_awaited_once_with("auth-user-1", tx)
apply_tally.assert_awaited_once_with("auth-user-1", invited_user, tx)
@pytest.mark.asyncio
async def test_bulk_create_invited_users_from_text_file(
mocker: pytest_mock.MockerFixture,
) -> None:
create_invited = mocker.patch(
"backend.data.invited_user.create_invited_user",
AsyncMock(
side_effect=[
_invited_user_record(),
_invited_user_record(),
]
),
)
result = await bulk_create_invited_users_from_file(
"invites.txt",
b"Invited@example.com\nsecond@example.com\n",
)
assert result.created_count == 2
assert result.skipped_count == 0
assert result.error_count == 0
assert [row.status for row in result.results] == ["CREATED", "CREATED"]
assert create_invited.await_count == 2
@pytest.mark.asyncio
async def test_bulk_create_invited_users_handles_csv_duplicates_and_invalid_rows(
mocker: pytest_mock.MockerFixture,
) -> None:
create_invited = mocker.patch(
"backend.data.invited_user.create_invited_user",
AsyncMock(
side_effect=[
_invited_user_record(),
PreconditionFailed("An invited user with this email already exists"),
]
),
)
result = await bulk_create_invited_users_from_file(
"invites.csv",
(
"email,name\n"
"valid@example.com,Valid User\n"
"not-an-email,Bad Row\n"
"valid@example.com,Duplicate In File\n"
"existing@example.com,Existing User\n"
).encode("utf-8"),
)
assert result.created_count == 1
assert result.skipped_count == 2
assert result.error_count == 1
assert [row.status for row in result.results] == [
"CREATED",
"ERROR",
"SKIPPED",
"SKIPPED",
]
assert create_invited.await_count == 2

View File

@@ -41,7 +41,7 @@ _MAX_PAGES = 100
_LLM_TIMEOUT = 30
def mask_email(email: str) -> str:
def _mask_email(email: str) -> str:
"""Mask an email for safe logging: 'alice@example.com' -> 'a***e@example.com'."""
try:
local, domain = email.rsplit("@", 1)
@@ -196,7 +196,8 @@ async def _refresh_cache(form_id: str) -> tuple[dict, list]:
Returns (email_index, questions).
"""
client = _make_tally_client(_settings.secrets.tally_api_key)
settings = Settings()
client = _make_tally_client(settings.secrets.tally_api_key)
redis = await get_redis_async()
last_fetch_key = _LAST_FETCH_KEY.format(form_id=form_id)
@@ -331,9 +332,6 @@ Fields:
- current_software (list of strings): software/tools currently used
- existing_automation (list of strings): existing automations
- additional_notes (string): any additional context
- suggested_prompts (list of 5 strings): short action prompts (each under 20 words) that would help \
this person get started with automating their work. Should be specific to their industry, role, and \
pain points; actionable and conversational in tone; focused on automation opportunities.
Form data:
"""
@@ -341,21 +339,21 @@ Form data:
_EXTRACTION_SUFFIX = "\n\nReturn ONLY valid JSON."
async def extract_business_understanding_from_tally(
async def extract_business_understanding(
formatted_text: str,
) -> BusinessUnderstandingInput:
"""
Use an LLM to extract structured business understanding from form text.
"""Use an LLM to extract structured business understanding from form text.
Raises on timeout or unparseable response so the caller can handle it.
"""
api_key = _settings.secrets.open_router_api_key
settings = Settings()
api_key = settings.secrets.open_router_api_key
client = AsyncOpenAI(api_key=api_key, base_url=OPENROUTER_BASE_URL)
try:
response = await asyncio.wait_for(
client.chat.completions.create(
model=_settings.config.tally_extraction_llm_model,
model="openai/gpt-4o-mini",
messages=[
{
"role": "user",
@@ -380,57 +378,9 @@ async def extract_business_understanding_from_tally(
# Filter out null values before constructing
cleaned = {k: v for k, v in data.items() if v is not None}
# Validate suggested_prompts: filter >20 words, keep top 3
raw_prompts = cleaned.get("suggested_prompts", [])
if isinstance(raw_prompts, list):
valid = [
p.strip()
for p in raw_prompts
if isinstance(p, str) and len(p.strip().split()) <= 20
]
# This will keep up to 3 suggestions
short_prompts = valid[:3] if valid else None
if short_prompts:
cleaned["suggested_prompts"] = short_prompts
else:
# We dont want to add a None value suggested_prompts field
cleaned.pop("suggested_prompts", None)
else:
# suggested_prompts must be a list - removing it as its not here
cleaned.pop("suggested_prompts", None)
return BusinessUnderstandingInput(**cleaned)
async def get_business_understanding_input_from_tally(
email: str,
*,
require_api_key: bool = False,
) -> Optional[BusinessUnderstandingInput]:
if not _settings.secrets.tally_api_key:
if require_api_key:
raise RuntimeError("Tally API key is not configured")
logger.debug("Tally: no API key configured, skipping")
return None
masked = mask_email(email)
result = await find_submission_by_email(TALLY_FORM_ID, email)
if result is None:
logger.debug(f"Tally: no submission found for {masked}")
return None
submission, questions = result
logger.info(f"Tally: found submission for {masked}, extracting understanding")
formatted = format_submission_for_llm(submission, questions)
if not formatted.strip():
logger.warning("Tally: formatted submission was empty, skipping")
return None
return await extract_business_understanding_from_tally(formatted)
async def populate_understanding_from_tally(user_id: str, email: str) -> None:
"""Main orchestrator: check Tally for a matching submission and populate understanding.
@@ -445,9 +395,32 @@ async def populate_understanding_from_tally(user_id: str, email: str) -> None:
)
return
understanding_input = await get_business_understanding_input_from_tally(email)
if understanding_input is None:
# Check required config is present
settings = Settings()
if not settings.secrets.tally_api_key or not settings.secrets.tally_form_id:
logger.debug("Tally: Tally config incomplete, skipping")
return
if not settings.secrets.open_router_api_key:
logger.debug("Tally: no OpenRouter API key configured, skipping")
return
# Look up submission by email
masked = _mask_email(email)
result = await find_submission_by_email(settings.secrets.tally_form_id, email)
if result is None:
logger.debug(f"Tally: no submission found for {masked}")
return
submission, questions = result
logger.info(f"Tally: found submission for {masked}, extracting understanding")
# Format and extract
formatted = format_submission_for_llm(submission, questions)
if not formatted.strip():
logger.warning("Tally: formatted submission was empty, skipping")
return
understanding_input = await extract_business_understanding(formatted)
# Upsert into database
await upsert_business_understanding(user_id, understanding_input)

View File

@@ -12,11 +12,11 @@ from backend.data.tally import (
_build_email_index,
_format_answer,
_make_tally_client,
_mask_email,
_refresh_cache,
extract_business_understanding_from_tally,
extract_business_understanding,
find_submission_by_email,
format_submission_for_llm,
mask_email,
populate_understanding_from_tally,
)
@@ -248,7 +248,7 @@ async def test_populate_understanding_skips_no_api_key():
new_callable=AsyncMock,
return_value=None,
),
patch("backend.data.tally._settings", mock_settings),
patch("backend.data.tally.Settings", return_value=mock_settings),
patch(
"backend.data.tally.find_submission_by_email",
new_callable=AsyncMock,
@@ -284,7 +284,6 @@ async def test_populate_understanding_full_flow():
],
}
mock_input = MagicMock()
mock_input.suggested_prompts = ["Prompt 1", "Prompt 2", "Prompt 3"]
with (
patch(
@@ -292,14 +291,14 @@ async def test_populate_understanding_full_flow():
new_callable=AsyncMock,
return_value=None,
),
patch("backend.data.tally._settings", mock_settings),
patch("backend.data.tally.Settings", return_value=mock_settings),
patch(
"backend.data.tally.find_submission_by_email",
new_callable=AsyncMock,
return_value=(submission, SAMPLE_QUESTIONS),
),
patch(
"backend.data.tally.extract_business_understanding_from_tally",
"backend.data.tally.extract_business_understanding",
new_callable=AsyncMock,
return_value=mock_input,
) as mock_extract,
@@ -332,14 +331,14 @@ async def test_populate_understanding_handles_llm_timeout():
new_callable=AsyncMock,
return_value=None,
),
patch("backend.data.tally._settings", mock_settings),
patch("backend.data.tally.Settings", return_value=mock_settings),
patch(
"backend.data.tally.find_submission_by_email",
new_callable=AsyncMock,
return_value=(submission, SAMPLE_QUESTIONS),
),
patch(
"backend.data.tally.extract_business_understanding_from_tally",
"backend.data.tally.extract_business_understanding",
new_callable=AsyncMock,
side_effect=asyncio.TimeoutError(),
),
@@ -357,13 +356,13 @@ async def test_populate_understanding_handles_llm_timeout():
def test_mask_email():
assert mask_email("alice@example.com") == "a***e@example.com"
assert mask_email("ab@example.com") == "a***@example.com"
assert mask_email("a@example.com") == "a***@example.com"
assert _mask_email("alice@example.com") == "a***e@example.com"
assert _mask_email("ab@example.com") == "a***@example.com"
assert _mask_email("a@example.com") == "a***@example.com"
def test_mask_email_invalid():
assert mask_email("no-at-sign") == "***"
assert _mask_email("no-at-sign") == "***"
# ── Prompt construction (curly-brace safety) ─────────────────────────────────
@@ -394,11 +393,11 @@ def test_extraction_prompt_no_format_placeholders():
assert single_braces == [], f"Found format placeholders: {single_braces}"
# ── extract_business_understanding_from_tally ────────────────────────────────────────────
# ── extract_business_understanding ────────────────────────────────────────────
@pytest.mark.asyncio
async def test_extract_business_understanding_from_tally_success():
async def test_extract_business_understanding_success():
"""Happy path: LLM returns valid JSON that maps to BusinessUnderstandingInput."""
mock_choice = MagicMock()
mock_choice.message.content = json.dumps(
@@ -407,13 +406,6 @@ async def test_extract_business_understanding_from_tally_success():
"business_name": "Acme Corp",
"industry": "Technology",
"pain_points": ["manual reporting"],
"suggested_prompts": [
"Automate weekly reports",
"Set up invoice processing",
"Create a customer onboarding flow",
"Track project deadlines automatically",
"Send follow-up emails after meetings",
],
}
)
mock_response = MagicMock()
@@ -423,56 +415,16 @@ async def test_extract_business_understanding_from_tally_success():
mock_client.chat.completions.create.return_value = mock_response
with patch("backend.data.tally.AsyncOpenAI", return_value=mock_client):
result = await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
result = await extract_business_understanding("Q: Name?\nA: Alice")
assert result.user_name == "Alice"
assert result.business_name == "Acme Corp"
assert result.industry == "Technology"
assert result.pain_points == ["manual reporting"]
# suggested_prompts validated and sliced to top 3
assert result.suggested_prompts == [
"Automate weekly reports",
"Set up invoice processing",
"Create a customer onboarding flow",
]
@pytest.mark.asyncio
async def test_extract_business_understanding_from_tally_filters_long_prompts():
"""Prompts exceeding 20 words are excluded and only top 3 are kept."""
long_prompt = " ".join(["word"] * 21)
mock_choice = MagicMock()
mock_choice.message.content = json.dumps(
{
"user_name": "Alice",
"suggested_prompts": [
long_prompt,
"Short prompt one",
long_prompt,
"Short prompt two",
"Short prompt three",
"Short prompt four",
],
}
)
mock_response = MagicMock()
mock_response.choices = [mock_choice]
mock_client = AsyncMock()
mock_client.chat.completions.create.return_value = mock_response
with patch("backend.data.tally.AsyncOpenAI", return_value=mock_client):
result = await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
assert result.suggested_prompts == [
"Short prompt one",
"Short prompt two",
"Short prompt three",
]
@pytest.mark.asyncio
async def test_extract_business_understanding_from_tally_filters_nulls():
async def test_extract_business_understanding_filters_nulls():
"""Null values from LLM should be excluded from the result."""
mock_choice = MagicMock()
mock_choice.message.content = json.dumps(
@@ -485,7 +437,7 @@ async def test_extract_business_understanding_from_tally_filters_nulls():
mock_client.chat.completions.create.return_value = mock_response
with patch("backend.data.tally.AsyncOpenAI", return_value=mock_client):
result = await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
result = await extract_business_understanding("Q: Name?\nA: Alice")
assert result.user_name == "Alice"
assert result.business_name is None
@@ -493,7 +445,7 @@ async def test_extract_business_understanding_from_tally_filters_nulls():
@pytest.mark.asyncio
async def test_extract_business_understanding_from_tally_invalid_json():
async def test_extract_business_understanding_invalid_json():
"""Invalid JSON from LLM should raise JSONDecodeError."""
mock_choice = MagicMock()
mock_choice.message.content = "not valid json {"
@@ -507,11 +459,11 @@ async def test_extract_business_understanding_from_tally_invalid_json():
patch("backend.data.tally.AsyncOpenAI", return_value=mock_client),
pytest.raises(json.JSONDecodeError),
):
await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
await extract_business_understanding("Q: Name?\nA: Alice")
@pytest.mark.asyncio
async def test_extract_business_understanding_from_tally_timeout():
async def test_extract_business_understanding_timeout():
"""LLM timeout should propagate as asyncio.TimeoutError."""
mock_client = AsyncMock()
mock_client.chat.completions.create.side_effect = asyncio.TimeoutError()
@@ -521,7 +473,7 @@ async def test_extract_business_understanding_from_tally_timeout():
patch("backend.data.tally._LLM_TIMEOUT", 0.001),
pytest.raises(asyncio.TimeoutError),
):
await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
await extract_business_understanding("Q: Name?\nA: Alice")
# ── _refresh_cache ───────────────────────────────────────────────────────────
@@ -540,7 +492,7 @@ async def test_refresh_cache_full_fetch():
submissions = SAMPLE_SUBMISSIONS
with (
patch("backend.data.tally._settings", mock_settings),
patch("backend.data.tally.Settings", return_value=mock_settings),
patch(
"backend.data.tally.get_redis_async",
new_callable=AsyncMock,
@@ -588,7 +540,7 @@ async def test_refresh_cache_incremental_fetch():
new_submissions = [SAMPLE_SUBMISSIONS[0]] # Just Alice
with (
patch("backend.data.tally._settings", mock_settings),
patch("backend.data.tally.Settings", return_value=mock_settings),
patch(
"backend.data.tally.get_redis_async",
new_callable=AsyncMock,

View File

@@ -86,11 +86,6 @@ class BusinessUnderstandingInput(pydantic.BaseModel):
None, description="Any additional context"
)
# Suggested prompts (UI-only, not included in system prompt)
suggested_prompts: Optional[list[str]] = pydantic.Field(
None, description="LLM-generated suggested prompts based on business context"
)
class BusinessUnderstanding(pydantic.BaseModel):
"""Full business understanding model returned from database."""
@@ -127,9 +122,6 @@ class BusinessUnderstanding(pydantic.BaseModel):
# Additional context
additional_notes: Optional[str] = None
# Suggested prompts (UI-only, not included in system prompt)
suggested_prompts: list[str] = pydantic.Field(default_factory=list)
@classmethod
def from_db(cls, db_record: CoPilotUnderstanding) -> "BusinessUnderstanding":
"""Convert database record to Pydantic model."""
@@ -157,7 +149,6 @@ class BusinessUnderstanding(pydantic.BaseModel):
current_software=_json_to_list(business.get("current_software")),
existing_automation=_json_to_list(business.get("existing_automation")),
additional_notes=business.get("additional_notes"),
suggested_prompts=_json_to_list(data.get("suggested_prompts")),
)
@@ -175,62 +166,6 @@ def _merge_lists(existing: list | None, new: list | None) -> list | None:
return merged
def merge_business_understanding_data(
existing_data: dict[str, Any],
input_data: BusinessUnderstandingInput,
) -> dict[str, Any]:
merged_data = dict(existing_data)
merged_business: dict[str, Any] = {}
if isinstance(merged_data.get("business"), dict):
merged_business = dict(merged_data["business"])
business_string_fields = [
"job_title",
"business_name",
"industry",
"business_size",
"user_role",
"additional_notes",
]
business_list_fields = [
"key_workflows",
"daily_activities",
"pain_points",
"bottlenecks",
"manual_tasks",
"automation_goals",
"current_software",
"existing_automation",
]
if input_data.user_name is not None:
merged_data["name"] = input_data.user_name
for field in business_string_fields:
value = getattr(input_data, field)
if value is not None:
merged_business[field] = value
for field in business_list_fields:
value = getattr(input_data, field)
if value is not None:
existing_list = _json_to_list(merged_business.get(field))
merged_list = _merge_lists(existing_list, value)
merged_business[field] = merged_list
merged_business["version"] = 1
merged_data["business"] = merged_business
# suggested_prompts lives at the top level (not under `business`) because
# it's a UI-only artifact consumed by the frontend, not business understanding
# data. The `business` sub-dict feeds the system prompt.
if input_data.suggested_prompts is not None:
merged_data["suggested_prompts"] = input_data.suggested_prompts
return merged_data
async def _get_from_cache(user_id: str) -> Optional[BusinessUnderstanding]:
"""Get business understanding from Redis cache."""
try:
@@ -310,18 +245,63 @@ async def upsert_business_understanding(
where={"userId": user_id}
)
# Get existing data structure or start fresh
existing_data: dict[str, Any] = {}
if existing and isinstance(existing.data, dict):
existing_data = dict(existing.data)
merged_data = merge_business_understanding_data(existing_data, input_data)
existing_business: dict[str, Any] = {}
if isinstance(existing_data.get("business"), dict):
existing_business = dict(existing_data["business"])
# Business fields (stored inside business object)
business_string_fields = [
"job_title",
"business_name",
"industry",
"business_size",
"user_role",
"additional_notes",
]
business_list_fields = [
"key_workflows",
"daily_activities",
"pain_points",
"bottlenecks",
"manual_tasks",
"automation_goals",
"current_software",
"existing_automation",
]
# Handle top-level name field
if input_data.user_name is not None:
existing_data["name"] = input_data.user_name
# Business string fields - overwrite if provided
for field in business_string_fields:
value = getattr(input_data, field)
if value is not None:
existing_business[field] = value
# Business list fields - merge with existing
for field in business_list_fields:
value = getattr(input_data, field)
if value is not None:
existing_list = _json_to_list(existing_business.get(field))
merged = _merge_lists(existing_list, value)
existing_business[field] = merged
# Set version and nest business data
existing_business["version"] = 1
existing_data["business"] = existing_business
# Upsert with the merged data
record = await CoPilotUnderstanding.prisma().upsert(
where={"userId": user_id},
data={
"create": {"userId": user_id, "data": SafeJson(merged_data)},
"update": {"data": SafeJson(merged_data)},
"create": {"userId": user_id, "data": SafeJson(existing_data)},
"update": {"data": SafeJson(existing_data)},
},
)

View File

@@ -1,102 +0,0 @@
"""Tests for business understanding merge and format logic."""
from datetime import datetime, timezone
from typing import Any
from backend.data.understanding import (
BusinessUnderstanding,
BusinessUnderstandingInput,
format_understanding_for_prompt,
merge_business_understanding_data,
)
def _make_input(**kwargs: Any) -> BusinessUnderstandingInput:
"""Create a BusinessUnderstandingInput with only the specified fields."""
return BusinessUnderstandingInput.model_validate(kwargs)
# ─── merge_business_understanding_data: suggested_prompts ─────────────
def test_merge_suggested_prompts_overwrites_existing():
"""New suggested_prompts should fully replace existing ones (not append)."""
existing = {
"name": "Alice",
"business": {"industry": "Tech", "version": 1},
"suggested_prompts": ["Old prompt 1", "Old prompt 2"],
}
input_data = _make_input(
suggested_prompts=["New prompt A", "New prompt B", "New prompt C"],
)
result = merge_business_understanding_data(existing, input_data)
assert result["suggested_prompts"] == [
"New prompt A",
"New prompt B",
"New prompt C",
]
def test_merge_suggested_prompts_none_preserves_existing():
"""When input has suggested_prompts=None, existing prompts are preserved."""
existing = {
"name": "Alice",
"business": {"industry": "Tech", "version": 1},
"suggested_prompts": ["Keep me"],
}
input_data = _make_input(industry="Finance")
result = merge_business_understanding_data(existing, input_data)
assert result["suggested_prompts"] == ["Keep me"]
assert result["business"]["industry"] == "Finance"
def test_merge_suggested_prompts_added_to_empty_data():
"""Suggested prompts are set at top level even when starting from empty data."""
existing: dict[str, Any] = {}
input_data = _make_input(suggested_prompts=["Prompt 1"])
result = merge_business_understanding_data(existing, input_data)
assert result["suggested_prompts"] == ["Prompt 1"]
def test_merge_suggested_prompts_empty_list_overwrites():
"""An explicit empty list should overwrite existing prompts."""
existing: dict[str, Any] = {
"suggested_prompts": ["Old prompt"],
"business": {"version": 1},
}
input_data = _make_input(suggested_prompts=[])
result = merge_business_understanding_data(existing, input_data)
assert result["suggested_prompts"] == []
# ─── format_understanding_for_prompt: excludes suggested_prompts ──────
def test_format_understanding_excludes_suggested_prompts():
"""suggested_prompts is UI-only and must NOT appear in the system prompt."""
understanding = BusinessUnderstanding(
id="test-id",
user_id="user-1",
created_at=datetime.now(tz=timezone.utc),
updated_at=datetime.now(tz=timezone.utc),
user_name="Alice",
industry="Technology",
suggested_prompts=["Automate reports", "Set up alerts", "Track KPIs"],
)
formatted = format_understanding_for_prompt(understanding)
assert "Alice" in formatted
assert "Technology" in formatted
assert "suggested_prompts" not in formatted
assert "Automate reports" not in formatted
assert "Set up alerts" not in formatted
assert "Track KPIs" not in formatted

Some files were not shown because too many files have changed in this diff Show More